License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
should be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
The criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source.
- Files that already had some variant of a license header in them were
included (even if <5 lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, the file was
considered to have no license information in it, and the top-level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If the file was a */uapi/* one, it was tagged "GPL-2.0 WITH
Linux-syscall-note"; otherwise it was tagged "GPL-2.0". The results were:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was tagged with the Linux-syscall-note if
any GPL-family license was found in the file or if it had no licensing
information in it (per the prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
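The selection rules above can be sketched as a small decision function. This is a simplified illustration, not the tooling that was actually used. The real work happened in a spreadsheet with manual review. Here the scanner results are modeled as plain strings, with NULL meaning "nothing found". `spdx_for_unlicensed` and `conclude` are hypothetical names, and the special handling of */uapi/* files that already carried a license is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for the "no license found" rule: the identifier
 * depends only on whether the path lies in a uapi directory.
 */
static const char *spdx_for_unlicensed(const char *path)
{
	/* Paths containing /uapi/ are exported to user space. */
	if (strstr(path, "/uapi/") != NULL)
		return "GPL-2.0 WITH Linux-syscall-note";
	/* Everything else falls back to the top-level COPYING license. */
	return "GPL-2.0";
}

/*
 * Hypothetical two-scanner rule: agreement becomes the concluded license,
 * no detection by either scanner uses the rule above, and any
 * disagreement returns NULL to flag the file for manual inspection.
 */
static const char *conclude(const char *path, const char *scan_a,
			    const char *scan_b)
{
	if (scan_a == NULL && scan_b == NULL)
		return spdx_for_unlicensed(path);
	if (scan_a != NULL && scan_b != NULL && strcmp(scan_a, scan_b) == 0)
		return scan_a;
	return NULL; /* disagreement: manual review */
}
```

In the process described above, a NULL result would correspond to a file going to manual inspection, and from there possibly to the Linux Foundation lawyers.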
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifiers in the
files he inspected. For the non-uapi files, Thomas did random spot checks
in about 15,000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally, Greg ran the script using the .csv files to
generate the patches.
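The header-versus-source distinction can be illustrated with a small formatter. This is a minimal sketch assuming the comment conventions in the kernel's license-rules documentation: `//` for .c files, `/* */` for .h files (the form used at the top of smc_core.h), and `#` for Makefiles, Kconfig and scripts. `spdx_line` and `ends_with` are hypothetical helpers, not the script Thomas and Greg wrote.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if 's' ends with 'suffix'. */
static int ends_with(const char *s, const char *suffix)
{
	size_t ls = strlen(s), lx = strlen(suffix);

	return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/*
 * Hypothetical helper: write the SPDX tag line for 'path' into 'buf' using
 * the comment style expected for that file type. Returns snprintf's result.
 */
static int spdx_line(char *buf, size_t len, const char *path, const char *id)
{
	if (ends_with(path, ".c"))	/* C source: C99 line comment */
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
	if (ends_with(path, ".h"))	/* header: block comment */
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	/* assumption: everything else (Makefile, Kconfig, scripts) uses '#' */
	return snprintf(buf, len, "# SPDX-License-Identifier: %s", id);
}
```

A real tool would also have to handle shebang lines, which must stay on the first line of a script. This sketch does not handle them.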
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
/* SPDX-License-Identifier: GPL-2.0 */
2017-01-09 16:55:17 +01:00
/*
 * Shared Memory Communications over RDMA (SMC-R) and RoCE
 *
 * Definitions for SMC Connections, Link Groups and Links
 *
 * Copyright IBM Corp. 2016
 *
 * Author(s): Ursula Braun <ubraun@linux.vnet.ibm.com>
 */

#ifndef _SMC_CORE_H
#define _SMC_CORE_H

2017-01-09 16:55:19 +01:00
#include <linux/atomic.h>
2020-12-01 20:20:48 +01:00
#include <linux/smc.h>
#include <linux/pci.h>
2017-01-09 16:55:17 +01:00
#include <rdma/ib_verbs.h>
2020-12-01 20:20:44 +01:00
#include <net/genetlink.h>
2017-01-09 16:55:17 +01:00

#include "smc.h"
#include "smc_ib.h"

2017-01-09 16:55:18 +01:00
#define SMC_RMBS_PER_LGR_MAX	255	/* max. # of RMBs per link group */

2017-01-09 16:55:17 +01:00
struct smc_lgr_list {			/* list of link group definition */
	struct list_head	list;
	spinlock_t		lock;	/* protects list of link groups */
2018-05-18 09:34:11 +02:00
	u32			num;	/* unique link group number */
2017-01-09 16:55:17 +01:00
};

enum smc_lgr_role {		/* possible roles of a link group */
	SMC_CLNT,	/* client */
	SMC_SERV	/* server */
};

2018-03-01 13:51:32 +01:00
enum smc_link_state {			/* possible states of a link */
2020-04-29 17:10:43 +02:00
	SMC_LNK_UNUSED,		/* link is unused */
2018-03-01 13:51:32 +01:00
	SMC_LNK_INACTIVE,	/* link is inactive */
	SMC_LNK_ACTIVATING,	/* link is being activated */
2018-07-25 16:35:33 +02:00
	SMC_LNK_ACTIVE,		/* link is active */
2018-03-01 13:51:32 +01:00
};

2017-01-09 16:55:19 +01:00
#define SMC_WR_BUF_SIZE		48	/* size of work request buffer */
2021-10-16 11:37:49 +02:00
#define SMC_WR_BUF_V2_SIZE	8192	/* size of v2 work request buffer */
2017-01-09 16:55:19 +01:00

struct smc_wr_buf {
	u8	raw[SMC_WR_BUF_SIZE];
};

2021-10-16 11:37:49 +02:00
struct smc_wr_v2_buf {
	u8	raw[SMC_WR_BUF_V2_SIZE];
};

2017-07-28 13:56:17 +02:00
#define SMC_WR_REG_MR_WAIT_TIME	(5 * HZ)/* wait time for ib_wr_reg_mr result */

enum smc_wr_reg_state {
	POSTED,		/* ib_wr_reg_mr request posted */
	CONFIRMED,	/* ib_wr_reg_mr response: successful */
	FAILED		/* ib_wr_reg_mr response: failure */
};

2019-02-04 13:44:44 +01:00
struct smc_rdma_sge {				/* sges for RDMA writes */
	struct ib_sge		wr_tx_rdma_sge[SMC_IB_MAX_SEND_SGE];
};

#define SMC_MAX_RDMA_WRITES	2		/* max. # of RDMA writes per
						 * message send
						 */

struct smc_rdma_sges {				/* sges per message send */
	struct smc_rdma_sge	tx_rdma_sge[SMC_MAX_RDMA_WRITES];
};

struct smc_rdma_wr {				/* work requests per message
						 * send
						 */
	struct ib_rdma_wr	wr_tx_rdma[SMC_MAX_RDMA_WRITES];
};
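SMC_MAX_RDMA_WRITES is 2, and each write carries up to SMC_IB_MAX_SEND_SGE gather entries. One plausible reading, assumed here and not stated in this header, is circular buffers on both sides. The data to send may wrap around the end of the local send buffer, and the target range may wrap around the end of the peer's RMB, so each side needs at most two contiguous pieces. A minimal sketch of splitting a ring-buffer range into at most two chunks follows; `struct chunk` and `ring_split` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous piece of a ring buffer: offset and length in bytes. */
struct chunk {
	size_t off;
	size_t len;
};

/*
 * Split 'len' bytes starting at ring offset 'off' of a ring of 'size' bytes
 * into at most two contiguous chunks; a second chunk is needed only on
 * wrap-around. Requires off < size and len <= size. Returns chunks used.
 */
static int ring_split(size_t size, size_t off, size_t len, struct chunk out[2])
{
	size_t first = size - off;	/* bytes until the end of the ring */

	if (len == 0)
		return 0;
	if (len <= first) {
		out[0] = (struct chunk){ off, len };
		return 1;
	}
	out[0] = (struct chunk){ off, first };
	out[1] = (struct chunk){ 0, len - first };	/* wrapped remainder */
	return 2;
}
```

Under that assumption, splitting the remote target range gives the number of RDMA writes (at most two), and splitting the local source range gives the gather entries per write.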

2020-05-04 14:18:47 +02:00
#define SMC_LGR_ID_SIZE		4

2017-01-09 16:55:17 +01:00
struct smc_link {
	struct smc_ib_device	*smcibdev;	/* ib-device */
	u8			ibport;		/* port - values 1 | 2 */
2017-01-09 16:55:19 +01:00
	struct ib_pd		*roce_pd;	/* IB protection domain,
						 * unique for every RoCE QP
						 */
2017-01-09 16:55:17 +01:00
	struct ib_qp		*roce_qp;	/* IB queue pair */
	struct ib_qp_attr	qp_attr;	/* IB queue pair attributes */
2017-01-09 16:55:19 +01:00

	struct smc_wr_buf	*wr_tx_bufs;	/* WR send payload buffers */
	struct ib_send_wr	*wr_tx_ibs;	/* WR send meta data */
	struct ib_sge		*wr_tx_sges;	/* WR send gather meta data */
2019-02-04 13:44:44 +01:00
	struct smc_rdma_sges	*wr_tx_rdma_sges;/*RDMA WRITE gather meta data*/
	struct smc_rdma_wr	*wr_tx_rdmas;	/* WR RDMA WRITE */
2017-01-09 16:55:19 +01:00
	struct smc_wr_tx_pend	*wr_tx_pends;	/* WR send waiting for CQE */
2020-05-04 14:18:41 +02:00
	struct completion	*wr_tx_compl;	/* WR send CQE completion */
2017-01-09 16:55:19 +01:00
	/* above four vectors have wr_tx_cnt elements and use the same index */
2021-10-16 11:37:49 +02:00
	struct ib_send_wr	*wr_tx_v2_ib;	/* WR send v2 meta data */
	struct ib_sge		*wr_tx_v2_sge;	/* WR send v2 gather meta data*/
	struct smc_wr_tx_pend	*wr_tx_v2_pend;	/* WR send v2 waiting for CQE */
2017-01-09 16:55:19 +01:00
	dma_addr_t		wr_tx_dma_addr;	/* DMA address of wr_tx_bufs */
2021-10-16 11:37:49 +02:00
	dma_addr_t		wr_tx_v2_dma_addr; /* DMA address of v2 tx buf*/
2017-01-09 16:55:19 +01:00
	atomic_long_t		wr_tx_id;	/* seq # of last sent WR */
	unsigned long		*wr_tx_mask;	/* bit mask of used indexes */
	u32			wr_tx_cnt;	/* number of WR send buffers */
	wait_queue_head_t	wr_tx_wait;	/* wait for free WR send buf */
2021-08-09 11:05:56 +02:00
	atomic_t		wr_tx_refcnt;	/* tx refs to link */
2017-01-09 16:55:19 +01:00

	struct smc_wr_buf	*wr_rx_bufs;	/* WR recv payload buffers */
	struct ib_recv_wr	*wr_rx_ibs;	/* WR recv meta data */
	struct ib_sge		*wr_rx_sges;	/* WR recv scatter meta data */
	/* above three vectors have wr_rx_cnt elements and use the same index */
	dma_addr_t		wr_rx_dma_addr;	/* DMA address of wr_rx_bufs */
2021-10-16 11:37:49 +02:00
	dma_addr_t		wr_rx_v2_dma_addr; /* DMA address of v2 rx buf*/
2017-01-09 16:55:19 +01:00
	u64			wr_rx_id;	/* seq # of last recv WR */
2022-09-06 21:01:39 +08:00
	u64			wr_rx_id_compl;	/* seq # of last completed WR */
2017-01-09 16:55:19 +01:00
	u32			wr_rx_cnt;	/* number of WR recv buffers */
2018-05-02 16:56:44 +02:00
	unsigned long		wr_rx_tstamp;	/* jiffies when last buf rx */
2022-09-06 21:01:39 +08:00
	wait_queue_head_t	wr_rx_empty_wait; /* wait for RQ empty */
2017-01-09 16:55:19 +01:00

2017-07-28 13:56:17 +02:00
	struct ib_reg_wr	wr_reg;		/* WR register memory region */
	wait_queue_head_t	wr_reg_wait;	/* wait for wr_reg result */
2021-08-09 11:05:56 +02:00
	atomic_t		wr_reg_refcnt;	/* reg refs to link */
2017-07-28 13:56:17 +02:00
	enum smc_wr_reg_state	wr_reg_state;	/* state of wr_reg request */

2018-07-25 16:35:31 +02:00
	u8			gid[SMC_GID_SIZE];/* gid matching used vlan id*/
	u8			sgid_index;	/* gid index for vlan id */
2017-01-09 16:55:17 +01:00
	u32			peer_qpn;	/* QP number of peer */
	enum ib_mtu		path_mtu;	/* used mtu */
	enum ib_mtu		peer_mtu;	/* mtu size of peer */
	u32			psn_initial;	/* QP tx initial packet seqno */
	u32			peer_psn;	/* QP rx initial packet seqno */
	u8			peer_mac[ETH_ALEN];	/* = gid[8:10||13:15] */
2018-07-25 16:35:31 +02:00
	u8			peer_gid[SMC_GID_SIZE];	/* gid of peer*/
2017-01-09 16:55:21 +01:00
	u8			link_id;	/* unique # within link group */
2020-05-04 14:18:47 +02:00
	u8			link_uid[SMC_LGR_ID_SIZE]; /* unique lnk id */
2020-05-04 14:18:48 +02:00
	u8			peer_link_uid[SMC_LGR_ID_SIZE]; /* peer uid */
2020-04-29 17:10:39 +02:00
	u8			link_idx;	/* index in lgr link array */
2020-05-04 14:18:44 +02:00
	u8			link_is_asym;	/* is link asymmetric? */
2022-01-13 16:36:42 +08:00
	u8			clearing : 1;	/* link is being cleared */
	refcount_t		refcnt;		/* link reference count */
2020-04-29 17:10:40 +02:00
	struct smc_link_group	*lgr;		/* parent link group */
2020-05-01 12:48:08 +02:00
	struct work_struct	link_down_wrk;	/* wrk to bring link down */
2020-12-01 20:20:41 +01:00
	char			ibname[IB_DEVICE_NAME_MAX]; /* ib device name */
	int			ndev_ifidx; /* network device ifindex */
2018-03-01 13:51:32 +01:00

	enum smc_link_state	state;		/* state of link */
2018-05-02 16:56:44 +02:00
	struct delayed_work	llc_testlink_wrk; /* testlink worker */
	struct completion	llc_testlink_resp; /* wait for rx of testlink */
	int			llc_testlink_time; /* testlink interval */
2020-12-01 20:20:38 +01:00
	atomic_t		conn_cnt; /* connections on this link */
2017-01-09 16:55:17 +01:00
};
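In smc_link, wr_tx_mask records which of the wr_tx_cnt send slots are in use, and the comments state that the per-slot vectors share one index. A simplified, single-threaded sketch of such a slot allocator follows. `slot_get` and `slot_put` are hypothetical names, not kernel functions. Concurrent code would need atomic bit operations (for example the kernel's test_and_set_bit) instead of the plain read-modify-write used here.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Claim the lowest free slot in a bitmap of 'cnt' bits. Returns the slot
 * index, or -1 if every slot is in use. Not safe against concurrent callers.
 */
static long slot_get(unsigned long *mask, unsigned int cnt)
{
	for (unsigned int i = 0; i < cnt; i++) {
		unsigned long bit = 1UL << (i % BITS_PER_WORD);

		if (!(mask[i / BITS_PER_WORD] & bit)) {
			mask[i / BITS_PER_WORD] |= bit;
			return (long)i;
		}
	}
	return -1;
}

/* Release a slot so its index can be reused by all per-slot arrays. */
static void slot_put(unsigned long *mask, unsigned int idx)
{
	mask[idx / BITS_PER_WORD] &= ~(1UL << (idx % BITS_PER_WORD));
}
```

A returned index would select the matching entries in every per-slot array (wr_tx_bufs[idx], wr_tx_ibs[idx], and so on). When slot_get returns -1, a sender would presumably wait on something like wr_tx_wait until a slot is released.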

/* For now we just allow one parallel link per link group. The SMC protocol
 * allows more (up to 8).
 */
2020-05-01 12:48:12 +02:00
#define SMC_LINKS_PER_LGR_MAX	3
2017-01-09 16:55:17 +01:00
#define SMC_SINGLE_LINK		0

2017-01-09 16:55:18 +01:00
/* tx/rx buffer list element for sndbufs list and rmbs list of a lgr */
struct smc_buf_desc {
	struct list_head	list;
	void			*cpu_addr;	/* virtual address of buffer */
2018-05-03 18:12:38 +02:00
	struct page		*pages;
2018-05-18 09:34:10 +02:00
	int			len;		/* length of buffer */
2017-01-09 16:55:18 +01:00
	u32			used;		/* currently used / unused */
net/smc: add base infrastructure for SMC-D and ISM
SMC supports two variants: SMC-R and SMC-D. For data transport, SMC-R
uses RDMA devices, SMC-D uses so-called Internal Shared Memory (ISM)
devices. An ISM device only allows shared memory communication between
SMC instances on the same machine. For example, this allows virtual
machines on the same host to communicate via SMC without RDMA devices.
This patch adds the base infrastructure for SMC-D and ISM devices to
the existing SMC code. It contains the following:
* ISM driver interface:
This interface allows an ISM driver to register ISM devices in SMC. In
the process, the driver provides a set of device ops for each device.
SMC uses these ops to execute SMC specific operations on or transfer
data over the device.
* Core SMC-D link group, connection, and buffer support:
Link groups, SMC connections and SMC buffers (in smc_core) are
extended to support SMC-D.
* SMC type checks:
Some type checks are added to prevent using SMC-R specific code for
SMC-D and vice versa.
To actually use SMC-D, additional changes to pnetid, CLC, CDC, etc. are
required. These are added in follow-up patches.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
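The "SMC type checks" described above can be illustrated with a simplified tagged union. It is loosely modeled on the SMC-R/SMC-D union in smc_buf_desc. The struct, the is_smcd flag and both accessors are hypothetical, and the checks the patch actually adds may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified buffer descriptor with one arm per SMC variant. */
struct demo_buf {
	bool is_smcd;			/* selects the active union arm */
	union {
		struct {		/* SMC-R: RDMA memory region key */
			uint32_t rkey;
		} r;
		struct {		/* SMC-D: ISM DMB identifiers */
			uint64_t token;
			unsigned short sba_idx;
		} d;
	} u;
};

/* SMC-R-only accessor: refuses to read RDMA state from an SMC-D buffer. */
static int demo_get_rkey(const struct demo_buf *b, uint32_t *rkey)
{
	if (b->is_smcd)
		return -1;
	*rkey = b->u.r.rkey;
	return 0;
}

/* SMC-D-only accessor: refuses to read ISM state from an SMC-R buffer. */
static int demo_get_token(const struct demo_buf *b, uint64_t *token)
{
	if (!b->is_smcd)
		return -1;
	*token = b->u.d.token;
	return 0;
}
```

The arms share storage, so checking the discriminator before every access is what keeps SMC-R code from misreading SMC-D data and vice versa.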
2018-06-28 19:05:07 +02:00
	union {
		struct { /* SMC-R */
2020-04-29 17:10:41 +02:00
			struct sg_table	sgt[SMC_LINKS_PER_LGR_MAX];
					/* virtual buffer */
net/smc: Allow virtually contiguous sndbufs or RMBs for SMC-R
On long-running enterprise production servers, high-order contiguous
memory pages are usually very rare and in most cases we can only get
fragmented pages.
When replacing TCP with SMC-R in such production scenarios, attempting
to allocate high-order physically contiguous sndbufs and RMBs may result
in frequent memory compaction, which can cause unexpected hangs
and further stability risks.
This patch therefore allows SMC-R link groups to use virtually
contiguous sndbufs and RMBs to avoid the issues mentioned above.
Whether to use physically or virtually contiguous buffers is controlled
by the sysctl smcr_buf_type.
Note that using virtually contiguous buffers brings an acceptable
performance regression, which falls mainly into two parts:
1) regression in the data path, caused by the additional address
translation of the sndbuf by the RNIC on Tx. In general, though,
address translation through the MTT is fast.
Taking a 256KB sndbuf and RMB as an example, qperf latency and bandwidth
results with physically and virtually contiguous buffers are as follows:
- client:
smc_run taskset -c <cpu> qperf <server> -oo msg_size:1:64K:*2\
-t 5 -vu tcp_{bw|lat}
- server:
smc_run taskset -c <cpu> qperf
[latency]
msgsize tcp smcr smcr-use-virt-buf
1 11.17 us 7.56 us 7.51 us (-0.67%)
2 10.65 us 7.74 us 7.56 us (-2.31%)
4 11.11 us 7.52 us 7.59 us ( 0.84%)
8 10.83 us 7.55 us 7.51 us (-0.48%)
16 11.21 us 7.46 us 7.51 us ( 0.71%)
32 10.65 us 7.53 us 7.58 us ( 0.61%)
64 10.95 us 7.74 us 7.80 us ( 0.76%)
128 11.14 us 7.83 us 7.87 us ( 0.47%)
256 10.97 us 7.94 us 7.92 us (-0.28%)
512 11.23 us 7.94 us 8.20 us ( 3.25%)
1024 11.60 us 8.12 us 8.20 us ( 0.96%)
2048 14.04 us 8.30 us 8.51 us ( 2.49%)
4096 16.88 us 9.13 us 9.07 us (-0.64%)
8192 22.50 us 10.56 us 11.22 us ( 6.26%)
16384 28.99 us 12.88 us 13.83 us ( 7.37%)
32768 40.13 us 16.76 us 16.95 us ( 1.16%)
65536 68.70 us 24.68 us 24.85 us ( 0.68%)
[bandwidth]
msgsize tcp smcr smcr-use-virt-buf
1 1.65 MB/s 1.59 MB/s 1.53 MB/s (-3.88%)
2 3.32 MB/s 3.17 MB/s 3.08 MB/s (-2.67%)
4 6.66 MB/s 6.33 MB/s 6.09 MB/s (-3.85%)
8 13.67 MB/s 13.45 MB/s 11.97 MB/s (-10.99%)
16 25.36 MB/s 27.15 MB/s 24.16 MB/s (-11.01%)
32 48.22 MB/s 54.24 MB/s 49.41 MB/s (-8.89%)
64 106.79 MB/s 107.32 MB/s 99.05 MB/s (-7.71%)
128 210.21 MB/s 202.46 MB/s 201.02 MB/s (-0.71%)
256 400.81 MB/s 416.81 MB/s 393.52 MB/s (-5.59%)
512 746.49 MB/s 834.12 MB/s 809.99 MB/s (-2.89%)
1024 1292.33 MB/s 1641.96 MB/s 1571.82 MB/s (-4.27%)
2048 2007.64 MB/s 2760.44 MB/s 2717.68 MB/s (-1.55%)
4096 2665.17 MB/s 4157.44 MB/s 4070.76 MB/s (-2.09%)
8192 3159.72 MB/s 4361.57 MB/s 4270.65 MB/s (-2.08%)
16384 4186.70 MB/s 4574.13 MB/s 4501.17 MB/s (-1.60%)
32768 4093.21 MB/s 4487.42 MB/s 4322.43 MB/s (-3.68%)
65536 4057.14 MB/s 4735.61 MB/s 4555.17 MB/s (-3.81%)
2) regression in the buffer initialization and destruction path, caused
by the additional MR operations on sndbufs. Thanks to the link group
buffer reuse mechanism, however, the impact of this regression decreases
as buffers are reused more often.
Taking a 256KB sndbuf and RMB as an example, the latency of some key SMC-R
buffer-related functions, obtained with bpftrace, is as follows:
Function Phys-bufs Virt-bufs
smcr_new_buf_create() 67154 ns 79164 ns
smc_ib_buf_map_sg() 525 ns 928 ns
smc_ib_get_memory_region() 162294 ns 161191 ns
smc_wr_reg_send() 9957 ns 9635 ns
smc_ib_put_memory_region() 203548 ns 198374 ns
smc_ib_buf_unmap_sg() 508 ns 1158 ns
------------
Test environment notes:
1. The above tests ran on 2 VMs on the same host.
2. The NIC is a ConnectX-4Lx, using SRIOV to pass through 2 VFs, one
to each VM.
3. The VMs' vCPUs are bound to different physical CPUs, and those
physical CPUs are isolated with the `isolcpus=xxx` kernel command line.
4. The NICs' queue count is set to 1.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
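The choice controlled by smcr_buf_type can be sketched as a small decision function. This assumes the values documented for this sysctl in the kernel's SMC sysctl documentation: 0 means physically contiguous buffers, 1 means virtually contiguous buffers, and 2 means try physically contiguous first and fall back to virtually contiguous. The enum names and `choose_buf` are hypothetical. Allocation is simulated by a flag, and retrying with a smaller buffer is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed sysctl values (see lead-in); names are hypothetical. */
enum buf_type { PHYS_CONT = 0, VIRT_CONT = 1, MIXED = 2 };

/* Outcome of a (simulated) buffer allocation. */
enum buf_kind { BUF_NONE, BUF_PHYS, BUF_VIRT };

/*
 * Decide which kind of buffer to create. 'phys_ok' stands in for whether a
 * high-order physically contiguous allocation succeeded; in this sketch a
 * virtually contiguous allocation is assumed to always succeed.
 */
static enum buf_kind choose_buf(enum buf_type type, bool phys_ok)
{
	switch (type) {
	case PHYS_CONT:
		return phys_ok ? BUF_PHYS : BUF_NONE;
	case VIRT_CONT:
		return BUF_VIRT;
	case MIXED:
		/* try physically contiguous first, then fall back */
		return phys_ok ? BUF_PHYS : BUF_VIRT;
	}
	return BUF_NONE;
}
```

A buffer created on the virtual path appears to be the case that the is_vm flag ("virtually contiguous") in smc_buf_desc records.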
2022-07-14 17:44:04 +08:00
			struct ib_mr	*mr[SMC_LINKS_PER_LGR_MAX];
					/* memory region: for rmb and
					 * vzalloced sndbuf
2020-04-29 17:10:41 +02:00
					 * incl. rkey provided to peer
2022-07-14 17:44:04 +08:00
					 * and lkey provided to local
2020-04-29 17:10:41 +02:00
					 */
			u32		order;	/* allocation order */

			u8		is_conf_rkey;
					/* confirm_rkey done */
			u8		is_reg_mr[SMC_LINKS_PER_LGR_MAX];
					/* mem region registered */
			u8		is_map_ib[SMC_LINKS_PER_LGR_MAX];
					/* mem region mapped to lnk */
2022-07-14 17:44:01 +08:00
			u8		is_dma_need_sync;
2020-04-29 17:10:41 +02:00
			u8		is_reg_err;
					/* buffer registration err */
2022-07-14 17:44:04 +08:00
			u8		is_vm;
					/* virtually contiguous */
2018-06-28 19:05:07 +02:00
		};
		struct { /* SMC-D */
2020-04-29 17:10:41 +02:00
			unsigned short	sba_idx;
					/* SBA index number */
			u64		token;
					/* DMB token number */
			dma_addr_t	dma_addr;
					/* DMA address */
2018-06-28 19:05:07 +02:00
		};
	};
2017-01-09 16:55:18 +01:00
};

2017-01-09 16:55:20 +01:00
struct smc_rtoken {				/* address/key of remote RMB */
	u64			dma_addr;
	u32			rkey;
};

2018-05-18 09:34:14 +02:00
#define SMC_BUF_MIN_SIZE	16384	/* minimum size of an RMB */
#define SMC_RMBE_SIZES		16	/* number of distinct RMBE sizes */
/* theoretically, the RFC states that the largest size would be 512K,
 * i.e. compressed 5 and thus 6 sizes (0..5), despite
 * struct smc_clc_msg_accept_confirm.rmbe_size being a 4 bit value (0..15)
 */
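The comment above implies a logarithmic size encoding: 512K is compressed value 5, and 16384 << 5 == 512K. A minimal user-space sketch, assuming that a compressed value c always means SMC_BUF_MIN_SIZE << c. The names sketch_compress_bufsize() and sketch_uncompress_bufsize() are hypothetical stand-ins, not the kernel's smc_uncompress_bufsize() declared further down.

```c
#include <stdint.h>

/* Values copied from the header above. */
#define SKETCH_BUF_MIN_SIZE 16384 /* SMC_BUF_MIN_SIZE */
#define SKETCH_RMBE_SIZES   16    /* SMC_RMBE_SIZES: 4-bit field, 0..15 */

/* Hypothetical helper: bytes represented by a compressed size value,
 * assuming size = SMC_BUF_MIN_SIZE << compressed. */
static int sketch_uncompress_bufsize(uint8_t compressed)
{
	return SKETCH_BUF_MIN_SIZE << compressed;
}

/* Hypothetical helper: smallest compressed value whose buffer can hold
 * @size bytes, capped at the largest value the 4-bit field can carry. */
static uint8_t sketch_compress_bufsize(int size)
{
	uint8_t compressed = 0;

	while (compressed < SKETCH_RMBE_SIZES - 1 &&
	       sketch_uncompress_bufsize(compressed) < size)
		compressed++;
	return compressed;
}
```

Under this assumption the 4-bit field could in principle describe up to 16K << 15 (512 MiB). According to the comment, however, only the values 0..5 correspond to sizes the RFC states.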
2017-01-09 16:55:21 +01:00
2018-06-28 19:05:07 +02:00
struct smcd_dev;

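The SMC-D commit message describes how an ISM driver registers each device together with a set of device ops, and SMC reaches the device only through those ops. In plain C, that arrangement is a table of function pointers. The sketch below is a minimal illustration. Every type and function name in it is hypothetical, and the real SMC-D ops structure and its callbacks are not shown in this header.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_ism_dev;

/* Hypothetical per-device ops supplied by an ISM driver at registration. */
struct sketch_ism_ops {
	int (*register_dmb)(struct sketch_ism_dev *dev, uint64_t *token);
	int (*move_data)(struct sketch_ism_dev *dev, uint64_t token,
			 const void *data, size_t len);
};

/* Hypothetical device as seen by SMC: it only stores the driver's ops. */
struct sketch_ism_dev {
	const struct sketch_ism_ops *ops;
	uint64_t last_token; /* state used by the fake backend below */
};

/* SMC side: transfers data exclusively through the registered ops. */
static int sketch_smcd_send(struct sketch_ism_dev *dev, uint64_t token,
			    const void *data, size_t len)
{
	if (!dev->ops || !dev->ops->move_data)
		return -1;
	return dev->ops->move_data(dev, token, data, len);
}

/* Driver side: a fake backend that hands out increasing DMB tokens and
 * pretends every transfer to a non-zero token succeeds. */
static int fake_register_dmb(struct sketch_ism_dev *dev, uint64_t *token)
{
	*token = ++dev->last_token;
	return 0;
}

static int fake_move_data(struct sketch_ism_dev *dev, uint64_t token,
			  const void *data, size_t len)
{
	(void)dev;
	(void)data;
	return token ? (int)len : -1;
}

static const struct sketch_ism_ops fake_ism_ops = {
	.register_dmb = fake_register_dmb,
	.move_data = fake_move_data,
};
```

This design keeps the SMC side independent of any particular ISM driver. Because SMC never calls driver functions directly, a different backend only has to provide another ops table.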
2020-04-30 15:55:40 +02:00
enum smc_lgr_type {				/* redundancy state of lgr */
	SMC_LGR_NONE,			/* no active links, lgr to be deleted */
	SMC_LGR_SINGLE,			/* 1 active RNIC on each peer */
	SMC_LGR_SYMMETRIC,		/* 2 active RNICs on each peer */
	SMC_LGR_ASYMMETRIC_PEER,	/* local has 2, peer 1 active RNICs */
	SMC_LGR_ASYMMETRIC_LOCAL,	/* local has 1, peer 2 active RNICs */
};

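The comments on enum smc_lgr_type define each redundancy state by the number of active RNICs on each side. The sketch below only restates those comments as a classification function. The header declares smcr_lgr_set_type() and smcr_lgr_set_type_asym() to update the state, so the kernel tracks it as explicit state rather than computing it this way. The enum copy and sketch_classify_lgr() are hypothetical, and RNIC counts are assumed to be 0, 1 or 2.

```c
/* Local copy of the values of enum smc_lgr_type above. */
enum sketch_lgr_type {
	SKETCH_LGR_NONE,
	SKETCH_LGR_SINGLE,
	SKETCH_LGR_SYMMETRIC,
	SKETCH_LGR_ASYMMETRIC_PEER,
	SKETCH_LGR_ASYMMETRIC_LOCAL,
};

/* Hypothetical helper: classify a link group from the number of active
 * RNICs on the local and the peer side (assumed to be 0, 1 or 2). */
static enum sketch_lgr_type sketch_classify_lgr(int local, int peer)
{
	if (local == 0 || peer == 0)
		return SKETCH_LGR_NONE;		/* no link can be active */
	if (local == peer)
		return local == 1 ? SKETCH_LGR_SINGLE : SKETCH_LGR_SYMMETRIC;
	return local > peer ? SKETCH_LGR_ASYMMETRIC_PEER  /* local 2, peer 1 */
			    : SKETCH_LGR_ASYMMETRIC_LOCAL; /* local 1, peer 2 */
}
```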
2022-07-14 17:44:02 +08:00
enum smcr_buf_type {		/* types of SMC-R sndbufs and RMBs */
	SMCR_PHYS_CONT_BUFS	= 0,
	SMCR_VIRT_CONT_BUFS	= 1,
	SMCR_MIXED_BUFS		= 2,
};

2020-04-30 15:55:38 +02:00
enum smc_llc_flowtype {
	SMC_LLC_FLOW_NONE	= 0,
	SMC_LLC_FLOW_ADD_LINK	= 2,
	SMC_LLC_FLOW_DEL_LINK	= 4,
2021-10-16 11:37:50 +02:00
	SMC_LLC_FLOW_REQ_ADD_LINK = 5,
2020-04-30 15:55:38 +02:00
	SMC_LLC_FLOW_RKEY	= 6,
};

struct smc_llc_qentry;

struct smc_llc_flow {
	enum smc_llc_flowtype type;
	struct smc_llc_qentry *qentry;
};

2017-01-09 16:55:17 +01:00
struct smc_link_group {
	struct list_head	list;
	struct rb_root		conns_all;	/* connection tree */
	rwlock_t		conns_lock;	/* protects conns_all */
	unsigned int		conns_num;	/* current # of connections */
	unsigned short		vlan_id;	/* vlan id of link group */
2017-01-09 16:55:18 +01:00

	struct list_head	sndbufs[SMC_RMBE_SIZES];/* tx buffers */
2020-04-29 17:10:48 +02:00
	struct mutex		sndbufs_lock;	/* protects tx buffers */
2017-01-09 16:55:18 +01:00
	struct list_head	rmbs[SMC_RMBE_SIZES];	/* rx buffers */
2020-04-29 17:10:48 +02:00
	struct mutex		rmbs_lock;	/* protects rx buffers */
2017-01-09 16:55:20 +01:00

2017-01-09 16:55:21 +01:00
	u8			id[SMC_LGR_ID_SIZE];	/* unique lgr id */
2017-01-09 16:55:17 +01:00
	struct delayed_work	free_work;	/* delayed freeing of an lgr */
2019-10-21 16:13:14 +02:00
	struct work_struct	terminate_work;	/* abnormal lgr termination */
2020-09-10 18:48:29 +02:00
	struct workqueue_struct	*tx_wq;		/* wq for conn. tx workers */
2018-05-15 17:05:03 +02:00
	u8			sync_err : 1;	/* lgr no longer fits to peer */
	u8			terminating : 1;/* lgr is terminating */
2019-10-21 16:13:11 +02:00
	u8			freeing : 1;	/* lgr is being freed */
2018-06-28 19:05:07 +02:00

2022-01-13 16:36:40 +08:00
	refcount_t		refcnt;		/* lgr reference count */
2018-06-28 19:05:07 +02:00
	bool			is_smcd;	/* SMC-R or SMC-D */
2020-09-26 12:44:31 +02:00
	u8			smc_version;
	u8			negotiated_eid[SMC_MAX_EID_LEN];
	u8			peer_os;	/* peer operating system */
	u8			peer_smc_release;
	u8			peer_hostname[SMC_MAX_HOSTNAME_LEN];
2018-06-28 19:05:07 +02:00
	union {
		struct { /* SMC-R */
			enum smc_lgr_role	role;
						/* client or server */
			struct smc_link		lnk[SMC_LINKS_PER_LGR_MAX];
						/* smc link */
2021-10-16 11:37:49 +02:00
			struct smc_wr_v2_buf	*wr_rx_buf_v2;
						/* WR v2 recv payload buffer */
			struct smc_wr_v2_buf	*wr_tx_buf_v2;
						/* WR v2 send payload buffer */
2018-06-28 19:05:07 +02:00
			char			peer_systemid[SMC_SYSTEMID_LEN];
						/* unique system_id of peer */
			struct smc_rtoken	rtokens[SMC_RMBS_PER_LGR_MAX]
						[SMC_LINKS_PER_LGR_MAX];
						/* remote addr/key pairs */
2018-07-23 13:53:11 +02:00
			DECLARE_BITMAP(rtokens_used_mask, SMC_RMBS_PER_LGR_MAX);
2018-06-28 19:05:07 +02:00
						/* used rtoken elements */
2020-04-29 17:10:39 +02:00
			u8			next_link_id;
2020-04-30 15:55:40 +02:00
			enum smc_lgr_type	type;
2022-07-14 17:44:03 +08:00
			enum smcr_buf_type	buf_type;
2020-04-30 15:55:40 +02:00
						/* redundancy state */
2020-05-01 12:48:06 +02:00
			u8			pnet_id[SMC_MAX_PNETID_LEN + 1];
						/* pnet id of this lgr */
2020-04-29 17:10:46 +02:00
			struct list_head	llc_event_q;
						/* queue for llc events */
			spinlock_t		llc_event_q_lock;
						/* protects llc_event_q */
net/smc: llc_conf_mutex refactor, replace it with rw_semaphore
llc_conf_mutex was used to protect links and link-related configurations
in the same link group, for example, when adding or deleting links. However,
in most cases, the protected critical area has only read semantics and
no write semantics at all, such as obtaining a usable link or an
available rmb_desc.
This patch does a simple code refactoring: it replaces the mutex with an
rw_semaphore, mutex_lock with down_write, and mutex_unlock with
up_write.
Theoretically, this replacement is equivalent, but after this patch,
we can choose the lock granularity according to the different semantics
of critical areas.
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-02 16:26:39 +08:00
			struct rw_semaphore	llc_conf_mutex;
2020-05-01 12:48:05 +02:00
						/* protects lgr reconfig. */
2020-05-01 12:48:13 +02:00
			struct work_struct	llc_add_link_work;
2020-05-03 14:38:47 +02:00
			struct work_struct	llc_del_link_work;
2020-04-29 17:10:46 +02:00
			struct work_struct	llc_event_work;
						/* llc event worker */
2020-07-08 17:05:11 +02:00
			wait_queue_head_t	llc_flow_waiter;
2020-04-30 15:55:38 +02:00
						/* w4 next llc event */
2020-07-08 17:05:11 +02:00
			wait_queue_head_t	llc_msg_waiter;
						/* w4 next llc msg */
2020-04-30 15:55:38 +02:00
			struct smc_llc_flow	llc_flow_lcl;
						/* llc local control field */
			struct smc_llc_flow	llc_flow_rmt;
						/* llc remote control field */
			struct smc_llc_qentry	*delayed_event;
						/* arrived when flow active */
			spinlock_t		llc_flow_lock;
						/* protects llc flow */
2020-04-29 17:10:49 +02:00
			int			llc_testlink_time;
						/* link keep alive time */
2020-05-04 14:18:45 +02:00
			u32			llc_termination_rsn;
						/* rsn code for termination */
2021-10-16 11:37:45 +02:00
			u8			nexthop_mac[ETH_ALEN];
			u8			uses_gateway;
			__be32			saddr;
2021-12-28 21:06:09 +08:00
						/* net namespace */
			struct net		*net;
2018-06-28 19:05:07 +02:00
		};
		struct { /* SMC-D */
			u64			peer_gid;
						/* Peer GID (remote) */
			struct smcd_dev		*smcd;
						/* ISM device for VLAN reg. */
2019-11-14 13:02:40 +01:00
			u8			peer_shutdown : 1;
						/* peer triggered shutdown */
2018-06-28 19:05:07 +02:00
		};
	};
2017-01-09 16:55:17 +01:00
};

2019-04-12 12:57:26 +02:00
struct smc_clc_msg_local;

2021-10-16 11:37:44 +02:00
#define GID_LIST_SIZE	2

struct smc_gidlist {
	u8			len;
	u8			list[GID_LIST_SIZE][SMC_GID_SIZE];
};

struct smc_init_info_smcrv2 {
	/* Input fields */
	__be32			saddr;
	struct sock		*clc_sk;
	__be32			daddr;

	/* Output fields when saddr is set */
	struct smc_ib_device	*ib_dev_v2;
	u8			ib_port_v2;
	u8			ib_gid_v2[SMC_GID_SIZE];

	/* Additional output fields when clc_sk and daddr are set as well */
	u8			uses_gateway;
	u8			nexthop_mac[ETH_ALEN];

	struct smc_gidlist	gidlist;
};

2019-04-12 12:57:26 +02:00
struct smc_init_info {
	u8			is_smcd;
2020-09-26 12:44:27 +02:00
	u8			smc_type_v1;
	u8			smc_type_v2;
2020-09-10 18:48:21 +02:00
	u8			first_contact_peer;
	u8			first_contact_local;
2019-04-12 12:57:26 +02:00
	unsigned short		vlan_id;
2020-10-31 19:19:38 +01:00
	u32			rc;
2021-09-14 10:35:05 +02:00
	u8			negotiated_eid[SMC_MAX_EID_LEN];
2019-04-12 12:57:26 +02:00
	/* SMC-R */
2021-10-16 11:37:44 +02:00
	u8			smcr_version;
	u8			check_smcrv2;
2021-10-16 11:37:45 +02:00
	u8			peer_gid[SMC_GID_SIZE];
	u8			peer_mac[ETH_ALEN];
	u8			peer_systemid[SMC_SYSTEMID_LEN];
2019-04-12 12:57:26 +02:00
	struct smc_ib_device	*ib_dev;
	u8			ib_gid[SMC_GID_SIZE];
	u8			ib_port;
	u32			ib_clcqpn;
2021-10-16 11:37:44 +02:00
	struct smc_init_info_smcrv2 smcrv2;
2019-04-12 12:57:26 +02:00
	/* SMC-D */
2020-09-26 12:44:23 +02:00
	u64			ism_peer_gid[SMC_MAX_ISM_DEVS + 1];
	struct smcd_dev		*ism_dev[SMC_MAX_ISM_DEVS + 1];
2020-09-26 12:44:25 +02:00
	u16			ism_chid[SMC_MAX_ISM_DEVS + 1];
2020-09-26 12:44:27 +02:00
	u8			ism_offered_cnt; /* # of ISM devices offered */
2020-09-26 12:44:29 +02:00
	u8			ism_selected;    /* index of selected ISM dev */
2020-09-26 12:44:27 +02:00
	u8			smcd_version;
2019-04-12 12:57:26 +02:00
};

2017-01-09 16:55:17 +01:00
/* Find the connection associated with the given alert token in the link group.
 * To use rbtrees we have to implement our own search core.
 * Requires @conns_lock
 * @token	alert token to search for
 * @lgr		 link group to search in
 * Returns connection associated with token if found, NULL otherwise.
 */
static inline struct smc_connection *smc_lgr_find_conn(
	u32 token, struct smc_link_group *lgr)
{
	struct smc_connection *res = NULL;
	struct rb_node *node;

	node = lgr->conns_all.rb_node;
	while (node) {
		struct smc_connection *cur = rb_entry(node,
					struct smc_connection, alert_node);

		if (cur->alert_token_local > token) {
			node = node->rb_left;
		} else {
			if (cur->alert_token_local < token) {
				node = node->rb_right;
			} else {
				res = cur;
				break;
			}
		}
	}

	return res;
2022-01-13 16:36:41 +08:00
}

static inline bool smc_conn_lgr_valid(struct smc_connection *conn)
{
	return conn->lgr && conn->alert_token_local;
2017-01-09 16:55:17 +01:00
}

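smc_lgr_find_conn() above hand-codes the tree descent because the kernel rbtree API leaves searching to the caller, which is what "implement our own search core" refers to. The sketch below applies the same descent to a plain, unbalanced binary search tree in user space. struct sketch_conn and both helpers are hypothetical. Unlike the kernel rbtree, the insert does no rebalancing, so worst-case lookups are linear rather than logarithmic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node standing in for struct smc_connection + rb_node. */
struct sketch_conn {
	uint32_t alert_token_local;
	struct sketch_conn *left, *right;
};

/* Unbalanced insert; the kernel would also recolor/rotate the rbtree. */
static void sketch_insert(struct sketch_conn **root, struct sketch_conn *conn)
{
	while (*root) {
		if (conn->alert_token_local < (*root)->alert_token_local)
			root = &(*root)->left;
		else
			root = &(*root)->right;
	}
	conn->left = conn->right = NULL;
	*root = conn;
}

/* Same descent as smc_lgr_find_conn(): go left when the node's token is
 * larger than the one searched for, right when smaller, stop on a match. */
static struct sketch_conn *sketch_find(struct sketch_conn *node, uint32_t token)
{
	while (node) {
		if (node->alert_token_local > token)
			node = node->left;
		else if (node->alert_token_local < token)
			node = node->right;
		else
			return node;
	}
	return NULL;
}
```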
2021-12-31 14:08:53 +08:00
/*
 * Returns true if the specified link is usable.
 *
 * Usable means the link is ready to receive RDMA messages, map memory
 * on the link, etc. This does not ensure that we are able to send RDMA
 * messages on this link; if sending RDMA messages is needed, use
 * smc_link_sendable().
 */
2020-04-29 17:10:43 +02:00
static inline bool smc_link_usable(struct smc_link *lnk)
{
	if (lnk->state == SMC_LNK_UNUSED || lnk->state == SMC_LNK_INACTIVE)
		return false;
	return true;
}

2021-12-31 14:08:53 +08:00
/*
 * Returns true if the specified link is ready to receive AND send RDMA
 * messages.
 *
 * For the client side in first contact, the underlying QP may still be in
 * RESET or RTR when the link state is ACTIVATING, so the checks in
 * smc_link_usable() are not strong enough. Places that need to send any
 * CDC or LLC messages should use smc_link_sendable(); otherwise, use
 * smc_link_usable() instead.
 */
2021-12-28 17:03:24 +08:00
static inline bool smc_link_sendable(struct smc_link *lnk)
{
	return smc_link_usable(lnk) &&
		lnk->qp_attr.cur_qp_state == IB_QPS_RTS;
}

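The difference between smc_link_usable() and smc_link_sendable() comes down to one extra QP-state check. The user-space sketch below uses local stand-in enums to model the first-contact client case from the comment. The SK_* names are hypothetical, while the kernel uses SMC_LNK_* link states and IB_QPS_* QP states. In this case, a link that is ACTIVATING while its QP is still in RTR is usable but not yet sendable.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel link and QP states used above. */
enum sk_lnk_state { SK_LNK_UNUSED, SK_LNK_INACTIVE, SK_LNK_ACTIVATING,
		    SK_LNK_ACTIVE };
enum sk_qp_state { SK_QPS_RESET, SK_QPS_INIT, SK_QPS_RTR, SK_QPS_RTS };

struct sk_link {
	enum sk_lnk_state state;
	enum sk_qp_state qp_state;
};

/* Mirrors smc_link_usable(): every state except UNUSED/INACTIVE. */
static bool sk_link_usable(const struct sk_link *lnk)
{
	return lnk->state != SK_LNK_UNUSED && lnk->state != SK_LNK_INACTIVE;
}

/* Mirrors smc_link_sendable(): sending also needs the QP in RTS. */
static bool sk_link_sendable(const struct sk_link *lnk)
{
	return sk_link_usable(lnk) && lnk->qp_state == SK_QPS_RTS;
}
```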
2020-07-18 15:06:16 +02:00
static inline bool smc_link_active(struct smc_link *lnk)
{
	return lnk->state == SMC_LNK_ACTIVE;
}

2020-12-01 20:20:46 +01:00
static inline void smc_gid_be16_convert(__u8 *buf, u8 *gid_raw)
{
	sprintf(buf, "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x",
		be16_to_cpu(((__be16 *)gid_raw)[0]),
		be16_to_cpu(((__be16 *)gid_raw)[1]),
		be16_to_cpu(((__be16 *)gid_raw)[2]),
		be16_to_cpu(((__be16 *)gid_raw)[3]),
		be16_to_cpu(((__be16 *)gid_raw)[4]),
		be16_to_cpu(((__be16 *)gid_raw)[5]),
		be16_to_cpu(((__be16 *)gid_raw)[6]),
		be16_to_cpu(((__be16 *)gid_raw)[7]));
}

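smc_gid_be16_convert() reinterprets the 16-byte GID as eight __be16 values and prints each one in host order. The user-space sketch below produces the same text by building each 16-bit group from two bytes, most significant first. This avoids both the cast and be16_to_cpu(). sketch_gid_to_str() and SKETCH_GID_STR_LEN are hypothetical names. The output needs 8 * 4 hex digits, 7 colons and a terminating NUL, which is 40 bytes.

```c
#include <stdint.h>
#include <stdio.h>

#define SKETCH_GID_STR_LEN 40 /* 32 hex digits + 7 colons + NUL */

/* Hypothetical helper: format a raw 16-byte GID as eight colon-separated,
 * zero-padded hex groups, reading each group in big-endian byte order. */
static void sketch_gid_to_str(char buf[SKETCH_GID_STR_LEN],
			      const uint8_t gid_raw[16])
{
	int off = 0;

	for (int i = 0; i < 8; i++) {
		unsigned int group = ((unsigned int)gid_raw[2 * i] << 8) |
				     gid_raw[2 * i + 1];

		off += snprintf(buf + off, (size_t)(SKETCH_GID_STR_LEN - off),
				i ? ":%04x" : "%04x", group);
	}
}
```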
2020-12-01 20:20:48 +01:00
struct smc_pci_dev {
	__u32		pci_fid;
	__u16		pci_pchid;
	__u16		pci_vendor;
	__u16		pci_device;
	__u8		pci_id[SMC_PCI_ID_STR_LEN];
};

static inline void smc_set_pci_values(struct pci_dev *pci_dev,
				      struct smc_pci_dev *smc_dev)
{
	smc_dev->pci_vendor = pci_dev->vendor;
	smc_dev->pci_device = pci_dev->device;
	snprintf(smc_dev->pci_id, sizeof(smc_dev->pci_id), "%s",
		 pci_name(pci_dev));
#if IS_ENABLED(CONFIG_S390)
	{ /* Set s390 specific PCI information */
	struct zpci_dev *zdev;

	zdev = to_zpci(pci_dev);
	smc_dev->pci_fid = zdev->fid;
	smc_dev->pci_pchid = zdev->pchid;
	}
#endif
}

2017-01-09 16:55:18 +01:00
struct smc_sock;
struct smc_clc_msg_accept_confirm;

2022-01-06 20:42:08 +08:00
void smc_lgr_cleanup_early(struct smc_link_group *lgr);
2020-02-17 16:24:54 +01:00
void smc_lgr_terminate_sched(struct smc_link_group *lgr);
2022-01-13 16:36:40 +08:00
void smc_lgr_hold(struct smc_link_group *lgr);
void smc_lgr_put(struct smc_link_group *lgr);
2020-05-01 12:48:07 +02:00
void smcr_port_add(struct smc_ib_device *smcibdev, u8 ibport);
2020-05-01 12:48:08 +02:00
void smcr_port_err(struct smc_ib_device *smcibdev, u8 ibport);
2018-11-20 16:46:41 +01:00
void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid,
			unsigned short vlan);
2019-11-14 13:02:42 +01:00
void smc_smcd_terminate_all(struct smcd_dev *dev);
2019-11-14 13:02:47 +01:00
void smc_smcr_terminate_all(struct smc_ib_device *smcibdev);
2018-06-28 19:05:07 +02:00
int smc_buf_create(struct smc_sock *smc, bool is_smcd);
2018-05-18 09:34:14 +02:00
int smc_uncompress_bufsize(u8 compressed);
2020-04-29 17:10:42 +02:00
int smc_rmb_rtoken_handling(struct smc_connection *conn, struct smc_link *link,
2017-01-09 16:55:20 +01:00
			    struct smc_clc_msg_accept_confirm *clc);
2020-04-29 17:10:40 +02:00
int smc_rtoken_add(struct smc_link *lnk, __be64 nw_vaddr, __be32 nw_rkey);
int smc_rtoken_delete(struct smc_link *lnk, __be32 nw_rkey);
2020-04-30 15:55:47 +02:00
void smc_rtoken_set(struct smc_link_group *lgr, int link_idx, int link_idx_new,
		    __be32 nw_rkey_known, __be64 nw_vaddr, __be32 nw_rkey);
void smc_rtoken_set2(struct smc_link_group *lgr, int rtok_idx, int link_id,
		     __be64 nw_vaddr, __be32 nw_rkey);
2017-07-28 13:56:22 +02:00
void smc_sndbuf_sync_sg_for_device(struct smc_connection *conn);
void smc_rmb_sync_sg_for_cpu(struct smc_connection *conn);
2019-04-12 12:57:26 +02:00
int smc_vlan_by_tcpsk(struct socket *clcsock, struct smc_init_info *ini);
2018-06-28 19:05:07 +02:00

2018-05-18 09:34:14 +02:00
void smc_conn_free(struct smc_connection *conn);
2019-04-12 12:57:26 +02:00
int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini);
2018-07-25 16:35:33 +02:00
void smc_lgr_schedule_free_work_fast(struct smc_link_group *lgr);
2019-11-16 17:47:29 +01:00
int smc_core_init(void);
2018-05-18 09:34:11 +02:00
void smc_core_exit(void);
2018-07-23 13:53:10 +02:00

2020-05-03 14:38:40 +02:00
int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
		   u8 link_idx, struct smc_init_info *ini);
2020-05-05 15:01:20 +02:00
void smcr_link_clear(struct smc_link *lnk, bool log);
2022-01-13 16:36:42 +08:00
void smcr_link_hold(struct smc_link *lnk);
void smcr_link_put(struct smc_link *lnk);
2021-08-09 11:05:57 +02:00
void smc_switch_link_and_count(struct smc_connection *conn,
			       struct smc_link *to_lnk);
2020-05-01 12:48:03 +02:00
int smcr_buf_map_lgr(struct smc_link *lnk);
int smcr_buf_reg_lgr(struct smc_link *lnk);
2020-05-04 14:18:44 +02:00
void smcr_lgr_set_type(struct smc_link_group *lgr, enum smc_lgr_type new_type);
void smcr_lgr_set_type_asym(struct smc_link_group *lgr,
			    enum smc_lgr_type new_type, int asym_lnk_idx);
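smc_lgr_hold() and smc_lgr_put() pair with the refcnt field of struct smc_link_group. The minimal user-space sketch below shows the usual hold/put pattern, assuming the last put releases the object. It uses C11 atomics instead of the kernel refcount_t, which additionally saturates rather than wrapping on overflow. struct sketch_lgr, its freed_flag member and the helpers are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct sketch_lgr {
	atomic_int refcnt;
	bool *freed_flag; /* lets callers observe the release in this sketch */
};

static struct sketch_lgr *sketch_lgr_alloc(bool *freed_flag)
{
	struct sketch_lgr *lgr = malloc(sizeof(*lgr));

	if (!lgr)
		return NULL;
	atomic_init(&lgr->refcnt, 1); /* the creator's reference */
	lgr->freed_flag = freed_flag;
	*freed_flag = false;
	return lgr;
}

/* Take an additional reference. */
static void sketch_lgr_hold(struct sketch_lgr *lgr)
{
	atomic_fetch_add(&lgr->refcnt, 1);
}

/* Drop a reference; whoever drops the last one frees the object. */
static void sketch_lgr_put(struct sketch_lgr *lgr)
{
	if (atomic_fetch_sub(&lgr->refcnt, 1) == 1) {
		*lgr->freed_flag = true;
		free(lgr);
	}
}
```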
net/smc: Allow virtually contiguous sndbufs or RMBs for SMC-R
On long-running enterprise production servers, high-order contiguous
memory pages are usually very rare and in most cases we can only get
fragmented pages.
When replacing TCP with SMC-R in such production scenarios, attempting
to allocate high-order physically contiguous sndbufs and RMBs may result
in frequent memory compaction, which can cause unexpected hangs
and further stability risks.
This patch therefore allows SMC-R link groups to use virtually
contiguous sndbufs and RMBs to avoid the issues mentioned above.
Whether to use physically or virtually contiguous buffers can be set
by sysctl smcr_buf_type.
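The trade-off behind smcr_buf_type can be pictured as a small allocation policy. The sketch below is a userspace model, not the kernel code: the enum names, the `high_order_ok` flag, and `demo_alloc_phys()` (standing in for a high-order physically contiguous allocation that can fail on a fragmented system) are all assumptions made for illustration, while `demo_alloc_virt()` stands in for a buffer assembled from scattered pages:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical values for the smcr_buf_type setting. */
enum demo_buf_type {
	DEMO_PHYS_CONT_BUFS = 0,	/* physically contiguous buffers */
	DEMO_VIRT_CONT_BUFS = 1,	/* virtually contiguous buffers */
};

struct demo_buf {
	void *cpu_addr;
	bool is_virt;	/* true if the buffer is only virtually contiguous */
};

/* Model of a high-order allocation: fails when memory is too fragmented. */
static void *demo_alloc_phys(size_t len, bool high_order_ok)
{
	return high_order_ok ? calloc(1, len) : NULL;
}

/* Model of a virtually contiguous allocation built from order-0 pages. */
static void *demo_alloc_virt(size_t len)
{
	return calloc(1, len);
}

/* Allocate a sndbuf/RMB according to the configured buffer type. */
bool demo_buf_create(struct demo_buf *buf, enum demo_buf_type type,
		     size_t len, bool high_order_ok)
{
	assert(len > 0);
	if (type == DEMO_VIRT_CONT_BUFS) {
		buf->cpu_addr = demo_alloc_virt(len);
		buf->is_virt = true;
	} else {
		buf->cpu_addr = demo_alloc_phys(len, high_order_ok);
		buf->is_virt = false;
	}
	return buf->cpu_addr != NULL;
}
```

In this model, with `high_order_ok` false (a fragmented system), only the virtually contiguous setting still yields a 256KB buffer, which is the situation the patch targets.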
Note that using virtually contiguous buffers introduces an acceptable
performance regression, which falls mainly into two parts:
1) A regression in the data path, caused by the additional address
translation of the sndbuf by the RNIC on Tx. In general, however,
address translation through the MTT is fast.
Taking a 256KB sndbuf and RMB as an example, the qperf latency and
bandwidth results with physically and virtually contiguous buffers are
as follows:
- client:
smc_run taskset -c <cpu> qperf <server> -oo msg_size:1:64K:*2\
-t 5 -vu tcp_{bw|lat}
- server:
smc_run taskset -c <cpu> qperf
[latency]
msgsize tcp        smcr       smcr-use-virt-buf
1       11.17 us   7.56 us    7.51 us   (-0.67%)
2       10.65 us   7.74 us    7.56 us   (-2.31%)
4       11.11 us   7.52 us    7.59 us   ( 0.84%)
8       10.83 us   7.55 us    7.51 us   (-0.48%)
16      11.21 us   7.46 us    7.51 us   ( 0.71%)
32      10.65 us   7.53 us    7.58 us   ( 0.61%)
64      10.95 us   7.74 us    7.80 us   ( 0.76%)
128     11.14 us   7.83 us    7.87 us   ( 0.47%)
256     10.97 us   7.94 us    7.92 us   (-0.28%)
512     11.23 us   7.94 us    8.20 us   ( 3.25%)
1024    11.60 us   8.12 us    8.20 us   ( 0.96%)
2048    14.04 us   8.30 us    8.51 us   ( 2.49%)
4096    16.88 us   9.13 us    9.07 us   (-0.64%)
8192    22.50 us   10.56 us   11.22 us  ( 6.26%)
16384   28.99 us   12.88 us   13.83 us  ( 7.37%)
32768   40.13 us   16.76 us   16.95 us  ( 1.16%)
65536   68.70 us   24.68 us   24.85 us  ( 0.68%)
[bandwidth]
msgsize tcp           smcr          smcr-use-virt-buf
1       1.65 MB/s     1.59 MB/s     1.53 MB/s     (-3.88%)
2       3.32 MB/s     3.17 MB/s     3.08 MB/s     (-2.67%)
4       6.66 MB/s     6.33 MB/s     6.09 MB/s     (-3.85%)
8       13.67 MB/s    13.45 MB/s    11.97 MB/s    (-10.99%)
16      25.36 MB/s    27.15 MB/s    24.16 MB/s    (-11.01%)
32      48.22 MB/s    54.24 MB/s    49.41 MB/s    (-8.89%)
64      106.79 MB/s   107.32 MB/s   99.05 MB/s    (-7.71%)
128     210.21 MB/s   202.46 MB/s   201.02 MB/s   (-0.71%)
256     400.81 MB/s   416.81 MB/s   393.52 MB/s   (-5.59%)
512     746.49 MB/s   834.12 MB/s   809.99 MB/s   (-2.89%)
1024    1292.33 MB/s  1641.96 MB/s  1571.82 MB/s  (-4.27%)
2048    2007.64 MB/s  2760.44 MB/s  2717.68 MB/s  (-1.55%)
4096    2665.17 MB/s  4157.44 MB/s  4070.76 MB/s  (-2.09%)
8192    3159.72 MB/s  4361.57 MB/s  4270.65 MB/s  (-2.08%)
16384   4186.70 MB/s  4574.13 MB/s  4501.17 MB/s  (-1.60%)
32768   4093.21 MB/s  4487.42 MB/s  4322.43 MB/s  (-3.68%)
65536   4057.14 MB/s  4735.61 MB/s  4555.17 MB/s  (-3.81%)
2) A regression in the buffer initialization and destruction paths,
caused by the additional MR operations on sndbufs. Thanks to the link
group buffer reuse mechanism, however, the impact of this regression
decreases the more often buffers are reused.
Taking a 256KB sndbuf and RMB as an example, the latencies of some key
SMC-R buffer-related functions, measured with bpftrace, are as follows:
Function                    Phys-bufs   Virt-bufs
smcr_new_buf_create()       67154 ns    79164 ns
smc_ib_buf_map_sg()         525 ns      928 ns
smc_ib_get_memory_region()  162294 ns   161191 ns
smc_wr_reg_send()           9957 ns     9635 ns
smc_ib_put_memory_region()  203548 ns   198374 ns
smc_ib_buf_unmap_sg()       508 ns      1158 ns
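The parenthesized percentages in the qperf tables are consistent with the relative change of smcr-use-virt-buf against smcr, (virt - phys) / phys * 100. Recomputing them from the rounded table values matches only approximately, most visibly in rows with small absolute values, presumably because the originals were computed from unrounded measurements. The second helper models the reuse argument under the assumption that the extra creation cost is paid once per buffer and then spread over every use of that buffer. Both function names are hypothetical:

```c
#include <assert.h>

/* Relative change of `virt` against the baseline `phys`, in percent. */
double demo_rel_change_pct(double phys, double virt)
{
	assert(phys != 0.0);	/* the baseline must be non-zero */
	return (virt - phys) / phys * 100.0;
}

/*
 * Spread a one-time extra cost (e.g. slower buffer creation) over the
 * number of times the buffer is used before it is freed. A count of 0
 * is treated as a single use.
 */
double demo_amortized_ns(double extra_ns, unsigned int uses)
{
	return extra_ns / (uses ? uses : 1u);
}
```

With the table values, demo_rel_change_pct(12.88, 13.83) gives about 7.38 for the 16384-byte latency row, close to the listed 7.37%, and spreading the roughly 12 us extra of smcr_new_buf_create() (79164 ns vs 67154 ns) over 100 uses of one buffer leaves about 120 ns per use.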
------------
Test environment notes:
1. The above tests were run on 2 VMs on the same host.
2. The NIC is a ConnectX-4Lx using SR-IOV, with 2 VFs passed through,
one to each VM.
3. The VMs' vCPUs are bound to different physical CPUs, and those
physical CPUs are isolated with the `isolcpus=xxx` kernel command line.
4. The NICs' queue count is set to 1.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-14 17:44:04 +08:00
int smcr_link_reg_buf(struct smc_link *link, struct smc_buf_desc *rmb_desc);
2020-05-04 14:18:38 +02:00
struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
                                  struct smc_link *from_lnk, bool is_dev_err);
2020-05-01 12:48:08 +02:00
void smcr_link_down_cond(struct smc_link *lnk);
void smcr_link_down_cond_sched(struct smc_link *lnk);
2020-12-01 20:20:44 +01:00
int smc_nl_get_sys_info(struct sk_buff *skb, struct netlink_callback *cb);
2020-12-01 20:20:45 +01:00
int smcr_nl_get_lgr(struct sk_buff *skb, struct netlink_callback *cb);
2020-12-01 20:20:46 +01:00
int smcr_nl_get_link(struct sk_buff *skb, struct netlink_callback *cb);
2020-12-01 20:20:47 +01:00
int smcd_nl_get_lgr(struct sk_buff *skb, struct netlink_callback *cb);
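The smc_nl_get_*() and smcd_nl_get_lgr() declarations have the shape of netlink dump callbacks: the kernel calls such a callback repeatedly, each time with a fresh message buffer, until it returns 0, and the callback keeps its resume position in the `struct netlink_callback` (for example in its `args[]` array). A userspace model of that resume-from-cursor pattern, with hypothetical `demo_cb`/`demo_skb` types and a fixed per-message capacity standing in for the real skb size limit:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SKB_CAP 2	/* hypothetical: entries that fit in one message */

struct demo_cb {
	long args[1];	/* resume cursor, modeled on cb->args[0] */
};

struct demo_skb {
	int entries[DEMO_SKB_CAP];
	int len;	/* entries written to this message */
};

/*
 * Emit items from the saved cursor until the message is full. Returns the
 * number of entries written; 0 means the dump is complete (a real dump
 * callback returns skb->len in bytes instead).
 */
int demo_dump(const int *items, size_t n_items,
	      struct demo_skb *skb, struct demo_cb *cb)
{
	assert(cb->args[0] >= 0);
	size_t idx = (size_t)cb->args[0];

	skb->len = 0;
	while (idx < n_items && skb->len < DEMO_SKB_CAP)
		skb->entries[skb->len++] = items[idx++];
	cb->args[0] = (long)idx;	/* where the next call resumes */
	return skb->len;
}
```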
2020-05-01 12:48:08 +02:00
2018-07-23 13:53:10 +02:00
static inline struct smc_link_group *smc_get_lgr(struct smc_link *link)
{
2020-04-29 17:10:40 +02:00
return link->lgr;
2018-07-23 13:53:10 +02:00
}
2017-01-09 16:55:17 +01:00
#endif