linux-stable/include/rdma
Yamin Friedman c7ff819aef RDMA/core: Introduce shared CQ pool API
Allow a ULP to ask the core for a completion queue (CQ), chosen by a
least-used search over per-device CQ pools. The device CQ pools grow
lazily as more CQs are requested.
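
The least-used search and lazy growth described above can be sketched as
a small user-space model. This is only an illustration, not the kernel
implementation: the struct and function names, the fixed CQ size, and the
pool cap are all hypothetical, and the model ignores locking, completion
vectors, and poll contexts. In the kernel, the corresponding entry points
are ib_cq_pool_get() and ib_cq_pool_put().

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_POOL_MAX_CQS 64   /* hypothetical cap on CQs in the pool */
#define MODEL_CQ_SIZE      1024 /* hypothetical number of CQEs per CQ  */

/* One shared CQ: total capacity and how many CQEs users have claimed. */
struct model_cq {
	unsigned int capacity;
	unsigned int used;
};

/* Per-device pool; starts empty and grows only on demand. */
struct model_cq_pool {
	struct model_cq cqs[MODEL_POOL_MAX_CQS];
	size_t nr_cqs;
};

/* Return the least-used CQ with room for nr_cqe entries, or NULL. */
struct model_cq *model_pool_find(struct model_cq_pool *pool,
				 unsigned int nr_cqe)
{
	struct model_cq *best = NULL;
	size_t i;

	for (i = 0; i < pool->nr_cqs; i++) {
		struct model_cq *cq = &pool->cqs[i];

		if (cq->capacity - cq->used < nr_cqe)
			continue; /* not enough free entries */
		if (!best || cq->used < best->used)
			best = cq;
	}
	return best;
}

/* Claim nr_cqe entries on a shared CQ, adding a CQ to the pool if needed. */
struct model_cq *model_pool_get(struct model_cq_pool *pool,
				unsigned int nr_cqe)
{
	struct model_cq *cq;

	if (nr_cqe > MODEL_CQ_SIZE)
		return NULL; /* request can never fit in a pool CQ */

	cq = model_pool_find(pool, nr_cqe);
	if (!cq) {
		if (pool->nr_cqs == MODEL_POOL_MAX_CQS)
			return NULL; /* pool is full */
		/* Lazy growth: create a CQ only when no existing one fits. */
		cq = &pool->cqs[pool->nr_cqs++];
		cq->capacity = MODEL_CQ_SIZE;
		cq->used = 0;
	}
	cq->used += nr_cqe;
	return cq;
}

/* Release entries previously claimed with model_pool_get(). */
void model_pool_put(struct model_cq *cq, unsigned int nr_cqe)
{
	assert(nr_cqe <= cq->used);
	cq->used -= nr_cqe;
}
```

In this model, two 600-entry requests end up on two different CQs because
the second no longer fits on the first, while later small requests are
spread across whichever CQ currently has the fewest claimed entries.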

This feature reduces the number of interrupts when using many QPs. Using
shared CQs allows for more efficient completion handling. It also reduces
the overhead needed for CQ contexts.

Test setup:
Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz servers.
Running NVMeoF 4KB read IOs over ConnectX-5EX across a Spectrum switch.
TX-depth = 32. The patch was applied in the nvme driver on both the target
and initiator. Four controllers are accessed from each core. In the
current test case we have exposed sixteen NVMe namespaces using four
different subsystems (four namespaces per subsystem) from one NVM port.
Each controller allocated X queues (RDMA QPs) and attached them to Y CQs.
Before this series we had X == Y, i.e., for four controllers we created a
total of 4X QPs and 4X CQs. In the shared case, we created 4X QPs and
only X CQs, which means that four controllers share one completion queue
per core. Below fourteen cores there is no significant change in
performance, and the number of interrupts per second is less than a
million in the current case.
==================================================
|Cores|Current KIOPs  |Shared KIOPs  |improvement|
|-----|---------------|--------------|-----------|
|14   |2332           |2723          |16.7%      |
|-----|---------------|--------------|-----------|
|20   |2086           |2712          |30%        |
|-----|---------------|--------------|-----------|
|28   |1971           |2669          |35.4%      |
==================================================
|Cores|Current avg lat|Shared avg lat|improvement|
|-----|---------------|--------------|-----------|
|14   |767us          |657us         |14.3%      |
|-----|---------------|--------------|-----------|
|20   |1225us         |943us         |23%        |
|-----|---------------|--------------|-----------|
|28   |1816us         |1341us        |26.1%      |
========================================================
|Cores|Current interrupts|Shared interrupts|improvement|
|-----|------------------|-----------------|-----------|
|14   |1.6M/sec          |0.4M/sec         |72%        |
|-----|------------------|-----------------|-----------|
|20   |2.8M/sec          |0.6M/sec         |72.4%      |
|-----|------------------|-----------------|-----------|
|28   |2.9M/sec          |0.8M/sec         |63.4%      |
====================================================================
|Cores|Current 99.99th PCTL lat|Shared 99.99th PCTL lat|improvement|
|-----|------------------------|-----------------------|-----------|
|14   |67ms                    |6ms                    |90.9%      |
|-----|------------------------|-----------------------|-----------|
|20   |5ms                     |6ms                    |-10%       |
|-----|------------------------|-----------------------|-----------|
|28   |8.7ms                   |6ms                    |25.9%      |
====================================================================
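
The KIOPs and average-latency "improvement" columns appear consistent with
a plain relative change against the current (non-shared) value, with the
sign flipped for metrics where lower is better. A minimal sketch of that
arithmetic follows; the function names are hypothetical, and this is an
assumption about how the columns were derived rather than something the
commit states. The interrupt and tail-latency rows print rounded inputs,
so recomputing them from the table will not match exactly.

```c
#include <assert.h>

/* Relative gain in percent for higher-is-better metrics such as KIOPs. */
double gain_percent(double current, double shared)
{
	assert(current > 0.0);
	return (shared - current) / current * 100.0;
}

/* Relative reduction in percent for lower-is-better metrics (latency). */
double reduction_percent(double current, double shared)
{
	assert(current > 0.0);
	return (current - shared) / current * 100.0;
}
```

For example, gain_percent(2332, 2723) lands between 16.7 and 16.8, and
reduction_percent(767, 657) between 14.3 and 14.4, in line with the
14-core rows.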

Performance improvement with sixteen disks (sixteen CQs per core) is
comparable.

Link: https://lore.kernel.org/r/1590568495-101621-3-git-send-email-yaminf@mellanox.com
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-29 16:09:02 -03:00
ib_addr.h RDMA/core: Annotate timeout as unsigned long 2018-10-16 13:34:01 -04:00
ib_cache.h RDMA/core: Add helper function to retrieve driver gid context from gid attr 2020-02-19 16:54:25 -04:00
ib_cm.h RDMA/cm: Send and receive ECE parameter over the wire 2020-05-27 16:05:05 -03:00
ib_fmr_pool.h RDMA: Replace zero-length array with flexible-array member 2020-02-20 13:33:51 -04:00
ib_hdrs.h IB/hfi1: Build TID RDMA WRITE request 2019-02-05 18:07:43 -05:00
ib_mad.h RDMA/mad: Remove snoop interface 2020-05-06 11:50:22 -03:00
ib_marshall.h IB/core: Convert ah_attr from OPA to IB when copying to user 2017-08-08 14:47:18 -04:00
ib_pack.h IB/core: Fix calculation of maximum RoCE MTU 2017-10-18 12:11:36 -04:00
ib_pma.h IB/core: Display extended counter set if available 2015-12-23 15:58:30 -05:00
ib_sa.h RDMA/core: Annotate timeout as unsigned long 2018-10-16 13:34:01 -04:00
ib_smi.h RDMA: Use __packed annotation instead of __attribute__ ((packed)) 2019-03-25 21:14:12 -03:00
ib_umem_odp.h IB: Allow calls to ib_umem_get from kernel ULPs 2020-01-16 16:14:28 +02:00
ib_umem.h IB: Allow calls to ib_umem_get from kernel ULPs 2020-01-16 16:14:28 +02:00
ib_verbs.h RDMA/core: Introduce shared CQ pool API 2020-05-29 16:09:02 -03:00
ib.h RDMA: Make most headers compile stand alone 2019-07-25 13:58:47 -03:00
iba.h RDMA/cm: Add SET/GET implementations to hide IBA wire format 2020-01-25 15:05:59 -04:00
ibta_vol1_c12.h RDMA/cm: Add Enhanced Connection Establishment (ECE) bits 2020-05-27 16:05:05 -03:00
iw_cm.h RDMA: Get rid of iw_cm_verbs 2019-05-03 10:56:56 -03:00
iw_portmap.h RDMA: Make most headers compile stand alone 2019-07-25 13:58:47 -03:00
lag.h RDMA/core: Add LAG functionality 2020-05-02 20:19:54 -03:00
mr_pool.h Linux 5.2-rc6 2019-06-28 21:18:23 -03:00
opa_addr.h include/rdma/opa_addr.h: Fix an endianness issue 2018-07-03 14:11:34 -06:00
opa_port_info.h IB/ipoib: Increase ipoib Datagram mode MTU's upper limit 2020-05-21 11:23:55 -03:00
opa_smi.h RDMA: Use __packed annotation instead of __attribute__ ((packed)) 2019-03-25 21:14:12 -03:00
opa_vnic.h IB/{rdmavt, hfi1}: Implement creation of accelerated UD QPs 2020-05-21 11:23:54 -03:00
rdma_cm_ib.h RDMA/{cma, ucma}: Simplify and rename rdma_set_ib_paths 2018-01-10 22:00:33 -07:00
rdma_cm.h RDMA/cma: Provide ECE reject reason 2020-05-27 16:05:05 -03:00
rdma_counter.h RDMA/core: Make rdma_counter.h compile stand alone 2019-07-09 09:44:47 -03:00
rdma_netlink.h RDMA/core: Support netlink commands in non init_net net namespaces 2019-07-25 14:12:41 -03:00
rdma_vt.h IB/{rdmavt, hfi1, qib}: Add a counter for credit waits 2019-09-13 16:59:55 -03:00
rdmavt_cq.h RDMA: Make most headers compile stand alone 2019-07-25 13:58:47 -03:00
rdmavt_mr.h RDMA: Replace zero-length array with flexible-array member 2020-02-20 13:33:51 -04:00
rdmavt_qp.h IB/hfi1: Remove module parameter for KDETH qpns 2020-05-21 11:23:54 -03:00
restrack.h RDMA/nldev: Provide MR statistics 2019-10-22 15:33:31 -03:00
rw.h Linux 5.2-rc6 2019-06-28 21:18:23 -03:00
signature.h RDMA: Make most headers compile stand alone 2019-07-25 13:58:47 -03:00
tid_rdma_defs.h IB/hfi1: Build TID RDMA WRITE request 2019-02-05 18:07:43 -05:00
uverbs_ioctl.h RDMA/core: Use offsetofend() instead of open coding 2020-05-29 15:27:04 -03:00
uverbs_named_ioctl.h RDMA/core: Make the entire API tree static 2020-01-30 16:28:52 -04:00
uverbs_std_types.h RDMA/core: Allow the ioctl layer to abort a fully created uobject 2020-05-21 20:10:46 -03:00
uverbs_types.h RDMA/core: Allow the ioctl layer to abort a fully created uobject 2020-05-21 20:10:46 -03:00