mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-01-12 16:19:53 +00:00
9c1e67f941
Added documentation for v1 and v2 version describing high level design and usage examples on using rdma controller. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
110 lines
4.3 KiB
Plaintext
RDMA Controller
----------------

Contents
--------

1. Overview
  1-1. What is RDMA controller?
  1-2. Why is RDMA controller needed?
  1-3. How is RDMA controller implemented?
2. Usage Examples

1. Overview

1-1. What is RDMA controller?
-----------------------------

The RDMA controller allows the user to limit the RDMA/IB-specific resources
that a given set of processes can use. These processes are grouped using the
RDMA controller.

The RDMA controller defines two resources, hca_handle and hca_object, which
can be limited for the processes of a cgroup.

1-2. Why is RDMA controller needed?
-----------------------------------

Currently, user space applications can easily consume all the RDMA verb
specific resources such as AH, CQ, QP, MR etc. As a result, other applications
in other cgroups, or kernel space ULPs, may not even get a chance to allocate
any RDMA resources. This can lead to service unavailability.

Therefore, an RDMA controller is needed through which the resource consumption
of processes can be limited. Through this controller, different RDMA
resources can be accounted.

1-3. How is RDMA controller implemented?
----------------------------------------

The RDMA cgroup allows limits to be configured for resources. It maintains
resource accounting per cgroup, per device, using a resource pool structure.
Each resource pool can currently track up to 64 resources, a limit which can
be extended later if required.

This resource pool object is linked to the cgroup css. Typically there
are 0 to 4 resource pool instances per cgroup, per device in most use cases,
but nothing prevents having more. At present, hundreds of RDMA devices per
single cgroup may not be handled optimally; however, there is no
known use case or requirement for such a configuration either.

Since RDMA resources can be allocated by any process and can be freed by any
of the child processes which share the address space, RDMA resources are
always owned by the creator cgroup css. This allows process migration from one
cgroup to another without the major complexity of transferring resource
ownership, because such ownership is not really present due to the shared
nature of RDMA resources. Linking resources around the css also ensures that
cgroups can be deleted after processes have migrated. This allows process
migration with active resources as well, even though that is not a primary
use case.

Whenever RDMA resource charging occurs, the owner rdma cgroup is returned to
the caller. The same rdma cgroup should be passed when uncharging the resource.
This also allows a process that migrated with active RDMA resources to charge
new resources to its new owner cgroup. It also allows the resources of a
process that has migrated to a new cgroup to be uncharged from the previously
charged cgroup, even though that is not a primary use case.

A resource pool object is created in the following situations:
(a) The user sets a limit and no previous resource pool exists for the device
of interest for the cgroup.
(b) No resource limits were configured, but the IB/RDMA stack tries to
charge a resource. The pool is created so that resources charged while
applications run without limits are still uncharged correctly if limits are
enforced later; otherwise the usage count would drop below zero.

A resource pool is destroyed when all of its resource limits are set to max
and the last resource charged to it is deallocated.

The user should set all the limits to max if the intent is to
remove/unconfigure the resource pool for a particular device.

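The removal condition above can be checked from user space. The following is
a minimal sketch, not part of the kernel tree: rdma_pool_releasable is a
hypothetical helper name, and it assumes the rdma.max and rdma.current file
formats shown in section 2 (one line per device, key=value pairs), with
rdma.current reporting 0 for a resource that has nothing charged.

```shell
# Sketch only: rdma_pool_releasable is a hypothetical helper, not a kernel tool.
# Usage: rdma_pool_releasable <rdma.max file> <rdma.current file> <device>
# Succeeds (exit status 0) when every limit for <device> is "max" and no
# resource is currently charged, i.e. the documented removal condition holds.
rdma_pool_releasable() {
    awk -v dev="$3" '
        $1 != dev { next }
        {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                # In rdma.max, any value other than "max" is a configured limit.
                if (FILENAME == ARGV[1] && kv[2] != "max") busy = 1
                # In rdma.current, any nonzero value is a charged resource.
                if (FILENAME == ARGV[2] && kv[2] + 0 != 0) busy = 1
            }
        }
        END { exit busy + 0 }
    ' "$1" "$2"
}
```

For example, rdma_pool_releasable /sys/fs/cgroup/rdma/1/rdma.max
/sys/fs/cgroup/rdma/1/rdma.current mlx4_0 would succeed only after all limits
were reset to max and all resources were freed.
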
The IB stack honors limits enforced by the rdma controller. When an
application queries the maximum resource limits of an IB device, the stack
returns the minimum of what is configured by the user for the given cgroup
and what is supported by the IB device.

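The "minimum of the two" rule can be sketched as follows. This is only an
illustration of the arithmetic, not the kernel implementation:
rdma_effective_limit is a hypothetical helper name, and it assumes a cgroup
limit of "max" means the cgroup imposes no limit of its own.

```shell
# Sketch only: rdma_effective_limit is a hypothetical helper.
# Usage: rdma_effective_limit <cgroup limit | max> <device limit>
rdma_effective_limit() {
    cgroup_limit=$1   # value configured in rdma.max: a number or "max"
    device_limit=$2   # what the IB device itself supports: a number
    # "max" or a limit above the device capability: the device value wins.
    if [ "$cgroup_limit" = "max" ] || [ "$cgroup_limit" -gt "$device_limit" ]; then
        echo "$device_limit"
    else
        echo "$cgroup_limit"
    fi
}
```

With a configured limit of 2000 and a device that supports 65536 objects, the
helper prints 2000; with a limit of max it prints 65536.
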
The following resources can be accounted by the rdma controller:
  hca_handle    Maximum number of HCA Handles
  hca_object    Maximum number of HCA Objects

2. Usage Examples
-----------------

(a) Configure resource limit:
echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max

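When limits are written from scripts, it can help to validate the resource
names before touching rdma.max. The following is a minimal sketch under that
assumption: rdma_set_limit is a hypothetical wrapper, not an existing tool,
and it only accepts the two resource names listed in section 1-3.

```shell
# Sketch only: rdma_set_limit is a hypothetical helper, not a kernel tool.
# Usage: rdma_set_limit <cgroup dir> <device> <resource=value>...
rdma_set_limit() {
    [ "$#" -ge 3 ] || { echo "usage: rdma_set_limit DIR DEVICE RES=VAL..." >&2; return 1; }
    cgroup_dir=$1
    device=$2
    shift 2
    for setting in "$@"; do
        case "$setting" in
            hca_handle=max|hca_object=max) ;;   # "max" removes the limit
            hca_handle=*|hca_object=*)
                value=${setting#*=}
                case "$value" in
                    ''|*[!0-9]*) echo "bad value: $setting" >&2; return 1 ;;
                esac
                ;;
            *) echo "unknown resource: $setting" >&2; return 1 ;;
        esac
    done
    # One write per device, with all settings on a single line.
    echo "$device $*" > "$cgroup_dir/rdma.max"
}
```

The first command of example (a) would then read:
rdma_set_limit /sys/fs/cgroup/rdma/1 mlx4_0 hca_handle=2 hca_object=2000
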
(b) Query resource limit:
cat /sys/fs/cgroup/rdma/2/rdma.max
#Output:
mlx4_0 hca_handle=2 hca_object=2000
ocrdma1 hca_handle=3 hca_object=max

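To pick a single value out of this output, the lines can be parsed with awk.
The following is a minimal sketch assuming the one-line-per-device, key=value
format shown above; rdma_get_limit is a hypothetical helper name.

```shell
# Sketch only: rdma_get_limit is a hypothetical helper, not a kernel tool.
# Usage: rdma_get_limit <rdma.max file> <device> <resource>
# Prints the configured value (a number or "max"); fails if it is not found.
rdma_get_limit() {
    awk -v dev="$2" -v res="$3" '
        $1 == dev {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")          # kv[1] = resource, kv[2] = value
                if (kv[1] == res) { print kv[2]; found = 1 }
            }
        }
        END { exit !found }
    ' "$1"
}
```

Against the output above, rdma_get_limit /sys/fs/cgroup/rdma/2/rdma.max
mlx4_0 hca_object prints 2000.
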
(c) Query current usage:
cat /sys/fs/cgroup/rdma/2/rdma.current
#Output:
mlx4_0 hca_handle=1 hca_object=20
ocrdma1 hca_handle=1 hca_object=23

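Combining rdma.max and rdma.current gives the remaining headroom for a
resource. The following is a minimal sketch, not an existing tool:
rdma_headroom is a hypothetical helper name, and treating a missing entry as
"no limit" or "no usage" is an assumption made for this example.

```shell
# Sketch only: rdma_headroom is a hypothetical helper, not a kernel tool.
# Usage: rdma_headroom <rdma.max file> <rdma.current file> <device> <resource>
# Prints limit minus usage, or "max" when no numeric limit is configured.
rdma_headroom() {
    awk -v dev="$3" -v res="$4" '
        # Return the value of resource "res" on the current line, or "".
        function lookup(    i, kv) {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == res) return kv[2]
            }
            return ""
        }
        $1 != dev { next }
        FILENAME == ARGV[1] { limit = lookup(); next }   # rdma.max
        { usage = lookup() }                              # rdma.current
        END {
            # Assumption: a missing entry means "no limit" or "no usage".
            if (limit == "" || limit == "max") print "max"
            else print limit - usage
        }
    ' "$1" "$2"
}
```

With the limits from example (b) and the usage above, the headroom for
mlx4_0 hca_object is 2000 - 20 = 1980, and for ocrdma1 hca_object it is max.
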
(d) Delete resource limit:
echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max