iommu: IOMMU Groups
IOMMU device groups are currently a rather vague associative notion
with assembly required by the user or user level driver provider to
do anything useful. This patch intends to grow the IOMMU group concept
into something a bit more consumable.
To do this, we first create an object representing the group, struct
iommu_group. This structure is allocated (iommu_group_alloc) and
filled (iommu_group_add_device) by the iommu driver. The iommu driver
is free to add devices to the group using its own set of policies.
This allows inclusion of devices based on physical hardware or topology
limitations of the platform, as well as soft requirements, such as
multi-function trust levels or peer-to-peer protection of the
interconnects. Each device may only belong to a single iommu group,
which is linked from struct device.iommu_group. IOMMU groups are
maintained using kobject reference counting, allowing for automatic
removal of empty, unreferenced groups. It is the responsibility of
the iommu driver to remove devices from the group
(iommu_group_remove_device).
IOMMU groups also include a userspace representation in sysfs under
/sys/kernel/iommu_groups. When allocated, each group is given a
dynamically assigned ID (int). The ID is managed by the core IOMMU group
code to support multiple heterogeneous iommu drivers, which could
potentially collide in group naming/numbering. This also keeps group
IDs to small, easily managed values. A directory is created under
/sys/kernel/iommu_groups for each group. A further subdirectory named
"devices" contains links to each device within the group. The iommu_group
file in the device's sysfs directory, which formerly contained a group
number when read, is now a link to the iommu group. Example:
$ ls -l /sys/kernel/iommu_groups/26/devices/
total 0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 ->
        ../../../../devices/pci0000:00/0000:00:1e.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 ->
        ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 ->
        ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1

$ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group
[truncating perms/owner/timestamp]
/sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group ->
        ../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group ->
        ../../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group ->
        ../../../../kernel/iommu_groups/26
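
The iommu_group link shown above can also be followed from userspace to
recover a device's group number. The following is a minimal Python sketch,
not part of this patch; the function name group_id_for_device is
hypothetical, and it assumes the link target ends in the numeric group
directory, as in the listing above.

```python
import os


def group_id_for_device(device_dir):
    """Return the IOMMU group number for a sysfs device directory.

    Hypothetical helper: returns None when the device directory has no
    iommu_group link, i.e. the device is not in any group.
    """
    link = os.path.join(device_dir, "iommu_group")
    if not os.path.islink(link):
        return None
    # The link target ends in .../iommu_groups/<id>, so the last path
    # component is the group number.
    return int(os.path.basename(os.readlink(link)))
```

For the listing above, calling group_id_for_device() on the sysfs directory
of device 0000:00:1e.0 would be expected to return 26.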
Groups also include several exported functions for use by user level
driver providers, for example VFIO. These include:
 iommu_group_get(): Acquires a reference to a group from a device
 iommu_group_put(): Releases the reference
 iommu_group_for_each_dev(): Iterates over group devices using a callback
 iommu_group_[un]register_notifier(): Allows notification of device add
        and remove operations relevant to the group
 iommu_group_id(): Returns the group number
This patch also extends the IOMMU API to allow attaching groups to
domains. This is currently a simple wrapper for iterating through
devices within a group, but it's expected that the IOMMU API may
eventually make groups a more integral part of domains.
Groups intentionally do not try to manage group ownership. A user
level driver provider must independently acquire ownership for each
device within a group before making use of the group as a whole.
This may change in the future if group usage becomes more pervasive
across both DMA and IOMMU ops.
Groups intentionally do not provide a mechanism for driver locking
or otherwise manipulating driver matching/probing of devices within
the group. Such interfaces are generic to devices and beyond the
scope of IOMMU groups. If implemented, user level providers have
ready access via iommu_group_for_each_dev and group notifiers.
iommu_device_group() is removed here as it has no users. The
replacement is:
        group = iommu_group_get(dev);
        id = iommu_group_id(group);
        iommu_group_put(group);
AMD-Vi & Intel VT-d support re-added in following patches.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-05-30 14:18:53 -06:00

What:           /sys/kernel/iommu_groups/
Date:           May 2012
KernelVersion:  v3.5
Contact:        Alex Williamson <alex.williamson@redhat.com>
Description:    /sys/kernel/iommu_groups/ contains a number of
                sub-directories, each representing an IOMMU group. The
                name of the sub-directory matches the iommu_group_id()
                for the group, which is an integer value. Within each
                subdirectory is another directory named "devices" with
                links to the sysfs devices contained in this group.
                The group directory also optionally contains a "name"
                file if the IOMMU driver has chosen to register a more
                common name for the group.
Users:
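
The layout described above can be walked from userspace. The following is
a minimal Python sketch, not a definitive tool; the function name
list_iommu_groups and the shape of the returned dictionary are assumptions
made for illustration. The root directory is a parameter so the code can be
pointed at a test tree.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each group ID to its device names and optional group name.

    Hypothetical helper: only numeric sub-directories are treated as
    groups, matching the integer iommu_group_id() naming.
    """
    groups = {}
    for entry in os.listdir(root):
        if not entry.isdigit():
            continue
        group_dir = os.path.join(root, entry)
        # "devices" holds one link per device in the group.
        devices = sorted(os.listdir(os.path.join(group_dir, "devices")))
        # "name" is optional; it exists only if the IOMMU driver set one.
        name = None
        name_path = os.path.join(group_dir, "name")
        if os.path.isfile(name_path):
            with open(name_path) as f:
                name = f.read().strip()
        groups[int(entry)] = {"devices": devices, "name": name}
    return groups
```

Calling list_iommu_groups() with no argument reads the live sysfs tree and
returns an empty dictionary on a system with no IOMMU groups.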

What:           /sys/kernel/iommu_groups/reserved_regions
Date:           January 2017
KernelVersion:  v4.11
Contact:        Eric Auger <eric.auger@redhat.com>
Description:    /sys/kernel/iommu_groups/reserved_regions lists IOVA
                regions that are reserved. Not necessarily all
                reserved regions are listed. This is typically used to
                output direct-mapped, MSI, and non-mappable regions. Each
                region is described on a single line: the first field is
                the base IOVA, the second is the end IOVA and the third
                field describes the type of the region.

                Since kernel 5.3, if an RMRR is used only by graphics or
                USB devices, it is exposed as "direct-relaxable" instead
                of "direct". In the device assignment use case, for
                instance, those RMRRs are considered to be relaxable and
                safe.
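
The three-field line format described above can be parsed as follows. This
is a minimal Python sketch under stated assumptions: the names
ReservedRegion, parse_reserved_regions and is_relaxable are hypothetical,
and the two IOVA fields are assumed to be hexadecimal, with or without a
"0x" prefix. The caller is expected to read the file and pass in its text.

```python
from typing import NamedTuple


class ReservedRegion(NamedTuple):
    start: int  # base IOVA (first field)
    end: int    # end IOVA (second field)
    type: str   # region type (third field), e.g. "direct" or "msi"


def parse_reserved_regions(text):
    """Parse reserved_regions content into a list of ReservedRegion."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        start, end, kind = fields
        # Assumption: both addresses are hexadecimal.
        regions.append(ReservedRegion(int(start, 16), int(end, 16), kind))
    return regions


def is_relaxable(region):
    """True for regions exposed as "direct-relaxable" (kernel 5.3 and later)."""
    return region.type == "direct-relaxable"
```

A caller might filter the parsed list with is_relaxable() to separate
relaxable RMRRs from the remaining reserved regions.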

What:           /sys/kernel/iommu_groups/<grp_id>/type
Date:           November 2020
KernelVersion:  v5.11
Contact:        Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Description:    /sys/kernel/iommu_groups/<grp_id>/type shows the type of
                default domain in use by the iommu for this group. See
                include/linux/iommu.h for possible read values. A
                privileged user can request the kernel to change the group
                type by writing to this file. Valid write values:

                ========  ======================================================
                DMA       All the DMA transactions from the device in this group
                          are translated by the iommu.
                DMA-FQ    As above, but using batched invalidation to lazily
                          remove translations after use. This may offer reduced
                          overhead at the cost of reduced memory protection.
                identity  All the DMA transactions from the device in this group
                          are not translated by the iommu. Maximum performance
                          but zero protection.
                auto      Change to the type the device was booted with.
                ========  ======================================================

                The default domain type of a group may be modified only when

                - The device in the group is not bound to any device driver.
                  So, the users must unbind the appropriate driver before
                  changing the default domain type.

                Unbinding a device driver will take away the driver's control
                over the device and, if done on devices that host the root
                file system, could lead to catastrophic effects (the users
                might need to reboot the machine to return it to a normal
                state). So, it's expected that the users understand what
                they're doing.
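
The write rules above can be sketched from userspace as follows. This is a
minimal Python illustration, not a supported tool: the function names are
hypothetical, the group directory is passed in by the caller, and the check
for a bound driver assumes that a bound device has a "driver" link in its
sysfs directory. Writing the real file requires privilege.

```python
import os

# The write values listed in the table above.
VALID_WRITE_VALUES = ("DMA", "DMA-FQ", "identity", "auto")


def read_group_type(group_dir):
    """Return the current default domain type of the group."""
    with open(os.path.join(group_dir, "type")) as f:
        return f.read().strip()


def bound_devices(group_dir):
    """List devices in the group that still have a driver bound.

    Assumption: a bound device has a "driver" link in its directory.
    """
    devices_dir = os.path.join(group_dir, "devices")
    return sorted(
        dev for dev in os.listdir(devices_dir)
        if os.path.lexists(os.path.join(devices_dir, dev, "driver"))
    )


def request_group_type(group_dir, new_type):
    """Ask the kernel to change the group's default domain type."""
    if new_type not in VALID_WRITE_VALUES:
        raise ValueError("unsupported type: %r" % (new_type,))
    still_bound = bound_devices(group_dir)
    if still_bound:
        # The sketch never unbinds drivers itself; that is left to the user.
        raise RuntimeError("unbind drivers first: %s" % ", ".join(still_bound))
    with open(os.path.join(group_dir, "type"), "w") as f:
        f.write(new_type)
```

The sketch deliberately refuses to proceed rather than unbinding a driver on
the user's behalf, given the warning above about devices that host the root
file system.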