# SPDX-License-Identifier: GPL-2.0-only
menu "VFIO support for PCI devices"
	depends on PCI && MMU

config VFIO_PCI_CORE
	tristate
	select VFIO_VIRQFD
	select IRQ_BYPASS_MANAGER

config VFIO_PCI_MMAP
	def_bool y if !S390
	depends on VFIO_PCI_CORE

config VFIO_PCI_INTX
	def_bool y if !S390
	depends on VFIO_PCI_CORE

config VFIO_PCI
	tristate "Generic VFIO support for any PCI device"
	select VFIO_PCI_CORE
	help
	  Support for the generic PCI VFIO bus driver which can connect any
	  PCI device to the VFIO framework.

	  If you don't know what to do here, say N.

if VFIO_PCI
config VFIO_PCI_VGA
	bool "Generic VFIO PCI support for VGA devices"
	depends on X86 && VGA_ARB
	help
	  Support for VGA extension to VFIO PCI.  This exposes an additional
	  region on VGA devices for accessing legacy VGA addresses used by
	  BIOS and generic video drivers.

	  If you don't know what to do here, say N.

config VFIO_PCI_IGD
	bool "Generic VFIO PCI extensions for Intel graphics (GVT-d)"
	depends on X86
	default y
	help
	  Support for Intel IGD specific extensions to enable direct
	  assignment to virtual machines.  This includes exposing an IGD
	  specific firmware table and read-only copies of the host bridge
	  and LPC bridge config space.

	  To enable Intel IGD assignment through vfio-pci, say Y.
endif

config VFIO_PCI_ZDEV_KVM
	bool "VFIO PCI extensions for s390x KVM passthrough"
	depends on S390 && KVM
	default y
	help
	  Support s390x-specific extensions to enable support for enhancements
	  to KVM passthrough capabilities, such as interpretive execution of
	  zPCI instructions.

	  To enable s390x KVM vfio-pci extensions, say Y.

source "drivers/vfio/pci/mlx5/Kconfig"

source "drivers/vfio/pci/hisilicon/Kconfig"

source "drivers/vfio/pci/pds/Kconfig"

vfio/virtio: Introduce a vfio driver over virtio devices
Introduce a vfio driver over virtio devices to support the legacy
interface functionality for VFs.
Background, from the virtio spec [1].
--------------------------------------------------------------------
In some systems, there is a need to support a virtio legacy driver with
a device that does not directly support the legacy interface. In such
scenarios, a group owner device can provide the legacy interface
functionality for the group member devices. The driver of the owner
device can then access the legacy interface of a member device on behalf
of the legacy member device driver.
For example, with the SR-IOV group type, group members (VFs) can not
present the legacy interface in an I/O BAR in BAR0 as expected by the
legacy pci driver. If the legacy driver is running inside a virtual
machine, the hypervisor executing the virtual machine can present a
virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
legacy driver accesses to this I/O BAR and forwards them to the group
owner device (PF) using group administration commands.
--------------------------------------------------------------------
Specifically, this driver adds support for a virtio-net VF to be exposed
as a transitional device to a guest driver and allows the legacy IO BAR
functionality on top.
This allows a VM that uses a legacy virtio-net driver in the guest to
work transparently over a VF whose host driver is this new driver.
The driver can easily be extended to support other types of virtio
devices (e.g. virtio-blk) by adding the type-specific properties in a
few places, as was done for virtio-net.
For now, only the virtio-net use case has been tested, so support is
introduced only for that device type.
In practice, upon probing a VF for a virtio-net device, if its PF supports
legacy access over the virtio admin commands and the VF doesn't have BAR
0, we set specific 'vfio_device_ops' that simulate in SW a transitional
device with an I/O BAR in BAR 0.
The existence of the simulated I/O BAR is reported later on by
overriding the VFIO_DEVICE_GET_REGION_INFO command, and the device
exposes itself as a transitional device by overriding some properties
upon reading its config space.
Once we report the existence of the I/O BAR as BAR 0, a legacy driver in
the guest may use it via read/write calls according to the virtio
specification.
Any read/write towards the control parts of the BAR will be captured by
the new driver and will be translated into admin commands towards the
device.
In addition, any data path read/write access (i.e. virtio driver
notifications) will be captured by the driver and forwarded to the
physical BAR whose properties were supplied by the admin command
VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO during the probing/init flow.
With that code in place, a legacy driver in the guest sees what looks
and behaves like a transitional device with legacy support for both its
control and data path flows.
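
The control/data path split described above can be sketched as a small
routing function. This is only an illustration modelled in Python, not the
driver's code (which is kernel C): the function name and return strings are
hypothetical, and the only fact taken from the virtio legacy interface is
that the 16-bit queue notify register sits at offset 16 of the I/O BAR.

```python
# Legacy virtio-pci layout: the queue notify register is a 16-bit field at
# offset 16 of the I/O BAR. Everything else in the BAR is control path.
VIRTIO_PCI_QUEUE_NOTIFY = 16
NOTIFY_WIDTH = 2  # bytes


def route_legacy_access(offset: int, length: int) -> str:
    """Classify one guest access to the simulated I/O BAR 0.

    Hypothetical helper: returns "notify" for a data path access (to be
    forwarded to the physical notify area) and "admin_cmd" for a control
    path access (to be translated into an admin command towards the PF).
    """
    if offset < 0 or length <= 0:
        raise ValueError("invalid access")
    if offset == VIRTIO_PCI_QUEUE_NOTIFY and length == NOTIFY_WIDTH:
        return "notify"
    return "admin_cmd"
```

A real implementation also has to bound accesses to the BAR size and
distinguish common configuration from device-specific configuration; the
sketch leaves that out on purpose.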
[1] https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20231219093247.170936-10-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-12-19 11:32:47 +02:00
source "drivers/vfio/pci/virtio/Kconfig"

vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper
NVIDIA's upcoming Grace Hopper Superchip provides a PCI-like device
for the on-chip GPU that is the logical OS representation of the
internal proprietary chip-to-chip cache coherent interconnect.
The device is peculiar compared to a real PCI device in that whilst
there is a real 64b PCI BAR1 (comprising region 2 & region 3) on the
device, it is not used to access device memory once the faster
chip-to-chip interconnect is initialized (occurs at the time of host
system boot). The device memory is accessed instead using the chip-to-chip
interconnect that is exposed as a contiguous physically addressable
region on the host. This device memory aperture can be obtained from host
ACPI table using device_property_read_u64(), according to the FW
specification. Since the device memory is cache coherent with the CPU,
it can be mmapped into the user VMA with a cacheable mapping using
remap_pfn_range() and used like regular RAM. The device memory
is not added to the host kernel but mapped directly, as this reduces
memory wastage due to struct pages.
There is also a requirement of a minimum reserved 1G uncached region
(termed as resmem) to support the Multi-Instance GPU (MIG) feature [1].
This is to work around a HW defect. Based on [2], the requisite properties
(uncached, unaligned access) can be achieved through a VM mapping (S1)
of NORMAL_NC and host (S2) mapping with MemAttr[2:0]=0b101. To provide
a different non-cached property to the reserved 1G region, it needs to
be carved out from the device memory and mapped as a separate region
in Qemu VMA with pgprot_writecombine(). pgprot_writecombine() sets the
Qemu VMA page properties (pgprot) as NORMAL_NC.
Provide a VFIO PCI variant driver that adapts the unique device memory
representation into a more standard PCI representation facing userspace.
The variant driver exposes these two regions - the non-cached reserved
(resmem) and the cached rest of the device memory (termed as usemem) as
separate VFIO 64b BAR regions. This is divergent from the baremetal
approach, where the device memory is exposed as a device memory region.
The different approach was chosen because the baremetal approach would
necessitate additional code in Qemu to discover and insert those
regions in the VM IPA, along with additional VM ACPI DSDT changes to
communicate the device memory region IPA to the VM workloads. Moreover,
this behavior would have to be added to a variety of emulators (beyond
top of tree Qemu) out there desiring Grace Hopper support.
Since the device implements a 64-bit BAR0, the VFIO PCI variant driver
maps the uncached carved out region to the next available PCI BAR (i.e.
comprising regions 2 and 3). The cached device memory aperture is
assigned BAR regions 4 and 5. Qemu will then naturally generate a PCI
device in the VM with the uncached aperture reported as the BAR2 region
and the cacheable one as BAR4. The variant driver provides emulation for
these fake BARs' PCI config space offset registers.
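
To illustrate what emulating a fake BAR's config space register involves,
here is a minimal sketch of the standard PCI BAR sizing handshake for a
64-bit memory BAR, modelled in Python. It is not the driver's code. The
helper names are hypothetical, and the choice of flag bits (64-bit,
prefetchable) is an assumption made for the example; the flag values
themselves are the standard PCI ones.

```python
MASK64 = (1 << 64) - 1

# Standard PCI BAR flag bits (low nibble of a memory BAR).
PCI_BASE_ADDRESS_MEM_TYPE_64 = 0x04
PCI_BASE_ADDRESS_MEM_PREFETCH = 0x08
# ASSUMPTION: the emulated BARs are reported as 64-bit prefetchable.
BAR_FLAGS = PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH


def emulated_bar_read(written: int, size: int) -> int:
    """Value read back from an emulated 64-bit BAR after `written` was stored.

    Address bits below the (power-of-2) BAR size are hardwired to zero,
    and the low nibble always reports the read-only flag bits.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("BAR size must be a power of 2")
    addr_mask = ~(size - 1) & MASK64
    return (written & addr_mask) | BAR_FLAGS


def probe_bar_size(size: int) -> int:
    """What a guest computes after writing all ones to the BAR pair."""
    raw = emulated_bar_read(MASK64, size)
    return (~(raw & ~0xF) + 1) & MASK64
```

With this model a guest that writes all ones and reads the register back
recovers exactly the size the emulation advertises, which is the property
the emulated registers have to preserve.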
The hardware ensures that the system does not crash when the memory
is accessed with the memory enable turned off. It synthesizes ~0 reads
and drops writes on such accesses. So there is no need to support the
disablement/enablement of the BARs through the PCI_COMMAND config space
register.
The memory layout on the host looks like the following:
devmem (memlength)
|--------------------------------------------------|
|-------------cached------------------------|--NC--|
| |
usemem.memphys resmem.memphys
PCI BARs need to be power-of-2 aligned, but the actual memory on the
device may not be. A read or write access to the physical address from the
last device PFN up to the next power-of-2 aligned physical address
results in reading ~0 and dropped writes. Note that the GPU device
driver [5] is capable of knowing the exact device memory size through
separate means. The device memory size is primarily kept in the system
ACPI tables for use by the VFIO PCI variant module.
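
The power-of-2 rounding and the behaviour of the padding beyond the real
memory can be sketched as follows. This is a Python model for illustration
only; the function names and the three category strings are hypothetical.

```python
def roundup_pow_of_two(n: int) -> int:
    """Smallest power of 2 that is greater than or equal to n."""
    if n <= 0:
        raise ValueError("size must be positive")
    return 1 << (n - 1).bit_length()


def classify_bar_offset(offset: int, mem_length: int) -> str:
    """Say what an access at `offset` into the emulated BAR hits.

    "memory":  backed by real device memory.
    "padding": between the end of device memory and the power-of-2 BAR
               size; reads return ~0 and writes are dropped.
    "outside": beyond the BAR altogether.
    """
    bar_size = roundup_pow_of_two(mem_length)
    if offset < 0 or offset >= bar_size:
        return "outside"
    return "memory" if offset < mem_length else "padding"
```

For example, a region of 3 GiB is exposed through a 4 GiB BAR, and the last
1 GiB of that BAR is padding.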
Note that the usemem memory is added by the VM Nvidia device driver [5]
to the VM kernel as memblocks. Hence the usable memory size is made
memblock (MEMBLK_SIZE) aligned. This is a hardwired ABI value between the
GPU FW and the VFIO driver. The VM device driver makes use of the same
value in its calculation to determine the USEMEM size.
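
One way to read the layout and the alignment rule together is the following
Python sketch of the carve-out arithmetic. It is not the driver's code. The
1 GiB reserved size comes from the text above, while the MEMBLK_SIZE value
used here is an assumption chosen only to make the example concrete, and
the function name is hypothetical.

```python
RESMEM_SIZE = 1 << 30      # minimum reserved uncached region (1G, from the text)
MEMBLK_SIZE = 512 << 20    # ASSUMPTION: illustrative value, not from the text


def carve_device_memory(memphys: int, memlength: int) -> dict:
    """Split the device memory aperture into usemem and resmem.

    usemem starts at the base of the aperture and is rounded down to a
    MEMBLK_SIZE multiple; resmem takes everything that is left, so it is
    at least RESMEM_SIZE long and sits at the end of the aperture.
    """
    if memlength <= RESMEM_SIZE:
        raise ValueError("aperture too small for the reserved region")
    usemem_len = (memlength - RESMEM_SIZE) // MEMBLK_SIZE * MEMBLK_SIZE
    if usemem_len == 0:
        raise ValueError("no usable memory left after alignment")
    return {
        "usemem": (memphys, usemem_len),
        "resmem": (memphys + usemem_len, memlength - usemem_len),
    }
```

With these assumed values, an aperture of 96 GiB + 256 MiB yields 95 GiB of
usemem, and the remaining 1 GiB + 256 MiB becomes resmem.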
Currently there is no provision in KVM for a S2 mapping with
MemAttr[2:0]=0b101, but there is an ongoing effort to provide the same [3].
As previously mentioned, resmem is mapped with pgprot_writecombine(),
which sets the Qemu VMA page properties (pgprot) as NORMAL_NC. Using the
proposed changes in [3] and [4], KVM marks the region with
MemAttr[2:0]=0b101 in S2.
If the device memory properties are not present, the driver registers the
vfio-pci-core function pointers. Since there are no ACPI memory properties
generated for the VM, the variant driver inside the VM will only use
the vfio-pci-core ops and hence try to map the BARs as non-cached. This
is not a problem, as the CPUs have FWB enabled, which blocks the VM
mapping's ability to override the cacheability set by the host mapping.
This goes along with a qemu series [6] that provides the necessary
implementation of the Grace Hopper Superchip firmware specification so
that the guest operating system can see the correct ACPI modeling for
the coherent GPU device. Verified with the CUDA workload in the VM.
[1] https://www.nvidia.com/en-in/technologies/multi-instance-gpu/
[2] section D8.5.5 of https://developer.arm.com/documentation/ddi0487/latest/
[3] https://lore.kernel.org/all/20240211174705.31992-1-ankita@nvidia.com/
[4] https://lore.kernel.org/all/20230907181459.18145-2-ankita@nvidia.com/
[5] https://github.com/NVIDIA/open-gpu-kernel-modules
[6] https://lore.kernel.org/all/20231203060245.31593-1-ankita@nvidia.com/
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Zhi Wang <zhi.wang.linux@gmail.com>
Signed-off-by: Aniket Agashe <aniketa@nvidia.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20240220115055.23546-4-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-02-20 17:20:55 +05:30
source "drivers/vfio/pci/nvgrace-gpu/Kconfig"

source "drivers/vfio/pci/qat/Kconfig"

endmenu