Mariusz Tkaczyk 4e893545ef PCI/NPEM: Add Native PCIe Enclosure Management support
Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows managing
LEDs in storage enclosures. NPEM is indication oriented and it does not
give direct access to LEDs. Although each indication *could* represent an
individual LED, multiple indications could also be represented as a single,
multi-color LED or a single LED blinking in a specific interval.  The
specification leaves that open.

Each enabled indication (capability register bit on) is represented as a
led_classdev which can be controlled through sysfs. For every ledclass
device only 2 brightness states are allowed: LED_ON (1) or LED_OFF (0).
This corresponds to the NPEM control register (Indication bit on/off).

Ledclass devices appear in sysfs as child devices (subdirectories) of a PCI
device that has an NPEM Extended Capability, one for each indication
enabled in the NPEM capability register. For example, these are the LEDs
created for pcieport "10000:02:05.0" on my setup:

  leds/
  ├── 10000:02:05.0:enclosure:fail
  ├── 10000:02:05.0:enclosure:locate
  ├── 10000:02:05.0:enclosure:ok
  └── 10000:02:05.0:enclosure:rebuild

They can also be found in the "/sys/class/leds" directory. The parent PCIe
device's domain/bus/device/function address is used to guarantee uniqueness
across the leds subsystem.
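
Because the name embeds the PCI address, user space can recover the parent
device and the indication from the LED name alone. A minimal sketch,
assuming the "<address>:enclosure:<indication>" layout shown above; the
variable names are illustrative only and not part of this patch:

```shell
#!/bin/sh
# Split an NPEM LED name into the parent PCI address and the indication.
# Assumes the "<domain:bus:dev.fn>:enclosure:<indication>" layout shown above.
led="10000:02:05.0:enclosure:fail"

indication=${led##*:}          # text after the last ':'
address=${led%:enclosure:*}    # text before ":enclosure:"

echo "device=$address indication=$indication"
# prints: device=10000:02:05.0 indication=fail
```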

To enable/disable a "fail" indication, the "brightness" file can be edited:

  echo 1 > ./leds/10000:02:05.0:enclosure:fail/brightness
  echo 0 > ./leds/10000:02:05.0:enclosure:fail/brightness
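
The same write can be wrapped in a small helper. This is only a sketch:
set_indication is a hypothetical user-space function, and the check that
rejects values other than 0 and 1 is the helper's own choice, not a
statement about how the kernel treats them. A temporary directory stands in
for the LED directory so the sketch runs without NPEM hardware:

```shell
#!/bin/sh
# set_indication LED_DIR VALUE: write 0 or 1 to LED_DIR/brightness.
# Hypothetical helper, not part of this patch.
set_indication() {
	case "$2" in
	0|1) ;;
	*) echo "brightness must be 0 or 1" >&2; return 1 ;;
	esac
	echo "$2" > "$1/brightness"
}

# Stand-in directory; on a real system LED_DIR would be e.g.
# ./leds/10000:02:05.0:enclosure:fail
led_dir=$(mktemp -d)
echo 0 > "$led_dir/brightness"

set_indication "$led_dir" 1
cat "$led_dir/brightness"
```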

PCIe r6.1, sec 7.9.19.2 defines the possible indications.

Multiple indications for the same parent PCIe device can conflict, and
hardware may update them when processing a new request. To avoid issues,
the driver refreshes all indications by reading back the control register.
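
To illustrate what a read-back yields, the sketch below decodes a raw
control register value into the four mandatory indications. The bit
positions (OK = bit 2, Locate = bit 3, Fail = bit 4, Rebuild = bit 5) are
my reading of the NPEM Control register layout and should be treated as
assumptions to verify against the spec; decode_npem_ctrl is a hypothetical
name and this is not the driver's code:

```shell
#!/bin/sh
# Decode a raw NPEM Control register value into indication states.
# Bit positions are assumptions to be checked against the PCIe spec:
# OK = bit 2, Locate = bit 3, Fail = bit 4, Rebuild = bit 5.
decode_npem_ctrl() {
	echo "ok=$(( ($1 >> 2) & 1 )) locate=$(( ($1 >> 3) & 1 ))" \
	     "fail=$(( ($1 >> 4) & 1 )) rebuild=$(( ($1 >> 5) & 1 ))"
}

# 0x15 = bit 0 + bit 2 (OK) + bit 4 (Fail)
decode_npem_ctrl 0x15
# prints: ok=1 locate=0 fail=1 rebuild=0
```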

This driver expects to be the exclusive NPEM extended capability manager.
It waits up to 1 second after issuing a new request, it does not check
whether the controller is busy before writing, and it assumes the mutex
lock protects against concurrent updates.

If _DSM LED management is available, we assume the platform may be using
NPEM for its own purposes (see PCI Firmware Spec r3.3 sec 4.7), so the
driver does not use NPEM. A future patch will add _DSM support; an info
message notes whether NPEM or _DSM is being used.

NPEM is a PCIe extended capability, so it should be registered in
pci_init_capabilities(), but that is not possible because of the LED
dependency: the parent PCI device must be added before
led_classdev_register() can succeed. NPEM does not require configuration on
the kernel side, so it is safe to register LED devices later.

Link: https://lore.kernel.org/r/20240904104848.23480-3-mariusz.tkaczyk@linux.intel.com
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-09-04 17:25:12 -05:00

# SPDX-License-Identifier: GPL-2.0
#
# PCI configuration
#

# select this to offer the PCI prompt
config HAVE_PCI
	bool

# select this to unconditionally force on PCI support
config FORCE_PCI
	bool
	select HAVE_PCI
	select PCI

# select this to provide a generic PCI iomap,
# without PCI itself having to be defined
config GENERIC_PCI_IOMAP
	bool

menuconfig PCI
	bool "PCI support"
	depends on HAVE_PCI
	help
	  This option enables support for the PCI local bus, including
	  support for PCI-X and the foundations for PCI Express support.
	  Say 'Y' here unless you know what you are doing.

if PCI

config PCI_DOMAINS
	bool
	depends on PCI

config PCI_DOMAINS_GENERIC
	bool
	select PCI_DOMAINS

config PCI_SYSCALL
	bool

source "drivers/pci/pcie/Kconfig"

config PCI_MSI
	bool "Message Signaled Interrupts (MSI and MSI-X)"
	select GENERIC_MSI_IRQ
	help
	  This allows device drivers to enable MSI (Message Signaled
	  Interrupts). Message Signaled Interrupts enable a device to
	  generate an interrupt using an inbound Memory Write on its
	  PCI bus instead of asserting a device IRQ pin.

	  Use of PCI MSI interrupts can be disabled at kernel boot time
	  by using the 'pci=nomsi' option. This disables MSI for the
	  entire system.

	  If you don't know what to do here, say Y.

config PCI_MSI_ARCH_FALLBACKS
	bool

config PCI_QUIRKS
	default y
	bool "Enable PCI quirk workarounds" if EXPERT
	help
	  This enables workarounds for various PCI chipset bugs/quirks.
	  Disable this only if your target machine is unaffected by PCI
	  quirks.

config PCI_DEBUG
	bool "PCI Debugging"
	depends on DEBUG_KERNEL
	help
	  Say Y here if you want the PCI core to produce a bunch of debug
	  messages to the system log. Select this if you are having a
	  problem with PCI support and want to see more of what is going on.

	  When in doubt, say N.

config PCI_REALLOC_ENABLE_AUTO
	bool "Enable PCI resource re-allocation detection"
	depends on PCI_IOV
	help
	  Say Y here if you want the PCI core to detect if PCI resource
	  re-allocation needs to be enabled. You can always use pci=realloc=on
	  or pci=realloc=off to override it. It will automatically
	  re-allocate PCI resources if SR-IOV BARs have not been allocated by
	  the BIOS.

	  When in doubt, say N.

config PCI_STUB
	tristate "PCI Stub driver"
	help
	  Say Y or M here if you want to be able to reserve a PCI device
	  when it is going to be assigned to a guest operating system.

	  When in doubt, say N.

config PCI_PF_STUB
	tristate "PCI PF Stub driver"
	depends on PCI_IOV
	help
	  Say Y or M here if you want to enable support for devices that
	  require SR-IOV support, while at the same time the PF (Physical
	  Function) itself is not providing any actual services on the
	  host itself such as storage or networking.

	  When in doubt, say N.

config XEN_PCIDEV_FRONTEND
	tristate "Xen PCI Frontend"
	depends on XEN_PV
	select PCI_XEN
	select XEN_XENBUS_FRONTEND
	default y
	help
	  The PCI device frontend driver allows the kernel to import arbitrary
	  PCI devices from a PCI backend to support PCI driver domains.

config PCI_ATS
	bool

config PCI_DOE
	bool

config PCI_ECAM
	bool

config PCI_LOCKLESS_CONFIG
	bool

config PCI_BRIDGE_EMUL
	bool

config PCI_IOV
	bool "PCI IOV support"
	select PCI_ATS
	help
	  I/O Virtualization is a PCI feature supported by some devices
	  which allows them to create virtual devices which share their
	  physical resources.

	  If unsure, say N.

config PCI_NPEM
	bool "Native PCIe Enclosure Management"
	depends on LEDS_CLASS=y
	help
	  Support for Native PCIe Enclosure Management. It allows managing LED
	  indications in storage enclosures. The enclosure must support the
	  following indications: OK, Locate, Fail, Rebuild. Other indications
	  are optional.

config PCI_PRI
	bool "PCI PRI support"
	select PCI_ATS
	help
	  PRI is the PCI Page Request Interface. It allows PCI devices that are
	  behind an IOMMU to recover from page faults.

	  If unsure, say N.

config PCI_PASID
	bool "PCI PASID support"
	select PCI_ATS
	help
	  Process Address Space Identifiers (PASIDs) can be used by PCI devices
	  to access more than one IO address space at the same time. To make
	  use of this feature an IOMMU is required which also supports PASIDs.
	  Select this option if you have such an IOMMU and want to compile the
	  driver for it into your kernel.

	  If unsure, say N.

config PCI_P2PDMA
	bool "PCI peer-to-peer transfer support"
	depends on ZONE_DEVICE
	#
	# The need for the scatterlist DMA bus address flag means PCI P2PDMA
	# requires 64bit
	#
	depends on 64BIT
	select GENERIC_ALLOCATOR
	select NEED_SG_DMA_FLAGS
	help
	  Enables drivers to do PCI peer-to-peer transactions to and from
	  BARs that are exposed in other devices that are the part of
	  the hierarchy where peer-to-peer DMA is guaranteed by the PCI
	  specification to work (ie. anything below a single PCI bridge).

	  Many PCIe root complexes do not support P2P transactions and
	  it's hard to tell which support it at all, so at this time,
	  P2P DMA transactions must be between devices behind the same root
	  port.

	  If unsure, say N.

config PCI_LABEL
	def_bool y if (DMI || ACPI)
	select NLS

config PCI_HYPERV
	tristate "Hyper-V PCI Frontend"
	depends on ((X86 && X86_64) || ARM64) && HYPERV && PCI_MSI && SYSFS
	select PCI_HYPERV_INTERFACE
	help
	  The PCI device frontend driver allows the kernel to import arbitrary
	  PCI devices from a PCI backend to support PCI driver domains.

config PCI_DYNAMIC_OF_NODES
	bool "Create Device tree nodes for PCI devices"
	depends on OF_IRQ
	select OF_DYNAMIC
	help
	  This option enables support for generating device tree nodes for some
	  PCI devices. Thus, the driver of this kind can load and overlay
	  flattened device tree for its downstream devices.

	  Once this option is selected, the device tree nodes will be generated
	  for all PCI bridges.

choice
	prompt "PCI Express hierarchy optimization setting"
	default PCIE_BUS_DEFAULT
	depends on PCI && EXPERT
	help
	  MPS (Max Payload Size) and MRRS (Max Read Request Size) are PCIe
	  device parameters that affect performance and the ability to
	  support hotplug and peer-to-peer DMA.

	  The following choices set the MPS and MRRS optimization strategy
	  at compile-time. The choices are the same as those offered for
	  the kernel command-line parameter 'pci', i.e.,
	  'pci=pcie_bus_tune_off', 'pci=pcie_bus_safe',
	  'pci=pcie_bus_perf', and 'pci=pcie_bus_peer2peer'.

	  This is a compile-time setting and can be overridden by the above
	  command-line parameters. If unsure, choose PCIE_BUS_DEFAULT.

config PCIE_BUS_TUNE_OFF
	bool "Tune Off"
	depends on PCI
	help
	  Use the BIOS defaults; don't touch MPS at all. This is the same
	  as booting with 'pci=pcie_bus_tune_off'.

config PCIE_BUS_DEFAULT
	bool "Default"
	depends on PCI
	help
	  Default choice; ensure that the MPS matches upstream bridge.

config PCIE_BUS_SAFE
	bool "Safe"
	depends on PCI
	help
	  Use largest MPS that boot-time devices support. If you have a
	  closed system with no possibility of adding new devices, this
	  will use the largest MPS that's supported by all devices. This
	  is the same as booting with 'pci=pcie_bus_safe'.

config PCIE_BUS_PERFORMANCE
	bool "Performance"
	depends on PCI
	help
	  Use MPS and MRRS for best performance. Ensure that a given
	  device's MPS is no larger than its parent MPS, which allows us to
	  keep all switches/bridges to the max MPS supported by their
	  parent. This is the same as booting with 'pci=pcie_bus_perf'.

config PCIE_BUS_PEER2PEER
	bool "Peer2peer"
	depends on PCI
	help
	  Set MPS = 128 for all devices. MPS configuration effected by the
	  other options could cause the MPS on one root port to be
	  different than that of the MPS on another, which may cause
	  hot-added devices or peer-to-peer DMA to fail. Set MPS to the
	  smallest possible value (128B) system-wide to avoid these issues.
	  This is the same as booting with 'pci=pcie_bus_peer2peer'.

endchoice

config VGA_ARB
	bool "VGA Arbitration" if EXPERT
	default y
	depends on (PCI && !S390)
	help
	  Some "legacy" VGA devices implemented on PCI typically have the same
	  hard-decoded addresses as they did on ISA. When multiple PCI devices
	  are accessed at the same time they need some kind of coordination.
	  Please see Documentation/gpu/vgaarbiter.rst for more details. Select
	  this to enable VGA arbiter.

config VGA_ARB_MAX_GPUS
	int "Maximum number of GPUs"
	default 16
	depends on VGA_ARB
	help
	  Reserves space in the kernel to maintain resource locking for
	  multiple GPUs. The overhead for each GPU is very small.

source "drivers/pci/hotplug/Kconfig"
source "drivers/pci/controller/Kconfig"
source "drivers/pci/endpoint/Kconfig"
source "drivers/pci/switch/Kconfig"
source "drivers/pci/pwrctl/Kconfig"

endif