linux-stable/drivers/pci
Ilpo Järvinen 566f1dd528 PCI: Relax bridge window tail sizing rules
During remove & rescan cycle, PCI subsystem will recalculate and adjust
the bridge window sizing that was initially done by "BIOS". The size
calculation is based on the required alignment of the largest resource
among the downstream resources as per pbus_size_mem() (unimportant or
zero parameters marked with "..."):

  min_align = calculate_mem_align(aligns, max_order);
  size0 = calculate_memsize(size, ..., min_align);

inside calculate_memsize(), for the largest alignment:

  min_align = align1 >> 1;
  ...
  return min_align;

and then in calculate_memsize():

  return ALIGN(max(size, ...), align);

If the original bridge window sizing tried to conserve space, this will
lead to massive increase of the required bridge window size when the
downstream has a large disparity in BAR sizes. E.g., with 16MiB and
16GiB BARs this results in 24GiB bridge window size even if 16MiB BAR
does not require gigabytes of space to fit.

When doing remove & rescan for a bus that contains such a PCI device, a
larger bridge window is suddenly required on rescan but when there is a
bridge window upstream that is already assigned based on the original
size, it cannot be enlarged to the new requirement. This causes the
allocation of the bridge window to fail (0x600000000 > 0x400ffffff):

  pci 0000:02:01.0: PCI bridge to [bus 03]
  pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
  pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
  pci 0000:01:00.0: PCI bridge to [bus 02-04]
  pci 0000:01:00.0:   bridge window [mem 0x40400000-0x406fffff]
  pci 0000:01:00.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]

  pci 0000:03:00.0: device released
  pci 0000:02:01.0: device released
  pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 0
  pci 0000:02:01.0: PCI bridge to [bus 03]
  pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
  pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]
  pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 0
  pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]
  pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]
  pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]

  pci 0000:02:01.0: PCI bridge to [bus 03]
  pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1
  pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1
  pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: can't assign; no space
  pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: failed to assign
  pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned
  pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space
  pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: failed to assign
  pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: can't assign; no space
  pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: failed to assign
  pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned
  pci 0000:02:01.0: PCI bridge to [bus 03]
  pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]

This is a major surprise for users who are suddenly left with a device that
was working fine with the original bridge window sizing.

Even if the already assigned bridge window could be enlarged by
reallocation in some cases (something the current code does not attempt
to do), it is not possible in general case and the large amount of
wasted space at the tail of the bridge window may lead to other
resource exhaustion problems on Root Complex level (think of multiple
PCIe cards with VFs and BAR size disparity in a single system).

PCI BARs only need natural alignment (PCIe r6.1, sec 7.5.1.2.1) and bridge
memory windows need 1MiB (sec 7.5.1.3). The current bridge window tail
alignment rule was introduced in the commit 5d0a8965aea9 ("[PATCH] 2.5.14:
New PCI allocation code (alpha, arm, parisc) [2/2]") that only states:
"pbus_size_mem: core stuff; tested with randomly generated sets of
resources". It does not explain the motivation for the extra tail space
allocated that is not truly needed by the downstream resources. As such, it
is far from clear if it ever has been required by any HW.

To prevent devices with BAR size disparity from becoming unusable after
remove & rescan cycle, attempt to do a truly minimal allocation for memory
resources if needed. First check if the normally calculated bridge window
will not fit into an already assigned upstream resource.  In such case, try
with relaxed bridge window tail sizing rules instead where no extra tail
space is requested beyond what the downstream resources require.  Only
enforce the alignment requirement of the bridge window itself (normally
1MiB).

With this patch, the resources are successfully allocated:

  pci 0000:02:01.0: PCI bridge to [bus 03]
  pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1
  pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1
  pcieport 0000:01:00.0: Assigned bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 02-04] cannot fit 0x600000000 required for 0000:02:01.0 bridging to [bus 03]
  pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 03] requires relaxed alignment rules
  pcieport 0000:01:00.0: Assigned bridge window [mem 0x40400000-0x406fffff] to [bus 02-04] free space at [mem 0x40400000-0x405fffff]
  pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]: assigned
  pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned
  pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]: assigned
  pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]: assigned
  pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned
  pci 0000:02:01.0: PCI bridge to [bus 03]
  pci 0000:02:01.0:   bridge window [mem 0x40400000-0x405fffff]
  pci 0000:02:01.0:   bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]

This patch draws inspiration from the initial investigations and work by
Mika Westerberg.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216795
Link: https://lore.kernel.org/linux-pci/20190812144144.2646-1-mika.westerberg@linux.intel.com/
Fixes: 5d0a8965aea9 ("[PATCH] 2.5.14: New PCI allocation code (alpha, arm, parisc) [2/2]")
Link: https://lore.kernel.org/r/20240507102523.57320-9-ilpo.jarvinen@linux.intel.com
Tested-by: Lidong Wang <lidong.wang@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2024-06-12 14:51:30 -05:00
..
controller Merge branch 'pci/controller/tegra194' 2024-05-16 18:14:13 -05:00
endpoint Merge branch 'pci/endpoint' 2024-05-16 18:14:13 -05:00
hotplug PCI: hotplug: Remove obsolete sgi_hotplug TODO notes 2024-05-03 16:26:50 -05:00
msi Merge branch 'pci/ims-removal' 2024-05-16 18:14:14 -05:00
pcie pci-v6.10-changes 2024-05-21 10:09:28 -07:00
switch PCI: switchtec: Fix an error handling path in switchtec_pci_probe() 2024-02-08 15:35:38 -06:00
access.c Merge branch 'pci/misc' 2024-05-16 18:14:14 -05:00
ats.c PCI/ATS: Use FIELD_GET() 2023-10-24 16:55:45 -05:00
bus.c resource: Use typedef for alignf callback 2024-05-28 11:14:14 -05:00
devres.c PCI: Move devres code from pci.c to devres.c 2024-02-12 10:36:17 -06:00
doe.c PCI/DOE: Support discovery version 2 2024-04-09 09:33:15 -05:00
ecam.c
host-bridge.c PCI: VMD: ACPI: Make ACPI companion lookup work for VMD bus 2021-09-02 17:59:58 +02:00
iomap.c PCI: Move pci_iomap.c to drivers/pci/ 2024-02-12 10:35:40 -06:00
iov.c PCI: Use resource names in PCI log messages 2023-12-15 17:28:42 -06:00
irq.c PCI: Place interrupt related code into irq.c 2024-01-29 17:01:31 -06:00
Kconfig PCI: Move pci_iomap.c to drivers/pci/ 2024-02-12 10:35:40 -06:00
Makefile Merge branch 'pci/sysfs' 2024-03-12 12:14:23 -05:00
mmap.c PCI/sysfs: Compile pci-sysfs.c only if CONFIG_SYSFS=y 2024-03-05 16:08:43 -06:00
of_property.c PCI: of_property: Return error for int_map allocation failure 2024-05-02 17:15:01 -05:00
of.c PCI: of: Destroy changeset when adding PCI device node fails 2023-09-29 17:33:51 -05:00
p2pdma.c PCI/P2PDMA: Fix a sleeping issue in a RCU read section 2024-02-08 15:31:43 -06:00
pci-acpi.c Merge branch 'pci/pm' 2023-10-28 13:30:59 -05:00
pci-bridge-emul.c PCI: pci-bridge-emul: Set position of PCI capabilities to real HW value 2022-08-25 12:07:56 +02:00
pci-bridge-emul.h PCI: pci-bridge-emul: Set position of PCI capabilities to real HW value 2022-08-25 12:07:56 +02:00
pci-driver.c Merge branch 'pci/misc' 2024-03-12 12:14:24 -05:00
pci-label.c
pci-mid.c PCI: PM: Do not use pci_platform_pm_ops for Intel MID PM 2021-09-27 17:13:21 +02:00
pci-pf-stub.c
pci-stub.c PCI: pci_stub: Set driver_managed_dma 2022-04-28 15:32:20 +02:00
pci-sysfs.c PCI/sysfs: Demacrofy pci_dev_resource_resize_attr(n) functions 2024-03-05 16:10:17 -06:00
pci.c Merge branch 'pci/misc' 2024-05-16 18:14:14 -05:00
pci.h PCI: Make pcie_bandwidth_capable() static 2024-05-08 19:03:55 -05:00
probe.c Merge branch 'pci/misc' 2024-05-16 18:14:14 -05:00
proc.c PCI: Remove pci_mmap_page_range() wrapper 2022-07-29 12:08:44 -05:00
quirks.c pci-v6.10-changes 2024-05-21 10:09:28 -07:00
remove.c PCI: Create device tree node for bridge 2023-08-22 14:56:09 -05:00
rom.c PCI: Prefer 'unsigned int' over bare 'unsigned' 2021-10-27 13:41:22 -05:00
search.c PCI: Add pci_get_base_class() helper 2023-09-28 16:49:44 -05:00
setup-bus.c PCI: Relax bridge window tail sizing rules 2024-06-12 14:51:30 -05:00
setup-res.c PCI: Use resource names in PCI log messages 2023-12-15 17:28:42 -06:00
slot.c PCI/sysfs: Constify struct kobj_type pci_slot_ktype 2023-02-16 12:00:25 -06:00
syscall.c PCI: Use consistent put_user() pointer types 2023-08-25 08:15:13 -05:00
vc.c PCI/VC: Use FIELD_GET() 2023-10-24 16:55:45 -05:00
vgaarb.c pci-v6.7-changes 2023-11-02 14:05:18 -10:00
vpd.c PCI/VPD: Add runtime power management to sysfs interface 2023-08-11 14:19:16 -05:00
xen-pcifront.c x86: always initialize xen-swiotlb when xen-pcifront is enabling 2023-07-31 17:54:27 +02:00