334 Commits

Author SHA1 Message Date
Linus Torvalds
5c2b050848 A set of updates for the interrupt subsystem:
- Tree wide:
 
     * Make nr_irqs static to the core code and provide accessor functions
       to remove existing and prevent future aliasing problems with local
       variables or function arguments of the same name.
 
   - Core code:
 
     * Prevent freeing an interrupt in the devres code which is not managed
       by devres in the first place.
 
     * Use seq_put_decimal_ull_width() for decimal values output in
       /proc/interrupts which increases performance significantly as it
       avoids parsing the format strings over and over.
 
     * Optimize raising the timer and hrtimer soft interrupts by using the
       'set bit only' variants instead of the combined version which checks
       whether ksoftirqd should be woken up. The latter is a pointless
       exercise as both soft interrupts are raised in the context of the
       timer interrupt and therefore never wake up ksoftirqd.
 
     * Delegate timer/hrtimer soft interrupt processing to a dedicated thread
       on RT.
 
       Timer and hrtimer soft interrupts are always processed in ksoftirqd
       on RT enabled kernels. This can lead to high latencies when other
       soft interrupts are delegated to ksoftirqd as well.
 
       The separate thread allows to run them seperately under a RT
       scheduling policy to reduce the latency overhead.
 
   - Drivers:
 
     * New drivers or extensions of existing drivers to support Renesas
       RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt
       chips
 
     * Support for multi-cluster GICs on MIPS.
 
       MIPS CPUs can come with multiple CPU clusters, where each CPU cluster
       has its own GIC (Generic Interrupt Controller). This requires to
       access the GIC of a remote cluster through a redirect register block.
 
       This is encapsulated into a set of helper functions to keep the
       complexity out of the actual code paths which handle the GIC details.
 
     * Support for encrypted guests in the ARM GICV3 ITS driver
 
       The ITS page needs to be shared with the hypervisor and therefore
       must be decrypted.
 
     * Small cleanups and fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7ggcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaf7D/9G6FgJXx/60zqnpnOr9Yx0hxjaI47x
 PFyCd3P05qyVMBYXfI99vrSKuVdMZXJ/fH5L83y+sOaTASyLTzg37igZycIDJzLI
 FnHh/m/+UA8k2aIC5VUiNAjne2RLaTZiRN15uEHFVjByC5Y+YTlCNUE4BBhg5RfQ
 hKmskeffWdtui3ou13CSNvbFn+pmqi4g6n1ysUuLhiwM2E5b1rZMprcCOnun/cGP
 IdUQsODNWTTv9eqPJez985M6A1x2SCGNv7Z73h58B9N0pBRPEC1xnhUnCJ1sA0cJ
 pnfde2C1lztEjYbwDngy0wgq0P6LINjQ5Ma2YY2F2hTMsXGJxGPDZm24/u5uR46x
 N/gsOQMXqw6f5yvbiS7Asx9WzR6ry8rJl70QRgTyozz7xxJTaiNm2HqVFe2wc+et
 Q/BzaKdhmUJj1GMZmqD2rrgwYeDcb4wWYNtwjM4PVHHxYlJVq0mEF1kLLS8YDyjf
 HuGPVqtSkt3E0+Br3FKcv5ltUQP8clXbudc6L1u98YBfNK12hW8L+c3YSvIiFoYM
 ZOAeANPM7VtQbP2Jg2q81Dd3CShImt5jqL2um+l8g7+mUE7l9gyuO/w/a5dQ57+b
 kx7mHHIW2zCeHrkZZbRUYzI2BJfMCCOVN4Ax5OZxTLnLsL9VEehy8NM8QYT4TS8R
 XmTOYW3U9XR3gw==
 =JqxC
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull interrupt subsystem updates from Thomas Gleixner:
 "Tree wide:

   - Make nr_irqs static to the core code and provide accessor functions
     to remove existing and prevent future aliasing problems with local
     variables or function arguments of the same name.

  Core code:

   - Prevent freeing an interrupt in the devres code which is not
     managed by devres in the first place.

   - Use seq_put_decimal_ull_width() for decimal values output in
     /proc/interrupts which increases performance significantly as it
     avoids parsing the format strings over and over.

   - Optimize raising the timer and hrtimer soft interrupts by using the
     'set bit only' variants instead of the combined version which
     checks whether ksoftirqd should be woken up. The latter is a
     pointless exercise as both soft interrupts are raised in the
     context of the timer interrupt and therefore never wake up
     ksoftirqd.

   - Delegate timer/hrtimer soft interrupt processing to a dedicated
     thread on RT.

     Timer and hrtimer soft interrupts are always processed in ksoftirqd
     on RT enabled kernels. This can lead to high latencies when other
     soft interrupts are delegated to ksoftirqd as well.

     The separate thread allows to run them seperately under a RT
     scheduling policy to reduce the latency overhead.

  Drivers:

   - New drivers or extensions of existing drivers to support Renesas
     RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt
     chips

   - Support for multi-cluster GICs on MIPS.

     MIPS CPUs can come with multiple CPU clusters, where each CPU
     cluster has its own GIC (Generic Interrupt Controller). This
     requires to access the GIC of a remote cluster through a redirect
     register block.

     This is encapsulated into a set of helper functions to keep the
     complexity out of the actual code paths which handle the GIC
     details.

   - Support for encrypted guests in the ARM GICV3 ITS driver

     The ITS page needs to be shared with the hypervisor and therefore
     must be decrypted.

   - Small cleanups and fixes all over the place"

* tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  irqchip/riscv-aplic: Prevent crash when MSI domain is missing
  genirq/proc: Use seq_put_decimal_ull_width() for decimal values
  softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.
  timers: Use __raise_softirq_irqoff() to raise the softirq.
  hrtimer: Use __raise_softirq_irqoff() to raise the softirq
  riscv: defconfig: Enable T-HEAD C900 ACLINT SSWI drivers
  irqchip: Add T-HEAD C900 ACLINT SSWI driver
  dt-bindings: interrupt-controller: Add T-HEAD C900 ACLINT SSWI device
  irqchip/stm32mp-exti: Use of_property_present() for non-boolean properties
  irqchip/mips-gic: Fix selection of GENERIC_IRQ_EFFECTIVE_AFF_MASK
  irqchip/mips-gic: Prevent indirect access to clusters without CPU cores
  irqchip/mips-gic: Multi-cluster support
  irqchip/mips-gic: Setup defaults in each cluster
  irqchip/mips-gic: Support multi-cluster in for_each_online_cpu_gic()
  irqchip/mips-gic: Replace open coded online CPU iterations
  genirq/irqdesc: Use str_enabled_disabled() helper in wakeup_show()
  genirq/devres: Don't free interrupt which is not managed by devres
  irqchip/gic-v3-its: Fix over allocation in itt_alloc_pool()
  irqchip/aspeed-intc: Add AST27XX INTC support
  dt-bindings: interrupt-controller: Add support for ASPEED AST27XX INTC
  ...
2024-11-19 15:54:19 -08:00
Marc Zyngier
e6c24e2d05 irqchip/gic-v4: Correctly deal with set_affinity on lazily-mapped VPEs
Zenghui points out that a recent change to the way set_affinity is
handled for VPEs has the potential to return an error if the VPE
hasn't been mapped yet (because the guest hasn't emited a MAPTI
command yet), affecting GICv4.0 implementations that rely on the
ITSList feature.

Fix this by making the set_affinity succeed in this case, and
return early, without trying to touch the HW.

Fixes: 1442ee0011983 ("irqchip/gic-v4: Don't allow a VMOVP on a dying VPE")
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/all/20241027102220.1858558-1-maz@kernel.org
Link: https://lore.kernel.org/r/aab45cd3-e5ca-58cf-e081-e32a17f5b4e7@huawei.com
2024-10-27 17:30:16 +01:00
Steven Price
bc88d44bd7 irqchip/gic-v3-its: Fix over allocation in itt_alloc_pool()
itt_alloc_pool() calls its_alloc_pages_node() to allocate an individual
page to add to the pool (for allocations <PAGE_SIZE). However the final
argument of its_alloc_pages_node() is the page order not the number of
pages. Currently it allocates two pages and leaks the second page.
Fix it by passing 0 instead (1 << 0 = 1 page).

Fixes: b08e2f42e86b ("irqchip/gic-v3-its: Share ITS tables with a non-trusted hypervisor")
Reported-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/1f6e19c4-1fb9-43ab-a8a2-a465c9cff84b@arm.com
Closes: https://lore.kernel.org/r/ed65312a-245c-4fa5-91ad-5d620cab7c6b%40nvidia.com
2024-10-21 15:49:15 +02:00
Marc Zyngier
1442ee0011 irqchip/gic-v4: Don't allow a VMOVP on a dying VPE
Kunkun Jiang reported that there is a small window of opportunity for
userspace to force a change of affinity for a VPE while the VPE has already
been unmapped, but the corresponding doorbell interrupt still visible in
/proc/irq/.

Plug the race by checking the value of vmapp_count, which tracks whether
the VPE is mapped ot not, and returning an error in this case.

This involves making vmapp_count common to both GICv4.1 and its v4.0
ancestor.

Fixes: 64edfaa9a234 ("irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP")
Reported-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/c182ece6-2ba0-ce4f-3404-dba7a3ab6c52@huawei.com
Link: https://lore.kernel.org/all/20241002204959.2051709-1-maz@kernel.org
2024-10-08 17:44:27 +02:00
Steven Price
e36d4165f0 irqchip/gic-v3-its: Rely on genpool alignment
its_create_device() over-allocated by ITS_ITT_ALIGN - 1 bytes to ensure
that an aligned area was available within the allocation. The new genpool
allocator has its min_alloc_order set to get_order(ITS_ITT_ALIGN) so all
allocations from it should be appropriately aligned.

Remove the over-allocation from its_create_device() and alignment from
its_build_mapd_cmd().

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20241002141630.433502-3-steven.price@arm.com
2024-10-02 18:00:41 +02:00
Steven Price
b08e2f42e8 irqchip/gic-v3-its: Share ITS tables with a non-trusted hypervisor
Within a realm guest the ITS is emulated by the host. This means the
allocations must have been made available to the host by a call to
set_memory_decrypted(). Introduce an allocation function which performs
this extra call.

For the ITT use a custom genpool-based allocator that calls
set_memory_decrypted() for each page allocated, but then suballocates the
size needed for each ITT. Note that there is no mechanism implemented to
return pages from the genpool, but it is unlikely that the peak number of
devices will be much larger than the normal level - so this isn't expected
to be an issue.

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20241002141630.433502-2-steven.price@arm.com
2024-10-02 18:00:40 +02:00
Marc Zyngier
f97fd45876 irqchip/gic-v4: Fix ordering between vmapp and vpe locks
The recently established lock ordering mandates that the per-VM
vmapp_lock is acquired before taking the per-VPE lock.

As it turns out, its_vpe_set_affinity() takes the VPE lock, and
then calls into its_send_vmovp(), which itself takes the vmapp
lock. Obviously, this is a lock order violation.

As its_send_vmovp() is only called from its_vpe_set_affinity(),
hoist the vmapp locking from the former into the latter, restoring
the expected order.

Fixes: f0eb154c39471 ("irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock")
Reported-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240818171625.3030584-1-maz@kernel.org
2024-08-20 16:57:13 +02:00
Linus Torvalds
66ebbdfdeb Switch ARM/ARM64 over to the modern per device MSI domains:
This simplifies the handling of platform MSI and wire to MSI controllers
   and removes about 500 lines of legacy code.
 
   Aside of that it paves the way for ARM/ARM64 to utilize the dynamic
   allocation of PCI/MSI interrupts and to support the upcoming non
   standard IMS (Interrupt Message Store) mechanism on PCIe devices
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmaeheUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYocX4D/wLYD+DQDpA3U1XS8jPNE4vKcmBBNX8
 Mj4qdHsY8fK+FhmtLsj8FL3iSTymPtgXzFupXGS+5iFG3LhbW8JWEbqjbowcJ1c8
 /4w8sKyyWdCSScrCTrH4A3RrLNDAX3DzSMqqi17sdETuwtN0RJiXgcm/CwRXETmn
 kVqB7ddalyAR0Z2N/ym1fkuwyBAdeu3cBxMy/BWR6GFae1dAGe8Kr8GsmmuzBTFi
 DQSmkh6kZntTn9y+K7juXF+1q8InolmHiOOUeoUJachSCyp6nu9W2+S2MVUiuOA2
 X1/Ei3eKvkBHFDd7phZnIrVecuNehAQEV6BRMKOYBiDG4lwD6vCbbr9/YF5vBGni
 tbZAetk9VBpIj0YRVAz7WkLC2JmVbw4znlrDwe8+xeLeDwRXl9f4Xc1Udr0qKgpd
 1bNE1zG1z45v5J3OtJLJ4MCYcUCsQgv1CkUlNEdz5+NhXHT+W+oKJor/0WYJ3Qwe
 iqTEJ9BA1/SzvngwIt/uoMZlEjBl/0/T1UEMJvP/7oEqjl/UAEWGlpKnID3hsDc2
 GcIEOJod6hWzyPyeJUI6RpCHy4ZG93WL7Ks+lvzfp381yoDL5/KlveDtSomyuzYF
 2xXHUAvw8MAYfJ/CFft/DYme8sBpn1cxAMWdctEiAn0qfS7X1RNZ/RhQ2OXxRw3q
 tNpc0jEen9m72A==
 =2adH
 -----END PGP SIGNATURE-----

Merge tag 'irq-msi-2024-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull MSI interrupt updates from Thomas Gleixner:
 "Switch ARM/ARM64 over to the modern per device MSI domains.

  This simplifies the handling of platform MSI and wire to MSI
  controllers and removes about 500 lines of legacy code.

  Aside of that it paves the way for ARM/ARM64 to utilize the dynamic
  allocation of PCI/MSI interrupts and to support the upcoming non
  standard IMS (Interrupt Message Store) mechanism on PCIe devices"

* tag 'irq-msi-2024-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  irqchip/gic-v3-its: Correctly fish out the DID for platform MSI
  irqchip/gic-v3-its: Correctly honor the RID remapping
  genirq/msi: Move msi_device_data to core
  genirq/msi: Remove platform MSI leftovers
  irqchip/irq-mvebu-icu: Remove platform MSI leftovers
  irqchip/irq-mvebu-sei: Switch to MSI parent
  irqchip/mvebu-odmi: Switch to parent MSI
  irqchip/mvebu-gicp: Switch to MSI parent
  irqchip/irq-mvebu-icu: Prepare for real per device MSI
  irqchip/imx-mu-msi: Switch to MSI parent
  irqchip/gic-v2m: Switch to device MSI
  irqchip/gic_v3_mbi: Switch over to parent domain
  genirq/msi: Remove platform_msi_create_device_domain()
  irqchip/mbigen: Remove platform_msi_create_device_domain() fallback
  irqchip/gic-v3-its: Switch platform MSI to MSI parent
  irqchip/irq-msi-lib: Prepare for DOMAIN_BUS_WIRED_TO_MSI
  irqchip/mbigen: Prepare for real per device MSI
  irqchip/irq-msi-lib: Prepare for DEVICE MSI to replace platform MSI
  irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]
  irqchip/irq-msi-lib: Prepare for PCI MSI/MSIX
  ...
2024-07-22 14:02:19 -07:00
Linus Torvalds
ac7473a179 Updates for the interrupt subsystem:
- Core:
 
     - Provide a new mechanism to create interrupt domains. The existing
       interfaces have already too many parameters and it's a pain to expand
       any of this for new required functionality.
 
       The new function takes a pointer to a data structure as argument. The
       data structure combines all existing parameters and allows for easy
       extension.
 
       The first extension for this is to handle the instantiation of
       generic interrupt chips at the core level and to allow drivers to
       provide extra init/exit callbacks.
 
       This is necessary to do the full interrupt chip initialization before
       the new domain is published, so that concurrent usage sites won't see
       a half initialized interrupt domain. Similar problems exist on
       teardown.
 
       This has turned out to be a real problem due to the deferred and
       parallel probing which was added in recent years.
 
       Handling this at the core level allows to remove quite some accrued
       boilerplate code in existing drivers and avoids horrible workarounds
       at the driver level.
 
     - The usual small improvements all over the place
 
   - Drivers
 
     - Add support for LAN966x OIC and RZ/Five SoC
 
     - Split the STM ExtI driver into a microcontroller and a SMP version to
       allow building the latter as a module for multi-platform kernels.
 
     - Enable MSI support for Armada 370XP on platforms which do not support
       IPIs.
 
     - The usual small fixes and enhancements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmaVJbUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXTuD/9Tc9BhY5CW7HQkdPQu2Db1O+esprkQ
 Uo9lMpTTpPiy9btg4LONzLf4mjbufZpyKBxkRWoZFO0Zj5q4UE9NZYh7EcxrF5Tl
 CIFJmyteLsYuOyCmPrtSDSovonXjQKYBE3u2LVJNNkwEkhYbYW9sqIKeT8nneLv6
 53gd28ESFUEUjHNTblw/eXviweyUKSXc0qyg+3hgZQPMoh9RkdkEPvyaw9Y/s5Ce
 FelLLxzMqX86dR2TJMLqiaGiMpUu/kl+Yz2m5c77TwA2D68qjhHywbtKtlH7b3C6
 LMHu2dMrrKSJrLL8roVIYJdHAd1TKWVdnYhqv9WBHFTu1sDuztpR44mewbo8exUU
 L2RgVSGYNmeFC3p4wztWYSQfIVa9uOg7+TnJJdh7G0jLIeKM/TbufWqDAJAuoVPL
 QhGbZ5xNbZJZ8bvhhItjxpRN/kPs44p3mUGyRJBQzm+mDN118bqfmQzhLcwRbfE2
 smp73SQzg9alG2rGdNVEqkKmp8zhg2Crx2VCeVdgbeOxWQRet9zLWcp4FfCEUE9e
 eK3iEi8z+rmwafaf3rsxYdrdIRLaUmcni0v7R/16cJH/Cs7bU3Re8XyGhevo3lsO
 pJiP5wZDxbckwXNpLm3S/qPDW7vSCnuFPF7QmOvC3a70PsD+E4NKUgiwJuHtn/ZV
 pFBKzbQgCsowQA==
 =QCRH
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull interrupt subsystem updates from Thomas Gleixner:
 "Core:

   - Provide a new mechanism to create interrupt domains. The existing
     interfaces have already too many parameters and it's a pain to
     expand any of this for new required functionality.

     The new function takes a pointer to a data structure as argument.
     The data structure combines all existing parameters and allows for
     easy extension.

     The first extension for this is to handle the instantiation of
     generic interrupt chips at the core level and to allow drivers to
     provide extra init/exit callbacks.

     This is necessary to do the full interrupt chip initialization
     before the new domain is published, so that concurrent usage sites
     won't see a half initialized interrupt domain. Similar problems
     exist on teardown.

     This has turned out to be a real problem due to the deferred and
     parallel probing which was added in recent years.

     Handling this at the core level allows to remove quite some accrued
     boilerplate code in existing drivers and avoids horrible
     workarounds at the driver level.

   - The usual small improvements all over the place

  Drivers:

   - Add support for LAN966x OIC and RZ/Five SoC

   - Split the STM ExtI driver into a microcontroller and a SMP version
     to allow building the latter as a module for multi-platform
     kernels

   - Enable MSI support for Armada 370XP on platforms which do not
     support IPIs

   - The usual small fixes and enhancements all over the place"

* tag 'irq-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
  irqdomain: Fix the kernel-doc and plug it into Documentation
  genirq: Set IRQF_COND_ONESHOT in request_irq()
  irqchip/imx-irqsteer: Handle runtime power management correctly
  irqchip/gic-v3: Pass #redistributor-regions to gic_of_setup_kvm_info()
  irqchip/bcm2835: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
  irqchip/gic-v4: Make sure a VPE is locked when VMAPP is issued
  irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock
  irqchip/gic-v4: Always configure affinity on VPE activation
  Revert "irqchip/dw-apb-ictl: Support building as module"
  Revert "Loongarch: Support loongarch avec"
  arm64: Kconfig: Allow build irq-stm32mp-exti driver as module
  ARM: stm32: Allow build irq-stm32mp-exti driver as module
  irqchip/stm32mp-exti: Allow building as module
  irqchip/stm32mp-exti: Rename internal symbols
  irqchip/stm32-exti: Split MCU and MPU code
  arm64: Kconfig: Select STM32MP_EXTI on STM32 platforms
  ARM: stm32: Use different EXTI driver on ARMv7m and ARMv7a
  irqchip/stm32-exti: Add CONFIG_STM32MP_EXTI
  irqchip/dw-apb-ictl: Support building as module
  irqchip/riscv-aplic: Simplify the initialization code
  ...
2024-07-22 13:52:05 -07:00
Thomas Gleixner
48f71d56e2 irqchip/gic-v3-its: Provide MSI parent infrastructure
To support per device MSI domains the ITS must provide MSI parent domain
functionality.

Provide the basic skeleton for this:

   - msi_parent_ops
   - child domain init callback
   - the MSI parent flag set in irqdomain::flags

This does not make ITS a functional parent domain as there is no bit set in
the bus_select_mask yet, but it provides the base to implement PCI and
platform MSI support gradually on top.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Shivamurthy Shastri <shivamurthy.shastri@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240623142234.903076277@linutronix.de
2024-07-18 20:31:19 +02:00
Marc Zyngier
a84a07fa31 irqchip/gic-v4: Make sure a VPE is locked when VMAPP is issued
In order to make sure that vpe->col_idx is correctly sampled when a VMAPP
command is issued, the vpe_lock must be held for the VPE. This is now
possible since the introduction of the per-VM vmapp_lock, which can be
taken before vpe_lock in the correct locking order.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nianyao Tang <tangnianyao@huawei.com>
Link: https://lore.kernel.org/r/20240705093155.871070-4-maz@kernel.org
2024-07-15 15:13:55 +02:00
Marc Zyngier
f0eb154c39 irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock
vmovp_lock is abused in a number of cases to serialise updates
to vlpi_count[] and deal with map/unmap of a VM to ITSs.

Instead, provide a per-VM lock and revisit the use of vlpi_count[]
so that it is always wrapped in this per-VM vmapp_lock.

This reduces the potential contention on a concurrent VMOVP command,
and paves the way for subsequent VPE locking that holding vmovp_lock
actively prevents due to the lock ordering.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nianyao Tang <tangnianyao@huawei.com>
Link: https://lore.kernel.org/r/20240705093155.871070-3-maz@kernel.org
2024-07-15 15:13:55 +02:00
Marc Zyngier
7d2c2048a8 irqchip/gic-v4: Always configure affinity on VPE activation
There are currently two paths to set the initial affinity of a VPE:

 - at activation time on GICv4 without the stupid VMOVP list, and
   on GICv4.1

 - at map time for GICv4 with VMOVP list

The latter location may end-up modifying the affinity of VPE that is
currently running, making the results unpredictible.

Instead, unify the two paths, making sure to set the initial affinity only
at activation time.

Reported-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nianyao Tang <tangnianyao@huawei.com>
Link: https://lore.kernel.org/r/20240705093155.871070-2-maz@kernel.org
2024-07-15 15:13:55 +02:00
Mark Rutland
a6156e7031 irqchip/gic-v3: Make distributor priorities variables
In subsequent patches the GICv3 driver will choose the regular interrupt
priority at boot time.

In preparation for using dynamic priorities, place the priorities in
variables and update the code to pass these as parameters. Users of
GICD_INT_DEF_PRI_X4 are modified to replicate the priority byte using
REPEAT_BYTE_U32().

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240617111841.2529370-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2024-06-24 18:16:44 +01:00
Lorenzo Pieralisi
ababa16fd9 irqchip/gic-v3: Enable non-coherent redistributors/ITSes ACPI probing
The GIC architecture specification defines a set of registers for
redistributors and ITSes that control the sharebility and cacheability
attributes of redistributors/ITSes initiator ports on the interconnect
(GICR_[V]PROPBASER, GICR_[V]PENDBASER, GITS_BASER<n>).

Architecturally the GIC provides a means to drive shareability and
cacheability attributes signals but it is not mandatory for designs to
wire up the corresponding interconnect signals that control the
cacheability/shareability of transactions.

Redistributors and ITSes interconnect ports can be connected to
non-coherent interconnects that are not able to manage the
shareability/cacheability attributes; this implicitly makes the
redistributors and ITSes non-coherent observers.

To enable non-coherent GIC designs on ACPI based systems, parse the MADT
GICC/GICR/ITS subtables non-coherent flags to determine whether the
respective components are non-coherent observers and force the
shareability attributes to be programmed into the redistributors and
ITSes registers.

An ACPI global function (acpi_get_madt_revision()) is added to retrieve
the MADT revision, in that it is essential to check the MADT revision
before checking for flags that were added with MADT revision 7 so that
if the kernel is booted with an ACPI MADT table with revision < 7 it
skips parsing the newly added flags (that should be zeroed reserved
values for MADT versions < 7 but they could turn out to be buggy and
should be ignored).

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20240606094238.757649-2-lpieralisi@kernel.org
2024-06-06 16:30:15 +02:00
Hagar Hemdan
b97e8a2f71 irqchip/gic-v3-its: Fix potential race condition in its_vlpi_prop_update()
its_vlpi_prop_update() calls lpi_write_config() which obtains the
mapping information for a VLPI without lock held. So it could race
with its_vlpi_unmap().

Since all calls from its_irq_set_vcpu_affinity() require the same
lock to be held, hoist the locking there instead of sprinkling the
locking all over the place.

This bug was discovered using Coverity Static Analysis Security Testing
(SAST) by Synopsys, Inc.

[ tglx: Use guard() instead of goto ]

Fixes: 015ec0386ab6 ("irqchip/gic-v3-its: Add VLPI configuration handling")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Hagar Hemdan <hagarhem@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240531162144.28650-1-hagarhem@amazon.com
2024-06-03 18:20:00 +02:00
Linus Torvalds
6bfd2d442a Updates for the interrupt subsystem:
- Core code:
 
    - Interrupt storm detection for the lockup watchdog:
 
      Lockups which are caused by interrupt storms are not easy to debug
      because there is no information about the events which make the lockup
      detector trigger.
 
      To make this more user friendly, provide an extenstion to interrupt
      statistics which allows to take snapshots and an interface to retrieve
      the delta to the snapshot. Use this new mechanism in the watchdog code
      to do a two stage lockup analysis by taking the snapshot and printing
      the deltas for the topmost active interrupts on the second trigger.
 
      Note: This contains both the interrupt and the watchdog changes as
      the latter depend on the former obviously.
 
   - Avoid summation loops in the /proc/interrupts output and use the global
     counter when possible
 
   - Skip suspended interrupts on CPU hotplug operations to ensure that they
     are not delivered before the system resumes the device drivers when
     coming out of suspend.
 
   - On CPU hot-unplug interrupts which are affine to the outgoing CPU are
     migrated to a different CPU in the affinity mask. This can fail when
     the CPUs have no vectors left. Instead of giving up try to migrate it
     to any online CPU and thereby breaking the affinity setting in order to
     prevent a stale device interrupt which targets an offline CPU
 
   - The usual small cleanups
 
  - Driver code:
 
   - Support for the RISCV AIA MSI controller
 
   - Make the interrupt allocation for the Loongson PCH controller more
     flexible to prevent vector exhaustion
 
   - The usual set of cleanups and fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmZBCM0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoeZHEACqMLN3K+1HyWflYtcTHJeYCjZLHS77
 2tQeKaaskOA4W6dcGXPxMw5CHqAobHVQQMqgcJxhUdqQiOJnFFnrtCD7JtqM0hWK
 UORNbyeovuhAo+iJ0fTuS8p63H7vm2GIWwBLWJnOuChYv/6Yyx5Cald1skvyvbzL
 zePhiiAf5mkdmJMeT5wJSCqEWSRYOXsVAJ/0YAwFG3bKkJH3bmDo6SDJY02sXT5P
 pjbtD/0hum9wIVT4fNdYleHHQMdBdj9dLlcxXBikHq50mDMw7GxvjKiLcXmoerw3
 rEBfVVJp3qpSofpNJZ3HH0ywcF3yUzq04/LPE9Tk2MoQ8NF0GzP8r9Ahke4B7cUj
 FysWNiAlC2IisEi6th313FZkTLx0zgewdsdEBTLt8eAE9TU0wamRbo99LZ8i/Qr3
 hk7jV8DzL+EDQJLgl4p1iPJgA708eW17tbCxLEa15VKVV6P58miohmhx/IfPO2Gx
 FV1PPehtItsmiK/UoRtUCoFdFsqNQtOE+h8DWLyy8RDmhBqGbn9Ut4euXiQIF+rX
 WJKPFfslCTR39BrBcZnZeNsgOCN7tEfFRstzjzkey1DaeTGWtxmA5UGhpC2vT74y
 YyXluvZlgKr4S64ABmcqQj++hQLho0OQAih3uW5YVxt4VxEUcXYMJOsV1AQGpMjF
 UnewWH5opBQdfw==
 =jFLf
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull interrupt subsystem updates from Thomas Gleixner:
 "Core code:

   - Interrupt storm detection for the lockup watchdog:

     Lockups which are caused by interrupt storms are not easy to debug
     because there is no information about the events which make the
     lockup detector trigger.

     To make this more user friendly, provide an extenstion to interrupt
     statistics which allows to take snapshots and an interface to
     retrieve the delta to the snapshot. Use this new mechanism in the
     watchdog code to do a two stage lockup analysis by taking the
     snapshot and printing the deltas for the topmost active interrupts
     on the second trigger.

     Note: This contains both the interrupt and the watchdog changes as
     the latter depend on the former obviously.

   - Avoid summation loops in the /proc/interrupts output and use the
     global counter when possible

   - Skip suspended interrupts on CPU hotplug operations to ensure that
     they are not delivered before the system resumes the device drivers
     when coming out of suspend.

   - On CPU hot-unplug interrupts which are affine to the outgoing CPU
     are migrated to a different CPU in the affinity mask. This can fail
     when the CPUs have no vectors left. Instead of giving up try to
     migrate it to any online CPU and thereby breaking the affinity
     setting in order to prevent a stale device interrupt which targets
     an offline CPU

   - The usual small cleanups

  Driver code:

   - Support for the RISCV AIA MSI controller

   - Make the interrupt allocation for the Loongson PCH controller more
     flexible to prevent vector exhaustion

   - The usual set of cleanups and fixes all over the place"

* tag 'irq-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  irqchip/gic-v3-its: Remove BUG_ON in its_vpe_irq_domain_alloc
  cpuidle: Avoid explicit cpumask allocation on stack
  irqchip/sifive-plic: Avoid explicit cpumask allocation on stack
  irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack
  irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack
  irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
  irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack
  cpumask: Introduce cpumask_first_and_and()
  irqchip/irq-brcmstb-l2: Avoid saving mask on shutdown
  genirq: Reuse irq_is_nmi()
  genirq/cpuhotplug: Retry with cpu_online_mask when migration fails
  genirq/cpuhotplug: Skip suspended interrupts when restoring affinity
  arm64: dts: st: Add interrupt parent to pinctrl on stm32mp251
  arm64: dts: st: Add exti1 and exti2 nodes on stm32mp251
  ARM: dts: stm32: List exti parent interrupts on stm32mp131
  ARM: dts: stm32: List exti parent interrupts on stm32mp151
  arm64: Kconfig.platforms: Enable STM32_EXTI for ARCH_STM32
  irqchip/stm32-exti: Mark events reserved with RIF configuration check
  irqchip/stm32-exti: Skip secure events
  irqchip/stm32-exti: Convert driver to standard PM
  ...
2024-05-14 09:47:14 -07:00
Guanrui Huang
382d2ffe86 irqchip/gic-v3-its: Remove BUG_ON in its_vpe_irq_domain_alloc
This BUG_ON() is useless, because the same effect will be obtained 
by letting the code run its course and vm being dereferenced,
triggering an exception.

So just remove this check.

Signed-off-by: Guanrui Huang <guanrui.huang@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240418061053.96803-3-guanrui.huang@linux.alibaba.com
2024-04-25 14:38:24 +02:00
Guanrui Huang
c26591afd3 irqchip/gic-v3-its: Prevent double free on error
The error handling path in its_vpe_irq_domain_alloc() causes a double free
when its_vpe_init() fails after successfully allocating at least one
interrupt. This happens because its_vpe_irq_domain_free() frees the
interrupts along with the area bitmap and the vprop_page and
its_vpe_irq_domain_alloc() subsequently frees the area bitmap and the
vprop_page again.

Fix this by unconditionally invoking its_vpe_irq_domain_free() which
handles all cases correctly and by removing the bitmap/vprop_page freeing
from its_vpe_irq_domain_alloc().

[ tglx: Massaged change log ]

Fixes: 7d75bbb4bc1a ("irqchip/gic-v3-its: Add VPE irq domain allocation/teardown")
Signed-off-by: Guanrui Huang <guanrui.huang@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240418061053.96803-2-guanrui.huang@linux.alibaba.com
2024-04-25 14:30:46 +02:00
Dawei Li
fcb8af4cbc irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Remove cpumask var on stack and use cpumask_any_and() to address it.

Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240416085454.3547175-4-dawei.li@shingroup.cn
2024-04-24 21:23:49 +02:00
Nianyao Tang
80e9963fb3 irqchip/gic-v3-its: Fix VSYNC referencing an unmapped VPE on GIC v4.1
As per the GICv4.1 spec (Arm IHI 0069H, 5.3.19):

 "A VMAPP with {V, Alloc}=={0, x} is self-synchronizing, This means the ITS
  command queue does not show the command as consumed until all of its
  effects are completed."

Furthermore, VSYNC is allowed to deliver an SError when referencing a
non existent VPE.

By these definitions, a VMAPP followed by a VSYNC is a bug, as the
later references a VPE that has been unmapped by the former.

Fix it by eliding the VSYNC in this scenario.

Fixes: 64edfaa9a234 ("irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP")
Signed-off-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20240406022737.3898763-1-tangnianyao@huawei.com
2024-04-09 11:11:18 +02:00
Linus Torvalds
02d4df78c5 Updates for the interrupt subsystem:
- Core:
 
    - Make affinity changes immediately effective for interrupt
      threads. This reduces the impact on isolated CPUs as it pulls over the
      thread right away instead of doing it after the next hardware
      interrupt arrived.
 
    - Cleanup and improvements for the interrupt chip simulator
 
    - Deduplication of the interrupt descriptor initialization code so the
      sparse and non-sparse mode share more code.
 
  - Drivers:
 
    - A set of conversions to platform_drivers::remove_new() which gets rid
      of the pointless return value.
 
    - A new driver for the Starfive JH8100 SoC
 
    - Support for Amlogic-T7 SoCs
 
    - Improvement for the interrupt handling and EOI management for the
      loongson interrupt controller.
 
    - The usual fixes and improvements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXt6RUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRahEACenZz//vEy+n5t94UCNoYEBsqL4qsl
 eHb2LPkOwJdzy0I0et8sSRfmjFgfmiB5vmcOtuTjbA+pAASMU16M5nU38dD4Qw7V
 lwfutv3wb0XT7INslvrsEF4SvhapoiSBtzdK4IEVJysaHek/bbvZg8rot2tXTjCR
 3sK4sMuWLXxB+MzcaYEXSZlIlsrXcARHYNVCbudsEqL2Rt7mGtBJBMIPAYXaWLMn
 Y1B15huDNcj+Z9s/rbX218oSajEYJv24NE7JW/eYhG8Rv3yc+1zMTIARq35V77/3
 KIV15XqKozkR4G8BEzQ1hUp6l1cggOjMslkwjyKnXTddkHQnQs5928/48y1qs4W0
 IDpJqpPL30ckfzg/fUKfUU98t95qB4X55jmK3LuiWfdS8cfd65gq4Ro2bIszM1NQ
 SYhcTvZRRcNJqlbO3rQfFAmVU0bvVyR3DlmrLzVl2tH5touwNBBQ/3D3o7CRGEns
 37c07zjVZnir+HFmrtTKOiENTay+fHrtIw5dFf7FMqREpE4kL/nsgZfN0wgZPUHj
 QGFExV/kJNSMvqwCz77uvHt6c5uoVZGn2j8iYAdqWVKYRcWCMids2gVEkc8QK4gQ
 eWsIEAClIEjArPqpQzPE2v3a9puCmOpbHWRmU7VDtNka9/ur8qoU2KMXMJBySaL4
 UKXfWYE+43RVbQ==
 =AbVv
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Core:

   - Make affinity changes take effect immediately for interrupt
     threads. This reduces the impact on isolated CPUs as it pulls over
     the thread right away instead of doing it after the next hardware
     interrupt arrived.

   - Cleanup and improvements for the interrupt chip simulator

   - Deduplication of the interrupt descriptor initialization code so
     the sparse and non-sparse mode share more code.

  Drivers:

   - A set of conversions to platform_drivers::remove_new() which gets
     rid of the pointless return value.

   - A new driver for the Starfive JH8100 SoC

   - Support for Amlogic-T7 SoCs

   - Improvement for the interrupt handling and EOI management for the
     loongson interrupt controller.

   - The usual fixes and improvements all over the place"

* tag 'irq-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  irqchip/ts4800: Convert to platform_driver::remove_new() callback
  irqchip/stm32-exti: Convert to platform_driver::remove_new() callback
  irqchip/renesas-rza1: Convert to platform_driver::remove_new() callback
  irqchip/renesas-irqc: Convert to platform_driver::remove_new() callback
  irqchip/renesas-intc-irqpin: Convert to platform_driver::remove_new() callback
  irqchip/pruss-intc: Convert to platform_driver::remove_new() callback
  irqchip/mvebu-pic: Convert to platform_driver::remove_new() callback
  irqchip/madera: Convert to platform_driver::remove_new() callback
  irqchip/ls-scfg-msi: Convert to platform_driver::remove_new() callback
  irqchip/keystone: Convert to platform_driver::remove_new() callback
  irqchip/imx-irqsteer: Convert to platform_driver::remove_new() callback
  irqchip/imx-intmux: Convert to platform_driver::remove_new() callback
  irqchip/imgpdc: Convert to platform_driver::remove_new() callback
  irqchip: Add StarFive external interrupt controller
  dt-bindings: interrupt-controller: Add starfive,jh8100-intc
  arm64: dts: Add gpio_intc node for Amlogic-T7 SoCs
  irqchip/meson-gpio: Add support for Amlogic-T7 SoCs
  dt-bindings: interrupt-controller: Add support for Amlogic-T7 SoCs
  irqchip/vic: Fix a kernel-doc warning
  genirq: Wake interrupt threads immediately when changing affinity
  ...
2024-03-11 13:50:30 -07:00
Oliver Upton
ec4308ecfc irqchip/gic-v3-its: Do not assume vPE tables are preallocated
The GIC/ITS code is designed to ensure to pick up any preallocated LPI
tables on the redistributors, as enabling LPIs is a one-way switch. There
is no such restriction for vLPIs, and for GICv4.1 it is expected to
allocate a new vPE table at boot.

This works as intended when initializing an ITS, however when setting up a
redistributor in cpu_init_lpis() the early return for preallocated RD
tables skips straight past the GICv4 setup. This all comes to a head when
trying to kexec() into a new kernel, as the new kernel silently fails to
set up GICv4, leading to a complete loss of SGIs and LPIs for KVM VMs.

Slap a band-aid on the problem by ensuring its_cpu_init_lpis() always
initializes GICv4 on the way out, even if the other RD tables were
preallocated.

Fixes: 6479450f72c1 ("irqchip/gic-v4: Fix occasional VLPI drop")
Reported-by: George Cherian <gcherian@marvell.com>
Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240219185809.286724-2-oliver.upton@linux.dev
2024-02-21 21:11:20 +01:00
Marc Zyngier
af9acbfc2c irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
When updating the affinity of a VPE, the VMOVP command is currently skipped
if the two CPUs are part of the same VPE affinity.

But this is wrong, as the doorbell corresponding to this VPE is still
delivered on the 'old' CPU, which screws up the balancing.  Furthermore,
offlining that 'old' CPU results in doorbell interrupts generated for this
VPE being discarded.

The harsh reality is that VMOVP cannot be elided when a set_affinity()
request occurs. It needs to be obeyed, and if an optimisation is to be
made, it is at the point where the affinity change request is made (such as
in KVM).

Drop the VMOVP elision altogether, and only use the vpe_table_mask
to try and stay within the same ITS affinity group if at all possible.

Fixes: dd3f050a216e (irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP)
Reported-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240213101206.2137483-4-maz@kernel.org
2024-02-13 11:29:52 +01:00
Marc Zyngier
8b02da04ad irqchip/gic-v3-its: Restore quirk probing for ACPI-based systems
While refactoring the way the ITSs are probed, the handling of quirks
applicable to ACPI-based platforms was lost. As a result, systems such as
HIP07 lose their GICv4 functionnality, and some other may even fail to
boot, unless they are configured to boot with DT.

Move the enabling of quirks into its_probe_one(), making it common to all
firmware implementations.

Fixes: 9585a495ac93 ("irqchip/gic-v3-its: Split allocation from initialisation of its_node")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240213101206.2137483-3-maz@kernel.org
2024-02-13 11:29:52 +01:00
Marc Zyngier
846297e11e irqchip/gic-v3-its: Handle non-coherent GICv4 redistributors
Although the GICv3 code base has gained some handling of systems failing to
handle the shareability attributes, the GICv4 side of things has been
firmly ignored.

This is unfortunate, as the new recent addition of the "dma-noncoherent" is
supposed to apply to all of the GICR tables, and not just the ones that are
common to v3 and v4.

Add some checks to handle the VPROPBASE/VPENDBASE shareability and
cacheability attributes in the same way we deal with the other GICR_BASE
registers, wrapping the flag check in a helper for improved readability.

Note that this has been found by inspection only, as I don't have access to
HW that suffers from this particular issue.

Fixes: 3a0fff0fb6a3 ("irqchip/gic-v3: Enable non-coherent redistributors/ITSes DT probing")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Link: https://lore.kernel.org/r/20240213101206.2137483-2-maz@kernel.org
2024-02-13 11:29:51 +01:00
Christophe JAILLET
ee4c1592b7 irqchip/gic-v3-its: Remove usage of the deprecated ida_simple_xx() API
ida_alloc() and ida_free() should be used instead of the deprecated
ida_simple_get() and ida_simple_remove().

The upper limit of ida_simple_get() is exclusive, but the one of
ida_alloc_max() is inclusive. Adjust the code accordingly.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/3b472b0e7edf6e483b8b255cf8d1cb0163532adf.1705222332.git.christophe.jaillet@wanadoo.fr
2024-02-13 11:18:55 +01:00
Kirill A. Shutemov
5e0a760b44 mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive.  This has caused
issues with code that was not yet upstream and depended on the previous
definition.

To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.

Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-08 15:27:15 -08:00
Fang Xiang
d3badb1561 irqchip/gic-v3-its: Flush ITS tables correctly in non-coherent GIC designs
In non-coherent GIC designs, the ITS tables must be flushed before writing
to the GITS_BASER<n> registers, otherwise the ITS could read dirty tables,
which results in unpredictable behavior.

Flush the tables right at the begin of its_setup_baser() to prevent that.

[ tglx: Massage changelog ]

Fixes: a8707f553884 ("irqchip/gic-v3: Add Rockchip 3588001 erratum workaround")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fang Xiang <fangxiang3@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231030083256.4345-1-fangxiang3@xiaomi.com
2023-11-06 01:16:33 +01:00
Marc Zyngier
f199bf5bf8 irqchip/gic-v3-its: Don't override quirk settings with default values
When splitting the allocation of the ITS node from its configuration,
some of the default settings were kept in the latter instead of
being moved to the former.

This has the side effect of negating some of the quirk detections that
have happened in between, amongst which the dreaded Synquacer hack
(that also affect Dominic's TI platform).

Move the initialisation of these fields early, so that they can again be
overriden by the Synquacer quirk.

Fixes: 9585a495ac93 ("irqchip/gic-v3-its: Split allocation from initialisation of its_node")
Reported by: Dominic Rath <dominic.rath@ibv-augsburg.net>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dominic Rath <dominic.rath@ibv-augsburg.net>
Link: https://lore.kernel.org/r/20231024084831.GA3788@JADEVM-DRA
Link: https://lore.kernel.org/r/20231024143431.2144579-1-maz@kernel.org
2023-10-25 21:44:49 +02:00
Lorenzo Pieralisi
3a0fff0fb6 irqchip/gic-v3: Enable non-coherent redistributors/ITSes DT probing
The GIC architecture specification defines a set of registers
for redistributors and ITSes that control the sharebility and
cacheability attributes of redistributors/ITSes initiator ports
on the interconnect (GICR_[V]PROPBASER, GICR_[V]PENDBASER,
GITS_BASER<n>).

Architecturally the GIC provides a means to drive shareability
and cacheability attributes signals and related IWB/OWB/ISH barriers
but it is not mandatory for designs to wire up the corresponding
interconnect signals that control the cacheability/shareability
of transactions.

Redistributors and ITSes interconnect ports can be connected to
non-coherent interconnects that are not able to manage the
shareability/cacheability attributes; this implicitly makes
the redistributors and ITSes non-coherent observers.

So far, the GIC driver on probe executes a write to "probe" for
the redistributors and ITSes registers shareability bitfields
by writing a value (ie InnerShareable - the shareability domain the
CPUs are in) and check it back to detect whether the value sticks or
not; this hinges on a GIC programming model behaviour that predates the
current specifications, that just define shareability bits as writeable
but do not guarantee that writing certain shareability values
enable the expected behaviour for the redistributors/ITSes
memory interconnect ports.

To enable non-coherent GIC designs, introduce the "dma-noncoherent"
device tree property to allow firmware to describe redistributors and
ITSes as non-coherent observers on the memory interconnect and use the
property to force the shareability attributes to be programmed into the
redistributors and ITSes registers through the GIC quirks mechanism.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231006125929.48591-3-lpieralisi@kernel.org
2023-10-07 12:47:12 +01:00
Marc Zyngier
9585a495ac irqchip/gic-v3-its: Split allocation from initialisation of its_node
In order to pave the way for more fancy quirk handling without making
more of a mess of this terrible driver, split the allocation of the
ITS descriptor (its_node) from the actual probing.

This will allow firmware-specific hooks to be added between these
two points.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231006125929.48591-4-lpieralisi@kernel.org
2023-10-07 12:47:12 +01:00
Sebastian Reichel
567f67acac irqchip/gic-v3: Enable Rockchip 3588001 erratum workaround for RK3588S
Commit a8707f553884 ("irqchip/gic-v3: Add Rockchip 3588001 erratum
workaround") mentioned RK3588S (the slimmed down variant of RK3588)
being affected, but did not check for its compatible value. Thus the
quirk is not applied on RK3588S. Since the GIC ITS node got added to the
upstream DT, boards using RK3588S are no longer booting without this
quirk being applied.

Fixes: 06cdac8e8407 ("arm64: dts: rockchip: add GIC ITS support to rk3588")
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230703164129.193991-1-sebastian.reichel@collabora.com
2023-07-03 19:48:04 +01:00
Marc Zyngier
926846a703 irqchip/gic-v4.1: Properly lock VPEs when doing a directLPI invalidation
We normally rely on the irq_to_cpuid_[un]lock() primitives to make
sure nothing will change col->idx while performing a LPI invalidation.

However, these primitives do not cover VPE doorbells, and we have
some open-coded locking for that. Unfortunately, this locking is
pretty bogus.

Instead, extend the above primitives to cover VPE doorbells and
convert the whole thing to it.

Fixes: f3a059219bc7 ("irqchip/gic-v4.1: Ensure mutual exclusion between vPE affinity change and RD access")
Reported-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: wanghaibin.wang@huawei.com
Tested-by: Kunkun Jiang <jiangkunkun@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20230617073242.3199746-1-maz@kernel.org
2023-07-03 19:48:04 +01:00
James Gowans
8f4b589595 irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
GICv3 LPIs are impacted by an architectural design issue: they do not
have a global active state and as such a given LPI can be delivered to
a new CPU after an affinity change while the previous instance of the
same LPI handler has not yet completed on the original CPU.

If LPIs had an active state, this second LPI would not be delivered
until the first CPU deactivated the initial LPI, just like SPIs.

To solve this issue, use the newly introduced IRQD_RESEND_WHEN_IN_PROGRESS
flag, ensuring that we do not lose an LPI being delivered during that window
by getting the GIC to resend it.

This workaround gets enabled for all LPIs, including the VPE doorbells.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
Cc: Yipeng Zou <zouyipeng@huawei.com>
Cc: Zhang Jianhua <chris.zjh@huawei.com>
[maz: massaged commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230608120021.3273400-4-jgowans@amazon.com
2023-06-16 12:23:40 +01:00
Linus Torvalds
7fa8a8ee94 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
 
 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
 
 - zsmalloc performance improvements from Sergey Senozhatsky.
 
 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.
 
 - VFS rationalizations from Christoph Hellwig:
 
   - removal of most of the callers of write_one_page().
 
   - make __filemap_get_folio()'s return value more useful
 
 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing.  Use `mount -o noswap'.
 
 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.
 
 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).
 
 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.
 
 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
   than exclusive, and has fixed a bunch of errors which were caused by its
   unintuitive meaning.
 
 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.
 
 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.
 
 - Cleanups to do_fault_around() by Lorenzo Stoakes.
 
 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.
 
 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.
 
 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.
 
 - Lorenzo has also contributed further cleanups of vma_merge().
 
 - Chaitanya Prakash provides some fixes to the mmap selftesting code.
 
 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.
 
 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.
 
 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.
 
 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.
 
 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.
 
 - Yosry Ahmed has improved the performance of memcg statistics flushing.
 
 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.
 
 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.
 
 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.
 
 - Pankaj Raghav has made page_endio() unneeded and has removed it.
 
 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.
 
 - Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
 
 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.
 
 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.
 
 - Peter Xu fixes a few issues with uffd-wp at fork() time.
 
 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
 jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
 =J+Dj
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-27 19:42:02 -07:00
Sebastian Reichel
a8707f5538 irqchip/gic-v3: Add Rockchip 3588001 erratum workaround
Rockchip RK3588/RK3588s GIC600 integration does not support the
sharability feature. Rockchip assigned Erratum ID #3588001 for this
issue.

Note, that the 0x0201743b ID is not Rockchip specific and thus
there is an extra of_machine_is_compatible() check.

The flags are named FORCE_NON_SHAREABLE to be vendor agnostic,
since apparently similar integration design errors exist in other
platforms and they can reuse the same flag.

Co-developed-by: XiaoDong Huang <derrick.huang@rock-chips.com>
Signed-off-by: XiaoDong Huang <derrick.huang@rock-chips.com>
Co-developed-by: Kever Yang <kever.yang@rock-chips.com>
Signed-off-by: Kever Yang <kever.yang@rock-chips.com>
Co-developed-by: Lucas Tanure <lucas.tanure@collabora.com>
Signed-off-by: Lucas Tanure <lucas.tanure@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230418142109.49762-2-sebastian.reichel@collabora.com
2023-04-18 17:31:17 +01:00
Kirill A. Shutemov
23baf831a3 mm, treewide: redefine MAX_ORDER sanely
MAX_ORDER currently defined as number of orders page allocator supports:
user can ask buddy allocator for page order between 0 and MAX_ORDER-1.

This definition is counter-intuitive and lead to number of bugs all over
the kernel.

Change the definition of MAX_ORDER to be inclusive: the range of orders
user can ask from buddy allocator is 0..MAX_ORDER now.

[kirill@shutemov.name: fix min() warning]
  Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
  Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
  Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:46 -07:00
Linus Torvalds
143c7bc649 iommufd for 6.3
Some polishing and small fixes for iommufd:
 
 - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem
 
 - Use GFP_KERNEL_ACCOUNT inside the iommu_domains
 
 - Support VFIO_NOIOMMU mode with iommufd
 
 - Various typos
 
 - A list corruption bug if HWPTs are used for attach
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/TgzQAKCRCFwuHvBreF
 Ya3AAP4/WxTJIbDvtTyH3Fae3NxTdO8j8gsUvU1vrRYG83zdnAEAxd1yii7GEO8D
 crkeq9D4FUiPAkFnJ64Exw2FHb060Qg=
 =RABK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Some polishing and small fixes for iommufd:

   - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt
     subsystem

   - Use GFP_KERNEL_ACCOUNT inside the iommu_domains

   - Support VFIO_NOIOMMU mode with iommufd

   - Various typos

   - A list corruption bug if HWPTs are used for attach"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Do not add the same hwpt to the ioas->hwpt_list twice
  iommufd: Make sure to zero vfio_iommu_type1_info before copying to user
  vfio: Support VFIO_NOIOMMU with iommufd
  iommufd: Add three missing structures in ucmd_buffer
  selftests: iommu: Fix test_cmd_destroy_access() call in user_copy
  iommu: Remove IOMMU_CAP_INTR_REMAP
  irq/s390: Add arch_is_isolated_msi() for s390
  iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI
  genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI
  genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code
  iommufd: Convert to msi_device_has_isolated_msi()
  vfio/type1: Convert to iommu_group_has_isolated_msi()
  iommu: Add iommu_group_has_isolated_msi()
  genirq/msi: Add msi_device_has_isolated_msi()
2023-02-24 14:34:12 -08:00
Johan Hovold
1e46e040de irqchip/gic-v3-its: Use irq_domain_create_hierarchy()
Use the irq_domain_create_hierarchy() helper to create the hierarchical
domain, which both serves as documentation and avoids poking at
irqdomain internals.

Note that the domain host_data was first set to the struct its_node
during allocation only to immediately be overwritten with the struct
msi_domain_info.

Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-17-johan+linaro@kernel.org
2023-02-13 19:31:25 +00:00
Jason Gunthorpe
dcb83f6ec1 genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI
What x86 calls "interrupt remapping" is one way to achieve isolated MSI,
make it clear this is talking about isolated MSI, no matter how it is
achieved. This matches the new driver facing API name of
msi_device_has_isolated_msi()

No functional change.

Link: https://lore.kernel.org/r/6-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-11 16:27:23 -04:00
Linus Torvalds
f23cdfcd04 IOMMU Updates for Linux v6.1:
Including:
 
 	- Removal of the bus_set_iommu() interface which became
 	  unnecesary because of IOMMU per-device probing
 
 	- Make the dma-iommu.h header private
 
 	- Intel VT-d changes from Lu Baolu:
 	  - Decouple PASID and PRI from SVA
 	  - Add ESRTPS & ESIRTPS capability check
 	  - Cleanups
 
 	- Apple DART support for the M1 Pro/MAX SOCs
 
 	- Support for AMD IOMMUv2 page-tables for the DMA-API layer. The
 	  v2 page-tables are compatible with the x86 CPU page-tables.
 	  Using them for DMA-API prepares support for hardware-assisted
 	  IOMMU virtualization
 
 	- Support for MT6795 Helio X10 M4Us in the Mediatek IOMMU driver
 
 	- Some smaller fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmNEC5oACgkQK/BELZcB
 GuNcOQ/6A5SXmcvDRLYZW1ENM5Z6xsZ1LabSZkjhYSpmbJyu8Uny/Z2aRWqxPMLJ
 hJeHTsWSLhrTq1VfjFhELHB3kgT2DRr7H3LXXaMNC6qz690EcavX1wKX2AxH0m22
 8YrktkyAmFQ3BG6rsQLdlMMasLph/x06ix/xO9opQZVFdj/fV0Jx7ekX1JK+U3hx
 MI96i5W3G5PBVHBypAvjxSlmA4saj9Fhk7l3IZL7py9AOKz7NypuwWRs+86PMBiO
 EzLt5aF4g8pmKChF/c9BsoIbjBYvTG/s3NbycIng0ACc2SOvf+EvtoVZQclWifbT
 lwti9PLdsoVUnPOZHLYOTx4xSf/UyoLVzaLxJ52aoXnNYe2qaX5DANXhT2mWIY/Y
 z1mzOkShmK7WF7a8arRyqJeLJ4SvDx8GrbvLiom3DAzmqVHzzFGadHtt5fvGYN4F
 Jet/JIN3HjECQbamqtPBpWquBFhLmgusPksIiyMFscRvYdZqkaVkTkElcF3WqAMm
 QkeecfoTQ9Vdtdz44ZVLRjKpS77yRZmHshp1r/rfSI+9Ok8uRI+xmmcyrAI6ElqH
 DH14tLHPzw694rTHF+bTCd+pPMGOoFLi0xAfUXAeGWm1uzC1JIRrVu5JeQNOUOSD
 5SQDXB7dPrhXngaws5Fx2u3amCO3688mslcGgM7q54kC+LyVo0E=
 =h0sT
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - remove the bus_set_iommu() interface which became unnecesary because
   of IOMMU per-device probing

 - make the dma-iommu.h header private

 - Intel VT-d changes from Lu Baolu:
	  - Decouple PASID and PRI from SVA
	  - Add ESRTPS & ESIRTPS capability check
	  - Cleanups

 - Apple DART support for the M1 Pro/MAX SOCs

 - support for AMD IOMMUv2 page-tables for the DMA-API layer.

   The v2 page-tables are compatible with the x86 CPU page-tables. Using
   them for DMA-API prepares support for hardware-assisted IOMMU
   virtualization

 - support for MT6795 Helio X10 M4Us in the Mediatek IOMMU driver

 - some smaller fixes and cleanups

* tag 'iommu-updates-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (59 commits)
  iommu/vt-d: Avoid unnecessary global DMA cache invalidation
  iommu/vt-d: Avoid unnecessary global IRTE cache invalidation
  iommu/vt-d: Rename cap_5lp_support to cap_fl5lp_support
  iommu/vt-d: Remove pasid_set_eafe()
  iommu/vt-d: Decouple PASID & PRI enabling from SVA
  iommu/vt-d: Remove unnecessary SVA data accesses in page fault path
  dt-bindings: iommu: arm,smmu-v3: Relax order of interrupt names
  iommu: dart: Support t6000 variant
  iommu/io-pgtable-dart: Add DART PTE support for t6000
  iommu/io-pgtable: Add DART subpage protection support
  iommu/io-pgtable: Move Apple DART support to its own file
  iommu/mediatek: Add support for MT6795 Helio X10 M4Us
  iommu/mediatek: Introduce new flag TF_PORT_TO_ADDR_MT8173
  dt-bindings: mediatek: Add bindings for MT6795 M4U
  iommu/iova: Fix module config properly
  iommu/amd: Fix sparse warning
  iommu/amd: Remove outdated comment
  iommu/amd: Free domain ID after domain_flush_pages
  iommu/amd: Free domain id in error path
  iommu/virtio: Fix compile error with viommu_capable()
  ...
2022-10-10 13:20:53 -07:00
Pierre Gondois
f55a9b59e8 irqchip/gic-v3-its: Remove cpumask_var_t allocation
Running a PREEMPT_RT kernel based on v5.19-rc3-rt4 on an Ampere Altra
triggers:

[   22.616229] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
[   22.616239] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1884, name: kworker/80:1
[   22.616243] preempt_count: 3, expected: 0
[   22.616244] RCU nest depth: 0, expected: 0
[...]
[   22.616250] hardirqs last enabled at (33): _raw_spin_unlock_irq (/home/piegon01/linux/./arch/arm64/include/asm/irqflags.h:35)
[   22.616273] hardirqs last disabled at (34): __schedule (/home/piegon01/linux/kernel/sched/core.c:6432 (discriminator 1))
[   22.616283] softirqs last enabled at (0): copy_process (/home/piegon01/linux/./include/linux/lockdep.h:191)
[   22.616297] softirqs last disabled at (0): 0x0
[   22.616305] Preemption disabled at:
[   22.616307] __setup_irq (/home/piegon01/linux/kernel/irq/manage.c:1612)
[   22.616322] CPU: 80 PID: 1884 Comm: kworker/80:1 Tainted: G        W        [...]
[   22.616328] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[   22.616333] Workqueue: events work_for_cpu_fn
[   22.616344] Call trace:
[...]
[   22.616403] alloc_cpumask_var_node (/home/piegon01/linux/lib/cpumask.c:115)
[   22.616414] alloc_cpumask_var (/home/piegon01/linux/lib/cpumask.c:147)
[   22.616417] its_select_cpu (/home/piegon01/linux/drivers/irqchip/irq-gic-v3-its.c:1580)
[   22.616428] its_set_affinity (/home/piegon01/linux/drivers/irqchip/irq-gic-v3-its.c:1659)
[   22.616431] msi_domain_set_affinity (/home/piegon01/linux/kernel/irq/msi.c:501)
[   22.616440] irq_do_set_affinity (/home/piegon01/linux/kernel/irq/manage.c:276)
[   22.616443] irq_setup_affinity (/home/piegon01/linux/kernel/irq/manage.c:633)
[   22.616447] irq_startup (/home/piegon01/linux/kernel/irq/chip.c:280)
[   22.616453] __setup_irq (/home/piegon01/linux/kernel/irq/manage.c:1777)

Follow the pattern established in commit cba4235e6031e ("genirq: Remove
mask argument from setup_affinity()") and co to overcome this issue by
defining a static struct cpumask and protecting it by a raw spinlock.

Since its_select_cpu() can be executed with IRQs enabled or disabled,
enforce that the cpumask computation is done with interrupts disabled.

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220912141857.1391343-1-pierre.gondois@arm.com
2022-09-12 16:37:43 +01:00
Robin Murphy
fa49364cd5 iommu/dma: Move public interfaces to linux/iommu.h
The iommu-dma layer is now mostly encapsulated by iommu_dma_ops, with
only a couple more public interfaces left pertaining to MSI integration.
Since these depend on the main IOMMU API header anyway, move their
declarations there, taking the opportunity to update the half-baked
comments to proper kerneldoc along the way.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/9cd99738f52094e6bed44bfee03fa4f288d20695.1660668998.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07 14:47:00 +02:00
Thomas Gleixner
cdb4913293 irqchip updates for 5.19:
- Add new infrastructure to stop gpiolib from rewriting irq_chip
   structures behind our back. Convert a few of them, but this will
   obviously be a long effort.
 
 - A bunch of GICv3 improvements, such as using MMIO-based invalidations
   when possible, and reducing the amount of polling we perform when
   reconfiguring interrupts.
 
 - Another set of GICv3 improvements for the Pseudo-NMI functionality,
   with a nice cleanup making it easy to reason about the various
   states we can be in when an NMI fires.
 
 - The usual bunch of misc fixes and minor improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmKGcX8PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpD7kYP/1sbxyRoq7iWqtTDK7ENWvqXh5wu/YZe0pnw
 jr0hPrJTdQKUbsBA+pusklEnTHvRgnLOmFpfR3X7apGg/If7mPRZGQcZz3fXKwDA
 53u74IzZhYa+fx9H0L1qtBUHJtTP4/IexkzL/84R19u2/ewIhzDyhpvGxA/yAFj+
 Gi6bgz93NGMOt/tdNtXZvj5zdr+5BayC6JBpnyzliyxS1xD3YeA0T05fHDYfjrcM
 51gUeA/9tA3EWiRzsdZGq6uDaUfBW5aspWu0bZx/WBUWNBvAAjWzhIgNWDW/xKJP
 N3t6UQ6+uNYJXvdaCJlBLc6TiXBzGXINgr4oMljg8nJRYLt+xVsadkTnFxlnqoY/
 FNeEiOUQqjZ1qcvHJoIceGHgTq//o3VaZ+AnuAESqeNPGavz+LMOCNo7Su+k2+Tk
 H3x09+p+SbrzJvRVyboLVk+v74NtzEz1fGrjEzQk2eHw+dc18yz1v+D1EX1REkhM
 gjzjSIAgZoq1M3GZL8tyrov44vhG3mUm3jAO01u9fRTHqEee6WIKt0aijSe/sCRr
 chTf+S9n8xPsr6AHUPQImV/fSismK4erCJeAiSp+P3hZjqyK8iPsHgiM5YLj50Cl
 ry9dACxv6CYf7lMKmKPC/atV1IlJSEZpguc6FLQ2tv9IBWqNMQXve0012acFMr6B
 ZpncbECV
 =nQxd
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Pull irqchip updates from Marc Zyngier:

 - Add new infrastructure to stop gpiolib from rewriting irq_chip
   structures behind our back. Convert a few of them, but this will
   obviously be a long effort.

 - A bunch of GICv3 improvements, such as using MMIO-based invalidations
   when possible, and reducing the amount of polling we perform when
   reconfiguring interrupts.

 - Another set of GICv3 improvements for the Pseudo-NMI functionality,
   with a nice cleanup making it easy to reason about the various
   states we can be in when an NMI fires.

 - The usual bunch of misc fixes and minor improvements.

Link: https://lore.kernel.org/all/20220519165308.998315-1-maz@kernel.org
2022-05-20 18:48:54 +02:00
Marc Zyngier
3f893a5962 irqchip/gic-v3: Always trust the managed affinity provided by the core code
Now that the core code has been fixed to always give us an affinity
that only includes online CPUs, directly use this affinity when
computing a target CPU.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220405185040.206297-4-maz@kernel.org
2022-04-10 21:06:30 +02:00
Marc Zyngier
af27e41612 irqchip/gic-v4: Wait for GICR_VPENDBASER.Dirty to clear before descheduling
The way KVM drives GICv4.{0,1} is as follows:
- vcpu_load() makes the VPE resident, instructing the RD to start
  scanning for interrupts
- just before entering the guest, we check that the RD has finished
  scanning and that we can start running the vcpu
- on preemption, we deschedule the VPE by making it invalid on
  the RD

However, we are preemptible between the first two steps. If it so
happens *and* that the RD was still scanning, we nonetheless write
to the GICR_VPENDBASER register while Dirty is set, and bad things
happen (we're in UNPRED land).

This affects both the 4.0 and 4.1 implementations.

Make sure Dirty is cleared before performing the deschedule,
meaning that its_clear_vpend_valid() becomes a sort of full VPE
residency barrier.

Reported-by: Jingyi Wang <wangjingyi11@huawei.com>
Tested-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes: 57e3cebd022f ("KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bit")
Link: https://lore.kernel.org/r/4aae10ba-b39a-5f84-754b-69c2eb0a2c03@huawei.com
2022-04-05 16:33:13 +01:00
Marc Zyngier
eba1e44bee irqchip/gic-v3-its: Skip HP notifier when no ITS is registered
We have some systems out there that have both LPI support and an
ITS, but that don't expose the ITS in their firmware tables
(either because it is broken or because they run under a hypervisor
that hides it...).

Is such a configuration, we still register the HP notifier to free
the allocated tables if needed, resulting in a warning as there is
no memory to free (nothing was allocated the first place).

Fix it by keying the HP notifier on the presence of at least one
sucessfully probed ITS.

Fixes: d23bc2bc1d63 ("irqchip/gic-v3-its: Postpone LPI pending table freeing and memreserve")
Reported-by: Steev Klimaszewski <steev@kali.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lore.kernel.org/r/20220202103454.2480465-1-maz@kernel.org
2022-02-02 10:43:10 +00:00
Thomas Gleixner
243d308037 irqchip fixes for 5.17, take #1
- Drop an unused private data field in the AIC driver
 
 - Various fixes to the realtek-rtl driver
 
 - Make the GICv3 ITS driver compile again in !SMP configurations
 
 - Force reset of the GICv3 ITSs at probe time to avoid issues during kexec
 
 - Yet another kfree/bitmap_free conversion
 
 - Various DT updates (Renesas, SiFive)
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmH0KYkPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpD0JMQAMAQfj4UwC0Ha6wcfbdzkNRWRIfMno2/vSDi
 wJ/RAQi9i9+XeTRluq3aUDmMo+uOCumhJnBsfVF9YGzQXEtWN7OHiSyYtEP5M4gQ
 pk7SHJ2Ro0W8yGGQvwVvf9g8USDD1NfDefjPSr4SZe27hEEush0ngmVWhJeUZzbO
 UnTLb3ej4/tlxc0JJBKt8qdBqhfGVqeqppAAFKWm1PoJOtg7HPYQUlaHyTWWfDRf
 jqqdtKVUysgqG+ok0cO5fX1mnNfA/pC1UkCJavK0mceThtgLf3J3FRT2R8dMBaxP
 kK0bwf1rj7z9pPKaHmWvoplaJFNs6dtvtNdRQ1DGzsT4NqV6YpOmU8vZvYPsv1VE
 jA2YvjuRN2DHfH437RIdIKu6RozMdJlnh/cp5CwHXmyJJY51gWj2WwBjQ/oOKmKZ
 Yd11Wi9pQ06AoTRzmVwS7adHk2VPdygW/P7Xatem/UUJoS15BLxDLGXI7mK2xzNo
 911DAYA0JvgmUZ3zYAkT3V4Lf62hBPUnJnE4J/0eWjPIun2b0UzyOaz/IhJFcShU
 wwZNz/HErd0qkzdFxDxjcEHCA1pHpZO/g171LVaH3b5E7WNMknt0r/YLS+7ktFkq
 we69SFg7iz5P9QJONqvlIqP2HRTer9ILQNaeLdM+Ag2YusYLq03W1eiYOfPYv9up
 sYcoIf2f
 =0mrS
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

Pull irqchip fixes from Marc Zyngier:

  - Drop an unused private data field in the AIC driver

  - Various fixes to the realtek-rtl driver

  - Make the GICv3 ITS driver compile again in !SMP configurations

  - Force reset of the GICv3 ITSs at probe time to avoid issues during kexec

  - Yet another kfree/bitmap_free conversion

  - Various DT updates (Renesas, SiFive)

Link: https://lore.kernel.org/r/20220128174217.517041-1-maz@kernel.org
2022-01-29 21:03:20 +01:00
Marc Zyngier
c733ebb7cb irqchip/gic-v3-its: Reset each ITS's BASERn register before probe
A recent bug report outlined that the way GICv4.1 is handled across
kexec is pretty bad. We can end-up in a situation where ITSs share
memory (this is the case when SVPET==1) and reprogram the base
registers, creating a situation where ITSs that are part of a given
affinity group see different pointers. Which is illegal. Boo.

In order to restore some sanity, reset the BASERn registers to 0
*before* probing any ITS. Although this isn't optimised at all,
this is only a once-per-boot cost, which shouldn't show up on
anyone's radar.

Cc: Jay Chen <jkchen@linux.alibaba.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20211216190315.GA14220@lpieralisi
Link: https://lore.kernel.org/r/20220124133809.1291195-1-maz@kernel.org
2022-01-26 11:10:28 +00:00