linux-next/drivers
Maximilian Heyne fa765c4b4a xen/events: close evtchn after mapping cleanup
shutdown_pirq and startup_pirq do not take irq_mapping_update_lock
because doing so would cause a lock inversion. Both are called with
irq_desc->lock already taken, whereas the established lock order is
first irq_mapping_update_lock and then irq_desc->lock.

This opens multiple races:
- shutdown_pirq can be interrupted by a function that allocates an event
  channel:

  CPU0                        CPU1
  shutdown_pirq {
    xen_evtchn_close(e)
                              __startup_pirq {
                                EVTCHNOP_bind_pirq
                                  -> returns just freed evtchn e
                                set_evtchn_to_irq(e, irq)
                              }
    xen_irq_info_cleanup() {
      set_evtchn_to_irq(e, -1)
    }
  }

  Assume here that event channel e refers to the same event channel
  number on both CPUs.
  After this race the evtchn_to_irq mapping for e is invalid (-1).

- __startup_pirq races with __unbind_from_irq in a similar way. Because
  __startup_pirq doesn't take irq_mapping_update_lock it can grab the
  evtchn that __unbind_from_irq is currently freeing and cleaning up. In
  this case even though the event channel is allocated, its mapping can
  be unset in evtchn_to_irq.
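
The first interleaving can be replayed deterministically in a small
userspace model. This is only a sketch: struct toy_state, the toy_*
helpers, the table size and the irq numbers 10 and 11 are hypothetical
stand-ins, not kernel code. The model assumes the allocator hands out
the lowest free port, so a just-freed port is reused immediately.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PORTS 4      /* toy table size, not the real limit */
#define NO_IRQ   (-1)

/* Hypothetical stand-ins: port_in_use models the hypervisor's allocator,
 * evtchn_to_irq models the kernel lookup table of the same name. */
struct toy_state {
    bool port_in_use[NR_PORTS];
    int  evtchn_to_irq[NR_PORTS];
};

static void toy_init(struct toy_state *s)
{
    for (int i = 0; i < NR_PORTS; i++) {
        s->port_in_use[i] = false;
        s->evtchn_to_irq[i] = NO_IRQ;
    }
}

/* Hand out the lowest free port, so a just-freed port is reused at once. */
static int toy_alloc(struct toy_state *s)
{
    for (int i = 0; i < NR_PORTS; i++) {
        if (!s->port_in_use[i]) {
            s->port_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

static void toy_close(struct toy_state *s, int e)
{
    assert(e >= 0 && e < NR_PORTS);
    s->port_in_use[e] = false;
}

static void toy_set_mapping(struct toy_state *s, int e, int irq)
{
    assert(e >= 0 && e < NR_PORTS);
    s->evtchn_to_irq[e] = irq;
}

/* Replay the CPU0/CPU1 interleaving with close before cleanup.
 * Returns the final mapping of the port that CPU1 ends up owning. */
int replay_close_before_cleanup(void)
{
    struct toy_state s;
    toy_init(&s);

    int e = toy_alloc(&s);          /* CPU0's pirq owns port e */
    toy_set_mapping(&s, e, 10);     /* irq 10 is an arbitrary example */

    toy_close(&s, e);               /* CPU0: xen_evtchn_close(e) */
    int e2 = toy_alloc(&s);         /* CPU1: bind returns the freed port */
    toy_set_mapping(&s, e2, 11);    /* CPU1: set_evtchn_to_irq(e, irq) */
    toy_set_mapping(&s, e, NO_IRQ); /* CPU0: xen_irq_info_cleanup() */

    assert(e2 == e && s.port_in_use[e2]); /* port is live again ... */
    return s.evtchn_to_irq[e2];           /* ... but its mapping is gone */
}
```

In this model replay_close_before_cleanup() returns NO_IRQ although the
port is allocated, which corresponds to the invalid mapping described
above.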

The fix is to first clean up the mappings and then close the event
channel. In this way, when an event channel gets allocated, its
potential previous evtchn_to_irq mappings are guaranteed to be unset
already. This is also the reverse order of the allocation, where first
the event channel is allocated and then the mappings are set up.
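
The effect of the reordering can be sketched in a userspace toy model.
The names here (struct toy_state, the toy_* helpers, irq numbers 10 and
11) are hypothetical and only illustrate the ordering argument; they are
not the kernel implementation. The allocator is assumed to return the
lowest free port.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PORTS 4      /* toy table size, not the real limit */
#define NO_IRQ   (-1)

/* Hypothetical stand-ins for the hypervisor allocator and the
 * kernel's evtchn_to_irq table. */
struct toy_state {
    bool port_in_use[NR_PORTS];
    int  evtchn_to_irq[NR_PORTS];
};

static void toy_init(struct toy_state *s)
{
    for (int i = 0; i < NR_PORTS; i++) {
        s->port_in_use[i] = false;
        s->evtchn_to_irq[i] = NO_IRQ;
    }
}

/* Lowest free port first, so freed ports are reused immediately. */
static int toy_alloc(struct toy_state *s)
{
    for (int i = 0; i < NR_PORTS; i++) {
        if (!s->port_in_use[i]) {
            s->port_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Cleanup first, close second: the new owner's mapping survives. */
int replay_cleanup_before_close(void)
{
    struct toy_state s;
    toy_init(&s);

    int e = toy_alloc(&s);           /* CPU0's pirq owns port e */
    s.evtchn_to_irq[e] = 10;         /* irq 10 is an arbitrary example */

    s.evtchn_to_irq[e] = NO_IRQ;     /* CPU0: unset the mapping first */
    s.port_in_use[e] = false;        /* CPU0: only then close the port */

    int e2 = toy_alloc(&s);          /* CPU1: may reuse e only from here on */
    s.evtchn_to_irq[e2] = 11;        /* CPU1: set its own mapping */

    assert(e2 == e);
    return s.evtchn_to_irq[e2];      /* nothing left to overwrite it */
}

/* An allocation that lands between cleanup and close cannot get e,
 * because e is still marked in use at that point. */
bool alloc_between_cleanup_and_close_gets_other_port(void)
{
    struct toy_state s;
    toy_init(&s);

    int e = toy_alloc(&s);
    s.evtchn_to_irq[e] = 10;

    s.evtchn_to_irq[e] = NO_IRQ;     /* CPU0: cleanup done, port still open */
    int e2 = toy_alloc(&s);          /* CPU1: allocates concurrently */
    s.port_in_use[e] = false;        /* CPU0: close */

    return e2 != e;
}
```

With this order the mapping written by the new owner is the last write
to that slot in the model, and an allocation that slips in before the
close receives a different port.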

On a 5.10 kernel prior to commit 3fcdaf3d76 ("xen/events: modify internal
[un]bind interfaces"), we hit a BUG like the following during probing of NVMe
devices. The issue is that during nvme_setup_io_queues, pci_free_irq
is called for every device which results in a call to shutdown_pirq.
With many NVMe devices it's therefore likely to hit this race during
boot because multiple calls to shutdown_pirq and startup_pirq are
potentially running in parallel.

  ------------[ cut here ]------------
  blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled
  kernel BUG at drivers/xen/events/events_base.c:499!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 5.10.201-191.748.amzn2.x86_64 #1
  Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
  Workqueue: nvme-reset-wq nvme_reset_work
  RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
  Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
  RSP: 0000:ffffc9000d533b08 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
  RDX: 0000000000000028 RSI: 00000000ffffffff RDI: 00000000ffffffff
  RBP: ffff888107419680 R08: 0000000000000000 R09: ffffffff82d72b00
  R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001ed
  R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000002
  FS:  0000000000000000(0000) GS:ffff88bc8b500000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000002610001 CR4: 00000000001706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? set_affinity_irq+0xdc/0x1c0
   ? __die_body.cold+0x8/0xd
   ? die+0x2b/0x50
   ? do_trap+0x90/0x110
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? do_error_trap+0x65/0x80
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? exc_invalid_op+0x4e/0x70
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? asm_exc_invalid_op+0x12/0x20
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? bind_evtchn_to_cpu+0xc5/0xf0
   set_affinity_irq+0xdc/0x1c0
   irq_do_set_affinity+0x1d7/0x1f0
   irq_setup_affinity+0xd6/0x1a0
   irq_startup+0x8a/0xf0
   __setup_irq+0x639/0x6d0
   ? nvme_suspend+0x150/0x150
   request_threaded_irq+0x10c/0x180
   ? nvme_suspend+0x150/0x150
   pci_request_irq+0xa8/0xf0
   ? __blk_mq_free_request+0x74/0xa0
   queue_request_irq+0x6f/0x80
   nvme_create_queue+0x1af/0x200
   nvme_create_io_queues+0xbd/0xf0
   nvme_setup_io_queues+0x246/0x320
   ? nvme_irq_check+0x30/0x30
   nvme_reset_work+0x1c8/0x400
   process_one_work+0x1b0/0x350
   worker_thread+0x49/0x310
   ? process_one_work+0x350/0x350
   kthread+0x11b/0x140
   ? __kthread_bind_mask+0x60/0x60
   ret_from_fork+0x22/0x30
  Modules linked in:
  ---[ end trace a11715de1eee1873 ]---

Fixes: d46a78b05c ("xen: implement pirq type event channels")
Cc: stable@vger.kernel.org
Co-debugged-by: Andrew Panyakin <apanyaki@amazon.com>
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20240124163130.31324-1-mheyne@amazon.de
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-02-13 10:12:47 +01:00
..
accel accel/ivpu: Add job status for jobs aborted by the driver 2024-02-06 13:37:34 +01:00
accessibility
acpi cxl/cper: Fix errant CPER prints for CXL events 2024-02-03 18:31:17 +01:00
amba
android binder: signal epoll threads of self-work 2024-01-31 14:08:28 -08:00
ata ahci: Extend ASM1061 43-bit DMA address quirk to other ASM106x parts 2024-01-31 12:09:34 +01:00
atm atm: idt77252: fix a memleak in open_card_ubr0 2024-02-03 12:46:13 +00:00
auxdisplay drm-next for 6.8: 2024-01-12 11:32:19 -08:00
base RTC for 6.8 2024-01-18 17:25:39 -08:00
bcma
block block-6.8-2024-02-10 2024-02-10 08:02:48 -08:00
bluetooth USB / Thunderbolt changes for 6.8-rc1 2024-01-18 11:43:55 -08:00
bus Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
cache
cdrom
cdx cdx: Unlock on error path in rescan_store() 2024-01-04 17:01:14 +01:00
char TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
clk clk: qcom: gcc-x1e80100: Replace of_device.h with explicit includes 2024-01-19 08:17:28 -06:00
clocksource clocksource/drivers/ep93xx: Fix error handling during probe 2023-12-27 15:37:11 +01:00
comedi
connector Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-01-04 18:06:46 -08:00
counter
cpufreq cpufreq/amd-pstate: Fix setting scaling max/min freq values 2024-01-22 20:35:58 +01:00
cpuidle cpuidle: haltpoll: Do not enable interrupts when entering idle 2023-12-29 18:08:18 +01:00
crypto crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked 2024-02-02 18:08:12 +08:00
cxl EFI fixes for v6.8 #1 2024-02-09 10:40:50 -08:00
dax New code for 6.8: 2024-01-10 08:45:22 -08:00
dca
devfreq
dio
dma dmaengine: at_hdmac: add missing kernel-doc style description 2024-02-02 17:16:55 +01:00
dma-buf dma-buf: heaps: Don't track CMA dma-buf pages under RssFile 2024-01-31 19:54:58 +05:30
dpll dpll: fix register pin with unregistered parent pin 2024-01-22 11:01:11 +00:00
edac Driver core changes for 6.8-rc1 2024-01-18 09:48:40 -08:00
eisa
extcon
firewire firewire: core: send bus reset promptly on gap count error 2024-02-07 08:20:02 +09:00
firmware EFI fixes for v6.8 #1 2024-02-09 10:40:50 -08:00
fpga Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
fsi
gnss TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
gpio gpio: remove GPIO device from the list unconditionally in error path 2024-02-08 10:33:03 +01:00
gpu Merge tag 'drm-msm-fixes-2024-02-07' of https://gitlab.freedesktop.org/drm/msm into drm-fixes 2024-02-09 11:32:38 +10:00
greybus TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
hid HID: bpf: use __bpf_kfunc instead of noinline 2024-01-31 10:27:08 +01:00
hsi
hte
hv
hwmon hwmon: (coretemp) Enlarge per package core count limit 2024-02-04 06:43:45 -08:00
hwspinlock
hwtracing
i2c This cycle, I2C removes the currently unused CLASS_DDC support 2024-01-18 17:29:01 -08:00
i3c i3c: master: cdns: Update maximum prescaler value for i2c clock 2024-01-08 00:51:36 +01:00
idle Power management updates for 6.8-rc1 2024-01-09 16:32:11 -08:00
iio TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
infiniband RDMA v6.8 merge window 2024-01-12 13:52:21 -08:00
input Input updates for v6.8-rc2 2024-02-02 12:52:44 -08:00
interconnect Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
iommu iommu: Allow ops->default_domain to work when !CONFIG_IOMMU_DMA 2024-02-01 13:16:17 +01:00
ipack TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
irqchip header cleanups for 6.8 2024-01-10 16:43:55 -08:00
isdn
leds - New Drivers 2024-01-17 15:25:27 -08:00
macintosh
mailbox mediatek: add CMDQ support for mt8188 2024-01-17 15:39:32 -08:00
mcb
md dm-crypt, dm-verity: disable tasklets 2024-02-02 12:33:50 -05:00
media media: vb2: refactor setting flags and caps, fix missing cap 2024-01-24 17:27:51 +01:00
memory IOMMU Updates for Linux v6.8 2024-01-18 15:16:57 -08:00
memstick
message
mfd TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
misc misc: open-dice: Fix spurious lockdep warning 2024-01-30 16:20:54 -08:00
mmc mmc: slot-gpio: Allow non-sleeping GPIO ro 2024-02-06 12:35:44 +01:00
most
mtd This pull request contains updates for UBI and UBIFS: 2024-01-17 10:27:13 -08:00
mux mux: mmio: use reg property when parent device is not a syscon 2024-01-04 17:01:14 +01:00
net octeontx2-af: Initialize maps. 2024-02-08 12:03:02 +01:00
nfc
ntb
nubus nubus: Make nubus_bus_type static and constant 2024-01-03 13:33:59 +01:00
nvdimm virtio: features, fixes 2024-01-18 16:44:03 -08:00
nvme nvme: use ns->head->pi_size instead of t10_pi_tuple structure size 2024-02-07 15:49:36 -08:00
nvmem Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
of IOMMU Updates for Linux v6.8 2024-01-18 15:16:57 -08:00
opp OPP: Rename 'rate_clk_single' 2024-01-05 15:55:41 +05:30
parisc parisc/power: Fix power soft-off button emulation on qemu 2024-01-07 22:59:16 +01:00
parport
pci pci-v6.8-fixes-2 2024-02-09 10:37:59 -08:00
pcmcia
peci
perf ACPI updates for 6.8-rc1 2024-01-09 16:12:44 -08:00
phy phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP 2024-01-30 22:41:11 +05:30
pinctrl pinctrl: amd: Add IRQF_ONESHOT to the interrupt request 2024-01-31 10:06:07 +01:00
platform platform/x86: touchscreen_dmi: Add info for the TECLAST X16 Plus tablet 2024-01-26 20:21:47 +01:00
pmdomain pmdomain: mediatek: fix race conditions with genpd 2024-01-23 13:19:15 +01:00
pnp More ACPI updates for 6.8-rc1 2024-01-17 14:37:40 -08:00
power Revert "power: supply: qcom_battmgr: Register the power supplies after PDR is up" 2024-01-26 22:45:58 +01:00
powercap
pps
ps3
ptp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-01-04 18:06:46 -08:00
pwm pwm: jz4740: Don't use dev_err_probe() in .request() 2024-01-12 18:25:05 +01:00
rapidio
ras
regulator regulator (max5970): Fix IRQ handler 2024-01-30 15:27:16 +00:00
remoteproc
reset SoC: driver updates for 6.8 2024-01-11 11:31:46 -08:00
rpmsg
rtc rtc: nuvoton: Compatible with NCT3015Y-R and NCT3018Y-R 2024-01-18 01:05:33 +01:00
s390 s390/qeth: Fix potential loss of L3-IP@ in case of network issues 2024-02-08 12:10:09 +01:00
sbus
scsi scsi: lpfc: Use unsigned type for num_sge 2024-02-05 16:19:11 -05:00
sh maple: make maple_bus_type static and const 2024-01-04 14:37:17 +01:00
siox
slimbus
soc soc: apple: mailbox: error pointers are negative integers 2024-01-30 11:34:49 -08:00
soundwire soundwire updates for 6.7 2024-01-18 17:08:31 -08:00
spi spi: sh-msiof: avoid integer overflow in constants 2024-01-30 15:27:21 +00:00
spmi
ssb
staging This cycle, I2C removes the currently unused CLASS_DDC support 2024-01-18 17:29:01 -08:00
target SCSI misc on 20240120 2024-01-20 09:42:32 -08:00
tc
tee Another moderately busy cycle for documentation, including: 2024-01-11 19:46:52 -08:00
thermal thermal: intel: powerclamp: Remove dead code for target mwait value 2024-01-22 11:59:22 +01:00
thunderbolt USB / Thunderbolt changes for 6.8-rc1 2024-01-18 11:43:55 -08:00
tty serial: max310x: prevent infinite while() loop in port startup 2024-01-27 19:09:10 -08:00
ufs scsi: ufs: core: Remove the ufshcd_release() in ufshcd_err_handling_prepare() 2024-02-05 16:31:18 -05:00
uio uio: Fix use-after-free in uio_open 2024-01-04 17:03:47 +01:00
usb USB-serial device ids for 6.8-rc3 2024-02-02 08:36:38 -08:00
vdpa virtio: features, fixes 2024-01-18 16:44:03 -08:00
vfio VFIO updates for v6.8-rc1 2024-01-18 15:57:25 -08:00
vhost virtio: features, fixes 2024-01-18 16:44:03 -08:00
video fbdev: stifb: Fix crash in stifb_blank() 2024-01-23 09:13:24 +01:00
virt Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
virtio virtio: features, fixes 2024-01-18 16:44:03 -08:00
w1
watchdog linux-watchdog 6.8-rc1 tag 2024-01-12 13:32:30 -08:00
xen xen/events: close evtchn after mapping cleanup 2024-02-13 10:12:47 +01:00
zorro
Kconfig
Makefile fbdev/intelfb: Remove driver 2024-01-12 12:38:37 +01:00