linux-stable/drivers
Michal Pecio 42b7581376 usb: xhci: Limit Stop Endpoint retries
Some host controllers fail to atomically transition an endpoint to the
Running state on a doorbell ring and enter a hidden "Restarting" state,
which looks very much like Stopped, with the important difference that
it will spontaneously transition to Running at any moment.

A Stop Endpoint command queued in the Restarting state typically fails
with Context State Error and the completion handler sees the Endpoint
Context State as either still Stopped or already Running. Even a case
of Halted was observed, when an error occurred right after the restart.

The Halted state is already recovered from by resetting the endpoint.
The Running state is handled by retrying Stop Endpoint.

The Stopped state was recognized as a problem on NEC controllers and
worked around also by retrying, because the endpoint soon restarts and
then stops for good. But there is a risk: the command may fail if the
endpoint is "stopped for good" already, and retries will fail forever.

The possibility of this was not realized at the time, but a number of
cases were discovered later and reproduced. Some proved difficult to
deal with, and it is outright impossible to predict if an endpoint may
fail to ever start at all due to a hardware bug. One such bug (albeit
on ASM3142, not on NEC) was found to be reliably triggered simply by
toggling an AX88179 NIC up/down in a tight loop for a few seconds.

An endless retry storm is quite nasty. Besides putting needless load
on the xHC and CPU, it causes URBs never to be given back, paralyzing
the device and connection/disconnection logic for the whole bus if the
device is unplugged. User processes waiting for URBs become unkillable,
drivers and kworker threads lock up and xhci_hcd cannot be reloaded.

For peace of mind, impose a timeout on Stop Endpoint retries in this
case. If they don't succeed in 100ms, consider the endpoint stopped
permanently for some reason and just give back the unlinked URBs. This
failure case is rare already and work is under way to make it rarer.
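
The handling described above can be sketched as a small user-space
model. This is only an illustration under stated assumptions, not the
xhci_hcd code: every name in it (ep_model, on_stop_ctx_error, the
enums, the millisecond clock) is hypothetical, and it assumes the 100ms
budget is measured from the moment the stop attempt began.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; none of these names come from the kernel. */
enum ep_state { EP_STOPPED, EP_RUNNING, EP_HALTED };

enum stop_action {
	ACT_RESET_ENDPOINT,	/* Halted: recover by resetting the endpoint */
	ACT_RETRY_STOP,		/* queue another Stop Endpoint command */
	ACT_GIVE_BACK_URBS,	/* assume stopped for good, give back URBs */
};

/* Retry budget taken from the commit message. */
#define STOP_RETRY_TIMEOUT_MS 100

struct ep_model {
	uint64_t stop_started_ms;	/* assumed: time the stop attempt began */
};

/*
 * Decide what to do after Stop Endpoint failed with Context State Error,
 * given the endpoint state seen by the completion handler.
 */
enum stop_action on_stop_ctx_error(const struct ep_model *ep,
				   enum ep_state seen, uint64_t now_ms)
{
	assert(ep != NULL);

	switch (seen) {
	case EP_HALTED:
		return ACT_RESET_ENDPOINT;
	case EP_RUNNING:
		/* The endpoint restarted, so another stop can succeed. */
		return ACT_RETRY_STOP;
	case EP_STOPPED:
		/* May still be "Restarting": retry, but only within the budget. */
		if (now_ms - ep->stop_started_ms < STOP_RETRY_TIMEOUT_MS)
			return ACT_RETRY_STOP;
		return ACT_GIVE_BACK_URBS;
	}
	return ACT_GIVE_BACK_URBS;
}
```

In this model only the Stopped case is bounded by the timeout, since
that is the case the message identifies as able to fail forever.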

Start this work today by also handling one simple case of race with
Reset Endpoint, because it costs just two lines to implement.

Fixes: fd9d55d190 ("xhci: retry Stop Endpoint on buggy NEC controllers")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-32-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06 13:26:16 +01:00
accel accel/ivpu: Fix NOC firewall interrupt handling 2024-10-30 10:17:00 +01:00
accessibility
acpi ACPI: CPPC: Make rmw_lock a raw_spin_lock 2024-10-29 12:56:19 +01:00
amba ARM: 9416/1: amba: make amba_bustype constant 2024-09-04 15:01:17 +01:00
android binder: modify the comment for binder_proc_unlock 2024-09-11 16:02:45 +02:00
ata ata: libata: Set DID_TIME_OUT for commands that actually timed out 2024-10-24 11:14:00 +02:00
atm
auxdisplay move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
base Driver core revert fix for 6.12-rc6 2024-11-03 08:51:53 -10:00
bcma PCI: Rename CRS Completion Status to RRS 2024-09-10 19:52:30 -05:00
block block-6.12-20241018 2024-10-18 15:53:00 -07:00
bluetooth Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001 2024-10-16 16:10:25 -04:00
bus Driver core update for 6.12-rc1 2024-09-27 08:48:37 -07:00
cache
cdrom cdrom: Avoid barrier_nospec() in cdrom_ioctl_media_changed() 2024-10-17 19:47:15 -06:00
cdx
char tpm: Lazily flush the auth session 2024-10-29 00:46:20 +02:00
clk Two clk driver fixes and a unit test fix: 2024-10-17 16:24:42 -07:00
clocksource Updates for x86 timers: 2024-09-17 15:27:01 +02:00
comedi move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
connector
counter move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
cpufreq cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled 2024-10-15 23:54:15 -05:00
cpuidle pmdomain core: 2024-09-18 10:49:45 +02:00
crypto This push fixes the following issues: 2024-10-16 08:42:54 -07:00
cxl cxl/port: Prevent out-of-order decoder allocation 2024-10-25 16:07:03 -05:00
dax device-dax: correct pgoff align in dax_set_mapping() 2024-10-09 12:47:19 -07:00
dca
devfreq PM / devfreq: imx-bus: Use of_property_present() 2024-09-05 01:23:56 +09:00
dio
dma dmaengine fixes for v6.12 2024-11-03 10:15:50 -10:00
dma-buf drm next for 6.12-rc1 2024-09-19 10:18:15 +02:00
dpll
edac - Drop a now obsolete ppc4xx_edac driver 2024-09-16 06:36:37 +02:00
eisa
extcon Char/Misc and other driver changes for 6.12-rc1 2024-09-26 10:13:08 -07:00
firewire firewire: core: fix invalid port index for parent device 2024-10-27 11:14:35 +09:00
firmware arm64 fixes for -rc6 2024-11-01 07:54:11 -10:00
fpga move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
fsi move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
gnss [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
gpio gpiolib: fix debugfs dangling chip separator 2024-10-31 19:14:17 +01:00
gpu Driver Changes: 2024-11-02 04:44:27 +10:00
greybus move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
hid hid-for-linus-20241024 2024-10-24 16:31:58 -07:00
hsi
hte
hv drm next for 6.12-rc1 2024-09-19 10:18:15 +02:00
hwmon [PATCH} hwmon: (jc42) Properly detect TSE2004-compliant devices again 2024-10-14 19:14:08 -07:00
hwspinlock
hwtracing [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
i2c i2c-for-6.12-rc2 2024-10-05 10:31:04 -07:00
i3c i3c: master: svc: Fix use after free vulnerability in svc_i3c_master Driver Due to Race Condition 2024-09-17 16:51:45 +02:00
idle intel_idle: fix ACPI _CST matching for newer Xeon platforms 2024-09-25 22:30:33 +02:00
iio iio: dac: Kconfig: Fix build error for ltc2664 2024-10-24 18:46:04 +01:00
infiniband RDMA/bnxt_re: synchronize the qp-handle table array 2024-10-21 13:28:15 -03:00
input Input updates for v6.12-rc5 2024-11-03 08:35:29 -10:00
interconnect
iommu iommu/vt-d: Fix incorrect pci_for_each_dma_alias() for non-PCI devices 2024-10-15 10:17:54 +02:00
ipack
irqchip irqchip/gic-v4: Correctly deal with set_affinity on lazily-mapped VPEs 2024-10-27 17:30:16 +01:00
isdn move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
leds move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
macintosh move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
mailbox mailbox, remoteproc: omap2+: fix compile testing 2024-09-27 09:11:05 -05:00
mcb
md block-6.12-20241026 2024-10-27 08:29:36 -10:00
media move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
memory
memstick move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
message SCSI misc on 20240928 2024-09-29 09:22:34 -07:00
mfd move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
misc mei: use kvmalloc for read buffer 2024-10-29 04:01:40 +01:00
mmc MMC core: 2024-10-11 11:23:21 -07:00
most
mtd move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
mux
net net: hns3: fix kernel crash when 1588 is sent on HIP08 devices 2024-10-31 11:15:43 +01:00
nfc move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
ntb ntb: Force physically contiguous allocation of rx ring buffers 2024-09-20 10:51:25 -04:00
nubus
nvdimm virtio: features, fixes, cleanups 2024-09-26 08:43:17 -07:00
nvme block-6.12-20241101 2024-11-01 13:41:55 -10:00
nvmem Char/Misc and other driver changes for 6.12-rc1 2024-09-26 10:13:08 -07:00
of of: Skip kunit tests when arm64+ACPI doesn't populate root node 2024-10-10 12:43:01 -05:00
opp OPP: fix error code in dev_pm_opp_set_config() 2024-10-02 01:27:50 +02:00
parisc parisc: pdc_stable: Constify struct kobj_type 2024-09-09 08:53:17 +02:00
parport parport: Proper fix for array out-of-bounds access 2024-10-13 18:17:35 +02:00
pci pci-v6.12-fixes-2 2024-11-01 15:44:23 -10:00
pcmcia move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
peci move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
perf drivers/perf: riscv: Align errno for unsupported perf event 2024-10-01 02:47:39 -07:00
phy Merge v6.12-rc6 into usb-next 2024-11-05 09:56:08 +01:00
pinctrl pinctrl: ocelot: fix system hang on level based interrupts 2024-10-12 22:04:38 +02:00
platform platform-drivers-x86 for v6.12-3 2024-10-27 08:40:33 -10:00
pmdomain pmdomain: qcom-cpr: Fix the return of uninitialized variable 2024-10-02 12:38:53 +02:00
pnp
power move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
powercap powercap: dtpm_devfreq: Fix error check against dev_pm_qos_add_request() 2024-10-21 13:23:06 +02:00
pps [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
ps3
ptp move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
pwm soc: convert ep93xx to devicetree 2024-09-26 12:00:25 -07:00
rapidio
ras
regulator regulator: sm5703: Remove because it is unused and fails to build 2024-09-13 19:08:14 +01:00
remoteproc mhu-v3, omap2+ : fix kconfig dependencies 2024-09-29 09:53:04 -07:00
reset reset: starfive: jh71x0: Fix accessing the empty member on JH7110 SoC 2024-09-30 14:24:37 +02:00
rpmsg rpmsg: glink: Avoid -Wflex-array-member-not-at-end warnings 2024-09-13 14:09:47 -07:00
rtc move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
s390 s390/sclp_vt220: Convert newlines to CRLF instead of LFCR 2024-10-16 11:32:32 +02:00
sbus [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
scsi SCSI fixes on 20241030 2024-10-30 08:16:23 -10:00
sh sh: intc: Replace simple_strtoul() with kstrtoul() 2024-09-26 17:25:29 +02:00
siox
slimbus slimbus: qcom-ngd-ctrl: use 'time_left' variable with wait_for_completion_timeout() 2024-09-03 12:10:38 +02:00
soc FSL SOC fixes for v6.12: 2024-10-11 10:03:13 +00:00
soundwire soundwire: intel_ace2x: Send PDI stream number during prepare 2024-10-17 12:11:19 +01:00
spi spi: spi-fsl-dspi: Fix crash when not using GPIO chip select 2024-10-23 22:37:54 +01:00
spmi
ssb
staging staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg() 2024-10-24 18:30:47 +01:00
target SCSI fixes on 20241019 2024-10-19 12:52:19 -07:00
tc
tee optee: Fix a NULL vs IS_ERR() check 2024-09-09 12:22:06 +02:00
thermal Power management updates for 6.12-rc3 2024-10-11 11:41:20 -07:00
thunderbolt thunderbolt: Honor TMU requirements in the domain when setting TMU mode 2024-10-21 09:42:42 +03:00
tty serial: qcom-geni: rename suspend functions 2024-10-11 08:39:24 +02:00
ufs SCSI fixes on 20241030 2024-10-30 08:16:23 -10:00
uio uio: Constify struct kobj_type 2024-09-11 16:02:54 +02:00
usb usb: xhci: Limit Stop Endpoint retries 2024-11-06 13:26:16 +01:00
vdpa virtio: bugfixes 2024-10-07 11:33:26 -07:00
vfio [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
vhost virtio: bugfixes 2024-10-07 11:33:26 -07:00
video fbdev: wm8505fb: select CONFIG_FB_IOMEM_FOPS 2024-10-21 11:16:51 +02:00
virt [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
virtio virtio: bugfixes 2024-10-07 11:33:26 -07:00
w1 w1: ds2482: Drop explicit initialization of struct i2c_device_id::driver_data to 0 2024-09-06 19:18:32 +02:00
watchdog move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
xen xen: Remove dependency between pciback and privcmd 2024-10-18 11:59:04 +02:00
zorro
Kconfig
Makefile leds: Init leds class earlier 2024-09-04 17:24:58 -05:00