linux-stable/drivers/usb
Michal Pecio 42b7581376 usb: xhci: Limit Stop Endpoint retries
Some host controllers fail to atomically transition an endpoint to the
Running state on a doorbell ring and enter a hidden "Restarting" state,
which looks very much like Stopped, with the important difference that
it will spontaneously transition to Running at any moment.

A Stop Endpoint command queued in the Restarting state typically fails
with Context State Error and the completion handler sees the Endpoint
Context State as either still Stopped or already Running. Even a case
of Halted was observed, when an error occurred right after the restart.

The Halted state is already recovered from by resetting the endpoint.
The Running state is handled by retrying Stop Endpoint.

The Stopped state was recognized as a problem on NEC controllers and
worked around also by retrying, because the endpoint soon restarts and
then stops for good. But there is a risk: the command may fail if the
endpoint is "stopped for good" already, and retries will fail forever.

The possibility of this was not realized at the time, but a number of
cases were discovered later and reproduced. Some proved difficult to
deal with, and it is outright impossible to predict if an endpoint may
fail to ever start at all due to a hardware bug. One such bug (albeit
on ASM3142, not on NEC) was found to be reliably triggered simply by
toggling an AX88179 NIC up/down in a tight loop for a few seconds.

An endless retry storm is quite nasty. Besides putting needless load
on the xHC and CPU, it causes URBs never to be given back, paralyzing
the device and connection/disconnection logic for the whole bus if the
device is unplugged. User processes waiting for URBs become unkillable,
drivers and kworker threads lock up and xhci_hcd cannot be reloaded.

For peace of mind, impose a timeout on Stop Endpoint retries in this
case. If they don't succeed in 100ms, consider the endpoint stopped
permanently for some reason and just give back the unlinked URBs. This
failure case is rare already and work is under way to make it rarer.

Start this work today by also handling one simple case of race with
Reset Endpoint, because it costs just two lines to implement.
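
The decision described above can be sketched as a small userspace model.
This is not the driver code: every identifier below (ep_model,
on_stop_context_error, the ACT_* and CTX_* names) is hypothetical, time is
a plain millisecond counter rather than jiffies, and the treatment of a
pending Reset Endpoint is an assumption about what the "simple case of
race" amounts to, namely that retrying is pointless when a reset has
already moved the endpoint to Stopped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names come from the xhci driver. */
enum ep_ctx_state { CTX_STOPPED, CTX_RUNNING, CTX_HALTED };

enum stop_action {
	ACT_RESET_ENDPOINT,	/* Halted: recover by resetting the endpoint */
	ACT_RETRY_STOP,		/* queue another Stop Endpoint command */
	ACT_GIVE_BACK_URBS,	/* treat as stopped, give back unlinked URBs */
};

/* Retry window from the commit message. */
#define STOP_RETRY_TIMEOUT_MS 100

struct ep_model {
	uint64_t first_stop_ms;	/* when the first Stop Endpoint was queued */
	bool reset_pending;	/* assumption: a Reset Endpoint is in flight */
};

/*
 * Decide what to do after Stop Endpoint fails with Context State Error,
 * given the Endpoint Context State seen by the completion handler.
 */
static enum stop_action
on_stop_context_error(const struct ep_model *ep, enum ep_ctx_state seen,
		      uint64_t now_ms)
{
	switch (seen) {
	case CTX_HALTED:
		return ACT_RESET_ENDPOINT;
	case CTX_RUNNING:
		/* The endpoint restarted; stopping it again should work. */
		return ACT_RETRY_STOP;
	case CTX_STOPPED:
		/* Assumed race: a reset already left the endpoint Stopped. */
		if (ep->reset_pending)
			return ACT_GIVE_BACK_URBS;
		/* Possibly "Restarting": retry, but only within the window. */
		if (now_ms - ep->first_stop_ms > STOP_RETRY_TIMEOUT_MS)
			return ACT_GIVE_BACK_URBS;
		return ACT_RETRY_STOP;
	}
	return ACT_GIVE_BACK_URBS;
}
```

In this model only the Stopped case is bounded by the timeout, since that
is the case where retries could otherwise fail forever.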

Fixes: fd9d55d190 ("xhci: retry Stop Endpoint on buggy NEC controllers")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-32-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06 13:26:16 +01:00
atm Merge 6.12-rc3 into usb-next 2024-10-14 08:03:44 +02:00
c67x00 usb: Switch back to struct platform_driver::remove() 2024-10-04 15:13:03 +02:00
cdns3 usb: Switch back to struct platform_driver::remove() 2024-10-04 15:13:03 +02:00
chipidea usb: Use (of|device)_property_present() for non-boolean properties 2024-11-05 13:29:26 +01:00
class move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
common usb: Switch back to struct platform_driver::remove() 2024-10-04 15:13:03 +02:00
core Merge v6.12-rc6 into usb-next 2024-11-05 09:56:08 +01:00
dwc2 Merge v6.12-rc6 into usb-next 2024-11-05 09:56:08 +01:00
dwc3 usb: Use (of|device)_property_present() for non-boolean properties 2024-11-05 13:29:26 +01:00
early usb: early: xhci-dbc: Use memcpy_and_pad() 2023-01-31 10:40:54 +01:00
fotg210 Merge 6.12-rc3 into usb-next 2024-10-14 08:03:44 +02:00
gadget Merge 6.12-rc4 into usb-next 2024-10-21 08:53:43 +02:00
host usb: xhci: Limit Stop Endpoint retries 2024-11-06 13:26:16 +01:00
image scsi: core: Add a dma_alignment field to the host and host template 2024-04-11 21:37:48 -04:00
isp1760 Merge 6.12-rc3 into usb-next 2024-10-14 08:03:44 +02:00
misc Merge 6.12-rc3 into usb-next 2024-10-14 08:03:44 +02:00
mon [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
mtu3 usb: Use (of|device)_property_present() for non-boolean properties 2024-11-05 13:29:26 +01:00
musb Merge 6.12-rc3 into usb-next 2024-10-14 08:03:44 +02:00
phy usb: Use (of|device)_property_present() for non-boolean properties 2024-11-05 13:29:26 +01:00
renesas_usbhs usb: Use (of|device)_property_present() for non-boolean properties 2024-11-05 13:29:26 +01:00
roles usb: Switch back to struct platform_driver::remove() 2024-10-04 15:13:03 +02:00
serial USB: serial: option: add Telit FN920C04 MBIM compositions 2024-10-17 16:38:02 +02:00
storage usb: storage: use US_BULK_FLAG_OUT instead of constant values 2024-10-29 04:33:25 +01:00
typec Merge v6.12-rc6 into usb-next 2024-11-05 09:56:08 +01:00
usbip usb: Switch back to struct platform_driver::remove() 2024-10-04 15:13:03 +02:00
Kconfig usb: pci-quirks: handle HAS_IOPORT dependency for AMD quirk 2023-10-02 16:19:12 +02:00
Makefile USB: xen-hcd: Traverse host/ when CONFIG_USB_XEN_HCD is selected 2024-06-04 15:33:38 +02:00
usb-skeleton.c usb: add usb_set_intfdata() documentation 2022-11-29 08:56:09 +01:00