linux/drivers/net/can
Matthias Schiffer 743375f8de can: m_can: fix missed interrupts with m_can_pci
The interrupt line of PCI devices is interpreted as edge-triggered;
however, the interrupt signal of the m_can controller integrated in Intel
Elkhart Lake CPUs appears to be generated level-triggered.

Consider the following sequence of events:

- IR register is read, interrupt X is set
- A new interrupt Y is triggered in the m_can controller
- IR register is written to acknowledge interrupt X. Y remains set in IR

As at least one interrupt flag is set in IR at every point in this
sequence, the m_can interrupt line never becomes deasserted, and no edge
is ever observed to trigger another run of the ISR. This was observed to
cause the TX queue of the EHL m_can to get stuck under high load,
because frames were queued to the hardware in m_can_start_xmit(), but
m_can_finish_tx() was never run to account for their successful
transmission.
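
The sequence above can be replayed against a small software model to see why no new edge appears. This is only an illustrative sketch, not driver code: `struct irq_model`, `line_asserted()` and `sequence_produces_new_edge()` are hypothetical names, and the model assumes the line is asserted whenever any IR flag is set and that writing a flag to IR clears it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the controller's interrupt register (IR). */
struct irq_model {
	uint32_t ir;	/* pending interrupt flags */
};

/* Assumed level-triggered source: asserted while any flag is set. */
static bool line_asserted(const struct irq_model *dev)
{
	return dev->ir != 0;
}

/*
 * Replay the three steps from the commit message. Returns true if the
 * line was ever deasserted, which is the precondition for a new edge.
 */
static bool sequence_produces_new_edge(void)
{
	const uint32_t X = 1u << 0, Y = 1u << 1;
	struct irq_model dev = { .ir = X };
	bool went_low = false;

	uint32_t seen = dev.ir;		/* IR is read, X is set */
	went_low |= !line_asserted(&dev);

	dev.ir |= Y;			/* Y is triggered in the controller */
	went_low |= !line_asserted(&dev);

	dev.ir &= ~seen;		/* IR is written to acknowledge X */
	went_low |= !line_asserted(&dev);

	return went_low;		/* stays false: Y keeps the line high */
}
```

In this model the function returns false: the line stays high throughout, so an edge-triggered receiver would never see a second interrupt.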

On an Elkhart Lake based board with the two CAN interfaces connected to
each other, the following script can reproduce the issue:

    ip link set can0 up type can bitrate 1000000
    ip link set can1 up type can bitrate 1000000

    cangen can0 -g 2 -I 000 -L 8 &
    cangen can0 -g 2 -I 001 -L 8 &
    cangen can0 -g 2 -I 002 -L 8 &
    cangen can0 -g 2 -I 003 -L 8 &
    cangen can0 -g 2 -I 004 -L 8 &
    cangen can0 -g 2 -I 005 -L 8 &
    cangen can0 -g 2 -I 006 -L 8 &
    cangen can0 -g 2 -I 007 -L 8 &

    cangen can1 -g 2 -I 100 -L 8 &
    cangen can1 -g 2 -I 101 -L 8 &
    cangen can1 -g 2 -I 102 -L 8 &
    cangen can1 -g 2 -I 103 -L 8 &
    cangen can1 -g 2 -I 104 -L 8 &
    cangen can1 -g 2 -I 105 -L 8 &
    cangen can1 -g 2 -I 106 -L 8 &
    cangen can1 -g 2 -I 107 -L 8 &

    stress-ng --matrix 0 &

To fix the issue, repeatedly read and acknowledge interrupts at the
start of the ISR until no interrupt flags are set, so the next incoming
interrupt will also result in an edge on the interrupt line.
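
A minimal sketch of the read-and-acknowledge loop described above, run against a simulated register rather than real hardware. All names here (`struct fake_mcan`, `fake_read_ir()`, `fake_ack_ir()`, `drain_interrupts()`) are hypothetical and are not the identifiers used in the m_can driver; the `late` field only exists to simulate an interrupt arriving between the read and the acknowledge.

```c
#include <stdint.h>

/* Hypothetical stand-in for the device; not the driver's real structure. */
struct fake_mcan {
	uint32_t ir;	/* pending interrupt flags */
	uint32_t late;	/* flags that arrive right after the next IR read */
};

/* Simulated IR read: returns the current flags, then the race happens. */
static uint32_t fake_read_ir(struct fake_mcan *dev)
{
	uint32_t val = dev->ir;

	dev->ir |= dev->late;	/* new interrupt lands after the read */
	dev->late = 0;
	return val;
}

/* Simulated acknowledge: writing a flag clears it. */
static void fake_ack_ir(struct fake_mcan *dev, uint32_t flags)
{
	dev->ir &= ~flags;
}

/*
 * Read and acknowledge until IR reads back as zero, so the (assumed
 * level-driven) line is guaranteed to have dropped before returning.
 * Returns the union of all flags that were acknowledged.
 */
static uint32_t drain_interrupts(struct fake_mcan *dev)
{
	uint32_t handled = 0;
	uint32_t pending;

	while ((pending = fake_read_ir(dev)) != 0) {
		fake_ack_ir(dev, pending);
		handled |= pending;
	}
	return handled;
}
```

With `ir = 0x1` and `late = 0x2`, a single read-then-acknowledge pass would leave flag 0x2 set; the loop picks it up on its second iteration and exits only once IR is empty.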

While we have received a report that even with this patch, the TX queue
can become stuck under certain (currently unknown) circumstances on the
Elkhart Lake, this patch completely fixes the issue with the above
reproducer, and it is unclear whether the remaining issue has a similar
cause at all.

Fixes: cab7ffc032 ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://patch.msgid.link/fdf0439c51bcb3a46c21e9fb21c7f1d06363be84.1728288535.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-12-18 09:30:52 +01:00
..
c_can can: c_can: c_can_handle_bus_err(): update statistics if skb allocation fails 2024-11-26 10:49:21 +01:00
cc770 can: {cc770,sja1000}_isa: allow building on x86_64 2024-11-04 17:46:06 +01:00
ctucanfd can: Switch back to struct platform_driver::remove() 2024-09-11 09:37:16 +02:00
dev can: dev: can_set_termination(): allow sleeping GPIOs 2024-11-26 10:13:34 +01:00
esd can: esd_402_pci: Add support for one-shot mode 2024-08-05 17:32:00 +02:00
flexcan can: Switch back to struct platform_driver::remove() 2024-09-11 09:37:16 +02:00
ifi_canfd can: ifi_canfd: ifi_canfd_handle_lec_err(): fix {rx,tx}_errors statistics 2024-11-26 10:50:40 +01:00
m_can can: m_can: fix missed interrupts with m_can_pci 2024-12-18 09:30:52 +01:00
mscan can: Switch back to struct platform_driver::remove() 2024-09-11 09:37:16 +02:00
peak_canfd can: peak_canfd: Remove setting of RX software timestamp 2024-09-03 15:17:47 -07:00
rcar can: Switch back to struct platform_driver::remove() 2024-09-11 09:37:16 +02:00
rockchip can: rockchip_canfd: Drop obsolete dependency on COMPILE_TEST 2024-11-04 18:01:06 +01:00
sja1000 can: sja1000: sja1000_err(): fix {rx,tx}_errors statistics 2024-11-26 10:50:54 +01:00
slcan tty: use u8 for flags 2023-08-11 21:12:45 +02:00
softing can: Switch back to struct platform_driver::remove() 2024-09-11 09:37:16 +02:00
spi can: mcp251xfd: mcp251xfd_get_tef_len(): work around erratum DS80000789E 6. 2024-11-26 11:42:32 +01:00
usb can: f81604: f81604_handle_can_bus_errors(): fix {rx,tx}_errors statistics 2024-11-26 10:51:12 +01:00
at91_can.c can: Switch back to struct platform_driver::remove() 2024-09-11 09:37:16 +02:00
bxcan.c can: Switch back to struct platform_driver::remove() 2024-09-11 09:37:16 +02:00
can327.c tty: use u8 for flags 2023-08-11 21:12:45 +02:00
grcan.c can: Switch back to struct platform_driver::remove() 2024-09-11 09:37:16 +02:00
janz-ican3.c can: Switch back to struct platform_driver::remove() 2024-09-11 09:37:16 +02:00
Kconfig can: rockchip_canfd: add driver for Rockchip CAN-FD controller 2024-09-04 14:41:51 +02:00
kvaser_pciefd.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-09-15 09:13:19 -07:00
Makefile can: rockchip_canfd: add driver for Rockchip CAN-FD controller 2024-09-04 14:41:51 +02:00
sun4i_can.c can: sun4i_can: sun4i_can_err(): fix {rx,tx}_errors statistics 2024-11-26 10:51:00 +01:00
ti_hecc.c can: Switch back to struct platform_driver::remove() 2024-09-11 09:37:16 +02:00
vcan.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
vxcan.c rtnetlink: fix double call of rtnl_link_get_net_ifla() 2024-12-03 11:29:29 +01:00
xilinx_can.c can: Switch back to struct platform_driver::remove() 2024-09-11 09:37:16 +02:00