linux-next/drivers/net/dsa
Vladimir Oltean acfcdb78d5 net: dsa: felix: fix stuck CPU-injected packets with short taprio windows
With this port schedule:

tc qdisc replace dev $send_if parent root handle 100 taprio \
	num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	map 0 1 2 3 4 5 6 7 \
	base-time 0 cycle-time 10000 \
	sched-entry S 01 1250 \
	sched-entry S 02 1250 \
	sched-entry S 04 1250 \
	sched-entry S 08 1250 \
	sched-entry S 10 1250 \
	sched-entry S 20 1250 \
	sched-entry S 40 1250 \
	sched-entry S 80 1250 \
	flags 2
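
As a quick sanity check on the schedule above, here is a minimal sketch
(not part of the driver or of iproute2). The entry list is transcribed by
hand from the sched-entry lines, and the helper name open_tcs is made up
for illustration. It confirms that each traffic class gets exactly one
1250 ns window and that the eight entries fill the 10000 ns cycle.

```python
# Gate masks (hexadecimal, as taprio takes them) and interval lengths in
# ns, transcribed by hand from the sched-entry lines of the tc command.
SCHED_ENTRIES = [
    (0x01, 1250), (0x02, 1250), (0x04, 1250), (0x08, 1250),
    (0x10, 1250), (0x20, 1250), (0x40, 1250), (0x80, 1250),
]
CYCLE_TIME_NS = 10000
NUM_TC = 8


def open_tcs(gate_mask):
    """Return the traffic classes whose gate bit is set in gate_mask."""
    return [tc for tc in range(NUM_TC) if gate_mask & (1 << tc)]


# Sum of all intervals, and total open time per traffic class.
total_ns = sum(interval for _, interval in SCHED_ENTRIES)
per_tc_ns = {tc: 0 for tc in range(NUM_TC)}
for mask, interval in SCHED_ENTRIES:
    for tc in open_tcs(mask):
        per_tc_ns[tc] += interval

print("entries fill the cycle:", total_ns == CYCLE_TIME_NS)
print("open time per tc (ns):", per_tc_ns)
```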

ptp4l would fail to take TX timestamps of Pdelay_Resp messages like this:

increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4134.168]: port 2: send peer delay response failed

It turns out that the driver can't take their TX timestamps because it
can't transmit them in the first place. And there's nothing special
about the Pdelay_Resp packets - they're just regular 68-byte packets.
But with this taprio configuration, the switch would refuse to send even
a packet of the ETH_ZLEN minimum size.

This should definitely not have been the case. When applying the taprio
config, the driver prints:

mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS

and thus, everything under 132 bytes - ETH_FCS_LEN should have been sent
without problems. Yet it was not.

For the forwarding path, the configuration is fine, yet packets injected
from Linux get stuck with this schedule no matter what.

The first hint that the static guard bands are the cause of the problem
is that reverting Michael Walle's commit 297c4de6f7 ("net: dsa: felix:
re-enable TAS guard band mode") made things work. It must be that the
guard bands are calculated incorrectly.

I remembered that there is a magic constant in the driver, set to 33 ns
for no logical reason other than experimentation, which says "never let
the static guard bands get so large as to leave less than this amount of
remaining space in the time slot, because the queue system will refuse
to schedule packets otherwise, and they will get stuck". I had a hunch
that my previous experimentally-determined value was only good for
packets coming from the forwarding path, and that the CPU injection path
needed more.

I came to the new value of 35 ns through binary search, after seeing
that with 544 ns (the bit time required to send the Pdelay_Resp packet
at gigabit) it works. Again, this is purely experimental: there's no
logic behind it, and the manual doesn't say anything.
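
For reference, the 544 ns figure follows directly from the packet size.
A minimal sketch, where bit_time_ns is a hypothetical helper and only
the frame bits are counted (no preamble or inter-frame gap):

```python
def bit_time_ns(frame_octets, speed_mbps):
    """Time in ns to serialize frame_octets at speed_mbps.

    Counts only the frame bits; preamble, SFD and inter-frame gap are
    deliberately left out.
    """
    return frame_octets * 8 * 1000 / speed_mbps


# The 68-byte Pdelay_Resp packet at gigabit speed.
print(bit_time_ns(68, 1000))
```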

The new driver prints for this schedule look like this:

mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS

So yes, the maximum MTU is now even smaller by 1 byte than before.
This is maybe counter-intuitive, but makes more sense with a diagram of
one time slot.

Before:

 Gate open                                   Gate close
 |                                                    |
 v           1250 ns total time slot duration         v
 <---------------------------------------------------->
 <----><---------------------------------------------->
  33 ns            1217 ns static guard band
  useful

After:

 Gate open                                   Gate close
 |                                                    |
 v           1250 ns total time slot duration         v
 <---------------------------------------------------->
 <-----><--------------------------------------------->
  35 ns            1215 ns static guard band
  useful

The static guard band implemented by this switch hardware directly
determines the maximum allowable MTU for that traffic class. The larger
it is, the earlier the switch will stop scheduling frames for
transmission, because otherwise they might overrun the gate close time
(and avoiding that is the entire purpose of Michael's patch).
So, we now have guard bands smaller by 2 ns (1215 ns instead of
1217 ns), thus, in this particular case, we lose a byte of the maximum
MTU.
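
The 132 and 131 octet limits in the driver prints can be reproduced from
the numbers in the two diagrams. The sketch below is an illustration
under stated assumptions, not the driver's code: max_frame_octets is a
hypothetical helper, and it assumes 20 octets of L1 overhead (preamble,
SFD and inter-frame gap) are subtracted from what fits in the guard
band, with the FCS counted as part of the frame.

```python
# Assumed L1 overhead not counted in the printed frame size:
# 7 octets preamble + 1 octet SFD + 12 octets inter-frame gap.
L1_OVERHEAD_OCTETS = 20


def max_frame_octets(gate_len_ns, reserved_ns, speed_mbps=1000):
    """Largest frame (including FCS) that fits in the static guard band.

    gate_len_ns: shortest gate-open interval of the traffic class.
    reserved_ns: part of the slot that must stay outside the guard band
                 (33 ns before this change, 35 ns after).
    """
    ps_per_octet = 8 * 1_000_000 // speed_mbps
    guard_band_ps = (gate_len_ns - reserved_ns) * 1000
    octets_on_wire = guard_band_ps // ps_per_octet
    # Never report zero or a negative size.
    return max(octets_on_wire - L1_OVERHEAD_OCTETS, 1)


print(max_frame_octets(1250, 33))  # guard band of 1217 ns
print(max_frame_octets(1250, 35))  # guard band of 1215 ns
```

The integer division is what makes a 2 ns change cost a whole octet
here: 1217 ns holds 152 octets at gigabit, while 1215 ns falls just
short and holds only 151.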

Fixes: 11afdc6526 ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Michael Walle <mwalle@kernel.org>
Link: https://patch.msgid.link/20241210132640.3426788-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-11 20:24:56 -08:00
b53 net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
hirschmann net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
microchip net: dsa: microchip: Add LAN9646 switch support to KSZ DSA driver 2024-11-13 19:54:58 -08:00
mv88e6xxx net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
ocelot net: dsa: felix: fix stuck CPU-injected packets with short taprio windows 2024-12-11 20:24:56 -08:00
qca dsa: qca8k: Use nested lock to avoid splat 2024-11-12 18:25:30 -08:00
realtek module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
sja1105 net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
xrs700x net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
bcm_sf2_cfp.c net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
bcm_sf2_regs.h net: dsa: bcm_sf2: refactor LED regs access 2021-12-30 17:28:32 -08:00
bcm_sf2.c net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
bcm_sf2.h net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
dsa_loop_bdinfo.c net: fill in MODULE_DESCRIPTION()s for dsa_loop_bdinfo 2024-02-09 14:12:02 -08:00
dsa_loop.c net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
dsa_loop.h
Kconfig net: dsa: vsc73xx: Implement the tag_8021q VLAN operations 2024-07-15 06:55:15 -07:00
lan9303_i2c.c net: Drop explicit initialization of struct i2c_device_id::driver_data to 0 2024-06-26 07:28:08 -07:00
lan9303_mdio.c dsa: lan9303: consistent naming for PHY address parameter 2024-07-15 08:49:59 -07:00
lan9303-core.c net: dsa: lan9303: ensure chip reset and wait for READY status 2024-10-07 16:38:02 -07:00
lan9303.h net: dsa: be compatible with masters which unregister on shutdown 2021-09-19 12:08:37 +01:00
lantiq_gswip.c net: dsa: Switch back to struct platform_driver::remove() 2024-10-04 16:39:57 -07:00
lantiq_pce.h net: dsa: Use the correct style for SPDX License Identifier 2019-09-22 15:25:08 -07:00
Makefile net: dsa: mt7530: introduce driver for MT7988 built-in switch 2023-04-03 10:13:01 +01:00
mt7530-mdio.c net: dsa: mt7530-mdio: read PHY address of switch from device tree 2024-04-23 10:32:40 +02:00
mt7530-mmio.c net: dsa: Switch back to struct platform_driver::remove() 2024-10-04 16:39:57 -07:00
mt7530.c net: dsa: mt7530: Add TBF qdisc offload support 2024-11-03 12:53:42 -08:00
mt7530.h net: dsa: mt7530: Add TBF qdisc offload support 2024-11-03 12:53:42 -08:00
mv88e6060.c net: dsa: mv88e6060: add phylink_get_caps implementation 2023-08-14 18:57:17 -07:00
mv88e6060.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
rzn1_a5psw.c net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
rzn1_a5psw.h net: dsa: rzn1-a5psw: add vlan support 2023-08-11 11:58:36 +01:00
vitesse-vsc73xx-core.c net: dsa: vsc73xx: fix reception from VLAN-unaware bridges 2024-10-15 18:41:52 -07:00
vitesse-vsc73xx-platform.c net: dsa: Switch back to struct platform_driver::remove() 2024-10-04 16:39:57 -07:00
vitesse-vsc73xx-spi.c net: dsa: vitesse-vsc73xx: remove unnecessary set_drvdata() 2022-09-22 19:30:39 -07:00
vitesse-vsc73xx.h net: dsa: vsc73xx: implement FDB operations 2024-09-03 10:22:58 +02:00