1323754 Commits

Author SHA1 Message Date
Frederic Weisbecker
33035977b4 net: pktgen: Use kthread_create_on_cpu()
Use the proper API instead of open coding it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241208234955.31910-1-frederic@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10 18:24:54 -08:00
Furong Xu
46afe345ff net: stmmac: Relocate extern declarations in common.h and hwif.h
The extern declarations should be in a header file that corresponds to
their definition, move these extern declarations to its header file.
Some of them have nowhere to go, so move them to hwif.h since they are
referenced in hwif.c only.

dwmac100_* dwmac1000_* dwmac4_* dwmac410_* dwmac510_* stay in hwif.h,
otherwise you will be flooded with name conflicts from dwmac100.h,
dwmac1000.h and dwmac4.h if hwif.c try to #include these .h files.

Compile tested only.
No functional change intended.

Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Furong Xu <0x1207@gmail.com>
Link: https://patch.msgid.link/20241208070202.203931-1-0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10 18:24:36 -08:00
Jakub Kicinski
23c57f404b Merge branch 'dsa-mv88e6xxx-refactor-statistics-ready-for-rmu-support'
Andrew Lunn says:

====================
dsa: mv88e6xxx: Refactor statistics ready for RMU support

Marvell Ethernet switches support sending commands to the switch
inside Ethernet frames, which the Remote Management Unit, RMU,
handles. One such command retries all the RMON statistics. The
switches however have other statistics which cannot be retried by this
bulk method, so need to be gathered individually.

This patch series refactors the existing statistics code into a
structure that will allow RMU integration in a future patchset.

There should be no functional change as a result of this refactoring.
====================

Link: https://patch.msgid.link/20241207-v6-13-rc1-net-next-mv88e6xxx-stats-refactor-v1-0-b9960f839846@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10 18:23:17 -08:00
Andrew Lunn
9a4eef6bf2 dsa: mv88e6xxx: Centralise common statistics check
With moving information about available statistics into the info
structure, the test becomes identical. Consolidate them into a single
test.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241207-v6-13-rc1-net-next-mv88e6xxx-stats-refactor-v1-2-b9960f839846@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10 18:23:12 -08:00
Andrew Lunn
5595e3613e dsa: mv88e6xxx: Move available stats into info structure
Different families of switches have different statistics available.
This information is current hard coded into functions, however this
information will also soon be needed when getting statistics from the
RMU. Move it into the info structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241207-v6-13-rc1-net-next-mv88e6xxx-stats-refactor-v1-1-b9960f839846@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10 18:23:12 -08:00
Jakub Kicinski
a0e1fc921c Merge branch 'add-support-for-synopsis-dwmac-ip-on-nxp-automotive-socs-s32g2xx-s32g3xx-s32r45'
Jan Petrous via says:

====================
Add support for Synopsis DWMAC IP on NXP Automotive SoCs S32G2xx/S32G3xx/S32R45

The SoC series S32G2xx and S32G3xx feature one DWMAC instance,
the SoC S32R45 has two instances. The devices can use RGMII/RMII/MII
interface over Pinctrl device or the output can be routed
to the embedded SerDes for SGMII connectivity.

The provided stmmac glue code implements only basic functionality,
interface support is restricted to RGMII only. More, including
SGMII/SerDes support will come later.

This patchset adds stmmac glue driver based on downstream NXP git [0].

[0] https://github.com/nxp-auto-linux/linux

v7: https://lore.kernel.org/20241202-upstream_s32cc_gmac-v7-0-bc3e1f9f656e@oss.nxp.com
v6: https://lore.kernel.org/20241124-upstream_s32cc_gmac-v6-0-dc5718ccf001@oss.nxp.com
v5: https://lore.kernel.org/20241119-upstream_s32cc_gmac-v5-0-7dcc90fcffef@oss.nxp.com
v4: https://lore.kernel.org/20241028-upstream_s32cc_gmac-v4-0-03618f10e3e2@oss.nxp.com
v3: https://lore.kernel.org/20241013-upstream_s32cc_gmac-v3-0-d84b5a67b930@oss.nxp.com
====================

Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-0-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:06 -08:00
Jan Petrous (OSS)
6bc6234cbd MAINTAINERS: Add Jan Petrous as the NXP S32G/R DWMAC driver maintainer
Add myself as NXP S32G/R DWMAC Ethernet driver maintainer.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-15-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:04 -08:00
Jan Petrous (OSS)
cd197ac5d6 net: stmmac: dwmac-s32: add basic NXP S32G/S32R glue driver
NXP S32G2xx/S32G3xx and S32R45 are automotive grade SoCs
that integrate one or two Synopsys DWMAC 5.10/5.20 IPs.

The basic driver supports only RGMII interface.

Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-14-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:03 -08:00
Jan Petrous (OSS)
91f10e5895 dt-bindings: net: Add DT bindings for DWMAC on NXP S32G/R SoCs
Add basic description for DWMAC ethernet IP on NXP S32G2xx, S32G3xx
and S32R45 automotive series SoCs.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-13-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:03 -08:00
Jan Petrous (OSS)
1ead577755 net: dwmac-sti: Use helper rgmii_clock
Utilize a new helper function rgmii_clock().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-12-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:03 -08:00
Jan Petrous (OSS)
fd59bca4d5 net: xgene_enet: Use helper rgmii_clock
Utilize a new helper function rgmii_clock().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-11-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:03 -08:00
Jan Petrous (OSS)
04207d28f4 net: macb: Use helper rgmii_clock
Utilize a new helper function rgmii_clock().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-10-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:03 -08:00
Jan Petrous (OSS)
b561d717a7 net: dwmac-starfive: Use helper rgmii_clock
Utilize a new helper function rgmii_clock().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-9-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:03 -08:00
Jan Petrous (OSS)
30b4a9b5c3 net: dwmac-rk: Use helper rgmii_clock
Utilize a new helper function rgmii_clock().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-8-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:03 -08:00
Jan Petrous (OSS)
8470bfc835 net: dwmac-intel-plat: Use helper rgmii_clock
Utilize a new helper function rgmii_clock().

When in, remove dead code in kmb_eth_fix_mac_speed().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-7-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:02 -08:00
Jan Petrous (OSS)
839b75ea4d net: dwmac-imx: Use helper rgmii_clock
Utilize a new helper function rgmii_clock().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-6-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:02 -08:00
Jan Petrous (OSS)
37b66c483e net: dwmac-dwc-qos-eth: Use helper rgmii_clock
Utilize a new helper function rgmii_clock().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-5-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:02 -08:00
Jan Petrous (OSS)
386aa60abd net: phy: Add helper for mapping RGMII link speed to clock rate
The RGMII interface supports three data rates: 10/100 Mbps
and 1 Gbps. These speeds correspond to clock frequencies
of 2.5/25 MHz and 125 MHz, respectively.

Many Ethernet drivers, including glues in stmmac, follow
a similar pattern of converting RGMII speed to clock frequency.

To simplify code, define the helper rgmii_clock(speed)
to convert connection speed to clock frequency.

Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-4-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:02 -08:00
Jan Petrous (OSS)
cb09f61a9a net: stmmac: Fix clock rate variables size
The clock API clk_get_rate() returns unsigned long value.
Expand affected members of stmmac platform data and
convert the stmmac_clk_csr_set() and dwmac4_core_init() methods
to defining the unsigned long clk_rate local variables.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-3-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:02 -08:00
Jan Petrous (OSS)
c8fab05d02 net: stmmac: Extend CSR calc support
Add support for CSR clock range up to 800 MHz.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-2-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:02 -08:00
Jan Petrous (OSS)
31cdd84182 net: stmmac: Fix CSR divider comment
The comment in declaration of STMMAC_CSR_250_300M
incorrectly describes the constant as '/* MDC = clk_scr_i/122 */'
but the DWC Ether QOS Handbook version 5.20a says it is
CSR clock/124.

Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20241205-upstream_s32cc_gmac-v8-1-ec1d180df815@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:36:02 -08:00
Nikita Yushchenko
32fd46f5b6 net: renesas: rswitch: remove speed from gwca structure
This field is set but never used.

GWCA is rswitch CPU interface module which connects rswitch to the
host over AXI bus. Speed of the switch ports is not anyhow related to
GWCA operation.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20241206192140.1714-2-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:27:29 -08:00
Nikita Yushchenko
070927427d net: renesas: rswitch: do not deinit disabled ports
In rswitch_ether_port_init_all(), only enabled ports are initialized.
Then, rswitch_ether_port_deinit_all() shall also only deinitialize
enabled ports.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20241206192140.1714-1-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 18:27:29 -08:00
Shinas Rasheed
8a241ef9b9 octeon_ep: add ndo ops for VFs in PF driver
These APIs are needed to support applications that use netlink to get VF
information from a PF driver.

Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Link: https://patch.msgid.link/20241206064135.2331790-1-srasheed@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 16:13:51 -08:00
Jakub Kicinski
e58b4771af Merge branch 'vxlan-support-user-defined-reserved-bits'
Petr Machata says:

====================
vxlan: Support user-defined reserved bits

Currently the VXLAN header validation works by vxlan_rcv() going feature
by feature, each feature clearing the bits that it consumes. If anything
is left unparsed at the end, the packet is rejected.

Unfortunately there are machines out there that send VXLAN packets with
reserved bits set, even if they are configured to not use the
corresponding features. One such report is here[1], and we have heard
similar complaints from our customers as well.

This patchset adds an attribute that makes it configurable which bits
the user wishes to tolerate and which they consider reserved. This was
recommended in [1] as well.

A knob like that inevitably allows users to set as reserved bits that
are in fact required for the features enabled by the netdevice, such as
GPE. This is detected, and such configurations are rejected.

In patches #1..#7, the reserved bits validation code is gradually moved
away from the unparsed approach described above, to one where a given
set of valid bits is precomputed and then the packet is validated
against that.

In patch #8, this precomputed set is made configurable through a new
attribute IFLA_VXLAN_RESERVED_BITS.

Patches #9 and #10 massage the testsuite a bit, so that patch #11 can
introduce a selftest for the resreved bits feature.

The corresponding iproute2 support is available in [2].

[1] https://lore.kernel.org/netdev/db8b9e19-ad75-44d3-bfb2-46590d426ff5@proxmox.com/
[2] https://github.com/pmachata/iproute2/commits/vxlan_reserved_bits/
====================

Link: https://patch.msgid.link/cover.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:12 -08:00
Petr Machata
d84b5dccf3 selftests: forwarding: Add a selftest for the new reserved_bits UAPI
Run VXLAN packets through a gateway. Flip individual bits of the packet
and/or reserved bits of the gateway, and check that the gateway treats the
packets as expected.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/388bef3c30ebc887d4e64cd86a362e2df2f2d2e1.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:06 -08:00
Petr Machata
d76ccb2ec3 selftests: net: lib: Add several autodefer helpers
Add ip_link_set_addr(), ip_link_set_up(), ip_addr_add() and ip_route_add()
to the suite of helpers that automatically schedule a corresponding
cleanup.

When setting a new MAC, one needs to remember the old address first. Move
mac_get() from forwarding/ to that end.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/add6bcbe30828fd01363266df20c338cf13aaf25.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:05 -08:00
Petr Machata
8653eb21d6 selftests: net: lib: Rename ip_link_master() to ip_link_set_master()
Let's have a verb in that function name to make it clearer what's going on.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/fbf7c53a429b340b9cff5831280ea8c305a224f9.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:05 -08:00
Petr Machata
6c11379b10 vxlan: Add an attribute to make VXLAN header validation configurable
The set of bits that the VXLAN netdevice currently considers reserved is
defined by the features enabled at the netdevice construction. In order to
make this configurable, add an attribute, IFLA_VXLAN_RESERVED_BITS. The
payload is a pair of big-endian u32's covering the VXLAN header. This is
validated against the set of flags used by the various enabled VXLAN
features, and attempts to override bits used by an enabled feature are
bounced.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/c657275e5ceed301e62c69fe8e559e32909442e2.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:05 -08:00
Petr Machata
bb16786ed6 vxlan: vxlan_rcv(): Drop unparsed
The code currently validates the VXLAN header in two ways: first by
comparing it with the set of reserved bits, constructed ahead of time
during the netdevice construction; and second by gradually clearing the
bits off a separate copy of VXLAN header, "unparsed". Drop the latter
validation method.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/4559f16c5664c189b3a4ee6f5da91f552ad4821c.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:04 -08:00
Petr Machata
752b1c8d8b vxlan: Bump error counters for header mismatches
The VXLAN driver so far has not increased the error counters for packets
that set reserved bits. It does so for other packet errors, so do it for
this case as well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/d096084167d56706d620afe5136cf37a2d34d1b9.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:04 -08:00
Petr Machata
e4f8647767 vxlan: Track reserved bits explicitly as part of the configuration
In order to make it possible to configure which bits in VXLAN header should
be considered reserved, introduce a new field vxlan_config::reserved_bits.
Have it cover the whole header, except for the VNI-present bit and the bits
for VNI itself, and have individual enabled features clear more bits off
reserved_bits.

(This is expressed as first constructing a used_bits set, and then
inverting it to get the reserved_bits. The set of used_bits will be useful
on its own for validation of user-set reserved_bits in a following patch.)

The patch also moves a comment relevant to the validation from the unparsed
validation site up to the new site. Logically this patch should add the new
comment, and a later patch that removes the unparsed bits would remove the
old comment. But keeping both legs in the same patch is better from the
history spelunking point of view.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/984dbf98d5940d3900268dbffaf70961f731d4a4.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:04 -08:00
Petr Machata
e713130dfb vxlan: vxlan_rcv(): Extract vxlan_hdr(skb) to a named variable
Having a named reference to the VXLAN header is more handy than having to
conjure it anew through vxlan_hdr() on every use. Add a new variable and
convert several open-coded sites.

Additionally, convert one "unparsed" use to the new variable as well. Thus
the only "unparsed" uses that remain are the flag-clearing and the header
validity check at the end.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/2a0a940e883c435a0fdbcdc1d03c4858957ad00e.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:03 -08:00
Petr Machata
fe3dcbcfae vxlan: vxlan_rcv() callees: Drop the unparsed argument
The functions vxlan_remcsum() and vxlan_parse_gbp_hdr() take both the SKB
and the unparsed VXLAN header. Now that unparsed adjustment is handled
directly by vxlan_rcv(), drop this argument, and have the function derive
it from the SKB on its own.

vxlan_parse_gpe_proto() does not take SKB, so keep the header parameter.
However const it so that it's clear that the intention is that it does not
get changed.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/5ea651f4e06485ba1a84a8eb556a457c39f0dfd4.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:03 -08:00
Petr Machata
0f09ae9078 vxlan: vxlan_rcv() callees: Move clearing of unparsed flags out
In order to migrate away from the use of unparsed to detect invalid flags,
move all the code that actually clears the flags from callees directly to
vxlan_rcv().

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/2857871d929375c881b9defe378473c8200ead9b.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:03 -08:00
Petr Machata
9234a37a49 vxlan: In vxlan_rcv(), access flags through the vxlan netdevice
vxlan_sock.flags is constructed from vxlan_dev.cfg.flags, as the subset of
flags (named VXLAN_F_RCV_FLAGS) that is important from the point of view of
socket sharing. Attempts to reconfigure these flags during the vxlan netdev
lifetime are also bounced. It is therefore immaterial whether we access the
flags through the vxlan_dev or through the socket.

Convert the socket accesses to netdevice accesses in this separate patch to
make the conversions that take place in the following patches more obvious.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/5d237ffd731055e524d7b7c436de43358d8743d2.1733412063.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:47:02 -08:00
Jakub Kicinski
3f330db306 net: reformat kdoc return statements
kernel-doc -Wall warns about missing Return: statement for non-void
functions. We have a number of kdocs in our headers which are missing
the colon, IOW they use
 * Return some value
or
 * Returns some value

Having the colon makes some sense, it should help kdoc parser avoid
false positives. So add them. This is mostly done with a sed script,
and removing the unnecessary cases (mostly the comments which aren't
kdoc).

Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://patch.msgid.link/20241205165914.1071102-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:44:59 -08:00
Jesse Van Gavere
ca78588805 net: dsa: microchip: Make MDIO bus name unique
In configurations with 2 or more DSA clusters it will fail to allocate
unique MDIO bus names as only the switch ID is used, fix this by using
a combination of the tree ID and switch ID when needed

Signed-off-by: Jesse Van Gavere <jesse.vangavere@scioteq.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241206204202.649912-1-jesse.vangavere@scioteq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:32:00 -08:00
Eric Dumazet
2d20773aec mctp: no longer rely on net->dev_index_head[]
mctp_dump_addrinfo() is one of the last users of
net->dev_index_head[] in the control path.

Switch to for_each_netdev_dump() for better scalability.

Use C99 for mctp_device_rtnl_msg_handlers[] to prepare
future RTNL removal from mctp_dump_addrinfo()

(mdev->addrs is not yet RCU protected)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Matt Johnston <matt@codeconstruct.com.au>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20241206223811.1343076-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 14:29:14 -08:00
Jakub Kicinski
f9663b7caf Merge branch 'rxrpc-implement-jumbo-data-transmission-and-rack-tlp'
David Howells says:

====================
rxrpc: Implement jumbo DATA transmission and RACK-TLP

Here's a series of patches to implement two main features:

 (1) The transmission of jumbo data packets whereby several DATA packets of
     a particular size can be glued together into a single UDP packet,
     allowing us to make use of larger MTU sizes.  The basic jumbo
     subpacket capacity is 1412 bytes (RXRPC_JUMBO_DATALEN) and, say, an
     MTU of 8192 allows five of them to be transmitted as one.

     An alternative (and possibly more efficient way) would be to
     expand/shrink the capacity of each DATA packet to match the MTU and
     thus save on header and tail-gap overhead, but the Rx protocol does
     not provide a mechanism for splitting the data - especially as the
     transported data is encrypted per-packet - and so UDP fragmentation
     would be the only way to handle this.

     In fact, in the future, AF_RXRPC also needs to look at shrinking the
     packet size where the MTU is smaller - for instance in the case of
     being carried by IPv6 over wifi where there isn't capacity for a 1412
     byte capacity.

 (2) RACK-TLP to manage packet loss and retransmission in conjunction with
     the congestion control algorithm.

These allow for better data throughput and work towards being able to have
larger transmission windows.

To this end, the following changes are also made:

 (1) Use a single large array of kvec structs for the I/O thread rather
     than having one per transmission buffer.  We need a much bigger
     collection of kvecs for ping padding

 (2) Implement path-MTU probing by sending padded PING ACK packets and
     monitoring for PING RESPONSE ACKs.  The pmtud value determined is used
     to configure the construction of jumbo DATA packets.

 (3) The transmission queue is changed from a linked list of transmission
     buffer structs to a linked list of transmission-queue structs, each of
     which points to either 32 or 64 transmission buffers (depending on cpu
     word size) and various bits of metadata are concentrated in the queue
     structs rather than the buffers to make better use of the cpu cache.

 (4) SACK data is stored in the transmission-queue structures in batches of
     32 or 64 making it faster to process rather than being spread amongst
     all the individual packet buffers.

 (5) Don't change the DF flag on the UDP socket unless we need to - and
     basically only enable it for path-MTU probing.

There are also some additional bits:

 (1) Fix the handling of connection aborts to poke the aborted connections.

 (2) Don't set the MORE-PACKETS Rx header flag on the wire.  No one
     actually checks it and it is, in any case, generated inconsistently
     between implementations.

 (3) Request an ACK when, during call transmission, there's a stall in the
     app generating the data to be transmitted.

 (4) Fix attention starvation in the I/O thread by making sure we go
     through all outstanding events rather than returning to the beginning
     of the check cycle after any time we process an event.

 (5) Don't use the skbuff timestamp in the calculation of timeouts and RTT
     as we really should include local processing time in that too.
     Further, getting receive skbuff timestamps may be expensive.

 (6) Make RTT tracking per call with the saving of the value between calls,
     even within the same connection channel.  The initial call timeout
     starts off large to allow the server time to set up its state before
     the initial reply.

 (7) Don't allocate txbuf structs for ACK packets, but rather use page
     frags and MSG_SPLICE_PAGES.

 (8) Use irq-disabling locks for interactions between app threads and I/O
     threads so that the I/O thread doesn't get help up.

 (9) Make rxrpc set the REQUEST-ACK flag on an outgoing packet when cwnd is
     at RXRPC_MIN_CWND (currently 4), not at 2 which it can never reach.

(10) Add some tracing bits and pieces (including displaying the userStatus
     field in an ACK header) and some more stats counters (including
     different sizes of jumbo packets sent/received).

Link: https://lore.kernel.org/r/20240306000655.1100294-1-dhowells@redhat.com/ [1]
====================

Link: https://patch.msgid.link/20241204074710.990092-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:37 -08:00
David Howells
7c48266593 rxrpc: Implement RACK/TLP to deal with transmission stalls [RFC8985]
When an rxrpc call is in its transmission phase and is sending a lot of
packets, stalls occasionally occur that cause severe performance
degradation (eg. increasing the transmission time for a 256MiB payload from
0.7s to 2.5s over a 10G link).

rxrpc already implements TCP-style congestion control [RFC5681] and this
helps mitigate the effects, but occasionally we're missing a time event
that deals with a missing ACK, leading to a stall until the RTO expires.

Fix this by implementing RACK/TLP in rxrpc.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:33 -08:00
David Howells
4ee4c2f82b rxrpc: Fix request for an ACK when cwnd is minimum
rxrpc_prepare_data_subpacket() sets the REQUEST-ACK flag on the outgoing
DATA packet under a number of circumstances, including, theoretically, when
the cwnd is at minimum (or less).  However, the minimum in this function is
hard-coded as 2, but the actual minimum is RXRPC_MIN_CWND (which is
currently 4) and so this never occurs.

Without this, we will miss the request of some ACKs, potentially leading to
a transmission stall until a timeout occurs on one side or the other that
leads to an ACK being generated.

Fix the function to use RXRPC_MIN_CWND rather than a hard-coded number.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:32 -08:00
David Howells
b40ef2b85a rxrpc: Manage RTT per-call rather than per-peer
Manage the determination of RTT on a per-call (ie. per-RPC op) basis rather
than on a per-peer basis, averaging across all calls going to that peer.
The problem is that the RTT measurements from the initial packets on a call
may be off because the server may do some setting up (such as getting a
lock on a file) before accepting the rest of the data in the RPC and,
further, the RTT may be affected by server-side file operations, for
instance if a large amount of data is being written or read.

Note: When handling the FS.StoreData-type RPCs, for example, the server
uses the userStatus field in the header of ACK packets as supplementary
flow control to aid in managing this.  AF_RXRPC does not yet support this,
but it should be added.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:32 -08:00
David Howells
b509934094 rxrpc: Add a reason indicator to the tx_ack tracepoint
Record the reason for the transmission of an ACK in the rxrpc_tx_ack
tracepoint, and not just in the rxrpc_propose_ack tracepoint.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:32 -08:00
David Howells
372d12d191 rxrpc: Add a reason indicator to the tx_data tracepoint
Add an indicator to the rxrpc_tx_data tracepoint to indicate what triggered
the transmission of a particular packet.  At this point, it's only normal
transmission and retransmission, plus the tracepoint is also used to record
loss injection, but in a future patch, TLP-induced (re-)transmission will
also be a thing.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:32 -08:00
David Howells
547a9acd4c rxrpc: Tidy up the ACK parsing a bit
Tidy up the ACK parsing in the following ways:

 (1) Put the serial number of the ACK packet into the rxrpc_ack_summary
     struct and access it from there whilst parsing an ACK.

 (2) Be consistent about using "if (summary.acked_serial)" rather than "if
     (summary.acked_serial != 0)".

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:31 -08:00
David Howells
a2ea9a9072 rxrpc: Use irq-disabling spinlocks between app and I/O thread
Where a spinlock is used by both the application thread and the I/O thread,
use irq-disabling locking so that an interrupt taken on the app thread
doesn't also slow down the I/O thread.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:31 -08:00
David Howells
08d55d7cf3 rxrpc: Don't allocate a txbuf for an ACK transmission
Don't allocate an rxrpc_txbuf struct for an ACK transmission.  There's now
no need as the memory to hold the ACK content is allocated with a page frag
allocator.  The allocation and freeing of a txbuf is just unnecessary
overhead.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:31 -08:00
David Howells
fe24a54943 rxrpc: Send jumbo DATA packets
Send jumbo DATA packets if the path-MTU probing using padded PING ACK
packets shows up sufficient capacity to do so.  This allows larger chunks
of data to be sent without reducing the retryability as the subpackets in a
jumbo packet can also be retransmitted individually.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:31 -08:00
David Howells
0130eff911 rxrpc: Fix initial resend timeout
The constant for the initial resend timeout is in milliseconds, but the
variable it's assigned to is in microseconds.  Fix the constant to be in
microseconds.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09 13:48:30 -08:00