Commit Graph

1253557 Commits

Author SHA1 Message Date
Jakub Kicinski
196febcb82 Merge branch 'tools-net-ynl-add-support-for-nlctrl-netlink-family'
Donald Hunter says:

====================
tools/net/ynl: Add support for nlctrl netlink family

This series adds a new YNL spec for the nlctrl family, plus some fixes
and enhancements for ynl.

Patch 1 fixes an extack decoding bug
Patch 2 gives cleaner netlink error reporting
Patch 3 fixes an array-nest codegen bug
Patch 4 adds nest-type-value support to ynl
Patch 5 fixes the ynl schemas to allow empty enum-name attrs
Patch 6 contains the nlctrl spec
====================

Link: https://lore.kernel.org/r/20240306231046.97158-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:28:41 -08:00
Donald Hunter
768e044a5f doc/netlink/specs: Add spec for nlctrl netlink family
Add a spec for the nlctrl family.

Example usage:

./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/nlctrl.yaml \
    --do getfamily --json '{"family-name": "nlctrl"}'

./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/nlctrl.yaml \
    --dump getpolicy --json '{"family-name": "nlctrl"}'

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240306231046.97158-7-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:28:38 -08:00
Donald Hunter
bc52b39309 doc/netlink: Allow empty enum-name in ynl specs
Update the ynl schemas to allow the specification of empty enum names
for all enum code generation.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240306231046.97158-6-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:28:35 -08:00
Donald Hunter
b6e6a76dec tools/net/ynl: Add nest-type-value decoding
The nlctrl genetlink-legacy family uses nest-type-value encoding as
described in Documentation/userspace-api/netlink/genetlink-legacy.rst

Add nest-type-value decoding to ynl.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240306231046.97158-5-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:28:32 -08:00
Donald Hunter
6fe7de5e9c tools/net/ynl: Fix c codegen for array-nest
ynl-gen-c generates e.g. 'calloc(mcast_groups, sizeof(*dst->mcast_groups))'
for array-nest attrs when it should be 'n_mcast_groups'.

Add a 'n_' prefix in the generated code for array-nests.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240306231046.97158-4-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:28:29 -08:00
Donald Hunter
771b7012e5 tools/net/ynl: Report netlink errors without stacktrace
ynl does not handle NlError exceptions so they get reported like program
failures. Handle the NlError exceptions and report the netlink errors
more cleanly.

Example now:

Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
	error: -2	extack: {'bad-attr': '.op'}

Example before:

Traceback (most recent call last):
  File "/home/donaldh/net-next/./tools/net/ynl/cli.py", line 81, in <module>
    main()
  File "/home/donaldh/net-next/./tools/net/ynl/cli.py", line 69, in main
    reply = ynl.dump(args.dump, attrs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/donaldh/net-next/tools/net/ynl/lib/ynl.py", line 906, in dump
    return self._op(method, vals, [], dump=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/donaldh/net-next/tools/net/ynl/lib/ynl.py", line 872, in _op
    raise NlError(nl_msg)
lib.ynl.NlError: Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
	error: -2	extack: {'bad-attr': '.op'}

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240306231046.97158-3-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:28:26 -08:00
Donald Hunter
cecbc52c46 tools/net/ynl: Fix extack decoding for netlink-raw
Extack decoding was using a hard-coded msg header size of 20 but
netlink-raw has a header size of 16.

Use a protocol specific msghdr_size() when decoding the attr offssets.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240306231046.97158-2-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:28:22 -08:00
Jakub Kicinski
62a1e41602 Merge branch 'isdn-constify-struct-class-usage'
Ricardo B. Marliere says:

====================
isdn: constify struct class usage

This is a simple and straight forward cleanup series that aims to make the
class structures in isdn constant. This has been possible since 2023 [1].

[1]: https://lore.kernel.org/all/2023040248-customary-release-4aec@gregkh/
====================

Link: https://lore.kernel.org/r/20240305-class_cleanup-isdn-v1-0-6f0edca75b61@marliere.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:26:26 -08:00
Ricardo B. Marliere
12fbd67ea3 isdn: capi: make capi_class constant
Since commit 43a7206b09 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the capi_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240305-class_cleanup-isdn-v1-2-6f0edca75b61@marliere.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:26:24 -08:00
Ricardo B. Marliere
479b4bc867 isdn: mISDN: make elements_class constant
Since commit 43a7206b09 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the elements_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240305-class_cleanup-isdn-v1-1-6f0edca75b61@marliere.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:26:24 -08:00
Jérémie Dautheribes
b72413211b dt-bindings: net: dp83822: change ti,rmii-mode description
Drop reference to the 25MHz clock as it has nothing to do with connecting
the PHY and the MAC.
Add info about the reference clock direction between the PHY and the MAC
as it depends on the selected rmii mode.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jérémie Dautheribes <jeremie.dautheribes@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240305141309.127669-1-jeremie.dautheribes@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:25:28 -08:00
Justin Swartz
7a04ff1277 net: x25: remove dead links from Kconfig
Remove the "You can read more about X.25 at" links provided in
Kconfig as they have not pointed at any relevant pages for quite
a while.

An old copy of https://www.sangoma.com/tutorials/x25/ can be
retrieved via https://archive.org/web/ but nothing useful seems
to have been preserved for http://docwiki.cisco.com/wiki/X.25

For the sake of necromancy and those who really did want to
read more about X.25, a previous incarnation of Kconfig included
a link to:
http://www.cisco.com/univercd/cc/td/doc/product/software/ios11/cbook/cx25.htm

Which can still be read at:
https://web.archive.org/web/20071013101232/http://cisco.com/en/US/docs/ios/11_0/router/configuration/guide/cx25.html

Signed-off-by: Justin Swartz <justin.swartz@risingedge.co.za>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20240306112659.25375-1-justin.swartz@risingedge.co.za
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:24:35 -08:00
Jakub Kicinski
15d2540e0d tools: ynl: check for overflow of constructed messages
Donald points out that we don't check for overflows.
Stash the length of the message on nlmsg_pid (nlmsg_seq would
do as well). This allows the attribute helpers to remain
self-contained (no extra arguments). Also let the put
helpers continue to return nothing. The error is checked
only in (newly introduced) ynl_msg_end().

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240305185000.964773-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 11:01:31 -08:00
Jakub Kicinski
e3afe5dd3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

net/core/page_pool_user.c
  0b11b1c5c3 ("netdev: let netlink core handle -EMSGSIZE errors")
  429679dcf7 ("page_pool: fix netlink dump stop/resume")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 10:29:36 -08:00
Linus Torvalds
df4793505a Including fixes from bpf, ipsec and netfilter.
No solution yet for the stmmac issue mentioned in the last PR,
 but it proved to be a lockdep false positive, not a blocker.
 
 Current release - regressions:
 
   - dpll: move all dpll<>netdev helpers to dpll code, fix build
     regression with old compilers
 
 Current release - new code bugs:
 
   - page_pool: fix netlink dump stop/resume
 
 Previous releases - regressions:
 
   - bpf: fix verifier to check bpf_func_state->callback_depth when pruning
        states as otherwise unsafe programs could get accepted
 
   - ipv6: avoid possible UAF in ip6_route_mpath_notify()
 
   - ice: reconfig host after changing MSI-X on VF
 
   - mlx5:
     - e-switch, change flow rule destination checking
     - add a memory barrier to prevent a possible null-ptr-deref
     - switch to using _bh variant of of spinlock where needed
 
 Previous releases - always broken:
 
   - netfilter: nf_conntrack_h323: add protection for bmp length out of range
 
   - bpf: fix to zero-initialise xdp_rxq_info struct before running XDP
 	program in CPU map which led to random xdp_md fields
 
   - xfrm: fix UDP encapsulation in TX packet offload
 
   - netrom: fix data-races around sysctls
 
   - ice:
     - fix potential NULL pointer dereference in ice_bridge_setlink()
     - fix uninitialized dplls mutex usage
 
   - igc: avoid returning frame twice in XDP_REDIRECT
 
   - i40e: disable NAPI right after disabling irqs when handling xsk_pool
 
   - geneve: make sure to pull inner header in geneve_rx()
 
   - sparx5: fix use after free inside sparx5_del_mact_entry
 
   - dsa: microchip: fix register write order in ksz8_ind_write8()
 
 Misc:
 
   -  selftests: mptcp: fixes for diag.sh
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmXptoYSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkK3IP+QGe1Q37l75YM8IPpihjNYvBTiP6VWv0
 3cKoI0kz2EF5zmt3RAPK1M/ea1GY1L4Fsa/tdV0b9BzP9xC3si7IdFLZLqXh5tUX
 tW5m1LIoPqYLXE2i7qtOS5omMuCqKm2gM7TURarJA0XsAGyu645bYiJeT5dybnZQ
 AuAsXKj9RM3AkcLiqB4PZjdDuG9vIQLi2wSIybP4KFGqY7UMRlkRKFYlu2rpF29s
 XPlR671chaX90sP4bNwf+qVr81Ebu9APmDA0a9tVFDkgEqhPezpRDGHr2Kj+W25s
 j3XXwoygL6gIpJKzRgHsugAaZjla82DpCuygPOcmtTEEtHmF6fn8mBebjY/QDL6w
 ibbcOYJpzPFccRfMyHiiwzjqcaj+Zc58DktFf3H4EnKJULPralhKyMoyPngiAo1Y
 wNIGlWR8SNLhJzyZMeFPMKsz3RnLiC5vMdXMFfZdyH1RHHib5L+8AVogya+SaVkF
 1J1DrrShOEddvlrbZbM8c/03WHkAJXSRD34oHW9c3PkZscSzHmB1xqI1bER6sc5U
 5FjuDnsQDQ61pa6pip2Ug71UOw6ZAwZJs6AgestI49caDvUpSKI7jg/F6Dle6wNT
 p2KVUWFoz5BQBXG8Ut7yWpWvoEmaHe0cEn03rqZSYFnltWgkNvWMRMhkzuroOHWO
 UmOnuVIQH9Vh
 =0bH0
 -----END PGP SIGNATURE-----

Merge tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf, ipsec and netfilter.

  No solution yet for the stmmac issue mentioned in the last PR, but it
  proved to be a lockdep false positive, not a blocker.

  Current release - regressions:

   - dpll: move all dpll<>netdev helpers to dpll code, fix build
     regression with old compilers

  Current release - new code bugs:

   - page_pool: fix netlink dump stop/resume

  Previous releases - regressions:

   - bpf: fix verifier to check bpf_func_state->callback_depth when
     pruning states as otherwise unsafe programs could get accepted

   - ipv6: avoid possible UAF in ip6_route_mpath_notify()

   - ice: reconfig host after changing MSI-X on VF

   - mlx5:
       - e-switch, change flow rule destination checking
       - add a memory barrier to prevent a possible null-ptr-deref
       - switch to using _bh variant of of spinlock where needed

  Previous releases - always broken:

   - netfilter: nf_conntrack_h323: add protection for bmp length out of
     range

   - bpf: fix to zero-initialise xdp_rxq_info struct before running XDP
     program in CPU map which led to random xdp_md fields

   - xfrm: fix UDP encapsulation in TX packet offload

   - netrom: fix data-races around sysctls

   - ice:
       - fix potential NULL pointer dereference in ice_bridge_setlink()
       - fix uninitialized dplls mutex usage

   - igc: avoid returning frame twice in XDP_REDIRECT

   - i40e: disable NAPI right after disabling irqs when handling
     xsk_pool

   - geneve: make sure to pull inner header in geneve_rx()

   - sparx5: fix use after free inside sparx5_del_mact_entry

   - dsa: microchip: fix register write order in ksz8_ind_write8()

  Misc:

   - selftests: mptcp: fixes for diag.sh"

* tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  net: pds_core: Fix possible double free in error handling path
  netrom: Fix data-races around sysctl_net_busy_read
  netrom: Fix a data-race around sysctl_netrom_link_fails_count
  netrom: Fix a data-race around sysctl_netrom_routing_control
  netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
  netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
  netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
  netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
  netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
  netrom: Fix a data-race around sysctl_netrom_transport_timeout
  netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
  netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
  netrom: Fix a data-race around sysctl_netrom_default_path_quality
  netfilter: nf_conntrack_h323: Add protection for bmp length out of range
  netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
  netfilter: nft_ct: fix l3num expectations with inet pseudo family
  netfilter: nf_tables: reject constant set with timeout
  netfilter: nf_tables: disallow anonymous set with timeout flag
  net/rds: fix WARNING in rds_conn_connect_if_down
  net: dsa: microchip: fix register write order in ksz8_ind_write8()
  ...
2024-03-07 09:23:33 -08:00
Luiz Augusto von Dentz
42ed95de82 Bluetooth: ISO: Align broadcast sync_timeout with connection timeout
This aligns broadcast sync_timeout with existing connection timeouts
which are 20 seconds long.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-07 11:58:17 -05:00
Paolo Abeni
a148f82c45 Merge branch 'tcp-add-two-missing-addresses-when-using-trace'
Jason Xing says:

====================
tcp: add two missing addresses when using trace

When I reviewed other people's patch [1], I noticed that similar things
also happen in tcp_event_skb class and tcp_event_sk_skb class. They
don't print those two addrs of skb/sk which already exist.

In this patch, I just do as other trace functions do, like
trace_net_dev_start_xmit(), to know the exact flow or skb we would like
to know in case some systems doesn't support BPF programs well or we
have to use /sys/kernel/debug/tracing only for some reasons.

[1]
Link: https://lore.kernel.org/netdev/CAL+tcoAhvFhXdr1WQU8mv_6ZX5nOoNpbOLAB6=C+DB-qXQ11Ew@mail.gmail.com/

v2
Link: https://lore.kernel.org/netdev/CANn89iJcScraKAUk1GzZFoOO20RtC9iXpiJ4LSOWT5RUAC_QQA@mail.gmail.com/
1. change the description.
====================

Link: https://lore.kernel.org/r/20240304092934.76698-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 15:29:51 +01:00
Jason Xing
0ab544b6f0 tcp: add tracing of skbaddr in tcp_event_skb class
Use the existing parameter and print the address of skbaddr
as other trace functions do.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 15:29:15 +01:00
Jason Xing
4e441bb8ac tcp: add tracing of skb/skaddr in tcp_event_sk_skb class
Printing the addresses can help us identify the exact skb/sk
for those system in which it's not that easy to run BPF program.
As we can see, it already fetches those, then use it directly
and it will print like below:

...tcp_retransmit_skb: skbaddr=XXX skaddr=XXX family=AF_INET...

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 15:29:15 +01:00
Paolo Abeni
25a6838317 Merge branch 'doc-sfp-phylink-update-the-porting-guide'
Maxime Chevallier says:

====================
doc: sfp-phylink: update the porting guide

Here's a V3 for an update on the phylink porting guide. The only
difference with V2 is a whitespace fix along with a line-wrap.

The main point of the update is the description of a basic process to
follow to expose one or more PCS to phylink. Let me know if you spot any
inaccuracies in the guide.

The second patch is a simple fixup on some in-code doc that was spotted
while updating the guide.

Link to V2: https://lore.kernel.org/netdev/20240228095755.1499577-1-maxime.chevallier@bootlin.com/
Link to V1: https://lore.kernel.org/netdev/20240220160406.3363002-1-maxime.chevallier@bootlin.com/
====================

Link: https://lore.kernel.org/r/20240301164309.3643849-1-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 15:27:07 +01:00
Maxime Chevallier
68ac1e4642 net: phylink: clean the pcs_get_state documentation
commit 4d72c3bb60 ("net: phylink: strip out pre-March 2020 legacy code")
dropped the mac_pcs_get_state ops in phylink_mac_ops in favor of
dedicated PCS operation pcs_get_state. However, the documentation for
the pcs_get_state ops was incorrectly converted and now self-references.

Drop the extra comment.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 15:27:05 +01:00
Maxime Chevallier
9b1d858839 doc: sfp-phylink: update the porting guide with PCS handling
Now that phylink has a comprehensive PCS support, update the porting
guide to explain the process of supporting the PCS configuration. This
also removed outdated references to phylink_config fields that no longer
exists.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 15:27:05 +01:00
Yongzhi Liu
ba18deddd6 net: pds_core: Fix possible double free in error handling path
When auxiliary_device_add() returns error and then calls
auxiliary_device_uninit(), Callback function pdsc_auxbus_dev_release
calls kfree(padev) to free memory. We shouldn't call kfree(padev)
again in the error handling path.

Fix this by cleaning up the redundant kfree() and putting
the error handling back to where the errors happened.

Fixes: 4569cce43b ("pds_core: add auxiliary_bus devices")
Signed-off-by: Yongzhi Liu <hyperlyzcs@gmail.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240306105714.20597-1-hyperlyzcs@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 12:03:19 +01:00
Paolo Abeni
d5b8aff73d netfilter pull request 24-03-07
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmXpIkoACgkQ1V2XiooU
 IOTEeQ/8DjOKAZW1Tbb6AVNdBI2DxEv3Nl94IkPpTXFOHV2apReuxrZlx0qS/Tda
 FutYLap9SmU0koIgUp3ZDZV8eqk9YlPERmAYog2zB9AHiCyfT5xSUmj9zZCE5l4N
 yFHQ665wl8Iz4TDAoSL75ZRKhOhdaDt4WBThtUkMQHhL+lNDtXSWuQBDtle1q8CF
 Edu0OPlcG6/KMu55XgSXcbvWj6ka9RZjCO5Z5D3ZG6UzNOTCZeb+o8o+K+qKTOyB
 /V5OHWTlEU1D7M6twa8qG6n/ce3sVTh9XoZRAEaqHBMkjwNr/VOeO7Pvb23hVy+j
 ZKATYQsle4gXGJqjwcXXG4K8BWMtR8CiweK85+cBXNPasjJOcGrxy0W04X0vBXWt
 xmJ5ou/0PgYv/0RT/JzN4wJw5MccM7RXXElxNjmZS0zkzEPfMKhqMPbYcAEQaPaF
 CyscYDtVrIeOqHBl/HFqbN0ZwdUIQ4nF57vsUVvn8bsevdJaqWP8VxTtAW0U1ST7
 lPJkmeBiqjDezIHbt3wu2+sdlkrxwgJT3puxyyFP/FA0oiWyTfN7OYlWCUssEKTs
 9MwFL5flgmNnEwvZ8iVqjI/Pcf8W/Rx92eou0YayMuhmfJJZ9NjFERdob7QIvgfP
 XG51rrB1a2y/a08o47URmG5u6219WPBwXO/PYs83M3vahg7FKHc=
 =UZiH
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains fixes for net:

Patch #1 disallows anonymous sets with timeout, except for dynamic sets.
         Anonymous sets with timeouts using the pipapo set backend makes
         no sense from userspace perspective.

Patch #2 rejects constant sets with timeout which has no practical usecase.
         This kind of set, once bound, contains elements that expire but
         no new elements can be added.

Patch #3 restores custom conntrack expectations with NFPROTO_INET,
         from Florian Westphal.

Patch #4 marks rhashtable anonymous set with timeout as dead from the
         commit path to avoid that async GC collects these elements. Rules
         that refers to the anonymous set get released with no mutex held
         from the commit path.

Patch #5 fixes a UBSAN shift overflow in H.323 conntrack helper,
         from Lena Wang.

netfilter pull request 24-03-07

* tag 'nf-24-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_conntrack_h323: Add protection for bmp length out of range
  netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
  netfilter: nft_ct: fix l3num expectations with inet pseudo family
  netfilter: nf_tables: reject constant set with timeout
  netfilter: nf_tables: disallow anonymous set with timeout flag
====================

Link: https://lore.kernel.org/r/20240307021545.149386-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 11:06:14 +01:00
Paolo Abeni
6d673e86cd Merge branch 'netrom-fix-all-the-data-races-around-sysctls'
Jason Xing says:

====================
netrom: Fix all the data-races around sysctls

As the title said, in this patchset I fix the data-race issues because
the writer and the reader can manipulate the same value concurrently.
====================

Link: https://lore.kernel.org/r/20240304082046.64977-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:37:01 +01:00
Jason Xing
d380ce7005 netrom: Fix data-races around sysctl_net_busy_read
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
bc76645ebd netrom: Fix a data-race around sysctl_netrom_link_fails_count
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
b5dffcb8f7 netrom: Fix a data-race around sysctl_netrom_routing_control
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
f99b494b40 netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
a2e7068414 netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
43547d8699 netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
806f462ba9 netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
e799299aaf netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
60a7a152ab netrom: Fix a data-race around sysctl_netrom_transport_timeout
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
119cae5ea3 netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
cfd9f4a740 netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
We need to protect the reader reading the sysctl value
because the value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
958d6145a6 netrom: Fix a data-race around sysctl_netrom_default_path_quality
We need to protect the reader reading sysctl_netrom_default_path_quality
because the value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Tariq Toukan
77d9ec3f6c Documentation: networking: Add description for multi-pf netdev
Add documentation for the multi-pf netdev feature.
Describe the mlx5 implementation and design decisions.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
2024-03-07 00:40:40 -08:00
Tariq Toukan
ed29705e4e net/mlx5: Enable SD feature
Have an actual mlx5_sd instance in the core device, and fix the getter
accordingly. This allows SD stuff to flow, the feature becomes supported
only here.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:40 -08:00
Tariq Toukan
d1a8b2c3e4 net/mlx5e: Block TLS device offload on combined SD netdev
1) Each TX TLS device offloaded context has its own TIS object.  Extra work
is needed to get it working in a SD environment, where a stream can move
between different SQs (belonging to different mdevs).

2) Each RX TLS device offloaded context needs a DEK object from the DEK
pool.

Extra work is needed to get it working in a SD environment, as the DEK
pool currently falsely depends on TX cap, and is on the primary device
only.

Disallow this combination for now.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:40 -08:00
Tariq Toukan
7f525acbcc net/mlx5e: Support per-mdev queue counter
Each queue counter object counts some events (in hardware) for the RQs
that are attached to it, like events of packet drops due to no receive
WQE (rx_out_of_buffer).

Each RQ can be attached to a queue counter only within the same vhca. To
still cover all RQs with these counters, we create multiple instances,
one per vhca.

The result that's shown to the user is now the sum of all instances.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:39 -08:00
Tariq Toukan
40e6ad9182 net/mlx5e: Support cross-vhca RSS
Implement driver support for the HW feature that allows RX steering of
one device to target other device's RQs.

In SD multi-pf netdev mode, we set the secondaries into silent mode,
disconnecting them from the network. This feature is then used to steer
traffic from the primary to the secondaries.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:39 -08:00
Tariq Toukan
67936e1385 net/mlx5e: Let channels be SD-aware
Distribute the channels between the different SD-devices to acheive
local numa node performance on multiple numas.

Each channel works against one specific mdev, creating all datapath
queues against it.

We distribute channels to mdevs in a round-robin policy.

Example for 2 mdevs and 6 channels:
+-------+---------+
| ch ix | mdev ix |
+-------+---------+
|   0   |    0    |
|   1   |    1    |
|   2   |    0    |
|   3   |    1    |
|   4   |    0    |
|   5   |    1    |
+-------+---------+

This round-robin distribution policy is preferred over another suggested
intuitive distribution, in which we first distribute one half of the
channels to mdev #0 and then the second half to mdev #1.

We prefer round-robin for a reason: it is less influenced by changes in
the number of channels. The mapping between channel index and mdev is
fixed, no matter how many channels the user configures. As the channel
stats are persistent to channels closure, changing the mapping every
single time would turn the accumulative stats less representing of the
channel's history.

Per-channel objects should stop using the primary mdev (priv->mdev)
directly, and instead move to using their own channel's mdev.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:39 -08:00
Tariq Toukan
846122b126 net/mlx5e: Create EN core HW resources for all secondary devices
Traffic queues will be created on all devices, including the
secondaries. Create the needed core layer resources for them as well.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:39 -08:00
Tariq Toukan
381978d283 net/mlx5e: Create single netdev per SD group
Integrate the SD library calls into the auxiliary_driver ops in
preparation for creating a single netdev for the multiple PFs belonging
to the same SD group.

SD is still disabled at this stage. It is enabled by a downstream patch
when all needed parts are implemented.

The netdev is created whenever the SD group, with all its participants,
are ready. It is later destroyed whenever any of the participating PFs
drops.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:39 -08:00
Tariq Toukan
4375130bf5 net/mlx5: SD, Add debugfs
Add debugfs entries that describe the Socket-Direct group.

Example:
$ grep -H . /sys/kernel/debug/mlx5/0000\:08\:00.0/multi-pf/*
/sys/kernel/debug/mlx5/0000:08:00.0/multi-pf/group_id:0x00000101
/sys/kernel/debug/mlx5/0000:08:00.0/multi-pf/primary:0000:08:00.0 vhca 0x0
/sys/kernel/debug/mlx5/0000:08:00.0/multi-pf/secondary_0:0000:09:00.0 vhca 0x2

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:39 -08:00
Tariq Toukan
ae40550e3a net/mlx5: SD, Add informative prints in kernel log
Print to kernel log when an SD group moves from/to ready state.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:38 -08:00
Tariq Toukan
f218179b78 net/mlx5: SD, Implement steering for primary and secondaries
Implement the needed SD steering adjustments for the primary and
secondaries.

While the SD multiple PFs are used to avoid cross-numa memory, when it
comes to chip level all traffic goes only through the primary device.
The secondaries are forced to silent mode, to guarantee they are not
involved in any unexpected ingress/egress traffic.

In RX, secondary devices will not have steering objects. Traffic will be
steered from the primary device to the RQs of a secondary device using
advanced cross-vhca RX steering capabilities.

In TX, the primary creates a new TX flow table, which is aliased by the
secondaries.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:38 -08:00
Tariq Toukan
d3d0576660 net/mlx5: SD, Implement devcom communication and primary election
Use devcom to communicate between the different devices. Add a new
devcom component type for this.

Each device registers itself to the devcom component <SD, group ID>.
Once all devices of a component are registered, the component becomes
ready, and a primary device is elected.

In principle, any of the devices can act as a primary, they are all
capable, and a random election would've worked. However, we aim to
achieve predictability and consistency, hence each group always choses
the same device, with the lowest PCI BUS number, as primary.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:38 -08:00
Tariq Toukan
678eb44805 net/mlx5: SD, Implement basic query and instantiation
Add implementation for querying the MPIR register for Socket-Direct
attributes, and instantiating a SD struct accordingly.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:38 -08:00