1312775 Commits

Author SHA1 Message Date
Hongguang Gao
46010d43ab bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
If 'force' is false, it will keep the memory pages and all data
structures for the context memory type if the memory is valid.

This patch always passes true for the 'force' parameter so there is
no change in behavior.  Later patches will adjust the 'force' parameter
for the FW log context memory types so that the logs will not be reset
after FW reset.

Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Hongguang Gao
968d2cc07c bnxt_en: Refactor bnxt_free_ctx_mem()
Add a new function bnxt_free_one_ctx_mem() to free one context
memory type.  bnxt_free_ctx_mem() now calls the new function in
the loop to free each context memory type.  There is no change in
behavior.  Later patches will further make use of the new function.

Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Shruti Parab
0b350b4927 bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
Add a new bit to struct bnxt_ctx_mem_type to indicate that host
memory has been successfully allocated for this context memory type.
In the next patches, we'll be adding some additional context memory
types for FW debugging/logging.  If memory cannot be allocated for
any of these new types, we will not abort and the cleared mem_valid
bit will indicate to skip configuring the memory type.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-of-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Michael Chan
ff00bcc9ec bnxt_en: Update firmware interface spec to 1.10.3.85
The major change is the new firmware command to flush the FW debug
logs to the host backing store context memory buffers.

Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:48:54 -08:00
Jakub Kicinski
84ad482560 Merge branch 'wireguard-updates-and-fixes-for-6-13'
Jason A. Donenfeld says:

====================
wireguard updates and fixes for 6.13

This tiny series (+3/-2) fixes one bug and has three small improvements.

1) Fix running the netns.sh test suite on systems that haven't yet
   inserted the nf_conntrack module.

2) Remove a stray useless function call in a selftest.

3) There's no need to zero out the netdev private data in recent
   kernels.

4) Set the TSO max size to be GSO_MAX_SIZE, so that we aggregate larger
   packets. Daniel reports seeing a 15% improvement in a simple load and
   suggested the speedups would be even better in more complex loads.
====================

Link: https://patch.msgid.link/20241117212030.629159-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:33 -08:00
Daniel Borkmann
06a34f7db7 wireguard: device: support big tcp GSO
Advertise GSO_MAX_SIZE as TSO max size in order support BIG TCP for wireguard.
This helps to improve wireguard performance a bit when enabled as it allows
wireguard to aggregate larger skbs in wg_packet_consume_data_done() via
napi_gro_receive(), but also allows the stack to build larger skbs on xmit
where the driver then segments them before encryption inside wg_xmit().
We've seen a 15% improvement in TCP stream performance.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20241117212030.629159-5-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:27 -08:00
Hangbin Liu
0290abc986 wireguard: selftests: load nf_conntrack if not present
Some distros may not load nf_conntrack by default, which will cause
subsequent nf_conntrack sets to fail. Load this module if it is not
already loaded.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
[ Jason: add [[ -e ... ]] check so this works in the qemu harness. ]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20241117212030.629159-4-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:27 -08:00
Dheeraj Reddy Jonnalagadda
c1822fb64f wireguard: allowedips: remove redundant selftest call
This commit fixes a useless call issue detected by Coverity (CID
1508092). The call to horrible_allowedips_lookup_v4 is unnecessary as
its return value is never checked.

Signed-off-by: Dheeraj Reddy Jonnalagadda <dheeraj.linuxdev@gmail.com>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20241117212030.629159-3-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:27 -08:00
Tobias Klauser
2c862914fb wireguard: device: omit unnecessary memset of netdev private data
The memory for netdev_priv is allocated using kvzalloc in
alloc_netdev_mqs before rtnl_link_ops->setup is called so there is no
need to zero it again in wg_setup.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20241117212030.629159-2-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 19:32:27 -08:00
Menglong Dong
85c7975acd net: ip: fix unexpected return in fib_validate_source()
The errno should be replaced with drop reasons in fib_validate_source(),
and the "-EINVAL" shouldn't be returned. And this causes a warning, which
is reported by syzkaller:

netlink: 'syz-executor371': attribute type 4 has an invalid length.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5842 at net/core/skbuff.c:1219 __sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
WARNING: CPU: 0 PID: 5842 at net/core/skbuff.c:1219 sk_skb_reason_drop+0x87/0x380 net/core/skbuff.c:1241
Modules linked in:
CPU: 0 UID: 0 PID: 5842 Comm: syz-executor371 Not tainted 6.12.0-rc6-syzkaller-01362-ga58f00ed24b8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:__sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
RIP: 0010:sk_skb_reason_drop+0x87/0x380 net/core/skbuff.c:1241
Code: 00 00 00 fc ff df 41 8d 9e 00 00 fc ff bf 01 00 fc ff 89 de e8 ea 9f 08 f8 81 fb 00 00 fc ff 77 3a 4c 89 e5 e8 9a 9b 08 f8 90 <0f> 0b 90 eb 5e bf 01 00 00 00 89 ee e8 c8 9f 08 f8 85 ed 0f 8e 49
RSP: 0018:ffffc90003d57078 EFLAGS: 00010293
RAX: ffffffff898c3ec6 RBX: 00000000fffbffea RCX: ffff8880347a5a00
RDX: 0000000000000000 RSI: 00000000fffbffea RDI: 00000000fffc0001
RBP: dffffc0000000000 R08: ffffffff898c3eb6 R09: 1ffff110023eb7d4
R10: dffffc0000000000 R11: ffffed10023eb7d5 R12: dffffc0000000000
R13: ffff888011f5bdc0 R14: 00000000ffffffea R15: 0000000000000000
FS:  000055557d41e380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056519d31d608 CR3: 000000007854e000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 kfree_skb_reason include/linux/skbuff.h:1263 [inline]
 ip_rcv_finish_core+0xfde/0x1b50 net/ipv4/ip_input.c:424
 ip_list_rcv_finish net/ipv4/ip_input.c:610 [inline]
 ip_sublist_rcv+0x3b1/0xab0 net/ipv4/ip_input.c:636
 ip_list_rcv+0x42b/0x480 net/ipv4/ip_input.c:670
 __netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
 __netif_receive_skb_list_core+0x94e/0x980 net/core/dev.c:5762
 __netif_receive_skb_list net/core/dev.c:5814 [inline]
 netif_receive_skb_list_internal+0xa51/0xe30 net/core/dev.c:5905
 netif_receive_skb_list+0x55/0x4b0 net/core/dev.c:5957
 xdp_recv_frames net/bpf/test_run.c:280 [inline]
 xdp_test_run_batch net/bpf/test_run.c:361 [inline]
 bpf_test_run_xdp_live+0x1b5e/0x21b0 net/bpf/test_run.c:390
 bpf_prog_test_run_xdp+0x805/0x11e0 net/bpf/test_run.c:1318
 bpf_prog_test_run+0x2e4/0x360 kernel/bpf/syscall.c:4266
 __sys_bpf+0x48d/0x810 kernel/bpf/syscall.c:5671
 __do_sys_bpf kernel/bpf/syscall.c:5760 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5758 [inline]
 __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5758
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f18af25a8e9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffee4090af8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f18af25a8e9
RDX: 0000000000000048 RSI: 0000000020000600 RDI: 000000000000000a
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Fix it by returning "-SKB_DROP_REASON_IP_LOCAL_SOURCE" instead of
"-EINVAL" in fib_validate_source().

Reported-by: syzbot+52fbd90f020788ec7709@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6738e539.050a0220.e1c64.0002.GAE@google.com/
Fixes: 82d9983ebeb8 ("net: ip: make ip_route_input_noref() return drop reasons")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:57:00 -08:00
Dr. David Alan Gilbert
78a36139fc net/fungible: Remove unused fun_create_queue
fun_create_queue was added in 2022 by
commit e1ffcc66818f ("net/fungible: Add service module for Fungible
drivers")
but hasn't been used.

Remove it.

Also remove the static helper functions it was the only user of.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:54:11 -08:00
Jakub Kicinski
a537cfdaa7 Merge branch 'uapi-ethtool-avoid-flex-array-in-struct-ethtool_link_settings'
Kees Cook says:

====================
UAPI: ethtool: Avoid flex-array in struct ethtool_link_settings

This reverts the tagged struct group in struct ethtool_link_settings and
instead just removes the flexible array member from Linux's view as it
is entirely unused.
====================

Link: https://patch.msgid.link/20241115204115.work.686-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:52:15 -08:00
Kees Cook
96c677fca5 UAPI: ethtool: Avoid flex-array in struct ethtool_link_settings
struct ethtool_link_settings tends to be used as a header for other
structures that have trailing bytes[1], but has a trailing flexible array
itself. Using this overlapped with other structures leads to ambiguous
object sizing in the compiler, so we want to avoid such situations (which
have caused real bugs in the past). Detecting this can be done with
-Wflex-array-member-not-at-end, which will need to be enabled globally.

Using a tagged struct_group() to create a new ethtool_link_settings_hdr
structure isn't possible as it seems we cannot use the tagged variant of
struct_group() due to syntax issues from C++'s perspective (even within
"extern C")[2]. Instead, we can just leave the offending member defined
in UAPI and remove it from the kernel's view of the structure, as Linux
doesn't actually use this member at all. There is also no change in
size since it was already a flexible array that didn't contribute to
size returned by any use of sizeof().

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/lkml/20241109100213.262a2fa0@kernel.org/ [2]
Link: https://lore.kernel.org/lkml/0bc2809fe2a6c11dd4c8a9a10d9bd65cccdb559b.1730238285.git.gustavoars@kernel.org/ [1]
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241115204308.3821419-3-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:52:11 -08:00
Kees Cook
ebda123fe7 Revert "UAPI: ethtool: Use __struct_group() in struct ethtool_link_settings"
This reverts commit 43d3487035e9a86fad952de4240a518614240d43. We cannot
use tagged struct groups in UAPI because C++ will throw syntax errors
even under "extern C".

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20241115204308.3821419-2-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:52:11 -08:00
Kees Cook
1cfb5e5788 Revert "net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings"
This reverts commit 3bd9b9abdf1563a22041b7255baea6d449902f1a. We cannot
use the new tagged struct group because it throws C++ errors even under
"extern C".

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20241115204308.3821419-1-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:52:11 -08:00
Jakub Kicinski
920efe3e13 selftests: net: add more info to error in bpf_offload
bpf_offload caught a spurious warning in TC recently, but the error
message did not provide enough information to know what the problem
is:

  FAIL: Found 'netdevsim' in command output, leaky extack?

Add the extack to the output:

  FAIL: Unexpected command output, leaky extack? ('netdevsim', 'Warning: Filter with specified priority/protocol not found.')

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:51:41 -08:00
Gerd Bayer
16a04d043b net/smc: Run patches also by RDMA ML
Commits for the SMC protocol usually get carried through the netdev
mailing list. Some portions use InfiniBand verbs that are discussed on
the RDMA mailing list. So run patches by that list too to increase the
likelihood that all interested parties can see them.

Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:50:39 -08:00
Jakub Kicinski
5b7cfe0633 Merge branch 'mptcp-pm-lockless-list-traversal-and-cleanup'
Matthieu Baerts says:

====================
mptcp: pm: lockless list traversal and cleanup

Here are two patches improving the MPTCP in-kernel path-manager.

- Patch 1: the get and dump endpoints operations are iterating over the
  endpoints list in a lockless way.

- Patch 2: reduce the code duplication to lookup an endpoint.
====================

Link: https://patch.msgid.link/20241115-net-next-mptcp-pm-lockless-dump-v1-0-f4a1bcb4ca2c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:50:15 -08:00
Geliang Tang
1d7fa6ceb9 mptcp: pm: avoid code duplication to lookup endp
The helper __lookup_addr() can be used in mptcp_pm_nl_get_local_id()
and mptcp_pm_nl_is_backup() to simplify the code, and avoid code
duplication.

Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241115-net-next-mptcp-pm-lockless-dump-v1-2-f4a1bcb4ca2c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:50:13 -08:00
Matthieu Baerts (NGI0)
3fbb27b7f8 mptcp: pm: lockless list traversal to dump endp
To return an endpoint to the userspace via Netlink, and to dump all of
them, the endpoint list was iterated while holding the pernet->lock, but
only to read the content of the list.

In these cases, the spin locks can be replaced by RCU read ones, and use
the _rcu variants to iterate over the entries list in a lockless way.

Note that the __lookup_addr_by_id() helper has been modified to use the
_rcu variants of list_for_each_entry(), but with an extra conditions, so
it can be called either while the RCU read lock is held, or when the
associated pernet->lock is held.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241115-net-next-mptcp-pm-lockless-dump-v1-1-f4a1bcb4ca2c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:50:13 -08:00
Vitalii Mordan
cc84d89ad8 stmmac: dwmac-intel-plat: remove redundant dwmac->data check in probe
The driver’s compatibility with devices is confirmed earlier in
platform_match(). Since reaching probe means the device is valid,
the extra check can be removed to simplify the code.

Signed-off-by: Vitalii Mordan <mordan@ispras.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:49:53 -08:00
Jiawen Wu
e867ed3ac8 net: txgbe: remove GPIO interrupt controller
Since the GPIO interrupt controller is always not working properly, we need
to constantly add workaround to cope with hardware deficiencies. So just
remove GPIO interrupt controller, and let the SFP driver poll the GPIO
status.

Fixes: b4a2496c17ed ("net: txgbe: fix GPIO interrupt blocking")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20241115071527.1129458-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:44:31 -08:00
Jakub Kicinski
4be4a91d53 Merge branch 'eth-fbnic-cleanup-and-add-a-few-stats'
Jakub Kicinski says:

====================
eth: fbnic: cleanup and add a few stats

Cleanup trival problems with fbnic and add the PCIe and RPC (Rx parser)
stats.

All stats are read under rtnl_lock for now, so the code is pretty
trivial. We'll need to add more locking when we start gathering
drops used by .ndo_get_stats64.
====================

Link: https://patch.msgid.link/20241115015344.757567-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:45 -08:00
Sanman Pradhan
79da2aaa08 eth: fbnic: add RPC hardware statistics
Report Rx parser statistics via ethtool -S.

The parser stats are 32b, so we need to add refresh to the service
task to make sure we don't miss overflows.

Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241115015344.757567-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
Sanman Pradhan
25ba596d13 eth: fbnic: add PCIe hardware statistics
Add PCIe hardware statistics support to the fbnic driver. These stats
provide insight into PCIe transaction performance and error conditions.

Which includes, read/write and completion TLP counts and DWORD counts and
debug counters for tag, completion credit and NP credit exhaustion

The stats are exposed via debugfs and can be used to monitor PCIe
performance and debug PCIe issues.

Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241115015344.757567-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
Jakub Kicinski
08606cb528 eth: fbnic: add basic debugfs structure
Add the usual debugfs structure:

 fbnic/
   $pci-id/
     device-fileA
     device-fileB

This patch only adds the directories, subsequent changes
will add files.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241115015344.757567-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
Jakub Kicinski
2a0d6c1705 eth: fbnic: add missing header guards
While adding the SPDX headers I noticed we're also missing
a header guard.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241115015344.757567-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
Jakub Kicinski
e1a897ef4e eth: fbnic: add missing SPDX headers
Paolo noticed that we are missing SPDX headers, add them.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241115015344.757567-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:43:41 -08:00
Jakub Kicinski
357c52ff86 selftests: net: netlink-dumps: validation checks
The sanity checks are going to get silently cast to unsigned
and always pass. Cast the sizeof to signed size.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241115003248.733862-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:42:44 -08:00
Jakub Kicinski
0de6a472c3 net/neighbor: clear error in case strict check is not set
Commit 51183d233b5a ("net/neighbor: Update neigh_dump_info for strict
data checking") added strict checking. The err variable is not cleared,
so if we find no table to dump we will return the validation error even
if user did not want strict checking.

I think the only way to hit this is to send an buggy request, and ask
for a table which doesn't exist, so there's no point treating this
as a real fix. I only noticed it because a syzbot repro depended on it
to trigger another bug.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241115003221.733593-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:42:21 -08:00
Dmitry Antipov
e64285ff41 rocker: fix link status detection in rocker_carrier_init()
Since '1 << rocker_port->pport' may be undefined for port >= 32,
cast the left operand to 'unsigned long long' like it's done in
'rocker_port_set_enable()' above. Compile tested only.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://patch.msgid.link/20241114151946.519047-1-dmantipov@yandex.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:39:21 -08:00
Jack Wu
a0c80d5108 net: wwan: t7xx: Change PM_AUTOSUSPEND_MS to 5000
Because optimizing the power consumption of t7XX,
change auto suspend time to 5000.

The Tests uses a script to loop through the power_state
of t7XX.
(for example: /sys/bus/pci/devices/0000\:72\:00.0/power_state)

* If Auto suspend is 20 seconds,
  test script show power_state have 0~5% of the time was in D3 state
  when host don't have data packet transmission.

* Changed auto suspend time to 5 seconds,
  test script show power_state have 50%~80% of the time was in D3 state
  when host don't have data packet transmission.

We tested Fibocom FM350 and our products using the t7xx and they all
benefited from this.

Signed-off-by: Jack Wu <wojackbb@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://patch.msgid.link/20241114102002.481081-1-wojackbb@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:38:20 -08:00
Jakub Kicinski
6204656478 tools: ynl-gen: allow uapi headers in sub-dirs
Binder places its headers under include/uapi/linux/android/
Make sure replace / with _ in the uAPI header guard, the c_upper()
is more strict and only converts - to _. This is likely a good
constraint to have, to enforce sane naming in enums etc.
But paths may include /.

Signed-off-by: Li Li <dualli@google.com>
Link: https://patch.msgid.link/20241113193239.2113577-2-dualli@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 18:38:07 -08:00
Rob Herring (Arm)
5bf99baefb dt-bindings: net: renesas,ether: Drop undocumented "micrel,led-mode"
"micrel,led-mode" is not yet documented by a schema. It's irrelevant to
the example, so just drop it.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20241113225742.1784723-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18 16:17:41 -08:00
David S. Miller
d7ef9eeef0 Merge branch 'am65-cpsw-rx-dscp-prio-map'
Roger Quadros says:

====================
net: ethernet: ti: am65-cpsw: enable DSCP to priority map for RX

Configure default DSCP to User Priority mapping registers as per:
 https://datatracker.ietf.org/doc/html/rfc8325#section-4.3
and
 https://datatracker.ietf.org/doc/html/rfc8622#section-11

Also update Priority to Thread maping to be compliant with
IEEE802.1Q-2014. Priority Code Point (PCP) 2 is higher priority than
PCP 0 (Best Effort). PCP 1 (Background) is lower priority than
PCP 0 (Best Effort).

---
Changes in v4:
- Updated default DSCP to User Priority mapping as per
  https://datatracker.ietf.org/doc/html/rfc8325#section-4.3
  and
  https://datatracker.ietf.org/doc/html/rfc8622#section-11
- Link to v3: https://lore.kernel.org/r/20241109-am65-cpsw-multi-rx-dscp-v3-0-1cfb76928490@kernel.org

Changes in v3:
- Added Reviewed-by tag to patch 1
- Added macros for DSCP PRI field size and DSCP PRI per register
- Drop unnecessary readl() in am65_cpsw_port_set_dscp_map()
- Link to v2: https://lore.kernel.org/r/20241107-am65-cpsw-multi-rx-dscp-v2-0-9e9cd1920035@kernel.org

Changes in v2:
- Updated references to more recent standard IEEE802.1Q-2014.
- Dropped reference to web link which might change in the future.
- Typo fix in commit log.
- Link to v1: https://lore.kernel.org/r/20241105-am65-cpsw-multi-rx-dscp-v1-0-38db85333c88@kernel.org
====================

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18 12:00:22 +00:00
Roger Quadros
a208f41758 net: ethernet: ti: am65-cpsw: enable DSCP to priority map for RX
AM65 CPSW hardware can map the 6-bit DSCP/TOS field to
appropriate priority queue via DSCP to Priority mapping registers
(CPSW_PN_RX_PRI_MAP_REG).

Use a default DSCP to User Priority (UP) mapping as per
https://datatracker.ietf.org/doc/html/rfc8325#section-4.3
and
https://datatracker.ietf.org/doc/html/rfc8622#section-11

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18 12:00:21 +00:00
Roger Quadros
01a45daebb net: ethernet: ti: am65-cpsw: update pri_thread_map as per IEEE802.1Q-2014
IEEE802.1Q-2014 supersedes IEEE802.1D-2004. Now Priority Code Point (PCP)
2 is no longer at a lower priority than PCP 0. PCP 1 (Background) is still
at a lower priority than PCP 0 (Best Effort).

Reference:
IEEE802.1Q-2014, Standard for Local and metropolitan area networks
  Table I-2 - Traffic type acronyms
  Table I-3 - Defining traffic types

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18 12:00:21 +00:00
David S. Miller
ac60031f79 Merge branch 'udp-4tuple-hash'
Philo Lu says:

====================
udp: Add 4-tuple hash for connected sockets

This patchset introduces 4-tuple hash for connected udp sockets, to make
connected udp lookup faster.

Stress test results (with 1 cpu fully used) are shown below, in pps:
(1) _un-connected_ socket as server
    [a] w/o hash4: 1,825176
    [b] w/  hash4: 1,831750 (+0.36%)

(2) 500 _connected_ sockets as server
    [c] w/o hash4:   290860 (only 16% of [a])
    [d] w/  hash4: 1,889658 (+3.1% compared with [b])
With hash4, compute_score is skipped when lookup, so [d] is slightly
better than [b].

Patch1: Add a new counter for hslot2 named hash4_cnt, to avoid cache line
        miss when lookup.
Patch2: Add hslot/hlist_nulls for 4-tuple hash.
Patch3 and 4: Implement 4-tuple hash for ipv4 and ipv6.

The detailed motivation is described in Patch 3.

The 4-tuple hash increases the size of udp_sock and udp_hslot. Thus add it
with CONFIG_BASE_SMALL, i.e., it's a no op with CONFIG_BASE_SMALL.

Intentionally, the feature is not available for udplite. Though udplite
shares some structs and functions with udp, its connect() keeps unchanged.
So all udplite sockets perform the same as un-connected udp sockets.
Besides, udplite also shares the additional memory consumption in udp_sock
and udptable.

changelogs:
v8 -> v9 (Paolo Abeni):
- Add explanation about udplite in cover letter
- Update tags for co-developers
- Add acked-by tags of Paolo and Willem

v7 -> v8:
- add EXPORT_SYMBOL for ipv6.ko build

v6 -> v7 (Kuniyuki Iwashima):
- export udp_ehashfn to be used by udpv6 rehash

v5 -> v6 (Paolo Abeni):
- move udp_table_hash4_init from patch2 to patch1
- use hlist_nulls for lookup-rehash race
- add test results in commit log
- add more comment, e.g., for rehash4 used in hash4
- add ipv6 support (Patch4), and refactor some functions for better
  sharing, without functionality change

v4 -> v5 (Paolo Abeni):
- add CONFIG_BASE_SMALL with which udp hash4 does nothing

v3 -> v4 (Willem de Bruijn):
- fix mistakes in udp_pernet_table_alloc()

RFCv2 -> v3 (Gur Stavi):
- minor fix in udp_hashslot2() and udp_table_init()
- add rcu sync in rehash4()

RFCv1 -> RFCv2:
- add a new struct for hslot2
- remove the sockopt UDP_HASH4 because it has little side effect for
  unconnected sockets
- add rehash in connect()
- re-organize the patch into 3 smaller ones
- other minor fix

v8:
https://lore.kernel.org/all/20241108054836.123484-1-lulie@linux.alibaba.com/
v7:
https://lore.kernel.org/all/20241105121225.12513-1-lulie@linux.alibaba.com/
v6:
https://lore.kernel.org/all/20241031124550.20227-1-lulie@linux.alibaba.com/
v5:
https://lore.kernel.org/all/20241018114535.35712-1-lulie@linux.alibaba.com/
v4:
https://lore.kernel.org/all/20241012012918.70888-1-lulie@linux.alibaba.com/
v3:
https://lore.kernel.org/all/20241010090351.79698-1-lulie@linux.alibaba.com/
RFCv2:
https://lore.kernel.org/all/20240924110414.52618-1-lulie@linux.alibaba.com/
RFCv1:
https://lore.kernel.org/all/20240913100941.8565-1-lulie@linux.alibaba.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18 11:56:21 +00:00
Philo Lu
1b29a730ef ipv6/udp: Add 4-tuple hash for connected socket
Implement ipv6 udp hash4 like that in ipv4. The major difference is that
the hash value should be calculated with udp6_ehashfn(). Besides,
ipv4-mapped ipv6 address is handled before hash() and rehash(). Export
udp_ehashfn because now we use it in udpv6 rehash.

Core procedures of hash/unhash/rehash are same as ipv4, and udpv4 and
udpv6 share the same udptable, so some functions in ipv4 hash4 can also
be shared.

Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18 11:56:21 +00:00
Philo Lu
78c91ae2c6 ipv4/udp: Add 4-tuple hash for connected socket
Currently, the udp_table has two hash table, the port hash and portaddr
hash. Usually for UDP servers, all sockets have the same local port and
addr, so they are all on the same hash slot within a reuseport group.

In some applications, UDP servers use connect() to manage clients. In
particular, when firstly receiving from an unseen 4 tuple, a new socket
is created and connect()ed to the remote addr:port, and then the fd is
used exclusively by the client.

Once there are connected sks in a reuseport group, udp has to score all
sks in the same hash2 slot to find the best match. This could be
inefficient with a large number of connections, resulting in high
softirq overhead.

To solve the problem, this patch implement 4-tuple hash for connected
udp sockets. During connect(), hash4 slot is updated, as well as a
corresponding counter, hash4_cnt, in hslot2. In __udp4_lib_lookup(),
hslot4 will be searched firstly if the counter is non-zero. Otherwise,
hslot2 is used like before. Note that only connected sockets enter this
hash4 path, while un-connected ones are not affected.

hlist_nulls is used for hash4, because we probably move to another hslot
wrongly when lookup with concurrent rehash. Then we check nulls at the
list end to see if we should restart lookup. Because udp does not use
SLAB_TYPESAFE_BY_RCU, we don't need to touch sk_refcnt when lookup.

Stress test results (with 1 cpu fully used) are shown below, in pps:
(1) _un-connected_ socket as server
    [a] w/o hash4: 1,825176
    [b] w/  hash4: 1,831750 (+0.36%)

(2) 500 _connected_ sockets as server
    [c] w/o hash4:   290860 (only 16% of [a])
    [d] w/  hash4: 1,889658 (+3.1% compared with [b])

With hash4, compute_score is skipped when lookup, so [d] is slightly
better than [b].

Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18 11:56:21 +00:00
Philo Lu
dab78a1745 net/udp: Add 4-tuple hash list basis
Add a new hash list, hash4, in udp table. It will be used to implement
4-tuple hash for connected udp sockets. This patch adds the hlist to
table, and implements helpers and the initialization. 4-tuple hash is
implemented in the following patch.

hash4 uses hlist_nulls to avoid moving wrongly onto another hlist due to
concurrent rehash, because rehash() can happen with lookup().

Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18 11:56:21 +00:00
Philo Lu
accdd51dc7 net/udp: Add a new struct for hash2 slot
Preparing for udp 4-tuple hash (uhash4 for short).

To implement uhash4 without cache line missing when lookup, hslot2 is
used to record the number of hashed sockets in hslot4. Thus adding a new
struct udp_hslot_main with field hash4_cnt, which is used by hash2. The
new struct is used to avoid doubling the size of udp_hslot.

Before uhash4 lookup, firstly checking hash4_cnt to see if there are
hashed sks in hslot4. Because hslot2 is always used in lookup, there is
no cache line miss.

Related helpers are updated, and use the helpers as possible.

uhash4 is implemented in following patches.

Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18 11:56:21 +00:00
David S. Miller
296a681def ipsec-next-2024-11-15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmc3A/gACgkQrB3Eaf9P
 W7fNew//XCIhIvFYaQcP2x84T4EYB679NkGlwMATxXgn40+sp7muSwVweynEWNIu
 FltfBAwYD/MxD7g519abVPMWXs/iYI5duw3vvqnxmkOoebWLLocg2VoqFIdVXlQw
 /hj+1X/oNT4OKcaQAw/FAGRuYvkc90YB/rRG51RwAIR0tyBjRwfUsozMM8QX/zQI
 I0cLCgGAf/kylQre+dhvUkMhXaLogMF5v0qzPxhyMBD02JaUpe6+5cdHQcmKOhqa
 ksTpySYnIKIHZrLizeFGDZpinaDIph20vGaDvDXpqTYFuwvCQsZczJy02dF4otf2
 2dZz6+2La+ZM+WsGIqpALqKCNhr8fOcQxCRH3eGLPBwoXXt5CFAMgJKob8hKuonW
 FgJaYMBZOjYbgGah8WbEe/YsWq4y3uRs48pFtY+T5cn7AskNxIvUoLNjSS83Hlqu
 PJbveiKsZygig966Q/zUFATYnvj3zEgjVEcSbK6LRyBXL79Njr8l+PZ0Zoz76tc4
 bF1Xv0x+lRYmwa9rvOFaeqrP/GTe0xvlitFzuCN7HnXiN8URKnnDY2odkXYzo+Z7
 MBbP8wR/CaoiAvdMw74116nAIFOW95LPtvdGJTvlS9jAOt1P7dWQ3/mFKEpItndv
 cJjWzI7HKl0+85FcCDw+tmsDWWGbALUyPw96i8UgUcDGyqVKUgA=
 =Ioo8
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2024-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================

ipsec-next-11-15

1) Add support for RFC 9611 per cpu xfrm state handling.

2) Add inbound and outbound xfrm state caches to speed up
   state lookups.

3) Convert xfrm to dscp_t. From Guillaume Nault.

4) Fix error handling in build_aevent.
   From Everest K.C.

5) Replace strncpy with strscpy_pad in copy_to_user_auth.
   From Daniel Yang.

6) Fix an uninitialized symbol during acquire state insertion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18 11:52:49 +00:00
Jakub Kicinski
38f83a57aa Merge branch 'virtio-net-support-af_xdp-zero-copy-tx'
Xuan Zhuo says:

====================
virtio-net: support AF_XDP zero copy (tx)

XDP socket(AF_XDP) is an excellent bypass kernel network framework. The zero
copy feature of xsk (XDP socket) needs to be supported by the driver. The
performance of zero copy is very good. mlx5 and intel ixgbe already support
this feature, This patch set allows virtio-net to support xsk's zerocopy xmit
feature.

At present, we have completed some preparation:

1. vq-reset (virtio spec and kernel code)
2. virtio-core premapped dma
3. virtio-net xdp refactor

So it is time for Virtio-Net to complete the support for the XDP Socket
Zerocopy.

Virtio-net can not increase the queue num at will, so xsk shares the queue with
kernel.

This patch set includes some refactor to the virtio-net to let that to support
AF_XDP.

The current configuration sets the virtqueue (vq) to premapped mode,
implying that all buffers submitted to this queue must be mapped ahead
of time. This presents a challenge for the virtnet send queue (sq): the
virtnet driver would be required to keep track of dma information for vq
size * 17, which can be substantial. However, if the premapped mode were
applied on a per-buffer basis, the complexity would be greatly reduced.
With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
skb buffers could remain unmapped.

We can distinguish them by sg_page(sg), When sg_page(sg) is NULL, this
indicates that the driver has performed DMA mapping in advance, allowing
the Virtio core to directly utilize sg_dma_address(sg) without
conducting any internal DMA mapping. Additionally, DMA unmap operations
for this buffer will be bypassed.

ENV: Qemu with vhost-user(polling mode).
Host CPU: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz

testpmd> show port stats all

 ######################## NIC statistics for port 0 ########################
 RX-packets: 19531092064 RX-missed: 0     RX-bytes: 1093741155584
 RX-errors: 0
 RX-nombuf: 0
 TX-packets: 5959955552 TX-errors: 0     TX-bytes: 371030645664

 Throughput (since last show)
 Rx-pps:   8861574     Rx-bps:  3969985208
 Tx-pps:   8861493     Tx-bps:  3969962736
 ############################################################################

testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 68152727   RX-missed: 0          RX-bytes:  3816552712
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 68114967   TX-errors: 33216      TX-bytes:  3814438152

  Throughput (since last show)
  Rx-pps:      6333196          Rx-bps:   2837272088
  Tx-pps:      6333227          Tx-bps:   2837285936
  ############################################################################

But AF_XDP consumes more CPU for tx and rx napi(100% and 86%).
====================

Link: https://patch.msgid.link/20241112012928.102478-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15 18:47:08 -08:00
Xuan Zhuo
37e0ca657a virtio_net: xdp_features add NETDEV_XDP_ACT_XSK_ZEROCOPY
Now, we support AF_XDP(xsk). Add NETDEV_XDP_ACT_XSK_ZEROCOPY to
xdp_features.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-14-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15 18:46:56 -08:00
Xuan Zhuo
e2c5c57f1a virtio_net: update tx timeout record
If send queue sent some packets, we update the tx timeout
record to prevent the tx timeout.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-13-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15 18:46:56 -08:00
Xuan Zhuo
89f86675cb virtio_net: xsk: tx: support xmit xsk buffer
The driver's tx napi is very important for XSK. It is responsible for
obtaining data from the XSK queue and sending it out.

At the beginning, we need to trigger tx napi.

virtnet_free_old_xmit distinguishes three type ptr(skb, xdp frame, xsk
buffer) by the last bits of the pointer.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-12-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15 18:46:56 -08:00
Xuan Zhuo
1df5116a41 virtio_net: xsk: prevent disable tx napi
Since xsk's TX queue is consumed by TX NAPI, if sq is bound to xsk, then
we must stop tx napi from being disabled.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-11-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15 18:46:55 -08:00
Xuan Zhuo
21a4e3ce6d virtio_net: xsk: bind/unbind xsk for tx
This patch implement the logic of bind/unbind xsk pool to sq and rq.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-10-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15 18:46:55 -08:00
Xuan Zhuo
7db956707f virtio_net: refactor the xmit type
Because the af-xdp will introduce a new xmit type, so I refactor the
xmit type mechanism first.

We know both xdp_frame and sk_buff are at least 4 bytes aligned.
For the xdp tx, we do not pass any pointer to virtio core as data,
we just need to pass the len of the packet. So we will push len
to the void pointer. We can make sure the pointer is 4 bytes aligned.

And the data structure of AF_XDP also is at least 4 bytes aligned.

So the last two bits of the pointers are free, we can't use these to
distinguish them.

    00 for skb
    01 for SKB_ORPHAN
    10 for XDP
    11 for AF-XDP tx

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20241112012928.102478-9-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15 18:46:55 -08:00