linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-01-07 13:53:24 +00:00

Author	SHA1	Message	Date
Kuniyuki Iwashima	b2cb9f9ef2	tcp: Unlink sk from bhash. Now we do not use tb->owners and can unlink sockets from bhash. sk_bind_node/tw_bind_node are available for bhash2 and will be used in the following patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima	8002d44fe8	tcp: Check hlist_empty(&tb->bhash2) instead of hlist_empty(&tb->owners). We use hlist_empty(&tb->owners) to check if the bhash bucket has a socket. We can check the child bhash2 buckets instead. For this to work, the bhash2 bucket must be freed before the bhash bucket. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima	b82ba728cc	tcp: Iterate tb->bhash2 in inet_csk_bind_conflict(). Sockets in bhash are also linked to bhash2, but TIME_WAIT sockets are linked separately in tb2->deathrow. Let's replace tb->owners iteration in inet_csk_bind_conflict() with two iterations over tb2->owners and tb2->deathrow. This can be done safely under bhash's lock because socket insertion/ deletion in bhash2 happens with bhash's lock held. Note that twsk_for_each_bound_bhash() will be removed later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima	58655bc0ad	tcp: Rearrange tests in inet_csk_bind_conflict(). The following patch adds code in the !inet_use_bhash2_on_bind(sk) case in inet_csk_bind_conflict(). To avoid adding nest and make the change cleaner, this patch rearranges tests in inet_csk_bind_conflict(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:35 +00:00
Kuniyuki Iwashima	822fb91fc7	tcp: Link bhash2 to bhash. bhash2 added a new member sk_bind2_node in struct sock to link sockets to bhash2 in addition to bhash. bhash is still needed to search conflicting sockets efficiently from a port for the wildcard address. However, bhash itself need not have sockets. If we link each bhash2 bucket to the corresponding bhash bucket, we can iterate the same set of the sockets from bhash2 via bhash. This patch links bhash2 to bhash only, and the actual use will be in the later patches. Finally, we will remove sk_bind2_node. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima	4dd7108854	tcp: Rename tb in inet_bind2_bucket_(init\|create)(). Later, we no longer link sockets to bhash. Instead, each bhash2 bucket is linked to the corresponding bhash bucket. Then, we pass the bhash bucket to bhash2 allocation functions as tb. However, tb is already used in inet_bind2_bucket_create() and inet_bind2_bucket_init() as the bhash2 bucket. To make the following diff clear, let's use tb2 for the bhash2 bucket there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima	5a22bba13d	tcp: Save address type in inet_bind2_bucket. inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any() are called for each bhash2 bucket to check conflicts. Thus, we call ipv6_addr_any() and ipv6_addr_v4mapped() over and over during bind(). Let's avoid calling them by saving the address type in inet_bind2_bucket. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima	06a8c04f89	tcp: Save v4 address as v4-mapped-v6 in inet_bind2_bucket.v6_rcv_saddr. In bhash2, IPv4/IPv6 addresses are saved in two union members, which complicate address checks in inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any() considering uninitialised memory and v4-mapped-v6 conflicts. Let's simplify that by saving IPv4 address as v4-mapped-v6 address and defining tb2.rcv_saddr as tb2.v6_rcv_saddr.s6_addr32[3]. Then, we can compare v6 address as is, and after checking v4-mapped-v6, we can compare v4 address easily. Also, we can remove tb2->family. Note these functions will be further refactored in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima	56f3e3f01f	tcp: Rearrange tests in inet_bind2_bucket_(addr_match\|match_addr_any)(). The protocol family tests in inet_bind2_bucket_addr_match() and inet_bind2_bucket_match_addr_any() are ordered as follows. if (sk->sk_family != tb2->family) else if (sk->sk_family == AF_INET6) else This patch rearranges them so that AF_INET6 socket is handled first to make the following patch tidy, where tb2->family will be removed. if (sk->sk_family == AF_INET6) else if (tb2->family == AF_INET6) else Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Kuniyuki Iwashima	5e07e67241	tcp: Use bhash2 for v4-mapped-v6 non-wildcard address. While checking port availability in bind() or listen(), we used only bhash for all v4-mapped-v6 addresses. But there is no good reason not to use bhash2 for v4-mapped-v6 non-wildcard addresses. Let's do it by returning true in inet_use_bhash2_on_bind(). Then, we also need to add a test in inet_bind2_bucket_match_addr_any() so that ::ffff:X.X.X.X will match with 0.0.0.0. Note that sk->sk_rcv_saddr is initialised for v4-mapped-v6 sk in __inet6_bind(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-22 22:15:34 +00:00
Paolo Abeni	56794e5358	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c `23c93c3b62` ("bnxt_en: do not map packet buffers twice") `6d1add9553` ("bnxt_en: Modify TX ring indexing logic.") tools/testing/selftests/net/Makefile `2258b66648` ("selftests: add vlan hw filter tests") `a0bc96c0cd` ("selftests: net: verify fq per-band packet limit") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-21 22:17:23 +01:00
Linus Torvalds	7c5e046bdc	Including fixes from WiFi and bpf. Current release - regressions: - bpf: syzkaller found null ptr deref in unix_bpf proto add - eth: i40e: fix ST code value for clause 45 Previous releases - regressions: - core: return error from sk_stream_wait_connect() if sk_wait_event() fails - ipv6: revert remove expired routes with a separated list of routes - wifi rfkill: - set GPIO direction - fix crash with WED rx support enabled - bluetooth: - fix deadlock in vhci_send_frame - fix use-after-free in bt_sock_recvmsg - eth: mlx5e: fix a race in command alloc flow - eth: ice: fix PF with enabled XDP going no-carrier after reset - eth: bnxt_en: do not map packet buffers twice Previous releases - always broken: - core: - check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() - check dev->gso_max_size in gso_features_check() - mptcp: fix inconsistent state on fastopen race - phy: skip LED triggers on PHYs on SFP modules - eth: mlx5e: - fix double free of encap_header - fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list() Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmWELNYSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkL0MQALYng9iIz5m/iX0pjRpI4HxruflMkdWa +FMRnWp0OE5ak8l/UcffgYRNozXAcA/PSJhanZU3gT21IeJ+X78paivWyEPUhpqN d1nhRRsnx37fTK3lbS/wSGJUN+x4g49kHQn5mw8vfi/RGHuc/vbLO23iWsawB92/ 7YI0rEzZh2b1FKytvqF9t2lLtJw5ucwQtdm3d/tg4iuL44Lq8dA69dln4wZx3t28 lobsW0eQW2JRh2YwrREb1oUD0CcUNk+XGsWVyUXqs31OflkqYMEzI41Yxs/lHJs0 0Lmt3/F2Ls+H+vEYElJ0wsNPFZr4TDhAsV5KMxZdoBfWhTN8ordloBXGHre1IVSK SQtja5IqT01dLbDoL7tLpyEsGLp1A+HPH+BVxt582srSMoXWmFYOZcRczKJ85C1W qaohCGeEO537ExrAMHbJ0CxR3oSawyOBszjTYGdbI3xiFj5q1n48YyJSep//rgvP PewzqtMpPPapPIiJbvRjN8Mn56Y2832TSbPOVZ2KJuBpx+i/mjXyIK97FMb+Jdvu 3ACH3BmsUfvXXpXNSZIgtc35tP03GSeV9B2vzlhjFwxB2vV4wuX9NbI5OIWi7ZM1 03jkC2wQf6jVby45IM5kMuEKL3hMXUx9t0nZg0szJ3T31+OQ6e5Hlv1Aqp4Ihn5Q N+fxo6lpm+Aq =sEmi -----END PGP SIGNATURE----- Merge tag 'net-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from WiFi and bpf. Current release - regressions: - bpf: syzkaller found null ptr deref in unix_bpf proto add - eth: i40e: fix ST code value for clause 45 Previous releases - regressions: - core: return error from sk_stream_wait_connect() if sk_wait_event() fails - ipv6: revert remove expired routes with a separated list of routes - wifi rfkill: - set GPIO direction - fix crash with WED rx support enabled - bluetooth: - fix deadlock in vhci_send_frame - fix use-after-free in bt_sock_recvmsg - eth: mlx5e: fix a race in command alloc flow - eth: ice: fix PF with enabled XDP going no-carrier after reset - eth: bnxt_en: do not map packet buffers twice Previous releases - always broken: - core: - check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() - check dev->gso_max_size in gso_features_check() - mptcp: fix inconsistent state on fastopen race - phy: skip LED triggers on PHYs on SFP modules - eth: mlx5e: - fix double free of encap_header - fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()" * tag 'net-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits) net: check dev->gso_max_size in gso_features_check() kselftest: rtnetlink.sh: use grep_fail when expecting the cmd fail net/ipv6: Revert remove expired routes with a separated list of routes net: avoid build bug in skb extension length calculation net: ethernet: mtk_wed: fix possible NULL pointer dereference in mtk_wed_wo_queue_tx_clean() net: stmmac: fix incorrect flag check in timestamp interrupt selftests: add vlan hw filter tests net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() net: hns3: add new maintainer for the HNS3 ethernet driver net: mana: select PAGE_POOL net: ks8851: Fix TX stall caused by TX buffer overrun ice: Fix PF with enabled XDP going no-carrier after reset ice: alter feature support check for SRIOV and LAG ice: stop trashing VF VSI aggregator node ID information mailmap: add entries for Geliang Tang mptcp: fill in missing MODULE_DESCRIPTION() mptcp: fix inconsistent state on fastopen race selftests: mptcp: join: fix subflow_send_ack lookup net: phy: skip LED triggers on PHYs on SFP modules bpf: Add missing BPF_LINK_TYPE invocations ...	2023-12-21 09:15:37 -08:00
Paolo Abeni	74769d810e	bpf-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZYQVjgAKCRDbK58LschI g/NfAP9xMBCASd22+KPu44FtPPO5DKcdG7hATXZMpb/cygF8GQEAojcZ4jztx42S F1+4RPEoxrn31oVYdtEGFY9q85ruzgA= =2XhN -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-12-21 Hi David, hi Jakub, hi Paolo, hi Eric, The following pull-request contains BPF updates for your net tree. We've added 3 non-merge commits during the last 5 day(s) which contain a total of 4 files changed, 45 insertions(+). The main changes are: 1) Fix a syzkaller splat which triggered an oob issue in bpf_link_show_fdinfo(), from Jiri Olsa. 2) Fix another syzkaller-found issue which triggered a NULL pointer dereference in BPF sockmap for unconnected unix sockets, from John Fastabend. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add missing BPF_LINK_TYPE invocations bpf: sockmap, test for unconnected af_unix sock bpf: syzkaller found null ptr deref in unix_bpf proto add ==================== Link: https://lore.kernel.org/r/20231221104844.1374-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-21 12:27:29 +01:00
Eric Dumazet	24ab059d2e	net: check dev->gso_max_size in gso_features_check() Some drivers might misbehave if TSO packets get too big. GVE for instance uses a 16bit field in its TX descriptor, and will do bad things if a packet is bigger than 2^16 bytes. Linux TCP stack honors dev->gso_max_size, but there are other ways for too big packets to reach an ndo_start_xmit() handler : virtio_net, af_packet, GRO... Add a generic check in gso_features_check() and fallback to GSO when needed. gso_max_size was added in the blamed commit. Fixes: `82cc1a7a56` ("[NET]: Add per-connection option to set max TSO frame size") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231219125331.4127498-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-21 10:15:10 +01:00
David Ahern	dade3f6a1e	net/ipv6: Revert remove expired routes with a separated list of routes This reverts commit `3dec89b14d`. The commit has some race conditions given how expires is managed on a fib6_info in relation to gc start, adding the entry to the gc list and setting the timer value leading to UAF. Revert the commit and try again in a later release. Fixes: `3dec89b14d` ("net/ipv6: Remove expired routes with a separated list of routes") Cc: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231219030243.25687-1-dsahern@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-21 09:01:30 +01:00
Thomas Weißschuh	d6e5794b06	net: avoid build bug in skb extension length calculation GCC seems to incorrectly fail to evaluate skb_ext_total_length() at compile time under certain conditions. The issue even occurs if all values in skb_ext_type_len[] are "0", ruling out the possibility of an actual overflow. As the patch has been in mainline since v6.6 without triggering the problem it seems to be a very uncommon occurrence. As the issue only occurs when -fno-tree-loop-im is specified as part of CFLAGS_GCOV, disable the BUILD_BUG_ON() only when building with coverage reporting enabled. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312171924.4FozI5FG-lkp@intel.com/ Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/lkml/487cfd35-fe68-416f-9bfd-6bb417f98304@app.fastmail.com/ Fixes: `5d21d0a65b` ("net: generalize calculation of skb extensions length") Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20231218-net-skbuff-build-bug-v1-1-eefc2fb0a7d3@weissschuh.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-21 08:09:40 +01:00
Linus Torvalds	ac1c13e257	nfsd-6.7 fixes: - Address a few recently-introduced issues -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmWB+F4ACgkQM2qzM29m f5dZKBAAoAxQA9FTIcb+uMIviGvMFBylKv9mhQxKSkhWhhP9XX2ESXAl+6pcBQJP ikjWw09WO/6ttvxg4hY7exk6FgkDuCALnx1xB6azEq5Ndf4T2aauuFUTEERfQ8th mtGSKEX1a98bGlkBIVQgfr3VFbQ0MRwcr0iGKfbHC3X5GiuFZOHn5fFE+0LjXWyJ 7MjI0Du3lXDlrG5mJ88T5ySMcbaOWBTzqlMY3kbwcf1dWy3TZ6uh6faXqbemrDPg Zixj6JGi+oLlrbYYjQV1Sm5QLW3e882QCQ0U9g2sOEjmLRN/hbCE90i6l1rVEe9a E1ZkAY5qtl+iFK6cbAYUt6lOcZaF8lNeEtBW5JhcFm7f7CJAUSmMb05klZ6rwpXR l6UDwQtnAbmghqAwuKaWdfxys/yqFTvKNJRida7+qK1fs8H8q0y/acKsIv/5PoWz cJvWXova4wIT7Q3xneaYXhHk1aWTQTooErQ0VrgnsE0Ch7Vc4sONhQ3SLajllChD x3JL0DgDJ6mO/JaFZCfIy3ihEaprV0uL3bqgXaad1SszEWJt4HCCE1fQpI0cSbXH Vyq5H0R74EoLh8TGc6dbJ5wZwU81P6Rc0nHz/VU+gUPrEeNYKv8P2bahnXfR7Nsc F93sl2DqQOEv/Q/h/9UolwL7AGl6TkzdPKEoYKNafagUWT031y0= =F1lB -----END PGP SIGNATURE----- Merge tag 'nfsd-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Address a few recently-introduced issues * tag 'nfsd-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Revert `5f7fc5d69f` NFSD: Revert `738401a9bd` NFSD: Revert `6c41d9a9bd` nfsd: hold nfsd_mutex across entire netlink operation nfsd: call nfsd_last_thread() before final nfsd_put()	2023-12-20 11:16:50 -08:00
Victor Nogueira	4cf24dc893	net: sched: Add initial TC error skb drop reasons Continue expanding Daniel's patch by adding new skb drop reasons that are idiosyncratic to TC. More specifically: - SKB_DROP_REASON_TC_COOKIE_ERROR: An error occurred whilst processing a tc ext cookie. - SKB_DROP_REASON_TC_CHAIN_NOTFOUND: tc chain lookup failed. - SKB_DROP_REASON_TC_RECLASSIFY_LOOP: tc exceeded max reclassify loop iterations Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-20 11:50:13 +00:00
Victor Nogueira	b6a3c6066a	net: sched: Make tc-related drop reason more flexible for remaining qdiscs Incrementing on Daniel's patch[1], make tc-related drop reason more flexible for remaining qdiscs - that is, all qdiscs aside from clsact. In essence, the drop reason will be set by cls_api and act_api in case any error occurred in the data path. With that, we can give the user more detailed information so that they can distinguish between a policy drop or an error drop. [1] https://lore.kernel.org/all/20231009092655.22025-1-daniel@iogearbox.net Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-20 11:50:13 +00:00
Victor Nogueira	fb2780721c	net: sched: Move drop_reason to struct tc_skb_cb Move drop_reason from struct tcf_result to skb cb - more specifically to struct tc_skb_cb. With that, we'll be able to also set the drop reason for the remaining qdiscs (aside from clsact) that do not have access to tcf_result when time comes to set the skb drop reason. Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-20 11:50:13 +00:00
Ido Schimmel	2601e9c4b1	rtnetlink: bridge: Enable MDB bulk deletion Now that both the common code as well as individual drivers support MDB bulk deletion, allow user space to make such requests. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-20 11:27:21 +00:00
Ido Schimmel	a6acb535af	bridge: mdb: Add MDB bulk deletion support Implement MDB bulk deletion support in the bridge driver, allowing MDB entries to be deleted in bulk according to provided parameters. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-20 11:27:20 +00:00
Ido Schimmel	d8e81f1311	rtnetlink: bridge: Invoke MDB bulk deletion when needed Invoke the new MDB bulk deletion device operation when the 'NLM_F_BULK' flag is set in the netlink message header. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-20 11:27:20 +00:00
Ido Schimmel	e0cd06f7fc	rtnetlink: bridge: Use a different policy for MDB bulk delete For MDB bulk delete we will need to validate 'MDBA_SET_ENTRY' differently compared to regular delete. Specifically, allow the ifindex to be zero (in case not filtering on bridge port) and force the address to be zero as bulk delete based on address is not supported. Do that by introducing a new policy and choosing the correct policy based on the presence of the 'NLM_F_BULK' flag in the netlink message header. Use nlmsg_parse() for strict validation. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-20 11:27:20 +00:00
David S. Miller	498444e390	bluetooth pull request for net: - Add encryption key size check when acting as peripheral - Shut up false-positive build warning - Send reject if L2CAP command request is corrupted - Fix Use-After-Free in bt_sock_recvmsg - Fix not notifying when connection encryption changes - Fix not checking if HCI_OP_INQUIRY has been sent - Fix address type send over to the MGMT interface - Fix deadlock in vhci_send_frame -----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmV8mvcZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKcFTEACQRFdK50+kKekfzNoFe4SA CEDQqPyoR59Q12zc/sZmVVdWKCLjHT3yx9/3UPOPhz4LWQl0juvpPhNE/WSIh+Dx eozOZViebOfh9DstOSnjDKg7qGrybEjmtpVJn+jcFfzJedDVHF+POM97SleGSj50 EgJ8j3Qe8KNVQqhT0beJUE/Ku5f6zDBxZ3gcgphIU2CVnWQwrR36yaY/psDRn3oz YXcSz/Un4gWpIw0pwby8kViFjcnt8c65xJLX0o+WL+2rJecTiikDvSPtVHhI3F9W 1dmZ9rNg+mBYc+Lw/LkpDpj051+A+dINa71jLS/XCZyj4MMQw3uH/8D2diHwLxX8 Nst3ZF/xDZsEjFEbJEfDamM6M7DtqmltQ8A28Lmp90+AMIczNlrU2ZYAnRd+zK/H pbzTNddbVEj7gSSJb2OrymN45i1unX7dHZ9ASDAzBDxgQEdsrM0W1Qp0YW+7qefa ihvQOGM52ZdGNM6C73rvSYXssGKTp8G/illIQkHg4szs1H96Yw4x2ArXo5S1KANC 5c/k4qQEsr7vgDgkfORtJZ6BKFBb6p6/toyf2GizAiGfcsKCUwHvd7i8/AeGznj3 bCkLnL5v9nOSEeIqvBe9hIFsL6a3k1CbpTF4U/tK33Ybqqp8eiGrQY35+1EESdY4 Cy5fz5AgONuLAiOoypmN5A== =JAze -----END PGP SIGNATURE----- Merge tag 'for-net-2023-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Add encryption key size check when acting as peripheral - Shut up false-positive build warning - Send reject if L2CAP command request is corrupted - Fix Use-After-Free in bt_sock_recvmsg - Fix not notifying when connection encryption changes - Fix not checking if HCI_OP_INQUIRY has been sent - Fix address type send over to the MGMT interface - Fix deadlock in vhci_send_frame ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-20 11:12:12 +00:00
Paolo Abeni	1728df7fc1	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZYHIBwAKCRDbK58LschI g7nvAQDM5/5vMbsO9rEJAdHomJ4Ybvcv/2SG652za9DVEu6VUgD+JCLJ9He/hMx2 1wZ3wcjwxibNAZ/cZp6yu1JpOtR3rQo= =BO/b -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-12-19 Hi David, hi Jakub, hi Paolo, hi Eric, The following pull-request contains BPF updates for your net-next tree. We've added 2 non-merge commits during the last 1 day(s) which contain a total of 40 files changed, 642 insertions(+), 2926 deletions(-). The main changes are: 1) Revert all of BPF token-related patches for now as per list discussion [0], from Andrii Nakryiko. [0] https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com 2) Fix a syzbot-reported use-after-free read in nla_find() triggered from bpf_skb_get_nlattr_nest() helper, from Jakub Kicinski. bpf-next-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: Revert BPF token-related functionality bpf: Use nla_ok() instead of checking nla_len directly ==================== Link: https://lore.kernel.org/r/20231219170359.11035-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-19 18:35:28 +01:00
Andrii Nakryiko	d17aff807f	Revert BPF token-related functionality This patch includes the following revert (one conflicting BPF FS patch and three token patch sets, represented by merge commits): - revert `0f5d5454c7` "Merge branch 'bpf-fs-mount-options-parsing-follow-ups'"; - revert `750e785796` "bpf: Support uid and gid when mounting bpffs"; - revert `733763285a` "Merge branch 'bpf-token-support-in-libbpf-s-bpf-object'"; - revert `c35919dcce` "Merge branch 'bpf-token-and-bpf-fs-based-delegation'". Link: https://lore.kernel.org/bpf/CAHk-=wg7JuFYwGy=GOMbRCtOL+jwSQsdUaBsRWkDVYbxipbM5A@mail.gmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2023-12-19 08:23:03 -08:00
Jiri Pirko	ded6f77c05	devlink: extend multicast filtering by port index Expose the previously introduced notification multicast messages filtering infrastructure and allow the user to select messages using port index. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-19 15:31:40 +01:00
Jiri Pirko	13b127d257	devlink: add a command to set notification filter and use it for multicasts Currently the user listening on a socket for devlink notifications gets always all messages for all existing instances, even if he is interested only in one of those. That may cause unnecessary overhead on setups with thousands of instances present. User is currently able to narrow down the devlink objects replies to dump commands by specifying select attributes. Allow similar approach for notifications. Introduce a new devlink NOTIFY_FILTER_SET which the user passes the select attributes. Store these per-socket and use them for filtering messages during multicast send. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-19 15:31:40 +01:00
Jiri Pirko	403863e985	netlink: introduce typedef for filter function Make the code using filter function a bit nicer by consolidating the filter function arguments using typedef. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-19 15:31:40 +01:00
Jiri Pirko	a731132424	genetlink: introduce per-sock family private storage Introduce an xarray for Generic netlink family to store per-socket private. Initialize this xarray only if family uses per-socket privs. Introduce genl_sk_priv_get() to get the socket priv pointer for a family and initialize it in case it does not exist. Introduce __genl_sk_priv_get() to obtain socket priv pointer for a family under RCU read lock. Allow family to specify the priv size, init() and destroy() callbacks. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-19 15:31:40 +01:00
Jiri Pirko	5648de0b1f	devlink: introduce a helper for netlink multicast send Introduce a helper devlink_nl_notify_send() so each object notification function does not have to call genlmsg_multicast_netns() with the same arguments. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-19 15:31:40 +01:00
Jiri Pirko	cddbff470e	devlink: send notifications only if there are listeners Introduce devlink_nl_notify_need() helper and using it to check at the beginning of notification functions to avoid overhead of composing notification messages in case nobody listens. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-19 15:31:40 +01:00
Jiri Pirko	11280ddeae	devlink: introduce __devl_is_registered() helper and use it instead of xa_get_mark() Introduce __devl_is_registered() which does not assert on devlink instance lock and use it in notifications which may be called without devlink instance lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-19 15:31:40 +01:00
Jiri Pirko	337ad364c4	devlink: use devl_is_registered() helper instead xa_get_mark() Instead of checking the xarray mark directly using xa_get_mark() helper use devl_is_registered() helper which wraps it up. Note that there are couple more users of xa_get_mark() left which are going to be handled by the next patch. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-19 15:31:40 +01:00
Jakub Kicinski	2130c519a4	bpf: Use nla_ok() instead of checking nla_len directly nla_len may also be too short to be sane, in which case after recent changes nla_len() will return a wrapped value. Fixes: `172db56d90` ("netlink: Return unsigned value for nla_len()") Reported-by: syzbot+f43a23b6e622797c7a28@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/bpf/20231218231904.260440-1-kuba@kernel.org	2023-12-19 15:20:40 +01:00
Liu Jian	01a564bab4	net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() I got the below warning trace: WARNING: CPU: 4 PID: 4056 at net/core/dev.c:11066 unregister_netdevice_many_notify CPU: 4 PID: 4056 Comm: ip Not tainted 6.7.0-rc4+ #15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:unregister_netdevice_many_notify+0x9a4/0x9b0 Call Trace: rtnl_dellink rtnetlink_rcv_msg netlink_rcv_skb netlink_unicast netlink_sendmsg __sock_sendmsg ____sys_sendmsg ___sys_sendmsg __sys_sendmsg do_syscall_64 entry_SYSCALL_64_after_hwframe It can be repoduced via: ip netns add ns1 ip netns exec ns1 ip link add bond0 type bond mode 0 ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2 ip netns exec ns1 ip link set bond_slave_1 master bond0 [1] ip netns exec ns1 ethtool -K bond0 rx-vlan-filter off [2] ip netns exec ns1 ip link add link bond_slave_1 name bond_slave_1.0 type vlan id 0 [3] ip netns exec ns1 ip link add link bond0 name bond0.0 type vlan id 0 [4] ip netns exec ns1 ip link set bond_slave_1 nomaster [5] ip netns exec ns1 ip link del veth2 ip netns del ns1 This is all caused by command [1] turning off the rx-vlan-filter function of bond0. The reason is the same as commit `01f4fd2708` ("bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves"). Commands [2] [3] add the same vid to slave and master respectively, causing command [4] to empty slave->vlan_info. The following command [5] triggers this problem. To fix this problem, we should add VLAN_FILTER feature checks in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() to prevent incorrect addition or deletion of vlan_vid information. Fixes: `348a1443cc` ("vlan: introduce functions to do mass addition/deletion of vids by another device") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-12-19 13:13:56 +01:00
Jakub Kicinski	c49b292d03	netdev -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmWAz2EACgkQ6rmadz2v bToqrw/9EwroZCc8GEHOKAlb/fzrMvn92rLo0ZW/cGN84QJPnx4zM6Zo0+fgLaaN oqqztwMUwdzGC3uX3FfVXaaLKbJ/MeHeL9BXFZNW8zkRHciw4R7kIBhOdPnHyET7 uT+rQ4xPe1Mt7e9PjepKlSL5mEsxWfBkdUgsdn19Z2Vjdfr9mZMhYWYMJGcfTCD1 TwxHKBPhq5fN3IsshmMBB8IrRp1HStUKb65MgZ4dI22LJXxTsFkx5XMFXcmuqvkH NhKj8jDcPEEh31bYcb6aG2Z4onw5F2lquygjk1Qyy5cyw45m/ipJKAXKdAyvJG+R VZCWOET/9wbRwFSK5wxwihCuKghFiofK52i2PcGtXZh0PCouyZZneSJOKM0yVWKO BvuJBxK4ETRnQyN6ZxhuJiEXG3/YMBBhyR2TX1LntVK9ct/k7qFVzATG49J39/sR SYMbptBRj4a5oMJ1qn0nFVEDFkg0jTnTDNnsEpcz60Ayt6EsJ1XosO5yz2huf861 xgRMTKMseyG1/uV45tQ8ZPzbSPpBxjUi9Dl3coYsIm1a+y6clWUXcarONY5KVrpS CR98DuFgl+E7dXuisd/Kz2p2KxxSPq8nytsmLlgOvrUqhwiXqB+TKN8EHgIapVOt l1A5LrzXFTcGlT9MlaWBqEIy83Bu1nqQqbxrAFOE0k8A5jomXaw= =stU2 -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2023-12-18 This PR is larger than usual and contains changes in various parts of the kernel. The main changes are: 1) Fix kCFI bugs in BPF, from Peter Zijlstra. End result: all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y. 2) Introduce BPF token object, from Andrii Nakryiko. It adds an ability to delegate a subset of BPF features from privileged daemon (e.g., systemd) through special mount options for userns-bound BPF FS to a trusted unprivileged application. The design accommodates suggestions from Christian Brauner and Paul Moore. Example: $ sudo mkdir -p /sys/fs/bpf/token $ sudo mount -t bpf bpffs /sys/fs/bpf/token \ -o delegate_cmds=prog_load:MAP_CREATE \ -o delegate_progs=kprobe \ -o delegate_attachs=xdp 3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei. - Complete precision tracking support for register spills - Fix verification of possibly-zero-sized stack accesses - Fix access to uninit stack slots - Track aligned STACK_ZERO cases as imprecise spilled registers. It improves the verifier "instructions processed" metric from single digit to 50-60% for some programs. - Fix verifier retval logic 4) Support for VLAN tag in XDP hints, from Larysa Zaremba. 5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu. End result: better memory utilization and lower I$ miss for calls to BPF via BPF trampoline. 6) Fix race between BPF prog accessing inner map and parallel delete, from Hou Tao. 7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu. It allows BPF interact with IPSEC infra. The intent is to support software RSS (via XDP) for the upcoming ipsec pcpu work. Experiments on AWS demonstrate single tunnel pcpu ipsec reaching line rate on 100G ENA nics. 8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao. 9) BPF file verification via fsverity, from Song Liu. It allows BPF progs get fsverity digest. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits) bpf: Ensure precise is reset to false in __mark_reg_const_zero() selftests/bpf: Add more uprobe multi fail tests bpf: Fail uprobe multi link with negative offset selftests/bpf: Test the release of map btf s390/bpf: Fix indirect trampoline generation selftests/bpf: Temporarily disable dummy_struct_ops test on s390 x86/cfi,bpf: Fix bpf_exception_cb() signature bpf: Fix dtor CFI cfi: Add CFI_NOSEAL() x86/cfi,bpf: Fix bpf_struct_ops CFI x86/cfi,bpf: Fix bpf_callback_t CFI x86/cfi,bpf: Fix BPF JIT call cfi: Flip headers selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment bpf: Limit the number of kprobes when attaching program to multiple kprobes bpf: Limit the number of uprobes when attaching program to multiple uprobes bpf: xdp: Register generic_kfunc_set with XDP programs selftests/bpf: utilize string values for delegate_xxx mount options ... ==================== Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-18 16:46:08 -08:00
Jakub Kicinski	0ee28c9ae0	wireless-next patches for v6.8 The second features pull request for v6.8. A bigger one this time with changes both to stack and drivers. We have a new Wifi band RFI (WBRF) mitigation feature for which we pulled an immutable branch shared with other subsystems. And, as always, other new features and bug fixes all over. Major changes: cfg80211/mac80211 * AMD ACPI based Wifi band RFI (WBRF) mitigation feature * Basic Service Set (BSS) usage reporting * TID to link mapping support * mac80211 hardware flag to disallow puncturing iwlwifi * new debugfs file fw_dbg_clear mt76 * NVMEM EEPROM improvements * mt7996 Extremely High Throughpu (EHT) improvements * mt7996 Wireless Ethernet Dispatcher (WED) support * mt7996 36-bit DMA support ath12k * support one MSI vector * WCN7850: support AP mode -----BEGIN PGP SIGNATURE----- iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmWAdRERHGt2YWxvQGtl cm5lbC5vcmcACgkQbhckVSbrbZu0RAf+JtHgfjmUMFb54xcncLgj8ZAN82E0ThE0 bPewQDhot0QTri4s7i5Kn8PCWjk+eKEmiIK+eARM+JDyZMTlCpXs2Y92cDAGQ8KG +LbIMRQkwOUg0HmtX3NysUG3mGAx4QTcIX/y3+GmtMZpKXMFuNy6ODuFvuWFNJrF 3XTq1qFQNnA0XqUDKHW9uareeCiOMVOsqcxNW2FAi2gqRUfQpKnU1Ukv5iOjkqE9 i53GHzeAG2WI4/YjXaTEZvibkM3jqrPcquHlul3fVuq05qkKOEuiy2UalDjgDCYp u91vbmMpcOjhlf9GIiu2BF6K/muEUCCIjlh5oxob0k9NiKhnPUZLng== =6Y8M -----END PGP SIGNATURE----- Merge tag 'wireless-next-2023-12-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.8 The second features pull request for v6.8. A bigger one this time with changes both to stack and drivers. We have a new Wifi band RFI (WBRF) mitigation feature for which we pulled an immutable branch shared with other subsystems. And, as always, other new features and bug fixes all over. Major changes: cfg80211/mac80211 * AMD ACPI based Wifi band RFI (WBRF) mitigation feature * Basic Service Set (BSS) usage reporting * TID to link mapping support * mac80211 hardware flag to disallow puncturing iwlwifi * new debugfs file fw_dbg_clear mt76 * NVMEM EEPROM improvements * mt7996 Extremely High Throughpu (EHT) improvements * mt7996 Wireless Ethernet Dispatcher (WED) support * mt7996 36-bit DMA support ath12k * support one MSI vector * WCN7850: support AP mode * tag 'wireless-next-2023-12-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (207 commits) wifi: mt76: mt7996: Use DECLARE_FLEX_ARRAY() and fix -Warray-bounds warnings wifi: ath11k: workaround too long expansion sparse warnings Revert "wifi: ath12k: use ATH12K_PCI_IRQ_DP_OFFSET for DP IRQ" wifi: rt2x00: remove useless code in rt2x00queue_create_tx_descriptor() wifi: rtw89: only reset BB/RF for existing WiFi 6 chips while starting up wifi: rtw89: add DBCC H2C to notify firmware the status wifi: rtw89: mac: add suffix _ax to MAC functions wifi: rtw89: mac: add flags to check if CMAC and DMAC are enabled wifi: rtw89: 8922a: add power on/off functions wifi: rtw89: add XTAL SI for WiFi 7 chips wifi: rtw89: phy: print out RFK log with formatted string wifi: rtw89: parse and print out RFK log from C2H events wifi: rtw89: add C2H event handlers of RFK log and report wifi: rtw89: load RFK log format string from firmware file wifi: rtw89: fw: add version field to BB MCU firmware element wifi: rtw89: fw: load TX power track tables from fw_element wifi: mwifiex: configure BSSID consistently when starting AP wifi: mwifiex: add extra delay for firmware ready wifi: mac80211: sta_info.c: fix sentence grammar wifi: mac80211: rx.c: fix sentence grammar ... ==================== Link: https://lore.kernel.org/r/20231218163900.C031DC433C9@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-18 16:17:34 -08:00
Chuck Lever	bd018b98ba	SUNRPC: Revert `5f7fc5d69f` Guillaume says: > I believe commit `5f7fc5d69f` ("SUNRPC: Resupply rq_pages from > node-local memory") in Linux 6.5+ is incorrect. It passes > unconditionally rq_pool->sp_id as the NUMA node. > > While the comment in the svc_pool declaration in sunrpc/svc.h says > that sp_id is also the NUMA node id, it might not be the case if > the svc is created using svc_create_pooled(). svc_created_pooled() > can use the per-cpu pool mode therefore in this case sp_id would > be the cpu id. Fix this by reverting now. At a later point this minor optimization, and the deceptive labeling of the sp_id field, can be revisited. Reported-by: Guillaume Morin <guillaume@morinfr.org> Closes: https://lore.kernel.org/linux-nfs/ZYC9rsno8qYggVt9@bender.morinfr.org/T/#u Fixes: `5f7fc5d69f` ("SUNRPC: Resupply rq_pages from node-local memory") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-12-18 17:10:52 -05:00
Pedro Tammela	174523479a	net: rtnl: use rcu_replace_pointer_rtnl in rtnl_unregister_* With the introduction of the rcu_replace_pointer_rtnl helper, cleanup the rtnl_unregister_* functions to use the helper instead of open coding it. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-18 02:05:45 +00:00
Matthieu Baerts	a8f570b247	mptcp: fill in missing MODULE_DESCRIPTION() W=1 builds warn on missing MODULE_DESCRIPTION, add them here in MPTCP. Only two were missing: two modules with different KUnit tests for MPTCP. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-17 20:54:22 +00:00
Paolo Abeni	4fd19a3070	mptcp: fix inconsistent state on fastopen race The netlink PM can race with fastopen self-connect attempts, shutting down the first subflow via: MPTCP_PM_CMD_DEL_ADDR -> mptcp_nl_remove_id_zero_address -> mptcp_pm_nl_rm_subflow_received -> mptcp_close_ssk and transitioning such subflow to FIN_WAIT1 status before the syn-ack packet is processed. The MPTCP code does not react to such state change, leaving the connection in not-fallback status and the subflow handshake uncompleted, triggering the following splat: WARNING: CPU: 0 PID: 10630 at net/mptcp/subflow.c:1405 subflow_data_ready+0x39f/0x690 net/mptcp/subflow.c:1405 Modules linked in: CPU: 0 PID: 10630 Comm: kworker/u4:11 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023 Workqueue: bat_events batadv_nc_worker RIP: 0010:subflow_data_ready+0x39f/0x690 net/mptcp/subflow.c:1405 Code: 18 89 ee e8 e3 d2 21 f7 40 84 ed 75 1f e8 a9 d7 21 f7 44 89 fe bf 07 00 00 00 e8 0c d3 21 f7 41 83 ff 07 74 07 e8 91 d7 21 f7 <0f> 0b e8 8a d7 21 f7 48 89 df e8 d2 b2 ff ff 31 ff 89 c5 89 c6 e8 RSP: 0018:ffffc90000007448 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888031efc700 RCX: ffffffff8a65baf4 RDX: ffff888043222140 RSI: ffffffff8a65baff RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000007 R10: 000000000000000b R11: 0000000000000000 R12: 1ffff92000000e89 R13: ffff88807a534d80 R14: ffff888021c11a00 R15: 000000000000000b FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa19a0ffc81 CR3: 000000007a2db000 CR4: 00000000003506f0 DR0: 000000000000d8dd DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tcp_data_ready+0x14c/0x5b0 net/ipv4/tcp_input.c:5128 tcp_data_queue+0x19c3/0x5190 net/ipv4/tcp_input.c:5208 tcp_rcv_state_process+0x11ef/0x4e10 net/ipv4/tcp_input.c:6844 tcp_v4_do_rcv+0x369/0xa10 net/ipv4/tcp_ipv4.c:1929 tcp_v4_rcv+0x3888/0x3b30 net/ipv4/tcp_ipv4.c:2329 ip_protocol_deliver_rcu+0x9f/0x480 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2e4/0x510 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip_local_deliver+0x1b6/0x550 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:461 [inline] ip_rcv_finish+0x1c4/0x2e0 net/ipv4/ip_input.c:449 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip_rcv+0xce/0x440 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5527 __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5641 process_backlog+0x101/0x6b0 net/core/dev.c:5969 __napi_poll.constprop.0+0xb4/0x540 net/core/dev.c:6531 napi_poll net/core/dev.c:6600 [inline] net_rx_action+0x956/0xe90 net/core/dev.c:6733 __do_softirq+0x21a/0x968 kernel/softirq.c:553 do_softirq kernel/softirq.c:454 [inline] do_softirq+0xaa/0xe0 kernel/softirq.c:441 </IRQ> <TASK> __local_bh_enable_ip+0xf8/0x120 kernel/softirq.c:381 spin_unlock_bh include/linux/spinlock.h:396 [inline] batadv_nc_purge_paths+0x1ce/0x3c0 net/batman-adv/network-coding.c:471 batadv_nc_worker+0x9b1/0x10e0 net/batman-adv/network-coding.c:722 process_one_work+0x884/0x15c0 kernel/workqueue.c:2630 process_scheduled_works kernel/workqueue.c:2703 [inline] worker_thread+0x8b9/0x1290 kernel/workqueue.c:2784 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242 </TASK> To address the issue, catch the racing subflow state change and use it to cause the MPTCP fallback. Such fallback is also used to cause the first subflow state propagation to the msk socket via mptcp_set_connected(). After this change, the first subflow can additionally propagate the TCP_FIN_WAIT1 state, so rename the helper accordingly. Finally, if the state propagation is delayed to the msk release callback, the first subflow can change to a different state in between. Cache the relevant target state in a new msk-level field and use such value to update the msk state at release time. Fixes: `1e777f39b4` ("mptcp: add MSG_FASTOPEN sendmsg flag support") Cc: stable@vger.kernel.org Reported-by: <syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/458 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-17 20:54:22 +00:00
Liang Chen	f7dc3248dc	skbuff: Optimization of SKB coalescing for page pool In order to address the issues encountered with commit `1effe8ca4e` ("skbuff: fix coalescing for page_pool fragment recycling"), the combination of the following condition was excluded from skb coalescing: from->pp_recycle = 1 from->cloned = 1 to->pp_recycle = 1 However, with page pool environments, the aforementioned combination can be quite common(ex. NetworkMananger may lead to the additional packet_type being registered, thus the cloning). In scenarios with a higher number of small packets, it can significantly affect the success rate of coalescing. For example, considering packets of 256 bytes size, our comparison of coalescing success rate is as follows: Without page pool: 70% With page pool: 13% Consequently, this has an impact on performance: Without page pool: 2.57 Gbits/sec With page pool: 2.26 Gbits/sec Therefore, it seems worthwhile to optimize this scenario and enable coalescing of this particular combination. To achieve this, we need to ensure the correct increment of the "from" SKB page's page pool reference count (pp_ref_count). Following this optimization, the success rate of coalescing measured in our environment has improved as follows: With page pool: 60% This success rate is approaching the rate achieved without using page pool, and the performance has also been improved: With page pool: 2.52 Gbits/sec Below is the performance comparison for small packets before and after this optimization. We observe no impact to packets larger than 4K. packet size before after improved (bytes) (Gbits/sec) (Gbits/sec) 128 1.19 1.27 7.13% 256 2.26 2.52 11.75% 512 4.13 4.81 16.50% 1024 6.17 6.73 9.05% 2048 14.54 15.47 6.45% 4096 25.44 27.87 9.52% Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Suggested-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-17 10:56:33 +00:00
Liang Chen	8cfa2dee32	skbuff: Add a function to check if a page belongs to page_pool Wrap code for checking if a page is a page_pool page into a function for better readability and ease of reuse. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Mina Almasry <almarsymina@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-17 10:56:33 +00:00
Liang Chen	aaf153aece	page_pool: halve BIAS_MAX for multiple user references of a fragment Up to now, we were only subtracting from the number of used page fragments to figure out when a page could be freed or recycled. A following patch introduces support for multiple users referencing the same fragment. So reduce the initial page fragments value to half to avoid overflowing. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Mina Almasry <almarsymina@google.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-12-17 10:56:33 +00:00
Eric Dumazet	207184853d	tcp/dccp: change source port selection at connect() time In commit `1580ab63fc` ("tcp/dccp: better use of ephemeral ports in connect()") we added an heuristic to select even ports for connect() and odd ports for bind(). This was nice because no applications changes were needed. But it added more costs when all even ports are in use, when there are few listeners and many active connections. Since then, IP_LOCAL_PORT_RANGE has been added to permit an application to partition ephemeral port range at will. This patch extends the idea so that if IP_LOCAL_PORT_RANGE is set on a socket before accept(), port selection no longer favors even ports. This means that connect() can find a suitable source port faster, and applications can use a different split between connect() and bind() users. This should give more entropy to Toeplitz hash used in RSS: Using even ports was wasting one bit from the 16bit sport. A similar change can be done in inet_csk_find_open_port() if needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20231214192939.1962891-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-15 17:56:27 -08:00
Eric Dumazet	41db7626b7	inet: returns a bool from inet_sk_get_local_port_range() Change inet_sk_get_local_port_range() to return a boolean, telling the callers if the port range was provided by IP_LOCAL_PORT_RANGE socket option. Adds documentation while we are at it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20231214192939.1962891-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-12-15 17:56:27 -08:00
Peter Zijlstra	e4c0033989	bpf: Fix dtor CFI Ensure the various dtor functions match their prototype and retain their CFI signatures, since they don't have their address taken, they are prone to not getting CFI, making them impossible to call indirectly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20231215092707.799451071@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-15 16:25:55 -08:00
Peter Zijlstra	2cd3e3772e	x86/cfi,bpf: Fix bpf_struct_ops CFI BPF struct_ops uses __arch_prepare_bpf_trampoline() to write trampolines for indirect function calls. These tramplines much have matching CFI. In order to obtain the correct CFI hash for the various methods, add a matching structure that contains stub functions, the compiler will generate correct CFI which we can pilfer for the trampolines. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20231215092707.566977112@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-12-15 16:25:55 -08:00

1 2 3 4 5 ...

75520 Commits