Commit Graph

79044 Commits

Author SHA1 Message Date
Matthieu Baerts (NGI0)
cfbbd48598 mptcp: no admin perm to list endpoints
During the switch to YNL, the command to list all endpoints has been
accidentally restricted to users with admin permissions.

It looks like there are no reasons to have this restriction which makes
it harder for a user to quickly check if the endpoint list has been
correctly populated by an automated tool. Best to go back to the
previous behaviour then.

mptcp_pm_gen.c has been modified using ynl-gen-c.py:

   $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \
     --spec Documentation/netlink/specs/mptcp_pm.yaml --source \
     -o net/mptcp/mptcp_pm_gen.c

The header file doesn't need to be regenerated.

Fixes: 1d0507f468 ("net: mptcp: convert netlink from small_ops to ops")
Cc: stable@vger.kernel.org
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241104-net-mptcp-misc-6-12-v1-1-c13f2ff1656f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:51:08 -08:00
Aaron Conole
7d1c2d517f openvswitch: Pass on secpath details for internal port rx.
Clearing the secpath for internal ports will cause packet drops when
ipsec offload or early SW ipsec decrypt are used.  Systems that rely
on these will not be able to actually pass traffic via openvswitch.

There is still an open issue for a flow miss packet - this is because
we drop the extensions during upcall and there is no facility to
restore such data (and it is non-trivial to add such functionality
to the upcall interface).  That means that when a flow miss occurs,
there will still be packet drops.  With this patch, when a flow is
found then traffic which has an associated xfrm extension will
properly flow.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://patch.msgid.link/20241101204732.183840-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:38:25 -08:00
Florian Westphal
cddc04275f netfilter: nf_tables: must hold rcu read lock while iterating object type list
Update of stateful object triggers:
WARNING: suspicious RCU usage
net/netfilter/nf_tables_api.c:7759 RCU-list traversed in non-reader section!!

other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by nft/3060:
 #0: ffff88810f0578c8 (&nft_net->commit_mutex){+.+.}-{4:4}, [..]

... but this list is not protected by the transaction mutex but the
nfnl nftables subsystem mutex.

Switch to nft_obj_type_get which will acquire rcu read lock,
bump refcount, and returns the result.

v3: Dan Carpenter points out nft_obj_type_get returns error pointer, not
NULL, on error.

Fixes: dad3bdeef4 ("netfilter: nf_tables: fix memory leak during stateful obj update").
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-05 22:07:12 +01:00
Florian Westphal
ee666a541e netfilter: nf_tables: must hold rcu read lock while iterating expression type list
nft shell tests trigger:
 WARNING: suspicious RCU usage
 net/netfilter/nf_tables_api.c:3125 RCU-list traversed in non-reader section!!
 1 lock held by nft/2068:
  #0: ffff888106c6f8c8 (&nft_net->commit_mutex){+.+.}-{4:4}, at: nf_tables_valid_genid+0x3c/0xf0

But the transaction mutex doesn't protect this list, the nfnl subsystem
mutex would, but we can't acquire it here without risk of ABBA
deadlocks.

Acquire the rcu read lock to avoid this issue.

v3: add a comment that explains the ->inner_ops check implies
expression is builtin and lack of a module owner reference is ok.

Fixes: 3a07327d10 ("netfilter: nft_inner: support for inner tunnel header matching")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-05 22:06:32 +01:00
Florian Westphal
3567146b94 netfilter: nf_tables: avoid false-positive lockdep splats with basechain hook
Like previous patches: iteration is ok if the list cannot be altered in
parallel.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-05 22:06:23 +01:00
Florian Westphal
28b7a6b84c netfilter: nf_tables: avoid false-positive lockdep splats in set walker
Its not possible to add or delete elements from hash and bitmap sets,
as long as caller is holding the transaction mutex, so its ok to iterate
the list outside of rcu read side critical section.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-05 22:06:22 +01:00
Florian Westphal
b3e8f29d6b netfilter: nf_tables: avoid false-positive lockdep splats with flowtables
The transaction mutex prevents concurrent add/delete, its ok to iterate
those lists outside of rcu read side critical sections.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-05 22:06:18 +01:00
Florian Westphal
8f5f3786db netfilter: nf_tables: avoid false-positive lockdep splats with sets
Same as previous patch.  All set handling functions here can be called
with transaction mutex held (but not the rcu read lock).

The transaction mutex prevents concurrent add/delete, so this is fine.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-05 22:06:14 +01:00
Florian Westphal
9adbb4198b netfilter: nf_tables: avoid false-positive lockdep splat on rule deletion
On rule delete we get:
 WARNING: suspicious RCU usage
 net/netfilter/nf_tables_api.c:3420 RCU-list traversed in non-reader section!!
 1 lock held by iptables/134:
   #0: ffff888008c4fcc8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid (include/linux/jiffies.h:101) nf_tables

Code is fine, no other CPU can change the list because we're holding
transaction mutex.

Pass the needed lockdep annotation to the iterator and fix
two comments for functions that are no longer restricted to rcu-only
context.

This is enough to resolve rule delete, but there are several other
missing annotations, added in followup-patches.

Fixes: 28875945ba ("rcu: Add support for consolidated-RCU reader checking")
Reported-by: Matthieu Baerts <matttbe@kernel.org>
Tested-by: Matthieu Baerts <matttbe@kernel.org>
Closes: https://lore.kernel.org/netfilter-devel/da27f17f-3145-47af-ad0f-7fd2a823623e@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-05 22:06:10 +01:00
NeilBrown
10f0740234 sunrpc: handle -ENOTCONN in xs_tcp_setup_socket()
xs_tcp_finish_connecting() can return -ENOTCONN but the switch statement
in xs_tcp_setup_socket() treats that as an unhandled error.

If we treat it as a known error it would propagate back to
call_connect_status() which does handle that error code.  This appears
to be the intention of the commit (given below) which added -ENOTCONN as
a return status for xs_tcp_finish_connecting().

So add -ENOTCONN to the switch statement as an error to pass through to
the caller.

Link: https://bugzilla.suse.com/show_bug.cgi?id=1231050
Link: https://access.redhat.com/discussions/3434091
Fixes: 01d37c428a ("SUNRPC: xprt_connect() don't abort the task if the transport isn't bound")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-11-04 10:24:18 -05:00
Jakub Kicinski
cbf49bed6a bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZyP6TxUcZGFuaWVsQGlv
 Z2VhcmJveC5uZXQACgkQ2yufC7HISINz7QD/RTuJAzPJXPQmjdzMj7pepjnSQH4K
 DnOc1soDqjJPSFkBAMlklDCZqSsFoNtNxagbyILrYQBC/MsV9jngimK46DEN
 =pDzC
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-10-31

We've added 13 non-merge commits during the last 16 day(s) which contain
a total of 16 files changed, 710 insertions(+), 668 deletions(-).

The main changes are:

1) Optimize and homogenize bpf_csum_diff helper for all archs and also
   add a batch of new BPF selftests for it, from Puranjay Mohan.

2) Rewrite and migrate the test_tcp_check_syncookie.sh BPF selftest
   into test_progs so that it can be run in BPF CI, from Alexis Lothoré.

3) Two BPF sockmap selftest fixes, from Zijian Zhang.

4) Small XDP synproxy BPF selftest cleanup to remove IP_DF check,
   from Vincent Li.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Add a selftest for bpf_csum_diff()
  selftests/bpf: Don't mask result of bpf_csum_diff() in test_verifier
  bpf: bpf_csum_diff: Optimize and homogenize for all archs
  net: checksum: Move from32to16() to generic header
  selftests/bpf: remove xdp_synproxy IP_DF check
  selftests/bpf: remove test_tcp_check_syncookie
  selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie
  selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress
  selftests/bpf: get rid of global vars in btf_skc_cls_ingress
  selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress
  selftests/bpf: factorize conn and syncookies tests in a single runner
  selftests/bpf: Fix txmsg_redir of test_txmsg_pull in test_sockmap
  selftests/bpf: Fix msg_verify_data in test_sockmap

====================

Link: https://patch.msgid.link/20241031221543.108853-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03 14:44:51 -08:00
Dmitry Safonov
6b2d11e2d8 net/tcp: Add missing lockdep annotations for TCP-AO hlist traversals
Under CONFIG_PROVE_RCU_LIST + CONFIG_RCU_EXPERT
hlist_for_each_entry_rcu() provides very helpful splats, which help
to find possible issues. I missed CONFIG_RCU_EXPERT=y in my testing
config the same as described in
a3e4bf7f96 ("configs/debug: make sure PROVE_RCU_LIST=y takes effect").

The fix itself is trivial: add the very same lockdep annotations
as were used to dereference ao_info from the socket.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20241028152645.35a8be66@kernel.org/
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20241030-tcp-ao-hlist-lockdep-annotate-v1-1-bf641a64d7c6@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03 12:10:11 -08:00
Gustavo A. R. Silva
3bd9b9abdf net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Change the type of the middle struct member currently causing trouble from
`struct ethtool_link_settings` to `struct ethtool_link_settings_hdr`.

Additionally, update the type of some variables in various functions that
don't access the flexible-array member, changing them to the newly created
`struct ethtool_link_settings_hdr`. These changes are needed because the
type of the conflicting middle members changed. So, those instances that
expect the type to be `struct ethtool_link_settings` should be adjusted to
the newly created type `struct ethtool_link_settings_hdr`.

Also, adjust variable declarations to follow the reverse xmas tree
convention.

Fix 3338 of the following -Wflex-array-member-not-at-end warnings:

include/linux/ethtool.h:214:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/0bc2809fe2a6c11dd4c8a9a10d9bd65cccdb559b.1730238285.git.gustavoars@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03 11:06:58 -08:00
Xin Long
0ead60804b sctp: properly validate chunk size in sctp_sf_ootb()
A size validation fix similar to that in Commit 50619dbf8d ("sctp: add
size validation when walking chunks") is also required in sctp_sf_ootb()
to address a crash reported by syzbot:

  BUG: KMSAN: uninit-value in sctp_sf_ootb+0x7f5/0xce0 net/sctp/sm_statefuns.c:3712
  sctp_sf_ootb+0x7f5/0xce0 net/sctp/sm_statefuns.c:3712
  sctp_do_sm+0x181/0x93d0 net/sctp/sm_sideeffect.c:1166
  sctp_endpoint_bh_rcv+0xc38/0xf90 net/sctp/endpointola.c:407
  sctp_inq_push+0x2ef/0x380 net/sctp/inqueue.c:88
  sctp_rcv+0x3831/0x3b20 net/sctp/input.c:243
  sctp4_rcv+0x42/0x50 net/sctp/protocol.c:1159
  ip_protocol_deliver_rcu+0xb51/0x13d0 net/ipv4/ip_input.c:205
  ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233

Reported-by: syzbot+f0cbb34d39392f2746ca@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/a29ebb6d8b9f8affd0f9abb296faafafe10c17d8.1730223981.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03 11:03:23 -08:00
Rosen Penev
f12b363887 net: dsa: use ethtool string helpers
These are the preferred way to copy ethtool strings.

Avoids incrementing pointers all over the place.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
(for hellcreek driver)
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://patch.msgid.link/20241028044828.1639668-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03 10:36:34 -08:00
Yafang Shao
dbd5e2e79e net: tcp: Add noinline_for_tracing annotation for tcp_drop_reason()
We previously hooked the tcp_drop_reason() function using BPF to monitor
TCP drop reasons. However, after upgrading our compiler from GCC 9 to GCC
11, tcp_drop_reason() is now inlined, preventing us from hooking into it.
To address this, it would be beneficial to make noinline explicitly for
tracing.

Link: https://lore.kernel.org/netdev/CANn89iJuShCmidCi_ZkYABtmscwbVjhuDta1MS5LxV_4H9tKOA@mail.gmail.com/
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Link: https://patch.msgid.link/20241024093742.87681-3-laoar.shao@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03 09:02:32 -08:00
Al Viro
6348be02ee fdget(), trivial conversions
fdget() is the first thing done in scope, all matching fdput() are
immediately followed by leaving the scope.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03 01:28:06 -05:00
Al Viro
f302edb9d8 switch netlink_getsockbyfilp() to taking descriptor
the only call site (in do_mq_notify()) obtains the argument
from an immediately preceding fdget() and it is immediately
followed by fdput(); might as well just replace it with
a variant that would take a descriptor instead of struct file *
and have file lookups handled inside that function.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03 01:28:06 -05:00
Al Viro
53c0a58beb net/socket.c: switch to CLASS(fd)
The important part in sockfd_lookup_light() is avoiding needless
file refcount operations, not the marginal reduction of the register
pressure from not keeping a struct file pointer in the caller.

	Switch to use fdget()/fdpu(); with sane use of CLASS(fd) we can
get a better code generation...

	Would be nice if somebody tested it on networking test suites
(including benchmarks)...

	sockfd_lookup_light() does fdget(), uses sock_from_file() to
get the associated socket and returns the struct socket reference to
the caller, along with "do we need to fput()" flag.  No matching fdput(),
the caller does its equivalent manually, using the fact that sock->file
points to the struct file the socket has come from.

	Get rid of that - have the callers do fdget()/fdput() and
use sock_from_file() directly.  That kills sockfd_lookup_light()
and fput_light() (no users left).

	What's more, we can get rid of explicit fdget()/fdput() by
switching to CLASS(fd, ...) - code generation does not suffer, since
now fdput() inserted on "descriptor is not opened" failure exit
is recognized to be a no-op by compiler.

[folded a fix for braino in do_recvmmsg() caught by Simon Horman]

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03 01:27:11 -05:00
Linus Torvalds
3e5e6c9900 nfsd-6.12 fixes:
- Fix two async COPY bugs found during NFS bake-a-thon
 - Fix an svcrdma memory leak
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmcmT5wACgkQM2qzM29m
 f5coUg/9FQMf1IeXhyGlDtV0ELUxkHoXaZ2T6zhfXFoLmyl/DU4AWMH4YbSAqk2M
 NnR47sVsM08tIwE3KYQgYzLbbFF41nQUa1sckeel+nYGtpcN6IwOlU5LYYpNNFeQ
 vsECxV78BA6FGdjaXwQ07r4G6lpVhCCqM/RZpDrwNSyoIWVLo77KBUVCSoQb5wzG
 z7OBvO9M7HfVJVOHcPd+tVcZaGAF0fhW812fibZQKV2mrWdhOOe+gWVs8ro3tmm1
 GocbTTQW2hlYcLCZPe1przTI9flfwon6Lk8TmIZuU5IrzcaAB+U3P140aKgl9427
 v4WdLuKYlKi+xISBdRG3omyaLroNUs8IHW4KoBXAW3FinyLzNsAyoPxb02m7SEge
 sOJ/gbeLtb2u+ur4wAp4gDmVKfg3TGyh05Hdt96LXsbQUuWIlwEcPurl+nY93Eoq
 vrPLIdPOXrOD5jBIaVQkBYlaCn04mDg+VTNbG9hW1wjorVFpWKS7MwCjVloXSVIn
 uE++cVpQtIKp8aTYBbFVXqtVREatczl++f+Npnlm8xlcquDbaORkk4ZBOs4vQuHo
 pNuZcWO0rIBR6hakr44OjTLnJBIwChPBYvBVgtq1E6oAbwHvC2SVXbiJB8IcCPOx
 nB2jnF0/tpTs2LnrHAxdAGdU9Om6RGmahfz/uwh/8djdbCVH+Ik=
 =NiQh
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix two async COPY bugs found during NFS bake-a-thon

 - Fix an svcrdma memory leak

* tag 'nfsd-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  rpcrdma: Always release the rpcrdma_device's xa_array
  NFSD: Never decrement pending_async_copies on error
  NFSD: Initialize struct nfsd4_copy earlier
2024-11-02 09:27:11 -10:00
Jinjie Ruan
bc74d329ce netlink: Remove the dead code in netlink_proto_init()
In the error path of netlink_proto_init(), frees the already allocated
bucket table for new hash tables in a loop, but it is going to panic,
so it is not necessary to clean up the resources, just remove the
dead code.

Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20241030012147.357400-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31 19:36:25 -07:00
Pengcheng Yang
cf44bd08cd tcp: only release congestion control if it has been initialized
Currently, when cleaning up congestion control, we always call the
release regardless of whether it has been initialized. There is no
need to release when closing TCP_LISTEN and TCP_CLOSE (close
immediately after socket()).

In this case, tcp_cdg calls kfree(NULL) in release without causing
an exception, but for some customized ca, this could lead to
unexpected exceptions. We need to ensure that init and release are
called in pairs.

Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Link: https://patch.msgid.link/1729845944-6003-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31 18:22:48 -07:00
Jakub Kicinski
5b1c965956 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.12-rc6).

Conflicts:

drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
  cbe84e9ad5 ("wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd")
  188a1bf894 ("wifi: mac80211: re-order assigning channel in activate links")
https://lore.kernel.org/all/20241028123621.7bbb131b@canb.auug.org.au/

net/mac80211/cfg.c
  c4382d5ca1 ("wifi: mac80211: update the right link for tx power")
  8dd0498983 ("wifi: mac80211: Fix setting txpower with emulate_chanctx")

drivers/net/ethernet/intel/ice/ice_ptp_hw.h
  6e58c33106 ("ice: fix crash on probe for DPLL enabled E810 LOM")
  e4291b64e1 ("ice: Align E810T GPIO to other products")
  ebb2693f8f ("ice: Read SDP section from NVM for pin definitions")
  ac532f4f42 ("ice: Cleanup unused declarations")
https://lore.kernel.org/all/20241030120524.1ee1af18@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31 18:10:07 -07:00
Linus Torvalds
5635f18942 BPF fixes:
- Fix BPF verifier to force a checkpoint when the program's jump
   history becomes too long (Eduard Zingerman)
 
 - Add several fixes to the BPF bits iterator addressing issues
   like memory leaks and overflow problems (Hou Tao)
 
 - Fix an out-of-bounds write in trie_get_next_key (Byeonguk Jeong)
 
 - Fix BPF test infra's LIVE_FRAME frame update after a page has
   been recycled (Toke Høiland-Jørgensen)
 
 - Fix BPF verifier and undo the 40-bytes extra stack space for
   bpf_fastcall patterns due to various bugs (Eduard Zingerman)
 
 - Fix a BPF sockmap race condition which could trigger a NULL
   pointer dereference in sock_map_link_update_prog (Cong Wang)
 
 - Fix tcp_bpf_recvmsg_parser to retrieve seq_copied from tcp_sk
   under the socket lock (Jiayuan Chen)
 
 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZyQO/RUcZGFuaWVsQGlv
 Z2VhcmJveC5uZXQACgkQ2yufC7HISIO2vAD+NAng11x6W9tnIOVDHTwvsWL4aafQ
 pmf1zda90bwCIyIA/07ptFPWOH+WTmWqP8pZ9PGY5279KAxurZZDud0SOwIO
 =28aY
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Daniel Borkmann:

 - Fix BPF verifier to force a checkpoint when the program's jump
   history becomes too long (Eduard Zingerman)

 - Add several fixes to the BPF bits iterator addressing issues like
   memory leaks and overflow problems (Hou Tao)

 - Fix an out-of-bounds write in trie_get_next_key (Byeonguk Jeong)

 - Fix BPF test infra's LIVE_FRAME frame update after a page has been
   recycled (Toke Høiland-Jørgensen)

 - Fix BPF verifier and undo the 40-bytes extra stack space for
   bpf_fastcall patterns due to various bugs (Eduard Zingerman)

 - Fix a BPF sockmap race condition which could trigger a NULL pointer
   dereference in sock_map_link_update_prog (Cong Wang)

 - Fix tcp_bpf_recvmsg_parser to retrieve seq_copied from tcp_sk under
   the socket lock (Jiayuan Chen)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled
  selftests/bpf: Add three test cases for bits_iter
  bpf: Use __u64 to save the bits in bits iterator
  bpf: Check the validity of nr_words in bpf_iter_bits_new()
  bpf: Add bpf_mem_alloc_check_size() helper
  bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
  bpf: disallow 40-bytes extra stack for bpf_fastcall patterns
  selftests/bpf: Add test for trie_get_next_key()
  bpf: Fix out-of-bounds write in trie_get_next_key()
  selftests/bpf: Test with a very short loop
  bpf: Force checkpoint when jmp history is too long
  bpf: fix filed access without lock
  sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()
2024-10-31 14:56:19 -10:00
Linus Torvalds
90602c251c Including fixes from WiFi, bluetooth and netfilter.
No known new regressions outstanding.
 
 Current release - regressions:
 
   - wifi: mt76: do not increase mcu skb refcount if retry is not supported
 
 Current release - new code bugs:
 
   - wifi:
     - rtw88: fix the RX aggregation in USB 3 mode
     - mac80211: fix memory corruption bug in struct ieee80211_chanctx
 
 Previous releases - regressions:
 
   - sched:
     - stop qdisc_tree_reduce_backlog on TC_H_ROOT
     - sch_api: fix xa_insert() error path in tcf_block_get_ext()
 
   - wifi:
     - revert "wifi: iwlwifi: remove retry loops in start"
     - cfg80211: clear wdev->cqm_config pointer on free
 
   - netfilter: fix potential crash in nf_send_reset6()
 
   - ip_tunnel: fix suspicious RCU usage warning in ip_tunnel_find()
 
   - bluetooth: fix null-ptr-deref in hci_read_supported_codecs
 
   - eth: mlxsw: add missing verification before pushing Tx header
 
   - eth: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue
 
 Previous releases - always broken:
 
   - wifi: mac80211: do not pass a stopped vif to the driver in .get_txpower
 
   - netfilter: sanitize offset and length before calling skb_checksum()
 
   - core:
     - fix crash when config small gso_max_size/gso_ipv4_max_size
     - skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
 
   - mptcp: protect sched with rcu_read_lock
 
   - eth: ice: fix crash on probe for DPLL enabled E810 LOM
 
   - eth: macsec: fix use-after-free while sending the offloading packet
 
   - eth: stmmac: fix unbalanced DMA map/unmap for non-paged SKB data
 
   - eth: hns3: fix kernel crash when 1588 is sent on HIP08 devices
 
   - eth: mtk_wed: fix path of MT7988 WO firmware
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmcjfLUSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkONUP/35Vf0++xmZC12pvpL88B5RDqh9vH4Tv
 mYMIBUJNzXQhPoC27gGdY2v4U2ntVfbhFXVyYDJAVl5gwaCZkYufffrsBPqKBFBA
 tQnNpy+A2F+h4rRcTmugYoDdocwCK3qaAjZnF69SJ//6dtahorhOitdMoYbM2Vpj
 nNDWVPiN4pdIUBa+HrDeZ7f+Hou/i5q+mwXTh3/FZrJTWDdMfrFTSM3MMvKv+Fwk
 VoV7QwrR1APVjzgJmYujnil84d4D7etxHIgHFIvASJ5AgSZwnwVYWDfgTAalCD8a
 aoRtDvOZYJfVmRaitAFQd1tRrWn/Sk/QLqUyVfH8rZrGv3n/SEihZ00EtodOzAV4
 31DSdpipdopfht5pFBN1o/VwvAWx2s34uXL1/L8eQWbMLOp4lQoqXoHbQ6yDac2p
 L6ESQH/DY3dMTsKgpkpUm7w4RzutoI3QXpoxlWO2KIwNcawiyVcdKKlKvfFgBQZr
 cGHG/Nzp6P6y9BiX36Rq3I7QKz/GjZN9zPe+3kPX99C2/UoO6St2yPBPLdh+BT2a
 3cqq7ypkxvKtp5EByUjTRQwJZDsD8yY3VWTQN7GYAae0AWJlY8hET05tZEJmwWF8
 TFKdme6lAN4XxNunEVQmUG93kuQRHJkPsN6pRhqGdOv/yUOxJT+meWBVJfMBQCq/
 70L0e6WiIJUe
 =3oi9
 -----END PGP SIGNATURE-----

Merge tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from WiFi, bluetooth and netfilter.

  No known new regressions outstanding.

  Current release - regressions:

   - wifi: mt76: do not increase mcu skb refcount if retry is not
     supported

  Current release - new code bugs:

   - wifi:
      - rtw88: fix the RX aggregation in USB 3 mode
      - mac80211: fix memory corruption bug in struct ieee80211_chanctx

  Previous releases - regressions:

   - sched:
      - stop qdisc_tree_reduce_backlog on TC_H_ROOT
      - sch_api: fix xa_insert() error path in tcf_block_get_ext()

   - wifi:
      - revert "wifi: iwlwifi: remove retry loops in start"
      - cfg80211: clear wdev->cqm_config pointer on free

   - netfilter: fix potential crash in nf_send_reset6()

   - ip_tunnel: fix suspicious RCU usage warning in ip_tunnel_find()

   - bluetooth: fix null-ptr-deref in hci_read_supported_codecs

   - eth: mlxsw: add missing verification before pushing Tx header

   - eth: hns3: fixed hclge_fetch_pf_reg accesses bar space out of
     bounds issue

  Previous releases - always broken:

   - wifi: mac80211: do not pass a stopped vif to the driver in
     .get_txpower

   - netfilter: sanitize offset and length before calling skb_checksum()

   - core:
      - fix crash when config small gso_max_size/gso_ipv4_max_size
      - skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension

   - mptcp: protect sched with rcu_read_lock

   - eth: ice: fix crash on probe for DPLL enabled E810 LOM

   - eth: macsec: fix use-after-free while sending the offloading packet

   - eth: stmmac: fix unbalanced DMA map/unmap for non-paged SKB data

   - eth: hns3: fix kernel crash when 1588 is sent on HIP08 devices

   - eth: mtk_wed: fix path of MT7988 WO firmware"

* tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
  net: hns3: fix kernel crash when 1588 is sent on HIP08 devices
  net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue
  net: hns3: initialize reset_timer before hclgevf_misc_irq_init()
  net: hns3: don't auto enable misc vector
  net: hns3: Resolved the issue that the debugfs query result is inconsistent.
  net: hns3: fix missing features due to dev->features configuration too early
  net: hns3: fixed reset failure issues caused by the incorrect reset type
  net: hns3: add sync command to sync io-pgtable
  net: hns3: default enable tx bounce buffer when smmu enabled
  netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
  net: ethernet: mtk_wed: fix path of MT7988 WO firmware
  selftests: forwarding: Add IPv6 GRE remote change tests
  mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address
  mlxsw: pci: Sync Rx buffers for device
  mlxsw: pci: Sync Rx buffers for CPU
  mlxsw: spectrum_ptp: Add missing verification before pushing Tx header
  net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
  Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs
  netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()
  netfilter: Fix use-after-free in get_info()
  ...
2024-10-31 12:39:58 -10:00
Toke Høiland-Jørgensen
c40dd8c473 bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled
The test_run code detects whether a page has been modified and
re-initialises the xdp_frame structure if it has, using
xdp_update_frame_from_buff(). However, xdp_update_frame_from_buff()
doesn't touch frame->mem, so that wasn't correctly re-initialised, which
led to the pages from page_pool not being returned correctly. Syzbot
noticed this as a memory leak.

Fix this by also copying the frame->mem structure when re-initialising
the frame, like we do on initialisation of a new page from page_pool.

Fixes: e5995bc7e2 ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption")
Fixes: b530e9e106 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Reported-by: syzbot+d121e098da06af416d23@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+d121e098da06af416d23@syzkaller.appspotmail.com
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/bpf/20241030-test-run-mem-fix-v1-1-41e88e8cae43@redhat.com
2024-10-31 16:15:21 +01:00
Paolo Abeni
50ae879de1 netfilter pull request 24-10-31
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmcjVIwACgkQ1w0aZmrP
 KyGfKA/+Nhj8MpYPl/TiaWitL9quL8ExcSr8DBJPlG0w/LzlrdYZqKvgxUfTDHKi
 GYnfgFrHlrNG0E6HiWS8RPzjJjT8of/hrnFU/pMkGh97hQpDLkkoG9/wAScO3NGQ
 c8roORHe3gSH1ZR6ExCy6wdw1aUxGzA7amZULDc+bU64KamGgFoBTSI/YG0bW0FI
 s5waIGtqfo/Wg+uyRv5Ny477aWIrWTjhG0T/64lziPws9rZ40cTfg8N22PVX0Yog
 pojOC8mpMyys5hNu6UBB0pdX5J6ARO1seK0aj1i+XEkJYHb1u9/oySfVoI0yqQ3Y
 9Kt8nc/NvBoOvK2xlLApYIHC8//YCAE94n6JwrZPyb16L93yxhpGO3kZNq6ydJdU
 2tgv23clW3keMvJijICc0bxQdUqwKdQZcLRwkVAvXAWvOCI9x72Y6GeJS4xRnnBF
 RDug6uuLMVoh9UlUZ95HiXsFLzilXtkFHGaG0KjEKdIECgajTBlcwpFyDGktto44
 XVr4cjTQ5amUVqfcG7ycFLsVjk8dYEkYbTE6FMenSAeFIYzo+D8xQnoMKFYE+IXt
 C229Z9FZJvmbduZEoeaMI24CnV4P0HjNPvrTMw8go2iSbi89fLPEscCS/z/eUqQS
 4FB+E93LaCKhSYujQIAB+sZcNlG0Q8+7f10mv8SsNZxJZmPOIeU=
 =RHbi
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for net:

1) Remove unused parameters in conntrack_dump_flush.c used by
   selftests, from Liu Jing.

2) Fix possible UaF when removing xtables module via getsockopt()
   interface, from Dong Chenchen.

3) Fix potential crash in nf_send_reset6() reported by syzkaller.
   From Eric Dumazet

4) Validate offset and length before calling skb_checksum()
   in nft_payload, otherwise hitting BUG() is possible.

netfilter pull request 24-10-31

* tag 'nf-24-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
  netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()
  netfilter: Fix use-after-free in get_info()
  selftests: netfilter: remove unused parameter
====================

Link: https://patch.msgid.link/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-31 12:13:08 +01:00
Paolo Abeni
ee802a4954 bluetooth pull request for net:
- hci: fix null-ptr-deref in hci_read_supported_codecs
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmcigO0ZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKWojEACFiQEIQp/tMqOyaGFDemka
 EQCidCIp5uzKqV2G0FlWDl6i8q152qtdJu1wgKIq7F5fWLTlHXc0rdq9wopn2IPC
 f5KWkUutZizk8TlhN1HkkXnaHzqiWwuK8TYxXxVkC1y/E4eBvwhNdyATAbMQFJdy
 HTo0jO6OK8CDKnNCEXpMdd53Iov33KSpYE/OpfAzVPvspOk96bx34RSjVpcaX8Qa
 IoLZ1QS4crK/ETbAW6Ysy5ZyqpQphvl/O+0Fw7baOvj2iziX6K1k5sE18QbrlqFB
 N+DIgMC6sBpogv2OD13eGQWvSX51G2V60Fk9fHQEJo4BDoiE2oRtDZlWkxFl64VK
 kpzwswW/tUEw2xYFt/pgCM/QZ8KXh8oVHwo4HTP45TbpIP3nJcz0xuT7ChVzy2mF
 T3JQLR7W7w89OOsoaOezk+MS1UdL/tB4RsMG5Edb1SBS48fGI0yqIi/O58iqWeTu
 SfjacEjAizlzPgLxoUeVFqYzHOFJa89BsFzTU3rs+9z8+DBwfW44J10EP9i2tDhP
 lqseqLErItx47gdOy7HgWpjRmOraa36syjvL8bRXcbnhBVT4RjmJDWgM+NcIOkXB
 T0Y4I+u0ppoCxvHES/tCQ19C9Uj65XfPSZxzKDu76ecEqR6uMHaXoO196SWj1VSB
 Gqr/qDbdW9yvsCHCIYcwGQ==
 =mno/
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - hci: fix null-ptr-deref in hci_read_supported_codecs

* tag 'for-net-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs
====================

Link: https://patch.msgid.link/20241030192205.38298-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-31 11:32:57 +01:00
Pablo Neira Ayuso
d5953d680f netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
If access to offset + length is larger than the skbuff length, then
skb_checksum() triggers BUG_ON().

skb_checksum() internally subtracts the length parameter while iterating
over skbuff, BUG_ON(len) at the end of it checks that the expected
length to be included in the checksum calculation is fully consumed.

Fixes: 7ec3f7b47b ("netfilter: nft_payload: add packet mangling support")
Reported-by: Slavin Liu <slavin-ayu@qq.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-31 10:54:49 +01:00
Christophe JAILLET
bd03e7627c rtnetlink: Fix an error handling path in rtnl_newlink()
When some code has been moved in the commit in Fixes, some "return err;"
have correctly been changed in goto <some_where_in_the_error_handling_path>
but this one was missed.

Should "ops->maxtype > RTNL_MAX_TYPE" happen, then some resources would
leak.

Go through the error handling path to fix these leaks.

Fixes: 0d3008d1a9 ("rtnetlink: Move ops->validate to rtnl_newlink().")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/eca90eeb4d9e9a0545772b68aeaab883d9fe2279.1729952228.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30 18:26:58 -07:00
Benoît Monin
04c20a9356 net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
As documented in skbuff.h, devices with NETIF_F_IPV6_CSUM capability
can only checksum TCP and UDP over IPv6 if the IP header does not
contains extension.

This is enforced for UDP packets emitted from user-space to an IPv6
address as they go through ip6_make_skb(), which calls
__ip6_append_data() where a check is done on the header size before
setting CHECKSUM_PARTIAL.

But the introduction of UDP encapsulation with fou6 added a code-path
where it is possible to get an skb with a partial UDP checksum and an
IPv6 header with extension:
* fou6 adds a UDP header with a partial checksum if the inner packet
does not contains a valid checksum.
* ip6_tunnel adds an IPv6 header with a destination option extension
header if encap_limit is non-zero (the default value is 4).

The thread linked below describes in more details how to reproduce the
problem with GRE-in-UDP tunnel.

Add a check on the network header size in skb_csum_hwoffload_help() to
make sure no IPv6 packet with extension header is handed to a network
device with NETIF_F_IPV6_CSUM capability.

Link: https://lore.kernel.org/netdev/26548921.1r3eYUQgxm@benoit.monin/T/#u
Fixes: aa3463d65e ("fou: Add encap ops for IPv6 tunnels")
Signed-off-by: Benoît Monin <benoit.monin@gmx.fr>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/5fbeecfc311ea182aa1d1c771725ab8b4cac515e.1729778144.git.benoit.monin@gmx.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30 18:21:30 -07:00
Vladimir Oltean
3535d70df9 net: dsa: allow matchall mirroring rules towards the CPU
If the CPU bandwidth capacity permits, it may be useful to mirror the
entire ingress of a user port to software.

This is in fact possible to express even if there is no net_device
representation for the CPU port. In fact, that approach was already
exhausted and that representation wouldn't have even helped [1].

The idea behind implementing this is that currently, we refuse to
offload any mirroring towards a non-DSA target net_device. But if we
acknowledge the fact that to reach any foreign net_device, the switch
must send the packet to the CPU anyway, then we can simply offload just
that part, and let the software do the rest. There is only one condition
we need to uphold: the filter needs to be present in the software data
path as well (no skip_sw).

There are 2 actions to consider: FLOW_ACTION_MIRRED (redirect to egress
of target interface) and FLOW_ACTION_MIRRED_INGRESS (redirect to ingress
of target interface). We don't have the ability/API to offload
FLOW_ACTION_MIRRED_INGRESS when the target port is also a DSA user port,
but we could also permit that through mirred to the CPU + software.

Example:

$ ip link add dummy0 type dummy; ip link set dummy0 up
$ tc qdisc add dev swp0 clsact
$ tc filter add dev swp0 ingress matchall action mirred ingress mirror dev dummy0

Any DSA driver with a ds->ops->port_mirror_add() implementation can now
make use of this with no additional change.

[1] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30 17:33:54 -07:00
Vladimir Oltean
4cc4394a89 net: dsa: add more extack messages in dsa_user_add_cls_matchall_mirred()
Do not leave -EOPNOTSUPP errors without an explanation. It is confusing
for the user to figure out what is wrong otherwise.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30 17:33:54 -07:00
Vladimir Oltean
c11ace14d9 net: dsa: use "extack" as argument to flow_action_basic_hw_stats_check()
We already have an "extack" stack variable in
dsa_user_add_cls_matchall_police() and
dsa_user_add_cls_matchall_mirred(), there is no need to retrieve
it again from cls->common.extack.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30 17:33:54 -07:00
Vladimir Oltean
a0af7162cc net: dsa: clean up dsa_user_add_cls_matchall()
The body is a bit hard to read, hard to extend, and has duplicated
conditions.

Clean up the "if (many conditions) else if (many conditions, some
of them repeated)" pattern by:

- Moving the repeated conditions out
- Replacing the repeated tests for the same variable with a switch/case
- Moving the protocol check inside the dsa_user_add_cls_matchall_mirred()
  function call.

This is pure refactoring, no logic has been changed, though some tests
were reordered. The order does not matter - they are independent things
to be tested for.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241023135251.1752488-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30 17:33:53 -07:00
Chuck Lever
63a81588cd rpcrdma: Always release the rpcrdma_device's xa_array
Dai pointed out that the xa_init_flags() in rpcrdma_add_one() needs
to have a matching xa_destroy() in rpcrdma_remove_one() to release
underlying memory that the xarray might have accrued during
operation.

Reported-by: Dai Ngo <dai.ngo@oracle.com>
Fixes: 7e86845a03 ("rpcrdma: Implement generic device removal")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-10-30 16:14:00 -04:00
Sungwoo Kim
1e67d86418 Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs
Fix __hci_cmd_sync_sk() to return not NULL for unknown opcodes.

__hci_cmd_sync_sk() returns NULL if a command returns a status event.
However, it also returns NULL where an opcode doesn't exist in the
hci_cc table because hci_cmd_complete_evt() assumes status = skb->data[0]
for unknown opcodes.
This leads to null-ptr-deref in cmd_sync for HCI_OP_READ_LOCAL_CODECS as
there is no hci_cc for HCI_OP_READ_LOCAL_CODECS, which always assumes
status = skb->data[0].

KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
CPU: 1 PID: 2000 Comm: kworker/u9:5 Not tainted 6.9.0-ga6bcb805883c-dirty #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: hci7 hci_power_on
RIP: 0010:hci_read_supported_codecs+0xb9/0x870 net/bluetooth/hci_codec.c:138
Code: 08 48 89 ef e8 b8 c1 8f fd 48 8b 75 00 e9 96 00 00 00 49 89 c6 48 ba 00 00 00 00 00 fc ff df 4c 8d 60 70 4c 89 e3 48 c1 eb 03 <0f> b6 04 13 84 c0 0f 85 82 06 00 00 41 83 3c 24 02 77 0a e8 bf 78
RSP: 0018:ffff888120bafac8 EFLAGS: 00010212
RAX: 0000000000000000 RBX: 000000000000000e RCX: ffff8881173f0040
RDX: dffffc0000000000 RSI: ffffffffa58496c0 RDI: ffff88810b9ad1e4
RBP: ffff88810b9ac000 R08: ffffffffa77882a7 R09: 1ffffffff4ef1054
R10: dffffc0000000000 R11: fffffbfff4ef1055 R12: 0000000000000070
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88810b9ac000
FS:  0000000000000000(0000) GS:ffff8881f6c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6ddaa3439e CR3: 0000000139764003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 hci_read_local_codecs_sync net/bluetooth/hci_sync.c:4546 [inline]
 hci_init_stage_sync net/bluetooth/hci_sync.c:3441 [inline]
 hci_init4_sync net/bluetooth/hci_sync.c:4706 [inline]
 hci_init_sync net/bluetooth/hci_sync.c:4742 [inline]
 hci_dev_init_sync net/bluetooth/hci_sync.c:4912 [inline]
 hci_dev_open_sync+0x19a9/0x2d30 net/bluetooth/hci_sync.c:4994
 hci_dev_do_open net/bluetooth/hci_core.c:483 [inline]
 hci_power_on+0x11e/0x560 net/bluetooth/hci_core.c:1015
 process_one_work kernel/workqueue.c:3267 [inline]
 process_scheduled_works+0x8ef/0x14f0 kernel/workqueue.c:3348
 worker_thread+0x91f/0xe50 kernel/workqueue.c:3429
 kthread+0x2cb/0x360 kernel/kthread.c:388
 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Fixes: abfeea476c ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY")

Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-30 14:49:09 -04:00
Easwar Hariharan
b35108a51c jiffies: Define secs_to_jiffies()
secs_to_jiffies() is defined in hci_event.c and cannot be reused by
other call sites. Hoist it into the core code to allow conversion of the
~1150 usages of msecs_to_jiffies() that either:

 - use a multiplier value of 1000 or equivalently MSEC_PER_SEC, or
 - have timeouts that are denominated in seconds (i.e. end in 000)

It's implemented as a macro to allow usage in static initializers.

This will also allow conversion of yet more sites that use (sec * HZ)
directly, and improve their readability.

Suggested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Link: https://lore.kernel.org/all/20241030-open-coded-timeouts-v3-1-9ba123facf88@linux.microsoft.com
2024-10-30 19:47:20 +01:00
Puranjay Mohan
6a4794d5a3 bpf: bpf_csum_diff: Optimize and homogenize for all archs
1. Optimization
   ------------

The current implementation copies the 'from' and 'to' buffers to a
scratchpad and it takes the bitwise NOT of 'from' buffer while copying.
In the next step csum_partial() is called with this scratchpad.

so, mathematically, the current implementation is doing:

	result = csum(to - from)

Here, 'to'  and '~ from' are copied in to the scratchpad buffer, we need
it in the scratchpad buffer because csum_partial() takes a single
contiguous buffer and not two disjoint buffers like 'to' and 'from'.

We can re write this equation to:

	result = csum(to) - csum(from)

using the distributive property of csum().

this allows 'to' and 'from' to be at different locations and therefore
this scratchpad and copying is not needed.

This in C code will look like:

result = csum_sub(csum_partial(to, to_size, seed),
                  csum_partial(from, from_size, 0));

2. Homogenization
   --------------

The bpf_csum_diff() helper calls csum_partial() which is implemented by
some architectures like arm and x86 but other architectures rely on the
generic implementation in lib/checksum.c

The generic implementation in lib/checksum.c returns a 16 bit value but
the arch specific implementations can return more than 16 bits, this
works out in most places because before the result is used, it is passed
through csum_fold() that turns it into a 16-bit value.

bpf_csum_diff() directly returns the value from csum_partial() and
therefore the returned values could be different on different
architectures. see discussion in [1]:

for the int value 28 the calculated checksums are:

x86                    :    -29 : 0xffffffe3
generic (arm64, riscv) :  65507 : 0x0000ffe3
arm                    : 131042 : 0x0001ffe2

Pass the result of bpf_csum_diff() through from32to16() before returning
to homogenize this result for all architectures.

NOTE: from32to16() is used instead of csum_fold() because csum_fold()
does from32to16() + bitwise NOT of the result, which is not what we want
to do here.

[1] https://lore.kernel.org/bpf/CAJ+HfNiQbOcqCLxFUP2FMm5QrLXUUaj852Fxe3hn_2JNiucn6g@mail.gmail.com/

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241026125339.26459-3-puranjay@kernel.org
2024-10-30 15:29:59 +01:00
Jason Xing
668d663989 tcp: add more warn of socket in tcp_send_loss_probe()
Add two fields to print in the helper which here covers tcp_send_loss_probe().

Link: https://lore.kernel.org/all/5632e043-bdba-4d75-bc7e-bf58014492fd@redhat.com/
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Cc: Neal Cardwell <ncardwell@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-30 13:26:55 +00:00
Eric Dumazet
4ed234fe79 netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()
I got a syzbot report without a repro [1] crashing in nf_send_reset6()

I think the issue is that dev->hard_header_len is zero, and we attempt
later to push an Ethernet header.

Use LL_MAX_HEADER, as other functions in net/ipv6/netfilter/nf_reject_ipv6.c.

[1]

skbuff: skb_under_panic: text:ffffffff89b1d008 len:74 put:14 head:ffff88803123aa00 data:ffff88803123a9f2 tail:0x3c end:0x140 dev:syz_tun
 kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 UID: 0 PID: 7373 Comm: syz.1.568 Not tainted 6.12.0-rc2-syzkaller-00631-g6d858708d465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
 RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
 RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
Code: 0d 8d 48 c7 c6 60 a6 29 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 ba 30 38 02 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:ffffc900045269b0 EFLAGS: 00010282
RAX: 0000000000000088 RBX: dffffc0000000000 RCX: cd66dacdc5d8e800
RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000000
RBP: ffff88802d39a3d0 R08: ffffffff8174afec R09: 1ffff920008a4ccc
R10: dffffc0000000000 R11: fffff520008a4ccd R12: 0000000000000140
R13: ffff88803123aa00 R14: ffff88803123a9f2 R15: 000000000000003c
FS:  00007fdbee5ff6c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000005d322000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  skb_push+0xe5/0x100 net/core/skbuff.c:2636
  eth_header+0x38/0x1f0 net/ethernet/eth.c:83
  dev_hard_header include/linux/netdevice.h:3208 [inline]
  nf_send_reset6+0xce6/0x1270 net/ipv6/netfilter/nf_reject_ipv6.c:358
  nft_reject_inet_eval+0x3b9/0x690 net/netfilter/nft_reject_inet.c:48
  expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
  nft_do_chain+0x4ad/0x1da0 net/netfilter/nf_tables_core.c:288
  nft_do_chain_inet+0x418/0x6b0 net/netfilter/nft_chain_filter.c:161
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626
  nf_hook include/linux/netfilter.h:269 [inline]
  NF_HOOK include/linux/netfilter.h:312 [inline]
  br_nf_pre_routing_ipv6+0x63e/0x770 net/bridge/br_netfilter_ipv6.c:184
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_bridge_pre net/bridge/br_input.c:277 [inline]
  br_handle_frame+0x9fd/0x1530 net/bridge/br_input.c:424
  __netif_receive_skb_core+0x13e8/0x4570 net/core/dev.c:5562
  __netif_receive_skb_one_core net/core/dev.c:5666 [inline]
  __netif_receive_skb+0x12f/0x650 net/core/dev.c:5781
  netif_receive_skb_internal net/core/dev.c:5867 [inline]
  netif_receive_skb+0x1e8/0x890 net/core/dev.c:5926
  tun_rx_batched+0x1b7/0x8f0 drivers/net/tun.c:1550
  tun_get_user+0x3056/0x47e0 drivers/net/tun.c:2007
  tun_chr_write_iter+0x10d/0x1f0 drivers/net/tun.c:2053
  new_sync_write fs/read_write.c:590 [inline]
  vfs_write+0xa6d/0xc90 fs/read_write.c:683
  ksys_write+0x183/0x2b0 fs/read_write.c:736
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fdbeeb7d1ff
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 c9 8d 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 1c 8e 02 00 48
RSP: 002b:00007fdbee5ff000 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fdbeed36058 RCX: 00007fdbeeb7d1ff
RDX: 000000000000008e RSI: 0000000020000040 RDI: 00000000000000c8
RBP: 00007fdbeebf12be R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000008e R11: 0000000000000293 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fdbeed36058 R15: 00007ffc38de06e8
 </TASK>

Fixes: c8d7b98bec ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-30 13:17:36 +01:00
Dong Chenchen
f48d258f0a netfilter: Fix use-after-free in get_info()
ip6table_nat module unload has refcnt warning for UAF. call trace is:

WARNING: CPU: 1 PID: 379 at kernel/module/main.c:853 module_put+0x6f/0x80
Modules linked in: ip6table_nat(-)
CPU: 1 UID: 0 PID: 379 Comm: ip6tables Not tainted 6.12.0-rc4-00047-gc2ee9f594da8-dirty #205
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:module_put+0x6f/0x80
Call Trace:
 <TASK>
 get_info+0x128/0x180
 do_ip6t_get_ctl+0x6a/0x430
 nf_getsockopt+0x46/0x80
 ipv6_getsockopt+0xb9/0x100
 rawv6_getsockopt+0x42/0x190
 do_sock_getsockopt+0xaa/0x180
 __sys_getsockopt+0x70/0xc0
 __x64_sys_getsockopt+0x20/0x30
 do_syscall_64+0xa2/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Concurrent execution of module unload and get_info() trigered the warning.
The root cause is as follows:

cpu0				      cpu1
module_exit
//mod->state = MODULE_STATE_GOING
  ip6table_nat_exit
    xt_unregister_template
	kfree(t)
	//removed from templ_list
				      getinfo()
					  t = xt_find_table_lock
						list_for_each_entry(tmpl, &xt_templates[af]...)
							if (strcmp(tmpl->name, name))
								continue;  //table not found
							try_module_get
						list_for_each_entry(t, &xt_net->tables[af]...)
							return t;  //not get refcnt
					  module_put(t->me) //uaf
    unregister_pernet_subsys
    //remove table from xt_net list

While xt_table module was going away and has been removed from
xt_templates list, we couldnt get refcnt of xt_table->me. Check
module in xt_net->tables list re-traversal to fix it.

Fixes: fdacd57c79 ("netfilter: x_tables: never register tables by default")
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-30 13:17:36 +01:00
Jakub Kicinski
c05c62850a wireless fixes for v6.12-rc6
Another set of fixes, mostly iwlwifi:
  * fix infinite loop in 6 GHz scan if more than
    255 colocated APs were reported
  * revert removal of retry loops for now to work
    around issues with firmware initialization on
    some devices/platforms
  * fix SAR table issues with some BIOSes
  * fix race in suspend/debug collection
  * fix memory leak in fw recovery
  * fix link ID leak in AP mode for older devices
  * fix sending TX power constraints
  * fix link handling in FW restart
 
 And also the stack:
  * fix setting TX power from userspace with the new
    chanctx emulation code for old-style drivers
  * fix a memory corruption bug due to structure
    embedding
  * fix CQM configuration double-free when moving
    between net namespaces
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmcgnyMACgkQ10qiO8sP
 aAApKhAAgnQgnDXmtFYyP6qKr5tkHnY+4VcLOOUI13lGAuhIJwfMqct4204TUS+4
 /3qBajSsr3n/pKjI/RQ9LtRdUHoZS+O6zVmSjAEFq8JU+Z8cZKo4teY8SfXY9pXm
 I/6X93o9hMCeL1+ebr4qgZVJFW9jdQnM9xJgR9lwNDViJUQT0Bn6Xfy2Crvkl2hn
 cnr1wU3lH261vf22X8ON+giWToyCvsQmKExGQzenSLJ1UK8bwcu9akWA7GZsjFYK
 Skp9q95EEFJ9V2qGAHWCivD3//rJlsXZPfwgT2QhHvvMawrb/l9pva9fk543y8Xk
 9IpLa8WEiLivYk7lW4qzykrfWhNM+ubdOJVUIl23qbpI8aP/ZU3QHzIn9ZOGrBVJ
 yvu647OC0P/xNMHJ7jhOCAxmduqwsxKsRcnES1ZwWdQWAmHO97bSTh95+YfLsAwD
 6l/PQVzeUcypeYG9rXnBkA/k8obFh8dJnIU4d96rxJdrtF7ixRiedfvx9V9tGEx4
 dWu+4KVkh2W7iSXAWI/cammpU06ptzFNcjEYIGs/7xttu/riXH4fNcgm2wl+oPe7
 SMaBwbGf2nTXtrqLPaVSkyM16MtSVHh9iqzYH//N357PCd1Kr9L+5xUKx1hfJ7X+
 7Z9+h9azBCQXeXz2AuWQWWynZPTSrdVLYuMNsdLJ17HAerDZJc8=
 =4M1T
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2024-10-29' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
wireless fixes for v6.12-rc6

Another set of fixes, mostly iwlwifi:
 * fix infinite loop in 6 GHz scan if more than
   255 colocated APs were reported
 * revert removal of retry loops for now to work
   around issues with firmware initialization on
   some devices/platforms
 * fix SAR table issues with some BIOSes
 * fix race in suspend/debug collection
 * fix memory leak in fw recovery
 * fix link ID leak in AP mode for older devices
 * fix sending TX power constraints
 * fix link handling in FW restart

And also the stack:
 * fix setting TX power from userspace with the new
   chanctx emulation code for old-style drivers
 * fix a memory corruption bug due to structure
   embedding
 * fix CQM configuration double-free when moving
   between net namespaces

* tag 'wireless-2024-10-29' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mac80211: ieee80211_i: Fix memory corruption bug in struct ieee80211_chanctx
  wifi: iwlwifi: mvm: fix 6 GHz scan construction
  wifi: cfg80211: clear wdev->cqm_config pointer on free
  mac80211: fix user-power when emulating chanctx
  Revert "wifi: iwlwifi: remove retry loops in start"
  wifi: iwlwifi: mvm: don't add default link in fw restart flow
  wifi: iwlwifi: mvm: Fix response handling in iwl_mvm_send_recovery_cmd()
  wifi: iwlwifi: mvm: SAR table alignment
  wifi: iwlwifi: mvm: Use the sync timepoint API in suspend
  wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd
  wifi: iwlwifi: mvm: don't leak a link on AP removal
====================

Link: https://patch.msgid.link/20241029093926.13750-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 18:57:12 -07:00
Jakub Kicinski
71e0ad3451 wireless-next patches for v6.13
The first -next "new features" pull request for v6.13. This is a big
 one as we have not been able to send one earlier. We have also some
 patches affecting other subsystems: in staging we deleted the rtl8192e
 driver and in debugfs added a new interface to save struct
 file_operations memory; both were acked by GregKH.
 
 Because of the lib80211/libipw move there were quite a lot of
 conflicts and to solve those we decided to merge net-next into
 wireless-next.
 
 Currently there's one conflict in
 Documentation/networking/net_cachelines/net_device.rst. To fix that
 just remove the iw_public_data line:
 
 https://lore.kernel.org/all/20241011121014.674661a0@canb.auug.org.au/
 
 And when net is merged to net-next there will be another simple
 conflict in in net/mac80211/cfg.c:
 
 https://lore.kernel.org/all/20241024115523.4cd35dde@canb.auug.org.au/
 
 Major changes:
 
 cfg80211/mac80211
 
 * stop exporting wext symbols
 
 * new mac80211 op to indicate that a new interface is to be added
 
 * support radio separation of multi-band devices
 
 Wireless Extensions
 
 * move wext spy implementation to libiw
 
 * remove iw_public_data from struct net_device
 
 brcmfmac
 
 * optional LPO clock support
 
 ipw2x00
 
 * move remaining lib80211 code into libiw
 
 wilc1000
 
 * WILC3000 support
 
 rtw89
 
 * RTL8852BE and RTL8852BE-VT BT-coexistence improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmcbz9YRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZsabQf8CWJ/kyonw/Z8hRxgfE/7D6Jiqoq7R+ML
 8W8lbc6F5wra4eCBq/oo6UVV36Ss6mxQYcRcmLq+nCkXa4qdMpg/z55QECMHxx5Z
 YnIBbD2vBrIj7W21gfCKH1WJ+b5IQFZl3zuxuCgXjxD9TJM2CjUfOkvrhrqqzrPn
 clfUx5f01vfv2jdvClPR5977gFE5One/ANeRQNs7uDS0TeeD2P+61DEB1//htIJo
 7GwwCyUJCeOcfWRMzQwhpoppWKcPAV70kSVJrl/fRstS68vQGSQbcx9yiNeWkSFw
 JXjQGdc8eYLPzLqECwS0KwFkta6AXbafAYYXe1wdlAzr+kmJ9x5oqA==
 =x+mr
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.13

The first -next "new features" pull request for v6.13. This is a big
one as we have not been able to send one earlier. We have also some
patches affecting other subsystems: in staging we deleted the rtl8192e
driver and in debugfs added a new interface to save struct
file_operations memory; both were acked by GregKH.

Because of the lib80211/libipw move there were quite a lot of
conflicts and to solve those we decided to merge net-next into
wireless-next.

Major changes:

cfg80211/mac80211
 * stop exporting wext symbols
 * new mac80211 op to indicate that a new interface is to be added
 * support radio separation of multi-band devices

Wireless Extensions
 * move wext spy implementation to libiw
 * remove iw_public_data from struct net_device

brcmfmac
 * optional LPO clock support

ipw2x00
 * move remaining lib80211 code into libiw

wilc1000
 * WILC3000 support

rtw89
 * RTL8852BE and RTL8852BE-VT BT-coexistence improvements

* tag 'wireless-next-2024-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (126 commits)
  mac80211: Remove NOP call to ieee80211_hw_config
  wifi: iwlwifi: work around -Wenum-compare-conditional warning
  wifi: mac80211: re-order assigning channel in activate links
  wifi: mac80211: convert debugfs files to short fops
  debugfs: add small file operations for most files
  wifi: mac80211: remove misleading j_0 construction parts
  wifi: mac80211_hwsim: use hrtimer_active()
  wifi: mac80211: refactor BW limitation check for CSA parsing
  wifi: mac80211: filter on monitor interfaces based on configured channel
  wifi: mac80211: refactor ieee80211_rx_monitor
  wifi: mac80211: add support for the monitor SKIP_TX flag
  wifi: cfg80211: add monitor SKIP_TX flag
  wifi: mac80211: add flag to opt out of virtual monitor support
  wifi: cfg80211: pass net_device to .set_monitor_channel
  wifi: mac80211: remove status->ampdu_delimiter_crc
  wifi: cfg80211: report per wiphy radio antenna mask
  wifi: mac80211: use vif radio mask to limit creating chanctx
  wifi: mac80211: use vif radio mask to limit ibss scan frequencies
  wifi: cfg80211: add option for vif allowed radios
  wifi: iwlwifi: allow IWL_FW_CHECK() with just a string
  ...

====================

Link: https://patch.msgid.link/20241025170705.5F6B2C4CEC3@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 18:50:58 -07:00
Przemek Kitszel
e3302f9a50 devlink: remove unused devlink_resource_register()
Remove unused devlink_resource_register(); all the drivers use
devl_resource_register() variant instead.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-8-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 16:52:57 -07:00
Przemek Kitszel
2a0df10434 devlink: remove unused devlink_resource_occ_get_register() and _unregister()
Remove not used devlink_resource_occ_get_register() and
devlink_resource_occ_get_unregister() functions; current devlink resource
users are fine with devl_ variants of the two.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-7-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 16:52:57 -07:00
Przemek Kitszel
d5020cb41e net: dsa: replace devlink resource registration calls by devl_ variants
Replace devlink_resource_register(), devlink_resource_occ_get_register(),
and devlink_resource_occ_get_unregister() calls by respective devl_*
variants. Mentioned functions have no direct users in any drivers, and are
going to be removed in subsequent patches.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-6-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 16:52:57 -07:00
Przemek Kitszel
72429e9e0c devlink: region: snapshot IDs: consolidate error values
Consolidate error codes for too big message size.

Current code is written to return -EINVAL when tailroom in the skb msg
would be exhausted precisely when it's time to nest, and return -EMSGSIZE
in all other "not enough space" conditions.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-5-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 16:52:57 -07:00
Przemek Kitszel
e0b140c44f devlink: devl_resource_register(): differentiate error codes
Differentiate error codes of devl_resource_register().

Replace one of -EINVAL exit paths by -EEXIST. This should aid developers
introducing new resources and registering them in the wrong order.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-4-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 16:52:57 -07:00
Przemek Kitszel
a788acf154 devlink: use devlink_nl_put_u64() helper
Use devlink_nl_put_u64() shortcut added by prev commit on all devlink/.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-3-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 16:52:56 -07:00
Przemek Kitszel
da3ee3cd79 devlink: introduce devlink_nl_put_u64()
Add devlink_nl_put_u64() that abstracts padding for u64 values.
All u64 values are passed with the very same padding option.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20241023131248.27192-2-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 16:52:56 -07:00
Kuniyuki Iwashima
4bbd360a50 socket: Print pf->create() when it does not clear sock->sk on failure.
I suggested to put DEBUG_NET_WARN_ON_ONCE() in __sock_create() to
catch possible use-after-free.

But the warning itself was not useful because our interest is in
the callee than the caller.

Let's define DEBUG_NET_WARN_ONCE() and print the name of pf->create()
and the socket identifier.

While at it, we enclose DEBUG_NET_WARN_ON_ONCE() in parentheses too
to avoid a checkpatch error.

Note that %pf or %pF were obsoleted and will be removed later as per
comment in lib/vsprintf.c.

Link: https://lore.kernel.org/netdev/202410231427.633734b3-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241024201458.49412-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 16:31:23 -07:00
Wang Liang
9ab5cf19fb net: fix crash when config small gso_max_size/gso_ipv4_max_size
Config a small gso_max_size/gso_ipv4_max_size will lead to an underflow
in sk_dst_gso_max_size(), which may trigger a BUG_ON crash,
because sk->sk_gso_max_size would be much bigger than device limits.
Call Trace:
tcp_write_xmit
    tso_segs = tcp_init_tso_segs(skb, mss_now);
        tcp_set_skb_tso_segs
            tcp_skb_pcount_set
                // skb->len = 524288, mss_now = 8
                // u16 tso_segs = 524288/8 = 65535 -> 0
                tso_segs = DIV_ROUND_UP(skb->len, mss_now)
    BUG_ON(!tso_segs)
Add check for the minimum value of gso_max_size and gso_ipv4_max_size.

Fixes: 46e6b992c2 ("rtnetlink: allow GSO maximums to be set on device creation")
Fixes: 9eefedd58a ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241023035213.517386-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 16:18:23 -07:00
Vladimir Oltean
a13e690191 net/sched: sch_api: fix xa_insert() error path in tcf_block_get_ext()
This command:

$ tc qdisc replace dev eth0 ingress_block 1 egress_block 1 clsact
Error: block dev insert failed: -EBUSY.

fails because user space requests the same block index to be set for
both ingress and egress.

[ side note, I don't think it even failed prior to commit 913b47d342
  ("net/sched: Introduce tc block netdev tracking infra"), because this
  is a command from an old set of notes of mine which used to work, but
  alas, I did not scientifically bisect this ]

The problem is not that it fails, but rather, that the second time
around, it fails differently (and irrecoverably):

$ tc qdisc replace dev eth0 ingress_block 1 egress_block 1 clsact
Error: dsa_core: Flow block cb is busy.

[ another note: the extack is added by me for illustration purposes.
  the context of the problem is that clsact_init() obtains the same
  &q->ingress_block pointer as &q->egress_block, and since we call
  tcf_block_get_ext() on both of them, "dev" will be added to the
  block->ports xarray twice, thus failing the operation: once through
  the ingress block pointer, and once again through the egress block
  pointer. the problem itself is that when xa_insert() fails, we have
  emitted a FLOW_BLOCK_BIND command through ndo_setup_tc(), but the
  offload never sees a corresponding FLOW_BLOCK_UNBIND. ]

Even correcting the bad user input, we still cannot recover:

$ tc qdisc replace dev swp3 ingress_block 1 egress_block 2 clsact
Error: dsa_core: Flow block cb is busy.

Basically the only way to recover is to reboot the system, or unbind and
rebind the net device driver.

To fix the bug, we need to fill the correct error teardown path which
was missed during code movement, and call tcf_block_offload_unbind()
when xa_insert() fails.

[ last note, fundamentally I blame the label naming convention in
  tcf_block_get_ext() for the bug. The labels should be named after what
  they do, not after the error path that jumps to them. This way, it is
  obviously wrong that two labels pointing to the same code mean
  something is wrong, and checking the code correctness at the goto site
  is also easier ]

Fixes: 94e2557d08 ("net: sched: move block device tracking into tcf_block_get/put_ext()")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20241023100541.974362-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:45:23 -07:00
Kuniyuki Iwashima
bdd85ddce5 rtnetlink: Fix kdoc of rtnl_af_register().
Commit 26eebdc4b0 ("rtnetlink: Return int from rtnl_af_register().")
made rtnl_af_register() return int again, and kdoc needs to be fixed up.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241022210320.86111-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:35:20 -07:00
Pedro Tammela
2e95c43844 net/sched: stop qdisc_tree_reduce_backlog on TC_H_ROOT
In qdisc_tree_reduce_backlog, Qdiscs with major handle ffff: are assumed
to be either root or ingress. This assumption is bogus since it's valid
to create egress qdiscs with major handle ffff:
Budimir Markovic found that for qdiscs like DRR that maintain an active
class list, it will cause a UAF with a dangling class pointer.

In 066a3b5b23, the concern was to avoid iterating over the ingress
qdisc since its parent is itself. The proper fix is to stop when parent
TC_H_ROOT is reached because the only way to retrieve ingress is when a
hierarchy which does not contain a ffff: major handle call into
qdisc_lookup with TC_H_MAJ(TC_H_ROOT).

In the scenario where major ffff: is an egress qdisc in any of the tree
levels, the updates will also propagate to TC_H_ROOT, which then the
iteration must stop.

Fixes: 066a3b5b23 ("[NET_SCHED] sch_api: fix qdisc_tree_decrease_qlen() loop")
Reported-by: Budimir Markovic <markovicbudimir@gmail.com>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

 net/sched/sch_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Simon Horman <horms@kernel.org>

Link: https://patch.msgid.link/20241024165547.418570-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:32:26 -07:00
Guillaume Nault
85ef52e869 ipv4: Prepare ip_rt_get_source() to future .flowi4_tos conversion.
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/0a13a200f31809841975e38633914af1061e0c04.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:21:06 -07:00
Guillaume Nault
6ab04392dd ipv4: Prepare ipmr_rt_fib_lookup() to future .flowi4_tos conversion.
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/462402a097260357a7aba80228612305f230b6a9.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:21:05 -07:00
Guillaume Nault
0ed373390c ipv4: Prepare icmp_reply() to future .flowi4_tos conversion.
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/61b7563563f8b0a562b5b62032fe5260034d0aac.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:21:05 -07:00
Guillaume Nault
b76ebf22c5 ipv4: Prepare fib_compute_spec_dst() to future .flowi4_tos conversion.
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/a0eba69cce94f747e4c7516184a85ffd0abbe3f0.1729530028.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:21:05 -07:00
Ido Schimmel
90e0569dd3 ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_find()
The per-netns IP tunnel hash table is protected by the RTNL mutex and
ip_tunnel_find() is only called from the control path where the mutex is
taken.

Add a lockdep expression to hlist_for_each_entry_rcu() in
ip_tunnel_find() in order to validate that the mutex is held and to
silence the suspicious RCU usage warning [1].

[1]
WARNING: suspicious RCU usage
6.12.0-rc3-custom-gd95d9a31aceb #139 Not tainted
-----------------------------
net/ipv4/ip_tunnel.c:221 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/362:
 #0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60

stack backtrace:
CPU: 12 UID: 0 PID: 362 Comm: ip Not tainted 6.12.0-rc3-custom-gd95d9a31aceb #139
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0xba/0x110
 lockdep_rcu_suspicious.cold+0x4f/0xd6
 ip_tunnel_find+0x435/0x4d0
 ip_tunnel_newlink+0x517/0x7a0
 ipgre_newlink+0x14c/0x170
 __rtnl_newlink+0x1173/0x19c0
 rtnl_newlink+0x6c/0xa0
 rtnetlink_rcv_msg+0x3cc/0xf60
 netlink_rcv_skb+0x171/0x450
 netlink_unicast+0x539/0x7f0
 netlink_sendmsg+0x8c1/0xd80
 ____sys_sendmsg+0x8f9/0xc20
 ___sys_sendmsg+0x197/0x1e0
 __sys_sendmsg+0x122/0x1f0
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: c544193214 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241023123009.749764-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 11:12:53 -07:00
Jiayuan Chen
a32aee8f0d bpf: fix filed access without lock
The tcp_bpf_recvmsg_parser() function, running in user context,
retrieves seq_copied from tcp_sk without holding the socket lock, and
stores it in a local variable seq. However, the softirq context can
modify tcp_sk->seq_copied concurrently, for example, n tcp_read_sock().

As a result, the seq value is stale when it is assigned back to
tcp_sk->copied_seq at the end of tcp_bpf_recvmsg_parser(), leading to
incorrect behavior.

Due to concurrency, the copied_seq field in tcp_bpf_recvmsg_parser()
might be set to an incorrect value (less than the actual copied_seq) at
the end of function: 'WRITE_ONCE(tcp->copied_seq, seq)'. This causes the
'offset' to be negative in tcp_read_sock()->tcp_recv_skb() when
processing new incoming packets (sk->copied_seq - skb->seq becomes less
than 0), and all subsequent packets will be dropped.

Signed-off-by: Jiayuan Chen <mrpre@163.com>
Link: https://lore.kernel.org/r/20241028065226.35568-1-mrpre@163.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-29 10:54:05 -07:00
Steffen Klassert
83dfce38c4 xfrm: Restrict percpu SA attribute to specific netlink message types
Reject the usage of XFRMA_SA_PCPU in xfrm netlink messages when
it's not applicable.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Tested-by: Antony Antony <antony.antony@secunet.com>
Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-29 11:56:24 +01:00
Steffen Klassert
81a331a0e7 xfrm: Add an inbound percpu state cache.
Now that we can have percpu xfrm states, the number of active
states might increase. To get a better lookup performance,
we add a percpu cache to cache the used inbound xfrm states.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Tested-by: Antony Antony <antony.antony@secunet.com>
Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-29 11:56:18 +01:00
Steffen Klassert
0045e3d806 xfrm: Cache used outbound xfrm states at the policy.
Now that we can have percpu xfrm states, the number of active
states might increase. To get a better lookup performance,
we cache the used xfrm states at the policy for outbound
IPsec traffic.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Tested-by: Antony Antony <antony.antony@secunet.com>
Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-29 11:56:12 +01:00
Steffen Klassert
1ddf9916ac xfrm: Add support for per cpu xfrm state handling.
Currently all flows for a certain SA must be processed by the same
cpu to avoid packet reordering and lock contention of the xfrm
state lock.

To get rid of this limitation, the IETF standardized per cpu SAs
in RFC 9611. This patch implements the xfrm part of it.

We add the cpu as a lookup key for xfrm states and a config option
to generate acquire messages for each cpu.

With that, we can have on each cpu a SA with identical traffic selector
so that flows can be processed in parallel on all cpus.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Tested-by: Antony Antony <antony.antony@secunet.com>
Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-29 11:56:00 +01:00
Kuniyuki Iwashima
7ed8da17bf ipv4: Convert devinet_ioctl to per-netns RTNL.
ioctl(SIOCGIFCONF) calls dev_ifconf() that operates on the current netns.

Let's use per-netns RTNL helpers in dev_ifconf() and inet_gifconf().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:58 +01:00
Kuniyuki Iwashima
88d1f87706 ipv4: Convert devinet_ioctl() to per-netns RTNL except for SIOCSIFFLAGS.
Basically, devinet_ioctl() operates on a single netns.

However, ioctl(SIOCSIFFLAGS) will trigger the netdev notifier
that could touch another netdev in different netns.

Let's use per-netns RTNL helper in devinet_ioctl() and place
ASSERT_RTNL() for SIOCSIFFLAGS.

We will remove ASSERT_RTNL() once RTM_SETLINK and RTM_DELLINK
are converted.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:58 +01:00
Kuniyuki Iwashima
77453d428d ipv4: Convert devinet_sysctl_forward() to per-netns RTNL.
devinet_sysctl_forward() touches only a single netns.

Let's use rtnl_trylock() and __in_dev_get_rtnl_net().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
d1c81818aa rtnetlink: Define rtnl_net_trylock().
We will need the per-netns version of rtnl_trylock().

rtnl_net_trylock() calls __rtnl_net_lock() only when rtnl_trylock()
successfully holds RTNL.

When RTNL is removed, we will use mutex_trylock() for per-netns RTNL.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
c350c4761e ipv4: Convert check_lifetime() to per-netns RTNL.
Since commit 1675f38521 ("ipv4: Namespacify IPv4 address GC."),
check_lifetime() works on a per-netns basis.

Let's use rtnl_net_lock() and rtnl_net_dereference().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
4df5066f07 ipv4: Convert RTM_DELADDR to per-netns RTNL.
Let's push down RTNL into inet_rtm_deladdr() as rtnl_net_lock().

Now, ip_mc_autojoin_config() is always called under per-netns RTNL,
so ASSERT_RTNL() can be replaced with ASSERT_RTNL_NET().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
d4b483208b ipv4: Use per-netns RTNL helpers in inet_rtm_newaddr().
inet_rtm_to_ifa() and find_matching_ifa() are called
under rtnl_net_lock().

__in_dev_get_rtnl() and in_dev_for_each_ifa_rtnl() there
can use per-netns RTNL helpers.

Let's define and use __in_dev_get_rtnl_net() and
in_dev_for_each_ifa_rtnl_net().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
487257786b ipv4: Convert RTM_NEWADDR to per-netns RTNL.
The address hash table and GC are already namespacified.

Let's push down RTNL into inet_rtm_newaddr() as rtnl_net_lock().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
abd0deff03 ipv4: Don't allocate ifa for 0.0.0.0 in inet_rtm_newaddr().
When we pass 0.0.0.0 to __inet_insert_ifa(), it frees ifa and returns 0.

We can do this check much earlier for RTM_NEWADDR even before allocating
struct in_ifaddr.

Let's move the validation to

  1. inet_insert_ifa() for ioctl()
  2. inet_rtm_newaddr() for RTM_NEWADDR

Now, we can remove the same check in find_matching_ifa().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Kuniyuki Iwashima
2d34429d14 ipv4: Factorise RTM_NEWADDR validation to inet_validate_rtm().
rtm_to_ifaddr() validates some attributes, looks up a netdev,
allocates struct in_ifaddr, and validates IFA_CACHEINFO.

There is no reason to delay IFA_CACHEINFO validation.

We will push RTNL down to inet_rtm_newaddr(), and then we want
to complete rtnetlink validation before rtnl_net_lock().

Let's factorise the validation parts.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 11:54:57 +01:00
Cong Wang
740be3b9a6 sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()
The following race condition could trigger a NULL pointer dereference:

sock_map_link_detach():		sock_map_link_update_prog():
   mutex_lock(&sockmap_mutex);
   ...
   sockmap_link->map = NULL;
   mutex_unlock(&sockmap_mutex);
   				   mutex_lock(&sockmap_mutex);
				   ...
				   sock_map_prog_link_lookup(sockmap_link->map);
				   mutex_unlock(&sockmap_mutex);
   <continue>

Fix it by adding a NULL pointer check. In this specific case, it makes
no sense to update a link which is being released.

Reported-by: Ruan Bonan <bonan.ruan@u.nus.edu>
Fixes: 699c23f02c ("bpf: Add bpf_link support for sk_msg and sk_skb progs")
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20241026185522.338562-1-xiyou.wangcong@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-28 18:44:14 -07:00
Eric Dumazet
ab101c553b neighbour: use kvzalloc()/kvfree()
mm layer is providing convenient functions, we do not have
to work around old limitations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gilad Naaman <gnaaman@drivenets.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241022150059.1345406-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 18:12:06 -07:00
Eric Dumazet
ba4e469e42 vsock: do not leave dangling sk pointer in vsock_create()
syzbot was able to trigger the following warning after recent
core network cleanup.

On error vsock_create() frees the allocated sk object, but sock_init_data()
has already attached it to the provided sock object.

We must clear sock->sk to avoid possible use-after-free later.

WARNING: CPU: 0 PID: 5282 at net/socket.c:1581 __sock_create+0x897/0x950 net/socket.c:1581
Modules linked in:
CPU: 0 UID: 0 PID: 5282 Comm: syz.2.43 Not tainted 6.12.0-rc2-syzkaller-00667-g53bac8330865 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
 RIP: 0010:__sock_create+0x897/0x950 net/socket.c:1581
Code: 7f 06 01 65 48 8b 34 25 00 d8 03 00 48 81 c6 b0 08 00 00 48 c7 c7 60 0b 0d 8d e8 d4 9a 3c 02 e9 11 f8 ff ff e8 0a ab 0d f8 90 <0f> 0b 90 e9 82 fd ff ff 89 e9 80 e1 07 fe c1 38 c1 0f 8c c7 f8 ff
RSP: 0018:ffffc9000394fda8 EFLAGS: 00010293
RAX: ffffffff89873c46 RBX: ffff888079f3c818 RCX: ffff8880314b9e00
RDX: 0000000000000000 RSI: 00000000ffffffed RDI: 0000000000000000
RBP: ffffffff8d3337f0 R08: ffffffff8987384e R09: ffffffff8989473a
R10: dffffc0000000000 R11: fffffbfff203a276 R12: 00000000ffffffed
R13: ffff888079f3c8c0 R14: ffffffff898736e7 R15: dffffc0000000000
FS:  00005555680ab500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22b11196d0 CR3: 00000000308c0000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  sock_create net/socket.c:1632 [inline]
  __sys_socket_create net/socket.c:1669 [inline]
  __sys_socket+0x150/0x3c0 net/socket.c:1716
  __do_sys_socket net/socket.c:1730 [inline]
  __se_sys_socket net/socket.c:1728 [inline]
  __x64_sys_socket+0x7a/0x90 net/socket.c:1728
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f22b117dff9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff56aec0e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000029
RAX: ffffffffffffffda RBX: 00007f22b1335f80 RCX: 00007f22b117dff9
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000028
RBP: 00007f22b11f0296 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f22b1335f80 R14: 00007f22b1335f80 R15: 00000000000012dd

Fixes: 48156296a0 ("net: warn, if pf->create does not clear sock->sk on error")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20241022134819.1085254-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 18:08:52 -07:00
Davide Caratti
46a3282b87 mptcp: use "middlebox interference" RST when no DSS
RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of
fully established subflow that carries data at TCP level with no DSS
sub-option.

This is generally the case when mpext is NULL or mpext->use_map is 0:
use a dedicated value of 'mapping_status' and use it before closing the
socket in subflow_check_data_avail().

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/518
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-4-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:55:45 -07:00
Geliang Tang
5add80bfdc mptcp: implement mptcp_pm_connection_closed
The MPTCP path manager event handler mptcp_pm_connection_closed
interface has been added in the commit 1b1c7a0ef7 ("mptcp: Add path
manager interface") but it was an empty function from then on.

With such name, it sounds good to invoke mptcp_event with the
MPTCP_EVENT_CLOSED event type from it. It also removes a bit of
duplicated code.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-3-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:55:45 -07:00
Gang Yan
581c8cbfa9 mptcp: annotate data-races around subflow->fully_established
We introduce the same handling for potential data races with the
'fully_established' flag in subflow as previously done for
msk->fully_established.

Additionally, we make a crucial change: convert the subflow's
'fully_established' from 'bit_field' to 'bool' type. This is
necessary because methods for avoiding data races don't work well
with 'bit_field'. Specifically, the 'READ_ONCE' needs to know
the size of the variable being accessed, which is not supported in
'bit_field'. Also, 'test_bit' expect the address of 'bit_field'.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/516
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-2-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:55:45 -07:00
Matthieu Baerts (NGI0)
a42f307664 mptcp: pm: send ACK on non-stale subflows
If the subflow is considered as "staled", it is better to avoid it to
send an ACK carrying an ADD_ADDR or RM_ADDR. Another subflow, if any,
will then be selected.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241021-net-next-mptcp-misc-6-13-v1-1-1ef02746504a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:55:45 -07:00
Matthieu Baerts (NGI0)
3deb12c788 mptcp: init: protect sched with rcu_read_lock
Enabling CONFIG_PROVE_RCU_LIST with its dependence CONFIG_RCU_EXPERT
creates this splat when an MPTCP socket is created:

  =============================
  WARNING: suspicious RCU usage
  6.12.0-rc2+ #11 Not tainted
  -----------------------------
  net/mptcp/sched.c:44 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  no locks held by mptcp_connect/176.

  stack backtrace:
  CPU: 0 UID: 0 PID: 176 Comm: mptcp_connect Not tainted 6.12.0-rc2+ #11
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  Call Trace:
   <TASK>
   dump_stack_lvl (lib/dump_stack.c:123)
   lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
   mptcp_sched_find (net/mptcp/sched.c:44 (discriminator 7))
   mptcp_init_sock (net/mptcp/protocol.c:2867 (discriminator 1))
   ? sock_init_data_uid (arch/x86/include/asm/atomic.h:28)
   inet_create.part.0.constprop.0 (net/ipv4/af_inet.c:386)
   ? __sock_create (include/linux/rcupdate.h:347 (discriminator 1))
   __sock_create (net/socket.c:1576)
   __sys_socket (net/socket.c:1671)
   ? __pfx___sys_socket (net/socket.c:1712)
   ? do_user_addr_fault (arch/x86/mm/fault.c:1419 (discriminator 1))
   __x64_sys_socket (net/socket.c:1728)
   do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1))
   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

That's because when the socket is initialised, rcu_read_lock() is not
used despite the explicit comment written above the declaration of
mptcp_sched_find() in sched.c. Adding the missing lock/unlock avoids the
warning.

Fixes: 1730b2b2c5 ("mptcp: add sched in mptcp_sock")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/523
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241021-net-mptcp-sched-lock-v1-1-637759cf061c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-28 15:50:54 -07:00
Gustavo A. R. Silva
cf44e74504 wifi: mac80211: ieee80211_i: Fix memory corruption bug in struct ieee80211_chanctx
Move the `struct ieee80211_chanctx_conf conf` to the end of
`struct ieee80211_chanctx` and fix a memory corruption bug
triggered e.g. in `hwsim_set_chanctx_magic()`: `radar_detected`
is being overwritten when `cp->magic = HWSIM_CHANCTX_MAGIC;`
See the function call sequence below:

drv_add_chanctx(... struct ieee80211_chanctx *ctx) ->
    local->ops->add_chanctx(&local->hw, &ctx->conf) ->
	mac80211_hwsim_add_chanctx(... struct ieee80211_chanctx_conf *ctx) ->
	    hwsim_set_chanctx_magic(ctx)

This also happens in a number of other drivers.

Also, add a code comment to try to prevent people from introducing
new members after `struct ieee80211_chanctx_conf conf`. Notice that
`struct ieee80211_chanctx_conf` is a flexible structure --a structure
that contains a flexible-array member, so it should always be at
the end of any other containing structures.

This change also fixes 50 of the following warnings:

net/mac80211/ieee80211_i.h:895:39: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Fixes: bca8bc0399 ("wifi: mac80211: handle ieee80211_radar_detected() for MLO")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/ZxwWPrncTeSi1UTq@kspp
[also refer to other drivers in commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-26 00:42:49 +02:00
Johannes Berg
d5fee261df wifi: cfg80211: clear wdev->cqm_config pointer on free
When we free wdev->cqm_config when unregistering, we also
need to clear out the pointer since the same wdev/netdev
may get re-registered in another network namespace, then
destroyed later, running this code again, which results in
a double-free.

Reported-by: syzbot+36218cddfd84b5cc263e@syzkaller.appspotmail.com
Fixes: 37c20b2eff ("wifi: cfg80211: fix cqm_config access race")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20241022161742.7c34b2037726.I121b9cdb7eb180802eafc90b493522950d57ee18@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-25 17:53:40 +02:00
Ben Greear
9b15c6cf8d mac80211: fix user-power when emulating chanctx
ieee80211_calc_hw_conf_chan was ignoring the configured
user_txpower.  If it is set, use it to potentially decrease
txpower as requested.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Link: https://patch.msgid.link/20241010203954.1219686-1-greearb@candelatech.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-25 17:53:38 +02:00
David S. Miller
e31a8219fb wireless fixes for v6.12-rc5
The first set of wireless fixes for v6.12. We have been busy and have
 not been able to send this earlier, so there are more fixes than
 usual. The fixes are all over, both in stack and in drivers, but
 nothing special really standing out.
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmcWl7MRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZv0Qgf/fQJKXkGJkvozrbJATkKHfHKUOphIl4Y8
 /r3SlrsIL6qXZAUq5N+NH9vfUeKt5kkKG8Fc8yrJaygDLsV9v1LGiBSsb5eJ+PfM
 4fCOdzPSrWG984dLwsCK8UGEzfQ1G4d6HckwubUMimK2X/m6wx/99fenjMAQvdWO
 rjAJmpAkgoT0Fvf8GD3joMBKKjMFr2KT8tgbfvwpyr9cXAPZYf35+74Nl84UjHiP
 rGTGN++NQuPMsYyYIPPA+eMNUnlUVyDah+UVmzsMp27YUdKBKjx23kRH6tKM/46H
 dWqpqEV50xshlPaotHoFg9+4KRrxnxwvFtGTsnbvHcuSnkPBUusAvw==
 =l6SI
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2024-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

wireless fixes for v6.12-rc5

The first set of wireless fixes for v6.12. We have been busy and have
not been able to send this earlier, so there are more fixes than
usual. The fixes are all over, both in stack and in drivers, but
nothing special really standing out.
2024-10-25 10:44:41 +01:00
Paolo Abeni
03fc07a247 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts and no adjacent changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-25 09:08:22 +02:00
Alexei Starovoitov
bfa7b5c98b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Cross-merge bpf fixes after downstream PR.

No conflicts.

Adjacent changes in:

include/linux/bpf.h
include/uapi/linux/bpf.h
kernel/bpf/btf.c
kernel/bpf/helpers.c
kernel/bpf/syscall.c
kernel/bpf/verifier.c
kernel/trace/bpf_trace.c
mm/slab_common.c
tools/include/uapi/linux/bpf.h
tools/testing/selftests/bpf/Makefile

Link: https://lore.kernel.org/all/20241024215724.60017-1-daniel@iogearbox.net/
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-24 18:47:28 -07:00
Linus Torvalds
ae90f6a617 BPF fixes:
- Fix an out-of-bounds read in bpf_link_show_fdinfo for BPF
   sockmap link file descriptors (Hou Tao)
 
 - Fix BPF arm64 JIT's address emission with tag-based KASAN
   enabled reserving not enough size (Peter Collingbourne)
 
 - Fix BPF verifier do_misc_fixups patching for inlining of the
   bpf_get_branch_snapshot BPF helper (Andrii Nakryiko)
 
 - Fix a BPF verifier bug and reject BPF program write attempts
   into read-only marked BPF maps (Daniel Borkmann)
 
 - Fix perf_event_detach_bpf_prog error handling by removing an
   invalid check which would skip BPF program release (Jiri Olsa)
 
 - Fix memory leak when parsing mount options for the BPF
   filesystem (Hou Tao)
 
 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZxrAzxUcZGFuaWVsQGlv
 Z2VhcmJveC5uZXQACgkQ2yufC7HISIPcHwD8DnBSPlHX9OezMWCm8mjVx2Fd26W9
 /IaiW2tyOPtoSGIA/3hfgfLrxkb3Raoh0miQB2+FRrz9e+y7i8c4Q91mcUgJ
 =Hvht
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Daniel Borkmann:

 - Fix an out-of-bounds read in bpf_link_show_fdinfo for BPF sockmap
   link file descriptors (Hou Tao)

 - Fix BPF arm64 JIT's address emission with tag-based KASAN enabled
   reserving not enough size (Peter Collingbourne)

 - Fix BPF verifier do_misc_fixups patching for inlining of the
   bpf_get_branch_snapshot BPF helper (Andrii Nakryiko)

 - Fix a BPF verifier bug and reject BPF program write attempts into
   read-only marked BPF maps (Daniel Borkmann)

 - Fix perf_event_detach_bpf_prog error handling by removing an invalid
   check which would skip BPF program release (Jiri Olsa)

 - Fix memory leak when parsing mount options for the BPF filesystem
   (Hou Tao)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Check validity of link->type in bpf_link_show_fdinfo()
  bpf: Add the missing BPF_LINK_TYPE invocation for sockmap
  bpf: fix do_misc_fixups() for bpf_get_branch_snapshot()
  bpf,perf: Fix perf_event_detach_bpf_prog error handling
  selftests/bpf: Add test for passing in uninit mtu_len
  selftests/bpf: Add test for writes to .rodata
  bpf: Remove MEM_UNINIT from skb/xdp MTU helpers
  bpf: Fix overloading of MEM_UNINIT's meaning
  bpf: Add MEM_WRITE attribute
  bpf: Preserve param->string when parsing mount options
  bpf, arm64: Fix address emission with tag-based KASAN enabled
2024-10-24 16:53:20 -07:00
Linus Torvalds
d44cd82264 Including fixes from netfiler, xfrm and bluetooth.
Current release - regressions:
 
   - posix-clock: Fix unbalanced locking in pc_clock_settime()
 
   - netfilter: fix typo causing some targets not to load on IPv6
 
 Current release - new code bugs:
 
   - xfrm: policy: remove last remnants of pernet inexact list
 
 Previous releases - regressions:
 
   - core: fix races in netdev_tx_sent_queue()/dev_watchdog()
 
   - bluetooth: fix UAF on sco_sock_timeout
 
   - eth: hv_netvsc: fix VF namespace also in synthetic NIC NETDEV_REGISTER event
 
   - eth: usbnet: fix name regression
 
   - eth: be2net: fix potential memory leak in be_xmit()
 
   - eth: plip: fix transmit path breakage
 
 Previous releases - always broken:
 
   - sched: deny mismatched skip_sw/skip_hw flags for actions created by classifiers
 
   - netfilter: bpf: must hold reference on net namespace
 
   - eth: virtio_net: fix integer overflow in stats
 
   - eth: bnxt_en: replace ptp_lock with irqsave variant
 
   - eth: octeon_ep: add SKB allocation failures handling in __octep_oq_process_rx()
 
 Misc:
 
   - MAINTAINERS: add Simon as an official reviewer
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmcaTkUSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkW8kP/iYfaxQ8zR61wUU7bOcVUSnEADR9XQ1H
 Nta5Z0tDJprZv254XW3hYDzU0Iy3OgclRE1oewF5fQVLn6Sfg4U5awxRTNdJw7KV
 wj62ziAv/xht2W/4nBsNfYkOZaDAibItbKtxlkOhgCGXSrXBoS22IonKRqEv2HLV
 Gu0vAY/VI9YNvB5Z6SEKFmQp2bWfX79AChVT72shLBLakOCUHBavk/DOU56XH1Ci
 IRmU5Lt8ysXWxCTF91rPCAbMyuxBbIv6phIKPV2ALpRUd6ha5nBqcl0wcS7Y1E+/
 0XOV71zjcXFoE/6hc5W3/mC7jm+ipXKVJOnIkCcWq40p6kDVJJ+E1RWEr5JxGEyF
 FtnUCZ8iK/F3/jSalMras2z+AZ/CGtfHF9wAS3YfMGtOJJb/k4dCxAddp7UzD9O4
 yxAJhJ0DrVuplzwovL5owoJJXeRAMQeFydzHBYun5P8Sc9TtvviICi19fMgKGn4O
 eUQhjgZZY371sPnTDLDEw1Oqzs9qeaeV3S2dSeFJ98PQuPA5KVOf/R2/CptBIMi5
 +UNcqeXrlUeYSBW94pPioEVStZDrzax5RVKh/Jo1tTnKzbnWDOOKZqSVsGPMWXdO
 0aBlGuSsNe36VDg2C0QMxGk7+gXbKmk9U4+qVQH3KMpB8uqdAu5deMbTT6dfcwBV
 O/BaGiqoR4ak
 =dR3Q
 -----END PGP SIGNATURE-----

Merge tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfiler, xfrm and bluetooth.

  Oddly this includes a fix for a posix clock regression; in our
  previous PR we included a change there as a pre-requisite for
  networking one. That fix proved to be buggy and requires the follow-up
  included here. Thomas suggested we should send it, given we sent the
  buggy patch.

  Current release - regressions:

   - posix-clock: Fix unbalanced locking in pc_clock_settime()

   - netfilter: fix typo causing some targets not to load on IPv6

  Current release - new code bugs:

   - xfrm: policy: remove last remnants of pernet inexact list

  Previous releases - regressions:

   - core: fix races in netdev_tx_sent_queue()/dev_watchdog()

   - bluetooth: fix UAF on sco_sock_timeout

   - eth: hv_netvsc: fix VF namespace also in synthetic NIC
     NETDEV_REGISTER event

   - eth: usbnet: fix name regression

   - eth: be2net: fix potential memory leak in be_xmit()

   - eth: plip: fix transmit path breakage

  Previous releases - always broken:

   - sched: deny mismatched skip_sw/skip_hw flags for actions created by
     classifiers

   - netfilter: bpf: must hold reference on net namespace

   - eth: virtio_net: fix integer overflow in stats

   - eth: bnxt_en: replace ptp_lock with irqsave variant

   - eth: octeon_ep: add SKB allocation failures handling in
     __octep_oq_process_rx()

  Misc:

   - MAINTAINERS: add Simon as an official reviewer"

* tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
  net: dsa: mv88e6xxx: support 4000ps cycle counter period
  net: dsa: mv88e6xxx: read cycle counter period from hardware
  net: dsa: mv88e6xxx: group cycle counter coefficients
  net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition
  hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event
  net: dsa: microchip: disable EEE for KSZ879x/KSZ877x/KSZ876x
  Bluetooth: ISO: Fix UAF on iso_sock_timeout
  Bluetooth: SCO: Fix UAF on sco_sock_timeout
  Bluetooth: hci_core: Disable works on hci_unregister_dev
  posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()
  r8169: avoid unsolicited interrupts
  net: sched: use RCU read-side critical section in taprio_dump()
  net: sched: fix use-after-free in taprio_change()
  net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers
  net: usb: usbnet: fix name regression
  mlxsw: spectrum_router: fix xa_store() error checking
  virtio_net: fix integer overflow in stats
  net: fix races in netdev_tx_sent_queue()/dev_watchdog()
  net: wwan: fix global oob in wwan_rtnl_policy
  netfilter: xtables: fix typo causing some targets not to load on IPv6
  ...
2024-10-24 16:43:50 -07:00
Martin KaFai Lau
b9a5a07aea bpf: Add "bool swap_uptrs" arg to bpf_local_storage_update() and bpf_selem_alloc()
In a later patch, the task local storage will only accept uptr
from the syscall update_elem and will not accept uptr from
the bpf prog. The reason is the bpf prog does not have a way
to provide a valid user space address.

bpf_local_storage_update() and bpf_selem_alloc() are used by
both bpf prog bpf_task_storage_get(BPF_LOCAL_STORAGE_GET_F_CREATE)
and bpf syscall update_elem. "bool swap_uptrs" arg is added
to bpf_local_storage_update() and bpf_selem_alloc() to tell if
it is called by the bpf prog or by the bpf syscall. When
swap_uptrs==true, it is called by the syscall.

The arg is named (swap_)uptrs because the later patch will swap
the uptrs between the newly allocated selem and the user space
provided map_value. It will make error handling easier in case
map->ops->map_update_elem() fails and the caller can decide
if it needs to unpin the uptr in the user space provided
map_value or the bpf_local_storage_update() has already
taken the uptr ownership and will take care of unpinning it also.

Only swap_uptrs==false is passed now. The logic to handle
the true case will be added in a later patch.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241023234759.860539-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-24 10:25:59 -07:00
Kuniyuki Iwashima
17a1ac0018 phonet: Don't hold RTNL for route_doit().
Now only __dev_get_by_index() depends on RTNL in route_doit().

Let's use dev_get_by_index_rcu() and register route_doit() with
RTNL_FLAG_DOIT_UNLOCKED.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24 16:03:40 +02:00
Kuniyuki Iwashima
3deec3b4af phonet: Convert phonet_routes.lock to spinlock_t.
route_doit() calls phonet_route_add() or phonet_route_del()
for RTM_NEWROUTE or RTM_DELROUTE, respectively.

Both functions only touch phonet_pernet(dev_net(dev))->routes,
which is currently protected by RTNL and its dedicated mutex,
phonet_routes.lock.

We will convert route_doit() to RCU and cannot use mutex inside RCU.

Let's convert the mutex to spinlock_t.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24 16:03:40 +02:00
Kuniyuki Iwashima
de51ad08b1 phonet: Pass net and ifindex to rtm_phonet_notify().
Currently, rtm_phonet_notify() fetches netns and ifindex from dev.

Once route_doit() is converted to RCU, rtm_phonet_notify() will be
called outside of RCU due to GFP_KERNEL, and dev will be unavailable
there.

Let's pass net and ifindex to rtm_phonet_notify().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24 16:03:40 +02:00
Kuniyuki Iwashima
302fc6bbcb phonet: Pass ifindex to fill_route().
We will convert route_doit() to RCU.

route_doit() will call rtm_phonet_notify() outside of RCU due
to GFP_KERNEL, so dev will not be available in fill_route().

Let's pass ifindex directly to fill_route().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24 16:03:40 +02:00
Kuniyuki Iwashima
b7d2fc9ad7 phonet: Don't hold RTNL for getaddr_dumpit().
getaddr_dumpit() already relies on RCU and does not need RTNL.

Let's use READ_ONCE() for ifindex and register getaddr_dumpit()
with RTNL_FLAG_DUMP_UNLOCKED.

While at it, the retval of getaddr_dumpit() is changed to combine
NLMSG_DONE and save recvmsg() as done in 58a4ff5d77 ("phonet: no
longer hold RTNL in route_dumpit()").

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24 16:03:40 +02:00
Kuniyuki Iwashima
8786e98dd0 phonet: Don't hold RTNL for addr_doit().
Now only __dev_get_by_index() depends on RTNL in addr_doit().

Let's use dev_get_by_index_rcu() and register addr_doit() with
RTNL_FLAG_DOIT_UNLOCKED.

While at it, I changed phonet_rtnl_msg_handlers[]'s init to C99
style like other core networking code.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24 16:03:40 +02:00
Kuniyuki Iwashima
42f5fe1dc4 phonet: Convert phonet_device_list.lock to spinlock_t.
addr_doit() calls phonet_address_add() or phonet_address_del()
for RTM_NEWADDR or RTM_DELADDR, respectively.

Both functions only touch phonet_device_list(dev_net(dev)),
which is currently protected by RTNL and its dedicated mutex,
phonet_device_list.lock.

We will convert addr_doit() to RCU and cannot use mutex inside RCU.

Let's convert the mutex to spinlock_t.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24 16:03:40 +02:00
Kuniyuki Iwashima
68ed5c38b5 phonet: Pass net and ifindex to phonet_address_notify().
Currently, phonet_address_notify() fetches netns and ifindex from dev.

Once addr_doit() is converted to RCU, phonet_address_notify() will be
called outside of RCU due to GFP_KERNEL, and dev will be unavailable
there.

Let's pass net and ifindex to phonet_address_notify().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24 16:03:40 +02:00
Kuniyuki Iwashima
08a9572be3 phonet: Pass ifindex to fill_addr().
We will convert addr_doit() and getaddr_dumpit() to RCU, both
of which call fill_addr().

The former will call phonet_address_notify() outside of RCU
due to GFP_KERNEL, so dev will not be available in fill_addr().

Let's pass ifindex directly to fill_addr().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24 16:03:40 +02:00
Paolo Abeni
1876479d98 bluetooth pull request for net:
- hci_core: Disable works on hci_unregister_dev
  - SCO: Fix UAF on sco_sock_timeout
  - ISO: Fix UAF on iso_sock_timeout
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmcZB+0ZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKWcfD/4j4Q3W01y1UotsTKa1et7V
 r6ydZfHBwU/ND8G8t7sfg4zJ6daYcYY6ERllj92JYCHdmrwILzt3WEgCAAC1Y8dj
 r6RUXFapGSwE3YMIdDstcKgsfYV+zzKL+IBt0VRjhEMeoEEczzRGldk8Y8Z9ePhm
 9Jyw1WDmid9DkIfIdrTKjnLH5HsjYZSvIJfkVuKKcfRshdolxfokHv9DpbPIbueO
 ya33a/mnHVKnLyZPT72uP2DANkP218+zHI+dCjE6279LzrXJkQKLKThf/3bTGO+o
 NohqGEj9NsIp8NqS/jr1yzvXOIqXkA5cG0Qrix+OyZuvBIohukTFyes1f2hcRHoh
 l41s+IorviUhrtEPZ2ki/AWyXTVpWJl2EQ6XPf6iUexF2PDCgTzILN4WIELBTZse
 cEWPVbMI+ZEq9FHX1P9Vfc+Yje4glTXcQzBSlfaljPmbW0CouxYCJ4kEj+5m4F6V
 xBUevpIz3dRUMST4tXaOrhso/Th2zCDDbWwp6ImEZQ8xBM5wm8sDrjkZOtWEWRNc
 miLEnkfqZxJmt6b8DfGzeM/p3FJ9i7OsCo6tUVEH4mGATViD8QOUaqMk/kxV5Wh0
 ORB3cWk4VYTvXGuPceEH7u8yBslbUzmad/eMOK52ErP4UrORhESZYoaDy3ALcr7A
 4z+9xBlOlezKTe5puR5ulw==
 =gOTr
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2024-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - hci_core: Disable works on hci_unregister_dev
 - SCO: Fix UAF on sco_sock_timeout
 - ISO: Fix UAF on iso_sock_timeout

* tag 'for-net-2024-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: ISO: Fix UAF on iso_sock_timeout
  Bluetooth: SCO: Fix UAF on sco_sock_timeout
  Bluetooth: hci_core: Disable works on hci_unregister_dev
====================

Link: https://patch.msgid.link/20241023143005.2297694-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24 12:30:23 +02:00
Paolo Abeni
1e424d08d3 ipsec-2024-10-22
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmcXbBEACgkQrB3Eaf9P
 W7e0hQ//XiBdyhArA8kYIgsCylrOr+y/uCErnIhzUTqo20uE3dMPvzQHwY1GIgiU
 HYXKg49WLVxSuFtLRu32qCr0G+muU1UI5OL58IQuQ+TxKzj0hnV4BqAx+rNYhaFb
 JxJhgAcQQu7VCL7/qgqGsQnhq/hhg29Rfqa1VTEZ4RthEMahPDbwyibjyOfqwSgm
 fCPIl2FTkB7E0PZnwZJGxmaOJXS7g/djb+CPmBI6zxLQHG5VXY/UGNyObUvTLD9K
 gV+N0u0ieyDTxpvpgh6HMAFSkORLS/PIUCAX0SZEW48+7DLbBeKMMYwegtxxJZ3D
 3zaWi8uKGh5rjOslQbU4ZlpxJr7yvIV6RhGJhOPDYz5Es4EXHU7c0tZ/pma46eb0
 2PJxQyTHW4O9fbybQvl0w9fUQlhjKMbv/TygJgpOIk9YUr2y8Yxc8yhmWi+669ly
 e7PEi/33lqJI44gisu0BMresxJcPA3eFWje+Dzw/7N/tlLJzbWt3psRqB9u/JwVH
 LD0YvXraZYvaRNzeGUfbXTrvmouhLcl15zAE8RFJBTgGJbpILviJ9NfUMOIO7Yor
 BBKEWlylCm/4x5iOdVb17gFCi7uERiahbxNg3+hltAQuMvEdrhhWXp1N7esTRvkf
 D1o0qR5C2k2jyc9LQNqfiGWDEOgTCt1DCdhpo2F/EtF5kSerp6s=
 =Ai21
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2024-10-22

1) Fix routing behavior that relies on L4 information
   for xfrm encapsulated packets.
   From Eyal Birger.

2) Remove leftovers of pernet policy_inexact lists.
   From Florian Westphal.

3) Validate new SA's prefixlen when the selector family is
   not set from userspace.
   From Sabrina Dubroca.

4) Fix a kernel-infoleak when dumping an auth algorithm.
   From Petr Vaganov.

Please pull or let me know if there are problems.

ipsec-2024-10-22

* tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: fix one more kernel-infoleak in algo dumping
  xfrm: validate new SA's prefixlen using SA family when sel.family is unset
  xfrm: policy: remove last remnants of pernet inexact list
  xfrm: respect ip protocols rules criteria when performing dst lookups
  xfrm: extract dst lookup parameters into a struct
====================

Link: https://patch.msgid.link/20241022092226.654370-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24 11:11:33 +02:00
Ben Greear
eaed5fc0c3 mac80211: Remove NOP call to ieee80211_hw_config
If changed is '0', then the ieee80211_hw_config takes no
action, so just remove the call in
__ieee809211_recalc_txpower()

Signed-off-by: Ben Greear <greearb@candelatech.com>
Link: https://patch.msgid.link/20241010204036.1219896-1-greearb@candelatech.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 17:36:49 +02:00
Aditya Kumar Singh
188a1bf894 wifi: mac80211: re-order assigning channel in activate links
The current flow in _ieee80211_set_active_links() does not align with the
operational requirements of drivers that groups multiple hardware
under a single wiphy. These drivers (e.g ath12k) rely on channel
assignment to determine the appropriate hardware for each link. Without
this, the drivers cannot correctly establish the link interface.

Currently in _ieee80211_set_active_links(), after calling
drv_change_vif_links() on the driver, the state of all connected stations
is updated via drv_change_sta_links(). This is followed by handling keys
in the links, and finally, assigning the channel to the links.
Consequently, drv_change_sta_links() prompts drivers to create the station
entry at their level and within their firmware. However, since channels
have not yet been assigned to links at this stage, drivers have not
created the necessary link interface for establishing link stations,
leading to failures in activating the links.

Therefore, re-order the logic so that after drv_change_vif_links() and
removing the old links, channels are assigned to newly added links.
Following this, the flow proceeds to station handling.

Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://patch.msgid.link/20241001085034.2745669-1-quic_adisi@quicinc.com
[Johannes: fix iwlwifi to deal with the changes]
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 17:26:14 +02:00
Johannes Berg
31cb94f71c wifi: mac80211: convert debugfs files to short fops
Given the large size of the regular struct file_operations, save
a lot of space with the newly added short fops for debugfs.

Link: https://patch.msgid.link/20241022151838.2f6de3ea3ecc.I45657e6a8415d796ec95c95becc9efb377ee3be6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:47:07 +02:00
Johannes Berg
b457d87138 wifi: mac80211: remove misleading j_0 construction parts
The GCM algorithm implementation in the kernel assumes that
a 12-byte IV is passed, not the actual J_0 from the GCM spec.
Don't rename, that'd be messy, but also don't fill the bytes
beyond the IV that aren't used, since otherwise it looks as
though j_0[12] is used uninitialized.

Link: https://patch.msgid.link/20241021151414.798ceb7a5896.Ic57751edad228d56865ecf7433fef469e5e0a4aa@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:46:58 +02:00
Michael-CY Lee
2d63e6530e wifi: mac80211: refactor BW limitation check for CSA parsing
Refactor the BW limitation check to a more general format when
parsing CSA. Also, the original BW check did not account for BW
less than 160 MHz.

Signed-off-by: Michael-CY Lee <michael-cy.lee@mediatek.com>
Link: https://patch.msgid.link/20241009121812.2419-1-michael-cy.lee@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:46:48 +02:00
Felix Fietkau
f92e0cf19a wifi: mac80211: filter on monitor interfaces based on configured channel
When a monitor interface has an assigned channel (only happens with the
NO_VIRTUAL_MONITOR feature), only pass packets received on that channel.
This is useful for monitoring on multiple channels at the same time using
multiple monitor interfaces.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/1bbe55107ba0f2e62ea90f305faeb7ba9247ef29.1728462320.git-series.nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:46:34 +02:00
Felix Fietkau
342afe693e wifi: mac80211: refactor ieee80211_rx_monitor
Rework the monitor mode interface iteration to get rid of the last_monitor
condition. Preparation for further filtering received monitor packets.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/d57d82f109643894325beb9db6da8f001fc533eb.1728462320.git-series.nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:46:26 +02:00
Felix Fietkau
754905ce1a wifi: mac80211: add support for the monitor SKIP_TX flag
Do not pass locally sent packets to monitor interfaces with this flag set.
Skip processing tx packets on the status call entirely if no monitor
interfaces without this flag are present.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/c327bb57ef8dadaa6a0e8e4dc2f5f99ae8123e6c.1728462320.git-series.nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:46:13 +02:00
Felix Fietkau
a77e527b47 wifi: cfg80211: add monitor SKIP_TX flag
This can be used to indicate that the user is not interested in receiving
locally sent packets on the monitor interface.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/f0c20f832eadd36c71fba9a2a16ba57d78389b6c.1728462320.git-series.nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:46:06 +02:00
Felix Fietkau
9d40f7e327 wifi: mac80211: add flag to opt out of virtual monitor support
This is useful for multi-radio devices that are capable of monitoring on
multiple channels simultanenously. When this flag is set, each monitor
interface is passed to the driver individually and can have a configured
channel.
The vif mac address for non-active monitor interfaces is cleared, in order
to allow the driver to tell them apart from active ones.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/3c55505ee0cf0a5f141fbcb30d1e8be8d9f40373.1728462320.git-series.nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:45:43 +02:00
Felix Fietkau
9c4f830927 wifi: cfg80211: pass net_device to .set_monitor_channel
Preparation for allowing multiple monitor interfaces with different channels
on a multi-radio wiphy.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/35fa652dbfebf93343f8b9a08fdef0467a2a02dc.1728462320.git-series.nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:45:35 +02:00
Felix Fietkau
006a97ceb6 wifi: mac80211: remove status->ampdu_delimiter_crc
This was never used by any driver, so remove it to free up some space.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/e6fee6eed49b105261830db1c74f13841fb9616c.1728462320.git-series.nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:45:19 +02:00
Felix Fietkau
ebda716ea4 wifi: cfg80211: report per wiphy radio antenna mask
With multi-radio devices, each radio typically gets a fixed set of antennas.
In order to be able to disable specific antennas for some radios, user space
needs to know which antenna mask bits are assigned to which radio.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/e0a26afa2c88eaa188ec96ec6d17ecac4e827641.1728462320.git-series.nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:45:03 +02:00
Felix Fietkau
7b68f63d5c wifi: mac80211: use vif radio mask to limit creating chanctx
Reject frequencies not supported by any radio that the vif is allowed to use.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/95ea1f6fc5bd1614a0c7952b6c67726e3fd635fb.1728462320.git-series.nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:44:55 +02:00
Felix Fietkau
32ee616a7f wifi: mac80211: use vif radio mask to limit ibss scan frequencies
Reject frequencies not supported by any radio that the vif is allowed to
use.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/9d5c0b6b00a7ecef6a0ac6de765c0af00c8bb0e1.1728462320.git-series.nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:44:44 +02:00
Felix Fietkau
3607798ad9 wifi: cfg80211: add option for vif allowed radios
This allows users to prevent a vif from affecting radios other than the
configured ones. This can be useful in cases where e.g. an AP is running
on one radio, and triggering a scan on another radio should not disturb it.

Changing the allowed radios list for a vif is supported, but only while
it is down.

While it is possible to achieve the same by always explicitly specifying
a frequency list for scan requests and ensuring that the wrong channel/band
is never accidentally set on an unrelated interface, this change makes
multi-radio wiphy setups a lot easier to deal with for CLI users.

By itself, this patch only enforces the radio mask for scanning requests
and remain-on-channel. Follow-up changes build on this to limit configured
frequencies.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/eefcb218780f71a1549875d149f1196486762756.1728462320.git-series.nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:44:10 +02:00
Johannes Berg
751e7489c1 wifi: mac80211: expose ieee80211_chan_width_to_rx_bw() to drivers
Drivers might need to also do this calculation, no point in
them duplicating the code. Since it's so simple, just make
it an inline.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.af003cb4a088.I8b5d29504b726caae24af6013c65b3daebe842a2@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:38 +02:00
Johannes Berg
b23af47921 wifi: mac80211: chan: calculate min_def also for client mode
In order to deal with (temporary) bandwidth reductions to/from
the AP such as the upcoming RX OMI changes, modify the min_def
calculation to also not take the chanreq width into account in
client mode. This normally changes nothing as the AP bandwidth
will be the same as the channel request's width. In the RX OMI
changes, however, the code will reduce the bandwidth for only
the AP STA, since the OMI is only to that, and TDLS STAs are
unaffected. Using the min_def for this case simplifies RX OMI
a lot.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.95a39c4f6f45.I2e7517fb1a7221dc6f60b0c752e4882042b4265d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:35 +02:00
Miri Korenblit
41eba07636 wifi: mac80211: add an option to fake ieee80211_connection_loss
This allows faking this function in KUnit tests.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.3b42e7547c65.I3bcbd51bec9ccfc7c08739450ec778722549c007@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:33 +02:00
Miri Korenblit
cf00792797 wifi: mac80211: parse A-MSDU len from EHT capabilities
On 2.4 GHz there's no VHT, so EHT defines its own bits for
the maximum MPDU length. Parse and store them in the link_sta.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.e05da59c419a.I0b1c047639160d9a96f48ab013c18ea33f5473b0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:29 +02:00
Johannes Berg
88b67e91e2 wifi: mac80211: call rate_control_rate_update() for link STA
In order to update the right link information, call the update
rate_control_rate_update() with the right link_sta, and then
pass that through to the driver's sta_rc_update() method. The
software rate control still doesn't support it, but that'll be
skipped by not having a rate control ref.

Since it now operates on a link sta, rename the driver method.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.5851b6b5fd41.Ibdf50d96afa4b761dd9b9dfd54a1147e77a75329@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:27 +02:00
Johannes Berg
f828deb70c wifi: mac80211: allow rate_control_rate_init() for links
Andrei previously fixed an issue in the client where the NSS
for links other than the primary/assoc/deflink isn't set. The
same issue appears to exist on the AP side, because there's
only a call to rate_control_rate_init() for the deflink, and
not any other links.

Rework the code a bit to do rate_control_rate_init() for links,
even if it really doesn't work with software rate control yet,
it does other things as well.

Also add rate_control_rate_init_all_links() to actually do it
properly when moving to ASSOC state in cfg80211.

Change the explicit call to ieee80211_sta_init_nss() to instead
be rate_control_rate_init() now in the client code, but also
add a call to rate_control_rate_init() when a link is added in
AP mode and the STA is already associated.

This should fix the NSS initialization issue, and perhaps pave
the way for actual software rate scaling a bit, in case anyone
cares in the future, but that of course needs a lot more than
just the init call.

We still need to fix the rate control _update_ as well, and the
sta_rc_update() driver method especially, but that will be in a
different patch.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.c693274a908f.I0376da02e9f5a30eaa1b5d0d01371ff09506d453@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:23 +02:00
Emmanuel Grumbach
c4382d5ca1 wifi: mac80211: update the right link for tx power
Stop looking at deflink and start using the actual link.
Initialize the power settings upon link init.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.2685dab8e1ab.I1d82cbdb2dda020aee4a225bd9a134f7d82dd810@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:22 +02:00
Emmanuel Grumbach
0b7392ee3b wifi: mac80211: __ieee80211_recalc_txpower receives a link
Handle the tx power per-link. Don't change the behavior for now. Just
change the signature of the function.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.3c9cd0731f5b.I6ebfd9d5084f3602b55c55e2669881fd92471c2f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:20 +02:00
Emmanuel Grumbach
9925aa855d wifi: mac80211: ieee80211_recalc_txpower receives a link
Handle the tx power per-link. Don't change the behavior for now. Just
change the signature of the function.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.705bbf953d0a.I8a429dede07bab5801f4c730a6abff7ce23b22d3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:18 +02:00
Emmanuel Grumbach
eea3323c43 wifi: mac80211: remove unneeded parameters
ieee80211_find_80211h_pwr_constr and ieee80211_find_cisco_dtpc don't
need the pointer to struct ieee80211_sub_if_data *sdata. Remove it and
it'll be one step closer to handle the power constraints per-link.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.3ea505cd74e7.Id416127544afd80e4fe7b275b612aef511fc64ed@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:15 +02:00
Emmanuel Grumbach
e21dd758cf wifi: mac80211: make bss_param_ch_cnt available for the low level driver
Drivers may need to track this. Make it available for them, and maintain
the value when beacons are received.
When link X receives a beacon, iterate the RNR elements and update all
the links with their respective data.
Track the link id that updated the data so that each link can know
whether the update came from its own beacon or from another link.
In case, the update came from the link's own beacon, always update the
updater link id.
The purpose is to let the low level driver know if a link is losing its
beacons. If link X is losing its beacons, it can still track the
bss_param_ch_cnt and know where the update came from.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.e2d8d1a722ad.I04b883daba2cd48e5730659eb62ca1614c899cbb@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:12 +02:00
Miri Korenblit
9c5f2c7eeb wifi: mac80211: rename IEEE80211_CHANCTX_CHANGE_MIN_WIDTH
The name is misleading, this actually indicates that
ieee80211_chanctx_conf::min_def was updated.
Rename it to IEEE80211_CHANCTX_CHANGE_MIN_DEF.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.726b5f12ae0c.I3bd9e594c9d2735183ec049a4c7224bd0a9599c9@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:08 +02:00
Johannes Berg
62262dd00c wifi: cfg80211: disallow SMPS in AP mode
In practice, userspace hasn't been able to set this for many
years, and mac80211 has already rejected it (which is now no
longer needed), so reject SMPS mode (other than "OFF" to be
a bit more compatible) in AP mode. Also remove the parameter
from the AP settings struct.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.fe1fc46484cf.I8676fb52b818a4bedeb9c25b901e1396277ffc0b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:43:03 +02:00
Ilan Peer
074a8b54da wifi: mac80211: Add support to indicate that a new interface is to be added
Add support to indicate to the driver that an interface is about to be
added so that the driver could prepare its resources early if it needs
so.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.e0e8563e1c30.Ifccc96a46a347eb15752caefc9f4eff31f75ed47@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23 16:42:56 +02:00
Luiz Augusto von Dentz
246b435ad6 Bluetooth: ISO: Fix UAF on iso_sock_timeout
conn->sk maybe have been unlinked/freed while waiting for iso_conn_lock
so this checks if the conn->sk is still valid by checking if it part of
iso_sk_list.

Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-23 10:21:14 -04:00
Luiz Augusto von Dentz
1bf4470a39 Bluetooth: SCO: Fix UAF on sco_sock_timeout
conn->sk maybe have been unlinked/freed while waiting for sco_conn_lock
so this checks if the conn->sk is still valid by checking if it part of
sco_sk_list.

Reported-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com
Tested-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4c0d0c4cde787116d465
Fixes: ba316be1b6 ("Bluetooth: schedule SCO timeouts with delayed_work")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-23 10:20:29 -04:00
Luiz Augusto von Dentz
989fa5171f Bluetooth: hci_core: Disable works on hci_unregister_dev
This make use of disable_work_* on hci_unregister_dev since the hci_dev is
about to be freed new submissions are not disarable.

Fixes: 0d151a1037 ("Bluetooth: hci_core: cancel all works upon hci_unregister_dev()")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-23 10:19:44 -04:00
Eric Dumazet
e44ef3f66c netpoll: remove ndo_netpoll_setup() second argument
npinfo is not used in any of the ndo_netpoll_setup() methods.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241018052108.2610827-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 13:31:32 +02:00
Dmitry Antipov
b22db8b8be net: sched: use RCU read-side critical section in taprio_dump()
Fix possible use-after-free in 'taprio_dump()' by adding RCU
read-side critical section there. Never seen on x86 but
found on a KASAN-enabled arm64 system when investigating
https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa:

[T15862] BUG: KASAN: slab-use-after-free in taprio_dump+0xa0c/0xbb0
[T15862] Read of size 4 at addr ffff0000d4bb88f8 by task repro/15862
[T15862]
[T15862] CPU: 0 UID: 0 PID: 15862 Comm: repro Not tainted 6.11.0-rc1-00293-gdefaf1a2113a-dirty #2
[T15862] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-20240524-5.fc40 05/24/2024
[T15862] Call trace:
[T15862]  dump_backtrace+0x20c/0x220
[T15862]  show_stack+0x2c/0x40
[T15862]  dump_stack_lvl+0xf8/0x174
[T15862]  print_report+0x170/0x4d8
[T15862]  kasan_report+0xb8/0x1d4
[T15862]  __asan_report_load4_noabort+0x20/0x2c
[T15862]  taprio_dump+0xa0c/0xbb0
[T15862]  tc_fill_qdisc+0x540/0x1020
[T15862]  qdisc_notify.isra.0+0x330/0x3a0
[T15862]  tc_modify_qdisc+0x7b8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Allocated by task 15857:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_alloc_info+0x40/0x60
[T15862]  __kasan_kmalloc+0xd4/0xe0
[T15862]  __kmalloc_cache_noprof+0x194/0x334
[T15862]  taprio_change+0x45c/0x2fe0
[T15862]  tc_modify_qdisc+0x6a8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Freed by task 6192:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_free_info+0x4c/0x80
[T15862]  poison_slab_object+0x110/0x160
[T15862]  __kasan_slab_free+0x3c/0x74
[T15862]  kfree+0x134/0x3c0
[T15862]  taprio_free_sched_cb+0x18c/0x220
[T15862]  rcu_core+0x920/0x1b7c
[T15862]  rcu_core_si+0x10/0x1c
[T15862]  handle_softirqs+0x2e8/0xd64
[T15862]  __do_softirq+0x14/0x20

Fixes: 18cdd2f099 ("net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://patch.msgid.link/20241018051339.418890-2-dmantipov@yandex.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 13:26:15 +02:00
Dmitry Antipov
f504465970 net: sched: fix use-after-free in taprio_change()
In 'taprio_change()', 'admin' pointer may become dangling due to sched
switch / removal caused by 'advance_sched()', and critical section
protected by 'q->current_entry_lock' is too small to prevent from such
a scenario (which causes use-after-free detected by KASAN). Fix this
by prefer 'rcu_replace_pointer()' over 'rcu_assign_pointer()' to update
'admin' immediately before an attempt to schedule freeing.

Fixes: a3d43c0d56 ("taprio: Add support adding an admin schedule")
Reported-by: syzbot+b65e0af58423fc8a73aa@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://patch.msgid.link/20241018051339.418890-1-dmantipov@yandex.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 13:26:15 +02:00
Kuniyuki Iwashima
c972c1c41d ipv4: Switch inet_addr_hash() to less predictable hash.
Recently, commit 4a0ec2aa07 ("ipv6: switch inet6_addr_hash()
to less predictable hash") and commit 4daf4dc275 ("ipv6: switch
inet6_acaddr_hash() to less predictable hash") hardened IPv6
address hash functions.

inet_addr_hash() is also highly predictable, and a malicious use
could abuse a specific bucket.

Let's follow the change on IPv4 by using jhash_1word().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241018014100.93776-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 13:17:35 +02:00
Kuniyuki Iwashima
7213a1c417 ip6mr: Add __init to ip6_mr_cleanup().
kernel test robot reported a section mismatch in ip6_mr_cleanup().

  WARNING: modpost: vmlinux: section mismatch in reference: ip6_mr_cleanup+0x0 (section: .text) -> 0xffffffff (section: .init.rodata)
  WARNING: modpost: vmlinux: section mismatch in reference: ip6_mr_cleanup+0x14 (section: .text) -> ip6mr_rtnl_msg_handlers (section: .init.rodata)

ip6_mr_cleanup() uses ip6mr_rtnl_msg_handlers[] that has
__initconst_or_module qualifier.

ip6_mr_cleanup() is only called from inet6_init() but does
not have __init qualifier.

Let's add __init to ip6_mr_cleanup().

Fixes: 3ac84e31b3 ("ipmr: Use rtnl_register_many().")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410180139.B3HeemsC-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241017174732.39487-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 12:29:27 +02:00
Vladimir Oltean
83c289e81e net/sched: act_api: unexport tcf_action_dump_1()
This isn't used outside act_api.c, but is called by tcf_dump_walker()
prior to its definition. So move it upwards and make it static.

Simultaneously, reorder the variable declarations so that they follow
the networking "reverse Christmas tree" coding style.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241017161934.3599046-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 11:43:47 +02:00
Vladimir Oltean
34d35b4edb net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers
tcf_action_init() has logic for checking mismatches between action and
filter offload flags (skip_sw/skip_hw). AFAIU, this is intended to run
on the transition between the new tc_act_bind(flags) returning true (aka
now gets bound to classifier) and tc_act_bind(act->tcfa_flags) returning
false (aka action was not bound to classifier before). Otherwise, the
check is skipped.

For the case where an action is not standalone, but rather it was
created by a classifier and is bound to it, tcf_action_init() skips the
check entirely, and this means it allows mismatched flags to occur.

Taking the matchall classifier code path as an example (with mirred as
an action), the reason is the following:

 1 | mall_change()
 2 | -> mall_replace_hw_filter()
 3 |   -> tcf_exts_validate_ex()
 4 |      -> flags |= TCA_ACT_FLAGS_BIND;
 5 |      -> tcf_action_init()
 6 |         -> tcf_action_init_1()
 7 |            -> a_o->init()
 8 |               -> tcf_mirred_init()
 9 |                  -> tcf_idr_create_from_flags()
10 |                     -> tcf_idr_create()
11 |                        -> p->tcfa_flags = flags;
12 |         -> tc_act_bind(flags))
13 |         -> tc_act_bind(act->tcfa_flags)

When invoked from tcf_exts_validate_ex() like matchall does (but other
classifiers validate their extensions as well), tcf_action_init() runs
in a call path where "flags" always contains TCA_ACT_FLAGS_BIND (set by
line 4). So line 12 is always true, and line 13 is always true as well.
No transition ever takes place, and the check is skipped.

The code was added in this form in commit c86e0209dc ("flow_offload:
validate flags of filter and actions"), but I'm attributing the blame
even earlier in that series, to when TCA_ACT_FLAGS_SKIP_HW and
TCA_ACT_FLAGS_SKIP_SW were added to the UAPI.

Following the development process of this change, the check did not
always exist in this form. A change took place between v3 [1] and v4 [2],
AFAIU due to review feedback that it doesn't make sense for action flags
to be different than classifier flags. I think I agree with that
feedback, but it was translated into code that omits enforcing this for
"classic" actions created at the same time with the filters themselves.

There are 3 more important cases to discuss. First there is this command:

$ tc qdisc add dev eth0 clasct
$ tc filter add dev eth0 ingress matchall skip_sw \
	action mirred ingress mirror dev eth1

which should be allowed, because prior to the concept of dedicated
action flags, it used to work and it used to mean the action inherited
the skip_sw/skip_hw flags from the classifier. It's not a mismatch.

Then we have this command:

$ tc qdisc add dev eth0 clasct
$ tc filter add dev eth0 ingress matchall skip_sw \
	action mirred ingress mirror dev eth1 skip_hw

where there is a mismatch and it should be rejected.

Finally, we have:

$ tc qdisc add dev eth0 clasct
$ tc filter add dev eth0 ingress matchall skip_sw \
	action mirred ingress mirror dev eth1 skip_sw

where the offload flags coincide, and this should be treated the same as
the first command based on inheritance, and accepted.

[1]: https://lore.kernel.org/netdev/20211028110646.13791-9-simon.horman@corigine.com/
[2]: https://lore.kernel.org/netdev/20211118130805.23897-10-simon.horman@corigine.com/
Fixes: 7adc576512 ("flow_offload: add skip_hw and skip_sw to control if offload the action")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20241017161049.3570037-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 11:31:27 +02:00
Antoine Tenart
124afe773b net: sysctl: allow dump_cpumask to handle higher numbers of CPUs
This fixes the output of rps_default_mask and flow_limit_cpu_bitmap when
the CPU count is > 448, as it was truncated.

The underlying values are actually stored correctly when writing to
these sysctl but displaying them uses a fixed length temporary buffer in
dump_cpumask. This buffer can be too small if the CPU count is > 448.

Fix this by dynamically allocating the buffer in dump_cpumask, using a
guesstimate of what we need.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 10:28:26 +02:00
Antoine Tenart
a8cc8fa145 net: sysctl: do not reserve an extra char in dump_cpumask temporary buffer
When computing the length we'll be able to use out of the buffers, one
char is removed from the temporary one to make room for a newline. It
should be removed from the output buffer length too, but in reality this
is not needed as the later call to scnprintf makes sure a null char is
written at the end of the buffer which we override with the newline.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 10:28:19 +02:00
Antoine Tenart
d631094e4d net: sysctl: remove always-true condition
Before adding a new line at the end of the temporary buffer in
dump_cpumask, a length check is performed to ensure there is space for
it.

  len = min(sizeof(kbuf) - 1, *lenp);
  len = scnprintf(kbuf, len, ...);
  if (len < *lenp)
          kbuf[len++] = '\n';

Note that the check is currently logically wrong, the written length is
compared against the output buffer, not the temporary one. However this
has no consequence as this is always true, even if fixed: scnprintf
includes a null char at the end of the buffer but the returned length do
not include it and there is always space for overriding it with a
newline.

Remove the condition.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 10:28:19 +02:00
Yajun Deng
6886c14bdc net: use sock_valbool_flag() only in __sock_set_timestamps()
sock_{,re}set_flag() are contained in sock_valbool_flag(),
it would be cleaner to just use sock_valbool_flag().

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Link: https://patch.msgid.link/20241017133435.2552-1-yajun.deng@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23 10:06:24 +02:00
Daniel Borkmann
14a3d3ef02 bpf: Remove MEM_UNINIT from skb/xdp MTU helpers
We can now undo parts of 4b3786a6c5 ("bpf: Zero former ARG_PTR_TO_{LONG,INT}
args in case of error") as discussed in [0].

Given the BPF helpers now have MEM_WRITE tag, the MEM_UNINIT can be cleared.

The mtu_len is an input as well as output argument, meaning, the BPF program
has to set it to something. It cannot be uninitialized. Therefore, allowing
uninitialized memory and zeroing it on error would be odd. It was done as
an interim step in 4b3786a6c5 as the desired behavior could not have been
expressed before the introduction of MEM_WRITE tag.

Fixes: 4b3786a6c5 ("bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/a86eb76d-f52f-dee4-e5d2-87e45de3e16f@iogearbox.net [0]
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241021152809.33343-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22 15:42:56 -07:00
Daniel Borkmann
6fad274f06 bpf: Add MEM_WRITE attribute
Add a MEM_WRITE attribute for BPF helper functions which can be used in
bpf_func_proto to annotate an argument type in order to let the verifier
know that the helper writes into the memory passed as an argument. In
the past MEM_UNINIT has been (ab)used for this function, but the latter
merely tells the verifier that the passed memory can be uninitialized.

There have been bugs with overloading the latter but aside from that
there are also cases where the passed memory is read + written which
currently cannot be expressed, see also 4b3786a6c5 ("bpf: Zero former
ARG_PTR_TO_{LONG,INT} args in case of error").

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241021152809.33343-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22 15:42:56 -07:00
Asbjørn Sloth Tønnesen
867d13a754 tools: ynl-gen: use big-endian netlink attribute types
Change ynl-gen-c.py to use NLA_BE16 and NLA_BE32 types to represent
big-endian u16 and u32 ynl types.

Doing this enables those attributes to have range checks applied, as
the validator will then convert to host endianness prior to validation.

The autogenerated kernel/uapi code have been regenerated by running:
  ./tools/net/ynl/ynl-regen.sh -f

This changes the policy types of the following attributes:

  FOU_ATTR_PORT (NLA_U16 -> NLA_BE16)
  FOU_ATTR_PEER_PORT (NLA_U16 -> NLA_BE16)
    These two are used with nla_get_be16/nla_put_be16().

  MPTCP_PM_ADDR_ATTR_ADDR4 (NLA_U32 -> NLA_BE32)
    This one is used with nla_get_in_addr/nla_put_in_addr(),
    which uses nla_get_be32/nla_put_be32().

IOWs the generated changes are AFAICT aligned with their implementations.

The generated userspace code remains identical, and have been verified
by comparing the output generated by the following command:
  make -C tools/net/ynl/generated

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241017094704.3222173-1-ast@fiberby.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 15:33:24 +02:00
Paolo Abeni
fa287557e6 netfilter pull request 24-10-21
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmcWIBEACgkQ1w0aZmrP
 KyHRmhAAvR7+UjYFSg9jK7rczu16+s2Uf6B0z9FqSo0VHJOmrAAJqhiAkTofy9yz
 WO3CGm5Fmzf6uSo88Y8ngHpbmZprot3NocIq2y4lJqHsmEA2PTy2Ig3DgXW7anku
 OQlDOXabjkuW223mWqba+aV2qsARQrjp9EwRb89Stuqc4hCoNn+8j7l48Tbu1pyc
 a8V5tIfoUxGf2nYkub4Ykf1TbKBuEZfuBQMZU8N5NATahi2SqCtbF5ztmkyJnGwW
 kg/aNfFAvnHJYhTwGMG2bYKdFczLgMxgf4PipDqfFWChIBC4OUr8Kl4+NIRoGu5S
 foGoGAQmsgc3bS9N06L+mr5GznW2XAkOvOpeSJNK9vBK5T/GiDN8XUd1P+/K3G/a
 W780PW4qIzIWmvIEZ7lmbvfrJHQrCMDYecOsagl9PLW7TuFJfodPAUGa2TwnEX5j
 ckERb5djVa//rzEYcLj9QHvqfRE3/T74Z/YfvkJiIVlso36hYVnTnD4tt8Yn64Lc
 u458E/V1/wYRar2boRExDWfMv8seGkpJvIefqetZqr7sBjh64Oeoijb4r9+YJJql
 mO39kOyYDRfgVxirZ/CglN/YPIwuDjFHUvkWJj4GyqixkyAo9FJTDYoPBihS0SFS
 ANmVukRcXAG+cPVMnzbJnXzUA0Q9ZMean7jY92hum57QRyinS/U=
 =cTcY
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================

This patchset contains Netfilter fixes for net:

1) syzkaller managed to triger UaF due to missing reference on netns in
   bpf infrastructure, from Florian Westphal.

2) Fix incorrect conversion from NFPROTO_UNSPEC to NFPROTO_{IPV4,IPV6}
   in the following xtables targets: MARK and NFLOG. Moreover, add
   missing

I have my half share in this mistake, I did not take the necessary time
to review this: For several years I have been struggling to keep working
on Netfilter, juggling a myriad of side consulting projects to stop
burning my own savings.

I have extended the iptables-tests.py test infrastructure to improve the
coverage of ip6tables and detect similar problems in the future.

This is a v2 including a extended PR with one more fix.

netfilter pull request 24-10-21

* tag 'nf-24-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: xtables: fix typo causing some targets not to load on IPv6
  netfilter: bpf: must hold reference on net namespace
====================

Link: https://patch.msgid.link/20241021094536.81487-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 12:43:42 +02:00
Kuniyuki Iwashima
6ab0f86694 rtnetlink: Protect struct rtnl_af_ops with SRCU.
Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to
guarantee that rtnl_af_ops is alive during inflight RTM_SETLINK
even when its module is being unloaded.

Let's use SRCU to protect ops.

rtnl_af_lookup() now iterates rtnl_af_ops under RCU and returns
SRCU-protected ops pointer.  The caller must call rtnl_af_put()
to release the pointer after the use.

Also, rtnl_af_unregister() unlinks the ops first and calls
synchronize_srcu() to wait for inflight RTM_SETLINK requests to
complete.

Note that rtnl_af_ops needs to be protected by its dedicated lock
when RTNL is removed.

Note also that BUG_ON() in do_setlink() is changed to the normal
error handling as a different af_ops might be found after
validate_linkmsg().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:05 +02:00
Kuniyuki Iwashima
26eebdc4b0 rtnetlink: Return int from rtnl_af_register().
The next patch will add init_srcu_struct() in rtnl_af_register(),
then we need to handle its error.

Let's add the error handling in advance to make the following
patch cleaner.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:05 +02:00
Kuniyuki Iwashima
a0b63c6457 rtnetlink: Call rtnl_link_get_net_capable() in do_setlink().
We will push RTNL down to rtnl_setlink().

RTM_SETLINK could call rtnl_link_get_net_capable() in do_setlink()
to move a dev to a new netns, but the netns needs to be fetched before
holding rtnl_net_lock().

Let's move it to rtnl_setlink() and pass the netns to do_setlink().

Now, RTM_NEWLINK paths (rtnl_changelink() and rtnl_group_changelink())
can pass the prefetched netns to do_setlink().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:05 +02:00
Kuniyuki Iwashima
6e495fad88 rtnetlink: Clean up rtnl_setlink().
We will push RTNL down to rtnl_setlink().

Let's unify the error path to make it easy to place rtnl_net_lock().

While at it, keep the variables in reverse xmas order.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:05 +02:00
Kuniyuki Iwashima
175cfc5cd3 rtnetlink: Clean up rtnl_dellink().
We will push RTNL down to rtnl_delink().

Let's unify the error path to make it easy to place rtnl_net_lock().

While at it, keep the variables in reverse xmas order.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:05 +02:00
Kuniyuki Iwashima
f7774eec20 rtnetlink: Fetch IFLA_LINK_NETNSID in rtnl_newlink().
Another netns option for RTM_NEWLINK is IFLA_LINK_NETNSID and
is fetched in rtnl_newlink_create().

This must be done before holding rtnl_net_lock().

Let's move IFLA_LINK_NETNSID processing to rtnl_newlink().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:05 +02:00
Kuniyuki Iwashima
0fef2a1212 rtnetlink: Call rtnl_link_get_net_capable() in rtnl_newlink().
As a prerequisite of per-netns RTNL, we must fetch netns before
looking up dev or moving it to another netns.

rtnl_link_get_net_capable() is called in rtnl_newlink_create() and
do_setlink(), but both of them need to be moved to the RTNL-independent
region, which will be rtnl_newlink().

Let's call rtnl_link_get_net_capable() in rtnl_newlink() and pass the
netns down to where needed.

Note that the latter two have not passed the nets to do_setlink() yet
but will do so after the remaining rtnl_link_get_net_capable() is moved
to rtnl_setlink() later.

While at it, dest_net is renamed to tgt_net in rtnl_newlink_create() to
align with rtnl_{del,set}link().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:05 +02:00
Kuniyuki Iwashima
43c7ce69d2 rtnetlink: Protect struct rtnl_link_ops with SRCU.
Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to
guarantee that rtnl_link_ops is alive during inflight RTM_NEWLINK
even when its module is being unloaded.

Let's use SRCU to protect ops.

rtnl_link_ops_get() now iterates link_ops under RCU and returns
SRCU-protected ops pointer.  The caller must call rtnl_link_ops_put()
to release the pointer after the use.

Also, __rtnl_link_unregister() unlinks the ops first and calls
synchronize_srcu() to wait for inflight RTM_NEWLINK requests to
complete.

Note that link_ops needs to be protected by its dedicated lock
when RTNL is removed.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:04 +02:00
Kuniyuki Iwashima
0d3008d1a9 rtnetlink: Move ops->validate to rtnl_newlink().
ops->validate() does not require RTNL.

Let's move it to rtnl_newlink().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:04 +02:00
Kuniyuki Iwashima
331fe31c50 rtnetlink: Move rtnl_link_ops_get() and retry to rtnl_newlink().
Currently, if neither dev nor rtnl_link_ops is found in __rtnl_newlink(),
we release RTNL and redo the whole process after request_module(), which
complicates the logic.

The ops will be RTNL-independent later.

Let's move the ops lookup to rtnl_newlink() and do the retry earlier.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:04 +02:00
Kuniyuki Iwashima
7fea1a8cb4 rtnetlink: Move simple validation from __rtnl_newlink() to rtnl_newlink().
We will push RTNL down to rtnl_newlink().

Let's move RTNL-independent validation to rtnl_newlink().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:04 +02:00
Kuniyuki Iwashima
cc47bcdf0d rtnetlink: Factorise do_setlink() path from __rtnl_newlink().
__rtnl_newlink() got too long to maintain.

For example, netdev_master_upper_dev_get()->rtnl_link_ops is fetched even
when IFLA_INFO_SLAVE_DATA is not specified.

Let's factorise the single dev do_setlink() path to a separate function.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:04 +02:00
Kuniyuki Iwashima
a5838cf9b2 rtnetlink: Call validate_linkmsg() in do_setlink().
There are 3 paths that finally call do_setlink(), and validate_linkmsg()
is called in each path.

  1. RTM_NEWLINK
    1-1. dev is found in __rtnl_newlink()
    1-2. dev isn't found, but IFLA_GROUP is specified in
          rtnl_group_changelink()
  2. RTM_SETLINK

The next patch factorises 1-1 to a separate function.

As a preparation, let's move validate_linkmsg() calls to do_setlink().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:04 +02:00
Kuniyuki Iwashima
fa8ef258da rtnetlink: Allocate linkinfo[] as struct rtnl_newlink_tbs.
We will move linkinfo to rtnl_newlink() and pass it down to other
functions.

Let's pack it into rtnl_newlink_tbs.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22 11:02:04 +02:00
Linus Torvalds
a360f311f5 9p: fix slab cache name creation for real
This was attempted by using the dev_name in the slab cache name, but as
Omar Sandoval pointed out, that can be an arbitrary string, eg something
like "/dev/root".  Which in turn trips verify_dirent_name(), which fails
if a filename contains a slash.

So just make it use a sequence counter, and make it an atomic_t to avoid
any possible races or locking issues.

Reported-and-tested-by: Omar Sandoval <osandov@fb.com>
Link: https://lore.kernel.org/all/ZxafcO8KWMlXaeWE@telecaster.dhcp.thefacebook.com/
Fixes: 79efebae4a ("9p: Avoid creating multiple slab caches with the same name")
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-21 15:41:29 -07:00
Eric Dumazet
95ecba62e2 net: fix races in netdev_tx_sent_queue()/dev_watchdog()
Some workloads hit the infamous dev_watchdog() message:

"NETDEV WATCHDOG: eth0 (xxxx): transmit queue XX timed out"

It seems possible to hit this even for perfectly normal
BQL enabled drivers:

1) Assume a TX queue was idle for more than dev->watchdog_timeo
   (5 seconds unless changed by the driver)

2) Assume a big packet is sent, exceeding current BQL limit.

3) Driver ndo_start_xmit() puts the packet in TX ring,
   and netdev_tx_sent_queue() is called.

4) QUEUE_STATE_STACK_XOFF could be set from netdev_tx_sent_queue()
   before txq->trans_start has been written.

5) txq->trans_start is written later, from netdev_start_xmit()

    if (rc == NETDEV_TX_OK)
          txq_trans_update(txq)

dev_watchdog() running on another cpu could read the old
txq->trans_start, and then see QUEUE_STATE_STACK_XOFF, because 5)
did not happen yet.

To solve the issue, write txq->trans_start right before one XOFF bit
is set :

- _QUEUE_STATE_DRV_XOFF from netif_tx_stop_queue()
- __QUEUE_STATE_STACK_XOFF from netdev_tx_sent_queue()

From dev_watchdog(), we have to read txq->state before txq->trans_start.

Add memory barriers to enforce correct ordering.

In the future, we could avoid writing over txq->trans_start for normal
operations, and rename this field to txq->xoff_start_time.

Fixes: bec251bc8b ("net: no longer stop all TX queues in dev_watchdog()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241015194118.3951657-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21 12:54:25 +02:00
Pablo Neira Ayuso
306ed1728e netfilter: xtables: fix typo causing some targets not to load on IPv6
- There is no NFPROTO_IPV6 family for mark and NFLOG.
- TRACE is also missing module autoload with NFPROTO_IPV6.

This results in ip6tables failing to restore a ruleset. This issue has been
reported by several users providing incomplete patches.

Very similar to Ilya Katsnelson's patch including a missing chunk in the
TRACE extension.

Fixes: 0bfcb7b71e ("netfilter: xtables: avoid NFPROTO_UNSPEC where needed")
Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Reported-by: Ilya Katsnelson <me@0upti.me>
Reported-by: Krzysztof Olędzki <ole@ans.pl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-21 11:31:26 +02:00
Paolo Abeni
91afa49a3e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.12-rc4).

Conflicts:

107a034d5c ("net/mlx5: qos: Store rate groups in a qos domain")
1da9cfd6c4 ("net/mlx5: Unregister notifier on eswitch init failure")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21 09:14:18 +02:00
Linus Torvalds
d7f513ae7b bluetooth pull request for net:
- ISO: Fix multiple init when debugfs is disabled
  - Call iso_exit() on module unload
  - Remove debugfs directory on module init failure
  - btusb: Fix not being able to reconnect after suspend
  - btusb: Fix regression with fake CSR controllers 0a12:0001
  - bnep: fix wild-memory-access in proto_unregister
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmcQJGUZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKeWiD/9WIrQRp7K10hc7Q9vjyex+
 DA+yv3EaFbqk//BIXOAg3En+HjhrDeaIkrt0L3m5D+/Y5K/oGQeO0ffOu4aF3So2
 D4Q8ZDQHXJ7pxG2hVMHbBBUikiKbt8HATMdQDutJnDtIpxxtB8ugTMm/pRG57Nq8
 QjWdy5h2aYm5NwPuu/ErY26UpCljMoOrKAMiWQ539AkxQJGVN4hp4l6mVrZdnjc5
 MmL1+S1r/S3PBxP5Mw3VX58c7J1Ki28d80BF+1C4qDGjKXdLoqYpCy2CRyiqlFSo
 9yU37Uh1AeobbWGC8/7dIKN3PoXe1UQRhF56EdWAeTbz+yr0c8sRYsOs9SGEyWlj
 +PWFkpXF8Dpxs96BHf0iLy4XxtByJsdmGaUSSmdktUK1WWEOa8I51xtR0xVswOqr
 BXZ9ATNDZjed5DFItwyj3YYoh0L7Bid594ZuAbxHRUCOnaFNR3DZbNVe2+uZfo5l
 Lc2L/Vfa4mwuQtH0psMvXAvvywuGNGv9cjK5PwVyRwqsppcco5oV9LckmGMMnWoO
 xsU+/IMljHORL0HPvL+RaSBDprj7GgA7nDEuv7zyYMC/uWHv5olUvLNEdN/LtCaB
 S9mVlBpkB2yxUT8FvZg71UEK0FVix2IaFcSF8FZcJqHw+ureRH0mhwe7BUytMvH1
 7IbfeYSu7h6fr6wnGrW8dQ==
 =k8sF
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Pull bluetooth fixes from Luiz Augusto Von Dentz:

 - ISO: Fix multiple init when debugfs is disabled

 - Call iso_exit() on module unload

 - Remove debugfs directory on module init failure

 - btusb: Fix not being able to reconnect after suspend

 - btusb: Fix regression with fake CSR controllers 0a12:0001

 - bnep: fix wild-memory-access in proto_unregister

Note: normally the bluetooth fixes go through the networking tree, but
this missed the weekly merge, and two of the commits fix regressions
that have caused a fair amount of noise and have now hit stable too:

  https://lore.kernel.org/all/4e1977ca-6166-4891-965e-34a6f319035f@leemhuis.info/

So I'm pulling it directly just to expedite things and not miss yet
another -rc release. This is not meant to become a new pattern.

* tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001
  Bluetooth: bnep: fix wild-memory-access in proto_unregister
  Bluetooth: btusb: Fix not being able to reconnect after suspend
  Bluetooth: Remove debugfs directory on module init failure
  Bluetooth: Call iso_exit() on module unload
  Bluetooth: ISO: Fix multiple init when debugfs is disabled
2024-10-20 14:08:17 -07:00
Linus Torvalds
9197b73fd7 Mashed-up update that I sat on too long:
- fix for multiple slabs created with the same name
 - enable multipage folios
 - theorical fix to also look for opened fids by inode if none
 was found by dentry
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmcS81AACgkQq06b7GqY
 5nACpBAAtXOGRjg+dushCwUVKBlnI3oTwE2G+ywnphNZg2A0emlMOxos7x1OTiM3
 Fu0b10MCUWHIXo4jD6ALVPWITJTfjiXR8s90Q/ozypcIXXhkDDShhV31b2h6Iplr
 YyKyjEehDFRiS7rqWC2a9mce99sOpwdQRmnssnWbjYvpJ4imFbl+50Z1I5Nc/Omu
 j2y02eMuikiWF/shKj0Dx1mmpZ4InSv3kvlM+V2D2YdWKNonGZe/xFZhid95LXmr
 Upt55R8k9qR2pn4VU22eKP6c34DIZGDlrcQdPUCNP5QuaAdGZov3TjNQdjE1bJmF
 E2QdxvUNfvvHqlvaRrlWa27uMgXMcy7QV3LEKwmo3tmaYVw2PDMRbFXc9zQdxy91
 zqXjjGasnwzE8ca36y79vZjFTHAyY5VK/3cHCL3ai+ysu4UL3k2QgmVegREG/xKk
 G8Nz4UO/R6s8Wc2VqxKJdZS5NMLlADS+Aes0PG+9AxQz7iR9Ktgwrw39KDxMi+Lm
 PeH3Gz2rP9+EPoa3usoBQtvvvmJKM/Wb9qdPW9vTtRbRJ7bVclJoizFoLMA/TiW1
 Jru+HYGBO75s8RynwEDLMiJhkjZWHfVgDjPsY6YsGVH8W2gOcJ7egQ2J2EsuurN3
 tzKz4uQilV+VeDuWs8pWKrX/c3Y3KpSYV+oayg7Je7LoTlQBmU8=
 =VG4t
 -----END PGP SIGNATURE-----

Merge tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux

Pull 9p fixes from Dominique Martinet:
 "Mashed-up update that I sat on too long:

   - fix for multiple slabs created with the same name

   - enable multipage folios

   - theorical fix to also look for opened fids by inode if none was
     found by dentry"

[ Enabling multi-page folios should have been done during the merge
  window, but it's a one-liner, and the actual meat of the enablement
  is in netfs and already in use for other filesystems...  - Linus ]

* tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux:
  9p: Avoid creating multiple slab caches with the same name
  9p: Enable multipage folios
  9p: v9fs_fid_find: also lookup by inode if not found dentry
2024-10-19 08:44:10 -07:00
Linus Torvalds
3d5ad2d4ec BPF fixes:
- Fix BPF verifier to not affect subreg_def marks in its range
   propagation, from Eduard Zingerman.
 
 - Fix a truncation bug in the BPF verifier's handling of
   coerce_reg_to_size_sx, from Dimitar Kanaliev.
 
 - Fix the BPF verifier's delta propagation between linked
   registers under 32-bit addition, from Daniel Borkmann.
 
 - Fix a NULL pointer dereference in BPF devmap due to missing
   rxq information, from Florian Kauer.
 
 - Fix a memory leak in bpf_core_apply, from Jiri Olsa.
 
 - Fix an UBSAN-reported array-index-out-of-bounds in BTF
   parsing for arrays of nested structs, from Hou Tao.
 
 - Fix build ID fetching where memory areas backing the file
   were created with memfd_secret, from Andrii Nakryiko.
 
 - Fix BPF task iterator tid filtering which was incorrectly
   using pid instead of tid, from Jordan Rome.
 
 - Several fixes for BPF sockmap and BPF sockhash redirection
   in combination with vsocks, from Michal Luczaj.
 
 - Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered,
   from Andrea Parri.
 
 - Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the
   possibility of an infinite BPF tailcall, from Pu Lehui.
 
 - Fix a build warning from resolve_btfids that bpf_lsm_key_free
   cannot be resolved, from Thomas Weißschuh.
 
 - Fix a bug in kfunc BTF caching for modules where the wrong
   BTF object was returned, from Toke Høiland-Jørgensen.
 
 - Fix a BPF selftest compilation error in cgroup-related tests
   with musl libc, from Tony Ambardar.
 
 - Several fixes to BPF link info dumps to fill missing fields,
   from Tyrone Wu.
 
 - Add BPF selftests for kfuncs from multiple modules, checking
   that the correct kfuncs are called, from Simon Sundberg.
 
 - Ensure that internal and user-facing bpf_redirect flags
   don't overlap, also from Toke Høiland-Jørgensen.
 
 - Switch to use kvzmalloc to allocate BPF verifier environment,
   from Rik van Riel.
 
 - Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic
   splat under RT, from Wander Lairson Costa.
 
 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZxK4OhUcZGFuaWVsQGlv
 Z2VhcmJveC5uZXQACgkQ2yufC7HISIOCrwEAib2kC5EEQn5+wKVE/bnZryVX2leT
 YXdfItDCBU6zCYUA+wTU5hGGn9lcDUcZx72l/KZPDyPw7HdzNJ+6iR1zQqoM
 =f9kv
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Daniel Borkmann:

 - Fix BPF verifier to not affect subreg_def marks in its range
   propagation (Eduard Zingerman)

 - Fix a truncation bug in the BPF verifier's handling of
   coerce_reg_to_size_sx (Dimitar Kanaliev)

 - Fix the BPF verifier's delta propagation between linked registers
   under 32-bit addition (Daniel Borkmann)

 - Fix a NULL pointer dereference in BPF devmap due to missing rxq
   information (Florian Kauer)

 - Fix a memory leak in bpf_core_apply (Jiri Olsa)

 - Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for
   arrays of nested structs (Hou Tao)

 - Fix build ID fetching where memory areas backing the file were
   created with memfd_secret (Andrii Nakryiko)

 - Fix BPF task iterator tid filtering which was incorrectly using pid
   instead of tid (Jordan Rome)

 - Several fixes for BPF sockmap and BPF sockhash redirection in
   combination with vsocks (Michal Luczaj)

 - Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri)

 - Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility
   of an infinite BPF tailcall (Pu Lehui)

 - Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot
   be resolved (Thomas Weißschuh)

 - Fix a bug in kfunc BTF caching for modules where the wrong BTF object
   was returned (Toke Høiland-Jørgensen)

 - Fix a BPF selftest compilation error in cgroup-related tests with
   musl libc (Tony Ambardar)

 - Several fixes to BPF link info dumps to fill missing fields (Tyrone
   Wu)

 - Add BPF selftests for kfuncs from multiple modules, checking that the
   correct kfuncs are called (Simon Sundberg)

 - Ensure that internal and user-facing bpf_redirect flags don't overlap
   (Toke Høiland-Jørgensen)

 - Switch to use kvzmalloc to allocate BPF verifier environment (Rik van
   Riel)

 - Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat
   under RT (Wander Lairson Costa)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits)
  lib/buildid: Handle memfd_secret() files in build_id_parse()
  selftests/bpf: Add test case for delta propagation
  bpf: Fix print_reg_state's constant scalar dump
  bpf: Fix incorrect delta propagation between linked registers
  bpf: Properly test iter/task tid filtering
  bpf: Fix iter/task tid filtering
  riscv, bpf: Make BPF_CMPXCHG fully ordered
  bpf, vsock: Drop static vsock_bpf_prot initialization
  vsock: Update msg_count on read_skb()
  vsock: Update rx_bytes on read_skb()
  bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
  selftests/bpf: Add asserts for netfilter link info
  bpf: Fix link info netfilter flags to populate defrag flag
  selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx()
  selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
  bpf: Fix truncation bug in coerce_reg_to_size_sx()
  selftests/bpf: Assert link info uprobe_multi count & path_size if unset
  bpf: Fix unpopulated path_size when uprobe_multi fields unset
  selftests/bpf: Fix cross-compiling urandom_read
  selftests/bpf: Add test for kfunc module order
  ...
2024-10-18 16:27:14 -07:00
Russell King (Oracle)
ecb595ebba net: dsa: remove dsa_port_phylink_mac_select_pcs()
There is no longer any reason to implement the mac_select_pcs()
callback in DSA. Returning ERR_PTR(-EOPNOTSUPP) is functionally
equivalent to not providing the function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-17 18:15:14 -05:00
Linus Torvalds
07d6bf634b No contributions from subtrees.
Current release - new code bugs:
 
   - eth: mlx5: HWS, don't destroy more bwc queue locks than allocated
 
 Previous releases - regressions:
 
   - ipv4: give an IPv4 dev to blackhole_netdev
 
   - udp: compute L4 checksum as usual when not segmenting the skb
 
   - tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().
 
   - eth: mlx5e: don't call cleanup on profile rollback failure
 
   - eth: microchip: vcap api: fix memory leaks in vcap_api_encode_rule_test()
 
   - eth: enetc: disable Tx BD rings after they are empty
 
   - eth: macb: avoid 20s boot delay by skipping MDIO bus registration for fixed-link PHY
 
 Previous releases - always broken:
 
   - posix-clock: fix missing timespec64 check in pc_clock_settime()
 
   - genetlink: hold RCU in genlmsg_mcast()
 
   - mptcp: prevent MPC handshake on port-based signal endpoints
 
   - eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame
 
   - eth: stmmac: dwmac-tegra: fix link bring-up sequence
 
   - eth: bcmasp: fix potential memory leak in bcmasp_xmit()
 
 Misc:
 
   - add Andrew Lunn as a co-maintainer of all networking drivers
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmcRDoMSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOklXoP/2VKcUDYMfP03vB6a60riiqMcD+GVZzm
 lLaa9xBiDdEsnL+4dcQLF7tecYbqpIqh+0ZeEe0aXgp0HjIm2Im1eiXv5cB7INxK
 n1lxVibI/zD2j2M9cy2NDTeeYSI29GH98g0IkLrU9vnfsrp5jRuXFttrCmotzesZ
 tYb200cVMR/nt9rrG3aNAxTUHjaykgpKh/zSvFC0MRStXFfUbIia88LzcQOQzEK3
 jcWMj8jsJHkDLhUHS8LVZEV1JDYIb+QeEjGz5LxMpQV5/T5sP7Ki4ISPxpUbYNrZ
 QWwSrFSg2ivaq8PkQ2LBTKAtiHyGlcEJtnlSf7Y50cpc9RNphClq8YMSBUCcGTEi
 3W18eVIZ0aj1omTeHEQLbMkqT0soTwYxskDW0uCCFKf/SRCQm1ixrpQiA6PGP266
 e0q7mMD7v4S9ZdO5VF+oYzf1fF0OhaOkZtUEsWjxHite6ujh8719EPbUoee4cqPL
 ofoHYF338BlYl4YIZERZMff8HJD4+PR0R8c9SIBXQcGUMvf2mQdvaD9q2MGySSJd
 4mlH6bE8Ckj4KVp9llxB8zyw/5OmJdMZ+ar4o7+CX2Z3Wrs1SMSTy7qAADcV2B2G
 S9kvwNDffak+N94N/MqswQM4se5yo8dmyUnhqChBFYPFWc4N7v6wwKelROigqLL7
 7Xhj2TXRBGKs
 =ukTi
 -----END PGP SIGNATURE-----

Merge tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Current release - new code bugs:

   - eth: mlx5: HWS, don't destroy more bwc queue locks than allocated

  Previous releases - regressions:

   - ipv4: give an IPv4 dev to blackhole_netdev

   - udp: compute L4 checksum as usual when not segmenting the skb

   - tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().

   - eth: mlx5e: don't call cleanup on profile rollback failure

   - eth: microchip: vcap api: fix memory leaks in
     vcap_api_encode_rule_test()

   - eth: enetc: disable Tx BD rings after they are empty

   - eth: macb: avoid 20s boot delay by skipping MDIO bus registration
     for fixed-link PHY

  Previous releases - always broken:

   - posix-clock: fix missing timespec64 check in pc_clock_settime()

   - genetlink: hold RCU in genlmsg_mcast()

   - mptcp: prevent MPC handshake on port-based signal endpoints

   - eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame

   - eth: stmmac: dwmac-tegra: fix link bring-up sequence

   - eth: bcmasp: fix potential memory leak in bcmasp_xmit()

  Misc:

   - add Andrew Lunn as a co-maintainer of all networking drivers"

* tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  net/mlx5e: Don't call cleanup on profile rollback failure
  net/mlx5: Unregister notifier on eswitch init failure
  net/mlx5: Fix command bitmask initialization
  net/mlx5: Check for invalid vector index on EQ creation
  net/mlx5: HWS, use lock classes for bwc locks
  net/mlx5: HWS, don't destroy more bwc queue locks than allocated
  net/mlx5: HWS, fixed double free in error flow of definer layout
  net/mlx5: HWS, removed wrong access to a number of rules variable
  mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
  net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
  vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
  net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
  net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
  net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
  net: phy: mdio-bcm-unimac: Add BCM6846 support
  dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
  udp: Compute L4 checksum as usual when not segmenting the skb
  genetlink: hold RCU in genlmsg_mcast()
  net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
  tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
  ...
2024-10-17 09:31:18 -07:00
Florian Westphal
1230fe7ad3 netfilter: bpf: must hold reference on net namespace
BUG: KASAN: slab-use-after-free in __nf_unregister_net_hook+0x640/0x6b0
Read of size 8 at addr ffff8880106fe400 by task repro/72=
bpf_nf_link_release+0xda/0x1e0
bpf_link_free+0x139/0x2d0
bpf_link_release+0x68/0x80
__fput+0x414/0xb60

Eric says:
 It seems that bpf was able to defer the __nf_unregister_net_hook()
 after exit()/close() time.
 Perhaps a netns reference is missing, because the netns has been
 dismantled/freed already.
 bpf_nf_link_attach() does :
 link->net = net;
 But I do not see a reference being taken on net.

Add such a reference and release it after hook unreg.
Note that I was unable to get syzbot reproducer to work, so I
do not know if this resolves this splat.

Fixes: 84601d6ee6 ("bpf: add bpf_link support for BPF_NETFILTER programs")
Diagnosed-by: Eric Dumazet <edumazet@google.com>
Reported-by: Lai, Yi <yi1.lai@linux.intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-17 13:58:57 +02:00
Michal Luczaj
19039f2797 bpf, vsock: Drop static vsock_bpf_prot initialization
vsock_bpf_prot is set up at runtime. Remove the superfluous init.

No functional change intended.

Fixes: 634f1a7110 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-4-d6577bbfe742@rbox.co
2024-10-17 13:02:55 +02:00
Michal Luczaj
6dafde852d vsock: Update msg_count on read_skb()
Dequeuing via vsock_transport::read_skb() left msg_count outdated, which
then confused SOCK_SEQPACKET recv(). Decrease the counter.

Fixes: 634f1a7110 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-3-d6577bbfe742@rbox.co
2024-10-17 13:02:54 +02:00
Michal Luczaj
3543152f2d vsock: Update rx_bytes on read_skb()
Make sure virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt()
calls are balanced (i.e. virtio_vsock_sock::rx_bytes doesn't lie) after
vsock_transport::read_skb().

While here, also inform the peer that we've freed up space and it has more
credit.

Failing to update rx_bytes after packet is dequeued leads to a warning on
SOCK_STREAM recv():

[  233.396654] rx_queue is empty, but rx_bytes is non-zero
[  233.396702] WARNING: CPU: 11 PID: 40601 at net/vmw_vsock/virtio_transport_common.c:589

Fixes: 634f1a7110 ("vsock: support sockmap")
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-2-d6577bbfe742@rbox.co
2024-10-17 13:02:54 +02:00
Michal Luczaj
9c5bd93edf bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
Don't mislead the callers of bpf_{sk,msg}_redirect_{map,hash}(): make sure
to immediately and visibly fail the forwarding of unsupported af_vsock
packets.

Fixes: 634f1a7110 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-1-d6577bbfe742@rbox.co
2024-10-17 13:02:54 +02:00
Matthieu Baerts (NGI0)
7decd1f590 mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
Syzkaller reported this splat:

  ==================================================================
  BUG: KASAN: slab-use-after-free in mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
  Read of size 4 at addr ffff8880569ac858 by task syz.1.2799/14662

  CPU: 0 UID: 0 PID: 14662 Comm: syz.1.2799 Not tainted 6.12.0-rc2-syzkaller-00307-g36c254515dc6 #0
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
  Call Trace:
   <TASK>
   __dump_stack lib/dump_stack.c:94 [inline]
   dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
   print_address_description mm/kasan/report.c:377 [inline]
   print_report+0xc3/0x620 mm/kasan/report.c:488
   kasan_report+0xd9/0x110 mm/kasan/report.c:601
   mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
   mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
   mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
   mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
   genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
   genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
   genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
   netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
   genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
   netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
   netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
   netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
   sock_sendmsg_nosec net/socket.c:729 [inline]
   __sock_sendmsg net/socket.c:744 [inline]
   ____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
   ___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
   __sys_sendmsg+0x117/0x1f0 net/socket.c:2690
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e
  RIP: 0023:0xf7fe4579
  Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
  RSP: 002b:00000000f574556c EFLAGS: 00000296 ORIG_RAX: 0000000000000172
  RAX: ffffffffffffffda RBX: 000000000000000b RCX: 0000000020000140
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
   </TASK>

  Allocated by task 5387:
   kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
   kasan_save_track+0x14/0x30 mm/kasan/common.c:68
   poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
   __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394
   kmalloc_noprof include/linux/slab.h:878 [inline]
   kzalloc_noprof include/linux/slab.h:1014 [inline]
   subflow_create_ctx+0x87/0x2a0 net/mptcp/subflow.c:1803
   subflow_ulp_init+0xc3/0x4d0 net/mptcp/subflow.c:1956
   __tcp_set_ulp net/ipv4/tcp_ulp.c:146 [inline]
   tcp_set_ulp+0x326/0x7f0 net/ipv4/tcp_ulp.c:167
   mptcp_subflow_create_socket+0x4ae/0x10a0 net/mptcp/subflow.c:1764
   __mptcp_subflow_connect+0x3cc/0x1490 net/mptcp/subflow.c:1592
   mptcp_pm_create_subflow_or_signal_addr+0xbda/0x23a0 net/mptcp/pm_netlink.c:642
   mptcp_pm_nl_fully_established net/mptcp/pm_netlink.c:650 [inline]
   mptcp_pm_nl_work+0x3a1/0x4f0 net/mptcp/pm_netlink.c:943
   mptcp_worker+0x15a/0x1240 net/mptcp/protocol.c:2777
   process_one_work+0x958/0x1b30 kernel/workqueue.c:3229
   process_scheduled_works kernel/workqueue.c:3310 [inline]
   worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391
   kthread+0x2c1/0x3a0 kernel/kthread.c:389
   ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

  Freed by task 113:
   kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
   kasan_save_track+0x14/0x30 mm/kasan/common.c:68
   kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:579
   poison_slab_object mm/kasan/common.c:247 [inline]
   __kasan_slab_free+0x51/0x70 mm/kasan/common.c:264
   kasan_slab_free include/linux/kasan.h:230 [inline]
   slab_free_hook mm/slub.c:2342 [inline]
   slab_free mm/slub.c:4579 [inline]
   kfree+0x14f/0x4b0 mm/slub.c:4727
   kvfree+0x47/0x50 mm/util.c:701
   kvfree_rcu_list+0xf5/0x2c0 kernel/rcu/tree.c:3423
   kvfree_rcu_drain_ready kernel/rcu/tree.c:3563 [inline]
   kfree_rcu_monitor+0x503/0x8b0 kernel/rcu/tree.c:3632
   kfree_rcu_shrink_scan+0x245/0x3a0 kernel/rcu/tree.c:3966
   do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435
   shrink_slab+0x32b/0x12a0 mm/shrinker.c:662
   shrink_one+0x47e/0x7b0 mm/vmscan.c:4818
   shrink_many mm/vmscan.c:4879 [inline]
   lru_gen_shrink_node mm/vmscan.c:4957 [inline]
   shrink_node+0x2452/0x39d0 mm/vmscan.c:5937
   kswapd_shrink_node mm/vmscan.c:6765 [inline]
   balance_pgdat+0xc19/0x18f0 mm/vmscan.c:6957
   kswapd+0x5ea/0xbf0 mm/vmscan.c:7226
   kthread+0x2c1/0x3a0 kernel/kthread.c:389
   ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

  Last potentially related work creation:
   kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
   __kasan_record_aux_stack+0xba/0xd0 mm/kasan/generic.c:541
   kvfree_call_rcu+0x74/0xbe0 kernel/rcu/tree.c:3810
   subflow_ulp_release+0x2ae/0x350 net/mptcp/subflow.c:2009
   tcp_cleanup_ulp+0x7c/0x130 net/ipv4/tcp_ulp.c:124
   tcp_v4_destroy_sock+0x1c5/0x6a0 net/ipv4/tcp_ipv4.c:2541
   inet_csk_destroy_sock+0x1a3/0x440 net/ipv4/inet_connection_sock.c:1293
   tcp_done+0x252/0x350 net/ipv4/tcp.c:4870
   tcp_rcv_state_process+0x379b/0x4f30 net/ipv4/tcp_input.c:6933
   tcp_v4_do_rcv+0x1ad/0xa90 net/ipv4/tcp_ipv4.c:1938
   sk_backlog_rcv include/net/sock.h:1115 [inline]
   __release_sock+0x31b/0x400 net/core/sock.c:3072
   __tcp_close+0x4f3/0xff0 net/ipv4/tcp.c:3142
   __mptcp_close_ssk+0x331/0x14d0 net/mptcp/protocol.c:2489
   mptcp_close_ssk net/mptcp/protocol.c:2543 [inline]
   mptcp_close_ssk+0x150/0x220 net/mptcp/protocol.c:2526
   mptcp_pm_nl_rm_addr_or_subflow+0x2be/0xcc0 net/mptcp/pm_netlink.c:878
   mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
   mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
   mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
   genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
   genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
   genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
   netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
   genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
   netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
   netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
   netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
   sock_sendmsg_nosec net/socket.c:729 [inline]
   __sock_sendmsg net/socket.c:744 [inline]
   ____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
   ___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
   __sys_sendmsg+0x117/0x1f0 net/socket.c:2690
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e

  The buggy address belongs to the object at ffff8880569ac800
   which belongs to the cache kmalloc-512 of size 512
  The buggy address is located 88 bytes inside of
   freed 512-byte region [ffff8880569ac800, ffff8880569aca00)

  The buggy address belongs to the physical page:
  page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x569ac
  head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
  flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff)
  page_type: f5(slab)
  raw: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
  raw: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
  head: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
  head: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
  head: 04fff00000000002 ffffea00015a6b01 ffffffffffffffff 0000000000000000
  head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: kasan: bad access detected
  page_owner tracks the page as allocated
  page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 10238, tgid 10238 (kworker/u32:6), ts 597403252405, free_ts 597177952947
   set_page_owner include/linux/page_owner.h:32 [inline]
   post_alloc_hook+0x2d1/0x350 mm/page_alloc.c:1537
   prep_new_page mm/page_alloc.c:1545 [inline]
   get_page_from_freelist+0x101e/0x3070 mm/page_alloc.c:3457
   __alloc_pages_noprof+0x223/0x25a0 mm/page_alloc.c:4733
   alloc_pages_mpol_noprof+0x2c9/0x610 mm/mempolicy.c:2265
   alloc_slab_page mm/slub.c:2412 [inline]
   allocate_slab mm/slub.c:2578 [inline]
   new_slab+0x2ba/0x3f0 mm/slub.c:2631
   ___slab_alloc+0xd1d/0x16f0 mm/slub.c:3818
   __slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
   __slab_alloc_node mm/slub.c:3961 [inline]
   slab_alloc_node mm/slub.c:4122 [inline]
   __kmalloc_cache_noprof+0x2c5/0x310 mm/slub.c:4290
   kmalloc_noprof include/linux/slab.h:878 [inline]
   kzalloc_noprof include/linux/slab.h:1014 [inline]
   mld_add_delrec net/ipv6/mcast.c:743 [inline]
   igmp6_leave_group net/ipv6/mcast.c:2625 [inline]
   igmp6_group_dropped+0x4ab/0xe40 net/ipv6/mcast.c:723
   __ipv6_dev_mc_dec+0x281/0x360 net/ipv6/mcast.c:979
   addrconf_leave_solict net/ipv6/addrconf.c:2253 [inline]
   __ipv6_ifa_notify+0x3f6/0xc30 net/ipv6/addrconf.c:6283
   addrconf_ifdown.isra.0+0xef9/0x1a20 net/ipv6/addrconf.c:3982
   addrconf_notify+0x220/0x19c0 net/ipv6/addrconf.c:3781
   notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
   call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1996
   call_netdevice_notifiers_extack net/core/dev.c:2034 [inline]
   call_netdevice_notifiers net/core/dev.c:2048 [inline]
   dev_close_many+0x333/0x6a0 net/core/dev.c:1589
  page last free pid 13136 tgid 13136 stack trace:
   reset_page_owner include/linux/page_owner.h:25 [inline]
   free_pages_prepare mm/page_alloc.c:1108 [inline]
   free_unref_page+0x5f4/0xdc0 mm/page_alloc.c:2638
   stack_depot_save_flags+0x2da/0x900 lib/stackdepot.c:666
   kasan_save_stack+0x42/0x60 mm/kasan/common.c:48
   kasan_save_track+0x14/0x30 mm/kasan/common.c:68
   unpoison_slab_object mm/kasan/common.c:319 [inline]
   __kasan_slab_alloc+0x89/0x90 mm/kasan/common.c:345
   kasan_slab_alloc include/linux/kasan.h:247 [inline]
   slab_post_alloc_hook mm/slub.c:4085 [inline]
   slab_alloc_node mm/slub.c:4134 [inline]
   kmem_cache_alloc_noprof+0x121/0x2f0 mm/slub.c:4141
   skb_clone+0x190/0x3f0 net/core/skbuff.c:2084
   do_one_broadcast net/netlink/af_netlink.c:1462 [inline]
   netlink_broadcast_filtered+0xb11/0xef0 net/netlink/af_netlink.c:1540
   netlink_broadcast+0x39/0x50 net/netlink/af_netlink.c:1564
   uevent_net_broadcast_untagged lib/kobject_uevent.c:331 [inline]
   kobject_uevent_net_broadcast lib/kobject_uevent.c:410 [inline]
   kobject_uevent_env+0xacd/0x1670 lib/kobject_uevent.c:608
   device_del+0x623/0x9f0 drivers/base/core.c:3882
   snd_card_disconnect.part.0+0x58a/0x7c0 sound/core/init.c:546
   snd_card_disconnect+0x1f/0x30 sound/core/init.c:495
   snd_usx2y_disconnect+0xe9/0x1f0 sound/usb/usx2y/usbusx2y.c:417
   usb_unbind_interface+0x1e8/0x970 drivers/usb/core/driver.c:461
   device_remove drivers/base/dd.c:569 [inline]
   device_remove+0x122/0x170 drivers/base/dd.c:561

That's because 'subflow' is used just after 'mptcp_close_ssk(subflow)',
which will initiate the release of its memory. Even if it is very likely
the release and the re-utilisation will be done later on, it is of
course better to avoid any issues and read the content of 'subflow'
before closing it.

Fixes: 1c1f721375 ("mptcp: pm: only decrement add_addr_accepted for MPJ req")
Cc: stable@vger.kernel.org
Reported-by: syzbot+3c8b7a8e7df6a2a226ca@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/670d7337.050a0220.4cbc0.004f.GAE@google.com
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20241015-net-mptcp-uaf-pm-rm-v1-1-c4ee5d987a64@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-17 12:06:55 +02:00
Daniel Zahka
42dc431f5d ethtool: rss: prevent rss ctx deletion when in use
ntuple filters can specify an rss context to use for packet hashing
and queue selection. When a filter is referencing an rss context, it
should be invalid for that context to be deleted. A list of active
ntuple filters and their associated rss contexts can be compiled by
querying a device's ethtool_ops.get_rxnfc. This patch checks to see if
any ntuple filters are referencing an rss context during context
deletion, and prevents the deletion if the requested context is still
in use.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-17 10:22:01 +02:00
Ye Bin
64a90991ba Bluetooth: bnep: fix wild-memory-access in proto_unregister
There's issue as follows:
  KASAN: maybe wild-memory-access in range [0xdead...108-0xdead...10f]
  CPU: 3 UID: 0 PID: 2805 Comm: rmmod Tainted: G        W
  RIP: 0010:proto_unregister+0xee/0x400
  Call Trace:
   <TASK>
   __do_sys_delete_module+0x318/0x580
   do_syscall_64+0xc1/0x1d0
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

As bnep_init() ignore bnep_sock_init()'s return value, and bnep_sock_init()
will cleanup all resource. Then when remove bnep module will call
bnep_sock_cleanup() to cleanup sock's resource.
To solve above issue just return bnep_sock_init()'s return value in
bnep_exit().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-16 16:10:03 -04:00
Aaron Thompson
1db4564f10 Bluetooth: Remove debugfs directory on module init failure
If bt_init() fails, the debugfs directory currently is not removed. If
the module is loaded again after that, the debugfs directory is not set
up properly due to the existing directory.

  # modprobe bluetooth
  # ls -laF /sys/kernel/debug/bluetooth
  total 0
  drwxr-xr-x  2 root root 0 Sep 27 14:26 ./
  drwx------ 31 root root 0 Sep 27 14:25 ../
  -r--r--r--  1 root root 0 Sep 27 14:26 l2cap
  -r--r--r--  1 root root 0 Sep 27 14:26 sco
  # modprobe -r bluetooth
  # ls -laF /sys/kernel/debug/bluetooth
  ls: cannot access '/sys/kernel/debug/bluetooth': No such file or directory
  #

  # modprobe bluetooth
  modprobe: ERROR: could not insert 'bluetooth': Invalid argument
  # dmesg | tail -n 6
  Bluetooth: Core ver 2.22
  NET: Registered PF_BLUETOOTH protocol family
  Bluetooth: HCI device and connection manager initialized
  Bluetooth: HCI socket layer initialized
  Bluetooth: Faking l2cap_init() failure for testing
  NET: Unregistered PF_BLUETOOTH protocol family
  # ls -laF /sys/kernel/debug/bluetooth
  total 0
  drwxr-xr-x  2 root root 0 Sep 27 14:31 ./
  drwx------ 31 root root 0 Sep 27 14:26 ../
  #

  # modprobe bluetooth
  # dmesg | tail -n 7
  Bluetooth: Core ver 2.22
  debugfs: Directory 'bluetooth' with parent '/' already present!
  NET: Registered PF_BLUETOOTH protocol family
  Bluetooth: HCI device and connection manager initialized
  Bluetooth: HCI socket layer initialized
  Bluetooth: L2CAP socket layer initialized
  Bluetooth: SCO socket layer initialized
  # ls -laF /sys/kernel/debug/bluetooth
  total 0
  drwxr-xr-x  2 root root 0 Sep 27 14:31 ./
  drwx------ 31 root root 0 Sep 27 14:26 ../
  #

Cc: stable@vger.kernel.org
Fixes: ffcecac6a7 ("Bluetooth: Create root debugfs directory during module init")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-16 16:09:25 -04:00
Aaron Thompson
d458cd1221 Bluetooth: Call iso_exit() on module unload
If iso_init() has been called, iso_exit() must be called on module
unload. Without that, the struct proto that iso_init() registered with
proto_register() becomes invalid, which could cause unpredictable
problems later. In my case, with CONFIG_LIST_HARDENED and
CONFIG_BUG_ON_DATA_CORRUPTION enabled, loading the module again usually
triggers this BUG():

  list_add corruption. next->prev should be prev (ffffffffb5355fd0),
    but was 0000000000000068. (next=ffffffffc0a010d0).
  ------------[ cut here ]------------
  kernel BUG at lib/list_debug.c:29!
  Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 1 PID: 4159 Comm: modprobe Not tainted 6.10.11-4+bt2-ao-desktop #1
  RIP: 0010:__list_add_valid_or_report+0x61/0xa0
  ...
    __list_add_valid_or_report+0x61/0xa0
    proto_register+0x299/0x320
    hci_sock_init+0x16/0xc0 [bluetooth]
    bt_init+0x68/0xd0 [bluetooth]
    __pfx_bt_init+0x10/0x10 [bluetooth]
    do_one_initcall+0x80/0x2f0
    do_init_module+0x8b/0x230
    __do_sys_init_module+0x15f/0x190
    do_syscall_64+0x68/0x110
  ...

Cc: stable@vger.kernel.org
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-16 16:09:03 -04:00
Aaron Thompson
a9b7b535ba Bluetooth: ISO: Fix multiple init when debugfs is disabled
If bt_debugfs is not created successfully, which happens if either
CONFIG_DEBUG_FS or CONFIG_DEBUG_FS_ALLOW_ALL is unset, then iso_init()
returns early and does not set iso_inited to true. This means that a
subsequent call to iso_init() will result in duplicate calls to
proto_register(), bt_sock_register(), etc.

With CONFIG_LIST_HARDENED and CONFIG_BUG_ON_DATA_CORRUPTION enabled, the
duplicate call to proto_register() triggers this BUG():

  list_add double add: new=ffffffffc0b280d0, prev=ffffffffbab56250,
    next=ffffffffc0b280d0.
  ------------[ cut here ]------------
  kernel BUG at lib/list_debug.c:35!
  Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 2 PID: 887 Comm: bluetoothd Not tainted 6.10.11-1-ao-desktop #1
  RIP: 0010:__list_add_valid_or_report+0x9a/0xa0
  ...
    __list_add_valid_or_report+0x9a/0xa0
    proto_register+0x2b5/0x340
    iso_init+0x23/0x150 [bluetooth]
    set_iso_socket_func+0x68/0x1b0 [bluetooth]
    kmem_cache_free+0x308/0x330
    hci_sock_sendmsg+0x990/0x9e0 [bluetooth]
    __sock_sendmsg+0x7b/0x80
    sock_write_iter+0x9a/0x110
    do_iter_readv_writev+0x11d/0x220
    vfs_writev+0x180/0x3e0
    do_writev+0xca/0x100
  ...

This change removes the early return. The check for iso_debugfs being
NULL was unnecessary, it is always NULL when iso_inited is false.

Cc: stable@vger.kernel.org
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-16 16:08:43 -04:00
Tyrone Wu
92f3715e1e bpf: Fix link info netfilter flags to populate defrag flag
This fix correctly populates the `bpf_link_info.netfilter.flags` field
when user passes the `BPF_F_NETFILTER_IP_DEFRAG` flag.

Fixes: 91721c2d02 ("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link")
Signed-off-by: Tyrone Wu <wudevelops@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Florian Westphal <fw@strlen.de>
Cc: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/20241011193252.178997-1-wudevelops@gmail.com
2024-10-16 17:04:33 +02:00
Kuniyuki Iwashima
e1c6c38312 rtnetlink: Remove rtnl_register() and rtnl_register_module().
No one uses rtnl_register() and rtnl_register_module().

Let's remove them.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-12-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:26 -07:00
Kuniyuki Iwashima
df96b8f45a can: gw: Use rtnl_register_many().
We will remove rtnl_register_module() in favour of rtnl_register_many().

rtnl_register_many() will unwind the previous successful registrations
on failure and simplify module error handling.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-11-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:26 -07:00
Kuniyuki Iwashima
c82b031dcb dcb: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-10-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:26 -07:00
Kuniyuki Iwashima
3ac84e31b3 ipmr: Use rtnl_register_many().
We will remove rtnl_register() and rtnl_register_module() in favour
of rtnl_register_many().

When it succeeds for built-in callers, rtnl_register_many() guarantees
all rtnetlink types in the passed array are supported, and there is no
chance that a part of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-9-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:26 -07:00
Kuniyuki Iwashima
a37b0e4eca ipv6: Use rtnl_register_many().
We will remove rtnl_register_module() in favour of rtnl_register_many().

rtnl_register_many() will unwind the previous successful registrations
on failure and simplify module error handling.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:26 -07:00
Kuniyuki Iwashima
465bac91f9 ipv4: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:26 -07:00
Kuniyuki Iwashima
803838a5f6 net: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:26 -07:00
Kuniyuki Iwashima
cc72bb0303 net: sched: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20241014201828.91221-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:25 -07:00
Kuniyuki Iwashima
d0d14aef50 neighbour: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:25 -07:00
Kuniyuki Iwashima
181bc7875b rtnetlink: Use rtnl_register_many().
We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees all rtnetlink types
in the passed array are supported, and there is no chance that a part
of message types is not supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241014201828.91221-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:25 -07:00
Kuniyuki Iwashima
09aec57d83 rtnetlink: Panic when __rtnl_register_many() fails for builtin callers.
We will replace all rtnl_register() and rtnl_register_module() with
rtnl_register_many().

Currently, rtnl_register() returns nothing and prints an error message
when it fails to register a rtnetlink message type and handlers.

The failure happens only when rtnl_register_internal() fails to allocate
rtnl_msg_handlers[protocol][msgtype], but it's unlikely for built-in
callers on boot time.

rtnl_register_many() unwinds the previous successful registrations on
failure and returns an error, but it will be useless for built-in callers,
especially some subsystems that do not have the legacy ioctl() interface
and do not work without rtnetlink.

Instead of booting up without rtnetlink functionality, let's panic on
failure for built-in rtnl_register_many() callers.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014201828.91221-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:52:25 -07:00
Ignat Korchagin
18429e6e0c Revert "net: do not leave a dangling sk pointer, when socket creation fails"
This reverts commit 6cd4a78d96.

inet/inet6->create() implementations have been fixed to explicitly NULL the
allocated sk object on error.

A warning was put in place to make sure any future changes will not leave
a dangling pointer in pf->create() implementations.

So this code is now redundant.

Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-10-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:43:08 -07:00
Ignat Korchagin
48156296a0 net: warn, if pf->create does not clear sock->sk on error
All pf->create implementations have been fixed now to clear sock->sk on
error, when they deallocate the allocated sk object.

Put a warning in place to make sure we don't break this promise in the
future.

Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-9-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:43:08 -07:00
Ignat Korchagin
9df99c395d net: inet6: do not leave a dangling sk pointer in inet6_create()
sock_init_data() attaches the allocated sk pointer to the provided sock
object. If inet6_create() fails later, the sk object is released, but the
sock object retains the dangling sk pointer, which may cause use-after-free
later.

Clear the sock sk pointer on error.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-8-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:43:08 -07:00
Ignat Korchagin
9365fa510c net: inet: do not leave a dangling sk pointer in inet_create()
sock_init_data() attaches the allocated sk object to the provided sock
object. If inet_create() fails later, the sk object is freed, but the
sock object retains the dangling pointer, which may create use-after-free
later.

Clear the sk pointer in the sock object on error.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-7-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:43:08 -07:00
Ignat Korchagin
b4fcd63f6e net: ieee802154: do not leave a dangling sk pointer in ieee802154_create()
sock_init_data() attaches the allocated sk object to the provided sock
object. If ieee802154_create() fails later, the allocated sk object is
freed, but the dangling pointer remains in the provided sock object, which
may allow use-after-free.

Clear the sk pointer in the sock object on error.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-6-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:43:08 -07:00
Ignat Korchagin
811a7ca732 net: af_can: do not leave a dangling sk pointer in can_create()
On error can_create() frees the allocated sk object, but sock_init_data()
has already attached it to the provided sock object. This will leave a
dangling sk pointer in the sock object and may cause use-after-free later.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://patch.msgid.link/20241014153808.51894-5-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:43:08 -07:00
Ignat Korchagin
3945c799f1 Bluetooth: RFCOMM: avoid leaving dangling sk pointer in rfcomm_sock_alloc()
bt_sock_alloc() attaches allocated sk object to the provided sock object.
If rfcomm_dlc_alloc() fails, we release the sk object, but leave the
dangling pointer in the sock object, which may cause use-after-free.

Fix this by swapping calls to bt_sock_alloc() and rfcomm_dlc_alloc().

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-4-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:43:08 -07:00
Ignat Korchagin
7c4f78cdb8 Bluetooth: L2CAP: do not leave dangling sk pointer on error in l2cap_sock_create()
bt_sock_alloc() allocates the sk object and attaches it to the provided
sock object. On error l2cap_sock_alloc() frees the sk object, but the
dangling pointer is still attached to the sock object, which may create
use-after-free in other code.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-3-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:43:07 -07:00
Ignat Korchagin
46f2a11cb8 af_packet: avoid erroring out after sock_init_data() in packet_create()
After sock_init_data() the allocated sk object is attached to the provided
sock object. On error, packet_create() frees the sk object leaving the
dangling pointer in the sock object on return. Some other code may try
to use this pointer and cause use-after-free.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241014153808.51894-2-ignat@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:43:07 -07:00
Elena Salomatkina
397006ba5d net/sched: cbs: Fix integer overflow in cbs_set_port_rate()
The subsequent calculation of port_rate = speed * 1000 * BYTES_PER_KBIT,
where the BYTES_PER_KBIT is of type LL, may cause an overflow.
At least when speed = SPEED_20000, the expression to the left of port_rate
will be greater than INT_MAX.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Elena Salomatkina <esalomatkina@ispras.ru>
Link: https://patch.msgid.link/20241013124529.1043-1-esalomatkina@ispras.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:25:47 -07:00
Jakub Sitnicki
d96016a764 udp: Compute L4 checksum as usual when not segmenting the skb
If:

  1) the user requested USO, but
  2) there is not enough payload for GSO to kick in, and
  3) the egress device doesn't offer checksum offload, then

we want to compute the L4 checksum in software early on.

In the case when we are not taking the GSO path, but it has been requested,
the software checksum fallback in skb_segment doesn't get a chance to
compute the full checksum, if the egress device can't do it. As a result we
end up sending UDP datagrams with only a partial checksum filled in, which
the peer will discard.

Fixes: 10154dbded ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20241011-uso-swcsum-fixup-v2-1-6e1ddc199af9@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 18:12:33 -07:00
Eric Dumazet
56440d7ec2 genetlink: hold RCU in genlmsg_mcast()
While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw
one lockdep splat [1].

genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU.

Instead of letting all callers guard genlmsg_multicast_allns()
with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast().

This also means the @flags parameter is useless, we need to always use
GFP_ATOMIC.

[1]
[10882.424136] =============================
[10882.424166] WARNING: suspicious RCU usage
[10882.424309] 6.12.0-rc2-virtme #1156 Not tainted
[10882.424400] -----------------------------
[10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!!
[10882.424469]
other info that might help us debug this:

[10882.424500]
rcu_scheduler_active = 2, debug_locks = 1
[10882.424744] 2 locks held by ip/15677:
[10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219)
[10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209)
[10882.426465]
stack backtrace:
[10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156
[10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[10882.427046] Call Trace:
[10882.427131]  <TASK>
[10882.427244] dump_stack_lvl (lib/dump_stack.c:123)
[10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
[10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7))
[10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink
[10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink
[10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115)
[10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210)
[10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink
[10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201)
[10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551)
[10882.428069] genl_rcv (net/netlink/genetlink.c:1220)
[10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357)
[10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901)
[10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1))

Fixes: 33f72e6f0c ("l2tp : multicast notification to the registered listeners")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Cc: Tom Parkin <tparkin@katalix.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20241011171217.3166614-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 17:52:58 -07:00
Kuniyuki Iwashima
e8c526f2bd tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().

  """
  We are seeing a use-after-free from a bpf prog attached to
  trace_tcp_retransmit_synack. The program passes the req->sk to the
  bpf_sk_storage_get_tracing kernel helper which does check for null
  before using it.
  """

The commit 83fccfc394 ("inet: fix potential deadlock in
reqsk_queue_unlink()") added timer_pending() in reqsk_queue_unlink() not
to call del_timer_sync() from reqsk_timer_handler(), but it introduced a
small race window.

Before the timer is called, expire_timers() calls detach_timer(timer, true)
to clear timer->entry.pprev and marks it as not pending.

If reqsk_queue_unlink() checks timer_pending() just after expire_timers()
calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will
continue running and send multiple SYN+ACKs until it expires.

The reported UAF could happen if req->sk is close()d earlier than the timer
expiration, which is 63s by default.

The scenario would be

  1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
     but del_timer_sync() is missed

  2. reqsk timer is executed and scheduled again

  3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but
     reqsk timer still has another one, and inet_csk_accept() does not
     clear req->sk for non-TFO sockets

  4. sk is close()d

  5. reqsk timer is executed again, and BPF touches req->sk

Let's not use timer_pending() by passing the caller context to
__inet_csk_reqsk_queue_drop().

Note that reqsk timer is pinned, so the issue does not happen in most
use cases. [1]

[0]
BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0

Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
bpf_sk_storage_get_tracing+0x2e/0x1b0
bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
bpf_trace_run2+0x4c/0xc0
tcp_rtx_synack+0xf9/0x100
reqsk_timer_handler+0xda/0x3d0
run_timer_softirq+0x292/0x8a0
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
intel_idle_irq+0x5a/0xa0
cpuidle_enter_state+0x94/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb

kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6

allocated by task 0 on cpu 9 at 260507.901592s:
sk_prot_alloc+0x35/0x140
sk_clone_lock+0x1f/0x3f0
inet_csk_clone_lock+0x15/0x160
tcp_create_openreq_child+0x1f/0x410
tcp_v6_syn_recv_sock+0x1da/0x700
tcp_check_req+0x1fb/0x510
tcp_v6_rcv+0x98b/0x1420
ipv6_list_rcv+0x2258/0x26e0
napi_complete_done+0x5b1/0x2990
mlx5e_napi_poll+0x2ae/0x8d0
net_rx_action+0x13e/0x590
irq_exit_rcu+0xf5/0x320
common_interrupt+0x80/0x90
asm_common_interrupt+0x22/0x40
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb

freed by task 0 on cpu 9 at 260507.927527s:
rcu_core_si+0x4ff/0xf10
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb

Fixes: 83fccfc394 ("inet: fix potential deadlock in reqsk_queue_unlink()")
Reported-by: Martin KaFai Lau <martin.lau@kernel.org>
Closes: https://lore.kernel.org/netdev/eb6684d0-ffd9-4bdc-9196-33f690c25824@linux.dev/
Link: https://lore.kernel.org/netdev/b55e2ca0-42f2-4b7c-b445-6ffd87ca74a0@linux.dev/ [1]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20241014223312.4254-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 17:47:30 -07:00
Kuniyuki Iwashima
95b3120a48 neighbour: Remove NEIGH_DN_TABLE.
Since commit 1202cdd665 ("Remove DECnet support from kernel"),
NEIGH_DN_TABLE is no longer used.

MPLS has implicit dependency on it in nla_put_via(), but nla_get_via()
does not support DECnet.

Let's remove NEIGH_DN_TABLE.

Now, neigh_tables[] has only 2 elements and no extra iteration
for DECnet in many places.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241014235216.10785-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 17:46:30 -07:00
Paolo Abeni
3d041393ea mptcp: prevent MPC handshake on port-based signal endpoints
Syzkaller reported a lockdep splat:

  ============================================
  WARNING: possible recursive locking detected
  6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted
  --------------------------------------------
  syz-executor364/5113 is trying to acquire lock:
  ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
  ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

  but task is already holding lock:
  ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
  ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(k-slock-AF_INET);
    lock(k-slock-AF_INET);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  7 locks held by syz-executor364/5113:
   #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
   #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806
   #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
   #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727
   #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
   #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
   #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470
   #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
   #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
   #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228
   #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
   #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104
   #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
   #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
   #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232
   #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
   #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

  stack backtrace:
  CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
  Call Trace:
   <IRQ>
   __dump_stack lib/dump_stack.c:93 [inline]
   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
   check_deadlock kernel/locking/lockdep.c:3061 [inline]
   validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855
   __lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142
   lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759
   __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
   _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
   spin_lock include/linux/spinlock.h:351 [inline]
   sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
   mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279
   subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874
   tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853
   tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267
   ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
   NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
   NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
   __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
   __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
   process_backlog+0x662/0x15b0 net/core/dev.c:6108
   __napi_poll+0xcb/0x490 net/core/dev.c:6772
   napi_poll net/core/dev.c:6841 [inline]
   net_rx_action+0x89b/0x1240 net/core/dev.c:6963
   handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
   do_softirq+0x11b/0x1e0 kernel/softirq.c:455
   </IRQ>
   <TASK>
   __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
   local_bh_enable include/linux/bottom_half.h:33 [inline]
   rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline]
   __dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450
   dev_queue_xmit include/linux/netdevice.h:3105 [inline]
   neigh_hh_output include/net/neighbour.h:526 [inline]
   neigh_output include/net/neighbour.h:540 [inline]
   ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235
   ip_local_out net/ipv4/ip_output.c:129 [inline]
   __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535
   __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
   tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline]
   tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729
   tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934
   sk_backlog_rcv include/net/sock.h:1111 [inline]
   __release_sock+0x214/0x350 net/core/sock.c:3004
   release_sock+0x61/0x1f0 net/core/sock.c:3558
   mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733
   mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812
   sock_sendmsg_nosec net/socket.c:730 [inline]
   __sock_sendmsg+0x1a6/0x270 net/socket.c:745
   ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
   ___sys_sendmsg net/socket.c:2651 [inline]
   __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737
   __do_sys_sendmmsg net/socket.c:2766 [inline]
   __se_sys_sendmmsg net/socket.c:2763 [inline]
   __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7f04fb13a6b9
  Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9
  RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004
  RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240
  R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300
   </TASK>

As noted by Cong Wang, the splat is false positive, but the code
path leading to the report is an unexpected one: a client is
attempting an MPC handshake towards the in-kernel listener created
by the in-kernel PM for a port based signal endpoint.

Such connection will be never accepted; many of them can make the
listener queue full and preventing the creation of MPJ subflow via
such listener - its intended role.

Explicitly detect this scenario at initial-syn time and drop the
incoming MPC request.

Fixes: 1729cf186d ("mptcp: create the listening socket for new port")
Cc: stable@vger.kernel.org
Reported-by: syzbot+f4aacdfef2c6a6529c3e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e
Cc: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-1-7faea8e6b6ae@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 10:57:02 -07:00
Li RongQing
82ac39ebd6 net/smc: Fix searching in list of known pnetids in smc_pnet_add_pnetid
pnetid of pi (not newly allocated pe) should be compared

Fixes: e888a2e833 ("net/smc: introduce list of pnetids for Ethernet devices")
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Link: https://patch.msgid.link/20241014115321.33234-1-lirongqing@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 10:56:31 -07:00
Julia Lawall
7bb3ecbc2b kcm: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Since SLOB was removed and since
commit 6c6c47b063 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were made using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20241013201704.49576-15-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 10:50:21 -07:00
Julia Lawall
4ac64e570c net: bridge: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Since SLOB was removed and since
commit 6c6c47b063 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were made using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20241013201704.49576-9-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 10:50:21 -07:00
Julia Lawall
85e48bcf29 ipv6: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Since SLOB was removed and since
commit 6c6c47b063 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were made using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20241013201704.49576-5-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 10:50:21 -07:00
Julia Lawall
bb5810d423 inetpeer: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Since SLOB was removed and since
commit 6c6c47b063 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were made using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20241013201704.49576-4-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 10:50:21 -07:00
Julia Lawall
497e17d807 ipv4: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Since SLOB was removed and since
commit 6c6c47b063 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were made using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://patch.msgid.link/20241013201704.49576-3-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 10:50:21 -07:00
Gur Stavi
2cee3e6e2e af_packet: allow fanout_add when socket is not RUNNING
PACKET socket can retain its fanout membership through link down and up
and leave a fanout while closed regardless of link state.
However, socket was forbidden from joining a fanout while it was not
RUNNING.

This patch allows PACKET socket to join fanout while not RUNNING.

Socket can be RUNNING if it has a specified protocol. Either directly
from packet_create (being implicitly bound to any interface) or following
a successful bind. Socket RUNNING state is switched off if it is bound to
an interface that went down.

Instead of the test for RUNNING, this patch adds a test that socket can
become RUNNING.

Signed-off-by: Gur Stavi <gur.stavi@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/4f1a3c37dbef980ef044c4d2adf91c76e2eca14b.1728802323.git.gur.stavi@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15 09:52:36 -07:00
Florian Westphal
08e52cccae netfilter: nf_tables: prefer nft_trans_elem_alloc helper
Reduce references to sizeof(struct nft_trans_elem).
Preparation patch to move this to a flexiable array to store
elem references.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-15 17:29:51 +02:00
Justin Stitt
544dded8cb netfilter: nf_tables: replace deprecated strncpy with strscpy_pad
strncpy() is deprecated for use on NUL-terminated destination strings [1] and
as such we should prefer more robust and less ambiguous string interfaces.

In this particular instance, the usage of strncpy() is fine and works as
expected. However, towards the goal of [2], we should consider replacing
it with an alternative as many instances of strncpy() are bug-prone. Its
removal from the kernel promotes better long term health for the
codebase.

The current usage of strncpy() likely just wants the NUL-padding
behavior offered by strncpy() and doesn't care about the
NUL-termination. Since the compiler doesn't know the size of @dest, we
can't use strtomem_pad(). Instead, use strscpy_pad() which behaves
functionally the same as strncpy() in this context -- as we expect
br_dev->name to be NUL-terminated itself.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90 [2]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-15 17:29:51 +02:00
Uros Bizjak
0741f55593 netfilter: nf_tables: Fix percpu address space issues in nf_tables_api.c
Compiling nf_tables_api.c results in several sparse warnings:

nf_tables_api.c:2077:31: warning: incorrect type in return expression (different address spaces)
nf_tables_api.c:2080:31: warning: incorrect type in return expression (different address spaces)
nf_tables_api.c:2084:31: warning: incorrect type in return expression (different address spaces)

nf_tables_api.c:2740:23: warning: incorrect type in assignment (different address spaces)
nf_tables_api.c:2752:38: warning: incorrect type in assignment (different address spaces)
nf_tables_api.c:2798:21: warning: incorrect type in argument 1 (different address spaces)

Use {ERR_PTR,IS_ERR,PTR_ERR}_PCPU() macros when crossing between generic
and percpu address spaces and add __percpu annotation to *stats pointer
to fix these warnings.

Found by GCC's named address space checks.

There were no changes in the resulting object files.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-15 17:29:51 +02:00
Paolo Abeni
4a6f05d9fe This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - Add flex array to struct batadv_tvlv_tt_data, by Erick Archer
 
  - Use string choice helper to print booleans, by Sven Eckelmann
 
  - replace call_rcu by kfree_rcu for simple kmem_cache_free callback,
    by Julia Lawall
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmcOG44WHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoTKoD/9t2iYs4kRr1HMCBEbXq5K0MDmV
 uBH75BAwVoTqE8brOcEyfBiQVzLGUyJXQDSu1Tjjzw75NEtdYWTBjXRC7AAZeOGh
 mwBVHOH8Is/9gjz93QCBbHNSx4tCN2DAI+spTii4VQOPjlSxxTipyxr7zh5G3HmX
 PO/x/N/7LoSqxI/9+Ft9HljaP+u6bguWUgTX1XlbsW9gRS9Vw6xXvXg4v/BCf7pB
 4p0KBa0mwyyUqpWSyFhPoi/3kcIgMGjkBka3zoTzDzMSthlT2kmovColHQh/KhW5
 1bsJg+uPe7oj0Hlw8RFk324a14jDIUF/y7ut0KvTM4IW1bgV2uyw26Vn6U8pkDA9
 9x4y/wOPxrWCkyDdcpMKtOVwyEF1242gGcpFIgNKM7gS7mTUsirN8bPYhb4HnT9/
 PY7YgVFMQHsZgSfDPP3hQunf8vQUT1U/lPOF2ZL7ixwiJ26WEl0LaJ02sj+mBKEs
 g/qLjsMSIWXU7ppHGNFajw5xSYHX/cRMRUB/jjHCP3z2gMZza+H7TiujV7p8tI7Q
 827z4hQRBjLHdYbk6835XVGt/coDsizFuhForapbBJskYGR35y1N4bMAnohV8Zrv
 iHrSfXr9p/ny7CvpLkJO7CLaFEg5B15fX+YOCx2Cx1NOhfDY0ztIYl44lowQip5l
 a//f2+EjwB2K1ahYqg==
 =6M1z
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-pullrequest-20241015' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - Add flex array to struct batadv_tvlv_tt_data, by Erick Archer

 - Use string choice helper to print booleans, by Sven Eckelmann

 - replace call_rcu by kfree_rcu for simple kmem_cache_free callback,
   by Julia Lawall

* tag 'batadv-next-pullrequest-20241015' of git://git.open-mesh.org/linux-merge:
  batman-adv: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
  batman-adv: Use string choice helper to print booleans
  batman-adv: Add flex array to struct batadv_tvlv_tt_data
  batman-adv: Start new development cycle
====================

Link: https://patch.msgid.link/20241015073946.46613-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15 15:28:17 +02:00
Paolo Abeni
39ab20647d bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZw1/jBUcZGFuaWVsQGlv
 Z2VhcmJveC5uZXQACgkQ2yufC7HISIO/ZwEAuAVkRgyuC0njVV9PyT7EbZqxHjY+
 10v6I6XR8vWmILABALrTIR9wTOyBVgmZzW7AUq8wiFv9FSZmhJfp1KxPdNYA
 =L6hT
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-10-14

The following pull-request contains BPF updates for your *net-next* tree.

We've added 21 non-merge commits during the last 18 day(s) which contain
a total of 21 files changed, 1185 insertions(+), 127 deletions(-).

The main changes are:

1) Put xsk sockets on a struct diet and add various cleanups. Overall, this helps
   to bump performance by 12% for some workloads, from Maciej Fijalkowski.

2) Extend BPF selftests to increase coverage of XDP features in combination
   with BPF cpumap, from Alexis Lothoré (eBPF Foundation).

3) Extend netkit with an option to delegate skb->{mark,priority} scrubbing to
   its BPF program, from Daniel Borkmann.

4) Make the bpf_get_netns_cookie() helper available also to tc(x) BPF programs,
   from Mahe Tardy.

5) Extend BPF selftests covering a BPF program setting socket options per MPTCP
   subflow, from Geliang Tang and Nicolas Rybowski.

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (21 commits)
  xsk: Use xsk_buff_pool directly for cq functions
  xsk: Wrap duplicated code to function
  xsk: Carry a copy of xdp_zc_max_segs within xsk_buff_pool
  xsk: Get rid of xdp_buff_xsk::orig_addr
  xsk: s/free_list_node/list_node/
  xsk: Get rid of xdp_buff_xsk::xskb_list_node
  selftests/bpf: check program redirect in xdp_cpumap_attach
  selftests/bpf: make xdp_cpumap_attach keep redirect prog attached
  selftests/bpf: fix bpf_map_redirect call for cpu map test
  selftests/bpf: add tcx netns cookie tests
  bpf: add get_netns_cookie helper to tc programs
  selftests/bpf: add missing header include for htons
  selftests/bpf: Extend netkit tests to validate skb meta data
  tools: Sync if_link.h uapi tooling header
  netkit: Add add netkit scrub support to rt_link.yaml
  netkit: Simplify netkit mode over to use NLA_POLICY_MAX
  netkit: Add option for scrubbing skb meta data
  bpf: Remove unused macro
  selftests/bpf: Add mptcp subflow subtest
  selftests/bpf: Add getsockopt to inspect mptcp subflow
  ...
====================

Link: https://patch.msgid.link/20241014211110.16562-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15 15:19:48 +02:00
Kuniyuki Iwashima
bb9df28e6f rtnl_net_debug: Remove rtnl_net_debug_exit().
kernel test robot reported section mismatch in rtnl_net_debug_exit().

  WARNING: modpost: vmlinux: section mismatch in reference: rtnl_net_debug_exit+0x20 (section: .exit.text) -> rtnl_net_debug_net_ops (section: .init.data)

rtnl_net_debug_exit() uses rtnl_net_debug_net_ops() that is annotated
as __net_initdata, but this file is always built-in.

Let's remove rtnl_net_debug_exit().

Fixes: 03fa534856 ("rtnetlink: Add ASSERT_RTNL_NET() placeholder for netdev notifier.")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410101854.i0vQCaDz-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241010172433.67694-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15 13:40:55 +02:00
Jakub Kicinski
bcbbfaa261 tools: ynl-gen: use names of constants in generated limits
YNL specs can use string expressions for limits, like s32-min
or u16-max. We convert all of those into their numeric values
when generating the code, which isn't always helpful. Try to
retain the string representations in the output. Any sort of
calculations still need the integers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20241010151248.2049755-1-kuba@kernel.org
[pabeni@redhat.com: regenerated netdev-genl-gen.c]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15 13:06:49 +02:00
Breno Leitao
6c959fd5e1 netfilter: Make legacy configs user selectable
This option makes legacy Netfilter Kconfig user selectable, giving users
the option to configure iptables without enabling any other config.

Make the following KConfig entries user selectable:
 * BRIDGE_NF_EBTABLES_LEGACY
 * IP_NF_ARPTABLES
 * IP_NF_IPTABLES_LEGACY
 * IP6_NF_IPTABLES_LEGACY

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-15 10:00:24 +02:00
Joe Damato
1287c1ae0f netdev-genl: Support setting per-NAPI config values
Add support to set per-NAPI defer_hard_irqs and gro_flush_timeout.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-7-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:54:29 -07:00
Joe Damato
86e25f40aa net: napi: Add napi_config
Add a persistent NAPI config area for NAPI configuration to the core.
Drivers opt-in to setting the persistent config for a NAPI by passing an
index when calling netif_napi_add_config.

napi_config is allocated in alloc_netdev_mqs, freed in free_netdev
(after the NAPIs are deleted).

Drivers which call netif_napi_add_config will have persistent per-NAPI
settings: NAPI IDs, gro_flush_timeout, and defer_hard_irq settings.

Per-NAPI settings are saved in napi_disable and restored in napi_enable.

Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-6-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:54:29 -07:00
Joe Damato
0137891e74 netdev-genl: Dump gro_flush_timeout
Support dumping gro_flush_timeout for a NAPI ID.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-5-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:54:29 -07:00
Joe Damato
acb8d4ed56 net: napi: Make gro_flush_timeout per-NAPI
Allow per-NAPI gro_flush_timeout setting.

The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device gro_flush_timeout
field. Reads from sysfs will read from the net_device field.

The ability to set gro_flush_timeout on specific NAPI instances will be
added in a later commit, via netdev-genl.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-4-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:54:29 -07:00
Joe Damato
5160104600 netdev-genl: Dump napi_defer_hard_irqs
Support dumping defer_hard_irqs for a NAPI ID.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:54:29 -07:00
Joe Damato
f15e3b3ddb net: napi: Make napi_defer_hard_irqs per-NAPI
Add defer_hard_irqs to napi_struct in preparation for per-NAPI
settings.

The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device defer_hard_irq field.
Reads from sysfs show the net_device field.

The ability to set defer_hard_irqs on specific NAPI instances will be
added in a later commit, via netdev-genl.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:54:28 -07:00
Eric Dumazet
79636038d3 ipv4: tcp: give socket pointer to control skbs
ip_send_unicast_reply() send orphaned 'control packets'.

These are RST packets and also ACK packets sent from TIME_WAIT.

Some eBPF programs would prefer to have a meaningful skb->sk
pointer as much as possible.

This means that TCP can now attach TIME_WAIT sockets to outgoing
skbs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:39:37 -07:00
Eric Dumazet
507a96737d ipv6: tcp: give socket pointer to control skbs
tcp_v6_send_response() send orphaned 'control packets'.

These are RST packets and also ACK packets sent from TIME_WAIT.

Some eBPF programs would prefer to have a meaningful skb->sk
pointer as much as possible.

This means that TCP can now attach TIME_WAIT sockets to outgoing
skbs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:39:37 -07:00
Eric Dumazet
5ced52fa8f net: add skb_set_owner_edemux() helper
This can be used to attach a socket to an skb,
taking a reference on sk->sk_refcnt.

This helper might be a NOP if sk->sk_refcnt is zero.

Use it from tcp_make_synack().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:39:36 -07:00
Eric Dumazet
bc43a3c83c net_sched: sch_fq: prepare for TIME_WAIT sockets
TCP stack is not attaching skb to TIME_WAIT sockets yet,
but we would like to allow this in the future.

Add sk_listener_or_tw() helper to detect the three states
that FQ needs to take care.

Like NEW_SYN_RECV, TIME_WAIT are not full sockets and
do not contain sk->sk_pacing_status, sk->sk_pacing_rate.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:39:36 -07:00
Eric Dumazet
78e2baf3d9 net: add TIME_WAIT logic to sk_to_full_sk()
TCP will soon attach TIME_WAIT sockets to some ACK and RST.

Make sure sk_to_full_sk() detects this and does not return
a non full socket.

v3: also changed sk_const_to_full_sk()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:39:36 -07:00
Kai Shen
25c12b459d net/smc: Fix memory leak when using percpu refs
This patch adds missing percpu_ref_exit when releasing percpu refs.
When releasing percpu refs, percpu_ref_exit should be called.
Otherwise, memory leak happens.

Fixes: 79a22238b4 ("net/smc: Use percpu ref for wr tx reference")
Signed-off-by: Kai Shen <KaiShen@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://patch.msgid.link/20241010115624.7769-1-KaiShen@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14 17:32:05 -07:00
Maciej Fijalkowski
e6c4047f51 xsk: Use xsk_buff_pool directly for cq functions
Currently xsk_cq_{reserve_addr,submit,cancel}_locked() take xdp_sock as
an input argument but it is only used for pulling out xsk_buff_pool
pointer from it.

Change mentioned functions to take pool pointer as an input argument to
avoid unnecessary dereferences.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20241007122458.282590-7-maciej.fijalkowski@intel.com
2024-10-14 17:23:49 +02:00
Maciej Fijalkowski
1d10b2bed2 xsk: Wrap duplicated code to function
Both allocation paths have exactly the same code responsible for getting
and initializing xskb. Pull it out to common function.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20241007122458.282590-6-maciej.fijalkowski@intel.com
2024-10-14 17:23:45 +02:00
Maciej Fijalkowski
6e12687219 xsk: Carry a copy of xdp_zc_max_segs within xsk_buff_pool
This so we avoid dereferencing struct net_device within hot path.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20241007122458.282590-5-maciej.fijalkowski@intel.com
2024-10-14 17:23:30 +02:00
Maciej Fijalkowski
bea14124ba xsk: Get rid of xdp_buff_xsk::orig_addr
Continue the process of dieting xdp_buff_xsk by removing orig_addr
member. It can be calculated from xdp->data_hard_start where it was
previously used, so it is not anything that has to be carried around in
struct used widely in hot path.

This has been used for initializing xdp_buff_xsk::frame_dma during pool
setup and as a shortcut in xp_get_handle() to retrieve address provided
to xsk Rx queue.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20241007122458.282590-4-maciej.fijalkowski@intel.com
2024-10-14 17:23:17 +02:00
Maciej Fijalkowski
30ec2c1baa xsk: s/free_list_node/list_node/
Now that free_list_node's purpose is two-folded, make it just a
'list_node'.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20241007122458.282590-3-maciej.fijalkowski@intel.com
2024-10-14 17:22:59 +02:00
Maciej Fijalkowski
b692bf9a75 xsk: Get rid of xdp_buff_xsk::xskb_list_node
Let's bring xdp_buff_xsk back to occupying 2 cachelines by removing
xskb_list_node - for the purpose of gathering the xskb frags
free_list_node can be used, head of the list (xsk_buff_pool::xskb_list)
stays as-is, just reuse the node ptr.

It is safe to do as a single xdp_buff_xsk can never reside in two
pool's lists simultaneously.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20241007122458.282590-2-maciej.fijalkowski@intel.com
2024-10-14 17:22:38 +02:00
Yu Liao
e4c416533f net: hsr: convert to use new timer APIs
del_timer() and del_timer_sync() have been renamed to timer_delete()
and timer_delete_sync().

Inconsistent API usage makes the code a bit confusing, so replace with
the new APIs.

No functional changes intended.

Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-14 13:17:04 +01:00
Julia Lawall
356c81b6c4 batman-adv: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Since SLOB was removed and since
commit 6c6c47b063 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"),
it is not necessary to use call_rcu when the callback only performs
kmem_cache_free. Use kfree_rcu() directly.

The changes were made using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2024-10-14 09:08:39 +02:00
Danielle Ratson
9a3b0d078b net: ethtool: Add support for writing firmware blocks using EPL payload
In the CMIS specification for pluggable modules, LPL (Local Payload) and
EPL (Extended Payload) are two types of data payloads used for managing
various functions and features of the module.

EPL payloads are used for more complex and extensive management
functions that require a larger amount of data, so writing firmware
blocks using EPL is much more efficient.

Currently, only LPL payload is supported for writing firmware blocks to
the module.

Add support for writing firmware block using EPL payload, both to
support modules that supports only EPL write mechanism, and to optimize
the flashing process of modules that support LPL and EPL.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13 18:02:50 +01:00
Danielle Ratson
edc3445689 net: ethtool: Add new parameters and a function to support EPL
In the CMIS specification for pluggable modules, LPL (Local Payload) and
EPL (Extended Payload) are two types of data payloads used for managing
various functions and features of the module.

EPL payloads are used for more complex and extensive management
functions that require a larger amount of data, so writing firmware
blocks using EPL is much more efficient.

Currently, only LPL payload is supported for writing firmware blocks to
the module.

Add EPL related parameters to the function ethtool_cmis_cdb_compose_args()
and add a specific function for calculating the maximum allowable length
extension for EPL. Both will be used in the next patch to add support for
writing firmware blocks using EPL.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13 18:02:50 +01:00