73310 Commits

Author SHA1 Message Date
Paolo Abeni
c2b2ae3925 mptcp: handle correctly disconnect() failures
Currently the mptcp code assumes that disconnect() can fail only
at mptcp_sendmsg_fastopen() time - to avoid a deadlock scenario - and
doesn't even bother returning an error code.

Soon mptcp_disconnect() will handle more error conditions: let's track
them explicitly.

As a bonus, explicitly annotate TCP-level disconnect as not failing:
the mptcp code never blocks for events on the subflows.

Fixes: 7d803344fdc3 ("mptcp: fix deadlock in fastopen error path")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21 22:44:54 -07:00
David S. Miller
e438edaae2 ipsec-2023-06-20
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmSRfcsACgkQrB3Eaf9P
 W7cOew//Q2FYj4vOw3DNYN1NgLzDac6wS5YtBxXh9QSJXTBhx9yXW6/Y++AFrP/4
 GfgfQvgIHcRLUZZkZiiILmpiq5QcaTDTrryz0/HnWe72/rv/vm2RcZ9amQD4g4/x
 U+HOwiDpE0uP0nHbfclvQe/AZARfrLLhjItOGYNGDtinlQpudnTJM4QR+cr8EtZF
 8cNJ8YWylIlag+utaPMzYsaCgnTxt9vRzReQpdAgxHiyF7QD2FGfqZ5B+Re9CoSq
 kt/I6tNmKZ/SBGnRrCQNA0fMNMqMapGyMMqSVNUkpaVbc/ZvzO0GbtMGfT1sJ+rJ
 mGECTEqMbqxpLUpTKOtr3MVZ5ddIwezBzEop+AIG82MSbkIN+yYQw69pWkY6e5cY
 DFg709CQ+LrRVib/LUsJpnqnpS9CWD8Vi1uqFza8wivknaEu2FauSKQxIKQo9qux
 zmk377h7EzVF/asdtG7j1KdljyRaX5r5OnTF4fPVEHA4QF62ZxO2swQKy+EG9Fu/
 eQvafxuCfEAgcn5GDRzgjrvSKfFGRXyxDncsc8T7HphiuPR5rFQt3x9DhfcMn4Ds
 vezC4cXa2HYyhFj52tZ8KJAbmhVJz87eBUoiM/aTOdGPRFmVExOnuY1RsIYPxcIz
 m4aOvIWEtFjpYkpzEXOB3/lq7gjfggz3zVloXAoaeonIvnzahgw=
 =co/J
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-2023-06-20' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

ipsec-2023-06-20
2023-06-20 13:33:50 +01:00
Vladimir Oltean
b79d7c14f4 net: dsa: introduce preferred_default_local_cpu_port and use on MT7530
Since the introduction of the OF bindings, DSA has always had a policy that
in case multiple CPU ports are present in the device tree, the numerically
smallest one is always chosen.

The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU
ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because
it has higher bandwidth.

The MT7530 driver developers had 3 options:
- to modify DSA when the MT7531 switch support was introduced, such as to
  prefer the better port
- to declare both CPU ports in device trees as CPU ports, and live with the
  sub-optimal performance resulting from not preferring the better port
- to declare just port 6 in the device tree as a CPU port

Of course they chose the path of least resistance (3rd option), kicking the
can down the road. The hardware description in the device tree is supposed
to be stable - developers are not supposed to adopt the strategy of
piecemeal hardware description, where the device tree is updated in
lockstep with the features that the kernel currently supports.

As a result, any attempt to modify the device tree to describe both CPU
ports as CPU ports would make DSA change its default selection from port 6
to port 5, resulting in a performance degradation visible to users of the
MT7531BE switch, as can be seen below.

Without preferring port 6:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-20.00  sec   374 MBytes   157 Mbits/sec  734    sender
[  5][TX-C]   0.00-20.00  sec   373 MBytes   156 Mbits/sec    receiver
[  7][RX-C]   0.00-20.00  sec  1.81 GBytes   778 Mbits/sec    0    sender
[  7][RX-C]   0.00-20.00  sec  1.81 GBytes   777 Mbits/sec    receiver

With preferring port 6:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-20.00  sec  1.99 GBytes   856 Mbits/sec  273    sender
[  5][TX-C]   0.00-20.00  sec  1.99 GBytes   855 Mbits/sec    receiver
[  7][RX-C]   0.00-20.00  sec  1.72 GBytes   737 Mbits/sec   15    sender
[  7][RX-C]   0.00-20.00  sec  1.71 GBytes   736 Mbits/sec    receiver

Using one port for WAN and the other ports for LAN is a very popular use
case, which is what this test emulates.

As such, this change proposes retroactively modifying stable kernels so
that the mt7530 driver keeps preferring port 6 even with device trees where
the hardware is more fully described. Stable kernels don't support changing
the CPU port assignment, so user space has no way to fix the problem and
restore the throughput on its own.
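The selection policy can be sketched as a small C model. This is not DSA's code: `struct switch_info`, `pick_default_cpu_port()` and the `preferred_cpu_port` hook are hypothetical names, with the hook standing in for what the patch calls preferred_default_local_cpu_port. Without a hook the numerically smallest CPU port wins; with an MT7531BE-style hook, port 6 is chosen even when both ports are described.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a switch: which ports the device tree marks as CPU ports. */
struct switch_info {
	int num_ports;
	const bool *is_cpu_port;   /* indexed by port number */
	/* Optional driver hook: returns a preferred CPU port, or -1 for no preference. */
	int (*preferred_cpu_port)(const struct switch_info *sw);
};

/* Default policy: numerically smallest CPU port, unless the driver states a preference. */
static int pick_default_cpu_port(const struct switch_info *sw)
{
	if (sw->preferred_cpu_port) {
		int pref = sw->preferred_cpu_port(sw);

		/* Only honour the preference if that port really is a CPU port. */
		if (pref >= 0 && pref < sw->num_ports && sw->is_cpu_port[pref])
			return pref;
	}

	for (int port = 0; port < sw->num_ports; port++)
		if (sw->is_cpu_port[port])
			return port;

	return -1; /* no CPU port described at all */
}

/* Hypothetical MT7531BE-style hook: prefer port 6, the higher-bandwidth port. */
static int prefer_port_6(const struct switch_info *sw)
{
	(void)sw;
	return 6;
}
```

Checking the driver's answer against the described CPU ports keeps a buggy hook from selecting a port the device tree never declared; whether DSA performs the same check is not stated in the commit.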

Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20 09:40:26 +01:00
David S. Miller
8340eef98d Merge tag 'ieee802154-for-net-2023-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
Stefan Schmidt says:

====================
An update from ieee802154 for your *net* tree:

Two small fixes and a MAINTAINERS update this time.

Azeem Shaikh ensured consistent use of strscpy throughout the tree and fixed
the usage in our trace.h.

Chen Aotian fixed a potential memory leak in the hwsim simulator for
ieee802154.

Miquel Raynal updated the MAINTAINERS file with the new team git tree
locations and patchwork URLs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20 09:32:33 +01:00
Azeem Shaikh
cd91250306 ieee802154: Replace strlcpy with strscpy
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().

Direct replacement is safe here since the return values
from the helper macros are ignored by the callers.
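The difference between the two helpers can be illustrated with userspace models. `model_strlcpy()` and `model_strscpy()` are hypothetical stand-ins, not the kernel implementations; they follow the documented contracts: strlcpy() returns strlen(src) and therefore must scan the whole source, while strscpy() reads at most `size` bytes and returns the number of characters copied, or -E2BIG on truncation.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* strlcpy-style: always returns strlen(src), so it walks the whole source. */
static size_t model_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);   /* reads all of src, even past size */

	if (size) {
		size_t n = len >= size ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* strscpy-style: never reads more than size bytes of src. */
static ssize_t model_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* Truncated if the source still had characters left at index size - 1. */
	return src[i] != '\0' ? -E2BIG : (ssize_t)i;
}
```

Since the ieee802154 callers ignore the return value, only the copying behaviour matters here, and both models leave the same NUL-terminated prefix in `dst`.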

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89

Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230613003326.3538391-1-azeemshaikh38@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-06-16 22:14:24 +02:00
Sebastian Andrzej Siewior
f015b900bc xfrm: Linearize the skb after offloading if needed.
With offloading enabled, esp_xmit() gets invoked very late, from within
validate_xmit_xfrm() which is after validate_xmit_skb() validates and
linearizes the skb if the underlying device does not support fragments.

esp_output_tail() may add a fragment to the skb while adding the auth
tag/IV. Devices without the proper support will then transmit the data
skb->data points to, using the full packet length, so the packet will have
garbage at the end. A pcap sniffer will claim that the proper data has
been sent since it parses the skb properly.

The problem does not occur with INET_ESP_OFFLOAD disabled.

Linearize the skb after offloading if the sending hardware requires it.
It was tested on v4; v6 has been adapted accordingly.
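A toy model shows why the late fragment matters. `struct toy_skb` and the `toy_*` helpers are hypothetical and only loosely mirror the sk_buff head/frags split. A device without scatter-gather support reads `toy_len()` bytes starting at `head`, so any bytes still living in `frag` would be replaced on the wire by whatever follows the head; linearizing first avoids that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy packet: a linear head plus one optional fragment (hypothetical). */
struct toy_skb {
	unsigned char head[64];
	size_t head_len;            /* bytes valid in head[] */
	const unsigned char *frag;  /* e.g. an auth tag appended late */
	size_t frag_len;
};

static size_t toy_len(const struct toy_skb *skb)
{
	return skb->head_len + skb->frag_len;
}

/* Pull the fragment into the linear area. */
static bool toy_linearize(struct toy_skb *skb)
{
	if (skb->frag_len == 0)
		return true;
	if (skb->head_len + skb->frag_len > sizeof(skb->head))
		return false;       /* a real implementation would reallocate instead */
	memcpy(skb->head + skb->head_len, skb->frag, skb->frag_len);
	skb->head_len += skb->frag_len;
	skb->frag = NULL;
	skb->frag_len = 0;
	return true;
}

/* A device without scatter-gather only reads the linear head, so linearize
 * right before transmission, after the late fragment has been added. */
static bool toy_prepare_for_xmit(struct toy_skb *skb, bool dev_supports_sg)
{
	if (!dev_supports_sg && skb->frag_len)
		return toy_linearize(skb);
	return true;
}
```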

Fixes: 7785bba299a8d ("esp: Add a software GRO codepath")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-06-16 10:29:50 +02:00
Kuniyuki Iwashima
b144fcaf46 dccp: Print deprecation notice.
DCCP was marked as Orphan in the MAINTAINERS entry 2 years ago in commit
054c4610bd05 ("MAINTAINERS: dccp: move Gerrit Renker to CREDITS").  It says
we haven't heard from the maintainer for five years, so DCCP has not been
well maintained for 7 years now.

Recently DCCP has only received bug fixes, and major distros disable it
by default.

Removing DCCP would allow for better organisation of TCP fields to reduce
the number of cache lines hit in the fast path.

Let's add a deprecation notice when a DCCP socket is created and schedule
its removal for 2025.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15 15:08:59 -07:00
Kuniyuki Iwashima
be28c14ac8 udplite: Print deprecation notice.
Recently syzkaller reported a 7-year-old null-ptr-deref [0] that occurs
when a UDP-Lite socket tries to allocate a buffer under memory pressure.

Someone should have stumbled on the bug much earlier if UDP-Lite had been
used in a real app.  Also, we do not always need a large UDP-Lite workload
to hit the bug since UDP and UDP-Lite share the same memory accounting
limit.

Removing UDP-Lite would simplify the UDP code by removing a bunch of
conditionals in the fast path.

Let's add a deprecation notice when a UDP-Lite socket is created and
schedule its removal for 2025.

Link: https://lore.kernel.org/netdev/20230523163305.66466-1-kuniyu@amazon.com/ [0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15 15:08:58 -07:00
Lin Ma
44194cb1b6 net: tipc: resize nlattr array to correct size
According to nla_parse_nested_deprecated(), tb[] is supposed to be the
destination array with maxtype+1 elements. In the current
tipc_nl_media_get() and __tipc_nl_media_set(), a larger array is used,
which is unnecessary. This patch resizes them to the proper size.
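The sizing rule follows the usual netlink pattern: attribute types run from 0 to maxtype, so the destination table needs exactly maxtype + 1 slots, indexed by type. The parser below is a hypothetical toy, not nla_parse_nested_deprecated(), and the `ATTR_*` names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types; type 0 is conventionally "unspec". */
enum { ATTR_UNSPEC, ATTR_NAME, ATTR_PROP, __ATTR_MAX };
#define ATTR_MAX (__ATTR_MAX - 1)

struct toy_attr {
	uint16_t type;
	const void *data;
};

/* Fill tb[0..maxtype]; the caller must provide exactly maxtype + 1 slots. */
static int toy_parse(const struct toy_attr **tb, int maxtype,
		     const struct toy_attr *attrs, size_t n)
{
	for (int i = 0; i <= maxtype; i++)
		tb[i] = NULL;

	for (size_t i = 0; i < n; i++) {
		if (attrs[i].type > maxtype)
			continue;   /* unknown attributes are ignored in this sketch */
		tb[attrs[i].type] = &attrs[i];
	}
	return 0;
}
```

A table declared as `const struct toy_attr *tb[ATTR_MAX + 1]` is exactly large enough; anything bigger is never written past index maxtype and only wastes stack.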

Fixes: 1e55417d8fc6 ("tipc: add media set to new netlink api")
Fixes: 46f15c6794fb ("tipc: add media get/dump to new netlink api")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20230614120604.1196377-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15 14:59:17 -07:00
Vlad Buslov
c9a82bec02 net/sched: cls_api: Fix lockup on flushing explicitly created chain
Mingshuai Ren reports:

When a new chain is added by using tc, a soft lockup alarm will be
 generated after deleting the prio 0 filter of the chain. To reproduce
 the problem, perform the following steps:
(1) tc qdisc add dev eth0 root handle 1: htb default 1
(2) tc chain add dev eth0
(3) tc filter del dev eth0 chain 0 parent 1: prio 0
(4) tc filter add dev eth0 chain 0 parent 1:

Fix the issue by accounting for an additional reference to chains that are
explicitly created by RTM_NEWCHAIN message as opposed to implicitly by
RTM_NEWTFILTER message.

Fixes: 726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush")
Reported-by: Mingshuai Ren <renmingshuai@huawei.com>
Closes: https://lore.kernel.org/lkml/87legswvi3.fsf@nvidia.com/T/
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Link: https://lore.kernel.org/r/20230612093426.2867183-1-vladbu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14 23:03:16 -07:00
Lin Ma
361b6889ae net/handshake: remove fput() that causes use-after-free
A reference underflow was found in the TLS handshake subsystem that causes
a direct use-after-free. Part of the crash log is shown below:

[    2.022114] ------------[ cut here ]------------
[    2.022193] refcount_t: underflow; use-after-free.
[    2.022288] WARNING: CPU: 0 PID: 60 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
[    2.022432] Modules linked in:
[    2.022848] RIP: 0010:refcount_warn_saturate+0xbe/0x110
[    2.023231] RSP: 0018:ffffc900001bfe18 EFLAGS: 00000286
[    2.023325] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00000000ffffdfff
[    2.023438] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 0000000000000001
[    2.023555] RBP: ffff888004c20098 R08: ffffffff82b392c8 R09: 00000000ffffdfff
[    2.023693] R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888004c200d8
[    2.023813] R13: 0000000000000000 R14: ffff888004c20000 R15: ffffc90000013ca8
[    2.023930] FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[    2.024062] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.024161] CR2: ffff888003601000 CR3: 0000000002a2e000 CR4: 00000000000006f0
[    2.024275] Call Trace:
[    2.024322]  <TASK>
[    2.024367]  ? __warn+0x7f/0x130
[    2.024430]  ? refcount_warn_saturate+0xbe/0x110
[    2.024513]  ? report_bug+0x199/0x1b0
[    2.024585]  ? handle_bug+0x3c/0x70
[    2.024676]  ? exc_invalid_op+0x18/0x70
[    2.024750]  ? asm_exc_invalid_op+0x1a/0x20
[    2.024830]  ? refcount_warn_saturate+0xbe/0x110
[    2.024916]  ? refcount_warn_saturate+0xbe/0x110
[    2.024998]  __tcp_close+0x2f4/0x3d0
[    2.025065]  ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
[    2.025168]  tcp_close+0x1f/0x70
[    2.025231]  inet_release+0x33/0x60
[    2.025297]  sock_release+0x1f/0x80
[    2.025361]  handshake_req_cancel_test2+0x100/0x2d0
[    2.025457]  kunit_try_run_case+0x4c/0xa0
[    2.025532]  kunit_generic_run_threadfn_adapter+0x15/0x20
[    2.025644]  kthread+0xe1/0x110
[    2.025708]  ? __pfx_kthread+0x10/0x10
[    2.025780]  ret_from_fork+0x2c/0x50

One can enable the CONFIG_NET_HANDSHAKE_KUNIT_TEST config option to
reproduce the above crash.

The root cause of this bug is that commit 1ce77c998f04
("net/handshake: Unpin sock->file if a handshake is cancelled") adds one
additional fput() call. That patch claims that the fput() is used to
enable sock->file to be freed even when user space never calls DONE.

However, it seems that the DONE routine never performs an additional
fput() of sock->file. The two existing fput() calls are only used to
balance the reference taken in sockfd_lookup().
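The reference balance can be modelled with a plain counter. `struct toy_file`, `toy_get()`, `toy_put()` and `toy_handshake()` are hypothetical; `toy_get()`/`toy_put()` stand in for the reference taken by sockfd_lookup() and dropped by fput(). With the extra put, the owner's final release finds the count already at zero, which corresponds to the refcount_t underflow warning in the crash log.

```c
#include <stdbool.h>

/* Toy file reference counter, standing in for sock->file. */
struct toy_file {
	int refs;
	bool underflow;   /* recorded instead of crashing, to make the bug observable */
};

static void toy_get(struct toy_file *f)   /* like sockfd_lookup() */
{
	f->refs++;
}

static void toy_put(struct toy_file *f)   /* like fput() */
{
	if (f->refs == 0) {
		f->underflow = true;   /* refcount_t would warn: underflow; use-after-free */
		return;
	}
	f->refs--;
}

/* Each lookup is paired with exactly one put; the optional extra put
 * models the unpaired fput() that the revert removes. */
static void toy_handshake(struct toy_file *f, bool extra_put)
{
	toy_get(f);
	toy_put(f);
	if (extra_put)
		toy_put(f);
}
```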

This patch reverts the mentioned commit to avoid the use-after-free. The
patched kernel successfully passes the KUNIT test and boots to a shell.

[    0.733613]     # Subtest: Handshake API tests
[    0.734029]     1..11
[    0.734255]         KTAP version 1
[    0.734542]         # Subtest: req_alloc API fuzzing
[    0.736104]         ok 1 handshake_req_alloc NULL proto
[    0.736114]         ok 2 handshake_req_alloc CLASS_NONE
[    0.736559]         ok 3 handshake_req_alloc CLASS_MAX
[    0.737020]         ok 4 handshake_req_alloc no callbacks
[    0.737488]         ok 5 handshake_req_alloc no done callback
[    0.737988]         ok 6 handshake_req_alloc excessive privsize
[    0.738529]         ok 7 handshake_req_alloc all good
[    0.739036]     # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
[    0.739444]     ok 1 req_alloc API fuzzing
[    0.740065]     ok 2 req_submit NULL req arg
[    0.740436]     ok 3 req_submit NULL sock arg
[    0.740834]     ok 4 req_submit NULL sock->file
[    0.741236]     ok 5 req_lookup works
[    0.741621]     ok 6 req_submit max pending
[    0.741974]     ok 7 req_submit multiple
[    0.742382]     ok 8 req_cancel before accept
[    0.742764]     ok 9 req_cancel after accept
[    0.743151]     ok 10 req_cancel after done
[    0.743510]     ok 11 req_destroy works
[    0.743882] # Handshake API tests: pass:11 fail:0 skip:0 total:11
[    0.744205] # Totals: pass:17 fail:0 skip:0 total:17

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: 1ce77c998f04 ("net/handshake: Unpin sock->file if a handshake is cancelled")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230613083204.633896-1-linma@zju.edu.cn
Link: https://lore.kernel.org/r/20230614015249.987448-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14 22:26:37 -07:00
Jakub Kicinski
37cec6ed8d A couple of straggler fixes, mostly in the stack:
 - fix fragmentation for multi-link related elements
 - fix callback copy/paste error
 - fix multi-link locking
 - remove double-locking of wiphy mutex
 - transmit only on active links, not all
 - activate links in the correct order
 - don't remove links that weren't added
 - disable soft-IRQs for LQ lock in iwlwifi
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmSJcf0ACgkQ10qiO8sP
 aAC77Q//TOZSUoAFnMqsA23SXwN8CeNQC5yBmYwxVMcsqBTO6+7k0NphpFGJUcLA
 OG8Leo6gwJdhLFYt7bl7RfHGSQrAZcNeYB9pv9J96vGQGnCComAAANEWSo8OBT2a
 03yYrfcKfkXAUNm2dxKvwmi3D3/VPyOgj+O6LNEs1DogHw0V6GdthW3J6s/vl6RU
 MPCQqlIFY9j20mXEKPFMaIZ8fyQKh38xa5YttGmeFrSUKYSljWUqqUooSMIkeyS4
 D5mYdzbsqCiihnN1FenEjkBUe2eS6BzxL+KVLaY2vth4tQytGeasvCaGcLcB83nc
 BxGR0rbEkrwp7nBqE4ZpMmhzHG3hpWus2+hJtMWsQku7qzE/vMh4qv2s2+QUVk/3
 jCXGv233bIgvQ2d1SUqp7CenGjJ0eBfKKRVzM+Hyiz+V6kWsugMxNaBmi59JVB7w
 5JilT85LfV2cRJgHtkDY7kMpDWnVYfwenvSywoXaRdVuKiowMUhZ9P19wLE0gn7K
 qtKIaLnkrLE2QHdqlxcuyMPBLhfga2+qXuo94SIYMFNURW7jjJcSVlN8ZVqqBvvp
 ib51XCyx/95zAr1Vyly/Pc7puuCMiiQk0ZhQBgqFPrnjs37JIzHDNo4Cq6H9+FlY
 0EncP/akjy8t7PsBdTmQv1UG3wq4EG5Wmh+wLDpa5QKKs2IqcCQ=
 =IPqO
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
A couple of straggler fixes, mostly in the stack:
 - fix fragmentation for multi-link related elements
 - fix callback copy/paste error
 - fix multi-link locking
 - remove double-locking of wiphy mutex
 - transmit only on active links, not all
 - activate links in the correct order
 - don't remove links that weren't added
 - disable soft-IRQs for LQ lock in iwlwifi

* tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
  wifi: mac80211: fragment per STA profile correctly
  wifi: mac80211: Use active_links instead of valid_links in Tx
  wifi: cfg80211: remove links only on AP
  wifi: mac80211: take lock before setting vif links
  wifi: cfg80211: fix link del callback to call correct handler
  wifi: mac80211: fix link activation settings order
  wifi: cfg80211: fix double lock bug in reg_wdev_chan_valid()
====================

Link: https://lore.kernel.org/r/20230614075502.11765-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14 21:28:59 -07:00
Peilin Ye
84ad0af0bc net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized
in ingress_init() to point to net_device::miniq_ingress.  ingress Qdiscs
access this per-net_device pointer in mini_qdisc_pair_swap().  The same
applies to clsact Qdiscs and miniq_egress.

Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
requests (thanks Hillf Danton for the hint), when replacing ingress or
clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
causing race conditions [1] including a use-after-free bug in
mini_qdisc_pair_swap() reported by syzbot:

 BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
 Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
...
 Call Trace:
  <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
  print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
  print_report mm/kasan/report.c:430 [inline]
  kasan_report+0x11c/0x130 mm/kasan/report.c:536
  mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
  tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
  tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
  tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
  tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
  tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
...

@old and @new should not affect each other.  In other words, @old should
never modify miniq_{in,e}gress after @new, and @new should not update
@old's RCU state.

Fixing without changing sch_api.c turned out to be difficult (please
refer to Closes: for discussions).  Instead, make sure @new's first call
always happens after @old's last call (in {ingress,clsact}_destroy()) has
finished:

In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests,
and call qdisc_destroy() for @old before grafting @new.

Introduce qdisc_refcount_dec_if_one() as the counterpart of
qdisc_refcount_inc_nz() used for filter requests.  Introduce a
non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
just like qdisc_put() etc.
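The two refcount helpers described above can be sketched with C11 atomics. `toy_inc_not_zero()`, `toy_dec_if_one()` and `toy_graft()` are hypothetical names that model the intended semantics of qdisc_refcount_inc_nz() and qdisc_refcount_dec_if_one(); the kernel versions operate on the Qdisc's own refcount type rather than `atomic_int`.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is still alive (count != 0). */
static bool toy_inc_not_zero(atomic_int *r)
{
	int old = atomic_load(r);

	while (old != 0) {
		/* On failure, old is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(r, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop to zero only if we hold the sole reference; otherwise leave it untouched. */
static bool toy_dec_if_one(atomic_int *r)
{
	int expected = 1;

	return atomic_compare_exchange_strong(r, &expected, 0);
}

/* Graft step: refuse while filter requests still hold references to @old. */
static int toy_graft(atomic_int *old_refcnt)
{
	if (!toy_dec_if_one(old_refcnt))
		return -EBUSY;
	/* Here @old would be destroyed before @new is grafted. */
	return 0;
}
```

Because the decrement only succeeds from exactly one, a concurrent filter request that already took a reference makes the compare-and-swap fail, and the graft backs off with -EBUSY instead of racing with it.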

Depends on patch "net/sched: Refactor qdisc_graft() for ingress and
clsact Qdiscs".

[1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
TC_H_ROOT (no longer possible after commit c7cfbd115001 ("net/sched:
sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8
transmission queues:

  Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2),
  then adds a flower filter X to A.

  Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
  b2) to replace A, then adds a flower filter Y to B.

 Thread 1               A's refcnt   Thread 2
  RTM_NEWQDISC (A, RTNL-locked)
   qdisc_create(A)               1
   qdisc_graft(A)                9

  RTM_NEWTFILTER (X, RTNL-unlocked)
   __tcf_qdisc_find(A)          10
   tcf_chain0_head_change(A)
   mini_qdisc_pair_swap(A) (1st)
            |
            |                         RTM_NEWQDISC (B, RTNL-locked)
         RCU sync                2     qdisc_graft(B)
            |                    1     notify_and_destroy(A)
            |
   tcf_block_release(A)          0    RTM_NEWTFILTER (Y, RTNL-unlocked)
   qdisc_destroy(A)                    tcf_chain0_head_change(B)
   tcf_chain0_head_change_cb_del(A)    mini_qdisc_pair_swap(B) (2nd)
   mini_qdisc_pair_swap(A) (3rd)                |
           ...                                 ...

Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to
its mini Qdisc, b1.  Then, A calls mini_qdisc_pair_swap() again during
ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress
packets on eth0 will not find filter Y in sch_handle_ingress().

This is just one of the possible consequences of concurrently accessing
miniq_{in,e}gress pointers.

Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
Cc: Hillf Danton <hdanton@sina.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14 10:31:39 +02:00
Peilin Ye
2d5f6a8d7a net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs
Grafting ingress and clsact Qdiscs does not need a for-loop in
qdisc_graft().  Refactor it.  No functional changes intended.

Tested-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14 10:31:39 +02:00
Paul Blakey
41f2c7c342 net/sched: act_ct: Fix promotion of offloaded unreplied tuple
Currently UNREPLIED and UNASSURED connections are added to the nf flow
table. This causes the following connection packets to be processed
by the flow table, which then skips conntrack_in(), and thus such
connections will remain UNREPLIED and UNASSURED even if reply traffic
is then seen. Furthermore, the unoffloaded reply packets are the ones
triggering the hardware update from new to established state, and if
there aren't any to trigger an update and/or a previous update was
missed, hardware can get out of sync with software and still mark
packets as new.

Fix the above by:
1) Not skipping conntrack_in() for UNASSURED packets, but still
   refreshing for hardware, as before the cited patch.
2) Trying to force a refresh by reply-direction packets that update
   the hardware rules from new to established state.
3) Removing any bidirectional flows that failed to update in
   hardware, for re-insertion as bidirectional once any new packet
   arrives.

Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections")
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/1686313379-117663-1-git-send-email-paulb@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-14 09:56:50 +02:00
Herbert Xu
842665a900 xfrm: Use xfrm_state selector for BEET input
For BEET the inner address and therefore family is stored in the
xfrm_state selector.  Use that when decapsulating an input packet
instead of incorrectly relying on a non-existent tunnel protocol.

Fixes: 5f24f41e8ea6 ("xfrm: Remove inner/outer modes from input path")
Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-06-12 10:36:48 +02:00
Dan Carpenter
75e6def3b2 sctp: fix an error code in sctp_sf_eat_auth()
The sctp_sf_eat_auth() function is supposed to return enum sctp_disposition
values, and returning a kernel error code will cause issues in the
caller.  Change -ENOMEM to SCTP_DISPOSITION_NOMEM.

Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12 09:36:27 +01:00
Dan Carpenter
a0067dfcd9 sctp: handle invalid error codes without calling BUG()
The sctp_sf_eat_auth() function is supposed to return enum sctp_disposition
values but if the call to sctp_ulpevent_make_authkey() fails, it returns
-ENOMEM.

This results in calling BUG() inside the sctp_side_effects() function.
Calling BUG() is an overreaction and not helpful.  Call WARN_ON_ONCE()
instead.

This code predates git.
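The difference between BUG() and WARN_ON_ONCE() can be modelled in userspace. `enum toy_disposition`, `toy_side_effects()` and the -EINVAL return for unexpected values are hypothetical; the point is that an out-of-range value such as -ENOMEM is reported once and turned into an error instead of stopping the machine.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical subset of disposition values, echoing enum sctp_disposition. */
enum toy_disposition {
	TOY_DISPOSITION_DISCARD,
	TOY_DISPOSITION_CONSUME,
	TOY_DISPOSITION_NOMEM,
};

static int warn_count;   /* how often an unexpected value was seen */

static int toy_side_effects(int status)
{
	switch (status) {
	case TOY_DISPOSITION_DISCARD:
	case TOY_DISPOSITION_CONSUME:
		return 0;
	case TOY_DISPOSITION_NOMEM:
		return -ENOMEM;   /* the value the state function should return */
	default:
		/* A raw errno leaking in here is a caller bug: report it only
		 * the first time, like WARN_ON_ONCE(), rather than abort like BUG(). */
		if (warn_count++ == 0)
			fprintf(stderr, "unexpected disposition %d\n", status);
		return -EINVAL;
	}
}
```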

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12 09:36:27 +01:00
Benjamin Berg
d094482c99 wifi: mac80211: fragment per STA profile correctly
When fragmenting the ML per STA profile, the element ID should be
IEEE80211_MLE_SUBELEM_PER_STA_PROFILE rather than WLAN_EID_FRAGMENT.

Change the helper function to take the element ID to be used and pass
the appropriate value for each of the fragmentation levels.
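The fragmentation scheme can be sketched generically: the first chunk carries the original ID, and every later chunk of at most 255 bytes carries a caller-supplied fragment ID, which is the kind of parameter the patch adds. `toy_put_fragmented()` is hypothetical, and the ID values in the test are arbitrary examples rather than a statement of the real IEEE 802.11 constants.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write `data` as an element with ID `id`, splitting payloads longer than
 * 255 bytes into follow-up fragments that carry `frag_id`.
 * Returns the number of bytes written, or 0 if `out` is too small. */
static size_t toy_put_fragmented(uint8_t *out, size_t out_len, uint8_t id,
				 uint8_t frag_id, const uint8_t *data, size_t len)
{
	size_t pos = 0;
	uint8_t cur_id = id;

	do {
		size_t chunk = len > 255 ? 255 : len;

		if (pos + 2 + chunk > out_len)
			return 0;
		out[pos++] = cur_id;
		out[pos++] = (uint8_t)chunk;
		memcpy(out + pos, data, chunk);
		pos += chunk;
		data += chunk;
		len -= chunk;
		cur_id = frag_id;   /* every element after the first is a fragment */
	} while (len > 0);

	return pos;
}
```

Passing the wrong `frag_id` still produces well-formed length fields, which is why this kind of bug is easy to miss without a parser on the receiving side.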

Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230611121219.9b5c793d904b.I7dad952bea8e555e2f3139fbd415d0cd2b3a08c3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-12 09:52:52 +02:00
David S. Miller
65d8bd81aa netfilter pull request 23-06-08
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmSCMbUACgkQ1V2XiooU
 IOR4Vw//UPJALibmSrS4+yN7nHM3Bwy40ZVCYZkskFuotDU4pkVO6/GKZKZAF9ZM
 jgYXb5in61lnatyLlolZeg2lbZM4XmKxi4hZa7hXy2oHvp7p/CkIHri2Sn/dSTT1
 fv+/LYeQFc6w3o+z7uUMWG+WruDLyFREAEw5A2vbVLOAvLCsjPYxSWOHOBOuMgg/
 dpRMlcgkBDRycy2a9wVhCSDxB/pwv1ksk5Ev8TIC+ACOI4WCauzFvPlO4IrbjiGK
 GG4avni4q2R1ia9EbVE4OZiaFYOQyABS8XbJX32vcMN2BVoRCnmacJNNo9Q0jpM8
 rRjThsDsKPQARIEB0/HAccO++afrtQWZPQehUNskmd/d3g79tk+3k/hgrP64anz9
 C7GuWTT40Xaq7nccsrrOPfHHODYM5IZilw6kVgkmfFUQ0m0mD1A79Mo4XCRxpOAG
 adMI9Zy+1tGPfZVTvb1WxMEyk52agYfUc9obxomiTIdK8/5HEK3c5Z2oMOGiTjH/
 5Msj6VNTGRjhfFXKiLLk/cb0nhWbuAORwTHahvFLqYa2m1/Gpf+Kvt4jkqCjtmy+
 TfOlYPfEo1+qVRSWy8pcGKhMX5H6oYXwff1s4Bj0ycoxVIIJF3qeIdKSP64AGxh+
 yahIw0GfvBDA3ZuSYcHIYeEuAP7yFRwL6lX4P/Rz5PNxsj+2YxY=
 =H0/i
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

netfilter pull request 23-06-08

Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for net:

1) Add commit and abort set operation to pipapo set abort path.

2) Bail out immediately in case of ENOMEM in nfnetlink batch.

3) Incorrect error path handling when creating a new rule leads to
   a dangling pointer in the set transaction list.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-10 19:57:03 +01:00
Dmitry Mastykin
b403643d15 netlabel: fix shift wrapping bug in netlbl_catmap_setlong()
There is a shift wrapping bug in this code on 32-bit architectures.
NETLBL_CATMAP_MAPTYPE is u64, bitmap is unsigned long.
Every second 32-bit word of catmap becomes corrupted.
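The bug class can be reproduced with fixed-width types. In this hypothetical sketch `uint32_t` plays the role of a 32-bit `unsigned long` and `map_word_t` that of the u64 NETLBL_CATMAP_MAPTYPE; `set_chunk()` is not the netlabel code. Widening before the shift is what keeps the upper 32-bit word intact.

```c
#include <stdint.h>

/* 64-bit map word, standing in for the u64 NETLBL_CATMAP_MAPTYPE. */
typedef uint64_t map_word_t;

/* Store a 32-bit chunk at bit `offset` (0 or 32) of a 64-bit map word.
 * Casting to the 64-bit type *before* shifting keeps the upper half.
 * Shifting the 32-bit value itself by 32 is undefined behaviour in C;
 * on common hardware the shift count is masked, so the bits land in the
 * lower half instead, corrupting every second 32-bit word. */
static void set_chunk(map_word_t *word, uint32_t bitmap, unsigned int offset)
{
	*word |= (map_word_t)bitmap << offset;
}
```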

Signed-off-by: Dmitry Mastykin <dmastykin@astralinux.ru>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-10 19:54:06 +01:00
Ilan Peer
7b3b9ac899 wifi: mac80211: Use active_links instead of valid_links in Tx
Fix a few places on the Tx path where valid_links was used instead
of active_links.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.e24832691fc8.I9ac10dc246d7798a8d26b1a94933df5668df63fc@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09 13:31:08 +02:00
Johannes Berg
34d4e3eb67 wifi: cfg80211: remove links only on AP
Since links are only controlled by userspace via cfg80211
in AP mode, also only remove them from the driver in that
case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.ed65b94916fa.I2458c46888284cc5ce30715fe642bc5fc4340c8f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09 13:30:53 +02:00
Benjamin Berg
15846f95ab wifi: mac80211: take lock before setting vif links
ieee80211_vif_set_links requires the sdata->local->mtx lock to be held.
Add the appropriate locking around the calls in both the link add and
remove handlers.

The missing lock causes a warning when, e.g., ieee80211_link_release_channel
is called via ieee80211_link_stop from ieee80211_vif_update_links.

Fixes: 0d8c4a3c8688 ("wifi: mac80211: implement add/del interface link callbacks")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.fa0c6597fdad.I83dd70359f6cda30f86df8418d929c2064cf4995@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09 13:30:32 +02:00
Benjamin Berg
1ff56684fa wifi: cfg80211: fix link del callback to call correct handler
The wrapper function was incorrectly calling the add handler instead of
the del handler. This had no negative side effect as the default
handlers are essentially identical.

Fixes: f2a0290b2df2 ("wifi: cfg80211: add optional link add/remove callbacks")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.ebd00e000459.Iaff7dc8d1cdecf77f53ea47a0e5080caa36ea02a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09 13:30:16 +02:00
Johannes Berg
01605ad6c3 wifi: mac80211: fix link activation settings order
In the normal MLME code we always call
ieee80211_mgd_set_link_qos_params() before
ieee80211_link_info_change_notify() and some drivers,
notably iwlwifi, rely on that as they don't do anything
(but store the data) in their conf_tx.

Fix the order here to be the same as in the normal code
paths, so this isn't broken.

Fixes: 3d9011029227 ("wifi: mac80211: implement link switching")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.a2a86bba2f80.Iac97e04827966d22161e63bb6e201b4061e9651b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09 13:30:03 +02:00
Dan Carpenter
996c3117da wifi: cfg80211: fix double lock bug in reg_wdev_chan_valid()
The locking was changed recently so now the caller holds the wiphy_lock()
lock.  Taking the lock inside the reg_wdev_chan_valid() function will
lead to a deadlock.
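
A minimal userspace sketch of the pattern: once the caller holds the lock, the callee should only check that it is held (in the spirit of lockdep_assert_held()) rather than take it again. The toy_mutex type, the channel check and the 1..14 channel range are hypothetical; a real non-recursive mutex would hang where this sketch asserts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock that records whether it is held. */
struct toy_mutex {
	bool held;
};

static void toy_lock(struct toy_mutex *m)
{
	/* A real mutex would deadlock here if already held; the sketch
	 * turns that into an assertion failure instead. */
	assert(!m->held);
	m->held = true;
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/* Fixed callee: requires the caller to hold the lock, never takes it. */
static bool chan_valid(struct toy_mutex *wiphy_mtx, int chan)
{
	assert(wiphy_mtx->held);
	return chan > 0 && chan <= 14; /* hypothetical validity rule */
}

/* Caller that, after the locking change, already holds the lock. */
static bool caller_checks_chan(struct toy_mutex *wiphy_mtx, int chan)
{
	bool ok;

	toy_lock(wiphy_mtx);
	ok = chan_valid(wiphy_mtx, chan);
	toy_unlock(wiphy_mtx);
	return ok;
}
```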

Fixes: f7e60032c661 ("wifi: cfg80211: fix locking in regulatory disconnect")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/40c4114a-6cb4-4abf-b013-300b598aba65@moroto.mountain
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-09 13:28:41 +02:00
Lee Jones
04c55383fa net/sched: cls_u32: Fix reference counter leak leading to overflow
In the event of a failure in tcf_change_indev(), u32_set_parms() will
immediately return without decrementing the recently incremented
reference counter.  If this happens enough times, the counter will
roll over and the reference will be freed, leading to a double free
which can be used to do 'bad things'.

In order to prevent this, move the point of possible failure above the
point where the reference counter is incremented.  Also save any
meaningful return values to be applied to the return data at the
appropriate point in time.
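
The reordering can be sketched in userspace C as follows. toy_obj and toy_resolve_indev() are hypothetical stand-ins for the refcounted object and for the fallible tcf_change_indev() step; this is not the cls_u32 code itself.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical object with a reference counter. */
struct toy_obj {
	unsigned int refcnt;
};

/* Stand-in for a step that can fail, such as resolving an ingress device. */
static int toy_resolve_indev(int ifindex)
{
	return ifindex > 0 ? 0 : -EINVAL;
}

/* Buggy ordering: the reference is taken first and the early return
 * leaks it. */
static int set_parms_buggy(struct toy_obj *o, int ifindex)
{
	o->refcnt++;
	if (toy_resolve_indev(ifindex) < 0)
		return -EINVAL; /* leak: refcnt is never dropped */
	return 0;
}

/* Fixed ordering: run the fallible step first, keep its result, and take
 * the reference only once nothing can fail any more. */
static int set_parms_fixed(struct toy_obj *o, int ifindex)
{
	int err = toy_resolve_indev(ifindex);

	if (err < 0)
		return err;
	o->refcnt++;
	return 0;
}
```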

This issue was caught with KASAN.

Fixes: 705c7091262d ("net: sched: cls_u32: no need to call tcf_exts_change for newly allocated struct")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Lee Jones <lee@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 11:40:17 +01:00
Zhengchao Shao
be3618d965 net/sched: taprio: fix slab-out-of-bounds Read in taprio_dequeue_from_txq
As shown in [1], out-of-bounds access occurs in two cases:
1) When the qdisc of the taprio type is used to replace the previously
configured taprio, count and offset in tc_to_txq can be set to 0. In this
case, the value of *txq in taprio_next_tc_txq() will increase
continuously. When the number of accessed queues exceeds the number of
queues on the device, out-of-bounds access occurs.
2) When packets are dequeued, taprio can be deleted. In this case, the tc
rule of dev is cleared and the count and offset values are also set to 0,
which likewise causes out-of-bounds access.

Fix this by adding a restriction on the queue number.
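
A userspace sketch of why an equality-only wrap runs away and how a bound on the queue number contains it. The helper names are hypothetical and the exact form of the in-kernel check may differ; the sketch also assumes offset is itself below num_tx_queues.

```c
#include <assert.h>

/* Advance *txq within a traffic class that owns [offset, offset + count). */
static void next_tc_txq_buggy(int offset, int count, int *txq)
{
	(*txq)++;
	/* Equality test only: with count == offset == 0 and *txq already
	 * past 0, this never matches and *txq grows without bound. */
	if (*txq == offset + count)
		*txq = offset;
}

/* Same step plus a restriction on the queue number: an index at or past
 * the device's queue count is reset to the class's first queue
 * (assumes offset < num_tx_queues). */
static void next_tc_txq_fixed(int offset, int count, int num_tx_queues,
			      int *txq)
{
	next_tc_txq_buggy(offset, count, txq);
	if (*txq >= num_tx_queues)
		*txq = offset;
}
```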

[1] https://groups.google.com/g/syzkaller-bugs/c/_lYOKgkBVMg
Fixes: 2f530df76c8c ("net/sched: taprio: give higher priority to higher TCs in software dequeue mode")
Reported-by: syzbot+04afcb3d2c840447559a@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Tested-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:48:14 +01:00
Max Tottenham
6c02568fd1 net/sched: act_pedit: Parse L3 Header for L4 offset
Instead of relying on skb->transport_header being set correctly, parse
the L3 header length out of the L3 headers for both IPv4 and IPv6 when
the Extended Layer Op for tcp/udp is used. This fixes a bug when GRO is
disabled: in that case skb->transport_header is set by
__netif_receive_skb_core() to point to the L3 header. It is later fixed
up by the upper protocol layers, but act_pedit receives the SKB before
those fixups are complete. With the existing behavior, if GRO is
disabled, the following command edits the L3 header instead of the UDP
header:

    tc filter add dev eth0 ingress protocol ip flower ip_proto udp \
 dst_ip 192.168.1.3 action pedit ex munge udp set dport 18053
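
Deriving the L4 offset from the L3 header itself can be sketched in userspace C as below. l4_offset_from_l3() is a hypothetical helper operating on a raw byte buffer rather than an skb, and for simplicity it does not walk IPv6 extension headers, which a real implementation must handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the L4 offset relative to the start of the L3 header, or -1.
 * IPv6 extension headers are deliberately not handled in this sketch. */
static int l4_offset_from_l3(const uint8_t *l3, size_t len)
{
	if (len < 1)
		return -1;

	switch (l3[0] >> 4) {
	case 4: {
		/* IHL counts 32-bit words; legal values are 5..15. */
		int ihl = l3[0] & 0x0f;

		if (ihl < 5 || len < (size_t)ihl * 4)
			return -1;
		return ihl * 4;
	}
	case 6:
		/* Fixed 40-byte IPv6 header. */
		return len >= 40 ? 40 : -1;
	default:
		return -1;
	}
}
```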

Also re-introduce a rate-limited warning if we were unable to extract
the header offset when using the 'ex' interface.

Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers")
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Reviewed-by: Josh Hunt <johunt@akamai.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305261541.N165u9TZ-lkp@intel.com/
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:34:27 +01:00
Maciej Żenczykowski
1166a530a8 xfrm: fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets
Before Linux v5.8, an AF_INET6 SOCK_DGRAM (udp/udplite) socket
with SOL_UDP, UDP_ENCAP, UDP_ENCAP_ESPINUDP{,_NON_IKE} enabled
would just unconditionally use xfrm4_udp_encap_rcv(). Afterwards,
such a socket would use the newly added xfrm6_udp_encap_rcv(),
which only handles IPv6 packets, so inbound IPv4 ESP-in-UDP
packets on such dual-stack sockets were no longer handled.

Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Benedict Wong <benedictwong@google.com>
Cc: Yan Yan <evitayan@google.com>
Fixes: 0146dca70b87 ("xfrm: add support for UDPv6 encapsulation of ESP")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-06-09 08:16:34 +02:00
Guillaume Nault
91ffd1bae1 ping6: Fix send to link-local addresses with VRF.
Ping sockets can't send packets when they're bound to a VRF master
device and the output interface is set to a slave device.

For example, when net.ipv4.ping_group_range is properly set, so that
ping6 can use ping sockets, the following kind of command fails:
  $ ip vrf exec red ping6 fe80::854:e7ff:fe88:4bf1%eth1

What happens is that sk->sk_bound_dev_if is set to the VRF master
device, but 'oif' is set to the real output device. Since both are set
but different, ping_v6_sendmsg() sees their value as inconsistent and
fails.

Fix this by allowing 'oif' to be a slave device of ->sk_bound_dev_if.
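
The relaxed consistency check can be sketched in userspace C as follows. In the kernel the master lookup would go through the l3mdev helpers; here a static, hypothetical master_of[] table stands in, and oif_allowed() is an illustrative name rather than the ping_v6_sendmsg() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device table: master_of[i] is the ifindex of device i's
 * L3 master (VRF), or 0 if none. Devices 2 and 3 are slaves of VRF 1. */
static const int master_of[] = { 0, 0, 1, 1, 0 };
#define NDEVS ((int)(sizeof(master_of) / sizeof(master_of[0])))

/* Accept oif if either index is unset, if they match, or if oif is a
 * slave of the bound (VRF master) device. */
static bool oif_allowed(int bound_if, int oif)
{
	if (!bound_if || !oif || bound_if == oif)
		return true;
	if (oif < 0 || oif >= NDEVS)
		return false; /* unknown device in this toy table */
	return master_of[oif] == bound_if;
}
```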

This fixes the following kselftest failure:
  $ ./fcnal-test.sh -t ipv6_ping
  [...]
  TEST: ping out, vrf device+address bind - ns-B IPv6 LLA        [FAIL]

Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/b6191f90-ffca-dbca-7d06-88a9788def9c@alu.unizg.hr/
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Fixes: 5e457896986e ("net: ipv6: Fix ping to link-local addresses.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/6c8b53108816a8d0d5705ae37bdc5a8322b5e3d9.1686153846.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 18:59:57 -07:00
Pablo Neira Ayuso
1240eb93f0 netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
In case of error when adding a new rule that refers to an anonymous set,
deactivate expressions via NFT_TRANS_PREPARE state, not NFT_TRANS_RELEASE.
Thus, the lookup expression marks anonymous sets as inactive in the next
generation to ensure they are no longer reachable in this transaction and
decrements the set refcount, as introduced by c1592a89942e ("netfilter:
nf_tables: deactivate anonymous set from preparation phase"). The abort
step takes care of undoing the anonymous set.

This is also consistent with rule deletion, where NFT_TRANS_PREPARE is
used. Note that this error path is exercised in the preparation step of
the commit protocol. This patch replaces nf_tables_rule_release() by the
deactivate and destroy calls, this time with NFT_TRANS_PREPARE.

Due to this incorrect error handling, it is possible to access a
dangling pointer to the anonymous set that remains in the transaction
list.

[1009.379054] BUG: KASAN: use-after-free in nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379106] Read of size 8 at addr ffff88816c4c8020 by task nft-rule-add/137110
[1009.379116] CPU: 7 PID: 137110 Comm: nft-rule-add Not tainted 6.4.0-rc4+ #256
[1009.379128] Call Trace:
[1009.379132]  <TASK>
[1009.379135]  dump_stack_lvl+0x33/0x50
[1009.379146]  ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379191]  print_address_description.constprop.0+0x27/0x300
[1009.379201]  kasan_report+0x107/0x120
[1009.379210]  ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379255]  nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379302]  nft_lookup_init+0xa5/0x270 [nf_tables]
[1009.379350]  nf_tables_newrule+0x698/0xe50 [nf_tables]
[1009.379397]  ? nf_tables_rule_release+0xe0/0xe0 [nf_tables]
[1009.379441]  ? kasan_unpoison+0x23/0x50
[1009.379450]  nfnetlink_rcv_batch+0x97c/0xd90 [nfnetlink]
[1009.379470]  ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]
[1009.379485]  ? __alloc_skb+0xb8/0x1e0
[1009.379493]  ? __alloc_skb+0xb8/0x1e0
[1009.379502]  ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[1009.379509]  ? unwind_get_return_address+0x2a/0x40
[1009.379517]  ? write_profile+0xc0/0xc0
[1009.379524]  ? avc_lookup+0x8f/0xc0
[1009.379532]  ? __rcu_read_unlock+0x43/0x60

Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-08 21:49:26 +02:00
Linus Torvalds
25041a4c02 Networking fixes for 6.4-rc6, including fixes from can, wifi, netfilter,
bluetooth and ebpf.
 
 Current release - regressions:
 
   - bpf: sockmap: avoid potential NULL dereference in sk_psock_verdict_data_ready()
 
   - wifi: iwlwifi: fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()
 
   - phylink: actually fix ksettings_set() ethtool call
 
   - eth: dwmac-qcom-ethqos: fix a regression on EMAC < 3
 
 Current release - new code bugs:
 
   - wifi: mt76: fix possible NULL pointer dereference in mt7996_mac_write_txwi()
 
 Previous releases - regressions:
 
   - netfilter: fix NULL pointer dereference in nf_confirm_cthelper
 
   - wifi: rtw88/rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS
 
   - openvswitch: fix upcall counter access before allocation
 
   - bluetooth:
     - fix use-after-free in hci_remove_ltk/hci_remove_irk
     - fix l2cap_disconnect_req deadlock
 
   - nic: bnxt_en: prevent kernel panic when receiving unexpected PHC_UPDATE event
 
 Previous releases - always broken:
 
   - core: annotate rfs lockless accesses
 
   - sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values
 
   - netfilter: add null check for nla_nest_start_noflag() in nft_dump_basechain_hook()
 
   - bpf: fix UAF in task local storage
 
   - ipv4: ping_group_range: allow GID from 2147483648 to 4294967294
 
   - ipv6: rpl: fix route of death.
 
   - tcp: gso: really support BIG TCP
 
   - mptcp: fixes for user-space PM address advertisement
 
   - smc: avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT
 
   - can: avoid possible use-after-free when j1939_can_rx_register fails
 
   - batman-adv: fix UaF while rescheduling delayed work
 
   - eth: qede: fix scheduling while atomic
 
   - eth: ice: make writes to /dev/gnssX synchronous
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmSBsv4SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkMXUP/jisT2xvTFRmtshX3h+xxPkBxZSo9ovx
 ujviqZkyCNep9fu7Njv+5WWp0V8cy3Ui6G6RiGNHDV24vBtISlX21yQt+VANOPjH
 7x8oqqnANxn3PXjL5hp6YZhNaxiwfAfQGJiU+TngVo1jTJopnWEt2x8Q3EhF/k0S
 id8VaHGh/ugC8lRZSJBK/b+FsJjWY0sxTcsoRSjp6gg1WHUVO8mJXlCfHFhNJcQQ
 /8ghieuskLUs4V6UX3TGg4smGxgl2HPdA79+ohvrVhcB1WoGCsWV83SfUTBWgHkU
 IZrIfM4BFCThcN88IgRgJioeX95D54SK0RzEZdCnJx+elmgTK1ZdUGlBh1Vybh+v
 iQel2dgJI+8zyIl/4lXYdhHogLwnONVrkszMrx+Ds2PzNecmnFWg4LUK01xLjW7J
 poAFsZGVBk0BuTkEqXtxv/8Cc7wU/PMOmy4ZVBrHkNIyGgOLbt5eM0T/pArYoKvr
 +34del2Us2vGVk6i89F/GgRuNCvevO0Y+HyAArOJr2XwpakwQYQHdBdj/77FGjFZ
 PyR/bVJZhxdUMv+J7BdKQK+mwt+ZFBVwIRfU2gvHcDa2XQJe2Eg8GXRtcJ1P7hpr
 Q2A+AgiHSoAn6GrgYNHNZVBhWywQFCsu2ZpH7J0uo4zOyTUl3+4O8jyfDrD7o56D
 BodtDJKZit3B
 =X6b2
 -----END PGP SIGNATURE-----

Merge tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can, wifi, netfilter, bluetooth and ebpf.

  Current release - regressions:

   - bpf: sockmap: avoid potential NULL dereference in
     sk_psock_verdict_data_ready()

   - wifi: iwlwifi: fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()

   - phylink: actually fix ksettings_set() ethtool call

   - eth: dwmac-qcom-ethqos: fix a regression on EMAC < 3

  Current release - new code bugs:

   - wifi: mt76: fix possible NULL pointer dereference in
     mt7996_mac_write_txwi()

  Previous releases - regressions:

   - netfilter: fix NULL pointer dereference in nf_confirm_cthelper

   - wifi: rtw88/rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS

   - openvswitch: fix upcall counter access before allocation

   - bluetooth:
      - fix use-after-free in hci_remove_ltk/hci_remove_irk
      - fix l2cap_disconnect_req deadlock

   - nic: bnxt_en: prevent kernel panic when receiving unexpected
     PHC_UPDATE event

  Previous releases - always broken:

   - core: annotate rfs lockless accesses

   - sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values

   - netfilter: add null check for nla_nest_start_noflag() in
     nft_dump_basechain_hook()

   - bpf: fix UAF in task local storage

   - ipv4: ping_group_range: allow GID from 2147483648 to 4294967294

   - ipv6: rpl: fix route of death.

   - tcp: gso: really support BIG TCP

   - mptcp: fixes for user-space PM address advertisement

   - smc: avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT

   - can: avoid possible use-after-free when j1939_can_rx_register fails

   - batman-adv: fix UaF while rescheduling delayed work

   - eth: qede: fix scheduling while atomic

   - eth: ice: make writes to /dev/gnssX synchronous"

* tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks
  bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event
  bnxt_en: Skip firmware fatal error recovery if chip is not accessible
  bnxt_en: Query default VLAN before VNIC setup on a VF
  bnxt_en: Don't issue AP reset during ethtool's reset operation
  bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg()
  net: bcmgenet: Fix EEE implementation
  eth: ixgbe: fix the wake condition
  eth: bnxt: fix the wake condition
  lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release()
  bpf: Add extra path pointer check to d_path helper
  net: sched: fix possible refcount leak in tc_chain_tmplt_add()
  net: sched: act_police: fix sparse errors in tcf_police_dump()
  net: openvswitch: fix upcall counter access before allocation
  net: sched: move rtm_tca_policy declaration to include file
  ice: make writes to /dev/gnssX synchronous
  net: sched: add rcu annotations around qdisc->qdisc_sleeping
  rfs: annotate lockless accesses to RFS sock flow table
  rfs: annotate lockless accesses to sk->sk_rxhash
  virtio_net: use control_buf for coalesce params
  ...
2023-06-08 09:27:19 -07:00
Jakub Kicinski
182620ab36 Here is a batman-adv bugfix:
- fix a broken sync while rescheduling delayed work, by
    Vladislav Efanov
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmSAp90WHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoZPwEADXDBvWlT7DH3sP1peNmFF/y+pd
 AgNJE5wJiJGBrCA1K0gZONulyhpOLjsBtwuyWDuS143IlBSPPQqFcJgZrmU6zHfQ
 MjZBOJMGEgxUbh51vQH4bVTotZB1STVSbst9+HJi+KLpgEN/BnMU2/gDfbV5JwzK
 xVDk2/D0+1OX4w61V+UDqmXuljBLa5cbOUPnRP4/gz/suVui5Q0CESeB+H/One9z
 RlKM6YkcSOr4y9MAIgSpJwY8O4hZ0oeqZyewMTYYWYDQ3nGpZ9NUCGR8kYozusqg
 u8c71nrJwdHV8VS7IU3eEzeKXFo2uz8UxyTgK+qcsoem4oTZgCZy8nXh+Pwp2y72
 R+RHFngBcKIYlvil5cyVUisnJ7GZOjHK/N2pESeG7A/iI0jU6YZgVe5oJaCHbMJl
 //F6m4iFHvPAbf61f5tRePTZTPd98LC3KAlI1Fu4/g+07H0ivgsiFk+qi45OvvcE
 MWvK12FlgTbCUeqjhg6bKuVva2NY5SDn1uRcVrTAD3HDpcpfw7UYv/jLBIeAwpoY
 S7SLRl6xho1+aEPaJWx39DrXWQCFQZP95ygZIN5TPZcihvVp8knY0F7mLPMRovf9
 WBFp89+aS/bv1x14fLrPYOzyJD0XisA3A04iERvf1eLroNa9poh3KHf5jhJLH+RP
 CTumDtHSDt+GIH7E5A==
 =euFF
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here is a batman-adv bugfix:

 - fix a broken sync while rescheduling delayed work,
   by Vladislav Efanov

* tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge:
  batman-adv: Broken sync while rescheduling delayed work
====================

Link: https://lore.kernel.org/r/20230607155515.548120-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 21:56:01 -07:00
Jakub Kicinski
c9d99cfa66 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZIDxUwAKCRDbK58LschI
 g5hDAQD7ukrniCvMRNIm2yUZIGSxE4RvGiXptO4a0NfLck5R/wEAsfN2KUsPcPhW
 HS37lVfx7VVXfj42+REf7lWLu4TXpwk=
 =6mS/
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-06-07

We've added 7 non-merge commits during the last 7 day(s) which contain
a total of 12 files changed, 112 insertions(+), 7 deletions(-).

The main changes are:

1) Fix a use-after-free in BPF's task local storage, from KP Singh.

2) Make struct path handling more robust in bpf_d_path, from Jiri Olsa.

3) Fix a syzbot NULL-pointer dereference in sockmap, from Eric Dumazet.

4) UAPI fix for BPF_NETFILTER before final kernel ships,
   from Florian Westphal.

5) Fix map-in-map array_map_gen_lookup code generation where elem_size was
   not being set for inner maps, from Rhys Rustad-Elliott.

6) Fix sockopt_sk selftest's NETLINK_LIST_MEMBERSHIPS assertion,
   from Yonghong Song.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add extra path pointer check to d_path helper
  selftests/bpf: Fix sockopt_sk selftest
  bpf: netfilter: Add BPF_NETFILTER bpf_attach_type
  selftests/bpf: Add access_inner_map selftest
  bpf: Fix elem_size not being set for inner maps
  bpf: Fix UAF in task local storage
  bpf, sockmap: Avoid potential NULL dereference in sk_psock_verdict_data_ready()
====================

Link: https://lore.kernel.org/r/20230607220514.29698-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 21:47:11 -07:00
Pablo Neira Ayuso
a1a64a151d netfilter: nfnetlink: skip error delivery on batch in case of ENOMEM
If the caller reports ENOMEM, then stop iterating over the batch and send a
single netlink message to userspace to report OOM.
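
A userspace sketch of that reporting rule: per-message errors are collected normally, but the first -ENOMEM discards them and yields a single OOM report. report_batch_errors() is a hypothetical helper, not the nfnetlink code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Collect one report per failed message into reports[] (room for n
 * entries); on -ENOMEM stop and emit a single OOM report instead.
 * Returns the number of reports emitted. */
static size_t report_batch_errors(const int *errs, size_t n, int *reports)
{
	size_t i, nr = 0;

	for (i = 0; i < n; i++) {
		if (errs[i] == -ENOMEM) {
			reports[0] = -ENOMEM; /* single OOM notification */
			return 1;
		}
		if (errs[i] < 0)
			reports[nr++] = errs[i];
	}
	return nr;
}
```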

Fixes: cbb8125eb40b ("netfilter: nfnetlink: deliver netlink errors on batch completion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-08 04:00:02 +02:00
Pablo Neira Ayuso
212ed75dc5 netfilter: nf_tables: integrate pipapo into commit protocol
The pipapo set backend follows a copy-on-update approach, maintaining one
clone of the existing data structure that is being updated. The clone
and current data structures are swapped via RCU from the commit step.

The existing integration with the commit protocol is flawed because
there is no operation to clean up the clone if the transaction is
aborted. Moreover, the data structure swap happens on set element
activation.

This patch adds two new operations for sets, commit and abort. These
operations are invoked from the commit and abort steps, after the
transactions have been digested, and the pipapo set backend is updated
to use them.

This patch also adds a new ->pending_update field to sets to maintain a
list of sets that require these new commit and abort operations.
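
The copy-on-update scheme with explicit commit and abort can be sketched in single-threaded userspace C as below. The cow_set type and functions are hypothetical; the kernel swaps pointers with RCU and defers freeing the old copy, whereas this sketch frees immediately only because nothing reads concurrently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Readers see 'live'; updates go to a private clone. */
struct cow_set {
	int *live;
	int *clone; /* NULL when no update is pending */
	size_t n;
};

/* Return the pending clone, creating it from 'live' on first update. */
static int *cow_clone(struct cow_set *s)
{
	if (!s->clone) {
		s->clone = malloc(s->n * sizeof(*s->clone));
		if (!s->clone)
			return NULL;
		memcpy(s->clone, s->live, s->n * sizeof(*s->clone));
	}
	return s->clone;
}

/* Commit step: publish the clone (RCU swap plus deferred free in-kernel). */
static void cow_commit(struct cow_set *s)
{
	if (!s->clone)
		return;
	free(s->live);
	s->live = s->clone;
	s->clone = NULL;
}

/* Abort step: the clean-up described as missing; discard the clone. */
static void cow_abort(struct cow_set *s)
{
	free(s->clone);
	s->clone = NULL;
}
```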

Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-08 03:56:20 +02:00
Hangyu Hua
44f8baaf23 net: sched: fix possible refcount leak in tc_chain_tmplt_add()
try_module_get() is called in tcf_proto_lookup_ops(), so module_put()
needs to be called to drop the refcount if the ops don't implement the
required function.
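
A userspace sketch of the fixed error path. toy_module, toy_ops and lookup_ops() are hypothetical stand-ins for the module refcount and for tcf_proto_lookup_ops(); the -EOPNOTSUPP return value is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical module use count, standing in for try_module_get()/module_put(). */
struct toy_module { int refs; };

struct toy_ops {
	struct toy_module *owner;
	int (*tmplt_create)(void); /* optional callback */
};

/* The lookup pins the owning module, like try_module_get(). */
static struct toy_ops *lookup_ops(struct toy_ops *ops)
{
	ops->owner->refs++;
	return ops;
}

/* If the ops lack the required callback, drop the lookup's reference
 * before bailing out; on success the reference stays held. */
static int chain_tmplt_add(struct toy_ops *candidate)
{
	struct toy_ops *ops = lookup_ops(candidate);

	if (!ops->tmplt_create) {
		ops->owner->refs--; /* the module_put() that was missing */
		return -EOPNOTSUPP;
	}
	return ops->tmplt_create();
}

static int ok_create(void) { return 0; }
```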

Fixes: 9f407f1768d3 ("net: sched: introduce chain templates")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 12:31:41 +01:00
Eric Dumazet
682881ee45 net: sched: act_police: fix sparse errors in tcf_police_dump()
Fix the following sparse errors:

net/sched/act_police.c:360:28: warning: dereference of noderef expression
net/sched/act_police.c:362:45: warning: dereference of noderef expression
net/sched/act_police.c:362:45: warning: dereference of noderef expression
net/sched/act_police.c:368:28: warning: dereference of noderef expression
net/sched/act_police.c:370:45: warning: dereference of noderef expression
net/sched/act_police.c:370:45: warning: dereference of noderef expression
net/sched/act_police.c:376:45: warning: dereference of noderef expression
net/sched/act_police.c:376:45: warning: dereference of noderef expression

Fixes: d1967e495a8d ("net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 12:28:04 +01:00
Eelco Chaudron
de9df6c6b2 net: openvswitch: fix upcall counter access before allocation
Currently, the per-cpu upcall counters are allocated after the vport is
created and inserted into the system. This could lead to the datapath
accessing the counters before they are allocated, resulting in a kernel
Oops.

Here is an example:

  PID: 59693    TASK: ffff0005f4f51500  CPU: 0    COMMAND: "ovs-vswitchd"
   #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
   #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
   #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
   #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
   #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
   #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
   #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
   #7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
   ...

  PID: 58682    TASK: ffff0005b2f0bf00  CPU: 0    COMMAND: "kworker/0:3"
   #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
   #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
   #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
   #3 [ffff80000a5d3120] die at ffffb70f0628234c
   #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
   #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
   #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
   #7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
   #8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
   #9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
  #10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
  #11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
  #12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
  #13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
  #14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
  #15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
  #16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
  #17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90

We moved the per cpu upcall counter allocation to the existing vport
alloc and free functions to solve this.
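
The "allocate before publishing" shape of this fix can be sketched in userspace C as below. The per-cpu counters are modelled as a plain array indexed by CPU number, and toy_vport, registry and dp_upcall() are hypothetical names, not Open vSwitch code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical vport with per-CPU-style upcall counters (plain array). */
struct toy_vport {
	unsigned long *upcall_stats;
};

static struct toy_vport *registry; /* where the datapath finds vports */

/* Counters are allocated inside the vport alloc function, so a vport is
 * never visible to the datapath without them. */
static struct toy_vport *vport_alloc(int ncpus)
{
	struct toy_vport *v = calloc(1, sizeof(*v));

	if (!v)
		return NULL;
	v->upcall_stats = calloc((size_t)ncpus, sizeof(*v->upcall_stats));
	if (!v->upcall_stats) {
		free(v);
		return NULL;
	}
	return v;
}

static void vport_free(struct toy_vport *v)
{
	if (!v)
		return;
	free(v->upcall_stats);
	free(v);
}

/* Datapath side: may run as soon as the vport is published. */
static void dp_upcall(int cpu)
{
	registry->upcall_stats[cpu]++;
}
```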

Fixes: 95637d91fefd ("net: openvswitch: release vport resources on failure")
Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall packets")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 12:25:05 +01:00
Eric Dumazet
886bc7d6ed net: sched: move rtm_tca_policy declaration to include file
rtm_tca_policy is used from net/sched/sch_api.c and net/sched/cls_api.c,
and thus should be declared in an include file.

This fixes the following sparse warning:
net/sched/sch_api.c:1434:25: warning: symbol 'rtm_tca_policy' was not declared. Should it be static?

Fixes: e331473fee3d ("net/sched: cls_api: add missing validation of netlink attributes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 12:19:28 +01:00
Eric Dumazet
d636fc5dd6 net: sched: add rcu annotations around qdisc->qdisc_sleeping
syzbot reported a race around qdisc->qdisc_sleeping [1].

It is time we add proper annotations to reads and writes to/from
qdisc->qdisc_sleeping.

[1]
BUG: KCSAN: data-race in dev_graft_qdisc / qdisc_lookup_rcu

read to 0xffff8881286fc618 of 8 bytes by task 6928 on cpu 1:
qdisc_lookup_rcu+0x192/0x2c0 net/sched/sch_api.c:331
__tcf_qdisc_find+0x74/0x3c0 net/sched/cls_api.c:1174
tc_get_tfilter+0x18f/0x990 net/sched/cls_api.c:2547
rtnetlink_rcv_msg+0x7af/0x8c0 net/core/rtnetlink.c:6386
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1e3/0x270 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

write to 0xffff8881286fc618 of 8 bytes by task 6912 on cpu 0:
dev_graft_qdisc+0x4f/0x80 net/sched/sch_generic.c:1115
qdisc_graft+0x7d0/0xb60 net/sched/sch_api.c:1103
tc_modify_qdisc+0x712/0xf10 net/sched/sch_api.c:1693
rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6395
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1e3/0x270 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 6912 Comm: syz-executor.5 Not tainted 6.4.0-rc3-syzkaller-00190-g0d85b27b0cc6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023

Fixes: 3a7d0d07a386 ("net: sched: extend Qdisc with rcu")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 10:25:39 +01:00
Eric Dumazet
5c3b74a92a rfs: annotate lockless accesses to RFS sock flow table
Add READ_ONCE()/WRITE_ONCE() on accesses to the sock flow table.

This also prevents a (smart?) compiler from removing the condition in:

if (table->ents[index] != newval)
        table->ents[index] = newval;

We need the condition to avoid dirtying a shared cache line.
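
A userspace approximation of the annotated update. READ_ONCE()/WRITE_ONCE() are redefined here as simplified volatile accesses (the kernel macros are more elaborate), they rely on the GNU __typeof__ extension, and flow_table/record_flow() are hypothetical names modelled on the sock flow table.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins: force exactly one load/store via a volatile lvalue. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct flow_table {
	uint32_t mask;
	uint32_t ents[8];
};

/* Store the new value only if it differs. Because both accesses are
 * volatile, the compiler may not merge "compare then store" into an
 * unconditional store that would dirty the shared cache line. */
static void record_flow(struct flow_table *table, uint32_t hash,
			uint32_t newval)
{
	uint32_t index = hash & table->mask;

	if (READ_ONCE(table->ents[index]) != newval)
		WRITE_ONCE(table->ents[index], newval);
}
```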

Fixes: fec5e652e58f ("rfs: Receive Flow Steering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 10:08:45 +01:00
Leon Romanovsky
bf06fcf4be xfrm: add missed call to delete offloaded policies
Offloaded policies are deleted through two flows: netdev going
down and policy flush.

In both cases, the code lacks the relevant call to delete the offloaded policy.

Fixes: 919e43fad516 ("xfrm: add an interface to offload policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-06-07 09:58:48 +02:00
Jakub Kicinski
ab39b113e7 bluetooth pull request for net:
- Fixes to debugfs registration
  - Fix use-after-free in hci_remove_ltk/hci_remove_irk
  - Fixes to ISO channel support
  - Fix missing checks for invalid L2CAP DCID
  - Fix l2cap_disconnect_req deadlock
  - Add lock to protect HCI_UNREGISTER
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmR+ftMZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKQMXD/9NcuqbGmEzJspVA8bZ8gXD
 L7a68QnacdIoqH56QstLhGPQsYH6dv9fwhpNX6AN8/j8UG8DnDXQtHyfm4gZzfYA
 h8GP7+ZQIEiHivIxiamrJnQ1Ii+KYEV3NGyS43YBuuPi9LcTFR0Km42xA0GqOnDU
 Hz3/n5v342479TjJPNJkFPmcUGViRaLXtKhzcBzmSykUW+SVuIuD03yxuAJcojf5
 rlPYA7yho7k8BAWkcYxWAP3v9fzQVa3nz8rQO2rG+poi4La2mmqRHykuSCXmzvBX
 SbZwvzqgquqgQiFLpRIo/nwnVwPu3NYK6dQzlXPqiaxfM6qAtRttwQWNnOT+UxEu
 VVGk6fD9iKjo9dttq+lTSY3LI/SXWAHYByIBzjx883hJYf1YvDAMSlMlzo029xL6
 BHu3hMTDhosP8sG5wFdR2KzBmUd1W/ZcwOG0UP8PjshZgrOZ3uej9p3MrocKAys7
 uGOBFmGzwOaQLXJQLbd4djE5l6zLOxSCV/0OLIWQw7VFQiHb66NzN6wenYEkDnxM
 j2pFAlzp4RKHHCjU3dfaE90c0ede116e9nhjAlzmUOxggg6aCxCrCkMNOI8NlZ4v
 oukYWq66RWYA/J4S80OLepITtBRPVn3JFxOXss5xESFfEnzL2nRZ5gm8jJJGULU4
 x6tKTHaomO99FcH0ZFlZMw==
 =jMWO
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fixes to debugfs registration
 - Fix use-after-free in hci_remove_ltk/hci_remove_irk
 - Fixes to ISO channel support
 - Fix missing checks for invalid L2CAP DCID
 - Fix l2cap_disconnect_req deadlock
 - Add lock to protect HCI_UNREGISTER

* tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: L2CAP: Add missing checks for invalid DCID
  Bluetooth: ISO: use correct CIS order in Set CIG Parameters event
  Bluetooth: ISO: don't try to remove CIG if there are bound CIS left
  Bluetooth: Fix l2cap_disconnect_req deadlock
  Bluetooth: hci_qca: fix debugfs registration
  Bluetooth: fix debugfs registration
  Bluetooth: hci_sync: add lock to protect HCI_UNREGISTER
  Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk
  Bluetooth: ISO: Fix CIG auto-allocation to select configurable CIG
  Bluetooth: ISO: consider right CIS when removing CIG at cleanup
====================

Link: https://lore.kernel.org/r/20230606003454.2392552-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:36:57 -07:00
Jakub Kicinski
20c47646a2 netfilter pull request 23-06-07
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmR/uDEACgkQ1V2XiooU
 IOTC0BAAoKLyoPncbYOO9bTX9nbmn+gttwVd/wDJEbeAXzHSIiWJmjfCklJ9P7Bu
 j3cRAOPe7qyXbUCpTTWPOMzcrjUwnnSuNjF5dgGhfgkg+jiykEuxaRJvyXJ1WKI4
 v94hkmVeWB/iVpbNtFlUVzAzjemtLWU8TDEqaKRpZubaf+tNokJ3gggTlTRYslnn
 YGXlaypkLh7xGUmW7q3MfmySbfj6E7dHnYJ4Df5MKMwGM3Rrbelh9/VTpn33nob2
 74lWg/Gj3My9E+NjnZMoTA/YGnuUVPhYm4naIvp6Hc6IKQ3dI7NqleywxeHbuPgr
 McwHtLRR8a5HJpMhPXPtA0d/Ot2LGzKo4L62Ahp4KHrTr/UKDtqSDu+9ZButue/E
 0W/dKn+UA5hQKiNXOlTt25npx8VgQJFwcdCAYPJZNONCegCzl2MDVUBZufFLg6OM
 JC2XMHFN1GRAHtgHMfdbM1pHYjkx9QBeYFz4zLgWmsGLIvsfgYpVE+nF6ExJsNjZ
 pOILZtbAFWCUFVXWVUxJF4OkwOmpV2DhUk0hRKLOhmPD/HSoa4dvkGaB/yQB1uyz
 SVfZgIrTqftLYgLvHDb9u0nRSwxibmPSCkr0C86yWRzOLJytil/qWqX6lAyMYUei
 Yy8d+Kq/iX6qGJf5py9xtyXbT2Vsb5EYX7+qMu6HySngCZz+Zwo=
 =tb7S
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Missing NULL check in basechain hook netlink dump path, from Gavrilov Ilia.

2) Fix bitwise register tracking, from Jeremy Sowden.

3) Null pointer dereference when accessing conntrack helper,
   from Tijs Van Buggenhout.

4) Add schedule point to ipset's call_ad, from Kuniyuki Iwashima.

5) Incorrect boundary check when building chain blob.

* tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: out-of-bound check in chain blob
  netfilter: ipset: Add schedule point in call_ad().
  netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
  netfilter: nft_bitwise: fix register tracking
  netfilter: nf_tables: Add null check for nla_nest_start_noflag() in nft_dump_basechain_hook()
====================

Link: https://lore.kernel.org/r/20230606225851.67394-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:23:49 -07:00
Jakub Kicinski
e684ab76af wireless fixes for v6.4
Both rtw88 and rtw89 have an 802.11 powersave fix for a regression
 introduced in v6.0. mt76 fixes a race and a null pointer dereference.
 iwlwifi fixes an issue where not enough memory was allocated for a
 firmware event. And finally the stack has several smaller fixes all
 over.
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmR/S2URHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZuHXwgAhS9w8UIZ2qLYmLQOlby4Hx9+TV2lSdZ1
 V878SCWC+/nRX1mRrWZdU5zwwXXVpLv61dCUOuYyJp8ko4izzTwUhZzvNGowaGgo
 HA+KrND/rZ2ApRZDZQMpe8SXaTUZJhcRDdV4njjdeSqNEcfksgz1W8exzDpKt8YD
 pAdz8+gfpBSoATRThY5p3vyeC4e1weKqbsk96SLoip/wKzz92jyUx9fyexTskfoN
 WMfDU474bz4XIEXzmuFBqpwylwxTvy+FKvEVZfe9PqtXEOChqMUZGGMAemD81FY0
 kKIEY21kAOBKRBW5OLNHcR0WrFcq+C17+L9eazE1F7iQiKIVQaCsag==
 =a4jg
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.4

Both rtw88 and rtw89 have an 802.11 powersave fix for a regression
introduced in v6.0. mt76 fixes a race and a null pointer dereference.
iwlwifi fixes an issue where not enough memory was allocated for a
firmware event. And finally the stack has several smaller fixes all
over.

* tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: cfg80211: fix locking in regulatory disconnect
  wifi: cfg80211: fix locking in sched scan stop work
  wifi: iwlwifi: mvm: Fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()
  wifi: mac80211: fix switch count in EMA beacons
  wifi: mac80211: don't translate beacon/presp addrs
  wifi: mac80211: mlme: fix non-inheritence element
  wifi: cfg80211: reject bad AP MLD address
  wifi: mac80211: use correct iftype HE cap
  wifi: mt76: mt7996: fix possible NULL pointer dereference in mt7996_mac_write_txwi()
  wifi: rtw89: remove redundant check of entering LPS
  wifi: rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS
  wifi: rtw88: correct PS calculation for SUPPORTS_DYNAMIC_PS
  wifi: mt76: mt7615: fix possible race in mt7615_mac_sta_poll
====================

Link: https://lore.kernel.org/r/20230606150817.EC133C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:16:52 -07:00
Eric Dumazet
82a01ab35b tcp: gso: really support BIG TCP
We missed that tcp_gso_segment() was assuming skb->len fit in 16 bits
(at most 65535):

oldlen = (u16)~skb->len;

This part came with commit 0718bcc09b35 ("[NET]: Fix CHECKSUM_HW GSO problems.")

When skb->len exceeds 65535, the cast drops the upper 16 bits of
~skb->len, and the resulting TCP checksum is wrong.

Adapt the code to accept arbitrary packet lengths.
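
The truncation can be reproduced outside the kernel with a simplified
model of an incremental (RFC 1624 style) checksum update.  This is a
sketch, not the tcp_gso_segment() code: all helper names are
hypothetical, and `other` stands in for the rest of the checksummed
data.  Removing an old 32-bit length means subtracting both of its
16-bit words; truncating the complement to 16 bits, as (u16)~skb->len
does, silently drops the high word once the length exceeds 65535.

```c
#include <stdint.h>

/* Ones' complement 16-bit addition with end-around carry. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Ones' complement sum of a 32-bit value taken as two 16-bit words. */
static uint16_t oc_sum32(uint32_t v)
{
    return oc_add((uint16_t)(v >> 16), (uint16_t)v);
}

/* Hypothetical: checksum computed from scratch over one data word plus a length. */
uint16_t csum_full(uint16_t other, uint32_t len)
{
    return (uint16_t)~oc_add(other, oc_sum32(len));
}

/* Incremental update that removes both 16-bit words of the old length. */
uint16_t csum_update_32(uint16_t check, uint32_t oldlen, uint32_t newlen)
{
    uint16_t sum = (uint16_t)~check;               /* recover the raw sum */
    sum = oc_add(sum, (uint16_t)~(oldlen >> 16));  /* subtract old high word */
    sum = oc_add(sum, (uint16_t)~oldlen);          /* subtract old low word */
    sum = oc_add(sum, oc_sum32(newlen));           /* add the new length */
    return (uint16_t)~sum;
}

/* Buggy variant: like (u16)~len, only the old low word is subtracted. */
uint16_t csum_update_u16(uint16_t check, uint32_t oldlen, uint32_t newlen)
{
    uint16_t sum = (uint16_t)~check;
    sum = oc_add(sum, (uint16_t)~oldlen);          /* high word never removed */
    sum = oc_add(sum, oc_sum32(newlen));
    return (uint16_t)~sum;
}
```

Updating from an old length of 0x12000 to 1500, csum_update_32() agrees
with a checksum recomputed from scratch while csum_update_u16() is off
by the dropped high word; for old lengths up to 65535 both agree.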

v2:
  - use two csum_add() instead of csum_fold() (Alexander Duyck)
  - Change delta type to __wsum to reduce casts (Alexander Duyck)

Fixes: 09f3d1a3a52c ("ipv6/gso: remove temporary HBH/jumbo header")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230605161647.3624428-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:07:43 -07:00
Kuniyuki Iwashima
a2f4c143d7 ipv6: rpl: Fix Route of Death.
A remote DoS vulnerability in RPL Source Routing has been assigned CVE-2023-2156.

The Source Routing Header (SRH) has the following format:

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |  Next Header  |  Hdr Ext Len  | Routing Type  | Segments Left |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | CmprI | CmprE |  Pad  |               Reserved                |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                                                               |
  .                                                               .
  .                        Addresses[1..n]                        .
  .                                                               .
  |                                                               |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
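
The fixed part of this header can be decoded as below.  This is a
sketch with hypothetical names (struct rpl_srh_fields, rpl_srh_parse(),
rpl_srh_naddrs()), not the kernel's ipv6_rpl_sr_hdr handling; the
address count follows the formula given in RFC 6554, which defines
this header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the first 8 bytes of the SRH. */
struct rpl_srh_fields {
    uint8_t nexthdr, hdrlen, type, segments_left;
    uint8_t cmpri, cmpre, pad;
};

/* Parse the fixed 8-byte part shown above (network bit order). */
int rpl_srh_parse(const uint8_t *buf, size_t len, struct rpl_srh_fields *f)
{
    if (len < 8)
        return -1;
    f->nexthdr       = buf[0];
    f->hdrlen        = buf[1];         /* 8-octet units, excluding the first 8 */
    f->type          = buf[2];
    f->segments_left = buf[3];
    f->cmpri         = buf[4] >> 4;    /* high nibble */
    f->cmpre         = buf[4] & 0x0f;  /* low nibble */
    f->pad           = buf[5] >> 4;    /* the remaining 20 bits are Reserved */
    return 0;
}

/* n = ((Hdr Ext Len * 8) - Pad - (16 - CmprE)) / (16 - CmprI) + 1 */
long rpl_srh_naddrs(const struct rpl_srh_fields *f)
{
    long bytes = (long)f->hdrlen * 8 - f->pad - (16 - f->cmpre);

    /* 16 - cmpri is at least 1 because CmprI is a 4-bit field */
    if (bytes < 0 || bytes % (16 - f->cmpri) != 0)
        return -1;                     /* malformed header */
    return bytes / (16 - f->cmpri) + 1;
}
```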

The originator of an SRH places the first hop's IPv6 address in the IPv6
header's IPv6 Destination Address and the second hop's IPv6 address as
the first address in Addresses[1..n].

The CmprI and CmprE fields indicate the number of prefix octets that are
shared with the IPv6 Destination Address.  When CmprI or CmprE is not 0,
Addresses[1..n] are compressed as follows:

  1..n-1 : (16 - CmprI) bytes
       n : (16 - CmprE) bytes
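
A minimal sketch of the compression rule, assuming addresses are plain
16-byte arrays; rpl_decompress_addr() and rpl_addrs_len() are
hypothetical helpers, not kernel functions.  A compressed entry keeps
only its trailing bytes, and the elided prefix is copied back from the
IPv6 Destination Address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rebuild a full address: the first `cmpr` bytes come from the IPv6 DA,
 * the remaining (16 - cmpr) bytes from the compressed field. */
void rpl_decompress_addr(uint8_t out[16], const uint8_t daddr[16],
                         const uint8_t *field, unsigned int cmpr)
{
    assert(cmpr <= 15);                 /* CmprI/CmprE are 4-bit fields */
    memcpy(out, daddr, cmpr);           /* shared prefix */
    memcpy(out + cmpr, field, 16 - cmpr);
}

/* Bytes used by Addresses[1..n]: n - 1 entries with CmprI, the last with CmprE. */
size_t rpl_addrs_len(size_t n, unsigned int cmpri, unsigned int cmpre)
{
    assert(n >= 1 && cmpri <= 15 && cmpre <= 15);
    return (n - 1) * (16 - cmpri) + (16 - cmpre);
}
```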

Segments Left indicates the number of route segments remaining.  When the
value is not zero, the SRH is forwarded to the next hop, whose address
is extracted from Addresses[n - Segments Left + 1] and swapped with the
IPv6 Destination Address.
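
The forwarding step can be sketched as follows, assuming the addresses
are already decompressed into an array of 16-byte entries;
rpl_next_hop() is a hypothetical helper.  With 1-based indexing the
next hop is Addresses[n - Segments Left + 1]; it is swapped with the
IPv6 Destination Address and Segments Left is decremented, as RFC 6554
describes.

```c
#include <stdint.h>
#include <string.h>

/* Swap the IPv6 DA with the next-hop address and consume one segment.
 * addrs[0] holds Addresses[1], so Addresses[i] is addrs[i - 1].
 * Returns 0 on success, -1 if there is nothing valid to forward. */
int rpl_next_hop(uint8_t daddr[16], uint8_t (*addrs)[16], unsigned int n,
                 uint8_t *segments_left)
{
    uint8_t tmp[16];
    unsigned int i;

    if (*segments_left == 0 || *segments_left > n)
        return -1;

    i = n - *segments_left + 1;         /* 1-based index of the next hop */
    memcpy(tmp, daddr, 16);
    memcpy(daddr, addrs[i - 1], 16);
    memcpy(addrs[i - 1], tmp, 16);
    (*segments_left)--;
    return 0;
}
```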

When Segments Left is greater than or equal to 2, the size of the SRH does
not change because Addresses[1..n-1] are decompressed and recompressed
with CmprI.

OTOH, when Segments Left changes from 1 to 0, the new SRH could have a
different size because Addresses[1..n-1] are decompressed with CmprI and
recompressed with CmprE.

Let's say CmprI is 15 and CmprE is 0.  When we receive an SRH with
Segments Left >= 2, each of Addresses[1..n-1] takes 1 byte, and
Addresses[n] takes 16 bytes.  When Segments Left is 1, Addresses[1..n-1]
are decompressed to 16 bytes each and, with CmprE 0, not compressed
again.  As a result, the new SRH needs (16 - 1) * (n - 1) more bytes in
the header.
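
The size change in the example above follows from a one-line formula.
In this hypothetical helper, each of Addresses[1..n-1] grows from
(16 - CmprI) to (16 - CmprE) bytes when it is recompressed with CmprE;
like the text, it ignores the Pad field.

```c
#include <assert.h>
#include <stddef.h>

/* Extra bytes needed when Addresses[1..n-1] are recompressed with CmprE
 * instead of CmprI (Segments Left going from 1 to 0). Padding is ignored. */
size_t rpl_srh_growth(size_t n, unsigned int cmpri, unsigned int cmpre)
{
    assert(n >= 1 && cmpri <= 15 && cmpre <= 15);
    if (cmpri <= cmpre)
        return 0;                       /* entries shrink or stay the same */
    return (n - 1) * (cmpri - cmpre);   /* (16 - cmpre) - (16 - cmpri) each */
}
```

With CmprI = 15 and CmprE = 0 this reduces to (16 - 1) * (n - 1).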

Here the max value of n is 255 as Segments Left is u8, so in the worst
case, we have to allocate 3825 bytes in the skb headroom.  However, we
currently allocate only a small fixed buffer of
IPV6_RPL_SRH_WORST_SWAP_SIZE (16 + 7 bytes).  If the decompressed SRH
does not fit in that room, skb_push() hits the BUG() shown below [0].

Instead of allocating the fixed buffer for every packet, let's allocate
enough headroom only when we receive an SRH with Segments Left 1.

[0]:
skbuff: skb_under_panic: text:ffffffff81c9f6e2 len:576 put:576 head:ffff8880070b5180 data:ffff8880070b4fb0 tail:0x70 end:0x140 dev:lo
kernel BUG at net/core/skbuff.c:200!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 154 Comm: python3 Not tainted 6.4.0-rc4-00190-gc308e9ec0047 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:skb_panic (net/core/skbuff.c:200)
Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 80 6e 77 82 e8 ad 8b 60 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffc90000003da0 EFLAGS: 00000246
RAX: 0000000000000085 RBX: ffff8880058a6600 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88807dc1c540 RDI: ffff88807dc1c540
RBP: ffffc90000003e48 R08: ffffffff82b392c8 R09: 00000000ffffdfff
R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888005b1c800
R13: ffff8880070b51b8 R14: ffff888005b1ca18 R15: ffff8880070b5190
FS:  00007f4539f0b740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055670baf3000 CR3: 0000000005b0e000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
 <IRQ>
 skb_push (net/core/skbuff.c:210)
 ipv6_rthdr_rcv (./include/linux/skbuff.h:2880 net/ipv6/exthdrs.c:634 net/ipv6/exthdrs.c:718)
 ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
 ip6_input_finish (./include/linux/rcupdate.h:805 net/ipv6/ip6_input.c:483)
 __netif_receive_skb_one_core (net/core/dev.c:5494)
 process_backlog (./include/linux/rcupdate.h:805 net/core/dev.c:5934)
 __napi_poll (net/core/dev.c:6496)
 net_rx_action (net/core/dev.c:6565 net/core/dev.c:6696)
 __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572)
 do_softirq (kernel/softirq.c:472 kernel/softirq.c:459)
 </IRQ>
 <TASK>
 __local_bh_enable_ip (kernel/softirq.c:396)
 __dev_queue_xmit (net/core/dev.c:4272)
 ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:134)
 rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
 sock_sendmsg (net/socket.c:724 net/socket.c:747)
 __sys_sendto (net/socket.c:2144)
 __x64_sys_sendto (net/socket.c:2156 net/socket.c:2152 net/socket.c:2152)
 do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f453a138aea
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffcc212a1c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffcc212a288 RCX: 00007f453a138aea
RDX: 0000000000000060 RSI: 00007f4539084c20 RDI: 0000000000000003
RBP: 00007f4538308e80 R08: 00007ffcc212a300 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f4539712d1b
 </TASK>
Modules linked in:

Fixes: 8610c7c6e3bd ("net: ipv6: add support for rpl sr exthdr")
Reported-by: Max VA
Closes: https://www.interruptlabs.co.uk/articles/linux-ipv6-route-of-death
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230605180617.67284-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 20:59:08 -07:00