Commit Graph

78461 Commits

Author SHA1 Message Date
Linus Torvalds
8c245fe7dd Including fixes from ieee802154, bluetooth and netfilter.
Current release - regressions:
 
   - eth: mlx5: fix wrong reserved field in hca_cap_2 in mlx5_ifc
 
   - eth: am65-cpsw: fix forever loop in cleanup code
 
 Current release - new code bugs:
 
   - eth: mlx5: HWS, fixed double-free in error flow of creating SQ
 
 Previous releases - regressions:
 
   - core: avoid potential underflow in qdisc_pkt_len_init() with UFO
 
   - core: test for not too small csum_start in virtio_net_hdr_to_skb()
 
   - vrf: revert "vrf: remove unnecessary RCU-bh critical section"
 
   - bluetooth:
     - fix uaf in l2cap_connect
     - fix possible crash on mgmt_index_removed
 
   - dsa: improve shutdown sequence
 
   - eth: mlx5e: SHAMPO, fix overflow of hd_per_wq
 
   - eth: ip_gre: fix drops of small packets in ipgre_xmit
 
 Previous releases - always broken:
 
   - core: fix gso_features_check to check for both dev->gso_{ipv4_,}max_size
 
   - core: fix tcp fraglist segmentation after pull from frag_list
 
   - netfilter: nf_tables: prevent nf_skb_duplicated corruption
 
   - sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start
 
   - mac802154: fix potential RCU dereference issue in mac802154_scan_worker
 
   - eth: fec: restart PPS after link state change
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmb+giESHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkDowP/25YsDA8uaH5yelI85vUgp1T50MWgFxJ
 ARm58Pzxr8byX6eIup95xSsjLvMbLaWj5LIA2Y49AV0fWVgGn0U8yx4mPy0Czhdg
 J1oxtyoV1pR2V/okWzD4yhZV2on7OGsS73I6J1s6BAowezr19A+aa5Un57dW/103
 ccwBuBOYlSIOIHmarOxuFhWMYcwXreNBHa9K7J6JtDFn9F56fUn+ZoIUJ7x27cSO
 eWhh9bIkeEb+xYeUXAjNP3pBvJ1xpwIyZv+JMTp40jNsAXPjSpI3Jwd1YlAAMuT9
 J2dW0Zs8uwm5LzBPFvI9iM0WHEmVy6+b32NjnKVwPn2+XGGWQss52bmRElNcJkrw
 4NeG6/6CPIE0xuczBECuMa0X68NDKIZsjy3Q3OahV82ef2cwhRk6FexyIg5oiMPx
 KmMi5B+UQw6ZY3ZF/ME/0jJx/H5ayOC01yNBaTUPrLJr8gjquWEMjZXEqJsdyixJ
 5OoZeKG5oN6HkN7g/IxoFjg/W/g93OULO3qH+IzLQG4NlVs6Zp4ykL7dT+Py2zzc
 Ru3n5+HA4PqDn2u7gmP1mu2g/lmKUIZEEvR+msP81Cywlz5qtWIH1a6oIeVC7bjt
 JNhgBgzKGGMGdgmhYNzXw213WCEbz0+as2SNlvlbiqMP5FKQPLzzBVuJoz4AtJVn
 cyVy7D66HuMW
 =cq2I
 -----END PGP SIGNATURE-----

Merge tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from ieee802154, bluetooth and netfilter.

  Current release - regressions:

   - eth: mlx5: fix wrong reserved field in hca_cap_2 in mlx5_ifc

   - eth: am65-cpsw: fix forever loop in cleanup code

  Current release - new code bugs:

   - eth: mlx5: HWS, fixed double-free in error flow of creating SQ

  Previous releases - regressions:

   - core: avoid potential underflow in qdisc_pkt_len_init() with UFO

   - core: test for not too small csum_start in virtio_net_hdr_to_skb()

   - vrf: revert "vrf: remove unnecessary RCU-bh critical section"

   - bluetooth:
       - fix uaf in l2cap_connect
       - fix possible crash on mgmt_index_removed

   - dsa: improve shutdown sequence

   - eth: mlx5e: SHAMPO, fix overflow of hd_per_wq

   - eth: ip_gre: fix drops of small packets in ipgre_xmit

  Previous releases - always broken:

   - core: fix gso_features_check to check for both
     dev->gso_{ipv4_,}max_size

   - core: fix tcp fraglist segmentation after pull from frag_list

   - netfilter: nf_tables: prevent nf_skb_duplicated corruption

   - sctp: set sk_state back to CLOSED if autobind fails in
     sctp_listen_start

   - mac802154: fix potential RCU dereference issue in
     mac802154_scan_worker

   - eth: fec: restart PPS after link state change"

* tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits)
  sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start
  dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems
  doc: net: napi: Update documentation for napi_schedule_irqoff
  net/ncsi: Disable the ncsi work before freeing the associated structure
  net: phy: qt2025: Fix warning: unused import DeviceId
  gso: fix udp gso fraglist segmentation after pull from frag_list
  bridge: mcast: Fail MDB get request on empty entry
  vrf: revert "vrf: Remove unnecessary RCU-bh critical section"
  net: ethernet: ti: am65-cpsw: Fix forever loop in cleanup code
  net: phy: realtek: Check the index value in led_hw_control_get
  ppp: do not assume bh is held in ppp_channel_bridge_input()
  selftests: rds: move include.sh to TEST_FILES
  net: test for not too small csum_start in virtio_net_hdr_to_skb()
  net: gso: fix tcp fraglist segmentation after pull from frag_list
  ipv4: ip_gre: Fix drops of small packets in ipgre_xmit
  net: stmmac: dwmac4: extend timeout for VLAN Tag register busy bit check
  net: add more sanity checks to qdisc_pkt_len_init()
  net: avoid potential underflow in qdisc_pkt_len_init() with UFO
  net: ethernet: ti: cpsw_ale: Fix warning on some platforms
  net: microchip: Make FDMA config symbol invisible
  ...
2024-10-03 09:44:00 -07:00
Xin Long
8beee4d8de sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start
In sctp_listen_start() invoked by sctp_inet_listen(), it should set the
sk_state back to CLOSED if sctp_autobind() fails due to whatever reason.

Otherwise, next time when calling sctp_inet_listen(), if sctp_sk(sk)->reuse
is already set via setsockopt(SCTP_REUSE_PORT), sctp_sk(sk)->bind_hash will
be dereferenced as sk_state is LISTENING, which causes a crash as bind_hash
is NULL.

  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
  RIP: 0010:sctp_inet_listen+0x7f0/0xa20 net/sctp/socket.c:8617
  Call Trace:
   <TASK>
   __sys_listen_socket net/socket.c:1883 [inline]
   __sys_listen+0x1b7/0x230 net/socket.c:1894
   __do_sys_listen net/socket.c:1902 [inline]

Fixes: 5e8f3f703a ("sctp: simplify sctp listening code")
Reported-by: syzbot+f4e0f821e3a3b7cee51d@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://patch.msgid.link/a93e655b3c153dc8945d7a812e6d8ab0d52b7aa0.1727729391.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-03 12:18:29 +02:00
Paolo Abeni
1127c73a8d netfilter pull request 24-10-02
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmb9qqsACgkQ1V2XiooU
 IOSplRAAsv0Rr2WRA+pDpQwcmMNWoemGtu0qB7L6IchM36P64GvldMhEgfSPCh1h
 6HdV8WlkGE5Q/bOPCNbkLg/INBelADoioaOlsdOO5oc+rGUw/Z4Swcq/1PF60Vaz
 tz8AOU0opAD3X50U5bqD1Z2xToonS9nz9Ql7OWAbTdn9red/2SY+H1fyDz00VIHU
 X4y2GWND5Hi6KIsAGTu9OiyQKy9hb1oA5xNU1OeNY+gNsr+r+NSbX0BOMSRJTvLv
 MyY0kzP+S+yTx2FGcDMqgKfo60Sb4Ru6rJXl3XKd6QxhW9Mt6adcmmlqa5edoWU3
 bJYkzugl66XKh1pDkC9u7om7zOOzBhjvLObDMbcYfAVJCctsErGcRDJIvS8M+ECB
 tRsxRFU2CSud4HzIeKfQUP7b16KghnBa4kTsc0r8MLcfU5D/aR/WMR62W/ua00IS
 noyWqtpdNk/7yR9HMzaCbsjgm+OZbtJbOSWCNaDo4TsXf+g+jQ+cf1Nl26cE73gB
 xWGcc3LKIkcjQpOU+Zu0fluF7OdnDNNTEoHprnahilBHDOtmSBDMwxAoJichCZMt
 mEN1CThG0B+YwlWH9yFL1bOQs1zHHFjHfJspdtqCok+UeD20p8QD1V8mlEsAkkT/
 alw0Gxa6T2KepuOF9KcMnx4IcpkqwpgkwcGXvwRWWchANgbi1ao=
 =UUcp
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix incorrect documentation in uapi/linux/netfilter/nf_tables.h
   regarding flowtable hooks, from Phil Sutter.

2) Fix nft_audit.sh selftests with newer nft binaries, due to different
   (valid) audit output, also from Phil.

3) Disable BH when duplicating packets via nf_dup infrastructure,
   otherwise race on nf_skb_duplicated for locally generated traffic.
   From Eric.

4) Missing return in callback of selftest C program, from zhang jiao.

netfilter pull request 24-10-02

* tag 'nf-24-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: netfilter: Add missing return value
  netfilter: nf_tables: prevent nf_skb_duplicated corruption
  selftests: netfilter: Fix nft_audit.sh for newer nft binaries
  netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTED
====================

Link: https://patch.msgid.link/20241002202421.1281311-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-03 12:01:05 +02:00
Eddie James
a0ffa68c70 net/ncsi: Disable the ncsi work before freeing the associated structure
The work function can run after the ncsi device is freed, resulting
in use-after-free bugs or kernel panic.

Fixes: 2d283bdd07 ("net/ncsi: Resource management")
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://patch.msgid.link/20240925155523.1017097-1-eajames@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-03 10:14:14 +02:00
Willem de Bruijn
a1e40ac5b5 gso: fix udp gso fraglist segmentation after pull from frag_list
Detect gso fraglist skbs with corrupted geometry (see below) and
pass these to skb_segment instead of skb_segment_list, as the first
can segment them correctly.

Valid SKB_GSO_FRAGLIST skbs
- consist of two or more segments
- the head_skb holds the protocol headers plus first gso_size
- one or more frag_list skbs hold exactly one segment
- all but the last must be gso_size

Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
modify these skbs, breaking these invariants.

In extreme cases they pull all data into skb linear. For UDP, this
causes a NULL ptr deref in __udpv4_gso_segment_list_csum at
udp_hdr(seg->next)->dest.

Detect invalid geometry due to pull, by checking head_skb size.
Don't just drop, as this may blackhole a destination. Convert to be
able to pass to regular skb_segment.

Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediatek.com/
Fixes: 9fd1ff5d2a ("udp: Support UDP fraglist GRO/GSO.")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20241001171752.107580-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02 17:29:31 -07:00
Ido Schimmel
555f45d24b bridge: mcast: Fail MDB get request on empty entry
When user space deletes a port from an MDB entry, the port is removed
synchronously. If this was the last port in the entry and the entry is
not joined by the host itself, then the entry is scheduled for deletion
via a timer.

The above means that it is possible for the MDB get netlink request to
retrieve an empty entry which is scheduled for deletion. This is
problematic as after deleting the last port in an entry, user space
cannot rely on a non-zero return code from the MDB get request as an
indication that the port was successfully removed.

Fix by returning an error when the entry's port list is empty and the
entry is not joined by the host.

Fixes: 68b380a395 ("bridge: mcast: Add MDB get support")
Reported-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Closes: https://lore.kernel.org/netdev/c92569919307749f879b9482b0f3e125b7d9d2e3.1726480066.git.jamie.bainbridge@gmail.com/
Tested-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20240929123640.558525-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02 17:26:57 -07:00
Felix Fietkau
17bd3bd82f net: gso: fix tcp fraglist segmentation after pull from frag_list
Detect tcp gso fraglist skbs with corrupted geometry (see below) and
pass these to skb_segment instead of skb_segment_list, as the first
can segment them correctly.

Valid SKB_GSO_FRAGLIST skbs
- consist of two or more segments
- the head_skb holds the protocol headers plus first gso_size
- one or more frag_list skbs hold exactly one segment
- all but the last must be gso_size

Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
modify these skbs, breaking these invariants.

In extreme cases they pull all data into skb linear. For TCP, this
causes a NULL ptr deref in __tcpv4_gso_segment_list_csum at
tcp_hdr(seg->next).

Detect invalid geometry due to pull, by checking head_skb size.
Don't just drop, as this may blackhole a destination. Convert to be
able to pass to regular skb_segment.

Approach and description based on a patch by Willem de Bruijn.

Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediatek.com/
Link: https://lore.kernel.org/netdev/20240922150450.3873767-1-willemdebruijn.kernel@gmail.com/
Fixes: bee88cd5bd ("net: add support for segmenting TCP fraglist GSO packets")
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240926085315.51524-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02 17:21:47 -07:00
Jakub Kicinski
e5e3f369b1 bluetooth pull request for net:
- btmrvl: Use IRQF_NO_AUTOEN flag in request_irq()
  - MGMT: Fix possible crash on mgmt_index_removed
  - L2CAP: Fix uaf in l2cap_connect
  - Bluetooth: hci_event: Align BR/EDR JUST_WORKS paring with LE
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmb2x0wZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKbvED/wJRoKUurN3YV0ASEUm5nyM
 sJThb5RDy9IX+30rTl3pMD3cKzXsT/zm0Otc+b4rlqQW5ScXy6ZghP7HnWGOONi2
 XNsMTaYRr2DqEQ+48ZGG3EAA+hlnlwmPduxcBQZqgXRqKYoRMUqGLGmP1IGs0Scb
 xHg2OJsE6UXAfMMFAZJs20xLv+03Xwd+/VUtgYqRXdC9Shf3iW0Vhl6WpfC8Et8I
 shbR28WqxjEpp+mx3bb8eqy+gugAYLjBDQw0xA8dWZW1rw22JlOZ0m4HxjxvZZH7
 S452dbwffrFFrfQ2nxzveWphjyTDW5yIhCT1yFKbbzUGgVqRBisjs5fkazue5Dcq
 An39Tt+trObkiL32CuJ9S85oLERsFmdbd5lrlb1wD6ac0+YdE8jPVmQWTXFlNvoG
 gAlzoL5GfbtkKJrkBdf6EuvSs3USrh2WlaprARkfeLIaF7H3Sm1qDln63MBG4ej0
 2S27fMhstomRkT4QPEZUpnqB3/ui7r2bOQP7CQCE1VRmTy1d2kBINLzqIf+c5In3
 R2iLEWS0T3V7W5nFOSM2aSdovVvm94tac5aUaZDFaeYF5WU6wtNfyo57NwWbsmZm
 u4Rcj5mnObYy48W7et080A6vX6tE3k7lf17yrCfRhMjKYd9cc3CN9h5+MN4ArK3M
 yLRa4TVKTjGECRj0diJfjw==
 =/6ai
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2024-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - btmrvl: Use IRQF_NO_AUTOEN flag in request_irq()
 - MGMT: Fix possible crash on mgmt_index_removed
 - L2CAP: Fix uaf in l2cap_connect
 - Bluetooth: hci_event: Align BR/EDR JUST_WORKS paring with LE

* tag 'for-net-2024-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: hci_event: Align BR/EDR JUST_WORKS paring with LE
  Bluetooth: btmrvl: Use IRQF_NO_AUTOEN flag in request_irq()
  Bluetooth: L2CAP: Fix uaf in l2cap_connect
  Bluetooth: MGMT: Fix possible crash on mgmt_index_removed
====================

Link: https://patch.msgid.link/20240927145730.2452175-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02 17:09:52 -07:00
Jakub Kicinski
cb3ad11342 Merge tag 'ieee802154-for-net-2024-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
Stefan Schmidt says:

====================
pull-request: ieee802154 for net 2024-09-27

Jinjie Ruan added the use of IRQF_NO_AUTOEN in the mcr20a driver and fixed
and addiotinal build dependency problem while doing so.

Jiawei Ye, ensured a correct RCU handling in mac802154_scan_worker.

* tag 'ieee802154-for-net-2024-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
  net: ieee802154: mcr20a: Use IRQF_NO_AUTOEN flag in request_irq()
  mac802154: Fix potential RCU dereference issue in mac802154_scan_worker
  ieee802154: Fix build error
====================

Link: https://patch.msgid.link/20240927094351.3865511-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02 17:07:00 -07:00
Al Viro
5f60d5f6bb move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02 17:23:23 -04:00
Anton Danilov
c4a14f6d9d ipv4: ip_gre: Fix drops of small packets in ipgre_xmit
Regression Description:

Depending on the options specified for the GRE tunnel device, small
packets may be dropped. This occurs because the pskb_network_may_pull
function fails due to the packet's insufficient length.

For example, if only the okey option is specified for the tunnel device,
original (before encapsulation) packets smaller than 28 bytes (including
the IPv4 header) will be dropped. This happens because the required
length is calculated relative to the network header, not the skb->head.

Here is how the required length is computed and checked:

* The pull_len variable is set to 28 bytes, consisting of:
  * IPv4 header: 20 bytes
  * GRE header with Key field: 8 bytes

* The pskb_network_may_pull function adds the network offset, shifting
the checkable space further to the beginning of the network header and
extending it to the beginning of the packet. As a result, the end of
the checkable space occurs beyond the actual end of the packet.

Instead of ensuring that 28 bytes are present in skb->head, the function
is requesting these 28 bytes starting from the network header. For small
packets, this requested length exceeds the actual packet size, causing
the check to fail and the packets to be dropped.

This issue affects both locally originated and forwarded packets in
DMVPN-like setups.

How to reproduce (for local originated packets):

  ip link add dev gre1 type gre ikey 1.9.8.4 okey 1.9.8.4 \
          local <your-ip> remote 0.0.0.0

  ip link set mtu 1400 dev gre1
  ip link set up dev gre1
  ip address add 192.168.13.1/24 dev gre1
  ip neighbor add 192.168.13.2 lladdr <remote-ip> dev gre1
  ping -s 1374 -c 10 192.168.13.2
  tcpdump -vni gre1
  tcpdump -vni <your-ext-iface> 'ip proto 47'
  ip -s -s -d link show dev gre1

Solution:

Use the pskb_may_pull function instead the pskb_network_may_pull.

Fixes: 80d875cfc9 ("ipv4: ip_gre: Avoid skb_pull() failure in ipgre_xmit()")
Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240924235158.106062-1-littlesmilingcloud@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01 13:04:03 +02:00
Eric Dumazet
ab9a9a9e96 net: add more sanity checks to qdisc_pkt_len_init()
One path takes care of SKB_GSO_DODGY, assuming
skb->len is bigger than hdr_len.

virtio_net_hdr_to_skb() does not fully dissect TCP headers,
it only make sure it is at least 20 bytes.

It is possible for an user to provide a malicious 'GSO' packet,
total length of 80 bytes.

- 20 bytes of IPv4 header
- 60 bytes TCP header
- a small gso_size like 8

virtio_net_hdr_to_skb() would declare this packet as a normal
GSO packet, because it would see 40 bytes of payload,
bigger than gso_size.

We need to make detect this case to not underflow
qdisc_skb_cb(skb)->pkt_len.

Fixes: 1def9238d4 ("net_sched: more precise pkt_len computation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01 11:47:05 +02:00
Eric Dumazet
c20029db28 net: avoid potential underflow in qdisc_pkt_len_init() with UFO
After commit 7c6d2ecbda ("net: be more gentle about silly gso
requests coming from user") virtio_net_hdr_to_skb() had sanity check
to detect malicious attempts from user space to cook a bad GSO packet.

Then commit cf9acc90c8 ("net: virtio_net_hdr_to_skb: count
transport header in UFO") while fixing one issue, allowed user space
to cook a GSO packet with the following characteristic :

IPv4 SKB_GSO_UDP, gso_size=3, skb->len = 28.

When this packet arrives in qdisc_pkt_len_init(), we end up
with hdr_len = 28 (IPv4 header + UDP header), matching skb->len

Then the following sets gso_segs to 0 :

gso_segs = DIV_ROUND_UP(skb->len - hdr_len,
                        shinfo->gso_size);

Then later we set qdisc_skb_cb(skb)->pkt_len to back to zero :/

qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len;

This leads to the following crash in fq_codel [1]

qdisc_pkt_len_init() is best effort, we only want an estimation
of the bytes sent on the wire, not crashing the kernel.

This patch is fixing this particular issue, a following one
adds more sanity checks for another potential bug.

[1]
[   70.724101] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   70.724561] #PF: supervisor read access in kernel mode
[   70.724561] #PF: error_code(0x0000) - not-present page
[   70.724561] PGD 10ac61067 P4D 10ac61067 PUD 107ee2067 PMD 0
[   70.724561] Oops: Oops: 0000 [#1] SMP NOPTI
[   70.724561] CPU: 11 UID: 0 PID: 2163 Comm: b358537762 Not tainted 6.11.0-virtme #991
[   70.724561] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.724561] RIP: 0010:fq_codel_enqueue (net/sched/sch_fq_codel.c:120 net/sched/sch_fq_codel.c:168 net/sched/sch_fq_codel.c:230) sch_fq_codel
[ 70.724561] Code: 24 08 49 c1 e1 06 44 89 7c 24 18 45 31 ed 45 31 c0 31 ff 89 44 24 14 4c 03 8b 90 01 00 00 eb 04 39 ca 73 37 4d 8b 39 83 c7 01 <49> 8b 17 49 89 11 41 8b 57 28 45 8b 5f 34 49 c7 07 00 00 00 00 49
All code
========
   0:	24 08                	and    $0x8,%al
   2:	49 c1 e1 06          	shl    $0x6,%r9
   6:	44 89 7c 24 18       	mov    %r15d,0x18(%rsp)
   b:	45 31 ed             	xor    %r13d,%r13d
   e:	45 31 c0             	xor    %r8d,%r8d
  11:	31 ff                	xor    %edi,%edi
  13:	89 44 24 14          	mov    %eax,0x14(%rsp)
  17:	4c 03 8b 90 01 00 00 	add    0x190(%rbx),%r9
  1e:	eb 04                	jmp    0x24
  20:	39 ca                	cmp    %ecx,%edx
  22:	73 37                	jae    0x5b
  24:	4d 8b 39             	mov    (%r9),%r15
  27:	83 c7 01             	add    $0x1,%edi
  2a:*	49 8b 17             	mov    (%r15),%rdx		<-- trapping instruction
  2d:	49 89 11             	mov    %rdx,(%r9)
  30:	41 8b 57 28          	mov    0x28(%r15),%edx
  34:	45 8b 5f 34          	mov    0x34(%r15),%r11d
  38:	49 c7 07 00 00 00 00 	movq   $0x0,(%r15)
  3f:	49                   	rex.WB

Code starting with the faulting instruction
===========================================
   0:	49 8b 17             	mov    (%r15),%rdx
   3:	49 89 11             	mov    %rdx,(%r9)
   6:	41 8b 57 28          	mov    0x28(%r15),%edx
   a:	45 8b 5f 34          	mov    0x34(%r15),%r11d
   e:	49 c7 07 00 00 00 00 	movq   $0x0,(%r15)
  15:	49                   	rex.WB
[   70.724561] RSP: 0018:ffff95ae85e6fb90 EFLAGS: 00000202
[   70.724561] RAX: 0000000002000000 RBX: ffff95ae841de000 RCX: 0000000000000000
[   70.724561] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
[   70.724561] RBP: ffff95ae85e6fbf8 R08: 0000000000000000 R09: ffff95b710a30000
[   70.724561] R10: 0000000000000000 R11: bdf289445ce31881 R12: ffff95ae85e6fc58
[   70.724561] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000000
[   70.724561] FS:  000000002c5c1380(0000) GS:ffff95bd7fcc0000(0000) knlGS:0000000000000000
[   70.724561] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.724561] CR2: 0000000000000000 CR3: 000000010c568000 CR4: 00000000000006f0
[   70.724561] Call Trace:
[   70.724561]  <TASK>
[   70.724561] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
[   70.724561] ? page_fault_oops (arch/x86/mm/fault.c:715)
[   70.724561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
[   70.724561] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
[   70.724561] ? fq_codel_enqueue (net/sched/sch_fq_codel.c:120 net/sched/sch_fq_codel.c:168 net/sched/sch_fq_codel.c:230) sch_fq_codel
[   70.724561] dev_qdisc_enqueue (net/core/dev.c:3784)
[   70.724561] __dev_queue_xmit (net/core/dev.c:3880 (discriminator 2) net/core/dev.c:4390 (discriminator 2))
[   70.724561] ? irqentry_enter (kernel/entry/common.c:237)
[   70.724561] ? sysvec_apic_timer_interrupt (./arch/x86/include/asm/hardirq.h:74 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2))
[   70.724561] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4))
[   70.724561] ? asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702)
[   70.724561] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/virtio_net.h:129 (discriminator 1))
[   70.724561] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1))
[   70.724561] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4))
[   70.724561] ? netdev_name_node_lookup_rcu (net/core/dev.c:325 (discriminator 1))
[   70.724561] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1))
[   70.724561] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355)
[   70.724561] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1))
[   70.724561] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
[   70.724561] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[   70.724561] RIP: 0033:0x41ae09

Fixes: cf9acc90c8 ("net: virtio_net_hdr_to_skb: count transport header in UFO")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jonathan Davies <jonathan.davies@nutanix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01 11:47:05 +02:00
Daniel Borkmann
e609c959a9 net: Fix gso_features_check to check for both dev->gso_{ipv4_,}max_size
Commit 24ab059d2e ("net: check dev->gso_max_size in gso_features_check()")
added a dev->gso_max_size test to gso_features_check() in order to fall
back to GSO when needed.

This was added as it was noticed that some drivers could misbehave if TSO
packets get too big. However, the check doesn't respect dev->gso_ipv4_max_size
limit. For instance, a device could be configured with BIG TCP for IPv4,
but not IPv6.

Therefore, add a netif_get_gso_max_size() equivalent to netif_get_gro_max_size()
and use the helper to respect both limits before falling back to GSO engine.

Fixes: 24ab059d2e ("net: check dev->gso_max_size in gso_features_check()")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240923212242.15669-2-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01 10:48:52 +02:00
Daniel Borkmann
e8d4d34df7 net: Add netif_get_gro_max_size helper for GRO
Add a small netif_get_gro_max_size() helper which returns the maximum IPv4
or IPv6 GRO size of the netdevice.

We later add a netif_get_gso_max_size() equivalent as well for GSO, so that
these helpers can be used consistently instead of open-coded checks.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240923212242.15669-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01 10:48:51 +02:00
Vladimir Oltean
6c24a03a61 net: dsa: improve shutdown sequence
Alexander Sverdlin presents 2 problems during shutdown with the
lan9303 driver. One is specific to lan9303 and the other just happens
to reproduce there.

The first problem is that lan9303 is unique among DSA drivers in that it
calls dev_get_drvdata() at "arbitrary runtime" (not probe, not shutdown,
not remove):

phy_state_machine()
-> ...
   -> dsa_user_phy_read()
      -> ds->ops->phy_read()
         -> lan9303_phy_read()
            -> chip->ops->phy_read()
               -> lan9303_mdio_phy_read()
                  -> dev_get_drvdata()

But we never stop the phy_state_machine(), so it may continue to run
after dsa_switch_shutdown(). Our common pattern in all DSA drivers is
to set drvdata to NULL to suppress the remove() method that may come
afterwards. But in this case it will result in an NPD.

The second problem is that the way in which we set
dp->conduit->dsa_ptr = NULL; is concurrent with receive packet
processing. dsa_switch_rcv() checks once whether dev->dsa_ptr is NULL,
but afterwards, rather than continuing to use that non-NULL value,
dev->dsa_ptr is dereferenced again and again without NULL checks:
dsa_conduit_find_user() and many other places. In between dereferences,
there is no locking to ensure that what was valid once continues to be
valid.

Both problems have the common aspect that closing the conduit interface
solves them.

In the first case, dev_close(conduit) triggers the NETDEV_GOING_DOWN
event in dsa_user_netdevice_event() which closes user ports as well.
dsa_port_disable_rt() calls phylink_stop(), which synchronously stops
the phylink state machine, and ds->ops->phy_read() will thus no longer
call into the driver after this point.

In the second case, dev_close(conduit) should do this, as per
Documentation/networking/driver.rst:

| Quiescence
| ----------
|
| After the ndo_stop routine has been called, the hardware must
| not receive or transmit any data.  All in flight packets must
| be aborted. If necessary, poll or wait for completion of
| any reset commands.

So it should be sufficient to ensure that later, when we zeroize
conduit->dsa_ptr, there will be no concurrent dsa_switch_rcv() call
on this conduit.

The addition of the netif_device_detach() function is to ensure that
ioctls, rtnetlinks and ethtool requests on the user ports no longer
propagate down to the driver - we're no longer prepared to handle them.

The race condition actually did not exist when commit 0650bf52b3
("net: dsa: be compatible with masters which unregister on shutdown")
first introduced dsa_switch_shutdown(). It was created later, when we
stopped unregistering the user interfaces from a bad spot, and we just
replaced that sequence with a racy zeroization of conduit->dsa_ptr
(one which doesn't ensure that the interfaces aren't up).

Reported-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Closes: https://lore.kernel.org/netdev/2d2e3bba17203c14a5ffdabc174e3b6bbb9ad438.camel@siemens.com/
Closes: https://lore.kernel.org/netdev/c1bf4de54e829111e0e4a70e7bd1cf523c9550ff.camel@siemens.com/
Fixes: ee534378f0 ("net: dsa: fix panic when DSA master device unbinds on shutdown")
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20240913203549.3081071-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01 09:56:36 +02:00
Linus Torvalds
894b3c35d1 Three CephFS fixes from Xiubo and Luis and a bunch of assorted
cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmb3JroTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzizDiB/0elHQQaFxXMjuJRY1IzohozAHi0cHK
 gwgE4nEbECE8vRYK/QvyvZ3S+ep+N+r6jOIiIDyqhjtlY3//oSyyxL7RjMJlVFBq
 Ie37w8r4q1aL1mn9QDQ4iQxcRYyU+JxcUcPR1UUUvLiKgWaRixmq27zby/WQSrkA
 ke2ScBRDtEAYVtdxvxmUJK/DrPr3skwJAGY52KesjwgVhXSL8KG9X1zMUbWdJYDV
 THbQzLZsu4NVh7LlAsS/mh+z0EIZsXxQYU5IY3dIVEYcuLK93lXRGZb+7whtmUef
 wsDtYIe/w30QVxFdrN28qAQp8daUJhp+3t0EZSyecRcq5OPey6ICx1P4
 =+bdB
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.12-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "Three CephFS fixes from Xiubo and Luis and a bunch of assorted
  cleanups"

* tag 'ceph-for-6.12-rc1' of https://github.com/ceph/ceph-client:
  ceph: remove the incorrect Fw reference check when dirtying pages
  ceph: Remove empty definition in header file
  ceph: Fix typo in the comment
  ceph: fix a memory leak on cap_auths in MDS client
  ceph: flush all caps releases when syncing the whole filesystem
  ceph: rename ceph_flush_cap_releases() to ceph_flush_session_cap_releases()
  libceph: use min() to simplify code in ceph_dns_resolve_name()
  ceph: Convert to use jiffies macro
  ceph: Remove unused declarations
2024-09-28 08:40:36 -07:00
Al Viro
cb787f4ac0 [tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")

To quote that commit,

  At -rc1 we'll need do a mechanical removal of no_llseek -

  git grep -l -w no_llseek | grep -v porting.rst | while read i; do
	sed -i '/\<no_llseek\>/d' $i
  done

  would do it.

Unfortunately, that hadn't been done.  Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
	.llseek = no_llseek,
so it's obviously safe.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-27 08:18:43 -07:00
Luiz Augusto von Dentz
b25e11f978 Bluetooth: hci_event: Align BR/EDR JUST_WORKS paring with LE
This aligned BR/EDR JUST_WORKS method with LE which since 92516cd97f
("Bluetooth: Always request for user confirmation for Just Works")
always request user confirmation with confirm_hint set since the
likes of bluetoothd have dedicated policy around JUST_WORKS method
(e.g. main.conf:JustWorksRepairing).

CVE: CVE-2024-8805
Cc: stable@vger.kernel.org
Fixes: ba15a58b17 ("Bluetooth: Fix SSP acceptor just-works confirmation without MITM")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Kiran K <kiran.k@intel.com>
2024-09-27 10:52:20 -04:00
Luiz Augusto von Dentz
333b4fd11e Bluetooth: L2CAP: Fix uaf in l2cap_connect
[Syzbot reported]
BUG: KASAN: slab-use-after-free in l2cap_connect.constprop.0+0x10d8/0x1270 net/bluetooth/l2cap_core.c:3949
Read of size 8 at addr ffff8880241e9800 by task kworker/u9:0/54

CPU: 0 UID: 0 PID: 54 Comm: kworker/u9:0 Not tainted 6.11.0-rc6-syzkaller-00268-g788220eee30d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Workqueue: hci2 hci_rx_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:93 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0xc3/0x620 mm/kasan/report.c:488
 kasan_report+0xd9/0x110 mm/kasan/report.c:601
 l2cap_connect.constprop.0+0x10d8/0x1270 net/bluetooth/l2cap_core.c:3949
 l2cap_connect_req net/bluetooth/l2cap_core.c:4080 [inline]
 l2cap_bredr_sig_cmd net/bluetooth/l2cap_core.c:4772 [inline]
 l2cap_sig_channel net/bluetooth/l2cap_core.c:5543 [inline]
 l2cap_recv_frame+0xf0b/0x8eb0 net/bluetooth/l2cap_core.c:6825
 l2cap_recv_acldata+0x9b4/0xb70 net/bluetooth/l2cap_core.c:7514
 hci_acldata_packet net/bluetooth/hci_core.c:3791 [inline]
 hci_rx_work+0xaab/0x1610 net/bluetooth/hci_core.c:4028
 process_one_work+0x9c5/0x1b40 kernel/workqueue.c:3231
 process_scheduled_works kernel/workqueue.c:3312 [inline]
 worker_thread+0x6c8/0xed0 kernel/workqueue.c:3389
 kthread+0x2c1/0x3a0 kernel/kthread.c:389
 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
...

Freed by task 5245:
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
 kasan_save_track+0x14/0x30 mm/kasan/common.c:68
 kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:579
 poison_slab_object+0xf7/0x160 mm/kasan/common.c:240
 __kasan_slab_free+0x32/0x50 mm/kasan/common.c:256
 kasan_slab_free include/linux/kasan.h:184 [inline]
 slab_free_hook mm/slub.c:2256 [inline]
 slab_free mm/slub.c:4477 [inline]
 kfree+0x12a/0x3b0 mm/slub.c:4598
 l2cap_conn_free net/bluetooth/l2cap_core.c:1810 [inline]
 kref_put include/linux/kref.h:65 [inline]
 l2cap_conn_put net/bluetooth/l2cap_core.c:1822 [inline]
 l2cap_conn_del+0x59d/0x730 net/bluetooth/l2cap_core.c:1802
 l2cap_connect_cfm+0x9e6/0xf80 net/bluetooth/l2cap_core.c:7241
 hci_connect_cfm include/net/bluetooth/hci_core.h:1960 [inline]
 hci_conn_failed+0x1c3/0x370 net/bluetooth/hci_conn.c:1265
 hci_abort_conn_sync+0x75a/0xb50 net/bluetooth/hci_sync.c:5583
 abort_conn_sync+0x197/0x360 net/bluetooth/hci_conn.c:2917
 hci_cmd_sync_work+0x1a4/0x410 net/bluetooth/hci_sync.c:328
 process_one_work+0x9c5/0x1b40 kernel/workqueue.c:3231
 process_scheduled_works kernel/workqueue.c:3312 [inline]
 worker_thread+0x6c8/0xed0 kernel/workqueue.c:3389
 kthread+0x2c1/0x3a0 kernel/kthread.c:389
 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Reported-by: syzbot+c12e2f941af1feb5632c@syzkaller.appspotmail.com
Tested-by: syzbot+c12e2f941af1feb5632c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c12e2f941af1feb5632c
Fixes: 7b064edae3 ("Bluetooth: Fix authentication if acl data comes before remote feature evt")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-09-27 10:52:18 -04:00
Luiz Augusto von Dentz
f53e1c9c72 Bluetooth: MGMT: Fix possible crash on mgmt_index_removed
If mgmt_index_removed is called while there are commands queued on
cmd_sync it could lead to crashes like the bellow trace:

0x0000053D: __list_del_entry_valid_or_report+0x98/0xdc
0x0000053D: mgmt_pending_remove+0x18/0x58 [bluetooth]
0x0000053E: mgmt_remove_adv_monitor_complete+0x80/0x108 [bluetooth]
0x0000053E: hci_cmd_sync_work+0xbc/0x164 [bluetooth]

So while handling mgmt_index_removed this attempts to dequeue
commands passed as user_data to cmd_sync.

Fixes: 7cf5c2978f ("Bluetooth: hci_sync: Refactor remove Adv Monitor")
Reported-by: jiaymao <quic_jiaymao@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-09-27 10:52:18 -04:00
Eric Dumazet
92ceba94de netfilter: nf_tables: prevent nf_skb_duplicated corruption
syzbot found that nf_dup_ipv4() or nf_dup_ipv6() could write
per-cpu variable nf_skb_duplicated in an unsafe way [1].

Disabling preemption as hinted by the splat is not enough,
we have to disable soft interrupts as well.

[1]
BUG: using __this_cpu_write() in preemptible [00000000] code: syz.4.282/6316
 caller is nf_dup_ipv4+0x651/0x8f0 net/ipv4/netfilter/nf_dup_ipv4.c:87
CPU: 0 UID: 0 PID: 6316 Comm: syz.4.282 Not tainted 6.11.0-rc7-syzkaller-00104-g7052622fccb1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Call Trace:
 <TASK>
  __dump_stack lib/dump_stack.c:93 [inline]
  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
  check_preemption_disabled+0x10e/0x120 lib/smp_processor_id.c:49
  nf_dup_ipv4+0x651/0x8f0 net/ipv4/netfilter/nf_dup_ipv4.c:87
  nft_dup_ipv4_eval+0x1db/0x300 net/ipv4/netfilter/nft_dup_ipv4.c:30
  expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
  nft_do_chain+0x4ad/0x1da0 net/netfilter/nf_tables_core.c:288
  nft_do_chain_ipv4+0x202/0x320 net/netfilter/nft_chain_filter.c:23
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626
  nf_hook+0x2c4/0x450 include/linux/netfilter.h:269
  NF_HOOK_COND include/linux/netfilter.h:302 [inline]
  ip_output+0x185/0x230 net/ipv4/ip_output.c:433
  ip_local_out net/ipv4/ip_output.c:129 [inline]
  ip_send_skb+0x74/0x100 net/ipv4/ip_output.c:1495
  udp_send_skb+0xacf/0x1650 net/ipv4/udp.c:981
  udp_sendmsg+0x1c21/0x2a60 net/ipv4/udp.c:1269
  sock_sendmsg_nosec net/socket.c:730 [inline]
  __sock_sendmsg+0x1a6/0x270 net/socket.c:745
  ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
  ___sys_sendmsg net/socket.c:2651 [inline]
  __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737
  __do_sys_sendmmsg net/socket.c:2766 [inline]
  __se_sys_sendmmsg net/socket.c:2763 [inline]
  __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4ce4f7def9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f4ce5d4a038 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f4ce5135f80 RCX: 00007f4ce4f7def9
RDX: 0000000000000001 RSI: 0000000020005d40 RDI: 0000000000000006
RBP: 00007f4ce4ff0b76 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f4ce5135f80 R15: 00007ffd4cbc6d68
 </TASK>

Fixes: d877f07112 ("netfilter: nf_tables: add nft_dup expression")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-27 13:55:46 +02:00
Linus Torvalds
62a0e2fa40 Including fixes from netfilter.
Previous releases - regressions:
 
   - netfilter:
     - nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put()
     - nf_tables: keep deleted flowtable hooks until after RCU
 
   - tcp: check skb is non-NULL in tcp_rto_delta_us()
 
   - phy: aquantia: fix -ETIMEDOUT PHY probe failure when firmware not present
 
   - eth: virtio_net: fix mismatched buf address when unmapping for small packets
 
   - eth: stmmac: fix zero-division error when disabling tc cbs
 
   - eth: bonding: fix unnecessary warnings and logs from bond_xdp_get_xmit_slave()
 
 Previous releases - always broken:
 
   - netfilter:
     - fix clash resolution for bidirectional flows
     - fix allocation with no memcg accounting
 
   - eth: r8169: add tally counter fields added with RTL8125
 
   - eth: ravb: fix rx and tx frame size limit
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmb1bHASHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkxUAP/3cnsANzqmulU+zXLRCyYqQkMnLDrXuC
 yb1sy4gf/2vih+UPAK0Gw+NXMnL/Ftlv2EMV9RQKFjIWV4D0AYGEmKdnPhe2ycRN
 0Gr7zSZdP2KlA7HgYSehxmWjrNFatAmyGvIEYs+9JBzLnoZCkRlsrYE8HO7fk8+a
 4FDyh+FyiniDKR3+W/tgPoZy/U+FS9AUftOrAjCM/o6c0WPugwgHDxwlyrBg3lAp
 Mkx8Q3IPWESOfPcUmJ+AezljfL1W3xAG/4cxALpN9lboeJaZNjvMQgMyqC1uVyHS
 VJOkOuhQEVfXpc9139j5DxPHhacmLBQGfDw6ZXevwRC9NwgaLcRh9cf3rUafA7uC
 qT7P5dt5y3kGOqp7pltUsFT7C47VD7ZlFz4J6eqTVCVTopjpMipZajvWZEIDNqPa
 ftsMW0ZIbjpJVTJAvhlrKySxsRFte6b3aa9VdttkevgQPMneEXyePe8Me6Fbrv+t
 hF5R8we6842xclLfjBCJT1d4e7yW8B5o69eygQbyaqRK9EhbaF+4R0V+NK9eVnd9
 qZudNZBznnfdVgjjgcu12qievHEazIAFkyjs+ZCt2xYNcRg8cLwr/TclOB8fEMBO
 VpjPci4j1Ln158EbGJf30VQpZJzXSrxZ4HFZU1Be+d3fW58o1H9zMfvweOcvxI/v
 AQWSy3aMoWHB
 =l8TJ
 -----END PGP SIGNATURE-----

Merge tag 'net-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  It looks like that most people are still traveling: both the ML volume
  and the processing capacity are low.

  Previous releases - regressions:

    - netfilter:
        - nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put()
        - nf_tables: keep deleted flowtable hooks until after RCU

    - tcp: check skb is non-NULL in tcp_rto_delta_us()

    - phy: aquantia: fix -ETIMEDOUT PHY probe failure when firmware not
      present

    - eth: virtio_net: fix mismatched buf address when unmapping for
      small packets

    - eth: stmmac: fix zero-division error when disabling tc cbs

    - eth: bonding: fix unnecessary warnings and logs from
      bond_xdp_get_xmit_slave()

  Previous releases - always broken:

    - netfilter:
        - fix clash resolution for bidirectional flows
        - fix allocation with no memcg accounting

    - eth: r8169: add tally counter fields added with RTL8125

    - eth: ravb: fix rx and tx frame size limit"

* tag 'net-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
  selftests: netfilter: Avoid hanging ipvs.sh
  kselftest: add test for nfqueue induced conntrack race
  netfilter: nfnetlink_queue: remove old clash resolution logic
  netfilter: nf_tables: missing objects with no memcg accounting
  netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
  netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
  netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
  netfilter: nf_tables: Keep deleted flowtable hooks until after RCU
  docs: tproxy: ignore non-transparent sockets in iptables
  netfilter: ctnetlink: Guard possible unused functions
  selftests: netfilter: nft_tproxy.sh: add tcp tests
  selftests: netfilter: add reverse-clash resolution test case
  netfilter: conntrack: add clash resolution for reverse collisions
  netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
  selftests/net: packetdrill: increase timing tolerance in debug mode
  usbnet: fix cyclical race on disconnect with work queue
  net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is enabled
  virtio_net: Fix mismatched buf address when unmapping for small packets
  bonding: Fix unnecessary warnings and logs from bond_xdp_get_xmit_slave()
  r8169: add missing MODULE_FIRMWARE entry for RTL8126A rev.b
  ...
2024-09-26 10:27:10 -07:00
Linus Torvalds
4965ddb166 USB/Thunderbolt update for 6.12-rc1
Here is the large set of USB and Thunderbolt changes for 6.12-rc1.
 
 Nothing "major" in here, except for a new 9p network gadget that has
 been worked on for a long time (all of the needed acks are here.)  Other
 than that, it's the usual set of:
   - Thunderbolt / USB4 driver updates and additions for new hardware
   - dwc3 driver updates and new features added
   - xhci driver updates
   - typec driver updates
   - USB gadget updates and api additions to make some gadgets more
     configurable by userspace
   - dwc2 driver updates
   - usb phy driver updates
   - usbip feature additions
   - other minor USB driver updates
 
 All of these have been in linux-next for a long time with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZvU0/g8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykGcACfSqouxRg8FRtq+nIKHWXI9lOTnVcAoKd9PAgq
 1i7yCNopPEPEW8sjz1GX
 =mY+S
 -----END PGP SIGNATURE-----

Merge tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/Thunderbolt updates from Greg KH:
 "Here is the large set of USB and Thunderbolt changes for 6.12-rc1.

  Nothing "major" in here, except for a new 9p network gadget that has
  been worked on for a long time (all of the needed acks are here)

  Other than that, it's the usual set of:

   - Thunderbolt / USB4 driver updates and additions for new hardware

   - dwc3 driver updates and new features added

   - xhci driver updates

   - typec driver updates

   - USB gadget updates and api additions to make some gadgets more
     configurable by userspace

   - dwc2 driver updates

   - usb phy driver updates

   - usbip feature additions

   - other minor USB driver updates

  All of these have been in linux-next for a long time with no reported
  issues"

* tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (145 commits)
  sub: cdns3: Use predefined PCI vendor ID constant
  sub: cdns2: Use predefined PCI vendor ID constant
  USB: misc: yurex: fix race between read and write
  USB: misc: cypress_cy7c63: check for short transfer
  USB: appledisplay: close race between probe and completion handler
  USB: class: CDC-ACM: fix race between get_serial and set_serial
  usb: r8a66597-hcd: make read-only const arrays static
  usb: typec: ucsi: Fix busy loop on ASUS VivoBooks
  usb: dwc3: rtk: Clean up error code in __get_dwc3_maximum_speed()
  usb: storage: ene_ub6250: Fix right shift warnings
  usb: roles: Improve the fix for a false positive recursive locking complaint
  locking/mutex: Introduce mutex_init_with_key()
  locking/mutex: Define mutex_init() once
  net/9p/usbg: fix CONFIG_USB_GADGET dependency
  usb: xhci: fix loss of data on Cadence xHC
  usb: xHCI: add XHCI_RESET_ON_RESUME quirk for Phytium xHCI host
  usb: dwc3: imx8mp: disable SS_CON and U3 wakeup for system sleep
  usb: dwc3: imx8mp: add 2 software managed quirk properties for host mode
  usb: host: xhci-plat: Parse xhci-missing_cas_quirk and apply quirk
  usb: misc: onboard_usb_dev: add Microchip usb5744 SMBus programming support
  ...
2024-09-26 09:45:36 -07:00
Linus Torvalds
0181f8c809 virtio: features, fixes, cleanups
Several new features here:
 
 	virtio-balloon supports new stats
 
 	vdpa supports setting mac address
 
 	vdpa/mlx5 suspend/resume as well as MKEY ops are now faster
 
 	virtio_fs supports new sysfs entries for queue info
 
 	virtio/vsock performance has been improved
 
 Fixes, cleanups all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmbz7ykPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpkk8H/A3vMRYXBzne9anezZLvADKS/CpX7v0DFEVj
 VfSMWXvYdUariYDyyb7pZsvK5QR22pE0pIaW6Kcgv9fNwq27M/H6g6NJk5ny8a7d
 216AQs1J28pXPPY+q03fhf3SzE3yHP8aeD9lyiO9QJYfs9vjtoyZeBGt3a4IUSX4
 ZeNBAx8xWTBcEDIIcZLdY1DNDTbZ4+qQ12Ln9IKq7D4xkE6l7Xh+HGdgTWTnDZ8P
 qEUUOmJTFKTQdOiVuU4NN3wzgHKWHdwKg0uWXo7ereYr3kYe3q//jCcLMv88a1x0
 XP7NRBQg/rsErwTMdLz6ffyqXJs6lGGqNXzRfZKEwAvmnh/+zs4=
 =gNBq
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "Several new features here:

   - virtio-balloon supports new stats

   - vdpa supports setting mac address

   - vdpa/mlx5 suspend/resume as well as MKEY ops are now faster

   - virtio_fs supports new sysfs entries for queue info

   - virtio/vsock performance has been improved

  And fixes, cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
  vsock/virtio: avoid queuing packets when intermediate queue is empty
  vsock/virtio: refactor virtio_transport_send_pkt_work
  fw_cfg: Constify struct kobj_type
  vdpa/mlx5: Postpone MR deletion
  vdpa/mlx5: Introduce init/destroy for MR resources
  vdpa/mlx5: Rename mr_mtx -> lock
  vdpa/mlx5: Extract mr members in own resource struct
  vdpa/mlx5: Rename function
  vdpa/mlx5: Delete direct MKEYs in parallel
  vdpa/mlx5: Create direct MKEYs in parallel
  MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section
  virtio_fs: add sysfs entries for queue information
  virtio_fs: introduce virtio_fs_put_locked helper
  vdpa: Remove unused declarations
  vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command
  vdpa/mlx5: Small improvement for change_num_qps()
  vdpa/mlx5: Keep notifiers during suspend but ignore
  vdpa/mlx5: Parallelize device resume
  vdpa/mlx5: Parallelize device suspend
  vdpa/mlx5: Use async API for vq modify commands
  ...
2024-09-26 08:43:17 -07:00
Paolo Abeni
aef3a58b06 netfilter pull request 24-09-26
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmb1P8AACgkQ1V2XiooU
 IOT2KQ/9Gpf66VH41Byae9qzpgS+iRWUkN3Apn/5m7io/v0AuEmDfDRCPcOH/k8N
 61m5RGBzuZETR3YhmlzzvMv5WXmHJmUCGjWm5M2b6Byji13GsdgTqJ3VXwgQXINI
 tuE2bRTRzm5oBOsJvTENb5X7A3Bmjnk93N4jJSQgQNzO+fTNgiUQxszrUc2llQLS
 D85VC94AtNu3fKbv+sv76yWGdR+srq2ePeN+6lDT/Hx6sqnU+uWziYaSXLTmWd9S
 va+yOgi2t0gJkCZqfR/Aw8fQJSpCLWFIy4LBJa1fFX6ni462w2c7VOMPHnJ3PlOy
 QG+UAH2brpRyIVn3IBzEeBDb1ZhrsHKsEaUz84LHs22XbZCCZ4xAfe0DsFmxC0o3
 TW9f0RA9geRlnZOxHJRHc8I6Edi4B3oBcvbEe6PaoHeQJCUqfVJp8dgkLT0IvySJ
 TWYQEx8A/fSBKmr8QQ9L/wEomTTnvLuW5GW4dyOsfoyS7DKd9wgIycujakqmowIA
 ZnaXmosCtopNGrf5lxKsWYDac4VKLJufzjCj/4b7Q1BBaJXmSj0xVD0/0fSJeijk
 t9nfvvOwBKBYOoZOwYK2KD+YmMwxSuHz48yE0WZANoRnTP/gwFhY9bDmonqOi7+e
 L5Vbtv6QZtnChnHCSkRzXEkmKUIlzMoi607suV1jYmmDiEQoa+A=
 =a9OT
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

v2: with kdoc fixes per Paolo Abeni.

The following patchset contains Netfilter fixes for net:

Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP
packets to one another, two packets of the same flow in each direction
handled by different CPUs that result in two conntrack objects in NEW
state, where reply packet loses race. Then, patch #3 adds a testcase for
this scenario. Series from Florian Westphal.

1) NAT engine can falsely detect a port collision if it happens to pick
   up a reply packet as NEW rather than ESTABLISHED. Add extra code to
   detect this and suppress port reallocation in this case.

2) To complete the clash resolution in the reply direction, extend conntrack
   logic to detect clashing conntrack in the reply direction to existing entry.

3) Adds a test case.

Then, an assorted list of fixes follow:

4) Add a selftest for tproxy, from Antonio Ojea.

5) Guard ctnetlink_*_size() functions under
   #if defined(CONFIG_NETFILTER_NETLINK_GLUE_CT) || defined(CONFIG_NF_CONNTRACK_EVENTS)
   From Andy Shevchenko.

6) Use -m socket --transparent in iptables tproxy documentation.
   From XIE Zhibang.

7) Call kfree_rcu() when releasing flowtable hooks to address race with
   netlink dump path, from Phil Sutter.

8) Fix compilation warning in nf_reject with CONFIG_BRIDGE_NETFILTER=n.
   From Simon Horman.

9) Guard ctnetlink_label_size() under CONFIG_NF_CONNTRACK_EVENTS which
   is its only user, to address a compilation warning. From Simon Horman.

10) Use rcu-protected list iteration over basechain hooks from netlink
    dump path.

11) Fix memcg for nf_tables, use GFP_KERNEL_ACCOUNT is not complete.

12) Remove old nfqueue conntrack clash resolution. Instead trying to
    use same destination address consistently which requires double DNAT,
    use the existing clash resolution which allows clashing packets
    go through with different destination. Antonio Ojea originally
    reported an issue from the postrouting chain, I proposed a fix:
    https://lore.kernel.org/netfilter-devel/ZuwSwAqKgCB2a51-@calendula/T/
    which he reported it did not work for him.

13) Adds a selftest for patch 12.

14) Fixes ipvs.sh selftest.

netfilter pull request 24-09-26

* tag 'nf-24-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: netfilter: Avoid hanging ipvs.sh
  kselftest: add test for nfqueue induced conntrack race
  netfilter: nfnetlink_queue: remove old clash resolution logic
  netfilter: nf_tables: missing objects with no memcg accounting
  netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
  netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
  netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
  netfilter: nf_tables: Keep deleted flowtable hooks until after RCU
  docs: tproxy: ignore non-transparent sockets in iptables
  netfilter: ctnetlink: Guard possible unused functions
  selftests: netfilter: nft_tproxy.sh: add tcp tests
  selftests: netfilter: add reverse-clash resolution test case
  netfilter: conntrack: add clash resolution for reverse collisions
  netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
====================

Link: https://patch.msgid.link/20240926110717.102194-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-26 15:47:11 +02:00
Florian Westphal
8af79d3edb netfilter: nfnetlink_queue: remove old clash resolution logic
For historical reasons there are two clash resolution spots in
netfilter, one in nfnetlink_queue and one in conntrack core.

nfnetlink_queue one was added first: If a colliding entry is found, NAT
NAT transformation is reversed by calling nat engine again with altered
tuple.

See commit 368982cd7d ("netfilter: nfnetlink_queue: resolve clash for
unconfirmed conntracks") for details.

One problem is that nf_reroute() won't take an action if the queueing
doesn't occur in the OUTPUT hook, i.e. when queueing in forward or
postrouting, packet will be sent via the wrong path.

Another problem is that the scenario addressed (2nd UDP packet sent with
identical addresses while first packet is still being processed) can also
occur without any nfqueue involvement due to threaded resolvers doing
A and AAAA requests back-to-back.

This lead us to add clash resolution logic to the conntrack core, see
commit 6a757c07e5 ("netfilter: conntrack: allow insertion of clashing
entries").  Instead of fixing the nfqueue based logic, lets remove it
and let conntrack core handle this instead.

Retain the ->update hook for sake of nfqueue based conntrack helpers.
We could axe this hook completely but we'd have to split confirm and
helper logic again, see commit ee04805ff5 ("netfilter: conntrack: make
conntrack userspace helpers work again").

This SHOULD NOT be backported to kernels earlier than v5.6; they lack
adequate clash resolution handling.

Patch was originally written by Pablo Neira Ayuso.

Reported-by: Antonio Ojea <aojea@google.com>
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1766
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Antonio Ojea <aojea@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-26 13:03:03 +02:00
Pablo Neira Ayuso
69e687cea7 netfilter: nf_tables: missing objects with no memcg accounting
Several ruleset objects are still not using GFP_KERNEL_ACCOUNT for
memory accounting, update them. This includes:

- catchall elements
- compat match large info area
- log prefix
- meta secctx
- numgen counters
- pipapo set backend datastructure
- tunnel private objects

Fixes: 33758c8914 ("memcg: enable accounting for nft objects")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-26 13:03:02 +02:00
Pablo Neira Ayuso
4ffcf5ca81 netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
Lockless iteration over hook list is possible from netlink dump path,
use rcu variant to iterate over the hook list as is done with flowtable
hooks.

Fixes: b9703ed44f ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-26 13:03:02 +02:00
Simon Horman
e1f1ee0e9a netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
Only provide ctnetlink_label_size when it is used,
which is when CONFIG_NF_CONNTRACK_EVENTS is configured.

Flagged by clang-18 W=1 builds as:

.../nf_conntrack_netlink.c:385:19: warning: unused function 'ctnetlink_label_size' [-Wunused-function]
  385 | static inline int ctnetlink_label_size(const struct nf_conn *ct)
      |                   ^~~~~~~~~~~~~~~~~~~~

The condition on CONFIG_NF_CONNTRACK_LABELS being removed by
this patch guards compilation of non-trivial implementations
of ctnetlink_dump_labels() and ctnetlink_label_size().

However, this is not necessary as each of these functions
will always return 0 if CONFIG_NF_CONNTRACK_LABELS is not defined
as each function starts with the equivalent of:

	struct nf_conn_labels *labels = nf_ct_labels_find(ct);

	if (!labels)
		return 0;

And nf_ct_labels_find always returns NULL if CONFIG_NF_CONNTRACK_LABELS
is not enabled.  So I believe that the compiler optimises the code away
in such cases anyway.

Found by inspection.
Compile tested only.

Originally splitted in two patches, Pablo Neira Ayuso collapsed them and
added Fixes: tag.

Fixes: 0ceabd8387 ("netfilter: ctnetlink: deliver labels to userspace")
Link: https://lore.kernel.org/netfilter-devel/20240909151712.GZ2097826@kernel.org/
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-26 13:03:02 +02:00
Simon Horman
fc56878ca1 netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
If CONFIG_BRIDGE_NETFILTER is not enabled, which is the case for x86_64
defconfig, then building nf_reject_ipv4.c and nf_reject_ipv6.c with W=1
using gcc-14 results in the following warnings, which are treated as
errors:

net/ipv4/netfilter/nf_reject_ipv4.c: In function 'nf_send_reset':
net/ipv4/netfilter/nf_reject_ipv4.c:243:23: error: variable 'niph' set but not used [-Werror=unused-but-set-variable]
  243 |         struct iphdr *niph;
      |                       ^~~~
cc1: all warnings being treated as errors
net/ipv6/netfilter/nf_reject_ipv6.c: In function 'nf_send_reset6':
net/ipv6/netfilter/nf_reject_ipv6.c:286:25: error: variable 'ip6h' set but not used [-Werror=unused-but-set-variable]
  286 |         struct ipv6hdr *ip6h;
      |                         ^~~~
cc1: all warnings being treated as errors

Address this by reducing the scope of these local variables to where
they are used, which is code only compiled when CONFIG_BRIDGE_NETFILTER
enabled.

Compile tested and run through netfilter selftests.

Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Closes: https://lore.kernel.org/netfilter-devel/20240906145513.567781-1-andriy.shevchenko@linux.intel.com/
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-26 13:03:02 +02:00
Phil Sutter
642c89c475 netfilter: nf_tables: Keep deleted flowtable hooks until after RCU
Documentation of list_del_rcu() warns callers to not immediately free
the deleted list item. While it seems not necessary to use the
RCU-variant of list_del() here in the first place, doing so seems to
require calling kfree_rcu() on the deleted item as well.

Fixes: 3f0465a9ef ("netfilter: nf_tables: dynamically allocate hooks per net_device in flowtables")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-26 13:03:01 +02:00
Andy Shevchenko
2cadd3b177 netfilter: ctnetlink: Guard possible unused functions
Some of the functions may be unused (CONFIG_NETFILTER_NETLINK_GLUE_CT=n
and CONFIG_NF_CONNTRACK_EVENTS=n), it prevents kernel builds with clang,
`make W=1` and CONFIG_WERROR=y:

net/netfilter/nf_conntrack_netlink.c:657:22: error: unused function 'ctnetlink_acct_size' [-Werror,-Wunused-function]
  657 | static inline size_t ctnetlink_acct_size(const struct nf_conn *ct)
      |                      ^~~~~~~~~~~~~~~~~~~
net/netfilter/nf_conntrack_netlink.c:667:19: error: unused function 'ctnetlink_secctx_size' [-Werror,-Wunused-function]
  667 | static inline int ctnetlink_secctx_size(const struct nf_conn *ct)
      |                   ^~~~~~~~~~~~~~~~~~~~~
net/netfilter/nf_conntrack_netlink.c:683:22: error: unused function 'ctnetlink_timestamp_size' [-Werror,-Wunused-function]
  683 | static inline size_t ctnetlink_timestamp_size(const struct nf_conn *ct)
      |                      ^~~~~~~~~~~~~~~~~~~~~~~~

Fix this by guarding possible unused functions with ifdeffery.

See also commit 6863f5643d ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-26 13:03:01 +02:00
Florian Westphal
a4e6a1031e netfilter: conntrack: add clash resolution for reverse collisions
Given existing entry:
ORIGIN: a:b -> c:d
REPLY:  c:d -> a:b

And colliding entry:
ORIGIN: c:d -> a:b
REPLY:  a:b -> c:d

The colliding ct (and the associated skb) get dropped on insert.
Permit this by checking if the colliding entry matches the reply
direction.

Happens when both ends send packets at same time, both requests are picked
up as NEW, rather than NEW for the 'first' and 'ESTABLISHED' for the
second packet.

This is an esoteric condition, as ruleset must permit NEW connections
in either direction and both peers must already have a bidirectional
traffic flow at the time conntrack gets enabled.

Allow the 'reverse' skb to pass and assign the existing (clashing)
entry.

While at it, also drop the extra 'dying' check, this is already
tested earlier by the calling function.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-26 13:01:54 +02:00
Florian Westphal
d8f84a9bc7 netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
A conntrack entry can be inserted to the connection tracking table if there
is no existing entry with an identical tuple in either direction.

Example:
INITIATOR -> NAT/PAT -> RESPONDER

Initiator passes through NAT/PAT ("us") and SNAT is done (saddr rewrite).
Then, later, NAT/PAT machine itself also wants to connect to RESPONDER.

This will not work if the SNAT done earlier has same IP:PORT source pair.

Conntrack table has:
ORIGINAL: $IP_INITATOR:$SPORT -> $IP_RESPONDER:$DPORT
REPLY:    $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT

and new locally originating connection wants:
ORIGINAL: $IP_NAT:$SPORT -> $IP_RESPONDER:$DPORT
REPLY:    $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT

This is handled by the NAT engine which will do a source port reallocation
for the locally originating connection that is colliding with an existing
tuple by attempting a source port rewrite.

This is done even if this new connection attempt did not go through a
masquerade/snat rule.

There is a rare race condition with connection-less protocols like UDP,
where we do the port reallocation even though its not needed.

This happens when new packets from the same, pre-existing flow are received
in both directions at the exact same time on different CPUs after the
conntrack table was flushed (or conntrack becomes active for first time).

With strict ordering/single cpu, the first packet creates new ct entry and
second packet is resolved as established reply packet.

With parallel processing, both packets are picked up as new and both get
their own ct entry.

In this case, the 'reply' packet (picked up as ORIGINAL) can be mangled by
NAT engine because a port collision is detected.

This change isn't enough to prevent a packet drop later during
nf_conntrack_confirm(), the existing clash resolution strategy will not
detect such reverse clash case.  This is resolved by a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-26 13:00:55 +02:00
Luigi Leonardi
efcd71af38 vsock/virtio: avoid queuing packets when intermediate queue is empty
When the driver needs to send new packets to the device, it always
queues the new sk_buffs into an intermediate queue (send_pkt_queue)
and schedules a worker (send_pkt_work) to then queue them into the
virtqueue exposed to the device.

This increases the chance of batching, but also introduces a lot of
latency into the communication. So we can optimize this path by
adding a fast path to be taken when there is no element in the
intermediate queue, there is space available in the virtqueue,
and no other process that is sending packets (tx_lock held).

The following benchmarks were run to check improvements in latency and
throughput. The test bed is a host with Intel i7-10700KF CPU @ 3.80GHz
and L1 guest running on QEMU/KVM with vhost process and all vCPUs
pinned individually to pCPUs.

- Latency
   Tool: Fio version 3.37-56
   Mode: pingpong (h-g-h)
   Test runs: 50
   Runtime-per-test: 50s
   Type: SOCK_STREAM

In the following fio benchmark (pingpong mode) the host sends
a payload to the guest and waits for the same payload back.

fio process pinned both inside the host and the guest system.

Before: Linux 6.9.8

Payload 64B:

	1st perc.	overall		99th perc.
Before	12.91		16.78		42.24		us
After	9.77		13.57		39.17		us

Payload 512B:

	1st perc.	overall		99th perc.
Before	13.35		17.35		41.52		us
After	10.25		14.11		39.58		us

Payload 4K:

	1st perc.	overall		99th perc.
Before	14.71		19.87		41.52		us
After	10.51		14.96		40.81		us

- Throughput
   Tool: iperf-vsock

The size represents the buffer length (-l) to read/write
P represents the number of parallel streams

P=1
	4K	64K	128K
Before	6.87	29.3	29.5 Gb/s
After	10.5	39.4	39.9 Gb/s

P=2
	4K	64K	128K
Before	10.5	32.8	33.2 Gb/s
After	17.8	47.7	48.5 Gb/s

P=4
	4K	64K	128K
Before	12.7	33.6	34.2 Gb/s
After	16.9	48.1	50.5 Gb/s

The performance improvement is related to this optimization,
I used a ebpf kretprobe on virtio_transport_send_skb to check
that each packet was sent directly to the virtqueue

Co-developed-by: Marco Pinna <marco.pinn95@gmail.com>
Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Message-Id: <20240730-pinna-v4-2-5c9179164db5@outlook.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2024-09-25 07:07:44 -04:00
Marco Pinna
26618da3b2 vsock/virtio: refactor virtio_transport_send_pkt_work
Preliminary patch to introduce an optimization to the
enqueue system.

All the code used to enqueue a packet into the virtqueue
is removed from virtio_transport_send_pkt_work()
and moved to the new virtio_transport_send_skb() function.

Co-developed-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20240730-pinna-v4-1-5c9179164db5@outlook.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-09-25 07:07:44 -04:00
Linus Torvalds
684a64bf32 NFS Client Updates for Linux 6.12
New Features:
   * Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention
   * Add support for the LOCALIO protocol extention
 
 Bugfixes:
   * Fix memory leak in error path of nfs4_do_reclaim()
   * Simplify and guarantee lock owner uniqueness
   * Fix -Wformat-truncation warning
   * Fix folio refcounts by using folio_attach_private()
   * Fix failing the mount system call when the server is down
   * Fix detection of "Proxying of Times" server support
 
 Cleanups:
   * Annotate struct nfs_cache_array with __counted_by()
   * Remove unnecessary NULL checks before kfree()
   * Convert RPC_TASK_* constants to an enum
   * Remove obsolete or misleading comments and declerations
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmbzKDAACgkQ18tUv7Cl
 QOusjBAAxTSoVbHocl+9eYpvKnscPArgPXnfd6mB9rnQRgtnceTO2ei7cdiE2qhz
 dxQiyzlAXh3e7dGwoy02qEd6wTTqWeQ8ESdMpCAqSacBU4tu5owfzNSWunZvgYYj
 QOhjdmv8M1IfZnTstPlVrRNaZcDkhV1tzEtpZppkEqhTB0bHWqrcM4EdklTWT0Yc
 PGMpGbfuGsa4qZy2vWl7doERVEgK8mBeahLtYFD2W6phIvNWgD6IlKy66RaK2RfH
 nXmZoZbI2/ioi4TKvNyY8xoGMGvetLI1h8YNQYkEg060XCkisLZDOvoodUAylOTR
 2jHQLG5+/ejhpD/zgPghGZDSGNN1GyZaH09E/vtiS+3k9OXxFz6Rq68VnC6kpMA4
 TIUYsT8ejPzs2gW59iDFGB6cKI4XnRtxgmApW/Za0y9A72PSi+G/pbWAk7ThjTxf
 +HySsba4baA63opIgBSLVBrUsXZfdn/KTDTZ4nkPiq57BggGcZv7Y2ItOTXA+pB/
 5nigDKkhWsYVjMbkx6wmh+VO2gv4/Z8WqsmiDwFMpVqM0w8eycBOHjOumuuc6nmw
 y+2OKZqU2Npm2HI/R8lA7nB1m2QP5t7CRM2+xlZNuavHrfsMaqHNl8/9VgxlCATQ
 /Zo74hbhmCgQYxrTjL8XFQG9/8y0o3H5IcTEr/SgCVxHyDSan1I=
 =YjyC
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention
   - Add support for the LOCALIO protocol extention

  Bugfixes:
   - Fix memory leak in error path of nfs4_do_reclaim()
   - Simplify and guarantee lock owner uniqueness
   - Fix -Wformat-truncation warning
   - Fix folio refcounts by using folio_attach_private()
   - Fix failing the mount system call when the server is down
   - Fix detection of "Proxying of Times" server support

  Cleanups:
   - Annotate struct nfs_cache_array with __counted_by()
   - Remove unnecessary NULL checks before kfree()
   - Convert RPC_TASK_* constants to an enum
   - Remove obsolete or misleading comments and declerations"

* tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (41 commits)
  nfs: Fix `make htmldocs` warnings in the localio documentation
  nfs: add "NFS Client and Server Interlock" section to localio.rst
  nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
  nfs: add Documentation/filesystems/nfs/localio.rst
  nfs: implement client support for NFS_LOCALIO_PROGRAM
  nfs/localio: use dedicated workqueues for filesystem read and write
  pnfs/flexfiles: enable localio support
  nfs: enable localio for non-pNFS IO
  nfs: add LOCALIO support
  nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit
  nfsd: implement server support for NFS_LOCALIO_PROGRAM
  nfsd: add LOCALIO support
  nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO
  nfs_common: add NFS LOCALIO auxiliary protocol enablement
  SUNRPC: replace program list with program array
  SUNRPC: add svcauth_map_clnt_to_svc_cred_local
  SUNRPC: remove call_allocate() BUG_ONs
  nfsd: add nfsd_serv_try_get and nfsd_serv_put
  nfsd: add nfsd_file_acquire_local()
  nfsd: factor out __fh_verify to allow NULL rqstp to be passed
  ...
2024-09-24 15:44:18 -07:00
Linus Torvalds
fa8380a06b bpf-next-6.12-struct-fd
-----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmbyniwACgkQ6rmadz2v
 bTqE0w/2J8TJWfR+1Z0Bf2Nzt3kFd/wLNn6FpWsq+z0/pzoP5AzborvmLzNiZmeh
 0vJFieOL7pV4+NcaIHBPqfW1eMsXu+BlrtkHGLLYiCPJUr8o5jU9SrVKfF3arMZS
 a6+zcX6EivX0MYWobZ2F7/8XF0nRQADxzInLazFmtJmLmOAyIch417KOg9ylwr3m
 WVqhtCImUFyVz83XMFgbf2jXrvL9xD08iHN62GzcAioRF5LeJSPX0U/N15gWDqF7
 V68F0PnvUf6/hkFvYVynhpMivE8u+8VXCHX+heZ8yUyf4ExV/+KSZrImupJ0WLeO
 iX/qJ/9XP+g6ad9Olqpu6hmPi/6c6epQgbSOchpG04FGBGmJv1j9w4wnlHCgQDdB
 i2oKHRtMKdqNZc0sOSfvw/KyxZXJuD1VQ9YgGVpZbHUbSZDoj7T40zWziUp8VgyR
 nNtOmfJLDbtYlPV7/cQY5Ui4ccMJm6GzxxLBcqcMWxBu/90Ng0wTSubLbg3RHmWu
 d9cCL6IprjJnliEUqC4k4gqZy6RJlHvQ8+NDllaW+4iPnz7B2WaUbwRX/oZ5yiYK
 bLjWCWo+SzntVPAzTsmAYs2G47vWoALxo2NpNXLfmhJiWwfakJaQu7fwrDxsY11M
 OgByiOzcbAcvkJzeVIDhfLVq5z49KF6k4D8Qu0uvXHDeC8Mraw==
 =zzmh
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.12-struct-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf 'struct fd' updates from Alexei Starovoitov:
 "This includes struct_fd BPF changes from Al and Andrii"

* tag 'bpf-next-6.12-struct-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  bpf: convert bpf_token_create() to CLASS(fd, ...)
  security,bpf: constify struct path in bpf_token_create() LSM hook
  bpf: more trivial fdget() conversions
  bpf: trivial conversions for fdget()
  bpf: switch maps to CLASS(fd, ...)
  bpf: factor out fetching bpf_map from FD and adding it to used_maps list
  bpf: switch fdget_raw() uses to CLASS(fd_raw, ...)
  bpf: convert __bpf_prog_get() to CLASS(fd, ...)
2024-09-24 14:54:26 -07:00
Jiawei Ye
bff1709b39 mac802154: Fix potential RCU dereference issue in mac802154_scan_worker
In the `mac802154_scan_worker` function, the `scan_req->type` field was
accessed after the RCU read-side critical section was unlocked. According
to RCU usage rules, this is illegal and can lead to unpredictable
behavior, such as accessing memory that has been updated or causing
use-after-free issues.

This possible bug was identified using a static analysis tool developed
by myself, specifically designed to detect RCU-related issues.

To address this, the `scan_req->type` value is now stored in a local
variable `scan_req_type` while still within the RCU read-side critical
section. The `scan_req_type` is then used after the RCU lock is released,
ensuring that the type value is safely accessed without violating RCU
rules.

Fixes: e2c3e6f53a ("mac802154: Handle active scanning")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawei Ye <jiawei.ye@foxmail.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/tencent_3B2F4F2B4DA30FAE2F51A9634A16B3AD4908@qq.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2024-09-24 11:53:50 +02:00
Youssef Samir
f011b313e8 net: qrtr: Update packets cloning when broadcasting
When broadcasting data to multiple nodes via MHI, using skb_clone()
causes all nodes to receive the same header data. This can result in
packets being discarded by endpoints, leading to lost data.

This issue occurs when a socket is closed, and a QRTR_TYPE_DEL_CLIENT
packet is broadcasted. All nodes receive the same destination node ID,
causing the node connected to the client to discard the packet and
remain unaware of the client's deletion.

Replace skb_clone() with pskb_copy(), to create a separate copy of
the header for each sk_buff.

Fixes: bdabad3e36 ("net: Add Qualcomm IPC router")
Signed-off-by: Youssef Samir <quic_yabdulra@quicinc.com>
Reviewed-by: Jeffery Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Chris Lew <quic_clew@quicinc.com>
Link: https://patch.msgid.link/20240916170858.2382247-1-quic_yabdulra@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-24 10:48:16 +02:00
NeilBrown
86ab08beb3 SUNRPC: replace program list with program array
A service created with svc_create_pooled() can be given a linked list of
programs and all of these will be served.

Using a linked list makes it cumbersome when there are several programs
that can be optionally selected with CONFIG settings.

After this patch is applied, API consumers must use only
svc_create_pooled() when creating an RPC service that listens for more
than one RPC program.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Weston Andros Adamson
199f212874 SUNRPC: add svcauth_map_clnt_to_svc_cred_local
Add new funtion svcauth_map_clnt_to_svc_cred_local which maps a
generic cred to a svc_cred suitable for use in nfsd.

This is needed by the localio code to map nfs client creds to nfs
server credentials.

Following from net/sunrpc/auth_unix.c:unx_marshal() it is clear that
->fsuid and ->fsgid must be used (rather than ->uid and ->gid).  In
addition, these uid and gid must be translated with from_kuid_munged()
so local client uses correct uid and gid when acting as local server.

Jeff Layton noted:
  This is where the magic happens. Since we're working in
  kuid_t/kgid_t, we don't need to worry about further idmapping.

Suggested-by: NeilBrown <neilb@suse.de> # to approximate unx_marshal()
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Mike Snitzer
2c8919848d SUNRPC: remove call_allocate() BUG_ONs
Remove BUG_ON if p_arglen=0 to allow RPC with void arg.
Remove BUG_ON if p_replen=0 to allow RPC with void return.

The former was needed for the first revision of the LOCALIO protocol
which had an RPC that took a void arg:

    /* raw RFC 9562 UUID */
    typedef u8 uuid_t<UUID_SIZE>;

    program NFS_LOCALIO_PROGRAM {
        version LOCALIO_V1 {
            void
                NULL(void) = 0;

            uuid_t
                GETUUID(void) = 1;
        } = 1;
    } = 400122;

The latter is needed for the final revision of the LOCALIO protocol
which has a UUID_IS_LOCAL RPC which returns a void:

    /* raw RFC 9562 UUID */
    typedef u8 uuid_t<UUID_SIZE>;

    program NFS_LOCALIO_PROGRAM {
        version LOCALIO_V1 {
            void
                NULL(void) = 0;

            void
                UUID_IS_LOCAL(uuid_t) = 1;
        } = 1;
    } = 400122;

There is really no value in triggering a BUG_ON in response to either
of these previously unsupported conditions.

NeilBrown would like the entire 'if (proc->p_proc != 0)' branch
removed (not just the one BUG_ON that must be removed for LOCALIO's
immediate needs of returning void).

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Hongbo Li
64a3ab9967 net/sunrpc: make use of the helper macro LIST_HEAD()
list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD(). Here we can simplify
the code.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Siddh Raman Pant
2e001972e8 SUNRPC: clnt.c: Remove misleading comment
destroy_wait doesn't store all RPC clients. There was a list named
"all_clients" above it, which got moved to struct sunrpc_net in 2012,
but the comment was never removed.

Fixes: 70abc49b4f ("SUNRPC: make SUNPRC clients list per network namespace context")
Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Kunwu Chan
9090a7f786 SUNRPC: Fix -Wformat-truncation warning
Increase size of the servername array to avoid truncated output warning.

net/sunrpc/clnt.c:582:75: error:‘%s’ directive output may be truncated
writing up to 107 bytes into a region of size 48
[-Werror=format-truncation=]
  582 |                   snprintf(servername, sizeof(servername), "%s",
      |                                                             ^~

net/sunrpc/clnt.c:582:33: note:‘snprintf’ output
between 1 and 108 bytes into a destination of size 48
  582 |                     snprintf(servername, sizeof(servername), "%s",
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  583 |                                          sun->sun_path);

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Linus Torvalds
18ba603446 NFSD 6.12 Release Notes
Notable features of this release include:
 
 - Pre-requisites for automatically determining the RPC server thread
   count
 - Clean-up and preparation for supporting LOCALIO, which will be
   merged via the NFS client tree
 - Enhancements and fixes to NFSv4.2 COPY offload
 - A new Python-based tool for generating kernel SunRPC XDR encoding
   and decoding functions, added as an aid for prototyping features
   in protocols based on the Linux kernel's SunRPC implementation.
 
 As always I am grateful to the NFSD contributors, reviewers,
 testers, and bug reporters who participated during this cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmbxg60ACgkQM2qzM29m
 f5d+9A/+LiXAjR3x1vlbGFiMAW3Alixg5wE6AM7M1I/OH/dBCkWU1gzneWYaUXAk
 cIGp5sH2Uco2mFVswOZyQ3tX8T/2PeY+Kx5qrlK5h0bTUoz95AIyLe3LA/4o4CIL
 qMGlLQyVq9UolggPoRdigsDhKVwLcu3hWaG7ykkTquyrOPLBKgzRNSwVKLpFc/0/
 mQToOf6HLjgFkEUR3pmXAMsVq88/BpjHIXeNhx2Z1ekWslSKjrAu2gC0rc6/s9Wi
 JsTtzSdnqefc2jsNVZ8FT+V7mDF1sxrN4SnHruSLhJsd5tL/3HDkiZEvdG2Sh0nH
 zQlDpMpNbZyCvaWs6jgaZeMRiNSSl7q31zXUgX2bkWpL/EnagujZHtLZroUgLQfA
 BO8HhRqdt1wJohiv2aMlFvnlp+GhSH5FdcXv1cT/CmyTNGqbXENqoCUA1OT9kE55
 RvXVCLD4YbmCb5EpjLavhu/NuFOc9l9GitKlhiJlcX86QAu/C1Bu1DOyqgq5G0VW
 Xl/q7xIvNZz0mh7x8kKVV4bQHsm9pnoNz57CZFPahoHg/+BR4u6p8LepowpaHjHj
 Ef62BzYwQtuw0jCyufDea+uCt5CGwUM3Y5iBiQogtnvFK6ie8WwD0QTI2SYpcWZ/
 T6RwDOX5jlMWWmuibSK2STgwkblG3vAmMot0RtEbZILvB/ld9qw=
 =Ybsc
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Notable features of this release include:

   - Pre-requisites for automatically determining the RPC server thread
     count

   - Clean-up and preparation for supporting LOCALIO, which will be
     merged via the NFS client tree

   - Enhancements and fixes to NFSv4.2 COPY offload

   - A new Python-based tool for generating kernel SunRPC XDR encoding
     and decoding functions, added as an aid for prototyping features in
     protocols based on the Linux kernel's SunRPC implementation

  As always I am grateful to the NFSD contributors, reviewers, testers,
  and bug reporters who participated during this cycle"

* tag 'nfsd-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (57 commits)
  xdrgen: Prevent reordering of encoder and decoder functions
  xdrgen: typedefs should use the built-in string and opaque functions
  xdrgen: Fix return code checking in built-in XDR decoders
  tools: Add xdrgen
  nfsd: fix delegation_blocked() to block correctly for at least 30 seconds
  nfsd: fix initial getattr on write delegation
  nfsd: untangle code in nfsd4_deleg_getattr_conflict()
  nfsd: enforce upper limit for namelen in __cld_pipe_inprogress_downcall()
  nfsd: return -EINVAL when namelen is 0
  NFSD: Wrap async copy operations with trace points
  NFSD: Clean up extra whitespace in trace_nfsd_copy_done
  NFSD: Record the callback stateid in copy tracepoints
  NFSD: Display copy stateids with conventional print formatting
  NFSD: Limit the number of concurrent async COPY operations
  NFSD: Async COPY result needs to return a write verifier
  nfsd: avoid races with wake_up_var()
  nfsd: use clear_and_wake_up_bit()
  sunrpc: xprtrdma: Use ERR_CAST() to return
  NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by()
  nfsd: call cache_put if xdr_reserve_space returns NULL
  ...
2024-09-23 12:01:45 -07:00
Anna Schumaker
8c04a6d6e0 NFSD 6.12 Release Notes
Notable features of this release include:
 
 - Pre-requisites for automatically determining the RPC server thread
   count
 - Clean-up and preparation for supporting LOCALIO, which will be
   merged via the NFS client tree
 - Enhancements and fixes to NFSv4.2 COPY offload
 - A new Python-based tool for generating kernel SunRPC XDR encoding
   and decoding functions, added as an aid for prototyping features
   in protocols based on the Linux kernel's SunRPC implementation.
 
 As always I am grateful to the NFSD contributors, reviewers,
 testers, and bug reporters who participated during this cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmbxg60ACgkQM2qzM29m
 f5d+9A/+LiXAjR3x1vlbGFiMAW3Alixg5wE6AM7M1I/OH/dBCkWU1gzneWYaUXAk
 cIGp5sH2Uco2mFVswOZyQ3tX8T/2PeY+Kx5qrlK5h0bTUoz95AIyLe3LA/4o4CIL
 qMGlLQyVq9UolggPoRdigsDhKVwLcu3hWaG7ykkTquyrOPLBKgzRNSwVKLpFc/0/
 mQToOf6HLjgFkEUR3pmXAMsVq88/BpjHIXeNhx2Z1ekWslSKjrAu2gC0rc6/s9Wi
 JsTtzSdnqefc2jsNVZ8FT+V7mDF1sxrN4SnHruSLhJsd5tL/3HDkiZEvdG2Sh0nH
 zQlDpMpNbZyCvaWs6jgaZeMRiNSSl7q31zXUgX2bkWpL/EnagujZHtLZroUgLQfA
 BO8HhRqdt1wJohiv2aMlFvnlp+GhSH5FdcXv1cT/CmyTNGqbXENqoCUA1OT9kE55
 RvXVCLD4YbmCb5EpjLavhu/NuFOc9l9GitKlhiJlcX86QAu/C1Bu1DOyqgq5G0VW
 Xl/q7xIvNZz0mh7x8kKVV4bQHsm9pnoNz57CZFPahoHg/+BR4u6p8LepowpaHjHj
 Ef62BzYwQtuw0jCyufDea+uCt5CGwUM3Y5iBiQogtnvFK6ie8WwD0QTI2SYpcWZ/
 T6RwDOX5jlMWWmuibSK2STgwkblG3vAmMot0RtEbZILvB/ld9qw=
 =Ybsc
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.12' into linux-next-with-localio

NFSD 6.12 Release Notes

Notable features of this release include:

- Pre-requisites for automatically determining the RPC server thread
  count
- Clean-up and preparation for supporting LOCALIO, which will be
  merged via the NFS client tree
- Enhancements and fixes to NFSv4.2 COPY offload
- A new Python-based tool for generating kernel SunRPC XDR encoding
  and decoding functions, added as an aid for prototyping features
  in protocols based on the Linux kernel's SunRPC implementation.

As always I am grateful to the NFSD contributors, reviewers,
testers, and bug reporters who participated during this cycle.
2024-09-23 15:00:07 -04:00
Linus Torvalds
f8ffbc365f struct fd layout change (and conversion to accessor helpers)
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZvDNmgAKCRBZ7Krx/gZQ
 63zrAP9vI0rf55v27twiabe9LnI7aSx5ckoqXxFIFxyT3dOYpQD/bPmoApnWDD3d
 592+iDgLsema/H/0/CqfqlaNtDNY8Q0=
 =HUl5
 -----END PGP SIGNATURE-----

Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull 'struct fd' updates from Al Viro:
 "Just the 'struct fd' layout change, with conversion to accessor
  helpers"

* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  add struct fd constructors, get rid of __to_fd()
  struct fd: representation change
  introduce fd_file(), convert all accessors to it.
2024-09-23 09:35:36 -07:00