linux/net
Eric Dumazet 56667da739 net: implement lockless setsockopt(SO_PEEK_OFF)
syzbot reported a lockdep violation [1] involving af_unix
support of SO_PEEK_OFF.

Since SO_PEEK_OFF is inherently not thread safe (it uses a per-socket
sk_peek_off field), there is really no point to enforce a pointless
thread safety in the kernel.

After this patch :

- setsockopt(SO_PEEK_OFF) no longer acquires the socket lock.

- skb_consume_udp() no longer has to acquire the socket lock.

- af_unix no longer needs a special version of sk_set_peek_off(),
  because it does not lock u->iolock anymore.

As a followup, we could replace prot->set_peek_off to be a boolean
and avoid an indirect call, since we always use sk_set_peek_off().

[1]

WARNING: possible circular locking dependency detected
6.8.0-rc4-syzkaller-00267-g0f1dd5e91e2b #0 Not tainted

syz-executor.2/30025 is trying to acquire lock:
 ffff8880765e7d80 (&u->iolock){+.+.}-{3:3}, at: unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789

but task is already holding lock:
 ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
 ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sockopt_lock_sock net/core/sock.c:1060 [inline]
 ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sk_setsockopt+0xe52/0x3360 net/core/sock.c:1193

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (sk_lock-AF_UNIX){+.+.}-{0:0}:
        lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
        lock_sock_nested+0x48/0x100 net/core/sock.c:3524
        lock_sock include/net/sock.h:1691 [inline]
        __unix_dgram_recvmsg+0x1275/0x12c0 net/unix/af_unix.c:2415
        sock_recvmsg_nosec+0x18e/0x1d0 net/socket.c:1046
        ____sys_recvmsg+0x3c0/0x470 net/socket.c:2801
        ___sys_recvmsg net/socket.c:2845 [inline]
        do_recvmmsg+0x474/0xae0 net/socket.c:2939
        __sys_recvmmsg net/socket.c:3018 [inline]
        __do_sys_recvmmsg net/socket.c:3041 [inline]
        __se_sys_recvmmsg net/socket.c:3034 [inline]
        __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3034
       do_syscall_64+0xf9/0x240
       entry_SYSCALL_64_after_hwframe+0x6f/0x77

-> #0 (&u->iolock){+.+.}-{3:3}:
        check_prev_add kernel/locking/lockdep.c:3134 [inline]
        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
        validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
        __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
        lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
        __mutex_lock_common kernel/locking/mutex.c:608 [inline]
        __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
        unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789
       sk_setsockopt+0x207e/0x3360
        do_sock_setsockopt+0x2fb/0x720 net/socket.c:2307
        __sys_setsockopt+0x1ad/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xf9/0x240
       entry_SYSCALL_64_after_hwframe+0x6f/0x77

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_UNIX);
                               lock(&u->iolock);
                               lock(sk_lock-AF_UNIX);
  lock(&u->iolock);

 *** DEADLOCK ***

1 lock held by syz-executor.2/30025:
  #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
  #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sockopt_lock_sock net/core/sock.c:1060 [inline]
  #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sk_setsockopt+0xe52/0x3360 net/core/sock.c:1193

stack backtrace:
CPU: 0 PID: 30025 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00267-g0f1dd5e91e2b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Call Trace:
 <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
  check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
  check_prev_add kernel/locking/lockdep.c:3134 [inline]
  check_prevs_add kernel/locking/lockdep.c:3253 [inline]
  validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
  __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
  lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
  __mutex_lock_common kernel/locking/mutex.c:608 [inline]
  __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
  unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789
 sk_setsockopt+0x207e/0x3360
  do_sock_setsockopt+0x2fb/0x720 net/socket.c:2307
  __sys_setsockopt+0x1ad/0x250 net/socket.c:2334
  __do_sys_setsockopt net/socket.c:2343 [inline]
  __se_sys_setsockopt net/socket.c:2340 [inline]
  __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
 do_syscall_64+0xf9/0x240
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f78a1c7dda9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f78a0fde0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007f78a1dac050 RCX: 00007f78a1c7dda9
RDX: 000000000000002a RSI: 0000000000000001 RDI: 0000000000000006
RBP: 00007f78a1cca47a R08: 0000000000000004 R09: 0000000000000000
R10: 0000000020000180 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f78a1dac050 R15: 00007ffe5cd81ae8

Fixes: 859051dd16 ("bpf: Implement cgroup sockaddr hooks for unix sockets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Daan De Meyer <daan.j.demeyer@gmail.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-21 11:24:20 +00:00
..
6lowpan net: fill in MODULE_DESCRIPTION()s for 6LoWPAN 2024-02-09 14:12:01 -08:00
9p net: 9p: avoid freeing uninit memory in p9pdu_vreadf 2023-12-13 05:44:30 +09:00
802 net: fill in MODULE_DESCRIPTION()s under net/802* 2023-10-28 11:29:28 +01:00
8021q vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING 2024-01-19 21:25:06 -08:00
appletalk net: remove SOCK_DEBUG leftovers 2023-12-26 20:31:01 +00:00
atm net: fill in MODULE_DESCRIPTION()s for mpoa 2024-02-09 14:12:01 -08:00
ax25 net: implement lockless SO_PRIORITY 2023-10-01 19:09:54 +01:00
batman-adv batman-adv: mcast: fix memory leak on deleting a batman-adv interface 2024-01-27 09:13:39 +01:00
bluetooth TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
bpf bpf: Fix dtor CFI 2023-12-15 16:25:55 -08:00
bridge net: bridge: switchdev: Ensure deferred event delivery on unoffload 2024-02-16 09:36:37 +00:00
caif net: fill in MODULE_DESCRIPTION()s for CAIF 2024-01-05 08:06:35 -08:00
can can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER) 2024-02-14 13:53:03 +01:00
ceph libceph: just wait for more data to be available on the socket 2024-02-07 14:43:29 +01:00
core net: implement lockless setsockopt(SO_PEEK_OFF) 2024-02-21 11:24:20 +00:00
dcb net: dcb: choose correct policy to parse DCB_ATTR_BCN 2023-08-01 21:07:46 -07:00
dccp net: remove SOCK_DEBUG leftovers 2023-12-26 20:31:01 +00:00
devlink devlink: fix possible use-after-free and memory leaks in devlink_init() 2024-02-20 10:17:46 +01:00
dns_resolver Networking changes for 6.8. 2024-01-11 10:07:29 -08:00
dsa net: dsa: fix netdev_priv() dereference before check on non-DSA netdevice events 2024-01-11 16:33:52 -08:00
ethernet net: ethernet: use sysfs_emit() to instead of scnprintf() 2022-12-07 20:02:44 -08:00
ethtool ethtool: netlink: Add missing ethnl_ops_begin/complete 2024-01-18 13:21:06 +01:00
handshake net/handshake: Fix handshake_req_destroy_test1 2024-02-08 18:32:29 -08:00
hsr net: hsr: remove WARN_ONCE() in send_hsr_supervision_frame() 2024-01-29 11:29:55 +00:00
ieee802154 mac802154: Avoid new associations while disassociating 2023-12-15 11:14:57 +01:00
ife net: sched: ife: fix potential use-after-free 2023-12-15 10:50:18 +00:00
ipv4 net: implement lockless setsockopt(SO_PEEK_OFF) 2024-02-21 11:24:20 +00:00
ipv6 ipv6: sr: fix possible use-after-free and null-ptr-deref 2024-02-20 10:17:14 +01:00
iucv net/iucv: fix the allocation size of iucv_path_table array 2024-02-16 09:25:09 +00:00
kcm net: kcm: fix direct access to bv_len 2024-01-03 18:37:22 -08:00
key net: fill in MODULE_DESCRIPTION()s for af_key 2024-02-09 14:12:01 -08:00
l2tp ipv6: annotate data-races around np->ucast_oif 2023-12-11 10:59:17 +00:00
l3mdev
lapb
llc llc: call sock_orphan() at release time 2024-01-30 13:49:09 +01:00
mac80211 wifi: mac80211: reload info pointer in ieee80211_tx_dequeue() 2024-02-08 13:16:22 +01:00
mac802154 mac802154: Avoid new associations while disassociating 2023-12-15 11:14:57 +01:00
mctp mctp: perform route lookups under a RCU read-side lock 2023-10-10 19:43:22 -07:00
mpls networking: Update to register_net_sysctl_sz 2023-08-15 15:26:18 -07:00
mptcp mptcp: fix duplicate subflow creation 2024-02-18 10:25:00 +00:00
ncsi net/ncsi: Add NC-SI 1.2 Get MC MAC Address command 2023-11-18 15:00:51 +00:00
netfilter Including fixes from can, wireless and netfilter. 2024-02-15 11:39:27 -08:00
netlabel calipso: fix memory leak in netlbl_calipso_add_pass() 2023-12-07 14:23:12 -05:00
netlink netlink: fix potential sleeping issue in mqueue_flush_file 2024-01-23 11:21:18 +01:00
netrom net: implement lockless SO_PRIORITY 2023-10-01 19:09:54 +01:00
nfc nfc: nci: free rx_data_reassembly skb on NCI device cleanup 2024-01-29 12:05:31 +00:00
nsh net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
openvswitch net: openvswitch: limit the number of recursions from action sets 2024-02-09 12:54:38 -08:00
packet net: fill in MODULE_DESCRIPTION() for AF_PACKET 2024-01-05 08:06:35 -08:00
phonet sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
psample genetlink: Use internal flags for multicast groups 2023-12-29 08:43:59 +00:00
qrtr net: qrtr: ns: Return 0 if server port is not present 2024-01-01 18:41:29 +00:00
rds net:rds: Fix possible deadlock in rds_message_put 2024-02-13 10:25:30 +01:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-12-21 22:17:23 +01:00
rose net/rose: fix races in rose_kill_by_device() 2023-12-15 11:59:53 +00:00
rxrpc rxrpc: Fix counting of new acks and nacks 2024-02-05 12:34:07 +00:00
sched net/sched: act_mirred: don't override retval if we already lost the skb 2024-02-16 10:13:31 +00:00
sctp net: sctp: fix skb leak in sctp_inq_free() 2024-02-15 07:34:52 -08:00
smc net: smc: fix spurious error message from __sock_release() 2024-02-14 10:56:02 +00:00
strparser
sunrpc NFSv4.1: Assign the right value for initval and retries for rpc timeout 2024-01-29 13:39:48 -05:00
switchdev net: bridge: switchdev: Skip MDB replays of deferred events on offload 2024-02-16 09:36:37 +00:00
tipc tipc: Check the bearer type before calling tipc_udp_nl_bearer_add() 2024-02-06 08:49:26 +01:00
tls mptcp: fix lockless access in subflow ULP diag 2024-02-18 10:25:00 +00:00
unix net: implement lockless setsockopt(SO_PEEK_OFF) 2024-02-21 11:24:20 +00:00
vmw_vsock vsock/virtio: use skb_frag_*() helpers 2024-01-03 18:37:16 -08:00
wireless wifi: cfg80211: detect stuck ECSA element in probe resp 2024-02-02 13:08:58 +01:00
x25 net: remove SOCK_DEBUG leftovers 2023-12-26 20:31:01 +00:00
xdp xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags 2024-01-24 16:24:06 -08:00
xfrm net: fill in MODULE_DESCRIPTION()s for xfrm 2024-02-09 14:12:01 -08:00
compat.c file: stop exposing receive_fd_user() 2023-12-12 14:24:14 +01:00
devres.c
Kconfig bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
Kconfig.debug
Makefile bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
socket.c vfs-6.8.iov_iter 2024-01-08 11:43:04 -08:00
sysctl_net.c sysctl: Add size to register_net_sysctl function 2023-08-15 15:26:17 -07:00