linux-stable/net
Michal Luczaj ed1fc5d76b bpf, sockmap: Fix race between element replace and close()
Element replace (with a socket different from the one stored) may race
with socket's close() link popping & unlinking. __sock_map_delete()
unconditionally unrefs the (wrong) element:

// set map[0] = s0
map_update_elem(map, 0, s0)

// drop fd of s0
close(s0)
  sock_map_close()
    lock_sock(sk)               (s0!)
    sock_map_remove_links(sk)
      link = sk_psock_link_pop()
      sock_map_unlink(sk, link)
        sock_map_delete_from_link
                                        // replace map[0] with s1
                                        map_update_elem(map, 0, s1)
                                          sock_map_update_elem
                                (s1!)       lock_sock(sk)
                                            sock_map_update_common
                                              psock = sk_psock(sk)
                                              spin_lock(&stab->lock)
                                              osk = stab->sks[idx]
                                              sock_map_add_link(..., &stab->sks[idx])
                                              sock_map_unref(osk, &stab->sks[idx])
                                                psock = sk_psock(osk)
                                                sk_psock_put(sk, psock)
                                                  if (refcount_dec_and_test(&psock))
                                                    sk_psock_drop(sk, psock)
                                              spin_unlock(&stab->lock)
                                            unlock_sock(sk)
          __sock_map_delete
            spin_lock(&stab->lock)
            sk = *psk                        // s1 replaced s0; sk == s1
            if (!sk_test || sk_test == sk)   // sk_test (s0) != sk (s1); no branch
              sk = xchg(psk, NULL)
            if (sk)
              sock_map_unref(sk, psk)        // unref s1; sks[idx] will dangle
                psock = sk_psock(sk)
                sk_psock_put(sk, psock)
                  if (refcount_dec_and_test())
                    sk_psock_drop(sk, psock)
            spin_unlock(&stab->lock)
    release_sock(sk)

Then close(map) enqueues bpf_map_free_deferred, which finally calls
sock_map_free(). This results in some refcount_t warnings along with
a KASAN splat [1].

Fix __sock_map_delete(), do not allow sock_map_unref() on elements that
may have been replaced.

[1]:
BUG: KASAN: slab-use-after-free in sock_map_free+0x10e/0x330
Write of size 4 at addr ffff88811f5b9100 by task kworker/u64:12/1063

CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Not tainted 6.12.0+ #125
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
Workqueue: events_unbound bpf_map_free_deferred
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0x90
 print_report+0x174/0x4f6
 kasan_report+0xb9/0x190
 kasan_check_range+0x10f/0x1e0
 sock_map_free+0x10e/0x330
 bpf_map_free_deferred+0x173/0x320
 process_one_work+0x846/0x1420
 worker_thread+0x5b3/0xf80
 kthread+0x29e/0x360
 ret_from_fork+0x2d/0x70
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Allocated by task 1202:
 kasan_save_stack+0x1e/0x40
 kasan_save_track+0x10/0x30
 __kasan_slab_alloc+0x85/0x90
 kmem_cache_alloc_noprof+0x131/0x450
 sk_prot_alloc+0x5b/0x220
 sk_alloc+0x2c/0x870
 unix_create1+0x88/0x8a0
 unix_create+0xc5/0x180
 __sock_create+0x241/0x650
 __sys_socketpair+0x1ce/0x420
 __x64_sys_socketpair+0x92/0x100
 do_syscall_64+0x93/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Freed by task 46:
 kasan_save_stack+0x1e/0x40
 kasan_save_track+0x10/0x30
 kasan_save_free_info+0x37/0x60
 __kasan_slab_free+0x4b/0x70
 kmem_cache_free+0x1a1/0x590
 __sk_destruct+0x388/0x5a0
 sk_psock_destroy+0x73e/0xa50
 process_one_work+0x846/0x1420
 worker_thread+0x5b3/0xf80
 kthread+0x29e/0x360
 ret_from_fork+0x2d/0x70
 ret_from_fork_asm+0x1a/0x30

The buggy address belongs to the object at ffff88811f5b9080
 which belongs to the cache UNIX-STREAM of size 1984
The buggy address is located 128 bytes inside of
 freed 1984-byte region [ffff88811f5b9080, ffff88811f5b9840)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11f5b8
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff888127d49401
flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: f5(slab)
raw: 0017ffffc0000040 ffff8881042e4500 dead000000000122 0000000000000000
raw: 0000000000000000 00000000800f000f 00000001f5000000 ffff888127d49401
head: 0017ffffc0000040 ffff8881042e4500 dead000000000122 0000000000000000
head: 0000000000000000 00000000800f000f 00000001f5000000 ffff888127d49401
head: 0017ffffc0000003 ffffea00047d6e01 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88811f5b9000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88811f5b9080: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff88811f5b9180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88811f5b9200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Disabling lock debugging due to kernel taint

refcount_t: addition on 0; use-after-free.
WARNING: CPU: 14 PID: 1063 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150
CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Tainted: G    B              6.12.0+ #125
Tainted: [B]=BAD_PAGE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
Workqueue: events_unbound bpf_map_free_deferred
RIP: 0010:refcount_warn_saturate+0xce/0x150
Code: 34 73 eb 03 01 e8 82 53 ad fe 0f 0b eb b1 80 3d 27 73 eb 03 00 75 a8 48 c7 c7 80 bd 95 84 c6 05 17 73 eb 03 01 e8 62 53 ad fe <0f> 0b eb 91 80 3d 06 73 eb 03 00 75 88 48 c7 c7 e0 bd 95 84 c6 05
RSP: 0018:ffff88815c49fc70 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88811f5b9100 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed10bcde6349
R10: ffff8885e6f31a4b R11: 0000000000000000 R12: ffff88813be0b000
R13: ffff88811f5b9100 R14: ffff88811f5b9080 R15: ffff88813be0b024
FS:  0000000000000000(0000) GS:ffff8885e6f00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055dda99b0250 CR3: 000000015dbac000 CR4: 0000000000752ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn.cold+0x5f/0x1ff
 ? refcount_warn_saturate+0xce/0x150
 ? report_bug+0x1ec/0x390
 ? handle_bug+0x58/0x90
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? refcount_warn_saturate+0xce/0x150
 sock_map_free+0x2e5/0x330
 bpf_map_free_deferred+0x173/0x320
 process_one_work+0x846/0x1420
 worker_thread+0x5b3/0xf80
 kthread+0x29e/0x360
 ret_from_fork+0x2d/0x70
 ret_from_fork_asm+0x1a/0x30
 </TASK>
irq event stamp: 10741
hardirqs last  enabled at (10741): [<ffffffff84400ec6>] asm_sysvec_apic_timer_interrupt+0x16/0x20
hardirqs last disabled at (10740): [<ffffffff811e532d>] handle_softirqs+0x60d/0x770
softirqs last  enabled at (10506): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
softirqs last disabled at (10301): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210

refcount_t: underflow; use-after-free.
WARNING: CPU: 14 PID: 1063 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150
CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Tainted: G    B   W          6.12.0+ #125
Tainted: [B]=BAD_PAGE, [W]=WARN
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
Workqueue: events_unbound bpf_map_free_deferred
RIP: 0010:refcount_warn_saturate+0xee/0x150
Code: 17 73 eb 03 01 e8 62 53 ad fe 0f 0b eb 91 80 3d 06 73 eb 03 00 75 88 48 c7 c7 e0 bd 95 84 c6 05 f6 72 eb 03 01 e8 42 53 ad fe <0f> 0b e9 6e ff ff ff 80 3d e6 72 eb 03 00 0f 85 61 ff ff ff 48 c7
RSP: 0018:ffff88815c49fc70 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88811f5b9100 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: 0000000000000003 R08: 0000000000000001 R09: ffffed10bcde6349
R10: ffff8885e6f31a4b R11: 0000000000000000 R12: ffff88813be0b000
R13: ffff88811f5b9100 R14: ffff88811f5b9080 R15: ffff88813be0b024
FS:  0000000000000000(0000) GS:ffff8885e6f00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055dda99b0250 CR3: 000000015dbac000 CR4: 0000000000752ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn.cold+0x5f/0x1ff
 ? refcount_warn_saturate+0xee/0x150
 ? report_bug+0x1ec/0x390
 ? handle_bug+0x58/0x90
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? refcount_warn_saturate+0xee/0x150
 sock_map_free+0x2d3/0x330
 bpf_map_free_deferred+0x173/0x320
 process_one_work+0x846/0x1420
 worker_thread+0x5b3/0xf80
 kthread+0x29e/0x360
 ret_from_fork+0x2d/0x70
 ret_from_fork_asm+0x1a/0x30
 </TASK>
irq event stamp: 10741
hardirqs last  enabled at (10741): [<ffffffff84400ec6>] asm_sysvec_apic_timer_interrupt+0x16/0x20
hardirqs last disabled at (10740): [<ffffffff811e532d>] handle_softirqs+0x60d/0x770
softirqs last  enabled at (10506): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
softirqs last disabled at (10301): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241202-sockmap-replace-v1-3-1e88579e7bd5@rbox.co
2024-12-10 17:38:05 +01:00
..
6lowpan ipv6: eliminate ndisc_ops_is_useropt() 2024-08-12 17:23:57 -07:00
9p net/9p/usbg: allow building as standalone module 2024-11-22 23:48:14 +09:00
802 move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
8021q net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
appletalk appletalk: Remove deadcode 2024-10-04 12:42:32 +01:00
atm atm: clean up a put_user() calls 2024-06-14 19:08:50 -07:00
ax25 ax25: Replace kfree() in ax25_dev_free() with ax25_dev_put() 2024-06-01 15:49:42 -07:00
batman-adv This cleanup patchset includes the following patches: 2024-10-15 15:28:17 +02:00
bluetooth Bluetooth: SCO: remove the redundant sco_conn_put 2024-11-26 11:07:28 -05:00
bpf bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled 2024-10-31 16:15:21 +01:00
bridge ndo_fdb_del: Add a parameter to report whether notification was sent 2024-11-15 16:39:18 -08:00
caif caif: Remove unused cfsrvl_getphyid 2024-10-08 15:33:49 -07:00
can can: j1939: j1939_session_new(): fix skb reference counting 2024-12-02 09:53:39 +01:00
ceph libceph: Remove unused ceph_crypto_key_encode 2024-11-18 17:34:35 +01:00
core bpf, sockmap: Fix race between element replace and close() 2024-12-10 17:38:05 +01:00
dcb dcb: Use rtnl_register_many(). 2024-10-15 18:52:26 -07:00
dccp dccp: Fix memory leak in dccp_feat_change_recv 2024-12-03 09:50:21 +01:00
devlink net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
dns_resolver
dsa net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
ethernet netkit: Fix pkt_type override upon netkit pass verdict 2024-05-25 10:48:57 -07:00
ethtool ethtool: Fix wrong mod state in case of verbose and no_mask bitset 2024-12-04 18:54:43 -08:00
handshake module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
hsr net: hsr: must allocate more bytes for RedBox support 2024-12-03 12:08:33 +01:00
ieee802154 net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
ife
ipv4 BPF fixes: 2024-12-06 15:07:48 -08:00
ipv6 ipmr: tune the ipmr_can_free_table() checks. 2024-12-04 18:49:16 -08:00
iucv s390/iucv: MSG_PEEK causes memory leak in iucv_sock_destruct() 2024-11-26 10:02:53 +01:00
kcm kcm: replace call_rcu by kfree_rcu for simple kmem_cache_free callback 2024-10-15 10:50:21 -07:00
key xfrm: Add support for per cpu xfrm state handling. 2024-10-29 11:56:00 +01:00
l2tp net/l2tp: fix warning in l2tp_exit_net found by syzbot 2024-11-26 09:27:07 +01:00
l3mdev
lapb
llc llc: Improve setsockopt() handling of malformed user input 2024-11-28 08:57:42 +01:00
mac80211 module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
mac802154 Including fixes from ieee802154, bluetooth and netfilter. 2024-10-03 09:44:00 -07:00
mctp net: mctp: Expose transport binding identifier via IFLA attribute 2024-11-09 09:04:54 -08:00
mpls rtnetlink: Return int from rtnl_af_register(). 2024-10-22 11:02:05 +02:00
mptcp mptcp: pm: avoid code duplication to lookup endp 2024-11-18 18:50:13 -08:00
ncsi net/ncsi: Disable the ncsi work before freeing the associated structure 2024-10-03 10:14:14 +02:00
netfilter netfilter: nft_set_hash: skip duplicated elements pending gc run 2024-12-04 21:37:41 +01:00
netlabel Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
netlink netlink: fix false positive warning in extack during dumps 2024-11-24 16:58:07 -08:00
netrom net/netrom: prefer strscpy over strcpy 2024-08-29 12:33:07 -07:00
nfc net: nfc: Propagate ISO14443 type A target ATS to userspace via netlink 2024-11-07 10:21:58 +01:00
nsh nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment(). 2024-04-26 12:20:01 +02:00
openvswitch net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
packet af_packet: avoid erroring out after sock_init_data() in packet_create() 2024-10-15 18:43:07 -07:00
phonet phonet: do not call synchronize_rcu() from phonet_route_del() 2024-11-07 20:34:16 -08:00
psample net: psample: fix flag being set in wrong skb 2024-07-11 18:11:31 -07:00
qrtr net: qrtr: Update packets cloning when broadcasting 2024-09-24 10:48:16 +02:00
rds net/rds: remove unused struct 'rds_ib_dereg_odp_mr' 2024-10-03 16:42:52 -07:00
rfkill Get rid of 'remove_new' relic from platform driver struct 2024-12-01 15:12:43 -08:00
rose net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
rxrpc rxrpc: Improve setsockopt() handling of malformed user input 2024-11-28 08:57:42 +01:00
sched net: sched: fix ordering of qlen adjustment 2024-12-04 12:54:22 +00:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-11-14 11:29:15 -08:00
shaper net-shapers: implement cap validation in the core 2024-10-10 08:30:23 -07:00
smc net/smc: fix LGR and link use-after-free issue 2024-12-03 10:42:29 +01:00
strparser
sunrpc module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
switchdev net: bridge: switchdev: Improve error message for port_obj_add/del functions 2024-05-08 12:19:12 +01:00
tipc tipc: Fix use-after-free of kernel socket in cleanup_bearer(). 2024-12-03 10:17:43 +01:00
tls move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
unix af_unix: Don't return OOB skb in manage_oob(). 2024-09-09 17:14:27 -07:00
vmw_vsock bpf, vsock: Invoke proto::close on close() 2024-11-25 14:19:14 -08:00
wireless module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
x25 net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
xdp xsk: always clear DMA mapping information when unmapping the pool 2024-11-25 14:27:37 -08:00
xfrm ipsec-next-2024-11-15 2024-11-18 11:52:49 +00:00
compat.c
devres.c
Kconfig netlink: spec: add shaper YAML spec 2024-10-10 08:30:21 -07:00
Kconfig.debug rtnetlink: Add per-netns RTNL. 2024-10-08 15:16:59 +02:00
Makefile netlink: spec: add shaper YAML spec 2024-10-10 08:30:21 -07:00
socket.c Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
sysctl_net.c sysctl: Remove check for sentinel element in ctl_table arrays 2024-06-13 10:50:52 +02:00