linux/net
Zijian Zhang ca70b8baf2 tcp_bpf: Fix the sk_mem_uncharge logic in tcp_bpf_sendmsg
The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging
tosend bytes, which is either msg->sg.size or a smaller value apply_bytes.

Potential problems with this strategy are as follows:

- If the actual sent bytes are smaller than tosend, we need to charge some
  bytes back, as in line 487, which is okay but seems not clean.

- When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may
  miss uncharging (msg->sg.size - apply_bytes) bytes.

[...]
415 tosend = msg->sg.size;
416 if (psock->apply_bytes && psock->apply_bytes < tosend)
417   tosend = psock->apply_bytes;
[...]
443 sk_msg_return(sk, msg, tosend);
444 release_sock(sk);
446 origsize = msg->sg.size;
447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
448                             msg, tosend, flags);
449 sent = origsize - msg->sg.size;
[...]
454 lock_sock(sk);
455 if (unlikely(ret < 0)) {
456   int free = sk_msg_free_nocharge(sk, msg);
458   if (!cork)
459     *copied -= free;
460 }
[...]
487 if (eval == __SK_REDIRECT)
488   sk_mem_charge(sk, tosend - sent);
[...]

When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply,
the following warning will be reported:

------------[ cut here ]------------
WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0
Modules linked in:
CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events sk_psock_destroy
RIP: 0010:inet_sock_destruct+0x190/0x1a0
RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206
RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800
RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900
RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0
R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400
R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100
FS:  0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x89/0x130
? inet_sock_destruct+0x190/0x1a0
? report_bug+0xfc/0x1e0
? handle_bug+0x5c/0xa0
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? inet_sock_destruct+0x190/0x1a0
__sk_destruct+0x25/0x220
sk_psock_destroy+0x2b2/0x310
process_scheduled_works+0xa3/0x3e0
worker_thread+0x117/0x240
? __pfx_worker_thread+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---

In __SK_REDIRECT, a more concise way is delaying the uncharging after sent
bytes are finalized, and uncharge this value. When (ret < 0), we shall
invoke sk_msg_free.

Same thing happens in case __SK_DROP, when tosend is set to apply_bytes,
we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same
warning will be reported in selftest.

[...]
468 case __SK_DROP:
469 default:
470 sk_msg_free_partial(sk, msg, tosend);
471 sk_msg_apply_bytes(psock, tosend);
472 *copied -= (tosend + delta);
473 return -EACCES;
[...]

So instead of sk_msg_free_partial we can do sk_msg_free here.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Fixes: 8ec95b9471 ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241016234838.3167769-3-zijianzhang@bytedance.com
2024-11-26 20:37:46 +01:00
..
6lowpan ipv6: eliminate ndisc_ops_is_useropt() 2024-08-12 17:23:57 -07:00
9p 9p: fix slab cache name creation for real 2024-10-21 15:41:29 -07:00
802 move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
8021q net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
appletalk appletalk: Remove deadcode 2024-10-04 12:42:32 +01:00
atm atm: clean up a put_user() calls 2024-06-14 19:08:50 -07:00
ax25 ax25: Replace kfree() in ax25_dev_free() with ax25_dev_put() 2024-06-01 15:49:42 -07:00
batman-adv This cleanup patchset includes the following patches: 2024-10-15 15:28:17 +02:00
bluetooth Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
bpf bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled 2024-10-31 16:15:21 +01:00
bridge ndo_fdb_del: Add a parameter to report whether notification was sent 2024-11-15 16:39:18 -08:00
caif caif: Remove unused cfsrvl_getphyid 2024-10-08 15:33:49 -07:00
can can: gw: Use rtnl_register_many(). 2024-10-15 18:52:26 -07:00
ceph libceph: use min() to simplify code in ceph_dns_resolve_name() 2024-08-27 09:30:16 +02:00
core Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
dcb dcb: Use rtnl_register_many(). 2024-10-15 18:52:26 -07:00
dccp net: fix data-races around sk->sk_forward_alloc 2024-11-11 15:29:33 -08:00
devlink net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
dns_resolver
dsa net: dsa: use ethtool string helpers 2024-11-03 10:36:34 -08:00
ethernet netkit: Fix pkt_type override upon netkit pass verdict 2024-05-25 10:48:57 -07:00
ethtool Revert "net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warnings" 2024-11-18 18:52:11 -08:00
handshake remove pointless includes of <linux/fdtable.h> 2024-10-07 13:34:41 -04:00
hsr net: hsr: Add VLAN CTAG filter support 2024-11-11 16:40:44 -08:00
ieee802154 net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
ife
ipv4 tcp_bpf: Fix the sk_mem_uncharge logic in tcp_bpf_sendmsg 2024-11-26 20:37:46 +01:00
ipv6 ipv6/udp: Add 4-tuple hash for connected socket 2024-11-18 11:56:21 +00:00
iucv s390/iucv: Fix vargs handling in iucv_alloc_device() 2024-08-22 13:09:20 -07:00
kcm kcm: replace call_rcu by kfree_rcu for simple kmem_cache_free callback 2024-10-15 10:50:21 -07:00
key xfrm: Add support for per cpu xfrm state handling. 2024-10-29 11:56:00 +01:00
l2tp genetlink: hold RCU in genlmsg_mcast() 2024-10-15 17:52:58 -07:00
l3mdev
lapb
llc llc: Constify struct llc_sap_state_trans 2024-07-15 08:51:19 -07:00
mac80211 wireless-next patches for v6.13 2024-11-13 18:35:19 -08:00
mac802154 Including fixes from ieee802154, bluetooth and netfilter. 2024-10-03 09:44:00 -07:00
mctp net: mctp: Expose transport binding identifier via IFLA attribute 2024-11-09 09:04:54 -08:00
mpls rtnetlink: Return int from rtnl_af_register(). 2024-10-22 11:02:05 +02:00
mptcp mptcp: pm: avoid code duplication to lookup endp 2024-11-18 18:50:13 -08:00
ncsi net/ncsi: Disable the ncsi work before freeing the associated structure 2024-10-03 10:14:14 +02:00
netfilter Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
netlabel Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
netlink Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
netrom net/netrom: prefer strscpy over strcpy 2024-08-29 12:33:07 -07:00
nfc net: nfc: Propagate ISO14443 type A target ATS to userspace via netlink 2024-11-07 10:21:58 +01:00
nsh nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment(). 2024-04-26 12:20:01 +02:00
openvswitch net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
packet af_packet: avoid erroring out after sock_init_data() in packet_create() 2024-10-15 18:43:07 -07:00
phonet phonet: do not call synchronize_rcu() from phonet_route_del() 2024-11-07 20:34:16 -08:00
psample net: psample: fix flag being set in wrong skb 2024-07-11 18:11:31 -07:00
qrtr net: qrtr: Update packets cloning when broadcasting 2024-09-24 10:48:16 +02:00
rds net/rds: remove unused struct 'rds_ib_dereg_odp_mr' 2024-10-03 16:42:52 -07:00
rfkill net: rfkill: gpio: Add check for clk_enable() 2024-11-12 13:30:31 +01:00
rose net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
rxrpc rxrpc: Add a tracepoint for aborts being proposed 2024-11-11 15:27:46 -08:00
sched Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-11-14 11:29:15 -08:00
shaper net-shapers: implement cap validation in the core 2024-10-10 08:30:23 -07:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-11-07 13:44:16 -08:00
strparser
sunrpc mm: page_frag: avoid caller accessing 'page_frag_cache' directly 2024-11-11 10:56:27 -08:00
switchdev net: bridge: switchdev: Improve error message for port_obj_add/del functions 2024-05-08 12:19:12 +01:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-09-15 09:13:19 -07:00
tls move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
unix af_unix: Don't return OOB skb in manage_oob(). 2024-09-09 17:14:27 -07:00
vmw_vsock bpf, vsock: Invoke proto::close on close() 2024-11-25 14:19:14 -08:00
wireless wireless-next patches for v6.13 2024-11-13 18:35:19 -08:00
x25 net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
xdp xsk: always clear DMA mapping information when unmapping the pool 2024-11-25 14:27:37 -08:00
xfrm ipsec-next-2024-11-15 2024-11-18 11:52:49 +00:00
compat.c
devres.c
Kconfig netlink: spec: add shaper YAML spec 2024-10-10 08:30:21 -07:00
Kconfig.debug rtnetlink: Add per-netns RTNL. 2024-10-08 15:16:59 +02:00
Makefile netlink: spec: add shaper YAML spec 2024-10-10 08:30:21 -07:00
socket.c Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
sysctl_net.c sysctl: Remove check for sentinel element in ctl_table arrays 2024-06-13 10:50:52 +02:00