linux-next/net
Jakub Sitnicki ca6a6f9386 tcp: Add sysctl to configure TIME-WAIT reuse delay
Today we have a hardcoded delay of 1 sec before a TIME-WAIT socket can be
reused by reopening a connection. This is a safe choice based on an
assumption that the other TCP timestamp clock frequency, which is unknown
to us, may be as low as 1 Hz (RFC 7323, section 5.4).

However, this means that in the presence of short lived connections with an
RTT of couple of milliseconds, the time during which a 4-tuple is blocked
from reuse can be orders of magnitude longer that the connection lifetime.
Combined with a reduced pool of ephemeral ports, when using
IP_LOCAL_PORT_RANGE to share an egress IP address between hosts [1], the
long TIME-WAIT reuse delay can lead to port exhaustion, where all available
4-tuples are tied up in TIME-WAIT state.

Turn the reuse delay into a per-netns setting so that sysadmins can make
more aggressive assumptions about remote TCP timestamp clock frequency and
shorten the delay in order to allow connections to reincarnate faster.

Note that applications can completely bypass the TIME-WAIT delay protection
already today by locking the local port with bind() before connecting. Such
immediate connection reuse may result in PAWS failing to detect old
duplicate segments, leaving us with just the sequence number check as a
safety net.

This new configurable offers a trade off where the sysadmin can balance
between the risk of PAWS detection failing to act versus exhausting ports
by having sockets tied up in TIME-WAIT state for too long.

[1] https://lpc.events/event/16/contributions/1349/

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20241209-jakub-krn-909-poc-msec-tw-tstamp-v2-2-66aca0eed03e@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-11 20:17:33 -08:00
..
6lowpan ipv6: eliminate ndisc_ops_is_useropt() 2024-08-12 17:23:57 -07:00
9p net/9p/usbg: allow building as standalone module 2024-11-22 23:48:14 +09:00
802 move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
8021q net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
appletalk appletalk: Remove deadcode 2024-10-04 12:42:32 +01:00
atm atm: clean up a put_user() calls 2024-06-14 19:08:50 -07:00
ax25 ax25: Replace kfree() in ax25_dev_free() with ax25_dev_put() 2024-06-01 15:49:42 -07:00
batman-adv This cleanup patchset includes the following patches: 2024-10-15 15:28:17 +02:00
bluetooth Bluetooth: SCO: remove the redundant sco_conn_put 2024-11-26 11:07:28 -05:00
bpf bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled 2024-10-31 16:15:21 +01:00
bridge rtnetlink: add ndo_fdb_dump_context 2024-12-10 18:32:32 -08:00
caif caif: Remove unused cfsrvl_getphyid 2024-10-08 15:33:49 -07:00
can can: j1939: j1939_session_new(): fix skb reference counting 2024-12-02 09:53:39 +01:00
ceph libceph: Remove unused ceph_crypto_key_encode 2024-11-18 17:34:35 +01:00
core rtnetlink: switch rtnl_fdb_dump() to for_each_netdev_dump() 2024-12-10 18:32:32 -08:00
dcb dcb: Use rtnl_register_many(). 2024-10-15 18:52:26 -07:00
dccp dccp: Fix memory leak in dccp_feat_change_recv 2024-12-03 09:50:21 +01:00
devlink net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
dns_resolver
dsa rtnetlink: add ndo_fdb_dump_context 2024-12-10 18:32:32 -08:00
ethernet netkit: Fix pkt_type override upon netkit pass verdict 2024-05-25 10:48:57 -07:00
ethtool ethtool: Fix wrong mod state in case of verbose and no_mask bitset 2024-12-04 18:54:43 -08:00
handshake module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
hsr net: hsr: must allocate more bytes for RedBox support 2024-12-03 12:08:33 +01:00
ieee802154 net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
ife
ipv4 tcp: Add sysctl to configure TIME-WAIT reuse delay 2024-12-11 20:17:33 -08:00
ipv6 ipv6: mcast: annotate data-race around psf->sf_count[MCAST_XXX] 2024-12-11 20:15:29 -08:00
iucv s390/iucv: MSG_PEEK causes memory leak in iucv_sock_destruct() 2024-11-26 10:02:53 +01:00
kcm kcm: replace call_rcu by kfree_rcu for simple kmem_cache_free callback 2024-10-15 10:50:21 -07:00
key xfrm: Add support for per cpu xfrm state handling. 2024-10-29 11:56:00 +01:00
l2tp l2tp: Handle eth stats using NETDEV_PCPU_STAT_DSTATS. 2024-12-11 13:57:26 +00:00
l3mdev
lapb
llc llc: Improve setsockopt() handling of malformed user input 2024-11-28 08:57:42 +01:00
mac80211 module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
mac802154 Including fixes from ieee802154, bluetooth and netfilter. 2024-10-03 09:44:00 -07:00
mctp mctp: no longer rely on net->dev_index_head[] 2024-12-09 14:29:14 -08:00
mpls rtnetlink: Return int from rtnl_af_register(). 2024-10-22 11:02:05 +02:00
mptcp mptcp: pm: avoid code duplication to lookup endp 2024-11-18 18:50:13 -08:00
ncsi net/ncsi: Disable the ncsi work before freeing the associated structure 2024-10-03 10:14:14 +02:00
netfilter netfilter: nft_set_hash: skip duplicated elements pending gc run 2024-12-04 21:37:41 +01:00
netlabel Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
netlink netlink: fix false positive warning in extack during dumps 2024-11-24 16:58:07 -08:00
netrom net/netrom: prefer strscpy over strcpy 2024-08-29 12:33:07 -07:00
nfc net: nfc: Propagate ISO14443 type A target ATS to userspace via netlink 2024-11-07 10:21:58 +01:00
nsh nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment(). 2024-04-26 12:20:01 +02:00
openvswitch net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
packet af_packet: avoid erroring out after sock_init_data() in packet_create() 2024-10-15 18:43:07 -07:00
phonet phonet: do not call synchronize_rcu() from phonet_route_del() 2024-11-07 20:34:16 -08:00
psample net: psample: fix flag being set in wrong skb 2024-07-11 18:11:31 -07:00
qrtr net: qrtr: Update packets cloning when broadcasting 2024-09-24 10:48:16 +02:00
rds net/rds: remove unused struct 'rds_ib_dereg_odp_mr' 2024-10-03 16:42:52 -07:00
rfkill Get rid of 'remove_new' relic from platform driver struct 2024-12-01 15:12:43 -08:00
rose net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
rxrpc rxrpc: Implement RACK/TLP to deal with transmission stalls [RFC8985] 2024-12-09 13:48:33 -08:00
sched net_sched: sch_sfq: don't allow 1 packet limit 2024-12-05 18:02:10 -08:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-11-14 11:29:15 -08:00
shaper net-shapers: implement cap validation in the core 2024-10-10 08:30:23 -07:00
smc net/smc: fix LGR and link use-after-free issue 2024-12-03 10:42:29 +01:00
strparser
sunrpc module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
switchdev net: bridge: switchdev: Improve error message for port_obj_add/del functions 2024-05-08 12:19:12 +01:00
tipc net: tipc: remove one synchronize_net() from tipc_nametbl_stop() 2024-12-06 17:41:28 -08:00
tls move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
unix af_unix: Don't return OOB skb in manage_oob(). 2024-09-09 17:14:27 -07:00
vmw_vsock Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-11-14 11:29:15 -08:00
wireless module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
x25 net: change proto and proto_ops accept type 2024-05-13 18:19:09 -06:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-11-19 13:56:02 +01:00
xfrm ipsec-next-2024-11-15 2024-11-18 11:52:49 +00:00
compat.c
devres.c
Kconfig netlink: spec: add shaper YAML spec 2024-10-10 08:30:21 -07:00
Kconfig.debug rtnetlink: Add per-netns RTNL. 2024-10-08 15:16:59 +02:00
Makefile netlink: spec: add shaper YAML spec 2024-10-10 08:30:21 -07:00
socket.c Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
sysctl_net.c sysctl: Remove check for sentinel element in ctl_table arrays 2024-06-13 10:50:52 +02:00