linux-next/include/net/netns
Jakub Sitnicki ca6a6f9386 tcp: Add sysctl to configure TIME-WAIT reuse delay
Today we have a hardcoded delay of 1 second before a TIME-WAIT socket can be
reused by reopening a connection. This is a safe choice based on the
assumption that the peer's TCP timestamp clock frequency, which is unknown
to us, may be as low as 1 Hz (RFC 7323, section 5.4).

However, this means that in the presence of short-lived connections with an
RTT of a couple of milliseconds, the time during which a 4-tuple is blocked
from reuse can be orders of magnitude longer than the connection lifetime.
Combined with a reduced pool of ephemeral ports, when using
IP_LOCAL_PORT_RANGE to share an egress IP address between hosts [1], the
long TIME-WAIT reuse delay can lead to port exhaustion, where all available
4-tuples are tied up in TIME-WAIT state.
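
The port-exhaustion argument can be made concrete with a rough
back-of-the-envelope model. The sketch below assumes that connection
lifetime is negligible compared to the reuse delay, that TIME-WAIT reuse is
enabled, and that all connections go from one source IP to a single
destination IP and port. The function name and the pool size used in the
note are hypothetical and only serve as illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough upper bound on sustained new connections per second to a single
 * destination (IP, port) from one source IP.
 *
 * Simplified model (an assumption, not kernel logic): each local port can
 * carry one connection per reuse-delay window, so the bound is
 * port_pool / delay, scaled from milliseconds to seconds.
 */
uint64_t max_conn_rate_per_dst(uint32_t port_pool, uint32_t reuse_delay_ms)
{
	assert(reuse_delay_ms > 0);
	return (uint64_t)port_pool * 1000u / reuse_delay_ms;
}
```

Under this model a hypothetical pool of 1024 ports with the 1000 ms delay
sustains at most 1024 new connections per second to one destination, while
a 10 ms delay raises the bound to 102400.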

Turn the reuse delay into a per-netns setting so that sysadmins can make
more aggressive assumptions about remote TCP timestamp clock frequency and
shorten the delay in order to allow connections to reincarnate faster.
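
As an illustration of what a configurable delay means for the reuse
decision, here is a minimal userspace sketch. It is not the kernel
implementation: the function and parameter names are hypothetical, and it
assumes a 32-bit millisecond clock that may wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is strictly after b" for a 32-bit millisecond clock. */
bool ms_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/*
 * Hypothetical check: a TIME-WAIT socket whose last timestamp was recorded
 * at last_stamp_ms may be reused once more than reuse_delay_ms has passed.
 */
bool tw_reusable(uint32_t now_ms, uint32_t last_stamp_ms,
		 uint32_t reuse_delay_ms)
{
	return ms_after(now_ms, last_stamp_ms + reuse_delay_ms);
}
```

With a 1000 ms delay a socket stamped at t=1000 is not reusable at t=1500,
whereas with a 10 ms delay it already is at t=1011.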

Note that applications can already bypass the TIME-WAIT delay protection
completely by locking the local port with bind() before connecting. Such
immediate connection reuse may result in PAWS failing to detect old
duplicate segments, leaving us with just the sequence number check as a
safety net.

This new configurable offers a trade-off: the sysadmin can balance the risk
of PAWS failing to detect old duplicate segments against the risk of
exhausting ports by having sockets tied up in TIME-WAIT state for too long.

[1] https://lpc.events/event/16/contributions/1349/

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20241209-jakub-krn-909-poc-msec-tw-tstamp-v2-2-66aca0eed03e@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-11 20:17:33 -08:00
..
bpf.h bpf: Invert the dependency between bpf-netns.h and netns/bpf.h 2021-12-29 20:03:05 -08:00
can.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
conntrack.h netfilter: conntrack: switch connlabels to atomic_t 2023-10-24 13:16:30 +02:00
core.h net-timestamp: namespacify the sysctl_tstamp_allow_data 2024-10-08 15:33:11 -07:00
flow_table.h netfilter: nf_flow_table: count pending offload workqueue tasks 2022-07-11 16:25:14 +02:00
generic.h netns: Replace zero-length array with DECLARE_FLEX_ARRAY() helper 2022-09-28 18:51:47 -07:00
hash.h netns: provide pure entropy for net_hash_mix() 2019-03-28 17:00:45 -07:00
ieee802154_6lowpan.h net: dynamically allocate fqdir structures 2019-05-26 14:08:05 -07:00
ipv4.h tcp: Add sysctl to configure TIME-WAIT reuse delay 2024-12-11 20:17:33 -08:00
ipv6.h net/ipv6: convert skip_notify_on_dev_down sysctl to u8 2023-06-02 22:55:43 -07:00
mctp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
mib.h net: reorganize fields in netns_mib 2021-04-02 14:31:44 -07:00
mpls.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
netfilter.h netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core 2024-06-19 18:41:59 +02:00
nexthop.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
nftables.h net: Remove leftover include from nftables.h 2023-08-13 14:55:25 +01:00
packet.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sctp.h net: Correct spelling in headers 2024-08-26 09:37:23 -07:00
smc.h net/smc: add sysctl for max conns per lgr for SMC-R v2.1 2023-11-24 12:13:14 +00:00
unix.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
xdp.h net: xsk: Don't include <linux/rculist.h> 2022-12-06 20:04:34 -08:00
xfrm.h xfrm: Add an inbound percpu state cache. 2024-10-29 11:56:18 +01:00