linux-stable/net/netfilter/ipvs
Terin Stock d7fce52fdf ipvs: align inner_mac_header for encapsulation
When using encapsulation the original packet's headers are copied to the
inner headers. This preserves the space for an inner mac header, which
is not used by the inner payloads for the encapsulation types supported
by IPVS. If a packet uses GUE or GRE encapsulation and needs to be
segmented, it can be passed to __skb_udp_tunnel_segment(), which then
calculates a negative tunnel header length. A negative tunnel header
length causes pskb_may_pull() to fail, dropping the packet.

This can be observed by attaching probes to ip_vs_in_hook(),
__dev_queue_xmit(), and __skb_udp_tunnel_segment():

    perf probe --add '__dev_queue_xmit skb->inner_mac_header \
    skb->inner_network_header skb->mac_header skb->network_header'
    perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen'
    perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \
    skb->inner_network_header skb->mac_header skb->network_header'

These probes record the header offsets and the tunnel header length for
packets which traverse the IPVS encapsulation path. A TCP packet can be
forced into the segmentation path by being smaller than a calculated
clamped MSS, but larger than the advertised MSS.

    probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52
    probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
    probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
    probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2
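
The tnl_hlen=-2 value can be reproduced from the probed offsets. The
following is a minimal userspace sketch, not kernel code. It assumes the
outer IPv4 header carries no options (20 bytes) and that the tunnel
header length is the inner mac header offset minus the outer transport
header offset. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Header offsets relative to skb->head, as reported by the probes. */
struct hdr_offsets {
	uint16_t inner_mac_header;
	uint16_t inner_network_header;
	uint16_t network_header;
};

/* Assumption: outer IPv4 header without options. */
enum { OUTER_IPV4_HLEN = 20 };

/* Outer transport (UDP) header offset, derived from the outer IP header. */
int outer_transport_offset(const struct hdr_offsets *o)
{
	return o->network_header + OUTER_IPV4_HLEN;
}

/* Assumed model: inner mac header offset minus outer transport offset. */
int tunnel_hlen(const struct hdr_offsets *o)
{
	/* Sanity check: the inner mac header cannot follow the inner IP header. */
	assert(o->inner_mac_header <= o->inner_network_header);
	return (int)o->inner_mac_header - outer_transport_offset(o);
}

/* Offsets from the second ip_vs_in_hook probe line (after encapsulation). */
int observed_tunnel_hlen(void)
{
	const struct hdr_offsets o = {
		.inner_mac_header = 0x44,
		.inner_network_header = 0x52,
		.network_header = 0x32,
	};

	return tunnel_hlen(&o);
}
```

Under these assumptions the outer transport header sits at 0x32 + 20 =
0x46, two bytes past the stale inner mac header offset of 0x44, so
observed_tunnel_hlen() returns -2.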

When using veth-based encapsulation, the interfaces are set to be
mac-less, which does not preserve space for an inner mac header. This
prevents this issue from occurring.

In our real-world testing of sending a 32KB file we observed operation
time increasing from ~75ms for veth-based encapsulation to over 1.5s
using IPVS encapsulation due to retries from dropped packets.

This changeset modifies the packet on the encapsulation path in
ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac
header offset. This fixes UDP segmentation for both encapsulation types,
and corrects the inner headers for any IPIP flows that may use it.
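
The effect of removing the offset can be illustrated with the probed
values from above. This is a userspace sketch under stated assumptions,
not the patch itself: the outer IPv4 header is taken to be 20 bytes, the
tunnel header length is modelled as the inner mac header offset minus
the outer transport header offset, and "removing the offset" is modelled
as setting the inner mac header equal to the inner network header. All
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Header offsets relative to skb->head, as reported by the probes. */
struct encap_offsets {
	uint16_t inner_mac_header;
	uint16_t inner_network_header;
	uint16_t network_header;
};

/* Assumption: outer IPv4 header without options. */
enum { ASSUMED_OUTER_IPV4_HLEN = 20 };

/* Model of the fix: the inner payload has no mac header, so align it. */
void align_inner_mac_header(struct encap_offsets *o)
{
	o->inner_mac_header = o->inner_network_header;
	assert(o->inner_mac_header == o->inner_network_header);
}

/* Tunnel header length for the probed packet once the offset is removed. */
int tunnel_hlen_after_fix(void)
{
	struct encap_offsets o = {
		.inner_mac_header = 0x44,
		.inner_network_header = 0x52,
		.network_header = 0x32,
	};

	align_inner_mac_header(&o);
	return (int)o.inner_mac_header -
	       (o.network_header + ASSUMED_OUTER_IPV4_HLEN);
}
```

With the inner mac header moved to 0x52, the modelled length becomes
0x52 - 0x46 = 12 bytes, which matches an 8-byte UDP header followed by a
4-byte GUE base header.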

Fixes: 84c0d5e96f ("ipvs: allow tunneling with gue encapsulation")
Signed-off-by: Terin Stock <terin@cloudflare.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-19 16:01:07 +02:00
ip_vs_app.c ipvs: fix WARNING in ip_vs_app_net_cleanup() 2022-11-02 09:39:14 +01:00
ip_vs_conn.c ipvs: Consistently use array_size() in ip_vs_conn_init() 2023-04-22 01:39:41 +02:00
ip_vs_core.c ipvs: Remove {Enter,Leave}Function 2023-04-22 01:39:41 +02:00
ip_vs_ctl.c ipvs: Remove {Enter,Leave}Function 2023-04-22 01:39:41 +02:00
ip_vs_dh.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip_vs_est.c ipvs: avoid kfree_rcu without 2nd arg 2023-02-02 14:02:01 +01:00
ip_vs_fo.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip_vs_ftp.c netfilter: ipvs: do not printk on netns creation 2021-04-03 20:17:11 +02:00
ip_vs_lblc.c treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
ip_vs_lblcr.c treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
ip_vs_lc.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip_vs_mh.c netfilter: ipvs: Use the bitmap API to allocate bitmaps 2022-07-21 00:55:39 +02:00
ip_vs_nfct.c netfilter: nf_conntrack_sip: fix expectation clash 2019-07-16 13:16:59 +02:00
ip_vs_nq.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip_vs_ovf.c net: Fix various misspellings of "connect" 2019-10-28 13:41:59 -07:00
ip_vs_pe_sip.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
ip_vs_pe.c ipvs: don't ignore errors in case refcounting ip_vs module fails 2019-10-24 11:53:19 +02:00
ip_vs_proto_ah_esp.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
ip_vs_proto_sctp.c netfilter: ipvs: prefer skb_ensure_writable 2019-05-31 18:02:44 +02:00
ip_vs_proto_tcp.c ipvs: adjust the debug info in function set_tcp_state 2020-10-20 13:54:46 +02:00
ip_vs_proto_udp.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
ip_vs_proto.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip_vs_rr.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip_vs_sched.c ipvs: don't ignore errors in case refcounting ip_vs module fails 2019-10-24 11:53:19 +02:00
ip_vs_sed.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip_vs_sh.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip_vs_sync.c ipvs: Remove {Enter,Leave}Function 2023-04-22 01:39:41 +02:00
ip_vs_twos.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
ip_vs_wlc.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip_vs_wrr.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip_vs_xmit.c ipvs: align inner_mac_header for encapsulation 2023-06-19 16:01:07 +02:00
Kconfig netfilter: Remove leading spaces in Kconfig 2021-05-29 01:04:52 +02:00
Makefile ipvs: add weighted random twos choice algorithm 2021-01-26 01:09:46 +01:00