linux/tools
Vladimir Vdovin 7d3f3b4367 net: ipv4: Cache pmtu for all packet paths if multipath enabled
Check number of paths by fib_info_num_path(),
and update_or_create_fnhe() for every path.
Problem is that pmtu is cached only for the oif
that has received icmp message "need to frag",
other oifs will still try to use "default" iface mtu.

An example topology showing the problem:

                    |  host1
                +---------+
                |  dummy0 | 10.179.20.18/32  mtu9000
                +---------+
        +-----------+----------------+
    +---------+                     +---------+
    | ens17f0 |  10.179.2.141/31    | ens17f1 |  10.179.2.13/31
    +---------+                     +---------+
        |    (all here have mtu 9000)    |
    +------+                         +------+
    | ro1  |  10.179.2.140/31        | ro2  |  10.179.2.12/31
    +------+                         +------+
        |                                |
---------+------------+-------------------+------
                        |
                    +-----+
                    | ro3 | 10.10.10.10  mtu1500
                    +-----+
                        |
    ========================================
                some networks
    ========================================
                        |
                    +-----+
                    | eth0| 10.10.30.30  mtu9000
                    +-----+
                        |  host2

host1 have enabled multipath and
sysctl net.ipv4.fib_multipath_hash_policy = 1:

default proto static src 10.179.20.18
        nexthop via 10.179.2.12 dev ens17f1 weight 1
        nexthop via 10.179.2.140 dev ens17f0 weight 1

When host1 tries to do pmtud from 10.179.20.18/32 to host2,
host1 receives at ens17f1 iface an icmp packet from ro3 that ro3 mtu=1500.
And host1 caches it in nexthop exceptions cache.

Problem is that it is cached only for the iface that has received icmp,
and there is no way that ro3 will send icmp msg to host1 via another path.

Host1 now have this routes to host2:

ip r g 10.10.30.30 sport 30000 dport 443
10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0
    cache expires 521sec mtu 1500

ip r g 10.10.30.30 sport 30033 dport 443
10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0
    cache

So when host1 tries again to reach host2 with mtu>1500,
if packet flow is lucky enough to be hashed with oif=ens17f1 its ok,
if oif=ens17f0 it blackholes and still gets icmp msgs from ro3 to ens17f1,
until lucky day when ro3 will send it through another flow to ens17f0.

Signed-off-by: Vladimir Vdovin <deliran@verdict.gg>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20241108093427.317942-1-deliran@verdict.gg
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11 19:07:36 -08:00
..
accounting
arch perf cap: Add __NR_capget to arch/x86 unistd 2024-10-28 13:04:52 -03:00
bootconfig
bpf bpftool: Fix undefined behavior in qsort(NULL, 0, ...) 2024-09-10 11:40:55 -07:00
build perf build: Fix build feature-dwarf_getlocations fail for old libdw 2024-10-02 18:21:49 -03:00
certs
cgroup
counter
crypto crypto: tools/ccp - Remove unused variable 2024-08-30 18:22:30 +08:00
debugging
firewire
firmware
gpio tools: gpio: rm .*.cmd on make clean 2024-09-02 12:27:35 +02:00
hv hyperv-next for v6.12 2024-09-19 08:15:30 +02:00
iio tools: iio: rm .*.cmd when make clean 2024-09-05 19:27:13 +01:00
include net: Add napi_struct parameter irq_suspend_timeout 2024-11-11 18:45:05 -08:00
kvm/kvm_stat
laptop
leds
lib memblock: updates for 6.12-rc1 2024-09-25 11:35:19 -07:00
memory-model tools/memory-model: simple.txt: Fix stale reference to recipes-pairs.txt 2024-09-13 23:56:44 -07:00
mm tools/mm: -Werror fixes in page-types/slabinfo 2024-10-30 20:14:11 -07:00
net tools: ynl-gen: de-kdocify enums with no doc for entries 2024-11-04 18:11:47 -08:00
objtool LoongArch changes for v6.12 2024-09-27 10:14:35 -07:00
pci tools: PCI: Remove unused BILLION macro 2024-09-13 22:37:06 +00:00
pcmcia
perf perf cap: Add __NR_capget to arch/x86 unistd 2024-10-28 13:04:52 -03:00
power linux-cpupower-6.12-rc1-fixes 2024-09-24 12:57:46 -07:00
rcu tools/rcu: Remove RCU Tasks Rude asynchronous APIs from rcu-updaters.sh 2024-07-29 07:39:32 +05:30
sched_ext sched_ext: Make cast_mask() inline 2024-10-25 12:19:44 -10:00
scripts
sound ASoC: dapm-graph: show path name for non-static routes 2024-08-23 11:03:00 +01:00
spi spi: spidev_fdx: Fix the wrong format specifier 2024-09-04 16:50:33 +01:00
testing net: ipv4: Cache pmtu for all packet paths if multipath enabled 2024-11-11 19:07:36 -08:00
thermal
time
tracing rtla: Fix the help text in osnoise and timerlat top tools 2024-10-03 16:43:22 -04:00
usb usbip: tools: Fix detach_port() invalid port error path 2024-10-29 04:23:23 +01:00
verification tools/verification: Use pkg-config in lib_setup of Makefile.config 2024-07-17 13:14:51 -07:00
virtio tools/virtio:Fix the wrong format specifier 2024-09-10 02:51:48 -04:00
wmi
workqueue
writeback writeback: add wb_monitor.py script to monitor writeback info on bdi 2024-05-05 17:53:51 -07:00
Makefile sched_ext: Add scx_simple and scx_example_qmap example schedulers 2024-06-18 10:09:17 -10:00