linux/net/ipv6
Omid Ehtemam-Haghighi d9ccb18f83 ipv6: Fix soft lockups in fib6_select_path under high next hop churn
Soft lockups have been observed on a cluster of Linux-based edge routers
located in a highly dynamic environment. Using the `bird` service, these
routers continuously update BGP-advertised routes due to frequently
changing nexthop destinations, while also managing significant IPv6
traffic. The lockups occur during the traversal of the multipath
circular linked-list in the `fib6_select_path` function, particularly
while iterating through the siblings in the list. The issue typically
arises when the nodes of the linked list are unexpectedly deleted
concurrently on a different core—indicated by their 'next' and
'previous' elements pointing back to the node itself and their reference
count dropping to zero. This results in an infinite loop, leading to a
soft lockup that triggers a system panic via the watchdog timer.

Apply RCU primitives in the problematic code sections to resolve the
issue. Where necessary, update the references to fib6_siblings to
annotate or use the RCU APIs.

Include a test script that reproduces the issue. The script
periodically updates the routing table while generating a heavy load
of outgoing IPv6 traffic through multiple iperf3 clients. It
consistently induces infinite soft lockups within a couple of minutes.

Kernel log:

 0 [ffffbd13003e8d30] machine_kexec at ffffffff8ceaf3eb
 1 [ffffbd13003e8d90] __crash_kexec at ffffffff8d0120e3
 2 [ffffbd13003e8e58] panic at ffffffff8cef65d4
 3 [ffffbd13003e8ed8] watchdog_timer_fn at ffffffff8d05cb03
 4 [ffffbd13003e8f08] __hrtimer_run_queues at ffffffff8cfec62f
 5 [ffffbd13003e8f70] hrtimer_interrupt at ffffffff8cfed756
 6 [ffffbd13003e8fd0] __sysvec_apic_timer_interrupt at ffffffff8cea01af
 7 [ffffbd13003e8ff0] sysvec_apic_timer_interrupt at ffffffff8df1b83d
-- <IRQ stack> --
 8 [ffffbd13003d3708] asm_sysvec_apic_timer_interrupt at ffffffff8e000ecb
    [exception RIP: fib6_select_path+299]
    RIP: ffffffff8ddafe7b  RSP: ffffbd13003d37b8  RFLAGS: 00000287
    RAX: ffff975850b43600  RBX: ffff975850b40200  RCX: 0000000000000000
    RDX: 000000003fffffff  RSI: 0000000051d383e4  RDI: ffff975850b43618
    RBP: ffffbd13003d3800   R8: 0000000000000000   R9: ffff975850b40200
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffbd13003d3830
    R13: ffff975850b436a8  R14: ffff975850b43600  R15: 0000000000000007
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 9 [ffffbd13003d3808] ip6_pol_route at ffffffff8ddb030c
10 [ffffbd13003d3888] ip6_pol_route_input at ffffffff8ddb068c
11 [ffffbd13003d3898] fib6_rule_lookup at ffffffff8ddf02b5
12 [ffffbd13003d3928] ip6_route_input at ffffffff8ddb0f47
13 [ffffbd13003d3a18] ip6_rcv_finish_core.constprop.0 at ffffffff8dd950d0
14 [ffffbd13003d3a30] ip6_list_rcv_finish.constprop.0 at ffffffff8dd96274
15 [ffffbd13003d3a98] ip6_sublist_rcv at ffffffff8dd96474
16 [ffffbd13003d3af8] ipv6_list_rcv at ffffffff8dd96615
17 [ffffbd13003d3b60] __netif_receive_skb_list_core at ffffffff8dc16fec
18 [ffffbd13003d3be0] netif_receive_skb_list_internal at ffffffff8dc176b3
19 [ffffbd13003d3c50] napi_gro_receive at ffffffff8dc565b9
20 [ffffbd13003d3c80] ice_receive_skb at ffffffffc087e4f5 [ice]
21 [ffffbd13003d3c90] ice_clean_rx_irq at ffffffffc0881b80 [ice]
22 [ffffbd13003d3d20] ice_napi_poll at ffffffffc088232f [ice]
23 [ffffbd13003d3d80] __napi_poll at ffffffff8dc18000
24 [ffffbd13003d3db8] net_rx_action at ffffffff8dc18581
25 [ffffbd13003d3e40] __do_softirq at ffffffff8df352e9
26 [ffffbd13003d3eb0] run_ksoftirqd at ffffffff8ceffe47
27 [ffffbd13003d3ec0] smpboot_thread_fn at ffffffff8cf36a30
28 [ffffbd13003d3ee8] kthread at ffffffff8cf2b39f
29 [ffffbd13003d3f28] ret_from_fork at ffffffff8ce5fa64
30 [ffffbd13003d3f50] ret_from_fork_asm at ffffffff8ce03cbb

Fixes: 66f5d6ce53 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Reported-by: Adrian Oliver <kernel@aoliver.ca>
Signed-off-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@menlosecurity.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241106010236.1239299-1-omid.ehtemamhaghighi@menlosecurity.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11 15:26:10 -08:00
..
ila net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
netfilter netfilter pull request 24-11-07 2024-11-07 12:46:04 +01:00
addrconf_core.c ipv6: Ensure natural alignment of const ipv6 loopback and router addresses 2024-01-30 12:43:18 +01:00
addrconf.c net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
addrlabel.c ipv6: Use rtnl_register_many(). 2024-10-15 18:52:26 -07:00
af_inet6.c net: inet6: do not leave a dangling sk pointer in inet6_create() 2024-10-15 18:43:08 -07:00
ah6.c net: fill in MODULE_DESCRIPTION()s for ipv6 modules 2024-02-09 14:12:01 -08:00
anycast.c ipv6: switch inet6_acaddr_hash() to less predictable hash 2024-10-09 19:33:57 -07:00
calipso.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
datagram.c ipv6: annotate data-races around np->ucast_oif 2023-12-11 10:59:17 +00:00
esp6_offload.c xfrm: Log input direction mismatch error in one place 2024-06-17 13:53:19 +02:00
esp6.c net: support non paged skb frags 2024-09-11 20:44:31 -07:00
exthdrs_core.c ipv6: Fix out-of-bounds access in ipv6_find_tlv() 2023-05-24 08:43:39 +01:00
exthdrs_offload.c net: gso: add HBH extension header offload support 2024-01-05 08:11:49 -08:00
exthdrs.c net: ipv6: exthdrs: get rid of ipv6_skb_net() 2024-03-11 15:15:08 -07:00
fib6_notifier.c net: do not acquire rtnl in fib_seq_sum() 2024-10-11 15:35:05 -07:00
fib6_rules.c ipv6: use READ_ONCE()/WRITE_ONCE() on fib6_table->fib_seq 2024-10-11 15:35:05 -07:00
fou6.c net: Add MODULE_DESCRIPTION entries to network modules 2020-06-20 21:33:57 -07:00
icmp.c icmp: move icmp_global.credit and icmp_global.stamp to per netns storage 2024-08-30 11:14:06 -07:00
inet6_connection_sock.c net: implement lockless SO_PRIORITY 2023-10-01 19:09:54 +01:00
inet6_hashtables.c inet6: constify 'struct net' parameter of various lookup helpers 2024-08-05 16:27:26 -07:00
ioam6_iptunnel.c net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
ioam6.c net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
ip6_checksum.c net: udp: fix handling of CHECKSUM_COMPLETE packets 2018-10-24 14:18:16 -07:00
ip6_fib.c ipv6: Fix soft lockups in fib6_select_path under high next hop churn 2024-11-11 15:26:10 -08:00
ip6_flowlabel.c ipv6: move np->repflow to atomic flags 2023-09-15 10:33:48 +01:00
ip6_gre.c netdev_features: convert NETIF_F_NETNS_LOCAL to dev->netns_local 2024-09-03 11:36:43 +02:00
ip6_icmp.c net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending 2021-02-23 11:29:52 -08:00
ip6_input.c net/ipv6: make use of the helper macro LIST_HEAD() 2024-09-06 18:10:21 -07:00
ip6_offload.c net: gro: initialize network_offset in network layer 2024-05-27 16:46:59 -07:00
ip6_offload.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip6_output.c ipv6: Remove redundant unlikely() 2024-10-09 19:40:46 -07:00
ip6_tunnel.c ipv4: Convert ip_route_input() to dscp_t. 2024-10-03 16:21:21 -07:00
ip6_udp_tunnel.c net: fill in MODULE_DESCRIPTION()s for ipv6 modules 2024-02-09 14:12:01 -08:00
ip6_vti.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00
ip6mr.c net: convert to nla_get_*_default() 2024-11-11 10:32:06 -08:00
ipcomp6.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipv6_sockglue.c ipv6: avoid indirect calls for SOL_IP socket options 2024-08-26 14:53:50 -07:00
Kconfig net: ipv6: select DST_CACHE from IPV6_RPL_LWTUNNEL 2024-09-22 19:52:07 +01:00
Makefile net/tcp: Introduce TCP_AO setsockopt()s 2023-10-27 10:35:44 +01:00
mcast_snoop.c net: bridge: mcast: fix broken length + header check for MRDv6 Adv. 2021-04-27 14:02:06 -07:00
mcast.c ipv6: mcast: use min() to simplify the code 2024-08-26 09:48:53 -07:00
mip6.c net: fill in MODULE_DESCRIPTION()s for ipv6 modules 2024-02-09 14:12:01 -08:00
ndisc.c net/ipv6: replace deprecated strcpy with strscpy 2024-08-29 12:33:07 -07:00
netfilter.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-13 13:13:46 -07:00
output_core.c ipv6: annotate data-races around cnf.hop_limit 2024-03-01 08:42:31 +00:00
ping.c ipv6: introduce dst_rt6_info() helper 2024-04-29 13:32:01 +01:00
proc.c minmax: add a few more MIN_T/MAX_T users 2024-07-28 13:41:14 -07:00
protocol.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
raw.c net_tstamp: add SCM_TS_OPT_ID for RAW sockets 2024-10-04 11:52:19 -07:00
reassembly.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
route.c ipv6: Fix soft lockups in fib6_select_path under high next hop churn 2024-11-11 15:26:10 -08:00
rpl_iptunnel.c net: ipv6: rpl_iptunnel: Fix memory leak in rpl_input 2024-09-13 19:55:49 -07:00
rpl.c ipv6: rpl: Remove pskb(_may)?_pull() in ipv6_rpl_srh_rcv(). 2023-06-19 11:32:58 -07:00
seg6_hmac.c ipv6: sr: fix memleak in seg6_hmac_init_algo 2024-05-21 13:16:25 +02:00
seg6_iptunnel.c ipv6: sr: block BH in seg6_output_core() and seg6_input_core() 2024-06-03 18:50:08 -07:00
seg6_local.c seg6: Use nested-BH locking for seg6_bpf_srh_states. 2024-06-24 16:41:23 -07:00
seg6.c ipv6: sr: restruct ifdefines 2024-05-30 18:29:38 -07:00
sit.c ipv6: sit: Unmask upper DSCP bits in ipip6_tunnel_bind_dev() 2024-09-04 16:57:11 -07:00
syncookies.c tcp: use sk_skb_reason_drop to free rx packets 2024-06-19 12:44:22 +01:00
sysctl_net_ipv6.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
tcp_ao.c net/tcp: Wire up l3index to TCP-AO 2023-10-27 10:35:46 +01:00
tcp_ipv6.c net/tcp: Add missing lockdep annotations for TCP-AO hlist traversals 2024-11-03 12:10:11 -08:00
tcpv6_offload.c net: gso: fix tcp fraglist segmentation after pull from frag_list 2024-10-02 17:21:47 -07:00
tunnel6.c net: fill in MODULE_DESCRIPTION()s for ipv6 modules 2024-02-09 14:12:01 -08:00
udp_impl.h tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). 2022-10-12 17:50:37 -07:00
udp_offload.c net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb 2024-05-02 11:02:48 +02:00
udp.c udp: Compute L4 checksum as usual when not segmenting the skb 2024-10-15 18:12:33 -07:00
udplite.c udplite: remove UDPLITE_BIT 2023-09-14 16:16:36 +02:00
xfrm6_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-05-09 10:01:01 -07:00
xfrm6_output.c ipv6: drop feature RTAX_FEATURE_ALLFRAG 2023-10-25 18:04:29 -07:00
xfrm6_policy.c xfrm: respect ip protocols rules criteria when performing dst lookups 2024-09-23 07:02:07 +02:00
xfrm6_protocol.c xfrm: add support for UDPv6 encapsulation of ESP 2020-04-28 11:28:36 +02:00
xfrm6_state.c xfrm: remove output_finish indirection from xfrm_state_afinfo 2020-05-06 09:40:08 +02:00
xfrm6_tunnel.c ipsec-next-2024-03-06 2024-03-08 10:56:05 +00:00