mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2024-12-29 17:23:36 +00:00
net/ipv6: release expired exception dst cached in socket
Dst objects get leaked in ip6_negative_advice() when this function is executed for an expired IPv6 route located in the exception table. There are several conditions that must be fulfilled for the leak to occur: * an ICMPv6 packet indicating a change of the MTU for the path is received, resulting in an exception dst being created * a TCP connection that uses the exception dst for routing packets must start timing out so that TCP begins retransmissions * after the exception dst expires, the FIB6 garbage collector must not run before TCP executes ip6_negative_advice() for the expired exception dst When TCP executes ip6_negative_advice() for an exception dst that has expired and if no other socket holds a reference to the exception dst, the refcount of the exception dst is 2, which corresponds to the increment made by dst_init() and the increment made by the TCP socket for which the connection is timing out. The refcount made by the socket is never released. The refcount of the dst is decremented in sk_dst_reset() but that decrement is counteracted by a dst_hold() intentionally placed just before the sk_dst_reset() in ip6_negative_advice(). After ip6_negative_advice() has finished, there is no other object tied to the dst. The socket lost its reference stored in sk_dst_cache and the dst is no longer in the exception table. The exception dst becomes a leaked object. As a result of this dst leak, an unbalanced refcount is reported for the loopback device of a net namespace being destroyed under kernels that do not containe5f80fcf86
("ipv6: give an IPv6 dev to blackhole_netdev"): unregister_netdevice: waiting for lo to become free. Usage count = 2 Fix the dst leak by removing the dst_hold() in ip6_negative_advice(). The patch that introduced the dst_hold() in ip6_negative_advice() was92f1655aa2
("net: fix __dst_negative_advice() race"). But92f1655aa2
merely refactored the code with regards to the dst refcount so the issue was present even before92f1655aa2
. The bug was introduced in54c1a859ef
("ipv6: Don't drop cache route entry unless timer actually expired.") where the expired cached route is deleted and the sk_dst_cache member of the socket is set to NULL by calling dst_negative_advice() but the refcount belonging to the socket is left unbalanced. The IPv4 version - ipv4_negative_advice() - is not affected by this bug. When the TCP connection times out ipv4_negative_advice() merely resets the sk_dst_cache of the socket while decrementing the refcount of the exception dst. Fixes:92f1655aa2
("net: fix __dst_negative_advice() race") Fixes:54c1a859ef
("ipv6: Don't drop cache route entry unless timer actually expired.") Link: https://lore.kernel.org/netdev/20241113105611.GA6723@incl/T/#u Signed-off-by: Jiri Wiesner <jwiesner@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241128085950.GA4505@incl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit is contained in:
parent
ccb989e4d1
commit
3301ab7d5a
@ -2780,10 +2780,10 @@ static void ip6_negative_advice(struct sock *sk,
|
||||
if (rt->rt6i_flags & RTF_CACHE) {
|
||||
rcu_read_lock();
|
||||
if (rt6_check_expired(rt)) {
|
||||
/* counteract the dst_release() in sk_dst_reset() */
|
||||
dst_hold(dst);
|
||||
/* rt/dst can not be destroyed yet,
|
||||
* because of rcu_read_lock()
|
||||
*/
|
||||
sk_dst_reset(sk);
|
||||
|
||||
rt6_remove_exception_rt(rt);
|
||||
}
|
||||
rcu_read_unlock();
|
||||
|
Loading…
Reference in New Issue
Block a user