Linux kernel source tree
Go to file
Jiri Wiesner 3301ab7d5a net/ipv6: release expired exception dst cached in socket
Dst objects get leaked in ip6_negative_advice() when this function is
executed for an expired IPv6 route located in the exception table. There
are several conditions that must be fulfilled for the leak to occur:
* an ICMPv6 packet indicating a change of the MTU for the path is received,
  resulting in an exception dst being created
* a TCP connection that uses the exception dst for routing packets must
  start timing out so that TCP begins retransmissions
* after the exception dst expires, the FIB6 garbage collector must not run
  before TCP executes ip6_negative_advice() for the expired exception dst

When TCP executes ip6_negative_advice() for an exception dst that has
expired and if no other socket holds a reference to the exception dst, the
refcount of the exception dst is 2, which corresponds to the increment
made by dst_init() and the increment made by the TCP socket for which the
connection is timing out. The refcount made by the socket is never
released. The refcount of the dst is decremented in sk_dst_reset() but
that decrement is counteracted by a dst_hold() intentionally placed just
before the sk_dst_reset() in ip6_negative_advice(). After
ip6_negative_advice() has finished, there is no other object tied to the
dst. The socket lost its reference stored in sk_dst_cache and the dst is
no longer in the exception table. The exception dst becomes a leaked
object.

As a result of this dst leak, an unbalanced refcount is reported for the
loopback device of a net namespace being destroyed under kernels that do
not contain e5f80fcf86 ("ipv6: give an IPv6 dev to blackhole_netdev"):
unregister_netdevice: waiting for lo to become free. Usage count = 2

Fix the dst leak by removing the dst_hold() in ip6_negative_advice(). The
patch that introduced the dst_hold() in ip6_negative_advice() was
92f1655aa2 ("net: fix __dst_negative_advice() race"). But 92f1655aa2
merely refactored the code with regards to the dst refcount so the issue
was present even before 92f1655aa2. The bug was introduced in
54c1a859ef ("ipv6: Don't drop cache route entry unless timer actually
expired.") where the expired cached route is deleted and the sk_dst_cache
member of the socket is set to NULL by calling dst_negative_advice() but
the refcount belonging to the socket is left unbalanced.

The IPv4 version - ipv4_negative_advice() - is not affected by this bug.
When the TCP connection times out ipv4_negative_advice() merely resets the
sk_dst_cache of the socket while decrementing the refcount of the
exception dst.

Fixes: 92f1655aa2 ("net: fix __dst_negative_advice() race")
Fixes: 54c1a859ef ("ipv6: Don't drop cache route entry unless timer actually expired.")
Link: https://lore.kernel.org/netdev/20241113105611.GA6723@incl/T/#u
Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241128085950.GA4505@incl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-02 19:24:54 -08:00
arch dmaengine updates for v6.13 2024-11-27 13:25:47 -08:00
block for-6.13/block-20241118 2024-11-18 16:50:08 -08:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto Random number generator updates for Linux 6.13-rc1. 2024-11-19 10:43:44 -08:00
Documentation docs: net: bareudp: fix spelling and grammar mistakes 2024-11-30 13:54:28 -08:00
drivers net: phy: microchip: Reset LAN88xx PHY to ensure clean link state on LAN7800/7850 2024-12-02 18:56:41 -08:00
fs Changes for 6.13-rc1 2024-11-28 09:22:00 -08:00
include tcp: populate XPS related fields of timewait sockets 2024-11-30 13:00:52 -08:00
init - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
io_uring A rather large update for timekeeping and timers: 2024-11-19 16:35:06 -08:00
ipc - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
kernel Modules changes for v6.13-rc1 2024-11-27 10:20:50 -08:00
lib Modules changes for v6.13-rc1 2024-11-27 10:20:50 -08:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
mm memblock: updates for 6.13-rc1 2024-11-27 11:13:25 -08:00
net net/ipv6: release expired exception dst cached in socket 2024-12-02 19:24:54 -08:00
rust rust: fix up formatting after merge 2024-11-26 17:54:58 -08:00
samples Rust changes for v6.13 2024-11-26 14:00:26 -08:00
scripts Modules changes for v6.13-rc1 2024-11-27 10:20:50 -08:00
security selinux: use sk_to_full_sk() in selinux_ip_output() 2024-11-30 13:24:33 -08:00
sound soundwire updates for 6.13 2024-11-27 13:38:09 -08:00
tools selftests: drv-net: rss_ctx: Add test for ntuple rule 2024-11-30 14:16:12 -08:00
usr initramfs: shorten cmd_initfs in usr/Makefile 2024-07-16 01:07:52 +09:00
virt VFIO updates for v6.13 2024-11-27 12:57:03 -08:00
.clang-format clang-format: Update with v6.11-rc1's for_each macro list 2024-08-02 13:20:31 +02:00
.clippy.toml rust: enable Clippy's check-private-items 2024-10-07 21:39:57 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore rust: introduce .clippy.toml 2024-10-07 21:39:05 +02:00
.mailmap media updates for v6.13-rc1 2024-11-20 14:01:15 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS cgroup: Changes for v6.13 2024-11-20 09:54:49 -08:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: list PTP drivers under networking 2024-12-02 15:42:40 -08:00
Makefile Modules changes for v6.13-rc1 2024-11-27 10:20:50 -08:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.