linux-stable/kernel
Andrii Nakryiko 3f7f1a64da uprobes: revamp uprobe refcounting and lifetime management
Revamp how struct uprobe is refcounted, and thus how its lifetime is
managed.

Right now, there are a few possible "owners" of uprobe refcount:
  - uprobes_tree RB tree assumes one refcount when uprobe is registered
    and added to the lookup tree;
  - while uprobe is triggered and kernel is handling it in the breakpoint
    handler code, temporary refcount bump is done to keep uprobe from
    being freed;
  - if we have uretprobe requested on a given struct uprobe instance, we
    take another refcount to keep uprobe alive until user space code
    returns from the function and triggers return handler.

The uprobe_tree's extra refcount of 1 is confusing and problematic. No
matter how many actual consumers are attached, they all share the same
refcount, and we have an extra logic to drop the "last" (which might not
really be last) refcount once uprobe's consumer list becomes empty.

This is unconventional and has to be kept in mind as a special case all
the time. Further, because of this design we have the situations where
find_uprobe() will find uprobe, bump refcount, return it to the caller,
but that uprobe will still need uprobe_is_active() check, after which
the caller is required to drop refcount and try again. This is just too
many details leaking to the higher level logic.

This patch changes refcounting scheme in such a way as to not have
uprobes_tree keeping extra refcount for struct uprobe. Instead, each
uprobe_consumer is assuming its own refcount, which will be dropped
when consumer is unregistered. Other than that, all the active users of
uprobe (entry and return uprobe handling code) keeps exactly the same
refcounting approach.

With the above setup, once uprobe's refcount drops to zero, we need to
make sure that uprobe's "destructor" removes uprobe from uprobes_tree,
of course. This, though, races with uprobe entry handling code in
handle_swbp(), which, through find_active_uprobe()->find_uprobe() lookup,
can race with uprobe being destroyed after refcount drops to zero (e.g.,
due to uprobe_consumer unregistering). So we add try_get_uprobe(), which
will attempt to bump refcount, unless it already is zero. Caller needs
to guarantee that uprobe instance won't be freed in parallel, which is
the case while we keep uprobes_treelock (for read or write, doesn't
matter).

Note also, we now don't leak the race between registration and
unregistration, so we remove the retry logic completely. If
find_uprobe() returns valid uprobe, it's guaranteed to remain in
uprobes_tree with properly incremented refcount. The race is handled
inside __insert_uprobe() and put_uprobe() working together:
__insert_uprobe() will remove uprobe from RB-tree, if it can't bump
refcount and will retry to insert the new uprobe instance. put_uprobe()
won't attempt to remove uprobe from RB-tree, if it's already not there.
All that is protected by uprobes_treelock, which keeps things simple.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20240903174603.3554182-2-andrii@kernel.org
2024-09-05 16:56:13 +02:00
..
bpf bpf: Fix a kernel verifier crash in stacksafe() 2024-08-12 18:09:48 -07:00
cgroup cgroup/cpuset: Eliminate unncessary sched domains rebuilds in hotplug 2024-08-05 10:54:25 -10:00
configs mm/slab: Plumb kmem_buckets into __do_kmalloc_node() 2024-07-03 12:24:19 +02:00
debug kdb: Get rid of redundant kdb_curr_task() 2024-06-21 15:49:29 +01:00
dma dma-debug: avoid deadlock between dma debug vs printk and netconsole 2024-08-06 10:29:32 -07:00
entry entry: Respect changes to system call number by trace_sys_enter() 2024-03-12 13:23:32 +01:00
events uprobes: revamp uprobe refcounting and lifetime management 2024-09-05 16:56:13 +02:00
futex printk: Change type of CONFIG_BASE_SMALL to bool 2024-05-06 17:39:09 +02:00
gcov gcov: add support for GCC 14 2024-06-15 10:43:06 -07:00
irq genirq/irqdesc: Honor caller provided affinity in alloc_desc() 2024-08-07 17:27:00 +02:00
kcsan kcsan: Add missing MODULE_DESCRIPTION() macro 2024-06-06 11:21:14 -07:00
livepatch livepatch: Replace snprintf() with sysfs_emit() 2024-07-02 16:56:18 +02:00
locking bcachefs fixes for 6.11-rc3 2024-08-08 13:27:31 -07:00
module module: make waiting for a concurrent module loader interruptible 2024-08-09 08:33:28 -07:00
power mm: remove the implementation of swap_free() and always use swap_free_nr() 2024-07-03 19:30:01 -07:00
printk printk/panic: Allow cpu backtraces to be written into ringbuffer during panic 2024-08-13 14:16:22 +02:00
rcu Merge branches 'doc.2024.06.06a', 'fixes.2024.07.04a', 'mb.2024.06.28a', 'nocb.2024.06.03a', 'rcu-tasks.2024.06.06a', 'rcutorture.2024.06.06a' and 'srcu.2024.06.18a' into HEAD 2024-07-04 13:54:17 -07:00
sched profiling: remove profile=sleep support 2024-08-04 13:36:28 -07:00
time timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex() 2024-08-05 16:14:14 +02:00
trace bpf: Fix use-after-free in bpf_uprobe_multi_link_attach() 2024-09-05 16:56:13 +02:00
.gitignore
acct.c kernel misc: Remove the now superfluous sentinel elements from ctl_table array 2024-04-24 09:43:53 +02:00
async.c async: Use a dedicated unbound workqueue with raised min_active 2024-02-09 11:13:59 -10:00
audit_fsnotify.c
audit_tree.c fsnotify: create a wrapper fsnotify_find_inode_mark() 2024-04-04 16:24:16 +02:00
audit_watch.c fsnotify: create a wrapper fsnotify_find_inode_mark() 2024-04-04 16:24:16 +02:00
audit.c audit: use KMEM_CACHE() instead of kmem_cache_create() 2024-01-25 10:12:22 -05:00
audit.h audit: correct audit_filter_inodes() definition 2023-07-21 12:17:25 -04:00
auditfilter.c ima: Avoid blocking in RCU read-side critical section 2024-06-13 14:26:50 -04:00
auditsc.c audit,io_uring: io_uring openat triggers audit reference count underflow 2023-10-13 18:34:46 +02:00
backtracetest.c backtracetest: add MODULE_DESCRIPTION() 2024-06-24 22:24:55 -07:00
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-04-29 08:29:29 -07:00
capability.c lsm: constify the 'target' parameter in security_capget() 2023-08-08 16:48:47 -04:00
cfi.c
compat.c
configs.c
context_tracking.c context_tracking: Make context_tracking_key __ro_after_init 2024-03-22 11:18:18 +01:00
cpu_pm.c
cpu.c cpu/SMT: Enable SMT only if a core is online 2024-08-13 10:31:24 +10:00
crash_core.c Mainly singleton patches, documented in their respective changelogs. 2024-05-19 14:02:03 -07:00
crash_reserve.c crash: fix riscv64 crash memory reserve dead loop 2024-08-15 22:16:16 -07:00
cred.c cred: Use KMEM_CACHE() instead of kmem_cache_create() 2024-02-23 17:33:31 -05:00
delayacct.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
dma.c
elfcorehdr.c crash: remove dependency of FA_DUMP on CRASH_DUMP 2024-02-23 17:48:22 -08:00
exec_domain.c
exit.c - 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN 2024-07-21 17:15:46 -07:00
exit.h exit: add internal include file with helpers 2023-09-21 12:03:50 -06:00
extable.c
fail_function.c
fork.c pidfd: prevent creation of pidfds for kthreads 2024-08-12 22:03:26 +02:00
freezer.c Linux 6.7-rc6 2023-12-23 15:52:13 +01:00
gen_kheaders.sh kheaders: use command -v to test for existence of cpio 2024-05-30 01:13:20 +09:00
groups.c groups: Convert group_info.usage to refcount_t 2023-09-29 11:28:39 -07:00
hung_task.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
iomem.c kernel/iomem.c: remove __weak ioremap_cache helper 2023-08-21 13:37:28 -07:00
irq_work.c
jump_label.c jump_label: Fix the fix, brown paper bags galore 2024-07-31 12:57:39 +02:00
kallsyms_internal.h kallsyms: get rid of code for absolute kallsyms 2024-07-20 16:33:21 +09:00
kallsyms_selftest.c kallsyms: Match symbols exactly with CONFIG_LTO_CLANG 2024-08-15 09:33:35 -07:00
kallsyms_selftest.h
kallsyms.c kallsyms: Match symbols exactly with CONFIG_LTO_CLANG 2024-08-15 09:33:35 -07:00
kcmp.c file: convert to SLAB_TYPESAFE_BY_RCU 2023-10-19 11:02:48 +02:00
Kconfig.freezer
Kconfig.hz
Kconfig.kexec crash: clean up kdump related config items 2024-02-23 17:48:22 -08:00
Kconfig.locks
Kconfig.preempt
kcov.c kcov: properly check for softirq context 2024-08-07 18:33:56 -07:00
kexec_core.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
kexec_elf.c
kexec_file.c crash: add a new kexec flag for hotplug support 2024-04-23 14:59:01 +10:00
kexec_internal.h crash: remove dependency of FA_DUMP on CRASH_DUMP 2024-02-23 17:48:22 -08:00
kexec.c crash: add a new kexec flag for hotplug support 2024-04-23 14:59:01 +10:00
kheaders.c
kprobes.c kprobes: Fix to check symbol prefixes correctly 2024-08-05 14:04:03 +09:00
ksyms_common.c
ksysfs.c profiling: remove prof_cpu_mask 2024-07-29 10:45:54 -07:00
kthread.c kunit: Handle test faults 2024-05-06 14:22:02 -06:00
latencytop.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
Makefile crash: split crash dumping code out from kexec_core.c 2024-02-23 17:48:22 -08:00
module_signature.c
notifier.c
nsproxy.c pidfd: add pidfs 2024-03-01 12:23:37 +01:00
numa.c kernel/numa.c: Move logging out of numa.h 2023-12-20 19:26:30 -05:00
padata.c padata: Fix possible divide-by-0 panic in padata_mt_helper() 2024-08-07 18:33:56 -07:00
panic.c printk/panic: Allow cpu backtraces to be written into ringbuffer during panic 2024-08-13 14:16:22 +02:00
params.c params: Fix multi-line comment style 2023-12-01 09:51:44 -08:00
pid_namespace.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
pid_sysctl.h sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
pid.c pidfs: remove config option 2024-03-13 12:53:53 -07:00
profile.c profiling: remove profile=sleep support 2024-08-04 13:36:28 -07:00
ptrace.c ptrace_attach: shift send(SIGSTOP) into ptrace_set_stopped() 2024-02-22 15:38:52 -08:00
range.c
reboot.c kernel misc: Remove the now superfluous sentinel elements from ctl_table array 2024-04-24 09:43:53 +02:00
regset.c regset: use kvzalloc() for regset_get_alloc() 2024-04-25 21:07:03 -07:00
relay.c kernel: relay: remove relay_file_splice_read dead code, doesn't work 2023-12-29 12:22:27 -08:00
resource_kunit.c resource: add missing MODULE_DESCRIPTION() 2024-06-28 19:36:30 -07:00
resource.c resource: Export find_resource_space() 2024-05-28 11:14:14 -05:00
rseq.c
scftorture.c scftorture: Make torture_type static 2024-05-30 15:31:51 -07:00
scs.c
seccomp.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
signal.c kernel: rerun task_work while freezing in get_signal() 2024-07-11 01:51:44 -06:00
smp.c smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu() 2024-07-10 22:40:39 +02:00
smpboot.c kthread: add kthread_stop_put 2023-10-04 10:41:57 -07:00
smpboot.h
softirq.c softirq: Fix suspicious RCU usage in __do_softirq() 2024-04-29 05:03:51 +02:00
stackleak.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
stacktrace.c stacktrace: fix kernel-doc typo 2023-12-29 12:22:29 -08:00
static_call_inline.c
static_call.c
stop_machine.c
sys_ni.c Probes updates for v6.11: 2024-07-18 12:19:20 -07:00
sys.c RISC-V Patches for the 6.10 Merge Window, Part 1 2024-05-22 09:56:00 -07:00
sysctl-test.c sysctl: Add module description to sysctl-testing 2024-06-03 15:20:37 +02:00
sysctl.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
task_work.c task_work: make TWA_NMI_CURRENT handling conditional on IRQ_WORK 2024-07-29 12:05:06 -07:00
taskstats.c taskstats: fill_stats_for_tgid: use for_each_thread() 2023-10-04 10:41:57 -07:00
torture.c torture: Add MODULE_DESCRIPTION() 2024-05-30 15:31:38 -07:00
tracepoint.c
tsacct.c tsacct: replace strncpy() with strscpy() 2024-07-12 16:39:53 -07:00
ucount.c sysctl changes for v6.10-rc1 2024-05-17 17:31:24 -07:00
uid16.c
uid16.h
umh.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
up.c smp: Change function signatures to use call_single_data_t 2023-09-13 14:59:24 +02:00
user_namespace.c user_namespace: remove unnecessary NULL values from kbuf 2024-02-22 15:38:52 -08:00
user-return-notifier.c
user.c printk: Change type of CONFIG_BASE_SMALL to bool 2024-05-06 17:39:09 +02:00
usermode_driver.c
utsname_sysctl.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
utsname.c
vhost_task.c vhost_task: Handle SIGKILL by flushing work and exiting 2024-05-22 08:31:15 -04:00
vmcore_info.c kallsyms: get rid of code for absolute kallsyms 2024-07-20 16:33:21 +09:00
watch_queue.c watch_queue: fix kcalloc() arguments order 2023-12-21 13:17:54 +01:00
watchdog_buddy.c
watchdog_perf.c watchdog/perf: properly initialize the turbo mode timestamp and rearm counter 2024-07-17 21:11:34 -07:00
watchdog.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
workqueue_internal.h workqueue: Drop the special locking rule for worker->flags and worker_pool->flags 2023-08-07 15:57:22 -10:00
workqueue.c workqueue: Correct declaration of cpu_pwq in struct workqueue_struct 2024-08-05 18:34:02 -10:00