linux-next/kernel/sched
Nicholas Piggin 9664c5b908 lazy tlb: fix hotplug exit race with MMU_LAZY_TLB_SHOOTDOWN
CPU unplug first calls __cpu_disable(), and that's where powerpc calls
cleanup_cpu_mmu_context(), which clears this CPU from mm_cpumask() of all
mms in the system.

However this CPU may still be using a lazy tlb mm, and its mm_cpumask bit
will be cleared from it.  The CPU does not switch away from the lazy tlb
mm until arch_cpu_idle_dead() calls idle_task_exit().

If that user mm exits in this window, it will not be subject to the lazy
tlb mm shootdown and may be freed while in use as a lazy mm by the CPU
that is being unplugged.

cleanup_cpu_mmu_context() could be moved later, but it looks better to
move the lazy tlb mm switching earlier.  The problem with doing the lazy
mm switching in idle_task_exit() is explained in commit bf2c59fce4
("sched/core: Fix illegal RCU from offline CPUs"), which added a wart to
switch away from the mm but leave it set in active_mm to be cleaned up
later.

So instead, switch away from the lazy tlb mm at sched_cpu_wait_empty(),
which is the last hotplug state before teardown
(CPUHP_AP_SCHED_WAIT_EMPTY).  This CPU will never switch to a user thread
from this point, so it has no chance to pick up a new lazy tlb mm.  This
removes the lazy tlb mm handling wart in CPU unplug.

With this, idle_task_exit() is not needed anymore and can be cleaned up. 
This leaves the prototype alone, to be cleaned after this change.

herton: took the suggestions from https://lore.kernel.org/all/87jzvyprsw.ffs@tglx/
and made adjustments on the initial patch proposed by Nicholas.

Link: https://lkml.kernel.org/r/20230524060455.147699-1-npiggin@gmail.com
Link: https://lore.kernel.org/all/20230525205253.E2FAEC433EF@smtp.kernel.org/
Link: https://lkml.kernel.org/r/20241104142318.3295663-1-herton@redhat.com
Fixes: 2655421ae6 ("lazy tlb: shoot lazies, non-refcounting lazy tlb mm reference handling scheme")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:50:37 -08:00
..
autogroup.c scheduler: Remove the now superfluous sentinel elements from ctl_table array 2024-04-24 09:43:54 +02:00
autogroup.h sched/headers: Add header guard to kernel/sched/stats.h and kernel/sched/autogroup.h 2022-02-23 08:22:00 +01:00
build_policy.c sched_ext: Disallow loading BPF scheduler if isolcpus= domain isolation is in effect 2024-07-08 09:30:13 -10:00
build_utility.c sched/headers: Remove duplicate header inclusions 2023-10-03 21:27:55 +02:00
clock.c sched: Fix spelling in comments 2024-05-27 17:00:21 +02:00
completion.c sched: add a few helpers to wake up tasks on the current cpu 2023-07-17 16:08:08 -07:00
core_sched.c sched: Fix spelling in comments 2024-05-27 17:00:21 +02:00
core.c lazy tlb: fix hotplug exit race with MMU_LAZY_TLB_SHOOTDOWN 2024-12-18 19:50:37 -08:00
cpuacct.c Merge branch 'sched/fast-headers' into sched/core 2022-03-15 09:05:05 +01:00
cpudeadline.c sched/topology: Consolidate and clean up access to a CPU's max compute capacity 2023-10-09 12:59:48 +02:00
cpudeadline.h
cpufreq_schedutil.c sched/cpufreq: Ensure sd is rebuilt for EAS check 2024-11-12 21:36:51 +01:00
cpufreq.c sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there 2022-02-23 10:58:33 +01:00
cpupri.c sched/rt: Fix live lock between select_fallback_rq() and RT push 2023-09-28 22:58:13 +02:00
cpupri.h sched/cpupri: Add CPUPRI_HIGHER 2020-10-29 11:00:30 +01:00
cputime.c sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime 2024-07-29 12:22:32 +02:00
deadline.c sched/dlserver: Fix dlserver double enqueue 2024-12-13 12:57:34 +01:00
debug.c sched/eevdf: More PELT vs DELAYED_DEQUEUE 2024-12-09 11:48:09 +01:00
ext.c sched_ext: Change for v6.13 2024-11-20 10:08:00 -08:00
ext.h sched: Pass correct scheduling policy to __setscheduler_class 2024-10-29 13:57:51 +01:00
fair.c - Prevent incorrect dequeueing of the deadline dlserver helper task and fix 2024-12-15 09:38:03 -08:00
features.h sched/fair: fix the comment for PREEMPT_SHORT 2024-10-07 09:28:41 +02:00
idle.c A rather large update for timekeeping and timers: 2024-11-19 16:35:06 -08:00
isolation.c sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU 2024-04-28 10:08:21 +02:00
loadavg.c sched: Fix spelling in comments 2024-05-27 17:00:21 +02:00
Makefile sched/headers: Introduce kernel/sched/build_policy.c and build multiple .c files there 2022-02-23 10:58:33 +01:00
membarrier.c RISC-V Patches for the 6.9 Merge Window 2024-03-22 10:41:13 -07:00
pelt.c sched/eevdf: More PELT vs DELAYED_DEQUEUE 2024-12-09 11:48:09 +01:00
pelt.h sched: Move update_other_load_avgs() to kernel/sched/pelt.c 2024-09-11 20:00:21 -10:00
psi.c sched: psi: fix bogus pressure spikes from aggregation race 2024-10-03 16:03:16 -07:00
rt.c sched: Split scheduler and execution contexts 2024-10-14 12:52:42 +02:00
sched-pelt.h sched/fair: Fix "runnable_avg_yN_inv" not used warnings 2019-06-17 12:15:58 +02:00
sched.h sched/dlserver: Fix dlserver double enqueue 2024-12-13 12:57:34 +01:00
smp.h sched, smp: Trace smp callback causing an IPI 2023-03-24 11:01:29 +01:00
stats.c profiling: remove profile=sleep support 2024-08-04 13:36:28 -07:00
stats.h sched: psi: pass enqueue/dequeue flags to psi callbacks directly 2024-10-26 09:28:38 +02:00
stop_task.c sched: Add put_prev_task(.next) 2024-09-03 15:26:32 +02:00
swait.c sched: add a few helpers to wake up tasks on the current cpu 2023-07-17 16:08:08 -07:00
syscalls.c sched: fix warning in sched_setaffinity 2024-12-02 12:01:27 +01:00
topology.c sched/fair: Fair server interface 2024-07-29 12:22:36 +02:00
wait_bit.c sched/wait: Remove unused bit_wait_io_timeout 2024-10-07 09:28:41 +02:00
wait.c sched: remove wait bookmarks 2023-10-18 14:34:18 -07:00