Commit Graph

46783 Commits

Author SHA1 Message Date
Stephen Rothwell
24896579b9 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git 2024-12-20 15:11:58 +11:00
Stephen Rothwell
5e76f6a874 Merge branch 'for-next/kspp' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git 2024-12-20 15:11:33 +11:00
Stephen Rothwell
e570a07187 Merge branch 'for-next/execve' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git 2024-12-20 15:11:26 +11:00
Stephen Rothwell
7b5a9f355d Merge branch 'slab/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git 2024-12-20 15:11:19 +11:00
Stephen Rothwell
e25b845fe9 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching 2024-12-20 14:48:21 +11:00
Stephen Rothwell
c7dd1920cd Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 2024-12-20 14:39:25 +11:00
Stephen Rothwell
f44b7127cb Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git 2024-12-20 14:16:41 +11:00
Stephen Rothwell
3feff3acfa Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git 2024-12-20 14:16:39 +11:00
Stephen Rothwell
2d119f6afa Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git
# Conflicts:
#	kernel/rcu/tree.c
2024-12-20 13:35:44 +11:00
Stephen Rothwell
ae884f86d7 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git 2024-12-20 13:32:54 +11:00
Stephen Rothwell
ea2356f802 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 2024-12-20 13:32:50 +11:00
Stephen Rothwell
70d508f98a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git 2024-12-20 13:26:08 +11:00
Stephen Rothwell
f5cf996f5c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm.git 2024-12-20 13:16:33 +11:00
Stephen Rothwell
8eaf5c20fb Merge branch 'for-next' of git://git.kernel.dk/linux-block.git 2024-12-20 13:10:04 +11:00
Stephen Rothwell
bfb2b1bbdb Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux.git 2024-12-20 12:26:56 +11:00
Stephen Rothwell
49e3ee1413 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 2024-12-20 11:48:40 +11:00
Stephen Rothwell
0b744f9e35 Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git 2024-12-20 11:48:38 +11:00
Stephen Rothwell
1dae1421fa Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 2024-12-20 11:29:38 +11:00
Stephen Rothwell
23b66e8e8b Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git 2024-12-20 11:07:32 +11:00
Stephen Rothwell
07ccd1271c Merge branch 'fs-next' of linux-next 2024-12-20 10:45:33 +11:00
Stephen Rothwell
b86e29c311 Merge branch 'mm-everything' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm 2024-12-20 10:23:48 +11:00
Stephen Rothwell
6488329e36 Merge branch 'tip/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 2024-12-20 09:42:15 +11:00
Stephen Rothwell
bf034d3155 Merge branch 'ring-buffer/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git 2024-12-20 09:42:14 +11:00
Stephen Rothwell
412ef23451 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 2024-12-20 09:41:34 +11:00
Stephen Rothwell
7743f57150 Merge branch 'mm-hotfixes-unstable' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm 2024-12-20 09:41:31 +11:00
Stephen Rothwell
cd07c43f9b Merge branch 'vfs.all' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git 2024-12-20 09:19:26 +11:00
Stephen Rothwell
8dad5129f0 Merge branch 'for_next' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git 2024-12-20 09:19:18 +11:00
Jakub Kicinski
07e5c4eb94 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.13-rc4).

No conflicts.

Adjacent changes:

drivers/net/ethernet/renesas/rswitch.h
  32fd46f5b6 ("net: renesas: rswitch: remove speed from gwca structure")
  922b4b955a ("net: renesas: rswitch: rework ts tags management")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-19 11:35:07 -08:00
Ingo Molnar
c779bc69c8 Merge branch into tip/master: 'sched/core'
# New commits in sched/core:
    af98d8a36a ("sched/fair: Fix CPU bandwidth limit bypass during CPU hotplug")
    7675361ff9 ("sched: deadline: Cleanup goto label in pick_earliest_pushable_dl_task")
    7d5265ffcd ("rseq: Validate read-only fields under DEBUG_RSEQ config")
    2a77e4be12 ("sched/fair: Untangle NEXT_BUDDY and pick_next_task()")
    95d9fed3a2 ("sched/fair: Mark m*_vruntime() with __maybe_unused")
    0429489e09 ("sched/fair: Fix variable declaration position")
    61b82dfb6b ("sched/fair: Do not try to migrate delayed dequeue task")
    736c55a02c ("sched/fair: Rename cfs_rq.nr_running into nr_queued")
    43eef7c3a4 ("sched/fair: Remove unused cfs_rq.idle_nr_running")
    31898e7b87 ("sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle")
    9216582b0b ("sched/fair: Removed unsued cfs_rq.h_nr_delayed")
    1a49104496 ("sched/fair: Use the new cfs_rq.h_nr_runnable")
    c2a295bffe ("sched/fair: Add new cfs_rq.h_nr_runnable")
    7b8a702d94 ("sched/fair: Rename h_nr_running into h_nr_queued")
    c907cd44a1 ("sched: Unify HK_TYPE_{TIMER|TICK|MISC} to HK_TYPE_KERNEL_NOISE")
    6010d245dd ("sched/isolation: Consolidate housekeeping cpumasks that are always identical")
    1174b9344b ("sched/isolation: Make "isolcpus=nohz" equivalent to "nohz_full"")
    ae5c677729 ("sched/core: Remove HK_TYPE_SCHED")
    a76328d44c ("sched/fair: Remove CONFIG_CFS_BANDWIDTH=n definition of cfs_bandwidth_used()")
    3a181f20fb ("sched/deadline: Consolidate Timer Cancellation")
    53916d5fd3 ("sched/deadline: Check bandwidth overflow earlier for hotplug")
    d4742f6ed7 ("sched/deadline: Correctly account for allocated bandwidth during hotplug")
    41d4200b71 ("sched/deadline: Restore dl_server bandwidth on non-destructive root domain changes")
    59297e2093 ("sched: add READ_ONCE to task_on_rq_queued")
    108ad09990 ("sched: Don't try to catch up excess steal time.")

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-19 20:24:25 +01:00
Ingo Molnar
08eccca432 Merge branch into tip/master: 'perf/core'
# New commits in perf/core:
    02c56362a7 ("uprobes: Guard against kmemdup() failing in dup_return_instance()")
    d29e744c71 ("perf/x86: Relax privilege filter restriction on AMD IBS")
    6057b90ecc ("perf/core: Export perf_exclude_event()")
    8622e45b5d ("uprobes: Reuse return_instances between multiple uretprobes within task")
    0cf981de76 ("uprobes: Ensure return_instance is detached from the list before freeing")
    636666a1c7 ("uprobes: Decouple return_instance list traversal and freeing")
    2ff913ab3f ("uprobes: Simplify session consumer tracking")
    e0925f2dc4 ("uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution")
    83e3dc9a5d ("uprobes: simplify find_active_uprobe_rcu() VMA checks")
    03a001b156 ("mm: introduce mmap_lock_speculate_{try_begin|retry}")
    eb449bd969 ("mm: convert mm_lock_seq to a proper seqcount")
    7528585290 ("mm/gup: Use raw_seqcount_try_begin()")
    96450ead16 ("seqlock: add raw_seqcount_try_begin")
    b4943b8bfc ("perf/x86/rapl: Add core energy counter support for AMD CPUs")
    54d2759778 ("perf/x86/rapl: Move the cntr_mask to rapl_pmus struct")
    bdc57ec705 ("perf/x86/rapl: Remove the global variable rapl_msrs")
    abf03d9bd2 ("perf/x86/rapl: Modify the generic variable names to *_pkg*")
    eeca4c6b25 ("perf/x86/rapl: Add arguments to the init and cleanup functions")
    cd29d83a6d ("perf/x86/rapl: Make rapl_model struct global")
    8bf1c86e5a ("perf/x86/rapl: Rename rapl_pmu variables")
    1d5e2f637a ("perf/x86/rapl: Remove the cpu_to_rapl_pmu() function")
    e4b4443477 ("x86/topology: Introduce topology_logical_core_id()")
    2f2db34707 ("perf/x86/rapl: Remove the unused get_rapl_pmu_cpumask() function")
    ae55e308bd ("perf/x86/intel/ds: Simplify the PEBS records processing for adaptive PEBS")
    3c00ed344c ("perf/x86/intel/ds: Factor out functions for PEBS records processing")
    7087bfb0ad ("perf/x86/intel/ds: Clarify adaptive PEBS processing")
    faac6f105e ("perf/core: Check sample_type in perf_sample_save_brstack")
    f226805bc5 ("perf/core: Check sample_type in perf_sample_save_callchain")
    b9c44b9147 ("perf/core: Save raw sample data conditionally based on sample type")

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-19 20:24:25 +01:00
Ingo Molnar
c46f39a3e7 Merge branch into tip/master: 'locking/core'
# New commits in locking/core:
    63a48181fb ("smp/scf: Evaluate local cond_func() before IPI side-effects")
    d387ceb171 ("locking/lockdep: Enforce PROVE_RAW_LOCK_NESTING only if ARCH_SUPPORTS_RT")

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-19 20:24:24 +01:00
Ingo Molnar
f391ba1ed5 Merge branch into tip/master: 'irq/core'
# New commits in irq/core:
    b4706d8149 ("genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown")
    bad6722e47 ("kexec: Consolidate machine_kexec_mask_interrupts() implementation")
    429f49ad36 ("genirq: Reuse irq_thread_fn() for forced thread case")
    6f8b79683d ("genirq: Move irq_thread_fn() further up in the code")

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-19 20:24:23 +01:00
Ingo Molnar
6371c819b1 Merge branch into tip/master: 'locking/urgent'
# New commits in locking/urgent:
    4a07791457 ("locking/rtmutex: Make sure we wake anything on the wake_q when we release the lock->wait_lock")

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-19 20:24:22 +01:00
Tvrtko Ursulin
de35994ecd workqueue: Do not warn when cancelling WQ_MEM_RECLAIM work from !WQ_MEM_RECLAIM worker
After commit
746ae46c11 ("drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM")
amdgpu started seeing the following warning:

 [ ] workqueue: WQ_MEM_RECLAIM sdma0:drm_sched_run_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM events:amdgpu_device_delay_enable_gfx_off [amdgpu]
...
 [ ] Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
...
 [ ] Call Trace:
 [ ]  <TASK>
...
 [ ]  ? check_flush_dependency+0xf5/0x110
...
 [ ]  cancel_delayed_work_sync+0x6e/0x80
 [ ]  amdgpu_gfx_off_ctrl+0xab/0x140 [amdgpu]
 [ ]  amdgpu_ring_alloc+0x40/0x50 [amdgpu]
 [ ]  amdgpu_ib_schedule+0xf4/0x810 [amdgpu]
 [ ]  ? drm_sched_run_job_work+0x22c/0x430 [gpu_sched]
 [ ]  amdgpu_job_run+0xaa/0x1f0 [amdgpu]
 [ ]  drm_sched_run_job_work+0x257/0x430 [gpu_sched]
 [ ]  process_one_work+0x217/0x720
...
 [ ]  </TASK>

The intent of the verification done in check_flush_dependency() is to ensure
forward progress during memory reclaim, by flagging cases when either a
memory reclaim process, or a memory reclaim work item is flushed from a
context not marked as memory reclaim safe.

This is correct when flushing, but when called from the
cancel(_delayed)_work_sync() paths it is a false positive because work is
either already running, or will not be running at all. Therefore
cancelling it is safe and we can relax the warning criteria by letting the
helper know of the calling context.
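
The relaxed criteria described above can be modelled in a few lines. This is a userspace Python sketch under stated assumptions, not the kernel code: the function name should_warn_flush_dependency and its three boolean parameters are hypothetical stand-ins for the workqueue flags and the calling context.

```python
def should_warn_flush_dependency(caller_is_reclaim: bool,
                                 target_is_reclaim: bool,
                                 from_cancel: bool) -> bool:
    """Return True if the flush-dependency warning should fire.

    caller_is_reclaim: the caller runs in a memory-reclaim context.
    target_is_reclaim: the target workqueue is marked WQ_MEM_RECLAIM.
    from_cancel: the call comes from a cancel(_delayed)_work_sync() path.
    (All three parameters are hypothetical simplifications.)
    """
    if from_cancel:
        # Cancelling is safe: the work is either already running or
        # will not run at all, so there is nothing to warn about.
        return False
    if target_is_reclaim:
        # Flushing a reclaim-safe workqueue is not flagged.
        return False
    # Only a reclaim context flushing a non-reclaim workqueue is flagged.
    return caller_is_reclaim
```

With this model, the amdgpu case from the trace (a WQ_MEM_RECLAIM worker cancelling work on a !WQ_MEM_RECLAIM workqueue) no longer warns, while a real flush from the same context still does.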

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: fca839c00a ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")
References: 746ae46c11 ("drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM")
Cc: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-12-19 06:15:35 -10:00
Rafael J. Wysocki
432f1f00f7 Merge branches 'pm-em', 'pm-sleep' and 'pm-cpufreq' into linux-next
* pm-em:
  PM: EM: Move sched domains rebuild function from schedutil to EM

* pm-sleep:
  PM: wakeup: implement devm_device_init_wakeup() helper

* pm-cpufreq:
  cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
  cpufreq: intel_pstate: Use CPUFREQ_POLICY_UNKNOWN
2024-12-19 12:36:59 +01:00
Andrew Morton
45f41efd96 foo 2024-12-18 19:51:48 -08:00
Yunhui Cui
04f910643d watchdog: output this_cpu when printing hard LOCKUP
When printing "Watchdog detected hard LOCKUP on cpu", also output the
detecting CPU.  It's more intuitive.

Link: https://lkml.kernel.org/r/20241210095238.63444-1-cuiyunhui@bytedance.com
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Bitao Hu <yaoma@linux.alibaba.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Liu Song <liusong@linux.alibaba.com>
Cc: Song Liu <song@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:51:38 -08:00
MengEn Sun
2b05eacc98 ucounts: move kfree() out of critical zone protected by ucounts_lock
Although kfree() does not sleep, it can occasionally enter a long chain
of calls, so it is better to move the kfree() in alloc_ucounts() out of
the critical section protected by ucounts_lock.
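
The general pattern (decide under the lock, free after dropping it) can be sketched as follows. This is a userspace Python model, not the kernel change itself: _table_lock, _table, release() and install() are hypothetical names standing in for ucounts_lock, the hash table, kfree() and the insertion step.

```python
import threading

_table_lock = threading.Lock()   # hypothetical stand-in for ucounts_lock
_table = {}                      # hypothetical stand-in for the hash table
released = []                    # records what the sketch has "freed"


def release(obj):
    """Stand-in for kfree(): may be slow, so it is called unlocked."""
    released.append(obj)


def install(key, candidate):
    """Install candidate under key unless an entry already exists."""
    loser = None
    with _table_lock:
        existing = _table.get(key)
        if existing is None:
            _table[key] = candidate
            result = candidate
        else:
            # Lost the race: only remember the spare object here.
            result = existing
            loser = candidate
    # The lock has been dropped; freeing no longer extends the
    # critical section.
    if loser is not None:
        release(loser)
    return result
```

The critical section now contains only the lookup and the insertion; the potentially long release path runs with the lock dropped.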

Link: https://lkml.kernel.org/r/1733458427-11794-1-git-send-email-mengensun@tencent.com
Signed-off-by: MengEn Sun <mengensun@tencent.com>
Reviewed-by: YueHong Wu <yuehongwu@tencent.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:51:32 -08:00
Yaxin Wang
e1698e9117 delayacct: update docs and fix some spelling errors
Update delay-accounting.rst to include the 'delay max' in the output of
getdelays, and fix some existing spelling errors.

Link: https://lkml.kernel.org/r/20241213192700771XKZ8H30OtHSeziGqRVMs0@zte.com.cn
Signed-off-by: Yaxin Wang <wang.yaxin@zte.com.cn>
Signed-off-by: Jiang Kun <jiang.kun2@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fan Yu <fan.yu9@zte.com.cn>
Cc: Peilin He <he.peilin@zte.com.cn>
Cc: tuqiang <tu.qiang35@zte.com.cn>
Cc: Wang Yong <wang.yong12@zte.com.cn>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:51:30 -08:00
Wang Yaxin
036e1b3af4 delayacct: add delay max to record delay peak
Introduce the use cases of delay max, which can help quickly detect
potential abnormal delays in the system and record the types and specific
details of delay spikes.

Problem
========
Delay accounting can track the average delay of processes to show
system workload. However, when a process experiences a significant
delay (a delay spike) that adversely affects performance, getdelays
can only display the average system delay over a period of time.
The average delay is unhelpful for diagnosing a delay peak: it is
not even possible to determine which type of delay has spiked,
as this information might be masked by the average delay.

Solution
=========
The 'delay max' displays the delay peak since system startup. It
records potential abnormal delays over time, including the type of
delay and the maximum delay. This is helpful for quickly identifying
crashes caused by delay.

Use case
=========
bash# ./getdelays -d -p 244
print delayacct stats ON
PID     244

CPU             count     real total  virtual total    delay total  delay average      delay max
                   68      192000000      213676651         705643          0.010ms     0.306381ms
IO              count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
SWAP            count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
RECLAIM         count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
THRASHING       count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
COMPACT         count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
WPCOPY          count    delay total  delay average      delay max
                  235       15648284          0.067ms     0.263842ms
IRQ             count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
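
In the output above, each average follows from delay total / count; for the CPU row, 705643 / 68 is roughly 10377 ns, which prints as 0.010ms. A minimal Python sketch of the bookkeeping, assuming delays are recorded in nanoseconds, shows why a peak disappears in the average but survives in the maximum. DelayStat and the sample values are hypothetical and do not mirror the kernel structures.

```python
from dataclasses import dataclass


@dataclass
class DelayStat:
    """Accumulates delays (in nanoseconds) for one delay type."""
    count: int = 0
    total_ns: int = 0
    max_ns: int = 0

    def record(self, delay_ns: int) -> None:
        self.count += 1
        self.total_ns += delay_ns
        # Keep the largest single delay seen so far.
        if delay_ns > self.max_ns:
            self.max_ns = delay_ns

    def average_ms(self) -> float:
        if self.count == 0:
            return 0.0
        return self.total_ns / self.count / 1_000_000


stat = DelayStat()
for delay_ns in (10_000, 12_000, 306_381, 9_000):  # made-up samples
    stat.record(delay_ns)

print(f"count={stat.count} average={stat.average_ms():.3f}ms "
      f"max={stat.max_ns / 1_000_000:.6f}ms")
```

A single 0.306381ms spike among otherwise small delays barely moves the average, whereas the maximum reports it directly.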

Link: https://lkml.kernel.org/r/20241203164848805CS62CQPQWG9GLdQj2_BxS@zte.com.cn
Co-developed-by: Wang Yong <wang.yong12@zte.com.cn>
Signed-off-by: Wang Yong <wang.yong12@zte.com.cn>
Co-developed-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fan Yu <fan.yu9@zte.com.cn>
Cc: Peilin He <he.peilin@zte.com.cn>
Cc: tuqiang <tu.qiang35@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:51:30 -08:00
Zijun Hu
bcaadbb2ee kernel/resource: simplify API __devm_release_region() implementation
Simplify the implementation of __devm_release_region() by using the
dedicated API devres_release(), which has the following advantages over
the current __release_region() + devres_destroy() sequence:

It is simpler when __devm_release_region() is undoing what
__devm_request_region() did; otherwise, it avoids a wrong and undesired
__release_region().

Link: https://lkml.kernel.org/r/20241017-release_region_fix-v1-1-84a3e8441284@quicinc.com
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:51:30 -08:00
Mateusz Guzik
7bd439f024 get_task_exe_file: check PF_KTHREAD locklessly
Same thing as 8ac5dc6659 ("get_task_mm: check PF_KTHREAD lockless")

Nowadays PF_KTHREAD is sticky and it was never protected by ->alloc_lock. 
Move the PF_KTHREAD check outside of task_lock() section to make this code
more understandable.

Link: https://lkml.kernel.org/r/20241119143526.704986-1-mjguzik@gmail.com
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:51:25 -08:00
Uros Bizjak
9c4ba50565 percpu: use TYPEOF_UNQUAL() in variable declarations
Use TYPEOF_UNQUAL() to declare variables as a corresponding
type without named address space qualifier to avoid
"`__seg_gs' specified for auto variable `var'" errors.

Link: https://lkml.kernel.org/r/20241208204708.3742696-4-ubizjak@gmail.com
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Acked-by: Nadav Amit <nadav.amit@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:51:04 -08:00
Suren Baghdasaryan
19fbad905e mm: convert mm_lock_seq to a proper seqcount
Convert mm_lock_seq to be seqcount_t and change all mmap_write_lock
variants to increment it, in line with the usual seqcount usage pattern.
This lets us check whether the mmap_lock is write-locked by checking
mm_lock_seq.sequence counter (odd=locked, even=unlocked). This will be
used when implementing mmap_lock speculation functions.
As a result vm_lock_seq is also changed to be unsigned to match the type
of mm_lock_seq.sequence.
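
The odd/even convention and the speculation it enables can be illustrated with a toy model. This is a single-threaded Python sketch that ignores atomics and memory ordering entirely; the class SeqCount and its method names are hypothetical and only loosely echo the kernel helpers.

```python
class SeqCount:
    """Toy sequence counter: odd means write-locked, even means unlocked."""

    def __init__(self) -> None:
        self.sequence = 0

    def write_begin(self) -> None:
        self.sequence += 1          # becomes odd: a writer is active

    def write_end(self) -> None:
        self.sequence += 1          # becomes even again: writer is done

    def is_write_locked(self) -> bool:
        return (self.sequence & 1) == 1

    def speculate_try_begin(self):
        """Return a snapshot to validate later, or None if locked."""
        seq = self.sequence
        return None if (seq & 1) else seq

    def speculate_retry(self, snapshot: int) -> bool:
        """Return True if a writer ran since the snapshot was taken."""
        return self.sequence != snapshot
```

A reader takes a snapshot, does its lockless work, and discards the result if speculate_retry() reports that the counter moved in the meantime.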

Link: https://lkml.kernel.org/r/20241122174416.1367052-2-surenb@google.com
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:50:50 -08:00
Nicholas Piggin
9664c5b908 lazy tlb: fix hotplug exit race with MMU_LAZY_TLB_SHOOTDOWN
CPU unplug first calls __cpu_disable(), and that's where powerpc calls
cleanup_cpu_mmu_context(), which clears this CPU from mm_cpumask() of all
mms in the system.

However this CPU may still be using a lazy tlb mm, and its mm_cpumask bit
will be cleared from it.  The CPU does not switch away from the lazy tlb
mm until arch_cpu_idle_dead() calls idle_task_exit().

If that user mm exits in this window, it will not be subject to the lazy
tlb mm shootdown and may be freed while in use as a lazy mm by the CPU
that is being unplugged.

cleanup_cpu_mmu_context() could be moved later, but it looks better to
move the lazy tlb mm switching earlier.  The problem with doing the lazy
mm switching in idle_task_exit() is explained in commit bf2c59fce4
("sched/core: Fix illegal RCU from offline CPUs"), which added a wart to
switch away from the mm but leave it set in active_mm to be cleaned up
later.

So instead, switch away from the lazy tlb mm at sched_cpu_wait_empty(),
which is the last hotplug state before teardown
(CPUHP_AP_SCHED_WAIT_EMPTY).  This CPU will never switch to a user thread
from this point, so it has no chance to pick up a new lazy tlb mm.  This
removes the lazy tlb mm handling wart in CPU unplug.

With this, idle_task_exit() is not needed anymore and can be cleaned up. 
This leaves the prototype alone, to be cleaned after this change.

herton: took the suggestions from https://lore.kernel.org/all/87jzvyprsw.ffs@tglx/
and made adjustments on the initial patch proposed by Nicholas.

Link: https://lkml.kernel.org/r/20230524060455.147699-1-npiggin@gmail.com
Link: https://lore.kernel.org/all/20230525205253.E2FAEC433EF@smtp.kernel.org/
Link: https://lkml.kernel.org/r/20241104142318.3295663-1-herton@redhat.com
Fixes: 2655421ae6 ("lazy tlb: shoot lazies, non-refcounting lazy tlb mm reference handling scheme")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:50:37 -08:00
Peter Zijlstra
bf8f464ee2 kasan: make kasan_record_aux_stack_noalloc() the default behaviour
kasan_record_aux_stack_noalloc() was introduced to record a stack trace
without allocating memory in the process.  It has been added to callers
which were invoked while a raw_spinlock_t was held.  More and more callers
were identified and changed over time.  Is it a good thing to have this
while functions try their best to do a lockless setup?  The only
downside of having kasan_record_aux_stack() not allocate any memory is
that we end up without a stacktrace if stackdepot runs out of memory and
at the same time the stacktrace was not recorded before.  To quote Marco
Elver from
https://lore.kernel.org/all/CANpmjNPmQYJ7pv1N3cuU8cP18u7PP_uoZD8YxwZd4jtbof9nVQ@mail.gmail.com/

| I'd be in favor, it simplifies things. And stack depot should be
| able to replenish its pool sufficiently in the "non-aux" cases
| i.e. regular allocations. Worst case we fail to record some
| aux stacks, but I think that's only really bad if there's a bug
| around one of these allocations. In general the probabilities
| of this being a regression are extremely small [...]

Make the kasan_record_aux_stack_noalloc() behaviour the default for
kasan_record_aux_stack().

[bigeasy@linutronix.de: dressed the diff as patch]
Link: https://lkml.kernel.org/r/20241122155451.Mb2pmeyJ@linutronix.de
Fixes: 7cb3007ce2 ("kasan: generic: introduce kasan_record_aux_stack_noalloc()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: syzbot+39f85d612b7c20d8db48@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67275485.050a0220.3c8d68.0a37.GAE@google.com
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: <kasan-dev@googlegroups.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: syzkaller-bugs@googlegroups.com
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:50:31 -08:00
Arnd Bergmann
71497ff8f3 kcov: mark in_softirq_really() as __always_inline
If gcc decides not to inline in_softirq_really(), objtool warns about a
function call with UACCESS enabled:

kernel/kcov.o: warning: objtool: __sanitizer_cov_trace_pc+0x1e: call to in_softirq_really() with UACCESS enabled
kernel/kcov.o: warning: objtool: check_kcov_mode+0x11: call to in_softirq_really() with UACCESS enabled

Mark this as __always_inline to avoid the problem.

Link: https://lkml.kernel.org/r/20241217071814.2261620-1-arnd@kernel.org
Fixes: 7d4df2dad3 ("kcov: properly check for softirq context")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:49:58 -08:00
Lorenzo Stoakes
8ac662f5da fork: avoid inappropriate uprobe access to invalid mm
If dup_mmap() encounters an issue, currently uprobe is able to access the
relevant mm via the reverse mapping (in build_map_info()), and if we are
very unlucky with a race window, observe invalid XA_ZERO_ENTRY state which
we establish as part of the fork error path.

This occurs because uprobe_write_opcode() invokes anon_vma_prepare() which
in turn invokes find_mergeable_anon_vma() that uses a VMA iterator,
invoking vma_iter_load() which uses the advanced maple tree API and thus
is able to observe XA_ZERO_ENTRY entries added to dup_mmap() in commit
d240629148 ("fork: use __mt_dup() to duplicate maple tree in
dup_mmap()").

This change was made on the assumption that only process tear-down code
would actually observe (and make use of) these values.  However this very
unlikely but still possible edge case with uprobes exists and
unfortunately does make these observable.

The uprobe operation prevents races against the dup_mmap() operation via
the dup_mmap_sem semaphore, which is acquired via uprobe_start_dup_mmap()
and dropped via uprobe_end_dup_mmap(), and held across
register_for_each_vma() prior to invoking build_map_info() which does the
reverse mapping lookup.

Currently these are acquired and dropped within dup_mmap(), which exposes
the race window prior to error handling in the invoking dup_mm() which
tears down the mm.

We can avoid all this by just moving the invocation of
uprobe_start_dup_mmap() and uprobe_end_dup_mmap() up a level to dup_mm()
and only release this lock once the dup_mmap() operation succeeds or clean
up is done.

This means that the uprobe code can never observe an incompletely
constructed mm and resolves the issue in this case.
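
The effect of moving the lock up a level can be modelled in userspace. The following Python sketch is an illustration under loud assumptions, not the kernel implementation: a plain threading.Lock stands in for dup_mmap_sem, and Mm, try_observe() and race_hook are hypothetical; the function names dup_mm and dup_mmap are borrowed only as labels. Unlike the real code, the model returns the torn-down object so it can be inspected.

```python
import threading

dup_lock = threading.Lock()   # hypothetical stand-in for dup_mmap_sem


class Mm:
    def __init__(self):
        self.state = "new"


def dup_mmap(mm, fail):
    mm.state = "partial"            # half-built; must not be observed
    if fail:
        return False
    mm.state = "complete"
    return True


def try_observe(mm):
    """Simulated uprobe side: may look at mm only while holding the lock."""
    if not dup_lock.acquire(blocking=False):
        return None                 # the fork side still owns the mm
    try:
        return mm.state
    finally:
        dup_lock.release()


def dup_mm(fail, race_hook=None):
    """Hold the lock across both the copy and the error-path teardown."""
    mm = Mm()
    with dup_lock:
        ok = dup_mmap(mm, fail)
        if race_hook is not None:
            race_hook(mm)           # the former race window
        if not ok:
            mm.state = "torn down"
    return mm
```

Because the lock is released only after success or after cleanup, an observer that respects the lock sees either a complete mm or a torn-down one, never the partial state.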

Link: https://lkml.kernel.org/r/20241210172412.52995-1-lorenzo.stoakes@oracle.com
Fixes: d240629148 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: syzbot+2d788f4f7cb660dac4b7@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:04:44 -08:00
Kees Cook
c7c1167fcb Merge branch 'for-next/topic/execve/core' into for-next/execve 2024-12-18 17:01:53 -08:00
Rafael J. Wysocki
ebeeee390b PM: EM: Move sched domains rebuild function from schedutil to EM
Function sugov_eas_rebuild_sd() defined in the schedutil cpufreq governor
implements generic functionality that may be useful in other places.  In
particular, there is a plan to use it in the intel_pstate driver in the
future.

For this reason, move it from schedutil to the energy model code and
rename it to em_rebuild_sched_domains().

This also helps to get rid of some #ifdeffery in schedutil which is a
plus.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
2024-12-18 20:32:13 +01:00