[Problem Description]
When running the hackbench program of LTP, the following memory leak is
reported by kmemleak.
# /opt/ltp/testcases/bin/hackbench 20 thread 1000
Running with 20*40 (== 800) tasks.
# dmesg | grep kmemleak
...
kmemleak: 480 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
kmemleak: 665 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff888cd8ca2c40 (size 64):
comm "hackbench", pid 17142, jiffies 4299780315
hex dump (first 32 bytes):
ac 74 49 00 01 00 00 00 4c 84 49 00 01 00 00 00 .tI.....L.I.....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc bff18fd4):
[<ffffffff81419a89>] __kmalloc_cache_noprof+0x2f9/0x3f0
[<ffffffff8113f715>] task_numa_work+0x725/0xa00
[<ffffffff8110f878>] task_work_run+0x58/0x90
[<ffffffff81ddd9f8>] syscall_exit_to_user_mode+0x1c8/0x1e0
[<ffffffff81dd78d5>] do_syscall_64+0x85/0x150
[<ffffffff81e0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
This issue can be consistently reproduced on three different servers:
* a 448-core server
* a 256-core server
* a 192-core server
[Root Cause]
Since multiple threads are created by the hackbench program (along with
the command argument 'thread'), a shared vma might be accessed by two or
more cores simultaneously. When two or more cores observe that
vma->numab_state is NULL at the same time, vma->numab_state will be
overwritten.
Although current code ensures that only one thread scans the VMAs in a
single 'numa_scan_period', there might be a chance for another thread
to enter in the next 'numa_scan_period' while we have not gotten till
numab_state allocation [1].
Note that the command `/opt/ltp/testcases/bin/hackbench 50 process 1000`
cannot the reproduce the issue. It is verified with 200+ test runs.
[Solution]
Use the cmpxchg atomic operation to ensure that only one thread executes
the vma->numab_state assignment.
[1] https://lore.kernel.org/lkml/1794be3c-358c-4cdc-a43d-a1f841d91ef7@amd.com/
Link: https://lkml.kernel.org/r/20241113102146.2384-1-ahuang12@lenovo.com
Fixes: ef6a22b70f ("sched/numa: apply the scan delay to every new vma")
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reported-by: Jiwei Sun <sunjw10@lenovo.com>
Reviewed-by: Raghavendra K T <raghavendra.kt@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Since the order of the scheme_idx and target_idx arguments in TP_ARGS is
reversed, they are stored in the trace record in reverse.
Link: https://lkml.kernel.org/r/20241115182023.43118-1-sj@kernel.org
Link: https://patch.msgid.link/20241112154828.40307-1-akinobu.mita@gmail.com
Fixes: c603c630b5 ("mm/damon/core: add a tracepoint for damos apply target regions")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The never-taken branch leads to an invalid bounds condition, which is by
design. To avoid the unwanted warning from the compiler, hide the
variable from the optimizer.
../lib/stackinit_kunit.c: In function 'do_nothing_u16_zero':
../lib/stackinit_kunit.c:51:49: error: array subscript 1 is outside array bounds of 'u16[0]' {aka 'short unsigned int[]'} [-Werror=array-bounds=]
51 | #define DO_NOTHING_RETURN_SCALAR(ptr) *(ptr)
| ^~~~~~
../lib/stackinit_kunit.c:219:24: note: in expansion of macro 'DO_NOTHING_RETURN_SCALAR'
219 | return DO_NOTHING_RETURN_ ## which(ptr + 1); \
| ^~~~~~~~~~~~~~~~~~
Link: https://lkml.kernel.org/r/20241117113813.work.735-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The folio can get freed + buddy-merged + reallocated in the meantime,
resulting in us calling folio_test_locked() possibly on a tail page.
This makes const_folio_flags VM_BUG_ON_PGFLAGS() when stumbling over the
tail page.
Could this result in other issues? Doesn't look like it. False positives
and false negatives don't really matter, because this folio would get
skipped either way when detecting that they have been reallocated in the
meantime.
Fix it by performing the folio_test_locked() checked after grabbing a
reference. If this ever becomes a real problem, we could add a special
helper that racily checks if the bit is set even on tail pages ... but
let's hope that's not required so we can just handle it cleaner: work on
the folio after we hold a reference.
Do we really need the folio_test_locked() check if we are going to trylock
briefly after? Well, we can at least avoid a xas_reload().
It's a bit unclear which exact change introduced that issue. Likely, ever
since we made PG_locked obey to the PF_NO_TAIL policy it could have been
triggered in some way.
Link: https://lkml.kernel.org/r/20241129125303.4033164-1-david@redhat.com
Fixes: 48c935ad88 ("page-flags: define PG_locked behavior on compound pages")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+9f9a7f73fb079b2387a6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/674184c9.050a0220.1cc393.0001.GAE@google.com/
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix a kernel-doc warning by making the kernel-doc function description
match the function name:
include/linux/scatterlist.h:323: warning: expecting prototype for sg_unmark_bus_address(). Prototype was for sg_dma_unmark_bus_address() instead
Link: https://lkml.kernel.org/r/20241130022406.537973-1-rdunlap@infradead.org
Fixes: 4239930120 ("lib/scatterlist: add flag for indicating P2PDMA segments in an SGL")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We mistakenly refer to len rather than len_ here. The only existing
caller passes len to the len_ parameter so this has no impact on the code,
but it is obviously incorrect to do this, so fix it.
Link: https://lkml.kernel.org/r/20241118175414.390827-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit efa7df3e3b ("mm: align larger anonymous mappings on THP
boundaries") updated __get_unmapped_area() to align the start address for
the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.
It does this by effectively looking up a region that is of size,
request_size + PMD_SIZE, and aligning up the start to a PMD boundary.
Commit 4ef9ad19e1 ("mm: huge_memory: don't force huge page alignment on
32 bit") opted out of this for 32bit due to regressions in mmap base
randomization.
Commit d4148aeab4 ("mm, mmap: limit THP alignment of anonymous mappings
to PMD-aligned sizes") restricted this to only mmap sizes that are
multiples of the PMD_SIZE due to reported regressions in some performance
benchmarks -- which seemed mostly due to the reduced spatial locality of
related mappings due to the forced PMD-alignment.
Another unintended side effect has emerged: When a user specifies an mmap
hint address, the THP alignment logic modifies the behavior, potentially
ignoring the hint even if a sufficiently large gap exists at the requested
hint location.
Example Scenario:
Consider the following simplified virtual address (VA) space:
...
0x200000-0x400000 --- VMA A
0x400000-0x600000 --- Hole
0x600000-0x800000 --- VMA B
...
A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:
- Before THP alignment: The requested region (size 0x200000) fits into
the gap at 0x400000, so the hint is respected.
- After alignment: The logic searches for a region of size
0x400000 (len + PMD_SIZE) starting at 0x400000.
This search fails due to the mapping at 0x600000 (VMA B), and the hint
is ignored, falling back to arch_get_unmapped_area[_topdown]().
In general the hint is effectively ignored, if there is any existing
mapping in the below range:
[mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)
This changes the semantics of mmap hint; from ""Respect the hint if a
sufficiently large gap exists at the requested location" to "Respect the
hint only if an additional PMD-sized gap exists beyond the requested
size".
This has performance implications for allocators that allocate their heap
using mmap but try to keep it "as contiguous as possible" by using the end
of the exisiting heap as the address hint. With the new behavior it's
more likely to get a much less contiguous heap, adding extra fragmentation
and performance overhead.
To restore the expected behavior; don't use
thp_get_unmapped_area_vmflags() when the user provided a hint address, for
anonymous mappings.
Note: As Yang Shi pointed out: the issue still remains for filesystems
which are using thp_get_unmapped_area() for their get_unmapped_area() op.
It is unclear what worklaods will regress for if we ignore THP alignment
when the hint address is provided for such file backed mappings -- so this
fix will be handled separately.
Link: https://lkml.kernel.org/r/20241118214650.3667577-1-kaleshsingh@google.com
Fixes: efa7df3e3b ("mm: align larger anonymous mappings on THP boundaries")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hans Boehm <hboehm@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In commit 66d60c428b ("mm: memcg: move legacy memcg event code into
memcontrol-v1.c"), the static do_memsw_account() function was moved from a
.c file to a .h file. Unfortunately, the traditional inline keyword
wasn't added. If a file (e.g., a unit test) includes the .h file, but
doesn't refer to do_memsw_account(), it will get a warning like:
mm/memcontrol-v1.h:41:13: warning: unused function 'do_memsw_account' [-Wunused-function]
41 | static bool do_memsw_account(void)
| ^~~~~~~~~~~~~~~~
Link: https://lkml.kernel.org/r/20241128203959.726527-1-jsperbeck@google.com
Fixes: 66d60c428b ("mm: memcg: move legacy memcg event code into memcontrol-v1.c")
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Current solution to adjust codetag references during page migration is
done in 3 steps:
1. sets the codetag reference of the old page as empty (not pointing
to any codetag);
2. subtracts counters of the new page to compensate for its own
allocation;
3. sets codetag reference of the new page to point to the codetag of
the old page.
This does not work if CONFIG_MEM_ALLOC_PROFILING_DEBUG=n because
set_codetag_empty() becomes NOOP. Instead, let's simply swap codetag
references so that the new page is referencing the old codetag and the old
page is referencing the new codetag. This way accounting stays valid and
the logic makes more sense.
Link: https://lkml.kernel.org/r/20241129025213.34836-1-00107082@163.com
Fixes: e0a955bf7f ("mm/codetag: add pgalloc_tag_copy()")
Signed-off-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/lkml/20241124074318.399027-1-00107082@163.com/
Acked-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The following INFO level message was seen:
seq_file: buggy .next function ocfs2_dlm_seq_next [ocfs2] did not
update position index
Fix:
Update *pos (so m->index) to make seq_read_iter happy though the index its
self makes no sense to ocfs2_dlm_seq_next.
Link: https://lkml.kernel.org/r/20241119174500.9198-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Per documentation, stack_depot_save_flags() was meant to be usable from
NMI context if STACK_DEPOT_FLAG_CAN_ALLOC is unset. However, it still
would try to take the pool_lock in an attempt to save a stack trace in the
current pool (if space is available).
This could result in deadlock if an NMI is handled while pool_lock is
already held. To avoid deadlock, only try to take the lock in NMI context
and give up if unsuccessful.
The documentation is fixed to clearly convey this.
Link: https://lkml.kernel.org/r/Z0CcyfbPqmxJ9uJH@elver.google.com
Link: https://lkml.kernel.org/r/20241122154051.3914732-1-elver@google.com
Fixes: 4434a56ec2 ("stackdepot: make fast paths lock-less again")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
page_folio() calls page_fixed_fake_head() which will misidentify this page
as being a fake head and load off the end of 'precise'. We may have a
pointer to a fake head, but that's OK because it contains the right
information for dump_page().
gcc-15 is smart enough to catch this with -Warray-bounds:
In function 'page_fixed_fake_head',
inlined from '_compound_head' at ../include/linux/page-flags.h:251:24,
inlined from '__dump_page' at ../mm/debug.c:123:11:
../include/asm-generic/rwonce.h:44:26: warning: array subscript 9 is outside
+array bounds of 'struct page[1]' [-Warray-bounds=]
Link: https://lkml.kernel.org/r/20241125201721.2963278-2-willy@infradead.org
Fixes: fae7d834c4 ("mm: add __dump_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Kees Cook <kees@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It is unsafe to call PageTail() in dump_page() as page_is_fake_head() will
almost certainly return true when called on a head page that is copied to
the stack. That will cause the VM_BUG_ON_PGFLAGS() in const_folio_flags()
to trigger when it shouldn't. Fortunately, we don't need to call
PageTail() here; it's fine to have a pointer to a virtual alias of the
page's flag word rather than the real page's flag word.
Link: https://lkml.kernel.org/r/20241125201721.2963278-1-willy@infradead.org
Fixes: fae7d834c4 ("mm: add __dump_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Kees Cook <kees@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When vrealloc() reuses already allocated vmap_area, we need to re-annotate
poisoned and unpoisoned portions of underlying memory according to the new
size.
This results in a KASAN splat recorded at [1]. A KASAN mis-reporting
issue where there is none.
Note, hard-coding KASAN_VMALLOC_PROT_NORMAL might not be exactly correct,
but KASAN flag logic is pretty involved and spread out throughout
__vmalloc_node_range_noprof(), so I'm using the bare minimum flag here and
leaving the rest to mm people to refactor this logic and reuse it here.
Link: https://lkml.kernel.org/r/20241126005206.3457974-1-andrii@kernel.org
Link: https://lore.kernel.org/bpf/67450f9b.050a0220.21d33d.0004.GAE@google.com/ [1]
Fixes: 3ddc2fefe6 ("mm: vmalloc: implement vrealloc()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This reverts commit 7c877586da.
Anders and Philippe have reported that recent kernels occasionally hang
when used with NFS in readahead code. The problem has been bisected to
7c877586da ("readahead: properly shorten readahead when falling back to
do_page_cache_ra()"). The cause of the problem is that ra->size can be
shrunk by read_pages() call and subsequently we end up calling
do_page_cache_ra() with negative (read huge positive) number of pages.
Let's revert 7c877586da for now until we can find a proper way how the
logic in read_pages() and page_cache_ra_order() can coexist. This can
lead to reduced readahead throughput due to readahead window confusion but
that's better than outright hangs.
Link: https://lkml.kernel.org/r/20241126145208.985-1-jack@suse.cz
Fixes: 7c877586da ("readahead: properly shorten readahead when falling back to do_page_cache_ra()")
Reported-by: Anders Blomdell <anders.blomdell@gmail.com>
Reported-by: Philippe Troin <phil@fifi.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: Philippe Troin <phil@fifi.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When running selftests I encountered the following error message with
some damon tests:
# Traceback (most recent call last):
# File "[...]/damon/./damos_quota.py", line 7, in <module>
# import _damon_sysfs
# ModuleNotFoundError: No module named '_damon_sysfs'
Fix this by adding the _damon_sysfs.py file to TEST_FILES so that it
will be available when running the respective damon selftests.
Link: https://lkml.kernel.org/r/20241127-picks-visitor-7416685b-mheyne@amazon.de
Fixes: 306abb63a8 ("selftests/damon: implement a python module for test-purpose DAMON sysfs controls")
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The string logged when a test passes or fails is used by the selftest
framework to identify which test is being reported. The hugetlb_dio test
not only uses the same strings for every test that is run but it also uses
different strings for test passes and failures which means that test
automation is unable to follow what the test is doing at all.
Pull the existing duplicated logging of the number of free huge pages
before and after the test out of the conditional and replace that and the
logging of the result with a single ksft_print_result() which incorporates
the parameters passed into the test into the output.
Link: https://lkml.kernel.org/r/20241127-kselftest-mm-hugetlb-dio-names-v1-1-22aab01bf550@kernel.org
Fixes: fae1980347 ("selftests: hugetlb_dio: fixup check for initial conditions to skip in the start")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Syzbot reported that when searching for records in a directory where the
inode's i_size is corrupted and has a large value, memory access outside
the folio/page range may occur, or a use-after-free bug may be detected if
KASAN is enabled.
This is because nilfs_last_byte(), which is called by nilfs_find_entry()
and others to calculate the number of valid bytes of directory data in a
page from i_size and the page index, loses the upper 32 bits of the 64-bit
size information due to an inappropriate type of local variable to which
the i_size value is assigned.
This caused a large byte offset value due to underflow in the end address
calculation in the calling nilfs_find_entry(), resulting in memory access
that exceeds the folio/page size.
Fix this issue by changing the type of the local variable causing the bit
loss from "unsigned int" to "u64". The return value of nilfs_last_byte()
is also of type "unsigned int", but it is truncated so as not to exceed
PAGE_SIZE and no bit loss occurs, so no change is required.
Link: https://lkml.kernel.org/r/20241119172403.9292-1-konishi.ryusuke@gmail.com
Fixes: 2ba466d74e ("nilfs2: directory entry operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96d5d14c47d97015c624
Tested-by: syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If PREEMPT_RT is enabled, report_lock is a sleeping spinlock and must not
be locked when IRQs are disabled. However, KASAN reports may be triggered
in such contexts. For example:
char *s = kzalloc(1, GFP_KERNEL);
kfree(s);
local_irq_disable();
char c = *s; /* KASAN report here leads to spin_lock() */
local_irq_enable();
Make report_spinlock a raw spinlock to prevent rescheduling when
PREEMPT_RT is enabled.
Link: https://lkml.kernel.org/r/20241119210234.1602529-1-jkangas@redhat.com
Fixes: 342a93247e ("locking/spinlock: Provide RT variant header: <linux/spinlock_rt.h>")
Signed-off-by: Jared Kangas <jkangas@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The recent addition of "pofs" (pages or folios) handling to gup has a
flaw: it assumes that unpin_user_pages() handles NULL pages in the pages**
array. That's not the case, as I discovered when I ran on a new
configuration on my test machine.
Fix this by skipping NULL pages in unpin_user_pages(), just like
unpin_folios() already does.
Details: when booting on x86 with "numa=fake=2 movablecore=4G" on Linux
6.12, and running this:
tools/testing/selftests/mm/gup_longterm
...I get the following crash:
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:sanity_check_pinned_pages+0x3a/0x2d0
...
Call Trace:
<TASK>
? __die_body+0x66/0xb0
? page_fault_oops+0x30c/0x3b0
? do_user_addr_fault+0x6c3/0x720
? irqentry_enter+0x34/0x60
? exc_page_fault+0x68/0x100
? asm_exc_page_fault+0x22/0x30
? sanity_check_pinned_pages+0x3a/0x2d0
unpin_user_pages+0x24/0xe0
check_and_migrate_movable_pages_or_folios+0x455/0x4b0
__gup_longterm_locked+0x3bf/0x820
? mmap_read_lock_killable+0x12/0x50
? __pfx_mmap_read_lock_killable+0x10/0x10
pin_user_pages+0x66/0xa0
gup_test_ioctl+0x358/0xb20
__se_sys_ioctl+0x6b/0xc0
do_syscall_64+0x7b/0x150
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Link: https://lkml.kernel.org/r/20241121034933.77502-1-jhubbard@nvidia.com
Fixes: 94efde1d15 ("mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> fs/proc/vmcore.c:424:19: warning: 'mmap_vmcore_fault' defined but not used [-Wunused-function]
424 | static vm_fault_t mmap_vmcore_fault(struct vm_fault *vmf)
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411140156.2o0nS4fl-lkp@intel.com/
Cc: Qi Xi <xiqi2@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
core: add of based component probing
Some devices are designed and manufactured with some components having
multiple drop-in replacement options. These components are often
connected to the mainboard via ribbon cables, having the same signals
and pin assignments across all options. These may include the display
panel and touchscreen on laptops and tablets, and the trackpad on
laptops. Sometimes which component option is used in a particular device
can be detected by some firmware provided identifier, other times that
information is not available, and the kernel has to try to probe each
device.
Instead of a delicate dance between drivers and device tree quirks, this
change introduces a simple I2C component probe function. For a given
class of devices on the same I2C bus, it will go through all of them,
doing a simple I2C read transfer and see which one of them responds. It
will then enable the device that responds.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmdMOCwACgkQFA3kzBSg
KbaLWQ//e+29c9Trh+Typ6YCknuCFmM12xNUEPgce93EldERZjfCsdlHEchJnjEe
RX5Mw+UxT7NqFUD6G2YQQw35LMj/PDws4Q/q3OwAvurj2PqoKEGIbdTxUEbGrLtT
aYRUjGoqjPk3gAkqfKEQp8X8bvqKJfQ/xeXcZyy7Ij8xPU1LC/W6So4vpZA7oZkZ
UfwrUXu2NM/DKfLATeSH3vnlBrrpco9BdjImDanP8BpuYHQ0aNt7IToDytd3/rLZ
WKQPbHqVzzBUHm8Tf+DTyeVtII/ID078z+RW7htPrzPtzhVu2DN56w3z7/+EsVIy
XKb1PiJo5HlFev4gxhIcUgf8L5JYE8MyU8eeRWAcDObiBYVcSP34P7Xoykczjuq9
22IFNSdaqMVdigo9v4p7U99fVjoi3oUqIVsSb2HkmND8bPw3+jtyq1GxTFvTeOw8
EVJGuGjYrPiko26zNgK/WZaumpB+fsH7uRioSb1ejlSQk0BcNv4YqjYaMx9aT7Xj
+zF7luuxpkWTkPCjk57GX55K/BFzkA9jhJdKqjImBS+Q9Rn3k5bHyRdNPUDIeWs0
Yl5dhg6QbLHyuPC2R/hJiV6X4uFloH9+Oh99T3lLzcVNc2mllPQDN9F4I8q5nWB8
Eq5U95fnegfdVkiWnbNCnTCux/mypm35Gx46BffTiAJL2U9pSIE=
=MAxU
-----END PGP SIGNATURE-----
Merge tag 'i2c-for-6.13-rc1-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c component probing support from Wolfram Sang:
"Add OF component probing.
Some devices are designed and manufactured with some components having
multiple drop-in replacement options. These components are often
connected to the mainboard via ribbon cables, having the same signals
and pin assignments across all options. These may include the display
panel and touchscreen on laptops and tablets, and the trackpad on
laptops. Sometimes which component option is used in a particular
device can be detected by some firmware provided identifier, other
times that information is not available, and the kernel has to try to
probe each device.
Instead of a delicate dance between drivers and device tree quirks,
this change introduces a simple I2C component probe function. For a
given class of devices on the same I2C bus, it will go through all of
them, doing a simple I2C read transfer and see which one of them
responds. It will then enable the device that responds"
* tag 'i2c-for-6.13-rc1-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: fix typo in I2C OF COMPONENT PROBER
of: base: Document prefix argument for of_get_next_child_with_prefix()
i2c: Fix whitespace style issue
arm64: dts: mediatek: mt8173-elm-hana: Mark touchscreens and trackpads as fail
platform/chrome: Introduce device tree hardware prober
i2c: of-prober: Add GPIO support to simple helpers
i2c: of-prober: Add simple helpers for regulator support
i2c: Introduce OF component probe function
of: base: Add for_each_child_of_node_with_prefix()
of: dynamic: Add of_changeset_update_prop_string
- Remove unused bprintf() function
bprintf() was added with the rest of the "bin-printf" functions.
These are functions that are used by trace_printk() that allows to
quickly save the format and arguments into the ring buffer without
the expensive processing of converting numbers to ASCII. Then on
output, at a much later time, the ring buffer is read and the string
processing occurs then. The bprintf() was added for consistency but
was never used. It can be safely removed.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ0yNShQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qmJ6AP9i8pFOjeMfb2hOBpJTzORkIXEbz5nG
OCK/5aeSdjxy8QEAqafBSr5IQOxaTCFve1p7WSwdgmi2ZLmqEasaud0LmAk=
=5bp1
-----END PGP SIGNATURE-----
Merge tag 'trace-printf-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bprintf() removal from Steven Rostedt:
- Remove unused bprintf() function, that was added with the rest of the
"bin-printf" functions.
These are functions that are used by trace_printk() that allows to
quickly save the format and arguments into the ring buffer without
the expensive processing of converting numbers to ASCII. Then on
output, at a much later time, the ring buffer is read and the string
processing occurs then. The bprintf() was added for consistency but
was never used. It can be safely removed.
* tag 'trace-printf-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
printf: Remove unused 'bprintf'
signals if some of the group's threads are exiting
- Fix a hang caused by ndelay() calling the wrong delay function __udelay()
- Fix a wrong offset calculation in adjtimex(2) when using ADJ_MICRO
(microsecond resolution) and a negative offset
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmdMQ2sACgkQEsHwGGHe
VUoRGBAAt8luiDBdMHIcD053RHsLr7Oocg5AI/t0PVxYxJ+89o0cSdDx2vaaXiyX
+vRSkdvH5mfwvwW4XRJZkVWbzOjMiA6m7FwH667XGzEedIq4vtgs5Rd/1YStSfIx
ceQfD2N+34esamxiGGBlzjNO2GdqI2XMo/Fc6LuPCTfPBqELCL8OpbEdOV8Ltwxr
mRsmbCNazBtw31Yo3zp9UZIVVSAzJFmWOoK0M+xm6S91YPYaKQ9RYk2QQwLizVgR
N++dniNV6yZuSLTzr4dNckrvl744Iqc4Sy8iy2CL9rNFZkb+3q5CAAQggGNlY2U9
0W95tgwpy/Qt6drfsyam3+PR5Smwjnh/0mrk3sLzUCdy9Y6L2HgKmrvHk4Rqq/66
N6uIjIDmou+L0FUcdUducRnMOgQnvfIB/l6hIAHHkDap7iL8oy74JDzzk0jnNKHw
1I5kGbKqXz0ucdxge6H1BHqCc/roobwC05/TWLPAQ5IG0BtQFPGAwd901AZtANkk
/FfWUq7IT6PW05T2co7O75NjgMvU3QV0Sf5E9vkV/+R9WtTKT13FmZ8+rC6zaC7o
Juml/lRWeTCyuot3vv29NtcvY6j+gy/RrKWL4iNWDlXznntR2DAhIkzRCF+1yTSb
z0RSOrY2BSsk2iqeUh8ydet5OEyPiMXwiHVbxUHzJ4R/7qaxsB8=
=X4bb
-----END PGP SIGNATURE-----
Merge tag 'timers_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Borislav Petkov:
- Fix a case where posix timers with a thread-group-wide target would
miss signals if some of the group's threads are exiting
- Fix a hang caused by ndelay() calling the wrong delay function
__udelay()
- Fix a wrong offset calculation in adjtimex(2) when using ADJ_MICRO
(microsecond resolution) and a negative offset
* tag 'timers_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-timers: Target group sigqueue to current task only if not exiting
delay: Fix ndelay() spuriously treated as udelay()
ntp: Remove invalid cast in time offset math
fix some Marvell Armada platforms
- Add a workaround for Hisilicon ITS erratum 162100801 which can cause some
virtual interrupts to get lost
- More platform_driver::remove() conversion
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmdMQN0ACgkQEsHwGGHe
VUo0Ng/+OWH2VtqWo2Elz2iH2gYKaxku5GXfSOlsV3DwrQ0UGH9jlbVLj9yEZytx
FRWasWVO5e5bDq6g/pnSLjgfkDOYq573eSm56DIc+hrb4EB97r+VNWtlxNx3P5Yl
6AYnRdQ+TFNvk3PtAngbFwTQpQK3qTOf26emvxXLdHVeZ6BvAgq8m5mT0AMjJu31
5HOxjdF8pngCxPCb3aX5jKAiE1KdyRnbc1bJX+UqNBUwNaDf2thPNn7XtJXblkmG
K+CYQ+cVdmiVfvHH+E3LiYW4p8MVMc/Zp0KLKKvNaN1os0LXVaKbAucBorhgCBch
1kMuU5IYHSfeOAXyp7C19a2yNKQ1b2+ghbUr44P3nWJCrARrr5dFqcvEL+U5wP/0
oNO0nP1vtpIWp8M5cc2ip9UYjtPhEQi4nI7rB4F2i5l10C8pIKXSbn+rVPVf13jf
gUwmRB+Ihd8Dw+EGoJmAn4xTyUnA7twgav2zi9jj8M2O0iwmwRfftfRDfnN/H3tc
hGlNDCO5vVSZxA2tBruEMbKTkECLU7b2lmZp3MldCfWy2wstmrtdL7Vaef+9JpM3
K5pW/QTaHMAsqgmHmb2nfxu5xXDlTRIZMcX/ynGeXi7aDf5Zdc8AsSeysxMFiTua
nerDCljpXHXKJiFjKF9hTgnkdDrLO8l/A+JVGymDK9Ai2vdgSHg=
=xeni
-----END PGP SIGNATURE-----
Merge tag 'irq_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Move the ->select callback to the correct ops structure in
irq-mvebu-sei to fix some Marvell Armada platforms
- Add a workaround for Hisilicon ITS erratum 162100801 which can cause
some virtual interrupts to get lost
- More platform_driver::remove() conversion
* tag 'irq_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Switch back to struct platform_driver::remove()
irqchip/gicv3-its: Add workaround for hip09 ITS erratum 162100801
irqchip/irq-mvebu-sei: Move misplaced select() callback to SEI CP domain
by erratum 1386 so that the matching loop actually terminates instead of
going off into the weeds
- Update the boot protocol documentation to mention the fact that the
preferred address to load the kernel to is considered in the relocatable
kernel case too
- Flush the memory buffer containing the microcode patch after applying
microcode on AMD Zen1 and Zen2, to avoid unnecessary slowdowns
- Make sure the PPIN CPU feature flag is cleared on all CPUs if PPIN has been
disabled
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmdMPvcACgkQEsHwGGHe
VUo2xw//QvwzIfWU/l+UnZppbpRL5gvLy41EgNOwhMBVDd81Fdx87KImg7luDDvM
FHsydVpSmqS6gMX0n6JQfr7IMz4HLWHff/yJjq2Pgb5BS7HBk8RyQ8YPCaBbXP33
NsV2fSL2INgLL6z6iefrnStQouIP2iRp+bN1kXSRe0Yhs+RBj6DyKsD6BdN/x342
AFkP65rY/1+7jLIftI2YulKEB5RmlbNqa9Nzbq1kOfO6I0TPUZmK5XI1xcRKHiwK
yFaMKufZq94rULhNsbjwPhNqK5LG34AeQ2xpaiujA1uHdQssChmAnGuJzrK2s3T0
YUo7WzI5LBsRDw0UGtfKjvl6JMFhDvhiSY9f8stS8B8GIiIeErkwKxkzVqAR5rQM
JbukE0Di/JABXk5sMzwyamFCJ3TgbuSWivK5ujxsiDTU6d/X89f1CRECU02lZT6u
fc7GIjZ09voep/YruknmyZbha/hh0EofN3GbIkwBsKX6dsypSKAuSkSBysZfRYXE
z1hZuVyBGQJj0OQMtbIaGGmKJWPcQq18xiKKo1XYIDOL+Ag0RQ0ZrVYA7Wt96MB7
ImoeduD1ssvU00IJ9QMx/EPdmrZHxzX3C1XGEm1DyW4fTYc8TPJnowxjXuB9Hir6
IhCARvXni5/8vAeNOb8xx+izr64jRCy3w2rCcjjebqFW/oEvPrQ=
=RC2M
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add a terminating zero end-element to the array describing AMD CPUs
affected by erratum 1386 so that the matching loop actually
terminates instead of going off into the weeds
- Update the boot protocol documentation to mention the fact that the
preferred address to load the kernel to is considered in the
relocatable kernel case too
- Flush the memory buffer containing the microcode patch after applying
microcode on AMD Zen1 and Zen2, to avoid unnecessary slowdowns
- Make sure the PPIN CPU feature flag is cleared on all CPUs if PPIN
has been disabled
* tag 'x86_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Terminate the erratum_1386_microcode array
x86/Documentation: Update algo in init_size description of boot protocol
x86/microcode/AMD: Flush patch buffer mapping after application
x86/mm: Carve out INVLPG inline asm for use by others
x86/cpu: Fix PPIN initialization
The point behind strscpy() was to once and for all avoid all the
problems with 'strncpy()' and later broken "fixed" versions like
strlcpy() that just made things worse.
So strscpy not only guarantees NUL-termination (unlike strncpy), it also
doesn't do unnecessary padding at the destination. But at the same time
also avoids byte-at-a-time reads and writes by _allowing_ some extra NUL
writes - within the size, of course - so that the whole copy can be done
with word operations.
It is also stable in the face of a mutable source string: it explicitly
does not read the source buffer multiple times (so an implementation
using "strnlen()+memcpy()" would be wrong), and does not read the source
buffer past the size (like the mis-design that is strlcpy does).
Finally, the return value is designed to be simple and unambiguous: if
the string cannot be copied fully, it returns an actual negative error,
making error handling clearer and simpler (and the caller already knows
the size of the buffer). Otherwise it returns the string length of the
result.
However, there was one final stability issue that can be important to
callers: the stability of the destination buffer.
In particular, the same way we shouldn't read the source buffer more
than once, we should avoid doing multiple writes to the destination
buffer: first writing a potentially non-terminated string, and then
terminating it with NUL at the end does not result in a stable result
buffer.
Yes, it gives the right result in the end, but if the rule for the
destination buffer was that it is _always_ NUL-terminated even when
accessed concurrently with updates, the final byte of the buffer needs
to always _stay_ as a NUL byte.
[ Note that "final byte is NUL" here is literally about the final byte
in the destination array, not the terminating NUL at the end of the
string itself. There is no attempt to try to make concurrent reads and
writes give any kind of consistent string length or contents, but we
do want to guarantee that there is always at least that final
terminating NUL character at the end of the destination array if it
existed before ]
This is relevant in the kernel for the tsk->comm[] array, for example.
Even without locking (for either readers or writers), we want to know
that while the buffer contents may be garbled, it is always a valid C
string and always has a NUL character at 'comm[TASK_COMM_LEN-1]' (and
never has any "out of thin air" data).
So avoid any "copy possibly non-terminated string, and terminate later"
behavior, and write the destination buffer only once.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bprintf() is unused. Remove it. It was added in the commit 4370aa4aa7
("vsprintf: add binary printf") but as far as I can see was never used,
unlike the other two functions in that patch.
Link: https://lore.kernel.org/20241002173147.210107-1-linux@treblig.org
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Acked-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmdLqNcUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzv8Q/+IMfZpw/zOojgVKa7uL8Xv44t+hhX
DYMghaRJpg392D49DN/cTRST9PO6iWT3FI/EUC+kYDIyHucUZ7qEO5lL/8B/sfwi
i9qZHODOwqYRy9VRTmBBgmPsq9sGeKZnCSo7kQ1JW4syiYN+SFaFxNG4eSUxPe4K
yJ958TJqV0dd3YAT8ItcBxaJS/C2FuIvPfmvwjG1GFQwSwuEjeOTnqoN33Pwey/X
P6p4xLwd+IcEvJ/o9ygm/4g+nT/FTgb91P+FCCUfcOsu0IsxGxei/dIR6JuDmmK/
QwPM+RnnpXJG5hwm5tKv6I/sJND1EMktTDIFQZawX9/S8qF2KhjJupEB8/pxqFca
V+kQopiVKysgZFXWtbCFf6rb07s3UO6HnwF6sUgoP7yD5xO4QcEKsbNvftn6kruV
dfuFericd7MPNlfYwxLLctPWa8IOyJQkqCrClDZQQa3YxWx6FDB9+AWayBfd2lDj
VTRcupotQX36gOYBZa7F1zN3A/gh/dJ6qdk68WCcWIzkzdmYDZ6iDM6/GiGHjwOk
ri6b0FyazVSUu17V4Gw5xnqlH6dfCBznjluwdqAbNUXkEusNIf7OxlQxPaAKmtbm
MTYn20op9OBv1VYgx8oj9iMZGBNKWkoHK2SJnS94xUSje08P0ubhpw2eTpi00Zx6
7PzJNqBARBF+n6I=
=bPnr
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fix from Bjorn Helgaas:
- When removing a PCI device, only look up and remove a platform device
if there is an associated device node for which there could be a
platform device, to fix a merge window regression (Brian Norris)
* tag 'pci-v6.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/pwrctrl: Unregister platform device only if one actually exists
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmdKdi4UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXP36A//UudCkttNTVCErQM0r8mjcPjtthlo
fRblzsuXbHB3FTsrDQA4bpPcLTlxMcNZXGgO9F+cwjNRNRbavBZo8+AbtAWOppNx
3XGr65Z0iIelUgvy/T3jYl8xQM4IePWbBn/wXMtqX3synE/UX4fqfxC/feeaUly8
qdyoefM0I4JOabjn9i9N6KyeyU/sjI4aTRhd5YwaCdy6wK2u9hQEqvER5XdFZjhN
HnuFKlBKEH5fo3Wwq5X+PbX8hP4szl3DE7QEplTlMjrh0y7xIc/ndugp/f3IN2Ri
0/Da6FpqjtBiis+wkyGWwwfmw/erRRaLUhwbuW3MjCQTmtbcjkIjA2xb4nxtcT+n
e/sCdWkgf1PX15igHpbWAynlv+jF3Ue+Kr/hrWi/H50fWiZh9skUmUxJ92SlI4fK
DT1N430xvoBpQwynASAM6hcWWgLJlFBo11aZA2fl++ORC3z/dbCZ5iLPdbIvaemF
KSmQTnrk4sVH8sPl9aIHqCFxj78uTHkSzLGQyq76AWU/s8hGzPszpU6iPwlcx1wk
WpkMLG5FQr0a4vKnNx1FVLCmwalPlB6/ZclRvKuSiEGZWTrnOo5KhPy1+Yn39AUo
IDBFnJkJNWxUVyb6zK9OJJ3Tsd6c1qeBEqb0VzUDGmw2umTuEAKqlqNJAohxVvyP
A7GFrlg9FvsDAwM=
=ZCsN
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20241129' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull ima fix from Paul Moore:
"One small patch to fix a function parameter / local variable naming
snafu that went up to you in the current merge window"
* tag 'lsm-pr-20241129' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
ima: uncover hidden variable in ima_match_rules()
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmdJ6jwQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpvgeEADaPL+qmsRyp070K1OI/yTA9Jhf6liEyJ31
0GPVar5Vt6ObH6/POObrJqbBtAo5asQanFvwyVyLztKYxPHU7sdQaRcD+vvj7q3+
EhmQZSKM7Spp77awWhRWbeQfUBvdhTeGHjQH/0e60eIrF9KtEL9sM9hVqc8hBD9F
YtDNWPCk7Rz1PPNYlGEkQ2JmYmaxh3Gn29c/k1cSSo3InEQOFj6x+0Cgz6RjbTx3
9HfpLhVG3WV5MlZCCwp7KG36aJzlc0nq53x/sC9cg+F17RvL2EwNAOUfLl75/Kp/
t7PCQSd2ODciiDN9qZW71KGtVtlJ07W048Rk0nB+ogneC0uh4fuIYTidP9D7io7D
bBMrhDuUpnlPzlOqg0aeedXePQL7TRfT3CTentol6xldqg14n7C4QTQFQMSJCgJf
gr4YCTwl0RTknXo0A3ja16XwsUq5+2xsSoCTU25TY+wgKiAcc5lN9fhbvPRzbCQC
u9EQ9I9IFAMqEdnE51sw0x16fLtN2w4/zOkvTF+gD/KooEjSn9lcfeNue7jt1O0/
gFvFJCdXK/2GgxwHihvsEVdcNeaS8JowNafKUsfOM2G0qWQbY+l2vl/b5PfwecWi
0knOaqNWlGMwrQ+z+fgsEeFG7X98ninC7tqVZpzoZ7j0x65anH+Jq4q1Egongj0H
90zclclxjg==
=6cbB
-----END PGP SIGNATURE-----
Merge tag 'block-6.13-20242901' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:
- NVMe pull request via Keith:
- Use correct srcu list traversal (Breno)
- Scatter-gather support for metadata (Keith)
- Fabrics shutdown race condition fix (Nilay)
- Persistent reservations updates (Guixin)
- Add the required bits for MD atomic write support for raid0/1/10
- Correct return value for unknown opcode in ublk
- Fix deadlock with zone revalidation
- Fix for the io priority request vs bio cleanups
- Use the correct unsigned int type for various limit helpers
- Fix for a race in loop
- Cleanup blk_rq_prep_clone() to prevent uninit-value warning and make
it easier for actual humans to read
- Fix potential UAF when iterating tags
- A few fixes for bfq-iosched UAF issues
- Fix for brd discard not decrementing the allocated page count
- Various little fixes and cleanups
* tag 'block-6.13-20242901' of git://git.kernel.dk/linux: (36 commits)
brd: decrease the number of allocated pages which discarded
block, bfq: fix bfqq uaf in bfq_limit_depth()
block: Don't allow an atomic write be truncated in blkdev_write_iter()
mq-deadline: don't call req_get_ioprio from the I/O completion handler
block: Prevent potential deadlock in blk_revalidate_disk_zones()
block: Remove extra part pointer NULLify in blk_rq_init()
nvme: tuning pr code by using defined structs and macros
nvme: introduce change ptpl and iekey definition
block: return bool from get_disk_ro and bdev_read_only
block: remove a duplicate definition for bdev_read_only
block: return bool from blk_rq_aligned
block: return unsigned int from blk_lim_dma_alignment_and_pad
block: return unsigned int from queue_dma_alignment
block: return unsigned int from bdev_io_opt
block: req->bio is always set in the merge code
block: don't bother checking the data direction for merges
block: blk-mq: fix uninit-value in blk_rq_prep_clone and refactor
Revert "block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()"
md/raid10: Atomic write support
md/raid1: Atomic write support
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmdJ6igQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjj3D/44ltUzbKLiGRE8wvtyWSFdAeGUT8DA0MTW
ot+Tr43PY6+J+v5ClUmgzJYqLRjNUxJAGUWM8Tmr7tZ2UtKwhHX/CEUtbqOEm2Sg
e6aofpzR+sXX+ZqZRrLMPj6gLvuklWra+1STyzA6EkcvLiMqsLCY/U8nIm03VW26
ua0kj+5477pEo9Hei4mfLtHCad94IX6UAv5xuh+90Xo9zxdWYA5sCv6SpXlG/5vy
VYF8yChIiQC3SBgs1ewALblkm2RsCU59p0/9mOHOeBYzaFnoOV66fHEawWwKF2qM
FLp6ZKpFEgxiRW9JpxhUw8Pv0hQx5FWN15FLLTPb/ss4Xo5uFRq8+0fDP8S5U9OT
T37sj1nej7adaSjRWkmrgclNggFyhMmoCO9jMWxO1dmWNtHB153xGWNUcd0v/P2+
FdjibQd79Wpq7aWbKPOQORU8rqshNusUVlge/KlvyufEne9EuOQVjGk/i2AEjU5y
f1DomdUbEBeGB2FE7w0YYquI0oBOLQvBBk/hQl5pW7rfMgFoU0WAXiZLaJhM0i81
RgbI5FH1rFZtsnJ3kG6HpNPcibK2seip6weNfgZZnDZCSOHiCZbuxi+WBLtupKng
8J+ZXoDjucBVRgrUQRz6Km62oTLJQ/6CcazqrKvLxERa0eB6SNOxZRd1XYNFKacn
xIyyyzQj1g==
=b84h
-----END PGP SIGNATURE-----
Merge tag 'io_uring-6.13-20242901' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:
- Remove a leftover struct from when the cqwait registered waiting was
transitioned to regions.
- Fix for an issue introduced in this merge window, where nop->fd might
be used uninitialized. Ensure it's always set.
- Add capping of the task_work run in local task_work mode, to prevent
bursty and long chains from adding too much latency.
- Work around xa_store() leaving ->head non-NULL if it encounters an
allocation error during storing. Just a debug trigger, and can go
away once xa_store() behaves in a more expected way for this
condition. Not a major thing as it basically requires fault injection
to trigger it.
- Fix a few mapping corner cases
- Fix KCSAN complaint on reading the table size post unlock. Again not
a "real" issue, but it's easy to silence by just keeping the reading
inside the lock that protects it.
* tag 'io_uring-6.13-20242901' of git://git.kernel.dk/linux:
io_uring/tctx: work around xa_store() allocation error issue
io_uring: fix corner case forgetting to vunmap
io_uring: fix task_work cap overshooting
io_uring: check for overflows in io_pin_pages
io_uring/nop: ensure nop->fd is always initialized
io_uring: limit local tw done
io_uring: add io_local_work_pending()
io_uring/region: return negative -E2BIG in io_create_region()
io_uring: protect register tracing
io_uring: remove io_uring_cqwait_reg_arg
* Fixes.
RISC-V:
* Svade and Svadu (accessed and dirty bit) extension support for host and
guest. This was acked on the mailing list by the RISC-V maintainer, see
https://patchew.org/linux/20240726084931.28924-1-yongxuan.wang@sifive.com/.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmdKS0QUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroP7hggAmt5CJesFGIuDwQgJX1KuNWAS84AX
Oq5SPZLH0XjE5YDm6AusSzvbtOhRM6mARU5/iqMRE6Mqpf4MXpP9tOo6xaDiL7+m
bOFsDYEO73WQyrIfFUCZ7dXiTbDVtQfNH8Z1yQwHPsa1d+WDYY3tLbCe5qCdqYMF
JDiB7K0cQzPDmhCwf3Zf8mW2ZRI0QsTqiuFUfVGGNgFDspWfBFBqkLCkrMNmbp9z
ye375oKb2VCe6OBJCY+Nl6tdoBUkz+CtZDCxkxuh0Uk4NmsUC9JMye9iwgU9DuI7
nagFuvpUGcgbZvrx1ly47TL+wcEFLwnBJ0xBZTGIgVoZHj/wX9GM+tSgIw==
=semZ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
- ARM fixes
- RISC-V Svade and Svadu (accessed and dirty bit) extension support for
host and guest
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: riscv: selftests: Add Svade and Svadu Extension to get-reg-list test
RISC-V: KVM: Add Svade and Svadu Extensions Support for Guest/VM
dt-bindings: riscv: Add Svade and Svadu Entries
RISC-V: Add Svade and Svadu Extensions Support
KVM: arm64: Use MDCR_EL2.HPME to evaluate overflow of hyp counters
KVM: arm64: Ignore PMCNTENSET_EL0 while checking for overflow status
KVM: arm64: Mark set_sysreg_masks() as inline to avoid build failure
KVM: arm64: vgic-its: Add stronger type-checking to the ITS entry sizes
KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition
KVM: arm64: vgic: Make vgic_get_irq() more robust
KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIR
- sh: intc: Fix use-after-free bug in register_intc_controller()
- sh: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEYv+KdYTgKVaVRgAGdCY7N/W1+RMFAmdLb80ACgkQdCY7N/W1
+RO+VhAAgv30ZrJlXytROsk8d8+565ttMHtv9KW+tPKJgEjGRx4G/nETQPwHpe7i
ueQNFwrL4w4vzy2CimMOVPOTTJGdF6BezXUB0LiNhNF+2X4EkhQ2Gf5yZ0CHddKU
VcJzglnDLbse9njZAx1RDZg2mysSnIlnP/clpnIcOlXpAtLp02JlfKytLb6db7lN
BOy9xjcIQac/ALQlJ2RGrd6RYzTtc4uok6Kt6K9mA893xyzvAXM2/vQDAAEPGzAW
aOzQALuiHlS0rvDXnPgBOEfPLtGcHJCS3QWIApGoLxqVcpd3sSsoCYI3Z4vGBm7M
J9Ng/VcfiA2QUuG4I5Z9fXT99dKqQhEJbDVpC7Y1T2zGsosGBR/CWOfF/DQGd7E2
2kmbPIw1ctqz1IZt4kmdoKMWJ0HvNRatK2870tajJz3B3B7wv6fPbGmVufiz8XJP
ErHfzHAKDHNJdAtmkpsllP/48tT5eIY2oykydlqtoBrGReFNGlGgEM4jNghHxy/D
yD7wGi7HtvavrJHMTWjuqv3YzFWrVk6U0juLgC/lM5durZzBKP6ABzz9aY0lARjJ
MunQD44p6EDh5x13fnjhbCGOQZfj4WIHs3VQSQp01EKthl4vDrMW66EssHJYI6ms
BSCWcA+R6XHgyTJbAR6DojqjJvzm5mZq77Xro+IqCrd8vtjwnpQ=
=8ud9
-----END PGP SIGNATURE-----
Merge tag 'sh-for-v6.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
"Two small fixes.
The first one by Huacai Chen addresses a runtime warning when
CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS are selected
which occurs because the cpuinfo code on sh incorrectly uses NR_CPUS
when iterating CPUs instead of the runtime limit nr_cpu_ids.
A second fix by Dan Carpenter fixes a use-after-free bug in
register_intc_controller() which occurred as a result of improper
error handling in the interrupt controller driver code when
registering an interrupt controller during plat_irq_setup() on sh"
* tag 'sh-for-v6.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: intc: Fix use-after-free bug in register_intc_controller()
sh: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
- Deselect ARCH_CORRECT_STACKTRACE_ON_KRETPROBE so that tests depending
on it don't run (and fail) on arm64
- Fix lockdep assert in the Arm SMMUv3 PMU driver
- Fix the port and device ID bits setting in the Arm CMN perf driver
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmdKK6QACgkQa9axLQDI
XvFvxg//SIcw9gQL+2POTH4taZ5Xk4lsUyCKygVINhO54RE1iKMAhsr5qar7YQtE
0/sxOdHACqO0BATNGQMbwhHYvn7RkfcqnoXihvLxOwPp8Nqdf4W55eOIwQCPfVaZ
p2N/Iy2SYEO044gIH6N8KzoFLjH7hwvXDtaWD6o+Fq9NEAwLMRn55QyIGfyM//V3
6igBD1vjPKwxX7qIIlNqsn1l80jk8weF5uWeU1g06lSFSSRNGQ9PGT7oTLmgiKrl
8daE8E+1++rKmelANNe4CZ2Ig9tFNonaE8jRFuEJMEu/mZ5AAMksGh1M7eaz/ODU
ov6AukUVFRr+EsfSYlARhEgBKEssuRSDRiYuwTYPGHhDmyf+wNSu0u9K3p99DdJv
Fu55u4weVShg/ooOY3HtPWm3WnSoaxO7L4OOVd6yxeF2McUyRI9qFrADkpr4D3Sv
9aCGX9UdIuuTbEuXf3jacMhPq97QP9KAshWNEmzo9OZze7vLcbDz3IVeTanHIUOg
PieUgcnLW0dOxPz+tXYzOs3Ebf0pU8nQY3Eyj3YP7GsJkJMQh3usNC2t/oYhsnyj
OsohGT97CyBAgGtdlGsiNknIKisD3LIwa4gkJ+4sFIvFr5jfdrfCTvl23DieI8bf
9YETifEaSrJ+R/wzP0cMUUAauBC0gxS4HMPXGETe02U+ziGt3LU=
=ZNzJ
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Deselect ARCH_CORRECT_STACKTRACE_ON_KRETPROBE so that tests depending
on it don't run (and fail) on arm64
- Fix lockdep assert in the Arm SMMUv3 PMU driver
- Fix the port and device ID bits setting in the Arm CMN perf driver
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
perf/arm-cmn: Ensure port and device id bits are set properly
perf/arm-smmuv3: Fix lockdep assert in ->event_init()
arm64: disable ARCH_CORRECT_STACKTRACE_ON_KRETPROBE tests
since 2024.07.26:
assorted minor bug fixes
assorted platform specific tweaks
initial RAPL PSYS (SysWatt) support
Signed-off-by: Len Brown <len.brown@intel.com>
Introduce the counter as a part of global, platform counters structure.
We open the counter for only one cpu, but otherwise treat it as an
ordinary RAPL counter, allowing for grouped perf read.
The counter is disabled by default, because it's interpretation may
require additional, platform specific information, making it unsuitable
for general use.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Add '+' to optstring when early scanning for --no-msr and --no-perf.
It causes option processing to stop as soon as a nonoption argument is
encountered, effectively skipping child's arguments.
Fixes: 3e4048466c ("tools/power turbostat: Add --no-msr option")
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Force the --no-perf early to prevent using it as a source. User asks for
raw values, but perf returns them relative to the opening of the file
descriptor.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
On some machines, the graphics device is enumerated as
/sys/class/drm/card1 instead of /sys/class/drm/card0. The current
implementation does not handle this scenario, resulting in the loss of
graphics C6 residency and frequency information.
Add support for /sys/class/drm/card1, ensuring that turbostat can
retrieve and display the graphics columns for these platforms.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Snapshots of the graphics sysfs knobs are taken based on file
descriptors. To optimize this process, open the files and cache the file
descriptors during the graphics probe phase. As a result, the previously
cached pathnames become redundant and are removed.
This change aims to streamline the code without altering its functionality.
No functional change intended.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Currently, there is an inconsistency in how graphics sysfs knobs are
accessed: graphics residency sysfs knobs are opened and closed for each
read, while graphics frequency sysfs knobs are opened once and remain
open until turbostat exits. This inconsistency is confusing and adds
unnecessary code complexity.
Consolidate the access method by opening the sysfs files once and
reusing the file pointers for subsequent accesses. This approach
simplifies the code and ensures a consistent method for accessing
graphics sysfs knobs.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The graphics sysfs knobs are read-only, making the use of fflush()
before reading them redundant.
Remove the unnecessary fflush() call.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
In various generations, platforms often share a majority of features,
diverging only in a few specific aspects. The current approach of using
hardcoded values in 'platform_features' structure fails to effectively
represent these divergences.
To improve the description of platform divergence:
1. Each newly introduced 'platform_features' structure must have a base,
typically derived from the previous generation.
2. Platform feature values should be inherited from the base structure
rather than being hardcoded.
This approach ensures a more accurate and maintainable representation of
platform-specific features across different generations.
Converts `adl_features` and `lnl_features` to follow this new scheme.
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Add initial support for GraniteRapids-D. It shares the same features
with SapphireRapids.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>