mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2024-12-29 09:13:38 +00:00
4bbf9020be
23502 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Leo Stone
|
d3ac65d274 |
mm: huge_memory: handle strsep not finding delimiter
split_huge_pages_write() does not handle the case where strsep finds no delimiter in the given string and sets the input buffer to NULL, which allows this reproducer to trigger a protection fault. Link: https://lkml.kernel.org/r/20241216042752.257090-2-leocstone@gmail.com Signed-off-by: Leo Stone <leocstone@gmail.com> Reported-by: syzbot+8a3da2f1bbf59227c289@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8a3da2f1bbf59227c289 Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Usama Arif
|
42b2eb6983 |
mm: convert partially_mapped set/clear operations to be atomic
Other page flags in the 2nd page, like PG_hwpoison and PG_anon_exclusive
can get modified concurrently. Changes to other page flags might be lost
if they are happening at the same time as non-atomic partially_mapped
operations. Hence, make partially_mapped operations atomic.
Link: https://lkml.kernel.org/r/20241212183351.1345389-1-usamaarif642@gmail.com
Fixes:
|
||
Matthew Wilcox (Oracle)
|
a2e740e216 |
vmalloc: fix accounting with i915
If the caller of vmap() specifies VM_MAP_PUT_PAGES (currently only the
i915 driver), we will decrement nr_vmalloc_pages and MEMCG_VMALLOC in
vfree(). These counters are incremented by vmalloc() but not by vmap() so
this will cause an underflow. Check the VM_MAP_PUT_PAGES flag before
decrementing either counter.
Link: https://lkml.kernel.org/r/20241211202538.168311-1-willy@infradead.org
Fixes:
|
||
David Hildenbrand
|
faeec8e23c |
mm/page_alloc: don't call pfn_to_page() on possibly non-existent PFN in split_large_buddy()
In split_large_buddy(), we might call pfn_to_page() on a PFN that might
not exist. In corner cases, such as when freeing the highest pageblock in
the last memory section, this could result with CONFIG_SPARSEMEM &&
!CONFIG_SPARSEMEM_EXTREME in __pfn_to_section() returning NULL and and
__section_mem_map_addr() dereferencing that NULL pointer.
Let's fix it, and avoid doing a pfn_to_page() call for the first
iteration, where we already have the page.
So far this was found by code inspection, but let's just CC stable as the
fix is easy.
Link: https://lkml.kernel.org/r/20241210093437.174413-1-david@redhat.com
Fixes:
|
||
Zi Yan
|
c51a4f11e6 |
mm: use clear_user_(high)page() for arch with special user folio handling
Some architectures have special handling after clearing user folios:
architectures, which set cpu_dcache_is_aliasing() to true, require
flushing dcache; arc, which sets cpu_icache_is_aliasing() to true, changes
folio->flags to make icache coherent to dcache. So __GFP_ZERO using only
clear_page() is not enough to zero user folios and clear_user_(high)page()
must be used. Otherwise, user data will be corrupted.
Fix it by always clearing user folios with clear_user_(high)page() when
cpu_dcache_is_aliasing() is true or cpu_icache_is_aliasing() is true.
Rename alloc_zeroed() to user_alloc_needs_zeroing() and invert the logic
to clarify its intend.
Link: https://lkml.kernel.org/r/20241209182326.2955963-2-ziy@nvidia.com
Fixes:
|
||
Petr Malat
|
31c5629920 |
mm: add RCU annotation to pte_offset_map(_lock)
RCU lock is taken by ___pte_offset_map() unless it returns NULL. Add this information to its inline callers to avoid sparse warning about context imbalance in pte_unmap(). Link: https://lkml.kernel.org/r/20241210000604.700710-1-oss@malat.biz Signed-off-by: Petr Malat <oss@malat.biz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Lorenzo Stoakes
|
42c4e4b20d |
mm: correctly reference merged VMA
On second merge attempt on mmap() we incorrectly discard the possibly
merged VMA, resulting in a possible use-after-free (and most certainly a
reference to the wrong VMA) in this instance in the subsequent
__mmap_complete() invocation.
Correct this mistake by reassigning vma correctly if a merge succeeds in
this case.
Link: https://lkml.kernel.org/r/20241206215229.244413-1-lorenzo.stoakes@oracle.com
Fixes:
|
||
Kefeng Wang
|
f5d09de9f1 |
mm: use aligned address in copy_user_gigantic_page()
In current kernel, hugetlb_wp() calls copy_user_large_folio() with the
fault address. Where the fault address may be not aligned with the huge
page size. Then, copy_user_large_folio() may call
copy_user_gigantic_page() with the address, while
copy_user_gigantic_page() requires the address to be huge page size
aligned. So, this may cause memory corruption or information leak,
addtional, use more obvious naming 'addr_hint' instead of 'addr' for
copy_user_gigantic_page().
Link: https://lkml.kernel.org/r/20241028145656.932941-2-wangkefeng.wang@huawei.com
Fixes:
|
||
Kefeng Wang
|
8aca2bc96c |
mm: use aligned address in clear_gigantic_page()
In current kernel, hugetlb_no_page() calls folio_zero_user() with the
fault address. Where the fault address may be not aligned with the huge
page size. Then, folio_zero_user() may call clear_gigantic_page() with
the address, while clear_gigantic_page() requires the address to be huge
page size aligned. So, this may cause memory corruption or information
leak, addtional, use more obvious naming 'addr_hint' instead of 'addr' for
clear_gigantic_page().
Link: https://lkml.kernel.org/r/20241028145656.932941-1-wangkefeng.wang@huawei.com
Fixes:
|
||
Hugh Dickins
|
dad2dc9c92 |
mm: shmem: fix ShmemHugePages at swapout
/proc/meminfo ShmemHugePages has been showing overlarge amounts (more than
Shmem) after swapping out THPs: we forgot to update NR_SHMEM_THPS.
Add shmem_update_stats(), to avoid repetition, and risk of making that
mistake again: the call from shmem_delete_from_page_cache() is the bugfix;
the call from shmem_replace_folio() is reassuring, but not really a bugfix
(replace corrects misplaced swapin readahead, but huge swapin readahead
would be a mistake).
Link: https://lkml.kernel.org/r/5ba477c8-a569-70b5-923e-09ab221af45b@google.com
Fixes:
|
||
Linus Torvalds
|
266facde83 |
slab fixes for 6.13-rc3
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmdb/84ACgkQu+CwddJF iJqEhQf+KHXQfZ/m7ma6bUEisjVOkUsxnpSj/DM1CP03+czqXoSEr+CHX2g1v0Os pZSOGGPTtvDgSBO0UZgo1P+sXyl0hrs2A/XrVQfKPmTfc9mPoxFTkbhI7Rv9hZfi 9OVh9kg3OE2hixD9ussGmxtFTXE3vXeu9zASa2sEdpakc3gxgeNhIklGK9r3oQuF 4aa33JpRV/oF80XAAUch5ltAO0tqQWtN50CbnoKc5cGRDuFRRdzXMMiL7gGiCctI te3sVMc6mnusdJqvp+AE10aT0iQpRWwL1lzGrgp3PuZ7DN8UQHxBIf8u8VIjyPML D2JZEYmqykKSvXbDzYBdOAF8XLEYCw== =PpHb -----END PGP SIGNATURE----- Merge tag 'slab-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: - Fix for memcg unreclaimable slab stats drift when post-charging large kmalloc allocations (Shakeel Butt) * tag 'slab-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: memcg: slub: fix SUnreclaim for post charged objects |
||
Shakeel Butt
|
b7ffecbe19 |
memcg: slub: fix SUnreclaim for post charged objects
Large kmalloc directly allocates from the page allocator and then use lruvec_stat_mod_folio() to increment the unreclaimable slab stats for global and memcg. However when post memcg charging of slab objects was added in commit |
||
Linus Torvalds
|
553c89ec31 |
24 hotfixes. 17 are cc:stable. 15 are MM and 9 are non-MM.
The usual bunch of singletons - please see the relevant changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ1U/QwAKCRDdBJ7gKXxA jnE7AQC0eyNNvaL5pLCIxN/Vmr8YeuWP1dldgI29TjrH/JKjSQEAihZNqVZYjoIT Gf7Y+IKnc4LbfAXcTe+MfJFeDexM5AU= =U5LQ -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "24 hotfixes. 17 are cc:stable. 15 are MM and 9 are non-MM. The usual bunch of singletons - please see the relevant changelogs for details" * tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits) iio: magnetometer: yas530: use signed integer type for clamp limits sched/numa: fix memory leak due to the overwritten vma->numab_state mm/damon: fix order of arguments in damos_before_apply tracepoint lib: stackinit: hide never-taken branch from compiler mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio() scatterlist: fix incorrect func name in kernel-doc mm: correct typo in MMAP_STATE() macro mm: respect mmap hint address when aligning for THP mm: memcg: declare do_memsw_account inline mm/codetag: swap tags when migrate pages ocfs2: update seq_file index in ocfs2_dlm_seq_next stackdepot: fix stack_depot_save_flags() in NMI context mm: open-code page_folio() in dump_page() mm: open-code PageTail in folio_flags() and const_folio_flags() mm: fix vrealloc()'s KASAN poisoning logic Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()" selftests/damon: add _damon_sysfs.py to TEST_FILES selftest: hugetlb_dio: fix test naming ocfs2: free inode when ocfs2_get_init_inode() fails nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry() ... |
||
Linus Torvalds
|
ddfc146ed5 |
memblock: restore check for node validity in arch_numa
Rework of NUMA initialization in arch_numa dropped a check that refused to accept configurations with invalid node IDs. Restore that check to ensure that when firmware passes invalid nodes, such configuration is rejected and kernel gracefully falls back to dummy NUMA. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmdSz9wQHHJwcHRAa2Vy bmVsLm9yZwAKCRA5A4Ymyw79kQPWCACSCwm7B8K0ctWbqGHsglCkMgF9pI/mUwjM 3c6zjzpsL5z0ii41cAEbDKWNTfroJddkWxZbDveHt3PytEYVM5ZvQL3tGwCfkpG8 wrAQSRE4XMv+ffA4LBB7U4xHxxEKtSc7OpqO3h4RED3T66hlFtKWMhiNYhl2mKwn 4ic7xLqoKj7Nu3hHc3014x/94tVWszgdgsZo+OJyPSxh+kwLdOVpwZWG22CT58UR nTVQu/a13XVFu8R11S3a4iDMTOqb5oBVRw2pnw+knChXFJ4r2Pr/pA8uneTWEAFB TiYclkH/0/eDd9Vpx5JTUQf4xPfuIXHynjQDwXYHWJ/U9jwLAwTH =h/KU -----END PGP SIGNATURE----- Merge tag 'fixes-2024-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fixes from Mike Rapoport: "Restore check for node validity in arch_numa. The rework of NUMA initialization in arch_numa dropped a check that refused to accept configurations with invalid node IDs. Restore that check to ensure that when firmware passes invalid nodes, such configuration is rejected and kernel gracefully falls back to dummy NUMA" * tag 'fixes-2024-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: arch_numa: Restore nid checks before registering a memblock with a node memblock: allow zero threshold in validate_numa_converage() |
||
David Hildenbrand
|
3203b3ab0f |
mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio()
The folio can get freed + buddy-merged + reallocated in the meantime,
resulting in us calling folio_test_locked() possibly on a tail page.
This makes const_folio_flags VM_BUG_ON_PGFLAGS() when stumbling over the
tail page.
Could this result in other issues? Doesn't look like it. False positives
and false negatives don't really matter, because this folio would get
skipped either way when detecting that they have been reallocated in the
meantime.
Fix it by performing the folio_test_locked() checked after grabbing a
reference. If this ever becomes a real problem, we could add a special
helper that racily checks if the bit is set even on tail pages ... but
let's hope that's not required so we can just handle it cleaner: work on
the folio after we hold a reference.
Do we really need the folio_test_locked() check if we are going to trylock
briefly after? Well, we can at least avoid a xas_reload().
It's a bit unclear which exact change introduced that issue. Likely, ever
since we made PG_locked obey to the PF_NO_TAIL policy it could have been
triggered in some way.
Link: https://lkml.kernel.org/r/20241129125303.4033164-1-david@redhat.com
Fixes:
|
||
Lorenzo Stoakes
|
cbb70e4534 |
mm: correct typo in MMAP_STATE() macro
We mistakenly refer to len rather than len_ here. The only existing caller passes len to the len_ parameter so this has no impact on the code, but it is obviously incorrect to do this, so fix it. Link: https://lkml.kernel.org/r/20241118175414.390827-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Kalesh Singh
|
249608ee47 |
mm: respect mmap hint address when aligning for THP
Commit |
||
John Sperbeck
|
89dd878282 |
mm: memcg: declare do_memsw_account inline
In commit |
||
David Wang
|
51f43d5d82 |
mm/codetag: swap tags when migrate pages
Current solution to adjust codetag references during page migration is
done in 3 steps:
1. sets the codetag reference of the old page as empty (not pointing
to any codetag);
2. subtracts counters of the new page to compensate for its own
allocation;
3. sets codetag reference of the new page to point to the codetag of
the old page.
This does not work if CONFIG_MEM_ALLOC_PROFILING_DEBUG=n because
set_codetag_empty() becomes NOOP. Instead, let's simply swap codetag
references so that the new page is referencing the old codetag and the old
page is referencing the new codetag. This way accounting stays valid and
the logic makes more sense.
Link: https://lkml.kernel.org/r/20241129025213.34836-1-00107082@163.com
Fixes:
|
||
Matthew Wilcox (Oracle)
|
6a7de1bf21 |
mm: open-code page_folio() in dump_page()
page_folio() calls page_fixed_fake_head() which will misidentify this page
as being a fake head and load off the end of 'precise'. We may have a
pointer to a fake head, but that's OK because it contains the right
information for dump_page().
gcc-15 is smart enough to catch this with -Warray-bounds:
In function 'page_fixed_fake_head',
inlined from '_compound_head' at ../include/linux/page-flags.h:251:24,
inlined from '__dump_page' at ../mm/debug.c:123:11:
../include/asm-generic/rwonce.h:44:26: warning: array subscript 9 is outside
+array bounds of 'struct page[1]' [-Warray-bounds=]
Link: https://lkml.kernel.org/r/20241125201721.2963278-2-willy@infradead.org
Fixes:
|
||
Andrii Nakryiko
|
d699440f58 |
mm: fix vrealloc()'s KASAN poisoning logic
When vrealloc() reuses already allocated vmap_area, we need to re-annotate
poisoned and unpoisoned portions of underlying memory according to the new
size.
This results in a KASAN splat recorded at [1]. A KASAN mis-reporting
issue where there is none.
Note, hard-coding KASAN_VMALLOC_PROT_NORMAL might not be exactly correct,
but KASAN flag logic is pretty involved and spread out throughout
__vmalloc_node_range_noprof(), so I'm using the bare minimum flag here and
leaving the rest to mm people to refactor this logic and reuse it here.
Link: https://lkml.kernel.org/r/20241126005206.3457974-1-andrii@kernel.org
Link: https://lore.kernel.org/bpf/67450f9b.050a0220.21d33d.0004.GAE@google.com/ [1]
Fixes:
|
||
Jan Kara
|
a220d6b95b |
Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"
This reverts commit |
||
Jared Kangas
|
e30a0361b8 |
kasan: make report_lock a raw spinlock
If PREEMPT_RT is enabled, report_lock is a sleeping spinlock and must not
be locked when IRQs are disabled. However, KASAN reports may be triggered
in such contexts. For example:
char *s = kzalloc(1, GFP_KERNEL);
kfree(s);
local_irq_disable();
char c = *s; /* KASAN report here leads to spin_lock() */
local_irq_enable();
Make report_spinlock a raw spinlock to prevent rescheduling when
PREEMPT_RT is enabled.
Link: https://lkml.kernel.org/r/20241119210234.1602529-1-jkangas@redhat.com
Fixes:
|
||
David Hildenbrand
|
091c1dd2d4 |
mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM
We currently assume that there is at least one VMA in a MM, which isn't
true.
So we might end up having find_vma() return NULL, to then de-reference
NULL. So properly handle find_vma() returning NULL.
This fixes the report:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
Code: ...
RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
__do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
__x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[akpm@linux-foundation.org: add unlikely()]
Link: https://lkml.kernel.org/r/20241120201151.9518-1-david@redhat.com
Fixes:
|
||
John Hubbard
|
a1268be280 |
mm/gup: handle NULL pages in unpin_user_pages()
The recent addition of "pofs" (pages or folios) handling to gup has a
flaw: it assumes that unpin_user_pages() handles NULL pages in the pages**
array. That's not the case, as I discovered when I ran on a new
configuration on my test machine.
Fix this by skipping NULL pages in unpin_user_pages(), just like
unpin_folios() already does.
Details: when booting on x86 with "numa=fake=2 movablecore=4G" on Linux
6.12, and running this:
tools/testing/selftests/mm/gup_longterm
...I get the following crash:
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:sanity_check_pinned_pages+0x3a/0x2d0
...
Call Trace:
<TASK>
? __die_body+0x66/0xb0
? page_fault_oops+0x30c/0x3b0
? do_user_addr_fault+0x6c3/0x720
? irqentry_enter+0x34/0x60
? exc_page_fault+0x68/0x100
? asm_exc_page_fault+0x22/0x30
? sanity_check_pinned_pages+0x3a/0x2d0
unpin_user_pages+0x24/0xe0
check_and_migrate_movable_pages_or_folios+0x455/0x4b0
__gup_longterm_locked+0x3bf/0x820
? mmap_read_lock_killable+0x12/0x50
? __pfx_mmap_read_lock_killable+0x10/0x10
pin_user_pages+0x66/0xa0
gup_test_ioctl+0x358/0xb20
__se_sys_ioctl+0x6b/0xc0
do_syscall_64+0x7b/0x150
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Link: https://lkml.kernel.org/r/20241121034933.77502-1-jhubbard@nvidia.com
Fixes:
|
||
Peter Zijlstra
|
cdd30ebb1b |
module: Convert symbol namespace to string literal
Clean up the existing export namespace code along the same lines of
commit
|
||
Mike Rapoport (Microsoft)
|
9cdc6423ac |
memblock: allow zero threshold in validate_numa_converage()
Currently memblock validate_numa_converage() returns false negative when threshold set to zero. Make the check if the memory size with invalid node ID is greater than the threshold exclusive to fix that. Link: https://lore.kernel.org/all/Z0mIDBD4KLyxyOCm@kernel.org/ Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> |
||
Linus Torvalds
|
6a34dfa15d |
Kbuild updates for v6.13
- Add generic support for built-in boot DTB files - Enable TAB cycling for dialog buttons in nconfig - Fix issues in streamline_config.pl - Refactor Kconfig - Add support for Clang's AutoFDO (Automatic Feedback-Directed Optimization) - Add support for Clang's Propeller, a profile-guided optimization. - Change the working directory to the external module directory for M= builds - Support building external modules in a separate output directory - Enable objtool for *.mod.o and additional kernel objects - Use lz4 instead of deprecated lz4c - Work around a performance issue with "git describe" - Refactor modpost -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmdKGgEVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGrFoQAIgioJPRG+HC6bGmjy4tL4bq1RAm 78nbD12grrAa+NvQGRZYRs264rWxBGwrNfGGS9BDvlWJZ3fmKEuPlfCIxC0nkKk8 LVLNxSVvgpHE47RQ+E4V+yYhrlZKb4aWZjH3ZICn7vxRgbQ5Veq61aatluVHyn9c I8g+APYN/S1A4JkFzaLe8GV7v5OM3+zGSn3M9n7/DxVkoiNrMOXJm5hRdRgYfEv/ kMppheY2PPshZsaL+yLAdrJccY5au5vYE/v8wHkMtvM/LF6YwjgqPVDRFQ30BuLM sAMMd6AUoopiDZQOpqmXYukU0b0MQPswg3jmB+PWUBrlsuydRa8kkyPwUaFrDd+w 9d0jZRc8/O/lxUdD1AefRkNcA/dIJ4lTPr+2NpxwHuj2UFo0gLQmtjBggMFHaWvs 0NQRBPlxfOE4+Htl09gyg230kHuWq+rh7xqbyJCX+hBOaZ6kI2lryl6QkgpAoS+x KDOcUKnsgGMGARQRrvCOAXnQs+rjkW8fEm6t7eSBFPuWJMK85k4LmxOog8GVYEdM JcwCnCHt9TtcHlSxLRnVXj2aqGTFNLJXE1aLyCp9u8MxZ7qcx01xOuCmwp6FRzNq ghal7ngA58Y/S4K/oJ+CW2KupOb6CFne8mpyotpYeWI7MR64t0YWs4voZkuK46E2 CEBfA4PDehA4BxQe =GDKD -----END PGP SIGNATURE----- Merge tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add generic support for built-in boot DTB files - Enable TAB cycling for dialog buttons in nconfig - Fix issues in streamline_config.pl - Refactor Kconfig - Add support for Clang's AutoFDO (Automatic Feedback-Directed Optimization) - Add support for Clang's Propeller, a profile-guided optimization. - Change the working directory to the external module directory for M= builds - Support building external modules in a separate output directory - Enable objtool for *.mod.o and additional kernel objects - Use lz4 instead of deprecated lz4c - Work around a performance issue with "git describe" - Refactor modpost * tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (85 commits) kbuild: rename .tmp_vmlinux.kallsyms0.syms to .tmp_vmlinux0.syms gitignore: Don't ignore 'tags' directory kbuild: add dependency from vmlinux to resolve_btfids modpost: replace tdb_hash() with hash_str() kbuild: deb-pkg: add python3:native to build dependency genksyms: reduce indentation in export_symbol() modpost: improve error messages in device_id_check() modpost: rename alias symbol for MODULE_DEVICE_TABLE() modpost: rename variables in handle_moddevtable() modpost: move strstarts() to modpost.h modpost: convert do_usb_table() to a generic handler modpost: convert do_of_table() to a generic handler modpost: convert do_pnp_device_entry() to a generic handler modpost: convert do_pnp_card_entries() to a generic handler modpost: call module_alias_printf() from all do_*_entry() functions modpost: pass (struct module *) to do_*_entry() functions modpost: remove DEF_FIELD_ADDR_VAR() macro modpost: deduplicate MODULE_ALIAS() for all drivers modpost: introduce module_alias_printf() helper modpost: remove unnecessary check in do_acpi_entry() ... |
||
Linus Torvalds
|
ab952fc5c7 |
memblock: updates for 6.13-rc1
* replace hardcoded strings with str_on_off() in report_meminit() * initialize reserved pages to MIGRATE_MOVABLE when deferred struct page initialization is enabled so that if the reserved pages are freed they are put on movable free lists like it is done now when deferred struct page initialization is disabled -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmdG0cAQHHJwcHRAa2Vy bmVsLm9yZwAKCRA5A4Ymyw79kV3JCACjcw5F3plRqPOUYcHpbFT0gVq787CHQTke szhLrRLcSaCMf9P7sjlLdlAkB0SVR64oghCLxa4s8IxkOSBk52WJneUimkN8smsI VAm5K7YPZtC+W4BHmGmrXdMeV6XWzV7RJ/SMxyD3oBd+zpnzevXmxRTglKq+MRok t3WCT0cSb7FFORraqrexKg4zcEEYCad1swDmpSlHBzYFnC05C5qFgEW4hnqsEc1x dikstJujVxnfxFwIu3L2yRLjB2Ti4EDMsh1Z5HjjfE89wm/PRW0YL0bsJ+RG+7t4 GYpG/KxhnAZeEEcOdwYxZKWxqyTgBXYvVyL43Yizw1BGSIyLvOjn =mv+T -----END PGP SIGNATURE----- Merge tag 'memblock-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock updates from Mike Rapoport: - replace hardcoded strings with str_on_off() in report_meminit() - initialize reserved pages to MIGRATE_MOVABLE when deferred struct page initialization is enabled so that if the reserved pages are freed they are put on movable free lists like it is done now when deferred struct page initialization is disabled * tag 'memblock-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock: uniformly initialize all reserved pages to MIGRATE_MOVABLE mm: Use str_on_off() helper function in report_meminit() |
||
Masahiro Yamada
|
dbefa1f31a |
Rename .data.once to .data..once to fix resetting WARN*_ONCE
Commit |
||
Linus Torvalds
|
798bb342e0 |
Rust changes for v6.13
Toolchain and infrastructure: - Enable a series of lints, including safety-related ones, e.g. the compiler will now warn about missing safety comments, as well as unnecessary ones. How safety documentation is organized is a frequent source of review comments, thus having the compiler guide new developers on where they are expected (and where not) is very nice. - Start using '#[expect]': an interesting feature in Rust (stabilized in 1.81.0) that makes the compiler warn if an expected warning was _not_ emitted. This is useful to avoid forgetting cleaning up locally ignored diagnostics ('#[allow]'s). - Introduce '.clippy.toml' configuration file for Clippy, the Rust linter, which will allow us to tweak its behaviour. For instance, our first use cases are declaring a disallowed macro and, more importantly, enabling the checking of private items. - Lints-related fixes and cleanups related to the items above. - Migrate from 'receiver_trait' to 'arbitrary_self_types': to get the kernel into stable Rust, one of the major pieces of the puzzle is the support to write custom types that can be used as 'self', i.e. as receivers, since the kernel needs to write types such as 'Arc' that common userspace Rust would not. 'arbitrary_self_types' has been accepted to become stable, and this is one of the steps required to get there. - Remove usage of the 'new_uninit' unstable feature. - Use custom C FFI types. Includes a new 'ffi' crate to contain our custom mapping, instead of using the standard library 'core::ffi' one. The actual remapping will be introduced in a later cycle. - Map '__kernel_{size_t,ssize_t,ptrdiff_t}' to 'usize'/'isize' instead of 32/64-bit integers. - Fix 'size_t' in bindgen generated prototypes of C builtins. - Warn on bindgen < 0.69.5 and libclang >= 19.1 due to a double issue in the projects, which we managed to trigger with the upcoming tracepoint support. It includes a build test since some distributions backported the fix (e.g. Debian -- thanks!). All major distributions we list should be now OK except Ubuntu non-LTS. 'macros' crate: - Adapt the build system to be able run the doctests there too; and clean up and enable the corresponding doctests. 'kernel' crate: - Add 'alloc' module with generic kernel allocator support and remove the dependency on the Rust standard library 'alloc' and the extension traits we used to provide fallible methods with flags. Add the 'Allocator' trait and its implementations '{K,V,KV}malloc'. Add the 'Box' type (a heap allocation for a single value of type 'T' that is also generic over an allocator and considers the kernel's GFP flags) and its shorthand aliases '{K,V,KV}Box'. Add 'ArrayLayout' type. Add 'Vec' (a contiguous growable array type) and its shorthand aliases '{K,V,KV}Vec', including iterator support. For instance, now we may write code such as: let mut v = KVec::new(); v.push(1, GFP_KERNEL)?; assert_eq!(&v, &[1]); Treewide, move as well old users to these new types. - 'sync' module: add global lock support, including the 'GlobalLockBackend' trait; the 'Global{Lock,Guard,LockedBy}' types and the 'global_lock!' macro. Add the 'Lock::try_lock' method. - 'error' module: optimize 'Error' type to use 'NonZeroI32' and make conversion functions public. - 'page' module: add 'page_align' function. - Add 'transmute' module with the existing 'FromBytes' and 'AsBytes' traits. - 'block::mq::request' module: improve rendered documentation. - 'types' module: extend 'Opaque' type documentation and add simple examples for the 'Either' types. drm/panic: - Clean up a series of Clippy warnings. Documentation: - Add coding guidelines for lints and the '#[expect]' feature. - Add Ubuntu to the list of distributions in the Quick Start guide. MAINTAINERS: - Add Danilo Krummrich as maintainer of the new 'alloc' module. And a few other small cleanups and fixes. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmdFMIgACgkQGXyLc2ht IW16hQ/+KX/jmdGoXtNXx7T6yG6SJ/txPOieGAWfhBf6C3bqkGrU9Gnw/O3VWrxf eyj1QLQaIVUkmumWCefeiy9u3xRXx5fpS0tWJOjUtxC5NcS7vCs0AHQs1skIa6H+ YD6UKDPOy7CB5fVYqo13B5xnFAlciU0dLo6IGB6bB/lSpCudGLE9+nukfn5H3/R1 DTc3/fbSoYQU6Ij/FKscB+D/A7ojdYaReodhbzNw1lChg1MrJlCpqoQvHPE8ijg+ UDljHFFvgKdhSQL9GTa3LC7X4DsnihMWzXt14m6mMOqBa6TqF47WUhhgC77pHEI2 v/Yy8MLq0pdIzT1wFjsqs6opuvXc7K5Yk5Y60HDsDyIyjk2xgOjh6ZlD0EV161gS 7w1NtaKd/Cn7hnL7Ua51yJDxJTMllne3fTWemhs3Zd63j7ham98yOoiw+6L2QaM4 C9nW48vfUuTwDuYJ5HU0uSugubuHW3Ng5JEvMcvd4QjmaI1bQNkgVzefR5j3dLw8 9kEOTzJoxHpu5B7PZVTEd/L95hlmk1csSQObxi7JYCCimMkusF1S+heBzV/SqWD5 5ioEhCnSKE05fhQs0Uxns1HkcFle8Bn6r3aSAWV6yaR8oF94yHcuaZRUKxKMHw+1 cmBE2X8Yvtldw+CYDwEGWjKDtwOStbqk+b/ZzP7f7/p56QH9lSg= =Kn7b -----END PGP SIGNATURE----- Merge tag 'rust-6.13' of https://github.com/Rust-for-Linux/linux Pull rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Enable a series of lints, including safety-related ones, e.g. the compiler will now warn about missing safety comments, as well as unnecessary ones. How safety documentation is organized is a frequent source of review comments, thus having the compiler guide new developers on where they are expected (and where not) is very nice. - Start using '#[expect]': an interesting feature in Rust (stabilized in 1.81.0) that makes the compiler warn if an expected warning was _not_ emitted. This is useful to avoid forgetting cleaning up locally ignored diagnostics ('#[allow]'s). - Introduce '.clippy.toml' configuration file for Clippy, the Rust linter, which will allow us to tweak its behaviour. For instance, our first use cases are declaring a disallowed macro and, more importantly, enabling the checking of private items. - Lints-related fixes and cleanups related to the items above. - Migrate from 'receiver_trait' to 'arbitrary_self_types': to get the kernel into stable Rust, one of the major pieces of the puzzle is the support to write custom types that can be used as 'self', i.e. as receivers, since the kernel needs to write types such as 'Arc' that common userspace Rust would not. 'arbitrary_self_types' has been accepted to become stable, and this is one of the steps required to get there. - Remove usage of the 'new_uninit' unstable feature. - Use custom C FFI types. Includes a new 'ffi' crate to contain our custom mapping, instead of using the standard library 'core::ffi' one. The actual remapping will be introduced in a later cycle. - Map '__kernel_{size_t,ssize_t,ptrdiff_t}' to 'usize'/'isize' instead of 32/64-bit integers. - Fix 'size_t' in bindgen generated prototypes of C builtins. - Warn on bindgen < 0.69.5 and libclang >= 19.1 due to a double issue in the projects, which we managed to trigger with the upcoming tracepoint support. It includes a build test since some distributions backported the fix (e.g. Debian -- thanks!). All major distributions we list should be now OK except Ubuntu non-LTS. 'macros' crate: - Adapt the build system to be able run the doctests there too; and clean up and enable the corresponding doctests. 'kernel' crate: - Add 'alloc' module with generic kernel allocator support and remove the dependency on the Rust standard library 'alloc' and the extension traits we used to provide fallible methods with flags. Add the 'Allocator' trait and its implementations '{K,V,KV}malloc'. Add the 'Box' type (a heap allocation for a single value of type 'T' that is also generic over an allocator and considers the kernel's GFP flags) and its shorthand aliases '{K,V,KV}Box'. Add 'ArrayLayout' type. Add 'Vec' (a contiguous growable array type) and its shorthand aliases '{K,V,KV}Vec', including iterator support. For instance, now we may write code such as: let mut v = KVec::new(); v.push(1, GFP_KERNEL)?; assert_eq!(&v, &[1]); Treewide, move as well old users to these new types. - 'sync' module: add global lock support, including the 'GlobalLockBackend' trait; the 'Global{Lock,Guard,LockedBy}' types and the 'global_lock!' macro. Add the 'Lock::try_lock' method. - 'error' module: optimize 'Error' type to use 'NonZeroI32' and make conversion functions public. - 'page' module: add 'page_align' function. - Add 'transmute' module with the existing 'FromBytes' and 'AsBytes' traits. - 'block::mq::request' module: improve rendered documentation. - 'types' module: extend 'Opaque' type documentation and add simple examples for the 'Either' types. drm/panic: - Clean up a series of Clippy warnings. Documentation: - Add coding guidelines for lints and the '#[expect]' feature. - Add Ubuntu to the list of distributions in the Quick Start guide. MAINTAINERS: - Add Danilo Krummrich as maintainer of the new 'alloc' module. And a few other small cleanups and fixes" * tag 'rust-6.13' of https://github.com/Rust-for-Linux/linux: (82 commits) rust: alloc: Fix `ArrayLayout` allocations docs: rust: remove spurious item in `expect` list rust: allow `clippy::needless_lifetimes` rust: warn on bindgen < 0.69.5 and libclang >= 19.1 rust: use custom FFI integer types rust: map `__kernel_size_t` and friends also to usize/isize rust: fix size_t in bindgen prototypes of C builtins rust: sync: add global lock support rust: macros: enable the rest of the tests rust: macros: enable paste! use from macro_rules! rust: enable macros::module! tests rust: kbuild: expand rusttest target for macros rust: types: extend `Opaque` documentation rust: block: fix formatting of `kernel::block::mq::request` module rust: macros: fix documentation of the paste! macro rust: kernel: fix THIS_MODULE header path in ThisModule doc comment rust: page: add Rust version of PAGE_ALIGN rust: helpers: remove unnecessary header includes rust: exports: improve grammar in commentary drm/panic: allow verbose version check ... |
||
Linus Torvalds
|
fb527fc1f3 |
fuse update for 6.13
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZ0Rb/wAKCRDh3BK/laaZ PK80AQDAUgA6S5SSrbJxwRFNOhbwtZxZqJ8fomJR5xuWIEQ9pwEAkpFqhBhBW0y1 0YaREow2aDANQQtSUrfPtgva1ZXFwQU= =Cyx5 -----END PGP SIGNATURE----- Merge tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add page -> folio conversions (Joanne Koong, Josef Bacik) - Allow max size of fuse requests to be configurable with a sysctl (Joanne Koong) - Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun) - Fix large kernel reads (like a module load) in virtio_fs (Hou Tao) - Fix attribute inconsistency in case readdirplus (and plain lookup in corner cases) is racing with inode eviction (Zhang Tianci) - Fix a WARN_ON triggered by virtio_fs (Asahi Lina) * tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits) virtiofs: dax: remove ->writepages() callback fuse: check attributes staleness on fuse_iget() fuse: remove pages for requests and exclusively use folios fuse: convert direct io to use folios mm/writeback: add folio_mark_dirty_lock() fuse: convert writebacks to use folios fuse: convert retrieves to use folios fuse: convert ioctls to use folios fuse: convert writes (non-writeback) to use folios fuse: convert reads to use folios fuse: convert readdir to use folios fuse: convert readlink to use folios fuse: convert cuse to use folios fuse: add support in virtio for requests using folios fuse: support folios in struct fuse_args_pages and fuse_copy_pages() fuse: convert fuse_notify_store to use folios fuse: convert fuse_retrieve to use folios fuse: use the folio based vmstat helpers fuse: convert fuse_writepage_need_send to take a folio fuse: convert fuse_do_readpage to use folios ... |
||
Linus Torvalds
|
e06635e26c |
slab updates for 6.13
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmdERvEACgkQu+CwddJF iJre6Af9EBMVQiWJrmoMOjbGLqLgmZzSXRNxR862WGn4D/wesA1HmSlWgEn54hgc GIYIeD++v4JaIRNH0yZqb2UBSKjF/rYPDkKstnqgFaVakLoDrwkkwV2n3Gk5BEgR m/SzLGgoDWKR65I/oMpL6e2KrMOfMfjpB31qiVvdlaQd2Nv/5rw+gUVylxhNIZEH W11N3IC+e9hmgT3ZBpTmHeqNrlXE1+USWPrp/HV05Ndz6yf97JnP4Wr9f9pcyN3R aflLHR38+Q9cCfO7y8wNqtYvIV/kbqgdaqD76frSgalC4Lmz9+L+TZ2NuENCPoGj Xdbip2z+iffWhvqM+qooOLVxR0XqTA== =Sepb -----END PGP SIGNATURE----- Merge tag 'slab-for-6.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - Add new slab_strict_numa boot parameter to enforce per-object memory policies on top of slab folio policies, for systems where saving cost of remote accesses is more important than minimizing slab allocation overhead (Christoph Lameter) - Fix for freeptr_offset alignment check being too strict for m68k (Geert Uytterhoeven) - krealloc() fixes for not violating __GFP_ZERO guarantees on krealloc() when slub_debug (redzone and object tracking) is enabled (Feng Tang) - Fix a memory leak in case sysfs registration fails for a slab cache, and also no longer fail to create the cache in that case (Hyeonggon Yoo) - Fix handling of detected consistency problems (due to buggy slab user) with slub_debug enabled, so that it does not cause further list corruption bugs (yuan.gao) - Code cleanup and kerneldocs polishing (Zhen Lei, Vlastimil Babka) * tag 'slab-for-6.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: slab: Fix too strict alignment check in create_cache() mm/slab: Allow cache creation to proceed even if sysfs registration fails mm/slub: Avoid list corruption when removing a slab from the full list mm/slub, kunit: Add testcase for krealloc redzone and zeroing mm/slub: Improve redzone check and zeroing for krealloc() mm/slub: Consider kfence case for get_orig_size() SLUB: Add support for per object memory policies mm, slab: add kerneldocs for common SLAB_ flags mm/slab: remove duplicate check in create_cache() mm/slub: Move krealloc() and related code to slub.c mm/kasan: Don't store metadata inside kmalloc object when slub_debug_orig_size is on |
||
Linus Torvalds
|
f5f4745a7f |
- The series "resource: A couple of cleanups" from Andy Shevchenko
performs some cleanups in the resource management code. - The series "Improve the copy of task comm" from Yafang Shao addresses possible race-induced overflows in the management of task_struct.comm[]. - The series "Remove unnecessary header includes from {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a small fix to the list_sort library code and to its selftest. - The series "Enhance min heap API with non-inline functions and optimizations" also from Kuan-Wei Chiu optimizes and cleans up the min_heap library code. - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi finishes off nilfs2's folioification. - The series "add detect count for hung tasks" from Lance Yang adds more userspace visibility into the hung-task detector's activity. - Apart from that, singelton patches in many places - please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ0L6lQAKCRDdBJ7gKXxA jmEIAPwMSglNPKRIOgzOvHh8MUJW1Dy8iKJ2kWCO3f6QTUIM2AEA+PazZbUd/g2m Ii8igH0UBibIgva7MrCyJedDI1O23AA= =8BIU -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - The series "resource: A couple of cleanups" from Andy Shevchenko performs some cleanups in the resource management code - The series "Improve the copy of task comm" from Yafang Shao addresses possible race-induced overflows in the management of task_struct.comm[] - The series "Remove unnecessary header includes from {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a small fix to the list_sort library code and to its selftest - The series "Enhance min heap API with non-inline functions and optimizations" also from Kuan-Wei Chiu optimizes and cleans up the min_heap library code - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi finishes off nilfs2's folioification - The series "add detect count for hung tasks" from Lance Yang adds more userspace visibility into the hung-task detector's activity - Apart from that, singelton patches in many places - please see the individual changelogs for details * tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) gdb: lx-symbols: do not error out on monolithic build kernel/reboot: replace sprintf() with sysfs_emit() lib: util_macros_kunit: add kunit test for util_macros.h util_macros.h: fix/rework find_closest() macros Improve consistency of '#error' directive messages ocfs2: fix uninitialized value in ocfs2_file_read_iter() hung_task: add docs for hung_task_detect_count hung_task: add detect count for hung tasks dma-buf: use atomic64_inc_return() in dma_buf_getfile() fs/proc/kcore.c: fix coccinelle reported ERROR instances resource: avoid unnecessary resource tree walking in __region_intersects() ocfs2: remove unused errmsg function and table ocfs2: cluster: fix a typo lib/scatterlist: use sg_phys() helper checkpatch: always parse orig_commit in fixes tag nilfs2: convert metadata aops from writepage to writepages nilfs2: convert nilfs_recovery_copy_block() to take a folio nilfs2: convert nilfs_page_count_clean_buffers() to take a folio nilfs2: remove nilfs_writepage nilfs2: convert checkpoint file to be folio-based ... |
||
Linus Torvalds
|
5c00ff742b |
- The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq xyyenpibWgUoShU2wZ/Ha8FE5WDINwg= =JfWR -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. * tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits) cma: enforce non-zero pageblock_order during cma_init_reserved_mem() mm/kfence: add a new kunit test test_use_after_free_read_nofault() zram: fix NULL pointer in comp_algorithm_show() memcg/hugetlb: add hugeTLB counters to memcg vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount zram: ZRAM_DEF_COMP should depend on ZRAM MAINTAINERS/MEMORY MANAGEMENT: add document files for mm Docs/mm/damon: recommend academic papers to read and/or cite mm: define general function pXd_init() kmemleak: iommu/iova: fix transient kmemleak false positive mm/list_lru: simplify the list_lru walk callback function mm/list_lru: split the lock to per-cgroup scope mm/list_lru: simplify reparenting and initial allocation mm/list_lru: code clean up for reparenting mm/list_lru: don't export list_lru_add mm/list_lru: don't pass unnecessary key parameters kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols ... |
||
Linus Torvalds
|
341d041daa |
iommufd 6.13 merge window pull
Several new features and uAPI for iommufd: - IOMMU_IOAS_MAP_FILE allows passing in a file descriptor as the backing memory for an iommu mapping. To date VFIO/iommufd have used VMA's and pin_user_pages(), this now allows using memfds and memfd_pin_folios(). Notably this creates a pure folio path from the memfd to the iommu page table where memory is never broken down to PAGE_SIZE. - IOMMU_IOAS_CHANGE_PROCESS moves the pinned page accounting between two processes. Combined with the above this allows iommufd to support a VMM re-start using exec() where something like qemu would exec() a new version of itself and fd pass the memfds/iommufd/etc to the new process. The memfd allows DMA access to the memory to continue while the new process is getting setup, and the CHANGE_PROCESS updates all the accounting. - Support for fault reporting to userspace on non-PRI HW, such as ARM stall-mode embedded devices. - IOMMU_VIOMMU_ALLOC introduces the concept of a HW/driver backed virtual iommu. This will be used by VMMs to access hardware features that are contained with in a VM. The first use is to inform the kernel of the virtual SID to physical SID mapping when issuing SID based invalidation on ARM. Further uses will tie HW features that are directly accessed by the VM, such as invalidation queue assignment and others. - IOMMU_VDEVICE_ALLOC informs the kernel about the mapping of virtual device to physical device within a VIOMMU. Minimially this is used to translate VM issued cache invalidation commands from virtual to physical device IDs. - Enhancements to IOMMU_HWPT_INVALIDATE and IOMMU_HWPT_ALLOC to work with the VIOMMU - ARM SMMuv3 support for nested translation. Using the VIOMMU and VDEVICE the driver can model this HW's behavior for nested translation. This includes a shared branch from Will. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZzzKKwAKCRCFwuHvBreF YaCMAQDOQAgw87eUYKnY7vFodlsTUA2E8uSxDmk6nPWySd0NKwD/flOP85MdEs9O Ot+RoL4/J3IyNH+eg5kN68odmx4mAw8= =ec8x -----END PGP SIGNATURE----- Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Several new features and uAPI for iommufd: - IOMMU_IOAS_MAP_FILE allows passing in a file descriptor as the backing memory for an iommu mapping. To date VFIO/iommufd have used VMA's and pin_user_pages(), this now allows using memfds and memfd_pin_folios(). Notably this creates a pure folio path from the memfd to the iommu page table where memory is never broken down to PAGE_SIZE. - IOMMU_IOAS_CHANGE_PROCESS moves the pinned page accounting between two processes. Combined with the above this allows iommufd to support a VMM re-start using exec() where something like qemu would exec() a new version of itself and fd pass the memfds/iommufd/etc to the new process. The memfd allows DMA access to the memory to continue while the new process is getting setup, and the CHANGE_PROCESS updates all the accounting. - Support for fault reporting to userspace on non-PRI HW, such as ARM stall-mode embedded devices. - IOMMU_VIOMMU_ALLOC introduces the concept of a HW/driver backed virtual iommu. This will be used by VMMs to access hardware features that are contained with in a VM. The first use is to inform the kernel of the virtual SID to physical SID mapping when issuing SID based invalidation on ARM. Further uses will tie HW features that are directly accessed by the VM, such as invalidation queue assignment and others. - IOMMU_VDEVICE_ALLOC informs the kernel about the mapping of virtual device to physical device within a VIOMMU. Minimially this is used to translate VM issued cache invalidation commands from virtual to physical device IDs. - Enhancements to IOMMU_HWPT_INVALIDATE and IOMMU_HWPT_ALLOC to work with the VIOMMU - ARM SMMuv3 support for nested translation. Using the VIOMMU and VDEVICE the driver can model this HW's behavior for nested translation. This includes a shared branch from Will" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (51 commits) iommu/arm-smmu-v3: Import IOMMUFD module namespace iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest iommufd: Add IOMMU_IOAS_CHANGE_PROCESS iommufd: Lock all IOAS objects iommufd: Export do_update_pinned iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED iommu/arm-smmu-v3: Use S2FWB for NESTED domains iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC Documentation: userspace-api: iommufd: Update vDEVICE iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command iommufd/selftest: Add mock_viommu_cache_invalidate iommufd/viommu: Add iommufd_viommu_find_dev helper iommu: Add iommu_copy_struct_from_full_user_array helper iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE iommu/viommu: Add cache_invalidate to iommufd_viommu_ops iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl ... |
||
Linus Torvalds
|
fcc79e1714 |
Networking changes for 6.13.
The most significant set of changes is the per netns RTNL. The new behavior is disabled by default, regression risk should be contained. Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its default value from PTP_1588_CLOCK_KVM, as the first is intended to be a more reliable replacement for the latter. Core ---- - Started a very large, in-progress, effort to make the RTNL lock scope per network-namespace, thus reducing the lock contention significantly in the containerized use-case, comprising: - RCU-ified some relevant slices of the FIB control path - introduce basic per netns locking helpers - namespacified the IPv4 address hash table - remove rtnl_register{,_module}() in favour of rtnl_register_many() - refactor rtnl_{new,del,set}link() moving as much validation as possible out of RTNL lock - convert all phonet doit() and dumpit() handlers to RCU - convert IPv4 addresses manipulation to per-netns RTNL - convert virtual interface creation to per-netns RTNL the per-netns lock infra is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim. - Introduce NAPI suspension, to efficiently switching between busy polling (NAPI processing suspended) and normal processing. - Migrate the IPv4 routing input, output and control path from direct ToS usage to DSCP macros. This is a work in progress to make ECN handling consistent and reliable. - Add drop reasons support to the IPv4 rotue input path, allowing better introspection in case of packets drop. - Make FIB seqnum lockless, dropping RTNL protection for read access. - Make inet{,v6} addresses hashing less predicable. - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets and timestamps Things we sprinkled into general kernel code -------------------------------------------- - Add small file operations for debugfs, to reduce the struct ops size. - Refactoring and optimization for the implementation of page_frag API, This is a preparatory work to consolidate the page_frag implementation. Netfilter --------- - Optimize set element transactions to reduce memory consumption - Extended netlink error reporting for attribute parser failure. - Make legacy xtables configs user selectable, giving users the option to configure iptables without enabling any other config. - Address a lot of false-positive RCU issues, pointed by recent CI improvements. BPF --- - Put xsk sockets on a struct diet and add various cleanups. Overall, this helps to bump performance by 12% for some workloads. - Extend BPF selftests to increase coverage of XDP features in combination with BPF cpumap. - Optimize and homogenize bpf_csum_diff helper for all archs and also add a batch of new BPF selftests for it. - Extend netkit with an option to delegate skb->{mark,priority} scrubbing to its BPF program. - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF programs. Protocols --------- - Introduces 4-tuple hash for connected udp sockets, speeding-up significantly connected sockets lookup. - Add a fastpath for some TCP timers that usually expires after close, the socket lock contention. - Add inbound and outbound xfrm state caches to speed up state lookups. - Avoid sending MPTCP advertisements on stale subflows, reducing risks on loosing them. - Make neighbours table flushing more scalable, maintaining per device neigh lists. Driver API ---------- - Introduce a unified interface to configure transmission H/W shaping, and expose it to user-space via generic-netlink. - Add support for per-NAPI config via netlink. This makes napi configuration persistent across queues removal and re-creation. Requires driver updates, currently supported drivers are: nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice. - Add ethtool support for writing SFP / PHY firmware blocks. - Track RSS context allocation from ethtool core. - Implement support for mirroring to DSA CPU port, via TC mirror offload. - Consolidate FDB updates notification, to avoid duplicates on device-specific entries. - Expose DPLL clock quality level to the user-space. - Support master-slave PHY config via device tree. Tests and tooling ----------------- - forwarding: introduce deferred commands, to simplify the cleanup phase Drivers ------- - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic, Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the IRQs and queues to NAPI IDs, allowing busy polling and better introspection. - Ethernet high-speed NICs: - nVidia/Mellanox: - mlx5: - a large refactor to implement support for cross E-Switch scheduling - refactor H/W conter management to let it scale better - H/W GRO cleanups - Intel (100G, ice):: - adds support for ethtool reset - implement support for per TX queue H/W shaping - AMD/Solarflare: - implement per device queue stats support - Broadcom (bnxt): - improve wildcard l4proto on IPv4/IPv6 ntuple rules - Marvell Octeon: - Adds representor support for each Resource Virtualization Unit (RVU) device. - Hisilicon: - adds support for the BMC Gigabit Ethernet - IBM (EMAC): - driver cleanup and modernization - Cisco (VIC): - raise the queues number limit to 256 - Ethernet virtual: - Google vNIC: - implements page pool support - macsec: - inherit lower device's features and TSO limits when offloading - virtio_net: - enable premapped mode by default - support for XDP socket(AF_XDP) zerocopy TX - wireguard: - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger packets. - Ethernet NICs embedded and virtual: - Broadcom ASP: - enable software timestamping - Freescale: - add enetc4 PF driver - MediaTek: Airoha SoC: - implement BQL support - RealTek r8169: - enable TSO by default on r8168/r8125 - implement extended ethtool stats - Renesas AVB: - enable TX checksum offload - Synopsys (stmmac): - support header splitting for vlan tagged packets - move common code for DWMAC4 and DWXGMAC into a separate FPE module. - Add the dwmac driver support for T-HEAD TH1520 SoC - Synopsys (xpcs): - driver refactor and cleanup - TI: - icssg_prueth: add VLAN offload support - Xilinx emaclite: - adds clock support - Ethernet switches: - Microchip: - implement support for the lan969x Ethernet switch family - add LAN9646 switch support to KSZ DSA driver - Ethernet PHYs: - Marvel: 88q2x: enable auto negotiation - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2 - PTP: - Add support for the Amazon virtual clock device - Add PtP driver for s390 clocks - WiFi: - mac80211 - EHT 1024 aggregation size for transmissions - new operation to indicate that a new interface is to be added - support radio separation of multi-band devices - move wireless extension spy implementation to libiw - Broadcom: - brcmfmac: optional LPO clock support - Microchip: - add support for Atmel WILC3000 - Qualcomm (ath12k): - firmware coredump collection support - add debugfs support for a multitude of statistics - Qualcomm (ath5k): - Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support - Realtek: - rtw88: 8821au and 8812au USB adapters support - rtw89: add thermal protection - rtw89: fine tune BT-coexsitence to improve user experience - rtw89: firmware secure boot for WiFi 6 chip - Bluetooth - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and 0x13d3:0x3623 - add Realtek RTL8852BE support for id Foxconn 0xe123 - add MediaTek MT7920 support for wireless module ids - btintel_pcie: add handshake between driver and firmware - btintel_pcie: add recovery mechanism - btnxpuart: add GPIO support to power save feature Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmc8sukSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkLEYQAIMM6Qjh0bh3Byr3gOS1xZzXG+APLjP4 9Jr0p3i+X53i90jvVqzeVO5FTc95MVHSKZ3kvPkDMXSLUaEJxocNHCI5Dzl/2/qL wWdpUB6/ou+jKB4Bn6Z8OvVODT7qrr0tVa9M2/fuKWrIsOU/ntIhG8EhnGddk5U/ vKPSf5PUIb81uNRnF58VusY3wrT1dEoh9VfJYxL+ST+inPxjEAMy6Y+lmlsjGaSX jrS+Pp9KYiUwl3Qt0AQs+cG4OHkJdjbnChrfosWwpkiyddO8klVq06+wX/TiSzfF b9VZtBfy/GZs3lkE1mQkcILdtX5pP3YHQdpsuxFfVI0JHVszx2ck7WdoRux/8F0v kKZsYcO7bH9I1wMFP66Ff9hIbdEQaeucK+KdDkXyPNMfP91Vzmfjii8IBxOC36Ie BbOeFUrXyTxxJ2u0vf/X9JtIq8bcrkNrSd1n1jlGPMqG3FVzsY95+Oi4qfsyeUbl lS1PlVTqPMPFdX54HnxM3y2rJjhd7iXhkvmtuXNjRFThXlOiK3maAPWlM1aZ3b8u Vjs4JFUsW0tleZG+RzANjsGjXbf7AiPUGLZt+acem0K+fcjG4i5aGIAJrxwa/ORx eG74IZRt5cOI371W7gNLGHjwnuge8tFPgOWcRP2eozNm7jvMYALBejYS7eWUTvaf THcvVM+bupEZ =GzPr -----END PGP SIGNATURE----- Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "The most significant set of changes is the per netns RTNL. The new behavior is disabled by default, regression risk should be contained. Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its default value from PTP_1588_CLOCK_KVM, as the first is intended to be a more reliable replacement for the latter. Core: - Started a very large, in-progress, effort to make the RTNL lock scope per network-namespace, thus reducing the lock contention significantly in the containerized use-case, comprising: - RCU-ified some relevant slices of the FIB control path - introduce basic per netns locking helpers - namespacified the IPv4 address hash table - remove rtnl_register{,_module}() in favour of rtnl_register_many() - refactor rtnl_{new,del,set}link() moving as much validation as possible out of RTNL lock - convert all phonet doit() and dumpit() handlers to RCU - convert IPv4 addresses manipulation to per-netns RTNL - convert virtual interface creation to per-netns RTNL the per-netns lock infrastructure is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim. - Introduce NAPI suspension, to efficiently switching between busy polling (NAPI processing suspended) and normal processing. - Migrate the IPv4 routing input, output and control path from direct ToS usage to DSCP macros. This is a work in progress to make ECN handling consistent and reliable. - Add drop reasons support to the IPv4 rotue input path, allowing better introspection in case of packets drop. - Make FIB seqnum lockless, dropping RTNL protection for read access. - Make inet{,v6} addresses hashing less predicable. - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets and timestamps Things we sprinkled into general kernel code: - Add small file operations for debugfs, to reduce the struct ops size. - Refactoring and optimization for the implementation of page_frag API, This is a preparatory work to consolidate the page_frag implementation. Netfilter: - Optimize set element transactions to reduce memory consumption - Extended netlink error reporting for attribute parser failure. - Make legacy xtables configs user selectable, giving users the option to configure iptables without enabling any other config. - Address a lot of false-positive RCU issues, pointed by recent CI improvements. BPF: - Put xsk sockets on a struct diet and add various cleanups. Overall, this helps to bump performance by 12% for some workloads. - Extend BPF selftests to increase coverage of XDP features in combination with BPF cpumap. - Optimize and homogenize bpf_csum_diff helper for all archs and also add a batch of new BPF selftests for it. - Extend netkit with an option to delegate skb->{mark,priority} scrubbing to its BPF program. - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF programs. Protocols: - Introduces 4-tuple hash for connected udp sockets, speeding-up significantly connected sockets lookup. - Add a fastpath for some TCP timers that usually expires after close, the socket lock contention. - Add inbound and outbound xfrm state caches to speed up state lookups. - Avoid sending MPTCP advertisements on stale subflows, reducing risks on loosing them. - Make neighbours table flushing more scalable, maintaining per device neigh lists. Driver API: - Introduce a unified interface to configure transmission H/W shaping, and expose it to user-space via generic-netlink. - Add support for per-NAPI config via netlink. This makes napi configuration persistent across queues removal and re-creation. Requires driver updates, currently supported drivers are: nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice. - Add ethtool support for writing SFP / PHY firmware blocks. - Track RSS context allocation from ethtool core. - Implement support for mirroring to DSA CPU port, via TC mirror offload. - Consolidate FDB updates notification, to avoid duplicates on device-specific entries. - Expose DPLL clock quality level to the user-space. - Support master-slave PHY config via device tree. Tests and tooling: - forwarding: introduce deferred commands, to simplify the cleanup phase Drivers: - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic, Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the IRQs and queues to NAPI IDs, allowing busy polling and better introspection. - Ethernet high-speed NICs: - nVidia/Mellanox: - mlx5: - a large refactor to implement support for cross E-Switch scheduling - refactor H/W conter management to let it scale better - H/W GRO cleanups - Intel (100G, ice):: - add support for ethtool reset - implement support for per TX queue H/W shaping - AMD/Solarflare: - implement per device queue stats support - Broadcom (bnxt): - improve wildcard l4proto on IPv4/IPv6 ntuple rules - Marvell Octeon: - Add representor support for each Resource Virtualization Unit (RVU) device. - Hisilicon: - add support for the BMC Gigabit Ethernet - IBM (EMAC): - driver cleanup and modernization - Cisco (VIC): - raise the queues number limit to 256 - Ethernet virtual: - Google vNIC: - implement page pool support - macsec: - inherit lower device's features and TSO limits when offloading - virtio_net: - enable premapped mode by default - support for XDP socket(AF_XDP) zerocopy TX - wireguard: - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger packets. - Ethernet NICs embedded and virtual: - Broadcom ASP: - enable software timestamping - Freescale: - add enetc4 PF driver - MediaTek: Airoha SoC: - implement BQL support - RealTek r8169: - enable TSO by default on r8168/r8125 - implement extended ethtool stats - Renesas AVB: - enable TX checksum offload - Synopsys (stmmac): - support header splitting for vlan tagged packets - move common code for DWMAC4 and DWXGMAC into a separate FPE module. - add dwmac driver support for T-HEAD TH1520 SoC - Synopsys (xpcs): - driver refactor and cleanup - TI: - icssg_prueth: add VLAN offload support - Xilinx emaclite: - add clock support - Ethernet switches: - Microchip: - implement support for the lan969x Ethernet switch family - add LAN9646 switch support to KSZ DSA driver - Ethernet PHYs: - Marvel: 88q2x: enable auto negotiation - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2 - PTP: - Add support for the Amazon virtual clock device - Add PtP driver for s390 clocks - WiFi: - mac80211 - EHT 1024 aggregation size for transmissions - new operation to indicate that a new interface is to be added - support radio separation of multi-band devices - move wireless extension spy implementation to libiw - Broadcom: - brcmfmac: optional LPO clock support - Microchip: - add support for Atmel WILC3000 - Qualcomm (ath12k): - firmware coredump collection support - add debugfs support for a multitude of statistics - Qualcomm (ath5k): - Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support - Realtek: - rtw88: 8821au and 8812au USB adapters support - rtw89: add thermal protection - rtw89: fine tune BT-coexsitence to improve user experience - rtw89: firmware secure boot for WiFi 6 chip - Bluetooth - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and 0x13d3:0x3623 - add Realtek RTL8852BE support for id Foxconn 0xe123 - add MediaTek MT7920 support for wireless module ids - btintel_pcie: add handshake between driver and firmware - btintel_pcie: add recovery mechanism - btnxpuart: add GPIO support to power save feature" * tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits) mm: page_frag: fix a compile error when kernel is not compiled Documentation: tipc: fix formatting issue in tipc.rst selftests: nic_performance: Add selftest for performance of NIC driver selftests: nic_link_layer: Add selftest case for speed and duplex states selftests: nic_link_layer: Add link layer selftest for NIC driver bnxt_en: Add FW trace coredump segments to the coredump bnxt_en: Add a new ethtool -W dump flag bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr() bnxt_en: Add functions to copy host context memory bnxt_en: Do not free FW log context memory bnxt_en: Manage the FW trace context memory bnxt_en: Allocate backing store memory for FW trace logs bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem() bnxt_en: Refactor bnxt_free_ctx_mem() bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type bnxt_en: Update firmware interface spec to 1.10.3.85 selftests/bpf: Add some tests with sockmap SK_PASS bpf: fix recursive lock when verdict program return SK_PASS wireguard: device: support big tcp GSO wireguard: selftests: load nf_conntrack if not present ... |
||
Linus Torvalds
|
6e95ef0258 |
bpf-next-bpf-next-6.13
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmc7hIQACgkQ6rmadz2v bTrcRA/+MsUOzJPnjokonHwk8X4KQM21gOua/sUcGArLVGF/JoW5/b1W8UBQ0y5+ +okYaRNGpwF0/2S8M5FAYpM7VSPLl1U7Rihr55I63D9kbAo0pDQwpn4afQFuZhaC l7MzkhBHS7XXx5/70APOzy3kz1GDYvz39jiWuAAhRqVejFO+fa4pDz4W+Ht7jYTQ jJOLn4vJna9fSfVf/U/bbdz5lL0lncIiEnRIEbF7EszbF2CA7sa+/KFENGM7ChEo UlxK2Xz5fpzgT6htZRjMr6jmupfg7gzdT4moOysQQcjkllvv6/4MD0s/GLShtG9H SmpaptpYCEGXLuApGzkSddwiT6iUMTqQr7zs6LPp0gPh+4Z0sSPNoBtBp2v0aVDl w0zhVhMfoF66rMG+IZY684CsMGg5h8UsOS46KLjSU0fW2HpGM7+zZLpXOaGkU3OH UV0womPT/C2kS2fpOn9F91O8qMjOZ4EXd+zuRtIRv9CeuVIpCT9R13lEYn+wfr6d aUci8wybha1UOAvkRiXiqWOPS+0Z/arrSbCSDMQF6DevLpQl0noVbTVssWXcRdUE 9Ve6J0yS29WxNWFtuuw4xP5NcG1AnRXVGh215TuVBX7xK9X/hnDDhfalltsjXfnd m1f64FxU2SGp2D7X8BX/6Aeyo6mITE6I3SNMUrcvk1Zid36zhy8= =TXGS -----END PGP SIGNATURE----- Merge tag 'bpf-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Add BPF uprobe session support (Jiri Olsa) - Optimize uprobe performance (Andrii Nakryiko) - Add bpf_fastcall support to helpers and kfuncs (Eduard Zingerman) - Avoid calling free_htab_elem() under hash map bucket lock (Hou Tao) - Prevent tailcall infinite loop caused by freplace (Leon Hwang) - Mark raw_tracepoint arguments as nullable (Kumar Kartikeya Dwivedi) - Introduce uptr support in the task local storage map (Martin KaFai Lau) - Stringify errno log messages in libbpf (Mykyta Yatsenko) - Add kmem_cache BPF iterator for perf's lock profiling (Namhyung Kim) - Support BPF objects of either endianness in libbpf (Tony Ambardar) - Add ksym to struct_ops trampoline to fix stack trace (Xu Kuohai) - Introduce private stack for eligible BPF programs (Yonghong Song) - Migrate samples/bpf tests to selftests/bpf test_progs (Daniel T. Lee) - Migrate test_sock to selftests/bpf test_progs (Jordan Rife) * tag 'bpf-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (152 commits) libbpf: Change hash_combine parameters from long to unsigned long selftests/bpf: Fix build error with llvm 19 libbpf: Fix memory leak in bpf_program__attach_uprobe_multi bpf: use common instruction history across all states bpf: Add necessary migrate_disable to range_tree. bpf: Do not alloc arena on unsupported arches selftests/bpf: Set test path for token/obj_priv_implicit_token_envvar selftests/bpf: Add a test for arena range tree algorithm bpf: Introduce range_tree data structure and use it in bpf arena samples/bpf: Remove unused variable in xdp2skb_meta_kern.c samples/bpf: Remove unused variables in tc_l2_redirect_kern.c bpftool: Cast variable `var` to long long bpf, x86: Propagate tailcall info only for subprogs bpf: Add kernel symbol for struct_ops trampoline bpf: Use function pointers count as struct_ops links count bpf: Remove unused member rcu from bpf_struct_ops_map selftests/bpf: Add struct_ops prog private stack tests bpf: Support private stack for struct_ops progs selftests/bpf: Add tracing prog private stack tests bpf, x86: Support private stack in jit ... |
||
Geert Uytterhoeven
|
9008fe8fad |
slab: Fix too strict alignment check in create_cache()
On m68k, where the minimum alignment of unsigned long is 2 bytes: Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22 CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-atari-03776-g7eaa1f99261a #1783 Stack from 0102fe5c: 0102fe5c 00514a2b 00514a2b ffffff00 00000001 0051f5ed 00425e78 00514a2b 0041eb74 ffffffea 00000310 0051f5ed ffffffea ffffffea 00601f60 00000044 0102ff20 000e7a68 0051ab8e 004383b8 0051f5ed ffffffea 000000b8 00000007 01020c00 00000000 000e77f0 0041e5f0 005f67c0 0051f5ed 000000b6 0102fef4 00000310 0102fef4 00000000 00000016 005f676c 0060a34c 00000010 00000004 00000038 0000009a 01000000 000000b8 005f668e 0102e000 00001372 0102ff88 Call Trace: [<00425e78>] dump_stack+0xc/0x10 [<0041eb74>] panic+0xd8/0x26c [<000e7a68>] __kmem_cache_create_args+0x278/0x2e8 [<000e77f0>] __kmem_cache_create_args+0x0/0x2e8 [<0041e5f0>] memset+0x0/0x8c [<005f67c0>] io_uring_init+0x54/0xd2 The minimal alignment of an integral type may differ from its size, hence is not safe to assume that an arbitrary freeptr_t (which is basically an unsigned long) is always aligned to 4 or 8 bytes. As nothing seems to require the additional alignment, it is safe to fix this by relaxing the check to the actual minimum alignment of freeptr_t. Fixes: |
||
Linus Torvalds
|
bf9aa14fc5 |
A rather large update for timekeeping and timers:
- The final step to get rid of auto-rearming posix-timers posix-timers are currently auto-rearmed by the kernel when the signal of the timer is ignored so that the timer signal can be delivered once the corresponding signal is unignored. This requires to throttle the timer to prevent a DoS by small intervals and keeps the system pointlessly out of low power states for no value. This is a long standing non-trivial problem due to the lock order of posix-timer lock and the sighand lock along with life time issues as the timer and the sigqueue have different life time rules. Cure this by: * Embedding the sigqueue into the timer struct to have the same life time rules. Aside of that this also avoids the lookup of the timer in the signal delivery and rearm path as it's just a always valid container_of() now. * Queuing ignored timer signals onto a seperate ignored list. * Moving queued timer signals onto the ignored list when the signal is switched to SIG_IGN before it could be delivered. * Walking the ignored list when SIG_IGN is lifted and requeue the signals to the actual signal lists. This allows the signal delivery code to rearm the timer. This also required to consolidate the signal delivery rules so they are consistent across all situations. With that all self test scenarios finally succeed. - Core infrastructure for VFS multigrain timestamping This is required to allow the kernel to use coarse grained time stamps by default and switch to fine grained time stamps when inode attributes are actively observed via getattr(). These changes have been provided to the VFS tree as well, so that the VFS specific infrastructure could be built on top. - Cleanup and consolidation of the sleep() infrastructure * Move all sleep and timeout functions into one file * Rework udelay() and ndelay() into proper documented inline functions and replace the hardcoded magic numbers by proper defines. * Rework the fsleep() implementation to take the reality of the timer wheel granularity on different HZ values into account. Right now the boundaries are hard coded time ranges which fail to provide the requested accuracy on different HZ settings. * Update documentation for all sleep/timeout related functions and fix up stale documentation links all over the place * Fixup a few usage sites - Rework of timekeeping and adjtimex(2) to prepare for multiple PTP clocks A system can have multiple PTP clocks which are participating in seperate and independent PTP clock domains. So far the kernel only considers the PTP clock which is based on CLOCK TAI relevant as that's the clock which drives the timekeeping adjustments via the various user space daemons through adjtimex(2). The non TAI based clock domains are accessible via the file descriptor based posix clocks, but their usability is very limited. They can't be accessed fast as they always go all the way out to the hardware and they cannot be utilized in the kernel itself. As Time Sensitive Networking (TSN) gains traction it is required to provide fast user and kernel space access to these clocks. The approach taken is to utilize the timekeeping and adjtimex(2) infrastructure to provide this access in a similar way how the kernel provides access to clock MONOTONIC, REALTIME etc. Instead of creating a duplicated infrastructure this rework converts timekeeping and adjtimex(2) into generic functionality which operates on pointers to data structures instead of using static variables. This allows to provide time accessors and adjtimex(2) functionality for the independent PTP clocks in a subsequent step. - Consolidate hrtimer initialization hrtimers are set up by initializing the data structure and then seperately setting the callback function for historical reasons. That's an extra unnecessary step and makes Rust support less straight forward than it should be. Provide a new set of hrtimer_setup*() functions and convert the core code and a few usage sites of the less frequently used interfaces over. The bulk of the htimer_init() to hrtimer_setup() conversion is already prepared and scheduled for the next merge window. - Drivers: * Ensure that the global timekeeping clocksource is utilizing the cluster 0 timer on MIPS multi-cluster systems. Otherwise CPUs on different clusters use their cluster specific clocksource which is not guaranteed to be synchronized with other clusters. * Mostly boring cleanups, fixes, improvements and code movement -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7kPITHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoZKkD/9OUL6fOJrDUmOYBa4QVeMyfTef4EaL tvwIMM/29XQFeiq3xxCIn+EMnHjXn2lvIhYGQ7GKsbKYwvJ7ZBDpQb+UMhZ2nKI9 6D6BP6WomZohKeH2fZbJQAdqOi3KRYdvQdIsVZUexkqiaVPphRvOH9wOr45gHtZM EyMRSotPlQTDqcrbUejDMEO94GyjDCYXRsyATLxjmTzL/N4xD4NRIiotjM2vL/a9 8MuCgIhrKUEyYlFoOxxeokBsF3kk3/ez2jlG9b/N8VLH3SYIc2zgL58FBgWxlmgG bY71nVG3nUgEjxBd2dcXAVVqvb+5widk8p6O7xxOAQKTLMcJ4H0tQDkMnzBtUzvB DGAJDHAmAr0g+ja9O35Pkhunkh4HYFIbq0Il4d1HMKObhJV0JumcKuQVxrXycdm3 UZfq3seqHsZJQbPgCAhlFU0/2WWScocbee9bNebGT33KVwSp5FoVv89C/6Vjb+vV Gusc3thqrQuMAZW5zV8g4UcBAA/xH4PB0I+vHib+9XPZ4UQ7/6xKl2jE0kd5hX7n AAUeZvFNFqIsY+B6vz+Jx/yzyM7u5cuXq87pof5EHVFzv56lyTp4ToGcOGYRgKH5 JXeYV1OxGziSDrd5vbf9CzdWMzqMvTefXrHbWrjkjhNOe8E1A8O88RZ5uRKZhmSw hZZ4hdM9+3T7cg== =2VC6 -----END PGP SIGNATURE----- Merge tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A rather large update for timekeeping and timers: - The final step to get rid of auto-rearming posix-timers posix-timers are currently auto-rearmed by the kernel when the signal of the timer is ignored so that the timer signal can be delivered once the corresponding signal is unignored. This requires to throttle the timer to prevent a DoS by small intervals and keeps the system pointlessly out of low power states for no value. This is a long standing non-trivial problem due to the lock order of posix-timer lock and the sighand lock along with life time issues as the timer and the sigqueue have different life time rules. Cure this by: - Embedding the sigqueue into the timer struct to have the same life time rules. Aside of that this also avoids the lookup of the timer in the signal delivery and rearm path as it's just a always valid container_of() now. - Queuing ignored timer signals onto a seperate ignored list. - Moving queued timer signals onto the ignored list when the signal is switched to SIG_IGN before it could be delivered. - Walking the ignored list when SIG_IGN is lifted and requeue the signals to the actual signal lists. This allows the signal delivery code to rearm the timer. This also required to consolidate the signal delivery rules so they are consistent across all situations. With that all self test scenarios finally succeed. - Core infrastructure for VFS multigrain timestamping This is required to allow the kernel to use coarse grained time stamps by default and switch to fine grained time stamps when inode attributes are actively observed via getattr(). These changes have been provided to the VFS tree as well, so that the VFS specific infrastructure could be built on top. - Cleanup and consolidation of the sleep() infrastructure - Move all sleep and timeout functions into one file - Rework udelay() and ndelay() into proper documented inline functions and replace the hardcoded magic numbers by proper defines. - Rework the fsleep() implementation to take the reality of the timer wheel granularity on different HZ values into account. Right now the boundaries are hard coded time ranges which fail to provide the requested accuracy on different HZ settings. - Update documentation for all sleep/timeout related functions and fix up stale documentation links all over the place - Fixup a few usage sites - Rework of timekeeping and adjtimex(2) to prepare for multiple PTP clocks A system can have multiple PTP clocks which are participating in seperate and independent PTP clock domains. So far the kernel only considers the PTP clock which is based on CLOCK TAI relevant as that's the clock which drives the timekeeping adjustments via the various user space daemons through adjtimex(2). The non TAI based clock domains are accessible via the file descriptor based posix clocks, but their usability is very limited. They can't be accessed fast as they always go all the way out to the hardware and they cannot be utilized in the kernel itself. As Time Sensitive Networking (TSN) gains traction it is required to provide fast user and kernel space access to these clocks. The approach taken is to utilize the timekeeping and adjtimex(2) infrastructure to provide this access in a similar way how the kernel provides access to clock MONOTONIC, REALTIME etc. Instead of creating a duplicated infrastructure this rework converts timekeeping and adjtimex(2) into generic functionality which operates on pointers to data structures instead of using static variables. This allows to provide time accessors and adjtimex(2) functionality for the independent PTP clocks in a subsequent step. - Consolidate hrtimer initialization hrtimers are set up by initializing the data structure and then seperately setting the callback function for historical reasons. That's an extra unnecessary step and makes Rust support less straight forward than it should be. Provide a new set of hrtimer_setup*() functions and convert the core code and a few usage sites of the less frequently used interfaces over. The bulk of the htimer_init() to hrtimer_setup() conversion is already prepared and scheduled for the next merge window. - Drivers: - Ensure that the global timekeeping clocksource is utilizing the cluster 0 timer on MIPS multi-cluster systems. Otherwise CPUs on different clusters use their cluster specific clocksource which is not guaranteed to be synchronized with other clusters. - Mostly boring cleanups, fixes, improvements and code movement" * tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits) posix-timers: Fix spurious warning on double enqueue versus do_exit() clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties clocksource/drivers/gpx: Remove redundant casts clocksource/drivers/timer-ti-dm: Fix child node refcount handling dt-bindings: timer: actions,owl-timer: convert to YAML clocksource/drivers/ralink: Add Ralink System Tick Counter driver clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource clocksource/drivers/timer-ti-dm: Don't fail probe if int not found clocksource/drivers:sp804: Make user selectable clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions hrtimers: Delete hrtimer_init_on_stack() alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack() io_uring: Switch to use hrtimer_setup_on_stack() sched/idle: Switch to use hrtimer_setup_on_stack() hrtimers: Delete hrtimer_init_sleeper_on_stack() wait: Switch to use hrtimer_setup_sleeper_on_stack() timers: Switch to use hrtimer_setup_sleeper_on_stack() net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack() futex: Switch to use hrtimer_setup_sleeper_on_stack() fs/aio: Switch to use hrtimer_setup_sleeper_on_stack() ... |
||
Linus Torvalds
|
ba1f9c8fe3 |
arm64 updates for 6.13:
* Support for running Linux in a protected VM under the Arm Confidential Compute Architecture (CCA) * Guarded Control Stack user-space support. Current patches follow the x86 ABI of implicitly creating a shadow stack on clone(). Subsequent patches (already on the list) will add support for clone3() allowing finer-grained control of the shadow stack size and placement from libc * AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are getting close with the upcoming dpISA support) * Other arch features: - In-kernel use of the memcpy instructions, FEAT_MOPS (previously only exposed to user; uaccess support not merged yet) - MTE: hugetlbfs support and the corresponding kselftests - Optimise CRC32 using the PMULL instructions - Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG - Optimise the kernel TLB flushing to use the range operations - POE/pkey (permission overlays): further cleanups after bringing the signal handler in line with the x86 behaviour for 6.12 * arm64 perf updates: - Support for the NXP i.MX91 PMU in the existing IMX driver - Support for Ampere SoCs in the Designware PCIe PMU driver - Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC - Support for Samsung's 'Mongoose' CPU PMU - Support for PMUv3.9 finer-grained userspace counter access control - Switch back to platform_driver::remove() now that it returns 'void' - Add some missing events for the CXL PMU driver * Miscellaneous arm64 fixes/cleanups: - Page table accessors cleanup: type updates, drop unused macros, reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity check addresses before runtime P4D/PUD folding - Command line override for ID_AA64MMFR0_EL1.ECV (advertising the FEAT_ECV for the generic timers) allowing Linux to boot with firmware deployments that don't set SCTLR_EL3.ECVEn - ACPI/arm64: tighten the check for the array of platform timer structures and adjust the error handling procedure in gtdt_parse_timer_block() - Optimise the cache flush for the uprobes xol slot (skip if no change) and other uprobes/kprobes cleanups - Fix the context switching of tpidrro_el0 when kpti is enabled - Dynamic shadow call stack fixes - Sysreg updates - Various arm64 kselftest improvements -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmc5POIACgkQa9axLQDI XvEDYA//a3eeNkgMuGdnSCVcLz+zy+oNwAwboG/4X1DqL8jiCbI4npwugPx95RIA YZOUvo9T2aL3OyefpUHll4gFHqx9OwoZIig2F70TEUmlPsGUbh0KBkdfQF3xZPdl EwV0kHSGEqMWMBwsGJGwgCYrUaf1MUQzh1GBl7VJ2ts5XsJBaBeOyKkysij26wtZ V+aHq2IUx7qQS7+HC/4P6IoHxKziFcsCMovaKaynP4cw9xXBQbDMcNlHEwndOMyk pu2zrv7GG0j3KQuVP/2Alf5FKhmI0GVGP/6Nc/zsOmw96w8Kf7HfzEtkHawr2aRq rqg/c9ivzDn1p+fUBo4ZYtrRk4IAY+yKu6hdzdLTP5+bQrBTWTO9rjQVBm9FAGYT sCdEj1NqzvExvNHD7X6ut/GJ05lmce3K+qeSXSEysN9gqiT3eomYWMXrD2V2lxzb rIDDcb/icfaqjt14Mksh19r/rzNeq7noj9CGSmcqw0BHZfHzl38Lai6pdfYzCNyn vCM/c4c1D/WWX8/lifO1JZVbhDk1jy82Iphg2KEhL8iKPxDsKBBZLmYuU1oa7tMo WryGAz9+GQwd+W9chFuaOEtMnzvW2scEJ5Eb2fEf0Qj0aEurkL+C9dZR6o1GN77V DBUxtU628Ef4PJJGfbNCwZzdd8UPYG3a/mKfQQ3dz0oz2LySlW4= =wDot -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Support for running Linux in a protected VM under the Arm Confidential Compute Architecture (CCA) - Guarded Control Stack user-space support. Current patches follow the x86 ABI of implicitly creating a shadow stack on clone(). Subsequent patches (already on the list) will add support for clone3() allowing finer-grained control of the shadow stack size and placement from libc - AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are getting close with the upcoming dpISA support) - Other arch features: - In-kernel use of the memcpy instructions, FEAT_MOPS (previously only exposed to user; uaccess support not merged yet) - MTE: hugetlbfs support and the corresponding kselftests - Optimise CRC32 using the PMULL instructions - Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG - Optimise the kernel TLB flushing to use the range operations - POE/pkey (permission overlays): further cleanups after bringing the signal handler in line with the x86 behaviour for 6.12 - arm64 perf updates: - Support for the NXP i.MX91 PMU in the existing IMX driver - Support for Ampere SoCs in the Designware PCIe PMU driver - Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC - Support for Samsung's 'Mongoose' CPU PMU - Support for PMUv3.9 finer-grained userspace counter access control - Switch back to platform_driver::remove() now that it returns 'void' - Add some missing events for the CXL PMU driver - Miscellaneous arm64 fixes/cleanups: - Page table accessors cleanup: type updates, drop unused macros, reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity check addresses before runtime P4D/PUD folding - Command line override for ID_AA64MMFR0_EL1.ECV (advertising the FEAT_ECV for the generic timers) allowing Linux to boot with firmware deployments that don't set SCTLR_EL3.ECVEn - ACPI/arm64: tighten the check for the array of platform timer structures and adjust the error handling procedure in gtdt_parse_timer_block() - Optimise the cache flush for the uprobes xol slot (skip if no change) and other uprobes/kprobes cleanups - Fix the context switching of tpidrro_el0 when kpti is enabled - Dynamic shadow call stack fixes - Sysreg updates - Various arm64 kselftest improvements * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (168 commits) arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled kselftest/arm64: Try harder to generate different keys during PAC tests kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all() arm64/ptrace: Clarify documentation of VL configuration via ptrace kselftest/arm64: Corrupt P0 in the irritator when testing SSVE acpi/arm64: remove unnecessary cast arm64/mm: Change protval as 'pteval_t' in map_range() kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c kselftest/arm64: Add FPMR coverage to fp-ptrace kselftest/arm64: Expand the set of ZA writes fp-ptrace does kselftets/arm64: Use flag bits for features in fp-ptrace assembler code kselftest/arm64: Enable build of PAC tests with LLVM=1 kselftest/arm64: Check that SVCR is 0 in signal handlers selftests/mm: Fix unused function warning for aarch64_write_signal_pkey() kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests kselftest/arm64: Fix build with stricter assemblers arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux() arm64/scs: Deal with 64-bit relative offsets in FDE frames ... |
||
Linus Torvalds
|
3e7447ab48 |
A lot of miscellaneous ext4 bug fixes and cleanups this cycle, most
notably in the journaling code, bufered I/O, and compiler warning cleanups. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmc7NN4ACgkQ8vlZVpUN gaMJRAf+Oc3Tn/ZvuX0amkaBQI+ZNIeYD/U0WBSvarKb00bo1X39mM/0LovqV6ec c51iRgt8U6uDZDUm6zJtppkIUiqkHRj+TmTInueFtmUqhIg8jgfZIpxCn0QkFKnQ jI5EKCkvUqM0B347axH/s+dlOE9JBSlQNKgjkvCYOGknQ1PH6X8oMDt5QAqGEk3P Nsa4QChIxt2yujFvydgFT+RAbjvY3sNvrZ7D3B+KL3VSJpILChVZK/UdFrraSXxq mLO5j4txjtnr/OLgujCTHOfPsTiQReHHXErrSbKhnFhrTXLD0mZSUgJ6irpaxRQ5 wQHQzmsrVwqFfqPU3Hkl8FGeCR0owQ== =26/E -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A lot of miscellaneous ext4 bug fixes and cleanups this cycle, most notably in the journaling code, bufered I/O, and compiler warning cleanups" * tag 'ext4_for_linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits) jbd2: Fix comment describing journal_init_common() ext4: prevent an infinite loop in the lazyinit thread ext4: use struct_size() to improve ext4_htree_store_dirent() ext4: annotate struct fname with __counted_by() jbd2: avoid dozens of -Wflex-array-member-not-at-end warnings ext4: use str_yes_no() helper function ext4: prevent delalloc to nodelalloc on remount jbd2: make b_frozen_data allocation always succeed ext4: cleanup variable name in ext4_fc_del() ext4: use string choices helpers jbd2: remove the 'success' parameter from the jbd2_do_replay() function jbd2: remove useless 'block_error' variable jbd2: factor out jbd2_do_replay() jbd2: refactor JBD2_COMMIT_BLOCK process in do_one_pass() jbd2: unified release of buffer_head in do_one_pass() jbd2: remove redundant judgments for check v1 checksum ext4: use ERR_CAST to return an error-valued pointer mm: zero range of eof folio exposed by inode size extension ext4: partial zero eof block on unaligned inode size extension ext4: disambiguate the return value of ext4_dio_write_end_io() ... |
||
Linus Torvalds
|
0f25f0e4ef |
the bulk of struct fd memory safety stuff
Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZzdikAAKCRBZ7Krx/gZQ 69nJAQCmbQHK3TGUbQhOw6MJXOK9ezpyEDN3FZb4jsu38vTIdgEA6OxAYDO2m2g9 CN18glYmD3wRyU6Bwl4vGODouSJvDgA= =gVH3 -----END PGP SIGNATURE----- Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' class updates from Al Viro: "The bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) deal with the last remaing boolean uses of fd_file() css_set_fork(): switch to CLASS(fd_raw, ...) memcg_write_event_control(): switch to CLASS(fd) assorted variants of irqfd setup: convert to CLASS(fd) do_pollfd(): convert to CLASS(fd) convert do_select() convert vfs_dedupe_file_range(). convert cifs_ioctl_copychunk() convert media_request_get_by_fd() convert spu_run(2) switch spufs_calls_{get,put}() to CLASS() use convert cachestat(2) convert do_preadv()/do_pwritev() fdget(), more trivial conversions fdget(), trivial conversions privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() o2hb_region_dev_store(): avoid goto around fdget()/fdput() introduce "fd_pos" class, convert fdget_pos() users to it. fdget_raw() users: switch to CLASS(fd_raw) convert vmsplice() to CLASS(fd) ... |
||
Linus Torvalds
|
7956186e75 |
vfs-6.13.tmpfs
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcZIgAKCRCRxhvAZXjc oge4AQDxhsKW+v/jKHydzqzwG3Ks7DIxrUg/mcGfdtBwjiWgvwEA8t0QAAfKECAK B0+bNKJ8XJRUtZ10Jgm3dzURbEhBWgU= =4Lui -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull tmpfs case folding updates from Christian Brauner: "This adds case-insensitive support for tmpfs. The work contained in here adds support for case-insensitive file names lookups in tmpfs. The main difference from other casefold filesystems is that tmpfs has no information on disk, just on RAM, so we can't use mkfs to create a case-insensitive tmpfs. For this implementation, there's a mount option for casefolding. The rest of the patchset follows a similar approach as ext4 and f2fs. The use case for this feature is similar to the use case for ext4, to better support compatibility layers (like Wine), particularly in combination with sandboxing/container tools (like Flatpak). Those containerization tools can share a subset of the host filesystem with an application. In the container, the root directory and any parent directories required for a shared directory are on tmpfs, with the shared directories bind-mounted into the container's view of the filesystem. If the host filesystem is using case-insensitive directories, then the application can do lookups inside those directories in a case-insensitive way, without this needing to be implemented in user-space. However, if the host is only sharing a subset of a case-insensitive directory with the application, then the parent directories of the mount point will be part of the container's root tmpfs. When the application tries to do case-insensitive lookups of those parent directories on a case-sensitive tmpfs, the lookup will fail" * tag 'vfs-6.13.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: tmpfs: Initialize sysfs during tmpfs init tmpfs: Fix type for sysfs' casefold attribute libfs: Fix kernel-doc warning in generic_ci_validate_strict_name docs: tmpfs: Add casefold options tmpfs: Expose filesystem features via sysfs tmpfs: Add flag FS_CASEFOLD_FL support for tmpfs dirs tmpfs: Add casefold lookup support libfs: Export generic_ci_ dentry functions unicode: Recreate utf8_parse_version() unicode: Export latest available UTF-8 version number ext4: Use generic_ci_validate_strict_name helper libfs: Create the helper function generic_ci_validate_strict_name() |
||
Linus Torvalds
|
56be9aaf98 |
vfs-6.13.pagecache
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcUQAAKCRCRxhvAZXjc onEpAQCUdwIBHpwmSIFvJFA9aNGpbLzi0dDSEIxuWYtp5qVuogD+ImccwqpG3kEi Zq9vokdPpB1zbahxKl1mkvBG4G0GFQE= =LbP6 -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.pagecache' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs pagecache updates from Christian Brauner: "Cleanup filesystem page flag usage: This continues the work to make the mappedtodisk/owner_2 flag available to filesystems which don't use buffer heads. Further patches remove uses of Private2. This brings us very close to being rid of it entirely" * tag 'vfs-6.13.pagecache' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: migrate: Remove references to Private2 ceph: Remove call to PagePrivate2() btrfs: Switch from using the private_2 flag to owner_2 mm: Remove PageMappedToDisk nilfs2: Convert nilfs_copy_buffer() to use folios fs: Move clearing of mappedtodisk to buffer.c |
||
Linus Torvalds
|
70e7730c2a |
vfs-6.13.misc
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcToAAKCRCRxhvAZXjc osL9AP948FFumJRC28gDJ4xp+X4eohNOfkgoEG8FTbF2zU6ulwD+O0pr26FqpFli pqlG+38UdATImpfqqWjPbb72sBYcfQg= =wLUh -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Fixup and improve NLM and kNFSD file lock callbacks Last year both GFS2 and OCFS2 had some work done to make their locking more robust when exported over NFS. Unfortunately, part of that work caused both NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send lock notifications to clients This in itself is not a huge problem because most NFS clients will still poll the server in order to acquire a conflicted lock It's important for NLM and kNFSD that they do not block their kernel threads inside filesystem's file_lock implementations because that can produce deadlocks. We used to make sure of this by only trusting that posix_lock_file() can correctly handle blocking lock calls asynchronously, so the lock managers would only setup their file_lock requests for async callbacks if the filesystem did not define its own lock() file operation However, when GFS2 and OCFS2 grew the capability to correctly handle blocking lock requests asynchronously, they started signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting posix_lock_file() was inadvertently dropped, so now most filesystems no longer produce lock notifications when exported over NFS Fix this by using an fop_flag which greatly simplifies the problem and grooms the way for future uses by both filesystems and lock managers alike - Add a sysctl to delete the dentry when a file is removed instead of making it a negative dentry Commit |
||
Linus Torvalds
|
6ac81fd55e |
vfs-6.13.mgtime
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcScQAKCRCRxhvAZXjc oj+5AP4k822a77wc/3iPFk379naIvQ4dsrgemh0/Pb6ZvzvkFQEAi3vFCfzCDR2x SkJF/RwXXKZv6U31QXMRt2Qo6wfBuAc= =nVlm -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs multigrain timestamps from Christian Brauner: "This is another try at implementing multigrain timestamps. This time with significant help from the timekeeping maintainers to reduce the performance impact. Thomas provided a base branch that contains the required timekeeping interfaces for the VFS. It serves as the base for the multi-grain timestamp work: - Multigrain timestamps allow the kernel to use fine-grained timestamps when an inode's attributes is being actively observed via ->getattr(). With this support, it's possible for a file to get a fine-grained timestamp, and another modified after it to get a coarse-grained stamp that is earlier than the fine-grained time. If this happens then the files can appear to have been modified in reverse order, which breaks VFS ordering guarantees. To prevent this, a floor value is maintained for multigrain timestamps. Whenever a fine-grained timestamp is handed out, record it, and when later coarse-grained stamps are handed out, ensure they are not earlier than that value. If the coarse-grained timestamp is earlier than the fine-grained floor, return the floor value instead. The timekeeper changes add a static singleton atomic64_t into timekeeper.c that is used to keep track of the latest fine-grained time ever handed out. This is tracked as a monotonic ktime_t value to ensure that it isn't affected by clock jumps. Because it is updated at different times than the rest of the timekeeper object, the floor value is managed independently of the timekeeper via a cmpxchg() operation, and sits on its own cacheline. Two new public timekeeper interfaces are added: (1) ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the coarse-grained clock and the floor time (2) ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries to swap it into the floor. A timespec64 is filled with the result. - The VFS has always used coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide when to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This adds a way to only use fine-grained timestamps when they are being actively queried. Use the (unused) top bit in inode->i_ctime_nsec as a flag that indicates whether the current timestamps have been queried via stat() or the like. When it's set, we allow the kernel to use a fine-grained timestamp iff it's necessary to make the ctime show a different value. This solves the problem of being able to distinguish the timestamp between updates, but introduces a new problem: it's now possible for a file being changed to get a fine-grained timestamp. A file that is altered just a bit later can then get a coarse-grained one that appears older than the earlier fine-grained time. This violates timestamp ordering guarantees. This is where the earlier mentioned timkeeping interfaces help. A global monotonic atomic64_t value is kept that acts as a timestamp floor. When we go to stamp a file, we first get the latter of the current floor value and the current coarse-grained time. If the inode ctime hasn't been queried then we just attempt to stamp it with that value. If it has been queried, then first see whether the current coarse time is later than the existing ctime. If it is, then we accept that value. If it isn't, then we get a fine-grained time and try to swap that into the global floor. Whether that succeeds or fails, we take the resulting floor time, convert it to realtime and try to swap that into the ctime. We take the result of the ctime swap whether it succeeds or fails, since either is just as valid. Filesystems can opt into this by setting the FS_MGTIME fstype flag. Others should be unaffected (other than being subject to the same floor value as multigrain filesystems)" * tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: reduce pointer chasing in is_mgtime() test tmpfs: add support for multigrain timestamps btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps Documentation: add a new file documenting multigrain timestamps fs: add percpu counters for significant multigrain timestamp events fs: tracepoints around multigrain timestamp events fs: handle delegated timestamps in setattr_copy_mgtime timekeeping: Add percpu counter for tracking floor swap events timekeeping: Add interfaces for handling timestamps with a floor value fs: have setattr_copy handle multigrain timestamps appropriately fs: add infrastructure for multigrain timestamps |
||
Linus Torvalds
|
4a5df37964 |
10 hotfixes, 7 of which are cc:stable. All singletons, please see the
changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzkr6AAKCRDdBJ7gKXxA jsb2AP9HCOI4w9rQTmBdnaefXytS7fiiPq+LVNpjJ0NGXX2FSgD/e1NM0wi8KevQ npcvlqTcXtRSJvYNF904aTNyDn+Kuw0= =KFGY -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2024-11-16-15-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "10 hotfixes, 7 of which are cc:stable. All singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2024-11-16-15-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: revert "mm: shmem: fix data-race in shmem_getattr()" ocfs2: uncache inode which has failed entering the group mm: fix NULL pointer dereference in alloc_pages_bulk_noprof mm, doc: update read_ahead_kb for MADV_HUGEPAGE fs/proc/task_mmu: prevent integer overflow in pagemap_scan_get_args() sched/task_stack: fix object_is_on_stack() for KASAN tagged pointers crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32 mm/mremap: fix address wraparound in move_page_tables() tools/mm: fix compile error mm, swap: fix allocation and scanning race with swapoff |
||
Andrew Morton
|
d1aa0c0429 |
mm: revert "mm: shmem: fix data-race in shmem_getattr()"
Revert |
||
Vlastimil Babka
|
9e19aa165c |
Merge branch 'slab/for-6.13/features' into slab/for-next
Merge the slab feature branch for 6.13: - Add new slab_strict_numa parameter for per-object memory policies (Christoph Lameter) |