From 4a693ce65b186fddc1a73621bd6f941e6e3eca21 Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Fri, 29 Dec 2023 16:02:13 +0800 Subject: [PATCH 01/17] kdump: defer the insertion of crashkernel resources In /proc/iomem, sub-regions should be inserted after their parent, otherwise the insertion of parent resource fails. But after generic crashkernel reservation applied, in both RISC-V and ARM64 (LoongArch will also use generic reservation later on), crashkernel resources are inserted before their parent, which causes the parent disappear in /proc/iomem. So we defer the insertion of crashkernel resources to an early_initcall(). 1, Without 'crashkernel' parameter: 100d0100-100d01ff : LOON0001:00 100d0100-100d01ff : LOON0001:00 LOON0001:00 100e0000-100e0bff : LOON0002:00 100e0000-100e0bff : LOON0002:00 LOON0002:00 1fe001e0-1fe001e7 : serial 90400000-fa17ffff : System RAM f6220000-f622ffff : Reserved f9ee0000-f9ee3fff : Reserved fa120000-fa17ffff : Reserved fa190000-fe0bffff : System RAM fa190000-fa1bffff : Reserved fe4e0000-47fffffff : System RAM 43c000000-441ffffff : Reserved 47ff98000-47ffa3fff : Reserved 47ffa4000-47ffa7fff : Reserved 47ffa8000-47ffabfff : Reserved 47ffac000-47ffaffff : Reserved 47ffb0000-47ffb3fff : Reserved 2, With 'crashkernel' parameter, before this patch: 100d0100-100d01ff : LOON0001:00 100d0100-100d01ff : LOON0001:00 LOON0001:00 100e0000-100e0bff : LOON0002:00 100e0000-100e0bff : LOON0002:00 LOON0002:00 1fe001e0-1fe001e7 : serial e6200000-f61fffff : Crash kernel fa190000-fe0bffff : System RAM fa190000-fa1bffff : Reserved fe4e0000-47fffffff : System RAM 43c000000-441ffffff : Reserved 47ff98000-47ffa3fff : Reserved 47ffa4000-47ffa7fff : Reserved 47ffa8000-47ffabfff : Reserved 47ffac000-47ffaffff : Reserved 47ffb0000-47ffb3fff : Reserved 3, With 'crashkernel' parameter, after this patch: 100d0100-100d01ff : LOON0001:00 100d0100-100d01ff : LOON0001:00 LOON0001:00 100e0000-100e0bff : LOON0002:00 100e0000-100e0bff : LOON0002:00 LOON0002:00 1fe001e0-1fe001e7 : serial 90400000-fa17ffff : System RAM e6200000-f61fffff : Crash kernel f6220000-f622ffff : Reserved f9ee0000-f9ee3fff : Reserved fa120000-fa17ffff : Reserved fa190000-fe0bffff : System RAM fa190000-fa1bffff : Reserved fe4e0000-47fffffff : System RAM 43c000000-441ffffff : Reserved 47ff98000-47ffa3fff : Reserved 47ffa4000-47ffa7fff : Reserved 47ffa8000-47ffabfff : Reserved 47ffac000-47ffaffff : Reserved 47ffb0000-47ffb3fff : Reserved Link: https://lkml.kernel.org/r/20231229080213.2622204-1-chenhuacai@loongson.cn Signed-off-by: Huacai Chen Fixes: 0ab97169aa05 ("crash_core: add generic function to do reservation") Cc: Baoquan He Cc: Zhen Lei Cc: [6.6+] Signed-off-by: Andrew Morton --- kernel/crash_core.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/kernel/crash_core.c b/kernel/crash_core.c index d48315667752..4e2cac71b84f 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -376,7 +376,6 @@ static int __init reserve_crashkernel_low(unsigned long long low_size) crashk_low_res.start = low_base; crashk_low_res.end = low_base + low_size - 1; - insert_resource(&iomem_resource, &crashk_low_res); #endif return 0; } @@ -458,8 +457,19 @@ void __init reserve_crashkernel_generic(char *cmdline, crashk_res.start = crash_base; crashk_res.end = crash_base + crash_size - 1; - insert_resource(&iomem_resource, &crashk_res); } + +static __init int insert_crashkernel_resources(void) +{ + if (crashk_res.start < crashk_res.end) + insert_resource(&iomem_resource, &crashk_res); + + if (crashk_low_res.start < crashk_low_res.end) + insert_resource(&iomem_resource, &crashk_low_res); + + return 0; +} +early_initcall(insert_crashkernel_resources); #endif int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map, From 65cc86800cf27d8e2da3439c834aef96d93fef63 Mon Sep 17 00:00:00 2001 From: Petr Vorel Date: Thu, 4 Jan 2024 16:49:53 +0100 Subject: [PATCH 02/17] MAINTAINERS: update LTP maintainers There are more people with git push permissions, but we keep only people who actually did review and merge patches last year. Link: https://lkml.kernel.org/r/20240104154953.1193634-1-pvorel@suse.cz Signed-off-by: Petr Vorel Reviewed-by: Li Wang Cc: Alexey Kodanev Cc: Jan Stancek Cc: Mike Frysinger Cc: Wanlong Gao Cc: Yang Xu Signed-off-by: Andrew Morton --- MAINTAINERS | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 40c754b4c39c..621751a8b8c7 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12627,12 +12627,11 @@ F: Documentation/devicetree/bindings/i2c/i2c-mux-ltc4306.txt F: drivers/i2c/muxes/i2c-mux-ltc4306.c LTP (Linux Test Project) -M: Mike Frysinger M: Cyril Hrubis -M: Wanlong Gao M: Jan Stancek -M: Stanislav Kholmanskikh -M: Alexey Kodanev +M: Petr Vorel +M: Li Wang +M: Yang Xu L: ltp@lists.linux.it (subscribers-only) S: Maintained W: http://linux-test-project.github.io/ From aaa2c9a97c22af5bf011f6dd8e0538219b45af88 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Fri, 5 Jan 2024 12:13:04 -0700 Subject: [PATCH 03/17] lib/Kconfig.debug: disable CONFIG_DEBUG_INFO_BTF for Hexagon pahole, which generates BTF, relies on elfutils to process DWARF debug info. Because kernel modules are relocatable files, elfutils needs to resolve relocations when processing the DWARF .debug sections. Hexagon is not supported in binutils or elfutils, so elfutils is unable to process relocations in kernel modules, causing pahole to crash during BTF generation. Do not allow CONFIG_DEBUG_INFO_BTF to be selected for Hexagon until it is supported in elfutils, so that there are no more cryptic build failures during BTF generation. Link: https://lkml.kernel.org/r/20240105-hexagon-disable-btf-v1-1-ddab073e7f74@kernel.org Signed-off-by: Nathan Chancellor Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202312192107.wMIKiZWw-lkp@intel.com/ Suggested-by: Nick Desaulniers Acked-by: Brian Cain Cc: Arnaldo Carvalho de Melo Signed-off-by: Andrew Morton --- lib/Kconfig.debug | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 97ce28f4d154..ba25129563ad 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -378,6 +378,8 @@ config DEBUG_INFO_BTF depends on !GCC_PLUGIN_RANDSTRUCT || COMPILE_TEST depends on BPF_SYSCALL depends on !DEBUG_INFO_DWARF5 || PAHOLE_VERSION >= 121 + # pahole uses elfutils, which does not have support for Hexagon relocations + depends on !HEXAGON help Generate deduplicated BTF type information from DWARF debug info. Turning this on expects presence of pahole tool, which will convert From cc478e0b6bdffd20561e1a07941a65f6c8962cab Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Tue, 9 Jan 2024 23:12:34 +0100 Subject: [PATCH 04/17] kasan: avoid resetting aux_lock With commit 63b85ac56a64 ("kasan: stop leaking stack trace handles"), KASAN zeroes out alloc meta when an object is freed. The zeroed out data purposefully includes alloc and auxiliary stack traces but also accidentally includes aux_lock. As aux_lock is only initialized for each object slot during slab creation, when the freed slot is reallocated, saving auxiliary stack traces for the new object leads to lockdep reports when taking the zeroed out aux_lock. Arguably, we could reinitialize aux_lock when the object is reallocated, but a simpler solution is to avoid zeroing out aux_lock when an object gets freed. Link: https://lkml.kernel.org/r/20240109221234.90929-1-andrey.konovalov@linux.dev Fixes: 63b85ac56a64 ("kasan: stop leaking stack trace handles") Signed-off-by: Andrey Konovalov Reported-by: Paul E. McKenney Closes: https://lore.kernel.org/linux-next/5cc0f83c-e1d6-45c5-be89-9b86746fe731@paulmck-laptop/ Reviewed-by: Marco Elver Tested-by: Paul E. McKenney Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Liam R. Howlett Signed-off-by: Andrew Morton --- mm/kasan/generic.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c index 24c13dfb1e94..df6627f62402 100644 --- a/mm/kasan/generic.c +++ b/mm/kasan/generic.c @@ -487,6 +487,7 @@ void kasan_init_object_meta(struct kmem_cache *cache, const void *object) __memset(alloc_meta, 0, sizeof(*alloc_meta)); /* + * Prepare the lock for saving auxiliary stack traces. * Temporarily disable KASAN bug reporting to allow instrumented * raw_spin_lock_init to access aux_lock, which resides inside * of a redzone. @@ -510,8 +511,13 @@ static void release_alloc_meta(struct kasan_alloc_meta *meta) stack_depot_put(meta->aux_stack[0]); stack_depot_put(meta->aux_stack[1]); - /* Zero out alloc meta to mark it as invalid. */ - __memset(meta, 0, sizeof(*meta)); + /* + * Zero out alloc meta to mark it as invalid but keep aux_lock + * initialized to avoid having to reinitialize it when another object + * is allocated in the same slot. + */ + __memset(&meta->alloc_track, 0, sizeof(meta->alloc_track)); + __memset(meta->aux_stack, 0, sizeof(meta->aux_stack)); } static void release_free_meta(const void *object, struct kasan_free_meta *meta) From efbd6398353315b7018e6943e41fee9ec35e875f Mon Sep 17 00:00:00 2001 From: Carlos Llamas Date: Fri, 29 Sep 2023 03:48:17 +0000 Subject: [PATCH 05/17] scripts/decode_stacktrace.sh: optionally use LLVM utilities GNU's addr2line can have problems parsing a vmlinux built with LLVM, particularly when LTO was used. In order to decode the traces correctly this patch adds the ability to switch to LLVM's utilities readelf and addr2line. The same approach is followed by Will in [1]. Before: $ scripts/decode_stacktrace.sh vmlinux < kernel.log [17716.240635] Call trace: [17716.240646] skb_cow_data (??:?) [17716.240654] esp6_input (ld-temp.o:?) [17716.240666] xfrm_input (ld-temp.o:?) [17716.240674] xfrm6_rcv (??:?) [...] After: $ LLVM=1 scripts/decode_stacktrace.sh vmlinux < kernel.log [17716.240635] Call trace: [17716.240646] skb_cow_data (include/linux/skbuff.h:2172 net/core/skbuff.c:4503) [17716.240654] esp6_input (net/ipv6/esp6.c:977) [17716.240666] xfrm_input (net/xfrm/xfrm_input.c:659) [17716.240674] xfrm6_rcv (net/ipv6/xfrm6_input.c:172) [...] Note that one could set CROSS_COMPILE=llvm- instead to hack around this issue. However, doing so can break the decodecode routine as it will force the selection of other LLVM utilities down the line e.g. llvm-as. [1] https://lore.kernel.org/all/20230914131225.13415-3-will@kernel.org/ Link: https://lkml.kernel.org/r/20230929034836.403735-1-cmllamas@google.com Signed-off-by: Carlos Llamas Reviewed-by: Nick Desaulniers Reviewed-by: Elliot Berman Tested-by: Justin Stitt Cc: Will Deacon Cc: John Stultz Cc: Masahiro Yamada Cc: Nathan Chancellor Cc: Tom Rix Cc: Signed-off-by: Andrew Morton --- scripts/decode_stacktrace.sh | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/scripts/decode_stacktrace.sh b/scripts/decode_stacktrace.sh index cb980b144ca1..fa5be6f57b00 100755 --- a/scripts/decode_stacktrace.sh +++ b/scripts/decode_stacktrace.sh @@ -16,6 +16,21 @@ elif type c++filt >/dev/null 2>&1 ; then cppfilt_opts=-i fi +UTIL_SUFFIX= +if [[ -z ${LLVM:-} ]]; then + UTIL_PREFIX=${CROSS_COMPILE:-} +else + UTIL_PREFIX=llvm- + if [[ ${LLVM} == */ ]]; then + UTIL_PREFIX=${LLVM}${UTIL_PREFIX} + elif [[ ${LLVM} == -* ]]; then + UTIL_SUFFIX=${LLVM} + fi +fi + +READELF=${UTIL_PREFIX}readelf${UTIL_SUFFIX} +ADDR2LINE=${UTIL_PREFIX}addr2line${UTIL_SUFFIX} + if [[ $1 == "-r" ]] ; then vmlinux="" basepath="auto" @@ -75,7 +90,7 @@ find_module() { if [[ "$modpath" != "" ]] ; then for fn in $(find "$modpath" -name "${module//_/[-_]}.ko*") ; do - if readelf -WS "$fn" | grep -qwF .debug_line ; then + if ${READELF} -WS "$fn" | grep -qwF .debug_line ; then echo $fn return fi @@ -169,7 +184,7 @@ parse_symbol() { if [[ $aarray_support == true && "${cache[$module,$address]+isset}" == "isset" ]]; then local code=${cache[$module,$address]} else - local code=$(${CROSS_COMPILE}addr2line -i -e "$objfile" "$address" 2>/dev/null) + local code=$(${ADDR2LINE} -i -e "$objfile" "$address" 2>/dev/null) if [[ $aarray_support == true ]]; then cache[$module,$address]=$code fi From ea52f71598f3d0cfaaf53b7d837bee9919041351 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Tue, 9 Jan 2024 14:02:00 -0500 Subject: [PATCH 06/17] mm: zswap: switch maintainers to recently active developers and reviewers Yosry, Nhat and I have been doing most of the recent development and reviewing of changes in this space. Signed-off-by: Johannes Weiner Acked-by: Nhat Pham Acked-by: Yosry Ahmed Acked-by: Dan Streetman Acked-by: Seth Jennings Cc: Vitaly Wool Signed-off-by: Andrew Morton --- MAINTAINERS | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 621751a8b8c7..5affd7ca3217 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -24141,11 +24141,13 @@ N: zstd K: zstd ZSWAP COMPRESSED SWAP CACHING -M: Seth Jennings -M: Dan Streetman -M: Vitaly Wool +M: Johannes Weiner +M: Yosry Ahmed +M: Nhat Pham L: linux-mm@kvack.org S: Maintained +F: Documentation/admin-guide/mm/zswap.rst +F: include/linux/zswap.h F: mm/zswap.c THE REST From 4cccb6221cae6d020270606b9e52b1678fc8b71a Mon Sep 17 00:00:00 2001 From: Muhammad Usama Anjum Date: Tue, 9 Jan 2024 16:24:42 +0500 Subject: [PATCH 07/17] fs/proc/task_mmu: move mmu notification mechanism inside mm lock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Move mmu notification mechanism inside mm lock to prevent race condition in other components which depend on it. The notifier will invalidate memory range. Depending upon the number of iterations, different memory ranges would be invalidated. The following warning would be removed by this patch: WARNING: CPU: 0 PID: 5067 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:734 kvm_mmu_notifier_change_pte+0x860/0x960 arch/x86/kvm/../../../virt/kvm/kvm_main.c:734 There is no behavioural and performance change with this patch when there is no component registered with the mmu notifier. [akpm@linux-foundation.org: narrow the scope of `range', per Sean] Link: https://lkml.kernel.org/r/20240109112445.590736-1-usama.anjum@collabora.com Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs") Signed-off-by: Muhammad Usama Anjum Reported-by: syzbot+81227d2bd69e9dedb802@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/000000000000f6d051060c6785bc@google.com/ Reviewed-by: Sean Christopherson Cc: Andrei Vagin Cc: Arnd Bergmann Cc: David Hildenbrand Cc: Hugh Dickins Cc: Kefeng Wang Cc: Liam R. Howlett Cc: Michał Mirosław Cc: Peter Xu Cc: Ryan Roberts Cc: Stephen Rothwell Cc: Suren Baghdasaryan Cc: Signed-off-by: Andrew Morton --- fs/proc/task_mmu.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 62b16f42d5d2..3f78ebbb795f 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -2432,7 +2432,6 @@ static long pagemap_scan_flush_buffer(struct pagemap_scan_private *p) static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg) { - struct mmu_notifier_range range; struct pagemap_scan_private p = {0}; unsigned long walk_start; size_t n_ranges_out = 0; @@ -2448,15 +2447,9 @@ static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg) if (ret) return ret; - /* Protection change for the range is going to happen. */ - if (p.arg.flags & PM_SCAN_WP_MATCHING) { - mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0, - mm, p.arg.start, p.arg.end); - mmu_notifier_invalidate_range_start(&range); - } - for (walk_start = p.arg.start; walk_start < p.arg.end; walk_start = p.arg.walk_end) { + struct mmu_notifier_range range; long n_out; if (fatal_signal_pending(current)) { @@ -2467,8 +2460,20 @@ static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg) ret = mmap_read_lock_killable(mm); if (ret) break; + + /* Protection change for the range is going to happen. */ + if (p.arg.flags & PM_SCAN_WP_MATCHING) { + mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0, + mm, walk_start, p.arg.end); + mmu_notifier_invalidate_range_start(&range); + } + ret = walk_page_range(mm, walk_start, p.arg.end, &pagemap_scan_ops, &p); + + if (p.arg.flags & PM_SCAN_WP_MATCHING) + mmu_notifier_invalidate_range_end(&range); + mmap_read_unlock(mm); n_out = pagemap_scan_flush_buffer(&p); @@ -2494,9 +2499,6 @@ static long do_pagemap_scan(struct mm_struct *mm, unsigned long uarg) if (pagemap_scan_writeback_args(&p.arg, uarg)) ret = -EFAULT; - if (p.arg.flags & PM_SCAN_WP_MATCHING) - mmu_notifier_invalidate_range_end(&range); - kfree(p.vec_buf); return ret; } From 327b4603c0b2469c2ef5f15b510d0e42d8e209de Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 9 Jan 2024 15:44:31 +0530 Subject: [PATCH 08/17] mailmap: update entry for Manivannan Sadhasivam Remove the map for Linaro id as it is still in use and I want to use it for submitting patches. Otherwise, git uses kernel.org as the author id for patches created using Linaro id. Link: https://lkml.kernel.org/r/20240109-update-mailmap-v1-1-bf7a39f15fb7@linaro.org Signed-off-by: Manivannan Sadhasivam Signed-off-by: Andrew Morton --- .mailmap | 1 - 1 file changed, 1 deletion(-) diff --git a/.mailmap b/.mailmap index 03587c433097..23dbb22cfd7a 100644 --- a/.mailmap +++ b/.mailmap @@ -363,7 +363,6 @@ Maheshwar Ajja Malathi Gottam Manikanta Pubbisetty Manivannan Sadhasivam -Manivannan Sadhasivam Manoj Basapathi Marcin Nowakowski Marc Zyngier From 7bb943806ff61e83ae4cceef8906b7fe52453e8a Mon Sep 17 00:00:00 2001 From: James Gowans Date: Wed, 13 Dec 2023 08:40:04 +0200 Subject: [PATCH 09/17] kexec: do syscore_shutdown() in kernel_kexec syscore_shutdown() runs driver and module callbacks to get the system into a state where it can be correctly shut down. In commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") syscore_shutdown() was removed from kernel_restart_prepare() and hence got (incorrectly?) removed from the kexec flow. This was innocuous until commit 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown") changed the way that KVM registered its shutdown callbacks, switching from reboot notifiers to syscore_ops.shutdown. As syscore_shutdown() is missing from kexec, KVM's shutdown hook is not run and virtualisation is left enabled on the boot CPU which results in triple faults when switching to the new kernel on Intel x86 VT-x with VMXE enabled. Fix this by adding syscore_shutdown() to the kexec sequence. In terms of where to add it, it is being added after migrating the kexec task to the boot CPU, but before APs are shut down. It is not totally clear if this is the best place: in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") it is stated that "syscore_ops operations should be carried with one CPU on-line and interrupts disabled." APs are only offlined later in machine_shutdown(), so this syscore_shutdown() is being run while APs are still online. This seems to be the correct place as it matches where syscore_shutdown() is run in the reboot and halt flows - they also run it before APs are shut down. The assumption is that the commit message in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") is no longer valid. KVM has been discussed here as it is what broke loudly by not having syscore_shutdown() in kexec, but this change impacts more than just KVM; all drivers/modules which register a syscore_ops.shutdown callback will now be invoked in the kexec flow. Looking at some of them like x86 MCE it is probably more correct to also shut these down during kexec. Maintainers of all drivers which use syscore_ops.shutdown are added on CC for visibility. They are: arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, virt/kvm/kvm_main.c .shutdown = kvm_shutdown, This has been tested by doing a kexec on x86_64 and aarch64. Link: https://lkml.kernel.org/r/20231213064004.2419447-1-jgowans@amazon.com Fixes: 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown") Signed-off-by: James Gowans Cc: Baoquan He Cc: Eric Biederman Cc: Paolo Bonzini Cc: Sean Christopherson Cc: Marc Zyngier Cc: Arnd Bergmann Cc: Tony Luck Cc: Borislav Petkov Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Chen-Yu Tsai Cc: Jernej Skrabec Cc: Samuel Holland Cc: Pavel Machek Cc: Sebastian Reichel Cc: Orson Zhai Cc: Alexander Graf Cc: Jan H. Schoenherr Cc: Signed-off-by: Andrew Morton --- kernel/kexec_core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index a08031b57a61..d08fc7b5db97 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1257,6 +1257,7 @@ int kernel_kexec(void) kexec_in_progress = true; kernel_restart_prepare("kexec reboot"); migrate_to_reboot_cpu(); + syscore_shutdown(); /* * migrate_to_reboot_cpu() disables CPU hotplug assuming that From 7ea6ec4c25294e8bc8788148ef854df92ee8dc5e Mon Sep 17 00:00:00 2001 From: Ma Wupeng Date: Tue, 9 Jan 2024 12:15:36 +0800 Subject: [PATCH 10/17] efi: disable mirror feature during crashkernel If the system has no mirrored memory or uses crashkernel.high while kernelcore=mirror is enabled on the command line then during crashkernel, there will be limited mirrored memory and this usually leads to OOM. To solve this problem, disable the mirror feature during crashkernel. Link: https://lkml.kernel.org/r/20240109041536.3903042-1-mawupeng1@huawei.com Signed-off-by: Ma Wupeng Acked-by: Mike Rapoport (IBM) Cc: Signed-off-by: Andrew Morton --- mm/mm_init.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/mm/mm_init.c b/mm/mm_init.c index 89dc29f1e6c6..2c19f5515e36 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -26,6 +26,7 @@ #include #include #include +#include #include "internal.h" #include "slab.h" #include "shuffle.h" @@ -381,6 +382,11 @@ static void __init find_zone_movable_pfns_for_nodes(void) goto out; } + if (is_kdump_kernel()) { + pr_warn("The system is under kdump, ignore kernelcore=mirror.\n"); + goto out; + } + for_each_mem_region(r) { if (memblock_is_mirror(r)) continue; From 4e87ff59cebb53d3ce1333245e64ab7d51ebf118 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Tue, 9 Jan 2024 21:35:21 -0800 Subject: [PATCH 11/17] kernel/crash_core.c: make __crash_hotplug_lock static sparse warnings: kernel/crash_core.c:749:1: sparse: sparse: symbol '__crash_hotplug_lock' was not declared. Should it be static? Fixes: e2a8f20dd8e9 ("Crash: add lock to serialize crash hotplug handling") Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202401080654.IjjU5oK7-lkp@intel.com/ Cc: Baoquan He Signed-off-by: Andrew Morton --- kernel/crash_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 4e2cac71b84f..75cd6a736d03 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -877,7 +877,7 @@ subsys_initcall(crash_notes_memory_init); * regions are online. So mutex lock __crash_hotplug_lock is used to * serialize the crash hotplug handling specifically. */ -DEFINE_MUTEX(__crash_hotplug_lock); +static DEFINE_MUTEX(__crash_hotplug_lock); #define crash_hotplug_lock() mutex_lock(&__crash_hotplug_lock) #define crash_hotplug_unlock() mutex_unlock(&__crash_hotplug_lock) From 0b8f128da761c54c5b210fad581ef597d38e7f81 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 5 Jan 2024 22:30:44 -0800 Subject: [PATCH 12/17] mailmap: add old address mappings for Randy Add my old email addresses so that git send-email will map them to my current email address. Link: https://lkml.kernel.org/r/20240106063051.13623-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap Cc: Andrew Morton Signed-off-by: Andrew Morton --- .mailmap | 3 +++ 1 file changed, 3 insertions(+) diff --git a/.mailmap b/.mailmap index 23dbb22cfd7a..3d2ca49c116a 100644 --- a/.mailmap +++ b/.mailmap @@ -503,6 +503,9 @@ Ralf Baechle Ralf Wildenhues Ram Chandra Jangir Randy Dunlap +Randy Dunlap +Randy Dunlap +Randy Dunlap Ravi Kumar Bokka Ravi Kumar Siddojigari Rémi Denis-Courmont From 55f958c55c2f381c4c9e121c0a5098b222bca75f Mon Sep 17 00:00:00 2001 From: Tanzir Hasan Date: Fri, 5 Jan 2024 23:44:34 +0000 Subject: [PATCH 13/17] mailmap: switch email for Tanzir Hasan Access to the tanzirh@google.com email will be revoked upon the end of the internship. Link: https://lkml.kernel.org/r/20240105-newemail-v3-1-3dc8ae035b54@google.com Signed-off-by: Tanzir Hasan Reviewed-by: Nick Desaulniers Reviewed-by: Justin Stitt Signed-off-by: Andrew Morton --- .mailmap | 1 + 1 file changed, 1 insertion(+) diff --git a/.mailmap b/.mailmap index 3d2ca49c116a..aecdb5291680 100644 --- a/.mailmap +++ b/.mailmap @@ -584,6 +584,7 @@ Surabhi Vishnoi Takashi YOSHII Tamizh Chelvam Raja Taniya Das +Tanzir Hasan Tejun Heo Tomeu Vizoso Thomas Graf From 11684134140bb708b6e6de969a060535630b1b53 Mon Sep 17 00:00:00 2001 From: Sumanth Korikkar Date: Wed, 10 Jan 2024 15:01:27 +0100 Subject: [PATCH 14/17] mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval set_memmap_mode() stores the kernel parameter memmap mode as an integer. However, the get_memmap_mode() function utilizes param_get_bool() to fetch the value as a boolean, leading to potential endianness issue. On Big-endian architectures, the memmap_on_memory is consistently displayed as 'N' regardless of its actual status. To address this endianness problem, the solution involves obtaining the mode as an integer. This adjustment ensures the proper display of the memmap_on_memory parameter, presenting it as one of the following options: Force, Y, or N. Link: https://lkml.kernel.org/r/20240110140127.241451-1-sumanthk@linux.ibm.com Fixes: 2d1f649c7c08 ("mm/memory_hotplug: support memmap_on_memory when memmap is not aligned to pageblocks") Signed-off-by: Sumanth Korikkar Suggested-by: Gerald Schaefer Acked-by: David Hildenbrand Cc: Alexander Gordeev Cc: Aneesh Kumar K.V Cc: Heiko Carstens Cc: Michal Hocko Cc: Oscar Salvador Cc: Vasily Gorbik Cc: [6.6+] Signed-off-by: Andrew Morton --- mm/memory_hotplug.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index b3c0ff52bb72..21890994c1d3 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -101,9 +101,11 @@ static int set_memmap_mode(const char *val, const struct kernel_param *kp) static int get_memmap_mode(char *buffer, const struct kernel_param *kp) { - if (*((int *)kp->arg) == MEMMAP_ON_MEMORY_FORCE) - return sprintf(buffer, "force\n"); - return param_get_bool(buffer, kp); + int mode = *((int *)kp->arg); + + if (mode == MEMMAP_ON_MEMORY_FORCE) + return sprintf(buffer, "force\n"); + return sprintf(buffer, "%c\n", mode ? 'Y' : 'N'); } static const struct kernel_param_ops memmap_mode_ops = { From 00bcfcd47a52f50f07a2e88d730d7931384cb073 Mon Sep 17 00:00:00 2001 From: Donet Tom Date: Wed, 10 Jan 2024 14:03:35 +0530 Subject: [PATCH 15/17] selftests: mm: hugepage-vmemmap fails on 64K page size systems The kernel sefltest mm/hugepage-vmemmap fails on architectures which has different page size other than 4K. In hugepage-vmemmap page size used is 4k so the pfn calculation will go wrong on systems which has different page size .The length of MAP_HUGETLB memory must be hugepage aligned but in hugepage-vmemmap map length is 2M so this will not get aligned if the system has differnet hugepage size. Added psize() to get the page size and default_huge_page_size() to get the default hugepage size at run time, hugepage-vmemmap test pass on powerpc with 64K page size and x86 with 4K page size. Result on powerpc without patch (page size 64K) *# ./hugepage-vmemmap Returned address is 0x7effff000000 whose pfn is 0 Head page flags (100000000) is invalid check_page_flags: Invalid argument *# Result on powerpc with patch (page size 64K) *# ./hugepage-vmemmap Returned address is 0x7effff000000 whose pfn is 600 *# Result on x86 with patch (page size 4K) *# ./hugepage-vmemmap Returned address is 0x7fc7c2c00000 whose pfn is 1dac00 *# Link: https://lkml.kernel.org/r/3b3a3ae37ba21218481c482a872bbf7526031600.1704865754.git.donettom@linux.vnet.ibm.com Fixes: b147c89cd429 ("selftests: vm: add a hugetlb test case") Signed-off-by: Donet Tom Reported-by: Geetika Moolchandani Tested-by: Geetika Moolchandani Acked-by: Muchun Song Cc: Signed-off-by: Andrew Morton --- tools/testing/selftests/mm/hugepage-vmemmap.c | 29 ++++++++++++------- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/tools/testing/selftests/mm/hugepage-vmemmap.c b/tools/testing/selftests/mm/hugepage-vmemmap.c index 5b354c209e93..894d28c3dd47 100644 --- a/tools/testing/selftests/mm/hugepage-vmemmap.c +++ b/tools/testing/selftests/mm/hugepage-vmemmap.c @@ -10,10 +10,7 @@ #include #include #include - -#define MAP_LENGTH (2UL * 1024 * 1024) - -#define PAGE_SIZE 4096 +#include "vm_util.h" #define PAGE_COMPOUND_HEAD (1UL << 15) #define PAGE_COMPOUND_TAIL (1UL << 16) @@ -39,6 +36,9 @@ #define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) #endif +static size_t pagesize; +static size_t maplength; + static void write_bytes(char *addr, size_t length) { unsigned long i; @@ -56,7 +56,7 @@ static unsigned long virt_to_pfn(void *addr) if (fd < 0) return -1UL; - lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET); + lseek(fd, (unsigned long)addr / pagesize * sizeof(pagemap), SEEK_SET); read(fd, &pagemap, sizeof(pagemap)); close(fd); @@ -86,7 +86,7 @@ static int check_page_flags(unsigned long pfn) * this also verifies kernel has correctly set the fake page_head to tail * while hugetlb_free_vmemmap is enabled. */ - for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) { + for (i = 1; i < maplength / pagesize; i++) { read(fd, &pageflags, sizeof(pageflags)); if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS || (pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) { @@ -106,18 +106,25 @@ int main(int argc, char **argv) void *addr; unsigned long pfn; - addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0); + pagesize = psize(); + maplength = default_huge_page_size(); + if (!maplength) { + printf("Unable to determine huge page size\n"); + exit(1); + } + + addr = mmap(MAP_ADDR, maplength, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0); if (addr == MAP_FAILED) { perror("mmap"); exit(1); } /* Trigger allocation of HugeTLB page. */ - write_bytes(addr, MAP_LENGTH); + write_bytes(addr, maplength); pfn = virt_to_pfn(addr); if (pfn == -1UL) { - munmap(addr, MAP_LENGTH); + munmap(addr, maplength); perror("virt_to_pfn"); exit(1); } @@ -125,13 +132,13 @@ int main(int argc, char **argv) printf("Returned address is %p whose pfn is %lx\n", addr, pfn); if (check_page_flags(pfn) < 0) { - munmap(addr, MAP_LENGTH); + munmap(addr, maplength); perror("check_page_flags"); exit(1); } /* munmap() length of MAP_HUGETLB memory must be hugepage aligned */ - if (munmap(addr, MAP_LENGTH)) { + if (munmap(addr, maplength)) { perror("munmap"); exit(1); } From aa8f91910bf5fb11dcd176e6efac2dbe2cecebea Mon Sep 17 00:00:00 2001 From: Qi Zheng Date: Thu, 11 Jan 2024 15:52:19 +0800 Subject: [PATCH 16/17] MAINTAINERS: add entry for shrinker Since the shrinker-related code has been moved to a separate shrinker.c file, it's time to add a MAINTAINERS entry for it. Dave, Roman, Muchun and I have all worked on shrinker (development, review, etc) in the past period of time, and all of us are willing to continue working on shrinker in the future, so I'd like to add all of us as maintainer/reviewer. Link: https://lkml.kernel.org/r/20240111075219.34221-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng Cc: Dave Chinner Cc: Muchun Song Cc: Roman Gushchin Signed-off-by: Andrew Morton --- MAINTAINERS | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 5affd7ca3217..a1b31c4c2969 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -19635,6 +19635,19 @@ T: git git://linuxtv.org/media_tree.git F: drivers/media/i2c/rj54n1cb0c.c F: include/media/i2c/rj54n1cb0c.h +SHRINKER +M: Andrew Morton +M: Dave Chinner +R: Qi Zheng +R: Roman Gushchin +R: Muchun Song +L: linux-mm@kvack.org +S: Maintained +F: Documentation/admin-guide/mm/shrinker_debugfs.rst +F: include/linux/shrinker.h +F: mm/shrinker.c +F: mm/shrinker_debug.c + SH_VOU V4L2 OUTPUT DRIVER L: linux-media@vger.kernel.org S: Orphan From 5d4747a6cc8e78ce74742d557fc9b7697fcacc95 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Thu, 11 Jan 2024 17:39:35 -0800 Subject: [PATCH 17/17] userfaultfd: avoid huge_zero_page in UFFDIO_MOVE While testing UFFDIO_MOVE ioctl, syzbot triggered VM_BUG_ON_PAGE caused by a call to PageAnonExclusive() with a huge_zero_page as a parameter. UFFDIO_MOVE does not yet handle zeropages and returns EBUSY when one is encountered. Add an early huge_zero_page check in the PMD move path to avoid this situation. Link: https://lkml.kernel.org/r/20240112013935.1474648-1-surenb@google.com Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Reported-by: syzbot+705209281e36404998f6@syzkaller.appspotmail.com Signed-off-by: Suren Baghdasaryan Acked-by: David Hildenbrand Cc: Andrea Arcangeli Cc: Peter Xu Cc: Stephen Rothwell Signed-off-by: Andrew Morton --- mm/userfaultfd.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 216ab4c8621f..20e3b0d9cf7e 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1393,6 +1393,12 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, struct mm_struct *mm, err = -ENOENT; break; } + /* Avoid moving zeropages for now */ + if (is_huge_zero_pmd(*src_pmd)) { + spin_unlock(ptl); + err = -EBUSY; + break; + } /* Check if we can move the pmd without splitting it. */ if (move_splits_huge_pmd(dst_addr, src_addr, src_start + len) ||