From 29083fd84da576bfb3563d044f98d38e6b338f00 Mon Sep 17 00:00:00 2001
From: Mark Rutland
Date: Tue, 18 Apr 2023 17:42:12 +0100
Subject: [PATCH 1/5] kasan: hw_tags: avoid invalid virt_to_page()

When booting with 'kasan.vmalloc=off', a kernel configured with support
for KASAN_HW_TAGS will explode at boot time due to bogus use of
virt_to_page() on a vmalloc address.

With CONFIG_DEBUG_VIRTUAL selected this will be reported explicitly, and
with or without CONFIG_DEBUG_VIRTUAL the kernel will dereference a bogus
address:

| ------------[ cut here ]------------
| virt_to_phys used for non-linear address: (____ptrval____) (0xffff800008000000)
| WARNING: CPU: 0 PID: 0 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x78/0x80
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0-rc3-00073-g83865133300d-dirty #4
| Hardware name: linux,dummy-virt (DT)
| pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __virt_to_phys+0x78/0x80
| lr : __virt_to_phys+0x78/0x80
| sp : ffffcd076afd3c80
| x29: ffffcd076afd3c80 x28: 0068000000000f07 x27: ffff800008000000
| x26: fffffbfff0000000 x25: fffffbffff000000 x24: ff00000000000000
| x23: ffffcd076ad3c000 x22: fffffc0000000000 x21: ffff800008000000
| x20: ffff800008004000 x19: ffff800008000000 x18: ffff800008004000
| x17: 666678302820295f x16: ffffffffffffffff x15: 0000000000000004
| x14: ffffcd076b009e88 x13: 0000000000000fff x12: 0000000000000003
| x11: 00000000ffffefff x10: c0000000ffffefff x9 : 0000000000000000
| x8 : 0000000000000000 x7 : 205d303030303030 x6 : 302e30202020205b
| x5 : ffffcd076b41d63f x4 : ffffcd076afd3827 x3 : 0000000000000000
| x2 : 0000000000000000 x1 : ffffcd076afd3a30 x0 : 000000000000004f
| Call trace:
|  __virt_to_phys+0x78/0x80
|  __kasan_unpoison_vmalloc+0xd4/0x478
|  __vmalloc_node_range+0x77c/0x7b8
|  __vmalloc_node+0x54/0x64
|  init_IRQ+0x94/0xc8
|  start_kernel+0x194/0x420
|  __primary_switched+0xbc/0xc4
| ---[ end trace 0000000000000000 ]---
| Unable to handle kernel paging request at virtual address 03fffacbe27b8000
| Mem abort info:
|   ESR = 0x0000000096000004
|   EC = 0x25: DABT (current EL), IL = 32 bits
|   SET = 0, FnV = 0
|   EA = 0, S1PTW = 0
|   FSC = 0x04: level 0 translation fault
| Data abort info:
|   ISV = 0, ISS = 0x00000004
|   CM = 0, WnR = 0
| swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041bc5000
| [03fffacbe27b8000] pgd=0000000000000000, p4d=0000000000000000
| Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.3.0-rc3-00073-g83865133300d-dirty #4
| Hardware name: linux,dummy-virt (DT)
| pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __kasan_unpoison_vmalloc+0xe4/0x478
| lr : __kasan_unpoison_vmalloc+0xd4/0x478
| sp : ffffcd076afd3ca0
| x29: ffffcd076afd3ca0 x28: 0068000000000f07 x27: ffff800008000000
| x26: 0000000000000000 x25: 03fffacbe27b8000 x24: ff00000000000000
| x23: ffffcd076ad3c000 x22: fffffc0000000000 x21: ffff800008000000
| x20: ffff800008004000 x19: ffff800008000000 x18: ffff800008004000
| x17: 666678302820295f x16: ffffffffffffffff x15: 0000000000000004
| x14: ffffcd076b009e88 x13: 0000000000000fff x12: 0000000000000001
| x11: 0000800008000000 x10: ffff800008000000 x9 : ffffb2f8dee00000
| x8 : 000ffffb2f8dee00 x7 : 205d303030303030 x6 : 302e30202020205b
| x5 : ffffcd076b41d63f x4 : ffffcd076afd3827 x3 : 0000000000000000
| x2 : 0000000000000000 x1 : ffffcd076afd3a30 x0 : ffffb2f8dee00000
| Call trace:
|  __kasan_unpoison_vmalloc+0xe4/0x478
|  __vmalloc_node_range+0x77c/0x7b8
|  __vmalloc_node+0x54/0x64
|  init_IRQ+0x94/0xc8
|  start_kernel+0x194/0x420
|  __primary_switched+0xbc/0xc4
| Code: d34cfc08 aa1f03fa 8b081b39 d503201f (f9400328)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill the idle task!

This is because init_vmalloc_pages() erroneously calls virt_to_page() on
a vmalloc address, while virt_to_page() is only valid for addresses in
the linear/direct map. Since init_vmalloc_pages() expects virtual
addresses in the vmalloc range, it must use vmalloc_to_page() rather
than virt_to_page().

We call init_vmalloc_pages() from __kasan_unpoison_vmalloc(), where we
check !is_vmalloc_or_module_addr(), suggesting that we might encounter a
non-vmalloc address. Luckily, this never happens. By design, we only
call __kasan_unpoison_vmalloc() on pointers in the vmalloc area, and I
have verified that we don't violate that expectation. Given that,
is_vmalloc_or_module_addr() must always be true for any legitimate
argument to __kasan_unpoison_vmalloc().

Correct init_vmalloc_pages() to use vmalloc_to_page(), and remove the
redundant and misleading use of is_vmalloc_or_module_addr() in
__kasan_unpoison_vmalloc().

Link: https://lkml.kernel.org/r/20230418164212.1775741-1-mark.rutland@arm.com
Fixes: 6c2f761dad7851d8 ("kasan: fix zeroing vmalloc memory with HW_TAGS")
Signed-off-by: Mark Rutland
Cc: Alexander Potapenko
Cc: Andrey Konovalov
Cc: Andrey Ryabinin
Cc: Dmitry Vyukov
Cc: Marco Elver
Cc:
Signed-off-by: Andrew Morton
---
 mm/kasan/hw_tags.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/kasan/hw_tags.c b/mm/kasan/hw_tags.c
index f98b9f4d9d3e..06141bbc1e51 100644
--- a/mm/kasan/hw_tags.c
+++ b/mm/kasan/hw_tags.c
@@ -285,7 +285,7 @@ static void init_vmalloc_pages(const void *start, unsigned long size)
 	const void *addr;
 
 	for (addr = start; addr < start + size; addr += PAGE_SIZE) {
-		struct page *page = virt_to_page(addr);
+		struct page *page = vmalloc_to_page(addr);
 
 		clear_highpage_kasan_tagged(page);
 	}
@@ -297,7 +297,7 @@ void *__kasan_unpoison_vmalloc(const void *start, unsigned long size,
 	u8 tag;
 	unsigned long redzone_start, redzone_size;
 
-	if (!kasan_vmalloc_enabled() || !is_vmalloc_or_module_addr(start)) {
+	if (!kasan_vmalloc_enabled()) {
 		if (flags & KASAN_VMALLOC_INIT)
 			init_vmalloc_pages(start, size);
 		return (void *)start;
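
As background to the fix above, the two conversions differ in which
addresses they accept: virt_to_page() is plain arithmetic that is only
valid for the linear/direct map, while vmalloc addresses must be
resolved through the page tables with vmalloc_to_page(). The following
is an illustrative sketch only, not part of the patch; lookup_page() is
a made-up helper name:

	#include <linux/mm.h>
	#include <linux/vmalloc.h>

	/* Illustrative sketch: pick the correct address-to-page conversion
	 * for a given virtual address.
	 */
	static struct page *lookup_page(const void *addr)
	{
		if (is_vmalloc_or_module_addr(addr))
			return vmalloc_to_page(addr);	/* walk the page tables */
		return virt_to_page(addr);		/* linear-map arithmetic */
	}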
From 43ec16f1450f4936025a9bdf1a273affdb9732c1 Mon Sep 17 00:00:00 2001
From: Zhang Zhengming
Date: Wed, 19 Apr 2023 12:02:03 +0800
Subject: [PATCH 2/5] relayfs: fix out-of-bounds access in relay_file_read

There is a crash in relay_file_read, as the variable 'from' points to
the end of the last subbuf, which is an out-of-bounds address.

The oops looks something like:

pc : __arch_copy_to_user+0x180/0x310
lr : relay_file_read+0x20c/0x2c8
Call trace:
 __arch_copy_to_user+0x180/0x310
 full_proxy_read+0x68/0x98
 vfs_read+0xb0/0x1d0
 ksys_read+0x6c/0xf0
 __arm64_sys_read+0x20/0x28
 el0_svc_common.constprop.3+0x84/0x108
 do_el0_svc+0x74/0x90
 el0_svc+0x1c/0x28
 el0_sync_handler+0x88/0xb0
 el0_sync+0x148/0x180

We identified the conditions by analyzing the vmcore:

1) The last produced byte and the last consumed byte are both at the
   end of the last subbuf.

2) A softirq calls a function (e.g. __blk_add_trace) to write to the
   relay buffer while a program is calling relay_file_read_avail():

	relay_file_read
		relay_file_read_avail
			relay_file_read_consume(buf, 0, 0);
			//interrupted by softirq which will write the subbuf
			....
			return 1;
		//read_start points to the end of the last subbuf
		read_start = relay_file_read_start_pos
		//avail is equal to subsize
		avail = relay_file_read_subbuf_avail
		//from points to an invalid memory address
		from = buf->start + read_start
		//system crashes
		copy_to_user(buffer, from, avail)

Link: https://lkml.kernel.org/r/20230419040203.37676-1-zhang.zhengming@h3c.com
Fixes: 8d62fdebdaf9 ("relay file read: start-pos fix")
Signed-off-by: Zhang Zhengming
Reviewed-by: Zhao Lei
Reviewed-by: Zhou Kete
Reviewed-by: Pengcheng Yang
Cc: Jens Axboe
Cc:
Signed-off-by: Andrew Morton
---
 kernel/relay.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/relay.c b/kernel/relay.c
index 9aa70ae53d24..a80fa01042e9 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -989,7 +989,8 @@ static size_t relay_file_read_start_pos(struct rchan_buf *buf)
 	size_t subbuf_size = buf->chan->subbuf_size;
 	size_t n_subbufs = buf->chan->n_subbufs;
 	size_t consumed = buf->subbufs_consumed % n_subbufs;
-	size_t read_pos = consumed * subbuf_size + buf->bytes_consumed;
+	size_t read_pos = (consumed * subbuf_size + buf->bytes_consumed)
+			% (n_subbufs * subbuf_size);
 
 	read_subbuf = read_pos / subbuf_size;
 	padding = buf->padding[read_subbuf];
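
The arithmetic behind the fix above can be checked with a small
standalone sketch (user-space C, values made up, not kernel code): when
both the producer and the consumer sit at the end of the last
subbuffer, the unwrapped read position equals the total buffer size and
is out of bounds; the added modulo wraps it back to 0:

	#include <stdio.h>

	int main(void)
	{
		size_t subbuf_size = 4096, n_subbufs = 4;
		/* last consumed byte at the end of the last subbuf */
		size_t consumed = 3, bytes_consumed = 4096;

		size_t old_pos = consumed * subbuf_size + bytes_consumed;
		size_t new_pos = old_pos % (n_subbufs * subbuf_size);

		/* prints: old_pos=16384 (out of bounds), new_pos=0 (wrapped) */
		printf("old_pos=%zu new_pos=%zu\n", old_pos, new_pos);
		return 0;
	}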
From 00ca0f2e86bf40b016a646e6323a8941a09cf106 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes
Date: Sun, 30 Apr 2023 16:07:07 +0100
Subject: [PATCH 3/5] mm/mempolicy: correctly update prev when policy is equal on mbind

The refactoring in commit f4e9e0e69468 ("mm/mempolicy: fix use-after-free
of VMA iterator") introduces a subtle bug which arises when attempting to
apply a new NUMA policy across a range of VMAs in mbind_range().

The refactoring passes a **prev pointer to keep track of the previous VMA
in order to reduce duplication, and in all but one case it keeps this
correctly updated.

The bug arises when a VMA within the specified range has an equivalent
policy as determined by mpol_equal(); unlike the other cases, this one
does not update prev.

This can result in a situation where, later in the iteration, a VMA is
found whose policy does need to change. At this point, vma_merge() is
invoked with prev pointing to a VMA which is before the previous VMA.

Since vma_merge() discovers the curr VMA by looking for the one
immediately after prev, it will now be in a situation where this VMA is
incorrect and the merge will not proceed correctly.

This is checked in the VM_WARN_ON() invariant case with end >
curr->vm_end, which, if a merge is possible, results in a warning (if
CONFIG_DEBUG_VM is specified).

I note that vma_merge() performs these invariant checks only after
merge_prev/merge_next are checked, which is debatable as it hides this
issue if no merge is possible even though a buggy situation has arisen.

The solution is simply to update the prev pointer even when policies are
equal. This bug also arose in the 6.2.y stable tree, and this patch
resolves it.

Link: https://lkml.kernel.org/r/83f1d612acb519d777bebf7f3359317c4e7f4265.1682866629.git.lstoakes@gmail.com
Fixes: f4e9e0e69468 ("mm/mempolicy: fix use-after-free of VMA iterator")
Signed-off-by: Lorenzo Stoakes
Reported-by: kernel test robot
Link: https://lore.kernel.org/oe-lkp/202304292203.44ddeff6-oliver.sang@intel.com
Cc: Liam R. Howlett
Cc: Mel Gorman
Cc:
Signed-off-by: Andrew Morton
---
 mm/mempolicy.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2068b594dc88..1756389a0609 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -808,8 +808,10 @@ static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		vmstart = vma->vm_start;
 	}
 
-	if (mpol_equal(vma_policy(vma), new_pol))
+	if (mpol_equal(vma_policy(vma), new_pol)) {
+		*prev = vma;
 		return 0;
+	}
 
 	pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT);
 	merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags,

From 3628d2bb155b0e4dfc1922a99de0631d3e2095d3 Mon Sep 17 00:00:00 2001
From: Michal Simek
Date: Tue, 2 May 2023 15:55:00 +0200
Subject: [PATCH 4/5] MAINTAINERS: update Michal Simek's email

The @xilinx.com address is still working, but it is better to switch to
the new amd.com address after the AMD/Xilinx acquisition.

Link: https://lkml.kernel.org/r/bd073d026f8c367a9cfb45d26d39f26e40c665dc.1683035692.git.michal.simek@amd.com
Signed-off-by: Michal Simek
Cc: Colin Ian King
Cc: Jakub Kicinski
Cc: Kirill Tkhai
Cc: Konrad Dybcio
Cc: Qais Yousef
Cc: Michal Simek
Signed-off-by: Andrew Morton
---
 .mailmap    | 1 +
 MAINTAINERS | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/.mailmap b/.mailmap
index adfed592f88b..d3eb862f71ab 100644
--- a/.mailmap
+++ b/.mailmap
@@ -328,6 +328,7 @@ Maxime Ripard
 Maxime Ripard
 Mayuresh Janorkar
 Michael Buesch
+Michal Simek
 Michel Dänzer
 Michel Lespinasse
 Michel Lespinasse
diff --git a/MAINTAINERS b/MAINTAINERS
index 32772c383ab7..51f7682246be 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3036,7 +3036,7 @@ F:	drivers/video/fbdev/wm8505fb*
 F:	drivers/video/fbdev/wmt_ge_rops.*
 
 ARM/ZYNQ ARCHITECTURE
-M:	Michal Simek
+M:	Michal Simek
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:	Supported
 W:	http://wiki.xilinx.com
@@ -23084,7 +23084,7 @@ F:	drivers/net/can/xilinx_can.c
 XILINX GPIO DRIVER
 M:	Shubhrajyoti Datta
 R:	Srinivas Neeli
-R:	Michal Simek
+R:	Michal Simek
 S:	Maintained
 F:	Documentation/devicetree/bindings/gpio/xlnx,gpio-xilinx.yaml
 F:	Documentation/devicetree/bindings/gpio/gpio-zynq.yaml

From 6152e53d9671b0ccc21c1bca842617b32ccfc5d8 Mon Sep 17 00:00:00 2001
From: Suren Baghdasaryan
Date: Fri, 28 Apr 2023 10:35:33 -0700
Subject: [PATCH 5/5] mm: change per-VMA lock statistics to be disabled by default

Change CONFIG_PER_VMA_LOCK_STATS to be disabled by default, as most
users don't need it. Add configuration help to clarify its usage.

Link: https://lkml.kernel.org/r/20230428173533.18158-1-surenb@google.com
Fixes: 52f238653e45 ("mm: introduce per-VMA lock statistics")
Signed-off-by: Suren Baghdasaryan
Suggested-by: Linus Torvalds
Reviewed-by: Lorenzo Stoakes
Acked-by: Vlastimil Babka
Reviewed-by: David Hildenbrand
Signed-off-by: Andrew Morton
---
 mm/Kconfig.debug | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 6dae63b46368..a925415b4d10 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -274,6 +274,12 @@ config DEBUG_KMEMLEAK_AUTO_SCAN
 config PER_VMA_LOCK_STATS
 	bool "Statistics for per-vma locks"
 	depends on PER_VMA_LOCK
-	default y
 	help
-	  Statistics for per-vma locks.
+	  Say Y here to enable success, retry and failure counters of page
+	  faults handled under protection of per-vma locks. When enabled, the
+	  counters are exposed in /proc/vmstat. This information is useful for
+	  kernel developers to evaluate effectiveness of per-vma locks and to
+	  identify pathological cases. Counting these events introduces a small
+	  overhead in the page fault path.
+
+	  If in doubt, say N.
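
For completeness, a minimal sketch of the usual pattern such a Kconfig
option gates, where the counters compile away entirely when the option
is off, so the overhead mentioned in the help text is only paid when it
is set to Y. The macro name below is hypothetical and not taken from
the patch:

	/* Illustrative only; vma_lock_stat_inc() is a hypothetical name. */
	#ifdef CONFIG_PER_VMA_LOCK_STATS
	#define vma_lock_stat_inc(counter)	((counter)++)
	#else
	#define vma_lock_stat_inc(counter)	do { } while (0)
	#endif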