mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
synced 2025-01-15 11:37:47 +00:00
1324970 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Rik van Riel
|
95de1e5b34 |
fs/proc: fix softlockup in __read_vmcore (part 2)
Since commit 5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore") the number of softlockups in __read_vmcore at kdump time have gone down, but they still happen sometimes. In a memory constrained environment like the kdump image, a softlockup is not just a harmless message, but it can interfere with things like RCU freeing memory, causing the crashdump to get stuck. The second loop in __read_vmcore has a lot more opportunities for natural sleep points, like scheduling out while waiting for a data write to happen, but apparently that is not always enough. Add a cond_resched() to the second loop in __read_vmcore to (hopefully) get rid of the softlockups. Link: https://lkml.kernel.org/r/20250110102821.2a37581b@fangorn Fixes: 5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore") Signed-off-by: Rik van Riel <riel@surriel.com> Reported-by: Breno Leitao <leitao@debian.org> Cc: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Matthew Wilcox (Oracle)
|
3bd1b307b6 |
mm: fix assertion in folio_end_read()
We only need to assert that the uptodate flag is clear if we're going to set it. This hasn't been a problem before now because we have only used folio_end_read() when completing with an error, but it's convenient to use it in squashfs if we discover the folio is already uptodate. Link: https://lkml.kernel.org/r/20250110163300.3346321-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Donet Tom
|
8d423383db |
mm: vmscan : pgdemote vmstat is not getting updated when MGLRU is enabled.
When MGLRU is enabled, the pgdemote_kswapd, pgdemote_direct, and pgdemote_khugepaged stats in vmstat are not being updated. Commit f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations") moved the pgdemote vmstat update from demote_folio_list() to shrink_inactive_list(), which is in the normal LRU path. As a result, the pgdemote stats are updated correctly for the normal LRU but not for MGLRU. To address this, we have added the pgdemote stat update in the evict_folios() function, which is in the MGLRU path. With this patch, the pgdemote stats will now be updated correctly when MGLRU is enabled. Without this patch vmstat output when MGLRU is enabled ====================================================== pgdemote_kswapd 0 pgdemote_direct 0 pgdemote_khugepaged 0 With this patch vmstat output when MGLRU is enabled =================================================== pgdemote_kswapd 43234 pgdemote_direct 4691 pgdemote_khugepaged 0 Link: https://lkml.kernel.org/r/20250109060540.451261-1-donettom@linux.ibm.com Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Acked-by: Yu Zhao <yuzhao@google.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kaiyang Zhao <kaiyang2@cs.cmu.edu> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Koichiro Den
|
926c34cb29 |
vmstat: disable vmstat_work on vmstat_cpu_down_prep()
The upstream commit adcfb264c3ed ("vmstat: disable vmstat_work on vmstat_cpu_down_prep()") introduced another warning during the boot phase so was soon reverted on upstream by commit cd6313beaeae ("Revert "vmstat: disable vmstat_work on vmstat_cpu_down_prep()""). This commit resolves it and reattempts the original fix. Even after mm/vmstat:online teardown, shepherd may still queue work for the dying cpu until the cpu is removed from online mask. While it's quite rare, this means that after unbind_workers() unbinds a per-cpu kworker, it potentially runs vmstat_update for the dying CPU on an irrelevant cpu before entering atomic AP states. When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the backtrace. BUG: using smp_processor_id() in preemptible [00000000] code: \ kworker/7:3/1702 caller is refresh_cpu_vm_stats+0x235/0x5f0 CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G Tainted: [N]=TEST Workqueue: mm_percpu_wq vmstat_update Call Trace: <TASK> dump_stack_lvl+0x8d/0xb0 check_preemption_disabled+0xce/0xe0 refresh_cpu_vm_stats+0x235/0x5f0 vmstat_update+0x17/0xa0 process_one_work+0x869/0x1aa0 worker_thread+0x5e5/0x1100 kthread+0x29e/0x380 ret_from_fork+0x2d/0x70 ret_from_fork_asm+0x1a/0x30 </TASK> So, for mm/vmstat:online, disable vmstat_work reliably on teardown and symmetrically enable it on startup. For secondary CPUs during CPU hotplug scenarios, ensure the delayed work is disabled immediately after the initialization. These CPUs are not yet online when start_shepherd_timer() runs on boot CPU. vmstat_cpu_online() will enable the work for them. Link: https://lkml.kernel.org/r/20250108042807.3429745-1-koichiro.den@canonical.com Signed-off-by: Huacai Chen <chenhuacai@kernel.org> Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Suggested-by: Huacai Chen <chenhuacai@kernel.org> Tested-by: Charalampos Mitrodimas <charmitro@posteo.net> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Yu Zhao
|
c30c0860f0 |
mm/hugetlb_vmemmap: fix memory loads ordering
Using x86_64 as an example, for a 32KB struct page[] area describing a 2MB hugeTLB, HVO reduces the area to 4KB by the following steps: 1. Split the (r/w vmemmap) PMD mapping the area into 512 (r/w) PTEs; 2. For the 8 PTEs mapping the area, remap PTE 1-7 to the page mapped by PTE 0, and at the same time change the permission from r/w to r/o; 3. Free the pages PTE 1-7 used to map, hence the reduction from 32KB to 4KB. However, the following race can happen due to improperly memory loads ordering: CPU 1 (HVO) CPU 2 (speculative PFN walker) page_ref_freeze() synchronize_rcu() rcu_read_lock() page_is_fake_head() is false vmemmap_remap_pte() XXX: struct page[] becomes r/o page_ref_unfreeze() page_ref_count() is not zero atomic_add_unless(&page->_refcount) XXX: try to modify r/o struct page[] Specifically, page_is_fake_head() must be ordered after page_ref_count() on CPU 2 so that it can only return true for this case, to avoid the later attempt to modify r/o struct page[]. This patch adds the missing memory barrier and makes the tests on page_is_fake_head() and page_ref_count() done in the proper order. Link: https://lkml.kernel.org/r/20250108074822.722696-1-yuzhao@google.com Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers") Signed-off-by: Yu Zhao <yuzhao@google.com> Reported-by: Will Deacon <will@kernel.org> Closes: https://lore.kernel.org/20241128142028.GA3506@willie-the-truck/ Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Kairui Song
|
045191a877 |
zram: fix potential UAF of zram table
If zram_meta_alloc failed early, it frees allocated zram->table without setting it NULL. Which will potentially cause zram_meta_free to access the table if user reset an failed and uninitialized device. Link: https://lkml.kernel.org/r/20250107065446.86928-1-ryncsn@gmail.com Fixes: 74363ec674cb ("zram: fix uninitialized ZRAM not releasing backing device") Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Ryan Roberts
|
39a444282e |
selftests/mm: set allocated memory to non-zero content in cow test
After commit b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp"), cow test cases involving swapping out THPs via madvise(MADV_PAGEOUT) started to be skipped due to the subsequent check via pagemap determining that the memory was not actually swapped out. Logs similar to this were emitted: ... # [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (16 kB) ok 2 # SKIP MADV_PAGEOUT did not work, is swap enabled? # [RUN] Basic COW after fork() ... with single PTE of swapped-out THP (16 kB) ok 3 # SKIP MADV_PAGEOUT did not work, is swap enabled? # [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (32 kB) ok 4 # SKIP MADV_PAGEOUT did not work, is swap enabled? ... The commit in question introduces the behaviour of scanning THPs and if their content is predominantly zero, it splits them and replaces the pages which are wholly zero with the zero page. These cow test cases were getting caught up in this. So let's avoid that by filling the contents of all allocated memory with a non-zero value. With this in place, the tests are passing again. Link: https://lkml.kernel.org/r/20250107142555.1870101-1-ryan.roberts@arm.com Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Ryan Roberts
|
7cc36ca820 |
mm: clear uffd-wp PTE/PMD state on mremap()
When mremap()ing a memory region previously registered with userfaultfd as write-protected but without UFFD_FEATURE_EVENT_REMAP, an inconsistency in flag clearing leads to a mismatch between the vma flags (which have uffd-wp cleared) and the pte/pmd flags (which do not have uffd-wp cleared). This mismatch causes a subsequent mprotect(PROT_WRITE) to trigger a warning in page_table_check_pte_flags() due to setting the pte to writable while uffd-wp is still set. Fix this by always explicitly clearing the uffd-wp pte/pmd flags on any such mremap() so that the values are consistent with the existing clearing of VM_UFFD_WP. Be careful to clear the logical flag regardless of its physical form; a PTE bit, a swap PTE bit, or a PTE marker. Cover PTE, huge PMD and hugetlb paths. Link: https://lkml.kernel.org/r/20250107144755.1871363-2-ryan.roberts@arm.com Co-developed-by: Mikołaj Lenczewski <miko.lenczewski@arm.com> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Closes: https://lore.kernel.org/linux-mm/810b44a8-d2ae-4107-b665-5a42eae2d948@arm.com/ Fixes: 63b2d4174c4a ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl") Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Thomas Weißschuh
|
6c2a1908f6 |
selftests/mm: virtual_address_range: avoid reading VVAR mappings
The virtual_address_range selftest reads from the start of each mapping listed in /proc/self/maps. However not all mappings are valid to be arbitrarily accessed. For example the vvar data used for virtual clocks on x86 can only be accessed if 1) the kernel configuration enables virtual clocks and 2) the hypervisor provided the data for it, which can only determined by the VDSO code itself. Since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") the virtual clock data was split out into its own mapping, triggering faulting accesses by virtual_address_range. Skip the various vvar mappings in virtual_address_range to avoid errors. Link: https://lkml.kernel.org/r/20250107-virtual_address_range-tests-v1-2-3834a2fb47fe@linutronix.de Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com Cc: Dev Jain <dev.jain@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Thomas Weißschuh
|
b1b7050709 |
selftests/mm: virtual_address_range: fix error when CommitLimit < 1GiB
If not enough physical memory is available the kernel may fail mmap(); see __vm_enough_memory() and vm_commit_limit(). In that case the logic in validate_complete_va_space() does not make sense and will even incorrectly fail. Instead skip the test if no mmap() succeeded. Link: https://lkml.kernel.org/r/20250107-virtual_address_range-tests-v1-1-3834a2fb47fe@linutronix.de Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Cc: <stable@vger.kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: kernel test robot <oliver.sang@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Petr Pavlu
|
54582e074a |
module: fix writing of livepatch relocations in ROX text
A livepatch module can contain a special relocation section .klp.rela.<objname>.<secname> to apply its relocations at the appropriate time and to additionally access local and unexported symbols. When <objname> points to another module, such relocations are processed separately from the regular module relocation process. For instance, only when the target <objname> actually becomes loaded. With CONFIG_STRICT_MODULE_RWX, when the livepatch core decides to apply these relocations, their processing results in the following bug: [ 25.827238] BUG: unable to handle page fault for address: 00000000000012ba [ 25.827819] #PF: supervisor read access in kernel mode [ 25.828153] #PF: error_code(0x0000) - not-present page [ 25.828588] PGD 0 P4D 0 [ 25.829063] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI [ 25.829742] CPU: 2 UID: 0 PID: 452 Comm: insmod Tainted: G O K 6.13.0-rc4-00078-g059dd502b263 #7820 [ 25.830417] Tainted: [O]=OOT_MODULE, [K]=LIVEPATCH [ 25.830768] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 [ 25.831651] RIP: 0010:memcmp+0x24/0x60 [ 25.832190] Code: [...] [ 25.833378] RSP: 0018:ffffa40b403a3ae8 EFLAGS: 00000246 [ 25.833637] RAX: 0000000000000000 RBX: ffff93bc81d8e700 RCX: ffffffffc0202000 [ 25.834072] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 00000000000012ba [ 25.834548] RBP: ffffa40b403a3b68 R08: ffffa40b403a3b30 R09: 0000004a00000002 [ 25.835088] R10: ffffffffffffd222 R11: f000000000000000 R12: 0000000000000000 [ 25.835666] R13: ffffffffc02032ba R14: ffffffffc007d1e0 R15: 0000000000000004 [ 25.836139] FS: 00007fecef8c3080(0000) GS:ffff93bc8f900000(0000) knlGS:0000000000000000 [ 25.836519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 25.836977] CR2: 00000000000012ba CR3: 0000000002f24000 CR4: 00000000000006f0 [ 25.837442] Call Trace: [ 25.838297] <TASK> [ 25.841083] __write_relocate_add.constprop.0+0xc7/0x2b0 [ 25.841701] apply_relocate_add+0x75/0xa0 [ 25.841973] klp_write_section_relocs+0x10e/0x140 [ 25.842304] klp_write_object_relocs+0x70/0xa0 [ 25.842682] klp_init_object_loaded+0x21/0xf0 [ 25.842972] klp_enable_patch+0x43d/0x900 [ 25.843572] do_one_initcall+0x4c/0x220 [ 25.844186] do_init_module+0x6a/0x260 [ 25.844423] init_module_from_file+0x9c/0xe0 [ 25.844702] idempotent_init_module+0x172/0x270 [ 25.845008] __x64_sys_finit_module+0x69/0xc0 [ 25.845253] do_syscall_64+0x9e/0x1a0 [ 25.845498] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 25.846056] RIP: 0033:0x7fecef9eb25d [ 25.846444] Code: [...] [ 25.847563] RSP: 002b:00007ffd0c5d6de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 25.848082] RAX: ffffffffffffffda RBX: 000055b03f05e470 RCX: 00007fecef9eb25d [ 25.848456] RDX: 0000000000000000 RSI: 000055b001e74e52 RDI: 0000000000000003 [ 25.848969] RBP: 00007ffd0c5d6ea0 R08: 0000000000000040 R09: 0000000000004100 [ 25.849411] R10: 00007fecefac7b20 R11: 0000000000000246 R12: 000055b001e74e52 [ 25.849905] R13: 0000000000000000 R14: 000055b03f05e440 R15: 0000000000000000 [ 25.850336] </TASK> [ 25.850553] Modules linked in: deku(OK+) uinput [ 25.851408] CR2: 00000000000012ba [ 25.852085] ---[ end trace 0000000000000000 ]--- The problem is that the .klp.rela.<objname>.<secname> relocations are processed after the module was already formed and mod->rw_copy was reset. However, the code in __write_relocate_add() calls module_writable_address() which translates the target address 'loc' still to 'loc + (mem->rw_copy - mem->base)', with mem->rw_copy now being 0. Fix the problem by returning directly 'loc' in module_writable_address() when the module is already formed. Function __write_relocate_add() knows to use text_poke() in such a case. Link: https://lkml.kernel.org/r/20250107153507.14733-1-petr.pavlu@suse.com Fixes: 0c133b1e78cd ("module: prepare to handle ROX allocations for text") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reported-by: Marek Maslanka <mmaslanka@google.com> Closes: https://lore.kernel.org/linux-modules/CAGcaFA2hdThQV6mjD_1_U+GNHThv84+MQvMWLgEuX+LVbAyDxg@mail.gmail.com/ Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Yosry Ahmed
|
47c2fab082 |
mm-zswap-properly-synchronize-freeing-resources-during-cpu-hotunplug-fix
remove comment Link: https://lkml.kernel.org/r/CAJD7tkaxS1wjn+swugt8QCvQ-rVF5RZnjxwPGX17k8x9zSManA@mail.gmail.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Cc: Barry Song <baohua@kernel.org> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sam Sun <samsun1006219@gmail.com> Cc: Vitaly Wool <vitalywool@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Yosry Ahmed
|
b8668bad31 |
mm: zswap: properly synchronize freeing resources during CPU hotunplug
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the current CPU at the beginning of the operation is retrieved and used throughout. However, since neither preemption nor migration are disabled, it is possible that the operation continues on a different CPU. If the original CPU is hotunplugged while the acomp_ctx is still in use, we run into a UAF bug as some of the resources attached to the acomp_ctx are freed during hotunplug in zswap_cpu_comp_dead() (i.e. acomp_ctx.buffer, acomp_ctx.req, or acomp_ctx.acomp). The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration") when the switch to the crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was retrieved using get_cpu_ptr() which disables preemption and makes sure the CPU cannot go away from under us. Preemption cannot be disabled with the crypto_acomp API as a sleepable context is needed. Use the acomp_ctx.mutex to synchronize CPU hotplug callbacks allocating and freeing resources with compression/decompression paths. Make sure that acomp_ctx.req is NULL when the resources are freed. In the compression/decompression paths, check if acomp_ctx.req is NULL after acquiring the mutex (meaning the CPU was offlined) and retry on the new CPU. The initialization of acomp_ctx.mutex is moved from the CPU hotplug callback to the pool initialization where it belongs (where the mutex is allocated). In addition to adding clarity, this makes sure that CPU hotplug cannot reinitialize a mutex that is already locked by compression/decompression. Previously a fix was attempted by holding cpus_read_lock() [1]. This would have caused a potential deadlock as it is possible for code already holding the lock to fall into reclaim and enter zswap (causing a deadlock). A fix was also attempted using SRCU for synchronization, but Johannes pointed out that synchronize_srcu() cannot be used in CPU hotplug notifiers [2]. Alternative fixes that were considered/attempted and could have worked: - Refcounting the per-CPU acomp_ctx. This involves complexity in handling the race between the refcount dropping to zero in zswap_[de]compress() and the refcount being re-initialized when the CPU is onlined. - Disabling migration before getting the per-CPU acomp_ctx [3], but that's discouraged and is a much bigger hammer than needed, and could result in subtle performance issues. [1]https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@google.com/ [2]https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@google.com/ [3]https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@google.com/ Link: https://lkml.kernel.org/r/20250108222441.3622031-1-yosryahmed@google.com Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration") Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reported-by: Johannes Weiner <hannes@cmpxchg.org> Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/ Reported-by: Sam Sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/ Cc: Barry Song <baohua@kernel.org> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Yosry Ahmed
|
36357fd23a |
Revert "mm: zswap: fix race between [de]compression and CPU hotunplug"
This reverts commit eaebeb93922ca6ab0dd92027b73d0112701706ef. Commit eaebeb93922c ("mm: zswap: fix race between [de]compression and CPU hotunplug") used the CPU hotplug lock in zswap compress/decompress operations to protect against a race with CPU hotunplug making some per-CPU resources go away. However, zswap compress/decompress can be reached through reclaim while the lock is held, resulting in a potential deadlock as reported by syzbot: ====================================================== WARNING: possible circular locking dependency detected 6.13.0-rc6-syzkaller-00006-g5428dc1906dd #0 Not tainted ------------------------------------------------------ kswapd0/89 is trying to acquire lock: ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: acomp_ctx_get_cpu mm/zswap.c:886 [inline] ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_compress mm/zswap.c:908 [inline] ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_store_page mm/zswap.c:1439 [inline] ffffffff8e7d2ed0 (cpu_hotplug_lock){++++}-{0:0}, at: zswap_store+0xa74/0x1ba0 mm/zswap.c:1546 but task is already holding lock: ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6871 [inline] ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xb58/0x2f30 mm/vmscan.c:7253 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849 __fs_reclaim_acquire mm/page_alloc.c:3853 [inline] fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867 might_alloc include/linux/sched/mm.h:318 [inline] slab_pre_alloc_hook mm/slub.c:4070 [inline] slab_alloc_node mm/slub.c:4148 [inline] __kmalloc_cache_node_noprof+0x40/0x3a0 mm/slub.c:4337 kmalloc_node_noprof include/linux/slab.h:924 [inline] alloc_worker kernel/workqueue.c:2638 [inline] create_worker+0x11b/0x720 kernel/workqueue.c:2781 workqueue_prepare_cpu+0xe3/0x170 kernel/workqueue.c:6628 cpuhp_invoke_callback+0x48d/0x830 kernel/cpu.c:194 __cpuhp_invoke_callback_range kernel/cpu.c:965 [inline] cpuhp_invoke_callback_range kernel/cpu.c:989 [inline] cpuhp_up_callbacks kernel/cpu.c:1020 [inline] _cpu_up+0x2b3/0x580 kernel/cpu.c:1690 cpu_up+0x184/0x230 kernel/cpu.c:1722 cpuhp_bringup_mask+0xdf/0x260 kernel/cpu.c:1788 cpuhp_bringup_cpus_parallel+0xf9/0x160 kernel/cpu.c:1878 bringup_nonboot_cpus+0x2b/0x50 kernel/cpu.c:1892 smp_init+0x34/0x150 kernel/smp.c:1009 kernel_init_freeable+0x417/0x5d0 init/main.c:1569 kernel_init+0x1d/0x2b0 init/main.c:1466 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 -> #0 (cpu_hotplug_lock){++++}-{0:0}: check_prev_add kernel/locking/lockdep.c:3161 [inline] check_prevs_add kernel/locking/lockdep.c:3280 [inline] validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904 __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] cpus_read_lock+0x42/0x150 kernel/cpu.c:490 acomp_ctx_get_cpu mm/zswap.c:886 [inline] zswap_compress mm/zswap.c:908 [inline] zswap_store_page mm/zswap.c:1439 [inline] zswap_store+0xa74/0x1ba0 mm/zswap.c:1546 swap_writepage+0x647/0xce0 mm/page_io.c:279 shmem_writepage+0x1248/0x1610 mm/shmem.c:1579 pageout mm/vmscan.c:696 [inline] shrink_folio_list+0x35ee/0x57e0 mm/vmscan.c:1374 shrink_inactive_list mm/vmscan.c:1967 [inline] shrink_list mm/vmscan.c:2205 [inline] shrink_lruvec+0x16db/0x2f30 mm/vmscan.c:5734 mem_cgroup_shrink_node+0x385/0x8e0 mm/vmscan.c:6575 mem_cgroup_soft_reclaim mm/memcontrol-v1.c:312 [inline] memcg1_soft_limit_reclaim+0x346/0x810 mm/memcontrol-v1.c:362 balance_pgdat mm/vmscan.c:6975 [inline] kswapd+0x17b3/0x2f30 mm/vmscan.c:7253 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(cpu_hotplug_lock); lock(fs_reclaim); rlock(cpu_hotplug_lock); *** DEADLOCK *** 1 lock held by kswapd0/89: #0: ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6871 [inline] #0: ffffffff8ea355a0 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xb58/0x2f30 mm/vmscan.c:7253 stack backtrace: CPU: 0 UID: 0 PID: 89 Comm: kswapd0 Not tainted 6.13.0-rc6-syzkaller-00006-g5428dc1906dd #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2074 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2206 check_prev_add kernel/locking/lockdep.c:3161 [inline] check_prevs_add kernel/locking/lockdep.c:3280 [inline] validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904 __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] cpus_read_lock+0x42/0x150 kernel/cpu.c:490 acomp_ctx_get_cpu mm/zswap.c:886 [inline] zswap_compress mm/zswap.c:908 [inline] zswap_store_page mm/zswap.c:1439 [inline] zswap_store+0xa74/0x1ba0 mm/zswap.c:1546 swap_writepage+0x647/0xce0 mm/page_io.c:279 shmem_writepage+0x1248/0x1610 mm/shmem.c:1579 pageout mm/vmscan.c:696 [inline] shrink_folio_list+0x35ee/0x57e0 mm/vmscan.c:1374 shrink_inactive_list mm/vmscan.c:1967 [inline] shrink_list mm/vmscan.c:2205 [inline] shrink_lruvec+0x16db/0x2f30 mm/vmscan.c:5734 mem_cgroup_shrink_node+0x385/0x8e0 mm/vmscan.c:6575 mem_cgroup_soft_reclaim mm/memcontrol-v1.c:312 [inline] memcg1_soft_limit_reclaim+0x346/0x810 mm/memcontrol-v1.c:362 balance_pgdat mm/vmscan.c:6975 [inline] kswapd+0x17b3/0x2f30 mm/vmscan.c:7253 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Revert the change. A different fix for the race with CPU hotunplug will follow. Link: https://lkml.kernel.org/r/20250107222236.2715883-1-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Barry Song <baohua@kernel.org> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sam Sun <samsun1006219@gmail.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Muchun Song
|
56b2294d7d |
hugetlb: fix NULL pointer dereference in trace_hugetlbfs_alloc_inode
hugetlb_file_setup() will pass a NULL @dir to hugetlbfs_get_inode(), so we will access a NULL pointer for @dir. Fix it and set __entry->dr to 0 if @dir is NULL. Because ->i_ino cannot be 0 (see get_next_ino()), there is no confusing if user sees a 0 inode number. Link: https://lkml.kernel.org/r/20250106033118.4640-1-songmuchun@bytedance.com Fixes: 318580ad7f28 ("hugetlbfs: support tracepoint") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reported-by: Cheung Wall <zzqq0103.hey@gmail.com> Closes: https://lore.kernel.org/linux-mm/02858D60-43C1-4863-A84F-3C76A8AF1F15@linux.dev/T/# Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Cc: cheung wall <zzqq0103.hey@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Stefan Roesch
|
5c475df36d |
mm-fix-div-by-zero-in-bdi_ratio_from_pages-v3
During testing it has been detected, that it is possible to get div by zero error in bdi_set_min_bytes. The error is caused by the function bdi_ratio_from_pages(). bdi_ratio_from_pages() calls global_dirty_limits. If the dirty threshold is 0, the div by zero is raised. This can happen if the root user is setting: echo 0 > /proc/sys/vm/dirty_ratio The following is a test case: echo 0 > /proc/sys/vm/dirty_ratio cd /sys/class/bdi/<device> echo 1 > strict_limit echo 8192 > min_bytes ==> error is raised. The problem is addressed by returning -EINVAL if dirty_ratio or dirty_bytes is set to 0. Link: https://lkml.kernel.org/r/20250109063411.6591-1-shr@devkernel.io Reported-by: cheung wall <zzqq0103.hey@gmail.com> Closes: https://lore.kernel.org/linux-mm/87pll35yd0.fsf@devkernel.io/T/#t Signed-off-by: Stefan Roesch <shr@devkernel.io> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Stefan Roesch
|
4a8c239933 |
mm-fix-div-by-zero-in-bdi_ratio_from_pages-v2
check for -EINVAL in bdi_set_min_bytes() and bdi_set_max_bytes() Link: https://lkml.kernel.org/r/20250108014723.166637-1-shr@devkernel.io Reported-by: cheung wall <zzqq0103.hey@gmail.com> Closes: https://lore.kernel.org/linux-mm/87pll35yd0.fsf@devkernel.io/T/#t Signed-off-by: Stefan Roesch <shr@devkernel.io> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Stefan Roesch
|
738fa81619 |
mm: fix div by zero in bdi_ratio_from_pages
During testing it was detected that it is possible to get div by zero error in bdi_set_min_bytes. The error is caused by the function bdi_ratio_from_pages(). bdi_ratio_from_pages() calls global_dirty_limits. If the dirty threshold is 0, the div by zero is raised. This can happen if the root user is setting: echo 0 > /proc/sys/vm/dirty_ratio The following is a test case: echo 0 > /proc/sys/vm/dirty_ratio cd /sys/class/bdi/<device> echo 1 > strict_limit echo 8192 > min_bytes ==> error is raised. The problem is addressed by returning -EINVAL if dirty_ratio or dirty_bytes is set to 0. Link: https://lkml.kernel.org/r/20250104012037.159386-1-shr@devkernel.io Reported-by: cheung wall <zzqq0103.hey@gmail.com> Closes: https://lore.kernel.org/linux-mm/87pll35yd0.fsf@devkernel.io/T/#t Signed-off-by: Stefan Roesch <shr@devkernel.io> Acked-by: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Qiang Zhang <zzqq0103.hey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Juergen Gross
|
dcb1623e2a |
x86/execmem: fix ROX cache usage in Xen PV guests
The recently introduced ROX cache for modules is assuming large page support in 64-bit mode without testing the related feature bit. This results in breakage when running as a Xen PV guest, as in this mode large pages are not supported. Fix that by testing the X86_FEATURE_PSE capability when deciding whether to enable the ROX cache. Link: https://lkml.kernel.org/r/20250103065631.26459-1-jgross@suse.com Fixes: 2e45474ab14f ("execmem: add support for cache of large ROX pages") Signed-off-by: Juergen Gross <jgross@suse.com> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Marco Nelissen
|
8c97f863da |
filemap: avoid truncating 64-bit offset to 32 bits
On 32-bit kernels, folio_seek_hole_data() was inadvertently truncating a 64-bit value to 32 bits, leading to a possible infinite loop when writing to an xfs filesystem. Link: https://lkml.kernel.org/r/20250102190540.1356838-1-marco.nelissen@gmail.com Fixes: 54fa39ac2e00 ("iomap: use mapping_seek_hole_data") Signed-off-by: Marco Nelissen <marco.nelissen@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Suren Baghdasaryan
|
d8cc9c50d1 |
tools: fix atomic_set() definition to set the value correctly
Currently vma test is failing because of the new vma_assert_attached() assertion. The check is failing because previous refcount_set() inside vma_mark_attached() is a NoOp. Fix the definition of atomic_set() to correctly set the value of the atomic. Link: https://lkml.kernel.org/r/20241227222220.1726384-1-surenb@google.com Fixes: 9325b8b5a1cb ("tools: add skeleton code for userland testing of VMA logic") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Honggyu Kim
|
410b2d13cf |
mm/mempolicy: count MPOL_WEIGHTED_INTERLEAVE to "interleave_hit"
Commit fa3bea4e1f82 introduced MPOL_WEIGHTED_INTERLEAVE but it missed adding its counter to "interleave_hit" of numastat, which is located at /sys/devices/system/node/nodeN/ directory. It'd be better to add weighted interleving counter info to the existing "interleave_hit" instead of introducing a new counter "weighted_interleave_hit". Link: https://lkml.kernel.org/r/20241227095737.645-1-honggyu.kim@sk.com Fixes: fa3bea4e1f82 ("mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving") Signed-off-by: Honggyu Kim <honggyu.kim@sk.com> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com> Tested-by: Yunjeong Mun <yunjeong.mun@sk.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Luca Ceresoli
|
33af53e509 |
scripts/decode_stacktrace.sh: fix decoding of lines with an additional info
Since commit bdf8eafbf7f5 ("arm64: stacktrace: report source of unwind data") a stack trace line can contain an additional info field that was not present before, in the form of one or more letters in parentheses. E.g.: [ 504.517915] led_sysfs_enable+0x54/0x80 (P) ^^^ When this is present, decode_stacktrace decodes the line incorrectly: [ 504.517915] led_sysfs_enable+0x54/0x80 P Extend parsing to decode it correctly: [ 504.517915] led_sysfs_enable (drivers/leds/led-core.c:455 (discriminator 7)) (P) The regex to match such lines assumes the info can be extended in the future to other uppercase characters, and will need to be extended in case other characters will be used. Using a much more generic regex might incur in false positives, so this looked like a good tradeoff. Link: https://lkml.kernel.org/r/20241230-decode_stacktrace-fix-info-v1-1-984910659173@bootlin.com Fixes: bdf8eafbf7f5 ("arm64: stacktrace: report source of unwind data") Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Puranjay Mohan <puranjay@kernel.org> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Guo Weikang
|
474739022e |
mm/kmemleak: fix percpu memory leak detection failure
kmemleak_alloc_percpu gives an incorrect min_count parameter, causing percpu memory to be considered a gray object. Link: https://lkml.kernel.org/r/20241227092311.3572500-1-guoweikang.kernel@gmail.com Fixes: 8c8685928910 ("mm/kmemleak: use IS_ERR_PCPU() for pointer in the percpu address space") Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com> Acked-by: Uros Bizjak <ubizjak@gmail.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Guo Weikang <guoweikang.kernel@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Suren Baghdasaryan
|
d1c2903592 |
alloc_tag: skip pgalloc_tag_swap if profiling is disabled
When memory allocation profiling is disabled, there is no need to swap allocation tags during migration. Skip it to avoid unnecessary overhead. Once I added these checks, the overhead of the mode when memory profiling is enabled but turned off went down by about 50%. I ran more comprehensive testing on Pixel 6 on Big, Medium and Little cores: Overhead before fixes Overhead after fixes slab alloc page alloc slab alloc page alloc Big 6.21% 5.32% 3.31% 4.93% Medium 4.51% 5.05% 3.79% 4.39% Little 7.62% 1.82% 6.68% 1.02% This is an allocation microbenchmark doing allocations in a tight loop. Not a really realistic scenario and useful only to make performance comparisons. Link: https://lkml.kernel.org/r/20241226211639.1357704-2-surenb@google.com Fixes: e0a955bf7f61 ("mm/codetag: add pgalloc_tag_copy()") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: David Wang <00107082@163.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Yu Zhao <yuzhao@google.com> Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
zihan zhou
|
d871d8a4e4 |
mm: page_alloc: fix missed updates of lowmem_reserve in adjust_managed_page_count
In the kernel, the zone's lowmem_reserve and _watermark, and the global variable 'totalreserve_pages' depend on the value of managed_pages, but after running adjust_managed_page_count, these values aren't updated, which causes some problems. For example, in a system with six 1GB large pages, we found that the value of protection in zoneinfo (zone->lowmem_reserve), is not right. Its value seems calculated from the initial managed_pages, but after the managed_pages changed, was not updated. Only after reading the file /proc/sys/vm/lowmem_reserve_ratio, updates happen. read file /proc/sys/vm/lowmem_reserve_ratio: lowmem_reserve_ratio_sysctl_handler ----setup_per_zone_lowmem_reserve --------calculate_totalreserve_pages protection changed after reading file: [root@test ~]# cat /proc/zoneinfo | grep protection protection: (0, 2719, 57360, 0) protection: (0, 0, 54640, 0) protection: (0, 0, 0, 0) protection: (0, 0, 0, 0) [root@test ~]# cat /proc/sys/vm/lowmem_reserve_ratio 256 256 32 0 [root@test ~]# cat /proc/zoneinfo | grep protection protection: (0, 2735, 63524, 0) protection: (0, 0, 60788, 0) protection: (0, 0, 0, 0) protection: (0, 0, 0, 0) lowmem_reserve increased also makes the totalreserve_pages increased, which causes a decrease in available memory. The one above is just a test machine, and the increase is not significant. On our online machine, the reserved memory will increase by several GB due to reading this file. It is clearly unreasonable to cause a sharp drop in available memory just by reading a file. In this patch, we update reserve memory when update managed_pages, The size of reserved memory becomes stable. But it seems that the _watermark should also be updated along with the managed_pages. We have not done it because we are unsure if it is reasonable to set the watermark through the initial managed_pages. If it is not reasonable, we will propose new patch. Link: https://lkml.kernel.org/r/20241225021034.45693-1-15645113830zzh@gmail.com Signed-off-by: zihan zhou <15645113830zzh@gmail.com> Signed-off-by: yaowenchao <yaowenchao@jd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
liuye
|
665ad6bbac |
mm/vmscan: fix hard LOCKUP in function isolate_lru_folios
This fixes the following hard lockup in isolate_lru_folios() during memory reclaim. If the LRU mostly contains ineligible folios this may trigger watchdog. watchdog: Watchdog detected hard LOCKUP on cpu 173 RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0 Call Trace: _raw_spin_lock_irqsave+0x31/0x40 folio_lruvec_lock_irqsave+0x5f/0x90 folio_batch_move_lru+0x91/0x150 lru_add_drain_per_cpu+0x1c/0x40 process_one_work+0x17d/0x350 worker_thread+0x27b/0x3a0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 lruvec->lru_lock owner: PID: 2865 TASK: ffff888139214d40 CPU: 40 COMMAND: "kswapd0" #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555 #1 [fffffe0000945e68] nmi_handle at ffffffffa563b171 #2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920 #3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4 #4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde [exception RIP: isolate_lru_folios+403] RIP: ffffffffa597df53 RSP: ffffc90006fb7c28 RFLAGS: 00000002 RAX: 0000000000000001 RBX: ffffc90006fb7c60 RCX: ffffea04a2196f88 RDX: ffffc90006fb7c60 RSI: ffffc90006fb7c60 RDI: ffffea04a2197048 RBP: ffff88812cbd3010 R8: ffffea04a2197008 R9: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: ffffea04a2197008 R13: ffffea04a2197048 R14: ffffc90006fb7de8 R15: 0000000003e3e937 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 <NMI exception stack> #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53 #6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788 #7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0 #8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354 #9 [ffffc90006fb7ef8] kthread at ffffffffa5748238 crash> Scenario: User processe are requesting a large amount of memory and keep page active. Then a module continuously requests memory from ZONE_DMA32 area. Memory reclaim will be triggered due to ZONE_DMA32 watermark alarm reached. However pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area. Reproduce: Terminal 1: Construct to continuously increase pages active(anon). mkdir /tmp/memory mount -t tmpfs -o size=1024000M tmpfs /tmp/memory dd if=/dev/zero of=/tmp/memory/block bs=4M tail /tmp/memory/block Terminal 2: vmstat -a 1 active will increase. procs ---memory--- ---swap-- ---io---- -system-- ---cpu--- ... r b swpd free inact active si so bi bo 1 0 0 1445623076 45898836 83646008 0 0 0 1 0 0 1445623076 43450228 86094616 0 0 0 1 0 0 1445623076 41003480 88541364 0 0 0 1 0 0 1445623076 38557088 90987756 0 0 0 1 0 0 1445623076 36109688 93435156 0 0 0 1 0 0 1445619552 33663256 95881632 0 0 0 1 0 0 1445619804 31217140 98327792 0 0 0 1 0 0 1445619804 28769988 100774944 0 0 0 1 0 0 1445619804 26322348 103222584 0 0 0 1 0 0 1445619804 23875592 105669340 0 0 0 cat /proc/meminfo | head Active(anon) increase. MemTotal: 1579941036 kB MemFree: 1445618500 kB MemAvailable: 1453013224 kB Buffers: 6516 kB Cached: 128653956 kB SwapCached: 0 kB Active: 118110812 kB Inactive: 11436620 kB Active(anon): 115345744 kB Inactive(anon): 945292 kB When the Active(anon) is 115345744 kB, insmod module triggers the ZONE_DMA32 watermark. perf record -e vmscan:mm_vmscan_lru_isolate -aR perf script isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2 nr_skipped=2 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29 nr_skipped=29 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon See nr_scanned=28835844. 28835844 * 4k = 115343376KB approximately equal to 115345744 kB. If increase Active(anon) to 1000G then insmod module triggers the ZONE_DMA32 watermark. hard lockup will occur. In my device nr_scanned = 0000000003e3e937 when hard lockup. Convert to memory size 0x0000000003e3e937 * 4KB = 261072092 KB. [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53 ffffc90006fb7c30: 0000000000000020 0000000000000000 ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000 ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8 ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48 ffffc90006fb7c70: 0000000000000000 0000000000000000 ffffc90006fb7c80: 0000000000000000 0000000000000000 ffffc90006fb7c90: 0000000000000000 0000000000000000 ffffc90006fb7ca0: 0000000000000000 0000000003e3e937 ffffc90006fb7cb0: 0000000000000000 0000000000000000 ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000 About the Fixes: Why did it take eight years to be discovered? The problem requires the following conditions to occur: 1. The device memory should be large enough. 2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area. 3. The memory in ZONE_DMA32 needs to reach the watermark. If the memory is not large enough, or if the usage design of ZONE_DMA32 area memory is reasonable, this problem is difficult to detect. notes: The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL, but other suitable scenarios may also trigger the problem. Link: https://lkml.kernel.org/r/20241119060842.274072-1-liuye@kylinos.cn Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis") Signed-off-by: liuye <liuye@kylinos.cn> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Linus Torvalds
|
cd6313beae |
Revert "vmstat: disable vmstat_work on vmstat_cpu_down_prep()"
This reverts commit adcfb264c3ed51fbbf5068ddf10d309a63683868. It turns out this just causes a different warning splat instead that seems to be much easier to trigger, so let's revert ASAP. Reported-and-bisected-by: Borislav Petkov <bp@alien8.de> Tested-by: Breno Leitao <leitao@debian.org> Reported-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/all/20250106131817.GAZ3vYGVr3-hWFFPLj@fat_crate.local/ Cc: Koichiro Den <koichiro.den@canonical.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
9d89551994 | Linux 6.13-rc6 v6.13-rc6 | ||
Linus Torvalds
|
9244696b34 |
Kbuild fixes for v6.13 (3rd)
- Fix escaping of '$' in scripts/mksysmap - Fix a modpost crash observed with the latest binutils - Fix 'provides' in the linux-api-headers pacman package -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmd6lzsVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGkXoQAJCH0b0gfzuRMtgKI+HKj/eow4ZI +uArtK/cu2fJ6Vxe71GTBJGEca8DNslOOlyMvWf6jhRcBdd5jOlTciUojk389jRb 6JR9fvEfS3RevkbkzVe0Uq6jSL02eLrF/z/Q3xmVMi68npBlCg40G2EDIIS8iWpu jRxnhSUBKU+ooKahA2KwRVexUdkE8BbdsNLUf5NaTSOACI0A4sn6Om4uYdCQdrRC oNptpzJdVoG40cKlzSyzNoCVxxY77b9PKdkuR28SMpj55ZjcAQRV3ZuHphegH1JQ fpg6JNu0sNHrM4OkKzrpA6f8d3s1yooILgrCMQhFo0gUHLFVLIuos1t8uT4urDr+ RJ7bUWnLCyqu6wEtzGRaiRPXDolKJrXYYTO+XwLkz5iLkNLO1uGXvm7Hbl4GG2iX PKKZ5gT02kQmijvDRc496Q7z4LYJqAqUIUu0NSR1lP+qK0qKCIHJyi3kAAvvLZ/Y Eia8LtUXcjjbbMSSk3vCXC8qvRj4NE1HxqmbBDVTpdnzxNfxeQisHiD0Rz9RzTyo FhvWGrDNx2vTcShrBiZvww5wfUkSSueXqG89d95U3AxlhJ58dk4w08whu03fkhNB P/NM1e6ShOo682LVrBgFHmhgxc+2iYb81ZqzfSJVHLOV8cpShC0KjONFRgdCW4K4 O8iD6fUXYDlLIJjg =v0Mh -----END PGP SIGNATURE----- Merge tag 'kbuild-fixes-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix escaping of '$' in scripts/mksysmap - Fix a modpost crash observed with the latest binutils - Fix 'provides' in the linux-api-headers pacman package * tag 'kbuild-fixes-v6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: pacman-pkg: provide versioned linux-api-headers package modpost: work around unaligned data access error modpost: refactor do_vmbus_entry() modpost: fix the missed iteration for the max bit in do_input() scripts/mksysmap: Fix escape chars '$' |
||
Linus Torvalds
|
5635d8bad2 |
25 hotfixes. 16 are cc:stable. 18 are MM and 7 are non-MM.
The usual bunch of singletons and two doubletons - please see the relevant changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ3noXwAKCRDdBJ7gKXxA jkzRAP9Ejb8kbgCrA3cptnzlVkDCDUm0TmleepT3bx6B2rH0BgEAzSiTXf4ioZPg 4pOHnKIGOWEVPcVwBrdA0irWG+QPYAQ= =nEIZ -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2025-01-04-18-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "25 hotfixes. 16 are cc:stable. 18 are MM and 7 are non-MM. The usual bunch of singletons and two doubletons - please see the relevant changelogs for details" * tag 'mm-hotfixes-stable-2025-01-04-18-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits) MAINTAINERS: change Arınç _NAL's name and email address scripts/sorttable: fix orc_sort_cmp() to maintain symmetry and transitivity mm/util: make memdup_user_nul() similar to memdup_user() mm, madvise: fix potential workingset node list_lru leaks mm/damon/core: fix ignored quota goals and filters of newly committed schemes mm/damon/core: fix new damon_target objects leaks on damon_commit_targets() mm/list_lru: fix false warning of negative counter vmstat: disable vmstat_work on vmstat_cpu_down_prep() mm: shmem: fix the update of 'shmem_falloc->nr_unswapped' mm: shmem: fix incorrect index alignment for within_size policy percpu: remove intermediate variable in PERCPU_PTR() mm: zswap: fix race between [de]compression and CPU hotunplug ocfs2: fix slab-use-after-free due to dangling pointer dqi_priv fs/proc/task_mmu: fix pagemap flags with PMD THP entries on 32bit kcov: mark in_softirq_really() as __always_inline docs: mm: fix the incorrect 'FileHugeMapped' field mailmap: modify the entry for Mathieu Othacehe mm/kmemleak: fix sleeping function called from invalid context at print message mm: hugetlb: independent PMD page table shared count maple_tree: reload mas before the second call for mas_empty_area ... |
||
Linus Torvalds
|
7a5b6fc8bd |
A randconfig build fix and a performance fix:
- Fix the CONFIG_RESET_CONTROLLER=n path signature of clk_imx8mp_audiomix_reset_controller_register() to appease randconfig - Speed up the sdhci clk on TH1520 by a factor of 4 by adding a fixed factor clk -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmd5nXkRHHNib3lkQGtl cm5lbC5vcmcACgkQrQKIl8bklSXRpRAAsPGcr0feyS+TxKSp3qGdmUwNdiwYzoH/ NvyzMO++Lw3VICY+NfatlfhHlVT9mwVbIB/1xvMbL/3p7AbYR2e4/wRzOycE6+mB leYUrEmvWeRXCEgs6zymMm8i7qezTfRPvpTf6dIp04DxbIEXcy/B9FTdhJlYQv8T CfOXv6BVZbWDgkNDZBiu/a93sMk3ivQUYcgdPYZD6CT6IcEeTFYRAoZqIrywO14P xl70TIaWC+PDeGjsGlH5nFXq4cCkRuqSIZx50hsgT5tvM+CV105AGdMjlNRijwzM L4G3u1faRzGZI+4U4ccjZuNQWBhPZxdb5Ho31nDHYSaSxW/f7eBIJfZYVTggryC9 DcgQSlPdfiVa5fwyj/ZSenOr7S0Km0onnHxrbqNd5bCEuhWrpRPCwLJ2XtqOOi1l 3jezZ2RvvtrYqO0naPrcgpJVPU98J+CfNveGQUSXH9g/b2RVBgUsOV4NLWb6YoH0 f11RbfZjBx1maEjUrv1fRIl8dvW4CySQXQpMYG/lGD52bs3EzOre3GCiyAtM9liH tpAvclQB0lVEpr0YIwXPrmK5dFIL4mZQNIaaCqsO8ApkkQMHfPXxwGxX1E9wPIeg E+cBXGsHzQf5fZERNc9BFRtelN5RHGQCb7mgqv9nNkFXCB09jWCNSjyVtOibU8FJ KCeYxUbyVyk= =30Sl -----END PGP SIGNATURE----- Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A randconfig build fix and a performance fix: - Fix the CONFIG_RESET_CONTROLLER=n path signature of clk_imx8mp_audiomix_reset_controller_register() to appease randconfig - Speed up the sdhci clk on TH1520 by a factor of 4 by adding a fixed factor clk" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: clk-imx8mp-audiomix: fix function signature clk: thead: Fix TH1520 emmc and shdci clock rate |
||
Thomas Weißschuh
|
385443057f |
kbuild: pacman-pkg: provide versioned linux-api-headers package
The Arch Linux glibc package contains a versioned dependency on "linux-api-headers". If the linux-api-headers package provided by pacman-pkg does not specify an explicit version this dependency is not satisfied. Fix the dependency by providing an explicit version. Fixes: c8578539deba ("kbuild: add script and target to generate pacman package") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
Linus Torvalds
|
ab75170520 |
linux-watchdog 6.13-rc6 tag
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEABECAAYFAmd5de4ACgkQ+iyteGJfRsrcWACgoOs/mw7ReFgz4BCkk8BkBfcv y2kAn3uyUMwb/mU1m8qLWpY8RuhNC4Ge =oL1Y -----END PGP SIGNATURE----- Merge tag 'linux-watchdog-6.13-rc6' of git://www.linux-watchdog.org/linux-watchdog Pull watchdog fix from Wim Van Sebroeck: - fix error message during stm32 driver probe * tag 'linux-watchdog-6.13-rc6' of git://www.linux-watchdog.org/linux-watchdog: watchdog: stm32_iwdg: fix error message during driver probe |
||
Linus Torvalds
|
63676eefb7 |
sched_ext: Fixes for v6.13-rc5
- Fix the bug where bpf_iter_scx_dsq_new() was not initializing the iterator's flags and could inadvertently enable e.g. reverse iteration. - Fix the bug where scx_ops_bypass() could call irq_restore twice. - Add Andrea and Changwoo as maintainers for better review coverage. - selftests and tools/sched_ext build and other fixes. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZ3hpXg4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGS/lAQDOZDfcJtO1VEsLoPY9NhFHPuBDTfoJyjSi/4mh GsjgDAD/Sx0rN6C9S/+ToUjmq3FA+ft0m2+97VqgLwkzwA9YxwI= =jaZ6 -----END PGP SIGNATURE----- Merge tag 'sched_ext-for-6.13-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Fix a bug where bpf_iter_scx_dsq_new() was not initializing the iterator's flags and could inadvertently enable e.g. reverse iteration - Fix a bug where scx_ops_bypass() could call irq_restore twice - Add Andrea and Changwoo as maintainers for better review coverage - selftests and tools/sched_ext build and other fixes * tag 'sched_ext-for-6.13-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Fix dsq_local_on selftest sched_ext: initialize kit->cursor.flags sched_ext: Fix invalid irq restore in scx_ops_bypass() MAINTAINERS: add me as reviewer for sched_ext MAINTAINERS: add self as reviewer for sched_ext scx: Fix maximal BPF selftest prog sched_ext: fix application of sizeof to pointer selftests/sched_ext: fix build after renames in sched_ext API sched_ext: Add __weak to fix the build errors |
||
Linus Torvalds
|
f9aa1fb9f8 |
workqueue: Fixes for v6.13-rc5
- Suppress a corner case spurious flush dependency warning. - Two trivial changes. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZ3hmjA4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGUrkAP90cajNtGbtFR1J61N4dTSfjBz8L7oQ6GLLyjCB MDxvpQD/ViVVpHBl9/jfObk//p6YMBTBD2Zp/aBc3mkKOVhfqws= =eUNO -----END PGP SIGNATURE----- Merge tag 'wq-for-6.13-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: - Suppress a corner case spurious flush dependency warning - Two trivial changes * tag 'wq-for-6.13-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: add printf attribute to __alloc_workqueue() workqueue: Do not warn when cancelling WQ_MEM_RECLAIM work from !WQ_MEM_RECLAIM worker rust: add safety comment in workqueue traits |
||
Linus Torvalds
|
2ae3aab557 |
block-6.13-20250103
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmd4P6oQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpvZzD/0YHz5bmr5+9P7+gH+EaB/yGp235OpFoFRe MtrlfHFmaj50NPQgBJD9IsuQfa7GNGEaTSjxU2cVFEiJjdCMGr1fj6aCQp1vsw0u ZKZFDfBKYMlynHx+jT6DTlssi4MDpKWDGitnZS/0Ql0oAA9jE2zZB0ntitmKaHyn tNeAHWUjvy7Z8zT7j5R8GdKNOAuT+m/R9qAwtH+UG1bkwMY7sUlAA3bn2n63VB9w v5a9xnQ/rBuz0R7lEMTKXyWxHNUQNpU+gV8+E6utibcP1Q5Xlf31BJHLFnFSffXy HOaYVGCk+gOZp4BybARNFEb7bJczRkG3zVDR0AzRTOVIyLxIi03ba8cOawbacFkf wRW1WlB1yCf5N8YT4NnXkqzcmZJgeHm/28zIsUbGG/OLgC4GhKc7uFQ8i4SJBNCO DI8n7HKgyfdhEyHUCzpLbgla4dr3uHnAyry1JVGLjXCfGo+KT5dnLlJilLjZ8ZbM orkP6aF6tJzOrZJJ4wwgZ5J30rqsqyuCHMV82zBlrnKCEgABgkTG3EfHC0HJRdTK XZnAjI41R2Y3qP4LF5RB2lsyDWVhREBVtX99lrJSZ0qEEt8A26gfidrHvR6fa2M3 w7FGDNQdgAjSkqlyBjaOAIzBG4olfVkCAsXJql1X/ODYLXLVKedpQotEZkH6BxbV M7t8hA41Sg== =MUeQ -----END PGP SIGNATURE----- Merge tag 'block-6.13-20250103' of git://git.kernel.dk/linux Pull block fixes from Jens Axboe: "Collection of fixes for block. Particularly the target name overflow has been a bit annoying, as it results in overwriting random memory and hence shows up as triggering various other bugs. - NVMe pull request via Keith: - Fix device specific quirk for PRP list alignment (Robert) - Fix target name overflow (Leo) - Fix target write granularity (Luis) - Fix target sleeping in atomic context (Nilay) - Remove unnecessary tcp queue teardown (Chunguang) - Simple cdrom typo fix" * tag 'block-6.13-20250103' of git://git.kernel.dk/linux: cdrom: Fix typo, 'devicen' to 'device' nvme-tcp: remove nvme_tcp_destroy_io_queues() nvmet-loop: avoid using mutex in IO hotpath nvmet: propagate npwg topology nvmet: Don't overflow subsysnqn nvme-pci: 512 byte aligned dma pool segment quirk |
||
Linus Torvalds
|
a984e234fc |
io_uring-6.13-20250103
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmd4P5kQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpuVeEACpkFom5oU8l8fwO2xBWVgPQvv4q9XNfAxB gd4k5UK3QqYaHJLxuoQrE4a/T/rMzTmQerBbWqe4UbMnLfuzNysre+dnSCCtMYxc c32gbenzkUi+5bsVjN9BT2OYB/l/Ei8gDewTRp09UJ36r9LOumxKd74BJlks+M0I hJKU4T/V/kAob4U/Ix3qhn+66yP2iC8244o7dJHqYK49a4ejoB5UzZGrzfwvLAaE V+Aic0nFW4aoB45pS4TcFFKj8NwwgkdShJsr35L3hDXYHWoIEthmpO3sOfcnKJD/ +fHp5Rdc+ukqRX/fAZDSziMn6EZL7+mAu62U+YNn1B/fcRXH9FGLE/4WCsdsbiXm X/KMat8TAsui2dojDJQHnFdQU4ry05NhDRs8Ssj7b993YCcN0XYH/Rt351q+gNC4 7s/eYoITHfTXV4yMZm6oudsnA6kFH9dgtu7OV0GciYRp17bYUyxOupXyt678akH9 SsTYX4joDBW4sGVoMxxWYLJ7Cucpbn8j6J2qUQQxL7feYWWSzf/aCCdEuDiVM70A ePGqOGUrqQq8dtFrB8osvSCj9eg/TnN2szLc+Rgik1INt92d0W4cPAx1IWiFhnQn FLmH4yepv0T0SXK2vri+ApbZbeFHMJ6zKr+H1s9xCr1TpwszP32jzlAPPWXQGpCB D/VnBlPIcA== =YBDX -----END PGP SIGNATURE----- Merge tag 'io_uring-6.13-20250103' of git://git.kernel.dk/linux Pull io_uring fixes from Jens Axboe: - Fix an issue with the read multishot support and posting of CQEs from io-wq context - Fix a regression introduced in this cycle, where making the timeout lock a raw one uncovered another locking dependency. As a result, move the timeout flushing outside of the timeout lock, punting them to a local list first - Fix use of an uninitialized variable in io_async_msghdr. Doesn't really matter functionally, but silences a valid KMSAN complaint that it's not always initialized - Fix use of incrementally provided buffers for read on non-pollable files, where the buffer always gets committed upfront. Unfortunately the buffer address isn't resolved first, so the read ends up using the updated rather than the current value * tag 'io_uring-6.13-20250103' of git://git.kernel.dk/linux: io_uring/kbuf: use pre-committed buffer address for non-pollable file io_uring/net: always initialize kmsg->msg.msg_inq upfront io_uring/timeout: flush timeouts outside of the timeout lock io_uring/rw: fix downgraded mshot read |
||
Linus Torvalds
|
aba74e639f |
Nothing major here. Over the last two weeks we gathered only around
two-thirds of our normal weekly fix count, but delaying sending these until -rc7 seemed like a really bad idea. AFAIK we have no bugs under investigation. One or two reverts for stuff for which we haven't gotten a proper fix will likely come in the next PR. Including fixes from wireles and netfilter. Current release - fix to a fix: - netfilter: nft_set_hash: unaligned atomic read on struct nft_set_ext - eth: gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup Previous releases - regressions: - net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets - mptcp: - fix sleeping rcvmsg sleeping forever after bad recvbuffer adjust - fix TCP options overflow - prevent excessive coalescing on receive, fix throughput - net: fix memory leak in tcp_conn_request() if map insertion fails - wifi: cw1200: fix potential NULL dereference after conversion to GPIO descriptors - phy: micrel: dynamically control external clock of KSZ PHY, fix suspend behavior Previous releases - always broken: - af_packet: fix VLAN handling with MSG_PEEK - net: restrict SO_REUSEPORT to inet sockets - netdev-genl: avoid empty messages in NAPI get - dsa: microchip: fix set_ageing_time function on KSZ9477 and LAN937X - eth: gve: XDP fixes around transmit, queue wakeup etc. - eth: ti: icssg-prueth: fix firmware load sequence to prevent time jump which breaks timesync related operations Misc: - netlink: specs: mptcp: add missing attr and improve documentation Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmd4FCgACgkQMUZtbf5S IruB7hAAkppr4+4kywDq/PIqgkisjYzpNIQIrphebUeIHIcAizWbJEL9CiHXnCQD w95+nVhzDmsWXdzE2AvwM/2YAByLP6CpgKEa+7rI3K5BQZ86lbzm+ftOOFYiF3eV 2v5yLfjgMVbzwqzuP3ivkNrtG95IGQFPobLl7tI9Wpgm3yD0+H8BCqtZNAwA4N6g mEdZD1jMOzkdINoJBg35O70B2GDq/WSS1N8dgxtj1F7EPDuQdO1kijJmqFjYT3fw 30/NdjlfGtbFro6zp+8ZY6P9RG3CLNhYXu+nUqdOHJOJw65aXUxOPjWCRp3o6m12 HIIp22imDTvNTSCfn0wlbGJfC+v9HdIgB9gqApAaxNOS02bQgEy5tuxxXNTo2B1I bqH7GJ+6a7axsQ6BnpcOcDhu2fnbtguj1W/zC/MlGQmZZj2g+UNCf4QqPir/XkcG r43/iy2/L8cgZWYXzH1skfUMgbymI5uLytO+n+CLsb6P/N3WLa7wxHNV5Lni+Xng qDyd0TwUccCr2KBfZqjgFeGbvQ7qub+eeQCYx9kUAaFhKUsQB/eZ5i9C7SsfutMa xzwITqCkcvmdyU3gG3v/oABtQkxdTcRqWCfYDGcXm0zb1jzMf5ZGN4E/gABYM7p9 EewL1kCAt4ULmBc8Hqrj+2Erlzh+xsodvvfpgpJandMVXaJW44k= =YKS4 -----END PGP SIGNATURE----- Merge tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireles and netfilter. Nothing major here. Over the last two weeks we gathered only around two-thirds of our normal weekly fix count, but delaying sending these until -rc7 seemed like a really bad idea. AFAIK we have no bugs under investigation. One or two reverts for stuff for which we haven't gotten a proper fix will likely come in the next PR. Current release - fix to a fix: - netfilter: nft_set_hash: unaligned atomic read on struct nft_set_ext - eth: gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup Previous releases - regressions: - net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets - mptcp: - fix sleeping rcvmsg sleeping forever after bad recvbuffer adjust - fix TCP options overflow - prevent excessive coalescing on receive, fix throughput - net: fix memory leak in tcp_conn_request() if map insertion fails - wifi: cw1200: fix potential NULL dereference after conversion to GPIO descriptors - phy: micrel: dynamically control external clock of KSZ PHY, fix suspend behavior Previous releases - always broken: - af_packet: fix VLAN handling with MSG_PEEK - net: restrict SO_REUSEPORT to inet sockets - netdev-genl: avoid empty messages in NAPI get - dsa: microchip: fix set_ageing_time function on KSZ9477 and LAN937X - eth: - gve: XDP fixes around transmit, queue wakeup etc. - ti: icssg-prueth: fix firmware load sequence to prevent time jump which breaks timesync related operations Misc: - netlink: specs: mptcp: add missing attr and improve documentation" * tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init net: ti: icssg-prueth: Fix firmware load sequence. mptcp: prevent excessive coalescing on receive mptcp: don't always assume copied data in mptcp_cleanup_rbuf() mptcp: fix recvbuffer adjust on sleeping rcvmsg ila: serialize calls to nf_register_net_hooks() af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK af_packet: fix vlan_get_tci() vs MSG_PEEK net: wwan: iosm: Properly check for valid exec stage in ipc_mmio_init() net: restrict SO_REUSEPORT to inet sockets net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets net: sfc: Correct key_len for efx_tc_ct_zone_ht_params net: wwan: t7xx: Fix FSM command timeout issue sky2: Add device ID 11ab:4373 for Marvell 88E8075 mptcp: fix TCP options overflow. net: mv643xx_eth: fix an OF node reference leak gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup eth: bcmsysport: fix call balance of priv->clk handling routines net: llc: reset skb->transport_header netlink: specs: mptcp: fix missing doc ... |
||
Linus Torvalds
|
ee063c23e4 |
NIOS2: update for v6.14
- Use str_yes_no() helper function -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEoHhMeiyk5VmwVMwNGZQEC4GjKPQFAmd3TF0ACgkQGZQEC4Gj KPQQYw/+MEt6NJz5YRLGgmAyDEXIFRZbxMKa4xraQv12QD+l83JxSf0o9LP4knmt eRkSB3Ta1irwdu/jwA5WPTBU+Z3oZGRIrnRvv2h+CAyqj9yDI7+QMlTqWDU3KnAl uVIHbLo6LBZlRx6sekm6tHPuD+dilzlWYVOrKbkGnrZX+HlrpEmTRXZ5qDTfRh3y Wj7SYWWN4L9vTJ3IpiX6Zu3827MCFbpPFR5fzyusYsGX3AdVD/QBbXspbjsigPEG DnaPh6qgnKqeuEYSyC7PxpPyFzVh2vwFO4q1p5t3FaEvF5IKlztRoFljkHto2sPN XAb5CG4kjZ6jo8/ntrXEbYmaqtHIW0/c0MZSLtTWz2jVOpFuvncFFq2yo2JegKRY FEAan+GyftrXXLiFTvbrUew7m1dIqrKRQQEsxbhlHojE+gRjnFvQE/TXGc5Gt8C8 s/77qOIM7yhOkNdUeh2I/T2ZPi+ZyMO19k5Ut8lvrH/1gts0HXv+wNW6rcjyO2Sb nZBG9dQNWS7P3I/zjdldBUY2OWuxe3EnMBh5zn3yRxmgl2aBwDwv+3aymuybYkws BkZsnydANkDYdQdsiA0Mvns+69OXnrA5JS2vAf0rdnaySrozUgVXPMV/Q++r48RH tsxsFNrj9AufYrTWWRa2dRFNFBIaSprFehJmOmDrR+kpG69SQyw= =Ma+k -----END PGP SIGNATURE----- Merge tag 'nios2_update_for_v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux Pull nios2 fixlet from Dinh Nguyen: - Use str_yes_no() helper function * tag 'nios2_update_for_v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: nios2: Use str_yes_no() helper in show_cpuinfo() |
||
Linus Torvalds
|
dea3165f98 |
RDMA v6.13 first rc pull request
Alot of fixes accumulated over the holiday break: - Static tool fixes, value is already proven to be NULL, possible integer overflow - Many bnxt_re fixes: * Crashes due to a mismatch in the maximum SGE list size * Don't waste memory for user QPs by creating kernel-only structures * Fix compatability issues with older HW in some of the new HW features recently introduced: RTS->RTS feature, work around 9096 * Do not allow destroy_qp to fail * Validate QP MTU against device limits * Add missing validation on madatory QP attributes for RTR->RTS * Report port_num in query_qp as required by the spec * Fix creation of QPs of the maximum queue size, and in the variable mode * Allow all QPs to be used on newer HW by limiting a work around only to HW it affects * Use the correct MSN table size for variable mode QPs * Add missing locking in create_qp() accessing the qp_tbl * Form WQE buffers correctly when some of the buffers are 0 hop * Don't crash on QP destroy if the userspace doesn't setup the dip_ctx * Add the missing QP flush handler call on the DWQE path to avoid hanging on error recovery * Consistently use ENXIO for return codes if the devices is fatally errored - Try again to fix VLAN support on iwarp, previous fix was reverted due to breaking other cards - Correct error path return code for rdma netlink events - Remove the seperate net_device pointer in siw and rxe which syzkaller found a way to UAF - Fix a UAF of a stack ib_sge in rtrs - Fix a regression where old mlx5 devices and FW were wrongly activing new device features and failing -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ3fvrwAKCRCFwuHvBreF YSuIAQDNZ8fPvjVdL8x1QIUVParAAewEUv3MbmV1uRVgoG7LrQEAxgw6UIUF4HOw u6aW8aRXGYM3hG241AKoYnU5yqHl4g0= =ilfd -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "A lot of fixes accumulated over the holiday break: - Static tool fixes, value is already proven to be NULL, possible integer overflow - Many bnxt_re fixes: - Crashes due to a mismatch in the maximum SGE list size - Don't waste memory for user QPs by creating kernel-only structures - Fix compatability issues with older HW in some of the new HW features recently introduced: RTS->RTS feature, work around 9096 - Do not allow destroy_qp to fail - Validate QP MTU against device limits - Add missing validation on madatory QP attributes for RTR->RTS - Report port_num in query_qp as required by the spec - Fix creation of QPs of the maximum queue size, and in the variable mode - Allow all QPs to be used on newer HW by limiting a work around only to HW it affects - Use the correct MSN table size for variable mode QPs - Add missing locking in create_qp() accessing the qp_tbl - Form WQE buffers correctly when some of the buffers are 0 hop - Don't crash on QP destroy if the userspace doesn't setup the dip_ctx - Add the missing QP flush handler call on the DWQE path to avoid hanging on error recovery - Consistently use ENXIO for return codes if the devices is fatally errored - Try again to fix VLAN support on iwarp, previous fix was reverted due to breaking other cards - Correct error path return code for rdma netlink events - Remove the seperate net_device pointer in siw and rxe which syzkaller found a way to UAF - Fix a UAF of a stack ib_sge in rtrs - Fix a regression where old mlx5 devices and FW were wrongly activing new device features and failing" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits) RDMA/mlx5: Enable multiplane mode only when it is supported RDMA/bnxt_re: Fix error recovery sequence RDMA/rtrs: Ensure 'ib_sge list' is accessible RDMA/rxe: Remove the direct link to net_device RDMA/hns: Fix missing flush CQE for DWQE RDMA/hns: Fix warning storm caused by invalid input in IO path RDMA/hns: Fix accessing invalid dip_ctx during destroying QP RDMA/hns: Fix mapping error of zero-hop WQE buffer RDMA/bnxt_re: Fix the locking while accessing the QP table RDMA/bnxt_re: Fix MSN table size for variable wqe mode RDMA/bnxt_re: Add send queue size check for variable wqe RDMA/bnxt_re: Disable use of reserved wqes RDMA/bnxt_re: Fix max_qp_wrs reported RDMA/siw: Remove direct link to net_device RDMA/nldev: Set error code in rdma_nl_notify_event RDMA/bnxt_re: Fix reporting hw_ver in query_device RDMA/bnxt_re: Fix to export port num to ib_query_qp RDMA/bnxt_re: Fix setting mandatory attributes for modify_qp RDMA/bnxt_re: Add check for path mtu in modify_qp RDMA/bnxt_re: Fix the check for 9060 condition ... |
||
Linus Torvalds
|
f274fffbc2 |
Pin control fixes for the v6.13 series:
- A small Kconfig fixup for the i.MX, in principle this could come in from the SoC tree but the bug was introduced from the pin control tree so let's fix it from here. - Fix a sleep in atomic context in the MCP23xxx GPIO expander by disabling the regmap locking and using explicit mutex locks. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmd3nJcACgkQQRCzN7AZ XXOhNg/8C5lXyL8M8LEiCL7qatG9fBlFCIsCs6/bprYUT5tQksMpBlJUtYmyKEf4 nw1UWOrwPz6RFC5UnK0PnDg3wC2y8sElWab5FUMsqZaBdCeDnCQXsbnJ7iiow/pN +BigELRBgHJtfyMBp8WymS1w+2IarlBuh3JPTVFTZO/uZFCiM4C9QLFg3oPaUMhq ZaVMYldRIIbM0QVs2JjYas317dmH290qOIQu7tTNVsiVDIias0Z3gv4ejKTe+QAt qAgHrK5xMBcTu07Dx/DVg8+X1YN+LCqk1CkZuCk6kYoUv4pAARGAWwO2JtbeHldI 9iuBBDOmaqVS+GKxrXgga7nu4mcsQqanXGmu12+YibkuMt56iGCp0LVLVWe8/CAN aNg5/5fDrfKWssAOrMVupGJkaqR5uVkZ8v9oPDq1fSh2jAATzOa/l8OiaonIfkJ9 2UG0jr23LtrsAsfcObRR2xbvK6BwtFONFnk51ivzM+tLyXBl1DkXiUBakrrYpYKN eTF2+5Att/XKu8yghTqtl6XPLPb9DLjWCzFh7Y7z8hylHoIOUHNUpS4sMZ/gveTq yBCLpo7tSYZfIvhrIRkPlNciyCQJDoLOxb2OJT/6WcomW561RSEhOjTFObSsn634 YjJ3bSobGLMq4bT3p6iCylXc5KTGhvf54hp/bLj4HMekM7daSyY= =+Eo1 -----END PGP SIGNATURE----- Merge tag 'pinctrl-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - A small Kconfig fixup for the i.MX. In principle this could come in from the SoC tree but the bug was introduced from the pin control tree so let's fix it from here. - Fix a sleep in atomic context in the MCP23xxx GPIO expander by disabling the regmap locking and using explicit mutex locks. * tag 'pinctrl-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: mcp23s08: Fix sleeping in atomic context due to regmap locking ARM: imx: Re-introduce the PINCTRL selection |
||
Linus Torvalds
|
4f5d3da619 |
sound fixes for 6.13-rc6
The first new year PR: no surprise, all small fixes, including: - Follow-up fixes for the new compress-offload API extension - A couple of fixes for MIDI 2.0 UMP handling - A trivial race fix for OSS sequencer emulation ioctls - USB-audio and HD-audio fixes / quirks -----BEGIN PGP SIGNATURE----- iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmd38D8OHHRpd2FpQHN1 c2UuZGUACgkQLtJE4w1nLE8brw//dkMzKN/VUwEHJLCj1gkDd4vNkdyVqcHY/zIF NJnUWnDeQbGP7iMS+ZuQ9xb7P9x+6JM1WA6CUCrHDMq0NLAIu48jZQiaQQtuFfYG NlJaGJMwFZcecgKpBXyFKqvHrY8XZgnnd1dH0QkwZlf+YJsoIWhmVSj78UbxSKca Nu/VZc5HitwYJKR9czovrZJ6iQcTY5F7YkoeBGJAoUzi125XvMZzx8ZP8LJCskT2 eZmmLEwTRWLq0kR8Esv2zpNwh7jX9+nOaYJOb4J0M7XHfU/d3lUCl324tGUBiFu6 TN+5woCZ3bKuiCJa83JWyD0FGdiIcjUIzYWY9/enaR8DmebOvlvF2FeBr4o1oGvT bKSAq8qs44EQiteTkh5AB6TyBxGpjtZ1GAqMS1cW2ARjdT6n49SIYaovrYOsSF/Z E95AkspsMANlisn7gU2U7wWEl1/Vyp0J7U37BqrwUwg7Fembh5KDVJIVijo2pK0J KaO4ulws88cxCvwm8gnyfGoQb6cJrp9bbrObjpy4Cuk1jtTZdslW6PhrqLl7wz4E 2Z9L6K2LEO/OXg0keyB8nF9jQofPA3g17xQZOE/Aob0z3e2EshW7F3ituvGF1iRQ bQLbeQp6cGg1nYDSBq3uhq/3n8IV2m5aZc8kWCTNpU2uoVZTBXzw88VU5sFtakns w3ockRs= =n/AG -----END PGP SIGNATURE----- Merge tag 'sound-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "The first new year pull request: no surprises, all small fixes, including: - Follow-up fixes for the new compress-offload API extension - A couple of fixes for MIDI 2.0 UMP handling - A trivial race fix for OSS sequencer emulation ioctls - USB-audio and HD-audio fixes / quirks" * tag 'sound-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: seq: Check UMP support for midi_version change ALSA hda/realtek: Add quirk for Framework F111:000C Revert "ALSA: ump: Don't enumeration invalid groups for legacy rawmidi" ALSA: seq: oss: Fix races at processing SysEx messages ALSA: compress_offload: fix remaining descriptor races in sound/core/compress_offload.c ALSA: compress_offload: Drop unneeded no_free_ptr() ALSA: hda/tas2781: Ignore SUBSYS_ID not found for tas2563 projects ALSA: usb-audio: US16x08: Initialize array before use |
||
Linus Torvalds
|
92c3bb3d2e |
drm fixes for 6.13-rc6
i915: - Fix C10 pll programming sequence [cx0_phy] - Fix power gate sequence. [dg1] xe: - uapi: Revert some devcoredump file format changes breaking a mesa debug tool - Fixes around waits when moving to system - Fix a typo when checking for LMEM provisioning - Fix a fault on fd close after unbind - A couple of OA fixes squashed for stable backporting adv7511: - fix UAF - drop single lane support - audio infoframe fix -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmd3RjgACgkQDHTzWXnE hr4zfhAAieuGQtsYScV6LPe6tYfiR+kT58TbfXYr1xaKrZn/F+UkLCG7d2dOZfPg zG69lZUQQMG0DMujYsL3SnWOf8Q5Z9flHg0Gmj577nxtKWry3ZbjJ31PCeiYsAF5 09SwrMHDGytYpo+vJBbHDQXn7NIH9Ud9PhrH+koObW6mVivdouF37gB1JUsl+f6l JGA99Uv89IOr/UmvmW4WyCLTw9NBnQgoJB1HKIjJTx+H4OAUcQ8qH97zy7nVZEpN LT2+fGzPCr2Us1yuu0XCWFDPUsxE7moGTH2TR27mgIJDvf+7yJokqyfmhLDWIm+R kjGGw+3s1rqy7xkugkD/cz11DL/Ag2i/pLihywUfPXT7XOMvU4JwjN1M+IY0uZ4x BmcJ9v1HZq0YdfIzydva6aaRwW+/ouJVq9/FEPtKUqq6vL4+OeoX5dAAUt7xH5Ok TlobbeVwBVgChqyZ9Ug0e9Jz8VTgH0lKqz5aTUVTd4b5xibmCUWIkGH8elPiXfpf NsZ8LlffoHIFa1cbR3QizmQRDXS2Nu4NLiGKajL+yM+8+IyclgDzfVWLz4xq3cpW ZBFHO6aeU7i/SHYj3BbnR7Kmt8kVEnE9yN1hENztJyz/IruIhV8BgTiLPaOfwV1m yRfK3oD2qbdoxY4RiFGWwU60GGRU/WvFhRLf2rmh0LyBPU3oNy0= =H3XD -----END PGP SIGNATURE----- Merge tag 'drm-fixes-2025-01-03' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Happy New Year. It was fairly quiet for holidays period, certainly nothing that worth getting off the couch before I needed to, this is for the past two weeks, i915, xe and some adv7511, I expect we will see some amdgpu etc happening next week, but otherwise all quiet. i915: - Fix C10 pll programming sequence [cx0_phy] - Fix power gate sequence. [dg1] xe: - uapi: Revert some devcoredump file format changes breaking a mesa debug tool - Fixes around waits when moving to system - Fix a typo when checking for LMEM provisioning - Fix a fault on fd close after unbind - A couple of OA fixes squashed for stable backporting adv7511: - fix UAF - drop single lane support - audio infoframe fix" * tag 'drm-fixes-2025-01-03' of https://gitlab.freedesktop.org/drm/kernel: xe/oa: Fix query mode of operation for OAR/OAC drm/i915/dg1: Fix power gate sequence. drm/i915/cx0_phy: Fix C10 pll programming sequence drm/xe: Fix fault on fd close after unbind drm/xe/pf: Use correct function to check LMEM provisioning drm/xe: Wait for migration job before unmapping pages drm/xe: Use non-interruptible wait when moving BO to system drm/xe: Revert some changes that break a mesa debug tool drm: adv7511: Drop dsi single lane support dt-bindings: display: adi,adv7533: Drop single lane support drm: adv7511: Fix use-after-free in adv7533_attach_dsi() drm/bridge: adv7511_audio: Update Audio InfoFrame properly |
||
Linus Torvalds
|
e30dd219c7 |
Fixes for ftrace in v6.13:
- Add needed READ_ONCE() around access to the fgraph array element The updates to the fgraph array can happen when callbacks are registered and unregistered. The __ftrace_return_to_handler() can handle reading either the old value or the new value. But once it reads that value it must stay consistent otherwise the check that looks to see if the value is a stub may show false, but if the compiler decides to re-read after that check, it can be true which can cause the code to crash later on. - Make function profiler use the top level ops for filtering again When function graph became available for instances, its filter ops became independent from the top level set_ftrace_filter. In the process the function profiler received its own filter ops as well. But the function profiler uses the top level set_ftrace_filter file and does not have one of its own. In giving it its own filter ops, it lost any user interface it once had. Make it use the top level set_ftrace_filter file again. This fixes a regression. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ3cR4RQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qjxfAQCPhNztdmGmEYmuBtONPHwejidWnuJ6 Rl2mQxEbp40OUgD+JvSWofhRsvtXWlymqZ9j+dKMegLqMeq834hB0LK4NAg= =+KqV -----END PGP SIGNATURE----- Merge tag 'ftrace-v6.13-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace fixes from Steven Rostedt: - Add needed READ_ONCE() around access to the fgraph array element The updates to the fgraph array can happen when callbacks are registered and unregistered. The __ftrace_return_to_handler() can handle reading either the old value or the new value. But once it reads that value it must stay consistent otherwise the check that looks to see if the value is a stub may show false, but if the compiler decides to re-read after that check, it can be true which can cause the code to crash later on. - Make function profiler use the top level ops for filtering again When function graph became available for instances, its filter ops became independent from the top level set_ftrace_filter. In the process the function profiler received its own filter ops as well. But the function profiler uses the top level set_ftrace_filter file and does not have one of its own. In giving it its own filter ops, it lost any user interface it once had. Make it use the top level set_ftrace_filter file again. This fixes a regression. * tag 'ftrace-v6.13-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ftrace: Fix function profiler's filtering functionality fgraph: Add READ_ONCE() when accessing fgraph_array[] |
||
Jens Axboe
|
ed123c948d |
io_uring/kbuf: use pre-committed buffer address for non-pollable file
For non-pollable files, buffer ring consumption will commit upfront. This is fine, but io_ring_buffer_select() will return the address of the buffer after having committed it. For incrementally consumed buffers, this is incorrect as it will modify the buffer address. Store the pre-committed value and return that. If that isn't done, then the initial part of the buffer is not used and the application will correctly assume the content arrived at the start of the userspace buffer, but the kernel will have put it later in the buffer. Or it can cause a spurious -EFAULT returned in the CQE, depending on the buffer size. As bounds are suitably checked for doing the actual IO, no adverse side effects are possible - it's just a data misplacement within the existing buffer. Reported-by: Gwendal Fernet <gwendalfernet@gmail.com> Cc: stable@vger.kernel.org Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Mark Zhang
|
45d339fefa |
RDMA/mlx5: Enable multiplane mode only when it is supported
Driver queries vport_cxt.num_plane and enables multiplane when it is greater then 0, but some old FWs (versions from x.40.1000 till x.42.1000), report vport_cxt.num_plane = 1 unexpectedly. Fix it by querying num_plane only when HCA_CAP2.multiplane bit is set. Fixes: 2a5db20fa532 ("RDMA/mlx5: Add support to multi-plane device and port") Link: https://patch.msgid.link/r/1ef901acdf564716fcf550453cf5e94f343777ec.1734610916.git.leon@kernel.org Cc: stable@vger.kernel.org Reported-by: Francesco Poli <invernomuto@paranoici.org> Closes: https://lore.kernel.org/all/nvs4i2v7o6vn6zhmtq4sgazy2hu5kiulukxcntdelggmznnl7h@so3oul6uwgbl/ Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> |
||
David S. Miller
|
ce21419b55 |
Merge branch 'net-iep-clock-module-fixes'
Meghana Malladi says: ==================== IEP clock module bug fixes This series has some bug fixes for IEP module needed by PPS and timesync operations. Patch 1/2 fixes firmware load sequence to run all the firmwares when either of the ethernet interfaces is up. Move all the code common for firmware bringup under common functions. Patch 2/2 fixes distorted PPS signal when the ethernet interfaces are brough down and up. This patch also fixes enabling PPS signal after bringing the interface up, without disabling PPS. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Meghana Malladi
|
9b11536124 |
net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init
When ICSSG interfaces are brought down and brought up again, the pru cores are shut down and booted again, flushing out all the memories and start again in a clean state. Hence it is expected that the IEP_CMP_CFG register needs to be flushed during iep_init() to ensure that the existing residual configuration doesn't cause any unusual behavior. If the register is not cleared, existing IEP_CMP_CFG set for CMP1 will result in SYNC0_OUT signal based on the SYNC_OUT register values. After bringing the interface up, calling PPS enable doesn't work as the driver believes PPS is already enabled, (iep->pps_enabled is not cleared during interface bring down) and driver will just return true even though there is no signal. Fix this by disabling pps and perout. Fixes: c1e0230eeaab ("net: ti: icss-iep: Add IEP driver") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
MD Danish Anwar
|
9facce84f4 |
net: ti: icssg-prueth: Fix firmware load sequence.
Timesync related operations are ran in PRU0 cores for both ICSSG SLICE0 and SLICE1. Currently whenever any ICSSG interface comes up we load the respective firmwares to PRU cores and whenever interface goes down, we stop the resective cores. Due to this, when SLICE0 goes down while SLICE1 is still active, PRU0 firmwares are unloaded and PRU0 core is stopped. This results in clock jump for SLICE1 interface as the timesync related operations are no longer running. As there are interdependencies between SLICE0 and SLICE1 firmwares, fix this by running both PRU0 and PRU1 firmwares as long as at least 1 ICSSG interface is up. Add new flag in prueth struct to check if all firmwares are running and remove the old flag (fw_running). Use emacs_initialized as reference count to load the firmwares for the first and last interface up/down. Moving init_emac_mode and fw_offload_mode API outside of icssg_config to icssg_common_start API as they need to be called only once per firmware boot. Change prueth_emac_restart() to return error code and add error prints inside the caller of this functions in case of any failures. Move prueth_emac_stop() from common to sr1 driver. sr1 and sr2 drivers have different logic handling for stopping the firmwares. While sr1 driver is dependent on emac structure to stop the corresponding pru cores for that slice, for sr2 all the pru cores of both the slices are stopped and is not dependent on emac. So the prueth_emac_stop() function is no longer common and can be moved to sr1 driver. Fixes: c1e0230eeaab ("net: ti: icss-iep: Add IEP driver") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Meghana Malladi <m-malladi@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net> |