linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2024-12-28 16:53:49 +00:00

Author	SHA1	Message	Date
Suren Baghdasaryan	b7fc16a16b	mm/codetag: uninline and move pgalloc_tag_copy and pgalloc_tag_split pgalloc_tag_copy() and pgalloc_tag_split() are sizable and outside of any performance-critical paths, so it should be fine to uninline them. Also move their declarations into pgalloc_tag.h which seems like a more appropriate place for them. No functional changes other than uninlining. Link: https://lkml.kernel.org/r/20241024162318.1640781-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Sourav Panda <souravpanda@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-07 14:25:17 -08:00
Suren Baghdasaryan	4835f747d3	alloc_tag: support for page allocation tag compression Implement support for storing page allocation tag references directly in the page flags instead of page extensions. sysctl.vm.mem_profiling boot parameter it extended to provide a way for a user to request this mode. Enabling compression eliminates memory overhead caused by page_ext and results in better performance for page allocations. However this mode will not work if the number of available page flag bits is insufficient to address all kernel allocations. Such condition can happen during boot or when loading a module. If this condition is detected, memory allocation profiling gets disabled with an appropriate warning. By default compression mode is disabled. Link: https://lkml.kernel.org/r/20241023170759.999909-7-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiongwei Song <xiongwei.song@windriver.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-07 14:25:16 -08:00
Suren Baghdasaryan	0f9b685626	alloc_tag: populate memory for module tags as needed The memory reserved for module tags does not need to be backed by physical pages until there are tags to store there. Change the way we reserve this memory to allocate only virtual area for the tags and populate it with physical pages as needed when we load a module. [surenb@google.com: avoid execmem_vmap() when !MMU] Link: https://lkml.kernel.org/r/20241031233611.3833002-1-surenb@google.com Link: https://lkml.kernel.org/r/20241023170759.999909-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiongwei Song <xiongwei.song@windriver.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-07 14:25:16 -08:00
Suren Baghdasaryan	0db6f8d782	alloc_tag: load module tags into separate contiguous memory When a module gets unloaded there is a possibility that some of the allocations it made are still used and therefore the allocation tags corresponding to these allocations are still referenced. As such, the memory for these tags can't be freed. This is currently handled as an abnormal situation and module's data section is not being unloaded. To handle this situation without keeping module's data in memory, allow codetags with longer lifespan than the module to be loaded into their own separate memory. The in-use memory areas and gaps after module unloading in this separate memory are tracked using maple trees. Allocation tags arrange their separate memory so that it is virtually contiguous and that will allow simple allocation tag indexing later on in this patchset. The size of this virtually contiguous memory is set to store up to 100000 allocation tags. [surenb@google.com: fix empty codetag module section handling] Link: https://lkml.kernel.org/r/20241101000017.3856204-1-surenb@google.com [akpm@linux-foundation.org: update comment, per Dan] Link: https://lkml.kernel.org/r/20241023170759.999909-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiongwei Song <xiongwei.song@windriver.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-07 14:25:16 -08:00
Suren Baghdasaryan	3e09c500bb	alloc_tag: introduce shutdown_mem_profiling helper function Implement a helper function to disable memory allocation profiling and use it when creation of /proc/allocinfo fails. Ensure /proc/allocinfo does not get created when memory allocation profiling is disabled. Link: https://lkml.kernel.org/r/20241023170759.999909-3-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiongwei Song <xiongwei.song@windriver.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-07 14:25:16 -08:00
Masami Hiramatsu (Google)	cb6fcef8b4	objpool: fix to make percpu slot allocation more robust Since gfp & GFP_ATOMIC == GFP_ATOMIC is true for GFP_KERNEL \| GFP_HIGH, it will use kmalloc if user specifies that combination. Here the reason why combining the __vmalloc_node() and kmalloc_node() is that the vmalloc does not support all GFP flag, especially GFP_ATOMIC. So we should check if gfp & (GFP_ATOMIC \| GFP_KERNEL) != GFP_ATOMIC for vmalloc first. This ensures caller can sleep. And for the robustness, even if vmalloc fails, it should retry with kmalloc to allocate it. Link: https://lkml.kernel.org/r/173008598713.1262174.2959179484209897252.stgit@mhiramat.roam.corp.google.com Fixes: `aff1871bfc` ("objpool: fix choosing allocation for percpu slots") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Closes: https://lore.kernel.org/all/CAHk-=whO+vSH+XVRio8byJU8idAWES0SPGVZ7KAVdc4qrV0VUA@mail.gmail.com/ Cc: Leo Yan <leo.yan@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Wu <wuqiang.matt@bytedance.com> Cc: Mikel Rychliski <mikel@mikelr.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-07 14:14:58 -08:00
Jakub Kicinski	2696e451df	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.12-rc7). Conflicts: drivers/net/ethernet/freescale/enetc/enetc_pf.c `e15c5506dd` ("net: enetc: allocate vf_state during PF probes") `3774409fd4` ("net: enetc: build enetc_pf_common.c as a separate module") https://lore.kernel.org/20241105114100.118bd35e@canb.auug.org.au Adjacent changes: drivers/net/ethernet/ti/am65-cpsw-nuss.c `de794169cf` ("net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7") `4a7b2ba94a` ("net: ethernet: ti: am65-cpsw: Use tstats instead of open coded version") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-07 13:44:16 -08:00
David Hildenbrand	2b37c814aa	lib/Kconfig.debug: Default STRICT_DEVMEM to "y" on s390 virtio-mem currently depends on !DEVMEM \| STRICT_DEVMEM. Let's default STRICT_DEVMEM to "y" just like we do for arm64 and x86. There could be ways in the future to filter access to virtio-mem device memory even without STRICT_DEVMEM, but for now let's just keep it simple. Tested-by: Mario Casquero <mcasquer@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-6-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-07 10:26:24 +01:00
Wei Yang	38dc8f4952	maple_tree: remove sanity check from mas_wr_slot_store() After commit `5d659bbb52` ("maple_tree: introduce mas_wr_store_type()"), the check here is redundant. Let's remove it. Link: https://lkml.kernel.org/r/20241017015809.23392-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:16 -08:00
Wei Yang	61e9df7085	maple_tree: calculate new_end when needed Patch series "Following cleanup after introduce mas_wr_store_type()", v2. Patch 1 postpone new_end calculation when needed. Patch 2 removes a unnecessary sanity check in mas_wr_slot_store(). This patch (of 2): For wr_exact_fit/wr_new_root, we don't need to calculate new_end. Let's postpone it until necessary. Link: https://lkml.kernel.org/r/20241017015809.23392-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20241017015809.23392-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:15 -08:00
Andy Shevchenko	4a7bba1df0	percpu: add a test case for the specific 64-bit value addition It might be a corner case when we add UINT_MAX as 64-bit unsigned value to the percpu variable as it's not the same as -1 (ULONG_LONG_MAX). Add a test case for that. Link: https://lkml.kernel.org/r/20241016182635.1156168-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:14 -08:00
Wei Yang	908378a30b	maple_tree: simplify mas_push_node() When count is not 0, we know head is valid. So we can put the assignment in if (count) instead of checking the head pointer again. Also count represents current total, we can assign the new total by increasing the count by one. Link: https://lkml.kernel.org/r/20241015120746.15850-4-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:14 -08:00
Wei Yang	4223dd93bf	maple_tree: total is not changed for nomem_one case If it jumps to nomem_one, the total allocated number is not changed. So we don't need to adjust it. For the nomem_bulk case, we know there is a valid mas->alloc. So we don't need to do the check. Link: https://lkml.kernel.org/r/20241015120746.15850-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:14 -08:00
Wei Yang	e852cb1d00	maple_tree: clear request_count for new allocated one Patch series "maple_tree: simplify mas_push_node()", v2. When count is not 0, we know head is valid. So we can put the assignment in if (count) instead of checking the head pointer again. Also count represents current total, we can assign the new total by increasing the count by one. This patch (of 3): If this is not a new allocated one, the request_count has already been cleared in mas_set_alloc_req(). Link: https://lkml.kernel.org/r/20241015120746.15850-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20241015120746.15850-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:14 -08:00
Wei Yang	0cc8d68abe	maple_tree: root node could be handled by !p_slot too For a root node, mte_parent_slot() return 0, this exactly fits the following !p_slot check. So we can remove the special handling for root node. Link: https://lkml.kernel.org/r/20240913063128.27391-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:13 -08:00
Jiazi Li	5b2100f723	maple_tree: fix alloc node fail issue In the following code, the second call to the mas_node_count will return -ENOMEM: mas_node_count(mas, MAPLE_ALLOC_SLOTS + 1); mas_node_count(mas, MAPLE_ALLOC_SLOTS * 2 + 2); This is because there may be some full maple_alloc node in current maple state. Use full maple_alloc node will make max_req equal to 0. And it leads to mt_alloc_bulk return 0. As a result, mas_node_count set mas.node to MA_ERROR(-ENOMEM). Find a non-full maple_alloc node, and if necessary, use this non-full node in the next while loop. Link: https://lkml.kernel.org/r/20240626160631.3636515-1-Liam.Howlett@oracle.com Fixes: `54a611b605` ("Maple Tree: add new data structure") Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:13 -08:00
Sidhartha Kumar	f0c99037a0	maple_tree: refactor mas_wr_store_type() In mas_wr_store_type(), we check if new_end < mt_slots[wr_mas->type]. If this check fails, we know that ,after this, new_end is >= mt_min_slots. Checking this again when we detect a wr_node_store later in the function is reduntant. Because this check is part of an OR statement, the statement will always evaluate to true, therefore we can just get rid of it. We also refactor mas_wr_store_type() to return the store type rather than set it directly as it greatly cleans up the function. Link: https://lkml.kernel.org/r/20241011214451.7286-2-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com> Suggested-by: Liam Howlett <liam.howlett@oracle.com> Suggested-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:12 -08:00
Lorenzo Stoakes	b314e21596	maple_tree: do not hash pointers on dump in debug mode Many maple tree values output when an mt_validate() or equivalent hits an issue utilise tagged pointers, most notably parent nodes. Also some pivots/slots contain meaningful values, output as pointers, such as the index of the last entry with data for example. All pointer values such as this are destroyed by kernel pointer hashing rendering the debug output obtained from CONFIG_DEBUG_VM_MAPLE_TREE considerably less usable. Update this code to output the raw pointers using %px rather than %p when CONFIG_DEBUG_VM_MAPLE_TREE is defined. This is justified, as the use of this configuration flag indicates that this is a test environment. Userland does not understand %px, so use %p there. In an abundance of caution, if CONFIG_DEBUG_VM_MAPLE_TREE is not set, also use %p to avoid exposing raw kernel pointers except when we are positive a testing mode is enabled. This was inspired by the investigation performed in recent debugging efforts around a maple tree regression [0] where kernel pointer tagging had to be disabled in order to obtain truly meaningful and useful data. [0]:https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/ Link: https://lkml.kernel.org/r/20241007115335.90104-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:09 -08:00
Sui Jingfeng	e01caa2b63	lib/scatterlist: use sg_phys() helper This shorten the length of code in horizential direction, therefore is easier to read. Link: https://lkml.kernel.org/r/20241028182920.1025819-1-sui.jingfeng@linux.dev Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 17:12:40 -08:00
Kuan-Wei Chiu	d559bb2c6d	lib/test_min_heap: update min_heap_callbacks to use default builtin swap Replace the swp function pointer in the min_heap_callbacks of test_min_heap with NULL, allowing direct usage of the default builtin swap implementation. This modification simplifies the code and improves performance by removing unnecessary function indirection. Link: https://lkml.kernel.org/r/20241020040200.939973-5-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Coly Li <colyli@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Sakai <msakai@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 17:12:35 -08:00
Kuan-Wei Chiu	92a8b224b8	lib/min_heap: introduce non-inline versions of min heap API functions Patch series "Enhance min heap API with non-inline functions and optimizations", v2. Add non-inline versions of the min heap API functions in lib/min_heap.c and updates all users outside of kernel/events/core.c to use these non-inline versions. To mitigate the performance impact of indirect function calls caused by the non-inline versions of the swap and compare functions, a builtin swap has been introduced that swaps elements based on their size. Additionally, it micro-optimizes the efficiency of the min heap by pre-scaling the counter, following the same approach as in lib/sort.c. Documentation for the min heap API has also been added to the core-api section. This patch (of 10): All current min heap API functions are marked with '__always_inline'. However, as the number of users increases, inlining these functions everywhere leads to a increase in kernel size. In performance-critical paths, such as when perf events are enabled and min heap functions are called on every context switch, it is important to retain the inline versions for optimal performance. To balance this, the original inline functions are kept, and additional non-inline versions of the functions have been added in lib/min_heap.c. Link: https://lkml.kernel.org/r/20241020040200.939973-1-visitorckw@gmail.com Link: https://lore.kernel.org/20240522161048.8d8bbc7b153b4ecd92c50666@linux-foundation.org Link: https://lkml.kernel.org/r/20241020040200.939973-2-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Coly Li <colyli@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Sakai <msakai@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 17:12:34 -08:00
Kuan-Wei Chiu	908ef9bb4b	lib/list_sort: remove unnecessary header includes Patch series "Remove unnecessary header includes from {tools/}lib/list_sort.c". Remove outdated and unnecessary header includes from lib/list_sort.c and tools/lib/list_sort.c. Additionally, update the hunk exceptions checked by check_headers.sh to reflect these changes. This patch (of 3): After commit `043b3f7b63` ("lib/list_sort: simplify and remove MAX_LIST_LENGTH_BITS"), list_sort.c no longer uses ARRAY_SIZE() (which required kernel.h and bug.h for BUILD_BUG_ON_ZERO via __must_be_array) or memset() (which required string.h). As these headers are no longer needed, removes them. There are no changes to the generated code, as confirmed by 'objdump -d'. Additionally, 'wc -l' shows that the size of lib/.list_sort.o.cmd is reduced from 259 lines to 101 lines. Link: https://lkml.kernel.org/r/20241012042828.471614-1-visitorckw@gmail.com Link: https://lkml.kernel.org/r/20241012042828.471614-2-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 17:12:33 -08:00
Kuan-Wei Chiu	bf9850f6ea	lib/Makefile: make union-find compilation conditional on CONFIG_CPUSETS Currently, cpuset is the only user of the union-find implementation. Compiling union-find in all configurations unnecessarily increases the code size when building the kernel without cgroup support. Modify the build system to compile union-find only when CONFIG_CPUSETS is enabled. Link: https://lore.kernel.org/lkml/1ccd6411-5002-4574-bb8e-3e64bba6a757@redhat.com/ Link: https://lkml.kernel.org/r/20241011141214.87096-1-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Suggested-by: Waiman Long <llong@redhat.com> Acked-by: Waiman Long <longman@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Xavier <xavier_qy@163.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 17:12:32 -08:00
Vinicius Peixoto	5d04270708	lib/crc16_kunit.c: add KUnit tests for crc16 Add Kunit tests for the kernel's implementation of the standard CRC-16 algorithm (<linux/crc16.h>). The test data consists of 100 randomly-generated test cases, validated against a naive CRC-16 implementation. This test follows roughly the same logic as lib/crc32test.c, but without the performance measurements. Link: https://lkml.kernel.org/r/20241012-crc16-kunit-v3-1-0ca75cb58ca9@lkcamp.dev Signed-off-by: Vinicius Peixoto <vpeixoto@lkcamp.dev> Co-developed-by: Enzo Bertoloti <ebertoloti@lkcamp.dev> Signed-off-by: Enzo Bertoloti <ebertoloti@lkcamp.dev> Co-developed-by: Fabricio Gasperin <fgasperin@lkcamp.dev> Signed-off-by: Fabricio Gasperin <fgasperin@lkcamp.dev> Suggested-by: David Laight <David.Laight@ACULAB.COM> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Rae Moar <rmoar@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 17:12:31 -08:00
I Hsin Cheng	5a3c9366cb	list: test: check the size of every lists for list_cut_position() Check the total number of elements in both resultant lists are correct within list_cut_position(). Previously, only the first list's size was checked. so additional elements in the second list would not have been caught. Link: https://lkml.kernel.org/r/20241008065253.26673-1-richard120310@gmail.com Signed-off-by: I Hsin Cheng <richard120310@gmail.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 17:12:30 -08:00
Kuan-Wei Chiu	b42166427b	lib/Kconfig.debug: move int_pow test option to runtime testing section When executing 'make menuconfig' with KUNIT enabled, the int_pow test option appears on the first page of the main menu instead of under the runtime testing section. Relocate the int_pow test configuration to the appropriate runtime testing submenu, ensuring a more organized and logical structure in the menu configuration. Link: https://lkml.kernel.org/r/20241005222221.2154393-1-visitorckw@gmail.com Fixes: `7fcc9b5321` ("lib/math: Add int_pow test suite") Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: David Gow <davidgow@google.com> Cc: Luis Felipe Hernandez <luis.hernandez093@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 17:12:30 -08:00
Wei Yang	5059aa6334	maple_tree: memset maple_big_node as a whole In mast_fill_bnode(), we first clear some fields of maple_big_node and set the 'type' unconditionally before return. This means we won't leverage any information in maple_big_node and it is safe to clear the whole structure. In maple_big_node, we define slot and padding/gap in a union. And based on current definition of MAPLE_BIG_NODE_SLOTS/GAPS, padding is always less than slot and part of the gap is overlapped by slot. For example on 64bit system: MAPLE_BIG_NODE_SLOT is 34 MAPLE_BIG_NODE_GAP is 21 With this knowledge, current code may clear some space by twice. And this could be avoid by clearing the structure as a whole. Link: https://lkml.kernel.org/r/20240908140554.20378-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 16:56:24 -08:00
Wei Yang	f36ba81081	maple_tree: remove maple_big_node.parent Patch series "Reduce the space to be cleared for maple_big_node", v2. Found current code may clear maple_big_node redundantly. First we define a field parent, which is never used. After removing this, we reduce the size of memory to be cleared by memset. Then mast_fill_bnode() clears part of the structure twice, since slot and gap share some space. By clearing the whole structure, we can avoid this. This patch (of 2): The member parent of maple_big_node is never used. Let's remove it which could reduce the number of space to be cleared on memset. Link: https://lkml.kernel.org/r/20240908140554.20378-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20240908140554.20378-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 16:56:24 -08:00
Wei Yang	1c148069b2	maple_tree: goto complete directly on a pivot of 0 When we break the loop after assigning a pivot, the index i/j is not changed. Then the following code assign pivot, which means we do the assignment with same i/j by mas_safe_pivot. Since the loop condition is (i < piv_end), from which we can get i is less than mt_pivots[mt]. It implies mas_safe_pivot() return pivot[i] which is the same value we get in loop. Now we can conclude it does a redundant assignment on a pivot of 0. Let's just go to complete to avoid it. Link: https://lkml.kernel.org/r/20240911142759.20989-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 16:56:24 -08:00
Wei Yang	8c7904a8cd	maple_tree: i is always less than or equal to mas_end Patch series "refine mas_mab_cp()". By analysis of the code, one condition check can be removed and one case would hit a redundant assignment. This patch (of 2): mas_mab_cp() copy range [mas_start, mas_end] inclusively from a maple_node to maple_big_node. This implies mas_start <= mas_end. Based on the relationship of mas_start and mas_end, we can have the following four cases: \| mas_start == mas_end \| mas_start < mas_end ---------------+----------------------+---------------------- mas_start == 0 \| 1 \| 2 ---------------+----------------------+---------------------- mas_start != 0 \| 3 \| 4 We can see in all these four cases, i is always less than or equal to mas_end after finish the loop: Case 1: After assign pivot 0, i is set to 1, which is bigger than mas_end 0. So it jumps to complete and skip the check. Case 2: After assign pivot 0, i is set to 1. ∵ (mas_start < mas_end) && (mas_start == 0) ==> (1 <= mas_end) ∵ (i == 1) && (1 <= mas_end) ==> (i <= mas_end) ∴ Before loop, we have (i <= mas_end). And we still hold this if it skips the loop. For example, (i == mas_end). Now let's see what happens in the loop: ∵ piv_end = min(mas_end, mt_pivots[mt]) ==> (piv_end <= mas_end) ∵ loop condition is (i < piv_end) ==> (i <= piv_end) on finish the loop both normally or break ∵ (i <= piv_end) && (piv_end <= mas_end) ==> (i <= mas_end) ∴ After loop, we still get (i <= mas_end) in this case Case 3: This case would skip both if clause and loop. So when it comes to the check, i is still mas_start which equals to mas_end. Case 4: This case would skip the if clause. ∵ (mas_start < mas_end) && (i == mas_start) ==> (i < mas_end) ∴ Before loop, we have (i < mas_end). The loop process is similar with Case 2, so we get the same result. Now we can conclude in all cases, we get (i <= mas_end) when doing check. Then it is not necessary to do the check. Link: https://lkml.kernel.org/r/20240911142759.20989-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20240911142759.20989-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-05 16:56:23 -08:00
Greg Kroah-Hartman	09fbb82f94	Merge 6.12-rc6 into driver-core-next We need the driver-core fix/revert in here as well to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-05 10:11:53 +01:00
Jakub Kicinski	cbf49bed6a	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZyP6TxUcZGFuaWVsQGlv Z2VhcmJveC5uZXQACgkQ2yufC7HISINz7QD/RTuJAzPJXPQmjdzMj7pepjnSQH4K DnOc1soDqjJPSFkBAMlklDCZqSsFoNtNxagbyILrYQBC/MsV9jngimK46DEN =pDzC -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-10-31 We've added 13 non-merge commits during the last 16 day(s) which contain a total of 16 files changed, 710 insertions(+), 668 deletions(-). The main changes are: 1) Optimize and homogenize bpf_csum_diff helper for all archs and also add a batch of new BPF selftests for it, from Puranjay Mohan. 2) Rewrite and migrate the test_tcp_check_syncookie.sh BPF selftest into test_progs so that it can be run in BPF CI, from Alexis Lothoré. 3) Two BPF sockmap selftest fixes, from Zijian Zhang. 4) Small XDP synproxy BPF selftest cleanup to remove IP_DF check, from Vincent Li. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Add a selftest for bpf_csum_diff() selftests/bpf: Don't mask result of bpf_csum_diff() in test_verifier bpf: bpf_csum_diff: Optimize and homogenize for all archs net: checksum: Move from32to16() to generic header selftests/bpf: remove xdp_synproxy IP_DF check selftests/bpf: remove test_tcp_check_syncookie selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress selftests/bpf: get rid of global vars in btf_skc_cls_ingress selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress selftests/bpf: factorize conn and syncookies tests in a single runner selftests/bpf: Fix txmsg_redir of test_txmsg_pull in test_sockmap selftests/bpf: Fix msg_verify_data in test_sockmap ==================== Link: https://patch.msgid.link/20241031221543.108853-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-03 14:44:51 -08:00
Caleb Sander Mateos	61bf0009a7	dim: pass dim_sample to net_dim() by reference net_dim() is currently passed a struct dim_sample argument by value. struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64 passes it on the stack. All callers have already initialized dim_sample on the stack, so passing it by value requires pushing a duplicated copy to the stack. Either witing to the stack and immediately reading it, or perhaps dereferencing addresses relative to the stack pointer in a chain of push instructions, seems to perform quite poorly. In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time, 94% of which is attributed to the first push instruction to copy dim_sample on the stack for the call to net_dim(): // Call ktime_get() 0.26 \|4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47> // Pass the address of struct dim in %rdi \|4ead7: lea 0x3d0(%rbx),%rdi // Set dim_sample.pkt_ctr \|4eade: mov %r13d,0x8(%rsp) // Set dim_sample.byte_ctr \|4eae3: mov %r12d,0xc(%rsp) // Set dim_sample.event_ctr 0.15 \|4eae8: mov %bp,0x10(%rsp) // Duplicate dim_sample on the stack 94.16 \|4eaed: push 0x10(%rsp) 2.79 \|4eaf1: push 0x10(%rsp) 0.07 \|4eaf5: push %rax // Call net_dim() 0.21 \|4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b> To allow the caller to reuse the struct dim_sample already on the stack, pass the struct dim_sample by reference to net_dim(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-03 12:36:54 -08:00
Caleb Sander Mateos	a865276872	dim: make dim_calc_stats() inputs const pointers Make the start and end arguments to dim_calc_stats() const pointers to clarify that the function does not modify their values. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> Link: https://patch.msgid.link/20241031002326.3426181-1-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-11-03 12:35:57 -08:00
Bartosz Golaszewski	a508ef4b1d	lib: string_helpers: silence snprintf() output truncation warning The output of ".%03u" with the unsigned int in range [0, 4294966295] may get truncated if the target buffer is not 12 bytes. This can't really happen here as the 'remainder' variable cannot exceed 999 but the compiler doesn't know it. To make it happy just increase the buffer to where the warning goes away. Fixes: `3c9f3681d0` ("[SCSI] lib: add generic helper to print sizes rounded to the correct SI range") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andy Shevchenko <andy@kernel.org> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Kees Cook <kees@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20241101205453.9353-1-brgl@bgdev.pl Signed-off-by: Kees Cook <kees@kernel.org>	2024-11-02 13:08:55 -07:00
Thomas Gleixner	d44d26987b	timekeeping: Remove CONFIG_DEBUG_TIMEKEEPING Since `135225a363` timekeeping_cycles_to_ns() handles large offsets which would lead to 64bit multiplication overflows correctly. It's also protected against negative motion of the clocksource unconditionally, which was exclusive to x86 before. timekeeping_advance() handles large offsets already correctly. That means the value of CONFIG_DEBUG_TIMEKEEPING which analyzed these cases is very close to zero. Remove all of it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241031120328.536010148@linutronix.de	2024-11-02 10:14:31 +01:00
Ming Lei	341468e0ab	lib/iov_iter: fix bvec iterator setup .bi_size of bvec iterator should be initialized as real max size for walking, and .bi_bvec_done just counts how many bytes need to be skipped in the 1st bvec, so .bi_size isn't related with .bi_bvec_done. This patch fixes bvec iterator initialization, and the inner `size` check isn't needed any more, so revert Eric Dumazet's commit 7bc802acf193 ("iov-iter: do not return more bytes than requested in iov_iter_extract_bvec_pages()"). Cc: Eric Dumazet <edumazet@google.com> Fixes: `e4e535bff2` ("iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages") Reported-by: syzbot+71abe7ab2b70bca770fd@syzkaller.appspotmail.com Tested-by: syzbot+71abe7ab2b70bca770fd@syzkaller.appspotmail.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-11-01 20:18:21 -06:00
Linus Torvalds	3dfffd506e	arm64 fixes for -rc6 - Fix handling of POR_EL0 during signal delivery so that pushing the signal context doesn't fail based on the pkey configuration of the interrupted context and align our user-visible behaviour with that of x86. - Fix a bogus pointer being passed to the CPU hotplug code from the Arm SDEI driver. - Re-enable software tag-based KASAN with GCC by using an alternative implementation of '__no_sanitize_address'. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmcjr8wQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNL2DB/4tNl7feCA2V4fW/Eu3RzXrHTdJbZvTjLDl JjeXPZr4WdGQQMgQ0DPZtpnmeBzd5nswx9WHG9VSsUxc5g+rzWxwvMnUeplDvEXo Y/QMUq4JZN3eqDZWPs0mEN4fMI+QOihInErVHvFXaJLcbxYrU5BvfwExgfY53AjT ZJEPmF291OL6V4UCWVWggk44BQaTBeWmc4itJcYm6z6mIgAgh84MZGK5M0e582ip CRAImDiAPqLxRO9kzKcYthI3FDyyVi1HtiSL1CiNktOXMNz19qPelq1XAnDEyvBt TEUitTLTwbUJ0nqi4u7ve09aebneAq8nsGucteYTrBU4U/PRjvQO =LTB9 -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The important one is a change to the way in which we handle protection keys around signal delivery so that we're more closely aligned with the x86 behaviour, however there is also a revert of the previous fix to disable software tag-based KASAN with GCC, since a workaround materialised shortly afterwards. I'd love to say we're done with 6.12, but we're aware of some longstanding fpsimd register corruption issues that we're almost at the bottom of resolving. Summary: - Fix handling of POR_EL0 during signal delivery so that pushing the signal context doesn't fail based on the pkey configuration of the interrupted context and align our user-visible behaviour with that of x86. - Fix a bogus pointer being passed to the CPU hotplug code from the Arm SDEI driver. - Re-enable software tag-based KASAN with GCC by using an alternative implementation of '__no_sanitize_address'" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: signal: Improve POR_EL0 handling to avoid uaccess failures firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state() Revert "kasan: Disable Software Tag-Based KASAN with GCC" kasan: Fix Software Tag-Based KASAN with GCC	2024-11-01 07:54:11 -10:00
Linus Torvalds	d56239a82e	vfs-6.12-rc6.fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZyTGAQAKCRCRxhvAZXjc opd6AQCal4omyfS8FYe4VRRZ/0XHouagq99I0U0TAmKkvoKAsgD/XrdE+pSTEkPX Pv4T9phh1cZRxcyKVu77UoYkuHJEDAg= =Lu9R -----END PGP SIGNATURE----- Merge tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull filesystem fixes from Christian Brauner: "VFS: - Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set - Add a get_tree_bdev_flags() helper that allows to modify e.g., whether errors are logged into the filesystem context during superblock creation. This is used by erofs to fix a userspace regression where an error is currently logged when its used on a regular file which is an new allowed mode in erofs. netfs: - Fix the sysfs debug path in the documentation. - Fix iov_iter_get_pages() for folio queues by skipping the page extracation if we're at the end of a folio. afs: - Fix moving subdirectories to different parent directory. autofs: - Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in validate_dev_ioctl(). The actual ioctl number, not the ioctl command needs to be checked for autofs" tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP autofs: fix thinko in validate_dev_ioctl() iov_iter: Fix iov_iter_get_pages*() for folio_queue afs: Fix missing subdir edit when renamed between parent dirs doc: correcting the debug path for cachefiles erofs: use get_tree_bdev_flags() to avoid misleading messages fs/super.c: introduce get_tree_bdev_flags()	2024-11-01 07:37:10 -10:00
Eric Dumazet	a911bad094	dql: annotate data-races around dql->last_obj_cnt dql->last_obj_cnt is read/written from different contexts, without any lock synchronization. Use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241029191425.2519085-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-31 19:19:36 -07:00
Jakub Kicinski	5b1c965956	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.12-rc6). Conflicts: drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c `cbe84e9ad5` ("wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd") `188a1bf894` ("wifi: mac80211: re-order assigning channel in activate links") https://lore.kernel.org/all/20241028123621.7bbb131b@canb.auug.org.au/ net/mac80211/cfg.c `c4382d5ca1` ("wifi: mac80211: update the right link for tx power") `8dd0498983` ("wifi: mac80211: Fix setting txpower with emulate_chanctx") drivers/net/ethernet/intel/ice/ice_ptp_hw.h `6e58c33106` ("ice: fix crash on probe for DPLL enabled E810 LOM") `e4291b64e1` ("ice: Align E810T GPIO to other products") `ebb2693f8f` ("ice: Read SDP section from NVM for pin definitions") `ac532f4f42` ("ice: Cleanup unused declarations") https://lore.kernel.org/all/20241030120524.1ee1af18@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-31 18:10:07 -07:00
Ming Lei	496a51b371	lib/iov_iter.c: initialize bi.bi_idx before iterating over bvec Initialize bi.bi_idx as 0 before iterating over bvec, otherwise garbage data can be used as ->bi_idx. Cc: Christoph Hellwig <hch@lst.de> Reported-and-tested-by: Klara Modin <klarasmodin@gmail.com> Fixes: `e4e535bff2` ("iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-31 07:40:34 -06:00
Puranjay Mohan	db71aae70e	net: checksum: Move from32to16() to generic header from32to16() is used by lib/checksum.c and also by arch/parisc/lib/checksum.c. The next patch will use it in the bpf_csum_diff helper. Move from32to16() to the include/net/checksum.h as csum_from32to16() and remove other implementations. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20241026125339.26459-2-puranjay@kernel.org	2024-10-30 15:29:59 +01:00
Linus Torvalds	7fbaacafbc	slab fixes for 6.12-rc6 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmcgrxcACgkQu+CwddJF iJrq9ggAiZ/2c7p23s52LdVhT9GTyV5omVOh2kDztVx4w6RM3RbkhkLWdqt0XUag uf1TJe6kOvnCeHEFEEo3sqPj820XebxKDf0GGCdI6a9f4n30ipKH+vWSQ0iutKO/ dOBdArxr0FGOV5VZR9i3xQ6sUqZXXUbJdte0c0ovp6Q6HDHTeQeKNhOQ2fv33TG/ 7jBh5HVyhI6JE/+TOxrMaklH0IqYBb6z49wdbaN7XBvXVXlb5MtOZy109gfUHDwe tfktifyE45VtmF0WdHfxDbCnqyDSG1Jm3wsLDbMq+voJ1BQlUvIZ5Dv4kucYqffm VN5HkH6uQ09aoounBoU4g50UYeNpiQ== =xAw8 -----END PGP SIGNATURE----- Merge tag 'slab-for-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fixes from Vlastimil Babka: - Fix for a slub_kunit test warning with MEM_ALLOC_PROFILING_DEBUG (Pei Xiao) - Fix for a MTE-based KASAN BUG in krealloc() (Qun-Wei Lin) * tag 'slab-for-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm: krealloc: Fix MTE false alarm in __do_krealloc slub/kunit: fix a WARNING due to unwrapped __kmalloc_cache_noprof	2024-10-29 16:24:02 -10:00
Ming Lei	e4e535bff2	iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages The iov_iter_extract_pages interface allows to return physically discontiguous pages, as long as all but the first and last page in the array are page aligned and page size. Rewrite iov_iter_extract_bvec_pages to take advantage of that instead of only returning ranges of physically contiguous pages. Signed-off-by: Ming Lei <ming.lei@redhat.com> [hch: minor cleanups, new commit log] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241024050021.627350-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-10-29 09:14:28 -06:00
Arnd Bergmann	5a8b4b4001	lib/iomem_copy: fix kerneldoc format style The newly added file did not quite get the punctuation right: lib/iomem_copy.c:14: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410290907.0mDZVYPK-lkp@intel.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-10-29 07:14:29 +00:00
Arnd Bergmann	af08475370	selftests: kallsyms: add MODULE_DESCRIPTION The newly added test script creates modules that are lacking a description line in order to build cleanly: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/tests/module/test_kallsyms_a.o WARNING: modpost: missing MODULE_DESCRIPTION() in lib/tests/module/test_kallsyms_b.o WARNING: modpost: missing MODULE_DESCRIPTION() in lib/tests/module/test_kallsyms_c.o WARNING: modpost: missing MODULE_DESCRIPTION() in lib/tests/module/test_kallsyms_d.o Fixes: `84b4a51fce` ("selftests: add new kallsyms selftests") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>	2024-10-28 15:27:08 -07:00
Julian Vetter	b660d0a2ac	New implementation for IO memcpy and IO memset The IO memcpy and IO memset functions in asm-generic/io.h simply call memcpy and memset. This can lead to alignment problems or faults on architectures that do not define their own version and fall back to these defaults. This patch introduces new implementations for IO memcpy and IO memset, that use read{l,q} accessor functions, align accesses to machine word size, and resort to byte accesses when the target memory is not aligned. For new architectures and existing ones that were using the old fallbacks these functions are save to use, because IO memory constraints are taken into account. Moreover, architectures with similar implementations can now use these new versions, not needing to implement their own. Reviewed-by: Yann Sionneau <ysionneau@kalrayinc.com> Signed-off-by: Julian Vetter <jvetter@kalrayinc.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-10-28 21:44:29 +00:00
Nicolas Pitre	1dc82675cb	lib/math/test_div64: add some edge cases relevant to __div64_const32() Be sure to test the extreme cases with and without bias. Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-10-28 21:44:28 +00:00
Ira Weiny	4261974701	printf: Add print format (%pra) for struct range The use of struct range in the CXL subsystem is growing. In particular, the addition of Dynamic Capacity devices uses struct range in a number of places which are reported in debug and error messages. To wit requiring the printing of the start/end fields in each print became cumbersome. Dan Williams mentions in [1] that it might be time to have a print specifier for struct range similar to struct resource. A few alternatives were considered including '%par', '%r', and '%pn'. %pra follows that struct range is similar to struct resource (%p[rR]) but needs to be different. Based on discussions with Petr and Andy '%pra' was chosen.[2] Andy also suggested to keep the range prints similar to struct resource though combined code. Add hex_range() to handle printing for both pointer types. Finally introduce DEFINE_RANGE() as a parallel to DEFINE_RES_*() and use it in the tests. Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: open list <linux-kernel@vger.kernel.org> Link: https://lore.kernel.org/all/663922b475e50_d54d72945b@dwillia2-xfh.jf.intel.com.notmuch/ [1] Link: https://lore.kernel.org/all/66cea3bf3332f_f937b29424@iweiny-mobl.notmuch/ [2] Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20241025-cxl-pra-v2-3-123a825daba2@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2024-10-28 14:32:43 -07:00
Ira Weiny	8e7f07e608	test printf: Add very basic struct resource tests The printf tests for struct resource were stubbed out. struct range printing will leverage the struct resource implementation. To prevent regression add some basic sanity tests for struct resource. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Tested-by: Fan Ni <fan.ni@samsung.com> Acked-by: Petr Mladek <pmladek@suse.com> Link: https://patch.msgid.link/20241007-dcd-type2-upstream-v4-1-c261ee6eeded@intel.com Tested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20241025-cxl-pra-v2-1-123a825daba2@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2024-10-28 14:30:52 -07:00
Hugh Dickins	c749d9b7eb	iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP generic/077 on x86_32 CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y with highmem, on huge=always tmpfs, issues a warning and then hangs (interruptibly): WARNING: CPU: 5 PID: 3517 at mm/highmem.c:622 kunmap_local_indexed+0x62/0xc9 CPU: 5 UID: 0 PID: 3517 Comm: cp Not tainted 6.12.0-rc4 #2 ... copy_page_from_iter_atomic+0xa6/0x5ec generic_perform_write+0xf6/0x1b4 shmem_file_write_iter+0x54/0x67 Fix copy_page_from_iter_atomic() by limiting it in that case (include/linux/skbuff.h skb_frag_must_loop() does similar). But going forward, perhaps CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP is too surprising, has outlived its usefulness, and should just be removed? Fixes: `908a1ad894` ("iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic()") Signed-off-by: Hugh Dickins <hughd@google.com> Link: https://lore.kernel.org/r/dd5f0c89-186e-18e1-4f43-19a60f5a9774@google.com Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-28 13:39:35 +01:00
Eric Biggers	4964a1d91c	crypto: api - move crypto_simd_disabled_for_test to lib Move crypto_simd_disabled_for_test to lib/ so that crypto_simd_usable() can be used by library code. This was discussed previously (https://lore.kernel.org/linux-crypto/20220716062920.210381-4-ebiggers@kernel.org/) but was not done because there was no use case yet. However, this is now needed for the arm64 CRC32 library code. Tested with: export ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- echo CONFIG_CRC32=y > .config echo CONFIG_MODULES=y >> .config echo CONFIG_CRYPTO=m >> .config echo CONFIG_DEBUG_KERNEL=y >> .config echo CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=n >> .config echo CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y >> .config make olddefconfig make -j$(nproc) Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-10-28 18:33:11 +08:00
Ard Biesheuvel	16739efac6	crypto: crc32c - Provide crc32c-arch driver for accelerated library code crc32c-generic is currently backed by the architecture's CRC-32c library code, which may offer a variety of implementations depending on the capabilities of the platform. These are not covered by the crypto subsystem's fuzz testing capabilities because crc32c-generic is the reference driver that the fuzzing logic uses as a source of truth. Fix this by providing a crc32c-arch implementation which is based on the arch library code if available, and modify crc32c-generic so it is always based on the generic C implementation. If the arch has no CRC-32c library code, this change does nothing. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-10-28 18:33:10 +08:00
Ard Biesheuvel	a37e55791f	crypto: crc32 - Provide crc32-arch driver for accelerated library code crc32-generic is currently backed by the architecture's CRC-32 library code, which may offer a variety of implementations depending on the capabilities of the platform. These are not covered by the crypto subsystem's fuzz testing capabilities because crc32-generic is the reference driver that the fuzzing logic uses as a source of truth. Fix this by providing a crc32-arch implementation which is based on the arch library code if available, and modify crc32-generic so it is always based on the generic C implementation. If the arch has no CRC-32 library code, this change does nothing. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-10-28 18:33:10 +08:00
Paolo Abeni	03fc07a247	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts and no adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-25 09:08:22 +02:00
Linus Torvalds	c2cd8e4592	Probes fixes for v6.12-rc4(2): - objpool: Fix choosing allocation for percpu slots Fixes to allocate objpool's percpu slots correctly according to the GFP flag. It checks whether "any bit" in GFP_ATOMIC is set to choose the vmalloc source, but it should check "all bits" in GFP_ATOMIC flag is set, because GFP_ATOMIC is a combined flag. - tracing/probes: Fix MAX_TRACE_ARGS limit handling If more than MAX_TRACE_ARGS are passed for creating a probe event, the entries over MAX_TRACE_ARG in trace_arg array are not initialized. Thus if the kernel accesses those entries, it crashes. This rejects creating event if the number of arguments is over MAX_TRACE_ARGS. - tracing: Consider the NULL character when validating the event length A strlen() is used when parsing the event name, and the original code does not consider the terminal null byte. Thus it can pass the name 1 byte longer than the buffer. This fixes to check it correctly. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmcZBJ0ACgkQ2/sHvwUr Pxu4qAgAm+mIiCaBGyolsT1oB5EF+9gztbwRtcAOY1811RJZ0XiQPuOwtZfijpBr 1Pl+SjubRKhLg+lLHEuCQHxkqlTSp+zrjkF+A0hFlB38nJ5P3pIw+b5pM5FCvhY+ w0tBTwkjiRBS9h1z88c74ciKYA/XR4apcMMUrPQZUCHq8P73Wu/Fo2lhnCVGBs6q nYESyrTcOCDR0c6HP9D2GWxQFtbbCyAfotUjX37EIooTcl7ufAr8IPm8jBx7EzCa WM841FwbuIgGbFCGYlG1/lOR+Qf7FszKAY5SBJMV/BiyFbxJqZfA5DWfJcrZ9YpW pl86oKWyEkidwx8OIiB3Y1enPzUUJQ== =8oUB -----END PGP SIGNATURE----- Merge tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - objpool: Fix choosing allocation for percpu slots Fixes to allocate objpool's percpu slots correctly according to the GFP flag. It checks whether "any bit" in GFP_ATOMIC is set to choose the vmalloc source, but it should check "all bits" in GFP_ATOMIC flag is set, because GFP_ATOMIC is a combined flag. - tracing/probes: Fix MAX_TRACE_ARGS limit handling If more than MAX_TRACE_ARGS are passed for creating a probe event, the entries over MAX_TRACE_ARG in trace_arg array are not initialized. Thus if the kernel accesses those entries, it crashes. This rejects creating event if the number of arguments is over MAX_TRACE_ARGS. - tracing: Consider the NUL character when validating the event length A strlen() is used when parsing the event name, and the original code does not consider the terminal null byte. Thus it can pass the name one byte longer than the buffer. This fixes to check it correctly. * tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Consider the NULL character when validating the event length tracing/probes: Fix MAX_TRACE_ARGS limit handling objpool: fix choosing allocation for percpu slots	2024-10-24 13:51:58 -07:00
Luis Chamberlain	84b4a51fce	selftests: add new kallsyms selftests We lack find_symbol() selftests, so add one. This let's us stress test improvements easily on find_symbol() or optimizations. It also inherently allows us to test the limits of kallsyms on Linux today. We test a pathalogical use case for kallsyms by introducing modules which are automatically written for us with a larger number of symbols. We have 4 kallsyms test modules: A: has KALLSYSMS_NUMSYMS exported symbols B: uses one of A's symbols C: adds KALLSYMS_SCALE_FACTOR * KALLSYSMS_NUMSYMS exported D: adds 2 * the symbols than C By using anything much larger than KALLSYSMS_NUMSYMS as 10,000 and KALLSYMS_SCALE_FACTOR of 8 we segfault today. So we're capped at around 160000 symbols somehow today. We can inpsect that issue at our leasure later, but for now the real value to this test is that this will easily allow us to test improvements on find_symbol(). We want to enable this test on allyesmodconfig builds so we can't use this combination, so instead just use a safe value for now and be informative on the Kconfig symbol documentation about where our thresholds are for testers. We default then to KALLSYSMS_NUMSYMS of just 100 and KALLSYMS_SCALE_FACTOR of 8. On x86_64 we can use perf, for other architectures we just use 'time' and allow for customizations. For example a future enhancements could be done for parisc to check for unaligned accesses which triggers a special special exception handler assembler code inside the kernel. The negative impact on performance is so large on parisc that it keeps track of its accesses on /proc/cpuinfo as UAH: IRQ: CPU0 CPU1 3: 1332 0 SuperIO ttyS0 7: `1270013` 0 SuperIO pata_ns87415 64: 320023012 320021431 CPU timer 65: 17080507 20624423 CPU IPI UAH: 10948640 58104 Unaligned access handler traps While at it, this tidies up lib/ test modules to allow us to have a new directory for them. The amount of test modules under lib/ is insane. This should also hopefully showcase how to start doing basic self module writing code, which may be more useful for more complex cases later in the future. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>	2024-10-24 10:14:12 -07:00
David Howells	e65a0dc1ca	iov_iter: Fix iov_iter_get_pages*() for folio_queue p9_get_mapped_pages() uses iov_iter_get_pages_alloc2() to extract pages from an iterator when performing a zero-copy request and under some circumstances, this crashes with odd page errors[1], for example, I see: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbcf0 flags: 0x2000000000000000(zone=1) ... page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u)) ------------[ cut here ]------------ kernel BUG at include/linux/mm.h:1444! This is because, unlike in iov_iter_extract_folioq_pages(), the iter_folioq_get_pages() helper function doesn't skip the current folio when iov_offset points to the end of it, but rather extracts the next page beyond the end of the folio and adds it to the list. Reading will then clobber the contents of this page, leading to system corruption, and if the page is not in use, put_page() may try to clean up the unused page. This can be worked around by copying the iterator before each extraction[2] and using iov_iter_advance() on the original as the advance function steps over the page we're at the end of. Fix this by skipping the page extraction if we're at the end of the folio. This was reproduced in the ktest environment[3] by forcing 9p to use the fscache caching mode and then reading a file through 9p. Fixes: `db0aa2e956` ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios") Reported-by: Antony Antony <antony@phenome.org> Closes: https://lore.kernel.org/r/ZxFQw4OI9rrc7UYc@Antony2201.local/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: v9fs@lists.linux.dev cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/ZxFEi1Tod43pD6JC@moon.secunet.de/ [1] Link: https://lore.kernel.org/r/2299159.1729543103@warthog.procyon.org.uk/ [2] Link: https://github.com/koverstreet/ktest.git [3] Tested-by: Antony Antony <antony.antony@secunet.com> Link: https://lore.kernel.org/r/3327438.1729678025@warthog.procyon.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-24 13:50:27 +02:00
Marco Elver	237ab03e30	Revert "kasan: Disable Software Tag-Based KASAN with GCC" This reverts commit `7aed6a2c51`. Now that __no_sanitize_address attribute is fixed for KASAN_SW_TAGS with GCC, allow re-enabling KASAN_SW_TAGS with GCC. Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrew Pinski <pinskia@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20241021120013.3209481-2-elver@google.com Signed-off-by: Will Deacon <will@kernel.org>	2024-10-23 16:04:30 +01:00
Pei Xiao	2b059d0d1e	slub/kunit: fix a WARNING due to unwrapped __kmalloc_cache_noprof 'modprobe slub_kunit' will have a warning as shown below. The root cause is that __kmalloc_cache_noprof was directly used, which resulted in no alloc_tag being allocated. This caused current->alloc_tag to be null, leading to a warning in alloc_tag_add_check. Let's add an alloc_hook layer to __kmalloc_cache_noprof specifically within lib/slub_kunit.c, which is the only user of this internal slub function outside kmalloc implementation itself. [58162.947016] WARNING: CPU: 2 PID: 6210 at ./include/linux/alloc_tag.h:125 alloc_tagging_slab_alloc_hook+0x268/0x27c [58162.957721] Call trace: [58162.957919] alloc_tagging_slab_alloc_hook+0x268/0x27c [58162.958286] __kmalloc_cache_noprof+0x14c/0x344 [58162.958615] test_kmalloc_redzone_access+0x50/0x10c [slub_kunit] [58162.959045] kunit_try_run_case+0x74/0x184 [kunit] [58162.959401] kunit_generic_run_threadfn_adapter+0x2c/0x4c [kunit] [58162.959841] kthread+0x10c/0x118 [58162.960093] ret_from_fork+0x10/0x20 [58162.960363] ---[ end trace 0000000000000000 ]--- Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Fixes: `a0a44d9175` ("mm, slab: don't wrap internal functions with alloc_hooks()") Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2024-10-23 09:50:58 +02:00
Viktor Malik	aff1871bfc	objpool: fix choosing allocation for percpu slots objpool intends to use vmalloc for default (non-atomic) allocations of percpu slots and objects. However, the condition checking if GFP flags set any bit of GFP_ATOMIC is wrong b/c GFP_ATOMIC is a combination of bits (__GFP_HIGH\|__GFP_KSWAPD_RECLAIM) and so `pool->gfp & GFP_ATOMIC` will be true if either bit is set. Since GFP_ATOMIC and GFP_KERNEL share the ___GFP_KSWAPD_RECLAIM bit, kmalloc will be used in cases when GFP_KERNEL is specified, i.e. in all current usages of objpool. This may lead to unexpected OOM errors since kmalloc cannot allocate large amounts of memory. For instance, objpool is used by fprobe rethook which in turn is used by BPF kretprobe.multi and kprobe.session probe types. Trying to attach these to all kernel functions with libbpf using SEC("kprobe.session/") int kprobe(struct pt_regs ctx) { [...] } fails on objpool slot allocation with ENOMEM. Fix the condition to truly use vmalloc by default. Link: https://lore.kernel.org/all/20240826060718.267261-1-vmalik@redhat.com/ Fixes: `b4edb8d2d4` ("lib: objpool added: ring-array based lockless MPMC") Signed-off-by: Viktor Malik <vmalik@redhat.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Matt Wu <wuqiang.matt@bytedance.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2024-10-22 14:22:42 +09:00
Linus Torvalds	a777c32ca4	This push fixes a regression in mpi that broke RSA. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmcSJQYACgkQxycdCkmx i6ejoBAAhK/3bk9jmxMOnVvednjrjVMqg+17daXHKbHT6eMcOwXsgr4ZWrkc5syV tQBRipdSfLhwf4aTNOzgyg3GIVVQkLZuRKDanntVdyYs65YKKUP/BiUshMAJ4DbW nkPe+LBdl0EvIWexrSKy5cyB2Yt+5MknK+mUMHyAeRjgVHNCEBMbMo/4KHGDW6fL Cn8rBATD1LCBODkxFC83pHe5M/TsxM08hL8xQxPJZm9SvNiBa7+xaS/oSApyIs8x L0RmYdlXlRGQcok5/ZCFc66QEOw2lIOwIc6sTmbT+eKFtvztkZ+ErhAuubgk5UKa TaB0qrBIpsQs2O7gFq4OU7BkG4QAlFt37MqBuf21b5Zh605s/ORDWEQobcokXpBY SmxOBxBhhLcRgb1cjUQn44/M8vrRXL0+IZiuOWkb+vcNln32bCH+BeiW6traNdL3 s3uVRF28Pd76xB4eAuT4eqiSOuCI/FyB7+hJmkOcpKC1eQUq2whrFLfru3iGItn8 bJWJQjPaysI8QXoky6miMjaeBWWOHuBWgYb2BzzHRsAdxK2oXUN/Q3BOJq1wONtP YaRzqu5vBvPk+0F/SOIl1MBp1nt62T8WRcDyIAhDsgmnuWASAKzo9Smzzo0gJr8q bB9iHTHN6yR9J3+zPyOqPY99zkaABSrQU9StFqEjN8icndG5Tfo= =MHMX -----END PGP SIGNATURE----- Merge tag 'v6.12-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a regression in mpi that broke RSA" * tag 'v6.12-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: lib/mpi - Fix an "Uninitialized scalar variable" issue	2024-10-21 09:59:43 -07:00
Linus Torvalds	4e6bd4a33a	Rust fixes for v6.12 (2nd) Toolchain and infrastructure: - Fix several issues with the 'rustc-option' macro. It includes a refactor from Masahiro of three '{cc,rust}-' macros, which is not a fix but avoids repeating the same commands (which would be several lines in the case of 'rustc-option'). - Fix conditions for 'CONFIG_HAVE_CFI_ICALL_NORMALIZE_INTEGERS'. It includes the addition of 'CONFIG_RUSTC_LLVM_VERSION', which is not a fix but is needed for the actual fix. And a trivial grammar fix. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmcS5LkACgkQGXyLc2ht IW07ghAAxP94zqWzf8bQ4IIgTYrV9WSqR9vMpd31VAPknRJjGUq5dehFxiQxDJ5X ibMcpyja8V1CGeOh4qthLJAD/OGw+ANafjLfHM/l9cQRx1uwLEac3h4/YR1x52Ep al3ISewhbs3cjko2aa6Gnym3hdYizqkKY9Bca6kvo7k4ZRRmWT3sKAsle6rV93Hw q9AjC40XC8iy2VYv/JPvP1zcr3T7ZzCrs3ELG8sLSeR0gZZEmI3e3FOWWHcRlVRa uig4SSPvhHVssG8k64CHmzUtVQCApuJuzQGG72Ozs4V5Xxk86ZRE0XzyMXaw15nu Mm8s+hDxsFXfESQg0GMCVQ7wnGFSuvRwK3sWALltXmqtGQxkYgcJ3mYtu0sP8p51 VIzDIomdUfGLxk+sDn7Lnl5PrSLaetUd94nr5qCMmfb2/7/kSaB4aHmML+8ZHCn5 I4TQONL/pVmmRm97HFaAFOzCaGRWfVoIzQ/cRaQhqK+qrTfRjyFcsMzN+Flp5A58 c3AgnTVlm4pPqtlLQ1z9BiGYT50dI0fHBOQiisogGsZwwMUqzEMOnbZjbhS/HKSp FG8hu/OyzIsNnNqOfQZN4DSTyf4qfIuyTmFM1OAel8zllCwlxy5F2hVp/opwH3/y On6CW0lunUBzCXZZ+byWudo7Vg8YpMVHATLqp9FHZpJb8JK688w= =Y7fL -----END PGP SIGNATURE----- Merge tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux Pull rust fixes from Miguel Ojeda: "Toolchain and infrastructure: - Fix several issues with the 'rustc-option' macro. It includes a refactor from Masahiro of three '{cc,rust}-' macros, which is not a fix but avoids repeating the same commands (which would be several lines in the case of 'rustc-option'). - Fix conditions for 'CONFIG_HAVE_CFI_ICALL_NORMALIZE_INTEGERS'. It includes the addition of 'CONFIG_RUSTC_LLVM_VERSION', which is not a fix but is needed for the actual fix. And a trivial grammar fix" * tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux: cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS kbuild: rust: add `CONFIG_RUSTC_LLVM_VERSION` kbuild: fix issues with rustc-option kbuild: refactor cc-option-yn, cc-disable-warning, rust-option-yn macros lib/Kconfig.debug: fix grammar in RUST_BUILD_ASSERT_ALLOW	2024-10-19 08:32:47 -07:00
Linus Torvalds	3d5ad2d4ec	BPF fixes: - Fix BPF verifier to not affect subreg_def marks in its range propagation, from Eduard Zingerman. - Fix a truncation bug in the BPF verifier's handling of coerce_reg_to_size_sx, from Dimitar Kanaliev. - Fix the BPF verifier's delta propagation between linked registers under 32-bit addition, from Daniel Borkmann. - Fix a NULL pointer dereference in BPF devmap due to missing rxq information, from Florian Kauer. - Fix a memory leak in bpf_core_apply, from Jiri Olsa. - Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for arrays of nested structs, from Hou Tao. - Fix build ID fetching where memory areas backing the file were created with memfd_secret, from Andrii Nakryiko. - Fix BPF task iterator tid filtering which was incorrectly using pid instead of tid, from Jordan Rome. - Several fixes for BPF sockmap and BPF sockhash redirection in combination with vsocks, from Michal Luczaj. - Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered, from Andrea Parri. - Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility of an infinite BPF tailcall, from Pu Lehui. - Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot be resolved, from Thomas Weißschuh. - Fix a bug in kfunc BTF caching for modules where the wrong BTF object was returned, from Toke Høiland-Jørgensen. - Fix a BPF selftest compilation error in cgroup-related tests with musl libc, from Tony Ambardar. - Several fixes to BPF link info dumps to fill missing fields, from Tyrone Wu. - Add BPF selftests for kfuncs from multiple modules, checking that the correct kfuncs are called, from Simon Sundberg. - Ensure that internal and user-facing bpf_redirect flags don't overlap, also from Toke Høiland-Jørgensen. - Switch to use kvzmalloc to allocate BPF verifier environment, from Rik van Riel. - Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat under RT, from Wander Lairson Costa. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> -----BEGIN PGP SIGNATURE----- iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZxK4OhUcZGFuaWVsQGlv Z2VhcmJveC5uZXQACgkQ2yufC7HISIOCrwEAib2kC5EEQn5+wKVE/bnZryVX2leT YXdfItDCBU6zCYUA+wTU5hGGn9lcDUcZx72l/KZPDyPw7HdzNJ+6iR1zQqoM =f9kv -----END PGP SIGNATURE----- Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Daniel Borkmann: - Fix BPF verifier to not affect subreg_def marks in its range propagation (Eduard Zingerman) - Fix a truncation bug in the BPF verifier's handling of coerce_reg_to_size_sx (Dimitar Kanaliev) - Fix the BPF verifier's delta propagation between linked registers under 32-bit addition (Daniel Borkmann) - Fix a NULL pointer dereference in BPF devmap due to missing rxq information (Florian Kauer) - Fix a memory leak in bpf_core_apply (Jiri Olsa) - Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for arrays of nested structs (Hou Tao) - Fix build ID fetching where memory areas backing the file were created with memfd_secret (Andrii Nakryiko) - Fix BPF task iterator tid filtering which was incorrectly using pid instead of tid (Jordan Rome) - Several fixes for BPF sockmap and BPF sockhash redirection in combination with vsocks (Michal Luczaj) - Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri) - Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility of an infinite BPF tailcall (Pu Lehui) - Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot be resolved (Thomas Weißschuh) - Fix a bug in kfunc BTF caching for modules where the wrong BTF object was returned (Toke Høiland-Jørgensen) - Fix a BPF selftest compilation error in cgroup-related tests with musl libc (Tony Ambardar) - Several fixes to BPF link info dumps to fill missing fields (Tyrone Wu) - Add BPF selftests for kfuncs from multiple modules, checking that the correct kfuncs are called (Simon Sundberg) - Ensure that internal and user-facing bpf_redirect flags don't overlap (Toke Høiland-Jørgensen) - Switch to use kvzmalloc to allocate BPF verifier environment (Rik van Riel) - Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat under RT (Wander Lairson Costa) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits) lib/buildid: Handle memfd_secret() files in build_id_parse() selftests/bpf: Add test case for delta propagation bpf: Fix print_reg_state's constant scalar dump bpf: Fix incorrect delta propagation between linked registers bpf: Properly test iter/task tid filtering bpf: Fix iter/task tid filtering riscv, bpf: Make BPF_CMPXCHG fully ordered bpf, vsock: Drop static vsock_bpf_prot initialization vsock: Update msg_count on read_skb() vsock: Update rx_bytes on read_skb() bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock selftests/bpf: Add asserts for netfilter link info bpf: Fix link info netfilter flags to populate defrag flag selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx() selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx() bpf: Fix truncation bug in coerce_reg_to_size_sx() selftests/bpf: Assert link info uprobe_multi count & path_size if unset bpf: Fix unpopulated path_size when uprobe_multi fields unset selftests/bpf: Fix cross-compiling urandom_read selftests/bpf: Add test for kfunc module order ...	2024-10-18 16:27:14 -07:00
Sebastian Andrzej Siewior	560af5dc83	lockdep: Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING. With the printk issues solved, the last known splat created by PROVE_RAW_LOCK_NESTING is gone. Enable PROVE_RAW_LOCK_NESTING by default as part of PROVE_LOCKING. Keep the defines around in case something serious pops up and it needs to be disabled. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20241009161041.1018375-2-bigeasy@linutronix.de	2024-10-17 21:21:16 -07:00
Ahmed Ehab	5eadeb7b3b	locking/lockdep: Add a test for lockdep_set_subclass() Add a test case to ensure that no new name string literal will be created in lockdep_set_subclass(), otherwise a warning will be triggered in look_up_lock_class(). Add this to catch the problem in the future. [boqun: Reword the title, replace #if with #ifdef and rename functions and variables] Signed-off-by: Ahmed Ehab <bottaawesome633@gmail.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/lkml/20240905011220.356973-1-bottaawesome633@gmail.com/	2024-10-17 21:21:16 -07:00
Linus Torvalds	4d939780b7	28 hotfixes. 13 are cc:stable. 23 are MM. It is the usual shower of unrelated singletons - please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZxGY5wAKCRDdBJ7gKXxA js6RAQC16zQ7WRV091i79cEi1C5648NbZjMCU626hZjuyfbzKgEA2v8PYtjj9w2e UGLxMY+PYZki2XNEh75Sikdkiyl9Vgg= =xcWT -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "28 hotfixes. 13 are cc:stable. 23 are MM. It is the usual shower of unrelated singletons - please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits) maple_tree: add regression test for spanning store bug maple_tree: correct tree corruption on spanning store mm/mglru: only clear kswapd_failures if reclaimable mm/swapfile: skip HugeTLB pages for unuse_vma selftests: mm: fix the incorrect usage() info of khugepaged MAINTAINERS: add Jann as memory mapping/VMA reviewer mm: swap: prevent possible data-race in __try_to_reclaim_swap mm: khugepaged: fix the incorrect statistics when collapsing large file folios MAINTAINERS: kasan, kcov: add bugzilla links mm: don't install PMD mappings when THPs are disabled by the hw/process/vma mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs Docs/damon/maintainer-profile: add missing '_' suffixes for external web links maple_tree: check for MA_STATE_BULK on setting wr_rebalance mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets() mm: remove unused stub for can_swapin_thp() mailmap: add an entry for Andy Chiu MAINTAINERS: add memory mapping/VMA co-maintainers fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization ...	2024-10-17 16:33:06 -07:00
Andrii Nakryiko	5ac9b4e935	lib/buildid: Handle memfd_secret() files in build_id_parse() >From memfd_secret(2) manpage: The memory areas backing the file created with memfd_secret(2) are visible only to the processes that have access to the file descriptor. The memory region is removed from the kernel page tables and only the page tables of the processes holding the file descriptor map the corresponding physical memory. (Thus, the pages in the region can't be accessed by the kernel itself, so that, for example, pointers to the region can't be passed to system calls.) We need to handle this special case gracefully in build ID fetching code. Return -EFAULT whenever secretmem file is passed to build_id_parse() family of APIs. Original report and repro can be found in [0]. [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/ Fixes: `de3ec364c3` ("lib/buildid: add single folio-based file reader abstraction") Reported-by: Yi Lai <yi1.lai@intel.com> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org	2024-10-17 21:30:32 +02:00
Linus Torvalds	6efbea77b3	arm64 fixes for -rc4 - Disable software tag-based KASAN when compiling with GCC, as functions are incorrectly instrumented leading to a crash early during boot. - Fix pkey configuration for kernel threads when POE is enabled. - Fix invalid memory accesses in uprobes when targetting load-literal instructions. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmcPrzQQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNIr6B/wN+o1xI7Fv/QdlaTuKYLvOOg/XTl6sbUDj YssxtjhpKuaFVG4zJHNsWvgUqO+YCM7m3F1L8LVPMF7l2xoKtRTIB1Ye315hTjYm dW5Te6xBMVKF8SVxE8sBbZobdokIW1JNPBrvGvHO3d5ujmofzwHU8RNMXuTUItRw z85Qy75FkEDTEbsWhS3VL5HOgEr+k0TYDRa8SXwKWVj7/rYna3tO39kIdS5dt9VX wDJbnxtWJMhiHmDnevFFhBkSZrips12P1Rb6HUSmhpUJh0Rk4TAZntSl2f/lr+jA PuboBbSG68UOCwAHoNmTcLdFhkiNaiyw4w2F7hk2A6aNRtme+bT0 =M/ug -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Disable software tag-based KASAN when compiling with GCC, as functions are incorrectly instrumented leading to a crash early during boot - Fix pkey configuration for kernel threads when POE is enabled - Fix invalid memory accesses in uprobes when targetting load-literal instructions * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: kasan: Disable Software Tag-Based KASAN with GCC Documentation/protection-keys: add AArch64 to documentation arm64: set POR_EL0 for kernel threads arm64: probes: Fix uprobes for big-endian kernels arm64: probes: Fix simulate_ldr*_literal() arm64: probes: Remove broken LDR (literal) uprobe support	2024-10-17 09:51:03 -07:00
Lorenzo Stoakes	bea07fd631	maple_tree: correct tree corruption on spanning store Patch series "maple_tree: correct tree corruption on spanning store", v3. There has been a nasty yet subtle maple tree corruption bug that appears to have been in existence since the inception of the algorithm. This bug seems far more likely to happen since commit `f8d112a4e6` ("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point at which reports started to be submitted concerning this bug. We were made definitely aware of the bug thanks to the kind efforts of Bert Karwatzki who helped enormously in my being able to track this down and identify the cause of it. The bug arises when an attempt is made to perform a spanning store across two leaf nodes, where the right leaf node is the rightmost child of the shared parent, AND the store completely consumes the right-mode node. This results in mas_wr_spanning_store() mitakenly duplicating the new and existing entries at the maximum pivot within the range, and thus maple tree corruption. The fix patch corrects this by detecting this scenario and disallowing the mistaken duplicate copy. The fix patch commit message goes into great detail as to how this occurs. This series also includes a test which reliably reproduces the issue, and asserts that the fix works correctly. Bert has kindly tested the fix and confirmed it resolved his issues. Also Mikhail Gavrilov kindly reported what appears to be precisely the same bug, which this fix should also resolve. This patch (of 2): There has been a subtle bug present in the maple tree implementation from its inception. This arises from how stores are performed - when a store occurs, it will overwrite overlapping ranges and adjust the tree as necessary to accommodate this. A range may always ultimately span two leaf nodes. In this instance we walk the two leaf nodes, determine which elements are not overwritten to the left and to the right of the start and end of the ranges respectively and then rebalance the tree to contain these entries and the newly inserted one. This kind of store is dubbed a 'spanning store' and is implemented by mas_wr_spanning_store(). In order to reach this stage, mas_store_gfp() invokes mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to walk the tree and update the object (mas) to traverse to the location where the write should be performed, determining its store type. When a spanning store is required, this function returns false stopping at the parent node which contains the target range, and mas_wr_store_type() marks the mas->store_type as wr_spanning_store to denote this fact. When we go to perform the store in mas_wr_spanning_store(), we first determine the elements AFTER the END of the range we wish to store (that is, to the right of the entry to be inserted) - we do this by walking to the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we have just determined contains the range over which we intend to write. We then turn our attention to the entries to the left of the entry we are inserting, whose state is represented by l_mas, and copy these into a 'big node', which is a special node which contains enough slots to contain two leaf node's worth of data. We then copy the entry we wish to store immediately after this - the copy and the insertion of the new entry is performed by mas_store_b_node(). After this we copy the elements to the right of the end of the range which we are inserting, if we have not exceeded the length of the node (i.e. r_mas.offset <= r_mas.end). Herein lies the bug - under very specific circumstances, this logic can break and corrupt the maple tree. Consider the following tree: Height 0 Root Node / \ pivot = 0xffff / \ pivot = ULONG_MAX / \ 1 A [-----] ... / \ pivot = 0x4fff / \ pivot = 0xffff / \ 2 (LEAVES) B [-----] [-----] C ^--- Last pivot 0xffff. Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note that all ranges expressed in maple tree code are inclusive): 1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then determines that this is a spanning store across nodes B and C. The mas state is set such that the current node from which we traverse further is node A. 2. In mas_wr_spanning_store() we try to find elements to the right of pivot 0xffff by searching for an index of 0x10000: - mas_wr_walk_index() invokes mas_wr_walk_descend() and mas_wr_node_walk() in turn. - mas_wr_node_walk() loops over entries in node A until EITHER it finds an entry whose pivot equals or exceeds 0x10000 OR it reaches the final entry. - Since no entry has a pivot equal to or exceeding 0x10000, pivot 0xffff is selected, leading to node C. - mas_wr_walk_traverse() resets the mas state to traverse node C. We loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk() in turn once again. - Again, we reach the last entry in node C, which has a pivot of 0xffff. 3. We then copy the elements to the left of 0x4000 in node B to the big node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry too. 4. We determine whether we have any entries to copy from the right of the end of the range via - and with r_mas set up at the entry at pivot 0xffff, r_mas.offset <= r_mas.end, and then we DUPLICATE the entry at pivot 0xffff. 5. BUG! The maple tree is corrupted with a duplicate entry. This requires a very specific set of circumstances - we must be spanning the last element in a leaf node, which is the last element in the parent node. spanning store across two leaf nodes with a range that ends at that shared pivot. A potential solution to this problem would simply be to reset the walk each time we traverse r_mas, however given the rarity of this situation it seems that would be rather inefficient. Instead, this patch detects if the right hand node is populated, i.e. has anything we need to copy. We do so by only copying elements from the right of the entry being inserted when the maximum value present exceeds the last, rather than basing this on offset position. The patch also updates some comments and eliminates the unused bool return value in mas_wr_walk_index(). The work performed in commit `f8d112a4e6` ("mm/mmap: avoid zeroing vma tree in mmap_region()") seems to have made the probability of this event much more likely, which is the point at which reports started to be submitted concerning this bug. The motivation for this change arose from Bert Karwatzki's report of encountering mm instability after the release of kernel v6.12-rc1 which, after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration options, was identified as maple tree corruption. After Bert very generously provided his time and ability to reproduce this event consistently, I was able to finally identify that the issue discussed in this commit message was occurring for him. Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com Fixes: `54a611b605` ("Maple Tree: add new data structure") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Bert Karwatzki <spasswolf@web.de> Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/ Tested-by: Bert Karwatzki <spasswolf@web.de> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/ Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-10-17 08:35:10 -07:00
Sidhartha Kumar	a6e0ceb7bf	maple_tree: check for MA_STATE_BULK on setting wr_rebalance It is possible for a bulk operation (MA_STATE_BULK is set) to enter the new_end < mt_min_slots[type] case and set wr_rebalance as a store type. This is incorrect as bulk stores do not rebalance per write, but rather after the all of the writes are done through the mas_bulk_rebalance() path. Therefore, add a check to make sure MA_STATE_BULK is not set before we return wr_rebalance as the store type. Also add a test to make sure wr_rebalance is never the store type when doing bulk operations via mas_expected_entries() This is a hotfix for this rc however it has no userspace effects as there are no users of the bulk insertion mode. Link: https://lkml.kernel.org/r/20241011214451.7286-1-sidhartha.kumar@oracle.com Fixes: `5d659bbb52` ("maple_tree: introduce mas_wr_store_type()") Suggested-by: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Sidhartha <sidhartha.kumar@oracle.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-10-17 00:28:09 -07:00
Florian Westphal	dc783ba4b9	lib: alloc_tag_module_unload must wait for pending kfree_rcu calls Ben Greear reports following splat: ------------[ cut here ]------------ net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat ... Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 codetag_unload_module+0x19b/0x2a0 ? codetag_load_module+0x80/0x80 nf_nat module exit calls kfree_rcu on those addresses, but the free operation is likely still pending by the time alloc_tag checks for leaks. Wait for outstanding kfree_rcu operations to complete before checking resolves this warning. Reproducer: unshare -n iptables-nft -t nat -A PREROUTING -p tcp grep nf_nat /proc/allocinfo # will list 4 allocations rmmod nft_chain_nat rmmod nf_nat # will WARN. [akpm@linux-foundation.org: add comment] Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de Fixes: `a473573964` ("lib: code tagging module support") Signed-off-by: Florian Westphal <fw@strlen.de> Reported-by: Ben Greear <greearb@candelatech.com> Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candelatech.com/ Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-10-17 00:28:07 -07:00
Qianqiang Liu	cd843399d7	crypto: lib/mpi - Fix an "Uninitialized scalar variable" issue The "err" variable may be returned without an initialized value. Fixes: `8e3a67f2de` ("crypto: lib/mpi - Add error checks to extension") Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2024-10-16 13:38:16 +08:00
Thomas Gleixner	ff8d523cc4	debugobjects: Track object usage to avoid premature freeing of objects The freelist is freed at a constant rate independent of the actual usage requirements. That's bad in scenarios where usage comes in bursts. The end of a burst puts the objects on the free list and freeing proceeds even when the next burst which requires objects started again. Keep track of the usage with a exponentially wheighted moving average and take that into account in the worker function which frees objects from the free list. This further reduces the kmem_cache allocation/free rate for a full kernel compile: kmem_cache_alloc() kmem_cache_free() Baseline: 225k 173k Usage: 170k 117k Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/87bjznhme2.ffs@tglx	2024-10-15 17:30:33 +02:00
Thomas Gleixner	13f9ca7239	debugobjects: Refill per CPU pool more agressively Right now the per CPU pools are only refilled when they become empty. That's suboptimal especially when there are still non-freed objects in the to free list. Check whether an allocation from the per CPU pool emptied a batch and try to allocate from the free pool if that still has objects available. kmem_cache_alloc() kmem_cache_free() Baseline: 295k 245k Refill: 225k 173k Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164914.439053085@linutronix.de	2024-10-15 17:30:33 +02:00
Thomas Gleixner	a201a96b96	debugobjects: Double the per CPU slots In situations where objects are rapidly allocated from the pool and handed back, the size of the per CPU pool turns out to be too small. Double the size of the per CPU pool. This reduces the kmem cache allocation and free operations during a kernel compile: alloc free Baseline: 380k 330k Double size: 295k 245k Especially the reduction of allocations is important because that happens in the hot path when objects are initialized. The maximum increase in per CPU pool memory consumption is about 2.5K per online CPU, which is acceptable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164914.378676302@linutronix.de	2024-10-15 17:30:33 +02:00
Thomas Gleixner	2638345d22	debugobjects: Move pool statistics into global_pool struct Keep it along with the pool as that's a hot cache line anyway and it makes the code more comprehensible. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164914.318776207@linutronix.de	2024-10-15 17:30:33 +02:00
Thomas Gleixner	f57ebb92ba	debugobjects: Implement batch processing Adding and removing single objects in a loop is bad in terms of lock contention and cache line accesses. To implement batching, record the last object in a batch in the object itself. This is trivialy possible as hlists are strictly stacks. At a batch boundary, when the first object is added to the list the object stores a pointer to itself in debug_obj::batch_last. When the next object is added to the list then the batch_last pointer is retrieved from the first object in the list and stored in the to be added one. That means for batch processing the first object always has a pointer to the last object in a batch, which allows to move batches in a cache line efficient way and reduces the lock held time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164914.258995000@linutronix.de	2024-10-15 17:30:33 +02:00
Thomas Gleixner	aebbfe0779	debugobjects: Prepare kmem_cache allocations for batching Allocate a batch and then push it into the pool. Utilize the debug_obj::last_node pointer for keeping track of the batch boundary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241007164914.198647184@linutronix.de	2024-10-15 17:30:32 +02:00
Thomas Gleixner	74fe1ad413	debugobjects: Prepare for batching Move the debug_obj::object pointer into a union and add a pointer to the last node in a batch. That allows to implement batch processing efficiently by utilizing the stack property of hlist: When the first object of a batch is added to the list, then the batch pointer is set to the hlist node of the object itself. Any subsequent add retrieves the pointer to the last node from the first object in the list and uses that for storing the last node pointer in the newly added object. Add the pointer to the data structure and ensure that all relevant pool sizes are strictly batch sized. The actual batching implementation follows in subsequent changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164914.139204961@linutronix.de	2024-10-15 17:30:32 +02:00
Thomas Gleixner	14077b9e58	debugobjects: Use static key for boot pool selection Get rid of the conditional in the hot path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164914.077247071@linutronix.de	2024-10-15 17:30:32 +02:00
Thomas Gleixner	9ce99c6d7b	debugobjects: Rework free_object_work() Convert it to batch processing with intermediate helper functions. This reduces the final changes for batch processing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164914.015906394@linutronix.de	2024-10-15 17:30:32 +02:00
Thomas Gleixner	a3b9e191f5	debugobjects: Rework object freeing __free_object() is uncomprehensibly complex. The same can be achieved by: 1) Adding the object to the per CPU pool 2) If that pool is full, move a batch of objects into the global pool or if the global pool is full into the to free pool This also prepares for batch processing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164913.955542307@linutronix.de	2024-10-15 17:30:32 +02:00
Thomas Gleixner	fb60c004f3	debugobjects: Rework object allocation The current allocation scheme tries to allocate from the per CPU pool first. If that fails it allocates one object from the global pool and then refills the per CPU pool from the global pool. That is in the way of switching the pool management to batch mode as the global pool needs to be a strict stack of batches, which does not allow to allocate single objects. Rework the code to refill the per CPU pool first and then allocate the object from the refilled batch. Also try to allocate from the to free pool first to avoid freeing and reallocating objects. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164913.893554162@linutronix.de	2024-10-15 17:30:32 +02:00
Thomas Gleixner	96a9a0421c	debugobjects: Move min/max count into pool struct Having the accounting in the datastructure is better in terms of cache lines and allows more optimizations later on. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164913.831908427@linutronix.de	2024-10-15 17:30:32 +02:00
Thomas Gleixner	18b8afcb37	debugobjects: Rename and tidy up per CPU pools No point in having a separate data structure. Reuse struct obj_pool and tidy up the code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164913.770595795@linutronix.de	2024-10-15 17:30:31 +02:00
Thomas Gleixner	cb58d19084	debugobjects: Use separate list head for boot pool There is no point to handle the statically allocated objects during early boot in the actual pool list. This phase does not require accounting, so all of the related complexity can be avoided. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164913.708939081@linutronix.de	2024-10-15 17:30:31 +02:00
Thomas Gleixner	e18328ff70	debugobjects: Move pools into a datastructure The contention on the global pool lock can be reduced by strict batch processing where batches of objects are moved from one list head to another instead of moving them object by object. This also reduces the cache footprint because it avoids the list walk and dirties at maximum three cache lines instead of potentially up to eighteen. To prepare for that, move the hlist head and related counters into a struct. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164913.646171170@linutronix.de	2024-10-15 17:30:31 +02:00
Zhen Lei	d8c6cd3a5c	debugobjects: Reduce parallel pool fill attempts The contention on the global pool_lock can be massive when the global pool needs to be refilled and many CPUs try to handle this. Address this by: - splitting the refill from free list and allocation. Refill from free list has no constraints vs. the context on RT, so it can be tried outside of the RT specific preemptible() guard - Let only one CPU handle the free list - Let only one CPU do allocations unless the pool level is below half of the minimum fill level. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240911083521.2257-4-thunder.leizhen@huawei.com- Link: https://lore.kernel.org/all/20241007164913.582118421@linutronix.de -- lib/debugobjects.c \| 84 +++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 59 insertions(+), 25 deletions(-)	2024-10-15 17:30:31 +02:00
Thomas Gleixner	661cc28b52	debugobjects: Make debug_objects_enabled bool Make it what it is. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164913.518175013@linutronix.de	2024-10-15 17:30:31 +02:00
Thomas Gleixner	49a5cb827d	debugobjects: Provide and use free_object_list() Move the loop to free a list of objects into a helper function so it can be reused later. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241007164913.453912357@linutronix.de	2024-10-15 17:30:31 +02:00
Thomas Gleixner	241463f4fd	debugobjects: Remove pointless debug printk It has zero value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164913.390511021@linutronix.de	2024-10-15 17:30:31 +02:00
Thomas Gleixner	49968cf181	debugobjects: Reuse put_objects() on OOM Reuse the helper function instead of having a open coded copy. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164913.326834268@linutronix.de	2024-10-15 17:30:31 +02:00
Thomas Gleixner	a2a702383e	debugobjects: Dont free objects directly on CPU hotplug Freeing the per CPU pool of the unplugged CPU directly is suboptimal as the objects can be reused in the real pool if there is room. Aside of that this gets the accounting wrong. Use the regular free path, which allows reuse and has the accounting correct. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164913.263960570@linutronix.de	2024-10-15 17:30:30 +02:00
Thomas Gleixner	3f397bf955	debugobjects: Remove pointless hlist initialization It's BSS zero initialized. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/all/20241007164913.200379308@linutronix.de	2024-10-15 17:30:30 +02:00
Thomas Gleixner	55fb412ef7	debugobjects: Dont destroy kmem cache in init() debug_objects_mem_init() is invoked from mm_core_init() before work queues are available. If debug_objects_mem_init() destroys the kmem cache in the error path it causes an Oops in __queue_work(): Oops: Oops: 0000 [#1] PREEMPT SMP PTI RIP: 0010:__queue_work+0x35/0x6a0 queue_work_on+0x66/0x70 flush_all_cpus_locked+0xdf/0x1a0 __kmem_cache_shutdown+0x2f/0x340 kmem_cache_destroy+0x4e/0x150 mm_core_init+0x9e/0x120 start_kernel+0x298/0x800 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xc5/0xe0 common_startup_64+0x12c/0x138 Further the object cache pointer is used in various places to check for early boot operation. It is exposed before the replacments for the static boot time objects are allocated and the self test operates on it. This can be avoided by: 1) Running the self test with the static boot objects 2) Exposing it only after the replacement objects have been added to the pool. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241007164913.137021337@linutronix.de	2024-10-15 17:30:30 +02:00
Zhen Lei	813fd07858	debugobjects: Collect newly allocated objects in a list to reduce lock contention Collect the newly allocated debug objects in a list outside the lock, so that the lock held time and the potential lock contention is reduced. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240911083521.2257-3-thunder.leizhen@huawei.com Link: https://lore.kernel.org/all/20241007164913.073653668@linutronix.de	2024-10-15 17:30:30 +02:00
Zhen Lei	a0ae950408	debugobjects: Delete a piece of redundant code The statically allocated objects are all located in obj_static_pool[], the whole memory of obj_static_pool[] will be reclaimed later. Therefore, there is no need to split the remaining statically nodes in list obj_pool into isolated ones, no one will use them anymore. Just write INIT_HLIST_HEAD(&obj_pool) is enough. Since hlist_move_list() directly discards the old list, even this can be omitted. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240911083521.2257-2-thunder.leizhen@huawei.com Link: https://lore.kernel.org/all/20241007164913.009849239@linutronix.de	2024-10-15 17:30:30 +02:00
Will Deacon	7aed6a2c51	kasan: Disable Software Tag-Based KASAN with GCC Syzbot reports a KASAN failure early during boot on arm64 when building with GCC 12.2.0 and using the Software Tag-Based KASAN mode: \| BUG: KASAN: invalid-access in smp_build_mpidr_hash arch/arm64/kernel/setup.c:133 [inline] \| BUG: KASAN: invalid-access in setup_arch+0x984/0xd60 arch/arm64/kernel/setup.c:356 \| Write of size 4 at addr 03ff800086867e00 by task swapper/0 \| Pointer tag: [03], memory tag: [fe] Initial triage indicates that the report is a false positive and a thorough investigation of the crash by Mark Rutland revealed the root cause to be a bug in GCC: > When GCC is passed `-fsanitize=hwaddress` or > `-fsanitize=kernel-hwaddress` it ignores > `__attribute__((no_sanitize_address))`, and instruments functions > we require are not instrumented. > > [...] > > All versions [of GCC] I tried were broken, from 11.3.0 to 14.2.0 > inclusive. > > I think we have to disable KASAN_SW_TAGS with GCC until this is > fixed Disable Software Tag-Based KASAN when building with GCC by making CC_HAS_KASAN_SW_TAGS depend on !CC_IS_GCC. Cc: Andrey Konovalov <andreyknvl@gmail.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Reported-by: syzbot+908886656a02769af987@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000f362e80620e27859@google.com Link: https://lore.kernel.org/r/ZvFGwKfoC4yVjN_X@J2N7QTR9R3 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218854 Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20241014161100.18034-1-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2024-10-15 11:38:10 +01:00

1 2 3 4 5 ...

9522 Commits