Commit Graph

1111 Commits

Author SHA1 Message Date
Linus Torvalds
8fc3b8f082 hardening fixes for v6.5-rc1
- Check for NULL bdev in LoadPin (Matthias Kaehlcke)
 
 - Revert unwanted KUnit FORTIFY build default
 
 - Fix 1-element array causing boot warnings with xhci-hub
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmSoVSsWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjyuD/9Sgr+T3VJyROJdKouYO8tLUqaO
 g0A6+WE0L7XyO4ZYk4FOadeihsVEPuhB0fpDTwriKCKdPB35+Nhq8YfWPPQcGdjQ
 0IAT5AjsjYDDFGABRtsNRcL+KXyR+QRVUnSllEsZuwb3lyq6HRbdTF2QBjToAbyO
 QOgEnFJNqPp2w9y2KSzpMuYL4I9o1WbyM+huVSfoKe/3d2WnVKiARMpV+0EJgUAy
 BvORp55+c1w77IRbQduACWszdCLXfkQyI+p5ii3M7cZmePDe4q8LHN01WtIMEnHy
 cln7AnwU4daxzfdeAWIQMLFjOXTLHlkRhC18KSobeBc5Zkudtcg5LxtFGiDsDgOU
 mUWB/Ow8rgr6KlYkMFmFrW/GAVX12KbPXDATECa/4Yhl55Ydl/1bChJWWnX2pppU
 mRRnwIcY7MfhRLeB284Gst81wOHy408arJsm/vck5kdya0Ys1y38rgNQm7iKfXVu
 FYMrDU9qqGmeIVk2namjQYoWH5ei670PXndtrcvSffeZOhpzk2FnFphtraPe0mrl
 l1lcUonZwEoTJ4wDiOR9cjSphoDVom9LgwygQVb4KGHBjuCfRABDV2DGy9duBMtv
 Akcet48VkCX6wF91+30fFmTs5haRiF/5kkx5fGuxhFlQO8QHYVjIOH55VqhAt3mw
 d0OWiZaNRvbNfjPSkQ==
 =R3uK
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - Check for NULL bdev in LoadPin (Matthias Kaehlcke)

 - Revert unwanted KUnit FORTIFY build default

 - Fix 1-element array causing boot warnings with xhci-hub

* tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  usb: ch9: Replace bmSublinkSpeedAttr 1-element array with flexible array
  Revert "fortify: Allow KUnit test to build without FORTIFY"
  dm: verity-loadpin: Add NULL pointer check for 'bdev' parameter
2023-07-08 12:08:39 -07:00
Kees Cook
5e2956ee46 Revert "fortify: Allow KUnit test to build without FORTIFY"
This reverts commit a9dc8d0442.

The standard for KUnit is to not build tests at all when required
functionality is missing, rather than doing test "skip". Restore this
for the fortify tests, so that architectures without
CONFIG_ARCH_HAS_FORTIFY_SOURCE do not emit unsolvable warnings.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/all/CAMuHMdUrxOEroHVUt7-mAnKSBjY=a-D3jr+XiAifuwv06Ob9Pw@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-07-03 12:50:26 -07:00
Linus Torvalds
77b1a7f7a0 - Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in
top-level directories.
 
 - Douglas Anderson has added a new "buddy" mode to the hardlockup
   detector.  It permits the detector to work on architectures which
   cannot provide the required interrupts, by having CPUs periodically
   perform checks on other CPUs.
 
 - Zhen Lei has enhanced kexec's ability to support two crash regions.
 
 - Petr Mladek has done a lot of cleanup on the hard lockup detector's
   Kconfig entries.
 
 - And the usual bunch of singleton patches in various places.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJelTAAKCRDdBJ7gKXxA
 juDkAP0VXWynzkXoojdS/8e/hhi+htedmQ3v2dLZD+vBrctLhAEA7rcH58zAVoWa
 2ejqO6wDrRGUC7JQcO9VEjT0nv73UwU=
 =F293
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-mm updates from Andrew Morton:

 - Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in top-level
   directories

 - Douglas Anderson has added a new "buddy" mode to the hardlockup
   detector. It permits the detector to work on architectures which
   cannot provide the required interrupts, by having CPUs periodically
   perform checks on other CPUs

 - Zhen Lei has enhanced kexec's ability to support two crash regions

 - Petr Mladek has done a lot of cleanup on the hard lockup detector's
   Kconfig entries

 - And the usual bunch of singleton patches in various places

* tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
  kernel/time/posix-stubs.c: remove duplicated include
  ocfs2: remove redundant assignment to variable bit_off
  watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY
  powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h
  devres: show which resource was invalid in __devm_ioremap_resource()
  watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH
  watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
  watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
  watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h
  watchdog/hardlockup: make the config checks more straightforward
  watchdog/hardlockup: sort hardlockup detector related config values a logical way
  watchdog/hardlockup: move SMP barriers from common code to buddy code
  watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
  watchdog/buddy: don't copy the cpumask in watchdog_next_cpu()
  watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called
  watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
  watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()
  watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
  watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
  watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails
  ...
2023-06-28 10:59:38 -07:00
Linus Torvalds
6e17c6de3d - Yosry Ahmed brought back some cgroup v1 stats in OOM logs.
- Yosry has also eliminated cgroup's atomic rstat flushing.
 
 - Nhat Pham adds the new cachestat() syscall.  It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability.
 
 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning.
 
 - Lorenzo Stoakes has done some maintanance work on the get_user_pages()
   interface.
 
 - Liam Howlett continues with cleanups and maintenance work to the maple
   tree code.  Peng Zhang also does some work on maple tree.
 
 - Johannes Weiner has done some cleanup work on the compaction code.
 
 - David Hildenbrand has contributed additional selftests for
   get_user_pages().
 
 - Thomas Gleixner has contributed some maintenance and optimization work
   for the vmalloc code.
 
 - Baolin Wang has provided some compaction cleanups,
 
 - SeongJae Park continues maintenance work on the DAMON code.
 
 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting.
 
 - Christoph Hellwig has some cleanups for the filemap/directio code.
 
 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the provided
   APIs rather than open-coding accesses.
 
 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings.
 
 - John Hubbard has a series of fixes to the MM selftesting code.
 
 - ZhangPeng continues the folio conversion campaign.
 
 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock.
 
 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from
   128 to 8.
 
 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management.
 
 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code.
 
 - Vishal Moola also has done some folio conversion work.
 
 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJejewAKCRDdBJ7gKXxA
 joggAPwKMfT9lvDBEUnJagY7dbDPky1cSYZdJKxxM2cApGa42gEA6Cl8HRAWqSOh
 J0qXCzqaaN8+BuEyLGDVPaXur9KirwY=
 =B7yQ
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:

 - Yosry Ahmed brought back some cgroup v1 stats in OOM logs

 - Yosry has also eliminated cgroup's atomic rstat flushing

 - Nhat Pham adds the new cachestat() syscall. It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability

 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning

 - Lorenzo Stoakes has done some maintanance work on the
   get_user_pages() interface

 - Liam Howlett continues with cleanups and maintenance work to the
   maple tree code. Peng Zhang also does some work on maple tree

 - Johannes Weiner has done some cleanup work on the compaction code

 - David Hildenbrand has contributed additional selftests for
   get_user_pages()

 - Thomas Gleixner has contributed some maintenance and optimization
   work for the vmalloc code

 - Baolin Wang has provided some compaction cleanups,

 - SeongJae Park continues maintenance work on the DAMON code

 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting

 - Christoph Hellwig has some cleanups for the filemap/directio code

 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the
   provided APIs rather than open-coding accesses

 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings

 - John Hubbard has a series of fixes to the MM selftesting code

 - ZhangPeng continues the folio conversion campaign

 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock

 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
   from 128 to 8

 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management

 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code

 - Vishal Moola also has done some folio conversion work

 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch

* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
  mm/hugetlb: remove hugetlb_set_page_subpool()
  mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
  hugetlb: revert use of page_cache_next_miss()
  Revert "page cache: fix page_cache_next/prev_miss off by one"
  mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
  mm: memcg: rename and document global_reclaim()
  mm: kill [add|del]_page_to_lru_list()
  mm: compaction: convert to use a folio in isolate_migratepages_block()
  mm: zswap: fix double invalidate with exclusive loads
  mm: remove unnecessary pagevec includes
  mm: remove references to pagevec
  mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
  mm: remove struct pagevec
  net: convert sunrpc from pagevec to folio_batch
  i915: convert i915_gpu_error to use a folio_batch
  pagevec: rename fbatch_count()
  mm: remove check_move_unevictable_pages()
  drm: convert drm_gem_put_pages() to use a folio_batch
  i915: convert shmem_sg_free_table() to use a folio_batch
  scatterlist: add sg_set_folio()
  ...
2023-06-28 10:28:11 -07:00
Linus Torvalds
582c161cf3 hardening updates for v6.5-rc1
- Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)
 
 - Convert strreplace() to return string start (Andy Shevchenko)
 
 - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)
 
 - Add missing function prototypes seen with W=1 (Arnd Bergmann)
 
 - Fix strscpy() kerndoc typo (Arne Welzel)
 
 - Replace strlcpy() with strscpy() across many subsystems which were
   either Acked by respective maintainers or were trivial changes that
   went ignored for multiple weeks (Azeem Shaikh)
 
 - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)
 
 - Add KUnit tests for strcat()-family
 
 - Enable KUnit tests of FORTIFY wrappers under UML
 
 - Add more complete FORTIFY protections for strlcat()
 
 - Add missed disabling of FORTIFY for all arch purgatories.
 
 - Enable -fstrict-flex-arrays=3 globally
 
 - Tightening UBSAN_BOUNDS when using GCC
 
 - Improve checkpatch to check for strcpy, strncpy, and fake flex arrays
 
 - Improve use of const variables in FORTIFY
 
 - Add requested struct_size_t() helper for types not pointers
 
 - Add __counted_by macro for annotating flexible array size members
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmSbftQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJj0MD/9X9jzJzCmsAU+yNldeoAzC84Sk
 GVU3RBxGcTNysL1gZXynkIgigw7DWc4htMGeSABHHwQRVP65JCH1Kw/VqIkyumbx
 9LdX6IklMJb4pRT4PVU3azebV4eNmSjlur2UxMeW54Czm91/6I8RHbJOyAPnOUmo
 2oomGdP/hpEHtKR7hgy8Axc6w5ySwQixh2V5sVZG3VbvCS5WKTmTXbs6puuRT5hz
 iHt7v+7VtEg/Qf1W7J2oxfoghvVBsaRrSLrExWT/oZYh1ZxM7DsCAAoG/IsDgHGA
 9LBXiRECgAFThbHVxLvvKZQMXdVk0i8iXLX43XMKC0wTA+NTyH7wlcQQ4RWNMuo8
 sfA9Qm9gMArXaf64aymr3Uwn20Zan0391HdlbhOJZAE6v3PPJbleUnM58AzD2d3r
 5Lz6AIFBxDImy+3f9iDWgacCT5/PkeiXTHzk9QnKhJyKKtRA58XJxj4q2+rPnGJP
 n4haXqoxD5FJbxdXiGKk31RS0U5HBug7wkOcUrTqDHUbc/QNU2b7dxTKUx+zYtCU
 uV5emPzpF4H4z+91WpO47n9gkMAfwV0lt9S2dwS8pxsgqctbmIan+Jgip7rsqZ2G
 OgLXBsb43eEs+6WgO8tVt/ZHYj9ivGMdrcNcsIfikzNs/xweUJ53k2xSEn2xEa5J
 cwANDmkL6QQK7yfeeg==
 =s0j1
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "There are three areas of note:

  A bunch of strlcpy()->strscpy() conversions ended up living in my tree
  since they were either Acked by maintainers for me to carry, or got
  ignored for multiple weeks (and were trivial changes).

  The compiler option '-fstrict-flex-arrays=3' has been enabled
  globally, and has been in -next for the entire devel cycle. This
  changes compiler diagnostics (though mainly just -Warray-bounds which
  is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_
  coverage. In other words, there are no new restrictions, just
  potentially new warnings. Any new FORTIFY warnings we've seen have
  been fixed (usually in their respective subsystem trees). For more
  details, see commit df8fc4e934.

  The under-development compiler attribute __counted_by has been added
  so that we can start annotating flexible array members with their
  associated structure member that tracks the count of flexible array
  elements at run-time. It is possible (likely?) that the exact syntax
  of the attribute will change before it is finalized, but GCC and Clang
  are working together to sort it out. Any changes can be made to the
  macro while we continue to add annotations.

  As an example of that last case, I have a treewide commit waiting with
  such annotations found via Coccinelle:

    https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b

  Also see commit dd06e72e68 for more details.

  Summary:

   - Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)

   - Convert strreplace() to return string start (Andy Shevchenko)

   - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)

   - Add missing function prototypes seen with W=1 (Arnd Bergmann)

   - Fix strscpy() kerndoc typo (Arne Welzel)

   - Replace strlcpy() with strscpy() across many subsystems which were
     either Acked by respective maintainers or were trivial changes that
     went ignored for multiple weeks (Azeem Shaikh)

   - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)

   - Add KUnit tests for strcat()-family

   - Enable KUnit tests of FORTIFY wrappers under UML

   - Add more complete FORTIFY protections for strlcat()

   - Add missed disabling of FORTIFY for all arch purgatories.

   - Enable -fstrict-flex-arrays=3 globally

   - Tightening UBSAN_BOUNDS when using GCC

   - Improve checkpatch to check for strcpy, strncpy, and fake flex
     arrays

   - Improve use of const variables in FORTIFY

   - Add requested struct_size_t() helper for types not pointers

   - Add __counted_by macro for annotating flexible array size members"

* tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits)
  netfilter: ipset: Replace strlcpy with strscpy
  uml: Replace strlcpy with strscpy
  um: Use HOST_DIR for mrproper
  kallsyms: Replace all non-returning strlcpy with strscpy
  sh: Replace all non-returning strlcpy with strscpy
  of/flattree: Replace all non-returning strlcpy with strscpy
  sparc64: Replace all non-returning strlcpy with strscpy
  Hexagon: Replace all non-returning strlcpy with strscpy
  kobject: Use return value of strreplace()
  lib/string_helpers: Change returned value of the strreplace()
  jbd2: Avoid printing outside the boundary of the buffer
  checkpatch: Check for 0-length and 1-element arrays
  riscv/purgatory: Do not use fortified string functions
  s390/purgatory: Do not use fortified string functions
  x86/purgatory: Do not use fortified string functions
  acpi: Replace struct acpi_table_slit 1-element array with flex-array
  clocksource: Replace all non-returning strlcpy with strscpy
  string: use __builtin_memcpy() in strlcpy/strlcat
  staging: most: Replace all non-returning strlcpy with strscpy
  drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
  ...
2023-06-27 21:24:18 -07:00
Linus Torvalds
7ab044a4f4 workqueue: Changes for v6.5
* Concurrency-managed per-cpu work items that hog CPUs and delay the
   execution of other work items are now automatically detected and excluded
   from concurrency management. Reporting on such work items can also be
   enabled through a config option.
 
 * Added tools/workqueue/wq_monitor.py which improves visibility into
   workqueue usages and behaviors.
 
 * Includes Arnd's minimal fix for gcc-13 enum warning on 32bit compiles.
   This conflicts with afa4bb778e ("workqueue: clean up WORK_* constant
   types, clarify masking") in master. Can be resolved by picking the master
   version.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoGvw4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGZu0AP9IGK2opAzO9i3i1/Ys81b3sHi9PwrYWH3g252T
 Oe3O6QD/Wh0wYBVl0o7IdW6BGdd5iNwIEs420G53UmmPrATqsgQ=
 =TffY
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue updates from Tejun Heo:

 - Concurrency-managed per-cpu work items that hog CPUs and delay the
   execution of other work items are now automatically detected and
   excluded from concurrency management. Reporting on such work items
   can also be enabled through a config option.

 - Added tools/workqueue/wq_monitor.py which improves visibility into
   workqueue usages and behaviors.

 - Arnd's minimal fix for gcc-13 enum warning on 32bit compiles,
   superseded by commit afa4bb778e in mainline.

* tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0
  workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
  workqueue: fix enum type for gcc-13
  workqueue: Track and monitor per-workqueue CPU time usage
  workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
  workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
  workqueue: Improve locking rule description for worker fields
  workqueue: Move worker_set/clr_flags() upwards
  workqueue: Re-order struct worker fields
  workqueue: Add pwq->stats[] and a monitoring script
  Further upgrade queue_work_on() comment
2023-06-27 16:32:52 -07:00
Lukas Bulwahn
a8992d8ad7 watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY
Commit a5fcc2367e ("watchdog/hardlockup: make HAVE_NMI_WATCHDOG
sparc64-specific") accidentially introduces a typo in one of the config
dependencies of HARDLOCKUP_DETECTOR_PREFER_BUDDY.

Fix this accidental typo.

Link: https://lkml.kernel.org/r/20230623040717.8645-1-lukas.bulwahn@gmail.com
Fixes: a5fcc2367e ("watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-23 17:04:04 -07:00
Petr Mladek
7ca8fe94aa watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH
The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_ARCH without this prefix.
It will be set when it should be built. It will make it compatible
with the other hardlockup detectors.

The change allows to clean up dependencies of PPC_WATCHDOG
and HAVE_HARDLOCKUP_DETECTOR_PERF definitions for powerpc.

As a result HAVE_HARDLOCKUP_DETECTOR_PERF has the same dependencies
on arm, x86, powerpc architectures.

Link: https://lkml.kernel.org/r/20230616150618.6073-7-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:25:29 -07:00
Petr Mladek
47f4cb4339 watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix.
It will be set when it should be built. It will make it compatible
with the other hardlockup detectors.

Before, it is far from obvious that the SPARC64 variant is actually used:

$> make ARCH=sparc64 defconfig
$> grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y

After, it is more clear:

$> make ARCH=sparc64 defconfig
$> grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y
CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y

Link: https://lkml.kernel.org/r/20230616150618.6073-6-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:25:29 -07:00
Petr Mladek
a5fcc2367e watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
There are several hardlockup detector implementations and several Kconfig
values which allow selection and build of the preferred one.

CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c
("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36.
It was a preparation step for introducing the new generic perf hardlockup
detector.

The existing arch-specific variants did not support the to-be-created
generic build configurations, sysctl interface, etc. This distinction
was made explicit by the commit 4a7863cc2e ("x86, nmi_watchdog:
Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR")
in v2.6.38.

CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c69
("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced
the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used
by three architectures, namely blackfin, mn10300, and sparc.

The support for blackfin and mn10300 architectures has been completely
dropped some time ago. And sparc is the only architecture with the historic
NMI watchdog at the moment.

And the old sparc implementation is really special. It is always built on
sparc64. It used to be always enabled until the commit 7a5c8b57ce
("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added
in v4.10-rc1.

There are only few locations where the sparc64 NMI watchdog interacts
with the generic hardlockup detectors code:

  + implements arch_touch_nmi_watchdog() which is called from the generic
    touch_nmi_watchdog()

  + implements watchdog_hardlockup_enable()/disable() to support
    /proc/sys/kernel/nmi_watchdog

  + is always preferred over other generic watchdogs, see
    CONFIG_HARDLOCKUP_DETECTOR

  + includes asm/nmi.h into linux/nmi.h because some sparc-specific
    functions are needed in sparc-specific code which includes
    only linux/nmi.h.

The situation became more complicated after the commit 05a4a95279
("kernel/watchdog: split up config options") and commit 2104180a53
("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1.
They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc
specific hardlockup detector. It was compatible with the perf one
regarding the general boot, sysctl, and programming interfaces.

HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of
HAVE_NMI_WATCHDOG. It made some sense because all arch-specific
detectors had some common requirements, namely:

  + implemented arch_touch_nmi_watchdog()
  + included asm/nmi.h into linux/nmi.h
  + defined the default value for /proc/sys/kernel/nmi_watchdog

But it actually has made things pretty complicated when the generic
buddy hardlockup detector was added. Before the generic perf detector
was newer supported together with an arch-specific one. But the buddy
detector could work on any SMP system. It means that an architecture
could support both the arch-specific and buddy detector.

As a result, there are few tricky dependencies. For example,
CONFIG_HARDLOCKUP_DETECTOR depends on:

  ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH

The problem is that the very special sparc implementation is defined as:

  HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH

Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear
without reading understanding the history.

Make the logic less tricky and more self-explanatory by making
HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to
HAVE_HARDLOCKUP_DETECTOR_SPARC64.

Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF,
and HARDLOCKUP_DETECTOR_BUDDY may conflict only with
HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR
and it is not longer enabled when HAVE_NMI_WATCHDOG is set.

Link: https://lkml.kernel.org/r/20230616150618.6073-5-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:25:29 -07:00
Petr Mladek
1356d0b966 watchdog/hardlockup: make the config checks more straightforward
There are four possible variants of hardlockup detectors:

  + buddy: available when SMP is set.

  + perf: available when HAVE_HARDLOCKUP_DETECTOR_PERF is set.

  + arch-specific: available when HAVE_HARDLOCKUP_DETECTOR_ARCH is set.

  + sparc64 special variant: available when HAVE_NMI_WATCHDOG is set
	and HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.

The check for the sparc64 variant is more complicated because
HAVE_NMI_WATCHDOG is used to #ifdef code used by both arch-specific
and sparc64 specific variant. Therefore it is automatically
selected with HAVE_HARDLOCKUP_DETECTOR_ARCH.

This complexity is partly hidden in HAVE_HARDLOCKUP_DETECTOR_NON_ARCH.
It reduces the size of some checks but it makes them harder to follow.

Finally, the other temporary variable HARDLOCKUP_DETECTOR_NON_ARCH
is used to re-compute HARDLOCKUP_DETECTOR_PERF/BUDDY when the global
HARDLOCKUP_DETECTOR switch is enabled/disabled.

Make the logic more straightforward by the following changes:

  + Better explain the role of HAVE_HARDLOCKUP_DETECTOR_ARCH and
    HAVE_NMI_WATCHDOG in comments.

  + Add HAVE_HARDLOCKUP_DETECTOR_BUDDY so that there is separate
    HAVE_* for all four hardlockup detector variants.

    Use it in the other conditions instead of SMP. It makes it
    clear that it is about the buddy detector.

  + Open code HAVE_HARDLOCKUP_DETECTOR_NON_ARCH in HARDLOCKUP_DETECTOR
    and HARDLOCKUP_DETECTOR_PREFER_BUDDY. It helps to understand
    the conditions between the four hardlockup detector variants.

  + Define the exact conditions when HARDLOCKUP_DETECTOR_PERF/BUDDY
    can be enabled. It explains the dependency on the other
    hardlockup detector variants.

    Also it allows to remove HARDLOCKUP_DETECTOR_NON_ARCH by using "imply".
    It triggers re-evaluating HARDLOCKUP_DETECTOR_PERF/BUDDY when
    the global HARDLOCKUP_DETECTOR switch is changed.

  + Add dependency on HARDLOCKUP_DETECTOR so that the affected variables
    disappear when the hardlockup detectors are disabled.

    Another nice side effect is that HARDLOCKUP_DETECTOR_PREFER_BUDDY
    value is not preserved when the global switch is disabled.
    The user has to make the decision again when it gets re-enabled.

Link: https://lkml.kernel.org/r/20230616150618.6073-3-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:25:28 -07:00
Petr Mladek
4917a25f83 watchdog/hardlockup: sort hardlockup detector related config values a logical way
Patch series "watchdog/hardlockup: Cleanup configuration of hardlockup
detectors", v2.

Clean up watchdog Kconfig after introducing the buddy detector.


This patch (of 6):

There are four possible variants of hardlockup detectors:

  + buddy: available when SMP is set.

  + perf: available when HAVE_HARDLOCKUP_DETECTOR_PERF is set.

  + arch-specific: available when HAVE_HARDLOCKUP_DETECTOR_ARCH is set.

  + sparc64 special variant: available when HAVE_NMI_WATCHDOG is set
	and HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.

Only one hardlockup detector can be compiled in. The selection is done
using quite complex dependencies between several CONFIG variables.
The following patches will try to make it more straightforward.

As a first step, reorder the definitions of the various CONFIG variables.
The logical order is:

   1. HAVE_* variables define available variants. They are typically
      defined in the arch/ config files.

   2. HARDLOCKUP_DETECTOR y/n variable defines whether the hardlockup
      detector is enabled at all.

   3. HARDLOCKUP_DETECTOR_PREFER_BUDDY y/n variable defines whether
      the buddy detector should be preferred over the perf one.
      Note that the arch specific variants are always preferred when
      available.

   4. HARDLOCKUP_DETECTOR_PERF/BUDDY variables define whether the given
      detector is enabled in the end.

   5. HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and HARDLOCKUP_DETECTOR_NON_ARCH
      are temporary variables that are going to be removed in
      a followup patch.

This is a preparation step for further cleanup. It will change the logic
without shuffling the definitions.

This change temporary breaks the C-like ordering where the variables are
declared or defined before they are used. It is not really needed for
Kconfig. Also the following patches will rework the logic so that
the ordering will be C-like in the end.

The patch just shuffles the definitions. It should not change the existing
behavior.

Link: https://lkml.kernel.org/r/20230616150618.6073-1-pmladek@suse.com
Link: https://lkml.kernel.org/r/20230616150618.6073-2-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:25:28 -07:00
Douglas Anderson
7ece48b7b4 watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
The dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY was more complicated
than it needed to be.  If the "perf" detector is available and we have SMP
then we have a choice, so enable the config based on just those two config
items.

Link: https://lkml.kernel.org/r/20230526184139.8.I49d5b483336b65b8acb1e5066548a05260caf809@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:25:28 -07:00
Douglas Anderson
1f423c905a watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
Implement a hardlockup detector that doesn't doesn't need any extra
arch-specific support code to detect lockups.  Instead of using something
arch-specific we will use the buddy system, where each CPU watches out for
another one.  Specifically, each CPU will use its softlockup hrtimer to
check that the next CPU is processing hrtimer interrupts by verifying that
a counter is increasing.

NOTE: unlike the other hard lockup detectors, the buddy one can't easily
show what's happening on the CPU that locked up just by doing a simple
backtrace.  It relies on some other mechanism in the system to get
information about the locked up CPUs.  This could be support for NMI
backtraces like [1], it could be a mechanism for printing the PC of locked
CPUs at panic time like [2] / [3], or it could be something else.  Even
though that means we still rely on arch-specific code, this arch-specific
code seems to often be implemented even on architectures that don't have a
hardlockup detector.

This style of hardlockup detector originated in some downstream Android
trees and has been rebased on / carried in ChromeOS trees for quite a long
time for use on arm and arm64 boards.  Historically on these boards we've
leveraged mechanism [2] / [3] to get information about hung CPUs, but we
could move to [1].

Although the original motivation for the buddy system was for use on
systems without an arch-specific hardlockup detector, it can still be
useful to use even on systems that _do_ have an arch-specific hardlockup
detector.  On x86, for instance, there is a 24-part patch series [4] in
progress switching the arch-specific hard lockup detector from a scarce
perf counter to a less-scarce hardware resource.  Potentially the buddy
system could be a simpler alternative to free up the perf counter but
still get hard lockup detection.

Overall, pros (+) and cons (-) of the buddy system compared to an
arch-specific hardlockup detector (which might be implemented using
perf):
+ The buddy system is usable on systems that don't have an
  arch-specific hardlockup detector, like arm32 and arm64 (though it's
  being worked on for arm64 [5]).
+ The buddy system may free up scarce hardware resources.
+ If a CPU totally goes out to lunch (can't process NMIs) the buddy
  system could still detect the problem (though it would be unlikely
  to be able to get a stack trace).
+ The buddy system uses the same timer function to pet the hardlockup
  detector on the running CPU as it uses to detect hardlockups on
  other CPUs. Compared to other hardlockup detectors, this means it
  generates fewer interrupts and thus is likely better able to let
  CPUs stay idle longer.
- If all CPUs are hard locked up at the same time the buddy system
  can't detect it.
- If we don't have SMP we can't use the buddy system.
- The buddy system needs an arch-specific mechanism (possibly NMI
  backtrace) to get info about the locked up CPU.

[1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org
[2] https://issuetracker.google.com/172213129
[3] https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html
[4] https://lore.kernel.org/lkml/20230301234753.28582-1-ricardo.neri-calderon@linux.intel.com/
[5] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/

Link: https://lkml.kernel.org/r/20230519101840.v5.14.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Tzung-Bi Shih <tzungbi@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 17:44:21 -07:00
Liam R. Howlett
a5199577b1 maple_tree: make test code work without debug enabled
The test code is less useful without debug, but can still do general
validations.  Define mt_dump(), mas_dump() and mas_wr_dump() as a noop if
debug is not enabled and document it in the test module information that
more information can be obtained with another kernel config option.

MT_BUG_ON() will report a failures without tree dumps, and the output will
be less useful.

Link: https://lkml.kernel.org/r/20230518145544.1722059-17-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:31 -07:00
Noah Goldstein
688eb8191b x86/csum: Improve performance of csum_partial
1) Add special case for len == 40 as that is the hottest value. The
   nets a ~8-9% latency improvement and a ~30% throughput improvement
   in the len == 40 case.

2) Use multiple accumulators in the 64-byte loop. This dramatically
   improves ILP and results in up to a 40% latency/throughput
   improvement (better for more iterations).

Results from benchmarking on Icelake. Times measured with rdtsc()
 len   lat_new   lat_old      r    tput_new  tput_old      r
   8      3.58      3.47  1.032        3.58      3.51  1.021
  16      4.14      4.02  1.028        3.96      3.78  1.046
  24      4.99      5.03  0.992        4.23      4.03  1.050
  32      5.09      5.08  1.001        4.68      4.47  1.048
  40      5.57      6.08  0.916        3.05      4.43  0.690
  48      6.65      6.63  1.003        4.97      4.69  1.059
  56      7.74      7.72  1.003        5.22      4.95  1.055
  64      6.65      7.22  0.921        6.38      6.42  0.994
  96      9.43      9.96  0.946        7.46      7.54  0.990
 128      9.39     12.15  0.773        8.90      8.79  1.012
 200     12.65     18.08  0.699       11.63     11.60  1.002
 272     15.82     23.37  0.677       14.43     14.35  1.005
 440     24.12     36.43  0.662       21.57     22.69  0.951
 952     46.20     74.01  0.624       42.98     53.12  0.809
1024     47.12     78.24  0.602       46.36     58.83  0.788
1552     72.01    117.30  0.614       71.92     96.78  0.743
2048     93.07    153.25  0.607       93.28    137.20  0.680
2600    114.73    194.30  0.590      114.28    179.32  0.637
3608    156.34    268.41  0.582      154.97    254.02  0.610
4096    175.01    304.03  0.576      175.89    292.08  0.602

There is no such thing as a free lunch, however, and the special case
for len == 40 does add overhead to the len != 40 cases. This seems to
amount to be ~5% throughput and slightly less in terms of latency.

Testing:
Part of this change is a new kunit test. The tests check all
alignment X length pairs in [0, 64) X [0, 512).
There are three cases.
    1) Precomputed random inputs/seed. The expected results where
       generated use the generic implementation (which is assumed to be
       non-buggy).
    2) An input of all 1s. The goal of this test is to catch any case
       a carry is missing.
    3) An input that never carries. The goal of this test si to catch
       any case of incorrectly carrying.

More exhaustive tests that test all alignment X length pairs in
[0, 8192) X [0, 8192] on random data are also available here:
https://github.com/goldsteinn/csum-reproduction

The reposity also has the code for reproducing the above benchmark
numbers.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230511011002.935690-1-goldstein.w.n%40gmail.com
2023-05-25 10:55:18 -07:00
Tejun Heo
6363845005 workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
Workqueue now automatically marks per-cpu work items that hog CPU for too
long as CPU_INTENSIVE, which excludes them from concurrency management and
prevents stalling other concurrency-managed work items. If a work function
keeps running over the thershold, it likely needs to be switched to use an
unbound workqueue.

This patch adds a debug mechanism which tracks the work functions which
trigger the automatic CPU_INTENSIVE mechanism and report them using
pr_warn() with exponential backoff.

v3: Documentation update.

v2: Drop bouncing to kthread_worker for printing messages. It was to avoid
    introducing circular locking dependency through printk but not effective
    as it still had pool lock -> wci_lock -> printk -> pool lock loop. Let's
    just print directly using printk_deferred().

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
2023-05-17 17:02:08 -10:00
Kees Cook
3bf301e1ab string: Add Kunit tests for strcat() family
Add tests to make sure the strcat() family of functions behave
correctly.

Signed-off-by: Kees Cook <keescook@chromium.org>
2023-05-16 14:08:02 -07:00
Kees Cook
a9dc8d0442 fortify: Allow KUnit test to build without FORTIFY
In order for CI systems to notice all the skipped tests related to
CONFIG_FORTIFY_SOURCE, allow the FORTIFY_SOURCE KUnit tests to build
with or without CONFIG_FORTIFY_SOURCE.

Signed-off-by: Kees Cook <keescook@chromium.org>
2023-05-16 14:07:49 -07:00
Linus Torvalds
10de638d8e s390 updates for the 6.4 merge window
- Add support for stackleak feature. Also allow specifying
   architecture-specific stackleak poison function to enable faster
   implementation. On s390, the mvc-based implementation helps decrease
   typical overhead from a factor of 3 to just 25%
 
 - Convert all assembler files to use SYM* style macros, deprecating the
   ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS
 
 - Improve KASLR to also randomize module and special amode31 code
   base load addresses
 
 - Rework decompressor memory tracking to support memory holes and improve
   error handling
 
 - Add support for protected virtualization AP binding
 
 - Add support for set_direct_map() calls
 
 - Implement set_memory_rox() and noexec module_alloc()
 
 - Remove obsolete overriding of mem*() functions for KASAN
 
 - Rework kexec/kdump to avoid using nodat_stack to call purgatory
 
 - Convert the rest of the s390 code to use flexible-array member instead
   of a zero-length array
 
 - Clean up uaccess inline asm
 
 - Enable ARCH_HAS_MEMBARRIER_SYNC_CORE
 
 - Convert to using CONFIG_FUNCTION_ALIGNMENT and enable
   DEBUG_FORCE_FUNCTION_ALIGN_64B
 
 - Resolve last_break in userspace fault reports
 
 - Simplify one-level sysctl registration
 
 - Clean up branch prediction handling
 
 - Rework CPU counter facility to retrieve available counter sets just
   once
 
 - Other various small fixes and improvements all over the code
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmRM8pwACgkQjYWKoQLX
 FBjV1AgAlvAhu1XkwOdwqdT4GqE8pcN4XXzydog1MYihrSO2PdgWAxpEW7o2QURN
 W+3xa6RIqt7nX2YBiwTanMZ12TYaFY7noGl3eUpD/NhueprweVirVl7VZUEuRoW/
 j0mbx77xsVzLfuDFxkpVwE6/j+tTO78kLyjUHwcN9rFVUaL7/orJneDJf+V8fZG0
 sHLOv0aljF7Jr2IIkw82lCmW/vdk7k0dACWMXK2kj1H3dIK34B9X4AdKDDf/WKXk
 /OSElBeZ93tSGEfNDRIda6iR52xocROaRnQAaDtargKFl9VO0/dN9ADxO+SLNHjN
 pFE/9VD6xT/xo4IuZZh/Z3TcYfiLvA==
 =Geqx
 -----END PGP SIGNATURE-----

Merge tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Add support for stackleak feature. Also allow specifying
   architecture-specific stackleak poison function to enable faster
   implementation. On s390, the mvc-based implementation helps decrease
   typical overhead from a factor of 3 to just 25%

 - Convert all assembler files to use SYM* style macros, deprecating the
   ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS

 - Improve KASLR to also randomize module and special amode31 code base
   load addresses

 - Rework decompressor memory tracking to support memory holes and
   improve error handling

 - Add support for protected virtualization AP binding

 - Add support for set_direct_map() calls

 - Implement set_memory_rox() and noexec module_alloc()

 - Remove obsolete overriding of mem*() functions for KASAN

 - Rework kexec/kdump to avoid using nodat_stack to call purgatory

 - Convert the rest of the s390 code to use flexible-array member
   instead of a zero-length array

 - Clean up uaccess inline asm

 - Enable ARCH_HAS_MEMBARRIER_SYNC_CORE

 - Convert to using CONFIG_FUNCTION_ALIGNMENT and enable
   DEBUG_FORCE_FUNCTION_ALIGN_64B

 - Resolve last_break in userspace fault reports

 - Simplify one-level sysctl registration

 - Clean up branch prediction handling

 - Rework CPU counter facility to retrieve available counter sets just
   once

 - Other various small fixes and improvements all over the code

* tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (118 commits)
  s390/stackleak: provide fast __stackleak_poison() implementation
  stackleak: allow to specify arch specific stackleak poison function
  s390: select ARCH_USE_SYM_ANNOTATIONS
  s390/mm: use VM_FLUSH_RESET_PERMS in module_alloc()
  s390: wire up memfd_secret system call
  s390/mm: enable ARCH_HAS_SET_DIRECT_MAP
  s390/mm: use BIT macro to generate SET_MEMORY bit masks
  s390/relocate_kernel: adjust indentation
  s390/relocate_kernel: use SYM* macros instead of ENTRY(), etc.
  s390/entry: use SYM* macros instead of ENTRY(), etc.
  s390/purgatory: use SYM* macros instead of ENTRY(), etc.
  s390/kprobes: use SYM* macros instead of ENTRY(), etc.
  s390/reipl: use SYM* macros instead of ENTRY(), etc.
  s390/head64: use SYM* macros instead of ENTRY(), etc.
  s390/earlypgm: use SYM* macros instead of ENTRY(), etc.
  s390/mcount: use SYM* macros instead of ENTRY(), etc.
  s390/crc32le: use SYM* macros instead of ENTRY(), etc.
  s390/crc32be: use SYM* macros instead of ENTRY(), etc.
  s390/crypto,chacha: use SYM* macros instead of ENTRY(), etc.
  s390/amode31: use SYM* macros instead of ENTRY(), etc.
  ...
2023-04-30 11:43:31 -07:00
Linus Torvalds
f20730efbd SMP cross-CPU function-call updates for v6.4:
- Remove diagnostics and adjust config for CSD lock diagnostics
 
  - Add a generic IPI-sending tracepoint, as currently there's no easy
    way to instrument IPI origins: it's arch dependent and for some
    major architectures it's not even consistently available.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK438RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jJ5Q/5AZ0HGpyqwdFK8GmGznyu5qjP5HwV9pPq
 gZQScqSy4tZEeza4TFMi83CoXSg9uJ7GlYJqqQMKm78LGEPomnZtXXC7oWvTA9M5
 M/jAvzytmvZloSCXV6kK7jzSejMHhag97J/BjTYhZYQpJ9T+hNC87XO6J6COsKr9
 lPIYqkFrIkQNr6B0U11AQfFejRYP1ics2fnbnZL86G/zZAc6x8EveM3KgSer2iHl
 KbrO+xcYyGY8Ef9P2F72HhEGFfM3WslpT1yzqR3sm4Y+fuMG0oW3qOQuMJx0ZhxT
 AloterY0uo6gJwI0P9k/K4klWgz81Tf/zLb0eBAtY2uJV9Fo3YhPHuZC7jGPGAy3
 JusW2yNYqc8erHVEMAKDUsl/1KN4TE2uKlkZy98wno+KOoMufK5MA2e2kPPqXvUi
 Jk9RvFolnWUsexaPmCftti0OCv3YFiviVAJ/t0pchfmvvJA2da0VC9hzmEXpLJVF
 25nBTV/1uAOrWvOpCyo3ElrC2CkQVkFmK5rXMDdvf6ib0Nid4vFcCkCSLVfu+ePB
 11mi7QYro+CcnOug1K+yKogUDmsZgV/u1kUwgQzTIpZ05Kkb49gUiXw9L2RGcBJh
 yoDoiI66KPR7PWQ2qBdQoXug4zfEEtWG0O9HNLB0FFRC3hu7I+HHyiUkBWs9jasK
 PA5+V7HcQRk=
 =Wp7f
 -----END PGP SIGNATURE-----

Merge tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP cross-CPU function-call updates from Ingo Molnar:

 - Remove diagnostics and adjust config for CSD lock diagnostics

 - Add a generic IPI-sending tracepoint, as currently there's no easy
   way to instrument IPI origins: it's arch dependent and for some major
   architectures it's not even consistently available.

* tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  trace,smp: Trace all smp_function_call*() invocations
  trace: Add trace_ipi_send_cpu()
  sched, smp: Trace smp callback causing an IPI
  smp: reword smp call IPI comment
  treewide: Trace IPIs sent via smp_send_reschedule()
  irq_work: Trace self-IPIs sent via arch_irq_work_raise()
  smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
  sched, smp: Trace IPIs sent via send_call_function_single_ipi()
  trace: Add trace_ipi_send_cpumask()
  kernel/smp: Make csdlock_debug= resettable
  locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging
  locking/csd_lock: Remove added data from CSD lock debugging
  locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-28 15:03:43 -07:00
Linus Torvalds
7fa8a8ee94 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
 
 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
 
 - zsmalloc performance improvements from Sergey Senozhatsky.
 
 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.
 
 - VFS rationalizations from Christoph Hellwig:
 
   - removal of most of the callers of write_one_page().
 
   - make __filemap_get_folio()'s return value more useful
 
 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing.  Use `mount -o noswap'.
 
 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.
 
 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).
 
 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.
 
 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
   than exclusive, and has fixed a bunch of errors which were caused by its
   unintuitive meaning.
 
 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.
 
 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.
 
 - Cleanups to do_fault_around() by Lorenzo Stoakes.
 
 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.
 
 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.
 
 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.
 
 - Lorenzo has also contributed further cleanups of vma_merge().
 
 - Chaitanya Prakash provides some fixes to the mmap selftesting code.
 
 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.
 
 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.
 
 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.
 
 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.
 
 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.
 
 - Yosry Ahmed has improved the performance of memcg statistics flushing.
 
 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.
 
 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.
 
 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.
 
 - Pankaj Raghav has made page_endio() unneeded and has removed it.
 
 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.
 
 - Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
 
 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.
 
 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.
 
 - Peter Xu fixes a few issues with uffd-wp at fork() time.
 
 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
 jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
 =J+Dj
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-27 19:42:02 -07:00
Linus Torvalds
9dd6956b38 for-6.4/block-2023-04-21
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRCvcIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpk+JEACj01t7Xen2+Razagu3aTx9tmRGFnTNR3MY
 raFG6B1TADk1TgCWWa2C4Dj67SOispPLm8hbIcOxqB1UscDWCCwjmnr/debADFzW
 Ap6shv/IRwVGmDp+F7ocYas0ynwooOJg4WJTwkSKz2o4m4p3vzlwAKi4fLiSjbXp
 gJTrA7WEvDOVjzajlTFUtjr8rc6PdunbGm25cPIufAxUEhvttYex2VbVqjDmfNsE
 8tyyk9RWbe4AY/ZYaGXVn4yQ/CgL/sXFkVc5noRXNfAQ/K3CVLQrFLJ3JlwUHpiA
 xXBor21TUWCZEo33Y2G5NConAYqE7etoPTkaTDO3/aZ+dAMFyhC/WAYLz1KZGMh1
 +g1fDX1QKEd40H2lfDXvqF1ob7Ut8EzUx+gvBXcc3/AiRpJ5rjfOcj6LPUMUqQJk
 nucLLFTiMKecnDMBERbvixqbaTyrjvkFEj2wYJvgj1LKXAd+x/bj8SGajs9r88Nb
 9YT9ai/+Yl7Ppfb67rCgXJU7oNZQSAQ2H+X/l2jbiqImOgq1u/45AmINnbanS7HH
 Y1I8pbH45AcnCgkJRoQwrNX3BnTOTBJ+D/4Fl4b8jsihq0D3UtwCwPCObHP4LW9S
 MUNPhP3tUuYsAgXqX80+Sao6SYvXDwnbWOM+LOaaZXgjb1ndwDUZXpto8Ra8WB1u
 8kM6s6ZR7g==
 =W1Zb
 -----END PGP SIGNATURE-----

Merge tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - drbd patches, bringing us closer to unifying the out-of-tree version
   and the in tree one (Andreas, Christoph)

 - support for auto-quiesce for the s390 dasd driver (Stefan)

 - MD pull request via Song:
      - md/bitmap: Optimal last page size (Jon Derrick)
      - Various raid10 fixes (Yu Kuai, Li Nan)
      - md: add error_handlers for raid0 and linear (Mariusz Tkaczyk)

 - NVMe pull request via Christoph:
      - Drop redundant pci_enable_pcie_error_reporting (Bjorn Helgaas)
      - Validate nvmet module parameters (Chaitanya Kulkarni)
      - Fence TCP socket on receive error (Chris Leech)
      - Fix async event trace event (Keith Busch)
      - Minor cleanups (Chaitanya Kulkarni, zhenwei pi)
      - Fix and cleanup nvmet Identify handling (Damien Le Moal,
        Christoph Hellwig)
      - Fix double blk_mq_complete_request race in the timeout handler
        (Lei Yin)
      - Fix irq locking in nvme-fcloop (Ming Lei)
      - Remove queue mapping helper for rdma devices (Sagi Grimberg)

 - use structured request attribute checks for nbd (Jakub)

 - fix blk-crypto race conditions between keyslot management (Eric)

 - add sed-opal support for reading read locking range attributes
   (Ondrej)

 - make fault injection configurable for null_blk (Akinobu)

 - clean up the request insertion API (Christoph)

 - clean up the queue running API (Christoph)

 - blkg config helper cleanups (Tejun)

 - lazy init support for blk-iolatency (Tejun)

 - various fixes and tweaks to ublk (Ming)

 - remove hybrid polling. It hasn't really been useful since we got
   async polled IO support, and these days we don't support sync polled
   IO at all (Keith)

 - misc fixes, cleanups, improvements (Zhong, Ondrej, Colin, Chengming,
   Chaitanya, me)

* tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux: (118 commits)
  nbd: fix incomplete validation of ioctl arg
  ublk: don't return 0 in case of any failure
  sed-opal: geometry feature reporting command
  null_blk: Always check queue mode setting from configfs
  block: ublk: switch to ioctl command encoding
  blk-mq: fix the blk_mq_add_to_requeue_list call in blk_kick_flush
  block, bfq: Fix division by zero error on zero wsum
  fault-inject: fix build error when FAULT_INJECTION_CONFIGFS=y and CONFIGFS_FS=m
  block: store bdev->bd_disk->fops->submit_bio state in bdev
  block: re-arrange the struct block_device fields for better layout
  md/raid5: remove unused working_disks variable
  md/raid10: don't call bio_start_io_acct twice for bio which experienced read error
  md/raid10: fix memleak of md thread
  md/raid10: fix memleak for 'conf->bio_split'
  md/raid10: fix leak of 'r10bio->remaining' for recovery
  md/raid10: don't BUG_ON() in raise_barrier()
  md: fix soft lockup in status_resync
  md: add error_handlers for raid0 and linear
  md: Use optimal I/O size for last bitmap page
  md: Fix types in sb writer
  ...
2023-04-26 12:52:58 -07:00
Akinobu Mita
d325c16263 fault-inject: fix build error when FAULT_INJECTION_CONFIGFS=y and CONFIGFS_FS=m
This fixes a build error when CONFIG_FAULT_INJECTION_CONFIGFS=y and
CONFIG_CONFIGFS_FS=m.

Since the fault-injection library cannot built as a module, avoid building
configfs as a module.

Fixes: 4668c7a294 ("fault-inject: allow configuration via configfs")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304150025.K0hczLR4-lkp@intel.com/
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-16 13:01:42 -06:00
Akinobu Mita
4668c7a294 fault-inject: allow configuration via configfs
This provides a helper function to allow configuration of fault-injection
for configfs-based drivers.

The config items created by this function have the same interface as the
one created under debugfs by fault_create_debugfs_attr().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Link: https://lore.kernel.org/r/20230327143733.14599-2-akinobu.mita@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-13 07:38:54 -06:00
Nicholas Piggin
2655421ae6 lazy tlb: shoot lazies, non-refcounting lazy tlb mm reference handling scheme
On big systems, the mm refcount can become highly contented when doing a
lot of context switching with threaded applications.  user<->idle switch
is one of the important cases.  Abandoning lazy tlb entirely slows this
switching down quite a bit in the common uncontended case, so that is not
viable.

Implement a scheme where lazy tlb mm references do not contribute to the
refcount, instead they get explicitly removed when the refcount reaches
zero.

The final mmdrop() sends IPIs to all CPUs in the mm_cpumask and they
switch away from this mm to init_mm if it was being used as the lazy tlb
mm.  Enabling the shoot lazies option therefore requires that the arch
ensures that mm_cpumask contains all CPUs that could possibly be using mm.
A DEBUG_VM option IPIs every CPU in the system after this to ensure there
are no references remaining before the mm is freed.

Shootdown IPIs cost could be an issue, but they have not been observed to
be a serious problem with this scheme, because short-lived processes tend
not to migrate CPUs much, therefore they don't get much chance to leave
lazy tlb mm references on remote CPUs.  There are a lot of options to
reduce them if necessary, described in comments.

The near-worst-case can be benchmarked with will-it-scale:

  context_switch1_threads -t $(($(nproc) / 2))

This will create nproc threads (nproc / 2 switching pairs) all sharing the
same mm that spread over all CPUs so each CPU does thread->idle->thread
switching.

[ Rik came up with basically the same idea a few years ago, so credit
  to him for that. ]

Link: https://lore.kernel.org/linux-mm/20230118080011.2258375-1-npiggin@gmail.com/
Link: https://lore.kernel.org/all/20180728215357.3249-11-riel@surriel.com/
Link: https://lkml.kernel.org/r/20230203071837.1136453-5-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 16:20:08 -07:00
Tiezhu Yang
f478b9987c lib/Kconfig.debug: correct help info of LOCKDEP_STACK_TRACE_HASH_BITS
We can see the following definition in kernel/locking/lockdep_internals.h:

  #define STACK_TRACE_HASH_SIZE	(1 << CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS)

CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS is related with STACK_TRACE_HASH_SIZE
instead of MAX_STACK_TRACE_ENTRIES, fix it.

Link: https://lkml.kernel.org/r/1679380508-20830-1-git-send-email-yangtiezhu@loongson.cn
Fixes: 5dc33592e9 ("lockdep: Allow tuning tracing capacity constants.")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 15:24:31 -07:00
ye xingchen
35260cf545 Kconfig.debug: fix SCHED_DEBUG dependency
The path for SCHED_DEBUG is /sys/kernel/debug/sched.  So, SCHED_DEBUG
should depend on DEBUG_FS, not PROC_FS.

Link: https://lkml.kernel.org/r/202301291110098787982@zte.com.cn
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 15:24:31 -07:00
Paul E. McKenney
c521986016 locking/csd_lock: Add Kconfig option for csd_debug default
The csd_debug kernel parameter works well, but is inconvenient in cases
where it is more closely associated with boot loaders or automation than
with a particular kernel version or release.  Thererfore, provide a new
CSD_LOCK_WAIT_DEBUG_DEFAULT Kconfig option that defaults csd_debug to
1 when selected and 0 otherwise, with this latter being the default.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-1-paulmck@kernel.org
2023-03-24 11:01:25 +01:00
Heiko Carstens
322a7ce7a6 s390: enable DEBUG_FORCE_FUNCTION_ALIGN_64B
Allow to enforce 64 byte function alignment like it is possible for a
couple of other architectures. This may or may not be helpful for
debugging performance problems, as described with the Kconfig option.

Since the kernel works also with 64 byte function alignment there is
no reason for not allowing to enforce this function alignment.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:47 +01:00
Linus Torvalds
d2980d8d82 There is no particular theme here - mainly quick hits all over the tree.
Most notable is a set of zlib changes from Mikhail Zaslonko which enhances
 and fixes zlib's use of S390 hardware support: "lib/zlib: Set of s390
 DFLTCC related patches for kernel zlib".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/QC4QAKCRDdBJ7gKXxA
 jtKdAQCbDCBdY8H45d1fONzQW2UDqCPnOi77MpVUxGL33r+1SAEA807C7rvDEmlf
 yP1Ft+722fFU5jogVU8ZFh+vapv2/gI=
 =Q9YK
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "There is no particular theme here - mainly quick hits all over the
  tree.

  Most notable is a set of zlib changes from Mikhail Zaslonko which
  enhances and fixes zlib's use of S390 hardware support: 'lib/zlib: Set
  of s390 DFLTCC related patches for kernel zlib'"

* tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (55 commits)
  Update CREDITS file entry for Jesper Juhl
  sparc: allow PM configs for sparc32 COMPILE_TEST
  hung_task: print message when hung_task_warnings gets down to zero.
  arch/Kconfig: fix indentation
  scripts/tags.sh: fix the Kconfig tags generation when using latest ctags
  nilfs2: prevent WARNING in nilfs_dat_commit_end()
  lib/zlib: remove redundation assignement of avail_in dfltcc_gdht()
  lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default
  lib/zlib: DFLTCC always switch to software inflate for Z_PACKET_FLUSH option
  lib/zlib: DFLTCC support inflate with small window
  lib/zlib: Split deflate and inflate states for DFLTCC
  lib/zlib: DFLTCC not writing header bits when avail_out == 0
  lib/zlib: fix DFLTCC ignoring flush modes when avail_in == 0
  lib/zlib: fix DFLTCC not flushing EOBS when creating raw streams
  lib/zlib: implement switching between DFLTCC and software
  lib/zlib: adjust offset calculation for dfltcc_state
  nilfs2: replace WARN_ONs for invalid DAT metadata block requests
  scripts/spelling.txt: add "exsits" pattern and fix typo instances
  fs: gracefully handle ->get_block not mapping bh in __mpage_writepage
  cramfs: Kconfig: fix spelling & punctuation
  ...
2023-02-23 17:55:40 -08:00
Linus Torvalds
3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Linus Torvalds
89f1a2440a linux-kselftest-kunit-6.3-rc1
This KUnit update for Linux 6.3-rc1 consists of cleanups, new features,
 and documentation updates:
 
 -- adds Function Redirection API to isolate the code being tested from
    other parts of the kernel. functionredirection.rst has the details.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmP1c3AACgkQCwJExA0N
 Qxxbwg//TK0YlpQhoO2AgqSp3F8QlXeFKNdm5rHjBBVMYOQOl6rEB+4uznm2AOD9
 PZmQfAI+bcxMflSMDEBHEwbh6gLyZJKrsMsxuH2k/LQeWHAbuxHVq+/K4kqzhuhi
 QA4ZFKFqnHy+U7jCOGdMtrg9oyg7Glz00fq5pX2iz3FWsE/JpuDZ559RoB9zT9Pu
 VnZ+k42Svxkdmf8fXhSCH7C66k9fKkcQm7IGyVbnsWqmldCHpQ6kIjJVTeQSng4j
 tXkcys37I/d3/Ffz63rke7+WmJrQviL/gg3PqDmEEVxeX8T3GBT01uONTk+TqyWd
 GKudu1lfvuyylFMDoR/5gXr2hr5OJJTGjTfEtwWq7xM0NSiIFHS3/uEYZlE9g3+U
 z2/DKMWOHrzJ2G78dfi5fokFdMfGnz2hBCZa9czSxIbjafxLhjSgnt112mDvkJsZ
 leeVTB9x6g0b+VYwPKYa9gOmFQyZDGTTsJVT9iaAnhEvlxIRoqxZxzW/jFKgHV/r
 ZNRg/kcPfe7m6H15PEblFIuLC4LT/LtDxD8XvkKt42XnG2fuAPS20Jkv6/XB9Ew6
 3H1Su27TXIksUD/Z/ZPP9mBno7rwOLrZUa4QNzXqi6q2sbdXP5apg96cPDU0gvI5
 sq4zwLgHVuIQ8dfX/hgmqZ8VEcvSFDMINoS+SYGvKjxoTzvd+Sw=
 =PloE
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-kunit-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull KUnit update from Shuah Khan:

 - add Function Redirection API to isolate the code being tested from
   other parts of the kernel.

   Documentation/dev-tools/kunit/api/functionredirection.rst has the
   details.

* tag 'linux-kselftest-kunit-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: Add printf attribute to fail_current_test_impl
  lib/hashtable_test.c: add test for the hashtable structure
  Documentation: Add Function Redirection API docs
  kunit: Expose 'static stub' API to redirect functions
  kunit: Add "hooks" to call into KUnit when it's built as a module
  kunit: kunit.py extract handlers
  tools/testing/kunit/kunit.py: remove redundant double check
2023-02-23 09:40:14 -08:00
Linus Torvalds
525445efac NMI diagnostics for v6.3
Add diagnostics to the x86 NMI handler to help detect NMI-handler bugs
 on the one hand and failing hardware on the other.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmPq3x0THHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jOwED/41rLVFORQqNpK5mitA2acqVzRmUG9J
 sUHkJCPmHVr4sEDwqi2u+iBqHMqm8COaQOKA65tPsHJKI1PPcIjBG371QPPYsdRl
 +qNq6oLCrD37Dgs7CmJPjIO+0P2Xb765GUmcNhR9aH9QnYGz2a3s7QfXV2WlFjq3
 1LJ6Z8euEQBb5IE1syp3HHYf3IP4Z88gQxcU4kgV16uADnW0IKSw8F7p9B/EjSnB
 IjIh8gkAbfqNh0VXpex/wzPkrXRbjcOr1s43YkoYS1t3ggIZc6MEGs1kTmXAjxo2
 4S4CAPKfh4Btlez9VVIMwCDb56fHG6I5wyP+jH51dhNNKiuLqnSyHU3kWU3GFiYn
 5Ix7BKtAtp/AzASrL1xildOYjN6gB2QdQijs+bvqzH8Rm8Nl1Yy1z6p6iqcJGe/q
 cerzulajs+/UG2XKwRTWw6I3km4WkueEHYjEzmer+olK/Akx4COYiqMivdY5HIwK
 M7cFVQz2EiHLP3fu2LWrOkNi/Dy96Vsuya0n3E8Ch7Xtdjez7QPAdWU7bLXY/OTd
 jCRdd1MDPz87XQLSmbCV42nJzP5nNryBfijS7swqq9qL/D152ycctxOpIHa6/fJH
 pze5nqRBxjwlhOatJds5mVbhtKwF01YV4wzcaHypCmBXXTz1zVj/hFSbzuUuFwEI
 06c2sVNqzez4tw==
 =MPMw
 -----END PGP SIGNATURE-----

Merge tag 'nmi.2023.02.14a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull x86 NMI diagnostics from Paul McKenney:
 "Add diagnostics to the x86 NMI handler to help detect NMI-handler bugs
  on the one hand and failing hardware on the other"

* tag 'nmi.2023.02.14a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  x86/nmi: Print reasons why backtrace NMIs are ignored
  x86/nmi: Accumulate NMI-progress evidence in exc_nmi()
2023-02-23 09:28:37 -08:00
Linus Torvalds
70756b49be It has been a moderately calm cycle for documentation; the significant
changes include:
 
 - Some significant additions to the memory-management documentation
 
 - Some improvements to navigation in the HTML-rendered docs
 
 - More Spanish and Chinese translations
 
 ...and the usual set of typo fixes and such.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmPzkQUPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YC0QH/09u10xV3N+RuveNE/tArVxKcQi7JZd/xugQ
 toSXygh64WY10lzwi7Ms1bHZzpPYB0fOrqTGNqNQuhrVTjQzaZB0BBJqm8lwt2w/
 S/Z5wj+IicJTmQ7+0C2Hc/dcK5SCPfY3CgwqOUVdr3dEm1oU+4QaBy31fuIJJ0Hx
 NdbXBco8BZqJX9P67jwp9vbrFrSGBjPI0U4HNHVjrWlcBy8JT0aAnf0fyWFy3orA
 T86EzmEw8drA1mXsHa5pmVwuHDx2X+D+eRurG9llCBrlIG9EDSmnalY4BeGqR4LS
 oDrEH6M91I5+9iWoJ0rBheD8rPclXO2HpjXLApXzTjrORgEYZsM=
 =MCdX
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.3' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a moderately calm cycle for documentation; the significant
  changes include:

   - Some significant additions to the memory-management documentation

   - Some improvements to navigation in the HTML-rendered docs

   - More Spanish and Chinese translations

  ... and the usual set of typo fixes and such"

* tag 'docs-6.3' of git://git.lwn.net/linux: (68 commits)
  Documentation/watchdog/hpwdt: Fix Format
  Documentation/watchdog/hpwdt: Fix Reference
  Documentation: core-api: padata: correct spelling
  docs/mm: Physical Memory: correct spelling in reference to CONFIG_PAGE_EXTENSION
  docs: Use HTML comments for the kernel-toc SPDX line
  docs: Add more information to the HTML sidebar
  Documentation: KVM: Update AMD memory encryption link
  printk: Document that CONFIG_BOOT_PRINTK_DELAY required for boot_delay=
  Documentation: userspace-api: correct spelling
  Documentation: sparc: correct spelling
  Documentation: driver-api: correct spelling
  Documentation: admin-guide: correct spelling
  docs: add workload-tracing document to admin-guide
  docs/admin-guide/mm: remove useless markup
  docs/mm: remove useless markup
  docs/mm: Physical Memory: remove useless markup
  docs/sp_SP: Add process magic-number translation
  docs: ftrace: always use canonical ftrace path
  Doc/damon: fix the data path error
  dma-buf: Add "dma-buf" to title of documentation
  ...
2023-02-22 12:00:20 -08:00
Jakub Kicinski
8697a258ae Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/devlink/leftover.c / net/core/devlink.c:
  565b4824c3 ("devlink: change port event netdev notifier from per-net to global")
  f05bd8ebeb ("devlink: move code to a dedicated directory")
  687125b579 ("devlink: split out core code")
https://lore.kernel.org/all/20230208094657.379f2b1a@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09 12:25:40 -08:00
Rae Moar
789538c61f lib/hashtable_test.c: add test for the hashtable structure
Add a KUnit test for the kernel hashtable implementation in
include/linux/hashtable.h.

Note that this version does not yet test each of the rcu
alternative versions of functions.

Signed-off-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-02-08 14:28:17 -07:00
Linus Torvalds
0c272a1d33 25 hotfixes, mainly for MM. 13 are cc:stable.
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY9x+swAKCRDdBJ7gKXxA
 joPwAP95XqB7gzy2l1Mc++Ta7Ih0fS34Pj1vTAxwsRQnqzr6rwD/QOt3YU9KgXpy
 D7Fp8NnaQZq6m5o8cvV5+fBqA3uarAM=
 =IIB8
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "25 hotfixes, mainly for MM.  13 are cc:stable"

* tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits)
  mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()
  Kconfig.debug: fix the help description in SCHED_DEBUG
  mm/swapfile: add cond_resched() in get_swap_pages()
  mm: use stack_depot_early_init for kmemleak
  Squashfs: fix handling and sanity checking of xattr_ids count
  sh: define RUNTIME_DISCARD_EXIT
  highmem: round down the address passed to kunmap_flush_on_unmap()
  migrate: hugetlb: check for hugetlb shared PMD in node migration
  mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
  mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
  Revert "mm: kmemleak: alloc gray object for reserved region with direct map"
  freevxfs: Kconfig: fix spelling
  maple_tree: should get pivots boundary by type
  .mailmap: update e-mail address for Eugen Hristev
  mm, mremap: fix mremap() expanding for vma's with vm_ops->close()
  squashfs: harden sanity check in squashfs_read_xattr_id_table
  ia64: fix build error due to switch case label appearing next to declaration
  mm: multi-gen LRU: fix crash during cgroup migration
  Revert "mm: add nodes= arg to memory.reclaim"
  zsmalloc: fix a race with deferred_handles storing
  ...
2023-02-03 10:01:57 -08:00
Hyeonggon Yoo
cc6003916e lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default
In workloads where this_cpu operations are frequently performed,
enabling DEBUG_PREEMPT may result in significant increase in
runtime overhead due to frequent invocation of
__this_cpu_preempt_check() function.

This can be demonstrated through benchmarks such as hackbench where this
configuration results in a 10% reduction in performance, primarily due to
the added overhead within memcg charging path.

Therefore, do not to enable DEBUG_PREEMPT by default and make users aware
of its potential impact on performance in some workloads.

hackbench-process-sockets
		      debug_preempt	 no_debug_preempt
Amean     1       0.4743 (   0.00%)      0.4295 *   9.45%*
Amean     4       1.4191 (   0.00%)      1.2650 *  10.86%*
Amean     7       2.2677 (   0.00%)      2.0094 *  11.39%*
Amean     12      3.6821 (   0.00%)      3.2115 *  12.78%*
Amean     21      6.6752 (   0.00%)      5.7956 *  13.18%*
Amean     30      9.6646 (   0.00%)      8.5197 *  11.85%*
Amean     48     15.3363 (   0.00%)     13.5559 *  11.61%*
Amean     79     24.8603 (   0.00%)     22.0597 *  11.27%*
Amean     96     30.1240 (   0.00%)     26.8073 *  11.01%*

Link: https://lkml.kernel.org/r/20230121033942.350387-1-42.hyeyoo@gmail.com
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ben Segall <bsegall@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:50:10 -08:00
Geert Uytterhoeven
d5528cc168 lib: add Dhrystone benchmark test
When working on SoC bring-up, (a full) userspace may not be available,
making it hard to benchmark the CPU performance of the system under
development.  Still, one may want to have a rough idea of the (relative)
performance of one or more CPU cores, especially when working on e.g.  the
clock driver that controls the CPU core clock(s).

Hence make the classical Dhrystone 2.1 benchmark available as a Linux
kernel test module, based on[1].

When built-in, this benchmark can be run without any userspace present.

Parallel runs (run on multiple CPU cores) are supported, just kick the
"run" file multiple times.

Note that the actual figures depend on the configuration options that
control compiler optimization (e.g.  CONFIG_CC_OPTIMIZE_FOR_SIZE vs. 
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE), and on the compiler options used when
building the kernel in general.  Hence numbers may differ from those
obtained by running similar benchmarks in userspace.

[1] https://github.com/qris/dhrystone-deb.git

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lkml.kernel.org/r/4d07ad990740a5f1e426ce4566fb514f60ec9bdd.1670509558.git.geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
[geert+renesas@glider.be: fix uninitialized use of ret]
 Link: https://lkml.kernel.org/r/alpine.DEB.2.22.394.2212190857310.137329@ramsan.of.borg
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:50:01 -08:00
Zhaoyang Huang
b2db9ef2c0 mm: move KMEMLEAK's Kconfig items from lib to mm
Have the kmemleak's source code and Kconfig items be in the same directory.

Link: https://lkml.kernel.org/r/1674091345-14799-1-git-send-email-zhaoyang.huang@unisoc.com
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: ke.wang <ke.wang@unisoc.com>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:33:26 -08:00
Jakub Kicinski
82b4a9412b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/core/gro.c
  7d2c89b325 ("skb: Do mix page pool and page referenced frags in GRO")
  b1a78b9b98 ("net: add support for ipv4 big tcp")
https://lore.kernel.org/all/20230203094454.5766f160@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02 14:49:55 -08:00
ye xingchen
1e90e35b62 Kconfig.debug: fix the help description in SCHED_DEBUG
The correct file path for SCHED_DEBUG is /sys/kernel/debug/sched.

Link: https://lkml.kernel.org/r/202301291013573466558@zte.com.cn
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31 16:44:10 -08:00
Zhaoyang Huang
993f57e027 mm: use stack_depot_early_init for kmemleak
Mirsad report the below error which is caused by stack_depot_init()
failure in kvcalloc.  Solve this by having stackdepot use
stack_depot_early_init().

On 1/4/23 17:08, Mirsad Goran Todorovac wrote:
I hate to bring bad news again, but there seems to be a problem with the output of /sys/kernel/debug/kmemleak:

[root@pc-mtodorov ~]# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff951c118568b0 (size 16):
comm "kworker/u12:2", pid 56, jiffies 4294893952 (age 4356.548s)
hex dump (first 16 bytes):
    6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
    backtrace:
[root@pc-mtodorov ~]#

Apparently, backtrace of called functions on the stack is no longer
printed with the list of memory leaks.  This appeared on Lenovo desktop
10TX000VCR, with AlmaLinux 8.7 and BIOS version M22KT49A (11/10/2022) and
6.2-rc1 and 6.2-rc2 builds.  This worked on 6.1 with the same
CONFIG_KMEMLEAK=y and MGLRU enabled on a vanilla mainstream kernel from
Mr.  Torvalds' tree.  I don't know if this is deliberate feature for some
reason or a bug.  Please find attached the config, lshw and kmemleak
output.

[vbabka@suse.cz: remove stack_depot_init() call]
Link: https://lore.kernel.org/all/5272a819-ef74-65ff-be61-4d2d567337de@alu.unizg.hr/
Link: https://lkml.kernel.org/r/1674091345-14799-2-git-send-email-zhaoyang.huang@unisoc.com
Fixes: 56a61617dd ("mm: use stack_depot for recording kmemleak's backtrace")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: ke.wang <ke.wang@unisoc.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31 16:44:10 -08:00
Linus Torvalds
28cca23da7 hardening fixes for v6.2-rc6
- Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST
 
 - Reorganize gcc-plugin includes for GCC 13
 
 - Silence bcache memcpy run-time false positive warnings
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmPUHrIWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJnuoEAClEu/z51wnPPyYKoVAeYbxurqG
 Wlf7Ti7tMEsNN/kbSRXbJV3wMeY2RvY/+9iHa7zySJ4p9GzhH6pJXpS14CdSONjr
 c/ESUEVxJqq5C/ICfC7w1l4my7MwZ/IKMyvZOZET8IWlvTSpe8SW4Xvkaloapl08
 7GanyWB24wZKL8rqKv2D1vmVeE1SmPEhLncgknPs77jUmnH9AXDgN8bVxKzOZI+V
 fa5Mjzcg+ovI/i9e25+EX7GCJ6HxrBnPogB1UM46UmFk/TKY/20l0bJTHzqE8gdK
 iKYpcU+0kihG/JJEg3h95v3qBNJnbBks27t3am+YSbN45Nb4rhovCcEI7NzeHMSG
 grwYeWN7iAcL8gJuEtwFPieUsioi7iX3KlmL990XxviutDkcgGgmGK02m64fNngb
 0LpVue/r0nnYK7WEW1BcZ4Cd8bJfyXoZj0hN/awDkma3vHbsLuYQ6VXCPezOET7u
 uaDWCkgKrKgjRUJQZEqVs3nrEkrIjhODyH7Aa24EwoioGYuqcsVO5frGeu21vY+A
 t52S3WKWcuF5Hr8dDrYHMSA6ntWHwaYN1gaxPaObK9KD4aS4uV/SnJZLmTftZMU8
 P4fvBKYV8W2SU9Y4ppzaYRpaCVSH04rW1wyFjkCN7HtCBIGc5QFLW9v3F3QLe7lW
 1YucSUHg0S1plIKgGQ==
 =0orM
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST

 - Reorganize gcc-plugin includes for GCC 13

 - Silence bcache memcpy run-time false positive warnings

* tag 'hardening-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  bcache: Silence memcpy() run-time false positive warnings
  gcc-plugins: Reorganize gimple includes for GCC 13
  kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST
2023-01-27 16:09:12 -08:00
Kees Cook
4acf1de35f kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST
Since the long memcpy tests may stall a system for tens of seconds
in virtualized architecture environments, split those tests off under
CONFIG_MEMCPY_SLOW_KUNIT_TEST so they can be separately disabled.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/lkml/20221226195206.GA2626419@roeck-us.net
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: David Gow <davidgow@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-01-25 12:24:40 -08:00
Randy Dunlap
ca0f2cfc49 lib: Kconfig: fix spellos
Fix spelling in lib/ Kconfig files.
(reported by codespell)

Link: https://lkml.kernel.org/r/20230124181655.16269-1-rdunlap@infradead.org

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-24 19:21:26 -05:00
Paul E. McKenney
1a3ea611fc x86/nmi: Accumulate NMI-progress evidence in exc_nmi()
CPUs ignoring NMIs is often a sign of those CPUs going bad, but there
are quite a few other reasons why a CPU might ignore NMIs.  Therefore,
accumulate evidence within exc_nmi() as to what might be preventing a
given CPU from responding to an NMI.

[ paulmck: Apply Peter Zijlstra feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2023-01-19 15:54:41 -08:00
Peter Foley
b4f34f0b66 Documentation: Avoid duplicate Kconfig inclusion
Documentation/Kconfig is already included from top-level, avoid
including it again from lib/Kconfig.debug.

Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230114-doc-v2-1-853a8434ac95@pefoley.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-19 13:56:34 -07:00
Martin Rodriguez Reboredo
c1177979af btf, scripts: Exclude Rust CUs with pahole
Version 1.24 of pahole has the capability to exclude compilation units (CUs)
of specific languages [1] [2]. Rust, as of writing, is not currently supported
by pahole and if it's used with a build that has BTF debugging enabled it
results in malformed kernel and module binaries [3]. So it's better for pahole
to exclude Rust CUs until support for it arrives.

Co-developed-by: Eric Curtin <ecurtin@redhat.com>
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Signed-off-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eric Curtin <ecurtin@redhat.com>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=49358dfe2aaae4e90b072332c3e324019826783f [1]
Link: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=8ee363790b7437283c53090a85a9fec2f0b0fbc4 [2]
Link: https://github.com/Rust-for-Linux/linux/issues/735 [3]
Link: https://lore.kernel.org/bpf/20230111152050.559334-1-yakoyoku@gmail.com
2023-01-17 17:29:42 +01:00
Linus Torvalds
6feb57c2fd Kbuild updates for v6.2
- Support zstd-compressed debug info
 
  - Allow W=1 builds to detect objects shared among multiple modules
 
  - Add srcrpm-pkg target to generate a source RPM package
 
  - Make the -s option detection work for future GNU Make versions
 
  - Add -Werror to KBUILD_CPPFLAGS when CONFIG_WERROR=y
 
  - Allow W=1 builds to detect -Wundef warnings in any preprocessed files
 
  - Raise the minimum supported version of binutils to 2.25
 
  - Use $(intcmp ...) to compare integers if GNU Make >= 4.4 is used
 
  - Use $(file ...) to read a file if GNU Make >= 4.2 is used
 
  - Print error if GNU Make older than 3.82 is used
 
  - Allow modpost to detect section mismatches with Clang LTO
 
  - Include vmlinuz.efi into kernel tarballs for arm64 CONFIG_EFI_ZBOOT=y
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmOeImsVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsG06IP/iVjuWFvnjDZT4X8X6zN8aKp1vtR
 EMkmoRtt5cD4CLb1MG4N7irYHgedQSx4rYceP45MyW1I3egl6Ct14RDyeQ1xSIZb
 XFTLDCZvfl/up3MdiqNAqKRS7x5lk9++7F0t+2SoQxKQyJvm735XreX+VhZ1FeLB
 qcHrmzJ5veky5Ry/3OkNUgKFBjKEAL+qKMc55uvkXqfTb3KoBa2r4VC1OaoYGRru
 R8oF9qQRnGVQAl/LbBVchmgSjxryxPrCvBGiKlK03VkXdzEMHMimEJh3BQ6e0PGo
 gajdk+4liy7z+jQnI7jFhvJjGKzkEP/Bc99M/uS92QX5MgpH6mqpHMoqqPiqW87K
 RmZH37FqRu1Vo8dpibmH6r2K6YD/HHRjaDHk1VuuCQYEn0dsNmokPXOqd/1v0I1i
 TXPjWOw1AID5vMJWllqxFhpeVvf0vx5BT/UNrh68MLqlJZzv2eMVJb4fNy6640ml
 U0NclMnOa3eOmf5z1T7/LqDRTa63Q0kpanRrBpcmVOaqW+ZpQ3SQjh4uBN1PyJHL
 cX3Skc341DyRlFiT54QhGKlm57MEb2gjhBZ3Z4J+b7sEFgvjXH/W8vcOGIKlppmA
 CfYMyres4OV+fJc89ONkWsvLiOP1OeUGPvytm33J5QMKXc8SzOLP0D/F8kjrDflm
 EROKuZ4EA5ej/rOy
 =Ig/Y
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Support zstd-compressed debug info

 - Allow W=1 builds to detect objects shared among multiple modules

 - Add srcrpm-pkg target to generate a source RPM package

 - Make the -s option detection work for future GNU Make versions

 - Add -Werror to KBUILD_CPPFLAGS when CONFIG_WERROR=y

 - Allow W=1 builds to detect -Wundef warnings in any preprocessed files

 - Raise the minimum supported version of binutils to 2.25

 - Use $(intcmp ...) to compare integers if GNU Make >= 4.4 is used

 - Use $(file ...) to read a file if GNU Make >= 4.2 is used

 - Print error if GNU Make older than 3.82 is used

 - Allow modpost to detect section mismatches with Clang LTO

 - Include vmlinuz.efi into kernel tarballs for arm64 CONFIG_EFI_ZBOOT=y

* tag 'kbuild-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits)
  buildtar: fix tarballs with EFI_ZBOOT enabled
  modpost: Include '.text.*' in TEXT_SECTIONS
  padata: Mark padata_work_init() as __ref
  kbuild: ensure Make >= 3.82 is used
  kbuild: refactor the prerequisites of the modpost rule
  kbuild: change module.order to list *.o instead of *.ko
  kbuild: use .NOTINTERMEDIATE for future GNU Make versions
  kconfig: refactor Makefile to reduce process forks
  kbuild: add read-file macro
  kbuild: do not sort after reading modules.order
  kbuild: add test-{ge,gt,le,lt} macros
  Documentation: raise minimum supported version of binutils to 2.25
  kbuild: add -Wundef to KBUILD_CPPFLAGS for W=1 builds
  kbuild: move -Werror from KBUILD_CFLAGS to KBUILD_CPPFLAGS
  kbuild: Port silent mode detection to future gnu make.
  init/version.c: remove #include <generated/utsrelease.h>
  firmware_loader: remove #include <generated/utsrelease.h>
  modpost: Mark uuid_le type to be suitable only for MEI
  kbuild: add ability to make source rpm buildable using koji
  kbuild: warn objects shared among multiple modules
  ...
2022-12-19 12:33:32 -06:00
Linus Torvalds
a6e3e6f138 Some fault-injection improvements from Wei Yongjun which enable stacktrace
filtering on x86_64.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY56YjgAKCRDdBJ7gKXxA
 jni0AQCXYCZE1pOVXnB+IA1G1xkM0f1xDS/D63uJVl7Lyurv5QEAvJWX25DsTbLR
 c0bq3y2PPpHzrcDyPhciVlY/iplHQQM=
 =tfs8
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-12-17-20-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull fault-injection updates from Andrew Morton:
 "Some fault-injection improvements from Wei Yongjun which enable
  stacktrace filtering on x86_64"

* tag 'mm-nonmm-stable-2022-12-17-20-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  fault-injection: make stacktrace filter works as expected
  fault-injection: make some stack filter attrs more readable
  fault-injection: skip stacktrace filtering by default
  fault-injection: allow stacktrace filter for x86-64
2022-12-19 07:03:44 -06:00
Wei Yongjun
a7ebbbb159 fault-injection: allow stacktrace filter for x86-64
This patchset allow fault injection to run on x86_64 and makes stacktrace
filter work as expected.  With this, we can test a device driver module
with fault injection more easily.


This patch (of 4):

FAULT_INJECTION_STACKTRACE_FILTER option was apparently disallowed on
x86_64 because of problems with the stack unwinder:

    commit 6d690dcac9
    Author: Akinobu Mita <akinobu.mita@gmail.com>
    Date:   Sat May 12 10:36:53 2007 -0700

        fault injection: disable stacktrace filter for x86-64

However, there is no problems whatsoever with this today. Let's allow
it again.

Link: https://lkml.kernel.org/r/20220817080332.1052710-1-weiyongjun1@huawei.com
Link: https://lkml.kernel.org/r/20220817080332.1052710-2-weiyongjun1@huawei.com
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Isabella Basso <isabbasso@riseup.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-15 16:40:43 -08:00
Zhaoyang Huang
56a61617dd mm: use stack_depot for recording kmemleak's backtrace
Using stack_depot to record kmemleak's backtrace which has been
implemented on slub for reducing redundant information.

[akpm@linux-foundation.org: fix build - remove now-unused __save_stack_trace()]
[zhaoyang.huang@unisoc.com: v3]
  Link: https://lkml.kernel.org/r/1667101354-4669-1-git-send-email-zhaoyang.huang@unisoc.com
[akpm@linux-foundation.org: fix v3 layout oddities]
[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/1666864224-27541-1-git-send-email-zhaoyang.huang@unisoc.com
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: ke.wang <ke.wang@unisoc.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhaoyang Huang <huangzhaoyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-15 16:37:49 -08:00
Linus Torvalds
94a855111e - Add the call depth tracking mitigation for Retbleed which has
been long in the making. It is a lighterweight software-only fix for
 Skylake-based cores where enabling IBRS is a big hammer and causes a
 significant performance impact.
 
 What it basically does is, it aligns all kernel functions to 16 bytes
 boundary and adds a 16-byte padding before the function, objtool
 collects all functions' locations and when the mitigation gets applied,
 it patches a call accounting thunk which is used to track the call depth
 of the stack at any time.
 
 When that call depth reaches a magical, microarchitecture-specific value
 for the Return Stack Buffer, the code stuffs that RSB and avoids its
 underflow which could otherwise lead to the Intel variant of Retbleed.
 
 This software-only solution brings a lot of the lost performance back,
 as benchmarks suggest:
 
   https://lore.kernel.org/all/20220915111039.092790446@infradead.org/
 
 That page above also contains a lot more detailed explanation of the
 whole mechanism
 
 - Implement a new control flow integrity scheme called FineIBT which is
 based on the software kCFI implementation and uses hardware IBT support
 where present to annotate and track indirect branches using a hash to
 validate them
 
 - Other misc fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOZp5EACgkQEsHwGGHe
 VUrZFxAAvi/+8L0IYSK4mKJvixGbTFjxN/Swo2JVOfs34LqGUT6JaBc+VUMwZxdb
 VMTFIZ3ttkKEodjhxGI7oGev6V8UfhI37SmO2lYKXpQVjXXnMlv/M+Vw3teE38CN
 gopi+xtGnT1IeWQ3tc/Tv18pleJ0mh5HKWiW+9KoqgXj0wgF9x4eRYDz1TDCDA/A
 iaBzs56j8m/FSykZHnrWZ/MvjKNPdGlfJASUCPeTM2dcrXQGJ93+X2hJctzDte0y
 Nuiw6Y0htfFBE7xoJn+sqm5Okr+McoUM18/CCprbgSKYk18iMYm3ZtAi6FUQZS1A
 ua4wQCf49loGp15PO61AS5d3OBf5D3q/WihQRbCaJvTVgPp9sWYnWwtcVUuhMllh
 ZQtBU9REcVJ/22bH09Q9CjBW0VpKpXHveqQdqRDViLJ6v/iI6EFGmD24SW/VxyRd
 73k9MBGrL/dOf1SbEzdsnvcSB3LGzp0Om8o/KzJWOomrVKjBCJy16bwTEsCZEJmP
 i406m92GPXeaN1GhTko7vmF0GnkEdJs1GVCZPluCAxxbhHukyxHnrjlQjI4vC80n
 Ylc0B3Kvitw7LGJsPqu+/jfNHADC/zhx1qz/30wb5cFmFbN1aRdp3pm8JYUkn+l/
 zri2Y6+O89gvE/9/xUhMohzHsWUO7xITiBavewKeTP9GSWybWUs=
 =cRy1
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Borislav Petkov:

 - Add the call depth tracking mitigation for Retbleed which has been
   long in the making. It is a lighterweight software-only fix for
   Skylake-based cores where enabling IBRS is a big hammer and causes a
   significant performance impact.

   What it basically does is, it aligns all kernel functions to 16 bytes
   boundary and adds a 16-byte padding before the function, objtool
   collects all functions' locations and when the mitigation gets
   applied, it patches a call accounting thunk which is used to track
   the call depth of the stack at any time.

   When that call depth reaches a magical, microarchitecture-specific
   value for the Return Stack Buffer, the code stuffs that RSB and
   avoids its underflow which could otherwise lead to the Intel variant
   of Retbleed.

   This software-only solution brings a lot of the lost performance
   back, as benchmarks suggest:

       https://lore.kernel.org/all/20220915111039.092790446@infradead.org/

   That page above also contains a lot more detailed explanation of the
   whole mechanism

 - Implement a new control flow integrity scheme called FineIBT which is
   based on the software kCFI implementation and uses hardware IBT
   support where present to annotate and track indirect branches using a
   hash to validate them

 - Other misc fixes and cleanups

* tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
  x86/paravirt: Use common macro for creating simple asm paravirt functions
  x86/paravirt: Remove clobber bitmask from .parainstructions
  x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al
  x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit
  x86/Kconfig: Enable kernel IBT by default
  x86,pm: Force out-of-line memcpy()
  objtool: Fix weak hole vs prefix symbol
  objtool: Optimize elf_dirty_reloc_sym()
  x86/cfi: Add boot time hash randomization
  x86/cfi: Boot time selection of CFI scheme
  x86/ibt: Implement FineIBT
  objtool: Add --cfi to generate the .cfi_sites section
  x86: Add prefix symbols for function padding
  objtool: Add option to generate prefix symbols
  objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf
  objtool: Slice up elf_create_section_symbol()
  kallsyms: Revert "Take callthunks into account"
  x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces
  x86/retpoline: Fix crash printing warning
  x86/paravirt: Fix a !PARAVIRT build warning
  ...
2022-12-14 15:03:00 -08:00
Linus Torvalds
48ea09cdda hardening updates for v6.2-rc1
- Convert flexible array members, fix -Wstringop-overflow warnings,
   and fix KCFI function type mismatches that went ignored by
   maintainers (Gustavo A. R. Silva, Nathan Chancellor, Kees Cook).
 
 - Remove the remaining side-effect users of ksize() by converting
   dma-buf, btrfs, and coredump to using kmalloc_size_roundup(),
   add more __alloc_size attributes, and introduce full testing
   of all allocator functions. Finally remove the ksize() side-effect
   so that each allocation-aware checker can finally behave without
   exceptions.
 
 - Introduce oops_limit (default 10,000) and warn_limit (default off)
   to provide greater granularity of control for panic_on_oops and
   panic_on_warn (Jann Horn, Kees Cook).
 
 - Introduce overflows_type() and castable_to_type() helpers for
   cleaner overflow checking.
 
 - Improve code generation for strscpy() and update str*() kern-doc.
 
 - Convert strscpy and sigphash tests to KUnit, and expand memcpy
   tests.
 
 - Always use a non-NULL argument for prepare_kernel_cred().
 
 - Disable structleak plugin in FORTIFY KUnit test (Anders Roxell).
 
 - Adjust orphan linker section checking to respect CONFIG_WERROR
   (Xin Li).
 
 - Make sure siginfo is cleared for forced SIGKILL (haifeng.xu).
 
 - Fix um vs FORTIFY warnings for always-NULL arguments.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmOZSOoWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjAAD/0YkvpU7f03f8hcQMJK6wv//24K
 AW41hEaBikq9RcmkuvkLLrJRibGgZ5O2xUkUkxRs/HxhkhrZ0kEw8sbwZe8MoWls
 F4Y9+TDjsrdHmjhfcBZdLnVxwcKK5wlaEcpjZXtbsfcdhx3TbgcDA23YELl5t0K+
 I11j4kYmf9SLl4CwIrSP5iACml8CBHARDh8oIMF7FT/LrjNbM8XkvBcVVT6hTbOV
 yjgA8WP2e9GXvj9GzKgqvd0uE/kwPkVAeXLNFWopPi4FQ8AWjlxbBZR0gamA6/EB
 d7TIs0ifpVU2JGQaTav4xO6SsFMj3ntoUI0qIrFaTxZAvV4KYGrPT/Kwz1O4SFaG
 rN5lcxseQbPQSBTFNG4zFjpywTkVCgD2tZqDwz5Rrmiraz0RyIokCN+i4CD9S0Ds
 oEd8JSyLBk1sRALczkuEKo0an5AyC9YWRcBXuRdIHpLo08PsbeUUSe//4pe303cw
 0ApQxYOXnrIk26MLElTzSMImlSvlzW6/5XXzL9ME16leSHOIfDeerPnc9FU9Eb3z
 ODv22z6tJZ9H/apSUIHZbMciMbbVTZ8zgpkfydr08o87b342N/ncYHZ5cSvQ6DWb
 jS5YOIuvl46/IhMPT16qWC8p0bP5YhxoPv5l6Xr0zq0ooEj0E7keiD/SzoLvW+Qs
 AHXcibguPRQBPAdiPQ==
 =yaaN
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kernel hardening updates from Kees Cook:

 - Convert flexible array members, fix -Wstringop-overflow warnings, and
   fix KCFI function type mismatches that went ignored by maintainers
   (Gustavo A. R. Silva, Nathan Chancellor, Kees Cook)

 - Remove the remaining side-effect users of ksize() by converting
   dma-buf, btrfs, and coredump to using kmalloc_size_roundup(), add
   more __alloc_size attributes, and introduce full testing of all
   allocator functions. Finally remove the ksize() side-effect so that
   each allocation-aware checker can finally behave without exceptions

 - Introduce oops_limit (default 10,000) and warn_limit (default off) to
   provide greater granularity of control for panic_on_oops and
   panic_on_warn (Jann Horn, Kees Cook)

 - Introduce overflows_type() and castable_to_type() helpers for cleaner
   overflow checking

 - Improve code generation for strscpy() and update str*() kern-doc

 - Convert strscpy and sigphash tests to KUnit, and expand memcpy tests

 - Always use a non-NULL argument for prepare_kernel_cred()

 - Disable structleak plugin in FORTIFY KUnit test (Anders Roxell)

 - Adjust orphan linker section checking to respect CONFIG_WERROR (Xin
   Li)

 - Make sure siginfo is cleared for forced SIGKILL (haifeng.xu)

 - Fix um vs FORTIFY warnings for always-NULL arguments

* tag 'hardening-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (31 commits)
  ksmbd: replace one-element arrays with flexible-array members
  hpet: Replace one-element array with flexible-array member
  um: virt-pci: Avoid GCC non-NULL warning
  signal: Initialize the info in ksignal
  lib: fortify_kunit: build without structleak plugin
  panic: Expose "warn_count" to sysfs
  panic: Introduce warn_limit
  panic: Consolidate open-coded panic_on_warn checks
  exit: Allow oops_limit to be disabled
  exit: Expose "oops_count" to sysfs
  exit: Put an upper limit on how often we can oops
  panic: Separate sysctl logic from CONFIG_SMP
  mm/pgtable: Fix multiple -Wstringop-overflow warnings
  mm: Make ksize() a reporting-only function
  kunit/fortify: Validate __alloc_size attribute results
  drm/sti: Fix return type of sti_{dvo,hda,hdmi}_connector_mode_valid()
  drm/fsl-dcu: Fix return type of fsl_dcu_drm_connector_mode_valid()
  driver core: Add __alloc_size hint to devm allocators
  overflow: Introduce overflows_type() and castable_to_type()
  coredump: Proactively round up to kmalloc bucket size
  ...
2022-12-14 12:20:00 -08:00
Linus Torvalds
e2ca6ba6ba MM patches for 6.2-rc1.
- More userfaultfs work from Peter Xu.
 
 - Several convert-to-folios series from Sidhartha Kumar and Huang Ying.
 
 - Some filemap cleanups from Vishal Moola.
 
 - David Hildenbrand added the ability to selftest anon memory COW handling.
 
 - Some cpuset simplifications from Liu Shixin.
 
 - Addition of vmalloc tracing support by Uladzislau Rezki.
 
 - Some pagecache folioifications and simplifications from Matthew Wilcox.
 
 - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it.
 
 - Miguel Ojeda contributed some cleanups for our use of the
   __no_sanitize_thread__ gcc keyword.  This series shold have been in the
   non-MM tree, my bad.
 
 - Naoya Horiguchi improved the interaction between memory poisoning and
   memory section removal for huge pages.
 
 - DAMON cleanups and tuneups from SeongJae Park
 
 - Tony Luck fixed the handling of COW faults against poisoned pages.
 
 - Peter Xu utilized the PTE marker code for handling swapin errors.
 
 - Hugh Dickins reworked compound page mapcount handling, simplifying it
   and making it more efficient.
 
 - Removal of the autonuma savedwrite infrastructure from Nadav Amit and
   David Hildenbrand.
 
 - zram support for multiple compression streams from Sergey Senozhatsky.
 
 - David Hildenbrand reworked the GUP code's R/O long-term pinning so
   that drivers no longer need to use the FOLL_FORCE workaround which
   didn't work very well anyway.
 
 - Mel Gorman altered the page allocator so that local IRQs can remnain
   enabled during per-cpu page allocations.
 
 - Vishal Moola removed the try_to_release_page() wrapper.
 
 - Stefan Roesch added some per-BDI sysfs tunables which are used to
   prevent network block devices from dirtying excessive amounts of
   pagecache.
 
 - David Hildenbrand did some cleanup and repair work on KSM COW
   breaking.
 
 - Nhat Pham and Johannes Weiner have implemented writeback in zswap's
   zsmalloc backend.
 
 - Brian Foster has fixed a longstanding corner-case oddity in
   file[map]_write_and_wait_range().
 
 - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
   Chen.
 
 - Shiyang Ruan has done some work on fsdax, to make its reflink mode
   work better under xfstests.  Better, but still not perfect.
 
 - Christoph Hellwig has removed the .writepage() method from several
   filesystems.  They only need .writepages().
 
 - Yosry Ahmed wrote a series which fixes the memcg reclaim target
   beancounting.
 
 - David Hildenbrand has fixed some of our MM selftests for 32-bit
   machines.
 
 - Many singleton patches, as usual.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5j6ZwAKCRDdBJ7gKXxA
 jkDYAP9qNeVqp9iuHjZNTqzMXkfmJPsw2kmy2P+VdzYVuQRcJgEAgoV9d7oMq4ml
 CodAgiA51qwzId3GRytIo/tfWZSezgA=
 =d19R
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - More userfaultfs work from Peter Xu

 - Several convert-to-folios series from Sidhartha Kumar and Huang Ying

 - Some filemap cleanups from Vishal Moola

 - David Hildenbrand added the ability to selftest anon memory COW
   handling

 - Some cpuset simplifications from Liu Shixin

 - Addition of vmalloc tracing support by Uladzislau Rezki

 - Some pagecache folioifications and simplifications from Matthew
   Wilcox

 - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
   it

 - Miguel Ojeda contributed some cleanups for our use of the
   __no_sanitize_thread__ gcc keyword.

   This series should have been in the non-MM tree, my bad

 - Naoya Horiguchi improved the interaction between memory poisoning and
   memory section removal for huge pages

 - DAMON cleanups and tuneups from SeongJae Park

 - Tony Luck fixed the handling of COW faults against poisoned pages

 - Peter Xu utilized the PTE marker code for handling swapin errors

 - Hugh Dickins reworked compound page mapcount handling, simplifying it
   and making it more efficient

 - Removal of the autonuma savedwrite infrastructure from Nadav Amit and
   David Hildenbrand

 - zram support for multiple compression streams from Sergey Senozhatsky

 - David Hildenbrand reworked the GUP code's R/O long-term pinning so
   that drivers no longer need to use the FOLL_FORCE workaround which
   didn't work very well anyway

 - Mel Gorman altered the page allocator so that local IRQs can remnain
   enabled during per-cpu page allocations

 - Vishal Moola removed the try_to_release_page() wrapper

 - Stefan Roesch added some per-BDI sysfs tunables which are used to
   prevent network block devices from dirtying excessive amounts of
   pagecache

 - David Hildenbrand did some cleanup and repair work on KSM COW
   breaking

 - Nhat Pham and Johannes Weiner have implemented writeback in zswap's
   zsmalloc backend

 - Brian Foster has fixed a longstanding corner-case oddity in
   file[map]_write_and_wait_range()

 - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
   Chen

 - Shiyang Ruan has done some work on fsdax, to make its reflink mode
   work better under xfstests. Better, but still not perfect

 - Christoph Hellwig has removed the .writepage() method from several
   filesystems. They only need .writepages()

 - Yosry Ahmed wrote a series which fixes the memcg reclaim target
   beancounting

 - David Hildenbrand has fixed some of our MM selftests for 32-bit
   machines

 - Many singleton patches, as usual

* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
  mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
  mm: mmu_gather: allow more than one batch of delayed rmaps
  mm: fix typo in struct pglist_data code comment
  kmsan: fix memcpy tests
  mm: add cond_resched() in swapin_walk_pmd_entry()
  mm: do not show fs mm pc for VM_LOCKONFAULT pages
  selftests/vm: ksm_functional_tests: fixes for 32bit
  selftests/vm: cow: fix compile warning on 32bit
  selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
  mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
  mm,thp,rmap: fix races between updates of subpages_mapcount
  mm: memcg: fix swapcached stat accounting
  mm: add nodes= arg to memory.reclaim
  mm: disable top-tier fallback to reclaim on proactive reclaim
  selftests: cgroup: make sure reclaim target memcg is unprotected
  selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  mm: memcg: fix stale protection of reclaim target memcg
  mm/mmap: properly unaccount memory on mas_preallocate() failure
  omfs: remove ->writepage
  jfs: remove ->writepage
  ...
2022-12-13 19:29:45 -08:00
Linus Torvalds
96f4263568 Rust changes for v6.2
The first set of changes after the merge, the major ones being:
 
 - String and formatting: new types `CString`, `CStr`, `BStr` and
   `Formatter`; new macros `c_str!`, `b_str!` and `fmt!`.
 
 - Errors: the rest of the error codes from `errno-base.h`, as well as
   some `From` trait implementations for the `Error` type.
 
 - Printing: the rest of the `pr_*!` levels and the continuation one
   `pr_cont!`, as well as a new sample.
 
 - `alloc` crate: new constructors `try_with_capacity()` and
   `try_with_capacity_in()` for `RawVec` and `Vec`.
 
 - Procedural macros: new macros `#[vtable]` and `concat_idents!`, as
   well as better ergonomics for `module!` users.
 
 - Asserting: new macros `static_assert!`, `build_error!` and
   `build_assert!`, as well as a new crate `build_error` to support them.
 
 - Vocabulary types: new types `Opaque` and `Either`.
 
 - Debugging: new macro `dbg!`.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmOVJTQACgkQGXyLc2ht
 IW3cLA//dn0/rN2sCJBsX8/mNQqRXQvM45QUP5ESd5w01fRXEpvHZ+MbfdcFZN8K
 oQEgZwTwTKvKY6V+xCPmEiUvk5jniCzdEjTFtDhVhA+qimxY/FMS3zozuJMQIlz2
 zTiZ4aEM1zFAwoDHnnOmaCO+C0zw5d9UFIKO4nIvOSy3gD/eLgiFz3cyTh8Q2+BT
 lPyqeKg9+xKIl8tWa5zGYHgZASGguV0EpXFn4Ck4eBOH6O9ovWgakdzZp0BMJ9Ca
 UNIFpFjLMUkCwzZkPqIyI9IZEOzUYWTTfWU9S5JJ6IzC3aT8NPp3WeSYW9TgVnvO
 z5n6rsYOgvKeWCvGIgq82fgVbGMNaaP1MFxNLsdbWWj+9lfebpk62aQXSuWsvASq
 /W39/xEhOOLikyb3ObVHLW1r1lu9guSeP8eaMQ5ci/99kypHHBOXmB/nr9pxPkrr
 kovxuZedDbgEYunbVmwWGmvLg8dcjadfeXf6Dkc6bwDvyhiuX9W21z9ppT9nV5NW
 chYRAPROCHuBRu+txft9gIjyE1/V7G8CyeWiG36VWN8Tayc5iJEWOopk4GJcEpJi
 MS5tAJru7fBZcXjFausN3mdXyRwMLdilTZ2Qkp6MqzXi5zwVuKH1wsJ7CLkPBWQC
 tAPJts6krOonI2cd2JM8ds+Wj5Q1cDGQuF6Rj29/27aUBKH1w2Y=
 =iIcK
 -----END PGP SIGNATURE-----

Merge tag 'rust-6.2' of https://github.com/Rust-for-Linux/linux

Pull rust updates from Miguel Ojeda:
 "The first set of changes after the merge, the major ones being:

   - String and formatting: new types 'CString', 'CStr', 'BStr' and
     'Formatter'; new macros 'c_str!', 'b_str!' and 'fmt!'.

   - Errors: the rest of the error codes from 'errno-base.h', as well as
     some 'From' trait implementations for the 'Error' type.

   - Printing: the rest of the 'pr_*!' levels and the continuation one
     'pr_cont!', as well as a new sample.

   - 'alloc' crate: new constructors 'try_with_capacity()' and
     'try_with_capacity_in()' for 'RawVec' and 'Vec'.

   - Procedural macros: new macros '#[vtable]' and 'concat_idents!', as
     well as better ergonomics for 'module!' users.

   - Asserting: new macros 'static_assert!', 'build_error!' and
     'build_assert!', as well as a new crate 'build_error' to support
     them.

   - Vocabulary types: new types 'Opaque' and 'Either'.

   - Debugging: new macro 'dbg!'"

* tag 'rust-6.2' of https://github.com/Rust-for-Linux/linux: (28 commits)
  rust: types: add `Opaque` type
  rust: types: add `Either` type
  rust: build_assert: add `build_{error,assert}!` macros
  rust: add `build_error` crate
  rust: static_assert: add `static_assert!` macro
  rust: std_vendor: add `dbg!` macro based on `std`'s one
  rust: str: add `fmt!` macro
  rust: str: add `CString` type
  rust: str: add `Formatter` type
  rust: str: add `c_str!` macro
  rust: str: add `CStr` unit tests
  rust: str: implement several traits for `CStr`
  rust: str: add `CStr` type
  rust: str: add `b_str!` macro
  rust: str: add `BStr` type
  rust: alloc: add `Vec::try_with_capacity{,_in}()` constructors
  rust: alloc: add `RawVec::try_with_capacity_in()` constructor
  rust: prelude: add `error::code::*` constant items
  rust: error: add `From` implementations for `Error`
  rust: error: add codes from `errno-base.h`
  ...
2022-12-12 16:59:00 -08:00
Linus Torvalds
a312a8cc3c cgroup changes for v6.2-rc1
Nothing too interesting.
 
 * Add CONFIG_DEBUG_GROUP_REF which makes cgroup refcnt operations kprobable.
 
 * A couple cpuset optimizations.
 
 * Other misc changes including doc and test updates.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCY5bHvg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGcYrAQCfrlzrbWw6gTQ7fmr0Avxjy5FxLjsdzEGPcmGY
 ByEMhgD/VdUf3zI/Khr91Gsi5JXQxQf7a5caD369xupRWUWjqA8=
 =Nf+E
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:
 "Nothing too interesting:

   - Add CONFIG_DEBUG_GROUP_REF which makes cgroup refcnt operations
     kprobable

   - A couple cpuset optimizations

   - Other misc changes including doc and test updates"

* tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: remove rcu_read_lock()/rcu_read_unlock() in critical section of spin_lock_irq()
  cgroup/cpuset: Improve cpuset_css_alloc() description
  kselftest/cgroup: Add cleanup() to test_cpuset_prs.sh
  cgroup/cpuset: Optimize cpuset_attach() on v2
  cgroup/cpuset: Skip spread flags update on v2
  kselftest/cgroup: Fix gathering number of CPUs
  cgroup: cgroup refcnt functions should be exported when CONFIG_DEBUG_CGROUP_REF
  cgroup: Implement DEBUG_CGROUP_REF
2022-12-12 15:48:36 -08:00
Gary Guo
ecaa6ddff2 rust: add build_error crate
The `build_error` crate provides a function `build_error` which
will panic at compile-time if executed in const context and,
by default, will cause a build error if not executed at compile
time and the optimizer does not optimise away the call.

The `CONFIG_RUST_BUILD_ASSERT_ALLOW` kernel option allows to
relax the default build failure and convert it to a runtime
check. If the runtime check fails, `panic!` will be called.

Its functionality will be exposed to users as a couple macros in
the `kernel` crate in the following patch, thus some documentation
here refers to them for simplicity.

Signed-off-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
[Reworded, adapted for upstream and applied latest changes]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-12-04 01:59:16 +01:00
Linus Torvalds
bdaa78c6aa 15 hotfixes. 11 marked cc:stable. Only three or four of the latter
address post-6.0 issues, which is hopefully a sign that things are
 converging.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY4pQpQAKCRDdBJ7gKXxA
 jquxAP9Lqif7CGDgdq8uWY2hHS/Ujc3k7Ohgyzs37olnCuU8KwEA6/J7SpjsBgtY
 OfzvnwxpCTh8Kfzu/oNckIHo/EEiIA8=
 =o6qT
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
 "15 hotfixes,  11 marked cc:stable.

  Only three or four of the latter address post-6.0 issues, which is
  hopefully a sign that things are converging"

* tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"
  Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
  drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
  mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
  mm/khugepaged: fix GUP-fast interaction by sending IPI
  mm/khugepaged: take the right locks for page table retraction
  mm: migrate: fix THP's mapcount on isolation
  mm: introduce arch_has_hw_nonleaf_pmd_young()
  mm: add dummy pmd_young() for architectures not having it
  mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
  tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
  nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
  hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
  madvise: use zap_page_range_single for madvise dontneed
  mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
2022-12-02 13:39:38 -08:00
Steven Rostedt (Google)
a4412fdd49 error-injection: Add prompt for function error injection
The config to be able to inject error codes into any function annotated
with ALLOW_ERROR_INJECTION() is enabled when FUNCTION_ERROR_INJECTION is
enabled.  But unfortunately, this is always enabled on x86 when KPROBES
is enabled, and there's no way to turn it off.

As kprobes is useful for observability of the kernel, it is useful to
have it enabled in production environments.  But error injection should
be avoided.  Add a prompt to the config to allow it to be disabled even
when kprobes is enabled, and get rid of the "def_bool y".

This is a kernel debug feature (it's in Kconfig.debug), and should have
never been something enabled by default.

Cc: stable@vger.kernel.org
Fixes: 540adea380 ("error-injection: Separate error-injection from kprobe")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-01 13:14:21 -08:00
Randy Dunlap
845aad0aa0 maple_tree: allow TEST_MAPLE_TREE only when DEBUG_KERNEL is set
Prevent a kconfig warning that is caused by TEST_MAPLE_TREE by adding a
"depends on" clause for TEST_MAPLE_TREE since 'select' does not follow any
kconfig dependencies.

WARNING: unmet direct dependencies detected for DEBUG_MAPLE_TREE
  Depends on [n]: DEBUG_KERNEL [=n]
  Selected by [y]:
  - TEST_MAPLE_TREE [=y] && RUNTIME_TESTING_MENU [=y]

Link: https://lkml.kernel.org/r/20221119055117.14094-1-rdunlap@infradead.org
Fixes: 120b116208 ("maple_tree: reorganize testing to restore module testing")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:59:03 -08:00
Lee Jones
152fe65f30 Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
When enabled, KASAN enlarges function's stack-frames.  Pushing quite a few
over the current threshold.  This can mainly be seen on 32-bit
architectures where the present limit (when !GCC) is a lowly 1024-Bytes.

Link: https://lkml.kernel.org/r/20221125120750.3537134-3-lee@kernel.org
Signed-off-by: Lee Jones <lee@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Tom Rix <trix@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 14:49:42 -08:00
Li Hua
de3db3f883 test_kprobes: fix implicit declaration error of test_kprobes
If KPROBES_SANITY_TEST and ARCH_CORRECT_STACKTRACE_ON_KRETPROBE is enabled, but
STACKTRACE is not set. Build failed as below:

lib/test_kprobes.c: In function `stacktrace_return_handler':
lib/test_kprobes.c:228:8: error: implicit declaration of function `stack_trace_save'; did you mean `stacktrace_driver'? [-Werror=implicit-function-declaration]
  ret = stack_trace_save(stack_buf, STACK_BUF_SIZE, 0);
        ^~~~~~~~~~~~~~~~
        stacktrace_driver
cc1: all warnings being treated as errors
scripts/Makefile.build:250: recipe for target 'lib/test_kprobes.o' failed
make[2]: *** [lib/test_kprobes.o] Error 1

To fix this error, Select STACKTRACE if ARCH_CORRECT_STACKTRACE_ON_KRETPROBE is enabled.

Link: https://lkml.kernel.org/r/20221121030620.63181-1-hucool.lihua@huawei.com
Fixes: 1f6d3a8f5e ("kprobes: Add a test case for stacktrace from kretprobe handler")
Signed-off-by: Li Hua <hucool.lihua@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-22 18:50:45 -08:00
Ingo Molnar
0ce096db71 Linux 6.1-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmN6wAgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG0EYH/3/RO90NbrFItraN
 Lzr+d3VdbGjTu8xd1M+PRTmwh3zxLpB+Jwqr0T0A2gzL9B/D+AUPUJdrCVbv9DqS
 FLJAVqoeV20dNBAHSffOOLPsgCZ+Eu+LzlNN7Iqde0e8cyZICFMNktitui84Xm/i
 1NgFVgz9OZ6+aieYvUj3FrFq0p8GTIaC/oybDZrxYKcO8ZzKVMJ11swRw10wwq0g
 qOOECvV3w7wlQ8upQZkzFxItKFc7EexZI6R4elXeGSJJ9Hlc092dv/zsKB9dwV+k
 WcwkJrZRoezYXzgGBFxUcQtzi+ethjrPjuJuM1rYLUSIcfIW/0lkaSLgRoBu8D+I
 1GfXkXs=
 =gt6P
 -----END PGP SIGNATURE-----

Merge tag 'v6.1-rc6' into x86/core, to resolve conflicts

Resolve conflicts between these commits in arch/x86/kernel/asm-offsets.c:

 # upstream:
 debc5a1ec0 ("KVM: x86: use a separate asm-offsets.c file")

 # retbleed work in x86/core:
 5d8213864a ("x86/retbleed: Add SKL return thunk")

... and these commits in include/linux/bpf.h:

  # upstram:
  18acb7fac2 ("bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")")

  # x86/core commits:
  931ab63664 ("x86/ibt: Implement FineIBT")
  bea75b3389 ("x86/Kconfig: Introduce function padding")

The latter two modify BPF_DISPATCHER_ATTRIBUTES(), which was removed upstream.

 Conflicts:
	arch/x86/kernel/asm-offsets.c
	include/linux/bpf.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-11-21 23:01:51 +01:00
Nick Desaulniers
9f8fe64779 Makefile.debug: support for -gz=zstd
Make DEBUG_INFO_COMPRESSED a choice; DEBUG_INFO_COMPRESSED_NONE is the
default, DEBUG_INFO_COMPRESSED_ZLIB uses zlib,
DEBUG_INFO_COMPRESSED_ZSTD uses zstd.

This renames the existing KConfig option DEBUG_INFO_COMPRESSED to
DEBUG_INFO_COMPRESSED_ZLIB so users upgrading may need to reset the new
Kconfigs.

Some quick N=1 measurements with du, /usr/bin/time -v, and bloaty:

clang-16, x86_64 defconfig plus
CONFIG_DEBUG_INFO=y CONFIG_DEBUG_INFO_COMPRESSED_NONE=y:
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:55.43
488M vmlinux
27.6%   136Mi   0.0%       0    .debug_info
 6.1%  30.2Mi   0.0%       0    .debug_str_offsets
 3.5%  17.2Mi   0.0%       0    .debug_line
 3.3%  16.3Mi   0.0%       0    .debug_loclists
 0.9%  4.62Mi   0.0%       0    .debug_str

clang-16, x86_64 defconfig plus
CONFIG_DEBUG_INFO=y CONFIG_DEBUG_INFO_COMPRESSED_ZLIB=y:
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:00.35
385M vmlinux
21.8%  85.4Mi   0.0%       0    .debug_info
 2.1%  8.26Mi   0.0%       0    .debug_str_offsets
 2.1%  8.24Mi   0.0%       0    .debug_loclists
 1.9%  7.48Mi   0.0%       0    .debug_line
 0.5%  1.94Mi   0.0%       0    .debug_str

clang-16, x86_64 defconfig plus
CONFIG_DEBUG_INFO=y CONFIG_DEBUG_INFO_COMPRESSED_ZSTD=y:
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:59.69
373M vmlinux
21.4%  81.4Mi   0.0%       0    .debug_info
 2.3%  8.85Mi   0.0%       0    .debug_loclists
 1.5%  5.71Mi   0.0%       0    .debug_line
 0.5%  1.95Mi   0.0%       0    .debug_str_offsets
 0.4%  1.62Mi   0.0%       0    .debug_str

That's only a 3.11% overall binary size savings over zlib, but at no
performance regression.

Link: https://maskray.me/blog/2022-09-09-zstd-compressed-debug-sections
Link: https://maskray.me/blog/2022-01-23-compressed-debug-sections
Suggested-by: Sedat Dilek (DHL Supply Chain) <sedat.dilek@dhl.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-11-21 10:18:39 +09:00
Alexander Potapenko
ac66998df3 Kconfig.debug: ensure early check for KMSAN in CONFIG_KMSAN_WARN
As pointed out by Masahiro Yamada, Kconfig picks up the first default
entry which has true 'if' condition.  Hence, the previously added check
for KMSAN was never used, because it followed the checks for 64BIT and
!64BIT.

Put KMSAN check before others to ensure it is always applied.

Link: https://lkml.kernel.org/r/20221102110611.1085175-3-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Link: https://lore.kernel.org/linux-mm/20221024212144.2852069-3-glider@google.com/
Fixes: 921757bc9b ("Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default")
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 15:57:24 -08:00
Liam Howlett
120b116208 maple_tree: reorganize testing to restore module testing
Along the development cycle, the testing code support for module/in-kernel
compiles was removed.  Restore this functionality by moving any internal
API tests to the userspace side, as well as threading tests.  Fix the
lockdep issues and add a way to reduce memory usage so the tests can
complete with KASAN + memleak detection.  Make the tests work on 32 bit
hosts where possible and detect 32 bit hosts in the radix test suite.

[akpm@linux-foundation.org: fix module export]
[akpm@linux-foundation.org: fix it some more]
[liam.howlett@oracle.com: fix compile warnings on 32bit build in check_find()]
  Link: https://lkml.kernel.org/r/20221107203816.1260327-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20221028180415.3074673-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 15:57:22 -08:00
Kees Cook
fb3d88ab35 siphash: Convert selftest to KUnit
Convert the siphash self-test to KUnit so it will be included in "all
KUnit tests" coverage, and can be run individually still:

$ ./tools/testing/kunit/kunit.py run siphash
...
[02:58:45] Starting KUnit Kernel (1/1)...
[02:58:45] ============================================================
[02:58:45] =================== siphash (1 subtest) ====================
[02:58:45] [PASSED] siphash_test
[02:58:45] ===================== [PASSED] siphash =====================
[02:58:45] ============================================================
[02:58:45] Testing complete. Ran 1 tests: passed: 1
[02:58:45] Elapsed time: 21.421s total, 4.306s configuring, 16.947s building, 0.148s running

Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Sander Vanheule <sander@svanheule.net>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/lkml/CAHmME9r+9MPH6zk3Vn=buEMSbQiWMFryqqzerKarmjYk+tHLJA@mail.gmail.com
Tested-by: David Gow <davidgow@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2022-11-01 10:04:52 -07:00
Kees Cook
41eefc46a3 string: Convert strscpy() self-test to KUnit
Convert the strscpy() self-test to a KUnit test.

Cc: David Gow <davidgow@google.com>
Cc: Tobin C. Harding <tobin@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/lkml/Y072ZMk/hNkfwqMv@dev-arch.thelio-3990X
Signed-off-by: Kees Cook <keescook@chromium.org>
2022-11-01 10:04:52 -07:00
Tejun Heo
6ab428604f cgroup: Implement DEBUG_CGROUP_REF
It's really difficult to debug when cgroup or css refs leak. Let's add a
debug option to force the refcnt function to not be inlined so that they can
be kprobed for debugging.

Signed-off-by: Tejun Heo <tj@kernel.org>
2022-10-28 10:46:17 -10:00
Alexander Potapenko
921757bc9b Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default
KMSAN adds a lot of instrumentation to the code, which results in
increased stack usage (up to 2048 bytes and more in some cases).  It's
hard to predict how big the stack frames can be, so we disable the
warnings for KMSAN instead.

Link: https://lkml.kernel.org/r/20221024212144.2852069-3-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:23 -07:00
Peter Zijlstra
d49a062621 arch: Introduce CONFIG_FUNCTION_ALIGNMENT
Generic function-alignment infrastructure.

Architectures can select FUNCTION_ALIGNMENT_xxB symbols; the
FUNCTION_ALIGNMENT symbol is then set to the largest such selected
size, 0 otherwise.

From this the -falign-functions compiler argument and __ALIGN macro
are set.

This incorporates the DEBUG_FORCE_FUNCTION_ALIGN_64B knob and future
alignment requirements for x86_64 (later in this series) into a single
place.

NOTE: also removes the 0x90 filler byte from the generic __ALIGN
      primitive, that value makes no sense outside of x86.

NOTE: .balign 0 reverts to a no-op.

Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111143.719248727@infradead.org
2022-10-17 16:40:58 +02:00
Linus Torvalds
2df76606db Kbuild fixes for v6.1
- Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for
    the combination of Clang >= 14 and GAS <= 2.35.
 
  - Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
    the package size.
 
  - Fix modpost error under build environments using musl.
 
  - Make *.ll files keep value names for easier debugging
 
  - Fix single directory build
 
  - Prevent RISC-V from selecting the broken DWARF5 support when Clang
    and GAS are used together.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmNMSCcVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGuVsP/j9FBN3x9S14gAHpu4BAFLK0s31W
 A5sGtmEb1keLqW4oY7/5bcr8KgIrY1extJBeSOJHLB1z/cfU7CHd7bl3+oadZH+z
 BNQ7F9SAHm9GuZoM58TMmC5/Eq0a45bqEP32wvoscyrFQ0ka11aQw/lOZmVTYSgO
 NrTHUSD6NmJCG8hbMiJAH8ch+fziSR0JXOomOwJDxs63aXHhavjZ3z7pgySnuPav
 PD46QtKtpjH8H+gx4nJMqDWjaukGlq7+kVIHhZh3oC5KU23UfUc3d3U+Lpati4+w
 Ggl1pmR5iMsYioQ/MaC58hb06WkamAYRfxKWXvpzEAVGIHF+xhMdGybK4FOPQkQh
 J9Rb358LD1d/QtH6C77wajaEj1FvQLaOQ8CHUDSzjgGwJuz+qrpI8kwtgRxJCXgp
 0+2YQxdfWR2kJ9W7lnyguVjM7AYebqS7bCGm2fDPU92NWftw4y2TJii1v10BCD/N
 dB3orKHPp3mosAS2SdTXgMYYMlzFMzgma0PzibWvm4DE4tHtndRMvW/8c5UyB8uk
 ganuHOUg8Vup79OiANSD6lJrzq0fZofvD3euD61mis6s39GAeHvr5rlwy0xOoN8A
 TgOBu2DQFUKrlZH2m4F+hEBzCz26HTkg8+S5DNpb7Qr2EKDlLPT3xjwhQlooipNc
 KuZNXoR6wEstepn/
 =EZAr
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the
   combination of Clang >= 14 and GAS <= 2.35.

 - Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
   the package size.

 - Fix modpost error under build environments using musl.

 - Make *.ll files keep value names for easier debugging

 - Fix single directory build

 - Prevent RISC-V from selecting the broken DWARF5 support when Clang
   and GAS are used together.

* tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
  kbuild: fix single directory build
  kbuild: add -fno-discard-value-names to cmd_cc_ll_c
  scripts/clang-tools: Convert clang-tidy args to list
  modpost: put modpost options before argument
  kbuild: Stop including vmlinux.bz2 in the rpm's
  Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
  Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
2022-10-16 11:12:22 -07:00
Nathan Chancellor
0a6de78cff lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
When building with a RISC-V kernel with DWARF5 debug info using clang
and the GNU assembler, several instances of the following error appear:

  /tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported

Dumping the .s file reveals these .uleb128 directives come from
.debug_loc and .debug_ranges:

  .Ldebug_loc0:
          .byte   4                               # DW_LLE_offset_pair
          .uleb128 .Lfunc_begin0-.Lfunc_begin0    #   starting offset
          .uleb128 .Ltmp1-.Lfunc_begin0           #   ending offset
          .byte   1                               # Loc expr size
          .byte   90                              # DW_OP_reg10
          .byte   0                               # DW_LLE_end_of_list

  .Ldebug_ranges0:
          .byte   4                               # DW_RLE_offset_pair
          .uleb128 .Ltmp6-.Lfunc_begin0           #   starting offset
          .uleb128 .Ltmp27-.Lfunc_begin0          #   ending offset
          .byte   4                               # DW_RLE_offset_pair
          .uleb128 .Ltmp28-.Lfunc_begin0          #   starting offset
          .uleb128 .Ltmp30-.Lfunc_begin0          #   ending offset
          .byte   0                               # DW_RLE_end_of_list

There is an outstanding binutils issue to support a non-constant operand
to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to
be any movement on it, due to concerns over how it would work with
linker relaxation.

To avoid these build errors, prevent DWARF5 from being selected when
using clang and an assembler that does not have support for these symbol
deltas, which can be easily checked in Kconfig with as-instr plus the
small test program from the dwz test suite from the binutils issue.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215
Link: https://github.com/ClangBuiltLinux/linux/issues/1719
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-17 02:06:47 +09:00
Masahiro Yamada
bb1435f3f5 Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT does not give explicit
-gdwarf-* flag. The actual DWARF version is up to the toolchain.

The combination of GCC and GAS works fine, and Clang with the integrated
assembler is good too.

The combination of Clang and GAS is tricky, but at least, the -g flag
works for Clang <=13, which defaults to DWARF v4.

Clang 14 switched its default to DWARF v5.

Now, CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT has the same issue as
addressed by commit 98cd6f521f ("Kconfig: allow explicit opt in to
DWARF v5").

CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y for Clang >= 14 and
GAS < 2.35 produces a ton of errors like follows:

  /tmp/main-c2741c.s: Assembler messages:
  /tmp/main-c2741c.s:109: Error: junk at end of line, first unrecognized character is `"'
  /tmp/main-c2741c.s:109: Error: file number less than one

Add 'depends on' to check toolchains.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
2022-10-13 02:10:05 +09:00
Masahiro Yamada
4f001a2108 Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
Commit c0a5c81ca9 ("Kconfig.debug: drop GCC 5+ version check for
DWARF5") could have cleaned up the code a bit more.

"CC_IS_CLANG &&" is unneeded. No functional change is intended.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
2022-10-13 02:10:05 +09:00
Linus Torvalds
27bc50fc90 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative
   reports (or any positive ones, come to that).
 
 - Also the Maple Tree from Liam R.  Howlett.  An overlapping range-based
   tree for vmas.  It it apparently slight more efficient in its own right,
   but is mainly targeted at enabling work to reduce mmap_lock contention.
 
   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.
 
   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
   This has yet to be addressed due to Liam's unfortunately timed
   vacation.  He is now back and we'll get this fixed up.
 
 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer.  It uses
   clang-generated instrumentation to detect used-unintialized bugs down to
   the single bit level.
 
   KMSAN keeps finding bugs.  New ones, as well as the legacy ones.
 
 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.
 
 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
   file/shmem-backed pages.
 
 - userfaultfd updates from Axel Rasmussen
 
 - zsmalloc cleanups from Alexey Romanov
 
 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
 
 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.
 
 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.
 
 - memcg cleanups from Kairui Song.
 
 - memcg fixes and cleanups from Johannes Weiner.
 
 - Vishal Moola provides more folio conversions
 
 - Zhang Yi removed ll_rw_block() :(
 
 - migration enhancements from Peter Xu
 
 - migration error-path bugfixes from Huang Ying
 
 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths.  For optimizations by PMEM drivers, DRM
   drivers, etc.
 
 - vma merging improvements from Jakub Matěn.
 
 - NUMA hinting cleanups from David Hildenbrand.
 
 - xu xin added aditional userspace visibility into KSM merging activity.
 
 - THP & KSM code consolidation from Qi Zheng.
 
 - more folio work from Matthew Wilcox.
 
 - KASAN updates from Andrey Konovalov.
 
 - DAMON cleanups from Kaixu Xia.
 
 - DAMON work from SeongJae Park: fixes, cleanups.
 
 - hugetlb sysfs cleanups from Muchun Song.
 
 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
 joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
 bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
 =xfWx
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2022-10-10 17:53:04 -07:00
Linus Torvalds
7f6dcffb44 Preempt RT cleanups:
Introduce preempt_[dis|enable_nested() and use it to clean up
  various places which have open coded PREEMPT_RT conditionals.
 
  On PREEMPT_RT enabled kernels, spinlocks and rwlocks are neither disabling
  preemption nor interrupts. Though there are a few places which depend on
  the implicit preemption/interrupt disable of those locks, e.g. seqcount
  write sections, per CPU statistics updates etc.
 
  PREEMPT_RT added open coded CONFIG_PREEMPT_RT conditionals to
  disable/enable preemption in the related code parts all over the
  place. That's hard to read and does not really explain why this is
  necessary.
 
  Linus suggested to use helper functions (preempt_disable_nested() and
  preempt_enable_nested()) and use those in the affected places. On !RT
  enabled kernels these functions are NOPs, but contain a lockdep assert to
  validate that preemption is actually disabled to catch call sites which
  do not have preemption disabled.
 
  Clean up the affected code paths in mm, dentry and lib.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmM9c8MTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobrrEADHkvkCUHxRlarfinQY2rxEpC4nbnAg
 ibg+LWpDpqqZwkjADExu6+lsbb0mCdvlFyvSPwY2YcQAkj/bkTAXvdf3KjejTl++
 B1J5/Cr5lyyKjajjl1efxdORgATBvwuEjR2moJiU868ZR3K4vgflN9n51A0U+NAn
 3kOj/TYotFlyDNJeoK/8edqZwKaueXs3fsYGC1aq2X8mQLI4QDeaHUR6R8CU4w+X
 bVSIdKNluIYxyc3Eav5sDwzyF6gOSL+9DtZcVyXxJ6+PrkDdkptO23derVHk19WE
 ymdAwVX6S37L6HNhJgqeScs+s3xD8KDmvu5ktEAtqC0unBP8JwOFZKCZaaYj91j3
 iMjMC4UFcXI5sERWhDXTSja2g0pYV6q3myfYfojxe6xXHlrVs42gCzDpOI4LZncM
 lvPfmhb7JR7zEmBEvVyEOX8B16ecWnUqgihU17a3ogGdKW1PRNWcWj3RmNXDmpGD
 YZsZSfsawMSJsDIrNRCydXrsiFBNIoVStN7K7c+blnNV8ER5rt24dqCJyUhrl4fB
 K8hNvDp+T8N0f6nlIUWk42vjhskEo2ijCnpvHSXQc1UL7WmLfaJf3/T9zlufPwqJ
 7yVuWd9vZIb3iVAKz+LqOzLlHcgeJmYlbSBsj+Ay1UHPsNgYulDEKcuNniVoG39u
 zFgHu3OmIRueHA==
 =3M58
 -----END PGP SIGNATURE-----

Merge tag 'sched-rt-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull preempt RT updates from Thomas Gleixner:
 "Introduce preempt_[dis|enable_nested() and use it to clean up various
  places which have open coded PREEMPT_RT conditionals.

  On PREEMPT_RT enabled kernels, spinlocks and rwlocks are neither
  disabling preemption nor interrupts. Though there are a few places
  which depend on the implicit preemption/interrupt disable of those
  locks, e.g. seqcount write sections, per CPU statistics updates etc.

  PREEMPT_RT added open coded CONFIG_PREEMPT_RT conditionals to
  disable/enable preemption in the related code parts all over the
  place. That's hard to read and does not really explain why this is
  necessary.

  Linus suggested to use helper functions (preempt_disable_nested() and
  preempt_enable_nested()) and use those in the affected places. On !RT
  enabled kernels these functions are NOPs, but contain a lockdep assert
  to validate that preemption is actually disabled to catch call sites
  which do not have preemption disabled.

  Clean up the affected code paths in mm, dentry and lib"

* tag 'sched-rt-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  u64_stats: Streamline the implementation
  flex_proportions: Disable preemption entering the write section.
  mm/compaction: Get rid of RT ifdeffery
  mm/memcontrol: Replace the PREEMPT_RT conditionals
  mm/debug: Provide VM_WARN_ON_IRQS_ENABLED()
  mm/vmstat: Use preempt_[dis|en]able_nested()
  dentry: Use preempt_[dis|en]able_nested()
  preempt: Provide preempt_[dis|en]able_nested()
2022-10-10 10:03:24 -07:00
Linus Torvalds
3871d93b82 Perf events updates for v6.1:
- PMU driver updates:
 
      - Add AMD Last Branch Record Extension Version 2 (LbrExtV2)
        feature support for Zen 4 processors.
 
      - Extend the perf ABI to provide branch speculation information,
        if available, and use this on CPUs that have it (eg. LbrExtV2).
 
      - Improve Intel PEBS TSC timestamp handling & integration.
 
      - Add Intel Raptor Lake S CPU support.
 
      - Add 'perf mem' and 'perf c2c' memory profiling support on
        AMD CPUs by utilizing IBS tagged load/store samples.
 
      - Clean up & optimize various x86 PMU details.
 
  - HW breakpoints:
 
      - Big rework to optimize the code for systems with hundreds of CPUs and
        thousands of breakpoints:
 
         - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem
 	  per-CPU rwsem that is read-locked during most of the key operations.
 
 	- Improve the O(#cpus * #tasks) logic in toggle_bp_slot()
 	  and fetch_bp_busy_slots().
 
 	- Apply micro-optimizations & cleanups.
 
   - Misc cleanups & enhancements.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmM/2pMRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iIMA/+J+MCEVTt9kwZeBtHoPX7iZ5gnq1+McoQ
 f6ALX19AO/ZSuA7EBA3cS3Ny5eyGy3ofYUnRW+POezu9CpflLW/5N27R2qkZFrWC
 A09B86WH676ZrmXt+oI05rpZ2y/NGw4gJxLLa4/bWF3g9xLfo21i+YGKwdOnNFpl
 DEdCVHtjlMcOAU3+on6fOYuhXDcYd7PKGcCfLE7muOMOAtwyj0bUDBt7m+hneZgy
 qbZHzDU2DZ5L/LXiMyuZj5rC7V4xUbfZZfXglG38YCW1WTieS3IjefaU2tREhu7I
 rRkCK48ULDNNJR3dZK8IzXJRxusq1ICPG68I+nm/K37oZyTZWtfYZWehW/d/TnPa
 tUiTwimabz7UUqaGq9ZptxwINcAigax0nl6dZ3EseeGhkDE6j71/3kqrkKPz4jth
 +fCwHLOrI3c4Gq5qWgPvqcUlUneKf3DlOMtzPKYg7sMhla2zQmFpYCPzKfm77U/Z
 BclGOH3FiwaK6MIjPJRUXTePXqnUseqCR8PCH/UPQUeBEVHFcMvqCaa15nALed8x
 dFi76VywR9mahouuLNq6sUNePlvDd2B124PygNwegLlBfY9QmKONg9qRKOnQpuJ6
 UprRJjLOOucZ/N/jn6+ShHkqmXsnY2MhfUoHUoMQ0QAI+n++e+2AuePo251kKWr8
 QlqKxd9PMQU=
 =LcGg
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:
 "PMU driver updates:

   - Add AMD Last Branch Record Extension Version 2 (LbrExtV2) feature
     support for Zen 4 processors.

   - Extend the perf ABI to provide branch speculation information, if
     available, and use this on CPUs that have it (eg. LbrExtV2).

   - Improve Intel PEBS TSC timestamp handling & integration.

   - Add Intel Raptor Lake S CPU support.

   - Add 'perf mem' and 'perf c2c' memory profiling support on AMD CPUs
     by utilizing IBS tagged load/store samples.

   - Clean up & optimize various x86 PMU details.

  HW breakpoints:

   - Big rework to optimize the code for systems with hundreds of CPUs
     and thousands of breakpoints:

      - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem
        per-CPU rwsem that is read-locked during most of the key
        operations.

      - Improve the O(#cpus * #tasks) logic in toggle_bp_slot() and
        fetch_bp_busy_slots().

      - Apply micro-optimizations & cleanups.

  - Misc cleanups & enhancements"

* tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
  perf/hw_breakpoint: Annotate tsk->perf_event_mutex vs ctx->mutex
  perf: Fix pmu_filter_match()
  perf: Fix lockdep_assert_event_ctx()
  perf/x86/amd/lbr: Adjust LBR regardless of filtering
  perf/x86/utils: Fix uninitialized var in get_branch_type()
  perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
  perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  perf/x86/amd: Support PERF_SAMPLE_ADDR
  perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
  perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
  perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf/x86/uncore: Add new Raptor Lake S support
  perf/x86/cstate: Add new Raptor Lake S support
  perf/x86/msr: Add new Raptor Lake S support
  perf/x86: Add new Raptor Lake S support
  bpf: Check flags for branch stack in bpf_read_branch_records helper
  perf, hw_breakpoint: Fix use-after-free if perf_event_open() fails
  perf: Use sample_flags for raw_data
  perf: Use sample_flags for addr
  ...
2022-10-10 09:27:46 -07:00
Linus Torvalds
e8bc52cb8d Driver core changes for 6.1-rc1
Here is the big set of driver core and debug printk changes for 6.1-rc1.
 Included in here is:
 	- dynamic debug updates for the core and the drm subsystem.  The
 	  drm changes have all been acked by the relevant maintainers.
 	- kernfs fixes for syzbot reported problems
 	- kernfs refactors and updates for cgroup requirements
 	- magic number cleanups and removals from the kernel tree (they
 	  were not being used and they really did not actually do
 	  anything.)
 	- other tiny cleanups
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY0BYUA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylozwCdFRlcghaf7XBUyNgRZRwMC+oQI8EAn1G/nEDE
 6aFd2er41uK0IGQnSmYO
 =OK0k
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the big set of driver core and debug printk changes for
  6.1-rc1. Included in here is:

   - dynamic debug updates for the core and the drm subsystem. The drm
     changes have all been acked by the relevant maintainers

   - kernfs fixes for syzbot reported problems

   - kernfs refactors and updates for cgroup requirements

   - magic number cleanups and removals from the kernel tree (they were
     not being used and they really did not actually do anything)

   - other tiny cleanups

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (74 commits)
  docs: filesystems: sysfs: Make text and code for ->show() consistent
  Documentation: NBD_REQUEST_MAGIC isn't a magic number
  a.out: restore CMAGIC
  device property: Add const qualifier to device_get_match_data() parameter
  drm_print: add _ddebug descriptor to drm_*dbg prototypes
  drm_print: prefer bare printk KERN_DEBUG on generic fn
  drm_print: optimize drm_debug_enabled for jump-label
  drm-print: add drm_dbg_driver to improve namespace symmetry
  drm-print.h: include dyndbg header
  drm_print: wrap drm_*_dbg in dyndbg descriptor factory macro
  drm_print: interpose drm_*dbg with forwarding macros
  drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers.
  drm_print: condense enum drm_debug_category
  debugfs: use DEFINE_SHOW_ATTRIBUTE to define debugfs_regset32_fops
  driver core: use IS_ERR_OR_NULL() helper in device_create_groups_vargs()
  Documentation: ENI155_MAGIC isn't a magic number
  Documentation: NBD_REPLY_MAGIC isn't a magic number
  nbd: remove define-only NBD_MAGIC, previously magic number
  Documentation: FW_HEADER_MAGIC isn't a magic number
  Documentation: EEPROM_MAGIC_VALUE isn't a magic number
  ...
2022-10-07 17:04:10 -07:00
Linus Torvalds
d0989d01c6 hardening updates for v6.1-rc1
Various fixes across several hardening areas:
 
 - loadpin: Fix verity target enforcement (Matthias Kaehlcke).
 
 - zero-call-used-regs: Add missing clobbers in paravirt (Bill Wendling).
 
 - CFI: clean up sparc function pointer type mismatches (Bart Van Assche).
 
 - Clang: Adjust compiler flag detection for various Clang changes (Sami
   Tolvanen, Kees Cook).
 
 - fortify: Fix warnings in arch-specific code in sh, ARM, and xen.
 
 Improvements to existing features:
 
 - testing: improve overflow KUnit test, introduce fortify KUnit test,
   add more coverage to LKDTM tests (Bart Van Assche, Kees Cook).
 
 - overflow: Relax overflow type checking for wider utility.
 
 New features:
 
 - string: Introduce strtomem() and strtomem_pad() to fill a gap in
   strncpy() replacement needs.
 
 - um: Enable FORTIFY_SOURCE support.
 
 - fortify: Enable run-time struct member memcpy() overflow warning.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmM4chcWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJvq1D/9uKU03RozAOnzhi4gcgRnHZSAK
 oOQOkPwnkUgFU0yOnMkNYOZ7njLnM+CjCN3RJ9SSpD2lrQ23PwLeThAuOzy0brPO
 0iAksIztSF3e5tAyFjtFkjswrY8MSv/TkF0WttTOSOj3lCUcwatF0FBkclCOXtwu
 ILXfG7K8E17r/wsUejN+oMAI42ih/YeVQAZpKRymEEJsK+Lly7OT4uu3fdFWVb1P
 M77eRLI2Vg1eSgMVwv6XdwGakpUdwsboK7do0GGX+JOrhayJoCfY2IpwyPz9ciel
 jsp9OQs8NrlPJMa2sQ7LDl+b5EQl/MtggX3JlQEbLs2LV7gDtYgAWNo6vxCT5Lvd
 zB7TZqIR3lrVjbtw4FAKQ+41bS4VOajk2NB3Mkiy5AfivB+6zKF+P56a+xSoNhOl
 iktpjCEP7bp4oxmTMXpOfmywjh/ZsyoMhQ2ABP7S+JZ5rHUndpPAjjuBetIcHxX2
 28Wlr4aFIF9ff9caasg4sMYXcQMGnuLUlUKngceUbd1umZZRNZ1gaIxYpm9poefm
 qd/lvTIvzn9V8IB8wHVmvafbvDbV88A+2bKJdSUDA352Dt9PvqT7yI0dmbMNliGL
 os+iLPW6Y6x38BxhXax0HR9FEhO3Eq7kLdNdc4J29NvISg8HHaifwNrG41lNwaWL
 cuc6IAjLxiRk3NsUpg==
 =HZ6+
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kernel hardening updates from Kees Cook:
 "Most of the collected changes here are fixes across the tree for
  various hardening features (details noted below).

  The most notable new feature here is the addition of the memcpy()
  overflow warning (under CONFIG_FORTIFY_SOURCE), which is the next step
  on the path to killing the common class of "trivially detectable"
  buffer overflow conditions (i.e. on arrays with sizes known at compile
  time) that have resulted in many exploitable vulnerabilities over the
  years (e.g. BleedingTooth).

  This feature is expected to still have some undiscovered false
  positives. It's been in -next for a full development cycle and all the
  reported false positives have been fixed in their respective trees.
  All the known-bad code patterns we could find with Coccinelle are also
  either fixed in their respective trees or in flight.

  The commit message in commit 54d9469bc5 ("fortify: Add run-time WARN
  for cross-field memcpy()") for the feature has extensive details, but
  I'll repeat here that this is a warning _only_, and is not intended to
  actually block overflows (yet). The many patches fixing array sizes
  and struct members have been landing for several years now, and we're
  finally able to turn this on to find any remaining stragglers.

  Summary:

  Various fixes across several hardening areas:

   - loadpin: Fix verity target enforcement (Matthias Kaehlcke).

   - zero-call-used-regs: Add missing clobbers in paravirt (Bill
     Wendling).

   - CFI: clean up sparc function pointer type mismatches (Bart Van
     Assche).

   - Clang: Adjust compiler flag detection for various Clang changes
     (Sami Tolvanen, Kees Cook).

   - fortify: Fix warnings in arch-specific code in sh, ARM, and xen.

  Improvements to existing features:

   - testing: improve overflow KUnit test, introduce fortify KUnit test,
     add more coverage to LKDTM tests (Bart Van Assche, Kees Cook).

   - overflow: Relax overflow type checking for wider utility.

  New features:

   - string: Introduce strtomem() and strtomem_pad() to fill a gap in
     strncpy() replacement needs.

   - um: Enable FORTIFY_SOURCE support.

   - fortify: Enable run-time struct member memcpy() overflow warning"

* tag 'hardening-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (27 commits)
  Makefile.extrawarn: Move -Wcast-function-type-strict to W=1
  hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zero
  sparc: Unbreak the build
  x86/paravirt: add extra clobbers with ZERO_CALL_USED_REGS enabled
  x86/paravirt: clean up typos and grammaros
  fortify: Convert to struct vs member helpers
  fortify: Explicitly check bounds are compile-time constants
  x86/entry: Work around Clang __bdos() bug
  ARM: decompressor: Include .data.rel.ro.local
  fortify: Adjust KUnit test for modular build
  sh: machvec: Use char[] for section boundaries
  kunit/memcpy: Avoid pathological compile-time string size
  lib: Improve the is_signed_type() kunit test
  LoadPin: Require file with verity root digests to have a header
  dm: verity-loadpin: Only trust verity targets with enforcement
  LoadPin: Fix Kconfig doc about format of file with verity digests
  um: Enable FORTIFY_SOURCE
  lkdtm: Update tests for memcpy() run-time warnings
  fortify: Add run-time WARN for cross-field memcpy()
  fortify: Use SIZE_MAX instead of (size_t)-1
  ...
2022-10-03 17:24:22 -07:00
Alexander Potapenko
f80be4571b kmsan: add KMSAN runtime core
For each memory location KernelMemorySanitizer maintains two types of
metadata:

1. The so-called shadow of that location - а byte:byte mapping describing
   whether or not individual bits of memory are initialized (shadow is 0)
   or not (shadow is 1).
2. The origins of that location - а 4-byte:4-byte mapping containing
   4-byte IDs of the stack traces where uninitialized values were
   created.

Each struct page now contains pointers to two struct pages holding KMSAN
metadata (shadow and origins) for the original struct page.  Utility
routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the metadata
creation, addressing, copying and checking.  mm/kmsan/report.c performs
error reporting in the cases an uninitialized value is used in a way that
leads to undefined behavior.

KMSAN compiler instrumentation is responsible for tracking the metadata
along with the kernel memory.  mm/kmsan/instrumentation.c provides the
implementation for instrumentation hooks that are called from files
compiled with -fsanitize=kernel-memory.

To aid parameter passing (also done at instrumentation level), each
task_struct now contains a struct kmsan_task_state used to track the
metadata of function parameters and return values for that task.

Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and declares
CFLAGS_KMSAN, which are applied to files compiled with KMSAN.  The
KMSAN_SANITIZE:=n Makefile directive can be used to completely disable
KMSAN instrumentation for certain files.

Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly
created stack memory initialized.

Users can also use functions from include/linux/kmsan-checks.h to mark
certain memory regions as uninitialized or initialized (this is called
"poisoning" and "unpoisoning") or check that a particular region is
initialized.

Link: https://lkml.kernel.org/r/20220915150417.722975-12-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:19 -07:00
Peter Zijlstra
a1ebcd5943 Linux 6.0-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmMwwY4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGdlwH/0ESzdb6F9zYWwHR
 E08har56/IfwjOsn1y+JuHibpwUjzskLzdwIfI5zshSZAQTj5/UyC0P7G/wcYh/Z
 INh1uHGazmDUkx4O3lwuWLR+mmeUxZRWdq4NTwYDRNPMSiPInVxz+cZJ7y0aPr2e
 wii7kMFRHgXmX5DMDEwuHzehsJF7vZrp8zBu2DqzVUGnbwD50nPbyMM3H4g9mute
 fAEpDG0X3+smqMaKL+2rK0W/Av/87r3U8ZAztBem3nsCJ9jT7hqMO1ICcKmFMviA
 DTERRMwWjPq+mBPE2CiuhdaXvNZBW85Ds81mSddS6MsO6+Tvuzfzik/zSLQJxlBi
 vIqYphY=
 =NqG+
 -----END PGP SIGNATURE-----

Merge branch 'v6.0-rc7'

Merge upstream to get RAPTORLAKE_S

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-09-29 12:20:50 +02:00
Miguel Ojeda
2f7ab1267d Kbuild: add Rust support
Having most of the new files in place, we now enable Rust support
in the build system, including `Kconfig` entries related to Rust,
the Rust configuration printer and a few other bits.

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com>
Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com>
Co-developed-by: Finn Behrens <me@kloenk.de>
Signed-off-by: Finn Behrens <me@kloenk.de>
Co-developed-by: Adam Bratschi-Kaye <ark.email@gmail.com>
Signed-off-by: Adam Bratschi-Kaye <ark.email@gmail.com>
Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Co-developed-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Co-developed-by: Sven Van Asbroeck <thesven73@gmail.com>
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Co-developed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Co-developed-by: Boris-Chengbiao Zhou <bobo1239@web.de>
Signed-off-by: Boris-Chengbiao Zhou <bobo1239@web.de>
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Co-developed-by: Douglas Su <d0u9.su@outlook.com>
Signed-off-by: Douglas Su <d0u9.su@outlook.com>
Co-developed-by: Dariusz Sosnowski <dsosnowski@dsosnowski.pl>
Signed-off-by: Dariusz Sosnowski <dsosnowski@dsosnowski.pl>
Co-developed-by: Antonio Terceiro <antonio.terceiro@linaro.org>
Signed-off-by: Antonio Terceiro <antonio.terceiro@linaro.org>
Co-developed-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Co-developed-by: Björn Roy Baron <bjorn3_gh@protonmail.com>
Signed-off-by: Björn Roy Baron <bjorn3_gh@protonmail.com>
Co-developed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Signed-off-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-09-28 09:02:20 +02:00
Liam R. Howlett
7964cf8caa mm: remove vmacache
By using the maple tree and the maple tree state, the vmacache is no
longer beneficial and is complicating the VMA code.  Remove the vmacache
to reduce the work in keeping it up to date and code complexity.

Link: https://lkml.kernel.org/r/20220906194824.2110408-26-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:18 -07:00
Liam R. Howlett
54a611b605 Maple Tree: add new data structure
Patch series "Introducing the Maple Tree"

The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface.  If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes.  With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses.  The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.

The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
long term goal is to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be
allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.

Davidlor said

: Yes I like the maple tree, and at this stage I don't think we can ask for
: more from this series wrt the MM - albeit there seems to still be some
: folks reporting breakage.  Fundamentally I see Liam's work to (re)move
: complexity out of the MM (not to say that the actual maple tree is not
: complex) by consolidating the three complimentary data structures very
: much worth it considering performance does not take a hit.  This was very
: much a turn off with the range locking approach, which worst case scenario
: incurred in prohibitive overhead.  Also as Liam and Matthew have
: mentioned, RCU opens up a lot of nice performance opportunities, and in
: addition academia[1] has shown outstanding scalability of address spaces
: with the foundation of replacing the locked rbtree with RCU aware trees.

A similar work has been discovered in the academic press

	https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf

Sheer coincidence.  We designed our tree with the intention of solving the
hardest problem first.  Upon settling on a b-tree variant and a rough
outline, we researched ranged based b-trees and RCU b-trees and did find
that article.  So it was nice to find reassurances that we were on the
right path, but our design choice of using ranges made that paper unusable
for us.

This patch (of 70):

The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface.  If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes.  With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses.  The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.

The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
long term goal is to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be
allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.

There is additional BUG_ON() calls added within the tree, most of which
are in debug code.  These will be replaced with a WARN_ON() call in the
future.  There is also additional BUG_ON() calls within the code which
will also be reduced in number at a later date.  These exist to catch
things such as out-of-range accesses which would crash anyways.

Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:13 -07:00
Nick Desaulniers
32ef9e5054 Makefile.debug: re-enable debug info for .S files
Alexey reported that the fraction of unknown filename instances in
kallsyms grew from ~0.3% to ~10% recently; Bill and Greg tracked it down
to assembler defined symbols, which regressed as a result of:

commit b8a9092330 ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")

In that commit, I allude to restoring debug info for assembler defined
symbols in a follow up patch, but it seems I forgot to do so in

commit a66049e2cf ("Kbuild: make DWARF version a choice")

Link: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=31bf18645d98b4d3d7357353be840e320649a67d
Fixes: b8a9092330 ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
Reported-by: Alexey Alexandrov <aalexand@google.com>
Reported-by: Bill Wendling <morbo@google.com>
Reported-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-09-24 11:19:19 +09:00
Thomas Gleixner
a738e9bad6 mm/debug: Provide VM_WARN_ON_IRQS_ENABLED()
Some places in the VM code expect interrupts disabled, which is a valid
expectation on non-PREEMPT_RT kernels, but does not hold on RT kernels in
some places because the RT spinlock substitution does not disable
interrupts.

To avoid sprinkling CONFIG_PREEMPT_RT conditionals into those places,
provide VM_WARN_ON_IRQS_ENABLED() which is only enabled when VM_DEBUG=y and
PREEMPT_RT=n.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lore.kernel.org/r/20220825164131.402717-5-bigeasy@linutronix.de
2022-09-19 14:35:08 +02:00
Greg Kroah-Hartman
a791dc1353 Linux 6.0-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmMeQ2keHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGYRMH+gLNHiGirGZlm2GQ
 tKaZQUy7MiXuIP0hGDonDIIIAmIVhnjm9MDG8KT4W8AvEd7ukncyYqJfwWeWQPhP
 4mZcf6l3Z8Ke+qiaFpXpMPCxTyWcln1ox0EoNx2g9gdPxZntaRuuaTQVljUfTiey
 aVPHxve8ip3G7jDoJnuLSxESOqWxkb8v/SshBP1E5bF5BZ+cgZRqq7FNigFqxjbk
 wF29K09BVOPjdgkSvY/b0/SnL5KlSdMAv+FrPcJNGivcdIPgf/qJks5cI2HRUo7o
 CpKgbcLorCVyD+d+zLonJBwIy3arbmKD8JqYnfdTSIqVOUqHXWUDfeydsH32u1Gu
 lPSI2Hw=
 =7LTL
 -----END PGP SIGNATURE-----

Merge 6.0-rc5 into driver-core-next

We need the driver core and debugfs changes in this branch.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-12 16:51:22 +02:00
Kees Cook
875bfd5276 fortify: Add KUnit test for FORTIFY_SOURCE internals
Add lib/fortify_kunit.c KUnit test for checking the expected behavioral
characteristics of FORTIFY_SOURCE internals.

Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Sander Vanheule <sander@svanheule.net>
Cc: linux-hardening@vger.kernel.org
Cc: llvm@lists.linux.dev
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2022-09-07 16:37:26 -07:00
Jim Cromie
683263a5e0 dyndbg: add test_dynamic_debug module
Provide a simple module to allow testing DYNAMIC_DEBUG behavior.  It
calls do_prints() from module-init, and with a sysfs-node.

  dmesg -C
  dmesg -w &
  modprobe test_dynamic_debug dyndbg=+p
  echo 1 > /sys/module/dynamic_debug/parameters/verbose

  cat /sys/module/test_dynamic_debug/parameters/do_prints
  echo module test_dynamic_debug +mftl > /proc/dynamic_debug/control
  echo junk > /sys/module/test_dynamic_debug/parameters/do_prints

Acked-by: Jason Baron <jbaron@akamai.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20220904214134.408619-9-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-07 16:58:23 +02:00
Bart Van Assche
addbeea6f5 testing/selftests: Add tests for the is_signed_type() macro
Although not documented, is_signed_type() must support the 'bool' and
pointer types next to scalar and enumeration types. Add a selftest that
verifies that this macro handles all supported types correctly.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Isabella Basso <isabbasso@riseup.net>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sander Vanheule <sander@svanheule.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Isabella Basso <isabbasso@riseup.net>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220826162116.1050972-2-bvanassche@acm.org
2022-08-31 10:54:05 -07:00
Marco Elver
724c299c6a perf/hw_breakpoint: Add KUnit test for constraints accounting
Add KUnit test for hw_breakpoint constraints accounting, with various
interesting mixes of breakpoint targets (some care was taken to catch
interesting corner cases via bug-injection).

The test cannot be built as a module because it requires access to
hw_breakpoint_slots(), which is not inlinable or exported on all
architectures.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-2-elver@google.com
2022-08-30 10:56:20 +02:00
Sander Vanheule
d3c0ca4992 lib/test_cpumask: follow KUnit style guidelines
The cpumask test suite doesn't follow the KUnit style guidelines, as
laid out in Documentation/dev-tools/kunit/style.rst.  The file is
renamed to lib/cpumask_kunit.c to clearly distinguish it from other,
non-KUnit, tests.

Link: https://lore.kernel.org/lkml/346cb279-8e75-24b0-7d12-9803f2b41c73@riseup.net/
Suggested-by: Maíra Canal <mairacanal@riseup.net>
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Reviewed-by: Maíra Canal <mairacanal@riseup.net>
Reviewed-by: David Gow <davidgow@google.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-08-24 08:35:42 -07:00
Linus Torvalds
eb5699ba31 Updates to various subsystems which I help look after. lib, ocfs2,
fatfs, autofs, squashfs, procfs, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYu9BeQAKCRDdBJ7gKXxA
 jp1DAP4mjCSvAwYzXklrIt+Knv3CEY5oVVdS+pWOAOGiJpldTAD9E5/0NV+VmlD9
 kwS/13j38guulSlXRzDLmitbg81zAAI=
 =Zfum
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc updates from Andrew Morton:
 "Updates to various subsystems which I help look after. lib, ocfs2,
  fatfs, autofs, squashfs, procfs, etc. A relatively small amount of
  material this time"

* tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
  scripts/gdb: ensure the absolute path is generated on initial source
  MAINTAINERS: kunit: add David Gow as a maintainer of KUnit
  mailmap: add linux.dev alias for Brendan Higgins
  mailmap: update Kirill's email
  profile: setup_profiling_timer() is moslty not implemented
  ocfs2: fix a typo in a comment
  ocfs2: use the bitmap API to simplify code
  ocfs2: remove some useless functions
  lib/mpi: fix typo 'the the' in comment
  proc: add some (hopefully) insightful comments
  bdi: remove enum wb_congested_state
  kernel/hung_task: fix address space of proc_dohung_task_timeout_secs
  lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t()
  squashfs: support reading fragments in readahead call
  squashfs: implement readahead
  squashfs: always build "file direct" version of page actor
  Revert "squashfs: provide backing_dev_info in order to disable read-ahead"
  fs/ocfs2: Fix spelling typo in comment
  ia64: old_rr4 added under CONFIG_HUGETLB_PAGE
  proc: fix test for "vsyscall=xonly" boot option
  ...
2022-08-07 10:03:24 -07:00
Linus Torvalds
6614a3c316 - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
 
 - Some kmemleak fixes from Patrick Wang and Waiman Long
 
 - DAMON updates from SeongJae Park
 
 - memcg debug/visibility work from Roman Gushchin
 
 - vmalloc speedup from Uladzislau Rezki
 
 - more folio conversion work from Matthew Wilcox
 
 - enhancements for coherent device memory mapping from Alex Sierra
 
 - addition of shared pages tracking and CoW support for fsdax, from
   Shiyang Ruan
 
 - hugetlb optimizations from Mike Kravetz
 
 - Mel Gorman has contributed some pagealloc changes to improve latency
   and realtime behaviour.
 
 - mprotect soft-dirty checking has been improved by Peter Xu
 
 - Many other singleton patches all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
 jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
 SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
 =w/UH
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Most of the MM queue. A few things are still pending.

  Liam's maple tree rework didn't make it. This has resulted in a few
  other minor patch series being held over for next time.

  Multi-gen LRU still isn't merged as we were waiting for mapletree to
  stabilize. The current plan is to merge MGLRU into -mm soon and to
  later reintroduce mapletree, with a view to hopefully getting both
  into 6.1-rc1.

  Summary:

   - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
     Lin, Yang Shi, Anshuman Khandual and Mike Rapoport

   - Some kmemleak fixes from Patrick Wang and Waiman Long

   - DAMON updates from SeongJae Park

   - memcg debug/visibility work from Roman Gushchin

   - vmalloc speedup from Uladzislau Rezki

   - more folio conversion work from Matthew Wilcox

   - enhancements for coherent device memory mapping from Alex Sierra

   - addition of shared pages tracking and CoW support for fsdax, from
     Shiyang Ruan

   - hugetlb optimizations from Mike Kravetz

   - Mel Gorman has contributed some pagealloc changes to improve
     latency and realtime behaviour.

   - mprotect soft-dirty checking has been improved by Peter Xu

   - Many other singleton patches all over the place"

 [ XFS merge from hell as per Darrick Wong in

   https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]

* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
  tools/testing/selftests/vm/hmm-tests.c: fix build
  mm: Kconfig: fix typo
  mm: memory-failure: convert to pr_fmt()
  mm: use is_zone_movable_page() helper
  hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
  hugetlbfs: cleanup some comments in inode.c
  hugetlbfs: remove unneeded header file
  hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
  hugetlbfs: use helper macro SZ_1{K,M}
  mm: cleanup is_highmem()
  mm/hmm: add a test for cross device private faults
  selftests: add soft-dirty into run_vmtests.sh
  selftests: soft-dirty: add test for mprotect
  mm/mprotect: fix soft-dirty check in can_change_pte_writable()
  mm: memcontrol: fix potential oom_lock recursion deadlock
  mm/gup.c: fix formatting in check_and_migrate_movable_page()
  xfs: fail dax mount if reflink is enabled on a partition
  mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
  userfaultfd: don't fail on unrecognized features
  hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
  ...
2022-08-05 16:32:45 -07:00
Linus Torvalds
cfeafd9466 Driver core / kernfs changes for 6.0-rc1
Here is the set of driver core and kernfs changes for 6.0-rc1.
 
 "biggest" thing in here is some scalability improvements for kernfs for
 large systems.  Other than that, included in here are:
 	- arch topology and cache info changes that have been reviewed
 	  and discussed a lot.
 	- potential error path cleanup fixes
 	- deferred driver probe cleanups
 	- firmware loader cleanups and tweaks
 	- documentation updates
 	- other small things
 
 All of these have been in the linux-next tree for a while with no
 reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYuqCnw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ym/JgCcCnaycJY00ZPRQm3LQCyzfJ0HgqoAn2qxGV+K
 NKycLeXZSnuvIA87dycE
 =/4Jk
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core / kernfs updates from Greg KH:
 "Here is the set of driver core and kernfs changes for 6.0-rc1.

  The "biggest" thing in here is some scalability improvements for
  kernfs for large systems. Other than that, included in here are:

   - arch topology and cache info changes that have been reviewed and
     discussed a lot.

   - potential error path cleanup fixes

   - deferred driver probe cleanups

   - firmware loader cleanups and tweaks

   - documentation updates

   - other small things

  All of these have been in the linux-next tree for a while with no
  reported problems"

* tag 'driver-core-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (63 commits)
  docs: embargoed-hardware-issues: fix invalid AMD contact email
  firmware_loader: Replace kmap() with kmap_local_page()
  sysfs docs: ABI: Fix typo in comment
  kobject: fix Kconfig.debug "its" grammar
  kernfs: Fix typo 'the the' in comment
  docs: driver-api: firmware: add driver firmware guidelines. (v3)
  arch_topology: Fix cache attributes detection in the CPU hotplug path
  ACPI: PPTT: Leave the table mapped for the runtime usage
  cacheinfo: Use atomic allocation for percpu cache attributes
  drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist
  MAINTAINERS: Change mentions of mpm to olivia
  docs: ABI: sysfs-devices-soc: Update Lee Jones' email address
  docs: ABI: sysfs-class-pwm: Update Lee Jones' email address
  Documentation/process: Add embargoed HW contact for LLVM
  Revert "kernfs: Change kernfs_notify_list to llist."
  ACPI: Remove the unused find_acpi_cpu_cache_topology()
  arch_topology: Warn that topology for nested clusters is not supported
  arch_topology: Add support for parsing sockets in /cpu-map
  arch_topology: Set cluster identifier in each core/thread from /cpu-map
  arch_topology: Limit span of cpu_clustergroup_mask()
  ...
2022-08-04 11:31:20 -07:00
Randy Dunlap
b6c694740e kobject: fix Kconfig.debug "its" grammar
Use the possessive "its" instead of the contraction "it's"
where appropriate.

Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20220715015959.12657-1-rdunlap@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-28 16:30:35 +02:00
Sander Vanheule
c41e8866c2 lib/test: introduce cpumask KUnit test suite
Add a basic suite of tests for cpumask, providing some tests for empty and
completely filled cpumasks.

Link: https://lkml.kernel.org/r/c96980ec35c3bd23f17c3374bf42c22971545e85.1656777646.git.sander@svanheule.net
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Suggested-by: Yury Norov <yury.norov@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:41 -07:00
Mauro Carvalho Chehab
d6a21f2d73 objtool: update objtool.txt references
Changeset a8e35fece4 ("objtool: Update documentation")
renamed: tools/objtool/Documentation/stack-validation.txt
to: tools/objtool/Documentation/objtool.txt.

Update the cross-references accordingly.

Fixes: a8e35fece4 ("objtool: Update documentation")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/ec285ece6348a5be191aebe45f78d06b3319056b.1656234456.git.mchehab@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-07 13:10:00 -06:00
Roman Gushchin
5035ebc644 mm: shrinkers: introduce debugfs interface for memory shrinkers
This commit introduces the /sys/kernel/debug/shrinker debugfs interface
which provides an ability to observe the state of individual kernel memory
shrinkers.

Because the feature adds some memory overhead (which shouldn't be large
unless there is a huge amount of registered shrinkers), it's guarded by a
config option (enabled by default).

This commit introduces the "count" interface for each shrinker registered
in the system.

The output is in the following format:
<cgroup inode id> <nr of objects on node 0> <nr of objects on node 1>...
<cgroup inode id> <nr of objects on node 0> <nr of objects on node 1>...
...

To reduce the size of output on machines with many thousands cgroups, if
the total number of objects on all nodes is 0, the line is omitted.

If the shrinker is not memcg-aware or CONFIG_MEMCG is off, 0 is printed as
cgroup inode id.  If the shrinker is not numa-aware, 0's are printed for
all nodes except the first one.

This commit gives debugfs entries simple numeric names, which are not very
convenient.  The following commit in the series will provide shrinkers
with more meaningful names.

[akpm@linux-foundation.org: remove WARN_ON_ONCE(), per Roman]
  Reported-by: syzbot+300d27c79fe6d4cbcc39@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/20220601032227.4076670-3-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:40 -07:00
Linus Torvalds
9d004b2f4f cxl for 5.19
- Add driver-core infrastructure for lockdep validation of
   device_lock(), and fixup a deadlock report that was previously hidden
   behind the 'lockdep no validate' policy.
 
 - Add CXL _OSC support for claiming native control of CXL hotplug and
   error handling.
 
 - Disable suspend in the presence of CXL memory unless and until a
   protocol is identified for restoring PCI device context from memory
   hosted on CXL PCI devices.
 
 - Add support for snooping CXL mailbox commands to protect against
   inopportune changes, like set-partition with the 'immediate' flag set.
 
 - Rework how the driver detects legacy CXL 1.1 configurations (CXL DVSEC
   / 'mem_enable') before enabling new CXL 2.0 decode configurations (CXL
   HDM Capability).
 
 - Miscellaneous cleanups and fixes from -next exposure.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYpFUogAKCRDfioYZHlFs
 Zz+VAP9o/NkYhbaM2Ne9ImgsdJii96gA8nN7q/q/ZoXjsSx2WQD+NRC5d3ZwZDCa
 9YKEkntnvbnAZOCs+ZUuyZBgNh6vsgU=
 =p92w
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl updates from Dan Williams:
 "Compute Express Link (CXL) updates for this cycle.

  The highlight is new driver-core infrastructure and CXL subsystem
  changes for allowing lockdep to validate device_lock() usage. Thanks
  to PeterZ for setting me straight on the current capabilities of the
  lockdep API, and Greg acked it as well.

  On the CXL ACPI side this update adds support for CXL _OSC so that
  platform firmware knows that it is safe to still grant Linux native
  control of PCIe hotplug and error handling in the presence of CXL
  devices. A circular dependency problem was discovered between suspend
  and CXL memory for cases where the suspend image might be stored in
  CXL memory where that image also contains the PCI register state to
  restore to re-enable the device. Disable suspend for now until an
  architecture is defined to clarify that conflict.

  Lastly a collection of reworks, fixes, and cleanups to the CXL
  subsystem where support for snooping mailbox commands and properly
  handling the "mem_enable" flow are the highlights.

  Summary:

   - Add driver-core infrastructure for lockdep validation of
     device_lock(), and fixup a deadlock report that was previously
     hidden behind the 'lockdep no validate' policy.

   - Add CXL _OSC support for claiming native control of CXL hotplug and
     error handling.

   - Disable suspend in the presence of CXL memory unless and until a
     protocol is identified for restoring PCI device context from memory
     hosted on CXL PCI devices.

   - Add support for snooping CXL mailbox commands to protect against
     inopportune changes, like set-partition with the 'immediate' flag
     set.

   - Rework how the driver detects legacy CXL 1.1 configurations (CXL
     DVSEC / 'mem_enable') before enabling new CXL 2.0 decode
     configurations (CXL HDM Capability).

   - Miscellaneous cleanups and fixes from -next exposure"

* tag 'cxl-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (47 commits)
  cxl/port: Enable HDM Capability after validating DVSEC Ranges
  cxl/port: Reuse 'struct cxl_hdm' context for hdm init
  cxl/port: Move endpoint HDM Decoder Capability init to port driver
  cxl/pci: Drop @info argument to cxl_hdm_decode_init()
  cxl/mem: Merge cxl_dvsec_ranges() and cxl_hdm_decode_init()
  cxl/mem: Skip range enumeration if mem_enable clear
  cxl/mem: Consolidate CXL DVSEC Range enumeration in the core
  cxl/pci: Move cxl_await_media_ready() to the core
  cxl/mem: Validate port connectivity before dvsec ranges
  cxl/mem: Fix cxl_mem_probe() error exit
  cxl/pci: Drop wait_for_valid() from cxl_await_media_ready()
  cxl/pci: Consolidate wait_for_media() and wait_for_media_ready()
  cxl/mem: Drop mem_enabled check from wait_for_media()
  nvdimm: Fix firmware activation deadlock scenarios
  device-core: Kill the lockdep_mutex
  nvdimm: Drop nd_device_lock()
  ACPI: NFIT: Drop nfit_device_lock()
  nvdimm: Replace lockdep_mutex with local lock classes
  cxl: Drop cxl_device_lock()
  cxl/acpi: Add root device lockdep validation
  ...
2022-05-27 21:24:19 -07:00
Linus Torvalds
8291eaafed Two followon fixes for the post-5.19 series "Use pageblock_order for cma
and alloc_contig_range alignment", from Zi Yan.
 
 A series of z3fold cleanups and fixes from Miaohe Lin.
 
 Some memcg selftests work from Michal Koutný <mkoutny@suse.com>
 
 Some swap fixes and cleanups from Miaohe Lin.
 
 Several individual minor fixups.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYpEE7QAKCRDdBJ7gKXxA
 jlamAP9WmjNdx+5Pz5OkkaSjBO7y7vBrBTcQ9e5pz8bUWRoQhwEA+WtsssLmq9aI
 7DBDmBKYCMTbzOQTqaMRHkB+JWZo+Ao=
 =L3f1
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - Two follow-on fixes for the post-5.19 series "Use pageblock_order for
   cma and alloc_contig_range alignment", from Zi Yan.

 - A series of z3fold cleanups and fixes from Miaohe Lin.

 - Some memcg selftests work from Michal Koutný <mkoutny@suse.com>

 - Some swap fixes and cleanups from Miaohe Lin

 - Several individual minor fixups

* tag 'mm-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits)
  mm/shmem.c: suppress shift warning
  mm: Kconfig: reorganize misplaced mm options
  mm: kasan: fix input of vmalloc_to_page()
  mm: fix is_pinnable_page against a cma page
  mm: filter out swapin error entry in shmem mapping
  mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
  mm/swapfile: fix lost swap bits in unuse_pte()
  mm/swapfile: unuse_pte can map random data if swap read fails
  selftests: memcg: factor out common parts of memory.{low,min} tests
  selftests: memcg: remove protection from top level memcg
  selftests: memcg: adjust expected reclaim values of protected cgroups
  selftests: memcg: expect no low events in unprotected sibling
  selftests: memcg: fix compilation
  mm/z3fold: fix z3fold_page_migrate races with z3fold_map
  mm/z3fold: fix z3fold_reclaim_page races with z3fold_free
  mm/z3fold: always clear PAGE_CLAIMED under z3fold page lock
  mm/z3fold: put z3fold page back into unbuddied list when reclaim or migration fails
  revert "mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"
  mm/z3fold: throw warning on failure of trylock_page in z3fold_alloc
  ...
2022-05-27 11:40:49 -07:00
Linus Torvalds
6f664045c8 Not a lot of material this cycle. Many singleton patches against various
subsystems.   Most notably some maintenance work in ocfs2 and initramfs.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo/6xQAKCRDdBJ7gKXxA
 jkD9AQCPczLBbRWpe1edL+5VHvel9ePoHQmvbHQnufdTh9rB5QEAu0Uilxz4q9cx
 xSZypNhj2n9f8FCYca/ZrZneBsTnAA8=
 =AJEO
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc updates from Andrew Morton:
 "The non-MM patch queue for this merge window.

  Not a lot of material this cycle. Many singleton patches against
  various subsystems. Most notably some maintenance work in ocfs2
  and initramfs"

* tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (65 commits)
  kcov: update pos before writing pc in trace function
  ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
  ocfs2: dlmfs: don't clear USER_LOCK_ATTACHED when destroying lock
  fs/ntfs: remove redundant variable idx
  fat: remove time truncations in vfat_create/vfat_mkdir
  fat: report creation time in statx
  fat: ignore ctime updates, and keep ctime identical to mtime in memory
  fat: split fat_truncate_time() into separate functions
  MAINTAINERS: add Muchun as a memcg reviewer
  proc/sysctl: make protected_* world readable
  ia64: mca: drop redundant spinlock initialization
  tty: fix deadlock caused by calling printk() under tty_port->lock
  relay: remove redundant assignment to pointer buf
  fs/ntfs3: validate BOOT sectors_per_clusters
  lib/string_helpers: fix not adding strarray to device's resource list
  kernel/crash_core.c: remove redundant check of ck_cmdline
  ELF, uapi: fixup ELF_ST_TYPE definition
  ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()
  ipc: update semtimedop() to use hrtimer
  ipc/sem: remove redundant assignments
  ...
2022-05-27 11:22:03 -07:00
Vlastimil Babka
0710d0122a mm: Kconfig: reorganize misplaced mm options
After commits 7b42f1041c ("mm: Kconfig: move swap and slab config
options to the MM section") and 519bcb7979 ("mm: Kconfig: group swap,
slab, hotplug and thp options into submenus") we now have nicely organized
mm related config options.  I have noticed some that were still misplaced,
so this moves them from various places into the new structure:

VM_EVENT_COUNTERS, COMPAT_BRK, MMAP_ALLOW_UNINITIALIZED to mm/Kconfig and
general MM section.

SLUB_STATS to mm/Kconfig and the slab submenu.

DEBUG_SLAB, SLUB_DEBUG, SLUB_DEBUG_ON to mm/Kconfig.debug and the Kernel
hacking / Memory Debugging submenu.

Link: https://lkml.kernel.org/r/20220525112559.1139-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:47 -07:00
Linus Torvalds
64e34b50d7 linux-kselftest-kunit-5.19-rc1
This KUnit update for Linux 5.19-rc1 consists of several fixes, cleanups,
 and enhancements to tests and framework:
 
 - introduces _NULL and _NOT_NULL macros to pointer error checks
 
 - reworks kunit_resource allocation policy to fix memory leaks when
   caller doesn't specify free() function to be used when allocating
   memory using kunit_add_resource() and kunit_alloc_resource() funcs.
 
 - adds ability to specify suite-level init and exit functions
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmKLw4QACgkQCwJExA0N
 Qxz9wRAA3PonJESDAFF2sXTDzQurEXdWoJHqNvO0JCObku8SDODEI7nozXOD0MBC
 ASAXiX3HuNI0yESF27xECqu3xbe8KsYOtCN8vco/sYUroVGmzgAt/atsvrSUv2Oh
 sEQbjrTMwkMUjL5ECvjR2dArd6bQew7PPBkl3HqOpyysL3b/EAMEAY0DmDXrrrwB
 +oNvXGVAR1Tczg4ahcSSwDdZl1C41kREj5f8S/4+kohMdIjCUPWOAYnaWHpVdAOJ
 C+LWkPSJ5IpgjU2urDX2kNfg32UxIJpFI009ovytBmwCbd+GEs24u7gtgtksPM2s
 YypoPEqC40gxkbY99omojtADiDdZlKqlIipCTWYe/CpzgBD+WQ4PVqMGM4ZprP9w
 Hrc6ulVmd8hZ4F9QQ3oN6W9L6pBCgdXtPPCsQtGoUTbw7r79BP67PjJ6Ko+usn3s
 Jy0FR5LvzYBjykoJzKSIaJ8ONaX34DB6w5rB+q5mBGwPKPHWo3eAZVZDPEMVo3Z7
 D9TW5UliGBt2y5YJZbPbSnhdJPMPHSK5ef9hIy0wYjVJFafirdgrQhgbWbVxalRT
 eZz1edcs1sdU7GAzfMA/v+NqAAA3bFIUVr2b+GTc+4zzWhq+cwI2SNikgyhETv/f
 xKq8Xek8EkOIdaa2lu9chTPT4sG7A6991EkRqfc7rL1IptkPiS8=
 =DzVQ
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-kunit-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull KUnit updates from Shuah Khan:
 "Several fixes, cleanups, and enhancements to tests and framework:

   - introduce _NULL and _NOT_NULL macros to pointer error checks

   - rework kunit_resource allocation policy to fix memory leaks when
     caller doesn't specify free() function to be used when allocating
     memory using kunit_add_resource() and kunit_alloc_resource() funcs.

   - add ability to specify suite-level init and exit functions"

* tag 'linux-kselftest-kunit-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (41 commits)
  kunit: tool: Use qemu-system-i386 for i386 runs
  kunit: fix executor OOM error handling logic on non-UML
  kunit: tool: update riscv QEMU config with new serial dependency
  kcsan: test: use new suite_{init,exit} support
  kunit: tool: Add list of all valid test configs on UML
  kunit: take `kunit_assert` as `const`
  kunit: tool: misc cleanups
  kunit: tool: minor cosmetic cleanups in kunit_parser.py
  kunit: tool: make parser stop overwriting status of suites w/ no_tests
  kunit: tool: remove dead parse_crash_in_log() logic
  kunit: tool: print clearer error message when there's no TAP output
  kunit: tool: stop using a shell to run kernel under QEMU
  kunit: tool: update test counts summary line format
  kunit: bail out of test filtering logic quicker if OOM
  lib/Kconfig.debug: change KUnit tests to default to KUNIT_ALL_TESTS
  kunit: Rework kunit_resource allocation policy
  kunit: fix debugfs code to use enum kunit_status, not bool
  kfence: test: use new suite_{init/exit} support, add .kunitconfig
  kunit: add ability to specify suite-level init and exit functions
  kunit: rename print_subtest_{start,end} for clarity (s/subtest/suite)
  ...
2022-05-25 11:32:53 -07:00
Linus Torvalds
2e17ce1106 slab changes for 5.19
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmKLUYoACgkQ4CHKc/GJ
 qRCMFwf/Tm1cf2JLUANrT58rjkrrj15EtKhnJdm5/yvmsWKps7WKPP4jeUHe+NTO
 NovAGt67lG1l6LMLczZkWckOkWlyYjC42CPDLdxRUkk+zQRb3nRA8Nbt6VTNBOfQ
 0wTLOqXgsNXdSPSVUsKGL8kIAHNQTMX+7TjO6s7CXy/5Qag6r1iZX2HZxASOHxLa
 yYzaJ9pJRZBAMGnzV6L6v0J8KPnjYO0fB68S1qYQTbhoRxchtFF+0AIr1JydGgBI
 9RFUowTrSpJkZtcSjabopvZz4JfCRDP+eAxkyw13feji7MG1FMX74HgDdw+HhzTv
 R2/6iA5WcsmzcXopsfMx8lUP/KIfPw==
 =gnSc
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

 - Conversion of slub_debug stack traces to stackdepot, allowing more
   useful debugfs-based inspection for e.g. memory leak debugging.
   Allocation and free debugfs info now includes full traces and is
   sorted by the unique trace frequency.

   The stackdepot conversion was already attempted last year but
   reverted by ae14c63a9f. The memory overhead (while not actually
   enabled on boot) has been meanwhile solved by making the large
   stackdepot allocation dynamic. The xfstest issues haven't been
   reproduced on current kernel locally nor in -next, so the slab cache
   layout changes that originally made that bug manifest were probably
   not the root cause.

 - Refactoring of dma-kmalloc caches creation.

 - Trivial cleanups such as removal of unused parameters, fixes and
   clarifications of comments.

 - Hyeonggon Yoo joins as a reviewer.

* tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  MAINTAINERS: add myself as reviewer for slab
  mm/slub: remove unused kmem_cache_order_objects max
  mm: slab: fix comment for __assume_kmalloc_alignment
  mm: slab: fix comment for ARCH_KMALLOC_MINALIGN
  mm/slub: remove unneeded return value of slab_pad_check
  mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache()
  mm/slub: remove meaningless node check in ___slab_alloc()
  mm/slub: remove duplicate flag in allocate_slab()
  mm/slub: remove unused parameter in setup_object*()
  mm/slab.c: fix comments
  slab, documentation: add description of debugfs files for SLUB caches
  mm/slub: sort debugfs output by frequency of stack traces
  mm/slub: distinguish and print stack traces in debugfs files
  mm/slub: use stackdepot to save stack trace in objects
  mm/slub: move struct track init out of set_track()
  lib/stackdepot: allow requesting early initialization dynamically
  mm/slub, kunit: Make slub_kunit unaffected by user specified flags
  mm/slab: remove some unused functions
2022-05-25 10:24:04 -07:00
Linus Torvalds
ac2ab99072 Random number generator updates for Linux 5.19-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmKKpM8ACgkQSfxwEqXe
 A6726w/+OJimGd4arvpSmdn+vxepSyDLgKfwM0x5zprRVd16xg8CjJr4eMonTesq
 YvtJRqpetb53MB+sMhutlvQqQzrjtf2MBkgPwF4I2gUrk7vLD45Q+AGdGhi/rUwz
 wHGA7xg1FHLHia2M/9idSqi8QlZmUP4u4l5ZnMyTUHiwvRD6XOrWKfqvUSawNzyh
 hCWlTUxDrjizsW5YpsJX/MkRadSC8loJEk5ByZebow6nRPfurJvqfrcOMgHyNrbY
 pOZ/CGPxcetMqotL2TuuJt5wKmenqYhIWGAp3YM2SWWgU2ueBZekW8AYeMfgUcvh
 LWV93RpSuAnE5wsdjIULvjFnEDJBf8ihfMnMrd9G5QjQu44tuKWfY2MghLSpYzaR
 V6UFbRmhrqhqiStHQXOvk1oqxtpbHlc9zzJLmvPmDJcbvzXQ9Opk5GVXAmdtnHnj
 M/ty3wGWxucY6mHqT8MkCShSSslbgEtc1pEIWHdrUgnaiSVoCVBEO+9LqLbjvOTm
 XA/6YtoiCE5FasK51pir1zVb2GORQn0v8HnuAOsusD/iPAlRQ/G5jZkaXbwRQI6j
 atYL1svqvSKn5POnzqAlMUXfMUr19K5xqJdp7i6qmlO1Vq6Z+tWbCQgD1JV+Wjkb
 CMyvXomFCFu4aYKGRE2SBRnWLRghG3kYHqEQ15yTPMQerxbUDNg=
 =SUr3
 -----END PGP SIGNATURE-----

Merge tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:
 "These updates continue to refine the work began in 5.17 and 5.18 of
  modernizing the RNG's crypto and streamlining and documenting its
  code.

  New for 5.19, the updates aim to improve entropy collection methods
  and make some initial decisions regarding the "premature next" problem
  and our threat model. The cloc utility now reports that random.c is
  931 lines of code and 466 lines of comments, not that basic metrics
  like that mean all that much, but at the very least it tells you that
  this is very much a manageable driver now.

  Here's a summary of the various updates:

   - The random_get_entropy() function now always returns something at
     least minimally useful. This is the primary entropy source in most
     collectors, which in the best case expands to something like RDTSC,
     but prior to this change, in the worst case it would just return 0,
     contributing nothing. For 5.19, additional architectures are wired
     up, and architectures that are entirely missing a cycle counter now
     have a generic fallback path, which uses the highest resolution
     clock available from the timekeeping subsystem.

     Some of those clocks can actually be quite good, despite the CPU
     not having a cycle counter of its own, and going off-core for a
     stamp is generally thought to increase jitter, something positive
     from the perspective of entropy gathering. Done very early on in
     the development cycle, this has been sitting in next getting some
     testing for a while now and has relevant acks from the archs, so it
     should be pretty well tested and fine, but is nonetheless the thing
     I'll be keeping my eye on most closely.

   - Of particular note with the random_get_entropy() improvements is
     MIPS, which, on CPUs that lack the c0 count register, will now
     combine the high-speed but short-cycle c0 random register with the
     lower-speed but long-cycle generic fallback path.

   - With random_get_entropy() now always returning something useful,
     the interrupt handler now collects entropy in a consistent
     construction.

   - Rather than comparing two samples of random_get_entropy() for the
     jitter dance, the algorithm now tests many samples, and uses the
     amount of differing ones to determine whether or not jitter entropy
     is usable and how laborious it must be. The problem with comparing
     only two samples was that if the cycle counter was extremely slow,
     but just so happened to be on the cusp of a change, the slowness
     wouldn't be detected. Taking many samples fixes that to some
     degree.

     This, combined with the other improvements to random_get_entropy(),
     should make future unification of /dev/random and /dev/urandom
     maybe more possible. At the very least, were we to attempt it again
     today (we're not), it wouldn't break any of Guenter's test rigs
     that broke when we tried it with 5.18. So, not today, but perhaps
     down the road, that's something we can revisit.

   - We attempt to reseed the RNG immediately upon waking up from system
     suspend or hibernation, making use of the various timestamps about
     suspend time and such available, as well as the usual inputs such
     as RDRAND when available.

   - Batched randomness now falls back to ordinary randomness before the
     RNG is initialized. This provides more consistent guarantees to the
     types of random numbers being returned by the various accessors.

   - The "pre-init injection" code is now gone for good. I suspect you
     in particular will be happy to read that, as I recall you
     expressing your distaste for it a few months ago. Instead, to avoid
     a "premature first" issue, while still allowing for maximal amount
     of entropy availability during system boot, the first 128 bits of
     estimated entropy are used immediately as it arrives, with the next
     128 bits being buffered. And, as before, after the RNG has been
     fully initialized, it winds up reseeding anyway a few seconds later
     in most cases. This resulted in a pretty big simplification of the
     initialization code and let us remove various ad-hoc mechanisms
     like the ugly crng_pre_init_inject().

   - The RNG no longer pretends to handle the "premature next" security
     model, something that various academics and other RNG designs have
     tried to care about in the past. After an interesting mailing list
     thread, these issues are thought to be a) mainly academic and not
     practical at all, and b) actively harming the real security of the
     RNG by delaying new entropy additions after a potential compromise,
     making a potentially bad situation even worse. As well, in the
     first place, our RNG never even properly handled the premature next
     issue, so removing an incomplete solution to a fake problem was
     particularly nice.

     This allowed for numerous other simplifications in the code, which
     is a lot cleaner as a consequence. If you didn't see it before,
     https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/ may be a
     thread worth skimming through.

   - While the interrupt handler received a separate code path years ago
     that avoids locks by using per-cpu data structures and a faster
     mixing algorithm, in order to reduce interrupt latency, input and
     disk events that are triggered in hardirq handlers were still
     hitting locks and more expensive algorithms. Those are now
     redirected to use the faster per-cpu data structures.

   - Rather than having the fake-crypto almost-siphash-based random32
     implementation be used right and left, and in many places where
     cryptographically secure randomness is desirable, the batched
     entropy code is now fast enough to replace that.

   - As usual, numerous code quality and documentation cleanups. For
     example, the initialization state machine now uses enum symbolic
     constants instead of just hard coding numbers everywhere.

   - Since the RNG initializes once, and then is always initialized
     thereafter, a pretty heavy amount of code used during that
     initialization is never used again. It is now completely cordoned
     off using static branches and it winds up in the .text.unlikely
     section so that it doesn't reduce cache compactness after the RNG
     is ready.

   - A variety of functions meant for waiting on the RNG to be
     initialized were only used by vsprintf, and in not a particularly
     optimal way. Replacing that usage with a more ordinary setup made
     it possible to remove those functions.

   - A cleanup of how we warn userspace about the use of uninitialized
     /dev/urandom and uninitialized get_random_bytes() usage.
     Interestingly, with the change you merged for 5.18 that attempts to
     use jitter (but does not block if it can't), the majority of users
     should never see those warnings for /dev/urandom at all now, and
     the one for in-kernel usage is mainly a debug thing.

   - The file_operations struct for /dev/[u]random now implements
     .read_iter and .write_iter instead of .read and .write, allowing it
     to also implement .splice_read and .splice_write, which makes
     splice(2) work again after it was broken here (and in many other
     places in the tree) during the set_fs() removal. This was a bit of
     a last minute arrival from Jens that hasn't had as much time to
     bake, so I'll be keeping my eye on this as well, but it seems
     fairly ordinary. Unfortunately, read_iter() is around 3% slower
     than read() in my tests, which I'm not thrilled about. But Jens and
     Al, spurred by this observation, seem to be making progress in
     removing the bottlenecks on the iter paths in the VFS layer in
     general, which should remove the performance gap for all drivers.

   - Assorted other bug fixes, cleanups, and optimizations.

   - A small SipHash cleanup"

* tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (49 commits)
  random: check for signals after page of pool writes
  random: wire up fops->splice_{read,write}_iter()
  random: convert to using fops->write_iter()
  random: convert to using fops->read_iter()
  random: unify batched entropy implementations
  random: move randomize_page() into mm where it belongs
  random: remove mostly unused async readiness notifier
  random: remove get_random_bytes_arch() and add rng_has_arch_random()
  random: move initialization functions out of hot pages
  random: make consistent use of buf and len
  random: use proper return types on get_random_{int,long}_wait()
  random: remove extern from functions in header
  random: use static branch for crng_ready()
  random: credit architectural init the exact amount
  random: handle latent entropy and command line from random_init()
  random: use proper jiffies comparison macro
  random: remove ratelimiting for in-kernel unseeded randomness
  random: move initialization out of reseeding hot path
  random: avoid initializing twice in credit race
  random: use symbolic constants for crng_init states
  ...
2022-05-24 11:58:10 -07:00
Jason A. Donenfeld
cc1e127bfa random: remove ratelimiting for in-kernel unseeded randomness
The CONFIG_WARN_ALL_UNSEEDED_RANDOM debug option controls whether the
kernel warns about all unseeded randomness or just the first instance.
There's some complicated rate limiting and comparison to the previous
caller, such that even with CONFIG_WARN_ALL_UNSEEDED_RANDOM enabled,
developers still don't see all the messages or even an accurate count of
how many were missed. This is the result of basically parallel
mechanisms aimed at accomplishing more or less the same thing, added at
different points in random.c history, which sort of compete with the
first-instance-only limiting we have now.

It turns out, however, that nobody cares about the first unseeded
randomness instance of in-kernel users. The same first user has been
there for ages now, and nobody is doing anything about it. It isn't even
clear that anybody _can_ do anything about it. Most places that can do
something about it have switched over to using get_random_bytes_wait()
or wait_for_random_bytes(), which is the right thing to do, but there is
still much code that needs randomness sometimes during init, and as a
geeneral rule, if you're not using one of the _wait functions or the
readiness notifier callback, you're bound to be doing it wrong just
based on that fact alone.

So warning about this same first user that can't easily change is simply
not an effective mechanism for anything at all. Users can't do anything
about it, as the Kconfig text points out -- the problem isn't in
userspace code -- and kernel developers don't or more often can't react
to it.

Instead, show the warning for all instances when CONFIG_WARN_ALL_UNSEEDED_RANDOM
is set, so that developers can debug things need be, or if it isn't set,
don't show a warning at all.

At the same time, CONFIG_WARN_ALL_UNSEEDED_RANDOM now implies setting
random.ratelimit_disable=1 on by default, since if you care about one
you probably care about the other too. And we can clean up usage around
the related urandom_warning ratelimiter as well (whose behavior isn't
changing), so that it properly counts missed messages after the 10
message threshold is reached.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-18 15:53:53 +02:00
Daniel Latypov
dcbb2ee246 lib/Kconfig.debug: change KUnit tests to default to KUNIT_ALL_TESTS
This is in line with Documentation/dev-tools/kunit/style.rst.
Some of these tests predate that so they don't follow this convention.

With this and commit b0841b51ca ("kunit: arch/um/configs: Enable
KUNIT_ALL_TESTS by default"), kunit.py will now run these tests by
default. This hopefully makes it easier to run and maintain the tests.
If any of these were to start failing, people would notice much quicker.

Note: this commit doesn't update LINEAR_RANGES_TEST since that would
select its dependency (LINEAR_RANGES). We don't want KUNIT_ALL_TESTS
to enable anything other than test kconfigs.

Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Nico Pache <npache@redhat.com>
Acked-by: Nico Pache <npache@redhat.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-05-12 11:14:56 -06:00
Rasmus Villemoes
67fca000e1 lib/Kconfig.debug: remove more CONFIG_..._VALUE indirections
As in "kernel/panic.c: remove CONFIG_PANIC_ON_OOPS_VALUE indirection",
use the IS_ENABLED() helper rather than having a hidden config option.

Link: https://lkml.kernel.org/r/20220321121301.1389693-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:38:00 -07:00
Dan Williams
81beea55cb nvdimm: Drop nd_device_lock()
Now that all NVDIMM subsystem locking is validated with custom lock
classes, there is no need for the custom usage of the lockdep_mutex.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/165055521979.3745911.10751769706032029999.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-04-28 14:01:55 -07:00
Dan Williams
38a34e1076 cxl: Drop cxl_device_lock()
Now that all CXL subsystem locking is validated with custom lock
classes, there is no need for the custom usage of the lockdep_mutex.

Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/165055520383.3745911.53447786039115271.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-04-28 14:01:55 -07:00
Josh Poimboeuf
489e355b42 objtool: Add HAVE_NOINSTR_VALIDATION
Remove CONFIG_NOINSTR_VALIDATION's dependency on HAVE_OBJTOOL, since
other arches might want to implement objtool without it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/488e94f69db4df154499bc098573d90e5db1c826.1650300597.git.jpoimboe@redhat.com
2022-04-22 12:32:05 +02:00
Josh Poimboeuf
0f620cefd7 objtool: Rename "VMLINUX_VALIDATION" -> "NOINSTR_VALIDATION"
CONFIG_VMLINUX_VALIDATION is just the validation of the "noinstr" rules.
That name is a misnomer, because now objtool actually does vmlinux
validation for other reasons.

Rename CONFIG_VMLINUX_VALIDATION to CONFIG_NOINSTR_VALIDATION.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/173f07e2d6d1afc0874aed975a61783207c6a531.1650300597.git.jpoimboe@redhat.com
2022-04-22 12:32:04 +02:00
Josh Poimboeuf
22102f4559 objtool: Make noinstr hacks optional
Objtool has some hacks in place to workaround toolchain limitations
which otherwise would break no-instrumentation rules.  Make the hacks
explicit (and optional for other arches) by turning it into a cmdline
option and kernel config option.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/b326eeb9c33231b9dfbb925f194ed7ee40edcd7c.1650300597.git.jpoimboe@redhat.com
2022-04-22 12:32:04 +02:00
Josh Poimboeuf
03f16cd020 objtool: Add CONFIG_OBJTOOL
Now that stack validation is an optional feature of objtool, add
CONFIG_OBJTOOL and replace most usages of CONFIG_STACK_VALIDATION with
it.

CONFIG_STACK_VALIDATION can now be considered to be frame-pointer
specific.  CONFIG_UNWINDER_ORC is already inherently valid for live
patching, so no need to "validate" it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/939bf3d85604b2a126412bf11af6e3bd3b872bcb.1650300597.git.jpoimboe@redhat.com
2022-04-22 12:32:03 +02:00
Oliver Glitta
5cf909c553 mm/slub: use stackdepot to save stack trace in objects
Many stack traces are similar so there are many similar arrays.
Stackdepot saves each unique stack only once.

Replace field addrs in struct track with depot_stack_handle_t handle.  Use
stackdepot to save stack trace.

The benefits are smaller memory overhead and possibility to aggregate
per-cache statistics in the following patch using the stackdepot handle
instead of matching stacks manually.

[ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]

This was initially merged as commit 788691464c and reverted by commit
ae14c63a9f due to several issues, that should now be fixed.
The problem of unconditional memory overhead by stackdepot has been
addressed by commit 2dba5eb1c7 ("lib/stackdepot: allow optional init
and stack_table allocation by kvmalloc()"), so the dependency on
stackdepot will result in extra memory usage only when a slab cache
tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
The build failures on some architectures were also addressed, and the
reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
patch.

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
2022-04-06 11:03:32 +02:00
Linus Torvalds
b9132c32e0 cxl for 5.18
- Add a driver for 'struct cxl_memdev' objects responsible for CXL.mem
   operation as distinct from 'cxl_pci' mailbox operations. Its primary
   responsibility is enumerating an endpoint 'struct cxl_port' and all the
   'struct cxl_port' instances between an endpoint and the CXL platform
   root.
 
 - Add a driver for 'struct cxl_port' objects responsible for enumerating
   and operating all Host-managed Device Memory (HDM) decoder resources
   between the platform-level CXL memory description, all intervening host
   bridges / switches, and the HDM resources in endpoints.
 
 - Update the cxl_pci driver to validate CXL.mem operation precursors to
   HDM decoder operation like ready-polling, and legacy CXL 1.1 DVSEC
   based CXL.mem configuration.
 
 - Add basic lockdep coverage for usage of device_lock() on CXL subsystem
   objects similar to what exists for LIBNVDIMM. Include a compile-time
   switch for which subsystem to validate at run-time.
 
 - Update cxl_test to emulate a one level switch topology.
 
 - Document a "Theory of Operation" for the subsystem.
 
 - Add 'numa_node' and 'serial' attributes to cxl_memdev sysfs
 
 - Include miscellaneous fixes for spec / QEMU CXL emulation
   compatibility and static analysis reports.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYjpX6AAKCRDfioYZHlFs
 ZzyxAQCztxAXj7mzkm1Qt5zZz4e7p/6sR49B03jBTfPtrEF9kQEAl9R15WVt6U+o
 Ooof1XhRic3kT6e8zS3ZVKHzGduYxwM=
 =mR94
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull CXL (Compute Express Link) updates from Dan Williams:
 "This development cycle extends the subsystem to discover CXL resources
  throughout a CXL/PCIe switch topology and respond to hot add/remove
  events anywhere in that topology.

  This is more foundational infrastructure in preparation for dynamic
  memory region provisioning support. Recall that CXL memory regions, as
  the new "Theory of Operation" section of
  Documentation/driver-api/cxl/memory-devices.rst describes, bring
  storage volume striping semantics to memory.

  The hot add/remove behavior is validated with extensions to the
  cxl_test unit test environment and this test in the cxl-cli test
  suite:

      https://github.com/pmem/ndctl/blob/djbw/for-74/cxl/test/cxl-topology.sh

  Summary:

   - Add a driver for 'struct cxl_memdev' objects responsible for
     CXL.mem operation as distinct from 'cxl_pci' mailbox operations.

     Its primary responsibility is enumerating an endpoint 'struct
     cxl_port' and all the 'struct cxl_port' instances between an
     endpoint and the CXL platform root.

   - Add a driver for 'struct cxl_port' objects responsible for
     enumerating and operating all Host-managed Device Memory (HDM)
     decoder resources between the platform-level CXL memory
     description, all intervening host bridges / switches, and the HDM
     resources in endpoints.

   - Update the cxl_pci driver to validate CXL.mem operation precursors
     to HDM decoder operation like ready-polling, and legacy CXL 1.1
     DVSEC based CXL.mem configuration.

   - Add basic lockdep coverage for usage of device_lock() on CXL
     subsystem objects similar to what exists for LIBNVDIMM. Include a
     compile-time switch for which subsystem to validate at run-time.

   - Update cxl_test to emulate a one level switch topology.

   - Document a "Theory of Operation" for the subsystem.

   - Add 'numa_node' and 'serial' attributes to cxl_memdev sysfs

   - Include miscellaneous fixes for spec / QEMU CXL emulation
     compatibility and static analysis reports"

* tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (48 commits)
  cxl/core/port: Fix NULL but dereferenced coccicheck error
  cxl/port: Hold port reference until decoder release
  cxl/port: Fix endpoint refcount leak
  cxl/core: Fix cxl_device_lock() class detection
  cxl/core/port: Fix unregister_port() lock assertion
  cxl/regs: Fix size of CXL Capability Header Register
  cxl/core/port: Handle invalid decoders
  cxl/core/port: Fix / relax decoder target enumeration
  tools/testing/cxl: Add a physical_node link
  tools/testing/cxl: Enumerate mock decoders
  tools/testing/cxl: Mock one level of switches
  tools/testing/cxl: Fix root port to host bridge assignment
  tools/testing/cxl: Mock dvsec_ranges()
  cxl/core/port: Add endpoint decoders
  cxl/core: Move target_list out of base decoder attributes
  cxl/mem: Add the cxl_mem driver
  cxl/core/port: Add switch port enumeration
  cxl/memdev: Add numa_node attribute
  cxl/pci: Emit device serial number
  cxl/pci: Implement wait for media active
  ...
2022-03-24 18:07:03 -07:00
Linus Torvalds
52deda9551 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "Various misc subsystems, before getting into the post-linux-next
  material.

  41 patches.

  Subsystems affected by this patch series: procfs, misc, core-kernel,
  lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump,
  taskstats, panic, kcov, resource, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
  Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
  kernel/resource: fix kfree() of bootmem memory again
  kcov: properly handle subsequent mmap calls
  kcov: split ioctl handling into locked and unlocked parts
  panic: move panic_print before kmsg dumpers
  panic: add option to dump all CPUs backtraces in panic_print
  docs: sysctl/kernel: add missing bit to panic_print
  taskstats: remove unneeded dead assignment
  kasan: no need to unset panic_on_warn in end_report()
  ubsan: no need to unset panic_on_warn in ubsan_epilogue()
  panic: unset panic_on_warn inside panic()
  docs: kdump: add scp example to write out the dump file
  docs: kdump: update description about sysfs file system support
  arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
  cgroup: use irqsave in cgroup_rstat_flush_locked().
  fat: use pointer to simple type in put_user()
  minix: fix bug when opening a file with O_DIRECT
  ...
2022-03-24 14:14:07 -07:00
Linus Torvalds
169e77764a Networking changes for 5.18.
Core
 ----
 
  - Introduce XDP multi-buffer support, allowing the use of XDP with
    jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
 
  - Speed up netns dismantling (5x) and lower the memory cost a little.
    Remove unnecessary per-netns sockets. Scope some lists to a netns.
    Cut down RCU syncing. Use batch methods. Allow netdev registration
    to complete out of order.
 
  - Support distinguishing timestamp types (ingress vs egress) and
    maintaining them across packet scrubbing points (e.g. redirect).
 
  - Continue the work of annotating packet drop reasons throughout
    the stack.
 
  - Switch netdev error counters from an atomic to dynamically
    allocated per-CPU counters.
 
  - Rework a few preempt_disable(), local_irq_save() and busy waiting
    sections problematic on PREEMPT_RT.
 
  - Extend the ref_tracker to allow catching use-after-free bugs.
 
 BPF
 ---
 
  - Introduce "packing allocator" for BPF JIT images. JITed code is
    marked read only, and used to be allocated at page granularity.
    Custom allocator allows for more efficient memory use, lower
    iTLB pressure and prevents identity mapping huge pages from
    getting split.
 
  - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
    the correct probe read access method, add appropriate helpers.
 
  - Convert the BPF preload to use light skeleton and drop
    the user-mode-driver dependency.
 
  - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
    its use as a packet generator.
 
  - Allow local storage memory to be allocated with GFP_KERNEL if called
    from a hook allowed to sleep.
 
  - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
    bits to come later).
 
  - Add unstable conntrack lookup helpers for BPF by using the BPF
    kfunc infra.
 
  - Allow cgroup BPF progs to return custom errors to user space.
 
  - Add support for AF_UNIX iterator batching.
 
  - Allow iterator programs to use sleepable helpers.
 
  - Support JIT of add, and, or, xor and xchg atomic ops on arm64.
 
  - Add BTFGen support to bpftool which allows to use CO-RE in kernels
    without BTF info.
 
  - Large number of libbpf API improvements, cleanups and deprecations.
 
 Protocols
 ---------
 
  - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
 
  - Adjust TSO packet sizes based on min_rtt, allowing very low latency
    links (data centers) to always send full-sized TSO super-frames.
 
  - Make IPv6 flow label changes (AKA hash rethink) more configurable,
    via sysctl and setsockopt. Distinguish between server and client
    behavior.
 
  - VxLAN support to "collect metadata" devices to terminate only
    configured VNIs. This is similar to VLAN filtering in the bridge.
 
  - Support inserting IPv6 IOAM information to a fraction of frames.
 
  - Add protocol attribute to IP addresses to allow identifying where
    given address comes from (kernel-generated, DHCP etc.)
 
  - Support setting socket and IPv6 options via cmsg on ping6 sockets.
 
  - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
    Define dscp_t and stop taking ECN bits into account in fib-rules.
 
  - Add support for locked bridge ports (for 802.1X).
 
  - tun: support NAPI for packets received from batched XDP buffs,
    doubling the performance in some scenarios.
 
  - IPv6 extension header handling in Open vSwitch.
 
  - Support IPv6 control message load balancing in bonding, prevent
    neighbor solicitation and advertisement from using the wrong port.
    Support NS/NA monitor selection similar to existing ARP monitor.
 
  - SMC
    - improve performance with TCP_CORK and sendfile()
    - support auto-corking
    - support TCP_NODELAY
 
  - MCTP (Management Component Transport Protocol)
    - add user space tag control interface
    - I2C binding driver (as specified by DMTF DSP0237)
 
  - Multi-BSSID beacon handling in AP mode for WiFi.
 
  - Bluetooth:
    - handle MSFT Monitor Device Event
    - add MGMT Adv Monitor Device Found/Lost events
 
  - Multi-Path TCP:
    - add support for the SO_SNDTIMEO socket option
    - lots of selftest cleanups and improvements
 
  - Increase the max PDU size in CAN ISOTP to 64 kB.
 
 Driver API
 ----------
 
  - Add HW counters for SW netdevs, a mechanism for devices which
    offload packet forwarding to report packet statistics back to
    software interfaces such as tunnels.
 
  - Select the default NIC queue count as a fraction of number of
    physical CPU cores, instead of hard-coding to 8.
 
  - Expose devlink instance locks to drivers. Allow device layer of
    drivers to use that lock directly instead of creating their own
    which always runs into ordering issues in devlink callbacks.
 
  - Add header/data split indication to guide user space enabling
    of TCP zero-copy Rx.
 
  - Allow configuring completion queue event size.
 
  - Refactor page_pool to enable fragmenting after allocation.
 
  - Add allocation and page reuse statistics to page_pool.
 
  - Improve Multiple Spanning Trees support in the bridge to allow
    reuse of topologies across VLANs, saving HW resources in switches.
 
  - DSA (Distributed Switch Architecture):
    - replay and offload of host VLAN entries
    - offload of static and local FDB entries on LAG interfaces
    - FDB isolation and unicast filtering
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - LAN937x T1 PHYs
    - Davicom DM9051 SPI NIC driver
    - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
    - Microchip ksz8563 switches
    - Netronome NFP3800 SmartNICs
    - Fungible SmartNICs
    - MediaTek MT8195 switches
 
  - WiFi:
    - mt76: MediaTek mt7916
    - mt76: MediaTek mt7921u USB adapters
    - brcmfmac: Broadcom BCM43454/6
 
  - Mobile:
    - iosm: Intel M.2 7360 WWAN card
 
 Drivers
 -------
 
  - Convert many drivers to the new phylink API built for split PCS
    designs but also simplifying other cases.
 
  - Intel Ethernet NICs:
    - add TTY for GNSS module for E810T device
    - improve AF_XDP performance
    - GTP-C and GTP-U filter offload
    - QinQ VLAN support
 
  - Mellanox Ethernet NICs (mlx5):
    - support xdp->data_meta
    - multi-buffer XDP
    - offload tc push_eth and pop_eth actions
 
  - Netronome Ethernet NICs (nfp):
    - flow-independent tc action hardware offload (police / meter)
    - AF_XDP
 
  - Other Ethernet NICs:
    - at803x: fiber and SFP support
    - xgmac: mdio: preamble suppression and custom MDC frequencies
    - r8169: enable ASPM L1.2 if system vendor flags it as safe
    - macb/gem: ZynqMP SGMII
    - hns3: add TX push mode
    - dpaa2-eth: software TSO
    - lan743x: multi-queue, mdio, SGMII, PTP
    - axienet: NAPI and GRO support
 
  - Mellanox Ethernet switches (mlxsw):
    - source and dest IP address rewrites
    - RJ45 ports
 
  - Marvell Ethernet switches (prestera):
    - basic routing offload
    - multi-chain TC ACL offload
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - PTP over UDP with the ocelot-8021q DSA tagging protocol
    - basic QoS classification on Felix DSA switch using dcbnl
    - port mirroring for ocelot switches
 
  - Microchip high-speed industrial Ethernet (sparx5):
    - offloading of bridge port flooding flags
    - PTP Hardware Clock
 
  - Other embedded switches:
    - lan966x: PTP Hardward Clock
    - qca8k: mdio read/write operations via crafted Ethernet packets
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
    - enable RX PPDU stats in monitor co-exist mode
 
  - Intel WiFi (iwlwifi):
    - UHB TAS enablement via BIOS
    - band disablement via BIOS
    - channel switch offload
    - 32 Rx AMPDU sessions in newer devices
 
  - MediaTek WiFi (mt76):
    - background radar detection
    - thermal management improvements on mt7915
    - SAR support for more mt76 platforms
    - MBSSID and 6 GHz band on mt7915
 
  - RealTek WiFi:
    - rtw89: AP mode
    - rtw89: 160 MHz channels and 6 GHz band
    - rtw89: hardware scan
 
  - Bluetooth:
    - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
 
  - Microchip CAN (mcp251xfd):
    - multiple RX-FIFOs and runtime configurable RX/TX rings
    - internal PLL, runtime PM handling simplification
    - improve chip detection and error handling after wakeup
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
 IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
 FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
 AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
 KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
 /DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
 6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
 7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
 K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
 cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
 krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
 dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
 =ELpe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "The sprinkling of SPI drivers is because we added a new one and Mark
  sent us a SPI driver interface conversion pull request.

  Core
  ----

   - Introduce XDP multi-buffer support, allowing the use of XDP with
     jumbo frame MTUs and combination with Rx coalescing offloads (LRO).

   - Speed up netns dismantling (5x) and lower the memory cost a little.
     Remove unnecessary per-netns sockets. Scope some lists to a netns.
     Cut down RCU syncing. Use batch methods. Allow netdev registration
     to complete out of order.

   - Support distinguishing timestamp types (ingress vs egress) and
     maintaining them across packet scrubbing points (e.g. redirect).

   - Continue the work of annotating packet drop reasons throughout the
     stack.

   - Switch netdev error counters from an atomic to dynamically
     allocated per-CPU counters.

   - Rework a few preempt_disable(), local_irq_save() and busy waiting
     sections problematic on PREEMPT_RT.

   - Extend the ref_tracker to allow catching use-after-free bugs.

  BPF
  ---

   - Introduce "packing allocator" for BPF JIT images. JITed code is
     marked read only, and used to be allocated at page granularity.
     Custom allocator allows for more efficient memory use, lower iTLB
     pressure and prevents identity mapping huge pages from getting
     split.

   - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
     the correct probe read access method, add appropriate helpers.

   - Convert the BPF preload to use light skeleton and drop the
     user-mode-driver dependency.

   - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
     its use as a packet generator.

   - Allow local storage memory to be allocated with GFP_KERNEL if
     called from a hook allowed to sleep.

   - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
     bits to come later).

   - Add unstable conntrack lookup helpers for BPF by using the BPF
     kfunc infra.

   - Allow cgroup BPF progs to return custom errors to user space.

   - Add support for AF_UNIX iterator batching.

   - Allow iterator programs to use sleepable helpers.

   - Support JIT of add, and, or, xor and xchg atomic ops on arm64.

   - Add BTFGen support to bpftool which allows to use CO-RE in kernels
     without BTF info.

   - Large number of libbpf API improvements, cleanups and deprecations.

  Protocols
  ---------

   - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.

   - Adjust TSO packet sizes based on min_rtt, allowing very low latency
     links (data centers) to always send full-sized TSO super-frames.

   - Make IPv6 flow label changes (AKA hash rethink) more configurable,
     via sysctl and setsockopt. Distinguish between server and client
     behavior.

   - VxLAN support to "collect metadata" devices to terminate only
     configured VNIs. This is similar to VLAN filtering in the bridge.

   - Support inserting IPv6 IOAM information to a fraction of frames.

   - Add protocol attribute to IP addresses to allow identifying where
     given address comes from (kernel-generated, DHCP etc.)

   - Support setting socket and IPv6 options via cmsg on ping6 sockets.

   - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
     Define dscp_t and stop taking ECN bits into account in fib-rules.

   - Add support for locked bridge ports (for 802.1X).

   - tun: support NAPI for packets received from batched XDP buffs,
     doubling the performance in some scenarios.

   - IPv6 extension header handling in Open vSwitch.

   - Support IPv6 control message load balancing in bonding, prevent
     neighbor solicitation and advertisement from using the wrong port.
     Support NS/NA monitor selection similar to existing ARP monitor.

   - SMC
      - improve performance with TCP_CORK and sendfile()
      - support auto-corking
      - support TCP_NODELAY

   - MCTP (Management Component Transport Protocol)
      - add user space tag control interface
      - I2C binding driver (as specified by DMTF DSP0237)

   - Multi-BSSID beacon handling in AP mode for WiFi.

   - Bluetooth:
      - handle MSFT Monitor Device Event
      - add MGMT Adv Monitor Device Found/Lost events

   - Multi-Path TCP:
      - add support for the SO_SNDTIMEO socket option
      - lots of selftest cleanups and improvements

   - Increase the max PDU size in CAN ISOTP to 64 kB.

  Driver API
  ----------

   - Add HW counters for SW netdevs, a mechanism for devices which
     offload packet forwarding to report packet statistics back to
     software interfaces such as tunnels.

   - Select the default NIC queue count as a fraction of number of
     physical CPU cores, instead of hard-coding to 8.

   - Expose devlink instance locks to drivers. Allow device layer of
     drivers to use that lock directly instead of creating their own
     which always runs into ordering issues in devlink callbacks.

   - Add header/data split indication to guide user space enabling of
     TCP zero-copy Rx.

   - Allow configuring completion queue event size.

   - Refactor page_pool to enable fragmenting after allocation.

   - Add allocation and page reuse statistics to page_pool.

   - Improve Multiple Spanning Trees support in the bridge to allow
     reuse of topologies across VLANs, saving HW resources in switches.

   - DSA (Distributed Switch Architecture):
      - replay and offload of host VLAN entries
      - offload of static and local FDB entries on LAG interfaces
      - FDB isolation and unicast filtering

  New hardware / drivers
  ----------------------

   - Ethernet:
      - LAN937x T1 PHYs
      - Davicom DM9051 SPI NIC driver
      - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
      - Microchip ksz8563 switches
      - Netronome NFP3800 SmartNICs
      - Fungible SmartNICs
      - MediaTek MT8195 switches

   - WiFi:
      - mt76: MediaTek mt7916
      - mt76: MediaTek mt7921u USB adapters
      - brcmfmac: Broadcom BCM43454/6

   - Mobile:
      - iosm: Intel M.2 7360 WWAN card

  Drivers
  -------

   - Convert many drivers to the new phylink API built for split PCS
     designs but also simplifying other cases.

   - Intel Ethernet NICs:
      - add TTY for GNSS module for E810T device
      - improve AF_XDP performance
      - GTP-C and GTP-U filter offload
      - QinQ VLAN support

   - Mellanox Ethernet NICs (mlx5):
      - support xdp->data_meta
      - multi-buffer XDP
      - offload tc push_eth and pop_eth actions

   - Netronome Ethernet NICs (nfp):
      - flow-independent tc action hardware offload (police / meter)
      - AF_XDP

   - Other Ethernet NICs:
      - at803x: fiber and SFP support
      - xgmac: mdio: preamble suppression and custom MDC frequencies
      - r8169: enable ASPM L1.2 if system vendor flags it as safe
      - macb/gem: ZynqMP SGMII
      - hns3: add TX push mode
      - dpaa2-eth: software TSO
      - lan743x: multi-queue, mdio, SGMII, PTP
      - axienet: NAPI and GRO support

   - Mellanox Ethernet switches (mlxsw):
      - source and dest IP address rewrites
      - RJ45 ports

   - Marvell Ethernet switches (prestera):
      - basic routing offload
      - multi-chain TC ACL offload

   - NXP embedded Ethernet switches (ocelot & felix):
      - PTP over UDP with the ocelot-8021q DSA tagging protocol
      - basic QoS classification on Felix DSA switch using dcbnl
      - port mirroring for ocelot switches

   - Microchip high-speed industrial Ethernet (sparx5):
      - offloading of bridge port flooding flags
      - PTP Hardware Clock

   - Other embedded switches:
      - lan966x: PTP Hardward Clock
      - qca8k: mdio read/write operations via crafted Ethernet packets

   - Qualcomm 802.11ax WiFi (ath11k):
      - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
      - enable RX PPDU stats in monitor co-exist mode

   - Intel WiFi (iwlwifi):
      - UHB TAS enablement via BIOS
      - band disablement via BIOS
      - channel switch offload
      - 32 Rx AMPDU sessions in newer devices

   - MediaTek WiFi (mt76):
      - background radar detection
      - thermal management improvements on mt7915
      - SAR support for more mt76 platforms
      - MBSSID and 6 GHz band on mt7915

   - RealTek WiFi:
      - rtw89: AP mode
      - rtw89: 160 MHz channels and 6 GHz band
      - rtw89: hardware scan

   - Bluetooth:
      - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)

   - Microchip CAN (mcp251xfd):
      - multiple RX-FIFOs and runtime configurable RX/TX rings
      - internal PLL, runtime PM handling simplification
      - improve chip detection and error handling after wakeup"

* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
  llc: fix netdevice reference leaks in llc_ui_bind()
  drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
  ice: don't allow to run ice_send_event_to_aux() in atomic ctx
  ice: fix 'scheduling while atomic' on aux critical err interrupt
  net/sched: fix incorrect vlan_push_eth dest field
  net: bridge: mst: Restrict info size queries to bridge ports
  net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
  drivers: net: xgene: Fix regression in CRC stripping
  net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
  net: dsa: fix missing host-filtered multicast addresses
  net/mlx5e: Fix build warning, detected write beyond size of field
  iwlwifi: mvm: Don't fail if PPAG isn't supported
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  netdevice: add missing dm_private kdoc
  net: bridge: mst: prevent NULL deref in br_mst_info_size()
  selftests: forwarding: Use same VRF for port and VLAN upper
  ...
2022-03-24 13:13:26 -07:00
Feng Tang
1bf18da621 lib/Kconfig.debug: add ARCH dependency for FUNCTION_ALIGN option
0Day robots reported there is compiling issue for 'csky' ARCH when
CONFIG_DEBUG_FORCE_DATA_SECTION_ALIGNED is enabled [1]:

All errors (new ones prefixed by >>):

   {standard input}: Assembler messages:
>> {standard input}:2277: Error: pcrel offset for branch to .LS000B too far (0x3c)

Which was discussed in [2].  And as there is no solution for csky yet, add
some dependency for this config to limit it to several ARCHs which have no
compiling issue so far.

[1]. https://lore.kernel.org/lkml/202202271612.W32UJAj2-lkp@intel.com/
[2]. https://www.spinics.net/lists/linux-kbuild/msg30298.html

Link: https://lkml.kernel.org/r/20220304021100.GN4548@shbuild999.sh.intel.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Guo Ren <guoren@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Kees Cook
f9b3cd2457 Kconfig.debug: make DEBUG_INFO selectable from a choice
Currently it's not possible to enable DEBUG_INFO for an all*config
build, since it is marked as "depends on !COMPILE_TEST".

This generally makes sense because a debug build of an all*config target
ends up taking much longer and the output is much larger.  Having this
be "default off" makes sense.

However, there are cases where enabling DEBUG_INFO for such builds is
useful for doing treewide A/B comparisons of build options, etc.

Make DEBUG_INFO selectable from any of the DWARF version choice options,
with DEBUG_INFO_NONE being the default for COMPILE_TEST.

The mutually exclusive relationship between DWARF5 and BTF must be
inverted, but the result remains the same.  Additionally moves
DEBUG_KERNEL and DEBUG_MISC up to the top of the menu because they were
enabling features _above_ it, making it weird to navigate menuconfig.

[keescook@chromium.org: make DEBUG_INFO always default=n]
  Link: https://lkml.kernel.org/r/20220128214131.580131-1-keescook@chromium.org
  Link: https://lore.kernel.org/lkml/YfRY6+CaQxX7O8vF@dev-arch.archlinux-ax161

Link: https://lkml.kernel.org/r/20220125075126.891825-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Kees Cook
02788ebcf5 lib: stackinit: Convert to KUnit
Convert stackinit unit tests to KUnit, for better integration
into the kernel self test framework. Includes a rename of
test_stackinit.c to stackinit_kunit.c, and CONFIG_TEST_STACKINIT to
CONFIG_STACKINIT_KUNIT_TEST.

Adjust expected test results based on which stack initialization method
was chosen:

 $ CMD="./tools/testing/kunit/kunit.py run stackinit --raw_output \
        --arch=x86_64 --kconfig_add"

 $ $CMD | grep stackinit:
 # stackinit: pass:36 fail:0 skip:29 total:65

 $ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_USER=y | grep stackinit:
 # stackinit: pass:37 fail:0 skip:28 total:65

 $ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF=y | grep stackinit:
 # stackinit: pass:55 fail:0 skip:10 total:65

 $ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y | grep stackinit:
 # stackinit: pass:62 fail:0 skip:3 total:65

 $ $CMD CONFIG_INIT_STACK_ALL_PATTERN=y --make_option LLVM=1 | grep stackinit:
 # stackinit: pass:60 fail:0 skip:5 total:65

 $ $CMD CONFIG_INIT_STACK_ALL_ZERO=y --make_option LLVM=1 | grep stackinit:
 # stackinit: pass:60 fail:0 skip:5 total:65

Temporarily remove the userspace-build mode, which will be restored in a
later patch.

Expand the size of the pre-case switch variable so it doesn't get
accidentally cleared.

Cc: David Gow <davidgow@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v1: https://lore.kernel.org/lkml/20220224055145.1853657-1-keescook@chromium.org
v2:
 - split "userspace KUnit stub" into separate header and patch (Daniel)
 - Improve commit log and comments (David)
 - Provide mapping of expected XFAIL tests to CONFIGs (David)
2022-03-21 08:13:04 -07:00
Masami Hiramatsu
f4616fabab fprobe: Add a selftest for fprobe
Add a KUnit based selftest for fprobe interface.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735295554.1084943.18347620679928750960.stgit@devnote2
2022-03-17 20:17:14 -07:00
Connor O'Brien
5e214f2e43 bpf: Add config to allow loading modules with BTF mismatches
BTF mismatch can occur for a separately-built module even when the ABI is
otherwise compatible and nothing else would prevent successfully loading.

Add a new Kconfig to control how mismatches are handled. By default, preserve
the current behavior of refusing to load the module. If MODULE_ALLOW_BTF_MISMATCH
is enabled, load the module but ignore its BTF information.

Suggested-by: Yonghong Song <yhs@fb.com>
Suggested-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/CAADnVQJ+OVPnBz8z3vNu8gKXX42jCUqfuvhWAyCQDu8N_yqqwQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20220223012814.1898677-1-connoro@google.com
2022-02-28 14:17:10 +01:00
Kees Cook
617f55e207 lib: overflow: Convert to Kunit
Convert overflow unit tests to KUnit, for better integration into the
kernel self test framework. Includes a rename of test_overflow.c to
overflow_kunit.c, and CONFIG_TEST_OVERFLOW to CONFIG_OVERFLOW_KUNIT_TEST.

$ ./tools/testing/kunit/kunit.py run overflow
...
[14:33:51] Starting KUnit Kernel (1/1)...
[14:33:51] ============================================================
[14:33:51] ================== overflow (11 subtests) ==================
[14:33:51] [PASSED] u8_overflow_test
[14:33:51] [PASSED] s8_overflow_test
[14:33:51] [PASSED] u16_overflow_test
[14:33:51] [PASSED] s16_overflow_test
[14:33:51] [PASSED] u32_overflow_test
[14:33:51] [PASSED] s32_overflow_test
[14:33:51] [PASSED] u64_overflow_test
[14:33:51] [PASSED] s64_overflow_test
[14:33:51] [PASSED] overflow_shift_test
[14:33:51] [PASSED] overflow_allocation_test
[14:33:51] [PASSED] overflow_size_helpers_test
[14:33:51] ==================== [PASSED] overflow =====================
[14:33:51] ============================================================
[14:33:51] Testing complete. Passed: 11, Failed: 0, Crashed: 0, Skipped: 0, Errors: 0
[14:33:51] Elapsed time: 12.525s total, 0.001s configuring, 12.402s building, 0.101s running

Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Co-developed-by: Vitor Massaru Iha <vitor@massaru.org>
Signed-off-by: Vitor Massaru Iha <vitor@massaru.org>
Link: https://lore.kernel.org/lkml/20200720224418.200495-1-vitor@massaru.org/
Co-developed-by: Daniel Latypov <dlatypov@google.com>
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Link: https://lore.kernel.org/linux-kselftest/20210503211536.1384578-1-dlatypov@google.com/
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/lkml/CAKwvOdm62iA1dNiC6Q11UJ-MnTqtc4kXkm-ubPaFMK824_k0nw@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Gow <davidgow@google.com>
Link: https://lore.kernel.org/lkml/CABVgOS=TWVh649_Vjo3wnMu9gZnq66gkV-LtGgsksAWMqc+MSA@mail.gmail.com
2022-02-27 09:29:02 -08:00
Dan Williams
3c5b903955 cxl: Prove CXL locking
When CONFIG_PROVE_LOCKING is enabled the 'struct device' definition gets
an additional mutex that is not clobbered by
lockdep_set_novalidate_class() like the typical device_lock(). This
allows for local annotation of subsystem locks with mutex_lock_nested()
per the subsystem's object/lock hierarchy. For CXL, this primarily needs
the ability to lock ports by depth and child objects of ports by their
parent parent-port lock.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164365853422.99383.1052399160445197427.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-08 22:57:29 -08:00
Nathan Chancellor
42d9b379e3 lib/Kconfig.debug: Allow BTF + DWARF5 with pahole 1.21+
Commit 98cd6f521f ("Kconfig: allow explicit opt in to DWARF v5")
prevented CONFIG_DEBUG_INFO_DWARF5 from being selected when
CONFIG_DEBUG_INFO_BTF is enabled because pahole had issues with clang's
DWARF5 info. This was resolved by [1], which is in pahole v1.21.

Allow DEBUG_INFO_DWARF5 to be selected with DEBUG_INFO_BTF when using
pahole v1.21 or newer.

[1]: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=7d8e829f636f47aba2e1b6eda57e74d8e31f733c

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-6-nathan@kernel.org
2022-02-02 11:19:34 +01:00
Nathan Chancellor
6323c81350 lib/Kconfig.debug: Use CONFIG_PAHOLE_VERSION
Now that CONFIG_PAHOLE_VERSION exists, use it in the definition of
CONFIG_PAHOLE_HAS_SPLIT_BTF and CONFIG_PAHOLE_HAS_BTF_TAG to reduce the
amount of duplication across the tree.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-5-nathan@kernel.org
2022-02-02 11:19:33 +01:00
Yonghong Song
7472d5a642 compiler_types: define __user as __attribute__((btf_type_tag("user")))
The __user attribute is currently mainly used by sparse for type checking.
The attribute indicates whether a memory access is in user memory address
space or not. Such information is important during tracing kernel
internal functions or data structures as accessing user memory often
has different mechanisms compared to accessing kernel memory. For example,
the perf-probe needs explicit command line specification to indicate a
particular argument or string in user-space memory ([1], [2], [3]).
Currently, vmlinux BTF is available in kernel with many distributions.
If __user attribute information is available in vmlinux BTF, the explicit
user memory access information from users will not be necessary as
the kernel can figure it out by itself with vmlinux BTF.

Besides the above possible use for perf/probe, another use case is
for bpf verifier. Currently, for bpf BPF_PROG_TYPE_TRACING type of bpf
programs, users can write direct code like
  p->m1->m2
and "p" could be a function parameter. Without __user information in BTF,
the verifier will assume p->m1 accessing kernel memory and will generate
normal loads. Let us say "p" actually tagged with __user in the source
code.  In such cases, p->m1 is actually accessing user memory and direct
load is not right and may produce incorrect result. For such cases,
bpf_probe_read_user() will be the correct way to read p->m1.

To support encoding __user information in BTF, a new attribute
  __attribute__((btf_type_tag("<arbitrary_string>")))
is implemented in clang ([4]). For example, if we have
  #define __user __attribute__((btf_type_tag("user")))
during kernel compilation, the attribute "user" information will
be preserved in dwarf. After pahole converting dwarf to BTF, __user
information will be available in vmlinux BTF.

The following is an example with latest upstream clang (clang14) and
pahole 1.23:

  [$ ~] cat test.c
  #define __user __attribute__((btf_type_tag("user")))
  int foo(int __user *arg) {
          return *arg;
  }
  [$ ~] clang -O2 -g -c test.c
  [$ ~] pahole -JV test.o
  ...
  [1] INT int size=4 nr_bits=32 encoding=SIGNED
  [2] TYPE_TAG user type_id=1
  [3] PTR (anon) type_id=2
  [4] FUNC_PROTO (anon) return=1 args=(3 arg)
  [5] FUNC foo type_id=4
  [$ ~]

You can see for the function argument "int __user *arg", its type is
described as
  PTR -> TYPE_TAG(user) -> INT
The kernel can use this information for bpf verification or other
use cases.

Current btf_type_tag is only supported in clang (>= clang14) and
pahole (>= 1.23). gcc support is also proposed and under development ([5]).

  [1] http://lkml.kernel.org/r/155789874562.26965.10836126971405890891.stgit@devnote2
  [2] http://lkml.kernel.org/r/155789872187.26965.4468456816590888687.stgit@devnote2
  [3] http://lkml.kernel.org/r/155789871009.26965.14167558859557329331.stgit@devnote2
  [4] https://reviews.llvm.org/D111199
  [5] https://lore.kernel.org/bpf/0cbeb2fb-1a18-f690-e360-24b1c90c2a91@fb.com/

Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220127154600.652613-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-27 12:03:46 -08:00
Linus Torvalds
f4484d138b Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "55 patches.

  Subsystems affected by this patch series: percpu, procfs, sysctl,
  misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
  hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
  lib: remove redundant assignment to variable ret
  ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
  kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
  lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
  btrfs: use generic Kconfig option for 256kB page size limit
  arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
  configs: introduce debug.config for CI-like setup
  delayacct: track delays from memory compact
  Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
  delayacct: cleanup flags in struct task_delay_info and functions use it
  delayacct: fix incomplete disable operation when switch enable to disable
  delayacct: support swapin delay accounting for swapping without blkio
  panic: remove oops_id
  panic: use error_report_end tracepoint on warnings
  fs/adfs: remove unneeded variable make code cleaner
  FAT: use io_schedule_timeout() instead of congestion_wait()
  hfsplus: use struct_group_attr() for memcpy() region
  nilfs2: remove redundant pointer sbufs
  fs/binfmt_elf: use PT_LOAD p_align values for static PIE
  const_structs.checkpatch: add frequently used ops structs
  ...
2022-01-20 10:41:01 +02:00
Marco Elver
bece04b5b4 kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
Until recent versions of GCC and Clang, it was not possible to disable
KCOV instrumentation via a function attribute.  The relevant function
attribute was introduced in 540540d06e ("kcov: add
__no_sanitize_coverage to fix noinstr for all architectures").

x86 was the first architecture to want a working noinstr, and at the
time no compiler support for the attribute existed yet.  Therefore,
commit 0f1441b44e ("objtool: Fix noinstr vs KCOV") introduced the
ability to NOP __sanitizer_cov_*() calls in .noinstr.text.

However, this doesn't work for other architectures like arm64 and s390
that want a working noinstr per ARCH_WANTS_NO_INSTR.

At the time of 0f1441b44e, we didn't yet have ARCH_WANTS_NO_INSTR,
but now we can move the Kconfig dependency checks to the generic KCOV
option.  KCOV will be available if:

	- architecture does not care about noinstr, OR
	- we have objtool support (like on x86), OR
	- GCC is 12.0 or newer, OR
	- Clang is 13.0 or newer.

Link: https://lkml.kernel.org/r/20211201152604.3984495-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Nathan Chancellor
bbd2e05fad lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
Commit b05fbcc36b ("btrfs: disable build on platforms having page size
256K") disabled btrfs for configurations that used a 256kB page size.
However, it did not fully solve the problem because CONFIG_TEST_KMOD
selects CONFIG_BTRFS, which does not account for the dependency.  This
results in a Kconfig warning and the failed BUILD_BUG_ON error
returning.

  WARNING: unmet direct dependencies detected for BTRFS_FS
    Depends on [n]: BLOCK [=y] && !PPC_256K_PAGES && !PAGE_SIZE_256KB [=y]
    Selected by [m]:
    - TEST_KMOD [=m] && RUNTIME_TESTING_MENU [=y] && m && MODULES [=y] && NETDEVICES [=y] && NET_CORE [=y] && INET [=y] && BLOCK [=y]

To resolve this, add CONFIG_PAGE_SIZE_LESS_THAN_256KB as a dependency of
CONFIG_TEST_KMOD so there is no more invalid configuration or build
errors.

Link: https://lkml.kernel.org/r/20211129230141.228085-4-nathan@kernel.org
Fixes: b05fbcc36b ("btrfs: disable build on platforms having page size 256K")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Isabella Basso
0acc968f35 test_hash.c: refactor into kunit
Use KUnit framework to make tests more easily integrable with CIs.  Even
though these tests are not yet properly written as unit tests this
change should help in debugging.

Also remove kernel messages (i.e.  through pr_info) as KUnit handles all
debugging output and let it handle module init and exit details.

Link: https://lkml.kernel.org/r/20211208183711.390454-6-isabbasso@riseup.net
Reviewed-by: David Gow <davidgow@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: David Gow <davidgow@google.com>
Co-developed-by: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Signed-off-by: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Co-developed-by: Enzo Ferreira <ferreiraenzoa@gmail.com>
Signed-off-by: Enzo Ferreira <ferreiraenzoa@gmail.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:54 +02:00
Isabella Basso
88168bf35c lib/Kconfig.debug: properly split hash test kernel entries
Split TEST_HASH so that each entry only has one file.

Note that there's no stringhash test file, but actually
<linux/stringhash.h> tests are performed in lib/test_hash.c.

Link: https://lkml.kernel.org/r/20211208183711.390454-5-isabbasso@riseup.net
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: David Gow <davidgow@google.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Cc: Augusto Durães Camargo <augusto.duraes33@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Enzo Ferreira <ferreiraenzoa@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: kernel test robot <lkp@intel.com>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:54 +02:00
Jakub Kicinski
3150a73366 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09 13:23:02 -08:00
Jakub Kicinski
6efcdadc15 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
bpf 2021-12-08

We've added 12 non-merge commits during the last 22 day(s) which contain
a total of 29 files changed, 659 insertions(+), 80 deletions(-).

The main changes are:

1) Fix an off-by-two error in packet range markings and also add a batch of
   new tests for coverage of these corner cases, from Maxim Mikityanskiy.

2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh.

3) Fix two functional regressions and a build warning related to BTF kfunc
   for modules, from Kumar Kartikeya Dwivedi.

4) Fix outdated code and docs regarding BPF's migrate_disable() use on non-
   PREEMPT_RT kernels, from Sebastian Andrzej Siewior.

5) Add missing includes in order to be able to detangle cgroup vs bpf header
   dependencies, from Jakub Kicinski.

6) Fix regression in BPF sockmap tests caused by missing detachment of progs
   from sockets when they are removed from the map, from John Fastabend.

7) Fix a missing "no previous prototype" warning in x86 JIT caused by BPF
   dispatcher, from Björn Töpel.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add selftests to cover packet access corner cases
  bpf: Fix the off-by-two error in range markings
  treewide: Add missing includes masked by cgroup -> bpf dependency
  tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets
  bpf: Fix bpf_check_mod_kfunc_call for built-in modules
  bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL
  mips, bpf: Fix reference to non-existing Kconfig symbol
  bpf: Make sure bpf_disable_instrumentation() is safe vs preemption.
  Documentation/locking/locktypes: Update migrate_disable() bits.
  bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap
  bpf, sockmap: Attach map progs to psock early for feature probes
  bpf, x86: Fix "no previous prototype" warning
====================

Link: https://lore.kernel.org/r/20211208155125.11826-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08 16:06:44 -08:00
Eric Dumazet
4d92b95ff2 net: add net device refcount tracker infrastructure
net device are refcounted. Over the years we had numerous bugs
caused by imbalanced dev_hold() and dev_put() calls.

The general idea is to be able to precisely pair each decrement with
a corresponding prior increment. Both share a cookie, basically
a pointer to private data storing stack traces.

This patch adds dev_hold_track() and dev_put_track().

To use these helpers, each data structure owning a refcount
should also use a "netdevice_tracker" to pair the hold and put.

netdevice_tracker dev_tracker;
...
dev_hold_track(dev, &dev_tracker, GFP_ATOMIC);
...
dev_put_track(dev, &dev_tracker);

Whenever a leak happens, we will get precise stack traces
of the point dev_hold_track() happened, at device dismantle phase.

We will also get a stack trace if too many dev_put_track() for the same
netdevice_tracker are attempted.

This is guarded by CONFIG_NET_DEV_REFCNT_TRACKER option.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:05:07 -08:00
Eric Dumazet
914a7b5000 lib: add tests for reference tracker
This module uses reference tracker, forcing two issues.

1) Double free of a tracker

2) leak of two trackers, one being allocated from softirq context.

"modprobe test_ref_tracker" would emit the following traces.
(Use scripts/decode_stacktrace.sh if necessary)

[  171.648681] reference already released.
[  171.653213] allocated in:
[  171.656523]  alloctest_ref_tracker_alloc2+0x1c/0x20 [test_ref_tracker]
[  171.656526]  init_module+0x86/0x1000 [test_ref_tracker]
[  171.656528]  do_one_initcall+0x9c/0x220
[  171.656532]  do_init_module+0x60/0x240
[  171.656536]  load_module+0x32b5/0x3610
[  171.656538]  __do_sys_init_module+0x148/0x1a0
[  171.656540]  __x64_sys_init_module+0x1d/0x20
[  171.656542]  do_syscall_64+0x4a/0xb0
[  171.656546]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  171.656549] freed in:
[  171.659520]  alloctest_ref_tracker_free+0x13/0x20 [test_ref_tracker]
[  171.659522]  init_module+0xec/0x1000 [test_ref_tracker]
[  171.659523]  do_one_initcall+0x9c/0x220
[  171.659525]  do_init_module+0x60/0x240
[  171.659527]  load_module+0x32b5/0x3610
[  171.659529]  __do_sys_init_module+0x148/0x1a0
[  171.659532]  __x64_sys_init_module+0x1d/0x20
[  171.659534]  do_syscall_64+0x4a/0xb0
[  171.659536]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  171.659575] ------------[ cut here ]------------
[  171.659576] WARNING: CPU: 5 PID: 13016 at lib/ref_tracker.c:112 ref_tracker_free+0x224/0x270
[  171.659581] Modules linked in: test_ref_tracker(+)
[  171.659591] CPU: 5 PID: 13016 Comm: modprobe Tainted: G S                5.16.0-smp-DEV #290
[  171.659595] RIP: 0010:ref_tracker_free+0x224/0x270
[  171.659599] Code: 5e 41 5f 5d c3 48 c7 c7 04 9c 74 a6 31 c0 e8 62 ee 67 00 83 7b 14 00 75 1a 83 7b 18 00 75 30 4c 89 ff 4c 89 f6 e8 9c 00 69 00 <0f> 0b bb ea ff ff ff eb ae 48 c7 c7 3a 0a 77 a6 31 c0 e8 34 ee 67
[  171.659601] RSP: 0018:ffff89058ba0bbd0 EFLAGS: 00010286
[  171.659603] RAX: 0000000000000029 RBX: ffff890586b19780 RCX: 08895bff57c7d100
[  171.659604] RDX: c0000000ffff7fff RSI: 0000000000000282 RDI: ffffffffc0407000
[  171.659606] RBP: ffff89058ba0bc88 R08: 0000000000000000 R09: ffffffffa6f342e0
[  171.659607] R10: 00000000ffff7fff R11: 0000000000000000 R12: 000000008f000000
[  171.659608] R13: 0000000000000014 R14: 0000000000000282 R15: ffffffffc0407000
[  171.659609] FS:  00007f97ea29d740(0000) GS:ffff8923ff940000(0000) knlGS:0000000000000000
[  171.659611] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  171.659613] CR2: 00007f97ea299000 CR3: 0000000186b4a004 CR4: 00000000001706e0
[  171.659614] Call Trace:
[  171.659615]  <TASK>
[  171.659631]  ? alloctest_ref_tracker_free+0x13/0x20 [test_ref_tracker]
[  171.659633]  ? init_module+0x105/0x1000 [test_ref_tracker]
[  171.659636]  ? do_one_initcall+0x9c/0x220
[  171.659638]  ? do_init_module+0x60/0x240
[  171.659641]  ? load_module+0x32b5/0x3610
[  171.659644]  ? __do_sys_init_module+0x148/0x1a0
[  171.659646]  ? __x64_sys_init_module+0x1d/0x20
[  171.659649]  ? do_syscall_64+0x4a/0xb0
[  171.659652]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[  171.659656]  ? 0xffffffffc040a000
[  171.659658]  alloctest_ref_tracker_free+0x13/0x20 [test_ref_tracker]
[  171.659660]  init_module+0x105/0x1000 [test_ref_tracker]
[  171.659663]  do_one_initcall+0x9c/0x220
[  171.659666]  do_init_module+0x60/0x240
[  171.659669]  load_module+0x32b5/0x3610
[  171.659672]  __do_sys_init_module+0x148/0x1a0
[  171.659676]  __x64_sys_init_module+0x1d/0x20
[  171.659678]  do_syscall_64+0x4a/0xb0
[  171.659694]  ? exc_page_fault+0x6e/0x140
[  171.659696]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  171.659698] RIP: 0033:0x7f97ea3dbe7a
[  171.659700] Code: 48 8b 0d 61 8d 06 00 f7 d8 64 89 01 48 83 c8 ff c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2e 8d 06 00 f7 d8 64 89 01 48
[  171.659701] RSP: 002b:00007ffea67ce608 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[  171.659703] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f97ea3dbe7a
[  171.659704] RDX: 00000000013a0ba0 RSI: 0000000000002808 RDI: 00007f97ea299000
[  171.659705] RBP: 00007ffea67ce670 R08: 0000000000000003 R09: 0000000000000000
[  171.659706] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000013a1048
[  171.659707] R13: 00000000013a0ba0 R14: 0000000001399930 R15: 00000000013a1030
[  171.659709]  </TASK>
[  171.659710] ---[ end trace f5dbd6afa41e60a9 ]---
[  171.659712] leaked reference.
[  171.663393]  alloctest_ref_tracker_alloc0+0x1c/0x20 [test_ref_tracker]
[  171.663395]  test_ref_tracker_timer_func+0x9/0x20 [test_ref_tracker]
[  171.663397]  call_timer_fn+0x31/0x140
[  171.663401]  expire_timers+0x46/0x110
[  171.663403]  __run_timers+0x16f/0x1b0
[  171.663404]  run_timer_softirq+0x1d/0x40
[  171.663406]  __do_softirq+0x148/0x2d3
[  171.663408] leaked reference.
[  171.667101]  alloctest_ref_tracker_alloc1+0x1c/0x20 [test_ref_tracker]
[  171.667103]  init_module+0x81/0x1000 [test_ref_tracker]
[  171.667104]  do_one_initcall+0x9c/0x220
[  171.667106]  do_init_module+0x60/0x240
[  171.667108]  load_module+0x32b5/0x3610
[  171.667111]  __do_sys_init_module+0x148/0x1a0
[  171.667113]  __x64_sys_init_module+0x1d/0x20
[  171.667115]  do_syscall_64+0x4a/0xb0
[  171.667117]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  171.667131] ------------[ cut here ]------------
[  171.667132] WARNING: CPU: 5 PID: 13016 at lib/ref_tracker.c:30 ref_tracker_dir_exit+0x104/0x130
[  171.667136] Modules linked in: test_ref_tracker(+)
[  171.667144] CPU: 5 PID: 13016 Comm: modprobe Tainted: G S      W         5.16.0-smp-DEV #290
[  171.667147] RIP: 0010:ref_tracker_dir_exit+0x104/0x130
[  171.667150] Code: 01 00 00 00 00 ad de 48 89 03 4c 89 63 08 48 89 df e8 20 a0 d5 ff 4c 89 f3 4d 39 ee 75 a8 4c 89 ff 48 8b 75 d0 e8 7c 05 69 00 <0f> 0b eb 0c 4c 89 ff 48 8b 75 d0 e8 6c 05 69 00 41 8b 47 08 83 f8
[  171.667151] RSP: 0018:ffff89058ba0bc68 EFLAGS: 00010286
[  171.667154] RAX: 08895bff57c7d100 RBX: ffffffffc0407010 RCX: 000000000000003b
[  171.667156] RDX: 000000000000003c RSI: 0000000000000282 RDI: ffffffffc0407000
[  171.667157] RBP: ffff89058ba0bc98 R08: 0000000000000000 R09: ffffffffa6f342e0
[  171.667159] R10: 00000000ffff7fff R11: 0000000000000000 R12: dead000000000122
[  171.667160] R13: ffffffffc0407010 R14: ffffffffc0407010 R15: ffffffffc0407000
[  171.667162] FS:  00007f97ea29d740(0000) GS:ffff8923ff940000(0000) knlGS:0000000000000000
[  171.667164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  171.667166] CR2: 00007f97ea299000 CR3: 0000000186b4a004 CR4: 00000000001706e0
[  171.667169] Call Trace:
[  171.667170]  <TASK>
[  171.667171]  ? 0xffffffffc040a000
[  171.667173]  init_module+0x126/0x1000 [test_ref_tracker]
[  171.667175]  do_one_initcall+0x9c/0x220
[  171.667179]  do_init_module+0x60/0x240
[  171.667182]  load_module+0x32b5/0x3610
[  171.667186]  __do_sys_init_module+0x148/0x1a0
[  171.667189]  __x64_sys_init_module+0x1d/0x20
[  171.667192]  do_syscall_64+0x4a/0xb0
[  171.667194]  ? exc_page_fault+0x6e/0x140
[  171.667196]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  171.667199] RIP: 0033:0x7f97ea3dbe7a
[  171.667200] Code: 48 8b 0d 61 8d 06 00 f7 d8 64 89 01 48 83 c8 ff c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2e 8d 06 00 f7 d8 64 89 01 48
[  171.667201] RSP: 002b:00007ffea67ce608 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[  171.667203] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f97ea3dbe7a
[  171.667204] RDX: 00000000013a0ba0 RSI: 0000000000002808 RDI: 00007f97ea299000
[  171.667205] RBP: 00007ffea67ce670 R08: 0000000000000003 R09: 0000000000000000
[  171.667206] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000013a1048
[  171.667207] R13: 00000000013a0ba0 R14: 0000000001399930 R15: 00000000013a1030
[  171.667209]  </TASK>
[  171.667210] ---[ end trace f5dbd6afa41e60aa ]---

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:04:44 -08:00
Kumar Kartikeya Dwivedi
d9847eb8be bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL
Vinicius Costa Gomes reported [0] that build fails when
CONFIG_DEBUG_INFO_BTF is enabled and CONFIG_BPF_SYSCALL is disabled.
This leads to btf.c not being compiled, and then no symbol being present
in vmlinux for the declarations in btf.h. Since BTF is not useful
without enabling BPF subsystem, disallow this combination.

However, theoretically disabling both now could still fail, as the
symbol for kfunc_btf_id_list variables is not available. This isn't a
problem as the compiler usually optimizes the whole register/unregister
call, but at lower optimization levels it can fail the build in linking
stage.

Fix that by adding dummy variables so that modules taking address of
them still work, but the whole thing is a noop.

  [0]: https://lore.kernel.org/bpf/20211110205418.332403-1-vinicius.gomes@intel.com

Fixes: 14f267d95f ("bpf: btf: Introduce helpers for dynamic BTF set registration")
Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211122144742.477787-2-memxor@gmail.com
2021-12-02 13:39:46 -08:00
Helge Deller
8d192bec53 parisc: Increase FRAME_WARN to 2048 bytes on parisc
PA-RISC uses a much bigger frame size for functions than other
architectures. So increase it to 2048 for 32- and 64-bit kernels.
This fixes e.g. a warning in lib/xxhash.c.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-22 07:37:31 +01:00
Linus Torvalds
512b7931ad Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "257 patches.

  Subsystems affected by this patch series: scripts, ocfs2, vfs, and
  mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
  gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
  pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
  memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
  vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
  cleanups, kfence, and damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
  mm/damon: remove return value from before_terminate callback
  mm/damon: fix a few spelling mistakes in comments and a pr_debug message
  mm/damon: simplify stop mechanism
  Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
  Docs/admin-guide/mm/damon/start: simplify the content
  Docs/admin-guide/mm/damon/start: fix a wrong link
  Docs/admin-guide/mm/damon/start: fix wrong example commands
  mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
  mm/damon: remove unnecessary variable initialization
  Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
  mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
  selftests/damon: support watermarks
  mm/damon/dbgfs: support watermarks
  mm/damon/schemes: activate schemes based on a watermarks mechanism
  tools/selftests/damon: update for regions prioritization of schemes
  mm/damon/dbgfs: support prioritization weights
  mm/damon/vaddr,paddr: support pageout prioritization
  mm/damon/schemes: prioritize regions within the quotas
  mm/damon/selftests: support schemes quotas
  mm/damon/dbgfs: support quotas of schemes
  ...
2021-11-06 14:08:17 -07:00
David Hildenbrand
50f9481ed9 mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.

Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>	[kselftest]
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:42 -07:00
Linus Torvalds
79ef0c0014 Tracing updates for 5.16:
- kprobes: Restructured stack unwinder to show properly on x86 when a stack
   dump happens from a kretprobe callback.
 
 - Fix to bootconfig parsing
 
 - Have tracefs allow owner and group permissions by default (only denying
   others). There's been pressure to allow non root to tracefs in a
   controlled fashion, and using groups is probably the safest.
 
 - Bootconfig memory managament updates.
 
 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.
 
 - Allow perf to be traced by function tracer.
 
 - Rewrite of function graph tracer to be a callback from the function tracer
   instead of having its own trampoline (this change will happen on an arch
   by arch basis, and currently only x86_64 implements it).
 
 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.
 
 - Allow histogram triggers to add variables that can perform calculations
   against the event's fields.
 
 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent warnings
   from the compiler.
 
 - Extend histogram triggers to key off of variables.
 
 - Have trace recursion use bit magic to determine preempt context over if
   branches.
 
 - Have trace recursion disable preemption as all use cases do anyway.
 
 - Added testing for verification of tracing utilities.
 
 - Various small clean ups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYBdxhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qp1sAQD2oYFwaG3sx872gj/myBcHIBSKdiki
 Hry5csd8zYDBpgD+Poylopt5JIbeDuoYw/BedgEXmscZ8Qr7VzjAXdnv/Q4=
 =Loz8
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - kprobes: Restructured stack unwinder to show properly on x86 when a
   stack dump happens from a kretprobe callback.

 - Fix to bootconfig parsing

 - Have tracefs allow owner and group permissions by default (only
   denying others). There's been pressure to allow non root to tracefs
   in a controlled fashion, and using groups is probably the safest.

 - Bootconfig memory managament updates.

 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.

 - Allow perf to be traced by function tracer.

 - Rewrite of function graph tracer to be a callback from the function
   tracer instead of having its own trampoline (this change will happen
   on an arch by arch basis, and currently only x86_64 implements it).

 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.

 - Allow histogram triggers to add variables that can perform
   calculations against the event's fields.

 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent
   warnings from the compiler.

 - Extend histogram triggers to key off of variables.

 - Have trace recursion use bit magic to determine preempt context over
   if branches.

 - Have trace recursion disable preemption as all use cases do anyway.

 - Added testing for verification of tracing utilities.

 - Various small clean ups and fixes.

* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
  tracing/histogram: Fix semicolon.cocci warnings
  tracing/histogram: Fix documentation inline emphasis warning
  tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
  tracing: Show size of requested perf buffer
  bootconfig: Initialize ret in xbc_parse_tree()
  ftrace: do CPU checking after preemption disabled
  ftrace: disable preemption when recursion locked
  tracing/histogram: Document expression arithmetic and constants
  tracing/histogram: Optimize division by a power of 2
  tracing/histogram: Covert expr to const if both operands are constants
  tracing/histogram: Simplify handling of .sym-offset in expressions
  tracing: Fix operator precedence for hist triggers expression
  tracing: Add division and multiplication support for hist triggers
  tracing: Add support for creating hist trigger variables from literal
  selftests/ftrace: Stop tracing while reading the trace file by default
  MAINTAINERS: Update KPROBES and TRACING entries
  test_kprobes: Move it from kernel/ to lib/
  docs, kprobes: Remove invalid URL and add new reference
  samples/kretprobes: Fix return value if register_kretprobe() failed
  lib/bootconfig: Fix the xbc_get_info kerneldoc
  ...
2021-11-01 20:05:19 -07:00
Linus Torvalds
2dc26d98cf overflow updates for v5.16-rc1
The end goal of the current buffer overflow detection work[0] is to gain
 full compile-time and run-time coverage of all detectable buffer overflows
 seen via array indexing or memcpy(), memmove(), and memset(). The str*()
 family of functions already have full coverage.
 
 While much of the work for these changes have been on-going for many
 releases (i.e. 0-element and 1-element array replacements, as well as
 avoiding false positives and fixing discovered overflows[1]), this series
 contains the foundational elements of several related buffer overflow
 detection improvements by providing new common helpers and FORTIFY_SOURCE
 changes needed to gain the introspection required for compiler visibility
 into array sizes. Also included are a handful of already Acked instances
 using the helpers (or related clean-ups), with many more waiting at the
 ready to be taken via subsystem-specific trees[2]. The new helpers are:
 
 - struct_group() for gaining struct member range introspection.
 - memset_after() and memset_startat() for clearing to the end of structures.
 - DECLARE_FLEX_ARRAY() for using flex arrays in unions or alone in structs.
 
 Also included is the beginning of the refactoring of FORTIFY_SOURCE to
 support memcpy() introspection, fix missing and regressed coverage under
 GCC, and to prepare to fix the currently broken Clang support. Finishing
 this work is part of the larger series[0], but depends on all the false
 positives and buffer overflow bug fixes to have landed already and those
 that depend on this series to land.
 
 As part of the FORTIFY_SOURCE refactoring, a set of both a compile-time
 and run-time tests are added for FORTIFY_SOURCE and the mem*()-family
 functions respectively. The compile time tests have found a legitimate
 (though corner-case) bug[6] already.
 
 Please note that the appearance of "panic" and "BUG" in the
 FORTIFY_SOURCE refactoring are the result of relocating existing code,
 and no new use of those code-paths are expected nor desired.
 
 Finally, there are two tree-wide conversions for 0-element arrays and
 flexible array unions to gain sane compiler introspection coverage that
 result in no known object code differences.
 
 After this series (and the changes that have now landed via netdev
 and usb), we are very close to finally being able to build with
 -Warray-bounds and -Wzero-length-bounds. However, due corner cases in
 GCC[3] and Clang[4], I have not included the last two patches that turn
 on these options, as I don't want to introduce any known warnings to
 the build. Hopefully these can be solved soon.
 
 [0] https://lore.kernel.org/lkml/20210818060533.3569517-1-keescook@chromium.org/
 [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=FORTIFY_SOURCE
 [2] https://lore.kernel.org/lkml/202108220107.3E26FE6C9C@keescook/
 [3] https://lore.kernel.org/lkml/3ab153ec-2798-da4c-f7b1-81b0ac8b0c5b@roeck-us.net/
 [4] https://bugs.llvm.org/show_bug.cgi?id=51682
 [5] https://lore.kernel.org/lkml/202109051257.29B29745C0@keescook/
 [6] https://lore.kernel.org/lkml/20211020200039.170424-1-keescook@chromium.org/
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmGAFWcWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJmKFD/45MJdnvW5MhIEeW5tc5UjfcIPS
 ae+YvlEX/2ZwgSlTxocFVocE6hz7b6eCiX3dSAChPkPxsSfgeiuhjxsU+4ROnELR
 04RqTA/rwT6JXfJcXbDPXfxDL4huUkgktAW3m1sT771AZspeap2GrSwFyttlTqKA
 +kTiZ3lXJVFcw10uyhfp3Lk6eFJxdf5iOjuEou5kBOQfpNKEOduRL2K15hSowOwB
 lARiAC+HbmN+E+npvDE7YqK4V7ZQ0/dtB0BlfqgTkn1spQz8N21kBAMpegV5vvIk
 A+qGHc7q2oyk4M14TRTidQHGQ4juW1Kkvq3NV6KzwQIVD+mIfz0ESn3d4tnp28Hk
 Y+OXTI1BRFlApQU9qGWv33gkNEozeyqMLDRLKhDYRSFPA9UKkpgXQRzeTzoLKyrQ
 4B6n5NnUGcu7I6WWhpyZQcZLDsHGyy0vHzjQGs/NXtb1PzXJ5XIGuPdmx9pVMykk
 IVKnqRcWyGWahfh3asOnoXvdhi1No4NSHQ/ZHfUM+SrIGYjBMaUisw66qm3Fe8ZU
 lbO2CFkCsfGSoKNPHf0lUEGlkyxAiDolazOfflDNxdzzlZo2X1l/a7O/yoO4Pqul
 cdL0eDjiNoQ2YR2TSYPnXq5KSL1RI0tlfS8pH8k1hVhZsQx0wpAQ+qki0S+fLePV
 PdA9XB82G2tmqKc9cQ==
 =9xbT
 -----END PGP SIGNATURE-----

Merge tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull overflow updates from Kees Cook:
 "The end goal of the current buffer overflow detection work[0] is to
  gain full compile-time and run-time coverage of all detectable buffer
  overflows seen via array indexing or memcpy(), memmove(), and
  memset(). The str*() family of functions already have full coverage.

  While much of the work for these changes have been on-going for many
  releases (i.e. 0-element and 1-element array replacements, as well as
  avoiding false positives and fixing discovered overflows[1]), this
  series contains the foundational elements of several related buffer
  overflow detection improvements by providing new common helpers and
  FORTIFY_SOURCE changes needed to gain the introspection required for
  compiler visibility into array sizes. Also included are a handful of
  already Acked instances using the helpers (or related clean-ups), with
  many more waiting at the ready to be taken via subsystem-specific
  trees[2].

  The new helpers are:

   - struct_group() for gaining struct member range introspection

   - memset_after() and memset_startat() for clearing to the end of
     structures

   - DECLARE_FLEX_ARRAY() for using flex arrays in unions or alone in
     structs

  Also included is the beginning of the refactoring of FORTIFY_SOURCE to
  support memcpy() introspection, fix missing and regressed coverage
  under GCC, and to prepare to fix the currently broken Clang support.
  Finishing this work is part of the larger series[0], but depends on
  all the false positives and buffer overflow bug fixes to have landed
  already and those that depend on this series to land.

  As part of the FORTIFY_SOURCE refactoring, a set of both a
  compile-time and run-time tests are added for FORTIFY_SOURCE and the
  mem*()-family functions respectively. The compile time tests have
  found a legitimate (though corner-case) bug[6] already.

  Please note that the appearance of "panic" and "BUG" in the
  FORTIFY_SOURCE refactoring are the result of relocating existing code,
  and no new use of those code-paths are expected nor desired.

  Finally, there are two tree-wide conversions for 0-element arrays and
  flexible array unions to gain sane compiler introspection coverage
  that result in no known object code differences.

  After this series (and the changes that have now landed via netdev and
  usb), we are very close to finally being able to build with
  -Warray-bounds and -Wzero-length-bounds.

  However, due corner cases in GCC[3] and Clang[4], I have not included
  the last two patches that turn on these options, as I don't want to
  introduce any known warnings to the build. Hopefully these can be
  solved soon"

Link: https://lore.kernel.org/lkml/20210818060533.3569517-1-keescook@chromium.org/ [0]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=FORTIFY_SOURCE [1]
Link: https://lore.kernel.org/lkml/202108220107.3E26FE6C9C@keescook/ [2]
Link: https://lore.kernel.org/lkml/3ab153ec-2798-da4c-f7b1-81b0ac8b0c5b@roeck-us.net/ [3]
Link: https://bugs.llvm.org/show_bug.cgi?id=51682 [4]
Link: https://lore.kernel.org/lkml/202109051257.29B29745C0@keescook/ [5]
Link: https://lore.kernel.org/lkml/20211020200039.170424-1-keescook@chromium.org/ [6]

* tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
  fortify: strlen: Avoid shadowing previous locals
  compiler-gcc.h: Define __SANITIZE_ADDRESS__ under hwaddress sanitizer
  treewide: Replace 0-element memcpy() destinations with flexible arrays
  treewide: Replace open-coded flex arrays in unions
  stddef: Introduce DECLARE_FLEX_ARRAY() helper
  btrfs: Use memset_startat() to clear end of struct
  string.h: Introduce memset_startat() for wiping trailing members and padding
  xfrm: Use memset_after() to clear padding
  string.h: Introduce memset_after() for wiping trailing members/padding
  lib: Introduce CONFIG_MEMCPY_KUNIT_TEST
  fortify: Add compile-time FORTIFY_SOURCE tests
  fortify: Allow strlen() and strnlen() to pass compile-time known lengths
  fortify: Prepare to improve strnlen() and strlen() warnings
  fortify: Fix dropped strcpy() compile-time write overflow check
  fortify: Explicitly disable Clang support
  fortify: Move remaining fortify helpers into fortify-string.h
  lib/string: Move helper functions out of string.c
  compiler_types.h: Remove __compiletime_object_size()
  cm4000_cs: Use struct_group() to zero struct cm4000_dev region
  can: flexcan: Use struct_group() to zero struct flexcan_regs regions
  ...
2021-11-01 17:12:56 -07:00
Sven Schnelle
e44e81c5b9 kprobes: convert tests to kunit
This converts the kprobes testcases to use the kunit framework.
It adds a dependency on CONFIG_KUNIT, and the output will change
to TAP:

TAP version 14
1..1
    # Subtest: kprobes_test
    1..4
random: crng init done
    ok 1 - test_kprobe
    ok 2 - test_kprobes
    ok 3 - test_kretprobe
    ok 4 - test_kretprobes
ok 1 - kprobes_test

Note that the kprobes testcases are no longer run immediately after
kprobes initialization, but as a late initcall when kunit is
initialized. kprobes itself is initialized with an early initcall,
so the order is still correct.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-21 14:19:01 -04:00
Kees Cook
bb95ebbe89 lib: Introduce CONFIG_MEMCPY_KUNIT_TEST
Before changing anything about memcpy(), memmove(), and memset(), add
run-time tests to check basic behaviors for any regressions.

Signed-off-by: Kees Cook <keescook@chromium.org>
2021-10-18 12:28:52 -07:00