Commit Graph

145 Commits

Author SHA1 Message Date
Linus Torvalds
c2a96b7f18 Driver core changes for 6.11-rc1
Here is the big set of driver core changes for 6.11-rc1.
 
 Lots of stuff in here, with not a huge diffstat, but apis are evolving
 which required lots of files to be touched.  Highlights of the changes
 in here are:
   - platform remove callback api final fixups (Uwe took many releases to
     get here, finally!)
   - Rust bindings for basic firmware apis and initial driver-core
     interactions.  It's not all that useful for a "write a whole driver
     in rust" type of thing, but the firmware bindings do help out the
     phy rust drivers, and the driver core bindings give a solid base on
     which others can start their work.  There is still a long way to go
     here before we have a multitude of rust drivers being added, but
     it's a great first step.
   - driver core const api changes.  This reached across all bus types,
     and there are some fix-ups for some not-common bus types that
     linux-next and 0-day testing shook out.  This work is being done to
     help make the rust bindings more safe, as well as the C code, moving
     toward the end-goal of allowing us to put driver structures into
     read-only memory.  We aren't there yet, but are getting closer.
   - minor devres cleanups and fixes found by code inspection
   - arch_topology minor changes
   - other minor driver core cleanups
 
 All of these have been in linux-next for a very long time with no
 reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZqH+aQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymoOQCfVBdLcBjEDAGh3L8qHRGMPy4rV2EAoL/r+zKm
 cJEYtJpGtWX6aAtugm9E
 =ZyJV
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the big set of driver core changes for 6.11-rc1.

  Lots of stuff in here, with not a huge diffstat, but apis are evolving
  which required lots of files to be touched. Highlights of the changes
  in here are:

   - platform remove callback api final fixups (Uwe took many releases
     to get here, finally!)

   - Rust bindings for basic firmware apis and initial driver-core
     interactions.

     It's not all that useful for a "write a whole driver in rust" type
     of thing, but the firmware bindings do help out the phy rust
     drivers, and the driver core bindings give a solid base on which
     others can start their work.

     There is still a long way to go here before we have a multitude of
     rust drivers being added, but it's a great first step.

   - driver core const api changes.

     This reached across all bus types, and there are some fix-ups for
     some not-common bus types that linux-next and 0-day testing shook
     out.

     This work is being done to help make the rust bindings more safe,
     as well as the C code, moving toward the end-goal of allowing us to
     put driver structures into read-only memory. We aren't there yet,
     but are getting closer.

   - minor devres cleanups and fixes found by code inspection

   - arch_topology minor changes

   - other minor driver core cleanups

  All of these have been in linux-next for a very long time with no
  reported problems"

* tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
  ARM: sa1100: make match function take a const pointer
  sysfs/cpu: Make crash_hotplug attribute world-readable
  dio: Have dio_bus_match() callback take a const *
  zorro: make match function take a const pointer
  driver core: module: make module_[add|remove]_driver take a const *
  driver core: make driver_find_device() take a const *
  driver core: make driver_[create|remove]_file take a const *
  firmware_loader: fix soundness issue in `request_internal`
  firmware_loader: annotate doctests as `no_run`
  devres: Correct code style for functions that return a pointer type
  devres: Initialize an uninitialized struct member
  devres: Fix memory leakage caused by driver API devm_free_percpu()
  devres: Fix devm_krealloc() wasting memory
  driver core: platform: Switch to use kmemdup_array()
  driver core: have match() callback in struct bus_type take a const *
  MAINTAINERS: add Rust device abstractions to DRIVER CORE
  device: rust: improve safety comments
  MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
  MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
  firmware: rust: improve safety comments
  ...
2024-07-25 10:42:22 -07:00
Petr Tesarik
e777798e67 sysfs/cpu: Make crash_hotplug attribute world-readable
There is no reason to restrict access to this attribute, as it merely
reports whether crash elfcorehdr is automatically updated on CPU hot
plug/unplug and/or online/offline events.

Note that since commit 79365026f8 ("crash: add a new kexec flag for
hotplug support"), this maps to the same flag which is world-accessible
through /sys/devices/system/memory/crash_hotplug.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Link: https://lore.kernel.org/r/20240711103409.319673-1-petr.tesarik@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-12 11:00:58 +02:00
Greg Kroah-Hartman
d69d804845 driver core: have match() callback in struct bus_type take a const *
In the match() callback, the struct device_driver * should not be
changed, so change the function callback to be a const *.  This is one
step of many towards making the driver core safe to have struct
device_driver in read-only memory.

Because the match() callback is in all busses, all busses are modified
to handle this properly.  This does entail switching some container_of()
calls to container_of_const() to properly handle the constant *.

For some busses, like PCI and USB and HV, the const * is cast away in
the match callback as those busses do want to modify those structures at
this point in time (they have a local lock in the driver structure.)
That will have to be changed in the future if they wish to have their
struct device * in read-only-memory.

Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Alex Elder <elder@kernel.org>
Acked-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03 15:16:54 +02:00
James Morse
4e1a7df454 cpumask: Add enabled cpumask for present CPUs that can be brought online
The 'offline' file in sysfs shows all offline CPUs, including those
that aren't present. User-space is expected to remove not-present CPUs
from this list to learn which CPUs could be brought online.

CPUs can be present but not-enabled. These CPUs can't be brought online
until the firmware policy changes, which comes with an ACPI notification
that will register the CPUs.

With only the offline and present files, user-space is unable to
determine which CPUs it can try to bring online. Add a new CPU mask
that shows this based on all the registered CPUs.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240529133446.28446-20-Jonathan.Cameron@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28 18:38:33 +01:00
Jonathan Cameron
d830ef3ac5 cpu: Do not warn on arch_register_cpu() returning -EPROBE_DEFER
For arm64 the CPU registration cannot complete until the ACPI
interpreter us up and running so in those cases the arch specific
arch_register_cpu() will return -EPROBE_DEFER at this stage and the
registration will be attempted later.

Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240529133446.28446-3-Jonathan.Cameron@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-28 18:38:27 +01:00
Sourabh Jain
79365026f8 crash: add a new kexec flag for hotplug support
Commit a72bbec70d ("crash: hotplug support for kexec_load()")
introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. Kexec tool uses
this flag to indicate to the kernel that it is safe to modify the
elfcorehdr of the kdump image loaded using the kexec_load system call.

However, it is possible that architectures may need to update kexec
segments other then elfcorehdr. For example, FDT (Flatten Device Tree)
on PowerPC. Introducing a new kexec flag for every new kexec segment
may not be a good solution. Hence, a generic kexec flag bit,
`KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory
hotplug support intent between the kexec tool and the kernel for the
kexec_load system call.

Now we have two kexec flags that enables crash hotplug support for
kexec_load system call. First is KEXEC_UPDATE_ELFCOREHDR (only used in
x86), and second is KEXEC_CRASH_HOTPLUG_SUPPORT (for all architectures).

To simplify the process of finding and reporting the crash hotplug
support the following changes are introduced.

1. Define arch specific function to process the kexec flags and
   determine crash hotplug support

2. Rename the @update_elfcorehdr member of struct kimage to
   @hotplug_support and populate it for both kexec_load and
   kexec_file_load syscalls, because architecture can update more than
   one kexec segment

3. Let generic function crash_check_hotplug_support report hotplug
   support for loaded kdump image based on value of @hotplug_support

To bring the x86 crash hotplug support in line with the above points,
the following changes have been made:

- Introduce the arch_crash_hotplug_support function to process kexec
  flags and determine crash hotplug support

- Remove the arch_crash_hotplug_[cpu|memory]_support functions

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240326055413.186534-3-sourabhjain@linux.ibm.com
2024-04-23 14:59:01 +10:00
Linus Torvalds
241590e5a1 Driver core changes for 6.9-rc1
Here is the "big" set of driver core and kernfs changes for 6.9-rc1.
 
 Nothing all that crazy here, just some good updates that include:
   - automatic attribute group hiding from Dan Williams (he fixed up my
     horrible attempt at doing this.)
   - kobject lock contention fixes from Eric Dumazet
   - driver core cleanups from Andy
   - kernfs rcu work from Tejun
   - fw_devlink changes to resolve some reported issues
   - other minor changes, all details in the shortlog
 
 All of these have been in linux-next for a long time with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZfwsHg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynT4ACePcNRAsYrINlOPPKPHimJtyP01yEAn0pZYnj2
 0/UpqIqf3HVPu7zsLKTa
 =vR9S
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the "big" set of driver core and kernfs changes for 6.9-rc1.

  Nothing all that crazy here, just some good updates that include:

   - automatic attribute group hiding from Dan Williams (he fixed up my
     horrible attempt at doing this.)

   - kobject lock contention fixes from Eric Dumazet

   - driver core cleanups from Andy

   - kernfs rcu work from Tejun

   - fw_devlink changes to resolve some reported issues

   - other minor changes, all details in the shortlog

  All of these have been in linux-next for a long time with no reported
  issues"

* tag 'driver-core-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (28 commits)
  device: core: Log warning for devices pending deferred probe on timeout
  driver: core: Use dev_* instead of pr_* so device metadata is added
  driver: core: Log probe failure as error and with device metadata
  of: property: fw_devlink: Add support for "post-init-providers" property
  driver core: Add FWLINK_FLAG_IGNORE to completely ignore a fwnode link
  driver core: Adds flags param to fwnode_link_add()
  debugfs: fix wait/cancellation handling during remove
  device property: Don't use "proxy" headers
  device property: Move enum dev_dma_attr to fwnode.h
  driver core: Move fw_devlink stuff to where it belongs
  driver core: Drop unneeded 'extern' keyword in fwnode.h
  firmware_loader: Suppress warning on FW_OPT_NO_WARN flag
  sysfs:Addresses documentation in sysfs_merge_group and sysfs_unmerge_group.
  firmware_loader: introduce __free() cleanup hanler
  platform-msi: Remove usage of the deprecated ida_simple_xx() API
  sysfs: Introduce DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE()
  sysfs: Document new "group visible" helpers
  sysfs: Fix crash on empty group attributes array
  sysfs: Introduce a mechanism to hide static attribute_groups
  sysfs: Introduce a mechanism to hide static attribute_groups
  ...
2024-03-21 13:34:15 -07:00
Linus Torvalds
902861e34c - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory.  Series
   "implement "memmap on memory" feature on s390".
 
 - More folio conversions from Matthew Wilcox in the series
 
 	"Convert memcontrol charge moving to use folios"
 	"mm: convert mm counter to take a folio"
 
 - Chengming Zhou has optimized zswap's rbtree locking, providing
   significant reductions in system time and modest but measurable
   reductions in overall runtimes.  The series is "mm/zswap: optimize the
   scalability of zswap rb-tree".
 
 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap
   lru list" which provides measurable runtime benefits in some
   swap-intensive situations.
 
 - And Chengming Zhou further optimizes zswap in the series "mm/zswap:
   optimize for dynamic zswap_pools".  Measured improvements are modest.
 
 - zswap cleanups and simplifications from Yosry Ahmed in the series "mm:
   zswap: simplify zswap_swapoff()".
 
 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
   contributed several DAX cleanups as well as adding a sysfs tunable to
   control the memmap_on_memory setting when the dax device is hotplugged
   as system memory.
 
 - Johannes Weiner has added the large series "mm: zswap: cleanups",
   which does that.
 
 - More DAMON work from SeongJae Park in the series
 
 	"mm/damon: make DAMON debugfs interface deprecation unignorable"
 	"selftests/damon: add more tests for core functionalities and corner cases"
 	"Docs/mm/damon: misc readability improvements"
 	"mm/damon: let DAMOS feeds and tame/auto-tune itself"
 
 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
   extension" Rakie Kim has developed a new mempolicy interleaving policy
   wherein we allocate memory across nodes in a weighted fashion rather
   than uniformly.  This is beneficial in heterogeneous memory environments
   appearing with CXL.
 
 - Christophe Leroy has contributed some cleanup and consolidation work
   against the ARM pagetable dumping code in the series "mm: ptdump:
   Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
 
 - Luis Chamberlain has added some additional xarray selftesting in the
   series "test_xarray: advanced API multi-index tests".
 
 - Muhammad Usama Anjum has reworked the selftest code to make its
   human-readable output conform to the TAP ("Test Anything Protocol")
   format.  Amongst other things, this opens up the use of third-party
   tools to parse and process out selftesting results.
 
 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the
   series "mm/memory: optimize fork() with PTE-mapped THP".  Mainly
   targeted at arm64, this significantly speeds up fork() when the process
   has a large number of pte-mapped folios.
 
 - David Hildenbrand also gets in on the THP pte batching game in his
   series "mm/memory: optimize unmap/zap with PTE-mapped THP".  It
   implements batching during munmap() and other pte teardown situations.
   The microbenchmark improvements are nice.
 
 - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan
   Roberts further utilizes arm's pte's contiguous bit ("contpte
   mappings").  Kernel build times on arm64 improved nicely.  Ryan's series
   "Address some contpte nits" provides some followup work.
 
 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
   fixed an obscure hugetlb race which was causing unnecessary page faults.
   He has also added a reproducer under the selftest code.
 
 - In the series "selftests/mm: Output cleanups for the compaction test",
   Mark Brown did what the title claims.
 
 - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring".
 
 - Even more zswap material from Nhat Pham.  The series "fix and extend
   zswap kselftests" does as claimed.
 
 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
   regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in
   our handling of DAX on archiecctures which have virtually aliasing data
   caches.  The arm architecture is the main beneficiary.
 
 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic
   improvements in worst-case mmap_lock hold times during certain
   userfaultfd operations.
 
 - Some page_owner enhancements and maintenance work from Oscar Salvador
   in his series
 
 	"page_owner: print stacks and their outstanding allocations"
 	"page_owner: Fixup and cleanup"
 
 - Uladzislau Rezki has contributed some vmalloc scalability improvements
   in his series "Mitigate a vmap lock contention".  It realizes a 12x
   improvement for a certain microbenchmark.
 
 - Some kexec/crash cleanup work from Baoquan He in the series "Split
   crash out from kexec and clean up related config items".
 
 - Some zsmalloc maintenance work from Chengming Zhou in the series
 
 	"mm/zsmalloc: fix and optimize objects/page migration"
 	"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
 
 - Zi Yan has taught the MM to perform compaction on folios larger than
   order=0.  This a step along the path to implementaton of the merging of
   large anonymous folios.  The series is named "Enable >0 order folio
   memory compaction".
 
 - Christoph Hellwig has done quite a lot of cleanup work in the
   pagecache writeback code in his series "convert write_cache_pages() to
   an iterator".
 
 - Some modest hugetlb cleanups and speedups in Vishal Moola's series
   "Handle hugetlb faults under the VMA lock".
 
 - Zi Yan has changed the page splitting code so we can split huge pages
   into sizes other than order-0 to better utilize large folios.  The
   series is named "Split a folio to any lower order folios".
 
 - David Hildenbrand has contributed the series "mm: remove
   total_mapcount()", a cleanup.
 
 - Matthew Wilcox has sought to improve the performance of bulk memory
   freeing in his series "Rearrange batched folio freeing".
 
 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
   provides large improvements in bootup times on large machines which are
   configured to use large numbers of hugetlb pages.
 
 - Matthew Wilcox's series "PageFlags cleanups" does that.
 
 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that
   also.  S390 is affected.
 
 - Cleanups to our pagemap utility functions from Peter Xu in his series
   "mm/treewide: Replace pXd_large() with pXd_leaf()".
 
 - Nico Pache has fixed a few things with our hugepage selftests in his
   series "selftests/mm: Improve Hugepage Test Handling in MM Selftests".
 
 - Also, of course, many singleton patches to many things.  Please see
   the individual changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA
 joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx
 TMNhHfyiHYDTx/GAV9NXW84tasJSDgA=
 =TG55
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
   from hotplugged memory rather than only from main memory. Series
   "implement "memmap on memory" feature on s390".

 - More folio conversions from Matthew Wilcox in the series

	"Convert memcontrol charge moving to use folios"
	"mm: convert mm counter to take a folio"

 - Chengming Zhou has optimized zswap's rbtree locking, providing
   significant reductions in system time and modest but measurable
   reductions in overall runtimes. The series is "mm/zswap: optimize the
   scalability of zswap rb-tree".

 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap
   lru list" which provides measurable runtime benefits in some
   swap-intensive situations.

 - And Chengming Zhou further optimizes zswap in the series "mm/zswap:
   optimize for dynamic zswap_pools". Measured improvements are modest.

 - zswap cleanups and simplifications from Yosry Ahmed in the series
   "mm: zswap: simplify zswap_swapoff()".

 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
   contributed several DAX cleanups as well as adding a sysfs tunable to
   control the memmap_on_memory setting when the dax device is
   hotplugged as system memory.

 - Johannes Weiner has added the large series "mm: zswap: cleanups",
   which does that.

 - More DAMON work from SeongJae Park in the series

	"mm/damon: make DAMON debugfs interface deprecation unignorable"
	"selftests/damon: add more tests for core functionalities and corner cases"
	"Docs/mm/damon: misc readability improvements"
	"mm/damon: let DAMOS feeds and tame/auto-tune itself"

 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
   extension" Rakie Kim has developed a new mempolicy interleaving
   policy wherein we allocate memory across nodes in a weighted fashion
   rather than uniformly. This is beneficial in heterogeneous memory
   environments appearing with CXL.

 - Christophe Leroy has contributed some cleanup and consolidation work
   against the ARM pagetable dumping code in the series "mm: ptdump:
   Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".

 - Luis Chamberlain has added some additional xarray selftesting in the
   series "test_xarray: advanced API multi-index tests".

 - Muhammad Usama Anjum has reworked the selftest code to make its
   human-readable output conform to the TAP ("Test Anything Protocol")
   format. Amongst other things, this opens up the use of third-party
   tools to parse and process out selftesting results.

 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the
   series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
   targeted at arm64, this significantly speeds up fork() when the
   process has a large number of pte-mapped folios.

 - David Hildenbrand also gets in on the THP pte batching game in his
   series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
   implements batching during munmap() and other pte teardown
   situations. The microbenchmark improvements are nice.

 - And in the series "Transparent Contiguous PTEs for User Mappings"
   Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
   mappings"). Kernel build times on arm64 improved nicely. Ryan's
   series "Address some contpte nits" provides some followup work.

 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
   fixed an obscure hugetlb race which was causing unnecessary page
   faults. He has also added a reproducer under the selftest code.

 - In the series "selftests/mm: Output cleanups for the compaction
   test", Mark Brown did what the title claims.

 - Kinsey Ho has added the series "mm/mglru: code cleanup and
   refactoring".

 - Even more zswap material from Nhat Pham. The series "fix and extend
   zswap kselftests" does as claimed.

 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
   regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
   in our handling of DAX on archiecctures which have virtually aliasing
   data caches. The arm architecture is the main beneficiary.

 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides
   dramatic improvements in worst-case mmap_lock hold times during
   certain userfaultfd operations.

 - Some page_owner enhancements and maintenance work from Oscar Salvador
   in his series

	"page_owner: print stacks and their outstanding allocations"
	"page_owner: Fixup and cleanup"

 - Uladzislau Rezki has contributed some vmalloc scalability
   improvements in his series "Mitigate a vmap lock contention". It
   realizes a 12x improvement for a certain microbenchmark.

 - Some kexec/crash cleanup work from Baoquan He in the series "Split
   crash out from kexec and clean up related config items".

 - Some zsmalloc maintenance work from Chengming Zhou in the series

	"mm/zsmalloc: fix and optimize objects/page migration"
	"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"

 - Zi Yan has taught the MM to perform compaction on folios larger than
   order=0. This a step along the path to implementaton of the merging
   of large anonymous folios. The series is named "Enable >0 order folio
   memory compaction".

 - Christoph Hellwig has done quite a lot of cleanup work in the
   pagecache writeback code in his series "convert write_cache_pages()
   to an iterator".

 - Some modest hugetlb cleanups and speedups in Vishal Moola's series
   "Handle hugetlb faults under the VMA lock".

 - Zi Yan has changed the page splitting code so we can split huge pages
   into sizes other than order-0 to better utilize large folios. The
   series is named "Split a folio to any lower order folios".

 - David Hildenbrand has contributed the series "mm: remove
   total_mapcount()", a cleanup.

 - Matthew Wilcox has sought to improve the performance of bulk memory
   freeing in his series "Rearrange batched folio freeing".

 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
   provides large improvements in bootup times on large machines which
   are configured to use large numbers of hugetlb pages.

 - Matthew Wilcox's series "PageFlags cleanups" does that.

 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that
   also. S390 is affected.

 - Cleanups to our pagemap utility functions from Peter Xu in his series
   "mm/treewide: Replace pXd_large() with pXd_leaf()".

 - Nico Pache has fixed a few things with our hugepage selftests in his
   series "selftests/mm: Improve Hugepage Test Handling in MM
   Selftests".

 - Also, of course, many singleton patches to many things. Please see
   the individual changelogs for details.

* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
  mm/zswap: remove the memcpy if acomp is not sleepable
  crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  memtest: use {READ,WRITE}_ONCE in memory scanning
  mm: prohibit the last subpage from reusing the entire large folio
  mm: recover pud_leaf() definitions in nopmd case
  selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
  selftests/mm: skip uffd hugetlb tests with insufficient hugepages
  selftests/mm: dont fail testsuite due to a lack of hugepages
  mm/huge_memory: skip invalid debugfs new_order input for folio split
  mm/huge_memory: check new folio order when split a folio
  mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
  mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
  mm: fix list corruption in put_pages_list
  mm: remove folio from deferred split list before uncharging it
  filemap: avoid unnecessary major faults in filemap_fault()
  mm,page_owner: drop unnecessary check
  mm,page_owner: check for null stack_record before bumping its refcount
  mm: swap: fix race between free_swap_and_cache() and swapoff()
  mm/treewide: align up pXd_leaf() retval across archs
  mm/treewide: drop pXd_large()
  ...
2024-03-14 17:43:30 -07:00
Pawan Gupta
8076fcde01 x86/rfds: Mitigate Register File Data Sampling (RFDS)
RFDS is a CPU vulnerability that may allow userspace to infer kernel
stale data previously used in floating point registers, vector registers
and integer registers. RFDS only affects certain Intel Atom processors.

Intel released a microcode update that uses VERW instruction to clear
the affected CPU buffers. Unlike MDS, none of the affected cores support
SMT.

Add RFDS bug infrastructure and enable the VERW based mitigation by
default, that clears the affected buffers just before exiting to
userspace. Also add sysfs reporting and cmdline parameter
"reg_file_data_sampling" to control the mitigation.

For details see:
Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-03-11 13:13:48 -07:00
Baoquan He
02aff84805 crash: split crash dumping code out from kexec_core.c
Currently, KEXEC_CORE select CRASH_CORE automatically because crash codes
need be built in to avoid compiling error when building kexec code even
though the crash dumping functionality is not enabled. E.g
--------------------
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
---------------------

After splitting out crashkernel reservation code and vmcoreinfo exporting
code, there's only crash related code left in kernel/crash_core.c. Now
move crash related codes from kexec_core.c to crash_core.c and only build it
in when CONFIG_CRASH_DUMP=y.

And also wrap up crash codes inside CONFIG_CRASH_DUMP ifdeffery scope,
or replace inappropriate CONFIG_KEXEC_CORE ifdef with CONFIG_CRASH_DUMP
ifdef in generic kernel files.

With these changes, crash_core codes are abstracted from kexec codes and
can be disabled at all if only kexec reboot feature is wanted.

Link: https://lkml.kernel.org/r/20240124051254.67105-5-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23 17:48:22 -08:00
Greg Kroah-Hartman
3a480d4bb5 driver core: cpu: make cpu_subsys const
Now that the driver core can properly handle constant struct bus_type,
move the cpu_subsys variable to be a constant structure as well, placing
it into read-only memory which can not be modified at runtime.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/2024010548-crane-snooze-a871@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-30 15:57:50 -08:00
Linus Torvalds
80955ae955 Driver core changes for 6.8-rc1
Here are the set of driver core and kernfs changes for 6.8-rc1.  Nothing
 major in here this release cycle, just lots of small cleanups and some
 tweaks on kernfs that in the very end, got reverted and will come back
 in a safer way next release cycle.
 
 Included in here are:
   - more driver core 'const' cleanups and fixes
   - fw_devlink=rpm is now the default behavior
   - kernfs tiny changes to remove some string functions
   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting
   - other minor changes and cleanups
 
 All of the cpu handling patches have been acked by the respective
 maintainers and are coming in here in one series.  Everything has been
 in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZaeOrg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymtcwCffzvKKkSY9qAp6+0v2WQNkZm1JWoAoJCPYUwF
 If6wEoPLWvRfKx4gIoq9
 =D96r
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here are the set of driver core and kernfs changes for 6.8-rc1.
  Nothing major in here this release cycle, just lots of small cleanups
  and some tweaks on kernfs that in the very end, got reverted and will
  come back in a safer way next release cycle.

  Included in here are:

   - more driver core 'const' cleanups and fixes

   - fw_devlink=rpm is now the default behavior

   - kernfs tiny changes to remove some string functions

   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting

   - other minor changes and cleanups

  All of the cpu handling patches have been acked by the respective
  maintainers and are coming in here in one series. Everything has been
  in linux-next for a while with no reported issues"

* tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits)
  Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"
  kernfs: convert kernfs_idr_lock to an irq safe raw spinlock
  class: fix use-after-free in class_register()
  PM: clk: make pm_clk_add_notifier() take a const pointer
  EDAC: constantify the struct bus_type usage
  kernfs: fix reference to renamed function
  driver core: device.h: fix Excess kernel-doc description warning
  driver core: class: fix Excess kernel-doc description warning
  driver core: mark remaining local bus_type variables as const
  driver core: container: make container_subsys const
  driver core: bus: constantify subsys_register() calls
  driver core: bus: make bus_sort_breadthfirst() take a const pointer
  kernfs: d_obtain_alias(NULL) will do the right thing...
  driver core: Better advertise dev_err_probe()
  kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy()
  initramfs: Expose retained initrd as sysfs file
  fs/kernfs/dir: obey S_ISGID
  kernel/cgroup: use kernfs_create_dir_ns()
  ...
2024-01-18 09:48:40 -08:00
Baoquan He
4e9e2e4c65 drivers/base/cpu: crash data showing should depends on KEXEC_CORE
After commit 88a6f89944 ("crash: memory and CPU hotplug sysfs
attributes"), on x86_64, if only below kernel configs related to kdump are
set, compiling error are triggered.

----
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_CRASH_DUMP=y
CONFIG_CRASH_HOTPLUG=y
------

------------------------------------------------------
drivers/base/cpu.c: In function `crash_hotplug_show':
drivers/base/cpu.c:309:40: error: implicit declaration of function `crash_hotplug_cpu_support'; did you mean `crash_hotplug_show'? [-Werror=implicit-function-declaration]
  309 |         return sysfs_emit(buf, "%d\n", crash_hotplug_cpu_support());
      |                                        ^~~~~~~~~~~~~~~~~~~~~~~~~
      |                                        crash_hotplug_show
cc1: some warnings being treated as errors
------------------------------------------------------

CONFIG_KEXEC is used to enable kexec_load interface, the
crash_notes/crash_notes_size/crash_hotplug showing depends on
CONFIG_KEXEC is incorrect. It should depend on KEXEC_CORE instead.

Fix it now.

Link: https://lkml.kernel.org/r/20231128055248.659808-1-bhe@redhat.com
Fixes: 88a6f89944 ("crash: memory and CPU hotplug sysfs attributes")
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Ignat Korchagin <ignat@cloudflare.com>	[compile-time only]
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:48 -08:00
James Morse
ca00f7d999 drivers: base: Print a warning instead of panic() when register_cpu() fails
loongarch, mips, parisc, riscv and sh all print a warning if
register_cpu() returns an error. Architectures that use
GENERIC_CPU_DEVICES call panic() instead.

Errors in this path indicate something is wrong with the firmware
description of the platform, but the kernel is able to keep running.

Downgrade this to a warning to make it easier to debug this issue.

This will allow architectures that switching over to GENERIC_CPU_DEVICES
to drop their warning, but keep the existing behaviour.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R3W-00CszU-GM@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 12:41:49 +09:00
Russell King (Oracle)
bb5e44fb3b drivers: base: add arch_cpu_is_hotpluggable()
The differences between architecture specific implementations of
arch_register_cpu() are down to whether the CPU is hotpluggable or not.
Rather than overriding the weak version of arch_register_cpu(), provide
a function that can be used to provide this detail instead.

Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R3M-00CszH-6r@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 12:41:48 +09:00
James Morse
866ec30086 drivers: base: Implement weak arch_unregister_cpu()
Add arch_unregister_cpu() to allow the ACPI machinery to call
unregister_cpu(). This is enough for arm64, riscv and loongarch, but
needs to be overridden by x86 and ia64 who need to do more work.

CC: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R3H-00CszC-2n@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 12:41:48 +09:00
James Morse
0949dd96df drivers: base: Allow parts of GENERIC_CPU_DEVICES to be overridden
Architectures often have extra per-cpu work that needs doing
before a CPU is registered, often to determine if a CPU is
hotpluggable.

To allow the ACPI architectures to use GENERIC_CPU_DEVICES, move
the cpu_register() call into arch_register_cpu(), which is made __weak
so architectures with extra work can override it.
This aligns with the way x86, ia64 and loongarch register hotplug CPUs
when they become present.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R3B-00Csz6-Uh@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 12:41:48 +09:00
James Morse
b0c69e1214 drivers: base: Use present CPUs in GENERIC_CPU_DEVICES
Three of the five ACPI architectures create sysfs entries using
register_cpu() for present CPUs, whereas arm64, riscv and all
GENERIC_CPU_DEVICES do this for possible CPUs.

Registering a CPU is what causes them to show up in sysfs.

It makes very little sense to register all possible CPUs. Registering
a CPU is what triggers the udev notifications allowing user-space to
react to newly added CPUs.

To allow all five ACPI architectures to use GENERIC_CPU_DEVICES, change
it to use for_each_present_cpu().

Making the ACPI architectures use GENERIC_CPU_DEVICES is a pre-requisite
step to centralise their register_cpu() logic, before moving it into the
ACPI processor driver. When we add support for register CPUs from ACPI
in a later patch, we will avoid registering CPUs in this path.

Of the ACPI architectures that register possible CPUs, arm64 and riscv
do not support making possible CPUs present as they use the weak 'always
fails' version of arch_register_cpu().

Only two of the eight architectures that use GENERIC_CPU_DEVICES have a
distinction between present and possible CPUs.

The following architectures use GENERIC_CPU_DEVICES but are not SMP,
so possible == present:
 * m68k
 * microblaze
 * nios2

The following architectures use GENERIC_CPU_DEVICES and consider
possible == present:
 * csky: setup_smp()
 * processor_probe() sets possible for all CPUs and present for all CPUs
   except the boot cpu, which will have been done by
   init/main.c::start_kernel().

um appears to be a subarchitecture of x86.

The remaining architecture using GENERIC_CPU_DEVICES are:
 * openrisc and hexagon:
   where smp_init_cpus() makes all CPUs < NR_CPUS possible,
   whereas smp_prepare_cpus() only makes CPUs < setup_max_cpus present.

After this change, openrisc and hexagon systems that use the max_cpus
command line argument would not see the other CPUs present in sysfs.
This should not be a problem as these CPUs can't be brought online as
_cpu_up() checks cpu_present().

After this change, only CPUs which are present appear in sysfs.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R36-00Csz0-Px@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 12:41:48 +09:00
Linus Torvalds
28a4f91f5f Driver core changes for 6.6-rc1
Here is a small set of driver core updates and additions for 6.6-rc1.
 
 Included in here are:
   - stable kernel documentation updates
   - class structure const work from Ivan on various subsystems
   - kernfs tweaks
   - driver core tests!
   - kobject sanity cleanups
   - kobject structure reordering to save space
   - driver core error code handling fixups
   - other minor driver core cleanups
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPH77Q8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylZMACePk8SitfaJc6FfFf5I7YK7Nq0V8MAn0nUjgsR
 i8NcNpu/Yv4HGrDgTdh/
 =PJbk
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is a small set of driver core updates and additions for 6.6-rc1.

  Included in here are:

   - stable kernel documentation updates

   - class structure const work from Ivan on various subsystems

   - kernfs tweaks

   - driver core tests!

   - kobject sanity cleanups

   - kobject structure reordering to save space

   - driver core error code handling fixups

   - other minor driver core cleanups

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
  driver core: Call in reversed order in device_platform_notify_remove()
  driver core: Return proper error code when dev_set_name() fails
  kobject: Remove redundant checks for whether ktype is NULL
  kobject: Add sanity check for kset->kobj.ktype in kset_register()
  drivers: base: test: Add missing MODULE_* macros to root device tests
  drivers: base: test: Add missing MODULE_* macros for platform devices tests
  drivers: base: Free devm resources when unregistering a device
  drivers: base: Add basic devm tests for platform devices
  drivers: base: Add basic devm tests for root devices
  kernfs: fix missing kernfs_iattr_rwsem locking
  docs: stable-kernel-rules: mention that regressions must be prevented
  docs: stable-kernel-rules: fine-tune various details
  docs: stable-kernel-rules: make the examples for option 1 a proper list
  docs: stable-kernel-rules: move text around to improve flow
  docs: stable-kernel-rules: improve structure by changing headlines
  base/node: Remove duplicated include
  kernfs: attach uuid for every kernfs and report it in fsid
  kernfs: add stub helper for kernfs_generic_poll()
  x86/resctrl: make pseudo_lock_class a static const structure
  x86/MSR: make msr_class a static const structure
  ...
2023-09-01 09:43:18 -07:00
Linus Torvalds
d68b4b6f30 - An extensive rework of kexec and crash Kconfig from Eric DeVolder
("refactor Kconfig to consolidate KEXEC and CRASH options").
 
 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
   couple of macros to args.h").
 
 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
   commands").
 
 - vsprintf inclusion rationalization from Andy Shevchenko
   ("lib/vsprintf: Rework header inclusions").
 
 - Switch the handling of kdump from a udev scheme to in-kernel handling,
   by Eric DeVolder ("crash: Kernel handling of CPU and memory hot
   un/plug").
 
 - Many singleton patches to various parts of the tree
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO2GpAAKCRDdBJ7gKXxA
 juW3AQD1moHzlSN6x9I3tjm5TWWNYFoFL8af7wXDJspp/DWH/AD/TO0XlWWhhbYy
 QHy7lL0Syha38kKLMXTM+bN6YQHi9AU=
 =WJQa
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - An extensive rework of kexec and crash Kconfig from Eric DeVolder
   ("refactor Kconfig to consolidate KEXEC and CRASH options")

 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
   couple of macros to args.h")

 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
   commands")

 - vsprintf inclusion rationalization from Andy Shevchenko
   ("lib/vsprintf: Rework header inclusions")

 - Switch the handling of kdump from a udev scheme to in-kernel
   handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory
   hot un/plug")

 - Many singleton patches to various parts of the tree

* tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
  document while_each_thread(), change first_tid() to use for_each_thread()
  drivers/char/mem.c: shrink character device's devlist[] array
  x86/crash: optimize CPU changes
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  crash: hotplug support for kexec_load()
  x86/crash: add x86 crash hotplug support
  crash: memory and CPU hotplug sysfs attributes
  kexec: exclude elfcorehdr from the segment digest
  crash: add generic infrastructure for crash hotplug support
  crash: move a few code bits to setup support of crash hotplug
  kstrtox: consistently use _tolower()
  kill do_each_thread()
  nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
  scripts/bloat-o-meter: count weak symbol sizes
  treewide: drop CONFIG_EMBEDDED
  lockdep: fix static memory detection even more
  lib/vsprintf: declare no_hash_pointers in sprintf.h
  lib/vsprintf: split out sprintf() and friends
  kernel/fork: stop playing lockless games for exe_file replacement
  adfs: delete unused "union adfs_dirtail" definition
  ...
2023-08-29 14:53:51 -07:00
Eric DeVolder
88a6f89944 crash: memory and CPU hotplug sysfs attributes
Introduce the crash_hotplug attribute for memory and CPUs for use by
userspace.  These attributes directly facilitate the udev rule for
managing userspace re-loading of the crash kernel upon hot un/plug
changes.

For memory, expose the crash_hotplug attribute to the
/sys/devices/system/memory directory.  For example:

 # udevadm info --attribute-walk /sys/devices/system/memory/memory81
  looking at device '/devices/system/memory/memory81':
    KERNEL=="memory81"
    SUBSYSTEM=="memory"
    DRIVER==""
    ATTR{online}=="1"
    ATTR{phys_device}=="0"
    ATTR{phys_index}=="00000051"
    ATTR{removable}=="1"
    ATTR{state}=="online"
    ATTR{valid_zones}=="Movable"

  looking at parent device '/devices/system/memory':
    KERNELS=="memory"
    SUBSYSTEMS==""
    DRIVERS==""
    ATTRS{auto_online_blocks}=="offline"
    ATTRS{block_size_bytes}=="8000000"
    ATTRS{crash_hotplug}=="1"

For CPUs, expose the crash_hotplug attribute to the
/sys/devices/system/cpu directory. For example:

 # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
  looking at device '/devices/system/cpu/cpu0':
    KERNEL=="cpu0"
    SUBSYSTEM=="cpu"
    DRIVER=="processor"
    ATTR{crash_notes}=="277c38600"
    ATTR{crash_notes_size}=="368"
    ATTR{online}=="1"

  looking at parent device '/devices/system/cpu':
    KERNELS=="cpu"
    SUBSYSTEMS==""
    DRIVERS==""
    ATTRS{crash_hotplug}=="1"
    ATTRS{isolated}==""
    ATTRS{kernel_max}=="8191"
    ATTRS{nohz_full}=="  (null)"
    ATTRS{offline}=="4-7"
    ATTRS{online}=="0-3"
    ATTRS{possible}=="0-7"
    ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for RHEL
system 98-kexec.rules (as the first lines of the rule file):

 # The kernel updates the crash elfcorehdr for CPU and memory changes
 SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
 SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above rules test if
crash_hotplug is set, and if so, the userspace initiated
unload-then-reload of the crash kernel is skipped.

CPU and memory checks are separated in accordance with CONFIG_HOTPLUG_CPU
and CONFIG_MEMORY_HOTPLUG kernel config options.  If an architecture
supports, for example, memory hotplug but not CPU hotplug, then the
/sys/devices/system/memory/crash_hotplug attribute file is present, but
the /sys/devices/system/cpu/crash_hotplug attribute file will NOT be
present.  Thus the udev rule skips userspace processing of memory hot
un/plug events, but the udev rule will evaluate false for CPU events, thus
allowing userspace to process CPU hot un/plug events (ie the
unload-then-reload of the kdump capture kernel).

Link: https://lkml.kernel.org/r/20230814214446.6659-5-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Akhil Raj <lf32.dev@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:25:14 -07:00
Borislav Petkov (AMD)
3477144c87 driver core: cpu: Fix the fallback cpu_show_gds() name
In

  6524c798b7 ("driver core: cpu: Make cpu_show_not_affected() static")

I fat-fingered the name of cpu_show_gds(). Usually, I'd rebase but since
those are extraordinary embargoed times, the commit above was already
pulled into another tree so no no.

Therefore, fix it ontop.

Fixes: 6524c798b7 ("driver core: cpu: Make cpu_show_not_affected() static")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230811095831.27513-1-bp@alien8.de
2023-08-11 20:36:02 +02:00
Borislav Petkov (AMD)
6524c798b7 driver core: cpu: Make cpu_show_not_affected() static
Fix a -Wmissing-prototypes warning and add the gather_data_sampling()
stub macro call for real.

Fixes: 0fddfe3382 ("driver core: cpu: Unify redundant silly stubs")
Closes: https://lore.kernel.org/oe-kbuild-all/202308101956.oRj1ls7s-lkp@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/202308101956.oRj1ls7s-lkp@intel.com
2023-08-10 13:25:12 +02:00
Borislav Petkov (AMD)
0fddfe3382 driver core: cpu: Unify redundant silly stubs
Make them all a weak function, aliasing to a single function which
issues the "Not affected" string.

No functional changes.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20230809102700.29449-3-bp@alien8.de
2023-08-10 11:03:04 +02:00
Linus Torvalds
64094e7e31 Mitigate Gather Data Sampling issue
* Add Base GDS mitigation
  * Support GDS_NO under KVM
  * Fix a documentation typo
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTJh5YACgkQaDWVMHDJ
 krAzAw/8DzjhAYEa7a1AodCBMNg8uNOPnLNoRPPNhaN5Iw6W3zXYDBDKT9PyjAIx
 RoIM0aHx/oY9nCpK441o25oCWAAyzk6E5/+q9hMa7B4aHUGKqiDUC6L9dC8UiiSN
 yvoBv4g7F81QnmyazwYI64S6vnbr4Cqe7K/mvVqQ/vbJiugD25zY8mflRV9YAuMk
 Oe7Ff/mCA+I/kqyKhJE3cf3qNhZ61FsFI886fOSvIE7g4THKqo5eGPpIQxR4mXiU
 Ri2JWffTaeHr2m0sAfFeLH4VTZxfAgBkNQUEWeG6f2kDGTEKibXFRsU4+zxjn3gl
 xug+9jfnKN1ceKyNlVeJJZKAfr2TiyUtrlSE5d+subIRKKBaAGgnCQDasaFAluzd
 aZkOYz30PCebhN+KTrR84FySHCaxnev04jqdtVGAQEDbTvyNagFUdZFGhWijJShV
 l2l4A0gFSYJmPfPVuuAwOJnnZtA1sRH9oz/Sny3+z9BKloZh+Nc/+Cu9zC8SLjaU
 BF3Qv2gU9HKTJ+MSy2JrGS52cONfpO5ngFHoOMilZ1KBHrfSb1eiy32PDT+vK60Y
 PFEmI8SWl7bmrO1snVUCfGaHBsHJSu5KMqwBGmM4xSRzJpyvRe493xC7+nFvqNLY
 vFOFc4jGeusOXgiLPpfGduppkTGcM7sy75UMLwTSLcQbDK99mus=
 =ZAPY
 -----END PGP SIGNATURE-----

Merge tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/gds fixes from Dave Hansen:
 "Mitigate Gather Data Sampling issue:

   - Add Base GDS mitigation

   - Support GDS_NO under KVM

   - Fix a documentation typo"

* tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/x86: Fix backwards on/off logic about YMM support
  KVM: Add GDS_NO support to KVM
  x86/speculation: Add Kconfig option for GDS
  x86/speculation: Add force option to GDS mitigation
  x86/speculation: Add Gather Data Sampling mitigation
2023-08-07 17:03:54 -07:00
Waiman Long
1fd7ab3fac driver/base/cpu: Retry online operation if -EBUSY
Booting the kernel with "maxcpus=1" is a common technique for CPU
partitioning and isolation. It delays the CPU bringup process until
when the bootup scripts are ready to bring CPUs online by writing 1 to
/sys/device/system/cpu/cpu<X>/online. However, it was found that not
all the CPUs were online after bootup. The collection of offline CPUs
are different after every reboot.

Further investigation reveals that some "online" write operations
fail with an -EBUSY error. This error is returned when CPU hotplug is
temporiarly disabled when cpu_hotplug_disable() is called.

During bootup, the main caller of cpu_hotplug_disable() is
pci_call_probe() for PCI device initialization. By measuring the times
spent with cpu_hotplug_disabled set in a typical 2-socket server, most
of them last less than 10ms.  However, there are a few that can last
hundreds of ms. Note that the cpu_hotplug_disabled period of different
devices can overlap leading to longer cpu_hotplug_disabled hold time.

Since the CPU hotplug disable condition is transient and it is not
that easy to modify all the existing bootup scripts to handle this
condition, the kernel can help by retrying the online operation when
an -EBUSY error is returned. This patch retries the online operation
in cpu_subsys_online() when an -EBUSY error is returned for up to 5
times after an exponentially increasing delay that can last a total of
at least 620ms of waiting time by calling msleep().

With this patch in place, booting up the patched kernel with "maxcpus=1"
does not leave any CPU in an offline state in 10 reboot attempts.

Reported-by: Vishal Agrawal <vagrawal@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20230724143826.3996163-1-longman@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-05 08:31:41 +02:00
Borislav Petkov (AMD)
fb3bd914b3 x86/srso: Add a Speculative RAS Overflow mitigation
Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence.  To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference.  In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.

In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-27 11:07:14 +02:00
Daniel Sneddon
8974eb5882 x86/speculation: Add Gather Data Sampling mitigation
Gather Data Sampling (GDS) is a hardware vulnerability which allows
unprivileged speculative access to data which was previously stored in
vector registers.

Intel processors that support AVX2 and AVX512 have gather instructions
that fetch non-contiguous data elements from memory. On vulnerable
hardware, when a gather instruction is transiently executed and
encounters a fault, stale data from architectural or internal vector
registers may get transiently stored to the destination vector
register allowing an attacker to infer the stale data using typical
side channel techniques like cache timing attacks.

This mitigation is different from many earlier ones for two reasons.
First, it is enabled by default and a bit must be set to *DISABLE* it.
This is the opposite of normal mitigation polarity. This means GDS can
be mitigated simply by updating microcode and leaving the new control
bit alone.

Second, GDS has a "lock" bit. This lock bit is there because the
mitigation affects the hardware security features KeyLocker and SGX.
It needs to be enabled and *STAY* enabled for these features to be
mitigated against GDS.

The mitigation is enabled in the microcode by default. Disable it by
setting gather_data_sampling=off or by disabling all mitigations with
mitigations=off. The mitigation status can be checked by reading:

    /sys/devices/system/cpu/vulnerabilities/gather_data_sampling

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-07-19 16:45:37 -07:00
Joel Fernandes (Google)
58d7668242 tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
For CONFIG_NO_HZ_FULL systems, the tick_do_timer_cpu cannot be offlined.
However, cpu_is_hotpluggable() still returns true for those CPUs. This causes
torture tests that do offlining to end up trying to offline this CPU causing
test failures. Such failure happens on all architectures.

Fix the repeated error messages thrown by this (even if the hotplug errors are
harmless) by asking the opinion of the nohz subsystem on whether the CPU can be
hotplugged.

[ Apply Frederic Weisbecker feedback on refactoring tick_nohz_cpu_down(). ]

For drivers/base/ portion:
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: rcu <rcu@vger.kernel.org>
Cc: stable@vger.kernel.org
Fixes: 2987557f52 ("driver-core/cpu: Expose hotpluggability to the rest of the kernel")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-04-05 13:47:43 +00:00
Greg Kroah-Hartman
2bc19066bd driver core: cpu: don't hand-override the uevent bus_type callback.
Instead of having to change the uevent bus_type callback by hand at
runtime, set it at build time based on the build configuration options,
making this much simpler to maintain and understand (and allow to make
the structure constant.)

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230210102408.1083177-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-11 11:50:41 +01:00
Greg Kroah-Hartman
8c99377e61 driver core: bus: add bus_get_dev_root() function
Instead of poking around in the struct bus_type directly for the
dev_root pointer, provide a function to return it properly reference
counted, if it is present in the bus.  This will be needed to move the
pointer out of struct bus_type in the future.

Use the function in the driver core code at the same time it is
introduced to verify that it works properly.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230209093556.19132-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-10 10:16:42 +01:00
Greg Kroah-Hartman
2a81ada32f driver core: make struct bus_type.uevent() take a const *
The uevent() callback in struct bus_type should not be modifying the
device that is passed into it, so mark it as a const * and propagate the
function signature changes out into all relevant subsystems that use
this callback.

Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-27 13:45:52 +01:00
Alexandre Chartre
6b80b59b35 x86/bugs: Report AMD retbleed vulnerability
Report that AMD x86 CPUs are vulnerable to the RETBleed (Arbitrary
Speculative Code Execution with Return Instructions) attack.

  [peterz: add hygon]
  [kim: invert parity; fam15h]

Co-developed-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:59 +02:00
Pawan Gupta
8d50cdf8b8 x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
Add the sysfs reporting file for Processor MMIO Stale Data
vulnerability. It exposes the vulnerability and mitigation state similar
to the existing files for the other hardware vulnerabilities.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-05-21 12:16:04 +02:00
Frederic Weisbecker
04d4e665a6 sched/isolation: Use single feature type while referring to housekeeping cpumask
Refer to housekeeping APIs using single feature types instead of flags.
This prevents from passing multiple isolation features at once to
housekeeping interfaces, which soon won't be possible anymore as each
isolation features will have their own cpumask.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
2022-02-16 15:57:55 +01:00
Jinchao Wang
e7deeb9d79 driver: base: Prefer unsigned int to bare use of unsigned
Fix checkpatch warnings:
    WARNING: Prefer 'unsigned int' to bare use of 'unsigned'

Signed-off-by: Jinchao Wang <wjc@cdjrlc.com>
Link: https://lore.kernel.org/r/20210628171907.63646-2-wjc@cdjrlc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-21 17:30:09 +02:00
Rikard Falkeborn
5a576764e4 drivers/base: Constify static attribute_group structs
These are only used by putting their address in an array of pointers to
const struct attribute_group (either directly or via the
__ATTRIBUTE_GROUP macro). Make them const to allow the compiler to place
them in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Link: https://lore.kernel.org/r/20210528213408.20067-1-rikard.falkeborn@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-04 15:06:28 +02:00
Colin Ian King
6b72cf1282 drivers/base/cpu: remove redundant assignment of variable retval
The variable retval is being initialized with a value that is never read
and it is being updated later with a new value.  Clean this up by
initializing retval to -ENOMEM and remove the assignment to retval
on the !dev failure path.

Kudos to Rafael for the improved fix suggestion.

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210218202837.516231-1-colin.king@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-23 14:56:50 +01:00
Joe Perches
948b3edba8 drivers core: Miscellaneous changes for sysfs_emit
Change additional instances that could use sysfs_emit and sysfs_emit_at
that the coccinelle script could not convert.

o macros creating show functions with ## concatenation
o unbound sprintf uses with buf+len for start of output to sysfs_emit_at
o returns with ?: tests and sprintf to sysfs_emit
o sysfs output with struct class * not struct device * arguments

Miscellanea:

o remove unnecessary initializations around these changes
o consistently use int len for return length of show functions
o use octal permissions and not S_<FOO>
o rename a few show function names so DEVICE_ATTR_<FOO> can be used
o use DEVICE_ATTR_ADMIN_RO where appropriate
o consistently use const char *output for strings
o checkpatch/style neatening

Signed-off-by: Joe Perches <joe@perches.com>
Link: https://lore.kernel.org/r/8bc24444fe2049a9b2de6127389b57edfdfe324d.1600285923.git.joe@perches.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02 13:12:07 +02:00
Joe Perches
aa838896d8 drivers core: Use sysfs_emit and sysfs_emit_at for show(device *...) functions
Convert the various sprintf fmaily calls in sysfs device show functions
to sysfs_emit and sysfs_emit_at for PAGE_SIZE buffer safety.

Done with:

$ spatch -sp-file sysfs_emit_dev.cocci --in-place --max-width=80 .

And cocci script:

$ cat sysfs_emit_dev.cocci
@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	strcpy(buf, chr);
+	sysfs_emit(buf, chr);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
-	len += scnprintf(buf + len, PAGE_SIZE - len,
+	len += sysfs_emit_at(buf, len,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	...
-	strcpy(buf, chr);
-	return strlen(buf);
+	return sysfs_emit(buf, chr);
}

Signed-off-by: Joe Perches <joe@perches.com>
Link: https://lore.kernel.org/r/3d033c33056d88bbe34d4ddb62afd05ee166ab9a.1600285923.git.joe@perches.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-02 13:09:10 +02:00
Mark Gross
7e5b3c267d x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation
SRBDS is an MDS-like speculative side channel that can leak bits from the
random number generator (RNG) across cores and threads. New microcode
serializes the processor access during the execution of RDRAND and
RDSEED. This ensures that the shared buffer is overwritten before it is
released for reuse.

While it is present on all affected CPU models, the microcode mitigation
is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the
cases where TSX is not supported or has been disabled with TSX_CTRL.

The mitigation is activated by default on affected processors and it
increases latency for RDRAND and RDSEED instructions. Among other
effects this will reduce throughput from /dev/urandom.

* Enable administrator to configure the mitigation off when desired using
  either mitigations=off or srbds=off.

* Export vulnerability status via sysfs

* Rename file-scoped macros to apply for non-whitelist table initializations.

 [ bp: Massage,
   - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g,
   - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in,
   - flip check in cpu_set_bug_bits() to save an indentation level,
   - reflow comments.
   jpoimboe: s/Mitigated/Mitigation/ in user-visible strings
   tglx: Dropped the fused off magic for now
 ]

Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
2020-04-20 12:19:22 +02:00
Linus Torvalds
992a1a3b45 CPU (hotplug) updates:
- Support for locked CSD objects in smp_call_function_single_async()
     which allows to simplify callsites in the scheduler core and MIPS
 
   - Treewide consolidation of CPU hotplug functions which ensures the
     consistency between the sysfs interface and kernel state. The low level
     functions cpu_up/down() are now confined to the core code and not
     longer accessible from random code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B9VQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodCyD/0WFYAe7LkOfNjkbLa0IeuyLjF9rnCi
 ilcSXMLpaVwwoQvm7MopwkXUDdmEIyeJ0B641j3mC3AKCRap4+O36H2IEg2byrj7
 twOvQNCfxpVVmCCD11FTH9aQa74LEB6AikTgjevhrRWj6eHsal7c2Ak26AzCgrt+
 0eEkOAOWJbLAlbIiPdHlCZ3TMldcs3gg+lRSYd5QCGQVkZFnwpXzyOvpyJEUGGbb
 R/JuvwJoLhRMiYAJDILoQQQg/J07ODuivse/R8PWaH2djkn+2NyRGrD794PhyyOg
 QoTU0ZrYD3Z48ACXv+N3jLM7wXMcFzjYtr1vW1E3O/YGA7GVIC6XHGbMQ7tEihY0
 ajtwq8DcnpKtuouviYnf7NuKgqdmJXkaZjz3Gms6n8nLXqqSVwuQELWV2CXkxNe6
 9kgnnKK+xXMOGI4TUhN8bejvkXqRCmKMeQJcWyf+7RA9UIhAJw5o7WGo8gXfQWUx
 tazCqDy/inYjqGxckW615fhi2zHfemlYTbSzIGOuMB1TEPKFcrgYAii/VMsYHQVZ
 5amkYUXGQ5brlCOzOn38lzp5OkALBnFzD7xgvOcQgWT3ynVpdqADfBytXiEEHh4J
 KSkSgSSRcS58397nIxnDcJgJouHLvAWYyPZ4UC6mfynuQIic31qMHGVqwdbEKMY3
 4M5dGgqIfOBgYw==
 =jwCg
 -----END PGP SIGNATURE-----

Merge tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core SMP updates from Thomas Gleixner:
 "CPU (hotplug) updates:

   - Support for locked CSD objects in smp_call_function_single_async()
     which allows to simplify callsites in the scheduler core and MIPS

   - Treewide consolidation of CPU hotplug functions which ensures the
     consistency between the sysfs interface and kernel state. The low
     level functions cpu_up/down() are now confined to the core code and
     not longer accessible from random code"

* tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
  cpu/hotplug: Hide cpu_up/down()
  cpu/hotplug: Move bringup of secondary CPUs out of smp_init()
  torture: Replace cpu_up/down() with add/remove_cpu()
  firmware: psci: Replace cpu_up/down() with add/remove_cpu()
  xen/cpuhotplug: Replace cpu_up/down() with device_online/offline()
  parisc: Replace cpu_up/down() with add/remove_cpu()
  sparc: Replace cpu_up/down() with add/remove_cpu()
  powerpc: Replace cpu_up/down() with add/remove_cpu()
  x86/smp: Replace cpu_up/down() with add/remove_cpu()
  arm64: hibernate: Use bringup_hibernate_cpu()
  cpu/hotplug: Provide bringup_hibernate_cpu()
  arm64: Use reboot_cpu instead of hardconding it to 0
  arm64: Don't use disable_nonboot_cpus()
  ARM: Use reboot_cpu instead of hardcoding it to 0
  ARM: Don't use disable_nonboot_cpus()
  ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus()
  cpu/hotplug: Create a new function to shutdown nonboot cpus
  cpu/hotplug: Add new {add,remove}_cpu() functions
  sched/core: Remove rq.hrtick_csd_pending
  ...
2020-03-30 18:06:39 -07:00
Qais Yousef
33c3736ec8 cpu/hotplug: Hide cpu_up/down()
Use separate functions for the device core to bring a CPU up and down.

Users outside the device core must use add/remove_cpu() which will take
care of extra housekeeping work like keeping sysfs in sync.

Make cpu_up/down() static and replace the extra layer of indirection.

[ tglx: Removed the extra wrapper functions and adjusted function names ]

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200323135110.30522-18-qais.yousef@arm.com
2020-03-25 12:59:38 +01:00
Takashi Iwai
847e33867b drivers/base/cpu: Simplify s*nprintf() usages
Use the simpler sprintf() instead of snprintf() or scnprintf() in a
single-shot sysfs output callbacks where you are very sure that it
won't go over PAGE_SIZE buffer limit.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20200311080207.12046-3-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-11 09:08:44 +01:00
Takashi Iwai
4636a04630 drivers/base/cpu: Use scnprintf() for avoiding potential buffer overflow
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit.  Fix it by replacing with scnprintf().

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20200311080207.12046-2-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-11 09:08:44 +01:00
Vineela Tummalapalli
db4d30fbb7 x86/bugs: Add ITLB_MULTIHIT bug infrastructure
Some processors may incur a machine check error possibly resulting in an
unrecoverable CPU lockup when an instruction fetch encounters a TLB
multi-hit in the instruction TLB. This can occur when the page size is
changed along with either the physical address or cache type. The relevant
erratum can be found here:

   https://bugzilla.kernel.org/show_bug.cgi?id=205195

There are other processors affected for which the erratum does not fully
disclose the impact.

This issue affects both bare-metal x86 page tables and EPT.

It can be mitigated by either eliminating the use of large pages or by
using careful TLB invalidations when changing the page size in the page
tables.

Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in
MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which
are mitigated against this issue.

Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-04 12:22:01 +01:00
Pawan Gupta
6608b45ac5 x86/speculation/taa: Add sysfs reporting for TSX Async Abort
Add the sysfs reporting file for TSX Async Abort. It exposes the
vulnerability and the mitigation state similar to the existing files for
the other hardware vulnerabilities.

Sysfs file path is:
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-28 08:36:59 +01:00
Linus Torvalds
fa4bff1650 Merge branch 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MDS mitigations from Thomas Gleixner:
 "Microarchitectural Data Sampling (MDS) is a hardware vulnerability
  which allows unprivileged speculative access to data which is
  available in various CPU internal buffers. This new set of misfeatures
  has the following CVEs assigned:

     CVE-2018-12126  MSBDS  Microarchitectural Store Buffer Data Sampling
     CVE-2018-12130  MFBDS  Microarchitectural Fill Buffer Data Sampling
     CVE-2018-12127  MLPDS  Microarchitectural Load Port Data Sampling
     CVE-2019-11091  MDSUM  Microarchitectural Data Sampling Uncacheable Memory

  MDS attacks target microarchitectural buffers which speculatively
  forward data under certain conditions. Disclosure gadgets can expose
  this data via cache side channels.

  Contrary to other speculation based vulnerabilities the MDS
  vulnerability does not allow the attacker to control the memory target
  address. As a consequence the attacks are purely sampling based, but
  as demonstrated with the TLBleed attack samples can be postprocessed
  successfully.

  The mitigation is to flush the microarchitectural buffers on return to
  user space and before entering a VM. It's bolted on the VERW
  instruction and requires a microcode update. As some of the attacks
  exploit data structures shared between hyperthreads, full protection
  requires to disable hyperthreading. The kernel does not do that by
  default to avoid breaking unattended updates.

  The mitigation set comes with documentation for administrators and a
  deeper technical view"

* 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/speculation/mds: Fix documentation typo
  Documentation: Correct the possible MDS sysfs values
  x86/mds: Add MDSUM variant to the MDS documentation
  x86/speculation/mds: Add 'mitigations=' support for MDS
  x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
  x86/speculation/mds: Fix comment
  x86/speculation/mds: Add SMT warning message
  x86/speculation: Move arch_smt_update() call to after mitigation decisions
  x86/speculation/mds: Add mds=full,nosmt cmdline option
  Documentation: Add MDS vulnerability documentation
  Documentation: Move L1TF to separate directory
  x86/speculation/mds: Add mitigation mode VMWERV
  x86/speculation/mds: Add sysfs reporting for MDS
  x86/speculation/mds: Add mitigation control for MDS
  x86/speculation/mds: Conditionally clear CPU buffers on idle entry
  x86/kvm/vmx: Add MDS protection when L1D Flush is not active
  x86/speculation/mds: Clear CPU buffers on exit to user
  x86/speculation/mds: Add mds_clear_cpu_buffers()
  x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
  x86/speculation/mds: Add BUG_MSBDS_ONLY
  ...
2019-05-14 07:57:29 -07:00
Linus Torvalds
e431f2d74e Driver core patches for 5.1-rc1
Here is the big driver core patchset for 5.1-rc1
 
 More patches than "normal" here this merge window, due to some work in
 the driver core by Alexander Duyck to rework the async probe
 functionality to work better for a number of devices, and independant
 work from Rafael for the device link functionality to make it work
 "correctly".
 
 Also in here is:
 	- lots of BUS_ATTR() removals, the macro is about to go away
 	- firmware test fixups
 	- ihex fixups and simplification
 	- component additions (also includes i915 patches)
 	- lots of minor coding style fixups and cleanups.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXH+euQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynyTgCfbV8CLums843sBnT8NnWrTMTdTCcAn1K4re0m
 ep8g+6oRLxJy414hogxQ
 =bLs2
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the big driver core patchset for 5.1-rc1

  More patches than "normal" here this merge window, due to some work in
  the driver core by Alexander Duyck to rework the async probe
  functionality to work better for a number of devices, and independant
  work from Rafael for the device link functionality to make it work
  "correctly".

  Also in here is:

   - lots of BUS_ATTR() removals, the macro is about to go away

   - firmware test fixups

   - ihex fixups and simplification

   - component additions (also includes i915 patches)

   - lots of minor coding style fixups and cleanups.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits)
  driver core: platform: remove misleading err_alloc label
  platform: set of_node in platform_device_register_full()
  firmware: hardcode the debug message for -ENOENT
  driver core: Add missing description of new struct device_link field
  driver core: Fix PM-runtime for links added during consumer probe
  drivers/component: kerneldoc polish
  async: Add cmdline option to specify drivers to be async probed
  driver core: Fix possible supplier PM-usage counter imbalance
  PM-runtime: Fix __pm_runtime_set_status() race with runtime resume
  driver: platform: Support parsing GpioInt 0 in platform_get_irq()
  selftests: firmware: fix verify_reqs() return value
  Revert "selftests: firmware: remove use of non-standard diff -Z option"
  Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config"
  device: Fix comment for driver_data in struct device
  kernfs: Allocating memory for kernfs_iattrs with kmem_cache.
  sysfs: remove unused include of kernfs-internal.h
  driver core: Postpone DMA tear-down until after devres release
  driver core: Document limitation related to DL_FLAG_RPM_ACTIVE
  PM-runtime: Take suppliers into account in __pm_runtime_set_status()
  device.h: Add __cold to dev_<level> logging functions
  ...
2019-03-06 14:52:48 -08:00
Thomas Gleixner
8a4b06d391 x86/speculation/mds: Add sysfs reporting for MDS
Add the sysfs reporting file for MDS. It exposes the vulnerability and
mitigation state similar to the existing files for the other speculative
hardware vulnerabilities.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
2019-03-06 21:52:14 +01:00