Commit Graph

563 Commits

Author SHA1 Message Date
Christoph Müllner
b8ddb0df30
riscv: Add Zawrs support for spinlocks
RISC-V code uses the generic ticket lock implementation, which calls
the macros smp_cond_load_relaxed() and smp_cond_load_acquire().
Introduce a RISC-V specific implementation of smp_cond_load_relaxed()
which applies WRS.NTO of the Zawrs extension in order to reduce power
consumption while waiting and allows hypervisors to enable guests to
trap while waiting. smp_cond_load_acquire() doesn't need a RISC-V
specific implementation as the generic implementation is based on
smp_cond_load_relaxed() and smp_acquire__after_ctrl_dep() sufficiently
provides the acquire semantics.

This implementation is heavily based on Arm's approach which is the
approach Andrea Parri also suggested.

The Zawrs specification can be found here:
https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Co-developed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240426100820.14762-11-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-12 03:16:42 -07:00
Andrew Jones
6da111574b
riscv: Provide a definition for 'pause'
If we're going to provide the encoding for 'pause' in cpu_relax()
anyway, then we can drop the toolchain checks and just always use
it. The advantage of doing this is that other code that need
pause don't need to also define it (yes, another use is coming).
Add the definition to insn-def.h since it's an instruction
definition and also because insn-def.h doesn't include much, so
it's safe to include from asm/vdso/processor.h without concern for
circular dependencies.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240426100820.14762-9-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-12 03:16:39 -07:00
Xiao Wang
a43fe27d65
riscv: Optimize crc32 with Zbc extension
As suggested by the B-ext spec, the Zbc (carry-less multiplication)
instructions can be used to accelerate CRC calculations. Currently, the
crc32 is the most widely used crc function inside kernel, so this patch
focuses on the optimization of just the crc32 APIs.

Compared with the current table-lookup based optimization, Zbc based
optimization can also achieve large stride during CRC calculation loop,
meantime, it avoids the memory access latency of the table-lookup based
implementation and it reduces memory footprint.

If Zbc feature is not supported in a runtime environment, then the
table-lookup based implementation would serve as fallback via alternative
mechanism.

By inspecting the vmlinux built by gcc v12.2.0 with default optimization
level (-O2), we can see below instruction count change for each 8-byte
stride in the CRC32 loop:

rv64: crc32_be (54->31), crc32_le (54->13), __crc32c_le (54->13)
rv32: crc32_be (50->32), crc32_le (50->16), __crc32c_le (50->16)

The compile target CPU is little endian, extra effort is needed for byte
swapping for the crc32_be API, thus, the instruction count change is not
as significant as that in the *_le cases.

This patch is tested on QEMU VM with the kernel CRC32 selftest for both
rv64 and rv32. Running the CRC32 selftest on a real hardware (SpacemiT K1)
with Zbc extension shows 65% and 125% performance improvement respectively
on crc32_test() and crc32c_test().

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240621054707.1847548-1-xiao.w.wang@intel.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-10 13:19:50 -07:00
Palmer Dabbelt
60a6707f58
Merge patch series "riscv: Memory Hot(Un)Plug support"
Björn Töpel <bjorn@kernel.org> says:

From: Björn Töpel <bjorn@rivosinc.com>

================================================================
Memory Hot(Un)Plug support (and ZONE_DEVICE) for the RISC-V port
================================================================

Introduction
============

To quote "Documentation/admin-guide/mm/memory-hotplug.rst": "Memory
hot(un)plug allows for increasing and decreasing the size of physical
memory available to a machine at runtime."

This series adds memory hot(un)plugging, and ZONE_DEVICE support for
the RISC-V Linux port.

MM configuration
================

RISC-V MM has the following configuration:

 * Memory blocks are 128M, analogous to x86-64. It uses PMD
   ("hugepage") vmemmaps. From that follows that 2M (PMD) worth of
   vmemmap spans 32768 pages á 4K which gets us 128M.

 * The pageblock size is the minimum minimum virtio_mem size, and on
   RISC-V it's 2M (2^9 * 4K).

Implementation
==============

The PGD table on RISC-V is shared/copied between for all processes. To
avoid doing page table synchronization, the first patch (patch 1)
pre-allocated the PGD entries for vmemmap/direct map. By doing that
the init_mm PGD will be fixed at kernel init, and synchronization can
be avoided all together.

The following two patches (patch 2-3) does some preparations, followed
by the actual MHP implementation (patch 4-5). Then, MHP and virtio-mem
are enabled (patch 6-7), and finally ZONE_DEVICE support is added
(patch 8).

MHP and locking
===============

TL;DR: The MHP does not step on any toes, except for ptdump.
Additional locking is required for ptdump.

Long version: For v2 I spent some time digging into init_mm
synchronization/update. Here are my findings, and I'd love them to be
corrected if incorrect.

It's been a gnarly path...

The `init_mm` structure is a special mm (perhaps not a "real" one).
It's a "lazy context" that tracks kernel page table resources, e.g.,
the kernel page table (swapper_pg_dir), a kernel page_table_lock (more
about the usage below), mmap_lock, and such.

`init_mm` does not track/contain any VMAs. Having the `init_mm` is
convenient, so that the regular kernel page table walk/modify
functions can be used.

Now, `init_mm` being special means that the locking for kernel page
tables are special as well.

On RISC-V the PGD (top-level page table structure), similar to x86, is
shared (copied) with user processes. If the kernel PGD is modified, it
has to be synched to user-mode processes PGDs. This is avoided by
pre-populating the PGD, so it'll be fixed from boot.

The in-kernel pgd regions are documented in
`Documentation/arch/riscv/vm-layout.rst`.

The distinct regions are:
 * vmemmap
 * vmalloc/ioremap space
 * direct mapping of all physical memory
 * kasan
 * modules, BPF
 * kernel

Memory hotplug is the process of adding/removing memory to/from the
kernel.

Adding is done in two phases:
 1. Add the memory to the kernel
 2. Online memory, making it available to the page allocator.

Step 1 is partially architecture dependent, and updates the init_mm
page table:
 * Update the direct map page tables. The direct map is a linear map,
   representing all physical memory: `virt = phys + PAGE_OFFSET`
 * Add a `struct page` for each added page of memory. Update the
   vmemmap (virtual mapping to the `struct page`, so we can easily
   transform a kernel virtual address to a `struct page *` address.

From an MHP perspective, there are two regions of the PGD that are
updated:
 * vmemmap
 * direct mapping of all physical memory

The `struct mm_struct` has a couple of locks in play:
 * `spinlock_t page_table_lock` protects the page table, and some
    counters
 * `struct rw_semaphore mmap_lock` protect an mm's VMAs

Note again that `init_mm` does not contain any VMAs, but still uses
the mmap_lock in some places.

The `page_table_lock` was originally used to to protect all pages
tables, but more recently a split page table lock has been introduced.
The split lock has a per-table lock for the PTE and PMD tables. If
split lock is disabled, all tables are guarded by
`mm->page_table_lock` (for user processes). Split page table locks are
not used for init_mm.

MHP operations is typically synchronized using
`DEFINE_STATIC_PERCPU_RWSEM(mem_hotplug_lock)`.

Actors
------

The following non-MHP actors in the kernel traverses (read), and/or
modifies the kernel PGD.

 * `ptdump`

   Walks the entire `init_mm`, via `ptdump_walk_pgd()` with the
   `mmap_write_lock(init_mm)` taken.

   Observation: ptdump can race with MHP, and needs additional locking
   to avoid crashes/races.

 * `set_direct_*` / `arch/riscv/mm/pageattr.c`

   The `set_direct_*` functionality is used to "synchronize" the
   direct map to other kernel mappings, e.g. modules/kernel text. The
   direct map is using "as large huge table mappings as possible",
   which means that the `set_direct_*` might need to split the direct
   map.

  The `set_direct_*` functions operates with the
  `mmap_write_lock(init_mm)` taken.

  Observation: `set_direct_*` uses the direct map, but will never
  modify the same entry as MHP. If there is a mapping, that entry will
  never race with MHP. Further, MHP acts when memory is offline.

 * HVO / `mm/hugetlb_vmemmap`

   HVO optimizes the backing `struct page` for hugetlb pages, which
   means changing the "vmemmap" region. HVO can split (merge?) a
   vmemmap pmd. However, it will never race with MHP, since HVO only
   operates at online memory. HVO cannot touch memory being MHP added
   or removed.

 * `apply_to_page_range`

   Walks a range, creates pages and applies a callback (setting
   permissions) for the page.

   When creating a table, it might use `int __pte_alloc_kernel(pmd_t
   *pmd)` which takes the `init_mm.page_table_lock` to synchronize pmd
   populate.

   Used by: `mm/vmalloc.c` and `mm/kasan/shadow.c`. The KASAN callback
   takes the `init_mm.page_table_lock` to synchronize pte creation.

   Observations: `apply_to_page_range` applies to the "vmalloc/ioremap
   space" region, and "kasan" region. *Not* affected by MHP.

 * `apply_to_existing_page_range`

   Walks a range, applies a callback (setting permissions) for the
   page (no page creation).

   Used by: `kernel/bpf/arena.c` and `mm/kasan/shadow.c`. The KASAN
   callback takes the `init_mm.page_table_lock` to synchronize pte
   creation. *Not* affected by MHP regions.

 * `apply_to_existing_page_range` applies to the "vmalloc/ioremap
   space" region, and "kasan" region. *Not* affected by MHP regions.

 *  `ioremap_page_range` and `vmap_page_range`

    Uses the same internal function, and might create table entries at
    the "vmalloc/ioremap space" region. Can call
    `__pte_alloc_kernel()` which takes the `init_mm.page_table_lock`
    synchronizing pmd populate in the region. *Not* affected by MHP
    regions.

Summary:
  * MHP add will never modify the same page table entries, as any of
    the other actors.
  * MHP remove is done when memory is offlined, and will not clash
    with any of the actors.
  * Functions that walk the entire kernel page table need
    synchronization

  * It's sufficient to add the MHP lock ptdump.

Testing
=======

This series adds basic DT supported hotplugging. There is a QEMU
series enabling MHP for the RISC-V "virt" machine here: [1]

ACPI/MSI support is still in the making for RISC-V, and prior proper
(ACPI) PCI MSI support lands [2] and NUMA SRAT support [3], it hard to
try it out.

I've prepared a QEMU branch with proper ACPI GED/PC-DIMM support [4],
and a this series with the required prerequisites [5] (AIA, ACPI AIA
MADT, ACPI NUMA SRAT).

To test with virtio-mem, e.g.:
  | qemu-system-riscv64 \
  |     -machine virt,aia=aplic-imsic \
  |     -cpu rv64,v=true,vlen=256,elen=64,h=true,zbkb=on,zbkc=on,zbkx=on,zkr=on,zkt=on,svinval=on,svnapot=on,svpbmt=on \
  |     -nodefaults \
  |     -nographic -smp 8 -kernel rv64-u-boot.bin \
  |     -drive file=rootfs.img,format=raw,if=virtio \
  |     -device virtio-rng-pci \
  |     -m 16G,slots=3,maxmem=32G \
  |     -object memory-backend-ram,id=mem0,size=16G \
  |     -numa node,nodeid=0,memdev=mem0 \
  |     -serial chardev:char0 \
  |     -mon chardev=char0,mode=readline \
  |     -chardev stdio,mux=on,id=char0 \
  |     -device pci-serial,id=serial0,chardev=char0 \
  |     -object memory-backend-ram,id=vmem0,size=2G \
  |     -device virtio-mem-pci,id=vm0,memdev=vmem0,node=0

where "rv64-u-boot.bin" is U-boot with EFI/ACPI-support (use [6] if
you're lazy).

In the QEMU monitor:
  | (qemu) info memory-devices
  | (qemu) qom-set vm0 requested-size 1G

...to test DAX/KMEM, use the follow QEMU parameters:
  |  -object memory-backend-file,id=mem1,share=on,mem-path=virtio_pmem.img,size=4G \
  |  -device virtio-pmem-pci,memdev=mem1,id=nv1

and the regular ndctl/daxctl dance.

If you're brave to try the ACPI branch, add "acpi=on" to "-machine
virt", and test PC-DIMM MHP (in addition to virtio-{p},mem):

In the QEMU monitor:
  | (qemu) object_add memory-backend-ram,id=mem1,size=1G
  | (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

You can also try hot-remove with some QEMU options, say:
  | -object memory-backend-file,id=mem-1,size=256M,mem-path=/pagesize-2MB
  | -device pc-dimm,id=mem1,memdev=mem-1
  | -object memory-backend-file,id=mem-2,size=1G,mem-path=/pagesize-1GB
  | -device pc-dimm,id=mem2,memdev=mem-2
  | -object memory-backend-file,id=mem-3,size=256M,mem-path=/pagesize-2MB
  | -device pc-dimm,id=mem3,memdev=mem-3

Remove "acpi=on" to run with DT.

Thanks to Alex, Andrew, David, and Oscar for all
comments/tests/fixups.

References
==========

[1] https://lore.kernel.org/qemu-devel/20240521105635.795211-1-bjorn@kernel.org/
[2] https://lore.kernel.org/linux-riscv/20240501121742.1215792-1-sunilvl@ventanamicro.com/
[3] https://lore.kernel.org/linux-riscv/cover.1713778236.git.haibo1.xu@intel.com/
[4] https://github.com/bjoto/qemu/commits/virtio-mem-pc-dimm-mhp-acpi-v2/
[5] https://github.com/bjoto/linux/commits/mhp-v4-acpi
[6] https://github.com/bjoto/riscv-rootfs-utils/tree/acpi

* b4-shazam-merge:
  riscv: Enable DAX VMEMMAP optimization
  riscv: mm: Add support for ZONE_DEVICE
  virtio-mem: Enable virtio-mem for RISC-V
  riscv: Enable memory hotplugging for RISC-V
  riscv: mm: Take memory hotplug read-lock during kernel page table dump
  riscv: mm: Add memory hotplugging support
  riscv: mm: Add pfn_to_kaddr() implementation
  riscv: mm: Refactor create_linear_mapping_range() for memory hot add
  riscv: mm: Change attribute from __init to __meminit for page functions
  riscv: mm: Pre-allocate vmemmap/direct map/kasan PGD entries
  riscv: mm: Properly forward vmemmap_populate() altmap parameter

Link: https://lore.kernel.org/r/20240605114100.315918-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26 08:43:07 -07:00
Björn Töpel
4705c1571a
riscv: Enable DAX VMEMMAP optimization
Now that DAX is usable, enable the DAX VMEMMAP optimization as well.

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240605114100.315918-12-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26 08:42:47 -07:00
Björn Töpel
216e04bf1e
riscv: mm: Add support for ZONE_DEVICE
ZONE_DEVICE pages need DEVMAP PTEs support to function
(ARCH_HAS_PTE_DEVMAP). Claim another RSW (reserved for software) bit
in the PTE for DEVMAP mark, add the corresponding helpers, and enable
ARCH_HAS_PTE_DEVMAP for riscv64.

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240605114100.315918-11-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26 08:42:46 -07:00
Björn Töpel
f8c2a24055
riscv: Enable memory hotplugging for RISC-V
Enable ARCH_ENABLE_MEMORY_HOTPLUG and ARCH_ENABLE_MEMORY_HOTREMOVE for
RISC-V.

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240605114100.315918-9-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26 08:42:44 -07:00
Haibo Xu
d6ecd18893
riscv: dmi: Add SMBIOS/DMI support
Enable the dmi driver for riscv which would allow access the
SMBIOS info through some userspace file(/sys/firmware/dmi/*).

The change was based on that of arm64 and has been verified
by dmidecode tool.

Signed-off-by: Haibo Xu <haibo1.xu@intel.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240613065507.287577-1-haibo1.xu@intel.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26 08:02:33 -07:00
Jakub Kicinski
62b5bf58b9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

drivers/net/ethernet/pensando/ionic/ionic_txrx.c
  d9c0420999 ("ionic: Mark error paths in the data path as unlikely")
  491aee894a ("ionic: fix kernel panic in XDP_TX action")

net/ipv6/ip6_fib.c
  b4cb4a1391 ("net: use unrcu_pointer() helper")
  b01e1c0307 ("ipv6: fix possible race in __fib6_drop_pcpu_from()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-06 12:06:56 -07:00
Jakub Kicinski
e19de2064f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/ti/icssg/icssg_classifier.c
  abd5576b9c ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
  56a5cf538c ("net: ti: icssg-prueth: Fix start counter for ft1 filter")
https://lore.kernel.org/all/20240531123822.3bb7eadf@canb.auug.org.au/

No other adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-31 14:10:28 -07:00
Nam Cao
7bed516174
riscv: enable HAVE_ARCH_HUGE_VMAP for XIP kernel
HAVE_ARCH_HUGE_VMAP also works on XIP kernel, so remove its dependency on
!XIP_KERNEL.

This also fixes a boot problem for XIP kernel introduced by the commit in
"Fixes:". This commit used huge page mapping for vmemmap, but huge page
vmap was not enabled for XIP kernel.

Fixes: ff172d4818 ("riscv: Use hugepage mappings for vmemmap")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: <stable@vger.kernel.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240526110104.470429-1-namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-30 09:42:52 -07:00
Jakub Kicinski
4b3529edbb bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZlWtmQAKCRDbK58LschI
 g0TUAQDT76jx7Rq1DShCtZ3eqiBMNkYczK8b+GqNsSG8YGduaAEA1jn/GN+H65Rh
 atQZ/pYAfLZflMV04+XE0GyBr5q1uQg=
 =NczG
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-05-28

We've added 23 non-merge commits during the last 11 day(s) which contain
a total of 45 files changed, 696 insertions(+), 277 deletions(-).

The main changes are:

1) Rename skb's mono_delivery_time to tstamp_type for extensibility
   and add SKB_CLOCK_TAI type support to bpf_skb_set_tstamp(),
   from Abhishek Chauhan.

2) Add netfilter CT zone ID and direction to bpf_ct_opts so that arbitrary
   CT zones can be used from XDP/tc BPF netfilter CT helper functions,
   from Brad Cowie.

3) Several tweaks to the instruction-set.rst IETF doc to address
   the Last Call review comments, from Dave Thaler.

4) Small batch of riscv64 BPF JIT optimizations in order to emit more
   compressed instructions to the JITed image for better icache efficiency,
   from Xiao Wang.

5) Sort bpftool C dump output from BTF, aiming to simplify vmlinux.h
   diffing and forcing more natural type definitions ordering,
   from Mykyta Yatsenko.

6) Use DEV_STATS_INC() macro in BPF redirect helpers to silence
   a syzbot/KCSAN race report for the tx_errors counter,
   from Jiang Yunshui.

7) Un-constify bpf_func_info in bpftool to fix compilation with LLVM 17+
   which started treating const structs as constants and thus breaking
   full BTF program name resolution, from Ivan Babrou.

8) Fix up BPF program numbers in test_sockmap selftest in order to reduce
   some of the test-internal array sizes, from Geliang Tang.

9) Small cleanup in Makefile.btf script to use test-ge check for v1.25-only
   pahole, from Alan Maguire.

10) Fix bpftool's make dependencies for vmlinux.h in order to avoid needless
    rebuilds in some corner cases, from Artem Savkov.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
  bpf, net: Use DEV_STAT_INC()
  bpf, docs: Fix instruction.rst indentation
  bpf, docs: Clarify call local offset
  bpf, docs: Add table captions
  bpf, docs: clarify sign extension of 64-bit use of 32-bit imm
  bpf, docs: Use RFC 2119 language for ISA requirements
  bpf, docs: Move sentence about returning R0 to abi.rst
  bpf: constify member bpf_sysctl_kern:: Table
  riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT
  riscv, bpf: Use STACK_ALIGN macro for size rounding up
  riscv, bpf: Optimize zextw insn with Zba extension
  selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets
  net: Add additional bit to support clockid_t timestamp type
  net: Rename mono_delivery_time to tstamp_type for scalabilty
  selftests/bpf: Update tests for new ct zone opts for nf_conntrack kfuncs
  net: netfilter: Make ct zone opts configurable for bpf ct helpers
  selftests/bpf: Fix prog numbers in test_sockmap
  bpf: Remove unused variable "prev_state"
  bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer
  bpf: Fix order of args in call to bpf_map_kvcalloc
  ...
====================

Link: https://lore.kernel.org/r/20240528105924.30905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-28 07:27:29 -07:00
Linus Torvalds
f1f9984fdc RISC-V Patches for the 6.10 Merge Window, Part 2
* The compression format used for boot images is now configurable at
   build time, and these formats are shown in `make help`.
 * access_ok() has been optimized.
 * A pair of performance bugs have been fixed in the uaccess handlers.
 * Various fixes and cleanups, including one for the IMSIC build failure
   and one for the early-boot ftrace illegal NOPs bug.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmZQtRwTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiWPBD/0UitwMg88m6urvMd0Pfvwwbu/OnGqW
 TZT8C55iJi/e5f9K4mBrSyjATI8z/MblD+Zz0adX8ygavS4JuQ7DoWwb1yTT3pww
 +z74FkWeJuiar+HfbhQ602CfMrnzvWjnyJ3URemqy5pIBKyvD9gGkDJDZwf8hJTk
 Vqh5qVtnBqFBO9kWpIx+/pLCfpyHVNkhWr1AzKfoqQ1WPIpZ/o0IGdvS88rL+EBR
 QOXiwVhEsRfC+LT6Jhn8l2bGp7PaSRVOid19OxNsJKpAhpL6AOscaafclVrLBuTd
 gkys0rT2dHdoWTAkPHQpvlOI6OmGTgopxo5pUKJHS8J9VRoBun25zC1FGBF8uyVd
 05CabWPnh7olNsRge9XiNj3x8PXjGVi7X7wUbRgOBG5aDc6TbKdxu37J0tXe0M7a
 Q74ctQvk8Nk6bQWirgTNlfJJHzL5pJbKc9VwY5uGX4qTmH+yEvCIt45ZXgXOuS/F
 eqijStkkdXUDnkMdcpaZJvXP80rHcgfP8bqevvPymRli8ER9zj9aXJQ3rmCUcPz+
 EtbyS+vOEN31wNTA1EQlfIRxfvr22x7r70DDdRwmhuD1W1tgfblm+R0Cq76I5rnJ
 VSgXKq1b4mY0eautqXEnPGyqb7H8iJIq7AoyfbzzWN+4u6yVEUvpDKueeksy+fFt
 sGNtjWqGhWyKXg==
 =/Qtt
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - The compression format used for boot images is now configurable at
   build time, and these formats are shown in `make help`

 - access_ok() has been optimized

 - A pair of performance bugs have been fixed in the uaccess handlers

 - Various fixes and cleanups, including one for the IMSIC build failure
   and one for the early-boot ftrace illegal NOPs bug

* tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix early ftrace nop patching
  irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
  riscv: selftests: Add signal handling vector tests
  riscv: mm: accelerate pagefault when badaccess
  riscv: uaccess: Relax the threshold for fast path
  riscv: uaccess: Allow the last potential unrolled copy
  riscv: typo in comment for get_f64_reg
  Use bool value in set_cpu_online()
  riscv: selftests: Add hwprobe binaries to .gitignore
  riscv: stacktrace: fixed walk_stackframe()
  ftrace: riscv: move from REGS to ARGS
  riscv: do not select MODULE_SECTIONS by default
  riscv: show help string for riscv-specific targets
  riscv: make image compression configurable
  riscv: cpufeature: Fix extension subset checking
  riscv: cpufeature: Fix thead vector hwcap removal
  riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
  riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
  riscv: Define TASK_SIZE_MAX for __access_ok()
  riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN
2024-05-24 10:46:35 -07:00
Xiao Wang
c12603e76e riscv, bpf: Optimize zextw insn with Zba extension
The Zba extension provides add.uw insn which can be used to implement
zext.w with rs2 set as ZERO.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/bpf/20240516090430.493122-1-xiao.w.wang@intel.com
2024-05-24 16:53:12 +02:00
Linus Torvalds
c760b3725e - A series ("kbuild: enable more warnings by default") from Arnd
Bergmann which enables a number of additional build-time warnings.  We
   fixed all the fallout which we could find, there may still be a few
   stragglers.
 
 - Samuel Holland has developed the series "Unified cross-architecture
   kernel-mode FPU API".  This does a lot of consolidation of
   per-architecture kernel-mode FPU usage and enables the use of newer AMD
   GPUs on RISC-V.
 
 - Tao Su has fixed some selftests build warnings in the series
   "Selftests: Fix compilation warnings due to missing _GNU_SOURCE
   definition".
 
 - This pull also includes a nilfs2 fixup from Ryusuke Konishi.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZk6OSAAKCRDdBJ7gKXxA
 jpTGAP9hQaZ+g7CO38hKQAtEI8rwcZJtvUAP84pZEGMjYMGLxQD/S8z1o7UHx61j
 DUbnunbOkU/UcPx3Fs/gp4KcJARMEgs=
 =EPi9
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more non-mm updates from Andrew Morton:

 - A series ("kbuild: enable more warnings by default") from Arnd
   Bergmann which enables a number of additional build-time warnings. We
   fixed all the fallout which we could find, there may still be a few
   stragglers.

 - Samuel Holland has developed the series "Unified cross-architecture
   kernel-mode FPU API". This does a lot of consolidation of
   per-architecture kernel-mode FPU usage and enables the use of newer
   AMD GPUs on RISC-V.

 - Tao Su has fixed some selftests build warnings in the series
   "Selftests: Fix compilation warnings due to missing _GNU_SOURCE
   definition".

 - This pull also includes a nilfs2 fixup from Ryusuke Konishi.

* tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
  nilfs2: make block erasure safe in nilfs_finish_roll_forward()
  selftests/harness: use 1024 in place of LINE_MAX
  Revert "selftests/harness: remove use of LINE_MAX"
  selftests/fpu: allow building on other architectures
  selftests/fpu: move FP code to a separate translation unit
  drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT
  drm/amd/display: only use hard-float, not altivec on powerpc
  riscv: add support for kernel-mode FPU
  x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS
  arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS
  arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS
  ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
  x86/fpu: fix asm/fpu/types.h include guard
  kbuild: enable -Wcast-function-type-strict unconditionally
  kbuild: enable -Wformat-truncation on clang
  ...
2024-05-22 18:59:29 -07:00
Puranjay Mohan
7caa976546
ftrace: riscv: move from REGS to ARGS
This commit replaces riscv's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This is required for the ongoing effort to stop
relying on stop_machine() for RISCV's implementation of ftrace.

The main relevant benefit that this change will bring for the above
use-case is that now we don't have separate ftrace_caller and
ftrace_regs_caller trampolines. This will allow the callsite to call
ftrace_caller by modifying a single instruction. Now the callsite can
do something similar to:

When not tracing:            |             When tracing:

func:                                      func:
  auipc t0, ftrace_caller_top                auipc t0, ftrace_caller_top
  nop  <=========<Enable/Disable>=========>  jalr  t0, ftrace_caller_bottom
  [...]                                      [...]

The above assumes that we are dropping the support of calling a direct
trampoline from the callsite. We need to drop this as the callsite can't
change the target address to call, it can only enable/disable a call to
a preset target (ftrace_caller in the above diagram). We can later optimize
this by calling an intermediate dispatcher trampoline before ftrace_caller.

Currently, ftrace_regs_caller saves all CPU registers in the format of
struct pt_regs and allows the tracer to modify them. We don't need to
save all of the CPU registers because at function entry only a subset of
pt_regs is live:

|----------+----------+---------------------------------------------|
| Register | ABI Name | Description                                 |
|----------+----------+---------------------------------------------|
| x1       | ra       | Return address for traced function          |
| x2       | sp       | Stack pointer                               |
| x5       | t0       | Return address for ftrace_caller trampoline |
| x8       | s0/fp    | Frame pointer                               |
| x10-11   | a0-1     | Function arguments/return values            |
| x12-17   | a2-7     | Function arguments                          |
|----------+----------+---------------------------------------------|

See RISCV calling convention[1] for the above table.

Saving just the live registers decreases the amount of stack space
required from 288 Bytes to 112 Bytes.

Basic testing was done with this on the VisionFive 2 development board.

Note:
  - Moving from REGS to ARGS will mean that RISCV will stop supporting
    KPROBES_ON_FTRACE as it requires full pt_regs to be saved.
  - KPROBES_ON_FTRACE will be supplanted by FPROBES see [2].

[1] https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf
[2] https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240405142453.4187-1-puranjay@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-22 16:12:48 -07:00
Qingfang Deng
2aff5f955b
riscv: do not select MODULE_SECTIONS by default
Since commit aad15bc85c ("riscv: Change code model of module to
medany to improve data accessing"), kernel modules have not been built
with -fPIC, so they wouldn't have R_RISCV_GOT_HI20 or R_RISCV_CALL_PLT
relocations, and handling of those relocations is unnecessary.

If RELOCATABLE=y, kernel modules will be built with -fPIE, which would
reintroduce said relocations, so only select MODULE_SECTIONS when
RELOCATABLE.

Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240511015725.1162-1-dqfext@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-22 16:12:46 -07:00
Emil Renner Berthing
e79dfcbfb9
riscv: make image compression configurable
Previously the build process would always set KBUILD_IMAGE to the
uncompressed Image file (unless XIP_KERNEL or EFI_ZBOOT was enabled) and
unconditionally compress it into Image.gz. However there are already
build targets for Image.bz2, Image.lz4, Image.lzma, Image.lzo and
Image.zstd, so let's make use of those, make the compression method
configurable and set KBUILD_IMAGE accordingly so that targets like
'make install' and 'make bindeb-pkg' will use the chosen image.

Tested-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240504193446.196886-2-emil.renner.berthing@canonical.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-22 16:12:44 -07:00
Linus Torvalds
0bfbc914d9 RISC-V Patches for the 6.10 Merge Window, Part 1
* Support for byte/half-word compare-and-exchange, emulated via LR/SC
   loops.
 * Support for Rust.
 * Support for Zihintpause in hwprobe.
 * Support for the PR_RISCV_SET_ICACHE_FLUSH_CTX prctl().
 * Support for lockless lockrefs.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmZN/hcTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiVrGEACUT3gsbTx1q7fa11iQNxOjVkpl66Qn
 7+kI+V9xt5+GuH2EjJk6AsSNHPKeQ8totbSTA8AZjINFvgVjXslN+DPpcjCFKvnh
 NN5/Lyd64X0PZMsxGWlN9SHTFWf2b7lalCnY51BlX/IpBbHWc/no9XUsPSVixx6u
 9q+JoS3D1DDV92nGcA/UK9ICCsDcf4omWgZW7KbjnVWnuY9jt4ctTy11jtF2RM9R
 Z9KAWh0RqPzjz0vNbBBf9Iw7E4jt/Px6HDYPfZAiE2dVsCTHjdsC7TcGRYXzKt6F
 4q9zg8kzwvUG5GaBl7/XprXO1vaeOUmPcTVoE7qlRkSdkknRH/iBz1P4hk+r0fze
 f+h5ZUV/oJP7vDb+vHm/BExtGufgLuJ2oMA2Bp9qI17EMcMsGiRMt7DsBMEafWDk
 bNrFcJdqqYBz6HxfTwzNH5ErxfS/59PuwYl913BTSOH//raCZCFXOfyrSICH7qXd
 UFOLLmBpMuApLa8ayFeI9Mp3flWfbdQHR52zLRLiUvlpWNEDKrNQN417juVwTXF0
 DYkjJDhFPLfFOr/sJBboftOMOUdA9c/CJepY9o4kPvBXUvPtRHN1jdXDNSCVDZRb
 nErnsJ9rv0PzfxQU7Xjhd2QmCMeMlbCQDpXAKKETyyimpTbgF33rovN0i5ixX3m4
 KG6RvKDubOzZdA==
 =YLoD
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Add byte/half-word compare-and-exchange, emulated via LR/SC loops

 - Support for Rust

 - Support for Zihintpause in hwprobe

 - Add PR_RISCV_SET_ICACHE_FLUSH_CTX prctl()

 - Support lockless lockrefs

* tag 'riscv-for-linus-6.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits)
  riscv: defconfig: Enable CONFIG_CLK_SOPHGO_CV1800
  riscv: select ARCH_HAS_FAST_MULTIPLIER
  riscv: mm: still create swiotlb buffer for kmalloc() bouncing if required
  riscv: Annotate pgtable_l{4,5}_enabled with __ro_after_init
  riscv: Remove redundant CONFIG_64BIT from pgtable_l{4,5}_enabled
  riscv: mm: Always use an ASID to flush mm contexts
  riscv: mm: Preserve global TLB entries when switching contexts
  riscv: mm: Make asid_bits a local variable
  riscv: mm: Use a fixed layout for the MM context ID
  riscv: mm: Introduce cntx2asid/cntx2version helper macros
  riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
  riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
  riscv: mm: Combine the SMP and UP TLB flush code
  riscv: Only send remote fences when some other CPU is online
  riscv: mm: Broadcast kernel TLB flushes only when needed
  riscv: Use IPIs for remote cache/TLB flushes by default
  riscv: Factor out page table TLB synchronization
  riscv: Flush the instruction cache during SMP bringup
  riscv: hwprobe: export Zihintpause ISA extension
  riscv: misaligned: remove CONFIG_RISCV_M_MODE specific code
  ...
2024-05-22 09:56:00 -07:00
Samuel Holland
77acc6b55a riscv: add support for kernel-mode FPU
This is motivated by the amdgpu DRM driver, which needs floating-point
code to support recent hardware.  That code is not performance-critical,
so only provide a minimal non-preemptible implementation for now.

Support is limited to riscv64 because riscv32 requires runtime (libgcc)
assistance to convert between doubles and 64-bit integers.

Link: https://lkml.kernel.org/r/20240329072441.591471-12-samuel.holland@sifive.com
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com> 
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Xuerui <git@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-19 14:36:19 -07:00
Linus Torvalds
61307b7be4 The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.  Notable
 series include:
 
 - Lucas Stach has provided some page-mapping
   cleanup/consolidation/maintainability work in the series "mm/treewide:
   Remove pXd_huge() API".
 
 - In the series "Allow migrate on protnone reference with
   MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
   MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
   test.
 
 - In their series "Memory allocation profiling" Kent Overstreet and
   Suren Baghdasaryan have contributed a means of determining (via
   /proc/allocinfo) whereabouts in the kernel memory is being allocated:
   number of calls and amount of memory.
 
 - Matthew Wilcox has provided the series "Various significant MM
   patches" which does a number of rather unrelated things, but in largely
   similar code sites.
 
 - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
   Weiner has fixed the page allocator's handling of migratetype requests,
   with resulting improvements in compaction efficiency.
 
 - In the series "make the hugetlb migration strategy consistent" Baolin
   Wang has fixed a hugetlb migration issue, which should improve hugetlb
   allocation reliability.
 
 - Liu Shixin has hit an I/O meltdown caused by readahead in a
   memory-tight memcg.  Addressed in the series "Fix I/O high when memory
   almost met memcg limit".
 
 - In the series "mm/filemap: optimize folio adding and splitting" Kairui
   Song has optimized pagecache insertion, yielding ~10% performance
   improvement in one test.
 
 - Baoquan He has cleaned up and consolidated the early zone
   initialization code in the series "mm/mm_init.c: refactor
   free_area_init_core()".
 
 - Baoquan has also redone some MM initializatio code in the series
   "mm/init: minor clean up and improvement".
 
 - MM helper cleanups from Christoph Hellwig in his series "remove
   follow_pfn".
 
 - More cleanups from Matthew Wilcox in the series "Various page->flags
   cleanups".
 
 - Vlastimil Babka has contributed maintainability improvements in the
   series "memcg_kmem hooks refactoring".
 
 - More folio conversions and cleanups in Matthew Wilcox's series
 
 	"Convert huge_zero_page to huge_zero_folio"
 	"khugepaged folio conversions"
 	"Remove page_idle and page_young wrappers"
 	"Use folio APIs in procfs"
 	"Clean up __folio_put()"
 	"Some cleanups for memory-failure"
 	"Remove page_mapping()"
 	"More folio compat code removal"
 
 - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
   functions to work on folis".
 
 - Code consolidation and cleanup work related to GUP's handling of
   hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
 
 - Rick Edgecombe has developed some fixes to stack guard gaps in the
   series "Cover a guard gap corner case".
 
 - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
   "mm/ksm: fix ksm exec support for prctl".
 
 - Baolin Wang has implemented NUMA balancing for multi-size THPs.  This
   is a simple first-cut implementation for now.  The series is "support
   multi-size THP numa balancing".
 
 - Cleanups to vma handling helper functions from Matthew Wilcox in the
   series "Unify vma_address and vma_pgoff_address".
 
 - Some selftests maintenance work from Dev Jain in the series
   "selftests/mm: mremap_test: Optimizations and style fixes".
 
 - Improvements to the swapping of multi-size THPs from Ryan Roberts in
   the series "Swap-out mTHP without splitting".
 
 - Kefeng Wang has significantly optimized the handling of arm64's
   permission page faults in the series
 
 	"arch/mm/fault: accelerate pagefault when badaccess"
 	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
 
 - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
   GUP-fast".
 
 - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
   use struct vm_fault".
 
 - selftests build fixes from John Hubbard in the series "Fix
   selftests/mm build without requiring "make headers"".
 
 - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
   series "Improved Memory Tier Creation for CPUless NUMA Nodes".  Fixes
   the initialization code so that migration between different memory types
   works as intended.
 
 - David Hildenbrand has improved follow_pte() and fixed an errant driver
   in the series "mm: follow_pte() improvements and acrn follow_pte()
   fixes".
 
 - David also did some cleanup work on large folio mapcounts in his
   series "mm: mapcount for large folios + page_mapcount() cleanups".
 
 - Folio conversions in KSM in Alex Shi's series "transfer page to folio
   in KSM".
 
 - Barry Song has added some sysfs stats for monitoring multi-size THP's
   in the series "mm: add per-order mTHP alloc and swpout counters".
 
 - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
   and limit checking cleanups".
 
 - Matthew Wilcox has been looking at buffer_head code and found the
   documentation to be lacking.  The series is "Improve buffer head
   documentation".
 
 - Multi-size THPs get more work, this time from Lance Yang.  His series
   "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
   the freeing of these things.
 
 - Kemeng Shi has added more userspace-visible writeback instrumentation
   in the series "Improve visibility of writeback".
 
 - Kemeng Shi then sent some maintenance work on top in the series "Fix
   and cleanups to page-writeback".
 
 - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
   series "Improve anon_vma scalability for anon VMAs".  Intel's test bot
   reported an improbable 3x improvement in one test.
 
 - SeongJae Park adds some DAMON feature work in the series
 
 	"mm/damon: add a DAMOS filter type for page granularity access recheck"
 	"selftests/damon: add DAMOS quota goal test"
 
 - Also some maintenance work in the series
 
 	"mm/damon/paddr: simplify page level access re-check for pageout"
 	"mm/damon: misc fixes and improvements"
 
 - David Hildenbrand has disabled some known-to-fail selftests ni the
   series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
 
 - memcg metadata storage optimizations from Shakeel Butt in "memcg:
   reduce memory consumption by memcg stats".
 
 - DAX fixes and maintenance work from Vishal Verma in the series
   "dax/bus.c: Fixups for dax-bus locking".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
 jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
 nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
 =V3R/
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:
 "The usual shower of singleton fixes and minor series all over MM,
  documented (hopefully adequately) in the respective changelogs.
  Notable series include:

   - Lucas Stach has provided some page-mapping cleanup/consolidation/
     maintainability work in the series "mm/treewide: Remove pXd_huge()
     API".

   - In the series "Allow migrate on protnone reference with
     MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
     MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
     one test.

   - In their series "Memory allocation profiling" Kent Overstreet and
     Suren Baghdasaryan have contributed a means of determining (via
     /proc/allocinfo) whereabouts in the kernel memory is being
     allocated: number of calls and amount of memory.

   - Matthew Wilcox has provided the series "Various significant MM
     patches" which does a number of rather unrelated things, but in
     largely similar code sites.

   - In his series "mm: page_alloc: freelist migratetype hygiene"
     Johannes Weiner has fixed the page allocator's handling of
     migratetype requests, with resulting improvements in compaction
     efficiency.

   - In the series "make the hugetlb migration strategy consistent"
     Baolin Wang has fixed a hugetlb migration issue, which should
     improve hugetlb allocation reliability.

   - Liu Shixin has hit an I/O meltdown caused by readahead in a
     memory-tight memcg. Addressed in the series "Fix I/O high when
     memory almost met memcg limit".

   - In the series "mm/filemap: optimize folio adding and splitting"
     Kairui Song has optimized pagecache insertion, yielding ~10%
     performance improvement in one test.

   - Baoquan He has cleaned up and consolidated the early zone
     initialization code in the series "mm/mm_init.c: refactor
     free_area_init_core()".

   - Baoquan has also redone some MM initializatio code in the series
     "mm/init: minor clean up and improvement".

   - MM helper cleanups from Christoph Hellwig in his series "remove
     follow_pfn".

   - More cleanups from Matthew Wilcox in the series "Various
     page->flags cleanups".

   - Vlastimil Babka has contributed maintainability improvements in the
     series "memcg_kmem hooks refactoring".

   - More folio conversions and cleanups in Matthew Wilcox's series:
	"Convert huge_zero_page to huge_zero_folio"
	"khugepaged folio conversions"
	"Remove page_idle and page_young wrappers"
	"Use folio APIs in procfs"
	"Clean up __folio_put()"
	"Some cleanups for memory-failure"
	"Remove page_mapping()"
	"More folio compat code removal"

   - David Hildenbrand chipped in with "fs/proc/task_mmu: convert
     hugetlb functions to work on folis".

   - Code consolidation and cleanup work related to GUP's handling of
     hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

   - Rick Edgecombe has developed some fixes to stack guard gaps in the
     series "Cover a guard gap corner case".

   - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
     series "mm/ksm: fix ksm exec support for prctl".

   - Baolin Wang has implemented NUMA balancing for multi-size THPs.
     This is a simple first-cut implementation for now. The series is
     "support multi-size THP numa balancing".

   - Cleanups to vma handling helper functions from Matthew Wilcox in
     the series "Unify vma_address and vma_pgoff_address".

   - Some selftests maintenance work from Dev Jain in the series
     "selftests/mm: mremap_test: Optimizations and style fixes".

   - Improvements to the swapping of multi-size THPs from Ryan Roberts
     in the series "Swap-out mTHP without splitting".

   - Kefeng Wang has significantly optimized the handling of arm64's
     permission page faults in the series
	"arch/mm/fault: accelerate pagefault when badaccess"
	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

   - GUP cleanups from David Hildenbrand in "mm/gup: consistently call
     it GUP-fast".

   - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
     path to use struct vm_fault".

   - selftests build fixes from John Hubbard in the series "Fix
     selftests/mm build without requiring "make headers"".

   - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
     series "Improved Memory Tier Creation for CPUless NUMA Nodes".
     Fixes the initialization code so that migration between different
     memory types works as intended.

   - David Hildenbrand has improved follow_pte() and fixed an errant
     driver in the series "mm: follow_pte() improvements and acrn
     follow_pte() fixes".

   - David also did some cleanup work on large folio mapcounts in his
     series "mm: mapcount for large folios + page_mapcount() cleanups".

   - Folio conversions in KSM in Alex Shi's series "transfer page to
     folio in KSM".

   - Barry Song has added some sysfs stats for monitoring multi-size
     THP's in the series "mm: add per-order mTHP alloc and swpout
     counters".

   - Some zswap cleanups from Yosry Ahmed in the series "zswap
     same-filled and limit checking cleanups".

   - Matthew Wilcox has been looking at buffer_head code and found the
     documentation to be lacking. The series is "Improve buffer head
     documentation".

   - Multi-size THPs get more work, this time from Lance Yang. His
     series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
     optimizes the freeing of these things.

   - Kemeng Shi has added more userspace-visible writeback
     instrumentation in the series "Improve visibility of writeback".

   - Kemeng Shi then sent some maintenance work on top in the series
     "Fix and cleanups to page-writeback".

   - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
     the series "Improve anon_vma scalability for anon VMAs". Intel's
     test bot reported an improbable 3x improvement in one test.

   - SeongJae Park adds some DAMON feature work in the series
	"mm/damon: add a DAMOS filter type for page granularity access recheck"
	"selftests/damon: add DAMOS quota goal test"

   - Also some maintenance work in the series
	"mm/damon/paddr: simplify page level access re-check for pageout"
	"mm/damon: misc fixes and improvements"

   - David Hildenbrand has disabled some known-to-fail selftests ni the
     series "selftests: mm: cow: flag vmsplice() hugetlb tests as
     XFAIL".

   - memcg metadata storage optimizations from Shakeel Butt in "memcg:
     reduce memory consumption by memcg stats".

   - DAX fixes and maintenance work from Vishal Verma in the series
     "dax/bus.c: Fixups for dax-bus locking""

* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
  memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
  selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
  selftests: cgroup: add tests to verify the zswap writeback path
  mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
  mm/damon/core: fix return value from damos_wmark_metric_value
  mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
  selftests: cgroup: remove redundant enabling of memory controller
  Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
  Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
  Docs/mm/damon/design: use a list for supported filters
  Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
  Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
  selftests/damon: classify tests for functionalities and regressions
  selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
  selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
  selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
  mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
  selftests/damon: add a test for DAMOS quota goal
  ...
2024-05-19 09:21:03 -07:00
Palmer Dabbelt
4f16345d92
Merge patch series "riscv: ASID-related and UP-related TLB flush enhancements"
Samuel Holland <samuel.holland@sifive.com> says:

This series converts uniprocessor kernel builds to use the same TLB
flushing code as SMP builds, to take advantage of batching and existing
range- and ASID-based TLB flush optimizations. It optimizes out IPIs and
SBI calls based on the online CPU count, which also covers the scenario
where SMP was enabled at build time but only one CPU is present/online.
A final optimization is to use single-ASID flushes wherever possible, to
avoid unnecessary TLB misses for kernel mappings.

This series has a semantic conflict with the AIA patches that are in
linux-next due to the removal of the third parameter of
riscv_ipi_set_virq_range(), which is called from imsic_ipi_domain_init()
in drivers/irqchip/irq-riscv-imsic-early.c. The resolution is to remove
the extra argument from the call site.

Here are some numbers from D1 which show the performance impact:

v6.9-rc1:
 System Benchmarks Partial Index              BASELINE       RESULT    INDEX
 Execl Throughput                                 43.0        198.5     46.2
 File Copy 1024 bufsize 2000 maxblocks          3960.0      73934.4    186.7
 File Copy 256 bufsize 500 maxblocks            1655.0      20242.6    122.3
 File Copy 4096 bufsize 8000 maxblocks          5800.0     197706.4    340.9
 Pipe Throughput                               12440.0     176974.2    142.3
 Pipe-based Context Switching                   4000.0      23626.8     59.1
 Process Creation                                126.0        449.9     35.7
 Shell Scripts (1 concurrent)                     42.4        544.4    128.4
 Shell Scripts (16 concurrent)                     ---         35.3      ---
 Shell Scripts (8 concurrent)                      6.0         71.6    119.3
 System Call Overhead                          15000.0     248072.6    165.4
                                                                    ========
 System Benchmarks Index Score (Partial Only)                          110.6

v6.9-rc1 + this patch series:
 System Benchmarks Partial Index              BASELINE       RESULT    INDEX
 Execl Throughput                                 43.0        196.8     45.8
 File Copy 1024 bufsize 2000 maxblocks          3960.0      71782.2    181.3
 File Copy 256 bufsize 500 maxblocks            1655.0      21269.4    128.5
 File Copy 4096 bufsize 8000 maxblocks          5800.0     199424.0    343.8
 Pipe Throughput                               12440.0     196468.6    157.9
 Pipe-based Context Switching                   4000.0      24261.8     60.7
 Process Creation                                126.0        459.0     36.4
 Shell Scripts (1 concurrent)                     42.4        543.8    128.2
 Shell Scripts (16 concurrent)                     ---         35.5      ---
 Shell Scripts (8 concurrent)                      6.0         71.7    119.6
 System Call Overhead                          15000.0     259415.2    172.9
                                                                    ========
 System Benchmarks Index Score (Partial Only)                          113.0

* b4-shazam-lts:
  riscv: mm: Always use an ASID to flush mm contexts
  riscv: mm: Preserve global TLB entries when switching contexts
  riscv: mm: Make asid_bits a local variable
  riscv: mm: Use a fixed layout for the MM context ID
  riscv: mm: Introduce cntx2asid/cntx2version helper macros
  riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
  riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
  riscv: mm: Combine the SMP and UP TLB flush code
  riscv: Only send remote fences when some other CPU is online
  riscv: mm: Broadcast kernel TLB flushes only when needed
  riscv: Use IPIs for remote cache/TLB flushes by default
  riscv: Factor out page table TLB synchronization
  riscv: Flush the instruction cache during SMP bringup

Link: https://lore.kernel.org/r/20240327045035.368512-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-30 10:35:48 -07:00
Jisheng Zhang
48b4fc6693
riscv: select ARCH_HAS_FAST_MULTIPLIER
Currently, riscv linux requires at least IMA, so all platforms have a
multiplier. And I assume the 'mul' efficiency is comparable or better
than a sequence of five or so register-dependent arithmetic
instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer
codegen. Refer to commit f9b4192923 ("[PATCH] bitops: hweight()
speedup") for more details.

In a simple benchmark test calling hweight64() in a loop, it got:
about 14% performance improvement on JH7110, tested on Milkv Mars.

about 23% performance improvement on TH1520 and SG2042, tested on
Sipeed LPI4A and SG2042 platform.

a slight performance drop on CV1800B, tested on milkv duo. Among all
riscv platforms in my hands, this is the only one which sees a slight
performance drop. It means the 'mul' isn't quick enough. However, the
situation exists on x86 too, for example, P4 doesn't have fast
integer multiplies as said in the above commit, x86 also selects
ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER
which can benefit almost riscv platforms.

Samuel also provided some performance numbers:
On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for
__sw_hweight64.
On D1: 8% speedup for __sw_hweight32 and 8% slowdown for
__sw_hweight64.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240325105823.1483-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-30 10:35:47 -07:00
Palmer Dabbelt
7845f52256
Merge patch series "riscv: enable lockless lockref implementation"
Jisheng Zhang <jszhang@kernel.org> says:

This series selects ARCH_USE_CMPXCHG_LOCKREF to enable the
cmpxchg-based lockless lockref implementation for riscv. Then,
implement arch_cmpxchg64_{relaxed|acquire|release}.

After patch1:
Using Linus' test case[1] on TH1520 platform, I see a 11.2% improvement.
On JH7110 platform, I see 12.0% improvement.

After patch2:
on both TH1520 and JH7110 platforms, I didn't see obvious
performance improvement with Linus' test case [1]. IMHO, this may
be related with the fence and lr.d/sc.d hw implementations. In theory,
lr/sc without fence could give performance improvement over lr/sc plus
fence, so add the code here to leave performance improvement room on
newer HW platforms.

* b4-shazam-merge:
  riscv: cmpxchg: implement arch_cmpxchg64_{relaxed|acquire|release}
  riscv: select ARCH_USE_CMPXCHG_LOCKREF

Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 [1]
Link: https://lore.kernel.org/r/20240325111038.1700-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-30 10:35:46 -07:00
Samuel Holland
c6026d35b6
riscv: mm: Combine the SMP and UP TLB flush code
In SMP configurations, all TLB flushing narrower than flush_tlb_all()
goes through __flush_tlb_range(). Do the same in UP configurations.

This allows UP configurations to take advantage of recent improvements
to the code in tlbflush.c, such as support for huge pages and flushing
multiple-page ranges.

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com>
Link: https://lore.kernel.org/r/20240327045035.368512-7-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-29 10:49:29 -07:00
Palmer Dabbelt
d4b500cceb
Merge patch series "riscv: 64-bit NOMMU fixes and enhancements"
Samuel Holland <samuel.holland@sifive.com> says:

This series aims to improve support for NOMMU, specifically by making it
easier to test NOMMU kernels in QEMU and on various widely-available
hardware (errata permitting). After all, everything supports Svbare...

After applying this series, a NOMMU kernel based on defconfig (changing
only the three options below*) boots to userspace on QEMU when passed as
-kernel.

  # CONFIG_RISCV_M_MODE is not set
  # CONFIG_MMU is not set
  CONFIG_NONPORTABLE=y

*if you are using LLD, you must also disable BPF_SYSCALL and KALLSYMS,
because LLD bails on out-of-range references to undefined weak symbols.

* b4-shazam-merge:
  riscv: Allow NOMMU kernels to run in S-mode
  riscv: Remove MMU dependency from Zbb and Zicboz
  riscv: Fix loading 64-bit NOMMU kernels past the start of RAM
  riscv: Fix TASK_SIZE on 64-bit NOMMU

Link: https://lore.kernel.org/r/20240227003630.3634533-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-28 14:50:35 -07:00
Miguel Ojeda
70a57b2472
RISC-V: enable building 64-bit kernels with rust support
The rust modules work on 64-bit RISC-V, with no twiddling required.
Select HAVE_RUST and provide the required flags to kbuild so that the
modules can be used. The Makefile and Kconfig changes are lifted from
work done by Miguel in the Rust-for-Linux tree, hence his authorship.
Following the rabbit hole, the Makefile changes originated in a script,
created based on config files originally added by Gary, hence his
co-authorship.

32-bit is broken in core rust code, so support is limited to 64-bit:
ld.lld: error: undefined symbol: __udivdi3

As 64-bit RISC-V is now supported, add it to the arch support table.

Co-developed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Co-developed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240409-silencer-book-ce1320f06aab@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-28 14:50:34 -07:00
David Hildenbrand
25176ad09c mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST
Nowadays, we call it "GUP-fast", the external interface includes functions
like "get_user_pages_fast()", and we renamed all internal functions to
reflect that as well.

Let's make the config option reflect that.

Link: https://lkml.kernel.org/r/20240402125516.223131-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:41 -07:00
Jisheng Zhang
eb1e503729
riscv: select ARCH_USE_CMPXCHG_LOCKREF
Select ARCH_USE_CMPXCHG_LOCKREF to enable the cmpxchg-based lockless
lockref implementation for riscv.

Using Linus' test case[1] on TH1520 platform, I see a 11.2% improvement.
On JH7110 platform, I see 12.0% improvement.

Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 [1]
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20240325111038.1700-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-24 12:57:48 -07:00
Samuel Holland
f862bbf4cd
riscv: Allow NOMMU kernels to run in S-mode
For ease of testing, it is convenient to run NOMMU kernels in supervisor
mode. The only required change is to offset the kernel load address,
since the beginning of RAM is usually reserved for M-mode firmware.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240227003630.3634533-5-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-09 11:39:40 -07:00
Samuel Holland
9c4319d697
riscv: Remove MMU dependency from Zbb and Zicboz
The Zbb and Zicboz ISA extensions have no dependency on an MMU and are
equally useful on NOMMU kernels.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240227003630.3634533-4-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-09 11:39:39 -07:00
Anup Patel
0eebc69db3 RISC-V: Select APLIC and IMSIC drivers
The QEMU virt machine supports AIA emulation and quite a few RISC-V
platforms with AIA support are under development so select APLIC and
IMSIC drivers for all RISC-V platforms.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240307140307.646078-9-apatel@ventanamicro.com
2024-03-25 17:38:29 +01:00
Linus Torvalds
c150b809f7 RISC-V Patches for the 6.9 Merge Window
* Support for various vector-accelerated crypto routines.
 * Hibernation is now enabled for portable kernel builds.
 * mmap_rnd_bits_max is larger on systems with larger VAs.
 * Support for fast GUP.
 * Support for membarrier-based instruction cache synchronization.
 * Support for the Andes hart-level interrupt controller and PMU.
 * Some cleanups around unaligned access speed probing and Kconfig
   settings.
 * Support for ACPI LPI and CPPC.
 * Various cleanus related to barriers.
 * A handful of fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmX9icgTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYib+UD/4xyL6UMixx6A06BVBL9UT4vOrxRvNr
 JIihG5y5QNMjes9DHWL35mZTMqFtQ0tq94ViWFLmJWloV/8KRVM2C9R9KX7vplf3
 M/OwvP106spxgvNHoeQbycgs42RU1t2mpqT7N1iK2hCjqieP3vLn6hsSLXWTAG0L
 3gQbQw6XCLC3hPyLq+nbFY2i4faeCmpXWmixoy/IvQ5calZQrRU0LNlP6lcMBhVo
 uocjG0uGAhrahw2s81jxcMZcxa3AvUCiplapdD5H5v9rBM85SkYJj2Q9SqdSorkb
 xzuimRnKPI5s47yM3pTfZY0qnQUYHV7PXXuw4WujpCQVQdhaG+Ggq63UUZA61J9t
 IzZK2zdcfHqICrGTtXImUzRT3dcc3oq+IFq4tTY+rEJm29hrXkAtx+qBm5xtMvax
 fJz5feJ/iT0u7MDj4Oq24n+Kpl+Olm+MJaZX3m5Ovi/9V6a9iK9HXqxg9/Fs0fMO
 +J/0kTgd8Vu9CYH7KNWz3uztcO9eMAH3VyzuXuab4BGj1i1Y/9EjpALQi7rDN73S
 OsYQX6NnzMkBV4dvElJVLXiPlvNlMHZZwdak5CqPb48jaJu6iiIZAuvOrG6/naGP
 wnQSLVA2WWWoOkl3AJhxfpa11CLhbMl9E2gYm1VtNvASXoSFIxlAq1Yv3sG8yjty
 4ZT0rYFJOstYiQ==
 =3dL5
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for various vector-accelerated crypto routines

 - Hibernation is now enabled for portable kernel builds

 - mmap_rnd_bits_max is larger on systems with larger VAs

 - Support for fast GUP

 - Support for membarrier-based instruction cache synchronization

 - Support for the Andes hart-level interrupt controller and PMU

 - Some cleanups around unaligned access speed probing and Kconfig
   settings

 - Support for ACPI LPI and CPPC

 - Various cleanus related to barriers

 - A handful of fixes

* tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits)
  riscv: Fix syscall wrapper for >word-size arguments
  crypto: riscv - add vector crypto accelerated AES-CBC-CTS
  crypto: riscv - parallelize AES-CBC decryption
  riscv: Only flush the mm icache when setting an exec pte
  riscv: Use kcalloc() instead of kzalloc()
  riscv/barrier: Add missing space after ','
  riscv/barrier: Consolidate fence definitions
  riscv/barrier: Define RISCV_FULL_BARRIER
  riscv/barrier: Define __{mb,rmb,wmb}
  RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ
  cpufreq: Move CPPC configs to common Kconfig and add RISC-V
  ACPI: RISC-V: Add CPPC driver
  ACPI: Enable ACPI_PROCESSOR for RISC-V
  ACPI: RISC-V: Add LPI driver
  cpuidle: RISC-V: Move few functions to arch/riscv
  riscv: Introduce set_compat_task() in asm/compat.h
  riscv: Introduce is_compat_thread() into compat.h
  riscv: add compile-time test into is_compat_task()
  riscv: Replace direct thread flag check with is_compat_task()
  riscv: Improve arch_get_mmap_end() macro
  ...
2024-03-22 10:41:13 -07:00
Linus Torvalds
e5eb28f6d1 - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
heap optimizations".
 
 - Kuan-Wei Chiu has also sped up the library sorting code in the series
   "lib/sort: Optimize the number of swaps and comparisons".
 
 - Alexey Gladkov has added the ability for code running within an IPC
   namespace to alter its IPC and MQ limits.  The series is "Allow to
   change ipc/mq sysctls inside ipc namespace".
 
 - Geert Uytterhoeven has contributed some dhrystone maintenance work in
   the series "lib: dhry: miscellaneous cleanups".
 
 - Ryusuke Konishi continues nilfs2 maintenance work in the series
 
 	"nilfs2: eliminate kmap and kmap_atomic calls"
 	"nilfs2: fix kernel bug at submit_bh_wbc()"
 
 - Nathan Chancellor has updated our build tools requirements in the
   series "Bump the minimum supported version of LLVM to 13.0.1".
 
 - Muhammad Usama Anjum continues with the selftests maintenance work in
   the series "selftests/mm: Improve run_vmtests.sh".
 
 - Oleg Nesterov has done some maintenance work against the signal code
   in the series "get_signal: minor cleanups and fix".
 
 Plus the usual shower of singleton patches in various parts of the tree.
 Please see the individual changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfMnvgAKCRDdBJ7gKXxA
 jjKMAP4/Upq07D4wjkMVPb+QrkipbbLpdcgJ++q3z6rba4zhPQD+M3SFriIJk/Xh
 tKVmvihFxfAhdDthseXcIf1nBjMALwY=
 =8rVc
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
   heap optimizations".

 - Kuan-Wei Chiu has also sped up the library sorting code in the series
   "lib/sort: Optimize the number of swaps and comparisons".

 - Alexey Gladkov has added the ability for code running within an IPC
   namespace to alter its IPC and MQ limits. The series is "Allow to
   change ipc/mq sysctls inside ipc namespace".

 - Geert Uytterhoeven has contributed some dhrystone maintenance work in
   the series "lib: dhry: miscellaneous cleanups".

 - Ryusuke Konishi continues nilfs2 maintenance work in the series

	"nilfs2: eliminate kmap and kmap_atomic calls"
	"nilfs2: fix kernel bug at submit_bh_wbc()"

 - Nathan Chancellor has updated our build tools requirements in the
   series "Bump the minimum supported version of LLVM to 13.0.1".

 - Muhammad Usama Anjum continues with the selftests maintenance work in
   the series "selftests/mm: Improve run_vmtests.sh".

 - Oleg Nesterov has done some maintenance work against the signal code
   in the series "get_signal: minor cleanups and fix".

Plus the usual shower of singleton patches in various parts of the tree.
Please see the individual changelogs for details.

* tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  nilfs2: prevent kernel bug at submit_bh_wbc()
  nilfs2: fix failure to detect DAT corruption in btree and direct mappings
  ocfs2: enable ocfs2_listxattr for special files
  ocfs2: remove SLAB_MEM_SPREAD flag usage
  assoc_array: fix the return value in assoc_array_insert_mid_shortcut()
  buildid: use kmap_local_page()
  watchdog/core: remove sysctl handlers from public header
  nilfs2: use div64_ul() instead of do_div()
  mul_u64_u64_div_u64: increase precision by conditionally swapping a and b
  kexec: copy only happens before uchunk goes to zero
  get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task
  get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig
  get_signal: don't abuse ksig->info.si_signo and ksig->sig
  const_structs.checkpatch: add device_type
  Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>"
  dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace()
  list: leverage list_is_head() for list_entry_is_head()
  nilfs2: MAINTAINERS: drop unreachable project mirror site
  smp: make __smp_processor_id() 0-argument macro
  fat: fix uninitialized field in nostale filehandles
  ...
2024-03-14 18:03:09 -07:00
Linus Torvalds
902861e34c - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory.  Series
   "implement "memmap on memory" feature on s390".
 
 - More folio conversions from Matthew Wilcox in the series
 
 	"Convert memcontrol charge moving to use folios"
 	"mm: convert mm counter to take a folio"
 
 - Chengming Zhou has optimized zswap's rbtree locking, providing
   significant reductions in system time and modest but measurable
   reductions in overall runtimes.  The series is "mm/zswap: optimize the
   scalability of zswap rb-tree".
 
 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap
   lru list" which provides measurable runtime benefits in some
   swap-intensive situations.
 
 - And Chengming Zhou further optimizes zswap in the series "mm/zswap:
   optimize for dynamic zswap_pools".  Measured improvements are modest.
 
 - zswap cleanups and simplifications from Yosry Ahmed in the series "mm:
   zswap: simplify zswap_swapoff()".
 
 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
   contributed several DAX cleanups as well as adding a sysfs tunable to
   control the memmap_on_memory setting when the dax device is hotplugged
   as system memory.
 
 - Johannes Weiner has added the large series "mm: zswap: cleanups",
   which does that.
 
 - More DAMON work from SeongJae Park in the series
 
 	"mm/damon: make DAMON debugfs interface deprecation unignorable"
 	"selftests/damon: add more tests for core functionalities and corner cases"
 	"Docs/mm/damon: misc readability improvements"
 	"mm/damon: let DAMOS feeds and tame/auto-tune itself"
 
 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
   extension" Rakie Kim has developed a new mempolicy interleaving policy
   wherein we allocate memory across nodes in a weighted fashion rather
   than uniformly.  This is beneficial in heterogeneous memory environments
   appearing with CXL.
 
 - Christophe Leroy has contributed some cleanup and consolidation work
   against the ARM pagetable dumping code in the series "mm: ptdump:
   Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
 
 - Luis Chamberlain has added some additional xarray selftesting in the
   series "test_xarray: advanced API multi-index tests".
 
 - Muhammad Usama Anjum has reworked the selftest code to make its
   human-readable output conform to the TAP ("Test Anything Protocol")
   format.  Amongst other things, this opens up the use of third-party
   tools to parse and process out selftesting results.
 
 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the
   series "mm/memory: optimize fork() with PTE-mapped THP".  Mainly
   targeted at arm64, this significantly speeds up fork() when the process
   has a large number of pte-mapped folios.
 
 - David Hildenbrand also gets in on the THP pte batching game in his
   series "mm/memory: optimize unmap/zap with PTE-mapped THP".  It
   implements batching during munmap() and other pte teardown situations.
   The microbenchmark improvements are nice.
 
 - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan
   Roberts further utilizes arm's pte's contiguous bit ("contpte
   mappings").  Kernel build times on arm64 improved nicely.  Ryan's series
   "Address some contpte nits" provides some followup work.
 
 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
   fixed an obscure hugetlb race which was causing unnecessary page faults.
   He has also added a reproducer under the selftest code.
 
 - In the series "selftests/mm: Output cleanups for the compaction test",
   Mark Brown did what the title claims.
 
 - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring".
 
 - Even more zswap material from Nhat Pham.  The series "fix and extend
   zswap kselftests" does as claimed.
 
 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
   regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in
   our handling of DAX on archiecctures which have virtually aliasing data
   caches.  The arm architecture is the main beneficiary.
 
 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic
   improvements in worst-case mmap_lock hold times during certain
   userfaultfd operations.
 
 - Some page_owner enhancements and maintenance work from Oscar Salvador
   in his series
 
 	"page_owner: print stacks and their outstanding allocations"
 	"page_owner: Fixup and cleanup"
 
 - Uladzislau Rezki has contributed some vmalloc scalability improvements
   in his series "Mitigate a vmap lock contention".  It realizes a 12x
   improvement for a certain microbenchmark.
 
 - Some kexec/crash cleanup work from Baoquan He in the series "Split
   crash out from kexec and clean up related config items".
 
 - Some zsmalloc maintenance work from Chengming Zhou in the series
 
 	"mm/zsmalloc: fix and optimize objects/page migration"
 	"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
 
 - Zi Yan has taught the MM to perform compaction on folios larger than
   order=0.  This a step along the path to implementaton of the merging of
   large anonymous folios.  The series is named "Enable >0 order folio
   memory compaction".
 
 - Christoph Hellwig has done quite a lot of cleanup work in the
   pagecache writeback code in his series "convert write_cache_pages() to
   an iterator".
 
 - Some modest hugetlb cleanups and speedups in Vishal Moola's series
   "Handle hugetlb faults under the VMA lock".
 
 - Zi Yan has changed the page splitting code so we can split huge pages
   into sizes other than order-0 to better utilize large folios.  The
   series is named "Split a folio to any lower order folios".
 
 - David Hildenbrand has contributed the series "mm: remove
   total_mapcount()", a cleanup.
 
 - Matthew Wilcox has sought to improve the performance of bulk memory
   freeing in his series "Rearrange batched folio freeing".
 
 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
   provides large improvements in bootup times on large machines which are
   configured to use large numbers of hugetlb pages.
 
 - Matthew Wilcox's series "PageFlags cleanups" does that.
 
 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that
   also.  S390 is affected.
 
 - Cleanups to our pagemap utility functions from Peter Xu in his series
   "mm/treewide: Replace pXd_large() with pXd_leaf()".
 
 - Nico Pache has fixed a few things with our hugepage selftests in his
   series "selftests/mm: Improve Hugepage Test Handling in MM Selftests".
 
 - Also, of course, many singleton patches to many things.  Please see
   the individual changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA
 joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx
 TMNhHfyiHYDTx/GAV9NXW84tasJSDgA=
 =TG55
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
   from hotplugged memory rather than only from main memory. Series
   "implement "memmap on memory" feature on s390".

 - More folio conversions from Matthew Wilcox in the series

	"Convert memcontrol charge moving to use folios"
	"mm: convert mm counter to take a folio"

 - Chengming Zhou has optimized zswap's rbtree locking, providing
   significant reductions in system time and modest but measurable
   reductions in overall runtimes. The series is "mm/zswap: optimize the
   scalability of zswap rb-tree".

 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap
   lru list" which provides measurable runtime benefits in some
   swap-intensive situations.

 - And Chengming Zhou further optimizes zswap in the series "mm/zswap:
   optimize for dynamic zswap_pools". Measured improvements are modest.

 - zswap cleanups and simplifications from Yosry Ahmed in the series
   "mm: zswap: simplify zswap_swapoff()".

 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
   contributed several DAX cleanups as well as adding a sysfs tunable to
   control the memmap_on_memory setting when the dax device is
   hotplugged as system memory.

 - Johannes Weiner has added the large series "mm: zswap: cleanups",
   which does that.

 - More DAMON work from SeongJae Park in the series

	"mm/damon: make DAMON debugfs interface deprecation unignorable"
	"selftests/damon: add more tests for core functionalities and corner cases"
	"Docs/mm/damon: misc readability improvements"
	"mm/damon: let DAMOS feeds and tame/auto-tune itself"

 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
   extension" Rakie Kim has developed a new mempolicy interleaving
   policy wherein we allocate memory across nodes in a weighted fashion
   rather than uniformly. This is beneficial in heterogeneous memory
   environments appearing with CXL.

 - Christophe Leroy has contributed some cleanup and consolidation work
   against the ARM pagetable dumping code in the series "mm: ptdump:
   Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".

 - Luis Chamberlain has added some additional xarray selftesting in the
   series "test_xarray: advanced API multi-index tests".

 - Muhammad Usama Anjum has reworked the selftest code to make its
   human-readable output conform to the TAP ("Test Anything Protocol")
   format. Amongst other things, this opens up the use of third-party
   tools to parse and process out selftesting results.

 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the
   series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
   targeted at arm64, this significantly speeds up fork() when the
   process has a large number of pte-mapped folios.

 - David Hildenbrand also gets in on the THP pte batching game in his
   series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
   implements batching during munmap() and other pte teardown
   situations. The microbenchmark improvements are nice.

 - And in the series "Transparent Contiguous PTEs for User Mappings"
   Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
   mappings"). Kernel build times on arm64 improved nicely. Ryan's
   series "Address some contpte nits" provides some followup work.

 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
   fixed an obscure hugetlb race which was causing unnecessary page
   faults. He has also added a reproducer under the selftest code.

 - In the series "selftests/mm: Output cleanups for the compaction
   test", Mark Brown did what the title claims.

 - Kinsey Ho has added the series "mm/mglru: code cleanup and
   refactoring".

 - Even more zswap material from Nhat Pham. The series "fix and extend
   zswap kselftests" does as claimed.

 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
   regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
   in our handling of DAX on archiecctures which have virtually aliasing
   data caches. The arm architecture is the main beneficiary.

 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides
   dramatic improvements in worst-case mmap_lock hold times during
   certain userfaultfd operations.

 - Some page_owner enhancements and maintenance work from Oscar Salvador
   in his series

	"page_owner: print stacks and their outstanding allocations"
	"page_owner: Fixup and cleanup"

 - Uladzislau Rezki has contributed some vmalloc scalability
   improvements in his series "Mitigate a vmap lock contention". It
   realizes a 12x improvement for a certain microbenchmark.

 - Some kexec/crash cleanup work from Baoquan He in the series "Split
   crash out from kexec and clean up related config items".

 - Some zsmalloc maintenance work from Chengming Zhou in the series

	"mm/zsmalloc: fix and optimize objects/page migration"
	"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"

 - Zi Yan has taught the MM to perform compaction on folios larger than
   order=0. This a step along the path to implementaton of the merging
   of large anonymous folios. The series is named "Enable >0 order folio
   memory compaction".

 - Christoph Hellwig has done quite a lot of cleanup work in the
   pagecache writeback code in his series "convert write_cache_pages()
   to an iterator".

 - Some modest hugetlb cleanups and speedups in Vishal Moola's series
   "Handle hugetlb faults under the VMA lock".

 - Zi Yan has changed the page splitting code so we can split huge pages
   into sizes other than order-0 to better utilize large folios. The
   series is named "Split a folio to any lower order folios".

 - David Hildenbrand has contributed the series "mm: remove
   total_mapcount()", a cleanup.

 - Matthew Wilcox has sought to improve the performance of bulk memory
   freeing in his series "Rearrange batched folio freeing".

 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
   provides large improvements in bootup times on large machines which
   are configured to use large numbers of hugetlb pages.

 - Matthew Wilcox's series "PageFlags cleanups" does that.

 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that
   also. S390 is affected.

 - Cleanups to our pagemap utility functions from Peter Xu in his series
   "mm/treewide: Replace pXd_large() with pXd_leaf()".

 - Nico Pache has fixed a few things with our hugepage selftests in his
   series "selftests/mm: Improve Hugepage Test Handling in MM
   Selftests".

 - Also, of course, many singleton patches to many things. Please see
   the individual changelogs for details.

* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
  mm/zswap: remove the memcpy if acomp is not sleepable
  crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  memtest: use {READ,WRITE}_ONCE in memory scanning
  mm: prohibit the last subpage from reusing the entire large folio
  mm: recover pud_leaf() definitions in nopmd case
  selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
  selftests/mm: skip uffd hugetlb tests with insufficient hugepages
  selftests/mm: dont fail testsuite due to a lack of hugepages
  mm/huge_memory: skip invalid debugfs new_order input for folio split
  mm/huge_memory: check new folio order when split a folio
  mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
  mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
  mm: fix list corruption in put_pages_list
  mm: remove folio from deferred split list before uncharging it
  filemap: avoid unnecessary major faults in filemap_fault()
  mm,page_owner: drop unnecessary check
  mm,page_owner: check for null stack_record before bumping its refcount
  mm: swap: fix race between free_swap_and_cache() and swapoff()
  mm/treewide: align up pXd_leaf() retval across archs
  mm/treewide: drop pXd_large()
  ...
2024-03-14 17:43:30 -07:00
Palmer Dabbelt
2b2ca35467
Merge patch series "riscv: Use Kconfig to set unaligned access speed"
Charlie Jenkins <charlie@rivosinc.com> says:

If the hardware unaligned access speed is known at compile time, it is
possible to avoid running the unaligned access speed probe to speedup
boot-time.

* b4-shazam-merge:
  riscv: Set unaligned access speed at compile time
  riscv: Decouple emulated unaligned accesses from access speed
  riscv: Only check online cpus for emulated accesses
  riscv: lib: Introduce has_fast_unaligned_access()

Link: https://lore.kernel.org/r/20240308-disable_misaligned_probe_config-v9-0-a388770ba0ce@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-13 07:30:33 -07:00
Charlie Jenkins
f413aae96c
riscv: Set unaligned access speed at compile time
Introduce Kconfig options to set the kernel unaligned access support.
These options provide a non-portable alternative to the runtime
unaligned access probe.

To support this, the unaligned access probing code is moved into it's
own file and gated behind a new RISCV_PROBE_UNALIGNED_ACCESS_SUPPORT
option.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240308-disable_misaligned_probe_config-v9-4-a388770ba0ce@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-13 07:30:31 -07:00
Linus Torvalds
216532e147 hardening updates for v6.9-rc1
- string.h and related header cleanups (Tanzir Hasan, Andy Shevchenko)
 
 - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev, Harshit
   Mogalapalli)
 
 - selftests/powerpc: Fix load_unaligned_zeropad build failure (Michael
   Ellerman)
 
 - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn)
 
 - Handle tail call optimization better in LKDTM (Douglas Anderson)
 
 - Use long form types in overflow.h (Andy Shevchenko)
 
 - Add flags param to string_get_size() (Andy Shevchenko)
 
 - Add Coccinelle script for potential struct_size() use (Jacob Keller)
 
 - Fix objtool corner case under KCFI (Josh Poimboeuf)
 
 - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng)
 
 - Add str_plural() helper (Michal Wajdeczko, Kees Cook)
 
 - Ignore relocations in .notes section
 
 - Add comments to explain how __is_constexpr() works
 
 - Fix m68k stack alignment expectations in stackinit Kunit test
 
 - Convert string selftests to KUnit
 
 - Add KUnit tests for fortified string functions
 
 - Improve reporting during fortified string warnings
 
 - Allow non-type arg to type_max() and type_min()
 
 - Allow strscpy() to be called with only 2 arguments
 
 - Add binary mode to leaking_addresses scanner
 
 - Various small cleanups to leaking_addresses scanner
 
 - Adding wrapping_*() arithmetic helper
 
 - Annotate initial signed integer wrap-around in refcount_t
 
 - Add explicit UBSAN section to MAINTAINERS
 
 - Fix UBSAN self-test warnings
 
 - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL
 
 - Reintroduce UBSAN's signed overflow sanitizer
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmXvm5kWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJiQqD/4mM6SWZpYHKlR1nEiqIyz7Hqr9
 g4oguuw6HIVNJXLyeBI5Hd43CTeHPA0e++EETqhUAt7HhErxfYJY+JB221nRYmu+
 zhhQ7N/xbTMV/Je7AR03kQjhiMm8LyEcM2X4BNrsAcoCieQzmO3g0zSp8ISzLUE0
 PEEmf1lOzMe3gK2KOFCPt5Hiz9sGWyN6at+BQubY18tQGtjEXYAQNXkpD5qhGn4a
 EF693r/17wmc8hvSsjf4AGaWy1k8crG0WfpMCZsaqftjj0BbvOC60IDyx4eFjpcy
 tGyAJKETq161AkCdNweIh2Q107fG3tm0fcvw2dv8Wt1eQCko6M8dUGCBinQs/thh
 TexjJFS/XbSz+IvxLqgU+C5qkOP23E0M9m1dbIbOFxJAya/5n16WOBlGr3ae2Wdq
 /+t8wVSJw3vZiku5emWdFYP1VsdIHUjVa5QizFaaRhzLGRwhxVV49SP4IQC/5oM5
 3MAgNOFTP6yRQn9Y9wP+SZs+SsfaIE7yfKa9zOi4S+Ve+LI2v4YFhh8NCRiLkeWZ
 R1dhp8Pgtuq76f/v0qUaWcuuVeGfJ37M31KOGIhi1sI/3sr7UMrngL8D1+F8UZMi
 zcLu+x4GtfUZCHl6znx1rNUBqE5S/5ndVhLpOqfCXKaQ+RAm7lkOJ3jXE2VhNkhp
 yVEmeSOLnlCaQjZvXQ==
 =OP+o
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "As is pretty normal for this tree, there are changes all over the
  place, especially for small fixes, selftest improvements, and improved
  macro usability.

  Some header changes ended up landing via this tree as they depended on
  the string header cleanups. Also, a notable set of changes is the work
  for the reintroduction of the UBSAN signed integer overflow sanitizer
  so that we can continue to make improvements on the compiler side to
  make this sanitizer a more viable future security hardening option.

  Summary:

   - string.h and related header cleanups (Tanzir Hasan, Andy
     Shevchenko)

   - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev,
     Harshit Mogalapalli)

   - selftests/powerpc: Fix load_unaligned_zeropad build failure
     (Michael Ellerman)

   - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn)

   - Handle tail call optimization better in LKDTM (Douglas Anderson)

   - Use long form types in overflow.h (Andy Shevchenko)

   - Add flags param to string_get_size() (Andy Shevchenko)

   - Add Coccinelle script for potential struct_size() use (Jacob
     Keller)

   - Fix objtool corner case under KCFI (Josh Poimboeuf)

   - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng)

   - Add str_plural() helper (Michal Wajdeczko, Kees Cook)

   - Ignore relocations in .notes section

   - Add comments to explain how __is_constexpr() works

   - Fix m68k stack alignment expectations in stackinit Kunit test

   - Convert string selftests to KUnit

   - Add KUnit tests for fortified string functions

   - Improve reporting during fortified string warnings

   - Allow non-type arg to type_max() and type_min()

   - Allow strscpy() to be called with only 2 arguments

   - Add binary mode to leaking_addresses scanner

   - Various small cleanups to leaking_addresses scanner

   - Adding wrapping_*() arithmetic helper

   - Annotate initial signed integer wrap-around in refcount_t

   - Add explicit UBSAN section to MAINTAINERS

   - Fix UBSAN self-test warnings

   - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL

   - Reintroduce UBSAN's signed overflow sanitizer"

* tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits)
  selftests/powerpc: Fix load_unaligned_zeropad build failure
  string: Convert helpers selftest to KUnit
  string: Convert selftest to KUnit
  sh: Fix build with CONFIG_UBSAN=y
  compiler.h: Explain how __is_constexpr() works
  overflow: Allow non-type arg to type_max() and type_min()
  VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
  lib/string_helpers: Add flags param to string_get_size()
  x86, relocs: Ignore relocations in .notes section
  objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks
  overflow: Use POD in check_shl_overflow()
  lib: stackinit: Adjust target string to 8 bytes for m68k
  sparc: vdso: Disable UBSAN instrumentation
  kernel.h: Move lib/cmdline.c prototypes to string.h
  leaking_addresses: Provide mechanism to scan binary files
  leaking_addresses: Ignore input device status lines
  leaking_addresses: Use File::Temp for /tmp files
  MAINTAINERS: Update LEAKING_ADDRESSES details
  fortify: Improve buffer overflow reporting
  fortify: Add KUnit tests for runtime overflows
  ...
2024-03-12 14:49:30 -07:00
Linus Torvalds
65d287c7eb asm-generic updates for 6.9
Just two small updates this time:
 
  - A series I did to unify the definition of PAGE_SIZE through Kconfig,
    intended to help with a vdso rework that needs the constant but
    cannot include the normal kernel headers when building the compat
    VDSO on arm64 and potentially others.
 
  - a patch from Yan Zhao to remove the pfn_to_virt() definitions from
    a couple of architectures after finding they were both incorrect
    and entirely unused.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmXwEjQACgkQYKtH/8kJ
 UifwHxAAqXl6R4cZtjUKxHpQoX7TTtBgWyZ9OID8KYt8V/QN+Jme6EhuGV/5CJ1k
 5n30PuDvSKPB9865HfCZgh0BDSzSFo2xtc/bDuqiPHO5deNhXUDKX5MowIs3Pf2J
 EM1OJYiXG/g9vR19uaHvWVA4I1eJk01+Pl5nZ3DA+n9ZYcnM35+HO7EQcH80FGwz
 jkjN1HizxDmuMDDKn24hrSt6mVoE54JWyeDvklbY4CbwZbtFbtBJiFv3NWTfaxSf
 MPR1fopgaAkT0aJzUXOh36qDodyqR2tz4M7ucpRKa6/YlOewDN59tFwgwtun0s74
 lLJPBqQ6cT8no1VODNnKPb1M5Jh3uzsF1fuhnU6B06Z+1s7sxxqOli1Q0yrpivYY
 SCAh6WmiCMhHeP/sxfQHRhhrx9l0gOarXh7s4wRJFp+LAi59NuUTeJotoOfboX4M
 ozeFgW1Rlr+wORzUargRnQiXMLObC/RFdogLgiBJwa8XOI8bOPZg9JfAUPOwbfa2
 37IFZRleu+V2NaBF8rS5wRGI8hVp99XSMjlskKLM/645doqNq1cyR9UO68jb1hhF
 d5X2+BEaEJTHJbXEQ9YtThpNWYzHXL5dFswVJfHDs+CW1FWi5GVqCufZGzr7xihy
 uNLlVqXLhjM+hU2dDoS4ZshygxN3b8f2qa+GtlIMBYrLcbcjxd4=
 =X4Cs
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "Just two small updates this time:

   - A series I did to unify the definition of PAGE_SIZE through
     Kconfig, intended to help with a vdso rework that needs the
     constant but cannot include the normal kernel headers when building
     the compat VDSO on arm64 and potentially others

   - a patch from Yan Zhao to remove the pfn_to_virt() definitions from
     a couple of architectures after finding they were both incorrect
     and entirely unused"

* tag 'asm-generic-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  arch: define CONFIG_PAGE_SIZE_*KB on all architectures
  arch: simplify architecture specific page size configuration
  arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
  mm: Remove broken pfn_to_virt() on arch csky/hexagon/openrisc
2024-03-12 10:56:28 -07:00
Linus Torvalds
306bee64b7 SoC: device tree updates for 6.9
There is very little going on with new SoC support this time, all the
 new chips are variations of others that we already support, and they
 are all based on ARMv8 cores:
 
  - Mediatek MT7981B (Filogic 820) and MT7988A (Filogic 880) are
    networking SoCs designed to be used in wireless routers, similar
    to the already supported MT7986A (Filogic 830).
 
  - NXP i.MX8DXP is a variant of i.MX8QXP, with two CPU cores less.
    These are used in many embedded and industrial applications.
 
  - Renesas R8A779G2 (R-Car V4H ES2.0) and R8A779H0 (R-Car V4M)
    are automotive SoCs.
 
  - TI J722S is another automotive variant of its K3 family,
    related to the AM62 series.
 
 There are a total of 7 new arm32 machines and 45 arm64 ones, including
 
  - Two Android phones based on the old Tegra30 chip
 
  - Two machines using Cortex-A53 SoCs from Allwinner, a mini PC and
    a SoM development board
 
  - A set-top box using Amlogic Meson G12A S905X2
 
  - Eight embedded board using NXP i.MX6/8/9
 
  - Three machines using Mediatek network router chips
 
  - Ten Chromebooks, all based on Mediatek MT8186
 
  - One development board based on Mediatek MT8395 (Genio 1200)
 
  - Seven tablets and phones based on Qualcomm SoCs, most of them
    from Samsung.
 
  - A third development board for Qualcomm SM8550 (Snapdragon 8 Gen 2)
 
  - Three variants of the "White Hawk" board for Renesas
    automotive SoCs
 
  - Ten Rockchips RK35xx based machines, including NAS, Tablet,
    Game console and industrial form factors.
 
  - Three evaluation boards for TI K3 based SoCs
 
 The other changes are mainly the usual feature additions for existing hardware,
 cleanups, and dtc compile time fixes. One notable change is the inclusion
 of PowerVR SGX GPU nodes on TI SoCs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmXvLwQACgkQYKtH/8kJ
 Uidkhw/+LjDOIqF8f4+6TBCCS3pFAVSAZxKxlm7L4VhsVOOeGZdspOY57eKZJWqW
 bVqj+B22UjJSw/9LOrFBNApkV8vk+rR7UfJjzijXM34WB80DC8+s7DbenCHagqR8
 fsKCB4tHKTYbBk6EefzyWy7fSA1SFu7hpTg5qWK8XONbGdHnkhbj1aQDbUe7p961
 huKGM+2spO+bFs3ljHGymBWywFKtuMTmVzoq16mBZl/bnuIKobm7W2kF+n3NAo+h
 CMta6J9mBlinBT+VtIg2Xax+KvkjmoitevOmyURxp/33+14A64dafI+RLiSyeqb6
 DfeAp9ptrBbVGzYZq2r07WYX9AIBdD2hvdkrtrjOy6JPqtJpWdfA4slYzWCzZfOz
 O08sV3l7ERggpNkMcTWiwBiuB/y5Hci7SYVeQm8N8bp5PydgNpoo6kNVpnc1e6ri
 Ug8t/jQYvpkCVHT3ld8PmgpWoZRinKIe6PNmqdg5jUu8aH+m4TNNmHyA2IjBcovj
 006FBBGVKp4HlCrGz4t9/XsmKzt+cRxLaX06duoZ93FQknXSzs7j7UDkPhpR07kF
 yEHjETnfhziyONL2fHZ+ejBoK/9psTFtzbpgMreBJ0mFZM0yvL0c+gcMvDgDD8ho
 PCp2ohDYpKPoklrTqMLKM7Yjev5bTOdrAJeWoLDWCbgkzVDkyjw=
 =krkR
 -----END PGP SIGNATURE-----

Merge tag 'soc-dt-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC device tree updates from Arnd Bergmann:
 "There is very little going on with new SoC support this time, all the
  new chips are variations of others that we already support, and they
  are all based on ARMv8 cores:

   - Mediatek MT7981B (Filogic 820) and MT7988A (Filogic 880) are
     networking SoCs designed to be used in wireless routers, similar to
     the already supported MT7986A (Filogic 830).

   - NXP i.MX8DXP is a variant of i.MX8QXP, with two CPU cores less.
     These are used in many embedded and industrial applications.

   - Renesas R8A779G2 (R-Car V4H ES2.0) and R8A779H0 (R-Car V4M) are
     automotive SoCs.

   - TI J722S is another automotive variant of its K3 family, related to
     the AM62 series.

  There are a total of 7 new arm32 machines and 45 arm64 ones, including

   - Two Android phones based on the old Tegra30 chip

   - Two machines using Cortex-A53 SoCs from Allwinner, a mini PC and a
     SoM development board

   - A set-top box using Amlogic Meson G12A S905X2

   - Eight embedded board using NXP i.MX6/8/9

   - Three machines using Mediatek network router chips

   - Ten Chromebooks, all based on Mediatek MT8186

   - One development board based on Mediatek MT8395 (Genio 1200)

   - Seven tablets and phones based on Qualcomm SoCs, most of them from
     Samsung.

   - A third development board for Qualcomm SM8550 (Snapdragon 8 Gen 2)

   - Three variants of the "White Hawk" board for Renesas automotive
     SoCs

   - Ten Rockchips RK35xx based machines, including NAS, Tablet, Game
     console and industrial form factors.

   - Three evaluation boards for TI K3 based SoCs

  The other changes are mainly the usual feature additions for existing
  hardware, cleanups, and dtc compile time fixes. One notable change is
  the inclusion of PowerVR SGX GPU nodes on TI SoCs"

* tag 'soc-dt-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (824 commits)
  riscv: dts: Move BUILTIN_DTB_SOURCE to common Kconfig
  riscv: dts: starfive: jh7100: fix root clock names
  ARM: dts: samsung: exynos4412: decrease memory to account for unusable region
  arm64: dts: qcom: sm8250-xiaomi-elish: set rotation
  arm64: dts: qcom: sm8650: Fix SPMI channels size
  arm64: dts: qcom: sm8550: Fix SPMI channels size
  arm64: dts: rockchip: Fix name for UART pin header on qnap-ts433
  arm: dts: marvell: clearfog-gtr-l8: align port numbers with enclosure
  arm: dts: marvell: clearfog-gtr-l8: add support for second sfp connector
  dt-bindings: soc: renesas: renesas-soc: Add pattern for gray-hawk
  dtc: Enable dtc interrupt_provider check
  arm64: dts: st: add video encoder support to stm32mp255
  arm64: dts: st: add video decoder support to stm32mp255
  ARM: dts: stm32: enable crypto accelerator on stm32mp135f-dk
  ARM: dts: stm32: enable CRC on stm32mp135f-dk
  ARM: dts: stm32: add CRC on stm32mp131
  ARM: dts: add stm32f769-disco-mb1166-reva09
  ARM: dts: stm32: add display support on stm32f769-disco
  ARM: dts: stm32: rename mmc_vcard to vcc-3v3 on stm32f769-disco
  ARM: dts: stm32: add DSI support on stm32f769
  ...
2024-03-12 10:29:57 -07:00
Arnd Bergmann
5394f1e9b6 arch: define CONFIG_PAGE_SIZE_*KB on all architectures
Most architectures only support a single hardcoded page size. In order
to ensure that each one of these sets the corresponding Kconfig symbols,
change over the PAGE_SHIFT definition to the common one and allow
only the hardware page size to be selected.

Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-06 19:29:09 +01:00
Yangyu Chen
2672031b20 riscv: dts: Move BUILTIN_DTB_SOURCE to common Kconfig
The BUILTIN_DTB_SOURCE was only configured for K210 before. Since
SOC_BUILTIN_DTB_DECLARE was removed at commit d5805af9fe ("riscv: Fix
builtin DTB handling") from patch [1], the kernel cannot choose one of the
dtbs from then on and always take the first one dtb to use. Then, another
commit 0ddd7eaffa ("riscv: Fix BUILTIN_DTB for sifive and microchip soc")
from patch [2] supports BUILTIN_DTB_SOURCE for other SoCs. However, this
feature will only work if the Kconfig we use links the dtb we expected in
the first place as mentioned in the thread [3]. Thus, a config
BUILTIN_DTB_SOURCE is needed for all SoCs to choose one dtb to use.

For some considerations, this patch also removes default y if XIP_KERNEL
for BUILTIN_DTB, as this requires setting a proper dtb to use on the
BUILTIN_DTB_SOURCE, else the kernel with XIP but does not set
BUILTIN_DTB_SOURCE or unselect BUILTIN_DTB will not boot.

Also, this patch removes the default dtb string for k210 from Kconfig to
nommu_k210_defconfig and nommu_k210_sdcard_defconfig to avoid complex
Kconfig settings for other SoCs in the future.

[1] https://lore.kernel.org/linux-riscv/20201208073355.40828-5-damien.lemoal@wdc.com/
[2] https://lore.kernel.org/linux-riscv/20210604120639.1447869-1-alex@ghiti.fr/
[3] https://lore.kernel.org/linux-riscv/CAK7LNATt_56mO2Le4v4EnPnAfd3gC8S_Sm5-GCsfa=qXy=8Lrg@mail.gmail.com/

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2024-03-06 00:08:32 +00:00
Eric Biggers
886516fae2
RISC-V: fix check for zvkb with tip-of-tree clang
LLVM commit 8e01042da9d3 ("[RISCV] Add missing dependency check for Zvkb
(#79467)") broke the check used by the TOOLCHAIN_HAS_VECTOR_CRYPTO
kconfig symbol because it made zvkb start depending on v or zve*.  Fix
this by specifying both v and zvkb when checking for support for zvkb.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240127090055.124336-1-ebiggers@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-02-28 07:08:33 -08:00
Palmer Dabbelt
aba3f18aba
Merge commit '3aff0c459e77' into for-next
These fixes are a dependency for the Zvkb patches, so I'm merging them
into for-next as well as fixes.

* commit '3aff0c459e77':
  RISC-V: Drop invalid test from CONFIG_AS_HAS_OPTION_ARCH
  kbuild: Add -Wa,--fatal-warnings to as-instr invocation
2024-02-28 07:07:51 -08:00
Baoquan He
85fcde402d kexec: split crashkernel reservation code out from crash_core.c
Patch series "Split crash out from kexec and clean up related config
items", v3.

Motivation:
=============
Previously, LKP reported a building error. When investigating, it can't
be resolved reasonablly with the present messy kdump config items.

 https://lore.kernel.org/oe-kbuild-all/202312182200.Ka7MzifQ-lkp@intel.com/

The kdump (crash dumping) related config items could causes confusions:

Firstly,

CRASH_CORE enables codes including
 - crashkernel reservation;
 - elfcorehdr updating;
 - vmcoreinfo exporting;
 - crash hotplug handling;

Now fadump of powerpc, kcore dynamic debugging and kdump all selects
CRASH_CORE, while fadump
 - fadump needs crashkernel parsing, vmcoreinfo exporting, and accessing
   global variable 'elfcorehdr_addr';
 - kcore only needs vmcoreinfo exporting;
 - kdump needs all of the current kernel/crash_core.c.

So only enabling PROC_CORE or FA_DUMP will enable CRASH_CORE, this
mislead people that we enable crash dumping, actual it's not.

Secondly,

It's not reasonable to allow KEXEC_CORE select CRASH_CORE.

Because KEXEC_CORE enables codes which allocate control pages, copy
kexec/kdump segments, and prepare for switching. These codes are
shared by both kexec reboot and kdump. We could want kexec reboot,
but disable kdump. In that case, CRASH_CORE should not be selected.

 --------------------
 CONFIG_CRASH_CORE=y
 CONFIG_KEXEC_CORE=y
 CONFIG_KEXEC=y
 CONFIG_KEXEC_FILE=y
 ---------------------

Thirdly,

It's not reasonable to allow CRASH_DUMP select KEXEC_CORE.

That could make KEXEC_CORE, CRASH_DUMP are enabled independently from
KEXEC or KEXEC_FILE. However, w/o KEXEC or KEXEC_FILE, the KEXEC_CORE
code built in doesn't make any sense because no kernel loading or
switching will happen to utilize the KEXEC_CORE code.
 ---------------------
 CONFIG_CRASH_CORE=y
 CONFIG_KEXEC_CORE=y
 CONFIG_CRASH_DUMP=y
 ---------------------

In this case, what is worse, on arch sh and arm, KEXEC relies on MMU,
while CRASH_DUMP can still be enabled when !MMU, then compiling error is
seen as the lkp test robot reported in above link.

 ------arch/sh/Kconfig------
 config ARCH_SUPPORTS_KEXEC
         def_bool MMU

 config ARCH_SUPPORTS_CRASH_DUMP
         def_bool BROKEN_ON_SMP
 ---------------------------

Changes:
===========
1, split out crash_reserve.c from crash_core.c;
2, split out vmcore_infoc. from crash_core.c;
3, move crash related codes in kexec_core.c into crash_core.c;
4, remove dependency of FA_DUMP on CRASH_DUMP;
5, clean up kdump related config items;
6, wrap up crash codes in crash related ifdefs on all 8 arch-es
   which support crash dumping, except of ppc;

Achievement:
===========
With above changes, I can rearrange the config item logic as below (the right
item depends on or is selected by the left item):

    PROC_KCORE -----------> VMCORE_INFO

               |----------> VMCORE_INFO
    FA_DUMP----|
               |----------> CRASH_RESERVE

                                                    ---->VMCORE_INFO
                                                   /
                                                   |---->CRASH_RESERVE
    KEXEC      --|                                /|
                 |--> KEXEC_CORE--> CRASH_DUMP-->/-|---->PROC_VMCORE
    KEXEC_FILE --|                               \ |
                                                   \---->CRASH_HOTPLUG


    KEXEC      --|
                 |--> KEXEC_CORE (for kexec reboot only)
    KEXEC_FILE --|

Test
========
On all 8 architectures, including x86_64, arm64, s390x, sh, arm, mips,
riscv, loongarch, I did below three cases of config item setting and
building all passed. Take configs on x86_64 as exampmle here:

(1) Both CONFIG_KEXEC and KEXEC_FILE is unset, then all kexec/kdump
items are unset automatically:
# Kexec and crash features
# CONFIG_KEXEC is not set
# CONFIG_KEXEC_FILE is not set
# end of Kexec and crash features

(2) set CONFIG_KEXEC_FILE and 'make olddefconfig':
---------------
# Kexec and crash features
CONFIG_CRASH_RESERVE=y
CONFIG_VMCORE_INFO=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
CONFIG_CRASH_DUMP=y
CONFIG_CRASH_HOTPLUG=y
CONFIG_CRASH_MAX_MEMORY_RANGES=8192
# end of Kexec and crash features
---------------

(3) unset CONFIG_CRASH_DUMP in case 2 and execute 'make olddefconfig':
------------------------
# Kexec and crash features
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
# end of Kexec and crash features
------------------------

Note:
For ppc, it needs investigation to make clear how to split out crash
code in arch folder. Hope Hari and Pingfan can help have a look, see if
it's doable. Now, I make it either have both kexec and crash enabled, or
disable both of them altogether.


This patch (of 14):

Both kdump and fa_dump of ppc rely on crashkernel reservation.  Move the
relevant codes into separate files: crash_reserve.c,
include/linux/crash_reserve.h.

And also add config item CRASH_RESERVE to control its enabling of the
codes.  And update config items which has relationship with crashkernel
reservation.

And also change ifdeffery from CONFIG_CRASH_CORE to CONFIG_CRASH_RESERVE
when those scopes are only crashkernel reservation related.

And also rename arch/XXX/include/asm/{crash_core.h => crash_reserve.h} on
arm64, x86 and risc-v because those architectures' crash_core.h is only
related to crashkernel reservation.

[akpm@linux-foundation.org: s/CRASH_RESEERVE/CRASH_RESERVE/, per Klara Modin]
Link: https://lkml.kernel.org/r/20240124051254.67105-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20240124051254.67105-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23 17:48:21 -08:00
Nathan Chancellor
a38d971812 riscv: Kconfig: remove version dependency from CONFIG_CLANG_SUPPORTS_DYNAMIC_FTRACE
Now that the minimum supported version of LLVM for building the kernel has
been bumped to 13.0.1, this condition is always true, as the build will
fail during the configuration stage for older LLVM versions.  Remove it.

Link: https://lkml.kernel.org/r/20240125-bump-min-llvm-ver-to-13-0-1-v1-8-f5ff9bda41c5@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 15:38:54 -08:00
Nathan Chancellor
fafdea3419 arch and include: update LLVM Phabricator links
reviews.llvm.org was LLVM's Phabricator instances for code review.  It has
been abandoned in favor of GitHub pull requests.  While the majority of
links in the kernel sources still work because of the work Fangrui has
done turning the dynamic Phabricator instance into a static archive, there
are some issues with that work, so preemptively convert all the links in
the kernel sources to point to the commit on GitHub.

Most of the commits have the corresponding differential review link in the
commit message itself so there should not be any loss of fidelity in the
relevant information.

Link: https://discourse.llvm.org/t/update-on-github-pull-requests/71540/172
Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-2-eb09b59db071@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Fangrui Song <maskray@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 15:38:51 -08:00
Nathan Chancellor
3aff0c459e
RISC-V: Drop invalid test from CONFIG_AS_HAS_OPTION_ARCH
Commit e4bb020f3d ("riscv: detect assembler support for .option arch")
added two tests, one for a valid value to '.option arch' that should
succeed and one for an invalid value that is expected to fail to make
sure that support for '.option arch' is properly detected because Clang
does not error when '.option arch' is not supported:

  $ clang --target=riscv64-linux-gnu -Werror -x assembler -c -o /dev/null <(echo '.option arch, +m')
  /dev/fd/63:1:9: warning: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax'
  .option arch, +m
          ^
  $ echo $?
  0

Unfortunately, the invalid test started being accepted by Clang after
the linked llvm-project change, which causes CONFIG_AS_HAS_OPTION_ARCH
and configurations that depend on it to be silently disabled, even
though those versions do support '.option arch'.

The invalid test can be avoided altogether by using
'-Wa,--fatal-warnings', which will turn all assembler warnings into
errors, like '-Werror' does for the compiler:

  $ clang --target=riscv64-linux-gnu -Werror -Wa,--fatal-warnings -x assembler -c -o /dev/null <(echo '.option arch, +m')
  /dev/fd/63:1:9: error: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax'
  .option arch, +m
          ^
  $ echo $?
  1

The as-instr macros have been updated to make use of this flag, so
remove the invalid test, which allows CONFIG_AS_HAS_OPTION_ARCH to work
for all compiler versions.

Cc: stable@vger.kernel.org
Fixes: e4bb020f3d ("riscv: detect assembler support for .option arch")
Link: 3ac9fe69f7
Reported-by: Eric Biggers <ebiggers@kernel.org>
Closes: https://lore.kernel.org/r/20240121011341.GA97368@sol.localdomain/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240125-fix-riscv-option-arch-llvm-18-v1-2-390ac9cc3cd0@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-02-16 16:07:08 -08:00
Palmer Dabbelt
0420af54c2
Merge patch series "membarrier: riscv: Core serializing command"
RISC-V was lacking a membarrier implementation for the store/fetch
ordering, which is a bit tricky because of the deferred icache flushing
we use in RISC-V.

* b4-shazam-merge:
  membarrier: riscv: Provide core serializing command
  locking: Introduce prepare_sync_core_cmd()
  membarrier: Create Documentation/scheduler/membarrier.rst
  membarrier: riscv: Add full memory barrier in switch_mm()

Link: https://lore.kernel.org/r/20240131144936.29190-1-parri.andrea@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-02-15 08:04:23 -08:00
Andrea Parri
cd9b29014d
membarrier: riscv: Provide core serializing command
RISC-V uses xRET instructions on return from interrupt and to go back
to user-space; the xRET instruction is not core serializing.

Use FENCE.I for providing core serialization as follows:

 - by calling sync_core_before_usermode() on return from interrupt (cf.
   ipi_sync_core()),

 - via switch_mm() and sync_core_before_usermode() (respectively, for
   uthread->uthread and kthread->uthread transitions) before returning
   to user-space.

On RISC-V, the serialization in switch_mm() is activated by resetting
the icache_stale_mask of the mm at prepare_sync_core_cmd().

Suggested-by: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/r/20240131144936.29190-5-parri.andrea@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-02-15 08:04:14 -08:00
Andrea Parri
d6cfd1770f
membarrier: riscv: Add full memory barrier in switch_mm()
The membarrier system call requires a full memory barrier after storing
to rq->curr, before going back to user-space.  The barrier is only
needed when switching between processes: the barrier is implied by
mmdrop() when switching from kernel to userspace, and it's not needed
when switching from userspace to kernel.

Rely on the feature/mechanism ARCH_HAS_MEMBARRIER_CALLBACKS and on the
primitive membarrier_arch_switch_mm(), already adopted by the PowerPC
architecture, to insert the required barrier.

Fixes: fab957c11e ("RISC-V: Atomic and Locking Code")
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/r/20240131144936.29190-2-parri.andrea@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-02-15 08:04:11 -08:00
Kees Cook
918327e9b7 ubsan: Remove CONFIG_UBSAN_SANITIZE_ALL
For simplicity in splitting out UBSan options into separate rules,
remove CONFIG_UBSAN_SANITIZE_ALL, effectively defaulting to "y", which
is how it is generally used anyway. (There are no ":= y" cases beyond
where a specific file is enabled when a top-level ":= n" is in effect.)

Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-06 02:21:38 -08:00
Song Shuai
05d450aabd
riscv: Support RANDOMIZE_KSTACK_OFFSET
Inspired from arm64's implement -- commit 70918779ae
("arm64: entry: Enable random_kstack_offset support")

Add support of kernel stack offset randomization while handling syscall,
the offset is defaultly limited by KSTACK_OFFSET_MAX() (i.e. 10 bits).

In order to avoid trigger stack canaries (due to __builtin_alloca) and
slowing down the entry path, use __no_stack_protector attribute to
disable stack protector for do_trap_ecall_u() at the function level.

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Link: https://lore.kernel.org/r/20231109133751.212079-1-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-24 17:24:24 -08:00
Palmer Dabbelt
7f43d57b90
Merge patch series "riscv: support fast gup"
Jisheng Zhang <jszhang@kernel.org> says:

This series adds fast gup support to riscv.

The First patch fixes a bug in __p*d_free_tlb(). Per the riscv
privileged spec, if non-leaf PTEs I.E pmd, pud or p4d is modified, a
sfence.vma is a must.

The 2nd patch is a preparation patch.

The last two patches do the real work:
In order to implement fast gup we need to ensure that the page
table walker is protected from page table pages being freed from
under it.

riscv situation is more complicated than other architectures: some
riscv platforms may use IPI to perform TLB shootdown, for example,
those platforms which support AIA, usually the riscv_ipi_for_rfence is
true on these platforms; some riscv platforms may rely on the SBI to
perform TLB shootdown, usually the riscv_ipi_for_rfence is false on
these platforms. To keep software pagetable walkers safe in this case
we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the
comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in
include/asm-generic/tlb.h for more details.

This patch enables MMU_GATHER_RCU_TABLE_FREE, then use

*tlb_remove_page_ptdesc() for those platforms which use IPI to perform
TLB shootdown;

*tlb_remove_ptdesc() for those platforms which use SBI to perform TLB
shootdown;

Both case mean that disabling interrupts will block the free and
protect the fast gup page walker.

So after the 3rd patch, everything is well prepared, let's select
HAVE_FAST_GUP if MMU.

* b4-shazam-merge:
  riscv: enable HAVE_FAST_GUP if MMU
  riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU
  riscv: tlb: convert __p*d_free_tlb() to inline functions
  riscv: tlb: fix __p*d_free_tlb()

Link: https://lore.kernel.org/r/20231219175046.2496-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-24 15:57:00 -08:00
Jisheng Zhang
3f910b7a52
riscv: enable HAVE_FAST_GUP if MMU
Activate the fast gup for riscv mmu platforms. Here are some
GUP_FAST_BENCHMARK performance numbers:

Before the patch:
GUP_FAST_BENCHMARK: Time: get:53203 put:5085 us

After the patch:
GUP_FAST_BENCHMARK: Time: get:17711 put:5060 us

The get time is reduced by 66.7%! IOW, 3x get speed!

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231219175046.2496-5-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-24 15:55:56 -08:00
Jisheng Zhang
69be3fb111
riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU
In order to implement fast gup we need to ensure that the page
table walker is protected from page table pages being freed from
under it.

riscv situation is more complicated than other architectures: some
riscv platforms may use IPI to perform TLB shootdown, for example,
those platforms which support AIA, usually the riscv_ipi_for_rfence is
true on these platforms; some riscv platforms may rely on the SBI to
perform TLB shootdown, usually the riscv_ipi_for_rfence is false on
these platforms. To keep software pagetable walkers safe in this case
we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the
comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in
include/asm-generic/tlb.h for more details.

This patch enables MMU_GATHER_RCU_TABLE_FREE, then use

*tlb_remove_page_ptdesc() for those platforms which use IPI to perform
TLB shootdown;

*tlb_remove_ptdesc() for those platforms which use SBI to perform TLB
shootdown;

Both case mean that disabling interrupts will block the free and
protect the fast gup page walker.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231219175046.2496-4-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-24 15:55:55 -08:00
Conor Dooley
e2d6b54b93
Revert "RISC-V: mark hibernation as nonportable"
Revert commit ed309ce522 ("RISC-V: mark hibernation as nonportable")
as it appears the broken versions of OpenSBI have not made it to
production on any systems that support hibernation.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230802-chef-throng-d9de8b672a49@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-23 08:28:11 -08:00
Palmer Dabbelt
67daf84203
Merge patch series "RISC-V crypto with reworked asm files"
Eric Biggers <ebiggers@kernel.org> says:

This patchset, which applies to v6.8-rc1, adds cryptographic algorithm
implementations accelerated using the RISC-V vector crypto extensions
(https://github.com/riscv/riscv-crypto/releases/download/v1.0.0/riscv-crypto-spec-vector.pdf)
and RISC-V vector extension
(https://github.com/riscv/riscv-v-spec/releases/download/v1.0/riscv-v-spec-1.0.pdf).
The following algorithms are included: AES in ECB, CBC, CTR, and XTS modes;
ChaCha20; GHASH; SHA-2; SM3; and SM4.

In general, the assembly code requires a 64-bit RISC-V CPU with VLEN >= 128,
little endian byte order, and vector unaligned access support.  The ECB, CTR,
XTS, and ChaCha20 code is designed to naturally scale up to larger VLEN values.
Building the assembly code requires tip-of-tree binutils (future 2.42) or
tip-of-tree clang (future 18.x).  All algorithms pass testing in QEMU, using
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y.  Much of the assembly code is derived from
OpenSSL code that was added by https://github.com/openssl/openssl/pull/21923.
It's been cleaned up for integration with the kernel, e.g. reducing code
duplication, eliminating use of .inst and perlasm, and fixing a few bugs.

This patchset incorporates the work of multiple people, including Jerry Shih,
Heiko Stuebner, Christoph Müllner, Phoebe Chen, Charalampos Mitrodimas, and
myself.  This patchset went through several versions from Heiko (last version
https://lore.kernel.org/linux-crypto/20230711153743.1970625-1-heiko@sntech.de),
then several versions from Jerry (last version:
https://lore.kernel.org/linux-crypto/20231231152743.6304-1-jerry.shih@sifive.com),
then finally several versions from me.  Thanks to everyone who has contributed
to this patchset or its prerequisites.

* b4-shazam-merge:
  crypto: riscv - add vector crypto accelerated SM4
  crypto: riscv - add vector crypto accelerated SM3
  crypto: riscv - add vector crypto accelerated SHA-{512,384}
  crypto: riscv - add vector crypto accelerated SHA-{256,224}
  crypto: riscv - add vector crypto accelerated GHASH
  crypto: riscv - add vector crypto accelerated ChaCha20
  crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}
  RISC-V: hook new crypto subdir into build-system
  RISC-V: add TOOLCHAIN_HAS_VECTOR_CRYPTO
  RISC-V: add helper function to read the vector VLEN

Link: https://lore.kernel.org/r/20240122002024.27477-1-ebiggers@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-22 17:55:40 -08:00
Eric Biggers
34ca4ec628
RISC-V: add TOOLCHAIN_HAS_VECTOR_CRYPTO
Add a kconfig symbol that indicates whether the toolchain supports the
vector crypto extensions.  This is needed by the RISC-V crypto code.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20240122002024.27477-3-ebiggers@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-22 17:55:16 -08:00
Wende Tan
021d23428b
RISC-V: build: Allow LTO to be selected
Allow LTO to be selected for RISC-V, only when LLD >= 14, since there is
an issue [1] in prior LLD versions that prevents LLD to generate proper
machine code for RISC-V when writing `nop`s.

To avoid boot failures in QEMU [2], '-mattr=+c' and '-mattr=+relax'
need to be passed via '-mllvm' to ld.lld, as there appears to be an
issue with LLVM's target-features and LTO [3], which can result in
incorrect relocations to branch targets [4]. Once this is fixed in LLVM,
it can be made conditional on affected ld.lld versions.

Disable LTO for arch/riscv/kernel/pi, as llvm-objcopy expects an ELF
object file when manipulating the files in that subfolder, rather than
LLVM bitcode.

[1] https://github.com/llvm/llvm-project/issues/50505, resolved by LLVM
    commit e63455d5e0e5 ("[MC] Use local MCSubtargetInfo in writeNops")
[2] https://github.com/ClangBuiltLinux/linux/issues/1942
[3] https://github.com/llvm/llvm-project/issues/59350
[4] https://github.com/llvm/llvm-project/issues/65090

Tested-by: Wende Tan <twd2.me@gmail.com>
Signed-off-by: Wende Tan <twd2.me@gmail.com>
Co-developed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231017-riscv-lto-v4-1-e7810b24e805@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-22 10:06:29 -08:00
Linus Torvalds
e5075d8ec5 RISC-V Patches for the 6.8 Merge Window, Part 4
This includes everything from part 2:
 
 * Support for tuning for systems with fast misaligned accesses.
 * Support for SBI-based suspend.
 * Support for the new SBI debug console extension.
 * The T-Head CMOs now use PA-based flushes.
 * Support for enabling the V extension in kernel code.
 * Optimized IP checksum routines.
 * Various ftrace improvements.
 * Support for archrandom, which depends on the Zkr extension.
 
 and then also a fix for those:
 
 * The build is no longer broken under NET=n, KUNIT=y for ports that
   don't define their own ipv6 checksum.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmWsCOMTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYieQND/0f+1gizTM0OzuqZG9+DOdWTtqmILyr
 sZaYXWBw6SPzbUSlwjoW4Qp/S3Ur7IhrbfttM2aMoS4GHZvSESAXOMXC4c7AnCaQ
 HOXBC2OuXvq6jA0ZjK5XPviR70A/7uD2iu5SNO1hyfJK08LSEu+AulxtkW50+wMc
 bHXSpZxEf8AtwOJK1cRtwhH4qy+Qcs3Nla3jG7OnDsPbhJVcydHx95eCtfwn2cQA
 KwJPN1fjRtm4ALZb91QcMDO8VAoanfPEkSR3DoNVE/UfdTItYk35VHmf4RWh7IWA
 qDnV5Mp/XMX2RmJqwi1ZmSHHX0rfVLL5UqgBhGHC8PuMpLJn5p9U6DZ0qD7YWxcB
 NDlrHsaXt112RHEEM/7CcLkqEexua/ezcC45E5tSQ4sRDZE3fvgbALao67xSQ22D
 lCpVAY0Z3o5oWaM/jISiQHjSNn5RrAwEYSvvv2pkW4QAMShA2eggmQaCF+Jl4EMp
 u6yqJpXxDI99C088uvM6Bi2gcX8fnBSmOzCB/sSU4a1I72UpWrGngqUpTYKHG8Jz
 cTZhbIKmQirBP0vC/UgMOS0sNuw/NykItfRXZ2g0qGKvw1TjJ6djdeZBKcAj3h0E
 fJpMxuhmeOFYE7DavnhSt3CResFTXZzXLChjxGbT+g10YzVEf9g7vBVnjxAwad9f
 tryMVpL/ipGpQQ==
 =Sjhj
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.8-mw4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - Support for tuning for systems with fast misaligned accesses.

 - Support for SBI-based suspend.

 - Support for the new SBI debug console extension.

 - The T-Head CMOs now use PA-based flushes.

 - Support for enabling the V extension in kernel code.

 - Optimized IP checksum routines.

 - Various ftrace improvements.

 - Support for archrandom, which depends on the Zkr extension.

 - The build is no longer broken under NET=n, KUNIT=y for ports that
   don't define their own ipv6 checksum.

* tag 'riscv-for-linus-6.8-mw4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (56 commits)
  lib: checksum: Fix build with CONFIG_NET=n
  riscv: lib: Check if output in asm goto supported
  riscv: Fix build error on rv32 + XIP
  riscv: optimize ELF relocation function in riscv
  RISC-V: Implement archrandom when Zkr is available
  riscv: Optimize hweight API with Zbb extension
  riscv: add dependency among Image(.gz), loader(.bin), and vmlinuz.efi
  samples: ftrace: Add RISC-V support for SAMPLE_FTRACE_DIRECT[_MULTI]
  riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
  riscv: ftrace: Make function graph use ftrace directly
  riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
  lib/Kconfig.debug: Update AS_HAS_NON_CONST_LEB128 comment and name
  riscv: Restrict DWARF5 when building with LLVM to known working versions
  riscv: Hoist linker relaxation disabling logic into Kconfig
  kunit: Add tests for csum_ipv6_magic and ip_fast_csum
  riscv: Add checksum library
  riscv: Add checksum header
  riscv: Add static key for misaligned accesses
  asm-generic: Improve csum_fold
  RISC-V: selftests: cbo: Ensure asm operands match constraints
  ...
2024-01-20 11:06:04 -08:00
Linus Torvalds
e7ded27593 percpu:
- Enable percpu page allocator for risc-v. There are risc-v
   configurations with sparse NUMA configurations and small vmalloc
   space causing dynamic percpu allocations to fail as the backing chunk
   stride is too far apart.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE3hZPHJdcVwe+yTTtiDc0yuoFPR0FAmWhz5YACgkQiDc0yuoF
 PR33fQ//TImRNBOvLn0zSL6eKE49pBOg1lhff82GroMkjIw/jHLOp0WfdtCRKBZm
 8234WiQRPk3TNKkvmrikMwmDG249Bc/U+RwaHTkfDao6Fm1Pb4SESaggNXw/VKDe
 zWFvI/zoVQGC3+xuUYo6KDtFE9shnphsT7surRt21wdDeZOojH89FtrrEHnnQpIx
 Zl5miPx0H1V+Hlzk7PZkPYmEwcZHp7Sjcx1/t7QzvtzzkiDKmOLROO2gxRMXaCJz
 zeM5UAQi1294EftLpHTgrtn9NEbwt8xOQnaNtZozYSznmcy6CztyiNH43XCOapFC
 10iVxn4NlioXGzaT/Vo2As3PGjJueg2kl+TJur7lAdENgWyqT0qksgtu+9Q2SSYg
 hzWMk8KKqpLHvjnDpKu0spl7EI7u4J8MdIfHLlw/a2vWUU1bBQeRzIZHGe56/yFu
 asHsTlqWzLPZy8ZvqjhX63HQnWnglHhmY63BcHr5kCeUN8F6cNAS0WWtSrvj5bXM
 OHuq+OaSUms9Ktl/igaaXDLUW+0t04vtH4qh1l2ncEdElYWBzT3d9WBkW8RfQzcu
 aXmu0ItxTHGTgmjafibGoQCkMzJ0NG0b7IW4NMNz5nWgpf5ghBXnSjz17Z4FkMgo
 PY/+uF3Gr7w+OYxsIDSzvMef/J14qgJ9oPMVUJWOJIwVUO7+nMQ=
 =fwxu
 -----END PGP SIGNATURE-----

Merge tag 'percpu-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu

Pull percpu updates from Dennis Zhou:
 "Enable percpu page allocator for RISC-V.

  There are RISC-V configurations with sparse NUMA configurations and
  small vmalloc space causing dynamic percpu allocations to fail as the
  backing chunk stride is too far apart"

* tag 'percpu-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
  riscv: Enable pcpu page first chunk allocator
  mm: Introduce flush_cache_vmap_early()
2024-01-18 15:01:28 -08:00
Linus Torvalds
80955ae955 Driver core changes for 6.8-rc1
Here are the set of driver core and kernfs changes for 6.8-rc1.  Nothing
 major in here this release cycle, just lots of small cleanups and some
 tweaks on kernfs that in the very end, got reverted and will come back
 in a safer way next release cycle.
 
 Included in here are:
   - more driver core 'const' cleanups and fixes
   - fw_devlink=rpm is now the default behavior
   - kernfs tiny changes to remove some string functions
   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting
   - other minor changes and cleanups
 
 All of the cpu handling patches have been acked by the respective
 maintainers and are coming in here in one series.  Everything has been
 in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZaeOrg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymtcwCffzvKKkSY9qAp6+0v2WQNkZm1JWoAoJCPYUwF
 If6wEoPLWvRfKx4gIoq9
 =D96r
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here are the set of driver core and kernfs changes for 6.8-rc1.
  Nothing major in here this release cycle, just lots of small cleanups
  and some tweaks on kernfs that in the very end, got reverted and will
  come back in a safer way next release cycle.

  Included in here are:

   - more driver core 'const' cleanups and fixes

   - fw_devlink=rpm is now the default behavior

   - kernfs tiny changes to remove some string functions

   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting

   - other minor changes and cleanups

  All of the cpu handling patches have been acked by the respective
  maintainers and are coming in here in one series. Everything has been
  in linux-next for a while with no reported issues"

* tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits)
  Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"
  kernfs: convert kernfs_idr_lock to an irq safe raw spinlock
  class: fix use-after-free in class_register()
  PM: clk: make pm_clk_add_notifier() take a const pointer
  EDAC: constantify the struct bus_type usage
  kernfs: fix reference to renamed function
  driver core: device.h: fix Excess kernel-doc description warning
  driver core: class: fix Excess kernel-doc description warning
  driver core: mark remaining local bus_type variables as const
  driver core: container: make container_subsys const
  driver core: bus: constantify subsys_register() calls
  driver core: bus: make bus_sort_breadthfirst() take a const pointer
  kernfs: d_obtain_alias(NULL) will do the right thing...
  driver core: Better advertise dev_err_probe()
  kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy()
  initramfs: Expose retained initrd as sysfs file
  fs/kernfs/dir: obey S_ISGID
  kernel/cgroup: use kernfs_create_dir_ns()
  ...
2024-01-18 09:48:40 -08:00
Palmer Dabbelt
3074e8b175
Merge patch series "riscv: ftrace: Miscellaneous ftrace improvements"
Björn Töpel <bjorn@kernel.org> says:

This series includes a three ftrace improvements for RISC-V:

1. Do not require to run recordmcount at build time (patch 1)
2. Simplification of the function graph functionality (patch 2)
3. Enable DYNAMIC_FTRACE_WITH_DIRECT_CALLS (patch 3 and 4)

The series has been tested on Qemu/rv64 virt/Debian sid with the
following test configs:
  CONFIG_FTRACE_SELFTEST=y
  CONFIG_FTRACE_STARTUP_TEST=y
  CONFIG_SAMPLE_FTRACE_DIRECT=m
  CONFIG_SAMPLE_FTRACE_DIRECT_MULTI=m
  CONFIG_SAMPLE_FTRACE_OPS=m

All tests pass.

* b4-shazam-merge:
  samples: ftrace: Add RISC-V support for SAMPLE_FTRACE_DIRECT[_MULTI]
  riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
  riscv: ftrace: Make function graph use ftrace directly
  riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY

Link: https://lore.kernel.org/r/20231130121531.1178502-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-17 18:17:29 -08:00
Song Shuai
629291dd84
samples: ftrace: Add RISC-V support for SAMPLE_FTRACE_DIRECT[_MULTI]
Add RISC-V variants of the ftrace-direct* samples.

Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20231130121531.1178502-5-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-17 18:17:10 -08:00
Song Shuai
196c79f19a
riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
Select the DYNAMIC_FTRACE_WITH_DIRECT_CALLS to provide the
register_ftrace_direct[_multi] interfaces allowing users to register
the customed trampoline (direct_caller) as the mcount for one or more
target functions. And modify_ftrace_direct[_multi] are also provided
for modifying direct_caller.

To make the direct_caller and the other ftrace hooks (e.g.
function/fgraph tracer, k[ret]probes) co-exist, a temporary register
is nominated to store the address of direct_caller in
ftrace_regs_caller. After the setting of the address direct_caller by
direct_ops->func and the RESTORE_REGS in ftrace_regs_caller,
direct_caller will be jumped to by the `jr` inst.

Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support for RISC-V.

Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20231130121531.1178502-4-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-17 18:17:09 -08:00
Song Shuai
b546d6363a
riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
In commit afc76b8b80 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead
of MCOUNT") RISC-V added support for -fpatchable-function-entry, which
removes the need for recordmcount.

Select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY to tell the build
system not to run recordmcount.

Link: https://lore.kernel.org/linux-riscv/CAAYs2=j3Eak9vU6xbAw0zPuoh00rh8v5C2U3fePkokZFibWs2g@mail.gmail.com/T/#t
Link: https://lore.kernel.org/linux-riscv/Y4jtfrJt+%2FQ5nMOz@spud/
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20231130121531.1178502-2-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-17 18:17:07 -08:00
Palmer Dabbelt
448857ec53
Merge patch series "RISC-V: Disable DWARF5 with known broken LLVM versions"
Nathan Chancellor <nathan@kernel.org> says:

This series disables DWARF5 for LLVM versions where it is known to be
broken due to linker relaxation.

* b4-shazam-merge:
  lib/Kconfig.debug: Update AS_HAS_NON_CONST_LEB128 comment and name
  riscv: Restrict DWARF5 when building with LLVM to known working versions
  riscv: Hoist linker relaxation disabling logic into Kconfig

Link: bbc0f99f3b
Link: https://lore.kernel.org/r/20231205-riscv-restrict-dwarf5-llvm-v2-0-aedf00a382ac@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-17 18:08:30 -08:00
Nathan Chancellor
ae84ff9a14
riscv: Restrict DWARF5 when building with LLVM to known working versions
LLVM prior to 18.0.0 would generate incorrect debug info for DWARF5 due
to linker relaxation, which was worked around in clang by defaulting
RISC-V to DWARF4 [1]. Unfortunately, this workaround does not work for
the kernel because the DWARF version can be independently changed from
the default in Kconfig.

Do not allow DWARF5 to be selected for RISC-V when using linker
relaxation (ld.lld >= 15.0.0) and a version of LLVM that does not have
the fixes (the integrated assembler [2] and ld.lld [3] < 18.0.0)
necessary to generate the correct debug info.

Link: bbc0f99f3b [1]
Link: 1df5ea29b4 [2]
Link: 7ffabb61a5 [3]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20231205-riscv-restrict-dwarf5-llvm-v2-2-aedf00a382ac@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-17 18:08:27 -08:00
Nathan Chancellor
55b71d2ce1
riscv: Hoist linker relaxation disabling logic into Kconfig
Certain configurations may need to be disabled if linker relaxation is
in use, such as DWARF5 with ld.lld < 18. Hoist the logic of whether or
not linker relaxation is in use into Kconfig so decisions can be made at
configuration time.

Reviewed-by: Fangrui Song <maskray@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20231205-riscv-restrict-dwarf5-llvm-v2-1-aedf00a382ac@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-17 18:08:26 -08:00
Linus Torvalds
09d1c6a80f Generic:
- Use memdup_array_user() to harden against overflow.
 
 - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
 
 - Clean up Kconfigs that all KVM architectures were selecting
 
 - New functionality around "guest_memfd", a new userspace API that
   creates an anonymous file and returns a file descriptor that refers
   to it.  guest_memfd files are bound to their owning virtual machine,
   cannot be mapped, read, or written by userspace, and cannot be resized.
   guest_memfd files do however support PUNCH_HOLE, which can be used to
   switch a memory area between guest_memfd and regular anonymous memory.
 
 - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
   per-page attributes for a given page of guest memory; right now the
   only attribute is whether the guest expects to access memory via
   guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
   TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
   confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
 
 x86:
 
 - Support for "software-protected VMs" that can use the new guest_memfd
   and page attributes infrastructure.  This is mostly useful for testing,
   since there is no pKVM-like infrastructure to provide a meaningfully
   reduced TCB.
 
 - Fix a relatively benign off-by-one error when splitting huge pages during
   CLEAR_DIRTY_LOG.
 
 - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
   TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
 
 - Use more generic lockdep assertions in paths that don't actually care
   about whether the caller is a reader or a writer.
 
 - let Xen guests opt out of having PV clock reported as "based on a stable TSC",
   because some of them don't expect the "TSC stable" bit (added to the pvclock
   ABI by KVM, but never set by Xen) to be set.
 
 - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
 
 - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
   flushes on nested transitions, i.e. always satisfies flush requests.  This
   allows running bleeding edge versions of VMware Workstation on top of KVM.
 
 - Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
 
 - On AMD machines with vNMI, always rely on hardware instead of intercepting
   IRET in some cases to detect unmasking of NMIs
 
 - Support for virtualizing Linear Address Masking (LAM)
 
 - Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state
   prior to refreshing the vPMU model.
 
 - Fix a double-overflow PMU bug by tracking emulated counter events using a
   dedicated field instead of snapshotting the "previous" counter.  If the
   hardware PMC count triggers overflow that is recognized in the same VM-Exit
   that KVM manually bumps an event count, KVM would pend PMIs for both the
   hardware-triggered overflow and for KVM-triggered overflow.
 
 - Turn off KVM_WERROR by default for all configs so that it's not
   inadvertantly enabled by non-KVM developers, which can be problematic for
   subsystems that require no regressions for W=1 builds.
 
 - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
   "features".
 
 - Don't force a masterclock update when a vCPU synchronizes to the current TSC
   generation, as updating the masterclock can cause kvmclock's time to "jump"
   unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
 
 - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
   partly as a super minor optimization, but mostly to make KVM play nice with
   position independent executable builds.
 
 - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
   CONFIG_HYPERV as a minor optimization, and to self-document the code.
 
 - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
   at build time.
 
 ARM64:
 
 - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
   base granule sizes. Branch shared with the arm64 tree.
 
 - Large Fine-Grained Trap rework, bringing some sanity to the
   feature, although there is more to come. This comes with
   a prefix branch shared with the arm64 tree.
 
 - Some additional Nested Virtualization groundwork, mostly
   introducing the NV2 VNCR support and retargetting the NV
   support to that version of the architecture.
 
 - A small set of vgic fixes and associated cleanups.
 
 Loongarch:
 
 - Optimization for memslot hugepage checking
 
 - Cleanup and fix some HW/SW timer issues
 
 - Add LSX/LASX (128bit/256bit SIMD) support
 
 RISC-V:
 
 - KVM_GET_REG_LIST improvement for vector registers
 
 - Generate ISA extension reg_list using macros in get-reg-list selftest
 
 - Support for reporting steal time along with selftest
 
 s390:
 
 - Bugfixes
 
 Selftests:
 
 - Fix an annoying goof where the NX hugepage test prints out garbage
   instead of the magic token needed to run the test.
 
 - Fix build errors when a header is delete/moved due to a missing flag
   in the Makefile.
 
 - Detect if KVM bugged/killed a selftest's VM and print out a helpful
   message instead of complaining that a random ioctl() failed.
 
 - Annotate the guest printf/assert helpers with __printf(), and fix the
   various bugs that were lurking due to lack of said annotation.
 
 There are two non-KVM patches buried in the middle of guest_memfd support:
 
   fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
   mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
 
 The first is small and mostly suggested-by Christian Brauner; the second
 a bit less so but it was written by an mm person (Vlastimil Babka).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmWcMWkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO15gf/WLmmg3SET6Uzw9iEq2xo28831ZA+
 6kpILfIDGKozV5safDmMvcInlc/PTnqOFrsKyyN4kDZ+rIJiafJdg/loE0kPXBML
 wdR+2ix5kYI1FucCDaGTahskBDz8Lb/xTpwGg9BFLYFNmuUeHc74o6GoNvr1uliE
 4kLZL2K6w0cSMPybUD+HqGaET80ZqPwecv+s1JL+Ia0kYZJONJifoHnvOUJ7DpEi
 rgudVdgzt3EPjG0y1z6MjvDBXTCOLDjXajErlYuZD3Ej8N8s59Dh2TxOiDNTLdP4
 a4zjRvDmgyr6H6sz+upvwc7f4M4p+DBvf+TkWF54mbeObHUYliStqURIoA==
 =66Ws
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Generic:

   - Use memdup_array_user() to harden against overflow.

   - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
     architectures.

   - Clean up Kconfigs that all KVM architectures were selecting

   - New functionality around "guest_memfd", a new userspace API that
     creates an anonymous file and returns a file descriptor that refers
     to it. guest_memfd files are bound to their owning virtual machine,
     cannot be mapped, read, or written by userspace, and cannot be
     resized. guest_memfd files do however support PUNCH_HOLE, which can
     be used to switch a memory area between guest_memfd and regular
     anonymous memory.

   - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
     per-page attributes for a given page of guest memory; right now the
     only attribute is whether the guest expects to access memory via
     guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
     TDX or ARM64 pKVM is checked by firmware or hypervisor that
     guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
     the case of pKVM).

  x86:

   - Support for "software-protected VMs" that can use the new
     guest_memfd and page attributes infrastructure. This is mostly
     useful for testing, since there is no pKVM-like infrastructure to
     provide a meaningfully reduced TCB.

   - Fix a relatively benign off-by-one error when splitting huge pages
     during CLEAR_DIRTY_LOG.

   - Fix a bug where KVM could incorrectly test-and-clear dirty bits in
     non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
     a non-huge SPTE.

   - Use more generic lockdep assertions in paths that don't actually
     care about whether the caller is a reader or a writer.

   - let Xen guests opt out of having PV clock reported as "based on a
     stable TSC", because some of them don't expect the "TSC stable" bit
     (added to the pvclock ABI by KVM, but never set by Xen) to be set.

   - Revert a bogus, made-up nested SVM consistency check for
     TLB_CONTROL.

   - Advertise flush-by-ASID support for nSVM unconditionally, as KVM
     always flushes on nested transitions, i.e. always satisfies flush
     requests. This allows running bleeding edge versions of VMware
     Workstation on top of KVM.

   - Sanity check that the CPU supports flush-by-ASID when enabling SEV
     support.

   - On AMD machines with vNMI, always rely on hardware instead of
     intercepting IRET in some cases to detect unmasking of NMIs

   - Support for virtualizing Linear Address Masking (LAM)

   - Fix a variety of vPMU bugs where KVM fail to stop/reset counters
     and other state prior to refreshing the vPMU model.

   - Fix a double-overflow PMU bug by tracking emulated counter events
     using a dedicated field instead of snapshotting the "previous"
     counter. If the hardware PMC count triggers overflow that is
     recognized in the same VM-Exit that KVM manually bumps an event
     count, KVM would pend PMIs for both the hardware-triggered overflow
     and for KVM-triggered overflow.

   - Turn off KVM_WERROR by default for all configs so that it's not
     inadvertantly enabled by non-KVM developers, which can be
     problematic for subsystems that require no regressions for W=1
     builds.

   - Advertise all of the host-supported CPUID bits that enumerate
     IA32_SPEC_CTRL "features".

   - Don't force a masterclock update when a vCPU synchronizes to the
     current TSC generation, as updating the masterclock can cause
     kvmclock's time to "jump" unexpectedly, e.g. when userspace
     hotplugs a pre-created vCPU.

   - Use RIP-relative address to read kvm_rebooting in the VM-Enter
     fault paths, partly as a super minor optimization, but mostly to
     make KVM play nice with position independent executable builds.

   - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
     CONFIG_HYPERV as a minor optimization, and to self-document the
     code.

   - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
     "emulation" at build time.

  ARM64:

   - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
     granule sizes. Branch shared with the arm64 tree.

   - Large Fine-Grained Trap rework, bringing some sanity to the
     feature, although there is more to come. This comes with a prefix
     branch shared with the arm64 tree.

   - Some additional Nested Virtualization groundwork, mostly
     introducing the NV2 VNCR support and retargetting the NV support to
     that version of the architecture.

   - A small set of vgic fixes and associated cleanups.

  Loongarch:

   - Optimization for memslot hugepage checking

   - Cleanup and fix some HW/SW timer issues

   - Add LSX/LASX (128bit/256bit SIMD) support

  RISC-V:

   - KVM_GET_REG_LIST improvement for vector registers

   - Generate ISA extension reg_list using macros in get-reg-list
     selftest

   - Support for reporting steal time along with selftest

  s390:

   - Bugfixes

  Selftests:

   - Fix an annoying goof where the NX hugepage test prints out garbage
     instead of the magic token needed to run the test.

   - Fix build errors when a header is delete/moved due to a missing
     flag in the Makefile.

   - Detect if KVM bugged/killed a selftest's VM and print out a helpful
     message instead of complaining that a random ioctl() failed.

   - Annotate the guest printf/assert helpers with __printf(), and fix
     the various bugs that were lurking due to lack of said annotation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
  x86/kvm: Do not try to disable kvmclock if it was not enabled
  KVM: x86: add missing "depends on KVM"
  KVM: fix direction of dependency on MMU notifiers
  KVM: introduce CONFIG_KVM_COMMON
  KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
  KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
  RISC-V: KVM: selftests: Add get-reg-list test for STA registers
  RISC-V: KVM: selftests: Add steal_time test support
  RISC-V: KVM: selftests: Add guest_sbi_probe_extension
  RISC-V: KVM: selftests: Move sbi_ecall to processor.c
  RISC-V: KVM: Implement SBI STA extension
  RISC-V: KVM: Add support for SBI STA registers
  RISC-V: KVM: Add support for SBI extension registers
  RISC-V: KVM: Add SBI STA info to vcpu_arch
  RISC-V: KVM: Add steal-update vcpu request
  RISC-V: KVM: Add SBI STA extension skeleton
  RISC-V: paravirt: Implement steal-time support
  RISC-V: Add SBI STA extension definitions
  RISC-V: paravirt: Add skeleton for pv-time support
  RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
  ...
2024-01-17 13:03:37 -08:00
Linus Torvalds
4331f07026 RISC-V Patches for the 6.8 Merge Window, Part 1
* Support for many new extensions in hwprobe, along with a handful of
   cleanups.
 * Various cleanups to our page table handling code, so we alwayse use
   {READ,WRITE}_ONCE.
 * Support for the which-cpus flavor of hwprobe.
 * Support for XIP kernels has been resurrected.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmWhb+sTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiWyJEADH/l2PND3AE2sfhtkDceMR8k+MOrjn
 3T0+EIow28tBEpcu7Bdu7aw65ZQDgV9aEDuo8HYlwtimPUfvTQ01QiwDRVZoxPGT
 4Br2X7n5lczQOvp6r5+8p34viQVNXaBXApgZc+iMbelj0W7AnNJNdr8/d1pMw/hA
 y6v8rq6BBgFKZKmU0va+T2AaXQN3nj/fme1l8Rn6Wf8JpaBtTnlNWGOepRfJdFbv
 ZewTEqu4CVmCE6ij8c+Gatk8k71KXLjH3mSjZ2F0FIreI0I5pdD9OKQJk+hiRCEA
 wnEneWyl+rHPUTRXpZEeLVPD4gBTbKt20awImpNG+eN+l68s4ESNWP2EZM4n5utF
 NWJAscxMA1c8NlWhnQfAKK2eAmi2sp0/9O3pTfpvZ7yWAp/GpkZGEuAaQe4R80X+
 0lLKrS8P8T2ZSA5UVfszN5vLXU/Ae3GpAQCJkzoYXjDes8sxw4fjHcg/AWn/ZmrO
 FoqPA1ka/2i0b5be+p3Emt5kfTK8WeDnV2rV1ZLYEJYBkXdTLAM8jR+mhXJ7z59P
 shfOSpZ7icvX7Q3t/eFKApryM93JE3w6WZBOYuY4D7FPoPSxJG7VgL2U42wiTZjj
 xr1ta4vdfEqWgRpAOvGaP569MQ9awzA6JZHJQOVLx9FOWox2gMWsTB8xQ33y5k/n
 eNd7JjUOu4K3jQ==
 =fLgG
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.8-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for many new extensions in hwprobe, along with a handful of
   cleanups

 - Various cleanups to our page table handling code, so we alwayse use
   {READ,WRITE}_ONCE

 - Support for the which-cpus flavor of hwprobe

 - Support for XIP kernels has been resurrected

* tag 'riscv-for-linus-6.8-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (52 commits)
  riscv: hwprobe: export Zicond extension
  riscv: hwprobe: export Zacas ISA extension
  riscv: add ISA extension parsing for Zacas
  dt-bindings: riscv: add Zacas ISA extension description
  riscv: hwprobe: export Ztso ISA extension
  riscv: add ISA extension parsing for Ztso
  use linux/export.h rather than asm-generic/export.h
  riscv: Remove SHADOW_OVERFLOW_STACK_SIZE macro
  riscv; fix __user annotation in save_v_state()
  riscv: fix __user annotation in traps_misaligned.c
  riscv: Select ARCH_WANTS_NO_INSTR
  riscv: Remove obsolete rv32_defconfig file
  riscv: Allow disabling of BUILTIN_DTB for XIP
  riscv: Fixed wrong register in XIP_FIXUP_FLASH_OFFSET macro
  riscv: Make XIP bootable again
  riscv: Fix set_direct_map_default_noflush() to reset _PAGE_EXEC
  riscv: Fix module_alloc() that did not reset the linear mapping permissions
  riscv: Fix wrong usage of lm_alias() when splitting a huge linear mapping
  riscv: Check if the code to patch lies in the exit section
  riscv: Use the same CPU operations for all CPUs
  ...
2024-01-17 10:50:46 -08:00
Palmer Dabbelt
a894e8ed09
Merge patch series "riscv: support kernel-mode Vector"
Andy Chiu <andy.chiu@sifive.com> says:

This series provides support running Vector in kernel mode.
Additionally, kernel-mode Vector can be configured to run without
turnning off preemption on a CONFIG_PREEMPT kernel. Along with the
suport, we add Vector optimized copy_{to,from}_user. And provide a
simple threshold to decide when to run the vectorized functions.

We decided to drop vectorized memcpy/memset/memmove for the moment due
to the concern of memory side-effect in kernel_vector_begin(). The
detailed description can be found at v9[0]

This series is composed by 4 parts:
 patch 1-4: adds basic support for kernel-mode Vector
 patch 5: includes vectorized copy_{to,from}_user into the kernel
 patch 6: refactor context switch code in fpu [1]
 patch 7-10: provides some code refactors and support for preemptible
             kernel-mode Vector.

This series can be merged if we feel any part of {1~4, 5, 6, 7~10} is
mature enough.

This patch is tested on a QEMU with V and verified that booting, normal
userspace operations all work as usual with thresholds set to 0. Also,
we test by launching multiple kernel threads which continuously executes
and verifies Vector operations in the background. The module that tests
these operation is expected to be upstream later.

* b4-shazam-merge:
  riscv: vector: allow kernel-mode Vector with preemption
  riscv: vector: use kmem_cache to manage vector context
  riscv: vector: use a mask to write vstate_ctrl
  riscv: vector: do not pass task_struct into riscv_v_vstate_{save,restore}()
  riscv: fpu: drop SR_SD bit checking
  riscv: lib: vectorize copy_to_user/copy_from_user
  riscv: sched: defer restoring Vector context for user
  riscv: Add vector extension XOR implementation
  riscv: vector: make Vector always available for softirq context
  riscv: Add support for kernel mode vector

Link: https://lore.kernel.org/r/20240115055929.4736-1-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-16 07:14:04 -08:00
Andy Chiu
2080ff9493
riscv: vector: allow kernel-mode Vector with preemption
Add kernel_vstate to keep track of kernel-mode Vector registers when
trap introduced context switch happens. Also, provide riscv_v_flags to
let context save/restore routine track context status. Context tracking
happens whenever the core starts its in-kernel Vector executions. An
active (dirty) kernel task's V contexts will be saved to memory whenever
a trap-introduced context switch happens. Or, when a softirq, which
happens to nest on top of it, uses Vector. Context retoring happens when
the execution transfer back to the original Kernel context where it
first enable preempt_v.

Also, provide a config CONFIG_RISCV_ISA_V_PREEMPTIVE to give users an
option to disable preemptible kernel-mode Vector at build time. Users
with constraint memory may want to disable this config as preemptible
kernel-mode Vector needs extra space for tracking of per thread's
kernel-mode V context. Or, users might as well want to disable it if all
kernel-mode Vector code is time sensitive and cannot tolerate context
switch overhead.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20240115055929.4736-11-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-16 07:14:02 -08:00
Andy Chiu
c2a658d419
riscv: lib: vectorize copy_to_user/copy_from_user
This patch utilizes Vector to perform copy_to_user/copy_from_user. If
Vector is available and the size of copy is large enough for Vector to
perform better than scalar, then direct the kernel to do Vector copies
for userspace. Though the best programming practice for users is to
reduce the copy, this provides a faster variant when copies are
inevitable.

The optimal size for using Vector, copy_to_user_thres, is only a
heuristic for now. We can add DT parsing if people feel the need of
customizing it.

The exception fixup code of the __asm_vector_usercopy must fallback to
the scalar one because accessing user pages might fault, and must be
sleepable. Current kernel-mode Vector does not allow tasks to be
preemptible, so we must disactivate Vector and perform a scalar fallback
in such case.

The original implementation of Vector operations comes from
https://github.com/sifive/sifive-libc, which we agree to contribute to
Linux kernel.

Co-developed-by: Jerry Shih <jerry.shih@sifive.com>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
Co-developed-by: Nick Knight <nick.knight@sifive.com>
Signed-off-by: Nick Knight <nick.knight@sifive.com>
Suggested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20240115055929.4736-6-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-16 07:13:57 -08:00
Alexandre Ghiti
54d7431af7
riscv: Add support for BATCHED_UNMAP_TLB_FLUSH
Allow to defer the flushing of the TLB when unmapping pages, which allows
to reduce the numbers of IPI and the number of sfence.vma.

The ubenchmarch used in commit 43b3dfdd04 ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration") that
was multithreaded to force the usage of IPI shows good performance
improvement on all platforms:

* Unmatched: ~34%
* TH1520   : ~78%
* Qemu     : ~81%

In addition, perf on qemu reports an important decrease in time spent
dealing with IPIs:

Before:  68.17%  main     [kernel.kallsyms]            [k] __sbi_rfence_v02_call
After :   8.64%  main     [kernel.kallsyms]            [k] __sbi_rfence_v02_call

* Benchmark:

int stick_this_thread_to_core(int core_id) {
        int num_cores = sysconf(_SC_NPROCESSORS_ONLN);
        if (core_id < 0 || core_id >= num_cores)
           return EINVAL;

        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);
        CPU_SET(core_id, &cpuset);

        pthread_t current_thread = pthread_self();
        return pthread_setaffinity_np(current_thread,
sizeof(cpu_set_t), &cpuset);
}

static void *fn_thread (void *p_data)
{
        int ret;
        pthread_t thread;

        stick_this_thread_to_core((int)p_data);

        while (1) {
                sleep(1);
        }

        return NULL;
}

int main()
{
        volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        pthread_t threads[4];
        int ret;

        for (int i = 0; i < 4; ++i) {
                ret = pthread_create(&threads[i], NULL, fn_thread, (void *)i);
                if (ret)
                {
                        printf("%s", strerror (ret));
                }
        }

        memset(p, 0x88, SIZE);

        for (int k = 0; k < 10000; k++) {
                /* swap in */
                for (int i = 0; i < SIZE; i += 4096) {
                        (void)p[i];
                }

                /* swap out */
                madvise(p, SIZE, MADV_PAGEOUT);
        }

        for (int i = 0; i < 4; i++)
        {
                pthread_cancel(threads[i]);
        }

        for (int i = 0; i < 4; i++)
        {
                pthread_join(threads[i], NULL);
        }

        return 0;
}

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Jisheng Zhang <jszhang@kernel.org> # Tested on TH1520
Tested-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20240108193640.344929-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11 08:01:53 -08:00
Andrew Jones
4dc4af9ce3
riscv: sbi: Introduce system suspend support
When the SUSP SBI extension is present it implies that the standard
"suspend to RAM" type is available. Wire it up to the generic
platform suspend support, also applying the already present support
for non-retentive CPU suspend. When the kernel is built with
CONFIG_SUSPEND, one can do 'echo mem > /sys/power/state' to suspend.
Resumption will occur when a platform-specific wake-up event arrives.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231206110807.35882-4-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11 07:36:26 -08:00
Palmer Dabbelt
17f2c30805
Merge patch series "riscv: enable EFFICIENT_UNALIGNED_ACCESS and DCACHE_WORD_ACCESS"
Jisheng Zhang <jszhang@kernel.org> says:

Some riscv implementations such as T-HEAD's C906, C908, C910 and C920
support efficient unaligned access, for performance reason we want
to enable HAVE_EFFICIENT_UNALIGNED_ACCESS on these platforms. To
avoid performance regressions on non efficient unaligned access
platforms, HAVE_EFFICIENT_UNALIGNED_ACCESS can't be globally selected.

To solve this problem, runtime code patching based on the detected
speed is a good solution. But that's not easy, it involves lots of
work to modify vairous subsystems such as net, mm, lib and so on.
This can be done step by step.

So let's take an easier solution: add support to efficient unaligned
access and hide the support under NONPORTABLE.

patch1 introduces RISCV_EFFICIENT_UNALIGNED_ACCESS which depends on
NONPORTABLE, if users know during config time that the kernel will be
only run on those efficient unaligned access hw platforms, they can
enable it. Obviously, generic unified kernel Image shouldn't enable it.

patch2 adds support DCACHE_WORD_ACCESS when MMU and
RISCV_EFFICIENT_UNALIGNED_ACCESS.

Below test program and step shows how much performance can be improved:

 $ cat tt.c
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <unistd.h>

 #define ITERATIONS 1000000

 #define PATH "123456781234567812345678123456781"

 int main(void)
 {
         unsigned long i;
         struct stat buf;

         for (i = 0; i < ITERATIONS; i++)
                 stat(PATH, &buf);

         return 0;
 }

 $ gcc -O2 tt.c
 $ touch 123456781234567812345678123456781
 $ time ./a.out

Per my test on T-HEAD C910 platforms, the above test performance is
improved by about 7.5%.

* b4-shazam-merge:
  riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HW
  riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS

Link: https://lore.kernel.org/r/20231225044207.3821-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11 07:36:24 -08:00
Jisheng Zhang
d0fdc20b04
riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HW
DCACHE_WORD_ACCESS uses the word-at-a-time API for optimised string
comparisons in the vfs layer.

This patch implements support for load_unaligned_zeropad in much the
same way as has been done for arm64.

Here is the test program and step:

 $ cat tt.c
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <unistd.h>

 #define ITERATIONS 1000000

 #define PATH "123456781234567812345678123456781"

 int main(void)
 {
         unsigned long i;
         struct stat buf;

         for (i = 0; i < ITERATIONS; i++)
                 stat(PATH, &buf);

         return 0;
 }

 $ gcc -O2 tt.c
 $ touch 123456781234567812345678123456781
 $ time ./a.out

Per my test on T-HEAD C910 platforms, the above test performance is
improved by about 7.5%.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20231225044207.3821-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:18:20 -08:00
Jisheng Zhang
b6da6cbe13
riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS
Some riscv implementations such as T-HEAD's C906, C908, C910 and C920
support efficient unaligned access, for performance reason we want
to enable HAVE_EFFICIENT_UNALIGNED_ACCESS on these platforms. To
avoid performance regressions on other non efficient unaligned access
platforms, HAVE_EFFICIENT_UNALIGNED_ACCESS can't be globally selected.

To solve this problem, runtime code patching based on the detected
speed is a good solution. But that's not easy, it involves lots of
work to modify vairous subsystems such as net, mm, lib and so on.
This can be done step by step.

So let's take an easier solution: add support to efficient unaligned
access and hide the support under NONPORTABLE.

Now let's introduce RISCV_EFFICIENT_UNALIGNED_ACCESS which depends on
NONPORTABLE, if users know during config time that the kernel will be
only run on those efficient unaligned access hw platforms, they can
enable it. Obviously, generic unified kernel Image shouldn't enable it.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20231225044207.3821-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:18:19 -08:00
Palmer Dabbelt
2bf8acbf54
Merge patch series "Fix XIP boot and make XIP testable in QEMU"
Frederik Haxel <haxel@fzi.de> says:

XIP boot seems to be broken for some time now. A likely reason why no one
seems to have noticed this is that XIP is more difficult to test, as it is
currently not easily testable with QEMU.

These patches fix the XIP boot and allow an XIP build without BUILTIN_DTB,
which in turn makes it easier to test an image with the QEMU virt machine.

* b4-shazam-merge:
  riscv: Allow disabling of BUILTIN_DTB for XIP
  riscv: Fixed wrong register in XIP_FIXUP_FLASH_OFFSET macro
  riscv: Make XIP bootable again

Link: https://lore.kernel.org/r/20231212130116.848530-1-haxel@fzi.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:10:39 -08:00
Jisheng Zhang
cfbc4f81c9
riscv: Select ARCH_WANTS_NO_INSTR
As said in the help of ARCH_WANTS_NO_INSTR entry in arch/Kconfig:
"An architecture should select this if the noinstr macro is being used on
functions to denote that the toolchain should avoid instrumenting such
functions and is required for correctness."

Select ARCH_WANTS_NO_INSTR for correctness.

PS: The reason we didn't find any issue so far is that the
CC_HAS_NO_PROFILE_FN_ATTR is true.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20231123142223.1787-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:10:16 -08:00
Frederik Haxel
6c4a2f6329
riscv: Allow disabling of BUILTIN_DTB for XIP
This enables, among other things, testing with the QEMU virt machine.

To build an XIP kernel for the QEMU virt machine, configure the
the kernel as desired and apply the following configuration
```
CONFIG_NONPORTABLE=y
CONFIG_XIP_KERNEL=y
CONFIG_XIP_PHYS_ADDR=0x20000000
CONFIG_PHYS_RAM_BASE=0x80200000
CONFIG_BUILTIN_DTB=n
```

Since the QEMU virt flash memory expects a 32 MB file, the built image
must be padded. For example, with
`truncate -s 32M arch/riscv/boot/xipImage`

The kernel can be started using the following command in QEMU (v8+)
```
qemu-system-riscv64 -M virt,pflash0=pflash0 \
 -blockdev node-name=pflash0,driver=file,read-only=on,\
filename=arch/riscv/boot/xipImage <optional parameters>
```

Signed-off-by: Frederik Haxel <haxel@fzi.de>
Link: https://lore.kernel.org/r/20231212130116.848530-4-haxel@fzi.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 19:33:22 -08:00
Andrew Jones
fdf68acccf RISC-V: paravirt: Implement steal-time support
When the SBI STA extension exists we can use it to implement
paravirt steal-time support. Fill in the empty pv-time functions
with an SBI STA implementation and add the Kconfig knobs allowing
it to be enabled.

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30 11:26:04 +05:30
Arnd Bergmann
c1ad12ee0e kexec: fix KEXEC_FILE dependencies
The cleanup for the CONFIG_KEXEC Kconfig logic accidentally changed the
'depends on CRYPTO=y' dependency to a plain 'depends on CRYPTO', which
causes a link failure when all the crypto support is in a loadable module
and kexec_file support is built-in:

x86_64-linux-ld: vmlinux.o: in function `__x64_sys_kexec_file_load':
(.text+0x32e30a): undefined reference to `crypto_alloc_shash'
x86_64-linux-ld: (.text+0x32e58e): undefined reference to `crypto_shash_update'
x86_64-linux-ld: (.text+0x32e6ee): undefined reference to `crypto_shash_final'

Both s390 and x86 have this problem, while ppc64 and riscv have the
correct dependency already.  On riscv, the dependency is only used for the
purgatory, not for the kexec_file code itself, which may be a bit
surprising as it means that with CONFIG_CRYPTO=m, it is possible to enable
KEXEC_FILE but then the purgatory code is silently left out.

Move this into the common Kconfig.kexec file in a way that is correct
everywhere, using the dependency on CRYPTO_SHA256=y only when the
purgatory code is available.  This requires reversing the dependency
between ARCH_SUPPORTS_KEXEC_PURGATORY and KEXEC_FILE, but the effect
remains the same, other than making riscv behave like the other ones.

On s390, there is an additional dependency on CRYPTO_SHA256_S390, which
should technically not be required but gives better performance.  Remove
this dependency here, noting that it was not present in the initial
Kconfig code but was brought in without an explanation in commit
71406883fd ("s390/kexec_file: Add kexec_file_load system call").

[arnd@arndb.de: fix riscv build]
  Link: https://lkml.kernel.org/r/67ddd260-d424-4229-a815-e3fcfb864a77@app.fastmail.com
Link: https://lkml.kernel.org/r/20231023110308.1202042-1-arnd@kernel.org
Fixes: 6af5138083 ("x86/kexec: refactor for kernel/Kconfig.kexec")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com>
Tested-by: Eric DeVolder <eric_devolder@yahoo.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Conor Dooley <conor@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20 13:46:19 -08:00
Alexandre Ghiti
6b9f29b81b riscv: Enable pcpu page first chunk allocator
As explained in commit 6ea529a203 ("percpu: make embedding first chunk
allocator check vmalloc space size"), the embedding first chunk allocator
needs the vmalloc space to be larger than the maximum distance between
units which are grouped into NUMA nodes.

On a very sparse NUMA configurations and a small vmalloc area (for example,
it is 64GB in sv39), the allocation of dynamic percpu data in the vmalloc
area could fail.

So provide the pcpu page allocator as a fallback in case we fall into
such a sparse configuration (which happened in arm64 as shown by
commit 09cea61950 ("arm64: support page mapping percpu first chunk
allocator")).

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
2023-12-14 00:24:06 -08:00
Ignat Korchagin
c41bd25141 kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP
In commit f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for
CRASH_DUMP") we tried to fix a config regression, where CONFIG_CRASH_DUMP
required CONFIG_KEXEC.

However, it was not enough at least for arm64 platforms.  While further
testing the patch with our arm64 config I noticed that CONFIG_CRASH_DUMP
is unavailable in menuconfig.  This is because CONFIG_CRASH_DUMP still
depends on the new CONFIG_ARCH_SUPPORTS_KEXEC introduced in commit
91506f7e5d ("arm64/kexec: refactor for kernel/Kconfig.kexec") and on
arm64 CONFIG_ARCH_SUPPORTS_KEXEC requires CONFIG_PM_SLEEP_SMP=y, which in
turn requires either CONFIG_SUSPEND=y or CONFIG_HIBERNATION=y neither of
which are set in our config.

Given that we already established that CONFIG_KEXEC (which is a switch for
kexec system call itself) is not required for CONFIG_CRASH_DUMP drop
CONFIG_ARCH_SUPPORTS_KEXEC dependency as well.  The arm64 kernel builds
just fine with CONFIG_CRASH_DUMP=y and with both CONFIG_KEXEC=n and
CONFIG_KEXEC_FILE=n after f8ff23429c62 ("kernel/Kconfig.kexec: drop select
of KEXEC for CRASH_DUMP") and this patch are applied given that the
necessary shared bits are included via CONFIG_KEXEC_CORE dependency.

[bhe@redhat.com: don't export some symbols when CONFIG_MMU=n]
  Link: https://lkml.kernel.org/r/ZW03ODUKGGhP1ZGU@MiWiFi-R3L-srv
[bhe@redhat.com: riscv, kexec: fix dependency of two items]
  Link: https://lkml.kernel.org/r/ZW04G/SKnhbE5mnX@MiWiFi-R3L-srv
Link: https://lkml.kernel.org/r/20231129220409.55006-1-ignat@cloudflare.com
Fixes: 91506f7e5d ("arm64/kexec: refactor for kernel/Kconfig.kexec")
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: <stable@vger.kernel.org> # 6.6+: f8ff234: kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
Cc: <stable@vger.kernel.org> # 6.6+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12 17:20:16 -08:00
James Morse
96cf203651 riscv: Switch over to GENERIC_CPU_DEVICES
Now that GENERIC_CPU_DEVICES calls arch_register_cpu(), which can be
overridden by the arch code, switch over to this to allow common code
to choose when the register_cpu() call is made.

This allows topology_init() to be removed.

This is an intermediate step to the logic being moved to drivers/acpi,
where GENERIC_CPU_DEVICES will do the work when booting with acpi=off.

This patch also has the effect of moving the registration of CPUs from
subsys to driver core initialisation, prior to any initcalls running.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R4G-00Ct0M-PS@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06 12:41:50 +09:00
Linus Torvalds
56d428ae1c RISC-V Patches for the 6.7 Merge Window, Part 2
* Support for handling misaligned accesses in S-mode.
 * Probing for misaligned access support is now properly cached and
   handled in parallel.
 * PTDUMP now reflects the SW reserved bits, as well as the PBMT and
   NAPOT extensions.
 * Performance improvements for TLB flushing.
 * Support for many new relocations in the module loader.
 * Various bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmVOUCcTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYicJ2D/9S+9dnHYHVGTeJfr9Zf2T4r+qHBPyx
 LXbTAbgHN6139MgcRLMRlcUaQ04RVxuBCWhxewJ6mQiHiYNlullgKmJO8oYMS4uZ
 2yQGHKhzKEVluXxe+qT6VW+zsP0cY6pDQ+e59AqZgyWzvATxMU4VtFfCDdjFG03I
 k/8Y3MUKSHAKzIHUsGHiMW5J2YRiM/iVehv2gZfanreulWlK6lyiV4AZ4KChu8Sa
 gix9QkFJw+9+7RHnouHvczt4xTqLPJQcdecLJsbisEI4VaaPtTVzkvXx/kwbMwX0
 qkQnZ7I60fPHrCb9ccuedjDMa1Z0lrfwRldBGz9f9QaW37Eppirn6LA5JiZ1cA47
 wKTwba6gZJCTRXELFTJLcv+Cwdy003E0y3iL5UK2rkbLqcxfvLdq1WAJU2t05Lmh
 aRQN10BtM2DZG+SNPlLoBpXPDw0Q3KOc20zGtuhmk010+X4yOK7WXlu8zNGLLE0+
 yHamiZqAbpIUIEzwDdGbb95jywR1sUhNTbScuhj4Rc79ZqLtPxty1PUhnfqFat1R
 i3ngQtCbeUUYFS2YV9tKkXjLf/xkQNRbt7kQBowuvFuvfksl9UwMdRAWcE/h0M9P
 7uz7cBFhuG0v/XblB7bUhYLkKITvP+ltSMyxaGlfpGqCLAH2KIztdZ2PLWLRdKeU
 +9dtZSQR6oBLqQ==
 =NhdR
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.7-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - Support for handling misaligned accesses in S-mode

 - Probing for misaligned access support is now properly cached and
   handled in parallel

 - PTDUMP now reflects the SW reserved bits, as well as the PBMT and
   NAPOT extensions

 - Performance improvements for TLB flushing

 - Support for many new relocations in the module loader

 - Various bug fixes and cleanups

* tag 'riscv-for-linus-6.7-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
  riscv: Optimize bitops with Zbb extension
  riscv: Rearrange hwcap.h and cpufeature.h
  drivers: perf: Do not broadcast to other cpus when starting a counter
  drivers: perf: Check find_first_bit() return value
  of: property: Add fw_devlink support for msi-parent
  RISC-V: Don't fail in riscv_of_parent_hartid() for disabled HARTs
  riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings
  riscv: Don't use PGD entries for the linear mapping
  RISC-V: Probe misaligned access speed in parallel
  RISC-V: Remove __init on unaligned_emulation_finish()
  RISC-V: Show accurate per-hart isa in /proc/cpuinfo
  RISC-V: Don't rely on positional structure initialization
  riscv: Add tests for riscv module loading
  riscv: Add remaining module relocations
  riscv: Avoid unaligned access when relocating modules
  riscv: split cache ops out of dma-noncoherent.c
  riscv: Improve flush_tlb_kernel_range()
  riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
  riscv: Improve flush_tlb_range() for hugetlb pages
  riscv: Improve tlb_flush()
  ...
2023-11-10 09:23:17 -08:00
Linus Torvalds
d46392bbf5 RISC-V Patches for the 6.7 Merge Window, Part 1
* Support for cbo.zero in userspace.
 * Support for CBOs on ACPI-based systems.
 * A handful of improvements for the T-Head cache flushing ops.
 * Support for software shadow call stacks.
 * Various cleanups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmVJAJoTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiWZrD/9ECV/0tuX5LbS56kA0ElkwiakyIVGu
 ZVuF26yGJ6w+XvwnHPhqKNVN0ReYR6s6CwH1WpHI5Du9QHZGQU3DKJ43dFMTP3Dn
 dQFli7QJ+tsNo1nre8NZWKj5Ac+Cu906F794qM0q0XrZmyb9DY3ojVYJAYy+dtoo
 /9gwbB7P0GLyDlURLn48oQyz36WQW3CkL5Jkfu+uYwnFe9DAFtfakIKq5mLlNuaH
 PgUk8pAVhSy2GdPOGFtnFFhdXMrTjpgxdo62ZIZC0lbsts26Dxp95oUygqMg51Iy
 ilaXkA2U1c1+gFQNpEove7BVZa5708Kaj6RLQ3/kAJblAzibszwQvIWlWOh7RVni
 3GQAS7/0D0+0cjDwXdWaPIaFFzLfi3bDxRYkc7n59p6nOz+GrxnSNsRPQJGgYxeU
 oTtJfaqWKntm72iutiHmXgx/pvAxWOHpqDnSTlDdtjvgzXCplqBbxZFF/azj30o5
 jplNW5YvdvD9fviYMAoGSOz03IwDeZF5rMlAhqu6vXlyD2//mID82yw/hBluIA3+
 /hLo5QfTLiUGs9nnijxMcfoyusN6AXsJOxwYdAJCIuJOr78YUj0S974gd9KvJXma
 KedrwRVwW7KE7CwY1jhrWBsZEpzl8YrtpMDN47y4gRtDZN8XJMQ+lHqd+BHT/DUO
 TGUCYi5xvr6Vlw==
 =hKWl
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for cbo.zero in userspace

 - Support for CBOs on ACPI-based systems

 - A handful of improvements for the T-Head cache flushing ops

 - Support for software shadow call stacks

 - Various cleanups and fixes

* tag 'riscv-for-linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits)
  RISC-V: hwprobe: Fix vDSO SIGSEGV
  riscv: configs: defconfig: Enable configs required for RZ/Five SoC
  riscv: errata: prefix T-Head mnemonics with th.
  riscv: put interrupt entries into .irqentry.text
  riscv: mm: Update the comment of CONFIG_PAGE_OFFSET
  riscv: Using TOOLCHAIN_HAS_ZIHINTPAUSE marco replace zihintpause
  riscv/mm: Fix the comment for swap pte format
  RISC-V: clarify the QEMU workaround in ISA parser
  riscv: correct pt_level name via pgtable_l5/4_enabled
  RISC-V: Provide pgtable_l5_enabled on rv32
  clocksource: timer-riscv: Increase rating of clock_event_device for Sstc
  clocksource: timer-riscv: Don't enable/disable timer interrupt
  lkdtm: Fix CFI_BACKWARD on RISC-V
  riscv: Use separate IRQ shadow call stacks
  riscv: Implement Shadow Call Stack
  riscv: Move global pointer loading to a macro
  riscv: Deduplicate IRQ stack switching
  riscv: VMAP_STACK overflow detection thread-safe
  RISC-V: cacheflush: Initialize CBO variables on ACPI systems
  RISC-V: ACPI: RHCT: Add function to get CBO block sizes
  ...
2023-11-08 09:21:18 -08:00
Andreas Schwab
e0c0a7c35f
riscv: select ARCH_PROC_KCORE_TEXT
This adds a separate segment for kernel text in /proc/kcore, which has a
different address than the direct linear map.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Link: https://lore.kernel.org/r/mvmh6m758ao.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-06 16:25:45 -08:00
Palmer Dabbelt
0619ff9f02
Merge patch series "Add support to handle misaligned accesses in S-mode"
Clément Léger <cleger@rivosinc.com> says:

Since commit 61cadb9 ("Provide new description of misaligned load/store
behavior compatible with privileged architecture.") in the RISC-V ISA
manual, it is stated that misaligned load/store might not be supported.
However, the RISC-V kernel uABI describes that misaligned accesses are
supported. In order to support that, this series adds support for S-mode
handling of misaligned accesses as well support for prctl(PR_UNALIGN).

Handling misaligned access in kernel allows for a finer grain control
of the misaligned accesses behavior, and thanks to the prctl() call,
can allow disabling misaligned access emulation to generate SIGBUS. User
space can then optimize its software by removing such access based on
SIGBUS generation.

This series is useful when using a SBI implementation that does not
handle misaligned traps as well as detecting misaligned accesses
generated by userspace application using the prctrl(PR_SET_UNALIGN)
feature.

This series can be tested using the spike simulator[1] and a modified
openSBI version[2] which allows to always delegate misaligned load/store to
S-mode. A test[3] that exercise various instructions/registers can be
executed to verify the unaligned access support.

[1] https://github.com/riscv-software-src/riscv-isa-sim
[2] https://github.com/rivosinc/opensbi/tree/dev/cleger/no_misaligned
[3] https://github.com/clementleger/unaligned_test

* b4-shazam-merge:
  riscv: add support for PR_SET_UNALIGN and PR_GET_UNALIGN
  riscv: report misaligned accesses emulation to hwprobe
  riscv: annotate check_unaligned_access_boot_cpu() with __init
  riscv: add support for sysctl unaligned_enabled control
  riscv: add floating point insn support to misaligned access emulation
  riscv: report perf event for misaligned fault
  riscv: add support for misaligned trap handling in S-mode
  riscv: remove unused functions in traps_misaligned.c

Link: https://lore.kernel.org/r/20231004151405.521596-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-05 06:42:51 -08:00
Linus Torvalds
8f6f76a6a2 As usual, lots of singleton and doubleton patches all over the tree and
there's little I can say which isn't in the individual changelogs.
 
 The lengthier patch series are
 
 - "kdump: use generic functions to simplify crashkernel reservation in
   arch", from Baoquan He.  This is mainly cleanups and consolidation of
   the "crashkernel=" kernel parameter handling.
 
 - After much discussion, David Laight's "minmax: Relax type checks in
   min() and max()" is here.  Hopefully reduces some typecasting and the
   use of min_t() and max_t().
 
 - A group of patches from Oleg Nesterov which clean up and slightly fix
   our handling of reads from /proc/PID/task/...  and which remove
   task_struct.therad_group.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZUQP9wAKCRDdBJ7gKXxA
 jmOAAQDh8sxagQYocoVsSm28ICqXFeaY9Co1jzBIDdNesAvYVwD/c2DHRqJHEiS4
 63BNcG3+hM9nwGJHb5lyh5m79nBMRg0=
 =On4u
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "As usual, lots of singleton and doubleton patches all over the tree
  and there's little I can say which isn't in the individual changelogs.

  The lengthier patch series are

   - 'kdump: use generic functions to simplify crashkernel reservation
     in arch', from Baoquan He. This is mainly cleanups and
     consolidation of the 'crashkernel=' kernel parameter handling

   - After much discussion, David Laight's 'minmax: Relax type checks in
     min() and max()' is here. Hopefully reduces some typecasting and
     the use of min_t() and max_t()

   - A group of patches from Oleg Nesterov which clean up and slightly
     fix our handling of reads from /proc/PID/task/... and which remove
     task_struct.thread_group"

* tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits)
  scripts/gdb/vmalloc: disable on no-MMU
  scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n
  .mailmap: add address mapping for Tomeu Vizoso
  mailmap: update email address for Claudiu Beznea
  tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions
  .mailmap: map Benjamin Poirier's address
  scripts/gdb: add lx_current support for riscv
  ocfs2: fix a spelling typo in comment
  proc: test ProtectionKey in proc-empty-vm test
  proc: fix proc-empty-vm test with vsyscall
  fs/proc/base.c: remove unneeded semicolon
  do_io_accounting: use sig->stats_lock
  do_io_accounting: use __for_each_thread()
  ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error()
  ocfs2: fix a typo in a comment
  scripts/show_delta: add __main__ judgement before main code
  treewide: mark stuff as __ro_after_init
  fs: ocfs2: check status values
  proc: test /proc/${pid}/statm
  compiler.h: move __is_constexpr() to compiler.h
  ...
2023-11-02 20:53:31 -10:00
Palmer Dabbelt
24005d184a
Merge patch series "riscv: SCS support"
Sami Tolvanen <samitolvanen@google.com> says:

This series adds Shadow Call Stack (SCS) support for RISC-V. SCS
uses compiler instrumentation to store return addresses in a
separate shadow stack to protect them against accidental or
malicious overwrites. More information about SCS can be found
here:

  https://clang.llvm.org/docs/ShadowCallStack.html

Patch 1 is from Deepak, and it simplifies VMAP_STACK overflow
handling by adding support for accessing per-CPU variables
directly in assembly. The patch is included in this series to
make IRQ stack switching cleaner with SCS, and I've simply
rebased it and fixed a couple of minor issues. Patch 2 uses this
functionality to clean up the stack switching by moving duplicate
code into a single function. On RISC-V, the compiler uses the
gp register for storing the current shadow call stack pointer,
which is incompatible with global pointer relaxation. Patch 3
moves global pointer loading into a macro that can be easily
disabled with SCS. Patch 4 implements SCS register loading and
switching, and allows the feature to be enabled, and patch 5 adds
separate per-CPU IRQ shadow call stacks when CONFIG_IRQ_STACKS is
enabled. Patch 6 fixes the backward-edge CFI test in lkdtm for
RISC-V.

Note that this series requires Clang 17. Earlier Clang versions
support SCS on RISC-V, but use the x18 register instead of gp,
which isn't ideal. gcc has SCS support for arm64, but I'm not
aware of plans to support RISC-V. Once the Zicfiss extension is
ratified, it's probably preferable to use hardware-backed shadow
stacks instead of SCS on hardware that supports the extension,
and we may want to consider implementing CONFIG_DYNAMIC_SCS to
patch between the implementation at runtime (similarly to the
arm64 implementation, which switches to SCS when hardware PAC
support isn't available).

* b4-shazam-merge:
  lkdtm: Fix CFI_BACKWARD on RISC-V
  riscv: Use separate IRQ shadow call stacks
  riscv: Implement Shadow Call Stack
  riscv: Move global pointer loading to a macro
  riscv: Deduplicate IRQ stack switching
  riscv: VMAP_STACK overflow detection thread-safe

Link: https://lore.kernel.org/r/20230927224757.1154247-8-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-02 14:05:23 -07:00
Clément Léger
bc38f61313
riscv: add support for sysctl unaligned_enabled control
This sysctl tuning option allows the user to disable misaligned access
handling globally on the system. This will also be used by misaligned
detection code to temporarily disable misaligned access handling.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-6-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-01 08:34:56 -07:00
Clément Léger
7c83232161
riscv: add support for misaligned trap handling in S-mode
Misalignment trap handling is only supported for M-mode and uses direct
accesses to user memory. In S-mode, when handling usermode fault, this
requires to use the get_user()/put_user() accessors. Implement
load_u8(), store_u8() and get_insn() using these accessors for
userspace and direct text access for kernel.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-3-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-01 08:34:53 -07:00
Sami Tolvanen
d1584d791a
riscv: Implement Shadow Call Stack
Implement CONFIG_SHADOW_CALL_STACK for RISC-V. When enabled, the
compiler injects instructions to all non-leaf C functions to
store the return address to the shadow stack and unconditionally
load it again before returning, which makes it harder to corrupt
the return address through a stack overflow, for example.

The active shadow call stack pointer is stored in the gp
register, which makes SCS incompatible with gp relaxation. Use
--no-relax-gp to ensure gp relaxation is disabled and disable
global pointer loading.  Add SCS pointers to struct thread_info,
implement SCS initialization, and task switching

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-12-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-27 14:43:08 -07:00
Sunil V L
e8065df5b0
RISC-V: ACPI: Enhance acpi_os_ioremap with MMIO remapping
Enhance the acpi_os_ioremap() to support opregions in MMIO space. Also,
have strict checks using EFI memory map to allow remapping the RAM similar
to arm64.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231018124007.1306159-2-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-26 09:40:31 -07:00
Christoph Hellwig
381cae1698 riscv: only select DMA_DIRECT_REMAP from RISCV_ISA_ZICBOM and ERRATA_THEAD_PBMT
RISCV_DMA_NONCOHERENT is also used for whacky non-standard
non-coherent ops that use different hooks in dma-direct.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20231018052654.50074-3-hch@lst.de
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2023-10-26 09:42:38 +02:00
Christoph Hellwig
fd96278127 riscv: RISCV_NONSTANDARD_CACHE_OPS shouldn't depend on RISCV_DMA_NONCOHERENT
RISCV_NONSTANDARD_CACHE_OPS is also used for the pmem cache maintenance
helpers, which are built into the kernel unconditionally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231018052654.50074-2-hch@lst.de
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2023-10-26 09:42:37 +02:00