200463 Commits

Author SHA1 Message Date
Zach O'Keefe
7d8faaf155 mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse
This idea was introduced by David Rientjes[1].

Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request
a synchronous collapse of memory at their own expense.

The benefits of this approach are:

* CPU is charged to the process that wants to spend the cycles for the
  THP
* Avoid unpredictable timing of khugepaged collapse

Semantics

This call is independent of the system-wide THP sysfs settings, but will
fail for memory marked VM_NOHUGEPAGE.  If the ranges provided span
multiple VMAs, the semantics of the collapse over each VMA is independent
from the others.  This implies a hugepage cannot cross a VMA boundary.  If
collapse of a given hugepage-aligned/sized region fails, the operation may
continue to attempt collapsing the remainder of memory specified.

The memory ranges provided must be page-aligned, but are not required to
be hugepage-aligned.  If the memory ranges are not hugepage-aligned, the
start/end of the range will be clamped to the first/last hugepage-aligned
address covered by said range.  The memory ranges must span at least one
hugepage-sized region.

All non-resident pages covered by the range will first be
swapped/faulted-in, before being internally copied onto a freshly
allocated hugepage.  Unmapped pages will have their data directly
initialized to 0 in the new hugepage.  However, for every eligible
hugepage aligned/sized region to-be collapsed, at least one page must
currently be backed by memory (a PMD covering the address range must
already exist).

Allocation for the new hugepage may enter direct reclaim and/or
compaction, regardless of VMA flags.  When the system has multiple NUMA
nodes, the hugepage will be allocated from the node providing the most
native pages.  This operation operates on the current state of the
specified process and makes no persistent changes or guarantees on how
pages will be mapped, constructed, or faulted in the future

Return Value

If all hugepage-sized/aligned regions covered by the provided range were
either successfully collapsed, or were already PMD-mapped THPs, this
operation will be deemed successful.  On success, process_madvise(2)
returns the number of bytes advised, and madvise(2) returns 0.  Else, -1
is returned and errno is set to indicate the error for the most-recently
attempted hugepage collapse.  Note that many failures might have occurred,
since the operation may continue to collapse in the event a single
hugepage-sized/aligned region fails.

	ENOMEM	Memory allocation failed or VMA not found
	EBUSY	Memcg charging failed
	EAGAIN	Required resource temporarily unavailable.  Try again
		might succeed.
	EINVAL	Other error: No PMD found, subpage doesn't have Present
		bit set, "Special" page no backed by struct page, VMA
		incorrectly sized, address not page-aligned, ...

Most notable here is ENOMEM and EBUSY (new to madvise) which are intended
to provide the caller with actionable feedback so they may take an
appropriate fallback measure.

Use Cases

An immediate user of this new functionality are malloc() implementations
that manage memory in hugepage-sized chunks, but sometimes subrelease
memory back to the system in native-sized chunks via MADV_DONTNEED;
zapping the pmd.  Later, when the memory is hot, the implementation could
madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage
coverage and dTLB performance.  TCMalloc is such an implementation that
could benefit from this[2].

Only privately-mapped anon memory is supported for now, but additional
support for file, shmem, and HugeTLB high-granularity mappings[2] is
expected.  File and tmpfs/shmem support would permit:

* Backing executable text by THPs.  Current support provided by
  CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which
  might impair services from serving at their full rated load after
  (re)starting.  Tricks like mremap(2)'ing text onto anonymous memory to
  immediately realize iTLB performance prevents page sharing and demand
  paging, both of which increase steady state memory footprint.  With
  MADV_COLLAPSE, we get the best of both worlds: Peak upfront performance
  and lower RAM footprints.
* Backing guest memory by hugapages after the memory contents have been
  migrated in native-page-sized chunks to a new host, in a
  userfaultfd-based live-migration stack.

[1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/
[2] https://github.com/google/tcmalloc/tree/master/tcmalloc

[jrdr.linux@gmail.com: avoid possible memory leak in failure path]
  Link: https://lkml.kernel.org/r/20220713024109.62810-1-jrdr.linux@gmail.com
[zokeefe@google.com add missing kfree() to madvise_collapse()]
  Link: https://lore.kernel.org/linux-mm/20220713024109.62810-1-jrdr.linux@gmail.com/
  Link: https://lkml.kernel.org/r/20220713161851.1879439-1-zokeefe@google.com
[zokeefe@google.com: delay computation of hpage boundaries until use]]
  Link: https://lkml.kernel.org/r/20220720140603.1958773-4-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220706235936.2197195-10-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
Suggested-by: David Rientjes <rientjes@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:46 -07:00
Linus Torvalds
2f23a7c914 Misc fixes:
- Fix PAT on Xen, which caused i915 driver failures
  - Fix compat INT 80 entry crash on Xen PV guests
  - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs
  - Fix RSB stuffing regressions
  - Fix ORC unwinding on ftrace trampolines
  - Add Intel Raptor Lake CPU model number
  - Fix (work around) a SEV-SNP bootloader bug providing bogus values in
    boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups.
  - Fix SEV-SNP early boot failure
  - Fix the objtool list of noreturn functions and annotate snp_abort(),
    which bug confused objtool on gcc-12.
  - Fix the documentation for retbleed
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmMLgUERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hU1hAAj9bKO+D07gROpkRhXpLvbqm+mUqk6It8
 qEuyLkC/xvD9N//Pya0ZPPE+l23EHQxM/L0tXCYwsc8Zlx3fr678FQktHZuxULaP
 qlGmwub8PvsyMesXBGtgHJ4RT8L07FelBR+E/TpqCSbv/geM7mcc0ojyOKo+OmbD
 vbsUTDNqsMYndzePUBrv/vJRqVLGlbd1a22DKE/YiYBbZu5hughNUB0bHYhYdsml
 mutcRsVTKRbZOi4wiuY/pnveMf/z9wAwKOWd9EBaDWILygVSHkBJJQyhHigBh3F8
 H1hFn1q4ybULdrAUT4YND9Tjb6U8laU0fxg8Z43bay7bFXXElqoj3W2qka9NjAim
 QvWSFvTYQRTIWt6sCshVsUBWj6fBZdHxcJ8Fh+ucJJp+JWl9/aD0A41vK2Jx0LIt
 p8YzBRKEqd1Q/7QD855BArN7HNtQGgShNt4oPVZ7nPnjuQt0+lngu8NR+6X3RKpX
 r8rordyPvzgPL6W+1uV+8hnz1w+YU2xplAbK+zTijwgJVgyf8khSlZQNpFKULrou
 zjtzo/2nB+4C4bvfetNnaOGhi1/AdCHZHyZE35rotpd73SLHvdOrH0Ll9oCVfVrC
 UWbC1E67cHQw97Ni/4CrCsJRBULK01uyszVCxlEkSYInf0UsKlnmy+TqxizwsCVy
 reYQ0ePyWg0=
 =ZpCJ
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Fix PAT on Xen, which caused i915 driver failures

 - Fix compat INT 80 entry crash on Xen PV guests

 - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs

 - Fix RSB stuffing regressions

 - Fix ORC unwinding on ftrace trampolines

 - Add Intel Raptor Lake CPU model number

 - Fix (work around) a SEV-SNP bootloader bug providing bogus values in
   boot_params->cc_blob_address, by ignoring the value on !SEV-SNP
   bootups.

 - Fix SEV-SNP early boot failure

 - Fix the objtool list of noreturn functions and annotate snp_abort(),
   which bug confused objtool on gcc-12.

 - Fix the documentation for retbleed

* tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/ABI: Mention retbleed vulnerability info file for sysfs
  x86/sev: Mark snp_abort() noreturn
  x86/sev: Don't use cc_platform_has() for early SEV-SNP calls
  x86/boot: Don't propagate uninitialized boot_params->cc_blob_address
  x86/cpu: Add new Raptor Lake CPU model number
  x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry
  x86/nospec: Fix i386 RSB stuffing
  x86/nospec: Unwreck the RSB stuffing
  x86/bugs: Add "unknown" reporting for MMIO Stale Data
  x86/entry: Fix entry_INT80_compat for Xen PV guests
  x86/PAT: Have pat_enabled() properly reflect state when running on Xen
2022-08-28 10:10:23 -07:00
Linus Torvalds
4459d800f7 Misc fixes: an Arch-LBR fix, a PEBS enumeration fix, an Intel DS fix,
PEBS constraints fix on Alder Lake CPUs and an Intel uncore PMU fix.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmMLfEERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iUdxAAqWLRHp1JQlANJxbdwmJu/PMwjlhXLn63
 w71UPXou172jEJWk6PxEkllMLfJBAe1hL0CW2VE1DlGFTfzOTwBtylLz8frhF5am
 4smCwAppGzK/r6gOABwhgPG/rbU5TJRhjmRMkPmfeOmFZSD/L4DHcDu8HbG4ruz9
 lhgKnB+TUmNBLyYQ7oqfnNsGNI5uuyJhzA8/kddPgEkK5XeebCxqZCXDRSp9LbUg
 4BkGCB2R+MfiHlttCGzOKkW+dQafA+pUQMfoZHSFJ30lB7UuvpsVl5FTiXQ5cu+f
 TGkjyBIzkNqNJHrRebQ3kkLYY6rlTgJTvrk7QdnWY2sb1J6B0ktxqBd+DG47Pc9S
 IVOe66ikoVnV/Bws6mFxN8Kj/U4L38M+373hdUvyQd8kwuvy4c6fXGFZqfF8VCMf
 zHZQJR8eeOKP1EAOIE7tDb2eY+pWnherZlm3VsHYZLtOfhDepZH6bvRWO2wXbl3I
 R+Wr/PYZih2AzcvHj+CtpS8jKFAvBG6rbqlElWZ9ain0TrV0uMuI8I3HQ9WqMa5H
 sPukVqtOczxXSMgTCilHK5S+ymq2xOHdRgo2FZUIP/5SllMfiWpYFRdaS8oh2K/j
 M6zyc5kXajSeHdSKc/2O8imkcurNN2vgdXVDsIzZGqw6phFepMVmsXYeRD9msQDT
 aZTvVfOmvvQ=
 =I/UD
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf fixes from Ingo Molnar:
 "Misc fixes: an Arch-LBR fix, a PEBS enumeration fix, an Intel DS fix,
  PEBS constraints fix on Alder Lake CPUs and an Intel uncore PMU fix"

* tag 'perf-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU
  perf/x86/intel: Fix pebs event constraints for ADL
  perf/x86/intel/ds: Fix precise store latency handling
  perf/x86/core: Set pebs_capable and PMU_FL_PEBS_ALL for the Baseline
  perf/x86/lbr: Enable the branch type for the Arch LBR by default
2022-08-28 10:05:42 -07:00
Linus Torvalds
dee1873796 s390 updates for 6.0-rc3
- Fix double free of guarded storage and runtime instrumentation control
   blocks on fork() failure.
 
 - Fix triggering write fault when VMA does not allow VM_WRITE.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmMJ6sYACgkQjYWKoQLX
 FBiH2Af/YzyZO+MgkXEGpMDp7zp7G1H3m7OYfA9onnRruf5PKppWYR/0eQRRYbat
 C1+9LyIQO2PjMG0RoWroxX7DYG9HR027nt1D+J1xPk82UgGvegDkky3oH3c75wFb
 oxjjdSI+Rsb4VEo1yIaKfl+lEsxa+rqXB3wvSsrwMFs9OHJYQo4eK62WUAFwcDoq
 saZ5xu82GhACTETpzCFnVHdGyWdq4BfpQPV1xIQDt8V+8y/yK4/dnB8coGEJL6I1
 cBVPUXltCwXeVoeMTVzzD6qUU0TK9nwHmJneSJ/Afj2w2XfAfJlNYpBuSEUSwHS4
 pD9DzBKy8YQn9ecDmqnDN7zY5n1fFw==
 =wHSL
 -----END PGP SIGNATURE-----

Merge tag 's390-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - Fix double free of guarded storage and runtime instrumentation
   control blocks on fork() failure

 - Fix triggering write fault when VMA does not allow VM_WRITE

* tag 's390-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: do not trigger write fault when vma does not allow VM_WRITE
  s390: fix double free of GS and RI CBs on fork() failure
2022-08-27 15:40:51 -07:00
Linus Torvalds
05519f2480 xen: branch for v6.0-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYwnVogAKCRCAXGG7T9hj
 vkvCAP97gXbhilg8GkmhSgwsn8XAnR3PFWonR+PA/NBQwsrx4AD/by18f7ICyOCl
 /r2PHDjYSiyMjwBSjrdm5Uo/kNBaqAM=
 =cIk8
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - two minor cleanups

 - a fix of the xen/privcmd driver avoiding a possible NULL dereference
   in an error case

* tag 'for-linus-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/privcmd: fix error exit of privcmd_ioctl_dm_op()
  xen: move from strlcpy with unused retval to strscpy
  xen: x86: remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY
2022-08-27 15:38:00 -07:00
Mikulas Patocka
d6ffe6067a provide arch_test_bit_acquire for architectures that define test_bit
Some architectures define their own arch_test_bit and they also need
arch_test_bit_acquire, otherwise they won't compile.  We also clean up
the code by using the generic test_bit if that is equivalent to the
arch-specific version.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 8238b4579866 ("wait_on_bit: add an acquire memory barrier")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-27 09:49:54 -07:00
Stephane Eranian
11745ecfe8 perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU
Existing code was generating bogus counts for the SNB IMC bandwidth counters:

$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
     1.000327813           1,024.03 MiB  uncore_imc/data_reads/
     1.000327813              20.73 MiB  uncore_imc/data_writes/
     2.000580153         261,120.00 MiB  uncore_imc/data_reads/
     2.000580153              23.28 MiB  uncore_imc/data_writes/

The problem was introduced by commit:
  07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")

Where the read_counter callback was replace to point to the generic
uncore_mmio_read_counter() function.

The SNB IMC counters are freerunnig 32-bit counters laid out contiguously in
MMIO. But uncore_mmio_read_counter() is using a readq() call to read from
MMIO therefore reading 64-bit from MMIO. Although this is okay for the
uncore_perf_event_update() function because it is shifting the value based
on the actual counter width to compute a delta, it is not okay for the
uncore_pmu_event_start() which is simply reading the counter  and therefore
priming the event->prev_count with a bogus value which is responsible for
causing bogus deltas in the perf stat command above.

The fix is to reintroduce the custom callback for read_counter for the SNB
IMC PMU and use readl() instead of readq(). With the change the output of
perf stat is back to normal:
$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
     1.000120987             296.94 MiB  uncore_imc/data_reads/
     1.000120987             138.42 MiB  uncore_imc/data_writes/
     2.000403144             175.91 MiB  uncore_imc/data_reads/
     2.000403144              68.50 MiB  uncore_imc/data_writes/

Fixes: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com
2022-08-27 00:05:38 +02:00
Linus Torvalds
e022620b5d arm64 fixes for -rc3
- Fix workaround for Cortex-A76 erratum #1286807
 
 - Add workaround for AMU erratum #2457168 on Cortex-A510
 
 - Drop reference to removed CONFIG_ARCH_RANDOM #define
 
 - Fix parsing of the "rodata=full" cmdline option
 
 - Fix a bunch of issues in the SME register state switching and sigframe code
 
 - Fix incorrect extraction of the CTR_EL0.CWG register field
 
 - Fix ACPI cache topology probing when the PPTT is not present
 
 - Trivial comment and whitespace fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmMIr4YQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNOpcB/9PQM3DtbFcCFJeetFxJYVSPHikHfEj4SHY
 H6NRNLA9JwwCqA1/RKbJQDpFS34JdL5UgtNuUFbg4zkH2aaeML7WKuRg396jv2ke
 GmHpOkNeIaae0NQes3MLZQf+Heh2oimRYwlPXJc23vhAIZagwby3yR/9POrxKm6p
 kamLIrmwU1mU7pwr7HEkRr1HOd/eOJ+q6P3UYrFlAgAp5PEfVGgcdjbHJ1gY1KWw
 HwR6s2yWQ71+Aug9wMS56TQszBedyuIOeKezAI+IbThF6t/kQlfJna+hqvPp0zEE
 mz1hss7ACpfDURdga35B1bE146aZ1SKEsXXaB853JI76K6D534Y9
 =3oRd
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "A bumper crop of arm64 fixes for -rc3.

  The largest change is fixing our parsing of the 'rodata=full' command
  line option, which kstrtobool() started treating as 'rodata=false'.
  The fix actually makes the parsing of that option much less fragile
  and updates the documentation at the same time.

  We still have a boot issue pending when KASLR is disabled at compile
  time, but there's a fresh fix on the list which I'll send next week if
  it holds up to testing.

  Summary:

   - Fix workaround for Cortex-A76 erratum #1286807

   - Add workaround for AMU erratum #2457168 on Cortex-A510

   - Drop reference to removed CONFIG_ARCH_RANDOM #define

   - Fix parsing of the "rodata=full" cmdline option

   - Fix a bunch of issues in the SME register state switching and sigframe code

   - Fix incorrect extraction of the CTR_EL0.CWG register field

   - Fix ACPI cache topology probing when the PPTT is not present

   - Trivial comment and whitespace fixes"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/sme: Don't flush SVE register state when handling SME traps
  arm64/sme: Don't flush SVE register state when allocating SME storage
  arm64/signal: Flush FPSIMD register state when disabling streaming mode
  arm64/signal: Raise limit on stack frames
  arm64/cache: Fix cache_type_cwg() for register generation
  arm64/sysreg: Guard SYS_FIELD_ macros for asm
  arm64/sysreg: Directly include bitfield.h
  arm64: cacheinfo: Fix incorrect assignment of signed error value to unsigned fw_level
  arm64: errata: add detection for AMEVCNTR01 incrementing incorrectly
  arm64: fix rodata=full
  arm64: Fix comment typo
  docs/arm64: elf_hwcaps: unify newlines in HWCAP lists
  arm64: adjust KASLR relocation after ARCH_RANDOM removal
  arm64: Fix match_list for erratum 1286807 on Arm Cortex-A76
2022-08-26 11:32:53 -07:00
Linus Torvalds
012bd7e859 RISC-V Fixes for 6.0-rc3
* A handful of fixes for the Microchip device trees.
 * A pair of fixes to eliminate build warnings.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAmMI390THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQTjPD/sFAumM94gfE2ofdqUNr4BMVI7Bv0i8
 Y/S9ZEnyJG2nkhA4xODSYpgrwAopVyMbxhg76SLYe2LesIJzidfy4eg4mamyoe1A
 eqK8gtS9QiWU0EY1DjqkhDk0+OsRv31LUtRivesrwWsXEcBjUAo9cuqj8nYPwOBp
 635fRtDvZFJeZ8W0Vnb/f6E57wmt1TwmiwUTvtpQ7LSACHfmUAjFYLVfnPwCyH46
 MJNk1EUw8KiVHICMZRSxxTfzFqWX+E7u+a8aYdT2UqmJ96ejRlGxEg1WFdoSxH9w
 E1kVZ8IQkH17X8Tqn0exJ1naZWS5xvvxEPxmuICpNken3PL/5Q0jiSxspihi/u14
 KY+reJsbqCYhd0bmS88npnY3X0yvsRnYkYdg/bwQFqEpBuPtW5qIJjAeEV75YCEx
 97keJzkcedCv86hrserYKne2/vWnTETbpSKSQ0TdjrCeUGwOmozOwy5ZV8AouWo9
 S81xH6NwrCNqQOpVG0GqFAk47vhOqgZVKwDE5NQsyrW2SFACgRfzN5sMsEIfyFV2
 lsf9qf1lmgoFa9I9a/uHdx/V2A3CADYZ5rpe0fnQk6mISdvU6tJhSySyg98TUisc
 pkipkbqewEmGDw7lk7TCMoa44tDjzJoINi6lq5q9W0VPi1r3JoRpHqf4ZJ6CBJfJ
 Nc1f+TISDqngGw==
 =bqa6
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A handful of fixes for the Microchip device trees

 - A pair of fixes to eliminate build warnings

* tag 'riscv-for-linus-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: dts: microchip: mpfs: remove pci axi address translation property
  riscv: dts: microchip: mpfs: remove bogus card-detect-delay
  riscv: dts: microchip: mpfs: remove ti,fifo-depth property
  riscv: dts: microchip: mpfs: fix incorrect pcie child node name
  riscv: traps: add missing prototype
  riscv: signal: fix missing prototype warning
  riscv: dts: microchip: correct L2 cache interrupts
2022-08-26 11:26:27 -07:00
Linus Torvalds
c23f864dc7 LoongArch fixes for v6.0-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmMHXw0WHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImep6AD/9lmv4L05lYbMKbmYx9kOJJrqxt
 HDhm3h041yQhLo/HDabChoOBrOLo5jP7HA8iJZPq+PLYwsxCutQc/HwoClbsIYfe
 ii/KKS2s0+y+zhtHW3FFjEEFUR1J8OqHDfRPodMEXlEhbDUbWf8IxXLOyJ4qaoP2
 4bNAx0oZwxMLx8P+cA/Tr7HOCL1tw5pTxWQfK+C7XM+QdpUJRTJjx4UA2qqTr/KF
 HE96g+P9U1GnOZYuhSmtVntSNb1JkrWFpXZMLAlW+pZbcTEMpiUxP79vg0wFNR2r
 V4BwNTQlGj41rrHEOKRcjSbMp08o/35CSIBMKRLcWrORgRMg1vVmie3eSSjMzxtu
 nZCPacBTM1rW4MHe+FFXq6aWf1JCI/nyHuwGCh2hsSwtUczS6t8eANOKBOatBqV4
 0qF/VStdrYk9ZotKH46AiOrqZZOZmsB1NiqruZeoSllrrZVafQAj7g3/Dh//EtBk
 IhgyKx4I4Ua1O36qL3P32KynbLIuzg+qhBAZj7MnvQ7YAMBClrCpFU6e/7DFjPS2
 vNDqVje4pTySd+N3SrJv7PqI6sOliPlBbY0bFcocQAUGOBmU7+w+relhU2bLVzFz
 BDaBFrTiuxreYxUd/8Vv0vr2aLBHZIcXDs54o8ppNMhS2PsV/LnyvXSesN5BoByp
 CyhDHpMwaFMfjFjBWg==
 =b5Bu
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-fixes-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Fix a bunch of build errors/warnings, a poweroff error and an
  unbalanced locking in do_page_fault()"

* tag 'loongarch-fixes-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: mm: Avoid unnecessary page fault retires on shared memory types
  LoongArch: Add subword xchg/cmpxchg emulation
  LoongArch: Cleanup headers to avoid circular dependency
  LoongArch: Cleanup reset routines with new API
  LoongArch: Fix build warnings in VDSO
  LoongArch: Select PCI_QUIRKS to avoid build error
2022-08-26 11:21:18 -07:00
Mikulas Patocka
8238b45798 wait_on_bit: add an acquire memory barrier
There are several places in the kernel where wait_on_bit is not followed
by a memory barrier (for example, in drivers/md/dm-bufio.c:new_read).

On architectures with weak memory ordering, it may happen that memory
accesses that follow wait_on_bit are reordered before wait_on_bit and
they may return invalid data.

Fix this class of bugs by introducing a new function "test_bit_acquire"
that works like test_bit, but has acquire memory ordering semantics.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-26 09:30:25 -07:00
Palmer Dabbelt
1709c70c31
Merge branch 'riscv-variable_fixes_without_kvm' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git into fixes
This contains a pair of fixes for build-time warnings.

* 'riscv-variable_fixes_without_kvm' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git:
  riscv: traps: add missing prototype
  riscv: signal: fix missing prototype warning
2022-08-25 16:38:01 -07:00
Palmer Dabbelt
92e55a865b Microchip RISC-V devicetree fixes for 6.0-rc3
Two sets of fixes this time around:
 - A fix for the interrupt ordering of the l2-cache controller. If the
   driver is enabled, it would spam the console /constantly/, rendering
   the system useless.
 - General cleanup for some bogus properties in the dt, part of my quest
   for zero dtbs_check warnings.
 
 On that note, the interrupt ordering adds a dtbs_check warning - but I
 considered that fixing the potentially useless system was more of a
 priority.
 
 Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRh246EGq/8RLhDjO14tDGHoIJi0gUCYwelHAAKCRB4tDGHoIJi
 0kSJAPwLvBBdH7lEOdM5NyEctyqa0pqMOPNKG3+7/VOK5rL++wD/Uqf7tNDaXSeo
 Qp3hfJQ16p853bdz+xLSf/HXguIf0QI=
 =zdu9
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAmMIBsQTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQRLDEACz6Tm/vxs7Q8rM+KmEdlUQU8vyEl2l
 34KAZsoLfbEcmg0/YKbQYCi+m4xLziXHUgeTdEMv1wAUfpi+RuX8sBVachM9Obo1
 s2q+Trg+Y44WKKtgK4i8J+OHX9sbBDEScT8swmApFKMtoYZEYjS0N44WrOWDSc5j
 xXTyQ2VJJvW7FNawZ58pktAGk/mRPNAzLqq/sR2igg/M5lDFY28LyM5Q9QfQ71bg
 M/lgoo1X1EYwE2RCOn1jYWJmBlx58N9+IiUBGwK+sa6UvpUlI6mPvrJinNSSUFkZ
 tqoMD/FlOcwP6W2IHvGBPWh0LJs/2RV/FlL97tYStrbB3vnjX4HiSC2/GxQSRNC4
 UFwGGP8rZpIwIPORZN4h6U7UoZQaP7lN5BwXJFLQ3OvB0y0UXyaphUDOmSqV8aPE
 9mk2jPZhamxB8UxEH9IGJRGPYCnboaAF5uMJw17GuxKtke9CGlnWLszMHxcjKpOi
 l0zBuMUnw0wWqqkJRy95eL3BAH0zcA4WC8HAM1fcSmVQaRYDjS2qQC4xpJNaP7Rg
 br0h2C8fDrqKis0WkGsKrPJbrk4q6l1zv798zyOiGE3GkfmLHmHiNDriREPmC/bP
 6BBykEGXVt+XcBdlpRqtb1fT7rTfFGDpFkyZJ1XFclHVPijwJ18C02Ptd+3v2d22
 W2Ch+Q0GLe73ZA==
 =O5yv
 -----END PGP SIGNATURE-----

Merge tag 'dt-fixes-for-palmer-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git into fixes

Microchip RISC-V devicetree fixes for 6.0-rc3

Two sets of fixes this time around:
- A fix for the interrupt ordering of the l2-cache controller. If the
  driver is enabled, it would spam the console /constantly/, rendering
  the system useless.
- General cleanup for some bogus properties in the dt, part of my quest
  for zero dtbs_check warnings.

On that note, the interrupt ordering adds a dtbs_check warning - but I
considered that fixing the potentially useless system was more of a
priority.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>

* tag 'dt-fixes-for-palmer-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git:
  riscv: dts: microchip: mpfs: remove pci axi address translation property
  riscv: dts: microchip: mpfs: remove bogus card-detect-delay
  riscv: dts: microchip: mpfs: remove ti,fifo-depth property
  riscv: dts: microchip: mpfs: fix incorrect pcie child node name
  riscv: dts: microchip: correct L2 cache interrupts
2022-08-25 16:32:39 -07:00
Borislav Petkov
c93c296fff x86/sev: Mark snp_abort() noreturn
Mark both the function prototype and definition as noreturn in order to
prevent the compiler from doing transformations which confuse objtool
like so:

  vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction

This triggers with gcc-12.

Add it and sev_es_terminate() to the objtool noreturn tracking array
too. Sort it while at it.

Suggested-by: Michael Matz <matz@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220824152420.20547-1-bp@alien8.de
2022-08-25 15:54:03 +02:00
Gerald Schaefer
41ac42f137 s390/mm: do not trigger write fault when vma does not allow VM_WRITE
For non-protection pXd_none() page faults in do_dat_exception(), we
call do_exception() with access == (VM_READ | VM_WRITE | VM_EXEC).
In do_exception(), vma->vm_flags is checked against that before
calling handle_mm_fault().

Since commit 92f842eac7ee3 ("[S390] store indication fault optimization"),
we call handle_mm_fault() with FAULT_FLAG_WRITE, when recognizing that
it was a write access. However, the vma flags check is still only
checking against (VM_READ | VM_WRITE | VM_EXEC), and therefore also
calling handle_mm_fault() with FAULT_FLAG_WRITE in cases where the vma
does not allow VM_WRITE.

Fix this by changing access check in do_exception() to VM_WRITE only,
when recognizing write access.

Link: https://lkml.kernel.org/r/20220811103435.188481-3-david@redhat.com
Fixes: 92f842eac7ee3 ("[S390] store indication fault optimization")
Cc: <stable@vger.kernel.org>
Reported-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-08-25 15:12:32 +02:00
Brian Foster
13cccafe0e s390: fix double free of GS and RI CBs on fork() failure
The pointers for guarded storage and runtime instrumentation control
blocks are stored in the thread_struct of the associated task. These
pointers are initially copied on fork() via arch_dup_task_struct()
and then cleared via copy_thread() before fork() returns. If fork()
happens to fail after the initial task dup and before copy_thread(),
the newly allocated task and associated thread_struct memory are
freed via free_task() -> arch_release_task_struct(). This results in
a double free of the guarded storage and runtime info structs
because the fields in the failed task still refer to memory
associated with the source task.

This problem can manifest as a BUG_ON() in set_freepointer() (with
CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled)
when running trinity syscall fuzz tests on s390x. To avoid this
problem, clear the associated pointer fields in
arch_dup_task_struct() immediately after the new task is copied.
Note that the RI flag is still cleared in copy_thread() because it
resides in thread stack memory and that is where stack info is
copied.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Fixes: 8d9047f8b967c ("s390/runtime instrumentation: simplify task exit handling")
Fixes: 7b83c6297d2fc ("s390/guarded storage: simplify task exit handling")
Cc: <stable@vger.kernel.org> # 4.15
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20220816155407.537372-1-bfoster@redhat.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-08-25 15:12:32 +02:00
Lukas Bulwahn
ab0af755d4 xen: x86: remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY
Commit c70727a5bc18 ("xen: allow more than 512 GB of RAM for 64 bit
pv-domains") from July 2015 replaces the config XEN_MAX_DOMAIN_MEMORY with
a new config XEN_512GB, but misses to adjust arch/x86/configs/xen.config.
As XEN_512GB defaults to yes, there is no need to explicitly set any config
in xen.config.

Just remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220817044333.22310-1-lukas.bulwahn@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-08-25 13:38:03 +02:00
Huacai Chen
b83699ea1e LoongArch: mm: Avoid unnecessary page fault retires on shared memory types
Commit d92725256b4f22d0 ("mm: avoid unnecessary page fault retires on
shared memory types") modifies do_page_fault() to handle the VM_FAULT_
COMPLETED case, but forget to change for LoongArch, so fix it as other
architectures does.

Fixes: d92725256b4f22d0 ("mm: avoid unnecessary page fault retires on shared memory types")
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25 19:34:59 +08:00
Huacai Chen
720dc7ab25 LoongArch: Add subword xchg/cmpxchg emulation
LoongArch only support 32-bit/64-bit xchg/cmpxchg in native. But percpu
operation, qspinlock and some drivers need 8-bit/16-bit xchg/cmpxchg. We
add subword xchg/cmpxchg emulation in this patch because the emulation
has better performance than the generic implementation (on NUMA system),
and it can fix some build errors meanwhile [1].

LoongArch's guarantee for forward progress (avoid many ll/sc happening
at the same time and no one succeeds):

We have the "exclusive access (with timeout) of ll" feature to avoid
simultaneous ll (which also blocks other memory load/store on the same
address), and the "random delay of sc" feature to avoid simultaneous
sc. It is a mandatory requirement for multi-core LoongArch processors
to implement such features, only except those single-core and dual-core
processors (they also don't support multi-chip interconnection).

Feature bits are introduced in CPUCFG3, bit 3 and bit 4 [2].

[1] https://lore.kernel.org/loongarch/CAAhV-H6vvkuOzy8OemWdYK3taj5Jn3bFX0ZTwE=twM8ywpBUYA@mail.gmail.com/T/#t
[2] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#_cpucfg

Reported-by: Sudip Mukherjee (Codethink) <sudipm.mukherjee@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rui Wang <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25 19:34:59 +08:00
Huacai Chen
092e9ebe52 LoongArch: Cleanup headers to avoid circular dependency
When enable GENERIC_IOREMAP, there will be circular dependency to cause
build errors. The root cause is that pgtable.h shouldn't include io.h
but pgtable.h need some macros defined in io.h. So cleanup those macros
and remove the unnecessary inclusions, as other architectures do.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25 19:34:59 +08:00
Huacai Chen
da48b67cfb LoongArch: Cleanup reset routines with new API
Cleanup reset routines by using new do_kernel_power_off() instead of old
pm_power_off(), and then simplify the whole file (reset.c) organization
by inlining some functions. This cleanup also fix a poweroff error if EFI
runtime is disabled.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25 19:34:59 +08:00
Huacai Chen
84e7620601 LoongArch: Fix build warnings in VDSO
Fix build warnings in VDSO as below:

arch/loongarch/vdso/vgettimeofday.c:9:5: warning: no previous prototype for '__vdso_clock_gettime' [-Wmissing-prototypes]
    9 | int __vdso_clock_gettime(clockid_t clock,
      |     ^~~~~~~~~~~~~~~~~~~~
arch/loongarch/vdso/vgettimeofday.c:15:5: warning: no previous prototype for '__vdso_gettimeofday' [-Wmissing-prototypes]
   15 | int __vdso_gettimeofday(struct __kernel_old_timeval *tv,
      |     ^~~~~~~~~~~~~~~~~~~
arch/loongarch/vdso/vgettimeofday.c:21:5: warning: no previous prototype for '__vdso_clock_getres' [-Wmissing-prototypes]
   21 | int __vdso_clock_getres(clockid_t clock_id,
      |     ^~~~~~~~~~~~~~~~~~~
arch/loongarch/vdso/vgetcpu.c:27:5: warning: no previous prototype for '__vdso_getcpu' [-Wmissing-prototypes]
   27 | int __vdso_getcpu(unsigned int *cpu, unsigned int *node, struct getcpu_cache *unused)

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25 19:34:59 +08:00
Huacai Chen
7c12bb8f59 LoongArch: Select PCI_QUIRKS to avoid build error
PCI_LOONGSON is a mandatory for LoongArch and it is selected in Kconfig
unconditionally, but its dependency PCI_QUIRKS is missing and may cause
a build error when "make randconfig":

   arch/loongarch/pci/acpi.c: In function 'pci_acpi_setup_ecam_mapping':
>> arch/loongarch/pci/acpi.c:103:29: error: 'loongson_pci_ecam_ops' undeclared (first use in this function)
     103 |                 ecam_ops = &loongson_pci_ecam_ops;
         |                             ^~~~~~~~~~~~~~~~~~~~~
   arch/loongarch/pci/acpi.c:103:29: note: each undeclared identifier is reported only once for each function it appears in

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for PCI_LOONGSON
   Depends on [n]: PCI [=y] && (MACH_LOONGSON64 [=y] || COMPILE_TEST [=y]) && (OF [=y] || ACPI [=y]) && PCI_QUIRKS [=n]
   Selected by [y]:
   - LOONGARCH [=y]

Fix it by selecting PCI_QUIRKS unconditionally, too.

Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25 19:34:59 +08:00
Tom Lendacky
cdaa0a407f x86/sev: Don't use cc_platform_has() for early SEV-SNP calls
When running identity-mapped and depending on the kernel configuration,
it is possible that the compiler uses jump tables when generating code
for cc_platform_has().

This causes a boot failure because the jump table uses un-mapped kernel
virtual addresses, not identity-mapped addresses. This has been seen
with CONFIG_RETPOLINE=n.

Similar to sme_encrypt_kernel(), use an open-coded direct check for the
status of SNP rather than trying to eliminate the jump table. This
preserves any code optimization in cc_platform_has() that can be useful
post boot. It also limits the changes to SEV-specific files so that
future compiler features won't necessarily require possible build changes
just because they are not compatible with running identity-mapped.

  [ bp: Massage commit message. ]

Fixes: 5e5ccff60a29 ("x86/sev: Add helper for validating pages in early enc attribute changes")
Reported-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 5.19.x
Link: https://lore.kernel.org/all/YqfabnTRxFSM+LoX@google.com/
2022-08-24 09:54:32 +02:00
Michael Roth
4b1c742407 x86/boot: Don't propagate uninitialized boot_params->cc_blob_address
In some cases, bootloaders will leave boot_params->cc_blob_address
uninitialized rather than zeroing it out. This field is only meant to be
set by the boot/compressed kernel in order to pass information to the
uncompressed kernel when SEV-SNP support is enabled.

Therefore, there are no cases where the bootloader-provided values
should be treated as anything other than garbage. Otherwise, the
uncompressed kernel may attempt to access this bogus address, leading to
a crash during early boot.

Normally, sanitize_boot_params() would be used to clear out such fields
but that happens too late: sev_enable() may have already initialized
it to a valid value that should not be zeroed out. Instead, have
sev_enable() zero it out unconditionally beforehand.

Also ensure this happens for !CONFIG_AMD_MEM_ENCRYPT as well by also
including this handling in the sev_enable() stub function.

  [ bp: Massage commit message and comments. ]

Fixes: b190a043c49a ("x86/sev: Add SEV-SNP feature detection/setup")
Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: watnuss@gmx.de
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216387
Link: https://lore.kernel.org/r/20220823160734.89036-1-michael.roth@amd.com
2022-08-24 09:03:04 +02:00
Conor Dooley
e4009c5fa7 riscv: dts: microchip: mpfs: remove pci axi address translation property
An AXI master address translation table property was inadvertently
added to the device tree & this was not caught by dtbs_check at the
time. Remove the property - it should not be in mpfs.dtsi anyway as
it would be more suitable in -fabric.dtsi nor does it actually apply
to the version of the reference design we are using for upstream.

Link: https://www.microsemi.com/document-portal/doc_download/1245812-polarfire-fpga-and-polarfire-soc-fpga-pci-express-user-guide # Section 1.3.3
Fixes: 528a5b1f2556 ("riscv: dts: microchip: add new peripherals to icicle kit device tree")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2022-08-23 22:15:55 +01:00
Conor Dooley
2b55915d27 riscv: dts: microchip: mpfs: remove bogus card-detect-delay
Recent versions of dt-schema warn about a previously undetected
undocumented property:
arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dtb: mmc@20008000: Unevaluated properties are not allowed ('card-detect-delay' was unexpected)
        From schema: Documentation/devicetree/bindings/mmc/cdns,sdhci.yaml

There are no GPIOs connected to MSSIO6B4 pin K3 so adding the common
cd-debounce-delay-ms property makes no sense. The Cadence IP has a
register that sets the card detect delay as "DP * tclk". On MPFS, this
clock frequency is not configurable (it must be 200 MHz) & the FPGA
comes out of reset with this register already set.

Fixes: bc47b2217f24 ("riscv: dts: microchip: add the sundance polarberry")
Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE board")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2022-08-23 22:15:54 +01:00
Conor Dooley
72a05748cb riscv: dts: microchip: mpfs: remove ti,fifo-depth property
Recent versions of dt-schema warn about a previously undetected
undocument property on the icicle & polarberry devicetrees:

arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dtb: ethernet@20112000: ethernet-phy@8: Unevaluated properties are not allowed ('ti,fifo-depth' was unexpected)
        From schema: Documentation/devicetree/bindings/net/cdns,macb.yaml

I know what you're thinking, the binding doesn't look to be the problem
and I agree. I am not sure why a TI vendor property was ever actually
added since it has no meaning... just get rid of it.

Fixes: bc47b2217f24 ("riscv: dts: microchip: add the sundance polarberry")
Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE board")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2022-08-23 22:15:54 +01:00
Conor Dooley
3f67e69976 riscv: dts: microchip: mpfs: fix incorrect pcie child node name
Recent versions of dt-schema complain about the PCIe controller's child
node name:
arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dtb: pcie@2000000000: Unevaluated properties are not allowed ('clock-names', 'clocks', 'legacy-interrupt-controller', 'microchip,axi-m-atr0' were unexpected)
            From schema: Documentation/devicetree/bindings/pci/microchip,pcie-host.yaml
Make the dts match the correct property name in the dts.

Fixes: 528a5b1f2556 ("riscv: dts: microchip: add new peripherals to icicle kit device tree")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2022-08-23 22:15:54 +01:00
Linus Torvalds
df0219d11b parisc architecture fixes and updates for kernel v6.0-rc3:
* Fix emulation of fldw instruction on unaligned addresses
 * Fix "make ARCH=parisc64 randconfig" to return a 64-bit config
 * Prevent boot if trying to boot a 32-bit kernel compiled for PA8X00
   CPUs on 32-bit only machines
 * ccio-dma: Handle kmalloc failure in ccio_init_resources()
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCYwSs6AAKCRD3ErUQojoP
 X4Q5AQCHIg0vlkbDD8ckaCLMzQDnTUOhXzWjcuYRWnxQHJGE7gEAoLBtynPMLi4M
 J80exkaguje8XoAcROSqxNMYbsIQDgU=
 =CZXx
 -----END PGP SIGNATURE-----

Merge tag 'parisc-for-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "Some interesting background to the current patchset:

  It turned out that the fldw instruction (which loads a 32-bit word
  from memory into one half of a FP register) failed on unaligned
  addresses and even trashed some other random FP register instead. It's
  a trivial one-liner fix in the exception handler but this failure
  dates back to the very beginnings of the parisc-port. It's strange
  that it was never noticed before.

  Another patch fixes an annoyance noticed by Randy Dunlap. Running
  "make ARCH=parisc64 randconfig" always returned a 32-bit config,
  although one would expect a 64-bit config. Masahiro Yamada suggested
  to mimik sparc Kconfig code, which fixed the issue nicely. This
  allowed to drop some compiler build checks too.

  Third, it's possible to build an optimized 32-bit kernel for PA8X00
  (64-bit) CPUs, which then wouldn't start on 32-bit-only (PA1.x)
  machines. I've added a bootup check which prevents that and which
  prints a message to the console. This can be tested with qemu, which
  currently only supports 32-bit emulation.

  The other patches are usual clean-up stuff like added return value
  checks and typo fixes in comments.

  Summary:

   - Fix emulation of fldw instruction on unaligned addresses

   - Fix "make ARCH=parisc64 randconfig" to return a 64-bit config

   - Prevent boot if trying to boot a 32-bit kernel compiled for PA8X00
     CPUs on 32-bit only machines

   - ccio-dma: Handle kmalloc failure in ccio_init_resources()"

* tag 'parisc-for-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines
  parisc: ccio-dma: Handle kmalloc failure in ccio_init_resources()
  parisc: led: Move from strlcpy with unused retval to strscpy
  parisc: ccio-dma: Fix typo in comment
  Revert "parisc: Show error if wrong 32/64-bit compiler is being used"
  parisc: Make CONFIG_64BIT available for ARCH=parisc64 only
  parisc: Fix exception handler for fldw and fstw instructions
2022-08-23 13:42:31 -07:00
Tony Luck
ea902bcc19 x86/cpu: Add new Raptor Lake CPU model number
Note1: Model 0xB7 already claimed the "no suffix" #define for a regular
client part, so add (yet another) suffix "S" to distinguish this new
part from the earlier one.

Note2: the RAPTORLAKE* and ALDERLAKE* processors are very similar from a
software enabling point of view.  There are no known features that have
model-specific enabling and also differ between the two.  In other words,
every single place that list *one* or more RAPTORLAKE* or ALDERLAKE*
processors should list all of them.

Note3: This is being merged before there is an in-tree user.  Merging
this provides an "anchor" so that the different folks can update their
subsystems (like perf) in parallel to use this define and test it.

[ dhansen: add a note about why this has no in-tree users yet ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220823174819.223941-1-tony.luck@intel.com
2022-08-23 10:58:21 -07:00
Mark Brown
714f3cbd70 arm64/sme: Don't flush SVE register state when handling SME traps
Currently as part of handling a SME access trap we flush the SVE register
state. This is not needed and would corrupt register state if the task has
access to the SVE registers already. For non-streaming mode accesses the
required flushing will be done in the SVE access trap. For streaming
mode SVE register accesses the architecture guarantees that the register
state will be flushed when streaming mode is entered or exited so there is
no need for us to do so. Simply remove the register initialisation.

Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220817182324.638214-5-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:29:12 +01:00
Mark Brown
826a4fdd2a arm64/sme: Don't flush SVE register state when allocating SME storage
Currently when taking a SME access trap we allocate storage for the SVE
register state in order to be able to handle storage of streaming mode SVE.
Due to the original usage in a purely SVE context the SVE register state
allocation this also flushes the register state for SVE if storage was
already allocated but in the SME context this is not desirable. For a SME
access trap to be taken the task must not be in streaming mode so either
there already is SVE register state present for regular SVE mode which would
be corrupted or the task does not have TIF_SVE and the flush is redundant.

Fix this by adding a flag to sve_alloc() indicating if we are in a SVE
context and need to flush the state. Freshly allocated storage is always
zeroed either way.

Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220817182324.638214-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:29:11 +01:00
Mark Brown
ea64baacbc arm64/signal: Flush FPSIMD register state when disabling streaming mode
When handling a signal delivered to a context with streaming mode enabled
we will disable streaming mode for the signal handler, when doing so we
should also flush the saved FPSIMD register state like exiting streaming
mode in the hardware would do so that if that state is reloaded we get the
same behaviour. Without this we will reload whatever the last FPSIMD state
that was saved for the task was.

Fixes: 40a8e87bb328 ("arm64/sme: Disable ZA and streaming mode when handling signals")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220817182324.638214-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:29:11 +01:00
Mark Brown
7ddcaf78e9 arm64/signal: Raise limit on stack frames
The signal code has a limit of 64K on the size of a stack frame that it
will generate, if this limit is exceeded then a process will be killed if
it receives a signal. Unfortunately with the advent of SME this limit is
too small - the maximum possible size of the ZA register alone is 64K. This
is not an issue for practical systems at present but is easily seen using
virtual platforms.

Raise the limit to 256K, this is substantially more than could be used by
any current architecture extension.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220817182324.638214-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:29:11 +01:00
Mark Brown
53d2d84a1f arm64/cache: Fix cache_type_cwg() for register generation
Ard noticed that since we converted CTR_EL0 to automatic generation we have
been seeing errors on some systems handling the value of cache_type_cwg()
such as

   CPU features: No Cache Writeback Granule information, assuming 128

This is because the manual definition of CTR_EL0_CWG_MASK was done without
a shift while our convention is to define the mask after shifting. This
means that the user in cache_type_cwg() was broken as it was written for
the manually written shift then mask. Fix this by converting to use
SYS_FIELD_GET().

The only other field where the _MASK for this register is used is IminLine
which is at offset 0 so unaffected.

Fixes: 9a3634d02301 ("arm64/sysreg: Convert CTR_EL0 to automatic generation")
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220818213613.733091-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:26:01 +01:00
Mark Brown
a10edea4ef arm64/sysreg: Guard SYS_FIELD_ macros for asm
The SYS_FIELD_ macros are not safe for assembly contexts, move them inside
the guarded section.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220818213613.733091-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:26:00 +01:00
Mark Brown
02e483f8d4 arm64/sysreg: Directly include bitfield.h
The SYS_FIELD_ macros in sysreg.h use definitions from bitfield.h but there
is no direct inclusion of it, add one to ensure that sysreg.h is directly
usable.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220818213613.733091-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:26:00 +01:00
Sudeep Holla
e75d18cecb arm64: cacheinfo: Fix incorrect assignment of signed error value to unsigned fw_level
Though acpi_find_last_cache_level() always returned signed value and the
document states it will return any errors caused by lack of a PPTT table,
it never returned negative values before.

Commit 0c80f9e165f8 ("ACPI: PPTT: Leave the table mapped for the runtime usage")
however changed it by returning -ENOENT if no PPTT was found. The value
returned from acpi_find_last_cache_level() is then assigned to unsigned
fw_level.

It will result in the number of cache leaves calculated incorrectly as
a huge value which will then cause the following warning from __alloc_pages
as the order would be great than MAX_ORDER because of incorrect and huge
cache leaves value.

  |  WARNING: CPU: 0 PID: 1 at mm/page_alloc.c:5407 __alloc_pages+0x74/0x314
  |  Modules linked in:
  |  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-10393-g7c2a8d3ac4c0 #73
  |  pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  |  pc : __alloc_pages+0x74/0x314
  |  lr : alloc_pages+0xe8/0x318
  |  Call trace:
  |   __alloc_pages+0x74/0x314
  |   alloc_pages+0xe8/0x318
  |   kmalloc_order_trace+0x68/0x1dc
  |   __kmalloc+0x240/0x338
  |   detect_cache_attributes+0xe0/0x56c
  |   update_siblings_masks+0x38/0x284
  |   store_cpu_topology+0x78/0x84
  |   smp_prepare_cpus+0x48/0x134
  |   kernel_init_freeable+0xc4/0x14c
  |   kernel_init+0x2c/0x1b4
  |   ret_from_fork+0x10/0x20

Fix the same by changing fw_level to be signed integer and return the
error from init_cache_level() early in case of error.

Reported-and-Tested-by: Bruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20220808084640.3165368-1-sudeep.holla@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:10:24 +01:00
Ionela Voinescu
e89d120c4b arm64: errata: add detection for AMEVCNTR01 incrementing incorrectly
The AMU counter AMEVCNTR01 (constant counter) should increment at the same
rate as the system counter. On affected Cortex-A510 cores, AMEVCNTR01
increments incorrectly giving a significantly higher output value. This
results in inaccurate task scheduler utilization tracking and incorrect
feedback on CPU frequency.

Work around this problem by returning 0 when reading the affected counter
in key locations that results in disabling all users of this counter from
using it either for frequency invariance or as FFH reference counter. This
effect is the same to firmware disabling affected counters.

Details on how the two features are affected by this erratum:

 - AMU counters will not be used for frequency invariance for affected
   CPUs and CPUs in the same cpufreq policy. AMUs can still be used for
   frequency invariance for unaffected CPUs in the system. Although
   unlikely, if no alternative method can be found to support frequency
   invariance for affected CPUs (cpufreq based or solution based on
   platform counters) frequency invariance will be disabled. Please check
   the chapter on frequency invariance at
   Documentation/scheduler/sched-capacity.rst for details of its effect.

 - Given that FFH can be used to fetch either the core or constant counter
   values, restrictions are lifted regarding any of these counters
   returning a valid (!0) value. Therefore FFH is considered supported
   if there is a least one CPU that support AMUs, independent of any
   counters being disabled or affected by this erratum. Clarifying
   comments are now added to the cpc_ffh_supported(), cpu_read_constcnt()
   and cpu_read_corecnt() functions.

The above is achieved through adding a new erratum: ARM64_ERRATUM_2457168.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20220819103050.24211-1-ionela.voinescu@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:06:48 +01:00
Mark Rutland
2e8cff0a0e arm64: fix rodata=full
On arm64, "rodata=full" has been suppored (but not documented) since
commit:

  c55191e96caa9d78 ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well")

As it's necessary to determine the rodata configuration early during
boot, arm64 has an early_param() handler for this, whereas init/main.c
has a __setup() handler which is run later.

Unfortunately, this split meant that since commit:

  f9a40b0890658330 ("init/main.c: return 1 from handled __setup() functions")

... passing "rodata=full" would result in a spurious warning from the
__setup() handler (though RO permissions would be configured
appropriately).

Further, "rodata=full" has been broken since commit:

  0d6ea3ac94ca77c5 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")

... which caused strtobool() to parse "full" as false (in addition to
many other values not documented for the "rodata=" kernel parameter.

This patch fixes this breakage by:

* Moving the core parameter parser to an __early_param(), such that it
  is available early.

* Adding an (optional) arch hook which arm64 can use to parse "full".

* Updating the documentation to mention that "full" is valid for arm64.

* Having the core parameter parser handle "on" and "off" explicitly,
  such that any undocumented values (e.g. typos such as "ful") are
  reported as errors rather than being silently accepted.

Note that __setup() and early_param() have opposite conventions for
their return values, where __setup() uses 1 to indicate a parameter was
handled and early_param() uses 0 to indicate a parameter was handled.

Fixes: f9a40b089065 ("init/main.c: return 1 from handled __setup() functions")
Fixes: 0d6ea3ac94ca ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jagdish Gediya <jvgediya@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220817154022.3974645-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 11:02:02 +01:00
Kuan-Ying Lee
729a916599 arm64: Fix comment typo
Replace wrong 'FIQ EL1h' comment with 'FIQ EL1t'.

Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Link: https://lore.kernel.org/r/20220721030531.21234-1-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-08-23 10:53:34 +01:00
Helge Deller
591d2108f3 parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines
If a 32-bit kernel was compiled for PA2.0 CPUs, it won't be able to run
on machines with PA1.x CPUs. Add a check and bail out early if a PA1.x
machine is detected.

Signed-off-by: Helge Deller <deller@gmx.de>
2022-08-22 11:09:17 +02:00
Helge Deller
b4b18f47f4 Revert "parisc: Show error if wrong 32/64-bit compiler is being used"
This reverts commit b160628e9ebcdc85d0db9d7f423c26b3c7c179d0.

There is no need any longer to have this sanity check, because the
previous commit ("parisc: Make CONFIG_64BIT available for ARCH=parisc64
only") prevents that CONFIG_64BIT is set if ARCH==parisc.

Signed-off-by: Helge Deller <deller@gmx.de>
2022-08-22 11:09:17 +02:00
Helge Deller
3dcfb729b5 parisc: Make CONFIG_64BIT available for ARCH=parisc64 only
With this patch the ARCH= parameter decides if the
CONFIG_64BIT option will be set or not. This means, the
ARCH= parameter will give:

	ARCH=parisc	-> 32-bit kernel
	ARCH=parisc64	-> 64-bit kernel

This simplifies the usage of the other config options like
randconfig, allmodconfig and allyesconfig a lot and produces
the output which is expected for parisc64 (64-bit) vs. parisc (32-bit).

Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: <stable@vger.kernel.org> # 5.15+
2022-08-22 11:09:17 +02:00
Linus Torvalds
4daa6a81c0 Misc irqchip fixes: LoongArch driver fixes and a Hyper-V IOMMU fix.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmMCnLoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g9Yw/8DCoZnWpb0S1WBMgxBA/t1i4Y5+FHqF8B
 M/dbNqukryKsRaB70Lbif75691i3jV1E12X1apY9yP5weGEc4KDOhIVpcGLfN0L0
 39775FEtZ6Mncvifw7aZ8J9/hCr5ppQd7PC47nKgnNFA2m8DK9Uy6pNMWra/IMwx
 Aw39lfZQ2UwTgP+VH0Bt9klGgzuoZSndzMvMrG2xpqhQPJOkop/dRxvf0wVofOjB
 3V53nKSNiVZ1lTpDrF828cvErJf51P3kA7/iaO4CUSf+S/Xs0DPmSxBcVjpudZBc
 HBJFxDOhy6tjMb+WOAsvLikEl6K5wuHvP427VZhy1orXDQYFBuiF8xCh2tId61l0
 O/NFu6PA9EVEDEMtGR3kRPXWee1V8KKKngr091tE+0ErbCKrzh9dzG9qPrUZIgzW
 +RfiMfT1SjgbZcg9+CsJXCaImHEZKGBsRw70S6c8nUfKoWC6vdBcjKnLseThJKKe
 OLClYd2jsYV8PVASqLTMraDF9hPK5UokPvVCq/M5k7147Qd0spmlAsGov22i7/uf
 Yd4hud1W+Q1yTsabx0hBA0VNYO899BnRRjqrx+SayMoo7VHD/fAX7tFNjy0cpidw
 YdpxBkKzlWuNIACxeaA0vTNdNomelPYAmlniH/jTIhD6ts1Ym4gepyApE4Ccthvx
 uKJqvgY1KEM=
 =2sWM
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:
 "Misc irqchip fixes: LoongArch driver fixes and a Hyper-V IOMMU fix"

* tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/loongson-liointc: Fix an error handling path in liointc_init()
  irqchip/loongarch: Fix irq_domain_alloc_fwnode() abuse
  irqchip/loongson-pch-pic: Move find_pch_pic() into CONFIG_ACPI
  irqchip/loongson-eiointc: Fix a build warning
  irqchip/loongson-eiointc: Fix irq affinity setting
  iommu/hyper-v: Use helper instead of directly accessing affinity
2022-08-21 15:09:55 -07:00
Linus Torvalds
4f61f842d1 Fix a kprobes bug in JNG/JNLE emulation when a kprobe is
installed at such instructions, possibly resulting in
 incorrect execution (the wrong branch taken).
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmMCmzARHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jt1RAAwd8PJDg9gkWMq+96B1RM5nAumTnbkngc
 4ylQr7+8bhFj4O+uKuBYC0S4D2ox+LeEPAD7x4+pFSLahB78HTytAtixKtKe1l0B
 h+AZBm35PfkUlHw52B/aCUW6DWKZOa6KrLQa8L1MqIGK8oiMiL+Q6+xXFQ8a6Gtt
 ZYR2yi1esM2tHzw6CdMasRYswGO1KnZlCzGB5lHyDdU/y7xTfOgCTd00K07augXA
 mLQOBqTZ9dWDaT9mdc+Uh9llHOOhhfzMDYlWxSpAUsdk6vCWiUz6Y+ybsYRlebEh
 yEjRzUkvchraGj0L04m35ulqvoYTSCjc4aP0JvNB7mZoIyicajngfoKEc8DJ0BMf
 dgZj3CoKuTrKbbp//hfK5rt5J0m0ueiixOFI4EM0kJVUAOG1zzWAbxEQTrWnJlZh
 /8l/4n4zUY3Ss9ecG9PzNs++YH8I+gDbQd2kGr6YMffqUW9HOb5j2yk3mXs8gH0n
 O8HBGP7I94w45Q8L1CDSD/Sdn+IrhtaGzpYTV7rCEiM13S4FwrnzpVoi3SJiVODM
 kAxM4HScWXygTQTqvz4I5T9ibOVGp8iRQRHfLRbNiX+xB8k8DSQWAE1vxsw2v5o+
 L8/N6mt71u/j5DyTzybTp1kepum+UjVH2RdH3v9ma3BQODkM+kbWq3cTBD6qOUh9
 HH8KnagQcgc=
 =eXIr
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 kprobes fix from Ingo Molnar:
 "Fix a kprobes bug in JNG/JNLE emulation when a kprobe is installed at
  such instructions, possibly resulting in incorrect execution (the
  wrong branch taken)"

* tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kprobes: Fix JNG/JNLE emulation
2022-08-21 15:01:51 -07:00
Nick Desaulniers
a0a12c3ed0 asm goto: eradicate CC_HAS_ASM_GOTO
GCC has supported asm goto since 4.5, and Clang has since version 9.0.0.
The minimum supported versions of these tools for the build according to
Documentation/process/changes.rst are 5.1 and 11.0.0 respectively.

Remove the feature detection script, Kconfig option, and clean up some
fallback code that is no longer supported.

The removed script was also testing for a GCC specific bug that was
fixed in the 4.7 release.

Also remove workarounds for bpftrace using clang older than 9.0.0, since
other BPF backend fixes are required at this point.

Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/
Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637
Acked-by: Borislav Petkov <bp@suse.de>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-21 10:06:28 -07:00
Chen Zhongjin
fc2e426b11 x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry
When meeting ftrace trampolines in ORC unwinding, unwinder uses address
of ftrace_{regs_}call address to find the ORC entry, which gets next frame at
sp+176.

If there is an IRQ hitting at sub $0xa8,%rsp, the next frame should be
sp+8 instead of 176. It makes unwinder skip correct frame and throw
warnings such as "wrong direction" or "can't access registers", etc,
depending on the content of the incorrect frame address.

By adding the base address ftrace_{regs_}caller with the offset
*ip - ops->trampoline*, we can get the correct address to find the ORC entry.

Also change "caller" to "tramp_addr" to make variable name conform to
its content.

[ mingo: Clarified the changelog a bit. ]

Fixes: 6be7fa3c74d1 ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com
2022-08-21 12:19:32 +02:00
Helge Deller
7ae1f5508d parisc: Fix exception handler for fldw and fstw instructions
The exception handler is broken for unaligned memory acceses with fldw
and fstw instructions, because it trashes or uses randomly some other
floating point register than the one specified in the instruction word
on loads and stores.

The instruction "fldw 0(addr),%fr22L" (and the other fldw/fstw
instructions) encode the target register (%fr22) in the rightmost 5 bits
of the instruction word. The 7th rightmost bit of the instruction word
defines if the left or right half of %fr22 should be used.

While processing unaligned address accesses, the FR3() define is used to
extract the offset into the local floating-point register set.  But the
calculation in FR3() was buggy, so that for example instead of %fr22,
register %fr12 [((22 * 2) & 0x1f) = 12] was used.

This bug has been since forever in the parisc kernel and I wonder why it
wasn't detected earlier. Interestingly I noticed this bug just because
the libime debian package failed to build on *native* hardware, while it
successfully built in qemu.

This patch corrects the bitshift and masking calculation in FR3().

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org>
2022-08-21 08:43:09 +02:00