879 Commits

Author SHA1 Message Date
Mark Brown
4c11113c1a KVM: arm64: Enable Cortex-A510 erratum 2077057 by default
The recently added configuration option for Cortex A510 erratum 2077057 does
not have a "default y" unlike other errata fixes. This appears to simply be
an oversight since the help text suggests enabling the option if unsure and
there's nothing in the commit log to suggest it is intentional.

Fixes: 1dd498e5e26ad ("KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata")
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220225184658.172527-1-broonie@kernel.org
2022-03-02 15:02:27 +00:00
Kees Cook
2792d84e6d usercopy: Check valid lifetime via stack depth
One of the things that CONFIG_HARDENED_USERCOPY sanity-checks is whether
an object that is about to be copied to/from userspace is overlapping
the stack at all. If it is, it performs a number of inexpensive
bounds checks. One of the finer-grained checks is whether an object
crosses stack frames within the stack region. Doing this on x86 with
CONFIG_FRAME_POINTER was cheap/easy. Doing it with ORC was deemed too
heavy, and was left out (a while ago), leaving the courser whole-stack
check.

The LKDTM tests USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM
try to exercise these cross-frame cases to validate the defense is
working. They have been failing ever since ORC was added (which was
expected). While Muhammad was investigating various LKDTM failures[1],
he asked me for additional details on them, and I realized that when
exact stack frame boundary checking is not available (i.e. everything
except x86 with FRAME_POINTER), it could check if a stack object is at
least "current depth valid", in the sense that any object within the
stack region but not between start-of-stack and current_stack_pointer
should be considered unavailable (i.e. its lifetime is from a call no
longer present on the stack).

Introduce ARCH_HAS_CURRENT_STACK_POINTER to track which architectures
have actually implemented the common global register alias.

Additionally report usercopy bounds checking failures with an offset
from current_stack_pointer, which may assist with diagnosing failures.

The LKDTM USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM tests
(once slightly adjusted in a separate patch) pass again with this fixed.

[1] https://github.com/kernelci/kernelci-project/issues/84

Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v1: https://lore.kernel.org/lkml/20220216201449.2087956-1-keescook@chromium.org
v2: https://lore.kernel.org/lkml/20220224060342.1855457-1-keescook@chromium.org
v3: https://lore.kernel.org/lkml/20220225173345.3358109-1-keescook@chromium.org
v4: - improve commit log (akpm)
2022-02-25 18:20:11 -08:00
James Morse
558c303c97 arm64: Mitigate spectre style branch history side channels
Speculation attacks against some high-performance processors can
make use of branch history to influence future speculation.
When taking an exception from user-space, a sequence of branches
or a firmware call overwrites or invalidates the branch history.

The sequence of branches is added to the vectors, and should appear
before the first indirect branch. For systems using KPTI the sequence
is added to the kpti trampoline where it has a free register as the exit
from the trampoline is via a 'ret'. For systems not using KPTI, the same
register tricks are used to free up a register in the vectors.

For the firmware call, arch-workaround-3 clobbers 4 registers, so
there is no choice but to save them to the EL1 stack. This only happens
for entry from EL0, so if we take an exception due to the stack access,
it will not become re-entrant.

For KVM, the existing branch-predictor-hardening vectors are used.
When a spectre version of these vectors is in use, the firmware call
is sufficient to mitigate against Spectre-BHB. For the non-spectre
versions, the sequence of branches is added to the indirect vector.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
2022-02-24 13:58:52 +00:00
Ingo Molnar
6255b48aeb Linux 5.17-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmISrYgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGg20IAKDZr7rfSHBopjQV
 Cocw744tom0XuxpvSZpp2GGOOXF+tkswcNNaRIrbGOl1mkyxA7eBZCTMpDeDS9aQ
 wB0D0Gxx8QBAJp4KgB1W7TB+hIGes/rs8Ve+6iO4ulLLdCVWX/q2boI0aZ7QX9O9
 qNi8OsoZQtk6falRvciZFHwV5Av1p2Sy1AW57udQ7DvJ4H98AfKf1u8/z208WWW8
 1ixC+qJxQcUcM9vI+7P9Tt7NbFSKv8SvAmqjFY7P+DxQAsVw6KXoqVXykDzeOv0t
 fUNOE/t0oFZafwtn8h7KBQnwS9lH03+3KkslVZs+iMFyUj/Bar+NVVyKoDhWXtVg
 /PuMhEg=
 =eU1o
 -----END PGP SIGNATURE-----

Merge tag 'v5.17-rc5' into sched/core, to resolve conflicts

New conflicts in sched/core due to the following upstream fixes:

  44585f7bc0cb ("psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n")
  a06247c6804f ("psi: Fix uaf issue when psi trigger is destroyed while being polled")

Conflicts:
	include/linux/psi_types.h
	kernel/sched/psi.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-02-21 11:53:51 +01:00
Mark Rutland
1b2d3451ee arm64: Support PREEMPT_DYNAMIC
This patch enables support for PREEMPT_DYNAMIC on arm64, allowing the
preemption model to be chosen at boot time.

Specifically, this patch selects HAVE_PREEMPT_DYNAMIC_KEY, so that each
preemption function is an out-of-line call with an early return
depending upon a static key. This leaves almost all the codegen up to
the compiler, and side-steps a number of pain points with static calls
(e.g. interaction with CFI schemes). This should have no worse overhead
than using non-inline static calls, as those use out-of-line trampolines
with early returns.

For example, the dynamic_cond_resched() wrapper looks as follows when
enabled. When disabled, the first `B` is replaced with a `NOP`,
resulting in an early return.

| <dynamic_cond_resched>:
|        bti     c
|        b       <dynamic_cond_resched+0x10>     // or `nop`
|        mov     w0, #0x0
|        ret
|        mrs     x0, sp_el0
|        ldr     x0, [x0, #8]
|        cbnz    x0, <dynamic_cond_resched+0x8>
|        paciasp
|        stp     x29, x30, [sp, #-16]!
|        mov     x29, sp
|        bl      <preempt_schedule_common>
|        mov     w0, #0x1
|        ldp     x29, x30, [sp], #16
|        autiasp
|        ret

... compared to the regular form of the function:

| <__cond_resched>:
|        bti     c
|        mrs     x0, sp_el0
|        ldr     x1, [x0, #8]
|        cbz     x1, <__cond_resched+0x18>
|        mov     w0, #0x0
|        ret
|        paciasp
|        stp     x29, x30, [sp, #-16]!
|        mov     x29, sp
|        bl      <preempt_schedule_common>
|        mov     w0, #0x1
|        ldp     x29, x30, [sp], #16
|        autiasp
|        ret

Since arm64 does not yet use the generic entry code, we must define our
own `sk_dynamic_irqentry_exit_cond_resched`, which will be
enabled/disabled by the common code in kernel/sched/core.c. All other
preemption functions and associated static keys are defined there.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-8-mark.rutland@arm.com
2022-02-19 11:11:09 +01:00
Catalin Marinas
6dd8b1a0b6 arm64: mte: Dump the MTE tags in the core file
For each vma mapped with PROT_MTE (the VM_MTE flag set), generate a
PT_ARM_MEMTAG_MTE segment in the core file and dump the corresponding
tags. The in-file size for such segments is 128 bytes per page.

For pages in a VM_MTE vma which are not present in the user page tables
or don't have the PG_mte_tagged flag set (e.g. execute-only), just write
zeros in the core file.

An example of program headers for two vmas, one 2-page, the other 4-page
long:

  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  ...
  LOAD           0x030000 0x0000ffff80034000 0x0000000000000000 0x000000 0x002000 RW  0x1000
  LOAD           0x030000 0x0000ffff80036000 0x0000000000000000 0x004000 0x004000 RW  0x1000
  ...
  LOPROC+0x1     0x05b000 0x0000ffff80034000 0x0000000000000000 0x000100 0x002000     0
  LOPROC+0x1     0x05b100 0x0000ffff80036000 0x0000000000000000 0x000200 0x004000     0

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Luis Machado <luis.machado@linaro.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220131165456.2160675-5-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-15 22:53:29 +00:00
Linus Torvalds
e47ca40326 arm64 fixes:
- Enable Cortex-A510 erratum 2051678 by default as we do with other
   errata.
 
 - arm64 IORT: Check the node revision for PMCG resources to cope with
   old firmware based on a broken revision of the spec that had no way to
   describe the second register page (when an implementation is using the
   recommended RELOC_CTRS feature).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmIGqvAACgkQa9axLQDI
 XvEJ8RAAjBT8KyM1BcDrcvqzKBVmz2en1lby+xlto7Wal71ubtgcSqRixY9SeMjI
 js1vvdKOKExSftLWjBPTlJ2PLVX0/7KAg0d6Q3SFHab4UdlY8hfvsRIRFi1MNzxa
 aSX6YQ8gvMpad1aDE5TGRZOz4fse5D9j3p83MYi6lSKYZEBG4Axsw0pSxKda7hd2
 oq6i+LeQLXrOYfMWgr7dHKJlsr3KZRICEeXO3irEWeNFm3euBFDyhNLMKmHioZ8C
 YWjvvOPygcpxdZ8bL1aFUfXtt+Ou9zz+++27TbfirgE4kaW6SOynNYOowhSfotJb
 8ku5bxGCiifUgQ/Cy4C8HJRplDIUGoP5nQ8tv0l1tK5Dld1Wn03qzUNl/nTyB1RG
 R23Uykcq9HVjm2yorYP4tZ9WFMISgwZuWyGD7cXBITTkBZ0P6ooTwFtoP6yfeVUw
 Oh32a0DzjVjK6HrgQddyJColJuM937YkXW6pCcR1kPqIxPyYaABLA9KrC+rYcVy5
 vIVHmfrrcadhjZFB4vNIrvwb55MdMX1S6FQ1jqrmit/NP6AR6iXj5ufssfYveRmO
 iUvg1KuRdpStsUS+3dEEH7ZXPnj7nKVKfmvxMi4wDd8nmOiim4jG3u2/0PhHNqaC
 cKMbNumH+Pu+Dmrd6Tskvb59IkKfuWDpshPqMsN3UCp1FoyRdAw=
 =n9qz
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Enable Cortex-A510 erratum 2051678 by default as we do with other
   errata.

 - arm64 IORT: Check the node revision for PMCG resources to cope with
   old firmware based on a broken revision of the spec that had no way
   to describe the second register page (when an implementation is using
   the recommended RELOC_CTRS feature).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ACPI/IORT: Check node revision for PMCG resources
  arm64: Enable Cortex-A510 erratum 2051678 by default
2022-02-11 11:55:26 -08:00
James Morse
1dd498e5e2 KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
Cortex-A510's erratum #2077057 causes SPSR_EL2 to be corrupted when
single-stepping authenticated ERET instructions. A single step is
expected, but a pointer authentication trap is taken instead. The
erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow
EL1 to cause a return to EL2 with a guest controlled ELR_EL2.

Because the conditions require an ERET into active-not-pending state,
this is only a problem for the EL2 when EL2 is stepping EL1. In this case
the previous SPSR_EL2 value is preserved in struct kvm_vcpu, and can be
restored.

Cc: stable@vger.kernel.org # 53960faf2b73: arm64: Add Cortex-A510 CPU part definition
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
[maz: fixup cpucaps ordering]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127122052.1584324-5-james.morse@arm.com
2022-02-03 09:22:30 +00:00
Mark Brown
a4b92cebc3 arm64: Enable Cortex-A510 erratum 2051678 by default
The recently added configuration option for Cortex A510 erratum 2051678 does
not have a "default y" unlike other errata fixes. This appears to simply be
an oversight since the help text suggests enabling the option if unsure and
there's nothing in the commit log to suggest it is intentional.

Fixes: 297ae1eb23b0 ("arm64: cpufeature: List early Cortex-A510 parts as having broken dbm")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220201144838.20037-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-02-01 18:27:43 +00:00
James Morse
297ae1eb23 arm64: cpufeature: List early Cortex-A510 parts as having broken dbm
Versions of Cortex-A510 before r0p3 are affected by a hardware erratum
where the hardware update of the dirty bit is not correctly ordered.

Add these cpus to the cpu_has_broken_dbm list.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20220125154040.549272-3-james.morse@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-01-28 16:15:46 +00:00
Catalin Marinas
df20597044 coresight: trbe: Workaround Cortex-A510 erratas
This pull request is providing arm64 definitions to support
 TRBE Cortex-A510 erratas.
 
 Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEeTrpXvBwUkra1RYWo5FxFnwrV6EFAmHy7RkbHG1hdGhpZXUu
 cG9pcmllckBsaW5hcm8ub3JnAAoJEKORcRZ8K1ehmvEH/2O01bctGwhEtg7wRxdu
 b/pksZSJZkrTq7cUU/xRUzGEj38owoYb/QFle1+e1qMW8Lt5nI11xLLCuBxTTZFT
 zazoYnHHciKK5kiQSCK1cN4hTjGfL0dn/cEUkwGMA9PX6B8jG+WvMEHYXZkebt5b
 BV88QUNB5+S5PPZzF+UczLVQoZ1UmlwkoVyTpRQN97qunqOZ6C1esDgOeghAXTg4
 EKni3tl7IkkuDDsWvg4ez4hvnYbCbPaMaFqVI81n1NGHl2fhsKAa3GXKzj+wiG8H
 gQEXw0q8G8rxJ4Ik/K4/VApWGrqFFSCFCeho8GFqxputUkzGoCRZ1U6JPQIbFWrN
 lJM=
 =HLQt
 -----END PGP SIGNATURE-----

Merge tag 'trbe-cortex-a510-errata' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux into for-next/fixes

coresight: trbe: Workaround Cortex-A510 erratas

This pull request is providing arm64 definitions to support
TRBE Cortex-A510 erratas.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>

* tag 'trbe-cortex-a510-errata' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux:
  arm64: errata: Add detection for TRBE trace data corruption
  arm64: errata: Add detection for TRBE invalid prohibited states
  arm64: errata: Add detection for TRBE ignored system register writes
  arm64: Add Cortex-A510 CPU part definition
2022-01-28 16:14:06 +00:00
Anshuman Khandual
708e8af492 arm64: errata: Add detection for TRBE trace data corruption
TRBE implementations affected by Arm erratum #1902691 might corrupt trace
data or deadlock, when it's being written into the memory. So effectively
TRBE is broken and hence cannot be used to capture trace data. This adds
a new errata ARM64_ERRATUM_1902691 in arm64 errata framework.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-doc@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/1643120437-14352-5-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2022-01-27 12:01:53 -07:00
Anshuman Khandual
3bd94a8759 arm64: errata: Add detection for TRBE invalid prohibited states
TRBE implementations affected by Arm erratum #2038923 might get TRBE into
an inconsistent view on whether trace is prohibited within the CPU. As a
result, the trace buffer or trace buffer state might be corrupted. This
happens after TRBE buffer has been enabled by setting TRBLIMITR_EL1.E,
followed by just a single context synchronization event before execution
changes from a context, in which trace is prohibited to one where it isn't,
or vice versa. In these mentioned conditions, the view of whether trace is
prohibited is inconsistent between parts of the CPU, and the trace buffer
or the trace buffer state might be corrupted. This adds a new errata
ARM64_ERRATUM_2038923 in arm64 errata framework.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-doc@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/1643120437-14352-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2022-01-27 12:01:53 -07:00
Anshuman Khandual
607a9afaae arm64: errata: Add detection for TRBE ignored system register writes
TRBE implementations affected by Arm erratum #2064142 might fail to write
into certain system registers after the TRBE has been disabled. Under some
conditions after TRBE has been disabled, writes into certain TRBE registers
TRBLIMITR_EL1, TRBPTR_EL1, TRBBASER_EL1, TRBSR_EL1 and TRBTRG_EL1 will be
ignored and not be effected. This adds a new errata ARM64_ERRATUM_2064142
in arm64 errata framework.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-doc@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/1643120437-14352-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2022-01-27 12:01:53 -07:00
Anshuman Khandual
eb30d838a4 arm64: errata: Update ARM64_ERRATUM_[2119858|2224489] with Cortex-X2 ranges
Errata ARM64_ERRATUM_[2119858|2224489] also affect some Cortex-X2 ranges as
well. Lets update these errata definition and detection to accommodate all
new Cortex-X2 based cpu MIDR ranges.

Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/1642994138-25887-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-01-24 14:20:50 +00:00
Linus Torvalds
3689f9f8b0 bitmap patches for 5.17-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
 b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
 A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
 iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
 m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
 9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
 MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
 nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
 CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
 5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
 =RKW4
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - introduce for_each_set_bitrange()

 - use find_first_*_bit() instead of find_next_*_bit() where possible

 - unify for_each_bit() macros

* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
  vsprintf: rework bitmap_list_string
  lib: bitmap: add performance test for bitmap_print_to_pagebuf
  bitmap: unify find_bit operations
  mm/percpu: micro-optimize pcpu_is_populated()
  Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
  find: micro-optimize for_each_{set,clear}_bit()
  include/linux: move for_each_bit() macros from bitops.h to find.h
  cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
  tools: sync tools/bitmap with mother linux
  all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
  cpumask: use find_first_and_bit()
  lib: add find_first_and_bit()
  arch: remove GENERIC_FIND_FIRST_BIT entirely
  include: move find.h from asm_generic to linux
  bitops: move find_bit_*_le functions from le.h to find.h
  bitops: protect find_first_{,zero}_bit properly
2022-01-23 06:20:44 +02:00
Linus Torvalds
f4484d138b Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "55 patches.

  Subsystems affected by this patch series: percpu, procfs, sysctl,
  misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
  hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
  lib: remove redundant assignment to variable ret
  ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
  kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
  lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
  btrfs: use generic Kconfig option for 256kB page size limit
  arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
  configs: introduce debug.config for CI-like setup
  delayacct: track delays from memory compact
  Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
  delayacct: cleanup flags in struct task_delay_info and functions use it
  delayacct: fix incomplete disable operation when switch enable to disable
  delayacct: support swapin delay accounting for swapping without blkio
  panic: remove oops_id
  panic: use error_report_end tracepoint on warnings
  fs/adfs: remove unneeded variable make code cleaner
  FAT: use io_schedule_timeout() instead of congestion_wait()
  hfsplus: use struct_group_attr() for memcpy() region
  nilfs2: remove redundant pointer sbufs
  fs/binfmt_elf: use PT_LOAD p_align values for static PIE
  const_structs.checkpatch: add frequently used ops structs
  ...
2022-01-20 10:41:01 +02:00
Kefeng Wang
7ecd19cfdf mm: percpu: generalize percpu related config
Patch series "mm: percpu: Cleanup percpu first chunk function".

When supporting page mapping percpu first chunk allocator on arm64, we
found there are lots of duplicated codes in percpu embed/page first chunk
allocator.  This patchset is aimed to cleanup them and should no function
change.

The currently supported status about 'embed' and 'page' in Archs shows
below,

	embed: NEED_PER_CPU_PAGE_FIRST_CHUNK
	page:  NEED_PER_CPU_EMBED_FIRST_CHUNK

		embed	page
	------------------------
	arm64	  Y	 Y
	mips	  Y	 N
	powerpc	  Y	 Y
	riscv	  Y	 N
	sparc	  Y	 Y
	x86	  Y	 Y
	------------------------

There are two interfaces about percpu first chunk allocator,

 extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                                size_t atom_size,
                                pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
-                               pcpu_fc_alloc_fn_t alloc_fn,
-                               pcpu_fc_free_fn_t free_fn);
+                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);

 extern int __init pcpu_page_first_chunk(size_t reserved_size,
-                               pcpu_fc_alloc_fn_t alloc_fn,
-                               pcpu_fc_free_fn_t free_fn,
-                               pcpu_fc_populate_pte_fn_t populate_pte_fn);
+                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);

The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t is killed, we provide generic
pcpu_fc_alloc() and pcpu_fc_free() function, which are called in the
pcpu_embed/page_first_chunk().

1) For pcpu_embed_first_chunk(), pcpu_fc_cpu_to_node_fn_t is needed to be
   provided when archs supported NUMA.

2) For pcpu_page_first_chunk(), the pcpu_fc_populate_pte_fn_t is killed too,
   a generic pcpu_populate_pte() which marked '__weak' is provided, if you
   need a different function to populate pte on the arch(like x86), please
   provide its own implementation.

[1] https://github.com/kevin78/linux.git percpu-cleanup

This patch (of 4):

The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs, which have
duplicate definitions on platforms that subscribe it.

Move them into mm, drop these redundant definitions and instead just
select it on applicable platforms.

Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Yury Norov
c126a53c27 arch: remove GENERIC_FIND_FIRST_BIT entirely
In 5.12 cycle we enabled GENERIC_FIND_FIRST_BIT config option for ARM64
and MIPS. It increased performance and shrunk .text size; and so far
I didn't receive any negative feedback on the change.

https://lore.kernel.org/linux-arch/20210225135700.1381396-1-yury.norov@gmail.com/

Now I think it's a good time to switch all architectures to use
find_{first,last}_bit() unconditionally, and so remove corresponding
config option.

The patch does't introduce functioal changes for arc, arm, arm64, mips,
m68k, s390 and x86, for other architectures I expect improvement both in
performance and .text size.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Alexander Lobakin <alobakin@pm.me> (mips)
Reviewed-by: Alexander Lobakin <alobakin@pm.me> (mips)
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2022-01-15 08:47:31 -08:00
Linus Torvalds
daadb3bd0e Peter Zijlstra says:
"Lots of cleanups and preparation; highlights:
 
  - futex: Cleanup and remove runtime futex_cmpxchg detection
 
  - rtmutex: Some fixes for the PREEMPT_RT locking infrastructure
 
  - kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
    annotate the racy owner->on_cpu access *once*.
 
  - atomic64: Dead-Code-Elemination"
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHdvssACgkQEsHwGGHe
 VUrbBg//VQvz5BwddIJDj9utt5AvSixNcTF5mJyFKCSIqO0S4J8nCNcvJjZ2bs4S
 w1YmInFbp0WFGUhaIZiw0e6KWJUoINTng4MfHDZosS1doT2of53ZaQqXs3i81jDz
 87w8ADVHL0x4+BNjdsIwbcuPSDTmJFoyFOdeXTIl9hv9ZULT8m4Mt+LJuUHNZ+vF
 rS1jyseVPWkcm5y+Yie0rhip+ygzbfbt0ArsLfRcrBJsKr6oxLxV2DDF+2djXuuP
 d2OgGT7VkbgAhoKpzVXUiHsT6ppR5Mn5TLSa4EZ4bPPCUFldOhKuCAImF3T6yVIa
 44iX5vQN9v5VHBy6ocPbdOIBuYBYVGCMurh1t7pbpB6G+mmSxMiyta5MY37POwjv
 K2JT9mC2A6a4d17gue5FT3mnJMBB4eHwVaDfAwCZs/5rRNuoTz4aY5Xy04Mq0ltI
 39uarwBd5hwSugBWg44AS5E9h52E654FQ7g6iS4NtUvJuuaXBTl43EcZWx2+mnPL
 zY+iOMVMgg33VIVcm/mlf/6zWL0LXPmILUiA1fp4Q9/n8u1EuOOyeA/GsC9Pl3wO
 HY3KpYJA5eQpIk/JEnzKm5ZE3pCrUdH6VDC/SB4owQtafQG6OxyQVP1Gj7KYxZsD
 NqqpJ4nkKooc5f5DqVEN8wrjyYsnVxEfriEG09OoR6wI3MqyUA4=
 =vrYy
 -----END PGP SIGNATURE-----

Merge tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Borislav Petkov:
 "Lots of cleanups and preparation. Highlights:

   - futex: Cleanup and remove runtime futex_cmpxchg detection

   - rtmutex: Some fixes for the PREEMPT_RT locking infrastructure

   - kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
     annotate the racy owner->on_cpu access *once*.

   - atomic64: Dead-Code-Elemination"

[ Description above by Peter Zijlstra ]

* tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomic: atomic64: Remove unusable atomic ops
  futex: Fix additional regressions
  locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h
  x86/mm: Include spinlock_t definition in pgtable.
  locking: Mark racy reads of owner->on_cpu
  locking: Make owner_on_cpu() into <linux/sched.h>
  lockdep/selftests: Adapt ww-tests for PREEMPT_RT
  lockdep/selftests: Skip the softirq related tests on PREEMPT_RT
  lockdep/selftests: Unbalanced migrate_disable() & rcu_read_lock().
  lockdep/selftests: Avoid using local_lock_{acquire|release}().
  lockdep: Remove softirq accounting on PREEMPT_RT.
  locking/rtmutex: Add rt_mutex_lock_nest_lock() and rt_mutex_lock_killable().
  locking/rtmutex: Squash self-deadlock check for ww_rt_mutex.
  locking: Remove rt_rwlock_is_contended().
  sched: Trigger warning if ->migration_disabled counter underflows.
  futex: Fix sparc32/m68k/nds32 build regression
  futex: Remove futex_cmpxchg detection
  futex: Ensure futex_atomic_cmpxchg_inatomic() is present
  kernel/locking: Use a pointer in ww_mutex_trylock().
2022-01-11 17:24:45 -08:00
Catalin Marinas
945409a6ef Merge branches 'for-next/misc', 'for-next/cache-ops-dzp', 'for-next/stacktrace', 'for-next/xor-neon', 'for-next/kasan', 'for-next/armv8_7-fp', 'for-next/atomics', 'for-next/bti', 'for-next/sve', 'for-next/kselftest' and 'for-next/kcsan', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (32 commits)
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: perf: Support new DT compatibles
  arm64: perf: Simplify registration boilerplate
  arm64: perf: Support Denver and Carmel PMUs
  drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
  docs: perf: Add description for HiSilicon PCIe PMU driver
  dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
  drivers: perf: Add LLC-TAD perf counter support
  perf/smmuv3: Synthesize IIDR from CoreSight ID registers
  perf/smmuv3: Add devicetree support
  dt-bindings: Add Arm SMMUv3 PMCG binding
  perf/arm-cmn: Add debugfs topology info
  perf/arm-cmn: Add CI-700 Support
  dt-bindings: perf: arm-cmn: Add CI-700
  perf/arm-cmn: Support new IP features
  perf/arm-cmn: Demarcate CMN-600 specifics
  perf/arm-cmn: Move group validation data off-stack
  perf/arm-cmn: Optimise DTC counter accesses
  ...

* for-next/misc:
  : Miscellaneous patches
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64/fp: Add comments documenting the usage of state restore functions
  arm64: mm: Use asid feature macro for cheanup
  arm64: mm: Rename asid2idx() to ctxid2asid()
  arm64: kexec: reduce calls to page_address()
  arm64: extable: remove unused ex_handler_t definition
  arm64: entry: Use SDEI event constants
  arm64: Simplify checking for populated DT
  arm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c

* for-next/cache-ops-dzp:
  : Avoid DC instructions when DCZID_EL0.DZP == 1
  arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1
  arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1

* for-next/stacktrace:
  : Unify the arm64 unwind code
  arm64: Make some stacktrace functions private
  arm64: Make dump_backtrace() use arch_stack_walk()
  arm64: Make profile_pc() use arch_stack_walk()
  arm64: Make return_address() use arch_stack_walk()
  arm64: Make __get_wchan() use arch_stack_walk()
  arm64: Make perf_callchain_kernel() use arch_stack_walk()
  arm64: Mark __switch_to() as __sched
  arm64: Add comment for stack_info::kr_cur
  arch: Make ARCH_STACKWALK independent of STACKTRACE

* for-next/xor-neon:
  : Use SHA3 instructions to speed up XOR
  arm64/xor: use EOR3 instructions when available

* for-next/kasan:
  : Log potential KASAN shadow aliases
  arm64: mm: log potential KASAN shadow alias
  arm64: mm: use die_kernel_fault() in do_mem_abort()

* for-next/armv8_7-fp:
  : Add HWCAPS for ARMv8.7 FEAT_AFP amd FEAT_RPRES
  arm64: cpufeature: add HWCAP for FEAT_RPRES
  arm64: add ID_AA64ISAR2_EL1 sys register
  arm64: cpufeature: add HWCAP for FEAT_AFP

* for-next/atomics:
  : arm64 atomics clean-ups and codegen improvements
  arm64: atomics: lse: define RETURN ops in terms of FETCH ops
  arm64: atomics: lse: improve constraints for simple ops
  arm64: atomics: lse: define ANDs in terms of ANDNOTs
  arm64: atomics lse: define SUBs in terms of ADDs
  arm64: atomics: format whitespace consistently

* for-next/bti:
  : BTI clean-ups
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: Use BTI C directly and unconditionally
  arm64: Unconditionally override SYM_FUNC macros
  arm64: Add macro version of the BTI instruction
  arm64: ftrace: add missing BTIs
  arm64: kexec: use __pa_symbol(empty_zero_page)
  arm64: update PAC description for kernel

* for-next/sve:
  : SVE code clean-ups and refactoring in prepararation of Scalable Matrix Extensions
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME

* for-next/kselftest:
  : arm64 kselftest additions
  kselftest/arm64: Add pidbench for floating point syscall cases
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information

* for-next/kcsan:
  : Enable KCSAN for arm64
  arm64: Enable KCSAN
2022-01-05 18:14:32 +00:00
Kefeng Wang
dd03762ab6 arm64: Enable KCSAN
This patch enables KCSAN for arm64, with updates to build rules
to not use KCSAN for several incompatible compilation units.

Recent GCC version(at least GCC10) made outline-atomics as the
default option(unlike Clang), which will cause linker errors
for kernel/kcsan/core.o. Disables the out-of-line atomics by
no-outline-atomics to fix the linker errors.

Meanwhile, as Mark said[1], some latent issues are needed to be
fixed which isn't just a KCSAN problem, we make the KCSAN depends
on EXPERT for now.

Tested selftest and kcsan_test(built with GCC11 and Clang 13),
and all passed.

[1] https://lkml.kernel.org/r/YadiUPpJ0gADbiHQ@FVFF77S0Q05N

Acked-by: Marco Elver <elver@google.com> # kernel/kcsan
Tested-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20211211131734.126874-1-wangkefeng.wang@huawei.com
[catalin.marinas@arm.com: added comment to justify EXPERT]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:54:34 +00:00
Ard Biesheuvel
2c54b423cf arm64/xor: use EOR3 instructions when available
Use the EOR3 instruction to implement xor_blocks() if the instruction is
available, which is the case if the CPU implements the SHA-3 extension.
This is about 20% faster on Apple M1 when using the 5-way version.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20211213140252.2856053-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 12:14:26 +00:00
Arnd Bergmann
3297481d68 futex: Remove futex_cmpxchg detection
Now that all architectures have a working futex implementation in any
configuration, remove the runtime detection code.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Vineet Gupta <vgupta@kernel.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20211026100432.1730393-2-arnd@kernel.org
2021-11-25 00:02:28 +01:00
Linus Torvalds
512b7931ad Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "257 patches.

  Subsystems affected by this patch series: scripts, ocfs2, vfs, and
  mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
  gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
  pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
  memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
  vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
  cleanups, kfence, and damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
  mm/damon: remove return value from before_terminate callback
  mm/damon: fix a few spelling mistakes in comments and a pr_debug message
  mm/damon: simplify stop mechanism
  Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
  Docs/admin-guide/mm/damon/start: simplify the content
  Docs/admin-guide/mm/damon/start: fix a wrong link
  Docs/admin-guide/mm/damon/start: fix wrong example commands
  mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
  mm/damon: remove unnecessary variable initialization
  Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
  mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
  selftests/damon: support watermarks
  mm/damon/dbgfs: support watermarks
  mm/damon/schemes: activate schemes based on a watermarks mechanism
  tools/selftests/damon: update for regions prioritization of schemes
  mm/damon/dbgfs: support prioritization weights
  mm/damon/vaddr,paddr: support pageout prioritization
  mm/damon/schemes: prioritize regions within the quotas
  mm/damon/selftests: support schemes quotas
  mm/damon/dbgfs: support quotas of schemes
  ...
2021-11-06 14:08:17 -07:00
Kefeng Wang
09cea61950 arm64: support page mapping percpu first chunk allocator
Percpu embedded first chunk allocator is the firstly option, but it
could fails on ARM64, eg,

  percpu: max_distance=0x5fcfdc640000 too large for vmalloc space 0x781fefff0000
  percpu: max_distance=0x600000540000 too large for vmalloc space 0x7dffb7ff0000
  percpu: max_distance=0x5fff9adb0000 too large for vmalloc space 0x5dffb7ff0000

then we could get

  WARNING: CPU: 15 PID: 461 at vmalloc.c:3087 pcpu_get_vm_areas+0x488/0x838

and the system could not boot successfully.

Let's implement page mapping percpu first chunk allocator as a fallback
to the embedding allocator to increase the robustness of the system.

Link: https://lkml.kernel.org/r/20210910053354.26721-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:37 -07:00
Linus Torvalds
5c904c66ed Char/Misc driver update for 5.16-rc1
Here is the big set of char and misc and other tiny driver subsystem
 updates for 5.16-rc1.
 
 Loads of things in here, all of which have been in linux-next for a
 while with no reported problems (except for one called out below.)
 
 Included are:
 	- habanana labs driver updates, including dma_buf usage,
 	  reviewed and acked by the dma_buf maintainers
 	- iio driver update (going through this tree not staging as they
 	  really do not belong going through that tree anymore)
 	- counter driver updates
 	- hwmon driver updates that the counter drivers needed, acked by
 	  the hwmon maintainer
 	- xillybus driver updates
 	- binder driver updates
 	- extcon driver updates
 	- dma_buf module namespaces added (will cause a build error in
 	  arm64 for allmodconfig, but that change is on its way through
 	  the drm tree)
 	- lkdtm driver updates
 	- pvpanic driver updates
 	- phy driver updates
 	- virt acrn and nitr_enclaves driver updates
 	- smaller char and misc driver updates
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYYPX2A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymUUgCbB4EKysgLuXYdjUalZDx+vvZO4k0AniS14O4k
 F+2dVSZ5WX6wumUzCaA6
 =bXQM
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the big set of char and misc and other tiny driver subsystem
  updates for 5.16-rc1.

  Loads of things in here, all of which have been in linux-next for a
  while with no reported problems (except for one called out below.)

  Included are:

   - habanana labs driver updates, including dma_buf usage, reviewed and
     acked by the dma_buf maintainers

   - iio driver update (going through this tree not staging as they
     really do not belong going through that tree anymore)

   - counter driver updates

   - hwmon driver updates that the counter drivers needed, acked by the
     hwmon maintainer

   - xillybus driver updates

   - binder driver updates

   - extcon driver updates

   - dma_buf module namespaces added (will cause a build error in arm64
     for allmodconfig, but that change is on its way through the drm
     tree)

   - lkdtm driver updates

   - pvpanic driver updates

   - phy driver updates

   - virt acrn and nitr_enclaves driver updates

   - smaller char and misc driver updates"

* tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (386 commits)
  comedi: dt9812: fix DMA buffers on stack
  comedi: ni_usb6501: fix NULL-deref in command paths
  arm64: errata: Enable TRBE workaround for write to out-of-range address
  arm64: errata: Enable workaround for TRBE overwrite in FILL mode
  coresight: trbe: Work around write to out of range
  coresight: trbe: Make sure we have enough space
  coresight: trbe: Add a helper to determine the minimum buffer size
  coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  coresight: trbe: Add infrastructure for Errata handling
  coresight: trbe: Allow driver to choose a different alignment
  coresight: trbe: Decouple buffer base from the hardware base
  coresight: trbe: Add a helper to pad a given buffer area
  coresight: trbe: Add a helper to calculate the trace generated
  coresight: trbe: Defer the probe on offline CPUs
  coresight: trbe: Fix incorrect access of the sink specific data
  coresight: etm4x: Add ETM PID for Kryo-5XX
  coresight: trbe: Prohibit trace before disabling TRBE
  coresight: trbe: End the AUX handle on truncation
  coresight: trbe: Do not truncate buffer on IRQ
  coresight: trbe: Fix handling of spurious interrupts
  ...
2021-11-04 08:21:47 -07:00
Linus Torvalds
d7e0a795bf ARM:
* More progress on the protected VM front, now with the full
   fixed feature set as well as the limitation of some hypercalls
   after initialisation.
 
 * Cleanup of the RAZ/WI sysreg handling, which was pointlessly
   complicated
 
 * Fixes for the vgic placement in the IPA space, together with a
   bunch of selftests
 
 * More memcg accounting of the memory allocated on behalf of a guest
 
 * Timer and vgic selftests
 
 * Workarounds for the Apple M1 broken vgic implementation
 
 * KConfig cleanups
 
 * New kvmarm.mode=none option, for those who really dislike us
 
 RISC-V:
 * New KVM port.
 
 x86:
 * New API to control TSC offset from userspace
 
 * TSC scaling for nested hypervisors on SVM
 
 * Switch masterclock protection from raw_spin_lock to seqcount
 
 * Clean up function prototypes in the page fault code and avoid
 repeated memslot lookups
 
 * Convey the exit reason to userspace on emulation failure
 
 * Configure time between NX page recovery iterations
 
 * Expose Predictive Store Forwarding Disable CPUID leaf
 
 * Allocate page tracking data structures lazily (if the i915
 KVM-GT functionality is not compiled in)
 
 * Cleanups, fixes and optimizations for the shadow MMU code
 
 s390:
 * SIGP Fixes
 
 * initial preparations for lazy destroy of secure VMs
 
 * storage key improvements/fixes
 
 * Log the guest CPNC
 
 Starting from this release, KVM-PPC patches will come from
 Michael Ellerman's PPC tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGBOiEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNowwf/axlx3g9sgCwQHr12/6UF/7hL/RwP
 9z+pGiUzjl2YQE+RjSvLqyd6zXh+h4dOdOKbZDLSkSTbcral/8U70ojKnQsXM0XM
 1LoymxBTJqkgQBLm9LjYreEbzrPV4irk4ygEmuk3CPOHZu8xX1ei6c5LdandtM/n
 XVUkXsQY+STkmnGv4P3GcPoDththCr0tBTWrFWtxa0w9hYOxx0ay1AZFlgM4FFX0
 QFuRc8VBLoDJpIUjbkhsIRIbrlHc/YDGjuYnAU7lV/CIME8vf2BW6uBwIZJdYcDj
 0ejozLjodEnuKXQGnc8sXFioLX2gbMyQJEvwCgRvUu/EU7ncFm1lfs7THQ==
 =UxKM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:

   - More progress on the protected VM front, now with the full fixed
     feature set as well as the limitation of some hypercalls after
     initialisation.

   - Cleanup of the RAZ/WI sysreg handling, which was pointlessly
     complicated

   - Fixes for the vgic placement in the IPA space, together with a
     bunch of selftests

   - More memcg accounting of the memory allocated on behalf of a guest

   - Timer and vgic selftests

   - Workarounds for the Apple M1 broken vgic implementation

   - KConfig cleanups

   - New kvmarm.mode=none option, for those who really dislike us

  RISC-V:

   - New KVM port.

  x86:

   - New API to control TSC offset from userspace

   - TSC scaling for nested hypervisors on SVM

   - Switch masterclock protection from raw_spin_lock to seqcount

   - Clean up function prototypes in the page fault code and avoid
     repeated memslot lookups

   - Convey the exit reason to userspace on emulation failure

   - Configure time between NX page recovery iterations

   - Expose Predictive Store Forwarding Disable CPUID leaf

   - Allocate page tracking data structures lazily (if the i915 KVM-GT
     functionality is not compiled in)

   - Cleanups, fixes and optimizations for the shadow MMU code

  s390:

   - SIGP Fixes

   - initial preparations for lazy destroy of secure VMs

   - storage key improvements/fixes

   - Log the guest CPNC

  Starting from this release, KVM-PPC patches will come from Michael
  Ellerman's PPC tree"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  RISC-V: KVM: fix boolreturn.cocci warnings
  RISC-V: KVM: remove unneeded semicolon
  RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions
  RISC-V: KVM: Factor-out FP virtualization into separate sources
  KVM: s390: add debug statement for diag 318 CPNC data
  KVM: s390: pv: properly handle page flags for protected guests
  KVM: s390: Fix handle_sske page fault handling
  KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol
  KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info
  KVM: x86: Clarify the kvm_run.emulation_failure structure layout
  KVM: s390: Add a routine for setting userspace CPU state
  KVM: s390: Simplify SIGP Set Arch handling
  KVM: s390: pv: avoid stalls when making pages secure
  KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
  KVM: s390: pv: avoid double free of sida page
  KVM: s390: pv: add macros for UVC CC values
  s390/mm: optimize reset_guest_reference_bit()
  s390/mm: optimize set_guest_storage_key()
  s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present
  ...
2021-11-02 11:24:14 -07:00
Linus Torvalds
79ef0c0014 Tracing updates for 5.16:
- kprobes: Restructured stack unwinder to show properly on x86 when a stack
   dump happens from a kretprobe callback.
 
 - Fix to bootconfig parsing
 
 - Have tracefs allow owner and group permissions by default (only denying
   others). There's been pressure to allow non root to tracefs in a
   controlled fashion, and using groups is probably the safest.
 
 - Bootconfig memory managament updates.
 
 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.
 
 - Allow perf to be traced by function tracer.
 
 - Rewrite of function graph tracer to be a callback from the function tracer
   instead of having its own trampoline (this change will happen on an arch
   by arch basis, and currently only x86_64 implements it).
 
 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.
 
 - Allow histogram triggers to add variables that can perform calculations
   against the event's fields.
 
 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent warnings
   from the compiler.
 
 - Extend histogram triggers to key off of variables.
 
 - Have trace recursion use bit magic to determine preempt context over if
   branches.
 
 - Have trace recursion disable preemption as all use cases do anyway.
 
 - Added testing for verification of tracing utilities.
 
 - Various small clean ups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYBdxhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qp1sAQD2oYFwaG3sx872gj/myBcHIBSKdiki
 Hry5csd8zYDBpgD+Poylopt5JIbeDuoYw/BedgEXmscZ8Qr7VzjAXdnv/Q4=
 =Loz8
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - kprobes: Restructured stack unwinder to show properly on x86 when a
   stack dump happens from a kretprobe callback.

 - Fix to bootconfig parsing

 - Have tracefs allow owner and group permissions by default (only
   denying others). There's been pressure to allow non root to tracefs
   in a controlled fashion, and using groups is probably the safest.

 - Bootconfig memory managament updates.

 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.

 - Allow perf to be traced by function tracer.

 - Rewrite of function graph tracer to be a callback from the function
   tracer instead of having its own trampoline (this change will happen
   on an arch by arch basis, and currently only x86_64 implements it).

 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.

 - Allow histogram triggers to add variables that can perform
   calculations against the event's fields.

 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent
   warnings from the compiler.

 - Extend histogram triggers to key off of variables.

 - Have trace recursion use bit magic to determine preempt context over
   if branches.

 - Have trace recursion disable preemption as all use cases do anyway.

 - Added testing for verification of tracing utilities.

 - Various small clean ups and fixes.

* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
  tracing/histogram: Fix semicolon.cocci warnings
  tracing/histogram: Fix documentation inline emphasis warning
  tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
  tracing: Show size of requested perf buffer
  bootconfig: Initialize ret in xbc_parse_tree()
  ftrace: do CPU checking after preemption disabled
  ftrace: disable preemption when recursion locked
  tracing/histogram: Document expression arithmetic and constants
  tracing/histogram: Optimize division by a power of 2
  tracing/histogram: Covert expr to const if both operands are constants
  tracing/histogram: Simplify handling of .sym-offset in expressions
  tracing: Fix operator precedence for hist triggers expression
  tracing: Add division and multiplication support for hist triggers
  tracing: Add support for creating hist trigger variables from literal
  selftests/ftrace: Stop tracing while reading the trace file by default
  MAINTAINERS: Update KPROBES and TRACING entries
  test_kprobes: Move it from kernel/ to lib/
  docs, kprobes: Remove invalid URL and add new reference
  samples/kretprobes: Fix return value if register_kretprobe() failed
  lib/bootconfig: Fix the xbc_get_info kerneldoc
  ...
2021-11-01 20:05:19 -07:00
Linus Torvalds
46f8763228 arm64 updates for 5.16
- Support for the Arm8.6 timer extensions, including a self-synchronising
   view of the system registers to elide some expensive ISB instructions.
 
 - Exception table cleanup and rework so that the fixup handlers appear
   correctly in backtraces.
 
 - A handful of miscellaneous changes, the main one being selection of
   CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK.
 
 - More mm and pgtable cleanups.
 
 - KASAN support for "asymmetric" MTE, where tag faults are reported
   synchronously for loads (via an exception) and asynchronously for
   stores (via a register).
 
 - Support for leaving the MMU enabled during kexec relocation, which
   significantly speeds up the operation.
 
 - Minor improvements to our perf PMU drivers.
 
 - Improvements to the compat vDSO build system, particularly when
   building with LLVM=1.
 
 - Preparatory work for handling some Coresight TRBE tracing errata.
 
 - Cleanup and refactoring of the SVE code to pave the way for SME
   support in future.
 
 - Ensure SCS pages are unpoisoned immediately prior to freeing them
   when KASAN is enabled for the vmalloc area.
 
 - Try moving to the generic pfn_valid() implementation again now that
   the DMA mapping issue from last time has been resolved.
 
 - Numerous improvements and additions to our FPSIMD and SVE selftests.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmF74ZYQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNI/eB/UZYAtmNi6xC5StPaETyMLeZph9BV/IqIFq
 N71ds7MFzlX/agR6MwLbH2tBHezBtlQ90O732Jjz8zAec2cHd+7sx/w82JesX7PB
 IuOfqP78rvtU4ZkKe1Rcd96QtYvbtNAqcRhIo95OzfV9xwuzkvdXI+ZTYhtCfCuZ
 GozCqQoJtnNDayMtfzbDSXyJLNJc/qnIcUQhrt3vg12zbF3BcHxnmp0nBcHCqZEo
 lDJYufju7p87kCzaFYda2WhlI3t+NThqKOiZ332wQfqzNcr+rw1Y4jWbnCfrdLtI
 JfHT9yiuHDmFSYaJrk7NU8kftW31NV70bbhD7rZ+DQCVndl0lRc=
 =3R3j
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "There's the usual summary below, but the highlights are support for
  the Armv8.6 timer extensions, KASAN support for asymmetric MTE, the
  ability to kexec() with the MMU enabled and a second attempt at
  switching to the generic pfn_valid() implementation.

  Summary:

   - Support for the Arm8.6 timer extensions, including a
     self-synchronising view of the system registers to elide some
     expensive ISB instructions.

   - Exception table cleanup and rework so that the fixup handlers
     appear correctly in backtraces.

   - A handful of miscellaneous changes, the main one being selection of
     CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK.

   - More mm and pgtable cleanups.

   - KASAN support for "asymmetric" MTE, where tag faults are reported
     synchronously for loads (via an exception) and asynchronously for
     stores (via a register).

   - Support for leaving the MMU enabled during kexec relocation, which
     significantly speeds up the operation.

   - Minor improvements to our perf PMU drivers.

   - Improvements to the compat vDSO build system, particularly when
     building with LLVM=1.

   - Preparatory work for handling some Coresight TRBE tracing errata.

   - Cleanup and refactoring of the SVE code to pave the way for SME
     support in future.

   - Ensure SCS pages are unpoisoned immediately prior to freeing them
     when KASAN is enabled for the vmalloc area.

   - Try moving to the generic pfn_valid() implementation again now that
     the DMA mapping issue from last time has been resolved.

   - Numerous improvements and additions to our FPSIMD and SVE
     selftests"

[ armv8.6 timer updates were in a shared branch and already came in
  through -tip in the timer pull  - Linus ]

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (85 commits)
  arm64: Select POSIX_CPU_TIMERS_TASK_WORK
  arm64: Document boot requirements for FEAT_SME_FA64
  arm64/sve: Fix warnings when SVE is disabled
  arm64/sve: Add stub for sve_max_virtualisable_vl()
  arm64: errata: Add detection for TRBE write to out-of-range
  arm64: errata: Add workaround for TSB flush failures
  arm64: errata: Add detection for TRBE overwrite in FILL mode
  arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
  selftests: arm64: Factor out utility functions for assembly FP tests
  arm64: vmlinux.lds.S: remove `.fixup` section
  arm64: extable: add load_unaligned_zeropad() handler
  arm64: extable: add a dedicated uaccess handler
  arm64: extable: add `type` and `data` fields
  arm64: extable: use `ex` for `exception_table_entry`
  arm64: extable: make fixup_exception() return bool
  arm64: extable: consolidate definitions
  arm64: gpr-num: support W registers
  arm64: factor out GPR numbering helpers
  arm64: kvm: use kvm_exception_table_entry
  arm64: lib: __arch_copy_to_user(): fold fixups into body
  ...
2021-11-01 16:33:53 -07:00
Linus Torvalds
9a7e0a90a4 Scheduler updates:
- Revert the printk format based wchan() symbol resolution as it can leak
    the raw value in case that the symbol is not resolvable.
 
  - Make wchan() more robust and work with all kind of unwinders by
    enforcing that the task stays blocked while unwinding is in progress.
 
  - Prevent sched_fork() from accessing an invalid sched_task_group
 
  - Improve asymmetric packing logic
 
  - Extend scheduler statistics to RT and DL scheduling classes and add
    statistics for bandwith burst to the SCHED_FAIR class.
 
  - Properly account SCHED_IDLE entities
 
  - Prevent a potential deadlock when initial priority is assigned to a
    newly created kthread. A recent change to plug a race between cpuset and
    __sched_setscheduler() introduced a new lock dependency which is now
    triggered. Break the lock dependency chain by moving the priority
    assignment to the thread function.
 
  - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
 
  - Improve idle balancing in general and especially for NOHZ enabled
    systems.
 
  - Provide proper interfaces for live patching so it does not have to
    fiddle with scheduler internals.
 
  - Add cluster aware scheduling support.
 
  - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
    scheduler options and delaying mmdrop)
 
  - The usual small tweaks and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29
 iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec
 /1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t
 aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7
 3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo
 ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1
 vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6
 mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci
 V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy
 s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6
 i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj
 d2qWG7UvoseT+g==
 =fgtS
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Thomas Gleixner:

 - Revert the printk format based wchan() symbol resolution as it can
   leak the raw value in case that the symbol is not resolvable.

 - Make wchan() more robust and work with all kind of unwinders by
   enforcing that the task stays blocked while unwinding is in progress.

 - Prevent sched_fork() from accessing an invalid sched_task_group

 - Improve asymmetric packing logic

 - Extend scheduler statistics to RT and DL scheduling classes and add
   statistics for bandwith burst to the SCHED_FAIR class.

 - Properly account SCHED_IDLE entities

 - Prevent a potential deadlock when initial priority is assigned to a
   newly created kthread. A recent change to plug a race between cpuset
   and __sched_setscheduler() introduced a new lock dependency which is
   now triggered. Break the lock dependency chain by moving the priority
   assignment to the thread function.

 - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.

 - Improve idle balancing in general and especially for NOHZ enabled
   systems.

 - Provide proper interfaces for live patching so it does not have to
   fiddle with scheduler internals.

 - Add cluster aware scheduling support.

 - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
   scheduler options and delaying mmdrop)

 - The usual small tweaks and improvements all over the place

* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
  sched/fair: Cleanup newidle_balance
  sched/fair: Remove sysctl_sched_migration_cost condition
  sched/fair: Wait before decaying max_newidle_lb_cost
  sched/fair: Skip update_blocked_averages if we are defering load balance
  sched/fair: Account update_blocked_averages in newidle_balance cost
  x86: Fix __get_wchan() for !STACKTRACE
  sched,x86: Fix L2 cache mask
  sched/core: Remove rq_relock()
  sched: Improve wake_up_all_idle_cpus() take #2
  irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
  irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
  irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
  sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
  sched: Add cluster scheduler level for x86
  sched: Add cluster scheduler level in core and related Kconfig for ARM64
  topology: Represent clusters of CPUs within a die
  sched: Disable -Wunused-but-set-variable
  sched: Add wrapper for get_wchan() to keep task blocked
  x86: Fix get_wchan() to support the ORC unwinder
  proc: Use task_is_running() for wchan in /proc/$pid/stat
  ...
2021-11-01 13:48:52 -07:00
Linus Torvalds
5a47ebe98e Updates for the interrupt subsystem:
Core changes:
 
   - Prevent a potential deadlock when initial priority is assigned to a
     newly created interrupt thread. A recent change to plug a race between
     cpuset and __sched_setscheduler() introduced a new lock dependency
     which is now triggered. Break the lock dependency chain by moving the
     priority assignment to the thread function.
 
   - A couple of small updates to make the irq core RT safe.
 
   - Confine the irq_cpu_online/offline() API to the only left unfixable
     user Cavium Octeon so that it does not grow new usage.
 
   - A small documentation update
 
  Driver changes:
 
   - A large cross architecture rework to move irq_enter/exit() into the
     architecture code to make addressing the NOHZ_FULL/RCU issues simpler.
 
   - The obligatory new irq chip driver for Microchip EIC
 
   - Modularize a few irq chip drivers
 
   - Expand usage of devm_*() helpers throughout the driver code
 
   - The usual small fixes and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF+8BUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWs2EACeNbL93aIFokd2/RllRSr4VvMjKNyW
 PpA0RYDOz1Jh4ldK+7b/EYapKgAkR3yyOtz+jyjRE7jsQK0pQeLtYNLd3cTzsD7K
 LCvl8rq6cbRqyFoSC15UKKNbQ/f+o/3LeGPoipr5NQZRMepxk2J/yBCNRXHvIbe6
 oLMQJUgw7KKtvCrCUX9OSei4F09T1qsNrIYb7QafP5+v0zndAT7uKNivWrKGFrsh
 Uk9epoH3hIkvQERkpmzwJEJaq6oyqhoYQy7ZRGayEPwIdCyivJGZrVX0mZk1LX58
 uc8u5grIslX9MqZEQWBweR5y7nISB494NGKmoCInu66U/+3DSOg3AGH2Rfw8PNFZ
 lMKdXzYoDgv2y6LeiLtTUKV4K1NBRXo0BhwSGbPw0o6C03/x003kG824Y+/naU75
 6q05BZSia1PagPV3e0UAm0A2Rnjj/5uso2fEk0eGBSGM27jf9SQcSE8DVrEiLRd1
 2N5uAXbMdfu4xACsEI1Uxu1KNOSQnUhBCy0X6Ppj1a083kLG7jg/126ebb05R8G4
 MF79PFt+xUPSzmuKc/xwCdANtW+zzoyjYl5w6mwELBJ9veNbPShokGBTN/qzjXKZ
 vdr3/pXx95lRAzFnGOnETesm3IyObruU4K8NbMKd2b+eYa0w1WuZCKnutGLfsqxg
 byhCEw459e3P2g==
 =r6ln
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt subsystem:

  Core changes:

   - Prevent a potential deadlock when initial priority is assigned to a
     newly created interrupt thread. A recent change to plug a race
     between cpuset and __sched_setscheduler() introduced a new lock
     dependency which is now triggered. Break the lock dependency chain
     by moving the priority assignment to the thread function.

   - A couple of small updates to make the irq core RT safe.

   - Confine the irq_cpu_online/offline() API to the only left unfixable
     user Cavium Octeon so that it does not grow new usage.

   - A small documentation update

  Driver changes:

   - A large cross architecture rework to move irq_enter/exit() into the
     architecture code to make addressing the NOHZ_FULL/RCU issues
     simpler.

   - The obligatory new irq chip driver for Microchip EIC

   - Modularize a few irq chip drivers

   - Expand usage of devm_*() helpers throughout the driver code

   - The usual small fixes and improvements all over the place"

* tag 'irq-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  h8300: Fix linux/irqchip.h include mess
  dt-bindings: irqchip: renesas-irqc: Document r8a774e1 bindings
  MIPS: irq: Avoid an unused-variable error
  genirq: Hide irq_cpu_{on,off}line() behind a deprecated option
  irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()
  MIPS: loongson64: Drop call to irq_cpu_offline()
  irq: remove handle_domain_{irq,nmi}()
  irq: remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
  irq: riscv: perform irqentry in entry code
  irq: openrisc: perform irqentry in entry code
  irq: csky: perform irqentry in entry code
  irq: arm64: perform irqentry in entry code
  irq: arm: perform irqentry in entry code
  irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
  irq: nds32: avoid CONFIG_HANDLE_DOMAIN_IRQ
  irq: arc: avoid CONFIG_HANDLE_DOMAIN_IRQ
  irq: add generic_handle_arch_irq()
  irq: unexport handle_irq_desc()
  irq: simplify handle_domain_{irq,nmi}()
  irq: mips: simplify do_domain_IRQ()
  ...
2021-11-01 13:09:10 -07:00
Greg Kroah-Hartman
28eb3b363d Coresight changes for v5.16
- A new option to make coresight cpu-debug capabilities available as early
 as possible in the kernel boot process.
 
 - Make trace sessions more enduring by coping with scenarios where events
 are scheduled on CPUs that can't reach the selected sink.
 
 - A set of improvement to make the TMC-ETR driver more efficient.
 
 - Enhancements to the TRBE driver to correct several errata.
 
 - An enhancement to make the AXI burts size configurable for TMC devices
 that can't work with the default value.
 
 - A fix in the CTI module to use the correct device when calling
 pm_runtime_put()
 
 - The addition of the Kryo-5xx device to the list of support ETMs.
 
 Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEeTrpXvBwUkra1RYWo5FxFnwrV6EFAmF5lPIbHG1hdGhpZXUu
 cG9pcmllckBsaW5hcm8ub3JnAAoJEKORcRZ8K1ehDhUH/1EY+5f0VEqYAXC9Frf/
 LqzdlHz1z9hA7SewKWZk52W0EF47MOwF0Td37ZxL/xV6gL29Dp7QfegWxHkCwbVX
 CuIeYxcIPRNtoIceVctLNY1BlqCFfyMWvwLkM0cqa6ZaFd+kC+WHxxjImMQ71Ohf
 4AmTZAwFItLsw+3fF8FC2J1VbmgrPc81Q5Yt+hne00bVE2TA+NWzsC+c1MR5MSWu
 kenpCi6CFyBJUQXs6kmWyrkouEq3R40NJjBePW7UUO2b/9KFmid5h5VCn/hhb/5I
 +76Cqz5+KWPZ5xKUgjmL0n6TKFOmXg2AMAQaQSJ3BaTq/odDymIyiX20WS54SjhE
 0CM=
 =dgnA
 -----END PGP SIGNATURE-----

Merge tag 'coresight-next-v5.16.v3' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux into char-misc-next

Mathieu writes:

Coresight changes for v5.16

- A new option to make coresight cpu-debug capabilities available as early
as possible in the kernel boot process.

- Make trace sessions more enduring by coping with scenarios where events
are scheduled on CPUs that can't reach the selected sink.

- A set of improvement to make the TMC-ETR driver more efficient.

- Enhancements to the TRBE driver to correct several errata.

- An enhancement to make the AXI burts size configurable for TMC devices
that can't work with the default value.

- A fix in the CTI module to use the correct device when calling
pm_runtime_put()

- The addition of the Kryo-5xx device to the list of support ETMs.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>

* tag 'coresight-next-v5.16.v3' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux: (39 commits)
  arm64: errata: Enable TRBE workaround for write to out-of-range address
  arm64: errata: Enable workaround for TRBE overwrite in FILL mode
  coresight: trbe: Work around write to out of range
  coresight: trbe: Make sure we have enough space
  coresight: trbe: Add a helper to determine the minimum buffer size
  coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  coresight: trbe: Add infrastructure for Errata handling
  coresight: trbe: Allow driver to choose a different alignment
  coresight: trbe: Decouple buffer base from the hardware base
  coresight: trbe: Add a helper to pad a given buffer area
  coresight: trbe: Add a helper to calculate the trace generated
  coresight: trbe: Defer the probe on offline CPUs
  coresight: trbe: Fix incorrect access of the sink specific data
  coresight: etm4x: Add ETM PID for Kryo-5XX
  coresight: trbe: Prohibit trace before disabling TRBE
  coresight: trbe: End the AUX handle on truncation
  coresight: trbe: Do not truncate buffer on IRQ
  coresight: trbe: Fix handling of spurious interrupts
  coresight: trbe: irq handler: Do not disable TRBE if no action is needed
  coresight: trbe: Unify the enabling sequence
  ...
2021-10-30 10:48:32 +02:00
Will Deacon
b2909a447e Merge branch 'for-next/vdso' into for-next/core
* for-next/vdso:
  arm64: vdso32: require CROSS_COMPILE_COMPAT for gcc+bfd
  arm64: vdso32: suppress error message for 'make mrproper'
  arm64: vdso32: drop test for -march=armv8-a
  arm64: vdso32: drop the test for dmb ishld
2021-10-29 12:25:38 +01:00
Will Deacon
e5f5210212 Merge branch 'for-next/trbe-errata' into for-next/core
* for-next/trbe-errata:
  arm64: errata: Add detection for TRBE write to out-of-range
  arm64: errata: Add workaround for TSB flush failures
  arm64: errata: Add detection for TRBE overwrite in FILL mode
  arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
2021-10-29 12:25:33 +01:00
Will Deacon
16c200e040 Merge branch 'for-next/pfn-valid' into for-next/core
* for-next/pfn-valid:
  arm64/mm: drop HAVE_ARCH_PFN_VALID
  dma-mapping: remove bogus test for pfn_valid from dma_map_resource
2021-10-29 12:25:19 +01:00
Will Deacon
2bc655ce29 Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  arm64: Select POSIX_CPU_TIMERS_TASK_WORK
  arm64: Document boot requirements for FEAT_SME_FA64
  arm64: ftrace: use function_nocfi for _mcount as well
  arm64: asm: setup.h: export common variables
  arm64/traps: Avoid unnecessary kernel/user pointer conversion
2021-10-29 12:24:59 +01:00
Nicolas Saenz Julienne
a68773bd32 arm64: Select POSIX_CPU_TIMERS_TASK_WORK
With 6caa5812e2d1 ("KVM: arm64: Use generic KVM xfer to guest work
function") all arm64 exit paths are properly equipped to handle the
POSIX timers' task work.

Deferring timer callbacks to thread context, not only limits the amount
of time spent in hard interrupt context, but is a safer
implementation[1], and will allow PREEMPT_RT setups to use KVM[2].

So let's enable POSIX_CPU_TIMERS_TASK_WORK on arm64.

[1] https://lore.kernel.org/all/20200716201923.228696399@linutronix.de/
[2] https://lore.kernel.org/linux-rt-users/87v92bdnlx.ffs@tglx/

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211018144713.873464-1-nsaenzju@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-28 09:56:23 +01:00
Suzuki K Poulose
561ced0bb9 arm64: errata: Enable TRBE workaround for write to out-of-range address
With the TRBE driver workaround available, enable the config symbols
to be built without COMPILE_TEST

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20211019163153.3692640-16-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27 11:46:06 -06:00
Suzuki K Poulose
74b2740f57 arm64: errata: Enable workaround for TRBE overwrite in FILL mode
With the workaround enabled in TRBE, enable the config entries
to be built without COMPILE_TEST

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20211019163153.3692640-15-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-10-27 11:46:04 -06:00
Mark Rutland
0953fb2637 irq: remove handle_domain_{irq,nmi}()
Now that entry code handles IRQ entry (including setting the IRQ regs)
before calling irqchip code, irqchip code can safely call
generic_handle_domain_irq(), and there's no functional reason for it to
call handle_domain_irq().

Let's cement this split of responsibility and remove handle_domain_irq()
entirely, updating irqchip drivers to call generic_handle_domain_irq().

For consistency, handle_domain_nmi() is similarly removed and replaced
with a generic_handle_domain_nmi() function which also does not perform
any entry logic.

Previously handle_domain_{irq,nmi}() had a WARN_ON() which would fire
when they were called in an inappropriate context. So that we can
identify similar issues going forward, similar WARN_ON_ONCE() logic is
added to the generic_handle_*() functions, and comments are updated for
clarity and consistency.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-26 10:13:31 +01:00
Mark Rutland
26dc129342 irq: arm64: perform irqentry in entry code
In preparation for removing HANDLE_DOMAIN_IRQ_IRQENTRY, have arch/arm64
perform all the irqentry accounting in its entry code.

As arch/arm64 already performs portions of the irqentry logic in
enter_from_kernel_mode() and exit_to_kernel_mode(), including
rcu_irq_{enter,exit}(), the only additional calls that need to be made
are to irq_{enter,exit}_rcu(). Removing the calls to
rcu_irq_{enter,exit}() from handle_domain_irq() ensures that we inform
RCU once per IRQ entry and will correctly identify quiescent periods.

Since we should not call irq_{enter,exit}_rcu() when entering a
pseudo-NMI, el1_interrupt() is reworked to have separate __el1_irq() and
__el1_pnmi() paths for regular IRQ and psuedo-NMI entry, with
irq_{enter,exit}_irq() only called for the former.

In preparation for removing HANDLE_DOMAIN_IRQ, the irq regs are managed
in do_interrupt_handler() for both regular IRQ and pseudo-NMI. This is
currently redundant, but not harmful.

For clarity the preemption logic is moved into __el1_irq(). We should
never preempt within a pseudo-NMI, and arm64_enter_nmi() already
enforces this by incrementing the preempt_count, but it's clearer if we
never invoke the preemption logic when entering a pseudo-NMI.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Pingfan Liu <kernelfans@gmail.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
2021-10-26 10:12:53 +01:00
Mark Rutland
2fe35f8ee7 irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
Going forward we want architecture/entry code to perform all the
necessary work to enter/exit IRQ context, with irqchip code merely
handling the mapping of the interrupt to any handler(s). Among other
reasons, this is necessary to consistently fix some longstanding issues
with the ordering of lockdep/RCU/tracing instrumentation which many
architectures get wrong today in their entry code.

Importantly, rcu_irq_{enter,exit}() must be called precisely once per
IRQ exception, so that rcu_is_cpu_rrupt_from_idle() can correctly
identify when an interrupt was taken from an idle context which must be
explicitly preempted. Currently handle_domain_irq() calls
rcu_irq_{enter,exit}() via irq_{enter,exit}(), but entry code needs to
be able to call rcu_irq_{enter,exit}() earlier for correct ordering
across lockdep/RCU/tracing updates for sequences such as:

  lockdep_hardirqs_off(CALLER_ADDR0);
  rcu_irq_enter();
  trace_hardirqs_off_finish();

To permit each architecture to be converted to the new style in turn,
this patch adds a new CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY selected by all
current users of HANDLE_DOMAIN_IRQ, which gates the existing behaviour.
When CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY is not selected,
handle_domain_irq() requires entry code to perform the
irq_{enter,exit}() work, with an explicit check for this matching the
style of handle_domain_nmi().

Subsequent patches will:

1) Add the necessary IRQ entry accounting to each architecture in turn,
   dropping CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY from that architecture's
   Kconfig.

2) Remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY once it is no longer
   selected.

3) Convert irqchip drivers to consistently use
   generic_handle_domain_irq() rather than handle_domain_irq().

4) Remove handle_domain_irq() and CONFIG_HANDLE_DOMAIN_IRQ.

... which should leave us with a clear split of responsiblity across the
entry and irqchip code, making it possible to perform additional
cleanups and fixes for the aforementioned longstanding issues with entry
code.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25 10:05:30 +01:00
Masami Hiramatsu
cd9bc2c925 arm64: Recover kretprobe modified return address in stacktrace
Since the kretprobe replaces the function return address with
the kretprobe_trampoline on the stack, stack unwinder shows it
instead of the correct return address.

This checks whether the next return address is the
__kretprobe_trampoline(), and if so, try to find the correct
return address from the kretprobe instance list. For this purpose
this adds 'kr_cur' loop cursor to memorize the current kretprobe
instance.

With this fix, now arm64 can enable
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE, and pass the
kprobe self tests.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-22 12:16:53 -04:00
Suzuki K Poulose
8d81b2a38d arm64: errata: Add detection for TRBE write to out-of-range
Arm Neoverse-N2 and Cortex-A710 cores are affected by an erratum where
the trbe, under some circumstances, might write upto 64bytes to an
address after the Limit as programmed by the TRBLIMITR_EL1.LIMIT.
This might -
  - Corrupt a page in the ring buffer, which may corrupt trace from a
    previous session, consumed by userspace.
  - Hit the guard page at the end of the vmalloc area and raise a fault.

To keep the handling simpler, we always leave the last page from the
range, which TRBE is allowed to write. This can be achieved by ensuring
that we always have more than a PAGE worth space in the range, while
calculating the LIMIT for TRBE. And then the LIMIT pointer can be
adjusted to leave the PAGE (TRBLIMITR.LIMIT -= PAGE_SIZE), out of the
TRBE range while enabling it. This makes sure that the TRBE will only
write to an area within its allowed limit (i.e, [head-head+size]) and
we do not have to handle address faults within the driver.

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20211019163153.3692640-5-suzuki.poulose@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21 17:49:17 +01:00
Suzuki K Poulose
fa82d0b4b8 arm64: errata: Add workaround for TSB flush failures
Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffers
from errata, where a TSB (trace synchronization barrier)
fails to flush the trace data completely, when executed from
a trace prohibited region. In Linux we always execute it
after we have moved the PE to trace prohibited region. So,
we can apply the workaround every time a TSB is executed.

The work around is to issue two TSB consecutively.

NOTE: This errata is defined as LOCAL_CPU_ERRATUM, implying
that a late CPU could be blocked from booting if it is the
first CPU that requires the workaround. This is because we
do not allow setting a cpu_hwcaps after the SMP boot. The
other alternative is to use "this_cpu_has_cap()" instead
of the faster system wide check, which may be a bit of an
overhead, given we may have to do this in nvhe KVM host
before a guest entry.

Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20211019163153.3692640-4-suzuki.poulose@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21 17:49:16 +01:00
Suzuki K Poulose
b9d216fcef arm64: errata: Add detection for TRBE overwrite in FILL mode
Arm Neoverse-N2 and the Cortex-A710 cores are affected
by a CPU erratum where the TRBE will overwrite the trace buffer
in FILL mode. The TRBE doesn't stop (as expected in FILL mode)
when it reaches the limit and wraps to the base to continue
writing upto 3 cache lines. This will overwrite any trace that
was written previously.

Add the Neoverse-N2 erratum(#2139208) and Cortex-A710 erratum
(#2119858) to the detection logic.

This will be used by the TRBE driver in later patches to work
around the issue. The detection has been kept with the core
arm64 errata framework list to make sure :
  - We don't duplicate the framework in TRBE driver
  - The errata detection is advertised like the rest
    of the CPU errata.

Note that the Kconfig entries are not fully active until the
TRBE driver implements the work around.

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
cc: Leo Yan <leo.yan@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20211019163153.3692640-3-suzuki.poulose@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21 17:49:16 +01:00
Nick Desaulniers
3e6f8d1fa1 arm64: vdso32: require CROSS_COMPILE_COMPAT for gcc+bfd
Similar to
commit 231ad7f409f1 ("Makefile: infer --target from ARCH for CC=clang")
There really is no point in setting --target based on
$CROSS_COMPILE_COMPAT for clang when the integrated assembler is being
used, since
commit ef94340583ee ("arm64: vdso32: drop -no-integrated-as flag").

Allows COMPAT_VDSO to be selected without setting $CROSS_COMPILE_COMPAT
when using clang and lld together.

Before:
$ ARCH=arm64 CROSS_COMPILE_COMPAT=arm-linux-gnueabi- make -j72 LLVM=1 defconfig
$ grep CONFIG_COMPAT_VDSO .config
CONFIG_COMPAT_VDSO=y
$ ARCH=arm64 make -j72 LLVM=1 defconfig
$ grep CONFIG_COMPAT_VDSO .config
$

After:
$ ARCH=arm64 CROSS_COMPILE_COMPAT=arm-linux-gnueabi- make -j72 LLVM=1 defconfig
$ grep CONFIG_COMPAT_VDSO .config
CONFIG_COMPAT_VDSO=y
$ ARCH=arm64 make -j72 LLVM=1 defconfig
$ grep CONFIG_COMPAT_VDSO .config
CONFIG_COMPAT_VDSO=y

Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20211019223646.1146945-5-ndesaulniers@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-21 10:31:39 +01:00
Barry Song
778c558f49 sched: Add cluster scheduler level in core and related Kconfig for ARM64
This patch adds scheduler level for clusters and automatically enables
the load balance among clusters. It will directly benefit a lot of
workload which loves more resources such as memory bandwidth, caches.

Testing has widely been done in two different hardware configurations of
Kunpeng920:

 24 cores in one NUMA(6 clusters in each NUMA node);
 32 cores in one NUMA(8 clusters in each NUMA node)

Workload is running on either one NUMA node or four NUMA nodes, thus,
this can estimate the effect of cluster spreading w/ and w/o NUMA load
balance.

* Stream benchmark:

4threads stream (on 1NUMA * 24cores = 24cores)
                stream                 stream
                w/o patch              w/ patch
MB/sec copy     29929.64 (   0.00%)    32932.68 (  10.03%)
MB/sec scale    29861.10 (   0.00%)    32710.58 (   9.54%)
MB/sec add      27034.42 (   0.00%)    32400.68 (  19.85%)
MB/sec triad    27225.26 (   0.00%)    31965.36 (  17.41%)

6threads stream (on 1NUMA * 24cores = 24cores)
                stream                 stream
                w/o patch              w/ patch
MB/sec copy     40330.24 (   0.00%)    42377.68 (   5.08%)
MB/sec scale    40196.42 (   0.00%)    42197.90 (   4.98%)
MB/sec add      37427.00 (   0.00%)    41960.78 (  12.11%)
MB/sec triad    37841.36 (   0.00%)    42513.64 (  12.35%)

12threads stream (on 1NUMA * 24cores = 24cores)
                stream                 stream
                w/o patch              w/ patch
MB/sec copy     52639.82 (   0.00%)    53818.04 (   2.24%)
MB/sec scale    52350.30 (   0.00%)    53253.38 (   1.73%)
MB/sec add      53607.68 (   0.00%)    55198.82 (   2.97%)
MB/sec triad    54776.66 (   0.00%)    56360.40 (   2.89%)

Thus, it could help memory-bound workload especially under medium load.
Similar improvement is also seen in lkp-pbzip2:

* lkp-pbzip2 benchmark

2-96 threads (on 4NUMA * 24cores = 96cores)
                  lkp-pbzip2              lkp-pbzip2
                  w/o patch               w/ patch
Hmean     tput-2   11062841.57 (   0.00%)  11341817.51 *   2.52%*
Hmean     tput-5   26815503.70 (   0.00%)  27412872.65 *   2.23%*
Hmean     tput-8   41873782.21 (   0.00%)  43326212.92 *   3.47%*
Hmean     tput-12  61875980.48 (   0.00%)  64578337.51 *   4.37%*
Hmean     tput-21 105814963.07 (   0.00%) 111381851.01 *   5.26%*
Hmean     tput-30 150349470.98 (   0.00%) 156507070.73 *   4.10%*
Hmean     tput-48 237195937.69 (   0.00%) 242353597.17 *   2.17%*
Hmean     tput-79 360252509.37 (   0.00%) 362635169.23 *   0.66%*
Hmean     tput-96 394571737.90 (   0.00%) 400952978.48 *   1.62%*

2-24 threads (on 1NUMA * 24cores = 24cores)
                 lkp-pbzip2               lkp-pbzip2
                 w/o patch                w/ patch
Hmean     tput-2   11071705.49 (   0.00%)  11296869.10 *   2.03%*
Hmean     tput-4   20782165.19 (   0.00%)  21949232.15 *   5.62%*
Hmean     tput-6   30489565.14 (   0.00%)  33023026.96 *   8.31%*
Hmean     tput-8   40376495.80 (   0.00%)  42779286.27 *   5.95%*
Hmean     tput-12  61264033.85 (   0.00%)  62995632.78 *   2.83%*
Hmean     tput-18  86697139.39 (   0.00%)  86461545.74 (  -0.27%)
Hmean     tput-24 104854637.04 (   0.00%) 104522649.46 *  -0.32%*

In the case of 6 threads and 8 threads, we see the greatest performance
improvement.

Similar improvement can be seen on lkp-pixz though the improvement is
smaller:

* lkp-pixz benchmark

2-24 threads lkp-pixz (on 1NUMA * 24cores = 24cores)
                  lkp-pixz               lkp-pixz
                  w/o patch              w/ patch
Hmean     tput-2   6486981.16 (   0.00%)  6561515.98 *   1.15%*
Hmean     tput-4  11645766.38 (   0.00%) 11614628.43 (  -0.27%)
Hmean     tput-6  15429943.96 (   0.00%) 15957350.76 *   3.42%*
Hmean     tput-8  19974087.63 (   0.00%) 20413746.98 *   2.20%*
Hmean     tput-12 28172068.18 (   0.00%) 28751997.06 *   2.06%*
Hmean     tput-18 39413409.54 (   0.00%) 39896830.55 *   1.23%*
Hmean     tput-24 49101815.85 (   0.00%) 49418141.47 *   0.64%*

* SPECrate benchmark

4,8,16 copies mcf_r(on 1NUMA * 32cores = 32cores)
		Base     	 	Base
		Run Time   	 	Rate
		-------  	 	---------
4 Copies	w/o 580 (w/ 570)       	w/o 11.1 (w/ 11.3)
8 Copies	w/o 647 (w/ 605)       	w/o 20.0 (w/ 21.4, +7%)
16 Copies	w/o 844 (w/ 844)       	w/o 30.6 (w/ 30.6)

32 Copies(on 4NUMA * 32 cores = 128cores)
[w/o patch]
                 Base     Base        Base
Benchmarks       Copies  Run Time     Rate
--------------- -------  ---------  ---------
500.perlbench_r      32        584       87.2  *
502.gcc_r            32        503       90.2  *
505.mcf_r            32        745       69.4  *
520.omnetpp_r        32       1031       40.7  *
523.xalancbmk_r      32        597       56.6  *
525.x264_r            1         --            CE
531.deepsjeng_r      32        336      109    *
541.leela_r          32        556       95.4  *
548.exchange2_r      32        513      163    *
557.xz_r             32        530       65.2  *
 Est. SPECrate2017_int_base              80.3

[w/ patch]
                  Base     Base        Base
Benchmarks       Copies  Run Time     Rate
--------------- -------  ---------  ---------
500.perlbench_r      32        580      87.8 (+0.688%)  *
502.gcc_r            32        477      95.1 (+5.432%)  *
505.mcf_r            32        644      80.3 (+13.574%) *
520.omnetpp_r        32        942      44.6 (+9.58%)   *
523.xalancbmk_r      32        560      60.4 (+6.714%%) *
525.x264_r            1         --           CE
531.deepsjeng_r      32        337      109  (+0.000%) *
541.leela_r          32        554      95.6 (+0.210%) *
548.exchange2_r      32        515      163  (+0.000%) *
557.xz_r             32        524      66.0 (+1.227%) *
 Est. SPECrate2017_int_base              83.7 (+4.062%)

On the other hand, it is slightly helpful to CPU-bound tasks like
kernbench:

* 24-96 threads kernbench (on 4NUMA * 24cores = 96cores)
                     kernbench              kernbench
                     w/o cluster            w/ cluster
Min       user-24    12054.67 (   0.00%)    12024.19 (   0.25%)
Min       syst-24     1751.51 (   0.00%)     1731.68 (   1.13%)
Min       elsp-24      600.46 (   0.00%)      598.64 (   0.30%)
Min       user-48    12361.93 (   0.00%)    12315.32 (   0.38%)
Min       syst-48     1917.66 (   0.00%)     1892.73 (   1.30%)
Min       elsp-48      333.96 (   0.00%)      332.57 (   0.42%)
Min       user-96    12922.40 (   0.00%)    12921.17 (   0.01%)
Min       syst-96     2143.94 (   0.00%)     2110.39 (   1.56%)
Min       elsp-96      211.22 (   0.00%)      210.47 (   0.36%)
Amean     user-24    12063.99 (   0.00%)    12030.78 *   0.28%*
Amean     syst-24     1755.20 (   0.00%)     1735.53 *   1.12%*
Amean     elsp-24      601.60 (   0.00%)      600.19 (   0.23%)
Amean     user-48    12362.62 (   0.00%)    12315.56 *   0.38%*
Amean     syst-48     1921.59 (   0.00%)     1894.95 *   1.39%*
Amean     elsp-48      334.10 (   0.00%)      332.82 *   0.38%*
Amean     user-96    12925.27 (   0.00%)    12922.63 (   0.02%)
Amean     syst-96     2146.66 (   0.00%)     2122.20 *   1.14%*
Amean     elsp-96      211.96 (   0.00%)      211.79 (   0.08%)

Note this patch isn't an universal win, it might hurt those workload
which can benefit from packing. Though tasks which want to take
advantages of lower communication latency of one cluster won't
necessarily been packed in one cluster while kernel is not aware of
clusters, they have some chance to be randomly packed. But this
patch will make them more likely spread.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2021-10-15 11:25:16 +02:00
Sean Christopherson
e26bb75aa2 KVM: arm64: Depend on HAVE_KVM instead of OF
Select HAVE_KVM at all times on arm64, as the OF requirement is
always there (even in the case of an ACPI system, we still depend
on some of the OF infrastructure), and won't fo away.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Will Deacon <will@kernel.org>
[maz: Drop the "HAVE_KVM if OF" dependency, as OF is always there on arm64,
 new commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210921222231.518092-3-seanjc@google.com
2021-10-11 10:08:50 +01:00