Add a new signal context type for ZT which is present in the signal frame
when ZA is enabled and ZT is supported by the system. In order to account
for the possible addition of further ZT registers in the future we make the
number of registers variable in the ABI, though currently the only possible
number is 1. We could just use a bare list head for the context since the
number of registers can be inferred from the size of the context but for
usability and future extensibility we define a header with the number of
registers and some reserved fields in it.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-11-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for adding support for storage for ZT0 to the thread_struct
rename za_state to sme_state. Since ZT0 is accessible when PSTATE.ZA is
set just like ZA itself we will extend the allocation done for ZA to
cover it, avoiding the need to further expand task_struct for non-SME
tasks.
No functional changes.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-1-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When we save the state for the floating point registers this can be done
in the form visible through either the FPSIMD V registers or the SVE Z and
P registers. At present we track which format is currently used based on
TIF_SVE and the SME streaming mode state but particularly in the SVE case
this limits our options for optimising things, especially around syscalls.
Introduce a new enum which we place together with saved floating point
state in both thread_struct and the KVM guest state which explicitly
states which format is active and keep it up to date when we change it.
At present we do not use this state except to verify that it has the
expected value when loading the state, future patches will introduce
functional changes.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221115094640.112848-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Currently when taking a SME access trap we allocate storage for the SVE
register state in order to be able to handle storage of streaming mode SVE.
Due to the original usage in a purely SVE context the SVE register state
allocation this also flushes the register state for SVE if storage was
already allocated but in the SME context this is not desirable. For a SME
access trap to be taken the task must not be in streaming mode so either
there already is SVE register state present for regular SVE mode which would
be corrupted or the task does not have TIF_SVE and the flush is redundant.
Fix this by adding a flag to sve_alloc() indicating if we are in a SVE
context and need to flush the state. Freshly allocated storage is always
zeroed either way.
Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220817182324.638214-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
When handling a signal delivered to a context with streaming mode enabled
we will disable streaming mode for the signal handler, when doing so we
should also flush the saved FPSIMD register state like exiting streaming
mode in the hardware would do so that if that state is reloaded we get the
same behaviour. Without this we will reload whatever the last FPSIMD state
that was saved for the task was.
Fixes: 40a8e87bb328 ("arm64/sme: Disable ZA and streaming mode when handling signals")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220817182324.638214-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The signal code has a limit of 64K on the size of a stack frame that it
will generate, if this limit is exceeded then a process will be killed if
it receives a signal. Unfortunately with the advent of SME this limit is
too small - the maximum possible size of the ZA register alone is 64K. This
is not an issue for practical systems at present but is easily seen using
virtual platforms.
Raise the limit to 256K, this is substantially more than could be used by
any current architecture extension.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220817182324.638214-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Currently when restoring signal state we check to see if SVE is supported
in restore_sigframe() but check to see if SVE is supported inside
restore_sve_fpsimd_context(). This makes no real difference since SVE is
always supported in systems with SME but looks a bit untidy and makes
things slightly harder to follow, move the SVE check next to the SME one
in restore_sve_fpsimd_context().
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220624172108.555000-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
- Initialise jump labels before setup_machine_fdt(), needed by commit
f5bda35fba61 ("random: use static branch for crng_ready()").
- Sparse warnings: missing prototype, incorrect __user annotation.
- Skip SVE kselftest if not sufficient vector lengths supported.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmKaQ/8ACgkQa9axLQDI
XvE+eRAAjVQr4ehVgT75SIYq9cv/6A8cAlYzK4hi7cwhFLTii/9flZLmqTfjUwbY
Ifnm1/c/ka11vAC4JfzhEcAjIm5jJ7Wlw4XqMhGC2woTjmCrDxSlG1qrdt1zZlgI
H7RXCjCpfstCWFoZy4uDaaWZYgN6c91/IPuuamBUfADXu5NbeD9yfZALYUhiE9FP
yZ61BLDP0f8l0x5PMmDATidZ/vtIp+Hfiq1iezX0VugIY4+v6nbi6/Ba20Wc419v
cmDgjRc1ODaWnuWLUTiriUlt2VEtBs3LrGXa81WSrTKR9+DXih4Gwj4orpW7Rqx3
CFeSgVCDd5V9/XGEExN5GtMtiolDWNG8dA58jwwaWHWgp8AveSGY8AxPT+8/30/S
FQyVGjxrX1JUkIya3lbVuDLKHdzCKEX0O83dFQSm3bP6lUMZUCYCcFNpVpSYPsQr
ZZea9dNWhwiI82z19p7igfWKwerBMdUeptxMm247jKmdz4H/i6xGtK6nJc731zCZ
kjxoI6HCYAMH/r9kgWYldgeov/UNQXi5Q2ftsXkUepPPjpEd5YBJfkL6CzKgQ6u5
wtOTI8fainjPVWPGbcpHnEjlwG5UMY80jcJpfkD54g9N5WxunCWf7L3OZj77toTp
Z2kU2b189Di4bNOV9bCKFqrecqajNychMhXcc5anMfdN+284mdg=
=25eW
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"Most of issues addressed were introduced during this merging window.
- Initialise jump labels before setup_machine_fdt(), needed by commit
f5bda35fba61 ("random: use static branch for crng_ready()").
- Sparse warnings: missing prototype, incorrect __user annotation.
- Skip SVE kselftest if not sufficient vector lengths supported"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
kselftest/arm64: signal: Skip SVE signal test if not enough VLs supported
arm64: Initialize jump labels before setup_machine_fdt()
arm64: hibernate: Fix syntax errors in comments
arm64: Remove the __user annotation for the restore_za_context() argument
ftrace/fgraph: fix increased missing-prototypes warnings
The struct user_ctx *user pointer passed to restore_za_context() is not
a user point but a structure containing several __user pointers. Remove
the __user annotation.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 39782210eb7e ("arm64/sme: Implement ZA signal handling")
Reported-by: kernel test robot <lkp@intel.com>
Cc: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220601171338.2143625-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Platform PMU changes:
=====================
- x86/intel:
- Add new Intel Alder Lake and Raptor Lake support
- x86/amd:
- AMD Zen4 IBS extensions support
- Add AMD PerfMonV2 support
- Add AMD Fam19h Branch Sampling support
Generic changes:
================
- signal: Deliver SIGTRAP on perf event asynchronously if blocked
Perf instrumentation can be driven via SIGTRAP, but this causes a problem
when SIGTRAP is blocked by a task & terminate the task.
Allow user-space to request these signals asynchronously (after they get
unblocked) & also give the information to the signal handler when this
happens:
" To give user space the ability to clearly distinguish synchronous from
asynchronous signals, introduce siginfo_t::si_perf_flags and
TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is
required in future).
The resolution to the problem is then to (a) no longer force the signal
(avoiding the terminations), but (b) tell user space via si_perf_flags
if the signal was synchronous or not, so that such signals can be
handled differently (e.g. let user space decide to ignore or consider
the data imprecise). "
- Unify/standardize the /sys/devices/cpu/events/* output format.
- Misc fixes & cleanups.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmKLuiURHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1ioSRAAgM3PneFHn5MFiuV/8ZfP3xMHNUOYOCgN
JhALRcUhDdL4N9pS0DSImfXvAlYPJ/TZK8qBRNDsRgygp5vjrbr9zH2HdZBW1gyV
qi3bpuNS+METnfNyumAoBeOYbMIvpm3NDUX+w68Xvkd1g8ykyno8Zc2H2hj3IDsW
cK3ErP0CZLsnBZsymy29/bxCYhfxsED6J06hOa8R3Tvl4XYg/27Z+tEuZ4GYeFS8
VikulYB9RhRWUbhkzwjyRSbTWyvsuXP+xD28ymUIxXaNCDOwxK8uYtVepUFIBO8X
cZgtwT2faV3y5ZAnz02M+/JZl+Jz5EPm037vNQp9aJsTuAbAGnxh/hL0cBVuDqhv
Nh9wkqS8FqwAbtpvg/IeamzqN5z/Yn2Q/Jyk/4oWipmeddXWUL7sYVoSduTGJJkz
cZz2ciNQbnOCzv0ZSjihrGMqPaT+/wI/iLW3ouLoZXpfTtVVRiiLuI1DDAZ1rd2r
D6djV8JjHIs71V/6E9ahVATxq8yMdikd7u734rA5K3XSxIBTYrdshbOhddzgeE7d
chQ7XvpQXDoFrZtxkHXP5iIeNF7fU9MWNWaEcsrZaWEB/8UpD6eL2if1Kl8mog+h
J4+zR1LWRHh8TNRfos3yCP2PSbbS6LPVsYLJzP+bb+pxgqdJ+urxfmxoCtY5trNI
zHT52xfdxSo=
=UqYA
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
"Platform PMU changes:
- x86/intel:
- Add new Intel Alder Lake and Raptor Lake support
- x86/amd:
- AMD Zen4 IBS extensions support
- Add AMD PerfMonV2 support
- Add AMD Fam19h Branch Sampling support
Generic changes:
- signal: Deliver SIGTRAP on perf event asynchronously if blocked
Perf instrumentation can be driven via SIGTRAP, but this causes a
problem when SIGTRAP is blocked by a task & terminate the task.
Allow user-space to request these signals asynchronously (after
they get unblocked) & also give the information to the signal
handler when this happens:
"To give user space the ability to clearly distinguish
synchronous from asynchronous signals, introduce
siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for
flags in case more binary information is required in future).
The resolution to the problem is then to (a) no longer force the
signal (avoiding the terminations), but (b) tell user space via
si_perf_flags if the signal was synchronous or not, so that such
signals can be handled differently (e.g. let user space decide
to ignore or consider the data imprecise). "
- Unify/standardize the /sys/devices/cpu/events/* output format.
- Misc fixes & cleanups"
* tag 'perf-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
perf/x86/amd/core: Fix reloading events for SVM
perf/x86/amd: Run AMD BRS code only on supported hw
perf/x86/amd: Fix AMD BRS period adjustment
perf/x86/amd: Remove unused variable 'hwc'
perf/ibs: Fix comment
perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute
perf/amd/ibs: Add support for L3 miss filtering
perf/amd/ibs: Use ->is_visible callback for dynamic attributes
perf/amd/ibs: Cascade pmu init functions' return value
perf/x86/uncore: Add new Alder Lake and Raptor Lake support
perf/x86/uncore: Clean up uncore_pci_ids[]
perf/x86/cstate: Add new Alder Lake and Raptor Lake support
perf/x86/msr: Add new Alder Lake and Raptor Lake support
perf/x86: Add new Alder Lake and Raptor Lake support
perf/amd/ibs: Use interrupt regs ip for stack unwinding
perf/x86/amd/core: Add PerfMonV2 overflow handling
perf/x86/amd/core: Add PerfMonV2 counter control
perf/x86/amd/core: Detect available counters
perf/x86/amd/core: Detect PerfMonV2 support
x86/msr: Add PerfCntrGlobal* registers
...
The defines for SVCR call it SVCR_EL0 however the architecture calls the
register SVCR with no _EL0 suffix. In preparation for generating the sysreg
definitions rename to match the architecture, no functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220510161208.631259-6-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Implement support for ZA in signal handling in a very similar way to how
we implement support for SVE registers, using a signal context structure
with optional register state after it. Where present this register state
stores the ZA matrix as a series of horizontal vectors numbered from 0 to
VL/8 in the endinanness independent format used for vectors.
As with SVE we do not allow changes in the vector length during signal
return but we do allow ZA to be enabled or disabled.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-20-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When in streaming mode we have the same set of SVE registers as we do in
regular SVE mode with the exception of FFR and the use of the SME vector
length. Provide signal handling for these registers by taking one of the
reserved words in the SVE signal context as a flags field and defining a
flag which is set for streaming mode. When the flag is set the vector
length is set to the streaming mode vector length and we save and
restore streaming mode data. We support entering or leaving streaming
mode based on the value of the flag but do not support changing the
vector length, this is not currently supported SVE signal handling.
We could instead allocate a separate record in the signal frame for the
streaming mode SVE context but this inflates the size of the maximal signal
frame required and adds complication when validating signal frames from
userspace, especially given the current structure of the code.
Any implementation of support for streaming mode vectors in signals will
have some potential for causing issues for applications that attempt to
handle SVE vectors in signals, use streaming mode but do not understand
streaming mode in their signal handling code, it is hard to identify a
case that is clearly better than any other - they all have cases where
they could cause unexpected register corruption or faults.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-19-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The ABI requires that streaming mode and ZA are disabled when invoking
signal handlers, do this in setup_return() when we prepare the task state
for the signal handler.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-18-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
With SIGTRAP on perf events, we have encountered termination of
processes due to user space attempting to block delivery of SIGTRAP.
Consider this case:
<set up SIGTRAP on a perf event>
...
sigset_t s;
sigemptyset(&s);
sigaddset(&s, SIGTRAP | <and others>);
sigprocmask(SIG_BLOCK, &s, ...);
...
<perf event triggers>
When the perf event triggers, while SIGTRAP is blocked, force_sig_perf()
will force the signal, but revert back to the default handler, thus
terminating the task.
This makes sense for error conditions, but not so much for explicitly
requested monitoring. However, the expectation is still that signals
generated by perf events are synchronous, which will no longer be the
case if the signal is blocked and delivered later.
To give user space the ability to clearly distinguish synchronous from
asynchronous signals, introduce siginfo_t::si_perf_flags and
TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is
required in future).
The resolution to the problem is then to (a) no longer force the signal
(avoiding the terminations), but (b) tell user space via si_perf_flags
if the signal was synchronous or not, so that such signals can be
handled differently (e.g. let user space decide to ignore or consider
the data imprecise).
The alternative of making the kernel ignore SIGTRAP on perf events if
the signal is blocked may work for some usecases, but likely causes
issues in others that then have to revert back to interception of
sigprocmask() (which we want to avoid). [ A concrete example: when using
breakpoint perf events to track data-flow, in a region of code where
signals are blocked, data-flow can no longer be tracked accurately.
When a relevant asynchronous signal is received after unblocking the
signal, the data-flow tracking logic needs to know its state is
imprecise. ]
Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/r/20220404111204.935357-1-elver@google.com
This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was
around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
making the semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where now anything left in tracehook.h is
some weird strange thing that is difficult to understand.
Eric W. Biederman (15):
ptrace: Move ptrace_report_syscall into ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Remove tracehook_signal_handler
task_work: Remove unnecessary include from posix_timers.h
task_work: Introduce task_work_pending
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
resume_user_mode: Move to resume_user_mode.h
tracehook: Remove tracehook.h
ptrace: Move setting/clearing ptrace_message into ptrace_stop
ptrace: Return the signal to continue with from ptrace_stop
Jann Horn (1):
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
Yang Li (1):
ptrace: Remove duplicated include in ptrace.c
MAINTAINERS | 1 -
arch/Kconfig | 5 +-
arch/alpha/kernel/ptrace.c | 5 +-
arch/alpha/kernel/signal.c | 4 +-
arch/arc/kernel/ptrace.c | 5 +-
arch/arc/kernel/signal.c | 4 +-
arch/arm/kernel/ptrace.c | 12 +-
arch/arm/kernel/signal.c | 4 +-
arch/arm64/kernel/ptrace.c | 14 +--
arch/arm64/kernel/signal.c | 4 +-
arch/csky/kernel/ptrace.c | 5 +-
arch/csky/kernel/signal.c | 4 +-
arch/h8300/kernel/ptrace.c | 5 +-
arch/h8300/kernel/signal.c | 4 +-
arch/hexagon/kernel/process.c | 4 +-
arch/hexagon/kernel/signal.c | 1 -
arch/hexagon/kernel/traps.c | 6 +-
arch/ia64/kernel/process.c | 4 +-
arch/ia64/kernel/ptrace.c | 6 +-
arch/ia64/kernel/signal.c | 1 -
arch/m68k/kernel/ptrace.c | 5 +-
arch/m68k/kernel/signal.c | 4 +-
arch/microblaze/kernel/ptrace.c | 5 +-
arch/microblaze/kernel/signal.c | 4 +-
arch/mips/kernel/ptrace.c | 5 +-
arch/mips/kernel/signal.c | 4 +-
arch/nds32/include/asm/syscall.h | 2 +-
arch/nds32/kernel/ptrace.c | 5 +-
arch/nds32/kernel/signal.c | 4 +-
arch/nios2/kernel/ptrace.c | 5 +-
arch/nios2/kernel/signal.c | 4 +-
arch/openrisc/kernel/ptrace.c | 5 +-
arch/openrisc/kernel/signal.c | 4 +-
arch/parisc/kernel/ptrace.c | 7 +-
arch/parisc/kernel/signal.c | 4 +-
arch/powerpc/kernel/ptrace/ptrace.c | 8 +-
arch/powerpc/kernel/signal.c | 4 +-
arch/riscv/kernel/ptrace.c | 5 +-
arch/riscv/kernel/signal.c | 4 +-
arch/s390/include/asm/entry-common.h | 1 -
arch/s390/kernel/ptrace.c | 1 -
arch/s390/kernel/signal.c | 5 +-
arch/sh/kernel/ptrace_32.c | 5 +-
arch/sh/kernel/signal_32.c | 4 +-
arch/sparc/kernel/ptrace_32.c | 5 +-
arch/sparc/kernel/ptrace_64.c | 5 +-
arch/sparc/kernel/signal32.c | 1 -
arch/sparc/kernel/signal_32.c | 4 +-
arch/sparc/kernel/signal_64.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/um/kernel/ptrace.c | 5 +-
arch/x86/kernel/ptrace.c | 1 -
arch/x86/kernel/signal.c | 5 +-
arch/x86/mm/tlb.c | 1 +
arch/xtensa/kernel/ptrace.c | 5 +-
arch/xtensa/kernel/signal.c | 4 +-
block/blk-cgroup.c | 2 +-
fs/coredump.c | 1 -
fs/exec.c | 1 -
fs/io-wq.c | 6 +-
fs/io_uring.c | 11 +-
fs/proc/array.c | 1 -
fs/proc/base.c | 1 -
include/asm-generic/syscall.h | 2 +-
include/linux/entry-common.h | 47 +-------
include/linux/entry-kvm.h | 2 +-
include/linux/posix-timers.h | 1 -
include/linux/ptrace.h | 81 ++++++++++++-
include/linux/resume_user_mode.h | 64 ++++++++++
include/linux/sched/signal.h | 17 +++
include/linux/task_work.h | 5 +
include/linux/tracehook.h | 226 -----------------------------------
include/uapi/linux/ptrace.h | 2 +-
kernel/entry/common.c | 19 +--
kernel/entry/kvm.c | 9 +-
kernel/exit.c | 3 +-
kernel/livepatch/transition.c | 1 -
kernel/ptrace.c | 47 +++++---
kernel/seccomp.c | 1 -
kernel/signal.c | 62 +++++-----
kernel/task_work.c | 4 +-
kernel/time/posix-cpu-timers.c | 1 +
mm/memcontrol.c | 2 +-
security/apparmor/domain.c | 1 -
security/selinux/hooks.c | 1 -
85 files changed, 372 insertions(+), 495 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
/HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
=uEro
-----END PGP SIGNATURE-----
Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace cleanups from Eric Biederman:
"This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was around
task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where anything left in tracehook.h was
some weird strange thing that was difficult to understand"
* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: Remove duplicated include in ptrace.c
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
ptrace: Return the signal to continue with from ptrace_stop
ptrace: Move setting/clearing ptrace_message into ptrace_stop
tracehook: Remove tracehook.h
resume_user_mode: Move to resume_user_mode.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Introduce task_work_pending
task_work: Remove unnecessary include from posix_timers.h
ptrace: Remove tracehook_signal_handler
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Move ptrace_report_syscall into ptrace.h
* for-next/fpsimd:
arm64: cpufeature: Warn if we attempt to read a zero width field
arm64: cpufeature: Add missing .field_width for GIC system registers
arm64: signal: nofpsimd: Do not allocate fp/simd context when not available
arm64: cpufeature: Always specify and use a field width for capabilities
arm64: Always use individual bits in CPACR floating point enables
arm64: Define CPACR_EL1_FPEN similarly to other floating point controls
Move set_notify_resume and tracehook_notify_resume into resume_user_mode.h.
While doing that rename tracehook_notify_resume to resume_user_mode_work.
Update all of the places that included tracehook.h for these functions to
include resume_user_mode.h instead.
Update all of the callers of tracehook_notify_resume to call
resume_user_mode_work.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-12-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Drop several includes of <linux/personality.h> which are not used.
git-blame indicates they were used at some point, but they're not needed
anymore.
Signed-off-by: Sagar Patel <sagarmp@cs.unc.edu>
Link: https://lore.kernel.org/r/20220307222412.146506-1-sagarmp@cs.unc.edu
Signed-off-by: Will Deacon <will@kernel.org>
Commit 6d502b6ba1b2 ("arm64: signal: nofpsimd: Handle fp/simd context for
signal frames") introduced saving the fp/simd context for signal handling
only when support is available. But setup_sigframe_layout() always
reserves memory for fp/simd context. The additional memory is not touched
because preserve_fpsimd_context() is not called and thus the magic is
invalid.
This may lead to an error when parse_user_sigframe() checks the fp/simd
area and does not find a valid magic number.
Signed-off-by: David Engraf <david.engraf@sysgo.com>
Reviwed-by: Mark Brown <broonie@kernel.org>
Fixes: 6d502b6ba1b267b3 ("arm64: signal: nofpsimd: Handle fp/simd context for signal frames")
Cc: <stable@vger.kernel.org> # 5.6.x
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220225104008.820289-1-david.engraf@sysgo.com
Signed-off-by: Will Deacon <will@kernel.org>
Some thread flags can be set remotely, and so even when IRQs are disabled,
the flags can change under our feet. Generally this is unlikely to cause a
problem in practice, but it is somewhat unsound, and KCSAN will
legitimately warn that there is a data race.
To avoid such issues, a snapshot of the flags has to be taken prior to
using them. Some places already use READ_ONCE() for that, others do not.
Convert them all to the new flag accessor helpers.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20211129130653.2037928-7-mark.rutland@arm.com
With the introduction of SME we will have a second vector length in the
system, enumerated and configured in a very similar fashion to the
existing SVE vector length. While there are a few differences in how
things are handled this is a relatively small portion of the overall
code so in order to avoid code duplication we factor out
We create two structs, one vl_info for the static hardware properties
and one vl_config for the runtime configuration, with an array
instantiated for each and update all the users to reference these. Some
accessor functions are provided where helpful for readability, and the
write to set the vector length is put into a function since the system
register being updated needs to be chosen at compile time.
This is a mostly mechanical replacement, further work will be required
to actually make things generic, ensuring that we handle those places
where there are differences properly.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211019172247.3045838-8-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
In a system with SME there are parallel vector length controls for SVE and
SME vectors which function in much the same way so it is desirable to
share the code for handling them as much as possible. In order to prepare
for doing this add a layer of accessor functions for the various VL related
operations on tasks.
Since almost all current interactions are actually via task->thread rather
than directly with the thread_info the accessors use that. Accessors are
provided for both generic and SVE specific usage, the generic accessors
should be used for cases where register state is being manipulated since
the registers are shared between streaming and regular SVE so we know that
when SME support is implemented we will always have to be in the appropriate
mode already and hence can generalise now.
Since we are using task_struct and we don't want to cause widespread
inclusion of sched.h the acessors are all out of line, it is hoped that
none of the uses are in a sufficiently critical path for this to be an
issue. Those that are most likely to present an issue are in the same
translation unit so hopefully the compiler may be able to inline anyway.
This is purely adding the layer of abstraction, additional work will be
needed to support tasks using SME.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211019172247.3045838-7-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
that the two function are always called back-to-back by architectures
that have rseq. The rseq helper is stubbed out for architectures that
don't support rseq, i.e. this is a nop across the board.
Note, tracehook_notify_resume() is horribly named and arguably does not
belong in tracehook.h as literally every line of code in it has nothing
to do with tracing. But, that's been true since commit a42c6ded827d
("move key_repace_session_keyring() into tracehook_notify_resume()")
first usurped tracehook_notify_resume() back in 2012. Punt cleaning that
mess up to future patches.
No functional change intended.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210901203030.1292304-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Support for 32-bit tasks on asymmetric AArch32 systems (on top of the
scheduler changes merged via the tip tree).
- More entry.S clean-ups and conversion to C.
- MTE updates: allow a preferred tag checking mode to be set per CPU
(the overhead of synchronous mode is smaller for some CPUs than
others); optimisations for kernel entry/exit path; optionally disable
MTE on the kernel command line.
- Kselftest improvements for SVE and signal handling, PtrAuth.
- Fix unlikely race where a TLBI could use stale ASID on an ASID
roll-over (found by inspection).
- Miscellaneous fixes: disable trapping of PMSNEVFR_EL1 to higher
exception levels; drop unnecessary sigdelsetmask() call in the
signal32 handling; remove BUG_ON when failing to allocate SVE state
(just signal the process); SYM_CODE annotations.
- Other trivial clean-ups: use macros instead of magic numbers, remove
redundant returns, typos.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmEuYkoACgkQa9axLQDI
XvEWVw/9HSWbccLrQ68ulaqZkL4r6lL2RqvZ2p6fkIRW7bX1JS4UJjWe3+VBg5Ed
DQ1A5cHC5ZndQ4gCRsUhcq7IMXBSj3twMzK7yxBk3zh8tbhVrIOONsKMurMw1NyM
OmoyTJ01i2ZrkDs0OU3fBlvIHPxBjKbOZqykOJHjrB2rwBSbsyUw2KvpM7ha8DOf
O7gKViDrdAhumdIL9rsMvSiIPoJLCxvqeu55c3saVu1JrUR6ENu7lMu3jt4WrfK3
m5gf76IFbgxXvlLiC8RJW7OYaXZ+COb7RA/yP/lK+Y0ug9PwqTpzXDwqvAp8nBIv
y7DK0umcBwfDWmwnRO+ZzNPjOGTHnOnjC07WNBPn3v03pMeJ8v8RnvzHkliek31P
r6uFWBxWO/O0sBbSpR+4tzgNfir0RkMajwL5pxQCEMoPCucStYQQl8zIeJeJecpT
DKIyKzfFw6O59gdhE6dCj2wXH8YmKUoSUPCAXpKGzK/oYVOGVQTZSZjIC++ydFWv
AOXz77etPidk3/Tl15Ena7fkkMkxX9UM8dTjOFS64mSWlEyzE6FtfAgm2rIEOaG7
ps6IjVzVves39SC+yry8T2L6gsxPnanRfwKKCWHkovQzNFgs5Qt51Fd5eIeI1jZ0
uEZhd19FN4136QhjWJOeXL/eyj0bv1WLX/mUln95sHnKyf4je9w=
=X6Wm
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Support for 32-bit tasks on asymmetric AArch32 systems (on top of the
scheduler changes merged via the tip tree).
- More entry.S clean-ups and conversion to C.
- MTE updates: allow a preferred tag checking mode to be set per CPU
(the overhead of synchronous mode is smaller for some CPUs than
others); optimisations for kernel entry/exit path; optionally disable
MTE on the kernel command line.
- Kselftest improvements for SVE and signal handling, PtrAuth.
- Fix unlikely race where a TLBI could use stale ASID on an ASID
roll-over (found by inspection).
- Miscellaneous fixes: disable trapping of PMSNEVFR_EL1 to higher
exception levels; drop unnecessary sigdelsetmask() call in the
signal32 handling; remove BUG_ON when failing to allocate SVE state
(just signal the process); SYM_CODE annotations.
- Other trivial clean-ups: use macros instead of magic numbers, remove
redundant returns, typos.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (56 commits)
arm64: Do not trap PMSNEVFR_EL1
arm64: mm: fix comment typo of pud_offset_phys()
arm64: signal32: Drop pointless call to sigdelsetmask()
arm64/sve: Better handle failure to allocate SVE register storage
arm64: Document the requirement for SCR_EL3.HCE
arm64: head: avoid over-mapping in map_memory
arm64/sve: Add a comment documenting the binutils needed for SVE asm
arm64/sve: Add some comments for sve_save/load_state()
kselftest/arm64: signal: Add a TODO list for signal handling tests
kselftest/arm64: signal: Add test case for SVE register state in signals
kselftest/arm64: signal: Verify that signals can't change the SVE vector length
kselftest/arm64: signal: Check SVE signal frame shows expected vector length
kselftest/arm64: signal: Support signal frames with SVE register data
kselftest/arm64: signal: Add SVE to the set of features we can check for
arm64: replace in_irq() with in_hardirq()
kselftest/arm64: pac: Fix skipping of tests on systems without PAC
Documentation: arm64: describe asymmetric 32-bit support
arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores
arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0
arm64: Advertise CPUs capable of running 32-bit applications in sysfs
...
Pull siginfo si_trapno updates from Eric Biederman:
"The full set of si_trapno changes was not appropriate as a fix for the
newly added SIGTRAP TRAP_PERF, and so I postponed the rest of the
related cleanups.
This is the rest of the cleanups for si_trapno that reduces it from
being a really weird arch special case that is expect to be always
present (but isn't) on the architectures that support it to being yet
another field in the _sigfault union of struct siginfo.
The changes have been reviewed and marinated in linux-next. With the
removal of this awkward special case new code (like SIGTRAP TRAP_PERF)
that works across architectures should be easier to write and
maintain"
* 'siginfo-si_trapno-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Rename SIL_PERF_EVENT SIL_FAULT_PERF_EVENT for consistency
signal: Verify the alignment and size of siginfo_t
signal: Remove the generic __ARCH_SI_TRAPNO support
signal/alpha: si_trapno is only used with SIGFPE and SIGTRAP TRAP_UNK
signal/sparc: si_trapno is only used with SIGILL ILL_ILLTRP
arm64: Add compile-time asserts for siginfo_t offsets
arm: Add compile-time asserts for siginfo_t offsets
sparc64: Add compile-time asserts for siginfo_t offsets
* tip/sched/arm64: (785 commits)
Documentation: arm64: describe asymmetric 32-bit support
arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores
arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0
arm64: Advertise CPUs capable of running 32-bit applications in sysfs
arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system
arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
arm64: Implement task_cpu_possible_mask()
sched: Introduce dl_task_check_affinity() to check proposed affinity
sched: Allow task CPU affinity to be restricted on asymmetric systems
sched: Split the guts of sched_setaffinity() into a helper function
sched: Introduce task_struct::user_cpus_ptr to track requested affinity
sched: Reject CPU affinity changes based on task_cpu_possible_mask()
cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq()
cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()
cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
sched: Introduce task_cpu_possible_mask() to limit fallback rq selection
sched: Cgroup SCHED_IDLE support
sched/topology: Skip updating masks for non-online nodes
Linux 5.14-rc6
lib: use PFN_PHYS() in devmem_is_allowed()
...
* for-next/entry:
: More entry.S clean-ups and conversion to C.
arm64: entry: call exit_to_user_mode() from C
arm64: entry: move bulk of ret_to_user to C
arm64: entry: clarify entry/exit helpers
arm64: entry: consolidate entry/exit helpers
Currently we "handle" failure to allocate the SVE register storage by
doing a BUG_ON() and hoping for the best. This is obviously not great and
the memory allocation failure will already be loud enough without the
BUG_ON(). As the comment says it is a corner case but let's try to do a bit
better, remove the BUG_ON() and add code to handle the failure in the
callers.
For the ptrace and signal code we can return -ENOMEM gracefully however
we have no real error reporting path available to us for the SVE access
trap so instead generate a SIGKILL if the allocation fails there. This
at least means that we won't try to soldier on and end up trying to
access the nonexistant state and while it's obviously not ideal for
userspace SIGKILL doesn't allow any handling so minimises the ABI
impact, making it easier to improve the interface later if we come up
with a better idea.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210824153417.18371-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The scheduler now knows enough about these braindead systems to place
32-bit tasks accordingly, so throw out the safety checks and allow the
ret-to-user path to avoid do_notify_resume() if there is nothing to do.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210730112443.23245-16-will@kernel.org
In `ret_to_user` we perform some conditional work depending on the
thread flags, then perform some IRQ/context tracking which is intended
to balance with the IRQ/context tracking performed in the entry C code.
For simplicity and consistency, it would be preferable to move this all
to C. As a step towards that, this patch moves the conditional work and
IRQ/context tracking into a C helper function. To aid bisectability,
this is called from the `ret_to_user` assembly, and a subsequent patch
will move the call to C code.
As local_daif_mask() handles all necessary tracing and PMR manipulation,
we no longer need to handle this explicitly. As we call
exit_to_user_mode() directly, the `user_enter_irqoff` macro is no longer
used, and can be removed. As enter_from_user_mode() and
exit_to_user_mode() are no longer called from assembly, these can be
made static, and as these are typically very small, they are marked
__always_inline to avoid the overhead of a function call.
For now, enablement of single-step is left in entry.S, and for this we
still need to read the flags in ret_to_user(). It is safe to read this
separately as TIF_SINGLESTEP is not part of _TIF_WORK_MASK.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20210802140733.52716-4-mark.rutland@arm.com
[catalin.marinas@arm.com: removed unused gic_prio_kentry_setup macro]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Due to inconsistencies in the way we manipulate compat GPRs, we have a
few issues today:
* For audit and tracing, where error codes are handled as a (native)
long, negative error codes are expected to be sign-extended to the
native 64-bits, or they may fail to be matched correctly. Thus a
syscall which fails with an error may erroneously be identified as
failing.
* For ptrace, *all* compat return values should be sign-extended for
consistency with 32-bit arm, but we currently only do this for
negative return codes.
* As we may transiently set the upper 32 bits of some compat GPRs while
in the kernel, these can be sampled by perf, which is somewhat
confusing. This means that where a syscall returns a pointer above 2G,
this will be sign-extended, but will not be mistaken for an error as
error codes are constrained to the inclusive range [-4096, -1] where
no user pointer can exist.
To fix all of these, we must consistently use helpers to get/set the
compat GPRs, ensuring that we never write the upper 32 bits of the
return code, and always sign-extend when reading the return code. This
patch does so, with the following changes:
* We re-organise syscall_get_return_value() to always sign-extend for
compat tasks, and reimplement syscall_get_error() atop. We update
syscall_trace_exit() to use syscall_get_return_value().
* We consistently use syscall_set_return_value() to set the return
value, ensureing the upper 32 bits are never set unexpectedly.
* As the core audit code currently uses regs_return_value() rather than
syscall_get_return_value(), we special-case this for
compat_user_mode(regs) such that this will do the right thing. Going
forward, we should try to move the core audit code over to
syscall_get_return_value().
Cc: <stable@vger.kernel.org>
Reported-by: He Zhe <zhe.he@windriver.com>
Reported-by: weiyuchen <weiyuchen3@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210802104200.21390-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Update the static assertions about siginfo_t to also describe
it's alignment and size.
While investigating if it was possible to add a 64bit field into
siginfo_t[1] it became apparent that the alignment of siginfo_t
is as much a part of the ABI as the size of the structure.
If the alignment changes siginfo_t when embedded in another structure
can move to a different offset. Which is not acceptable from an ABI
structure.
So document that fact and add static assertions to notify developers
if they change change the alignment by accident.
[1] https://lkml.kernel.org/r/YJEZdhe6JGFNYlum@elver.google.com
Acked-by: Marco Elver <elver@google.com>
v1: https://lkml.kernel.org/r/20210505141101.11519-4-ebiederm@xmission.co
Link: https://lkml.kernel.org/r/875yxaxmyl.fsf_-_@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Scheduling a 32-bit application on a 64-bit-only CPU is a bad idea.
Ensure that 32-bit applications always take the slow-path when returning
to userspace on a system with mismatched support at EL0, so that we can
avoid trying to run on a 64-bit-only CPU and force a SIGKILL instead.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210608180313.11502-5-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
All EL0 returns go via ret_to_user(), which masks IRQs and notifies
lockdep and tracing before calling into do_notify_resume(). Therefore,
there's no need for do_notify_resume() to call trace_hardirqs_off(), and
the comment is stale. The call is simply redundant.
In ret_to_user() we call exit_to_user_mode(), which notifies lockdep and
tracing the IRQs will be enabled in userspace, so there's no need for
el0_svc_common() to call trace_hardirqs_on() before returning. Further,
at the start of ret_to_user() we call trace_hardirqs_off(), so not only
is this redundant, but it is immediately undone.
In addition to being redundant, the trace_hardirqs_on() in
el0_svc_common() leaves lockdep inconsistent with the hardware state,
and is liable to cause issues for any C code or instrumentation
between this and the call to trace_hardirqs_off() which undoes it in
ret_to_user().
This patch removes the redundant tracing calls and associated stale
comments.
Fixes: 23529049c684 ("arm64: entry: fix non-NMI user<->kernel transitions")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210107145310.44616-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/YJxsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjpyEACBdW+YjenjTbkUPeEXzQgkBkTZUYw3g007
DPcUT1g8PQZXYXlQvBKCvGhhIr7/KVcjepKoowiNQfBNGcIPJTVopW58nzpqAfTQ
goI2WYGn5EKFFKBPvtH04cJD/Wo8muXdxynKtqyZbnGGgZjQxPrE259b8dpHjBSR
6L7HHkk0D1oU/5b6h6Ocpg9mc/0iIUCZylySAYY3eGO0JaVPJaXgZSJZYgHxCHll
Lb+/y/fXdtm/0PmQ3ko0ev54g3yEWqZIX0NsZW1asrButIy+KLzQ2Mz1xFLFDMag
prtIfwb8tzgc4dFPY090C/azjCh5CPpxqYS6FkRwS0p86n6OhkyXrqfily5Hs4/B
NC7CBPBSH/j+NKUK7CYZcpTzTpxPjUr9p0anUdlvMJz8FhTb/3YEEZ1UTeWOeHmk
Yo5SxnFghLeZZeZ1ok6rdymnVa7WEX12SCLGQX31BB2mld0tNbKb4b+FsBF6OUMk
IUaX6OjwDFVRaysC88BQ4hjcIP1HxsViG4/VZDX15gjAAH2Pvb+7tev+lcDcOhjz
TCD4GNFspTFzRhh9nT7oxQ679qCh9G9zHbzuIRewnrS6iqvo5SJQB3dR2yrWZRRH
ySkQFiHpYOlnLJYv0jg9COlGwo2FUdcvKhCvkjQKKBz48rzW/IC0LwKdRQWZDFk3
FKGzP/NBig==
=cadT
-----END PGP SIGNATURE-----
Merge tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block
Pull TIF_NOTIFY_SIGNAL updates from Jens Axboe:
"This sits on top of of the core entry/exit and x86 entry branch from
the tip tree, which contains the generic and x86 parts of this work.
Here we convert the rest of the archs to support TIF_NOTIFY_SIGNAL.
With that done, we can get rid of JOBCTL_TASK_WORK from task_work and
signal.c, and also remove a deadlock work-around in io_uring around
knowing that signal based task_work waking is invoked with the sighand
wait queue head lock.
The motivation for this work is to decouple signal notify based
task_work, of which io_uring is a heavy user of, from sighand. The
sighand lock becomes a huge contention point, particularly for
threaded workloads where it's shared between threads. Even outside of
threaded applications it's slower than it needs to be.
Roman Gershman <romger@amazon.com> reported that his networked
workload dropped from 1.6M QPS at 80% CPU to 1.0M QPS at 100% CPU
after io_uring was changed to use TIF_NOTIFY_SIGNAL. The time was all
spent hammering on the sighand lock, showing 57% of the CPU time there
[1].
There are further cleanups possible on top of this. One example is
TIF_PATCH_PENDING, where a patch already exists to use
TIF_NOTIFY_SIGNAL instead. Hopefully this will also lead to more
consolidation, but the work stands on its own as well"
[1] https://github.com/axboe/liburing/issues/215
* tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block: (28 commits)
io_uring: remove 'twa_signal_ok' deadlock work-around
kernel: remove checking for TIF_NOTIFY_SIGNAL
signal: kill JOBCTL_TASK_WORK
io_uring: JOBCTL_TASK_WORK is no longer used by task_work
task_work: remove legacy TWA_SIGNAL path
sparc: add support for TIF_NOTIFY_SIGNAL
riscv: add support for TIF_NOTIFY_SIGNAL
nds32: add support for TIF_NOTIFY_SIGNAL
ia64: add support for TIF_NOTIFY_SIGNAL
h8300: add support for TIF_NOTIFY_SIGNAL
c6x: add support for TIF_NOTIFY_SIGNAL
alpha: add support for TIF_NOTIFY_SIGNAL
xtensa: add support for TIF_NOTIFY_SIGNAL
arm: add support for TIF_NOTIFY_SIGNAL
microblaze: add support for TIF_NOTIFY_SIGNAL
hexagon: add support for TIF_NOTIFY_SIGNAL
csky: add support for TIF_NOTIFY_SIGNAL
openrisc: add support for TIF_NOTIFY_SIGNAL
sh: add support for TIF_NOTIFY_SIGNAL
um: add support for TIF_NOTIFY_SIGNAL
...
Now that set_fs() is gone, addr_limit_user_check() is redundant. Remove
the checks and associated thread flag.
To ensure that _TIF_WORK_MASK can be used as an immediate value in an
AND instruction (as it is in `ret_to_user`), TIF_MTE_ASYNC_FAULT is
renumbered to keep the constituent bits of _TIF_WORK_MASK contiguous.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-11-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add userspace support for the Memory Tagging Extension introduced by
Armv8.5.
(Catalin Marinas and others)
* for-next/mte: (30 commits)
arm64: mte: Fix typo in memory tagging ABI documentation
arm64: mte: Add Memory Tagging Extension documentation
arm64: mte: Kconfig entry
arm64: mte: Save tags when hibernating
arm64: mte: Enable swap of tagged pages
mm: Add arch hooks for saving/restoring tags
fs: Handle intra-page faults in copy_mount_options()
arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset
arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support
arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks
arm64: mte: Restore the GCR_EL1 register after a suspend
arm64: mte: Allow user control of the generated random tags via prctl()
arm64: mte: Allow user control of the tag check mode via prctl()
mm: Allow arm64 mmap(PROT_MTE) on RAM-based files
arm64: mte: Validate the PROT_MTE request via arch_validate_flags()
mm: Introduce arch_validate_flags()
arm64: mte: Add PROT_MTE support to mmap() and mprotect()
mm: Introduce arch_calc_vm_flag_bits()
arm64: mte: Tags-aware aware memcmp_pages() implementation
arm64: Avoid unnecessary clear_user_page() indirection
...
The SVE state is saved by fpsimd_signal_preserve_current_state() and not
preserve_fpsimd_context(). Update the comment in preserve_sve_context to
reflect the current behavior.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20200828181155.17745-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The Memory Tagging Extension has two modes of notifying a tag check
fault at EL0, configurable through the SCTLR_EL1.TCF0 field:
1. Synchronous raising of a Data Abort exception with DFSC 17.
2. Asynchronous setting of a cumulative bit in TFSRE0_EL1.
Add the exception handler for the synchronous exception and handling of
the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in
do_notify_resume().
On a tag check failure in user-space, whether synchronous or
asynchronous, a SIGSEGV will be raised on the faulting thread.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Although the arm64 single-step state machine can be fast-forwarded in
cases where we wish to generate a SIGTRAP without actually executing an
instruction, this has two major limitations outside of simply skipping
an instruction due to emulation.
1. Stepping out of a ptrace signal stop into a signal handler where
SIGTRAP is blocked. Fast-forwarding the stepping state machine in
this case will result in a forced SIGTRAP, with the handler reset to
SIG_DFL.
2. The hardware implicitly fast-forwards the state machine when executing
an SVC instruction for issuing a system call. This can interact badly
with subsequent ptrace stops signalled during the execution of the
system call (e.g. SYSCALL_EXIT or seccomp traps), as they may corrupt
the stepping state by updating the PSTATE for the tracee.
Resolve both of these issues by injecting a pseudo-singlestep exception
on entry to a signal handler and also on return to userspace following a
system call.
Cc: <stable@vger.kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Tested-by: Luis Machado <luis.machado@linaro.org>
Reported-by: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Will Deacon <will@kernel.org>
This patch adds the bare minimum required to expose the ARMv8.5
Branch Target Identification feature to userspace.
By itself, this does _not_ automatically enable BTI for any initial
executable pages mapped by execve(). This will come later, but for
now it should be possible to enable BTI manually on those pages by
using mprotect() from within the target process.
Other arches already using the generic mman.h are already using
0x10 for arch-specific prot flags, so we use that for PROT_BTI
here.
For consistency, signal handler entry points in BTI guarded pages
are required to be annotated as such, just like any other function.
This blocks a relatively minor attack vector, but comforming
userspace will have the annotations anyway, so we may as well
enforce them.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Make sure we try to save/restore the vfp/fpsimd context for signal
handling only when the fp/simd support is available. Otherwise, skip
the frames.
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 503 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch updates fpsimd_flush_task_state() to mirror the new
semantics of fpsimd_flush_cpu_state() introduced by commit
d8ad71fa38a9 ("arm64: fpsimd: Fix TIF_FOREIGN_FPSTATE after
invalidating cpu regs"). Both functions now implicitly set
TIF_FOREIGN_FPSTATE to indicate that the task's FPSIMD state is not
loaded into the cpu.
As a side-effect, fpsimd_flush_task_state() now sets
TIF_FOREIGN_FPSTATE even for non-running tasks. In the case of
non-running tasks this is not useful but also harmless, because the
flag is live only while the corresponding task is running. This
function is not called from fast paths, so special-casing this for
the task == current case is not really worth it.
Compiler barriers previously present in restore_sve_fpsimd_context()
are pulled into fpsimd_flush_task_state() so that it can be safely
called with preemption enabled if necessary.
Explicit calls to set TIF_FOREIGN_FPSTATE that accompany
fpsimd_flush_task_state() calls and are now redundant are removed
as appropriate.
fpsimd_flush_task_state() is used to get exclusive access to the
representation of the task's state via task_struct, for the purpose
of replacing the state. Thus, the call to this function should
happen before manipulating fpsimd_state or sve_state etc. in
task_struct. Anomalous cases are reordered appropriately in order
to make the code more consistent, although there should be no
functional difference since these cases are protected by
local_bh_disable() anyway.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We don't currently annotate our various sigreturn functions as syscalls,
as we need to do to use pt_regs syscall wrappers.
Let's mark them as real syscalls.
For compat_sys_sigreturn and compat_sys_rt_sigreturn, this changes the
return type from int to long, matching the prototypes in sys32.c.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>