This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was
around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
making the semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where now anything left in tracehook.h is
some weird strange thing that is difficult to understand.
Eric W. Biederman (15):
ptrace: Move ptrace_report_syscall into ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Remove tracehook_signal_handler
task_work: Remove unnecessary include from posix_timers.h
task_work: Introduce task_work_pending
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
resume_user_mode: Move to resume_user_mode.h
tracehook: Remove tracehook.h
ptrace: Move setting/clearing ptrace_message into ptrace_stop
ptrace: Return the signal to continue with from ptrace_stop
Jann Horn (1):
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
Yang Li (1):
ptrace: Remove duplicated include in ptrace.c
MAINTAINERS | 1 -
arch/Kconfig | 5 +-
arch/alpha/kernel/ptrace.c | 5 +-
arch/alpha/kernel/signal.c | 4 +-
arch/arc/kernel/ptrace.c | 5 +-
arch/arc/kernel/signal.c | 4 +-
arch/arm/kernel/ptrace.c | 12 +-
arch/arm/kernel/signal.c | 4 +-
arch/arm64/kernel/ptrace.c | 14 +--
arch/arm64/kernel/signal.c | 4 +-
arch/csky/kernel/ptrace.c | 5 +-
arch/csky/kernel/signal.c | 4 +-
arch/h8300/kernel/ptrace.c | 5 +-
arch/h8300/kernel/signal.c | 4 +-
arch/hexagon/kernel/process.c | 4 +-
arch/hexagon/kernel/signal.c | 1 -
arch/hexagon/kernel/traps.c | 6 +-
arch/ia64/kernel/process.c | 4 +-
arch/ia64/kernel/ptrace.c | 6 +-
arch/ia64/kernel/signal.c | 1 -
arch/m68k/kernel/ptrace.c | 5 +-
arch/m68k/kernel/signal.c | 4 +-
arch/microblaze/kernel/ptrace.c | 5 +-
arch/microblaze/kernel/signal.c | 4 +-
arch/mips/kernel/ptrace.c | 5 +-
arch/mips/kernel/signal.c | 4 +-
arch/nds32/include/asm/syscall.h | 2 +-
arch/nds32/kernel/ptrace.c | 5 +-
arch/nds32/kernel/signal.c | 4 +-
arch/nios2/kernel/ptrace.c | 5 +-
arch/nios2/kernel/signal.c | 4 +-
arch/openrisc/kernel/ptrace.c | 5 +-
arch/openrisc/kernel/signal.c | 4 +-
arch/parisc/kernel/ptrace.c | 7 +-
arch/parisc/kernel/signal.c | 4 +-
arch/powerpc/kernel/ptrace/ptrace.c | 8 +-
arch/powerpc/kernel/signal.c | 4 +-
arch/riscv/kernel/ptrace.c | 5 +-
arch/riscv/kernel/signal.c | 4 +-
arch/s390/include/asm/entry-common.h | 1 -
arch/s390/kernel/ptrace.c | 1 -
arch/s390/kernel/signal.c | 5 +-
arch/sh/kernel/ptrace_32.c | 5 +-
arch/sh/kernel/signal_32.c | 4 +-
arch/sparc/kernel/ptrace_32.c | 5 +-
arch/sparc/kernel/ptrace_64.c | 5 +-
arch/sparc/kernel/signal32.c | 1 -
arch/sparc/kernel/signal_32.c | 4 +-
arch/sparc/kernel/signal_64.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/um/kernel/ptrace.c | 5 +-
arch/x86/kernel/ptrace.c | 1 -
arch/x86/kernel/signal.c | 5 +-
arch/x86/mm/tlb.c | 1 +
arch/xtensa/kernel/ptrace.c | 5 +-
arch/xtensa/kernel/signal.c | 4 +-
block/blk-cgroup.c | 2 +-
fs/coredump.c | 1 -
fs/exec.c | 1 -
fs/io-wq.c | 6 +-
fs/io_uring.c | 11 +-
fs/proc/array.c | 1 -
fs/proc/base.c | 1 -
include/asm-generic/syscall.h | 2 +-
include/linux/entry-common.h | 47 +-------
include/linux/entry-kvm.h | 2 +-
include/linux/posix-timers.h | 1 -
include/linux/ptrace.h | 81 ++++++++++++-
include/linux/resume_user_mode.h | 64 ++++++++++
include/linux/sched/signal.h | 17 +++
include/linux/task_work.h | 5 +
include/linux/tracehook.h | 226 -----------------------------------
include/uapi/linux/ptrace.h | 2 +-
kernel/entry/common.c | 19 +--
kernel/entry/kvm.c | 9 +-
kernel/exit.c | 3 +-
kernel/livepatch/transition.c | 1 -
kernel/ptrace.c | 47 +++++---
kernel/seccomp.c | 1 -
kernel/signal.c | 62 +++++-----
kernel/task_work.c | 4 +-
kernel/time/posix-cpu-timers.c | 1 +
mm/memcontrol.c | 2 +-
security/apparmor/domain.c | 1 -
security/selinux/hooks.c | 1 -
85 files changed, 372 insertions(+), 495 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
/HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
=uEro
-----END PGP SIGNATURE-----
Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace cleanups from Eric Biederman:
"This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was around
task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where anything left in tracehook.h was
some weird strange thing that was difficult to understand"
* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: Remove duplicated include in ptrace.c
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
ptrace: Return the signal to continue with from ptrace_stop
ptrace: Move setting/clearing ptrace_message into ptrace_stop
tracehook: Remove tracehook.h
resume_user_mode: Move to resume_user_mode.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Introduce task_work_pending
task_work: Remove unnecessary include from posix_timers.h
ptrace: Remove tracehook_signal_handler
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Move ptrace_report_syscall into ptrace.h
There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good. This
was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly
tricky and error-prone code.
There is a small merge conflict against a parisc cleanup, the
solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel. The
hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
There are some obvious conflicts against changes to the removed
files.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
=vtCN
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good.
This was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly tricky
and error-prone code. There is a small merge conflict against a
parisc cleanup, the solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel.
The hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks"
* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
nds32: Remove the architecture
uaccess: remove CONFIG_SET_FS
ia64: remove CONFIG_SET_FS support
sh: remove CONFIG_SET_FS support
sparc64: remove CONFIG_SET_FS support
lib/test_lockup: fix kernel pointer check for separate address spaces
uaccess: generalize access_ok()
uaccess: fix type mismatch warnings from access_ok()
arm64: simplify access_ok()
m68k: fix access_ok for coldfire
MIPS: use simpler access_ok()
MIPS: Handle address errors for accesses above CPU max virtual user address
uaccess: add generic __{get,put}_kernel_nofault
nios2: drop access_ok() check from __put_user()
x86: use more conventional access_ok() definition
x86: remove __range_not_ok()
sparc64: add __{get,put}_kernel_nofault()
nds32: fix access_ok() checks in get/put_user
uaccess: fix nios2 and microblaze get_user_8()
sparc64: fix building assembly files
...
... and call node_dev_init() after memory_dev_init() from driver_init(),
so before any of the existing arch/subsys calls. All online nodes should
be known at that point: early during boot, arch code determines node and
zone ranges and sets the relevant nodes online; usually this happens in
setup_arch().
This is in line with memory_dev_init(), which initializes the memory
device subsystem and creates all memory block devices.
Similar to memory_dev_init(), panic() if anything goes wrong, we don't
want to continue with such basic initialization errors.
The important part is that node_dev_init() gets called after
memory_dev_init() and after cpu_dev_init(), but before any of the relevant
archs call register_cpu() to register the new cpu device under the node
device. The latter should be the case for the current users of
topology_init().
Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Anatoly Pugachev <matorola@gmail.com> (sparc64)
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There exists many same definitions of device_tree_init() for various
platforms, add a weak function in arch/mips/kernel/prom.c to clean
up the related code.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Unfortunately, Clang did not have support for "sp" as a global register
definition, and was crashing after the addition of current_stack_pointer.
This has been fixed in Clang 14, but earlier Clang versions need to
avoid this code, so add a versioned test and revert back to the
open-coded asm instances. Fixes Clang build error:
fatal error: error in backend: Invalid register name global variable
Fixes: 200ed341b8 ("mips: Implement "current_stack_pointer"")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/lkml/YikTQRql+il3HbrK@dev-arch.thelio-3990X
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Yanteng Si <siyanteng01@gmail.com>
Cc: linux-mips@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Move set_notify_resume and tracehook_notify_resume into resume_user_mode.h.
While doing that rename tracehook_notify_resume to resume_user_mode_work.
Update all of the places that included tracehook.h for these functions to
include resume_user_mode.h instead.
Update all of the callers of tracehook_notify_resume to call
resume_user_mode_work.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-12-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Rename tracehook_report_syscall_{entry,exit} to
ptrace_report_syscall_{entry,exit} and place them in ptrace.h
There is no longer any generic tracehook infractructure so make
these ptrace specific functions ptrace specific.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-3-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
With KCFLAGS="-O3", I was able to trigger a fortify-source
memcpy() overflow panic on set_vi_srs_handler().
Although O3 level is not supported in the mainline, under some
conditions that may've happened with any optimization settings,
it's just a matter of inlining luck. The panic itself is correct,
more precisely, 50/50 false-positive and not at the same time.
From the one side, no real overflow happens. Exception handler
defined in asm just gets copied to some reserved places in the
memory.
But the reason behind is that C code refers to that exception
handler declares it as `char`, i.e. something of 1 byte length.
It's obvious that the asm function itself is way more than 1 byte,
so fortify logics thought we are going to past the symbol declared.
The standard way to refer to asm symbols from C code which is not
supposed to be called from C is to declare them as
`extern const u8[]`. This is fully correct from any point of view,
as any code itself is just a bunch of bytes (including 0 as it is
for syms like _stext/_etext/etc.), and the exact size is not known
at the moment of compilation.
Adjust the type of the except_vec_vi_*() and related variables.
Make set_handler() take `const` as a second argument to avoid
cast-away warnings and give a little more room for optimization.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
To follow the existing per-arch conventions replace open-coded uses
of asm "sp" as "current_stack_pointer". This will let it be used in
non-arch places (like HARDENED_USERCOPY).
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yanteng Si <siyanteng01@gmail.com>
Cc: linux-mips@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Address errors have always been treated as unaliged accesses and handled
as such. But address errors are also issued for illegal accesses like
user to kernel space or accesses outside of implemented spaces. This
change implements Linux exception handling for accesses to the illegal
space above the CPU implemented maximum virtual user address and the
MIPS 64bit architecture maximum. With this we can now use a fixed value
for the maximum task size on every MIPS CPU and get a more optimized
access_ok().
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
I'm doing some thread necromancy of
https://lore.kernel.org/lkml/202007081624.82FA0CC1EA@keescook/
x86, arm64, and arm32 adjusted their READ_IMPLIES_EXEC logic to better
align with the safer defaults and the interactions with other mappings,
which I illustrated with this comment on x86:
/*
* An executable for which elf_read_implies_exec() returns TRUE will
* have the READ_IMPLIES_EXEC personality flag set automatically.
*
* The decision process for determining the results are:
*
* CPU: | lacks NX* | has NX, ia32 | has NX, x86_64 |
* ELF: | | | |
* ---------------------|------------|------------------|----------------|
* missing PT_GNU_STACK | exec-all | exec-all | exec-none |
* PT_GNU_STACK == RWX | exec-stack | exec-stack | exec-stack |
* PT_GNU_STACK == RW | exec-none | exec-none | exec-none |
*
* exec-all : all PROT_READ user mappings are executable, except when
* backed by files on a noexec-filesystem.
* exec-none : only PROT_EXEC user mappings are executable.
* exec-stack: only the stack and PROT_EXEC user mappings are
* executable.
*
* *this column has no architectural effect: NX markings are ignored by
* hardware, but may have behavioral effects when "wants X" collides with
* "cannot be X" constraints in memory permission flags, as in
* https://lkml.kernel.org/r/20190418055759.GA3155@mellanox.com
*
*/
For MIPS, the "lacks NX" above is the "!cpu_has_rixi" check. On x86,
we decided that the READ_IMPLIES_EXEC flag needed to reflect the
expectations, not the architectural behavior due to bad interactions
as noted above, as always returning "1" on non-NX hardware breaks
some mappings.
The other part of the issue is "what does the MIPS toolchain do for
PT_GNU_STACK?" The answer seems to be "by default, include PT_GNU_STACK,
but mark it executable" (likely due to concerns over non-NX hardware):
$ mipsel-linux-gnu-gcc -o hello_world hello_world.c
$ llvm-readelf -lW hellow_world | grep GNU_STACK
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x10
Given that older hardware doesn't support non-executable memory, it
seems safe to make the "PT_GNU_STACK is absent" logic mean "assume
non-executable", but this might break very old software running on
modern MIPS. This situation matches the ia32-on-x86_64 logic x86
uses (which assumes needing READ_IMPLIES_EXEC in that situation). But
modern toolchains on modern MIPS hardware should follow a safer default
(assume NX stack).
A follow-up to this change would be to switch the MIPS toolchain to emit
a non-executable PT_GNU_STACK, as this seems to be unneeded.
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: Juxin Gao <gaojuxin@loongson.cn>
Cc: linux-mips@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The major part for workaround handling has already moved to config
options. This change replaces the remaining defines by already
available config options and gets rid of war.h
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Correct a typo/pasto: setnocoherentio() should set
dma_default_coherent to false, not true.
Fixes: 14ac09a65e ("MIPS: refactor the runtime coherent vs noncoherent DMA indicators")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
After enabling CONFIG_SCHED_CORE (landed during 5.14 cycle),
2-core 2-thread-per-core interAptiv (CPS-driven) started emitting
the following:
[ 0.025698] CPU1 revision is: 0001a120 (MIPS interAptiv (multi))
[ 0.048183] ------------[ cut here ]------------
[ 0.048187] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:6025 sched_core_cpu_starting+0x198/0x240
[ 0.048220] Modules linked in:
[ 0.048233] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc3+ #35 b7b319f24073fd9a3c2aa7ad15fb7993eec0b26f
[ 0.048247] Stack : 817f0000 00000004 327804c8 810eb050 00000000 00000004 00000000 c314fdd1
[ 0.048278] 830cbd64 819c0000 81800000 817f0000 83070bf4 00000001 830cbd08 00000000
[ 0.048307] 00000000 00000000 815fcbc4 00000000 00000000 00000000 00000000 00000000
[ 0.048334] 00000000 00000000 00000000 00000000 817f0000 00000000 00000000 817f6f34
[ 0.048361] 817f0000 818a3c00 817f0000 00000004 00000000 00000000 4dc33260 0018c933
[ 0.048389] ...
[ 0.048396] Call Trace:
[ 0.048399] [<8105a7bc>] show_stack+0x3c/0x140
[ 0.048424] [<8131c2a0>] dump_stack_lvl+0x60/0x80
[ 0.048440] [<8108b5c0>] __warn+0xc0/0xf4
[ 0.048454] [<8108b658>] warn_slowpath_fmt+0x64/0x10c
[ 0.048467] [<810bd418>] sched_core_cpu_starting+0x198/0x240
[ 0.048483] [<810c6514>] sched_cpu_starting+0x14/0x80
[ 0.048497] [<8108c0f8>] cpuhp_invoke_callback_range+0x78/0x140
[ 0.048510] [<8108d914>] notify_cpu_starting+0x94/0x140
[ 0.048523] [<8106593c>] start_secondary+0xbc/0x280
[ 0.048539]
[ 0.048543] ---[ end trace 0000000000000000 ]---
[ 0.048636] Synchronize counters for CPU 1: done.
...for each but CPU 0/boot.
Basic debug printks right before the mentioned line say:
[ 0.048170] CPU: 1, smt_mask:
So smt_mask, which is sibling mask obviously, is empty when entering
the function.
This is critical, as sched_core_cpu_starting() calculates
core-scheduling parameters only once per CPU start, and it's crucial
to have all the parameters filled in at that moment (at least it
uses cpu_smt_mask() which in fact is `&cpu_sibling_map[cpu]` on
MIPS).
A bit of debugging led me to that set_cpu_sibling_map() performing
the actual map calculation, was being invocated after
notify_cpu_start(), and exactly the latter function starts CPU HP
callback round (sched_core_cpu_starting() is basically a CPU HP
callback).
While the flow is same on ARM64 (maps after the notifier, although
before calling set_cpu_online()), x86 started calculating sibling
maps earlier than starting the CPU HP callbacks in Linux 4.14 (see
[0] for the reference). Neither me nor my brief tests couldn't find
any potential caveats in calculating the maps right after performing
delay calibration, but the WARN splat is now gone.
The very same debug prints now yield exactly what I expected from
them:
[ 0.048433] CPU: 1, smt_mask: 0-1
[0] https://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git/commit/?id=76ce7cfe35ef
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Pull signal/exit/ptrace updates from Eric Biederman:
"This set of changes deletes some dead code, makes a lot of cleanups
which hopefully make the code easier to follow, and fixes bugs found
along the way.
The end-game which I have not yet reached yet is for fatal signals
that generate coredumps to be short-circuit deliverable from
complete_signal, for force_siginfo_to_task not to require changing
userspace configured signal delivery state, and for the ptrace stops
to always happen in locations where we can guarantee on all
architectures that the all of the registers are saved and available on
the stack.
Removal of profile_task_ext, profile_munmap, and profile_handoff_task
are the big successes for dead code removal this round.
A bunch of small bug fixes are included, as most of the issues
reported were small enough that they would not affect bisection so I
simply added the fixes and did not fold the fixes into the changes
they were fixing.
There was a bug that broke coredumps piped to systemd-coredump. I
dropped the change that caused that bug and replaced it entirely with
something much more restrained. Unfortunately that required some
rebasing.
Some successes after this set of changes: There are few enough calls
to do_exit to audit in a reasonable amount of time. The lifetime of
struct kthread now matches the lifetime of struct task, and the
pointer to struct kthread is no longer stored in set_child_tid. The
flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
removed. Issues where task->exit_code was examined with
signal->group_exit_code should been examined were fixed.
There are several loosely related changes included because I am
cleaning up and if I don't include them they will probably get lost.
The original postings of these changes can be found at:
https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.orghttps://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.orghttps://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org
I trimmed back the last set of changes to only the obviously correct
once. Simply because there was less time for review than I had hoped"
* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
ptrace/m68k: Stop open coding ptrace_report_syscall
ptrace: Remove unused regs argument from ptrace_report_syscall
ptrace: Remove second setting of PT_SEIZED in ptrace_attach
taskstats: Cleanup the use of task->exit_code
exit: Use the correct exit_code in /proc/<pid>/stat
exit: Fix the exit_code for wait_task_zombie
exit: Coredumps reach do_group_exit
exit: Remove profile_handoff_task
exit: Remove profile_task_exit & profile_munmap
signal: clean up kernel-doc comments
signal: Remove the helper signal_group_exit
signal: Rename group_exit_task group_exec_task
coredump: Stop setting signal->group_exit_task
signal: Remove SIGNAL_GROUP_COREDUMP
signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
signal: Make coredump handling explicit in complete_signal
signal: Have prepare_signal detect coredumps using signal->core_state
signal: Have the oom killer detect coredumps using signal->core_state
exit: Move force_uaccess back into do_exit
exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
...
Merge misc updates from Andrew Morton:
"146 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
mm/damon: hide kernel pointer from tracepoint event
mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
mm/damon/dbgfs: remove an unnecessary variable
mm/damon: move the implementation of damon_insert_region to damon.h
mm/damon: add access checking for hugetlb pages
Docs/admin-guide/mm/damon/usage: update for schemes statistics
mm/damon/dbgfs: support all DAMOS stats
Docs/admin-guide/mm/damon/reclaim: document statistics parameters
mm/damon/reclaim: provide reclamation statistics
mm/damon/schemes: account how many times quota limit has exceeded
mm/damon/schemes: account scheme actions that successfully applied
mm/damon: remove a mistakenly added comment for a future feature
Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
Docs/admin-guide/mm/damon/usage: remove redundant information
Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
mm/damon: convert macro functions to static inline functions
mm/damon: modify damon_rand() macro to static inline function
mm/damon: move damon_rand() definition into damon.h
...
When debug sigaltstack(), copy_siginfo_to_user() fails first in
setup_rt_frame() if the alternate signal stack is too small, so
it should return immediately if call fails, no need to call the
following functions.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
There are two big uses of do_exit. The first is it's design use to be
the guts of the exit(2) system call. The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.
Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure. In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.
Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.
As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Use the FIELD_PREP() helper, instead of open-coding the same operation.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
It turns out that 'decode_configs' -> 'set_ftlb_enable' is called under
c->cputype unset, which leaves FTLB disabled on BOTH 3A2000 and 3A3000
Fix it by calling "decode_configs" after c->cputype is initialized
Fixes: da1bd29742 ("MIPS: Loongson64: Probe CPU features via CPUCFG")
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
There exists the following issue under DEBUG_PREEMPT:
BUG: using smp_processor_id() in preemptible [00000000] code: systemd/1
caller is show_cpuinfo+0x460/0xea0
...
Call Trace:
[<ffffffff8020f0dc>] show_stack+0x94/0x128
[<ffffffff80e6cab4>] dump_stack_lvl+0x94/0xd8
[<ffffffff80e74c5c>] check_preemption_disabled+0x104/0x110
[<ffffffff802209c8>] show_cpuinfo+0x460/0xea0
[<ffffffff80539d54>] seq_read_iter+0xfc/0x4f8
[<ffffffff804fcc10>] new_sync_read+0x110/0x1b8
[<ffffffff804ff57c>] vfs_read+0x1b4/0x1d0
[<ffffffff804ffb18>] ksys_read+0xd0/0x110
[<ffffffff8021c090>] syscall_common+0x34/0x58
We can see the following call trace:
show_cpuinfo()
cpu_has_fpu
current_cpu_data
smp_processor_id()
$ addr2line -f -e vmlinux 0xffffffff802209c8
show_cpuinfo
arch/mips/kernel/proc.c:188
$ head -188 arch/mips/kernel/proc.c | tail -1
if (cpu_has_fpu)
arch/mips/include/asm/cpu-features.h
# define cpu_has_fpu (current_cpu_data.options & MIPS_CPU_FPU)
arch/mips/include/asm/cpu-info.h
#define current_cpu_data cpu_data[smp_processor_id()]
Based on the above analysis, fix the issue by using raw_cpu_has_fpu
which calls raw_smp_processor_id() in show_cpuinfo().
Fixes: 626bfa0372 ("MIPS: kernel: proc: add CPU option reporting")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Wire up the futex_waitv syscall.
Fix Build warning: #warning syscall futex_waitv not implemented [-Wcpp]
Signed-off-by: Wang Haojun <wanghaojun@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Pull exit cleanups from Eric Biederman:
"While looking at some issues related to the exit path in the kernel I
found several instances where the code is not using the existing
abstractions properly.
This set of changes introduces force_fatal_sig a way of sending a
signal and not allowing it to be caught, and corrects the misuse of
the existing abstractions that I found.
A lot of the misuse of the existing abstractions are silly things such
as doing something after calling a no return function, rolling BUG by
hand, doing more work than necessary to terminate a kernel thread, or
calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL).
In the review a deficiency in force_fatal_sig and force_sig_seccomp
where ptrace or sigaction could prevent the delivery of the signal was
found. I have added a change that adds SA_IMMUTABLE to change that
makes it impossible to interrupt the delivery of those signals, and
allows backporting to fix force_sig_seccomp
And Arnd found an issue where a function passed to kthread_run had the
wrong prototype, and after my cleanup was failing to build."
* 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
soc: ti: fix wkup_m3_rproc_boot_thread return type
signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed
signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
exit/r8188eu: Replace the macro thread_exit with a simple return 0
exit/rtl8712: Replace the macro thread_exit with a simple return 0
exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
exit/syscall_user_dispatch: Send ordinary signals on failure
signal: Implement force_fatal_sig
exit/kthread: Have kernel threads return instead of calling do_exit
signal/s390: Use force_sigsegv in default_trap_handler
signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
signal/sparc: In setup_tsb_params convert open coded BUG into BUG
signal/powerpc: On swapcontext failure force SIGSEGV
signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
signal/sparc32: Remove unreachable do_exit in do_sparc_fault
...
After making the brcmstb_gisb driver modular with 707a4cdf86 ("bus:
brcmstb_gisb: Allow building as module") Guenter reported that mips
allmodconfig failed to link because board_be_handler was referenced.
Thomas indicated that if we were to continue making the brcmstb_gisb
driver modular for MIPS we would need to introduce a function that
allows setting the board_be_handler and export that function towards
modules.
This is what is being done here: board_be_handler is made static and is
now settable with a mips_set_be_handler() function which is exported.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Suggested-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Fixes: 707a4cdf86 ("bus: brcmstb_gisb: Allow building as module")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
- removed support for Netlogic SOCs
- fixes and cleanup
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmGE6OEaHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHA3zQ//Ze+qwFP53nl21DTx+viH
1wvoy+rMa95Udrzf8/Mp8qskZG9Jq2UKskYeWPq2cgMD2X7JSh8b0Xz3O9K8XxKD
a21jGoY61np7HHl9XOhYD7A87DvrHS2VrYh6IiN9tqV5tK0x7uzYnH0pJDqhU4im
0zDtnnvVDvUPU9EDEusD8xPaKHquQXEd0JyNXLo+hvuwc1RZ2VckjE8dsS3I0kn2
2Sz4j7K/e1hB85vE1LXuuU8cL8IuMpDxkDnK6STA7mzNBPAdqnClnKuB70zjnwp0
lCjZBWmSeVIqCEdoerB7wgn1g+8k/1TBbCoqtN2poQMiLBUKFbfv5gXg3Uei0SDa
Q/c3od8Mr2jn39EbG5/cf0FOyuKtLDFvGXASaSuW7Xhvn3ue4I6MrKh2bb5nimXP
3ShvsvYfoc2cGSz7Pt7tOCP6WHFlIi+9wjZEr8XZlGhJE6BgE4CbJHA3/Q4VAl+Q
4whAcNldSX+sVS/dSHPo0s/9eLCPm1EWulD8mBpvj27cmeUVhqXCIY34Df7aIDBi
F1+rFxt6/uDTDEJp3PkwkmVJuyzivN5CcXl4xks/MaJ/8BgYZVRbafyGlJxm+MMf
ufAXvfiwZhyVg4cHajtMiFO2m1itmoyg3CvIffiVZMbegqy4Dsu+ZIf0csPmGheK
88AfVKBfrRwrJG1yK2iX/Dw=
=mvn0
-----END PGP SIGNATURE-----
Merge tag 'mips_5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- added printing of CPU options for /proc/cpuinfo
- removed support for Netlogic SOCs
- fixes and cleanup
* tag 'mips_5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Cobalt: Explain GT64111 early PCI fixup
mips: fix HUGETLB function without THP enabled
mips: cm: Convert to bitfield API to fix out-of-bounds access
MIPS: Remove NETLOGIC support
MIPS: kernel: proc: add CPU option reporting
MIPS: kernel: proc: use seq_puts instead of seq_printf
MIPS: kernel: proc: fix trivial style errors
MIPS: Fix assembly error from MIPSr2 code used within MIPS_ISA_ARCH_LEVEL
MIPS: octeon: Remove unused functions
MIPS: Loongson64: Add of_node_put() before break
bcm47xx: Replace printk(KERN_ALERT ... pci_devname(dev)) with pci_alert()
bcm47xx: Get rid of redundant 'else'
MIPS: sni: Fix the build
MIPS: Avoid macro redefinitions
MIPS: loongson64: Fix no screen display during boot-up
MIPS: loongson64: make CPU_LOONGSON64 depends on MIPS_FP_SUPPORT
mips_cm_error_report() extracts the cause and other cause from the error
register using shifts. This works fine for the former, as it is stored
in the top bits, and the shift will thus remove all non-related bits.
However, the latter is stored in the bottom bits, hence thus needs masking
to get rid of non-related bits. Without such masking, using it as an
index into the cm2_causes[] array will lead to an out-of-bounds access,
probably causing a crash.
Fix this by using FIELD_GET() instead. Bite the bullet and convert all
MIPS CM handling to the bitfield API, to improve readability and safety.
Fixes: 3885c2b463 ("MIPS: CM: Add support for reporting CM cache errors")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
- kprobes: Restructured stack unwinder to show properly on x86 when a stack
dump happens from a kretprobe callback.
- Fix to bootconfig parsing
- Have tracefs allow owner and group permissions by default (only denying
others). There's been pressure to allow non root to tracefs in a
controlled fashion, and using groups is probably the safest.
- Bootconfig memory managament updates.
- Bootconfig clean up to have the tools directory be less dependent on
changes in the kernel tree.
- Allow perf to be traced by function tracer.
- Rewrite of function graph tracer to be a callback from the function tracer
instead of having its own trampoline (this change will happen on an arch
by arch basis, and currently only x86_64 implements it).
- Allow multiple direct trampolines (bpf hooks to functions) be batched
together in one synchronization.
- Allow histogram triggers to add variables that can perform calculations
against the event's fields.
- Use the linker to determine architecture callbacks from the ftrace
trampoline to allow for proper parameter prototypes and prevent warnings
from the compiler.
- Extend histogram triggers to key off of variables.
- Have trace recursion use bit magic to determine preempt context over if
branches.
- Have trace recursion disable preemption as all use cases do anyway.
- Added testing for verification of tracing utilities.
- Various small clean ups and fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYBdxhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qp1sAQD2oYFwaG3sx872gj/myBcHIBSKdiki
Hry5csd8zYDBpgD+Poylopt5JIbeDuoYw/BedgEXmscZ8Qr7VzjAXdnv/Q4=
=Loz8
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- kprobes: Restructured stack unwinder to show properly on x86 when a
stack dump happens from a kretprobe callback.
- Fix to bootconfig parsing
- Have tracefs allow owner and group permissions by default (only
denying others). There's been pressure to allow non root to tracefs
in a controlled fashion, and using groups is probably the safest.
- Bootconfig memory managament updates.
- Bootconfig clean up to have the tools directory be less dependent on
changes in the kernel tree.
- Allow perf to be traced by function tracer.
- Rewrite of function graph tracer to be a callback from the function
tracer instead of having its own trampoline (this change will happen
on an arch by arch basis, and currently only x86_64 implements it).
- Allow multiple direct trampolines (bpf hooks to functions) be batched
together in one synchronization.
- Allow histogram triggers to add variables that can perform
calculations against the event's fields.
- Use the linker to determine architecture callbacks from the ftrace
trampoline to allow for proper parameter prototypes and prevent
warnings from the compiler.
- Extend histogram triggers to key off of variables.
- Have trace recursion use bit magic to determine preempt context over
if branches.
- Have trace recursion disable preemption as all use cases do anyway.
- Added testing for verification of tracing utilities.
- Various small clean ups and fixes.
* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
tracing/histogram: Fix semicolon.cocci warnings
tracing/histogram: Fix documentation inline emphasis warning
tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
tracing: Show size of requested perf buffer
bootconfig: Initialize ret in xbc_parse_tree()
ftrace: do CPU checking after preemption disabled
ftrace: disable preemption when recursion locked
tracing/histogram: Document expression arithmetic and constants
tracing/histogram: Optimize division by a power of 2
tracing/histogram: Covert expr to const if both operands are constants
tracing/histogram: Simplify handling of .sym-offset in expressions
tracing: Fix operator precedence for hist triggers expression
tracing: Add division and multiplication support for hist triggers
tracing: Add support for creating hist trigger variables from literal
selftests/ftrace: Stop tracing while reading the trace file by default
MAINTAINERS: Update KPROBES and TRACING entries
test_kprobes: Move it from kernel/ to lib/
docs, kprobes: Remove invalid URL and add new reference
samples/kretprobes: Fix return value if register_kretprobe() failed
lib/bootconfig: Fix the xbc_get_info kerneldoc
...
Hi Linus,
Please, pull the following patches that fix some fall-through warnings
when building with Clang and -Wimplicit-fallthrough.
Thanks!
--
Gustavo
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmGAYmQACgkQRwW0y0cG
2zFBAhAAtFusTzJCanITg12l87R4w4LWKZjBGNNKJCSvh3JyUxRw/AfgASgRECKn
2b4pDpYpwZh7CzUoaYwhROCFKw4+FMgEQ/d+q77JBdiNGa4PyWI6gQMwqD5rOFET
hBIJfiIcR2cS9F9w6sHysNxYoiO+zIGbSA8CcTPa47QOecZH5QKhcNw1apGw8n+D
PTSm4qhQBjLfk8/2BsF8pn6THI9+BWmHd8/W9p8cjNXISCeDMZgHqmoK3VHjw6L8
z/CA1lveagbeMHwGYa0ZqVw087G4fzN0XCxwK1+R0zHJbAIGMZTtLkRqZ0yty4Wx
Ses/oNbcsXZVzcOuh1MkU00j4M59dZdaR7WfQNwxdHnWs6jUVi3DDqMvzF3AD2Au
LTbQsKzPg76jd+McbukRSUYV5O4GBPYUCYkGMdbyXPRAi8TxPl4SQYNCCkY7QVL6
2Em9mU1qqxxnoGfPWBBQtlVfeViAw9RWihSAr6FXxV4a9wFYU90TWg6L5MweoPVh
1avP54yW9xSrA1eBMO2QiH2nD56NKzqp3eyy/9G8f9XDUJre8OaCw8Ow+kc6/vdW
gSTiVArZKXWAe1IdvXOE4QMwAFs1eK66MBn9Flngy+8k7bcE9/tzNsjRbLHkNlOB
L4bWu9sn9pZNkcFLprSY9wAnm317JBGhC/9ef/wC4kYE4VaYUvM=
=3pkR
-----END PGP SIGNATURE-----
Merge tag 'fallthrough-fixes-clang-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull fallthrough fixes from Gustavo A. R. Silva:
"Fix some fall-through warnings when building with Clang and
-Wimplicit-fallthrough"
* tag 'fallthrough-fixes-clang-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
pcmcia: db1xxx_ss: Fix fall-through warning for Clang
MIPS: Fix fall-through warnings for Clang
scsi: st: Fix fall-through warning for Clang
- Revert the printk format based wchan() symbol resolution as it can leak
the raw value in case that the symbol is not resolvable.
- Make wchan() more robust and work with all kind of unwinders by
enforcing that the task stays blocked while unwinding is in progress.
- Prevent sched_fork() from accessing an invalid sched_task_group
- Improve asymmetric packing logic
- Extend scheduler statistics to RT and DL scheduling classes and add
statistics for bandwith burst to the SCHED_FAIR class.
- Properly account SCHED_IDLE entities
- Prevent a potential deadlock when initial priority is assigned to a
newly created kthread. A recent change to plug a race between cpuset and
__sched_setscheduler() introduced a new lock dependency which is now
triggered. Break the lock dependency chain by moving the priority
assignment to the thread function.
- Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
- Improve idle balancing in general and especially for NOHZ enabled
systems.
- Provide proper interfaces for live patching so it does not have to
fiddle with scheduler internals.
- Add cluster aware scheduling support.
- A small set of tweaks for RT (irqwork, wait_task_inactive(), various
scheduler options and delaying mmdrop)
- The usual small tweaks and improvements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29
iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec
/1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t
aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7
3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo
ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1
vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6
mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci
V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy
s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6
i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj
d2qWG7UvoseT+g==
=fgtS
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- Revert the printk format based wchan() symbol resolution as it can
leak the raw value in case that the symbol is not resolvable.
- Make wchan() more robust and work with all kind of unwinders by
enforcing that the task stays blocked while unwinding is in progress.
- Prevent sched_fork() from accessing an invalid sched_task_group
- Improve asymmetric packing logic
- Extend scheduler statistics to RT and DL scheduling classes and add
statistics for bandwith burst to the SCHED_FAIR class.
- Properly account SCHED_IDLE entities
- Prevent a potential deadlock when initial priority is assigned to a
newly created kthread. A recent change to plug a race between cpuset
and __sched_setscheduler() introduced a new lock dependency which is
now triggered. Break the lock dependency chain by moving the priority
assignment to the thread function.
- Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
- Improve idle balancing in general and especially for NOHZ enabled
systems.
- Provide proper interfaces for live patching so it does not have to
fiddle with scheduler internals.
- Add cluster aware scheduling support.
- A small set of tweaks for RT (irqwork, wait_task_inactive(), various
scheduler options and delaying mmdrop)
- The usual small tweaks and improvements all over the place
* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
sched/fair: Cleanup newidle_balance
sched/fair: Remove sysctl_sched_migration_cost condition
sched/fair: Wait before decaying max_newidle_lb_cost
sched/fair: Skip update_blocked_averages if we are defering load balance
sched/fair: Account update_blocked_averages in newidle_balance cost
x86: Fix __get_wchan() for !STACKTRACE
sched,x86: Fix L2 cache mask
sched/core: Remove rq_relock()
sched: Improve wake_up_all_idle_cpus() take #2
irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
sched: Add cluster scheduler level for x86
sched: Add cluster scheduler level in core and related Kconfig for ARM64
topology: Represent clusters of CPUs within a die
sched: Disable -Wunused-but-set-variable
sched: Add wrapper for get_wchan() to keep task blocked
x86: Fix get_wchan() to support the ORC unwinder
proc: Use task_is_running() for wchan in /proc/$pid/stat
...
* irq/remove-handle-domain-irq-20211026:
: Large rework of the architecture entry code from Mark Rutland.
: From the cover letter:
:
: <quote>
: The handle_domain_{irq,nmi}() functions were oringally intended as a
: convenience, but recent rework to entry code across the kernel tree has
: demonstrated that they cause more pain than they're worth and prevent
: architectures from being able to write robust entry code.
:
: This series reworks the irq code to remove them, handling the necessary
: entry work consistently in entry code (be it architectural or generic).
: </quote>
MIPS: irq: Avoid an unused-variable error
irq: remove handle_domain_{irq,nmi}()
irq: remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
irq: riscv: perform irqentry in entry code
irq: openrisc: perform irqentry in entry code
irq: csky: perform irqentry in entry code
irq: arm64: perform irqentry in entry code
irq: arm: perform irqentry in entry code
irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
irq: nds32: avoid CONFIG_HANDLE_DOMAIN_IRQ
irq: arc: avoid CONFIG_HANDLE_DOMAIN_IRQ
irq: add generic_handle_arch_irq()
irq: unexport handle_irq_desc()
irq: simplify handle_domain_{irq,nmi}()
irq: mips: simplify do_domain_IRQ()
irq: mips: stop (ab)using handle_domain_irq()
irq: mips: simplify bcm6345_l1_irq_handle()
irq: mips: avoid nested irq_enter()
Signed-off-by: Marc Zyngier <maz@kernel.org>
When CONFIG_IRQ_DOMAIN is set, there is a warning:
arch/mips/kernel/irq.c:114:19: error: unused variable 'desc' [-Werror=unused-variable]
114 | struct irq_desc *desc;
| ^~~~
This variable is unused, let's remove it.
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211028095652.3503790-1-siyanteng@loongson.cn
When an instruction to save or restore a register from the stack fails
in _save_fp_context or _restore_fp_context return with -EFAULT. This
change was made to r2300_fpu.S[1] but it looks like it got lost with
the introduction of EX2[2]. This is also what the other implementation
of _save_fp_context and _restore_fp_context in r4k_fpu.S does, and
what is needed for the callers to be able to handle the error.
Furthermore calling do_exit(SIGSEGV) from bad_stack is wrong because
it does not terminate the entire process it just terminates a single
thread.
As the changed code was the only caller of arch/mips/kernel/syscall.c:bad_stack
remove the problematic and now unused helper function.
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Maciej Rozycki <macro@orcam.me.uk>
Cc: linux-mips@vger.kernel.org
[1] 35938a00ba ("MIPS: Fix ISA I FP sigcontext access violation handling")
[2] f92722dc45 ("MIPS: Correct MIPS I FP sigcontext layout")
Cc: stable@vger.kernel.org
Fixes: f92722dc45 ("MIPS: Correct MIPS I FP sigcontext layout")
Acked-by: Maciej W. Rozycki <macro@orcam.me.uk>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: https://lkml.kernel.org/r/20211020174406.17889-5-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
There's no need fpr arch/mips's do_domain_IRQ() to open-code the NULL
check performed by handle_irq_desc(), nor the resolution of the desc
performed by generic_handle_domain_irq().
Use generic_handle_domain_irq() directly, as this is functioanlly
equivalent and clearer.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Many MIPS CPUs have optional CPU features which are not activated for
all CPU cores. Print the CPU options, which are implemented in the core,
in /proc/cpuinfo. This makes it possible to see which features are
supported and which are not supported. This should cover all standard
MIPS extensions. Before, it only printed information about the main MIPS
ASEs.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Changes from original patch[0]:
- Remove cpu_has_6k_cache and cpu_has_8k_cache due to commit 6ce91ba858
("MIPS: Remove cpu_has_6k_cache and cpu_has_8k_cache in cpu_cache_init()")
- Add new options: mac2008_only, ftlbparex, gsexcex, mmid, mm_sysad,
mm_full
- Use seq_puts instead of seq_printf as suggested by checkpatch
- Minor commit message reword
[0]: https://lore.kernel.org/linux-mips/20181223225224.23042-1-hauke@hauke-m.de/
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Fix the following checkpatch errors - no logic changes:
WARNING: Block comments use a trailing */ on a separate line
+ * */
ERROR: space prohibited before open square bracket '['
+ char fmt [64];
ERROR: space prohibited before that ',' (ctx:WxE)
+ seq_printf(m, "%s0x%04x", i ? ", " : "" ,
ERROR: trailing whitespace
+^Iseq_printf(m, "isa\t\t\t:"); $
ERROR: trailing statements should be on next line
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
irq_cpu_offline() is only used by MIPS and we should instead use
irq_migrate_all_off_this_cpu(). This will be helpful in order to remove
drivers/irqchip/irq-bcm7038-l1.c irq_cpu_offline callback which would
have got in the way of making this driver modular.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211020184859.2705451-2-f.fainelli@gmail.com
Having a stable wchan means the process must be blocked and for it to
stay that way while performing stack unwinding.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
Fix the following fallthrough warnings:
arch/mips/alchemy/devboards/db1550.c:69:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
case BCSR_WHOAMI_DB1550:
^
arch/mips/alchemy/devboards/db1550.c:69:2: note: insert 'break;' to avoid fall-through
case BCSR_WHOAMI_DB1550:
^
break;
arch/mips/kernel/uprobes.c:176:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
default:
^
arch/mips/kernel/uprobes.c:176:2: note: insert 'break;' to avoid fall-through
default:
^
break;
This helps with the ongoing efforts to globally enable
-Wimplicit-fallthrough for Clang.
Link: https://github.com/KSPP/linux/issues/115
Link: https://lore.kernel.org/linux-mm/202109030839.t2llsvmc-lkp@intel.com/
Link: https://lore.kernel.org/linux-mm/202108271617.MHxFd8aX-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Since now there is kretprobe_trampoline_addr() for referring the
address of kretprobe trampoline code, we don't need to access
kretprobe_trampoline directly.
Make it harder to refer by renaming it to __kretprobe_trampoline().
Link: https://lkml.kernel.org/r/163163045446.489837.14510577516938803097.stgit@devnote2
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The __kretprobe_trampoline_handler() callback, called from low level
arch kprobes methods, has the 'trampoline_address' parameter, which is
entirely superfluous as it basically just replicates:
dereference_kernel_function_descriptor(kretprobe_trampoline)
In fact we had bugs in arch code where it wasn't replicated correctly.
So remove this superfluous parameter and use kretprobe_trampoline_addr()
instead.
Link: https://lkml.kernel.org/r/163163044546.489837.13505751885476015002.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This clean up the error/notification messages in kprobes related code.
Basically this defines 'pr_fmt()' macros for each files and update
the messages which describes
- what happened,
- what is the kernel going to do or not do,
- is the kernel fine,
- what can the user do about it.
Also, if the message is not needed (e.g. the function returns unique
error code, or other error message is already shown.) remove it,
and replace the message with WARN_*() macros if suitable.
Link: https://lkml.kernel.org/r/163163036568.489837.14085396178727185469.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
that the two function are always called back-to-back by architectures
that have rseq. The rseq helper is stubbed out for architectures that
don't support rseq, i.e. this is a nop across the board.
Note, tracehook_notify_resume() is horribly named and arguably does not
belong in tracehook.h as literally every line of code in it has nothing
to do with tracing. But, that's been true since commit a42c6ded82
("move key_repace_session_keyring() into tracehook_notify_resume()")
first usurped tracehook_notify_resume() back in 2012. Punt cleaning that
mess up to future patches.
No functional change intended.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210901203030.1292304-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ensure that all usage sites of get/put_online_cpus() except for the
struggler in drivers/thermal are gone. So the last user and the deprecated
inlines can be removed.
These are all handled correctly when calling the native system call entry
point, so remove the special cases.
Link: https://lkml.kernel.org/r/20210727144859.4150043-6-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmEx9dUaHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHA0xQ/8D1gvh1fURnwLrDjNyydC
yOVngOviRgI/3vZj3i1dSLEXskCVjXYhFJLtMUuwiTH5J3b5ToxjjKfp3MZv55BV
04B11YuXXqDQ6fzXCmaPNK6jfgq4DmZZSGx3hx2H7HLq749WrxiReTK4bmwizH3V
2A2I7DDTUhGG4L3ThDMiQO17aRfR7C+5eltRvyCJeGTnvyB67x6eOkOSBcreJSVj
iCshM7XkCXEsLjdvv0XOiOWZJLTjMoEyQm+uQya7wkS6/0PZ4MQqou+WvppDH0Sc
w/36lSaMYSuP2KqcXLehfhdPCQpWxfvcOn4j0Z5yBZ3ADzBN44s6S3MEgncvona3
w4ki0ZCHzpPa81Z15Fn9EIHAQ7H9XIpnTTByZjbBI+6Ch7MSfQv0yUqKPc2Hyl6a
DksVSKyrC78iNOhtjFVOpjW8rJ47GWI25nbFyADeZWoyjX35FWiDQJWn5K5Aa6+D
RBTHGbMcaAiuTauUpDKvxaWmT+NyB0WUjxpMdqg2BLQcv3CvcrTR39zVrBBvG0mM
BoMjCR60q8MSG/NBlZt03FEhJXsu2UZFFMfilIkJmNjoB7VANLexPsU+jI6LjT36
qiVZKXHtR/NpbBdOSk+DXFh0QyAStJ1mfZ6yQ0xYyBvhn6bISIK7nn6LhFtVmN85
kUTOZuQhQ6qigRu++i1sb3U=
=mKyO
-----END PGP SIGNATURE-----
Merge tag 'mips_5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- converted Pistachio platform to use MIPS generic kernel
- fixes and cleanups
* tag 'mips_5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (29 commits)
MIPS: Malta: fix alignment of the devicetree buffer
MIPS: ingenic: Unconditionally enable clock of CPU #0
MIPS: mscc: ocelot: mark the phy-mode for internal PHY ports
MIPS: mscc: ocelot: disable all switch ports by default
MAINTAINERS: adjust PISTACHIO SOC SUPPORT after its retirement
MIPS: Return true/false (not 1/0) from bool functions
MIPS: generic: Return true/false (not 1/0) from bool functions
MIPS: Make a alias for pistachio_defconfig
MIPS: Retire MACH_PISTACHIO
MIPS: config: generic: Add config for Marduk board
pinctrl: pistachio: Make it as an option
phy: pistachio-usb: Depend on MIPS || COMPILE_TEST
clocksource/drivers/pistachio: Make it selectable for MIPS
clk: pistachio: Make it selectable for generic MIPS kernel
MIPS: DTS: Pistachio add missing cpc and cdmm
MIPS: generic: Allow generating FIT image for Marduk board
MIPS: locking/atomic: Fix atomic{_64,}_sub_if_positive
MIPS: loongson2ef: don't build serial.o unconditionally
MIPS: Replace deprecated CPU-hotplug functions.
MIPS: Alchemy: Fix spelling contraction "cant" -> "can't"
...
Merge misc updates from Andrew Morton:
"173 patches.
Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure,
hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock,
oom-kill, migration, ksm, percpu, vmstat, and madvise)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
mm/madvise: add MADV_WILLNEED to process_madvise()
mm/vmstat: remove unneeded return value
mm/vmstat: simplify the array size calculation
mm/vmstat: correct some wrong comments
mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
selftests: vm: add COW time test for KSM pages
selftests: vm: add KSM merging time test
mm: KSM: fix data type
selftests: vm: add KSM merging across nodes test
selftests: vm: add KSM zero page merging test
selftests: vm: add KSM unmerge test
selftests: vm: add KSM merge test
mm/migrate: correct kernel-doc notation
mm: wire up syscall process_mrelease
mm: introduce process_mrelease system call
memblock: make memblock_find_in_range method private
mm/mempolicy.c: use in_task() in mempolicy_slab_node()
mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
mm/mempolicy: advertise new MPOL_PREFERRED_MANY
mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
...
Split off from prev patch in the series that implements the syscall.
Link: https://lkml.kernel.org/r/20210809185259.405936-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jan Engelhardt <jengelh@inai.de>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are a lot of uses of memblock_find_in_range() along with
memblock_reserve() from the times memblock allocation APIs did not exist.
memblock_find_in_range() is the very core of memblock allocations, so any
future changes to its internal behaviour would mandate updates of all the
users outside memblock.
Replace the calls to memblock_find_in_range() with an equivalent calls to
memblock_phys_alloc() and memblock_phys_alloc_range() and make
memblock_find_in_range() private method of memblock.
This simplifies the callers, ensures that (unlikely) errors in
memblock_reserve() are handled and improves maintainability of
memblock_find_in_range().
Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ACPI]
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Nick Kossifidis <mick@ics.forth.gr> [riscv]
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
DEFINE_SMP_CALL_CACHE_FUNCTION() was usefel before the CPU hotplug rework
to ensure that the cache related functions are called on the upcoming CPU
because the notifier itself could run on any online CPU.
The hotplug state machine guarantees that the callbacks are invoked on the
upcoming CPU. So there is no need to have this SMP function call
obfuscation. That indirection was missed when the hotplug notifiers were
converted.
This also solves the problem of ARM64 init_cache_level() invoking ACPI
functions which take a semaphore in that context. That's invalid as SMP
function calls run with interrupts disabled. Running it just from the
callback in context of the CPU hotplug thread solves this.
Fixes: 8571890e15 ("arm64: Add support for ACPI based firmware tables")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/871r69ersb.ffs@tglx
./arch/mips/kernel/uprobes.c:261:8-9: WARNING: return of 0/1 in function
'arch_uprobe_skip_sstep' with return type bool
./arch/mips/kernel/uprobes.c:78:10-11: WARNING: return of 0/1 in
function 'is_trap_insn' with return type bool
./arch/mips/kvm/mmu.c:489:9-10: WARNING: return of 0/1 in function
'kvm_test_age_gfn' with return type bool
./arch/mips/kvm/mmu.c:445:8-9: WARNING: return of 0/1 in function
'kvm_unmap_gfn_range' with return type bool
Signed-off-by: Huilong Deng <denghuilong@cdjrlc.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().
Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The bdflush system call has been deprecated for a very long time.
Recently Michael Schmitz tested[1] and found that the last known
caller of of the bdflush system call is unaffected by it's removal.
Since the code is not needed delete it.
[1] https://lkml.kernel.org/r/36123b5d-daa0-6c2b-f2d4-a942f069fd54@gmail.com
Link: https://lkml.kernel.org/r/87sg10quue.fsf_-_@disp2133
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
- Fix a MIPS bug where irqdomain loopkups could occur in a context
where RCU is not allowed
- Fix a documentation bug for handle_domain_irq
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmDoFZwPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDkxgP/0IlcPoBfamLbzpjyrMm0359AIllKP7Vgpb6
IaDnoiWTCe1B6rFoP59j357iGCTvB04+oJ17DsB51R9RKaM2sdhi2dKxbEHRd+se
v1EyU5gmVB7JJP/1lPnjYdPGnXz73jjEq5MRI6o4yLG2xnQ+ZcdHifuP/tD7glGZ
UmJkAWrq53IbBCja9HbtbAQanuDtDu6xgYIweaCdf2oKzNS9jQVvjqFokMd0AcIA
juY8xTFNzHoA/8AF8eBUE5TfLVcG8j3P31ffw4gVvzEkman77AP5DZ8qkSvi7plH
wOdjlBrsGTfoti4kfAIvsfi6zBHyXJEaW0Vd38uaA+cXYI1QNiZ9qqzQPvjuK9gs
VFORcWe2pDiI1q4mg1pz7dGBy0FoEswe4uVnr6xm+vn98KeAn/gS5Dc/K1JpmfN+
iOJt9H7hvDrUC/KnntzuRY82I8y7gDyyjGDJHhFlWgZBeONPhpOiE/d6xbxGtQm8
SpVBD9QZnMBxDY2eVNQp31SCdrwLCLczNeQrJHP6Oh9LLmjQq3VD4M3OpBLpEA8e
8WIiH9vJfXI5emU1wQRkscyA8tT2mAo3Kb192fpC5nDwTMSxbfk1l94oLGUldcV5
liQ78yd/12mJBMS5MJPsA39g1ww2vv5m6Bq0JDwDu33l1qKrlRJqveeS8UjDSXrD
s3cVrxFO
=eTw3
-----END PGP SIGNATURE-----
Merge tag 'irqchip-fixes-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
- Fix a MIPS bug where irqdomain loopkups could occur in a context
where RCU is not allowed
- Fix a documentation bug for handle_domain_irq
Since d4a45c68dc ("irqdomain: Protect the linear revmap with RCU"),
any irqdomain lookup requires the RCU read lock to be held.
This assumes that the architecture code will be structured such as
irq_enter() will be called *before* the interrupt is looked up
in the irq domain. However, this isn't the case for MIPS, and a number
of drivers are structured to do it the other way around when handling
an interrupt in their root irqchip (secondary irqchips are OK by
construction).
This results in a RCU splat on a lockdep-enabled kernel when the kernel
takes an interrupt from idle, as reported by Guenter Roeck.
Note that this could have fired previously if any driver had used
tree-based irqdomain, which always had the RCU requirement.
To solve this, provide a MIPS-specific helper (do_domain_IRQ())
as the pendent of do_IRQ() that will do thing in the right order
(and maybe save some cycles in the process).
Ideally, MIPS would be moved over to using handle_domain_irq(),
but that's much more ambitious.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
[maz: add dependency on CONFIG_IRQ_DOMAIN after report from the kernelci bot]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/20210705172352.GA56304@roeck-us.net
Link: https://lore.kernel.org/r/20210706110647.3979002-1-maz@kernel.org
Merge more updates from Andrew Morton:
"190 patches.
Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
signals, exec, kcov, selftests, compress/decompress, and ipc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
ipc/util.c: use binary search for max_idx
ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
ipc: use kmalloc for msg_queue and shmid_kernel
ipc sem: use kvmalloc for sem_undo allocation
lib/decompressors: remove set but not used variabled 'level'
selftests/vm/pkeys: exercise x86 XSAVE init state
selftests/vm/pkeys: refill shadow register after implicit kernel write
selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
kcov: add __no_sanitize_coverage to fix noinstr for all architectures
exec: remove checks in __register_bimfmt()
x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
hfsplus: report create_date to kstat.btime
hfsplus: remove unnecessary oom message
nilfs2: remove redundant continue statement in a while-loop
kprobes: remove duplicated strong free_insn_page in x86 and s390
init: print out unknown kernel parameters
checkpatch: do not complain about positive return values starting with EPOLL
checkpatch: improve the indented label test
checkpatch: scripts/spdxcheck.py now requires python3
...
- Ingenic fixes/improvments
- other fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmDdyJQaHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHBlyRAAhj5dC1EyGsc0cWyzC8ZU
Nh45vzMCSxa4mxAelNjuVKdE0h7gLeRa8/sKPV+EcS3ZFcSfZQeIHEbH9Na3EDS7
KtUZmkjrHCDdRTh7kou7E7mb716HvoQEyq6d1VyZOahyqf2ZcjIsFinK+As+4wBV
JcXJMcpUWemI4Ojm8cbmNWQW3V2Ty9qLNUa6BpmntbdOdYowTih9QWHv2u1YsOR7
R6LJXdyo6V1RieeqfZaWXTQtN8yyXYhBewLIy0DxBAm329f5sRHUVd9/ZD2RMByD
1weEhbs0jhmYZFSfkwZ8SjAb4GkusjNTnDiK4Rsuz6pQK0BIGNAV0Mrnedq+i5eD
wrrTI1/envStDpj9XGlSNajcaQGpTza1V1uaCIm4EanOMMTBc8DTXaj7MCOlTD8j
fkNE3ykfloVSSzZAWCmcpV9fBsFwQp3m+cWrtIAAnOJDK+NV5FfABUwFHKnBUpxG
YOUNCed0WJFx++xrHKylt8hWILLEATLHh1h5vDsEYJi4gwt2KxCSYQBGgSYKa1At
tOUa5UINTMvpe4U9GEjHh6f1VedtUNYUzXjD5g2cn62d81RCSx3hB1KkfQSKtaDw
H9sR8d+rDFt4fK9T/HPZ3EyM2i9FkZNv+CulfGNd3M0vnF8+qX2xRehWrqhwBBi0
2p7V9/t5TPhziAfJySKX+Jg=
=j2kH
-----END PGP SIGNATURE-----
Merge tag 'mips_5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- add support for OpeneEmbed SOM9331 board
- Ingenic fixes/improvments
- other fixes and cleanups
* tag 'mips_5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (39 commits)
MIPS: Fix PKMAP with 32-bit MIPS huge page support
MIPS: CI20: Add second percpu timer for SMP.
MIPS: CI20: Reduce clocksource to 750 kHz.
MIPS: Ingenic: Add MAC syscon nodes for Ingenic SoCs.
dt-bindings: clock: Add documentation for MAC PHY control bindings.
MIPS: X1830: Respect cell count of common properties.
MIPS: set mips32r5 for virt extensions
MIPS: loongsoon64: Reserve memory below starting pfn to prevent Oops
MIPS: MT extensions are not available on MIPS32r1
mips/kvm: Use BUG_ON instead of if condition followed by BUG
MIPS: OCTEON: octeon-usb: Use devm_platform_get_and_ioremap_resource()
MIPS: add PMD table accounting into MIPS'pmd_alloc_one
MIPS: Loongson64: fix spelling of SPDX tag
MIPS: ingenic: rs90: Add dedicated VRAM memory region
MIPS: ingenic: gcw0: Set codec to cap-less mode for FM radio
MIPS: ingenic: jz4780: Fix I2C nodes to match DT doc
MIPS: ingenic: Select CPU_SUPPORTS_CPUFREQ && MIPS_EXTERNAL_TIMER
MIPS: Kconfig: ingenic: Ensure MACH_INGENIC_GENERIC selects all SoCs
MIPS: cpu-probe: Fix FPU detection on Ingenic JZ4760(B)
MIPS: boot: Support specifying UART port on Ingenic SoCs
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmDcl7AACgkQnJ2qBz9k
QNnsBQf+LBAPsfykQ/f8EdHErO1lfbVTmwf2g/JzTkjrIVZTZ6Ic47aCIiFxgHU2
Js9ufaPxpsbbopzpn2PAoCUzxNsZDqgXtnC03MOUAqoSFbAvgLHz2sQwjqeYJUGQ
P6n7VipEA/qBVpQI5zeCUhHYcahoNrRjSLzaFnE2Z8CrQYQ6Ry9gVEhduvu2OTru
62cWlAWlTJfx/FcR1Y0F/ZznnNSKMiAHcEe3F6Beztplg2ooq+z6FclJYrkmnxMq
SXSOsqTCdi1/oFx36NpvLkykrIS9I7N/iqCnKwbm6X+nyZZKyAwYZhWVqkbozPPu
+u1Ppq8o0IuWwEA6/UAmxgAO3m/Gkw==
=tn0h
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc fs updates from Jan Kara:
"The new quotactl_fd() syscall (remake of quotactl_path() syscall that
got introduced & disabled in 5.13 cycle), and couple of udf, reiserfs,
isofs, and writeback fixes and cleanups"
* tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
writeback: fix obtain a reference to a freeing memcg css
quota: remove unnecessary oom message
isofs: remove redundant continue statement
quota: Wire up quotactl_fd syscall
quota: Change quotactl_path() systcall to an fd-based one
reiserfs: Remove unneed check in reiserfs_write_full_page()
udf: Fix NULL pointer dereference in udf_symlink function
reiserfs: add check for invalid 1st journal block
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out panic and
oops helpers.
There are several purposes of doing this:
- dropping dependency in bug.h
- dropping a loop by moving out panic_notifier.h
- unload kernel.h from something which has its own domain
At the same time convert users tree-wide to use new headers, although for
the time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.
[akpm@linux-foundation.org: thread_info.h needs limits.h]
[andriy.shevchenko@linux.intel.com: ia64 fix]
Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc updates from Andrew Morton:
"191 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab,
slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap,
mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization,
pagealloc, and memory-failure)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits)
mm,hwpoison: make get_hwpoison_page() call get_any_page()
mm,hwpoison: send SIGBUS with error virutal address
mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
docs: remove description of DISCONTIGMEM
arch, mm: remove stale mentions of DISCONIGMEM
mm: remove CONFIG_DISCONTIGMEM
m68k: remove support for DISCONTIGMEM
arc: remove support for DISCONTIGMEM
arc: update comment about HIGHMEM implementation
alpha: remove DISCONTIGMEM and NUMA
mm/page_alloc: move free_the_page
mm/page_alloc: fix counting of managed_pages
mm/page_alloc: improve memmap_pages dbg msg
mm: drop SECTION_SHIFT in code comments
mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
mm/page_alloc: scale the number of pages that are batch freed
...
Use vma_lookup() to find the VMA at a specific address. As vma_lookup()
will return NULL if the address is not within any VMA, the start address
no longer needs to be validated.
Link: https://lkml.kernel.org/r/20210521174745.2219620-8-Liam.Howlett@Oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Changes to core scheduling facilities:
- Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
coordinated scheduling across SMT siblings. This is a much
requested feature for cloud computing platforms, to allow
the flexible utilization of SMT siblings, without exposing
untrusted domains to information leaks & side channels, plus
to ensure more deterministic computing performance on SMT
systems used by heterogenous workloads.
There's new prctls to set core scheduling groups, which
allows more flexible management of workloads that can share
siblings.
- Fix task->state access anti-patterns that may result in missed
wakeups and rename it to ->__state in the process to catch new
abuses.
- Load-balancing changes:
- Tweak newidle_balance for fair-sched, to improve
'memcache'-like workloads.
- "Age" (decay) average idle time, to better track & improve workloads
such as 'tbench'.
- Fix & improve energy-aware (EAS) balancing logic & metrics.
- Fix & improve the uclamp metrics.
- Fix task migration (taskset) corner case on !CONFIG_CPUSET.
- Fix RT and deadline utilization tracking across policy changes
- Introduce a "burstable" CFS controller via cgroups, which allows
bursty CPU-bound workloads to borrow a bit against their future
quota to improve overall latencies & batching. Can be tweaked
via /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
- Rework assymetric topology/capacity detection & handling.
- Scheduler statistics & tooling:
- Disable delayacct by default, but add a sysctl to enable
it at runtime if tooling needs it. Use static keys and
other optimizations to make it more palatable.
- Use sched_clock() in delayacct, instead of ktime_get_ns().
- Misc cleanups and fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZcPoRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1g3yw//WfhIqy7Psa9d/MBMjQDRGbTuO4+w22Dj
vmWFU44Q4KJxQHWeIgUlrK+dzvYWvNmflUs2CUUOiDVzxFTHMIyBtL4qCBUbx4Ns
vKAcB9wsWZge2o3WzZqpProRhdoRaSKw8egUr2q7rACVBkckY7eGP/OjWxXU8BdA
b7D0LPWwuIBFfN4pFYeCDLn32Dqr9s6Chyj+ZecabdG7EE6Gu+f1diVcxy7JE/mc
4WWL0D1RqdgpGrBEuMJIxPYekdrZiuy4jtEbztz5gbTBteN1cj3BLfqn0Pc/e6rO
Vyuc5mXCAmzRVi18z6g6bsVl+IA/nrbErENB2OHOhOYtqiZxqGTd4GPWZszMyY17
5AsEO5+5pcaBsy4gyp09qURggBu9zhJnMVmOI3rIHZkmkhwzc6uUJlyhDCTiFWOz
3ZF3LjbZEyCKodMD8qMHbs3axIBpIfZqjzkvSKyFnvfXEGVytVse7NUuWtQ36u92
GnURxVeYY1TDVXvE1Y8owNKMxknKQ6YRlypP7Dtbeo/qG6hShp0xmS7qDLDi0ybZ
ZlK+bDECiVoDf3nvJo+8v5M82IJ3CBt4UYldeRJsa1YCK/FsbK8tp91fkEfnXVue
+U6LPX0AmMpXacR5HaZfb3uBIKRw/QMdP/7RFtBPhpV6jqCrEmuqHnpPQiEVtxwO
UmG7bt94Trk=
=3VDr
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler udpates from Ingo Molnar:
- Changes to core scheduling facilities:
- Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
coordinated scheduling across SMT siblings. This is a much
requested feature for cloud computing platforms, to allow the
flexible utilization of SMT siblings, without exposing untrusted
domains to information leaks & side channels, plus to ensure more
deterministic computing performance on SMT systems used by
heterogenous workloads.
There are new prctls to set core scheduling groups, which allows
more flexible management of workloads that can share siblings.
- Fix task->state access anti-patterns that may result in missed
wakeups and rename it to ->__state in the process to catch new
abuses.
- Load-balancing changes:
- Tweak newidle_balance for fair-sched, to improve 'memcache'-like
workloads.
- "Age" (decay) average idle time, to better track & improve
workloads such as 'tbench'.
- Fix & improve energy-aware (EAS) balancing logic & metrics.
- Fix & improve the uclamp metrics.
- Fix task migration (taskset) corner case on !CONFIG_CPUSET.
- Fix RT and deadline utilization tracking across policy changes
- Introduce a "burstable" CFS controller via cgroups, which allows
bursty CPU-bound workloads to borrow a bit against their future
quota to improve overall latencies & batching. Can be tweaked via
/sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
- Rework assymetric topology/capacity detection & handling.
- Scheduler statistics & tooling:
- Disable delayacct by default, but add a sysctl to enable it at
runtime if tooling needs it. Use static keys and other
optimizations to make it more palatable.
- Use sched_clock() in delayacct, instead of ktime_get_ns().
- Misc cleanups and fixes.
* tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/doc: Update the CPU capacity asymmetry bits
sched/topology: Rework CPU capacity asymmetry detection
sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
psi: Fix race between psi_trigger_create/destroy
sched/fair: Introduce the burstable CFS controller
sched/uclamp: Fix uclamp_tg_restrict()
sched/rt: Fix Deadline utilization tracking during policy change
sched/rt: Fix RT utilization tracking during policy change
sched: Change task_struct::state
sched,arch: Remove unused TASK_STATE offsets
sched,timer: Use __set_current_state()
sched: Add get_current_state()
sched,perf,kvm: Fix preemption condition
sched: Introduce task_is_running()
sched: Unbreak wakeups
sched/fair: Age the average idle time
sched/cpufreq: Consider reduced CPU capacity in energy calculation
sched/fair: Take thermal pressure into account while estimating energy
thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
sched/fair: Return early from update_tg_cfs_load() if delta == 0
...
- Platform PMU driver updates:
- x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers
- Fix RDPMC support
- Fix [extended-]PEBS-via-PT support
- Fix Sapphire Rapids event constraints
- Fix :ppp support on Sapphire Rapids
- Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
- Other heterogenous-PMU fixes
- Kprobes:
- Remove the unused and misguided kprobe::fault_handler callbacks.
- Warn about kprobes taking a page fault.
- Fix the 'nmissed' stat counter.
- Misc cleanups and fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZaxMRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hPgw//f9SnGzFoP1uR5TBqM8j/QHulMewew/iD
dM5lh2emdmqHWYPBeRxUHgag38K2Golr3Y+NxLA3R+RMx+OZQe8Mz/wYvPQcBvsV
k1HHImU3GRMn4GM7GwxH3vPIottDUx3mNS2J6pzlw3kwRUVqrxUdj/0/pSY/4eJ7
ZT4uq4yLV83Jd3qioU7o7e/u6MrdNIIcAXRpVDdE9Mm1+kWXSVN7/h3Vsiz4tj5E
iS+UXEtSc1a2mnmekv63pYkJHHNUb6guD8jgI/wrm1KIFGjDRifM+3TV6R/kB96/
TfD2LhCcTShfSp8KI191pgV7/NQbB/PmLdSYmff3rTBiii4cqXuCygJCHInZ09z0
4fTSSqM6aHg7kfTQyOCp+DUQ+9vNVXWo8mxt9c6B8xA0GyCI3zhjQ4UIiSUWRpjs
Be5ZyF0kNNuPxYrKFnGnBf8+51DURpCz3sDdYRuK4KNkj1+4ZvJo/KzGTMUUIE4B
IDQG6wDP5Kb388eRDtKrG5X7IXg+L5F/kezin60j0QF5MwDgxirT217teN8H1lNn
YgWMjRK8Tw0flUJsbCxa51/nl93UtByB+fIRIc88MSeLxcI6/ORW+TxBBEqkYm5Z
6BLFtmHSuAqAXUuyZXSGLcW7XLJvIaDoHgvbDn6l4g7FMWHqPOIq6nJQY3L8ben2
e+fQrGh4noI=
=20Vc
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Platform PMU driver updates:
- x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers
- Fix RDPMC support
- Fix [extended-]PEBS-via-PT support
- Fix Sapphire Rapids event constraints
- Fix :ppp support on Sapphire Rapids
- Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
- Other heterogenous-PMU fixes
- Kprobes:
- Remove the unused and misguided kprobe::fault_handler callbacks.
- Warn about kprobes taking a page fault.
- Fix the 'nmissed' stat counter.
- Misc cleanups and fixes.
* tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix task context PMU for Hetero
perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids
perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids
perf/x86/intel: Fix fixed counter check warning for some Alder Lake
perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS
perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
kprobes: Do not increment probe miss count in the fault handler
x86,kprobes: WARN if kprobes tries to handle a fault
kprobes: Remove kprobe::fault_handler
uprobes: Update uprobe_write_opcode() kernel-doc comment
perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint
perf/core: Fix DocBook warnings
perf/core: Make local function perf_pmu_snapshot_aux() static
perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX
perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR
perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure
perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of()
- Core locking & atomics:
- Convert all architectures to ARCH_ATOMIC: move every
architecture to ARCH_ATOMIC, then get rid of ARCH_ATOMIC
and all the transitory facilities and #ifdefs.
Much reduction in complexity from that series:
63 files changed, 756 insertions(+), 4094 deletions(-)
- Self-test enhancements
- Futexes:
- Add the new FUTEX_LOCK_PI2 ABI, which is a variant that
doesn't set FLAGS_CLOCKRT (.e. uses CLOCK_MONOTONIC).
[ The temptation to repurpose FUTEX_LOCK_PI's implicit
setting of FLAGS_CLOCKRT & invert the flag's meaning
to avoid having to introduce a new variant was
resisted successfully. ]
- Enhance futex self-tests
- Lockdep:
- Fix dependency path printouts
- Optimize trace saving
- Broaden & fix wait-context checks
- Misc cleanups and fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZaEYRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hPdxAAiNCsxL6X1cZ8zqbWsvLefT9Zqhzgs5u6
gdZele7PNibvbYdON26b5RUzuKfOW/hgyX6LKqr+AiNYTT9PGhcY+tycUr2PGk5R
LMyhJWmmX5cUVPU92ky+z5hEHB2gr4XPJcvgpKKUL0XB1tBaSvy2DtgwPuhXOoT1
1sCQfy63t71snt2RfEnibVW6xovwaA2lsqL81lLHJN4iRFWvqO498/m4+PWkylsm
ig/+VT1Oz7t4wqu3NhTqNNZv+4K4W2asniyo53Dg2BnRm/NjhJtgg4jRibrb0ssb
67Xdq6y8+xNBmEAKj+Re8VpMcu4aj346Ctk7d4gst2ah/Rc0TvqfH6mezH7oq7RL
hmOrMBWtwQfKhEE/fDkng30nrVxc/98YXP0n2rCCa0ySsaF6b6T185mTcYDRDxFs
BVNS58ub+zxrF9Zd4nhIHKaEHiL2ZdDimqAicXN0RpywjIzTQ/y11uU7I1WBsKkq
WkPYs+FPHnX7aBv1MsuxHhb8sUXjG924K4JeqnjF45jC3sC1crX+N0jv4wHw+89V
h4k20s2Tw6m5XGXlgGwMJh0PCcD6X22Vd9Uyw8zb+IJfvNTGR9Rp1Ec+1gMRSll+
xsn6G6Uy9bcNU0SqKlBSfelweGKn4ZxbEPn76Jc8KWLiepuZ6vv5PBoOuaujWht9
KAeOC5XdjMk=
=tH//
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Core locking & atomics:
- Convert all architectures to ARCH_ATOMIC: move every architecture
to ARCH_ATOMIC, then get rid of ARCH_ATOMIC and all the
transitory facilities and #ifdefs.
Much reduction in complexity from that series:
63 files changed, 756 insertions(+), 4094 deletions(-)
- Self-test enhancements
- Futexes:
- Add the new FUTEX_LOCK_PI2 ABI, which is a variant that doesn't
set FLAGS_CLOCKRT (.e. uses CLOCK_MONOTONIC).
[ The temptation to repurpose FUTEX_LOCK_PI's implicit setting of
FLAGS_CLOCKRT & invert the flag's meaning to avoid having to
introduce a new variant was resisted successfully. ]
- Enhance futex self-tests
- Lockdep:
- Fix dependency path printouts
- Optimize trace saving
- Broaden & fix wait-context checks
- Misc cleanups and fixes.
* tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
locking/lockdep: Correct the description error for check_redundant()
futex: Provide FUTEX_LOCK_PI2 to support clock selection
futex: Prepare futex_lock_pi() for runtime clock selection
lockdep/selftest: Remove wait-type RCU_CALLBACK tests
lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING
lockdep: Fix wait-type for empty stack
locking/selftests: Add a selftest for check_irq_usage()
lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()
locking/lockdep: Remove the unnecessary trace saving
locking/lockdep: Fix the dep path printing for backwards BFS
selftests: futex: Add futex compare requeue test
selftests: futex: Add futex wait test
seqlock: Remove trailing semicolon in macros
locking/lockdep: Reduce LOCKDEP dependency list
locking/lockdep,doc: Improve readability of the block matrix
locking/atomics: atomic-instrumented: simplify ifdeffery
locking/atomic: delete !ARCH_ATOMIC remnants
locking/atomic: xtensa: move to ARCH_ATOMIC
locking/atomic: sparc: move to ARCH_ATOMIC
locking/atomic: sh: move to ARCH_ATOMIC
...
All 6 architectures define TASK_STATE in asm-offsets, but then never
actually use it. Remove the definitions to make sure they never will.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210611082838.472811363@infradead.org
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
The reason for kprobe::fault_handler(), as given by their comment:
* We come here because instructions in the pre/post
* handler caused the page_fault, this could happen
* if handler tries to access user space by
* copy_from_user(), get_user() etc. Let the
* user-specified handler try to fix it first.
Is just plain bad. Those other handlers are ran from non-preemptible
context and had better use _nofault() functions. Also, there is no
upstream usage of this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210525073213.561116662@infradead.org
Ingenic JZ4760 and JZ4760B do have a FPU, but the config registers don't
report it. Force the FPU detection in case the processor ID match the
JZ4760(B) one.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Use pattern rules to unify similar build rules among n32, n64, and o32.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
There is no good reason to generate the syscall offset macros by
scripting since they are not derived from the syscall tables.
Define __NR_*_Linux macros directly in arch/mips/include/asm/unistd.h,
and clean up the Makefile and the shell script.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
We'd like all architectures to convert to ARCH_ATOMIC, as once all
architectures are converted it will be possible to make significant
cleanups to the atomics headers, and this will make it much easier to
generically enable atomic functionality (e.g. debug logic in the
instrumented wrappers).
As a step towards that, this patch migrates mips to ARCH_ATOMIC. The
arch code provides arch_{atomic,atomic64,xchg,cmpxchg}*(), and common
code wraps these with optional instrumentation to provide the regular
functions.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210525140232.53872-23-mark.rutland@arm.com
In commit fa8b90070a ("quota: wire up quotactl_path") we have wired up
new quotactl_path syscall. However some people in LWN discussion have
objected that the path based syscall is missing dirfd and flags argument
which is mostly standard for contemporary path based syscalls. Indeed
they have a point and after a discussion with Christian Brauner and
Sascha Hauer I've decided to disable the syscall for now and update its
API. Since there is no userspace currently using that syscall and it
hasn't been released in any major release, we should be fine.
CC: Christian Brauner <christian.brauner@ubuntu.com>
CC: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/lkml/20210512153621.n5u43jsytbik4yze@wittgenstein
Signed-off-by: Jan Kara <jack@suse.cz>
As pointed out by commit
de9b8f5dcb ("sched: Fix crash trying to dequeue/enqueue the idle thread")
init_idle() can and will be invoked more than once on the same idle
task. At boot time, it is invoked for the boot CPU thread by
sched_init(). Then smp_init() creates the threads for all the secondary
CPUs and invokes init_idle() on them.
As the hotplug machinery brings the secondaries to life, it will issue
calls to idle_thread_get(), which itself invokes init_idle() yet again.
In this case it's invoked twice more per secondary: at _cpu_up(), and at
bringup_cpu().
Given smp_init() already initializes the idle tasks for all *possible*
CPUs, no further initialization should be required. Now, removing
init_idle() from idle_thread_get() exposes some interesting expectations
with regards to the idle task's preempt_count: the secondary startup always
issues a preempt_disable(), requiring some reset of the preempt count to 0
between hot-unplug and hotplug, which is currently served by
idle_thread_get() -> idle_init().
Given the idle task is supposed to have preemption disabled once and never
see it re-enabled, it seems that what we actually want is to initialize its
preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove
init_idle() from idle_thread_get().
Secondary startups were patched via coccinelle:
@begone@
@@
-preempt_disable();
...
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEgycj0O+d1G2aycA8rZhLv9lQBTwFAmCInP4ACgkQrZhLv9lQ
BTza0g//dTeb9woC9H7qlEhK4l9yk62lTss60Q8X7m7ZSNfdL4tiEbi64SgK+iOW
OOegbrOEb8Kzh4KJJYmVlVZ5YUWyH4szgmee1wnylBdsWiWaPLPF3Cflz77apy6T
TiiBsJd7rRE29FKheaMt34B41BMh8QHESN+DzjzJWsFoi/uNxjgSs2W16XuSupKu
bpRmB1pYNXMlrkzz7taL05jndZYE5arVriqlxgAsuLOFOp/ER7zecrjImdCM/4kL
W6ej0R1fz2Geh6CsLBJVE+bKWSQ82q5a4xZEkSYuQHXgZV5eywE5UKu8ssQcRgQA
VmGUY5k73rfY9Ofupf2gCaf/JSJNXKO/8Xjg0zAdklKtmgFjtna5Tyg9I90j7zn+
5swSpKuRpilN8MQH+6GWAnfqQlNoviTOpFeq3LwBtNVVOh08cOg6lko/bmebBC+R
TeQPACKS0Q0gCDPm9RYoU1pMUuYgfOwVfVRZK1prgi2Co7ZBUMOvYbNoKYoPIydr
ENBYljlU1OYwbzgR2nE+24fvhU8xdNOVG1xXYPAEHShu+p7dLIWRLhl8UCtRQpSR
1ofeVaJjgjrp29O+1OIQjB2kwCaRdfv/Gq1mztE/VlMU/r++E62OEzcH0aS+mnrg
yzfyUdI8IFv1q6FGT9yNSifWUWxQPmOKuC8kXsKYfqfJsFwKmHM=
=uCN4
-----END PGP SIGNATURE-----
Merge tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull Landlock LSM from James Morris:
"Add Landlock, a new LSM from Mickaël Salaün.
Briefly, Landlock provides for unprivileged application sandboxing.
From Mickaël's cover letter:
"The goal of Landlock is to enable to restrict ambient rights (e.g.
global filesystem access) for a set of processes. Because Landlock
is a stackable LSM [1], it makes possible to create safe security
sandboxes as new security layers in addition to the existing
system-wide access-controls. This kind of sandbox is expected to
help mitigate the security impact of bugs or unexpected/malicious
behaviors in user-space applications. Landlock empowers any
process, including unprivileged ones, to securely restrict
themselves.
Landlock is inspired by seccomp-bpf but instead of filtering
syscalls and their raw arguments, a Landlock rule can restrict the
use of kernel objects like file hierarchies, according to the
kernel semantic. Landlock also takes inspiration from other OS
sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD
Pledge/Unveil.
In this current form, Landlock misses some access-control features.
This enables to minimize this patch series and ease review. This
series still addresses multiple use cases, especially with the
combined use of seccomp-bpf: applications with built-in sandboxing,
init systems, security sandbox tools and security-oriented APIs [2]"
The cover letter and v34 posting is here:
https://lore.kernel.org/linux-security-module/20210422154123.13086-1-mic@digikod.net/
See also:
https://landlock.io/
This code has had extensive design discussion and review over several
years"
Link: https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/ [1]
Link: https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.net/ [2]
* tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
landlock: Enable user space to infer supported features
landlock: Add user and kernel documentation
samples/landlock: Add a sandbox manager example
selftests/landlock: Add user space tests
landlock: Add syscall implementations
arch: Wire up Landlock syscalls
fs,security: Add sb_delete hook
landlock: Support filesystem access-control
LSM: Infrastructure management of the superblock
landlock: Add ptrace restrictions
landlock: Set up the security framework and manage credentials
landlock: Add ruleset and domain management
landlock: Add object management
- removed broken/unmaintained MIPS KVM trap and emulate support
- added support for Loongson-2K1000
- fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmCJS6oaHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHDnbBAAnfVnVgjRZZ1y+oX9siFk
xrQUVJMxEl02272yptB6VtWOLhTnFnJ2xNyjb4EcoCbTiBReSNQ8ee4WYYdCgSXh
soDW6wYrlpcZau5sRg72xybmmYo4aUEh3D9PWNpCaDXFaSpnbXv28ccCTb0yLvCL
4s72wl+hLGAygJMlA6gkQ3mN3XblRk+ci1MLQOr4WpxCmWx0ozKKnHnlt21rw26w
kokJRsqKAOgzzB5LIBgFBdlNKXMd+Xq8BhDXvilNbH2+LtrNIIbyJDPReZBkkO/T
JWGbcNJMox/lBzqtM+eIMvuf/2QHtUAB4xXSAH+oAno6PlV0S73Oo5E+tI++ax9j
WmdN3nwjhpEmwyovZwZtl+b0P14NXXAYKYOYClaRn3dFxEXnS7Fiq+EyGuOEQYBS
oTa2VXQrFZfpN55wtdwmXdwWDwNuE8UtEbkSpblyTw9HQqh3jnl8xKGnaWzw7HMN
D1yycBxMzuOrMiY5UBcfuzdU1K9Dgxn+ur8IfraXjjcEmZB+1TEB74/GCLK1/UJM
lBUrW9ZrE5e3COA/gqPq5ZQf4W9cp8xLBMWWXxnMSBEErpePDtt3EZP2H62sK+3a
wuyvEwRuiFskeUi/Tr7t89u4oXrcIZ7Q3bM3LKj3IWbsk/Znk8X03GCF86QU27zP
wlrVT2ef0/8DHEI7btraCb4=
=w/c4
-----END PGP SIGNATURE-----
Merge tag 'mips_5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- removed get_fs/set_fs
- removed broken/unmaintained MIPS KVM trap and emulate support
- added support for Loongson-2K1000
- fixes and cleanups
* tag 'mips_5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (107 commits)
MIPS: BCM63XX: Use BUG_ON instead of condition followed by BUG.
MIPS: select ARCH_KEEP_MEMBLOCK unconditionally
mips: Do not include hi and lo in clobber list for R6
MIPS:DTS:Correct the license for Loongson-2K
MIPS:DTS:Fix label name and interrupt number of ohci for Loongson-2K
MIPS: Avoid handcoded DIVU in `__div64_32' altogether
lib/math/test_div64: Correct the spelling of "dividend"
lib/math/test_div64: Fix error message formatting
mips/bootinfo:correct some comments of fw_arg
MIPS: Avoid DIVU in `__div64_32' is result would be zero
MIPS: Reinstate platform `__div64_32' handler
div64: Correct inline documentation for `do_div'
lib/math: Add a `do_div' test module
MIPS: Makefile: Replace -pg with CC_FLAGS_FTRACE
MIPS: pci-legacy: revert "use generic pci_enable_resources"
MIPS: Loongson64: Add kexec/kdump support
MIPS: pci-legacy: use generic pci_enable_resources
MIPS: pci-legacy: remove busn_resource field
MIPS: pci-legacy: remove redundant info messages
MIPS: pci-legacy: stop using of_pci_range_to_resource
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmCJU1UACgkQnJ2qBz9k
QNk62AgAgp05OIXU/AgObb7DvSyI3ycwCV8PeWBpwD8yoDAh5x0tmT7vnJu974p6
yHdnF7rr69ZzvbNCHLJ5kRykRlUao9W7cO5fdOW1uTpL7Ic60QuJMks/NfgVTHp1
2zIQmBDerfn1/LTK8r2pPGcvtcjRcr7Ep4beN0Duw57lfVMJhjsNRPnBbXGBcp0r
QzKk4/8V3DCZvOw+XNC3nto7avjvf+nU9sJmuh83546eqh0atjWivvO5aAlDOe6W
rhBiLlmP0in5u2n1fYqzI1OQvtgtleyEZT2G0CrbAZn0xjmV/if9wl+3K6TOwDvR
778xDEX7sZCaO/xkB+WK3hrd15ftKg==
=0kYE
-----END PGP SIGNATURE-----
Merge tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota, ext2, reiserfs updates from Jan Kara:
- support for path (instead of device) based quotactl syscall
(quotactl_path(2))
- ext2 conversion to kmap_local()
- other minor cleanups & fixes
* tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fs/reiserfs/journal.c: delete useless variables
fs/ext2: Replace kmap() with kmap_local_page()
ext2: Match up ext2_put_page() with ext2_dotdot() and ext2_find_entry()
fs/ext2/: fix misspellings using codespell tool
quota: report warning limits for realtime space quotas
quota: wire up quotactl_path
quota: Add mountpath based quota support
This patch replaces the "open-coded" -pg compile flag with a CC_FLAGS_FTRACE
makefile variable which architectures can override if a different option
should be used for code generation.
Signed-off-by: zhaoxiao <zhaoxiao@uniontech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Add kexec/kdump support for Loongson64 by:
1, Provide Loongson-specific kexec functions: loongson_kexec_prepare(),
loongson_kexec_shutdown() and loongson_crash_shutdown();
2, Provide Loongson-specific assembly code in kexec_smp_wait();
To start Loongson64, The boot CPU needs 3 parameters:
fw_arg0: the number of arguments in cmdline (i.e., argc).
fw_arg1: structure holds cmdline such as "root=/dev/sda1 console=tty"
(i.e., argv).
fw_arg2: environment (i.e., envp, additional boot parameters from LEFI).
Non-boot CPUs do not need one parameter as the IPI mailbox base address.
They query their own IPI mailbox to get PC, SP and GP in a loopi, until
the boot CPU brings them up.
loongson_kexec_prepare(): Setup cmdline for kexec/kdump. The kexec/kdump
cmdline comes from kexec's "append" option string. This structure will
be parsed in fw_init_cmdline() of arch/mips/fw/lib/cmdline.c. Both image
->control_code_page and the cmdline need to be in a safe memory region
(memory allocated by the old kernel may be corrupted by the new kernel).
In order to maintain compatibility for the old firmware, the low 2MB is
reserverd and safe for Loongson. So let KEXEC_CTRL_CODE and KEXEC_ARGV_
ADDR be here. LEFI parameters may be corrupted at runtime, so backup it
at mips_reboot_setup(), and then restore it at loongson_kexec_shutdown()
/loongson_crash_shutdown().
loongson_kexec_shutdown(): Wake up all present CPUs and let them go to
reboot_code_buffer. Pass the kexec parameters to kexec_args.
loongson_crash_shutdown(): Pass the kdump parameters to kexec_args.
The assembly part in kexec_smp_wait provide a routine as BIOS does, in
order to keep secondary CPUs in a querying loop.
The layout of low 2MB memory in our design:
0x80000000, the first MB, the first 64K, Exception vectors
0x80010000, the first MB, the second 64K, STR (suspend) data
0x80020000, the first MB, the third and fourth 64K, UEFI HOB
0x80040000, the first MB, the fifth 64K, RT-Thread for SMC
0x80100000, the second MB, the first 64K, KEXEC code
0x80108000, the second MB, the second 64K, KEXEC data
Cc: Eric Biederman <ebiederm@xmission.com>
Tested-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Commit 45deb5faeb ("MIPS: uaccess: Remove get_fs/set_fs call sites")
caused a few new sparse warnings, fix them.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Loongson64 processors have a writecombine issue that maybe failed to
write back framebuffer used with ATI Radeon or AMD GPU at times, after
commit 8a08e50cee ("drm: Permit video-buffers writecombine mapping
for MIPS"), there exists some errors such as blurred screen and lockup,
and so on.
[ 60.958721] radeon 0000:03:00.0: ring 0 stalled for more than 10079msec
[ 60.965315] radeon 0000:03:00.0: GPU lockup (current fence id 0x0000000000000112 last fence id 0x000000000000011d on ring 0)
[ 60.976525] radeon 0000:03:00.0: ring 3 stalled for more than 10086msec
[ 60.983156] radeon 0000:03:00.0: GPU lockup (current fence id 0x0000000000000374 last fence id 0x00000000000003a8 on ring 3)
As discussed earlier [1], it might be better to disable writecombine
on the CPU detection side because the root cause is unknown now.
Actually, this patch is a temporary solution to just make it work well,
it is not a proper and final solution, I hope someone will have a better
solution to fix this issue in the future.
[1] https://lore.kernel.org/patchwork/patch/1285542/
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
All get_fs/set_fs calls in MIPS code are gone, so remove implementation
of it. With the clear separation of user/kernel space access we no
longer need the EVA special handling, so get rid of that, too.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Use new helpers to access user/kernel for functions, which are used with
user/kernel pointers. Instead of dealing with get_fs/set_fs select
user/kernel access via parameter.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
flush_icache_range always does flush kernel address ranges, so no
need to do the set_fs dance.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Provide hooks to intercept bad usages of virt_to_phys() and
__pa_symbol() throughout the kernel. To make this possible, we need to
rename the current implement of virt_to_phys() into
__virt_to_phys_nodebug() and wrap it around depending on
CONFIG_DEBUG_VIRTUAL.
A similar thing is needed for __pa_symbol() which is now aliased to
__phys_addr_symbol() whose implementation is either the direct return of
RELOC_HIDE or goes through the debug version.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
With ath79_defconfig enabling CONFIG_MIPS_ELF_APPENDED_DTB gives a
compilation error. This patch fixes it.
Build log:
...
CC kernel/locking/percpu-rwsem.o
../arch/mips/kernel/setup.c:46:39: error: conflicting types for
'__appended_dtb'
const char __section(".appended_dtb") __appended_dtb[0x100000];
^~~~~~~~~~~~~~
In file included from ../arch/mips/kernel/setup.c:34:
../arch/mips/include/asm/bootinfo.h:118:13: note: previous declaration
of '__appended_dtb' was here
extern char __appended_dtb[];
^~~~~~~~~~~~~~
CC fs/attr.o
make[4]: *** [../scripts/Makefile.build:271: arch/mips/kernel/setup.o]
Error 1
...
Root cause seems to be:
Fixes: b83ba0b9df ("MIPS: of: Introduce helper function to get DTB")
Signed-off-by: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: trivial@kernel.org
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Commit 6654111c89 ("MIPS: vmlinux.lds.S: align raw appended dtb to 8
bytes") changed the alignment from STRUCT_ALIGNMENT bytes to 8 bytes.
The commit's message makes it sound like it was actually done on
purpose, but this is not the case. The commit was written when raw
appended dtb were not aligned at all. The STRUCT_ALIGN() was added a few
days before, in commit 7a05293af3 ("MIPS: boot/compressed: Copy DTB to
aligned address"). The true purpose of the commit was not to align
specifically to 8 bytes, but to make sure that the generated vmlinux'
size was properly padded to the alignment required for DTBs.
While the switch to 8-byte alignment worked for vmlinux-appended dtb
blobs, it broke vmlinuz-appended dtb blobs, as the decompress routine
moves the blob to a STRUCT_ALIGNMENT aligned address.
Fix this by changing the raw appended dtb blob alignment from 8 bytes
back to STRUCT_ALIGNMENT bytes in vmlinux.lds.S.
Fixes: 6654111c89 ("MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes")
Cc: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
debugfs_create_file_unsafe does not protect the fops handed to it
against file removal. DEFINE_DEBUGFS_ATTRIBUTE makes the fops aware of
the file lifetime and thus protects it against removal.
Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Many architectures duplicate similar shell scripts.
This commit converts mips to use scripts/syscallhdr.sh.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Many architectures duplicate similar shell scripts.
This commit converts mips to use scripts/syscalltbl.sh. This also
unifies syscall_table_32_o32.h and syscall_table_64_o32.h into
syscall_table_o32.h.
The offset parameters are unneeded here; __SYSCALL(nr, entry) is defined
as 'PTR entry', so the parameter 'nr' is not used in the first place.
With this commit, syscall tables and generated files are straight
mapped, which makes things easier to understand.
syscall_n32.tbl --> syscall_table_n32.h
syscall_n64.tbl --> syscall_table_n64.h
syscall_o32.tbl --> syscall_table_o32.h
Then, the abi parameters are also unneeded.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
KVM_GUEST is broken and unmaintained, so let's remove it.
Reviewed-by: Huacai Chen <chenhuacai@kernel.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb396454 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.
Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after commit they would be within [RAM_START +
PAGE_SIZE, RAM_END].
The custom exception base handler that is installed by
bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
memory region allocated by unflatten_and_copy_device_tree() thus
corrupting the FDT used by the kernel.
To fix this, we need to perform an early reservation of the custom
exception space. Additional we reserve the first 4k (1k for R3k) for
either normal exception vector space (legacy CPUs) or special vectors
like cache exceptions.
Huge thanks to Serge for analysing and proposing a solution to this
issue.
Fixes: 2dcb396454 ("memblock: do not start bottom-up allocations with kernel_end")
Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The devicetree specification requires 8-byte alignment in
memory. This is now enforced by libfdt since commit 79edff1206
("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9")
which included the upstream commit 5e735860c478 ("libfdt: Check for
8-byte address alignment in fdt_ro_probe_()").
This broke the MIPS raw appended DTBs which would be appended to
the image immediately following the initramfs section. This ends
with a 32bit size, resulting in a 4-byte alignment of the DTB.
Fix by padding with zeroes to 8-bytes when MIPS_RAW_APPENDED_DTB
is defined.
Fixes: 79edff1206 ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9")
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Since 5.12-rc1, the Device Tree blob must now be properly aligned.
Therefore, the decompress routine must be careful to copy the blob at
the next aligned address after the kernel image.
This commit fixes the kernel sometimes not booting with a Device Tree
blob appended to it.
Fixes: 79edff1206 ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
When booting bmips with SMP enabled on a BCM6358 running on CPU #1 instead of
CPU #0, the current CPU mapping code produces the following:
- smp_processor_id(): 0
- cpu_logical_map(0): 1
- cpu_number_map(0): 1
This is because SMP isn't supported on BCM6358 since it has a shared TLB, so
it is disabled and max_cpus is decreased from 2 to 1.
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmA4JRkQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpoWqD/9dbbqe8L701U6May1A/4hRsqL4THTA2flx
vNCNRBl6XV3l/wBCtL6waKy6tyO4lyM8XdUdEvo3Kxl2kGPb8eVfpyYL/+77HqyH
ctT4RMrs+84Mxn+5N6cM97hS1qVI2moTxxyvOEl/JTB7BYrutz9gvAoeY3/Dto47
J66oSaPeuqJ32TyihxfQHVxQopJcqFzDjyoYHGDu6ATio1PXfaIdTu8ywVYSECAh
pWI4rwnqdurGuHMNpxyL1bA6CT/jC7s+sqU7bUYUCgtYI3eG0u3V0bp5gAQQIgl9
5sxxE3DidYGAkYZsosrelshBtzGddLdz4Qrt2ungMYv8RsGNpFQ095jDPKDwFaZj
bSvSsfplCo7iFsJByb1TtpNEOW8eAwi81PmBDVQ9Oq5P5ygTYno9GBDc/20ql0Fk
q6wcX28coE3IBw44ne0hIwvBOtXV4WJyluG/gqOxfbTH+kOy3pDsN8lWcY/P4X0U
yzdU2MLHe8BNMyYlUiBF47Amzt4ltr85P4XD3WZ4bX71iwri6HvrdGWLuuKwX+Ie
66QiIDDQIYZQ6NMMJWS9DGW3y3DBizpSXGxONbOw1J2bQdNmtToR0D2UnK/9UnKp
msnvkUNk8fkYGS4aptpJ6HxbmjMEG5YtbiGlPj6fz5/7MTvhRjPxt7A0LWrUIdqR
f88+sHUMqg==
=oc8u
-----END PGP SIGNATURE-----
Merge tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block
Pull io_uring thread rewrite from Jens Axboe:
"This converts the io-wq workers to be forked off the tasks in question
instead of being kernel threads that assume various bits of the
original task identity.
This kills > 400 lines of code from io_uring/io-wq, and it's the worst
part of the code. We've had several bugs in this area, and the worry
is always that we could be missing some pieces for file types doing
unusual things (recent /dev/tty example comes to mind, userfaultfd
reads installing file descriptors is another fun one... - both of
which need special handling, and I bet it's not the last weird oddity
we'll find).
With these identical workers, we can have full confidence that we're
never missing anything. That, in itself, is a huge win. Outside of
that, it's also more efficient since we're not wasting space and code
on tracking state, or switching between different states.
I'm sure we're going to find little things to patch up after this
series, but testing has been pretty thorough, from the usual
regression suite to production. Any issue that may crop up should be
manageable.
There's also a nice series of further reductions we can do on top of
this, but I wanted to get the meat of it out sooner rather than later.
The general worry here isn't that it's fundamentally broken. Most of
the little issues we've found over the last week have been related to
just changes in how thread startup/exit is done, since that's the main
difference between using kthreads and these kinds of threads. In fact,
if all goes according to plan, I want to get this into the 5.10 and
5.11 stable branches as well.
That said, the changes outside of io_uring/io-wq are:
- arch setup, simple one-liner to each arch copy_thread()
implementation.
- Removal of net and proc restrictions for io_uring, they are no
longer needed or useful"
* tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block: (30 commits)
io-wq: remove now unused IO_WQ_BIT_ERROR
io_uring: fix SQPOLL thread handling over exec
io-wq: improve manager/worker handling over exec
io_uring: ensure SQPOLL startup is triggered before error shutdown
io-wq: make buffered file write hashed work map per-ctx
io-wq: fix race around io_worker grabbing
io-wq: fix races around manager/worker creation and task exit
io_uring: ensure io-wq context is always destroyed for tasks
arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread()
io_uring: cleanup ->user usage
io-wq: remove nr_process accounting
io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS
net: remove cmsg restriction from io_uring based send/recvmsg calls
Revert "proc: don't allow async path resolution of /proc/self components"
Revert "proc: don't allow async path resolution of /proc/thread-self components"
io_uring: move SQPOLL thread io-wq forked worker
io-wq: make io_wq_fork_thread() available to other users
io-wq: only remove worker from free_list, if it was there
io_uring: remove io_identity
io_uring: remove any grabbing of context
...
- fix for ubsan warnings
- fix for bcm63xx platform
- update of linux-mips mailinglist
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmA3+WoaHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHB+mw/9HVN4BVGO6PM62ZH/WOYc
M6G8i0Bfa/ZO0XLiFMMUVlRtDkbXWaNbNIzO6f3ud6Me1+D3aLW8j4yhhVs+yki0
SuspDPDAwtnmsE1DMnr21eL+p5VSaMhdCLdL2pAWErLHBy3oXLSJ1kajhuMs+vVp
2Zq7k1rjJdiYAULep21VDi0CQfTTudFOCkJs0AqbSA6tmPiFZXp8Y/rDOHAu2wro
6iNd4qkh9ASpFgWWWBVuPmsKbTNAEaYIcjGMMm692sraKYOKxqeaCfwyG2odMNFk
If8tnFJgRg+wdS+ZUnX/R9cxrSrZ/QeoUOl6qGUnJWodhlGpm/nSjYSYm5mS7mVX
FPnG2NWjw6pvhf3reupyS4JqcdLJ9Ldk8KJZe8rLXIcrgd90kuMj0ahoc8NY04rs
/ZMVLWZ72XgFvEJeCWdXWvVf21gi3F0MGFJiGPHXTk1leDhNLRFR+ExaEPRjt1HK
KENlBNuXzO790EyfDe/Z7Abq3r4wf17RmXHcoU1FLSRxBtIT8gwF2E4jZB6gzlh7
WaXVHAkxyP0AO11Zo6h/eWaeeulH8lnPSKVvDL4px/apsvnlE56IPClBAMT3Fhhy
ZEiFIiT7eJzs4reDnB9PxxySR4v79xQTdv9jRi1f3wVRQiNaqkGdcrAaj9SVeck5
FsD/Desev/Hu1+maUlY221g=
=KT2b
-----END PGP SIGNATURE-----
Merge tag 'mips_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull more MIPS updates from Thomas Bogendoerfer:
- added n64 block driver
- fix for ubsan warnings
- fix for bcm63xx platform
- update of linux-mips mailinglist
* tag 'mips_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
arch: mips: update references to current linux-mips list
mips: bmips: init clocks earlier
vmlinux.lds.h: catch even more instrumentation symbols into .data
n64: store dev instance into disk private data
n64: cleanup n64cart_probe()
n64: cosmetics changes
n64: remove curly brackets
n64: use sector SECTOR_SHIFT instead 512
n64: use enums for reg
n64: move module param at the top
n64: move module info at the end
n64: use pr_fmt to avoid duplicate string
block: Add n64 cart driver
- Fix false-positive build warnings for ARCH=ia64 builds
- Optimize dictionary size for module compression with xz
- Check the compiler and linker versions in Kconfig
- Fix misuse of extra-y
- Support DWARF v5 debug info
- Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
exceeded the limit
- Add generic syscall{tbl,hdr}.sh for cleanups across arches
- Minor cleanups of genksyms
- Minor cleanups of Kconfig
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmA3zhgVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsG0C4P/A5hUNFdkYI+EffAWZiHn69t0S8j
M1GQkZildKu/yOfm6hp3mNwgHmYgw0aAuch1htkJuv+5rXRtoK77yw0xKbUqNHyO
VqkJWQPVUXJbWIDiu332NaETHbFTWCnPZKGmzcbVOBHbYsXUJPp17gROQ9ke0fQN
Ae6OV5WINhoS8UnjESWb3qOO87MdQTZ+9mP+NMnVh4kV1SUeMAXLFwFll66KZTkj
GXB330N3p9L0wQVljhXpQ/YPOd76wJNPhJWJ9+hKLFbWsedovzlHb+duprh1z1xe
7LLaq9dEbXxe1Uz0qmK76lupXxilYMyUupTW9HIYtIsY8br8DIoBOG0bn46LVnuL
/m+UQNfUFCYYePT7iZQNNc1DISQJrxme3bjq0PJzZTDukNnHJVahnj9x4RoNaF8j
Dc+JME0r2i8Ccp28vgmaRgzvSsb8Xtw5icwRdwzIpyt1ubs/+tkd/GSaGzQo30Q8
m8y1WOjovHNX7OGnOaOWBGoQAX/2k/VHeAediMsPqWUoOxwsLHYxG/4KtgwbJ5vc
gu/Fyk1GRDklZPpLdYFVvz8TGnqSDogJgF+7WolJ6YvPGAUIDAfd5Ky2sWayddlm
wchc3sKDVyh3lov23h0WQVTvLO9xl+NZ6THxoAGdYeQ0DUu5OxwH8qje/UpWuo1a
DchhNN+g5pa6n56Z
=sLxb
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Fix false-positive build warnings for ARCH=ia64 builds
- Optimize dictionary size for module compression with xz
- Check the compiler and linker versions in Kconfig
- Fix misuse of extra-y
- Support DWARF v5 debug info
- Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
exceeded the limit
- Add generic syscall{tbl,hdr}.sh for cleanups across arches
- Minor cleanups of genksyms
- Minor cleanups of Kconfig
* tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits)
initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD
kbuild: remove deprecated 'always' and 'hostprogs-y/m'
kbuild: parse C= and M= before changing the working directory
kbuild: reuse this-makefile to define abs_srctree
kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig
kconfig: omit --oldaskconfig option for 'make config'
kconfig: fix 'invalid option' for help option
kconfig: remove dead code in conf_askvalue()
kconfig: clean up nested if-conditionals in check_conf()
kconfig: Remove duplicate call to sym_get_string_value()
Makefile: Remove # characters from compiler string
Makefile: reuse CC_VERSION_TEXT
kbuild: check the minimum linker version in Kconfig
kbuild: remove ld-version macro
scripts: add generic syscallhdr.sh
scripts: add generic syscalltbl.sh
arch: syscalls: remove $(srctree)/ prefix from syscall tables
arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work
gen_compile_commands: prune some directories
kbuild: simplify access to the kernel's version
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc
ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f
4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg=
=yPaw
-----END PGP SIGNATURE-----
Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull idmapped mounts from Christian Brauner:
"This introduces idmapped mounts which has been in the making for some
time. Simply put, different mounts can expose the same file or
directory with different ownership. This initial implementation comes
with ports for fat, ext4 and with Christoph's port for xfs with more
filesystems being actively worked on by independent people and
maintainers.
Idmapping mounts handle a wide range of long standing use-cases. Here
are just a few:
- Idmapped mounts make it possible to easily share files between
multiple users or multiple machines especially in complex
scenarios. For example, idmapped mounts will be used in the
implementation of portable home directories in
systemd-homed.service(8) where they allow users to move their home
directory to an external storage device and use it on multiple
computers where they are assigned different uids and gids. This
effectively makes it possible to assign random uids and gids at
login time.
- It is possible to share files from the host with unprivileged
containers without having to change ownership permanently through
chown(2).
- It is possible to idmap a container's rootfs and without having to
mangle every file. For example, Chromebooks use it to share the
user's Download folder with their unprivileged containers in their
Linux subsystem.
- It is possible to share files between containers with
non-overlapping idmappings.
- Filesystem that lack a proper concept of ownership such as fat can
use idmapped mounts to implement discretionary access (DAC)
permission checking.
- They allow users to efficiently changing ownership on a per-mount
basis without having to (recursively) chown(2) all files. In
contrast to chown (2) changing ownership of large sets of files is
instantenous with idmapped mounts. This is especially useful when
ownership of a whole root filesystem of a virtual machine or
container is changed. With idmapped mounts a single syscall
mount_setattr syscall will be sufficient to change the ownership of
all files.
- Idmapped mounts always take the current ownership into account as
idmappings specify what a given uid or gid is supposed to be mapped
to. This contrasts with the chown(2) syscall which cannot by itself
take the current ownership of the files it changes into account. It
simply changes the ownership to the specified uid and gid. This is
especially problematic when recursively chown(2)ing a large set of
files which is commong with the aforementioned portable home
directory and container and vm scenario.
- Idmapped mounts allow to change ownership locally, restricting it
to specific mounts, and temporarily as the ownership changes only
apply as long as the mount exists.
Several userspace projects have either already put up patches and
pull-requests for this feature or will do so should you decide to pull
this:
- systemd: In a wide variety of scenarios but especially right away
in their implementation of portable home directories.
https://systemd.io/HOME_DIRECTORY/
- container runtimes: containerd, runC, LXD:To share data between
host and unprivileged containers, unprivileged and privileged
containers, etc. The pull request for idmapped mounts support in
containerd, the default Kubernetes runtime is already up for quite
a while now: https://github.com/containerd/containerd/pull/4734
- The virtio-fs developers and several users have expressed interest
in using this feature with virtual machines once virtio-fs is
ported.
- ChromeOS: Sharing host-directories with unprivileged containers.
I've tightly synced with all those projects and all of those listed
here have also expressed their need/desire for this feature on the
mailing list. For more info on how people use this there's a bunch of
talks about this too. Here's just two recent ones:
https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdfhttps://fosdem.org/2021/schedule/event/containers_idmap/
This comes with an extensive xfstests suite covering both ext4 and
xfs:
https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts
It covers truncation, creation, opening, xattrs, vfscaps, setid
execution, setgid inheritance and more both with idmapped and
non-idmapped mounts. It already helped to discover an unrelated xfs
setgid inheritance bug which has since been fixed in mainline. It will
be sent for inclusion with the xfstests project should you decide to
merge this.
In order to support per-mount idmappings vfsmounts are marked with
user namespaces. The idmapping of the user namespace will be used to
map the ids of vfs objects when they are accessed through that mount.
By default all vfsmounts are marked with the initial user namespace.
The initial user namespace is used to indicate that a mount is not
idmapped. All operations behave as before and this is verified in the
testsuite.
Based on prior discussions we want to attach the whole user namespace
and not just a dedicated idmapping struct. This allows us to reuse all
the helpers that already exist for dealing with idmappings instead of
introducing a whole new range of helpers. In addition, if we decide in
the future that we are confident enough to enable unprivileged users
to setup idmapped mounts the permission checking can take into account
whether the caller is privileged in the user namespace the mount is
currently marked with.
The user namespace the mount will be marked with can be specified by
passing a file descriptor refering to the user namespace as an
argument to the new mount_setattr() syscall together with the new
MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
of extensibility.
The following conditions must be met in order to create an idmapped
mount:
- The caller must currently have the CAP_SYS_ADMIN capability in the
user namespace the underlying filesystem has been mounted in.
- The underlying filesystem must support idmapped mounts.
- The mount must not already be idmapped. This also implies that the
idmapping of a mount cannot be altered once it has been idmapped.
- The mount must be a detached/anonymous mount, i.e. it must have
been created by calling open_tree() with the OPEN_TREE_CLONE flag
and it must not already have been visible in the filesystem.
The last two points guarantee easier semantics for userspace and the
kernel and make the implementation significantly simpler.
By default vfsmounts are marked with the initial user namespace and no
behavioral or performance changes are observed.
The manpage with a detailed description can be found here:
1d7b902e28
In order to support idmapped mounts, filesystems need to be changed
and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
patches to convert individual filesystem are not very large or
complicated overall as can be seen from the included fat, ext4, and
xfs ports. Patches for other filesystems are actively worked on and
will be sent out separately. The xfstestsuite can be used to verify
that port has been done correctly.
The mount_setattr() syscall is motivated independent of the idmapped
mounts patches and it's been around since July 2019. One of the most
valuable features of the new mount api is the ability to perform
mounts based on file descriptors only.
Together with the lookup restrictions available in the openat2()
RESOLVE_* flag namespace which we added in v5.6 this is the first time
we are close to hardened and race-free (e.g. symlinks) mounting and
path resolution.
While userspace has started porting to the new mount api to mount
proper filesystems and create new bind-mounts it is currently not
possible to change mount options of an already existing bind mount in
the new mount api since the mount_setattr() syscall is missing.
With the addition of the mount_setattr() syscall we remove this last
restriction and userspace can now fully port to the new mount api,
covering every use-case the old mount api could. We also add the
crucial ability to recursively change mount options for a whole mount
tree, both removing and adding mount options at the same time. This
syscall has been requested multiple times by various people and
projects.
There is a simple tool available at
https://github.com/brauner/mount-idmapped
that allows to create idmapped mounts so people can play with this
patch series. I'll add support for the regular mount binary should you
decide to pull this in the following weeks:
Here's an example to a simple idmapped mount of another user's home
directory:
u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt
u1001@f2-vm:/$ ls -al /home/ubuntu/
total 28
drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
drwxr-xr-x 4 root root 4096 Oct 28 04:00 ..
-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
-rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ ls -al /mnt/
total 28
drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 .
drwxr-xr-x 29 root root 4096 Oct 28 22:01 ..
-rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile
-rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ touch /mnt/my-file
u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
u1001@f2-vm:/$ ls -al /mnt/my-file
-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
u1001@f2-vm:/$ getfacl /mnt/my-file
getfacl: Removing leading '/' from absolute path names
# file: mnt/my-file
# owner: u1001
# group: u1001
user::rw-
user:u1001:rwx
group::rw-
mask::rwx
other::r--
u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
getfacl: Removing leading '/' from absolute path names
# file: home/ubuntu/my-file
# owner: ubuntu
# group: ubuntu
user::rw-
user:ubuntu:rwx
group::rw-
mask::rwx
other::r--"
* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
xfs: support idmapped mounts
ext4: support idmapped mounts
fat: handle idmapped mounts
tests: add mount_setattr() selftests
fs: introduce MOUNT_ATTR_IDMAP
fs: add mount_setattr()
fs: add attr_flags_to_mnt_flags helper
fs: split out functions to hold writers
namespace: only take read lock in do_reconfigure_mnt()
mount: make {lock,unlock}_mount_hash() static
namespace: take lock_mount_hash() directly when changing flags
nfs: do not export idmapped mounts
overlayfs: do not mount on top of idmapped mounts
ecryptfs: do not mount on top of idmapped mounts
ima: handle idmapped mounts
apparmor: handle idmapped mounts
fs: make helpers idmap mount aware
exec: handle idmapped mounts
would_dump: handle idmapped mounts
...
The linux-mips mailing list now lives at kernel.org. Update all references
in the kernel tree.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Huacai Chen <chenhuacai@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
PF_IO_WORKER are kernel threads too, but they aren't PF_KTHREAD in the
sense that we don't assign ->set_child_tid with our own structure. Just
ensure that every arch sets up the PF_IO_WORKER threads like kthreads
in the arch implementation of copy_thread().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The 'syscall' variables are not directly used in the commands.
Remove the $(srctree)/ prefix because we can rely on VPATH.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The rules in these Makefiles cannot detect the command line change
because the prerequisite 'FORCE' is missing.
Adding 'FORCE' will result in the headers being rebuilt every time
because the 'targets' additions are also wrong; the file paths in
'targets' must be relative to the current Makefile.
Fix all of them so the if_changed rules work correctly.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
- added support for Realtek RTL83XX SoCs
- kaslr support for Loongson64
- first steps to get rid of set_fs()
- DMA runtime coherent/non-coherent selection cleanup
- cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmAvmm8aHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHCbwQ//aCn2l/GuXvKFuGkbTMSW
tqrnN4WNVpqK96BHlbS1CLzBj1Qzf9znXq4SBuK2ga3Gks/WOrG9vcOhARX3k5C3
dWi5TQzCKChjKmGxUij3mmIxE41L3vpZ2TOKCVVG8M+/0rMsWClNXfU/Bc9B4n4Y
VhZsd3KEjI+SezWT6h1Hw4bmhq2OPTH4CzZMA6Dpq3gZjmNBj1z5SMtLM0XA60dL
jkXYxYeMcPEWOoX69z2Gf1XFRWQNbCfnM3OHHeLeNo9eG4ZQbv4OlZLisXI81r71
0DWe/b/RZM0NdkgfSUM+Yen8KPgj4JcfA3cM6yKZClmF0IvrvvC4LvEBmCSoSfId
uQvPAwEoCFm0iuGhcL7XHCxL8QUKelrOWgzRzeMiVfX6XdSwW9evytjqQ5hYl5ov
lwIfmuK6Zc/c9mGLzbYG4b73eW1Kwhb9g+wvJRK44rFHZh5ztoYPgoB5Y+ECo9zO
nIfc9FjeyMIjLJEKSybYf8BZlyLUJPprUBLx0xHdL4cXCb62Im947F4d6uTwDyNI
oprIptQBMcJUwxSdIyreH5KyuV0Kyb20akmUB0wo6lx1+ilAQ0UsP9zTIkM4ihEN
Lu85RdX973iIJ9M9fS00LLOPn9Osu5QSMw0LcSHTr7Eme83WrfGY3juxf61SJcE4
ZxYki79OFzK8gFxEjstFqpY=
=kQfQ
-----END PGP SIGNATURE-----
Merge tag 'mips_5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- added support for Nintendo N64
- added support for Realtek RTL83XX SoCs
- kaslr support for Loongson64
- first steps to get rid of set_fs()
- DMA runtime coherent/non-coherent selection cleanup
- cleanups and fixes
* tag 'mips_5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (98 commits)
Revert "MIPS: Add basic support for ptrace single step"
vmlinux.lds.h: catch more UBSAN symbols into .data
MIPS: kernel: Drop kgdb_call_nmi_hook
MAINTAINERS: Add git tree for KVM/mips
MIPS: Use common way to parse elfcorehdr
MIPS: Simplify EVA cache handling
Revert "MIPS: kernel: {ftrace,kgdb}: Set correct address limit for cache flushes"
MIPS: remove CONFIG_DMA_PERDEV_COHERENT
MIPS: remove CONFIG_DMA_MAYBE_COHERENT
driver core: lift dma_default_coherent into common code
MIPS: refactor the runtime coherent vs noncoherent DMA indicators
MIPS/alchemy: factor out the DMA coherent setup
MIPS/malta: simplify plat_setup_iocoherency
MIPS: Add basic support for ptrace single step
MAINTAINERS: replace non-matching patterns for loongson{2,3}
MIPS: Make check condition for SDBBP consistent with EJTAG spec
mips: Replace lkml.org links with lore
Revert "MIPS: microMIPS: Fix the judgment of mm_jr16_op and mm_jalr_op"
MIPS: crash_dump.c: Simplify copy_oldmem_page()
Revert "mips: Manually call fdt_init_reserved_mem() method"
...
Pull ELF compat updates from Al Viro:
"Sanitizing ELF compat support, especially for triarch architectures:
- X32 handling cleaned up
- MIPS64 uses compat_binfmt_elf.c both for O32 and N32 now
- Kconfig side of things regularized
Eventually I hope to have compat_binfmt_elf.c killed, with both native
and compat built from fs/binfmt_elf.c, with -DELF_BITS={64,32} passed
by kbuild, but that's a separate story - not included here"
* 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
get rid of COMPAT_ELF_EXEC_PAGESIZE
compat_binfmt_elf: don't bother with undef of ELF_ARCH
Kconfig: regularize selection of CONFIG_BINFMT_ELF
mips compat: switch to compat_binfmt_elf.c
mips: don't bother with ELF_CORE_EFLAGS
mips compat: don't bother with ELF_ET_DYN_BASE
mips: KVM_GUEST makes no sense for 64bit builds...
mips: kill unused definitions in binfmt_elf[on]32.c
mips binfmt_elf*32.c: use elfcore-compat.h
x32: make X32, !IA32_EMULATION setups able to execute x32 binaries
[amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly
elf_prstatus: collect the common part (everything before pr_reg) into a struct
binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID
This reverts commit 7c86ff9925.
There are too many special cases for MIPS not covered by this patch.
In the end it might be better to implement single stepping in userland
than emulating it in the kernel.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
With the removal of set_fs() calls kgdb_call_nmi_hook() is now the same as
the default implementation, so we can remove it.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
"elfcorehdr" can be parsed at kernel/crash_dump.c
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
This reverts commit 6ebda44f36.
All icache flushes in this code paths are done via flush_icache_range(),
which only uses normal cache instruction. And this is the correct thing
for EVA mode, too. So no need to do set_fs(KERNEL_DS) here.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CONFIG_DMA_MAYBE_COHERENT just guards two early init options now. Just
enable them unconditionally for CONFIG_DMA_NONCOHERENT.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Lift the dma_default_coherent variable from the mips architecture code
to the driver core. This allows an architecture to sdefault all device
to be DMA coherent at run time, even if the kernel is build with support
for DMA noncoherent device. By allowing device_initialize to set the
->dma_coherent field to this default the amount of arch hooks required
for this behavior can be greatly reduced.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Replace the global coherentio enum, and the hw_coherentio (fake) boolean
variables with a single boolean dma_default_coherent flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In the current code, arch_has_single_step() is not defined on MIPS,
that means MIPS does not support instruction single-step for user mode.
Delve is a debugger for the Go programming language, the ptrace syscall
PtraceSingleStep() failed [1] on MIPS and then the single step function
can not work well, we can see that PtraceSingleStep() definition returns
ptrace(PTRACE_SINGLESTEP) [2].
So it is necessary to support ptrace single step on MIPS.
At the beginning, we try to use the Debug Single Step exception on the
Loongson 3A4000 platform, but it has no effect when set CP0_DEBUG SSt
bit, this is because CP0_DEBUG NoSSt bit is 1 which indicates no
single-step feature available [3], so this way which is dependent on the
hardware is almost impossible.
With further research, we find out there exists a common way used with
break instruction in arch/alpha/kernel/ptrace.c, it is workable.
For the above analysis, define arch_has_single_step(), add the common
function user_enable_single_step() and user_disable_single_step(), set
flag TIF_SINGLESTEP for child process, use break instruction to set
breakpoint.
We can use the following testcase to test it:
tools/testing/selftests/breakpoints/step_after_suspend_test.c
$ make -C tools/testing/selftests TARGETS=breakpoints
$ cd tools/testing/selftests/breakpoints
Without this patch:
$ ./step_after_suspend_test -n
TAP version 13
1..4
# ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error
ok 1 # SKIP CPU 0
# ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error
ok 2 # SKIP CPU 1
# ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error
ok 3 # SKIP CPU 2
# ptrace(PTRACE_SINGLESTEP) not supported on this architecture: Input/output error
ok 4 # SKIP CPU 3
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:4 error:0
With this patch:
$ ./step_after_suspend_test -n
TAP version 13
1..4
ok 1 CPU 0
ok 2 CPU 1
ok 3 CPU 2
ok 4 CPU 3
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
[1] https://github.com/go-delve/delve/blob/master/pkg/proc/native/threads_linux.go#L50
[2] https://github.com/go-delve/delve/blob/master/vendor/golang.org/x/sys/unix/syscall_linux.go#L1573
[3] http://www.t-es-t.hu/download/mips/md00047f.pdf
Reported-by: Guoqi Chen <chenguoqi@loongson.cn>
Signed-off-by: Xingxing Su <suxingxing@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
According to MIPS EJTAG Specification [1], a Debug Breakpoint
exception occurs when an SDBBP instruction is executed, the
CP0_DEBUG bit DBp indicates that a Debug Breakpoint exception
occurred.
When I read the original code, it looks a little confusing
at first glance, just check bit DBp for SDBBP to make the
code more readable, it will be much easier to understand.
[1] http://www.t-es-t.hu/download/mips/md00047f.pdf
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Replace kmap_atomic_pfn() with kmap_local_pfn() which is preemptible and
can take page faults.
Remove the indirection of the dump page and the related cruft which is not
longer required.
Remove unused or redundant header files.
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
This reverts commit 3751cbda8f.
Originally the patch was created to fix the reserved-memory DT-node
parsing failure on the early stages of the platform memory initialization.
That happened due to the two early memory allocators utilization that
time: bootmem and memblock. At first the platform-specific memory mapping
array was initialized. Then the early_init_fdt_scan_reserved_mem() was
called, which couldn't fully parse the "reserved-memory" DT-node since
neither memblock nor bootmem allocators hadn't been initialized at that
stage, so the fdt_init_reserved_mem() method failed on the memory
allocation calls. Only after that the platform-specific memory mapping
were used to create proper bootmem and memblock structures and let the
early memory allocations work. That's why we had to call the
fdt_init_reserved_mem() method one more time to retry the initialization
of the features like CMA.
The necessity to have that fix was disappeared after the full memblock
support had been added to the MIPS kernel and all plat_mem_setup() had
been fixed to add the memory regions right into the memblock memory pool.
Let's revert that patch then especially after having Paul reported that
the second fdt_init_reserved_mem() call causes the reserved memory pool
being created twice bigger than implied.
Fixes: a94e4f24ec ("MIPS: init: Drop boot_mem_map")
Reported-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The type of the VMLINUX_LOAD_ADDRESS macro is the (unsigned long long)
in 32bits kernel but (unsigned long) in the 64-bit kernel. Although there
is no error here, avoid using it to calculate kaslr_offset.
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Provide kaslr_offset() to get the kernel offset when KASLR is enabled.
Error may occur before update_kaslr_offset(), so put it at the end of
the offset branch.
Fixes: a307a4ce9e ("MIPS: Loongson64: Add KASLR support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Add perf_event_mips_regs/perf_reg_value/perf_reg_validate to support
features HAVE_PERF_REGS/HAVE_PERF_USER_STACK_DUMP in kernel.
[ayan@wavecomp.com: Repick this patch for unwinding userstack backtrace
by perf and libunwind on MIPS based CPU.]
[ralf@linux-mips.org: Add perf_get_regs_user() which is required after
'commit 88a7c26af8 ("perf: Move task_pt_regs sampling into arch code")'.]
[yangtiezhu@loongson.cn: Fix build error about perf_get_regs_user() after
commit 76a4efa809 ("perf/arch: Remove perf_sample_data::regs_user_copy"),
and also separate the original patches into two parts (MIPS kernel and perf
tools) to merge easily.]
The original patches:
https://lore.kernel.org/patchwork/patch/1126521/https://lore.kernel.org/patchwork/patch/1126520/
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Archer Yan <ayan@wavecomp.com>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Selection of the DTB to be used was burried in more or less readable
code in head.S. Move this code into a inline helper function and
use it.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
I couldn't find any user of the dubious vpe_getcwd so far. So remove it and
get rid of another set_fs(KERNEL_DS).
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
kernel test robot throws below warning ->
arch/mips/kernel/cacheinfo.c:112:3: warning: Variable 'level' is
modified but its new value is never used. [unreadVariable]
Remove unnecessary increment of level at the end.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
For those leaf functions, they are likely to have no stack operations.
Add is_jr_ra_ins() to determine whether jr ra has been touched before
the frame_size is found. Without this patch, the get frame_size operation
may be out of range and get the frame_size from the next nested function.
There is no POOL32A format in uapi/asm/inst.h, so some bits here use the
format of r_format instead.
e.g.
---------------------------------------------------------------------
| format | 31:26 | 25:21 | 20:16 | 15:6 | 5:0 |
-----------------+---------+-------+-------+------------+------------
| pool32a_format | pool32a | rt | rs | jalrc | pool32axf |
-----------------+---------+-------+-------+------------+------------
| r_format | opcode | rs | rt | rd:5, re:5 | func |
---------------------------------------------------------------------
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
[1]: Commit b6c7a324df ("MIPS: Fix get_frame_info() handling of
microMIPS function size")
[2]: Commit 2b424cfc69 ("MIPS: Remove function size check in
get_frame_info()")
First patch added a constant to check the number of iterations against.
Second patch fixed the situation that info->func_size is zero.
However, func_size member became useless after the second commit. Without
ip_end, the get frame_size operation may be out of range although KALLSYMS
enabled. Thus, check func_size first. Then make ip_end be the sum of ip
and a constant (512) if func_size is equal to 0. Otherwise make ip_end be
the sum of ip and func_size.
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
mm16_r5_format.rt is 5 bits, so directly judge the value if equal or not.
mm_jalr_op requires 7th to 16th bits. These 10 which bits generated by
shifting u_format.uimmediate by 6 may be affected by sign extension.
Thus, take out the 10 bits for comparison.
Without this patch, errors may occur, such as these bits are all ones.
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Some headers are not necessary, remove them and sort includes.
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Reviewed-by: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
This implements the missing mount_setattr() syscall. While the new mount
api allows to change the properties of a superblock there is currently
no way to change the properties of a mount or a mount tree using file
descriptors which the new mount api is based on. In addition the old
mount api has the restriction that mount options cannot be applied
recursively. This hasn't changed since changing mount options on a
per-mount basis was implemented in [1] and has been a frequent request
not just for convenience but also for security reasons. The legacy
mount syscall is unable to accommodate this behavior without introducing
a whole new set of flags because MS_REC | MS_REMOUNT | MS_BIND |
MS_RDONLY | MS_NOEXEC | [...] only apply the mount option to the topmost
mount. Changing MS_REC to apply to the whole mount tree would mean
introducing a significant uapi change and would likely cause significant
regressions.
The new mount_setattr() syscall allows to recursively clear and set
mount options in one shot. Multiple calls to change mount options
requesting the same changes are idempotent:
int mount_setattr(int dfd, const char *path, unsigned flags,
struct mount_attr *uattr, size_t usize);
Flags to modify path resolution behavior are specified in the @flags
argument. Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW,
and AT_NO_AUTOMOUNT are supported. If useful, additional lookup flags to
restrict path resolution as introduced with openat2() might be supported
in the future.
The mount_setattr() syscall can be expected to grow over time and is
designed with extensibility in mind. It follows the extensible syscall
pattern we have used with other syscalls such as openat2(), clone3(),
sched_{set,get}attr(), and others.
The set of mount options is passed in the uapi struct mount_attr which
currently has the following layout:
struct mount_attr {
__u64 attr_set;
__u64 attr_clr;
__u64 propagation;
__u64 userns_fd;
};
The @attr_set and @attr_clr members are used to clear and set mount
options. This way a user can e.g. request that a set of flags is to be
raised such as turning mounts readonly by raising MOUNT_ATTR_RDONLY in
@attr_set while at the same time requesting that another set of flags is
to be lowered such as removing noexec from a mount tree by specifying
MOUNT_ATTR_NOEXEC in @attr_clr.
Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0,
not a bitmap, users wanting to transition to a different atime setting
cannot simply specify the atime setting in @attr_set, but must also
specify MOUNT_ATTR__ATIME in the @attr_clr field. So we ensure that
MOUNT_ATTR__ATIME can't be partially set in @attr_clr and that @attr_set
can't have any atime bits set if MOUNT_ATTR__ATIME isn't set in
@attr_clr.
The @propagation field lets callers specify the propagation type of a
mount tree. Propagation is a single property that has four different
settings and as such is not really a flag argument but an enum.
Specifically, it would be unclear what setting and clearing propagation
settings in combination would amount to. The legacy mount() syscall thus
forbids the combination of multiple propagation settings too. The goal
is to keep the semantics of mount propagation somewhat simple as they
are overly complex as it is.
The @userns_fd field lets user specify a user namespace whose idmapping
becomes the idmapping of the mount. This is implemented and explained in
detail in the next patch.
[1]: commit 2e4b7fcd92 ("[PATCH] r/o bind mounts: honor mount writer counts at remount")
Link: https://lore.kernel.org/r/20210121131959.646623-35-christian.brauner@ubuntu.com
Cc: David Howells <dhowells@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This reverts commit f9065b54d4.
We're adding Nintendo 64 support, so the VR4300 is no longer unused.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
For now, vmlinux relocation functions for relocatable kernel are
implemented as an array of handlers of a particular type.
Convert that array into a single switch-case function to:
- remove unused arguments;
- change the return type of simple handlers to void;
- remove the array and don't use any data at all;
- avoid using indirect calls;
- allow the compiler to inline and greatly optimize
the relocation function[s];
and also mark do_relocations() and show_kernel_relocation() static
as they aren't used anywhere else.
The result on MIPS32 R2 with GCC 10.2 -O2 is:
scripts/bloat-o-meter -c arch/mips/kernel/__relocate.o arch/mips/kernel/relocate.o
add/remove: 0/6 grow/shrink: 1/0 up/down: 356/-640 (-284)
Function old new delta
relocate_kernel 852 1208 +356
apply_r_mips_32_rel 20 - -20
apply_r_mips_hi16_rel 40 - -40
apply_r_mips_64_rel 44 - -44
apply_r_mips_26_rel 144 - -144
show_kernel_relocation 164 - -164
do_relocations 228 - -228
Total: Before=1780, After=1496, chg -15.96%
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-76 (-76)
Data old new delta
reloc_handlers_rel 76 - -76
Total: Before=92, After=16, chg -82.61%
add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
RO Data old new delta
Total: Before=0, After=0, chg +0.00%
All functions were collapsed into the main one, relocate_kernel().
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
For now, module relocation functions are implemented as an array
of handlers of type reloc_handler_t.
Convert that array into a single switch-case function to:
- remove unused arguments;
- change the return type of simple handlers to void;
- remove the array and don't use any data at all;
- avoid using indirect calls;
- allow the compiler to inline and greatly optimize
the relocation function[s].
The result on MIPS32 R2 with GCC 10.2 -O2 is:
scripts/bloat-o-meter -c arch/mips/kernel/__module.o arch/mips/kernel/module.o
add/remove: 1/11 grow/shrink: 1/0 up/down: 876/-1436 (-560)
Function old new delta
apply_relocate 456 1148 +692
apply_r_mips_pc - 184 +184
apply_r_mips_none 8 - -8
apply_r_mips_32 16 - -16
apply_r_mips_64 76 - -76
apply_r_mips_highest 88 - -88
apply_r_mips_higher 108 - -108
apply_r_mips_26 132 - -132
apply_r_mips_pc26 160 - -160
apply_r_mips_pc21 160 - -160
apply_r_mips_pc16 160 - -160
apply_r_mips_hi16 172 - -172
apply_r_mips_lo16 356 - -356
Total: Before=2608, After=2048, chg -21.47%
add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
Data old new delta
Total: Before=12, After=12, chg +0.00%
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-248 (-248)
RO Data old new delta
reloc_handlers 248 - -248
Total: Before=248, After=0, chg -100.00%
All functions were collapsed into a single one that is called
directly by $(srctree)/kernel/module.c.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The JZ4760 has the HPTLB as well, but has a XBurst CPU with a D0 CPUID.
Disable the HPTLB for all XBurst CPUs with a D0 CPUID. In the case where
there is no HPTLB (e.g. for older SoCs), this won't have any side
effect.
Fixes: b02efeb056 ("MIPS: Ingenic: Disable abandoned HPTLB function.")
Cc: <stable@vger.kernel.org> # 5.4
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
LLVM stack generates GOT table when building the kernel:
ld.lld: warning: <internal>:(.got) is being placed in '.got'
According to the debug assertions, it's not zero-sized and thus can't
be handled the way it's done for x86.
Also use the ARM64 path here and place it at the end of .text section.
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
According to linker warnings, both GCC and LLVM generate '.rel.dyn'
symbols:
mips-alpine-linux-musl-ld: warning: orphan section `.rel.dyn'
from `init/main.o' being placed in section `.rel.dyn'
Link-time assertion shows that this section is sometimes empty,
sometimes not, depending on machine bitness and the compiler [0]:
LD .tmp_vmlinux.kallsyms1
mips64-linux-gnu-ld: Unexpected run-time relocations (.rel) detected!
Just use the ARM64 approach and declare it in vmlinux.lds.S closer
to __init_end.
[0] https://lore.kernel.org/linux-mips/20210109111259.GA4213@alpha.franken.de
Reported-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Commit 866b6a89c6 ("MIPS: Add DWARF unwinding to assembly") added
-fno-asynchronous-unwind-tables to KBUILD_CFLAGS to prevent compiler
from emitting .eh_frame symbols.
However, as MIPS heavily uses CFI, that's not enough. Use the
approach taken for x86 (as it also uses CFI) and explicitly put CFI
symbols into the .debug_frame section (except for VDSO).
This allows us to drop .eh_frame from DISCARDS as it's no longer
being generated.
Fixes: 866b6a89c6 ("MIPS: Add DWARF unwinding to assembly")
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Discard GNU attributes (MIPS FP type, GNU Hash etc.) at link time
as kernel doesn't use it at all.
Solves a dozen of the following ld warnings (one per every file):
mips-alpine-linux-musl-ld: warning: orphan section `.gnu.attributes'
from `arch/mips/kernel/head.o' being placed in section
`.gnu.attributes'
mips-alpine-linux-musl-ld: warning: orphan section `.gnu.attributes'
from `init/main.o' being placed in section `.gnu.attributes'
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
A number of symbols from arch/mips/kernel/cps-vec.S is explicitly
placed into '.text.cps-vec' section.
There are no direct references to this section, so there's no need
to form it. '.balign 0x1000' directive will work anyway.
Moreover, this section was being placed in vmlinux differently
depending on CONFIG_LD_DEAD_CODE_DATA_ELIMINATION:
- with this option enabled, '.text.cps-vec' was being caught
by '.text.[0-9a-zA-Z_]*' from include/asm-generic/vmlinux.lds.h;
- without this option, '.text.cps-vec' was being caught
by discouraging '.text.*' from arch/mips/kernel/vmlinux.lds.S.
'.text.*' should not be used in vmlinux linker scripts at all as it
silently catches any orphan text sections.
So, remove both '.section .text.cps-vec' and '.text.*' from cps-vec.S
and vmlinux.lds.S respectively. As said, this does not affect related
functions alignment:
80116000 T mips_cps_core_entry
80116028 t not_nmi
80116200 T excep_tlbfill
80116280 T excep_xtlbfill
80116300 T excep_cache
80116380 T excep_genex
80116400 T excep_intex
80116480 T excep_ejtag
80116490 T mips_cps_core_init
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
MIPS uses its own declaration of rwdata, and thus it should be kept
in sync with the asm-generic one. Currently PAGE_ALIGNED_DATA() is
missing from the linker script, which emits the following ld
warnings:
mips-alpine-linux-musl-ld: warning: orphan section
`.data..page_aligned' from `arch/mips/kernel/vdso.o' being placed
in section `.data..page_aligned'
mips-alpine-linux-musl-ld: warning: orphan section
`.data..page_aligned' from `arch/mips/vdso/vdso-image.o' being placed
in section `.data..page_aligned'
Add the necessary declaration, so the mentioned structures will be
placed in vmlinux as intended:
ffffffff80630580 D __end_once
ffffffff80630580 D __start___dyndbg
ffffffff80630580 D __start_once
ffffffff80630580 D __stop___dyndbg
ffffffff80634000 d mips_vdso_data
ffffffff80638000 d vdso_data
ffffffff80638580 D _gp
ffffffff8063c000 T __init_begin
ffffffff8063c000 D _edata
ffffffff8063c000 T _sinittext
->
ffffffff805a4000 D __end_init_task
ffffffff805a4000 D __nosave_begin
ffffffff805a4000 D __nosave_end
ffffffff805a4000 d mips_vdso_data
ffffffff805a8000 d vdso_data
ffffffff805ac000 D mmlist_lock
ffffffff805ac080 D tasklist_lock
Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: stable@vger.kernel.org # 4.4+
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
LLVM-built Linux triggered a boot hangup with KASLR enabled.
arch/mips/kernel/relocate.c:get_random_boot() uses linux_banner,
which is a string constant, as a random seed, but accesses it
as an array of unsigned long (in rotate_xor()).
When the address of linux_banner is not aligned to sizeof(long),
such access emits unaligned access exception and hangs the kernel.
Use PTR_ALIGN() to align input address to sizeof(long) and also
align down the input length to prevent possible access-beyond-end.
Fixes: 405bc8fd12 ("MIPS: Kernel: Implement KASLR using CONFIG_RELOCATABLE")
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
unistd_nr_{n32,n64,o32}.h are needed only by include/asm/unistd.h,
which is a kernel-side header file, and their contents is generally
not for userland use.
Move their target destination from include/generated/uapi/asm/ to
include/generated/asm/ to disable exporting them as UAPI headers.
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Like amd64, mips has two 32bit ABIs - o32 and n32. Unlike amd64,
it does not use compat_binfmt_elf.c for either of those; each
of those ABIs has a binfmt handler of its own, both very similar
to fs/compat_binfmt_elf.c. And the same technics as we use on
amd64 can be used to make fs/compat_binfmt_elf.c handle both.
* merge elfo32_check_arch() with elfn32_check_arch(),
make that serve as compat_elf_check_arch(). Note that
SET_PERSONALITY2() is already the same for all ABI variants -
it looks at the elf header to choose the flags to set.
* add asm/elfcore-compat.h, using the bigger (n32) variant
of elf32_prstatus as compat_elf_prstatus there.
* make PRSTATUS_SIZE() and SET_PR_FPVALID() choose the
right layout, same as done for amd64. test_thread_flag(TIF_32BIT_REGS)
is used as the predicate.
Voila - we are rid of binfmt_elf{n,o}32.c; fs/compat_binfmt_elf.c is
used, same as for all other ELF-supporting 64bit architectures that
need 32bit compat.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
normal mips one is just fine - it's only used after we'd done
SET_PERSONALITY2() and by that point TASK_SIZE will yield the
right value
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
elf_caddr_t: unused since 2002
jiffies_to_timeval: unused since 2015
TASK_SIZE: used only downstream of SET_PERSONALITY2(), and after that
point the normal definition results in TASK_SIZE32 just fine.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Preparations to doing i386 compat elf_prstatus sanely - rather than duplicating
the beginning of compat_elf_prstatus, take these fields into a separate
structure (compat_elf_prstatus_common), so that it could be reused. Due to
the incestous relationship between binfmt_elf.c and compat_binfmt_elf.c we
need the same shape change done to native struct elf_prstatus, gathering the
fields prior to pr_reg into a new structure (struct elf_prstatus_common).
Fortunately, offset of pr_reg is always a multiple of 16 with no padding
right before it, so it's possible to turn all the stuff prior to it into
a single member without disturbing the layout.
[build fix from Geert Uytterhoeven folded in]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
RM7000 IRQ driver never got really used by any of the platform,
and rm9k_cpu_irq_init only exist in a header.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
According to Hardware Reference Manual, OCTEON III
are mostly same as previous OCTEON models. So just
enable them and extend supported event code.
0x3e and 0x3f still reserved.
Signed-off-by: Jia Qingtong <jiaqingtong97@163.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Victim Cache is defined by Loongson as per-core unified
private Cache.
Add this into cacheinfo and make cache levels selfincrement
instead of hardcode levels.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012
had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c)
coredumps; unfortunately, compat on mips (which does not go through the
usual compat_binfmt_elf.c) had not been noticed.
As the result, both N32 and O32 coredumps on 64bit mips kernels
have those sections malformed enough to confuse the living hell out of
all gdb and readelf versions (up to and including the tip of binutils-gdb.git).
Longer term solution is to make both O32 and N32 compat use the
regular compat_binfmt_elf.c, but that's too much for backports. The minimal
solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing
those patches have done in fs/compat_binfmt_elf.c
Cc: stable@kernel.org # v3.7+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
After commit 9cce844abf ("MIPS: CPU#0 is not hotpluggable"),
c->hotpluggable is 0 for CPU 0 and it will not generate a control
file in sysfs for this CPU:
[root@linux loongson]# cat /sys/devices/system/cpu/cpu0/online
cat: /sys/devices/system/cpu/cpu0/online: No such file or directory
[root@linux loongson]# echo 0 > /sys/devices/system/cpu/cpu0/online
bash: /sys/devices/system/cpu/cpu0/online: Permission denied
So no need to check CPU 0 in cps_cpu_disable(), just remove it.
Reported-by: liwei (GF) <liwei391@huawei.com>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Commit b0a0c2615f ("epoll: wire up syscall epoll_pwait2") wired up
the 64 bit syscall instead of the compat variant in a couple of places.
Fixes: b0a0c2615f ("epoll: wire up syscall epoll_pwait2")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Split off from prev patch in the series that implements the syscall.
Link: https://lkml.kernel.org/r/20201121144401.3727659-4-willemdebruijn.kernel@gmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/YJxsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjpyEACBdW+YjenjTbkUPeEXzQgkBkTZUYw3g007
DPcUT1g8PQZXYXlQvBKCvGhhIr7/KVcjepKoowiNQfBNGcIPJTVopW58nzpqAfTQ
goI2WYGn5EKFFKBPvtH04cJD/Wo8muXdxynKtqyZbnGGgZjQxPrE259b8dpHjBSR
6L7HHkk0D1oU/5b6h6Ocpg9mc/0iIUCZylySAYY3eGO0JaVPJaXgZSJZYgHxCHll
Lb+/y/fXdtm/0PmQ3ko0ev54g3yEWqZIX0NsZW1asrButIy+KLzQ2Mz1xFLFDMag
prtIfwb8tzgc4dFPY090C/azjCh5CPpxqYS6FkRwS0p86n6OhkyXrqfily5Hs4/B
NC7CBPBSH/j+NKUK7CYZcpTzTpxPjUr9p0anUdlvMJz8FhTb/3YEEZ1UTeWOeHmk
Yo5SxnFghLeZZeZ1ok6rdymnVa7WEX12SCLGQX31BB2mld0tNbKb4b+FsBF6OUMk
IUaX6OjwDFVRaysC88BQ4hjcIP1HxsViG4/VZDX15gjAAH2Pvb+7tev+lcDcOhjz
TCD4GNFspTFzRhh9nT7oxQ679qCh9G9zHbzuIRewnrS6iqvo5SJQB3dR2yrWZRRH
ySkQFiHpYOlnLJYv0jg9COlGwo2FUdcvKhCvkjQKKBz48rzW/IC0LwKdRQWZDFk3
FKGzP/NBig==
=cadT
-----END PGP SIGNATURE-----
Merge tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block
Pull TIF_NOTIFY_SIGNAL updates from Jens Axboe:
"This sits on top of of the core entry/exit and x86 entry branch from
the tip tree, which contains the generic and x86 parts of this work.
Here we convert the rest of the archs to support TIF_NOTIFY_SIGNAL.
With that done, we can get rid of JOBCTL_TASK_WORK from task_work and
signal.c, and also remove a deadlock work-around in io_uring around
knowing that signal based task_work waking is invoked with the sighand
wait queue head lock.
The motivation for this work is to decouple signal notify based
task_work, of which io_uring is a heavy user of, from sighand. The
sighand lock becomes a huge contention point, particularly for
threaded workloads where it's shared between threads. Even outside of
threaded applications it's slower than it needs to be.
Roman Gershman <romger@amazon.com> reported that his networked
workload dropped from 1.6M QPS at 80% CPU to 1.0M QPS at 100% CPU
after io_uring was changed to use TIF_NOTIFY_SIGNAL. The time was all
spent hammering on the sighand lock, showing 57% of the CPU time there
[1].
There are further cleanups possible on top of this. One example is
TIF_PATCH_PENDING, where a patch already exists to use
TIF_NOTIFY_SIGNAL instead. Hopefully this will also lead to more
consolidation, but the work stands on its own as well"
[1] https://github.com/axboe/liburing/issues/215
* tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block: (28 commits)
io_uring: remove 'twa_signal_ok' deadlock work-around
kernel: remove checking for TIF_NOTIFY_SIGNAL
signal: kill JOBCTL_TASK_WORK
io_uring: JOBCTL_TASK_WORK is no longer used by task_work
task_work: remove legacy TWA_SIGNAL path
sparc: add support for TIF_NOTIFY_SIGNAL
riscv: add support for TIF_NOTIFY_SIGNAL
nds32: add support for TIF_NOTIFY_SIGNAL
ia64: add support for TIF_NOTIFY_SIGNAL
h8300: add support for TIF_NOTIFY_SIGNAL
c6x: add support for TIF_NOTIFY_SIGNAL
alpha: add support for TIF_NOTIFY_SIGNAL
xtensa: add support for TIF_NOTIFY_SIGNAL
arm: add support for TIF_NOTIFY_SIGNAL
microblaze: add support for TIF_NOTIFY_SIGNAL
hexagon: add support for TIF_NOTIFY_SIGNAL
csky: add support for TIF_NOTIFY_SIGNAL
openrisc: add support for TIF_NOTIFY_SIGNAL
sh: add support for TIF_NOTIFY_SIGNAL
um: add support for TIF_NOTIFY_SIGNAL
...
- enabled GCOV
- reworked setup of protection map
- added support for more MSCC platforms
- added sysfs boardinfo for Loongson64
- enabled KASLR for Loogson64
- added reset controller for BCM63xx
- cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAl/Z6RIaHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHBYtw//bKd5RI/wRgkejish0zX7
Qt6tcnPAZRPFopq/VyepNekgleM5uR8wKJ2YzBYDvyjGL2l217WTvqZG6lTLhLI6
3YJ2xDEi7g5FmecZNF4648iGJBVLu78om2SF5HtXeJaNR1L2ofnCvP+La2y5194M
4J1dxR+V8TZ61ufTEqn3KT+cUet+EAwLsPKqdk6ThlqkDfItjbVUkWG/Hc0hMu2C
pyvnto0cvNDYZE1PNeOjhitWnSeuGX2KOSF3hYaPPEGh8V6tqtR1JTGKIJlN92iH
LCFC7fLTt/ju12/OAnLbGqM1gpoMP7lsflbQAeNZzaBhtDZ7seq+6iXajimosQMj
Vxd3+ur+3+zd/I03uKk2Tcp9Z/m6wv9tux7Rr6J8rQ/6/VCQK1btXbv30ayyYn04
pOqK0XJigyRjlp8SwDgD2uCg7iYQI10toliIipo4TvErKOVSNl/VsyedryCq64Il
6Rj6vntAp/m5s2I3hKEmCZikWC5YalsfkLFRnzKma3vc8Q8+lwWQoEoFb19V/F3E
KGkjdoZhsfcGH3kHiqBc342G+9va54c8dgOKcfx0CA6NJwhogI+i4tza2ZdRMPxn
Yo7VvuXYISppFYUujW5oT1m5JyPFmVRBCBAyf9kUdPQ5H8QJ0V3gXdtQuUd/NV5J
OuVQljUNXTQRxqRFDIdHfXI=
=0Sht
-----END PGP SIGNATURE-----
Merge tag 'mips_5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- enable GCOV
- rework setup of protection map
- add support for more MSCC platforms
- add sysfs boardinfo for Loongson64
- enable KASLR for Loogson64
- add reset controller for BCM63xx
- cleanups and fixes
* tag 'mips_5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (70 commits)
mips: fix Section mismatch in reference
MAINTAINERS: Add linux-mips mailing list to JZ47xx entries
MAINTAINERS: Remove JZ4780 DMA driver entry
MAINTAINERS: chenhc@lemote.com -> chenhuacai@kernel.org
MIPS: Octeon: irq: Alloc desc before configuring IRQ
MIPS: mm: Add back define for PAGE_SHARED
MIPS: Select ARCH_KEEP_MEMBLOCK if DEBUG_KERNEL to enable sysfs memblock debug
mips: lib: uncached: fix non-standard usage of variable 'sp'
MIPS: DTS: img: Fix schema warnings for pwm-leds
MIPS: KASLR: Avoid endless loop in sync_icache if synci_step is zero
MIPS: Move memblock_dump_all() to the end of setup_arch()
MIPS: SMP-CPS: Add support for irq migration when CPU offline
MIPS: OCTEON: Don't add kernel sections into memblock allocator
MIPS: Don't round up kernel sections size for memblock_add()
MIPS: Enable GCOV
MIPS: configs: drop unused BACKLIGHT_GENERIC option
MIPS: Loongson64: Fix up reserving kernel memory range
MIPS: mm: Remove unused is_aligned_hugepage_range
MIPS: No need to check CPU 0 in {loongson3,bmips,octeon}_cpu_disable()
mips: cdmm: fix use-after-free in mips_cdmm_bus_discover
...
- migrate_disable/enable() support which originates from the RT tree and
is now a prerequisite for the new preemptible kmap_local() API which aims
to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XwK4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoX28D/9cVrvziSQGfBfuQWnUiw8iOIq1QBa2
Me+Tvenhfrlt7xU6rbP9ciFu7eTN+fS06m5uQPGI+t22WuJmHzbmw1bJVXfkvYfI
/QoU+Hg7DkDAn1p7ZKXh0dRkV0nI9ixxSHl0E+Zf1ATBxCUMV2SO85flg6z/4qJq
3VWUye0dmR7/bhtkIjv5rwce9v2JB2g1AbgYXYTW9lHVoUdGoMSdiZAF4tGyHLnx
sJ6DMqQ+k+dmPyYO0z5MTzjW/fXit4n9w2e3z9TvRH/uBu58WSW1RBmQYX6aHBAg
dhT9F4lvTs6lJY23x5RSFWDOv6xAvKF5a0xfb8UZcyH5EoLYrPRvm42a0BbjdeRa
u0z7LbwIlKA+RFdZzFZWz8UvvO0ljyMjmiuqZnZ5dY9Cd80LSBuxrWeQYG0qg6lR
Y2povhhCepEG+q8AXIe2YjHKWKKC1s/l/VY3CNnCzcd21JPQjQ4Z5eWGmHif5IED
CntaeFFhZadR3w02tkX35zFmY3w4soKKrbI4EKWrQwd+cIEQlOSY7dEPI/b5BbYj
MWAb3P4EG9N77AWTNmbhK4nN0brEYb+rBbCA+5dtNBVhHTxAC7OTWElJOC2O66FI
e06dREjvwYtOkRUkUguWwErbIai2gJ2MH0VILV3hHoh64oRk7jjM8PZYnjQkdptQ
Gsq0rJW5iiu/OQ==
=Oz1V
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- migrate_disable/enable() support which originates from the RT tree
and is now a prerequisite for the new preemptible kmap_local() API
which aims to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
sched/fair: Trivial correction of the newidle_balance() comment
sched/fair: Clear SMT siblings after determining the core is not idle
sched: Fix kernel-doc markup
x86: Print ratio freq_max/freq_base used in frequency invariance calculations
x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
x86, sched: Calculate frequency invariance for AMD systems
irq_work: Optimize irq_work_single()
smp: Cleanup smp_call_function*()
irq_work: Cleanup
sched: Limit the amount of NUMA imbalance that can exist at fork time
sched/numa: Allow a floating imbalance between NUMA nodes
sched: Avoid unnecessary calculation of load imbalance at clone time
sched/numa: Rename nr_running and break out the magic number
sched: Make migrate_disable/enable() independent of RT
sched/topology: Condition EAS enablement on FIE support
arm64: Rebuild sched domains on invariance status changes
sched/topology,schedutil: Wrap sched domains rebuild
sched/uclamp: Allow to reset a task uclamp constraint value
sched/core: Fix typos in comments
Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
...
Most platforms do not need to do synci instruction operations when
synci_step is 0. But for example, the synci implementation on Loongson64
platform has some changes. On the one hand, it ensures that the memory
access instructions have been completed. On the other hand, it guarantees
that all prefetch instructions need to be fetched again. And its address
information is useless. Thus, only one synci operation is required when
synci_step is 0 on Loongson64 platform. I guess that some other platforms
have similar implementations on synci, so add judgment conditions in
`while` to ensure that at least all platforms perform synci operations
once. For those platforms that do not need synci, they just do one more
operation similar to nop.
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In order to get more memblock configuration with memblock=debug in the boot
cmdline, move memblock_dump_all() to the end of setup_arch(), this can help
us to get dmi_setup() and resource_init() memblock info, at least for now.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Currently we won't migrate irqs when offline CPUs, which has been
implemented on most architectures. That will lead to some devices work
incorrectly if the bound cores are offline.
While that can be easily supported by enabling GENERIC_IRQ_MIGRATION.
But i don't pretty known the reason it was not supported on all MIPS
platforms.
This patch add the support for irq migration on MIPS CPS platform, and
it's tested on the interAptiv processor.
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Linux doesn't own the memory immediately after the kernel image. On Octeon
bootloader places a shared structure right close after the kernel _end,
refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c.
If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc()
inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns
memory block overlapping with the above octeon_bootinfo structure, which
is being overwritten afterwards.
Fixes: a94e4f24ec ("MIPS: init: Drop boot_mem_map")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
idle path. Similar to the entry path the low level idle functions have to
be non-instrumentable.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/DpAUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoXSLD/9klc0YimnEnROW6Q5Svb2IcyIutmXF
bOIY1bYYoKILOBj3wyvDUhmdMuq5zh7H9yG11hO8MaVVWVQcLcOMLdHTYm9dcdmF
xQk33+xqjuhRShB+nEmC9ayYtWogtH6W6uZ6WDtF9ZltMKU85n5ddGJ/Fvo+HoCb
NbOdHGJdJ3/3ZCeHnxOnxM+5/GwjkBuccTV/tXmb3yXrfU9DBySyQ4/UchcpF43w
LcEb0kiQbpZsBTByKJOQV8+RR654S0sILlvRwVXpmj94vrgGwhlVk1/9rz7tkOhF
ksoo1mTVu75LMt22G/hXxE63787yRvFdHjapf0+kCOAuhl992NK+xlGDH8o9DXcu
9y73D4bI0HnDFs20w6vs20iLvxECJiYHJqlgR5ZwFUToceaNgtiYr8kzuD7Zbae1
KG2E7BuNSwHWMtf97fGn44GZknPEOaKdDn4Wv6/bvKHxLm77qe11RKF70Stcz2AI
am13KmQzzsHGF5qNWwpElRUxSdxfJMR66RnOdTQULGrRedaZTFol/y2pnVzTSe3k
SZnlpL5kE7y92UYDogPb5wWA7b+YkJN0OdSkRFy1FH26ZG8E4M7ZJ2tql5Sw7pGM
lsTjXpAUphnK5rz7QcYE8KAZWj//fIAcElIrvdklVcBnS3IqjfksYW27B64133vx
cT1B/lA1PHXj6Q==
=raED
-----END PGP SIGNATURE-----
Merge tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two more places which invoke tracing from RCU disabled regions in the
idle path.
Similar to the entry path the low level idle functions have to be
non-instrumentable"
* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Fix intel_idle() vs tracing
sched/idle: Fix arch_cpu_idle() vs tracing
After commit 9cce844abf ("MIPS: CPU#0 is not hotpluggable"),
c->hotpluggable is 0 for CPU 0 and it will not generate a control
file in sysfs for this CPU:
[root@linux loongson]# cat /sys/devices/system/cpu/cpu0/online
cat: /sys/devices/system/cpu/cpu0/online: No such file or directory
[root@linux loongson]# echo 0 > /sys/devices/system/cpu/cpu0/online
bash: /sys/devices/system/cpu/cpu0/online: Permission denied
So no need to check CPU 0 in {loongson3,bmips,octeon}_cpu_disable(),
just remove them.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Provide a weak plat_get_fdt() in relocate.c in case some platform enable
USE_OF while plat_get_fdt() is useless.
1MB RELOCATION_TABLE_SIZE is small for Loongson64 because too many
instructions should be relocated. 2MB is enough in present.
Add KASLR support for Loongson64.
KASLR(kernel address space layout randomization)
To enable KASLR on Loongson64:
First, make loongson3_defconfig.
Then, enable CONFIG_RELOCATABLE and CONFIG_RANDOMIZE_BASE.
Finally, compile the kernel.
To test KASLR on Loongson64:
Start machine with KASLR kernel.
The first time:
# cat /proc/iomem
00200000-0effffff : System RAM
02f30000-03895e9f : Kernel code
03895ea0-03bc7fff : Kernel data
03e30000-04f43f7f : Kernel bss
The second time:
# cat /proc/iomem
00200000-0effffff : System RAM
022f0000-02c55e9f : Kernel code
02c55ea0-02f87fff : Kernel data
031f0000-04303f7f : Kernel bss
We see that code, data and bss sections become randomize.
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Apply_r_mips_26_rel() relocates instructions like j, jal and etc. These
instructions consist of 6bits function field and 26bits address field.
The value of target_addr as follows,
=================================================================
| high 4bits | low 28bits |
=================================================================
|the high 4bits of this PC | the low 26bits of instructions << 2|
=================================================================
Thus, loc_orig and log_new both need high 4bits rather than high 6bits.
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Get rid of the __call_single_node union and cleanup the API a little
to avoid external code relying on the structure layout as much.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.
Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.
(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
MIPS protection bits are setup during runtime so using defines like
PAGE_READONLY ignores these runtime changes. To fix this we simply
use the page protection of the setup vma.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
- fix for build error, when modules need has_transparent_hugepage
- fix for memleak in alchemy clk setup
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAl+0DxIaHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHAJARAAo9kPj9P488PlKjw168SJ
neNK00OmtW8D6grnXIHnj/lYzJ1ul+dgbfEiKYMKrBibwc9jxdqeBjR+aLbADs0H
8u4Z3+dLRNzCqnxd4EYwP53+WdICrlNN9c7H+AjvWP4aMzZLNQW2fAQi+7hGR8p7
gvR+7CZyBHtVns92IGrMPEm1CBVrP/PrKxU6TRi3GYxxf7jCGYMHRI/bwXkBSGhg
mzxeCtygnDKivUcLv2FqFQ6BmSbPbxZVqQNKkBCqEyEIvYmvNkmUMuqf36NYOtBp
dXloYV5px9FjEqBhfyRYeO3xElyphCm4fYWV4+Bvf60w4m3c33peezg8ltJxqFAu
zFZk6NhfgZrRRJTGTfJ3juisagmtLzGx5rcK/JJGmcvW3975cwMlEtd545JGrECW
Lq3GZLwROc8hEh8dydrf3jGjP3+Ty2emO6rjYQrDqoWAhu4IqpLIb7s58IWzVlYv
z7KejBo1bTuSGar6sHREmEGCPzlrJp4ebq6niD4j57Yk9gzp0BoTd+JogE/EaCz/
P92OV/Log/hgKhFB0sI3Mtl2OOF3KIPq1IFfeu8731yMq9muOVZMhTOS03LRySFl
yyxVFn+uI00oE5A7Vw4ETEdiw1lbBjD4rC6HLQK36RhlZR5vGGL8rIejXRuKJCZY
F5yTub6Y6KDbKpBCjAkDros=
=5NkQ
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_5.10_1' into mips-next
Pull in mips-fixes to get memblock fix.
- fix bug preventing booting on several platforms
- fix for build error, when modules need has_transparent_hugepage
- fix for memleak in alchemy clk setup
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The loop over all memblocks works with PFNs and not physical
addresses, so we need for_each_mem_pfn_range().
Fixes: b10d6bca87 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Add the missing iounmap() of iounmap(mips_gcr_base) before
return from mips_cm_probe() in the error handling case.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.
Remove the quote operator # from compiler_attributes.h __section macro.
Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.
Conversion done using the script at:
https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+SOXIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgptrcD/93VUDmRAn73ChKNd0TtXUicJlAlNLVjvfs
VFTXWBDnlJnGkZT7ElkDD9b8dsz8l4xGf/QZ5dzhC/th2OsfObQkSTfe0lv5cCQO
mX7CRSrDpjaHtW+WGPDa0oQsGgIfpqUz2IOg9NKbZZ1LJ2uzYfdOcf3oyRgwZJ9B
I3sh1vP6OzjZVVCMmtMTM+sYZEsDoNwhZwpkpiwMmj8tYtOPgKCYKpqCiXrGU0x2
ML5FtDIwiwU+O3zYYdCBWqvCb2Db0iA9Aov2whEBz/V2jnmrN5RMA/90UOh1E2zG
br4wM1Wt3hNrtj5qSxZGlF/HEMYJVB8Z2SgMjYu4vQz09qRVVqpGdT/dNvLAHQWg
w4xNCj071kVZDQdfwnqeWSKYUau9Xskvi8xhTT+WX8a5CsbVrM9vGslnS5XNeZ6p
h2D3Q+TAYTvT756icTl0qsYVP7PrPY7DdmQYu0q+Lc3jdGI+jyxO2h9OFBRLZ3p6
zFX2N8wkvvCCzP2DwVnnhIi/GovpSh7ksHnb039F36Y/IhZPqV1bGqdNQVdanv6I
8fcIDM6ltRQ7dO2Br5f1tKUZE9Pm6x60b/uRVjhfVh65uTEKyGRhcm5j9ztzvQfI
cCBg4rbVRNKolxuDEkjsAFXVoiiEEsb7pLf4pMO+Dr62wxFG589tQNySySneUIVZ
J9ILnGAAeQ==
=aVWo
-----END PGP SIGNATURE-----
Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block
Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:
- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().
- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"
* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
There is usecase that System Management Software(SMS) want to give a
memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
case of Android, it is the ActivityManagerService.
The information required to make the reclaim decision is not known to the
app. Instead, it is known to the centralized userspace
daemon(ActivityManagerService), and that daemon must be able to initiate
reclaim on its own without any app involvement.
To solve the issue, this patch introduces a new syscall
process_madvise(2). It uses pidfd of an external process to give the
hint. It also supports vector address range because Android app has
thousands of vmas due to zygote so it's totally waste of CPU and power if
we should call the syscall one by one for each vma.(With testing 2000-vma
syscall vs 1-vector syscall, it showed 15% performance improvement. I
think it would be bigger in real practice because the testing ran very
cache friendly environment).
Another potential use case for the vector range is to amortize the cost
ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
benefit users like TCP receive zerocopy and malloc implementations. In
future, we could find more usecases for other advises so let's make it
happens as API since we introduce a new syscall at this moment. With
that, existing madvise(2) user could replace it with process_madvise(2)
with their own pid if they want to have batch address ranges support
feature.
ince it could affect other process's address range, only privileged
process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same
UID) gives it the right to ptrace the process could use it successfully.
The flag argument is reserved for future use if we need to extend the API.
I think supporting all hints madvise has/will supported/support to
process_madvise is rather risky. Because we are not sure all hints make
sense from external process and implementation for the hint may rely on
the caller being in the current context so it could be error-prone. Thus,
I just limited hints as MADV_[COLD|PAGEOUT] in this patch.
If someone want to add other hints, we could hear the usecase and review
it for each hint. It's safer for maintenance rather than introducing a
buggy syscall but hard to fix it later.
So finally, the API is as follows,
ssize_t process_madvise(int pidfd, const struct iovec *iovec,
unsigned long vlen, int advice, unsigned int flags);
DESCRIPTION
The process_madvise() system call is used to give advice or directions
to the kernel about the address ranges from external process as well as
local process. It provides the advice to address ranges of process
described by iovec and vlen. The goal of such advice is to improve
system or application performance.
The pidfd selects the process referred to by the PID file descriptor
specified in pidfd. (See pidofd_open(2) for further information)
The pointer iovec points to an array of iovec structures, defined in
<sys/uio.h> as:
struct iovec {
void *iov_base; /* starting address */
size_t iov_len; /* number of bytes to be advised */
};
The iovec describes address ranges beginning at address(iov_base)
and with size length of bytes(iov_len).
The vlen represents the number of elements in iovec.
The advice is indicated in the advice argument, which is one of the
following at this moment if the target process specified by pidfd is
external.
MADV_COLD
MADV_PAGEOUT
Permission to provide a hint to external process is governed by a
ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).
The process_madvise supports every advice madvise(2) has if target
process is in same thread group with calling process so user could
use process_madvise(2) to extend existing madvise(2) to support
vector address ranges.
RETURN VALUE
On success, process_madvise() returns the number of bytes advised.
This return value may be less than the total number of requested
bytes, if an error occurred. The caller should check return value
to determine whether a partial advice occurred.
FAQ:
Q.1 - Why does any external entity have better knowledge?
Quote from Sandeep
"For Android, every application (including the special SystemServer)
are forked from Zygote. The reason of course is to share as many
libraries and classes between the two as possible to benefit from the
preloading during boot.
After applications start, (almost) all of the APIs end up calling into
this SystemServer process over IPC (binder) and back to the
application.
In a fully running system, the SystemServer monitors every single
process periodically to calculate their PSS / RSS and also decides
which process is "important" to the user for interactivity.
So, because of how these processes start _and_ the fact that the
SystemServer is looping to monitor each process, it does tend to *know*
which address range of the application is not used / useful.
Besides, we can never rely on applications to clean things up
themselves. We've had the "hey app1, the system is low on memory,
please trim your memory usage down" notifications for a long time[1].
They rely on applications honoring the broadcasts and very few do.
So, if we want to avoid the inevitable killing of the application and
restarting it, some way to be able to tell the OS about unimportant
memory in these applications will be useful.
- ssp
Q.2 - How to guarantee the race(i.e., object validation) between when
giving a hint from an external process and get the hint from the target
process?
process_madvise operates on the target process's address space as it
exists at the instant that process_madvise is called. If the space
target process can run between the time the process_madvise process
inspects the target process address space and the time that
process_madvise is actually called, process_madvise may operate on
memory regions that the calling process does not expect. It's the
responsibility of the process calling process_madvise to close this
race condition. For example, the calling process can suspend the
target process with ptrace, SIGSTOP, or the freezer cgroup so that it
doesn't have an opportunity to change its own address space before
process_madvise is called. Another option is to operate on memory
regions that the caller knows a priori will be unchanged in the target
process. Yet another option is to accept the race for certain
process_madvise calls after reasoning that mistargeting will do no
harm. The suggested API itself does not provide synchronization. It
also apply other APIs like move_pages, process_vm_write.
The race isn't really a problem though. Why is it so wrong to require
that callers do their own synchronization in some manner? Nobody
objects to write(2) merely because it's possible for two processes to
open the same file and clobber each other's writes --- instead, we tell
people to use flock or something. Think about mmap. It never
guarantees newly allocated address space is still valid when the user
tries to access it because other threads could unmap the memory right
before. That's where we need synchronization by using other API or
design from userside. It shouldn't be part of API itself. If someone
needs more fine-grained synchronization rather than process level,
there were two ideas suggested - cookie[2] and anon-fd[3]. Both are
applicable via using last reserved argument of the API but I don't
think it's necessary right now since we have already ways to prevent
the race so don't want to add additional complexity with more
fine-grained optimization model.
To make the API extend, it reserved an unsigned long as last argument
so we could support it in future if someone really needs it.
Q.3 - Why doesn't ptrace work?
Injecting an madvise in the target process using ptrace would not work
for us because such injected madvise would have to be executed by the
target process, which means that process would have to be runnable and
that creates the risk of the abovementioned race and hinting a wrong
VMA. Furthermore, we want to act the hint in caller's context, not the
callee's, because the callee is usually limited in cpuset/cgroups or
even freezed state so they can't act by themselves quick enough, which
causes more thrashing/kill. It doesn't work if the target process are
ptraced(e.g., strace, debugger, minidump) because a process can have at
most one ptracer.
[1] https://developer.android.com/topic/performance/memory"
[2] process_getinfo for getting the cookie which is updated whenever
vma of process address layout are changed - Daniel Colascione -
https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224
[3] anonymous fd which is used for the object(i.e., address range)
validation - Michal Hocko -
https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/
[minchan@kernel.org: fix process_madvise build break for arm64]
Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
[minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
[akpm@linux-foundation.org: fix i386 build]
[sfr@canb.auug.org.au: fix syscall numbering]
Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
[sfr@canb.auug.org.au: madvise.c needs compat.h]
Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
[minchan@kernel.org: fix mips build]
Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
[yuehaibing@huawei.com: remove duplicate header which is included twice]
Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
[minchan@kernel.org: do not use helper functions for process_madvise]
Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
[akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
[sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- removed support for PNX833x alias NXT_STB22x
- included Ingenic SoC support into generic MIPS kernels
- added support for new Ingenic SoCs
- converted workaround selection to use Kconfig
- replaced old boot mem functions by memblock_*
- enabled COP2 usage in kernel for Loongson64 to make usage
of usage of 16byte load/stores possible
- cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAl+Jk/MaHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHDjyA/9EAEb8woPRsEfbQE8GLgT
vW2y2/fSHFJHoYY/t9+G81lJVKsg9TXQ9LyNk3WSU6+a6qELVqmnHY7+e43rSkfG
qaxMRJOmwsMU7NWQOy1OSyESHidsAXrGYMY40TKrcClyVbS/Ob6wZ5QbBp+MTEsU
ane8Yq/QTS60xIxsS0SZSiQpqzumUn7oHAwCAlqcqo26tV94mtrtsFG4pReqI2gh
Bxs2ZoQYdx1/rPGXHV74fwP3Iz1Rwq3Z38FCyK7ME98cTEiLxYs1/ztgL1y0IC07
F3Dv3wmPtCGZtNyqDJxs7lHsbi74owSyoueywNeOA+YV8IzkOCEW0XpgL7vI2gPL
OIi+LbH7MXt3P14h5ekzK+dSILg3yNFD152PmGxpUVzVhfDCw+uyUHzHdhZSCF/J
aldlDm1wtUV5PacruVbH26amownTsfdei+WTtgGN3QAISmnLjUghsplPZE6KWbGW
uPPpuIA2pTwW2FQdXL/WwGZm1k44ii5zX1Cjc55AZISZOzFXqklbuEZbMEM5O76N
EFR+zOd4+wueOZOI7vpBTmKSSY/r12Ve25hbMMYeY0G3bbsubcIIIHHRxhdPp/+R
8t+PTC9//bT9r/OKdGV6TDsSJmSZWfaaNd30actDHpss8ruUlNxwWv7awp/z0sOs
U+R5CAaVvzQlhxkzdO+M03w=
=Aa/Q
-----END PGP SIGNATURE-----
Merge tag 'mips_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- removed support for PNX833x alias NXT_STB22x
- included Ingenic SoC support into generic MIPS kernels
- added support for new Ingenic SoCs
- converted workaround selection to use Kconfig
- replaced old boot mem functions by memblock_*
- enabled COP2 usage in kernel for Loongson64 to make use
of 16byte load/stores possible
- cleanups and fixes
* tag 'mips_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (92 commits)
MIPS: DEC: Restore bootmem reservation for firmware working memory area
MIPS: dec: fix section mismatch
bcm963xx_tag.h: fix duplicated word
mips: ralink: enable zboot support
MIPS: ingenic: Remove CPU_SUPPORTS_HUGEPAGES
MIPS: cpu-probe: remove MIPS_CPU_BP_GHIST option bit
MIPS: cpu-probe: introduce exclusive R3k CPU probe
MIPS: cpu-probe: move fpu probing/handling into its own file
MIPS: replace add_memory_region with memblock
MIPS: Loongson64: Clean up numa.c
MIPS: Loongson64: Select SMP in Kconfig to avoid build error
mips: octeon: Add Ubiquiti E200 and E220 boards
MIPS: SGI-IP28: disable use of ll/sc in kernel
MIPS: tx49xx: move tx4939_add_memory_regions into only user
MIPS: pgtable: Remove used PAGE_USERIO define
MIPS: alchemy: Share prom_init implementation
MIPS: alchemy: Fix build breakage, if TOUCHSCREEN_WM97XX is disabled
MIPS: process: include exec.h header in process.c
MIPS: process: Add prototype for function arch_dup_task_struct
MIPS: idle: Add prototype for function check_wait
...
- rework the non-coherent DMA allocator
- move private definitions out of <linux/dma-mapping.h>
- lower CMA_ALIGNMENT (Paul Cercueil)
- remove the omap1 dma address translation in favor of the common
code
- make dma-direct aware of multiple dma offset ranges (Jim Quinlan)
- support per-node DMA CMA areas (Barry Song)
- increase the default seg boundary limit (Nicolin Chen)
- misc fixes (Robin Murphy, Thomas Tai, Xu Wang)
- various cleanups
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl+IiPwLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYPKEQ//TM8vxjucnRl/pklpMin49dJorwiVvROLhQqLmdxw
286ZKpVzYYAPc7LnNqwIBugnFZiXuHu8xPKQkIiOa2OtNDTwhKNoBxOAmOJaV6DD
8JfEtZYeX5mKJ/Nqd2iSkIqOvCwZ9Wzii+aytJ2U88wezQr1fnyF4X49MegETEey
FHWreSaRWZKa0MMRu9AQ0QxmoNTHAQUNaPc0PeqEtPULybfkGOGw4/ghSB7WcKrA
gtKTuooNOSpVEHkTas2TMpcBp6lxtOjFqKzVN0ml+/nqq5NeTSDx91VOCX/6Cj76
mXIg+s7fbACTk/BmkkwAkd0QEw4fo4tyD6Bep/5QNhvEoAriTuSRbhvLdOwFz0EF
vhkF0Rer6umdhSK7nPd7SBqn8kAnP4vBbdmB68+nc3lmkqysLyE4VkgkdH/IYYQI
6TJ0oilXWFmU6DT5Rm4FBqCvfcEfU2dUIHJr5wZHqrF2kLzoZ+mpg42fADoG4GuI
D/oOsz7soeaRe3eYfWybC0omGR6YYPozZJ9lsfftcElmwSsFrmPsbO1DM5IBkj1B
gItmEbOB9ZK3RhIK55T/3u1UWY3Uc/RVr+kchWvADGrWnRQnW0kxYIqDgiOytLFi
JZNH8uHpJIwzoJAv6XXSPyEUBwXTG+zK37Ce769HGbUEaUrE71MxBbQAQsK8mDpg
7fM=
=Bkf/
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- rework the non-coherent DMA allocator
- move private definitions out of <linux/dma-mapping.h>
- lower CMA_ALIGNMENT (Paul Cercueil)
- remove the omap1 dma address translation in favor of the common code
- make dma-direct aware of multiple dma offset ranges (Jim Quinlan)
- support per-node DMA CMA areas (Barry Song)
- increase the default seg boundary limit (Nicolin Chen)
- misc fixes (Robin Murphy, Thomas Tai, Xu Wang)
- various cleanups
* tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
ARM/ixp4xx: add a missing include of dma-map-ops.h
dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
dma-direct: factor out a dma_direct_alloc_from_pool helper
dma-direct check for highmem pages in dma_direct_alloc_pages
dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h>
dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dma
dma-mapping: move dma-debug.h to kernel/dma/
dma-mapping: remove <asm/dma-contiguous.h>
dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h>
dma-contiguous: remove dma_contiguous_set_default
dma-contiguous: remove dev_set_cma_area
dma-contiguous: remove dma_declare_contiguous
dma-mapping: split <linux/dma-mapping.h>
cma: decrease CMA_ALIGNMENT lower limit to 2
firewire-ohci: use dma_alloc_pages
dma-iommu: implement ->alloc_noncoherent
dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
dma-mapping: add a new dma_alloc_pages API
dma-mapping: remove dma_cache_sync
53c700: convert to dma_alloc_noncoherent
...
There are several occurrences of the following pattern:
for_each_memblock(memory, reg) {
start = __pfn_to_phys(memblock_region_memory_base_pfn(reg);
end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));
/* do something with start and end */
}
Using for_each_mem_range() iterator is more appropriate in such cases and
allows simpler and cleaner code.
[akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build]
[rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring]
Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull compat mount cleanups from Al Viro:
"The last remnants of mount(2) compat buried by Christoph.
Buried into NFS, that is.
Generally I'm less enthusiastic about "let's use in_compat_syscall()
deep in call chain" kind of approach than Christoph seems to be, but
in this case it's warranted - that had been an NFS-specific wart,
hopefully not to be repeated in any other filesystems (read: any new
filesystem introducing non-text mount options will get NAKed even if
it doesn't mess the layout up).
IOW, not worth trying to grow an infrastructure that would avoid that
use of in_compat_syscall()..."
* 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove compat_sys_mount
fs,nfs: lift compat nfs4 mount data handling into the nfs code
nfs: simplify nfs4_parse_monolithic
Pull compat iovec cleanups from Al Viro:
"Christoph's series around import_iovec() and compat variant thereof"
* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
security/keys: remove compat_keyctl_instantiate_key_iov
mm: remove compat_process_vm_{readv,writev}
fs: remove compat_sys_vmsplice
fs: remove the compat readv/writev syscalls
fs: remove various compat readv/writev helpers
iov_iter: transparently handle compat iovecs in import_iovec
iov_iter: refactor rw_copy_check_uvector and import_iovec
iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
compat.h: fix a spelling error in <linux/compat.h>
because the heuristics that various linkers & compilers use to handle them
(include these bits into the output image vs discarding them silently)
are both highly idiosyncratic and also version dependent.
Instead of this historically problematic mess, this tree by Kees Cook (et al)
adds build time asserts and build time warnings if there's any orphan section
in the kernel or if a section is not sized as expected.
And because we relied on so many silent assumptions in this area, fix a metric
ton of dependencies and some outright bugs related to this, before we can
finally enable the checks on the x86, ARM and ARM64 platforms.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+Edv4RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hiKBAApdJEOaK7hMc3013DYNctklIxEPJL2mFJ
11YJRIh4pUJTF0TE+EHT/D+rSIuRsyuoSmOQBQ61/wVSnyG067GjjVJRqh/eYaJ1
fDhJi2FuHOjXl+CiN0KxzBjjp+V4NhF7jHT59tpQSvfZeg7FjteoxfztxaCp5ek3
S3wHB3CC4c4jE3lfjHem1E9/PwT4kwPYx1c3gAUdEqJdjkihjX9fWusfjLeqW6/d
Y5VkApi6bL9XiZUZj5l0dEIweLJJ86+PkKJqpo3spxxEak1LSn1MEix+lcJ8e1Kg
sb/bEEivDcmFlFWOJnn0QLquCR0Cx5bz1pwsL0tuf0yAd4+sXX5IMuGUysZlEdKM
BHL9h5HbevGF4BScwZwZH7lyEg7q67s5KnRu4hxy0Swfcj7y0oT/9lXqpbpZ2DqO
Hd+bRRQKIbqnTMp0hcit9LfpLp93vj0dBlaV5ocAJJlu62u9VnwGG5HQuZ5giLUr
kA1SLw63Y1wopFRxgFyER8les7eLsu0zxHeK44rRVlVnfI99OMTOgVNicmDFy3Fm
AfcnfJG0BqBEJGQz5es34uQQKKBwFPtC9NztopI62KiwOspYYZyrO1BNxdOc6DlS
mIHrmO89HMXuid5eolvLaFqUWirHoWO8TlycgZxUWVHc2txVPjAEU/axouU/dSSU
w/6GpzAa+7g=
=fXAw
-----END PGP SIGNATURE-----
Merge tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull orphan section checking from Ingo Molnar:
"Orphan link sections were a long-standing source of obscure bugs,
because the heuristics that various linkers & compilers use to handle
them (include these bits into the output image vs discarding them
silently) are both highly idiosyncratic and also version dependent.
Instead of this historically problematic mess, this tree by Kees Cook
(et al) adds build time asserts and build time warnings if there's any
orphan section in the kernel or if a section is not sized as expected.
And because we relied on so many silent assumptions in this area, fix
a metric ton of dependencies and some outright bugs related to this,
before we can finally enable the checks on the x86, ARM and ARM64
platforms"
* tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/boot/compressed: Warn on orphan section placement
x86/build: Warn on orphan section placement
arm/boot: Warn on orphan section placement
arm/build: Warn on orphan section placement
arm64/build: Warn on orphan section placement
x86/boot/compressed: Add missing debugging sections to output
x86/boot/compressed: Remove, discard, or assert for unwanted sections
x86/boot/compressed: Reorganize zero-size section asserts
x86/build: Add asserts for unwanted sections
x86/build: Enforce an empty .got.plt section
x86/asm: Avoid generating unused kprobe sections
arm/boot: Handle all sections explicitly
arm/build: Assert for unwanted sections
arm/build: Add missing sections
arm/build: Explicitly keep .ARM.attributes sections
arm/build: Refactor linker script headers
arm64/build: Assert for unwanted sections
arm64/build: Add missing DWARF sections
arm64/build: Use common DISCARDS in linker script
arm64/build: Remove .eh_frame* sections due to unwind tables
...
MIPS_CPU_BP_GHIST is only set two times and more or less immediately
used in cpu-probe.c itself. Remove this option to make room in options
word.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Running a kernel on a R3k of machine definitly will never see one of
the newer CPU cores. And since R3k system usually are low on memory
we could save quite some kbytes:
text data bss dec hex filename
15070 88 32 15190 3b56 arch/mips/kernel/cpu-probe.o
844 4 16 864 360 arch/mips/kernel/cpu-r3k-probe.o
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
cpu-probe.c has grown when supporting more and more CPUs and there
are use cases where probing for all the CPUs isn't useful like
running on a R3k system. But still the fpu handling is nearly
the same. For sharing put the fpu code into it's own file.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
add_memory_region was the old interface for registering memory and
was already changed to used memblock internaly. Replace it by
directly calling memblock functions.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Merge dma-contiguous.h into dma-map-ops.h, after removing the comment
describing the contiguous allocator into kernel/dma/contigous.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Now that import_iovec handles compat iovecs, the native syscalls
can be used for the compat case as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that import_iovec handles compat iovecs, the native vmsplice syscall
can be used for the compat case as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that import_iovec handles compat iovecs, the native readv and writev
syscalls can be used for the compat case as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
arch/mips/kernel/process.c:696:15: error: no previous prototype for 'arch_align_stack' [-Werror=missing-prototypes]
Signed-off-by: Pujin Shi <shipujin.t@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
compat_sys_mount is identical to the regular sys_mount now, so remove it
and use the native version everywhere.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When the kernel crashkernel parameter is specified with just a size,
we are supposed to allocate a region from RAM to store the crashkernel.
However, MIPS merely reserves physical address zero with no checking
that there is even RAM there.
Fix this by lifting similar code from x86, importing it to MIPS with the
MIPS specific parameters added. In the absence of any platform specific
information, we allocate the crashkernel region from the first 512MB of
physical memory (limited to CKSEG0 or KSEG0 address range).
When X is not specified, crash_base defaults to 0 (crashkernel=YM@XM).
E.g. without this patch:
The environment as follows:
[ 0.000000] MIPS: machine is loongson,loongson64c-4core-ls7a
...
[ 0.000000] Kernel command line: root=/dev/sda2 crashkernel=96M ...
The warning as follows:
[ 0.000000] Invalid memory region reserved for crash kernel
And the iomem as follows:
00200000-0effffff : System RAM
00200000-00b47f87 : Kernel code
00b47f88-00dfffff : Kernel data
00e60000-01f73c7f : Kernel bss
1a000000-1bffffff : pci@1a000000
...
With this patch:
After increasing crash_base <= 0 handling.
And the iomem as follows:
00200000-0effffff : System RAM
00200000-00b47f87 : Kernel code
00b47f88-00dfffff : Kernel data
00e60000-01f73c7f : Kernel bss
04000000-09ffffff : Crash kernel
1a000000-1bffffff : pci@1a000000
...
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Loongson-3 has 16-bytes load/store instructions: gslq and gssq. This
patch calculate ra properly when unwinding the stack, if ra is saved
by gssq and restored by gslq.
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Loongson-3's COP2 is Multi-Media coprocessor, it is disabled in kernel
mode by default. However, gslq/gssq (16-bytes load/store instructions)
overrides the instruction format of lwc2/swc2. If we wan't to use gslq/
gssq for optimization in kernel, we should enable COP2 usage in kernel.
Please pay attention that in this patch we only enable COP2 in kernel,
which means it will lose ST0_CU2 when a process go to user space (try
to use COP2 in user space will trigger an exception and then grab COP2,
which is similar to FPU). And as a result, we need to modify the context
switching code because the new scheduled process doesn't contain ST0_CU2
in its THERAD_STATUS probably.
For zboot, we disable gslq/gssq be generated by toolchain.
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
arch/mips/kernel/branch.c:876:5: error: no previous prototype for '__insn_is_compact_branch' [-Werror=missing-prototypes]
Signed-off-by: Pujin Shi <shipujin.t@gmail.com>
Signed-off-by: Pujin Shi <shipj@lemote.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
This addresses the following sparse warning:
arch/mips/kernel/setup.c:446:33: warning: symbol 'setup_elfcorehdr_size'
was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The previous code was doing:
BUG_ON(!__builtin_constant_p(cpu_has_counter) || cpu_has_counter);
This only worked as the "cpu_has_counter" macro was overridden in
<cpu-feature-overrides.h>. The default "cpu_has_counter" macro is
non-constant, which triggered the BUG_ON() independently of the value
returned by the macro.
What we want to check here, is that *if* the macro was overridden to a
compile-time constant, then must be defined to zero, otherwise it's a
bug.
So the correct check is:
BUG_ON(__builtin_constant_p(cpu_has_counter) && cpu_has_counter);
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
XBurst CPUs present in Ingenic SoCs have virtually tagged caches,
according to the <cpu-features-override.h> header.
Add that information to cpu_probe_ingenic().
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Previously, in cpu_probe_ingenic(), c->writecombine was set to
_CACHE_UNCACHED_ACCELERATED, but this macro was defined differently when
CONFIG_MACH_INGENIC was set. This made it impossible to support multiple
CPUs.
Address this issue by setting c->writecombine to _CACHE_CACHABLE_WA
directly and removing the dependency on CONFIG_MACH_INGENIC.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Use a new config option to enabel R1000_LLSC workaound and remove
define from different war.h files.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Use a new config option to enable I-cache refill workaround and remove
define from different war.h files.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In cc97ab235f ("MIPS: Simplify FP context initialization), init_fp_ctx
just initialize the fp/msa context, and own_fp_inatomic just restore
FCSR and 64bit FP regs from it, but miss MSACSR and upper MSA regs for
MSA, so MSACSR and MSA upper regs's value from previous task on current
cpu can leak into current task and cause unpredictable behavior when MSA
context not initialized.
Fixes: cc97ab235f ("MIPS: Simplify FP context initialization")
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The call simply looks up the corresponding task (without iterating
the tasklist), which is safe under rcu instead of the tasklist_lock.
In addition, the setaffinity counter part already does this.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
According to the user's manual chapter 8.2.1 of Loongson 3A2000 CPU [1]
and 3A3000 CPU [2], we should take some event IDs such as 274, 358, 359
and 360 as valid in the check condition, otherwise they are recognized
as "not supported", fix it.
[1] http://www.loongson.cn/uploadfile/cpu/3A2000/Loongson3A2000_user2.pdf
[2] http://www.loongson.cn/uploadfile/cpu/3A3000/Loongson3A3000_3B3000user2.pdf
Fixes: e9dfbaaeef ("MIPS: perf: Add hardware perf events support for new Loongson-3")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The .comment section doesn't belong in STABS_DEBUG. Split it out into a
new macro named ELF_DETAILS. This will gain other non-debug sections
that need to be accounted for when linking with --orphan-handling=warn.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20200821194310.3089815-5-keescook@chromium.org
The initialization done by bmips_cpu_setup() typically affects both
threads of a given core, on 7435 which supports 2 cores and 2 threads,
logical CPU number 2 and 3 would not run this initialization.
Fixes: 738a3f7902 ("MIPS: BMIPS: Add early CPU initialization code")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
There exists redundant #ifdef CONFIG_DYNAMIC_FTRACE in ftrace.c, remove it.
Signed-off-by: Zejiang Tang <tangzejiang@loongson.cn>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Since commit 61a47c1ad3 ("sysctl: Remove the sysctl system call"),
sys_sysctl is actually unavailable: any input can only return an error.
We have been warning about people using the sysctl system call for years
and believe there are no more users. Even if there are users of this
interface if they have not complained or fixed their code by now they
probably are not going to, so there is no point in warning them any
longer.
So completely remove sys_sysctl on all architectures.
[nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.com
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Will Deacon <will@kernel.org> [arm/arm64]
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: chenzefeng <chenzefeng2@huawei.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Diego Elio Pettenò <flameeyes@flameeyes.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kars de Jong <jongk@linux-m68k.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Paul Burton <paulburton@kernel.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Sven Schnelle <svens@stackframe.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zhou Yanjie <zhouyanjie@wanyeetech.com>
Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add helpers to wrap the get_fs/set_fs magic for undoing any damange done
by set_fs(KERNEL_DS). There is no real functional benefit, but this
documents the intent of these calls better, and will allow stubbing the
functions out easily for kernels builds that do not allow address space
overrides in the future.
[hch@lst.de: drop two incorrect hunks, fix a commit log typo]
Link: http://lkml.kernel.org/r/20200714105505.935079-6-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Greentime Hu <green.hu@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/20200710135706.537715-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc updates from Andrew Morton:
- a few MM hotfixes
- kthread, tools, scripts, ntfs and ocfs2
- some of MM
Subsystems affected by this patch series: kthread, tools, scripts, ntfs,
ocfs2 and mm (hofixes, pagealloc, slab-generic, slab, slub, kcsan,
debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore,
sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
mm: vmscan: consistent update to pgrefill
mm/vmscan.c: fix typo
khugepaged: khugepaged_test_exit() check mmget_still_valid()
khugepaged: retract_page_tables() remember to test exit
khugepaged: collapse_pte_mapped_thp() protect the pmd lock
khugepaged: collapse_pte_mapped_thp() flush the right range
mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
mm: thp: replace HTTP links with HTTPS ones
mm/page_alloc: fix memalloc_nocma_{save/restore} APIs
mm/page_alloc.c: skip setting nodemask when we are in interrupt
mm/page_alloc: fallbacks at most has 3 elements
mm/page_alloc: silence a KASAN false positive
mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()
mm/page_alloc.c: simplify pageblock bitmap access
mm/page_alloc.c: extract the common part in pfn_to_bitidx()
mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
mm/shuffle: remove dynamic reconfiguration
mm/memory_hotplug: document why shuffle_zone() is relevant
mm/page_alloc: remove nr_free_pagecache_pages()
mm: remove vm_total_pages
...
After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
functions that call memory_present() for each region in memblock.memory:
sparse_memory_present_with_active_regions() and membocks_present().
Moreover, all architectures have a call to either of these functions
preceding the call to sparse_init() and in the most cases they are called
one after the other.
Mark the regions from memblock.memory as present during sparce_init() by
making sparse_init() call memblocks_present(), make memblocks_present()
and memory_present() functions static and remove redundant
sparse_memory_present_with_active_regions() function.
Also remove no longer required HAVE_MEMORY_PRESENT configuration option.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ptrace regset updates from Al Viro:
"Internal regset API changes:
- regularize copy_regset_{to,from}_user() callers
- switch to saner calling conventions for ->get()
- kill user_regset_copyout()
The ->put() side of things will have to wait for the next cycle,
unfortunately.
The balance is about -1KLoC and replacements for ->get() instances are
a lot saner"
* 'work.regset' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
regset: kill user_regset_copyout{,_zero}()
regset(): kill ->get_size()
regset: kill ->get()
csky: switch to ->regset_get()
xtensa: switch to ->regset_get()
parisc: switch to ->regset_get()
nds32: switch to ->regset_get()
nios2: switch to ->regset_get()
hexagon: switch to ->regset_get()
h8300: switch to ->regset_get()
openrisc: switch to ->regset_get()
riscv: switch to ->regset_get()
c6x: switch to ->regset_get()
ia64: switch to ->regset_get()
arc: switch to ->regset_get()
arm: switch to ->regset_get()
sh: convert to ->regset_get()
arm64: switch to ->regset_get()
mips: switch to ->regset_get()
sparc: switch to ->regset_get()
...
- improvements for Loongson64
- extended ingenic support
- removal of not maintained paravirt system type
- cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAl8sOj8aHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHDaKxAAj2Ptqna21sGQaRW3wQ4L
Bt41xZUw5m62DjAfd2j8w6PkJLvwcXx8y5v7ZpA79fLFJcDNzBgZzmjfqVxGue6P
j1Ei/Bc1UPF6s+hsjjr7IJG3Y50CKD1AY0VgKzZzY42NhCkkOVlzyUh0wER/COhG
jDAJtkrtFJeyNx2x+aWV5ILg2HTFsjCDfkLlwDWZdhMLa4XuDJoruRDBjQ8RVTs8
NsYANHaO8M1I/x9aGDpiQbE5j11WSm/GrA154ivIpenLHT+lx4HWoEjfLskIG0Bx
n+bMuUUdHw/T0dpj/51/eCIoaSXzEEMgLo5K5vowaikY/Q6++M+ePSQw6ohxFQ/q
PPWNJIilpufw6Utvkdzw6j/oGB65k1Tz1gxlQRVeF9vWf/qLW1O7gTuKJDebjPnH
3f+NfqlKuygF4zV0RJ4Kk5k0S+C6h8Q6qTlAZe4G2Ixcy1AV8USd+RBacwdSxzvE
z7c7wovHpIhvZ8Zkheli0WeB0DKX5KH8H1GcFki/fZ748qP02cfzkZ8sU4n3OF93
6z3AdShnDb9sCTB4trSXe7EvwVrirYbUtCPNCCRWgIZrJFVuw/2L197IzYF94mTE
ilqJ121hL7ETr7Wk8+HTOMbLUaYnh7Dt842Y7RvOEOzPhN9dlvMcABimpaIHf1rp
45aCNTwT6V5og2v73UhqTmk=
=I/vU
-----END PGP SIGNATURE-----
Merge tag 'mips_5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS upates from Thomas Bogendoerfer:
- improvements for Loongson64
- extended ingenic support
- removal of not maintained paravirt system type
- cleanups and fixes
* tag 'mips_5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (81 commits)
MIPS: SGI-IP27: always enable NUMA in Kconfig
MAINTAINERS: Update KVM/MIPS maintainers
MIPS: Update default config file for Loongson-3
MIPS: KVM: Add kvm guest support for Loongson-3
dt-bindings: mips: Document Loongson kvm guest board
MIPS: handle Loongson-specific GSExc exception
MIPS: add definitions for Loongson-specific CP0.Diag1 register
MIPS: only register FTLBPar exception handler for supported models
MIPS: ingenic: Hardcode mem size for qi,lb60 board
MIPS: DTS: ingenic/qi,lb60: Add model and memory node
MIPS: ingenic: Use fw_passed_dtb even if CONFIG_BUILTIN_DTB
MIPS: head.S: Init fw_passed_dtb to builtin DTB
of: address: Fix parser address/size cells initialization
of_address: Guard of_bus_pci_get_flags with CONFIG_PCI
MIPS: DTS: Fix number of msi vectors for Loongson64G
MIPS: Loongson64: Add ISA node for LS7A PCH
MIPS: Loongson64: DTS: Fix ISA and PCI I/O ranges for RS780E PCH
MIPS: Loongson64: Enlarge IO_SPACE_LIMIT
MIPS: Loongson64: Process ISA Node in DeviceTree
of_address: Add bus type match for pci ranges parser
...
Pull networking updates from David Miller:
1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.
2) Support UDP segmentation in code TSO code, from Eric Dumazet.
3) Allow flashing different flash images in cxgb4 driver, from Vishal
Kulkarni.
4) Add drop frames counter and flow status to tc flower offloading,
from Po Liu.
5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.
6) Various new indirect call avoidance, from Eric Dumazet and Brian
Vazquez.
7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
Yonghong Song.
8) Support querying and setting hardware address of a port function via
devlink, use this in mlx5, from Parav Pandit.
9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.
10) Switch qca8k driver over to phylink, from Jonathan McDowell.
11) In bpftool, show list of processes holding BPF FD references to
maps, programs, links, and btf objects. From Andrii Nakryiko.
12) Several conversions over to generic power management, from Vaibhav
Gupta.
13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
Yakunin.
14) Various https url conversions, from Alexander A. Klimov.
15) Timestamping and PHC support for mscc PHY driver, from Antoine
Tenart.
16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.
17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.
18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.
19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
drivers. From Luc Van Oostenryck.
20) XDP support for xen-netfront, from Denis Kirjanov.
21) Support receive buffer autotuning in MPTCP, from Florian Westphal.
22) Support EF100 chip in sfc driver, from Edward Cree.
23) Add XDP support to mvpp2 driver, from Matteo Croce.
24) Support MPTCP in sock_diag, from Paolo Abeni.
25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
infrastructure, from Jakub Kicinski.
26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.
27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.
28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.
29) Refactor a lot of networking socket option handling code in order to
avoid set_fs() calls, from Christoph Hellwig.
30) Add rfc4884 support to icmp code, from Willem de Bruijn.
31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.
32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.
33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.
34) Support TCP syncookies in MPTCP, from Flowian Westphal.
35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
Brivio.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
net: thunderx: initialize VF's mailbox mutex before first usage
usb: hso: remove bogus check for EINPROGRESS
usb: hso: no complaint about kmalloc failure
hso: fix bailout in error case of probe
ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
selftests/net: relax cpu affinity requirement in msg_zerocopy test
mptcp: be careful on subflow creation
selftests: rtnetlink: make kci_test_encap() return sub-test result
selftests: rtnetlink: correct the final return value for the test
net: dsa: sja1105: use detected device id instead of DT one on mismatch
tipc: set ub->ifindex for local ipv6 address
ipv6: add ipv6_dev_find()
net: openvswitch: silence suspicious RCU usage warning
Revert "vxlan: fix tos value before xmit"
ptp: only allow phase values lower than 1 period
farsync: switch from 'pci_' to 'dma_' API
wan: wanxl: switch from 'pci_' to 'dma_' API
hv_netvsc: do not use VF device if link is down
dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
net: macb: Properly handle phylink on at91sam9x
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXygcpgAKCRCRxhvAZXjc
ogPeAQDv1ncqtNroFAC4pJ4tQhH7JSjW0OltiMk/AocY/J2SdQD9GJ15luYJ0/om
697q/Z68sndRynhdoZlMuf3oYuBlHQw=
=3ZhE
-----END PGP SIGNATURE-----
Merge tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull close_range() implementation from Christian Brauner:
"This adds the close_range() syscall. It allows to efficiently close a
range of file descriptors up to all file descriptors of a calling
task.
This is coordinated with the FreeBSD folks which have copied our
version of this syscall and in the meantime have already merged it in
April 2019:
https://reviews.freebsd.org/D21627https://svnweb.freebsd.org/base?view=revision&revision=359836
The syscall originally came up in a discussion around the new mount
API and making new file descriptor types cloexec by default. During
this discussion, Al suggested the close_range() syscall.
First, it helps to close all file descriptors of an exec()ing task.
This can be done safely via (quoting Al's example from [1] verbatim):
/* that exec is sensitive */
unshare(CLONE_FILES);
/* we don't want anything past stderr here */
close_range(3, ~0U);
execve(....);
The code snippet above is one way of working around the problem that
file descriptors are not cloexec by default. This is aggravated by the
fact that we can't just switch them over without massively regressing
userspace. For a whole class of programs having an in-kernel method of
closing all file descriptors is very helpful (e.g. demons, service
managers, programming language standard libraries, container managers
etc.).
Second, it allows userspace to avoid implementing closing all file
descriptors by parsing through /proc/<pid>/fd/* and calling close() on
each file descriptor and other hacks. From looking at various
large(ish) userspace code bases this or similar patterns are very
common in service managers, container runtimes, and programming
language runtimes/standard libraries such as Python or Rust.
In addition, the syscall will also work for tasks that do not have
procfs mounted and on kernels that do not have procfs support compiled
in. In such situations the only way to make sure that all file
descriptors are closed is to call close() on each file descriptor up
to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery.
Based on Linus' suggestion close_range() also comes with a new flag
CLOSE_RANGE_UNSHARE to more elegantly handle file descriptor dropping
right before exec. This would usually be expressed in the sequence:
unshare(CLONE_FILES);
close_range(3, ~0U);
as pointed out by Linus it might be desirable to have this be a part
of close_range() itself under a new flag CLOSE_RANGE_UNSHARE which
gets especially handy when we're closing all file descriptors above a
certain threshold.
Test-suite as always included"
* tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: add CLOSE_RANGE_UNSHARE tests
close_range: add CLOSE_RANGE_UNSHARE
tests: add close_range() tests
arch: wire-up close_range()
open: add close_range()
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXyge/QAKCRCRxhvAZXjc
oildAQCCWpnTeXm6hrIE3VZ36X5npFtbaEthdBVAUJM7mo0FYwEA8+Wbnubg6jCw
mztkXCnTfU7tApUdhKtQzcpEws45/Qk=
=REE/
-----END PGP SIGNATURE-----
Merge tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull fork cleanups from Christian Brauner:
"This is cleanup series from when we reworked a chunk of the process
creation paths in the kernel and switched to struct
{kernel_}clone_args.
High-level this does two main things:
- Remove the double export of both do_fork() and _do_fork() where
do_fork() used the incosistent legacy clone calling convention.
Now we only export _do_fork() which is based on struct
kernel_clone_args.
- Remove the copy_thread_tls()/copy_thread() split making the
architecture specific HAVE_COYP_THREAD_TLS config option obsolete.
This switches all remaining architectures to select
HAVE_COPY_THREAD_TLS and thus to the copy_thread_tls() calling
convention. The current split makes the process creation codepaths
more convoluted than they need to be. Each architecture has their own
copy_thread() function unless it selects HAVE_COPY_THREAD_TLS then it
has a copy_thread_tls() function.
The split is not needed anymore nowadays, all architectures support
CLONE_SETTLS but quite a few of them never bothered to select
HAVE_COPY_THREAD_TLS and instead simply continued to use copy_thread()
and use the old calling convention. Removing this split cleans up the
process creation codepaths and paves the way for implementing clone3()
on such architectures since it requires the copy_thread_tls() calling
convention.
After having made each architectures support copy_thread_tls() this
series simply renames that function back to copy_thread(). It also
switches all architectures that call do_fork() directly over to
_do_fork() and the struct kernel_clone_args calling convention. This
is a corollary of switching the architectures that did not yet support
it over to copy_thread_tls() since do_fork() is conditional on not
supporting copy_thread_tls() (Mostly because it lacks a separate
argument for tls which is trivial to fix but there's no need for this
function to exist.).
The do_fork() removal is in itself already useful as it allows to to
remove the export of both do_fork() and _do_fork() we currently have
in favor of only _do_fork(). This has already been discussed back when
we added clone3(). The legacy clone() calling convention is - as is
probably well-known - somewhat odd:
#
# ABI hall of shame
#
config CLONE_BACKWARDS
config CLONE_BACKWARDS2
config CLONE_BACKWARDS3
that is aggravated by the fact that some architectures such as sparc
follow the CLONE_BACKWARDSx calling convention but don't really select
the corresponding config option since they call do_fork() directly.
So do_fork() enforces a somewhat arbitrary calling convention in the
first place that doesn't really help the individual architectures that
deviate from it. They can thus simply be switched to _do_fork()
enforcing a single calling convention. (I really hope that any new
architectures will __not__ try to implement their own calling
conventions...)
Most architectures already have made a similar switch (m68k comes to
mind).
Overall this removes more code than it adds even with a good portion
of added comments. It simplifies a chunk of arch specific assembly
either by moving the code into C or by simply rewriting the assembly.
Architectures that have been touched in non-trivial ways have all been
actually boot and stress tested: sparc and ia64 have been tested with
Debian 9 images. They are the two architectures which have been
touched the most. All non-trivial changes to architectures have seen
acks from the relevant maintainers. nios2 with a custom built
buildroot image. h8300 I couldn't get something bootable to test on
but the changes have been fairly automatic and I'm sure we'll hear
people yell if I broke something there.
All other architectures that have been touched in trivial ways have
been compile tested for each single patch of the series via git rebase
-x "make ..." v5.8-rc2. arm{64} and x86{_64} have been boot tested
even though they have just been trivially touched (removal of the
HAVE_COPY_THREAD_TLS macro from their Kconfig) because well they are
basically "core architectures" and since it is trivial to get your
hands on a useable image"
* tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
arch: rename copy_thread_tls() back to copy_thread()
arch: remove HAVE_COPY_THREAD_TLS
unicore: switch to copy_thread_tls()
sh: switch to copy_thread_tls()
nds32: switch to copy_thread_tls()
microblaze: switch to copy_thread_tls()
hexagon: switch to copy_thread_tls()
c6x: switch to copy_thread_tls()
alpha: switch to copy_thread_tls()
fork: remove do_fork()
h8300: select HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
nios2: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
ia64: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
sparc: unconditionally enable HAVE_COPY_THREAD_TLS
sparc: share process creation helpers between sparc and sparc64
sparc64: enable HAVE_COPY_THREAD_TLS
fork: fold legacy_clone_args_valid() into _do_fork()
Newer Loongson cores (Loongson-3A R2 and newer) use the
implementation-dependent ExcCode 16 to signal Loongson-specific
exceptions. The extended cause is put in the non-standard CP0.Diag1
register which is CP0 Register 22 Select 1, called GSCause in Loongson
manuals. Inside is an exception code bitfield called GSExcCode, only
codes 0 to 6 inclusive are documented (so far, in the Loongson 3A3000
User Manual, Volume 2).
During experiments, it was found that some undocumented unprivileged
instructions can trigger the also-undocumented GSExcCode 8 on Loongson
3A4000. Processor state is not corrupted, but we cannot continue without
further knowledge, and Loongson is not providing that information as of
this writing. So we send SIGILL on seeing this exception code to thwart
easy local DoS attacks.
Other exception codes are made fatal, partly because of insufficient
knowledge, also partly because they are not as easily reproduced. None
of them are encountered in the wild with upstream kernels and userspace
so far.
Some older cores (Loongson-3A1000 and Loongson-3B1500) have ExcCode 16
too, but the semantic is equivalent to GSExcCode 0. Because the
respective manuals did not mention the CP0.Diag1 register or its read
behavior, these cores are not covered in this patch, as MFC0 from
non-existent CP0 registers is UNDEFINED according to the MIPS
architecture spec.
Reviewed-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Previously ExcCode 16 is unconditionally treated as the FTLB parity
exception (FTLBPar), but in fact its semantic is implementation-
dependent. Looking at various manuals it seems the FTLBPar exception is
only present on some recent MIPS Technologies cores, so only register
the handler on these.
Fixes: 75b5b5e0a2 ("MIPS: Add support for FTLBs")
Reviewed-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Cc: Paul Burton <paulburton@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Init the 'fw_passed_dtb' pointer to the buit-in Device Tree blob when it
has been compiled in with CONFIG_BUILTIN_DTB.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The CONFIG_MIPS_MACHINE option is dead code that hasn't been used in
years. The Kconfig option is not selected anywhere, and the
<asm/mips_machine.h> is not included anywhere either.
To make things worse, for years it co-existed with a separate MIPS
machine implementation as <asm/machine.h>. The two defined the
'mips_machine' structure with different fields, and the 'MIPS_MACHINE'
macro with different parameters. The two used the same memory area
(defined by the linker script) to store data, and you could totally use
the two at the same time for all kinds of funny results.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Use 0 as the align parameter in memblock_find_in_range() is
incorrect when we reserve memory for Crash kernel.
The environment as follows:
[ 0.000000] MIPS: machine is loongson,loongson64c-4core-rs780e
...
[ 1.951016] crashkernel=64M@128M
The warning as follows:
[ 0.000000] Invalid memory region reserved for crash kernel
And the iomem as follows:
00200000-0effffff : System RAM
04000000-0484009f : Kernel code
048400a0-04ad7fff : Kernel data
04b40000-05c4c6bf : Kernel bss
1a000000-1bffffff : pci@1a000000
...
The align parameter may be finally used by round_down() or round_up().
Like the following call tree:
mips-next: mm/memblock.c
memblock_find_in_range
└── memblock_find_in_range_node
├── __memblock_find_range_bottom_up
│ └── round_up
└── __memblock_find_range_top_down
└── round_down
\#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
\#define round_down(x, y) ((x) & ~__round_mask(x, y))
\#define __round_mask(x, y) ((__typeof__(x))((y)-1))
The round_down(or round_up)'s second parameter must be a power of 2.
If the second parameter is 0, it both will return 0.
Use 1 as the parameter to fix the bug and the iomem as follows:
00200000-0effffff : System RAM
04000000-0484009f : Kernel code
048400a0-04ad7fff : Kernel data
04b40000-05c4c6bf : Kernel bss
08000000-0bffffff : Crash kernel
1a000000-1bffffff : pci@1a000000
...
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Now CPU#0 is not hotpluggable on MIPS, so prevent to create /sys/devices
/system/cpu/cpu0/online which confuses some user-space tools.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
It references __initdata and is called only from an __init function:
trap_init. This avoids section mismatches (which I am seeing with gcc
10).
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
1.Add "PRID_COMP_INGENIC_13" and "PRID_IMP_XBURST2" for X2000.
2.Add X2000 system type for cat /proc/cpuinfo to give out X2000.
Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Now that the ->compat_{get,set}sockopt proto_ops methods are gone
there is no good reason left to keep the compat syscalls separate.
This fixes the odd use of unsigned int for the compat_setsockopt
optlen and the missing sock_use_custom_sol_socket.
It would also easily allow running the eBPF hooks for the compat
syscalls, but such a large change in behavior does not belong into
a consolidation patch like this one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Other vendor-defined registers use the vendor name as a prefix, not an
infix, so unify the naming style of CP0.Config6 bits.
Suggested-by: Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
- fix for missing hazard barrier
- DT fix for ingenic
- DT fix of GPHY names for lantiq
- fix usage of smp_processor_id() while preemption is enabled
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAl8Bpt8aHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHBVlA/+JsNymfzLCaUHgEyjDzfp
R7x3/UUNHOF659MKebEIJEd/Rmhj+pg5682e3SugpNlOxuadB7Kl1j0SYNdVbR0h
Dg7ztP3osWbymoJG829xLvpYMlLGLuNaNUBV/8mFNwPy2bijmgkObYyeciFEQlvy
skHihVZCYQ1qVqMtDlNEmMGU0V6JTHuqOfdF7+d7ZmkwHpKGDBgR0BL3rhQREHif
iawKjkhuemgVpw0g6ULpuWlwvsgTbQNoaMIIGIlsaGfYWlyBnlnhbiZHg+WwC5Ey
zzuDFybQq9K+cylgwlrn7ypxCpUBfKCVzYUcEOcQC4+BPt74t1mwfS24FQS3HDol
pQ9lpIPLkm5m0kxokUT8Ei/lcSA1NeiubMOGQJaEc+7gpyBTcw+ChLB242cilngB
CzME5hppGEQlkBefS8CYZaOGUhhU0qaqm6WMkcQ0YIuiyo+ZmwQ6nwyVNbDB/BMb
vK99mgCf96PWqu8vcCcifC+O/wSBOUrMD3vljAswY6xwP9gQ4WYFAcielEXoSVMV
sIlVHNbDivpb6e41zerK+KU9Z1oCgPnFKT6FmkDtdQWQ4iDfOEUi/n72NlNfH5xT
MDGaaYVYuW3M1eR4Tlahe+UA2qIleZDc9DgORhu1kwlxecMfQSaBdKh1G9ifqw58
ZbzM1YrJHHh2xEBvhjpGaZU=
=A4AW
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_5.8_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fix for missing hazard barrier
- DT fix for ingenic
- DT fix of GPHY names for lantiq
- fix usage of smp_processor_id() while preemption is enabled
* tag 'mips_fixes_5.8_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Do not use smp_processor_id() in preemptible code
MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
MIPS: ingenic: gcw0: Fix HP detection GPIO.
MIPS: lantiq: xway: sysctrl: fix the GPHY clock alias names
This resolves the hazard between the mtc0 in the change_c0_status() and
the mfc0 in configure_exception_vector(). Without resolving this hazard
configure_exception_vector() could read an old value and would restore
this old value again. This would revert the changes change_c0_status()
did. I checked this by printing out the read_c0_status() at the end of
per_cpu_trap_init() and the ST0_MX is not set without this patch.
The hazard is documented in the MIPS Architecture Reference Manual Vol.
III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev
6.03 table 8.1 which includes:
Producer | Consumer | Hazard
----------|----------|----------------------------
mtc0 | mfc0 | any coprocessor 0 register
I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU.
There the change_c0_status() function would activate the DSPen by
setting ST0_MX in the c0_status register. This was reverted and then the
system got a DSP exception when the DSP registers were saved in
save_dsp() in the first process switch. The crash looks like this:
[ 0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.107070] Kernel panic - not syncing: Unexpected DSP exception
[ 0.113470] Rebooting in 1 seconds..
We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs,
not on the 24Kc based SoCs. We only saw it with kernel 5.4 not with
kernel 4.19, in addition we had to use GCC 8.4 or 9.X, with GCC 8.3 it
did not happen.
In the kernel I bisected this problem to commit 9012d01166 ("compiler:
allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was
reverted it also happened after commit 172dcd935c ("MIPS: Always
allocate exception vector for MIPSr2+").
Commit 0b24cae4d5 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
does similar changes to a different file. I am not sure if there are
more places affected by this problem.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Now that HAVE_COPY_THREAD_TLS has been removed, rename copy_thread_tls()
back simply copy_thread(). It's a simpler name, and doesn't imply that only
tls is copied here. This finishes an outstanding chunk of internal process
creation work since we've added clone3().
Cc: linux-arch@vger.kernel.org
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>A
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Greentime Hu <green.hu@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>A
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>