Currently the panic notifiers from user mode linux don't follow
the convention for most of the other notifiers present in the
kernel (indentation, priority setting, numeric return).
More important, the priorities could be improved, since it's a
special case (userspace), hence we could run the notifiers earlier;
user mode linux shouldn't care much with other panic notifiers but
the ordering among the mconsole and arch notifier is important,
given that the arch one effectively triggers a core dump.
Fix that by running the mconsole notifier as the first panic
notifier, followed by the architecture one (that coredumps).
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
V3:
- No changes.
V2:
- Kept the notifier header to avoid implicit usage - thanks
Johannes for the suggestion!
Signed-off-by: Richard Weinberger <richard@nod.at>
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The current 32 MiB of RAM causes OOMs to appear shortly after
booting in a minimal OpenWrt 22.03 configuration with a
5.10.134 kernel.
Of course, passing a "mem=64M" (from the --help text) parameter
works too, but it produces the following (info) message:
| [ 0.000000] Unknown kernel command line parameters "mem=64M", will be passed to user space.
That's why, I think it would be nicer, if this is working out
of the box again :).
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve latency
and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
=w/UH
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
- KASAN support for x86_64
- noreboot command line option, just like qemu's -no-reboot
- Various fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdgfidid8lnn52cLTZvlZhesYu8EFAmLteZYACgkQZvlZhesY
u8F8bRAA2806QUzysg3Nj1AKPiTOj47TuluGu4SXytB0usQgYK/n3Fxr36ULJAOJ
3qZWf2fsAkBLvgX9Sw2QFGfulrpfKnLeTdBXSEbWYWhZ0ZoaEJztKmtfH02kRDOW
POedQT5FXMDVjGQdLC7Ycp+WyjaUwrccZ+KRkGWmlr7vNFlxcTlEqBb13mgLdjkY
ep8X+SgmAcdvWBd/os+nNn9Al6TbFd4XQCok82DtNrv0ggwXnVPov/ArvZvvn2Oj
F028X77180rbrGV+ZnDkV1KSv/ccT5EFebJkfEEcYVjre8o0QoPQmh2tFqXN0d83
2WpIOb1+mQL0VClpC4hKbScpIB5tw8vIHsUT+ifloIgY/puhezx6aWm0TKSA+aTM
WitJl1Nf4uNu1rqkBkn9o3VK8CYokTALQIRexHCzvZ70CSxmFbR7EVRSTf7Rr690
Oq7StHagfuTJpddh0wQwaMorIH4s0/bpPoA6m4OhwlppnCpY0Hfl3+AKluNRUtH6
lPeQwfxhd/LKqYW0COElEnReDLzer82kUx/keVyxVINqxpm6YTHVtOgtMCEuVNXg
GbS8PFCW2mIP8Is6HJavZYCzG8vnz3wZ9GENujanwLemiIJfINDauybu+nNsE5pO
7v12vWeZ0x2HGM/cFxODrpp4xAkdq8BBLap8/aXB8uJFagmYyhs=
=f3Bh
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- KASAN support for x86_64
- noreboot command line option, just like qemu's -no-reboot
- Various fixes and cleanups
* tag 'for-linus-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: include sys/types.h for size_t
um: Replace to_phys() and to_virt() with less generic function names
um: Add missing apply_returns()
um: add "noreboot" command line option for PANIC_TIMEOUT=-1 setups
um: include linux/stddef.h for __always_inline
UML: add support for KASAN under x86_64
mm: Add PAGE_ALIGN_DOWN macro
um: random: Don't initialise hwrng struct with zero
um: remove unused mm_copy_segments
um: remove unused variable
um: Remove straying parenthesis
um: x86: print RIP with symbol
arch: um: Fix build for statically linked UML w/ constructors
x86/um: Kconfig: Fix indentation
um/drivers: Kconfig: Fix indentation
um: Kconfig: Fix indentation
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmLnDOwACgkQSfxwEqXe
A65Fiw//Z0YaPejSslQIGitQ1b0XzdWBhyJArYDieaaiQRXMqlaSKlIUqHz38xb7
+FykUY51/SJLjHV2riPxq1OK3/MPmk6VlTd0HHihcHVmg77oZcFcv2tPnDpZoqND
TsBOujLbXKwxP8tNFedRY/4+K7w+ue9BTfDjuH7aCtz7uWd+4cNJmPg3x9FCfkMA
+hbcRluwE9W3Pg4OCKwv+qxL0JF3qQtNKEOp1wpnjGAZZW/I9gFNgFBEkykvcAsj
TkIRDc3agPFj6QgDeRIgLdnf9KCsLubKAg5oJneeCvQztJJUCSkn8nQXxpx+4sLo
GsRgvCdfL/GyJqfSAzQJVYDHKtKMkJiCiWCC/oOALR8dzHJfSlULDAjbY1m/DAr9
at+vi4678Or7TNx2ZSaUlCXXKZ+UT7yWMlQWax9JuxGk1hGYP5/eT1AH5SGjqUwF
w1q8oyzxt1vUcnOzEddFXPFirnqqhAk4dQFtu83+xKM4ZssMVyeB4NZdEhAdW0ng
MX+RjrVj4l5gWWuoS0Cx3LUxDCgV6WT0dN+Vl9axAZkoJJbcXLEmXwQ6NbzTLPWg
1/MT7qFTxNcTCeAArMdZvvFbeh7pOBXO42pafrK/7vDRnTMUIw9tqXNLQUfvdFQp
F5flPgiVRHDU2vSzKIFtnPTyXU0RBBGvNb4n0ss2ehH2DSsCxYE=
=Zy3d
-----END PGP SIGNATURE-----
Merge tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"Though there's been a decent amount of RNG-related development during
this last cycle, not all of it is coming through this tree, as this
cycle saw a shift toward tackling early boot time seeding issues,
which took place in other trees as well.
Here's a summary of the various patches:
- The CONFIG_ARCH_RANDOM .config option and the "nordrand" boot
option have been removed, as they overlapped with the more widely
supported and more sensible options, CONFIG_RANDOM_TRUST_CPU and
"random.trust_cpu". This change allowed simplifying a bit of arch
code.
- x86's RDRAND boot time test has been made a bit more robust, with
RDRAND disabled if it's clearly producing bogus results. This would
be a tip.git commit, technically, but I took it through random.git
to avoid a large merge conflict.
- The RNG has long since mixed in a timestamp very early in boot, on
the premise that a computer that does the same things, but does so
starting at different points in wall time, could be made to still
produce a different RNG state. Unfortunately, the clock isn't set
early in boot on all systems, so now we mix in that timestamp when
the time is actually set.
- User Mode Linux now uses the host OS's getrandom() syscall to
generate a bootloader RNG seed and later on treats getrandom() as
the platform's RDRAND-like faculty.
- The arch_get_random_{seed_,}_long() family of functions is now
arch_get_random_{seed_,}_longs(), which enables certain platforms,
such as s390, to exploit considerable performance advantages from
requesting multiple CPU random numbers at once, while at the same
time compiling down to the same code as before on platforms like
x86.
- A small cleanup changing a cmpxchg() into a try_cmpxchg(), from
Uros.
- A comment spelling fix"
More info about other random number changes that come in through various
architecture trees in the full commentary in the pull request:
https://lore.kernel.org/all/20220731232428.2219258-1-Jason@zx2c4.com/
* tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: correct spelling of "overwrites"
random: handle archrandom with multiple longs
um: seed rng using host OS rng
random: use try_cmpxchg in _credit_init_bits
timekeeping: contribute wall clock to rng on time change
x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"
random: remove CONFIG_ARCH_RANDOM
UML generally does not provide access to special CPU instructions like
RDRAND, and execution tends to be rather deterministic, with no real
hardware interrupts, making good randomness really very hard, if not
all together impossible. Not only is this a security eyebrow raiser, but
it's also quite annoying when trying to do various pieces of UML-based
automation that takes a long time to boot, if ever.
Fix this by trivially calling getrandom() in the host and using that
seed as "bootloader randomness", which initializes the rng immediately
at UML boot.
The old behavior can be restored the same way as on any other arch, by
way of CONFIG_TRUST_BOOTLOADER_RANDOMNESS=n or
random.trust_bootloader=0. So seen from that perspective, this just
makes UML act like other archs, which is positive in its own right.
Additionally, wire up arch_get_random_{int,long}() in the same way, so
that reseeds can also make use of the host RNG, controllable by
CONFIG_TRUST_CPU_RANDOMNESS and random.trust_cpu, per usual.
Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This enables ARCH_HAS_VM_GET_PAGE_PROT on the platform and exports
standard vm_get_page_prot() implementation via DECLARE_VM_GET_PAGE_PROT,
which looks up a private and static protection_map[] array. Subsequently
all __SXXX and __PXXX macros can be dropped which are no longer needed.
Link: https://lkml.kernel.org/r/20220711070600.2378316-25-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Implement apply_returns() stub for UM, just like all the other patching
routines.
Fixes: 15e67227c49a ("x86: Undo return-thunk damage")
Reported-by: Randy Dunlap <rdunlap@infradead.org)
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Make KASAN run on User Mode Linux on x86_64.
The UML-specific KASAN initializer uses mmap to map the ~16TB of shadow
memory to the location defined by KASAN_SHADOW_OFFSET. kasan_init()
utilizes constructors to initialize KASAN before main().
The location of the KASAN shadow memory, starting at
KASAN_SHADOW_OFFSET, can be configured using the KASAN_SHADOW_OFFSET
option. The default location of this offset is 0x100000000000, which
keeps it out-of-the-way even on UML setups with more "physical" memory.
For low-memory setups, 0x7fff8000 can be used instead, which fits in an
immediate and is therefore faster, as suggested by Dmitry Vyukov. There
is usually enough free space at this location; however, it is a config
option so that it can be easily changed if needed.
Note that, unlike KASAN on other architectures, vmalloc allocations
still use the shadow memory allocated upfront, rather than allocating
and free-ing it per-vmalloc allocation.
If another architecture chooses to go down the same path, we should
replace the checks for CONFIG_UML with something more generic, such
as:
- A CONFIG_KASAN_NO_SHADOW_ALLOC option, which architectures could set
- or, a way of having architecture-specific versions of these vmalloc
and module shadow memory allocation options.
Also note that, while UML supports both KASAN in inline mode
(CONFIG_KASAN_INLINE) and static linking (CONFIG_STATIC_LINK), it does
not support both at the same time.
Signed-off-by: Patricia Alfonso <trishalfonso@google.com>
Co-developed-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
If CONFIG_CONSTUCTORS is enabled on a statically linked
(CONFIG_STATIC_LINK=y) build of UML, the build fails due to the
.eh_frame section being both used and discarded:
ERROR:root:`.eh_frame' referenced in section `.text' of /usr/lib/gcc/x86_64-linux-gnu/11/crtbeginT.o: defined in discarded section `.eh_frame' of /usr/lib/gcc/x86_64-linux-gnu/11/crtbeginT.o
`.eh_frame' referenced in section `.text' of /usr/lib/gcc/x86_64-linux-gnu/11/crtbeginT.o: defined in discarded section `.eh_frame' of /usr/lib/gcc/x86_64-linux-gnu/11/crtbeginT.o
Instead, keep the .eh_frame section, as we do in dyn.lds.S for
dynamically linked UML.
This can be reproduced with:
./tools/testing/kunit/kunit.py run --kconfig_add CONFIG_STATIC_LINK=y --kconfig_add CONFIG_GCOV_KERNEL=y --kconfig_add CONFIG_DEBUG_FS=y
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Implement apply_returns() stub for UM, just like all the other patching
routines.
Fixes: 15e67227c49a ("x86: Undo return-thunk damage")
Reported-by: Randy Dunlap <rdunlap@infradead.org)
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/Ys%2Ft45l%2FgarIrD0u@worktop.programming.kicks-ass.net
I observed that for each of the shared file-backed page faults, we're very
likely to retry one more time for the 1st write fault upon no page. It's
because we'll need to release the mmap lock for dirty rate limit purpose
with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()).
Then after that throttling we return VM_FAULT_RETRY.
We did that probably because VM_FAULT_RETRY is the only way we can return
to the fault handler at that time telling it we've released the mmap lock.
However that's not ideal because it's very likely the fault does not need
to be retried at all since the pgtable was well installed before the
throttling, so the next continuous fault (including taking mmap read lock,
walk the pgtable, etc.) could be in most cases unnecessary.
It's not only slowing down page faults for shared file-backed, but also add
more mmap lock contention which is in most cases not needed at all.
To observe this, one could try to write to some shmem page and look at
"pgfault" value in /proc/vmstat, then we should expect 2 counts for each
shmem write simply because we retried, and vm event "pgfault" will capture
that.
To make it more efficient, add a new VM_FAULT_COMPLETED return code just to
show that we've completed the whole fault and released the lock. It's also
a hint that we should very possibly not need another fault immediately on
this page because we've just completed it.
This patch provides a ~12% perf boost on my aarch64 test VM with a simple
program sequentially dirtying 400MB shmem file being mmap()ed and these are
the time it needs:
Before: 650.980 ms (+-1.94%)
After: 569.396 ms (+-1.38%)
I believe it could help more than that.
We need some special care on GUP and the s390 pgfault handler (for gmap
code before returning from pgfault), the rest changes in the page fault
handlers should be relatively straightforward.
Another thing to mention is that mm_account_fault() does take this new
fault as a generic fault to be accounted, unlike VM_FAULT_RETRY.
I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do
not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
them as-is.
Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vineet Gupta <vgupta@kernel.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm part]
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Will Deacon <will@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Chris Zankel <chris@zankel.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
of Peter Zijlstra was encountering with ptrace in his freezer rewrite
I identified some cleanups to ptrace_stop that make sense on their own
and move make resolving the other problems much simpler.
The biggest issue is the habbit of the ptrace code to change task->__state
from the tracer to suppress TASK_WAKEKILL from waking up the tracee. No
other code in the kernel does that and it is straight forward to update
signal_wake_up and friends to make that unnecessary.
Peter's task freezer sets frozen tasks to a new state TASK_FROZEN and
then it stores them by calling "wake_up_state(t, TASK_FROZEN)" relying
on the fact that all stopped states except the special stop states can
tolerate spurious wake up and recover their state.
The state of stopped and traced tasked is changed to be stored in
task->jobctl as well as in task->__state. This makes it possible for
the freezer to recover tasks in these special states, as well as
serving as a general cleanup. With a little more work in that
direction I believe TASK_STOPPED can learn to tolerate spurious wake
ups and become an ordinary stop state.
The TASK_TRACED state has to remain a special state as the registers for
a process are only reliably available when the process is stopped in
the scheduler. Fundamentally ptrace needs acess to the saved
register values of a task.
There are bunch of semi-random ptrace related cleanups that were found
while looking at these issues.
One cleanup that deserves to be called out is from commit 57b6de08b5f6
("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs"). This
makes a change that is technically user space visible, in the handling
of what happens to a tracee when a tracer dies unexpectedly.
According to our testing and our understanding of userspace nothing
cares that spurious SIGTRAPs can be generated in that case.
The entire discussion can be found at:
https://lkml.kernel.org/r/87a6bv6dl6.fsf_-_@email.froward.int.ebiederm.org
Eric W. Biederman (11):
signal: Rename send_signal send_signal_locked
signal: Replace __group_send_sig_info with send_signal_locked
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace: Remove arch_ptrace_attach
signal: Use lockdep_assert_held instead of assert_spin_locked
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
ptrace: Document that wait_task_inactive can't fail
ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
ptrace: Don't change __state
ptrace: Always take siglock in ptrace_resume
Peter Zijlstra (1):
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
arch/ia64/include/asm/ptrace.h | 4 --
arch/ia64/kernel/ptrace.c | 57 ----------------
arch/um/include/asm/thread_info.h | 2 +
arch/um/kernel/exec.c | 2 +-
arch/um/kernel/process.c | 2 +-
arch/um/kernel/ptrace.c | 8 +--
arch/um/kernel/signal.c | 4 +-
arch/x86/kernel/step.c | 3 +-
arch/xtensa/kernel/ptrace.c | 4 +-
arch/xtensa/kernel/signal.c | 4 +-
drivers/tty/tty_jobctrl.c | 4 +-
include/linux/ptrace.h | 7 --
include/linux/sched.h | 10 ++-
include/linux/sched/jobctl.h | 8 +++
include/linux/sched/signal.h | 20 ++++--
include/linux/signal.h | 3 +-
kernel/ptrace.c | 87 ++++++++---------------
kernel/sched/core.c | 5 +-
kernel/signal.c | 140 +++++++++++++++++---------------------
kernel/time/posix-cpu-timers.c | 6 +-
20 files changed, 140 insertions(+), 240 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmKaXaYACgkQC/v6Eiaj
j0CgoA/+JncSQ6PY2D5Jh1apvHzmnRsFXzr3DRvtv/CVx4oIebOXRQFyVDeD5tRn
TmMgB29HpBlHRDLojlmlZRGAld1HR/aPEW9j8W1D3Sy/ZFO5L8lQitv9aDHO9Ntw
4lZvlhS1M0KhATudVVBqSPixiG6CnV5SsGmixqdOyg7xcXSY6G1l2nB7Zk9I3Tat
ZlmhuZ6R5Z5qsm4MEq0vUSrnsHiGxYrpk6uQOaVz8Wkv8ZFmbutt6XgxF0tsyZNn
mHSmWSiZzIgBjTlaibEmxi8urYJTPj3vGBeJQVYHblFwLFi6+Oy7bDxQbWjQvaZh
DsgWPScfBF4Jm0+8hhCiSYpvPp8XnZuklb4LNCeok/VFr+KfSmpJTIhn00kagQ1u
vxQDqLws8YLW4qsfGydfx9uUIFCbQE/V2VDYk5J3Re3gkUNDOOR1A56hPniKv6VB
2aqGO2Fl0RdBbUa3JF+XI5Pwq5y1WrqR93EUvj+5+u5W9rZL/8WLBHBMEz6gbmfD
DhwFE0y8TG2WRlWJVEDRId+5zo3di/YvasH0vJZ5HbrxhS2RE/yIGAd+kKGx/lZO
qWDJC7IHvFJ7Mw5KugacyF0SHeNdloyBM7KZW6HeXmgKn9IMJBpmwib92uUkRZJx
D8j/bHHqD/zsgQ39nO+c4M0MmhO/DsPLG/dnGKrRCu7v1tmEnkY=
=ZUuO
-----END PGP SIGNATURE-----
Merge tag 'ptrace_stop-cleanup-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace_stop cleanups from Eric Biederman:
"While looking at the ptrace problems with PREEMPT_RT and the problems
Peter Zijlstra was encountering with ptrace in his freezer rewrite I
identified some cleanups to ptrace_stop that make sense on their own
and move make resolving the other problems much simpler.
The biggest issue is the habit of the ptrace code to change
task->__state from the tracer to suppress TASK_WAKEKILL from waking up
the tracee. No other code in the kernel does that and it is straight
forward to update signal_wake_up and friends to make that unnecessary.
Peter's task freezer sets frozen tasks to a new state TASK_FROZEN and
then it stores them by calling "wake_up_state(t, TASK_FROZEN)" relying
on the fact that all stopped states except the special stop states can
tolerate spurious wake up and recover their state.
The state of stopped and traced tasked is changed to be stored in
task->jobctl as well as in task->__state. This makes it possible for
the freezer to recover tasks in these special states, as well as
serving as a general cleanup. With a little more work in that
direction I believe TASK_STOPPED can learn to tolerate spurious wake
ups and become an ordinary stop state.
The TASK_TRACED state has to remain a special state as the registers
for a process are only reliably available when the process is stopped
in the scheduler. Fundamentally ptrace needs acess to the saved
register values of a task.
There are bunch of semi-random ptrace related cleanups that were found
while looking at these issues.
One cleanup that deserves to be called out is from commit 57b6de08b5f6
("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs"). This
makes a change that is technically user space visible, in the handling
of what happens to a tracee when a tracer dies unexpectedly. According
to our testing and our understanding of userspace nothing cares that
spurious SIGTRAPs can be generated in that case"
* tag 'ptrace_stop-cleanup-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
ptrace: Always take siglock in ptrace_resume
ptrace: Don't change __state
ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
ptrace: Document that wait_task_inactive can't fail
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
signal: Use lockdep_assert_held instead of assert_spin_locked
ptrace: Remove arch_ptrace_attach
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
signal: Replace __group_send_sig_info with send_signal_locked
signal: Rename send_signal send_signal_locked
User mode linux is the last user of the PT_DTRACE flag. Using the flag to indicate
single stepping is a little confusing and worse changing tsk->ptrace without locking
could potentionally cause problems.
So use a thread info flag with a better name instead of flag in tsk->ptrace.
Remove the definition PT_DTRACE as uml is the last user.
Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-3-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Add fn and fn_arg members into struct kernel_clone_args and test for
them in copy_thread (instead of testing for PF_KTHREAD | PF_IO_WORKER).
This allows any task that wants to be a user space task that only runs
in kernel mode to use this functionality.
The code on x86 is an exception and still retains a PF_KTHREAD test
because x86 unlikely everything else handles kthreads slightly
differently than user space tasks that start with a function.
The functions that created tasks that start with a function
have been updated to set ".fn" and ".fn_arg" instead of
".stack" and ".stack_size". These functions are fork_idle(),
create_io_thread(), kernel_thread(), and user_mode_thread().
Link: https://lkml.kernel.org/r/20220506141512.516114-4-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
With io_uring we have started supporting tasks that are for most
purposes user space tasks that exclusively run code in kernel mode.
The kernel task that exec's init and tasks that exec user mode
helpers are also user mode tasks that just run kernel code
until they call kernel execve.
Pass kernel_clone_args into copy_thread so these oddball
tasks can be supported more cleanly and easily.
v2: Fix spelling of kenrel_clone_args on h8300
Link: https://lkml.kernel.org/r/20220506141512.516114-2-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
- Devicetree support (for testing)
- Various cleanups and fixes: UBD, port_user, uml_mconsole
- Maintainer update
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmJFwUMWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wQqBD/9gLyeiVp2eu1YFVir64IASgVjK
lNdlAfUwfebtEsw65JcfY8K64910ahw6TvkjTT2A+QGeJIYaVwmw69bLXJUvQq31
C7ZDsMHptuNiZrHDL9SoA0DfwqRdJx3tgGzDnSkhX+2T7Zs5n1nLRMBmn/NJV9Qy
CmxG9fLH1VsU0p6RI76WST3GPLOqWa3jCeHK1vMGZNXI+eo5prHc59lkOcT7lEy7
M4vJRaAV6pCDDYMQdDOYr1PDEeG7/h49EqdKylkOhonDyYB649rL6Lc9nRBvSts3
NXX/qYy1Sj1AlOSR5IOon6QCyk1hap9kr85QoCtz3VMabD/yLlBovZzLOLaF+0S6
dQWgKg806g8QYQGxN03Ph0Pb5cA6hAjr8nVmAuICJDWgmY6Oo74pEvhI8toofFzk
NJzwa6G99xNhfggeTcGdG0ddQDT8N3enKspDPkzpN127GzU5cgvI1Z8wnZXB7JDM
zLMCxzwehocCSrFlh9aQDFK1XJfEWuP66xEPl5cX46//IMKqsrXEOjNlCTRUmA5F
OhU4qqb01OW3K4HPaAkBcGPZ0HhFn6JREUFyNW07dg6s73IWzf0CaNKeYJS7abln
tdvfPg3OPNXCjHd3aCW22EzuB9R/K8BNMkva3QQZxtUa+tOjBdBd9JBJ+vHGA1MN
7/k60wl1dt8/N9yHFg==
=YsK8
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Devicetree support (for testing)
- Various cleanups and fixes: UBD, port_user, uml_mconsole
- Maintainer update
* tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: run_helper: Write error message to kernel log on exec failure on host
um: port_user: Improve error handling when port-helper is not found
um: port_user: Allow setting path to port-helper using UML_PORT_HELPER envvar
um: port_user: Search for in.telnetd in PATH
um: clang: Strip out -mno-global-merge from USER_CFLAGS
docs: UML: Mention telnetd for port channel
um: Remove unused timeval_to_ns() function
um: Fix uml_mconsole stop/go
um: Cleanup syscall_handler_t definition/cast, fix warning
uml: net: vector: fix const issue
um: Fix WRITE_ZEROES in the UBD Driver
um: Migrate vector drivers to NAPI
um: Fix order of dtb unflatten/early init
um: fix and optimize xor select template for CONFIG64 and timetravel mode
um: Document dtb command line option
lib/logic_iomem: correct fallback config references
um: Remove duplicated include in syscalls_64.c
MAINTAINERS: Update UserModeLinux entry
This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was
around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
making the semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where now anything left in tracehook.h is
some weird strange thing that is difficult to understand.
Eric W. Biederman (15):
ptrace: Move ptrace_report_syscall into ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Remove tracehook_signal_handler
task_work: Remove unnecessary include from posix_timers.h
task_work: Introduce task_work_pending
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
resume_user_mode: Move to resume_user_mode.h
tracehook: Remove tracehook.h
ptrace: Move setting/clearing ptrace_message into ptrace_stop
ptrace: Return the signal to continue with from ptrace_stop
Jann Horn (1):
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
Yang Li (1):
ptrace: Remove duplicated include in ptrace.c
MAINTAINERS | 1 -
arch/Kconfig | 5 +-
arch/alpha/kernel/ptrace.c | 5 +-
arch/alpha/kernel/signal.c | 4 +-
arch/arc/kernel/ptrace.c | 5 +-
arch/arc/kernel/signal.c | 4 +-
arch/arm/kernel/ptrace.c | 12 +-
arch/arm/kernel/signal.c | 4 +-
arch/arm64/kernel/ptrace.c | 14 +--
arch/arm64/kernel/signal.c | 4 +-
arch/csky/kernel/ptrace.c | 5 +-
arch/csky/kernel/signal.c | 4 +-
arch/h8300/kernel/ptrace.c | 5 +-
arch/h8300/kernel/signal.c | 4 +-
arch/hexagon/kernel/process.c | 4 +-
arch/hexagon/kernel/signal.c | 1 -
arch/hexagon/kernel/traps.c | 6 +-
arch/ia64/kernel/process.c | 4 +-
arch/ia64/kernel/ptrace.c | 6 +-
arch/ia64/kernel/signal.c | 1 -
arch/m68k/kernel/ptrace.c | 5 +-
arch/m68k/kernel/signal.c | 4 +-
arch/microblaze/kernel/ptrace.c | 5 +-
arch/microblaze/kernel/signal.c | 4 +-
arch/mips/kernel/ptrace.c | 5 +-
arch/mips/kernel/signal.c | 4 +-
arch/nds32/include/asm/syscall.h | 2 +-
arch/nds32/kernel/ptrace.c | 5 +-
arch/nds32/kernel/signal.c | 4 +-
arch/nios2/kernel/ptrace.c | 5 +-
arch/nios2/kernel/signal.c | 4 +-
arch/openrisc/kernel/ptrace.c | 5 +-
arch/openrisc/kernel/signal.c | 4 +-
arch/parisc/kernel/ptrace.c | 7 +-
arch/parisc/kernel/signal.c | 4 +-
arch/powerpc/kernel/ptrace/ptrace.c | 8 +-
arch/powerpc/kernel/signal.c | 4 +-
arch/riscv/kernel/ptrace.c | 5 +-
arch/riscv/kernel/signal.c | 4 +-
arch/s390/include/asm/entry-common.h | 1 -
arch/s390/kernel/ptrace.c | 1 -
arch/s390/kernel/signal.c | 5 +-
arch/sh/kernel/ptrace_32.c | 5 +-
arch/sh/kernel/signal_32.c | 4 +-
arch/sparc/kernel/ptrace_32.c | 5 +-
arch/sparc/kernel/ptrace_64.c | 5 +-
arch/sparc/kernel/signal32.c | 1 -
arch/sparc/kernel/signal_32.c | 4 +-
arch/sparc/kernel/signal_64.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/um/kernel/ptrace.c | 5 +-
arch/x86/kernel/ptrace.c | 1 -
arch/x86/kernel/signal.c | 5 +-
arch/x86/mm/tlb.c | 1 +
arch/xtensa/kernel/ptrace.c | 5 +-
arch/xtensa/kernel/signal.c | 4 +-
block/blk-cgroup.c | 2 +-
fs/coredump.c | 1 -
fs/exec.c | 1 -
fs/io-wq.c | 6 +-
fs/io_uring.c | 11 +-
fs/proc/array.c | 1 -
fs/proc/base.c | 1 -
include/asm-generic/syscall.h | 2 +-
include/linux/entry-common.h | 47 +-------
include/linux/entry-kvm.h | 2 +-
include/linux/posix-timers.h | 1 -
include/linux/ptrace.h | 81 ++++++++++++-
include/linux/resume_user_mode.h | 64 ++++++++++
include/linux/sched/signal.h | 17 +++
include/linux/task_work.h | 5 +
include/linux/tracehook.h | 226 -----------------------------------
include/uapi/linux/ptrace.h | 2 +-
kernel/entry/common.c | 19 +--
kernel/entry/kvm.c | 9 +-
kernel/exit.c | 3 +-
kernel/livepatch/transition.c | 1 -
kernel/ptrace.c | 47 +++++---
kernel/seccomp.c | 1 -
kernel/signal.c | 62 +++++-----
kernel/task_work.c | 4 +-
kernel/time/posix-cpu-timers.c | 1 +
mm/memcontrol.c | 2 +-
security/apparmor/domain.c | 1 -
security/selinux/hooks.c | 1 -
85 files changed, 372 insertions(+), 495 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
/HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
=uEro
-----END PGP SIGNATURE-----
Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace cleanups from Eric Biederman:
"This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was around
task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where anything left in tracehook.h was
some weird strange thing that was difficult to understand"
* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: Remove duplicated include in ptrace.c
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
ptrace: Return the signal to continue with from ptrace_stop
ptrace: Move setting/clearing ptrace_message into ptrace_stop
tracehook: Remove tracehook.h
resume_user_mode: Move to resume_user_mode.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Introduce task_work_pending
task_work: Remove unnecessary include from posix_timers.h
ptrace: Remove tracehook_signal_handler
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Move ptrace_report_syscall into ptrace.h
Objtool's --ibt option generates .ibt_endbr_seal which lists
superfluous ENDBR instructions. That is those instructions for which
the function is never indirectly called.
Overwrite these ENDBR instructions with a NOP4 such that these
function can never be indirect called, reducing the number of viable
ENDBR targets in the kernel.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.822545231@infradead.org
Scan the tree for reserved memory before unflattening it. As Rob
Herring said:
| These should be reversed. early_init_fdt_scan_reserved_mem() works on
| the flat tree. Reserved memory needs to be reserved before
| unflatten_device_tree() starts allocating memory. Though I imagine that
| doesn't really matter for UML.
Suggested-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/lkml/YeF%2FVbxo6fAt0WLp@robh.at.kernel.org/
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Move set_notify_resume and tracehook_notify_resume into resume_user_mode.h.
While doing that rename tracehook_notify_resume to resume_user_mode_work.
Update all of the places that included tracehook.h for these functions to
include resume_user_mode.h instead.
Update all of the callers of tracehook_notify_resume to call
resume_user_mode_work.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-12-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Rename tracehook_report_syscall_{entry,exit} to
ptrace_report_syscall_{entry,exit} and place them in ptrace.h
There is no longer any generic tracehook infractructure so make
these ptrace specific functions ptrace specific.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-3-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Merge misc updates from Andrew Morton:
"146 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
mm/damon: hide kernel pointer from tracepoint event
mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
mm/damon/dbgfs: remove an unnecessary variable
mm/damon: move the implementation of damon_insert_region to damon.h
mm/damon: add access checking for hugetlb pages
Docs/admin-guide/mm/damon/usage: update for schemes statistics
mm/damon/dbgfs: support all DAMOS stats
Docs/admin-guide/mm/damon/reclaim: document statistics parameters
mm/damon/reclaim: provide reclamation statistics
mm/damon/schemes: account how many times quota limit has exceeded
mm/damon/schemes: account scheme actions that successfully applied
mm/damon: remove a mistakenly added comment for a future feature
Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
Docs/admin-guide/mm/damon/usage: remove redundant information
Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
mm/damon: convert macro functions to static inline functions
mm/damon: modify damon_rand() macro to static inline function
mm/damon: move damon_rand() definition into damon.h
...
Since commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple
times") allowed VM_FAULT_RETRY for multiple times, the
FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page
fault path, so the following check is no longer needed:
flags & FAULT_FLAG_ALLOW_RETRY
So just remove it.
[akpm@linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Peter Xu <peterx@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"Lots of cleanups and preparation; highlights:
- futex: Cleanup and remove runtime futex_cmpxchg detection
- rtmutex: Some fixes for the PREEMPT_RT locking infrastructure
- kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
annotate the racy owner->on_cpu access *once*.
- atomic64: Dead-Code-Elemination"
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHdvssACgkQEsHwGGHe
VUrbBg//VQvz5BwddIJDj9utt5AvSixNcTF5mJyFKCSIqO0S4J8nCNcvJjZ2bs4S
w1YmInFbp0WFGUhaIZiw0e6KWJUoINTng4MfHDZosS1doT2of53ZaQqXs3i81jDz
87w8ADVHL0x4+BNjdsIwbcuPSDTmJFoyFOdeXTIl9hv9ZULT8m4Mt+LJuUHNZ+vF
rS1jyseVPWkcm5y+Yie0rhip+ygzbfbt0ArsLfRcrBJsKr6oxLxV2DDF+2djXuuP
d2OgGT7VkbgAhoKpzVXUiHsT6ppR5Mn5TLSa4EZ4bPPCUFldOhKuCAImF3T6yVIa
44iX5vQN9v5VHBy6ocPbdOIBuYBYVGCMurh1t7pbpB6G+mmSxMiyta5MY37POwjv
K2JT9mC2A6a4d17gue5FT3mnJMBB4eHwVaDfAwCZs/5rRNuoTz4aY5Xy04Mq0ltI
39uarwBd5hwSugBWg44AS5E9h52E654FQ7g6iS4NtUvJuuaXBTl43EcZWx2+mnPL
zY+iOMVMgg33VIVcm/mlf/6zWL0LXPmILUiA1fp4Q9/n8u1EuOOyeA/GsC9Pl3wO
HY3KpYJA5eQpIk/JEnzKm5ZE3pCrUdH6VDC/SB4owQtafQG6OxyQVP1Gj7KYxZsD
NqqpJ4nkKooc5f5DqVEN8wrjyYsnVxEfriEG09OoR6wI3MqyUA4=
=vrYy
-----END PGP SIGNATURE-----
Merge tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Borislav Petkov:
"Lots of cleanups and preparation. Highlights:
- futex: Cleanup and remove runtime futex_cmpxchg detection
- rtmutex: Some fixes for the PREEMPT_RT locking infrastructure
- kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
annotate the racy owner->on_cpu access *once*.
- atomic64: Dead-Code-Elemination"
[ Description above by Peter Zijlstra ]
* tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/atomic: atomic64: Remove unusable atomic ops
futex: Fix additional regressions
locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h
x86/mm: Include spinlock_t definition in pgtable.
locking: Mark racy reads of owner->on_cpu
locking: Make owner_on_cpu() into <linux/sched.h>
lockdep/selftests: Adapt ww-tests for PREEMPT_RT
lockdep/selftests: Skip the softirq related tests on PREEMPT_RT
lockdep/selftests: Unbalanced migrate_disable() & rcu_read_lock().
lockdep/selftests: Avoid using local_lock_{acquire|release}().
lockdep: Remove softirq accounting on PREEMPT_RT.
locking/rtmutex: Add rt_mutex_lock_nest_lock() and rt_mutex_lock_killable().
locking/rtmutex: Squash self-deadlock check for ww_rt_mutex.
locking: Remove rt_rwlock_is_contended().
sched: Trigger warning if ->migration_disabled counter underflows.
futex: Fix sparc32/m68k/nds32 build regression
futex: Remove futex_cmpxchg detection
futex: Ensure futex_atomic_cmpxchg_inatomic() is present
kernel/locking: Use a pointer in ww_mutex_trylock().
Add a dtb=<filename> option to boot UML with a devicetree blob. This
can be used for testing driver code using UML.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
[rw: Add dependency on CONFIG_OF]
Signed-off-by: Richard Weinberger <richard@nod.at>
The file loading support in initrd.c can be re-used for
loading devicetrees. Move it out of initrd.c.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Remove address space overrides using set_fs() for User Mode Linux.
Note that just like the existing kernel access case of the uaccess
routines the new nofault kernel handlers do not actually have any
exception handling. This is probably broken, but not change to the
status quo.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Only one extern in there is needed in processor-generic.h, and it's
not needed anywhere else. So move it over there and get rid of
the include in processor-generic.h, adding includes of registers.h
to the few files that need the declarations in it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
Fix the following coccinelle reports:
./arch/um/kernel/mem.c:89:2-5: WARNING: Use BUG_ON instead of if
condition followed by BUG.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Signed-off-by: Richard Weinberger <richard@nod.at>
Rename set_signals() as there's at least one driver that
uses the same name and can now be built on UM due to PCI
support, and thus we can get symbol conflicts.
Also rename set_signals_trace() to be consistent.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Now that all architectures have a working futex implementation in any
configuration, remove the runtime detection code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Vineet Gupta <vgupta@kernel.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20211026100432.1730393-2-arnd@kernel.org
Pull exit cleanups from Eric Biederman:
"While looking at some issues related to the exit path in the kernel I
found several instances where the code is not using the existing
abstractions properly.
This set of changes introduces force_fatal_sig a way of sending a
signal and not allowing it to be caught, and corrects the misuse of
the existing abstractions that I found.
A lot of the misuse of the existing abstractions are silly things such
as doing something after calling a no return function, rolling BUG by
hand, doing more work than necessary to terminate a kernel thread, or
calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL).
In the review a deficiency in force_fatal_sig and force_sig_seccomp
where ptrace or sigaction could prevent the delivery of the signal was
found. I have added a change that adds SA_IMMUTABLE to change that
makes it impossible to interrupt the delivery of those signals, and
allows backporting to fix force_sig_seccomp
And Arnd found an issue where a function passed to kthread_run had the
wrong prototype, and after my cleanup was failing to build."
* 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
soc: ti: fix wkup_m3_rproc_boot_thread return type
signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed
signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
exit/r8188eu: Replace the macro thread_exit with a simple return 0
exit/rtl8712: Replace the macro thread_exit with a simple return 0
exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
exit/syscall_user_dispatch: Send ordinary signals on failure
signal: Implement force_fatal_sig
exit/kthread: Have kernel threads return instead of calling do_exit
signal/s390: Use force_sigsegv in default_trap_handler
signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
signal/sparc: In setup_tsb_params convert open coded BUG into BUG
signal/powerpc: On swapcontext failure force SIGSEGV
signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
signal/sparc32: Remove unreachable do_exit in do_sparc_fault
...
Merge misc updates from Andrew Morton:
"257 patches.
Subsystems affected by this patch series: scripts, ocfs2, vfs, and
mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
cleanups, kfence, and damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
mm/damon: remove return value from before_terminate callback
mm/damon: fix a few spelling mistakes in comments and a pr_debug message
mm/damon: simplify stop mechanism
Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
Docs/admin-guide/mm/damon/start: simplify the content
Docs/admin-guide/mm/damon/start: fix a wrong link
Docs/admin-guide/mm/damon/start: fix wrong example commands
mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
mm/damon: remove unnecessary variable initialization
Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
selftests/damon: support watermarks
mm/damon/dbgfs: support watermarks
mm/damon/schemes: activate schemes based on a watermarks mechanism
tools/selftests/damon: update for regions prioritization of schemes
mm/damon/dbgfs: support prioritization weights
mm/damon/vaddr,paddr: support pageout prioritization
mm/damon/schemes: prioritize regions within the quotas
mm/damon/selftests: support schemes quotas
mm/damon/dbgfs: support quotas of schemes
...
Rename memblock_free_ptr() to memblock_free() and use memblock_free()
when freeing a virtual pointer so that memblock_free() will be a
counterpart of memblock_alloc()
The callers are updated with the below semantic patch and manual
addition of (void *) casting to pointers that are represented by
unsigned long variables.
@@
identifier vaddr;
expression size;
@@
(
- memblock_phys_free(__pa(vaddr), size);
+ memblock_free(vaddr, size);
|
- memblock_free_ptr(vaddr, size);
+ memblock_free(vaddr, size);
)
[sfr@canb.auug.org.au: fixup]
Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au
Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().
The callers are updated with the below semantic patch:
@@
expression addr;
expression size;
@@
- memblock_free(addr, size);
+ memblock_phys_free(addr, size);
Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Revert the printk format based wchan() symbol resolution as it can leak
the raw value in case that the symbol is not resolvable.
- Make wchan() more robust and work with all kind of unwinders by
enforcing that the task stays blocked while unwinding is in progress.
- Prevent sched_fork() from accessing an invalid sched_task_group
- Improve asymmetric packing logic
- Extend scheduler statistics to RT and DL scheduling classes and add
statistics for bandwith burst to the SCHED_FAIR class.
- Properly account SCHED_IDLE entities
- Prevent a potential deadlock when initial priority is assigned to a
newly created kthread. A recent change to plug a race between cpuset and
__sched_setscheduler() introduced a new lock dependency which is now
triggered. Break the lock dependency chain by moving the priority
assignment to the thread function.
- Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
- Improve idle balancing in general and especially for NOHZ enabled
systems.
- Provide proper interfaces for live patching so it does not have to
fiddle with scheduler internals.
- Add cluster aware scheduling support.
- A small set of tweaks for RT (irqwork, wait_task_inactive(), various
scheduler options and delaying mmdrop)
- The usual small tweaks and improvements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29
iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec
/1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t
aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7
3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo
ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1
vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6
mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci
V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy
s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6
i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj
d2qWG7UvoseT+g==
=fgtS
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- Revert the printk format based wchan() symbol resolution as it can
leak the raw value in case that the symbol is not resolvable.
- Make wchan() more robust and work with all kind of unwinders by
enforcing that the task stays blocked while unwinding is in progress.
- Prevent sched_fork() from accessing an invalid sched_task_group
- Improve asymmetric packing logic
- Extend scheduler statistics to RT and DL scheduling classes and add
statistics for bandwith burst to the SCHED_FAIR class.
- Properly account SCHED_IDLE entities
- Prevent a potential deadlock when initial priority is assigned to a
newly created kthread. A recent change to plug a race between cpuset
and __sched_setscheduler() introduced a new lock dependency which is
now triggered. Break the lock dependency chain by moving the priority
assignment to the thread function.
- Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
- Improve idle balancing in general and especially for NOHZ enabled
systems.
- Provide proper interfaces for live patching so it does not have to
fiddle with scheduler internals.
- Add cluster aware scheduling support.
- A small set of tweaks for RT (irqwork, wait_task_inactive(), various
scheduler options and delaying mmdrop)
- The usual small tweaks and improvements all over the place
* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
sched/fair: Cleanup newidle_balance
sched/fair: Remove sysctl_sched_migration_cost condition
sched/fair: Wait before decaying max_newidle_lb_cost
sched/fair: Skip update_blocked_averages if we are defering load balance
sched/fair: Account update_blocked_averages in newidle_balance cost
x86: Fix __get_wchan() for !STACKTRACE
sched,x86: Fix L2 cache mask
sched/core: Remove rq_relock()
sched: Improve wake_up_all_idle_cpus() take #2
irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
sched: Add cluster scheduler level for x86
sched: Add cluster scheduler level in core and related Kconfig for ARM64
topology: Represent clusters of CPUs within a die
sched: Disable -Wunused-but-set-variable
sched: Add wrapper for get_wchan() to keep task blocked
x86: Fix get_wchan() to support the ORC unwinder
proc: Use task_is_running() for wchan in /proc/$pid/stat
...
Now that force_fatal_sig exists it is unnecessary and a bit confusing
to use force_sigsegv in cases where the simpler force_fatal_sig is
wanted. So change every instance we can to make the code clearer.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Link: https://lkml.kernel.org/r/877de7jrev.fsf@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Rewrite retpoline thunk call sites to be indirect calls for
spectre_v2=off. This ensures spectre_v2=off is as near to a
RETPOLINE=n build as possible.
This is the replacement for objtool writing alternative entries to
ensure the same and achieves feature-parity with the previous
approach.
One noteworthy feature is that it relies on the thunks to be in
machine order to compute the register index.
Specifically, this does not yet address the Jcc __x86_indirect_thunk_*
calls generated by clang, a future patch will add this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.232495794@infradead.org
Having a stable wchan means the process must be blocked and for it to
stay that way while performing stack unwinding.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
- Support for VMAP_STACK
- Support for splice_write in hostfs
- Fixes for virt-pci
- Fixes for virtio_uml
- Various fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmE6Xv4WHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wbpMD/0UBswFdI9J6ePQf2+UyQ3sfFay
xZ5/gyL+Ou0k/hwcjLx4DtIQBXkNiwgiKF+ncHvMXTr/oKAo5f7UsGYyMNIKlbKO
LrIpc6avqmeovTtOuVhm6VML/m7rvJYC/wJ0VFu6CN2aELoRZLXfeogwn1beAl6p
3JKc54tbew5022lZF6Df/QEpkCyuOjWMnEn/khJGuz+vmkodV+5cegZqxJIAnWrU
NVGf7laiV+rBWY4SVXiuJBGTNFwLZkORNa5evBScum85aqwaFawepZT0pNKEt4tc
Lalyy7jACriWeQJeQksWACfexYFPywQU/ebYcAlQ9b0wd5aZxi8IJc9wj0a1Oz3N
i2DEf09/Zk8eE1cbpp6GP+pbvlqNVsAgtLane2Wzxc1kuJGiFYeXCiDyCFzbhbxW
rsTiP3oAxC7OjFwebmtCvBbK9GSl5ETDwfOg+nl2idIK0cds292ju3bWL9vO6VRP
Cjxzn7ZaJYvPlrRHo5yujLURqRZSrkPcL/XthIDQJNjXMd8j2AYMRVM2n0gFLu7g
jSphwg8t3SmCrolGtUucadTPNMR5pE3rQTN+tbhqwGp+Cs+MnM7CqKUv+JoRC7KF
1qH/1p9tiz/utIpjKmvNZtZRwnElBoEgyoY6RdtqlCMnDcuLpDdmCRyWDsHAzXKg
1X9ym5QqDj5zSLxsXg==
=RgAO
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Support for VMAP_STACK
- Support for splice_write in hostfs
- Fixes for virt-pci
- Fixes for virtio_uml
- Various fixes
* tag 'for-linus-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: fix stub location calculation
um: virt-pci: fix uapi documentation
um: enable VMAP_STACK
um: virt-pci: don't do DMA from stack
hostfs: support splice_write
um: virtio_uml: fix memory leak on init failures
um: virtio_uml: include linux/virtio-uml.h
lib/logic_iomem: fix sparse warnings
um: make PCI emulation driver init/exit static
Merge more updates from Andrew Morton:
"147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f.
Subsystems affected by this patch series: mm (memory-hotplug, rmap,
ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
selftests, ipc, and scripts"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
scripts: check_extable: fix typo in user error message
mm/workingset: correct kernel-doc notations
ipc: replace costly bailout check in sysvipc_find_ipc()
selftests/memfd: remove unused variable
Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
configs: remove the obsolete CONFIG_INPUT_POLLDEV
prctl: allow to setup brk for et_dyn executables
pid: cleanup the stale comment mentioning pidmap_init().
kernel/fork.c: unexport get_{mm,task}_exe_file
coredump: fix memleak in dump_vma_snapshot()
fs/coredump.c: log if a core dump is aborted due to changed file permissions
nilfs2: use refcount_dec_and_lock() to fix potential UAF
nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
nilfs2: fix NULL pointer in nilfs_##name##_attr_release
nilfs2: fix memory leak in nilfs_sysfs_create_device_group
trap: cleanup trap_init()
init: move usermodehelper_enable() to populate_rootfs()
...
There are some empty trap_init() definitions in different ARCHs, Introduce
a new weak trap_init() function to clean them up.
Link: https://lkml.kernel.org/r/20210812123602.76356-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm32]
Acked-by: Vineet Gupta [arc]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <palmerdabbelt@google.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 9f0b4807a44f ("um: rework userspace stubs to not hard-code
stub location") I changed stub_segv_handler() to do a calculation with
a pointer to a stack variable to find the data page that we're using
for the stack and the rest of the data. This same commit was meant to
do it as well for stub_clone_handler(), but the change inadvertently
went into commit 84b2789d6115 ("um: separate child and parent errors
in clone stub") instead.
This was reported to not be compiled correctly by gcc 5, causing the
code to crash here. I'm not sure why, perhaps it's UB because the var
isn't initialized? In any case, this trick always seemed bad, so just
create a new inline function that does the calculation in assembly.
Reported-by: subashab@codeaurora.org
Fixes: 9f0b4807a44f ("um: rework userspace stubs to not hard-code stub location")
Fixes: 84b2789d6115 ("um: separate child and parent errors in clone stub")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This is a preparation for changing over architectures to the
generic implementation one at a time. As there are no callers
of either __strncpy_from_user() or __strnlen_user(), fold these
into the strncpy_from_user() and strnlen_user() functions to make
each implementation independent of the others.
Many of these implementations have known bugs, but the intention
here is to not change behavior at all and stay compatible with
those bugs for the moment.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>