When we switch to use seccomp, we need both the signal stack and other
data (i.e. syscall information) to co-exist in the stub data. To
facilitate this, start by defining separate memory areas for the stack
and syscall data.
This moves the signal stack onto a new page as the memory area is not
sufficient to hold both signal stack and syscall information.
Only change the signal stack setup for now, as the syscall code will be
reworked later.
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Link: https://patch.msgid.link/20240703134536.1161108-3-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
With external time travel, a LOT of message can end up
being exchanged on the socket, taking a significant
amount of time just to do that.
Add a new shared memory optimisation to that, where a
number of changes are made:
- the controller sends a client ID and a shared memory FD
(and a logging FD we don't use) in the ACK message to
the initial START
- the shared memory holds the current time and the
free_until value, so that there's no need to exchange
messages for that
- if the client that's running has shared memory support,
any client (the running one included) can request the
next time it wants to run inside the shared memory,
rather than sending a message, by also updating the
free_until value
- when shared memory is enabled, RUN/WAIT messages no
longer have an ACK, further cutting down on messages
Together, this can reduce the number of messages very
significantly, and reduce overall test/simulation run time.
Co-developed-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Link: https://patch.msgid.link/20240702192118.6ad0a083f574.Ie41206c8ce4507fe26b991937f47e86c24ca7a31@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a message type to the time-travel protocol to broadcast
a small (64-bit) value to all participants in a simulation.
The main use case is to have an identical message come to
all participants in a simulation, e.g. to separate out logs
for different tests running in a single simulation.
Down in the guts of time_travel_handle_message() we can't
use printk() and not even printk_deferred(), so just store
the message and print it at the start of the userspace()
function.
Unfortunately this means that other prints in the kernel
can actually bypass the message, but in most cases where
this is used, for example to separate test logs, userspace
will be involved. Also, even if we could use
printk_deferred(), we'd still need to flush it out in the
userspace() function since otherwise userspace messages
might cross it.
As a result, this is a reasonable compromise, there's no
need to have any core changes and it solves the main use
case we have for it.
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Link: https://patch.msgid.link/20240702192118.c4093bc5b15e.I2ca8d006b67feeb866ac2017af7b741c9e06445a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Current calculation of max_low_pfn is introduced in commit af84eab20891
("[PATCH] uml: fix LVM crash"). It is intended to set max_low_pfn to the
same value as max_pfn.
But I am not sure why the max_pfn is set to totalram_pages, which
represents the number of usable pages in system instead of an absolute
page frame number. (The change history stops there.)
While we have already calculate it in setup_physmem(), so not necessary
to do it again.
Also this would help changing totalram_pages accounting, since we plan
to move the accounting into __free_pages_core(). With this change,
totalram_pages may not represent the total usable pages at this point,
since some pages would be deferred initialized.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Alasdair G Kergon <agk@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Mike Rapoport (IBM) <rppt@kernel.org>
CC: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Link: https://patch.msgid.link/20240615034150.2958-1-richard.weiyang@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently /proc/sysemu will never be registered, as sysemu_supported
is initialized to zero implicitly and no code updates it. And there is
also nothing to configure via sysemu in UML anymore.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20240527134024.1539848-3-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When in time-travel mode, the eventfd events are read even when signals
are blocked as SIGIO still needs to be processed. In this case, the
event is cleared on the eventfd but the IRQ still needs to be fired
later.
We did already ensure that the SIGIO handler is run again. However, the
FDs are configured to be level triggered, so that eventfd will not
notify again. As such, add some logic to mark the IRQ as pending and
process it at the next opportunity.
To avoid duplication, reuse the logic used for the suspend/resume case.
This does not really change anything except for delaying running the
IRQs with timetravel_handler at a slightly later point in time (and
possibly running non-timetravel IRQs that shouldn't happen earlier).
While at it, move marking as pending into irq_event_handler as that is
the more logical place for it to happen.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20231018123643.1255813-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
- Fixes for -Wmissing-prototypes warnings and further cleanup
- Remove callback returning void from rtc and virtio drivers
- Fix bash location
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmZQ/FYWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wfbLEAC1X55jjignxMIt4gEbtOXL2Pgn
Md3z8sr5QhyQeLEkoYEhAqYHcKYY8A9ZshfNS4RNTbhU6qaFQBNwbBuFnJ1MsllC
236EKgy0xFChgqH0bszGW97VRcIs79qauDt0mE0AXQGpuW7AjJX9chT2ikp9Sr5z
P2Gnp7+l/OaAH7UXFpaYYOWOzRAQCbA67hN3nRcSBCPq+Plw2bQCCKKK0g4UwqmI
vukAguO3eGZ0B4oQEsPX/krM0IigM01l5pJVhkdNzJgMOfd7eWb3o3juE35f4KPx
vSd8LPmoBvDJt9dKbZE38fC58+U9qWDcBDLfDlf7F0dGtWQi6QeZmrmQSteQUAFF
YWHllQ+P6xdh1kdSXWk8IesVINydMAc79DpqmKkEUgmCGVX+grt40aOTnOIUuzjq
9lMcfKgjjBz6qsC3fWyGMvjaPpRRbe4G1wnAOij+hdBNR2fEFaqv8Dx9Zx42G3lm
oYDylqjP73SbtOKbTCdHTqOfTSC83KYmo6w5ttwnFZcDVtbXRY8NejIX08Go8KIn
OXeZ8Pxf3DmQ4yuhE3mWOoT/eFiZnXpoNiteQZ/8RhyPMJllVijtSIlnLteuah4d
Z68Nh9/P52VcjMH0wS1eTKrkUAgfGBQ3kIOZqbU8UMSeq8vTB2kx++HwAtmUNi07
pDaNOQVtW5m4HMhVlw==
=umlG
-----END PGP SIGNATURE-----
Merge tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:
- Fixes for -Wmissing-prototypes warnings and further cleanup
- Remove callback returning void from rtc and virtio drivers
- Fix bash location
* tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits)
um: virtio_uml: Convert to platform remove callback returning void
um: rtc: Convert to platform remove callback returning void
um: Remove unused do_get_thread_area function
um: Fix -Wmissing-prototypes warnings for __vdso_*
um: Add an internal header shared among the user code
um: Fix the declaration of kasan_map_memory
um: Fix the -Wmissing-prototypes warning for get_thread_reg
um: Fix the -Wmissing-prototypes warning for __switch_mm
um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn
um: Stop tracking host PID in cpu_tasks
um: process: remove unused 'n' variable
um: vector: remove unused len variable/calculation
um: vector: fix bpfflash parameter evaluation
um: slirp: remove set but unused variable 'pid'
um: signal: move pid variable where needed
um: Makefile: use bash from the environment
um: Add winch to winch_handlers before registering winch IRQ
um: Fix -Wmissing-prototypes warnings for __warp_* and foo
um: Fix -Wmissing-prototypes warnings for text_poke*
um: Move declarations to proper headers
...
Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for
checked-in source files. It is merely a convention without any functional
difference. In fact, $(obj) and $(src) are exactly the same, as defined
in scripts/Makefile.build:
src := $(obj)
When the kernel is built in a separate output directory, $(src) does
not accurately reflect the source directory location. While Kbuild
resolves this discrepancy by specifying VPATH=$(srctree) to search for
source files, it does not cover all cases. For example, when adding a
header search path for local headers, -I$(srctree)/$(src) is typically
passed to the compiler.
This introduces inconsistency between upstream and downstream Makefiles
because $(src) is used instead of $(srctree)/$(src) for the latter.
To address this inconsistency, this commit changes the semantics of
$(src) so that it always points to the directory in the source tree.
Going forward, the variables used in Makefiles will have the following
meanings:
$(obj) - directory in the object tree
$(src) - directory in the source tree (changed by this commit)
$(objtree) - the top of the kernel object tree
$(srctree) - the top of the kernel source tree
Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced
with $(src).
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
The host PID tracked in 'cpu_tasks' is no longer used. Stopping
tracking it will also save some cycles.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The return value of fn() wasn't used for a long time,
so no need to assign it to a variable, addressing a
W=1 warning.
This seems to be - with patches from others posted to
the list before - the last W=1 warning in arch/um/.
Fixes: 22e2430d60db ("x86, um: convert to saner kernel_execve() semantics")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The prototypes for text_poke* are declared in asm/text-patching.h
under arch/x86/include/. It's safe to include this header, as it's
UML-aware (by checking CONFIG_UML_X86).
This will address below -Wmissing-prototypes warnings:
arch/um/kernel/um_arch.c:461:7: warning: no previous prototype for ‘text_poke’ [-Wmissing-prototypes]
arch/um/kernel/um_arch.c:473:6: warning: no previous prototype for ‘text_poke_sync’ [-Wmissing-prototypes]
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This will address below -Wmissing-prototypes warnings:
arch/um/kernel/initrd.c:18:12: warning: no previous prototype for ‘read_initrd’ [-Wmissing-prototypes]
arch/um/kernel/um_arch.c:408:19: warning: no previous prototype for ‘read_initrd’ [-Wmissing-prototypes]
arch/um/os-Linux/start_up.c:301:12: warning: no previous prototype for ‘parse_iomem’ [-Wmissing-prototypes]
arch/x86/um/ptrace_32.c:15:6: warning: no previous prototype for ‘arch_switch_to’ [-Wmissing-prototypes]
arch/x86/um/ptrace_32.c:101:5: warning: no previous prototype for ‘poke_user’ [-Wmissing-prototypes]
arch/x86/um/ptrace_32.c:153:5: warning: no previous prototype for ‘peek_user’ [-Wmissing-prototypes]
arch/x86/um/ptrace_64.c:111:5: warning: no previous prototype for ‘poke_user’ [-Wmissing-prototypes]
arch/x86/um/ptrace_64.c:171:5: warning: no previous prototype for ‘peek_user’ [-Wmissing-prototypes]
arch/x86/um/syscalls_64.c:48:6: warning: no previous prototype for ‘arch_switch_to’ [-Wmissing-prototypes]
arch/x86/um/tls_32.c:184:5: warning: no previous prototype for ‘arch_switch_tls’ [-Wmissing-prototypes]
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This will address below -Wmissing-prototypes warnings:
arch/um/kernel/mem.c:202:8: warning: no previous prototype for ‘pgd_alloc’ [-Wmissing-prototypes]
arch/um/kernel/mem.c:215:7: warning: no previous prototype for ‘uml_kmalloc’ [-Wmissing-prototypes]
arch/um/kernel/process.c:207:6: warning: no previous prototype for ‘arch_cpu_idle’ [-Wmissing-prototypes]
arch/um/kernel/process.c:328:15: warning: no previous prototype for ‘arch_align_stack’ [-Wmissing-prototypes]
arch/um/kernel/reboot.c:45:6: warning: no previous prototype for ‘machine_restart’ [-Wmissing-prototypes]
arch/um/kernel/reboot.c:51:6: warning: no previous prototype for ‘machine_power_off’ [-Wmissing-prototypes]
arch/um/kernel/reboot.c:57:6: warning: no previous prototype for ‘machine_halt’ [-Wmissing-prototypes]
arch/um/kernel/skas/mmu.c:17:5: warning: no previous prototype for ‘init_new_context’ [-Wmissing-prototypes]
arch/um/kernel/skas/mmu.c:60:6: warning: no previous prototype for ‘destroy_context’ [-Wmissing-prototypes]
arch/um/kernel/skas/process.c:36:12: warning: no previous prototype for ‘start_uml’ [-Wmissing-prototypes]
arch/um/kernel/time.c:807:15: warning: no previous prototype for ‘calibrate_delay_is_known’ [-Wmissing-prototypes]
arch/um/kernel/tlb.c:594:6: warning: no previous prototype for ‘force_flush_all’ [-Wmissing-prototypes]
arch/x86/um/bugs_32.c:22:6: warning: no previous prototype for ‘arch_check_bugs’ [-Wmissing-prototypes]
arch/x86/um/bugs_32.c:44:6: warning: no previous prototype for ‘arch_examine_signal’ [-Wmissing-prototypes]
arch/x86/um/bugs_64.c:9:6: warning: no previous prototype for ‘arch_check_bugs’ [-Wmissing-prototypes]
arch/x86/um/bugs_64.c:13:6: warning: no previous prototype for ‘arch_examine_signal’ [-Wmissing-prototypes]
arch/x86/um/elfcore.c:10:12: warning: no previous prototype for ‘elf_core_extra_phdrs’ [-Wmissing-prototypes]
arch/x86/um/elfcore.c:15:5: warning: no previous prototype for ‘elf_core_write_extra_phdrs’ [-Wmissing-prototypes]
arch/x86/um/elfcore.c:42:5: warning: no previous prototype for ‘elf_core_write_extra_data’ [-Wmissing-prototypes]
arch/x86/um/elfcore.c:63:8: warning: no previous prototype for ‘elf_core_extra_data_size’ [-Wmissing-prototypes]
arch/x86/um/fault.c:18:5: warning: no previous prototype for ‘arch_fixup’ [-Wmissing-prototypes]
arch/x86/um/os-Linux/mcontext.c:7:6: warning: no previous prototype for ‘get_regs_from_mc’ [-Wmissing-prototypes]
arch/x86/um/os-Linux/tls.c:22:6: warning: no previous prototype for ‘check_host_supports_tls’ [-Wmissing-prototypes]
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Make it match the declaration in asm-generic/switch_to.h. And
also include the header to allow the compiler to check it.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
These functions are not used anymore. Removing them will also address
below -Wmissing-prototypes warnings:
arch/um/kernel/process.c:51:5: warning: no previous prototype for ‘pid_to_processor_id’ [-Wmissing-prototypes]
arch/um/kernel/process.c:253:5: warning: no previous prototype for ‘copy_to_user_proc’ [-Wmissing-prototypes]
arch/um/kernel/process.c:263:5: warning: no previous prototype for ‘clear_user_proc’ [-Wmissing-prototypes]
arch/um/kernel/tlb.c:579:6: warning: no previous prototype for ‘flush_tlb_mm_range’ [-Wmissing-prototypes]
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This will also fix the warnings like:
warning: no previous prototype for ‘fork_handler’ [-Wmissing-prototypes]
140 | void fork_handler(void)
| ^~~~~~~~~~~~
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Call this function unconditionally so that we can populate an empty DTB
on platforms that don't boot with a command line provided DTB. There's
no harm in calling unflatten_device_tree() unconditionally. If there
isn't a valid initial_boot_params dtb then unflatten_device_tree()
returns early.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: linux-um@lists.infradead.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lore.kernel.org/r/20240217010557.2381548-4-sboyd@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
- Clang coverage support
- Many cleanups from Benjamin Berg
- Various minor fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmWi86sWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wWB9D/0XLw6RzLkav+fZzaNhNsP1jzGn
cQ3Yl29WcCsL+7iMVy/fLPZns8SlgBtVDZ6YZMNxrjjOBLu1VTOv30dNwYpvKUDj
ERakIpZeqg8+w6+pOK3lLWRkjjBVDrFM+e6vgZ+2iEFkvHhyHk7LhcfAanC3CTUO
qU4dLqZDpO4H98xhkbksc847I8Yg8mx9vLrqZBxQ8tfT2tLHMfzuR7kCvRfxOFso
4QDe4NImgEF5WLlK3Aj92NWxbMUZvw1f0bEuJGQJPr6GsLbNcsrlQCZIis/mov82
6176c3Jl1lHyBYsYyPAagVpqVBN+Q1gL1+YGz4Nv+wZFTqk5RvxMzVDdkpP6QEAV
ZJjdJdBJHrAqrzIdiY8XfPYBsu5iZYCLr8q3wJxQYYBrV3s8eXsFXMqoLHKBVOe6
OSgTv2OJrSvXZzXgI+5nysVVqOeoImcRzQXuVHha5jhrf6sPeGxB+hOuTccPLOG4
kIEoC+8CaR+vlHBspg2qU+wN0QAqvIFPID05p0GM37UnskxeOxSksMSkwPAVdhqD
6Mq+L5TDx6ShRfQ0KC1PGmByv312bRLudZ34BBAhoEspixI1AISbDn2wX7pKqERq
yC9h/49XmjqcCjmE2TsQ1k4zhUSo8CT/Z8SUmrmqnbrUmchdRCEBplolZpBbKfGO
U+l4hGt7O5BrPO0Tzw==
=oH6k
-----END PGP SIGNATURE-----
Merge tag 'uml-for-linus-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:
- Clang coverage support
- Many cleanups from Benjamin Berg
- Various minor fixes
* tag 'uml-for-linus-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: Mark 32bit syscall helpers as clobbering memory
um: Remove unused register save/restore functions
um: Rely on PTRACE_SETREGSET to set FS/GS base registers
Documentation: kunit: Add clang UML coverage example
arch: um: Add Clang coverage support
um: time-travel: fix time corruption
um: net: Fix return type of uml_net_start_xmit()
um: Always inline stub functions
um: Do not use printk in userspace trampoline
um: Reap winch thread if it fails
um: Do not use printk in SIGWINCH helper thread
um: Don't use vfprintf() for os_info()
um: Make errors to stop ptraced child fatal during startup
um: Drop NULL check from start_userspace
um: Drop support for hosts without SYSEMU_SINGLESTEP support
um: document arch_futex_atomic_op_inuser
um: mmu: remove stub_pages
um: Fix naming clash between UML and scheduler
um: virt-pci: fix platform map offset
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In 'basic' time-travel mode (without =inf-cpu or =ext), we
still get timer interrupts. These can happen at arbitrary
points in time, i.e. while in timer_read(), which pushes
time forward just a little bit. Then, if we happen to get
the interrupt after calculating the new time to push to,
but before actually finishing that, the interrupt will set
the time to a value that's incompatible with the forward,
and we'll crash because time goes backwards when we do the
forwarding.
Fix this by reading the time_travel_time, calculating the
adjustment, and doing the adjustment all with interrupts
disabled.
Reported-by: Vincent Whitchurch <Vincent.Whitchurch@axis.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
These features have existed since Linux 2.6.14 and can be considered
widely available at this point. Also drop the backward compatibility
code for PTRACE_SETOPTIONS.
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
----
v2:
* Continue to define PTRACE_SYSEMU_SINGLESTEP as glibc only added it in
version 2.27.
Signed-off-by: Richard Weinberger <richard@nod.at>
arch_futex_atomic_op_inuser was not documented correctly
resulting in build time warnings.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
__cant_sleep was already used and exported by the scheduler.
The name had to be changed to a UML specific one.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
Signed-off-by: Richard Weinberger <richard@nod.at>
Fixes the following build errors observed from W=1 builds:
arch/um/drivers/xterm_kern.c:35:5: warning: no previous prototype for
function 'xterm_fd' [-Wmissing-prototypes]
35 | int xterm_fd(int socket, int *pid_out)
| ^
arch/um/drivers/xterm_kern.c:35:1: note: declare 'static' if the
function is not intended to be used outside of this translation unit
35 | int xterm_fd(int socket, int *pid_out)
| ^
| static
arch/um/drivers/chan_kern.c:183:6: warning: no previous prototype for
function 'free_irqs' [-Wmissing-prototypes]
183 | void free_irqs(void)
| ^
arch/um/drivers/chan_kern.c:183:1: note: declare 'static' if the
function is not intended to be used outside of this translation unit
183 | void free_irqs(void)
| ^
| static
arch/um/drivers/slirp_kern.c:18:6: warning: no previous prototype for
function 'slirp_init' [-Wmissing-prototypes]
18 | void slirp_init(struct net_device *dev, void *data)
| ^
arch/um/drivers/slirp_kern.c:18:1: note: declare 'static' if the
function is not intended to be used outside of this translation unit
18 | void slirp_init(struct net_device *dev, void *data)
| ^
| static
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308081050.sZEw4cQ5-lkp@intel.com/
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The current name doesn't reflect what it does very well.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230622144321.427441595%40infradead.org
This modifies our user mode stack expansion code to always take the
mmap_lock for writing before modifying the VM layout.
It's actually something we always technically should have done, but
because we didn't strictly need it, we were being lazy ("opportunistic"
sounds so much better, doesn't it?) about things, and had this hack in
place where we would extend the stack vma in-place without doing the
proper locking.
And it worked fine. We just needed to change vm_start (or, in the case
of grow-up stacks, vm_end) and together with some special ad-hoc locking
using the anon_vma lock and the mm->page_table_lock, it all was fairly
straightforward.
That is, it was all fine until Ruihan Li pointed out that now that the
vma layout uses the maple tree code, we *really* don't just change
vm_start and vm_end any more, and the locking really is broken. Oops.
It's not actually all _that_ horrible to fix this once and for all, and
do proper locking, but it's a bit painful. We have basically three
different cases of stack expansion, and they all work just a bit
differently:
- the common and obvious case is the page fault handling. It's actually
fairly simple and straightforward, except for the fact that we have
something like 24 different versions of it, and you end up in a maze
of twisty little passages, all alike.
- the simplest case is the execve() code that creates a new stack.
There are no real locking concerns because it's all in a private new
VM that hasn't been exposed to anybody, but lockdep still can end up
unhappy if you get it wrong.
- and finally, we have GUP and page pinning, which shouldn't really be
expanding the stack in the first place, but in addition to execve()
we also use it for ptrace(). And debuggers do want to possibly access
memory under the stack pointer and thus need to be able to expand the
stack as a special case.
None of these cases are exactly complicated, but the page fault case in
particular is just repeated slightly differently many many times. And
ia64 in particular has a fairly complicated situation where you can have
both a regular grow-down stack _and_ a special grow-up stack for the
register backing store.
So to make this slightly more manageable, the bulk of this series is to
first create a helper function for the most common page fault case, and
convert all the straightforward architectures to it.
Thus the new 'lock_mm_and_find_vma()' helper function, which ends up
being used by x86, arm, powerpc, mips, riscv, alpha, arc, csky, hexagon,
loongarch, nios2, sh, sparc32, and xtensa. So we not only convert more
than half the architectures, we now have more shared code and avoid some
of those twisty little passages.
And largely due to this common helper function, the full diffstat of
this series ends up deleting more lines than it adds.
That still leaves eight architectures (ia64, m68k, microblaze, openrisc,
parisc, s390, sparc64 and um) that end up doing 'expand_stack()'
manually because they are doing something slightly different from the
normal pattern. Along with the couple of special cases in execve() and
GUP.
So there's a couple of patches that first create 'locked' helper
versions of the stack expansion functions, so that there's a obvious
path forward in the conversion. The execve() case is then actually
pretty simple, and is a nice cleanup from our old "grow-up stackls are
special, because at execve time even they grow down".
The #ifdef CONFIG_STACK_GROWSUP in that code just goes away, because
it's just more straightforward to write out the stack expansion there
manually, instead od having get_user_pages_remote() do it for us in some
situations but not others and have to worry about locking rules for GUP.
And the final step is then to just convert the remaining odd cases to a
new world order where 'expand_stack()' is called with the mmap_lock held
for reading, but where it might drop it and upgrade it to a write, only
to return with it held for reading (in the success case) or with it
completely dropped (in the failure case).
In the process, we remove all the stack expansion from GUP (where
dropping the lock wouldn't be ok without special rules anyway), and add
it in manually to __access_remote_vm() for ptrace().
Thanks to Adrian Glaubitz and Frank Scheiner who tested the ia64 cases.
Everything else here felt pretty straightforward, but the ia64 rules for
stack expansion are really quite odd and very different from everything
else. Also thanks to Vegard Nossum who caught me getting one of those
odd conditions entirely the wrong way around.
Anyway, I think I want to actually move all the stack expansion code to
a whole new file of its own, rather than have it split up between
mm/mmap.c and mm/memory.c, but since this will have to be backported to
the initial maple tree vma introduction anyway, I tried to keep the
patches _fairly_ minimal.
Also, while I don't think it's valid to expand the stack from GUP, the
final patch in here is a "warn if some crazy GUP user wants to try to
expand the stack" patch. That one will be reverted before the final
release, but it's left to catch any odd cases during the merge window
and release candidates.
Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
* branch 'expand-stack':
gup: add warning if some caller would seem to want stack expansion
mm: always expand the stack with the mmap write lock held
execve: expand new process stack manually ahead of time
mm: make find_extend_vma() fail if write lock not held
powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()
mm/fault: convert remaining simple cases to lock_mm_and_find_vma()
arm/mm: Convert to using lock_mm_and_find_vma()
riscv/mm: Convert to using lock_mm_and_find_vma()
mips/mm: Convert to using lock_mm_and_find_vma()
powerpc/mm: Convert to using lock_mm_and_find_vma()
arm64/mm: Convert to using lock_mm_and_find_vma()
mm: make the page fault mmap locking killable
mm: introduce new 'lock_mm_and_find_vma()' page fault helper
This finishes the job of always holding the mmap write lock when
extending the user stack vma, and removes the 'write_locked' argument
from the vm helper functions again.
For some cases, we just avoid expanding the stack at all: drivers and
page pinning really shouldn't be extending any stacks. Let's see if any
strange users really wanted that.
It's worth noting that architectures that weren't converted to the new
lock_mm_and_find_vma() helper function are left using the legacy
"expand_stack()" function, but it has been changed to drop the mmap_lock
and take it for writing while expanding the vma. This makes it fairly
straightforward to convert the remaining architectures.
As a result of dropping and re-taking the lock, the calling conventions
for this function have also changed, since the old vma may no longer be
valid. So it will now return the new vma if successful, and NULL - and
the lock dropped - if the area could not be extended.
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # ia64
Tested-by: Frank Scheiner <frank.scheiner@web.de> # ia64
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
check_bugs() is about to be phased out. Switch over to the new
arch_cpu_finalize_init() implementation.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Richard Weinberger <richard@nod.at>
Link: https://lore.kernel.org/r/20230613224545.493148694@linutronix.de
There's a lot of code here that hard-codes that the
data is a single page, and right now that seems to
be sufficient, but to make it easier to change this
in the future, add a new STUB_DATA_PAGES constant
and use it throughout the code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
- Add support for rust (yay!)
- Add support for LTO
- Add platform bus support to virtio-pci
- Various virtio fixes
- Coding style, spelling cleanups
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmP/AVYWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wUQvEADRVll8yCg+a4C1BORSefv0FQ/I
z+3jiUtzq/ABGBf9S0TYEfaGcOn1LXFzlEgtqf4kd3vmzyIVG0pHUt3BxaqLSSTU
IjDZkbtZ2GC0i7EK8D3iAC+lvew+TjBMfgEjhrNyni3VeYNDBh8EkUseWIbrNX8Q
pqoUQwYlVdjY6PedtcdGDko70Fiy4OMIK7lat7JDtuoL5pvMEadbR1D7ClfiYRIh
NGg5mSnfBBTGc20ochDkHUhubzagtSCDHvNe2SiYDBrM5sjeSANecsICmymWS+Pm
aJtYybwpC4tl9J25O4OTD3SyfKxZ93mKwK89Tw9ryqQV2rmXG+3qB2hbZ0zpi75I
vpgDrfv+VC+4daC0Wp8zjoeSE6zUbCKVE1s307EC5fjYyIoHjSUAsPE9GrNaJi5K
91WVe1x8Dnpfq9/ZO8o3sLqftBo3aVH21dGVuqi5qS6OjRqDMkFkaY31nUjVXELV
tEBj6n9UoyqPFzcgvsQfTcCjjlMiVL+JU+sl2L7dQXTNev1/RReTJngdK/vv7Epo
BjLpGfn+1I/8dlbsyjLt0FOIwhIbUf+8RbWpezENGVgKP81iqEQPOdax/yBEhCKm
NWduDz2itQn94KNMcUoPq+G3xoG3dOW7lXlEzXP6ZbLkcuIyXpeNZeJNlvq7J/z/
2PBy61Ngs/DDUK/doQ==
=q2ks
-----END PGP SIGNATURE-----
Merge tag 'uml-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:
- Add support for rust (yay!)
- Add support for LTO
- Add platform bus support to virtio-pci
- Various virtio fixes
- Coding style, spelling cleanups
* tag 'uml-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (27 commits)
Documentation: rust: Fix arch support table
uml: vector: Remove unused definitions VECTOR_{WRITE,HEADERS}
um: virt-pci: properly remove PCI device from bus
um: virtio_uml: move device breaking into workqueue
um: virtio_uml: mark device as unregistered when breaking it
um: virtio_uml: free command if adding to virtqueue failed
UML: define RUNTIME_DISCARD_EXIT
virt-pci: add platform bus support
um-virt-pci: Make max delay configurable
um: virt-pci: implement pcibios_get_phb_of_node()
um: Support LTO
um: put power options in a menu
um: Use CFLAGS_vmlinux
um: Prevent building modules incompatible with MODVERSIONS
um: Avoid pcap multiple definition errors
um: Make the definition of cpu_data more compatible
x86: um: vdso: Add '%rcx' and '%r11' to the syscall clobber list
rust: arch/um: Add support for CONFIG_RUST under x86_64 UML
rust: arch/um: Disable FP/SIMD instruction to match x86
rust: arch/um: Use 'pie' relocation mode under UML
...
- Change V=1 option to print both short log and full command log.
- Allow V=1 and V=2 to be combined as V=12.
- Make W=1 detect wrong .gitignore files.
- Tree-wide cleanups for unused command line arguments passed to Clang.
- Stop using -Qunused-arguments with Clang.
- Make scripts/setlocalversion handle only correct release tags instead
of any arbitrary annotated tag.
- Create Debian and RPM source packages without cleaning the source tree.
- Various cleanups for packaging.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmP7iHoVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGL/cQAK9q5rsNL5a2LgTbm89ORA+UV+ST
hrAoGo5DkJHUbVH53oPzyLynFBZPvUzLK8yjApjXkyAzy2hXYnj+vbTs0s+JVCFL
owS4NB0YP+tpHGuy8bGpWI0GMZSMwmspUteqxk86zuH8uQVAhnCaeV1/Cr6Aqj1h
2jk1FZid3/h7qEkEgu5U8soeyFnV6VhAT6Ie5yfZ2O2RdsSqPUh6vfKrgdyW4RWz
gito0SOUwvjIDfSmTnIIacUibisPRv2OW29OvmDp1aXj5rMhe3UfOznVE3NR86yl
ZbWDAIm6KYT8V1ASOoAUR80qent9IPKytThLK9BVEQCT6bsujCZMvhYhhEvO30TF
Lzsdr+FrES//xag3+hgc63FEied2xxWGQG1cRtzAhfRL9tJ03+mY1omoW6SyKqW/
Gc9PIcTgQbCIrkeL0HuAI1q3I1vkvHXInJKtGkoHh1J9aJ8v5gQpwGA+DDRUnA+A
LQSeEbT2Hf3MoF4CqZRnConvfhlMuLI+j5v54YPrhokxXmv7u807kjfwMFTiZ/+m
CJFlEMf9YRv3pi8g/AYyGAg5ZQigCwzOCRUC5kguFqzZdgnjiI907GEL804lm1Mg
lpx/HtYPyxwWEd2XyU6/C9AEIl3gm7MBd6b1tD54Tb/VmE+AvjS/O9jFYXZqnAnM
Llv4BfK/cQKwHb6o
=HpFZ
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Change V=1 option to print both short log and full command log
- Allow V=1 and V=2 to be combined as V=12
- Make W=1 detect wrong .gitignore files
- Tree-wide cleanups for unused command line arguments passed to Clang
- Stop using -Qunused-arguments with Clang
- Make scripts/setlocalversion handle only correct release tags instead
of any arbitrary annotated tag
- Create Debian and RPM source packages without cleaning the source
tree
- Various cleanups for packaging
* tag 'kbuild-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (74 commits)
kbuild: rpm-pkg: remove unneeded KERNELRELEASE from modules/headers_install
docs: kbuild: remove description of KBUILD_LDS_MODULE
.gitattributes: use 'dts' diff driver for *.dtso files
kbuild: deb-pkg: improve the usability of source package
kbuild: deb-pkg: fix binary-arch and clean in debian/rules
kbuild: tar-pkg: use tar rules in scripts/Makefile.package
kbuild: make perf-tar*-src-pkg work without relying on git
kbuild: deb-pkg: switch over to source format 3.0 (quilt)
kbuild: deb-pkg: make .orig tarball a hard link if possible
kbuild: deb-pkg: hide KDEB_SOURCENAME from Makefile
kbuild: srcrpm-pkg: create source package without cleaning
kbuild: rpm-pkg: build binary packages from source rpm
kbuild: deb-pkg: create source package without cleaning
kbuild: add a tool to list files ignored by git
Documentation/llvm: add Chimera Linux, Google and Meta datacenters
setlocalversion: use only the correct release tag for git-describe
setlocalversion: clean up the construction of version output
.gitignore: ignore *.cover and *.mbx
kbuild: remove --include-dir MAKEFLAG from top Makefile
kbuild: fix trivial typo in comment
...
With CONFIG_VIRTIO_UML=y, GNU ld < 2.36 fails to link UML vmlinux
(w/wo CONFIG_LD_SCRIPT_STATIC).
`.exit.text' referenced in section `.uml.exitcall.exit' of arch/um/drivers/virtio_uml.o: defined in discarded section `.exit.text' of arch/um/drivers/virtio_uml.o
collect2: error: ld returned 1 exit status
This fix is similar to the following commits:
- 4b9880dbf3bd ("powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT")
- a494398bde27 ("s390: define RUNTIME_DISCARD_EXIT to fix link error
with GNU ld < 2.36")
- c1c551bebf92 ("sh: define RUNTIME_DISCARD_EXIT")
Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv")
Reported-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Match the x86 implementation to improve build errors.
Noticed when building allyesconfig.
e.g.
../arch/um/include/asm/processor-generic.h:94:19: error: called object is not a function or function pointer
94 | #define cpu_data (&boot_cpu_data)
| ~^~~~~~~~~~~~~~~
../drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:2157:16: note: in expansion of macro ‘cpu_data’
2157 | return cpu_data(first_cpu_of_numa_node).apicid;
Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
I added $(srctree)/ to some included Makefiles in the following commits:
- 3204a7fb98a3 ("kbuild: prefix $(srctree)/ to some included Makefiles")
- d82856395505 ("kbuild: do not require sub-make for separate output tree builds")
They were a preparation for removing --include-dir flag.
I have never thought --include-dir useful. Rather, it _is_ harmful.
For example, run the following commands:
$ make -s ARCH=x86 mrproper defconfig
$ make ARCH=arm O=foo dtbs
make[1]: Entering directory '/tmp/linux/foo'
HOSTCC scripts/basic/fixdep
Error: kernelrelease not valid - run 'make prepare' to update it
UPD include/config/kernel.release
make[1]: Leaving directory '/tmp/linux/foo'
The first command configures the source tree for x86. The next command
tries to build ARM device trees in the separate foo/ directory - this
must stop because the directory foo/ has not been configured yet.
However, due to --include-dir=$(abs_srctree), the top Makefile includes
the wrong include/config/auto.conf from the source tree and continues
building. Kbuild traverses the directory tree, but of course it does
not work correctly. The Error message is also pointless - 'make prepare'
does not help at all for fixing the issue.
This commit fixes more arch Makefile, and finally removes --include-dir
from the top Makefile.
There are more breakages under drivers/, but I do not volunteer to fix
them all. I just moved --include-dir to drivers/Makefile.
With this commit, the second command will stop with a sensible message.
$ make -s ARCH=x86 mrproper defconfig
$ make ARCH=arm O=foo dtbs
make[1]: Entering directory '/tmp/linux/foo'
SYNC include/config/auto.conf.cmd
***
*** The source tree is not clean, please run 'make ARCH=arm mrproper'
*** in /tmp/linux
***
make[2]: *** [../Makefile:646: outputmakefile] Error 1
/tmp/linux/Makefile:770: include/config/auto.conf.cmd: No such file or directory
make[1]: *** [/tmp/linux/Makefile:793: include/config/auto.conf.cmd] Error 2
make[1]: Leaving directory '/tmp/linux/foo'
make: *** [Makefile:226: __sub-make] Error 2
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
This means having the string literal in one line and using __func__
where appropriate.
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Signed-off-by: Richard Weinberger <richard@nod.at>
Due to changes in the iteration, there are now lockdep
checks indicating that we're missing locking here. Add
the missing locking where it's needed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Current arch_cpu_idle() is called with IRQs disabled, but will return
with IRQs enabled.
However, the very first thing the generic code does after calling
arch_cpu_idle() is raw_local_irq_disable(). This means that
architectures that can idle with IRQs disabled end up doing a
pointless 'enable-disable' dance.
Therefore, push this IRQ disabling into the idle function, meaning
that those architectures can avoid the pointless IRQ state flipping.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.618076436@infradead.org
Idle code is very like entry code in that RCU isn't available. As
such, add a little validation.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.373461409@infradead.org
been long in the making. It is a lighterweight software-only fix for
Skylake-based cores where enabling IBRS is a big hammer and causes a
significant performance impact.
What it basically does is, it aligns all kernel functions to 16 bytes
boundary and adds a 16-byte padding before the function, objtool
collects all functions' locations and when the mitigation gets applied,
it patches a call accounting thunk which is used to track the call depth
of the stack at any time.
When that call depth reaches a magical, microarchitecture-specific value
for the Return Stack Buffer, the code stuffs that RSB and avoids its
underflow which could otherwise lead to the Intel variant of Retbleed.
This software-only solution brings a lot of the lost performance back,
as benchmarks suggest:
https://lore.kernel.org/all/20220915111039.092790446@infradead.org/
That page above also contains a lot more detailed explanation of the
whole mechanism
- Implement a new control flow integrity scheme called FineIBT which is
based on the software kCFI implementation and uses hardware IBT support
where present to annotate and track indirect branches using a hash to
validate them
- Other misc fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOZp5EACgkQEsHwGGHe
VUrZFxAAvi/+8L0IYSK4mKJvixGbTFjxN/Swo2JVOfs34LqGUT6JaBc+VUMwZxdb
VMTFIZ3ttkKEodjhxGI7oGev6V8UfhI37SmO2lYKXpQVjXXnMlv/M+Vw3teE38CN
gopi+xtGnT1IeWQ3tc/Tv18pleJ0mh5HKWiW+9KoqgXj0wgF9x4eRYDz1TDCDA/A
iaBzs56j8m/FSykZHnrWZ/MvjKNPdGlfJASUCPeTM2dcrXQGJ93+X2hJctzDte0y
Nuiw6Y0htfFBE7xoJn+sqm5Okr+McoUM18/CCprbgSKYk18iMYm3ZtAi6FUQZS1A
ua4wQCf49loGp15PO61AS5d3OBf5D3q/WihQRbCaJvTVgPp9sWYnWwtcVUuhMllh
ZQtBU9REcVJ/22bH09Q9CjBW0VpKpXHveqQdqRDViLJ6v/iI6EFGmD24SW/VxyRd
73k9MBGrL/dOf1SbEzdsnvcSB3LGzp0Om8o/KzJWOomrVKjBCJy16bwTEsCZEJmP
i406m92GPXeaN1GhTko7vmF0GnkEdJs1GVCZPluCAxxbhHukyxHnrjlQjI4vC80n
Ylc0B3Kvitw7LGJsPqu+/jfNHADC/zhx1qz/30wb5cFmFbN1aRdp3pm8JYUkn+l/
zri2Y6+O89gvE/9/xUhMohzHsWUO7xITiBavewKeTP9GSWybWUs=
=cRy1
-----END PGP SIGNATURE-----
Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Borislav Petkov:
- Add the call depth tracking mitigation for Retbleed which has been
long in the making. It is a lighterweight software-only fix for
Skylake-based cores where enabling IBRS is a big hammer and causes a
significant performance impact.
What it basically does is, it aligns all kernel functions to 16 bytes
boundary and adds a 16-byte padding before the function, objtool
collects all functions' locations and when the mitigation gets
applied, it patches a call accounting thunk which is used to track
the call depth of the stack at any time.
When that call depth reaches a magical, microarchitecture-specific
value for the Return Stack Buffer, the code stuffs that RSB and
avoids its underflow which could otherwise lead to the Intel variant
of Retbleed.
This software-only solution brings a lot of the lost performance
back, as benchmarks suggest:
https://lore.kernel.org/all/20220915111039.092790446@infradead.org/
That page above also contains a lot more detailed explanation of the
whole mechanism
- Implement a new control flow integrity scheme called FineIBT which is
based on the software kCFI implementation and uses hardware IBT
support where present to annotate and track indirect branches using a
hash to validate them
- Other misc fixes and cleanups
* tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
x86/paravirt: Use common macro for creating simple asm paravirt functions
x86/paravirt: Remove clobber bitmask from .parainstructions
x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al
x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit
x86/Kconfig: Enable kernel IBT by default
x86,pm: Force out-of-line memcpy()
objtool: Fix weak hole vs prefix symbol
objtool: Optimize elf_dirty_reloc_sym()
x86/cfi: Add boot time hash randomization
x86/cfi: Boot time selection of CFI scheme
x86/ibt: Implement FineIBT
objtool: Add --cfi to generate the .cfi_sites section
x86: Add prefix symbols for function padding
objtool: Add option to generate prefix symbols
objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf
objtool: Slice up elf_create_section_symbol()
kallsyms: Revert "Take callthunks into account"
x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces
x86/retpoline: Fix crash printing warning
x86/paravirt: Fix a !PARAVIRT build warning
...
handling. Collecting per-thread register values is the
only thing that needs to be ifdefed there...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY5ZyNgAKCRBZ7Krx/gZQ
63MZAQDZDE9Pk9EQ/3qOPNb2cuz8KSB3THUyotvustUUGPTUVAD/Ut1xD03jpWCY
oQ6tM8dNyh3+Vsx6/XKNd1+pj6IgNQE=
=XYkZ
-----END PGP SIGNATURE-----
Merge tag 'pull-elfcore' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull elf coredumping updates from Al Viro:
"Unification of regset and non-regset sides of ELF coredump handling.
Collecting per-thread register values is the only thing that needs to
be ifdefed there..."
* tag 'pull-elfcore' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
[elf] get rid of get_note_info_size()
[elf] unify regset and non-regset cases
[elf][non-regset] use elf_core_copy_task_regs() for dumper as well
[elf][non-regset] uninline elf_core_copy_task_fpregs() (and lose pt_regs argument)
elf_core_copy_task_regs(): task_pt_regs is defined everywhere
[elf][regset] simplify thread list handling in fill_note_info()
[elf][regset] clean fill_note_info() a bit
kill extern of vsyscall32_sysctl
kill coredump_params->regs
kill signal_pt_regs()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOU+U8ACgkQSfxwEqXe
A67NnQ//Y5DltmvibyPd7r1TFT2gUYv+Rx3sUV9ZE1NYptd/SWhhcL8c5FZ70Fuw
bSKCa1uiWjOxosjXT1kGrWq3de7q7oUpAPSOGxgxzoaNURIt58N/ajItCX/4Au8I
RlGAScHy5e5t41/26a498kB6qJ441fBEqCYKQpPLINMBAhe8TQ+NVp0rlpUwNHFX
WrUGg4oKWxdBIW3HkDirQjJWDkkAiklRTifQh/Al4b6QDbOnRUGGCeckNOhixsvS
waHWTld+Td8jRrA4b82tUb2uVZ2/b8dEvj/A8CuTv4yC0lywoyMgBWmJAGOC+UmT
ZVNdGW02Jc2T+Iap8ZdsEmeLHNqbli4+IcbY5xNlov+tHJ2oz41H9TZoYKbudlr6
/ReAUPSn7i50PhbQlEruj3eg+M2gjOeh8OF8UKwwRK8PghvyWQ1ScW0l3kUhPIhI
PdIG6j4+D2mJc1FIj2rTVB+Bg933x6S+qx4zDxGlNp62AARUFYf6EgyD6aXFQVuX
RxcKb6cjRuFkzFiKc8zkqg5edZH+IJcPNuIBmABqTGBOxbZWURXzIQvK/iULqZa4
CdGAFIs6FuOh8pFHLI3R4YoHBopbHup/xKDEeAO9KZGyeVIuOSERDxxo5f/ITzcq
APvT77DFOEuyvanr8RMqqh0yUjzcddXqw9+ieufsAyDwjD9DTuE=
=QRhK
-----END PGP SIGNATURE-----
Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
- Replace prandom_u32_max() and various open-coded variants of it,
there is now a new family of functions that uses fast rejection
sampling to choose properly uniformly random numbers within an
interval:
get_random_u32_below(ceil) - [0, ceil)
get_random_u32_above(floor) - (floor, U32_MAX]
get_random_u32_inclusive(floor, ceil) - [floor, ceil]
Coccinelle was used to convert all current users of
prandom_u32_max(), as well as many open-coded patterns, resulting in
improvements throughout the tree.
I'll have a "late" 6.1-rc1 pull for you that removes the now unused
prandom_u32_max() function, just in case any other trees add a new
use case of it that needs to converted. According to linux-next,
there may be two trivial cases of prandom_u32_max() reintroductions
that are fixable with a 's/.../.../'. So I'll have for you a final
conversion patch doing that alongside the removal patch during the
second week.
This is a treewide change that touches many files throughout.
- More consistent use of get_random_canary().
- Updates to comments, documentation, tests, headers, and
simplification in configuration.
- The arch_get_random*_early() abstraction was only used by arm64 and
wasn't entirely useful, so this has been replaced by code that works
in all relevant contexts.
- The kernel will use and manage random seeds in non-volatile EFI
variables, refreshing a variable with a fresh seed when the RNG is
initialized. The RNG GUID namespace is then hidden from efivarfs to
prevent accidental leakage.
These changes are split into random.c infrastructure code used in the
EFI subsystem, in this pull request, and related support inside of
EFISTUB, in Ard's EFI tree. These are co-dependent for full
functionality, but the order of merging doesn't matter.
- Part of the infrastructure added for the EFI support is also used for
an improvement to the way vsprintf initializes its siphash key,
replacing an sleep loop wart.
- The hardware RNG framework now always calls its correct random.c
input function, add_hwgenerator_randomness(), rather than sometimes
going through helpers better suited for other cases.
- The add_latent_entropy() function has long been called from the fork
handler, but is a no-op when the latent entropy gcc plugin isn't
used, which is fine for the purposes of latent entropy.
But it was missing out on the cycle counter that was also being mixed
in beside the latent entropy variable. So now, if the latent entropy
gcc plugin isn't enabled, add_latent_entropy() will expand to a call
to add_device_randomness(NULL, 0), which adds a cycle counter,
without the absent latent entropy variable.
- The RNG is now reseeded from a delayed worker, rather than on demand
when used. Always running from a worker allows it to make use of the
CPU RNG on platforms like S390x, whose instructions are too slow to
do so from interrupts. It also has the effect of adding in new inputs
more frequently with more regularity, amounting to a long term
transcript of random values. Plus, it helps a bit with the upcoming
vDSO implementation (which isn't yet ready for 6.2).
- The jitter entropy algorithm now tries to execute on many different
CPUs, round-robining, in hopes of hitting even more memory latencies
and other unpredictable effects. It also will mix in a cycle counter
when the entropy timer fires, in addition to being mixed in from the
main loop, to account more explicitly for fluctuations in that timer
firing. And the state it touches is now kept within the same cache
line, so that it's assured that the different execution contexts will
cause latencies.
* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
random: include <linux/once.h> in the right header
random: align entropy_timer_state to cache line
random: mix in cycle counter when jitter timer fires
random: spread out jitter callback to different CPUs
random: remove extraneous period and add a missing one in comments
efi: random: refresh non-volatile random seed when RNG is initialized
vsprintf: initialize siphash key using notifier
random: add back async readiness notifier
random: reseed in delayed work rather than on-demand
random: always mix cycle counter in add_latent_entropy()
hw_random: use add_hwgenerator_randomness() for early entropy
random: modernize documentation comment on get_random_bytes()
random: adjust comment to account for removed function
random: remove early archrandom abstraction
random: use random.trust_{bootloader,cpu} command line option only
stackprotector: actually use get_random_canary()
stackprotector: move get_random_canary() into stackprotector.h
treewide: use get_random_u32_inclusive() when possible
treewide: use get_random_u32_{above,below}() instead of manual loop
treewide: use get_random_u32_below() instead of deprecated function
...
Rather than using the console_lock to guarantee safe console list
traversal, use srcu console list iteration.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-14-john.ogness@linutronix.de
The initial intention of the UML kmsg_dumper is to dump the kernel
buffers to stdout if there is no console available to perform the
regular crash output.
However, if ttynull was registered as a console, no crash output was
seen. Commit e23fe90dec28 ("um: kmsg_dumper: always dump when not tty
console") tried to fix this by performing the kmsg_dump unless the
stdio console was behind /dev/console or enabled. But this allowed
kmsg dumping to occur even if other non-stdio consoles will output
the crash output. Also, a console being the driver behind
/dev/console has nothing to do with a crash scenario.
Restore the initial intention by dumping the kernel buffers to stdout
only if a non-ttynull console is registered and enabled. Also add
detailed comments so that it is clear why these rules are applied.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-8-john.ogness@linutronix.de
Don't bother with pointless macros - we are not sharing it with aout coredumps
anymore. Just convert the underlying functions to the same arguments (nobody
uses regs, actually) and call them elf_core_copy_task_fpregs(). And unexport
the entire bunch, while we are at it.
[added missing includes in arch/{csky,m68k,um}/kernel/process.c to avoid extra
warnings about the lack of externs getting added to huge piles for those
files. Pointless, but...]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is a simple mechanical transformation done by:
@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Reviewed-by: SeongJae Park <sj@kernel.org> # for damon
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>