# SPDX-License-Identifier: GPL-2.0-only
menu "Kernel hacking"

menu "printk and dmesg options"

config PRINTK_TIME
	bool "Show timing information on printks"
	depends on PRINTK
	help
	  Selecting this option causes time stamps of the printk()
	  messages to be added to the output of the syslog() system
	  call and at the console.

	  The timestamp is always recorded internally, and exported
	  to /dev/kmsg. This flag just specifies if the timestamp should
	  be included, not that the timestamp is recorded.

	  The behavior is also controlled by the kernel command line
	  parameter printk.time=1. See Documentation/admin-guide/kernel-parameters.rst

config PRINTK_CALLER
	bool "Show caller information on printks"
	depends on PRINTK
	help
	  Selecting this option causes printk() to add a caller "thread id" (if
	  in task context) or a caller "processor id" (if not in task context)
	  to every message.

	  This option is intended for environments where multiple threads
	  call printk() concurrently and often, because the output is difficult
	  to interpret without knowing where each line (or each fragment of a
	  line that was split into several lines by a race) came from.

	  Since toggling after boot would make the code racy, there is currently
	  no way to enable or disable this option via a kernel command line
	  parameter or a sysfs interface.
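
As an illustration of how a log consumer might use the caller field, here is a minimal Python sketch that splits one /dev/kmsg record into its parts. It assumes the record layout `<prefix>,<sequence>,<timestamp in microseconds>,<flags>[,key=value...];<message>`, with the caller exported as `caller=T<thread id>` or `caller=C<processor id>`. The function name `parse_kmsg_record` and the dictionary keys are hypothetical and not part of the kernel.

```python
def parse_kmsg_record(record):
    """Split one /dev/kmsg record line into a dict of fields."""
    header, _, message = record.partition(";")
    fields = header.split(",")
    prefix = int(fields[0])
    parsed = {
        "level": prefix & 7,       # low 3 bits: log level
        "facility": prefix >> 3,   # remaining bits: facility
        "sequence": int(fields[1]),
        "timestamp_us": int(fields[2]),
        "flags": fields[3],
        "caller": None,            # stays None when no caller field is present
        "message": message,
    }
    for extra in fields[4:]:
        key, _, value = extra.partition("=")
        if key == "caller" and value[:1] in ("T", "C"):
            kind = "thread" if value[0] == "T" else "processor"
            parsed["caller"] = (kind, int(value[1:]))
    return parsed


sample = "6,1353,5069193,-,caller=T268;Fusion MPT base driver 3.04.20"
record = parse_kmsg_record(sample)
print(record["caller"], record["timestamp_us"] / 1e6, record["message"])
```

Grouping records by the `caller` tuple is one way to reassemble output from threads whose messages were interleaved.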

config STACKTRACE_BUILD_ID
	bool "Show build ID information in stacktraces"
	depends on PRINTK
	help
	  Selecting this option adds build ID information for symbols in
	  stacktraces printed with the printk format '%p[SR]b'.

	  This option is intended for distros where debuginfo is not easily
	  accessible but can be downloaded given the build ID of the vmlinux or
	  kernel module where the function is located.

config CONSOLE_LOGLEVEL_DEFAULT
	int "Default console loglevel (1-15)"
	range 1 15
	default "7"
	help
	  Default loglevel to determine what will be printed on the console.

	  Setting a default here is equivalent to passing in loglevel=<x> in
	  the kernel bootargs. loglevel=<x> continues to override whatever
	  value is specified here as well.

	  Note: This does not affect the log level of un-prefixed printk()
	  usage in the kernel. That is controlled by the MESSAGE_LOGLEVEL_DEFAULT
	  option.

config CONSOLE_LOGLEVEL_QUIET
	int "quiet console loglevel (1-15)"
	range 1 15
	default "4"
	help
	  Loglevel to use when "quiet" is passed on the kernel command line.

	  When "quiet" is passed on the kernel command line, this loglevel
	  is used as the console loglevel. In other words, passing "quiet" is
	  equivalent to passing "loglevel=<CONSOLE_LOGLEVEL_QUIET>".

config MESSAGE_LOGLEVEL_DEFAULT
	int "Default message log level (1-7)"
	range 1 7
	default "4"
	help
	  Default log level for printk statements with no specified priority.

	  This had been hard-coded to KERN_WARNING since at least 2.6.10, but
	  users who audit their logs closely may want to set it to a lower
	  priority.

	  Note: This does not affect what message level gets printed on the console
	  by default. To change that, use loglevel=<x> in the kernel bootargs,
	  or pick a different CONSOLE_LOGLEVEL_DEFAULT configuration value.
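
To make the interaction between the three loglevel options concrete, here is a small Python sketch of the filtering rule: a message reaches the console only when its level is numerically lower than the console loglevel. The function name is hypothetical, the defaults 7, 4 and 4 are the ones given in the entries above, and the sketch deliberately ignores runtime overrides such as loglevel=<x>.

```python
CONSOLE_LOGLEVEL_DEFAULT = 7   # default console loglevel from this file
CONSOLE_LOGLEVEL_QUIET = 4     # console loglevel when "quiet" is passed
MESSAGE_LOGLEVEL_DEFAULT = 4   # level of printk() calls without a prefix


def shown_on_console(message_level=None, quiet=False):
    """Return True if a message of this level would reach the console."""
    if message_level is None:
        message_level = MESSAGE_LOGLEVEL_DEFAULT
    console_level = CONSOLE_LOGLEVEL_QUIET if quiet else CONSOLE_LOGLEVEL_DEFAULT
    return message_level < console_level


for level in (3, 4, 6, 7):
    print(level, shown_on_console(level), shown_on_console(level, quiet=True))
```

With these defaults, un-prefixed messages are shown on a normal boot but suppressed on a "quiet" boot, which is why the two defaults are worth choosing together.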

config BOOT_PRINTK_DELAY
	bool "Delay each boot printk message by N milliseconds"
	depends on DEBUG_KERNEL && PRINTK && GENERIC_CALIBRATE_DELAY
	help
	  This build option allows you to read kernel boot messages
	  by inserting a short delay after each one. The delay is
	  specified in milliseconds on the kernel command line,
	  using "boot_delay=N".

	  It is likely that you would also need to use "lpj=M" to preset
	  the "loops per jiffy" value.
	  See a previous boot log for the "lpj" value to use for your
	  system, and then set "lpj=M" before setting "boot_delay=N".
	  NOTE: Using this option may adversely affect SMP systems.
	  I.e., processors other than the first one may not boot up.
	  BOOT_PRINTK_DELAY also may cause LOCKUP_DETECTOR to detect
	  what it believes to be lockup conditions.

config DYNAMIC_DEBUG
	bool "Enable dynamic printk() support"
	default n
	depends on PRINTK
	depends on (DEBUG_FS || PROC_FS)
	select DYNAMIC_DEBUG_CORE
	help

	  Compiles debug level messages into the kernel, which would not
	  otherwise be available at runtime. These messages can then be
	  enabled/disabled based on various levels of scope - per source file,
	  function, module, format string, and line number. This mechanism
	  implicitly compiles in all pr_debug() and dev_dbg() calls, which
	  enlarges the kernel text size by about 2%.

	  If a source file is compiled with the DEBUG flag set, any
	  pr_debug() calls in it are enabled by default, but can be
	  disabled at runtime as below. Note that the DEBUG flag is
	  turned on by many CONFIG_*DEBUG* options.

	  Usage:

	  Dynamic debugging is controlled via the 'dynamic_debug/control' file,
	  which is contained in the 'debugfs' filesystem or procfs.
	  Thus, the debugfs or procfs filesystem must first be mounted before
	  making use of this feature.
	  We refer to the control file as: <debugfs>/dynamic_debug/control. This
	  file contains a list of the debug statements that can be enabled. The
	  format for each line of the file is:

		filename:lineno [module]function flags format

	  filename : source file of the debug statement
	  lineno : line number of the debug statement
	  module : module that contains the debug statement
	  function : function that contains the debug statement
	  flags : '=p' means the line is turned 'on' for printing
	  format : the format used for the debug statement

	  From a live system:

		nullarbor:~ # cat <debugfs>/dynamic_debug/control
		# filename:lineno [module]function flags format
		fs/aio.c:222 [aio]__put_ioctx =_ "__put_ioctx:\040freeing\040%p\012"
		fs/aio.c:248 [aio]ioctx_alloc =_ "ENOMEM:\040nr_events\040too\040high\012"
		fs/aio.c:1770 [aio]sys_io_cancel =_ "calling\040cancel\012"

	  Example usage:

		// enable the message at line 1603 of file svcsock.c
		nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > <debugfs>/dynamic_debug/control

		// enable all the messages in file svcsock.c
		nullarbor:~ # echo -n 'file svcsock.c +p' > <debugfs>/dynamic_debug/control

		// enable all the messages in the NFS server module
		nullarbor:~ # echo -n 'module nfsd +p' > <debugfs>/dynamic_debug/control

		// enable all 12 messages in the function svc_process()
		nullarbor:~ # echo -n 'func svc_process +p' > <debugfs>/dynamic_debug/control

		// disable all 12 messages in the function svc_process()
		nullarbor:~ # echo -n 'func svc_process -p' > <debugfs>/dynamic_debug/control

	  See Documentation/admin-guide/dynamic-debug-howto.rst for additional
	  information.
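
The control file format described above lends itself to simple tooling. The following Python sketch parses one `filename:lineno [module]function flags format` entry and decodes the octal escapes seen in the sample output (`\040`, `\012`). The names `CONTROL_LINE` and `parse_control_line` are hypothetical, and the regular expression is an assumption based only on the sample lines shown here, not a full grammar of the file.

```python
import re

CONTROL_LINE = re.compile(
    r'^(?P<filename>[^:\s]+):(?P<lineno>\d+) '
    r'\[(?P<module>[^\]]+)\](?P<function>\S+) '
    r'(?P<flags>\S+) "(?P<format>.*)"$'
)


def parse_control_line(line):
    """Parse one 'filename:lineno [module]function flags format' entry."""
    match = CONTROL_LINE.match(line.strip())
    if match is None:
        return None  # header comment or a line this sketch does not understand
    entry = match.groupdict()
    entry["lineno"] = int(entry["lineno"])
    entry["enabled"] = "p" in entry["flags"]  # 'p' means printing is on
    # Decode three-digit octal escapes such as \040 (space) and \012 (newline).
    entry["format"] = re.sub(
        r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), entry["format"]
    )
    return entry


sample = r'fs/aio.c:222 [aio]__put_ioctx =_ "__put_ioctx:\040freeing\040%p\012"'
print(parse_control_line(sample))
```

A script built on this could, for example, list which call sites in a module are currently enabled before writing a new query to the control file.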

config DYNAMIC_DEBUG_CORE
	bool "Enable core function of dynamic debug support"
	depends on PRINTK
	depends on (DEBUG_FS || PROC_FS)
	help
	  Enable core functional support of dynamic debug. It is useful
	  when you want to tie dynamic debug to your kernel modules with
	  DYNAMIC_DEBUG_MODULE defined for each of them, especially on
	  embedded systems where the kernel image size is a concern.

config SYMBOLIC_ERRNAME
	bool "Support symbolic error names in printf"
	default y if PRINTK
	help
	  If you say Y here, the kernel's printf implementation will
	  be able to print symbolic error names such as ENOSPC instead
	  of the number 28. It makes the kernel image slightly larger
	  (about 3KB), but can make the kernel logs easier to read.
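
As a userspace analogy only, the following Python sketch shows the kind of lookup this option enables: mapping an errno value to its symbolic name and falling back to the plain number when no name is known. It uses Python's standard `errno` module, not the kernel's own lookup tables, and the function name `symbolic_errname` is hypothetical.

```python
import errno


def symbolic_errname(err):
    """Return the symbolic name for an errno value, or the number as text."""
    name = errno.errorcode.get(abs(err))
    return name if name is not None else str(err)


print(symbolic_errname(errno.ENOSPC))
print(symbolic_errname(-errno.ENOSPC))
```

A log line that says ENOSPC is usually easier to act on than one that says 28, which is the trade-off against the roughly 3KB of extra image size mentioned above.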

config DEBUG_BUGVERBOSE
	bool "Verbose BUG() reporting (adds 70K)" if DEBUG_KERNEL && EXPERT
	depends on BUG && (GENERIC_BUG || HAVE_DEBUG_BUGVERBOSE)
	default y
	help
	  Say Y here to make BUG() panics output the file name and line number
	  of the BUG call as well as the EIP and oops trace. This aids
	  debugging but costs about 70-100K of memory.

endmenu # "printk and dmesg options"

config DEBUG_KERNEL
	bool "Kernel debugging"
	help
	  Say Y here if you are developing drivers or trying to debug and
	  identify kernel problems.

config DEBUG_MISC
	bool "Miscellaneous debug code"
	default DEBUG_KERNEL
	depends on DEBUG_KERNEL
	help
	  Say Y here if you need to enable miscellaneous debug code that should
	  be under a more specific debug option but isn't.

menu "Compile-time checks and compiler options"

config DEBUG_INFO
	bool
	help
	  A kernel debug info option other than "None" has been selected
	  in the "Debug information" choice below, indicating that debug
	  information will be generated for build targets.

# Clang generates .uleb128 with label differences for DWARF v5, a feature that
# older binutils ports do not support when utilizing RISC-V style linker
# relaxation: https://sourceware.org/bugzilla/show_bug.cgi?id=27215
config AS_HAS_NON_CONST_ULEB128
	def_bool $(as-instr,.uleb128 .Lexpr_end4 - .Lexpr_start3\n.Lexpr_start3:\n.Lexpr_end4:)

choice
	prompt "Debug information"
	depends on DEBUG_KERNEL
	help
	  Selecting something other than "None" results in a kernel image
	  that includes debugging info and is therefore larger.
	  This adds debug symbols to the kernel and modules (gcc -g), and
	  is needed if you intend to use kernel crashdump or binary object
	  tools like crash, kgdb, LKCD, gdb, etc. on the kernel.

	  Choose which version of DWARF debug info to emit. If unsure,
	  select "Toolchain default".

config DEBUG_INFO_NONE
	bool "Disable debug information"
	help
	  Do not build the kernel with debugging information, which will
	  result in a faster and smaller build.

config DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
	bool "Rely on the toolchain's implicit default DWARF version"
	select DEBUG_INFO
	depends on !CC_IS_CLANG || AS_IS_LLVM || CLANG_VERSION < 140000 || (AS_IS_GNU && AS_VERSION >= 23502 && AS_HAS_NON_CONST_ULEB128)
	help
	  The implicit default version of DWARF debug info produced by a
	  toolchain changes over time.

	  This can break consumers of the debug info that haven't upgraded to
	  support newer revisions, and prevent testing newer versions, but
	  those should be less common scenarios.

config DEBUG_INFO_DWARF4
	bool "Generate DWARF Version 4 debuginfo"
	select DEBUG_INFO
	depends on !CC_IS_CLANG || AS_IS_LLVM || (AS_IS_GNU && AS_VERSION >= 23502)
	help
	  Generate DWARF v4 debug info. This requires gcc 4.5+, binutils 2.35.2
	  if using clang without clang's integrated assembler, and gdb 7.0+.

	  If you have consumers of DWARF debug info that are not ready for
	  newer revisions of DWARF, you may wish to choose this or have your
	  config select this.

config DEBUG_INFO_DWARF5
	bool "Generate DWARF Version 5 debuginfo"
	select DEBUG_INFO
	depends on !ARCH_HAS_BROKEN_DWARF5
	depends on !CC_IS_CLANG || AS_IS_LLVM || (AS_IS_GNU && AS_VERSION >= 23502 && AS_HAS_NON_CONST_ULEB128)
	help
	  Generate DWARF v5 debug info. Requires binutils 2.35.2, gcc 5.0+ (gcc
	  5.0+ accepts the -gdwarf-5 flag but only had partial support for some
	  draft features until 7.0), and gdb 8.0+.

	  Changes to the structure of debug info in Version 5 allow for around
	  15-18% savings in resulting image and debug info section sizes as
	  compared to DWARF Version 4. DWARF Version 5 standardizes previous
	  extensions such as accelerators for symbol indexing and the format
	  for fission (.dwo/.dwp) files. Users may not want to select this
	  config if they rely on tooling that has not yet been updated to
	  support DWARF Version 5.

endchoice # "Debug information"
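
The numeric comparisons in the entries above (AS_VERSION >= 23502, CLANG_VERSION < 140000) line up with the versions named in the help texts (binutils 2.35.2). The Python sketch below assumes the encoding is major * 10000 + minor * 100 + patch, which is inferred from that correspondence rather than stated in this file; both function names are hypothetical.

```python
def toolchain_version(major, minor, patch=0):
    """Encode a version as major * 10000 + minor * 100 + patch (assumed scheme)."""
    return major * 10000 + minor * 100 + patch


def gnu_as_ok_for_dwarf5(as_version, has_non_const_uleb128):
    """Mirror the 'AS_VERSION >= 23502 && AS_HAS_NON_CONST_ULEB128' branch."""
    return as_version >= toolchain_version(2, 35, 2) and has_non_const_uleb128


print(toolchain_version(2, 35, 2))    # binutils 2.35.2
print(toolchain_version(14, 0, 0))    # clang 14.0.0
print(gnu_as_ok_for_dwarf5(toolchain_version(2, 38), True))
```

Reading the thresholds this way makes it easier to check a local toolchain against the dependency lines before choosing a DWARF version.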

if DEBUG_INFO

config DEBUG_INFO_REDUCED
	bool "Reduce debugging information"
	help
	  If you say Y here, gcc is instructed to generate less debugging
	  information for structure types. This means that tools that
	  need full debugging information (like kgdb or systemtap) won't
	  be happy. But if you merely need debugging information to
	  resolve line numbers, there is no loss. The advantage is that
	  build directory object sizes shrink dramatically over a full
	  DEBUG_INFO build and compile times are reduced too.
	  Only works with newer gcc versions.

choice
	prompt "Compressed Debug information"
	help
	  Compress the resulting debug info. Results in smaller debug info sections,
	  but requires that consumers are able to decompress the results.

	  If unsure, choose DEBUG_INFO_COMPRESSED_NONE.

config DEBUG_INFO_COMPRESSED_NONE
	bool "Don't compress debug information"
	help
	  Don't compress debug info sections.

config DEBUG_INFO_COMPRESSED_ZLIB
	bool "Compress debugging information with zlib"
	depends on $(cc-option,-gz=zlib)
	depends on $(ld-option,--compress-debug-sections=zlib)
	help
	  Compress the debug information using zlib. Requires GCC 5.0+ or Clang
	  5.0+, binutils 2.26+, and zlib.

	  Users of dpkg-deb via scripts/package/builddeb may find an increase in
	  size of their debug .deb packages with this config set, due to the
	  debug info being compressed with zlib, then the object files being
	  recompressed with a different compression scheme. But this is still
	  preferable to setting $KDEB_COMPRESS to "none" which would be even
	  larger.

config DEBUG_INFO_COMPRESSED_ZSTD
	bool "Compress debugging information with zstd"
	depends on $(cc-option,-gz=zstd)
	depends on $(ld-option,--compress-debug-sections=zstd)
	help
	  Compress the debug information using zstd. This may provide better
	  compression than zlib, for about the same time costs, but requires newer
	  toolchain support. Requires GCC 13.0+ or Clang 16.0+, binutils 2.40+, and
	  zstd.

endchoice # "Compressed Debug information"

config DEBUG_INFO_SPLIT
	bool "Produce split debuginfo in .dwo files"
	depends on $(cc-option,-gsplit-dwarf)
	# RISC-V linker relaxation + -gsplit-dwarf has issues with LLVM and GCC
	# prior to 12.x:
	# https://github.com/llvm/llvm-project/issues/56642
	# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99090
	depends on !RISCV || GCC_VERSION >= 120000
	help
	  Generate debug info into separate .dwo files. This significantly
	  reduces the build directory size for builds with DEBUG_INFO,
	  because it stores the information only once on disk in .dwo
	  files instead of multiple times in object files and executables.
	  In addition the debug information is also compressed.

	  Requires recent gcc (4.7+) and recent gdb/binutils.
	  Any tool that packages or reads debug information would need
	  to know about the .dwo files and include them.
	  Incompatible with older versions of ccache.

config DEBUG_INFO_BTF
	bool "Generate BTF type information"
	depends on !DEBUG_INFO_SPLIT && !DEBUG_INFO_REDUCED
	depends on !GCC_PLUGIN_RANDSTRUCT || COMPILE_TEST
	depends on BPF_SYSCALL
	depends on PAHOLE_VERSION >= 116
	depends on DEBUG_INFO_DWARF4 || PAHOLE_VERSION >= 121
	# pahole uses elfutils, which does not have support for Hexagon relocations
	depends on !HEXAGON
	help
	  Generate deduplicated BTF type information from DWARF debug info.
	  Turning this on requires pahole v1.16 or later (v1.21 or later to
	  support DWARF 5), which will convert DWARF type info into equivalent
	  deduplicated BTF type info.
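
The two PAHOLE_VERSION dependencies can be read as a small decision rule. The Python sketch below assumes that a pahole release string of the form `v<major>.<minor>` maps to the integer major * 100 + minor, which is inferred from the pairing of "v1.16" with 116 and "v1.21" with 121 in this entry. Both function names are hypothetical, and the other dependencies of DEBUG_INFO_BTF (such as BPF_SYSCALL) are deliberately left out.

```python
import re


def pahole_version(version_string):
    """Turn a string such as 'v1.16' into the integer 116 (assumed scheme)."""
    match = re.fullmatch(r"v(\d+)\.(\d+)", version_string.strip())
    if match is None:
        raise ValueError(f"unrecognized pahole version: {version_string!r}")
    return int(match.group(1)) * 100 + int(match.group(2))


def pahole_ok_for_btf(version, dwarf4):
    """Mirror the two PAHOLE_VERSION 'depends on' lines of DEBUG_INFO_BTF."""
    return version >= 116 and (dwarf4 or version >= 121)


print(pahole_ok_for_btf(pahole_version("v1.16"), dwarf4=True))
print(pahole_ok_for_btf(pahole_version("v1.16"), dwarf4=False))
```

In other words, an older pahole is only sufficient when DWARF Version 4 is selected explicitly; any other DWARF choice needs the newer release.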
|
kbuild: Build kernel module BTFs if BTF is enabled and pahole supports it
Detect if pahole supports split BTF generation, and generate BTF for each
selected kernel module, if it does. This is exposed to Makefiles and C code as
CONFIG_DEBUG_INFO_BTF_MODULES flag.
Kernel module BTF has to be re-generated if either vmlinux's BTF changes or
module's .ko changes. To achieve that, I needed a helper similar to
if_changed, but that would allow to filter out vmlinux from the list of
updated dependencies for .ko building. I've put it next to the only place that
uses and needs it, but it might be a better idea to just add it along the
other if_changed variants into scripts/Kbuild.include.
Each kernel module's BTF deduplication is pretty fast, as it does only
incremental BTF deduplication on top of already deduplicated vmlinux BTF. To
show the added build time, I've first ran make only just built kernel (to
establish the baseline) and then forced only BTF re-generation, without
regenerating .ko files. The build was performed with -j60 parallelization on
56-core machine. The final time also includes bzImage building, so it's not
a pure BTF overhead.
$ time make -j60
...
make -j60 27.65s user 10.96s system 782% cpu 4.933 total
$ touch ~/linux-build/default/vmlinux && time make -j60
...
make -j60 123.69s user 27.85s system 1566% cpu 9.675 total
So 4.6 seconds real time, with noticeable part spent in compressed vmlinux and
bzImage building.
To show size savings, I've built my kernel configuration with about 700 kernel
modules with full BTF per each kernel module (without deduplicating against
vmlinux) and with split BTF against deduplicated vmlinux (approach in this
patch). Below are top 10 modules with biggest BTF sizes. And total size of BTF
data across all kernel modules.
It shows that split BTF "compresses" 115MB down to 5MB total. And the biggest
kernel modules get a downsize from 500-570KB down to 200-300KB.
FULL BTF
========
$ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}'; done | awk '{ s += $1 } END { print s }'
115710691
$ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
./drivers/gpu/drm/i915/i915.ko 570570
./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 520240
./drivers/gpu/drm/radeon/radeon.ko 503849
./drivers/infiniband/hw/mlx5/mlx5_ib.ko 491777
./fs/xfs/xfs.ko 411544
./drivers/net/ethernet/intel/i40e/i40e.ko 403904
./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 398754
./drivers/infiniband/core/ib_core.ko 397224
./fs/cifs/cifs.ko 386249
./fs/nfsd/nfsd.ko 379738
SPLIT BTF
=========
$ for f in $(find . -name '*.ko'); do size -A -d $f | grep BTF | awk '{print $2}'; done | awk '{ s += $1 } END { print s }'
5194047
$ for f in $(find . -name '*.ko'); do printf "%s %d\n" $f $(size -A -d $f | grep BTF | awk '{print $2}'); done | sort -nr -k2 | head -n10
./drivers/gpu/drm/i915/i915.ko 293206
./drivers/gpu/drm/radeon/radeon.ko 282103
./fs/xfs/xfs.ko 222150
./drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko 198503
./drivers/infiniband/hw/mlx5/mlx5_ib.ko 198356
./drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko 113444
./fs/cifs/cifs.ko 109379
./arch/x86/kvm/kvm.ko 100225
./drivers/gpu/drm/drm.ko 94827
./drivers/infiniband/core/ib_core.ko 91188
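The two totals listed above can be turned into a reduction factor with integer shell arithmetic. This is only a sketch over the numbers reported in this message; the variable names are illustrative:

```shell
# Total BTF bytes across all modules, copied from the listings above.
full_total=115710691    # full BTF per module
split_total=5194047     # split BTF against deduplicated vmlinux

# Integer division is enough for a rough reduction factor.
ratio=$(( full_total / split_total ))
saved=$(( full_total - split_total ))

echo "split BTF is roughly ${ratio}x smaller, saving ${saved} bytes"
```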
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201110011932.3201430-4-andrii@kernel.org
2020-11-09 17:19:30 -08:00
|
|
|
config PAHOLE_HAS_SPLIT_BTF
|
2022-02-01 13:56:23 -07:00
|
|
|
def_bool PAHOLE_VERSION >= 119
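The comparison above treats the pahole version as a plain integer, so version 1.19 corresponds to 119. A minimal sketch of how such a number could be derived from a version string, assuming the string has the form "v<major>.<minor>"; the helper name pahole_version_to_int is hypothetical and is not claimed to be what Kbuild uses:

```shell
# Hypothetical helper: turn a version string such as "v1.19" into 119
# by dropping the leading "v" and the dot between major and minor.
pahole_version_to_int() {
    printf '%s\n' "$1" | sed -E 's/^v([0-9]+)\.([0-9]+).*$/\1\2/'
}

ver=$(pahole_version_to_int "v1.19")

# Same threshold as the def_bool expression above.
if [ "$ver" -ge 119 ]; then
    echo "split BTF supported"
else
    echo "split BTF not supported"
fi
```

In a real build the input string would come from the installed pahole binary rather than a literal.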
|
2020-11-09 17:19:30 -08:00
|
|
|
|
compiler_types: define __user as __attribute__((btf_type_tag("user")))
The __user attribute is currently used mainly by sparse for type checking.
The attribute indicates whether a memory access is in the user memory address
space or not. Such information is important when tracing kernel
internal functions or data structures, as accessing user memory often
uses different mechanisms than accessing kernel memory. For example,
perf-probe needs an explicit command-line specification to indicate that a
particular argument or string is in user-space memory ([1], [2], [3]).
Currently, vmlinux BTF is available in the kernels of many distributions.
If __user attribute information is available in vmlinux BTF, the explicit
user memory access information from users will no longer be necessary, as
the kernel can figure it out by itself from vmlinux BTF.
Besides the above possible use for perf/probe, another use case is
the bpf verifier. Currently, in BPF_PROG_TYPE_TRACING
programs, users can write direct code like
p->m1->m2
where "p" could be a function parameter. Without __user information in BTF,
the verifier will assume that p->m1 accesses kernel memory and will generate
normal loads. Suppose "p" is actually tagged with __user in the source
code. In that case, p->m1 actually accesses user memory, so a direct
load is not right and may produce an incorrect result. For such cases,
bpf_probe_read_user() is the correct way to read p->m1.
To support encoding __user information in BTF, a new attribute
__attribute__((btf_type_tag("<arbitrary_string>")))
is implemented in clang ([4]). For example, if we have
#define __user __attribute__((btf_type_tag("user")))
during kernel compilation, the "user" attribute information will
be preserved in DWARF. After pahole converts DWARF to BTF, the __user
information will be available in vmlinux BTF.
The following is an example with the latest upstream clang (clang14) and
pahole 1.23:
[$ ~] cat test.c
#define __user __attribute__((btf_type_tag("user")))
int foo(int __user *arg) {
return *arg;
}
[$ ~] clang -O2 -g -c test.c
[$ ~] pahole -JV test.o
...
[1] INT int size=4 nr_bits=32 encoding=SIGNED
[2] TYPE_TAG user type_id=1
[3] PTR (anon) type_id=2
[4] FUNC_PROTO (anon) return=1 args=(3 arg)
[5] FUNC foo type_id=4
[$ ~]
You can see that for the function argument "int __user *arg", its type is
described as
PTR -> TYPE_TAG(user) -> INT
The kernel can use this information for bpf verification or other
use cases.
Currently, btf_type_tag is only supported in clang (>= clang14) and
pahole (>= 1.23). gcc support has also been proposed and is under development ([5]).
[1] http://lkml.kernel.org/r/155789874562.26965.10836126971405890891.stgit@devnote2
[2] http://lkml.kernel.org/r/155789872187.26965.4468456816590888687.stgit@devnote2
[3] http://lkml.kernel.org/r/155789871009.26965.14167558859557329331.stgit@devnote2
[4] https://reviews.llvm.org/D111199
[5] https://lore.kernel.org/bpf/0cbeb2fb-1a18-f690-e360-24b1c90c2a91@fb.com/
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220127154600.652613-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-27 07:46:00 -08:00
|
|
|
config PAHOLE_HAS_BTF_TAG
|
2022-02-01 13:56:23 -07:00
|
|
|
def_bool PAHOLE_VERSION >= 123
|
2022-01-27 07:46:00 -08:00
|
|
|
depends on CC_IS_CLANG
|
|
|
|
help
|
|
|
|
Decide whether pahole emits btf_tag attributes (btf_type_tag and
|
|
|
|
btf_decl_tag) or not. Currently only the clang compiler implements
|
|
|
|
these attributes, so make the config depend on CC_IS_CLANG.
|
2020-11-09 17:19:30 -08:00
|
|
|
|
2023-01-11 12:20:50 -03:00
|
|
|
config PAHOLE_HAS_LANG_EXCLUDE
|
|
|
|
def_bool PAHOLE_VERSION >= 124
|
|
|
|
help
|
|
|
|
Support for the --lang_exclude flag which makes pahole exclude
|
|
|
|
compilation units from the supplied language. Used in Kbuild to
|
|
|
|
omit Rust CUs which are not supported in version 1.24 of pahole,
|
|
|
|
otherwise it would emit malformed kernel and module binaries when
|
|
|
|
using DEBUG_INFO_BTF_MODULES.
|
|
|
|
|
2020-11-09 17:19:30 -08:00
|
|
|
config DEBUG_INFO_BTF_MODULES
|
2024-04-04 15:03:44 -07:00
|
|
|
bool "Generate BTF type information for kernel modules"
|
|
|
|
default y
|
2020-11-09 17:19:30 -08:00
|
|
|
depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF
|
|
|
|
help
|
|
|
|
Generate compact split BTF type information for kernel modules.
|
|
|
|
|
2022-02-23 01:28:14 +00:00
|
|
|
config MODULE_ALLOW_BTF_MISMATCH
|
|
|
|
bool "Allow loading modules with non-matching BTF type info"
|
|
|
|
depends on DEBUG_INFO_BTF_MODULES
|
|
|
|
help
|
|
|
|
For modules whose split BTF does not match vmlinux, load without
|
|
|
|
BTF rather than refusing to load. The default behavior with
|
|
|
|
module BTF enabled is to reject modules with such mismatches;
|
|
|
|
this option will still load module BTF where possible but ignore
|
|
|
|
it when a mismatch is found.
|
|
|
|
|
2015-02-17 13:46:36 -08:00
|
|
|
config GDB_SCRIPTS
|
|
|
|
bool "Provide GDB scripts for kernel debugging"
|
|
|
|
help
|
|
|
|
This creates the required links to GDB helper scripts in the
|
|
|
|
build directory. If you load vmlinux into gdb, the helper
|
|
|
|
scripts will be automatically imported by gdb as well, and
|
|
|
|
additional functions are available to analyze a Linux kernel
|
2016-12-14 15:05:40 -08:00
|
|
|
instance. See Documentation/dev-tools/gdb-kernel-debugging.rst
|
|
|
|
for further details.
|
2015-02-17 13:46:36 -08:00
|
|
|
|
2020-08-16 14:32:44 +02:00
|
|
|
endif # DEBUG_INFO
|
|
|
|
|
2008-02-22 15:15:03 +01:00
|
|
|
config FRAME_WARN
|
2020-02-17 00:19:36 +09:00
|
|
|
int "Warn for stack frames larger than"
|
2008-02-22 15:15:03 +01:00
|
|
|
range 0 8192
|
2022-11-02 12:06:09 +01:00
|
|
|
default 0 if KMSAN
|
2016-10-27 17:46:41 -07:00
|
|
|
default 2048 if GCC_PLUGIN_LATENT_ENTROPY
|
2021-11-19 22:31:03 +01:00
|
|
|
default 2048 if PARISC
|
|
|
|
default 1536 if (!64BIT && XTENSA)
|
2022-11-25 12:07:50 +00:00
|
|
|
default 1280 if KASAN && !64BIT
|
2021-11-19 22:31:03 +01:00
|
|
|
default 1024 if !64BIT
|
2008-02-22 15:15:03 +01:00
|
|
|
default 2048 if 64BIT
|
|
|
|
help
|
2022-10-24 23:21:42 +02:00
|
|
|
Tell the compiler to warn at build time for stack frames larger than this.
|
2008-02-22 15:15:03 +01:00
|
|
|
Setting this too low will cause a lot of warnings.
|
|
|
|
Setting it to 0 disables the warning.
|
|
|
|
|
2009-09-18 12:49:22 -07:00
|
|
|
config STRIP_ASM_SYMS
|
|
|
|
bool "Strip assembler-generated symbols during link"
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
Strip internal assembler-generated symbols during a link (symbols
|
|
|
|
that look like '.Lxxx') so they don't pollute the output of
|
|
|
|
get_wchan() and suchlike.
|
|
|
|
|
2012-03-28 11:51:18 -07:00
|
|
|
config READABLE_ASM
|
2019-12-06 17:04:08 -08:00
|
|
|
bool "Generate readable assembler code"
|
|
|
|
depends on DEBUG_KERNEL
|
Makefile: remove stale cc-option checks
cc-option, cc-option-yn, and cc-disable-warning all invoke the compiler
at build time, and can slow down the build when these checks become
stale for our supported compilers, whose minimum supported versions
increase over time. See Documentation/process/changes.rst for the
currently supported minimum versions (GCC 4.9+, clang 10.0.1+). Compiler
version support for these flags may be verified on godbolt.org.
The following flags are GCC only and supported since at least GCC 4.9.
Remove cc-option and cc-disable-warning tests.
* -fno-tree-loop-im
* -Wno-maybe-uninitialized
* -fno-reorder-blocks
* -fno-ipa-cp-clone
* -fno-partial-inlining
* -femit-struct-debug-baseonly
* -fno-inline-functions-called-once
* -fconserve-stack
The following flags are supported by all supported versions of GCC and
Clang. Remove their cc-option, cc-option-yn, and cc-disable-warning tests.
* -fno-delete-null-pointer-checks
* -fno-var-tracking
* -Wno-array-bounds
The following configs are made dependent on GCC, since they use GCC
specific flags.
* READABLE_ASM
* DEBUG_SECTION_MISMATCH
-mfentry was not supported by s390-linux-gnu-gcc until gcc-9+; add a
comment.
--param=allow-store-data-races=0 was renamed to -fno-allow-store-data-races
in the GCC 10 release; add a comment.
-Wmaybe-uninitialized (GCC specific) was being added for CONFIG_GCOV,
then again unconditionally; add it only once.
Also, base RETPOLINE_CFLAGS and RETPOLINE_VDSO_CFLAGS on CONFIG_CC_IS_*,
then remove cc-option tests for Clang.
Link: https://github.com/ClangBuiltLinux/linux/issues/1436
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-08-16 13:25:01 -07:00
|
|
|
depends on CC_IS_GCC
|
2006-12-10 02:18:37 -08:00
|
|
|
help
|
2019-12-06 17:04:08 -08:00
|
|
|
Disable some compiler optimizations that tend to generate human unreadable
|
|
|
|
assembler output. This may make the kernel slightly slower, but it helps
|
|
|
|
to keep kernel developers who have to stare a lot at assembler listings
|
|
|
|
sane.
|
2006-12-10 02:18:37 -08:00
|
|
|
|
2019-06-04 19:13:59 +09:00
|
|
|
config HEADERS_INSTALL
|
|
|
|
bool "Install uapi headers to usr/include"
|
2006-12-10 02:18:37 -08:00
|
|
|
depends on !UML
|
|
|
|
help
|
2019-06-04 19:13:59 +09:00
|
|
|
This option will install uapi headers (headers exported to user-space)
|
|
|
|
into the usr/include directory for use during the kernel build.
|
|
|
|
This is unneeded for building the kernel itself, but needed for some
|
|
|
|
user-space program samples. It is also needed by some features such
|
|
|
|
as uapi header sanity checks.
|
|
|
|
|
2008-01-21 21:31:44 +01:00
|
|
|
config DEBUG_SECTION_MISMATCH
|
|
|
|
bool "Enable full Section mismatch analysis"
|
2021-08-16 13:25:01 -07:00
|
|
|
depends on CC_IS_GCC
|
2008-01-21 21:31:44 +01:00
|
|
|
help
|
|
|
|
The section mismatch analysis checks if there are illegal
|
|
|
|
references from one section to another section.
|
2011-04-17 04:08:48 +00:00
|
|
|
During linktime or runtime, some sections are dropped;
|
|
|
|
any use of code/data previously in these sections would
|
2008-01-21 21:31:44 +01:00
|
|
|
most likely result in an oops.
|
2011-04-17 04:08:48 +00:00
|
|
|
In the code, functions and variables are annotated with
|
2013-06-19 14:53:51 -04:00
|
|
|
__init, etc. (see the full list in include/linux/init.h),
|
2008-01-30 11:13:23 +01:00
|
|
|
which results in the code/data being placed in specific sections.
|
2011-04-17 04:08:48 +00:00
|
|
|
The section mismatch analysis is always performed after a full
|
|
|
|
kernel build, and enabling this option causes the following
|
kbuild: create *.mod with full directory path and remove MODVERDIR
While descending directories, Kbuild produces objects for modules,
but does not link the final *.ko files; that is done in modpost.
To keep track of modules, Kbuild creates a *.mod file in $(MODVERDIR)
for every module it is building. Some post-processing steps read the
necessary information from *.mod files. This avoids descending into
directories again. This mechanism was introduced in 2003 or so.
Later, commit 551559e13af1 ("kbuild: implement modules.order") added
modules.order. So, we can simply read it out to know all the modules
with directory paths. This is easier than parsing the first line of
*.mod files.
$(MODVERDIR) has a flat directory structure, that is, *.mod files
are named only with base names. This is based on the assumption that
the module name is unique across the tree. This assumption is really
fragile.
Stephen Rothwell reported a race condition caused by a module name
conflict:
https://lkml.org/lkml/2019/5/13/991
In parallel building, two different threads could write to the same
$(MODVERDIR)/*.mod simultaneously.
Non-unique module names are the source of all kinds of trouble, hence
commit 3a48a91901c5 ("kbuild: check uniqueness of module names")
introduced a new checker script.
However, it is still fragile from the build system's point of view because
this race happens before scripts/modules-check.sh is invoked. If it
happens again, modpost will emit unclear error messages.
To fix this issue completely, create *.mod with full directory path
so that two threads never attempt to write to the same file.
$(MODVERDIR) is no longer needed.
Since modules with directory paths are listed in modules.order, Kbuild
is still able to find *.mod files without additional descending.
I also killed cmd_secanalysis; scripts/mod/sumversion.c computes MD4 hash
for modules with MODULE_VERSION(). When CONFIG_DEBUG_SECTION_MISMATCH=y,
it occurs not only in the modpost stage, but also during directory
descending, where sumversion.c may parse stale *.mod files. It would emit
'No such file or directory' warning when an object consisting a module is
renamed, or when a single-obj module is turned into a multi-obj module or
vice versa.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
2019-07-17 15:17:57 +09:00
|
|
|
additional step to occur:
|
2011-04-17 04:08:48 +00:00
|
|
|
- Add the option -fno-inline-functions-called-once to gcc commands.
|
|
|
|
When inlining a function annotated with __init in a non-init
|
|
|
|
function, we would lose the section information and thus
|
2008-01-21 21:31:44 +01:00
|
|
|
the analysis would not catch the illegal reference.
|
2011-04-17 04:08:48 +00:00
|
|
|
This option tells gcc to inline less (but it does result in
|
|
|
|
a larger kernel).
|
2008-01-21 21:31:44 +01:00
|
|
|
|
2015-10-06 09:44:42 +10:30
|
|
|
config SECTION_MISMATCH_WARN_ONLY
|
|
|
|
bool "Make section mismatch errors non-fatal"
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
If you say N here, the build process will fail if there are any
|
|
|
|
section mismatches, instead of just throwing warnings.
|
|
|
|
|
|
|
|
If unsure, say Y.
|
|
|
|
|
2021-05-06 15:34:59 +08:00
|
|
|
config DEBUG_FORCE_FUNCTION_ALIGN_64B
|
2022-03-23 16:05:50 -07:00
|
|
|
bool "Force all function addresses to be 64B aligned"
|
2023-07-28 00:03:56 +08:00
|
|
|
depends on EXPERT && (X86_64 || ARM64 || PPC32 || PPC64 || ARC || RISCV || S390)
|
2022-09-15 13:10:47 +02:00
|
|
|
select FUNCTION_ALIGNMENT_64B
|
./Makefile: add debug option to enable function aligned on 32 bytes
Recently 0day reported many strange performance changes (regression or
improvement), in which there was no obvious relation between the culprit
commit and the benchmark at the first look, and it causes people to doubt
the test itself is wrong.
Upon further check, many of these cases are caused by the change to the
alignment of kernel text or data, as whole text/data of kernel are linked
together, change in one domain may affect alignments of other domains.
gcc has an option '-falign-functions=n' to force text aligned, and with
that option enabled, some of those performance changes will be gone, like
[1][2][3].
Add this option so that developers and 0day can easily find performance
bump caused by text alignment change, as tracking these strange bump is
quite time consuming. Though it can't help in other cases like data
alignment changes like [4].
Following is some size data for v5.7 kernel built with a RHEL config used
in 0day:
text data bss dec filename
19738771 13292906 5554236 38585913 vmlinux.noalign
19758591 13297002 5529660 38585253 vmlinux.align32
Raw vmlinux size in bytes:
v5.7 v5.7+align32
253950832 254018000 +0.02%
Some benchmark data, most of them have no big change:
* hackbench: [ -1.8%, +0.5%]
* fsmark: [ -3.2%, +3.4%] # ext4/xfs/btrfs
* kbuild: [ -2.0%, +0.9%]
* will-it-scale: [ -0.5%, +1.8%] # mmap1/pagefault3
* netperf:
- TCP_CRR [+16.6%, +97.4%]
- TCP_RR [-18.5%, -1.8%]
- TCP_STREAM [ -1.1%, +1.9%]
[1] https://lore.kernel.org/lkml/20200114085637.GA29297@shao2-debian/
[2] https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
[3] https://lore.kernel.org/lkml/1d98d1f0-fe84-6df7-f5bd-f4cb2cdb7f45@intel.com/
[4] https://lore.kernel.org/lkml/20200205123216.GO12867@shao2-debian/
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Link: http://lkml.kernel.org/r/1595475001-90945-1-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-11 18:34:13 -07:00
|
|
|
help
|
|
|
|
There are cases that a commit from one domain changes the function
|
|
|
|
address alignment of other domains, and causes a magic performance
|
|
|
|
bump (regression or improvement). Enabling this option will help to
|
|
|
|
verify if the bump is caused by function alignment changes, while
|
|
|
|
it will slightly increase the kernel size and affect icache usage.
|
|
|
|
|
|
|
|
It is mainly for debug and performance tuning use.
|
|
|
|
|
2013-07-01 13:04:46 -07:00
|
|
|
#
|
|
|
|
# Select this config option from the architecture Kconfig, if it
|
|
|
|
# is preferred to always offer frame pointers as a config
|
|
|
|
# option on the architecture (regardless of DEBUG_KERNEL):
|
|
|
|
#
|
|
|
|
config ARCH_WANT_FRAME_POINTERS
|
|
|
|
bool
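#
# Illustrative sketch only, not part of the upstream file: an architecture
# that always wants to offer frame pointers could select the symbol from
# its own Kconfig. MYARCH is a hypothetical architecture symbol.
#
#   config MYARCH
#           def_bool y
#           select ARCH_WANT_FRAME_POINTERS
#
# With that select in place, FRAME_POINTER becomes available on the
# architecture without DEBUG_KERNEL and defaults to y.
#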
|
2006-01-09 20:54:51 -08:00
|
|
|
|
2013-07-01 13:04:46 -07:00
|
|
|
config FRAME_POINTER
|
|
|
|
bool "Compile the kernel with frame pointers"
|
2018-03-07 23:30:54 +01:00
|
|
|
depends on DEBUG_KERNEL && (M68K || UML || SUPERH) || ARCH_WANT_FRAME_POINTERS
|
2013-07-01 13:04:46 -07:00
|
|
|
default y if (DEBUG_INFO && UML) || ARCH_WANT_FRAME_POINTERS
|
2007-02-12 00:52:00 -08:00
|
|
|
help
|
2013-07-01 13:04:46 -07:00
|
|
|
If you say Y here the resulting kernel image will be slightly
|
|
|
|
larger and slower, but it gives very useful debugging information
|
|
|
|
in case of kernel bugs. (precise oopses/stacktraces/warnings)
|
2007-02-12 00:52:00 -08:00
|
|
|
|
2022-04-18 09:50:36 -07:00
|
|
|
config OBJTOOL
|
|
|
|
bool
|
|
|
|
|
2016-02-28 22:22:42 -06:00
|
|
|
config STACK_VALIDATION
|
|
|
|
bool "Compile-time stack metadata validation"
|
2022-04-18 09:50:36 -07:00
|
|
|
depends on HAVE_STACK_VALIDATION && UNWINDER_FRAME_POINTER
|
|
|
|
select OBJTOOL
|
2016-02-28 22:22:42 -06:00
|
|
|
default n
|
|
|
|
help
|
2022-04-18 09:50:36 -07:00
|
|
|
Validate frame pointer rules at compile-time. This helps ensure that
|
|
|
|
runtime stack traces are more reliable.
|
2017-07-24 18:36:57 -05:00
|
|
|
|
2016-02-28 22:22:42 -06:00
|
|
|
For more information, see
|
2022-06-26 10:11:01 +01:00
|
|
|
tools/objtool/Documentation/objtool.txt.
|
2016-02-28 22:22:42 -06:00
|
|
|
|
2022-04-18 09:50:41 -07:00
|
|
|
config NOINSTR_VALIDATION
|
2020-03-18 13:33:54 +01:00
|
|
|
bool
|
2022-04-18 09:50:42 -07:00
|
|
|
depends on HAVE_NOINSTR_VALIDATION && DEBUG_ENTRY
|
2022-04-18 09:50:36 -07:00
|
|
|
select OBJTOOL
|
2020-03-18 13:33:54 +01:00
|
|
|
default y
|
|
|
|
|
2021-03-05 10:27:07 +01:00
|
|
|
config VMLINUX_MAP
|
|
|
|
bool "Generate vmlinux.map file when linking"
|
|
|
|
depends on EXPERT
|
|
|
|
help
|
|
|
|
Selecting this option will pass "-Map=vmlinux.map" to ld
|
|
|
|
when linking vmlinux. That file can be useful for verifying
|
|
|
|
and debugging magic section games, and for seeing which
|
|
|
|
pieces of code get eliminated with
|
|
|
|
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.
|
|
|
|
|
kbuild: generate offset range data for builtin modules
Create file module.builtin.ranges that can be used to find where
built-in modules are located by their addresses. This will be useful for
tracing tools to find what functions are for various built-in modules.
The offset range data for builtin modules is generated using:
- modules.builtin: associates object files with module names
- vmlinux.map: provides load order of sections and offset of first member
per section
- vmlinux.o.map: provides offset of object file content per section
- .*.cmd: build cmd file with KBUILD_MODFILE
The generated data will look like:
.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 0009bd10-0009c8e0 iosf_mbi
...
.text 00b9f080-00ba011a intel_skl_int3472_discrete
.text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
.text 00ba03c0-00ba08d6 intel_skl_int3472_tps68470
...
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore
For each ELF section, it lists the offset of the first symbol. This can
be used to determine the base address of the section at runtime.
Next, it lists (in strict ascending order) offset ranges in that section
that cover the symbols of one or more builtin modules. Multiple ranges
can apply to a single module, and ranges can be shared between modules.
The CONFIG_BUILTIN_MODULE_RANGES option controls whether offset range data
is generated for kernel modules that are built into the kernel image.
How it works:
1. The modules.builtin file is parsed to obtain a list of built-in
module names and their associated object names (the .ko file that
the module would be in if it were a loadable module, hereafter
referred to as <kmodfile>). This object name can be used to
identify objects in the kernel compile because any C or assembler
code that ends up into a built-in module will have the option
-DKBUILD_MODFILE=<kmodfile> present in its build command, and those
can be found in the .<obj>.cmd file in the kernel build tree.
If an object is part of multiple modules, they will all be listed
in the KBUILD_MODFILE option argument.
This allows us to conclusively determine whether an object in the
kernel build belong to any modules, and which.
2. The vmlinux.map is parsed next to determine the base address of each
top level section so that all addresses into the section can be
turned into offsets. This makes it possible to handle sections
getting loaded at different addresses at system boot.
We also determine an 'anchor' symbol at the beginning of each
section to make it possible to calculate the true base address of
a section at runtime (i.e. symbol address - symbol offset).
We collect start addresses of sections that are included in the top
level section. This is used when vmlinux is linked using vmlinux.o,
because in that case, we need to look at the vmlinux.o linker map to
know what object a symbol is found in.
And finally, we process each symbol that is listed in vmlinux.map
(or vmlinux.o.map) based on the following structure:
vmlinux linked from vmlinux.a:
vmlinux.map:
<top level section>
<included section> -- might be same as top level section)
<object> -- built-in association known
<symbol> -- belongs to module(s) object belongs to
...
vmlinux linked from vmlinux.o:
vmlinux.map:
<top level section>
<included section> -- might be same as top level section)
vmlinux.o -- need to use vmlinux.o.map
<symbol> -- ignored
...
vmlinux.o.map:
<section>
<object> -- built-in association known
<symbol> -- belongs to module(s) object belongs to
...
3. As sections, objects, and symbols are processed, offset ranges are
constructed in a straight-forward way:
- If the symbol belongs to one or more built-in modules:
- If we were working on the same module(s), extend the range
to include this object
- If we were working on another module(s), close that range,
and start the new one
- If the symbol does not belong to any built-in modules:
- If we were working on a module(s) range, close that range
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Sam James <sam@gentoo.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-09-06 10:45:03 -04:00
|
|
|
config BUILTIN_MODULE_RANGES
|
|
|
|
bool "Generate address range information for builtin modules"
|
|
|
|
depends on !LTO
|
|
|
|
depends on VMLINUX_MAP
|
|
|
|
help
|
|
|
|
When modules are built into the kernel, there will be no module name
|
|
|
|
associated with its symbols in /proc/kallsyms. Tracers may want to
|
|
|
|
identify symbols by module name and symbol name regardless of whether
|
|
|
|
the module is configured as loadable or not.
|
|
|
|
|
|
|
|
This option generates modules.builtin.ranges in the build tree with
|
|
|
|
offset ranges (per ELF section) for the module(s) they belong to.
|
|
|
|
It also records an anchor symbol to determine the load address of the
|
|
|
|
section.
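#
# Illustrative sketch only, not part of the upstream file: going by the
# "depends on" lines in this file, a .config fragment that produces
# modules.builtin.ranges would need at least the following, with LTO
# left disabled:
#
#   CONFIG_EXPERT=y
#   CONFIG_VMLINUX_MAP=y
#   CONFIG_BUILTIN_MODULE_RANGES=y
#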
|
|
|
|
|
2013-07-01 13:04:46 -07:00
|
|
|
config DEBUG_FORCE_WEAK_PER_CPU
|
|
|
|
bool "Force weak per-cpu definitions"
|
|
|
|
depends on DEBUG_KERNEL
|
2005-09-06 15:16:27 -07:00
|
|
|
help
|
2013-07-01 13:04:46 -07:00
|
|
|
s390 and alpha require percpu variables in modules to be
|
|
|
|
defined weak to work around an addressing range issue, which
|
|
|
|
puts the following two restrictions on percpu variable
|
|
|
|
definitions.
|
2005-09-06 15:16:27 -07:00
|
|
|
|
2013-07-01 13:04:46 -07:00
|
|
|
1. percpu symbols must be unique whether static or not
|
|
|
|
2. percpu variables can't be defined inside a function
|
2005-09-06 15:16:27 -07:00
|
|
|
|
2013-07-01 13:04:46 -07:00
|
|
|
To ensure that generic code follows the above rules, this
|
|
|
|
option forces all percpu variables to be defined as weak.
|
2012-02-09 17:42:21 -05:00
|
|
|
|
2013-07-01 13:04:46 -07:00
|
|
|
endmenu # "Compiler options"
|
2005-09-06 15:16:27 -07:00
|
|
|
|
2019-12-06 17:03:42 -08:00
|
|
|
menu "Generic Kernel Debugging Instruments"
|
|
|
|
|
2013-07-01 13:04:46 -07:00
|
|
|
config MAGIC_SYSRQ
|
|
|
|
bool "Magic SysRq key"
|
|
|
|
depends on !UML
|
|
|
|
help
|
|
|
|
If you say Y here, you will have some control over the system even
|
|
|
|
if the system crashes, for example during kernel debugging (e.g., you
|
|
|
|
will be able to flush the buffer cache to disk, reboot the system
|
|
|
|
immediately or dump some status information). This is accomplished
|
|
|
|
by pressing various keys while holding SysRq (Alt+PrintScreen). It
|
|
|
|
also works on a serial console (on PC hardware at least), if you
|
|
|
|
send a BREAK and then within 5 seconds a command keypress. The
|
2017-03-16 09:37:32 +01:00
|
|
|
keys are documented in <file:Documentation/admin-guide/sysrq.rst>.
|
|
|
|
Don't say Y unless you really know what this hack does.
|
2005-09-06 15:16:27 -07:00
|
|
|
|
2013-10-07 01:05:46 +01:00
|
|
|
config MAGIC_SYSRQ_DEFAULT_ENABLE
|
|
|
|
hex "Enable magic SysRq key functions by default"
|
|
|
|
depends on MAGIC_SYSRQ
|
|
|
|
default 0x1
|
|
|
|
help
|
|
|
|
Specifies which SysRq key functions are enabled by default.
|
|
|
|
This may be set to 1 or 0 to enable or disable them all, or
|
2017-03-16 09:37:32 +01:00
|
|
|
to a bitmask as described in Documentation/admin-guide/sysrq.rst.
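#
# Illustrative sketch only, not part of the upstream file: a .config
# fragment that allows just sync, remount read-only and reboot/poweroff
# by default. The bit values are assumed from the table in
# Documentation/admin-guide/sysrq.rst (0x10 sync, 0x20 remount read-only,
# 0x80 reboot/poweroff), so check them against your tree:
#
#   CONFIG_MAGIC_SYSRQ=y
#   CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0xb0
#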
|
2013-10-07 01:05:46 +01:00
|
|
|
|
2016-12-22 08:31:34 +01:00
|
|
|
config MAGIC_SYSRQ_SERIAL
|
|
|
|
bool "Enable magic SysRq key over serial"
|
|
|
|
depends on MAGIC_SYSRQ
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
Many embedded boards have a disconnected TTL level serial which can
|
|
|
|
generate some garbage that can lead to spurious SysRq detections.
|
|
|
|
This option allows you to decide whether you want to enable the
|
|
|
|
magic SysRq key over serial.
|
|
|
|
|
2020-03-02 17:51:35 +00:00
|
|
|
config MAGIC_SYSRQ_SERIAL_SEQUENCE
|
|
|
|
string "Char sequence that enables magic SysRq over serial"
|
|
|
|
depends on MAGIC_SYSRQ_SERIAL
|
|
|
|
default ""
|
|
|
|
help
|
|
|
|
Specifies a sequence of characters that can follow BREAK to enable
|
|
|
|
SysRq on a serial console.
|
|
|
|
|
2020-03-06 15:31:56 +00:00
|
|
|
If unsure, leave an empty string and the option will not be enabled.
|
|
|
|
|
2019-12-06 17:04:06 -08:00
|
|
|
config DEBUG_FS
|
|
|
|
bool "Debug Filesystem"
|
|
|
|
help
|
|
|
|
debugfs is a virtual file system that kernel developers use to put
|
|
|
|
debugging files into. Enable this option to be able to read and
|
|
|
|
write to these files.
|
|
|
|
|
|
|
|
For detailed documentation on the debugfs API, see
|
|
|
|
Documentation/filesystems/.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2020-07-16 09:15:11 +02:00
|
|
|
choice
|
|
|
|
prompt "Debugfs default access"
|
|
|
|
depends on DEBUG_FS
|
|
|
|
default DEBUG_FS_ALLOW_ALL
|
|
|
|
help
|
|
|
|
This selects the default access restrictions for debugfs.
|
|
|
|
It can be overridden with kernel command line option
|
|
|
|
debugfs=[on,no-mount,off]. The restrictions apply for API access
|
|
|
|
and filesystem registration.
|
|
|
|
|
|
|
|
config DEBUG_FS_ALLOW_ALL
|
|
|
|
bool "Access normal"
|
|
|
|
help
|
|
|
|
No restrictions apply. Both API and filesystem registration
|
|
|
|
are on. This is the normal default operation.
|
|
|
|
|
|
|
|
config DEBUG_FS_DISALLOW_MOUNT
|
|
|
|
bool "Do not register debugfs as filesystem"
|
|
|
|
help
|
|
|
|
The API is open but the filesystem is not registered. Clients can still do
|
|
|
|
their work and read with debug tools that do not need
|
|
|
|
debugfs filesystem.
|
|
|
|
|
|
|
|
config DEBUG_FS_ALLOW_NONE
|
|
|
|
bool "No access"
|
|
|
|
help
|
|
|
|
Access is off. Clients get -EPERM when trying to create nodes in
|
|
|
|
debugfs tree and debugfs is not registered as a filesystem.
|
|
|
|
Clients can then back off or continue without debugfs access.
|
|
|
|
|
|
|
|
endchoice
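#
# Illustrative sketch only, not part of the upstream file: the choice above
# sets the build-time default, and the debugfs= parameter named in its help
# text overrides it at boot. Assuming the three values map to the three
# entries in order (on: DEBUG_FS_ALLOW_ALL, no-mount:
# DEBUG_FS_DISALLOW_MOUNT, off: DEBUG_FS_ALLOW_NONE), a kernel built with
#
#   CONFIG_DEBUG_FS=y
#   CONFIG_DEBUG_FS_ALLOW_NONE=y
#
# could still be booted with full access by appending
#
#   debugfs=on
#
# to the kernel command line.
#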
|
|
|
|
|
2019-12-06 17:03:42 -08:00
|
|
|
source "lib/Kconfig.kgdb"
|
|
|
|
source "lib/Kconfig.ubsan"
|
2020-09-18 21:20:42 -07:00
|
|
|
source "lib/Kconfig.kcsan"
|
2019-12-06 17:03:42 -08:00
|
|
|
|
|
|
|
endmenu
|
|
|
|
|
2021-12-04 20:21:57 -08:00
|
|
|
menu "Networking Debugging"
|
|
|
|
|
|
|
|
source "net/Kconfig.debug"
|
|
|
|
|
|
|
|
endmenu # "Networking Debugging"
|
2019-05-14 15:44:00 -07:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
menu "Memory Debugging"
|
2011-03-22 16:34:16 -07:00
|
|
|
|
2018-12-11 20:01:04 +09:00
|
|
|
source "mm/Kconfig.debug"
|
2011-03-22 16:34:16 -07:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config DEBUG_OBJECTS
|
|
|
|
bool "Debug object operations"
|
|
|
|
depends on DEBUG_KERNEL
|
2008-05-12 21:21:04 +02:00
|
|
|
help
|
2013-07-01 13:04:43 -07:00
|
|
|
If you say Y here, additional code will be inserted into the
|
|
|
|
kernel to track the lifetime of various objects and validate
|
|
|
|
the operations on those objects.
|
2008-05-12 21:21:04 +02:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config DEBUG_OBJECTS_SELFTEST
|
|
|
|
bool "Debug objects selftest"
|
|
|
|
depends on DEBUG_OBJECTS
|
|
|
|
help
|
|
|
|
This enables the selftest of the object debug code.
|
2008-05-12 21:21:04 +02:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config DEBUG_OBJECTS_FREE
|
|
|
|
bool "Debug objects in freed memory"
|
|
|
|
depends on DEBUG_OBJECTS
|
|
|
|
help
|
|
|
|
This enables checks of whether a kfree()/vfree() operation frees an area
|
|
|
|
which contains an object which has not been deactivated
|
|
|
|
properly. This can make kmalloc/kfree-intensive workloads
|
|
|
|
much slower.
|
2008-04-30 00:55:01 -07:00
|
|
|
|
2008-04-30 00:55:03 -07:00
|
|
|
config DEBUG_OBJECTS_TIMERS
|
|
|
|
bool "Debug timer objects"
|
|
|
|
depends on DEBUG_OBJECTS
|
|
|
|
help
|
|
|
|
If you say Y here, additional code will be inserted into the
|
|
|
|
timer routines to track the lifetime of timer objects and
|
|
|
|
validate the timer operations.
|
|
|
|
|
2009-11-16 01:09:48 +09:00
|
|
|
config DEBUG_OBJECTS_WORK
|
|
|
|
bool "Debug work objects"
|
|
|
|
depends on DEBUG_OBJECTS
|
|
|
|
help
|
|
|
|
If you say Y here, additional code will be inserted into the
|
|
|
|
work queue routines to track the lifetime of work objects and
|
|
|
|
validate the work operations.
|
|
|
|
|
2010-04-17 08:48:42 -04:00
|
|
|
config DEBUG_OBJECTS_RCU_HEAD
|
|
|
|
bool "Debug RCU callbacks objects"
|
2011-02-23 09:42:14 -08:00
|
|
|
depends on DEBUG_OBJECTS
|
2010-04-17 08:48:42 -04:00
|
|
|
help
|
|
|
|
Enable this to turn on debugging of RCU list heads (call_rcu() usage).
|
|
|
|
|
2010-10-26 14:23:05 -07:00
|
|
|
config DEBUG_OBJECTS_PERCPU_COUNTER
|
|
|
|
bool "Debug percpu counter objects"
|
|
|
|
depends on DEBUG_OBJECTS
|
|
|
|
help
|
|
|
|
If you say Y here, additional code will be inserted into the
|
|
|
|
percpu counter routines to track the lifetime of percpu counter
|
|
|
|
objects and validate the percpu counter operations.
|
|
|
|
|
2008-11-26 10:02:00 +01:00
|
|
|
config DEBUG_OBJECTS_ENABLE_DEFAULT
|
|
|
|
int "debug_objects bootup default value (0-1)"
|
2019-12-06 17:04:08 -08:00
|
|
|
range 0 1
|
|
|
|
default "1"
|
|
|
|
depends on DEBUG_OBJECTS
|
|
|
|
help
|
|
|
|
Debug objects boot parameter default value
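#
# Illustrative sketch only, not part of the upstream file: a .config
# fragment that compiles object debugging in but leaves it off at boot:
#
#   CONFIG_DEBUG_OBJECTS=y
#   CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=0
#
# It is assumed here that the debug_objects and no_debug_objects kernel
# command line parameters then override this default for a single boot;
# check Documentation/admin-guide/kernel-parameters.txt in your tree.
#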
|
2008-11-26 10:02:00 +01:00
|
|
|
|
2022-05-31 20:22:23 -07:00
|
|
|
config SHRINKER_DEBUG
|
|
|
|
bool "Enable shrinker debugging support"
|
|
|
|
depends on DEBUG_FS
|
|
|
|
help
|
|
|
|
Say Y to enable the shrinker debugfs interface which provides
|
|
|
|
visibility into the kernel memory shrinkers subsystem.
|
|
|
|
Disable it to avoid an extra memory footprint.
|
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config DEBUG_STACK_USAGE
|
|
|
|
bool "Stack utilization instrumentation"
|
arch: Remove Itanium (IA-64) architecture
The Itanium architecture is obsolete, and an informal survey [0] reveals
that any residual use of Itanium hardware in production is mostly HP-UX
or OpenVMS based. The use of Linux on Itanium appears to be limited to
enthusiasts that occasionally boot a fresh Linux kernel to see whether
things are still working as intended, and perhaps to churn out some
distro packages that are rarely used in practice.
None of the original companies behind Itanium still produce or support
any hardware or software for the architecture, and it is listed as
'Orphaned' in the MAINTAINERS file, as apparently, none of the engineers
that contributed on behalf of those companies (nor anyone else, for that
matter) have been willing to support or maintain the architecture
upstream or even be responsible for applying the odd fix. The Intel
firmware team removed all IA-64 support from the Tianocore/EDK2
reference implementation of EFI in 2018. (Itanium is the original
architecture for which EFI was developed, and the way Linux supports it
deviates significantly from other architectures.) Some distros, such as
Debian and Gentoo, still maintain [unofficial] ia64 ports, but many have
dropped support years ago.
While the argument is being made [1] that there is a 'for the common
good' angle to being able to build and run existing projects such as the
Grid Community Toolkit [2] on Itanium for interoperability testing, the
fact remains that none of those projects are known to be deployed on
Linux/ia64, and very few people actually have access to such a system in
the first place. Even if there were ways imaginable in which Linux/ia64
could be put to good use today, what matters is whether anyone is
actually doing that, and this does not appear to be the case.
There are no emulators widely available, and so boot testing Itanium is
generally infeasible for ordinary contributors. GCC still supports IA-64
but its compile farm [3] no longer has any IA-64 machines. GLIBC would
like to get rid of IA-64 [4] too because it would permit some overdue
code cleanups. In summary, the benefits to the ecosystem of having IA-64
be part of it are mostly theoretical, whereas the maintenance overhead
of keeping it supported is real.
So let's rip off the band aid, and remove the IA-64 arch code entirely.
This follows the timeline proposed by the Debian/ia64 maintainer [5],
which removes support in a controlled manner, leaving IA-64 in a known
good state in the most recent LTS release. Other projects will follow
once the kernel support is removed.
[0] https://lore.kernel.org/all/CAMj1kXFCMh_578jniKpUtx_j8ByHnt=s7S+yQ+vGbKt9ud7+kQ@mail.gmail.com/
[1] https://lore.kernel.org/all/0075883c-7c51-00f5-2c2d-5119c1820410@web.de/
[2] https://gridcf.org/gct-docs/latest/index.html
[3] https://cfarm.tetaneutral.net/machines/list/
[4] https://lore.kernel.org/all/87bkiilpc4.fsf@mid.deneb.enyo.de/
[5] https://lore.kernel.org/all/ff58a3e76e5102c94bb5946d99187b358def688a.camel@physik.fu-berlin.de/
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-10-20 15:54:33 +02:00
|
|
|
depends on DEBUG_KERNEL
|
2013-07-01 13:04:43 -07:00
|
|
|
help
|
|
|
|
Enables the display of the minimum amount of free stack which each
|
|
|
|
task has ever had available in the sysrq-T and sysrq-P debug output.
|
2023-12-19 19:28:09 +01:00
|
|
|
Also emits a message to dmesg when a process exits if that process
|
|
|
|
used more stack space than previously exiting processes.
|
2013-07-01 13:04:43 -07:00
|
|
|
|
|
|
|
This option will slow down process creation somewhat.
|
|
|
|
|
2019-12-06 17:03:57 -08:00
|
|
|
config SCHED_STACK_END_CHECK
|
|
|
|
bool "Detect stack corruption on calls to schedule()"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
This option checks for a stack overrun on calls to schedule().
|
|
|
|
If the stack end location is found to be overwritten, always panic, as
|
|
|
|
the content of the corrupted region can no longer be trusted.
|
|
|
|
This is to ensure no erroneous behaviour occurs which could result in
|
|
|
|
data corruption or a sporadic crash at a later stage once the region
|
|
|
|
is examined. The runtime overhead introduced is minimal.
|
|
|
|
|
mm/debug: add tests validating architecture page table helpers
This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.
This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc at various
level along with populating intermediate entries with next page table page
and validating them.
Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
via late_initcall().
This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected.
Any architecture, which is willing to subscribe this test will need to
select ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64,
x86, s390 and powerpc platforms where the test is known to build and run
successfully Going forward, other architectures too can subscribe the test
after fixing any build or runtime problems with their page table helpers.
Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config
which will just trigger this test during boot. Any non conformity here
will be reported as an warning which would need to be fixed. This test
will help catch any changes to the agreed upon semantics expected from
generic MM and enable platforms to accommodate it thereafter.
[anshuman.khandual@arm.com: v17]
Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com
[anshuman.khandual@arm.com: v18]
Link: http://lkml.kernel.org/r/1588564865-31160-3-git-send-email-anshuman.khandual@arm.com
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [ppc32]
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 16:47:15 -07:00
|
|
|
config ARCH_HAS_DEBUG_VM_PGTABLE
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
An architecture should select this when it can successfully
|
|
|
|
build and run DEBUG_VM_PGTABLE.
|
|
|
|
|
2022-08-25 18:41:27 +02:00
|
|
|
config DEBUG_VM_IRQSOFF
|
|
|
|
def_bool DEBUG_VM && !PREEMPT_RT
|
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config DEBUG_VM
|
|
|
|
bool "Debug VM"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
Enable this to turn on extended checks in the virtual-memory system
|
2019-12-06 17:04:08 -08:00
|
|
|
that may impact performance.
|
2013-07-01 13:04:43 -07:00
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2023-02-03 17:18:36 +10:00
|
|
|
config DEBUG_VM_SHOOT_LAZIES
|
|
|
|
bool "Debug MMU_LAZY_TLB_SHOOTDOWN implementation"
|
|
|
|
depends on DEBUG_VM
|
|
|
|
depends on MMU_LAZY_TLB_SHOOTDOWN
|
|
|
|
help
|
|
|
|
Enable additional IPIs that ensure lazy tlb mm references are removed
|
|
|
|
before the mm is freed.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
Maple Tree: add new data structure
Patch series "Introducing the Maple Tree"
The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently. There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface. If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.
The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes. With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses. The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.
The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The
long term goal is to reduce or remove the mmap_lock contention.
The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers. A single write operation will be
allowed at a time. A reader re-walks if stale data is encountered. VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.
Davidlor said
: Yes I like the maple tree, and at this stage I don't think we can ask for
: more from this series wrt the MM - albeit there seems to still be some
: folks reporting breakage. Fundamentally I see Liam's work to (re)move
: complexity out of the MM (not to say that the actual maple tree is not
: complex) by consolidating the three complimentary data structures very
: much worth it considering performance does not take a hit. This was very
: much a turn off with the range locking approach, which worst case scenario
: incurred in prohibitive overhead. Also as Liam and Matthew have
: mentioned, RCU opens up a lot of nice performance opportunities, and in
: addition academia[1] has shown outstanding scalability of address spaces
: with the foundation of replacing the locked rbtree with RCU aware trees.
A similar work has been discovered in the academic press
https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf
Sheer coincidence. We designed our tree with the intention of solving the
hardest problem first. Upon settling on a b-tree variant and a rough
outline, we researched ranged based b-trees and RCU b-trees and did find
that article. So it was nice to find reassurances that we were on the
right path, but our design choice of using ranges made that paper unusable
for us.
This patch (of 70):
The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently. There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface. If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.
The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes. With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses. The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.
The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The
long term goal is to reduce or remove the mmap_lock contention.
The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers. A single write operation will be
allowed at a time. A reader re-walks if stale data is encountered. VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.
There are additional BUG_ON() calls added within the tree, most of which
are in debug code. These will be replaced with WARN_ON() calls in the
future. There are also additional BUG_ON() calls within the code which
will also be reduced in number at a later date. These exist to catch
things such as out-of-range accesses which would crash anyway.
Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-06 19:48:39 +00:00
|
|
|
config DEBUG_VM_MAPLE_TREE
|
|
|
|
bool "Debug VM maple trees"
|
2014-06-04 16:06:46 -07:00
|
|
|
depends on DEBUG_VM
|
2022-09-06 19:48:39 +00:00
|
|
|
select DEBUG_MAPLE_TREE
|
2014-06-04 16:06:46 -07:00
|
|
|
help
|
2022-09-06 19:48:39 +00:00
|
|
|
Enable VM maple tree debugging information and extra validations.
|
2014-06-04 16:06:46 -07:00
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config DEBUG_VM_RB
|
|
|
|
bool "Debug VM red-black trees"
|
|
|
|
depends on DEBUG_VM
|
|
|
|
help
|
2014-04-18 15:07:22 -07:00
|
|
|
Enable VM red-black tree debugging information and extra validations.
|
2013-07-01 13:04:43 -07:00
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2016-01-15 16:51:21 -08:00
|
|
|
config DEBUG_VM_PGFLAGS
|
|
|
|
bool "Debug page-flags operations"
|
|
|
|
depends on DEBUG_VM
|
|
|
|
help
|
|
|
|
Enables extra validation on page flags operations.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
mm/debug: add tests validating architecture page table helpers
This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.
This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc. at various
levels, along with populating intermediate entries with the next page table
page and validating them.
Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
via late_initcall().
This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected.
Any architecture that wants to subscribe to this test will need to
select ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64,
x86, s390 and powerpc platforms where the test is known to build and run
successfully. Going forward, other architectures can subscribe to the test
after fixing any build or runtime problems with their page table helpers.
Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config,
which will trigger this test during boot. Any nonconformity here
will be reported as a warning which would need to be fixed. This test
will help catch any changes to the agreed upon semantics expected from
generic MM and enable platforms to accommodate it thereafter.
[anshuman.khandual@arm.com: v17]
Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com
[anshuman.khandual@arm.com: v18]
Link: http://lkml.kernel.org/r/1588564865-31160-3-git-send-email-anshuman.khandual@arm.com
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [ppc32]
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 16:47:15 -07:00
|
|
|
config DEBUG_VM_PGTABLE
|
|
|
|
bool "Debug arch page table for semantics compliance"
|
|
|
|
depends on MMU
|
|
|
|
depends on ARCH_HAS_DEBUG_VM_PGTABLE
|
|
|
|
default y if DEBUG_VM
|
|
|
|
help
|
|
|
|
This option provides a debug method which can be used to test
|
|
|
|
architecture page table helper functions on various platforms in
|
|
|
|
verifying if they comply with expected generic MM semantics. This
|
|
|
|
will help architecture code in making sure that any changes or
|
|
|
|
new additions of these helpers still conform to expected
|
|
|
|
semantics of the generic MM. Platforms will have to opt in for
|
|
|
|
this through ARCH_HAS_DEBUG_VM_PGTABLE.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2017-01-10 13:35:40 -08:00
|
|
|
config ARCH_HAS_DEBUG_VIRTUAL
|
|
|
|
bool
|
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config DEBUG_VIRTUAL
|
|
|
|
bool "Debug VM translations"
|
2017-01-10 13:35:40 -08:00
|
|
|
depends on DEBUG_KERNEL && ARCH_HAS_DEBUG_VIRTUAL
|
2013-07-01 13:04:43 -07:00
|
|
|
help
|
|
|
|
Enable some costly sanity checks in virtual to page code. This can
|
|
|
|
catch mistakes with virt_to_page() and friends.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
|
|
|
config DEBUG_NOMMU_REGIONS
|
|
|
|
bool "Debug the global anon/private NOMMU mapping region tree"
|
|
|
|
depends on DEBUG_KERNEL && !MMU
|
|
|
|
help
|
|
|
|
This option causes the global tree of anonymous and private mapping
|
|
|
|
regions to be regularly checked for invalid topology.
|
|
|
|
|
|
|
|
config DEBUG_MEMORY_INIT
|
|
|
|
bool "Debug memory initialisation" if EXPERT
|
|
|
|
default !EXPERT
|
|
|
|
help
|
|
|
|
Enable this for additional checks during memory initialisation.
|
|
|
|
The sanity checks verify aspects of the VM such as the memory model
|
|
|
|
and other information provided by the architecture. Verbose
|
|
|
|
information will be printed at KERN_DEBUG loglevel depending
|
|
|
|
on the mminit_loglevel= command-line option.
|
|
|
|
|
|
|
|
If unsure, say Y.
|
|
|
|
|
|
|
|
config MEMORY_NOTIFIER_ERROR_INJECT
|
|
|
|
tristate "Memory hotplug notifier error injection module"
|
2021-11-05 13:44:24 -07:00
|
|
|
depends on MEMORY_HOTPLUG && NOTIFIER_ERROR_INJECTION
|
2013-07-01 13:04:43 -07:00
|
|
|
help
|
|
|
|
This option provides the ability to inject artificial errors to
|
|
|
|
memory hotplug notifier chain callbacks. It is controlled through the
|
|
|
|
debugfs interface under /sys/kernel/debug/notifier-error-inject/memory.
|
|
|
|
|
|
|
|
To make the notifier call chain fail when a given event is
|
|
|
|
notified, write the error code to "actions/<notifier event>/error".
|
|
|
|
|
|
|
|
Example: Inject memory hotplug offline error (-12 == -ENOMEM)
|
|
|
|
|
|
|
|
# cd /sys/kernel/debug/notifier-error-inject/memory
|
|
|
|
# echo -12 > actions/MEM_GOING_OFFLINE/error
|
|
|
|
# echo offline > /sys/devices/system/memory/memoryXXX/state
|
|
|
|
bash: echo: write error: Cannot allocate memory
|
|
|
|
|
|
|
|
To compile this code as a module, choose M here: the module will
|
|
|
|
be called memory-notifier-error-inject.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
|
|
|
config DEBUG_PER_CPU_MAPS
|
|
|
|
bool "Debug access to per_cpu maps"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
depends on SMP
|
|
|
|
help
|
|
|
|
Say Y to verify that the per_cpu map being accessed has
|
|
|
|
been set up. This adds a fair amount of code to kernel memory
|
|
|
|
and decreases performance.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2020-11-18 20:48:39 +01:00
|
|
|
config DEBUG_KMAP_LOCAL
|
|
|
|
bool "Debug kmap_local temporary mappings"
|
|
|
|
depends on DEBUG_KERNEL && KMAP_LOCAL
|
|
|
|
help
|
|
|
|
This option enables additional error checking for the kmap_local
|
|
|
|
infrastructure. Disable for production use.
|
|
|
|
|
2020-11-18 20:48:40 +01:00
|
|
|
config ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP
|
|
|
|
bool
|
|
|
|
|
|
|
|
config DEBUG_KMAP_LOCAL_FORCE_MAP
|
|
|
|
bool "Enforce kmap_local temporary mappings"
|
|
|
|
depends on DEBUG_KERNEL && ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP
|
|
|
|
select KMAP_LOCAL
|
|
|
|
select DEBUG_KMAP_LOCAL
|
|
|
|
help
|
|
|
|
This option enforces temporary mappings through the kmap_local
|
|
|
|
mechanism for non-highmem pages and on non-highmem systems.
|
|
|
|
Disable this for production systems!
|
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config DEBUG_HIGHMEM
|
|
|
|
bool "Highmem debugging"
|
|
|
|
depends on DEBUG_KERNEL && HIGHMEM
|
2020-11-18 20:48:40 +01:00
|
|
|
select DEBUG_KMAP_LOCAL_FORCE_MAP if ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP
|
2020-11-18 20:48:39 +01:00
|
|
|
select DEBUG_KMAP_LOCAL
|
2013-07-01 13:04:43 -07:00
|
|
|
help
|
2014-04-14 18:55:50 +02:00
|
|
|
This option enables additional error checking for high memory
|
|
|
|
systems. Disable for production systems.
|
2013-07-01 13:04:43 -07:00
|
|
|
|
|
|
|
config HAVE_DEBUG_STACKOVERFLOW
|
|
|
|
bool
|
|
|
|
|
|
|
|
config DEBUG_STACKOVERFLOW
|
|
|
|
bool "Check for stack overflows"
|
|
|
|
depends on DEBUG_KERNEL && HAVE_DEBUG_STACKOVERFLOW
|
2020-06-14 01:50:22 +09:00
|
|
|
help
|
2013-07-01 13:04:43 -07:00
|
|
|
Say Y here if you want to check for overflows of kernel, IRQ
|
2015-01-25 19:50:34 +01:00
|
|
|
and exception stacks (if your architecture uses them). This
|
2013-07-01 13:04:43 -07:00
|
|
|
option will show detailed messages if free stack space drops
|
|
|
|
below a certain limit.
|
|
|
|
|
|
|
|
These kinds of bugs usually occur when call-chains in the
|
|
|
|
kernel get too deep, especially when interrupts are
|
|
|
|
involved.
|
|
|
|
|
|
|
|
Use this in cases where you see apparently random memory
|
|
|
|
corruption, especially if it appears in 'struct thread_info'.
|
|
|
|
|
|
|
|
If in doubt, say "N".
|
|
|
|
|
2024-03-21 09:36:32 -07:00
|
|
|
config CODE_TAGGING
|
|
|
|
bool
|
|
|
|
select KALLSYMS
|
|
|
|
|
2024-03-21 09:36:35 -07:00
|
|
|
config MEM_ALLOC_PROFILING
|
|
|
|
bool "Enable memory allocation profiling"
|
|
|
|
default n
|
|
|
|
depends on PROC_FS
|
|
|
|
depends on !DEBUG_FORCE_WEAK_PER_CPU
|
|
|
|
select CODE_TAGGING
|
2024-03-21 09:36:36 -07:00
|
|
|
select PAGE_EXTENSION
|
2024-03-21 09:36:44 -07:00
|
|
|
select SLAB_OBJ_EXT
|
2024-03-21 09:36:35 -07:00
|
|
|
help
|
|
|
|
Track allocation source code and record total allocation size
|
|
|
|
initiated at that code location. The mechanism can be used to track
|
|
|
|
memory leaks with a low performance and memory impact.
|
|
|
|
|
|
|
|
config MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
|
|
|
|
bool "Enable memory allocation profiling by default"
|
|
|
|
default y
|
|
|
|
depends on MEM_ALLOC_PROFILING
|
|
|
|
|
|
|
|
config MEM_ALLOC_PROFILING_DEBUG
|
|
|
|
bool "Memory allocation profiler debugging"
|
|
|
|
default n
|
|
|
|
depends on MEM_ALLOC_PROFILING
|
|
|
|
select MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
|
|
|
|
help
|
|
|
|
Adds warnings with helpful error messages for memory allocation
|
|
|
|
profiling.
|
|
|
|
|
kasan: add kernel address sanitizer infrastructure
Kernel Address sanitizer (KASan) is a dynamic memory error detector. It
provides a fast and comprehensive solution for finding use-after-free and
out-of-bounds bugs.
KASAN uses compile-time instrumentation for checking every memory access,
therefore GCC > v4.9.2 is required. v4.9.2 almost works, but has issues with
putting symbol aliases into the wrong section, which breaks kasan
instrumentation of globals.
This patch only adds infrastructure for kernel address sanitizer. It's
not available for use yet. The idea and some code were borrowed from [1].
Basic idea:
The main idea of KASAN is to use shadow memory to record whether each byte
of memory is safe to access or not, and use compiler's instrumentation to
check the shadow memory on each memory access.
Address sanitizer uses 1/8 of the memory addressable in kernel for shadow
memory and uses direct mapping with a scale and offset to translate a
memory address to its corresponding shadow address.
Here is the function that translates an address to its corresponding shadow
address:
    unsigned long kasan_mem_to_shadow(unsigned long addr)
    {
        return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
    }
where KASAN_SHADOW_SCALE_SHIFT = 3.
So for every 8 bytes there is one corresponding byte of shadow memory.
The following encoding is used for each shadow byte: 0 means that all 8 bytes
of the corresponding memory region are valid for access; k (1 <= k <= 7)
means that the first k bytes are valid for access, and the other (8 - k) bytes
are not; any negative value indicates that the entire 8-byte region is
inaccessible. Different negative values are used to distinguish between
different kinds of inaccessible memory (redzones, freed memory) (see
mm/kasan/kasan.h).
To be able to detect accesses to bad memory we need a special compiler.
Such a compiler inserts specific function calls (__asan_load*(addr),
__asan_store*(addr)) before each memory access of size 1, 2, 4, 8 or 16.
These functions check whether the memory region is valid to access by
checking the corresponding shadow memory. If the access is not valid, an
error is printed.
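The shadow encoding and the check described above can be sketched in user space. This is only an illustrative model, not the kernel's implementation: the names shadow, mem_to_shadow(), set_shadow() and access_ok() are hypothetical, the arena is a toy 64-byte region addressed by offset (so no KASAN_SHADOW_OFFSET is needed), and accesses are assumed not to cross an 8-byte granule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3  /* same value as KASAN_SHADOW_SCALE_SHIFT above */
#define ARENA_SIZE 64         /* hypothetical toy region, addressed by offset */

/* One signed shadow byte per 8 bytes of arena; zero-initialized = all valid. */
static int8_t shadow[ARENA_SIZE >> SHADOW_SCALE_SHIFT];

/* Offset-free analogue of kasan_mem_to_shadow(): scale only, no base offset. */
static int8_t *mem_to_shadow(size_t off)
{
    assert(off < ARENA_SIZE);
    return &shadow[off >> SHADOW_SCALE_SHIFT];
}

/* Set the shadow byte of one granule: 0, k in 1..7, or a negative value. */
static void set_shadow(size_t granule, int8_t value)
{
    assert(granule < (ARENA_SIZE >> SHADOW_SCALE_SHIFT));
    shadow[granule] = value;
}

/* Model of the check done before an access of `size` bytes at `off`.
 * Assumption: the access stays inside a single 8-byte granule. */
static bool access_ok(size_t off, size_t size)
{
    int8_t s = *mem_to_shadow(off);
    int last = (int)(off & 7) + (int)size - 1; /* last byte touched, 0..7 */

    assert(size >= 1 && last <= 7);
    if (s == 0)
        return true;   /* all 8 bytes are accessible */
    if (s < 0)
        return false;  /* whole granule is inaccessible */
    return last < s;   /* only the first s bytes are accessible */
}
```

With set_shadow(1, 5), for example, offsets 8..12 pass the check while offset 13 fails, which mirrors the "first k bytes are valid" rule.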
Historical background of the address sanitizer from Dmitry Vyukov:
"We've developed the set of tools, AddressSanitizer (Asan),
ThreadSanitizer and MemorySanitizer, for user space. We actively use
them for testing inside of Google (continuous testing, fuzzing,
running prod services). To date the tools have found more than 10'000
scary bugs in Chromium, Google internal codebase and various
open-source projects (Firefox, OpenSSL, gcc, clang, ffmpeg, MySQL and
lots of others): [2] [3] [4].
The tools are part of both gcc and clang compilers.
We have not yet done massive testing under the Kernel AddressSanitizer
(it's kind of chicken and egg problem, you need it to be upstream to
start applying it extensively). To date it has found about 50 bugs.
Bugs that we've found in upstream kernel are listed in [5].
We've also found ~20 bugs in our internal version of the kernel. Also
people from Samsung and Oracle have found some.
[...]
As others noted, the main feature of AddressSanitizer is its
performance due to inline compiler instrumentation and simple linear
shadow memory. User-space Asan has ~2x slowdown on computational
programs and ~2x memory consumption increase. Taking into account that
kernel usually consumes only small fraction of CPU and memory when
running real user-space programs, I would expect that kernel Asan will
have ~10-30% slowdown and similar memory consumption increase (when we
finish all tuning).
I agree that Asan can well replace kmemcheck. We have plans to start
working on Kernel MemorySanitizer that finds uses of uninitialized
memory. Asan+Msan will provide feature-parity with kmemcheck. As
others noted, Asan will unlikely replace debug slab and pagealloc that
can be enabled at runtime. Asan uses compiler instrumentation, so even
if it is disabled, it still incurs visible overheads.
Asan technology is easily portable to other architectures. Compiler
instrumentation is fully portable. Runtime has some arch-dependent
parts like shadow mapping and atomic operation interception. They are
relatively easy to port."
Comparison with other debugging features:
========================================
KMEMCHECK:
- KASan can do almost everything that kmemcheck can. KASan uses
compile-time instrumentation, which makes it significantly faster than
kmemcheck. The only advantage of kmemcheck over KASan is detection of
uninitialized memory reads.
Some brief performance testing showed that kasan could be
500 to 600 times faster than kmemcheck:
$ netperf -l 30
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
                 Recv    Send    Send
                 Socket  Socket  Message  Elapsed
                 Size    Size    Size     Time     Throughput
                 bytes   bytes   bytes    secs.    10^6bits/sec
no debug:        87380   16384   16384    30.00    41624.72
kasan inline:    87380   16384   16384    30.00    12870.54
kasan outline:   87380   16384   16384    30.00    10586.39
kmemcheck:       87380   16384   16384    30.03       20.23
- Also kmemcheck couldn't work on several CPUs. It always sets the
number of CPUs to 1. KASan doesn't have such a limitation.
DEBUG_PAGEALLOC:
- KASan is slower than DEBUG_PAGEALLOC, but KASan works at sub-page
granularity, so it is able to find more bugs.
SLUB_DEBUG (poisoning, redzones):
- SLUB_DEBUG has lower overhead than KASan.
- SLUB_DEBUG in most cases is not able to detect bad reads;
KASan is able to detect both reads and writes.
- In some cases (e.g. redzone overwritten) SLUB_DEBUG detects
bugs only on allocation/freeing of an object. KASan catches
bugs right before they happen, so we always know the exact
place of the first bad read/write.
[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel
[2] https://code.google.com/p/address-sanitizer/wiki/FoundBugs
[3] https://code.google.com/p/thread-sanitizer/wiki/FoundBugs
[4] https://code.google.com/p/memory-sanitizer/wiki/FoundBugs
[5] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel#Trophies
Based on work by Andrey Konovalov.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 14:39:17 -08:00
|
|
|
source "lib/Kconfig.kasan"
|
mm: add Kernel Electric-Fence infrastructure
Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.
This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors. This
series enables KFENCE for the x86 and arm64 architectures, and adds
KFENCE hooks to the SLAB and SLUB allocators.
KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades precision
for performance. The main motivation behind KFENCE's design is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.
KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error.
Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval,
the next allocation through the main allocator (SLAB or SLUB) returns a
guarded allocation from the KFENCE object pool. At this point, the timer
is reset, and the next allocation is set up after the expiration of the
interval.
To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE.
The KFENCE memory pool is of fixed size, and if the pool is exhausted no
further KFENCE allocations occur. The default config is conservative
with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
pages).
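The 2 MiB figure can be checked with a little arithmetic. The sketch below assumes that the pool holds one object page plus one guard page per object, with two extra pages at the start of the pool. The helper name `kfence_pool_bytes` is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed layout (illustration only): every object gets one object page
 * and one guard page, and the pool starts with two additional pages.
 * That gives (num_objects + 1) * 2 pages in total.
 */
static size_t kfence_pool_bytes(size_t num_objects, size_t page_size)
{
	assert(page_size > 0);
	return (num_objects + 1) * 2 * page_size;
}
```

Under that assumption, `kfence_pool_bytes(255, 4096)` yields 512 pages of 4 KiB, which matches the 2 MiB quoted for the default configuration.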
We have verified by running synthetic benchmarks (sysbench I/O,
hackbench) and production server-workload benchmarks that a kernel with
KFENCE (using sample intervals 100-500ms) is performance-neutral
compared to a non-KFENCE baseline kernel.
KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
properties. The name "KFENCE" is a homage to the Electric Fence Malloc
Debugger [2].
For more details, see Documentation/dev-tools/kfence.rst added in the
series -- also viewable here:
https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst
[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence
This patch (of 9):
This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.
KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design, is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.
KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error. To detect out-of-bounds
writes to memory within the object's page itself, KFENCE also uses
pattern-based redzones. The following figure illustrates the page
layout:
---+-----------+-----------+-----------+-----------+-----------+---
   | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
   | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
   | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
   | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
   | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
   | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
---+-----------+-----------+-----------+-----------+-----------+---
Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval, a
guarded allocation from the KFENCE object pool is returned to the main
allocator (SLAB or SLUB). At this point, the timer is reset, and the
next allocation is set up after the expiration of the interval.
To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. To date, we have verified by running synthetic
benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE
is performance-neutral compared to the non-KFENCE baseline.
For more details, see Documentation/dev-tools/kfence.rst (added later in
the series).
[elver@google.com: fix parameter description for kfence_object_start()]
Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com
[elver@google.com: avoid stalling work queue task without allocations]
Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com
Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com
[elver@google.com: fix potential deadlock due to wake_up()]
Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com
[elver@google.com: add option to use KFENCE without static keys]
Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com
[elver@google.com: add missing copyright and description headers]
Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Co-developed-by: Marco Elver <elver@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-25 17:18:53 -08:00
|
|
|
source "lib/Kconfig.kfence"
|
2022-09-15 17:03:45 +02:00
|
|
|
source "lib/Kconfig.kmsan"
|
kasan: add kernel address sanitizer infrastructure
Kernel Address sanitizer (KASan) is a dynamic memory error detector. It
provides a fast and comprehensive solution for finding use-after-free and
out-of-bounds bugs.
KASAN uses compile-time instrumentation to check every memory access,
so a GCC newer than v4.9.2 is required. v4.9.2 almost works, but it puts
symbol aliases into the wrong section, which breaks KASAN
instrumentation of globals.
This patch only adds infrastructure for kernel address sanitizer. It's
not available for use yet. The idea and some code were borrowed from [1].
Basic idea:
The main idea of KASAN is to use shadow memory to record whether each byte
of memory is safe to access or not, and use compiler's instrumentation to
check the shadow memory on each memory access.
Address sanitizer uses 1/8 of the memory addressable in kernel for shadow
memory and uses direct mapping with a scale and offset to translate a
memory address to its corresponding shadow address.
Here is the function that translates an address to its corresponding shadow address:
unsigned long kasan_mem_to_shadow(unsigned long addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
where KASAN_SHADOW_SCALE_SHIFT = 3.
So for every 8 bytes there is one corresponding byte of shadow memory.
The following encoding is used for each shadow byte: 0 means that all 8 bytes
of the corresponding memory region are valid for access; k (1 <= k <= 7)
means that the first k bytes are valid for access, and the other (8 - k) bytes
are not; any negative value indicates that the entire 8-byte region is
inaccessible. Different negative values are used to distinguish between
different kinds of inaccessible memory (redzones, freed memory) (see
mm/kasan/kasan.h).
To be able to detect accesses to bad memory we need a special compiler.
Such a compiler inserts specific function calls (__asan_load*(addr),
__asan_store*(addr)) before each memory access of size 1, 2, 4, 8 or 16.
These functions check whether the memory region is valid to access by
checking the corresponding shadow memory. If the access is not valid, an error
is printed.
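The shadow encoding and the check performed on each access can be sketched in userspace C. This is a toy model, not the kernel implementation: the 64-byte arena, the `shadow` array, and the helpers `mem_to_shadow`, `byte_is_valid` and `access_is_valid` are all hypothetical names, and the shadow offset is simply the base of a static array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3	/* one shadow byte describes 8 bytes of memory */
#define ARENA_SIZE 64		/* hypothetical toy "address space" */

/* Zero-initialized: the whole arena starts out fully accessible. */
static int8_t shadow[ARENA_SIZE >> SHADOW_SCALE_SHIFT];

/* Toy analogue of kasan_mem_to_shadow(); the offset is the array base. */
static int8_t *mem_to_shadow(size_t addr)
{
	assert(addr < ARENA_SIZE);
	return &shadow[addr >> SHADOW_SCALE_SHIFT];
}

/* Decode one shadow byte for the single byte at addr. */
static bool byte_is_valid(size_t addr)
{
	int8_t s = *mem_to_shadow(addr);

	if (s == 0)
		return true;	/* all 8 bytes are valid */
	if (s < 0)
		return false;	/* whole 8-byte region is inaccessible */
	return (int8_t)(addr & 7) < s;	/* only the first s bytes are valid */
}

/* An access is valid only if every byte it touches is valid. */
static bool access_is_valid(size_t addr, size_t size)
{
	for (size_t i = 0; i < size; i++)
		if (!byte_is_valid(addr + i))
			return false;
	return true;
}
```

For example, after setting `shadow[1] = 5`, a 1-byte access at address 12 passes the check while a 2-byte access at the same address fails, because byte 13 lies beyond the first five valid bytes of that region. A real implementation checks whole accesses at once rather than byte by byte; the loop here favors clarity over speed.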
Historical background of the address sanitizer from Dmitry Vyukov:
"We've developed the set of tools, AddressSanitizer (Asan),
ThreadSanitizer and MemorySanitizer, for user space. We actively use
them for testing inside of Google (continuous testing, fuzzing,
running prod services). To date the tools have found more than 10'000
scary bugs in Chromium, Google internal codebase and various
open-source projects (Firefox, OpenSSL, gcc, clang, ffmpeg, MySQL and
lots of others): [2] [3] [4].
The tools are part of both gcc and clang compilers.
We have not yet done massive testing under the Kernel AddressSanitizer
(it's kind of chicken and egg problem, you need it to be upstream to
start applying it extensively). To date it has found about 50 bugs.
Bugs that we've found in upstream kernel are listed in [5].
We've also found ~20 bugs in our internal version of the kernel. Also
people from Samsung and Oracle have found some.
[...]
As others noted, the main feature of AddressSanitizer is its
performance due to inline compiler instrumentation and simple linear
shadow memory. User-space Asan has ~2x slowdown on computational
programs and ~2x memory consumption increase. Taking into account that
kernel usually consumes only small fraction of CPU and memory when
running real user-space programs, I would expect that kernel Asan will
have ~10-30% slowdown and similar memory consumption increase (when we
finish all tuning).
I agree that Asan can well replace kmemcheck. We have plans to start
working on Kernel MemorySanitizer that finds uses of uninitialized
memory. Asan+Msan will provide feature-parity with kmemcheck. As
others noted, Asan will unlikely replace debug slab and pagealloc that
can be enabled at runtime. Asan uses compiler instrumentation, so even
if it is disabled, it still incurs visible overheads.
Asan technology is easily portable to other architectures. Compiler
instrumentation is fully portable. Runtime has some arch-dependent
parts like shadow mapping and atomic operation interception. They are
relatively easy to port."
Comparison with other debugging features:
========================================
KMEMCHECK:
- KASan can do almost everything that kmemcheck can. KASan uses
compile-time instrumentation, which makes it significantly faster than
kmemcheck. The only advantage of kmemcheck over KASan is detection of
uninitialized memory reads.
Some brief performance testing showed that kasan could be
500-600 times faster than kmemcheck:
$ netperf -l 30
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
                Recv    Send    Send
                Socket  Socket  Message  Elapsed
                Size    Size    Size     Time     Throughput
                bytes   bytes   bytes    secs.    10^6bits/sec
no debug:       87380   16384   16384    30.00    41624.72
kasan inline:   87380   16384   16384    30.00    12870.54
kasan outline:  87380   16384   16384    30.00    10586.39
kmemcheck:      87380   16384   16384    30.03       20.23
- Also, kmemcheck cannot work on several CPUs. It always sets the
number of CPUs to 1. KASan has no such limitation.
DEBUG_PAGEALLOC:
- KASan is slower than DEBUG_PAGEALLOC, but KASan works at sub-page
granularity, so it is able to find more bugs.
SLUB_DEBUG (poisoning, redzones):
- SLUB_DEBUG has lower overhead than KASan.
- SLUB_DEBUG in most cases is not able to detect bad reads;
KASan is able to detect both reads and writes.
- In some cases (e.g. redzone overwritten) SLUB_DEBUG detects
bugs only on allocation/freeing of the object. KASan catches
bugs right before they happen, so we always know the exact
place of the first bad read/write.
[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel
[2] https://code.google.com/p/address-sanitizer/wiki/FoundBugs
[3] https://code.google.com/p/thread-sanitizer/wiki/FoundBugs
[4] https://code.google.com/p/memory-sanitizer/wiki/FoundBugs
[5] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel#Trophies
Based on work by Andrey Konovalov.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 14:39:17 -08:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
endmenu # "Memory Debugging"
|
|
|
|
|
2007-02-12 00:52:00 -08:00
|
|
|
config DEBUG_SHIRQ
|
|
|
|
bool "Debug shared IRQ handlers"
|
2013-08-30 09:39:53 +02:00
|
|
|
depends on DEBUG_KERNEL
|
2007-02-12 00:52:00 -08:00
|
|
|
help
|
2020-07-03 00:20:24 +02:00
|
|
|
Enable this to generate a spurious interrupt just before a shared
|
|
|
|
interrupt handler is deregistered (generating one when registering
|
|
|
|
is currently disabled). Drivers need to handle this correctly. Some
|
|
|
|
don't and need to be caught.
|
2007-02-12 00:52:00 -08:00
|
|
|
|
2019-12-06 17:03:54 -08:00
|
|
|
menu "Debug Oops, Lockups and Hangs"
|
|
|
|
|
|
|
|
config PANIC_ON_OOPS
|
|
|
|
bool "Panic on Oops"
|
|
|
|
help
|
|
|
|
Say Y here to enable the kernel to panic when it oopses. This
|
|
|
|
has the same effect as setting oops=panic on the kernel command
|
|
|
|
line.
|
|
|
|
|
|
|
|
This feature is useful to ensure that the kernel does not do
|
|
|
|
anything erroneous after an oops which could result in data
|
|
|
|
corruption or other issues.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
|
|
|
config PANIC_ON_OOPS_VALUE
|
|
|
|
int
|
|
|
|
range 0 1
|
|
|
|
default 0 if !PANIC_ON_OOPS
|
|
|
|
default 1 if PANIC_ON_OOPS
|
|
|
|
|
|
|
|
config PANIC_TIMEOUT
|
|
|
|
int "panic timeout"
|
|
|
|
default 0
|
|
|
|
help
|
2020-08-11 18:36:49 -07:00
|
|
|
Set the timeout value (in seconds) until a reboot occurs when
|
2019-12-06 17:03:54 -08:00
|
|
|
the kernel panics. If n = 0, then we wait forever. A timeout
|
|
|
|
value n > 0 will wait n seconds before rebooting, while a timeout
|
2024-06-07 11:24:43 -04:00
|
|
|
value n < 0 will reboot immediately. This setting can be overridden
|
|
|
|
with the kernel command line option panic=, and from userspace via
|
|
|
|
/proc/sys/kernel/panic.
|
2013-07-01 13:04:50 -07:00
|
|
|
|
2010-05-07 17:11:44 -04:00
|
|
|
config LOCKUP_DETECTOR
|
2017-07-12 14:35:46 -07:00
|
|
|
bool
|
|
|
|
|
|
|
|
config SOFTLOCKUP_DETECTOR
|
|
|
|
bool "Detect Soft Lockups"
|
2006-10-11 01:20:44 -07:00
|
|
|
depends on DEBUG_KERNEL && !S390
|
2017-07-12 14:35:46 -07:00
|
|
|
select LOCKUP_DETECTOR
|
2005-09-06 15:16:27 -07:00
|
|
|
help
|
2010-05-07 17:11:44 -04:00
|
|
|
Say Y here to enable the kernel to act as a watchdog to detect
|
2017-07-12 14:35:46 -07:00
|
|
|
soft lockups.
|
2010-05-07 17:11:44 -04:00
|
|
|
|
|
|
|
Softlockups are bugs that cause the kernel to loop in kernel
|
2012-02-09 17:42:21 -05:00
|
|
|
mode for more than 20 seconds, without giving other tasks a
|
2010-05-07 17:11:44 -04:00
|
|
|
chance to run. The current stack trace is displayed upon
|
|
|
|
detection and the system will stay locked up.
|
2005-09-06 15:16:27 -07:00
|
|
|
|
watchdog/softlockup: Low-overhead detection of interrupt storm
The following softlockup is caused by an interrupt storm, but this cannot be
identified from the call tree, because the call tree is just a snapshot
and doesn't fully capture the behavior of the CPU during the soft lockup.
watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
...
Call trace:
__do_softirq+0xa0/0x37c
__irq_exit_rcu+0x108/0x140
irq_exit+0x14/0x20
__handle_domain_irq+0x84/0xe0
gic_handle_irq+0x80/0x108
el0_irq_naked+0x50/0x58
Therefore, it is necessary to report CPU utilization during the
softlockup_threshold period (report once every sample_period, for a total
of 5 reportings), like this:
watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
CPU#28 Utilization every 4s during lockup:
#1: 0% system, 0% softirq, 100% hardirq, 0% idle
#2: 0% system, 0% softirq, 100% hardirq, 0% idle
#3: 0% system, 0% softirq, 100% hardirq, 0% idle
#4: 0% system, 0% softirq, 100% hardirq, 0% idle
#5: 0% system, 0% softirq, 100% hardirq, 0% idle
...
This is helpful in determining whether an interrupt storm has occurred or
in identifying the cause of the softlockup. The criteria for determination
are as follows:
a. If the hardirq utilization is high, then an interrupt storm should be
considered and the root cause cannot be determined from the call tree.
b. If the softirq utilization is high, then the call tree might not necessarily
point at the root cause.
c. If the system utilization is high, then analyzing the root
cause from the call tree is possible in most cases.
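The three criteria can be expressed as a small classifier over the reported utilization samples. This is only a sketch: the commit message does not define "high", so the `HIGH_UTIL_PCT` threshold of 50% is an arbitrary assumption, and the `cpu_util` struct, the `lockup_hint` enum and `classify` are hypothetical names that do not come from the kernel.

```c
#include <assert.h>

/* One utilization report line, in percent. */
struct cpu_util {
	unsigned int system, softirq, hardirq, idle;
};

enum lockup_hint {
	HINT_IRQ_STORM,		/* (a) call tree will not show the root cause */
	HINT_SOFTIRQ,		/* (b) call tree may or may not be helpful */
	HINT_CALL_TREE_USEFUL,	/* (c) call tree usually shows the root cause */
	HINT_UNCLEAR,
};

#define HIGH_UTIL_PCT 50	/* assumed threshold, not taken from the source */

/* Average the samples and apply criteria a, b and c in that order. */
static enum lockup_hint classify(const struct cpu_util *samples, int n)
{
	unsigned long system = 0, softirq = 0, hardirq = 0;

	assert(n > 0);
	for (int i = 0; i < n; i++) {
		system += samples[i].system;
		softirq += samples[i].softirq;
		hardirq += samples[i].hardirq;
	}
	if (hardirq / n >= HIGH_UTIL_PCT)
		return HINT_IRQ_STORM;
	if (softirq / n >= HIGH_UTIL_PCT)
		return HINT_SOFTIRQ;
	if (system / n >= HIGH_UTIL_PCT)
		return HINT_CALL_TREE_USEFUL;
	return HINT_UNCLEAR;
}
```

Fed with the five "100% hardirq" samples from the example report, `classify` returns `HINT_IRQ_STORM`.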
The mechanism requires a considerable amount of global storage space
when configured for the maximum number of CPUs. Therefore, add a
SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes"
if the maximum number of CPUs is <= 128.
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240411074134.30922-5-yaoma@linux.alibaba.com
2024-04-11 15:41:33 +08:00
|
|
|
config SOFTLOCKUP_DETECTOR_INTR_STORM
|
|
|
|
bool "Detect Interrupt Storm in Soft Lockups"
|
|
|
|
depends on SOFTLOCKUP_DETECTOR && IRQ_TIME_ACCOUNTING
|
|
|
|
select GENERIC_IRQ_STAT_SNAPSHOT
|
|
|
|
default y if NR_CPUS <= 128
|
|
|
|
help
|
|
|
|
Say Y here to enable the kernel to detect interrupt storm
|
|
|
|
during "soft lockups".
|
|
|
|
|
|
|
|
"soft lockups" can be caused by a variety of reasons. If one is
|
|
|
|
caused by an interrupt storm, then the storming interrupts will not
|
|
|
|
be on the callstack. To detect this case, it is necessary to report
|
|
|
|
the CPU stats and the interrupt counts during the "soft lockups".
|
|
|
|
|
2018-04-10 16:32:51 -07:00
|
|
|
config BOOTPARAM_SOFTLOCKUP_PANIC
|
|
|
|
bool "Panic (Reboot) On Soft Lockups"
|
|
|
|
depends on SOFTLOCKUP_DETECTOR
|
|
|
|
help
|
|
|
|
Say Y here to enable the kernel to panic on "soft lockups",
|
|
|
|
which are bugs that cause the kernel to loop in kernel
|
|
|
|
mode for more than 20 seconds (configurable using the watchdog_thresh
|
|
|
|
sysctl), without giving other tasks a chance to run.
|
|
|
|
|
|
|
|
The panic can be used in combination with panic_timeout,
|
|
|
|
to cause the system to reboot automatically after a
|
|
|
|
lockup has been detected. This feature is useful for
|
|
|
|
high-availability systems that have uptime guarantees and
|
|
|
|
where a lockup must be resolved ASAP.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2023-06-16 17:06:14 +02:00
|
|
|
config HAVE_HARDLOCKUP_DETECTOR_BUDDY
|
2017-07-12 14:35:46 -07:00
|
|
|
bool
|
2023-06-16 17:06:14 +02:00
|
|
|
depends on SMP
|
|
|
|
default y
|
2017-07-12 14:35:46 -07:00
|
|
|
|
2017-08-15 09:50:13 +02:00
|
|
|
#
|
2023-06-16 17:06:14 +02:00
|
|
|
# Global switch whether to build a hardlockup detector at all. It is available
|
|
|
|
# only when the architecture supports at least one implementation. There are
|
|
|
|
# two exceptions. The hardlockup detector is never enabled on:
|
2017-08-15 09:50:13 +02:00
|
|
|
#
|
2023-06-16 17:06:14 +02:00
|
|
|
# s390: it reported many false positives there
|
2017-07-12 14:35:46 -07:00
|
|
|
#
|
2023-06-16 17:06:14 +02:00
|
|
|
# sparc64: has a custom implementation which is not using the common
|
|
|
|
# hardlockup command line options and sysctl interface.
|
2017-07-12 14:35:46 -07:00
|
|
|
#
|
|
|
|
config HARDLOCKUP_DETECTOR
|
|
|
|
bool "Detect Hard Lockups"
|
2023-06-16 17:06:17 +02:00
|
|
|
depends on DEBUG_KERNEL && !S390 && !HARDLOCKUP_DETECTOR_SPARC64
|
watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
There are several hardlockup detector implementations and several Kconfig
values which allow selection and build of the preferred one.
CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb
("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36.
It was a preparation step for introducing the new generic perf hardlockup
detector.
The existing arch-specific variants did not support the to-be-created
generic build configurations, sysctl interface, etc. This distinction
was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog:
Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR")
in v2.6.38.
CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105
("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced
the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used
by three architectures, namely blackfin, mn10300, and sparc.
The support for blackfin and mn10300 architectures has been completely
dropped some time ago. And sparc is the only architecture with the historic
NMI watchdog at the moment.
And the old sparc implementation is really special. It is always built on
sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b
("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added
in v4.10-rc1.
There are only a few locations where the sparc64 NMI watchdog interacts
with the generic hardlockup detectors code:
+ implements arch_touch_nmi_watchdog() which is called from the generic
touch_nmi_watchdog()
+ implements watchdog_hardlockup_enable()/disable() to support
/proc/sys/kernel/nmi_watchdog
+ is always preferred over other generic watchdogs, see
CONFIG_HARDLOCKUP_DETECTOR
+ includes asm/nmi.h into linux/nmi.h because some sparc-specific
functions are needed in sparc-specific code which includes
only linux/nmi.h.
The situation became more complicated after the commit 05a4a95279311c3
("kernel/watchdog: split up config options") and commit 2104180a53698df5
("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1.
They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc
specific hardlockup detector. It was compatible with the perf one
regarding the general boot, sysctl, and programming interfaces.
HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of
HAVE_NMI_WATCHDOG. It made some sense because all arch-specific
detectors had some common requirements, namely:
+ implemented arch_touch_nmi_watchdog()
+ included asm/nmi.h into linux/nmi.h
+ defined the default value for /proc/sys/kernel/nmi_watchdog
But it actually made things pretty complicated when the generic
buddy hardlockup detector was added. Before that, the generic perf detector
was never supported together with an arch-specific one. But the buddy
detector could work on any SMP system. It means that an architecture
could support both the arch-specific and buddy detector.
As a result, there are a few tricky dependencies. For example,
CONFIG_HARDLOCKUP_DETECTOR depends on:
((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
The problem is that the very special sparc implementation is defined as:
HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH
Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear
without reading and understanding the history.
Make the logic less tricky and more self-explanatory by making
HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to
HAVE_HARDLOCKUP_DETECTOR_SPARC64.
Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF,
and HARDLOCKUP_DETECTOR_BUDDY may conflict only with
HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR
and it is no longer enabled when HAVE_NMI_WATCHDOG is set.
Link: https://lkml.kernel.org/r/20230616150618.6073-5-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-16 17:06:16 +02:00
|
|
|
depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH
|
2023-06-16 17:06:14 +02:00
|
|
|
imply HARDLOCKUP_DETECTOR_PERF
|
|
|
|
imply HARDLOCKUP_DETECTOR_BUDDY
|
2023-06-16 17:06:18 +02:00
|
|
|
imply HARDLOCKUP_DETECTOR_ARCH
|
2017-07-12 14:35:46 -07:00
|
|
|
select LOCKUP_DETECTOR
|
2023-06-16 17:06:13 +02:00
|
|
|
|
2017-07-12 14:35:46 -07:00
|
|
|
help
|
|
|
|
Say Y here to enable the kernel to act as a watchdog to detect
|
|
|
|
hard lockups.
|
|
|
|
|
2010-05-07 17:11:44 -04:00
|
|
|
Hardlockups are bugs that cause the CPU to loop in kernel mode
|
2012-02-09 17:42:21 -05:00
|
|
|
for more than 10 seconds, without letting other interrupts have a
|
2010-05-07 17:11:44 -04:00
|
|
|
chance to run. The current stack trace is displayed upon detection
|
|
|
|
and the system will stay locked up.
|
2005-09-06 15:16:27 -07:00
|
|
|
|
2023-06-16 17:06:14 +02:00
|
|
|
#
|
|
|
|
# Note that arch-specific variants are always preferred.
|
|
|
|
#
|
2023-06-16 17:06:13 +02:00
|
|
|
config HARDLOCKUP_DETECTOR_PREFER_BUDDY
|
|
|
|
bool "Prefer the buddy CPU hardlockup detector"
|
2023-06-16 17:06:14 +02:00
|
|
|
depends on HARDLOCKUP_DETECTOR
|
|
|
|
depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY
|
2023-06-23 06:07:17 +02:00
|
|
|
depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
|
2023-06-16 17:06:13 +02:00
|
|
|
help
|
|
|
|
Say Y here to prefer the buddy hardlockup detector over the perf one.
|
|
|
|
|
|
|
|
With the buddy detector, each CPU uses its softlockup hrtimer
|
|
|
|
to check that the next CPU is processing hrtimer interrupts by
|
|
|
|
verifying that a counter is increasing.
|
|
|
|
|
|
|
|
This hardlockup detector is useful on systems that don't have
|
|
|
|
an arch-specific hardlockup detector or if resources needed
|
|
|
|
for the hardlockup detector are better used for other things.
|
2017-07-12 14:35:46 -07:00
|
|
|
|
watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
Implement a hardlockup detector that doesn't need any extra
arch-specific support code to detect lockups. Instead of using something
arch-specific we will use the buddy system, where each CPU watches out for
another one. Specifically, each CPU will use its softlockup hrtimer to
check that the next CPU is processing hrtimer interrupts by verifying that
a counter is increasing.
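The buddy scheme can be modeled in a few lines of userspace C. The sketch below is a simplified simulation and not the kernel code: `NCPUS`, the two arrays, and the functions `buddy_of`, `timer_tick` and `check_buddy` are hypothetical names. It also assumes that checks are spaced further apart than timer ticks, so an unchanged counter really does mean a missed interrupt.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4	/* hypothetical CPU count for the simulation */

static unsigned long hrtimer_ticks[NCPUS];	/* bumped by each CPU's own timer */
static unsigned long ticks_seen[NCPUS];		/* last value its watcher observed */

/* Each CPU watches the next one, wrapping around at the end. */
static int buddy_of(int cpu)
{
	return (cpu + 1) % NCPUS;
}

/* Stand-in for the per-CPU hrtimer interrupt: prove this CPU is alive. */
static void timer_tick(int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	hrtimer_ticks[cpu]++;
}

/*
 * Run on `cpu`: returns true if the buddy's counter has not advanced
 * since the previous check, i.e. the buddy looks hard locked up.
 */
static bool check_buddy(int cpu)
{
	int buddy = buddy_of(cpu);
	bool stuck = hrtimer_ticks[buddy] == ticks_seen[buddy];

	ticks_seen[buddy] = hrtimer_ticks[buddy];
	return stuck;
}
```

If every CPU ticks once and CPU 2 then stops ticking, the next `check_buddy(1)` reports it as stuck, while `check_buddy(0)` still sees CPU 1 making progress. The model also makes the first listed drawback visible: when all CPUs stop, nobody is left to call `check_buddy`.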
NOTE: unlike the other hard lockup detectors, the buddy one can't easily
show what's happening on the CPU that locked up just by doing a simple
backtrace. It relies on some other mechanism in the system to get
information about the locked up CPUs. This could be support for NMI
backtraces like [1], it could be a mechanism for printing the PC of locked
CPUs at panic time like [2] / [3], or it could be something else. Even
though that means we still rely on arch-specific code, this arch-specific
code seems to often be implemented even on architectures that don't have a
hardlockup detector.
This style of hardlockup detector originated in some downstream Android
trees and has been rebased on / carried in ChromeOS trees for quite a long
time for use on arm and arm64 boards. Historically on these boards we've
leveraged mechanism [2] / [3] to get information about hung CPUs, but we
could move to [1].
Although the original motivation for the buddy system was for use on
systems without an arch-specific hardlockup detector, it can still be
useful to use even on systems that _do_ have an arch-specific hardlockup
detector. On x86, for instance, there is a 24-part patch series [4] in
progress switching the arch-specific hard lockup detector from a scarce
perf counter to a less-scarce hardware resource. Potentially the buddy
system could be a simpler alternative to free up the perf counter but
still get hard lockup detection.
Overall, pros (+) and cons (-) of the buddy system compared to an
arch-specific hardlockup detector (which might be implemented using
perf):
+ The buddy system is usable on systems that don't have an
arch-specific hardlockup detector, like arm32 and arm64 (though it's
being worked on for arm64 [5]).
+ The buddy system may free up scarce hardware resources.
+ If a CPU totally goes out to lunch (can't process NMIs) the buddy
system could still detect the problem (though it would be unlikely
to be able to get a stack trace).
+ The buddy system uses the same timer function to pet the hardlockup
detector on the running CPU as it uses to detect hardlockups on
other CPUs. Compared to other hardlockup detectors, this means it
generates fewer interrupts and thus is likely better able to let
CPUs stay idle longer.
- If all CPUs are hard locked up at the same time the buddy system
can't detect it.
- If we don't have SMP we can't use the buddy system.
- The buddy system needs an arch-specific mechanism (possibly NMI
backtrace) to get info about the locked up CPU.
[1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org
[2] https://issuetracker.google.com/172213129
[3] https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html
[4] https://lore.kernel.org/lkml/20230301234753.28582-1-ricardo.neri-calderon@linux.intel.com/
[5] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/
Link: https://lkml.kernel.org/r/20230519101840.v5.14.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Tzung-Bi Shih <tzungbi@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-19 10:18:38 -07:00
|
|
|
config HARDLOCKUP_DETECTOR_PERF
|
|
|
|
bool
|
2023-06-16 17:06:14 +02:00
|
|
|
depends on HARDLOCKUP_DETECTOR
|
|
|
|
depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
|
watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
There are several hardlockup detector implementations and several Kconfig
values which allow selection and build of the preferred one.
CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb
("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36.
It was a preparation step for introducing the new generic perf hardlockup
detector.
The existing arch-specific variants did not support the to-be-created
generic build configurations, sysctl interface, etc. This distinction
was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog:
Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR")
in v2.6.38.
CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105
("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced
the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used
by three architectures, namely blackfin, mn10300, and sparc.
Support for the blackfin and mn10300 architectures was dropped completely
some time ago, so sparc is now the only architecture with the historic
NMI watchdog.
And the old sparc implementation is really special. It is always built on
sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b
("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added
in v4.10-rc1.
There are only a few locations where the sparc64 NMI watchdog interacts
with the generic hardlockup detector code:
+ implements arch_touch_nmi_watchdog() which is called from the generic
touch_nmi_watchdog()
+ implements watchdog_hardlockup_enable()/disable() to support
/proc/sys/kernel/nmi_watchdog
+ is always preferred over other generic watchdogs, see
CONFIG_HARDLOCKUP_DETECTOR
+ includes asm/nmi.h into linux/nmi.h because some sparc-specific
functions are needed in sparc-specific code which includes
only linux/nmi.h.
The situation became more complicated after the commit 05a4a95279311c3
("kernel/watchdog: split up config options") and commit 2104180a53698df5
("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1.
They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for the
powerpc-specific hardlockup detector. It was compatible with the perf one
regarding the general boot, sysctl, and programming interfaces.
HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of
HAVE_NMI_WATCHDOG. It made some sense because all arch-specific
detectors had some common requirements, namely:
+ implemented arch_touch_nmi_watchdog()
+ included asm/nmi.h into linux/nmi.h
+ defined the default value for /proc/sys/kernel/nmi_watchdog
But it made things pretty complicated when the generic buddy hardlockup
detector was added. Before that, the generic perf detector was never
supported together with an arch-specific one. But the buddy detector
can work on any SMP system, which means that an architecture can
support both the arch-specific and the buddy detector.
As a result, there are a few tricky dependencies. For example,
CONFIG_HARDLOCKUP_DETECTOR depends on:
((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
The problem is that the very special sparc implementation is defined as:
HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH
Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear
without understanding the history.
Make the logic less tricky and more self-explanatory by making
HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to
HAVE_HARDLOCKUP_DETECTOR_SPARC64.
Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF,
and HARDLOCKUP_DETECTOR_BUDDY may conflict only with
HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR
and it is no longer enabled when HAVE_NMI_WATCHDOG is set.
Link: https://lkml.kernel.org/r/20230616150618.6073-5-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-16 17:06:16 +02:00
|
|
|
depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
|
watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
Implement a hardlockup detector that doesn't need any extra
arch-specific support code to detect lockups. Instead of using something
arch-specific we will use the buddy system, where each CPU watches out for
another one. Specifically, each CPU will use its softlockup hrtimer to
check that the next CPU is processing hrtimer interrupts by verifying that
a counter is increasing.
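The counter check described above can be modelled with a small userspace simulation. This is only an illustrative sketch, not the kernel implementation: the function name, the round-based scheduling, and the variable names are assumptions made for the example. It also shows the limitation that nothing is reported when every CPU is stuck, because no CPU is left to run the check.

```python
def find_stuck_cpus(num_cpus, stuck, rounds=3):
    """Toy model of the buddy check; returns the set of CPUs reported as stuck."""
    hrtimer_interrupts = [0] * num_cpus   # per-CPU timer interrupt counters
    saved_interrupts = [0] * num_cpus     # last counter value seen by the watcher
    reported = set()

    for _ in range(rounds):
        # Healthy CPUs take their timer interrupt; stuck CPUs do not.
        for cpu in range(num_cpus):
            if cpu not in stuck:
                hrtimer_interrupts[cpu] += 1

        # Each healthy CPU then checks that the next CPU's counter moved.
        for cpu in range(num_cpus):
            if cpu in stuck:
                continue
            buddy = (cpu + 1) % num_cpus
            if hrtimer_interrupts[buddy] == saved_interrupts[buddy]:
                reported.add(buddy)
            else:
                saved_interrupts[buddy] = hrtimer_interrupts[buddy]

    return reported
```

For example, `find_stuck_cpus(4, {2})` reports CPU 2 (noticed by CPU 1), while `find_stuck_cpus(4, {0, 1, 2, 3})` reports nothing.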
NOTE: unlike the other hard lockup detectors, the buddy one can't easily
show what's happening on the CPU that locked up just by doing a simple
backtrace. It relies on some other mechanism in the system to get
information about the locked up CPUs. This could be support for NMI
backtraces like [1], it could be a mechanism for printing the PC of locked
CPUs at panic time like [2] / [3], or it could be something else. Even
though that means we still rely on arch-specific code, this arch-specific
code seems to often be implemented even on architectures that don't have a
hardlockup detector.
This style of hardlockup detector originated in some downstream Android
trees and has been rebased on / carried in ChromeOS trees for quite a long
time for use on arm and arm64 boards. Historically on these boards we've
leveraged mechanism [2] / [3] to get information about hung CPUs, but we
could move to [1].
Although the original motivation for the buddy system was for use on
systems without an arch-specific hardlockup detector, it can still be
useful to use even on systems that _do_ have an arch-specific hardlockup
detector. On x86, for instance, there is a 24-part patch series [4] in
progress switching the arch-specific hard lockup detector from a scarce
perf counter to a less-scarce hardware resource. Potentially the buddy
system could be a simpler alternative to free up the perf counter but
still get hard lockup detection.
Overall, pros (+) and cons (-) of the buddy system compared to an
arch-specific hardlockup detector (which might be implemented using
perf):
+ The buddy system is usable on systems that don't have an
arch-specific hardlockup detector, like arm32 and arm64 (though it's
being worked on for arm64 [5]).
+ The buddy system may free up scarce hardware resources.
+ If a CPU totally goes out to lunch (can't process NMIs) the buddy
system could still detect the problem (though it would be unlikely
to be able to get a stack trace).
+ The buddy system uses the same timer function to pet the hardlockup
detector on the running CPU as it uses to detect hardlockups on
other CPUs. Compared to other hardlockup detectors, this means it
generates fewer interrupts and thus is likely better able to let
CPUs stay idle longer.
- If all CPUs are hard locked up at the same time the buddy system
can't detect it.
- If we don't have SMP we can't use the buddy system.
- The buddy system needs an arch-specific mechanism (possibly NMI
backtrace) to get info about the locked up CPU.
[1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org
[2] https://issuetracker.google.com/172213129
[3] https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html
[4] https://lore.kernel.org/lkml/20230301234753.28582-1-ricardo.neri-calderon@linux.intel.com/
[5] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/
Link: https://lkml.kernel.org/r/20230519101840.v5.14.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Tzung-Bi Shih <tzungbi@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-19 10:18:38 -07:00
|
|
|
select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
|
|
|
|
|
|
|
|
config HARDLOCKUP_DETECTOR_BUDDY
|
|
|
|
bool
|
2023-06-16 17:06:14 +02:00
|
|
|
depends on HARDLOCKUP_DETECTOR
|
|
|
|
depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY
|
|
|
|
depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
|
2023-06-16 17:06:16 +02:00
|
|
|
depends on !HAVE_HARDLOCKUP_DETECTOR_ARCH
|
2023-05-19 10:18:38 -07:00
|
|
|
select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
|
|
|
|
|
2023-06-16 17:06:18 +02:00
|
|
|
config HARDLOCKUP_DETECTOR_ARCH
|
|
|
|
bool
|
|
|
|
depends on HARDLOCKUP_DETECTOR
|
|
|
|
depends on HAVE_HARDLOCKUP_DETECTOR_ARCH
|
|
|
|
help
|
|
|
|
The arch-specific implementation of the hardlockup detector will
|
|
|
|
be used.
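The "depends on" lines of the three detector entries above (HARDLOCKUP_DETECTOR_PERF, HARDLOCKUP_DETECTOR_BUDDY, HARDLOCKUP_DETECTOR_ARCH) can be read as plain boolean expressions. The sketch below mirrors them in Python to show which implementation ends up enabled. The function and argument names are hypothetical. It treats HARDLOCKUP_DETECTOR_PREFER_BUDDY as a free input, although that symbol has dependencies of its own that are not modelled here.

```python
def enabled_detectors(hardlockup_detector, have_perf, have_buddy,
                      have_arch, prefer_buddy=False):
    """Return the detector implementations whose 'depends on' lines hold."""
    if not hardlockup_detector:
        return []

    # HARDLOCKUP_DETECTOR_PERF: HAVE_..._PERF && !PREFER_BUDDY, and !HAVE_..._ARCH
    perf = have_perf and not prefer_buddy and not have_arch
    # HARDLOCKUP_DETECTOR_BUDDY: HAVE_..._BUDDY, (!HAVE_..._PERF || PREFER_BUDDY),
    # and !HAVE_..._ARCH
    buddy = have_buddy and (not have_perf or prefer_buddy) and not have_arch
    # HARDLOCKUP_DETECTOR_ARCH: HAVE_..._ARCH
    arch = have_arch

    candidates = (("perf", perf), ("buddy", buddy), ("arch", arch))
    return [name for name, on in candidates if on]
```

In this model at most one implementation is enabled at a time: perf and buddy exclude each other through PREFER_BUDDY, and both are disabled when an arch-specific detector is available.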
|
|
|
|
|
2023-06-16 17:06:14 +02:00
|
|
|
#
|
2023-06-16 17:06:13 +02:00
|
|
|
# Both the "perf" and "buddy" hardlockup detectors count hrtimer
|
|
|
|
# interrupts. This config enables functions managing this common code.
|
2023-06-16 17:06:14 +02:00
|
|
|
#
|
2023-06-16 17:06:13 +02:00
|
|
|
config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
|
|
|
|
bool
|
|
|
|
select SOFTLOCKUP_DETECTOR
|
|
|
|
|
2017-08-15 09:50:13 +02:00
|
|
|
#
|
|
|
|
# Enables a timestamp based low pass filter to compensate for perf based
|
|
|
|
# hard lockup detection which runs too fast due to turbo modes.
|
|
|
|
#
|
|
|
|
config HARDLOCKUP_CHECK_TIMESTAMP
|
|
|
|
bool
|
|
|
|
|
2011-03-22 16:34:16 -07:00
|
|
|
config BOOTPARAM_HARDLOCKUP_PANIC
|
|
|
|
bool "Panic (Reboot) On Hard Lockups"
|
2012-10-04 17:13:17 -07:00
|
|
|
depends on HARDLOCKUP_DETECTOR
|
2011-03-22 16:34:16 -07:00
|
|
|
help
|
|
|
|
Say Y here to enable the kernel to panic on "hard lockups",
|
|
|
|
which are bugs that cause the kernel to loop in kernel
|
2012-02-09 17:42:21 -05:00
|
|
|
mode with interrupts disabled for more than 10 seconds (configurable
|
|
|
|
using the watchdog_thresh sysctl).
|
2011-03-22 16:34:16 -07:00
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2009-01-15 11:08:40 -08:00
|
|
|
config DETECT_HUNG_TASK
|
|
|
|
bool "Detect Hung Tasks"
|
|
|
|
depends on DEBUG_KERNEL
|
2017-07-12 14:35:46 -07:00
|
|
|
default SOFTLOCKUP_DETECTOR
|
2009-01-15 11:08:40 -08:00
|
|
|
help
|
2013-07-01 13:04:43 -07:00
|
|
|
Say Y here to enable the kernel to detect "hung tasks",
|
|
|
|
which are bugs that cause the task to be stuck in
|
2016-09-22 16:55:13 -04:00
|
|
|
uninterruptible "D" state indefinitely.
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
When a hung task is detected, the kernel will print the
|
|
|
|
current stack trace (which you should report), but the
|
|
|
|
task will stay in uninterruptible state. If lockdep is
|
|
|
|
enabled then all held locks will also be reported. This
|
|
|
|
feature has negligible overhead.
|
2006-03-25 03:06:39 -08:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config DEFAULT_HUNG_TASK_TIMEOUT
|
|
|
|
int "Default timeout for hung task detection (in seconds)"
|
|
|
|
depends on DETECT_HUNG_TASK
|
|
|
|
default 120
|
2007-07-15 23:38:14 -07:00
|
|
|
help
|
2013-07-01 13:04:43 -07:00
|
|
|
This option controls the default timeout (in seconds) used
|
|
|
|
to determine when a task has become non-responsive and should
|
|
|
|
be considered hung.
|
2007-07-15 23:38:14 -07:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
It can be adjusted at runtime via the kernel.hung_task_timeout_secs
|
|
|
|
sysctl or by writing a value to
|
|
|
|
/proc/sys/kernel/hung_task_timeout_secs.
|
2008-02-07 17:47:41 -08:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
A timeout of 0 disables the check. The default is two minutes.
|
|
|
|
Keeping the default should be fine in most cases.
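As a small illustration of the runtime knob mentioned in the help text, the sketch below reads the current timeout from /proc/sys/kernel/hung_task_timeout_secs and maps 0 to "check disabled". The helper name is hypothetical, and the path argument exists only so the function can be exercised against an ordinary file.

```python
from pathlib import Path

HUNG_TASK_TIMEOUT_PATH = "/proc/sys/kernel/hung_task_timeout_secs"


def read_hung_task_timeout(path=HUNG_TASK_TIMEOUT_PATH):
    """Return the hung task timeout in seconds, or None if the check is disabled."""
    value = int(Path(path).read_text().strip())
    # A timeout of 0 disables the hung task check.
    return None if value == 0 else value
```

On a kernel built with DETECT_HUNG_TASK and the default left untouched, `read_hung_task_timeout()` would be expected to return 120.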
|
2012-10-08 16:28:11 -07:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config BOOTPARAM_HUNG_TASK_PANIC
|
|
|
|
bool "Panic (Reboot) On Hung Tasks"
|
|
|
|
depends on DETECT_HUNG_TASK
|
2009-06-11 13:24:13 +01:00
|
|
|
help
|
2013-07-01 13:04:43 -07:00
|
|
|
Say Y here to enable the kernel to panic on "hung tasks",
|
|
|
|
which are bugs that cause the kernel to leave a task stuck
|
|
|
|
in uninterruptible "D" state.
|
2009-06-11 13:24:13 +01:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
The panic can be used in combination with panic_timeout,
|
|
|
|
to cause the system to reboot automatically after a
|
|
|
|
hung task has been detected. This feature is useful for
|
|
|
|
high-availability systems that have uptime guarantees and
|
|
|
|
where hung tasks must be resolved ASAP.
|
2009-06-23 14:40:27 +01:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
Say N if unsure.
|
2009-06-23 14:40:27 +01:00
|
|
|
|
workqueue: implement lockup detector
Workqueue stalls can happen from a variety of usage bugs such as a
missing WQ_MEM_RECLAIM flag or a concurrency-managed work item
staying RUNNING indefinitely. These stalls can be extremely difficult
to hunt down because the usual warning mechanisms can't detect
workqueue stalls and the internal state is pretty opaque.
To alleviate the situation, this patch implements a workqueue lockup
detector. It periodically monitors all worker_pools and, if any pool
fails to make forward progress for longer than the threshold duration,
triggers a warning and dumps the workqueue state as follows.
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
workqueue events_power_efficient: flags=0x80
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
pending: check_lifetime, neigh_periodic_work
workqueue cgroup_pidlist_destroy: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
pending: cgroup_pidlist_destroy_work_fn
...
The detection mechanism is controlled through the kernel parameter
workqueue.watchdog_thresh and can be updated at runtime through the
sysfs module parameter file.
v2: Decoupled from softlockup control knobs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2015-12-08 11:28:04 -05:00
|
|
|
config WQ_WATCHDOG
|
|
|
|
bool "Detect Workqueue Stalls"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
Say Y here to enable stall detection on workqueues. If a
|
|
|
|
worker pool doesn't make forward progress on a pending work
|
|
|
|
item for over a given amount of time, 30s by default, a
|
|
|
|
warning message is printed along with a dump of workqueue
|
|
|
|
state. This can be configured through kernel parameter
|
|
|
|
"workqueue.watchdog_thresh" and its sysfs counterpart.
|
|
|
|
|
2023-05-17 17:02:08 -10:00
|
|
|
config WQ_CPU_INTENSIVE_REPORT
|
|
|
|
bool "Report per-cpu work items which hog CPU for too long"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
Say Y here to enable reporting of concurrency-managed per-cpu work
|
|
|
|
items that hog CPUs for longer than
|
2023-07-11 12:38:20 +02:00
|
|
|
workqueue.cpu_intensive_thresh_us. Workqueue automatically
|
2023-05-17 17:02:08 -10:00
|
|
|
detects and excludes them from concurrency management to prevent
|
|
|
|
them from stalling other per-cpu work items. Occasional
|
|
|
|
triggering may not necessarily indicate a problem. Repeated
|
|
|
|
triggering likely indicates that the work item should be switched
|
|
|
|
to use an unbound workqueue.
|
|
|
|
|
lib/test_lockup: test module to generate lockups
CONFIG_TEST_LOCKUP=m adds module "test_lockup" that helps to make sure
that watchdogs and lockup detectors are working properly.
Depending on module parameters, test_lockup can emulate a soft or hard
lockup or a "hung task", hold an arbitrary lock, or allocate a bunch of
pages. It can also generate a series of lockups with cooling-down periods,
so it can be used as a "ping" for locks or the page allocator. The loop
checks for signals between iterations and can thus be stopped by ^C.
# modinfo test_lockup
...
parm: time_secs:lockup time in seconds, default 0 (uint)
parm: time_nsecs:nanoseconds part of lockup time, default 0 (uint)
parm: cooldown_secs:cooldown time between iterations in seconds, default 0 (uint)
parm: cooldown_nsecs:nanoseconds part of cooldown, default 0 (uint)
parm: iterations:lockup iterations, default 1 (uint)
parm: all_cpus:trigger lockup at all cpus at once (bool)
parm: state:wait in 'R' running (default), 'D' uninterruptible, 'K' killable, 'S' interruptible state (charp)
parm: use_hrtimer:use high-resolution timer for sleeping (bool)
parm: iowait:account sleep time as iowait (bool)
parm: lock_read:lock read-write locks for read (bool)
parm: lock_single:acquire locks only at one cpu (bool)
parm: reacquire_locks:release and reacquire locks/irq/preempt between iterations (bool)
parm: touch_softlockup:touch soft-lockup watchdog between iterations (bool)
parm: touch_hardlockup:touch hard-lockup watchdog between iterations (bool)
parm: call_cond_resched:call cond_resched() between iterations (bool)
parm: measure_lock_wait:measure lock wait time (bool)
parm: lock_wait_threshold:print lock wait time longer than this in nanoseconds, default off (ulong)
parm: disable_irq:disable interrupts: generate hard-lockups (bool)
parm: disable_softirq:disable bottom-half irq handlers (bool)
parm: disable_preempt:disable preemption: generate soft-lockups (bool)
parm: lock_rcu:grab rcu_read_lock: generate rcu stalls (bool)
parm: lock_mmap_sem:lock mm->mmap_sem: block procfs interfaces (bool)
parm: lock_rwsem_ptr:lock rw_semaphore at address (ulong)
parm: lock_mutex_ptr:lock mutex at address (ulong)
parm: lock_spinlock_ptr:lock spinlock at address (ulong)
parm: lock_rwlock_ptr:lock rwlock at address (ulong)
parm: alloc_pages_nr:allocate and free pages under locks (uint)
parm: alloc_pages_order:page order to allocate (uint)
parm: alloc_pages_gfp:allocate pages with this gfp_mask, default GFP_KERNEL (uint)
parm: alloc_pages_atomic:allocate pages with GFP_ATOMIC (bool)
parm: reallocate_pages:free and allocate pages between iterations (bool)
Parameters for locking by address are unsafe and taint the kernel. With
CONFIG_DEBUG_SPINLOCK=y they at least check magics for embedded spinlocks.
Examples:
task hang in D-state:
modprobe test_lockup time_secs=1 iterations=60 state=D
task hang in io-wait D-state:
modprobe test_lockup time_secs=1 iterations=60 state=D iowait
softlockup:
modprobe test_lockup time_secs=1 iterations=60 state=R
hardlockup:
modprobe test_lockup time_secs=1 iterations=60 state=R disable_irq
system-wide hardlockup:
modprobe test_lockup time_secs=1 iterations=60 state=R \
disable_irq all_cpus
rcu stall:
modprobe test_lockup time_secs=1 iterations=60 state=R \
lock_rcu touch_softlockup
lock mmap_sem / block procfs interfaces:
modprobe test_lockup time_secs=1 iterations=60 state=S lock_mmap_sem
lock tasklist_lock for read / block forks:
TASKLIST_LOCK=$(awk '$3 == "tasklist_lock" {print "0x"$1}' /proc/kallsyms)
modprobe test_lockup time_secs=1 iterations=60 state=R \
disable_irq lock_read lock_rwlock_ptr=$TASKLIST_LOCK
lock namespace_sem / block vfs mount operations:
NAMESPACE_SEM=$(awk '$3 == "namespace_sem" {print "0x"$1}' /proc/kallsyms)
modprobe test_lockup time_secs=1 iterations=60 state=S \
lock_rwsem_ptr=$NAMESPACE_SEM
lock cgroup mutex / block cgroup operations:
CGROUP_MUTEX=$(awk '$3 == "cgroup_mutex" {print "0x"$1}' /proc/kallsyms)
modprobe test_lockup time_secs=1 iterations=60 state=S \
lock_mutex_ptr=$CGROUP_MUTEX
ping cgroup_mutex every second and measure maximum lock wait time:
modprobe test_lockup cooldown_secs=1 iterations=60 state=S \
lock_mutex_ptr=$CGROUP_MUTEX reacquire_locks measure_lock_wait
[linux@roeck-us.net: rename disable_irq to fix build error]
Link: http://lkml.kernel.org/r/20200317133614.23152-1-linux@roeck-us.net
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/158132859146.2797.525923171323227836.stgit@buzz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-06 20:09:47 -07:00
|
|
|
config TEST_LOCKUP
|
|
|
|
tristate "Test module to generate lockups"
|
2020-08-11 18:34:44 -07:00
|
|
|
depends on m
|
lib/test_lockup: test module to generate lockups
CONFIG_TEST_LOCKUP=m adds module "test_lockup" that helps to make sure
that watchdogs and lockup detectors are working properly.
Depending on module parameters, test_lockup can emulate a soft or hard
lockup or a "hung task", hold an arbitrary lock, or allocate a bunch of
pages. It can also generate a series of lockups with cooling-down periods,
so it can be used as a "ping" for locks or the page allocator. The loop
checks for signals between iterations and can thus be stopped by ^C.
# modinfo test_lockup
...
parm: time_secs:lockup time in seconds, default 0 (uint)
parm: time_nsecs:nanoseconds part of lockup time, default 0 (uint)
parm: cooldown_secs:cooldown time between iterations in seconds, default 0 (uint)
parm: cooldown_nsecs:nanoseconds part of cooldown, default 0 (uint)
parm: iterations:lockup iterations, default 1 (uint)
parm: all_cpus:trigger lockup at all cpus at once (bool)
parm: state:wait in 'R' running (default), 'D' uninterruptible, 'K' killable, 'S' interruptible state (charp)
parm: use_hrtimer:use high-resolution timer for sleeping (bool)
parm: iowait:account sleep time as iowait (bool)
parm: lock_read:lock read-write locks for read (bool)
parm: lock_single:acquire locks only at one cpu (bool)
parm: reacquire_locks:release and reacquire locks/irq/preempt between iterations (bool)
parm: touch_softlockup:touch soft-lockup watchdog between iterations (bool)
parm: touch_hardlockup:touch hard-lockup watchdog between iterations (bool)
parm: call_cond_resched:call cond_resched() between iterations (bool)
parm: measure_lock_wait:measure lock wait time (bool)
parm: lock_wait_threshold:print lock wait time longer than this in nanoseconds, default off (ulong)
parm: disable_irq:disable interrupts: generate hard-lockups (bool)
parm: disable_softirq:disable bottom-half irq handlers (bool)
parm: disable_preempt:disable preemption: generate soft-lockups (bool)
parm: lock_rcu:grab rcu_read_lock: generate rcu stalls (bool)
parm: lock_mmap_sem:lock mm->mmap_sem: block procfs interfaces (bool)
parm: lock_rwsem_ptr:lock rw_semaphore at address (ulong)
parm: lock_mutex_ptr:lock mutex at address (ulong)
parm: lock_spinlock_ptr:lock spinlock at address (ulong)
parm: lock_rwlock_ptr:lock rwlock at address (ulong)
parm: alloc_pages_nr:allocate and free pages under locks (uint)
parm: alloc_pages_order:page order to allocate (uint)
parm: alloc_pages_gfp:allocate pages with this gfp_mask, default GFP_KERNEL (uint)
parm: alloc_pages_atomic:allocate pages with GFP_ATOMIC (bool)
parm: reallocate_pages:free and allocate pages between iterations (bool)
Parameters for locking by address are unsafe and taints kernel. With
CONFIG_DEBUG_SPINLOCK=y they at least check magics for embedded spinlocks.
Examples:
task hang in D-state:
modprobe test_lockup time_secs=1 iterations=60 state=D
task hang in io-wait D-state:
modprobe test_lockup time_secs=1 iterations=60 state=D iowait
softlockup:
modprobe test_lockup time_secs=1 iterations=60 state=R
hardlockup:
modprobe test_lockup time_secs=1 iterations=60 state=R disable_irq
system-wide hardlockup:
modprobe test_lockup time_secs=1 iterations=60 state=R \
disable_irq all_cpus
rcu stall:
modprobe test_lockup time_secs=1 iterations=60 state=R \
lock_rcu touch_softlockup
lock mmap_sem / block procfs interfaces:
modprobe test_lockup time_secs=1 iterations=60 state=S lock_mmap_sem
lock tasklist_lock for read / block forks:
TASKLIST_LOCK=$(awk '$3 == "tasklist_lock" {print "0x"$1}' /proc/kallsyms)
modprobe test_lockup time_secs=1 iterations=60 state=R \
disable_irq lock_read lock_rwlock_ptr=$TASKLIST_LOCK
lock namespace_sem / block vfs mount operations:
NAMESPACE_SEM=$(awk '$3 == "namespace_sem" {print "0x"$1}' /proc/kallsyms)
modprobe test_lockup time_secs=1 iterations=60 state=S \
lock_rwsem_ptr=$NAMESPACE_SEM
lock cgroup mutex / block cgroup operations:
CGROUP_MUTEX=$(awk '$3 == "cgroup_mutex" {print "0x"$1}' /proc/kallsyms)
modprobe test_lockup time_secs=1 iterations=60 state=S \
lock_mutex_ptr=$CGROUP_MUTEX
ping cgroup_mutex every second and measure maximum lock wait time:
modprobe test_lockup cooldown_secs=1 iterations=60 state=S \
lock_mutex_ptr=$CGROUP_MUTEX reacquire_locks measure_lock_wait
[linux@roeck-us.net: rename disable_irq to fix build error]
Link: http://lkml.kernel.org/r/20200317133614.23152-1-linux@roeck-us.net
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/158132859146.2797.525923171323227836.stgit@buzz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-06 20:09:47 -07:00
|
|
|
help
|
|
|
|
This builds the "test_lockup" module that helps to make sure
|
|
|
|
that watchdogs and lockup detectors are working properly.
|
|
|
|
|
|
|
|
Depending on module parameters, it can emulate a soft or hard
|
|
|
|
lockup, a "hung task", or holding an arbitrary lock for a long time.
|
|
|
|
It can also generate a series of lockups with cooling-down periods.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2013-07-01 13:04:50 -07:00
|
|
|
endmenu # "Debug lockups and hangs"
|
|
|
|
|
2019-12-06 17:04:00 -08:00
|
|
|
menu "Scheduler Debugging"
|
2013-11-25 23:23:04 +00:00
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config SCHED_DEBUG
|
|
|
|
bool "Collect scheduler debugging info"
|
2023-01-29 11:10:09 +08:00
|
|
|
depends on DEBUG_KERNEL && DEBUG_FS
|
2013-07-01 13:04:43 -07:00
|
|
|
default y
|
2009-06-11 13:24:14 +01:00
|
|
|
help
|
2023-01-29 10:13:57 +08:00
|
|
|
If you say Y here, the /sys/kernel/debug/sched file will be provided
|
2013-07-01 13:04:43 -07:00
|
|
|
that can help debug the scheduler. The runtime overhead of this
|
|
|
|
option is minimal.
|
2009-06-11 13:24:14 +01:00
|
|
|
|
2015-06-25 23:53:37 +05:30
|
|
|
config SCHED_INFO
|
|
|
|
bool
|
|
|
|
default n
|
|
|
|
|
2013-07-01 13:04:43 -07:00
|
|
|
config SCHEDSTATS
|
|
|
|
bool "Collect scheduler statistics"
|
2024-03-08 11:58:56 +01:00
|
|
|
depends on PROC_FS
|
2015-06-25 23:53:37 +05:30
|
|
|
select SCHED_INFO
|
2013-07-01 13:04:43 -07:00
|
|
|
help
|
|
|
|
If you say Y here, additional code will be inserted into the
|
|
|
|
scheduler and related routines to collect statistics about
|
|
|
|
scheduler behavior and provide them in /proc/schedstat. These
|
|
|
|
stats may be useful for both tuning and debugging the scheduler.
|
|
|
|
If you aren't debugging the scheduler or trying to tune a specific
|
|
|
|
application, you can say N to avoid the very slight overhead
|
|
|
|
this adds.
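As a rough illustration of consuming /proc/schedstat, the sketch below splits its text into a version number and per-CPU counter lists. It assumes only the general line layout ("version N", "timestamp N", "cpuN ..." and "domainN ..." lines); the meaning of each column depends on the schedstat version and is not interpreted here. The function name and the sample numbers are made up for the example.

```python
def parse_schedstat(text):
    """Split /proc/schedstat-style text into (version, per-CPU counters)."""
    version = None
    cpus = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "version":
            version = int(fields[1])
        elif fields[0].startswith("cpu"):
            # Keep the raw counters; their meaning depends on the version.
            cpus[fields[0]] = [int(f) for f in fields[1:]]
        # "timestamp" and "domainN" lines are skipped in this sketch.
    return version, cpus


# Made-up sample input, not captured from a real machine.
SAMPLE = """version 15
timestamp 4295000000
cpu0 0 0 100 50 20 10 123456 7890 60
domain0 3 1 2 3
cpu1 0 0 80 40 15 5 654321 987 45
"""

version, cpus = parse_schedstat(SAMPLE)
print(version, sorted(cpus))
```

On a kernel built with CONFIG_SCHEDSTATS=y, the same function could be fed the contents of /proc/schedstat instead of SAMPLE.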
|
2009-06-11 13:24:14 +01:00
|
|
|
|
2019-12-06 17:04:00 -08:00
|
|
|
endmenu
|
2014-09-12 14:16:19 +01:00
|
|
|
|
2015-03-11 21:16:32 -07:00
|
|
|
config DEBUG_TIMEKEEPING
|
|
|
|
bool "Enable extra timekeeping sanity checking"
|
|
|
|
help
|
|
|
|
This option will enable additional timekeeping sanity checks
|
|
|
|
which may be helpful when diagnosing issues where timekeeping
|
|
|
|
problems are suspected.
|
|
|
|
|
|
|
|
This may include checks in the timekeeping hotpaths, so this
|
|
|
|
option may have a (very small) performance impact on some
|
|
|
|
workloads.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2005-04-16 15:20:36 -07:00
|
|
|
config DEBUG_PREEMPT
|
|
|
|
bool "Debug preemptible kernel"
|
2019-10-15 21:18:19 +02:00
|
|
|
depends on DEBUG_KERNEL && PREEMPTION && TRACE_IRQFLAGS_SUPPORT
|
2005-04-16 15:20:36 -07:00
|
|
|
help
|
|
|
|
If you say Y here then the kernel will use a debug variant of the
|
|
|
|
commonly used smp_processor_id() function and will print warnings
|
|
|
|
if kernel code uses it in a preemption-unsafe way. Also, the kernel
|
|
|
|
will detect preemption count underflows.
|
|
|
|
|
2023-01-21 12:39:42 +09:00
|
|
|
This option has the potential to introduce high runtime overhead,
|
|
|
|
depending on workload as it triggers debugging routines for each
|
|
|
|
this_cpu operation. It should only be used for debugging purposes.
|
|
|
|
|
2013-07-01 13:04:47 -07:00
|
|
|
menu "Lock Debugging (spinlocks, mutexes, etc...)"
|
|
|
|
|
2018-03-30 17:27:59 -04:00
|
|
|
config LOCK_DEBUGGING_SUPPORT
|
|
|
|
bool
|
|
|
|
depends on TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
|
|
|
|
default y
|
|
|
|
|
2018-03-30 17:28:00 -04:00
|
|
|
config PROVE_LOCKING
|
|
|
|
bool "Lock debugging: prove locking correctness"
|
|
|
|
depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
|
|
|
|
select LOCKDEP
|
|
|
|
select DEBUG_SPINLOCK
|
2021-08-15 23:29:01 +02:00
|
|
|
select DEBUG_MUTEXES if !PREEMPT_RT
|
2018-03-30 17:28:00 -04:00
|
|
|
select DEBUG_RT_MUTEXES if RT_MUTEXES
|
2024-02-22 10:05:40 -05:00
|
|
|
select DEBUG_RWSEMS if !PREEMPT_RT
|
2018-03-30 17:28:00 -04:00
|
|
|
select DEBUG_WW_MUTEX_SLOWPATH
|
|
|
|
select DEBUG_LOCK_ALLOC
|
2020-07-20 17:55:13 +02:00
|
|
|
select PREEMPT_COUNT if !ARCH_NO_PREEMPT
|
2018-03-30 17:28:00 -04:00
|
|
|
select TRACE_IRQFLAGS
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
This feature enables the kernel to prove that all locking
|
|
|
|
that occurs in the kernel runtime is mathematically
|
|
|
|
correct: that under no circumstance could an arbitrary (and
|
|
|
|
not yet triggered) combination of observed locking
|
|
|
|
sequences (on an arbitrary number of CPUs, running an
|
|
|
|
arbitrary number of tasks and interrupt contexts) cause a
|
|
|
|
deadlock.
|
|
|
|
|
|
|
|
In short, this feature enables the kernel to report locking
|
|
|
|
related deadlocks before they actually occur.
|
|
|
|
|
|
|
|
The proof does not depend on how hard and complex a
|
|
|
|
deadlock scenario would be to trigger: how many
|
|
|
|
participant CPUs, tasks and irq-contexts would be needed
|
|
|
|
for it to trigger. The proof also does not depend on
|
|
|
|
timing: if a race and a resulting deadlock is possible
|
|
|
|
theoretically (no matter how unlikely the race scenario
|
|
|
|
is), it will be proven so and will immediately be
|
|
|
|
reported by the kernel (once the event is observed that
|
|
|
|
makes the deadlock theoretically possible).
|
|
|
|
|
|
|
|
If a deadlock is impossible (i.e. the locking rules, as
|
|
|
|
observed by the kernel, are mathematically correct), the
|
|
|
|
kernel reports nothing.
|
|
|
|
|
|
|
|
NOTE: this feature can also be enabled for rwlocks, mutexes
|
|
|
|
and rwsems - in which case all dependencies between these
|
|
|
|
different locking variants are observed and mapped too, and
|
|
|
|
the proof of observed correctness is also maintained for an
|
|
|
|
arbitrary combination of these separate locking variants.
|
|
|
|
|
2019-04-10 08:32:41 -03:00
|
|
|
For more details, see Documentation/locking/lockdep-design.rst.
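The idea of proving correctness from observed locking sequences can be illustrated with a toy model. The sketch below is not lockdep's algorithm or its data structures; it only records "A was held while B was acquired" edges between locks and reports when a new acquisition contradicts an earlier observation, which is the moment a deadlock becomes theoretically possible. The class and method names are hypothetical.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy model: directed edges 'held -> acquired' between locks."""

    def __init__(self):
        self.after = defaultdict(set)  # lock -> locks taken while it was held

    def _reachable(self, src, dst):
        # Iterative depth-first search over the recorded dependencies.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.after[node])
        return False

    def acquire(self, held, new):
        """Record that `new` is taken while the locks in `held` are held.

        Returns the held locks whose ordering against `new` contradicts
        an earlier observation; an empty list means no problem was found.
        """
        conflicts = [h for h in held if self._reachable(new, h)]
        for h in held:
            self.after[h].add(new)
        return conflicts


g = LockOrderGraph()
first = g.acquire(["A"], "B")   # A -> B is seen for the first time
second = g.acquire(["B"], "A")  # B -> A contradicts the recorded A -> B
print(first, second)
```

Note that the second call is flagged even though the two orderings were observed one after the other and never actually raced, which mirrors the point that the report does not depend on timing.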
|
2018-03-30 17:28:00 -04:00
|
|
|
|
lockdep: Introduce wait-type checks
Extend lockdep to validate lock wait-type context.
The current wait-types are:
LD_WAIT_FREE, /* wait free, rcu etc.. */
LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */
LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */
Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).
This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In
other words, it's a fancier might_sleep().
Obviously RCU made the entire ordeal more complex than a simple single
value test because RCU can be acquired in (pretty much) any context and
while it presents a context to nested locks it is not the same as it
got acquired in.
Therefore it's necessary to split the wait_type into two values, one
representing the acquire (outer) and one representing the nested context
(inner). For most 'normal' locks these two are the same.
[ To make static initialization easier we have the rule that:
.outer == INV means .outer == .inner; because INV == 0. ]
It further means that it's required to find the minimal .inner of the held
stack to compare against the outer of the new lock; because while 'normal'
RCU presents a CONFIG type to nested locks, if it is taken while already
holding a SPIN type it obviously doesn't relax the rules.
Below is an example output generated by the trivial test code:
raw_spin_lock(&foo);
spin_lock(&bar);
spin_unlock(&bar);
raw_spin_unlock(&foo);
[ BUG: Invalid wait context ]
-----------------------------
swapper/0/1 is trying to lock:
ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187
other info that might help us debug this:
1 lock held by swapper/0/1:
#0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187
The way to read it is to look at the new -{n:m} part in the lock
description; -{3:3} for the attempted lock, and try to match that up to
the held locks, which in this case is the one: -{2:2}.
This tells us that the lock being acquired requires a more relaxed environment
than presented by the lock stack.
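The rule described above can be sketched in a few lines. This is a simplified userspace model, not the kernel's implementation: INV == 0 is stated in the text, the values 2 and 3 match the -{2:2} and -{3:3} annotations in the example output, and the remaining values are inferred from the order in which the wait-types are listed. The names Lock and check_wait_context are used here for illustration only, and interrupt context is not modelled.

```python
from dataclasses import dataclass
from typing import List

# Wait types, ordered from most restrictive to most relaxed context.
LD_WAIT_INV, LD_WAIT_FREE, LD_WAIT_SPIN, LD_WAIT_CONFIG, LD_WAIT_SLEEP = range(5)


@dataclass(frozen=True)
class Lock:
    name: str
    inner: int                 # context presented to locks nested inside
    outer: int = LD_WAIT_INV   # context needed to acquire; INV means "same as inner"

    def effective_outer(self) -> int:
        return self.inner if self.outer == LD_WAIT_INV else self.outer


def check_wait_context(held: List[Lock], new: Lock) -> bool:
    """Return True if `new` may be acquired while `held` locks are held."""
    # Locks with inner == INV are unconverted and therefore ignored.
    inners = [lock.inner for lock in held if lock.inner != LD_WAIT_INV]
    # Assumption: with nothing held we are in a context that may sleep.
    current = min(inners, default=LD_WAIT_SLEEP)
    return new.effective_outer() <= current


foo = Lock("foo", inner=LD_WAIT_SPIN)                        # raw_spinlock_t
bar = Lock("bar", inner=LD_WAIT_CONFIG)                      # spinlock_t
rcu = Lock("rcu", inner=LD_WAIT_CONFIG, outer=LD_WAIT_FREE)  # RCU read side

print(check_wait_context([foo], bar))       # the invalid case from the splat
print(check_wait_context([bar], foo))       # raw_spinlock inside spinlock
print(check_wait_context([foo, rcu], bar))  # RCU does not relax the SPIN context
```

Taking the minimum over the held stack is what keeps the last case invalid: RCU presents a CONFIG context on its own, but the raw spinlock held underneath still limits what may be nested.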
Currently only the normal locks and RCU are converted; the rest of the
lockdep users default to .inner = INV, which is ignored. More conversions
can be done when desired.
The check for spinlock_t nesting is not enabled by default. It's a separate
config option for now as there are known problems which are currently being
addressed. The config option makes it possible to identify these problems and
to verify that the solutions found are indeed solving them.
The config switch will be removed and the checks will be permanently enabled
once the vast majority of issues have been addressed.
[ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
[ tglx: Add the config option ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.427089655@linutronix.de
2020-03-21 12:26:01 +01:00
|
|
|
config PROVE_RAW_LOCK_NESTING
|
|
|
|
bool "Enable raw_spinlock - spinlock nesting checks"
|
|
|
|
depends on PROVE_LOCKING
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
Enable the raw_spinlock vs. spinlock nesting checks which ensure
|
|
|
|
that the lock nesting rules for PREEMPT_RT enabled kernels are
|
|
|
|
not violated.
|
|
|
|
|
|
|
|
NOTE: There are known nesting problems. So if you enable this
|
|
|
|
option, expect lockdep splats until these problems have been fully
|
|
|
|
addressed, which is work in progress. This config switch helps to
|
|
|
|
identify and analyze these problems. It will be removed and the
|
2021-07-07 18:07:31 -07:00
|
|
|
check permanently enabled once the main issues have been fixed.
|
2020-03-21 12:26:01 +01:00
|
|
|
|
|
|
|
If unsure, select N.
|
|
|
|
|
2018-03-30 17:28:00 -04:00
|
|
|
config LOCK_STAT
|
|
|
|
bool "Lock usage statistics"
|
|
|
|
depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
|
|
|
|
select LOCKDEP
|
|
|
|
select DEBUG_SPINLOCK
|
2021-08-15 23:29:01 +02:00
|
|
|
select DEBUG_MUTEXES if !PREEMPT_RT
|
2018-03-30 17:28:00 -04:00
|
|
|
select DEBUG_RT_MUTEXES if RT_MUTEXES
|
|
|
|
select DEBUG_LOCK_ALLOC
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
This feature enables tracking lock contention points.
|
|
|
|
|
2019-04-10 08:32:41 -03:00
|
|
|
For more details, see Documentation/locking/lockstat.rst.
|
2018-03-30 17:28:00 -04:00
|
|
|
|
|
|
|
This also enables lock events required by "perf lock",
|
|
|
|
a subcommand of perf.
|
|
|
|
If you want to use "perf lock", you also need to turn on
|
|
|
|
CONFIG_EVENT_TRACING.
|
|
|
|
|
|
|
|
CONFIG_LOCK_STAT defines "contended" and "acquired" lock events.
|
|
|
|
(CONFIG_LOCKDEP defines "acquire" and "release" events.)
|
|
|
|
|
2006-06-27 02:54:55 -07:00
|
|
|
config DEBUG_RT_MUTEXES
|
|
|
|
bool "RT Mutex debugging, deadlock detection"
|
|
|
|
depends on DEBUG_KERNEL && RT_MUTEXES
|
|
|
|
help
|
|
|
|
This allows rt mutex semantics violations and rt mutex related
|
|
|
|
deadlocks (lockups) to be detected and reported automatically.
|
|
|
|
|
2005-04-16 15:20:36 -07:00
|
|
|
config DEBUG_SPINLOCK
|
2006-07-03 00:24:55 -07:00
|
|
|
bool "Spinlock and rw-lock debugging: basic checks"
|
2005-04-16 15:20:36 -07:00
|
|
|
depends on DEBUG_KERNEL
|
2012-03-22 15:25:08 +05:30
|
|
|
select UNINLINE_SPIN_UNLOCK
|
2005-04-16 15:20:36 -07:00
|
|
|
help
|
|
|
|
Say Y here and build SMP to catch missing spinlock initialization
|
|
|
|
and certain other kinds of spinlock errors commonly made. This is
|
|
|
|
best used in conjunction with the NMI watchdog so that spinlock
|
|
|
|
deadlocks are also debuggable.
|
|
|
|
|
2006-07-03 00:24:55 -07:00
|
|
|
config DEBUG_MUTEXES
|
|
|
|
bool "Mutex debugging: basic checks"
|
2021-08-15 23:29:01 +02:00
|
|
|
depends on DEBUG_KERNEL && !PREEMPT_RT
|
2006-07-03 00:24:55 -07:00
|
|
|
help
|
|
|
|
This feature allows mutex semantics violations to be detected and
|
|
|
|
reported.
|
|
|
|
|
2013-06-20 13:31:17 +02:00
|
|
|
config DEBUG_WW_MUTEX_SLOWPATH
|
|
|
|
bool "Wait/wound mutex debugging: Slowpath testing"
|
2018-03-30 17:27:59 -04:00
|
|
|
depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
|
2013-06-20 13:31:17 +02:00
|
|
|
select DEBUG_LOCK_ALLOC
|
|
|
|
select DEBUG_SPINLOCK
|
2021-08-15 23:29:01 +02:00
|
|
|
select DEBUG_MUTEXES if !PREEMPT_RT
|
|
|
|
select DEBUG_RT_MUTEXES if PREEMPT_RT
|
2013-06-20 13:31:17 +02:00
|
|
|
help
|
|
|
|
This feature enables slowpath testing for w/w mutex users by
|
|
|
|
injecting additional -EDEADLK wound/backoff cases. Together with
|
|
|
|
the full mutex checks enabled with (CONFIG_PROVE_LOCKING) this
|
|
|
|
will test all possible w/w mutex interface abuse with the
|
|
|
|
exception of simply not acquiring all the required locks.
|
2014-08-27 11:19:26 -04:00
|
|
|
Note that this feature can introduce significant overhead, so
|
|
|
|
it really should not be enabled in a production or distro kernel,
|
|
|
|
even a debug kernel. If you are a driver writer, enable it. If
|
|
|
|
you are a distro, do not.
|
2013-06-20 13:31:17 +02:00
|
|
|
|
2018-03-30 17:27:58 -04:00
|
|
|
config DEBUG_RWSEMS
|
|
|
|
bool "RW Semaphore debugging: basic checks"
|
2024-02-22 10:05:40 -05:00
|
|
|
depends on DEBUG_KERNEL && !PREEMPT_RT
|
2018-03-30 17:27:58 -04:00
|
|
|
help
|
2019-05-20 16:59:00 -04:00
|
|
|
This debugging feature allows mismatched rw semaphore locks
|
|
|
|
and unlocks to be detected and reported.
|
2018-03-30 17:27:58 -04:00
|
|
|
|
2006-07-03 00:24:55 -07:00
|
|
|
config DEBUG_LOCK_ALLOC
|
|
|
|
bool "Lock debugging: detect incorrect freeing of live locks"
|
2018-03-30 17:27:59 -04:00
|
|
|
depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
|
2006-07-03 00:24:55 -07:00
|
|
|
select DEBUG_SPINLOCK
|
2021-08-15 23:29:01 +02:00
|
|
|
select DEBUG_MUTEXES if !PREEMPT_RT
|
2016-09-19 12:15:37 +02:00
|
|
|
select DEBUG_RT_MUTEXES if RT_MUTEXES
|
2006-07-03 00:24:55 -07:00
|
|
|
select LOCKDEP
|
|
|
|
help
|
|
|
|
This feature will check whether any held lock (spinlock, rwlock,
|
|
|
|
mutex or rwsem) is incorrectly freed by the kernel, via any of the
|
|
|
|
memory-freeing routines (kfree(), kmem_cache_free(), free_pages(),
|
|
|
|
vfree(), etc.), whether a live lock is incorrectly reinitialized via
|
|
|
|
spin_lock_init()/mutex_init()/etc., or whether there is any lock
|
|
|
|
held during task exit.
|
|
|
|
|
|
|
|
config LOCKDEP
|
|
|
|
bool
|
2018-03-30 17:27:59 -04:00
|
|
|
depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
|
2006-07-03 00:24:55 -07:00
|
|
|
select STACKTRACE
|
|
|
|
select KALLSYMS
|
|
|
|
select KALLSYMS_ALL
|
|
|
|
|
2017-04-10 11:50:52 -04:00
|
|
|
config LOCKDEP_SMALL
|
|
|
|
bool
|
|
|
|
|
2021-04-05 20:33:57 +09:00
|
|
|
config LOCKDEP_BITS
|
|
|
|
int "Bitsize for MAX_LOCKDEP_ENTRIES"
|
|
|
|
depends on LOCKDEP && !LOCKDEP_SMALL
|
|
|
|
range 10 30
|
|
|
|
default 15
|
|
|
|
help
|
|
|
|
Try increasing this value if you hit the "BUG: MAX_LOCKDEP_ENTRIES too low!" message.
|
|
|
|
|
|
|
|
config LOCKDEP_CHAINS_BITS
|
|
|
|
int "Bitsize for MAX_LOCKDEP_CHAINS"
|
|
|
|
depends on LOCKDEP && !LOCKDEP_SMALL
|
2024-07-23 16:40:17 +00:00
|
|
|
range 10 21
|
2021-04-05 20:33:57 +09:00
|
|
|
default 16
|
|
|
|
help
|
|
|
|
Try increasing this value if you hit the "BUG: MAX_LOCKDEP_CHAINS too low!" message.
|
|
|
|
|
|
|
|
config LOCKDEP_STACK_TRACE_BITS
|
|
|
|
int "Bitsize for MAX_STACK_TRACE_ENTRIES"
|
|
|
|
depends on LOCKDEP && !LOCKDEP_SMALL
|
|
|
|
range 10 30
|
|
|
|
default 19
|
|
|
|
help
|
|
|
|
Try increasing this value if you hit the "BUG: MAX_STACK_TRACE_ENTRIES too low!" message.
|
|
|
|
|
|
|
|
config LOCKDEP_STACK_TRACE_HASH_BITS
|
|
|
|
int "Bitsize for STACK_TRACE_HASH_SIZE"
|
|
|
|
depends on LOCKDEP && !LOCKDEP_SMALL
|
|
|
|
range 10 30
|
|
|
|
default 14
|
|
|
|
help
|
2023-03-21 14:35:08 +08:00
|
|
|
Try increasing this value if you need large STACK_TRACE_HASH_SIZE.
|
2021-04-05 20:33:57 +09:00
|
|
|
|
|
|
|
config LOCKDEP_CIRCULAR_QUEUE_BITS
|
|
|
|
int "Bitsize for elements in circular_queue struct"
|
|
|
|
depends on LOCKDEP
|
|
|
|
range 10 30
|
|
|
|
default 12
|
|
|
|
help
|
|
|
|
Try increasing this value if you hit the "lockdep bfs error:-1" warning due to __cq_enqueue() failure.
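To get a feel for what these "Bitsize" options mean in practice, the sketch below converts a bit size into a number of table entries. It assumes each limit is two to the power of the configured value, which is what "Bitsize" suggests; the defaults and ranges are copied from the entries above, while the helper name is invented for this example.

```python
# option name -> (default, lowest allowed, highest allowed)
LOCKDEP_BIT_OPTIONS = {
    "LOCKDEP_BITS": (15, 10, 30),
    "LOCKDEP_CHAINS_BITS": (16, 10, 21),
    "LOCKDEP_STACK_TRACE_BITS": (19, 10, 30),
    "LOCKDEP_STACK_TRACE_HASH_BITS": (14, 10, 30),
    "LOCKDEP_CIRCULAR_QUEUE_BITS": (12, 10, 30),
}


def table_entries(option, bits=None):
    """Return the number of entries implied by a bit size (assumed 2**bits)."""
    default, low, high = LOCKDEP_BIT_OPTIONS[option]
    bits = default if bits is None else bits
    if not low <= bits <= high:
        raise ValueError(f"{option}={bits} is outside the range {low}..{high}")
    return 1 << bits


for name in LOCKDEP_BIT_OPTIONS:
    print(f"CONFIG_{name}: {table_entries(name)} entries by default")
```

Under that assumption, raising an option by one doubles the corresponding table, so a small increase is usually the first thing to try when one of the messages above appears.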
|
|
|
|
|
2006-07-03 00:24:55 -07:00
|
|
|
config DEBUG_LOCKDEP
|
|
|
|
bool "Lock dependency engine debugging"
|
2006-07-14 00:24:32 -07:00
|
|
|
depends on DEBUG_KERNEL && LOCKDEP
|
2021-01-11 15:37:07 +00:00
|
|
|
select DEBUG_IRQFLAGS
|
2006-07-03 00:24:55 -07:00
|
|
|
help
|
|
|
|
If you say Y here, the lock dependency engine will do
|
|
|
|
additional runtime checks to debug itself, at the price
|
|
|
|
of more runtime overhead.
|
|
|
|
|
2011-06-08 19:31:56 +02:00
|
|
|
config DEBUG_ATOMIC_SLEEP
|
|
|
|
bool "Sleep inside atomic section checking"
|
2011-06-08 01:51:02 +02:00
|
|
|
select PREEMPT_COUNT
|
2005-04-16 15:20:36 -07:00
|
|
|
depends on DEBUG_KERNEL
|
2018-07-31 13:39:32 +02:00
|
|
|
depends on !ARCH_NO_PREEMPT
|
2005-04-16 15:20:36 -07:00
|
|
|
help
|
|
|
|
If you say Y here, various routines which may sleep will become very
|
2011-06-08 19:31:56 +02:00
|
|
|
noisy if they are called inside atomic sections: when a spinlock is
|
|
|
|
held, inside an rcu read side critical section, inside preempt disabled
|
|
|
|
sections, inside an interrupt, etc...
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2006-07-03 00:24:48 -07:00
|
|
|
config DEBUG_LOCKING_API_SELFTESTS
|
|
|
|
bool "Locking API boot-time self-tests"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
Say Y here if you want the kernel to run a short self-test during
|
|
|
|
bootup. The self-test checks whether common types of locking bugs
|
|
|
|
are detected by debugging mechanisms or not. (if you disable
|
2021-07-07 18:07:31 -07:00
|
|
|
lock debugging then those bugs won't be detected of course.)
|
2006-07-03 00:24:48 -07:00
|
|
|
The following locking APIs are covered: spinlocks, rwlocks,
|
|
|
|
mutexes and rwsems.
|
|
|
|
|
2014-02-04 15:51:41 -08:00
|
|
|
config LOCK_TORTURE_TEST
|
|
|
|
tristate "torture tests for locking"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
select TORTURE_TEST
|
|
|
|
help
|
|
|
|
This option provides a kernel module that runs torture tests
|
|
|
|
on kernel locking primitives. The kernel module may be built
|
|
|
|
after the fact on the running kernel to be tested, if desired.
|
|
|
|
|
|
|
|
Say Y here if you want kernel locking-primitive torture tests
|
|
|
|
to be built into the kernel.
|
|
|
|
Say M if you want these torture tests to build as a module.
|
|
|
|
Say N if you are unsure.
|
|
|
|
|
2016-12-01 11:47:06 +00:00
|
|
|
config WW_MUTEX_SELFTEST
|
|
|
|
tristate "Wait/wound mutex selftests"
|
|
|
|
help
|
|
|
|
This option provides a kernel module that runs tests on the
|
|
|
|
struct ww_mutex locking API.
|
|
|
|
|
|
|
|
It is recommended to enable DEBUG_WW_MUTEX_SLOWPATH in conjunction
|
|
|
|
with this test harness.
|
|
|
|
|
|
|
|
Say M if you want these self tests to build as a module.
|
|
|
|
Say N if you are unsure.
|
|
|
|
|
2020-06-24 15:59:59 -07:00
|
|
|
config SCF_TORTURE_TEST
|
|
|
|
tristate "torture tests for smp_call_function*()"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
select TORTURE_TEST
|
|
|
|
help
|
|
|
|
This option provides a kernel module that runs torture tests
|
|
|
|
on the smp_call_function() family of primitives. The kernel
|
|
|
|
module may be built after the fact on the running kernel to
|
|
|
|
be tested, if desired.
|
|
|
|
|
2020-06-30 13:22:54 -07:00
|
|
|
config CSD_LOCK_WAIT_DEBUG
|
|
|
|
bool "Debugging for csd_lock_wait(), called from smp_call_function*()"
|
|
|
|
depends on DEBUG_KERNEL
|
2024-07-01 13:33:58 -07:00
|
|
|
depends on SMP
|
2020-06-30 13:22:54 -07:00
|
|
|
depends on 64BIT
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
This option enables debug prints when CPUs are slow to respond
|
|
|
|
to the smp_call_function*() IPI wrappers. These debug prints
|
|
|
|
include the IPI handler function currently executing (if any)
|
|
|
|
and relevant stack traces.
|
|
|
|
|
2023-03-20 17:55:13 -07:00
|
|
|
config CSD_LOCK_WAIT_DEBUG_DEFAULT
|
|
|
|
bool "Default csd_lock_wait() debugging on at boot time"
|
|
|
|
depends on CSD_LOCK_WAIT_DEBUG
|
|
|
|
depends on 64BIT
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
This option causes the csdlock_debug= kernel boot parameter to
|
|
|
|
default to 1 (basic debugging) instead of 0 (no debugging).
|
|
|
|
|
2013-07-01 13:04:47 -07:00
|
|
|
endmenu # lock debugging
|
2006-07-03 00:24:38 -07:00
|
|
|
|
2013-07-01 13:04:47 -07:00
|
|
|
config TRACE_IRQFLAGS
|
2020-07-27 14:48:52 +02:00
|
|
|
depends on TRACE_IRQFLAGS_SUPPORT
|
2013-07-01 13:04:47 -07:00
|
|
|
bool
|
2011-05-24 17:13:36 -07:00
|
|
|
help
|
2013-07-01 13:04:47 -07:00
|
|
|
Enables hooks to interrupt enabling and disabling for
|
|
|
|
either tracing or lock debugging.
|
2011-05-24 17:13:36 -07:00
|
|
|
|
2020-07-27 14:48:52 +02:00
|
|
|
config TRACE_IRQFLAGS_NMI
|
|
|
|
def_bool y
|
|
|
|
depends on TRACE_IRQFLAGS
|
|
|
|
depends on TRACE_IRQFLAGS_NMI_SUPPORT
|
|
|
|
|
2022-12-16 15:57:51 -08:00
|
|
|
config NMI_CHECK_CPU
|
|
|
|
bool "Debugging for CPUs failing to respond to backtrace requests"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
depends on X86
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
Enables debug prints when a CPU fails to respond to a given
|
|
|
|
backtrace NMI. These prints provide some reasons why a CPU
|
|
|
|
might legitimately be failing to respond, for example, if it
|
|
|
|
is offline or if ignore_nmis is set.
|
|
|
|
|
2021-01-11 15:37:07 +00:00
|
|
|
config DEBUG_IRQFLAGS
|
|
|
|
bool "Debug IRQ flag manipulation"
|
|
|
|
help
|
|
|
|
Enables checks for potentially unsafe enabling or disabling of
|
|
|
|
interrupts, such as calling raw_local_irq_restore() when interrupts
|
|
|
|
are enabled.
|
|
|
|
|
2006-07-03 00:24:38 -07:00
|
|
|
config STACKTRACE
|
2014-08-29 15:18:35 -07:00
|
|
|
bool "Stack backtrace support"
|
2006-07-03 00:24:38 -07:00
|
|
|
depends on STACKTRACE_SUPPORT
|
2014-08-29 15:18:35 -07:00
|
|
|
help
|
|
|
|
This option causes the kernel to create a /proc/pid/stack for
|
|
|
|
every process, showing its current stack trace.
|
|
|
|
It is also used by various kernel debugging features that require
|
|
|
|
stack trace generation.
|
2011-05-24 17:13:36 -07:00
|
|
|
|
2017-06-08 04:16:59 -04:00
|
|
|
config WARN_ALL_UNSEEDED_RANDOM
|
|
|
|
bool "Warn for all uses of unseeded randomness"
|
|
|
|
default n
|
random: warn when kernel uses unseeded randomness
This enables an important dmesg notification about when drivers have
used the crng without it being seeded first. Prior, these errors would
occur silently, and so there hasn't been a great way of diagnosing these
types of bugs for obscure setups. By adding this as a config option, we
can leave it on by default, so that we learn where these issues happen,
in the field, while still allowing some people to turn it off, if they
really know what they're doing and do not want the log entries.
However, we don't leave it _completely_ by default. An earlier version
of this patch simply had `default y`. I'd really love that, but it turns
out, this problem with unseeded randomness being used is really quite
present and is going to take a long time to fix. Thus, as a compromise
between log-messages-for-all and nobody-knows, this is `default y`,
except it is also `depends on DEBUG_KERNEL`. This will ensure that the
curious see the messages while others don't have to.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-07 23:06:55 -04:00
|
|
|
help
|
|
|
|
Some parts of the kernel contain bugs relating to their use of
|
|
|
|
cryptographically secure random numbers before it's actually possible
|
|
|
|
to generate those numbers securely. This setting ensures that these
|
|
|
|
flaws don't go unnoticed, by enabling a message, should this ever
|
|
|
|
occur. This will allow people with obscure setups to know when things
|
|
|
|
are going wrong, so that they might contact developers about fixing
|
|
|
|
it.
|
|
|
|
|
2017-06-08 04:16:59 -04:00
|
|
|
Unfortunately, on some models of some architectures getting
|
|
|
|
a fully seeded CRNG is extremely difficult, and so this can
|
|
|
|
result in dmesg getting spammed for a surprisingly long
|
|
|
|
time. This is really bad from a security perspective, and
|
|
|
|
so architecture maintainers really need to do what they can
|
|
|
|
to get the CRNG seeded sooner after the system is booted.
|
2018-09-04 15:46:23 -07:00
|
|
|
However, since users cannot do anything actionable to
|
random: remove ratelimiting for in-kernel unseeded randomness
The CONFIG_WARN_ALL_UNSEEDED_RANDOM debug option controls whether the
kernel warns about all unseeded randomness or just the first instance.
There's some complicated rate limiting and comparison to the previous
caller, such that even with CONFIG_WARN_ALL_UNSEEDED_RANDOM enabled,
developers still don't see all the messages or even an accurate count of
how many were missed. This is the result of basically parallel
mechanisms aimed at accomplishing more or less the same thing, added at
different points in random.c history, which sort of compete with the
first-instance-only limiting we have now.
It turns out, however, that nobody cares about the first unseeded
randomness instance of in-kernel users. The same first user has been
there for ages now, and nobody is doing anything about it. It isn't even
clear that anybody _can_ do anything about it. Most places that can do
something about it have switched over to using get_random_bytes_wait()
or wait_for_random_bytes(), which is the right thing to do, but there is
still much code that needs randomness sometimes during init, and as a
general rule, if you're not using one of the _wait functions or the
readiness notifier callback, you're bound to be doing it wrong just
based on that fact alone.
So warning about this same first user that can't easily change is simply
not an effective mechanism for anything at all. Users can't do anything
about it, as the Kconfig text points out -- the problem isn't in
userspace code -- and kernel developers don't or more often can't react
to it.
Instead, show the warning for all instances when CONFIG_WARN_ALL_UNSEEDED_RANDOM
is set, so that developers can debug things if need be, or if it isn't set,
don't show a warning at all.
At the same time, CONFIG_WARN_ALL_UNSEEDED_RANDOM now implies setting
random.ratelimit_disable=1 on by default, since if you care about one
you probably care about the other too. And we can clean up usage around
the related urandom_warning ratelimiter as well (whose behavior isn't
changing), so that it properly counts missed messages after the 10
message threshold is reached.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-09 16:13:18 +02:00
|
|
|
address this, by default this option is disabled.
|
2017-06-08 04:16:59 -04:00
|
|
|
|
|
|
|
Say Y here if you want to receive warnings for all uses of
|
|
|
|
unseeded randomness. This will be of use primarily for
|
2018-09-04 15:46:23 -07:00
|
|
|
those developers interested in improving the security of
|
2017-06-08 04:16:59 -04:00
|
|
|
Linux kernels running on their architecture (or
|
|
|
|
subarchitecture).
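# Illustrative sketch, not part of the upstream file: a .config fragment
# for a developer build that wants a warning for every use of unseeded
# randomness. It assumes only the dependency named in the commit
# messages shown in this view (DEBUG_KERNEL):
#
#   CONFIG_DEBUG_KERNEL=y
#   CONFIG_WARN_ALL_UNSEEDED_RANDOM=y
#
# According to the 2022 commit message, enabling the option also implies
# random.ratelimit_disable=1, so expect a noisy dmesg on affected systems.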
|
2017-06-07 23:06:55 -04:00
|
|
|
|
2005-04-16 15:20:36 -07:00
|
|
|
config DEBUG_KOBJECT
|
|
|
|
bool "kobject debugging"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
If you say Y here, some extra kobject debugging messages will be sent
|
2018-10-30 15:07:44 -07:00
|
|
|
to the syslog.
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2013-06-27 15:06:14 +01:00
|
|
|
config DEBUG_KOBJECT_RELEASE
|
|
|
|
bool "kobject release debugging"
|
2013-10-29 08:33:36 -07:00
|
|
|
depends on DEBUG_OBJECTS_TIMERS
|
2013-06-27 15:06:14 +01:00
|
|
|
help
|
|
|
|
kobjects are reference counted objects. This means that their
|
|
|
|
last reference count put is not predictable, and the kobject can
|
2022-07-14 18:59:59 -07:00
|
|
|
live on past the point at which a driver decides to drop its
|
2013-06-27 15:06:14 +01:00
|
|
|
initial reference to the kobject gained on allocation. An
|
|
|
|
example of this would be a struct device which has just been
|
|
|
|
unregistered.
|
|
|
|
|
|
|
|
However, some buggy drivers assume that after such an operation,
|
|
|
|
the memory backing the kobject can be immediately freed. This
|
|
|
|
goes completely against the principles of a refcounted object.
|
|
|
|
|
|
|
|
If you say Y here, the kernel will delay the release of kobjects
|
|
|
|
on the last reference count to improve the visibility of this
|
|
|
|
kind of kobject release bug.
|
|
|
|
|
2012-10-08 16:28:13 -07:00
|
|
|
config HAVE_DEBUG_BUGVERBOSE
|
|
|
|
bool
|
|
|
|
|
2019-12-06 17:03:48 -08:00
|
|
|
menu "Debug kernel data structures"
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2006-09-29 01:59:00 -07:00
|
|
|
config DEBUG_LIST
|
|
|
|
bool "Debug linked list manipulation"
|
2023-08-11 17:18:41 +02:00
|
|
|
depends on DEBUG_KERNEL
|
list: Introduce CONFIG_LIST_HARDENED
Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The motivation behind this is that the option can be used
as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025
are mitigated by the option [4]).
The feature has never been designed with performance in mind, yet common
list manipulation is happening across hot paths all over the kernel.
Introduce CONFIG_LIST_HARDENED, which performs list pointer checking
inline, and only upon list corruption calls the reporting slow path.
To generate optimal machine code with CONFIG_LIST_HARDENED:
1. Elide checking for pointer values which upon dereference would
result in an immediate access fault (i.e. minimal hardening
checks). The trade-off is lower-quality error reports.
2. Use the __preserve_most function attribute (available with Clang,
but not yet with GCC) to minimize the code footprint for calling
the reporting slow path. As a result, function size of callers is
reduced by avoiding saving registers before calling the rarely
called reporting slow path.
Note that all TUs in lib/Makefile already disable function tracing,
including list_debug.c, and __preserve_most's implied notrace has
no effect in this case.
3. Because the inline checks are a subset of the full set of checks in
__list_*_valid_or_report(), always return false if the inline
checks failed. This avoids redundant compare and conditional
branch right after return from the slow path.
As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.
Since DEBUG_LIST is functionally a superset of LIST_HARDENED, the
Kconfig variables are changed to reflect that: DEBUG_LIST selects
LIST_HARDENED, whereas LIST_HARDENED itself has no dependency on
DEBUG_LIST.
Running netperf with CONFIG_LIST_HARDENED (using a Clang compiler with
"preserve_most") shows throughput improvements, in my case of ~7% on
average (up to 20-30% on some test cases).
Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4]
Signed-off-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20230811151847.1594958-3-elver@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-08-11 17:18:40 +02:00
|
|
|
select LIST_HARDENED
|
2006-09-29 01:59:00 -07:00
|
|
|
help
|
2023-08-11 17:18:40 +02:00
|
|
|
Enable this to turn on extended checks in the linked-list walking
|
|
|
|
routines.
|
|
|
|
|
|
|
|
This option trades better quality error reports for performance, and
|
|
|
|
is more suitable for kernel debugging. If you care about performance,
|
|
|
|
you should only enable CONFIG_LIST_HARDENED instead.
|
2006-09-29 01:59:00 -07:00
|
|
|
|
|
|
|
If unsure, say N.
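# Illustrative sketch, not part of the upstream file: two alternative
# .config fragments for the trade-off described in the help text above.
#
# Debugging build (full checks and better error reports; LIST_HARDENED
# is pulled in by the "select" on DEBUG_LIST):
#
#   CONFIG_DEBUG_KERNEL=y
#   CONFIG_DEBUG_LIST=y
#
# Performance-sensitive build (inline checks only):
#
#   # CONFIG_DEBUG_LIST is not set
#   CONFIG_LIST_HARDENED=y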
|
|
|
|
|
2019-05-14 15:42:46 -07:00
|
|
|
config DEBUG_PLIST
|
2014-06-04 16:11:54 -07:00
|
|
|
bool "Debug priority linked list manipulation"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
Enable this to turn on extended checks in the priority-ordered
|
|
|
|
linked-list (plist) walking routines. This checks the entire
|
|
|
|
list multiple times during each manipulation.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2007-10-22 20:01:06 +02:00
|
|
|
config DEBUG_SG
|
|
|
|
bool "Debug SG table operations"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
Enable this to turn on checks on scatter-gather tables. This can
|
|
|
|
help find problems with drivers that do not properly initialize
|
|
|
|
their sg tables.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2008-08-15 15:29:38 -07:00
|
|
|
config DEBUG_NOTIFIERS
|
|
|
|
bool "Debug notifier call chains"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
Enable this to turn on sanity checking for notifier call chains.
|
|
|
|
This is most useful for kernel developers to make sure that
|
|
|
|
modules properly unregister themselves from notifier chains.
|
|
|
|
This is a relatively cheap check but if you care about maximum
|
|
|
|
performance, say N.
|
|
|
|
|
2017-03-17 16:35:23 -08:00
|
|
|
config DEBUG_CLOSURES
|
|
|
|
bool "Debug closures (bcache async widgets)"
|
|
|
|
depends on CLOSURES
|
|
|
|
select DEBUG_FS
|
|
|
|
help
|
|
|
|
Keeps all active closures in a linked list and provides a debugfs
|
|
|
|
interface to list them, which makes it possible to see asynchronous
|
|
|
|
operations that get stuck.
|
|
|
|
|
Maple Tree: add new data structure
Patch series "Introducing the Maple Tree"
The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently. There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface. If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.
The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes. With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses. The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.
The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The
long term goal is to reduce or remove the mmap_lock contention.
The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers. A single write operation will be
allowed at a time. A reader re-walks if stale data is encountered. VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.
Davidlohr said
: Yes I like the maple tree, and at this stage I don't think we can ask for
: more from this series wrt the MM - albeit there seems to still be some
: folks reporting breakage. Fundamentally I see Liam's work to (re)move
: complexity out of the MM (not to say that the actual maple tree is not
: complex) by consolidating the three complimentary data structures very
: much worth it considering performance does not take a hit. This was very
: much a turn off with the range locking approach, which worst case scenario
: incurred in prohibitive overhead. Also as Liam and Matthew have
: mentioned, RCU opens up a lot of nice performance opportunities, and in
: addition academia[1] has shown outstanding scalability of address spaces
: with the foundation of replacing the locked rbtree with RCU aware trees.
A similar work has been discovered in the academic press
https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf
Sheer coincidence. We designed our tree with the intention of solving the
hardest problem first. Upon settling on a b-tree variant and a rough
outline, we researched ranged based b-trees and RCU b-trees and did find
that article. So it was nice to find reassurances that we were on the
right path, but our design choice of using ranges made that paper unusable
for us.
This patch (of 70):
The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently. There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface. If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.
The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes. With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses. The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.
The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The
long term goal is to reduce or remove the mmap_lock contention.
The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers. A single write operation will be
allowed at a time. A reader re-walks if stale data is encountered. VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.
There are additional BUG_ON() calls added within the tree, most of which
are in debug code. These will be replaced with a WARN_ON() call in the
future. There are also additional BUG_ON() calls within the code which
will also be reduced in number at a later date. These exist to catch
things such as out-of-range accesses which would crash anyways.
Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-06 19:48:39 +00:00
|
|
|
config DEBUG_MAPLE_TREE
|
|
|
|
bool "Debug maple trees"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
Enable maple tree debugging information and extra validations.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2019-12-06 17:03:48 -08:00
|
|
|
endmenu
|
|
|
|
|
2017-05-17 09:19:44 -07:00
|
|
|
source "kernel/rcu/Kconfig.debug"
|
2013-01-07 08:19:23 -08:00
|
|
|
|
2016-02-09 17:59:38 -05:00
|
|
|
config DEBUG_WQ_FORCE_RR_CPU
|
|
|
|
bool "Force round-robin CPU selection for unbound work items"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
Workqueue used to implicitly guarantee that work items queued
|
|
|
|
without explicit CPU specified are put on the local CPU. This
|
|
|
|
guarantee is no longer true and while local CPU is still
|
|
|
|
preferred, work items may be put on foreign CPUs. Kernel
|
|
|
|
parameter "workqueue.debug_force_rr_cpu" is added to force
|
|
|
|
round-robin CPU selection to flush out usages which depend on the
|
|
|
|
now broken guarantee. This config option enables the debug
|
|
|
|
feature by default. When enabled, memory and cache locality will
|
|
|
|
be impacted.
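# Illustrative sketch, not part of the upstream file: the help text says
# this option only turns the debug feature on by default, which suggests
# the parameter it names can also be given for a single boot on the
# kernel command line. The "=1" spelling is an assumption; check
# Documentation/admin-guide/kernel-parameters.rst for the accepted form:
#
#   workqueue.debug_force_rr_cpu=1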
|
|
|
|
|
2016-02-26 18:43:32 +00:00
|
|
|
config CPU_HOTPLUG_STATE_CONTROL
|
|
|
|
bool "Enable CPU hotplug state control"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
depends on HOTPLUG_CPU
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
Allows writing steps between "offline" and "online" to the CPU's
|
|
|
|
sysfs target file so that states can be stepped through granularly. This is a debug
|
|
|
|
option for now as the hotplug machinery cannot be stopped and
|
|
|
|
restarted at arbitrary points yet.
|
|
|
|
|
|
|
|
Say N if you are unsure.
|
|
|
|
|
2019-12-06 17:03:51 -08:00
|
|
|
config LATENCYTOP
|
|
|
|
bool "Latency measuring infrastructure"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
depends on STACKTRACE_SUPPORT
|
|
|
|
depends on PROC_FS
|
2021-04-09 13:27:47 -07:00
|
|
|
depends on FRAME_POINTER || MIPS || PPC || S390 || MICROBLAZE || ARM || ARC || X86
|
2019-12-06 17:03:51 -08:00
|
|
|
select KALLSYMS
|
|
|
|
select KALLSYMS_ALL
|
|
|
|
select STACKTRACE
|
|
|
|
select SCHEDSTATS
|
|
|
|
help
|
|
|
|
Enable this option if you want to use the LatencyTOP tool
|
|
|
|
to find out which userspace is blocking on what kernel operations.
|
|
|
|
|
2022-10-28 10:45:44 -10:00
|
|
|
config DEBUG_CGROUP_REF
|
|
|
|
bool "Disable inlining of cgroup css reference count functions"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
depends on CGROUPS
|
|
|
|
depends on KPROBES
|
|
|
|
default n
|
|
|
|
help
|
|
|
|
Force cgroup css reference count functions to not be inlined so
|
|
|
|
that they can be kprobed for debugging.
|
|
|
|
|
2019-12-06 17:03:51 -08:00
|
|
|
source "kernel/trace/Kconfig"
|
|
|
|
|
|
|
|
config PROVIDE_OHCI1394_DMA_INIT
|
|
|
|
bool "Remote debugging over FireWire early on boot"
|
|
|
|
depends on PCI && X86
|
|
|
|
help
|
|
|
|
If you want to debug problems which hang or crash the kernel early
|
|
|
|
on boot and the crashing machine has a FireWire port, you can use
|
|
|
|
this feature to remotely access the memory of the crashed machine
|
|
|
|
over FireWire. This employs remote DMA as part of the OHCI1394
|
|
|
|
specification which is now the standard for FireWire controllers.
|
|
|
|
|
|
|
|
With remote DMA, you can monitor the printk buffer remotely using
|
|
|
|
firescope and access all memory below 4GB using fireproxy from gdb.
|
|
|
|
Even controlling a kernel debugger is possible using remote DMA.
|
|
|
|
|
|
|
|
Usage:
|
|
|
|
|
|
|
|
If ohci1394_dma=early is used as boot parameter, it will initialize
|
|
|
|
all OHCI1394 controllers which are found in the PCI config space.
|
|
|
|
|
|
|
|
As all changes to the FireWire bus such as enabling and disabling
|
|
|
|
devices cause a bus reset and thereby disable remote DMA for all
|
|
|
|
devices, be sure to have the cable plugged and FireWire enabled on
|
|
|
|
the debugging host before booting the debug target for debugging.
|
|
|
|
|
|
|
|
This code (~1k) is freed after boot. By then, the firewire stack
|
|
|
|
in charge of the OHCI-1394 controllers should be used instead.
|
|
|
|
|
2020-05-01 17:37:50 +02:00
|
|
|
See Documentation/core-api/debugging-via-ohci1394.rst for more information.
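# Illustrative sketch, not part of the upstream file: the two settings
# that the Usage section above combines. The build option comes from
# this entry (it needs PCI && X86); the boot parameter is the one quoted
# in the help text:
#
#   CONFIG_PROVIDE_OHCI1394_DMA_INIT=y    (in .config)
#   ohci1394_dma=early                    (on the kernel command line)
#
# As the help text notes, plug in the cable and enable FireWire on the
# debugging host before booting the target.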
|
2019-12-06 17:03:51 -08:00
|
|
|
|
2019-12-17 20:51:56 -08:00
|
|
|
source "samples/Kconfig"
|
|
|
|
|
|
|
|
config ARCH_HAS_DEVMEM_IS_ALLOWED
|
|
|
|
bool
|
|
|
|
|
|
|
|
config STRICT_DEVMEM
|
|
|
|
bool "Filter access to /dev/mem"
|
|
|
|
depends on MMU && DEVMEM
|
2020-07-09 11:43:21 -07:00
|
|
|
depends on ARCH_HAS_DEVMEM_IS_ALLOWED || GENERIC_LIB_DEVMEM_IS_ALLOWED
|
2019-12-17 20:51:56 -08:00
|
|
|
default y if PPC || X86 || ARM64
|
|
|
|
help
|
|
|
|
If this option is disabled, you allow userspace (root) access to all
|
|
|
|
of memory, including kernel and userspace memory. Accidental
|
|
|
|
access to this is obviously disastrous, but specific access can
|
|
|
|
be used by people debugging the kernel. Note that with PAT support
|
|
|
|
enabled, even in this case there are restrictions on /dev/mem
|
|
|
|
use due to the cache aliasing requirements.
|
|
|
|
|
|
|
|
If this option is switched on, and IO_STRICT_DEVMEM=n, the /dev/mem
|
|
|
|
file only allows userspace access to PCI space and the BIOS code and
|
|
|
|
data regions. This is sufficient for dosemu and X and all common
|
|
|
|
users of /dev/mem.
|
|
|
|
|
|
|
|
If in doubt, say Y.
|
|
|
|
|
|
|
|
config IO_STRICT_DEVMEM
|
|
|
|
bool "Filter I/O access to /dev/mem"
|
|
|
|
depends on STRICT_DEVMEM
|
|
|
|
help
|
|
|
|
If this option is disabled, you allow userspace (root) access to all
|
|
|
|
io-memory regardless of whether a driver is actively using that
|
|
|
|
range. Accidental access to this is obviously disastrous, but
|
|
|
|
specific access can be used by people debugging kernel drivers.
|
|
|
|
|
|
|
|
If this option is switched on, the /dev/mem file only allows
|
|
|
|
userspace access to *idle* io-memory ranges (see /proc/iomem). This
|
|
|
|
may break traditional users of /dev/mem (dosemu, legacy X, etc...)
|
|
|
|
if the driver using a given range cannot be disabled.
|
|
|
|
|
|
|
|
If in doubt, say Y.
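# Illustrative sketch, not part of the upstream file: a .config fragment
# that enables both /dev/mem filters, following the "depends on" lines
# of the two entries above. It assumes an architecture with an MMU that
# provides ARCH_HAS_DEVMEM_IS_ALLOWED or GENERIC_LIB_DEVMEM_IS_ALLOWED:
#
#   CONFIG_DEVMEM=y
#   CONFIG_STRICT_DEVMEM=y
#   CONFIG_IO_STRICT_DEVMEM=y
#
# Leaving IO_STRICT_DEVMEM unset keeps the PCI and BIOS-region access
# that the STRICT_DEVMEM help text describes for dosemu and X.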
|
|
|
|
|
|
|
|
menu "$(SRCARCH) Debugging"
|
|
|
|
|
|
|
|
source "arch/$(SRCARCH)/Kconfig.debug"
|
|
|
|
|
|
|
|
endmenu
|
|
|
|
|
|
|
|
menu "Kernel Testing and Coverage"
|
|
|
|
|
2019-12-06 17:03:51 -08:00
|
|
|
source "lib/kunit/Kconfig"
|
|
|
|
|
2012-07-30 14:43:02 -07:00
|
|
|
config NOTIFIER_ERROR_INJECTION
|
|
|
|
tristate "Notifier error injection"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
select DEBUG_FS
|
|
|
|
help
|
2012-11-30 16:44:39 +09:00
|
|
|
This option provides the ability to inject artificial errors to
|
2012-07-30 14:43:02 -07:00
|
|
|
specified notifier chain callbacks. It is useful to test the error
|
|
|
|
handling of notifier call chain failures.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2012-07-30 14:43:07 -07:00
|
|
|
config PM_NOTIFIER_ERROR_INJECT
|
|
|
|
tristate "PM notifier error injection module"
|
|
|
|
depends on PM && NOTIFIER_ERROR_INJECTION
|
|
|
|
default m if PM_DEBUG
|
|
|
|
help
|
2012-11-30 16:44:39 +09:00
|
|
|
This option provides the ability to inject artificial errors to
|
2012-07-30 14:43:07 -07:00
|
|
|
PM notifier chain callbacks. It is controlled through debugfs
|
|
|
|
interface /sys/kernel/debug/notifier-error-inject/pm
|
|
|
|
|
|
|
|
To make the notifier call chain fail for an event when it is
|
|
|
|
notified, write the error code to "actions/<notifier event>/error".
|
|
|
|
|
|
|
|
Example: Inject PM suspend error (-12 = -ENOMEM)
|
|
|
|
|
|
|
|
# cd /sys/kernel/debug/notifier-error-inject/pm/
|
|
|
|
# echo -12 > actions/PM_SUSPEND_PREPARE/error
|
|
|
|
# echo mem > /sys/power/state
|
|
|
|
bash: echo: write error: Cannot allocate memory
|
|
|
|
|
|
|
|
To compile this code as a module, choose M here: the module will
|
|
|
|
be called pm-notifier-error-inject.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2012-12-14 10:32:52 +11:00
|
|
|
config OF_RECONFIG_NOTIFIER_ERROR_INJECT
|
|
|
|
tristate "OF reconfig notifier error injection module"
|
|
|
|
depends on OF_DYNAMIC && NOTIFIER_ERROR_INJECTION
|
2012-07-30 14:43:13 -07:00
|
|
|
help
|
2012-11-30 16:44:39 +09:00
|
|
|
This option provides the ability to inject artificial errors to
|
2012-12-14 10:32:52 +11:00
|
|
|
OF reconfig notifier chain callbacks. It is controlled
|
2012-07-30 14:43:13 -07:00
|
|
|
through debugfs interface under
|
2012-12-14 10:32:52 +11:00
|
|
|
/sys/kernel/debug/notifier-error-inject/OF-reconfig/
|
2012-07-30 14:43:13 -07:00
|
|
|
|
|
|
|
To make the notifier call chain fail for an event when it is
|
|
|
|
notified, write the error code to "actions/<notifier event>/error".
|
|
|
|
|
|
|
|
To compile this code as a module, choose M here: the module will
|
2013-04-30 15:28:49 -07:00
|
|
|
be called of-reconfig-notifier-error-inject.
|
2012-07-30 14:43:13 -07:00
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2015-11-28 13:45:28 +01:00
|
|
|
config NETDEV_NOTIFIER_ERROR_INJECT
|
|
|
|
tristate "Netdev notifier error injection module"
|
|
|
|
depends on NET && NOTIFIER_ERROR_INJECTION
|
|
|
|
help
|
|
|
|
This option provides the ability to inject artificial errors to
|
|
|
|
netdevice notifier chain callbacks. It is controlled through debugfs
|
|
|
|
interface /sys/kernel/debug/notifier-error-inject/netdev
|
|
|
|
|
|
|
|
To make the notifier call chain fail for an event when it is
|
|
|
|
notified, write the error code to "actions/<notifier event>/error".
|
|
|
|
|
|
|
|
Example: Inject netdevice mtu change error (-22 = -EINVAL)
|
|
|
|
|
|
|
|
# cd /sys/kernel/debug/notifier-error-inject/netdev
|
|
|
|
# echo -22 > actions/NETDEV_CHANGEMTU/error
|
|
|
|
# ip link set eth0 mtu 1024
|
|
|
|
RTNETLINK answers: Invalid argument
|
|
|
|
|
|
|
|
To compile this code as a module, choose M here: the module will
|
|
|
|
be called netdev-notifier-error-inject.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2018-06-14 15:27:48 -07:00
|
|
|
config FUNCTION_ERROR_INJECTION
|
2022-11-21 10:44:03 -05:00
|
|
|
bool "Fault-injections of functions"
|
2018-06-14 15:27:48 -07:00
|
|
|
depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
|
2022-11-21 10:44:03 -05:00
|
|
|
help
|
|
|
|
Add fault injections into various functions that are annotated with
|
|
|
|
ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
|
2023-01-24 10:16:55 -08:00
|
|
|
value of these functions. This is useful to test error paths of code.
|
2022-11-21 10:44:03 -05:00
|
|
|
|
|
|
|
If unsure, say N.
|
2018-06-14 15:27:48 -07:00
|
|
|
|
2006-12-08 02:39:43 -08:00
|
|
|
config FAULT_INJECTION
|
2006-12-08 02:39:49 -08:00
|
|
|
bool "Fault-injection framework"
|
|
|
|
depends on DEBUG_KERNEL
|
2006-12-08 02:39:48 -08:00
|
|
|
help
|
|
|
|
Provide fault-injection framework.
|
|
|
|
For more details, see Documentation/fault-injection/.
|
2006-12-08 02:39:43 -08:00
|
|
|
|
2006-12-08 02:39:44 -08:00
|
|
|
config FAILSLAB
|
2006-12-08 02:39:49 -08:00
|
|
|
bool "Fault-injection capability for kmalloc"
|
|
|
|
depends on FAULT_INJECTION
|
2006-12-08 02:39:44 -08:00
|
|
|
help
|
2006-12-08 02:39:49 -08:00
|
|
|
Provide fault-injection capability for kmalloc.
|
2006-12-08 02:39:44 -08:00
|
|
|
|
2006-12-08 02:39:45 -08:00
|
|
|
config FAIL_PAGE_ALLOC
|
2020-04-06 20:12:49 -07:00
|
|
|
bool "Fault-injection capability for alloc_pages()"
|
2006-12-08 02:39:49 -08:00
|
|
|
depends on FAULT_INJECTION
|
2006-12-08 02:39:45 -08:00
|
|
|
help
|
2006-12-08 02:39:49 -08:00
|
|
|
Provide fault-injection capability for alloc_pages().
|
2006-12-08 02:39:45 -08:00
|
|
|
|
2020-10-15 20:13:46 -07:00
|
|
|
config FAULT_INJECTION_USERCOPY
|
|
|
|
bool "Fault injection capability for usercopy functions"
|
|
|
|
depends on FAULT_INJECTION
|
|
|
|
help
|
|
|
|
Provides fault-injection capability to inject failures
|
|
|
|
in usercopy functions (copy_from_user(), get_user(), ...).
|
|
|
|
|
2006-12-08 02:39:46 -08:00
|
|
|
config FAIL_MAKE_REQUEST
|
2006-12-12 20:16:36 +01:00
|
|
|
bool "Fault-injection capability for disk IO"
|
2008-09-14 05:56:33 -07:00
|
|
|
depends on FAULT_INJECTION && BLOCK
|
2006-12-08 02:39:46 -08:00
|
|
|
help
|
2006-12-08 02:39:49 -08:00
|
|
|
Provide fault-injection capability for disk IO.
|
2006-12-08 02:39:46 -08:00
|
|
|
|
2008-09-14 05:56:33 -07:00
|
|
|
config FAIL_IO_TIMEOUT
|
2010-07-21 16:05:53 +09:00
|
|
|
bool "Fault-injection capability for faking disk interrupts"
|
2008-09-14 05:56:33 -07:00
|
|
|
depends on FAULT_INJECTION && BLOCK
|
|
|
|
help
|
|
|
|
Provide fault-injection capability on end IO handling. This
|
|
|
|
will make the block layer "forget" an interrupt as configured,
|
|
|
|
thus exercising the error handling.
|
|
|
|
|
|
|
|
Only works with drivers that use the generic timeout handling,
|
2021-07-07 18:07:31 -07:00
|
|
|
for others it won't do anything.
|
2008-09-14 05:56:33 -07:00
|
|
|
|
2015-06-29 23:26:02 -07:00
|
|
|
config FAIL_FUTEX
|
|
|
|
bool "Fault-injection capability for futexes"
|
|
|
|
select DEBUG_FS
|
|
|
|
depends on FAULT_INJECTION && FUTEX
|
|
|
|
help
|
|
|
|
Provide fault-injection capability for futexes.
|
|
|
|
|
2018-06-14 15:27:48 -07:00
|
|
|
config FAULT_INJECTION_DEBUG_FS
|
|
|
|
bool "Debugfs entries for fault-injection capabilities"
|
|
|
|
depends on FAULT_INJECTION && SYSFS && DEBUG_FS
|
|
|
|
help
|
|
|
|
Enable configuration of fault-injection capabilities via debugfs.
|
|
|
|
|
2018-01-13 02:56:03 +09:00
|
|
|
config FAIL_FUNCTION
|
|
|
|
bool "Fault-injection capability for functions"
|
|
|
|
depends on FAULT_INJECTION_DEBUG_FS && FUNCTION_ERROR_INJECTION
|
|
|
|
help
|
|
|
|
Provide function-based fault-injection capability.
|
|
|
|
This will allow you to override a specific function so that it
|
|
|
|
returns a given value. As a result, the function's caller will see
|
|
|
|
an error value and have to handle it. This is useful to test the
|
|
|
|
error handling in various subsystems.
|
|
|
|
|
2018-06-14 15:27:48 -07:00
|
|
|
config FAIL_MMC_REQUEST
|
|
|
|
bool "Fault-injection capability for MMC IO"
|
|
|
|
depends on FAULT_INJECTION_DEBUG_FS && MMC
|
2006-12-08 02:39:43 -08:00
|
|
|
help
|
2018-06-14 15:27:48 -07:00
|
|
|
Provide fault-injection capability for MMC IO.
|
|
|
|
This will make the mmc core return data errors. This is
|
|
|
|
useful to test the error handling in the mmc block device
|
|
|
|
and to test how the mmc host driver handles retries from
|
|
|
|
the block device.
|
2007-02-20 13:57:56 -08:00
|
|
|
|
2021-08-03 15:45:18 -04:00
|
|
|
config FAIL_SUNRPC
|
|
|
|
bool "Fault-injection capability for SunRPC"
|
|
|
|
depends on FAULT_INJECTION_DEBUG_FS && SUNRPC_DEBUG
|
|
|
|
help
|
|
|
|
Provide fault-injection capability for SunRPC and
|
|
|
|
its consumers.
|
|
|
|
|
2023-03-27 23:37:32 +09:00
|
|
|
config FAULT_INJECTION_CONFIGFS
|
|
|
|
bool "Configfs interface for fault-injection capabilities"
|
2023-04-15 21:57:05 +09:00
|
|
|
depends on FAULT_INJECTION
|
|
|
|
select CONFIGFS_FS
|
2023-03-27 23:37:32 +09:00
|
|
|
help
|
|
|
|
This option allows configfs-based drivers to dynamically configure
|
|
|
|
fault-injection via configfs. Each parameter for driver-specific
|
|
|
|
fault-injection can be made visible as a configfs attribute in a
|
|
|
|
configfs group.
|
|
|
|
|
|
|
|
|
2007-02-20 13:57:56 -08:00
|
|
|
config FAULT_INJECTION_STACKTRACE_FILTER
|
|
|
|
bool "stacktrace filter for fault-injection capabilities"
|
2023-03-27 23:37:32 +09:00
|
|
|
depends on FAULT_INJECTION
|
|
|
|
depends on (FAULT_INJECTION_DEBUG_FS || FAULT_INJECTION_CONFIGFS) && STACKTRACE_SUPPORT
|
2007-02-20 13:57:56 -08:00
|
|
|
select STACKTRACE
|
2021-04-09 13:27:47 -07:00
|
|
|
depends on FRAME_POINTER || MIPS || PPC || S390 || MICROBLAZE || ARM || ARC || X86
|
2007-02-20 13:57:56 -08:00
|
|
|
help
|
|
|
|
Provide a stacktrace filter for fault-injection capabilities.
|
2007-10-18 23:41:07 -07:00
|
|
|
|
2019-12-06 17:03:51 -08:00
|
|
|
config ARCH_HAS_KCOV
|
|
|
|
bool
|
2017-10-13 15:57:33 -07:00
|
|
|
help
|
2019-12-06 17:03:51 -08:00
|
|
|
An architecture should select this when it can successfully
|
|
|
|
build and run with CONFIG_KCOV. This typically requires
|
|
|
|
disabling instrumentation for some early boot code.
|
2017-10-13 15:57:33 -07:00
|
|
|
|
2019-12-06 17:03:51 -08:00
|
|
|
config CC_HAS_SANCOV_TRACE_PC
|
|
|
|
def_bool $(cc-option,-fsanitize-coverage=trace-pc)
|
2017-10-13 15:57:33 -07:00
|
|
|
|
|
|
|
|
2019-12-06 17:03:51 -08:00
|
|
|
config KCOV
|
|
|
|
bool "Code coverage for fuzzing"
|
|
|
|
depends on ARCH_HAS_KCOV
|
|
|
|
depends on CC_HAS_SANCOV_TRACE_PC || GCC_PLUGINS
|
2022-04-18 09:50:40 -07:00
|
|
|
depends on !ARCH_WANTS_NO_INSTR || HAVE_NOINSTR_HACK || \
|
2024-01-25 15:55:16 -07:00
|
|
|
GCC_VERSION >= 120000 || CC_IS_CLANG
|
2019-12-06 17:03:51 -08:00
|
|
|
select DEBUG_FS
|
|
|
|
select GCC_PLUGIN_SANCOV if !CC_HAS_SANCOV_TRACE_PC
|
2022-04-18 09:50:40 -07:00
|
|
|
select OBJTOOL if HAVE_NOINSTR_HACK
|
2019-12-06 17:03:51 -08:00
|
|
|
help
|
|
|
|
KCOV exposes kernel code coverage information in a form suitable
|
|
|
|
for coverage-guided fuzzing (randomized testing).
|
2017-10-13 15:57:33 -07:00
|
|
|
|
2019-12-06 17:03:51 -08:00
|
|
|
For more details, see Documentation/dev-tools/kcov.rst.
|
2017-10-13 15:57:33 -07:00
|
|
|
|
2019-12-06 17:03:51 -08:00
|
|
|
config KCOV_ENABLE_COMPARISONS
|
|
|
|
bool "Enable comparison operands collection by KCOV"
|
|
|
|
depends on KCOV
|
|
|
|
depends on $(cc-option,-fsanitize-coverage=trace-cmp)
|
|
|
|
help
|
|
|
|
KCOV also exposes operands of every comparison in the instrumented
|
|
|
|
code along with operand sizes and PCs of the comparison instructions.
|
|
|
|
These operands can be used by fuzzing engines to improve the quality
|
|
|
|
of fuzzing coverage.
|
2017-10-13 15:57:33 -07:00
|
|
|
|
2019-12-06 17:03:51 -08:00
|
|
|
config KCOV_INSTRUMENT_ALL
|
|
|
|
bool "Instrument all code by default"
|
|
|
|
depends on KCOV
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
If you are doing generic system call fuzzing (e.g. with syzkaller),
|
|
|
|
then you will want to instrument the whole kernel and you should
|
|
|
|
say y here. If you are doing more targeted fuzzing (e.g.
|
|
|
|
filesystem fuzzing with AFL) then you will want to enable coverage
|
|
|
|
for more specific subsets of files, and should say n here.
|
2019-09-23 02:02:36 -07:00
|
|
|
|
2020-06-04 16:46:04 -07:00
|
|
|
config KCOV_IRQ_AREA_SIZE
|
|
|
|
hex "Size of interrupt coverage collection area in words"
|
|
|
|
depends on KCOV
|
|
|
|
default 0x40000
|
|
|
|
help
|
|
|
|
KCOV uses preallocated per-cpu areas to collect coverage from
|
|
|
|
soft interrupts. This specifies the size of those areas in the
|
|
|
|
number of unsigned long words.
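Because KCOV_IRQ_AREA_SIZE is given in unsigned long words, the memory used by each per-cpu area depends on the word size of the target. The following is a minimal sketch of that arithmetic; the helper name kcov_irq_area_bytes is hypothetical and the word sizes passed to it (8 and 4 bytes) are assumptions about typical 64-bit and 32-bit builds.

```shell
#!/bin/sh
# Bytes used by one per-cpu KCOV interrupt coverage area.
#   $1: area size in words, as in CONFIG_KCOV_IRQ_AREA_SIZE (hex is accepted)
#   $2: sizeof(unsigned long) in bytes on the target (assumed by the caller)
kcov_irq_area_bytes() {
    echo $(( $1 * $2 ))
}

# Default value from the help text, on an assumed 64-bit target.
kcov_irq_area_bytes 0x40000 8
# Same default on an assumed 32-bit target.
kcov_irq_area_bytes 0x40000 4
```

With the default of 0x40000 words this comes to 2 MiB per CPU when unsigned long is 8 bytes, which is worth keeping in mind on systems with many CPUs.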
|
|
|
|
|
2024-06-11 09:50:31 +02:00
|
|
|
config KCOV_SELFTEST
|
|
|
|
bool "Perform short selftests on boot"
|
|
|
|
depends on KCOV
|
|
|
|
help
|
|
|
|
Run short KCOV coverage collection selftests on boot.
|
|
|
|
On test failure, causes the kernel to panic. Recommended to be
|
|
|
|
enabled, ensuring critical functionality works as intended.
|
|
|
|
|
2018-02-06 15:38:38 -08:00
|
|
|
menuconfig RUNTIME_TESTING_MENU
|
|
|
|
bool "Runtime Testing"
|
2024-02-11 21:48:08 +09:00
|
|
|
default y
|
2018-02-06 15:38:38 -08:00
|
|
|
|
|
|
|
if RUNTIME_TESTING_MENU
|
2013-07-01 13:04:44 -07:00
|
|
|
|
2022-12-08 15:31:28 +01:00
|
|
|
config TEST_DHRY
|
|
|
|
tristate "Dhrystone benchmark test"
|
|
|
|
help
|
|
|
|
Enable this to include the Dhrystone 2.1 benchmark. This test
|
|
|
|
calculates the number of Dhrystones per second, and the number of
|
|
|
|
DMIPS (Dhrystone MIPS) obtained when the Dhrystone score is divided
|
|
|
|
by 1757 (the number of Dhrystones per second obtained on the VAX
|
|
|
|
11/780, nominally a 1 MIPS machine).
|
|
|
|
|
|
|
|
To run the benchmark, it needs to be enabled explicitly, either from
|
|
|
|
the kernel command line (when built-in), or from userspace (when
|
2024-01-22 15:50:45 +01:00
|
|
|
built-in or modular).
|
2022-12-08 15:31:28 +01:00
|
|
|
|
|
|
|
Run once during kernel boot:
|
|
|
|
|
|
|
|
test_dhry.run
|
|
|
|
|
|
|
|
Set number of iterations from kernel command line:
|
|
|
|
|
|
|
|
test_dhry.iterations=<n>
|
|
|
|
|
|
|
|
Set number of iterations from userspace:
|
|
|
|
|
|
|
|
echo <n> > /sys/module/test_dhry/parameters/iterations
|
|
|
|
|
|
|
|
Trigger manual run from userspace:
|
|
|
|
|
|
|
|
echo y > /sys/module/test_dhry/parameters/run
|
|
|
|
|
|
|
|
If the number of iterations is <= 0, the test will devise a suitable
|
|
|
|
number of iterations (test runs for at least 2s) automatically.
|
|
|
|
This process takes ca. 4s.
|
|
|
|
|
|
|
|
If unsure, say N.
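The DMIPS figure described in the TEST_DHRY help text is the Dhrystones-per-second score divided by 1757. A minimal sketch of that conversion follows; the helper name dmips and the sample score are hypothetical, and shell arithmetic truncates the result to an integer.

```shell
#!/bin/sh
# Convert a Dhrystones-per-second score to DMIPS.
# 1757 is the VAX 11/780 score quoted in the help text.
#   $1: Dhrystones per second (integer)
dmips() {
    echo $(( $1 / 1757 ))
}

# Hypothetical score, chosen only to illustrate the division.
dmips 1757000
```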
|
|
|
|
|
2013-07-01 13:04:44 -07:00
|
|
|
config LKDTM
|
|
|
|
tristate "Linux Kernel Dump Test Tool Module"
|
|
|
|
depends on DEBUG_FS
|
|
|
|
help
|
|
|
|
This module enables testing of the different dumping mechanisms by
|
|
|
|
inducing system failures at predefined crash points.
|
|
|
|
If you don't need it: say N
|
|
|
|
Choose M here to compile this code as a module. The module will be
|
|
|
|
called lkdtm.
|
|
|
|
|
|
|
|
Documentation on how to use the module can be found in
|
2019-06-12 14:52:44 -03:00
|
|
|
Documentation/fault-injection/provoke-crashes.rst
|
2013-07-01 13:04:44 -07:00
|
|
|
|
2022-08-23 08:12:21 +02:00
|
|
|
config CPUMASK_KUNIT_TEST
|
|
|
|
tristate "KUnit test for cpumask" if !KUNIT_ALL_TESTS
|
2022-07-02 18:08:26 +02:00
|
|
|
depends on KUNIT
|
|
|
|
default KUNIT_ALL_TESTS
|
|
|
|
help
|
|
|
|
Enable to turn on cpumask tests, running at boot or module load time.
|
|
|
|
|
2022-08-23 08:12:21 +02:00
|
|
|
For more information on KUnit and unit tests in general, please refer
|
|
|
|
to the KUnit documentation in Documentation/dev-tools/kunit/.
|
|
|
|
|
2022-07-02 18:08:26 +02:00
|
|
|
If unsure, say N.
|
|
|
|
|
2013-07-01 13:04:44 -07:00
|
|
|
config TEST_LIST_SORT
|
lib/test: convert lib/test_list_sort.c to use KUnit
Functionally, this just means that the test output will be slightly
changed and it'll now depend on CONFIG_KUNIT=y/m.
It'll still run at boot time and can still be built as a loadable
module.
There was a pre-existing patch to convert this test that I found later,
here [1]. Compared to [1], this patch doesn't rename files and uses
KUnit features more heavily (i.e. does more than converting pr_err()
calls to KUNIT_FAIL()).
What this conversion gives us:
* a shorter test thanks to KUnit's macros
* a way to run this a bit more easily via kunit.py (and
CONFIG_KUNIT_ALL_TESTS=y) [2]
* a structured way of reporting pass/fail
* uses kunit-managed allocations to avoid the risk of memory leaks
* more descriptive error messages:
* i.e. it prints out which fields are invalid, what the expected
values are, etc.
What this conversion does not do:
* change the name of the file (and thus the name of the module)
* change the name of the config option
Leaving these as-is for now to minimize the impact to people wanting to
run this test. IMO, that concern trumps following KUnit's style guide
for both names, at least for now.
[1] https://lore.kernel.org/linux-kselftest/20201015014616.309000-1-vitor@massaru.org/
[2] Can be run via
$ ./tools/testing/kunit/kunit.py run --kunitconfig /dev/stdin <<EOF
CONFIG_KUNIT=y
CONFIG_TEST_LIST_SORT=y
EOF
[16:55:56] Configuring KUnit Kernel ...
[16:55:56] Building KUnit Kernel ...
[16:56:29] Starting KUnit Kernel ...
[16:56:32] ============================================================
[16:56:32] ======== [PASSED] list_sort ========
[16:56:32] [PASSED] list_sort_test
[16:56:32] ============================================================
[16:56:32] Testing complete. 1 tests run. 0 failed. 0 crashed.
[16:56:32] Elapsed time: 35.668s total, 0.001s configuring, 32.725s building, 0.000s running
Note: the build time is as after a `make mrproper`.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Tested-by: David Gow <davidgow@google.com>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-05-03 13:58:35 -07:00
|
|
|
tristate "Linked list sorting test" if !KUNIT_ALL_TESTS
|
|
|
|
depends on KUNIT
|
|
|
|
default KUNIT_ALL_TESTS
|
2013-07-01 13:04:44 -07:00
|
|
|
help
|
|
|
|
Enable this to turn on 'list_sort()' function test. This test is
|
2017-05-08 15:55:26 -07:00
|
|
|
executed only once during system boot (so affects only boot time),
|
|
|
|
or at module load time.
|
2013-07-01 13:04:44 -07:00
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2020-02-13 23:51:29 -08:00
|
|
|
config TEST_MIN_HEAP
|
|
|
|
tristate "Min heap test"
|
|
|
|
depends on DEBUG_KERNEL || m
|
|
|
|
help
|
|
|
|
Enable this to turn on min heap function tests. This test is
|
|
|
|
executed only once during system boot (so affects only boot time),
|
|
|
|
or at module load time.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2017-02-24 15:01:07 -08:00
|
|
|
config TEST_SORT
|
lib/test: convert test_sort.c to use KUnit
This follows up commit ebd09577be6c ("lib/test: convert
lib/test_list_sort.c to use KUnit").
Converting this test to KUnit makes the test a bit shorter, standardizes
how it reports pass/fail, and adds an easier way to run the test [1].
Like ebd09577be6c, this leaves the file and Kconfig option name the same,
but slightly changes their dependencies (needs CONFIG_KUNIT).
[1] Can be run via
$ ./tools/testing/kunit/kunit.py run --kunitconfig /dev/stdin <<EOF
CONFIG_KUNIT=y
CONFIG_TEST_SORT=y
EOF
[11:30:27] Starting KUnit Kernel ...
[11:30:30] ============================================================
[11:30:30] ======== [PASSED] lib_sort ========
[11:30:30] [PASSED] test_sort
[11:30:30] ============================================================
[11:30:30] Testing complete. 1 tests run. 0 failed. 0 crashed. 0 skipped.
[11:30:30] Elapsed time: 37.032s total, 0.001s configuring, 34.090s building, 0.000s running
Note: this is the time it took after a `make mrproper`.
With an incremental rebuild, this looks more like:
[11:38:58] Elapsed time: 6.444s total, 0.001s configuring, 3.416s building, 0.000s running
Since the test has no dependencies, it can also be run (with some other
tests) with just:
$ ./tools/testing/kunit/kunit.py run
Link: https://lkml.kernel.org/r/20210715232441.1380885-1-dlatypov@google.com
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Cc: Pravin Shedge <pravin.shedge4linux@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-07 19:58:48 -07:00
|
|
|
tristate "Array-based sort test" if !KUNIT_ALL_TESTS
|
|
|
|
depends on KUNIT
|
|
|
|
default KUNIT_ALL_TESTS
|
2017-02-24 15:01:07 -08:00
|
|
|
help
|
2017-05-08 15:55:23 -07:00
|
|
|
This option enables the self-test function of 'sort()' at boot,
|
|
|
|
or at module load time.
|
2017-02-24 15:01:07 -08:00
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2021-04-20 04:50:28 +02:00
|
|
|
config TEST_DIV64
|
|
|
|
tristate "64bit/32bit division and modulo test"
|
|
|
|
depends on DEBUG_KERNEL || m
|
|
|
|
help
|
|
|
|
Enable this to turn on 'do_div()' function test. This test is
|
|
|
|
executed only once during system boot (so affects only boot time),
|
|
|
|
or at module load time.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2024-07-07 15:05:20 -04:00
|
|
|
config TEST_MULDIV64
|
|
|
|
tristate "mul_u64_u64_div_u64() test"
|
|
|
|
depends on DEBUG_KERNEL || m
|
|
|
|
help
|
|
|
|
Enable this to turn on 'mul_u64_u64_div_u64()' function test.
|
|
|
|
This test is executed only once during system boot (so affects
|
|
|
|
only boot time), or at module load time.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2023-09-08 17:03:21 +01:00
|
|
|
config TEST_IOV_ITER
|
|
|
|
tristate "Test iov_iter operation" if !KUNIT_ALL_TESTS
|
|
|
|
depends on KUNIT
|
2024-02-08 07:30:10 -08:00
|
|
|
depends on MMU
|
2023-09-08 17:03:21 +01:00
|
|
|
default KUNIT_ALL_TESTS
|
|
|
|
help
|
|
|
|
Enable this to turn on testing of the operation of the I/O iterator
|
|
|
|
(iov_iter). This test is executed only once during system boot (so
|
|
|
|
affects only boot time), or at module load time.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2013-07-01 13:04:44 -07:00
|
|
|
config KPROBES_SANITY_TEST
|
2022-04-05 12:06:19 -07:00
|
|
|
tristate "Kprobes sanity tests" if !KUNIT_ALL_TESTS
|
2013-07-01 13:04:44 -07:00
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
depends on KPROBES
|
2021-10-21 09:54:24 +09:00
|
|
|
depends on KUNIT
|
2022-11-21 11:06:20 +08:00
|
|
|
select STACKTRACE if ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
|
2022-04-05 12:06:19 -07:00
|
|
|
default KUNIT_ALL_TESTS
|
2013-07-01 13:04:44 -07:00
|
|
|
help
|
|
|
|
This option provides for testing basic kprobes functionality on
|
2018-06-20 01:05:07 +09:00
|
|
|
boot. Samples of kprobe and kretprobe are inserted and
|
2013-07-01 13:04:44 -07:00
|
|
|
verified for functionality.
|
|
|
|
|
|
|
|
Say N if you are unsure.
|
|
|
|
|
2022-03-15 23:02:35 +09:00
|
|
|
config FPROBE_SANITY_TEST
|
|
|
|
bool "Self test for fprobe"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
depends on FPROBE
|
|
|
|
depends on KUNIT=y
|
|
|
|
help
|
|
|
|
This option enables testing of fprobe when the system boots.
|
|
|
|
A series of tests are made to verify that the fprobe is functioning
|
|
|
|
properly.
|
|
|
|
|
|
|
|
Say N if you are unsure.
|
|
|
|
|
2013-07-01 13:04:44 -07:00
|
|
|
config BACKTRACE_SELF_TEST
|
|
|
|
tristate "Self test for the backtrace code"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
This option provides a kernel module that can be used to test
|
|
|
|
the kernel stack backtrace code. This option is not useful
|
|
|
|
for distributions or general kernels, but only for kernel
|
|
|
|
developers working on architecture code.
|
|
|
|
|
|
|
|
Note that if you want to also test saved backtraces, you will
|
|
|
|
have to enable STACKTRACE as well.
|
|
|
|
|
|
|
|
Say N if you are unsure.
|
|
|
|
|
2021-12-04 20:21:56 -08:00
|
|
|
config TEST_REF_TRACKER
|
|
|
|
tristate "Self test for reference tracker"
|
|
|
|
depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
|
|
|
|
select REF_TRACKER
|
|
|
|
help
|
|
|
|
This option provides a kernel module performing tests
|
|
|
|
using reference tracker infrastructure.
|
|
|
|
|
|
|
|
Say N if you are unsure.
|
|
|
|
|
2012-10-08 16:30:39 -07:00
|
|
|
config RBTREE_TEST
|
|
|
|
tristate "Red-Black tree test"
|
2013-09-11 14:25:19 -07:00
|
|
|
depends on DEBUG_KERNEL
|
2012-10-08 16:30:39 -07:00
|
|
|
help
|
|
|
|
A benchmark measuring the performance of the rbtree library.
|
|
|
|
Also includes rbtree invariant checks.
|
|
|
|
|
2019-06-20 17:10:33 +03:00
|
|
|
config REED_SOLOMON_TEST
|
|
|
|
tristate "Reed-Solomon library test"
|
|
|
|
depends on DEBUG_KERNEL || m
|
|
|
|
select REED_SOLOMON
|
|
|
|
select REED_SOLOMON_ENC16
|
|
|
|
select REED_SOLOMON_DEC16
|
|
|
|
help
|
|
|
|
This option enables the self-test function of rslib at boot,
|
|
|
|
or at module load time.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
rbtree: add prio tree and interval tree tests
Patch 1 implements support for interval trees, on top of the augmented
rbtree API. It also adds synthetic tests to compare the performance of
interval trees vs prio trees. Short answers is that interval trees are
slightly faster (~25%) on insert/erase, and much faster (~2.4 - 3x)
on search. It is debatable how realistic the synthetic test is, and I have
not made such measurements yet, but my impression is that interval trees
would still come out faster.
Patch 2 uses a preprocessor template to make the interval tree generic,
and uses it as a replacement for the vma prio_tree.
Patch 3 takes the other prio_tree user, kmemleak, and converts it to use
a basic rbtree. We don't actually need the augmented rbtree support here
because the intervals are always non-overlapping.
Patch 4 removes the now-unused prio tree library.
Patch 5 proposes an additional optimization to rb_erase_augmented, now
providing it as an inline function so that the augmented callbacks can be
inlined in. This provides an additional 5-10% performance improvement
for the interval tree insert/erase benchmark. There is a maintenance cost
as it exposes augmented rbtree users to some of the rbtree library internals;
however I think this cost shouldn't be too high as I expect the augmented
rbtree will always have far fewer users than the base rbtree.
I should probably add a quick summary of why I think it makes sense to
replace prio trees with augmented rbtree based interval trees now. One of
the drivers is that we need augmented rbtrees for Rik's vma gap finding
code, and once you have them, it just makes sense to use them for interval
trees as well, as this is the simpler and more well known algorithm. prio
trees, in comparison, seem *too* clever: they impose an additional 'heap'
constraint on the tree, which they use to guarantee a faster worst-case
complexity of O(k+log N) for stabbing queries in a well-balanced prio
tree, vs O(k*log N) for interval trees (where k=number of matches,
N=number of intervals). Now this sounds great, but in practice prio trees
don't realize this theoretical benefit. First, the additional constraint
makes them harder to update, so that the kernel implementation has to
simplify things by balancing them like a radix tree, which is not always
ideal. Second, the fact that there are both index and heap properties
makes both tree manipulation and search more complex, which results in a
higher multiplicative time constant. As it turns out, the simple interval
tree algorithm ends up running faster than the more clever prio tree.
This patch:
Add two test modules:
- prio_tree_test measures the performance of lib/prio_tree.c, both for
insertion/removal and for stabbing searches
- interval_tree_test measures the performance of a library of equivalent
functionality, built using the augmented rbtree support.
In order to support the second test module, lib/interval_tree.c is
introduced. It is kept separate from the interval_tree_test main file
for two reasons: first we don't want to provide an unfair advantage
over prio_tree_test by having everything in a single compilation unit,
and second there is the possibility that the interval tree functionality
could get some non-test users in kernel over time.
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-08 16:31:23 -07:00
|
|
|
config INTERVAL_TREE_TEST
|
|
|
|
tristate "Interval tree test"
|
2017-07-10 15:51:43 -07:00
|
|
|
depends on DEBUG_KERNEL
|
2014-03-17 12:21:54 +00:00
|
|
|
select INTERVAL_TREE
|
2012-10-08 16:31:23 -07:00
|
|
|
help
|
|
|
|
A benchmark measuring the performance of the interval tree library.
|
|
|
|
|
2013-11-12 15:08:34 -08:00
|
|
|
config PERCPU_TEST
|
|
|
|
tristate "Per cpu operations test"
|
|
|
|
depends on m && DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
Enable this option to build a test module which validates per-cpu
|
|
|
|
operations.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2013-07-01 13:04:44 -07:00
|
|
|
config ATOMIC64_SELFTEST
|
2017-02-24 15:00:55 -08:00
|
|
|
tristate "Perform an atomic64_t self-test"
|
2013-07-01 13:04:44 -07:00
|
|
|
help
|
2017-02-24 15:00:55 -08:00
|
|
|
Enable this option to test the atomic64_t functions at boot or
|
|
|
|
at module load time.
|
2013-07-01 13:04:44 -07:00
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
|
|
|
config ASYNC_RAID6_TEST
|
|
|
|
tristate "Self test for hardware accelerated raid6 recovery"
|
|
|
|
depends on ASYNC_RAID6_RECOV
|
|
|
|
select ASYNC_MEMCPY
|
2020-06-14 01:50:22 +09:00
|
|
|
help
|
2013-07-01 13:04:44 -07:00
|
|
|
This is a one-shot self test that permutes through the
|
|
|
|
recovery of all the possible two-disk failure scenarios for an
|
|
|
|
N-disk array. Recovery is performed with the asynchronous
|
|
|
|
raid6 recovery routines, and will optionally use an offload
|
|
|
|
engine if one is available.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2015-02-12 15:02:21 -08:00
|
|
|
config TEST_HEXDUMP
|
|
|
|
tristate "Test functions located in the hexdump module at runtime"
|
|
|
|
|
2024-03-01 12:27:30 -08:00
|
|
|
config STRING_KUNIT_TEST
|
|
|
|
tristate "KUnit test string functions at runtime" if !KUNIT_ALL_TESTS
|
|
|
|
depends on KUNIT
|
|
|
|
default KUNIT_ALL_TESTS
|
2021-07-29 14:53:35 -07:00
|
|
|
|
2024-03-01 12:27:31 -08:00
|
|
|
config STRING_HELPERS_KUNIT_TEST
|
|
|
|
tristate "KUnit test string helpers at runtime" if !KUNIT_ALL_TESTS
|
|
|
|
depends on KUNIT
|
|
|
|
default KUNIT_ALL_TESTS
|
2013-07-01 13:04:44 -07:00
|
|
|
|
|
|
|
config TEST_KSTRTOX
|
|
|
|
tristate "Test kstrto*() family of functions at runtime"
|
|
|
|
|
2015-11-06 16:30:29 -08:00
|
|
|
config TEST_PRINTF
|
|
|
|
tristate "Test printf() family of functions at runtime"
|
|
|
|
|
2021-05-14 17:12:05 +01:00
|
|
|
config TEST_SCANF
|
|
|
|
tristate "Test scanf() family of functions at runtime"
|
|
|
|
|
2016-02-19 09:24:00 -05:00
|
|
|
config TEST_BITMAP
|
|
|
|
tristate "Test bitmap_*() family of functions at runtime"
|
|
|
|
help
|
|
|
|
Enable this option to test the bitmap functions at boot.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2016-05-30 17:40:41 +03:00
|
|
|
config TEST_UUID
|
|
|
|
tristate "Test functions located in the uuid module at runtime"
|
|
|
|
|
2017-11-07 14:57:46 -05:00
|
|
|
config TEST_XARRAY
|
|
|
|
tristate "Test the XArray code at runtime"
|
|
|
|
|
2022-10-28 18:04:30 +00:00
|
|
|
config TEST_MAPLE_TREE
|
2023-05-18 10:55:25 -04:00
|
|
|
tristate "Test the Maple Tree code at runtime or module load"
|
|
|
|
help
|
|
|
|
Enable this option to test the maple tree code functions at boot, or
|
|
|
|
when the module is loaded. Enabling "Debug Maple Trees" will enable
|
|
|
|
more verbose output on failures.
|
|
|
|
|
|
|
|
If unsure, say N.
|
2022-10-28 18:04:30 +00:00
|
|
|
|
2014-08-02 11:47:44 +02:00
|
|
|
config TEST_RHASHTABLE
|
2015-01-29 15:40:25 +01:00
|
|
|
tristate "Perform selftest on resizable hash table"
|
2014-08-02 11:47:44 +02:00
|
|
|
help
|
|
|
|
Enable this option to test the rhashtable functions at boot.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2018-06-18 16:59:29 -04:00
|
|
|
config TEST_IDA
|
|
|
|
tristate "Perform selftest on IDA functions"
|
|
|
|
|
2017-02-03 10:29:06 +01:00
|
|
|
config TEST_PARMAN
|
|
|
|
tristate "Perform selftest on priority array manager"
|
|
|
|
depends on PARMAN
|
|
|
|
help
|
|
|
|
Enable this option to test priority array manager on boot
|
|
|
|
(or module load).
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2019-05-27 22:55:19 +02:00
|
|
|
config TEST_IRQ_TIMINGS
|
|
|
|
bool "IRQ timings selftest"
|
|
|
|
depends on IRQ_TIMINGS
|
|
|
|
help
|
|
|
|
Enable this option to test the irq timings code on boot.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2014-10-13 15:51:38 -07:00
|
|
|
config TEST_LKM
|
test: add minimal module for verification testing
This is a pair of test modules I'd like to see in the tree. Instead of
putting these in lkdtm, where I've been adding various tests that trigger
crashes, these don't make sense there since they need to be either
distinctly separate, or their pass/fail state doesn't need to crash the
machine.
These live in lib/ for now, along with a few other in-kernel test modules,
and use the slightly more common "test_" naming convention, instead of
"test-". We should likely standardize on the former:
$ find . -name 'test_*.c' | grep -v /tools/ | wc -l
4
$ find . -name 'test-*.c' | grep -v /tools/ | wc -l
2
The first is entirely a no-op module, designed to allow simple testing of
the module loading and verification interface. It's useful to have a
module that has no other uses or dependencies so it can be reliably used
for just testing module loading and verification.
The second is a module that exercises the user memory access functions, in
an effort to make sure that we can quickly catch any regressions in
boundary checking (e.g. like what was recently fixed on ARM).
This patch (of 2):
When doing module loading verification tests (for example, with module
signing, or LSM hooks), it is very handy to have a module that can be
built on all systems under test, isn't auto-loaded at boot, and has no
device or similar dependencies. This creates the "test_module.ko" module
for that purpose, which only reports its load and unload to printk.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 15:54:37 -08:00
|
|
|
tristate "Test module loading with 'hello world' module"
|
|
|
|
depends on m
|
|
|
|
help
|
|
|
|
This builds the "test_module" module that emits "Hello, world"
|
|
|
|
on printk when loaded. It is designed to be used for basic
|
|
|
|
evaluation of the module loading subsystem (for example when
|
|
|
|
validating module verification). It lacks any extra dependencies,
|
|
|
|
and will not normally be loaded by the system unless explicitly
|
|
|
|
requested by name.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2020-06-04 16:50:27 -07:00
|
|
|
config TEST_BITOPS
|
2020-06-10 18:41:53 -07:00
|
|
|
tristate "Test module for compilation of bitops operations"
|
2020-06-04 16:50:27 -07:00
|
|
|
help
|
|
|
|
This builds the "test_bitops" module that is much like the
|
|
|
|
TEST_LKM module except that it does a basic exercise of the
|
2020-06-10 18:41:53 -07:00
|
|
|
set/clear_bit macros and get_count_order/long to make sure there are
|
|
|
|
no compiler warnings from C=1 sparse checker or -Wextra
|
|
|
|
compilations. It has no dependencies and doesn't run or load unless
|
|
|
|
explicitly requested by name, for example: modprobe test_bitops.
|
2020-06-04 16:50:27 -07:00
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
vmalloc: add test driver to analyse vmalloc allocator
This adds a new kernel module for analysis of the vmalloc allocator. It is
only enabled as a module. There are two main purposes this module should
be used for: performance evaluation and stressing of the vmalloc subsystem.
It consists of several test cases. As of now there are 8. The module
has five parameters we can specify to change its behaviour.
1) run_test_mask - set of tests to be run
id: 1, name: fix_size_alloc_test
id: 2, name: full_fit_alloc_test
id: 4, name: long_busy_list_alloc_test
id: 8, name: random_size_alloc_test
id: 16, name: fix_align_alloc_test
id: 32, name: random_size_align_alloc_test
id: 64, name: align_shift_alloc_test
id: 128, name: pcpu_alloc_test
By default all tests are in the run test mask. If you want to select some
specific tests, it is possible to pass the mask. For example, for the first,
second and fourth tests we pass the value 11 (1 + 2 + 8).
2) test_repeat_count - how many times each test should be repeated
By default it is one time per test. It is possible to pass any number.
The higher the value, the longer the tests take.
3) test_loop_count - internal test loop counter. By default it is set
to 1000000.
4) single_cpu_test - use one CPU to run the tests
By default this parameter is set to false. It means that all online
CPUs execute the tests. By setting it to 1, the tests are executed by
the first online CPU only.
5) sequential_test_order - run tests in sequential order
By default this parameter is set to false. It means that before running
tests the order is shuffled. It is possible to make it sequential, just
set it to 1.
Performance analysis:
In order to evaluate the performance of vmalloc allocations, it usually
makes sense to run the tests on only one CPU and in sequential order; the
repeat count and the test mask can be varied as needed.
For example, to run all tests on one CPU and repeat each test 3 times,
insert the module passing the following parameters:
single_cpu_test=1 sequential_test_order=1 test_repeat_count=3
with the following output:
<snip>
Summary: fix_size_alloc_test passed: 3 failed: 0 repeat: 3 loops: 1000000 avg: 901177 usec
Summary: full_fit_alloc_test passed: 3 failed: 0 repeat: 3 loops: 1000000 avg: 1039341 usec
Summary: long_busy_list_alloc_test passed: 3 failed: 0 repeat: 3 loops: 1000000 avg: 11775763 usec
Summary: random_size_alloc_test passed 3: failed: 0 repeat: 3 loops: 1000000 avg: 6081992 usec
Summary: fix_align_alloc_test passed: 3 failed: 0 repeat: 3, loops: 1000000 avg: 2003712 usec
Summary: random_size_align_alloc_test passed: 3 failed: 0 repeat: 3 loops: 1000000 avg: 2895689 usec
Summary: align_shift_alloc_test passed: 0 failed: 3 repeat: 3 loops: 1000000 avg: 573 usec
Summary: pcpu_alloc_test passed: 3 failed: 0 repeat: 3 loops: 1000000 avg: 95802 usec
All test took CPU0=192945605995 cycles
<snip>
The align_shift_alloc_test is expected to fail.
Stressing:
In order to stress the vmalloc subsystem we run all available test cases
on all available CPUs simultaneously. In order to prevent a constant behaviour
pattern, the test cases array is shuffled by default to randomize the order
of test execution.
For example, to run all tests (default) on all online CPUs (default)
with shuffled order (default) and repeat each test 30 times, the command
would be:
modprobe test_vmalloc test_repeat_count=30
The expected results are that the system stays alive, there are no BUG_ONs or
kernel panics, the tests complete, and there are no memory leaks.
[urezki@gmail.com: fix 32-bit builds]
Link: http://lkml.kernel.org/r/20190106214839.ffvjvmrn52uqog7k@pc636
[urezki@gmail.com: make CONFIG_TEST_VMALLOC depend on CONFIG_MMU]
Link: http://lkml.kernel.org/r/20190219085441.s6bg2gpy4esny5vw@pc636
Link: http://lkml.kernel.org/r/20190103142108.20744-3-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 15:43:34 -08:00
config TEST_VMALLOC
	tristate "Test module for stress/performance analysis of vmalloc allocator"
	default n
	depends on MMU
	depends on m
	help
	  This builds the "test_vmalloc" module that should be used for
	  stress and performance analysis, so that any new change to the
	  vmalloc subsystem can be evaluated from a performance and
	  stability point of view.

	  If unsure, say N.

2014-05-08 14:10:52 -07:00
config TEST_BPF
	tristate "Test BPF filter functionality"
2014-05-13 09:58:44 -07:00
	depends on m && NET
2014-05-08 14:10:52 -07:00
	help
	  This builds the "test_bpf" module that runs various test vectors
	  against the BPF interpreter or BPF JIT compiler depending on the
	  current setting. This is in particular useful for BPF JIT compiler
	  development, but also to run regression tests against changes in
bpf: mini eBPF library, test stubs and verifier testsuite
1.
the library includes a trivial set of BPF syscall wrappers:
int bpf_create_map(int key_size, int value_size, int max_entries);
int bpf_update_elem(int fd, void *key, void *value);
int bpf_lookup_elem(int fd, void *key, void *value);
int bpf_delete_elem(int fd, void *key);
int bpf_get_next_key(int fd, void *key, void *next_key);
int bpf_prog_load(enum bpf_prog_type prog_type,
const struct sock_filter_int *insns, int insn_len,
const char *license);
bpf_prog_load() stores verifier log into global bpf_log_buf[] array
and BPF_*() macros to build instructions
2.
test stubs configure eBPF infra with 'unspec' map and program types.
These are fake types used by user space testsuite only.
3.
verifier tests valid and invalid programs and expects predefined
error log messages from kernel.
40 tests so far.
$ sudo ./test_verifier
#0 add+sub+mul OK
#1 unreachable OK
#2 unreachable2 OK
#3 out of range jump OK
#4 out of range jump2 OK
#5 test1 ld_imm64 OK
...
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 00:17:07 -07:00
	  the interpreter code. It also enables test stubs for eBPF maps and
	  verifier used by user space verifier testsuite.
2014-05-08 14:10:52 -07:00

	  If unsure, say N.

2019-07-01 14:39:01 -07:00
config TEST_BLACKHOLE_DEV
	tristate "Test blackhole netdev functionality"
	depends on m && NET
	help
	  This builds the "test_blackhole_dev" module that validates the
	  data path through this blackhole netdev.

	  If unsure, say N.

2018-02-06 15:38:27 -08:00
config FIND_BIT_BENCHMARK
lib: test module for find_*_bit() functions
find_bit functions are widely used in the kernel, including hot paths.
This module tests performance of those functions in 2 typical scenarios:
randomly filled bitmap with relatively equal distribution of set and
cleared bits, and sparse bitmap which has 1 set bit for 500 cleared
bits.
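To make the two scenarios concrete, here is a small pure-Python model of a find_next_bit() walk over a dense and a sparse bitmap. It is only a sketch: the bitmap is modelled as a Python integer, the sparse layout (one set bit every 500 positions) is a deterministic stand-in chosen for this example, and count_iterations(), random_bitmap() and sparse_bitmap() are hypothetical helpers, not functions of the test module.

```python
import random


def find_next_bit(bitmap, size, offset):
    """Return the index of the first set bit at or after offset, or size if none."""
    for i in range(offset, size):
        if (bitmap >> i) & 1:
            return i
    return size


def count_iterations(bitmap, size):
    """Visit every set bit with repeated find_next_bit() calls; return how many were found."""
    found = 0
    i = find_next_bit(bitmap, size, 0)
    while i < size:
        found += 1
        i = find_next_bit(bitmap, size, i + 1)
    return found


def random_bitmap(size, rng):
    """Bitmap with roughly equal numbers of set and cleared bits."""
    return rng.getrandbits(size)


def sparse_bitmap(size, step=500):
    """Bitmap with one set bit per `step` positions (assumed layout)."""
    bitmap = 0
    for i in range(0, size, step):
        bitmap |= 1 << i
    return bitmap


if __name__ == "__main__":
    size = 5000
    rng = random.Random(0)
    print("dense iterations: ", count_iterations(random_bitmap(size, rng), size))
    print("sparse iterations:", count_iterations(sparse_bitmap(size), size))
```

In this model the dense bitmap yields about one iteration per two bits, while the sparse one yields one per 500 bits, which matches the general shape of the iteration counts reported in the commit message.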
On ThunderX machine:
Start testing find_bit() with random-filled bitmap
find_next_bit: 240043 cycles, 164062 iterations
find_next_zero_bit: 312848 cycles, 163619 iterations
find_last_bit: 193748 cycles, 164062 iterations
find_first_bit: 177720874 cycles, 164062 iterations
Start testing find_bit() with sparse bitmap
find_next_bit: 3633 cycles, 656 iterations
find_next_zero_bit: 620399 cycles, 327025 iterations
find_last_bit: 3038 cycles, 656 iterations
find_first_bit: 691407 cycles, 656 iterations
[arnd@arndb.de: use correct format string for find-bit tests]
Link: http://lkml.kernel.org/r/20171113135605.3166307-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20171109140714.13168-1-ynorov@caviumnetworks.com
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Clement Courbet <courbet@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 15:28:31 -08:00
	tristate "Test find_bit functions"
	help
	  This builds the "test_find_bit" module that measures find_*_bit()
	  functions performance.

	  If unsure, say N.

2014-07-14 14:38:12 -07:00
config TEST_FIRMWARE
	tristate "Test firmware loading via userspace interface"
	depends on FW_LOADER
	help
	  This builds the "test_firmware" module that creates a userspace
	  interface for testing firmware loading. This can be used to
	  control the triggering of firmware loading without needing an
	  actual firmware-using device. The contents can be rechecked by
	  userspace.

	  If unsure, say N.

2017-07-12 14:33:43 -07:00
config TEST_SYSCTL
	tristate "sysctl test driver"
	depends on PROC_SYSCTL
	help
	  This builds the "test_sysctl" module. This driver allows testing the
	  proc sysctl interfaces available to drivers safely without affecting
	  production knobs which might alter system functionality.

	  If unsure, say N.

2020-07-29 14:58:49 -03:00
config BITFIELD_KUNIT
2022-04-05 12:06:19 -07:00
	tristate "KUnit test bitfield functions at runtime" if !KUNIT_ALL_TESTS
2020-07-29 14:58:49 -03:00
	depends on KUNIT
2022-04-05 12:06:19 -07:00
	default KUNIT_ALL_TESTS
2020-07-29 14:58:49 -03:00
	help
	  Enable this option to test the bitfield functions at boot.

	  KUnit tests run during boot and output the results to the debug log
	  in TAP format (http://testanything.org/). Only useful for kernel devs
	  running the KUnit test harness, and not intended for inclusion into a
	  production build.

	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2023-05-10 20:10:02 -05:00
config CHECKSUM_KUNIT
	tristate "KUnit test checksum functions at runtime" if !KUNIT_ALL_TESTS
	depends on KUNIT
	default KUNIT_ALL_TESTS
	help
	  Enable this option to test the checksum functions at boot.

	  KUnit tests run during boot and output the results to the debug log
	  in TAP format (http://testanything.org/). Only useful for kernel devs
	  running the KUnit test harness, and not intended for inclusion into a
	  production build.

	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2022-01-19 18:09:15 -08:00
config HASH_KUNIT_TEST
	tristate "KUnit Test for integer hash functions" if !KUNIT_ALL_TESTS
	depends on KUNIT
	default KUNIT_ALL_TESTS
	help
	  Enable this option to test the kernel's string (<linux/stringhash.h>), and
	  integer (<linux/hash.h>) hash functions on boot.

	  KUnit tests run during boot and output the results to the debug log
	  in TAP format (https://testanything.org/). Only useful for kernel devs
	  running the KUnit test harness, and not intended for inclusion into a
	  production build.

	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  This is intended to help people writing architecture-specific
	  optimized versions. If unsure, say N.

2020-11-03 22:45:08 +02:00
config RESOURCE_KUNIT_TEST
2022-04-05 12:06:19 -07:00
	tristate "KUnit test for resource API" if !KUNIT_ALL_TESTS
2020-11-03 22:45:08 +02:00
	depends on KUNIT
2022-04-05 12:06:19 -07:00
	default KUNIT_ALL_TESTS
2024-09-06 11:07:13 +08:00
	select GET_FREE_REGION
2020-11-03 22:45:08 +02:00
	help
	  This builds the resource API unit test.
	  Tests the logic of API provided by resource.c and ioport.h.
	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2019-09-23 02:02:47 -07:00
config SYSCTL_KUNIT_TEST
2020-05-11 15:14:29 +02:00
	tristate "KUnit test for sysctl" if !KUNIT_ALL_TESTS
2019-09-23 02:02:47 -07:00
	depends on KUNIT
2020-05-11 15:14:29 +02:00
	default KUNIT_ALL_TESTS
2019-09-23 02:02:47 -07:00
	help
	  This builds the proc sysctl unit test, which runs on boot.
	  Tests the API contract and implementation correctness of sysctl.
	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2019-10-24 15:46:31 -07:00
config LIST_KUNIT_TEST
2020-05-11 15:14:29 +02:00
	tristate "KUnit Test for Kernel Linked-list structures" if !KUNIT_ALL_TESTS
2019-10-24 15:46:31 -07:00
	depends on KUNIT
2020-05-11 15:14:29 +02:00
	default KUNIT_ALL_TESTS
2019-10-24 15:46:31 -07:00
	help
	  This builds the linked list KUnit test suite.
	  It tests the API and basic functionality of the list_head type
	  and associated macros.

	  KUnit tests run during boot and output the results to the debug log
2020-08-11 18:34:50 -07:00
	  in TAP format (https://testanything.org/). Only useful for kernel devs
2019-10-24 15:46:31 -07:00
	  running the KUnit test harness, and not intended for inclusion into a
	  production build.

	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2023-01-25 22:54:49 +00:00
config HASHTABLE_KUNIT_TEST
	tristate "KUnit Test for Kernel Hashtable structures" if !KUNIT_ALL_TESTS
	depends on KUNIT
	default KUNIT_ALL_TESTS
	help
	  This builds the hashtable KUnit test suite.
	  It tests the basic functionality of the API defined in
	  include/linux/hashtable.h. For more information on KUnit and
	  unit tests in general please refer to the KUnit documentation
	  in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2020-05-08 18:40:43 +03:00
config LINEAR_RANGES_TEST
	tristate "KUnit test for linear_ranges"
	depends on KUNIT
	select LINEAR_RANGES
	help
	  This builds the linear_ranges unit test, which runs on boot.
	  Tests the linear_ranges logic correctness.
	  For more information on KUnit and unit tests in general please refer
2020-12-15 20:43:34 -08:00
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

config CMDLINE_KUNIT_TEST
2022-04-05 12:06:19 -07:00
	tristate "KUnit test for cmdline API" if !KUNIT_ALL_TESTS
2020-12-15 20:43:34 -08:00
	depends on KUNIT
2022-04-05 12:06:19 -07:00
	default KUNIT_ALL_TESTS
2020-12-15 20:43:34 -08:00
	help
	  This builds the cmdline API unit test.
	  Tests the logic of API provided by cmdline.c.
	  For more information on KUnit and unit tests in general please refer
2020-05-08 18:40:43 +03:00
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2020-08-11 18:35:03 -07:00
config BITS_TEST
2022-04-05 12:06:19 -07:00
	tristate "KUnit test for bits.h" if !KUNIT_ALL_TESTS
2020-08-11 18:35:03 -07:00
	depends on KUNIT
2022-04-05 12:06:19 -07:00
	default KUNIT_ALL_TESTS
2020-08-11 18:35:03 -07:00
	help
	  This builds the bits unit test.
	  Tests the logic of macros defined in bits.h.
	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2021-06-28 19:34:33 -07:00
config SLUB_KUNIT_TEST
	tristate "KUnit test for SLUB cache error detection" if !KUNIT_ALL_TESTS
	depends on SLUB_DEBUG && KUNIT
	default KUNIT_ALL_TESTS
	help
	  This builds the SLUB allocator unit test.
	  Tests SLUB cache debugging functionality.
	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2021-06-30 18:55:52 -07:00
config RATIONAL_KUNIT_TEST
	tristate "KUnit test for rational.c" if !KUNIT_ALL_TESTS
2021-09-07 19:58:36 -07:00
	depends on KUNIT && RATIONAL
2021-06-30 18:55:52 -07:00
	default KUNIT_ALL_TESTS
	help
	  This builds the rational math unit test.
	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2021-06-25 17:45:15 -07:00
config MEMCPY_KUNIT_TEST
	tristate "Test memcpy(), memmove(), and memset() functions at runtime" if !KUNIT_ALL_TESTS
	depends on KUNIT
	default KUNIT_ALL_TESTS
	help
	  Builds unit tests for memcpy(), memmove(), and memset() functions.
	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2022-08-26 09:21:15 -07:00
config IS_SIGNED_TYPE_KUNIT_TEST
	tristate "Test is_signed_type() macro" if !KUNIT_ALL_TESTS
	depends on KUNIT
	default KUNIT_ALL_TESTS
	help
	  Builds unit tests for the is_signed_type() macro.

	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2022-02-16 14:17:49 -08:00
config OVERFLOW_KUNIT_TEST
	tristate "Test check_*_overflow() functions at runtime" if !KUNIT_ALL_TESTS
	depends on KUNIT
	default KUNIT_ALL_TESTS
	help
	  Builds unit tests for the check_*_overflow(), size_*(), allocation, and
	  related functions.

	  For more information on KUnit and unit tests in general please refer
	  to the KUnit documentation in Documentation/dev-tools/kunit/.

	  If unsure, say N.

2022-02-16 16:03:41 -08:00
config STACKINIT_KUNIT_TEST
	tristate "Test level of stack variable initialization" if !KUNIT_ALL_TESTS
	depends on KUNIT
	default KUNIT_ALL_TESTS
	help
	  Test if the kernel is zero-initializing stack variables and
	  padding. Coverage is controlled by compiler flags,
	  CONFIG_INIT_STACK_ALL_PATTERN, CONFIG_INIT_STACK_ALL_ZERO,
	  CONFIG_GCC_PLUGIN_STRUCTLEAK, CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF,
	  or CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL.

2022-09-02 13:02:26 -07:00
config FORTIFY_KUNIT_TEST
	tristate "Test fortified str*() and mem*() function internals at runtime" if !KUNIT_ALL_TESTS
2023-04-07 12:27:08 -07:00
	depends on KUNIT
2022-09-02 13:02:26 -07:00
	default KUNIT_ALL_TESTS
	help
	  Builds unit tests for checking internals of FORTIFY_SOURCE as used
	  by the str*() and mem*() family of functions. For testing runtime
	  traps of FORTIFY_SOURCE, see LKDTM's "FORTIFY_*" tests.

2022-08-29 14:47:06 +02:00
config HW_BREAKPOINT_KUNIT_TEST
	bool "Test hw_breakpoint constraints accounting" if !KUNIT_ALL_TESTS
	depends on HAVE_HW_BREAKPOINT
	depends on KUNIT=y
	default KUNIT_ALL_TESTS
	help
	  Tests for hw_breakpoint constraints accounting.

	  If unsure, say N.

2022-10-02 19:45:23 -07:00
config SIPHASH_KUNIT_TEST
	tristate "Perform selftest on siphash functions" if !KUNIT_ALL_TESTS
	depends on KUNIT
	default KUNIT_ALL_TESTS
	help
	  Enable this option to test the kernel's siphash (<linux/siphash.h>) hash
	  functions on boot (or module load).

	  This is intended to help people writing architecture-specific
	  optimized versions. If unsure, say N.

2024-06-12 12:59:19 -07:00
config USERCOPY_KUNIT_TEST
	tristate "KUnit Test for user/kernel boundary protections"
	depends on KUNIT
	default KUNIT_ALL_TESTS
	help
	  This builds the "usercopy_kunit" module that runs sanity checks
	  on the copy_to/from_user infrastructure, making sure basic
	  user/kernel boundary testing is working.

2014-06-16 14:58:32 -07:00
config TEST_UDELAY
	tristate "udelay test driver"
	help
	  This builds the "udelay_test" module that helps to make sure
	  that udelay() is working properly.

	  If unsure, say N.

2015-08-03 11:42:57 +02:00
config TEST_STATIC_KEYS
	tristate "Test static keys"
2015-07-30 03:59:44 +00:00
	depends on m
	help
2015-08-03 11:42:57 +02:00
	  Test the static key interfaces.
2015-07-30 03:59:44 +00:00

	  If unsure, say N.

2022-09-04 15:40:45 -06:00
config TEST_DYNAMIC_DEBUG
	tristate "Test DYNAMIC_DEBUG"
	depends on DYNAMIC_DEBUG
	help
	  This module registers a tracer callback to count enabled
	  pr_debugs in a 'do_debugging' function, then alters their
	  enablements, calls the function, and compares counts.

	  If unsure, say N.

kmod: add test driver to stress test the module loader
This adds a new stress test driver for kmod: the kernel module loader.
The new stress test driver, test_kmod, is only enabled as a module right
now. It should be possible to load this as built-in and load tests
early (refer to the force_init_test module parameter). However, since a
lot of tests can get a system out of memory fast, we leave this disabled
for now.
Using a system with 1024 MiB of RAM can *easily* get your kernel OOM
fast with this test driver.
The test_kmod driver exposes API knobs for us to fine tune simple
request_module() and get_fs_type() calls. Since each of these API calls
only takes one parameter, a test driver for them is rather simple.
Other factors that can help the test driver, though, are the number of
calls we issue and knowing the current limitations of each. This exposes
configuration as much as possible through userspace to be able to build
tests directly from userspace.
Since it allows multiple misc devices, it will eventually (once we add a
knob to let us create new devices at will) also be possible to perform
more tests in parallel, provided you have enough memory.
We only enable tests we know work as of right now.
Demo screenshots:
# tools/testing/selftests/kmod/kmod.sh
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0002_driver: OK! - loading kmod test
kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0002_fs: OK! - loading kmod test
kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0003: OK! - loading kmod test
kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0004: OK! - loading kmod test
kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
XXX: add test result for 0007
Test completed
You can also request specific tests:
# tools/testing/selftests/kmod/kmod.sh -t 0001
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
Test completed
Lastly, the currently available tests:
# tools/testing/selftests/kmod/kmod.sh --help
Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
Valid tests: 0001-0009
0001 - Simple test - 1 thread for empty string
0002 - Simple test - 1 thread for modules/filesystems that do not exist
0003 - Simple test - 1 thread for get_fs_type() only
0004 - Simple test - 2 threads for get_fs_type() only
0005 - multithreaded tests with default setup - request_module() only
0006 - multithreaded tests with default setup - get_fs_type() only
0007 - multithreaded tests with default setup test request_module() and get_fs_type()
0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
The following test cases currently fail; as such, they are not
enabled by default:
# tools/testing/selftests/kmod/kmod.sh -t 0008
# tools/testing/selftests/kmod/kmod.sh -t 0009
To be sure to run them as intended please unload both of the modules:
o test_module
o xfs
And ensure they are not loaded on your system prior to testing them. If
you use these partitions for your rootfs you can change the default test
driver used for get_fs_type() by exporting it into your environment. For
examples of other test defaults you can override, refer to kmod.sh
allow_user_defaults().
Behind the scenes, this is how we fine-tune a test case prior to
hitting a trigger to run it:
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
Finally to trigger:
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
The kmod.sh script uses the above constructs to build different test cases.
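The sequence above (configure the knobs, then hit the trigger) can be sketched as a small planner that builds the list of sysfs writes. This is only an illustration: the file names come from the commands shown above, while plan_test_run() and apply_plan() are hypothetical helpers and not part of kmod.sh. Building the plan touches nothing on the system; applying it needs root and a loaded test_kmod module.

```python
# Directory taken from the commands in the commit message.
SYSFS_DIR = "/sys/devices/virtual/misc/test_kmod0"


def plan_test_run(test_case, test_fs, num_threads, sysfs_dir=SYSFS_DIR):
    """Return the ordered (path, value) writes for one test run (hypothetical helper)."""
    return [
        (f"{sysfs_dir}/config_test_case", str(test_case)),
        (f"{sysfs_dir}/config_test_fs", test_fs),
        (f"{sysfs_dir}/config_num_threads", str(num_threads)),
        # The trigger comes last, once all knobs are set.
        (f"{sysfs_dir}/trigger_config", "1"),
    ]


def apply_plan(writes):
    """Perform the writes, without a trailing newline, like `echo -n`."""
    for path, value in writes:
        with open(path, "w") as knob:
            knob.write(value)


if __name__ == "__main__":
    for path, value in plan_test_run(2, "ext4", 80):
        print(f'echo -n "{value}" > {path}')
```

Keeping the plan separate from the writes makes it possible to inspect or log a test case before it is triggered.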
A bit of interpretation of the current failures follows, first two
premises:
a) When request_module() is used userspace figures out an optimized
version of module order for us. Once it finds the modules it needs, as
per depmod symbol dep map, it will finit_module() the respective
modules which are needed for the original request_module() request.
b) We have an optimization in place whereby if a kernel uses
request_module() on a module already loaded we never bother userspace
as the module already is loaded. This is all handled by kernel/kmod.c.
A few things to consider to help identify root causes of issues:
0) kmod 19 has a broken heuristic for modules being assumed to be
built-in to your kernel and will return 0 even though request_module()
failed. Upgrade to a newer version of kmod.
1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs",
not for "xfs". The optimization in kernel described in b) fails to
catch if we have a lot of consecutive get_fs_type() calls. The reason
is the optimization in place does not look for aliases. This means two
consecutive get_fs_type() calls will bump kmod_concurrent, whereas
request_module() will not.
This is one explanation of why test case 0009 fails at least once for
get_fs_type().
2) If a module fails to load --- for whatever reason (kmod_concurrent
limit reached, file not yet present due to rootfs switch, out of
memory) we have a period of time during which module request for the
same name either with request_module() or get_fs_type() will *also*
fail to load even if the file for the module is ready.
This explains why *multiple* NULLs are possible on test 0009.
3) finit_module() consumes quite a bit of memory.
4) Filesystems typically also have more dependent modules than other
modules. It's important to note, though, that even though a get_fs_type()
call does not incur additional kmod_concurrent bumps, since userspace
loads dependencies it finds it needs via finit_module_fd(), it *will*
take much more memory to load a module with a lot of dependencies.
Because of 3) and 4) we will easily run into out of memory failures with
certain tests. For instance test 0006 fails on qemu with 1024 MiB of RAM.
It panics a box after reaping all userspace processes and still not
having enough memory to reap.
[arnd@arndb.de: add dependencies for test module]
Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-14 14:50:08 -07:00
config TEST_KMOD
	tristate "kmod stress tester"
	depends on m
	depends on NETDEVICES && NET_CORE && INET # for TUN
2019-04-25 22:23:44 -07:00
	depends on BLOCK
2022-01-19 18:10:28 -08:00
	depends on PAGE_SIZE_LESS_THAN_256KB # for BTRFS
kmod: add test driver to stress test the module loader
This adds a new stress test driver for kmod: the kernel module loader.
The new stress test driver, test_kmod, is only enabled as a module right
now. It should be possible to load this as built-in and load tests
early (refer to the force_init_test module parameter). However, since a
lot of tests can get a system out of memory fast, we leave this disabled
for now.
Using a system with 1024 MiB of RAM can *easily* get your kernel OOM
fast with this test driver.
The test_kmod driver exposes API knobs for us to fine tune simple
request_module() and get_fs_type() calls. Since each of these API calls
only takes one parameter, a test driver for them is rather simple.
Other factors that can help the test driver, though, are the number of
calls we issue and knowing the current limitations of each. This exposes
configuration as much as possible through userspace to be able to build
tests directly from userspace.
Since it allows multiple misc devices, it will eventually (once we add a
knob to let us create new devices at will) also be possible to perform
more tests in parallel, provided you have enough memory.
We only enable tests we know work as of right now.
Demo screenshots:
# tools/testing/selftests/kmod/kmod.sh
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0002_driver: OK! - loading kmod test
kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0002_fs: OK! - loading kmod test
kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0003: OK! - loading kmod test
kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0004: OK! - loading kmod test
kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
XXX: add test result for 0007
Test completed
You can also request specific tests:
# tools/testing/selftests/kmod/kmod.sh -t 0001
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
Test completed
Lastly, the currently available tests:
# tools/testing/selftests/kmod/kmod.sh --help
Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
Valid tests: 0001-0009
0001 - Simple test - 1 thread for empty string
0002 - Simple test - 1 thread for modules/filesystems that do not exist
0003 - Simple test - 1 thread for get_fs_type() only
0004 - Simple test - 2 threads for get_fs_type() only
0005 - multithreaded tests with default setup - request_module() only
0006 - multithreaded tests with default setup - get_fs_type() only
0007 - multithreaded tests with default setup test request_module() and get_fs_type()
0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
The following test cases currently fail, as such they are not currently
enabled by default:
# tools/testing/selftests/kmod/kmod.sh -t 0008
# tools/testing/selftests/kmod/kmod.sh -t 0009
To be sure to run them as intended please unload both of the modules:
o test_module
o xfs
And ensure they are not loaded on your system prior to testing them. If
you use these partitions for your rootfs you can change the default test
driver used for get_fs_type() by exporting it into your environment. For
examples of other test defaults you can override, refer to kmod.sh
allow_user_defaults().
Behind the scenes this is how we fine tune a test case prior to
hitting a trigger to run it:
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
Finally to trigger:
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
The kmod.sh script uses the above constructs to build different test cases.
A bit of interpretation of the current failures follows, first two
premises:
a) When request_module() is used userspace figures out an optimized
version of module order for us. Once it finds the modules it needs, as
per depmod symbol dep map, it will finit_module() the respective
modules which are needed for the original request_module() request.
b) We have an optimization in place whereby if a kernel uses
request_module() on a module already loaded we never bother userspace
as the module already is loaded. This is all handled by kernel/kmod.c.
A few things to consider to help identify root causes of issues:
0) kmod 19 has a broken heuristic for modules being assumed to be
built-in to your kernel and will return 0 even though request_module()
failed. Upgrade to a newer version of kmod.
1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs",
not for "xfs". The optimization in kernel described in b) fails to
catch if we have a lot of consecutive get_fs_type() calls. The reason
is the optimization in place does not look for aliases. This means two
consecutive get_fs_type() calls will bump kmod_concurrent, whereas
request_module() will not.
This is one explanation why test case 0009 fails at least once for
get_fs_type().
2) If a module fails to load --- for whatever reason (kmod_concurrent
limit reached, file not yet present due to rootfs switch, out of
memory) we have a period of time during which module requests for the
same name either with request_module() or get_fs_type() will *also*
fail to load even if the file for the module is ready.
This explains why *multiple* NULLs are possible on test 0009.
3) finit_module() consumes quite a bit of memory.
4) Filesystems typically also have more dependent modules than other
modules; it's important to note though that even though a get_fs_type()
call does not incur additional kmod_concurrent bumps, since userspace
loads dependencies it finds it needs via finit_module_fd(), it *will*
take much more memory to load a module with a lot of dependencies.
Because of 3) and 4) we will easily run into out of memory failures with
certain tests. For instance test 0006 fails on qemu with 1024 MiB of RAM.
It panics a box after reaping all userspace processes and still not
having enough memory to reap.
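Point 1) above can be illustrated with a small user-space sketch. This is
not the code in kernel/kmod.c: alias_for_fs() and already_loaded() are
hypothetical helpers. The only details taken from the text are that a
filesystem named "xfs" is requested under the alias "fs-xfs", and that the
already-loaded shortcut does not look at aliases.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the alias a get_fs_type() lookup would
 * request, e.g. "xfs" -> "fs-xfs". Returns false if out is too small.
 */
static bool alias_for_fs(const char *fs, char *out, size_t len)
{
	int n = snprintf(out, len, "fs-%s", fs);

	return n >= 0 && (size_t)n < len;
}

/*
 * Hypothetical stand-in for the "is it already loaded?" shortcut:
 * an exact name match against a NULL-terminated list of module names.
 */
static bool already_loaded(const char *const *loaded, const char *name)
{
	for (; *loaded; loaded++)
		if (strcmp(*loaded, name) == 0)
			return true;
	return false;
}
```

With "xfs" in the loaded list, already_loaded(loaded, "xfs") is true while
the lookup for the alias "fs-xfs" is false, so under this model every
get_fs_type() call would still be handed to userspace.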
[arnd@arndb.de: add dependencies for test module]
Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-14 14:50:08 -07:00
|
|
|
select TEST_LKM
|
|
|
|
select XFS_FS
|
|
|
|
select TUN
|
|
|
|
select BTRFS_FS
|
|
|
|
help
|
|
|
|
Test the kernel's module loading mechanism: kmod. kmod implements
|
|
|
|
support to load modules using the Linux kernel's usermode helper.
|
|
|
|
This test provides a series of tests against kmod.
|
|
|
|
|
|
|
|
Although technically you can either build test_kmod as a module or
|
|
|
|
into the kernel we disallow building it into the kernel since
|
|
|
|
it stress tests request_module() and this will very likely cause
|
|
|
|
some issues by taking over precious threads available from other
|
|
|
|
module load requests, ultimately this could be fatal.
|
|
|
|
|
|
|
|
To run tests run:
|
|
|
|
|
|
|
|
tools/testing/selftests/kmod/kmod.sh --help
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2017-09-08 16:15:31 -07:00
|
|
|
config TEST_DEBUG_VIRTUAL
|
|
|
|
tristate "Test CONFIG_DEBUG_VIRTUAL feature"
|
|
|
|
depends on DEBUG_VIRTUAL
|
|
|
|
help
|
|
|
|
Test the kernel's ability to detect incorrect calls to
|
|
|
|
virt_to_phys() done against the non-linear part of the
|
|
|
|
kernel's virtual address map.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2018-10-05 15:43:05 +03:00
|
|
|
config TEST_MEMCAT_P
|
|
|
|
tristate "Test memcat_p() helper function"
|
|
|
|
help
|
|
|
|
Test the memcat_p() helper for correctly merging two
|
|
|
|
pointer arrays together.
|
|
|
|
|
|
|
|
If unsure, say N.
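For orientation, merging two pointer arrays can be sketched in user space
as follows. This is an illustration, not the kernel helper itself:
ptr_array_len() and merge_ptr_arrays() are hypothetical names, malloc()
stands in for kernel allocation, and the arrays are assumed to be
NULL-terminated.

```c
#include <stddef.h>
#include <stdlib.h>

/* Count the entries of a NULL-terminated pointer array (terminator excluded). */
static size_t ptr_array_len(void **arr)
{
	size_t n = 0;

	while (arr[n])
		n++;
	return n;
}

/*
 * Concatenate two NULL-terminated pointer arrays into a newly allocated,
 * NULL-terminated array. Returns NULL if the allocation fails; the caller
 * frees the result.
 */
static void **merge_ptr_arrays(void **a, void **b)
{
	size_t na = ptr_array_len(a), nb = ptr_array_len(b), i;
	void **out = malloc((na + nb + 1) * sizeof(*out));

	if (!out)
		return NULL;
	for (i = 0; i < na; i++)
		out[i] = a[i];
	for (i = 0; i < nb; i++)
		out[na + i] = b[i];
	out[na + nb] = NULL;
	return out;
}
```

A test for such a helper would typically check that the entries of the
first array come first, that the order within each array is preserved, and
that the result ends with a single NULL.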
|
|
|
|
|
2018-11-14 08:22:28 +00:00
|
|
|
config TEST_OBJAGG
|
|
|
|
tristate "Perform selftest on object aggregation manager"
|
|
|
|
default n
|
|
|
|
depends on OBJAGG
|
|
|
|
help
|
|
|
|
Enable this option to test object aggregation manager on boot
|
|
|
|
(or module load).
|
|
|
|
|
2019-07-16 16:27:27 -07:00
|
|
|
config TEST_MEMINIT
|
|
|
|
tristate "Test heap/page initialization"
|
|
|
|
help
|
|
|
|
Test if the kernel is zero-initializing heap and page allocations.
|
|
|
|
This can be useful to test init_on_alloc and init_on_free features.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2020-04-22 12:50:26 -07:00
|
|
|
config TEST_HMM
|
|
|
|
tristate "Test HMM (Heterogeneous Memory Management)"
|
|
|
|
depends on TRANSPARENT_HUGEPAGE
|
|
|
|
depends on DEVICE_PRIVATE
|
|
|
|
select HMM_MIRROR
|
|
|
|
select MMU_NOTIFIER
|
|
|
|
help
|
|
|
|
This is a pseudo device driver solely for testing HMM.
|
|
|
|
Say M here if you want to build the HMM test module.
|
|
|
|
Doing so will allow you to run tools/testing/selftests/vm/hmm-tests.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2020-10-13 16:56:04 -07:00
|
|
|
config TEST_FREE_PAGES
|
|
|
|
tristate "Test freeing pages"
|
|
|
|
help
|
|
|
|
Test that a memory leak does not occur due to a race between
|
|
|
|
freeing a block of pages and a speculative page reference.
|
|
|
|
Loading this module is safe if your kernel has the bug fixed.
|
|
|
|
If the bug is not fixed, it will leak gigabytes of memory and
|
|
|
|
probably OOM your system.
|
|
|
|
|
2020-06-18 16:37:37 +02:00
|
|
|
config TEST_FPU
|
|
|
|
tristate "Test floating point operations in kernel space"
|
2024-03-29 00:18:30 -07:00
|
|
|
depends on ARCH_HAS_KERNEL_FPU_SUPPORT && !KCOV_INSTRUMENT_ALL
|
2020-06-18 16:37:37 +02:00
|
|
|
help
|
|
|
|
Enable this option to add /sys/kernel/debug/selftest_helpers/test_fpu
|
|
|
|
which will trigger a sequence of floating point operations. This is used
|
|
|
|
for self-testing floating point control register setting in
|
|
|
|
kernel_fpu_begin().
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
clocksource: Provide kernel module to test clocksource watchdog
When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks. It would be good
to have a way of testing the clocksource watchdog's ability to
distinguish between these two causes of clock skew and instability.
Therefore, provide a new clocksource-wdtest module selected by a new
TEST_CLOCKSOURCE_WATCHDOG Kconfig option. This module has a single module
parameter named "holdoff" that provides the number of seconds of delay
before testing should start, which defaults to zero when built as a module
and to 10 seconds when built directly into the kernel. Very large systems
that boot slowly may need to increase the value of this module parameter.
This module uses hand-crafted clocksource structures to do its testing,
thus avoiding messing up timing for the rest of the kernel and for user
applications. This module first verifies that the ->uncertainty_margin
fields of the clocksource structures are set sanely. It then tests the
delay-detection capability of the clocksource watchdog, increasing the
number of consecutive delays injected, first provoking console messages
complaining about the delays and finally forcing a clock-skew event.
Unexpected test results cause at least one WARN_ON_ONCE() console splat.
If there are no splats, the test has passed. Finally, it fuzzes the
value returned from a clocksource to test the clocksource watchdog's
ability to detect time skew.
This module checks the state of its clocksource after each test, and
uses WARN_ON_ONCE() to emit a console splat if there are any failures.
This should enable all types of test frameworks to detect any such
failures.
This facility is intended for diagnostic use only, and should be avoided
on production systems.
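The distinction between "the clock is unstable" and "the reads were
delayed" can be pictured with the following user-space sketch. It is not
the clocksource watchdog code: classify_reading(), the wd_verdict enum and
all of the parameters are hypothetical, chosen only to show one way the two
causes could be told apart.

```c
#include <stdint.h>

enum wd_verdict {
	WD_OK,		/* the two clocks agree within the margin */
	WD_DELAYED,	/* the reads took too long, result is inconclusive */
	WD_SKEWED,	/* reads were prompt, yet the intervals disagree */
};

/*
 * Hypothetical classifier, all arguments in nanoseconds:
 *   wd_interval - time elapsed according to the watchdog clock
 *   cs_interval - time elapsed according to the clock under test
 *   read_delay  - how long the back-to-back reads of both clocks took
 *   max_delay   - longest read_delay still considered trustworthy
 *   margin      - tolerated disagreement between the two intervals
 */
static enum wd_verdict classify_reading(uint64_t wd_interval,
					uint64_t cs_interval,
					uint64_t read_delay,
					uint64_t max_delay,
					uint64_t margin)
{
	uint64_t diff = wd_interval > cs_interval ?
			wd_interval - cs_interval : cs_interval - wd_interval;

	/* A slow read says nothing about the clock itself. */
	if (read_delay > max_delay)
		return WD_DELAYED;
	return diff > margin ? WD_SKEWED : WD_OK;
}
```

In this model an injected delay only ever produces WD_DELAYED, while a
fuzzed clocksource value that is read promptly produces WD_SKEWED.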
Reported-by: Chris Mason <clm@fb.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-5-paulmck@kernel.org
2021-05-27 12:01:23 -07:00
|
|
|
config TEST_CLOCKSOURCE_WATCHDOG
|
|
|
|
tristate "Test clocksource watchdog in kernel space"
|
|
|
|
depends on CLOCKSOURCE_WATCHDOG
|
|
|
|
help
|
|
|
|
Enable this option to create a kernel module that will trigger
|
|
|
|
a test of the clocksource watchdog. This module may be loaded
|
|
|
|
via modprobe or insmod in which case it will run upon being
|
|
|
|
loaded, or it may be built in, in which case it will run
|
|
|
|
shortly after boot.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2023-10-17 21:56:51 +08:00
|
|
|
config TEST_OBJPOOL
|
|
|
|
tristate "Test module for correctness and stress of objpool"
|
|
|
|
default n
|
|
|
|
depends on m && DEBUG_KERNEL
|
|
|
|
help
|
|
|
|
This builds the "test_objpool" module that should be used for
|
|
|
|
correctness verification and concurrent testing of object
|
|
|
|
allocation and reclamation.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2018-02-06 15:38:38 -08:00
|
|
|
endif # RUNTIME_TESTING_MENU
|
2017-10-13 15:57:33 -07:00
|
|
|
|
2021-04-29 22:55:15 -07:00
|
|
|
config ARCH_USE_MEMTEST
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
An architecture should select this when it uses early_memtest()
|
|
|
|
during boot process.
|
|
|
|
|
2017-10-13 15:57:33 -07:00
|
|
|
config MEMTEST
|
|
|
|
bool "Memtest"
|
2021-04-29 22:55:15 -07:00
|
|
|
depends on ARCH_USE_MEMTEST
|
2020-06-14 01:50:22 +09:00
|
|
|
help
|
2017-10-13 15:57:33 -07:00
|
|
|
This option adds a kernel parameter 'memtest', which allows memtest
|
2021-04-29 22:55:15 -07:00
|
|
|
to be set and executed.
|
2017-10-13 15:57:33 -07:00
|
|
|
memtest=0, means disabled; -- default
|
|
|
|
memtest=1, means do 1 test pattern;
|
|
|
|
...
|
|
|
|
memtest=17, means do 17 test patterns.
|
|
|
|
If you are unsure how to answer this question, answer N.
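What "do N test patterns" amounts to can be sketched in user space as
below. This is an illustration under stated assumptions, not the kernel's
memtest code: the four patterns, test_one_pattern() and run_patterns() are
hypothetical, and an ordinary buffer stands in for physical memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative patterns only; the kernel's own table is not reproduced here. */
static const uint64_t sketch_patterns[] = {
	0x0000000000000000ULL,
	0xffffffffffffffffULL,
	0x5555555555555555ULL,
	0xaaaaaaaaaaaaaaaaULL,
};

/* Fill buf with one pattern, read it back, and count the words that differ. */
static size_t test_one_pattern(volatile uint64_t *buf, size_t words,
			       uint64_t pattern)
{
	size_t i, bad = 0;

	for (i = 0; i < words; i++)
		buf[i] = pattern;
	for (i = 0; i < words; i++)
		if (buf[i] != pattern)
			bad++;
	return bad;
}

/*
 * Run `count` passes, cycling through the pattern table; count == 0
 * corresponds to memtest=0 and leaves the buffer untouched.
 */
static size_t run_patterns(volatile uint64_t *buf, size_t words,
			   unsigned int count)
{
	size_t n = sizeof(sketch_patterns) / sizeof(sketch_patterns[0]);
	size_t bad = 0;
	unsigned int i;

	for (i = 0; i < count; i++)
		bad += test_one_pattern(buf, words, sketch_patterns[i % n]);
	return bad;
}
```

On healthy memory every pass reads back exactly what was written, so the
returned count of bad words is zero for any number of patterns.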
|
|
|
|
|
2015-11-19 18:19:29 -08:00
|
|
|
|
2018-07-31 13:39:31 +02:00
|
|
|
|
2019-10-03 17:01:49 -04:00
|
|
|
config HYPERV_TESTING
|
|
|
|
bool "Microsoft Hyper-V driver testing"
|
|
|
|
default n
|
|
|
|
depends on HYPERV && DEBUG_FS
|
|
|
|
help
|
|
|
|
Select this option to enable Hyper-V vmbus testing.
|
|
|
|
|
2019-12-17 20:51:56 -08:00
|
|
|
endmenu # "Kernel Testing and Coverage"
|
|
|
|
|
2021-07-03 16:42:57 +02:00
|
|
|
menu "Rust hacking"
|
|
|
|
|
|
|
|
config RUST_DEBUG_ASSERTIONS
|
|
|
|
bool "Debug assertions"
|
|
|
|
depends on RUST
|
|
|
|
help
|
|
|
|
Enables rustc's `-Cdebug-assertions` codegen option.
|
|
|
|
|
|
|
|
This flag lets you turn `cfg(debug_assertions)` conditional
|
|
|
|
compilation on or off. This can be used to enable extra debugging
|
|
|
|
code in development but not in production. For example, it controls
|
|
|
|
the behavior of the standard library's `debug_assert!` macro.
|
|
|
|
|
|
|
|
Note that this will apply to all Rust code, including `core`.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
|
|
|
config RUST_OVERFLOW_CHECKS
|
|
|
|
bool "Overflow checks"
|
|
|
|
default y
|
|
|
|
depends on RUST
|
|
|
|
help
|
|
|
|
Enables rustc's `-Coverflow-checks` codegen option.
|
|
|
|
|
|
|
|
This flag allows you to control the behavior of runtime integer
|
|
|
|
overflow. When overflow-checks are enabled, a Rust panic will occur
|
|
|
|
on overflow.
|
|
|
|
|
|
|
|
Note that this will apply to all Rust code, including `core`.
|
|
|
|
|
|
|
|
If unsure, say Y.
|
|
|
|
|
2022-11-10 17:41:37 +01:00
|
|
|
config RUST_BUILD_ASSERT_ALLOW
|
|
|
|
bool "Allow unoptimized build-time assertions"
|
|
|
|
depends on RUST
|
|
|
|
help
|
2024-10-06 16:02:44 +02:00
|
|
|
Controls how `build_error!` and `build_assert!` are handled during the build.
|
2022-11-10 17:41:37 +01:00
|
|
|
|
|
|
|
If calls to them exist in the binary, it may indicate a violated invariant
|
|
|
|
or that the optimizer failed to verify the invariant during compilation.
|
|
|
|
|
|
|
|
This should not happen, thus by default the build is aborted. However,
|
|
|
|
as an escape hatch, you can choose Y here to ignore them during build
|
|
|
|
and let the check be carried out at runtime (with `panic!` being called if
|
|
|
|
the check fails).
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
rust: support running Rust documentation tests as KUnit ones
Rust has documentation tests: these are typically examples of
usage of any item (e.g. function, struct, module...).
They are very convenient because they are just written
alongside the documentation. For instance:
/// Sums two numbers.
///
/// ```
/// assert_eq!(mymod::f(10, 20), 30);
/// ```
pub fn f(a: i32, b: i32) -> i32 {
a + b
}
In userspace, the tests are collected and run via `rustdoc`.
Using the tool as-is would be useful already, since it allows
compile-testing most tests (thus ensuring they are kept
in sync with the code they document) and run those that do not
depend on in-kernel APIs.
However, by transforming the tests into a KUnit test suite,
they can also be run inside the kernel. Moreover, the tests
get to be compiled as other Rust kernel objects instead of
targeting userspace.
On top of that, the integration with KUnit means the Rust
support gets to reuse the existing testing facilities. For
instance, the kernel log would look like:
KTAP version 1
1..1
KTAP version 1
# Subtest: rust_doctests_kernel
1..59
# rust_doctest_kernel_build_assert_rs_0.location: rust/kernel/build_assert.rs:13
ok 1 rust_doctest_kernel_build_assert_rs_0
# rust_doctest_kernel_build_assert_rs_1.location: rust/kernel/build_assert.rs:56
ok 2 rust_doctest_kernel_build_assert_rs_1
# rust_doctest_kernel_init_rs_0.location: rust/kernel/init.rs:122
ok 3 rust_doctest_kernel_init_rs_0
...
# rust_doctest_kernel_types_rs_2.location: rust/kernel/types.rs:150
ok 59 rust_doctest_kernel_types_rs_2
# rust_doctests_kernel: pass:59 fail:0 skip:0 total:59
# Totals: pass:59 fail:0 skip:0 total:59
ok 1 rust_doctests_kernel
Therefore, add support for running Rust documentation tests
in KUnit. Some other notes about the current implementation
and support follow.
The transformation is performed by a couple of scripts written
as Rust hostprogs.
Tests using the `?` operator are also supported as usual, e.g.:
/// ```
/// # use kernel::{spawn_work_item, workqueue};
/// spawn_work_item!(workqueue::system(), || pr_info!("x"))?;
/// # Ok::<(), Error>(())
/// ```
The tests are also compiled with Clippy under `CLIPPY=1`, just
like normal code, thus also benefitting from extra linting.
The names of the tests are currently automatically generated.
This reduces the burden for documentation writers,
while keeping them fairly stable for bisection. This is an
improvement over the `rustdoc`-generated names, which include
the line number; but ideally we would like to get `rustdoc` to
provide the Rust item path and a number (for multiple examples
in a single documented Rust item).
In order for developers to easily see from which original line
a failed doctest came, a KTAP diagnostic line is printed
to the log, containing the location (file and line) of the
original test (i.e. instead of the location in the generated
Rust file):
# rust_doctest_kernel_types_rs_2.location: rust/kernel/types.rs:150
This line follows the syntax for declaring test metadata in the
proposed KTAP v2 spec [1], which may be used for the proposed
KUnit test attributes API [2]. Thus hopefully this will make
migration easier later on (suggested by David [3]).
The original line in that test attribute is figured out by
providing an anchor (suggested by Boqun [4]). The original file
is found by walking the filesystem, checking directory prefixes
to reduce the amount of combinations to check, and it is only
done once per file. Ambiguities are detected and reported.
A notable difference from KUnit C tests is that the Rust tests
appear to assert using the usual `assert!` and `assert_eq!`
macros from the Rust standard library (`core`). We provide
a custom version that forwards the call to KUnit instead.
Importantly, these macros do not require passing context,
unlike the KUnit C ones (i.e. `struct kunit *`). This makes
them easier to use, and readers of the documentation do not need
to care about which testing framework is used. In addition, it
may allow us to test third-party code more easily in the future.
However, a current limitation is that KUnit does not support
assertions in other tasks. Thus we presently simply print an
error to the kernel log if an assertion actually failed. This
should be revisited to properly fail the test, perhaps saving
the context somewhere else, or letting KUnit handle it.
Link: https://lore.kernel.org/lkml/20230420205734.1288498-1-rmoar@google.com/ [1]
Link: https://lore.kernel.org/linux-kselftest/20230707210947.1208717-1-rmoar@google.com/ [2]
Link: https://lore.kernel.org/rust-for-linux/CABVgOSkOLO-8v6kdAGpmYnZUb+LKOX0CtYCo-Bge7r_2YTuXDQ@mail.gmail.com/ [3]
Link: https://lore.kernel.org/rust-for-linux/ZIps86MbJF%2FiGIzd@boqun-archlinux/ [4]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-07-18 07:27:51 +02:00
|
|
|
config RUST_KERNEL_DOCTESTS
|
|
|
|
bool "Doctests for the `kernel` crate" if !KUNIT_ALL_TESTS
|
|
|
|
depends on RUST && KUNIT=y
|
|
|
|
default KUNIT_ALL_TESTS
|
|
|
|
help
|
|
|
|
This builds the documentation tests of the `kernel` crate
|
|
|
|
as KUnit tests.
|
|
|
|
|
|
|
|
For more information on KUnit and unit tests in general,
|
|
|
|
please refer to the KUnit documentation in Documentation/dev-tools/kunit/.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2021-07-03 16:42:57 +02:00
|
|
|
endmenu # "Rust"
|
|
|
|
|
2018-07-31 13:39:31 +02:00
|
|
|
endmenu # Kernel hacking
|
2024-09-09 21:10:34 -04:00
|
|
|
|
|
|
|
config INT_POW_TEST
|
|
|
|
tristate "Integer exponentiation (int_pow) test" if !KUNIT_ALL_TESTS
|
|
|
|
depends on KUNIT
|
|
|
|
default KUNIT_ALL_TESTS
|
|
|
|
help
|
|
|
|
This option enables the KUnit test suite for the int_pow function,
|
|
|
|
which performs integer exponentiation. The test suite is designed to
|
|
|
|
verify that the implementation of int_pow correctly computes the power
|
|
|
|
of a given base raised to a given exponent.
|
|
|
|
|
|
|
|
Enabling this option will include tests that check various scenarios
|
|
|
|
and edge cases to ensure the accuracy and reliability of the exponentiation
|
|
|
|
function.
|
|
|
|
|
|
|
|
If unsure, say N.
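As a rough picture of what such a test exercises, integer exponentiation
can be sketched in user space with square-and-multiply. sketch_int_pow() is
a hypothetical name, and the sketch assumes unsigned 64-bit arithmetic that
wraps on overflow; the kernel's int_pow may differ in detail.

```c
#include <stdint.h>

/* Exponentiation by squaring; results wrap modulo 2^64. */
static uint64_t sketch_int_pow(uint64_t base, unsigned int exp)
{
	uint64_t result = 1;

	while (exp) {
		if (exp & 1)
			result *= base;	/* this bit of the exponent contributes */
		exp >>= 1;
		base *= base;		/* square for the next bit */
	}
	return result;
}
```

Edge cases worth covering include an exponent of zero (the result is 1 for
any base, including zero), a base of zero or one, and exponents large
enough to overflow 64 bits.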
|