mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
synced 2025-01-01 10:42:11 +00:00
tracing/Documentation: Start a document on how to debug with tracing
Add a new document Documentation/trace/debugging.rst that will hold various ways to debug tracing. This initial version mentions trace_printk and how to create persistent buffers that can last across bootups.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineeth Pillai <vineeth@bitbyteword.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Aring <aahringo@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Jonathan Corbet" <corbet@lwn.net>
Link: https://lore.kernel.org/20240823014019.702433486@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
parent ef2bd81d0c
commit 2fcd5aff92
@@ -6785,6 +6785,8 @@
reserve_mem=12M:4096:trace trace_instance=boot_map^traceoff^traceprintk@trace,sched,irq

See also Documentation/trace/debugging.rst

trace_options=[option-list]
[FTRACE] Enable or disable tracer options at boot.
159
Documentation/trace/debugging.rst
Normal file
@@ -0,0 +1,159 @@
==============================
Using the tracer for debugging
==============================

Copyright 2024 Google LLC.

:Author: Steven Rostedt <rostedt@goodmis.org>
:License: The GNU Free Documentation License, Version 1.2
          (dual licensed under the GPL v2)

- Written for: 6.12

Introduction
------------
The tracing infrastructure can be very useful for debugging the Linux
kernel. This document is a place to add various methods of using the tracer
for debugging.

First, make sure that the tracefs file system is mounted::

  $ sudo mount -t tracefs tracefs /sys/kernel/tracing
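
tracefs may already be mounted, for example by the init system. In that
case a small helper can check /proc/mounts before mounting. This is a
minimal sketch in POSIX shell. The names "tracefs_is_mounted" and
"ensure_tracefs" are hypothetical helpers, not part of any kernel tool::

  #!/bin/sh
  # Hypothetical helpers: mount tracefs only when it is not mounted yet.
  TRACEFS=${TRACEFS:-/sys/kernel/tracing}

  # Succeeds if mount point $1 has a tracefs entry in the mount table
  # ($2, default /proc/mounts; fields are: source mountpoint type ...).
  tracefs_is_mounted() {
      awk -v mp="$1" '$2 == mp && $3 == "tracefs" { found = 1 }
                      END { exit !found }' "${2:-/proc/mounts}"
  }

  # Mount tracefs at $TRACEFS unless it is already there.
  ensure_tracefs() {
      if ! tracefs_is_mounted "$TRACEFS"; then
          sudo mount -t tracefs tracefs "$TRACEFS"
      fi
  }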


Using trace_printk()
--------------------

trace_printk() is a very lightweight utility that can be used in any context
inside the kernel, with the exception of "noinstr" sections. It can be used
in normal, softirq, interrupt and even NMI context. The trace data is
written to the tracing ring buffer in a lockless way. To make it even
lighter weight, when possible, it only records the pointer to the format
string and saves the raw arguments into the buffer. The format and the
arguments are post-processed when the ring buffer is read. This way, the
trace_printk() format conversions are not done during the hot path, where
the trace is being recorded.
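
You can see this split in practice from user space. The tracefs file
"printk_formats" lists the format strings that are referenced by address.
Reading the "trace" file is what produces the formatted text. A minimal
sketch follows. It assumes tracefs is mounted at /sys/kernel/tracing and is
run as root, and the function names are hypothetical::

  #!/bin/sh
  TRACEFS=${TRACEFS:-/sys/kernel/tracing}

  # Format strings known to the tracer, one "address : format" entry per
  # line. Tools that read the raw buffer (such as trace-cmd) can use these
  # to format the recorded events.
  show_printk_formats() {
      cat "$TRACEFS/printk_formats"
  }

  # Reading "trace" is when the saved raw arguments get formatted.
  show_trace() {
      cat "$TRACEFS/trace"
  }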

trace_printk() is meant only for debugging, and should never be added to
a subsystem of the kernel. If you need debugging traces, add trace events
instead. If a trace_printk() is present in the kernel, the following notice
will appear in dmesg::

  **********************************************************
  **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
  **                                                      **
  ** trace_printk() being used. Allocating extra memory.  **
  **                                                      **
  ** This means that this is a DEBUG kernel and it is     **
  ** unsafe for production use.                           **
  **                                                      **
  ** If you see this message and you are not debugging    **
  ** the kernel, report this immediately to your vendor!  **
  **                                                      **
  **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
  **********************************************************
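
A quick way to tell whether a running kernel contains a trace_printk() is
to search the kernel log for this notice. A minimal sketch follows; the
function name is hypothetical. Reading dmesg may require root, and on a
long-running system the notice may already have scrolled out of the kernel
log buffer::

  #!/bin/sh
  # Succeeds if the log text on stdin contains the trace_printk() notice.
  has_trace_printk_notice() {
      grep -qF 'trace_printk() being used. Allocating extra memory.'
  }

  # Example:
  #   sudo dmesg | has_trace_printk_notice && echo "trace_printk() in use"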

Debugging kernel crashes
------------------------
There are various methods of acquiring the state of the system when a kernel
crash occurs. This could be from the oops message in printk, or one could
use kexec/kdump. But these only show the state at the time of the crash.
It can be very useful to know what happened leading up to the crash.
The tracing ring buffer, by default, is a circular buffer that will
overwrite older events with newer ones. When a crash happens, the ring
buffer holds the most recent events that led up to the crash.

There are several kernel command line parameters that can help with
this. The first is "ftrace_dump_on_oops". This will dump the tracing ring
buffer to the console when an oops occurs. This can be useful if the console
is being logged somewhere. If a serial console is used, it may be prudent to
make sure the ring buffer is relatively small. Otherwise, dumping the ring
buffer may take anywhere from several minutes to hours to finish. Here's an
example of the kernel command line::

  ftrace_dump_on_oops trace_buf_size=50K

Note that the tracing buffer is made up of per CPU buffers, where each of
these buffers is broken up into sub-buffers that are PAGE_SIZE by default.
The trace_buf_size option above sets each of the per CPU buffers to 50K,
so on a machine with 8 CPUs, that's actually 400K in total.
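
The same settings can also be changed at run time, and the per CPU
arithmetic can be checked against the "buffer_total_size_kb" file. The
sketch below is minimal and should be run as root. The helper names are
hypothetical, and it assumes one buffer per online CPU, as in the 8 CPU
example above::

  #!/bin/sh
  TRACEFS=${TRACEFS:-/sys/kernel/tracing}
  PROCSYS=${PROCSYS:-/proc/sys}

  # Run-time counterpart of "ftrace_dump_on_oops" on the command line.
  enable_dump_on_oops() {
      echo 1 > "$PROCSYS/kernel/ftrace_dump_on_oops"
  }

  # Run-time counterpart of "trace_buf_size": buffer_size_kb is the size
  # of EACH per CPU buffer, in kilobytes.
  set_per_cpu_buffer_kb() {
      echo "$1" > "$TRACEFS/buffer_size_kb"
  }

  # Expected total: per CPU size in KB ($1) times the number of CPUs
  # ($2, defaulting to the number of online CPUs).
  expected_total_kb() {
      ncpus=${2:-$(getconf _NPROCESSORS_ONLN)}
      echo $(( $1 * ncpus ))
  }

  # Example:
  #   set_per_cpu_buffer_kb 50
  #   expected_total_kb 50                   # 400 on an 8 CPU machine
  #   cat "$TRACEFS/buffer_total_size_kb"    # compare with the kernel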

Persistent buffers across boots
-------------------------------
If the system memory allows it, the tracing ring buffer can be placed at
a specific location in memory. If the location is the same across boots and
the memory is not modified, the tracing buffer can be retrieved on the
following boot. There are two ways to reserve memory for the use of the ring
buffer.

The more reliable way (on x86) is to reserve memory with the "memmap" kernel
command line option and then use that memory for the trace_instance. This
requires a bit of knowledge of the physical memory layout of the system. The
advantage of using this method is that the memory for the ring buffer will
always be the same::

  memmap=12M$0x284500000 trace_instance=boot_map@0x284500000:12M

The memmap above reserves 12 megabytes of memory at the physical memory
location 0x284500000. Then the trace_instance option will create a trace
instance "boot_map" at that same location with the same amount of memory
reserved. As the ring buffer is broken up into per CPU buffers, the 12
megabytes will be divided evenly between those CPUs. If you have 8 CPUs,
each per CPU ring buffer will be 1.5 megabytes in size. Note that this also
includes meta data, so the amount of memory actually used by the ring buffer
will be slightly smaller.
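
The per CPU figure above is simple division. The minimal sketch below does
the same arithmetic for other sizes and ignores the meta data. The helper
names are hypothetical, and only the K, M and G size suffixes are handled::

  #!/bin/sh
  # Convert a size such as "12M" into kilobytes (plain numbers are bytes).
  to_kb() {
      case $1 in
          *K) echo "${1%K}" ;;
          *M) echo $(( ${1%M} * 1024 )) ;;
          *G) echo $(( ${1%G} * 1024 * 1024 )) ;;
          *)  echo $(( $1 / 1024 )) ;;
      esac
  }

  # Even split of a reserved region ($1) across $2 CPUs, in kilobytes,
  # before the meta data is taken out.
  per_cpu_share_kb() {
      echo $(( $(to_kb "$1") / $2 ))
  }

  # Example:
  #   per_cpu_share_kb 12M 8    # prints 1536, i.e. 1.5 megabytes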

Another, more generic but less robust, way to allocate a ring buffer mapping
at boot is with the "reserve_mem" option::

  reserve_mem=12M:4096:trace trace_instance=boot_map@trace

The reserve_mem option above will find 12 megabytes that are available at
boot up and align them to 4096 bytes. It labels this memory "trace" so
that later command line options can refer to it.

The trace_instance option creates a "boot_map" instance and will use the
memory reserved by reserve_mem that was labeled as "trace". This method is
more generic but may not be as reliable. Due to KASLR, the memory reserved
by reserve_mem may not be at the same location on every boot. If this
happens, the ring buffer will not contain the data from the previous boot
and will be reset.

Sometimes a larger alignment can keep KASLR from moving things around in a
way that changes the location of the reserve_mem memory. With a larger
alignment, you may find that the buffer is placed more consistently::

  reserve_mem=12M:0x2000000:trace trace_instance=boot_map@trace
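
The alignment is given in bytes, so 4096 is 4 KB while 0x2000000 is
32 MB. The minimal sketch below converts such values and checks whether an
address falls on a given alignment; the helper names are hypothetical.
Shell arithmetic accepts the 0x prefix directly::

  #!/bin/sh
  # Print an alignment value (decimal or 0x hex) in kilobytes.
  align_kb() {
      echo $(( $1 / 1024 ))
  }

  # Succeeds if address $1 is a multiple of alignment $2.
  is_aligned() {
      [ $(( $1 % $2 )) -eq 0 ]
  }

  # Examples:
  #   align_kb 4096                         # prints 4
  #   align_kb 0x2000000                    # prints 32768 (32 MB)
  #   is_aligned 0x284500000 0x2000000 || echo "not 32 MB aligned"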

On boot up, the memory reserved for the ring buffer is validated. It goes
through a series of tests to make sure that the ring buffer contains valid
data. If it does, the buffer is set up to be available to read from the
instance. If it fails any of the tests, the entire ring buffer is cleared
and initialized as new.
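
Whether or not validation succeeded, the instance can be read like any
other trace instance. The minimal sketch below copies its contents to a
file before anything new is recorded. It assumes the instance is named
"boot_map" as in the examples above, and the helper name and default output
file name are hypothetical::

  #!/bin/sh
  TRACEFS=${TRACEFS:-/sys/kernel/tracing}
  BOOT_INSTANCE=${BOOT_INSTANCE:-boot_map}

  # Copy the boot instance's trace (the previous boot's events, if the
  # buffer passed validation) to the file named by $1.
  save_boot_trace() {
      cat "$TRACEFS/instances/$BOOT_INSTANCE/trace" > "${1:-boot_map.trace}"
  }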

The layout of this mapped memory may not be consistent from kernel to
kernel, so only the same kernel is guaranteed to work if the mapping is
preserved. Switching to a different kernel version may find a different
layout and mark the buffer as invalid.

Using trace_printk() in the boot instance
-----------------------------------------
By default, the content of trace_printk() goes into the top level tracing
instance. But this instance is never preserved across boots. To have the
trace_printk() content and some other internal tracing (like dump stacks)
go to the preserved buffer, either set the instance to be the trace_printk()
destination from the kernel command line, or set it after boot up via the
trace_printk_dest option.

After boot up::

  echo 1 > /sys/kernel/tracing/instances/boot_map/options/trace_printk_dest

From the kernel command line::

  reserve_mem=12M:4096:trace trace_instance=boot_map^traceprintk^traceoff@trace

If setting it from the kernel command line, it is recommended to also
disable tracing with the "traceoff" flag, and to enable tracing after boot
up. Otherwise, the trace from the most recent boot will be mixed with the
trace from the previous boot, which may make it confusing to read.
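
With "traceoff" on the command line, recording in the boot instance stays
disabled until it is turned back on through the instance's "tracing_on"
file. Ideally, do this after the previous boot's trace has been read or
saved. The minimal sketch below should be run as root. It assumes the
instance is named "boot_map", and the helper name is hypothetical::

  #!/bin/sh
  TRACEFS=${TRACEFS:-/sys/kernel/tracing}
  BOOT_INSTANCE=${BOOT_INSTANCE:-boot_map}

  # Start recording in the boot instance that was created with ^traceoff.
  boot_instance_tracing_on() {
      echo 1 > "$TRACEFS/instances/$BOOT_INSTANCE/tracing_on"
  }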