License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner, Kate Stewart, and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
  SPDX license identifier                             # files
  ---------------------------------------------------|-------
  GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". The results were:
  SPDX license identifier                             # files
  ---------------------------------------------------|-------
  GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
  SPDX license identifier                             # files
  ---------------------------------------------------|------
  GPL-2.0 WITH Linux-syscall-note                        270
  GPL-2.0+ WITH Linux-syscall-note                       169
  ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)     21
  ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)     17
  LGPL-2.1+ WITH Linux-syscall-note                       15
  GPL-1.0+ WITH Linux-syscall-note                        14
  ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)     5
  LGPL-2.0+ WITH Linux-syscall-note                        4
  LGPL-2.1 WITH Linux-syscall-note                         3
  ((GPL-2.0 WITH Linux-syscall-note) OR MIT)               3
  ((GPL-2.0 WITH Linux-syscall-note) AND MIT)              1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based in part on an older version of FOSSology, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 15:07:57 +01:00
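The tagging rule for files with no license information can be sketched in C. This is an illustrative model only, not the script that Thomas and Greg wrote: the helper names `is_uapi_path`, `is_header` and `spdx_line` are hypothetical, and the comment styles used here (a block comment for headers, a `//` comment for .c files) are an assumption taken from the kernel's documented SPDX convention rather than from this commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true when the path has a "/uapi/" component. */
static int is_uapi_path(const char *path)
{
	return strstr(path, "/uapi/") != NULL;
}

/* Hypothetical helper: true when the path ends in ".h". */
static int is_header(const char *path)
{
	size_t len = strlen(path);

	return len >= 2 && strcmp(path + len - 2, ".h") == 0;
}

/*
 * Format the SPDX line for a file that carried no license information.
 * uapi files get the syscall note, everything else plain GPL-2.0.
 * Returns what snprintf() returns.
 */
static int spdx_line(const char *path, char *buf, size_t size)
{
	const char *id = is_uapi_path(path) ?
		"GPL-2.0 WITH Linux-syscall-note" : "GPL-2.0";

	assert(buf != NULL && size > 0);

	if (is_header(path))
		return snprintf(buf, size,
				"/* SPDX-License-Identifier: %s */", id);
	return snprintf(buf, size, "// SPDX-License-Identifier: %s", id);
}
```

Called with `include/linux/ftrace.h`, the sketch produces the same first line that this commit added to the header below.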
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Ftrace header. For implementation details beyond the random comments
 * scattered below, see: Documentation/trace/ftrace-design.rst
 */

#ifndef _LINUX_FTRACE_H
#define _LINUX_FTRACE_H

#include <linux/trace_recursion.h>
#include <linux/trace_clock.h>
#include <linux/jump_label.h>
#include <linux/kallsyms.h>
#include <linux/linkage.h>
#include <linux/bitops.h>
#include <linux/ptrace.h>
#include <linux/ktime.h>
#include <linux/sched.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/fs.h>

#include <asm/ftrace.h>

/*
 * If the arch supports passing the variable contents of
 * function_trace_op as the third parameter back from the
 * mcount call, then the arch should define this as 1.
 */
#ifndef ARCH_SUPPORTS_FTRACE_OPS
#define ARCH_SUPPORTS_FTRACE_OPS 0
#endif

#ifdef CONFIG_TRACING
extern void ftrace_boot_snapshot(void);
#else
static inline void ftrace_boot_snapshot(void) { }
#endif

struct ftrace_ops;
struct ftrace_regs;
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. In these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).
Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.
On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible for calls from core kernel text.
Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.
Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.
This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.
Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).
The CALL_OPS approach avoids these problems and is safe as:
* The existing synchronization in ftrace_shutdown() using
synchronize_rcu_tasks_rude() (and
synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.
Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.
* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatched
ops pointer, and updates to the ops pointer do not require special
care.
A subsequent patch will implement architecture support for arm64. There
should be no functional change as a result of this patch alone.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 13:45:56 +00:00
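The difference between the list-based path and the CALL_OPS path described in this commit message can be modelled in plain user-space C. This is a simplified sketch under stated assumptions: every type and function name here (`ftrace_ops_model`, `call_site_model`, `shared_trampoline`, `list_func`, `count_hit`) is hypothetical, the per-ops filter check is omitted, and the plain pointer load stands in for the atomic load that a real trampoline would perform.

```c
#include <assert.h>
#include <stddef.h>

struct ftrace_ops_model;
typedef void (*func_model_t)(unsigned long ip, struct ftrace_ops_model *ops);

/* Hypothetical stand-in for struct ftrace_ops. */
struct ftrace_ops_model {
	func_model_t func;
	unsigned long hits;
	struct ftrace_ops_model *next;	/* used only by the list path */
};

/* A call site with its ops pointer literal stored at a fixed place. */
struct call_site_model {
	struct ftrace_ops_model *ops;	/* replaced with one pointer store */
	unsigned long ip;
};

/* CALL_OPS path: load the pointer once, then call through that copy. */
static void shared_trampoline(struct call_site_model *site)
{
	struct ftrace_ops_model *ops = site->ops;

	assert(ops != NULL);
	ops->func(site->ip, ops);
}

/* List path: cost grows with the number of registered ops. */
static void list_func(unsigned long ip, struct ftrace_ops_model *head)
{
	struct ftrace_ops_model *ops;

	for (ops = head; ops != NULL; ops = ops->next)
		ops->func(ip, ops);
}

/* Example callback: count how often this ops was invoked. */
static void count_hit(unsigned long ip, struct ftrace_ops_model *ops)
{
	(void)ip;
	ops->hits++;
}
```

Because the trampoline calls `func` through the same pointer value it loaded, retargeting a call site by storing a new pointer never pairs one ops with another ops's callback, which is the property the commit message relies on.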
struct dyn_ftrace;

#ifdef CONFIG_FUNCTION_TRACER
/*
 * If the arch's mcount caller does not support all of ftrace's
 * features, then it must call an indirect function that
 * does. Or at least does enough to prevent any unwelcome side effects.
 *
 * Also define the function prototype that these architectures use
 * to call the ftrace_ops_list_func().
 */
#if !ARCH_SUPPORTS_FTRACE_OPS
# define FTRACE_FORCE_LIST_FUNC 1
void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
#else
# define FTRACE_FORCE_LIST_FUNC 0
void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
			       struct ftrace_ops *op, struct ftrace_regs *fregs);
#endif
extern const struct ftrace_ops ftrace_nop_ops;
extern const struct ftrace_ops ftrace_list_ops;
struct ftrace_ops *ftrace_find_unique_ops(struct dyn_ftrace *rec);
#endif /* CONFIG_FUNCTION_TRACER */

/* Main tracing buffer and events set up */
#ifdef CONFIG_TRACING
void trace_init(void);
void early_trace_init(void);
#else
static inline void trace_init(void) { }
static inline void early_trace_init(void) { }
#endif

struct module;
struct ftrace_hash;
struct ftrace_direct_func;

#if defined(CONFIG_FUNCTION_TRACER) && defined(CONFIG_MODULES) && \
	defined(CONFIG_DYNAMIC_FTRACE)
const char *
ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
		   unsigned long *off, char **modname, char *sym);
#else
static inline const char *
ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
		   unsigned long *off, char **modname, char *sym)
{
	return NULL;
}
#endif

#if defined(CONFIG_FUNCTION_TRACER) && defined(CONFIG_DYNAMIC_FTRACE)
int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
			   char *type, char *name,
			   char *module_name, int *exported);
#else
static inline int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
					 char *type, char *name,
					 char *module_name, int *exported)
{
	return -1;
}
#endif

#ifdef CONFIG_FUNCTION_TRACER

extern int ftrace_enabled;

#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS

struct ftrace_regs {
	struct pt_regs regs;
};
#define arch_ftrace_get_regs(fregs) (&(fregs)->regs)

/*
 * ftrace_regs_set_instruction_pointer() is to be defined by the architecture
 * if to allow setting of the instruction pointer from the ftrace_regs when
 * HAVE_DYNAMIC_FTRACE_WITH_ARGS is set and it supports live kernel patching.
 */
#define ftrace_regs_set_instruction_pointer(fregs, ip) do { } while (0)
#endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */

static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs *fregs)
{
	if (!fregs)
		return NULL;

	return arch_ftrace_get_regs(fregs);
}

/*
 * When true, the ftrace_regs_{get,set}_*() functions may be used on fregs.
 * Note: this can be true even when ftrace_get_regs() cannot provide a pt_regs.
 */
static __always_inline bool ftrace_regs_has_args(struct ftrace_regs *fregs)
{
	if (IS_ENABLED(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS))
		return true;

	return ftrace_get_regs(fregs) != NULL;
}

#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
#define ftrace_regs_get_instruction_pointer(fregs) \
	instruction_pointer(ftrace_get_regs(fregs))
#define ftrace_regs_get_argument(fregs, n) \
	regs_get_kernel_argument(ftrace_get_regs(fregs), n)
#define ftrace_regs_get_stack_pointer(fregs) \
	kernel_stack_pointer(ftrace_get_regs(fregs))
#define ftrace_regs_return_value(fregs) \
	regs_return_value(ftrace_get_regs(fregs))
#define ftrace_regs_set_return_value(fregs, ret) \
	regs_set_return_value(ftrace_get_regs(fregs), ret)
#define ftrace_override_function_with_return(fregs) \
	override_function_with_return(ftrace_get_regs(fregs))
#define ftrace_regs_query_register_offset(name) \
	regs_query_register_offset(name)
#endif

typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
			      struct ftrace_ops *op, struct ftrace_regs *fregs);

ftrace_func_t ftrace_ops_get_func(struct ftrace_ops *ops);

/*
 * FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
 * set in the flags member.
 * CONTROL, SAVE_REGS, SAVE_REGS_IF_SUPPORTED, RECURSION, STUB and
 * IPMODIFY are a kind of attribute flags which can be set only before
 * registering the ftrace_ops, and can not be modified while registered.
 * Changing those attribute flags after registering ftrace_ops will
 * cause unexpected results.
 *
 * ENABLED - set/unset when ftrace_ops is registered/unregistered
 * DYNAMIC - set when ftrace_ops is registered to denote dynamically
 *           allocated ftrace_ops which need special care
 * SAVE_REGS - The ftrace_ops wants regs saved at each function called
 *            and passed to the callback. If this flag is set, but the
 *            architecture does not support passing regs
 *            (CONFIG_DYNAMIC_FTRACE_WITH_REGS is not defined), then the
 *            ftrace_ops will fail to register, unless the next flag
 *            is set.
 * SAVE_REGS_IF_SUPPORTED - This is the same as SAVE_REGS, but if the
 *            handler can handle an arch that does not save regs
 *            (the handler tests if regs == NULL), then it can set
 *            this flag instead. It will not fail registering the ftrace_ops
 *            but, the regs field will be NULL if the arch does not support
 *            passing regs to the handler.
 *            Note, if this flag is set, the SAVE_REGS flag will automatically
 *            get set upon registering the ftrace_ops, if the arch supports it.
 * RECURSION - The ftrace_ops can set this to tell the ftrace infrastructure
 *            that the call back needs recursion protection. If it does
 *            not set this, then the ftrace infrastructure will assume
 *            that the callback can handle recursion on its own.
 * STUB - The ftrace_ops is just a place holder.
 * INITIALIZED - The ftrace_ops has already been initialized (first use time
 *            register_ftrace_function() is called, it will initialized the ops)
 * DELETED - The ops are being deleted, do not let them be registered again.
 * ADDING - The ops is in the process of being added.
 * REMOVING - The ops is in the process of being removed.
 * MODIFYING - The ops is in the process of changing its filter functions.
ftrace/x86: Add dynamic allocated trampoline for ftrace_ops
The current method of handling multiple function callbacks is to register
a list function callback that calls all the other callbacks based on
their hash tables and compare it to the function that the callback was
called on. But this is very inefficient.
For example, if you are tracing all functions in the kernel and then
add a kprobe to a function such that the kprobe uses ftrace, the
mcount trampoline will switch from calling the function trace callback
to calling the list callback that will iterate over all registered
ftrace_ops (in this case, the function tracer and the kprobes callback).
That means for every function being traced it checks the hash of the
ftrace_ops for function tracing and kprobes, even though the kprobes
is only set at a single function. The kprobes ftrace_ops is checked
for every function being traced!
Instead of calling the list function for functions that are only being
traced by a single callback, we can call a dynamically allocated
trampoline that calls the callback directly. The function graph tracer
already uses a direct call trampoline when it is being traced by itself
but it is not dynamically allocated. Its trampoline is static in the
kernel core. The infrastructure that called the function graph trampoline
can also be used to call a dynamically allocated one.
For now, only ftrace_ops that are not dynamically allocated can have
a trampoline. That is, users such as function tracer or stack tracer.
kprobes and perf allocate their ftrace_ops, and until there's a safe
way to free the trampoline, it can not be used. The dynamically allocated
ftrace_ops may, however, use the trampoline if the kernel is not
compiled with CONFIG_PREEMPT. But that will come later.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-02 23:23:31 -04:00
 * ALLOC_TRAMP - A dynamic trampoline was allocated by the core code.
 *            The arch specific code sets this flag when it allocated a
 *            trampoline. This lets the arch know that it can update the
 *            trampoline in case the callback function changes.
 *            The ftrace_ops trampoline can be set by the ftrace users, and
 *            in such cases the arch must not modify it. Only the arch ftrace
 *            core code should set this flag.
 * IPMODIFY - The ops can modify the IP register. This can only be set with
 *            SAVE_REGS. If another ops with this flag set is already registered
 *            for any of the functions that this ops will be registered for, then
 *            this ops will fail to register or set_filter_ip.
 * PID - Is affected by set_ftrace_pid (allows filtering on those pids)
 * RCU - Set when the ops can only be called when RCU is watching.
 * TRACE_ARRAY - The ops->private points to a trace_array descriptor.
 * PERMANENT - Set when the ops is permanent and should not be affected by
 *            ftrace_enabled.
 * DIRECT - Used by the direct ftrace_ops helper for direct functions
 *            (internal ftrace only, should not be used by others)
 */
enum {
	FTRACE_OPS_FL_ENABLED = BIT(0),
	FTRACE_OPS_FL_DYNAMIC = BIT(1),
	FTRACE_OPS_FL_SAVE_REGS = BIT(2),
	FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED = BIT(3),
	FTRACE_OPS_FL_RECURSION = BIT(4),
	FTRACE_OPS_FL_STUB = BIT(5),
	FTRACE_OPS_FL_INITIALIZED = BIT(6),
	FTRACE_OPS_FL_DELETED = BIT(7),
	FTRACE_OPS_FL_ADDING = BIT(8),
	FTRACE_OPS_FL_REMOVING = BIT(9),
	FTRACE_OPS_FL_MODIFYING = BIT(10),
	FTRACE_OPS_FL_ALLOC_TRAMP = BIT(11),
	FTRACE_OPS_FL_IPMODIFY = BIT(12),
	FTRACE_OPS_FL_PID = BIT(13),
	FTRACE_OPS_FL_RCU = BIT(14),
	FTRACE_OPS_FL_TRACE_ARRAY = BIT(15),
	FTRACE_OPS_FL_PERMANENT = BIT(16),
	FTRACE_OPS_FL_DIRECT = BIT(17),
};

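The SAVE_REGS and SAVE_REGS_IF_SUPPORTED rules from the flag comment can be modelled as a small user-space check. This is a sketch under stated assumptions, not the kernel's registration code: `model_check_regs_flags` and the `OPS_FL_*` names are hypothetical, `arch_has_regs` stands in for CONFIG_DYNAMIC_FTRACE_WITH_REGS being defined, and the choice of `-EINVAL` for the failure case is an assumption of this model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BIT(n) (1UL << (n))

/* Hypothetical mirror of two of the flag bits listed above. */
enum {
	OPS_FL_SAVE_REGS = BIT(2),
	OPS_FL_SAVE_REGS_IF_SUPPORTED = BIT(3),
};

/*
 * Apply the registration-time rules to *flags.
 * Returns 0 when registration may proceed, -EINVAL when the ops asked
 * for SAVE_REGS on an arch that cannot provide regs and did not also
 * set SAVE_REGS_IF_SUPPORTED.
 */
static int model_check_regs_flags(unsigned long *flags, int arch_has_regs)
{
	assert(flags != NULL);

	if (arch_has_regs) {
		/* IF_SUPPORTED is upgraded to SAVE_REGS when possible. */
		if (*flags & OPS_FL_SAVE_REGS_IF_SUPPORTED)
			*flags |= OPS_FL_SAVE_REGS;
		return 0;
	}

	if ((*flags & OPS_FL_SAVE_REGS) &&
	    !(*flags & OPS_FL_SAVE_REGS_IF_SUPPORTED))
		return -EINVAL;

	return 0;
}
```

A callback registered with only SAVE_REGS_IF_SUPPORTED passes this check on either kind of arch, which is why such a handler has to test for a NULL regs pointer itself.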
ftrace: Allow IPMODIFY and DIRECT ops on the same function
IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them to work on the same function
at the same time.
First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().
Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.
If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.
If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. Owner of the
DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY,
or an error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that the owner of the DIRECT trampoline can
handle it properly.
For more details, please refer to comment before enum ftrace_ops_cmd.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org
2022-07-19 17:21:24 -07:00
/*
 * FTRACE_OPS_CMD_* commands allow the ftrace core logic to request changes
 * to a ftrace_ops. Note, the requests may fail.
 *
 * ENABLE_SHARE_IPMODIFY_SELF - enable a DIRECT ops to work on the same
 *                              function as an ops with IPMODIFY. Called
 *                              when the DIRECT ops is being registered.
 *                              This is called with both direct_mutex and
 *                              ftrace_lock locked.
 *
 * ENABLE_SHARE_IPMODIFY_PEER - enable a DIRECT ops to work on the same
 *                              function as an ops with IPMODIFY. Called
 *                              when the other ops (the one with IPMODIFY)
 *                              is being registered.
 *                              This is called with direct_mutex locked.
 *
 * DISABLE_SHARE_IPMODIFY_PEER - stop a DIRECT ops from working on the same
 *                               function as an ops with IPMODIFY. Called
 *                               when the other ops (the one with IPMODIFY)
 *                               is being unregistered.
 *                               This is called with direct_mutex locked.
 */
enum ftrace_ops_cmd {
	FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF,
	FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER,
	FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER,
};

/*
 * For most ftrace_ops_cmd,
 * Returns:
 *        0 - Success.
 *        Negative on failure. The return value is dependent on the
 *        callback.
 */
typedef int (*ftrace_ops_func_t)(struct ftrace_ops *op, enum ftrace_ops_cmd cmd);
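To make the ops_func contract more concrete, here is a minimal userspace sketch of what a DIRECT ops owner's callback might look like. It is not kernel code: the enum and typedef are re-declared locally so the block compiles on its own, struct ftrace_ops is reduced to two fields, and struct demo_tramp, demo_ops_func() and the choice of -EBUSY as the failure code are all hypothetical names and values chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Local stand-ins for the kernel declarations; only the shape matters. */
enum ftrace_ops_cmd {
	FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF,
	FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER,
	FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER,
};

struct ftrace_ops;
typedef int (*ftrace_ops_func_t)(struct ftrace_ops *op, enum ftrace_ops_cmd cmd);

/* Heavily reduced: the real struct has many more members. */
struct ftrace_ops {
	ftrace_ops_func_t ops_func;
	void *private;
};

/* Hypothetical owner state for one direct trampoline. */
struct demo_tramp {
	bool can_share;	/* assumption: the owner knows whether it can adapt */
	bool sharing;	/* trampoline currently adjusted to coexist with IPMODIFY */
};

/* Returns 0 on success, a negative errno when sharing is refused. */
int demo_ops_func(struct ftrace_ops *op, enum ftrace_ops_cmd cmd)
{
	struct demo_tramp *tr = op->private;

	switch (cmd) {
	case FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF:
	case FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER:
		if (!tr->can_share)
			return -EBUSY;	/* illustrative error code */
		tr->sharing = true;
		return 0;
	case FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER:
		tr->sharing = false;
		return 0;
	}
	return -EINVAL;
}
```

In this sketch the SELF and PEER enable commands are handled identically; a real owner may need to treat them differently because, per the comment above, they are issued under different locks.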
2014-08-15 17:23:02 -04:00
|
|
|
#ifdef CONFIG_DYNAMIC_FTRACE
|
|
|
|
/* The hash used to know what functions callbacks trace */
|
|
|
|
struct ftrace_ops_hash {
|
2017-06-07 16:12:51 +08:00
|
|
|
struct ftrace_hash __rcu *notrace_hash;
|
|
|
|
struct ftrace_hash __rcu *filter_hash;
|
2014-08-15 17:23:02 -04:00
|
|
|
struct mutex regex_lock;
|
|
|
|
};
|
2017-03-03 16:15:39 -05:00
|
|
|
|
2017-04-03 12:57:35 -04:00
|
|
|
void ftrace_free_init_mem(void);
|
2017-09-01 08:35:38 -04:00
|
|
|
void ftrace_free_mem(struct module *mod, void *start, void *end);
|
2017-03-03 16:15:39 -05:00
|
|
|
#else
|
2022-03-10 21:37:09 -05:00
|
|
|
static inline void ftrace_free_init_mem(void)
|
|
|
|
{
|
|
|
|
ftrace_boot_snapshot();
|
|
|
|
}
|
2017-09-01 08:35:38 -04:00
|
|
|
static inline void ftrace_free_mem(struct module *mod, void *start, void *end) { }
|
2014-08-15 17:23:02 -04:00
|
|
|
#endif
|
|
|
|
|
2013-11-07 09:36:25 -05:00
|
|
|
/*
|
2015-11-30 17:23:39 -05:00
|
|
|
* Note, ftrace_ops can be referenced outside of RCU protection, unless
|
|
|
|
* the RCU flag is set. If ftrace_ops is allocated and not part of kernel
|
|
|
|
* core data, the unregistering of it will perform a scheduling on all CPUs
|
|
|
|
* to make sure that there are no more users. Depending on the load of the
|
|
|
|
* system that may take a bit of time.
|
2013-11-07 09:36:25 -05:00
|
|
|
*
|
|
|
|
* Any private data added must also take care not to be freed and if private
|
|
|
|
* data is added to a ftrace_ops that is in core code, the user of the
|
|
|
|
* ftrace_ops must perform a schedule_on_each_cpu() before freeing it.
|
|
|
|
*/
|
2008-05-12 21:20:42 +02:00
|
|
|
struct ftrace_ops {
|
2011-05-02 12:29:25 -04:00
|
|
|
ftrace_func_t func;
|
2017-06-07 16:12:51 +08:00
|
|
|
struct ftrace_ops __rcu *next;
|
2011-05-04 09:27:52 -04:00
|
|
|
unsigned long flags;
|
2013-11-07 09:36:25 -05:00
|
|
|
void *private;
|
2015-07-24 10:38:12 -04:00
|
|
|
ftrace_func_t saved_func;
|
2011-05-02 12:29:25 -04:00
|
|
|
#ifdef CONFIG_DYNAMIC_FTRACE
|
2014-08-15 17:23:02 -04:00
|
|
|
struct ftrace_ops_hash local_hash;
|
|
|
|
struct ftrace_ops_hash *func_hash;
|
2014-07-24 12:25:47 -04:00
|
|
|
struct ftrace_ops_hash old_hash;
|
ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different from the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller.
Because there is only one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registers.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is nowhere to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-06 21:56:17 -04:00
|
|
|
unsigned long trampoline;
|
2014-11-18 21:14:11 -05:00
|
|
|
unsigned long trampoline_size;
|
2020-05-12 15:19:13 +03:00
|
|
|
struct list_head list;
|
2022-07-19 17:21:24 -07:00
|
|
|
ftrace_ops_func_t ops_func;
|
2011-05-02 12:29:25 -04:00
|
|
|
#endif
|
2008-05-12 21:20:42 +02:00
|
|
|
};
|
|
|
|
|
x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
Booting one of my machines triggered the following crash:
Kernel/User page tables isolation: enabled
ftrace: allocating 36577 entries in 143 pages
Starting tracer 'function'
BUG: unable to handle page fault for address: ffffffffa000005c
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 2014067 P4D 2014067 PUD 2015063 PMD 7b253067 PTE 7b252061
Oops: 0003 [#1] PREEMPT SMP PTI
CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-test+ #24
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
RIP: 0010:text_poke_early+0x4a/0x58
Code: 34 24 48 89 54 24 08 e8 bf 72 0b 00 48 8b 34 24 48 8b 4c 24 08 84 c0 74 0b 48 89 df f3 a4 48 83 c4 10 5b c3 9c 58 fa 48 89 df <f3> a4 50 9d 48 83 c4 10 5b e9 d6 f9 ff ff
0 41 57 49
RSP: 0000:ffffffff82003d38 EFLAGS: 00010046
RAX: 0000000000000046 RBX: ffffffffa000005c RCX: 0000000000000005
RDX: 0000000000000005 RSI: ffffffff825b9a90 RDI: ffffffffa000005c
RBP: ffffffffa000005c R08: 0000000000000000 R09: ffffffff8206e6e0
R10: ffff88807b01f4c0 R11: ffffffff8176c106 R12: ffffffff8206e6e0
R13: ffffffff824f2440 R14: 0000000000000000 R15: ffffffff8206eac0
FS: 0000000000000000(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa000005c CR3: 0000000002012000 CR4: 00000000000006b0
Call Trace:
text_poke_bp+0x27/0x64
? mutex_lock+0x36/0x5d
arch_ftrace_update_trampoline+0x287/0x2d5
? ftrace_replace_code+0x14b/0x160
? ftrace_update_ftrace_func+0x65/0x6c
__register_ftrace_function+0x6d/0x81
ftrace_startup+0x23/0xc1
register_ftrace_function+0x20/0x37
func_set_flag+0x59/0x77
__set_tracer_option.isra.19+0x20/0x3e
trace_set_options+0xd6/0x13e
apply_trace_boot_options+0x44/0x6d
register_tracer+0x19e/0x1ac
early_trace_init+0x21b/0x2c9
start_kernel+0x241/0x518
? load_ucode_intel_bsp+0x21/0x52
secondary_startup_64+0xa4/0xb0
I was able to trigger it on other machines when I added both
"ftrace=function" and "trace_options=func_stack_trace" to the kernel
command line.
The cause is that "ftrace=function" registers the function tracer and
creates a trampoline, which is set executable and read-only. Then
"trace_options=func_stack_trace" updates the same trampoline to include
the stack tracer version of the function tracer. Since the trampoline
already exists, it is updated with text_poke_bp(). The problem is that
when text_poke_bp() is called while system_state == SYSTEM_BOOTING, it
simply does a memcpy() and does not change the page mapping, as it assumes
that the text is still read-write. In this case it is not, so we take a
fault and crash.
Instead, let's keep the ftrace trampolines read-write during boot up,
and then when the kernel executable text is set to read-only, the
ftrace trampolines get set to read-only as well.
Link: https://lkml.kernel.org/r/20200430202147.4dc6e2de@oasis.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Fixes: 768ae4406a5c ("x86/ftrace: Use text_poke()")
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-30 20:21:47 -04:00
|
|
|
extern struct ftrace_ops __rcu *ftrace_ops_list;
|
|
|
|
extern struct ftrace_ops ftrace_list_end;
|
|
|
|
|
|
|
|
/*
|
2020-08-31 11:11:04 +08:00
|
|
|
* Traverse the ftrace_ops_list, invoking all entries. The reason that we
|
2020-04-30 20:21:47 -04:00
|
|
|
* can use rcu_dereference_raw_check() is that elements removed from this list
|
|
|
|
* are simply leaked, so there is no need to interact with a grace-period
|
|
|
|
* mechanism. The rcu_dereference_raw_check() calls are needed to handle
|
2020-08-31 11:11:04 +08:00
|
|
|
* concurrent insertions into the ftrace_ops_list.
|
2020-04-30 20:21:47 -04:00
|
|
|
*
|
|
|
|
* Silly Alpha and silly pointer-speculation compiler optimizations!
|
|
|
|
*/
|
|
|
|
#define do_for_each_ftrace_op(op, list) \
|
|
|
|
op = rcu_dereference_raw_check(list); \
|
|
|
|
do
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Optimized for just a single item in the list (as that is the normal case).
|
|
|
|
*/
|
|
|
|
#define while_for_each_ftrace_op(op) \
|
|
|
|
while (likely(op = rcu_dereference_raw_check((op)->next)) && \
|
|
|
|
unlikely((op) != &ftrace_list_end))
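The two macros above pair up as a do { ... } while loop over a list that ends in a sentinel rather than NULL. The following userspace sketch mirrors that shape so the control flow can be compiled and stepped through. It makes several simplifying assumptions: plain pointer loads replace rcu_dereference_raw_check(), the likely()/unlikely() hints are dropped, and every demo_* name is hypothetical.

```c
#include <stddef.h>

struct demo_op {
	struct demo_op *next;
	int calls;
};

/* Sentinel that terminates the list, playing the role of ftrace_list_end. */
static struct demo_op demo_list_end;

/* Plain loads stand in for rcu_dereference_raw_check() in this sketch. */
#define demo_do_for_each_op(op, list)	\
	op = (list);			\
	do

#define demo_while_for_each_op(op)			\
	while (((op) = (op)->next) != NULL &&		\
	       (op) != &demo_list_end)

/* Visit every entry once and report how many were visited. */
int demo_call_all(struct demo_op *list)
{
	struct demo_op *op;
	int visited = 0;

	demo_do_for_each_op(op, list) {
		op->calls++;
		visited++;
	} demo_while_for_each_op(op);

	return visited;
}
```

Because this is a do/while, the body always runs for the first entry before any termination check, which matches the "optimized for a single item" remark in the comment: with one entry the loop body executes once and the condition is evaluated once.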
|
|
|
|
|
2008-11-16 06:02:06 +01:00
/*
 * Type of the current tracing.
 */
enum ftrace_tracing_type_t {
	FTRACE_TYPE_ENTER = 0, /* Hook the call of the function */
	FTRACE_TYPE_RETURN,	/* Hook the return of the function */
};

/* Current tracing type, default is FTRACE_TYPE_ENTER */
extern enum ftrace_tracing_type_t ftrace_tracing_type;

2008-05-12 21:20:42 +02:00
/*
 * The ftrace_ops must be static and should also
 * be read_mostly.  These functions do modify read_mostly variables
 * so use them sparingly. Never free an ftrace_op or modify the
 * next pointer after it has been registered. Even after unregistering
 * it, the next pointer may still be used internally.
 */
int register_ftrace_function(struct ftrace_ops *ops);
int unregister_ftrace_function(struct ftrace_ops *ops);

2011-08-09 12:50:46 -04:00
|
|
|
extern void ftrace_stub(unsigned long a0, unsigned long a1,
|
2020-10-28 17:42:17 -04:00
|
|
|
struct ftrace_ops *op, struct ftrace_regs *fregs);
|
2008-05-12 21:20:42 +02:00
|
|
|
|
2022-05-10 14:26:13 +02:00
|
|
|
|
|
|
|
int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs);
|
2008-10-06 19:06:12 -04:00
|
|
|
#else /* !CONFIG_FUNCTION_TRACER */
|
2010-05-04 11:24:01 -04:00
|
|
|
/*
|
|
|
|
* (un)register_ftrace_function must be a macro since the ops parameter
|
|
|
|
* must not be evaluated.
|
|
|
|
*/
|
|
|
|
#define register_ftrace_function(ops) ({ 0; })
|
|
|
|
#define unregister_ftrace_function(ops) ({ 0; })
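The reason given above for using macros rather than inline functions can be demonstrated in a few lines. This is a standalone sketch, not the kernel header: demo_register(), demo_make_ops() and demo_side_effects are hypothetical names, and the ({ 0; }) statement expression is a GNU C extension, so GCC or Clang is assumed.

```c
/*
 * Same stub style as above: the macro body never mentions its parameter,
 * so the preprocessor discards whatever expression the caller passes.
 */
#define demo_register(ops)	({ 0; })

static int demo_side_effects;

/* Would bump the counter if the macro argument were ever evaluated. */
void *demo_make_ops(void)
{
	demo_side_effects++;
	return 0;
}

/* Expands to ({ 0; }); demo_make_ops() is never called. */
int demo_try_register(void)
{
	return demo_register(demo_make_ops());
}

/* Lets a caller check that no side effect happened. */
int demo_side_effect_count(void)
{
	return demo_side_effects;
}
```

An inline function stub would still evaluate its argument, so callers could not pass an ops object that only exists when the tracer is configured in.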
|
2008-10-23 09:33:02 -04:00
|
|
|
static inline void ftrace_kill(void) { }
|
2017-04-03 12:57:35 -04:00
|
|
|
static inline void ftrace_free_init_mem(void) { }
|
2017-09-01 08:35:38 -04:00
|
|
|
static inline void ftrace_free_mem(struct module *mod, void *start, void *end) { }
|
2022-05-10 14:26:13 +02:00
|
|
|
static inline int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs)
|
|
|
|
{
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
2008-10-06 19:06:12 -04:00
|
|
|
#endif /* CONFIG_FUNCTION_TRACER */
|
2008-05-12 21:20:42 +02:00
|
|
|
|
2019-11-17 17:04:15 -05:00
struct ftrace_func_entry {
	struct hlist_node hlist;
	unsigned long ip;
	unsigned long direct; /* for direct lookup only */
};

2019-11-08 13:07:06 -05:00
|
|
|
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
|
2019-11-08 13:12:57 -05:00
|
|
|
extern int ftrace_direct_func_count;
|
2019-11-08 13:07:06 -05:00
|
|
|
int register_ftrace_direct(unsigned long ip, unsigned long addr);
|
|
|
|
int unregister_ftrace_direct(unsigned long ip, unsigned long addr);
|
2019-11-14 14:39:35 -05:00
|
|
|
int modify_ftrace_direct(unsigned long ip, unsigned long old_addr, unsigned long new_addr);
|
2019-11-08 13:11:27 -05:00
|
|
|
struct ftrace_direct_func *ftrace_find_direct_func(unsigned long addr);
|
2019-11-17 17:04:15 -05:00
|
|
|
int ftrace_modify_direct_caller(struct ftrace_func_entry *entry,
|
|
|
|
struct dyn_ftrace *rec,
|
|
|
|
unsigned long old_addr,
|
|
|
|
unsigned long new_addr);
|
2019-12-08 16:01:12 -08:00
|
|
|
unsigned long ftrace_find_rec_direct(unsigned long ip);
|
2021-10-08 11:13:34 +02:00
|
|
|
int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
|
|
|
|
int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
|
2021-10-08 11:13:35 +02:00
|
|
|
int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
|
2022-07-19 17:21:23 -07:00
|
|
|
int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr);
|
2021-10-08 11:13:35 +02:00
|
|
|
|
2019-11-08 13:07:06 -05:00
|
|
|
#else
|
2021-10-08 11:13:34 +02:00
|
|
|
struct ftrace_ops;
|
2019-11-08 13:12:57 -05:00
|
|
|
# define ftrace_direct_func_count 0
|
2019-11-08 13:07:06 -05:00
|
|
|
static inline int register_ftrace_direct(unsigned long ip, unsigned long addr)
|
|
|
|
{
|
2019-11-20 18:32:25 -05:00
|
|
|
return -ENOTSUPP;
|
2019-11-08 13:07:06 -05:00
|
|
|
}
|
|
|
|
static inline int unregister_ftrace_direct(unsigned long ip, unsigned long addr)
|
|
|
|
{
|
2019-11-20 18:32:25 -05:00
|
|
|
return -ENOTSUPP;
|
2019-11-08 13:07:06 -05:00
|
|
|
}
|
2019-11-14 14:39:35 -05:00
|
|
|
static inline int modify_ftrace_direct(unsigned long ip,
|
|
|
|
unsigned long old_addr, unsigned long new_addr)
|
|
|
|
{
|
2019-11-20 18:32:25 -05:00
|
|
|
return -ENOTSUPP;
|
2019-11-14 14:39:35 -05:00
|
|
|
}
|
2019-11-08 13:11:27 -05:00
|
|
|
static inline struct ftrace_direct_func *ftrace_find_direct_func(unsigned long addr)
|
|
|
|
{
|
|
|
|
return NULL;
|
|
|
|
}
|
2019-11-17 17:04:15 -05:00
|
|
|
static inline int ftrace_modify_direct_caller(struct ftrace_func_entry *entry,
|
|
|
|
struct dyn_ftrace *rec,
|
|
|
|
unsigned long old_addr,
|
|
|
|
unsigned long new_addr)
|
|
|
|
{
|
|
|
|
return -ENODEV;
|
|
|
|
}
|
2019-12-08 16:01:12 -08:00
|
|
|
static inline unsigned long ftrace_find_rec_direct(unsigned long ip)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
2021-10-08 11:13:34 +02:00
|
|
|
static inline int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
|
|
|
|
{
|
|
|
|
return -ENODEV;
|
|
|
|
}
|
|
|
|
static inline int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
|
|
|
|
{
|
|
|
|
return -ENODEV;
|
|
|
|
}
|
2021-10-08 11:13:35 +02:00
|
|
|
static inline int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
|
|
|
|
{
|
|
|
|
return -ENODEV;
|
|
|
|
}
|
2022-07-19 17:21:23 -07:00
|
|
|
static inline int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr)
|
|
|
|
{
|
|
|
|
return -ENODEV;
|
|
|
|
}
|
2019-11-08 13:07:06 -05:00
|
|
|
|
|
|
|
/*
|
|
|
|
* This must be implemented by the architecture.
|
|
|
|
* It is the way the ftrace direct_ops helper, when called
|
|
|
|
* via ftrace (because there's other callbacks besides the
|
|
|
|
* direct call), can inform the architecture's trampoline that this
|
|
|
|
* routine has a direct caller, and what the caller is.
|
2019-11-08 13:11:39 -05:00
|
|
|
*
|
|
|
|
* For example, in x86, it returns the direct caller
|
|
|
|
* callback function via the regs->orig_ax parameter.
|
|
|
|
* Then in the ftrace trampoline, if this is set, it makes
|
|
|
|
* the return from the trampoline jump to the direct caller
|
|
|
|
* instead of going back to the function it just traced.
|
2019-11-08 13:07:06 -05:00
|
|
|
*/
|
2022-11-03 17:05:17 +00:00
|
|
|
static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs,
|
2019-11-08 13:07:06 -05:00
|
|
|
unsigned long addr) { }
|
2022-11-03 17:05:17 +00:00
|
|
|
#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
|
2019-11-08 13:07:06 -05:00
|
|
|
|
2008-12-16 23:06:40 -05:00
|
|
|
#ifdef CONFIG_STACK_TRACER
|
2015-10-30 14:25:39 +09:00
|
|
|
|
2008-12-16 23:06:40 -05:00
|
|
|
extern int stack_tracer_enabled;
|
2019-04-25 11:44:54 +02:00
|
|
|
|
2020-06-03 07:52:37 +02:00
|
|
|
int stack_trace_sysctl(struct ctl_table *table, int write, void *buffer,
|
|
|
|
size_t *lenp, loff_t *ppos);
|
2017-04-06 12:26:20 -04:00
|
|
|
|
2017-04-06 15:47:32 -04:00
|
|
|
/* DO NOT MODIFY THIS VARIABLE DIRECTLY! */
|
|
|
|
DECLARE_PER_CPU(int, disable_stack_tracer);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* stack_tracer_disable - temporarily disable the stack tracer
|
|
|
|
*
|
|
|
|
* There are a few locations (namely in RCU) where stack tracing
|
|
|
|
* cannot be executed. This function is used to disable stack
|
|
|
|
* tracing during those critical sections.
|
|
|
|
*
|
|
|
|
* This function must be called with preemption or interrupts
|
|
|
|
* disabled and stack_tracer_enable() must be called shortly after
|
|
|
|
* while preemption or interrupts are still disabled.
|
|
|
|
*/
|
|
|
|
static inline void stack_tracer_disable(void)
|
|
|
|
{
|
2021-03-23 18:49:35 +01:00
|
|
|
/* Preemption or interrupts must be disabled */
|
2017-08-30 05:36:38 -05:00
|
|
|
if (IS_ENABLED(CONFIG_DEBUG_PREEMPT))
|
2017-04-06 15:47:32 -04:00
|
|
|
WARN_ON_ONCE(!preempt_count() || !irqs_disabled());
|
|
|
|
this_cpu_inc(disable_stack_tracer);
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* stack_tracer_enable - re-enable the stack tracer
|
|
|
|
*
|
|
|
|
* After stack_tracer_disable() is called, stack_tracer_enable()
|
|
|
|
* must be called shortly afterward.
|
|
|
|
*/
|
|
|
|
static inline void stack_tracer_enable(void)
|
|
|
|
{
|
2017-08-30 05:36:38 -05:00
|
|
|
if (IS_ENABLED(CONFIG_DEBUG_PREEMPT))
|
2017-04-06 15:47:32 -04:00
|
|
|
WARN_ON_ONCE(!preempt_count() || !irqs_disabled());
|
|
|
|
this_cpu_dec(disable_stack_tracer);
|
|
|
|
}
|
2017-04-06 12:26:20 -04:00
|
|
|
#else
|
|
|
|
static inline void stack_tracer_disable(void) { }
|
|
|
|
static inline void stack_tracer_enable(void) { }
|
2008-12-16 23:06:40 -05:00
|
|
|
#endif
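The disable/enable pair above is a counter rather than a flag, which is what allows nested callers. A small userspace sketch of that idea follows. It assumes a thread-local int (C11 _Thread_local) as a stand-in for the per-CPU variable, it omits the preemption and interrupt checks, and demo_stack_tracer_active() is a hypothetical helper modelling a tracer that does no work while the counter is non-zero.

```c
#include <stdbool.h>

/* One counter per thread stands in for the kernel's per-CPU variable. */
static _Thread_local int demo_disable_stack_tracer;

/* Each disable must be balanced by exactly one enable. */
void demo_stack_tracer_disable(void)
{
	demo_disable_stack_tracer++;
}

void demo_stack_tracer_enable(void)
{
	demo_disable_stack_tracer--;
}

/* Assumed tracer-side check: only trace when nobody has disabled us. */
bool demo_stack_tracer_active(void)
{
	return demo_disable_stack_tracer == 0;
}
```

With a counter, an inner disable/enable pair inside an already disabled region cannot re-enable tracing early; tracing resumes only when the outermost enable brings the count back to zero.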
|
|
|
|
|
ftrace: dynamic enabling/disabling of function calls
This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.
The way this works, is on bootup (if ftrace is enabled), a ftrace
function is registered to record the instruction pointer of all
places that call the function.
Later, if there's still any code to patch, a kthread is awoken
(rate limited to at most once a second) that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time. Typically
the system reaches equilibrium quickly after bootup and there's no code
patching needed at all.
e.g.
  call ftrace  /* 5 bytes */
is replaced with
  jmp 3f  /* jmp is 2 bytes and we jump 3 forward */
3:
When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.
Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.
Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMIs, so all functions that might be called via NMI must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:20:42 +02:00
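The call-to-jmp replacement described in the commit message above can be illustrated on a plain byte buffer. This sketch only manipulates bytes in ordinary memory; it does not patch live code and does nothing resembling stop_machine. It relies on the x86 encodings 0xe8 (call rel32, 5 bytes) and 0xeb (jmp rel8, 2 bytes); the demo_* function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_CALL_SIZE	5	/* x86 "call rel32": opcode 0xe8 + 4-byte offset */
#define DEMO_JMP_SIZE	2	/* x86 "jmp rel8": opcode 0xeb + 1-byte offset */

/* Overwrite the start of a 5-byte call with a short jump past its tail. */
int demo_disable_call(uint8_t *site)
{
	if (site[0] != 0xe8)
		return -1;	/* not a call: refuse to patch */
	site[0] = 0xeb;
	site[1] = DEMO_CALL_SIZE - DEMO_JMP_SIZE;	/* skip the 3 leftover bytes */
	return 0;
}

/* Put the original call back, given the saved 5 bytes. */
void demo_enable_call(uint8_t *site, const uint8_t *saved)
{
	memcpy(site, saved, DEMO_CALL_SIZE);
}
```

The jump offset is 3 because a short jmp is relative to the end of its own 2 bytes, and 3 bytes of the old 5-byte call remain after it.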
|
|
|
#ifdef CONFIG_DYNAMIC_FTRACE
|
2008-11-14 16:21:19 -08:00
|
|
|
|
2022-05-18 10:36:40 +08:00
|
|
|
void ftrace_arch_code_modify_prepare(void);
|
|
|
|
void ftrace_arch_code_modify_post_process(void);
|
2009-02-17 13:35:06 -05:00
|
|
|
|
2015-11-25 12:50:47 -05:00
enum ftrace_bug_type {
	FTRACE_BUG_UNKNOWN,
	FTRACE_BUG_INIT,
	FTRACE_BUG_NOP,
	FTRACE_BUG_CALL,
	FTRACE_BUG_UPDATE,
};
extern enum ftrace_bug_type ftrace_bug_type;

2015-11-25 14:13:11 -05:00
/*
 * Archs can set this to point to a variable that holds the value that was
 * expected at the call site before calling ftrace_bug().
 */
extern const void *ftrace_expected;

2014-10-24 17:56:04 -04:00
|
|
|
void ftrace_bug(int err, struct dyn_ftrace *rec);
|
2011-08-16 09:53:39 -04:00
|
|
|
|
2009-02-16 23:06:01 -05:00
|
|
|
struct seq_file;
|
|
|
|
|
2013-01-09 18:09:20 -05:00
|
|
|
extern int ftrace_text_reserved(const void *start, const void *end);
|
2010-02-02 16:49:11 -05:00
|
|
|
|
2018-01-22 22:32:51 -05:00
|
|
|
struct ftrace_ops *ftrace_ops_trampoline(unsigned long addr);
|
|
|
|
|
2014-11-18 21:14:11 -05:00
|
|
|
bool is_ftrace_trampoline(unsigned long addr);
|
|
|
|
|
2012-04-30 16:20:23 -04:00
|
|
|
/*
|
|
|
|
* The dyn_ftrace record's flags field is split into two parts.
|
|
|
|
* the first part which is '0-FTRACE_REF_MAX' is a counter of
|
|
|
|
* the number of callbacks that have registered the function that
|
|
|
|
* the dyn_ftrace descriptor represents.
|
|
|
|
*
|
|
|
|
* The second part is a mask:
|
|
|
|
* ENABLED - the function is being traced
|
|
|
|
* REGS - the record wants the function to save regs
|
|
|
|
* REGS_EN - the function is set up to save regs.
|
2014-11-21 05:25:16 -05:00
|
|
|
* IPMODIFY - the record allows for the IP address to be changed.
|
ftrace: Add infrastructure for delayed enabling of module functions
Qiu Peiyang pointed out that there's a race when enabling function tracing
and loading a module. In order to make the modifications of converting nops
in the prologue of functions into callbacks, the text needs to be converted
from read-only to read-write. When enabling function tracing, the text
permission is updated, the functions are modified, and then they are put
back.
When loading a module, the updates to convert function calls to mcount are
done before the module text is set to read-only. But after it is done, the
module text is visible to the function tracer. Thus we have the following
race:
 CPU 0                               CPU 1
 -----                               -----
 start function tracing
 set text to read-write
                                     load_module
                                     add functions to ftrace
                                     set module text read-only
 update all functions to callbacks
 modify module functions too
 < Can't, it's read-only >
When this happens, ftrace detects the issue and disables itself till the
next reboot.
To fix this, a new DISABLED flag is added for ftrace records, which all
module functions get when they are added. Then later, after the module code
is all set, the records will have the DISABLED flag cleared, and they will
be enabled if any callback wants all functions to be traced.
Note, this doesn't add the delay to later. It simply changes the
ftrace_module_init() to do both the setting of DISABLED records, and then
immediately calls the enable code. This helps with testing this new code as
it has the same behavior as previously. Another change will come after this
to have the ftrace_module_enable() called after the text is set to
read-only.
Cc: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-01-07 15:40:01 -05:00
|
|
|
* DISABLED - the record is not ready to be touched yet
|
2019-11-08 13:07:06 -05:00
|
|
|
* DIRECT - there is a direct function to call
|
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. In these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).
Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.
On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible from calls from core kernel text.
Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.
Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.
This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.
Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).
The CALL_OPS approach avoids these problems and is safe as:
* The existing synchronization in ftrace_shutdown() using
synchronize_rcu_tasks_rude() (and
synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.
Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.
* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatched
ops pointer, and updates to the ops pointer do not require special
care.
A subsequent patch will implement architecture support for arm64. There
should be no functional change as a result of this patch alone.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 13:45:56 +00:00
|
|
|
 * CALL_OPS - the record can use callsite-specific ops
 * CALL_OPS_EN - the function is set up to use callsite-specific ops
|
2012-04-30 16:20:23 -04:00
|
|
|
*
|
|
|
|
* When a new ftrace_ops is registered and wants a function to save
|
2020-08-31 11:11:01 +08:00
|
|
|
* pt_regs, the rec->flags REGS is set. When the function has been
|
2012-04-30 16:20:23 -04:00
|
|
|
 * set up to save regs, the REGS_EN flag is set. Once a function
 * starts saving regs it will do so until all ftrace_ops are removed
 * from tracing that function.
 */
|
2008-05-12 21:20:43 +02:00
|
|
|
enum {
|
ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller
and we only have one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registers.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is nowhere to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-06 21:56:17 -04:00
|
|
|
FTRACE_FL_ENABLED = (1UL << 31),
|
2012-04-30 16:20:23 -04:00
|
|
|
FTRACE_FL_REGS = (1UL << 30),
|
2014-05-06 21:56:17 -04:00
|
|
|
	FTRACE_FL_REGS_EN	= (1UL << 29),
	FTRACE_FL_TRAMP		= (1UL << 28),
	FTRACE_FL_TRAMP_EN	= (1UL << 27),
|
2014-11-21 05:25:16 -05:00
|
|
|
FTRACE_FL_IPMODIFY = (1UL << 26),
|
2016-01-07 15:40:01 -05:00
|
|
|
FTRACE_FL_DISABLED = (1UL << 25),
|
2019-11-08 13:07:06 -05:00
|
|
|
	FTRACE_FL_DIRECT	= (1UL << 24),
	FTRACE_FL_DIRECT_EN	= (1UL << 23),
|
2023-01-23 13:45:56 +00:00
|
|
|
	FTRACE_FL_CALL_OPS	= (1UL << 22),
	FTRACE_FL_CALL_OPS_EN	= (1UL << 21),
|
2008-05-12 21:20:43 +02:00
|
|
|
};
|
|
|
|
|
2023-01-23 13:45:56 +00:00
|
|
|
#define FTRACE_REF_MAX_SHIFT 21
|
2014-05-07 12:42:28 -04:00
|
|
|
#define FTRACE_REF_MAX ((1UL << FTRACE_REF_MAX_SHIFT) - 1)
|
2011-05-03 13:25:24 -04:00
|
|
|
|
2020-08-31 11:11:01 +08:00
|
|
|
#define ftrace_rec_count(rec) ((rec)->flags & FTRACE_REF_MAX)
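The split of the flags word into a reference counter (bits 0 to 20) and a flag mask (bits 21 to 31) can be tried out in userspace. The following is only a sketch under stated assumptions: `struct demo_rec`, `demo_rec_count()`, `demo_rec_ref()` and `demo_rec_dump()` are hypothetical names invented for this illustration, and only two of the flag bits are copied from the enum above.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace copies of two flag bits and the counter mask quoted above. */
#define FTRACE_FL_ENABLED	(1UL << 31)
#define FTRACE_FL_REGS		(1UL << 30)
#define FTRACE_REF_MAX_SHIFT	21
#define FTRACE_REF_MAX		((1UL << FTRACE_REF_MAX_SHIFT) - 1)

/* Hypothetical stand-in for struct dyn_ftrace (the arch field is omitted). */
struct demo_rec {
	unsigned long ip;
	unsigned long flags;
};

/* Same expression as ftrace_rec_count(), renamed to mark it as a demo. */
#define demo_rec_count(rec)	((rec)->flags & FTRACE_REF_MAX)

/* Take one reference: the counter sits in the low bits, so ++ is enough. */
void demo_rec_ref(struct demo_rec *rec)
{
	/* A full counter would carry into the lowest flag bit. */
	assert(demo_rec_count(rec) < FTRACE_REF_MAX);
	rec->flags++;
}

/* Print the two halves of the flags word separately. */
void demo_rec_dump(const struct demo_rec *rec)
{
	printf("ip=%#lx refs=%lu enabled=%d regs=%d\n",
	       rec->ip, demo_rec_count(rec),
	       !!(rec->flags & FTRACE_FL_ENABLED),
	       !!(rec->flags & FTRACE_FL_REGS));
}
```

Because FTRACE_REF_MAX_SHIFT equals the bit position of the lowest flag, masking with FTRACE_REF_MAX never picks up a flag bit, and setting or clearing a flag never disturbs the count.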
|
2014-05-07 13:46:45 -04:00
|
|
|
|
ftrace: dynamic enabling/disabling of function calls
This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.
The way this works, is on bootup (if ftrace is enabled), a ftrace
function is registered to record the instruction pointer of all
places that call the function.
Later, if there's still any code to patch, a kthread is awoken
(rate limited to at most once a second) that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time. Typically
the system reaches equilibrium quickly after bootup and there's no code
patching needed at all.
e.g.
call ftrace /* 5 bytes */
is replaced with
jmp 3f /* jmp is 2 bytes and we jump 3 forward */
3:
When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.
Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.
Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMI's so all functions that might be called via nmi must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:20:42 +02:00
|
|
|
struct dyn_ftrace {
|
2014-02-24 20:00:00 +01:00
|
|
|
unsigned long ip; /* address of mcount call-site */
|
2011-12-16 16:30:31 -05:00
|
|
|
unsigned long flags;
|
2014-02-24 20:00:00 +01:00
|
|
|
struct dyn_arch_ftrace arch;
|
2008-05-12 21:20:42 +02:00
|
|
|
};
|
|
|
|
|
2012-06-05 19:28:08 +09:00
|
|
|
int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
			 int remove, int reset);
|
2022-03-15 23:00:26 +09:00
|
|
|
int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
			  unsigned int cnt, int remove, int reset);
|
2012-01-02 10:04:14 +01:00
|
|
|
int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
|
2011-05-05 22:54:01 -04:00
|
|
|
int len, int reset);
|
2012-01-02 10:04:14 +01:00
|
|
|
int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
|
2011-05-05 22:54:01 -04:00
|
|
|
			int len, int reset);
void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
|
ftrace, perf: Add filter support for function trace event
Adding support to filter function trace event via perf
interface. It is now possible to use filter interface
in the perf tool like:
perf record -e ftrace:function --filter="(ip == mm_*)" ls
The filter syntax is restricted to the 'ip' field only,
and following operators are accepted '==' '!=' '||', ending
up with the filter strings like:
ip == f1[, ]f2 ... || ip != f3[, ]f4 ...
with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"', e.g.:
perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls
The '==' operator adds trace filter with same effect as would
be added via set_ftrace_filter file.
The '!=' operator adds trace filter with same effect as would
be added via set_ftrace_notrace file.
The right side of the '!=', '==' operators is a list of functions
or regexps to be added to the filter, separated by spaces.
The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operators within one filter string.
Link: http://lkml.kernel.org/r/1329317514-8131-8-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-15 15:51:54 +01:00
|
|
|
void ftrace_free_filter(struct ftrace_ops *ops);
|
2016-11-15 12:31:20 -08:00
|
|
|
void ftrace_ops_set_global_filter(struct ftrace_ops *ops);
|
2008-05-12 21:20:44 +02:00
|
|
|
|
2011-08-16 09:53:39 -04:00
|
|
|
enum {
	FTRACE_UPDATE_CALLS		= (1 << 0),
	FTRACE_DISABLE_CALLS		= (1 << 1),
	FTRACE_UPDATE_TRACE_FUNC	= (1 << 2),
	FTRACE_START_FUNC_RET		= (1 << 3),
	FTRACE_STOP_FUNC_RET		= (1 << 4),
|
2018-12-05 12:48:53 -05:00
|
|
|
FTRACE_MAY_SLEEP = (1 << 5),
|
2011-08-16 09:53:39 -04:00
|
|
|
};
|
|
|
|
|
2012-04-30 16:20:23 -04:00
|
|
|
/*
 * The FTRACE_UPDATE_* enum is used to pass information back
 * from the ftrace_update_record() and ftrace_test_record()
 * functions. These are called by the code update routines
 * to find out what is to be done for a given function.
 *
 * IGNORE - The function is already what we want it to be
 * MAKE_CALL - Start tracing the function
 * MODIFY_CALL - Stop saving regs for the function
 * MAKE_NOP - Stop tracing the function
 */
|
2011-08-16 09:53:39 -04:00
|
|
|
enum {
	FTRACE_UPDATE_IGNORE,
	FTRACE_UPDATE_MAKE_CALL,
|
2012-04-30 16:20:23 -04:00
|
|
|
FTRACE_UPDATE_MODIFY_CALL,
|
2011-08-16 09:53:39 -04:00
|
|
|
	FTRACE_UPDATE_MAKE_NOP,
};
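As an illustration of how a code update routine might act on these return values, the sketch below maps each FTRACE_UPDATE_* result to a yes/no patching decision. It is a userspace approximation, not kernel code: `demo_needs_patch()` is a hypothetical helper, and the enum is repeated here only so that the block compiles on its own.

```c
/* Copy of the enum above, repeated so this sketch is self-contained. */
enum {
	FTRACE_UPDATE_IGNORE,
	FTRACE_UPDATE_MAKE_CALL,
	FTRACE_UPDATE_MODIFY_CALL,
	FTRACE_UPDATE_MAKE_NOP,
};

/*
 * Hypothetical helper: report whether a result asks for the call site
 * to be rewritten (1), left alone (0), or is not a known value (-1).
 */
int demo_needs_patch(int update)
{
	switch (update) {
	case FTRACE_UPDATE_IGNORE:
		return 0;	/* already in the wanted state */
	case FTRACE_UPDATE_MAKE_CALL:	/* nop -> call */
	case FTRACE_UPDATE_MODIFY_CALL:	/* call -> different call */
	case FTRACE_UPDATE_MAKE_NOP:	/* call -> nop */
		return 1;
	default:
		return -1;	/* not listed in the enum */
	}
}
```

Only IGNORE leaves the text untouched; the other three results each imply a different old and new instruction pair at the call site.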
|
|
|
|
|
2011-12-19 14:41:25 -05:00
|
|
|
enum {
	FTRACE_ITER_FILTER	= (1 << 0),
	FTRACE_ITER_NOTRACE	= (1 << 1),
	FTRACE_ITER_PRINTALL	= (1 << 2),
|
2017-04-04 21:31:28 -04:00
|
|
|
	FTRACE_ITER_DO_PROBES	= (1 << 3),
	FTRACE_ITER_PROBE	= (1 << 4),
|
2017-06-23 16:05:11 -04:00
|
|
|
	FTRACE_ITER_MOD		= (1 << 5),
	FTRACE_ITER_ENABLED	= (1 << 6),
|
2011-12-19 14:41:25 -05:00
|
|
|
};
|
|
|
|
|
2011-08-16 09:53:39 -04:00
|
|
|
void arch_ftrace_update_code(int command);
|
2018-11-22 10:04:09 +08:00
|
|
|
void arch_ftrace_update_trampoline(struct ftrace_ops *ops);
void *arch_ftrace_trampoline_func(struct ftrace_ops *ops, struct dyn_ftrace *rec);
void arch_ftrace_trampoline_free(struct ftrace_ops *ops);
|
2011-08-16 09:53:39 -04:00
|
|
|
|
|
|
|
struct ftrace_rec_iter;

struct ftrace_rec_iter *ftrace_rec_iter_start(void);
struct ftrace_rec_iter *ftrace_rec_iter_next(struct ftrace_rec_iter *iter);
struct dyn_ftrace *ftrace_rec_iter_record(struct ftrace_rec_iter *iter);
|
|
|
|
|
2011-08-16 09:57:10 -04:00
|
|
|
#define for_ftrace_rec_iter(iter)		\
	for (iter = ftrace_rec_iter_start();	\
	     iter;				\
	     iter = ftrace_rec_iter_next(iter))
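A caller of this macro typically declares an iterator pointer, loops, and fetches the record for each step. The sketch below shows that pattern in userspace. Everything behind the three iterator functions is an assumption made for the demo: the real `struct ftrace_rec_iter` is opaque to callers, and `demo_recs`, `demo_iter`, `DEMO_FL_ENABLED` and `demo_count_enabled()` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down record; the real struct also has an arch field. */
struct dyn_ftrace {
	unsigned long ip;
	unsigned long flags;
};

#define DEMO_FL_ENABLED	(1UL << 31)	/* same bit as FTRACE_FL_ENABLED */

/* Mock record table standing in for the kernel's own record storage. */
static struct dyn_ftrace demo_recs[] = {
	{ 0x1000, DEMO_FL_ENABLED | 1 },
	{ 0x1040, 0 },
	{ 0x1080, DEMO_FL_ENABLED | 2 },
};
#define DEMO_NR_RECS	(sizeof(demo_recs) / sizeof(demo_recs[0]))

/* Mock iterator state; a single static instance is enough for the demo. */
struct ftrace_rec_iter {
	size_t index;
};
static struct ftrace_rec_iter demo_iter;

struct ftrace_rec_iter *ftrace_rec_iter_start(void)
{
	demo_iter.index = 0;
	return DEMO_NR_RECS ? &demo_iter : NULL;
}

struct ftrace_rec_iter *ftrace_rec_iter_next(struct ftrace_rec_iter *iter)
{
	/* Returning NULL past the last record ends the for loop. */
	return ++iter->index < DEMO_NR_RECS ? iter : NULL;
}

struct dyn_ftrace *ftrace_rec_iter_record(struct ftrace_rec_iter *iter)
{
	return &demo_recs[iter->index];
}

#define for_ftrace_rec_iter(iter)		\
	for (iter = ftrace_rec_iter_start();	\
	     iter;				\
	     iter = ftrace_rec_iter_next(iter))

/* Walk every record and count those with the ENABLED bit set. */
size_t demo_count_enabled(void)
{
	struct ftrace_rec_iter *iter;
	size_t n = 0;

	for_ftrace_rec_iter(iter) {
		struct dyn_ftrace *rec = ftrace_rec_iter_record(iter);

		if (rec->flags & DEMO_FL_ENABLED)
			n++;
	}
	return n;
}
```

The macro hides only the loop header; the caller still owns the iterator variable and decides what to do with each record.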
|
|
|
|
|
|
|
|
|
2019-05-20 09:26:24 -04:00
|
|
|
int ftrace_update_record(struct dyn_ftrace *rec, bool enable);
int ftrace_test_record(struct dyn_ftrace *rec, bool enable);
|
2011-08-16 09:53:39 -04:00
|
|
|
void ftrace_run_stop_machine(int command);
|
2012-04-25 14:39:54 -04:00
|
|
|
unsigned long ftrace_location(unsigned long ip);
|
2016-03-24 22:04:01 +11:00
|
|
|
unsigned long ftrace_location_range(unsigned long start, unsigned long end);
|
2014-05-06 21:34:14 -04:00
|
|
|
unsigned long ftrace_get_addr_new(struct dyn_ftrace *rec);
unsigned long ftrace_get_addr_curr(struct dyn_ftrace *rec);
|
2011-08-16 09:53:39 -04:00
|
|
|
|
|
|
|
extern ftrace_func_t ftrace_trace_function;
|
|
|
|
|
2011-12-19 14:41:25 -05:00
|
|
|
int ftrace_regex_open(struct ftrace_ops *ops, int flag,
		      struct inode *inode, struct file *file);
ssize_t ftrace_filter_write(struct file *file, const char __user *ubuf,
			    size_t cnt, loff_t *ppos);
ssize_t ftrace_notrace_write(struct file *file, const char __user *ubuf,
			     size_t cnt, loff_t *ppos);
int ftrace_regex_release(struct inode *inode, struct file *file);
|
|
|
|
|
2011-12-19 21:57:44 -05:00
|
|
|
void __init
ftrace_set_early_filter(struct ftrace_ops *ops, char *buf, int enable);
|
|
|
|
|
ftrace: dynamic enabling/disabling of function calls
This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.
The way this works, is on bootup (if ftrace is enabled), a ftrace
function is registered to record the instruction pointer of all
places that call the function.
Later, if there's still any code to patch, a kthread is awoken
(rate limited to at most once a second) that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time. Typically
the system reaches equilibrium quickly after bootup and there's no code
patching needed at all.
e.g.
call ftrace /* 5 bytes */
is replaced with
jmp 3f /* jmp is 2 bytes and we jump 3 forward */
3:
When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.
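
The call/jmp swap described above can be sketched in user space as follows.
This is only an illustration under stated assumptions, not the kernel's
patching code: the byte buffer stands in for kernel text, and the helper
names patch_call_to_jmp() and patch_jmp_to_call() are hypothetical. It relies
on the x86 encodings mentioned in the example (a 5-byte "call rel32" starting
with 0xe8, and a 2-byte short "jmp" 0xeb with an 8-bit displacement of 3).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CALL_SIZE 5	/* x86: opcode 0xe8 followed by a 32-bit displacement */

/*
 * Replace "call rel32" with "jmp +3" (0xeb 0x03). The three trailing
 * displacement bytes are left in place and simply jumped over.
 */
static int patch_call_to_jmp(uint8_t *site)
{
	assert(site != NULL);
	if (site[0] != 0xe8)
		return -1;	/* not the call we expected; leave it alone */
	site[0] = 0xeb;
	site[1] = 0x03;
	return 0;
}

/* Put the original call back from a saved copy of its 5 bytes. */
static int patch_jmp_to_call(uint8_t *site, const uint8_t *saved_call)
{
	assert(site != NULL && saved_call != NULL);
	if (site[0] != 0xeb || site[1] != 0x03)
		return -1;	/* not the jmp we wrote earlier */
	memcpy(site, saved_call, CALL_SIZE);
	return 0;
}
```

In this sketch the caller keeps a copy of the original 5 bytes so the call
can be restored later; the real code instead recomputes the call from the
recorded instruction pointer.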
Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.
Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMIs, so all functions that might be called via NMI must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:20:42 +02:00
|
|
|
/* defined in arch */
|
2008-05-12 21:20:43 +02:00
|
|
|
extern int ftrace_ip_converted(unsigned long ip);
|
2014-02-24 19:59:59 +01:00
|
|
|
extern int ftrace_dyn_arch_init(void);
|
2012-04-27 09:13:18 -04:00
|
|
|
extern void ftrace_replace_code(int enable);
|
2008-05-12 21:20:43 +02:00
|
|
|
extern int ftrace_update_ftrace_func(ftrace_func_t func);
|
|
|
|
extern void ftrace_caller(void);
|
2012-04-30 16:20:23 -04:00
|
|
|
extern void ftrace_regs_caller(void);
|
2008-05-12 21:20:43 +02:00
|
|
|
extern void ftrace_call(void);
|
2012-04-30 16:20:23 -04:00
|
|
|
extern void ftrace_regs_call(void);
|
2008-05-12 21:20:43 +02:00
|
|
|
extern void mcount_call(void);
|
2009-01-09 11:29:42 +08:00
|
|
|
|
2012-04-26 14:59:43 -04:00
|
|
|
void ftrace_modify_all_code(int command);
|
|
|
|
|
2009-01-09 11:29:42 +08:00
|
|
|
#ifndef FTRACE_ADDR
|
|
|
|
#define FTRACE_ADDR ((unsigned long)ftrace_caller)
|
|
|
|
#endif
|
2012-04-30 16:20:23 -04:00
|
|
|
|
ftrace: Optimize function graph to be called directly
Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller.
Because we only have one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registers.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is nowhere to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-06 21:56:17 -04:00
|
|
|
#ifndef FTRACE_GRAPH_ADDR
|
|
|
|
#define FTRACE_GRAPH_ADDR ((unsigned long)ftrace_graph_caller)
|
|
|
|
#endif
|
|
|
|
|
2012-04-30 16:20:23 -04:00
|
|
|
#ifndef FTRACE_REGS_ADDR
|
2012-09-28 17:15:17 +09:00
|
|
|
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
|
2012-04-30 16:20:23 -04:00
|
|
|
# define FTRACE_REGS_ADDR ((unsigned long)ftrace_regs_caller)
|
|
|
|
#else
|
|
|
|
# define FTRACE_REGS_ADDR FTRACE_ADDR
|
|
|
|
#endif
|
|
|
|
#endif
|
|
|
|
|
2014-07-11 14:39:10 -04:00
|
|
|
/*
|
|
|
|
* If an arch would like functions that are only traced
|
|
|
|
* by the function graph tracer to jump directly to its own
|
|
|
|
* trampoline, then they can define FTRACE_GRAPH_TRAMP_ADDR
|
|
|
|
* to be that address to jump to.
|
|
|
|
*/
|
|
|
|
#ifndef FTRACE_GRAPH_TRAMP_ADDR
|
|
|
|
#define FTRACE_GRAPH_TRAMP_ADDR ((unsigned long) 0)
|
|
|
|
#endif
|
|
|
|
|
2008-11-25 21:07:04 +01:00
|
|
|
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
|
|
|
|
extern void ftrace_graph_caller(void);
|
2008-11-26 00:16:24 -05:00
|
|
|
extern int ftrace_enable_ftrace_graph_caller(void);
|
|
|
|
extern int ftrace_disable_ftrace_graph_caller(void);
|
|
|
|
#else
|
|
|
|
static inline int ftrace_enable_ftrace_graph_caller(void) { return 0; }
|
|
|
|
static inline int ftrace_disable_ftrace_graph_caller(void) { return 0; }
|
2008-11-16 06:02:06 +01:00
|
|
|
#endif
|
ftrace: user update and disable dynamic ftrace daemon
In dynamic ftrace, the mcount function starts off pointing to a stub
function that just returns.
On start up, the call to the stub is modified to point to a "record_ip"
function. The job of the record_ip function is to add the function to
a pre-allocated hash list. If the function is already there, it simply is
ignored, otherwise it is added to the list.
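
A minimal user-space sketch of the record_ip idea follows, assuming a small
fixed-size table with linear probing. The table size, the hash, and the
return values are assumptions made for illustration; the kernel's actual
data structure is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 8	/* tiny on purpose; must be a power of two */

static unsigned long slots[NR_SLOTS];	/* pre-allocated; 0 marks an empty slot */
static size_t nr_recorded;

/*
 * Record a call-site address.
 * Returns 1 if newly added, 0 if already present, -1 if the table is full
 * (in which case the call is simply skipped).
 */
static int record_ip(unsigned long ip)
{
	size_t i, start = (ip >> 2) & (NR_SLOTS - 1);

	assert(ip != 0);	/* 0 is reserved as the "empty" marker */
	for (i = 0; i < NR_SLOTS; i++) {
		size_t s = (start + i) & (NR_SLOTS - 1);

		if (slots[s] == ip)
			return 0;	/* already recorded: ignore */
		if (slots[s] == 0) {
			slots[s] = ip;
			nr_recorded++;
			return 1;
		}
	}
	return -1;	/* no free slot */
}
```

A daemon in this model would periodically check nr_recorded and, if it
changed, convert the newly recorded sites to nops.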
Later, a ftraced daemon wakes up and calls kstop_machine if any functions
have been recorded, and changes the calls to the recorded functions to
a simple nop. If no functions were recorded, the daemon goes back to sleep.
The daemon wakes up once a second to see if it needs to update any newly
recorded functions into nops. Usually it does not, but if a lot of code
has been executed for the first time in the kernel, the ftraced daemon
will call kstop_machine to update those into nops.
The problem currently is that there's no way to stop the daemon from doing
this, and it can cause unneeded latencies (800us which for some is bothersome).
This patch adds a new file /debugfs/tracing/ftraced_enabled. If the daemon
is active, reading this will return "enabled\n" and "disabled\n" when the
daemon is not running. To disable the daemon, the user can echo "0" or
"disable" into this file, and "1" or "enable" to re-enable the daemon.
Since the daemon is used to convert the functions into nops to increase
the performance of the system, I also added that anytime something is
written into the ftraced_enabled file, kstop_machine will run if there
are new functions that have been detected that need to be converted.
This way the user can disable the daemon but still be able to control the
conversion of the mcount calls to nops by simply,
"echo 0 > /debugfs/tracing/ftraced_enabled"
when they need to do more conversions.
To see the number of converted functions:
"cat /debugfs/tracing/dyn_ftrace_total_info"
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-27 20:48:37 -04:00
|
|
|
|
2008-11-14 16:21:19 -08:00
|
|
|
/**
|
2009-02-06 17:33:27 +08:00
|
|
|
* ftrace_make_nop - convert code into nop
|
2008-11-14 16:21:19 -08:00
|
|
|
* @mod: module structure if called by module load initialization
|
2019-10-16 17:51:10 +01:00
|
|
|
* @rec: the call site record (e.g. mcount/fentry)
|
2008-11-14 16:21:19 -08:00
|
|
|
* @addr: the address that the call site should be calling
|
|
|
|
*
|
|
|
|
* This is a very sensitive operation and great care needs
|
|
|
|
* to be taken by the arch. The operation should carefully
|
|
|
|
* read the location, check to see if what is read is indeed
|
|
|
|
* what we expect it to be, and then on success of the compare,
|
|
|
|
* it should write to the location.
|
|
|
|
*
|
|
|
|
* The code segment at @rec->ip should be a caller to @addr
|
|
|
|
*
|
|
|
|
* Return must be:
|
|
|
|
* 0 on success
|
|
|
|
* -EFAULT on error reading the location
|
|
|
|
* -EINVAL on a failed compare of the contents
|
|
|
|
* -EPERM on error writing to the location
|
|
|
|
* Any other value will be considered a failure.
|
|
|
|
*/
|
|
|
|
extern int ftrace_make_nop(struct module *mod,
|
|
|
|
struct dyn_ftrace *rec, unsigned long addr);
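
The read/compare/write contract documented above can be illustrated with a
user-space sketch. Everything here is an assumption made for illustration:
struct fake_text and modify_code() are hypothetical names, and the
readable/writable flags merely simulate faults that a real architecture
would detect when accessing kernel text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a patchable text region. */
struct fake_text {
	unsigned char *bytes;
	size_t len;
	int readable;	/* simulate a read fault when 0 */
	int writable;	/* simulate a write fault when 0 */
};

/* Replace n bytes at offset off, but only if they match expect. */
static int modify_code(struct fake_text *t, size_t off,
		       const unsigned char *expect,
		       const unsigned char *repl, size_t n)
{
	unsigned char cur[16];

	assert(t != NULL && expect != NULL && repl != NULL);
	if (n > sizeof(cur) || off > t->len || n > t->len - off || !t->readable)
		return -EFAULT;	/* error reading the location */
	memcpy(cur, t->bytes + off, n);
	if (memcmp(cur, expect, n) != 0)
		return -EINVAL;	/* contents are not what we expected */
	if (!t->writable)
		return -EPERM;	/* error writing to the location */
	memcpy(t->bytes + off, repl, n);
	return 0;
}
```

Checking the current contents before writing is what lets a caller detect
that a call site was not in the state it assumed.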
|
2008-10-31 00:03:22 -04:00
|
|
|
|
2021-07-28 23:25:45 +02:00
|
|
|
/**
|
|
|
|
* ftrace_need_init_nop - return whether nop call sites should be initialized
|
|
|
|
*
|
|
|
|
* Normally the compiler's -mnop-mcount generates suitable nops, so we don't
|
|
|
|
* need to call ftrace_init_nop() if the code is built with that flag.
|
|
|
|
* Architectures where this is not always the case may define their own
|
|
|
|
* condition.
|
|
|
|
*
|
|
|
|
* Return must be:
|
|
|
|
* 0 if ftrace_init_nop() should be called
|
|
|
|
* Nonzero if ftrace_init_nop() should not be called
|
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef ftrace_need_init_nop
|
|
|
|
#define ftrace_need_init_nop() (!__is_defined(CC_USING_NOP_MCOUNT))
|
|
|
|
#endif
|
2019-10-16 17:51:10 +01:00
|
|
|
|
|
|
|
/**
|
|
|
|
* ftrace_init_nop - initialize a nop call site
|
|
|
|
* @mod: module structure if called by module load initialization
|
|
|
|
* @rec: the call site record (e.g. mcount/fentry)
|
|
|
|
*
|
|
|
|
* This is a very sensitive operation and great care needs
|
|
|
|
* to be taken by the arch. The operation should carefully
|
|
|
|
* read the location, check to see if what is read is indeed
|
|
|
|
* what we expect it to be, and then on success of the compare,
|
|
|
|
* it should write to the location.
|
|
|
|
*
|
|
|
|
* The code segment at @rec->ip should contain the contents created by
|
|
|
|
* the compiler
|
|
|
|
*
|
|
|
|
* Return must be:
|
|
|
|
* 0 on success
|
|
|
|
* -EFAULT on error reading the location
|
|
|
|
* -EINVAL on a failed compare of the contents
|
|
|
|
* -EPERM on error writing to the location
|
|
|
|
* Any other value will be considered a failure.
|
|
|
|
*/
|
|
|
|
#ifndef ftrace_init_nop
|
|
|
|
static inline int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
|
|
|
|
{
|
|
|
|
return ftrace_make_nop(mod, rec, MCOUNT_ADDR);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2008-10-23 09:32:59 -04:00
|
|
|
/**
|
2008-11-14 16:21:19 -08:00
|
|
|
* ftrace_make_call - convert a nop call site into a call to addr
|
2019-10-16 17:51:10 +01:00
|
|
|
* @rec: the call site record (e.g. mcount/fentry)
|
2008-11-14 16:21:19 -08:00
|
|
|
* @addr: the address that the call site should call
|
2008-10-23 09:32:59 -04:00
|
|
|
*
|
|
|
|
* This is a very sensitive operation and great care needs
|
|
|
|
* to be taken by the arch. The operation should carefully
|
|
|
|
* read the location, check to see if what is read is indeed
|
|
|
|
* what we expect it to be, and then on success of the compare,
|
|
|
|
* it should write to the location.
|
|
|
|
*
|
2008-11-14 16:21:19 -08:00
|
|
|
* The code segment at @rec->ip should be a nop
|
|
|
|
*
|
2008-10-23 09:32:59 -04:00
|
|
|
* Return must be:
|
|
|
|
* 0 on success
|
|
|
|
* -EFAULT on error reading the location
|
|
|
|
* -EINVAL on a failed compare of the contents
|
|
|
|
* -EPERM on error writing to the location
|
|
|
|
* Any other value will be considered a failure.
|
|
|
|
*/
|
2008-11-14 16:21:19 -08:00
|
|
|
extern int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr);
|
|
|
|
|
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
Architectures without dynamic ftrace trampolines incur an overhead when
multiple ftrace_ops are enabled with distinct filters. In these cases,
each call site calls a common trampoline which uses
ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
so incurs an overhead relative to the size of this list (including RCU
protection overhead).
Architectures with dynamic ftrace trampolines avoid this overhead for
call sites which have a single associated ftrace_ops. In these cases,
the dynamic trampoline is customized to branch directly to the relevant
ftrace function, avoiding the list overhead.
On some architectures it's impractical and/or undesirable to implement
dynamic ftrace trampolines. For example, arm64 has limited branch ranges
and cannot always directly branch from a call site to an arbitrary
address (e.g. from a kernel text address to an arbitrary module
address). Calls from modules to core kernel text can be indirected via
PLTs (allocated at module load time) to address this, but the same is
not possible for calls from core kernel text.
Using an indirect branch from a call site to an arbitrary trampoline is
possible, but requires several more instructions in the function
prologue (or immediately before it), and/or comes with far more complex
requirements for patching.
Instead, this patch adds a new option, where an architecture can
associate each call site with a pointer to an ftrace_ops, placed at a
fixed offset from the call site. A shared trampoline can recover this
pointer and call ftrace_ops::func() without needing to go via
ftrace_ops_list_func(), avoiding the associated overhead.
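
The idea of a per-call-site ops pointer at a fixed offset can be sketched in
plain C. This is a user-space model under stated assumptions, not the arm64
implementation: struct ops, struct patch_site, count_hit() and
shared_trampoline() are hypothetical names, and the plain pointer load here
stands in for the atomic load a real trampoline would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, loosely modelled on ftrace_ops: a callback plus some state. */
struct ops {
	void (*func)(unsigned long ip, struct ops *op);
	unsigned long hits;
};

/* Hypothetical layout: the ops pointer sits at a fixed offset before the call site. */
struct patch_site {
	struct ops *ops;
	unsigned char insn[4];	/* stands in for the patched instruction */
};

/* Example callback: count how often it was invoked. */
static void count_hit(unsigned long ip, struct ops *op)
{
	(void)ip;
	op->hits++;
}

/*
 * One trampoline shared by every call site: step back from the call site
 * by the fixed offset, load the ops pointer once, and call its func
 * directly, without walking a list of all registered ops.
 */
static void shared_trampoline(unsigned char *call_site)
{
	struct patch_site *site;
	struct ops *op;

	assert(call_site != NULL);
	site = (struct patch_site *)(call_site - offsetof(struct patch_site, insn));
	op = site->ops;
	if (op)
		op->func((unsigned long)(uintptr_t)call_site, op);
}
```

Retargeting a call site in this model only requires storing a new pointer in
site->ops; the instruction itself and the trampoline stay unchanged.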
This avoids issues with branch range limitations, and avoids the need to
allocate and manipulate dynamic trampolines, making it far simpler to
implement and maintain, while having similar performance
characteristics.
Note that this allows for dynamic ftrace_ops to be invoked directly from
an architecture's ftrace_caller trampoline, whereas existing code forces
the use of ftrace_ops_get_list_func(), which is in part necessary to
permit the ftrace_ops to be freed once unregistered *and* to avoid
branch/address-generation range limitation on some architectures (e.g.
where ops->func is a module address, and may be outside of the direct
branch range for callsites within the main kernel image).
The CALL_OPS approach avoids these problems and is safe as:
* The existing synchronization in ftrace_shutdown() using
synchronize_rcu_tasks_rude() (and
synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
or while invoking ftrace_ops::func), when that ftrace_ops is
unregistered.
Arguably this could also be relied upon for the existing scheme,
permitting dynamic ftrace_ops to be invoked directly when ops->func is
in range, but this will require additional logic to handle branch
range limitations, and is not handled by this patch.
* Each callsite's ftrace_ops pointer literal can hold any valid kernel
address, and is updated atomically. As an architecture's ftrace_caller
trampoline will atomically load the ops pointer then dereference
ops->func, there is no risk of invoking ops->func with a mismatched
ops pointer, and updates to the ops pointer do not require special
care.
A subsequent patch will implement architecture support for arm64. There
should be no functional change as a result of this patch alone.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 13:45:56 +00:00
|
|
|
#if defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS) || \
|
|
|
|
defined(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS)
|
2012-04-30 16:20:23 -04:00
|
|
|
/**
|
|
|
|
* ftrace_modify_call - convert from one addr to another (no nop)
|
2019-10-16 17:51:10 +01:00
|
|
|
* @rec: the call site record (e.g. mcount/fentry)
|
2012-04-30 16:20:23 -04:00
|
|
|
* @old_addr: the address expected to be currently called to
|
|
|
|
* @addr: the address to change to
|
|
|
|
*
|
|
|
|
* This is a very sensitive operation and great care needs
|
|
|
|
* to be taken by the arch. The operation should carefully
|
|
|
|
* read the location, check to see if what is read is indeed
|
|
|
|
* what we expect it to be, and then on success of the compare,
|
|
|
|
* it should write to the location.
|
|
|
|
*
|
2023-01-23 13:45:56 +00:00
|
|
|
* When using call ops, this is called when the associated ops change, even
|
|
|
|
* when (addr == old_addr).
|
|
|
|
*
|
2012-04-30 16:20:23 -04:00
|
|
|
* The code segment at @rec->ip should be a caller to @old_addr
|
|
|
|
*
|
|
|
|
* Return must be:
|
|
|
|
* 0 on success
|
|
|
|
* -EFAULT on error reading the location
|
|
|
|
* -EINVAL on a failed compare of the contents
|
|
|
|
* -EPERM on error writing to the location
|
|
|
|
* Any other value will be considered a failure.
|
|
|
|
*/
|
|
|
|
extern int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
|
|
|
|
unsigned long addr);
|
|
|
|
#else
|
|
|
|
/* Should never be called */
|
|
|
|
static inline int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
|
|
|
|
unsigned long addr)
|
|
|
|
{
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2008-11-14 16:21:19 -08:00
|
|
|
/* May be defined in arch */
|
|
|
|
extern int ftrace_arch_read_dyn_info(char *buf, int size);
|
2008-10-23 09:32:59 -04:00
|
|
|
|
2008-06-21 23:47:53 +05:30
|
|
|
extern int skip_trace(unsigned long ip);
|
2014-04-24 10:40:12 -04:00
|
|
|
extern void ftrace_module_init(struct module *mod);
|
2016-02-16 17:32:33 -05:00
|
|
|
extern void ftrace_module_enable(struct module *mod);
|
2016-01-05 20:32:47 -05:00
|
|
|
extern void ftrace_release_mod(struct module *mod);
|
2008-06-21 23:47:53 +05:30
|
|
|
|
2008-09-06 01:06:03 -04:00
|
|
|
extern void ftrace_disable_daemon(void);
|
|
|
|
extern void ftrace_enable_daemon(void);
|
2012-06-06 13:45:31 -04:00
|
|
|
#else /* CONFIG_DYNAMIC_FTRACE */
|
2010-05-04 11:24:01 -04:00
|
|
|
static inline int skip_trace(unsigned long ip) { return 0; }
|
|
|
|
static inline void ftrace_disable_daemon(void) { }
|
|
|
|
static inline void ftrace_enable_daemon(void) { }
|
2016-02-16 17:32:33 -05:00
|
|
|
static inline void ftrace_module_init(struct module *mod) { }
|
|
|
|
static inline void ftrace_module_enable(struct module *mod) { }
|
|
|
|
static inline void ftrace_release_mod(struct module *mod) { }
|
2013-01-09 18:09:20 -05:00
|
|
|
static inline int ftrace_text_reserved(const void *start, const void *end)
|
2010-02-02 16:49:11 -05:00
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
2012-06-06 13:45:31 -04:00
|
|
|
static inline unsigned long ftrace_location(unsigned long ip)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
2011-12-19 14:41:25 -05:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Again users of functions that have ftrace_ops may not
|
|
|
|
* have them defined when ftrace is not enabled, but these
|
|
|
|
* functions may still be called. Use a macro instead of inline.
|
|
|
|
*/
|
|
|
|
#define ftrace_regex_open(ops, flag, inod, file) ({ -ENODEV; })
|
2012-01-07 17:26:49 -05:00
|
|
|
#define ftrace_set_early_filter(ops, buf, enable) do { } while (0)
|
2012-06-05 19:28:08 +09:00
|
|
|
#define ftrace_set_filter_ip(ops, ip, remove, reset) ({ -ENODEV; })
|
2022-03-15 23:00:26 +09:00
|
|
|
#define ftrace_set_filter_ips(ops, ips, cnt, remove, reset) ({ -ENODEV; })
|
ftrace, perf: Add filter support for function trace event
Adding support to filter function trace event via perf
interface. It is now possible to use filter interface
in the perf tool like:
perf record -e ftrace:function --filter="(ip == mm_*)" ls
The filter syntax is restricted to the 'ip' field only,
and following operators are accepted '==' '!=' '||', ending
up with the filter strings like:
ip == f1[, ]f2 ... || ip != f3[, ]f4 ...
with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"', e.g.:
perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls
The '==' operator adds a trace filter with the same effect as would
be added via the set_ftrace_filter file.
The '!=' operator adds a trace filter with the same effect as would
be added via the set_ftrace_notrace file.
The right side of the '!=' and '==' operators is a list of functions
or regexps to be added to the filter, separated by spaces.
The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operators within one filter string.
Link: http://lkml.kernel.org/r/1329317514-8131-8-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-15 15:51:54 +01:00
|
|
|
#define ftrace_set_filter(ops, buf, len, reset) ({ -ENODEV; })
|
|
|
|
#define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; })
|
|
|
|
#define ftrace_free_filter(ops) do { } while (0)
|
2016-11-15 12:31:20 -08:00
|
|
|
#define ftrace_ops_set_global_filter(ops) do { } while (0)
|
2011-12-19 14:41:25 -05:00
|
|
|
|
|
|
|
static inline ssize_t ftrace_filter_write(struct file *file, const char __user *ubuf,
|
|
|
|
size_t cnt, loff_t *ppos) { return -ENODEV; }
|
|
|
|
static inline ssize_t ftrace_notrace_write(struct file *file, const char __user *ubuf,
|
|
|
|
size_t cnt, loff_t *ppos) { return -ENODEV; }
|
|
|
|
static inline int
|
|
|
|
ftrace_regex_release(struct inode *inode, struct file *file) { return -ENODEV; }
|
2014-11-18 21:14:11 -05:00
|
|
|
|
|
|
|
static inline bool is_ftrace_trampoline(unsigned long addr)
|
|
|
|
{
|
|
|
|
return false;
|
|
|
|
}
|
2008-06-21 23:47:53 +05:30
|
|
|
#endif /* CONFIG_DYNAMIC_FTRACE */
|
2008-05-12 21:20:42 +02:00
|
|
|
|
2021-10-08 11:13:31 +02:00
|
|
|
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
|
|
|
|
#ifndef ftrace_graph_func
|
|
|
|
#define ftrace_graph_func ftrace_stub
|
|
|
|
#define FTRACE_OPS_GRAPH_STUB FTRACE_OPS_FL_STUB
|
|
|
|
#else
|
|
|
|
#define FTRACE_OPS_GRAPH_STUB 0
|
|
|
|
#endif
|
|
|
|
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
|
|
|
|
|
2008-05-12 21:20:49 +02:00
|
|
|
/* totally disable ftrace - can not re-enable after this */
|
|
|
|
void ftrace_kill(void);
|
|
|
|
|
2008-05-12 21:20:43 +02:00
|
|
|
static inline void tracer_disable(void)
|
|
|
|
{
|
2008-10-06 19:06:12 -04:00
|
|
|
#ifdef CONFIG_FUNCTION_TRACER
|
2008-05-12 21:20:43 +02:00
|
|
|
ftrace_enabled = 0;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2008-08-18 16:24:56 +08:00
|
|
|
/*
|
|
|
|
* Ftrace disable/restore without lock. Some synchronization mechanism
|
2008-08-15 00:40:25 -07:00
|
|
|
* must be used to prevent ftrace_enabled from being changed between
|
2008-08-18 16:24:56 +08:00
|
|
|
* disable/restore.
|
|
|
|
*/
|
2008-08-15 00:40:25 -07:00
|
|
|
static inline int __ftrace_enabled_save(void)
|
|
|
|
{
|
2008-10-06 19:06:12 -04:00
|
|
|
#ifdef CONFIG_FUNCTION_TRACER
|
2008-08-15 00:40:25 -07:00
|
|
|
int saved_ftrace_enabled = ftrace_enabled;
|
|
|
|
ftrace_enabled = 0;
|
|
|
|
return saved_ftrace_enabled;
|
|
|
|
#else
|
|
|
|
return 0;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void __ftrace_enabled_restore(int enabled)
|
|
|
|
{
|
2008-10-06 19:06:12 -04:00
|
|
|
#ifdef CONFIG_FUNCTION_TRACER
|
2008-08-15 00:40:25 -07:00
|
|
|
ftrace_enabled = enabled;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2014-05-20 20:31:04 +09:00
|
|
|
/* All archs should have this, but we define it for consistency */
|
|
|
|
#ifndef ftrace_return_address0
|
|
|
|
# define ftrace_return_address0 __builtin_return_address(0)
|
|
|
|
#endif
|
|
|
|
|
|
|
|
/* Archs may use other ways for ADDR1 and beyond */
|
|
|
|
#ifndef ftrace_return_address
|
2009-02-27 21:30:03 +01:00
|
|
|
# ifdef CONFIG_FRAME_POINTER
|
2014-05-20 20:31:04 +09:00
|
|
|
# define ftrace_return_address(n) __builtin_return_address(n)
|
2009-02-27 21:30:03 +01:00
|
|
|
# else
|
2014-05-20 20:31:04 +09:00
|
|
|
# define ftrace_return_address(n) 0UL
|
2009-02-27 21:30:03 +01:00
|
|
|
# endif
|
2014-05-20 20:31:04 +09:00
|
|
|
#endif
|
|
|
|
|
|
|
|
#define CALLER_ADDR0 ((unsigned long)ftrace_return_address0)
|
|
|
|
#define CALLER_ADDR1 ((unsigned long)ftrace_return_address(1))
|
|
|
|
#define CALLER_ADDR2 ((unsigned long)ftrace_return_address(2))
|
|
|
|
#define CALLER_ADDR3 ((unsigned long)ftrace_return_address(3))
|
|
|
|
#define CALLER_ADDR4 ((unsigned long)ftrace_return_address(4))
|
|
|
|
#define CALLER_ADDR5 ((unsigned long)ftrace_return_address(5))
|
|
|
|
#define CALLER_ADDR6 ((unsigned long)ftrace_return_address(6))
|
2008-05-12 21:20:42 +02:00
|
|
|
|
2016-02-26 14:54:56 +01:00
|
|
|
static inline unsigned long get_lock_parent_ip(void)
|
|
|
|
{
|
|
|
|
unsigned long addr = CALLER_ADDR0;
|
|
|
|
|
|
|
|
if (!in_lock_functions(addr))
|
|
|
|
return addr;
|
|
|
|
addr = CALLER_ADDR1;
|
|
|
|
if (!in_lock_functions(addr))
|
|
|
|
return addr;
|
|
|
|
return CALLER_ADDR2;
|
|
|
|
}
|
|
|
|
|
2018-07-30 15:24:23 -07:00
|
|
|
#ifdef CONFIG_TRACE_PREEMPT_TOGGLE
|
2008-02-25 13:38:05 +01:00
|
|
|
extern void trace_preempt_on(unsigned long a0, unsigned long a1);
|
|
|
|
extern void trace_preempt_off(unsigned long a0, unsigned long a1);
|
2008-05-12 21:20:42 +02:00
|
|
|
#else
|
2012-05-07 11:36:00 +09:00
|
|
|
/*
|
|
|
|
* Use defines instead of static inlines because some arches will make code out
|
|
|
|
* of the CALLER_ADDR, when we really want these to be a real nop.
|
|
|
|
*/
|
|
|
|
# define trace_preempt_on(a0, a1) do { } while (0)
|
|
|
|
# define trace_preempt_off(a0, a1) do { } while (0)
|
2008-05-12 21:20:42 +02:00
|
|
|
#endif
|
|
|
|
|
2008-08-14 15:45:08 -04:00
|
|
|
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
|
|
|
|
extern void ftrace_init(void);
|
module/ftrace: handle patchable-function-entry
When using patchable-function-entry, the compiler will record the
callsites into a section named "__patchable_function_entries" rather
than "__mcount_loc". Let's abstract this difference behind a new
FTRACE_CALLSITE_SECTION, so that architectures don't have to handle this
explicitly (e.g. with custom module linker scripts).
As parisc currently handles this explicitly, it is fixed up accordingly,
with its custom linker script removed. Since FTRACE_CALLSITE_SECTION is
only defined when DYNAMIC_FTRACE is selected, the parisc module loading
code is updated to only use the definition in that case. When
DYNAMIC_FTRACE is not selected, modules shouldn't have this section, so
this removes some redundant work in that case.
To make sure that this is kept up-to-date for modules and the main
kernel, a comment is added to vmlinux.lds.h, with the existing ifdeffery
simplified for legibility.
I built parisc generic-{32,64}bit_defconfig with DYNAMIC_FTRACE enabled,
and verified that the section made it into the .ko files for modules.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Helge Deller <deller@gmx.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Torsten Duwe <duwe@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: linux-parisc@vger.kernel.org
2019-10-16 18:17:11 +01:00
|
|
|
#ifdef CC_USING_PATCHABLE_FUNCTION_ENTRY
|
|
|
|
#define FTRACE_CALLSITE_SECTION "__patchable_function_entries"
|
|
|
|
#else
|
|
|
|
#define FTRACE_CALLSITE_SECTION "__mcount_loc"
|
|
|
|
#endif
|
2008-08-14 15:45:08 -04:00
|
|
|
#else
|
|
|
|
static inline void ftrace_init(void) { }
|
|
|
|
#endif
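The point of `FTRACE_CALLSITE_SECTION` is that code looking for the callsite table compares against one name and does not care which compiler mechanism produced it. A small user-space sketch, assuming a hypothetical helper `is_ftrace_callsite_section()` that is not part of the kernel:

```c
#include <string.h>

/* Same selection as in the header: one name, chosen at build time. */
#ifdef CC_USING_PATCHABLE_FUNCTION_ENTRY
#define FTRACE_CALLSITE_SECTION "__patchable_function_entries"
#else
#define FTRACE_CALLSITE_SECTION "__mcount_loc"
#endif

/* Hypothetical helper: returns 1 if an ELF section name is the
 * callsite section for this build, 0 otherwise. */
static int is_ftrace_callsite_section(const char *name)
{
	return strcmp(name, FTRACE_CALLSITE_SECTION) == 0;
}
```

A loader written this way needs no per-architecture special case; only the macro definition changes between builds.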
|
|
|
|
|
2008-11-26 00:57:25 +01:00
|
|
|
/*
|
|
|
|
* Structure that defines an entry function trace.
|
2016-06-29 19:56:48 +09:00
|
|
|
* It's already packed but the attribute "packed" is needed
|
|
|
|
* to remove extra padding at the end.
|
2008-11-26 00:57:25 +01:00
|
|
|
*/
|
|
|
|
struct ftrace_graph_ent {
|
|
|
|
unsigned long func; /* Current function */
|
|
|
|
int depth;
|
2016-06-29 19:56:48 +09:00
|
|
|
} __packed;
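The remark about padding at the end can be checked with `sizeof`. The sketch below assumes a GCC-compatible compiler and spells out `__attribute__((packed))`, which is what the kernel's `__packed` expands to; `ent_padded` and `ent_packed` are illustrative names only.

```c
#include <stddef.h>

/* Same members as struct ftrace_graph_ent, without the attribute.
 * The compiler may pad the tail up to the alignment of unsigned long. */
struct ent_padded {
	unsigned long func;
	int depth;
};

/* With the attribute, no tail padding is added. */
struct ent_packed {
	unsigned long func;
	int depth;
} __attribute__((packed));

/* Bytes of padding the attribute removes on this target. */
static size_t ent_padding_saved(void)
{
	return sizeof(struct ent_padded) - sizeof(struct ent_packed);
}
```

On a typical 64-bit target the padded form is rounded up to a multiple of 8 bytes; the exact saving depends on the ABI, so the sketch computes it instead of assuming a value.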
|
2008-08-01 12:26:41 -04:00
|
|
|
|
2008-11-11 07:03:45 +01:00
|
|
|
/*
|
|
|
|
* Structure that defines a return function trace.
|
2016-06-29 19:56:48 +09:00
|
|
|
* It's already packed but the attribute "packed" is needed
|
|
|
|
* to remove extra padding at the end.
|
2008-11-11 07:03:45 +01:00
|
|
|
*/
|
2008-11-25 21:07:04 +01:00
|
|
|
struct ftrace_graph_ret {
|
2008-11-11 07:03:45 +01:00
|
|
|
unsigned long func; /* Current function */
|
2020-10-28 08:19:24 -04:00
|
|
|
int depth;
|
2008-11-17 03:22:41 +01:00
|
|
|
/* Number of functions that overran the depth limit for current task */
|
2020-10-28 08:19:24 -04:00
|
|
|
unsigned int overrun;
|
2016-06-29 19:56:48 +09:00
|
|
|
unsigned long long calltime;
|
|
|
|
unsigned long long rettime;
|
|
|
|
} __packed;
|
2008-11-11 07:03:45 +01:00
|
|
|
|
2010-04-02 19:01:22 +02:00
|
|
|
/* Type of the callback handlers for tracing function graph */
|
|
|
|
typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *); /* return */
|
|
|
|
typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *); /* entry */
|
|
|
|
|
2019-04-24 12:34:46 -04:00
|
|
|
extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace);
|
|
|
|
|
2008-11-25 21:07:04 +01:00
|
|
|
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
|
2008-12-06 03:40:00 +01:00
|
|
|
|
2018-11-15 14:06:47 -05:00
|
|
|
struct fgraph_ops {
|
|
|
|
trace_func_graph_ent_t entryfunc;
|
|
|
|
trace_func_graph_ret_t retfunc;
|
|
|
|
};
|
|
|
|
|
2009-02-09 10:54:03 -08:00
|
|
|
/*
|
|
|
|
* Stack of return addresses for functions
|
|
|
|
* of a thread.
|
|
|
|
* Used in struct thread_info
|
|
|
|
*/
|
|
|
|
struct ftrace_ret_stack {
|
|
|
|
unsigned long ret;
|
|
|
|
unsigned long func;
|
|
|
|
unsigned long long calltime;
|
2016-08-31 11:55:29 +09:00
|
|
|
#ifdef CONFIG_FUNCTION_PROFILER
|
2009-03-24 23:17:58 -04:00
|
|
|
unsigned long long subtime;
|
2016-08-31 11:55:29 +09:00
|
|
|
#endif
|
2016-08-19 06:52:56 -05:00
|
|
|
#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
|
function-graph: add stack frame test
In case gcc does something funny with the stack frames, or the return
from function code, we would like to detect that.
An arch may implement passing of a variable that is unique to the
function and can be saved on entering a function and can be tested
when exiting the function. Usually the frame pointer can be used for
this purpose.
This patch also implements this for x86, where it passes in the stack
frame of the parent function, and will test that frame on exit.
There was a case in x86_32 with optimize for size (-Os) where, for a
few functions, gcc would align the stack frame and place a copy of the
return address into it. The function graph tracer modified the copy and
not the actual return address. On return from the function, it did not go
to the tracer hook, but returned to the parent. This broke the function
graph tracer, because the return of the parent (where gcc did not do
this funky manipulation) returned to the location that the child function
was supposed to. This caused strange kernel crashes.
This test detected the problem and pointed out where the issue was.
This modifies the parameters of one of the functions that the arch
specific code calls, so it includes changes to arch code to accommodate
the new prototype.
Note, I notice that the parisc arch implements its own push_return_trace.
This is now a generic function and the ftrace_push_return_trace should be
used instead. This patch does not touch that code.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-18 12:45:08 -04:00
|
|
|
unsigned long fp;
|
2016-08-19 06:52:56 -05:00
|
|
|
#endif
|
2016-08-19 06:52:57 -05:00
|
|
|
#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
|
|
|
|
unsigned long *retp;
|
|
|
|
#endif
|
2009-02-09 10:54:03 -08:00
|
|
|
};
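The `fp` member exists for the frame test described in the "function-graph: add stack frame test" commit message above: a value unique to the call is saved on entry and compared on exit. The following is a simplified user-space sketch of that idea, not the kernel implementation; `ret_stack_sketch`, `push_return_sketch()` and `pop_return_sketch()` are hypothetical names, and the error codes are arbitrary.

```c
#define RET_STACK_DEPTH_SKETCH 50	/* mirrors FTRACE_RETFUNC_DEPTH */

struct ret_entry_sketch {
	unsigned long ret;	/* real return address */
	unsigned long func;	/* traced function */
	unsigned long fp;	/* frame pointer recorded on entry */
};

struct ret_stack_sketch {
	struct ret_entry_sketch entries[RET_STACK_DEPTH_SKETCH];
	int curr;		/* index of the top entry, -1 when empty */
};

static void ret_stack_init_sketch(struct ret_stack_sketch *s)
{
	s->curr = -1;
}

/* Record the return address and frame pointer on function entry.
 * Returns 0 on success, -1 if the depth limit would be exceeded. */
static int push_return_sketch(struct ret_stack_sketch *s, unsigned long ret,
			      unsigned long func, unsigned long fp)
{
	if (s->curr + 1 >= RET_STACK_DEPTH_SKETCH)
		return -1;
	s->curr++;
	s->entries[s->curr].ret = ret;
	s->entries[s->curr].func = func;
	s->entries[s->curr].fp = fp;
	return 0;
}

/* On function exit, verify the frame pointer before handing back the
 * saved return address. Returns 0 on success, -1 if the stack is empty,
 * -2 if the frame pointer does not match the one seen on entry. */
static int pop_return_sketch(struct ret_stack_sketch *s, unsigned long fp,
			     unsigned long *ret)
{
	if (s->curr < 0)
		return -1;
	if (s->entries[s->curr].fp != fp)
		return -2;
	*ret = s->entries[s->curr].ret;
	s->curr--;
	return 0;
}
```

A mismatch leaves the entry in place so that the caller can report where the inconsistency was detected.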
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Primary handler of a function return.
|
|
|
|
* It relies on ftrace_return_to_handler.
|
|
|
|
* Defined in entry_32/64.S
|
|
|
|
*/
|
|
|
|
extern void return_to_handler(void);
|
|
|
|
|
|
|
|
extern int
|
2018-11-18 17:10:15 -05:00
|
|
|
function_graph_enter(unsigned long ret, unsigned long func,
|
|
|
|
unsigned long frame_pointer, unsigned long *retp);
|
2009-02-09 10:54:03 -08:00
|
|
|
|
2018-11-19 20:54:08 -05:00
|
|
|
struct ftrace_ret_stack *
|
|
|
|
ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
|
|
|
|
|
2016-08-19 06:52:58 -05:00
|
|
|
unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
|
|
|
|
unsigned long ret, unsigned long *retp);
|
|
|
|
|
2008-12-06 03:40:00 +01:00
|
|
|
/*
|
|
|
|
* Sometimes we don't want to trace a function with the function
|
|
|
|
* graph tracer but we want it to stay traced by the usual function
|
|
|
|
* tracer if the function graph tracer is not configured.
|
|
|
|
*/
|
|
|
|
#define __notrace_funcgraph notrace
|
|
|
|
|
2008-11-23 06:22:56 +01:00
|
|
|
#define FTRACE_RETFUNC_DEPTH 50
|
|
|
|
#define FTRACE_RETSTACK_ALLOC_SIZE 32
|
2018-11-15 14:06:47 -05:00
|
|
|
|
|
|
|
extern int register_ftrace_graph(struct fgraph_ops *ops);
|
|
|
|
extern void unregister_ftrace_graph(struct fgraph_ops *ops);
|
2008-11-26 00:57:25 +01:00
|
|
|
|
2022-03-30 09:00:19 +02:00
|
|
|
/**
|
|
|
|
* ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
|
|
|
|
*
|
|
|
|
* ftrace_graph_stop() is called when a severe error is detected in
|
|
|
|
* the function graph tracing. This function is called by the critical
|
|
|
|
* paths of function graph to keep those paths from doing any more harm.
|
|
|
|
*/
|
|
|
|
DECLARE_STATIC_KEY_FALSE(kill_ftrace_graph);
|
|
|
|
|
|
|
|
static inline bool ftrace_graph_is_dead(void)
|
|
|
|
{
|
|
|
|
return static_branch_unlikely(&kill_ftrace_graph);
|
|
|
|
}
|
|
|
|
|
2008-12-02 23:50:02 -05:00
|
|
|
extern void ftrace_graph_stop(void);
|
|
|
|
|
2008-11-26 00:57:25 +01:00
|
|
|
/* The current handlers in use */
|
|
|
|
extern trace_func_graph_ret_t ftrace_graph_return;
|
|
|
|
extern trace_func_graph_ent_t ftrace_graph_entry;
|
2008-11-11 07:03:45 +01:00
|
|
|
|
2008-11-25 21:07:04 +01:00
|
|
|
extern void ftrace_graph_init_task(struct task_struct *t);
|
|
|
|
extern void ftrace_graph_exit_task(struct task_struct *t);
|
ftrace: Fix memory leak with function graph and cpu hotplug
When the function graph tracer starts, it needs to make a special
stack for each task to save the real return values of the tasks.
All running tasks have this stack created, as well as any new
tasks.
On CPU hot plug, the new idle task will allocate a stack as well
when init_idle() is called. The problem is that cpu hotplug does
not create a new idle_task. Instead it uses the idle task that
existed when the cpu went down.
ftrace_graph_init_task() will add a new ret_stack to the task
that is given to it. Because a clone will make the task
have a stack of its parent it does not check if the task's
ret_stack is already NULL or not. When the CPU hotplug code
starts a CPU up again, it will allocate a new stack even
though one already existed for it.
The solution is to treat the idle_task specially. In fact, the
function_graph code already does, just not at init_idle().
Instead of using the ftrace_graph_init_task() for the idle task,
which that function expects the task to be a clone, have a
separate ftrace_graph_init_idle_task(). Also, we will create a
per_cpu ret_stack that is used by the idle task. When we call
ftrace_graph_init_idle_task() it will check if the idle task's
ret_stack is NULL, if it is, then it will assign it the per_cpu
ret_stack.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-10 21:26:13 -05:00
|
|
|
extern void ftrace_graph_init_idle_task(struct task_struct *t, int cpu);
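The fix described in the commit message above comes down to assigning the idle task's return stack only when it has none, so that bringing a CPU back up reuses the existing one. A minimal user-space sketch under that reading; the types, the fixed-size array standing in for per-CPU storage, and `init_idle_task_sketch()` are all hypothetical and do not reflect the kernel's actual data layout.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4

struct idle_ret_stack_sketch {
	unsigned long entries[50];
};

struct task_sketch {
	struct idle_ret_stack_sketch *ret_stack;
};

/* Stand-in for a per-CPU ret_stack reserved for each idle task. */
static struct idle_ret_stack_sketch idle_ret_stacks[NR_CPUS_SKETCH];

/* Assign the per-CPU stack only if the idle task has none yet, so
 * repeated CPU hotplug cycles never allocate or leak a second one. */
static void init_idle_task_sketch(struct task_sketch *t, int cpu)
{
	if (!t->ret_stack)
		t->ret_stack = &idle_ret_stacks[cpu];
}
```

Calling the function a second time for the same task is a no-op, which is the property the original code lacked.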
|
2008-12-04 23:51:23 +01:00
|
|
|
|
2008-12-06 03:43:41 +01:00
|
|
|
static inline void pause_graph_tracing(void)
|
|
|
|
{
|
|
|
|
atomic_inc(&current->tracing_graph_pause);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void unpause_graph_tracing(void)
|
|
|
|
{
|
|
|
|
atomic_dec(&current->tracing_graph_pause);
|
|
|
|
}
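Because the two helpers above increment and decrement a counter instead of setting a flag, pause/unpause pairs can nest. The sketch below models that with C11 atomics in user space. It assumes tracing is considered paused while the counter is non-zero, and it uses a single global counter where the kernel keeps one per task; all names are hypothetical.

```c
#include <stdatomic.h>

/* Stand-in for current->tracing_graph_pause (per task in the kernel). */
static atomic_int tracing_graph_pause_sketch;

static void pause_graph_tracing_sketch(void)
{
	atomic_fetch_add(&tracing_graph_pause_sketch, 1);
}

static void unpause_graph_tracing_sketch(void)
{
	atomic_fetch_sub(&tracing_graph_pause_sketch, 1);
}

/* Assumption: a non-zero count means tracing is paused. */
static int graph_tracing_paused_sketch(void)
{
	return atomic_load(&tracing_graph_pause_sketch) != 0;
}
```

With a counter, an inner unpause does not re-enable tracing while an outer caller still expects it to be paused.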
|
2009-03-25 20:55:00 -04:00
|
|
|
#else /* !CONFIG_FUNCTION_GRAPH_TRACER */
|
2008-12-06 03:40:00 +01:00
|
|
|
|
|
|
|
#define __notrace_funcgraph
|
|
|
|
|
2008-11-25 21:07:04 +01:00
|
|
|
static inline void ftrace_graph_init_task(struct task_struct *t) { }
|
|
|
|
static inline void ftrace_graph_exit_task(struct task_struct *t) { }
|
2011-02-10 21:26:13 -05:00
|
|
|
static inline void ftrace_graph_init_idle_task(struct task_struct *t, int cpu) { }
|
2008-12-04 23:51:23 +01:00
|
|
|
|
2018-11-15 14:06:47 -05:00
|
|
|
/* Define as macros as fgraph_ops may not be defined */
|
|
|
|
#define register_ftrace_graph(ops) ({ -1; })
|
|
|
|
#define unregister_ftrace_graph(ops) do { } while (0)
|
2008-12-06 03:43:41 +01:00
|
|
|
|
2016-08-19 06:52:58 -05:00
|
|
|
static inline unsigned long
|
|
|
|
ftrace_graph_ret_addr(struct task_struct *task, int *idx, unsigned long ret,
|
|
|
|
unsigned long *retp)
|
|
|
|
{
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2008-12-06 03:43:41 +01:00
|
|
|
static inline void pause_graph_tracing(void) { }
|
|
|
|
static inline void unpause_graph_tracing(void) { }
|
2009-03-25 20:55:00 -04:00
|
|
|
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
|
2008-11-11 07:03:45 +01:00
|
|
|
|
2008-12-03 15:36:57 -05:00
|
|
|
#ifdef CONFIG_TRACING
|
2010-04-18 19:08:41 +02:00
|
|
|
enum ftrace_dump_mode;
|
|
|
|
|
|
|
|
extern enum ftrace_dump_mode ftrace_dump_on_oops;
|
2014-12-12 22:27:10 -05:00
|
|
|
extern int tracepoint_printk;
|
2009-03-05 10:28:45 +01:00
|
|
|
|
2013-06-14 16:21:43 -04:00
|
|
|
extern void disable_trace_on_warning(void);
|
|
|
|
extern int __disable_trace_on_warning;
|
|
|
|
|
2016-11-23 15:52:45 -05:00
|
|
|
int tracepoint_printk_sysctl(struct ctl_table *table, int write,
|
2020-04-24 08:43:38 +02:00
|
|
|
void *buffer, size_t *lenp, loff_t *ppos);
|
2016-11-23 15:52:45 -05:00
|
|
|
|
2013-06-14 16:21:43 -04:00
|
|
|
#else /* CONFIG_TRACING */
|
|
|
|
static inline void disable_trace_on_warning(void) { }
|
2008-12-03 15:36:57 -05:00
|
|
|
#endif /* CONFIG_TRACING */
|
|
|
|
|
2010-01-26 04:40:03 -05:00
|
|
|
#ifdef CONFIG_FTRACE_SYSCALLS
|
|
|
|
|
|
|
|
unsigned long arch_syscall_addr(int nr);
|
|
|
|
|
|
|
|
#endif /* CONFIG_FTRACE_SYSCALLS */
|
|
|
|
|
2008-05-12 21:20:42 +02:00
|
|
|
#endif /* _LINUX_FTRACE_H */
|