License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner, Kate Stewart, and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
should be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0                                                 11139
and resulted in the first patch in this series.
If that file was under a */uapi/* path, it was tagged "GPL-2.0 WITH
Linux-syscall-note"; otherwise it was tagged "GPL-2.0". The results were:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                           930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                           270
GPL-2.0+ WITH Linux-syscall-note                          169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)        21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)        17
LGPL-2.1+ WITH Linux-syscall-note                          15
GPL-1.0+ WITH Linux-syscall-note                           14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)        5
LGPL-2.0+ WITH Linux-syscall-note                           4
LGPL-2.1 WITH Linux-syscall-note                            3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                  3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)                 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
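The first heuristic in the list above (neither scanner found any license
traces, so the top level COPYING license applies) can be sketched in C. This
is only an illustration: demo_default_spdx_id() is a hypothetical name, and
treating "*/uapi/*" as a plain substring match on "/uapi/" is an assumption,
not necessarily how the spreadsheet classified paths.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: pick the default identifier for a file in which
 * no license information was detected at all.
 */
static const char *demo_default_spdx_id(const char *path)
{
	assert(path != NULL);

	/* Assumption: "*\/uapi\/*" is approximated by a substring test. */
	if (strstr(path, "/uapi/") != NULL)
		return "GPL-2.0 WITH Linux-syscall-note";

	return "GPL-2.0";
}
```

Under these assumptions a path such as include/uapi/linux/perf_event.h gets
the syscall-note variant, while include/linux/trace_events.h gets plain
GPL-2.0.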
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
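The script itself is not part of this message. As a rough sketch of the
header versus source distinction it had to make, the following C helper
formats the tag line by file extension. demo_spdx_line() is a hypothetical
name, and the sketch assumes the usual kernel convention of a C-style
comment in .h files and a C++-style comment in .c files; other file types
are simply rejected here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the SPDX tag line for `path` into `buf`.
 * Returns the line length, or -1 for unknown file types or a short buffer.
 */
static int demo_spdx_line(char *buf, size_t size, const char *path,
			  const char *id)
{
	const char *ext;
	int n;

	assert(buf != NULL && path != NULL && id != NULL);

	ext = strrchr(path, '.');
	if (ext != NULL && strcmp(ext, ".h") == 0)
		n = snprintf(buf, size, "/* SPDX-License-Identifier: %s */", id);
	else if (ext != NULL && strcmp(ext, ".c") == 0)
		n = snprintf(buf, size, "// SPDX-License-Identifier: %s", id);
	else
		return -1;

	if (n < 0 || (size_t)n >= size)
		return -1;	/* output error or truncated */
	return n;
}
```

For a header and the identifier "GPL-2.0" this yields the same form as the
tag line that this patch added to the header.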
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 14:07:57 +00:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
tracing: Add and use generic set_trigger_filter() implementation
Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.
Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger. For example, to add a filter to an
enable/disable_event command:
echo 'enable_event:system:event if common_pid == 999' > \
.../othersys/otherevent/trigger
The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.
As another example, to add a filter to a stacktrace command:
echo 'stacktrace if common_pid == 999' > \
.../somesys/someevent/trigger
The above command will only trigger a stacktrace if the common_pid
field in the event is 999.
The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.
Because triggers can now use filters, the trigger-invoking logic needs
to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
trigger has a filter associated with it, the trigger invocation now
needs to happen after the { assign; } part of the call, in order for
the trigger condition to be tested.
There's still a SOFT_DISABLED-only check at the top of e.g. the
ftrace_raw_events function, so when an event is soft disabled but not
because of the presence of a trigger, the original SOFT_DISABLED
behavior remains unchanged.
There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers. To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invokes the
trigger. Once all commands have been either invoked or set their
return flag, event_triggers_call() returns. The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().
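The two-part invocation described above can be sketched in plain userspace
C. This is an illustration under simplifying assumptions, not the kernel
code: struct demo_trigger, demo_triggers_call() and
demo_triggers_post_call() are hypothetical names, the "record" is reduced to
a single common_pid value, and each trigger is given a bit by its array
index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a trigger command. */
struct demo_trigger {
	bool post_trigger;		/* must run after the record is closed */
	bool (*filter)(int common_pid);	/* NULL means "no filter" */
	int fired;			/* how often the trigger has run */
};

/*
 * Part one: test each filter against the current record. Triggers that
 * pass either run immediately or, if post_trigger is set, mark their
 * bit in the returned mask.
 */
static unsigned int demo_triggers_call(struct demo_trigger *t, size_t n,
				       int common_pid)
{
	unsigned int deferred = 0;

	assert(n <= 8 * sizeof(deferred));
	for (size_t i = 0; i < n; i++) {
		if (t[i].filter != NULL && !t[i].filter(common_pid))
			continue;	/* filter did not match */
		if (t[i].post_trigger)
			deferred |= 1u << i;
		else
			t[i].fired++;
	}
	return deferred;
}

/* Part two: once the record is committed or discarded, run deferred ones. */
static void demo_triggers_post_call(struct demo_trigger *t, size_t n,
				    unsigned int deferred)
{
	for (size_t i = 0; i < n; i++)
		if (deferred & (1u << i))
			t[i].fired++;
}

/* Example filter, equivalent to 'if common_pid == 999'. */
static bool demo_pid_is_999(int common_pid)
{
	return common_pid == 999;
}
```

A caller would invoke demo_triggers_call(), commit or discard its record,
and then pass the returned mask to demo_triggers_post_call(), so that a
trigger which writes to the trace buffer never runs while a record is open.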
To simplify the above and make it more efficient, the TRIGGER_COND bit
is introduced, which is set only if a soft-disabled trigger needs to
use the log record for filter testing or needs to wait until the
current log record is closed.
The syscall event invocation code is also changed in analogous ways.
Because event triggers need to be able to create and free filters,
this also adds a couple of external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.
Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 13:59:29 +00:00
|
|
|
|
2015-05-04 22:12:44 +00:00
|
|
|
#ifndef _LINUX_TRACE_EVENT_H
|
|
|
|
#define _LINUX_TRACE_EVENT_H
|
2009-04-13 15:20:49 +00:00
|
|
|
|
|
|
|
#include <linux/ring_buffer.h>
|
2009-09-12 23:04:54 +00:00
|
|
|
#include <linux/trace_seq.h>
|
2009-05-26 18:25:22 +00:00
|
|
|
#include <linux/percpu.h>
|
2009-09-18 04:10:28 +00:00
|
|
|
#include <linux/hardirq.h>
|
2010-01-28 01:32:29 +00:00
|
|
|
#include <linux/perf_event.h>
|
2014-04-08 21:26:21 +00:00
|
|
|
#include <linux/tracepoint.h>
|
2009-04-13 15:20:49 +00:00
|
|
|
|
|
|
|
struct trace_array;
|
2020-01-09 23:53:48 +00:00
|
|
|
struct array_buffer;
|
2009-04-13 15:20:49 +00:00
|
|
|
struct tracer;
|
2009-04-10 18:53:50 +00:00
|
|
|
struct dentry;
|
tracing, perf: Implement BPF programs attached to kprobes
BPF programs, attached to kprobes, provide a safe way to execute
user-defined BPF byte-code programs without being able to crash or
hang the kernel in any way. The BPF engine makes sure that such
programs have a finite execution time and that they cannot break
out of their sandbox.
The user interface is to attach to a kprobe via the perf syscall:
struct perf_event_attr attr = {
.type = PERF_TYPE_TRACEPOINT,
.config = event_id,
...
};
event_fd = perf_event_open(&attr,...);
ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
'prog_fd' is a file descriptor associated with a previously loaded
BPF program.
'event_id' is an ID of the kprobe created.
Closing 'event_fd':
close(event_fd);
... automatically detaches BPF program from it.
BPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- probe_read - wrapper of probe_kernel_read() used to access any
kernel data structures
BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
architecture dependent) and return 0 to ignore the event and 1 to store
kprobe event into the ring buffer.
Note, kprobes are fundamentally _not_ a stable kernel ABI,
so BPF programs attached to kprobes must be recompiled for
every kernel version and the user must supply the correct LINUX_VERSION_CODE
in attr.kern_version during the bpf_prog_load() call.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-25 19:49:20 +00:00
|
|
|
struct bpf_prog;
|
2022-03-16 12:24:09 +00:00
|
|
|
union bpf_attr;
|
2009-04-13 15:20:49 +00:00
|
|
|
|
2024-02-22 21:14:19 +00:00
|
|
|
/* Used for event string fields when they are NULL */
|
|
|
|
#define EVENT_NULL_STR "(null)"
|
|
|
|
|
2015-05-04 22:12:44 +00:00
|
|
|
const char *trace_print_flags_seq(struct trace_seq *p, const char *delim,
|
|
|
|
unsigned long flags,
|
|
|
|
const struct trace_print_flags *flag_array);
|
2009-05-26 18:25:22 +00:00
|
|
|
|
2015-05-04 22:12:44 +00:00
|
|
|
const char *trace_print_symbols_seq(struct trace_seq *p, unsigned long val,
|
|
|
|
const struct trace_print_flags *symbol_array);
|
2009-05-20 23:21:47 +00:00
|
|
|
|
2011-04-19 01:35:28 +00:00
|
|
|
#if BITS_PER_LONG == 32
|
2017-02-22 23:39:47 +00:00
|
|
|
const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim,
|
|
|
|
unsigned long long flags,
|
|
|
|
const struct trace_print_flags_u64 *flag_array);
|
|
|
|
|
2015-05-04 22:12:44 +00:00
|
|
|
const char *trace_print_symbols_seq_u64(struct trace_seq *p,
|
|
|
|
unsigned long long val,
|
|
|
|
const struct trace_print_flags_u64
|
2011-04-19 01:35:28 +00:00
|
|
|
*symbol_array);
|
|
|
|
#endif
|
|
|
|
|
2015-05-04 22:12:44 +00:00
|
|
|
const char *trace_print_bitmask_seq(struct trace_seq *p, void *bitmask_ptr,
|
|
|
|
unsigned int bitmask_size);
|
tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks
Being able to show a cpumask of events can be useful as some events
may affect only some CPUs. There is no standard way to record the
cpumask and converting it to a string is rather expensive during
the trace as traces happen in hotpaths. It would be better to record
the raw event mask and be able to parse it at print time.
The following macros were added for use with the TRACE_EVENT() macro:
__bitmask()
__assign_bitmask()
__get_bitmask()
To test this, I added this to the sched_migrate_task event, which
looked like this:
TRACE_EVENT(sched_migrate_task,
TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),
TP_ARGS(p, dest_cpu, cpus),
TP_STRUCT__entry(
__array( char, comm, TASK_COMM_LEN )
__field( pid_t, pid )
__field( int, prio )
__field( int, orig_cpu )
__field( int, dest_cpu )
__bitmask( cpumask, num_possible_cpus() )
),
TP_fast_assign(
memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
__entry->pid = p->pid;
__entry->prio = p->prio;
__entry->orig_cpu = task_cpu(p);
__entry->dest_cpu = dest_cpu;
__assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
),
TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
__entry->comm, __entry->pid, __entry->prio,
__entry->orig_cpu, __entry->dest_cpu,
__get_bitmask(cpumask))
);
With the output of:
ksmtuned-3613 [003] d..2 485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
migration/1-13 [001] d..5 485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
awk-3615 [002] d.H5 485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
migration/2-18 [002] d..5 485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f
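The idea of recording the raw mask and only turning it into a string at
print time can be sketched in userspace C. demo_format_bitmask() is a
hypothetical helper, not the kernel's formatter; it only assumes the layout
seen in the output above, that is comma-separated groups of eight hex
digits with the most significant 32-bit word first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical print-time formatter. words[0] holds bits 0-31.
 * Returns the string length, or -1 if `buf` is too small.
 */
static int demo_format_bitmask(char *buf, size_t size,
			       const uint32_t *words, size_t nwords)
{
	size_t used = 0;

	assert(buf != NULL && words != NULL);
	if (size == 0)
		return -1;
	buf[0] = '\0';

	/* Walk from the most significant word down to word 0. */
	for (size_t i = nwords; i-- > 0; ) {
		int n = snprintf(buf + used, size - used, "%08x%s",
				 (unsigned int)words[i], i ? "," : "");

		if (n < 0 || (size_t)n >= size - used)
			return -1;	/* output error or truncated */
		used += (size_t)n;
	}
	return (int)used;
}
```

The expensive string conversion then happens only when the trace is read,
while the hot path just copies the raw words into the record.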
Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.home
Suggested-by: Javi Merino <javi.merino@arm.com>
Tested-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-06 17:10:24 +00:00
|
|
|
|
2015-05-04 22:12:44 +00:00
|
|
|
const char *trace_print_hex_seq(struct trace_seq *p,
|
2017-01-25 01:28:16 +00:00
|
|
|
const unsigned char *buf, int len,
|
2017-02-02 16:09:54 +00:00
|
|
|
bool concatenate);
|
2010-04-01 11:40:58 +00:00
|
|
|
|
2015-05-04 22:12:44 +00:00
|
|
|
const char *trace_print_array_seq(struct trace_seq *p,
|
2015-04-29 15:18:46 +00:00
|
|
|
const void *buf, int count,
|
2015-01-28 12:48:53 +00:00
|
|
|
size_t el_size);
|
|
|
|
|
2019-11-07 12:45:38 +00:00
|
|
|
const char *
|
|
|
|
trace_print_hex_dump_seq(struct trace_seq *p, const char *prefix_str,
|
|
|
|
int prefix_type, int rowsize, int groupsize,
|
|
|
|
const void *buf, size_t len, bool ascii);
|
|
|
|
|
2013-02-21 02:32:38 +00:00
|
|
|
struct trace_iterator;
|
|
|
|
struct trace_event;
|
|
|
|
|
2015-05-05 18:18:11 +00:00
|
|
|
int trace_raw_output_prep(struct trace_iterator *iter,
|
|
|
|
struct trace_event *event);
|
2020-10-15 14:55:07 +00:00
|
|
|
extern __printf(2, 3)
|
|
|
|
void trace_event_printf(struct trace_iterator *iter, const char *fmt, ...);
|
2013-02-21 02:32:38 +00:00
|
|
|
|
2023-08-16 15:49:26 +00:00
|
|
|
/* Used to find the offset and length of dynamic fields in trace events */
|
|
|
|
struct trace_dynamic_info {
|
|
|
|
#ifdef CONFIG_CPU_BIG_ENDIAN
|
|
|
|
u16 len;
|
2023-09-08 20:39:29 +00:00
|
|
|
u16 offset;
|
2023-08-16 15:49:26 +00:00
|
|
|
#else
|
|
|
|
u16 offset;
|
2023-09-08 20:39:29 +00:00
|
|
|
u16 len;
|
2023-08-16 15:49:26 +00:00
|
|
|
#endif
|
2023-09-08 20:39:29 +00:00
|
|
|
} __packed;
|
2023-08-16 15:49:26 +00:00
|
|
|
|
2009-04-13 15:20:49 +00:00
|
|
|
/*
|
|
|
|
* The trace entry - the most basic unit of tracing. This is what
|
|
|
|
* is printed in the end as a single line in the trace output, such as:
|
|
|
|
*
|
|
|
|
* bash-15816 [01] 235.197585: idle_cpu <- irq_enter
|
|
|
|
*/
|
|
|
|
struct trace_entry {
|
2009-03-26 15:03:29 +00:00
|
|
|
unsigned short type;
|
2009-04-13 15:20:49 +00:00
|
|
|
unsigned char flags;
|
|
|
|
unsigned char preempt_count;
|
|
|
|
int pid;
|
|
|
|
};
|
|
|
|
|
2015-05-13 17:44:36 +00:00
|
|
|
#define TRACE_EVENT_TYPE_MAX \
|
2009-03-26 15:03:29 +00:00
|
|
|
((1 << (sizeof(((struct trace_entry *)0)->type) * 8)) - 1)
|
|
|
|
|
2009-04-13 15:20:49 +00:00
|
|
|
/*
|
|
|
|
* Trace iterator - used by printout routines who present trace
|
|
|
|
* results to users and which routines might sleep, etc:
|
|
|
|
*/
|
|
|
|
struct trace_iterator {
|
|
|
|
struct trace_array *tr;
|
|
|
|
struct tracer *trace;
|
2020-01-09 23:53:48 +00:00
|
|
|
struct array_buffer *array_buffer;
|
2009-04-13 15:20:49 +00:00
|
|
|
void *private;
|
|
|
|
int cpu_file;
|
|
|
|
struct mutex mutex;
|
2012-06-28 00:46:14 +00:00
|
|
|
struct ring_buffer_iter **buffer_iter;
|
2009-06-01 19:16:05 +00:00
|
|
|
unsigned long iter_flags;
|
2020-03-17 21:32:23 +00:00
|
|
|
void *temp; /* temp holder */
|
|
|
|
unsigned int temp_size;
|
2020-10-15 14:55:07 +00:00
|
|
|
char *fmt; /* modified format holder */
|
|
|
|
unsigned int fmt_size;
|
2024-03-12 12:15:08 +00:00
|
|
|
atomic_t wait_index;
|
2009-04-13 15:20:49 +00:00
|
|
|
|
2010-06-03 10:26:24 +00:00
|
|
|
/* trace_seq for __print_flags() and __print_symbolic() etc. */
|
|
|
|
struct trace_seq tmp_seq;
|
|
|
|
|
2013-08-02 17:16:43 +00:00
|
|
|
cpumask_var_t started;
|
|
|
|
|
2024-03-12 12:15:08 +00:00
|
|
|
/* Set when the file is closed to prevent new waiters */
|
|
|
|
bool closed;
|
|
|
|
|
2013-08-02 17:16:43 +00:00
|
|
|
/* it's true when current open file is snapshot */
|
|
|
|
bool snapshot;
|
|
|
|
|
2009-04-13 15:20:49 +00:00
|
|
|
/* The below is zeroed out in pipe_read */
|
|
|
|
struct trace_seq seq;
|
|
|
|
struct trace_entry *ent;
|
2010-03-31 23:49:26 +00:00
|
|
|
unsigned long lost_events;
|
2009-12-07 14:11:39 +00:00
|
|
|
int leftover;
|
2011-07-14 20:36:53 +00:00
|
|
|
int ent_size;
|
2009-04-13 15:20:49 +00:00
|
|
|
int cpu;
|
|
|
|
u64 ts;
|
|
|
|
|
|
|
|
loff_t pos;
|
|
|
|
long idx;
|
|
|
|
|
2013-08-02 17:16:43 +00:00
|
|
|
/* All new field here will be zeroed out in pipe_read */
|
2009-04-13 15:20:49 +00:00
|
|
|
};
|
|
|
|
|
2012-11-13 20:18:22 +00:00
|
|
|
enum trace_iter_flags {
|
|
|
|
TRACE_FILE_LAT_FMT = 1,
|
|
|
|
TRACE_FILE_ANNOTATE = 2,
|
|
|
|
TRACE_FILE_TIME_IN_NS = 4,
|
|
|
|
};
|
|
|
|
|
2009-04-13 15:20:49 +00:00
|
|
|
|
|
|
|
typedef enum print_line_t (*trace_print_func)(struct trace_iterator *iter,
|
2010-04-22 22:46:14 +00:00
|
|
|
int flags, struct trace_event *event);
|
|
|
|
|
|
|
|
struct trace_event_functions {
|
2009-04-13 15:20:49 +00:00
|
|
|
trace_print_func trace;
|
|
|
|
trace_print_func raw;
|
|
|
|
trace_print_func hex;
|
|
|
|
trace_print_func binary;
|
|
|
|
};
|
|
|
|
|
2010-04-22 22:46:14 +00:00
|
|
|
struct trace_event {
|
|
|
|
struct hlist_node node;
|
|
|
|
int type;
|
|
|
|
struct trace_event_functions *funcs;
|
|
|
|
};
|
|
|
|
|
2015-05-05 13:39:12 +00:00
|
|
|
extern int register_trace_event(struct trace_event *event);
|
|
|
|
extern int unregister_trace_event(struct trace_event *event);
|
2009-04-13 15:20:49 +00:00
|
|
|
|
|
|
|
/* Return values for print_line callback */
|
|
|
|
enum print_line_t {
|
|
|
|
TRACE_TYPE_PARTIAL_LINE = 0, /* Retry after flushing the seq */
|
|
|
|
TRACE_TYPE_HANDLED = 1,
|
|
|
|
TRACE_TYPE_UNHANDLED = 2, /* Relay to other output functions */
|
|
|
|
TRACE_TYPE_NO_CONSUME = 3 /* Handled but ask to not consume */
|
|
|
|
};
|
|
|
|
|
tracing: Move trace_handle_return() out of line
Currently trace_handle_return() looks like this:
static inline enum print_line_t trace_handle_return(struct trace_seq *s)
{
return trace_seq_has_overflowed(s) ?
TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
}
Where trace_seq_has_overflowed(s) is:
static inline bool trace_seq_has_overflowed(struct trace_seq *s)
{
return s->full || seq_buf_has_overflowed(&s->seq);
}
And seq_buf_has_overflowed(&s->seq) is:
static inline bool
seq_buf_has_overflowed(struct seq_buf *s)
{
return s->len > s->size;
}
Making trace_handle_return() into:
return (s->full || (s->seq.len > s->seq.size)) ?
TRACE_TYPE_PARTIAL_LINE :
TRACE_TYPE_HANDLED;
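As a sanity check of that flattening, here is a small userspace sketch with
stand-in types. All demo_* names are hypothetical and the structs carry only
the fields used in the snippets above; it merely shows that the layered
helpers and the flattened expression give the same answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
enum demo_print_line { DEMO_PARTIAL_LINE = 0, DEMO_HANDLED = 1 };

struct demo_seq_buf {
	size_t size;	/* capacity of the buffer */
	size_t len;	/* bytes written; larger than size on overflow */
};

struct demo_trace_seq {
	struct demo_seq_buf seq;
	bool full;
};

/* Layered form, mirroring the three inline helpers quoted above. */
static bool demo_seq_buf_has_overflowed(const struct demo_seq_buf *s)
{
	return s->len > s->size;
}

static bool demo_trace_seq_has_overflowed(const struct demo_trace_seq *s)
{
	return s->full || demo_seq_buf_has_overflowed(&s->seq);
}

static enum demo_print_line demo_handle_return(const struct demo_trace_seq *s)
{
	assert(s != NULL);
	return demo_trace_seq_has_overflowed(s) ?
		DEMO_PARTIAL_LINE : DEMO_HANDLED;
}

/* Flattened form: roughly what ends up inlined at every call site. */
static enum demo_print_line
demo_handle_return_flat(const struct demo_trace_seq *s)
{
	assert(s != NULL);
	return (s->full || (s->seq.len > s->seq.size)) ?
		DEMO_PARTIAL_LINE : DEMO_HANDLED;
}
```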
One would think this is not an issue to keep as an inline. But because this
is used in the TRACE_EVENT() macro, it is expanded for every tracepoint in
the system. Take a look at a single tracepoint, x86_irq_vector (the
first one I randomly chose). As trace_handle_return() is used in the
trace_raw_output_##call() function generated by the TRACE_EVENT() macro,
we disassemble trace_raw_output_x86_irq_vector and do a diff:
- is the original
+ is the out-of-line code
I removed identical lines that were different just due to different
addresses.
--- /tmp/irq-vec-orig 2017-03-16 09:12:48.569384851 -0400
+++ /tmp/irq-vec-ool 2017-03-16 09:13:39.378153385 -0400
@@ -6,27 +6,23 @@
53 push %rbx
48 89 fb mov %rdi,%rbx
4c 8b a7 c0 20 00 00 mov 0x20c0(%rdi),%r12
e8 f7 72 13 00 callq ffffffff81155c80 <trace_raw_output_prep>
83 f8 01 cmp $0x1,%eax
74 05 je ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23>
5b pop %rbx
41 5c pop %r12
5d pop %rbp
c3 retq
41 8b 54 24 08 mov 0x8(%r12),%edx
- 48 8d bb 98 10 00 00 lea 0x1098(%rbx),%rdi
+ 48 81 c3 98 10 00 00 add $0x1098,%rbx
- 48 c7 c6 7b 8a a0 81 mov $0xffffffff81a08a7b,%rsi
+ 48 c7 c6 ab 8a a0 81 mov $0xffffffff81a08aab,%rsi
- e8 c5 85 13 00 callq ffffffff81156f70 <trace_seq_printf>
=== here's the start of the main difference ===
+ 48 89 df mov %rbx,%rdi
+ e8 62 7e 13 00 callq ffffffff81156810 <trace_seq_printf>
- 8b 93 b8 20 00 00 mov 0x20b8(%rbx),%edx
- 31 c0 xor %eax,%eax
- 85 d2 test %edx,%edx
- 75 11 jne ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58>
- 48 8b 83 a8 20 00 00 mov 0x20a8(%rbx),%rax
- 48 39 83 a0 20 00 00 cmp %rax,0x20a0(%rbx)
- 0f 93 c0 setae %al
+ 48 89 df mov %rbx,%rdi
+ e8 4a c5 12 00 callq ffffffff8114af00 <trace_handle_return>
5b pop %rbx
- 0f b6 c0 movzbl %al,%eax
=== end ===
41 5c pop %r12
5d pop %rbp
c3 retq
If you notice, the original has 22 bytes of text more than the out of line
version. As this is for every TRACE_EVENT() defined in the system, this can
become quite large.
text data bss dec hex filename
8690305 5450490 1298432 15439227 eb957b vmlinux-orig
8681725 5450490 1298432 15430647 eb73f7 vmlinux-handle
This change has a total of 8580 bytes in savings.
$ objdump -dr /tmp/vmlinux-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
324
That's 324 tracepoints. But this does not include modules (which contain
many more tracepoints). For an allyesconfig build:
$ objdump -dr vmlinux-allyes-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
1401
That's 1401 tracepoints giving us:
text data bss dec hex filename
137920629 140221067 53264384 331406080 13c0db00 vmlinux-allyes-orig
137827709 140221067 53264384 331313160 13bf7008 vmlinux-allyes-handle
92920 bytes in savings!!!
Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.org
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-16 15:01:06 +00:00
|
|
|
enum print_line_t trace_handle_return(struct trace_seq *s);
|
2014-11-12 15:29:54 +00:00
|
|
|
|
2021-01-25 19:45:08 +00:00
|
|
|
static inline void tracing_generic_entry_update(struct trace_entry *entry,
|
|
|
|
unsigned short type,
|
|
|
|
unsigned int trace_ctx)
|
|
|
|
{
|
|
|
|
entry->preempt_count = trace_ctx & 0xff;
|
2021-01-25 19:45:11 +00:00
|
|
|
entry->pid = current->pid;
|
2021-01-25 19:45:08 +00:00
|
|
|
entry->type = type;
|
|
|
|
entry->flags = trace_ctx >> 16;
|
|
|
|
}
|
|
|
|
|
2021-01-25 19:45:09 +00:00
|
|
|
unsigned int tracing_gen_ctx_irq_test(unsigned int irqs_status);
|
|
|
|
|
|
|
|
enum trace_flag_type {
|
|
|
|
TRACE_FLAG_IRQS_OFF = 0x01,
|
2024-11-22 20:28:49 +00:00
|
|
|
TRACE_FLAG_NEED_RESCHED_LAZY = 0x02,
|
2021-01-25 19:45:09 +00:00
|
|
|
TRACE_FLAG_NEED_RESCHED = 0x04,
|
|
|
|
TRACE_FLAG_HARDIRQ = 0x08,
|
|
|
|
TRACE_FLAG_SOFTIRQ = 0x10,
|
|
|
|
TRACE_FLAG_PREEMPT_RESCHED = 0x20,
|
|
|
|
TRACE_FLAG_NMI = 0x40,
|
2021-12-13 10:08:53 +00:00
|
|
|
TRACE_FLAG_BH_OFF = 0x80,
|
2021-01-25 19:45:09 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
static inline unsigned int tracing_gen_ctx_flags(unsigned long irqflags)
|
|
|
|
{
|
|
|
|
unsigned int irq_status = irqs_disabled_flags(irqflags) ?
|
|
|
|
TRACE_FLAG_IRQS_OFF : 0;
|
|
|
|
return tracing_gen_ctx_irq_test(irq_status);
|
|
|
|
}
|
|
|
|
static inline unsigned int tracing_gen_ctx(void)
|
|
|
|
{
|
|
|
|
unsigned long irqflags;
|
|
|
|
|
|
|
|
local_save_flags(irqflags);
|
|
|
|
return tracing_gen_ctx_flags(irqflags);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline unsigned int tracing_gen_ctx_dec(void)
|
|
|
|
{
|
|
|
|
unsigned int trace_ctx;
|
|
|
|
|
|
|
|
trace_ctx = tracing_gen_ctx();
|
|
|
|
/*
|
2021-03-23 17:49:35 +00:00
|
|
|
* Subtract one from the preemption counter if preemption is enabled,
|
2021-01-25 19:45:09 +00:00
|
|
|
* see trace_event_buffer_reserve() for details.
|
|
|
|
*/
|
|
|
|
if (IS_ENABLED(CONFIG_PREEMPTION))
|
|
|
|
trace_ctx--;
|
|
|
|
return trace_ctx;
|
|
|
|
}
|
2021-01-25 19:45:08 +00:00
|
|
|
|
2015-05-05 14:09:53 +00:00
|
|
|
struct trace_event_file;
|
2012-08-02 14:32:10 +00:00
|
|
|
|
|
|
|
struct ring_buffer_event *
|
2019-12-13 18:58:57 +00:00
|
|
|
trace_event_buffer_lock_reserve(struct trace_buffer **current_buffer,
|
2015-05-05 14:09:53 +00:00
|
|
|
struct trace_event_file *trace_file,
|
2012-08-02 14:32:10 +00:00
|
|
|
int type, unsigned long len,
|
2021-01-25 19:45:08 +00:00
|
|
|
unsigned int trace_ctx);
|
2009-04-13 15:20:49 +00:00
|
|
|
|
2017-06-27 02:01:55 +00:00
|
|
|
#define TRACE_RECORD_CMDLINE BIT(0)
|
|
|
|
#define TRACE_RECORD_TGID BIT(1)
|
|
|
|
|
|
|
|
void tracing_record_taskinfo(struct task_struct *task, int flags);
|
|
|
|
void tracing_record_taskinfo_sched_switch(struct task_struct *prev,
|
|
|
|
struct task_struct *next, int flags);
|
|
|
|
|
|
|
|
void tracing_record_cmdline(struct task_struct *task);
|
|
|
|
void tracing_record_tgid(struct task_struct *task);
|
2009-04-13 15:20:49 +00:00
|
|
|
|
2022-12-05 10:21:52 +00:00
|
|
|
int trace_output_call(struct trace_iterator *iter, char *name, char *fmt, ...)
|
|
|
|
__printf(3, 4);
|
2012-08-09 23:16:14 +00:00
|
|
|
|
2009-07-20 02:20:53 +00:00
|
|
|
struct event_filter;
|
|
|
|
|
2010-04-21 16:27:06 +00:00
|
|
|
enum trace_reg {
|
|
|
|
TRACE_REG_REGISTER,
|
|
|
|
TRACE_REG_UNREGISTER,
|
2012-03-13 23:03:02 +00:00
|
|
|
#ifdef CONFIG_PERF_EVENTS
|
2010-04-21 16:27:06 +00:00
|
|
|
TRACE_REG_PERF_REGISTER,
|
|
|
|
TRACE_REG_PERF_UNREGISTER,
|
2012-02-15 14:51:49 +00:00
|
|
|
TRACE_REG_PERF_OPEN,
|
|
|
|
TRACE_REG_PERF_CLOSE,
|
2017-10-10 15:15:47 +00:00
|
|
|
/*
|
|
|
|
* These (ADD/DEL) use a 'boolean' return value, where 1 (true) means a
|
|
|
|
* custom action was taken and the default action is not to be
|
|
|
|
* performed.
|
|
|
|
*/
|
2012-02-15 14:51:50 +00:00
|
|
|
TRACE_REG_PERF_ADD,
|
|
|
|
TRACE_REG_PERF_DEL,
|
2012-03-13 23:03:02 +00:00
|
|
|
#endif
|
2010-04-21 16:27:06 +00:00
|
|
|
};
|
|
|
|
|
2015-05-05 15:45:27 +00:00
|
|
|
struct trace_event_call;
|
2010-04-21 16:27:06 +00:00
|
|
|
|
2019-10-24 20:26:59 +00:00
|
|
|
#define TRACE_FUNCTION_TYPE ((const char *)~0UL)
|
|
|
|
|
|
|
|
struct trace_event_fields {
|
|
|
|
const char *type;
|
|
|
|
union {
|
|
|
|
struct {
|
|
|
|
const char *name;
|
|
|
|
const int size;
|
|
|
|
const int align;
|
|
|
|
const int is_signed;
|
|
|
|
const int filter_type;
|
2023-02-12 15:13:03 +00:00
|
|
|
const int len;
|
2019-10-24 20:26:59 +00:00
|
|
|
};
|
|
|
|
int (*define_fields)(struct trace_event_call *);
|
|
|
|
};
|
|
|
|
};
|
|
|
|
|
2015-05-05 15:45:27 +00:00
|
|
|
struct trace_event_class {
|
2015-03-31 18:37:12 +00:00
|
|
|
const char *system;
|
2010-04-21 16:27:06 +00:00
|
|
|
void *probe;
|
|
|
|
#ifdef CONFIG_PERF_EVENTS
|
|
|
|
void *perf_probe;
|
|
|
|
#endif
|
2015-05-05 15:45:27 +00:00
|
|
|
int (*reg)(struct trace_event_call *event,
|
2012-02-15 14:51:49 +00:00
|
|
|
enum trace_reg type, void *data);
|
2019-10-24 20:26:59 +00:00
|
|
|
struct trace_event_fields *fields_array;
|
2015-05-05 15:45:27 +00:00
|
|
|
struct list_head *(*get_fields)(struct trace_event_call *);
|
2010-04-22 14:35:55 +00:00
|
|
|
struct list_head fields;
|
2015-05-05 15:45:27 +00:00
|
|
|
int (*raw_init)(struct trace_event_call *);
|
2010-04-20 14:47:33 +00:00
|
|
|
};
|
|
|
|
|
2015-05-05 15:45:27 +00:00
|
|
|
extern int trace_event_reg(struct trace_event_call *event,
|
2012-02-15 14:51:49 +00:00
|
|
|
enum trace_reg type, void *data);
|
2010-06-08 15:22:06 +00:00
|
|
|
|
2015-05-05 17:18:46 +00:00
|
|
|
struct trace_event_buffer {
|
2019-12-13 18:58:57 +00:00
|
|
|
struct trace_buffer *buffer;
|
2012-08-10 02:42:57 +00:00
|
|
|
struct ring_buffer_event *event;
|
2015-05-05 14:09:53 +00:00
|
|
|
struct trace_event_file *trace_file;
|
2012-08-10 02:42:57 +00:00
|
|
|
void *entry;
|
2021-01-25 19:45:08 +00:00
|
|
|
unsigned int trace_ctx;
|
2020-01-10 16:05:31 +00:00
|
|
|
struct pt_regs *regs;
|
2012-08-10 02:42:57 +00:00
|
|
|
};
|
|
|
|
|
2015-05-05 17:18:46 +00:00
|
|
|
void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
|
2015-05-05 14:09:53 +00:00
|
|
|
struct trace_event_file *trace_file,
|
2012-08-10 02:42:57 +00:00
|
|
|
unsigned long len);
|
|
|
|
|
2015-05-05 17:18:46 +00:00
|
|
|
void trace_event_buffer_commit(struct trace_event_buffer *fbuffer);
|
2012-08-10 02:42:57 +00:00
|
|
|
|
2010-04-23 15:12:36 +00:00
|
|
|
enum {
|
2010-11-18 00:39:17 +00:00
|
|
|
TRACE_EVENT_FL_CAP_ANY_BIT,
|
2011-11-01 01:09:35 +00:00
|
|
|
TRACE_EVENT_FL_NO_SET_FILTER_BIT,
|
2012-05-10 19:55:43 +00:00
|
|
|
TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
|
2014-04-08 21:26:21 +00:00
|
|
|
TRACE_EVENT_FL_TRACEPOINT_BIT,
|
2021-08-17 03:42:56 +00:00
|
|
|
TRACE_EVENT_FL_DYNAMIC_BIT,
|
2015-03-25 19:49:19 +00:00
|
|
|
TRACE_EVENT_FL_KPROBE_BIT,
|
2015-07-01 02:13:50 +00:00
|
|
|
TRACE_EVENT_FL_UPROBE_BIT,
|
tracing: Add a probe that attaches to trace events
A new dynamic event is introduced: event probe. The event is attached
to an existing tracepoint and uses its fields as arguments. The user
can specify custom format string of the new event, select what tracepoint
arguments will be printed and how to print them.
An event probe is created by writing configuration string in
'dynamic_events' ftrace file:
e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS] - Set an event probe
-:SNAME/ENAME - Delete an event probe
Where:
SNAME - System name, if omitted 'eprobes' is used.
ENAME - Name of the new event in SNAME, if omitted the SYSTEM_EVENT is used.
SYSTEM - Name of the system, where the tracepoint is defined, mandatory.
EVENT - Name of the tracepoint event in SYSTEM, mandatory.
FETCHARGS - Arguments:
<name>=$<field>[:TYPE] - Fetch given field of the tracepoint and print
it as given TYPE with given name. Supported
types are:
(u8/u16/u32/u64/s8/s16/s32/s64), basic type
(x8/x16/x32/x64), hexadecimal types
"string", "ustring" and bitfield.
Example, attach an event probe on openat system call and print name of the
file that will be opened:
echo "e:esys/eopen syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events
A new dynamic event is created in events/esys/eopen/ directory. It
can be deleted with:
echo "-:esys/eopen" >> dynamic_events
Filters, triggers and histograms can be attached to the new event, it can
be matched in synthetic events. There is one limitation - an event probe
can not be attached to kprobe, uprobe or another event probe.
Link: https://lkml.kernel.org/r/20210812145805.2292326-1-tz.stoyanov@gmail.com
Link: https://lkml.kernel.org/r/20210819152825.142428383@goodmis.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
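The command syntax described in the commit message above is a plain string, so it can be assembled with ordinary formatting. The following is a minimal userspace sketch, not kernel code: `demo_build_eprobe_cmd` and all of its parameters are hypothetical names introduced for illustration, and it only covers the case of a single fetch argument with an explicit type.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "e:SNAME/ENAME SYSTEM/EVENT ARG=$FIELD:TYPE"
 * into buf. Returns 0 on success, -1 if the result did not fit.
 */
static int demo_build_eprobe_cmd(char *buf, size_t len,
				 const char *sname, const char *ename,
				 const char *system, const char *event,
				 const char *arg, const char *field,
				 const char *type)
{
	int n = snprintf(buf, len, "e:%s/%s %s/%s %s=$%s:%s",
			 sname, ename, system, event, arg, field, type);

	/* snprintf reports the length it wanted; treat truncation as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with the values from the openat example, the helper produces the same string that the `echo` command writes to `dynamic_events`.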
2021-08-19 15:26:06 +00:00
|
|
|
TRACE_EVENT_FL_EPROBE_BIT,
|
2023-06-06 12:39:55 +00:00
|
|
|
TRACE_EVENT_FL_FPROBE_BIT,
|
2022-03-03 22:05:34 +00:00
|
|
|
TRACE_EVENT_FL_CUSTOM_BIT,
|
2010-04-23 15:12:36 +00:00
|
|
|
};
|
|
|
|
|
2012-05-04 03:09:03 +00:00
|
|
|
/*
|
|
|
|
* Event flags:
|
|
|
|
* CAP_ANY - Any user can enable for perf
|
|
|
|
* NO_SET_FILTER - Set when filter has error and is to be ignored
|
2015-05-13 19:12:33 +00:00
|
|
|
* IGNORE_ENABLE - For trace internal events, do not enable with debugfs file
|
2014-04-08 21:26:21 +00:00
|
|
|
* TRACEPOINT - Event is a tracepoint
|
2021-08-17 03:42:56 +00:00
|
|
|
* DYNAMIC - Event is a dynamic event (created at run time)
|
2015-03-25 19:49:19 +00:00
|
|
|
* KPROBE - Event is a kprobe
|
2015-07-01 02:13:50 +00:00
|
|
|
* UPROBE - Event is a uprobe
|
2021-08-19 15:26:06 +00:00
|
|
|
* EPROBE - Event is an event probe
|
2023-06-06 12:39:55 +00:00
|
|
|
* FPROBE - Event is a function probe
|
2022-03-03 22:05:34 +00:00
|
|
|
* CUSTOM - Event is a custom event (to be attached to an existing tracepoint)
|
|
|
|
* This is set when the custom event has not been attached
|
|
|
|
* to a tracepoint yet, then it is cleared when it is.
|
2012-05-04 03:09:03 +00:00
|
|
|
*/
|
2010-04-23 15:12:36 +00:00
|
|
|
enum {
|
2010-11-18 00:39:17 +00:00
|
|
|
TRACE_EVENT_FL_CAP_ANY = (1 << TRACE_EVENT_FL_CAP_ANY_BIT),
|
2011-11-01 01:09:35 +00:00
|
|
|
TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT),
|
2012-05-10 19:55:43 +00:00
|
|
|
TRACE_EVENT_FL_IGNORE_ENABLE = (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT),
|
2014-04-08 21:26:21 +00:00
|
|
|
TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
|
2021-08-17 03:42:56 +00:00
|
|
|
TRACE_EVENT_FL_DYNAMIC = (1 << TRACE_EVENT_FL_DYNAMIC_BIT),
|
2015-03-25 19:49:19 +00:00
|
|
|
TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT),
|
2015-07-01 02:13:50 +00:00
|
|
|
TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT),
|
2021-08-19 15:26:06 +00:00
|
|
|
TRACE_EVENT_FL_EPROBE = (1 << TRACE_EVENT_FL_EPROBE_BIT),
|
2023-06-06 12:39:55 +00:00
|
|
|
TRACE_EVENT_FL_FPROBE = (1 << TRACE_EVENT_FL_FPROBE_BIT),
|
2022-03-03 22:05:34 +00:00
|
|
|
TRACE_EVENT_FL_CUSTOM = (1 << TRACE_EVENT_FL_CUSTOM_BIT),
|
2010-04-23 15:12:36 +00:00
|
|
|
};
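The two enums above follow a common idiom: the first assigns each flag a bit index, and the second derives a mask from that index. A minimal userspace sketch of the idiom follows; the `DEMO_FL_*` names and `demo_flags_test` are made up for illustration and are not the kernel's identifiers.

```c
#include <stdbool.h>

/* Hypothetical bit indices; the enum assigns 0, 1, 2 automatically. */
enum {
	DEMO_FL_CAP_ANY_BIT,
	DEMO_FL_TRACEPOINT_BIT,
	DEMO_FL_DYNAMIC_BIT,
};

/* Masks derived from the indices, mirroring the (1 << ..._BIT) idiom. */
enum {
	DEMO_FL_CAP_ANY		= (1 << DEMO_FL_CAP_ANY_BIT),
	DEMO_FL_TRACEPOINT	= (1 << DEMO_FL_TRACEPOINT_BIT),
	DEMO_FL_DYNAMIC		= (1 << DEMO_FL_DYNAMIC_BIT),
};

/* Returns true when every bit in mask is set in flags. */
static bool demo_flags_test(int flags, int mask)
{
	return (flags & mask) == mask;
}
```

Keeping the index and the mask as separate constants means a new flag only needs one new line in each enum, and the mask values never have to be renumbered by hand.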
|
|
|
|
|
2015-07-01 02:13:50 +00:00
|
|
|
#define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE)
|
|
|
|
|
2015-05-05 15:45:27 +00:00
|
|
|
struct trace_event_call {
|
2009-04-10 17:52:20 +00:00
|
|
|
struct list_head list;
|
2015-05-05 15:45:27 +00:00
|
|
|
struct trace_event_class *class;
|
2014-04-08 21:26:21 +00:00
|
|
|
union {
|
|
|
|
char *name;
|
|
|
|
/* Set TRACE_EVENT_FL_TRACEPOINT flag when using "tp" */
|
|
|
|
struct tracepoint *tp;
|
|
|
|
};
|
2010-04-23 14:00:22 +00:00
|
|
|
struct trace_event event;
|
tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
Several tracepoints use the helper functions __print_symbolic() or
__print_flags() and pass in enums that do the mapping between the
binary data stored and the value to print. This works well for reading
the ASCII trace files, but when the data is read via userspace tools
such as perf and trace-cmd, the conversion of the binary value to a
human string format is lost if an enum is used, as userspace does not
have access to what the ENUM is.
For example, the tracepoint trace_tlb_flush() has:
__print_symbolic(REC->reason,
{ TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
{ TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
{ TLB_LOCAL_SHOOTDOWN, "local shootdown" },
{ TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
Which maps the enum values to the strings they represent. But perf and
trace-cmd do not know what value TLB_LOCAL_MM_SHOOTDOWN is, and would
not be able to map it.
With TRACE_DEFINE_ENUM(), developers can place these in the event header
files and ftrace will convert the enums to their values:
By adding:
TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
$ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format
[...]
__print_symbolic(REC->reason,
{ 0, "flush on task switch" },
{ 1, "remote shootdown" },
{ 2, "local shootdown" },
{ 3, "local mm shootdown" })
The above is what userspace expects to see, and tools do not need to
be modified to parse them.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Guilherme Cox <cox@computer.org>
Cc: Tony Luck <tony.luck@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
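The substitution described in the commit message above, where enum names in a print format are replaced by their numeric values, can be sketched in userspace C. This is only an illustration under stated assumptions: the local `enum demo_tlb_reason` is a stand-in whose values are assumed to match the 0..3 shown in the format output, `DEMO_DEFINE_ENUM` and `demo_resolve_enums` are hypothetical names, and the sketch does not treat quoted strings specially.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Stand-in enum; values assumed to match the format output shown above. */
enum demo_tlb_reason {
	TLB_FLUSH_ON_TASK_SWITCH,
	TLB_REMOTE_SHOOTDOWN,
	TLB_LOCAL_SHOOTDOWN,
	TLB_LOCAL_MM_SHOOTDOWN,
};

struct demo_enum_map {
	const char *name;
	long value;
};

/* Record both the spelling and the value of an enum constant. */
#define DEMO_DEFINE_ENUM(e) { #e, (long)(e) }

static const struct demo_enum_map demo_maps[] = {
	DEMO_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH),
	DEMO_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN),
	DEMO_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN),
	DEMO_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN),
};

static int demo_is_ident(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/*
 * Copy fmt into out, replacing each identifier listed in demo_maps with
 * its decimal value. Returns 0 on success, -1 if out is too small.
 */
static int demo_resolve_enums(const char *fmt, char *out, size_t len)
{
	size_t o = 0;

	if (!len)
		return -1;

	while (*fmt) {
		const char *src = fmt;
		char num[32];
		size_t i, n = 0, w;

		/* Measure the identifier (if any) that starts here. */
		while (demo_is_ident(fmt[n]))
			n++;
		w = n ? n : 1;	/* not an identifier: copy one character */

		for (i = 0; n && i < sizeof(demo_maps) / sizeof(demo_maps[0]); i++) {
			if (strlen(demo_maps[i].name) == n &&
			    !strncmp(fmt, demo_maps[i].name, n)) {
				snprintf(num, sizeof(num), "%ld", demo_maps[i].value);
				src = num;
				w = strlen(num);
				break;
			}
		}

		if (o + w >= len)
			return -1;
		memcpy(out + o, src, w);
		o += w;
		fmt += n ? n : 1;
	}
	out[o] = '\0';
	return 0;
}
```

Matching on whole identifiers (length plus content) keeps `TLB_LOCAL_SHOOTDOWN` from being rewritten inside `TLB_LOCAL_MM_SHOOTDOWN`.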
2015-03-24 21:58:09 +00:00
|
|
|
char *print_fmt;
|
2021-08-17 03:42:57 +00:00
|
|
|
/*
|
|
|
|
* Static events can disappear with modules,
|
|
|
|
* whereas dynamic ones need their own ref count.
|
|
|
|
*/
|
|
|
|
union {
|
|
|
|
void *module;
|
|
|
|
atomic_t refcnt;
|
|
|
|
};
|
2009-08-10 20:52:44 +00:00
|
|
|
void *data;
|
2021-02-26 19:09:15 +00:00
|
|
|
|
|
|
|
/* See the TRACE_EVENT_FL_* flags above */
|
2012-05-04 03:09:03 +00:00
|
|
|
int flags; /* static flags of different events */
|
|
|
|
|
|
|
|
#ifdef CONFIG_PERF_EVENTS
|
|
|
|
int perf_refcount;
|
|
|
|
struct hlist_head __percpu *perf_events;
|
2017-10-24 06:53:08 +00:00
|
|
|
struct bpf_prog_array __rcu *prog_array;
|
2013-11-14 15:23:04 +00:00
|
|
|
|
2015-05-05 15:45:27 +00:00
|
|
|
int (*perf_perm)(struct trace_event_call *,
|
2013-11-14 15:23:04 +00:00
|
|
|
struct perf_event *);
|
2012-05-04 03:09:03 +00:00
|
|
|
#endif
|
|
|
|
};
|
|
|
|
|
2021-08-17 03:42:57 +00:00
|
|
|
#ifdef CONFIG_DYNAMIC_EVENTS
|
|
|
|
bool trace_event_dyn_try_get_ref(struct trace_event_call *call);
|
|
|
|
void trace_event_dyn_put_ref(struct trace_event_call *call);
|
|
|
|
bool trace_event_dyn_busy(struct trace_event_call *call);
|
|
|
|
#else
|
|
|
|
static inline bool trace_event_dyn_try_get_ref(struct trace_event_call *call)
|
|
|
|
{
|
|
|
|
/* Without DYNAMIC_EVENTS configured, nothing should be calling this */
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
static inline void trace_event_dyn_put_ref(struct trace_event_call *call)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
static inline bool trace_event_dyn_busy(struct trace_event_call *call)
|
|
|
|
{
|
|
|
|
/* Nothing should call this without DYNAMIC_EVENTS configured. */
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
static inline bool trace_event_try_get_ref(struct trace_event_call *call)
|
|
|
|
{
|
|
|
|
if (call->flags & TRACE_EVENT_FL_DYNAMIC)
|
|
|
|
return trace_event_dyn_try_get_ref(call);
|
|
|
|
else
|
|
|
|
return try_module_get(call->module);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void trace_event_put_ref(struct trace_event_call *call)
|
|
|
|
{
|
|
|
|
if (call->flags & TRACE_EVENT_FL_DYNAMIC)
|
|
|
|
trace_event_dyn_put_ref(call);
|
|
|
|
else
|
|
|
|
module_put(call->module);
|
|
|
|
}
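The get/put helpers above pick one member of the `module`/`refcnt` union according to the DYNAMIC flag. A simplified userspace model of that dispatch is sketched below. Everything prefixed `demo_` is hypothetical: `struct demo_module` merely imitates a module that can refuse new users, a NULL module is assumed to stand for built-in code that always succeeds, and the dynamic path simply counts, since this header does not show what the kernel's dynamic-event helpers do internally.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FL_DYNAMIC 0x1

/* Hypothetical stand-in for a module with a liveness flag and use count. */
struct demo_module {
	bool live;
	int users;
};

struct demo_call {
	int flags;
	union {
		struct demo_module *module;	/* used by static events */
		atomic_int refcnt;		/* used by dynamic events */
	};
};

/* Take a reference through whichever union member the flag selects. */
static bool demo_try_get_ref(struct demo_call *call)
{
	if (call->flags & DEMO_FL_DYNAMIC) {
		atomic_fetch_add(&call->refcnt, 1);
		return true;
	}
	if (!call->module)
		return true;	/* assumption: NULL means built-in */
	if (!call->module->live)
		return false;
	call->module->users++;
	return true;
}

/* Drop a reference taken by demo_try_get_ref(). */
static void demo_put_ref(struct demo_call *call)
{
	if (call->flags & DEMO_FL_DYNAMIC)
		atomic_fetch_sub(&call->refcnt, 1);
	else if (call->module)
		call->module->users--;
}
```

Because the flag decides which union member is meaningful, callers never touch `module` or `refcnt` directly; they go through the pair of helpers.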
|
|
|
|
|
2017-10-24 06:53:08 +00:00
|
|
|
#ifdef CONFIG_PERF_EVENTS
|
|
|
|
static inline bool bpf_prog_array_valid(struct trace_event_call *call)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* This inline function checks whether call->prog_array
|
|
|
|
* is valid or not. The function is called in various places,
|
|
|
|
* outside rcu_read_lock/unlock, as a heuristic to speed up execution.
|
|
|
|
*
|
|
|
|
* If this function returns true, and later call->prog_array
|
|
|
|
* becomes NULL inside rcu_read_lock/unlock region,
|
|
|
|
* we bail out then. If this function returns false,
|
|
|
|
* there is a risk that we might miss a few events if the checking
|
|
|
|
* were delayed until inside rcu_read_lock/unlock region and
|
|
|
|
* call->prog_array happened to become non-NULL then.
|
|
|
|
*
|
|
|
|
* Here, READ_ONCE() is used instead of rcu_access_pointer().
|
|
|
|
* rcu_access_pointer() requires the actual definition of
|
|
|
|
* "struct bpf_prog_array" while READ_ONCE() only needs
|
|
|
|
* a declaration of the same type.
|
|
|
|
*/
|
|
|
|
return !!READ_ONCE(call->prog_array);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2014-04-08 21:26:21 +00:00
|
|
|
static inline const char *
|
2015-05-13 18:20:14 +00:00
|
|
|
trace_event_name(struct trace_event_call *call)
|
2014-04-08 21:26:21 +00:00
|
|
|
{
|
2022-03-03 22:05:34 +00:00
|
|
|
if (call->flags & TRACE_EVENT_FL_CUSTOM)
|
|
|
|
return call->name;
|
|
|
|
else if (call->flags & TRACE_EVENT_FL_TRACEPOINT)
|
2014-04-08 21:26:21 +00:00
|
|
|
return call->tp ? call->tp->name : NULL;
|
|
|
|
else
|
|
|
|
return call->name;
|
|
|
|
}
|
|
|
|
|
2019-05-25 16:58:01 +00:00
|
|
|
static inline struct list_head *
|
|
|
|
trace_get_fields(struct trace_event_call *event_call)
|
|
|
|
{
|
|
|
|
if (!event_call->class->get_fields)
|
|
|
|
return &event_call->class->fields;
|
|
|
|
return event_call->class->get_fields(event_call);
|
|
|
|
}
|
|
|
|
|
2015-05-13 18:59:40 +00:00
|
|
|
struct trace_subsystem_dir;
|
2012-05-04 03:09:03 +00:00
|
|
|
|
|
|
|
enum {
|
2015-05-13 19:12:33 +00:00
|
|
|
EVENT_FILE_FL_ENABLED_BIT,
|
|
|
|
EVENT_FILE_FL_RECORDED_CMD_BIT,
|
2017-06-27 02:01:55 +00:00
|
|
|
EVENT_FILE_FL_RECORDED_TGID_BIT,
|
2015-05-13 19:12:33 +00:00
|
|
|
EVENT_FILE_FL_FILTERED_BIT,
|
|
|
|
EVENT_FILE_FL_NO_SET_FILTER_BIT,
|
|
|
|
EVENT_FILE_FL_SOFT_MODE_BIT,
|
|
|
|
EVENT_FILE_FL_SOFT_DISABLED_BIT,
|
|
|
|
EVENT_FILE_FL_TRIGGER_MODE_BIT,
|
|
|
|
EVENT_FILE_FL_TRIGGER_COND_BIT,
|
2015-09-25 16:58:44 +00:00
|
|
|
EVENT_FILE_FL_PID_FILTER_BIT,
|
tracing: Only have rmmod clear buffers that its events were active in
Currently, when a module event is enabled, when that module is removed, it
clears all ring buffers. This is to prevent another module from being loaded
and having one of its trace event IDs from reusing a trace event ID of the
removed module. This could cause undesirable effects as the trace event of
the new module would be using its own processing algorithms to process raw
data of another event. To prevent this, when a module is loaded, if any of
its events have been used (signified by the WAS_ENABLED event call flag,
which is never cleared), all ring buffers are cleared, just in case any one
of them contains event data of the removed event.
The problem is, there's no reason to clear all ring buffers if only one (or
less than all of them) uses one of the events. Instead, only clear the ring
buffers that recorded the events of a module that is being removed.
To do this, instead of keeping the WAS_ENABLED flag with the trace event
call, move it to the per instance (per ring buffer) event file descriptor.
The event file descriptor maps each event to a separate ring buffer
instance. Then when the module is removed, only the ring buffers that
activated one of the module's events get cleared. The rest are not touched.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-31 21:03:47 +00:00
|
|
|
EVENT_FILE_FL_WAS_ENABLED_BIT,
|
2023-10-31 16:24:53 +00:00
|
|
|
EVENT_FILE_FL_FREED_BIT,
|
2012-05-04 03:09:03 +00:00
|
|
|
};
|
|
|
|
|
2020-01-29 18:59:22 +00:00
|
|
|
extern struct trace_event_file *trace_get_event_file(const char *instance,
|
|
|
|
const char *system,
|
|
|
|
const char *event);
|
|
|
|
extern void trace_put_event_file(struct trace_event_file *file);
|
|
|
|
|
2020-01-29 18:59:24 +00:00
|
|
|
#define MAX_DYNEVENT_CMD_LEN (2048)
|
|
|
|
|
|
|
|
enum dynevent_type {
|
2020-01-29 18:59:25 +00:00
|
|
|
DYNEVENT_TYPE_SYNTH = 1,
|
2020-01-29 18:59:29 +00:00
|
|
|
DYNEVENT_TYPE_KPROBE,
|
2020-01-29 18:59:24 +00:00
|
|
|
DYNEVENT_TYPE_NONE,
|
|
|
|
};
|
|
|
|
|
|
|
|
struct dynevent_cmd;
|
|
|
|
|
|
|
|
typedef int (*dynevent_create_fn_t)(struct dynevent_cmd *cmd);
|
|
|
|
|
|
|
|
struct dynevent_cmd {
|
2020-01-31 21:55:34 +00:00
|
|
|
struct seq_buf seq;
|
2020-01-29 18:59:24 +00:00
|
|
|
const char *event_name;
|
|
|
|
unsigned int n_fields;
|
|
|
|
enum dynevent_type type;
|
|
|
|
dynevent_create_fn_t run_command;
|
|
|
|
void *private_data;
|
|
|
|
};
|
|
|
|
|
|
|
|
extern int dynevent_create(struct dynevent_cmd *cmd);
|
|
|
|
|
2020-01-29 18:59:23 +00:00
|
|
|
extern int synth_event_delete(const char *name);
|
|
|
|
|
2020-01-29 18:59:25 +00:00
|
|
|
extern void synth_event_cmd_init(struct dynevent_cmd *cmd,
|
|
|
|
char *buf, int maxlen);
|
|
|
|
|
|
|
|
extern int __synth_event_gen_cmd_start(struct dynevent_cmd *cmd,
|
|
|
|
const char *name,
|
|
|
|
struct module *mod, ...);
|
|
|
|
|
|
|
|
#define synth_event_gen_cmd_start(cmd, name, mod, ...) \
|
|
|
|
__synth_event_gen_cmd_start(cmd, name, mod, ## __VA_ARGS__, NULL)
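The macro above appends a `NULL` after the caller's variadic arguments so that the callee can find the end of the list without a count. A userspace sketch of the same sentinel pattern follows. The names `demo_count_fields` and `demo_count_fields_va` are hypothetical, the sketch assumes the arguments come in (type, name) string pairs, and `, ## __VA_ARGS__` is a GNU extension supported by GCC and Clang.

```c
#include <stdarg.h>
#include <stddef.h>

/*
 * Count (type, name) string pairs up to the terminating NULL.
 * Returns the number of pairs, or -1 if a type has no matching name.
 */
static int demo_count_fields_va(const char *event, ...)
{
	va_list args;
	int n = 0;

	(void)event;	/* only anchors the variadic list in this sketch */
	va_start(args, event);
	for (;;) {
		const char *type = va_arg(args, const char *);
		const char *name;

		if (!type)
			break;	/* sentinel reached */
		name = va_arg(args, const char *);
		if (!name) {
			n = -1;	/* odd number of strings */
			break;
		}
		n++;
	}
	va_end(args);
	return n;
}

/* The wrapper supplies the sentinel so callers cannot forget it. */
#define demo_count_fields(event, ...) \
	demo_count_fields_va(event, ## __VA_ARGS__, NULL)
```

With this wrapper, `demo_count_fields("ev")` expands to a call whose only variadic argument is the sentinel, so an empty field list is handled without a special case.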
|
|
|
|
|
|
|
|
struct synth_field_desc {
|
|
|
|
const char *type;
|
|
|
|
const char *name;
|
|
|
|
};
|
|
|
|
|
|
|
|
extern int synth_event_gen_cmd_array_start(struct dynevent_cmd *cmd,
|
|
|
|
const char *name,
|
|
|
|
struct module *mod,
|
|
|
|
struct synth_field_desc *fields,
|
|
|
|
unsigned int n_fields);
|
|
|
|
extern int synth_event_create(const char *name,
|
|
|
|
struct synth_field_desc *fields,
|
|
|
|
unsigned int n_fields, struct module *mod);
|
|
|
|
|
|
|
|
extern int synth_event_add_field(struct dynevent_cmd *cmd,
|
|
|
|
const char *type,
|
|
|
|
const char *name);
|
|
|
|
extern int synth_event_add_field_str(struct dynevent_cmd *cmd,
|
|
|
|
const char *type_name);
|
|
|
|
extern int synth_event_add_fields(struct dynevent_cmd *cmd,
|
|
|
|
struct synth_field_desc *fields,
|
|
|
|
unsigned int n_fields);
|
|
|
|
|
|
|
|
#define synth_event_gen_cmd_end(cmd) \
|
|
|
|
dynevent_create(cmd)
|
|
|
|
|
2020-01-29 18:59:27 +00:00
|
|
|
struct synth_event;
|
|
|
|
|
|
|
|
struct synth_event_trace_state {
|
|
|
|
struct trace_event_buffer fbuffer;
|
|
|
|
struct synth_trace_event *entry;
|
|
|
|
struct trace_buffer *buffer;
|
|
|
|
struct synth_event *event;
|
|
|
|
unsigned int cur_field;
|
|
|
|
unsigned int n_u64;
|
2020-02-10 23:06:50 +00:00
|
|
|
bool disabled;
|
2020-01-29 18:59:27 +00:00
|
|
|
bool add_next;
|
|
|
|
bool add_name;
|
|
|
|
};
|
|
|
|
|
|
|
|
extern int synth_event_trace(struct trace_event_file *file,
|
|
|
|
unsigned int n_vals, ...);
|
|
|
|
extern int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
|
|
|
|
unsigned int n_vals);
|
|
|
|
extern int synth_event_trace_start(struct trace_event_file *file,
|
|
|
|
struct synth_event_trace_state *trace_state);
|
|
|
|
extern int synth_event_add_next_val(u64 val,
|
|
|
|
struct synth_event_trace_state *trace_state);
|
|
|
|
extern int synth_event_add_val(const char *field_name, u64 val,
|
|
|
|
struct synth_event_trace_state *trace_state);
|
|
|
|
extern int synth_event_trace_end(struct synth_event_trace_state *trace_state);
|
|
|
|
|
2020-01-29 18:59:29 +00:00
|
|
|
extern int kprobe_event_delete(const char *name);
|
|
|
|
|
|
|
|
extern void kprobe_event_cmd_init(struct dynevent_cmd *cmd,
|
|
|
|
char *buf, int maxlen);
|
|
|
|
|
|
|
|
#define kprobe_event_gen_cmd_start(cmd, name, loc, ...) \
|
|
|
|
__kprobe_event_gen_cmd_start(cmd, false, name, loc, ## __VA_ARGS__, NULL)
|
|
|
|
|
|
|
|
#define kretprobe_event_gen_cmd_start(cmd, name, loc, ...) \
|
|
|
|
__kprobe_event_gen_cmd_start(cmd, true, name, loc, ## __VA_ARGS__, NULL)
|
|
|
|
|
|
|
|
extern int __kprobe_event_gen_cmd_start(struct dynevent_cmd *cmd,
|
|
|
|
bool kretprobe,
|
|
|
|
const char *name,
|
|
|
|
const char *loc, ...);
|
|
|
|
|
|
|
|
#define kprobe_event_add_fields(cmd, ...) \
|
|
|
|
__kprobe_event_add_fields(cmd, ## __VA_ARGS__, NULL)
|
|
|
|
|
|
|
|
#define kprobe_event_add_field(cmd, field) \
|
|
|
|
__kprobe_event_add_fields(cmd, field, NULL)
|
|
|
|
|
|
|
|
extern int __kprobe_event_add_fields(struct dynevent_cmd *cmd, ...);
|
|
|
|
|
|
|
|
#define kprobe_event_gen_cmd_end(cmd) \
|
|
|
|
dynevent_create(cmd)
|
|
|
|
|
|
|
|
#define kretprobe_event_gen_cmd_end(cmd) \
|
|
|
|
dynevent_create(cmd)
|
|
|
|
|
2012-05-04 03:09:03 +00:00
|
|
|
/*
|
2015-05-13 19:12:33 +00:00
|
|
|
* Event file flags:
|
2013-03-12 16:38:06 +00:00
|
|
|
* ENABLED - The event is enabled
|
2012-05-04 03:09:03 +00:00
|
|
|
* RECORDED_CMD - The comms should be recorded at sched_switch
|
2017-06-27 02:01:55 +00:00
|
|
|
* RECORDED_TGID - The tgids should be recorded at sched_switch
|
2013-10-24 13:34:17 +00:00
|
|
|
* FILTERED - The event has a filter attached
|
|
|
|
* NO_SET_FILTER - Set when filter has error and is to be ignored
|
2013-03-12 17:26:18 +00:00
|
|
|
* SOFT_MODE - The event is enabled/disabled by SOFT_DISABLED
|
|
|
|
* SOFT_DISABLED - When set, do not trace the event (even though its
|
|
|
|
* tracepoint may be enabled)
|
tracing: Add basic event trigger framework
Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.
'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.
The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.
The event trigger functionality is built on top of SOFT_DISABLE
functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires. Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that. Event triggers essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function. Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.
The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.
The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.
The standard open, read, and release file operations are implemented
here.
The open() implementation sets up for the various open modes of the
'trigger' file. It creates and attaches the trigger iterator and sets
up the command parser. If opened for reading, it sets up the trigger
seq_ops.
The read() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.
The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.
A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and
call to it from the trace event initialization code.
Also added are a couple trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.
A couple structs consisting mostly of functions meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations. They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.
The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event. It essentially coordinates adding a live trigger instance to
the event, and arming the triggering event.
Every event_command func() implementation essentially does the
same thing for any command:
- choose ops - use the value of param to choose either a number or
count version of event_trigger_ops specific to the command
- do the register or unregister of those ops
- associate a filter, if specified, with the triggering event
The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized. When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite. The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.
Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.
The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions. This allows func()
implementations to use command-specific blobs and supports code
re-use.
The event_trigger_ops.func() command corresponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked. The other functions are used to list the trigger when
needed, along with a couple mundane book-keeping functions.
This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.
Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Idea-by: Steve Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 13:59:24 +00:00
|
|
|
* TRIGGER_MODE - When set, invoke the triggers associated with the event
|
tracing: Add and use generic set_trigger_filter() implementation
Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.
Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger. For example, to add a filter to an
enable/disable_event command:
echo 'enable_event:system:event if common_pid == 999' > \
.../othersys/otherevent/trigger
The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.
As another example, to add a filter to a stacktrace command:
echo 'stacktrace if common_pid == 999' > \
.../somesys/someevent/trigger
The above command will only trigger a stacktrace if the common_pid
field in the event is 999.
The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.
Because triggers can now use filters, the trigger-invoking logic needs
to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
trigger has a filter associated with it, the trigger invocation now
needs to happen after the { assign; } part of the call, in order for
the trigger condition to be tested.
There's still a SOFT_DISABLED-only check at the top of e.g. the
ftrace_raw_events function, so when an event is soft disabled but not
because of the presence of a trigger, the original SOFT_DISABLED
behavior remains unchanged.
There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers. To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invokes the
trigger. Once all commands have been either invoked or set their
return flag, event_triggers_call() returns. The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().
To simplify the above and make it more efficient, the TRIGGER_COND bit
is introduced, which is set only if a soft-disabled trigger needs to
use the log record for filter testing or needs to wait until the
current log record is closed.
The syscall event invocation code is also changed in analogous ways.
Because event triggers need to be able to create and free filters,
this also adds a couple external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.
Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
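The two-phase invocation described in the commit message above (test the filter against the current record, run immediate triggers, defer post-triggers until the record is closed) can be modelled in a few lines of userspace C. This is a sketch under stated assumptions rather than the kernel implementation: `struct demo_trigger`, `demo_triggers_call`, `demo_triggers_post_call` and `demo_pid_is_999` are hypothetical names, and the "record" is reduced to a single `common_pid` value.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_trigger {
	unsigned int type_bit;		/* unique bit identifying the command */
	bool post_trigger;		/* must run after the record is closed */
	bool (*filter)(int common_pid);	/* NULL means "always match" */
	int hits;			/* how many times the trigger ran */
};

/*
 * Phase 1: test each filter against the current record. Matching triggers
 * either run now or set their bit in the return value to be run later.
 */
static unsigned int demo_triggers_call(struct demo_trigger *t, size_t n,
				       int common_pid)
{
	unsigned int deferred = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (t[i].filter && !t[i].filter(common_pid))
			continue;
		if (t[i].post_trigger)
			deferred |= t[i].type_bit;
		else
			t[i].hits++;
	}
	return deferred;
}

/* Phase 2: after the record is committed or discarded, run deferred ones. */
static void demo_triggers_post_call(struct demo_trigger *t, size_t n,
				    unsigned int deferred)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (t[i].type_bit & deferred)
			t[i].hits++;
}

/* Example filter matching the "common_pid == 999" condition used above. */
static bool demo_pid_is_999(int common_pid)
{
	return common_pid == 999;
}
```

Returning a bitmask from the first phase lets the caller commit or discard the record exactly once between the two calls, which is the ordering the commit message requires for triggers that write to the trace buffer themselves.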
2013-10-24 13:59:29 +00:00
|
|
|
* TRIGGER_COND - When set, one or more triggers has an associated filter
|
2015-09-25 16:58:44 +00:00
|
|
|
* PID_FILTER - When set, the event is filtered based on pid
|
tracing: Only have rmmod clear buffers that its events were active in
Currently, if a module's event has been enabled, removing that module
clears all ring buffers. This prevents another module from being loaded
and having one of its trace events reuse a trace event ID of the
removed module. Such reuse could cause undesirable effects, as the trace
event of the new module would be using its own processing algorithms to
process raw data of another event. To prevent this, when a module is
removed, if any of its events have been used (signified by the WAS_ENABLED
event call flag, which is never cleared), all ring buffers are cleared,
just in case any one of them contains event data of the removed event.
The problem is, there's no reason to clear all ring buffers if only one (or
fewer than all of them) uses one of the events. Instead, only clear the ring
buffers that recorded the events of a module that is being removed.
To do this, instead of keeping the WAS_ENABLED flag with the trace event
call, move it to the per instance (per ring buffer) event file descriptor.
The event file descriptor maps each event to a separate ring buffer
instance. Then when the module is removed, only the ring buffers that
activated one of the module's events get cleared. The rest are not touched.
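A minimal userspace sketch of the per-instance bookkeeping described above. All names here (sketch_instance, sketch_event_file, SKETCH_FL_WAS_ENABLED, sketch_module_remove()) are hypothetical and only illustrate the idea of keeping the flag on the per-instance event file rather than on the event call.

```c
#include <stddef.h>

#define SKETCH_FL_WAS_ENABLED (1u << 0)

/* Stands in for one ring buffer instance. */
struct sketch_instance {
	int cleared;
};

/* Stands in for the per-instance event file descriptor. */
struct sketch_event_file {
	int module_id;                /* module that owns the event */
	unsigned int flags;           /* per-instance flags */
	struct sketch_instance *inst; /* ring buffer this file belongs to */
};

/*
 * On module removal, clear only the instances in which one of the module's
 * events was ever enabled. Returns the number of instances cleared.
 */
static int sketch_module_remove(struct sketch_event_file *files, size_t n,
				int module_id)
{
	int cleared = 0;

	for (size_t i = 0; i < n; i++) {
		if (files[i].module_id != module_id)
			continue;
		if (!(files[i].flags & SKETCH_FL_WAS_ENABLED))
			continue;
		if (!files[i].inst->cleared) {
			files[i].inst->cleared = 1;
			cleared++;
		}
	}
	return cleared;
}
```

With the flag stored per event file, an instance that never enabled any of the module's events is left untouched.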
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-31 21:03:47 +00:00
|
|
|
* WAS_ENABLED - Set when enabled to know to clear trace on module removal
|
2023-10-31 16:24:53 +00:00
|
|
|
* FREED - File descriptor is freed, all fields should be considered invalid
|
2012-05-04 03:09:03 +00:00
|
|
|
*/
|
|
|
|
enum {
|
2015-05-13 19:12:33 +00:00
|
|
|
EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT),
|
|
|
|
EVENT_FILE_FL_RECORDED_CMD = (1 << EVENT_FILE_FL_RECORDED_CMD_BIT),
|
2017-06-27 02:01:55 +00:00
|
|
|
EVENT_FILE_FL_RECORDED_TGID = (1 << EVENT_FILE_FL_RECORDED_TGID_BIT),
|
2015-05-13 19:12:33 +00:00
|
|
|
EVENT_FILE_FL_FILTERED = (1 << EVENT_FILE_FL_FILTERED_BIT),
|
|
|
|
EVENT_FILE_FL_NO_SET_FILTER = (1 << EVENT_FILE_FL_NO_SET_FILTER_BIT),
|
|
|
|
EVENT_FILE_FL_SOFT_MODE = (1 << EVENT_FILE_FL_SOFT_MODE_BIT),
|
|
|
|
EVENT_FILE_FL_SOFT_DISABLED = (1 << EVENT_FILE_FL_SOFT_DISABLED_BIT),
|
|
|
|
EVENT_FILE_FL_TRIGGER_MODE = (1 << EVENT_FILE_FL_TRIGGER_MODE_BIT),
|
|
|
|
EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT),
|
2015-09-25 16:58:44 +00:00
|
|
|
EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT),
|
2017-08-31 21:03:47 +00:00
|
|
|
EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT),
|
2023-10-31 16:24:53 +00:00
|
|
|
EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT),
|
2012-05-04 03:09:03 +00:00
|
|
|
};
|
|
|
|
|
2015-05-05 14:09:53 +00:00
|
|
|
struct trace_event_file {
|
2012-05-04 03:09:03 +00:00
|
|
|
struct list_head list;
|
2015-05-05 15:45:27 +00:00
|
|
|
struct trace_event_call *event_call;
|
2017-06-07 08:12:51 +00:00
|
|
|
struct event_filter __rcu *filter;
|
eventfs: Remove eventfs_file and just use eventfs_inode
Instead of having a descriptor for every file represented in the eventfs
directory, only have the directory itself represented. Change the API to
send in a list of entries that represent all the files in the directory
(but not other directories). The entry list contains a name and a callback
function that will be used to create the files when they are accessed.
struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry *parent,
const struct eventfs_entry *entries,
int size, void *data);
is used for the top level eventfs directory, and returns an eventfs_inode
that will be used by:
struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode *parent,
const struct eventfs_entry *entries,
int size, void *data);
where both of the above take an array of struct eventfs_entry entries for
every file that is in the directory.
The entries are defined by:
typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data,
const struct file_operations **fops);
struct eventfs_entry {
const char *name;
eventfs_callback callback;
};
Where the name is the name of the file and the callback gets called when
the file is being created. The callback passes in the name (in case the
same callback is used for multiple files), a pointer to the mode, data and
fops. The data will be pointing to the data that was passed in
eventfs_create_dir() or eventfs_create_events_dir() but may be overridden
to point to something else, as it will be used to point to the
inode->i_private that is created. The information passed back from the
callback is used to create the dentry/inode.
If the callback fills the data and the file should be created, it must
return a positive number. On zero or negative, the file is ignored.
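The callback contract described above can be illustrated with a userspace mock. The sketch_* types below only imitate the shape of eventfs_callback and struct eventfs_entry so the example compiles outside the kernel; the file names and fops objects are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for the kernel types; assumptions for this sketch. */
typedef unsigned short sketch_umode_t;
struct sketch_file_operations {
	const char *label;
};

typedef int (*sketch_callback)(const char *name, sketch_umode_t *mode,
			       void **data,
			       const struct sketch_file_operations **fops);

struct sketch_entry {
	const char *name;
	sketch_callback callback;
};

static const struct sketch_file_operations sketch_enable_fops = { "enable" };
static const struct sketch_file_operations sketch_filter_fops = { "filter" };

/*
 * One callback shared by several files: it inspects the name, fills in the
 * mode and fops, and returns a positive number when the file should be
 * created. Returning zero (or a negative value) means the file is ignored.
 */
static int sketch_event_callback(const char *name, sketch_umode_t *mode,
				 void **data,
				 const struct sketch_file_operations **fops)
{
	(void)data; /* keep the data that was passed in for the directory */

	if (strcmp(name, "enable") == 0) {
		*mode = 0644;
		*fops = &sketch_enable_fops;
		return 1;
	}
	if (strcmp(name, "filter") == 0) {
		*mode = 0644;
		*fops = &sketch_filter_fops;
		return 1;
	}
	return 0;
}

/* The entry list handed over when the directory is created. */
static const struct sketch_entry sketch_entries[] = {
	{ "enable", sketch_event_callback },
	{ "filter", sketch_event_callback },
	{ "hidden", sketch_event_callback },
};
```

Here the "hidden" entry is listed but its callback returns zero, so no file would be created for it.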
This logic may also be used as a prototype to convert entire pseudo file
systems into just-in-time allocation.
The "show_events_dentry" file has been updated to show the directories,
and any files they have.
With just the eventfs_file allocations:
Before after deltas for meminfo (in kB):
MemFree: -14360
MemAvailable: -14260
Buffers: 40
Cached: 24
Active: 44
Inactive: 48
Inactive(anon): 28
Active(file): 44
Inactive(file): 20
Dirty: -4
AnonPages: 28
Mapped: 4
KReclaimable: 132
Slab: 1604
SReclaimable: 132
SUnreclaim: 1472
Committed_AS: 12
Before after deltas for slabinfo:
<slab>: <objects> [ * <size> = <total>]
ext4_inode_cache 27 [* 1184 = 31968 ]
extent_status 102 [* 40 = 4080 ]
tracefs_inode_cache 144 [* 656 = 94464 ]
buffer_head 39 [* 104 = 4056 ]
shmem_inode_cache 49 [* 800 = 39200 ]
filp -53 [* 256 = -13568 ]
dentry 251 [* 192 = 48192 ]
lsm_file_cache 277 [* 32 = 8864 ]
vm_area_struct -14 [* 184 = -2576 ]
trace_event_file 1748 [* 88 = 153824 ]
kmalloc-1k 35 [* 1024 = 35840 ]
kmalloc-256 49 [* 256 = 12544 ]
kmalloc-192 -28 [* 192 = -5376 ]
kmalloc-128 -30 [* 128 = -3840 ]
kmalloc-96 10581 [* 96 = 1015776 ]
kmalloc-64 3056 [* 64 = 195584 ]
kmalloc-32 1291 [* 32 = 41312 ]
kmalloc-16 2310 [* 16 = 36960 ]
kmalloc-8 9216 [* 8 = 73728 ]
Free memory dropped by 14,360 kB
Available memory dropped by 14,260 kB
Total slab additions in size: 1,771,032 bytes
With this change:
Before after deltas for meminfo (in kB):
MemFree: -12084
MemAvailable: -11976
Buffers: 32
Cached: 32
Active: 72
Inactive: 168
Inactive(anon): 176
Active(file): 72
Inactive(file): -8
Dirty: 24
AnonPages: 196
Mapped: 8
KReclaimable: 148
Slab: 836
SReclaimable: 148
SUnreclaim: 688
Committed_AS: 324
Before after deltas for slabinfo:
<slab>: <objects> [ * <size> = <total>]
tracefs_inode_cache 144 [* 656 = 94464 ]
shmem_inode_cache -23 [* 800 = -18400 ]
filp -92 [* 256 = -23552 ]
dentry 179 [* 192 = 34368 ]
lsm_file_cache -3 [* 32 = -96 ]
vm_area_struct -13 [* 184 = -2392 ]
trace_event_file 1748 [* 88 = 153824 ]
kmalloc-1k -49 [* 1024 = -50176 ]
kmalloc-256 -27 [* 256 = -6912 ]
kmalloc-128 1864 [* 128 = 238592 ]
kmalloc-64 4685 [* 64 = 299840 ]
kmalloc-32 -72 [* 32 = -2304 ]
kmalloc-16 256 [* 16 = 4096 ]
total = 721352
Free memory dropped by 12,084 kB
Available memory dropped by 11,976 kB
Total slab additions in size: 721,352 bytes
That's over 2 MB in savings per instance for free and available memory,
and over 1 MB in savings per instance of slab memory.
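The savings quoted above follow directly from the two sets of measurements. The helper below (sketch_delta() is a name made up for this example) just spells out the subtraction using the figures reported in this message.

```c
/* Difference between the "before this change" and "with this change" cost. */
static long sketch_delta(long before, long after)
{
	return before - after;
}

/*
 * Using the figures reported above:
 *   free memory:      sketch_delta(14360, 12084)    = 2276 kB
 *   available memory: sketch_delta(14260, 11976)    = 2284 kB
 *   slab additions:   sketch_delta(1771032, 721352) = 1049680 bytes
 */
```

Both memory deltas exceed 2048 kB and the slab delta exceeds 1048576 bytes, which matches the "over 2 MB" and "over 1 MB" statements.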
Link: https://lore.kernel.org/linux-trace-kernel/20231003184059.4924468e@gandalf.local.home
Link: https://lore.kernel.org/linux-trace-kernel/20231004165007.43d79161@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-04 20:50:07 +00:00
|
|
|
struct eventfs_inode *ei;
|
2012-05-04 03:09:03 +00:00
|
|
|
struct trace_array *tr;
|
2015-05-13 18:59:40 +00:00
|
|
|
struct trace_subsystem_dir *system;
|
tracing: Add basic event trigger framework
Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.
'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.
The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.
The event trigger functionality is built on top of SOFT_DISABLE
functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires. Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that. Event triggers essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function. Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.
The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.
The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.
The standard open, read, and release file operations are implemented
here.
The open() implementation sets up for the various open modes of the
'trigger' file. It creates and attaches the trigger iterator and sets
up the command parser. If opened for reading, it sets up the trigger
seq_ops.
The read() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.
The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.
A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and
call to it from the trace event initialization code.
Also added are a couple of trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.
A couple of structs consisting mostly of functions meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations. They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.
The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event. It essentially coordinates adding a live trigger instance to
the event, and arming the triggering event.
Every event_command func() implementation essentially does the
same thing for any command:
- choose ops - use the value of param to choose either a number or
count version of event_trigger_ops specific to the command
- do the register or unregister of those ops
- associate a filter, if specified, with the triggering event
The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized. When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite. The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.
Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.
The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions. This allows func()
implementations to use command-specific blobs and supports code
re-use.
The event_trigger_ops.func() command corresponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked. The other functions are used to list the trigger when
needed, along with a couple mundane book-keeping functions.
This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.
Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Idea-by: Steve Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 13:59:24 +00:00
|
|
|
struct list_head triggers;
|
2009-04-13 15:20:49 +00:00
|
|
|
|
2010-04-23 15:12:36 +00:00
|
|
|
/*
|
|
|
|
* 32 bit flags:
|
2013-03-12 16:38:06 +00:00
|
|
|
* bit 0: enabled
|
|
|
|
* bit 1: enabled cmd record
|
2013-03-12 17:26:18 +00:00
|
|
|
* bit 2: enable/disable with the soft disable bit
|
|
|
|
* bit 3: soft disabled
|
2013-10-24 13:59:24 +00:00
|
|
|
* bit 4: trigger enabled
|
2010-04-23 15:12:36 +00:00
|
|
|
*
|
2013-03-12 17:26:18 +00:00
|
|
|
* Note: The bits must be set atomically to prevent races
|
|
|
|
* from other writers. Reads of flags do not need to be in
|
|
|
|
* sync as they occur in critical sections. But the way flags
|
2012-05-04 03:09:03 +00:00
|
|
|
* is currently used, these changes do not affect the code
|
2010-05-14 14:19:13 +00:00
|
|
|
* except that when a change is made, it may have a slight
|
|
|
|
* delay in propagating the changes to other CPUs due to
|
2013-03-12 17:26:18 +00:00
|
|
|
* caching and such. Which is mostly OK ;-)
|
2010-04-23 15:12:36 +00:00
|
|
|
*/
|
2013-03-12 17:26:18 +00:00
|
|
|
unsigned long flags;
|
2024-07-26 18:42:08 +00:00
|
|
|
refcount_t ref; /* ref count for opened files */
|
2013-05-09 05:44:29 +00:00
|
|
|
atomic_t sm_ref; /* soft-mode reference counter */
|
2013-10-24 13:59:24 +00:00
|
|
|
atomic_t tm_ref; /* trigger-mode reference counter */
|
2009-04-13 15:20:49 +00:00
|
|
|
};
|
|
|
|
|
2010-11-18 01:11:42 +00:00
|
|
|
#define __TRACE_EVENT_FLAGS(name, value) \
|
|
|
|
static int __init trace_init_flags_##name(void) \
|
|
|
|
{ \
|
2014-04-08 21:26:21 +00:00
|
|
|
event_##name.flags |= value; \
|
2010-11-18 01:11:42 +00:00
|
|
|
return 0; \
|
|
|
|
} \
|
|
|
|
early_initcall(trace_init_flags_##name);
|
|
|
|
|
2013-11-14 15:23:04 +00:00
|
|
|
#define __TRACE_EVENT_PERF_PERM(name, expr...) \
|
2015-05-05 15:45:27 +00:00
|
|
|
static int perf_perm_##name(struct trace_event_call *tp_event, \
|
2013-11-14 15:23:04 +00:00
|
|
|
struct perf_event *p_event) \
|
|
|
|
{ \
|
|
|
|
return ({ expr; }); \
|
|
|
|
} \
|
|
|
|
static int __init trace_init_perf_perm_##name(void) \
|
|
|
|
{ \
|
|
|
|
event_##name.perf_perm = &perf_perm_##name; \
|
|
|
|
return 0; \
|
|
|
|
} \
|
|
|
|
early_initcall(trace_init_perf_perm_##name);
|
|
|
|
|
tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
Running endpoint security solutions like Sentinel1 that use perf-based
tracing heavily leads to this repeated dump complaining about dockerd.
The default value of 2048 is nowhere near large enough.
Using the prior patch "tracing: show size of requested buffer", we get
"perf buffer not large enough, wanted 6644, have 6144", after repeated
up-sizing (I did 2/4/6/8K). With 8K, the problem doesn't occur at all,
so below is the trace for 6K.
I'm wondering if this value should be selectable at boot time, but this
is a good starting point.
```
------------[ cut here ]------------
perf buffer not large enough, wanted 6644, have 6144
WARNING: CPU: 1 PID: 4997 at kernel/trace/trace_event_perf.c:402 perf_trace_buf_alloc+0x8c/0xa0
Modules linked in: [..]
CPU: 1 PID: 4997 Comm: sh Tainted: G T 5.13.13-x86_64-00039-gb3959163488e #63
Hardware name: LENOVO 20KH002JUS/20KH002JUS, BIOS N23ET66W (1.41 ) 09/02/2019
RIP: 0010:perf_trace_buf_alloc+0x8c/0xa0
Code: 80 3d 43 97 d0 01 00 74 07 31 c0 5b 5d 41 5c c3 ba 00 18 00 00 89 ee 48 c7 c7 00 82 7d 91 c6 05 25 97 d0 01 01 e8 22 ee bc 00 <0f> 0b 31 c0 eb db 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 89
RSP: 0018:ffffb922026b7d58 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff9da5ee012000 RCX: 0000000000000027
RDX: ffff9da881657828 RSI: 0000000000000001 RDI: ffff9da881657820
RBP: 00000000000019f4 R08: 0000000000000000 R09: ffffb922026b7b80
R10: ffffb922026b7b78 R11: ffffffff91dda688 R12: 000000000000000f
R13: ffff9da5ee012108 R14: ffff9da8816570a0 R15: ffffb922026b7e30
FS: 00007f420db1a080(0000) GS:ffff9da881640000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 00000002504a8006 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
kprobe_perf_func+0x11e/0x270
? do_execveat_common.isra.0+0x1/0x1c0
? do_execveat_common.isra.0+0x5/0x1c0
kprobe_ftrace_handler+0x10e/0x1d0
0xffffffffc03aa0c8
? do_execveat_common.isra.0+0x1/0x1c0
do_execveat_common.isra.0+0x5/0x1c0
__x64_sys_execve+0x33/0x40
do_syscall_64+0x6b/0xc0
? do_syscall_64+0x11/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f420dc1db37
Code: ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00 00 f7 d8 64 41 89 00 eb dc 0f 1f 84 00 00 00 00 00 b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 01 43 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd4e8b4e38 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f420dc1db37
RDX: 0000564338d1e740 RSI: 0000564338d32d50 RDI: 0000564338d28f00
RBP: 0000564338d28f00 R08: 0000564338d32d50 R09: 0000000000000020
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000564338d28f00
R13: 0000564338d32d50 R14: 0000564338d1e740 R15: 0000564338d28c60
---[ end trace 83ab3e8e16275e49 ]---
```
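The size check behind the warning in the trace above can be sketched as follows. sketch_buf_fits() is a hypothetical userspace helper, not the kernel's perf_trace_buf_alloc(); it only mirrors the comparison and the warn-once behaviour implied by the message text.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Returns true when a request of 'wanted' bytes fits within 'limit'.
 * The complaint is printed only on the first failure, loosely imitating a
 * warn-once style check.
 */
static bool sketch_buf_fits(int wanted, int limit)
{
	static bool warned;

	if (wanted <= limit)
		return true;
	if (!warned) {
		warned = true;
		fprintf(stderr,
			"perf buffer not large enough, wanted %d, have %d\n",
			wanted, limit);
	}
	return false;
}
```

With the numbers from the report, a 6644-byte request fails against limits of 2048 and 6144 but fits once the limit is 8192.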
Link: https://lkml.kernel.org/r/20210831043723.13481-2-robbat2@gentoo.org
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-31 04:37:23 +00:00
|
|
|
#define PERF_MAX_TRACE_SIZE 8192
|
2009-09-18 04:10:28 +00:00
|
|
|
|
2021-11-14 18:28:34 +00:00
|
|
|
#define MAX_FILTER_STR_VAL 256U /* Should handle KSYM_SYMBOL_LEN */
|
2009-04-13 15:20:49 +00:00
|
|
|
|
tracing: Add basic event trigger framework
Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.
'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.
The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.
The event trigger functionality is built on top of SOFT_DISABLE
functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires. Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that. Event triggers essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function. Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.
The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.
The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.
The standard open, read, and release file operations are implemented
here.
The open() implementation sets up for the various open modes of the
'trigger' file. It creates and attaches the trigger iterator and sets
up the command parser. If opened for reading, it sets up the trigger
seq_ops.
The read() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.
The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.
A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and
a call to it from the trace event initialization code.
Also added are a couple of trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.
A couple of structs consisting mostly of functions meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations. They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.
The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event. It essentially coordinates adding a live trigger instance to
the event and arming the triggering event.
Every event_command func() implementation essentially does the
same thing for any command:
- choose ops - use the value of param to choose either a number or
count version of event_trigger_ops specific to the command
- do the register or unregister of those ops
- associate a filter, if specified, with the triggering event
The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized. When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite. The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.
Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.
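The double duty of trigger_type can be pictured with a minimal sketch. All demo_* names below are hypothetical and only mirror the flag-style values. The sketch assumes each command owns one bit, so the types of several triggers can be OR-ed into a single mask and tested later:

```c
#include <stddef.h>

/* Hypothetical flag-style identifiers: one bit per command. */
enum demo_trigger_type {
	DEMO_ETT_NONE        = 0,
	DEMO_ETT_TRACE_ONOFF = 1 << 0,
	DEMO_ETT_SNAPSHOT    = 1 << 1,
	DEMO_ETT_STACKTRACE  = 1 << 2,
};

/* Hypothetical trigger instance attached to an event. */
struct demo_trigger {
	enum demo_trigger_type type; /* unique id, also usable as a mask bit */
	int needs_post_call;         /* non-zero: defer work until after the event */
};

/*
 * Walk the triggers of one event and collect, as a bitmask, the types
 * of those that asked for deferred handling.
 */
unsigned int demo_collect_post_triggers(const struct demo_trigger *t, size_t n)
{
	unsigned int mask = DEMO_ETT_NONE;

	for (size_t i = 0; i < n; i++)
		if (t[i].needs_post_call)
			mask |= t[i].type;
	return mask;
}
```

Because every type is a distinct bit, the caller can test the returned mask against a single type to decide which deferred handlers to run.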
The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions. This allows func()
implementations to use command-specific blobs and supports code
re-use.
The event_trigger_ops.func() command corresponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked. The other functions are used to list the trigger when
needed, along with a couple mundane book-keeping functions.
This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.
Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Idea-by: Steve Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 13:59:24 +00:00
|
|
|
enum event_trigger_type {
|
|
|
|
ETT_NONE = (0),
|
tracing: Add 'traceon' and 'traceoff' event trigger commands
Add 'traceon' and 'traceoff' event_command commands. traceon and
traceoff event triggers are added by the user via these commands in a
similar way and using practically the same syntax as the analogous
'traceon' and 'traceoff' ftrace function commands, but instead of
writing to the set_ftrace_filter file, the traceon and traceoff
triggers are written to the per-event 'trigger' files:
echo 'traceon' > .../tracing/events/somesys/someevent/trigger
echo 'traceoff' > .../tracing/events/somesys/someevent/trigger
The above commands will turn tracing on or off whenever someevent is
hit.
This also adds a 'count' version that limits the number of times the
command will be invoked:
echo 'traceon:N' > .../tracing/events/somesys/someevent/trigger
echo 'traceoff:N' > .../tracing/events/somesys/someevent/trigger
Where N is the number of times the command will be invoked.
The above commands will turn tracing on or off whenever someevent
is hit, but only N times.
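The 'count' behaviour can be sketched in a few lines of C. This is an illustration only, not the kernel implementation: the demo_* names are hypothetical, and the convention that -1 means "no limit" is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical per-trigger state for a count-limited 'traceoff'. */
struct demo_count_trigger {
	long count;      /* remaining invocations; -1 is assumed to mean unlimited */
	bool tracing_on; /* the state this trigger switches off */
};

/*
 * Called each time the triggering event is hit.
 * Returns true if the trigger ran, false once the count is used up.
 */
bool demo_traceoff_count(struct demo_count_trigger *t)
{
	if (t->count == 0)
		return false;   /* N invocations already consumed */
	if (t->count != -1)
		t->count--;     /* consume one invocation */
	t->tracing_on = false;
	return true;
}
```

With count set to 2, the first two hits turn tracing off and every later hit leaves the tracing state untouched.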
Some common register/unregister_trigger() implementations of the
event_command reg()/unreg() callbacks are also provided, which add and
remove trigger instances to the per-event list of triggers, and
arm/disarm them as appropriate. event_trigger_callback() is a
general-purpose event_command func() implementation that orchestrates
command parsing and registration for most normal commands.
Most event commands will use these, but some will override and
possibly reuse them.
The event_trigger_init(), event_trigger_free(), and
event_trigger_print() functions are meant to be common implementations
of the event_trigger_ops init(), free(), and print() ops,
respectively.
Most trigger_ops implementations will use these, but some will
override and possibly reuse them.
Link: http://lkml.kernel.org/r/00a52816703b98d2072947478dd6e2d70cde5197.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 13:59:25 +00:00
|
|
|
ETT_TRACE_ONOFF = (1 << 0),
|
2013-10-24 13:59:26 +00:00
|
|
|
ETT_SNAPSHOT = (1 << 1),
|
2013-10-24 13:59:27 +00:00
|
|
|
ETT_STACKTRACE = (1 << 2),
|
2013-10-24 13:59:28 +00:00
|
|
|
ETT_EVENT_ENABLE = (1 << 3),
|
tracing: Add 'hist' event trigger command
'hist' triggers allow users to continually aggregate trace events,
which can then be viewed afterwards by simply reading a 'hist' file
containing the aggregation in a human-readable format.
The basic idea is very simple and boils down to a mechanism whereby
trace events, rather than being exhaustively dumped in raw form and
viewed directly, are automatically 'compressed' into meaningful tables
completely defined by the user.
This is done strictly via single-line command-line commands and
without the aid of any kind of programming language or interpreter.
A surprising number of typical use cases can be accomplished by users
via this simple mechanism. In fact, a large number of the tasks that
users typically do using the more complicated script-based tracing
tools, at least during the initial stages of an investigation, can be
accomplished by simply specifying a set of keys and values to be used
in the creation of a hash table.
The Linux kernel trace event subsystem happens to provide an extensive
list of keys and values ready-made for such a purpose in the form of
the event format files associated with each trace event. By simply
consulting the format file for field names of interest and by plugging
them into the hist trigger command, users can create an endless number
of useful aggregations to help with investigating various properties
of the system. See Documentation/trace/events.txt for examples.
hist triggers are implemented on top of the existing event trigger
infrastructure, and as such are consistent with the existing triggers
from a user's perspective as well.
The basic syntax follows the existing trigger syntax. Users start an
aggregation by writing a 'hist' trigger to the event of interest's
trigger file:
# echo hist:keys=xxx [ if filter] > event/trigger
Once a hist trigger has been set up, by default it continually
aggregates every matching event into a hash table using the event key
and a value field named 'hitcount'.
To view the aggregation at any point in time, simply read the 'hist'
file in the same directory as the 'trigger' file:
# cat event/hist
The detailed syntax provides additional options for user control, and
is described exhaustively in Documentation/trace/events.txt and in the
virtual tracing/README file in the tracing subsystem.
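The aggregation idea, one hash-table entry per key with a running 'hitcount', can be sketched as follows. This is a simplified illustration under stated assumptions: a fixed-size open-addressing table, a single 64-bit key, and hypothetical demo_* names. The real hist trigger code is considerably more general.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_HIST_SLOTS 64 /* must match the 6-bit hash below */

struct demo_hist_entry {
	uint64_t key;
	uint64_t hitcount;
	int used;
};

struct demo_hist {
	struct demo_hist_entry slots[DEMO_HIST_SLOTS];
};

/* Multiplicative hash: keep the top 6 bits, giving a slot in 0..63. */
static size_t demo_hist_slot(uint64_t key)
{
	return (size_t)((key * 0x9E3779B97F4A7C15ULL) >> 58);
}

/* Account one event for 'key'. Returns 0 on success, -1 if the table is full. */
int demo_hist_hit(struct demo_hist *h, uint64_t key)
{
	size_t start = demo_hist_slot(key);

	for (size_t i = 0; i < DEMO_HIST_SLOTS; i++) {
		struct demo_hist_entry *e = &h->slots[(start + i) % DEMO_HIST_SLOTS];

		if (!e->used) {          /* first event with this key */
			e->used = 1;
			e->key = key;
			e->hitcount = 1;
			return 0;
		}
		if (e->key == key) {     /* existing entry: bump the count */
			e->hitcount++;
			return 0;
		}
	}
	return -1;
}

/* Read back the hitcount for 'key' (0 if it was never seen). */
uint64_t demo_hist_read(const struct demo_hist *h, uint64_t key)
{
	size_t start = demo_hist_slot(key);

	for (size_t i = 0; i < DEMO_HIST_SLOTS; i++) {
		const struct demo_hist_entry *e = &h->slots[(start + i) % DEMO_HIST_SLOTS];

		if (!e->used)
			return 0;
		if (e->key == key)
			return e->hitcount;
	}
	return 0;
}
```

Calling demo_hist_hit() once per event plays the role of the trigger firing, and demo_hist_read() plays the role of reading the 'hist' file for one key.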
Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-03 18:54:42 +00:00
|
|
|
ETT_EVENT_HIST = (1 << 4),
|
2016-03-03 18:54:55 +00:00
|
|
|
ETT_HIST_ENABLE = (1 << 5),
|
tracing: Add a probe that attaches to trace events
A new dynamic event is introduced: event probe. The event is attached
to an existing tracepoint and uses its fields as arguments. The user
can specify a custom format string for the new event and select which
tracepoint arguments will be printed and how to print them.
An event probe is created by writing a configuration string to the
'dynamic_events' ftrace file:
e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS] - Set an event probe
-:SNAME/ENAME - Delete an event probe
Where:
SNAME - System name, if omitted 'eprobes' is used.
ENAME - Name of the new event in SNAME, if omitted the SYSTEM_EVENT is used.
SYSTEM - Name of the system, where the tracepoint is defined, mandatory.
EVENT - Name of the tracepoint event in SYSTEM, mandatory.
FETCHARGS - Arguments:
<name>=$<field>[:TYPE] - Fetch given field of the tracepoint and print
it as given TYPE with given name. Supported
types are:
(u8/u16/u32/u64/s8/s16/s32/s64), basic type
(x8/x16/x32/x64), hexadecimal types
"string", "ustring" and bitfield.
For example, attach an event probe to the openat system call and print the
name of the file that will be opened:
echo "e:esys/eopen syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events
A new dynamic event is created in events/esys/eopen/ directory. It
can be deleted with:
echo "-:esys/eopen" >> dynamic_events
Filters, triggers and histograms can be attached to the new event, and it
can be matched in synthetic events. There is one limitation: an event probe
cannot be attached to a kprobe, a uprobe or another event probe.
Link: https://lkml.kernel.org/r/20210812145805.2292326-1-tz.stoyanov@gmail.com
Link: https://lkml.kernel.org/r/20210819152825.142428383@goodmis.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-19 15:26:06 +00:00
|
|
|
ETT_EVENT_EPROBE = (1 << 6),
|
2013-10-24 13:59:24 +00:00
|
|
|
};
|
|
|
|
|
2009-10-15 03:21:42 +00:00
|
|
|
extern int filter_match_preds(struct event_filter *filter, void *rec);
|
2013-10-24 13:34:17 +00:00
|
|
|
|
2018-01-16 02:51:42 +00:00
|
|
|
extern enum event_trigger_type
|
2021-03-16 16:41:03 +00:00
|
|
|
event_triggers_call(struct trace_event_file *file,
|
|
|
|
struct trace_buffer *buffer, void *rec,
|
2018-01-16 02:51:42 +00:00
|
|
|
struct ring_buffer_event *event);
|
|
|
|
extern void
|
|
|
|
event_triggers_post_call(struct trace_event_file *file,
|
2018-05-07 20:02:14 +00:00
|
|
|
enum event_trigger_type tt);
|
2009-04-13 15:20:49 +00:00
|
|
|
|
2015-09-25 16:58:44 +00:00
|
|
|
bool trace_event_ignore_this_pid(struct trace_event_file *trace_file);
|
|
|
|
|
tracing: Uninline trace_trigger_soft_disabled() partly
On a powerpc32 build with CONFIG_CC_OPTIMISE_FOR_SIZE, the inline
keyword is not honored and trace_trigger_soft_disabled() appears
approx 50 times in vmlinux.
Adding -Winline to the build, the following message appears:
./include/linux/trace_events.h:712:1: error: inlining failed in call to 'trace_trigger_soft_disabled': call is unlikely and code size would grow [-Werror=inline]
That function is rather big for an inlined function:
c003df60 <trace_trigger_soft_disabled>:
c003df60: 94 21 ff f0 stwu r1,-16(r1)
c003df64: 7c 08 02 a6 mflr r0
c003df68: 90 01 00 14 stw r0,20(r1)
c003df6c: bf c1 00 08 stmw r30,8(r1)
c003df70: 83 e3 00 24 lwz r31,36(r3)
c003df74: 73 e9 01 00 andi. r9,r31,256
c003df78: 41 82 00 10 beq c003df88 <trace_trigger_soft_disabled+0x28>
c003df7c: 38 60 00 00 li r3,0
c003df80: 39 61 00 10 addi r11,r1,16
c003df84: 4b fd 60 ac b c0014030 <_rest32gpr_30_x>
c003df88: 73 e9 00 80 andi. r9,r31,128
c003df8c: 7c 7e 1b 78 mr r30,r3
c003df90: 41 a2 00 14 beq c003dfa4 <trace_trigger_soft_disabled+0x44>
c003df94: 38 c0 00 00 li r6,0
c003df98: 38 a0 00 00 li r5,0
c003df9c: 38 80 00 00 li r4,0
c003dfa0: 48 05 c5 f1 bl c009a590 <event_triggers_call>
c003dfa4: 73 e9 00 40 andi. r9,r31,64
c003dfa8: 40 82 00 28 bne c003dfd0 <trace_trigger_soft_disabled+0x70>
c003dfac: 73 ff 02 00 andi. r31,r31,512
c003dfb0: 41 82 ff cc beq c003df7c <trace_trigger_soft_disabled+0x1c>
c003dfb4: 80 01 00 14 lwz r0,20(r1)
c003dfb8: 83 e1 00 0c lwz r31,12(r1)
c003dfbc: 7f c3 f3 78 mr r3,r30
c003dfc0: 83 c1 00 08 lwz r30,8(r1)
c003dfc4: 7c 08 03 a6 mtlr r0
c003dfc8: 38 21 00 10 addi r1,r1,16
c003dfcc: 48 05 6f 6c b c0094f38 <trace_event_ignore_this_pid>
c003dfd0: 38 60 00 01 li r3,1
c003dfd4: 4b ff ff ac b c003df80 <trace_trigger_soft_disabled+0x20>
However it is located in a hot path so inlining it is important.
But forcing inlining of the entire function by using __always_inline
leads to increasing the text size by approx 20 kbytes.
Instead, split the function into two parts, one part with the likely
fast path, flagged __always_inline, and a second part out of line.
With this change, on a powerpc32 with CONFIG_CC_OPTIMISE_FOR_SIZE
vmlinux text increases by only 1.4 kbytes, which is partly
compensated by a decrease of vmlinux data by 7 kbytes.
On ppc64_defconfig which has CONFIG_CC_OPTIMISE_FOR_SPEED, this
change reduces vmlinux text by more than 30 kbytes.
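The split described above follows a common pattern: a tiny, always-inlined flag test in front of an out-of-line helper. A minimal sketch, with hypothetical demo_* names, made-up flag values and a stand-in slow path (the counter exists only to make the effect observable):

```c
#include <stdbool.h>

/* Hypothetical flag bits; the values are illustrative only. */
#define DEMO_FL_SOFT_DISABLED (1UL << 0)
#define DEMO_FL_TRIGGER_MODE  (1UL << 1)
#define DEMO_FL_PID_FILTER    (1UL << 2)

#if defined(__GNUC__)
#define DEMO_ALWAYS_INLINE static inline __attribute__((always_inline))
#define DEMO_NOINLINE      __attribute__((noinline))
#define DEMO_LIKELY(x)     __builtin_expect(!!(x), 1)
#else
#define DEMO_ALWAYS_INLINE static inline
#define DEMO_NOINLINE
#define DEMO_LIKELY(x)     (x)
#endif

int demo_slow_calls; /* counts how often the slow path was taken */

/* Out-of-line slow path: emitted once, shared by every call site. */
DEMO_NOINLINE bool demo_soft_disabled_slow(unsigned long flags)
{
	demo_slow_calls++;
	/* Stand-in logic: report "disabled" only for the soft-disabled bit. */
	return (flags & DEMO_FL_SOFT_DISABLED) != 0;
}

/* Inlined fast path: a single mask test in the common case. */
DEMO_ALWAYS_INLINE bool demo_soft_disabled(unsigned long flags)
{
	if (DEMO_LIKELY(!(flags & (DEMO_FL_TRIGGER_MODE |
				   DEMO_FL_SOFT_DISABLED |
				   DEMO_FL_PID_FILTER))))
		return false;
	return demo_soft_disabled_slow(flags);
}
```

Only the mask test is duplicated at each call site, so the inlined part stays small while the bulky logic exists once.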
Link: https://lkml.kernel.org/r/69ce0986a52d026d381d612801d978aa4f977460.1644563295.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-11 07:10:18 +00:00
|
|
|
bool __trace_trigger_soft_disabled(struct trace_event_file *file);
|
|
|
|
|
2014-01-07 02:32:10 +00:00
|
|
|
/**
|
2015-05-13 19:21:25 +00:00
|
|
|
* trace_trigger_soft_disabled - do triggers and test if soft disabled
|
2014-01-07 02:32:10 +00:00
|
|
|
* @file: The file pointer of the event to test
|
|
|
|
*
|
|
|
|
* If any triggers without filters are attached to this event, they
|
|
|
|
* will be called here. If the event is soft disabled and has no
|
|
|
|
* triggers that require testing the fields, it will return true,
|
|
|
|
* otherwise false.
|
|
|
|
*/
|
2022-02-11 07:10:18 +00:00
|
|
|
static __always_inline bool
|
2015-05-13 19:21:25 +00:00
|
|
|
trace_trigger_soft_disabled(struct trace_event_file *file)
|
2014-01-07 02:32:10 +00:00
|
|
|
{
|
|
|
|
unsigned long eflags = file->flags;
|
|
|
|
|
2022-02-11 07:10:18 +00:00
|
|
|
if (likely(!(eflags & (EVENT_FILE_FL_TRIGGER_MODE |
|
|
|
|
EVENT_FILE_FL_SOFT_DISABLED |
|
|
|
|
EVENT_FILE_FL_PID_FILTER))))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
if (likely(eflags & EVENT_FILE_FL_TRIGGER_COND))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return __trace_trigger_soft_disabled(file);
|
2014-01-07 02:32:10 +00:00
|
|
|
}
|
|
|
|
|
2015-07-01 02:13:49 +00:00
|
|
|
#ifdef CONFIG_BPF_EVENTS
|
2017-10-24 06:53:08 +00:00
|
|
|
unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
|
bpf: Allow to specify user-provided bpf_cookie for BPF perf links
Add ability for users to specify custom u64 value (bpf_cookie) when creating
BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
tracepoints).
This is useful for cases when the same BPF program is used for attaching and
processing invocation of different tracepoints/kprobes/uprobes in a generic
fashion, but such that each invocation is distinguished from each other (e.g.,
BPF program can look up additional information associated with a specific
kernel function without having to rely on function IP lookups). This enables
new use cases to be implemented simply and efficiently that previously were
possible only through code generation (and thus multiple instances of almost
identical BPF program) or compilation at runtime (BCC-style) on target hosts
(even more expensive resource-wise). For uprobes it is not even possible in
some cases to know function IP before hand (e.g., when attaching to shared
library without PID filtering, in which case base load address is not known
for a library).
This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
corresponding to each attached and run BPF program. Given cgroup BPF programs
already use two 8-byte pointers for their needs and cgroup BPF programs don't
have (yet?) support for bpf_cookie, reuse that space through union of
cgroup_storage and new bpf_cookie field.
Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
program execution code, which luckily is now also split from
BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
giving access to this user-provided cookie value from inside a BPF program.
Generic perf_event BPF programs will access this value from perf_event itself
through passed in BPF program context.
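The storage trick described above can be pictured with a small sketch. It is not the kernel's definition: the demo_* names are hypothetical, the programs are plain C callbacks, and the cookie is handed to each program directly instead of through a run context. What it does show is the union that lets the cookie reuse the space of the two pointers, and one shared program body that behaves differently per attachment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical program type: receives the cookie of its attachment. */
typedef uint64_t (*demo_prog_fn)(uint64_t cookie);

/* One slot per attached program; a NULL prog terminates the array. */
struct demo_prog_array_item {
	demo_prog_fn prog;
	union {                          /* anonymous union (C11) */
		void *cgroup_storage[2]; /* used by one class of programs */
		uint64_t bpf_cookie;     /* reused space for the others */
	};
};

/* A single generic program that only echoes its cookie. */
static uint64_t demo_echo_cookie(uint64_t cookie)
{
	return cookie;
}

/* Run every program in the array, passing each its own cookie. */
uint64_t demo_run_array(const struct demo_prog_array_item *items)
{
	uint64_t sum = 0;

	for (; items->prog; items++)
		sum += items->prog(items->bpf_cookie);
	return sum;
}
```

Attaching demo_echo_cookie twice with different cookies yields two distinguishable invocations without duplicating the program.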
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210815070609.987780-6-andrii@kernel.org
2021-08-15 07:05:58 +00:00
|
|
|
int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
|
2017-10-24 06:53:08 +00:00
|
|
|
void perf_event_detach_bpf_prog(struct perf_event *event);
|
2017-12-13 18:35:37 +00:00
|
|
|
int perf_event_query_prog_array(struct perf_event *event, void __user *info);
|
2024-03-19 23:38:49 +00:00
|
|
|
|
|
|
|
struct bpf_raw_tp_link;
|
|
|
|
int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link);
|
|
|
|
int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link);
|
|
|
|
|
2018-12-13 00:42:37 +00:00
|
|
|
struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
|
|
|
|
void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
|
bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
Currently, suppose a userspace application has loaded a bpf program
and attached it to a tracepoint/kprobe/uprobe, and a bpf
introspection tool, e.g., bpftool, wants to show which bpf program
is attached to which tracepoint/kprobe/uprobe. Such attachment
information will be really useful to understand the overall bpf
deployment in the system.
There is a name field (16 bytes) for each program, which could
be used to encode the attachment point. There are some drawbacks
to this approach. First, a bpftool user (e.g., an admin) may not
really understand the association between the name and the
attachment point. Second, if one program is attached to multiple
places, encoding a proper name which can imply all these
attachments becomes difficult.
This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
Given a pid and fd, if the <pid, fd> is associated with a
tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return
. prog_id
. tracepoint name, or
. k[ret]probe funcname + offset or kernel addr, or
. u[ret]probe filename + offset
to the userspace.
The user can use "bpftool prog" to find more information about
bpf program itself with prog_id.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24 18:21:09 +00:00
|
|
|
int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
|
|
|
|
u32 *fd_type, const char **buf,
|
2023-09-20 21:31:39 +00:00
|
|
|
u64 *probe_offset, u64 *probe_addr,
|
|
|
|
unsigned long *missed);
|
2022-03-16 12:24:09 +00:00
|
|
|
int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
|
2023-08-09 08:34:15 +00:00
|
|
|
int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
|
tracing, perf: Implement BPF programs attached to kprobes
BPF programs, attached to kprobes, provide a safe way to execute
user-defined BPF byte-code programs without being able to crash or
hang the kernel in any way. The BPF engine makes sure that such
programs have a finite execution time and that they cannot break
out of their sandbox.
The user interface is to attach to a kprobe via the perf syscall:
  struct perf_event_attr attr = {
    .type = PERF_TYPE_TRACEPOINT,
    .config = event_id,
    ...
  };
  event_fd = perf_event_open(&attr,...);
  ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
'prog_fd' is a file descriptor associated with BPF program
previously loaded.
'event_id' is an ID of the kprobe created.
Closing 'event_fd':
close(event_fd);
... automatically detaches BPF program from it.
BPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- probe_read - wrapper of probe_kernel_read() used to access any
kernel data structures
BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
architecture dependent) and return 0 to ignore the event and 1 to store
kprobe event into the ring buffer.
Note, kprobes are fundamentally _not_ a stable kernel ABI,
so BPF programs attached to kprobes must be recompiled for
every kernel version and user must supply correct LINUX_VERSION_CODE
in attr.kern_version during bpf_prog_load() call.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-25 19:49:20 +00:00
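The attach sequence from the commit message above can be written out as a small userspace helper. This is a sketch under stated assumptions: `perf_event_open` is invoked through `syscall(2)` because glibc provides no wrapper, the event is opened for all tasks on CPU 0, and the helper name `attach_bpf_to_event()` is hypothetical. `event_id` and `prog_fd` must come from elsewhere (the kprobe's tracing id and a previously loaded BPF program).

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: open a perf event for the given tracing id and
 * attach the BPF program behind prog_fd to it.
 * Returns the event fd on success and -errno on failure.
 */
static int attach_bpf_to_event(unsigned long long event_id, int prog_fd)
{
	struct perf_event_attr attr;
	int event_fd;
	int err;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_TRACEPOINT;
	attr.config = event_id;
	attr.sample_period = 1;

	/* pid = -1, cpu = 0: every task on CPU 0 (an assumption of this sketch) */
	event_fd = (int)syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
	if (event_fd < 0)
		return -errno;

	if (ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) {
		err = -errno;
		close(event_fd);
		return err;
	}

	/* Closing the returned fd later detaches the program again. */
	return event_fd;
}
```

Returning the event descriptor keeps the detach step in the caller's hands: as the commit message notes, `close(event_fd)` is all that is needed to detach.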
|
|
|
#else
|
2017-10-24 06:53:08 +00:00
|
|
|
static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
|
2015-03-25 19:49:20 +00:00
|
|
|
{
	return 1;
}
|
2017-10-24 06:53:08 +00:00
|
|
|
|
|
|
|
static inline int
|
bpf: Allow to specify user-provided bpf_cookie for BPF perf links
Add ability for users to specify custom u64 value (bpf_cookie) when creating
BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
tracepoints).
This is useful for cases when the same BPF program is used for attaching and
processing invocation of different tracepoints/kprobes/uprobes in a generic
fashion, but such that each invocation is distinguished from each other (e.g.,
BPF program can look up additional information associated with a specific
kernel function without having to rely on function IP lookups). This enables
new use cases to be implemented simply and efficiently that previously were
possible only through code generation (and thus multiple instances of almost
identical BPF program) or compilation at runtime (BCC-style) on target hosts
(even more expensive resource-wise). For uprobes it is not even possible in
some cases to know function IP beforehand (e.g., when attaching to shared
library without PID filtering, in which case base load address is not known
for a library).
This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
corresponding to each attached and run BPF program. Given cgroup BPF programs
already use two 8-byte pointers for their needs and cgroup BPF programs don't
have (yet?) support for bpf_cookie, reuse that space through union of
cgroup_storage and new bpf_cookie field.
Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
program execution code, which luckily is now also split from
BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
giving access to this user-provided cookie value from inside a BPF program.
Generic perf_event BPF programs will access this value from perf_event itself
through passed in BPF program context.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210815070609.987780-6-andrii@kernel.org
2021-08-15 07:05:58 +00:00
|
|
|
perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
|
2017-10-24 06:53:08 +00:00
|
|
|
{
	return -EOPNOTSUPP;
}

static inline void perf_event_detach_bpf_prog(struct perf_event *event) { }
|
|
|
|
|
2017-12-13 18:35:37 +00:00
|
|
|
static inline int
perf_event_query_prog_array(struct perf_event *event, void __user *info)
{
	return -EOPNOTSUPP;
}
|
2024-03-19 23:38:49 +00:00
|
|
|
struct bpf_raw_tp_link;
static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
|
2018-03-28 19:05:37 +00:00
|
|
|
{
	return -EOPNOTSUPP;
}
|
2024-03-19 23:38:49 +00:00
|
|
|
static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
|
2018-03-28 19:05:37 +00:00
|
|
|
{
	return -EOPNOTSUPP;
}
|
2018-12-13 00:42:37 +00:00
|
|
|
static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
|
2018-03-28 19:05:37 +00:00
|
|
|
{
	return NULL;
}
|
2018-12-13 00:42:37 +00:00
|
|
|
static inline void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
{
}
|
2018-05-24 18:21:09 +00:00
|
|
|
static inline int bpf_get_perf_event_info(const struct perf_event *event,
					  u32 *prog_id, u32 *fd_type,
					  const char **buf, u64 *probe_offset,
|
2023-09-20 21:31:39 +00:00
|
|
|
u64 *probe_addr, unsigned long *missed)
|
2018-05-24 18:21:09 +00:00
|
|
|
{
	return -EOPNOTSUPP;
}
|
2022-03-16 12:24:09 +00:00
|
|
|
static inline int
bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
	return -EOPNOTSUPP;
}
|
2023-08-09 08:34:15 +00:00
|
|
|
static inline int
bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
|
2022-03-16 12:24:09 +00:00
|
|
|
{
	return -EOPNOTSUPP;
}
|
2015-03-25 19:49:20 +00:00
|
|
|
#endif
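The `#else` branch that ends here follows a common header pattern: when a feature is compiled out, the header supplies `static inline` stubs with the same signatures, so callers need no `#ifdef` of their own. The sketch below illustrates the pattern in isolation; `DEMO_FEATURE_ENABLED`, `struct demo_event`, `demo_attach()` and `demo_detach()` are hypothetical names standing in for a Kconfig symbol and the real declarations.

```c
#include <errno.h>

/*
 * Hypothetical build switch standing in for a Kconfig symbol.
 * It is left undefined here, so the stub branch below is compiled.
 */
/* #define DEMO_FEATURE_ENABLED 1 */

struct demo_event {
	int id;
};

#ifdef DEMO_FEATURE_ENABLED
/* Real implementations would live in a .c file that is built with the feature. */
int demo_attach(struct demo_event *event);
void demo_detach(struct demo_event *event);
#else
/* Feature compiled out: same signatures, trivial bodies. */
static inline int demo_attach(struct demo_event *event)
{
	(void)event;
	return -EOPNOTSUPP;
}

static inline void demo_detach(struct demo_event *event)
{
	(void)event;
}
#endif
```

Callers can then write `err = demo_attach(&ev);` unconditionally and handle `-EOPNOTSUPP` like any other error, and the compiler can typically optimize the empty stubs away.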
|
|
|
|
|
2009-08-07 02:33:22 +00:00
|
|
|
enum {
	FILTER_OTHER = 0,
	FILTER_STATIC_STRING,
	FILTER_DYN_STRING,
|
2021-11-22 09:30:12 +00:00
|
|
|
FILTER_RDYN_STRING,
|
2009-08-07 02:33:43 +00:00
|
|
|
FILTER_PTR_STRING,
|
2012-02-15 14:51:53 +00:00
|
|
|
FILTER_TRACE_FN,
|
2023-07-07 17:21:48 +00:00
|
|
|
FILTER_CPUMASK,
|
2016-03-03 22:18:20 +00:00
|
|
|
	FILTER_COMM,
	FILTER_CPU,
|
2023-05-24 03:09:13 +00:00
|
|
|
FILTER_STACKTRACE,
|
2009-08-07 02:33:22 +00:00
|
|
|
};
|
|
|
|
|
2015-05-05 15:45:27 +00:00
|
|
|
extern int trace_event_raw_init(struct trace_event_call *call);
extern int trace_define_field(struct trace_event_call *call, const char *type,
|
2009-08-27 03:09:51 +00:00
|
|
|
			      const char *name, int offset, int size,
			      int is_signed, int filter_type);
|
2015-05-05 15:45:27 +00:00
|
|
|
extern int trace_add_event_call(struct trace_event_call *call);
extern int trace_remove_event_call(struct trace_event_call *call);
|
2016-04-07 01:43:28 +00:00
|
|
|
extern int trace_event_get_offsets(struct trace_event_call *call);
|
2009-04-13 15:20:49 +00:00
|
|
|
|
2019-07-04 17:21:10 +00:00
|
|
|
int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
|
2009-05-08 20:27:41 +00:00
|
|
|
int trace_set_clr_event(const char *system, const char *event, int set);
|
2019-11-20 19:08:38 +00:00
|
|
|
int trace_array_set_clr_event(struct trace_array *tr, const char *system,
		const char *event, bool enable);
|
2009-04-13 15:20:49 +00:00
|
|
|
/*
 * The double __builtin_constant_p is because gcc will give us an error
 * if we try to allocate the static variable to fmt if it is not a
 * constant. Even with the outer if statement optimizing out.
 */
#define event_trace_printk(ip, fmt, args...)				\
do {									\
	__trace_printk_check_format(fmt, ##args);			\
	tracing_record_cmdline(current);				\
	if (__builtin_constant_p(fmt)) {				\
		static const char *trace_printk_fmt			\
|
2020-10-22 02:36:07 +00:00
|
|
|
__section("__trace_printk_fmt") = \
|
2009-04-13 15:20:49 +00:00
|
|
|
			__builtin_constant_p(fmt) ? fmt : NULL;		\
									\
		__trace_bprintk(ip, trace_printk_fmt, ##args);		\
	} else								\
		__trace_printk(ip, fmt, ##args);			\
} while (0)
|
|
|
|
|
2009-12-21 06:27:35 +00:00
|
|
|
#ifdef CONFIG_PERF_EVENTS
|
2009-10-15 03:21:42 +00:00
|
|
|
struct perf_event;
|
2010-03-03 06:16:16 +00:00
|
|
|
|
|
|
|
DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
|
|
|
|
|
2010-05-19 12:02:22 +00:00
|
|
|
extern int perf_trace_init(struct perf_event *event);
extern void perf_trace_destroy(struct perf_event *event);
|
perf: Rework the PMU methods
Replace pmu::{enable,disable,start,stop,unthrottle} with
pmu::{add,del,start,stop}, all of which take a flags argument.
The new interface extends the capability to stop a counter while
keeping it scheduled on the PMU. We replace the throttled state with
the generic stopped state.
This also allows us to efficiently stop/start counters over certain
code paths (like IRQ handlers).
It also allows scheduling a counter without it starting, allowing for
a generic frozen state (useful for rotating stopped counters).
The stopped state is implemented in two different ways, depending on
how the architecture implemented the throttled state:
1) We disable the counter:
a) the pmu has per-counter enable bits, we flip that
b) we program a NOP event, preserving the counter state
2) We store the counter state and ignore all read/overflow events
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-16 12:37:10 +00:00
|
|
|
extern int perf_trace_add(struct perf_event *event, int flags);
extern void perf_trace_del(struct perf_event *event, int flags);
|
2017-12-06 22:45:15 +00:00
|
|
|
#ifdef CONFIG_KPROBE_EVENTS
extern int perf_kprobe_init(struct perf_event *event, bool is_retprobe);
extern void perf_kprobe_destroy(struct perf_event *event);
|
2018-05-24 18:21:09 +00:00
|
|
|
extern int bpf_get_kprobe_info(const struct perf_event *event,
			       u32 *fd_type, const char **symbol,
			       u64 *probe_offset, u64 *probe_addr,
|
2023-09-20 21:31:39 +00:00
|
|
|
unsigned long *missed,
|
2018-05-24 18:21:09 +00:00
|
|
|
bool perf_type_tracepoint);
|
2017-12-06 22:45:15 +00:00
|
|
|
#endif
|
2017-12-06 22:45:16 +00:00
|
|
|
#ifdef CONFIG_UPROBE_EVENTS
|
2018-10-02 05:36:36 +00:00
|
|
|
extern int perf_uprobe_init(struct perf_event *event,
			    unsigned long ref_ctr_offset, bool is_retprobe);
|
2017-12-06 22:45:16 +00:00
|
|
|
extern void perf_uprobe_destroy(struct perf_event *event);
|
2018-05-24 18:21:09 +00:00
|
|
|
extern int bpf_get_uprobe_info(const struct perf_event *event,
			       u32 *fd_type, const char **filename,
|
2023-07-09 02:56:25 +00:00
|
|
|
			       u64 *probe_offset, u64 *probe_addr,
			       bool perf_type_tracepoint);
|
2017-12-06 22:45:16 +00:00
|
|
|
#endif
|
2010-05-19 12:02:22 +00:00
|
|
|
extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
|
2009-10-15 03:21:42 +00:00
|
|
|
				     char *filter_str);
extern void ftrace_profile_free_filter(struct perf_event *event);
|
2016-04-07 01:43:24 +00:00
|
|
|
void perf_trace_buf_update(void *record, u16 type);
void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
|
2010-01-28 01:32:29 +00:00
|
|
|
|
2021-08-15 07:05:58 +00:00
|
|
|
int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
|
bpf: Implement minimal BPF perf link
Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
the common BPF link infrastructure, allowing to list all active perf_event
based attachments, auto-detaching BPF program from perf_event when link's FD
is closed, get generic BPF link fdinfo/get_info functionality.
BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
are currently supported.
Force-detaching and atomic BPF program updates are not yet implemented, but
with perf_event-based BPF links we now have common framework for this without
the need to extend ioctl()-based perf_event interface.
One interesting consideration is a new value for bpf_attach_type, which
BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
define a single BPF_PERF_EVENT attach type for all of them and adjust
link_create()'s logic for checking correspondence between attach type and
program type.
The alternative would be to define three new attach types (e.g., BPF_KPROBE,
BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
libbpf. I chose to not do this to avoid unnecessary proliferation of
bpf_attach_type enum values and not have to deal with naming conflicts.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210815070609.987780-5-andrii@kernel.org
2021-08-15 07:05:57 +00:00
|
|
|
void perf_event_free_bpf_prog(struct perf_event *event);
|
|
|
|
|
2024-03-19 23:38:49 +00:00
|
|
|
void bpf_trace_run1(struct bpf_raw_tp_link *link, u64 arg1);
void bpf_trace_run2(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2);
void bpf_trace_run3(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
|
2018-03-28 19:05:37 +00:00
|
|
|
u64 arg3);
|
2024-03-19 23:38:49 +00:00
|
|
|
void bpf_trace_run4(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
|
2018-03-28 19:05:37 +00:00
|
|
|
u64 arg3, u64 arg4);
|
2024-03-19 23:38:49 +00:00
|
|
|
void bpf_trace_run5(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
|
2018-03-28 19:05:37 +00:00
|
|
|
u64 arg3, u64 arg4, u64 arg5);
|
2024-03-19 23:38:49 +00:00
|
|
|
void bpf_trace_run6(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
|
2018-03-28 19:05:37 +00:00
|
|
|
u64 arg3, u64 arg4, u64 arg5, u64 arg6);
|
2024-03-19 23:38:49 +00:00
|
|
|
void bpf_trace_run7(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
|
2018-03-28 19:05:37 +00:00
|
|
|
u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7);
|
2024-03-19 23:38:49 +00:00
|
|
|
void bpf_trace_run8(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
|
2018-03-28 19:05:37 +00:00
|
|
|
		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
		    u64 arg8);
|
2024-03-19 23:38:49 +00:00
|
|
|
void bpf_trace_run9(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
|
2018-03-28 19:05:37 +00:00
|
|
|
		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
		    u64 arg8, u64 arg9);
|
2024-03-19 23:38:49 +00:00
|
|
|
void bpf_trace_run10(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
|
2018-03-28 19:05:37 +00:00
|
|
|
		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
		     u64 arg8, u64 arg9, u64 arg10);
|
2024-03-19 23:38:49 +00:00
|
|
|
void bpf_trace_run11(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
|
2018-03-28 19:05:37 +00:00
|
|
|
		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
		     u64 arg8, u64 arg9, u64 arg10, u64 arg11);
|
2024-03-19 23:38:49 +00:00
|
|
|
void bpf_trace_run12(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
|
2018-03-28 19:05:37 +00:00
|
|
|
		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
		     u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12);
|
2016-04-19 03:11:50 +00:00
|
|
|
void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
			       struct trace_event_call *call, u64 count,
			       struct pt_regs *regs, struct hlist_head *head,
			       struct task_struct *task);
|
|
|
|
|
2010-01-28 01:32:29 +00:00
|
|
|
static inline void
|
2016-04-07 01:43:24 +00:00
|
|
|
perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type,
|
2012-07-11 14:14:58 +00:00
|
|
|
u64 count, struct pt_regs *regs, void *head,
|
2017-10-11 07:45:29 +00:00
|
|
|
struct task_struct *task)
|
2010-01-28 01:32:29 +00:00
|
|
|
{
|
2017-10-11 07:45:29 +00:00
|
|
|
perf_tp_event(type, count, raw_data, size, regs, head, rctx, task);
|
2010-01-28 01:32:29 +00:00
|
|
|
}
|
2017-10-24 06:53:08 +00:00
|
|
|
|
2009-10-15 03:21:42 +00:00
|
|
|
#endif
|
|
|
|
|
tracing/events: Add __vstring() and __assign_vstr() helper macros
There are several places that open code the following logic:
TP_STRUCT__entry(__dynamic_array(char, msg, MSG_MAX)),
TP_fast_assign(vsnprintf(__get_str(msg), MSG_MAX, vaf->fmt, *vaf->va);)
To load a string created by variable array va_list.
The main issue with this approach is the "MSG_MAX" usage in the
__dynamic_array() portion. That actually just reserves the MSG_MAX in the
event, and even wastes space because there's dynamic meta data also saved
in the event to denote the offset and size of the dynamic array. It would
have been better to just use a static __array() field.
Instead, create __vstring() and __assign_vstr() that work like __string
and __assign_str() but instead of taking a destination string to copy,
take a format string and a va_list pointer and fill in the values.
It uses the helper:
  #define __trace_event_vstr_len(fmt, va)		\
  ({							\
	va_list __ap;					\
	int __ret;					\
							\
	va_copy(__ap, *(va));				\
	__ret = vsnprintf(NULL, 0, fmt, __ap) + 1;	\
	va_end(__ap);					\
							\
	min(__ret, TRACE_EVENT_STR_MAX);		\
  })
This helper figures out the length needed to store the string. It may be
slightly slower, as it needs to run vsnprintf() twice, but it now saves
space on the ring buffer.
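The two-pass approach described above can be tried out in user space. The
following is only a minimal sketch, not the kernel implementation: the names
STR_MAX, vstr_len() and format_event() are hypothetical stand-ins, a plain
function is used where the kernel uses a macro, and the destination is an
ordinary caller-supplied buffer rather than a ring buffer reservation.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define STR_MAX 512	/* hypothetical; mirrors TRACE_EVENT_STR_MAX */

/*
 * Pass 1: measure the formatted length, including the trailing NUL,
 * without writing anything. Works on a copy so the caller's va_list
 * can still be used for the second pass.
 */
static int vstr_len(const char *fmt, va_list *va)
{
	va_list ap;
	int ret;

	va_copy(ap, *va);
	ret = vsnprintf(NULL, 0, fmt, ap) + 1;
	va_end(ap);

	return ret < STR_MAX ? ret : STR_MAX;
}

/*
 * Pass 2: format into dst using only as many bytes as pass 1 reported
 * (clamped to the space available). Returns the number of bytes used,
 * including the trailing NUL.
 */
static int format_event(char *dst, size_t dst_size, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vstr_len(fmt, &args);
	if ((size_t)len > dst_size)
		len = (int)dst_size;
	vsnprintf(dst, (size_t)len, fmt, args);
	va_end(args);

	return len;
}
```

A caller would pass a buffer and a format, for example
format_event(buf, sizeof(buf), "id=%d name=%s", 7, "eth0"), and use the
returned length to know how many bytes the string occupies. The cost is the
one the commit message names: the format is evaluated twice.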
Link: https://lkml.kernel.org/r/20220705224749.053570613@goodmis.org
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Arend van Spriel <aspriel@gmail.com>
Cc: Franky Lin <franky.lin@broadcom.com>
Cc: Hante Meuleman <hante.meuleman@broadcom.com>
Cc: Gregory Greenman <gregory.greenman@intel.com>
Cc: Peter Chen <peter.chen@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mathias Nyman <mathias.nyman@intel.com>
Cc: Chunfeng Yun <chunfeng.yun@mediatek.com>
Cc: Bin Liu <b-liu@ti.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <a@unstable.cc>
Cc: Sven Eckelmann <sven@narfation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-05 22:44:54 +00:00
|
|
|
#define TRACE_EVENT_STR_MAX 512
|
|
|
|
|
|
|
|
/*
|
|
|
|
* gcc warns that you can not use a va_list in an inlined
|
|
|
|
* function. But lets me make it into a macro :-/
|
|
|
|
*/
|
|
|
|
#define __trace_event_vstr_len(fmt, va) \
|
|
|
|
({ \
|
|
|
|
va_list __ap; \
|
|
|
|
int __ret; \
|
|
|
|
\
|
|
|
|
va_copy(__ap, *(va)); \
|
|
|
|
__ret = vsnprintf(NULL, 0, fmt, __ap) + 1; \
|
|
|
|
va_end(__ap); \
|
|
|
|
\
|
|
|
|
min(__ret, TRACE_EVENT_STR_MAX); \
|
|
|
|
})
|
|
|
|
|
2015-05-05 15:45:27 +00:00
|
|
|
#endif /* _LINUX_TRACE_EVENT_H */
|
2022-03-03 22:05:34 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Note: we keep the TRACE_CUSTOM_EVENT outside the include file ifdef protection.
|
|
|
|
* This is due to the way trace custom events work. If a file includes two
|
|
|
|
* trace event headers under one "CREATE_CUSTOM_TRACE_EVENTS" the first include
|
|
|
|
* will override the TRACE_CUSTOM_EVENT and break the second include.
|
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef TRACE_CUSTOM_EVENT
|
|
|
|
|
|
|
|
#define DECLARE_CUSTOM_EVENT_CLASS(name, proto, args, tstruct, assign, print)
|
|
|
|
#define DEFINE_CUSTOM_EVENT(template, name, proto, args)
|
|
|
|
#define TRACE_CUSTOM_EVENT(name, proto, args, struct, assign, print)
|
|
|
|
|
|
|
|
#endif /* ifndef TRACE_CUSTOM_EVENT (see note above) */
|