Tracing fixes:

User events:
 
  - Use long instead of int for storing the enable set/clear bit, as it was
    found that big endian machines could end up using the wrong bits.
 
  - Split allocating mm and attaching it. This keeps the allocation separate
    from the registration and avoids various races.
 
  - Remove RCU locking around pin_user_pages_remote() as that can schedule. The
    RCU protection is no longer needed with the above split of mm allocation and
    attaching.
 
  - Rename the "link" fields of the various structs to something more
    meaningful.
 
  - Add comments around user_event_mm struct usage and locking requirements.
 
 Timerlat tracer:
 
  - Fix missed wakeup of timerlat thread caused by the timerlat interrupt
    triggering when tracing is off. The timer interrupt handler needs to always
    wake up the timerlat thread regardless if tracing is enabled or not,
    otherwise, it will never wake up.
 
 Histograms:
 
  - Fix regression of breaking the "stacktrace" modifier for variables. That
    modifier cannot be used for values, but can be used for variables that are
    passed from one histogram to the next. This was broken when adding the
    restriction to values as the variable logic used the same code.
 
  - Rename the special field "stacktrace" to "common_stacktrace". Special fields
    (that are not actually part of the event, but can act just like event
    fields, like 'comm' and 'timestamp') should be prefixed with 'common_' for
    consistency. To keep backward compatibility, 'stacktrace' can still be used
    (as with the special field 'cpu'), but can be overridden if the event has a
    field called 'stacktrace'.
 
  - Update the synthetic event selftests to use the new name (synthetic events
    are created by histograms)
 
 Tracing bootup selftests:
 
  - Reorganize the code to keep artifacts of the selftests not compiled in when
    selftests are not configured.
 
  - Add various cond_resched() around the selftest code, as the softlock
    watchdog was triggering much more often. It appears that the kernel runs
    slower now with full debugging enabled.
 
  - While debugging ftrace with ftrace (using an instance ring buffer instead of
    the top level one), I found that the selftests were disabling prints to the
    debug instance. This should not happen, as the selftests only disable
    printing to the main buffer as the selftests examine the main buffer to see
    if it has what it expects, and prints can make the tests fail. Make the
    selftests only disable printing to the toplevel buffer, and leave the
    instance buffers alone.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZHQGJBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qu6hAQCJ1WebZUTJ/s7pFo36mXirLnrW4afB
 Ua6sALseqKNesgEAyhLmd2+sMeqmAbCCIUWtcWJb/Pod0jGOt0U8+cBxfw8=
 =PhaX
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:
 "User events:

   - Use long instead of int for storing the enable set/clear bit, as it
     was found that big endian machines could end up using the wrong
     bits.

   - Split allocating mm and attaching it. This keeps the allocation
     separate from the registration and avoids various races.

   - Remove RCU locking around pin_user_pages_remote() as that can
     schedule. The RCU protection is no longer needed with the above
     split of mm allocation and attaching.

   - Rename the "link" fields of the various structs to something more
     meaningful.

   - Add comments around user_event_mm struct usage and locking
     requirements.

  Timerlat tracer:

   - Fix missed wakeup of timerlat thread caused by the timerlat
     interrupt triggering when tracing is off. The timer interrupt
     handler needs to always wake up the timerlat thread regardless if
     tracing is enabled or not, otherwise, it will never wake up.

  Histograms:

   - Fix regression of breaking the "stacktrace" modifier for variables.
     That modifier cannot be used for values, but can be used for
     variables that are passed from one histogram to the next. This was
     broken when adding the restriction to values as the variable logic
     used the same code.

   - Rename the special field "stacktrace" to "common_stacktrace".

     Special fields (that are not actually part of the event, but can
     act just like event fields, like 'comm' and 'timestamp') should be
     prefixed with 'common_' for consistency. To keep backward
     compatibility, 'stacktrace' can still be used (as with the special
     field 'cpu'), but can be overridden if the event has a field called
     'stacktrace'.

   - Update the synthetic event selftests to use the new name (synthetic
     events are created by histograms)

  Tracing bootup selftests:

   - Reorganize the code to keep artifacts of the selftests not compiled
     in when selftests are not configured.

   - Add various cond_resched() around the selftest code, as the
     softlock watchdog was triggering much more often. It appears that
     the kernel runs slower now with full debugging enabled.

   - While debugging ftrace with ftrace (using an instance ring buffer
     instead of the top level one), I found that the selftests were
     disabling prints to the debug instance.

     This should not happen, as the selftests only disable printing to
     the main buffer as the selftests examine the main buffer to see if
     it has what it expects, and prints can make the tests fail.

     Make the selftests only disable printing to the toplevel buffer,
     and leave the instance buffers alone"

* tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Have function_graph selftest call cond_resched()
  tracing: Only make selftest conditionals affect the global_trace
  tracing: Make tracing_selftest_running/delete nops when not used
  tracing: Have tracer selftests call cond_resched() before running
  tracing: Move setting of tracing_selftest_running out of register_tracer()
  tracing/selftests: Update synthetic event selftest to use common_stacktrace
  tracing: Rename stacktrace field to common_stacktrace
  tracing/histograms: Allow variables to have some modifiers
  tracing/user_events: Document user_event_mm one-shot list usage
  tracing/user_events: Rename link fields for clarity
  tracing/user_events: Remove RCU lock while pinning pages
  tracing/user_events: Split up mm alloc and attach
  tracing/timerlat: Always wakeup the timerlat thread
  tracing/user_events: Use long vs int for atomic bit ops
This commit is contained in:
Linus Torvalds 2023-05-29 07:20:13 -04:00
commit 8b817fded4
11 changed files with 210 additions and 96 deletions

View File

@ -35,7 +35,7 @@ Documentation written by Tom Zanussi
in place of an explicit value field - this is simply a count of
event hits. If 'values' isn't specified, an implicit 'hitcount'
value will be automatically created and used as the only value.
Keys can be any field, or the special string 'stacktrace', which
Keys can be any field, or the special string 'common_stacktrace', which
will use the event's kernel stacktrace as the key. The keywords
'keys' or 'key' can be used to specify keys, and the keywords
'values', 'vals', or 'val' can be used to specify values. Compound
@ -54,7 +54,7 @@ Documentation written by Tom Zanussi
'compatible' if the fields named in the trigger share the same
number and type of fields and those fields also have the same names.
Note that any two events always share the compatible 'hitcount' and
'stacktrace' fields and can therefore be combined using those
'common_stacktrace' fields and can therefore be combined using those
fields, however pointless that may be.
'hist' triggers add a 'hist' file to each event's subdirectory.
@ -547,9 +547,9 @@ Extended error information
the hist trigger display symbolic call_sites, we can have the hist
trigger additionally display the complete set of kernel stack traces
that led to each call_site. To do that, we simply use the special
value 'stacktrace' for the key parameter::
value 'common_stacktrace' for the key parameter::
# echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
# echo 'hist:keys=common_stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
/sys/kernel/tracing/events/kmem/kmalloc/trigger
The above trigger will use the kernel stack trace in effect when an
@ -561,9 +561,9 @@ Extended error information
every callpath to a kmalloc for a kernel compile)::
# cat /sys/kernel/tracing/events/kmem/kmalloc/hist
# trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
# trigger info: hist:keys=common_stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
{ stacktrace:
{ common_stacktrace:
__kmalloc_track_caller+0x10b/0x1a0
kmemdup+0x20/0x50
hidraw_report_event+0x8a/0x120 [hid]
@ -581,7 +581,7 @@ Extended error information
cpu_startup_entry+0x315/0x3e0
rest_init+0x7c/0x80
} hitcount: 3 bytes_req: 21 bytes_alloc: 24
{ stacktrace:
{ common_stacktrace:
__kmalloc_track_caller+0x10b/0x1a0
kmemdup+0x20/0x50
hidraw_report_event+0x8a/0x120 [hid]
@ -596,7 +596,7 @@ Extended error information
do_IRQ+0x5a/0xf0
ret_from_intr+0x0/0x30
} hitcount: 3 bytes_req: 21 bytes_alloc: 24
{ stacktrace:
{ common_stacktrace:
kmem_cache_alloc_trace+0xeb/0x150
aa_alloc_task_context+0x27/0x40
apparmor_cred_prepare+0x1f/0x50
@ -608,7 +608,7 @@ Extended error information
.
.
.
{ stacktrace:
{ common_stacktrace:
__kmalloc+0x11b/0x1b0
i915_gem_execbuffer2+0x6c/0x2c0 [i915]
drm_ioctl+0x349/0x670 [drm]
@ -616,7 +616,7 @@ Extended error information
SyS_ioctl+0x81/0xa0
system_call_fastpath+0x12/0x6a
} hitcount: 17726 bytes_req: 13944120 bytes_alloc: 19593808
{ stacktrace:
{ common_stacktrace:
__kmalloc+0x11b/0x1b0
load_elf_phdrs+0x76/0xa0
load_elf_binary+0x102/0x1650
@ -625,7 +625,7 @@ Extended error information
SyS_execve+0x3a/0x50
return_from_execve+0x0/0x23
} hitcount: 33348 bytes_req: 17152128 bytes_alloc: 20226048
{ stacktrace:
{ common_stacktrace:
kmem_cache_alloc_trace+0xeb/0x150
apparmor_file_alloc_security+0x27/0x40
security_file_alloc+0x16/0x20
@ -636,7 +636,7 @@ Extended error information
SyS_open+0x1e/0x20
system_call_fastpath+0x12/0x6a
} hitcount: 4766422 bytes_req: 9532844 bytes_alloc: 38131376
{ stacktrace:
{ common_stacktrace:
__kmalloc+0x11b/0x1b0
seq_buf_alloc+0x1b/0x50
seq_read+0x2cc/0x370
@ -1026,7 +1026,7 @@ Extended error information
First we set up an initially paused stacktrace trigger on the
netif_receive_skb event::
# echo 'hist:key=stacktrace:vals=len:pause' > \
# echo 'hist:key=common_stacktrace:vals=len:pause' > \
/sys/kernel/tracing/events/net/netif_receive_skb/trigger
Next, we set up an 'enable_hist' trigger on the sched_process_exec
@ -1060,9 +1060,9 @@ Extended error information
$ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
# cat /sys/kernel/tracing/events/net/netif_receive_skb/hist
# trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
# trigger info: hist:keys=common_stacktrace:vals=len:sort=hitcount:size=2048 [paused]
{ stacktrace:
{ common_stacktrace:
__netif_receive_skb_core+0x46d/0x990
__netif_receive_skb+0x18/0x60
netif_receive_skb_internal+0x23/0x90
@ -1079,7 +1079,7 @@ Extended error information
kthread+0xd2/0xf0
ret_from_fork+0x42/0x70
} hitcount: 85 len: 28884
{ stacktrace:
{ common_stacktrace:
__netif_receive_skb_core+0x46d/0x990
__netif_receive_skb+0x18/0x60
netif_receive_skb_internal+0x23/0x90
@ -1097,7 +1097,7 @@ Extended error information
irq_thread+0x11f/0x150
kthread+0xd2/0xf0
} hitcount: 98 len: 664329
{ stacktrace:
{ common_stacktrace:
__netif_receive_skb_core+0x46d/0x990
__netif_receive_skb+0x18/0x60
process_backlog+0xa8/0x150
@ -1115,7 +1115,7 @@ Extended error information
inet_sendmsg+0x64/0xa0
sock_sendmsg+0x3d/0x50
} hitcount: 115 len: 13030
{ stacktrace:
{ common_stacktrace:
__netif_receive_skb_core+0x46d/0x990
__netif_receive_skb+0x18/0x60
netif_receive_skb_internal+0x23/0x90
@ -1142,14 +1142,14 @@ Extended error information
into the histogram. In order to avoid having to set everything up
again, we can just clear the histogram first::
# echo 'hist:key=stacktrace:vals=len:clear' >> \
# echo 'hist:key=common_stacktrace:vals=len:clear' >> \
/sys/kernel/tracing/events/net/netif_receive_skb/trigger
Just to verify that it is in fact cleared, here's what we now see in
the hist file::
# cat /sys/kernel/tracing/events/net/netif_receive_skb/hist
# trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
# trigger info: hist:keys=common_stacktrace:vals=len:sort=hitcount:size=2048 [paused]
Totals:
Hits: 0
@ -1485,12 +1485,12 @@ Extended error information
And here's an example that shows how to combine histogram data from
any two events even if they don't share any 'compatible' fields
other than 'hitcount' and 'stacktrace'. These commands create a
other than 'hitcount' and 'common_stacktrace'. These commands create a
couple of triggers named 'bar' using those fields::
# echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
# echo 'hist:name=bar:key=common_stacktrace:val=hitcount' > \
/sys/kernel/tracing/events/sched/sched_process_fork/trigger
# echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
# echo 'hist:name=bar:key=common_stacktrace:val=hitcount' > \
/sys/kernel/tracing/events/net/netif_rx/trigger
And displaying the output of either shows some interesting if
@ -1501,16 +1501,16 @@ Extended error information
# event histogram
#
# trigger info: hist:name=bar:keys=stacktrace:vals=hitcount:sort=hitcount:size=2048 [active]
# trigger info: hist:name=bar:keys=common_stacktrace:vals=hitcount:sort=hitcount:size=2048 [active]
#
{ stacktrace:
{ common_stacktrace:
kernel_clone+0x18e/0x330
kernel_thread+0x29/0x30
kthreadd+0x154/0x1b0
ret_from_fork+0x3f/0x70
} hitcount: 1
{ stacktrace:
{ common_stacktrace:
netif_rx_internal+0xb2/0xd0
netif_rx_ni+0x20/0x70
dev_loopback_xmit+0xaa/0xd0
@ -1528,7 +1528,7 @@ Extended error information
call_cpuidle+0x3b/0x60
cpu_startup_entry+0x22d/0x310
} hitcount: 1
{ stacktrace:
{ common_stacktrace:
netif_rx_internal+0xb2/0xd0
netif_rx_ni+0x20/0x70
dev_loopback_xmit+0xaa/0xd0
@ -1543,7 +1543,7 @@ Extended error information
SyS_sendto+0xe/0x10
entry_SYSCALL_64_fastpath+0x12/0x6a
} hitcount: 2
{ stacktrace:
{ common_stacktrace:
netif_rx_internal+0xb2/0xd0
netif_rx+0x1c/0x60
loopback_xmit+0x6c/0xb0
@ -1561,7 +1561,7 @@ Extended error information
sock_sendmsg+0x38/0x50
___sys_sendmsg+0x14e/0x270
} hitcount: 76
{ stacktrace:
{ common_stacktrace:
netif_rx_internal+0xb2/0xd0
netif_rx+0x1c/0x60
loopback_xmit+0x6c/0xb0
@ -1579,7 +1579,7 @@ Extended error information
sock_sendmsg+0x38/0x50
___sys_sendmsg+0x269/0x270
} hitcount: 77
{ stacktrace:
{ common_stacktrace:
netif_rx_internal+0xb2/0xd0
netif_rx+0x1c/0x60
loopback_xmit+0x6c/0xb0
@ -1597,7 +1597,7 @@ Extended error information
sock_sendmsg+0x38/0x50
SYSC_sendto+0xef/0x170
} hitcount: 88
{ stacktrace:
{ common_stacktrace:
kernel_clone+0x18e/0x330
SyS_clone+0x19/0x20
entry_SYSCALL_64_fastpath+0x12/0x6a
@ -1949,7 +1949,7 @@ uninterruptible state::
# cd /sys/kernel/tracing
# echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
# echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 2' >> events/sched/sched_switch/trigger
# echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=common_stacktrace if prev_state == 2' >> events/sched/sched_switch/trigger
# echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
# echo 1 > events/synthetic/block_lat/enable
# cat trace

View File

@ -806,6 +806,7 @@ enum {
FILTER_TRACE_FN,
FILTER_COMM,
FILTER_CPU,
FILTER_STACKTRACE,
};
extern int trace_event_raw_init(struct trace_event_call *call);

View File

@ -17,9 +17,10 @@
#ifdef CONFIG_USER_EVENTS
struct user_event_mm {
struct list_head link;
struct list_head mms_link;
struct list_head enablers;
struct mm_struct *mm;
/* Used for one-shot lists, protected by event_mutex */
struct user_event_mm *next;
refcount_t refcnt;
refcount_t tasks;

View File

@ -60,6 +60,7 @@
*/
bool ring_buffer_expanded;
#ifdef CONFIG_FTRACE_STARTUP_TEST
/*
* We need to change this state when a selftest is running.
* A selftest will lurk into the ring-buffer to count the
@ -75,7 +76,6 @@ static bool __read_mostly tracing_selftest_running;
*/
bool __read_mostly tracing_selftest_disabled;
#ifdef CONFIG_FTRACE_STARTUP_TEST
void __init disable_tracing_selftest(const char *reason)
{
if (!tracing_selftest_disabled) {
@ -83,6 +83,9 @@ void __init disable_tracing_selftest(const char *reason)
pr_info("Ftrace startup test is disabled due to %s\n", reason);
}
}
#else
#define tracing_selftest_running 0
#define tracing_selftest_disabled 0
#endif
/* Pipe tracepoints to printk */
@ -1051,7 +1054,10 @@ int __trace_array_puts(struct trace_array *tr, unsigned long ip,
if (!(tr->trace_flags & TRACE_ITER_PRINTK))
return 0;
if (unlikely(tracing_selftest_running || tracing_disabled))
if (unlikely(tracing_selftest_running && tr == &global_trace))
return 0;
if (unlikely(tracing_disabled))
return 0;
alloc = sizeof(*entry) + size + 2; /* possible \n added */
@ -2041,6 +2047,24 @@ static int run_tracer_selftest(struct tracer *type)
return 0;
}
static int do_run_tracer_selftest(struct tracer *type)
{
int ret;
/*
* Tests can take a long time, especially if they are run one after the
* other, as does happen during bootup when all the tracers are
* registered. This could cause the soft lockup watchdog to trigger.
*/
cond_resched();
tracing_selftest_running = true;
ret = run_tracer_selftest(type);
tracing_selftest_running = false;
return ret;
}
static __init int init_trace_selftests(void)
{
struct trace_selftests *p, *n;
@ -2092,6 +2116,10 @@ static inline int run_tracer_selftest(struct tracer *type)
{
return 0;
}
static inline int do_run_tracer_selftest(struct tracer *type)
{
return 0;
}
#endif /* CONFIG_FTRACE_STARTUP_TEST */
static void add_tracer_options(struct trace_array *tr, struct tracer *t);
@ -2127,8 +2155,6 @@ int __init register_tracer(struct tracer *type)
mutex_lock(&trace_types_lock);
tracing_selftest_running = true;
for (t = trace_types; t; t = t->next) {
if (strcmp(type->name, t->name) == 0) {
/* already found */
@ -2157,7 +2183,7 @@ int __init register_tracer(struct tracer *type)
/* store the tracer for __set_tracer_option */
type->flags->trace = type;
ret = run_tracer_selftest(type);
ret = do_run_tracer_selftest(type);
if (ret < 0)
goto out;
@ -2166,7 +2192,6 @@ int __init register_tracer(struct tracer *type)
add_tracer_options(&global_trace, type);
out:
tracing_selftest_running = false;
mutex_unlock(&trace_types_lock);
if (ret || !default_bootup_tracer)
@ -3490,7 +3515,7 @@ __trace_array_vprintk(struct trace_buffer *buffer,
unsigned int trace_ctx;
char *tbuffer;
if (tracing_disabled || tracing_selftest_running)
if (tracing_disabled)
return 0;
/* Don't pollute graph traces with trace_vprintk internals */
@ -3538,6 +3563,9 @@ __printf(3, 0)
int trace_array_vprintk(struct trace_array *tr,
unsigned long ip, const char *fmt, va_list args)
{
if (tracing_selftest_running && tr == &global_trace)
return 0;
return __trace_array_vprintk(tr->array_buffer.buffer, ip, fmt, args);
}
@ -5752,7 +5780,7 @@ static const char readme_msg[] =
"\t table using the key(s) and value(s) named, and the value of a\n"
"\t sum called 'hitcount' is incremented. Keys and values\n"
"\t correspond to fields in the event's format description. Keys\n"
"\t can be any field, or the special string 'stacktrace'.\n"
"\t can be any field, or the special string 'common_stacktrace'.\n"
"\t Compound keys consisting of up to two fields can be specified\n"
"\t by the 'keys' keyword. Values must correspond to numeric\n"
"\t fields. Sort keys consisting of up to two fields can be\n"

View File

@ -194,6 +194,8 @@ static int trace_define_generic_fields(void)
__generic_field(int, common_cpu, FILTER_CPU);
__generic_field(char *, COMM, FILTER_COMM);
__generic_field(char *, comm, FILTER_COMM);
__generic_field(char *, stacktrace, FILTER_STACKTRACE);
__generic_field(char *, STACKTRACE, FILTER_STACKTRACE);
return ret;
}

View File

@ -1364,7 +1364,7 @@ static const char *hist_field_name(struct hist_field *field,
if (field->field)
field_name = field->field->name;
else
field_name = "stacktrace";
field_name = "common_stacktrace";
} else if (field->flags & HIST_FIELD_FL_HITCOUNT)
field_name = "hitcount";
@ -2367,7 +2367,7 @@ parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
hist_data->enable_timestamps = true;
if (*flags & HIST_FIELD_FL_TIMESTAMP_USECS)
hist_data->attrs->ts_in_usecs = true;
} else if (strcmp(field_name, "stacktrace") == 0) {
} else if (strcmp(field_name, "common_stacktrace") == 0) {
*flags |= HIST_FIELD_FL_STACKTRACE;
} else if (strcmp(field_name, "common_cpu") == 0)
*flags |= HIST_FIELD_FL_CPU;
@ -2378,11 +2378,15 @@ parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
if (!field || !field->size) {
/*
* For backward compatibility, if field_name
* was "cpu", then we treat this the same as
* common_cpu. This also works for "CPU".
* was "cpu" or "stacktrace", then we treat this
* the same as common_cpu and common_stacktrace
* respectively. This also works for "CPU", and
* "STACKTRACE".
*/
if (field && field->filter_type == FILTER_CPU) {
*flags |= HIST_FIELD_FL_CPU;
} else if (field && field->filter_type == FILTER_STACKTRACE) {
*flags |= HIST_FIELD_FL_STACKTRACE;
} else {
hist_err(tr, HIST_ERR_FIELD_NOT_FOUND,
errpos(field_name));
@ -4238,13 +4242,19 @@ static int __create_val_field(struct hist_trigger_data *hist_data,
goto out;
}
/* Some types cannot be a value */
if (hist_field->flags & (HIST_FIELD_FL_GRAPH | HIST_FIELD_FL_PERCENT |
HIST_FIELD_FL_BUCKET | HIST_FIELD_FL_LOG2 |
HIST_FIELD_FL_SYM | HIST_FIELD_FL_SYM_OFFSET |
HIST_FIELD_FL_SYSCALL | HIST_FIELD_FL_STACKTRACE)) {
hist_err(file->tr, HIST_ERR_BAD_FIELD_MODIFIER, errpos(field_str));
ret = -EINVAL;
/* values and variables should not have some modifiers */
if (hist_field->flags & HIST_FIELD_FL_VAR) {
/* Variable */
if (hist_field->flags & (HIST_FIELD_FL_GRAPH | HIST_FIELD_FL_PERCENT |
HIST_FIELD_FL_BUCKET | HIST_FIELD_FL_LOG2))
goto err;
} else {
/* Value */
if (hist_field->flags & (HIST_FIELD_FL_GRAPH | HIST_FIELD_FL_PERCENT |
HIST_FIELD_FL_BUCKET | HIST_FIELD_FL_LOG2 |
HIST_FIELD_FL_SYM | HIST_FIELD_FL_SYM_OFFSET |
HIST_FIELD_FL_SYSCALL | HIST_FIELD_FL_STACKTRACE))
goto err;
}
hist_data->fields[val_idx] = hist_field;
@ -4256,6 +4266,9 @@ static int __create_val_field(struct hist_trigger_data *hist_data,
ret = -EINVAL;
out:
return ret;
err:
hist_err(file->tr, HIST_ERR_BAD_FIELD_MODIFIER, errpos(field_str));
return -EINVAL;
}
static int create_val_field(struct hist_trigger_data *hist_data,
@ -5385,7 +5398,7 @@ static void hist_trigger_print_key(struct seq_file *m,
if (key_field->field)
seq_printf(m, "%s.stacktrace", key_field->field->name);
else
seq_puts(m, "stacktrace:\n");
seq_puts(m, "common_stacktrace:\n");
hist_trigger_stacktrace_print(m,
key + key_field->offset,
HIST_STACKTRACE_DEPTH);
@ -5968,7 +5981,7 @@ static int event_hist_trigger_print(struct seq_file *m,
if (field->field)
seq_printf(m, "%s.stacktrace", field->field->name);
else
seq_puts(m, "stacktrace");
seq_puts(m, "common_stacktrace");
} else
hist_field_print(m, field);
}

View File

@ -96,12 +96,12 @@ struct user_event {
* these to track enablement sites that are tied to an event.
*/
struct user_event_enabler {
struct list_head link;
struct list_head mm_enablers_link;
struct user_event *event;
unsigned long addr;
/* Track enable bit, flags, etc. Aligned for bitops. */
unsigned int values;
unsigned long values;
};
/* Bits 0-5 are for the bit to update upon enable/disable (0-63 allowed) */
@ -116,7 +116,9 @@ struct user_event_enabler {
/* Only duplicate the bit value */
#define ENABLE_VAL_DUP_MASK ENABLE_VAL_BIT_MASK
#define ENABLE_BITOPS(e) ((unsigned long *)&(e)->values)
#define ENABLE_BITOPS(e) (&(e)->values)
#define ENABLE_BIT(e) ((int)((e)->values & ENABLE_VAL_BIT_MASK))
/* Used for asynchronous faulting in of pages */
struct user_event_enabler_fault {
@ -153,7 +155,7 @@ struct user_event_file_info {
#define VALIDATOR_REL (1 << 1)
struct user_event_validator {
struct list_head link;
struct list_head user_event_link;
int offset;
int flags;
};
@ -259,7 +261,7 @@ static struct user_event_group
static void user_event_enabler_destroy(struct user_event_enabler *enabler)
{
list_del_rcu(&enabler->link);
list_del_rcu(&enabler->mm_enablers_link);
/* No longer tracking the event via the enabler */
refcount_dec(&enabler->event->refcnt);
@ -423,9 +425,9 @@ static int user_event_enabler_write(struct user_event_mm *mm,
/* Update bit atomically, user tracers must be atomic as well */
if (enabler->event && enabler->event->status)
set_bit(enabler->values & ENABLE_VAL_BIT_MASK, ptr);
set_bit(ENABLE_BIT(enabler), ptr);
else
clear_bit(enabler->values & ENABLE_VAL_BIT_MASK, ptr);
clear_bit(ENABLE_BIT(enabler), ptr);
kunmap_local(kaddr);
unpin_user_pages_dirty_lock(&page, 1, true);
@ -437,11 +439,9 @@ static bool user_event_enabler_exists(struct user_event_mm *mm,
unsigned long uaddr, unsigned char bit)
{
struct user_event_enabler *enabler;
struct user_event_enabler *next;
list_for_each_entry_safe(enabler, next, &mm->enablers, link) {
if (enabler->addr == uaddr &&
(enabler->values & ENABLE_VAL_BIT_MASK) == bit)
list_for_each_entry(enabler, &mm->enablers, mm_enablers_link) {
if (enabler->addr == uaddr && ENABLE_BIT(enabler) == bit)
return true;
}
@ -451,23 +451,36 @@ static bool user_event_enabler_exists(struct user_event_mm *mm,
static void user_event_enabler_update(struct user_event *user)
{
struct user_event_enabler *enabler;
struct user_event_mm *mm = user_event_mm_get_all(user);
struct user_event_mm *next;
struct user_event_mm *mm;
int attempt;
lockdep_assert_held(&event_mutex);
/*
* We need to build a one-shot list of all the mms that have an
* enabler for the user_event passed in. This list is only valid
* while holding the event_mutex. The only reason for this is due
* to the global mm list being RCU protected and we use methods
* which can wait (mmap_read_lock and pin_user_pages_remote).
*
* NOTE: user_event_mm_get_all() increments the ref count of each
* mm that is added to the list to prevent removal timing windows.
* We must always put each mm after they are used, which may wait.
*/
mm = user_event_mm_get_all(user);
while (mm) {
next = mm->next;
mmap_read_lock(mm->mm);
rcu_read_lock();
list_for_each_entry_rcu(enabler, &mm->enablers, link) {
list_for_each_entry(enabler, &mm->enablers, mm_enablers_link) {
if (enabler->event == user) {
attempt = 0;
user_event_enabler_write(mm, enabler, true, &attempt);
}
}
rcu_read_unlock();
mmap_read_unlock(mm->mm);
user_event_mm_put(mm);
mm = next;
@ -495,7 +508,9 @@ static bool user_event_enabler_dup(struct user_event_enabler *orig,
enabler->values = orig->values & ENABLE_VAL_DUP_MASK;
refcount_inc(&enabler->event->refcnt);
list_add_rcu(&enabler->link, &mm->enablers);
/* Enablers not exposed yet, RCU not required */
list_add(&enabler->mm_enablers_link, &mm->enablers);
return true;
}
@ -513,6 +528,14 @@ static struct user_event_mm *user_event_mm_get_all(struct user_event *user)
struct user_event_enabler *enabler;
struct user_event_mm *mm;
/*
* We use the mm->next field to build a one-shot list from the global
* RCU protected list. To build this list the event_mutex must be held.
* This lets us build a list without requiring allocs that could fail
* when user based events are most wanted for diagnostics.
*/
lockdep_assert_held(&event_mutex);
/*
* We do not want to block fork/exec while enablements are being
* updated, so we use RCU to walk the current tasks that have used
@ -525,23 +548,24 @@ static struct user_event_mm *user_event_mm_get_all(struct user_event *user)
*/
rcu_read_lock();
list_for_each_entry_rcu(mm, &user_event_mms, link)
list_for_each_entry_rcu(enabler, &mm->enablers, link)
list_for_each_entry_rcu(mm, &user_event_mms, mms_link) {
list_for_each_entry_rcu(enabler, &mm->enablers, mm_enablers_link) {
if (enabler->event == user) {
mm->next = found;
found = user_event_mm_get(mm);
break;
}
}
}
rcu_read_unlock();
return found;
}
static struct user_event_mm *user_event_mm_create(struct task_struct *t)
static struct user_event_mm *user_event_mm_alloc(struct task_struct *t)
{
struct user_event_mm *user_mm;
unsigned long flags;
user_mm = kzalloc(sizeof(*user_mm), GFP_KERNEL_ACCOUNT);
@ -553,12 +577,6 @@ static struct user_event_mm *user_event_mm_create(struct task_struct *t)
refcount_set(&user_mm->refcnt, 1);
refcount_set(&user_mm->tasks, 1);
spin_lock_irqsave(&user_event_mms_lock, flags);
list_add_rcu(&user_mm->link, &user_event_mms);
spin_unlock_irqrestore(&user_event_mms_lock, flags);
t->user_event_mm = user_mm;
/*
* The lifetime of the memory descriptor can slightly outlast
* the task lifetime if a ref to the user_event_mm is taken
@ -572,6 +590,17 @@ static struct user_event_mm *user_event_mm_create(struct task_struct *t)
return user_mm;
}
static void user_event_mm_attach(struct user_event_mm *user_mm, struct task_struct *t)
{
unsigned long flags;
spin_lock_irqsave(&user_event_mms_lock, flags);
list_add_rcu(&user_mm->mms_link, &user_event_mms);
spin_unlock_irqrestore(&user_event_mms_lock, flags);
t->user_event_mm = user_mm;
}
static struct user_event_mm *current_user_event_mm(void)
{
struct user_event_mm *user_mm = current->user_event_mm;
@ -579,10 +608,12 @@ static struct user_event_mm *current_user_event_mm(void)
if (user_mm)
goto inc;
user_mm = user_event_mm_create(current);
user_mm = user_event_mm_alloc(current);
if (!user_mm)
goto error;
user_event_mm_attach(user_mm, current);
inc:
refcount_inc(&user_mm->refcnt);
error:
@ -593,7 +624,7 @@ static void user_event_mm_destroy(struct user_event_mm *mm)
{
struct user_event_enabler *enabler, *next;
list_for_each_entry_safe(enabler, next, &mm->enablers, link)
list_for_each_entry_safe(enabler, next, &mm->enablers, mm_enablers_link)
user_event_enabler_destroy(enabler);
mmdrop(mm->mm);
@ -630,7 +661,7 @@ void user_event_mm_remove(struct task_struct *t)
/* Remove the mm from the list, so it can no longer be enabled */
spin_lock_irqsave(&user_event_mms_lock, flags);
list_del_rcu(&mm->link);
list_del_rcu(&mm->mms_link);
spin_unlock_irqrestore(&user_event_mms_lock, flags);
/*
@ -670,7 +701,7 @@ void user_event_mm_remove(struct task_struct *t)
void user_event_mm_dup(struct task_struct *t, struct user_event_mm *old_mm)
{
struct user_event_mm *mm = user_event_mm_create(t);
struct user_event_mm *mm = user_event_mm_alloc(t);
struct user_event_enabler *enabler;
if (!mm)
@ -678,16 +709,18 @@ void user_event_mm_dup(struct task_struct *t, struct user_event_mm *old_mm)
rcu_read_lock();
list_for_each_entry_rcu(enabler, &old_mm->enablers, link)
list_for_each_entry_rcu(enabler, &old_mm->enablers, mm_enablers_link) {
if (!user_event_enabler_dup(enabler, mm))
goto error;
}
rcu_read_unlock();
user_event_mm_attach(mm, t);
return;
error:
rcu_read_unlock();
user_event_mm_remove(t);
user_event_mm_destroy(mm);
}
static bool current_user_event_enabler_exists(unsigned long uaddr,
@ -748,7 +781,7 @@ static struct user_event_enabler
*/
if (!*write_result) {
refcount_inc(&enabler->event->refcnt);
list_add_rcu(&enabler->link, &user_mm->enablers);
list_add_rcu(&enabler->mm_enablers_link, &user_mm->enablers);
}
mutex_unlock(&event_mutex);
@ -904,8 +937,8 @@ static void user_event_destroy_validators(struct user_event *user)
struct user_event_validator *validator, *next;
struct list_head *head = &user->validators;
list_for_each_entry_safe(validator, next, head, link) {
list_del(&validator->link);
list_for_each_entry_safe(validator, next, head, user_event_link) {
list_del(&validator->user_event_link);
kfree(validator);
}
}
@ -959,7 +992,7 @@ static int user_event_add_field(struct user_event *user, const char *type,
validator->offset = offset;
/* Want sequential access when validating */
list_add_tail(&validator->link, &user->validators);
list_add_tail(&validator->user_event_link, &user->validators);
add_field:
field->type = type;
@ -1349,7 +1382,7 @@ static int user_event_validate(struct user_event *user, void *data, int len)
void *pos, *end = data + len;
u32 loc, offset, size;
list_for_each_entry(validator, head, link) {
list_for_each_entry(validator, head, user_event_link) {
pos = data + validator->offset;
/* Already done min_size check, no bounds check here */
@ -2270,9 +2303,9 @@ static long user_events_ioctl_unreg(unsigned long uarg)
*/
mutex_lock(&event_mutex);
list_for_each_entry_safe(enabler, next, &mm->enablers, link)
list_for_each_entry_safe(enabler, next, &mm->enablers, mm_enablers_link) {
if (enabler->addr == reg.disable_addr &&
(enabler->values & ENABLE_VAL_BIT_MASK) == reg.disable_bit) {
ENABLE_BIT(enabler) == reg.disable_bit) {
set_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler));
if (!test_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler)))
@ -2281,6 +2314,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
/* Removed at least one */
ret = 0;
}
}
mutex_unlock(&event_mutex);

View File

@ -1652,6 +1652,8 @@ static enum hrtimer_restart timerlat_irq(struct hrtimer *timer)
osnoise_stop_tracing();
notify_new_max_latency(diff);
wake_up_process(tlat->kthread);
return HRTIMER_NORESTART;
}
}

View File

@ -848,6 +848,12 @@ trace_selftest_startup_function_graph(struct tracer *trace,
}
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
/*
* These tests can take some time to run. Make sure on non PREEMPT
* kernels, we do not trigger the softlockup detector.
*/
cond_resched();
tracing_reset_online_cpus(&tr->array_buffer);
set_graph_array(tr);
@ -869,6 +875,8 @@ trace_selftest_startup_function_graph(struct tracer *trace,
if (ret)
goto out;
cond_resched();
ret = register_ftrace_graph(&fgraph_ops);
if (ret) {
warn_failed_init_tracer(trace, ret);
@ -891,6 +899,8 @@ trace_selftest_startup_function_graph(struct tracer *trace,
if (ret)
goto out;
cond_resched();
tracing_start();
if (!ret && !count) {

View File

@ -0,0 +1,24 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# description: event trigger - test inter-event histogram trigger trace action with dynamic string param (legacy stack)
# requires: set_event synthetic_events events/sched/sched_process_exec/hist "long[] stack' >> synthetic_events":README
fail() { #msg
echo $1
exit_fail
}
echo "Test create synthetic event with stack"
# Test the old stacktrace keyword (for backward compatibility)
echo 's:wake_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 1||prev_state == 2' >> events/sched/sched_switch/trigger
echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(wake_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
echo 1 > events/synthetic/wake_lat/enable
sleep 1
if ! grep -q "=>.*sched" trace; then
fail "Failed to create synthetic event with stack"
fi
exit 0

View File

@ -1,7 +1,7 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# description: event trigger - test inter-event histogram trigger trace action with dynamic string param
# requires: set_event synthetic_events events/sched/sched_process_exec/hist "long[]' >> synthetic_events":README
# requires: set_event synthetic_events events/sched/sched_process_exec/hist "can be any field, or the special string 'common_stacktrace'":README
fail() { #msg
echo $1
@ -10,9 +10,8 @@ fail() { #msg
echo "Test create synthetic event with stack"
echo 's:wake_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 1||prev_state == 2' >> events/sched/sched_switch/trigger
echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=common_stacktrace if prev_state == 1||prev_state == 2' >> events/sched/sched_switch/trigger
echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(wake_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
echo 1 > events/synthetic/wake_lat/enable
sleep 1