License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied to
a file was done in a spreadsheet of side-by-side results of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
should be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source.
- Files that already had some variant of a license header in them were
included (even if <5 lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, the file was
considered to have no license information in it, and the top-level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was in a */uapi/* path, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or if it had no licensing in
it (per the prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                        270
GPL-2.0+ WITH Linux-syscall-note                       169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)     21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)     17
LGPL-2.1+ WITH Linux-syscall-note                       15
GPL-1.0+ WITH Linux-syscall-note                        14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)     5
LGPL-2.0+ WITH Linux-syscall-note                        4
LGPL-2.1 WITH Linux-syscall-note                         3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)               3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)              1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifiers in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally, Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version earlier this week, with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally Greg ran the script using the .csv files to
generate the patches.
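The default rule for files with no license information, combined with the per-file-type comment style, can be sketched in C as below. This is a minimal sketch, assuming only the `*/uapi/*` path test from the heuristics above and the kernel's comment convention (a `//` line in .c sources, as on the first line of builtin-record.c, and a `/* */` block in headers). `spdx_default_id()` and `spdx_tag_line()` are hypothetical helpers, not the script Thomas and Greg used. Other file types such as Makefiles and Kconfig are out of scope.

```c
#include <stdio.h>
#include <string.h>

/* Heuristic from above: uapi files get the syscall note, all else GPL-2.0. */
static const char *spdx_default_id(const char *path)
{
	return strstr(path, "/uapi/") ? "GPL-2.0 WITH Linux-syscall-note"
				      : "GPL-2.0";
}

static int ends_with(const char *s, const char *suffix)
{
	size_t ls = strlen(s), lx = strlen(suffix);

	return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/*
 * Build the tag line for a file that had no license information.
 * Headers take a block comment, .c sources a C99 line comment.
 * Returns snprintf()'s result (the length the full line needs).
 */
static int spdx_tag_line(const char *path, char *buf, size_t len)
{
	const char *id = spdx_default_id(path);

	if (ends_with(path, ".h"))
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
}
```

For tools/perf/builtin-record.c this yields the `// SPDX-License-Identifier: GPL-2.0` line that opens the file.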
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 14:07:57 +00:00
// SPDX-License-Identifier: GPL-2.0
2009-06-02 20:59:57 +00:00
/*
2009-06-02 21:37:05 +00:00
* builtin-record.c
*
* Builtin record command: Record the profile of a workload
* (or a CPU, or a PID) into the perf.data output file - for
* later analysis via perf report.
2009-06-02 20:59:57 +00:00
*/
2009-05-27 07:10:38 +00:00
#include "builtin.h"
2009-06-02 21:37:05 +00:00

2010-02-03 18:52:05 +00:00
#include "util/build-id.h"
2015-12-15 15:39:39 +00:00
#include <subcmd/parse-options.h>
2022-09-01 19:57:37 +00:00
#include <internal/xyarray.h>
2009-05-26 09:10:09 +00:00
#include "util/parse-events.h"
2016-06-23 08:55:17 +00:00
#include "util/config.h"
2009-05-01 16:29:57 +00:00

2014-10-09 19:12:24 +00:00
#include "util/callchain.h"
2014-10-17 15:17:40 +00:00
#include "util/cgroup.h"
2009-06-25 15:05:54 +00:00
#include "util/header.h"
2009-08-12 09:07:25 +00:00
#include "util/event.h"
2011-01-11 22:56:53 +00:00
#include "util/evlist.h"
2011-01-03 18:39:04 +00:00
#include "util/evsel.h"
2009-08-16 20:05:48 +00:00
#include "util/debug.h"
2019-09-23 15:20:38 +00:00
#include "util/mmap.h"
2022-08-26 16:42:31 +00:00
#include "util/mutex.h"
2019-08-22 18:40:29 +00:00
#include "util/target.h"
2009-12-11 23:24:02 +00:00
#include "util/session.h"
2011-11-28 10:30:20 +00:00
#include "util/tool.h"
perf symbols: Use the buildids if present
With this change 'perf record' will intercept PERF_RECORD_MMAP
calls, creating a linked list of DSOs, then when the session
finishes, it will traverse this list and read the buildids,
stashing them at the end of the file and will set up a new
feature bit in the header bitmask.
'perf report' will then notice this feature and populate the
'dsos' list and set the build ids.
When reading the symtabs it will refuse to load from a file that
doesn't have the same build id. This improves the
reliability of the profiler output, as symbols and profiling
data are much more likely to match.
Example:
[root@doppio ~]# perf report | head
/home/acme/bin/perf with build id b1ea544ac3746e7538972548a09aadecc5753868 not found, continuing without symbols
# Samples: 2621434559
#
# Overhead Command Shared Object Symbol
# ........ ............... ............................. ......
#
7.91% init [kernel] [k] read_hpet
7.64% init [kernel] [k] mwait_idle_with_hints
7.60% swapper [kernel] [k] read_hpet
7.60% swapper [kernel] [k] mwait_idle_with_hints
3.65% init [kernel] [k] 0xffffffffa02339d9
[root@doppio ~]#
In this case the 'perf' binary was an older one that has since vanished,
so its symbols probably wouldn't match or would cause subtly
different (and misleading) output.
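The refusal to load a mismatched symtab comes down to a byte comparison of build IDs. Below is a minimal sketch, assuming 20-byte (SHA-1 sized) IDs like the 40-hex-digit one printed above. `build_id_to_hex()` and `build_id_matches()` are hypothetical names, not perf's dso code, and reading the ID out of the ELF note is not shown.

```c
#include <stdio.h>
#include <string.h>

#define BUILD_ID_SIZE 20 /* assumption: SHA-1 sized, 40 hex digits as above */

/* Hex-encode a build ID; out must hold 2 * len + 1 bytes. */
static void build_id_to_hex(const unsigned char *id, size_t len, char *out)
{
	for (size_t i = 0; i < len; i++)
		sprintf(out + 2 * i, "%02x", id[i]);
	out[2 * len] = '\0';
}

/* Symbols may be loaded only if the file's ID equals the recorded one. */
static int build_id_matches(const unsigned char *recorded,
			    const unsigned char *on_disk)
{
	return memcmp(recorded, on_disk, BUILD_ID_SIZE) == 0;
}
```

A loader built on this would print a "with build id ... not found, continuing without symbols" style message when `build_id_matches()` fails, then skip that file's symbols.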
Next patches will support the kernel as well, reading the build
id notes for it and the modules from /sys.
Another patch should also introduce a new plumbing command:
'perf list-buildids'
that will then be used in distro-specific porcelain to
fetch -debuginfo packages where such buildids are present. This
will in turn allow one to run 'perf record' on one machine
and 'perf report' on another.
Future work on having the buildid sent directly from the kernel
in the PERF_RECORD_MMAP event is needed to close races, as the
DSO can be changed during a 'perf record' session, but this
patch at least helps with non-corner cases and current/older
kernels.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1257367843-26224-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-04 20:50:43 +00:00
#include "util/symbol.h"
2019-08-22 18:40:29 +00:00
#include "util/record.h"
perf tools: Fix sparse CPU numbering related bugs
At present, the perf subcommands that do system-wide monitoring
(perf stat, perf record and perf top) don't work properly unless
the online cpus are numbered 0, 1, ..., N-1. These tools ask
for the number of online cpus with sysconf(_SC_NPROCESSORS_ONLN)
and then try to create events for cpus 0, 1, ..., N-1.
This creates problems for systems where the online cpus are
numbered sparsely. For example, a POWER6 system in
single-threaded mode (i.e. only running 1 hardware thread per
core) will have only even-numbered cpus online.
This fixes the problem by reading the /sys/devices/system/cpu/online
file to find out which cpus are online. The code that does that is in
tools/perf/util/cpumap.[ch], and consists of a read_cpu_map()
function that sets up a cpumap[] array and returns the number of
online cpus. If /sys/devices/system/cpu/online can't be read or
can't be parsed successfully, it falls back to using sysconf to
ask how many cpus are online and sets up an identity map in cpumap[].
The perf record, perf stat and perf top code then calls
read_cpu_map() in the system-wide monitoring case (instead of
sysconf) and uses cpumap[] to get the cpu numbers to pass to
perf_event_open.
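The parse-then-fall-back logic described above can be sketched as follows. This assumes the sysfs file holds one line of comma-separated CPUs and ranges, for example "0,2,4-7" on a system with sparse numbering. `parse_cpu_list()` and `sketch_read_cpu_map()` are hypothetical names modeled on the description, not the code in tools/perf/util/cpumap.[ch].

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Parse a CPU list such as "0,2,4-7\n" into cpumap[].
 * Returns the number of CPUs stored, or -1 on a malformed list or overflow.
 */
static int parse_cpu_list(const char *s, int *cpumap, int max)
{
	int n = 0;

	while (*s && *s != '\n') {
		char *end;
		long start = strtol(s, &end, 10), stop;

		if (end == s || start < 0)
			return -1;
		stop = start;
		if (*end == '-') {		/* a range "a-b" */
			s = end + 1;
			stop = strtol(s, &end, 10);
			if (end == s || stop < start)
				return -1;
		}
		for (long cpu = start; cpu <= stop; cpu++) {
			if (n == max)
				return -1;
			cpumap[n++] = (int)cpu;
		}
		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return -1;
		s = end;
	}
	return n;
}

/* Prefer sysfs; fall back to an identity map of the sysconf() count. */
static int sketch_read_cpu_map(int *cpumap, int max)
{
	char buf[4096];
	int n = -1;
	FILE *f = fopen("/sys/devices/system/cpu/online", "r");

	if (f) {
		if (fgets(buf, sizeof(buf), f))
			n = parse_cpu_list(buf, cpumap, max);
		fclose(f);
	}
	if (n <= 0) {
		long cnt = sysconf(_SC_NPROCESSORS_ONLN);

		for (n = 0; n < cnt && n < max; n++)
			cpumap[n] = n;
	}
	return n;
}
```

On the single-threaded POWER6 example, the list might read "0,2,4,6", and the events would then be opened on exactly those CPU numbers rather than on 0..3.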
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
LKML-Reference: <20100310093609.GA3959@brick.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-10 09:36:09 +00:00
#include "util/cpumap.h"
2011-01-18 17:15:24 +00:00
#include "util/thread_map.h"
2013-10-15 14:27:32 +00:00
#include "util/data.h"
perf record: Add ability to name registers to record
This patch modifies the -I/--intr-regs option to enable passing the name
of the registers to sample on interrupt. Registers can be specified by
their symbolic names. For instance on x86, --intr-regs=ax,si.
The motivation is to reduce the size of the perf.data file and the
overhead of sampling by only collecting the registers useful to a
specific analysis. For instance, for value profiling, sampling only the
registers used to pass arguments to functions.
With no parameter, the --intr-regs still records all possible registers
based on the architecture.
To name registers, it is necessary to use the long form of the option,
i.e., --intr-regs:
$ perf record --intr-regs=si,di,r8,r9 .....
To record any possible registers:
$ perf record -I .....
$ perf report --intr-regs ...
To display the registers, one can use perf report -D.
To list the available registers:
$ perf record --intr-regs=\?
available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15
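Turning a list such as `--intr-regs=si,di,r8,r9` into a sample mask can be sketched as below. The bit index of each register here is simply its position in the "available registers" list above. That numbering is an assumption for illustration only, since the real mask uses the architecture's PERF_REG_* numbering. `parse_reg_mask()` is a hypothetical helper, and the no-argument case (record every register) is not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <string.h>
#include <strings.h>

/* Hypothetical numbering: bit i = position in the list printed above. */
static const char *const reg_names[] = {
	"AX", "BX", "CX", "DX", "SI", "DI", "BP", "SP", "IP", "FLAGS",
	"CS", "SS", "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15",
};

/* Turn "si,di,r8,r9" into a bitmask; returns 0 for an unknown name. */
static uint64_t parse_reg_mask(const char *list)
{
	const size_t nregs = sizeof(reg_names) / sizeof(reg_names[0]);
	uint64_t mask = 0;
	char buf[256], *tok, *save = NULL;

	if (strlen(list) >= sizeof(buf))
		return 0;
	strcpy(buf, list);		/* strtok_r() modifies its input */
	for (tok = strtok_r(buf, ",", &save); tok;
	     tok = strtok_r(NULL, ",", &save)) {
		size_t i;

		for (i = 0; i < nregs; i++)
			if (strcasecmp(tok, reg_names[i]) == 0) /* "si" == "SI" */
				break;
		if (i == nregs)
			return 0;
		mask |= UINT64_C(1) << i;
	}
	return mask;
}
```

Matching case-insensitively lets users type the lowercase names from the examples while the listing prints them in uppercase.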
Signed-off-by: Stephane Eranian <eranian@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1441039273-16260-4-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-31 16:41:12 +00:00
#include "util/perf_regs.h"
2015-04-09 15:53:45 +00:00
#include "util/auxtrace.h"
2016-03-08 08:38:44 +00:00
#include "util/tsc.h"
2015-05-27 17:51:51 +00:00
#include "util/parse-branch-options.h"
perf record: Add ability to name registers to record
2015-08-31 16:41:12 +00:00
#include "util/parse-regs-options.h"
2020-05-05 14:49:08 +00:00
#include "util/perf_api_probe.h"
2016-04-20 18:59:49 +00:00
#include "util/trigger.h"
2016-11-26 07:03:28 +00:00
#include "util/perf-hooks.h"
2019-01-22 17:50:57 +00:00
#include "util/cpu-set-sched.h"
2019-09-18 14:36:13 +00:00
#include "util/synthetic-events.h"
2017-04-19 19:12:39 +00:00
#include "util/time-utils.h"
2017-04-19 19:05:56 +00:00
#include "util/units.h"
perf tools: Synthesize PERF_RECORD_* for loaded BPF programs
This patch synthesizes PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT for
BPF programs loaded before perf-record. This is achieved by gathering
information about all BPF programs via sys_bpf.
Committer notes:
Fix the build on some older systems such as amazonlinux:1 where it was
breaking with:
util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog':
util/bpf-event.c:52:9: error: missing initializer for field 'type' of 'struct bpf_prog_info' [-Werror=missing-field-initializers]
struct bpf_prog_info info = {};
^
In file included from /git/linux/tools/lib/bpf/bpf.h:26:0,
from util/bpf-event.c:3:
/git/linux/tools/include/uapi/linux/bpf.h:2699:8: note: 'type' declared here
__u32 type;
^
cc1: all warnings being treated as errors
Further fix on a centos:6 system:
cc1: warnings being treated as errors
util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog':
util/bpf-event.c:50: error: 'func_info_rec_size' may be used uninitialized in this function
The compiler is wrong, but to silence it, initialize that variable to
zero.
One more fix, this time for debian:experimental-x-mips, x-mips64 and
x-mipsel:
util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog':
util/bpf-event.c:93:16: error: implicit declaration of function 'calloc' [-Werror=implicit-function-declaration]
func_infos = calloc(sub_prog_cnt, func_info_rec_size);
^~~~~~
util/bpf-event.c:93:16: error: incompatible implicit declaration of built-in function 'calloc' [-Werror]
util/bpf-event.c:93:16: note: include '<stdlib.h>' or provide a declaration of 'calloc'
Add the missing header.
Committer testing:
# perf record --bpf-event sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.021 MB perf.data (7 samples) ]
# perf report -D | grep PERF_RECORD_BPF_EVENT | nl
1 0 0x4b10 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 13
2 0 0x4c60 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 14
3 0 0x4db0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 15
4 0 0x4f00 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 16
5 0 0x5050 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 17
6 0 0x51a0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 18
7 0 0x52f0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 21
8 0 0x5440 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 22
# bpftool prog
13: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 13,14
14: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 13,14
15: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 15,16
16: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 15,16
17: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:44-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 17,18
18: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:44-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 17,18
21: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:45-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 21,22
22: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:45-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 21,22
#
# perf report -D | grep -B22 PERF_RECORD_KSYMBOL
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 ff 44 06 c0 ff ff ff ff ......8..D......
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b
. 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%.........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x49d8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc00644ff len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 48 6d 06 c0 ff ff ff ff ......8.Hm......
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17
. 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x4b28 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0066d48 len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 04 cf 03 c0 ff ff ff ff ......8.........
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b
. 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%.........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x4c78 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc003cf04 len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 96 28 04 c0 ff ff ff ff ......8..(......
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17
. 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x4dc8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0042896 len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 05 13 17 c0 ff ff ff ff ......8.........
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b
. 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%.........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x4f18 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0171305 len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 0a 8c 23 c0 ff ff ff ff ......8...#.....
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17
. 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x5068 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0238c0a len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 2a a5 a4 c0 ff ff ff ff ......8.*.......
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b
. 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%.........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x51b8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0a4a52a len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 9b c9 a4 c0 ff ff ff ff ......8.........
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17
. 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x5308 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0a4c99b len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
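The ksymbol names in the dump follow a visible pattern: `bpf_prog_` followed by the 8-byte program tag in hex. For example, the tag bytes `7b e4 9e 39 34 a1 25 ba` at offset 0x0120 give `bpf_prog_7be49e3934a125ba`. The sketch below shows that formatting and covers only the tag-only form seen in this dump. `bpf_prog_name()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

#define BPF_TAG_SIZE 8	/* the tags in the dump above are 8 bytes */

/* Format "bpf_prog_<tag>"; returns the name length, or -1 if buf is too small. */
static int bpf_prog_name(const unsigned char *tag, char *buf, size_t len)
{
	int off = snprintf(buf, len, "bpf_prog_");

	for (size_t i = 0; i < BPF_TAG_SIZE; i++) {
		if (off < 0 || (size_t)off >= len)
			return -1;
		off += snprintf(buf + off, len - off, "%02x", tag[i]);
	}
	return (off >= 0 && (size_t)off < len) ? off : -1;
}
```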
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20190117161521.1341602-8-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-17 16:15:19 +00:00
#include "util/bpf-event.h"
2020-04-22 15:50:38 +00:00
#include "util/util.h"
2020-05-05 18:29:43 +00:00
#include "util/pfm.h"
2023-05-27 07:22:03 +00:00
#include "util/pmu.h"
#include "util/pmus.h"
2020-08-05 09:34:38 +00:00
#include "util/clockid.h"
2022-05-18 22:47:21 +00:00
#include "util/off_cpu.h"
2023-03-14 23:42:31 +00:00
#include "util/bpf-filter.h"
2016-02-26 09:32:06 +00:00
#include "asm/bug.h"
2019-08-29 18:20:59 +00:00
#include "perf.h"
2022-01-17 18:34:33 +00:00
#include "cputopo.h"
2009-06-25 15:05:54 +00:00

2017-04-18 13:46:11 +00:00
#include <errno.h>
2017-04-17 18:23:08 +00:00
#include <inttypes.h>
perf record: Allow asking for the maximum allowed sample rate
Add the handy '-F max' shortcut for reading and using the
kernel.perf_event_max_sample_rate value as the user supplied
sampling frequency:
# perf record -F max sleep 1
info: Using a maximum frequency rate of 15,000 Hz
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (14 samples) ]
# sysctl kernel.perf_event_max_sample_rate
kernel.perf_event_max_sample_rate = 15000
# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
# perf record -F 10 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (4 samples) ]
# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 10, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
#
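Resolving '-F max' amounts to reading the sysctl shown above. Below is a sketch, assuming the value is read through procfs at /proc/sys/kernel/perf_event_max_sample_rate, the file behind `sysctl kernel.perf_event_max_sample_rate`. Both function names are hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse sysctl text such as "15000\n"; returns 0 if it is not a positive rate. */
static unsigned int parse_max_sample_rate(const char *text)
{
	char *end;
	unsigned long v;

	if (*text < '0' || *text > '9')
		return 0;		/* reject empty input, signs and spaces */
	errno = 0;
	v = strtoul(text, &end, 10);
	if (errno || (*end && *end != '\n') || v == 0 || v > UINT_MAX)
		return 0;
	return (unsigned int)v;
}

/* '-F max': read kernel.perf_event_max_sample_rate via procfs; 0 on failure. */
static unsigned int read_max_sample_rate(void)
{
	char buf[32];
	unsigned int rate = 0;
	FILE *f = fopen("/proc/sys/kernel/perf_event_max_sample_rate", "r");

	if (!f)
		return 0;
	if (fgets(buf, sizeof(buf), f))
		rate = parse_max_sample_rate(buf);
	fclose(f);
	return rate;
}
```

Keeping the parsing separate from the file access lets a caller reject a bad or missing value (return 0) before it ever reaches sample_freq.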
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4y0tiuws62c64gp4cf0hme0m@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-01 16:46:23 +00:00
#include <locale.h>
2017-04-19 22:06:30 +00:00
#include <poll.h>
2020-04-22 15:50:38 +00:00
#include <pthread.h>
2009-06-02 13:52:24 +00:00
#include <unistd.h>
2022-01-17 18:34:23 +00:00
#ifndef HAVE_GETTID
#include <syscall.h>
#endif
2009-04-08 13:01:31 +00:00
#include <sched.h>
2017-04-19 18:49:18 +00:00
#include <signal.h>
2020-05-13 02:20:23 +00:00
#ifdef HAVE_EVENTFD_SUPPORT
#include <sys/eventfd.h>
#endif
2010-05-18 21:29:23 +00:00
#include <sys/mman.h>
2017-04-19 22:06:30 +00:00
#include <sys/wait.h>
perf record: Put a copy of kcore into the perf.data directory
Add a new 'perf record' option '--kcore' which will put a copy of
/proc/kcore, kallsyms and modules into a perf.data directory. Note that
without the --kcore option, output goes to a file as before. The
tools' -o and -i options work with either a file name or directory name.
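Because -o and -i accept either form, a reader must first tell the two layouts apart. Below is a sketch using stat(). It assumes the directory layout from the tree listing that follows, where the events live in `data` inside the directory (as with `perf evlist -v -i perf.data/data`). The helper names are hypothetical, not perf's data.c API.

```c
#include <stdio.h>
#include <sys/stat.h>

/* 1: directory (the --kcore layout), 0: regular file, -1: missing or other. */
static int perf_data_kind(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	if (S_ISDIR(st.st_mode))
		return 1;
	return S_ISREG(st.st_mode) ? 0 : -1;
}

/* Pick the file holding the events: "<dir>/data" or the file itself. */
static int perf_data_events_path(const char *path, char *buf, size_t len)
{
	int kind = perf_data_kind(path);
	int n;

	if (kind < 0)
		return -1;
	n = kind ? snprintf(buf, len, "%s/data", path)
		 : snprintf(buf, len, "%s", path);
	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```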
Example:
$ sudo perf record --kcore uname
$ sudo tree perf.data
perf.data
├── kcore_dir
│ ├── kallsyms
│ ├── kcore
│ └── modules
└── data
$ sudo perf script -v
build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de
build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541
Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field.
Using CPUID GenuineIntel-6-8E-A
Using perf.data/kcore_dir/kcore for kernel data
Using perf.data/kcore_dir/kallsyms for symbols
perf 19058 506778.423729: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423733: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423734: 7 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423736: 117 cycles: ffffffffa2caa54a native_write_msr+0xa (vmlinux)
perf 19058 506778.423738: 2092 cycles: ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux)
perf 19058 506778.423740: 37380 cycles: ffffffffa2f121d0 perf_event_addr_filters_exec+0x0 (vmlinux)
uname 19058 506778.423751: 582673 cycles: ffffffffa303a407 propagate_protected_usage+0x147 (vmlinux)
uname 19058 506778.423892: 2241841 cycles: ffffffffa2cae0c9 unwind_next_frame.part.5+0x79 (vmlinux)
uname 19058 506778.424430: 2457397 cycles: ffffffffa3019232 check_memory_region+0x52 (vmlinux)
Committer testing:
# rm -rf perf.data*
# perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
# ls -l perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data
# perf record --kcore uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
# ls -lad perf.data*
drwx------. 3 root root 4096 Oct 21 11:08 perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data.old
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
# perf evlist -v -i perf.data/data
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/20191004083121.12182-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-04 08:31:21 +00:00
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
2019-08-22 07:20:49 +00:00
#include <linux/err.h>
2019-08-29 19:18:59 +00:00
#include <linux/string.h>
2016-08-08 18:05:46 +00:00
#include <linux/time64.h>
2019-07-04 15:06:20 +00:00
#include <linux/zalloc.h>
2019-12-03 11:45:27 +00:00
#include <linux/bitmap.h>
perf header: Store clock references for -k/--clockid option
Add a new CLOCK_DATA feature that stores reference times when
-k/--clockid option is specified.
It contains the clock id and its reference time, together with the wall
clock time taken at the 'same time'; both values are in nanoseconds.
The data format is as follows:
struct {
	u32 version; /* version = 1 */
	u32 clockid;
	u64 wall_clock_ns;
	u64 clockid_time_ns;
};
These clock reference times will be used in follow-up changes to display
wall-clock time for perf events.
The feature is available only when recording with a clockid specified,
because that is the only case where we can relate the reference time to
wall-clock time. We can't do that with perf clock yet.
Committer testing:
$ perf record -h -k
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-k, --clockid <clockid>
clockid to use for events, see clock_gettime()
$ perf record -k monotonic sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
$ perf report --header-only | grep clockid -A1
# event : name = cycles:u, , id = { 88815, 88816, 88817, 88818, 88819, 88820, 88821, 88822 }, size = 120, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, read_format = ID, disabled = 1, inherit = 1, exclude_kernel = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, use_clockid = 1, ksymbol = 1, bpf_event = 1, clockid = 1
# CPU_TOPOLOGY info available, use -I to display
--
# clockid frequency: 1000 MHz
# cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
# clockid: monotonic (1)
# reference time: 2020-08-06 09:40:21.619290 = 1596717621.619290 (TOD) = 21931.077673635 (monotonic)
$
Original-patch-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Geneviève Bastien <gbastien@versatic.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20200805093444.314999-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 09:34:40 +00:00
#include <sys/time.h>
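As a rough illustration of the CLOCK_DATA payload described in the commit message above, the standalone sketch below models the version-1 record with fixed-width types, samples CLOCK_REALTIME and the selected clock back to back so both values describe roughly the same instant, and converts a later timestamp from that clock into wall-clock nanoseconds. `struct clock_data_v1`, `clock_data_fill()` and `clock_data_to_wall()` are hypothetical names, not perf's API, and the conversion step is an assumption about how such a reference pair can be used.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical mirror of the CLOCK_DATA payload (version 1). */
struct clock_data_v1 {
	uint32_t version;		/* version = 1 */
	uint32_t clockid;
	uint64_t wall_clock_ns;
	uint64_t clockid_time_ns;
};

/* 4 + 4 + 8 + 8 bytes, no padding on common ABIs. */
_Static_assert(sizeof(struct clock_data_v1) == 24, "unexpected padding");

static uint64_t timespec_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 1000000000ULL + (uint64_t)ts->tv_nsec;
}

/* Read the wall clock and the selected clock back to back. 0 on success. */
static int clock_data_fill(struct clock_data_v1 *cd, clockid_t clk)
{
	struct timespec wall, ref;

	if (clock_gettime(CLOCK_REALTIME, &wall) || clock_gettime(clk, &ref))
		return -1;

	cd->version = 1;
	cd->clockid = (uint32_t)clk;
	cd->wall_clock_ns = timespec_to_ns(&wall);
	cd->clockid_time_ns = timespec_to_ns(&ref);
	return 0;
}

/*
 * Map a timestamp taken with the same clock to wall-clock ns. Unsigned
 * arithmetic is modulo 2^64, so samples taken before the reference
 * point still come out right as long as the result is non-negative.
 */
static uint64_t clock_data_to_wall(const struct clock_data_v1 *cd, uint64_t ns)
{
	assert(cd->version == 1);	/* only the v1 layout is understood */
	return cd->wall_clock_ns + (ns - cd->clockid_time_ns);
}
```

This is the same relationship the `# reference time: ... (TOD) = ... (monotonic)` header line reports: one instant expressed on both clocks.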
2012-10-08 06:43:26 +00:00

2017-01-09 09:51:56 +00:00
struct switch_output {
2017-01-09 09:51:58 +00:00
	bool enabled;
2017-01-09 09:51:56 +00:00
	bool signal;
2017-01-09 09:51:58 +00:00
	unsigned long size;
2017-01-09 09:52:00 +00:00
	unsigned long time;
2017-01-09 09:51:57 +00:00
	const char *str;
	bool set;
2019-03-14 22:49:55 +00:00
	char **filenames;
	int num_files;
	int cur_file;
2017-01-09 09:51:56 +00:00
};

2022-01-17 18:34:21 +00:00
struct thread_mask {
	struct mmap_cpu_mask maps;
	struct mmap_cpu_mask affinity;
};

2022-01-17 18:34:23 +00:00
struct record_thread {
	pid_t tid;
	struct thread_mask *mask;
	struct {
		int msg[2];
		int ack[2];
	} pipes;
	struct fdarray pollfd;
	int ctlfd_pos;
	int nr_mmaps;
	struct mmap **maps;
	struct mmap **overwrite_maps;
	struct record *rec;
2022-01-17 18:34:25 +00:00
	unsigned long long samples;
	unsigned long waking;
2022-01-17 18:34:29 +00:00
	u64 bytes_written;
2022-01-17 18:34:31 +00:00
	u64 bytes_transferred;
	u64 bytes_compressed;
2022-01-17 18:34:23 +00:00
};

2022-01-17 18:34:25 +00:00
static __thread struct record_thread *thread;

2022-01-17 18:34:26 +00:00
enum thread_msg {
	THREAD_MSG__UNDEFINED = 0,
	THREAD_MSG__READY,
	THREAD_MSG__MAX,
};

static const char *thread_msg_tags[THREAD_MSG__MAX] = {
	"UNDEFINED", "READY"
};
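The `pipes.msg`/`pipes.ack` pairs in `struct record_thread` and the `thread_msg` tags above suggest a simple message protocol between the main thread and the worker threads. The sketch below models one direction of it inside a single process: a message value is written to the write end of a pipe, read back from the read end, and mapped to its tag with a bounds check. `thread_msg_send()`, `thread_msg_recv()` and `thread_msg_tag()` are hypothetical helpers; perf's actual handshake may differ in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum thread_msg {
	THREAD_MSG__UNDEFINED = 0,
	THREAD_MSG__READY,
	THREAD_MSG__MAX,
};

static const char *thread_msg_tags[THREAD_MSG__MAX] = {
	"UNDEFINED", "READY"
};

/* Hypothetical: write one message to the write end (fds[1]) of a pipe. */
static int thread_msg_send(const int fds[2], enum thread_msg msg)
{
	ssize_t n = write(fds[1], &msg, sizeof(msg));

	return n == (ssize_t)sizeof(msg) ? 0 : -1;
}

/* Hypothetical: read one message from the read end (fds[0]) of a pipe. */
static int thread_msg_recv(const int fds[2], enum thread_msg *msg)
{
	ssize_t n = read(fds[0], msg, sizeof(*msg));

	return n == (ssize_t)sizeof(*msg) ? 0 : -1;
}

/* Map a message to its tag, guarding against out-of-range values. */
static const char *thread_msg_tag(int msg)
{
	if (msg < 0 || msg >= THREAD_MSG__MAX)
		return "INVALID";
	return thread_msg_tags[msg];
}
```

Sizing the tag array with `THREAD_MSG__MAX` keeps the table and the enum the same length, but the bounds check is still needed for values read from a file descriptor.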

2022-01-17 18:34:32 +00:00
enum thread_spec {
	THREAD_SPEC__UNDEFINED = 0,
	THREAD_SPEC__CPU,
2022-01-17 18:34:33 +00:00
	THREAD_SPEC__CORE,
	THREAD_SPEC__PACKAGE,
	THREAD_SPEC__NUMA,
	THREAD_SPEC__USER,
	THREAD_SPEC__MAX,
};

static const char *thread_spec_tags[THREAD_SPEC__MAX] = {
	"undefined", "cpu", "core", "package", "numa", "user"
2022-01-17 18:34:32 +00:00
};
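Because `thread_spec_tags` is indexed by `enum thread_spec`, the same table can serve reverse lookups from a spec name. A minimal sketch, assuming a hypothetical `thread_spec_from_str()` helper; it is not perf's option parser, which may accept other forms of thread specification as well.

```c
#include <assert.h>
#include <string.h>

enum thread_spec {
	THREAD_SPEC__UNDEFINED = 0,
	THREAD_SPEC__CPU,
	THREAD_SPEC__CORE,
	THREAD_SPEC__PACKAGE,
	THREAD_SPEC__NUMA,
	THREAD_SPEC__USER,
	THREAD_SPEC__MAX,
};

static const char *thread_spec_tags[THREAD_SPEC__MAX] = {
	"undefined", "cpu", "core", "package", "numa", "user"
};

/* Hypothetical: map a spec name to its enum value; unknown names give UNDEFINED. */
static enum thread_spec thread_spec_from_str(const char *s)
{
	/* Skip index 0 so "undefined" is never accepted as a real spec. */
	for (int i = THREAD_SPEC__UNDEFINED + 1; i < THREAD_SPEC__MAX; i++)
		if (!strcmp(s, thread_spec_tags[i]))
			return (enum thread_spec)i;
	return THREAD_SPEC__UNDEFINED;
}
```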

2022-08-24 07:28:10 +00:00
struct pollfd_index_map {
	int evlist_pollfd_index;
	int thread_pollfd_index;
};

2013-12-19 17:38:03 +00:00
struct record {
2011-11-28 10:30:20 +00:00
	struct perf_tool tool;
2013-12-19 17:43:45 +00:00
	struct record_opts opts;
2011-11-25 10:19:45 +00:00
	u64 bytes_written;
perf record: Fix segfault with --overwrite and --max-size
When --overwrite and --max-size options of perf record are used
together, a segmentation fault occurs. The following is an example:
# perf record -e sched:sched* --overwrite --max-size 1K -a -- sleep 1
[ perf record: Woken up 1 times to write data ]
perf: Segmentation fault
Obtained 12 stack frames.
./perf/perf(+0x197673) [0x55f99710b673]
/lib/x86_64-linux-gnu/libc.so.6(+0x3ef0f) [0x7fa45f3cff0f]
./perf/perf(+0x8eb40) [0x55f997002b40]
./perf/perf(+0x1f6882) [0x55f99716a882]
./perf/perf(+0x794c2) [0x55f996fed4c2]
./perf/perf(+0x7b7c7) [0x55f996fef7c7]
./perf/perf(+0x9074b) [0x55f99700474b]
./perf/perf(+0x12e23c) [0x55f9970a223c]
./perf/perf(+0x12e54a) [0x55f9970a254a]
./perf/perf(+0x7db60) [0x55f996ff1b60]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fa45f3b2c86]
./perf/perf(+0x7dfe9) [0x55f996ff1fe9]
Segmentation fault (core dumped)
The backtrace of the core file is as follows:
(gdb) bt
#0 record__bytes_written (rec=0x55f99755a200 <record>) at builtin-record.c:234
#1 record__output_max_size_exceeded (rec=0x55f99755a200 <record>) at builtin-record.c:242
#2 record__write (map=0x0, size=12816, bf=0x55f9978da2e0, rec=0x55f99755a200 <record>) at builtin-record.c:263
#3 process_synthesized_event (tool=tool@entry=0x55f99755a200 <record>, event=event@entry=0x55f9978da2e0, sample=sample@entry=0x0, machine=machine@entry=0x55f997893658) at builtin-record.c:618
#4 0x000055f99716a883 in __perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 <record>, process=process@entry=0x55f997002aa0 <process_synthesized_event>, evlist=0x55f9978928b0, machine=machine@entry=0x55f997893658,
from=from@entry=0) at util/synthetic-events.c:1895
#5 0x000055f99716a91f in perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 <record>, process=process@entry=0x55f997002aa0 <process_synthesized_event>, evlist=<optimized out>, machine=machine@entry=0x55f997893658)
at util/synthetic-events.c:1905
#6 0x000055f996fed4c3 in record__synthesize (tail=tail@entry=true, rec=0x55f99755a200 <record>) at builtin-record.c:1997
#7 0x000055f996fef7c8 in __cmd_record (argc=argc@entry=2, argv=argv@entry=0x7ffc67551260, rec=0x55f99755a200 <record>) at builtin-record.c:2802
#8 0x000055f99700474c in cmd_record (argc=<optimized out>, argv=0x7ffc67551260) at builtin-record.c:4258
#9 0x000055f9970a223d in run_builtin (p=0x55f997564d88 <commands+264>, argc=10, argv=0x7ffc67551260) at perf.c:330
#10 0x000055f9970a254b in handle_internal_command (argc=10, argv=0x7ffc67551260) at perf.c:384
#11 0x000055f996ff1b61 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:428
#12 main (argc=<optimized out>, argv=0x7ffc67551260) at perf.c:562
The reason is that record__bytes_written() accesses rec->thread_data after
it has been freed. The sequence is as follows:
__cmd_record
-> record__free_thread_data
-> zfree(&rec->thread_data) // free rec->thread_data
-> record__synthesize
-> perf_event__synthesize_id_index
-> process_synthesized_event
-> record__write
-> record__bytes_written // access rec->thread_data
Add a member variable "thread_bytes_written" to struct record to save
the amount of data written by the threads.
Fixes: 6d57581659f72299 ("perf record: Add support for limit perf output file size")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiwei Sun <jiwei.sun@windriver.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/CAM9d7ci_TRrqBQVQNW8=GwakUr7SsZpYxaaty-S4bxF8zJWyqw@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-15 12:23:24 +00:00
	u64 thread_bytes_written;
2017-01-23 21:07:59 +00:00
	struct perf_data data;
2015-04-09 15:53:45 +00:00
	struct auxtrace_record *itr;
2019-07-21 11:23:52 +00:00
	struct evlist *evlist;
2011-11-25 10:19:45 +00:00
	struct perf_session *session;
2020-04-24 13:24:04 +00:00
	struct evlist *sb_evlist;
2020-04-27 20:56:37 +00:00
	pthread_t thread_id;
2011-11-25 10:19:45 +00:00
	int realtime_prio;
2020-04-27 20:56:37 +00:00
	bool switch_output_event_set;
2011-11-25 10:19:45 +00:00
	bool no_buildid;
2016-01-25 09:56:19 +00:00
	bool no_buildid_set;
2011-11-25 10:19:45 +00:00
	bool no_buildid_cache;
2016-01-25 09:56:19 +00:00
	bool no_buildid_cache_set;
2016-01-11 13:37:09 +00:00
	bool buildid_all;
perf record: Add --buildid-mmap option to enable PERF_RECORD_MMAP2's build id
Add --buildid-mmap option to enable build id in PERF_RECORD_MMAP2 events.
It will only work if there's kernel support for that and it disables
build id cache (implies --no-buildid).
It's also possible to enable it permanently via config option in
~/.perfconfig file:
[record]
build-id=mmap
Also added build_id bit in the verbose output for perf_event_attr:
# perf record --buildid-mmap -vv
...
perf_event_attr:
type 1
size 120
...
build_id 1
Also add the missing text_poke bit.
Committer testing:
$ perf record -h build
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-B, --no-buildid do not collect buildids in perf.data
-N, --no-buildid-cache
do not update the buildid cache
--buildid-all Record build-id of all DSOs regardless of hits
--buildid-mmap Record build-id in map events
$
$ perf record --buildid-mmap sleep 1
Failed: no support to record build id in mmap events, update your kernel.
$
After adding the needed kernel bits in a test kernel:
$ perf record -vv --buildid-mmap sleep 1 |& grep -m1 build
Enabling build id in mmap2 events.
$ perf evlist -v
cycles:u: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1
$
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-14 10:54:57 +00:00
	bool buildid_mmap;
2016-04-13 08:21:07 +00:00
	bool timestamp_filename;
perf record: Record the first and last sample time in the header
In the default 'perf record' configuration, all samples are processed to
create the HEADER_BUILD_ID table, so it's easy to get the first/last
samples and save their times to the perf file header via
write_sample_time().
Later, at post-processing time, perf report/script fetch the times from
the perf file header.
Committer testing:
# perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.099 MB perf.data (1101 samples) ]
# perf report --header | grep "time of "
# time of first sample : 22947.909226
# time of last sample : 22948.910704
#
# perf report -D | grep PERF_RECORD_SAMPLE\(
0 22947909226101 0x20bb68 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa21b1af3 period: 1 addr: 0
0 22947909229928 0x20bb98 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa200d204 period: 1 addr: 0
<SNIP>
3 22948910397351 0x219360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 28251/28251: 0xffffffffa22071d8 period: 169518 addr: 0
0 22948910652380 0x20f120 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 198807 addr: 0
2 22948910704034 0x2172d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 88111 addr: 0
#
Changelog:
v7: Just update the patch description according to Arnaldo's suggestion.
v6: '--buildid-all' is currently not enabled by default, so walking all
samples is the default operation. Calculating the timestamp boundary in
the process_sample_event handler adds little overhead once we already
go through all samples, so the timestamp boundary calculation is
enabled by default when '--buildid-all' is not enabled.
If '--buildid-all' is enabled, a new option "--timestamp-boundary"
lets the user decide whether to enable the timestamp boundary
calculation.
v5: The sample walking only works when '--buildid-all' is not enabled,
so we need to let the walking work even if '--buildid-all' is enabled
and make the processing skip the dso hit marking in that case.
At first, I wanted to provide a new option "--record-time-boundaries",
but after consideration I think a new option is not necessary.
v3: Remove the definitions of first_sample_time and last_sample_time
from struct record and directly save them in perf_evlist.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512738826-2628-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-12-08 13:13:42 +00:00
	bool timestamp_boundary;
2022-05-18 22:47:21 +00:00
	bool off_cpu;
2024-07-03 22:30:34 +00:00
	const char *filter_action;
2017-01-09 09:51:56 +00:00
	struct switch_output switch_output;
perf record: Change 'record.samples' type to unsigned long long
When running "perf record -e", the reported number of samples is wrong on
some 32-bit systems, e.g. powerpc and arm.
For example, run the commands below on 32-bit powerpc:
perf probe -x /lib/libc.so.6 malloc
perf record -e probe_libc:malloc -a ls perf.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.036 MB perf.data (13829241621624967218 samples) ]
Actually, "perf script" shows only 21 samples. The reported number is
absurd because samples is of type long but is printed with PRIu64.
Build test ran on x86-64, x86, aarch64, arm, mips, ppc and ppc64.
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Link: http://lkml.kernel.org/r/1443563383-4064-1-git-send-email-yang.shi@linaro.org
[ Bumped the 'hits' var used together with record.samples to 'unsigned long long' too ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-29 21:49:43 +00:00
	unsigned long long samples;
2019-10-22 08:09:01 +00:00
	unsigned long output_max_size; /* = 0: unlimited */
2021-12-09 20:04:25 +00:00
	struct perf_debuginfod debuginfod;
2022-01-17 18:34:21 +00:00
	int nr_threads;
	struct thread_mask *thread_masks;
2022-01-17 18:34:23 +00:00
	struct record_thread *thread_data;
2022-08-24 07:28:10 +00:00
	struct pollfd_index_map *index_map;
	size_t index_map_sz;
	size_t index_map_cnt;
2011-11-08 16:41:57 +00:00
};
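The 'record.samples' type change above comes down to keeping a counter's type and its printf conversion in agreement. Passing a 32-bit `unsigned long` where `PRIu64` expects a 64-bit argument is undefined behavior, and on 32-bit ABIs it typically makes printf read the wrong bytes, which is consistent with the absurd count in the log. The sketch below uses hypothetical `format_samples()` and `format_bytes()` helpers to pair `unsigned long long` with `%llu` and `uint64_t` with `PRIu64`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: format the "(N samples)" suffix of the summary line. */
static int format_samples(char *buf, size_t len, unsigned long long samples)
{
	/* %llu matches unsigned long long on both 32-bit and 64-bit ABIs. */
	return snprintf(buf, len, "(%llu samples)", samples);
}

/* Same idea for a fixed-width counter: PRIu64 must be paired with uint64_t. */
static int format_bytes(char *buf, size_t len, uint64_t bytes)
{
	return snprintf(buf, len, "%" PRIu64 " bytes", bytes);
}
```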
2009-06-06 07:58:57 +00:00

2019-10-22 08:09:01 +00:00
static volatile int done;

2017-01-09 09:51:58 +00:00
static volatile int auxtrace_record__snapshot_started;
static DEFINE_TRIGGER(auxtrace_snapshot_trigger);
static DEFINE_TRIGGER(switch_output_trigger);

2019-01-22 17:47:43 +00:00
static const char *affinity_tags[PERF_AFFINITY_MAX] = {
	"SYS", "NODE", "CPU"
};

2024-08-12 20:47:03 +00:00
static int build_id__process_mmap(const struct perf_tool *tool, union perf_event *event,
				  struct perf_sample *sample, struct machine *machine);
static int build_id__process_mmap2(const struct perf_tool *tool, union perf_event *event,
				   struct perf_sample *sample, struct machine *machine);
static int process_timestamp_boundary(const struct perf_tool *tool,
				      union perf_event *event,
				      struct perf_sample *sample,
				      struct machine *machine);
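`process_timestamp_boundary()` belongs to the feature, described in an earlier commit message, that stores the first and last sample times in the header. A minimal sketch of that bookkeeping, assuming a hypothetical `struct time_boundary`, keeps a running minimum and maximum so it does not depend on samples arriving in time order; perf's own handler may track the values differently. The formatting helper reproduces the `seconds.microseconds` style shown in the `time of first sample` log lines.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the first/last sample times of a session. */
struct time_boundary {
	uint64_t first;	/* 0 until the first timestamped sample is seen */
	uint64_t last;
};

/* Widen the boundary to include one sample timestamp (in nanoseconds). */
static void time_boundary_update(struct time_boundary *tb, uint64_t time)
{
	if (!time)
		return;		/* no timestamp: nothing to record */
	if (!tb->first || time < tb->first)
		tb->first = time;
	if (time > tb->last)
		tb->last = time;
}

/* Print nanoseconds as seconds with microsecond precision. */
static int format_sec_usec(char *buf, size_t len, uint64_t ns)
{
	return snprintf(buf, len, "%" PRIu64 ".%06" PRIu64,
			ns / 1000000000, (ns % 1000000000) / 1000);
}
```

Fed the sample times from the `perf report -D` excerpt, this yields `22947.909226` and `22948.910704`, the same values the header printed.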

2022-01-17 18:34:23 +00:00
#ifndef HAVE_GETTID
static inline pid_t gettid(void)
{
	return (pid_t)syscall(__NR_gettid);
}
#endif

2022-01-17 18:34:27 +00:00
static int record__threads_enabled(struct record *rec)
{
	return rec->opts.threads_spec;
}

2017-01-09 09:51:58 +00:00
static bool switch_output_signal(struct record *rec)
{
	return rec->switch_output.signal &&
	       trigger_is_ready(&switch_output_trigger);
}

static bool switch_output_size(struct record *rec)
{
	return rec->switch_output.size &&
	       trigger_is_ready(&switch_output_trigger) &&
	       (rec->bytes_written >= rec->switch_output.size);
}

2017-01-09 09:52:00 +00:00
static bool switch_output_time(struct record *rec)
{
	return rec->switch_output.time &&
	       trigger_is_ready(&switch_output_trigger);
}
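The three `switch_output_*()` predicates share one pattern: a mode fires only when it is configured (a non-zero threshold or a true flag) and the switch-output trigger is ready; size mode additionally compares the bytes written so far against its threshold. The standalone model below replaces `trigger_is_ready(&switch_output_trigger)` with a plain boolean; `struct so_model` and the `so_*()` functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the switch-output state. */
struct so_model {
	bool signal;			/* rotate when signalled */
	unsigned long size;		/* rotate after this many bytes, 0 = off */
	unsigned long time;		/* rotate every N seconds, 0 = off */
	bool trigger_ready;		/* stands in for trigger_is_ready() */
	unsigned long long bytes_written;
};

static bool so_signal(const struct so_model *s)
{
	return s->signal && s->trigger_ready;
}

static bool so_size(const struct so_model *s)
{
	return s->size && s->trigger_ready &&
	       s->bytes_written >= s->size;
}

static bool so_time(const struct so_model *s)
{
	return s->time && s->trigger_ready;
}
```

Gating every mode on the trigger means a rotation that is already in progress cannot be re-triggered by another mode until the trigger is re-armed.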

2022-01-17 18:34:29 +00:00
static u64 record__bytes_written(struct record *rec)
{
2023-02-15 12:23:24 +00:00
	return rec->bytes_written + rec->thread_bytes_written;
2022-01-17 18:34:29 +00:00
}
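The fix described in the 'Fix segfault with --overwrite and --max-size' commit message keeps a running total of thread writes in the parent structure, so `record__bytes_written()` never has to look at per-thread data that may already have been freed. The sketch below models that accounting with hypothetical `struct rec_model` and `struct worker` types; it mirrors the `map && map->file` branch of `record__write()` but is not perf's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct worker {
	uint64_t bytes_written;		/* per-thread counter */
};

struct rec_model {
	uint64_t bytes_written;		/* writes done by the main thread */
	uint64_t thread_bytes_written;	/* running total for all workers */
	struct worker *workers;		/* may be freed before the final writes */
};

/* Account one write: NULL worker means the main thread wrote it. */
static void account_write(struct rec_model *r, struct worker *w, size_t size)
{
	if (w) {
		w->bytes_written += size;
		r->thread_bytes_written += size;
	} else {
		r->bytes_written += size;
	}
}

/* Safe after r->workers is freed: only fields of r itself are read. */
static uint64_t total_bytes_written(const struct rec_model *r)
{
	return r->bytes_written + r->thread_bytes_written;
}
```

The cost is one extra addition per thread write; in exchange, the size check no longer depends on the lifetime of the per-thread array.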

2019-10-22 08:09:01 +00:00
static bool record__output_max_size_exceeded(struct record *rec)
{
	return rec->output_max_size &&
2022-01-17 18:34:29 +00:00
	       (record__bytes_written(rec) >= rec->output_max_size);
2019-10-22 08:09:01 +00:00
}
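`record__output_max_size_exceeded()` treats an `output_max_size` of 0 as unlimited, matching the field comment in `struct record`, and otherwise compares the total bytes written against the limit. A minimal model, assuming hypothetical names and assuming the check runs after each completed write, shows why the output can overshoot the limit slightly: the write that crosses it is not truncated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 0 means "no limit", as for output_max_size. */
static bool max_size_exceeded(unsigned long max_size, uint64_t written)
{
	return max_size && written >= max_size;
}

/*
 * Hypothetical: account chunks one by one and stop after the write that
 * reaches the limit. Returns how many chunks were written.
 */
static size_t write_until_limit(const size_t *chunks, size_t n,
				unsigned long max_size, uint64_t *written)
{
	size_t i;

	*written = 0;
	for (i = 0; i < n; i++) {
		*written += chunks[i];	/* the chunk is written in full */
		if (max_size_exceeded(max_size, *written))
			return i + 1;	/* recording stops here */
	}
	return n;
}
```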

2019-07-27 18:30:53 +00:00
static int record__write(struct record *rec, struct mmap *map __maybe_unused,
2018-09-13 12:54:06 +00:00
			 void *bf, size_t size)
2009-06-18 21:22:55 +00:00
{
2018-09-13 12:54:06 +00:00
	struct perf_data_file *file = &rec->session->data->file;

2022-01-17 18:34:28 +00:00
	if (map && map->file)
		file = map->file;

2018-09-13 12:54:06 +00:00
	if (perf_data_file__write(file, bf, size) < 0) {
2013-11-22 12:11:24 +00:00
		pr_err("failed to write perf data, error: %m\n");
		return -1;
2009-06-18 21:22:55 +00:00
	}
2012-08-26 18:24:47 +00:00

2023-02-15 12:23:24 +00:00
	if (map && map->file) {
2022-01-17 18:34:29 +00:00
		thread->bytes_written += size;
2023-02-15 12:23:24 +00:00
		rec->thread_bytes_written += size;
	} else {
2022-01-17 18:34:28 +00:00
		rec->bytes_written += size;
2023-02-15 12:23:24 +00:00
	}
2017-01-09 09:51:58 +00:00

2019-10-22 08:09:01 +00:00
	if (record__output_max_size_exceeded(rec) && !done) {
		fprintf(stderr, "[ perf record: perf size limit reached (%" PRIu64 " KB),"
				" stopping session ]\n",
2022-01-17 18:34:29 +00:00
				record__bytes_written(rec) >> 10);
2019-10-22 08:09:01 +00:00
		done = 1;
	}

2017-01-09 09:51:58 +00:00
	if (switch_output_size(rec))
		trigger_hit(&switch_output_trigger);

2012-08-26 18:24:47 +00:00
	return 0;
2009-06-18 21:22:55 +00:00
}
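Here is a minimal sketch of the size-limit check used at the end of record__write() above. The helper names are hypothetical. Treating a limit of 0 as "no limit" is an assumption of this sketch, not something shown in this excerpt. The `>> 10` shift converts bytes to KB, exactly as in the fprintf() call.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: 0 means "no limit" (an assumption of this sketch). */
static bool output_max_size_exceeded(uint64_t bytes_written, uint64_t max_size)
{
	return max_size && bytes_written >= max_size;
}

/* Returns 1 when this call stops the session; reports the size in KB (>> 10). */
static int check_size_limit(uint64_t bytes_written, uint64_t max_size, int *done)
{
	if (output_max_size_exceeded(bytes_written, max_size) && !*done) {
		fprintf(stderr, "[ size limit reached (%" PRIu64 " KB), stopping ]\n",
			bytes_written >> 10);
		*done = 1;
		return 1;
	}
	return 0;
}
```

The `!*done` guard means the message is printed only once, even though the check runs after every write.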
2019-03-18 17:44:12 +00:00
static int record__aio_enabled(struct record *rec);
static int record__comp_enabled(struct record *rec);
2023-11-02 17:56:46 +00:00
static ssize_t zstd_compress(struct perf_session *session, struct mmap *map,
2022-01-17 18:34:30 +00:00
			    void *dst, size_t dst_size, void *src, size_t src_size);
2019-03-18 17:43:35 +00:00

2018-11-06 09:04:58 +00:00
#ifdef HAVE_AIO_SUPPORT
static int record__aio_write(struct aiocb *cblock, int trace_fd,
		void *buf, size_t size, off_t off)
{
	int rc;

	cblock->aio_fildes = trace_fd;
	cblock->aio_buf = buf;
	cblock->aio_nbytes = size;
	cblock->aio_offset = off;
	cblock->aio_sigevent.sigev_notify = SIGEV_NONE;

	do {
		rc = aio_write(cblock);
		if (rc == 0) {
			break;
		} else if (errno != EAGAIN) {
			cblock->aio_fildes = -1;
			pr_err("failed to queue perf data, error: %m\n");
			break;
		}
	} while (1);

	return rc;
}
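record__aio_write() retries aio_write() for as long as the request is rejected with EAGAIN, and gives up on any other error. The sketch below isolates that retry shape behind a hypothetical submit callback, so it can be exercised without real POSIX AIO. The fake queues are test doubles, not part of perf.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical submit callback: returns 0 on success, -1 with errno set on failure. */
typedef int (*submit_fn)(void *ctx);

/* Same retry shape as record__aio_write(): retry on EAGAIN, stop on anything else. */
static int submit_with_retry(submit_fn submit, void *ctx)
{
	int rc;

	do {
		rc = submit(ctx);
		if (rc == 0)
			break;
		if (errno != EAGAIN) {
			fprintf(stderr, "failed to queue data: %s\n", strerror(errno));
			break;
		}
	} while (1);

	return rc;
}

/* Test double: fails with EAGAIN a given number of times, then succeeds. */
struct fake_queue {
	int eagain_left;
	int calls;
};

static int fake_submit(void *ctx)
{
	struct fake_queue *q = ctx;

	q->calls++;
	if (q->eagain_left > 0) {
		q->eagain_left--;
		errno = EAGAIN;
		return -1;
	}
	return 0;
}

/* Test double: always fails with a non-retryable error. */
static int fake_submit_eio(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	errno = EIO;
	return -1;
}
```

Like the original loop, this retries immediately without backing off. A caller facing sustained EAGAIN might prefer to wait between attempts.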
2019-07-27 18:30:53 +00:00
static int record__aio_complete(struct mmap *md, struct aiocb *cblock)
2018-11-06 09:04:58 +00:00
{
	void *rem_buf;
	off_t rem_off;
	size_t rem_size;
	int rc, aio_errno;
	ssize_t aio_ret, written;

	aio_errno = aio_error(cblock);
	if (aio_errno == EINPROGRESS)
		return 0;

	written = aio_ret = aio_return(cblock);
	if (aio_ret < 0) {
		if (aio_errno != EINTR)
			pr_err("failed to write perf data, error: %m\n");
		written = 0;
	}

	rem_size = cblock->aio_nbytes - written;

	if (rem_size == 0) {
		cblock->aio_fildes = -1;
		/*
2019-03-18 17:44:12 +00:00
		 * md->refcount is incremented in record__aio_pushfn() for
		 * every aio write request started in record__aio_push() so
		 * decrement it because the request is now complete.
2018-11-06 09:04:58 +00:00
		 */
2019-10-07 12:53:15 +00:00
		perf_mmap__put(&md->core);
2018-11-06 09:04:58 +00:00
		rc = 1;
	} else {
		/*
		 * aio write request may require restart with the
2024-04-25 06:04:27 +00:00
		 * remainder if the kernel didn't write whole
2018-11-06 09:04:58 +00:00
		 * chunk at once.
		 */
		rem_off = cblock->aio_offset + written;
		rem_buf = (void *)(cblock->aio_buf + written);
		record__aio_write(cblock, cblock->aio_fildes,
				  rem_buf, rem_size, rem_off);
		rc = 0;
	}

	return rc;
}
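The remainder handling in record__aio_complete() is plain pointer and offset arithmetic. This sketch reproduces it on a hypothetical struct write_req instead of a struct aiocb, so the calculation can be checked without issuing real asynchronous writes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical simplified request, standing in for the aiocb fields used above. */
struct write_req {
	const char *buf;
	size_t nbytes;
	off_t offset;
};

/*
 * Apply the result of a finished write. Returns 1 when the whole request is
 * done. Otherwise it rewrites req to describe the unwritten remainder (the
 * rem_buf/rem_off/rem_size arithmetic above) and returns 0. A negative result
 * counts as "nothing written", as in the code above.
 */
static int complete_write(struct write_req *req, ssize_t result)
{
	size_t written = result < 0 ? 0 : (size_t)result;
	size_t rem_size = req->nbytes - written;

	if (rem_size == 0)
		return 1;

	req->buf += written;
	req->offset += (off_t)written;
	req->nbytes = rem_size;
	return 0;
}
```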
2019-07-27 18:30:53 +00:00
static int record__aio_sync(struct mmap *md, bool sync_all)
2018-11-06 09:04:58 +00:00
{
2018-11-06 09:07:19 +00:00
	struct aiocb **aiocb = md->aio.aiocb;
	struct aiocb *cblocks = md->aio.cblocks;
2018-11-06 09:04:58 +00:00
	struct timespec timeout = { 0, 1000 * 1000 * 1 }; /* 1ms */
2018-11-06 09:07:19 +00:00
	int i, do_suspend;
2018-11-06 09:04:58 +00:00

	do {
2018-11-06 09:07:19 +00:00
		do_suspend = 0;
		for (i = 0; i < md->aio.nr_cblocks; ++i) {
			if (cblocks[i].aio_fildes == -1 || record__aio_complete(md, &cblocks[i])) {
				if (sync_all)
					aiocb[i] = NULL;
				else
					return i;
			} else {
				/*
				 * The started aio write is not complete yet,
				 * so it has to be waited for before the
				 * next allocation.
				 */
				aiocb[i] = &cblocks[i];
				do_suspend = 1;
			}
		}
		if (!do_suspend)
			return -1;
2018-11-06 09:04:58 +00:00

2018-11-06 09:07:19 +00:00
		while (aio_suspend((const struct aiocb **)aiocb, md->aio.nr_cblocks, &timeout)) {
2018-11-06 09:04:58 +00:00
			if (!(errno == EAGAIN || errno == EINTR))
				pr_err("failed to sync perf data, error: %m\n");
		}
	} while (1);
}
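The scan in record__aio_sync() can be modelled without aio_suspend(). In this hypothetical sketch, busy[i] stands in for "control block i still has a write in flight". The real function also completes finished requests via record__aio_complete(), and waits in aio_suspend() with a 1ms timeout before rescanning.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one pass of the scan. Without sync_all, return the
 * first idle slot. With sync_all, or when no slot is idle, return -1 and
 * report through *pending whether the caller would still have to suspend
 * and rescan.
 */
static int find_idle_slot(const bool *busy, int nr, bool sync_all, bool *pending)
{
	*pending = false;
	for (int i = 0; i < nr; i++) {
		if (!busy[i]) {
			if (!sync_all)
				return i;
		} else {
			*pending = true;
		}
	}
	return -1;
}
```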
2019-03-18 17:44:12 +00:00
struct record_aio {
	struct record *rec;
	void *data;
	size_t size;
};

2019-07-27 18:30:53 +00:00
static int record__aio_pushfn(struct mmap *map, void *to, void *buf, size_t size)
2018-11-06 09:04:58 +00:00
{
2019-03-18 17:44:12 +00:00
	struct record_aio *aio = to;
2018-11-06 09:04:58 +00:00

2019-03-18 17:44:12 +00:00
	/*
2019-07-27 20:07:44 +00:00
	 * map->core.base data pointed by buf is copied into free map->aio.data[] buffer
2019-03-18 17:44:12 +00:00
	 * to release space in the kernel buffer as fast as possible, calling
	 * perf_mmap__consume() from perf_mmap__push() function.
	 *
	 * That lets the kernel proceed with storing more profiling data into
	 * the kernel buffer earlier than other per-cpu kernel buffers are handled.
	 *
	 * Copying can be done in two steps in case the chunk of profiling data
	 * crosses the upper bound of the kernel buffer. In this case we first move
2024-04-25 06:04:27 +00:00
	 * part of data from map->start till the upper bound and then the remainder
2019-03-18 17:44:12 +00:00
	 * from the beginning of the kernel buffer till the end of the data chunk.
	 */

	if (record__comp_enabled(aio->rec)) {
2023-11-02 17:56:46 +00:00
		ssize_t compressed = zstd_compress(aio->rec->session, NULL, aio->data + aio->size,
						   mmap__mmap_len(map) - aio->size,
						   buf, size);
		if (compressed < 0)
			return (int)compressed;

		size = compressed;
2019-03-18 17:44:12 +00:00
	} else {
		memcpy(aio->data + aio->size, buf, size);
	}

	if (!aio->size) {
		/*
		 * Increment map->refcount to guard map->aio.data[] buffer
		 * from premature deallocation because map object can be
		 * released before the aio write request started on the
		 * map->aio.data[] buffer is complete.
		 *
		 * perf_mmap__put() is done at record__aio_complete()
		 * after started aio request completion or at record__aio_push()
		 * if the request failed to start.
		 */
2019-10-07 12:53:13 +00:00
		perf_mmap__get(&map->core);
2019-03-18 17:44:12 +00:00
	}

	aio->size += size;

	return size;
}
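The two-step copy described in the comment in record__aio_pushfn() can be sketched with a plain circular buffer. The function below is a hypothetical illustration, not perf code. In the code above, the two pieces appear to arrive as separate calls to the push callback, which is why each call appends at aio->data + aio->size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy len bytes starting at offset start of a circular buffer of ring_size
 * bytes into dst. If the chunk crosses the upper bound of the ring, copy in
 * two steps: first up to the end of the ring, then the remainder from the
 * beginning of the ring.
 */
static void ring_copy(char *dst, const char *ring, size_t ring_size,
		      size_t start, size_t len)
{
	size_t first;

	assert(start < ring_size && len <= ring_size);
	first = ring_size - start;	/* bytes available before the upper bound */
	if (len <= first) {
		memcpy(dst, ring + start, len);
		return;
	}
	memcpy(dst, ring + start, first);		/* step 1: tail of the ring */
	memcpy(dst + first, ring, len - first);	/* step 2: wrap to the start */
}
```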
2019-07-27 18:30:53 +00:00
static int record__aio_push(struct record *rec, struct mmap *map, off_t *off)
2019-03-18 17:44:12 +00:00
{
	int ret, idx;
	int trace_fd = rec->session->data->file.fd;
	struct record_aio aio = { .rec = rec, .size = 0 };
2018-11-06 09:04:58 +00:00

2019-03-18 17:44:12 +00:00
	/*
	 * Call record__aio_sync() to wait till map->aio.data[] buffer
	 * becomes available after previous aio write operation.
	 */

	idx = record__aio_sync(map, false);
	aio.data = map->aio.data[idx];
	ret = perf_mmap__push(map, &aio, record__aio_pushfn);
	if (ret != 0) /* ret > 0 - no data, ret < 0 - error */
		return ret;

	rec->samples++;
	ret = record__aio_write(&(map->aio.cblocks[idx]), trace_fd, aio.data, aio.size, *off);
2018-11-06 09:04:58 +00:00
	if (!ret) {
2019-03-18 17:44:12 +00:00
		*off += aio.size;
		rec->bytes_written += aio.size;
2018-11-06 09:04:58 +00:00
		if (switch_output_size(rec))
			trigger_hit(&switch_output_trigger);
2019-03-18 17:44:12 +00:00
	} else {
		/*
		 * Decrement map->refcount incremented in record__aio_pushfn()
		 * back if record__aio_write() operation failed to start, otherwise
		 * map->refcount is decremented in record__aio_complete() after
		 * aio write operation finishes successfully.
		 */
2019-10-07 12:53:15 +00:00
		perf_mmap__put(&map->core);
2018-11-06 09:04:58 +00:00
	}

	return ret;
}
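The reference counting spread across record__aio_pushfn(), record__aio_push() and record__aio_complete() follows a simple rule. Take one reference when the first chunk is copied into the buffer. Drop it immediately if the write fails to start, or later when the write completes. The sketch below models that rule with a hypothetical counter instead of perf_mmap__get()/perf_mmap__put().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the map reference count. */
struct buf_ref {
	int refcnt;
};

static void ref_get(struct buf_ref *r) { r->refcnt++; }
static void ref_put(struct buf_ref *r) { assert(r->refcnt > 0); r->refcnt--; }

/* Take a reference only for the first chunk, like the !aio->size check. */
static void on_chunk(struct buf_ref *r, size_t *filled, size_t chunk)
{
	if (*filled == 0)
		ref_get(r);
	*filled += chunk;
}

/*
 * After submitting: if the write failed to start, drop the reference now.
 * Returns nonzero while the reference is still held by an in-flight write.
 */
static int after_submit(struct buf_ref *r, int submit_rc)
{
	if (submit_rc != 0) {
		ref_put(r);
		return 0;
	}
	return 1;
}

/* Completion path: the in-flight write releases its reference. */
static void on_complete(struct buf_ref *r) { ref_put(r); }
```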
static off_t record__aio_get_pos(int trace_fd)
{
	return lseek(trace_fd, 0, SEEK_CUR);
}

static void record__aio_set_pos(int trace_fd, off_t pos)
{
	lseek(trace_fd, pos, SEEK_SET);
}

static void record__aio_mmap_read_sync(struct record *rec)
{
	int i;
2019-07-21 11:23:52 +00:00
	struct evlist *evlist = rec->evlist;
2019-07-27 18:30:53 +00:00
	struct mmap *maps = evlist->mmap;
2018-11-06 09:04:58 +00:00

2019-03-18 17:44:12 +00:00
	if (!record__aio_enabled(rec))
2018-11-06 09:04:58 +00:00
		return;

2019-07-30 11:04:59 +00:00
	for (i = 0; i < evlist->core.nr_mmaps; i++) {
2019-07-27 18:30:53 +00:00
		struct mmap *map = &maps[i];
2018-11-06 09:04:58 +00:00

2019-07-27 20:07:44 +00:00
		if (map->core.base)
2018-11-06 09:07:19 +00:00
			record__aio_sync(map, true);
2018-11-06 09:04:58 +00:00
	}
}

static int nr_cblocks_default = 1;
2018-11-06 09:07:19 +00:00
static int nr_cblocks_max = 4;
2018-11-06 09:04:58 +00:00

static int record__aio_parse(const struct option *opt,
2018-11-06 09:07:19 +00:00
			     const char *str,
2018-11-06 09:04:58 +00:00
			     int unset)
{
	struct record_opts *opts = (struct record_opts *)opt->value;

2018-11-06 09:07:19 +00:00
	if (unset) {
2018-11-06 09:04:58 +00:00
		opts->nr_cblocks = 0;
2018-11-06 09:07:19 +00:00
	} else {
		if (str)
			opts->nr_cblocks = strtol(str, NULL, 0);
		if (!opts->nr_cblocks)
			opts->nr_cblocks = nr_cblocks_default;
	}
2018-11-06 09:04:58 +00:00

	return 0;
}
#else /* HAVE_AIO_SUPPORT */
2018-11-06 09:07:19 +00:00
static int nr_cblocks_max = 0;

2019-07-27 18:30:53 +00:00
static int record__aio_push(struct record *rec __maybe_unused, struct mmap *map __maybe_unused,
2019-03-18 17:44:12 +00:00
			    off_t *off __maybe_unused)
2018-11-06 09:04:58 +00:00
{
	return -1;
}

static off_t record__aio_get_pos(int trace_fd __maybe_unused)
{
	return -1;
}

static void record__aio_set_pos(int trace_fd __maybe_unused, off_t pos __maybe_unused)
{
}

static void record__aio_mmap_read_sync(struct record *rec __maybe_unused)
{
}
#endif

static int record__aio_enabled(struct record *rec)
{
	return rec->opts.nr_cblocks > 0;
}
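The option handling in record__aio_parse() and the check in record__aio_enabled() reduce to a few lines. This is a hypothetical standalone restatement of that logic. As in the original function, parsing does no clamping to nr_cblocks_max, so any such limit would have to be applied elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CBLOCKS_DEFAULT 1	/* mirrors nr_cblocks_default above */

/*
 * An unset (negated) option disables AIO. A value is parsed with strtol()
 * (base 0, so "0x2" works). A missing or zero value falls back to the default.
 */
static int parse_nr_cblocks(const char *str, bool unset)
{
	int nr = 0;

	if (unset)
		return 0;
	if (str)
		nr = (int)strtol(str, NULL, 0);
	if (!nr)
		nr = NR_CBLOCKS_DEFAULT;
	return nr;
}

/* Like record__aio_enabled(): AIO is on when at least one control block is set. */
static bool aio_enabled(int nr_cblocks)
{
	return nr_cblocks > 0;
}
```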
perf record: Implement --mmap-flush=<number> option
Implement a --mmap-flush option that specifies the minimal number of bytes
that are extracted from the mmaped kernel buffer to store into a trace. The
default option value is 1 byte, which means that every time the trace writing
thread finds some new data in the mmaped buffer, the data is extracted,
possibly compressed and written to the trace.
$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc
The option is independent of the -z setting, doesn't vary with the
compression level and can serve two purposes.
The first purpose is to increase the compression ratio of the trace data.
Larger data chunks are compressed more effectively, so the option allows
specifying the size of the data chunks to compress. Also, in some cases
executing more write syscalls with smaller data sizes can take longer
than executing fewer write syscalls with bigger data sizes due to syscall
overhead, so extracting the bigger data chunks specified by the option
value could additionally decrease runtime overhead.
The second purpose is to avoid a self-monitoring live-lock issue in
system wide (-a) profiling mode. Profiling in system wide mode with
compression (-a -z) can additionally induce data into the kernel buffers
along with the data from the monitored processes. If the performance data
rate and volume from the monitored processes are high, then trace
streaming and compression activity in the tool is also high. High tool
process activity can lead to a subtle live-lock effect, where compressing
a single new byte from one of the mmaped kernel buffers leads to the
generation of the next single byte in some mmaped buffer, so the perf
tool process ends up in endless self monitoring.
The implemented synch parameter is the means to force data to be moved
regardless of the specified flush threshold value. Whatever flush value
is provided, the tool needs the capability to unconditionally drain the
memory buffers, at least at the end of the collection.
Committer testing:
Running with the default value, i.e. as soon as there is something to
read go on consuming, we first write the synthesized events, small
chunks of about 128 bytes:
# perf trace -m 2048 --call-graph dwarf -e write -- perf record
<SNIP>
101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
Then we move to reading the mmap buffers consuming the events put there
by the kernel perf infrastructure:
107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
<SNIP same backtrace as in the 107.561 timestamp>
12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
<SNIP same backtrace as in the 107.561 timestamp>
12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
<SNIP same backtrace as in the 107.561 timestamp>
If we limit it to write only when more than 16MB are available for
reading, it throttles that to a quarter of the --mmap-pages set for
'perf record', which by default amounts to 528384 bytes, as found out
using 'record -v':
mmap flush: 132096
mmap size 528384B
With that in place all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers setup by the
kernel perf infrastructure were at least 132096 bytes long.
Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760
It was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704B
A warning about that would be good to have but can be added later,
something like:
"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"
Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:
cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
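The interaction between the flush threshold and the synch parameter described in this message can be modelled as a single predicate. This is a hypothetical sketch of the described behaviour, not the code path perf actually uses to apply the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Pull data from an mmaped buffer only once at least `flush` bytes are
 * available, unless `synch` is set. synch forces any pending data out, for
 * example when draining the buffers at the end of a session.
 */
static bool should_push(size_t available, size_t flush, bool synch)
{
	if (available == 0)
		return false;
	if (synch)
		return true;
	return available >= flush;
}
```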
2019-03-18 17:40:26 +00:00
#define MMAP_FLUSH_DEFAULT 1
static int record__mmap_flush_parse(const struct option *opt,
				    const char *str,
				    int unset)
{
	int flush_max;
	struct record_opts *opts = (struct record_opts *)opt->value;
	static struct parse_tag tags[] = {
		{ .tag = 'B', .mult = 1 },
		{ .tag = 'K', .mult = 1 << 10 },
		{ .tag = 'M', .mult = 1 << 20 },
		{ .tag = 'G', .mult = 1 << 30 },
		{ .tag = 0 },
	};

	if (unset)
		return 0;

	if (str) {
		opts->mmap_flush = parse_tag_value(str, tags);
		if (opts->mmap_flush == (int)-1)
			opts->mmap_flush = strtol(str, NULL, 0);
	}

	if (!opts->mmap_flush)
		opts->mmap_flush = MMAP_FLUSH_DEFAULT;

2019-07-28 10:45:35 +00:00
	flush_max = evlist__mmap_size(opts->mmap_pages);
2019-03-18 17:40:26 +00:00
	flush_max /= 4;
	if (opts->mmap_flush > flush_max)
		opts->mmap_flush = flush_max;

	return 0;
}
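record__mmap_flush_parse() accepts an optional B/K/M/G suffix, falls back to MMAP_FLUSH_DEFAULT, and caps the result at a quarter of the mmap size. The sketch below restates that logic, with a hypothetical suffix parser standing in for parse_tag_value(). It also assumes the mmap size is (pages + 1) * page_size, meaning one header page plus the data pages. With 4096-byte pages, that assumption reproduces the 528384/132096 and 8392704/2098176 pairs reported with 'record -v' in the commit message.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MMAP_FLUSH_DEFAULT 1

/* Hypothetical: parses "<n>B|K|M|G"; returns -1 when there is no known suffix. */
static long parse_size_tag(const char *str)
{
	char *end;
	long v = strtol(str, &end, 10);

	switch (toupper((unsigned char)*end)) {
	case 'B': return v;
	case 'K': return v << 10;
	case 'M': return v << 20;
	case 'G': return v << 30;
	default:  return -1;
	}
}

/* Assumption: data pages plus one header page. */
static long mmap_len(long pages, long page_size)
{
	return (pages + 1) * page_size;
}

/* Default when unset or zero, then cap at a quarter of the mmap size. */
static long mmap_flush(const char *str, long pages, long page_size)
{
	long flush = 0, flush_max;

	if (str) {
		flush = parse_size_tag(str);
		if (flush == -1)
			flush = strtol(str, NULL, 0);
	}
	if (!flush)
		flush = MMAP_FLUSH_DEFAULT;

	flush_max = mmap_len(pages, page_size) / 4;
	if (flush > flush_max)
		flush = flush_max;
	return flush;
}
```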
perf record: Implement -z,--compression_level[=<n>] option
Implement the -z,--compression_level[=<n>] option, which enables runtime
compression of the mmaped kernel data buffers' content during perf record
mode collection. The default option value is 1 (fastest compression).
Compression overhead has been measured for serial and AIO streaming when
profiling matrix multiplication workload:
|----|-------------------------------|-------------------------------|
|    |            SERIAL             |             AIO-1             |
|----|--------|----------|-----------|--------|----------|-----------|
| -z | OVH(x) | ratio(x) | size(MiB) | OVH(x) | ratio(x) | size(MiB) |
|----|--------|----------|-----------|--------|----------|-----------|
|  0 |   1,00 |    1,000 |   179,424 |   1,00 |    1,000 |   187,527 |
|  1 |   1,04 |    8,427 |   181,148 |   1,01 |    8,474 |   188,562 |
|  2 |   1,07 |    8,055 |   186,953 |   1,03 |    7,912 |   191,773 |
|  3 |   1,04 |    8,283 |   181,908 |   1,03 |    8,220 |   191,078 |
|  5 |   1,09 |    8,101 |   187,705 |   1,05 |    7,780 |   190,065 |
|  8 |   1,05 |    9,217 |   179,191 |   1,12 |    6,111 |   193,024 |
|----|-------------------------------|-------------------------------|
OVH = (Execution time with -z N) / (Execution time with -z 0)
ratio - compression ratio
size - number of bytes that were compressed
size ~= trace size x ratio
Committer notes:
Testing it, I noticed that it failed to disable build id processing when
compression is enabled. As we'd have to uncompress everything to look for
the PERF_RECORD_{MMAP,SAMPLE,etc} to figure out which build ids to read
from DSOs, we'd better disable build id processing when compression is
enabled, logging with pr_debug() when doing so:
Original patch:
# perf record -z2
^C[ perf record: Woken up 1 times to write data ]
0x1746e0 [0x76]: failed to process type: 81 [Invalid argument]
[ perf record: Captured and wrote 1.568 MB perf.data, compressed (original 0.452 MB, ratio is 3.995) ]
#
After auto-disabling build id processing when compression is enabled:
$ perf record -z2 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.292) ]
$ perf record -v -z2 sleep 1
Compression enabled, disabling build id collection at the end of the session.
<SNIP extra -v pr_debug() messages>
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.305) ]
$
Also, with parts of the patch originally after this one moved to just
before this one we get:
$ perf record -z2 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.371) ]
$ perf report -D | grep COMPRESS
0 0x1b8 [0x155]: PERF_RECORD_COMPRESSED: unhandled!
0 0x30d [0x80]: PERF_RECORD_COMPRESSED: unhandled!
COMPRESSED events: 2
COMPRESSED events: 0
$
I.e. when faced with PERF_RECORD_COMPRESSED that we still have no code
to process, we just show it as not being handled, skip them and
continue, while before we had:
$ perf report -D | grep COMPRESS
0x1b8 [0x169]: failed to process type: 81 [Invalid argument]
Error:
failed to process sample
0 0x1b8 [0x169]: PERF_RECORD_COMPRESSED
$
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/9ff06518-ae63-a908-e44d-5d9e56dd66d9@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
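The skip-and-continue behaviour described above can be sketched as a record walker that uses each header's size field to step over types it cannot decode, instead of failing the whole session. This is a minimal, self-contained illustration, not perf's session code: `struct rec_header`, `walk_records()` and the type numbers are hypothetical stand-ins (81 only mirrors the "type: 81" shown in the log above).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record header, modeled loosely on a { type, misc, size }
 * layout; this is not the real perf ABI struct. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* whole record, header included */
};

/* Illustrative type numbers only. */
enum { REC_SAMPLE = 9, REC_COMPRESSED = 81 };

struct rec_stats {
	unsigned long samples;
	unsigned long unhandled;
};

/*
 * Walk a buffer of variable-sized records. Unknown types are reported,
 * counted and skipped by their size instead of aborting the walk.
 * Returns 0 on success, -1 if a header's size makes skipping unsafe.
 */
static int walk_records(const unsigned char *buf, size_t len,
			struct rec_stats *st)
{
	size_t off = 0;

	memset(st, 0, sizeof(*st));
	while (off + sizeof(struct rec_header) <= len) {
		struct rec_header h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.size < sizeof(h) || h.size > len - off)
			return -1;	/* cannot know where the next record starts */

		if (h.type == REC_SAMPLE) {
			st->samples++;
		} else {
			/* e.g. a compressed record there is no code for yet */
			fprintf(stderr, "%#zx [%#x]: type %u unhandled!\n",
				off, (unsigned int)h.size, (unsigned int)h.type);
			st->unhandled++;
		}
		off += h.size;
	}
	return 0;
}
```

The key design point is that only a corrupt size is fatal; an unknown type is merely counted, which matches the "unhandled!" output shown above.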
2019-03-18 17:44:42 +00:00
|
|
|
#ifdef HAVE_ZSTD_SUPPORT
|
|
|
|
static unsigned int comp_level_default = 1;
|
|
|
|
|
|
|
|
static int record__parse_comp_level(const struct option *opt, const char *str, int unset)
|
|
|
|
{
|
|
|
|
struct record_opts *opts = opt->value;
|
|
|
|
|
|
|
|
if (unset) {
|
|
|
|
opts->comp_level = 0;
|
|
|
|
} else {
|
|
|
|
if (str)
|
|
|
|
opts->comp_level = strtol(str, NULL, 0);
|
|
|
|
if (!opts->comp_level)
|
|
|
|
opts->comp_level = comp_level_default;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
#endif
|
2019-03-18 17:42:19 +00:00
|
|
|
static unsigned int comp_level_max = 22;
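A standalone sketch of the level handling in record__parse_comp_level(), using the same default (1) and the comp_level_max value (22) defined above. Two assumptions differ from the listing: this hypothetical parse_comp_level() clamps to comp_level_max itself, and it maps negative input to the default rather than letting strtol()'s result wrap around in an unsigned field.

```c
#include <stdlib.h>

static const unsigned int comp_level_default = 1;
static const unsigned int comp_level_max = 22;

/* Hypothetical helper: returns the compression level to use, 0 = off. */
static unsigned int parse_comp_level(const char *str, int unset)
{
	long v;

	if (unset)
		return 0;			/* option negated: compression off */

	v = str ? strtol(str, NULL, 0) : 0;	/* base 0 accepts 0x.. too */
	if (v <= 0)
		return comp_level_default;	/* bare -z, or an explicit 0 */
	if (v > (long)comp_level_max)
		return comp_level_max;		/* assumed clamp */
	return (unsigned int)v;
}
```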
|
|
|
|
|
2019-03-18 17:41:33 +00:00
|
|
|
static int record__comp_enabled(struct record *rec)
|
|
|
|
{
|
|
|
|
return rec->opts.comp_level > 0;
|
|
|
|
}
|
|
|
|
|
2024-08-12 20:46:55 +00:00
|
|
|
static int process_synthesized_event(const struct perf_tool *tool,
|
2011-11-25 10:19:45 +00:00
|
|
|
union perf_event *event,
|
2012-09-10 22:15:03 +00:00
|
|
|
struct perf_sample *sample __maybe_unused,
|
|
|
|
struct machine *machine __maybe_unused)
|
2009-10-26 21:23:18 +00:00
|
|
|
{
|
2013-12-19 17:38:03 +00:00
|
|
|
struct record *rec = container_of(tool, struct record, tool);
|
2018-09-13 12:54:06 +00:00
|
|
|
return record__write(rec, NULL, event, event->header.size);
|
2009-10-26 21:23:18 +00:00
|
|
|
}
|
|
|
|
|
2022-08-26 16:42:31 +00:00
|
|
|
static struct mutex synth_lock;
|
|
|
|
|
2024-08-12 20:46:55 +00:00
|
|
|
static int process_locked_synthesized_event(const struct perf_tool *tool,
|
2020-04-22 15:50:38 +00:00
|
|
|
union perf_event *event,
|
|
|
|
struct perf_sample *sample __maybe_unused,
|
|
|
|
struct machine *machine __maybe_unused)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2022-08-26 16:42:31 +00:00
|
|
|
mutex_lock(&synth_lock);
|
2020-04-22 15:50:38 +00:00
|
|
|
ret = process_synthesized_event(tool, event, sample, machine);
|
2022-08-26 16:42:31 +00:00
|
|
|
mutex_unlock(&synth_lock);
|
2020-04-22 15:50:38 +00:00
|
|
|
return ret;
|
|
|
|
}
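process_locked_synthesized_event() is a plain lock-around-call wrapper: several synthesis threads may produce events concurrently, but the underlying writer is not thread-safe, so every call is serialized on one mutex. The sketch below shows the same pattern with POSIX threads; `struct sink`, its fixed buffer and the worker function are hypothetical stand-ins for record__write() and perf.data.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical output sink standing in for the perf.data writer. */
struct sink {
	char buf[4096];
	size_t len;
};

static pthread_mutex_t synth_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unlocked writer: only safe when a single thread calls it. */
static int sink_write(struct sink *s, const void *data, size_t size)
{
	if (s->len + size > sizeof(s->buf))
		return -1;
	memcpy(s->buf + s->len, data, size);
	s->len += size;
	return 0;
}

/* Locked wrapper, same shape as process_locked_synthesized_event(). */
static int sink_write_locked(struct sink *s, const void *data, size_t size)
{
	int ret;

	pthread_mutex_lock(&synth_lock);
	ret = sink_write(s, data, size);
	pthread_mutex_unlock(&synth_lock);
	return ret;
}

struct worker_arg {
	struct sink *s;
	char tag;
};

/* Each worker appends 100 one-byte "events". */
static void *worker(void *p)
{
	struct worker_arg *a = p;

	for (int i = 0; i < 100; i++)
		sink_write_locked(a->s, &a->tag, 1);
	return NULL;
}
```

Without the mutex, two workers could race on `s->len` and overwrite each other's bytes; with it, the final length is always 200.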
|
|
|
|
|
2019-07-27 18:30:53 +00:00
|
|
|
static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size)
|
2017-10-05 19:39:55 +00:00
|
|
|
{
|
|
|
|
struct record *rec = to;
|
|
|
|
|
2019-03-18 17:43:35 +00:00
|
|
|
if (record__comp_enabled(rec)) {
|
2023-11-02 17:56:46 +00:00
|
|
|
ssize_t compressed = zstd_compress(rec->session, map, map->data,
|
|
|
|
mmap__mmap_len(map), bf, size);
|
|
|
|
|
|
|
|
if (compressed < 0)
|
|
|
|
return (int)compressed;
|
|
|
|
|
|
|
|
size = compressed;
|
2019-03-18 17:43:35 +00:00
|
|
|
bf = map->data;
|
|
|
|
}
|
|
|
|
|
2022-01-17 18:34:25 +00:00
|
|
|
thread->samples++;
|
2018-09-13 12:54:06 +00:00
|
|
|
return record__write(rec, map, bf, size);
|
2017-10-05 19:39:55 +00:00
|
|
|
}
|
|
|
|
|
2022-10-24 18:19:07 +00:00
|
|
|
static volatile sig_atomic_t signr = -1;
|
|
|
|
static volatile sig_atomic_t child_finished;
|
2020-05-13 02:20:23 +00:00
|
|
|
#ifdef HAVE_EVENTFD_SUPPORT
|
2022-10-24 18:19:07 +00:00
|
|
|
static volatile sig_atomic_t done_fd = -1;
|
2020-05-13 02:20:23 +00:00
|
|
|
#endif
|
2016-04-13 08:21:06 +00:00
|
|
|
|
2015-04-30 14:37:32 +00:00
|
|
|
static void sig_handler(int sig)
|
|
|
|
{
|
|
|
|
if (sig == SIGCHLD)
|
|
|
|
child_finished = 1;
|
|
|
|
else
|
|
|
|
signr = sig;
|
|
|
|
|
|
|
|
done = 1;
|
2020-05-13 02:20:23 +00:00
|
|
|
#ifdef HAVE_EVENTFD_SUPPORT
|
2022-10-24 01:10:24 +00:00
|
|
|
if (done_fd >= 0) {
|
|
|
|
u64 tmp = 1;
|
|
|
|
int orig_errno = errno;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* It is possible for this signal handler to run after done is
|
|
|
|
* checked in the main loop, but before the perf counter fds are
|
|
|
|
* polled. If this happens, the poll() will continue to wait
|
|
|
|
* even though done is set, and will only break out if either
|
|
|
|
* another signal is received, or the counters are ready for
|
|
|
|
* read. To ensure the poll() doesn't sleep when done is set,
|
|
|
|
* use an eventfd (done_fd) to wake up the poll().
|
|
|
|
*/
|
|
|
|
if (write(done_fd, &tmp, sizeof(tmp)) < 0)
|
|
|
|
pr_err("failed to signal wakeup fd, error: %m\n");
|
|
|
|
|
|
|
|
errno = orig_errno;
|
|
|
|
}
|
2020-05-13 02:20:23 +00:00
|
|
|
#endif // HAVE_EVENTFD_SUPPORT
|
2015-04-30 14:37:32 +00:00
|
|
|
}
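The comment in sig_handler() describes a lost-wakeup race: the flag is checked, the signal arrives, and then poll() sleeps anyway. Adding an eventfd to the polled set closes that window, because the handler makes it readable. A minimal Linux-only sketch follows; `wait_for_data_or_done()` and the single pipe standing in for the counter fds are assumptions. For brevity the handler only calls write() and the eventfd counter is never drained.

```c
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

static volatile sig_atomic_t done;
static volatile sig_atomic_t done_fd = -1;

/* Sets the flag and pokes the eventfd; errno is preserved. */
static void wake_handler(int sig)
{
	(void)sig;
	done = 1;
	if (done_fd >= 0) {
		uint64_t one = 1;
		int saved = errno;
		ssize_t r = write(done_fd, &one, sizeof(one));

		(void)r;	/* nothing safe to report from a handler */
		errno = saved;
	}
}

/*
 * Wait on @fd (a stand-in for the perf counter fds) plus the eventfd.
 * Returns 1 if the signal asked us to stop, 0 on timeout or when only
 * @fd is ready, -1 on error.
 */
static int wait_for_data_or_done(int fd, int timeout_ms)
{
	struct pollfd pfd[2] = {
		{ .fd = fd, .events = POLLIN },
		{ .fd = done_fd, .events = POLLIN },
	};

	if (done)
		return 1;
	/* A signal landing here still wakes poll() via the eventfd. */
	if (poll(pfd, 2, timeout_ms) < 0)
		return (errno == EINTR && done) ? 1 : -1;
	return (pfd[1].revents & POLLIN) ? 1 : 0;
}
```

In the usage below, clearing `done` after raise() simulates the race (the flag check already happened), and poll() still returns promptly because the eventfd is readable.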
|
|
|
|
|
2016-11-26 07:03:28 +00:00
|
|
|
static void sigsegv_handler(int sig)
|
|
|
|
{
|
|
|
|
perf_hooks__recover();
|
|
|
|
sighandler_dump_stack(sig);
|
|
|
|
}
|
|
|
|
|
2015-04-30 14:37:32 +00:00
|
|
|
static void record__sig_exit(void)
|
|
|
|
{
|
|
|
|
if (signr == -1)
|
|
|
|
return;
|
|
|
|
|
|
|
|
signal(signr, SIG_DFL);
|
|
|
|
raise(signr);
|
|
|
|
}
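record__sig_exit() makes perf terminate the way the signal itself would have, so a parent shell or script sees "killed by signal N" rather than a normal exit. The sketch below runs that pattern in a forked child and reports how the child ended; `note_signal()`, `sig_exit()` and `run_child_and_get_term_signal()` are hypothetical helpers, not part of perf.

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t signr = -1;

/* Only remember the signal; the main loop is expected to wind down. */
static void note_signal(int sig)
{
	signr = sig;
}

/* Same idea as record__sig_exit(): restore the default action and
 * re-raise, so the process dies "by" the original signal. */
static void sig_exit(void)
{
	if (signr == -1)
		return;
	signal(signr, SIG_DFL);
	raise(signr);
}

/* Returns the signal that terminated the child, 0 for a normal exit,
 * or -1 if fork()/waitpid() failed. */
static int run_child_and_get_term_signal(int send)
{
	int status;
	pid_t pid = fork();

	if (pid == 0) {
		signal(SIGTERM, note_signal);
		if (send)
			raise(SIGTERM);
		/* ... flush and close output files here ... */
		sig_exit();
		_exit(0);
	}
	if (pid < 0 || waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```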
|
|
|
|
|
2015-04-30 14:37:27 +00:00
|
|
|
#ifdef HAVE_AUXTRACE_SUPPORT
|
|
|
|
|
2024-08-12 20:46:55 +00:00
|
|
|
static int record__process_auxtrace(const struct perf_tool *tool,
|
2019-07-27 18:30:53 +00:00
|
|
|
struct mmap *map,
|
2015-04-09 15:53:45 +00:00
|
|
|
union perf_event *event, void *data1,
|
|
|
|
size_t len1, void *data2, size_t len2)
|
|
|
|
{
|
|
|
|
struct record *rec = container_of(tool, struct record, tool);
|
2017-01-23 21:07:59 +00:00
|
|
|
struct perf_data *data = &rec->data;
|
2015-04-09 15:53:45 +00:00
|
|
|
size_t padding;
|
|
|
|
u8 pad[8] = {0};
|
|
|
|
|
2019-10-04 08:31:20 +00:00
|
|
|
if (!perf_data__is_pipe(data) && perf_data__is_single_file(data)) {
|
2015-04-30 14:37:25 +00:00
|
|
|
off_t file_offset;
|
2017-01-23 21:07:59 +00:00
|
|
|
int fd = perf_data__fd(data);
|
2015-04-30 14:37:25 +00:00
|
|
|
int err;
|
|
|
|
|
|
|
|
file_offset = lseek(fd, 0, SEEK_CUR);
|
|
|
|
if (file_offset == -1)
|
|
|
|
return -1;
|
|
|
|
err = auxtrace_index__auxtrace_event(&rec->session->auxtrace_index,
|
|
|
|
event, file_offset);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2015-04-09 15:53:45 +00:00
|
|
|
/* event.auxtrace.size includes padding, see __auxtrace_mmap__read() */
|
|
|
|
padding = (len1 + len2) & 7;
|
|
|
|
if (padding)
|
|
|
|
padding = 8 - padding;
|
|
|
|
|
2018-09-13 12:54:06 +00:00
|
|
|
record__write(rec, map, event, event->header.size);
|
|
|
|
record__write(rec, map, data1, len1);
|
2015-04-09 15:53:45 +00:00
|
|
|
if (len2)
|
2018-09-13 12:54:06 +00:00
|
|
|
record__write(rec, map, data2, len2);
|
|
|
|
record__write(rec, map, &pad, padding);
|
2015-04-09 15:53:45 +00:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
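The padding arithmetic in record__process_auxtrace() rounds the payload length up to the next multiple of 8. For example, with len1 = 13 and len2 = 0, 13 & 7 = 5, so padding = 8 - 5 = 3 and the payload occupies 16 bytes; a length that is already a multiple of 8 gets no padding. A self-contained version of just that computation (the function name is hypothetical):

```c
#include <stddef.h>

/* Bytes needed to round len1 + len2 up to a multiple of 8, computed the
 * same way as in record__process_auxtrace(). */
static size_t auxtrace_padding(size_t len1, size_t len2)
{
	size_t padding = (len1 + len2) & 7;	/* remainder modulo 8 */

	if (padding)
		padding = 8 - padding;
	return padding;
}
```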
|
|
|
|
|
|
|
|
static int record__auxtrace_mmap_read(struct record *rec,
|
2019-07-27 18:30:53 +00:00
|
|
|
struct mmap *map)
|
2015-04-09 15:53:45 +00:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2018-09-13 12:54:05 +00:00
|
|
|
ret = auxtrace_mmap__read(map, rec->itr, &rec->tool,
|
2015-04-09 15:53:45 +00:00
|
|
|
record__process_auxtrace);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
if (ret)
|
|
|
|
rec->samples++;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-04-30 14:37:32 +00:00
|
|
|
static int record__auxtrace_mmap_read_snapshot(struct record *rec,
|
2019-07-27 18:30:53 +00:00
|
|
|
struct mmap *map)
|
2015-04-30 14:37:32 +00:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2018-09-13 12:54:05 +00:00
|
|
|
ret = auxtrace_mmap__read_snapshot(map, rec->itr, &rec->tool,
|
2015-04-30 14:37:32 +00:00
|
|
|
record__process_auxtrace,
|
|
|
|
rec->opts.auxtrace_snapshot_size);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
if (ret)
|
|
|
|
rec->samples++;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int record__auxtrace_read_snapshot_all(struct record *rec)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
int rc = 0;
|
|
|
|
|
2019-07-30 11:04:59 +00:00
|
|
|
for (i = 0; i < rec->evlist->core.nr_mmaps; i++) {
|
2019-07-27 18:30:53 +00:00
|
|
|
struct mmap *map = &rec->evlist->mmap[i];
|
2015-04-30 14:37:32 +00:00
|
|
|
|
2018-09-13 12:54:05 +00:00
|
|
|
if (!map->auxtrace_mmap.base)
|
2015-04-30 14:37:32 +00:00
|
|
|
continue;
|
|
|
|
|
2018-09-13 12:54:05 +00:00
|
|
|
if (record__auxtrace_mmap_read_snapshot(rec, map) != 0) {
|
2015-04-30 14:37:32 +00:00
|
|
|
rc = -1;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
out:
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
2019-08-06 14:41:01 +00:00
|
|
|
static void record__read_auxtrace_snapshot(struct record *rec, bool on_exit)
|
2015-04-30 14:37:32 +00:00
|
|
|
{
|
|
|
|
pr_debug("Recording AUX area tracing snapshot\n");
|
|
|
|
if (record__auxtrace_read_snapshot_all(rec) < 0) {
|
2016-04-20 18:59:49 +00:00
|
|
|
trigger_error(&auxtrace_snapshot_trigger);
|
2015-04-30 14:37:32 +00:00
|
|
|
} else {
|
2019-08-06 14:41:01 +00:00
|
|
|
if (auxtrace_record__snapshot_finish(rec->itr, on_exit))
|
2016-04-20 18:59:49 +00:00
|
|
|
trigger_error(&auxtrace_snapshot_trigger);
|
|
|
|
else
|
|
|
|
trigger_ready(&auxtrace_snapshot_trigger);
|
2015-04-30 14:37:32 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-08-06 14:41:01 +00:00
|
|
|
static int record__auxtrace_snapshot_exit(struct record *rec)
|
|
|
|
{
|
|
|
|
if (trigger_is_error(&auxtrace_snapshot_trigger))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!auxtrace_record__snapshot_started &&
|
|
|
|
auxtrace_record__snapshot_start(rec->itr))
|
|
|
|
return -1;
|
|
|
|
|
|
|
|
record__read_auxtrace_snapshot(rec, true);
|
|
|
|
if (trigger_is_error(&auxtrace_snapshot_trigger))
|
|
|
|
return -1;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-03-06 09:13:12 +00:00
|
|
|
static int record__auxtrace_init(struct record *rec)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
2022-01-17 18:34:34 +00:00
|
|
|
if ((rec->opts.auxtrace_snapshot_opts || rec->opts.auxtrace_sample_opts)
|
|
|
|
&& record__threads_enabled(rec)) {
|
|
|
|
pr_err("AUX area tracing options are not available in parallel streaming mode.\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2018-03-06 09:13:12 +00:00
|
|
|
if (!rec->itr) {
|
|
|
|
rec->itr = auxtrace_record__init(rec->evlist, &err);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = auxtrace_parse_snapshot_options(rec->itr, &rec->opts,
|
|
|
|
rec->opts.auxtrace_snapshot_opts);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2019-11-15 12:42:16 +00:00
|
|
|
err = auxtrace_parse_sample_options(rec->itr, rec->evlist, &rec->opts,
|
|
|
|
rec->opts.auxtrace_sample_opts);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2021-01-21 14:04:18 +00:00
|
|
|
auxtrace_regroup_aux_output(rec->evlist);
|
|
|
|
|
2018-03-06 09:13:12 +00:00
|
|
|
return auxtrace_parse_filters(rec->evlist);
|
|
|
|
}
|
|
|
|
|
2015-04-30 14:37:27 +00:00
|
|
|
#else
|
|
|
|
|
|
|
|
static inline
|
|
|
|
int record__auxtrace_mmap_read(struct record *rec __maybe_unused,
|
2019-07-27 18:30:53 +00:00
|
|
|
struct mmap *map __maybe_unused)
|
2015-04-30 14:37:27 +00:00
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-04-30 14:37:32 +00:00
|
|
|
static inline
|
2019-08-06 14:41:01 +00:00
|
|
|
void record__read_auxtrace_snapshot(struct record *rec __maybe_unused,
|
|
|
|
bool on_exit __maybe_unused)
|
2009-04-08 13:01:31 +00:00
|
|
|
{
|
2009-06-10 13:55:59 +00:00
|
|
|
}
|
|
|
|
|
2015-04-30 14:37:32 +00:00
|
|
|
static inline
|
|
|
|
int auxtrace_record__snapshot_start(struct auxtrace_record *itr __maybe_unused)
|
2009-06-10 13:55:59 +00:00
|
|
|
{
|
2015-04-30 14:37:32 +00:00
|
|
|
return 0;
|
2009-04-08 13:01:31 +00:00
|
|
|
}
|
|
|
|
|
2019-08-06 14:41:01 +00:00
|
|
|
static inline
|
|
|
|
int record__auxtrace_snapshot_exit(struct record *rec __maybe_unused)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-03-06 09:13:12 +00:00
|
|
|
static int record__auxtrace_init(struct record *rec __maybe_unused)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-04-30 14:37:32 +00:00
|
|
|
#endif
|
|
|
|
|
2020-05-12 12:19:18 +00:00
|
|
|
static int record__config_text_poke(struct evlist *evlist)
|
|
|
|
{
|
|
|
|
struct evsel *evsel;
|
|
|
|
|
|
|
|
/* Nothing to do if text poke is already configured */
|
|
|
|
evlist__for_each_entry(evlist, evsel) {
|
|
|
|
if (evsel->core.attr.text_poke)
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-05-24 07:54:27 +00:00
|
|
|
evsel = evlist__add_dummy_on_all_cpus(evlist);
|
|
|
|
if (!evsel)
|
|
|
|
return -ENOMEM;
|
2020-05-12 12:19:18 +00:00
|
|
|
|
|
|
|
evsel->core.attr.text_poke = 1;
|
|
|
|
evsel->core.attr.ksymbol = 1;
|
|
|
|
evsel->immediate = true;
|
|
|
|
evsel__set_sample_bit(evsel, TIME);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-05-18 22:47:21 +00:00
|
|
|
static int record__config_off_cpu(struct record *rec)
|
|
|
|
{
|
2022-05-18 22:47:24 +00:00
|
|
|
return off_cpu_prepare(rec->evlist, &rec->opts.target, &rec->opts);
|
2022-05-18 22:47:21 +00:00
|
|
|
}
|
|
|
|
|
2023-09-04 02:33:38 +00:00
|
|
|
static bool record__tracking_system_wide(struct record *rec)
|
|
|
|
{
|
|
|
|
struct evlist *evlist = rec->evlist;
|
|
|
|
struct evsel *evsel;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If a non-dummy evsel exists, system_wide sideband is needed to
|
|
|
|
* help parse sample information.
|
|
|
|
* For example, PERF_RECORD_MMAP events help parse symbols,
|
|
|
|
* and PERF_RECORD_COMM events help parse task executable names.
|
|
|
|
*/
|
|
|
|
evlist__for_each_entry(evlist, evsel) {
|
|
|
|
if (!evsel__is_dummy_event(evsel))
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2023-09-04 02:33:37 +00:00
|
|
|
static int record__config_tracking_events(struct record *rec)
|
|
|
|
{
|
|
|
|
struct record_opts *opts = &rec->opts;
|
|
|
|
struct evlist *evlist = rec->evlist;
|
2023-09-04 02:33:38 +00:00
|
|
|
bool system_wide = false;
|
2023-09-04 02:33:37 +00:00
|
|
|
struct evsel *evsel;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* For initial_delay, system wide or a hybrid system, we need to add
|
|
|
|
* a tracking event so that we can track PERF_RECORD_MMAP to cover the
|
|
|
|
* delay of waiting or event synthesis.
|
|
|
|
*/
|
|
|
|
if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
|
|
|
|
perf_pmus__num_core_pmus() > 1) {
|
2023-09-04 02:33:38 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* User space tasks can migrate between CPUs, so when tracing
|
|
|
|
* selected CPUs, sideband for all CPUs is still needed.
|
|
|
|
*/
|
|
|
|
if (!!opts->target.cpu_list && record__tracking_system_wide(rec))
|
|
|
|
system_wide = true;
|
|
|
|
|
|
|
|
evsel = evlist__findnew_tracking_event(evlist, system_wide);
|
2023-09-04 02:33:37 +00:00
|
|
|
if (!evsel)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Enable the tracking event when the process is forked for
|
|
|
|
* initial_delay, immediately for system wide.
|
|
|
|
*/
|
|
|
|
if (opts->target.initial_delay && !evsel->immediate &&
|
|
|
|
!target__has_cpu(&opts->target))
|
|
|
|
evsel->core.attr.enable_on_exec = 1;
|
|
|
|
else
|
|
|
|
evsel->immediate = 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
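The branches in record__config_tracking_events() reduce to a small decision table. The sketch below restates it as a pure function over hypothetical flattened inputs, which makes the three outcomes (whether to add a tracking event, whether it must be system wide, and whether it is enabled on exec or immediately) easy to check in isolation. The struct names and fields are assumptions for illustration; perf reads these values from record_opts, the target and the PMU list.

```c
#include <stdbool.h>

/* Hypothetical flattened inputs. */
struct tracking_inputs {
	bool initial_delay;	/* opts->target.initial_delay */
	bool has_cpu;		/* target__has_cpu() */
	bool cpu_list;		/* a CPU list was given */
	int nr_core_pmus;	/* > 1 on hybrid systems */
	bool has_non_dummy;	/* record__tracking_system_wide() */
	bool evsel_immediate;	/* tracking evsel already immediate */
};

struct tracking_plan {
	bool add_tracking;
	bool system_wide;
	bool enable_on_exec;	/* false: enable immediately */
};

static struct tracking_plan plan_tracking(const struct tracking_inputs *in)
{
	struct tracking_plan p = { false, false, false };

	if (!(in->initial_delay || in->has_cpu || in->nr_core_pmus > 1))
		return p;	/* no tracking event needed */

	p.add_tracking = true;
	/* tasks may migrate off the selected CPUs, so watch all of them */
	p.system_wide = in->cpu_list && in->has_non_dummy;
	p.enable_on_exec = in->initial_delay && !in->evsel_immediate &&
			   !in->has_cpu;
	return p;
}
```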
|
|
|
|
|
perf record: Put a copy of kcore into the perf.data directory
Add a new 'perf record' option '--kcore' which will put a copy of
/proc/kcore, kallsyms and modules into a perf.data directory. Note that
without the --kcore option, output goes to a file as before. The
tools' -o and -i options work with either a file name or a directory name.
Example:
$ sudo perf record --kcore uname
$ sudo tree perf.data
perf.data
├── kcore_dir
│ ├── kallsyms
│ ├── kcore
│ └── modules
└── data
$ sudo perf script -v
build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de
build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541
Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field.
Using CPUID GenuineIntel-6-8E-A
Using perf.data/kcore_dir/kcore for kernel data
Using perf.data/kcore_dir/kallsyms for symbols
perf 19058 506778.423729: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423733: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423734: 7 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423736: 117 cycles: ffffffffa2caa54a native_write_msr+0xa (vmlinux)
perf 19058 506778.423738: 2092 cycles: ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux)
perf 19058 506778.423740: 37380 cycles: ffffffffa2f121d0 perf_event_addr_filters_exec+0x0 (vmlinux)
uname 19058 506778.423751: 582673 cycles: ffffffffa303a407 propagate_protected_usage+0x147 (vmlinux)
uname 19058 506778.423892: 2241841 cycles: ffffffffa2cae0c9 unwind_next_frame.part.5+0x79 (vmlinux)
uname 19058 506778.424430: 2457397 cycles: ffffffffa3019232 check_memory_region+0x52 (vmlinux)
Committer testing:
# rm -rf perf.data*
# perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
# ls -l perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data
# perf record --kcore uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
# ls -lad perf.data*
drwx------. 3 root root 4096 Oct 21 11:08 perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data.old
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
# perf evlist -v -i perf.data/data
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/20191004083121.12182-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
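Since -i and -o accept either form, a reader has to decide where the event stream lives: the path itself for a plain file, or the `data` file inside a perf.data directory, as in the tree shown above. The helper below is a hypothetical sketch of that check with stat(2); perf's own perf_data code is not shown here and may handle more cases.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper. Returns 1 if @arg is a directory (out gets
 * "<arg>/data"), 0 if it is any other file (out gets @arg), or -1 if
 * stat() fails or @out is too small.
 */
static int data_stream_path(const char *arg, char *out, size_t outlen)
{
	struct stat st;
	int is_dir, n;

	if (stat(arg, &st) < 0)
		return -1;
	is_dir = !!S_ISDIR(st.st_mode);
	if (is_dir)
		n = snprintf(out, outlen, "%s/data", arg);
	else
		n = snprintf(out, outlen, "%s", arg);
	if (n < 0 || (size_t)n >= outlen)
		return -1;	/* truncated */
	return is_dir;
}
```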
2019-10-04 08:31:21 +00:00
|
|
|
static bool record__kcore_readable(struct machine *machine)
|
|
|
|
{
|
|
|
|
char kcore[PATH_MAX];
|
|
|
|
int fd;
|
|
|
|
|
|
|
|
scnprintf(kcore, sizeof(kcore), "%s/proc/kcore", machine->root_dir);
|
|
|
|
|
|
|
|
fd = open(kcore, O_RDONLY);
|
|
|
|
if (fd < 0)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
close(fd);
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int record__kcore_copy(struct machine *machine, struct perf_data *data)
|
|
|
|
{
|
|
|
|
char from_dir[PATH_MAX];
|
|
|
|
char kcore_dir[PATH_MAX];
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
snprintf(from_dir, sizeof(from_dir), "%s/proc", machine->root_dir);
|
|
|
|
|
|
|
|
ret = perf_data__make_kcore_dir(data, kcore_dir, sizeof(kcore_dir));
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
return kcore_copy(from_dir, kcore_dir);
|
|
|
|
}
|
|
|
|
|
2022-01-17 18:34:23 +00:00
|
|
|
static void record__thread_data_init_pipes(struct record_thread *thread_data)
|
|
|
|
{
|
|
|
|
thread_data->pipes.msg[0] = -1;
|
|
|
|
thread_data->pipes.msg[1] = -1;
|
|
|
|
thread_data->pipes.ack[0] = -1;
|
|
|
|
thread_data->pipes.ack[1] = -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int record__thread_data_open_pipes(struct record_thread *thread_data)
|
|
|
|
{
|
|
|
|
if (pipe(thread_data->pipes.msg))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (pipe(thread_data->pipes.ack)) {
|
|
|
|
close(thread_data->pipes.msg[0]);
|
|
|
|
thread_data->pipes.msg[0] = -1;
|
|
|
|
close(thread_data->pipes.msg[1]);
|
|
|
|
thread_data->pipes.msg[1] = -1;
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
pr_debug2("thread_data[%p]: msg=[%d,%d], ack=[%d,%d]\n", thread_data,
|
|
|
|
thread_data->pipes.msg[0], thread_data->pipes.msg[1],
|
|
|
|
thread_data->pipes.ack[0], thread_data->pipes.ack[1]);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void record__thread_data_close_pipes(struct record_thread *thread_data)
|
|
|
|
{
|
|
|
|
if (thread_data->pipes.msg[0] != -1) {
|
|
|
|
close(thread_data->pipes.msg[0]);
|
|
|
|
thread_data->pipes.msg[0] = -1;
|
|
|
|
}
|
|
|
|
if (thread_data->pipes.msg[1] != -1) {
|
|
|
|
close(thread_data->pipes.msg[1]);
|
|
|
|
thread_data->pipes.msg[1] = -1;
|
|
|
|
}
|
|
|
|
if (thread_data->pipes.ack[0] != -1) {
|
|
|
|
close(thread_data->pipes.ack[0]);
|
|
|
|
thread_data->pipes.ack[0] = -1;
|
|
|
|
}
|
|
|
|
if (thread_data->pipes.ack[1] != -1) {
|
|
|
|
close(thread_data->pipes.ack[1]);
|
|
|
|
thread_data->pipes.ack[1] = -1;
|
|
|
|
}
|
|
|
|
}
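The pipe helpers above follow a common idiom: initialize every descriptor to -1, roll back the first pipe if the second pipe() fails, and close only descriptors that are not -1, resetting them afterwards so a repeated close is harmless. The sketch below keeps that behaviour but factors the repeated close-and-reset into one helper; `struct msg_pipes` and the function names are hypothetical.

```c
#include <unistd.h>

struct msg_pipes {
	int msg[2];
	int ack[2];
};

static void pipes_init(struct msg_pipes *p)
{
	p->msg[0] = p->msg[1] = p->ack[0] = p->ack[1] = -1;
}

/* Close @fd if open and mark it closed, so calling twice is safe. */
static void close_if_open(int *fd)
{
	if (*fd != -1) {
		close(*fd);
		*fd = -1;
	}
}

static void pipes_close(struct msg_pipes *p)
{
	close_if_open(&p->msg[0]);
	close_if_open(&p->msg[1]);
	close_if_open(&p->ack[0]);
	close_if_open(&p->ack[1]);
}

/* Open both pipes, or none: roll back msg if ack fails. */
static int pipes_open(struct msg_pipes *p)
{
	if (pipe(p->msg))
		return -1;
	if (pipe(p->ack)) {
		close_if_open(&p->msg[0]);
		close_if_open(&p->msg[1]);
		return -1;
	}
	return 0;
}
```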
|
|
|
|
|
2022-05-24 07:54:30 +00:00
|
|
|
static bool evlist__per_thread(struct evlist *evlist)
|
|
|
|
{
|
|
|
|
return cpu_map__is_dummy(evlist->core.user_requested_cpus);
|
|
|
|
}
|
|
|
|
|
2022-01-17 18:34:23 +00:00
|
|
|
static int record__thread_data_init_maps(struct record_thread *thread_data, struct evlist *evlist)
|
|
|
|
{
|
|
|
|
int m, tm, nr_mmaps = evlist->core.nr_mmaps;
|
|
|
|
struct mmap *mmap = evlist->mmap;
|
|
|
|
struct mmap *overwrite_mmap = evlist->overwrite_mmap;
|
2022-05-24 07:54:30 +00:00
|
|
|
struct perf_cpu_map *cpus = evlist->core.all_cpus;
|
|
|
|
bool per_thread = evlist__per_thread(evlist);
|
2022-01-17 18:34:23 +00:00
|
|
|
|
2022-05-24 07:54:30 +00:00
|
|
|
if (per_thread)
|
2022-04-14 01:46:40 +00:00
|
|
|
thread_data->nr_mmaps = nr_mmaps;
|
|
|
|
else
|
|
|
|
thread_data->nr_mmaps = bitmap_weight(thread_data->mask->maps.bits,
|
|
|
|
thread_data->mask->maps.nbits);
|
2022-01-17 18:34:23 +00:00
|
|
|
if (mmap) {
|
|
|
|
thread_data->maps = zalloc(thread_data->nr_mmaps * sizeof(struct mmap *));
|
|
|
|
if (!thread_data->maps)
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
if (overwrite_mmap) {
|
|
|
|
thread_data->overwrite_maps = zalloc(thread_data->nr_mmaps * sizeof(struct mmap *));
|
|
|
|
if (!thread_data->overwrite_maps) {
|
|
|
|
zfree(&thread_data->maps);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
pr_debug2("thread_data[%p]: nr_mmaps=%d, maps=%p, ow_maps=%p\n", thread_data,
|
|
|
|
thread_data->nr_mmaps, thread_data->maps, thread_data->overwrite_maps);
|
|
|
|
|
|
|
|
for (m = 0, tm = 0; m < nr_mmaps && tm < thread_data->nr_mmaps; m++) {
|
2022-05-24 07:54:30 +00:00
|
|
|
if (per_thread ||
|
2022-05-03 04:17:52 +00:00
|
|
|
test_bit(perf_cpu_map__cpu(cpus, m).cpu, thread_data->mask->maps.bits)) {
|
2022-01-17 18:34:23 +00:00
|
|
|
if (thread_data->maps) {
|
|
|
|
thread_data->maps[tm] = &mmap[m];
|
|
|
|
pr_debug2("thread_data[%p]: cpu%d: maps[%d] -> mmap[%d]\n",
|
2022-04-14 01:46:40 +00:00
|
|
|
thread_data, perf_cpu_map__cpu(cpus, m).cpu, tm, m);
|
2022-01-17 18:34:23 +00:00
|
|
|
}
|
|
|
|
if (thread_data->overwrite_maps) {
|
|
|
|
thread_data->overwrite_maps[tm] = &overwrite_mmap[m];
|
|
|
|
pr_debug2("thread_data[%p]: cpu%d: ow_maps[%d] -> ow_mmap[%d]\n",
|
2022-04-14 01:46:40 +00:00
|
|
|
thread_data, perf_cpu_map__cpu(cpus, m).cpu, tm, m);
|
2022-01-17 18:34:23 +00:00
|
|
|
}
|
|
|
|
tm++;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
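record__thread_data_init_maps() gives each thread the mmaps whose CPU bit is set in its mask (or all of them when recording per thread), and sizes the arrays with bitmap_weight(). The sketch below shows the same selection with a single 64-bit word as the mask, which is an assumption: perf uses arbitrary-length bitmaps and struct perf_cpu_map. Like the listing, it looks up the CPU of mmap m as cpus[m]; the function names are hypothetical.

```c
#include <stdint.h>

/* Population count, playing the role of bitmap_weight() for one word. */
static int mask_weight(uint64_t mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)	/* clears the lowest set bit */
		n++;
	return n;
}

/*
 * Store in @out the indexes m of the mmaps this thread owns: all of them
 * when @per_thread, else those whose CPU has its bit set in @mask.
 * Returns the number stored (at most @max_out).
 */
static int select_maps(const int *cpus, int nr_mmaps, uint64_t mask,
		       int per_thread, int *out, int max_out)
{
	int m, tm = 0;

	for (m = 0; m < nr_mmaps && tm < max_out; m++) {
		if (per_thread ||
		    (cpus[m] >= 0 && cpus[m] < 64 && ((mask >> cpus[m]) & 1)))
			out[tm++] = m;
	}
	return tm;
}
```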
|
|
|
|
|
|
|
|
static int record__thread_data_init_pollfd(struct record_thread *thread_data, struct evlist *evlist)
|
|
|
|
{
|
|
|
|
int f, tm, pos;
|
|
|
|
struct mmap *map, *overwrite_map;
|
|
|
|
|
|
|
|
fdarray__init(&thread_data->pollfd, 64);
|
|
|
|
|
|
|
|
for (tm = 0; tm < thread_data->nr_mmaps; tm++) {
|
|
|
|
map = thread_data->maps ? thread_data->maps[tm] : NULL;
|
|
|
|
overwrite_map = thread_data->overwrite_maps ?
|
|
|
|
thread_data->overwrite_maps[tm] : NULL;
|
|
|
|
|
|
|
|
for (f = 0; f < evlist->core.pollfd.nr; f++) {
|
|
|
|
void *ptr = evlist->core.pollfd.priv[f].ptr;
|
|
|
|
|
|
|
|
if ((map && ptr == map) || (overwrite_map && ptr == overwrite_map)) {
|
|
|
|
pos = fdarray__dup_entry_from(&thread_data->pollfd, f,
|
|
|
|
&evlist->core.pollfd);
|
|
|
|
if (pos < 0)
|
|
|
|
return pos;
|
|
|
|
pr_debug2("thread_data[%p]: pollfd[%d] <- event_fd=%d\n",
|
|
|
|
thread_data, pos, evlist->core.pollfd.entries[f].fd);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void record__free_thread_data(struct record *rec)
|
|
|
|
{
|
|
|
|
int t;
|
|
|
|
struct record_thread *thread_data = rec->thread_data;
|
|
|
|
|
|
|
|
if (thread_data == NULL)
|
|
|
|
return;
|
|
|
|
|
|
|
|
for (t = 0; t < rec->nr_threads; t++) {
|
|
|
|
record__thread_data_close_pipes(&thread_data[t]);
|
|
|
|
zfree(&thread_data[t].maps);
|
|
|
|
zfree(&thread_data[t].overwrite_maps);
|
|
|
|
fdarray__exit(&thread_data[t].pollfd);
|
|
|
|
}
|
|
|
|
|
|
|
|
zfree(&rec->thread_data);
|
|
|
|
}
|
|
|
|
|
2022-08-24 07:28:10 +00:00
|
|
|
static int record__map_thread_evlist_pollfd_indexes(struct record *rec,
|
|
|
|
int evlist_pollfd_index,
|
|
|
|
int thread_pollfd_index)
|
|
|
|
{
|
|
|
|
size_t x = rec->index_map_cnt;
|
|
|
|
|
|
|
|
if (realloc_array_as_needed(rec->index_map, rec->index_map_sz, x, NULL))
|
|
|
|
return -ENOMEM;
|
|
|
|
rec->index_map[x].evlist_pollfd_index = evlist_pollfd_index;
|
|
|
|
rec->index_map[x].thread_pollfd_index = thread_pollfd_index;
|
|
|
|
rec->index_map_cnt += 1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int record__update_evlist_pollfd_from_thread(struct record *rec,
|
|
|
|
struct evlist *evlist,
|
|
|
|
struct record_thread *thread_data)
|
|
|
|
{
|
|
|
|
struct pollfd *e_entries = evlist->core.pollfd.entries;
|
|
|
|
struct pollfd *t_entries = thread_data->pollfd.entries;
|
|
|
|
int err = 0;
|
|
|
|
size_t i;
|
|
|
|
|
|
|
|
for (i = 0; i < rec->index_map_cnt; i++) {
|
|
|
|
int e_pos = rec->index_map[i].evlist_pollfd_index;
|
|
|
|
int t_pos = rec->index_map[i].thread_pollfd_index;
|
|
|
|
|
|
|
|
if (e_entries[e_pos].fd != t_entries[t_pos].fd ||
|
|
|
|
e_entries[e_pos].events != t_entries[t_pos].events) {
|
|
|
|
pr_err("Thread and evlist pollfd index mismatch\n");
|
|
|
|
err = -EINVAL;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
e_entries[e_pos].revents = t_entries[t_pos].revents;
|
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
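record__update_evlist_pollfd_from_thread() keeps a table of (evlist index, thread index) pairs and copies revents back only when both entries still describe the same fd and event mask; a mismatch is reported but does not stop the remaining copies. A minimal sketch of that loop on plain struct pollfd arrays (`struct index_pair` and `copy_revents()` are hypothetical names):

```c
#include <poll.h>
#include <stddef.h>

struct index_pair {
	int evlist_pos;	/* index into the shared array */
	int thread_pos;	/* index into the thread's private array */
};

/* Returns 0, or -1 if any pair mismatched (other pairs still copied). */
static int copy_revents(struct pollfd *shared, const struct pollfd *priv,
			const struct index_pair *map, size_t n)
{
	int err = 0;

	for (size_t i = 0; i < n; i++) {
		struct pollfd *e = &shared[map[i].evlist_pos];
		const struct pollfd *t = &priv[map[i].thread_pos];

		if (e->fd != t->fd || e->events != t->events) {
			err = -1;	/* stale mapping: leave e untouched */
			continue;
		}
		e->revents = t->revents;
	}
	return err;
}
```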
|
|
|
|
|
|
|
|
static int record__dup_non_perf_events(struct record *rec,
|
|
|
|
struct evlist *evlist,
|
|
|
|
struct record_thread *thread_data)
|
|
|
|
{
|
|
|
|
struct fdarray *fda = &evlist->core.pollfd;
|
|
|
|
int i, ret;
|
|
|
|
|
|
|
|
for (i = 0; i < fda->nr; i++) {
|
|
|
|
if (!(fda->priv[i].flags & fdarray_flag__non_perf_event))
|
|
|
|
continue;
|
|
|
|
ret = fdarray__dup_entry_from(&thread_data->pollfd, i, fda);
|
|
|
|
if (ret < 0) {
|
|
|
|
pr_err("Failed to duplicate descriptor in main thread pollfd\n");
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
pr_debug2("thread_data[%p]: pollfd[%d] <- non_perf_event fd=%d\n",
|
|
|
|
thread_data, ret, fda->entries[i].fd);
|
|
|
|
ret = record__map_thread_evlist_pollfd_indexes(rec, i, ret);
|
|
|
|
if (ret < 0) {
|
|
|
|
pr_err("Failed to map thread and evlist pollfd indexes\n");
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-01-17 18:34:23 +00:00
|
|
|
static int record__alloc_thread_data(struct record *rec, struct evlist *evlist)
|
|
|
|
{
|
|
|
|
int t, ret;
|
|
|
|
struct record_thread *thread_data;
|
|
|
|
|
|
|
|
rec->thread_data = zalloc(rec->nr_threads * sizeof(*(rec->thread_data)));
|
|
|
|
if (!rec->thread_data) {
|
|
|
|
pr_err("Failed to allocate thread data\n");
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
thread_data = rec->thread_data;
|
|
|
|
|
|
|
|
for (t = 0; t < rec->nr_threads; t++)
|
|
|
|
record__thread_data_init_pipes(&thread_data[t]);
|
|
|
|
|
|
|
|
for (t = 0; t < rec->nr_threads; t++) {
|
|
|
|
thread_data[t].rec = rec;
|
|
|
|
thread_data[t].mask = &rec->thread_masks[t];
|
|
|
|
ret = record__thread_data_init_maps(&thread_data[t], evlist);
|
|
|
|
if (ret) {
|
|
|
|
pr_err("Failed to initialize thread[%d] maps\n", t);
|
|
|
|
goto out_free;
|
|
|
|
}
|
|
|
|
ret = record__thread_data_init_pollfd(&thread_data[t], evlist);
|
|
|
|
if (ret) {
|
|
|
|
pr_err("Failed to initialize thread[%d] pollfd\n", t);
|
|
|
|
goto out_free;
|
|
|
|
}
|
|
|
|
if (t) {
|
|
|
|
thread_data[t].tid = -1;
|
|
|
|
ret = record__thread_data_open_pipes(&thread_data[t]);
|
|
|
|
if (ret) {
|
|
|
|
pr_err("Failed to open thread[%d] communication pipes\n", t);
|
|
|
|
goto out_free;
|
|
|
|
}
|
|
|
|
ret = fdarray__add(&thread_data[t].pollfd, thread_data[t].pipes.msg[0],
|
|
|
|
POLLIN | POLLERR | POLLHUP, fdarray_flag__nonfilterable);
|
|
|
|
if (ret < 0) {
|
|
|
|
pr_err("Failed to add descriptor to thread[%d] pollfd\n", t);
|
|
|
|
goto out_free;
|
|
|
|
}
|
|
|
|
thread_data[t].ctlfd_pos = ret;
|
|
|
|
pr_debug2("thread_data[%p]: pollfd[%d] <- ctl_fd=%d\n",
|
|
|
|
&thread_data[t], thread_data[t].ctlfd_pos,
|
|
|
|
thread_data[t].pipes.msg[0]);
|
|
|
|
} else {
|
|
|
|
thread_data[t].tid = gettid();
|
2022-08-24 07:28:10 +00:00
|
|
|
|
|
|
|
ret = record__dup_non_perf_events(rec, evlist, &thread_data[t]);
|
|
|
|
if (ret < 0)
|
|
|
|
goto out_free;
|
|
|
|
|
2022-08-24 07:28:12 +00:00
|
|
|
thread_data[t].ctlfd_pos = -1; /* Not used */
|
2022-01-17 18:34:23 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
out_free:
|
|
|
|
record__free_thread_data(rec);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2016-06-27 10:24:03 +00:00
|
|
|
static int record__mmap_evlist(struct record *rec,
|
2019-07-21 11:23:52 +00:00
|
|
|
struct evlist *evlist)
|
2016-06-27 10:24:03 +00:00
|
|
|
{
|
2022-01-17 18:34:28 +00:00
|
|
|
int i, ret;
|
2016-06-27 10:24:03 +00:00
|
|
|
struct record_opts *opts = &rec->opts;
|
2019-11-15 12:42:16 +00:00
|
|
|
bool auxtrace_overwrite = opts->auxtrace_snapshot_mode ||
|
|
|
|
opts->auxtrace_sample_mode;
|
2016-06-27 10:24:03 +00:00
|
|
|
char msg[512];
|
|
|
|
|
2019-01-22 17:50:57 +00:00
|
|
|
if (opts->affinity != PERF_AFFINITY_SYS)
|
|
|
|
cpu__setup_cpunode_map();
|
|
|
|
|
2019-07-28 10:45:35 +00:00
|
|
|
if (evlist__mmap_ex(evlist, opts->mmap_pages,
|
2016-06-27 10:24:03 +00:00
|
|
|
opts->auxtrace_mmap_pages,
|
2019-11-15 12:42:16 +00:00
|
|
|
auxtrace_overwrite,
|
perf record: Implement --mmap-flush=<number> option
Implement a --mmap-flush option that specifies the minimal number of bytes
that is extracted from an mmaped kernel buffer to be stored into a trace. The
default value is 1 byte, which means that every time the trace writing
thread finds some new data in the mmaped buffer, the data is extracted,
possibly compressed, and written to the trace.
$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc
The option is independent of the -z setting, doesn't vary with the compression
level, and can serve two purposes.
The first purpose is to increase the compression ratio of the trace data.
Larger data chunks compress more effectively, so the option allows
specifying the size of the data chunks to compress. Also, in some cases
executing more write syscalls with smaller data sizes can take longer
than executing fewer write syscalls with bigger data sizes, due to syscall
overhead, so extracting the bigger data chunks specified by the option value
could additionally decrease runtime overhead.
The second purpose is to avoid a self-monitoring live-lock issue in system
wide (-a) profiling mode. Profiling in system wide mode with compression
(-a -z) can additionally induce data into the kernel buffers along with
the data from the monitored processes. If the performance data rate and volume
from the monitored processes are high, then trace streaming and
compression activity in the tool is also high. High tool process
activity can lead to a subtle live-lock effect, where compressing a single
new byte from one mmaped kernel buffer leads to the generation of the
next single byte in some mmaped buffer, so the perf tool process ends up in
endless self-monitoring.
The implemented synch parameter is the means to force data to move regardless
of the specified flush threshold value. Whatever flush value is provided,
the tool needs the capability to unconditionally drain the memory buffers,
at least at the end of the collection.
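The flush threshold and the synch override described above combine into a simple rule: leave a ring buffer alone while it holds fewer than flush bytes, unless a forced drain is requested, as at the end of the session. A hypothetical sketch of that decision (`should_push()` is an assumed name, not perf's API):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * @avail: unread bytes in the mmap ring
 * @flush: minimal chunk size (--mmap-flush, default 1 byte)
 * @sync:  force out whatever is there, e.g. at the end of collection
 */
static bool should_push(size_t avail, size_t flush, bool sync)
{
	if (avail == 0)
		return false;	/* nothing to write either way */
	return sync || avail >= flush;
}
```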
Committer testing:
Running with the default value, i.e. consuming as soon as there is
something to read, we first write the synthesized events, small
chunks of about 128 bytes:
# perf trace -m 2048 --call-graph dwarf -e write -- perf record
<SNIP>
101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
Then we move to reading the mmap buffers, consuming the events put there
by the kernel perf infrastructure:
107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
<SNIP same backtrace as in the 107.561 timestamp>
12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
<SNIP same backtrace as in the 107.561 timestamp>
12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
<SNIP same backtrace as in the 107.561 timestamp>
If we limit it to write only when more than 16MB are available for
reading, the flush value is throttled to a quarter of the mmap size set
by --mmap-pages for 'perf record', which by default amounts to 528384
bytes, as reported by 'record -v':
mmap flush: 132096
mmap size 528384B
With that in place, all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers set up by the
kernel perf infrastructure, were at least 132096 bytes long.
Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760
The flush value was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704B
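Both runs are consistent with the requested flush value being capped at a quarter of the mmap buffer size: 528384 / 4 = 132096 and 8392704 / 4 = 2098176. The sketch below reproduces that arithmetic; `clamp_mmap_flush` is a hypothetical helper, not perf's function, and it assumes the cap is plain integer division of the mmap size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: cap the requested flush size at mmap_size / 4. */
static size_t clamp_mmap_flush(size_t requested, size_t mmap_size)
{
	size_t max_flush;

	assert(mmap_size > 0);
	max_flush = mmap_size / 4;
	return requested > max_flush ? max_flush : requested;
}
```

A 16M request against the default 528384-byte buffer therefore ends up as 132096, while a small request such as 1024 passes through unchanged.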
A warning about that would be good to have but can be added later,
something like:
"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"
Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:
cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-18 17:40:26 +00:00
				 opts->nr_cblocks, opts->affinity,
2019-03-18 17:42:19 +00:00
				 opts->mmap_flush, opts->comp_level) < 0) {
2016-06-27 10:24:03 +00:00
		if (errno == EPERM) {
			pr_err("Permission error mapping pages.\n"
			       "Consider increasing "
			       "/proc/sys/kernel/perf_event_mlock_kb,\n"
			       "or try again with a smaller value of -m/--mmap_pages.\n"
			       "(current value: %u,%u)\n",
			       opts->mmap_pages, opts->auxtrace_mmap_pages);
			return -errno;
		} else {
			pr_err("failed to mmap with %d (%s)\n", errno,
tools: Introduce str_error_r()
The tools so far have been using the GNU variant of strerror_r(), which
returns a string that may be the buffer passed in or something else.
But that, besides being tricky in cases where we expect the function
using strerror_r() to return the error formatted in a provided buffer
(we have to check if it returned something else and copy that
instead), breaks the build on systems not using glibc, like Alpine
Linux, where musl libc is used.
So, introduce yet another wrapper, str_error_r(), that has the GNU
interface but uses the portable XSI variant of strerror_r(), so that
users can rest assured that the provided buffer is used and that it is
what is returned.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-d4t42fnf48ytlk8rjxs822tf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
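A minimal sketch of such a wrapper, assuming a POSIX system where defining `_POSIX_C_SOURCE` as 200112L (without `_GNU_SOURCE`) selects the XSI `strerror_r()`, which returns an `int`. The name `xsi_str_error_r` and the fallback message format are hypothetical; the point is that the caller's buffer is always filled and always returned.

```c
#define _POSIX_C_SOURCE 200112L	/* request the XSI strerror_r() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: GNU-style interface on top of XSI strerror_r(). */
static char *xsi_str_error_r(int errnum, char *buf, size_t buflen)
{
	int err;

	assert(buf != NULL && buflen > 0);
	err = strerror_r(errnum, buf, buflen);
	if (err)	/* e.g. unknown errnum or buffer too small */
		snprintf(buf, buflen, "strerror_r(%d) failed: %d", errnum, err);
	return buf;	/* always the caller's buffer */
}
```

Because the return value is always `buf`, callers can pass the result straight into a format string without checking whether a different pointer came back, which is the property the GNU variant does not guarantee.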
2016-07-06 14:56:20 +00:00
				str_error_r(errno, msg, sizeof(msg)));
2016-06-27 10:24:03 +00:00
			if (errno)
				return -errno;
			else
				return -EINVAL;
		}
	}
2022-01-17 18:34:23 +00:00

	if (evlist__initialize_ctlfd(evlist, opts->ctl_fd, opts->ctl_fd_ack))
		return -1;

	ret = record__alloc_thread_data(rec, evlist);
	if (ret)
		return ret;

2022-01-17 18:34:28 +00:00
	if (record__threads_enabled(rec)) {
		ret = perf_data__create_dir(&rec->data, evlist->core.nr_mmaps);
2022-02-22 09:14:17 +00:00
		if (ret) {
			pr_err("Failed to create data directory: %s\n", strerror(-ret));
2022-01-17 18:34:28 +00:00
			return ret;
2022-02-22 09:14:17 +00:00
		}
2022-01-17 18:34:28 +00:00
		for (i = 0; i < evlist->core.nr_mmaps; i++) {
			if (evlist->mmap)
				evlist->mmap[i].file = &rec->data.dir.files[i];
			if (evlist->overwrite_mmap)
				evlist->overwrite_mmap[i].file = &rec->data.dir.files[i];
		}
	}

2016-06-27 10:24:03 +00:00
	return 0;
}

static int record__mmap(struct record *rec)
{
	return record__mmap_evlist(rec, rec->evlist);
}

2013-12-19 17:38:03 +00:00
static int record__open(struct record *rec)
2011-01-12 16:28:51 +00:00
{
2017-02-13 19:45:24 +00:00
	char msg[BUFSIZ];
2019-07-21 11:23:51 +00:00
	struct evsel *pos;
2019-07-21 11:23:52 +00:00
	struct evlist *evlist = rec->evlist;
2011-11-25 10:19:45 +00:00
	struct perf_session *session = rec->session;
2013-12-19 17:43:45 +00:00
	struct record_opts *opts = &rec->opts;
2012-08-26 18:24:47 +00:00
	int rc = 0;
2011-01-12 16:28:51 +00:00

2016-06-23 14:26:15 +00:00
	evlist__for_each_entry(evlist, pos) {
2011-01-12 16:28:51 +00:00
try_again:
2019-07-21 11:24:39 +00:00
		if (evsel__open(pos, pos->core.cpus, pos->core.threads) < 0) {
2023-11-21 00:04:20 +00:00
			if (evsel__fallback(pos, &opts->target, errno, msg, sizeof(msg))) {
2017-02-17 08:17:38 +00:00
				if (verbose > 0)
2012-12-13 17:16:30 +00:00
					ui__warning("%s\n", msg);
2010-03-18 14:36:05 +00:00
				goto try_again;
			}
perf record: Support weak groups
Implement a weak group fallback for 'perf record', similar to the
existing 'perf stat' support. This allows using groups that might be
larger than the number of available counters without failing.
Before:
$ perf record -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}' -a sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles).
/bin/dmesg | grep -i perf may provide additional information.
After:
$ ./perf record -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}:W' -a sleep 1
WARNING: No sample_id_all support, falling back to unordered processing
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 8.136 MB perf.data (134069 samples) ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181001195927.14211-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
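The retry logic that follows in record__open() can be illustrated with a self-contained simulation. Everything here is hypothetical: `struct fake_event`, `fake_open()` and `NR_FAKE_COUNTERS` stand in for perf_event_open() and the PMU, and opening fails with EINVAL once a group needs more counters than exist. When a member of a weak group (the `:W` modifier) fails, the group is split into standalone events and opening restarts from the old leader, mirroring the `goto try_again` after evlist__reset_weak_group().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FAKE_COUNTERS 4	/* assumed number of hardware counters */

struct fake_event {
	size_t leader;		/* index of the group leader (itself if standalone) */
	bool weak_group;	/* group was requested with :W */
	bool opened;
};

/* Stand-in for perf_event_open(): fail if this member does not fit. */
static int fake_open(const struct fake_event *evs, size_t idx)
{
	size_t pos_in_group = 0;

	for (size_t i = 0; i <= idx; i++)
		if (evs[i].leader == evs[idx].leader)
			pos_in_group++;
	if (pos_in_group > NR_FAKE_COUNTERS) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/* Make every member of idx's group standalone; return the old leader. */
static size_t reset_weak_group(struct fake_event *evs, size_t n, size_t idx)
{
	size_t leader = evs[idx].leader;

	for (size_t i = 0; i < n; i++) {
		if (evs[i].leader == leader) {
			evs[i].leader = i;
			evs[i].opened = false;	/* a real tool would close it here */
		}
	}
	return leader;
}

static int open_all(struct fake_event *evs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
try_again:
		if (fake_open(evs, i) < 0) {
			if ((errno == EINVAL || errno == EBADF) &&
			    evs[i].leader != i && evs[i].weak_group) {
				i = reset_weak_group(evs, n, i);
				goto try_again;
			}
			return -errno;
		}
		evs[i].opened = true;
	}
	return 0;
}
```

With seven grouped events and four counters, a weak group ends up as seven standalone events, while the same group without the weak flag fails with -EINVAL on its fifth member, similar to the "Before" and "After" runs above.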
2018-10-01 19:59:27 +00:00
			if ((errno == EINVAL || errno == EBADF) &&
2021-07-06 15:17:00 +00:00
			    pos->core.leader != &pos->core &&
2018-10-01 19:59:27 +00:00
			    pos->weak_group) {
2020-11-30 17:58:32 +00:00
				pos = evlist__reset_weak_group(evlist, pos, true);
2018-10-01 19:59:27 +00:00
				goto try_again;
			}
2012-12-13 18:10:58 +00:00
			rc = -errno;
2020-05-04 16:43:03 +00:00
			evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
2012-12-13 18:10:58 +00:00
			ui__error("%s\n", msg);
2012-08-26 18:24:47 +00:00
			goto out;
2009-10-15 03:22:07 +00:00
		}
2017-11-17 21:42:58 +00:00

		pos->supported = true;
2009-10-15 03:22:07 +00:00
	}
2010-12-25 14:12:25 +00:00

2020-11-30 18:07:49 +00:00
	if (symbol_conf.kptr_restrict && !evlist__exclude_kernel(evlist)) {
perf record: Move restricted maps check to after a possible fallback to not collect kernel samples
Before:
[acme@quaco ~]$ perf record -b -e cycles date
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid.
Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.
Samples in kernel modules won't be resolved at all.
If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.
Mon 23 Sep 2019 11:00:59 AM -03
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.005 MB perf.data (14 samples) ]
[acme@quaco ~]$
But we did a fallback and exclude_kernel was set, so there is no need
to resolve kernel symbols:
$ perf evlist -v
cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
$
After:
[acme@quaco ~]$ perf record -b -e cycles date
Mon 23 Sep 2019 11:07:18 AM -03
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.007 MB perf.data (16 samples) ]
[acme@quaco ~]$ perf evlist -v
cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
[acme@quaco ~]$
No needless warning is emitted.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lkml.kernel.org/n/tip-5yqnr8xcqwhr15xktj2097ac@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
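The fix is about ordering: the open-time fallback may set exclude_kernel on the events, and only after that is it known whether kernel symbols matter. A simplified sketch with hypothetical types and names (`struct fake_evsel`, `fallback_to_user_only()`, `all_exclude_kernel()`), where the warning is needed only if kptr_restrict is set and at least one event still samples the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_evsel {
	bool exclude_kernel;
};

/* Simplified stand-in for the open-time fallback that drops kernel sampling. */
static void fallback_to_user_only(struct fake_evsel *evs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		evs[i].exclude_kernel = true;
}

/* True only if every event excludes the kernel, like evlist__exclude_kernel(). */
static bool all_exclude_kernel(const struct fake_evsel *evs, size_t n)
{
	assert(evs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!evs[i].exclude_kernel)
			return false;
	return true;
}

static bool need_kptr_warning(bool kptr_restrict, const struct fake_evsel *evs, size_t n)
{
	return kptr_restrict && !all_exclude_kernel(evs, n);
}
```

Evaluated before the fallback, the check reports a warning; evaluated after it, the same event list no longer needs one, which matches the "Before" and "After" output above.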
2019-09-23 14:07:29 +00:00
		pr_warning(
"WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,\n"
"check /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid.\n\n"
"Samples in kernel functions may not be resolved if a suitable vmlinux\n"
"file is not found in the buildid cache or in the vmlinux path.\n\n"
"Samples in kernel modules won't be resolved at all.\n\n"
"If some relocation was applied (e.g. kexec) symbols may be misresolved\n"
"even with a suitable vmlinux or kallsyms file.\n\n");
	}

2024-07-03 22:30:29 +00:00
	if (evlist__apply_filters(evlist, &pos, &opts->target)) {
2017-06-27 14:22:31 +00:00
		pr_err("failed to set filter \"%s\" on event %s with %d (%s)\n",
2023-03-14 23:42:36 +00:00
			pos->filter ?: "BPF", evsel__name(pos), errno,
2016-07-06 14:56:20 +00:00
			str_error_r(errno, msg, sizeof(msg)));
2012-08-26 18:24:47 +00:00
		rc = -1;
2016-09-16 15:50:03 +00:00
		goto out;
	}

2016-06-27 10:24:03 +00:00
	rc = record__mmap(rec);
	if (rc)
2012-08-26 18:24:47 +00:00
		goto out;
2011-01-14 17:50:51 +00:00

2013-06-05 11:35:06 +00:00
	session->evlist = evlist;
2012-08-01 22:31:00 +00:00
	perf_session__set_id_hdr_size(session);
2012-08-26 18:24:47 +00:00
out:
	return rc;
2009-05-05 15:50:27 +00:00
}

2021-05-03 06:42:22 +00:00
static void set_timestamp_boundary(struct record *rec, u64 sample_time)
{
	if (rec->evlist->first_sample_time == 0)
		rec->evlist->first_sample_time = sample_time;

	if (sample_time)
		rec->evlist->last_sample_time = sample_time;
}

2024-08-12 20:46:55 +00:00
static int process_sample_event(const struct perf_tool *tool,
2015-01-29 08:06:44 +00:00
				union perf_event *event,
				struct perf_sample *sample,
2019-07-21 11:23:51 +00:00
				struct evsel *evsel,
2015-01-29 08:06:44 +00:00
				struct machine *machine)
{
	struct record *rec = container_of(tool, struct record, tool);

2021-05-03 06:42:22 +00:00
	set_timestamp_boundary(rec, sample->time);
2015-01-29 08:06:44 +00:00

perf record: Record the first and last sample time in the header
In the default 'perf record' configuration, all samples are processed
to create the HEADER_BUILD_ID table, so it is easy to get the
first/last samples and save their times to the perf file header via
the function write_sample_time().
Later, at post-processing time, perf report/script will fetch the time
from the perf file header.
Committer testing:
# perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.099 MB perf.data (1101 samples) ]
[root@jouet home]# perf report --header | grep "time of "
# time of first sample : 22947.909226
# time of last sample : 22948.910704
#
# perf report -D | grep PERF_RECORD_SAMPLE\(
0 22947909226101 0x20bb68 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa21b1af3 period: 1 addr: 0
0 22947909229928 0x20bb98 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa200d204 period: 1 addr: 0
<SNIP>
3 22948910397351 0x219360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 28251/28251: 0xffffffffa22071d8 period: 169518 addr: 0
0 22948910652380 0x20f120 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 198807 addr: 0
2 22948910704034 0x2172d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 88111 addr: 0
#
Changelog:
v7: Just update the patch description according to Arnaldo's suggestion.
v6: Currently '--buildid-all' is not enabled by default, so walking
all samples is the default operation. There is no big overhead in
calculating the timestamp boundary in the process_sample_event handler
once we already go through all samples, so the timestamp boundary
calculation is enabled by default when '--buildid-all' is not enabled.
If '--buildid-all' is enabled, we create a new option
"--timestamp-boundary" for the user to decide whether to enable the
timestamp boundary calculation.
v5: There is an issue that the sample walking can only work when
'--buildid-all' is not enabled. So we need to let the walking
work even if '--buildid-all' is enabled and let the
processing skip the dso hit marking in this case.
At first, I wanted to provide a new option "--record-time-boundaries",
but after consideration, I think a new option is not really
necessary.
v3: Remove the definitions of first_sample_time and last_sample_time
from struct record and directly save them in perf_evlist.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512738826-2628-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-12-08 13:13:42 +00:00
	if (rec->buildid_all)
		return 0;

	rec->samples++;
2015-01-29 08:06:44 +00:00
	return build_id__mark_dso_hit(tool, event, sample, evsel, machine);
}

2013-12-19 17:38:03 +00:00
static int process_buildids(struct record *rec)
2010-02-03 18:52:05 +00:00
{
2013-10-15 14:27:32 +00:00
	struct perf_session *session = rec->session;
2010-02-03 18:52:05 +00:00

2019-02-21 09:41:29 +00:00
	if (perf_data__size(&rec->data) == 0)
2010-03-11 18:53:11 +00:00
		return 0;

2014-11-04 01:14:32 +00:00
	/*
	 * During this process, it'll load kernel map and replace the
	 * dso->long_name to a real pathname it found. In this case
	 * we prefer the vmlinux path like
	 *   /lib/modules/3.16.4/build/vmlinux
	 *
	 * rather than build-id path (in debug directory).
	 *   $HOME/.debug/.build-id/f0/6e17aa50adf4d00b88925e03775de107611551
	 */
	symbol_conf.ignore_vmlinux_buildid = true;

2016-01-11 13:37:09 +00:00
	/*
	 * If --buildid-all is given, it marks all DSO regardless of hits,
2017-12-08 13:13:42 +00:00
	 * so no need to process samples. But if timestamp_boundary is enabled,
	 * it still needs to walk on all samples to get the timestamps of
	 * first/last samples.
2016-01-11 13:37:09 +00:00
	 */
2017-12-08 13:13:42 +00:00
	if (rec->buildid_all && !rec->timestamp_boundary)
2024-08-12 20:47:03 +00:00
		rec->tool.sample = process_event_sample_stub;
2016-01-11 13:37:09 +00:00

2015-03-03 14:58:45 +00:00
	return perf_session__process_events(session);
2010-02-03 18:52:05 +00:00
}

2011-01-29 16:01:45 +00:00
static void perf_event__synthesize_guest_os(struct machine *machine, void *data)
2010-04-19 05:32:50 +00:00
{
	int err;
2011-11-28 10:30:20 +00:00
	struct perf_tool *tool = data;
2010-04-19 05:32:50 +00:00
	/*
	 *As for guest kernel when processing subcommand record&report,
	 *we arrange module mmap prior to guest kernel mmap and trigger
	 *a preload dso because default guest module symbols are loaded
	 *from guest kallsyms instead of /lib/modules/XXX/XXX. This
	 *method is used to avoid symbol missing when the first addr is
	 *in module instead of in guest kernel.
	 */
2011-11-28 10:30:20 +00:00
	err = perf_event__synthesize_modules(tool, process_synthesized_event,
2011-11-28 09:56:39 +00:00
					     machine);
2010-04-19 05:32:50 +00:00
	if (err < 0)
		pr_err("Couldn't record guest kernel [%d]'s reference"
2010-04-28 00:17:50 +00:00
		       " relocation symbol.\n", machine->pid);
2010-04-19 05:32:50 +00:00

	/*
	 * We use _stext for guest kernel because guest kernel's /proc/kallsyms
	 * have no _text sometimes.
	 */
2011-11-28 10:30:20 +00:00
	err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
2014-01-29 14:14:40 +00:00
						 machine);
2010-04-19 05:32:50 +00:00
	if (err < 0)
		pr_err("Couldn't record guest kernel [%d]'s reference"
2010-04-28 00:17:50 +00:00
		       " relocation symbol.\n", machine->pid);
2010-04-19 05:32:50 +00:00
}

2010-05-02 20:05:29 +00:00
static struct perf_event_header finished_round_event = {
	.size = sizeof(struct perf_event_header),
	.type = PERF_RECORD_FINISHED_ROUND,
};

2022-06-10 11:33:15 +00:00
static struct perf_event_header finished_init_event = {
	.size = sizeof(struct perf_event_header),
	.type = PERF_RECORD_FINISHED_INIT,
};

2019-07-27 18:30:53 +00:00
static void record__adjust_affinity(struct record *rec, struct mmap *map)
2019-01-22 17:50:57 +00:00
{
	if (rec->opts.affinity != PERF_AFFINITY_SYS &&
2022-01-17 18:34:25 +00:00
	    !bitmap_equal(thread->mask->affinity.bits, map->affinity_mask.bits,
			  thread->mask->affinity.nbits)) {
		bitmap_zero(thread->mask->affinity.bits, thread->mask->affinity.nbits);
		bitmap_or(thread->mask->affinity.bits, thread->mask->affinity.bits,
			  map->affinity_mask.bits, thread->mask->affinity.nbits);
		sched_setaffinity(0, MMAP_CPU_MASK_BYTES(&thread->mask->affinity),
				  (cpu_set_t *)thread->mask->affinity.bits);
		if (verbose == 2) {
			pr_debug("threads[%d]: running on cpu%d: ", thread->tid, sched_getcpu());
			mmap_cpu_mask__scnprintf(&thread->mask->affinity, "affinity");
		}
2019-01-22 17:50:57 +00:00
	}
}

2019-03-18 17:43:35 +00:00
static size_t process_comp_header(void *record, size_t increment)
{
2019-08-28 13:57:16 +00:00
	struct perf_record_compressed *event = record;
2019-03-18 17:43:35 +00:00
	size_t size = sizeof(*event);

	if (increment) {
		event->header.size += increment;
		return increment;
	}

	event->header.type = PERF_RECORD_COMPRESSED;
	event->header.size = size;

	return size;
}

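process_comp_header() follows a two-phase callback protocol: called with an increment of 0 it writes a fresh PERF_RECORD_COMPRESSED header and returns the header size; called with a non-zero increment it grows the header's size field by the number of payload bytes appended. The sketch below exercises that protocol with hypothetical types and a plain copy in place of zstd (so nothing is actually compressed), splitting the input into records of at most `max_payload` bytes. `struct fake_header`, `FAKE_RECORD_COMPRESSED`, `fake_comp_header()` and `copy_stream_to_records()` are illustrative names, not perf's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAKE_RECORD_COMPRESSED 81	/* illustrative type value only */

struct fake_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Same protocol as process_comp_header(): 0 starts a record, else grows it. */
static size_t fake_comp_header(void *record, size_t increment)
{
	struct fake_header hdr;

	/* memcpy avoids unaligned access: records are packed back to back */
	if (increment) {
		memcpy(&hdr, record, sizeof(hdr));
		hdr.size += (uint16_t)increment;
		memcpy(record, &hdr, sizeof(hdr));
		return increment;
	}
	hdr.type = FAKE_RECORD_COMPRESSED;
	hdr.misc = 0;
	hdr.size = sizeof(hdr);
	memcpy(record, &hdr, sizeof(hdr));
	return sizeof(hdr);
}

/*
 * Copy 'src' into 'dst' as a sequence of records holding at most
 * max_payload bytes each. Returns bytes written, or 0 if dst is too small.
 */
static size_t copy_stream_to_records(void *dst, size_t dst_size,
				     const void *src, size_t src_size,
				     size_t max_payload,
				     size_t (*hdr_cb)(void *, size_t))
{
	unsigned char *out = dst;
	const unsigned char *in = src;
	size_t written = 0;

	assert(max_payload > 0);
	while (src_size > 0) {
		size_t chunk = src_size < max_payload ? src_size : max_payload;
		void *record = out + written;
		size_t hdr_size;

		if (dst_size - written < sizeof(struct fake_header) + chunk)
			return 0;
		hdr_size = hdr_cb(record, 0);			/* fresh header */
		memcpy(out + written + hdr_size, in, chunk);	/* "compress" = copy */
		hdr_cb(record, chunk);				/* account payload */
		written += hdr_size + chunk;
		in += chunk;
		src_size -= chunk;
	}
	return written;
}
```

Ten input bytes with a four-byte payload limit produce three records (payloads of 4, 4 and 2 bytes), each prefixed by its own header whose size field covers header plus payload.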
2023-11-02 17:56:46 +00:00
static ssize_t zstd_compress(struct perf_session *session, struct mmap *map,
2022-01-17 18:34:30 +00:00
			    void *dst, size_t dst_size, void *src, size_t src_size)
2019-03-18 17:43:35 +00:00
{
2023-11-02 17:56:46 +00:00
	ssize_t compressed;
2019-08-28 13:57:16 +00:00
	size_t max_record_size = PERF_SAMPLE_MAX_SIZE - sizeof(struct perf_record_compressed) - 1;
2022-01-17 18:34:30 +00:00
	struct zstd_data *zstd_data = &session->zstd_data;
2019-03-18 17:43:35 +00:00

2022-01-17 18:34:30 +00:00
	if (map && map->file)
		zstd_data = &map->zstd_data;

	compressed = zstd_compress_stream_to_records(zstd_data, dst, dst_size, src, src_size,
2019-03-18 17:43:35 +00:00
						     max_record_size, process_comp_header);
2023-11-02 17:56:46 +00:00
	if (compressed < 0)
		return compressed;
2019-03-18 17:43:35 +00:00

2022-01-17 18:34:31 +00:00
	if (map && map->file) {
		thread->bytes_transferred += src_size;
		thread->bytes_compressed += compressed;
	} else {
		session->bytes_transferred += src_size;
		session->bytes_compressed += compressed;
	}
2019-03-18 17:43:35 +00:00

	return compressed;
}

2019-07-21 11:23:52 +00:00
static int record__mmap_read_evlist(struct record *rec, struct evlist *evlist,
perf record: Implement --mmap-flush=<number> option
Implement a --mmap-flush option that specifies the minimal number of bytes
that is extracted from an mmaped kernel buffer to store into a trace. The
default value is 1 byte, which means that every time the trace writing
thread finds new data in an mmaped buffer, the data is extracted,
possibly compressed, and written to the trace.
$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc
The option is independent of the -z setting, does not vary with the
compression level, and serves two purposes.
The first purpose is to increase the compression ratio of trace data.
Larger data chunks compress more effectively, so the option lets the
user specify the size of the data chunk to compress. In addition,
because of per-syscall overhead, executing many write syscalls with
small buffers can take longer than executing fewer write syscalls with
larger buffers, so extracting larger chunks, as specified by the option
value, can also decrease runtime overhead.
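The trade-off described above (fewer, larger writes versus many small ones) can be illustrated with a minimal C sketch. This is not perf's implementation: the mmaped buffer is modeled as a plain counter of pending bytes, each drain stands for one write() call, and the names `struct flush_stats` and `simulate_flush()` are hypothetical. The reader drains only once at least `flush` bytes are pending, with one unconditional drain at the end of the collection.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not perf code: the mmaped ring buffer is reduced to a
 * counter of pending bytes, and each "write" stands for one write() syscall.
 */
struct flush_stats {
	size_t writes;    /* simulated write() calls, including the final drain */
	size_t total;     /* total bytes written */
	size_t min_write; /* smallest threshold-triggered write (SIZE_MAX if none) */
};

/*
 * Feed n chunks of new data into the buffer. After each arrival the reader
 * drains everything pending, but only once at least 'flush' bytes have
 * accumulated. A final unconditional drain writes out whatever is left.
 */
static struct flush_stats simulate_flush(const size_t *chunks, size_t n,
					 size_t flush)
{
	struct flush_stats st = { 0, 0, (size_t)-1 };
	size_t pending = 0;

	assert(flush >= 1); /* 1 byte is the default and the minimum */

	for (size_t i = 0; i < n; i++) {
		pending += chunks[i];
		if (pending >= flush) {
			st.writes++;
			st.total += pending;
			if (pending < st.min_write)
				st.min_write = pending;
			pending = 0;
		}
	}
	if (pending > 0) { /* end of collection: drain below the threshold */
		st.writes++;
		st.total += pending;
	}
	return st;
}
```

For 100 arrivals of 128 bytes each, a threshold of 1 yields 100 writes of 128 bytes, while a threshold of 1024 yields 12 writes of 1024 bytes plus a final 512-byte drain: the same 12800 bytes in 13 calls.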
The second purpose is to avoid a self-monitoring live-lock in
system-wide (-a) profiling mode. Profiling in system-wide mode with
compression (-a -z) can feed additional data into the kernel buffers
along with the data from the monitored processes. If the rate and volume
of performance data from the monitored processes are high, the trace
streaming and compression activity in the tool is also high. That
activity can lead to a subtle live-lock, where compressing a single new
byte from one mmaped kernel buffer generates the next single byte in
some mmaped buffer, so the perf tool process ends up endlessly
monitoring itself.
The synch parameter is the means to force data to be moved regardless
of the specified flush threshold. Whatever flush value is provided, the
tool needs the ability to drain the memory buffers unconditionally, at
least at the end of the collection.
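The synch override amounts to saving the configured threshold, forcing it to 1 so that any pending byte qualifies, and restoring it afterwards. The sketch below models that pattern in isolation; `struct toy_map`, `toy_push()`, and `read_map()` are hypothetical stand-ins rather than perf APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one mmaped buffer: pending bytes and threshold. */
struct toy_map {
	size_t pending; /* bytes currently available in the buffer */
	size_t flush;   /* configured --mmap-flush threshold */
	size_t written; /* bytes drained so far */
};

/* Drain the buffer only if the threshold is met; returns bytes drained. */
static size_t toy_push(struct toy_map *map)
{
	size_t n = 0;

	assert(map->flush >= 1);
	if (map->pending >= map->flush) {
		n = map->pending;
		map->written += n;
		map->pending = 0;
	}
	return n;
}

/*
 * Read one map. When 'synch' is set, temporarily lower the threshold to 1
 * so any pending data is drained, then restore the configured value.
 */
static size_t read_map(struct toy_map *map, bool synch)
{
	size_t saved = 0;
	size_t n;

	if (synch) {
		saved = map->flush;
		map->flush = 1;
	}
	n = toy_push(map);
	if (synch)
		map->flush = saved;
	return n;
}
```

In builtin-record.c, record__mmap_read_evlist() applies the same save, force, and restore sequence to `map->core.flush` when `synch` is true, including on the error path after a failed push.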
Committer testing:
Running with the default value, i.e. consuming as soon as there is
something to read, we first write the synthesized events in small
chunks of about 128 bytes:
# perf trace -m 2048 --call-graph dwarf -e write -- perf record
<SNIP>
101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
Then we move on to reading the mmap buffers, consuming the events put
there by the kernel perf infrastructure:
107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
<SNIP same backtrace as in the 107.561 timestamp>
12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
<SNIP same backtrace as in the 107.561 timestamp>
12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
<SNIP same backtrace as in the 107.561 timestamp>
If we limit it to write only when more than 16MB are available for
reading, the tool throttles that to a quarter of the mmap size set with
--mmap-pages for 'perf record', which by default comes to 528384 bytes,
as found using 'record -v':
mmap flush: 132096
mmap size 528384B
With that in place, all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers set up by the
kernel perf infrastructure, were at least 132096 bytes long.
Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760
The flush value was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704B
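The quarter-of-the-mmap-size limit can be checked against the numbers reported by 'record -v'. The sketch assumes 4096-byte pages and a mapping made of the requested data pages plus one header page (128 data pages in the first run, 2048 with -m 2048); `mmap_size_bytes()` and `clamp_flush()` are hypothetical helpers, not perf functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: the reported mmap size is (data pages + 1 header page) times
 * the page size. This reproduces 528384 and 8392704 bytes for 128 and 2048
 * data pages of 4096 bytes.
 */
static size_t mmap_size_bytes(size_t data_pages, size_t page_size)
{
	return (data_pages + 1) * page_size;
}

/* Hypothetical helper: cap a requested flush threshold at mmap_size / 4. */
static size_t clamp_flush(size_t requested, size_t mmap_size)
{
	size_t flush_max = mmap_size / 4;

	assert(mmap_size >= 4);
	return requested > flush_max ? flush_max : requested;
}
```

With a 16MB request, the cap gives 528384 / 4 = 132096 and 8392704 / 4 = 2098176 bytes, matching the 'mmap flush' values printed by 'record -v'; a request below the cap, such as 1024, passes through unchanged.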
A warning about that would be good to have but can be added later,
something like:
"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"
Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:
cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-18 17:40:26 +00:00
				    bool overwrite, bool synch)
2010-05-02 20:05:29 +00:00
{
2014-07-25 14:56:16 +00:00
	u64 bytes_written = rec->bytes_written;
2010-05-20 12:45:26 +00:00
	int i;
2012-08-26 18:24:47 +00:00
	int rc = 0;
2022-01-17 18:34:25 +00:00
	int nr_mmaps;
	struct mmap **maps;
2018-11-06 09:04:58 +00:00
	int trace_fd = rec->data.file.fd;
2019-03-18 17:44:12 +00:00
	off_t off = 0;
2010-05-02 20:05:29 +00:00

2016-06-27 10:24:04 +00:00
	if (!evlist)
		return 0;
2015-04-09 15:53:45 +00:00

2022-01-17 18:34:25 +00:00
	nr_mmaps = thread->nr_mmaps;
	maps = overwrite ? thread->overwrite_maps : thread->maps;

2016-07-14 08:34:36 +00:00
	if (!maps)
		return 0;

2017-12-04 16:51:07 +00:00
	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
2016-07-14 08:34:42 +00:00
		return 0;

2018-11-06 09:04:58 +00:00
	if (record__aio_enabled(rec))
		off = record__aio_get_pos(trace_fd);

2022-01-17 18:34:25 +00:00
	for (i = 0; i < nr_mmaps; i++) {
2019-03-18 17:40:26 +00:00
		u64 flush = 0;
2022-01-17 18:34:25 +00:00
		struct mmap *map = maps[i];
2016-06-27 10:24:04 +00:00

2019-07-27 20:07:44 +00:00
		if (map->core.base) {
2019-01-22 17:50:57 +00:00
			record__adjust_affinity(rec, map);
2019-03-18 17:40:26 +00:00
			if (synch) {
2019-08-27 14:05:18 +00:00
				flush = map->core.flush;
				map->core.flush = 1;
2019-03-18 17:40:26 +00:00
			}
2018-11-06 09:04:58 +00:00
			if (!record__aio_enabled(rec)) {
2019-03-18 17:44:12 +00:00
				if (perf_mmap__push(map, rec, record__pushfn) < 0) {
perf record: Implement --mmap-flush=<number> option
Implement a --mmap-flush option that specifies minimal number of bytes
that is extracted from mmaped kernel buffer to store into a trace. The
default option value is 1 byte what means every time trace writing
thread finds some new data in the mmaped buffer the data is extracted,
possibly compressed and written to a trace.
$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc
The option is independent from -z setting, doesn't vary with compression
level and can serve two purposes.
The first purpose is to increase the compression ratio of a trace data.
Larger data chunks are compressed more effectively so the implemented
option allows specifying data chunk size to compress. Also at some cases
executing more write syscalls with smaller data size can take longer
than executing less write syscalls with bigger data size due to syscall
overhead so extracting bigger data chunks specified by the option value
could additionally decrease runtime overhead.
The second purpose is to avoid self monitoring live-lock issue in system
wide (-a) profiling mode. Profiling in system wide mode with compression
(-a -z) can additionally induce data into the kernel buffers along with
the data from monitored processes. If performance data rate and volume
from the monitored processes is high then trace streaming and
compression activity in the tool is also high. High tool process
activity can lead to subtle live-lock effect when compression of single
new byte from some of mmaped kernel buffer leads to generation of the
next single byte at some mmaped buffer. So perf tool process ends up in
endless self monitoring.
Implemented synch parameter is the mean to force data move independently
from the specified flush threshold value. Despite the provided flush
value the tool needs capability to unconditionally drain memory buffers,
at least in the end of the collection.
Committer testing:
Running with the default value, i.e. as soon as there is something to
read go on consuming, we first write the synthesized events, small
chunks of about 128 bytes:
# perf trace -m 2048 --call-graph dwarf -e write -- perf record
<SNIP>
101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
Then we move to reading the mmap buffers consuming the events put there
by the kernel perf infrastructure:
107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
<SNIP same backtrace as in the 107.561 timestamp>
12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
<SNIP same backtrace as in the 107.561 timestamp>
12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
<SNIP same backtrace as in the 107.561 timestamp>
If we limit it to write only when more than 16MB are available for
reading, it throttles that to a quarter of the --mmap-pages set for
'perf record', which by default get to 528384 bytes, found out using
'record -v':
mmap flush: 132096
mmap size 528384B
With that in place all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers setup by the
kernel perf infrastructure were at least 132096 bytes long.
Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760
Was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704B
A warning about that would be good to have but can be added later,
something like:
"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"
Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:
cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
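A minimal illustration of the clash, assuming a toolchain that, like the one above, builds with -Wshadow and -Werror: <unistd.h> declares the sync() function, so a parameter named sync shadows it there, while synch does not. flush_threshold() is a hypothetical function used only to show the naming:

```c
#include <stdbool.h>
#include <unistd.h> /* declares the sync() function */

/*
 * Hypothetical function: naming the parameter 'synch' instead of
 * 'sync' avoids shadowing sync() from <unistd.h>.
 */
int flush_threshold(bool synch, int configured_flush)
{
	/* When synchronizing, drain everything: a 1-byte threshold. */
	return synch ? 1 : configured_flush;
}
```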
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-18 17:40:26 +00:00
					if (synch)
2019-08-27 14:05:18 +00:00
						map->core.flush = flush;
2018-11-06 09:04:58 +00:00
					rc = -1;
					goto out;
				}
			} else {
2019-03-18 17:44:12 +00:00
				if (record__aio_push(rec, map, &off) < 0) {
2018-11-06 09:04:58 +00:00
					record__aio_set_pos(trace_fd, off);
2019-03-18 17:40:26 +00:00
					if (synch)
2019-08-27 14:05:18 +00:00
						map->core.flush = flush;
2018-11-06 09:04:58 +00:00
					rc = -1;
					goto out;
				}
2012-08-26 18:24:47 +00:00
			}
2019-03-18 17:40:26 +00:00
			if (synch)
2019-08-27 14:05:18 +00:00
				map->core.flush = flush;
2012-08-26 18:24:47 +00:00
		}
2015-04-09 15:53:45 +00:00

2018-09-13 12:54:05 +00:00
		if (map->auxtrace_mmap.base && !rec->opts.auxtrace_snapshot_mode &&
2019-11-15 12:42:16 +00:00
		    !rec->opts.auxtrace_sample_mode &&
2018-09-13 12:54:05 +00:00
		    record__auxtrace_mmap_read(rec, map) != 0) {
2015-04-09 15:53:45 +00:00
			rc = -1;
			goto out;
		}
2010-05-02 20:05:29 +00:00
	}

2018-11-06 09:04:58 +00:00
	if (record__aio_enabled(rec))
		record__aio_set_pos(trace_fd, off);

2014-07-25 14:56:16 +00:00
	/*
	 * Mark the round finished in case we wrote
	 * at least one event.
2022-01-17 18:34:28 +00:00
	 *
	 * No need for round events in directory mode,
	 * because per-cpu maps and files have data
	 * sorted by kernel.
2014-07-25 14:56:16 +00:00
	 */
2022-01-17 18:34:28 +00:00
	if (!record__threads_enabled(rec) && bytes_written != rec->bytes_written)
2018-09-13 12:54:06 +00:00
		rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
2012-08-26 18:24:47 +00:00

2017-12-04 16:51:07 +00:00
	if (overwrite)
2020-11-30 12:33:55 +00:00
		evlist__toggle_bkw_mmap(evlist, BKW_MMAP_EMPTY);
2012-08-26 18:24:47 +00:00
out:
	return rc;
2010-05-02 20:05:29 +00:00
}

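The restores of map->core.flush under `if (synch)` in the listing above imply that the flush threshold is temporarily overridden while synch is set, so that buffers are drained regardless of --mmap-flush; afterwards a finished-round event is written only if something was written and threads (directory) mode is off. A toy model of that control flow, assuming the override is a 1-byte threshold (the value the commit describes as draining on every pass); the types and names are hypothetical, and the AIO, affinity and auxtrace paths are left out:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU ring buffer (not perf's struct mmap). */
struct toy_map {
	size_t pending; /* bytes produced by the kernel, not yet read */
	size_t flush;   /* minimal chunk size before data is pushed */
};

/* Push pending data only if it reaches the threshold; return bytes written. */
static size_t toy_push(struct toy_map *map)
{
	size_t n;

	if (map->pending < map->flush)
		return 0; /* below threshold: leave it for a later pass */
	n = map->pending;
	map->pending = 0;
	return n;
}

/*
 * One read pass over all maps. With synch set, each threshold is
 * dropped to 1 byte so everything pending is drained, then restored.
 */
size_t toy_read_all(struct toy_map *maps, size_t nr_maps, bool synch,
		    bool threads_mode, size_t *rounds)
{
	size_t written = 0;

	for (size_t i = 0; i < nr_maps; i++) {
		size_t saved = maps[i].flush;

		if (synch)
			maps[i].flush = 1;
		written += toy_push(&maps[i]);
		if (synch)
			maps[i].flush = saved;
	}

	/* Count a finished round only if data was written and not in threads mode. */
	if (!threads_mode && written > 0)
		(*rounds)++;

	return written;
}
```

A normal pass leaves small leftovers in place; a final pass with synch set picks them up while leaving each map's configured threshold untouched.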
2019-03-18 17:40:26 +00:00
static int record__mmap_read_all(struct record *rec, bool synch)
2016-06-27 10:24:04 +00:00
{
	int err;

2019-03-18 17:40:26 +00:00
	err = record__mmap_read_evlist(rec, rec->evlist, false, synch);
2016-06-27 10:24:04 +00:00
	if (err)
		return err;

perf record: Implement --mmap-flush=<number> option
Implement a --mmap-flush option that specifies minimal number of bytes
that is extracted from mmaped kernel buffer to store into a trace. The
default option value is 1 byte what means every time trace writing
thread finds some new data in the mmaped buffer the data is extracted,
possibly compressed and written to a trace.
$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc
The option is independent from -z setting, doesn't vary with compression
level and can serve two purposes.
The first purpose is to increase the compression ratio of a trace data.
Larger data chunks are compressed more effectively so the implemented
option allows specifying data chunk size to compress. Also at some cases
executing more write syscalls with smaller data size can take longer
than executing less write syscalls with bigger data size due to syscall
overhead so extracting bigger data chunks specified by the option value
could additionally decrease runtime overhead.
The second purpose is to avoid self monitoring live-lock issue in system
wide (-a) profiling mode. Profiling in system wide mode with compression
(-a -z) can additionally induce data into the kernel buffers along with
the data from monitored processes. If performance data rate and volume
from the monitored processes is high then trace streaming and
compression activity in the tool is also high. High tool process
activity can lead to subtle live-lock effect when compression of single
new byte from some of mmaped kernel buffer leads to generation of the
next single byte at some mmaped buffer. So perf tool process ends up in
endless self monitoring.
Implemented synch parameter is the mean to force data move independently
from the specified flush threshold value. Despite the provided flush
value the tool needs capability to unconditionally drain memory buffers,
at least in the end of the collection.
Committer testing:
Running with the default value, i.e. as soon as there is something to
read go on consuming, we first write the synthesized events, small
chunks of about 128 bytes:
# perf trace -m 2048 --call-graph dwarf -e write -- perf record
<SNIP>
101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
Then we move to reading the mmap buffers consuming the events put there
by the kernel perf infrastructure:
107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
<SNIP same backtrace as in the 107.561 timestamp>
12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
<SNIP same backtrace as in the 107.561 timestamp>
12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
<SNIP same backtrace as in the 107.561 timestamp>
If we limit it to write only when more than 16MB are available for
reading, it throttles that to a quarter of the --mmap-pages set for
'perf record', which by default get to 528384 bytes, found out using
'record -v':
mmap flush: 132096
mmap size 528384B
With that in place, all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers set up by the
kernel perf infrastructure, were at least 132096 bytes long.
Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760
The flush value was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704B
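Both 'record -v' outputs show the effective flush value as exactly one quarter of the mmap size: 528384 / 4 = 132096 and 8392704 / 4 = 2098176. The sketch below reproduces that clamping. The helper name `clamp_mmap_flush()` is hypothetical, and the code is not taken from perf's option parsing.

```c
#include <stddef.h>

/*
 * Hypothetical helper: limit the requested flush threshold to a quarter
 * of the mmap buffer size, reproducing the values printed by
 * 'perf record -v' above.
 */
static size_t clamp_mmap_flush(size_t requested, size_t mmap_size)
{
	size_t flush_max = mmap_size / 4;

	return requested > flush_max ? flush_max : requested;
}
```

The warning suggested below could be printed in the branch where the requested value gets reduced.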
A warning about that would be good to have but can be added later,
something like:
"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"
Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:
cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-18 17:40:26 +00:00
	return record__mmap_read_evlist(rec, rec->evlist, true, synch);
2016-06-27 10:24:04 +00:00
}

2022-01-17 18:34:25 +00:00
static void record__thread_munmap_filtered(struct fdarray *fda, int fd,
					   void *arg __maybe_unused)
{
	struct perf_mmap *map = fda->priv[fd].ptr;

	if (map)
		perf_mmap__put(map);
}

2022-01-17 18:34:27 +00:00
static void *record__thread(void *arg)
{
	enum thread_msg msg = THREAD_MSG__READY;
	bool terminate = false;
	struct fdarray *pollfd;
	int err, ctlfd_pos;

	thread = arg;
	thread->tid = gettid();

	err = write(thread->pipes.ack[1], &msg, sizeof(msg));
	if (err == -1)
		pr_warning("threads[%d]: failed to notify on start: %s\n",
			   thread->tid, strerror(errno));

	pr_debug("threads[%d]: started on cpu%d\n", thread->tid, sched_getcpu());

	pollfd = &thread->pollfd;
	ctlfd_pos = thread->ctlfd_pos;

	for (;;) {
		unsigned long long hits = thread->samples;

		if (record__mmap_read_all(thread->rec, false) < 0 || terminate)
			break;

		if (hits == thread->samples) {

			err = fdarray__poll(pollfd, -1);
			/*
			 * Propagate error, only if there's any. Ignore positive
			 * number of returned events and interrupt error.
			 */
			if (err > 0 || (err < 0 && errno == EINTR))
				err = 0;
			thread->waking++;

			if (fdarray__filter(pollfd, POLLERR | POLLHUP,
					    record__thread_munmap_filtered, NULL) == 0)
				break;
		}

		if (pollfd->entries[ctlfd_pos].revents & POLLHUP) {
			terminate = true;
			close(thread->pipes.msg[0]);
			thread->pipes.msg[0] = -1;
			pollfd->entries[ctlfd_pos].fd = -1;
			pollfd->entries[ctlfd_pos].events = 0;
		}

		pollfd->entries[ctlfd_pos].revents = 0;
	}
	record__mmap_read_all(thread->rec, true);

	err = write(thread->pipes.ack[1], &msg, sizeof(msg));
	if (err == -1)
		pr_warning("threads[%d]: failed to notify on termination: %s\n",
			   thread->tid, strerror(errno));

	return NULL;
}
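The worker loop above relies on two POSIX poll() behaviours. First, a positive return (the number of ready descriptors) and an EINTR interruption are not errors. Second, a read end whose writers have all closed reports POLLHUP. The sketch below shows both with plain poll() rather than perf's fdarray wrapper, and the helper names are hypothetical. That the main thread signals termination by closing the write end of the msg pipe is an inference from the POLLHUP check on `pipes.msg[0]`; that side is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Same normalization as in record__thread(): a positive poll() result
 * (number of ready fds) and an EINTR interruption are not errors.
 */
static int normalize_poll_result(int err, int saved_errno)
{
	if (err > 0 || (err < 0 && saved_errno == EINTR))
		return 0;
	return err;
}

/*
 * Once every write end of a pipe is closed, poll() reports POLLHUP on the
 * read end. Returns the read end's revents, or -1 on setup failure.
 */
static int control_pipe_revents_after_close(void)
{
	int fds[2];
	struct pollfd pfd;
	int revents;

	if (pipe(fds) == -1)
		return -1;
	close(fds[1]);			/* the "controller" drops the write end */

	pfd.fd = fds[0];
	pfd.events = POLLIN;
	pfd.revents = 0;
	if (poll(&pfd, 1, 0) < 0)	/* zero timeout: just sample the state */
		revents = -1;
	else
		revents = pfd.revents;
	close(fds[0]);
	return revents;
}
```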

2013-12-19 17:38:03 +00:00
static void record__init_features(struct record *rec)
2013-11-06 18:41:34 +00:00
{
	struct perf_session *session = rec->session;
	int feat;

	for (feat = HEADER_FIRST_FEATURE; feat < HEADER_LAST_FEATURE; feat++)
		perf_header__set_feat(&session->header, feat);

	if (rec->no_buildid)
		perf_header__clear_feat(&session->header, HEADER_BUILD_ID);

perf build: Use libtraceevent from the system
Remove the LIBTRACEEVENT_DYNAMIC and LIBTRACEFS_DYNAMIC make command
line variables.
If libtraceevent isn't installed or NO_LIBTRACEEVENT=1 is passed to the
build, don't compile in libtraceevent and libtracefs support.
This also disables CONFIG_TRACE that controls "perf trace".
CONFIG_LIBTRACEEVENT is used to control enablement in Build/Makefiles,
HAVE_LIBTRACEEVENT is used in C code.
Without HAVE_LIBTRACEEVENT tracepoints are disabled and as such the
commands kmem, kwork, lock, sched and timechart are removed. The
majority of commands continue to work including "perf test".
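The split between the build-system variable and the C macro can be illustrated with a small, entirely hypothetical helper. `command_available()` does not exist in perf. The macro name HAVE_LIBTRACEEVENT and the list of affected commands are taken from the paragraph above. When the file is compiled without -DHAVE_LIBTRACEEVENT, the tracepoint-based commands report as unavailable.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical illustration: Makefiles decide via CONFIG_LIBTRACEEVENT
 * whether tracepoint code is built, while C code tests the
 * HAVE_LIBTRACEEVENT macro (passed as -DHAVE_LIBTRACEEVENT).
 */
static bool command_available(const char *name)
{
#ifndef HAVE_LIBTRACEEVENT
	/* Commands that need tracepoints, per the list above. */
	static const char *const needs_tracepoints[] = {
		"kmem", "kwork", "lock", "sched", "timechart",
	};
	size_t i;

	for (i = 0; i < sizeof(needs_tracepoints) / sizeof(needs_tracepoints[0]); i++) {
		if (strcmp(name, needs_tracepoints[i]) == 0)
			return false;
	}
#else
	(void)name;
#endif
	return true;
}
```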
Committer notes:
Fixed up a tools/perf/util/Build reject and added:
#include <traceevent/event-parse.h>
to tools/perf/util/scripting-engines/trace-event-perl.c.
Committer testing:
$ rpm -qi libtraceevent-devel
Name : libtraceevent-devel
Version : 1.5.3
Release : 2.fc36
Architecture: x86_64
Install Date: Mon 25 Jul 2022 03:20:19 PM -03
Group : Unspecified
Size : 27728
License : LGPLv2+ and GPLv2+
Signature : RSA/SHA256, Fri 15 Apr 2022 02:11:58 PM -03, Key ID 999f7cbf38ab71f4
Source RPM : libtraceevent-1.5.3-2.fc36.src.rpm
Build Date : Fri 15 Apr 2022 10:57:01 AM -03
Build Host : buildvm-x86-05.iad2.fedoraproject.org
Packager : Fedora Project
Vendor : Fedora Project
URL : https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git/
Bug URL : https://bugz.fedoraproject.org/libtraceevent
Summary : Development headers of libtraceevent
Description :
Development headers of libtraceevent-libs
$
Default build:
$ ldd ~/bin/perf | grep tracee
libtraceevent.so.1 => /lib64/libtraceevent.so.1 (0x00007f1dcaf8f000)
$
# perf trace -e sched:* --max-events 10
0.000 migration/0/17 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, dest_cpu: 1)
0.005 migration/0/17 sched:sched_wake_idle_without_ipi(cpu: 1)
0.011 migration/0/17 sched:sched_switch(prev_comm: "", prev_pid: 17 (migration/0), prev_state: 1, next_comm: "", next_prio: 120)
1.173 :0/0 sched:sched_wakeup(comm: "", pid: 3138 (gnome-terminal-), prio: 120)
1.180 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 3138 (gnome-terminal-), next_prio: 120)
0.156 migration/1/21 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, orig_cpu: 1, dest_cpu: 2)
0.160 migration/1/21 sched:sched_wake_idle_without_ipi(cpu: 2)
0.166 migration/1/21 sched:sched_switch(prev_comm: "", prev_pid: 21 (migration/1), prev_state: 1, next_comm: "", next_prio: 120)
1.183 :0/0 sched:sched_wakeup(comm: "", pid: 1602985 (kworker/u16:0-f), prio: 120, target_cpu: 1)
1.186 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 1602985 (kworker/u16:0-f), next_prio: 120)
#
Had to tweak tools/perf/util/setup.py to make sure the python binding
shared object links with libtraceevent if -DHAVE_LIBTRACEEVENT is
present in CFLAGS.
Building with NO_LIBTRACEEVENT=1 uncovered some more build failures:
- Make building of data-convert-bt.c depend on CONFIG_LIBTRACEEVENT=y
- perf-$(CONFIG_LIBTRACEEVENT) += scripts/
- bpf_kwork.o needs also to be dependent on CONFIG_LIBTRACEEVENT=y
- The python binding needed some fixups and util/trace-event.c can't be
built and linked with the python binding shared object, so remove it
in tools/perf/util/setup.py and exclude it from the list of
dependencies in the python/perf.so Makefile.perf target.
Building without libtraceevent-devel installed uncovered more build
failures:
- The python binding tools/perf/util/python.c was assuming that
traceevent/parse-events.h was always available, which was the case
when we defaulted to using the in-kernel tools/lib/traceevent/ files,
now we need to enclose it under ifdef HAVE_LIBTRACEEVENT, just like
the other parts of it that deal with tracepoints.
- We have to ifdef the rules in the Build files with
CONFIG_LIBTRACEEVENT=y to build builtin-trace.c and
tools/perf/trace/beauty/ as we only ifdef setting CONFIG_TRACE=y when
setting NO_LIBTRACEEVENT=1 in the make command line, not when we don't
detect libtraceevent-devel installed in the system. Simplifying this to
avoid these two ways of disabling builtin-trace.c, and to not set
CONFIG_TRACE=y when libtraceevent-devel isn't installed, is the clean
way.
From Athira:
<quote>
tools/perf/arch/powerpc/util/Build
-perf-y += kvm-stat.o
+perf-$(CONFIG_LIBTRACEEVENT) += kvm-stat.o
</quote>
Then, ditto for arm64 and s390, detected by container cross build tests.
- s390 uses test__checkevent_tracepoint(), which is now only available if
HAVE_LIBTRACEEVENT is defined; enclose the callsite with ifdef HAVE_LIBTRACEEVENT.
Also from Athira:
<quote>
With this change, I could successfully compile in these environment:
- Without libtraceevent-devel installed
- With libtraceevent-devel installed
- With “make NO_LIBTRACEEVENT=1”
</quote>
Then, finally rename CONFIG_TRACEEVENT to CONFIG_LIBTRACEEVENT for
consistency with other libraries detected in tools/perf/.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20221205225940.3079667-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-05 22:59:39 +00:00
#ifdef HAVE_LIBTRACEEVENT
2019-07-21 11:24:23 +00:00
	if (!have_tracepoints(&rec->evlist->core.entries))
2013-11-06 18:41:34 +00:00
		perf_header__clear_feat(&session->header, HEADER_TRACING_DATA);
2022-12-05 22:59:39 +00:00
#endif
2013-11-06 18:41:34 +00:00

	if (!rec->opts.branch_stack)
		perf_header__clear_feat(&session->header, HEADER_BRANCH_STACK);
2015-04-09 15:53:45 +00:00

	if (!rec->opts.full_auxtrace)
		perf_header__clear_feat(&session->header, HEADER_AUXTRACE);
2015-10-25 14:51:43 +00:00

perf record: Encode -k clockid frequency into Perf trace
Store the -k clockid frequency in the perf trace to enable conversion of
timestamp-derived metrics into wall clock time at the reporting stage.
Below is an example of perf report output:
tools/perf/perf record -k raw -- ../../matrix/linux/matrix.gcc
...
[ perf record: Captured and wrote 31.222 MB perf.data (818054 samples) ]
tools/perf/perf report --header
# ========
...
# event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, use_clockid = 1, clockid = 4
...
# clockid frequency: 1000 MHz
...
# ========
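For a clock with a 1 ns resolution, the tick rate is 10^9 Hz, i.e. the 1000 MHz shown above. (clockid = 4 in the event line is CLOCK_MONOTONIC_RAW on Linux.) The sketch below shows that arithmetic and queries a resolution with clock_getres(). The helper names are hypothetical, and the formula is the physical conversion, not necessarily the one perf's header printer uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Hypothetical helper: convert a clock resolution in nanoseconds into a
 * tick rate in MHz. A 1 ns resolution corresponds to 1000 MHz.
 */
static double clockid_freq_mhz(long res_ns)
{
	return res_ns > 0 ? 1000.0 / (double)res_ns : 0.0;
}

/* Query a clock's resolution via clock_getres(); returns -1 on error. */
static long clockid_res_ns(clockid_t clk)
{
	struct timespec res;

	if (clock_getres(clk, &res) != 0)
		return -1;
	return res.tv_sec * 1000000000L + res.tv_nsec;
}
```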
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/23a4a1dc-b160-85a0-347d-40a2ed6d007b@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-09 14:36:24 +00:00
	if (!(rec->opts.use_clockid && rec->opts.clockid_res_ns))
		perf_header__clear_feat(&session->header, HEADER_CLOCKID);

perf header: Store clock references for -k/--clockid option
Add a new CLOCK_DATA feature that stores reference times when the
-k/--clockid option is specified.
It contains the clock id and its reference time together with the wall
clock time taken at the 'same time'; both values are in nanoseconds.
The format of data is as below:
struct {
u32 version; /* version = 1 */
u32 clockid;
u64 wall_clock_ns;
u64 clockid_time_ns;
};
These clock reference times will be used in following changes to display
wall clock time for perf events.
It's available only when recording with a clockid specified, because
that's the only case where we can get a reference time relative to wall
clock time. We can't do that with perf clock yet.
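The record layout above, and the conversion it enables, can be sketched as follows. The struct fields follow the format given in this message. The struct name, the helper names, and the use of CLOCK_REALTIME as the wall clock source are assumptions for illustration. The conversion adds the event's offset from the clockid reference to the wall-clock reference.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Layout described above for the CLOCK_DATA feature (version 1). */
struct clock_data_v1 {
	uint32_t version;	/* version = 1 */
	uint32_t clockid;
	uint64_t wall_clock_ns;
	uint64_t clockid_time_ns;
};

static uint64_t ts_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 1000000000ULL + (uint64_t)ts->tv_nsec;
}

/* Take the two reference times back to back; returns 0 on success. */
static int clock_data_capture(struct clock_data_v1 *cd, clockid_t clk)
{
	struct timespec wall, ref;

	if (clock_gettime(CLOCK_REALTIME, &wall) || clock_gettime(clk, &ref))
		return -1;
	cd->version = 1;
	cd->clockid = (uint32_t)clk;
	cd->wall_clock_ns = ts_to_ns(&wall);
	cd->clockid_time_ns = ts_to_ns(&ref);
	return 0;
}

/*
 * Map a timestamp taken with the same clockid to wall clock time. Unsigned
 * wrap-around makes this also work for t earlier than the reference.
 */
static uint64_t clock_data_to_wall_ns(const struct clock_data_v1 *cd, uint64_t t)
{
	return cd->wall_clock_ns + (t - cd->clockid_time_ns);
}
```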
Committer testing:
$ perf record -h -k
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-k, --clockid <clockid>
clockid to use for events, see clock_gettime()
$ perf record -k monotonic sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
$ perf report --header-only | grep clockid -A1
# event : name = cycles:u, , id = { 88815, 88816, 88817, 88818, 88819, 88820, 88821, 88822 }, size = 120, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, read_format = ID, disabled = 1, inherit = 1, exclude_kernel = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, use_clockid = 1, ksymbol = 1, bpf_event = 1, clockid = 1
# CPU_TOPOLOGY info available, use -I to display
--
# clockid frequency: 1000 MHz
# cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
# clockid: monotonic (1)
# reference time: 2020-08-06 09:40:21.619290 = 1596717621.619290 (TOD) = 21931.077673635 (monotonic)
$
Original-patch-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Geneviève Bastien <gbastien@versatic.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20200805093444.314999-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 09:34:40 +00:00
	if (!rec->opts.use_clockid)
		perf_header__clear_feat(&session->header, HEADER_CLOCK_DATA);

2022-01-17 18:34:28 +00:00
	if (!record__threads_enabled(rec))
		perf_header__clear_feat(&session->header, HEADER_DIR_FORMAT);

2019-03-18 17:41:33 +00:00
	if (!record__comp_enabled(rec))
		perf_header__clear_feat(&session->header, HEADER_COMPRESSED);
2019-03-08 13:47:39 +00:00

2015-10-25 14:51:43 +00:00
	perf_header__clear_feat(&session->header, HEADER_STAT);
2013-11-06 18:41:34 +00:00
}
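record__init_features() sets every feature bit first, then clears the ones the session cannot provide. Below is a self-contained bitmask sketch of that set-then-clear pattern. perf itself uses perf_header__set_feat()/perf_header__clear_feat() on its header bitmap, and the enum and helpers here are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature ids, standing in for perf's HEADER_* values. */
enum feat { FEAT_BUILD_ID, FEAT_TRACING_DATA, FEAT_BRANCH_STACK, FEAT_STAT, FEAT_LAST };

static void set_feat(uint64_t *mask, enum feat f)   { *mask |= UINT64_C(1) << f; }
static void clear_feat(uint64_t *mask, enum feat f) { *mask &= ~(UINT64_C(1) << f); }
static bool has_feat(uint64_t mask, enum feat f)    { return (mask >> f) & 1; }

/* Start from "everything", then drop what this session cannot provide. */
static uint64_t init_features(bool no_buildid, bool branch_stack)
{
	uint64_t mask = 0;
	int f;

	for (f = 0; f < FEAT_LAST; f++)
		set_feat(&mask, (enum feat)f);
	if (no_buildid)
		clear_feat(&mask, FEAT_BUILD_ID);
	if (!branch_stack)
		clear_feat(&mask, FEAT_BRANCH_STACK);
	clear_feat(&mask, FEAT_STAT);	/* cleared unconditionally, like HEADER_STAT */
	return mask;
}
```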

2016-02-26 09:32:10 +00:00
static void
record__finish_output(struct record *rec)
{
2022-01-17 18:34:28 +00:00
	int i;
2017-01-23 21:07:59 +00:00
	struct perf_data *data = &rec->data;
	int fd = perf_data__fd(data);
2016-02-26 09:32:10 +00:00

2024-01-12 23:13:40 +00:00
	if (data->is_pipe) {
		/* Just to display approx. size */
		data->file.size = rec->bytes_written;
2016-02-26 09:32:10 +00:00
		return;
2024-01-12 23:13:40 +00:00
	}
2016-02-26 09:32:10 +00:00

	rec->session->header.data_size += rec->bytes_written;
2019-02-21 09:41:29 +00:00
	data->file.size = lseek(perf_data__fd(data), 0, SEEK_CUR);
2022-01-17 18:34:28 +00:00
	if (record__threads_enabled(rec)) {
		for (i = 0; i < data->dir.nr; i++)
			data->dir.files[i].size = lseek(data->dir.files[i].fd, 0, SEEK_CUR);
	}
2016-02-26 09:32:10 +00:00

	if (!rec->no_buildid) {
		process_buildids(rec);

		if (rec->buildid_all)
2024-04-10 06:42:03 +00:00
			perf_session__dsos_hit_all(rec->session);
2016-02-26 09:32:10 +00:00
	}
	perf_session__write_header(rec->session, rec->evlist, fd, true);

	return;
}
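record__finish_output() takes file sizes with lseek(fd, 0, SEEK_CUR). After sequential writes, this returns the current offset, i.e. the number of bytes written, without moving the file position. lseek() on a pipe fails with ESPIPE, which is consistent with the is_pipe branch falling back to rec->bytes_written. The sketch below demonstrates the idiom on a temporary file. The helper name and the /tmp path template are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: write a payload to a fresh temporary file and report
 * the size seen by lseek(fd, 0, SEEK_CUR). Returns -1 on failure.
 */
static off_t bytes_written_so_far(const char *payload, size_t len)
{
	char path[] = "/tmp/finish-output-XXXXXX";
	off_t size;
	int fd = mkstemp(path);

	if (fd == -1)
		return -1;
	unlink(path);		/* file goes away once fd is closed */
	if (write(fd, payload, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	size = lseek(fd, 0, SEEK_CUR);
	close(fd);
	return size;
}
```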
perf record: Add --tail-synthesize option
When working with an overwritable ring buffer there's an inconvenient
problem: if perf dumps data a long time after it starts, non-sample
events may be lost, which makes a following 'perf report' unable to
identify the process name and mmap layout. For example:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \
dd if=/dev/zero of=/dev/null
Send SIGUSR2 after dd runs long enough. The resulting perf.data lacks the
correct comm and mmap events:
# perf script -i perf.data.2016061522374354
perf 24478 [004] 2581325.601789: raw_syscalls:sys_exit: NR 0 = 512
^^^^
Should be 'dd'
27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux)
203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux)
b18d83 return_from_SYSCALL_64+0xfe200000 (/lib/modules/4.6.0-rc3+/build/vmlinux)
7f47c417edf0 [unknown] ([unknown])
^^^^^^^^^^^^
Fail to unwind
This patch provides a '--tail-synthesize' option, which allows perf to
collect system status when finalizing the output file. In the resulting
output file, the non-sample events reflect the system status at the time
the data was dumped.
After this patch:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
dd if=/dev/zero of=/dev/null
# perf script -i perf.data.2016061600544998
dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ...
^^
Correct comm
203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms])
203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms])
203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms])
b18d83 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
d8e50 __GI___libc_write+0xffff01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so)
^^^^^
Correct unwind
This option doesn't aim to solve this problem completely. If a process
terminates before SIGUSR2, we still lose its COMM and MMAP events. For
example, we can't unwind correctly from the final perf.data we get from
the previous example, because when perf collects the final output file
(when we press C-c), 'dd' has terminated, so its '/proc/<pid>/maps'
is gone.
However, this is a cheaper choice. To completely solve this problem we
need to continuously output non-sample events. To satisfy the
requirement of daemonization, we need to merge them periodically. It is
possible but requires much more code and cycles.
Automatically select --tail-synthesize when --overwrite is provided.
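The head/tail choice reduces to calling the synthesis step twice: once when recording starts (tail = false) and once when the output file is finalized (tail = true). Only the call that matches the option does any work. This is the same `rec->opts.tail_synthesize != tail` guard that record__synthesize_workload() uses later in this file. The struct and function names below are hypothetical stand-ins for the relevant option bits.

```c
#include <stdbool.h>

/* Hypothetical option block, standing in for the relevant rec->opts bits. */
struct opts {
	bool overwrite;
	bool tail_synthesize;
};

/* --overwrite implies --tail-synthesize, as described above. */
static void opts_finalize(struct opts *o)
{
	if (o->overwrite)
		o->tail_synthesize = true;
}

/*
 * Called at start (tail=false) and at finalization (tail=true); only the
 * matching call does the work. Returns true if synthesis would run.
 */
static bool maybe_synthesize(const struct opts *o, bool tail)
{
	if (o->tail_synthesize != tail)
		return false;
	/* ... synthesize COMM/MMAP events from /proc here ... */
	return true;
}
```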
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-16-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-14 08:34:47 +00:00
static int record__synthesize_workload(struct record *rec, bool tail)
2016-04-20 18:59:54 +00:00
{
2017-02-14 13:59:04 +00:00
	int err;
2019-07-21 11:23:50 +00:00
	struct perf_thread_map *thread_map;
2021-08-11 04:46:58 +00:00
	bool needs_mmap = rec->opts.synth & PERF_SYNTH_MMAP;
2016-04-20 18:59:54 +00:00

2016-07-14 08:34:47 +00:00
	if (rec->opts.tail_synthesize != tail)
		return 0;

2017-02-14 13:59:04 +00:00
	thread_map = thread_map__new_by_tid(rec->evlist->workload.pid);
	if (thread_map == NULL)
		return -1;

	err = perf_event__synthesize_thread_map(&rec->tool, thread_map,
2016-04-20 18:59:54 +00:00
						process_synthesized_event,
						&rec->session->machines.host,
2021-08-11 04:46:58 +00:00
						needs_mmap,
2018-12-04 20:34:20 +00:00
						rec->opts.sample_address);
2019-07-21 11:24:20 +00:00
	perf_thread_map__put(thread_map);
2017-02-14 13:59:04 +00:00
	return err;
2016-04-20 18:59:54 +00:00
}

2022-06-10 11:33:15 +00:00
static int write_finished_init(struct record *rec, bool tail)
{
	if (rec->opts.tail_synthesize != tail)
		return 0;

	return record__write(rec, NULL, &finished_init_event, sizeof(finished_init_event));
}

perf record: Add --tail-synthesize option
When working with overwritable ring buffer there's a inconvenience
problem: if perf dumps data after a long period after it starts,
non-sample events may lost, which makes following 'perf report' unable
to identify proc name and mmap layout. For example:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \
dd if=/dev/zero of=/dev/null
send SIGUSR2 after dd runs long enough. The resuling perf.data lost
correct comm and mmap events:
# perf script -i perf.data.2016061522374354
perf 24478 [004] 2581325.601789: raw_syscalls:sys_exit: NR 0 = 512
^^^^
Should be 'dd'
27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux)
203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux)
b18d83 return_from_SYSCALL_64+0xfe200000 (/lib/modules/4.6.0-rc3+/build/vmlinux)
7f47c417edf0 [unknown] ([unknown])
^^^^^^^^^^^^
Fail to unwind
This patch provides a '--tail-synthesize' option, allows perf to collect
system status when finalizing output file. In resuling output file, the
non-sample events reflect system status when dumping data.
After this patch:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
dd if=/dev/zero of=/dev/null
# perf script -i perf.data.2016061600544998
dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ...
^^
Correct comm
203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms])
203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms])
203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms])
b18d83 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
d8e50 __GI___libc_write+0xffff01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so)
^^^^^
Correct unwind
This option doesn't aim to solve this problem completely. If a process
terminates before SIGUSR2, we still lose its COMM and MMAP events. For
example, we can't unwind correctly from the final perf.data we get from
the previous example, because when perf collects the final output file
(when we press C-c), 'dd' has already terminated, so its '/proc/<pid>/maps'
is empty.
However, this is a cheaper choice. To solve this problem completely we
would need to output non-sample events continuously. To satisfy the
requirement of daemonization, we would also need to merge them periodically.
That is possible but requires much more code and cycles.
Automatically select --tail-synthesize when --overwrite is provided.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-16-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
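As described above, --tail-synthesize moves the synthesis of non-sample events (COMM, MMAP and so on) from the start of recording to the moment an output file is finalized, and --overwrite turns it on automatically. In the code this appears as a tail argument: record__synthesize() returns early unless tail matches opts.tail_synthesize, so both the start-up call and the per-file call can stay in place and only one of them does any work. A minimal standalone sketch of that gating; the sketch_* struct, fields and functions are hypothetical stand-ins, not perf's API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of perf's struct record. */
struct sketch_record {
	bool overwrite;        /* --overwrite given */
	bool tail_synthesize;  /* --tail-synthesize given or implied */
	int synthesized;       /* times tracking events were "emitted" */
};

/* Per the commit message, --overwrite implies --tail-synthesize. */
static void sketch_parse_opts(struct sketch_record *rec, bool overwrite,
			      bool tail_synthesize)
{
	rec->overwrite = overwrite;
	rec->tail_synthesize = tail_synthesize || overwrite;
	rec->synthesized = 0;
}

/*
 * Mirrors the early return in record__synthesize(): the start-up call
 * (tail == false) only works in head mode, and the call made when a file
 * is finalized (tail == true) only works in tail mode.
 */
static int sketch_synthesize(struct sketch_record *rec, bool tail)
{
	assert(rec != 0);
	if (rec->tail_synthesize != tail)
		return 0;
	rec->synthesized++;  /* the real code emits COMM/MMAP/... events here */
	return 0;
}
```

Whichever mode is selected, each recording phase triggers exactly one synthesis pass, which is why the call sites do not need their own conditionals.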
2016-07-14 08:34:47 +00:00
static int record__synthesize(struct record *rec, bool tail);
2016-04-20 18:59:50 +00:00

2016-04-13 08:21:07 +00:00
static int
record__switch_output(struct record *rec, bool at_exit)
{
2017-01-23 21:07:59 +00:00
	struct perf_data *data = &rec->data;
2024-01-19 04:03:02 +00:00
	char *new_filename = NULL;
2016-04-13 08:21:07 +00:00
	int fd, err;

	/* Same Size:      "2015122520103046"*/
	char timestamp[] = "InvalidTimestamp";

2018-11-06 09:04:58 +00:00
	record__aio_mmap_read_sync(rec);

2022-06-10 11:33:15 +00:00
	write_finished_init(rec, true);

2016-07-14 08:34:47 +00:00
	record__synthesize(rec, true);
	if (target__none(&rec->opts.target))
		record__synthesize_workload(rec, true);

2016-04-13 08:21:07 +00:00
	rec->samples = 0;
	record__finish_output(rec);
	err = fetch_current_timestamp(timestamp, sizeof(timestamp));
	if (err) {
		pr_err("Failed to get current timestamp\n");
		return -EINVAL;
	}

2017-01-23 21:07:59 +00:00
	fd = perf_data__switch(data, timestamp,
2024-01-19 04:03:04 +00:00
			       rec->session->header.data_offset,
			       at_exit, &new_filename);
2016-04-13 08:21:07 +00:00
	if (fd >= 0 && !at_exit) {
		rec->bytes_written = 0;
		rec->session->header.data_size = 0;
	}

2024-01-19 04:03:04 +00:00
	if (!quiet) {
2016-04-13 08:21:07 +00:00
		fprintf(stderr, "[ perf record: Dump %s.%s ]\n",
2019-02-21 09:41:30 +00:00
			data->path, timestamp);
2024-01-19 04:03:04 +00:00
	}
2016-04-20 18:59:50 +00:00

2019-03-14 22:49:55 +00:00
	if (rec->switch_output.num_files) {
		int n = rec->switch_output.cur_file + 1;

		if (n >= rec->switch_output.num_files)
			n = 0;
		rec->switch_output.cur_file = n;
		if (rec->switch_output.filenames[n]) {
			remove(rec->switch_output.filenames[n]);
2019-07-04 15:06:20 +00:00
			zfree(&rec->switch_output.filenames[n]);
2019-03-14 22:49:55 +00:00
		}
		rec->switch_output.filenames[n] = new_filename;
	} else {
		free(new_filename);
	}

2016-04-20 18:59:50 +00:00
	/* Output tracking events */
2016-04-20 18:59:54 +00:00
	if (!at_exit) {
2016-07-14 08:34:47 +00:00
		record__synthesize(rec, false);
2016-04-20 18:59:50 +00:00

2016-04-20 18:59:54 +00:00
		/*
		 * In 'perf record --switch-output' without -a,
		 * record__synthesize() in record__switch_output() won't
		 * generate tracking events because there's no thread_map
		 * in evlist, so the newly created perf.data doesn't
		 * contain map and comm information.
		 * Create a fake thread_map and directly call
		 * perf_event__synthesize_thread_map() for those events.
		 */
		if (target__none(&rec->opts.target))
2016-07-14 08:34:47 +00:00
			record__synthesize_workload(rec, false);
2022-06-10 11:33:15 +00:00
		write_finished_init(rec, false);
2016-04-20 18:59:54 +00:00
	}
2016-04-13 08:21:07 +00:00
	return fd;
}
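When rec->switch_output.num_files is non-zero, record__switch_output() treats switch_output.filenames as a ring: it advances cur_file modulo num_files, deletes the file previously recorded in that slot from disk, and stores the new file name there; otherwise it simply frees the name. A minimal sketch of the same bookkeeping, assuming heap-allocated names and standard C remove(); the sketch_rotation type and helpers are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the rotation fields in rec->switch_output. */
struct sketch_rotation {
	int num_files;     /* ring size; 0 disables rotation */
	int cur_file;      /* slot used by the most recent dump */
	char **filenames;  /* num_files slots, NULL when empty */
};

/* Heap copy of a name (strdup is POSIX, not ISO C). */
static char *sketch_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

/*
 * Track a freshly written dump. Takes ownership of new_filename.
 * Returns the slot used, or -1 when rotation is disabled.
 */
static int sketch_rotate(struct sketch_rotation *rot, char *new_filename)
{
	int n;

	assert(new_filename != NULL);
	if (rot->num_files == 0) {
		free(new_filename);  /* nothing to remember */
		return -1;
	}

	n = rot->cur_file + 1;
	if (n >= rot->num_files)
		n = 0;
	rot->cur_file = n;
	if (rot->filenames[n]) {
		remove(rot->filenames[n]);  /* drop the oldest dump from disk */
		free(rot->filenames[n]);
		rot->filenames[n] = NULL;
	}
	rot->filenames[n] = new_filename;
	return n;
}
```

Starting from cur_file = 0 with num_files = 2, three dumps land in slots 1, 0 and 1, and the third one deletes the first file from disk, so only the newest num_files dumps are kept.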

2023-03-14 23:42:31 +00:00
static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
2022-09-01 19:57:37 +00:00
					struct perf_record_lost_samples *lost,
2023-03-14 23:42:31 +00:00
					int cpu_idx, int thread_idx, u64 lost_count,
					u16 misc_flag)
2022-09-01 19:57:37 +00:00
{
	struct perf_sample_id *sid;
	struct perf_sample sample = {};
	int id_hdr_size;

2023-03-14 23:42:31 +00:00
	lost->lost = lost_count;
2022-09-01 19:57:37 +00:00
	if (evsel->core.ids) {
		sid = xyarray__entry(evsel->core.sample_id, cpu_idx, thread_idx);
		sample.id = sid->id;
	}

	id_hdr_size = perf_event__synthesize_id_sample((void *)(lost + 1),
						       evsel->core.attr.sample_type, &sample);
	lost->header.size = sizeof(*lost) + id_hdr_size;
2023-03-14 23:42:31 +00:00
	lost->header.misc = misc_flag;
2022-09-01 19:57:37 +00:00
	record__write(rec, NULL, lost, lost->header.size);
}
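__record__save_lost_samples() builds a variable-length record: the id sample produced by perf_event__synthesize_id_sample() is written immediately after the fixed struct (at lost + 1), and header.size is set to sizeof(*lost) plus the id sample's size. The caller therefore has to pass a buffer with spare room after the fixed part, which is what struct perf_record_lost_samples_and_ids provides in record__read_lost_samples(). A simplified sketch of that layout, assuming (hypothetically) that the id sample is a single u64; real id samples depend on the event's sample_type, and all sketch_* types are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the perf record structs. */
struct sketch_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;  /* whole record, trailing id sample included */
};

struct sketch_lost_samples {
	struct sketch_header header;
	uint64_t lost;
};

/* Fixed record followed by room for the id sample appended after it. */
struct sketch_lost_samples_and_ids {
	struct sketch_lost_samples lost;
	uint64_t sample[4];
};

static void sketch_fill_lost(struct sketch_lost_samples_and_ids *buf,
			     uint32_t type, uint64_t lost_count, uint64_t id)
{
	unsigned char *tail;
	size_t id_hdr_size = sizeof(id);  /* assumption: one u64 id */

	/* The spare area must start exactly where the fixed record ends. */
	assert(offsetof(struct sketch_lost_samples_and_ids, sample) ==
	       sizeof(struct sketch_lost_samples));

	/* Clear the whole container; clearing sizeof(*buf) bytes starting
	 * at &buf->lost would overrun that 16-byte subobject. */
	memset(buf, 0, sizeof(*buf));
	buf->lost.header.type = type;
	buf->lost.lost = lost_count;

	/* Same address perf reaches with (void *)(lost + 1). */
	tail = (unsigned char *)buf + sizeof(buf->lost);
	memcpy(tail, &id, id_hdr_size);
	buf->lost.header.size = (uint16_t)(sizeof(buf->lost) + id_hdr_size);
}
```

Readers of the record rely on header.size alone to find the next record, so it has to include the trailing id sample.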

static void record__read_lost_samples(struct record *rec)
{
	struct perf_session *session = rec->session;
2024-06-11 05:06:26 +00:00
	struct perf_record_lost_samples_and_ids lost;
2022-09-01 19:57:37 +00:00
	struct evsel *evsel;

2022-09-09 23:50:24 +00:00
	/* there was an error during record__open */
	if (session->evlist == NULL)
		return;

2022-09-01 19:57:37 +00:00
	evlist__for_each_entry(session->evlist, evsel) {
		struct xyarray *xy = evsel->core.sample_id;
2023-03-14 23:42:31 +00:00
		u64 lost_count;
2022-09-01 19:57:37 +00:00

2022-09-09 23:50:24 +00:00
		if (xy == NULL || evsel->core.fd == NULL)
			continue;
2022-09-01 19:57:37 +00:00
		if (xyarray__max_x(evsel->core.fd) != xyarray__max_x(xy) ||
		    xyarray__max_y(evsel->core.fd) != xyarray__max_y(xy)) {
			pr_debug("Unmatched FD vs. sample ID: skip reading LOST count\n");
			continue;
		}

		for (int x = 0; x < xyarray__max_x(xy); x++) {
			for (int y = 0; y < xyarray__max_y(xy); y++) {
2023-03-14 23:42:31 +00:00
				struct perf_counts_values count;

				if (perf_evsel__read(&evsel->core, x, y, &count) < 0) {
					pr_debug("read LOST count failed\n");
2024-06-11 05:06:26 +00:00
					return;
2023-03-14 23:42:31 +00:00
				}

				if (count.lost) {
perf record: Fix memset out-of-range error
Change the destination of 'memset' from '&lost.lost' to '&lost' in
record__read_lost_samples, so that the size passed to 'memset' matches
the object it writes to and no longer runs out of bounds.
The errors reported for builtin-record.c were:
In file included from /usr/include/string.h:495,
from util/parse-events.h:13,
from builtin-record.c:14:
In function 'memset',
inlined from 'record__read_lost_samples' at
builtin-record.c:1958:6,
inlined from '__cmd_record.constprop' at builtin-record.c:2817:2:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:71:10: error:
'__builtin_memset' offset [17, 64] from the object at 'lost' is out
of the bounds of referenced subobject 'lost' with type
'struct perf_record_lost_samples' at offset 0 [-Werror=array-bounds]
71|return __builtin___memset_chk (__dest,__ch,__len,__bos0 (__dest));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The error arose from a memset operation on the 'lost' variable: the
destination was '&lost.lost' (16 bytes), but the length was
'sizeof(lost)' (64 bytes).
Fixes: 6c1785cd75ef ("perf record: Ensure space for lost samples")
Signed-off-by: Haoze Xie <royenheart@gmail.com>
Signed-off-by: Yuan Tan <tanyuan@tinylab.org>
Link: https://lore.kernel.org/r/11e12f171b846577cac698cd3999db3d7f6c4d03.1720372317.git.royenheart@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-07 18:01:00 +00:00
					memset(&lost, 0, sizeof(lost));
2024-06-11 05:06:26 +00:00
					lost.lost.header.type = PERF_RECORD_LOST_SAMPLES;
					__record__save_lost_samples(rec, evsel, &lost.lost,
2023-03-14 23:42:31 +00:00
								    x, y, count.lost, 0);
				}
2022-09-01 19:57:37 +00:00
			}
		}
2023-03-14 23:42:31 +00:00

		lost_count = perf_bpf_filter__lost_count(evsel);
2023-11-27 22:08:20 +00:00
		if (lost_count) {
2024-07-07 18:01:00 +00:00
			memset(&lost, 0, sizeof(lost));
2024-06-11 05:06:26 +00:00
			lost.lost.header.type = PERF_RECORD_LOST_SAMPLES;
			__record__save_lost_samples(rec, evsel, &lost.lost, 0, 0, lost_count,
2023-03-14 23:42:31 +00:00
						    PERF_RECORD_MISC_LOST_SAMPLES_BPF);
2023-11-27 22:08:20 +00:00
		}
2022-09-01 19:57:37 +00:00
	}
}
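record__read_lost_samples() walks evsel->core.sample_id as a two-dimensional array indexed by CPU (x) and thread (y), and skips an event whose fd array has a different shape. Below is a simplified, hypothetical re-creation of such an array, loosely modelled on the xyarray helpers in tools/lib/perf (row-major storage, NULL for out-of-range lookups), with the nested loop counting the slots that would each produce one LOST_SAMPLES record. In perf the per-slot count comes from perf_evsel__read(); here it is stored in the array itself so the sketch stays self-contained:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified 2-D array loosely modelled on perf's xyarray. */
struct sketch_xyarray {
	size_t entry_size;
	size_t row_size;          /* entry_size * max_y */
	size_t max_x;             /* e.g. CPU index range */
	size_t max_y;             /* e.g. thread index range */
	unsigned char *contents;  /* max_x * max_y entries, row-major */
};

static struct sketch_xyarray *sketch_xyarray_new(size_t xlen, size_t ylen,
						 size_t entry_size)
{
	struct sketch_xyarray *xy = malloc(sizeof(*xy));

	if (!xy)
		return NULL;
	xy->contents = calloc(xlen * ylen, entry_size);  /* zero-filled */
	if (!xy->contents) {
		free(xy);
		return NULL;
	}
	xy->entry_size = entry_size;
	xy->row_size = ylen * entry_size;
	xy->max_x = xlen;
	xy->max_y = ylen;
	return xy;
}

static void sketch_xyarray_delete(struct sketch_xyarray *xy)
{
	if (xy) {
		free(xy->contents);
		free(xy);
	}
}

/* Address of entry (x, y), or NULL when out of range. */
static void *sketch_xyarray_entry(struct sketch_xyarray *xy, size_t x, size_t y)
{
	if (x >= xy->max_x || y >= xy->max_y)
		return NULL;
	return xy->contents + x * xy->row_size + y * xy->entry_size;
}

/* The guard used before the nested loop: both arrays must agree in shape. */
static int sketch_same_shape(const struct sketch_xyarray *a,
			     const struct sketch_xyarray *b)
{
	return a->max_x == b->max_x && a->max_y == b->max_y;
}

/* Count the (x, y) slots holding a non-zero u64 lost count. */
static size_t sketch_count_nonzero(struct sketch_xyarray *xy)
{
	size_t hits = 0;

	assert(xy->entry_size == sizeof(uint64_t));
	for (size_t x = 0; x < xy->max_x; x++)
		for (size_t y = 0; y < xy->max_y; y++)
			if (*(uint64_t *)sketch_xyarray_entry(xy, x, y))
				hits++;
	return hits;
}
```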

2022-10-24 18:19:07 +00:00
static volatile sig_atomic_t workload_exec_errno;
2014-01-02 18:11:25 +00:00

/*
2020-11-30 12:26:54 +00:00
 * evlist__prepare_workload will send a SIGUSR1
2014-01-02 18:11:25 +00:00
 * if the exec of the workload fails, since we asked for that
 * by setting its want_signal to true.
 */
2014-05-12 00:47:24 +00:00
static void workload_exec_failed_signal(int signo __maybe_unused,
					siginfo_t *info,
2014-01-02 18:11:25 +00:00
					void *ucontext __maybe_unused)
{
	workload_exec_errno = info->si_value.sival_int;
	done = 1;
	child_finished = 1;
}
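The handler above receives the workload's errno through the signal payload rather than through shared state: the sender queues SIGUSR1 with sigqueue() and the errno in union sigval, and an SA_SIGINFO handler copies info->si_value.sival_int into a volatile sig_atomic_t. A minimal POSIX sketch of that handshake; to keep it self-contained it signals the current process instead of a parent, and all sketch_* names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t sketch_exec_errno;
static volatile sig_atomic_t sketch_done;

/* SA_SIGINFO handler: pull the errno value out of the queued payload. */
static void sketch_exec_failed(int signo, siginfo_t *info, void *ucontext)
{
	(void)signo;
	(void)ucontext;
	sketch_exec_errno = info->si_value.sival_int;
	sketch_done = 1;
}

static int sketch_install_handler(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	sigemptyset(&act.sa_mask);
	act.sa_flags = SA_SIGINFO;  /* deliver siginfo_t to the handler */
	act.sa_sigaction = sketch_exec_failed;
	return sigaction(SIGUSR1, &act, NULL);
}

/* What a child could do after a failed exec: report err to pid. */
static int sketch_report_exec_failure(pid_t pid, int err)
{
	union sigval val;

	assert(pid > 0);
	val.sival_int = err;
	return sigqueue(pid, SIGUSR1, val);
}
```

In the real flow the sender would be the forked workload process and the receiver perf itself. sigqueue() is what lets an int travel with the signal, which plain kill() cannot do; when a process signals itself and the signal is unblocked, POSIX requires delivery before sigqueue() returns, which is why the values can be checked right after the call here.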

2015-04-30 14:37:32 +00:00
static void snapshot_sig_handler(int sig);
2017-01-09 09:52:00 +00:00
static void alarm_sig_handler(int sig);
2015-04-30 14:37:32 +00:00

2020-11-30 18:19:40 +00:00
static const struct perf_event_mmap_page *evlist__pick_pc(struct evlist *evlist)
2016-06-27 10:24:05 +00:00
{
2016-07-14 08:34:39 +00:00
	if (evlist) {
2019-07-27 20:07:44 +00:00
		if (evlist->mmap && evlist->mmap[0].core.base)
			return evlist->mmap[0].core.base;
		if (evlist->overwrite_mmap && evlist->overwrite_mmap[0].core.base)
			return evlist->overwrite_mmap[0].core.base;
2016-07-14 08:34:39 +00:00
	}
2016-06-27 10:24:05 +00:00
	return NULL;
}

2016-05-24 02:28:59 +00:00
static const struct perf_event_mmap_page *record__pick_pc(struct record *rec)
{
2020-11-30 18:19:40 +00:00
	const struct perf_event_mmap_page *pc = evlist__pick_pc(rec->evlist);
2016-06-27 10:24:05 +00:00
	if (pc)
		return pc;
2016-05-24 02:28:59 +00:00
	return NULL;
}
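record__pick_pc() only has to find one mapped struct perf_event_mmap_page, because perf_event__synth_time_conv() takes that page's clock-conversion parameters and records them so later tools can relate hardware cycle counts (e.g. TSC) to perf timestamps. The arithmetic, as documented in the comments of the perf_event.h UAPI header, splits the cycle count around time_shift so the multiplication does not overflow. A standalone sketch with a hypothetical struct holding only time_shift, time_mult and time_zero:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the time fields in struct perf_event_mmap_page. */
struct sketch_time_conv {
	uint16_t time_shift;
	uint32_t time_mult;
	uint64_t time_zero;
};

/* Cycle count -> perf timestamp (cap_user_time_zero formula). */
static uint64_t sketch_cyc_to_perf_time(const struct sketch_time_conv *tc,
					uint64_t cyc)
{
	uint64_t quot, rem;

	assert(tc->time_shift < 64);
	quot = cyc >> tc->time_shift;
	rem = cyc & (((uint64_t)1 << tc->time_shift) - 1);
	return tc->time_zero + quot * tc->time_mult +
	       ((rem * tc->time_mult) >> tc->time_shift);
}

/* Perf timestamp -> cycle count (the inverse given in the same header). */
static uint64_t sketch_perf_time_to_cyc(const struct sketch_time_conv *tc,
					uint64_t timestamp)
{
	uint64_t t, quot, rem;

	assert(tc->time_mult != 0);
	t = timestamp - tc->time_zero;
	quot = t / tc->time_mult;
	rem = t % tc->time_mult;
	return (quot << tc->time_shift) +
	       (rem << tc->time_shift) / tc->time_mult;
}
```

With time_shift = 1 and time_mult = 3 the conversion scales cycles by 3/2, so 10 cycles map to 15 and 11 cycles map to 16 (16.5 truncated).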
2016-07-14 08:34:47 +00:00
static int record__synthesize(struct record *rec, bool tail)
2016-02-26 09:32:07 +00:00
{
	struct perf_session *session = rec->session;
	struct machine *machine = &session->machines.host;
2017-01-23 21:07:59 +00:00
	struct perf_data *data = &rec->data;
2016-02-26 09:32:07 +00:00
	struct record_opts *opts = &rec->opts;
	struct perf_tool *tool = &rec->tool;
	int err = 0;
2020-04-22 15:50:38 +00:00
	event_op f = process_synthesized_event;
2016-02-26 09:32:07 +00:00

2016-07-14 08:34:47 +00:00
	if (rec->opts.tail_synthesize != tail)
		return 0;

2017-01-23 21:07:59 +00:00
	if (data->is_pipe) {
2021-07-19 22:31:52 +00:00
		err = perf_event__synthesize_for_pipe(tool, session, data,
2018-03-14 09:22:04 +00:00
						      process_synthesized_event);
2021-07-19 22:31:52 +00:00
		if (err < 0)
			goto out;
2018-03-14 09:22:04 +00:00

2021-07-19 22:31:52 +00:00
		rec->bytes_written += err;
2016-02-26 09:32:07 +00:00
	}

2016-05-24 02:28:59 +00:00
	err = perf_event__synth_time_conv(record__pick_pc(rec), tool,
2016-03-08 08:38:44 +00:00
					  process_synthesized_event, machine);
	if (err)
		goto out;

2019-11-15 12:42:16 +00:00
	/* Synthesize id_index before auxtrace_info */
2022-06-10 11:33:13 +00:00
	err = perf_event__synthesize_id_index(tool,
					      process_synthesized_event,
					      session->evlist, machine);
	if (err)
		goto out;
2019-11-15 12:42:16 +00:00

2016-02-26 09:32:07 +00:00
	if (rec->opts.full_auxtrace) {
		err = perf_event__synthesize_auxtrace_info(rec->itr, tool,
					session, process_synthesized_event);
		if (err)
			goto out;
	}

2020-11-30 18:07:49 +00:00
	if (!evlist__exclude_kernel(rec->evlist)) {
perf record: Ignore kptr_restrict when not sampling the kernel
If we're not sampling the kernel, we shouldn't care about kptr_restrict,
nor synthesize anything that only helps resolve kernel samples, such as
the reference relocation symbol or kernel module information.
Before:
$ cat /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
2
2
$ perf record sleep 1
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.
Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.
Samples in kernel modules won't be resolved at all.
If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ]
$ perf evlist -v
cycles:uppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
$
After:
$ perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (10 samples) ]
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-t025e9zftbx2b8cq2w01g5e5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
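The commit above gates the kernel-side synthesis (kernel mmap and module events) on !evlist__exclude_kernel(rec->evlist). As far as I can tell, that helper returns true only when every event in the list has attr.exclude_kernel set, so a single kernel-sampling event is enough to keep the synthesis. A standalone sketch of that all-events check over a plain array of hypothetical attributes (sketch_attr is not a perf type):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced event attribute. */
struct sketch_attr {
	const char *name;
	bool exclude_kernel;
};

/* True only if no event samples the kernel. */
static bool sketch_exclude_kernel(const struct sketch_attr *attrs, size_t n)
{
	assert(attrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!attrs[i].exclude_kernel)
			return false;
	return true;
}

/* Mirrors the decision in record__synthesize(). */
static bool sketch_needs_kernel_synthesis(const struct sketch_attr *attrs,
					  size_t n)
{
	return !sketch_exclude_kernel(attrs, n);
}
```

For the 'perf record sleep 1' case in the commit message, the only event is user-space cycles (exclude_kernel: 1), so the kernel mmap and module synthesis, and the kptr_restrict warnings that come with it, are skipped.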
2017-11-14 14:03:19 +00:00
		err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
							 machine);
		WARN_ONCE(err < 0, "Couldn't record kernel reference relocation symbol\n"
			  "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
			  "Check /proc/kallsyms permission or run as root.\n");

		err = perf_event__synthesize_modules(tool, process_synthesized_event,
						     machine);
		WARN_ONCE(err < 0, "Couldn't record kernel module information.\n"
			  "Symbol resolution may be skewed if relocation was used (e.g. kexec).\n"
			  "Check /proc/modules permission or run as root.\n");
	}
2016-02-26 09:32:07 +00:00

	if (perf_guest) {
		machines__process_guests(&session->machines,
					 perf_event__synthesize_guest_os, tool);
	}

2017-11-17 21:42:58 +00:00
	err = perf_event__synthesize_extra_attr(&rec->tool,
						rec->evlist,
						process_synthesized_event,
						data->is_pipe);
	if (err)
		goto out;

2019-07-21 11:24:42 +00:00
	err = perf_event__synthesize_thread_map2(&rec->tool, rec->evlist->core.threads,
2017-11-17 21:42:59 +00:00
						 process_synthesized_event,
						 NULL);
	if (err < 0) {
		pr_err("Couldn't synthesize thread map.\n");
		return err;
	}

2022-05-24 07:54:30 +00:00
	err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.all_cpus,
2017-11-17 21:42:59 +00:00
					     process_synthesized_event, NULL);
	if (err < 0) {
		pr_err("Couldn't synthesize cpu map.\n");
		return err;
	}

2019-03-12 05:30:41 +00:00
	err = perf_event__synthesize_bpf_events(session, process_synthesized_event,
perf tools: Synthesize PERF_RECORD_* for loaded BPF programs
This patch synthesizes PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT for
BPF programs loaded before perf-record. This is achieved by gathering
information about all BPF programs via sys_bpf.
Committer notes:
Fix the build on some older systems such as amazonlinux:1 where it was
breaking with:
util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog':
util/bpf-event.c:52:9: error: missing initializer for field 'type' of 'struct bpf_prog_info' [-Werror=missing-field-initializers]
struct bpf_prog_info info = {};
^
In file included from /git/linux/tools/lib/bpf/bpf.h:26:0,
from util/bpf-event.c:3:
/git/linux/tools/include/uapi/linux/bpf.h:2699:8: note: 'type' declared here
__u32 type;
^
cc1: all warnings being treated as errors
Further fix on a centos:6 system:
cc1: warnings being treated as errors
util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog':
util/bpf-event.c:50: error: 'func_info_rec_size' may be used uninitialized in this function
The compiler is wrong, but to silence it, initialize that variable to
zero.
One more fix, this time for debian:experimental-x-mips, x-mips64 and
x-mipsel:
util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog':
util/bpf-event.c:93:16: error: implicit declaration of function 'calloc' [-Werror=implicit-function-declaration]
func_infos = calloc(sub_prog_cnt, func_info_rec_size);
^~~~~~
util/bpf-event.c:93:16: error: incompatible implicit declaration of built-in function 'calloc' [-Werror]
util/bpf-event.c:93:16: note: include '<stdlib.h>' or provide a declaration of 'calloc'
Add the missing header.
Committer testing:
# perf record --bpf-event sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.021 MB perf.data (7 samples) ]
# perf report -D | grep PERF_RECORD_BPF_EVENT | nl
1 0 0x4b10 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 13
2 0 0x4c60 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 14
3 0 0x4db0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 15
4 0 0x4f00 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 16
5 0 0x5050 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 17
6 0 0x51a0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 18
7 0 0x52f0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 21
8 0 0x5440 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 22
# bpftool prog
13: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 13,14
14: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 13,14
15: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 15,16
16: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 15,16
17: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:44-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 17,18
18: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:44-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 17,18
21: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:45-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 21,22
22: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:45-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 21,22
#
# perf report -D | grep -B22 PERF_RECORD_KSYMBOL
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 ff 44 06 c0 ff ff ff ff ......8..D......
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b
. 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%.........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x49d8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc00644ff len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 48 6d 06 c0 ff ff ff ff ......8.Hm......
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17
. 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x4b28 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0066d48 len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 04 cf 03 c0 ff ff ff ff ......8.........
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b
. 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%.........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x4c78 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc003cf04 len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 96 28 04 c0 ff ff ff ff ......8..(......
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17
. 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x4dc8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0042896 len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 05 13 17 c0 ff ff ff ff ......8.........
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b
. 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%.........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x4f18 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0171305 len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 0a 8c 23 c0 ff ff ff ff ......8...#.....
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17
. 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x5068 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0238c0a len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 2a a5 a4 c0 ff ff ff ff ......8.*.......
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b
. 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%.........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x51b8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0a4a52a len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
--
. ... raw event: size 312 bytes
. 0000: 11 00 00 00 00 00 38 01 9b c9 a4 c0 ff ff ff ff ......8.........
. 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog
. 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17
. 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4...............
<SNIP zeroes>
. 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
. 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........
. 0130: 00 00 00 00 00 00 00 00 ........
0 0x5308 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0a4c99b len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20190117161521.1341602-8-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-17 16:15:19 +00:00
machine, opts);
2022-09-07 16:24:58 +00:00
if (err < 0) {
2019-01-17 16:15:19 +00:00
pr_warning("Couldn't synthesize bpf events.\n");
2022-09-07 16:24:58 +00:00
err = 0;
}
2019-01-17 16:15:19 +00:00

2021-08-11 04:46:58 +00:00
if (rec->opts.synth & PERF_SYNTH_CGROUP) {
err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
machine);
2022-09-07 16:24:58 +00:00
if (err < 0) {
2021-08-11 04:46:58 +00:00
pr_warning("Couldn't synthesize cgroup events.\n");
2022-09-07 16:24:58 +00:00
err = 0;
}
2021-08-11 04:46:58 +00:00
}
2020-03-25 12:45:33 +00:00

2020-04-22 15:50:38 +00:00
if (rec->opts.nr_threads_synthesize > 1) {
2022-08-26 16:42:31 +00:00
mutex_init(&synth_lock);
2020-04-22 15:50:38 +00:00
perf_set_multithreaded();
f = process_locked_synthesized_event;
}

2021-08-11 04:46:58 +00:00
if (rec->opts.synth & PERF_SYNTH_TASK) {
bool needs_mmap = rec->opts.synth & PERF_SYNTH_MMAP;

err = __machine__synthesize_threads(machine, tool, &opts->target,
rec->evlist->core.threads,
f, needs_mmap, opts->sample_address,
rec->opts.nr_threads_synthesize);
}
2020-04-22 15:50:38 +00:00

2022-08-26 16:42:31 +00:00
if (rec->opts.nr_threads_synthesize > 1) {
2020-04-22 15:50:38 +00:00
perf_set_singlethreaded();
2022-08-26 16:42:31 +00:00
mutex_destroy(&synth_lock);
}
2020-04-22 15:50:38 +00:00

2016-02-26 09:32:07 +00:00
out:
return err;
}
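When `rec->opts.nr_threads_synthesize > 1`, the synthesis function above (record__synthesize in builtin-record.c) initializes `synth_lock`, calls `perf_set_multithreaded()` and switches the callback `f` to `process_locked_synthesized_event`, so events produced by several synthesis threads go through one lock. The body of that callback is not part of this excerpt. The following is a minimal, hypothetical sketch of the same pattern with plain POSIX threads; `struct sink`, `emit_event`, `emit_event_locked` and `synthesize` are illustrative names, not perf APIs.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sink standing in for the perf.data writer. */
struct sink {
	size_t nr_events;   /* events written so far */
	size_t nr_bytes;    /* bytes written so far */
};

typedef int (*event_cb)(struct sink *s, size_t size);

/* Named after the lock in the excerpt; illustrative only. */
static pthread_mutex_t synth_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unlocked callback: safe while only one thread synthesizes. */
static int emit_event(struct sink *s, size_t size)
{
	s->nr_events++;
	s->nr_bytes += size;
	return 0;
}

/* Locked wrapper: serializes emit_event() across synthesis threads. */
static int emit_event_locked(struct sink *s, size_t size)
{
	int ret;

	pthread_mutex_lock(&synth_lock);
	ret = emit_event(s, size);
	pthread_mutex_unlock(&synth_lock);
	return ret;
}

struct worker_arg {
	struct sink *sink;
	event_cb cb;
	size_t nr;          /* events each worker synthesizes */
};

static void *worker(void *p)
{
	const struct worker_arg *a = p;

	for (size_t i = 0; i < a->nr; i++)
		a->cb(a->sink, 24);  /* pretend every event is 24 bytes */
	return NULL;
}

/* Run nr_threads workers, choosing the locked callback only when
 * more than one thread is used. Returns 0 on success, -1 on error. */
static int synthesize(struct sink *s, int nr_threads, size_t nr_per_thread)
{
	event_cb cb = nr_threads > 1 ? emit_event_locked : emit_event;
	struct worker_arg arg = { s, cb, nr_per_thread };
	pthread_t tids[16];
	int created = 0, err = 0;

	if (nr_threads < 1 || nr_threads > 16)
		return -1;
	for (; created < nr_threads; created++) {
		if (pthread_create(&tids[created], NULL, worker, &arg) != 0) {
			err = -1;
			break;
		}
	}
	/* Join whatever was started, even on a partial failure. */
	for (int i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	return err;
}
```

As in the excerpt, the single-threaded path skips the lock entirely, so the common case pays no locking cost.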

2020-04-27 20:56:37 +00:00
static int record__process_signal_event(union perf_event *event __maybe_unused, void *data)
{
struct record *rec = data;
pthread_kill(rec->thread_id, SIGUSR2);
return 0;
}
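The BPF synthesis commit message quoted in the blame annotations above shows raw PERF_RECORD_KSYMBOL events next to their decoded `perf report -D` lines. A small decoder makes the correspondence explicit. This sketch assumes the uapi layout of that record (a `perf_event_header` of u32 `type`, u16 `misc`, u16 `size`, followed by u64 `addr`, u32 `len`, u16 `ksym_type`, u16 `flags` and a NUL-terminated `name`) and little-endian byte order, as on the x86-64 machine that produced the dump. `struct ksymbol_view`, `get_le` and `decode_ksymbol` are hypothetical names, not perf's own parser.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded view of a PERF_RECORD_KSYMBOL event. */
struct ksymbol_view {
	uint32_t type;       /* 17 == PERF_RECORD_KSYMBOL */
	uint16_t misc;
	uint16_t size;       /* total record size in bytes */
	uint64_t addr;       /* start address of the symbol */
	uint32_t len;        /* symbol length in bytes */
	uint16_t ksym_type;  /* 1 == PERF_RECORD_KSYMBOL_TYPE_BPF */
	uint16_t flags;
	const char *name;    /* points into the input buffer */
};

/* Read an n-byte little-endian integer one byte at a time, so host
 * endianness and alignment do not matter. */
static uint64_t get_le(const uint8_t *p, int n)
{
	uint64_t v = 0;

	for (int i = n - 1; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 0 on success, -1 if buf is too short or the name is not
 * NUL-terminated within buflen bytes. */
static int decode_ksymbol(const uint8_t *buf, size_t buflen,
			  struct ksymbol_view *out)
{
	const size_t name_off = 24;  /* 8 (header) + 8 + 4 + 2 + 2 */

	if (buflen <= name_off)
		return -1;
	out->type      = (uint32_t)get_le(buf, 4);
	out->misc      = (uint16_t)get_le(buf + 4, 2);
	out->size      = (uint16_t)get_le(buf + 6, 2);
	out->addr      = get_le(buf + 8, 8);
	out->len       = (uint32_t)get_le(buf + 16, 4);
	out->ksym_type = (uint16_t)get_le(buf + 20, 2);
	out->flags     = (uint16_t)get_le(buf + 22, 2);
	if (!memchr(buf + name_off, '\0', buflen - name_off))
		return -1;
	out->name = (const char *)(buf + name_off);
	return 0;
}

/* First 0x40 bytes of the first raw event in the dump (offset 0x49d8). */
static const uint8_t sample_event[] = {
	0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x38, 0x01,
	0xff, 0x44, 0x06, 0xc0, 0xff, 0xff, 0xff, 0xff,
	0xe5, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
	0x62, 0x70, 0x66, 0x5f, 0x70, 0x72, 0x6f, 0x67,
	0x5f, 0x37, 0x62, 0x65, 0x34, 0x39, 0x65, 0x33,
	0x39, 0x33, 0x34, 0x61, 0x31, 0x32, 0x35, 0x62,
	0x61, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
```

Decoding `sample_event` gives size 312 (0x138), addr 0xffffffffc00644ff, len 229, type 1, flags 0 and the name bpf_prog_7be49e3934a125ba. These are the same values printed on the matching PERF_RECORD_KSYMBOL line. The name suffix is the program tag that `bpftool prog` lists for ID 13.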

2020-04-28 17:58:29 +00:00
static int record__setup_sb_evlist(struct record *rec)
{
struct record_opts *opts = &rec->opts;

if (rec->sb_evlist != NULL) {
/*
* We get here if --switch-output-event populated the
* sb_evlist, so associate a callback that will send a SIGUSR2
* to the main thread.
*/
evlist__set_cb(rec->sb_evlist, record__process_signal_event, rec);
rec->thread_id = pthread_self();
}
2020-08-05 02:29:37 +00:00
#ifdef HAVE_LIBBPF_SUPPORT
2020-04-28 17:58:29 +00:00
if (!opts->no_bpf_event) {
if (rec->sb_evlist == NULL) {
rec->sb_evlist = evlist__new();

if (rec->sb_evlist == NULL) {
pr_err("Couldn't create side band evlist.\n");
return -1;
}
}

if (evlist__add_bpf_sb_event(rec->sb_evlist, &rec->session->header.env)) {
pr_err("Couldn't ask for PERF_RECORD_BPF_EVENT side band events.\n");
return -1;
}
}
2020-08-05 02:29:37 +00:00
#endif
2020-11-30 12:40:10 +00:00
if (evlist__start_sb_thread(rec->sb_evlist, &rec->opts.target)) {
2020-04-28 17:58:29 +00:00
pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n");
opts->no_bpf_event = true;
}

return 0;
}
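When --switch-output-event populates the side-band evlist, record__setup_sb_evlist saves the main thread's ID with `pthread_self()` and registers record__process_signal_event, which runs on the side-band thread and calls `pthread_kill(rec->thread_id, SIGUSR2)`. The sketch below reproduces only that cross-thread hand-off. It assumes the main thread just needs to observe a flag set by its SIGUSR2 handler; `switch_output_requested`, `side_band_thread` and `notify_main_thread` are hypothetical names, and perf's actual SIGUSR2 handling is not shown in this excerpt.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Set by the SIGUSR2 handler; sig_atomic_t keeps the write async-signal-safe. */
static volatile sig_atomic_t switch_output_requested;

static void sigusr2_handler(int sig)
{
	(void)sig;
	switch_output_requested = 1;
}

/* Plays the role of the side-band callback: forward the event to the
 * main thread as SIGUSR2. */
static void *side_band_thread(void *arg)
{
	pthread_t main_thread = *(pthread_t *)arg;

	pthread_kill(main_thread, SIGUSR2);
	return NULL;
}

/* Returns 1 if the calling (main) thread observed the signal,
 * 0 if it did not, -1 on setup failure. */
static int notify_main_thread(void)
{
	struct sigaction sa;
	pthread_t self = pthread_self(), tid;
	struct timespec one_ms = { 0, 1000000 };

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigusr2_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR2, &sa, NULL) != 0)
		return -1;

	switch_output_requested = 0;
	if (pthread_create(&tid, NULL, side_band_thread, &self) != 0)
		return -1;
	pthread_join(tid, NULL);

	/* The handler normally runs before pthread_join() returns; poll
	 * briefly (up to about one second) to be safe. */
	for (int i = 0; i < 1000 && !switch_output_requested; i++)
		nanosleep(&one_ms, NULL);
	return switch_output_requested ? 1 : 0;
}
```

Signaling a specific thread with `pthread_kill()` targets the thread that owns the output state. A process-wide `kill()` could instead be delivered to any thread that does not block the signal.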
perf header: Store clock references for -k/--clockid option
Add a new CLOCK_DATA feature that stores reference times when
-k/--clockid option is specified.
It contains the clock id and its reference time, together with the wall-clock
time taken at the 'same time'; both values are in nanoseconds.
The format of the data is as follows:
struct {
u32 version; /* version = 1 */
u32 clockid;
u64 wall_clock_ns;
u64 clockid_time_ns;
};
These clock reference times will be used in subsequent changes to display
wall-clock time for perf events.
It's available only when recording with a clockid specified, because that's
the only case where we can relate a reference time to wall-clock time. We
can't do that with perf clock yet.
Committer testing:
$ perf record -h -k
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-k, --clockid <clockid>
clockid to use for events, see clock_gettime()
$ perf record -k monotonic sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
$ perf report --header-only | grep clockid -A1
# event : name = cycles:u, , id = { 88815, 88816, 88817, 88818, 88819, 88820, 88821, 88822 }, size = 120, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, read_format = ID, disabled = 1, inherit = 1, exclude_kernel = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, use_clockid = 1, ksymbol = 1, bpf_event = 1, clockid = 1
# CPU_TOPOLOGY info available, use -I to display
--
# clockid frequency: 1000 MHz
# cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
# clockid: monotonic (1)
# reference time: 2020-08-06 09:40:21.619290 = 1596717621.619290 (TOD) = 21931.077673635 (monotonic)
$
Original-patch-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Geneviève Bastien <gbastien@versatic.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20200805093444.314999-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
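The reference pair stored in CLOCK_DATA is enough to map an event timestamp taken on the recording clock to wall-clock time: take the event's offset from the clockid reference and add it to the wall-clock reference. A minimal sketch, assuming a hypothetical `struct clock_data` that mirrors the layout above and assuming no drift between the two clocks after the reference point; it is not perf's own conversion code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the CLOCK_DATA layout described above. */
struct clock_data {
	uint32_t version;         /* version = 1 */
	uint32_t clockid;         /* e.g. CLOCK_MONOTONIC, which is 1 on Linux */
	uint64_t wall_clock_ns;   /* TOD reference, ns since the Unix epoch */
	uint64_t clockid_time_ns; /* recording-clock reference, ns */
};

/*
 * Map a recording-clock timestamp to wall-clock nanoseconds.
 * Unsigned modular arithmetic makes this correct for events both
 * before and after the reference point (assuming no clock drift).
 */
static uint64_t clockid_to_tod_ns(const struct clock_data *cd, uint64_t event_ns)
{
	assert(cd->version == 1);
	return cd->wall_clock_ns + (event_ns - cd->clockid_time_ns);
}
```

With the reference values from the committer test above (1596717621.619290 s TOD, 21931.077673635 s monotonic), a monotonic timestamp one second later maps to 1596717622.619290 s.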
static int record__init_clock(struct record *rec)
{
	struct perf_session *session = rec->session;
	struct timespec ref_clockid;
	struct timeval ref_tod;
	u64 ref;

	if (!rec->opts.use_clockid)
		return 0;

	if (rec->opts.use_clockid && rec->opts.clockid_res_ns)
		session->header.env.clock.clockid_res_ns = rec->opts.clockid_res_ns;

	session->header.env.clock.clockid = rec->opts.clockid;

	if (gettimeofday(&ref_tod, NULL) != 0) {
		pr_err("gettimeofday failed, cannot set reference time.\n");
		return -1;
	}

	if (clock_gettime(rec->opts.clockid, &ref_clockid)) {
		pr_err("clock_gettime failed, cannot set reference time.\n");
		return -1;
	}

	ref = (u64) ref_tod.tv_sec * NSEC_PER_SEC +
	      (u64) ref_tod.tv_usec * NSEC_PER_USEC;

	session->header.env.clock.tod_ns = ref;

	ref = (u64) ref_clockid.tv_sec * NSEC_PER_SEC +
	      (u64) ref_clockid.tv_nsec;

	session->header.env.clock.clockid_ns = ref;
	return 0;
}
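The two reference readings in `record__init_clock()` reduce to plain POSIX calls plus a seconds-to-nanoseconds conversion. A standalone sketch of the same steps, assuming `NSEC_PER_SEC` and `NSEC_PER_USEC` are 10^9 and 10^3 as in the kernel; the helper names are hypothetical. As in the code above, the two readings are taken back to back, so they are only approximately simultaneous.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

#define NSEC_PER_SEC  1000000000ULL
#define NSEC_PER_USEC 1000ULL

/* Convert a gettimeofday() result to nanoseconds. */
static uint64_t timeval_to_ns(const struct timeval *tv)
{
	return (uint64_t)tv->tv_sec * NSEC_PER_SEC +
	       (uint64_t)tv->tv_usec * NSEC_PER_USEC;
}

/* Convert a clock_gettime() result to nanoseconds. */
static uint64_t timespec_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

/*
 * Take a (wall clock, clk) reference pair back to back.
 * Returns 0 on success, -1 if either call fails.
 */
static int capture_reference(clockid_t clk, uint64_t *tod_ns, uint64_t *clk_ns)
{
	struct timeval tod;
	struct timespec ts;

	assert(tod_ns != NULL && clk_ns != NULL);
	if (gettimeofday(&tod, NULL) != 0)
		return -1;
	if (clock_gettime(clk, &ts) != 0)
		return -1;

	*tod_ns = timeval_to_ns(&tod);
	*clk_ns = timespec_to_ns(&ts);
	return 0;
}
```

For example, the TOD reference printed in the committer test, 1596717621.619290 s, converts to 1596717621619290000 ns.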

static void hit_auxtrace_snapshot_trigger(struct record *rec)
{
	if (trigger_is_ready(&auxtrace_snapshot_trigger)) {
		trigger_hit(&auxtrace_snapshot_trigger);
		auxtrace_record__snapshot_started = 1;
		if (auxtrace_record__snapshot_start(rec->itr))
			trigger_error(&auxtrace_snapshot_trigger);
	}
}
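The helper above fires only when the trigger is in its ready state, and moves it to an error state if starting the snapshot fails. A simplified, hypothetical sketch of such a trigger state machine; the state names, values, and transitions here are assumptions for illustration, not a copy of perf's trigger implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical trigger states; perf's own definitions may differ. */
enum trig_state {
	TRIG_ERROR = -2,
	TRIG_OFF   = -1,
	TRIG_ON    = 0,
	TRIG_READY = 1,
	TRIG_HIT   = 2,
};

struct trig {
	enum trig_state state;
};

/* Enable the trigger; it still has to be armed before it can fire. */
static void trig_on(struct trig *t) { if (t->state == TRIG_OFF) t->state = TRIG_ON; }
/* Arm (or re-arm after a hit); an errored or disabled trigger stays put. */
static void trig_ready(struct trig *t) { if (t->state >= TRIG_ON) t->state = TRIG_READY; }
static bool trig_is_ready(const struct trig *t) { return t->state == TRIG_READY; }
static void trig_hit(struct trig *t) { if (t->state == TRIG_READY) t->state = TRIG_HIT; }
static void trig_error(struct trig *t) { t->state = TRIG_ERROR; }

/* Demo stand-ins for auxtrace_record__snapshot_start(). */
static int start_ok(void) { return 0; }
static int start_fails(void) { return -1; }

/* Same shape as hit_auxtrace_snapshot_trigger(): act only when ready. */
static bool fire(struct trig *t, int (*start)(void))
{
	if (!trig_is_ready(t))
		return false;
	trig_hit(t);
	if (start() != 0)
		trig_error(t);
	return true;
}
```

A SIGUSR2 handler would call something like `fire()`; the main loop re-arms the trigger once the snapshot has been written.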

static int record__terminate_thread(struct record_thread *thread_data)
{
	int err;
	enum thread_msg ack = THREAD_MSG__UNDEFINED;
	pid_t tid = thread_data->tid;

	close(thread_data->pipes.msg[1]);
	thread_data->pipes.msg[1] = -1;
	err = read(thread_data->pipes.ack[0], &ack, sizeof(ack));
	if (err > 0)
		pr_debug2("threads[%d]: sent %s\n", tid, thread_msg_tags[ack]);
	else
		pr_warning("threads[%d]: failed to receive termination notification from %d\n",
			   thread->tid, tid);

	return 0;
}

static int record__start_threads(struct record *rec)
{
	int t, tt, err, ret = 0, nr_threads = rec->nr_threads;
	struct record_thread *thread_data = rec->thread_data;
	sigset_t full, mask;
	pthread_t handle;
	pthread_attr_t attrs;

	thread = &thread_data[0];

	if (!record__threads_enabled(rec))
		return 0;

	sigfillset(&full);
	if (sigprocmask(SIG_SETMASK, &full, &mask)) {
		pr_err("Failed to block signals on threads start: %s\n", strerror(errno));
		return -1;
	}

	pthread_attr_init(&attrs);
	pthread_attr_setdetachstate(&attrs, PTHREAD_CREATE_DETACHED);

	for (t = 1; t < nr_threads; t++) {
		enum thread_msg msg = THREAD_MSG__UNDEFINED;

#ifdef HAVE_PTHREAD_ATTR_SETAFFINITY_NP
		pthread_attr_setaffinity_np(&attrs,
					    MMAP_CPU_MASK_BYTES(&(thread_data[t].mask->affinity)),
					    (cpu_set_t *)(thread_data[t].mask->affinity.bits));
#endif
		err = pthread_create(&handle, &attrs, record__thread, &thread_data[t]);
		if (err) {
			/* pthread_create() returns an error number; it does not set errno. */
			for (tt = 1; tt < t; tt++)
				record__terminate_thread(&thread_data[tt]);
			pr_err("Failed to start threads: %s\n", strerror(err));
			ret = -1;
			goto out_err;
		}

		err = read(thread_data[t].pipes.ack[0], &msg, sizeof(msg));
		if (err > 0)
			pr_debug2("threads[%d]: sent %s\n", rec->thread_data[t].tid,
				  thread_msg_tags[msg]);
		else
			pr_warning("threads[%d]: failed to receive start notification from %d\n",
				   thread->tid, rec->thread_data[t].tid);
	}

	sched_setaffinity(0, MMAP_CPU_MASK_BYTES(&thread->mask->affinity),
			  (cpu_set_t *)thread->mask->affinity.bits);

	pr_debug("threads[%d]: started on cpu%d\n", thread->tid, sched_getcpu());

out_err:
	pthread_attr_destroy(&attrs);

	if (sigprocmask(SIG_SETMASK, &mask, NULL)) {
		pr_err("Failed to unblock signals on threads start: %s\n", strerror(errno));
		ret = -1;
	}

	return ret;
}
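The start and terminate paths above rely on a per-thread pair of pipes: after creating a worker, the main thread waits for an acknowledgement on `ack`, and closing the write end of `msg` tells the worker to finish, after which it acknowledges again. Signals are blocked while workers are created so the new threads inherit a full signal mask. A self-contained sketch of that handshake, assuming POSIX threads; the names (`worker`, `worker_start`, `worker_stop`, `MSG_READY`) are hypothetical, not perf's.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

enum msg { MSG_UNDEFINED = 0, MSG_READY = 1 };

struct worker {
	int msg[2]; /* main -> worker; closing msg[1] means "stop" */
	int ack[2]; /* worker -> main acknowledgements */
	pthread_t handle;
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;
	enum msg m = MSG_READY;
	char buf;
	ssize_t n;

	/* Announce that the thread is running. */
	if (write(w->ack[1], &m, sizeof(m)) != (ssize_t)sizeof(m))
		return NULL;
	/* Block until the main thread closes its end of the msg pipe (EOF). */
	while (read(w->msg[0], &buf, 1) > 0)
		;
	/* Acknowledge termination. */
	n = write(w->ack[1], &m, sizeof(m));
	(void)n;
	return NULL;
}

/* Create a worker with all signals blocked; returns 0 once it has acked. */
static int worker_start(struct worker *w)
{
	sigset_t full, old;
	enum msg m = MSG_UNDEFINED;
	int err;

	if (pipe(w->msg) || pipe(w->ack))
		return -1;

	sigfillset(&full);
	pthread_sigmask(SIG_SETMASK, &full, &old);
	err = pthread_create(&w->handle, NULL, worker_fn, w);
	pthread_sigmask(SIG_SETMASK, &old, NULL);
	if (err)
		return -1;

	if (read(w->ack[0], &m, sizeof(m)) != (ssize_t)sizeof(m) || m != MSG_READY)
		return -1;
	return 0;
}

/* Close the msg pipe to request termination, then wait for the ack. */
static int worker_stop(struct worker *w)
{
	enum msg m = MSG_UNDEFINED;
	ssize_t n;

	close(w->msg[1]);
	w->msg[1] = -1;
	n = read(w->ack[0], &m, sizeof(m));
	pthread_join(w->handle, NULL);
	close(w->msg[0]);
	close(w->ack[0]);
	close(w->ack[1]);
	return n == (ssize_t)sizeof(m) ? 0 : -1;
}
```

Unlike perf, which creates detached threads, this sketch joins the worker so it can close every descriptor; it also uses `pthread_sigmask()`, the thread-safe counterpart of the `sigprocmask()` call above.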

static int record__stop_threads(struct record *rec)
{
	int t;
	struct record_thread *thread_data = rec->thread_data;

	for (t = 1; t < rec->nr_threads; t++)
		record__terminate_thread(&thread_data[t]);

	for (t = 0; t < rec->nr_threads; t++) {
		rec->samples += thread_data[t].samples;
		if (!record__threads_enabled(rec))
			continue;
		rec->session->bytes_transferred += thread_data[t].bytes_transferred;
		rec->session->bytes_compressed += thread_data[t].bytes_compressed;
		pr_debug("threads[%d]: samples=%lld, wakes=%ld, ", thread_data[t].tid,
			 thread_data[t].samples, thread_data[t].waking);
		if (thread_data[t].bytes_transferred && thread_data[t].bytes_compressed)
			pr_debug("transferred=%" PRIu64 ", compressed=%" PRIu64 "\n",
				 thread_data[t].bytes_transferred, thread_data[t].bytes_compressed);
		else
			pr_debug("written=%" PRIu64 "\n", thread_data[t].bytes_written);
	}

	return 0;
}

static unsigned long record__waking(struct record *rec)
{
	int t;
	unsigned long waking = 0;
	struct record_thread *thread_data = rec->thread_data;

	for (t = 0; t < rec->nr_threads; t++)
		waking += thread_data[t].waking;

	return waking;
}
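`record__stop_threads()` and `record__waking()` fold per-thread counters into session-wide totals. A minimal sketch of that reduction, with hypothetical `thread_stats` and `totals` structs; the compression ratio at the end (bytes transferred divided by bytes compressed) is an assumption about how such totals are typically reported, not code shown in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread counters, loosely modeled on struct record_thread. */
struct thread_stats {
	uint64_t samples;
	unsigned long waking;
	uint64_t bytes_transferred;
	uint64_t bytes_compressed;
};

struct totals {
	uint64_t samples;
	unsigned long waking;
	uint64_t bytes_transferred;
	uint64_t bytes_compressed;
};

/* Sum the per-thread counters into one set of totals. */
static struct totals sum_stats(const struct thread_stats *ts, size_t n)
{
	struct totals t = { 0 };

	for (size_t i = 0; i < n; i++) {
		t.samples += ts[i].samples;
		t.waking += ts[i].waking;
		t.bytes_transferred += ts[i].bytes_transferred;
		t.bytes_compressed += ts[i].bytes_compressed;
	}
	return t;
}

/* Compression ratio, or 0 when nothing was compressed. */
static float compression_ratio(const struct totals *t)
{
	if (t->bytes_compressed == 0)
		return 0.0f;
	return (float)t->bytes_transferred / (float)t->bytes_compressed;
}
```

Note that the code above adds the byte counters only when parallel streaming is enabled, whereas samples and wakeups are always summed.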

static int __cmd_record(struct record *rec, int argc, const char **argv)
{
	int err;
	int status = 0;
	const bool forks = argc > 0;
	struct perf_tool *tool = &rec->tool;
	struct record_opts *opts = &rec->opts;
	struct perf_data *data = &rec->data;
	struct perf_session *session;
	bool disabled = false, draining = false;
	int fd;
	float ratio = 0;
	enum evlist_ctl_cmd cmd = EVLIST_CTL_CMD_UNSUPPORTED;

	atexit(record__sig_exit);
	signal(SIGCHLD, sig_handler);
	signal(SIGINT, sig_handler);
	signal(SIGTERM, sig_handler);
	signal(SIGSEGV, sigsegv_handler);

	if (rec->opts.record_cgroup) {
#ifndef HAVE_FILE_HANDLE
		pr_err("cgroup tracking is not supported\n");
		return -1;
#endif
	}

	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) {
		signal(SIGUSR2, snapshot_sig_handler);
		if (rec->opts.auxtrace_snapshot_mode)
			trigger_on(&auxtrace_snapshot_trigger);
		if (rec->switch_output.enabled)
			trigger_on(&switch_output_trigger);
	} else {
		signal(SIGUSR2, SIG_IGN);
	}

	perf_tool__init(tool, /*ordered_events=*/true);
	tool->sample = process_sample_event;
	tool->fork = perf_event__process_fork;
	tool->exit = perf_event__process_exit;
	tool->comm = perf_event__process_comm;
	tool->namespaces = perf_event__process_namespaces;
	tool->mmap = build_id__process_mmap;
	tool->mmap2 = build_id__process_mmap2;
	tool->itrace_start = process_timestamp_boundary;
	tool->aux = process_timestamp_boundary;
	tool->namespace_events = rec->opts.record_namespaces;
	tool->cgroup_events = rec->opts.record_cgroup;
	session = perf_session__new(data, tool);
	if (IS_ERR(session)) {
		pr_err("Perf session creation failed.\n");
		return PTR_ERR(session);
	}

	if (record__threads_enabled(rec)) {
		if (perf_data__is_pipe(&rec->data)) {
			pr_err("Parallel trace streaming is not available in pipe mode.\n");
			return -1;
		}
		if (rec->opts.full_auxtrace) {
			pr_err("Parallel trace streaming is not available in AUX area tracing mode.\n");
			return -1;
		}
	}

	fd = perf_data__fd(data);
	rec->session = session;

	if (zstd_init(&session->zstd_data, rec->opts.comp_level) < 0) {
		pr_err("Compression initialization failed.\n");
		return -1;
	}
#ifdef HAVE_EVENTFD_SUPPORT
	done_fd = eventfd(0, EFD_NONBLOCK);
	if (done_fd < 0) {
		pr_err("Failed to create wakeup eventfd, error: %m\n");
		status = -1;
		goto out_delete_session;
	}
	err = evlist__add_wakeup_eventfd(rec->evlist, done_fd);
	if (err < 0) {
		pr_err("Failed to add wakeup eventfd to poll list\n");
		status = err;
		goto out_delete_session;
	}
#endif // HAVE_EVENTFD_SUPPORT
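Under `HAVE_EVENTFD_SUPPORT`, `done_fd` is a non-blocking eventfd added to the evlist's poll set, so writing to it (presumably from the signal handler, which is not shown in this section) wakes the main poll loop immediately. A standalone sketch of that wakeup pattern on Linux; `wake()` and `wait_for_wakeup()` are hypothetical helpers, not perf functions.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Wake the poller by adding 1 to the eventfd counter (write() is async-signal-safe). */
static int wake(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Poll the eventfd for up to timeout_ms milliseconds.
 * Returns 1 if a wakeup was pending (and consumes it), 0 on timeout, -1 on error.
 */
static int wait_for_wakeup(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;
	/* Without EFD_SEMAPHORE, one read() resets the counter to zero. */
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return 1;
}
```

Because the descriptor is non-blocking and part of the same poll set as the ring-buffer fds, a shutdown request never has to wait for the next event or poll timeout.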

	session->header.env.comp_type  = PERF_COMP_ZSTD;
	session->header.env.comp_level = rec->opts.comp_level;

perf record: Put a copy of kcore into the perf.data directory
Add a new 'perf record' option '--kcore' which will put a copy of
/proc/kcore, kallsyms and modules into a perf.data directory. Note that
without the --kcore option, output goes to a file as before. The
tools' -o and -i options work with either a file name or a directory name.
Example:
$ sudo perf record --kcore uname
$ sudo tree perf.data
perf.data
├── kcore_dir
│ ├── kallsyms
│ ├── kcore
│ └── modules
└── data
$ sudo perf script -v
build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de
build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541
Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field.
Using CPUID GenuineIntel-6-8E-A
Using perf.data/kcore_dir/kcore for kernel data
Using perf.data/kcore_dir/kallsyms for symbols
perf 19058 506778.423729: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423733: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423734: 7 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423736: 117 cycles: ffffffffa2caa54a native_write_msr+0xa (vmlinux)
perf 19058 506778.423738: 2092 cycles: ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux)
perf 19058 506778.423740: 37380 cycles: ffffffffa2f121d0 perf_event_addr_filters_exec+0x0 (vmlinux)
uname 19058 506778.423751: 582673 cycles: ffffffffa303a407 propagate_protected_usage+0x147 (vmlinux)
uname 19058 506778.423892: 2241841 cycles: ffffffffa2cae0c9 unwind_next_frame.part.5+0x79 (vmlinux)
uname 19058 506778.424430: 2457397 cycles: ffffffffa3019232 check_memory_region+0x52 (vmlinux)
Committer testing:
# rm -rf perf.data*
# perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
# ls -l perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data
# perf record --kcore uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
# ls -lad perf.data*
drwx------. 3 root root 4096 Oct 21 11:08 perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data.old
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
# perf evlist -v -i perf.data/data
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/20191004083121.12182-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
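With `--kcore`, perf.data becomes a directory whose event stream lives in `data`, while `-o`/`-i` still accept either form. A minimal sketch of how a reader might resolve such a path, assuming only the layout shown in the example above; `resolve_data_path()` is hypothetical and is not perf's implementation.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * If 'path' is a directory, assume the event data is in 'path/data';
 * otherwise 'path' itself is the data file. Returns 0 on success, -1 if
 * the path does not exist or the result does not fit in 'out'.
 */
static int resolve_data_path(const char *path, char *out, size_t len)
{
	struct stat st;
	int n;

	if (stat(path, &st) != 0)
		return -1;
	if (S_ISDIR(st.st_mode))
		n = snprintf(out, len, "%s/data", path);
	else
		n = snprintf(out, len, "%s", path);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A directory-aware reader would then also look for `kcore_dir/` next to `data` when kernel symbols are needed.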
	if (rec->opts.kcore &&
	    !record__kcore_readable(&session->machines.host)) {
		pr_err("ERROR: kcore is not readable.\n");
		return -1;
	}

	if (record__init_clock(rec))
		return -1;

	record__init_features(rec);

	if (forks) {
		err = evlist__prepare_workload(rec->evlist, &opts->target, argv, data->is_pipe,
					       workload_exec_failed_signal);
		if (err < 0) {
			pr_err("Couldn't run the workload!\n");
			status = err;
			goto out_delete_session;
		}
	}

	/*
	 * If we have just a single event and are sending data
	 * through a pipe, we need to force ID allocation,
	 * because we synthesize the event name through the pipe
	 * and need the ID for that.
	 */
	if (data->is_pipe && rec->evlist->core.nr_entries == 1)
		rec->opts.sample_id = true;

	if (rec->timestamp_filename && perf_data__is_pipe(data)) {
		rec->timestamp_filename = false;
		pr_warning("WARNING: --timestamp-filename option is not available in pipe mode.\n");
	}

	evlist__uniquify_name(rec->evlist);

	evlist__config(rec->evlist, opts, &callchain_param);

	/* Debug message used by test scripts */
	pr_debug3("perf record opening and mmapping events\n");
	if (record__open(rec) != 0) {
		err = -1;
		goto out_free_threads;
	}
	/* Debug message used by test scripts */
	pr_debug3("perf record done opening and mmapping events\n");
	session->header.env.comp_mmap_len = session->evlist->core.mmap_len;

	if (rec->opts.kcore) {
		err = record__kcore_copy(&session->machines.host, data);
		if (err) {
			pr_err("ERROR: Failed to copy kcore\n");
			goto out_free_threads;
		}
	}

	/*
	 * Normally perf_session__new would do this, but it doesn't have the
	 * evlist.
	 */
	if (rec->tool.ordered_events && !evlist__sample_id_all(rec->evlist)) {
		pr_warning("WARNING: No sample_id_all support, falling back to unordered processing\n");
		rec->tool.ordered_events = false;
	}

	if (evlist__nr_groups(rec->evlist) == 0)
		perf_header__clear_feat(&session->header, HEADER_GROUP_DESC);

	if (data->is_pipe) {
		err = perf_header__write_pipe(fd);
		if (err < 0)
			goto out_free_threads;
	} else {
		err = perf_session__write_header(session, rec->evlist, fd, false);
		if (err < 0)
			goto out_free_threads;
	}

	err = -1;
	if (!rec->no_buildid
	    && !perf_header__has_feat(&session->header, HEADER_BUILD_ID)) {
		pr_err("Couldn't generate buildids. "
		       "Use --no-buildid to profile anyway.\n");
		goto out_free_threads;
	}

	err = record__setup_sb_evlist(rec);
	if (err)
		goto out_free_threads;

perf record: Add --tail-synthesize option
When working with an overwritable ring buffer there's an inconvenient
problem: if perf dumps data long after it starts, non-sample events may
be lost, which leaves a following 'perf report' unable to identify the
process name and mmap layout. For example:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \
dd if=/dev/zero of=/dev/null
Send SIGUSR2 after dd has run long enough. The resulting perf.data has lost
the correct comm and mmap events:
# perf script -i perf.data.2016061522374354
perf 24478 [004] 2581325.601789: raw_syscalls:sys_exit: NR 0 = 512
^^^^
Should be 'dd'
27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux)
203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux)
b18d83 return_from_SYSCALL_64+0xfe200000 (/lib/modules/4.6.0-rc3+/build/vmlinux)
7f47c417edf0 [unknown] ([unknown])
^^^^^^^^^^^^
Failed to unwind
This patch provides a '--tail-synthesize' option, allowing perf to collect
the system status when finalizing the output file. In the resulting output
file, the non-sample events reflect the system status at the time the data
is dumped.
After this patch:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
dd if=/dev/zero of=/dev/null
# perf script -i perf.data.2016061600544998
dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ...
^^
Correct comm
203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms])
203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms])
203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms])
b18d83 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
d8e50 __GI___libc_write+0xffff01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so)
^^^^^
Correct unwind
This option doesn't aim to solve this problem completely. If a process
terminates before SIGUSR2, we still lose its COMM and MMAP events. For
example, we can't unwind correctly from the final perf.data we get from
the previous example, because when perf collects the final output file
(when we press C-c), 'dd' has already terminated, so its '/proc/<pid>/maps'
is empty.
However, this is a cheaper choice. To completely solve this problem we
would need to continuously output non-sample events. To satisfy the
requirement of daemonization, we would need to merge them periodically.
That is possible but requires much more code and cycles.
Automatically select --tail-synthesize when --overwrite is provided.
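The trade-off above can be illustrated with a toy model of an overwritable ring buffer. This is a minimal sketch under stated assumptions, not perf's implementation: struct rb, rb_write(), rb_contains() and comm_survives() are hypothetical names, and records are short strings rather than real perf events. Once enough samples wrap the buffer, the COMM record written at exec time is overwritten; re-emitting it just before the dump, which is the idea behind tail synthesis, brings it back.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model: the buffer keeps only the newest RB_SLOTS
 * records and silently overwrites older ones, like an overwritable
 * ring buffer. Records are short strings, not real perf events. */
#define RB_SLOTS 4

struct rb {
	char slot[RB_SLOTS][16];
	unsigned long head;	/* number of records ever written */
};

static void rb_write(struct rb *rb, const char *rec)
{
	snprintf(rb->slot[rb->head % RB_SLOTS], sizeof(rb->slot[0]), "%s", rec);
	rb->head++;
}

/* Return 1 if a record equal to rec is still present in the buffer. */
static int rb_contains(const struct rb *rb, const char *rec)
{
	unsigned long n = rb->head < RB_SLOTS ? rb->head : RB_SLOTS;

	for (unsigned long i = 0; i < n; i++)
		if (strcmp(rb->slot[i], rec) == 0)
			return 1;
	return 0;
}

/* Simulate one session: COMM is emitted once at exec time, followed by
 * nsamples samples. With tail_synthesize, COMM is re-emitted right before
 * the buffer is dumped. Returns 1 if the dump still contains COMM. */
static int comm_survives(int nsamples, int tail_synthesize)
{
	struct rb rb;

	memset(&rb, 0, sizeof(rb));
	rb_write(&rb, "COMM dd");
	for (int i = 0; i < nsamples; i++)
		rb_write(&rb, "SAMPLE");
	if (tail_synthesize)
		rb_write(&rb, "COMM dd");
	return rb_contains(&rb, "COMM dd");
}
```

With RB_SLOTS set to 4, comm_survives(10, 0) returns 0 because the COMM record has been overwritten, while comm_survives(10, 1) returns 1. As the message notes, this only helps for processes that are still alive when the dump happens.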
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-16-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-14 08:34:47 +00:00
err = record__synthesize(rec, false);
2016-02-26 09:32:07 +00:00
if (err < 0)
2022-01-17 18:34:25 +00:00
goto out_free_threads;
2012-08-26 18:24:47 +00:00

2011-11-25 10:19:45 +00:00
if (rec->realtime_prio) {
2009-04-08 13:01:31 +00:00
struct sched_param param;

2011-11-25 10:19:45 +00:00
param.sched_priority = rec->realtime_prio;
2009-04-08 13:01:31 +00:00
if (sched_setscheduler(0, SCHED_FIFO, &param)) {
2009-10-21 19:34:06 +00:00
pr_err("Could not set realtime priority.\n");
2012-08-26 18:24:47 +00:00
err = -1;
2022-01-17 18:34:25 +00:00
goto out_free_threads;
2009-04-08 13:01:31 +00:00
}
}

2022-01-17 18:34:25 +00:00
if (record__start_threads(rec))
goto out_free_threads;

2012-11-12 17:34:01 +00:00
/*
* When perf is starting the traced process, all the events
* (apart from group members) have enable_on_exec=1 set,
* so don't spoil it by prematurely enabling them.
*/
2023-03-02 03:11:45 +00:00
if (!target__none(&opts->target) && !opts->target.initial_delay)
2019-07-21 11:24:08 +00:00
evlist__enable(rec->evlist);
2011-08-25 16:17:55 +00:00

2009-12-16 16:55:55 +00:00
/*
* Let the child rip
*/
2015-09-22 00:24:55 +00:00
if (forks) {
2018-03-07 15:50:04 +00:00
struct machine *machine = &session->machines.host;
2015-09-30 01:45:24 +00:00
union perf_event *event;
2017-03-07 20:41:51 +00:00
pid_t tgid;
2015-09-30 01:45:24 +00:00

event = malloc(sizeof(event->comm) + machine->id_hdr_size);
if (event == NULL) {
err = -ENOMEM;
goto out_child;
}

2015-09-22 00:24:55 +00:00
/*
* Some H/W events are generated before COMM event
* which is emitted during exec(), so perf script
* cannot see a correct process name for those events.
* Synthesize COMM event to prevent it.
*/
2017-03-07 20:41:51 +00:00
tgid = perf_event__synthesize_comm(tool, event,
rec->evlist->workload.pid,
process_synthesized_event,
machine);
free(event);

if (tgid == -1)
goto out_child;

event = malloc(sizeof(event->namespaces) +
(NR_NAMESPACES * sizeof(struct perf_ns_link_info)) +
machine->id_hdr_size);
if (event == NULL) {
err = -ENOMEM;
goto out_child;
}

/*
* Synthesize NAMESPACES event for the command specified.
*/
perf_event__synthesize_namespaces(tool, event,
rec->evlist->workload.pid,
tgid, process_synthesized_event,
machine);
2015-09-30 01:45:24 +00:00
free(event);
2015-09-22 00:24:55 +00:00

2020-11-30 12:26:54 +00:00
evlist__start_workload(rec->evlist);
2015-09-22 00:24:55 +00:00
}
2009-12-16 16:55:55 +00:00

2023-03-02 03:11:45 +00:00
if (opts->target.initial_delay) {
2020-07-17 07:07:03 +00:00
pr_info(EVLIST_DISABLED_MSG);
2023-03-02 03:11:45 +00:00
if (opts->target.initial_delay > 0) {
usleep(opts->target.initial_delay * USEC_PER_MSEC);
2020-07-17 07:07:03 +00:00
evlist__enable(rec->evlist);
pr_info(EVLIST_ENABLED_MSG);
}
2014-01-11 21:38:27 +00:00
}

perf record: Allow multiple recording time ranges
AUX area traces can produce too much data to record successfully or
analyze subsequently. Add another means to reduce data collection by
allowing multiple recording time ranges.
This is useful, for instance, in cases where a workload produces
predictably reproducible events in specific time ranges.
Today we only have perf record -D <msecs> to start at a specific region, or
some complicated approach using snapshot mode and external scripts sending
signals or using the fifos. But these approaches are difficult to set up
compared with simply having perf do it.
Extend the perf record -D/--delay option to accept relative timestamps
for start/stop, controlled by perf with the right time offset, for
instance:
perf record -e intel_pt// -D 10-20,30-40
to record 10ms to 20ms into the trace and 30ms to 40ms.
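The range list accepted by --delay can be parsed with a few lines of C. This is a minimal sketch, not perf's parser: struct time_range, parse_ms() and parse_delay_ranges() are hypothetical, and the sketch assumes the values are milliseconds and that ranges must be non-empty, ascending and disjoint. It does not handle the single -D <msecs> or -D -1 forms.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical representation of one recording window, in milliseconds
 * relative to the start of the workload. */
struct time_range {
	unsigned long start_ms;
	unsigned long end_ms;
};

/* Parse one unsigned decimal number at *p and advance *p past it.
 * Returns 0 on success, -1 if no digit is present or on overflow. */
static int parse_ms(const char **p, unsigned long *val)
{
	char *end;

	if (!isdigit((unsigned char)**p))
		return -1;
	errno = 0;
	*val = strtoul(*p, &end, 10);
	if (errno)
		return -1;
	*p = end;
	return 0;
}

/*
 * Parse a list such as "10-20,40-70,110-160" into at most max ranges.
 * Returns the number of ranges, or -1 if the list is malformed, a range
 * is empty or inverted, or the ranges are not ascending and disjoint.
 */
static int parse_delay_ranges(const char *str, struct time_range *r, int max)
{
	const char *p = str;
	int n = 0;

	while (*p) {
		if (n == max)
			return -1;
		if (parse_ms(&p, &r[n].start_ms) || *p++ != '-')
			return -1;
		if (parse_ms(&p, &r[n].end_ms) || r[n].end_ms <= r[n].start_ms)
			return -1;
		if (n > 0 && r[n].start_ms < r[n - 1].end_ms)
			return -1;	/* overlaps or precedes the previous range */
		n++;
		if (*p == ',') {
			p++;
			if (*p == '\0')
				return -1;	/* trailing comma */
		} else if (*p != '\0') {
			return -1;	/* junk after a range */
		}
	}
	return n;
}
```

For the example used below, parse_delay_ranges("10-20,40-70,110-160", r, 8) returns 3, with r[0] holding 10 and 20.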
Example:
The example workload is:
$ cat repeat-usleep.c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int usleep(useconds_t usec);
int usage(int ret, const char *msg)
{
if (msg)
fprintf(stderr, "%s\n", msg);
fprintf(stderr, "Usage is: repeat-usleep <microseconds>\n");
return ret;
}
int main(int argc, char *argv[])
{
unsigned long usecs;
char *end_ptr;
if (argc != 2)
return usage(1, "Error: Wrong number of arguments!");
errno = 0;
usecs = strtoul(argv[1], &end_ptr, 0);
if (errno || *end_ptr || usecs > UINT_MAX)
return usage(1, "Error: Invalid argument!");
while (1) {
int ret = usleep(usecs);
if (ret && errno != EINTR)
return usage(1, "Error: usleep() failed!");
}
return 0;
}
$ perf record -e intel_pt//u --delay 10-20,40-70,110-160 -- ./repeat-usleep 500
Events disabled
Events enabled
Events disabled
Events enabled
Events disabled
Events enabled
Events disabled
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 0.204 MB perf.data ]
Terminated
A dlfilter is used to determine continuous data collection (timestamps
less than 1ms apart):
$ cat dlfilter-show-delays.c
#include <perf/perf_dlfilter.h>
#include <stdio.h>
static __u64 start_time;
static __u64 last_time;
int start(void **data, void *ctx)
{
printf("%-17s\t%-9s\t%-6s\n", " Time", " Duration", " Delay");
return 0;
}
int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
__u64 delta;
if (!sample->time)
return 1;
if (!last_time)
goto out;
delta = sample->time - last_time;
if (delta < 1000000)
goto out2;
printf("%17.9f\t%9.1f\t%6.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0, delta / 1000000.0);
out:
start_time = sample->time;
out2:
last_time = sample->time;
return 1;
}
int stop(void *data, void *ctx)
{
printf("%17.9f\t%9.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0);
return 0;
}
The result shows the times roughly match the --delay option:
$ perf script --itrace=qb --dlfilter dlfilter-show-delays.so
Time Duration Delay
39215.302317300 9.7 20.5
39215.332480217 30.4 40.9
39215.403837717 49.8
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220824072814.16422-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-24 07:28:14 +00:00
err = event_enable_timer__start(rec->evlist->eet);
if (err)
goto out_child;

2022-09-12 08:34:11 +00:00
/* Debug message used by test scripts */
pr_debug3("perf record has started\n");
fflush(stderr);

2016-04-20 18:59:49 +00:00
trigger_ready(&auxtrace_snapshot_trigger);
2016-04-20 18:59:50 +00:00
trigger_ready(&switch_output_trigger);
2016-11-26 07:03:28 +00:00
perf_hooks__invoke_record_start();
2022-06-10 11:33:15 +00:00

/*
* Must write FINISHED_INIT so it will be seen after all other
* synthesized user events, but before any regular events.
*/
err = write_finished_init(rec, false);
if (err < 0)
goto out_child;

2009-06-24 19:12:48 +00:00
for (;;) {
2022-01-17 18:34:25 +00:00
unsigned long long hits = thread->samples;
2009-04-08 13:01:31 +00:00

2016-07-14 08:34:43 +00:00
/*
* rec->evlist->bkw_mmap_state is possible to be
* BKW_MMAP_EMPTY here: when done == true and
* hits != rec->samples in previous round.
*
2020-11-30 12:33:55 +00:00
* evlist__toggle_bkw_mmap ensure we never
2016-07-14 08:34:43 +00:00
* convert BKW_MMAP_EMPTY to BKW_MMAP_DATA_PENDING.
*/
if (trigger_is_hit(&switch_output_trigger) || done || draining)
2020-11-30 12:33:55 +00:00
evlist__toggle_bkw_mmap(rec->evlist, BKW_MMAP_DATA_PENDING);
2016-07-14 08:34:43 +00:00

perf record: Implement --mmap-flush=<number> option
Implement a --mmap-flush option that specifies the minimal number of bytes
that is extracted from the mmaped kernel buffer to store into a trace. The
default value is 1 byte, which means that every time the trace writing
thread finds new data in the mmaped buffer, the data is extracted,
possibly compressed and written to the trace.
$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc
The option is independent of the -z setting, doesn't vary with the
compression level, and can serve two purposes.
The first purpose is to increase the compression ratio of the trace data.
Larger data chunks are compressed more effectively, so the option allows
specifying the data chunk size to compress. Also, in some cases executing
more write syscalls with smaller data sizes can take longer than executing
fewer write syscalls with bigger data sizes due to syscall overhead, so
extracting bigger data chunks, as specified by the option value, could
additionally decrease runtime overhead.
The second purpose is to avoid a self-monitoring live-lock issue in system
wide (-a) profiling mode. Profiling in system wide mode with compression
(-a -z) can additionally induce data into the kernel buffers along with
the data from the monitored processes. If the performance data rate and
volume from the monitored processes are high, then trace streaming and
compression activity in the tool is also high. High tool process activity
can lead to a subtle live-lock effect when compressing a single new byte
from one mmaped kernel buffer leads to the generation of the next single
byte in some mmaped buffer, so the perf tool process ends up in endless
self-monitoring.
The implemented synch parameter is the means to force data to be moved
regardless of the specified flush threshold value. Whatever flush value is
provided, the tool needs the ability to unconditionally drain memory
buffers, at least at the end of the collection.
Committer testing:
Running with the default value, i.e. consuming data as soon as there is
something to read, we first write the synthesized events, in small chunks
of about 128 bytes:
# perf trace -m 2048 --call-graph dwarf -e write -- perf record
<SNIP>
101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
Then we move to reading the mmap buffers consuming the events put there
by the kernel perf infrastructure:
107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
<SNIP same backtrace as in the 107.561 timestamp>
12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
<SNIP same backtrace as in the 107.561 timestamp>
12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
<SNIP same backtrace as in the 107.561 timestamp>
If we limit it to write only when more than 16MB are available for
reading, the threshold is throttled to a quarter of the mmap size set via
--mmap-pages for 'perf record', which by default is 528384 bytes, as
reported by 'record -v':
mmap flush: 132096
mmap size 528384B
With that in place all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers setup by the
kernel perf infrastructure were at least 132096 bytes long.
Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760
It was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704B
A warning about that would be good to have but can be added later,
something like:
"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"
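The clamping observed above, together with the push decision described for the synch parameter, can be summarized in a short sketch. The helper names are hypothetical and this is not perf's code; it only encodes the behaviour reported in this message: a minimum of 1 byte, a maximum of a quarter of the mmap size, and an unconditional drain when synch is set.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: clamp a requested --mmap-flush value to at least
 * 1 byte and at most a quarter of the mmap buffer size, matching the
 * "mmap flush" values reported by 'record -v' above.
 */
static size_t effective_mmap_flush(size_t requested, size_t mmap_size)
{
	size_t max_flush = mmap_size / 4;

	if (requested < 1)
		requested = 1;
	return requested > max_flush ? max_flush : requested;
}

/*
 * Hypothetical push decision: write out buffered data only once at least
 * 'flush' bytes are available, unless 'synch' forces a drain, e.g. at the
 * end of the collection. Nothing is pushed from an empty buffer.
 */
static bool should_push(size_t available, size_t flush, bool synch)
{
	if (available == 0)
		return false;
	return synch || available >= flush;
}
```

With the default 528384-byte mmap, a 16M request yields 132096, and with an 8392704-byte mmap it yields 2098176, the same values printed above.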
Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:
cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-18 17:40:26 +00:00
if (record__mmap_read_all(rec, false) < 0) {
2016-04-20 18:59:49 +00:00
trigger_error(&auxtrace_snapshot_trigger);
2016-04-20 18:59:50 +00:00
trigger_error(&switch_output_trigger);
2012-08-26 18:24:47 +00:00
err = -1;
2014-05-12 00:47:24 +00:00
goto out_child;
2012-08-26 18:24:47 +00:00
}
2009-04-08 13:01:31 +00:00

2015-04-30 14:37:32 +00:00
if (auxtrace_record__snapshot_started) {
auxtrace_record__snapshot_started = 0;
2016-04-20 18:59:49 +00:00
if (!trigger_is_error(&auxtrace_snapshot_trigger))
2019-08-06 14:41:01 +00:00
record__read_auxtrace_snapshot(rec, false);
2016-04-20 18:59:49 +00:00
if (trigger_is_error(&auxtrace_snapshot_trigger)) {
2015-04-30 14:37:32 +00:00
pr_err("AUX area tracing snapshot failed\n");
err = -1;
goto out_child;
}
}

2016-04-20 18:59:50 +00:00
if (trigger_is_hit(&switch_output_trigger)) {
2016-07-14 08:34:43 +00:00
/*
* If switch_output_trigger is hit, the data in
* overwritable ring buffer should have been collected,
* so bkw_mmap_state should be set to BKW_MMAP_EMPTY.
*
* If SIGUSR2 raise after or during record__mmap_read_all(),
* record__mmap_read_all() didn't collect data from
* overwritable ring buffer. Read again.
*/
if (rec->evlist->bkw_mmap_state == BKW_MMAP_RUNNING)
continue;
2016-04-20 18:59:50 +00:00
trigger_ready(&switch_output_trigger);

2016-07-14 08:34:43 +00:00
/*
* Reenable events in overwrite ring buffer after
* record__mmap_read_all(): we should have collected
* data from it.
*/
2020-11-30 12:33:55 +00:00
evlist__toggle_bkw_mmap(rec->evlist, BKW_MMAP_RUNNING);
2016-07-14 08:34:43 +00:00

2016-04-20 18:59:50 +00:00
if (!quiet)
fprintf(stderr, "[ perf record: dump data: Woken up %ld times ]\n",
2022-01-17 18:34:25 +00:00
record__waking(rec));
thread->waking = 0;
2016-04-20 18:59:50 +00:00
fd = record__switch_output(rec, false);
if (fd < 0) {
pr_err("Failed to switch to new file\n");
trigger_error(&switch_output_trigger);
err = fd;
goto out_child;
}
2017-01-09 09:52:00 +00:00

/* re-arm the alarm */
if (rec->switch_output.time)
alarm(rec->switch_output.time);
2016-04-20 18:59:50 +00:00
}

2022-01-17 18:34:25 +00:00
if (hits == thread->samples) {
2014-08-13 14:33:59 +00:00
if (done || draining)
2009-06-24 19:12:48 +00:00
break;
2022-01-17 18:34:25 +00:00
err = fdarray__poll(&thread->pollfd, -1);
2014-06-02 17:44:23 +00:00
/*
* Propagate error, only if there's any. Ignore positive
* number of returned events and interrupt error.
*/
if (err > 0 || (err < 0 && errno == EINTR))
2014-05-12 00:47:24 +00:00
err = 0;
2022-01-17 18:34:25 +00:00
thread->waking++;
2014-08-13 14:33:59 +00:00

2022-01-17 18:34:25 +00:00
if (fdarray__filter(&thread->pollfd, POLLERR | POLLHUP,
record__thread_munmap_filtered, NULL) == 0)
2014-08-13 14:33:59 +00:00
draining = true;
2022-01-17 18:34:25 +00:00

2022-08-24 07:28:10 +00:00
err = record__update_evlist_pollfd_from_thread(rec, rec->evlist, thread);
if (err)
goto out_child;
2009-09-17 17:59:05 +00:00
}

2020-07-17 07:07:50 +00:00
if (evlist__ctlfd_process(rec->evlist, &cmd) > 0) {
switch (cmd) {
2020-09-01 09:37:57 +00:00
case EVLIST_CTL_CMD_SNAPSHOT:
hit_auxtrace_snapshot_trigger(rec);
evlist__ctlfd_ack(rec->evlist);
break;
2020-12-26 23:20:37 +00:00
case EVLIST_CTL_CMD_STOP:
done = 1;
break;
2020-07-17 07:07:50 +00:00
case EVLIST_CTL_CMD_ACK:
case EVLIST_CTL_CMD_UNSUPPORTED:
perf tools: Allow to enable/disable events via control file
Add new control commands to enable/disable a specific event.
The interface strings for the control file are:
'enable <EVENT NAME>'
'disable <EVENT NAME>'
When the command is received, perf scans the current evlist
for <EVENT NAME> and, if found, enables/disables it.
Example session:
terminal 1:
# mkfifo control ack perf.pipe
# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:*' -o - > perf.pipe
terminal 2:
# cat perf.pipe | perf --no-pager script -i -
terminal 1:
Events disabled
NOTE: The above message shows up only after the read side of the pipe
is started on 'terminal 2': the '>' redirection in 'terminal 1' blocks
until then, so bash does not execute perf before that, hence the
delayed perf record message.
terminal 3:
# echo 'enable sched:sched_process_fork' > control
terminal 1:
event sched:sched_process_fork enabled
terminal 2:
bash 33349 [034] 149587.674295: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34056
bash 33349 [034] 149588.239521: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34057
terminal 3:
# echo 'enable sched:sched_wakeup_new' > control
terminal 1:
event sched:sched_wakeup_new enabled
terminal 2:
bash 33349 [034] 149632.228023: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34059
bash 33349 [034] 149632.228050: sched:sched_wakeup_new: bash:34059 [120] success=1 CPU:036
bash 33349 [034] 149633.950005: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34060
bash 33349 [034] 149633.950030: sched:sched_wakeup_new: bash:34060 [120] success=1 CPU:036
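A control line of the form 'enable <EVENT NAME>' or 'disable <EVENT NAME>' can be split into a command and an event name as sketched below. This is an illustration only: enum ctl_cmd and parse_ctl_line() are hypothetical and do not mirror perf's internal control-file parsing. In this sketch a bare 'enable' or 'disable' returns an empty name, which a caller could choose to treat as "all events".

```c
#include <string.h>

/* Hypothetical command codes for this sketch only. */
enum ctl_cmd { CTL_UNKNOWN, CTL_ENABLE, CTL_DISABLE };

/*
 * Parse one line read from the control file, e.g.
 * "enable sched:sched_wakeup_new". On success *name points at the event
 * name inside 'line' (possibly an empty string); otherwise it is NULL.
 */
static enum ctl_cmd parse_ctl_line(const char *line, const char **name)
{
	static const struct {
		const char *word;
		enum ctl_cmd cmd;
	} cmds[] = {
		{ "enable", CTL_ENABLE },
		{ "disable", CTL_DISABLE },
	};

	for (size_t i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++) {
		size_t len = strlen(cmds[i].word);

		if (strncmp(line, cmds[i].word, len) != 0)
			continue;
		if (line[len] != '\0' && line[len] != ' ')
			continue;	/* e.g. "enabled" is not "enable" */
		*name = line + len;
		while (**name == ' ')
			(*name)++;	/* skip the separating spaces */
		return cmds[i].cmd;
	}
	*name = NULL;
	return CTL_UNKNOWN;
}
```

Matching the whole keyword rather than a prefix keeps a typo such as 'enabled' from being silently accepted.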
Committer testing:
If I use 'sched:*' and then enable all events, I can't get 'perf record'
to react to further commands, so I tested it with:
[root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe
Events disabled
Events enabled
Events disabled
And then it works as expected, so we need to fix this pre-existing
problem.
Another issue: we need to check whether an event is already enabled or
disabled and change the message to be clearer, i.e.:
[root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe
Events disabled
If we receive a 'disable' command, then it should say:
[root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe
Events disabled
Events already disabled
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201226232038.390883-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-26 23:20:35 +00:00
case EVLIST_CTL_CMD_ENABLE:
case EVLIST_CTL_CMD_DISABLE:
perf tools: Add 'evlist' control command
Add a new 'evlist' control command to display all the evlist events.
When it is received, perf scans and prints the current evlist to the
perf record terminal.
The interface string for the control file is:
evlist [-v|-g|-F]
The syntax follows perf evlist command:
-F Show just the sample frequency used for each event.
-v Show all fields.
-g Show event group information.
Example session:
terminal 1:
# mkfifo control ack
# perf record --control=fifo:control,ack -e '{cycles,instructions}'
terminal 2:
# echo evlist > control
terminal 1:
cycles
instructions
dummy:HG
terminal 2:
# echo 'evlist -v' > control
terminal 1:
cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: \
IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1, \
sample_id_all: 1, exclude_guest: 1
instructions: size: 120, config: 0x1, { sample_period, sample_freq }: 4000, \
sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, inherit: 1, freq: 1, \
sample_id_all: 1, exclude_guest: 1
dummy:HG: type: 1, size: 120, config: 0x9, { sample_period, sample_freq }: 4000, \
sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, inherit: 1, mmap: 1, \
comm: 1, freq: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, \
bpf_event: 1
terminal 2:
# echo 'evlist -g' > control
terminal 1:
{cycles,instructions}
dummy:HG
terminal 2:
# echo 'evlist -F' > control
terminal 1:
cycles: sample_freq=4000
instructions: sample_freq=4000
dummy:HG: sample_freq=4000
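The optional flags of the 'evlist' control command map naturally onto a small flag struct. The sketch below is hypothetical (struct evlist_opts and parse_evlist_cmd() are not perf's code); it accepts the bare command and any combination of -v, -g and -F separated by spaces, and rejects anything else.

```c
#include <string.h>

/* Hypothetical flags for this sketch only. */
struct evlist_opts {
	int verbose;	/* -v: show all fields */
	int group;	/* -g: show event group information */
	int freq;	/* -F: show just the sample frequency */
};

/*
 * Parse an "evlist [-v|-g|-F]" control line. Returns 0 on success, or -1
 * if the line is not an evlist command or has an unknown option.
 */
static int parse_evlist_cmd(const char *line, struct evlist_opts *o)
{
	memset(o, 0, sizeof(*o));
	if (strncmp(line, "evlist", 6) != 0 ||
	    (line[6] != '\0' && line[6] != ' '))
		return -1;

	for (const char *p = line + 6; *p; p++) {
		if (*p == ' ')
			continue;
		/* each option must be exactly '-' plus one letter */
		if (p[0] != '-' || p[1] == '\0' || (p[2] != '\0' && p[2] != ' '))
			return -1;
		switch (p[1]) {
		case 'v':
			o->verbose = 1;
			break;
		case 'g':
			o->group = 1;
			break;
		case 'F':
			o->freq = 1;
			break;
		default:
			return -1;
		}
		p++;	/* step onto the letter; the loop increment moves past it */
	}
	return 0;
}
```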
This new evlist command is handy to get the real event names when
wildcards are used.
Add the evsel_fprintf.c object to the python/perf.so build, because
it is now an evlist.c dependency.
Add a PYTHON_PERF define for the python/perf.so compilation, so we
can use it to compile in only evsel__fprintf from the evsel_fprintf.c
object.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201226232038.390883-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-26 23:20:36 +00:00
case EVLIST_CTL_CMD_EVLIST:
2020-12-26 23:20:38 +00:00
case EVLIST_CTL_CMD_PING:
2020-07-17 07:07:50 +00:00
default:
break;
}
}

2022-08-24 07:28:14 +00:00
err = event_enable_timer__process(rec->evlist->eet);
if (err < 0)
goto out_child;
if (err) {
err = 0;
done = 1;
}

2012-11-12 17:34:01 +00:00
/*
* When perf is starting the traced process, at the end events
* die with the process and we wait for that. Thus no need to
* disable events in this case.
*/
2013-11-12 19:46:16 +00:00
if (done && !disabled && !target__none(&opts->target)) {
2016-04-20 18:59:49 +00:00
trigger_off(&auxtrace_snapshot_trigger);
2019-07-21 11:24:09 +00:00
evlist__disable(rec->evlist);
2012-11-12 17:34:02 +00:00
disabled = true;
}
2009-04-08 13:01:31 +00:00
}
2019-08-06 14:41:01 +00:00

2016-04-20 18:59:49 +00:00
trigger_off(&auxtrace_snapshot_trigger);
2016-04-20 18:59:50 +00:00
trigger_off(&switch_output_trigger);
2009-04-08 13:01:31 +00:00

2019-08-06 14:41:01 +00:00
if (opts->auxtrace_snapshot_on_exit)
record__auxtrace_snapshot_exit(rec);

2014-01-02 18:11:25 +00:00
if (forks && workload_exec_errno) {
perf record: Improve 'Workload failed' message printing events + what was exec'ed
Before:
# perf record -a cycles,instructions,cache-misses
Workload failed: No such file or directory
#
After:
# perf record -a cycles,instructions,cache-misses
Failed to collect 'cycles' for the 'cycles,instructions,cache-misses' workload: No such file or directory
#
This helps disambiguate other error scenarios:
# perf record -a -e cycles,instructions,cache-misses bla
Failed to collect 'cycles,instructions,cache-misses' for the 'bla' workload: No such file or directory
# perf record -a cycles,instructions,cache-misses sleep 1
Failed to collect 'cycles' for the 'cycles,instructions,cache-misses' workload: No such file or directory
#
When all goes well we're back to the usual:
# perf record -a -e cycles,instructions,cache-misses sleep 1
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 3.151 MB perf.data (21242 samples) ]
#
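The new message combines three pieces of information: the events that were requested, the workload that was exec'ed and the exec errno. A minimal sketch of that formatting, assuming a hypothetical format_workload_error() helper; it calls strerror() for brevity, whereas the code in this file formats the errno with str_error_r().

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the improved diagnostic from the event list
 * string, the name of the exec'ed workload and the exec errno.
 * Returns the snprintf() result, i.e. the length the full message needs.
 */
static int format_workload_error(char *buf, size_t len, const char *events,
				 const char *workload, int exec_errno)
{
	return snprintf(buf, len,
			"Failed to collect '%s' for the '%s' workload: %s",
			events, workload, strerror(exec_errno));
}
```

Naming both the events and the workload is what lets a reader tell apart the "forgot -e" case from a genuinely missing program, as in the examples above.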
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20210414131628.2064862-3-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-14 12:32:14 +00:00
char msg[STRERR_BUFSIZE], strevsels[2048];
tools: Introduce str_error_r()
The tools so far have been using the GNU variant of strerror_r(), which
returns a string: either the buffer passed in or something else.
But that, besides being tricky in cases where we expect the function
using strerror_r() to return the error formatted in a provided buffer
(we have to check whether it returned something else and copy that
instead), breaks the build on systems not using glibc, like Alpine
Linux, where musl libc is used.
So, introduce yet another wrapper, str_error_r(), that has the GNU
interface but uses the portable XSI variant of strerror_r(), so that
users can rest assured that the provided buffer is used and is what
gets returned.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-d4t42fnf48ytlk8rjxs822tf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
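The wrapper idea described above can be sketched as follows. This is a minimal illustration, not the tools/lib implementation. The name `str_error_r_sketch()` and the fallback message text are hypothetical. The sketch assumes that building without `_GNU_SOURCE` (and with `_POSIX_C_SOURCE` 200112L) selects the XSI `strerror_r()`, which returns an int. That is the case for glibc, and musl only provides the XSI variant.

```c
/* Ask for the portable XSI strerror_r(), which returns int, not char *. */
#undef _GNU_SOURCE
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper with the GNU-style signature (returns char *), built on
 * the XSI variant so that the returned pointer is always the caller's buffer.
 */
char *str_error_r_sketch(int errnum, char *buf, size_t buflen)
{
	int err = strerror_r(errnum, buf, buflen);

	/* XSI strerror_r() reports failure (e.g. EINVAL, ERANGE) via a nonzero return. */
	if (err)
		snprintf(buf, buflen, "INTERNAL ERROR: strerror_r(%d, [buf], %zu)=%d",
			 errnum, buflen, err);
	return buf;
}
```

Callers can then print the result directly, e.g. `pr_err("%s\n", str_error_r_sketch(errno, msg, sizeof(msg)))`, without checking which buffer came back.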
2016-07-06 14:56:20 +00:00
|
|
|
const char *emsg = str_error_r(workload_exec_errno, msg, sizeof(msg));
|
2021-04-14 12:32:14 +00:00
|
|
|
|
|
|
|
evlist__scnprintf_evsels(rec->evlist, sizeof(strevsels), strevsels);
|
|
|
|
|
|
|
|
pr_err("Failed to collect '%s' for the '%s' workload: %s\n",
|
|
|
|
strevsels, argv[0], emsg);
|
2014-01-02 18:11:25 +00:00
|
|
|
err = -1;
|
2014-05-12 00:47:24 +00:00
|
|
|
goto out_child;
|
2014-01-02 18:11:25 +00:00
|
|
|
}
|
|
|
|
|
2015-01-29 08:06:44 +00:00
|
|
|
if (!quiet)
|
2022-01-17 18:34:25 +00:00
|
|
|
fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n",
|
|
|
|
record__waking(rec));
|
2010-10-26 17:20:09 +00:00
|
|
|
|
2022-06-10 11:33:15 +00:00
|
|
|
write_finished_init(rec, true);
|
|
|
|
|
perf record: Add --tail-synthesize option
When working with an overwritable ring buffer there is an inconvenient
problem: if perf dumps data a long time after it starts, non-sample
events may be lost, which leaves the following 'perf report' unable to
identify process names and the mmap layout. For example:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \
dd if=/dev/zero of=/dev/null
Send SIGUSR2 after dd has run long enough. The resulting perf.data lacks
the correct comm and mmap events:
# perf script -i perf.data.2016061522374354
perf 24478 [004] 2581325.601789: raw_syscalls:sys_exit: NR 0 = 512
^^^^
Should be 'dd'
27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux)
203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux)
b18d83 return_from_SYSCALL_64+0xfe200000 (/lib/modules/4.6.0-rc3+/build/vmlinux)
7f47c417edf0 [unknown] ([unknown])
^^^^^^^^^^^^
Fail to unwind
This patch provides a '--tail-synthesize' option, which allows perf to
collect system status when finalizing the output file. In the resulting
output file, the non-sample events reflect the system status at the
time the data was dumped.
After this patch:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
dd if=/dev/zero of=/dev/null
# perf script -i perf.data.2016061600544998
dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ...
^^
Correct comm
203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms])
203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms])
203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms])
b18d83 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
d8e50 __GI___libc_write+0xffff01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so)
^^^^^
Correct unwind
This option doesn't aim to solve this problem completely. If a process
terminates before SIGUSR2, we still lose its COMM and MMAP events. For
example, we can't unwind correctly from the final perf.data we get from
the previous example, because when perf collects the final output file
(when we press C-c), 'dd' has already terminated, so its
'/proc/<pid>/maps' becomes empty.
However, this is a cheaper choice. To solve this problem completely, we
would need to output non-sample events continuously. To satisfy the
requirement of daemonization, we would need to merge them periodically.
That is possible but requires much more code and cycles.
Automatically select --tail-synthesize when --overwrite is provided.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-16-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
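Tail synthesis, as described above, reads per-process state from /proc when the output file is finalized rather than at startup. The sketch below illustrates that idea for a single field. `read_comm_at_tail()` is a hypothetical helper, and the sketch assumes a Linux procfs mounted at /proc. perf itself synthesizes full COMM and MMAP records; this sketch only shows why a process that has already gone away cannot be described at the tail.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the command name of @pid from /proc/<pid>/comm.
 * Returns 0 on success, or -1 if the process is gone or procfs is missing,
 * which is exactly the case tail synthesis cannot recover from.
 */
int read_comm_at_tail(pid_t pid, char *buf, size_t len)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/comm", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */
	return 0;
}
```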
2016-07-14 08:34:47 +00:00
|
|
|
if (target__none(&rec->opts.target))
|
|
|
|
record__synthesize_workload(rec, true);
|
|
|
|
|
2014-05-12 00:47:24 +00:00
|
|
|
out_child:
|
2022-01-17 18:34:25 +00:00
|
|
|
record__stop_threads(rec);
|
perf record: Implement --mmap-flush=<number> option
Implement a --mmap-flush option that specifies the minimal number of
bytes that is extracted from the mmaped kernel buffer to be stored into
a trace. The default value is 1 byte, which means that every time the
trace writing thread finds new data in the mmaped buffer, the data is
extracted, possibly compressed and written to the trace.
$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc
The option is independent of the -z setting, doesn't vary with the
compression level and can serve two purposes.
The first purpose is to increase the compression ratio of the trace
data. Larger data chunks are compressed more effectively, so the option
allows specifying the size of the data chunk to compress. Also, in some
cases executing more write syscalls with smaller data sizes can take
longer than executing fewer write syscalls with bigger data sizes, due
to syscall overhead, so extracting the bigger data chunks specified by
the option value could additionally decrease runtime overhead.
The second purpose is to avoid a self-monitoring live-lock issue in
system wide (-a) profiling mode. Profiling in system wide mode with
compression (-a -z) can additionally induce data into the kernel buffers
along with the data from monitored processes. If the performance data
rate and volume from the monitored processes are high, then trace
streaming and compression activity in the tool is also high. High tool
process activity can lead to a subtle live-lock effect where compressing
a single new byte from one of the mmaped kernel buffers leads to the
generation of the next single byte in some mmaped buffer, so the perf
tool process ends up in endless self-monitoring.
The implemented synch parameter is the means to force data to move
independently of the specified flush threshold value. Regardless of the
provided flush value, the tool needs the capability to unconditionally
drain the memory buffers, at least at the end of the collection.
Committer testing:
Running with the default value, i.e. as soon as there is something to
read go on consuming, we first write the synthesized events, small
chunks of about 128 bytes:
# perf trace -m 2048 --call-graph dwarf -e write -- perf record
<SNIP>
101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
Then we move to reading the mmap buffers consuming the events put there
by the kernel perf infrastructure:
107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
<SNIP same backtrace as in the 107.561 timestamp>
12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
<SNIP same backtrace as in the 107.561 timestamp>
12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
<SNIP same backtrace as in the 107.561 timestamp>
If we limit it to write only when more than 16MB are available for
reading, it throttles that to a quarter of the mmap buffer size set via
--mmap-pages for 'perf record', which by default is 528384 bytes, as
shown by 'record -v':
mmap flush: 132096
mmap size 528384B
With that in place all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers setup by the
kernel perf infrastructure were at least 132096 bytes long.
Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760
It was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704B
A warning about that would be good to have but can be added later,
something like:
"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"
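The quarter-of-the-buffer limit seen in both runs above can be checked by hand: 528384 / 4 = 132096 and 8392704 / 4 = 2098176. Below is a sketch of that clamping. It assumes the limit is exactly mmap size / 4, which matches both pairs of numbers here. `clamp_mmap_flush()` is a hypothetical name, not the perf function.

```c
#include <stddef.h>

/*
 * Hypothetical helper: the effective flush threshold never exceeds a quarter
 * of the per-CPU mmap buffer, so a 16M request is silently reduced.
 */
size_t clamp_mmap_flush(size_t requested, size_t mmap_size)
{
	size_t flush_max = mmap_size / 4;

	return requested > flush_max ? flush_max : requested;
}
```

This is also why the suggested warning would point users at -m/--mmap-pages: raising the mmap size is the only way to raise the cap.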
Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:
cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-18 17:40:26 +00:00
|
|
|
record__mmap_read_all(rec, true);
|
2022-01-17 18:34:25 +00:00
|
|
|
out_free_threads:
|
2022-01-17 18:34:23 +00:00
|
|
|
record__free_thread_data(rec);
|
2022-01-17 18:34:25 +00:00
|
|
|
evlist__finalize_ctlfd(rec->evlist);
|
2018-11-06 09:04:58 +00:00
|
|
|
record__aio_mmap_read_sync(rec);
|
|
|
|
|
2019-03-18 17:41:02 +00:00
|
|
|
if (rec->session->bytes_transferred && rec->session->bytes_compressed) {
|
|
|
|
ratio = (float)rec->session->bytes_transferred/(float)rec->session->bytes_compressed;
|
|
|
|
session->header.env.comp_ratio = ratio + 0.5;
|
|
|
|
}
|
|
|
|
|
2014-05-12 00:47:24 +00:00
|
|
|
if (forks) {
|
|
|
|
int exit_status;
|
2009-06-02 21:43:11 +00:00
|
|
|
|
2014-05-12 00:47:24 +00:00
|
|
|
if (!child_finished)
|
|
|
|
kill(rec->evlist->workload.pid, SIGTERM);
|
|
|
|
|
|
|
|
wait(&exit_status);
|
|
|
|
|
|
|
|
if (err < 0)
|
|
|
|
status = err;
|
|
|
|
else if (WIFEXITED(exit_status))
|
|
|
|
status = WEXITSTATUS(exit_status);
|
|
|
|
else if (WIFSIGNALED(exit_status))
|
|
|
|
signr = WTERMSIG(exit_status);
|
|
|
|
} else
|
|
|
|
status = err;
|
|
|
|
|
2022-05-18 22:47:21 +00:00
|
|
|
if (rec->off_cpu)
|
|
|
|
rec->bytes_written += off_cpu_write(rec->session);
|
|
|
|
|
2022-09-01 19:57:37 +00:00
|
|
|
record__read_lost_samples(rec);
|
2016-07-14 08:34:47 +00:00
|
|
|
record__synthesize(rec, true);
|
2015-01-29 08:06:44 +00:00
|
|
|
/* this will be recalculated during process_buildids() */
|
|
|
|
rec->samples = 0;
|
|
|
|
|
2016-04-13 08:21:07 +00:00
|
|
|
if (!err) {
|
|
|
|
if (!rec->timestamp_filename) {
|
|
|
|
record__finish_output(rec);
|
|
|
|
} else {
|
|
|
|
fd = record__switch_output(rec, true);
|
|
|
|
if (fd < 0) {
|
|
|
|
status = fd;
|
|
|
|
goto out_delete_session;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2010-07-29 17:08:55 +00:00
|
|
|
|
2016-11-26 07:03:28 +00:00
|
|
|
perf_hooks__invoke_record_end();
|
|
|
|
|
2015-01-29 08:06:44 +00:00
|
|
|
if (!err && !quiet) {
|
|
|
|
char samples[128];
|
2016-04-13 08:21:07 +00:00
|
|
|
const char *postfix = rec->timestamp_filename ?
|
|
|
|
".<timestamp>" : "";
|
2015-01-29 08:06:44 +00:00
|
|
|
|
2015-04-09 15:53:45 +00:00
|
|
|
if (rec->samples && !rec->opts.full_auxtrace)
|
2015-01-29 08:06:44 +00:00
|
|
|
scnprintf(samples, sizeof(samples),
|
|
|
|
" (%" PRIu64 " samples)", rec->samples);
|
|
|
|
else
|
|
|
|
samples[0] = '\0';
|
|
|
|
|
2019-03-18 17:41:02 +00:00
|
|
|
fprintf(stderr, "[ perf record: Captured and wrote %.3f MB %s%s%s",
|
2017-01-23 21:07:59 +00:00
|
|
|
perf_data__size(data) / 1024.0 / 1024.0,
|
2019-02-21 09:41:30 +00:00
|
|
|
data->path, postfix, samples);
|
2019-03-18 17:41:02 +00:00
|
|
|
if (ratio) {
|
|
|
|
fprintf(stderr, ", compressed (original %.3f MB, ratio is %.3f)",
|
|
|
|
rec->session->bytes_transferred / 1024.0 / 1024.0,
|
|
|
|
ratio);
|
|
|
|
}
|
|
|
|
fprintf(stderr, " ]\n");
|
2015-01-29 08:06:44 +00:00
|
|
|
}
|
|
|
|
|
2010-07-29 17:08:55 +00:00
|
|
|
out_delete_session:
|
2020-05-13 02:20:23 +00:00
|
|
|
#ifdef HAVE_EVENTFD_SUPPORT
|
2022-10-24 01:10:24 +00:00
|
|
|
if (done_fd >= 0) {
|
|
|
|
fd = done_fd;
|
|
|
|
done_fd = -1;
|
|
|
|
|
|
|
|
close(fd);
|
|
|
|
}
|
2020-05-13 02:20:23 +00:00
|
|
|
#endif
|
2019-03-18 17:43:35 +00:00
|
|
|
zstd_fini(&session->zstd_data);
|
2019-03-12 05:30:50 +00:00
|
|
|
if (!opts->no_bpf_event)
|
2020-11-30 12:40:10 +00:00
|
|
|
evlist__stop_sb_thread(rec->sb_evlist);
|
2024-03-01 07:46:36 +00:00
|
|
|
|
|
|
|
perf_session__delete(session);
|
2014-05-12 00:47:24 +00:00
|
|
|
return status;
|
2009-04-08 13:01:31 +00:00
|
|
|
}
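The teardown in __cmd_record() above does three things with the workload. It sends SIGTERM if the workload has not finished, reaps it with wait(), and turns the raw status into either an exit code (`status`) or a signal number (`signr`). The standalone sketch below shows that decoding. `run_and_reap()` and the two child functions are hypothetical. The sketch uses waitpid() on the specific child instead of wait(), so it cannot reap an unrelated child.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct workload_result {
	int status; /* exit code if the child exited normally, else -1 */
	int signr;  /* terminating signal if it was killed, else -1 */
};

static void exit_three(void) { _exit(3); }
static void sleep_forever(void) { for (;;) pause(); }

/* Hypothetical helper: fork @child_fn, optionally SIGTERM it, decode its status. */
static struct workload_result run_and_reap(void (*child_fn)(void), int send_sigterm)
{
	struct workload_result res = { .status = -1, .signr = -1 };
	int exit_status;
	pid_t pid = fork();

	if (pid < 0)
		return res;
	if (pid == 0) {
		child_fn();
		_exit(0);
	}
	if (send_sigterm) /* like kill(workload.pid, SIGTERM) when !child_finished */
		kill(pid, SIGTERM);
	if (waitpid(pid, &exit_status, 0) < 0)
		return res;

	if (WIFEXITED(exit_status))
		res.status = WEXITSTATUS(exit_status);
	else if (WIFSIGNALED(exit_status))
		res.signr = WTERMSIG(exit_status);
	return res;
}
```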
|
2009-05-26 07:17:18 +00:00
|
|
|
|
2016-04-15 19:37:17 +00:00
|
|
|
static void callchain_debug(struct callchain_param *callchain)
|
2013-10-26 14:25:33 +00:00
|
|
|
{
|
2015-01-05 18:23:04 +00:00
|
|
|
static const char *str[CALLCHAIN_MAX] = { "NONE", "FP", "DWARF", "LBR" };
|
2014-02-03 11:44:43 +00:00
|
|
|
|
2016-04-15 19:37:17 +00:00
|
|
|
pr_debug("callchain: type %s\n", str[callchain->record_mode]);
|
2012-08-07 13:20:47 +00:00
|
|
|
|
2016-04-15 19:37:17 +00:00
|
|
|
if (callchain->record_mode == CALLCHAIN_DWARF)
|
2013-10-26 14:25:33 +00:00
|
|
|
pr_debug("callchain: stack dump size %d\n",
|
2016-04-15 19:37:17 +00:00
|
|
|
callchain->dump_size);
|
2013-10-26 14:25:33 +00:00
|
|
|
}
|
|
|
|
|
2016-04-15 19:37:17 +00:00
|
|
|
int record_opts__parse_callchain(struct record_opts *record,
|
|
|
|
struct callchain_param *callchain,
|
|
|
|
const char *arg, bool unset)
|
2013-10-26 14:25:33 +00:00
|
|
|
{
|
|
|
|
int ret;
|
2016-04-15 19:37:17 +00:00
|
|
|
callchain->enabled = !unset;
|
2014-02-03 11:44:42 +00:00
|
|
|
|
2013-10-26 14:25:33 +00:00
|
|
|
/* --no-call-graph */
|
|
|
|
if (unset) {
|
2016-04-15 19:37:17 +00:00
|
|
|
callchain->record_mode = CALLCHAIN_NONE;
|
2013-10-26 14:25:33 +00:00
|
|
|
pr_debug("callchain: disabled\n");
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-04-15 19:37:17 +00:00
|
|
|
ret = parse_callchain_record_opt(arg, callchain);
|
2016-01-07 13:30:22 +00:00
|
|
|
if (!ret) {
|
|
|
|
/* Enable data address sampling for DWARF unwind. */
|
2016-04-15 19:37:17 +00:00
|
|
|
if (callchain->record_mode == CALLCHAIN_DWARF)
|
2016-01-07 13:30:22 +00:00
|
|
|
record->sample_address = true;
|
2016-04-15 19:37:17 +00:00
|
|
|
callchain_debug(callchain);
|
2016-01-07 13:30:22 +00:00
|
|
|
}
|
2012-08-07 13:20:47 +00:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
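record_opts__parse_callchain() passes the string to parse_callchain_record_opt(). When DWARF mode is selected, it then turns on data address sampling. The sketch below is a simplified, hypothetical stand-in for that parsing. It accepts only `fp`, `lbr`, `dwarf` and `dwarf,<size>`. The 8192-byte default dump size is an assumption of this sketch, not something stated in the code above.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum sketch_record_mode { SKETCH_NONE, SKETCH_FP, SKETCH_DWARF, SKETCH_LBR };

struct sketch_callchain {
	enum sketch_record_mode mode;
	unsigned long dump_size; /* only meaningful for SKETCH_DWARF */
	bool sample_address;     /* mirrors record->sample_address */
};

/* Returns 0 on success, -1 for an unknown mode or a malformed size. */
int sketch_parse_callchain(const char *arg, struct sketch_callchain *cc)
{
	const char *comma = strchr(arg, ',');
	size_t len = comma ? (size_t)(comma - arg) : strlen(arg);

	cc->mode = SKETCH_NONE;
	cc->dump_size = 0;
	cc->sample_address = false;

	if (!comma && len == 2 && !strncmp(arg, "fp", 2)) {
		cc->mode = SKETCH_FP;
	} else if (!comma && len == 3 && !strncmp(arg, "lbr", 3)) {
		cc->mode = SKETCH_LBR;
	} else if (len == 5 && !strncmp(arg, "dwarf", 5)) {
		cc->mode = SKETCH_DWARF;
		cc->dump_size = 8192; /* assumed default stack dump size */
		if (comma) {
			char *end;
			unsigned long size = strtoul(comma + 1, &end, 0);

			if (end == comma + 1 || *end || size == 0)
				return -1;
			cc->dump_size = size;
		}
		/* Enable data address sampling for DWARF unwind. */
		cc->sample_address = true;
	} else {
		return -1;
	}
	return 0;
}
```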
|
|
|
|
|
2016-04-15 19:37:17 +00:00
|
|
|
int record_parse_callchain_opt(const struct option *opt,
|
|
|
|
const char *arg,
|
|
|
|
int unset)
|
|
|
|
{
|
|
|
|
return record_opts__parse_callchain(opt->value, &callchain_param, arg, unset);
|
|
|
|
}
|
|
|
|
|
2015-07-29 09:42:12 +00:00
|
|
|
int record_callchain_opt(const struct option *opt,
|
2013-10-26 14:25:33 +00:00
|
|
|
const char *arg __maybe_unused,
|
|
|
|
int unset __maybe_unused)
|
|
|
|
{
|
2016-04-18 15:09:08 +00:00
|
|
|
struct callchain_param *callchain = opt->value;
|
2015-07-29 09:42:12 +00:00
|
|
|
|
2016-04-18 15:09:08 +00:00
|
|
|
callchain->enabled = true;
|
2013-10-26 14:25:33 +00:00
|
|
|
|
2016-04-18 15:09:08 +00:00
|
|
|
if (callchain->record_mode == CALLCHAIN_NONE)
|
|
|
|
callchain->record_mode = CALLCHAIN_FP;
|
2014-02-03 11:44:42 +00:00
|
|
|
|
2016-04-18 15:09:08 +00:00
|
|
|
callchain_debug(callchain);
|
2013-10-26 14:25:33 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2014-02-03 11:44:42 +00:00
|
|
|
static int perf_record_config(const char *var, const char *value, void *cb)
|
|
|
|
{
|
2015-12-15 01:49:56 +00:00
|
|
|
struct record *rec = cb;
|
|
|
|
|
|
|
|
if (!strcmp(var, "record.build-id")) {
|
|
|
|
if (!strcmp(value, "cache"))
|
|
|
|
rec->no_buildid_cache = false;
|
|
|
|
else if (!strcmp(value, "no-cache"))
|
|
|
|
rec->no_buildid_cache = true;
|
|
|
|
else if (!strcmp(value, "skip"))
|
|
|
|
rec->no_buildid = true;
|
perf record: Add --buildid-mmap option to enable PERF_RECORD_MMAP2's build id
Add --buildid-mmap option to enable build id in PERF_RECORD_MMAP2 events.
It will only work if there's kernel support for that and it disables
build id cache (implies --no-buildid).
It's also possible to enable it permanently via config option in
~/.perfconfig file:
[record]
build-id=mmap
Also added build_id bit in the verbose output for perf_event_attr:
# perf record --buildid-mmap -vv
...
perf_event_attr:
type 1
size 120
...
build_id 1
Also add the missing text_poke bit.
Committer testing:
$ perf record -h build
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-B, --no-buildid do not collect buildids in perf.data
-N, --no-buildid-cache
do not update the buildid cache
--buildid-all Record build-id of all DSOs regardless of hits
--buildid-mmap Record build-id in map events
$
$ perf record --buildid-mmap sleep 1
Failed: no support to record build id in mmap events, update your kernel.
$
After adding the needed kernel bits in a test kernel:
$ perf record -vv --buildid-mmap sleep 1 |& grep -m1 build
Enabling build id in mmap2 events.
$ perf evlist -v
cycles:u: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1
$
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-14 10:54:57 +00:00
|
|
|
else if (!strcmp(value, "mmap"))
|
|
|
|
rec->buildid_mmap = true;
|
2015-12-15 01:49:56 +00:00
|
|
|
else
|
|
|
|
return -1;
|
|
|
|
return 0;
|
|
|
|
}
|
2018-03-12 11:25:57 +00:00
|
|
|
if (!strcmp(var, "record.call-graph")) {
|
|
|
|
var = "call-graph.record-mode";
|
|
|
|
return perf_default_config(var, value, cb);
|
|
|
|
}
|
2018-11-06 09:07:19 +00:00
|
|
|
#ifdef HAVE_AIO_SUPPORT
|
|
|
|
if (!strcmp(var, "record.aio")) {
|
|
|
|
rec->opts.nr_cblocks = strtol(value, NULL, 0);
|
|
|
|
if (!rec->opts.nr_cblocks)
|
|
|
|
rec->opts.nr_cblocks = nr_cblocks_default;
|
|
|
|
}
|
|
|
|
#endif
|
2021-12-09 20:04:25 +00:00
|
|
|
if (!strcmp(var, "record.debuginfod")) {
|
|
|
|
rec->debuginfod.urls = strdup(value);
|
|
|
|
if (!rec->debuginfod.urls)
|
|
|
|
return -ENOMEM;
|
|
|
|
rec->debuginfod.set = true;
|
|
|
|
}
|
2014-02-03 11:44:42 +00:00
|
|
|
|
2018-03-12 11:25:57 +00:00
|
|
|
return 0;
|
2014-02-03 11:44:42 +00:00
|
|
|
}
|
|
|
|
|
perf record: Allow multiple recording time ranges
AUX area traces can produce too much data to record successfully or
analyze subsequently. Add another means to reduce data collection by
allowing multiple recording time ranges.
This is useful, for instance, in cases where a workload produces
predictably reproducible events in specific time ranges.
Today we only have perf record -D <msecs> to start at a specific region, or
some complicated approach using snapshot mode and external scripts sending
signals or using the fifos. But these approaches are difficult to set up
compared with simply having perf do it.
Extend the perf record -D/--delay option to specify relative timestamps
for start/stop, controlled by perf with the right time offset, for
instance:
perf record -e intel_pt// -D 10-20,30-40
to record from 10ms to 20ms and from 30ms to 40ms into the trace.
Example:
The example workload is:
$ cat repeat-usleep.c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int usleep(useconds_t usec);
int usage(int ret, const char *msg)
{
if (msg)
fprintf(stderr, "%s\n", msg);
fprintf(stderr, "Usage is: repeat-usleep <microseconds>\n");
return ret;
}
int main(int argc, char *argv[])
{
unsigned long usecs;
char *end_ptr;
if (argc != 2)
return usage(1, "Error: Wrong number of arguments!");
errno = 0;
usecs = strtoul(argv[1], &end_ptr, 0);
if (errno || *end_ptr || usecs > UINT_MAX)
return usage(1, "Error: Invalid argument!");
while (1) {
int ret = usleep(usecs);
if (ret && errno != EINTR)
return usage(1, "Error: usleep() failed!");
}
return 0;
}
$ perf record -e intel_pt//u --delay 10-20,40-70,110-160 -- ./repeat-usleep 500
Events disabled
Events enabled
Events disabled
Events enabled
Events disabled
Events enabled
Events disabled
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 0.204 MB perf.data ]
Terminated
A dlfilter is used to determine continuous data collection (timestamps
less than 1ms apart):
$ cat dlfilter-show-delays.c
#include <stdio.h>
#include <perf/perf_dlfilter.h>
static __u64 start_time;
static __u64 last_time;
int start(void **data, void *ctx)
{
printf("%-17s\t%-9s\t%-6s\n", " Time", " Duration", " Delay");
return 0;
}
int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
__u64 delta;
if (!sample->time)
return 1;
if (!last_time)
goto out;
delta = sample->time - last_time;
if (delta < 1000000)
goto out2;
printf("%17.9f\t%9.1f\t%6.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0, delta / 1000000.0);
out:
start_time = sample->time;
out2:
last_time = sample->time;
return 1;
}
int stop(void *data, void *ctx)
{
printf("%17.9f\t%9.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0);
return 0;
}
The result shows the times roughly match the --delay option:
$ perf script --itrace=qb --dlfilter dlfilter-show-delays.so
Time Duration Delay
39215.302317300 9.7 20.5
39215.332480217 30.4 40.9
39215.403837717 49.8
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220824072814.16422-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-24 07:28:14 +00:00
|
|
|
static int record__parse_event_enable_time(const struct option *opt, const char *str, int unset)
|
|
|
|
{
|
|
|
|
struct record *rec = (struct record *)opt->value;
|
|
|
|
|
|
|
|
return evlist__parse_event_enable_time(rec->evlist, &rec->opts, str, unset);
|
|
|
|
}
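record__parse_event_enable_time() forwards the -D/--delay argument to evlist__parse_event_enable_time(). The range syntax from the commit message above ("10-20,40-70,110-160", in milliseconds) can be parsed roughly as sketched here. `parse_enable_ranges()` is hypothetical and handles only the range-list form, not a plain `-D <msecs>`. Rejecting overlapping or out-of-order ranges is a choice of this sketch and may differ from perf's own validation.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

struct enable_range {
	unsigned long start_ms; /* events enabled this many ms after start */
	unsigned long end_ms;   /* ... and disabled again at this offset */
};

/* Parse "a-b[,c-d...]" into @out; returns the number of ranges or -errno. */
int parse_enable_ranges(const char *str, struct enable_range *out, int max)
{
	const char *p = str;
	int n = 0;

	while (*p) {
		unsigned long a, b;
		char *end;

		if (n == max)
			return -ENOSPC;
		if (!isdigit((unsigned char)*p))
			return -EINVAL;
		errno = 0;
		a = strtoul(p, &end, 10);
		if (errno || *end != '-' || !isdigit((unsigned char)end[1]))
			return -EINVAL;
		b = strtoul(end + 1, &end, 10);
		/* Each range must be non-empty and start after the previous one ends. */
		if (errno || b <= a || (n && a < out[n - 1].end_ms))
			return -EINVAL;
		out[n].start_ms = a;
		out[n].end_ms = b;
		n++;
		if (*end == ',' && end[1])
			p = end + 1;
		else if (*end == '\0')
			p = end;
		else
			return -EINVAL; /* trailing comma or junk */
	}
	return n;
}
```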
|
2015-03-30 22:19:31 +00:00
|
|
|
|
2019-01-22 17:52:03 +00:00
|
|
|
static int record__parse_affinity(const struct option *opt, const char *str, int unset)
|
|
|
|
{
|
|
|
|
struct record_opts *opts = (struct record_opts *)opt->value;
|
|
|
|
|
|
|
|
if (unset || !str)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!strcasecmp(str, "node"))
|
|
|
|
opts->affinity = PERF_AFFINITY_NODE;
|
|
|
|
else if (!strcasecmp(str, "cpu"))
|
|
|
|
opts->affinity = PERF_AFFINITY_CPU;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-01-17 18:34:21 +00:00
|
|
|
static int record__mmap_cpu_mask_alloc(struct mmap_cpu_mask *mask, int nr_bits)
|
|
|
|
{
|
|
|
|
mask->nbits = nr_bits;
|
|
|
|
mask->bits = bitmap_zalloc(mask->nbits);
|
|
|
|
if (!mask->bits)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void record__mmap_cpu_mask_free(struct mmap_cpu_mask *mask)
|
|
|
|
{
|
|
|
|
bitmap_free(mask->bits);
|
|
|
|
mask->nbits = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int record__thread_mask_alloc(struct thread_mask *mask, int nr_bits)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = record__mmap_cpu_mask_alloc(&mask->maps, nr_bits);
|
|
|
|
if (ret) {
|
|
|
|
mask->affinity.bits = NULL;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = record__mmap_cpu_mask_alloc(&mask->affinity, nr_bits);
|
|
|
|
if (ret) {
|
|
|
|
record__mmap_cpu_mask_free(&mask->maps);
|
|
|
|
mask->maps.bits = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void record__thread_mask_free(struct thread_mask *mask)
|
|
|
|
{
|
|
|
|
record__mmap_cpu_mask_free(&mask->maps);
|
|
|
|
record__mmap_cpu_mask_free(&mask->affinity);
|
|
|
|
}
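record__thread_mask_alloc() above allocates two bitmaps, maps and affinity. If the second allocation fails, it frees the first, so the caller never sees a half-initialized mask. The sketch below reproduces that pattern with plain calloc()-backed bitmaps. The `sketch_*` types and helpers are hypothetical stand-ins for perf's struct mmap_cpu_mask, bitmap_zalloc() and bitmap_free().

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

struct sketch_cpu_mask {
	unsigned long *bits;
	size_t nbits;
};

struct sketch_thread_mask {
	struct sketch_cpu_mask maps;
	struct sketch_cpu_mask affinity;
};

static int sketch_mask_alloc(struct sketch_cpu_mask *mask, size_t nbits)
{
	mask->nbits = nbits;
	/* Zeroed, like bitmap_zalloc(): no CPU is set initially. */
	mask->bits = calloc(SKETCH_BITS_TO_LONGS(nbits), sizeof(unsigned long));
	return mask->bits ? 0 : -ENOMEM;
}

static void sketch_mask_free(struct sketch_cpu_mask *mask)
{
	free(mask->bits);
	mask->bits = NULL;
	mask->nbits = 0;
}

static void sketch_set_cpu(struct sketch_cpu_mask *mask, size_t cpu)
{
	mask->bits[cpu / SKETCH_BITS_PER_LONG] |= 1UL << (cpu % SKETCH_BITS_PER_LONG);
}

static bool sketch_test_cpu(const struct sketch_cpu_mask *mask, size_t cpu)
{
	return mask->bits[cpu / SKETCH_BITS_PER_LONG] & (1UL << (cpu % SKETCH_BITS_PER_LONG));
}

/* Both masks are allocated, or neither is (rollback on partial failure). */
static int sketch_thread_mask_alloc(struct sketch_thread_mask *tm, size_t nbits)
{
	int ret = sketch_mask_alloc(&tm->maps, nbits);

	if (ret) {
		tm->affinity.bits = NULL;
		return ret;
	}
	ret = sketch_mask_alloc(&tm->affinity, nbits);
	if (ret)
		sketch_mask_free(&tm->maps);
	return ret;
}

static void sketch_thread_mask_free(struct sketch_thread_mask *tm)
{
	sketch_mask_free(&tm->maps);
	sketch_mask_free(&tm->affinity);
}
```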
|
|
|
|
|
2022-01-17 18:34:32 +00:00
|
|
|
static int record__parse_threads(const struct option *opt, const char *str, int unset)
|
|
|
|
{
|
2022-01-17 18:34:33 +00:00
|
|
|
int s;
|
2022-01-17 18:34:32 +00:00
|
|
|
struct record_opts *opts = opt->value;
|
|
|
|
|
2022-01-17 18:34:33 +00:00
|
|
|
if (unset || !str || !strlen(str)) {
|
2022-01-17 18:34:32 +00:00
|
|
|
opts->threads_spec = THREAD_SPEC__CPU;
|
2022-01-17 18:34:33 +00:00
|
|
|
} else {
|
|
|
|
for (s = 1; s < THREAD_SPEC__MAX; s++) {
|
|
|
|
if (s == THREAD_SPEC__USER) {
|
|
|
|
opts->threads_user_spec = strdup(str);
|
|
|
|
if (!opts->threads_user_spec)
|
|
|
|
return -ENOMEM;
|
|
|
|
opts->threads_spec = THREAD_SPEC__USER;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
if (!strncasecmp(str, thread_spec_tags[s], strlen(thread_spec_tags[s]))) {
|
|
|
|
opts->threads_spec = s;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (opts->threads_spec == THREAD_SPEC__USER)
|
|
|
|
pr_debug("threads_spec: %s\n", opts->threads_user_spec);
|
|
|
|
else
|
|
|
|
pr_debug("threads_spec: %s\n", thread_spec_tags[opts->threads_spec]);
|
2022-01-17 18:34:32 +00:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
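record__parse_threads() matches the --threads argument against thread_spec_tags[] by case-insensitive prefix. Because the USER entry is reached last in the loop, anything unrecognized is treated as a user-supplied mask spec. The sketch below reproduces that control flow. The tag list and enum are assumptions made for illustration, not copied from perf's headers. Unlike the original, the sketch stores a pointer to the input instead of strdup()ing it.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

enum sketch_spec { SPEC_UNDEF, SPEC_CPU, SPEC_CORE, SPEC_PACKAGE, SPEC_NUMA, SPEC_USER, SPEC_MAX };

/* Assumed tag names; "user" must stay last for the fallback below to work. */
static const char *const sketch_spec_tags[SPEC_MAX] = {
	"undefined", "cpu", "core", "package", "numa", "user",
};

/* Returns the selected spec; for SPEC_USER, *user_spec points at @str. */
enum sketch_spec sketch_parse_threads(const char *str, const char **user_spec)
{
	*user_spec = NULL;
	if (!str || !*str)
		return SPEC_CPU; /* empty argument selects the per-CPU spec */

	for (int s = SPEC_CPU; s < SPEC_MAX; s++) {
		if (s == SPEC_USER) {
			/* Nothing matched: keep the raw string as a user mask spec. */
			*user_spec = str;
			return SPEC_USER;
		}
		/* Case-insensitive prefix match, so "Package" selects SPEC_PACKAGE. */
		if (!strncasecmp(str, sketch_spec_tags[s], strlen(sketch_spec_tags[s])))
			return (enum sketch_spec)s;
	}
	return SPEC_UNDEF; /* not reached while SPEC_USER is the last tag */
}
```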
2019-10-22 08:09:01 +00:00
static int parse_output_max_size(const struct option *opt,
		const char *str, int unset)
{
	unsigned long *s = (unsigned long *)opt->value;
	static struct parse_tag tags_size[] = {
		{ .tag = 'B', .mult = 1 },
		{ .tag = 'K', .mult = 1 << 10 },
		{ .tag = 'M', .mult = 1 << 20 },
		{ .tag = 'G', .mult = 1 << 30 },
		{ .tag = 0 },
	};
	unsigned long val;

	if (unset) {
		*s = 0;
		return 0;
	}

	val = parse_tag_value(str, tags_size);
	if (val != (unsigned long) -1) {
		*s = val;
		return 0;
	}

	return -1;
}
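parse_output_max_size() delegates to parse_tag_value(), whose body is not part of this section. The sketch below illustrates what a suffix table like tags_size encodes. It assumes a "<digits><suffix>" format, an exact case-sensitive suffix match, and an overflow check.

parse_size() and struct size_tag are hypothetical. perf's own parser may accept or reject some inputs differently. For example, this sketch rejects a bare number with no suffix.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for perf's struct parse_tag. */
struct size_tag {
	char tag;
	unsigned long mult;
};

static const struct size_tag size_tags[] = {
	{ 'B', 1 }, { 'K', 1UL << 10 }, { 'M', 1UL << 20 }, { 'G', 1UL << 30 }, { 0, 0 },
};

/*
 * Parse "<digits><suffix>", e.g. "512K". Returns (unsigned long)-1 for a
 * missing or unknown suffix, trailing characters, or overflow: the same
 * error value parse_output_max_size() tests for.
 */
static unsigned long parse_size(const char *str)
{
	const struct size_tag *t;
	unsigned long val;
	char *end;

	if (!str || *str < '0' || *str > '9')
		return (unsigned long)-1;

	val = strtoul(str, &end, 10);
	for (t = size_tags; t->tag; t++) {
		if (end[0] != t->tag || end[1] != '\0')
			continue;
		if (val > ULONG_MAX / t->mult)
			return (unsigned long)-1;	/* val * mult would overflow */
		return val * t->mult;
	}
	return (unsigned long)-1;
}
```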
2015-04-09 15:53:46 +00:00
static int record__parse_mmap_pages(const struct option *opt,
		const char *str,
		int unset __maybe_unused)
{
	struct record_opts *opts = opt->value;
	char *s, *p;
	unsigned int mmap_pages;
	int ret;

	if (!str)
		return -EINVAL;

	s = strdup(str);
	if (!s)
		return -ENOMEM;

	p = strchr(s, ',');
	if (p)
		*p = '\0';

	if (*s) {
2020-11-30 18:09:45 +00:00
		ret = __evlist__parse_mmap_pages(&mmap_pages, s);
2015-04-09 15:53:46 +00:00
		if (ret)
			goto out_free;
		opts->mmap_pages = mmap_pages;
	}

	if (!p) {
		ret = 0;
		goto out_free;
	}

2020-11-30 18:09:45 +00:00
	ret = __evlist__parse_mmap_pages(&mmap_pages, p + 1);
2015-04-09 15:53:46 +00:00
	if (ret)
		goto out_free;

	opts->auxtrace_mmap_pages = mmap_pages;

out_free:
	free(s);
	return ret;
}
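record__parse_mmap_pages() splits a "data[,aux]" argument in place. strchr() finds the comma, and overwriting it with '\0' leaves s holding only the data field. p + 1 then points at the AUX field. An empty data field, as in ",16", leaves opts->mmap_pages unchanged.

The sketch below reproduces that control flow. It uses a hypothetical parse_pages() that accepts plain decimal page counts in place of __evlist__parse_mmap_pages(), whose exact rules are not shown here.

```c
#define _POSIX_C_SOURCE 200809L	/* expose strdup() */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for __evlist__parse_mmap_pages(): decimal page count only. */
static int parse_pages(unsigned int *pages, const char *str)
{
	unsigned long val;
	char *end;

	errno = 0;
	val = strtoul(str, &end, 10);
	if (errno || end == str || *end != '\0' || val > UINT_MAX)
		return -EINVAL;
	*pages = (unsigned int)val;
	return 0;
}

/* Same splitting as record__parse_mmap_pages(): "data[,aux]". */
static int parse_mmap_pair(const char *str, unsigned int *data, unsigned int *aux)
{
	unsigned int pages;
	char *s, *p;
	int ret = 0;

	if (!str)
		return -EINVAL;

	s = strdup(str);	/* work on a copy; the comma is overwritten below */
	if (!s)
		return -ENOMEM;

	p = strchr(s, ',');
	if (p)
		*p = '\0';	/* s now holds only the data field */

	if (*s) {		/* an empty data field leaves *data untouched */
		ret = parse_pages(&pages, s);
		if (ret)
			goto out_free;
		*data = pages;
	}

	if (p) {		/* the AUX field starts right after the comma */
		ret = parse_pages(&pages, p + 1);
		if (ret)
			goto out_free;
		*aux = pages;
	}

out_free:
	free(s);
	return ret;
}
```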
2021-12-17 15:45:15 +00:00
void __weak arch__add_leaf_frame_record_opts(struct record_opts *opts __maybe_unused)
{
}

2020-07-17 07:08:23 +00:00
static int parse_control_option(const struct option *opt,
		const char *str,
		int unset __maybe_unused)
{
2020-09-01 09:37:53 +00:00
	struct record_opts *opts = opt->value;
2020-07-17 07:08:23 +00:00

2020-09-02 10:57:07 +00:00
	return evlist__parse_control(str, &opts->ctl_fd, &opts->ctl_fd_ack, &opts->ctl_fd_close);
}

2017-01-09 09:51:59 +00:00
static void switch_output_size_warn(struct record *rec)
{
2019-07-28 10:45:35 +00:00
	u64 wakeup_size = evlist__mmap_size(rec->opts.mmap_pages);
2017-01-09 09:51:59 +00:00
	struct switch_output *s = &rec->switch_output;

	wakeup_size /= 2;

	if (s->size < wakeup_size) {
		char buf[100];

		unit_number__scnprintf(buf, sizeof(buf), wakeup_size);
		pr_warning("WARNING: switch-output data size lower than "
			   "wakeup kernel buffer size (%s) "
			   "expect bigger perf.data sizes\n", buf);
	}
}

2017-01-09 09:51:57 +00:00
static int switch_output_setup(struct record *rec)
{
	struct switch_output *s = &rec->switch_output;
2017-01-09 09:51:58 +00:00
	static struct parse_tag tags_size[] = {
		{ .tag = 'B', .mult = 1 },
		{ .tag = 'K', .mult = 1 << 10 },
		{ .tag = 'M', .mult = 1 << 20 },
		{ .tag = 'G', .mult = 1 << 30 },
		{ .tag = 0 },
	};
2017-01-09 09:52:00 +00:00
	static struct parse_tag tags_time[] = {
		{ .tag = 's', .mult = 1 },
		{ .tag = 'm', .mult = 60 },
		{ .tag = 'h', .mult = 60*60 },
		{ .tag = 'd', .mult = 60*60*24 },
		{ .tag = 0 },
	};
2017-01-09 09:51:58 +00:00
	unsigned long val;
2017-01-09 09:51:57 +00:00

2020-04-27 20:56:37 +00:00
	/*
2024-06-11 05:06:26 +00:00
	 * If we're using --switch-output-event, then we imply
2020-04-27 20:56:37 +00:00
	 * --switch-output=signal, as we'll send a SIGUSR2 from the side band
	 * thread to its parent.
	 */
2022-01-17 18:34:34 +00:00
	if (rec->switch_output_event_set) {
		if (record__threads_enabled(rec)) {
			pr_warning("WARNING: --switch-output-event option is not available in parallel streaming mode.\n");
			return 0;
		}
2020-04-27 20:56:37 +00:00
		goto do_signal;
2022-01-17 18:34:34 +00:00
	}
2020-04-27 20:56:37 +00:00

2017-01-09 09:51:57 +00:00
	if (!s->set)
		return 0;

2022-01-17 18:34:34 +00:00
	if (record__threads_enabled(rec)) {
		pr_warning("WARNING: --switch-output option is not available in parallel streaming mode.\n");
		return 0;
	}

2017-01-09 09:51:57 +00:00
	if (!strcmp(s->str, "signal")) {
2020-04-27 20:56:37 +00:00
do_signal:
2017-01-09 09:51:57 +00:00
		s->signal = true;
		pr_debug("switch-output with SIGUSR2 signal\n");
2017-01-09 09:51:58 +00:00
		goto enabled;
	}

	val = parse_tag_value(s->str, tags_size);
	if (val != (unsigned long) -1) {
		s->size = val;
		pr_debug("switch-output with %s size threshold\n", s->str);
		goto enabled;
2017-01-09 09:51:57 +00:00
	}

2017-01-09 09:52:00 +00:00
	val = parse_tag_value(s->str, tags_time);
	if (val != (unsigned long) -1) {
		s->time = val;
		pr_debug("switch-output with %s time threshold (%lu seconds)\n",
			 s->str, s->time);
		goto enabled;
	}

2017-01-09 09:51:57 +00:00
	return -1;
2017-01-09 09:51:58 +00:00

enabled:
	rec->timestamp_filename = true;
	s->enabled = true;
2017-01-09 09:51:59 +00:00

	if (s->size && !rec->opts.no_buffering)
		switch_output_size_warn(rec);

2017-01-09 09:51:58 +00:00
	return 0;
2017-01-09 09:51:57 +00:00
}
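switch_output_setup() tries its interpretations in a fixed order: the literal "signal", then a size from tags_size, then a time from tags_time. The size suffixes are upper case and the time suffixes lower case, so "10M" and "10m" land in different branches.

Below is a minimal sketch of that precedence. It assumes each suffix must directly follow the digits and end the string. classify_switch_output() and its helpers are hypothetical, and they skip the overflow handling a real parser needs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification result for a --switch-output argument. */
enum switch_kind { SWITCH_INVALID, SWITCH_SIGNAL, SWITCH_SIZE, SWITCH_TIME };

struct suffix {
	char tag;
	unsigned long mult;
};

/* Same suffixes and multipliers as tags_size and tags_time above. */
static const struct suffix size_sfx[] = {
	{ 'B', 1 }, { 'K', 1UL << 10 }, { 'M', 1UL << 20 }, { 'G', 1UL << 30 }, { 0, 0 },
};
static const struct suffix time_sfx[] = {
	{ 's', 1 }, { 'm', 60 }, { 'h', 60 * 60 }, { 'd', 60 * 60 * 24 }, { 0, 0 },
};

/* "<digits><suffix>" -> value, else (unsigned long)-1; no overflow check. */
static unsigned long tagged_value(const char *str, const struct suffix *tab)
{
	unsigned long val;
	char *end;

	if (*str < '0' || *str > '9')
		return (unsigned long)-1;
	val = strtoul(str, &end, 10);
	for (; tab->tag; tab++)
		if (end[0] == tab->tag && end[1] == '\0')
			return val * tab->mult;
	return (unsigned long)-1;
}

/* Precedence mirrors switch_output_setup(): "signal", then size, then time. */
static enum switch_kind classify_switch_output(const char *str, unsigned long *val)
{
	if (!strcmp(str, "signal"))
		return SWITCH_SIGNAL;

	*val = tagged_value(str, size_sfx);
	if (*val != (unsigned long)-1)
		return SWITCH_SIZE;

	*val = tagged_value(str, time_sfx);
	if (*val != (unsigned long)-1)
		return SWITCH_TIME;

	return SWITCH_INVALID;
}
```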
2014-10-22 15:15:46 +00:00
static const char * const __record_usage[] = {
2009-05-28 14:25:34 +00:00
	"perf record [<options>] [<command>]",
	"perf record [<options>] -- <command> [<options>]",
2009-05-26 07:17:18 +00:00
	NULL
};
2014-10-22 15:15:46 +00:00
const char * const *record_usage = __record_usage;
2009-05-26 07:17:18 +00:00

2024-08-12 20:46:55 +00:00
static int build_id__process_mmap(const struct perf_tool *tool, union perf_event *event,
2019-11-14 15:15:34 +00:00
		struct perf_sample *sample, struct machine *machine)
{
	/*
	 * We already have the kernel maps, put in place via perf_session__create_kernel_maps()
	 * no need to add them twice.
	 */
	if (!(event->header.misc & PERF_RECORD_MISC_USER))
		return 0;
	return perf_event__process_mmap(tool, event, sample, machine);
}

2024-08-12 20:46:55 +00:00
static int build_id__process_mmap2(const struct perf_tool *tool, union perf_event *event,
2019-11-14 15:15:34 +00:00
		struct perf_sample *sample, struct machine *machine)
{
	/*
	 * We already have the kernel maps, put in place via perf_session__create_kernel_maps()
	 * no need to add them twice.
	 */
	if (!(event->header.misc & PERF_RECORD_MISC_USER))
		return 0;

	return perf_event__process_mmap2(tool, event, sample, machine);
}
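build_id__process_mmap() and build_id__process_mmap2() both keep an event only if the PERF_RECORD_MISC_USER bit is set in header.misc. That is a bit test, not a comparison of the whole cpumode field, so the two checks do not agree for every cpumode value.

The sketch below contrasts the two checks. The constant values are copied from the Linux uapi header <linux/perf_event.h> so the sketch builds without kernel headers. The two helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* cpumode values as defined in <linux/perf_event.h>, copied so this
 * sketch builds without kernel headers. */
#define PERF_RECORD_MISC_CPUMODE_MASK	(7 << 0)
#define PERF_RECORD_MISC_KERNEL		(1 << 0)
#define PERF_RECORD_MISC_USER		(2 << 0)
#define PERF_RECORD_MISC_HYPERVISOR	(3 << 0)
#define PERF_RECORD_MISC_GUEST_USER	(5 << 0)

/* The filter used by build_id__process_mmap{,2}(): a single-bit test. */
static bool user_bit_set(uint16_t misc)
{
	return (misc & PERF_RECORD_MISC_USER) != 0;
}

/* An exact comparison of the 3-bit cpumode field, for contrast. */
static bool cpumode_is_user(uint16_t misc)
{
	return (misc & PERF_RECORD_MISC_CPUMODE_MASK) == PERF_RECORD_MISC_USER;
}
```

With these values, the bit test also accepts PERF_RECORD_MISC_HYPERVISOR (3) and rejects PERF_RECORD_MISC_GUEST_USER (5). This section does not establish whether either case matters for the mmap events perf processes here.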
2024-08-12 20:46:55 +00:00
static int process_timestamp_boundary(const struct perf_tool *tool,
2021-05-03 06:42:22 +00:00
		union perf_event *event __maybe_unused,
		struct perf_sample *sample,
		struct machine *machine __maybe_unused)
{
	struct record *rec = container_of(tool, struct record, tool);

	set_timestamp_boundary(rec, sample->time);
	return 0;
}
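process_timestamp_boundary() receives only the tool pointer. It still reaches the enclosing struct record through container_of(tool, struct record, tool). This works because the tool is embedded in struct record as a member named tool, so subtracting that member's offset recovers the container.

Below is a self-contained sketch of the pattern. It uses a simplified container_of(); the kernel and perf versions add type checking. The struct tool and struct recorder types are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(); the kernel and perf versions add type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct perf_tool and struct record. */
struct tool {
	int (*sample)(struct tool *tool, unsigned long long time);
};

struct recorder {
	int nr_samples;
	struct tool tool;	/* embedded by value: required for container_of() */
	unsigned long long last_time;
};

/* The callback only sees the embedded member and recovers its container. */
static int recorder__sample(struct tool *tool, unsigned long long time)
{
	struct recorder *rec = container_of(tool, struct recorder, tool);

	rec->nr_samples++;
	rec->last_time = time;
	return 0;
}
```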
2021-08-11 04:46:58 +00:00
static int parse_record_synth_option(const struct option *opt,
		const char *str,
		int unset __maybe_unused)
{
	struct record_opts *opts = opt->value;
	char *p = strdup(str);

	if (p == NULL)
		return -1;

	opts->synth = parse_synth_opt(p);
	free(p);

	if (opts->synth < 0) {
		pr_err("Invalid synth option: %s\n", str);
		return -1;
	}
	return 0;
}

2011-11-25 10:19:45 +00:00
/*
2013-12-19 17:38:03 +00:00
 * XXX Ideally would be local to cmd_record() and passed to a record__new
 * because we need to have access to it in record__exit, that is called
2011-11-25 10:19:45 +00:00
 * after cmd_record() exits, but since record_options need to be accessible to
 * builtin-script, leave it here.
 *
 * At least we don't touch it in all the other functions here directly.
 *
 * Just say no to tons of global variables, sigh.
 */
2013-12-19 17:38:03 +00:00
static struct record record = {
2011-11-25 10:19:45 +00:00
	.opts = {
2014-07-31 06:45:04 +00:00
		.sample_time = true,
2011-11-25 10:19:45 +00:00
		.mmap_pages = UINT_MAX,
		.user_freq = UINT_MAX,
		.user_interval = ULLONG_MAX,
2012-05-22 16:14:18 +00:00
		.freq = 4000,
2012-05-16 09:45:49 +00:00
		.target = {
			.uses_mmap = true,
2013-11-15 13:52:29 +00:00
			.default_per_cpu = true,
2012-05-16 09:45:49 +00:00
		},
perf record: Implement --mmap-flush=<number> option
Implement a --mmap-flush option that specifies minimal number of bytes
that is extracted from mmaped kernel buffer to store into a trace. The
default option value is 1 byte, which means every time the trace writing
thread finds some new data in the mmaped buffer the data is extracted,
possibly compressed and written to a trace.
$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc
The option is independent from -z setting, doesn't vary with compression
level and can serve two purposes.
The first purpose is to increase the compression ratio of a trace data.
Larger data chunks are compressed more effectively so the implemented
option allows specifying the data chunk size to compress. Also, in some cases
executing more write syscalls with smaller data size can take longer
than executing less write syscalls with bigger data size due to syscall
overhead so extracting bigger data chunks specified by the option value
could additionally decrease runtime overhead.
The second purpose is to avoid self monitoring live-lock issue in system
wide (-a) profiling mode. Profiling in system wide mode with compression
(-a -z) can additionally induce data into the kernel buffers along with
the data from monitored processes. If performance data rate and volume
from the monitored processes is high then trace streaming and
compression activity in the tool is also high. High tool process
activity can lead to subtle live-lock effect when compression of single
new byte from some of mmaped kernel buffer leads to generation of the
next single byte at some mmaped buffer. So perf tool process ends up in
endless self monitoring.
The implemented synch parameter is the means to force data to move independently
from the specified flush threshold value. Despite the provided flush
value the tool needs capability to unconditionally drain memory buffers,
at least in the end of the collection.
Committer testing:
Running with the default value, i.e. as soon as there is something to
read go on consuming, we first write the synthesized events, small
chunks of about 128 bytes:
# perf trace -m 2048 --call-graph dwarf -e write -- perf record
<SNIP>
101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
Then we move to reading the mmap buffers consuming the events put there
by the kernel perf infrastructure:
107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
<SNIP same backtrace as in the 107.561 timestamp>
12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
<SNIP same backtrace as in the 107.561 timestamp>
12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
<SNIP same backtrace as in the 107.561 timestamp>
If we limit it to write only when more than 16MB are available for
reading, it throttles that to a quarter of the --mmap-pages set for
'perf record', which by default get to 528384 bytes, found out using
'record -v':
mmap flush: 132096
mmap size 528384B
With that in place all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers setup by the
kernel perf infrastructure were at least 132096 bytes long.
Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760
Was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704B
A warning about that would be good to have but can be added later,
something like:
"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"
Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:
cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-18 17:40:26 +00:00
		.mmap_flush = MMAP_FLUSH_DEFAULT,
2020-04-22 15:50:38 +00:00
		.nr_threads_synthesize = 1,
2020-07-17 07:08:23 +00:00
		.ctl_fd = -1,
		.ctl_fd_ack = -1,
2021-08-11 04:46:58 +00:00
		.synth = PERF_SYNTH_ALL,
2011-11-25 10:19:45 +00:00
	},
};
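The committer testing in the --mmap-flush message above reports that the effective flush threshold was capped at a quarter of the mmap size. That gives 528384 / 4 = 132096 bytes with the default buffer, and 8392704 / 4 = 2098176 bytes with -m 2048.

The sketch below shows that capping rule. effective_mmap_flush() is a hypothetical helper, not the function perf uses. The buffer sizes are the ones 'perf record -v' printed in that message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: cap the requested flush threshold at a quarter of
 * the mmap buffer size, the limit observed in the committer testing. */
static size_t effective_mmap_flush(size_t requested, size_t mmap_size)
{
	size_t max_flush = mmap_size / 4;

	return requested > max_flush ? max_flush : requested;
}
```

Both reported sizes are whole multiples of 4096: 528384 = 129 * 4096 and 8392704 = 2049 * 4096. That is consistent with one extra page on top of the data pages, though this layout is an inference and the message does not state it.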
2010-04-14 17:42:07 +00:00

perf tools: Improve call graph documents and help messages
The --call-graph option is complex so we should provide better guide for
users. Also change help message to be consistent with config option
names. Now perf top will show help like below:
$ perf top --call-graph
Error: option `call-graph' requires a value
Usage: perf top [<options>]
--call-graph <record_mode[,record_size],print_type,threshold[,print_limit],order,sort_key[,branch]>
setup and enables call-graph (stack chain/backtrace):
record_mode: call graph recording mode (fp|dwarf|lbr)
record_size: if record_mode is 'dwarf', max size of stack recording (<bytes>)
default: 8192 (bytes)
print_type: call graph printing style (graph|flat|fractal|none)
threshold: minimum call graph inclusion threshold (<percent>)
print_limit: maximum number of call graph entry (<number>)
order: call graph order (caller|callee)
sort_key: call graph sort key (function|address)
branch: include last branch info to call graph (branch)
Default: fp,graph,0.5,caller,function
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445524112-5201-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-22 14:28:32 +00:00
const char record_callchain_help[] = CALLCHAIN_RECORD_HELP
		"\n\t\t\t\tDefault: fp";
2012-10-01 18:20:58 +00:00

2016-06-16 08:02:41 +00:00
static bool dry_run;

2023-05-02 22:38:36 +00:00
static struct parse_events_option_args parse_events_option_args = {
	.evlistp = &record.evlist,
};

static struct parse_events_option_args switch_output_parse_events_option_args = {
	.evlistp = &record.sb_evlist,
};

2011-11-25 10:19:45 +00:00
/*
 * XXX Will stay a global variable till we fix builtin-script.c to stop messing
 * with it and switch to use the library functions in perf_evlist that came
2013-12-19 17:43:45 +00:00
 * from builtin-record.c, i.e. use record_opts,
2020-11-30 12:26:54 +00:00
 * evlist__prepare_workload, etc instead of fork+exec'in 'perf record',
2011-11-25 10:19:45 +00:00
 * using pipes, etc.
 */
2017-01-03 08:19:55 +00:00
static struct option __record_options[] = {
2023-05-02 22:38:36 +00:00
	OPT_CALLBACK('e', "event", &parse_events_option_args, "event",
2009-06-06 10:24:17 +00:00
		"event selector. use 'perf list' to list available events",
2011-07-14 09:25:32 +00:00
		parse_events_option),
2011-11-25 10:19:45 +00:00
	OPT_CALLBACK(0, "filter", &record.evlist, "filter",
2009-10-15 03:22:07 +00:00
		"event filter", parse_filter),
2015-07-10 07:36:10 +00:00
	OPT_CALLBACK_NOOPT(0, "exclude-perf", &record.evlist,
		NULL, "don't record events from perf itself",
		exclude_perf),
2012-04-26 05:15:15 +00:00
	OPT_STRING('p', "pid", &record.opts.target.pid, "pid",
2010-03-18 14:36:05 +00:00
		"record events on existing process id"),
2012-04-26 05:15:15 +00:00
	OPT_STRING('t', "tid", &record.opts.target.tid, "tid",
2010-03-18 14:36:05 +00:00
		"record events on existing thread id"),
2011-11-25 10:19:45 +00:00
	OPT_INTEGER('r', "realtime", &record.realtime_prio,
2009-05-26 07:17:18 +00:00
		"collect data with this RT SCHED_FIFO priority"),
2014-01-14 20:52:14 +00:00
	OPT_BOOLEAN(0, "no-buffering", &record.opts.no_buffering,
perf record: Add "nodelay" mode, disabled by default
Sometimes there is a need to use perf in "live-log" mode. The problem
is, for seldom events, actual info output is largely delayed because
perf-record reads sample data in whole pages.
So for such scenarios, add a flag for perf-record to go into "nodelay"
mode. To track e.g. what's going on in icmp_rcv while ping is running,
use it with something like this:
(1) $ perf probe -L icmp_rcv | grep -U8 '^ *43\>'
goto error;
}
38 if (!pskb_pull(skb, sizeof(*icmph)))
goto error;
icmph = icmp_hdr(skb);
43 ICMPMSGIN_INC_STATS_BH(net, icmph->type);
/*
* 18 is the highest 'known' ICMP type. Anything else is a mystery
*
* RFC 1122: 3.2.2 Unknown ICMP messages types MUST be silently
* discarded.
*/
50 if (icmph->type > NR_ICMP_TYPES)
goto error;
$ perf probe icmp_rcv:43 'type=icmph->type'
(2) $ cat trace-icmp.py
[...]
def trace_begin():
print "in trace_begin"
def trace_end():
print "in trace_end"
def probe__icmp_rcv(event_name, context, common_cpu,
common_secs, common_nsecs, common_pid, common_comm,
__probe_ip, type):
print_header(event_name, common_cpu, common_secs, common_nsecs,
common_pid, common_comm)
print "__probe_ip=%u, type=%u\n" % \
(__probe_ip, type),
[...]
(3) $ perf record -a -D -e probe:icmp_rcv -o - | \
perf script -i - -s trace-icmp.py
Thanks to Peter Zijlstra for pointing how to do it.
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>, Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <20110112140613.GA11698@tugrik.mns.mnsspb.ru>
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-01-12 14:59:36 +00:00
		"collect data without buffering"),
2011-11-25 10:19:45 +00:00
	OPT_BOOLEAN('R', "raw-samples", &record.opts.raw_samples,
2009-08-13 08:27:19 +00:00
		"collect raw sample records from all opened counters"),
2012-04-26 05:15:15 +00:00
	OPT_BOOLEAN('a', "all-cpus", &record.opts.target.system_wide,
2009-05-26 07:17:18 +00:00
		"system-wide collection from all CPUs"),
2012-04-26 05:15:15 +00:00
	OPT_STRING('C', "cpu", &record.opts.target.cpu_list, "cpu",
2010-05-28 10:00:01 +00:00
		"list of cpus to monitor"),
2011-11-25 10:19:45 +00:00
	OPT_U64('c', "count", &record.opts.user_interval, "event period to sample"),
2019-02-21 09:41:30 +00:00
	OPT_STRING('o', "output", &record.data.path, "file",
2009-06-02 20:59:57 +00:00
		"output file name"),
2013-11-18 09:55:57 +00:00
	OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit,
		&record.opts.no_inherit_set,
		"child tasks do not inherit counters"),
perf record: Add --tail-synthesize option
When working with an overwritable ring buffer there's an inconvenient
problem: if perf dumps data after a long period after it starts,
non-sample events may be lost, which makes the following 'perf report' unable
to identify proc name and mmap layout. For example:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \
dd if=/dev/zero of=/dev/null
Send SIGUSR2 after dd runs long enough. The resulting perf.data lost
correct comm and mmap events:
# perf script -i perf.data.2016061522374354
perf 24478 [004] 2581325.601789: raw_syscalls:sys_exit: NR 0 = 512
^^^^
Should be 'dd'
27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux)
203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux)
b18d83 return_from_SYSCALL_64+0xfe200000 (/lib/modules/4.6.0-rc3+/build/vmlinux)
7f47c417edf0 [unknown] ([unknown])
^^^^^^^^^^^^
Fail to unwind
This patch provides a '--tail-synthesize' option, allows perf to collect
system status when finalizing the output file. In the resulting output file, the
non-sample events reflect system status when dumping data.
After this patch:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
dd if=/dev/zero of=/dev/null
# perf script -i perf.data.2016061600544998
dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ...
^^
Correct comm
203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms])
203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms])
203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms])
b18d83 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
d8e50 __GI___libc_write+0xffff01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so)
^^^^^
Correct unwind
This option doesn't aim to solve this problem completely. If a process
terminates before SIGUSR2, we still lose its COMM and MMAP events. For
example, we can't unwind correctly from the final perf.data we get from
the previous example, because when perf collects the final output file
(when we press C-c), 'dd' has been terminated so its '/proc/<pid>/mmap'
becomes empty.
However, this is a cheaper choice. To completely solve this problem we
need to continuously output non-sample events. To satisfy the
requirement of daemonization, we need to merge them periodically. It is
possible but requires much more code and cycles.
Automatically select --tail-synthesize when --overwrite is provided.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-16-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-14 08:34:47 +00:00
	OPT_BOOLEAN(0, "tail-synthesize", &record.opts.tail_synthesize,
		"synthesize non-sample events at the end of output"),
perf tools: Enable overwrite settings
This patch allows following config terms and option:
Globally setting events to overwrite;
# perf record --overwrite ...
Set specific events to be overwrite or no-overwrite.
# perf record --event cycles/overwrite/ ...
# perf record --event cycles/no-overwrite/ ...
Add missing config terms and update the config term array size because
the longest string length has changed.
For overwritable events, it automatically selects attr.write_backward
since perf requires it to be backward for reading.
Test result:
# perf record --overwrite -e syscalls:*enter_nanosleep* usleep 1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
# perf evlist -v
syscalls:sys_enter_nanosleep: type: 2, size: 112, config: 0x134, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, write_backward: 1
# Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-14-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-14 08:34:45 +00:00
	OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"),
2020-08-19 03:19:47 +00:00
	OPT_BOOLEAN(0, "no-bpf-event", &record.opts.no_bpf_event, "do not record bpf events"),
perf record: Throttle user defined frequencies to the maximum allowed
# perf record -F 200000 sleep 1
warning: Maximum frequency rate (15,000 Hz) exceeded, throttling from 200,000 Hz to 15,000 Hz.
The limit can be raised via /proc/sys/kernel/perf_event_max_sample_rate.
The kernel will lower it when perf's interrupts take too long.
Use --strict-freq to disable this throttling, refusing to record.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (15 samples) ]
# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
For those wanting that it fails if the desired frequency can't be used:
# perf record --strict-freq -F 200000 sleep 1
error: Maximum frequency rate (15,000 Hz) exceeded.
Please use -F freq option with a lower value or consider
tweaking /proc/sys/kernel/perf_event_max_sample_rate.
#
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-oyebruc44nlja499nqkr1nzn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-01 17:52:50 +00:00
	OPT_BOOLEAN(0, "strict-freq", &record.opts.strict_freq,
		"Fail if the specified frequency can't be used"),
perf record: Allow asking for the maximum allowed sample rate
Add the handy '-F max' shortcut to reading and using the
kernel.perf_event_max_sample_rate value as the user supplied
sampling frequency:
# perf record -F max sleep 1
info: Using a maximum frequency rate of 15,000 Hz
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (14 samples) ]
# sysctl kernel.perf_event_max_sample_rate
kernel.perf_event_max_sample_rate = 15000
# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
# perf record -F 10 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (4 samples) ]
# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 10, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
#
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4y0tiuws62c64gp4cf0hme0m@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-01 16:46:23 +00:00
	OPT_CALLBACK('F', "freq", &record.opts, "freq or 'max'",
		"profile at this frequency",
		record__parse_freq),
2015-04-09 15:53:46 +00:00
	OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]",
		"number of mmap data pages and AUX area tracing mmap pages",
		record__parse_mmap_pages),
perf record: Implement --mmap-flush=<number> option
Implement a --mmap-flush option that specifies minimal number of bytes
that is extracted from mmaped kernel buffer to store into a trace. The
default option value is 1 byte, which means every time the trace writing
thread finds some new data in the mmaped buffer the data is extracted,
possibly compressed and written to a trace.
$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc
The option is independent from -z setting, doesn't vary with compression
level and can serve two purposes.
The first purpose is to increase the compression ratio of a trace data.
Larger data chunks are compressed more effectively so the implemented
option allows specifying the data chunk size to compress. Also, in some cases
executing more write syscalls with smaller data size can take longer
than executing less write syscalls with bigger data size due to syscall
overhead so extracting bigger data chunks specified by the option value
could additionally decrease runtime overhead.
The second purpose is to avoid self monitoring live-lock issue in system
wide (-a) profiling mode. Profiling in system wide mode with compression
(-a -z) can additionally induce data into the kernel buffers along with
the data from monitored processes. If performance data rate and volume
from the monitored processes is high then trace streaming and
compression activity in the tool is also high. High tool process
activity can lead to subtle live-lock effect when compression of single
new byte from some of mmaped kernel buffer leads to generation of the
next single byte at some mmaped buffer. So perf tool process ends up in
endless self monitoring.
Implemented synch parameter is the mean to force data move independently
from the specified flush threshold value. Despite the provided flush
value the tool needs capability to unconditionally drain memory buffers,
at least in the end of the collection.
Committer testing:
Running with the default value, i.e. as soon as there is something to
read go on consuming, we first write the synthesized events, small
chunks of about 128 bytes:
# perf trace -m 2048 --call-graph dwarf -e write -- perf record
<SNIP>
101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
Then we move to reading the mmap buffers consuming the events put there
by the kernel perf infrastructure:
107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
<SNIP same backtrace as in the 107.561 timestamp>
12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
<SNIP same backtrace as in the 107.561 timestamp>
12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
<SNIP same backtrace as in the 107.561 timestamp>
If we limit it to write only when more than 16MB are available for
reading, it throttles that to a quarter of the --mmap-pages set for
'perf record', which by default amounts to 528384 bytes, as found out
using 'record -v':
mmap flush: 132096
mmap size 528384B
With that in place all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers setup by the
kernel perf infrastructure were at least 132096 bytes long.
Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760
It was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704B
A warning about that would be good to have but can be added later,
something like:
"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"
Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:
cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-18 17:40:26 +00:00
OPT_CALLBACK(0, "mmap-flush", &record.opts, "number",
"Minimal number of bytes that is extracted from mmap data pages (default: 1)",
record__mmap_flush_parse),
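The flush threshold behaviour described above can be sketched in C. This is a hypothetical illustration, not perf's code: `flush_threshold()` and `should_push()` are invented names, and the sketch assumes only what the message and the committer testing show, namely that the requested value is capped at a quarter of the mmap buffer size and that a synch request drains the buffer regardless of the threshold.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: cap the requested --mmap-flush value at a quarter
 * of the mmap buffer size, as seen in the 'perf record -v' output.
 * A request of 0 falls back to the default of 1 byte.
 */
size_t flush_threshold(size_t requested, size_t mmap_size)
{
	size_t max_flush = mmap_size / 4;

	if (requested == 0)
		requested = 1;
	return requested > max_flush ? max_flush : requested;
}

/*
 * Hypothetical helper: decide whether the data currently available in an
 * mmap buffer should be pushed to the trace. When 'synch' is set (e.g. at
 * the end of the collection) any pending data is drained regardless of
 * the threshold.
 */
bool should_push(size_t available, size_t threshold, bool synch)
{
	if (available == 0)
		return false;
	return synch || available >= threshold;
}
```

With the default threshold of 1 byte, `should_push()` is true as soon as any data is available, which matches the small writes seen in the first trace.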
2016-04-18 15:09:08 +00:00
OPT_CALLBACK_NOOPT('g', NULL, &callchain_param,
2013-10-26 14:25:33 +00:00
NULL, "enables call-graph recording" ,
&record_callchain_opt),
OPT_CALLBACK(0, "call-graph", &record.opts,
perf tools: Improve call graph documents and help messages
The --call-graph option is complex, so we should provide a better guide for
users. Also change the help message to be consistent with the config option
names. Now perf top will show help like below:
$ perf top --call-graph
Error: option `call-graph' requires a value
Usage: perf top [<options>]
--call-graph <record_mode[,record_size],print_type,threshold[,print_limit],order,sort_key[,branch]>
setup and enables call-graph (stack chain/backtrace):
record_mode: call graph recording mode (fp|dwarf|lbr)
record_size: if record_mode is 'dwarf', max size of stack recording (<bytes>)
default: 8192 (bytes)
print_type: call graph printing style (graph|flat|fractal|none)
threshold: minimum call graph inclusion threshold (<percent>)
print_limit: maximum number of call graph entry (<number>)
order: call graph order (caller|callee)
sort_key: call graph sort key (function|address)
branch: include last branch info to call graph (branch)
Default: fp,graph,0.5,caller,function
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445524112-5201-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-22 14:28:32 +00:00
"record_mode[,record_size]", record_callchain_help,
2013-10-26 14:25:33 +00:00
&record_parse_callchain_opt),
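The `record_mode[,record_size]` argument from the help text above can be parsed along these lines. This is a hypothetical sketch: the `callchain_cfg` struct and `parse_record_mode()` are invented for illustration. It checks the mode against fp|dwarf|lbr, applies the documented 8192-byte default for dwarf, and (as an assumption of this sketch) only accepts a record_size for dwarf; any further validation or rounding perf itself may apply is not modeled.

```c
#include <stdlib.h>
#include <string.h>

enum record_mode { MODE_FP, MODE_DWARF, MODE_LBR };

struct callchain_cfg {
	enum record_mode mode;
	unsigned long dump_size;	/* max stack dump size, dwarf only */
};

/* Returns 0 on success, -1 on an invalid argument. */
int parse_record_mode(const char *arg, struct callchain_cfg *cfg)
{
	const char *comma = strchr(arg, ',');
	size_t len = comma ? (size_t)(comma - arg) : strlen(arg);

	cfg->dump_size = 0;
	if (len == 2 && !strncmp(arg, "fp", 2)) {
		cfg->mode = MODE_FP;
	} else if (len == 3 && !strncmp(arg, "lbr", 3)) {
		cfg->mode = MODE_LBR;
	} else if (len == 5 && !strncmp(arg, "dwarf", 5)) {
		cfg->mode = MODE_DWARF;
		cfg->dump_size = 8192;	/* documented default */
	} else {
		return -1;
	}

	if (comma) {
		char *end;
		unsigned long size;

		/* this sketch only accepts a record_size for dwarf mode */
		if (cfg->mode != MODE_DWARF || comma[1] == '\0')
			return -1;
		size = strtoul(comma + 1, &end, 0);
		if (*end != '\0' || size == 0)
			return -1;
		cfg->dump_size = size;
	}
	return 0;
}
```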
2010-04-13 08:37:33 +00:00
OPT_INCR('v', "verbose", &verbose,
2009-06-07 15:39:02 +00:00
"be more verbose (show counter open errors, etc)"),
2022-10-18 09:41:36 +00:00
OPT_BOOLEAN('q', "quiet", &quiet, "don't print any warnings or messages"),
2011-11-25 10:19:45 +00:00
OPT_BOOLEAN('s', "stat", &record.opts.inherit_stat,
2009-06-24 19:12:48 +00:00
"per thread counts"),
2015-06-10 14:48:50 +00:00
OPT_BOOLEAN('d', "data", &record.opts.sample_address, "Record the sample addresses"),
2017-08-29 17:11:08 +00:00
OPT_BOOLEAN(0, "phys-data", &record.opts.sample_phys_addr,
"Record the sample physical addresses"),
2020-11-30 17:27:53 +00:00
OPT_BOOLEAN(0, "data-page-size", &record.opts.sample_data_page_size,
"Record the sampled data address data page size"),
2021-01-05 19:57:49 +00:00
OPT_BOOLEAN(0, "code-page-size", &record.opts.sample_code_page_size,
"Record the sampled code address (ip) page size"),
2016-08-01 18:02:35 +00:00
OPT_BOOLEAN(0, "sample-cpu", &record.opts.sample_cpu, "Record the sample cpu"),
perf record: Add new option to sample identifier
In preparation for recording sideband events in a virtual machine guest so
that they can be injected into a host perf.data file, add an option to
always include sample type PERF_SAMPLE_IDENTIFIER.
Committer testing:
# perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.020 MB perf.data (7 samples) ]
# perf evlist -v
cycles: size: 128, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
#
# perf record --sample-identifier sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.022 MB perf.data (7 samples) ]
# perf evlist -v
cycles: size: 128, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220615052511.4441-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-15 05:25:11 +00:00
OPT_BOOLEAN(0, "sample-identifier", &record.opts.sample_identifier,
"Record the sample identifier"),
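The `sample_type` field printed by `perf evlist -v` above is a bit mask of `PERF_SAMPLE_*` flags from `<linux/perf_event.h>`. As a rough, hypothetical sketch (`format_sample_type()` is not perf's code and only knows the flags that appear in this output), the mask can be decoded into the same `IP|TID|TIME|PERIOD|IDENTIFIER` notation:

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Only the flags that appear in the evlist output above are handled. */
static const struct {
	uint64_t bit;
	const char *name;
} sample_flags[] = {
	{ PERF_SAMPLE_IP, "IP" },
	{ PERF_SAMPLE_TID, "TID" },
	{ PERF_SAMPLE_TIME, "TIME" },
	{ PERF_SAMPLE_PERIOD, "PERIOD" },
	{ PERF_SAMPLE_IDENTIFIER, "IDENTIFIER" },
};

/* Write the '|'-separated flag names into buf (len must be >= 1). */
void format_sample_type(uint64_t type, char *buf, size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(sample_flags) / sizeof(sample_flags[0]); i++) {
		if (!(type & sample_flags[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, sample_flags[i].name, len - strlen(buf) - 1);
	}
}
```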
2015-07-06 11:51:01 +00:00
OPT_BOOLEAN_SET('T', "timestamp", &record.opts.sample_time,
&record.opts.sample_time_set,
"Record the sample timestamps"),
perf record: Fix period option handling
Stephane reported that we don't unset the PERIOD sample type when --no-period
is specified. Add the unset check and reset PERIOD if --no-period is
specified.
Committer notes:
Check the sample_type, it shouldn't have PERF_SAMPLE_PERIOD there when
--no-period is used.
Before:
# perf record --no-period sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.018 MB perf.data (7 samples) ]
# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
#
After:
[root@jouet ~]# perf record --no-period sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (17 samples) ]
[root@jouet ~]# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
[root@jouet ~]#
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180201083812.11359-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-01 08:38:11 +00:00
OPT_BOOLEAN_SET('P', "period", &record.opts.period, &record.opts.period_set,
"Record the sample period"),
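The fix relies on the `OPT_BOOLEAN_SET` pattern: the `_set` flag records whether `--period` or `--no-period` was given at all, so an explicit `--no-period` can be told apart from the default. A minimal sketch of that logic, assuming a simplified options struct (`record_opts_sketch` and `apply_period_option()` are hypothetical, not perf's internals):

```c
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>

struct record_opts_sketch {
	bool period;		/* value of --period / --no-period */
	bool period_set;	/* true if either form was given */
};

uint64_t apply_period_option(uint64_t sample_type,
			     const struct record_opts_sketch *opts)
{
	if (opts->period)
		sample_type |= PERF_SAMPLE_PERIOD;
	else if (opts->period_set)
		/* explicit --no-period: drop PERIOD even if set by default */
		sample_type &= ~(uint64_t)PERF_SAMPLE_PERIOD;
	return sample_type;
}
```

Without the `_set` flag, a plain boolean could not distinguish "not specified" (keep the default PERIOD bit) from "--no-period" (clear it), which is the bug the commit describes.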
2011-11-25 10:19:45 +00:00
OPT_BOOLEAN('n', "no-samples", &record.opts.no_samples,
2009-06-24 19:12:48 +00:00
"don't sample"),
2016-01-25 09:56:19 +00:00
OPT_BOOLEAN_SET('N', "no-buildid-cache", &record.no_buildid_cache,
&record.no_buildid_cache_set,
"do not update the buildid cache"),
OPT_BOOLEAN_SET('B', "no-buildid", &record.no_buildid,
&record.no_buildid_set,
"do not collect buildids in perf.data"),
2011-11-25 10:19:45 +00:00
OPT_CALLBACK('G', "cgroup", &record.evlist, "name",
perf tool: Add cgroup support
This patch adds the ability to filter monitoring based on container groups
(cgroups) for both perf stat and perf record. It is possible to monitor
multiple cgroups in parallel. There is one cgroup per event. The cgroups to
monitor are passed via a new -G option followed by a comma separated list of
cgroup names.
The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
finds the corresponding directory in the cgroup filesystem and opens it. It
then passes that file descriptor to the kernel.
Example:
$ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
Performance counter stats for 'sleep 1':
2,368,667,414 cycles test1
2,369,661,459 cycles
<not counted> cycles test2
1.001856890 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 09:20:01 +00:00
"monitor event in cgroup name only",
parse_cgroups),
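The `-G test1,,test2` example pairs cgroup names with events by position, an empty entry meaning that event is not cgroup-filtered. A hypothetical sketch of just that splitting step (`split_cgroups()` and the size limits are invented here; the real `parse_cgroups()` also resolves each name in the mounted cgroup filesystem and opens it, which is omitted):

```c
#include <stddef.h>
#include <string.h>

#define MAX_CGROUPS 16
#define MAX_NAME 64

/*
 * Split a comma separated list into names[]; the i-th name applies to the
 * i-th event and an empty string means the event is not cgroup-filtered.
 * Returns the number of entries, or -1 if there are too many entries or a
 * name is too long.
 */
int split_cgroups(const char *list, char names[][MAX_NAME], int max)
{
	int n = 0;
	const char *p = list;

	for (;;) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		if (n >= max || len >= MAX_NAME)
			return -1;
		memcpy(names[n], p, len);
		names[n][len] = '\0';
		n++;
		if (!comma)
			break;
		p = comma + 1;
	}
	return n;
}
```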
perf record: Allow multiple recording time ranges
AUX area traces can produce too much data to record successfully or
analyze subsequently. Add another means to reduce data collection by
allowing multiple recording time ranges.
This is useful, for instance, in cases where a workload produces
predictably reproducible events in specific time ranges.
Today we only have perf record -D <msecs> to start at a specific region, or
some complicated approach using snapshot mode and external scripts sending
signals or using the fifos. But these approaches are difficult to set up
compared with simply having perf do it.
Extend the perf record -D/--delay option to accept relative time stamps
for starting and stopping, controlled by perf with the right time offset,
for instance:
perf record -e intel_pt// -D 10-20,30-40
to record from 10ms to 20ms and from 30ms to 40ms into the trace.
Example:
The example workload is:
$ cat repeat-usleep.c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int usleep(useconds_t usec);
int usage(int ret, const char *msg)
{
	if (msg)
		fprintf(stderr, "%s\n", msg);
	fprintf(stderr, "Usage is: repeat-usleep <microseconds>\n");
	return ret;
}
int main(int argc, char *argv[])
{
	unsigned long usecs;
	char *end_ptr;
	if (argc != 2)
		return usage(1, "Error: Wrong number of arguments!");
	errno = 0;
	usecs = strtoul(argv[1], &end_ptr, 0);
	if (errno || *end_ptr || usecs > UINT_MAX)
		return usage(1, "Error: Invalid argument!");
	while (1) {
		/* usleep() returns -1 on failure; tolerate only EINTR */
		int ret = usleep(usecs);
		if (ret && errno != EINTR)
			return usage(1, "Error: usleep() failed!");
	}
	return 0;
}
$ perf record -e intel_pt//u --delay 10-20,40-70,110-160 -- ./repeat-usleep 500
Events disabled
Events enabled
Events disabled
Events enabled
Events disabled
Events enabled
Events disabled
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 0.204 MB perf.data ]
Terminated
A dlfilter is used to determine continuous data collection (timestamps
less than 1ms apart):
$ cat dlfilter-show-delays.c
#include <perf/perf_dlfilter.h>
#include <stdio.h>
static __u64 start_time;
static __u64 last_time;
int start(void **data, void *ctx)
{
	printf("%-17s\t%-9s\t%-6s\n", " Time", " Duration", " Delay");
	return 0;
}
int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
	__u64 delta;
	if (!sample->time)
		return 1;
	if (!last_time)
		goto out;
	delta = sample->time - last_time;
	if (delta < 1000000)
		goto out2;
	/* gap of 1ms or more: print the continuous run that just ended */
	printf("%17.9f\t%9.1f\t%6.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0, delta / 1000000.0);
out:
	start_time = sample->time;
out2:
	last_time = sample->time;
	return 1;
}
int stop(void *data, void *ctx)
{
	printf("%17.9f\t%9.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0);
	return 0;
}
The result shows the times roughly match the --delay option:
$ perf script --itrace=qb --dlfilter dlfilter-show-delays.so
Time Duration Delay
39215.302317300 9.7 20.5
39215.332480217 30.4 40.9
39215.403837717 49.8
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220824072814.16422-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-24 07:28:14 +00:00
OPT_CALLBACK('D', "delay", &record, "ms",
"ms to wait before starting measurement after program start (-1: start with events disabled), "
"or ranges of time to enable events e.g. '-D 10-20,30-40'",
record__parse_event_enable_time),
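A hypothetical sketch of parsing the `-D start-stop,...` form into millisecond ranges follows. `struct enable_range` and `parse_enable_ranges()` are invented for illustration; the real `record__parse_event_enable_time()` also handles the single-value and `-1` forms, which are not modeled. The sketch assumes ranges are non-empty and given in increasing, non-overlapping order, as in the examples above.

```c
#include <errno.h>
#include <stdlib.h>

struct enable_range {
	unsigned long start_ms;
	unsigned long stop_ms;
};

/*
 * Parse "10-20,30-40" into ranges[]. Returns the number of ranges, or -1
 * if the string is empty or malformed, a range is empty, ranges overlap
 * or are out of order, or there are more than 'max' ranges.
 */
int parse_enable_ranges(const char *str, struct enable_range *ranges, int max)
{
	int n = 0;
	const char *p = str;

	while (*p) {
		char *end;
		unsigned long start, stop;

		if (n >= max)
			return -1;
		errno = 0;
		start = strtoul(p, &end, 10);
		if (errno || end == p || *end != '-')
			return -1;
		p = end + 1;
		stop = strtoul(p, &end, 10);
		if (errno || end == p || stop <= start)
			return -1;
		if (n > 0 && start < ranges[n - 1].stop_ms)
			return -1;
		ranges[n].start_ms = start;
		ranges[n].stop_ms = stop;
		n++;
		if (*end == '\0')
			break;
		if (*end != ',' || end[1] == '\0')
			return -1;
		p = end + 1;
	}
	return n ? n : -1;
}
```

A caller would then enable events at each `start_ms` and disable them at each `stop_ms`, relative to program start, which is consistent with the alternating enable/disable messages in the example run.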
perf record: Put a copy of kcore into the perf.data directory
Add a new 'perf record' option '--kcore' which will put a copy of
/proc/kcore, kallsyms and modules into a perf.data directory. Note that
without the --kcore option, output goes to a file as before. The
tools' -o and -i options work with either a file name or directory name.
Example:
$ sudo perf record --kcore uname
$ sudo tree perf.data
perf.data
├── kcore_dir
│ ├── kallsyms
│ ├── kcore
│ └── modules
└── data
$ sudo perf script -v
build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de
build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541
Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field.
Using CPUID GenuineIntel-6-8E-A
Using perf.data/kcore_dir/kcore for kernel data
Using perf.data/kcore_dir/kallsyms for symbols
perf 19058 506778.423729: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423733: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423734: 7 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423736: 117 cycles: ffffffffa2caa54a native_write_msr+0xa (vmlinux)
perf 19058 506778.423738: 2092 cycles: ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux)
perf 19058 506778.423740: 37380 cycles: ffffffffa2f121d0 perf_event_addr_filters_exec+0x0 (vmlinux)
uname 19058 506778.423751: 582673 cycles: ffffffffa303a407 propagate_protected_usage+0x147 (vmlinux)
uname 19058 506778.423892: 2241841 cycles: ffffffffa2cae0c9 unwind_next_frame.part.5+0x79 (vmlinux)
uname 19058 506778.424430: 2457397 cycles: ffffffffa3019232 check_memory_region+0x52 (vmlinux)
Committer testing:
# rm -rf perf.data*
# perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
# ls -l perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data
# perf record --kcore uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
# ls -lad perf.data*
drwx------. 3 root root 4096 Oct 21 11:08 perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data.old
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
# perf evlist -v -i perf.data/data
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/20191004083121.12182-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-04 08:31:21 +00:00
OPT_BOOLEAN(0, "kcore", &record.opts.kcore, "copy /proc/kcore"),
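Because `-i`/`-o` accept either a file or a directory, a reader has to decide where the event data lives. Based only on the `tree` output above, where the data sits in `perf.data/data`, a hypothetical helper might look like this (`resolve_data_path()` is invented and may differ from perf's real lookup):

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * If 'path' is a directory (e.g. created by 'perf record --kcore'),
 * produce "<path>/data"; otherwise copy 'path' unchanged.
 * Returns 0 on success, -1 if the result does not fit in 'buf'.
 */
int resolve_data_path(const char *path, char *buf, size_t len)
{
	struct stat st;
	int n;

	if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
		n = snprintf(buf, len, "%s/data", path);
	else
		n = snprintf(buf, len, "%s", path);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```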
2012-04-26 05:15:15 +00:00
OPT_STRING('u', "uid", &record.opts.target.uid_str, "user",
"user to profile"),
2012-03-08 22:47:45 +00:00
OPT_CALLBACK_NOOPT('b', "branch-any", &record.opts.branch_stack,
"branch any", "sample any taken branches",
parse_branch_stack),
OPT_CALLBACK('j', "branch-filter", &record.opts.branch_stack,
"branch filter mask", "branch stack filter modes",
2012-02-09 22:21:02 +00:00
parse_branch_stack),
2013-01-24 15:10:29 +00:00
OPT_BOOLEAN('W', "weight", &record.opts.sample_weight,
"sample by weight (on special events only)"),
2013-09-20 14:40:43 +00:00
OPT_BOOLEAN(0, "transaction", &record.opts.sample_transaction,
"sample transaction flags (special events only)"),
2013-11-15 13:52:29 +00:00
OPT_BOOLEAN(0, "per-thread", &record.opts.target.per_thread,
"use per-thread mmaps"),
perf record: Add ability to name registers to record
This patch modifies the -I/--intr-regs option to enable passing the names
of the registers to sample on interrupt. Registers can be specified by
their symbolic names. For instance on x86, --intr-regs=ax,si.
The motivation is to reduce the size of the perf.data file and the
overhead of sampling by only collecting the registers useful to a
specific analysis. For instance, for value profiling, sample only the
registers used to pass arguments to functions.
With no parameter, --intr-regs still records all possible registers
based on the architecture.
To name registers, it is necessary to use the long form of the option,
i.e., --intr-regs:
$ perf record --intr-regs=si,di,r8,r9 .....
To record any possible registers:
$ perf record -I .....
$ perf report --intr-regs ...
To display the registers, one can use perf report -D
To list the available registers:
$ perf record --intr-regs=\?
available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15
Signed-off-by: Stephane Eranian <eranian@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1441039273-16260-4-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-31 16:41:12 +00:00
OPT_CALLBACK_OPTARG('I', "intr-regs", &record.opts.sample_intr_regs, NULL, "any register",
"sample selected machine registers on interrupt,"
2019-05-14 20:19:32 +00:00
" use '-I?' to list register names", parse_intr_regs),
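Turning a list such as `--intr-regs=si,di,r8,r9` into a register bit mask could be sketched as follows. The register table is a hypothetical subset taken from the listing above and uses its own bit positions (bit = table index); the real bit numbers come from the architecture's perf register definitions and differ, so treat this only as an illustration of the case-insensitive name lookup.

```c
#include <stdint.h>
#include <string.h>
#include <strings.h>

/* Hypothetical table: bit position == array index (not the kernel ABI). */
static const char *const reg_names[] = {
	"AX", "BX", "CX", "DX", "SI", "DI", "BP", "SP", "IP", "FLAGS",
	"CS", "SS", "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15",
};

/*
 * Parse a comma separated, case-insensitive register list into *mask.
 * Returns 0 on success, -1 on an unknown or empty register name.
 */
int parse_reg_list(const char *list, uint64_t *mask)
{
	const size_t nr = sizeof(reg_names) / sizeof(reg_names[0]);
	const char *p = list;

	*mask = 0;
	for (;;) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);
		size_t i;

		for (i = 0; i < nr; i++) {
			if (strlen(reg_names[i]) == len &&
			    !strncasecmp(p, reg_names[i], len))
				break;
		}
		if (i == nr)
			return -1;
		*mask |= (uint64_t)1 << i;
		if (!comma)
			return 0;
		p = comma + 1;
	}
}
```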
2017-09-05 17:00:28 +00:00
OPT_CALLBACK_OPTARG(0, "user-regs", &record.opts.sample_user_regs, NULL, "any register",
"sample selected machine registers on interrupt,"
2019-05-14 20:19:32 +00:00
" use '--user-regs=?' to list register names", parse_user_regs),
2015-02-24 23:13:40 +00:00
OPT_BOOLEAN(0, "running-time", &record.opts.running_time,
"Record running/enabled time of read (:S) events"),
2015-03-30 22:19:31 +00:00
OPT_CALLBACK('k', "clockid", &record.opts,
"clockid", "clockid to use for events, see clock_gettime()",
parse_clockid),
2015-04-30 14:37:32 +00:00
OPT_STRING_OPTARG('S', "snapshot", &record.opts.auxtrace_snapshot_opts,
"opts", "AUX area tracing Snapshot Mode", ""),
2019-11-15 12:42:16 +00:00
OPT_STRING_OPTARG(0, "aux-sample", &record.opts.auxtrace_sample_opts,
"opts", "sample AUX area", ""),
2018-12-04 20:34:20 +00:00
OPT_UINTEGER(0, "proc-map-timeout", &proc_map_timeout,
2015-06-17 13:51:11 +00:00
"per thread proc mmap processing timeout in ms"),
perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info
Introduce a new option to record PERF_RECORD_NAMESPACES events emitted
by the kernel when fork, clone, setns or unshare are invoked. And update
perf-record documentation with the new option to record namespace
events.
Committer notes:
Combined it with a later patch to allow printing it via 'perf report -D'
and be able to test the feature introduced in this patch. Had to move
here also perf_ns__name(), that was introduced in another later patch.
Also used PRIu64 and PRIx64 to fix the build in some environments wrt:
util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
ret += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
^
Testing it:
# perf record --namespaces -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
#
# perf report -D
<SNIP>
3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
[0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
0x1151e0 [0x30]: event: 9
.
. ... raw event: size 48 bytes
. 0000: 09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00 ......0..q.h....
. 0010: a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00 .9...9...(.c....
. 0020: 03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00 ................
<SNIP>
NAMESPACES events: 1
<SNIP>
#
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-07 20:41:43 +00:00
OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
"Record namespaces events"),
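The build fix mentioned above is the usual portability idiom: `%lu`/`%lx` assume `unsigned long`, but a 64-bit value may be `unsigned long long` on some targets, so the `<inttypes.h>` macros are used instead. A minimal sketch printing one namespace entry in the same `idx/name: dev/0xino` shape as the `perf report -D` output (`format_ns_link()` is hypothetical):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns the snprintf() result: the length the full string needs. */
int format_ns_link(char *buf, size_t len, unsigned int idx, const char *name,
		   uint64_t dev, uint64_t ino)
{
	/* PRIu64/PRIx64 expand to the right conversion for uint64_t */
	return snprintf(buf, len, "%u/%s: %" PRIu64 "/0x%" PRIx64,
			idx, name, dev, ino);
}
```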
2020-03-25 12:45:34 +00:00
OPT_BOOLEAN(0, "all-cgroups", &record.opts.record_cgroup,
"Record cgroup events"),
2020-05-28 12:08:58 +00:00
OPT_BOOLEAN_SET(0, "switch-events", &record.opts.record_switch_events,
&record.opts.record_switch_events_set,
"Record context switch events"),
2016-02-15 08:34:31 +00:00
OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel,
"Configure all used events to run in kernel space.",
PARSE_OPT_EXCLUSIVE),
OPT_BOOLEAN_FLAG(0, "all-user", &record.opts.all_user,
"Configure all used events to run in user space.",
PARSE_OPT_EXCLUSIVE),
2019-05-30 13:29:22 +00:00
OPT_BOOLEAN(0, "kernel-callchains", &record.opts.kernel_callchains,
"collect kernel callchains"),
OPT_BOOLEAN(0, "user-callchains", &record.opts.user_callchains,
"collect user callchains"),
2015-12-14 10:39:23 +00:00
OPT_STRING(0, "vmlinux", &symbol_conf.vmlinux_name,
"file", "vmlinux pathname"),
2016-01-11 13:37:09 +00:00
OPT_BOOLEAN(0, "buildid-all", &record.buildid_all,
"Record build-id of all DSOs regardless of hits"),
perf record: Add --buildid-mmap option to enable PERF_RECORD_MMAP2's build id
Add --buildid-mmap option to enable build id in PERF_RECORD_MMAP2 events.
It will only work if there's kernel support for that and it disables
build id cache (implies --no-buildid).
It's also possible to enable it permanently via config option in
~/.perfconfig file:
[record]
build-id=mmap
Also added build_id bit in the verbose output for perf_event_attr:
# perf record --buildid-mmap -vv
...
perf_event_attr:
type 1
size 120
...
build_id 1
Also add the missing text_poke bit.
Committer testing:
$ perf record -h build
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-B, --no-buildid do not collect buildids in perf.data
-N, --no-buildid-cache
do not update the buildid cache
--buildid-all Record build-id of all DSOs regardless of hits
--buildid-mmap Record build-id in map events
$
$ perf record --buildid-mmap sleep 1
Failed: no support to record build id in mmap events, update your kernel.
$
After adding the needed kernel bits in a test kernel:
$ perf record -vv --buildid-mmap sleep 1 |& grep -m1 build
Enabling build id in mmap2 events.
$ perf evlist -v
cycles:u: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1
$
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-14 10:54:57 +00:00
OPT_BOOLEAN(0, "buildid-mmap", &record.buildid_mmap,
"Record build-id in map events"),
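The option interplay described in the commit message (`--buildid-mmap` requires kernel support and implies `--no-buildid`, disabling the build id cache) can be expressed as a small normalization step. This is a hypothetical sketch over a simplified struct: the field names only loosely mirror `record.buildid_mmap`, `record.no_buildid` and `record.no_buildid_cache`, and the kernel support check is reduced to a boolean parameter.

```c
#include <stdbool.h>

struct buildid_opts {
	bool buildid_mmap;
	bool no_buildid;
	bool no_buildid_cache;
};

/*
 * Returns 0 on success, or -1 if build ids in mmap events were requested
 * but the running kernel cannot provide them.
 */
int normalize_buildid_opts(struct buildid_opts *o, bool kernel_supports_mmap_buildid)
{
	if (!o->buildid_mmap)
		return 0;
	if (!kernel_supports_mmap_buildid)
		return -1;	/* perf reports "no support to record build id in mmap events" */
	o->no_buildid = true;		/* build ids travel in MMAP2 events instead */
	o->no_buildid_cache = true;
	return 0;
}
```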
2016-04-13 08:21:07 +00:00
OPT_BOOLEAN(0, "timestamp-filename", &record.timestamp_filename,
"append timestamp to output filename"),
perf record: Record the first and last sample time in the header
In the default 'perf record' configuration, all samples are processed,
to create the HEADER_BUILD_ID table. So it's very easy to get the
first/last samples and save the time to perf file header via the
function write_sample_time().
Later, at post processing time, perf report/script will fetch the time
from perf file header.
Committer testing:
# perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.099 MB perf.data (1101 samples) ]
[root@jouet home]# perf report --header | grep "time of "
# time of first sample : 22947.909226
# time of last sample : 22948.910704
#
# perf report -D | grep PERF_RECORD_SAMPLE\(
0 22947909226101 0x20bb68 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa21b1af3 period: 1 addr: 0
0 22947909229928 0x20bb98 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa200d204 period: 1 addr: 0
<SNIP>
3 22948910397351 0x219360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 28251/28251: 0xffffffffa22071d8 period: 169518 addr: 0
0 22948910652380 0x20f120 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 198807 addr: 0
2 22948910704034 0x2172d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 88111 addr: 0
#
Changelog:
v7: Just update the patch description according to Arnaldo's suggestion.
v6: Currently '--buildid-all' is not enabled by default. So walking
all samples is the default operation. There is no big overhead
in calculating the timestamp boundary in the process_sample_event handler
once we already go through all samples. So the timestamp boundary
calculation is enabled by default when '--buildid-all' is not enabled.
If '--buildid-all' is enabled, we create a new option
"--timestamp-boundary" for the user to decide whether to enable the
timestamp boundary calculation.
v5: There is an issue that the sample walking only works when
'--buildid-all' is not enabled. So we need to let the walking
work even if '--buildid-all' is enabled and let the
processing skip the dso hit marking for this case.
At first, I wanted to provide a new option "--record-time-boundaries".
But after consideration, I think a new option is not really
necessary.
v3: Remove the definitions of first_sample_time and last_sample_time
from struct record and directly save them in perf_evlist.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512738826-2628-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
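A minimal sketch of the bookkeeping this commit describes, assuming samples arrive as raw nanosecond timestamps. The struct and function names are hypothetical, not perf's own (the patch stores the two values in perf_evlist). It also shows how a nanosecond value such as 22947909226101 from the `perf report -D` dump maps to the `22947.909226` printed by `perf report --header`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the two header values; not perf's real struct. */
struct sample_time_boundary {
	bool     seen;	/* false until the first sample is recorded */
	uint64_t first;	/* earliest sample time, in ns */
	uint64_t last;	/* latest sample time, in ns */
};

/* Called once per sample while the samples are being walked anyway. */
void boundary_update(struct sample_time_boundary *b, uint64_t time_ns)
{
	if (!b->seen || time_ns < b->first)
		b->first = time_ns;
	if (!b->seen || time_ns > b->last)
		b->last = time_ns;
	b->seen = true;
}

/* Render ns as "seconds.microseconds", the form shown by perf report --header. */
void boundary_format(uint64_t time_ns, char *buf, size_t len)
{
	snprintf(buf, len, "%" PRIu64 ".%06" PRIu64,
		 time_ns / 1000000000ULL, (time_ns % 1000000000ULL) / 1000ULL);
}
```

Tracking both minimum and maximum keeps the sketch correct even if samples are not visited in time order; with strictly ordered input, keeping the first and the most recent value would be enough.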
2017-12-08 13:13:42 +00:00
	OPT_BOOLEAN(0, "timestamp-boundary", &record.timestamp_boundary,
		    "Record timestamp boundary (time of first/last samples)"),
2017-01-09 09:51:57 +00:00
	OPT_STRING_OPTARG_SET(0, "switch-output", &record.switch_output.str,
2019-03-14 22:49:56 +00:00
			  &record.switch_output.set, "signal or size[BKMG] or time[smhd]",
			  "Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold",
2017-01-09 09:51:58 +00:00
			  "signal"),
2023-05-02 22:38:36 +00:00
	OPT_CALLBACK_SET(0, "switch-output-event", &switch_output_parse_events_option_args,
			 &record.switch_output_event_set, "switch output event",
2020-04-27 20:56:37 +00:00
			 "switch output event selector. use 'perf list' to list available events",
			 parse_events_option_new_evlist),
2019-03-14 22:49:55 +00:00
	OPT_INTEGER(0, "switch-max-files", &record.switch_output.num_files,
		    "Limit number of switch output generated files"),
2016-06-16 08:02:41 +00:00
	OPT_BOOLEAN(0, "dry-run", &dry_run,
		    "Parse options then exit"),
2018-11-06 09:04:58 +00:00
#ifdef HAVE_AIO_SUPPORT
2018-11-06 09:07:19 +00:00
	OPT_CALLBACK_OPTARG(0, "aio", &record.opts,
		     &nr_cblocks_default, "n", "Use <n> control blocks in asynchronous trace writing mode (default: 1, max: 4)",
2018-11-06 09:04:58 +00:00
		     record__aio_parse),
#endif
2019-01-22 17:52:03 +00:00
	OPT_CALLBACK(0, "affinity", &record.opts, "node|cpu",
		     "Set affinity mask of trace reading thread to NUMA node cpu mask or cpu of processed mmap buffer",
		     record__parse_affinity),
perf record: Implement -z,--compression_level[=<n>] option
Implement the -z,--compression_level[=<n>] option, which enables runtime
compression of the contents of mmap'ed kernel data buffers during perf
record collection. The default level is 1 (fastest compression).
Compression overhead has been measured for serial and AIO streaming when
profiling a matrix multiplication workload:
+----+-------------------------------+-------------------------------+
|    |            SERIAL             |             AIO-1             |
| -z | OVH(x) | ratio(x) | size(MiB) | OVH(x) | ratio(x) | size(MiB) |
+----+--------+----------+-----------+--------+----------+-----------+
|  0 |  1,00  |  1,000   |  179,424  |  1,00  |  1,000   |  187,527  |
|  1 |  1,04  |  8,427   |  181,148  |  1,01  |  8,474   |  188,562  |
|  2 |  1,07  |  8,055   |  186,953  |  1,03  |  7,912   |  191,773  |
|  3 |  1,04  |  8,283   |  181,908  |  1,03  |  8,220   |  191,078  |
|  5 |  1,09  |  8,101   |  187,705  |  1,05  |  7,780   |  190,065  |
|  8 |  1,05  |  9,217   |  179,191  |  1,12  |  6,111   |  193,024  |
+----+--------+----------+-----------+--------+----------+-----------+
(values use a decimal comma: 1,04 means 1.04)
OVH   = (execution time with -z N) / (execution time with -z 0)
ratio = compression ratio
size  = number of bytes that were compressed
size ~= trace size x ratio
Committer notes:
Testing it, I noticed that it failed to disable build id processing when
compression is enabled. Since we'd have to uncompress everything to
look for the PERF_RECORD_{MMAP,SAMPLE,etc} records to figure out which
build ids to read from DSOs, we'd better disable build id processing
when compression is enabled, logging with pr_debug() when doing so:
Original patch:
# perf record -z2
^C[ perf record: Woken up 1 times to write data ]
0x1746e0 [0x76]: failed to process type: 81 [Invalid argument]
[ perf record: Captured and wrote 1.568 MB perf.data, compressed (original 0.452 MB, ratio is 3.995) ]
#
After auto-disabling build id processing when compression is enabled:
$ perf record -z2 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.292) ]
$ perf record -v -z2 sleep 1
Compression enabled, disabling build id collection at the end of the session.
<SNIP extra -v pr_debug() messages>
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.305) ]
$
Also, with parts of the patch originally after this one moved to just
before this one we get:
$ perf record -z2 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.371) ]
$ perf report -D | grep COMPRESS
0 0x1b8 [0x155]: PERF_RECORD_COMPRESSED: unhandled!
0 0x30d [0x80]: PERF_RECORD_COMPRESSED: unhandled!
COMPRESSED events: 2
COMPRESSED events: 0
$
I.e. when faced with PERF_RECORD_COMPRESSED records, which we still have
no code to process, we just show them as not being handled, skip them
and continue, while before we had:
$ perf report -D | grep COMPRESS
0x1b8 [0x169]: failed to process type: 81 [Invalid argument]
Error:
failed to process sample
0 0x1b8 [0x169]: PERF_RECORD_COMPRESSED
$
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/9ff06518-ae63-a908-e44d-5d9e56dd66d9@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
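The definitions under the overhead table reduce to two divisions. A small sketch with hypothetical helper names; the last helper inverts "size ~= trace size x ratio" to estimate the compressed trace size from a table row (for -z 1, serial: 181,148 MiB at ratio 8,427 gives roughly 21.5 MiB).

```c
#include <assert.h>

/* OVH: execution time with -z N divided by execution time with -z 0. */
double compression_overhead(double secs_with_z, double secs_without_z)
{
	return secs_with_z / secs_without_z;
}

/* ratio: bytes fed to the compressor divided by bytes written out. */
double compression_ratio(double original_bytes, double compressed_bytes)
{
	return original_bytes / compressed_bytes;
}

/* From "size ~= trace size x ratio": trace size ~= size / ratio. */
double estimated_trace_size(double size, double ratio)
{
	return size / ratio;
}
```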
2019-03-18 17:44:42 +00:00
#ifdef HAVE_ZSTD_SUPPORT
2022-01-17 18:34:34 +00:00
	OPT_CALLBACK_OPTARG('z', "compression-level", &record.opts, &comp_level_default, "n",
			    "Compress records using specified level (default: 1 - fastest compression, 22 - greatest compression)",
2019-03-18 17:44:42 +00:00
			    record__parse_comp_level),
#endif
2019-10-22 08:09:01 +00:00
	OPT_CALLBACK(0, "max-size", &record.output_max_size,
		     "size", "Limit the maximum size of the output file", parse_output_max_size),
2020-04-22 15:50:38 +00:00
	OPT_UINTEGER(0, "num-thread-synthesize",
		     &record.opts.nr_threads_synthesize,
		     "number of threads to run for event synthesis"),
2020-05-05 18:29:43 +00:00
#ifdef HAVE_LIBPFM
	OPT_CALLBACK(0, "pfm-events", &record.evlist, "event",
		"libpfm4 event selector. use 'perf list' to list available events",
		parse_libpfm_events_option),
#endif
2020-09-02 10:57:07 +00:00
	OPT_CALLBACK(0, "control", &record.opts, "fd:ctl-fd[,ack-fd] or fifo:ctl-fifo[,ack-fifo]",
2020-09-01 09:37:57 +00:00
		     "Listen on ctl-fd descriptor for command to control measurement ('enable': enable events, 'disable': disable events,\n"
		     "\t\t\t 'snapshot': AUX area tracing snapshot).\n"
2020-09-02 10:57:07 +00:00
		     "\t\t\t Optionally send control command completion ('ack\\n') to ack-fd descriptor.\n"
		     "\t\t\t Alternatively, ctl-fifo / ack-fifo will be opened and used as ctl-fd / ack-fd.",
2020-07-17 07:08:23 +00:00
		      parse_control_option),
2021-08-11 04:46:58 +00:00
	OPT_CALLBACK(0, "synth", &record.opts, "no|all|task|mmap|cgroup",
		     "Fine-tune event synthesis: default=all", parse_record_synth_option),
2021-12-09 20:04:25 +00:00
	OPT_STRING_OPTARG_SET(0, "debuginfod", &record.debuginfod.urls,
			  &record.debuginfod.set, "debuginfod urls",
			  "Enable debuginfod data retrieval from DEBUGINFOD_URLS or specified urls",
			  "system"),
2022-01-17 18:34:32 +00:00
	OPT_CALLBACK_OPTARG(0, "threads", &record.opts, NULL, "spec",
			    "write collected trace data into several data files using parallel threads",
			    record__parse_threads),
2022-05-18 22:47:21 +00:00
	OPT_BOOLEAN(0, "off-cpu", &record.off_cpu, "Enable off-cpu analysis"),
2024-07-03 22:30:34 +00:00
	OPT_STRING(0, "setup-filter", &record.filter_action, "pin|unpin",
		   "BPF filter action"),
2009-05-26 07:17:18 +00:00
	OPT_END()
};

2014-10-22 15:15:46 +00:00
struct option *record_options = __record_options;

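The --switch-output argument in the option table above takes one of three forms: `signal`, a size with a B/K/M/G suffix, or a time with an s/m/h/d suffix. A hypothetical parser sketching that grammar; perf's own parsing code is not part of this excerpt and may accept other spellings (for example lowercase size suffixes), and the binary (1024-based) size multiples are an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum switch_kind { SWITCH_SIGNAL, SWITCH_SIZE, SWITCH_TIME, SWITCH_INVALID };

struct switch_spec {
	enum switch_kind kind;
	unsigned long long value;	/* bytes for SWITCH_SIZE, seconds for SWITCH_TIME */
};

/* Hypothetical parser for "signal", "<n>[BKMG]" or "<n>[smhd]". */
struct switch_spec parse_switch_output(const char *str)
{
	struct switch_spec spec = { SWITCH_INVALID, 0 };
	unsigned long long n;
	char *end;

	if (strcmp(str, "signal") == 0) {
		spec.kind = SWITCH_SIGNAL;
		return spec;
	}
	if (!isdigit((unsigned char)str[0]))
		return spec;
	n = strtoull(str, &end, 10);
	if (end[0] == '\0' || end[1] != '\0')
		return spec;		/* exactly one unit suffix is required */

	switch (end[0]) {
	case 'B': spec.kind = SWITCH_SIZE; spec.value = n;         break;
	case 'K': spec.kind = SWITCH_SIZE; spec.value = n << 10;   break;
	case 'M': spec.kind = SWITCH_SIZE; spec.value = n << 20;   break;
	case 'G': spec.kind = SWITCH_SIZE; spec.value = n << 30;   break;
	case 's': spec.kind = SWITCH_TIME; spec.value = n;         break;
	case 'm': spec.kind = SWITCH_TIME; spec.value = n * 60;    break;
	case 'h': spec.kind = SWITCH_TIME; spec.value = n * 3600;  break;
	case 'd': spec.kind = SWITCH_TIME; spec.value = n * 86400; break;
	default:  break;		/* unknown suffix stays SWITCH_INVALID */
	}
	return spec;
}
```

Under this sketch, `2M` becomes a 2 MiB size threshold and `30s` a 30-second timer, while a bare number is rejected because the unit is ambiguous.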
2022-09-05 14:19:29 +00:00
static int record__mmap_cpu_mask_init(struct mmap_cpu_mask *mask, struct perf_cpu_map *cpus)
2022-01-17 18:34:21 +00:00
{
2022-05-03 04:17:52 +00:00
	struct perf_cpu cpu;
	int idx;
2022-01-17 18:34:21 +00:00

2022-04-14 01:46:40 +00:00
	if (cpu_map__is_dummy(cpus))
2022-09-05 14:19:29 +00:00
		return 0;
2022-04-14 01:46:40 +00:00

2023-11-29 06:02:02 +00:00
	perf_cpu_map__for_each_cpu_skip_any(cpu, idx, cpus) {
2022-09-05 14:19:29 +00:00
		/* Return -ENODEV if the input CPU does not fit in the mask (valid bits: 0 .. nbits - 1) */
		if ((unsigned long)cpu.cpu >= mask->nbits)
			return -ENODEV;
2022-11-19 01:34:46 +00:00
		__set_bit(cpu.cpu, mask->bits);
2022-09-05 14:19:29 +00:00
	}

	return 0;
2022-01-17 18:34:21 +00:00
}

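record__mmap_cpu_mask_init() sets one bit per CPU in the map and refuses CPUs the mask cannot hold. A self-contained sketch of the same check with a fixed-size bitmap; the struct and helpers are hypothetical stand-ins for struct mmap_cpu_mask and __set_bit(). A mask of nbits bits holds indices 0 to nbits - 1, so the rejection test is `cpu >= nbits`. Negative entries are skipped, mirroring what the `_skip_any` iterator does for the "any CPU" value (-1).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define MASK_LONGS 4
#define MASK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for struct mmap_cpu_mask: fixed storage plus a bit count. */
struct simple_mask {
	unsigned long bits[MASK_LONGS];
	size_t nbits;			/* valid bit indices are 0 .. nbits - 1 */
};

/* Set one bit per CPU; -ENODEV if a CPU does not fit, -EINVAL if nbits exceeds the storage. */
int simple_mask_init(struct simple_mask *mask, const int *cpus, size_t nr_cpus)
{
	if (mask->nbits > MASK_LONGS * MASK_BITS_PER_LONG)
		return -EINVAL;

	for (size_t i = 0; i < nr_cpus; i++) {
		size_t cpu;

		if (cpus[i] < 0)
			continue;		/* -1 means "any CPU"; skip it */
		cpu = (size_t)cpus[i];
		if (cpu >= mask->nbits)
			return -ENODEV;		/* index nbits is already out of range */
		mask->bits[cpu / MASK_BITS_PER_LONG] |= 1UL << (cpu % MASK_BITS_PER_LONG);
	}
	return 0;
}

/* Return 1 if the bit for cpu is set, 0 otherwise. */
int simple_mask_test(const struct simple_mask *mask, size_t cpu)
{
	return (int)((mask->bits[cpu / MASK_BITS_PER_LONG] >> (cpu % MASK_BITS_PER_LONG)) & 1UL);
}
```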
2022-01-17 18:34:33 +00:00
static int record__mmap_cpu_mask_init_spec(struct mmap_cpu_mask *mask, const char *mask_spec)
{
	struct perf_cpu_map *cpus;

	cpus = perf_cpu_map__new(mask_spec);
	if (!cpus)
		return -ENOMEM;

	bitmap_zero(mask->bits, mask->nbits);
2022-09-05 14:19:29 +00:00
	if (record__mmap_cpu_mask_init(mask, cpus)) {
		perf_cpu_map__put(cpus);	/* drop the map on the error path too */
		return -ENODEV;
	}

2022-01-17 18:34:33 +00:00
	perf_cpu_map__put(cpus);

	return 0;
}

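record__mmap_cpu_mask_init_spec() hands the textual spec to perf_cpu_map__new(), which understands CPU lists such as `0-3,8`. As a rough illustration of that list syntax only (this is not libperf's parser and ignores its finer details), a hypothetical helper that expands such a list into a boolean array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not libperf's parser: expand a CPU list such as
 * "0-3,8" into cpus[0 .. max_cpus - 1]. Returns the number of distinct
 * CPUs set, or -1 for a malformed list or a CPU >= max_cpus.
 */
int parse_cpu_list(const char *list, bool *cpus, int max_cpus)
{
	const char *p = list;
	int count = 0;

	while (*p) {
		char *end;
		long first = strtol(p, &end, 10);
		long last = first;

		if (end == p || first < 0)
			return -1;		/* no number where one was expected */
		if (*end == '-') {		/* a range "first-last" */
			p = end + 1;
			last = strtol(p, &end, 10);
			if (end == p || last < first)
				return -1;
		}
		if (last >= max_cpus)
			return -1;
		for (long c = first; c <= last; c++) {
			if (!cpus[c]) {
				cpus[c] = true;
				count++;
			}
		}
		if (*end == ',')
			end++;			/* move on to the next entry */
		else if (*end != '\0')
			return -1;
		p = end;
	}
	return count;
}
```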
2022-01-17 18:34:21 +00:00
static void record__free_thread_masks(struct record *rec, int nr_threads)
{
	int t;

	if (rec->thread_masks)
		for (t = 0; t < nr_threads; t++)
			record__thread_mask_free(&rec->thread_masks[t]);

	zfree(&rec->thread_masks);
}

static int record__alloc_thread_masks(struct record *rec, int nr_threads, int nr_bits)
{
	int t, ret;

	rec->thread_masks = zalloc(nr_threads * sizeof(*(rec->thread_masks)));
	if (!rec->thread_masks) {
		pr_err("Failed to allocate thread masks\n");
		return -ENOMEM;
	}

	for (t = 0; t < nr_threads; t++) {
		ret = record__thread_mask_alloc(&rec->thread_masks[t], nr_bits);
		if (ret) {
			pr_err("Failed to allocate thread masks[%d]\n", t);
			goto out_free;
		}
	}

	return 0;

out_free:
	record__free_thread_masks(rec, nr_threads);

	return ret;
}

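record__alloc_thread_masks() relies on zalloc() so that, when allocating one of the per-thread bitmaps fails, record__free_thread_masks() can walk every slot and simply find NULL in the ones never filled. The same all-or-nothing pattern in plain C, with calloc()/free() standing in for zalloc()/zfree() and a hypothetical one-bitmap struct in place of struct thread_mask:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical, pared-down struct thread_mask: a single heap bitmap. */
struct tmask {
	unsigned long *bits;
	size_t nbits;
};

/* Free every slot; slots that were never filled are NULL, and free(NULL) is a no-op. */
void tmasks_free(struct tmask **masks, int nr)
{
	if (*masks)
		for (int t = 0; t < nr; t++)
			free((*masks)[t].bits);
	free(*masks);
	*masks = NULL;
}

/* Allocate nr zeroed masks of nbits bits each, or nothing at all. */
int tmasks_alloc(struct tmask **masks, int nr, size_t nbits)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	size_t longs = (nbits + bits_per_long - 1) / bits_per_long;

	*masks = calloc(nr, sizeof(**masks));	/* zeroed, like zalloc() */
	if (!*masks)
		return -ENOMEM;

	for (int t = 0; t < nr; t++) {
		(*masks)[t].bits = calloc(longs, sizeof(unsigned long));
		if (!(*masks)[t].bits)
			goto out_free;
		(*masks)[t].nbits = nbits;
	}
	return 0;

out_free:
	tmasks_free(masks, nr);
	return -ENOMEM;
}
```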
2022-01-17 18:34:32 +00:00
static int record__init_thread_cpu_masks(struct record *rec, struct perf_cpu_map *cpus)
{
	int t, ret, nr_cpus = perf_cpu_map__nr(cpus);

	ret = record__alloc_thread_masks(rec, nr_cpus, cpu__max_cpu().cpu);
	if (ret)
		return ret;

	rec->nr_threads = nr_cpus;
	pr_debug("nr_threads: %d\n", rec->nr_threads);

	for (t = 0; t < rec->nr_threads; t++) {
2022-11-19 01:34:46 +00:00
		__set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].maps.bits);
		__set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].affinity.bits);
2022-12-20 03:57:01 +00:00
		if (verbose > 0) {
2022-01-17 18:34:32 +00:00
			pr_debug("thread_masks[%d]: ", t);
			mmap_cpu_mask__scnprintf(&rec->thread_masks[t].maps, "maps");
			pr_debug("thread_masks[%d]: ", t);
			mmap_cpu_mask__scnprintf(&rec->thread_masks[t].affinity, "affinity");
		}
	}

	return 0;
}

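In the per-CPU case each of the nr_cpus threads gets a mask containing exactly one CPU, used both for the mmap buffers it reads (maps) and for where it may run (affinity). A toy version with 64-bit masks; the names are hypothetical and it assumes every CPU index is below 64, whereas the real code uses bitmaps sized by cpu__max_cpu().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature of struct thread_mask using one 64-bit word per mask. */
struct cpu_thread_mask {
	uint64_t maps;		/* mmap buffers this thread reads */
	uint64_t affinity;	/* CPUs this thread may run on */
};

/* One thread per CPU in cpus[]; assumes 0 <= cpus[t] < 64. Returns NULL on failure. */
struct cpu_thread_mask *per_cpu_masks(const int *cpus, int nr_cpus)
{
	struct cpu_thread_mask *masks = calloc(nr_cpus, sizeof(*masks));

	if (!masks)
		return NULL;
	for (int t = 0; t < nr_cpus; t++) {
		masks[t].maps = UINT64_C(1) << cpus[t];
		masks[t].affinity = UINT64_C(1) << cpus[t];
	}
	return masks;
}
```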
2022-01-17 18:34:33 +00:00
static int record__init_thread_masks_spec(struct record *rec, struct perf_cpu_map *cpus,
					  const char **maps_spec, const char **affinity_spec,
					  u32 nr_spec)
{
	u32 s;
	int ret = 0, t = 0;
	struct mmap_cpu_mask cpus_mask;
	struct thread_mask thread_mask, full_mask, *thread_masks;

	ret = record__mmap_cpu_mask_alloc(&cpus_mask, cpu__max_cpu().cpu);
	if (ret) {
		pr_err("Failed to allocate CPUs mask\n");
		return ret;
	}
2022-09-05 14:19:29 +00:00

	ret = record__mmap_cpu_mask_init(&cpus_mask, cpus);
	if (ret) {
		pr_err("Failed to init cpu mask\n");
		goto out_free_cpu_mask;
	}
2022-01-17 18:34:33 +00:00

	ret = record__thread_mask_alloc(&full_mask, cpu__max_cpu().cpu);
	if (ret) {
		pr_err("Failed to allocate full mask\n");
		goto out_free_cpu_mask;
	}

	ret = record__thread_mask_alloc(&thread_mask, cpu__max_cpu().cpu);
	if (ret) {
		pr_err("Failed to allocate thread mask\n");
		goto out_free_full_and_cpu_masks;
	}

	for (s = 0; s < nr_spec; s++) {
		ret = record__mmap_cpu_mask_init_spec(&thread_mask.maps, maps_spec[s]);
		if (ret) {
			pr_err("Failed to initialize maps thread mask\n");
			goto out_free;
		}
		ret = record__mmap_cpu_mask_init_spec(&thread_mask.affinity, affinity_spec[s]);
		if (ret) {
			pr_err("Failed to initialize affinity thread mask\n");
			goto out_free;
		}

		/* ignore invalid CPUs but do not allow empty masks */
		if (!bitmap_and(thread_mask.maps.bits, thread_mask.maps.bits,
				cpus_mask.bits, thread_mask.maps.nbits)) {
			pr_err("Empty maps mask: %s\n", maps_spec[s]);
			ret = -EINVAL;
			goto out_free;
		}
		if (!bitmap_and(thread_mask.affinity.bits, thread_mask.affinity.bits,
				cpus_mask.bits, thread_mask.affinity.nbits)) {
			pr_err("Empty affinity mask: %s\n", affinity_spec[s]);
			ret = -EINVAL;
			goto out_free;
		}

		/* do not allow intersection with other masks (full_mask) */
		if (bitmap_intersects(thread_mask.maps.bits, full_mask.maps.bits,
				      thread_mask.maps.nbits)) {
			pr_err("Intersecting maps mask: %s\n", maps_spec[s]);
			ret = -EINVAL;
			goto out_free;
		}
		if (bitmap_intersects(thread_mask.affinity.bits, full_mask.affinity.bits,
				      thread_mask.affinity.nbits)) {
			pr_err("Intersecting affinity mask: %s\n", affinity_spec[s]);
			ret = -EINVAL;
			goto out_free;
		}

		bitmap_or(full_mask.maps.bits, full_mask.maps.bits,
			  thread_mask.maps.bits, full_mask.maps.nbits);
		bitmap_or(full_mask.affinity.bits, full_mask.affinity.bits,
			  thread_mask.affinity.bits, full_mask.affinity.nbits);

		thread_masks = realloc(rec->thread_masks, (t + 1) * sizeof(struct thread_mask));
		if (!thread_masks) {
			pr_err("Failed to reallocate thread masks\n");
			ret = -ENOMEM;
			goto out_free;
		}
		rec->thread_masks = thread_masks;
		rec->thread_masks[t] = thread_mask;
2022-12-20 03:57:01 +00:00
		if (verbose > 0) {
2022-01-17 18:34:33 +00:00
			pr_debug("thread_masks[%d]: ", t);
			mmap_cpu_mask__scnprintf(&rec->thread_masks[t].maps, "maps");
			pr_debug("thread_masks[%d]: ", t);
			mmap_cpu_mask__scnprintf(&rec->thread_masks[t].affinity, "affinity");
		}
		t++;
		ret = record__thread_mask_alloc(&thread_mask, cpu__max_cpu().cpu);
		if (ret) {
			pr_err("Failed to allocate thread mask\n");
			goto out_free_full_and_cpu_masks;
		}
	}
	rec->nr_threads = t;
	pr_debug("nr_threads: %d\n", rec->nr_threads);
	if (!rec->nr_threads)
		ret = -EINVAL;

out_free:
	record__thread_mask_free(&thread_mask);
out_free_full_and_cpu_masks:
	record__thread_mask_free(&full_mask);
out_free_cpu_mask:
	record__mmap_cpu_mask_free(&cpus_mask);

	return ret;
}

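The loop in record__init_thread_masks_spec() enforces two rules per spec: CPUs outside the recorded CPU set are dropped, but a mask that becomes empty is an error; and no two specs may share a CPU, which is checked against the running union (full_mask). A compact sketch of those rules with 64-bit masks. The function is hypothetical and validates a single mask per spec, whereas the real code applies the same tests to both the maps and the affinity masks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical validator: trim each mask to the online CPUs, then reject
 * empty or overlapping masks. Returns nr on success (trimmed masks in out[]),
 * or -1 on the first invalid mask.
 */
int validate_thread_masks(uint64_t online, const uint64_t *masks, uint64_t *out, int nr)
{
	uint64_t full = 0;	/* union of all masks accepted so far */

	for (int s = 0; s < nr; s++) {
		uint64_t m = masks[s] & online;	/* ignore CPUs that are not recorded */

		if (!m)
			return -1;		/* but an empty mask is an error */
		if (m & full)
			return -1;		/* masks must not share a CPU */
		full |= m;
		out[s] = m;
	}
	return nr;
}
```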
static int record__init_thread_core_masks(struct record *rec, struct perf_cpu_map *cpus)
{
	int ret;
	struct cpu_topology *topo;

	topo = cpu_topology__new();
	if (!topo) {
		pr_err("Failed to allocate CPU topology\n");
		return -ENOMEM;
	}

	ret = record__init_thread_masks_spec(rec, cpus, topo->core_cpus_list,
					     topo->core_cpus_list, topo->core_cpus_lists);
	cpu_topology__delete(topo);

	return ret;
}

static int record__init_thread_package_masks(struct record *rec, struct perf_cpu_map *cpus)
{
	int ret;
	struct cpu_topology *topo;

	topo = cpu_topology__new();
	if (!topo) {
		pr_err("Failed to allocate CPU topology\n");
		return -ENOMEM;
	}

	ret = record__init_thread_masks_spec(rec, cpus, topo->package_cpus_list,
					     topo->package_cpus_list, topo->package_cpus_lists);
	cpu_topology__delete(topo);

	return ret;
}

static int record__init_thread_numa_masks(struct record *rec, struct perf_cpu_map *cpus)
{
	u32 s;
	int ret;
	const char **spec;
	struct numa_topology *topo;

	topo = numa_topology__new();
	if (!topo) {
		pr_err("Failed to allocate NUMA topology\n");
		return -ENOMEM;
	}

	spec = zalloc(topo->nr * sizeof(char *));
	if (!spec) {
		pr_err("Failed to allocate NUMA spec\n");
		ret = -ENOMEM;
		goto out_delete_topo;
	}
	for (s = 0; s < topo->nr; s++)
		spec[s] = topo->nodes[s].cpus;

	ret = record__init_thread_masks_spec(rec, cpus, spec, spec, topo->nr);

	zfree(&spec);

out_delete_topo:
	numa_topology__delete(topo);

	return ret;
}

static int record__init_thread_user_masks(struct record *rec, struct perf_cpu_map *cpus)
{
	int t, ret;
	u32 s, nr_spec = 0;
	char **maps_spec = NULL, **affinity_spec = NULL, **tmp_spec;
	char *user_spec, *spec, *spec_ptr, *mask, *mask_ptr, *dup_mask = NULL;

	for (t = 0, user_spec = (char *)rec->opts.threads_user_spec; ; t++, user_spec = NULL) {
		spec = strtok_r(user_spec, ":", &spec_ptr);
		if (spec == NULL)
			break;
		pr_debug2("threads_spec[%d]: %s\n", t, spec);
		mask = strtok_r(spec, "/", &mask_ptr);
		if (mask == NULL)
			break;
		pr_debug2(" maps mask: %s\n", mask);
		tmp_spec = realloc(maps_spec, (nr_spec + 1) * sizeof(char *));
		if (!tmp_spec) {
			pr_err("Failed to reallocate maps spec\n");
			ret = -ENOMEM;
			goto out_free;
		}
		maps_spec = tmp_spec;
		maps_spec[nr_spec] = dup_mask = strdup(mask);
		if (!maps_spec[nr_spec]) {
			pr_err("Failed to allocate maps spec[%d]\n", nr_spec);
			ret = -ENOMEM;
			goto out_free;
		}
		mask = strtok_r(NULL, "/", &mask_ptr);
		if (mask == NULL) {
			pr_err("Invalid thread maps or affinity specs\n");
			ret = -EINVAL;
			goto out_free;
		}
		pr_debug2(" affinity mask: %s\n", mask);
		tmp_spec = realloc(affinity_spec, (nr_spec + 1) * sizeof(char *));
		if (!tmp_spec) {
			pr_err("Failed to reallocate affinity spec\n");
			ret = -ENOMEM;
			goto out_free;
		}
		affinity_spec = tmp_spec;
		affinity_spec[nr_spec] = strdup(mask);
		if (!affinity_spec[nr_spec]) {
			pr_err("Failed to allocate affinity spec[%d]\n", nr_spec);
			ret = -ENOMEM;
			goto out_free;
		}
		dup_mask = NULL;
		nr_spec++;
	}

	ret = record__init_thread_masks_spec(rec, cpus, (const char **)maps_spec,
					     (const char **)affinity_spec, nr_spec);

out_free:
	free(dup_mask);
	for (s = 0; s < nr_spec; s++) {
		if (maps_spec)
			free(maps_spec[s]);
		if (affinity_spec)
			free(affinity_spec[s]);
	}
	free(affinity_spec);
	free(maps_spec);

	return ret;
}

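record__init_thread_user_masks() splits the user spec first on ':' into per-thread entries and then on '/' into a maps mask and an affinity mask, using strtok_r() with separate save pointers for the two levels. A standalone sketch of that two-level split; the function name and its fixed-size output arrays are assumptions for illustration, and its error handling is simplified (any entry lacking an affinity half fails the whole spec).

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() and strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical: split "maps/affinity:maps/affinity..." into at most max pairs.
 * Returns the number of pairs, or -1 on error. On success the caller owns the
 * strdup()ed strings in maps[] and affinity[] and must free() them.
 */
int split_thread_spec(const char *user_spec, char **maps, char **affinity, int max)
{
	char *copy = strdup(user_spec);	/* strtok_r() writes NULs into its input */
	char *spec_ptr, *mask_ptr, *spec;
	int n = 0;

	if (!copy)
		return -1;
	for (spec = strtok_r(copy, ":", &spec_ptr); spec && n < max;
	     spec = strtok_r(NULL, ":", &spec_ptr)) {
		char *m = strtok_r(spec, "/", &mask_ptr);	/* inner level: own save pointer */
		char *a = strtok_r(NULL, "/", &mask_ptr);

		if (!m || !a)
			goto err;			/* every entry needs both halves */
		maps[n] = strdup(m);
		affinity[n] = strdup(a);
		n++;					/* count first so err frees this pair too */
		if (!maps[n - 1] || !affinity[n - 1])
			goto err;
	}
	free(copy);
	return n;

err:
	while (n-- > 0) {
		free(maps[n]);
		free(affinity[n]);
	}
	free(copy);
	return -1;
}
```

For example, `0-3/0:4-7/4` yields two threads: one reading the buffers of CPUs 0-3 while pinned to CPU 0, and one reading CPUs 4-7 while pinned to CPU 4.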
2022-01-17 18:34:21 +00:00
static int record__init_thread_default_masks(struct record *rec, struct perf_cpu_map *cpus)
{
	int ret;

	ret = record__alloc_thread_masks(rec, 1, cpu__max_cpu().cpu);
	if (ret)
		return ret;

2022-09-05 14:19:29 +00:00
	if (record__mmap_cpu_mask_init(&rec->thread_masks->maps, cpus))
		return -ENODEV;
2022-01-17 18:34:21 +00:00

	rec->nr_threads = 1;

	return 0;
}

static int record__init_thread_masks(struct record *rec)
{
2022-01-17 18:34:33 +00:00
	int ret = 0;
2022-05-24 07:54:30 +00:00
	struct perf_cpu_map *cpus = rec->evlist->core.all_cpus;
2022-01-17 18:34:21 +00:00

2022-01-17 18:34:32 +00:00
	if (!record__threads_enabled(rec))
		return record__init_thread_default_masks(rec, cpus);

2022-05-24 07:54:30 +00:00
	if (evlist__per_thread(rec->evlist)) {
2022-04-14 01:46:40 +00:00
		pr_err("--per-thread option is mutually exclusive with parallel streaming mode.\n");
		return -EINVAL;
	}

2022-01-17 18:34:33 +00:00
	switch (rec->opts.threads_spec) {
	case THREAD_SPEC__CPU:
		ret = record__init_thread_cpu_masks(rec, cpus);
		break;
	case THREAD_SPEC__CORE:
		ret = record__init_thread_core_masks(rec, cpus);
		break;
	case THREAD_SPEC__PACKAGE:
		ret = record__init_thread_package_masks(rec, cpus);
		break;
	case THREAD_SPEC__NUMA:
		ret = record__init_thread_numa_masks(rec, cpus);
		break;
	case THREAD_SPEC__USER:
		ret = record__init_thread_user_masks(rec, cpus);
		break;
	default:
		break;
	}

	return ret;
2022-01-17 18:34:21 +00:00
}

2017-03-27 14:47:20 +00:00
int cmd_record(int argc, const char **argv)
2009-05-26 07:17:18 +00:00
{
2015-04-09 15:53:45 +00:00
	int err;
2013-12-19 17:38:03 +00:00
	struct record *rec = &record;
2012-05-07 05:09:02 +00:00
	char errbuf[BUFSIZ];
2009-05-26 07:17:18 +00:00

perf record: Allow asking for the maximum allowed sample rate
Add the handy '-F max' shortcut for reading and using the
kernel.perf_event_max_sample_rate value as the user-supplied
sampling frequency:
# perf record -F max sleep 1
info: Using a maximum frequency rate of 15,000 Hz
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (14 samples) ]
# sysctl kernel.perf_event_max_sample_rate
kernel.perf_event_max_sample_rate = 15000
# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
# perf record -F 10 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (4 samples) ]
# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 10, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
#
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4y0tiuws62c64gp4cf0hme0m@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
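The '-F max' shortcut boils down to reading one integer from the kernel.perf_event_max_sample_rate sysctl, exposed as /proc/sys/kernel/perf_event_max_sample_rate. A minimal sketch with hypothetical function names; it separates parsing from opening the file so the parsing can be exercised on any stream.

```c
#include <assert.h>
#include <stdio.h>

/* File behind the kernel.perf_event_max_sample_rate sysctl. */
#define MAX_SAMPLE_RATE_PATH "/proc/sys/kernel/perf_event_max_sample_rate"

/* Read one unsigned integer from an open stream. 0 on success, -1 otherwise. */
int read_max_sample_rate(FILE *f, unsigned int *rate)
{
	return fscanf(f, "%u", rate) == 1 ? 0 : -1;
}

/* Hypothetical convenience wrapper that reads the real sysctl file. */
int max_sample_rate(unsigned int *rate)
{
	FILE *f = fopen(MAX_SAMPLE_RATE_PATH, "r");
	int ret;

	if (!f)
		return -1;
	ret = read_max_sample_rate(f, rate);
	fclose(f);
	return ret;
}
```

Printing the value as "15,000 Hz" needs locale-aware digit grouping. The setlocale(LC_ALL, "") call this commit is attached to in cmd_record() enables that, for example through printf's `'` grouping flag, which is a POSIX extension rather than ISO C.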
2018-03-01 16:46:23 +00:00
	setlocale(LC_ALL, "");

2022-05-18 22:47:21 +00:00
#ifndef HAVE_BPF_SKEL
# define set_nobuild(s, l, m, c) set_option_nobuild(record_options, s, l, m, c)
2023-05-06 21:07:37 +00:00
	set_nobuild('\0', "off-cpu", "no BUILD_BPF_SKEL=1", true);
2022-05-18 22:47:21 +00:00
# undef set_nobuild
perf tools: Make options always available, even if required libs not linked
This patch keeps the options of perf builtins the same under all build
configurations. If an option is disabled because of build options, users
should be notified.
Masami suggested another implementation in [1]: by adding an
OPTION_NEXT_DEPENDS option before those options in the 'struct option'
array, the option parser would know that an option is disabled. However,
in some cases this array is reordered (options__order()). In addition, in
parse-option.c that array is const, so we can't simply merge the
information in the decorator option into the affected option.
This patch chooses a simpler implementation: it introduces a
set_option_nobuild() function and two option parsing flags. Builtins
with such options should call set_option_nobuild() before option
parsing. Most of the complexity comes from wanting some options to be
safely skippable; in that case their arguments must also be consumed.
Options in 'perf record' and 'perf probe' are fixed in this patch.
[1] http://lkml.kernel.org/g/50399556C9727B4D88A595C8584AAB3752627CD4@GSjpTKYDCembx32.service.hitachi.net
Test result:
Normal case:
# ./perf probe --vmlinux /tmp/vmlinux sys_write
Added new event:
probe:sys_write (on sys_write)
You can now use it in all perf tools, such as:
perf record -e probe:sys_write -aR sleep 1
Build with NO_DWARF=1:
# ./perf probe -L sys_write
Error: switch `L' is not available because NO_DWARF=1
Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
or: perf probe [<options>] --del '[GROUP:]EVENT' ...
or: perf probe --list [GROUP:]EVENT ...
or: perf probe [<options>] --funcs
-L, --line <FUNC[:RLN[+NUM|-RLN2]]|SRC:ALN[+NUM|-ALN2]>
Show source code lines.
(not built-in because NO_DWARF=1)
# ./perf probe -k /tmp/vmlinux sys_write
Warning: switch `k' is being ignored because NO_DWARF=1
Added new event:
probe:sys_write (on sys_write)
You can now use it in all perf tools, such as:
perf record -e probe:sys_write -aR sleep 1
# ./perf probe --vmlinux /tmp/vmlinux sys_write
Warning: option `vmlinux' is being ignored because NO_DWARF=1
Added new event:
[SNIP]
# ./perf probe -l
Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
...
-k, --vmlinux <file> vmlinux pathname
(not built-in because NO_DWARF=1)
-L, --line <FUNC[:RLN[+NUM|-RLN2]]|SRC:ALN[+NUM|-ALN2]>
Show source code lines.
(not built-in because NO_DWARF=1)
...
-V, --vars <FUNC[@SRC][+OFF|%return|:RL|;PT]|SRC:AL|SRC;PT>
Show accessible variables on PROBEDEF
(not built-in because NO_DWARF=1)
--externs Show external variables too (with --vars only)
(not built-in because NO_DWARF=1)
--no-inlines Don't search inlined functions
(not built-in because NO_DWARF=1)
--range Show variables location range in scope (with --vars only)
(not built-in because NO_DWARF=1)
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1450089563-122430-14-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
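The idea described in this commit, marking an option as present but not built in and deciding per option whether to warn and skip it or to reject it, can be sketched as follows. The struct and functions are hypothetical simplifications, not perf's struct option or set_option_nobuild(), and they do not model consuming a skipped option's argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified option entry. */
struct opt {
	const char *long_name;
	bool nobuild;		/* option exists, but its feature was compiled out */
	bool can_skip;		/* warn and ignore instead of failing */
	const char *build_opt;	/* reason shown to the user, e.g. "NO_DWARF=1" */
};

/* Mark the named option as not built in (the idea behind set_option_nobuild()). */
void mark_nobuild(struct opt *opts, size_t n, const char *name,
		  const char *build_opt, bool can_skip)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(opts[i].long_name, name) == 0) {
			opts[i].nobuild = true;
			opts[i].can_skip = can_skip;
			opts[i].build_opt = build_opt;
		}
	}
}

/* Returns 0 to process the option, 1 to ignore it after a warning, -1 to reject it. */
int check_option(const struct opt *o)
{
	if (!o->nobuild)
		return 0;
	return o->can_skip ? 1 : -1;
}
```

With this table, --vmlinux would be ignored with a warning while --line would be rejected, matching the NO_DWARF=1 test output above; the trailing `true` in the set_nobuild() call for "off-cpu" in cmd_record() appears to select the skippable behaviour.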
2015-12-14 10:39:22 +00:00
#endif

2023-11-02 17:56:44 +00:00
	/* Disable eager loading of kernel symbols that adds overhead to perf record. */
	symbol_conf.lazy_load_kernel_maps = true;
2019-01-22 17:47:43 +00:00
	rec->opts.affinity = PERF_AFFINITY_SYS;

2019-07-21 11:23:55 +00:00
	rec->evlist = evlist__new();
2014-01-03 18:03:26 +00:00
	if (rec->evlist == NULL)
2011-01-11 22:56:53 +00:00
		return -ENOMEM;

2017-01-24 16:44:10 +00:00
	err = perf_config(perf_record_config, rec);
	if (err)
		return err;
2014-02-03 11:44:42 +00:00

2010-11-10 14:11:30 +00:00
	argc = parse_options(argc, argv, record_options, record_usage,
2009-12-15 22:04:40 +00:00
			    PARSE_OPT_STOP_AT_NON_OPTION);
2017-02-17 08:17:42 +00:00
	if (quiet)
		perf_quiet_option();
2017-02-17 17:00:18 +00:00

2021-10-18 13:48:42 +00:00
	err = symbol__validate_sym_arguments();
	if (err)
		return err;

2021-12-09 20:04:25 +00:00
	perf_debuginfod_setup(&record.debuginfod);

2017-02-17 17:00:18 +00:00
	/* Make system wide (-a) the default target. */
2013-11-12 19:46:16 +00:00
	if (!argc && target__none(&rec->opts.target))
2017-02-17 17:00:18 +00:00
		rec->opts.target.system_wide = true;
2009-05-26 07:17:18 +00:00

2012-04-26 05:15:15 +00:00
	if (nr_cgroups && !rec->opts.target.system_wide) {
2015-10-24 15:49:27 +00:00
		usage_with_options_msg(record_usage, record_options,
			"cgroup monitoring only available in system-wide mode");

perf tool: Add cgroup support
This patch adds the ability to filter monitoring based on container groups
(cgroups) for both perf stat and perf record. It is possible to monitor
multiple cgroups in parallel. There is one cgroup per event. The cgroups to
monitor are passed via a new -G option followed by a comma separated list of
cgroup names.
The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
finds the corresponding directory in the cgroup filesystem and opens it. It
then passes that file descriptor to the kernel.
Example:
$ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
Performance counter stats for 'sleep 1':
2,368,667,414 cycles test1
2,369,661,459 cycles
<not counted> cycles test2
1.001856890 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 09:20:01 +00:00
	}
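The -G list in the cgroup commit above is positional: the Nth name applies to the Nth event, and an empty field (as in `test1,,test2`) leaves that event without a cgroup filter. Below is a minimal sketch of only that splitting step. `split_cgroup_list()` and the size limits are hypothetical and are not perf's own parser. The sketch stops before the step the commit describes next, which is opening the cgroup directory and handing its file descriptor to the kernel. The loop splits on strchr() because strtok() would silently drop the empty middle field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CGRP_FIELDS 16
#define MAX_CGRP_NAME   64

/*
 * Hypothetical helper: split a comma-separated -G argument into
 * positional fields. Unlike strtok(), empty fields are kept, so
 * "test1,,test2" yields three entries and the middle one is "".
 * Returns the number of fields, or -1 if a limit is exceeded.
 */
static int split_cgroup_list(const char *arg,
			     char out[][MAX_CGRP_NAME], int max_fields)
{
	int n = 0;
	const char *start = arg;

	assert(arg != NULL);
	for (;;) {
		const char *comma = strchr(start, ',');
		size_t len = comma ? (size_t)(comma - start) : strlen(start);

		if (n >= max_fields || len >= MAX_CGRP_NAME)
			return -1;
		memcpy(out[n], start, len);
		out[n][len] = '\0';	/* empty field => "" (no cgroup) */
		n++;

		if (!comma)
			return n;
		start = comma + 1;
	}
}
```

For `-G test1,,test2` it yields three fields with the second one empty. This matches the three `cycles:u` events in the example, where only the middle one is counted without a cgroup.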
perf record: Implement -z,--compression_level[=<n>] option
Implemented -z,--compression_level[=<n>] option that enables compression
of mmaped kernel data buffers content in runtime during perf record mode
collection. Default option value is 1 (fastest compression).
Compression overhead has been measured for serial and AIO streaming when
profiling matrix multiplication workload:
----------------------------------------------------------------------
|    |            SERIAL             |             AIO-1             |
|-z  | OVH(x) | ratio(x) | size(MiB) | OVH(x) | ratio(x) | size(MiB) |
|----|--------|----------|-----------|--------|----------|-----------|
| 0  |  1,00  |  1,000   |  179,424  |  1,00  |  1,000   |  187,527  |
| 1  |  1,04  |  8,427   |  181,148  |  1,01  |  8,474   |  188,562  |
| 2  |  1,07  |  8,055   |  186,953  |  1,03  |  7,912   |  191,773  |
| 3  |  1,04  |  8,283   |  181,908  |  1,03  |  8,220   |  191,078  |
| 5  |  1,09  |  8,101   |  187,705  |  1,05  |  7,780   |  190,065  |
| 8  |  1,05  |  9,217   |  179,191  |  1,12  |  6,111   |  193,024  |
----------------------------------------------------------------------
OVH = (Execution time with -z N) / (Execution time with -z 0)
ratio - compression ratio
size - number of bytes that was compressed
size ~= trace size x ratio
Committer notes:
Testing it I noticed that it failed to disable build id processing when
compression is enabled, and as we'd have to uncompress everything to
look for the PERF_RECORD_{MMAP,SAMPLE,etc} to figure out which build ids
to read from DSOs, we better disable build id processing when
compression is enabled, logging with pr_debug() when doing so:
Original patch:
# perf record -z2
^C[ perf record: Woken up 1 times to write data ]
0x1746e0 [0x76]: failed to process type: 81 [Invalid argument]
[ perf record: Captured and wrote 1.568 MB perf.data, compressed (original 0.452 MB, ratio is 3.995) ]
#
After auto-disabling build id processing when compression is enabled:
$ perf record -z2 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.292) ]
$ perf record -v -z2 sleep 1
Compression enabled, disabling build id collection at the end of the session.
<SNIP extra -v pr_debug() messages>
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.305) ]
$
Also, with parts of the patch originally after this one moved to just
before this one we get:
$ perf record -z2 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.371) ]
$ perf report -D | grep COMPRESS
0 0x1b8 [0x155]: PERF_RECORD_COMPRESSED: unhandled!
0 0x30d [0x80]: PERF_RECORD_COMPRESSED: unhandled!
COMPRESSED events: 2
COMPRESSED events: 0
$
I.e. when faced with PERF_RECORD_COMPRESSED that we still have no code
to process, we just show it as not being handled, skip them and
continue, while before we had:
$ perf report -D | grep COMPRESS
0x1b8 [0x169]: failed to process type: 81 [Invalid argument]
Error:
failed to process sample
0 0x1b8 [0x169]: PERF_RECORD_COMPRESSED
$
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/9ff06518-ae63-a908-e44d-5d9e56dd66d9@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-18 17:44:42 +00:00
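The two derived columns in the -z table follow directly from the formulas listed under it. Below is a sketch with illustrative function names that perf does not provide. It reads the table's commas as decimal separators, so 1,04 means 1.04.

```c
#include <assert.h>

/* OVH = (execution time with -z N) / (execution time with -z 0) */
static double compression_overhead(double secs_with_z, double secs_without_z)
{
	assert(secs_without_z > 0.0);
	return secs_with_z / secs_without_z;
}

/*
 * The legend says size ~= trace size x ratio, so the compressed
 * trace written to disk can be estimated as size / ratio.
 */
static double estimated_trace_size(double size_mib, double ratio)
{
	assert(ratio > 0.0);
	return size_mib / ratio;
}
```

As an example, take the serial `-z 1` row (ratio 8,427, size 181,148 MiB). If `size` is the uncompressed volume, as the legend states, the estimated trace is about 21.5 MiB.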
perf record: Add --buildid-mmap option to enable PERF_RECORD_MMAP2's build id
Add --buildid-mmap option to enable build id in PERF_RECORD_MMAP2 events.
It will only work if there's kernel support for that and it disables
build id cache (implies --no-buildid).
It's also possible to enable it permanently via config option in
~/.perfconfig file:
[record]
build-id=mmap
Also added build_id bit in the verbose output for perf_event_attr:
# perf record --buildid-mmap -vv
...
perf_event_attr:
type 1
size 120
...
build_id 1
Adding also missing text_poke bit.
Committer testing:
$ perf record -h build
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-B, --no-buildid do not collect buildids in perf.data
-N, --no-buildid-cache
do not update the buildid cache
--buildid-all Record build-id of all DSOs regardless of hits
--buildid-mmap Record build-id in map events
$
$ perf record --buildid-mmap sleep 1
Failed: no support to record build id in mmap events, update your kernel.
$
After adding the needed kernel bits in a test kernel:
$ perf record -vv --buildid-mmap sleep 1 |& grep -m1 build
Enabling build id in mmap2 events.
$ perf evlist -v
cycles:u: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1
$
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-14 10:54:57 +00:00
	if (rec->buildid_mmap) {
		if (!perf_can_record_build_id()) {
			pr_err("Failed: no support to record build id in mmap events, update your kernel.\n");
			err = -EINVAL;
			goto out_opts;
		}
		pr_debug("Enabling build id in mmap2 events.\n");
		/* Enable mmap build id synthesizing. */
		symbol_conf.buildid_mmap2 = true;
		/* Enable perf_event_attr::build_id bit. */
		rec->opts.build_id = true;
		/* Disable build id cache. */
		rec->no_buildid = true;
	}

2021-05-27 18:28:35 +00:00
	if (rec->opts.record_cgroup && !perf_can_record_cgroup()) {
		pr_err("Kernel has no cgroup sampling support.\n");
		err = -EINVAL;
		goto out_opts;
	}

2022-06-10 11:33:12 +00:00
	if (rec->opts.kcore)
		rec->opts.text_poke = true;

2022-01-17 18:34:28 +00:00
	if (rec->opts.kcore || record__threads_enabled(rec))
perf record: Put a copy of kcore into the perf.data directory
Add a new 'perf record' option '--kcore' which will put a copy of
/proc/kcore, kallsyms and modules into a perf.data directory. Note, that
without the --kcore option, output goes to a file as previously. The
tools' -o and -i options work with either a file name or directory name.
Example:
$ sudo perf record --kcore uname
$ sudo tree perf.data
perf.data
├── kcore_dir
│ ├── kallsyms
│ ├── kcore
│ └── modules
└── data
$ sudo perf script -v
build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de
build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541
Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field.
Using CPUID GenuineIntel-6-8E-A
Using perf.data/kcore_dir/kcore for kernel data
Using perf.data/kcore_dir/kallsyms for symbols
perf 19058 506778.423729: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423733: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423734: 7 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux)
perf 19058 506778.423736: 117 cycles: ffffffffa2caa54a native_write_msr+0xa (vmlinux)
perf 19058 506778.423738: 2092 cycles: ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux)
perf 19058 506778.423740: 37380 cycles: ffffffffa2f121d0 perf_event_addr_filters_exec+0x0 (vmlinux)
uname 19058 506778.423751: 582673 cycles: ffffffffa303a407 propagate_protected_usage+0x147 (vmlinux)
uname 19058 506778.423892: 2241841 cycles: ffffffffa2cae0c9 unwind_next_frame.part.5+0x79 (vmlinux)
uname 19058 506778.424430: 2457397 cycles: ffffffffa3019232 check_memory_region+0x52 (vmlinux)
Committer testing:
# rm -rf perf.data*
# perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
# ls -l perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data
# perf record --kcore uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ]
ls[root@quaco ~]# ls -lad perf.data*
drwx------. 3 root root 4096 Oct 21 11:08 perf.data
-rw-------. 1 root root 34772 Oct 21 11:08 perf.data.old
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
# perf evlist -v -i perf.data/data
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/20191004083121.12182-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-04 08:31:21 +00:00
		rec->data.is_dir = true;
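With --kcore, the output path becomes a directory laid out as the `tree` listing in the commit shows. The sketch below only derives those paths from an output directory name. `struct kcore_layout` and `kcore_layout_init()` are hypothetical, and the fixed 256-byte buffers are an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: paths inside a --kcore style perf.data directory. */
struct kcore_layout {
	char data[256];		/* <out>/data */
	char kallsyms[256];	/* <out>/kcore_dir/kallsyms */
	char kcore[256];	/* <out>/kcore_dir/kcore */
	char modules[256];	/* <out>/kcore_dir/modules */
};

/* Returns 0 on success, -1 if the joined path would be truncated. */
static int fill_path(char *buf, size_t len, const char *dir, const char *rel)
{
	int n = snprintf(buf, len, "%s/%s", dir, rel);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

static int kcore_layout_init(struct kcore_layout *l, const char *out_dir)
{
	assert(l && out_dir && strlen(out_dir) > 0);
	if (fill_path(l->data, sizeof(l->data), out_dir, "data") ||
	    fill_path(l->kallsyms, sizeof(l->kallsyms), out_dir, "kcore_dir/kallsyms") ||
	    fill_path(l->kcore, sizeof(l->kcore), out_dir, "kcore_dir/kcore") ||
	    fill_path(l->modules, sizeof(l->modules), out_dir, "kcore_dir/modules"))
		return -1;
	return 0;
}
```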
2022-01-17 18:34:34 +00:00
	if (record__threads_enabled(rec)) {
		if (rec->opts.affinity != PERF_AFFINITY_SYS) {
			pr_err("--affinity option is mutually exclusive to parallel streaming mode.\n");
			goto out_opts;
		}
		if (record__aio_enabled(rec)) {
			pr_err("Asynchronous streaming mode (--aio) is mutually exclusive to parallel streaming mode.\n");
			goto out_opts;
		}
	}
2019-03-18 17:44:42 +00:00
	if (rec->opts.comp_level != 0) {
		pr_debug("Compression enabled, disabling build id collection at the end of the session.\n");
		rec->no_buildid = true;
	}

2015-07-21 09:44:04 +00:00
	if (rec->opts.record_switch_events &&
	    !perf_can_record_switch_events()) {
2015-10-24 15:49:27 +00:00
		ui__error("kernel does not support recording context switch events\n");
		parse_options_usage(record_usage, record_options, "switch-events", 0);
2020-09-02 10:57:07 +00:00
		err = -EINVAL;
		goto out_opts;
2015-07-21 09:44:04 +00:00
	}
2011-02-14 09:20:01 +00:00

2017-01-09 09:51:57 +00:00
	if (switch_output_setup(rec)) {
		parse_options_usage(record_usage, record_options, "switch-output", 0);
2020-09-02 10:57:07 +00:00
		err = -EINVAL;
		goto out_opts;
2017-01-09 09:51:57 +00:00
	}

2017-01-09 09:52:00 +00:00
	if (rec->switch_output.time) {
		signal(SIGALRM, alarm_sig_handler);
		alarm(rec->switch_output.time);
	}
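For a time-based --switch-output, the code above arms SIGALRM with alarm(). One common way to structure this is for the handler to do nothing but set a `sig_atomic_t` flag. The read loop then performs the actual file switch and re-arms the timer. The sketch assumes a single-threaded loop. All names in it are hypothetical, and perf's own `alarm_sig_handler()` may work differently.

```c
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Written from signal context; a sig_atomic_t store is async-signal-safe. */
static volatile sig_atomic_t switch_output_pending;

static void switch_alarm_handler(int sig)
{
	(void)sig;
	switch_output_pending = 1;
}

/* Arm rotation every 'secs' seconds; alarm(0) cancels any pending alarm. */
static void arm_switch_timer(unsigned int secs)
{
	signal(SIGALRM, switch_alarm_handler);
	alarm(secs);
}

/*
 * Polled from the (hypothetical) main read loop: returns 1 when the
 * output file should be switched, and re-arms the timer for the next period.
 */
static int poll_switch_output(unsigned int secs)
{
	if (!switch_output_pending)
		return 0;
	switch_output_pending = 0;
	alarm(secs);
	return 1;
}
```

Keeping file I/O out of the handler avoids calling non-async-signal-safe functions from signal context.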
2019-03-14 22:49:55 +00:00
	if (rec->switch_output.num_files) {
2024-01-06 09:41:29 +00:00
		rec->switch_output.filenames = calloc(rec->switch_output.num_files,
						      sizeof(char *));
2020-09-02 10:57:07 +00:00
		if (!rec->switch_output.filenames) {
			err = -EINVAL;
			goto out_opts;
		}
2019-03-14 22:49:55 +00:00
	}
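The `switch_output.filenames` array allocated above has one slot per retained output file (`switch_output.num_files`). Below is a minimal sketch of how such a bounded set can rotate, evicting the oldest name once every slot is in use. `struct rotation` and its helpers are hypothetical. They model only the bookkeeping, not the file switching itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bounded ring of output file names. */
struct rotation {
	char **filenames;	/* num_files slots, NULL while unused */
	int num_files;
	int cur;		/* slot of the most recently stored name */
};

static int rotation_init(struct rotation *r, int num_files)
{
	assert(num_files > 0);
	r->filenames = calloc(num_files, sizeof(char *));
	r->num_files = num_files;
	r->cur = -1;
	return r->filenames ? 0 : -1;
}

static char *copy_name(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

/*
 * Store a copy of 'name' in the next slot. Returns the evicted oldest
 * name (the caller would remove that file and free the string), or
 * NULL while free slots remain.
 */
static char *rotation_push(struct rotation *r, const char *name)
{
	int next = (r->cur + 1) % r->num_files;
	char *evicted = r->filenames[next];

	r->filenames[next] = copy_name(name);
	r->cur = next;
	return evicted;
}

static void rotation_free(struct rotation *r)
{
	for (int i = 0; i < r->num_files; i++)
		free(r->filenames[i]);
	free(r->filenames);
	r->filenames = NULL;
}
```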
2022-01-17 18:34:34 +00:00
	if (rec->timestamp_filename && record__threads_enabled(rec)) {
		rec->timestamp_filename = false;
		pr_warning("WARNING: --timestamp-filename option is not available in parallel streaming mode.\n");
	}

2024-07-03 22:30:34 +00:00
	if (rec->filter_action) {
		if (!strcmp(rec->filter_action, "pin"))
			err = perf_bpf_filter__pin();
		else if (!strcmp(rec->filter_action, "unpin"))
			err = perf_bpf_filter__unpin();
		else {
			pr_warning("Unknown BPF filter action: %s\n", rec->filter_action);
			err = -EINVAL;
		}
		goto out_opts;
	}

2016-09-23 14:38:39 +00:00
	/*
	 * Allow aliases to facilitate the lookup of symbols for address
	 * filters. Refer to auxtrace_parse_filters().
	 */
	symbol_conf.allow_aliases = true;

	symbol__init(NULL);

2018-03-06 09:13:12 +00:00
	err = record__auxtrace_init(rec);
2016-09-23 14:38:39 +00:00
	if (err)
		goto out;

2016-06-16 08:02:41 +00:00
	if (dry_run)
2016-09-23 14:38:37 +00:00
		goto out;
2016-06-16 08:02:41 +00:00

2015-04-09 15:53:45 +00:00
	err = -ENOMEM;

2016-04-20 18:59:52 +00:00
	if (rec->no_buildid_cache || rec->no_buildid) {
2010-06-17 09:39:01 +00:00
		disable_buildid_cache();
2017-01-09 09:51:58 +00:00
	} else if (rec->switch_output.enabled) {
2016-04-20 18:59:52 +00:00
		/*
		 * In 'perf record --switch-output', disable buildid
		 * generation by default to reduce data file switching
		 * overhead. Still generate buildid if they are required
		 * explicitly using
		 *
2017-01-03 08:19:56 +00:00
		 *  perf record --switch-output --no-no-buildid \
2016-04-20 18:59:52 +00:00
		 *              --no-no-buildid-cache
		 *
		 * Following code equals to:
		 *
		 * if ((rec->no_buildid || !rec->no_buildid_set) &&
		 *     (rec->no_buildid_cache || !rec->no_buildid_cache_set))
		 *         disable_buildid_cache();
		 */
		bool disable = true;

		if (rec->no_buildid_set && !rec->no_buildid)
			disable = false;
		if (rec->no_buildid_cache_set && !rec->no_buildid_cache)
			disable = false;
		if (disable) {
			rec->no_buildid = true;
			rec->no_buildid_cache = true;
			disable_buildid_cache();
		}
	}
2009-12-15 22:04:40 +00:00
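The comment in the --switch-output branch above claims that the if-chain is equivalent to a single boolean expression. By De Morgan's law, `!(set && !flag)` is `(flag || !set)`, so the two forms should agree. The sketch below checks all 16 input combinations exhaustively. The four parameters stand in for the `rec->no_buildid*` fields, and the functions are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* The if-chain used in the --switch-output branch. */
static bool disable_by_chain(bool no_buildid, bool no_buildid_set,
			     bool no_buildid_cache, bool no_buildid_cache_set)
{
	bool disable = true;

	if (no_buildid_set && !no_buildid)
		disable = false;
	if (no_buildid_cache_set && !no_buildid_cache)
		disable = false;
	return disable;
}

/* The closed form quoted in the source comment. */
static bool disable_by_formula(bool no_buildid, bool no_buildid_set,
			       bool no_buildid_cache, bool no_buildid_cache_set)
{
	return (no_buildid || !no_buildid_set) &&
	       (no_buildid_cache || !no_buildid_cache_set);
}

/* True if both forms agree on every one of the 16 possible inputs. */
static bool forms_agree(void)
{
	for (unsigned int m = 0; m < 16; m++) {
		bool a = m & 1, b = m & 2, c = m & 4, d = m & 8;

		if (disable_by_chain(a, b, c, d) != disable_by_formula(a, b, c, d))
			return false;
	}
	return true;
}
```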
perf record: Add --tail-synthesize option
When working with an overwritable ring buffer there's an inconvenient
problem: if perf dumps data a long period after it starts,
non-sample events may be lost, which leaves the following 'perf report' unable
to identify proc name and mmap layout. For example:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \
dd if=/dev/zero of=/dev/null
Send SIGUSR2 after dd runs long enough. The resulting perf.data loses the
correct comm and mmap events:
# perf script -i perf.data.2016061522374354
perf 24478 [004] 2581325.601789: raw_syscalls:sys_exit: NR 0 = 512
^^^^
Should be 'dd'
27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux)
203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux)
b18d83 return_from_SYSCALL_64+0xfe200000 (/lib/modules/4.6.0-rc3+/build/vmlinux)
7f47c417edf0 [unknown] ([unknown])
^^^^^^^^^^^^
Fail to unwind
This patch provides a '--tail-synthesize' option, allows perf to collect
system status when finalizing the output file. In the resulting output file, the
non-sample events reflect system status when dumping data.
After this patch:
# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
dd if=/dev/zero of=/dev/null
# perf script -i perf.data.2016061600544998
dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ...
^^
Correct comm
203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms])
203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms])
203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms])
b18d83 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
d8e50 __GI___libc_write+0xffff01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so)
^^^^^
Correct unwind
This option doesn't aim to solve this problem completely. If a process
terminates before SIGUSR2, we still lose its COMM and MMAP events. For
example, we can't unwind correctly from the final perf.data we get from
the previous example, because when perf collects the final output file
(when we press C-c), 'dd' has been terminated so its '/proc/<pid>/mmap'
becomes empty.
However, this is a cheaper choice. To completely solve this problem we
need to continuously output non-sample events. To satisfy the
requirement of daemonization, we need to merge them periodically. It is
possible but requires much more code and cycles.
Automatically select --tail-synthesize when --overwrite is provided.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-16-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-14 08:34:47 +00:00
	if (record.opts.overwrite)
		record.opts.tail_synthesize = true;
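The --tail-synthesize commit message depends on how an overwritable ring buffer behaves. Once the buffer wraps, the oldest records are gone, and the COMM/MMAP records written at startup are usually the oldest. Below is a toy model of that effect with record contents reduced to string tags. It is not perf's actual ring buffer format.

```c
#include <assert.h>
#include <string.h>

#define RING_SLOTS 4

/* Toy overwrite-mode ring: when full, the oldest record is overwritten. */
struct toy_ring {
	const char *slot[RING_SLOTS];
	unsigned long head;	/* total records ever written */
};

static void ring_write(struct toy_ring *r, const char *rec)
{
	r->slot[r->head % RING_SLOTS] = rec;
	r->head++;
}

/* Returns 1 if a record with this exact tag is still in the ring. */
static int ring_contains(const struct toy_ring *r, const char *rec)
{
	unsigned long n = r->head < RING_SLOTS ? r->head : RING_SLOTS;

	for (unsigned long i = 0; i < n; i++)
		if (strcmp(r->slot[i], rec) == 0)
			return 1;
	return 0;
}
```

After a COMM and an MMAP record followed by enough samples to wrap the ring, only samples remain. This is the loss that synthesizing those records again at output time works around, and the code above selects it automatically whenever --overwrite is used.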
2021-04-27 07:01:26 +00:00
	if (rec->evlist->core.nr_entries == 0) {
2024-10-16 06:23:58 +00:00
		err = parse_event(rec->evlist, "cycles:P");
2023-05-27 07:21:49 +00:00
		if (err)
2021-04-27 07:01:26 +00:00
			goto out;
2009-06-11 21:11:50 +00:00
	}
2009-05-26 07:17:18 +00:00

2013-11-18 09:55:57 +00:00
	if (rec->opts.target.tid && !rec->opts.no_inherit_set)
		rec->opts.no_inherit = true;

2013-11-12 19:46:16 +00:00
	err = target__validate(&rec->opts.target);
2012-05-07 05:09:02 +00:00
	if (err) {
2013-11-12 19:46:16 +00:00
		target__strerror(&rec->opts.target, err, errbuf, BUFSIZ);
2018-02-06 18:17:58 +00:00
		ui__warning("%s\n", errbuf);
2012-05-07 05:09:02 +00:00
	}

2013-11-12 19:46:16 +00:00
	err = target__parse_uid(&rec->opts.target);
2012-05-07 05:09:02 +00:00
	if (err) {
		int saved_errno = errno;
2012-04-26 05:15:18 +00:00

2013-11-12 19:46:16 +00:00
		target__strerror(&rec->opts.target, err, errbuf, BUFSIZ);
2012-05-29 04:22:57 +00:00
		ui__error("%s", errbuf);
2012-05-07 05:09:02 +00:00

		err = -saved_errno;
2016-09-23 14:38:36 +00:00
		goto out;
2012-05-07 05:09:02 +00:00
	}
2012-01-19 16:08:15 +00:00
perf evsel: Enable ignore_missing_thread for pid option
While monitoring a multithread process with pid option, perf sometimes
may return sys_perf_event_open failure with 3(No such process) if any of
the process's threads die before we open the event. However, we want
perf continue monitoring the remaining threads and do not exit with
error.
Here, the patch enables perf_evsel::ignore_missing_thread for -p option
to ignore complete failure if any of threads die before we open the event.
But it may still return sys_perf_event_open failure with 22(Invalid) if we
monitors several event groups.
sys_perf_event_open: pid 28960 cpu 40 group_fd 118202 flags 0x8
sys_perf_event_open: pid 28961 cpu 40 group_fd 118203 flags 0x8
WARNING: Ignored open failure for pid 28962
sys_perf_event_open: pid 28962 cpu 40 group_fd [118203] flags 0x8
sys_perf_event_open failed, error -22
That is because when we ignore a missing thread, we change the thread_idx
without dealing with its fds, FD(evsel, cpu, thread). Then get_group_fd()
may return a wrong group_fd for the next thread and sys_perf_event_open()
return with 22.
sys_perf_event_open(){
...
if (group_fd != -1)
perf_fget_light()//to get corresponding group_leader by group_fd
...
if (group_leader)
if (group_leader->ctx->task != ctx->task)//should on the same task
goto err_context
...
}
This patch also fixes this bug by introducing perf_evsel__remove_fd() and
update_fds to allow removing fds for the missing thread.
Changes since v1:
- Change group_fd__remove() into a more generic way without changing code logic
- Remove redundant condition
Changes since v2:
- Use a proper function name and add some comment.
- Multiline comment style fixes.
Committer testing:
Before this patch the recently added 'perf stat --per-thread' for system
wide counting would race while enumerating all threads using /proc:
[root@jouet ~]# perf stat --per-thread
failed to parse CPUs map: No such file or directory
Usage: perf stat [<options>] [<command>]
-C, --cpu <cpu> list of cpus to monitor in system-wide
-a, --all-cpus system-wide collection from all CPUs
[root@jouet ~]# perf stat --per-thread
failed to parse CPUs map: No such file or directory
Usage: perf stat [<options>] [<command>]
-C, --cpu <cpu> list of cpus to monitor in system-wide
-a, --all-cpus system-wide collection from all CPUs
[root@jouet ~]#
When, say, the kernel was being built, so lots of shortlived threads,
after this patch this doesn't happen.
Signed-off-by: Mengting Zhang <zhangmengting@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Cheng Jian <cj.chengjian@huawei.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1513148513-6974-1-git-send-email-zhangmengting@huawei.com
[ Remove one use 'evlist' alias variable ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-12-13 07:01:53 +00:00
	/* Enable ignoring missing threads when -u/-p option is defined. */
	rec->opts.ignore_missing_thread = rec->opts.target.uid != UINT_MAX || rec->opts.target.pid;
2016-12-12 10:35:43 +00:00
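The ignore_missing_thread commit explains that skipping a vanished thread must also drop that thread's column of file descriptors. Otherwise, FD(evsel, cpu, thread) lookups for later threads, including the group_fd lookup, use a stale index. Below is a minimal sketch of that column removal on a toy fixed-size table. `struct fd_table` and `fd_table_remove_thread()` are hypothetical, and perf's own helper may be structured differently.

```c
#include <assert.h>

#define MAX_CPUS    4
#define MAX_THREADS 8

/* Toy per-event fd table indexed as fd[cpu][thread]. */
struct fd_table {
	int fd[MAX_CPUS][MAX_THREADS];
	int nr_cpus;
	int nr_threads;
};

/*
 * Remove thread_idx's column by shifting later threads left on every
 * CPU, so fd indexes stay aligned with a thread map that also dropped
 * that thread.
 */
static void fd_table_remove_thread(struct fd_table *t, int thread_idx)
{
	assert(thread_idx >= 0 && thread_idx < t->nr_threads);
	for (int cpu = 0; cpu < t->nr_cpus; cpu++)
		for (int thr = thread_idx; thr + 1 < t->nr_threads; thr++)
			t->fd[cpu][thr] = t->fd[cpu][thr + 1];
	t->nr_threads--;
}
```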
2023-05-27 07:21:47 +00:00
	evlist__warn_user_requested_cpus(rec->evlist, rec->opts.target.cpu_list);
perf tools: Enable on a list of CPUs for hybrid
The 'perf record' and 'perf stat' commands have supported the option
'-C/--cpus' to count or collect only on the list of CPUs provided. This
option needs to be supported for hybrid as well.
For hybrid support, it needs to check that the cpu list are available
on hybrid PMU. One example for AlderLake, cpu0-7 is 'cpu_core', cpu8-11
is 'cpu_atom'.
Before:
# perf stat -e cpu_core/cycles/ -C11 -- sleep 1
Performance counter stats for 'CPU(s) 11':
<not supported> cpu_core/cycles/
1.006179431 seconds time elapsed
The 'perf stat' command silently returned "<not supported>" without any
helpful information. It should error out pointing out that that cpu11
was not 'cpu_core'.
After:
# perf stat -e cpu_core/cycles/ -C11 -- sleep 1
WARNING: 11 isn't a 'cpu_core', please use a CPU list in the 'cpu_core' range (0-7)
failed to use cpu list 11
We also need to support the events without pmu prefix specified.
# perf stat -e cycles -C11 -- sleep 1
WARNING: 11 isn't a 'cpu_core', please use a CPU list in the 'cpu_core' range (0-7)
Performance counter stats for 'CPU(s) 11':
1,067,373 cpu_atom/cycles/
1.005544738 seconds time elapsed
The perf tool creates two cycles events automatically, cpu_core/cycles/ and
cpu_atom/cycles/. It checks that cpu11 is not 'cpu_core', then shows a warning
for cpu_core/cycles/ and only count the cpu_atom/cycles/.
If part of cpus are 'cpu_core' and part of cpus are 'cpu_atom', for example,
# perf stat -e cycles -C0,11 -- sleep 1
WARNING: use 0 in 'cpu_core' for 'cycles', skip other cpus in list.
WARNING: use 11 in 'cpu_atom' for 'cycles', skip other cpus in list.
Performance counter stats for 'CPU(s) 0,11':
1,914,704 cpu_core/cycles/
2,036,983 cpu_atom/cycles/
1.005815641 seconds time elapsed
It now automatically selects cpu0 for cpu_core/cycles/, selects cpu11 for
cpu_atom/cycles/, and output with some warnings.
Some more complex examples,
# perf stat -e cycles,instructions -C0,11 -- sleep 1
WARNING: use 0 in 'cpu_core' for 'cycles', skip other cpus in list.
WARNING: use 11 in 'cpu_atom' for 'cycles', skip other cpus in list.
WARNING: use 0 in 'cpu_core' for 'instructions', skip other cpus in list.
WARNING: use 11 in 'cpu_atom' for 'instructions', skip other cpus in list.
Performance counter stats for 'CPU(s) 0,11':
2,780,387 cpu_core/cycles/
1,583,432 cpu_atom/cycles/
3,957,277 cpu_core/instructions/
1,167,089 cpu_atom/instructions/
1.006005124 seconds time elapsed
# perf stat -e cycles,cpu_atom/instructions/ -C0,11 -- sleep 1
WARNING: use 0 in 'cpu_core' for 'cycles', skip other cpus in list.
WARNING: use 11 in 'cpu_atom' for 'cycles', skip other cpus in list.
WARNING: use 11 in 'cpu_atom' for 'cpu_atom/instructions/', skip other cpus in list.
Performance counter stats for 'CPU(s) 0,11':
3,290,301 cpu_core/cycles/
1,953,073 cpu_atom/cycles/
1,407,869 cpu_atom/instructions/
1.006260912 seconds time elapsed
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210723063433.7318-4-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-23 06:34:33 +00:00

2021-12-17 15:45:15 +00:00
	if (callchain_param.enabled && callchain_param.record_mode == CALLCHAIN_FP)
		arch__add_leaf_frame_record_opts(&rec->opts);

2012-05-07 05:09:02 +00:00
	err = -ENOMEM;
2022-08-12 11:40:49 +00:00
	if (evlist__create_maps(rec->evlist, &rec->opts.target) < 0) {
		if (rec->opts.target.pid != NULL) {
			pr_err("Couldn't create thread/CPU maps: %s\n",
				errno == ENOENT ? "No such process" : str_error_r(errno, errbuf, sizeof(errbuf)));
			goto out;
		}
		else
			usage_with_options(record_usage, record_options);
	}
2011-01-03 18:39:04 +00:00

2015-04-09 15:53:45 +00:00
	err = auxtrace_record__options(rec->itr, rec->evlist, &rec->opts);
	if (err)
2016-09-23 14:38:36 +00:00
		goto out;
2015-04-09 15:53:45 +00:00

2016-01-11 13:37:09 +00:00
	/*
	 * We take all buildids when the file contains
	 * AUX area tracing data because we do not decode the
	 * trace because it would take too long.
	 */
	if (rec->opts.full_auxtrace)
		rec->buildid_all = true;

2020-05-12 12:19:18 +00:00
	if (rec->opts.text_poke) {
		err = record__config_text_poke(rec->evlist);
		if (err) {
			pr_err("record__config_text_poke failed, error %d\n", err);
			goto out;
		}
	}

2022-05-18 22:47:21 +00:00
	if (rec->off_cpu) {
		err = record__config_off_cpu(rec);
		if (err) {
			pr_err("record__config_off_cpu failed, error %d\n", err);
			goto out;
		}
	}

2013-12-19 17:43:45 +00:00
	if (record_opts__config(&rec->opts)) {
2010-07-29 17:08:55 +00:00
		err = -EINVAL;
2016-09-23 14:38:36 +00:00
		goto out;
2009-10-12 05:56:03 +00:00
	}

2023-09-04 02:33:37 +00:00
	err = record__config_tracking_events(rec);
	if (err) {
		pr_err("record__config_tracking_events failed, error %d\n", err);
		goto out;
	}

2022-01-17 18:34:21 +00:00
	err = record__init_thread_masks(rec);
	if (err) {
		pr_err("Failed to initialize parallel data streaming masks\n");
		goto out;
	}

2018-11-06 09:07:19 +00:00
	if (rec->opts.nr_cblocks > nr_cblocks_max)
		rec->opts.nr_cblocks = nr_cblocks_max;
2019-03-18 17:43:35 +00:00
	pr_debug("nr_cblocks: %d\n", rec->opts.nr_cblocks);
2018-11-06 09:04:58 +00:00

2019-01-22 17:47:43 +00:00
	pr_debug("affinity: %s\n", affinity_tags[rec->opts.affinity]);
perf record: Implement --mmap-flush=<number> option
Implement a --mmap-flush option that specifies the minimal number of bytes
extracted from an mmaped kernel buffer to store into a trace. The default
value is 1 byte, which means that every time the trace writing thread
finds new data in the mmaped buffer, the data is extracted, possibly
compressed, and written to the trace.
$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc
The option is independent of the -z setting, does not vary with the
compression level, and serves two purposes.
The first purpose is to increase the compression ratio of the trace data.
Larger data chunks compress more effectively, so the option lets the user
specify the size of the data chunk to compress. Also, in some cases,
executing many write syscalls with small data sizes can take longer than
executing fewer write syscalls with larger data sizes, because of
per-syscall overhead. Extracting bigger data chunks, as specified by the
option value, can therefore also decrease runtime overhead.
The second purpose is to avoid a self-monitoring live-lock issue in
system-wide (-a) profiling mode. Profiling in system-wide mode with
compression (-a -z) can inject additional data into the kernel buffers
along with the data from the monitored processes. If the performance data
rate and volume from the monitored processes are high, then the trace
streaming and compression activity in the tool is also high. High tool
activity can lead to a subtle live-lock effect, where compressing a
single new byte from one mmaped kernel buffer generates the next single
byte in some mmaped buffer. The perf tool process then ends up endlessly
monitoring itself.
The implemented synch parameter is the means to force data to be moved
regardless of the specified flush threshold. Whatever flush value is
provided, the tool needs the ability to unconditionally drain the memory
buffers, at least at the end of the collection.
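The flush rule described above can be modelled as a single decision per ring buffer. This is a minimal sketch that assumes the number of readable bytes is already known. `mmap_should_push()` is a hypothetical helper, not perf's code. Data is written once at least `flush` bytes are available, and `synch` overrides the threshold so that any remaining data is drained. The parameter is spelled `synch` for the reason given further down in this message: a variable named `sync` shadows sync(2) from <unistd.h> when building with -Wshadow.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: decide whether 'avail' readable bytes in one mmap
 * ring buffer should be pushed to the output file.
 *   flush - the --mmap-flush threshold in bytes (default 1)
 *   synch - force a drain regardless of the threshold, e.g. at the end of
 *           the collection (named 'synch' to avoid shadowing sync(2))
 */
static bool mmap_should_push(size_t avail, size_t flush, bool synch)
{
	if (avail == 0)
		return false;		/* nothing to write */
	if (synch)
		return true;		/* unconditional drain */
	return avail >= flush;		/* wait until the chunk is big enough */
}
```

With the default threshold of 1, every non-empty buffer is pushed. With a threshold of 132096 bytes, a 120-byte buffer waits unless `synch` is set.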
Committer testing:
Running with the default value, i.e. consuming as soon as there is
something to read, we first write the synthesized events in small chunks
of about 128 bytes:
# perf trace -m 2048 --call-graph dwarf -e write -- perf record
<SNIP>
101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
Then we move on to reading the mmap buffers, consuming the events put
there by the kernel perf infrastructure:
107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
<SNIP same backtrace as in the 107.561 timestamp>
12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
<SNIP same backtrace as in the 107.561 timestamp>
12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
<SNIP same backtrace as in the 107.561 timestamp>
If we limit it to write only when more than 16MB are available for
reading, the threshold is throttled to a quarter of the mmap size set by
--mmap-pages for 'perf record', which by default amounts to 528384 bytes,
as shown by 'record -v':
mmap flush: 132096
mmap size 528384B
With that in place, all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers set up by the
kernel perf infrastructure, were at least 132096 bytes long.
Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760
The flush was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704B
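Both 'record -v' outputs are consistent with the flush threshold being capped at a quarter of the mmap size: 528384 / 4 = 132096, and 8392704 / 4 = 2098176. The hypothetical helper below reproduces only that arithmetic. It illustrates the observed limit and is not perf's implementation.

```c
#include <stddef.h>

/*
 * Hypothetical helper: cap a requested flush threshold at a quarter of
 * the mmap buffer size, as observed in the 'record -v' output.
 */
static size_t clamp_mmap_flush(size_t requested, size_t mmap_size)
{
	size_t max_flush = mmap_size / 4;

	return requested > max_flush ? max_flush : requested;
}
```

A 16M request (16777216 bytes) is capped to 132096 with the default 528384-byte buffer and to 2098176 with -m 2048. A 1K request passes through unchanged.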
A warning about that would be good to have but can be added later,
something like:
"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"
Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:
cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-18 17:40:26 +00:00
	pr_debug("mmap flush: %d\n", rec->opts.mmap_flush);
2019-01-22 17:47:43 +00:00

2019-03-18 17:42:19 +00:00
	if (rec->opts.comp_level > comp_level_max)
		rec->opts.comp_level = comp_level_max;
	pr_debug("comp level: %d\n", rec->opts.comp_level);

2011-11-25 10:19:45 +00:00
	err = __cmd_record(&record, argc, argv);
2016-09-23 14:38:36 +00:00
out:
2024-07-03 22:30:33 +00:00
	record__free_thread_masks(rec, rec->nr_threads);
	rec->nr_threads = 0;
2010-07-30 21:31:28 +00:00
	symbol__exit();
2015-04-09 15:53:45 +00:00
	auxtrace_record__free(rec->itr);
2020-09-02 10:57:07 +00:00
out_opts:
2020-09-03 12:29:37 +00:00
	evlist__close_control(rec->opts.ctl_fd, rec->opts.ctl_fd_ack, &rec->opts.ctl_fd_close);
2024-07-03 22:30:33 +00:00
	evlist__delete(rec->evlist);
2010-07-29 17:08:55 +00:00
	return err;
2009-05-26 07:17:18 +00:00
}
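The error handling in cmd_record() follows the usual kernel goto-unwinding idiom. Each failure sets `err` and jumps to a label, and the `out:` cleanup falls through into `out_opts:`, so the later label releases what was set up earliest. Below is a self-contained sketch of the same idiom. The function `run_with_cleanup()` is hypothetical, and its two malloc() buffers stand in for the evlist and the thread masks.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical example of goto-based unwinding. 'fail_step' simulates a
 * configuration step failing after both resources were acquired.
 */
static int run_with_cleanup(int fail_step)
{
	int err = -ENOMEM;
	char *opts, *masks;

	opts = malloc(64);		/* stands in for the evlist: freed at out_opts */
	if (!opts)
		return err;

	masks = malloc(64);		/* stands in for the thread masks: freed at out */
	if (!masks)
		goto out_opts;

	if (fail_step == 1) {
		err = -EINVAL;		/* e.g. an option check failed */
		goto out;
	}

	err = 0;			/* the main work would run here */
out:
	free(masks);			/* falls through to release earlier resources */
out_opts:
	free(opts);
	return err;
}
```

Whichever label is reached, only the resources acquired so far are released, and `err` carries the first failure back to the caller.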
2015-04-30 14:37:32 +00:00

static void snapshot_sig_handler(int sig __maybe_unused)
{
2017-01-09 09:51:58 +00:00
	struct record *rec = &record;

2020-09-01 09:37:57 +00:00
	hit_auxtrace_snapshot_trigger(rec);
2016-04-20 18:59:50 +00:00

2017-01-09 09:51:58 +00:00
	if (switch_output_signal(rec))
2016-04-20 18:59:50 +00:00
		trigger_hit(&switch_output_trigger);
2015-04-30 14:37:32 +00:00
}
2017-01-09 09:52:00 +00:00

static void alarm_sig_handler(int sig __maybe_unused)
{
	struct record *rec = &record;

	if (switch_output_time(rec))
		trigger_hit(&switch_output_trigger);
}
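snapshot_sig_handler() and alarm_sig_handler() do very little in signal context. They check a condition and hit a trigger, and the main loop acts on it later. The sketch below shows that deferral pattern in plain ISO C with a `volatile sig_atomic_t` flag. The names `switch_output_hit`, `alarm_handler()`, `take_switch_output()` and `install_alarm_handler()` are hypothetical. perf's own trigger helpers track more states than this single flag.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag set from signal context and consumed by the main loop. */
static volatile sig_atomic_t switch_output_hit;

static void alarm_handler(int sig)
{
	(void)sig;
	switch_output_hit = 1;		/* only set a flag: async-signal-safe */
}

/*
 * Poll from the main loop: returns true once per observed hit. A signal
 * that lands between the test and the reset is coalesced with this one.
 */
static bool take_switch_output(void)
{
	if (!switch_output_hit)
		return false;
	switch_output_hit = 0;
	return true;
}

static int install_alarm_handler(void)
{
	return signal(SIGALRM, alarm_handler) == SIG_ERR ? -1 : 0;
}
```

Keeping the handler to a single flag store avoids calling non-async-signal-safe functions (such as file output) from signal context. The actual output switch then happens in normal program flow.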