Namhyung Kim 13159a139d perf mem: Update documentation for new options
Add a common options section and move some items to the section.  Also
add description of new options to report options.

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20240802180913.1023886-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05 11:40:20 -03:00

perf-mem(1)
===========
NAME
----
perf-mem - Profile memory accesses
SYNOPSIS
--------
[verse]
'perf mem' [<options>] (record [<command>] | report)
DESCRIPTION
-----------
"perf mem record" runs a command and gathers memory operation data
from it, into perf.data. Perf record options are accepted and are passed through.
"perf mem report" displays the result. It invokes perf report with the
right set of options to display a memory access profile. By default, loads
and stores are sampled. Use the -t option to limit to loads or stores.
Note that on Intel systems the memory latency reported is the use-latency,
not the pure load (or store) latency. Use-latency includes any pipeline
queuing delays in addition to the memory subsystem latency.
On Arm64 this uses SPE to sample load and store operations, therefore hardware
and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
Due to the statistical nature of SPE sampling, not every memory operation will
be sampled.
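The record-then-report workflow described above can be sketched as a small Python helper. This is only an illustration: the function names `record_argv` and `report_argv` are hypothetical, the helper assembles command lines without running perf, and placing `-t` before the subcommand is an assumption based on the SYNOPSIS.

```python
import shlex

VALID_TYPES = ("load", "store")  # values documented for -t/--type


def _common(types):
    """Return the common -t option, or nothing to keep the default (load,store)."""
    if types is None:
        return []
    unknown = [t for t in types if t not in VALID_TYPES]
    if unknown:
        raise ValueError(f"unknown memory operation type(s): {unknown}")
    return ["-t", ",".join(types)]


def record_argv(command, types=None):
    """Command line for 'perf mem record <command>' (hypothetical helper)."""
    return ["perf", "mem", *_common(types), "record", *command]


def report_argv(types=None):
    """Command line for 'perf mem report' (hypothetical helper)."""
    return ["perf", "mem", *_common(types), "report"]


if __name__ == "__main__":
    # Only print the commands; pass the lists to subprocess.run() to execute them.
    print(shlex.join(record_argv(["sleep", "1"], types=["load"])))
    print(shlex.join(report_argv(types=["load"])))
```

Building the command as a list rather than a single string avoids shell quoting problems when the profiled command has arguments of its own.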
COMMON OPTIONS
--------------
-f::
--force::
Don't do ownership validation
-t::
--type=<type>::
Select the memory operation type: load or store (default: load,store)
-v::
--verbose::
Be more verbose (show counter open errors, etc)
-p::
--phys-data::
Record/Report sample physical addresses
--data-page-size::
Record/Report sample data address page size
RECORD OPTIONS
--------------
<command>...::
Any command you can specify in a shell.
-e::
--event <event>::
Event selector. Use 'perf mem record -e list' to list available events.
-K::
--all-kernel::
Configure all used events to run in kernel space.
-U::
--all-user::
Configure all used events to run in user space.
--ldlat <n>::
Specify the desired latency for load events. Supported on Intel and Arm64
processors only; ignored on other architectures.
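As a rough sketch of how the record options combine, the Python helper below places `-e` and `--ldlat` after the `record` subcommand, as the option list above suggests. The name `record_with_ldlat` is hypothetical, the positive-integer check is an assumption rather than documented behaviour, and nothing here executes perf.

```python
# Documented way to list the available events.
LIST_EVENTS_ARGV = ["perf", "mem", "record", "-e", "list"]


def record_with_ldlat(command, ldlat, event=None):
    """Build 'perf mem record [-e <event>] --ldlat <n> <command>' as an argv list."""
    # Assumption: a latency value only makes sense as a positive integer.
    if not isinstance(ldlat, int) or ldlat <= 0:
        raise ValueError("ldlat must be a positive integer")
    argv = ["perf", "mem", "record"]
    if event is not None:
        # The event name must come from the '-e list' output; none is assumed here.
        argv += ["-e", event]
    argv += ["--ldlat", str(ldlat)]
    return argv + list(command)


if __name__ == "__main__":
    # The value 50 is arbitrary and used only for illustration.
    print(record_with_ldlat(["sleep", "1"], 50))
```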
REPORT OPTIONS
--------------
-i::
--input=<file>::
Input file name.
-C::
--cpu=<cpu>::
Monitor only on the list of CPUs provided. Multiple CPUs can be provided as a
comma-separated list with no spaces: 0,1. Ranges of CPUs are specified with -
like 0-2. Default is to monitor all CPUs.
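The CPU list syntax above (comma-separated entries, ranges written with `-`, no spaces) can be illustrated with a short Python sketch. The function `expand_cpu_list` is hypothetical and is not part of perf; it only mirrors the syntax as described here and may be stricter or looser than the real parser.

```python
import re

_ENTRY = re.compile(r"([0-9]+)(?:-([0-9]+))?")


def expand_cpu_list(spec):
    """Expand a CPU list such as '0,1' or '0-2' into a sorted list of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        match = _ENTRY.fullmatch(part)
        if match is None:
            # Covers empty entries, embedded spaces and malformed ranges.
            raise ValueError(f"invalid CPU list entry: {part!r}")
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) is not None else first
        if first > last:
            raise ValueError(f"descending CPU range: {part!r}")
        cpus.update(range(first, last + 1))
    return sorted(cpus)


if __name__ == "__main__":
    print(expand_cpu_list("0,1"))
    print(expand_cpu_list("0-2"))
```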
-D::
--dump-raw-samples::
Dump the raw decoded samples on the screen in a format that is easy to parse with
one sample per line.
-s::
--sort=<key>::
Group results by the given key(s). Multiple keys can be specified
in CSV format. The keys specific to memory samples are:
symbol_daddr, symbol_iaddr, dso_daddr, locked, tlb, mem, snoop,
dcacheline, phys_daddr, data_page_size, blocked.
- symbol_daddr: name of data symbol being accessed at the time of sample
- symbol_iaddr: name of code symbol being executed at the time of sample
- dso_daddr: name of library or module containing the data being accessed
at the time of the sample
- locked: whether the bus was locked at the time of the sample
- tlb: type of tlb access for the data at the time of the sample
- mem: type of memory access for the data at the time of the sample
- snoop: type of snoop (if any) for the data at the time of the sample
- dcacheline: the cacheline the data address is on at the time of the sample
- phys_daddr: physical address of data being accessed at the time of sample
- data_page_size: the page size of the data being accessed at the time of sample
- blocked: reason for the blocked load access for the data at the time of the sample
The default sort keys are changed to local_weight, mem, sym, dso,
symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.
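To make the CSV form of the sort keys concrete, the Python sketch below joins keys into a `--sort=` argument and picks out which keys are memory-specific according to the list above. The names `sort_option` and `memory_specific` are hypothetical; the key sets are copied from this page and the helper does not query perf for the keys it actually accepts.

```python
# Keys listed above as specific to memory samples.
MEM_SORT_KEYS = frozenset({
    "symbol_daddr", "symbol_iaddr", "dso_daddr", "locked", "tlb", "mem",
    "snoop", "dcacheline", "phys_daddr", "data_page_size", "blocked",
})

# Default sort keys of 'perf mem report' as stated above.
DEFAULT_SORT_KEYS = [
    "local_weight", "mem", "sym", "dso", "symbol_daddr", "dso_daddr",
    "snoop", "tlb", "locked", "blocked", "local_ins_lat",
]


def sort_option(keys):
    """Join keys into the CSV form taken by -s/--sort."""
    keys = list(keys)
    if not keys or any(not k or "," in k or " " in k for k in keys):
        raise ValueError("each sort key must be a non-empty word")
    return "--sort=" + ",".join(keys)


def memory_specific(keys):
    """Return the keys that appear in the memory-specific list, in order."""
    return [k for k in keys if k in MEM_SORT_KEYS]


if __name__ == "__main__":
    print(["perf", "mem", "report", sort_option(["mem", "snoop"])])
    print(memory_specific(DEFAULT_SORT_KEYS))
```

Keys such as sym and dso appear in the defaults but not in the memory-specific list, which is why the two sets are kept separate in this sketch.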
-T::
--type-profile::
Show data-type profile result instead of code symbols. This requires
the debug information and it will change the default sort keys to:
mem, snoop, tlb, type.
-U::
--hide-unresolved::
Only display entries resolved to a symbol.
-x::
--field-separator=<separator>::
Specify the field separator used when dumping raw samples (-D option). By default,
the separator is the space character.
In addition, all perf report options are valid for report, and all perf record
options are valid for record.
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]