perf Document: Add TPEBS (Timed PEBS(Precise Event-Based Sampling)) to Documents

TPEBS (Timed PEBS(Precise Event-Based Sampling)) is a new feature Intel
PMU from Granite Rapids microarchitecture.

It will be used in new TMA (Top-Down Microarchitecture Analysis)
releases.

Add related introduction to documents while adding new code to support
it in 'perf stat'.

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-8-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This commit is contained in:
Weilin Wang 2024-07-20 02:21:00 -04:00 committed by Arnaldo Carvalho de Melo
parent d546e3acf3
commit 169f18fd98
2 changed files with 31 additions and 0 deletions

View File

@ -72,6 +72,7 @@ counted. The following modifiers exist:
W - group is weak and will fallback to non-group if not schedulable, W - group is weak and will fallback to non-group if not schedulable,
e - group or event are exclusive and do not share the PMU e - group or event are exclusive and do not share the PMU
b - use BPF aggregration (see perf stat --bpf-counters) b - use BPF aggregration (see perf stat --bpf-counters)
R - retire latency value of the event
The 'p' modifier can be used for specifying how precise the instruction The 'p' modifier can be used for specifying how precise the instruction
address should be. The 'p' modifier can be specified multiple times: address should be. The 'p' modifier can be specified multiple times:

View File

@ -325,6 +325,36 @@ other four level 2 metrics by subtracting corresponding metrics as below.
Fetch_Bandwidth = Frontend_Bound - Fetch_Latency Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
Core_Bound = Backend_Bound - Memory_Bound Core_Bound = Backend_Bound - Memory_Bound
TPEBS in TopDown
================
TPEBS (Timed PEBS) is one of the new Intel PMU features provided since Granite
Rapids microarchitecture. The TPEBS feature adds a 16 bit retire_latency field
in the Basic Info group of the PEBS record. It records the Core cycles since the
retirement of the previous instruction to the retirement of current instruction.
Please refer to Section 8.4.1 of "Intel® Architecture Instruction Set Extensions
Programming Reference" for more details about this feature. Because this feature
extends PEBS record, sampling with weight option is required to get the
retire_latency value.
perf record -e event_name -W ...
In the most recent release of TMA, the metrics begin to use event retire_latency
values in some of the metrics formulas on processors that support TPEBS feature.
For previous generations that do not support TPEBS, the values are static and
predefined per processor family by the hardware architects. Due to the diversity
of workloads in execution environments, retire_latency values measured at real
time are more accurate. Therefore, new TMA metrics that use TPEBS will provide
more accurate performance analysis results.
To support TPEBS in TMA metrics, a new modifier :R on event is added. Perf would
capture retire_latency value of required events(event with :R in metric formula)
with perf record. The retire_latency value would be used in metric calculation.
Currently, this feature is supported through perf stat
perf stat -M metric_name --record-tpebs ...
[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win [1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
[2] https://sites.google.com/site/analysismethods/yasin-pubs [2] https://sites.google.com/site/analysismethods/yasin-pubs