Commit 169f18fd authored by Weilin Wang, committed by Arnaldo Carvalho de Melo

perf Document: Add TPEBS (Timed PEBS (Precise Event-Based Sampling)) to Documents

TPEBS (Timed PEBS, where PEBS stands for Precise Event-Based Sampling) is a
new Intel PMU feature available starting with the Granite Rapids
microarchitecture.

It will be used in new TMA (Top-Down Microarchitecture Analysis)
releases.

Add an introduction to the documentation alongside the new code that
supports it in 'perf stat'.

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-8-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
parent d546e3ac
@@ -72,6 +72,7 @@ counted. The following modifiers exist:
W - group is weak and will fall back to non-group if not schedulable,
e - group or event are exclusive and do not share the PMU
b - use BPF aggregation (see perf stat --bpf-counters)
R - retire latency value of the event
The 'p' modifier can be used for specifying how precise the instruction
address should be. The 'p' modifier can be specified multiple times:
......
@@ -325,6 +325,36 @@ other four level 2 metrics by subtracting corresponding metrics as below.
Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
Core_Bound = Backend_Bound - Memory_Bound
TPEBS in TopDown
================
TPEBS (Timed PEBS) is one of the new Intel PMU features available starting with
the Granite Rapids microarchitecture. It adds a 16-bit retire_latency field to
the Basic Info group of the PEBS record. This field records the number of core
cycles between the retirement of the previous instruction and the retirement of
the current instruction. See Section 8.4.1 of the "Intel® Architecture
Instruction Set Extensions Programming Reference" for more details about this
feature. Because this feature extends the PEBS record, the retire_latency value
is only available when sampling with the weight option (-W):

  perf record -e event_name -W ...
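
To make the definition above concrete, the following C sketch models retire
latency from a hypothetical sequence of per-instruction retirement timestamps
measured in core cycles. The function name retire_latencies() and its inputs
are illustrative assumptions: they are not part of perf or of the PEBS record
format, and the hardware only reports the latency of the sampled instruction.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative model of the retire_latency definition, not of the PEBS
 * record format (hypothetical helper): retire_cycle[i] is the core-cycle
 * count at which the i-th instruction retired. out[i - 1] receives the
 * number of core cycles between the retirement of instruction i - 1 and
 * instruction i, so out must have room for n - 1 entries. How the
 * hardware handles values that do not fit its 16-bit field is not
 * modeled here.
 */
void retire_latencies(const uint64_t *retire_cycle, size_t n, uint64_t *out)
{
	/* The first instruction has no predecessor, so start at i = 1. */
	for (size_t i = 1; i < n; i++)
		out[i - 1] = retire_cycle[i] - retire_cycle[i - 1];
}
```

For retirement timestamps {100, 103, 103, 110}, the model yields latencies 3,
0 and 7. A latency of 0 means two instructions retired in the same cycle.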

Starting with the most recent TMA release, some metric formulas use event
retire_latency values on processors that support the TPEBS feature. On earlier
generations without TPEBS, these values are static and predefined per processor
family by the hardware architects. Because workloads vary widely across
execution environments, retire_latency values measured at run time are more
accurate than the predefined ones. New TMA metrics that use TPEBS therefore
provide more accurate performance analysis results.
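
The effect of replacing a static latency with a measured one can be
illustrated with a hypothetical metric of the form
event_count * latency / core_cycles. This formula shape and the function name
cycle_fraction() are assumptions made for illustration; they are not copied
from any actual TMA metric definition.

```c
#include <stdint.h>

/*
 * Hypothetical metric shape, for illustration only: the fraction of core
 * cycles attributed to an event is estimated as
 *     event_count * latency_cycles / core_cycles
 * Without TPEBS, latency_cycles would be a static, per-processor-family
 * constant; with TPEBS it can come from retire_latency values measured
 * on the running workload.
 */
double cycle_fraction(uint64_t event_count, double latency_cycles,
		      uint64_t core_cycles)
{
	if (core_cycles == 0)
		return 0.0; /* avoid dividing by zero on an empty measurement */
	return (double)event_count * latency_cycles / (double)core_cycles;
}
```

With made-up numbers (1,000,000 events over 100,000,000 core cycles), a static
latency of 20 cycles gives 0.20, while a measured latency of 35 cycles gives
0.35. The same event count can therefore lead to a noticeably different
conclusion depending on which latency is used.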

To support TPEBS in TMA metrics, a new event modifier, :R, is added. Perf
captures the retire_latency value of each required event (an event with :R in
the metric formula) with perf record and uses that value in the metric
calculation. Currently, this feature is supported through perf stat:

  perf stat -M metric_name --record-tpebs ...

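
The flow described above (find the events marked :R in a metric formula,
sample their retire_latency with perf record, then reduce the samples to one
value for the formula) can be sketched in C as follows. Both function names
are hypothetical and do not exist in perf. Using the arithmetic mean as the
reduction is also an assumption of this sketch, not a statement about how perf
computes the value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if an event reference taken from a
 * metric formula, such as "MEM_INST_RETIRED.STLB_MISS_LOADS:R", carries
 * the R modifier. Only the text after the last ':' is inspected; real
 * perf event parsing is considerably richer than this.
 */
bool has_retire_latency_modifier(const char *event)
{
	const char *mods = strrchr(event, ':');

	return mods != NULL && strchr(mods + 1, 'R') != NULL;
}

/*
 * Hypothetical helper: reduce the 16-bit retire_latency samples recorded
 * for one event to a single value for the metric formula. The arithmetic
 * mean is an assumption of this sketch.
 */
double mean_retire_latency(const uint16_t *samples, size_t n)
{
	uint64_t sum = 0;

	if (n == 0)
		return 0.0; /* no samples: the caller decides how to handle this */
	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	return (double)sum / (double)n;
}
```

Accumulating into a 64-bit sum keeps the total from overflowing even when
many 16-bit samples are collected.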
[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
[2] https://sites.google.com/site/analysismethods/yasin-pubs
......