Commit 86569c0a authored by James Clark, committed by Arnaldo Carvalho de Melo

perf mem/c2c: Document that SPE is used for mem and c2c on ARM

Setup is non-trivial so also link to the full SPE docs.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-perf-users@vger.kernel.or
Link: https://lore.kernel.org/r/20230124145929.557891-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
parent 6bc75b4c
@@ -22,7 +22,11 @@ you to track down the cacheline contentions.
 On Intel, the tool is based on load latency and precise store facility events
 provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
 with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware
-limitations, perf c2c is not supported on Zen3 cpus).
+limitations, perf c2c is not supported on Zen3 cpus). On Arm64 it uses SPE to
+sample load and store operations, therefore hardware and kernel support is
+required. See linkperf:perf-arm-spe[1] for a setup guide. Due to the
+statistical nature of Arm SPE sampling, not every memory operation will be
+sampled.
 
 These events provide:
 - memory address of the access
@@ -333,4 +337,4 @@ Check Joe's blog on c2c tool for detailed use case explanation:
 
 SEE ALSO
 --------
-linkperf:perf-record[1], linkperf:perf-mem[1]
+linkperf:perf-record[1], linkperf:perf-mem[1], linkperf:perf-arm-spe[1]
@@ -23,6 +23,11 @@ Note that on Intel systems the memory latency reported is the use-latency,
 not the pure load (or store latency). Use latency includes any pipeline
 queueing delays in addition to the memory subsystem latency.
 
+On Arm64 this uses SPE to sample load and store operations, therefore hardware
+and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
+Due to the statistical nature of SPE sampling, not every memory operation will
+be sampled.
+
 OPTIONS
 -------
 <command>...::
@@ -93,4 +98,4 @@ all perf record options.
 
 SEE ALSO
 --------
-linkperf:perf-record[1], linkperf:perf-report[1]
+linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]
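
The added text says that perf mem and perf c2c need hardware and kernel SPE support on Arm64. A minimal sketch of a pre-flight check follows. It assumes the SPE PMU shows up under /sys/bus/event_source/devices with a name beginning with "arm_spe" (for example arm_spe_0). The helper name find_spe_pmus is hypothetical and is not part of perf. A missing entry only hints that setup is incomplete; perf-arm-spe(1) remains the authoritative guide.

```python
import os


def find_spe_pmus(sysfs_dir="/sys/bus/event_source/devices"):
    """Return sorted names of PMU entries that look like Arm SPE devices.

    Assumption: SPE PMUs are exposed as directory entries whose names
    start with "arm_spe". An unreadable or missing directory yields [].
    """
    try:
        entries = os.listdir(sysfs_dir)
    except OSError:
        return []
    return sorted(name for name in entries if name.startswith("arm_spe"))


if __name__ == "__main__":
    pmus = find_spe_pmus()
    if pmus:
        print("SPE PMU(s) found:", ", ".join(pmus))
    else:
        print("No arm_spe* PMU found; see perf-arm-spe(1) for setup.")
```

The directory is a parameter so the check can be pointed at a test fixture instead of the live sysfs tree.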