Commit 3071f13d authored by Agustin Vega-Frias's avatar Agustin Vega-Frias Committed by Will Deacon

perf: qcom: Add L3 cache PMU driver

This adds a new dynamic PMU to the Perf Events framework to program
and control the L3 cache PMUs in some Qualcomm Technologies SOCs.

The driver supports a distributed cache architecture where the overall
cache for a socket is comprised of multiple slices each with its own PMU.
Access to each individual PMU is provided even though all CPUs share all
the slices. User space needs to aggregate to individual counts to provide
a global picture.

The driver exports formatting and event information to sysfs so it can
be used by the perf user space tools with the syntaxes:
   perf stat -a -e l3cache_0_0/read-miss/
   perf stat -a -e l3cache_0_0/event=0x21/
Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
Signed-off-by: default avatarAgustin Vega-Frias <agustinv@codeaurora.org>
[will: fixed sparse issues]
Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
parent c09adab0
Qualcomm Datacenter Technologies L3 Cache Performance Monitoring Unit (PMU)
===========================================================================
This driver supports the L3 cache PMUs found in Qualcomm Datacenter Technologies
Centriq SoCs. The L3 cache on these SOCs is composed of multiple slices, shared
by all cores within a socket. Each slice is exposed as a separate uncore perf
PMU with device name l3cache_<socket>_<instance>. User space is responsible
for aggregating across slices.
The driver provides a description of its available events and configuration
options in sysfs, see /sys/devices/l3cache*. Given that these are uncore PMUs
the driver also exposes a "cpumask" sysfs attribute which contains a mask
consisting of one CPU per socket which will be used to handle all the PMU
events on that socket.
The hardware implements 32bit event counters and has a flat 8bit event space
exposed via the "event" format attribute. In addition to the 32bit physical
counters the driver supports virtual 64bit hardware counters by using hardware
counter chaining. This feature is exposed via the "lc" (long counter) format
flag. E.g.:
perf stat -e l3cache_0_0/read-miss,lc/
Given that these are uncore PMUs the driver does not support sampling, therefore
"perf record" will not work. Per-task perf sessions are not supported.
...@@ -21,6 +21,16 @@ config QCOM_L2_PMU ...@@ -21,6 +21,16 @@ config QCOM_L2_PMU
Adds the L2 cache PMU into the perf events subsystem for Adds the L2 cache PMU into the perf events subsystem for
monitoring L2 cache events. monitoring L2 cache events.
config QCOM_L3_PMU
bool "Qualcomm Technologies L3-cache PMU"
depends on ARCH_QCOM && ARM64 && PERF_EVENTS && ACPI
select QCOM_IRQ_COMBINER
help
Provides support for the L3 cache performance monitor unit (PMU)
in Qualcomm Technologies processors.
Adds the L3 cache PMU into the perf events subsystem for
monitoring L3 cache events.
config XGENE_PMU config XGENE_PMU
depends on PERF_EVENTS && ARCH_XGENE depends on PERF_EVENTS && ARCH_XGENE
bool "APM X-Gene SoC PMU" bool "APM X-Gene SoC PMU"
......
obj-$(CONFIG_ARM_PMU) += arm_pmu.o obj-$(CONFIG_ARM_PMU) += arm_pmu.o
obj-$(CONFIG_QCOM_L2_PMU) += qcom_l2_pmu.o obj-$(CONFIG_QCOM_L2_PMU) += qcom_l2_pmu.o
obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
This diff is collapsed.
...@@ -137,6 +137,7 @@ enum cpuhp_state { ...@@ -137,6 +137,7 @@ enum cpuhp_state {
CPUHP_AP_PERF_ARM_CCN_ONLINE, CPUHP_AP_PERF_ARM_CCN_ONLINE,
CPUHP_AP_PERF_ARM_L2X0_ONLINE, CPUHP_AP_PERF_ARM_L2X0_ONLINE,
CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE, CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE,
CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE,
CPUHP_AP_WORKQUEUE_ONLINE, CPUHP_AP_WORKQUEUE_ONLINE,
CPUHP_AP_RCUTREE_ONLINE, CPUHP_AP_RCUTREE_ONLINE,
CPUHP_AP_ONLINE_DYN, CPUHP_AP_ONLINE_DYN,
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment