    perf stat: Support per-cluster aggregation · cbc917a1
    Yicong Yang authored
    Some platforms have a 'cluster' topology level, and CPUs in the same
    cluster share resources such as the L3 cache tag (on HiSilicon Kunpeng
    SoCs) or the L2 cache (on Intel Jacobsville). Parsing and building the
    cluster topology has been supported since [1].
    
    perf stat already supports aggregation at other topology levels, such
    as die and socket. Aggregating per cluster is useful for finding
    problems like L3T bandwidth contention.
    
    This patch adds a "--per-cluster" option for per-cluster aggregation
    and updates the documentation and the related test. The output looks
    like:
    
    [root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
    
     Performance counter stats for 'system wide':
    
    S56-D0-CLS158     4      1,321,521,570      LLC-load
    S56-D0-CLS594     4        794,211,453      LLC-load
    S56-D0-CLS1030    4             41,623      LLC-load
    S56-D0-CLS1466    4             41,646      LLC-load
    S56-D0-CLS1902    4             16,863      LLC-load
    S56-D0-CLS2338    4             15,721      LLC-load
    S56-D0-CLS2774    4             22,671      LLC-load
    [...]
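
    As a rough illustration of what per-cluster aggregation does, the
    sketch below groups per-CPU counts by a (socket, die, cluster) key,
    sums them, and records how many CPUs fell into each group; these map
    to the label, CPU-count and value columns above. It is not perf's
    implementation: the structs and the functions aggregate_per_cluster()
    and format_cluster_label() are hypothetical, and mapping a missing
    cluster ID (assumed here to be reported as -1) to cluster 0 is an
    assumption modelled on the legacy-system output below.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical per-CPU reading: topology IDs plus the counter value. */
struct cpu_count {
	int socket;
	int die;
	int cluster;	/* assumption: -1 means "no cluster information" */
	unsigned long long value;
};

/* One aggregated row: topology key, number of CPUs and summed count. */
struct aggr_group {
	int socket;
	int die;
	int cluster;
	int nr_cpus;
	unsigned long long total;
};

/*
 * Group CPUs by (socket, die, cluster) and sum their counts, keeping
 * groups in order of first appearance. Returns the number of groups
 * written to out; CPUs that would need a group beyond max_out are skipped.
 */
static int aggregate_per_cluster(const struct cpu_count *cpus, int nr,
				 struct aggr_group *out, int max_out)
{
	int nr_groups = 0;

	for (int i = 0; i < nr; i++) {
		/* Assumption: a missing cluster ID falls back to cluster 0. */
		int cluster = cpus[i].cluster < 0 ? 0 : cpus[i].cluster;
		int g;

		/* Linear search for an existing group with the same key. */
		for (g = 0; g < nr_groups; g++) {
			if (out[g].socket == cpus[i].socket &&
			    out[g].die == cpus[i].die &&
			    out[g].cluster == cluster)
				break;
		}
		if (g == nr_groups) {
			if (nr_groups == max_out)
				continue;
			out[g] = (struct aggr_group){ cpus[i].socket, cpus[i].die,
						      cluster, 0, 0 };
			nr_groups++;
		}
		out[g].nr_cpus++;
		out[g].total += cpus[i].value;
	}
	return nr_groups;
}

/* Build an "S<socket>-D<die>-CLS<cluster>" label; returns snprintf's result. */
static int format_cluster_label(char *buf, size_t len,
				const struct aggr_group *grp)
{
	return snprintf(buf, len, "S%d-D%d-CLS%d", grp->socket, grp->die,
			grp->cluster);
}
```

    The linear search keeps the sketch short; a tool handling hundreds of
    CPUs would more likely sort or hash by the key.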
    
    On a legacy system without clusters or without cluster support, the
    output looks like:
    [root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1
    
     Performance counter stats for 'system wide':
    
    S56-D0-CLS0     64         18,011,485      cycles
    S7182-D0-CLS0   64         16,548,835      cycles
    
    Note that this patch does not add cluster information to the
    --per-core output, to avoid breaking tools or scripts that rely on it.
    
    Note that perf recently gained "--per-cache" aggregation, but it is not
    the same as per-cluster aggregation, even though CPUs in a cluster may
    share some cache resources. For example, on my machine all clusters
    within a die share the same L3 cache:
    $ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
    0-31
    $ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
    0-3
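
    The two sysfs files above use the kernel's cpulist format
    (comma-separated CPU numbers and ranges such as "0-3,8-11"). The
    sketch below parses such strings into bitmasks and checks that the
    cluster CPU set from the example (0-3) is a strict subset of the L3
    sharing set (0-31), which is why per-cache and per-cluster aggregation
    group CPUs differently on this machine. The helper names
    parse_cpulist(), count_cpus() and is_strict_subset() are hypothetical,
    and the sketch assumes fewer than 64 CPUs so that a single uint64_t
    can hold the mask.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a cpulist string such as "0-31" or "0-3,8-11" (the format used
 * by shared_cpu_list and cluster_cpus_list) into a 64-bit CPU mask.
 * Assumption for this sketch: all CPU numbers are below 64.
 * Returns false on malformed input or out-of-range CPU numbers.
 */
static bool parse_cpulist(const char *s, uint64_t *mask)
{
	*mask = 0;
	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10), hi;

		if (end == s)
			return false;
		hi = lo;
		if (*end == '-') {		/* a range "lo-hi" */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s)
				return false;
		}
		if (lo < 0 || hi < lo || hi > 63)
			return false;
		for (long cpu = lo; cpu <= hi; cpu++)
			*mask |= UINT64_C(1) << cpu;
		s = end;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return false;
	}
	return true;
}

/* Count the CPUs set in a mask by clearing the lowest set bit each time. */
static int count_cpus(uint64_t mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* True if every CPU in inner is also in outer and the sets differ. */
static bool is_strict_subset(uint64_t inner, uint64_t outer)
{
	return (inner & ~outer) == 0 && inner != outer;
}
```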
    
    [1] commit c5e22fef ("topology: Represent clusters of CPUs within a die")
    Tested-by: Jie Zhan <zhanjie9@hisilicon.com>
    Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
    Reviewed-by: Ian Rogers <irogers@google.com>
    Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
    Cc: james.clark@arm.com
    Cc: 21cnbao@gmail.com
    Cc: prime.zeng@hisilicon.com
    Cc: Jonathan.Cameron@huawei.com
    Cc: fanghao11@huawei.com
    Cc: linuxarm@huawei.com
    Cc: tim.c.chen@intel.com
    Cc: linux-arm-kernel@lists.infradead.org
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Link: https://lore.kernel.org/r/20240208024026.2691-1-yangyicong@huawei.com