1. 21 Jun, 2023 2 commits
    • perf annotation: Switch lock from a mutex to a sharded_mutex · 2e9f9d4a
      Ian Rogers authored
      Remove the "struct mutex lock" variable from annotation that is
      allocated per symbol. This removes in the region of 40 bytes per
      symbol allocation. Use a sharded mutex where the number of shards is
      set to the number of CPUs. Assuming good hashing of the annotation
      (based on its pointer), contention requires more threads than CPUs,
      which is not currently the case in any perf command. Were contention
      to become an issue, it would be straightforward to increase the number
      of shards in the mutex.
      
      On my Debian/glibc based machine, this reduces the size of struct
      annotation from 136 bytes to 96 bytes, or nearly 30%.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Andres Freund <andres@anarazel.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Yuan Can <yuancan@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Link: https://lore.kernel.org/r/20230615040715.2064350-2-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf sharded_mutex: Introduce sharded_mutex · 0650b2b2
      Ian Rogers authored
      Per-object mutexes may come with a significant memory cost, while a
      global mutex can suffer from unnecessary contention. A sharded mutex
      is a compromise: objects are hashed, and the mutex selected by the
      object's hash is used. Contention can be controlled by the number of
      shards.
      
      v2. Use hashmap.h's hash_bits in case of contention from alignment of
          objects.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Andres Freund <andres@anarazel.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Yuan Can <yuancan@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Link: https://lore.kernel.org/r/20230615040715.2064350-1-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
  2. 20 Jun, 2023 5 commits
  3. 16 Jun, 2023 20 commits
    • perf pmus: Check if we can encode the PMU number in perf_event_attr.type · 82fe2e45
      Arnaldo Carvalho de Melo authored
      On some architectures we can't encode the PMU number in
      perf_event_attr.type, and thus can't just ask for the same event on
      multiple CPUs (and thus PMUs). That is what we want on hybrid systems,
      but we can't do it when the kernel doesn't understand that encoding,
      as is the case on ARM64's big.LITTLE.
      
      If that is the case, fall back to the previous behaviour until we find a
      better solution to have consistent output across architectures with
      hybrid CPU configurations.
      
      Co-developed-with: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf print-events: Export is_event_supported() · e2be0666
      Arnaldo Carvalho de Melo authored
      It will be used when checking whether we can encode the PMU number in
      perf_event_attr.type, part of the logic used on hybrid systems
      (multiple types of CPUs, such as Intel's Alder Lake, etc., or ARM's
      big.LITTLE).
      
      Co-developed-with: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test record+probe_libc_inet_pton.sh: Use "grep -F" instead of obsolescent "fgrep" · bb6b369c
      Tiezhu Yang authored
      There exists the following warning when executing 'perf test record+probe_libc_inet_pton.sh':
      
        fgrep: warning: fgrep is obsolescent; using grep -F
      
      This was tested on Fedora 38 with grep 3.8. Since recent versions of
      grep consider fgrep obsolescent, use "grep -F" instead of "fgrep" to
      silence the warning.
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: loongson-kernel@lists.loongnix.cn
      Link: https://lore.kernel.org/r/1686880567-30017-1-git-send-email-yangtiezhu@loongson.cn
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf mem: Scan all PMUs instead of just core ones · 5752c20f
      Ravi Bangoria authored
      Scanning only core PMUs is not sufficient on platforms like AMD, since
      perf mem on AMD uses the IBS OP PMU, which is independent of the core PMU.
      Scan all PMUs instead of just core PMUs. There should be negligible
      performance overhead because of scanning all PMUs, so we should be okay.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ali Saidi <alisaidi@amazon.com>
      Cc: Ananth Narayan <ananth.narayan@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Santosh Shukla <santosh.shukla@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230615051700.1833-4-ravi.bangoria@amd.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf mem amd: Fix perf_pmus__num_mem_pmus() · f0dc2082
      Ravi Bangoria authored
      perf mem/c2c on AMD internally uses the IBS OP PMU, not the core PMU.
      Also, AMD platforms do not have heterogeneous PMUs.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ali Saidi <alisaidi@amazon.com>
      Cc: Ananth Narayan <ananth.narayan@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Santosh Shukla <santosh.shukla@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230615051700.1833-3-ravi.bangoria@amd.com
      [ Added the improved comment for perf_pmus__num_mem_pmus() from the newer per-patch (not series) version, as b4 didn't pick it up ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmus: Describe semantics of 'core_pmus' and 'other_pmus' · cddfc5fb
      Ravi Bangoria authored
      The notions of 'core_pmus' and 'other_pmus' are independent of hardware
      core and uncore PMUs. For example, AMD IBS PMUs are present in each SMT
      thread, but they belong to 'other_pmus'. Add a comment describing what
      these lists contain and how they are treated.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ali Saidi <alisaidi@amazon.com>
      Cc: Ananth Narayan <ananth.narayan@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Santosh Shukla <santosh.shukla@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230615051700.1833-2-ravi.bangoria@amd.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Show average value on multiple runs · dada1a1f
      Namhyung Kim authored
      When the -r option is used, perf stat runs the command multiple times
      and updates the stats in evsel->stats.res_stats for global aggregation.
      But that value is never used, and the value printed at the end is just
      the value from the last run. I think we should print the average of
      the multiple runs.
      
      Add evlist__copy_res_stats() to update the aggr counter (for display)
      using the values in the evsel->stats.res_stats.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230616073211.1057936-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Reset aggr stats for each run · ed4090a2
      Namhyung Kim authored
      When perf stat runs multiple times with the -r option, it fails to reset
      the aggregation counters, so the values are added up. The aggregation
      count has the values to be printed in the end.  It should reset the
      counters at the beginning of each run.  But the current code does that
      only when -I/--interval-print option is given.
      
      Fixes: 91f85f98 ("perf stat: Display event stats using aggr counts")
      Reported-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230616073211.1057936-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test: fix failing test cases on linux-next for s390 · 6fbd67b0
      Thomas Richter authored
      In the linux-next tree, many test cases fail on s390x when running the
      perf test suite; sometimes the perf tool dumps core.
      
      Output before:
        6.1: Test event parsing                               : FAILED!
       10.3: Parsing of PMU event table metrics               : FAILED!
       10.4: Parsing of PMU event table metrics with fake PMUs: FAILED!
       17: Setup struct perf_event_attr                       : FAILED!
       24: Number of exit events of a simple workload         : FAILED!
       26: Object code reading                                : FAILED!
       28: Use a dummy software event to keep tracking        : FAILED!
       35: Track with sched_switch                            : FAILED!
       42.3: BPF prologue generation                          : FAILED!
       66: Parse and process metrics                          : FAILED!
       68: Event expansion for cgroups                        : FAILED!
       69.2: Perf time to TSC                                 : FAILED!
       74: build id cache operations                          : FAILED!
       86: Zstd perf.data compression/decompression           : FAILED!
       87: perf record tests                                  : FAILED!
      106: Test java symbol                                   : FAILED!
      
      The reason for all these failures is a missing PMU. On s390x the PMU is
      named cpum_cf, which is not detected as a core PMU. A similar patch was
      added before, see commit 9bacbced ("perf list: Add s390 support
      for detailed PMU event description") which got lost during the recent
      reworks. Add it again.
      
      Output after:
       10.2: PMU event map aliases                            : FAILED!
       42.3: BPF prologue generation                          : FAILED!
      
      Most test cases now work and there is no core dump anymore.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230616081437.1932003-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Work with vmlinux outside symfs · 66dc1920
      Vincent Whitchurch authored
      It is currently possible to use --symfs along with a vmlinux which lies
      outside of the symfs by passing an absolute path to --vmlinux, thanks to
      the check in dso__load_vmlinux() which handles this explicitly.
      
      However, the annotate code lacks this check and thus 'perf annotate'
      does not work ("Internal error: Invalid -1 error code") for kernel
      functions with this combination.  Add the missing handling.
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel@axis.com
      Link: https://lore.kernel.org/r/20221125114210.2353820-1-vincent.whitchurch@axis.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events arm64: Add default tags for Hisi hip08 L1 metrics · f9625140
      Kan Liang authored
      Add the default tags for Hisi hip08 as well.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-6-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test: Add test case for the standard 'perf stat' output · 99a04a48
      Kan Liang authored
      Add a new test case to verify the standard 'perf stat' output with
      different options.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-5-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test: Move all the check functions of stat CSV output to lib · fc51fc87
      Kan Liang authored
      These functions can be shared with the stat std output test.
      
      There is no functional change.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-4-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: New metricgroup output for the default mode · 6a80d794
      Kan Liang authored
      In the default mode, the current output of the metricgroup includes both
      events and metrics, which is unnecessary and just makes the output
      hard to read. Since different ARCHs (even different generations of the
      same ARCH) may use different events, the output also varies across
      platforms.
      
      For a metricgroup, only outputting the value of each metric is good
      enough.
      
      Add a new field default_metricgroup in evsel to indicate an event of the
      default metricgroup. For those events, printout() should print the
      metricgroup name rather than each event.
      
      Add perf_stat__skip_metric_event() to skip the evsel in the Default
      metricgroup, if it's not running or not the metric event.
      
      Add print_metricgroup_header_t to pass the functions which print the
      display name of each metricgroup in the Default metricgroup. Support all
      three output methods.
      
      Factor out perf_stat__print_shadow_stats_metricgroup() to print out each
      metric.
      
      On SPR:
      
      Before:
      
       ./perf_old stat sleep 1
      
       Performance counter stats for 'sleep 1':
      
                    0.54 msec task-clock:u                     #    0.001 CPUs utilized
                       0      context-switches:u               #    0.000 /sec
                       0      cpu-migrations:u                 #    0.000 /sec
                      68      page-faults:u                    #  125.445 K/sec
                 540,970      cycles:u                         #    0.998 GHz
                 556,325      instructions:u                   #    1.03  insn per cycle
                 123,602      branches:u                       #  228.018 M/sec
                   6,889      branch-misses:u                  #    5.57% of all branches
               3,245,820      TOPDOWN.SLOTS:u                  #     18.4 %  tma_backend_bound
                                                        #     17.2 %  tma_retiring
                                                        #     23.1 %  tma_bad_speculation
                                                        #     41.4 %  tma_frontend_bound
                 564,859      topdown-retiring:u
               1,370,999      topdown-fe-bound:u
                 603,271      topdown-be-bound:u
                 744,874      topdown-bad-spec:u
                  12,661      INT_MISC.UOP_DROPPING:u          #   23.357 M/sec
      
             1.001798215 seconds time elapsed
      
             0.000193000 seconds user
             0.001700000 seconds sys
      
      After:
      
      $ ./perf stat sleep 1
      
       Performance counter stats for 'sleep 1':
      
                    0.51 msec task-clock:u                     #    0.001 CPUs utilized
                       0      context-switches:u               #    0.000 /sec
                       0      cpu-migrations:u                 #    0.000 /sec
                      68      page-faults:u                    #  132.683 K/sec
                 545,228      cycles:u                         #    1.064 GHz
                 555,509      instructions:u                   #    1.02  insn per cycle
                 123,574      branches:u                       #  241.120 M/sec
                   6,957      branch-misses:u                  #    5.63% of all branches
                              TopdownL1                 #     17.5 %  tma_backend_bound
                                                        #     22.6 %  tma_bad_speculation
                                                        #     42.7 %  tma_frontend_bound
                                                        #     17.1 %  tma_retiring
                              TopdownL2                 #     21.8 %  tma_branch_mispredicts
                                                        #     11.5 %  tma_core_bound
                                                        #     13.4 %  tma_fetch_bandwidth
                                                        #     29.3 %  tma_fetch_latency
                                                        #      2.7 %  tma_heavy_operations
                                                        #     14.5 %  tma_light_operations
                                                        #      0.8 %  tma_machine_clears
                                                        #      6.1 %  tma_memory_bound
      
             1.001712086 seconds time elapsed
      
             0.000151000 seconds user
             0.001618000 seconds sys
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-3-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf metrics: Sort the Default metricgroup · 1c0e4795
      Kan Liang authored
      The new default mode will print the metrics as a metric group. The
      metrics from the same metric group must be adjacent to each other in the
      metric list, but metric_list_cmp() sorts metrics by the number of
      events.
      
      Add a new sort for the Default metricgroup, which sorts by
      default_metricgroup_name and metric_name.
      
      Add is_default in the struct metric_event to indicate that it's from
      the Default metricgroup.
      
      Store the displayed metricgroup name of the Default metricgroup into
      the metric expr for output.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-2-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tests: Update metric-value for perf stat JSON output · 18b687d7
      Kan Liang authored
      Multiplexing may be triggered, e.g., on the E-cores of ADL.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230615135315.3662428-7-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat,jevents: Introduce Default tags for the default mode · b0a9e8f8
      Kan Liang authored
      Introduce a new metricgroup, Default, to tag all the metric groups which
      will be collected in the default mode.
      
      Add a new field, DefaultMetricgroupName, in the JSON file to indicate
      the real metric group name. It will be printed in the default output
      to replace the event names.
      
      The output format is unchanged.
      
      On SPR, both TopdownL1 and TopdownL2 are displayed in the default
      output.
      
      On ARM, Intel ICL and later platforms (before SPR), only TopdownL1 is
      displayed in the default output.
      Suggested-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230615135315.3662428-4-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf metric: JSON flag to default metric group · 969a4661
      Kan Liang authored
      For the default output, the default metric group could vary on different
      platforms. For example, on SPR, the TopdownL1 and TopdownL2 metrics
      should be displayed in the default mode. On ICL, only the TopdownL1
      should be displayed.
      
      Add a flag so we can tag the default metric group for different
      platforms rather than hacking the perf code.

      The flag is added to the Intel TopdownL1 metrics from ICL and ADL
      onward, and to the TopdownL2 metrics from SPR onward.
      
      Add a new field, DefaultMetricgroupName, in the JSON file to indicate
      the real metric group name.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230615135315.3662428-3-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Fix the annotation for hardware events on hybrid · e15e4a3d
      Kan Liang authored
      The annotation for hardware events is wrong on hybrid. For example,
      
       # ./perf stat -a sleep 1
      
       Performance counter stats for 'system wide':
      
               32,148.85 msec cpu-clock                        #   32.000 CPUs utilized
                     374      context-switches                 #   11.633 /sec
                      33      cpu-migrations                   #    1.026 /sec
                     295      page-faults                      #    9.176 /sec
              18,979,960      cpu_core/cycles/                 #  590.378 K/sec
             261,230,783      cpu_atom/cycles/                 #    8.126 M/sec                       (54.21%)
              17,019,732      cpu_core/instructions/           #  529.404 K/sec
              38,020,470      cpu_atom/instructions/           #    1.183 M/sec                       (63.36%)
               3,296,743      cpu_core/branches/               #  102.546 K/sec
               6,692,338      cpu_atom/branches/               #  208.167 K/sec                       (63.40%)
                  96,421      cpu_core/branch-misses/          #    2.999 K/sec
               1,016,336      cpu_atom/branch-misses/          #   31.613 K/sec                       (63.38%)
      
      On hybrid, hardware events have an extended type, but evsel__match()
      doesn't take it into account.
      
      Filter the config on hybrid before checking.
      
      With the patch,
      
       # ./perf stat -a sleep 1
      
       Performance counter stats for 'system wide':
      
               32,139.90 msec cpu-clock                        #   32.003 CPUs utilized
                     343      context-switches                 #   10.672 /sec
                      32      cpu-migrations                   #    0.996 /sec
                      73      page-faults                      #    2.271 /sec
              13,712,841      cpu_core/cycles/                 #    0.000 GHz
             258,301,691      cpu_atom/cycles/                 #    0.008 GHz                         (54.20%)
              12,428,163      cpu_core/instructions/           #    0.91  insn per cycle
              37,786,557      cpu_atom/instructions/           #    2.76  insn per cycle              (63.35%)
               2,418,826      cpu_core/branches/               #   75.259 K/sec
               6,965,962      cpu_atom/branches/               #  216.739 K/sec                       (63.38%)
                  72,150      cpu_core/branch-misses/          #    2.98% of all branches
               1,032,746      cpu_atom/branch-misses/          #   42.70% of all branches             (63.35%)
      Suggested-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230615135315.3662428-2-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e15e4a3d
    • perf srcline: Fix handling of inline functions · e90208e9
      Ian Rogers authored
      We write an address followed by a ',' to addr2line. With inline data
      we generally get back (// are my comments):
      0x1234    // address
      foo       // function name
      foo.c:123 // filename:line
      bar       // function name
      bar.c:123 // filename:line
      0x000000000000000 // sentinel address created by ','
      ??        // unknown function name
      ??:0      // unknown filename:line
      
      The code was assuming the inline data also had the address, which is
      incorrect. This means the first inline function name (bar above) must
      be checked to see whether it is the sentinel; otherwise it is treated
      as a function name. The regression came from the addition of
      addresses, which were added because the kernel reports a symbol at
      address 0 (the address GNU binutils uses when it interprets ',').
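      A minimal sketch of the parsing rule described above, assuming the addr2line output for one address has already been split into lines as in the example (address, then function/filename:line pairs, then the sentinel). The names `is_sentinel_addr`, `struct frame` and `parse_addr2line` are hypothetical; this is not the code in perf's srcline.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true for an all-zero address such as "0x0000000000000000". */
static bool is_sentinel_addr(const char *line)
{
	if (strncmp(line, "0x", 2) != 0)
		return false;
	line += 2;
	if (*line == '\0')
		return false;
	for (; *line; line++)
		if (*line != '0')
			return false;
	return true;
}

struct frame {
	const char *func;    /* function name line */
	const char *srcline; /* filename:line line */
};

/*
 * lines[0] is the address. Inline frames carry no address of their own,
 * so each function-name slot is first checked against the sentinel and
 * only otherwise treated as a function name.
 * Returns the number of frames, or -1 if input is malformed or too long.
 */
static int parse_addr2line(const char **lines, size_t nlines,
			   struct frame *frames, size_t max_frames)
{
	size_t i = 1; /* skip the address line */
	int n = 0;

	while (i < nlines) {
		if (is_sentinel_addr(lines[i]))
			return n; /* the "??" / "??:0" lines after it are ignored */
		if (i + 1 >= nlines || (size_t)n >= max_frames)
			return -1;
		frames[n].func = lines[i];
		frames[n].srcline = lines[i + 1];
		n++;
		i += 2;
	}
	return -1; /* no sentinel seen */
}
```

      Because termination is decided by the sentinel address line rather than by the `??` lines that follow it, the sketch still stops correctly when address 0 resolves to a real symbol, as it does for the kernel.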
      
      Committer testing:
      
      Using:
      
        # perf trace --call-graph=dwarf -e lock:contention_*
        <SNIP>
        1244.615 TaskCon~ller #/2645281 lock:contention_begin(lock_addr: 0xffff8e6748da5ab0, flags: 2)
                                             __preempt_count_dec_and_test (inlined)
                                             trace_contention_begin (inlined)
                                             trace_contention_begin (inlined)
                                             rwsem_down_read_slowpath ([kernel.kallsyms])
                                             __preempt_count_dec_and_test (inlined)
                                             trace_contention_begin (inlined)
                                             trace_contention_begin (inlined)
                                             rwsem_down_read_slowpath ([kernel.kallsyms])
                                             __down_read_common (inlined)
                                             __down_read (inlined)
                                             down_read ([kernel.kallsyms])
                                             arch_static_branch (inlined)
                                             static_key_false (inlined)
                                             __mmap_lock_trace_acquire_returned (inlined)
                                             mmap_read_lock (inlined)
                                             do_user_addr_fault ([kernel.kallsyms])
                                             arch_local_irq_disable (inlined)
                                             handle_page_fault (inlined)
                                             exc_page_fault ([kernel.kallsyms])
                                             asm_exc_page_fault ([kernel.kallsyms])
                                             [0x4def008] (/usr/lib64/firefox/libxul.so)
        1244.619 TaskCon~ller #/2645281 lock:contention_end(lock_addr: 0xffff8e6748da5ab0)
                                             __preempt_count_dec_and_test (inlined)
                                             trace_contention_end (inlined)
                                             trace_contention_end (inlined)
                                             rwsem_down_read_slowpath ([kernel.kallsyms])
                                             __preempt_count_dec_and_test (inlined)
                                             trace_contention_end (inlined)
                                             trace_contention_end (inlined)
                                             rwsem_down_read_slowpath ([kernel.kallsyms])
                                             __down_read_common (inlined)
                                             __down_read (inlined)
                                             down_read ([kernel.kallsyms])
                                             arch_static_branch (inlined)
                                             static_key_false (inlined)
                                             __mmap_lock_trace_acquire_returned (inlined)
                                             mmap_read_lock (inlined)
                                             do_user_addr_fault ([kernel.kallsyms])
                                             arch_local_irq_disable (inlined)
                                             handle_page_fault (inlined)
                                             exc_page_fault ([kernel.kallsyms])
                                             asm_exc_page_fault ([kernel.kallsyms])
        <SNIP>
      
      Fixes: 8dc26b6f ("perf srcline: Make sentinel reading for binutils addr2line more robust")
      Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: llvm@lists.linux.dev
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Rix <trix@redhat.com>
      Link: https://lore.kernel.org/r/20230615025041.1982072-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e90208e9
  4. 14 Jun, 2023 13 commits