1. 16 Jun, 2023 7 commits
    • perf stat: New metricgroup output for the default mode · 6a80d794
      Kan Liang authored
      In the default mode, the current output of the metricgroup includes both
      events and metrics, which is not necessary and just makes the output
      hard to read. Since different ARCHs (even different generations of the
      same ARCH) may use different events, the output also varies across
      platforms.
      
      For a metricgroup, only outputting the value of each metric is good
      enough.
      
      Add a new field default_metricgroup in evsel to indicate an event of the
      default metricgroup. For those events, printout() should print the
      metricgroup name rather than each event.
      
      Add perf_stat__skip_metric_event() to skip the evsel in the Default
      metricgroup, if it's not running or not the metric event.
      
      Add print_metricgroup_header_t to pass the functions which print the
      display name of each metricgroup in the Default metricgroup. Support all
      three output methods.
      
      Factor out perf_stat__print_shadow_stats_metricgroup() to print out each
      metric.
      
      On SPR:
      
      Before:
      
       ./perf_old stat sleep 1
      
       Performance counter stats for 'sleep 1':
      
                    0.54 msec task-clock:u                     #    0.001 CPUs utilized
                       0      context-switches:u               #    0.000 /sec
                       0      cpu-migrations:u                 #    0.000 /sec
                      68      page-faults:u                    #  125.445 K/sec
                 540,970      cycles:u                         #    0.998 GHz
                 556,325      instructions:u                   #    1.03  insn per cycle
                 123,602      branches:u                       #  228.018 M/sec
                   6,889      branch-misses:u                  #    5.57% of all branches
               3,245,820      TOPDOWN.SLOTS:u                  #     18.4 %  tma_backend_bound
                                                        #     17.2 %  tma_retiring
                                                        #     23.1 %  tma_bad_speculation
                                                        #     41.4 %  tma_frontend_bound
                 564,859      topdown-retiring:u
               1,370,999      topdown-fe-bound:u
                 603,271      topdown-be-bound:u
                 744,874      topdown-bad-spec:u
                  12,661      INT_MISC.UOP_DROPPING:u          #   23.357 M/sec
      
             1.001798215 seconds time elapsed
      
             0.000193000 seconds user
             0.001700000 seconds sys
      
      After:
      
      $ ./perf stat sleep 1
      
       Performance counter stats for 'sleep 1':
      
                    0.51 msec task-clock:u                     #    0.001 CPUs utilized
                       0      context-switches:u               #    0.000 /sec
                       0      cpu-migrations:u                 #    0.000 /sec
                      68      page-faults:u                    #  132.683 K/sec
                 545,228      cycles:u                         #    1.064 GHz
                 555,509      instructions:u                   #    1.02  insn per cycle
                 123,574      branches:u                       #  241.120 M/sec
                   6,957      branch-misses:u                  #    5.63% of all branches
                              TopdownL1                 #     17.5 %  tma_backend_bound
                                                        #     22.6 %  tma_bad_speculation
                                                        #     42.7 %  tma_frontend_bound
                                                        #     17.1 %  tma_retiring
                              TopdownL2                 #     21.8 %  tma_branch_mispredicts
                                                        #     11.5 %  tma_core_bound
                                                        #     13.4 %  tma_fetch_bandwidth
                                                        #     29.3 %  tma_fetch_latency
                                                        #      2.7 %  tma_heavy_operations
                                                        #     14.5 %  tma_light_operations
                                                        #      0.8 %  tma_machine_clears
                                                        #      6.1 %  tma_memory_bound
      
             1.001712086 seconds time elapsed
      
             0.000151000 seconds user
             0.001618000 seconds sys
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-3-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
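The grouped output described in the commit above (metricgroup name printed once, followed by one line per metric) can be sketched as follows. This is a minimal illustration in Python, not the perf implementation: the function name `format_default_metricgroups`, the tuple layout, and the column widths are assumptions made for the example; only the group and metric names come from the "After" output.

```python
def format_default_metricgroups(metrics):
    """Format (group, metric, percent) tuples, naming each group only once.

    Assumes the input is already ordered so that metrics of the same
    group are adjacent.
    """
    lines = []
    previous_group = None
    for group, metric, percent in metrics:
        # Only the first metric of a group carries the group name;
        # the following lines leave the name column blank.
        name = group if group != previous_group else ""
        lines.append(f"{name:<12}# {percent:6.1f} %  {metric}")
        previous_group = group
    return lines


# Values copied from the "After" output in the commit message.
sample = [
    ("TopdownL1", "tma_backend_bound", 17.5),
    ("TopdownL1", "tma_bad_speculation", 22.6),
    ("TopdownL2", "tma_branch_mispredicts", 21.8),
]

for line in format_default_metricgroups(sample):
    print(line)
```

The sketch relies on adjacent ordering of metrics within a group, which is why the metric list has to be sorted by group before printing.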
    • perf metrics: Sort the Default metricgroup · 1c0e4795
      Kan Liang authored
      The new default mode will print the metrics as a metric group. The
      metrics from the same metric group must be adjacent to each other in the
      metric list. But the metric_list_cmp() sorts metrics by the number of
      events.
      
      Add a new sort for the Default metricgroup, which sorts by
      default_metricgroup_name and metric_name.
      
      Add is_default in the struct metric_event to indicate that it's from
      the Default metricgroup.
      
      Store the displayed metricgroup name of the Default metricgroup into
      the metric expr for output.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-2-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
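To illustrate why sorting by event count alone is not enough for the grouped output, here is a small Python sketch. It is not the perf code: the `Metric` class, the event counts, and the ascending sort direction are assumptions chosen for the example; only the field names `default_metricgroup_name` and `metric_name` come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    metric_name: str
    default_metricgroup_name: str
    num_events: int  # hypothetical counts, for illustration only


metrics = [
    Metric("tma_retiring", "TopdownL1", 2),
    Metric("tma_core_bound", "TopdownL2", 3),
    Metric("tma_backend_bound", "TopdownL1", 4),
    Metric("tma_memory_bound", "TopdownL2", 5),
]

# Sorting by the number of events can interleave the two groups.
by_events = sorted(metrics, key=lambda m: m.num_events)

# Sorting by (group name, metric name) keeps each group contiguous.
by_group = sorted(
    metrics, key=lambda m: (m.default_metricgroup_name, m.metric_name)
)

print([m.default_metricgroup_name for m in by_events])
print([m.metric_name for m in by_group])
```

With the second ordering, a printer only has to compare each metric's group with the previous one to decide when to emit a new group header.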
    • pert tests: Update metric-value for perf stat JSON output · 18b687d7
      Kan Liang authored
      Multiplexing may be triggered, e.g., on the e-core of ADL.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230615135315.3662428-7-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat,jevents: Introduce Default tags for the default mode · b0a9e8f8
      Kan Liang authored
      Introduce a new metricgroup, Default, to tag all the metric groups which
      will be collected in the default mode.
      
      Add a new field, DefaultMetricgroupName, in the JSON file to indicate
      the real metric group name. It will be printed in the default output
      to replace the event names.
      
      The output format is unchanged.
      
      On SPR, both TopdownL1 and TopdownL2 are displayed in the default
      output.
      
      On ARM, Intel ICL and later platforms (before SPR), only TopdownL1 is
      displayed in the default output.
      Suggested-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230615135315.3662428-4-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
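As a rough sketch of how the Default tag and the DefaultMetricgroupName field could be consumed, the Python below selects tagged metrics from a JSON metric list and groups them by display name. The entries are hypothetical and heavily trimmed, and the assumption that the Default tag appears as one item of a semicolon-separated "MetricGroup" string is mine; the commit message itself only names the Default metricgroup and the DefaultMetricgroupName field.

```python
import json

# Hypothetical, trimmed metric entries; real files carry more fields.
METRICS_JSON = """
[
  {"MetricName": "tma_retiring",
   "MetricGroup": "Default;TopdownL1",
   "DefaultMetricgroupName": "TopdownL1"},
  {"MetricName": "tma_memory_bound",
   "MetricGroup": "Default;TopdownL2",
   "DefaultMetricgroupName": "TopdownL2"},
  {"MetricName": "tma_dram_bound",
   "MetricGroup": "TopdownL4"}
]
"""


def default_metrics(entries):
    """Map each display group name to the metrics tagged as Default."""
    selected = {}
    for entry in entries:
        groups = entry.get("MetricGroup", "").split(";")
        if "Default" in groups:
            # The display name replaces the generic "Default" tag in output.
            display_name = entry["DefaultMetricgroupName"]
            selected.setdefault(display_name, []).append(entry["MetricName"])
    return selected


print(default_metrics(json.loads(METRICS_JSON)))
```

Keeping the selection in the data files means a platform can change its default set by editing tags, without touching the tool's code.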
    • perf metric: JSON flag to default metric group · 969a4661
      Kan Liang authored
      For the default output, the default metric group could vary on different
      platforms. For example, on SPR, the TopdownL1 and TopdownL2 metrics
      should be displayed in the default mode. On ICL, only the TopdownL1
      should be displayed.
      
      Add a flag so we can tag the default metric group for different
      platforms rather than hack the perf code.
      
      The flag is added to the Intel TopdownL1 metrics since ICL and ADL, and
      to the TopdownL2 metrics since SPR.
      
      Add a new field, DefaultMetricgroupName, in the JSON file to indicate
      the real metric group name.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230615135315.3662428-3-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Fix the annotation for hardware events on hybrid · e15e4a3d
      Kan Liang authored
      The annotation for hardware events is wrong on hybrid. For example,
      
       # ./perf stat -a sleep 1
      
       Performance counter stats for 'system wide':
      
               32,148.85 msec cpu-clock                        #   32.000 CPUs utilized
                     374      context-switches                 #   11.633 /sec
                      33      cpu-migrations                   #    1.026 /sec
                     295      page-faults                      #    9.176 /sec
              18,979,960      cpu_core/cycles/                 #  590.378 K/sec
             261,230,783      cpu_atom/cycles/                 #    8.126 M/sec                       (54.21%)
              17,019,732      cpu_core/instructions/           #  529.404 K/sec
              38,020,470      cpu_atom/instructions/           #    1.183 M/sec                       (63.36%)
               3,296,743      cpu_core/branches/               #  102.546 K/sec
               6,692,338      cpu_atom/branches/               #  208.167 K/sec                       (63.40%)
                  96,421      cpu_core/branch-misses/          #    2.999 K/sec
               1,016,336      cpu_atom/branch-misses/          #   31.613 K/sec                       (63.38%)
      
      Hardware events have an extended type on hybrid, but evsel__match()
      doesn't take it into account.
      
      Filter the config on hybrid before checking.
      
      With the patch,
      
       # ./perf stat -a sleep 1
      
       Performance counter stats for 'system wide':
      
               32,139.90 msec cpu-clock                        #   32.003 CPUs utilized
                     343      context-switches                 #   10.672 /sec
                      32      cpu-migrations                   #    0.996 /sec
                      73      page-faults                      #    2.271 /sec
              13,712,841      cpu_core/cycles/                 #    0.000 GHz
             258,301,691      cpu_atom/cycles/                 #    0.008 GHz                         (54.20%)
              12,428,163      cpu_core/instructions/           #    0.91  insn per cycle
              37,786,557      cpu_atom/instructions/           #    2.76  insn per cycle              (63.35%)
               2,418,826      cpu_core/branches/               #   75.259 K/sec
               6,965,962      cpu_atom/branches/               #  216.739 K/sec                       (63.38%)
                  72,150      cpu_core/branch-misses/          #    2.98% of all branches
               1,032,746      cpu_atom/branch-misses/          #   42.70% of all branches             (63.35%)
      Suggested-by: Ian Rogers <irogers@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230615135315.3662428-2-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
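The mismatch described in the commit above can be sketched in Python. On hybrid systems a hardware event's config carries the PMU type in its upper 32 bits, so a plain equality check against the generic event id fails. The constants mirror the perf_event uapi header; the PMU type id and both helper functions are hypothetical and only model the idea of filtering the config before comparing, not the actual evsel__match() code.

```python
# Values as defined in the perf_event uapi header.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_INSTRUCTIONS = 1
PERF_PMU_TYPE_SHIFT = 32
PERF_HW_EVENT_MASK = 0xFFFFFFFF

# Hypothetical PMU type id for a hybrid core PMU (assigned at runtime).
HYPOTHETICAL_CORE_PMU_TYPE = 4


def matches_naive(ev_type, ev_config, want_type, want_config):
    """Compare type and config without handling the extended type."""
    return ev_type == want_type and ev_config == want_config


def matches_hybrid_aware(ev_type, ev_config, want_type, want_config):
    """Strip the PMU type from hardware event configs before comparing."""
    if ev_type == PERF_TYPE_HARDWARE:
        ev_config &= PERF_HW_EVENT_MASK
    return ev_type == want_type and ev_config == want_config


# An "instructions" event opened on the hypothetical core PMU.
hybrid_config = (
    HYPOTHETICAL_CORE_PMU_TYPE << PERF_PMU_TYPE_SHIFT
) | PERF_COUNT_HW_INSTRUCTIONS

print(matches_naive(PERF_TYPE_HARDWARE, hybrid_config,
                    PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS))
print(matches_hybrid_aware(PERF_TYPE_HARDWARE, hybrid_config,
                           PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS))
```

When the match fails, the event is not recognized as cycles, instructions, or branch-misses, which is consistent with the generic per-second annotation shown in the first output above.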
    • perf srcline: Fix handling of inline functions · e90208e9
      Ian Rogers authored
      We write an address then a ',' to addr2line. With inline data we
      generally get back (// are my comments):
      0x1234    // address
      foo       // function name
      foo.c:123 // filename:line
      bar       // function name
      bar.c:123 // filename:line
      0x000000000000000 // sentinel address created by ','
      ??        // unknown function name
      ??:0      // unknown filename:line
      
      The code was assuming the inline data also had the address, which is
      incorrect. This means the first inline function name (bar above) needs
      to be checked to see if it is the sentinel, otherwise to be treated as
      a function name. The regression was caused by the addition of
      addresses as the kernel is reporting a symbol at address 0 (used by
      GNU binutils when it interprets ',').
      
      Committer testing:
      
      Using:
      
        # perf trace --call-graph=dwarf -e lock:contention_*
        <SNIP>
        1244.615 TaskCon~ller #/2645281 lock:contention_begin(lock_addr: 0xffff8e6748da5ab0, flags: 2)
                                             __preempt_count_dec_and_test (inlined)
                                             trace_contention_begin (inlined)
                                             trace_contention_begin (inlined)
                                             rwsem_down_read_slowpath ([kernel.kallsyms])
                                             __preempt_count_dec_and_test (inlined)
                                             trace_contention_begin (inlined)
                                             trace_contention_begin (inlined)
                                             rwsem_down_read_slowpath ([kernel.kallsyms])
                                             __down_read_common (inlined)
                                             __down_read (inlined)
                                             down_read ([kernel.kallsyms])
                                             arch_static_branch (inlined)
                                             static_key_false (inlined)
                                             __mmap_lock_trace_acquire_returned (inlined)
                                             mmap_read_lock (inlined)
                                             do_user_addr_fault ([kernel.kallsyms])
                                             arch_local_irq_disable (inlined)
                                             handle_page_fault (inlined)
                                             exc_page_fault ([kernel.kallsyms])
                                             asm_exc_page_fault ([kernel.kallsyms])
                                             [0x4def008] (/usr/lib64/firefox/libxul.so)
        1244.619 TaskCon~ller #/2645281 lock:contention_end(lock_addr: 0xffff8e6748da5ab0)
                                             __preempt_count_dec_and_test (inlined)
                                             trace_contention_end (inlined)
                                             trace_contention_end (inlined)
                                             rwsem_down_read_slowpath ([kernel.kallsyms])
                                             __preempt_count_dec_and_test (inlined)
                                             trace_contention_end (inlined)
                                             trace_contention_end (inlined)
                                             rwsem_down_read_slowpath ([kernel.kallsyms])
                                             __down_read_common (inlined)
                                             __down_read (inlined)
                                             down_read ([kernel.kallsyms])
                                             arch_static_branch (inlined)
                                             static_key_false (inlined)
                                             __mmap_lock_trace_acquire_returned (inlined)
                                             mmap_read_lock (inlined)
                                             do_user_addr_fault ([kernel.kallsyms])
                                             arch_local_irq_disable (inlined)
                                             handle_page_fault (inlined)
                                             exc_page_fault ([kernel.kallsyms])
                                             asm_exc_page_fault ([kernel.kallsyms])
        <SNIP>
      
      Fixes: 8dc26b6f ("perf srcline: Make sentinel reading for binutils addr2line more robust")
      Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: llvm@lists.linux.dev
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Rix <trix@redhat.com>
      Link: https://lore.kernel.org/r/20230615025041.1982072-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
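The record layout described in the srcline commit above can be parsed along these lines. This is a simplified Python sketch, not the perf code: `is_sentinel` and `parse_record` are hypothetical helpers, and the sketch assumes a function name never looks like a zero hex address. The key point it models is that only the first line of a record is an address, so each later "name" line must be checked against the sentinel before being treated as a function name.

```python
def is_sentinel(line):
    """Return True for an all-zero hex address such as the one ',' produces."""
    text = line.strip()
    if not text.startswith("0x"):
        return False
    try:
        return int(text, 16) == 0
    except ValueError:
        return False


def parse_record(lines):
    """Split one addr2line reply into its address and (function, location) pairs."""
    it = iter(lines)
    address = next(it).strip()
    frames = []
    for line in it:
        # Inline entries do not repeat the address, so this line is either
        # the sentinel address or the next function name.
        if is_sentinel(line):
            break
        location = next(it, "").strip()
        frames.append((line.strip(), location))
    return address, frames


# The example reply from the commit message, without the comments.
reply = [
    "0x1234",
    "foo",
    "foo.c:123",
    "bar",
    "bar.c:123",
    "0x000000000000000",
    "??",
    "??:0",
]

print(parse_record(reply))
```

The trailing "??" and "??:0" lines belong to the sentinel and are simply left unread by the sketch.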
  2. 14 Jun, 2023 30 commits
  3. 13 Jun, 2023 2 commits
  4. 12 Jun, 2023 1 commit