1. 24 Jun, 2023 4 commits
    • perf bpf: Move the declaration of struct rq · 5c45b210
      Ian Rogers authored
      struct rq is defined in vmlinux.h when vmlinux.h is generated, which
      causes a redefinition failure if it is also defined in
      lock_contention.bpf.c. Move the definition to vmlinux.h for
      consistency with the generated version.
      
      Fixes: 760ebc45 ("perf lock contention: Add empty 'struct rq' to satisfy libbpf 'runqueue' type verification")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: James Clark <james.clark@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230623041405.4039475-3-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf build: Add ability to build with a generated vmlinux.h · b7a2d774
      Ian Rogers authored
      Commit a8874665 ("perf bpf skels: Stop using vmlinux.h generated
      from BTF, use subset of used structs + CO-RE") made it so that
      vmlinux.h was uncondtionally included from
      tools/perf/util/vmlinux.h. This change reverts part of that change (so
      that vmlinux.h is once again generated) and makes it so that the
      vmlinux.h used at build time is selected from the VMLINUX_H
      variable. By default the VMLINUX_H variable is set to the vmlinux.h
      added in change a8874665, but if GEN_VMLINUX_H=1 is passed on the
      build command line then the previous generation behavior kicks in.
      
      The build with GEN_VMLINUX_H=1 currently fails with:
      
          util/bpf_skel/lock_contention.bpf.c:419:8: error: redefinition of 'rq'
          struct rq {};
                 ^
          /tmp/perf/util/bpf_skel/.tmp/../vmlinux.h:45630:8: note: previous definition is here
          struct rq {
                 ^
          1 error generated.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: James Clark <james.clark@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20230623041405.4039475-2-irogers@google.com
      [ Format the error message and add a comment for GEN_VMLINUX_H ]
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf test: Skip metrics w/o event name in stat STD output linter · 4d60e83d
      Namhyung Kim authored
      This test checks that the output of perf stat matches event names and
      metrics, so it expects each output line to have both an event name and
      a metric.  Otherwise it should skip the line.
      
      On AMD machines, the instruction event has two metrics and they are printed
      on separate lines.  This produces a line without an event name, as shown below:
      
        # perf stat -a sleep 1
      
         Performance counter stats for 'system wide':
      
                 64,383.34 msec cpu-clock                  #   64.048 CPUs utilized
                    14,526      context-switches           #  225.617 /sec
                       112      cpu-migrations             #    1.740 /sec
                       190      page-faults                #    2.951 /sec
               807,558,652      cycles                     #    0.013 GHz                         (83.30%)
                69,809,799      stalled-cycles-frontend    #    8.64% frontend cycles idle        (83.30%)
               196,983,266      stalled-cycles-backend     #   24.39% backend cycles idle         (83.30%)
               424,876,008      instructions               #    0.53  insn per cycle
       (here) --->                                  #    0.46  stalled cycles per insn     (83.30%)
                97,788,321      branches                   #    1.519 M/sec                       (83.34%)
                 4,147,377      branch-misses              #    4.24% of all branches             (83.46%)
      
               1.005241409 seconds time elapsed
      
      Also modern Intel machines have TopDown metrics which also don't have
      event names.
      
        # perf stat -a sleep 1
      
         Performance counter stats for 'system wide':
      
                  8,015.39 msec cpu-clock                        #    7.996 CPUs utilized
                     5,823      context-switches                 #  726.477 /sec
                       189      cpu-migrations                   #   23.580 /sec
                       139      page-faults                      #   17.342 /sec
               435,139,308      cycles                           #    0.054 GHz
               193,891,345      instructions                     #    0.45  insn per cycle
                42,773,028      branches                         #    5.336 M/sec
                 2,298,113      branch-misses                    #    5.37% of all branches
                                TopdownL1                 #     25.5 %  tma_backend_bound
                    /-->                                  #      7.9 %  tma_bad_speculation
          (here) --+                                      #     55.7 %  tma_frontend_bound
                    \-->                                  #     10.9 %  tma_retiring
      
               1.002395924 seconds time elapsed
      
      There is a check to skip TopdownL1 and TopdownL2 specifically, but it
      does not cover every affected line.
      
      So there is another check to skip the line if it has nothing on the left
      side of the # sign.  That seems reasonable, but it is not enough either.
      
      When aggregation mode (like --per-socket or --per-thread) is used, it
      adds some prefix (e.g. CPU socket, task name and PID) in the output
      line.  So the test code ignores them to normalize result.
      
      A problem can happen for per-thread mode when task name contains one or
      more spaces.  It'd only ignore the first part of the task name, and it
      thinks there's something more in the line so it would not skip.
      
        # perf stat -a --per-thread sleep 1
        ...
                  perf-21276                  #     70.2 %  tma_backend_bound
                  perf-21276                  #      3.9 %  tma_bad_speculation
                  perf-21276                  #     10.5 %  tma_frontend_bound
                  perf-21276                  #     15.3 %  tma_retiring
      	    ^^^^^^^^^^
      	    (ignored)
      
               my task-21328                  #     70.2 %  tma_backend_bound
               my task-21328                  #      3.9 %  tma_bad_speculation
               my task-21328                  #     10.5 %  tma_frontend_bound
               my task-21328                  #     15.3 %  tma_retiring
      	 ^^
           (ignored)
      
      So I think it should look at the metric names instead.  Add skip_metric
      to hold the list of names to skip.  It would contain 'stalled cycles per
      insn' and metrics starting with 'tma_'.
      
      Fixes: 99a04a48 ("perf test: Add test case for the standard 'perf stat' output")
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20230623230139.985594-2-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
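The idea of skipping by metric name rather than by what is left of the '#' sign can be sketched as follows. This is only an illustration in Python, not the shell code of the actual test; the names SKIP_METRICS, metric_of and should_skip are hypothetical, and the parsing assumes the metric value is the first field after '#'.

```python
# Hypothetical sketch: decide whether a 'perf stat' output line should be
# skipped by looking at the metric name on the right-hand side of '#'.
SKIP_METRICS = ("stalled cycles per insn", "tma_")


def metric_of(line):
    """Return the metric text after the '#' value, or None if there is no '#'."""
    _, sep, right = line.partition("#")
    if not sep:
        return None
    fields = right.split()[1:]          # drop the leading metric value
    if fields and fields[0] == "%":     # drop a standalone '%' unit
        fields = fields[1:]
    return " ".join(fields)


def should_skip(line):
    """True if the line carries a metric that has no event name of its own."""
    metric = metric_of(line)
    return metric is not None and metric.startswith(SKIP_METRICS)
```

Because the decision depends only on the right-hand side, a task name containing spaces in the prefix (such as "my task-21328") no longer affects it.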
    • perf test: Reorder event name checks in stat STD output linter · 8d3df7c3
      Namhyung Kim authored
      On AMD machines, the perf stat STD output test failed like below:
      
        $ sudo ./perf test -v 98
         98: perf stat STD output linter                                     :
        --- start ---
        test child forked, pid 1841901
        Checking STD output: no argswrong event metric.
          expected 'GHz' in 108,121 stalled-cycles-frontend  # 10.88% frontend cycles idle
        test child finished with -1
        ---- end ----
        perf stat STD output linter: FAILED!
      
      This is because the stalled-cycles-{frontend,backend} events are
      used by default.  The current logic checks the event_name array to find
      which event it's running.  But the 'cycles' event comes before those stalled
      cycles events and it matches first.  So it tries to find the 'GHz' metric
      in the output (which is for 'cycles') and fails.
      
      Move the stalled-cycles-{frontend,backend} events before 'cycles' so
      that it can find the stalled cycles events first.
      
      Also add a space after 'no args' test name for consistency.
      
      Fixes: 99a04a48 ("perf test: Add test case for the standard 'perf stat' output")
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20230623230139.985594-1-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
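Why the order of the event names matters can be shown with a small first-match lookup. This is a Python illustration that assumes the lookup is a substring match against each name in turn; first_match and the two lists are hypothetical and only mirror the description above.

```python
def first_match(line, event_names):
    """Return the first event name that occurs in the line, or None."""
    for name in event_names:
        if name in line:
            return name
    return None


line = "108,121 stalled-cycles-frontend  # 10.88% frontend cycles idle"

# 'cycles' is a substring of the stalled-cycles events, so it must come last.
old_order = ["cycles", "stalled-cycles-frontend", "stalled-cycles-backend"]
new_order = ["stalled-cycles-frontend", "stalled-cycles-backend", "cycles"]

print(first_match(line, old_order))
print(first_match(line, new_order))
```

With the old order the line is attributed to 'cycles', which is why the test then looked for 'GHz' and failed.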
  2. 23 Jun, 2023 6 commits
    • perf pmu: Remove a hard coded cpu PMU assumption · d06593aa
      Ian Rogers authored
      The property of "cpu" when it has no cpu map is true on S390 with the
      PMU cpum_cf. Rather than maintain a list of such PMUs, reuse the
      is_core test result from the caller.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-perf-users@vger.kernel.org
      Link: https://lore.kernel.org/r/20230623043843.4080180-2-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf pmus: Add notion of default PMU for JSON events · d685819b
      Ian Rogers authored
      JSON events created in pmu-events.c by jevents.py may not specify a
      PMU they are associated with, in which case it is implied that it is
      the first core PMU. Care is needed to select this for regular 'cpu',
      s390 'cpum_cf' and ARM's many names, because at the point the name is first
      needed the core PMUs list hasn't been initialized. Add a helper in
      perf_pmus to create this value, in the worst case by scanning sysfs.
      
      v2. Add missing close if fdopendir fails.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Link: https://lore.kernel.org/r/20230623043843.4080180-1-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf unwind: Fix map reference counts · 33941dbd
      Ian Rogers authored
      The result of thread__find_map is the map in the passed in
      addr_location. Calling addr_location__exit puts that map and so copies
      need to do a map__get. Add in the corresponding map__puts.
      
      v2. Add missing map__put when dso is missing.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Ivan Babrou <ivan@cloudflare.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Link: https://lore.kernel.org/r/20230623043107.4077510-1-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
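The ownership rule described above (the lookup result belongs to the addr_location, so any copy that outlives it needs its own reference) can be modelled with a toy reference counter. This is a Python sketch only; Map, AddrLocation, find_map and lookup_keeping_copy are hypothetical stand-ins and do not reproduce the perf C API.

```python
class Map:
    """Toy reference-counted object; starts with the creator's reference."""

    def __init__(self):
        self.refcnt = 1


def map_get(m):
    m.refcnt += 1
    return m


def map_put(m):
    m.refcnt -= 1


class AddrLocation:
    """Holds one reference to a map and drops it on exit()."""

    def __init__(self):
        self.map = None

    def exit(self):
        if self.map is not None:
            map_put(self.map)
            self.map = None


def find_map(al, m):
    al.map = map_get(m)      # the location now owns one reference
    return al.map


def lookup_keeping_copy(m):
    al = AddrLocation()
    found = find_map(al, m)
    kept = map_get(found)    # the copy takes its own reference
    al.exit()                # drops only the location's reference
    return kept              # the caller must map_put() this later
```

Without the extra map_get, the kept pointer would refer to a map whose reference had already been dropped by exit().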
    • perf test: Set PERF_EXEC_PATH for script execution · e4ef3ef1
      Namhyung Kim authored
      The task-analyzer.py script (and in fact every other script too) requires
      the PERF_EXEC_PATH env to find dependent libraries and scripts. For the scripts
      test to run correctly, it needs to set PERF_EXEC_PATH to the perf tool
      source directory.
      
      Instead of blindly updating the env, let's check the directory structure
      to make sure it points to the correct location.
      
      Fixes: e8478b84 ("perf test: add new task-analyzer tests")
      Cc: Petar Gligoric <petar.gligoric@rohde-schwarz.com>
      Cc: Hagen Paul Pfeifer <hagen@jauu.net>
      Cc: Aditya Gupta <adityag@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
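A guarded update of PERF_EXEC_PATH along the lines described above might look like the sketch below. It is written in Python for illustration, whereas the test itself is a shell script. The marker path scripts/python/Perf-Trace-Util and the helper names are assumptions made for this example, not taken from the commit.

```python
import os
import tempfile

# Assumed marker: a directory expected to exist in a perf source tree.
MARKER = os.path.join("scripts", "python", "Perf-Trace-Util")


def exec_path_for(perf_dir):
    """Return perf_dir if it looks like a perf source tree, else None."""
    if os.path.isdir(os.path.join(perf_dir, MARKER)):
        return perf_dir
    return None


def setup_env(perf_dir, env):
    """Set PERF_EXEC_PATH in env only when the directory layout checks out."""
    path = exec_path_for(perf_dir)
    if path is not None:
        env["PERF_EXEC_PATH"] = path
    return env
```

If the layout does not match, the environment is left untouched rather than pointed at a wrong location.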
    • perf script: Initialize buffer for regs_map() · 2d7f5540
      Namhyung Kim authored
      The buffer is used to save register mapping in a sample.  Normally
      perf samples don't have any register so the string should be empty.
      But the code failed to initialize the buffer when the size is 0, so it
      was passed to PyUnicode_FromString() with garbage data.
      
      So it returns NULL due to invalid input (instead of an empty unicode
      string object) which causes a segfault like below:
      
        Thread 2.1 "perf" received signal SIGSEGV, Segmentation fault.
        [Switching to Thread 0x7ffff7c83780 (LWP 193775)]
        0x00007ffff6dbca2e in PyDict_SetItem () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
        (gdb) bt
        #0  0x00007ffff6dbca2e in PyDict_SetItem () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
        #1  0x00007ffff6dbf848 in PyDict_SetItemString () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
        #2  0x000055555575824d in pydict_set_item_string_decref (val=0x0, key=0x5555557f96e3 "iregs", dict=0x7ffff5f7f780)
            at util/scripting-engines/trace-event-python.c:145
        #3  set_regs_in_dict (evsel=0x555555efc370, sample=0x7fffffffb870, dict=0x7ffff5f7f780)
            at util/scripting-engines/trace-event-python.c:776
        #4  get_perf_sample_dict (sample=sample@entry=0x7fffffffb870, evsel=evsel@entry=0x555555efc370, al=al@entry=0x7fffffffb2e0,
            addr_al=addr_al@entry=0x0, callchain=callchain@entry=0x7ffff63ef440) at util/scripting-engines/trace-event-python.c:923
        #5  0x0000555555758ec1 in python_process_tracepoint (sample=0x7fffffffb870, evsel=0x555555efc370, al=0x7fffffffb2e0, addr_al=0x0)
            at util/scripting-engines/trace-event-python.c:1044
        #6  0x00005555555c5db8 in process_sample_event (tool=<optimized out>, event=<optimized out>, sample=<optimized out>,
            evsel=0x555555efc370, machine=0x555555ef4d68) at builtin-script.c:2421
        #7  0x00005555556b7793 in perf_session__deliver_event (session=0x555555ef4b60, event=0x7ffff62ff7d0, tool=0x7fffffffc150,
            file_offset=30672, file_path=0x555555efb8a0 "perf.data") at util/session.c:1639
        #8  0x00005555556bc864 in do_flush (show_progress=true, oe=0x555555efb700) at util/ordered-events.c:245
        #9  __ordered_events__flush (oe=oe@entry=0x555555efb700, how=how@entry=OE_FLUSH__FINAL, timestamp=timestamp@entry=0)
            at util/ordered-events.c:324
        #10 0x00005555556bd06e in ordered_events__flush (oe=oe@entry=0x555555efb700, how=how@entry=OE_FLUSH__FINAL)
            at util/ordered-events.c:342
        #11 0x00005555556b9d63 in __perf_session__process_events (session=0x555555ef4b60) at util/session.c:2465
        #12 perf_session__process_events (session=0x555555ef4b60) at util/session.c:2627
        #13 0x00005555555cb1d0 in __cmd_script (script=0x7fffffffc150) at builtin-script.c:2839
        #14 cmd_script (argc=<optimized out>, argv=<optimized out>) at builtin-script.c:4365
        #15 0x0000555555650811 in run_builtin (p=p@entry=0x555555ed8948 <commands+456>, argc=argc@entry=4, argv=argv@entry=0x7fffffffe240)
            at perf.c:323
        #16 0x0000555555597eb3 in handle_internal_command (argv=0x7fffffffe240, argc=4) at perf.c:377
        #17 run_argv (argv=<synthetic pointer>, argcp=<synthetic pointer>) at perf.c:421
        #18 main (argc=4, argv=0x7fffffffe240) at perf.c:537
      
      Fixes: 51cfe7a3 ("perf python: Avoid 2 leak sanitizer issues")
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf tests: Fix test_arm_callgraph_fp variable expansion · 33fe7c08
      James Clark authored
      $TEST_PROGRAM is a command with spaces so it's supposed to be word
      split. The referenced fix to fix the shellcheck warnings incorrectly
      quoted this string so unquote it to fix the test.
      
      At the same time silence the shellcheck warning for that line and fix
      two more shellcheck errors at the end of the script.
      
      Fixes: 1bb17b4c ("perf tests arm_callgraph_fp: Address shellcheck warnings about signal names and adding double quotes for expression")
      Signed-off-by: James Clark <james.clark@arm.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: spoorts2@in.ibm.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Link: https://lore.kernel.org/r/20230622101809.2431897-1-james.clark@arm.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
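The difference between the quoted and unquoted expansion can be illustrated by looking at the argument vector each form produces. This is a Python sketch: the command string is a made-up example, and shlex.split only approximates shell word splitting (it also honours quotes, which an unquoted shell expansion does not).

```python
import shlex

# Hypothetical command string containing spaces, like $TEST_PROGRAM.
TEST_PROGRAM = "perf test -w leafloop"

# Like "$TEST_PROGRAM": the whole string becomes one word, so the shell
# would look for a program literally named "perf test -w leafloop".
quoted_argv = [TEST_PROGRAM]

# Like $TEST_PROGRAM: the string is split into a command and its arguments.
unquoted_argv = shlex.split(TEST_PROGRAM)

print(quoted_argv)
print(unquoted_argv)
```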
  3. 22 Jun, 2023 5 commits
  4. 21 Jun, 2023 4 commits
    • perf jit: Fix incorrect file name in DWARF line table · 362f9c90
      elisabeth authored
      Fixes an issue where an incorrect filename was added in the DWARF line table of
      an ELF object file when calling 'perf inject --jit' due to not checking the
      filename of a debug entry against the repeated name marker (\xff\0).
      The marker is mentioned in the tools/perf/util/jitdump.h header, which describes
      the jitdump binary format, and indicates that the filename in a debug entry
      is the same as the previous entry.
      
      In the function emit_lineno_info(), in the file tools/perf/util/genelf-debug.c,
      the debug entry filename gets compared to the previous entry filename. If they
      are not the same, a new filename is added to the DWARF line table. However,
      since there is no check against '\xff\0', in some cases '\xff\0' is inserted
      as the filename into the DWARF line table.
      
      This can be seen with `objdump --dwarf=line` on the ELF file after `perf inject --jit`.
      It also makes no source code information show up in 'perf annotate'.
      Signed-off-by: Elisabeth Panholzer <elisabeth@leaningtech.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20230602123815.255001-1-paniii94@gmail.com
      [ Fixed a trailing white space, removed a subject prefix ]
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
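The handling of the repeated name marker can be sketched as follows. This is a simplified Python model of the comparison described above, not the C code in emit_lineno_info(); the function name and the list-of-bytes input are assumptions for illustration.

```python
# Marker meaning "same file name as the previous debug entry".
REPEATED_NAME_MARKER = b"\xff\x00"


def line_table_filenames(entry_names):
    """Return the file names that would be added to the line table.

    A name is added only when it differs from the previous entry's name,
    and the repeated name marker is resolved to that previous name first.
    """
    table = []
    prev = None
    for name in entry_names:
        if name == REPEATED_NAME_MARKER:
            name = prev                  # reuse the previous file name
        if name is not None and name != prev:
            table.append(name)
        prev = name
    return table
```

Without the marker check, the marker bytes would compare unequal to the previous name and end up in the table as if they were a file name.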
    • perf annotate: Fix instruction association and parsing for LoongArch · 4ca0d340
      WANG Rui authored
      In the perf annotate view for LoongArch, there is no arrowed line
      pointing to the target from the branch instruction. This issue is
      caused by incorrect instruction association and parsing.
      
      $ perf record alloc-6276705c94ad1398 # rust benchmark
      $ perf report
      
        0.28 │       ori        $a1, $zero, 0x63
             │       move       $a2, $zero
       10.55 │       addi.d     $a3, $a2, 1(0x1)
             │       sltu       $a4, $a3, $s7
        9.53 │       masknez    $a4, $s7, $a4
             │       sub.d      $a3, $a3, $a4
       12.12 │       st.d       $a1, $fp, 24(0x18)
             │       st.d       $a3, $fp, 16(0x10)
       16.29 │       slli.d     $a2, $a2, 0x2
             │       ldx.w      $a2, $s8, $a2
       12.77 │       st.w       $a2, $sp, 724(0x2d4)
             │       st.w       $s0, $sp, 720(0x2d0)
        7.03 │       addi.d     $a2, $sp, 720(0x2d0)
             │       addi.d     $a1, $a1, -1(0xfff)
       12.03 │       move       $a2, $a3
             │     → bne        $a1, $s3, -52(0x3ffcc)  # 82ce8 <test::bench::Bencher::iter+0x3f4>
        2.50 │       addi.d     $a0, $a0, 1(0x1)
      
      This patch fixes instruction association issues, such as associating
      branch instructions with jump_ops instead of call_ops, and corrects
      false instruction matches. It also implements branch instruction parsing
      specifically for LoongArch. With this patch, we will be able to see the
      arrowed line.
      
        0.79 │3ec:   ori        $a1, $zero, 0x63
             │       move       $a2, $zero
       10.32 │3f4:┌─→addi.d     $a3, $a2, 1(0x1)
             │    │  sltu       $a4, $a3, $s7
       10.44 │    │  masknez    $a4, $s7, $a4
             │    │  sub.d      $a3, $a3, $a4
       14.17 │    │  st.d       $a1, $fp, 24(0x18)
             │    │  st.d       $a3, $fp, 16(0x10)
       13.15 │    │  slli.d     $a2, $a2, 0x2
             │    │  ldx.w      $a2, $s8, $a2
       11.00 │    │  st.w       $a2, $sp, 724(0x2d4)
             │    │  st.w       $s0, $sp, 720(0x2d0)
        8.00 │    │  addi.d     $a2, $sp, 720(0x2d0)
             │    │  addi.d     $a1, $a1, -1(0xfff)
       11.99 │    │  move       $a2, $a3
             │    └──bne        $a1, $s3, 3f4
        3.17 │       addi.d     $a0, $a0, 1(0x1)
      Signed-off-by: WANG Rui <wangrui@loongson.cn>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: loongarch@lists.linux.dev
      Cc: loongson-kernel@lists.loongnix.cn
      Cc: Huacai Chen <chenhuacai@loongson.cn>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: WANG Xuerui <kernel@xen0n.name>
      Link: https://lore.kernel.org/r/20230620132025.105563-1-wangrui@loongson.cn
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
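To draw the arrowed line, the annotate code needs the branch target from the operand string shown in the first listing. A rough idea of that extraction is sketched below in Python with a regular expression; this is an illustration under the assumption that the target appears as "# <addr> <symbol+0xoffset>", not the parser added by the patch.

```python
import re

# Matches the trailing "# 82ce8 <symbol+0x3f4>" comment of a branch.
BRANCH_TARGET = re.compile(r"#\s*([0-9a-fA-F]+)\s+<(.+)\+0x([0-9a-fA-F]+)>\s*$")


def parse_branch_target(operands):
    """Return (address, symbol, offset) for a branch operand string, or None."""
    m = BRANCH_TARGET.search(operands)
    if m is None:
        return None
    return int(m.group(1), 16), m.group(2), int(m.group(3), 16)


ops = "$a1, $s3, -52(0x3ffcc)  # 82ce8 <test::bench::Bencher::iter+0x3f4>"
print(parse_branch_target(ops))
```

The offset 0x3f4 is what lets the second listing show the branch as jumping to the local label 3f4.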
    • perf annotation: Switch lock from a mutex to a sharded_mutex · 2e9f9d4a
      Ian Rogers authored
      Remove the "struct mutex lock" variable from annotation that is
      allocated per symbol. This removes in the region of 40 bytes per
      symbol allocation. Use a sharded mutex where the number of shards is
      set to the number of CPUs. Assuming good hashing of the annotation
      (done based on the pointer), this means in order to contend there
      needs to be more threads than CPUs, which is not currently true in any
      perf command. Were contention an issue it is straightforward to
      increase the number of shards in the mutex.
      
      On my Debian/glibc based machine, this reduces the size of struct
      annotation from 136 bytes to 96 bytes, or nearly 30%.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Andres Freund <andres@anarazel.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Yuan Can <yuancan@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Link: https://lore.kernel.org/r/20230615040715.2064350-2-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf sharded_mutex: Introduce sharded_mutex · 0650b2b2
      Ian Rogers authored
      Per object mutexes may come with significant memory cost while a
      global mutex can suffer from unnecessary contention. A sharded mutex
      is a compromise where objects are hashed and then a particular mutex
      for the hash of the object is used. Contention can be controlled by the
      number of shards.
      
      v2. Use hashmap.h's hash_bits in case of contention from alignment of
          objects.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Andres Freund <andres@anarazel.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Yuan Can <yuancan@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Link: https://lore.kernel.org/r/20230615040715.2064350-1-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
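The sharded mutex idea can be sketched in a few lines. This is a Python illustration, not the C implementation in perf: the class and method names are hypothetical, the object's identity stands in for its pointer, and the multiplicative hash is an assumption chosen so that aligned addresses do not all land in the same shard.

```python
import threading


class ShardedMutex:
    """A fixed pool of locks; each object maps to one lock by hash."""

    def __init__(self, num_shards):
        # Round the shard count up to a power of two.
        self._bits = max(num_shards - 1, 0).bit_length()
        self._locks = [threading.Lock() for _ in range(1 << self._bits)]

    def _index(self, obj):
        if self._bits == 0:
            return 0
        # Multiplicative hash of the object's identity; the top bits are
        # used because the low bits of aligned addresses are mostly zero.
        h = (id(obj) * 11400714819323198485) & ((1 << 64) - 1)
        return h >> (64 - self._bits)

    def lock_for(self, obj):
        return self._locks[self._index(obj)]


shards = ShardedMutex(8)
annotation = object()
with shards.lock_for(annotation):
    pass  # critical section for this object
```

Memory cost is one lock per shard rather than one per object, and two objects contend only when they hash to the same shard.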
  5. 20 Jun, 2023 5 commits
  6. 16 Jun, 2023 16 commits
    • perf pmus: Check if we can encode the PMU number in perf_event_attr.type · 82fe2e45
      Arnaldo Carvalho de Melo authored
      On some architectures we can't encode the PMU number in
      perf_event_attr.type and thus can't just ask for the same event on
      multiple CPUs (and thus PMUs). That is what we want on hybrid systems,
      but we can't do it when that encoding isn't understood by the kernel, such as
      on ARM64's big.LITTLE.
      
      If that is the case, fall back to the previous behaviour until we find a
      better solution to have consistent output across architectures with
      hybrid CPU configurations.
      
      Co-developed-with: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf print-events: Export is_event_supported() · e2be0666
      Arnaldo Carvalho de Melo authored
      Will be used when checking if we can encode the PMU number in
      perf_event_attr.type, part of the logic to use in hybrid systems
      (multiple types of CPUs, such as Intel's (Alder Lake, etc) or ARM's
      big.LITTLE).
      
      Co-developed-with: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test record+probe_libc_inet_pton.sh: Use "grep -F" instead of obsolescent "fgrep" · bb6b369c
      Tiezhu Yang authored
      The following warning appears when executing 'perf test record+probe_libc_inet_pton.sh':
      
        fgrep: warning: fgrep is obsolescent; using grep -F
      
      This was observed on Fedora 38 with grep 3.8.  Recent versions of grep
      report fgrep as obsolescent, so use "grep -F" instead of "fgrep" to
      silence the warning.
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: loongson-kernel@lists.loongnix.cn
      Link: https://lore.kernel.org/r/1686880567-30017-1-git-send-email-yangtiezhu@loongson.cn
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bb6b369c
    • Ravi Bangoria's avatar
      perf mem: Scan all PMUs instead of just core ones · 5752c20f
      Ravi Bangoria authored
      Scanning only core PMUs is not sufficient on platforms like AMD since
      perf mem on AMD uses IBS OP PMU, which is independent of core PMU.
      Scan all PMUs instead of just core PMUs. There should be negligible
      performance overhead because of scanning all PMUs, so we should be okay.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ali Saidi <alisaidi@amazon.com>
      Cc: Ananth Narayan <ananth.narayan@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Santosh Shukla <santosh.shukla@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230615051700.1833-4-ravi.bangoria@amd.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5752c20f
    • Ravi Bangoria's avatar
      perf mem amd: Fix perf_pmus__num_mem_pmus() · f0dc2082
      Ravi Bangoria authored
      perf mem/c2c on AMD internally uses the IBS OP PMU, not the core PMU.
      Also, AMD platforms do not have heterogeneous PMUs.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ali Saidi <alisaidi@amazon.com>
      Cc: Ananth Narayan <ananth.narayan@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Santosh Shukla <santosh.shukla@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230615051700.1833-3-ravi.bangoria@amd.com
      [ Added the improved comment for perf_pmus__num_mem_pmus() as b4 didn't pick it up from the newer per-patch (not series) version ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f0dc2082
    • Ravi Bangoria's avatar
      perf pmus: Describe semantics of 'core_pmus' and 'other_pmus' · cddfc5fb
      Ravi Bangoria authored
      The notions of 'core_pmus' and 'other_pmus' are independent of hw core
      and uncore PMUs. For example, AMD IBS PMUs are present in each SMT
      thread but they belong to 'other_pmus'. Add a comment describing what
      these lists contain and how they are treated.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ali Saidi <alisaidi@amazon.com>
      Cc: Ananth Narayan <ananth.narayan@amd.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Santosh Shukla <santosh.shukla@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230615051700.1833-2-ravi.bangoria@amd.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cddfc5fb
    • Namhyung Kim's avatar
      perf stat: Show average value on multiple runs · dada1a1f
      Namhyung Kim authored
      When the -r option is used, perf stat runs the command multiple times
      and updates the stats in evsel->stats.res_stats for global aggregation.
      But that value is never used, and the value printed at the end is just
      the one from the last run.  I think we should print the average over
      the multiple runs.
      
      Add evlist__copy_res_stats() to update the aggr counter (for display)
      using the values in the evsel->stats.res_stats.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230616073211.1057936-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      dada1a1f
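The averaging described in commit dada1a1f can be illustrated with a minimal sketch of an incremental mean over repeated runs. This is not perf's implementation: `struct run_stats` and the `run_stats_*` helpers are hypothetical names chosen for illustration, and only the mean is tracked here.

```c
#include <stdint.h>

/* Hypothetical accumulator for one counter across repeated runs. */
struct run_stats {
	double n;    /* number of runs folded in so far */
	double mean; /* running mean of the counter value */
};

/* Start with no samples. */
static void run_stats_init(struct run_stats *s)
{
	s->n = 0.0;
	s->mean = 0.0;
}

/* Fold the counter value of one run into the running mean. */
static void run_stats_update(struct run_stats *s, uint64_t val)
{
	s->n += 1.0;
	s->mean += ((double)val - s->mean) / s->n;
}

/* Value to display at the end: the mean over all runs, not the last run. */
static double run_stats_avg(const struct run_stats *s)
{
	return s->mean;
}
```

A caller would invoke `run_stats_update()` once per run and print `run_stats_avg()` after the final run, instead of printing the last run's raw count.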
    • Namhyung Kim's avatar
      perf stat: Reset aggr stats for each run · ed4090a2
      Namhyung Kim authored
      When running multiple times with the -r option, perf stat failed to
      reset the aggregation counters, so the values were added up.  The
      aggregation counters hold the values to be printed at the end, so they
      should be reset at the beginning of each run.  But the current code
      does that only when the -I/--interval-print option is given.
      
      Fixes: 91f85f98 ("perf stat: Display event stats using aggr counts")
      Reported-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230616073211.1057936-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ed4090a2
    • Thomas Richter's avatar
      perf test: fix failing test cases on linux-next for s390 · 6fbd67b0
      Thomas Richter authored
      In the linux-next tree, many test cases fail on s390x when running the
      perf test suite, and sometimes the perf tool dumps core.
      
      Output before:
        6.1: Test event parsing                               : FAILED!
       10.3: Parsing of PMU event table metrics               : FAILED!
       10.4: Parsing of PMU event table metrics with fake PMUs: FAILED!
       17: Setup struct perf_event_attr                       : FAILED!
       24: Number of exit events of a simple workload         : FAILED!
       26: Object code reading                                : FAILED!
       28: Use a dummy software event to keep tracking        : FAILED!
       35: Track with sched_switch                            : FAILED!
       42.3: BPF prologue generation                          : FAILED!
       66: Parse and process metrics                          : FAILED!
       68: Event expansion for cgroups                        : FAILED!
       69.2: Perf time to TSC                                 : FAILED!
       74: build id cache operations                          : FAILED!
       86: Zstd perf.data compression/decompression           : FAILED!
       87: perf record tests                                  : FAILED!
      106: Test java symbol                                   : FAILED!
      
      The reason for all these failures is a missing PMU. On s390x the PMU is
      named cpum_cf, which is not detected as a core PMU.  A similar patch was
      added before, see commit 9bacbced ("perf list: Add s390 support
      for detailed PMU event description"), but it got lost during the recent
      reworks. Add it again.
      
      Output after:
       10.2: PMU event map aliases                            : FAILED!
       42.3: BPF prologue generation                          : FAILED!
      
      Most test cases now work and there is no core dump anymore.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230616081437.1932003-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6fbd67b0
    • Vincent Whitchurch's avatar
      perf annotate: Work with vmlinux outside symfs · 66dc1920
      Vincent Whitchurch authored
      It is currently possible to use --symfs along with a vmlinux which lies
      outside of the symfs by passing an absolute path to --vmlinux, thanks to
      the check in dso__load_vmlinux() which handles this explicitly.
      
      However, the annotate code lacks this check and thus 'perf annotate'
      does not work ("Internal error: Invalid -1 error code") for kernel
      functions with this combination.  Add the missing handling.
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel@axis.com
      Link: https://lore.kernel.org/r/20221125114210.2353820-1-vincent.whitchurch@axis.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      66dc1920
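The path handling that commit 66dc1920 refers to can be sketched as follows: an absolute vmlinux path is used as given, while a relative one is resolved under the symfs directory. This is an illustration only. `resolve_vmlinux_path()` is a hypothetical helper, and testing the first character for '/' is an assumption about how such a check might look, not a copy of the code in dso__load_vmlinux().

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the path that should be opened into buf.
 * An absolute vmlinux path bypasses symfs; a relative path is joined
 * onto the symfs directory. Returns the snprintf() result.
 */
static int resolve_vmlinux_path(char *buf, size_t size,
				const char *symfs, const char *vmlinux)
{
	if (vmlinux[0] == '/')
		return snprintf(buf, size, "%s", vmlinux);

	return snprintf(buf, size, "%s/%s", symfs, vmlinux);
}
```

With a symfs of `/srv/symfs`, passing `/boot/vmlinux` leaves the path untouched, whereas `boot/vmlinux` is looked up inside the symfs tree.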
    • Kan Liang's avatar
      perf vendor events arm64: Add default tags for Hisi hip08 L1 metrics · f9625140
      Kan Liang authored
      Add the default tags for Hisi hip08 as well.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-6-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f9625140
    • Kan Liang's avatar
      perf test: Add test case for the standard 'perf stat' output · 99a04a48
      Kan Liang authored
      Add a new test case to verify the standard 'perf stat' output with
      different options.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-5-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      99a04a48
    • Kan Liang's avatar
      perf test: Move all the check functions of stat CSV output to lib · fc51fc87
      Kan Liang authored
      These functions can be shared with the stat std output test.
      
      There is no functional change.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-4-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      fc51fc87
    • Kan Liang's avatar
      perf stat: New metricgroup output for the default mode · 6a80d794
      Kan Liang authored
      In the default mode, the current output of the metricgroup includes both
      events and metrics, which is not necessary and just makes the output
      hard to read. Since different ARCHs (even different generations of the
      same ARCH) may use different events, the output also varies across
      platforms.
      
      For a metricgroup, only outputting the value of each metric is good
      enough.
      
      Add a new field default_metricgroup in evsel to indicate an event of the
      default metricgroup. For those events, printout() should print the
      metricgroup name rather than each event.
      
      Add perf_stat__skip_metric_event() to skip the evsel in the Default
      metricgroup, if it's not running or not the metric event.
      
      Add print_metricgroup_header_t to pass the functions which print the
      display name of each metricgroup in the Default metricgroup. Support all
      three output methods.
      
      Factor out perf_stat__print_shadow_stats_metricgroup() to print out each
      metric.
      
      On SPR:
      
      Before:
      
       ./perf_old stat sleep 1
      
       Performance counter stats for 'sleep 1':
      
                    0.54 msec task-clock:u                     #    0.001 CPUs utilized
                       0      context-switches:u               #    0.000 /sec
                       0      cpu-migrations:u                 #    0.000 /sec
                      68      page-faults:u                    #  125.445 K/sec
                 540,970      cycles:u                         #    0.998 GHz
                 556,325      instructions:u                   #    1.03  insn per cycle
                 123,602      branches:u                       #  228.018 M/sec
                   6,889      branch-misses:u                  #    5.57% of all branches
               3,245,820      TOPDOWN.SLOTS:u                  #     18.4 %  tma_backend_bound
                                                        #     17.2 %  tma_retiring
                                                        #     23.1 %  tma_bad_speculation
                                                        #     41.4 %  tma_frontend_bound
                 564,859      topdown-retiring:u
               1,370,999      topdown-fe-bound:u
                 603,271      topdown-be-bound:u
                 744,874      topdown-bad-spec:u
                  12,661      INT_MISC.UOP_DROPPING:u          #   23.357 M/sec
      
             1.001798215 seconds time elapsed
      
             0.000193000 seconds user
             0.001700000 seconds sys
      
      After:
      
      $ ./perf stat sleep 1
      
       Performance counter stats for 'sleep 1':
      
                    0.51 msec task-clock:u                     #    0.001 CPUs utilized
                       0      context-switches:u               #    0.000 /sec
                       0      cpu-migrations:u                 #    0.000 /sec
                      68      page-faults:u                    #  132.683 K/sec
                 545,228      cycles:u                         #    1.064 GHz
                 555,509      instructions:u                   #    1.02  insn per cycle
                 123,574      branches:u                       #  241.120 M/sec
                   6,957      branch-misses:u                  #    5.63% of all branches
                              TopdownL1                 #     17.5 %  tma_backend_bound
                                                        #     22.6 %  tma_bad_speculation
                                                        #     42.7 %  tma_frontend_bound
                                                        #     17.1 %  tma_retiring
                              TopdownL2                 #     21.8 %  tma_branch_mispredicts
                                                        #     11.5 %  tma_core_bound
                                                        #     13.4 %  tma_fetch_bandwidth
                                                        #     29.3 %  tma_fetch_latency
                                                        #      2.7 %  tma_heavy_operations
                                                        #     14.5 %  tma_light_operations
                                                        #      0.8 %  tma_machine_clears
                                                        #      6.1 %  tma_memory_bound
      
             1.001712086 seconds time elapsed
      
             0.000151000 seconds user
             0.001618000 seconds sys
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-3-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6a80d794
    • Kan Liang's avatar
      perf metrics: Sort the Default metricgroup · 1c0e4795
      Kan Liang authored
      The new default mode will print the metrics as a metric group. The
      metrics from the same metric group must be adjacent to each other in the
      metric list. But the metric_list_cmp() sorts metrics by the number of
      events.
      
      Add a new sort for the Default metricgroup, which sorts by
      default_metricgroup_name and metric_name.
      
      Add is_default in the struct metric_event to indicate that it's from
      the Default metricgroup.
      
      Store the displayed metricgroup name of the Default metricgroup into
      the metric expr for output.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230616031420.3751973-2-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1c0e4795
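The ordering that commit 1c0e4795 describes, group name first and metric name second, can be sketched with a plain comparator. This is an assumption-laden illustration rather than perf's code: `struct metric_entry`, `metric_entry_cmp()` and `sort_metrics()` are hypothetical, and `qsort()` on an array stands in for whatever list sort the tool actually uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry of the metric list. */
struct metric_entry {
	const char *group_name;  /* displayed metricgroup, e.g. "TopdownL1" */
	const char *metric_name; /* e.g. "tma_retiring" */
};

/* Order by group name first, then by metric name within a group. */
static int metric_entry_cmp(const void *a, const void *b)
{
	const struct metric_entry *l = a;
	const struct metric_entry *r = b;
	int diff = strcmp(l->group_name, r->group_name);

	return diff ? diff : strcmp(l->metric_name, r->metric_name);
}

/* Sort so that metrics of the same group end up adjacent to each other. */
static void sort_metrics(struct metric_entry *entries, size_t nr)
{
	qsort(entries, nr, sizeof(*entries), metric_entry_cmp);
}
```

Sorting this way keeps every metric of one group contiguous, which is what lets the output print the group name once followed by its metrics.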
    • Kan Liang's avatar
      perf tests: Update metric-value for perf stat JSON output · 18b687d7
      Kan Liang authored
      There may be multiplexing triggered, e.g., e-core of ADL.
      Reviewed-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20230615135315.3662428-7-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      18b687d7