1. 08 Sep, 2022 1 commit
    • libperf evlist: Fix per-thread mmaps for multi-threaded targets · 7864d8f7
      Adrian Hunter authored
      The offending commit removed mmap_per_thread() without considering the
      different set-output rules for per-thread mmaps: in the per-thread case,
      set-output is used for file descriptors of the same thread, not of the
      same CPU.
      
      This was not immediately noticed because it only happens with
      multi-threaded targets and we do not have a test for that yet.
      
      Reinstate mmap_per_thread(), expanding it to also cover system-wide
      per-cpu events, so that mixing per-thread and per-cpu mmaps continues to
      be allowed.
      
      Debug messages (with -vv) show the file descriptors that are opened with
      sys_perf_event_open. New debug messages (requiring -vvv) are added that
      also show which file descriptors are mmapped and which are redirected
      with set-output.
      
      In the per-cpu case (cpu != -1) file descriptors for the same CPU are
      set-output to the first file descriptor for that CPU.
      
      In the per-thread case (cpu == -1) file descriptors for the same thread are
      set-output to the first file descriptor for that thread.
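
      The two set-output rules above can be modelled with a small sketch. It
      assumes a flat list of opened fds, each tagged with its CPU (-1 for
      per-thread) and thread index; struct open_fd, plan_set_output() and
      print_plan() are hypothetical names for illustration, not libperf code.
      Applying the per-CPU rule to cpu == -1 instead (treating "-1" as one
      CPU) would redirect fd 6 to fd 5, which is consistent with the EINVAL
      in the "Before" log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one opened event fd (not a libperf type). */
struct open_fd {
	int fd;     /* value returned by sys_perf_event_open */
	int cpu;    /* -1 means the event follows a thread on any CPU */
	int thread; /* index of the monitored thread */
};

/*
 * Fill target[i] with the fd that fds[i] should write into: its own fd if
 * it is the first fd for its CPU (cpu != -1) or for its thread (cpu == -1),
 * otherwise the fd of that first entry (a set-output redirection).
 */
static void plan_set_output(const struct open_fd *fds, size_t n, int *target)
{
	assert(n == 0 || (fds != NULL && target != NULL));
	for (size_t i = 0; i < n; i++) {
		target[i] = fds[i].fd; /* default: mmap this fd itself */
		for (size_t j = 0; j < i; j++) {
			int same_cpu = fds[i].cpu != -1 && fds[j].cpu == fds[i].cpu;
			int same_thread = fds[i].cpu == -1 && fds[j].cpu == -1 &&
					  fds[j].thread == fds[i].thread;
			if (same_cpu || same_thread) {
				/* the earliest match is the first fd for this key */
				target[i] = fds[j].fd;
				break;
			}
		}
	}
}

/* Print the plan loosely in the style of the -vvv messages quoted here. */
static void print_plan(const struct open_fd *fds, size_t n, const int *target)
{
	for (size_t i = 0; i < n; i++) {
		if (target[i] == fds[i].fd)
			printf("mmapping fd %d\n", fds[i].fd);
		else
			printf("set output fd %d -> %d\n", fds[i].fd, target[i]);
	}
}
```

      For the per-thread example ({5, -1, 0} and {6, -1, 1}) both fds are
      mmapped; for the per-cpu example, each CPU's second fd is redirected to
      the first fd of the same CPU.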
      
      Example (process 17489 has 2 threads):
      
       Before (but with new debug prints):
      
         $ perf record --no-bpf-event -vvv --per-thread -p 17489
         <SNIP>
         sys_perf_event_open: pid 17489  cpu -1  group_fd -1  flags 0x8 = 5
         sys_perf_event_open: pid 17490  cpu -1  group_fd -1  flags 0x8 = 6
         <SNIP>
         libperf: idx 0: mmapping fd 5
         libperf: idx 0: set output fd 6 -> 5
         failed to mmap with 22 (Invalid argument)
      
       After:
      
         $ perf record --no-bpf-event -vvv --per-thread -p 17489
         <SNIP>
         sys_perf_event_open: pid 17489  cpu -1  group_fd -1  flags 0x8 = 5
         sys_perf_event_open: pid 17490  cpu -1  group_fd -1  flags 0x8 = 6
         <SNIP>
         libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
         libperf: idx 0: mmapping fd 5
         libperf: idx 1: mmapping fd 6
         <SNIP>
         [ perf record: Woken up 2 times to write data ]
         [ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]
      
      Per-cpu example (process 20341 has 2 threads, same as above):
      
         $ perf record --no-bpf-event -vvv -p 20341
         <SNIP>
         sys_perf_event_open: pid 20341  cpu 0  group_fd -1  flags 0x8 = 5
         sys_perf_event_open: pid 20342  cpu 0  group_fd -1  flags 0x8 = 6
         sys_perf_event_open: pid 20341  cpu 1  group_fd -1  flags 0x8 = 7
         sys_perf_event_open: pid 20342  cpu 1  group_fd -1  flags 0x8 = 8
         sys_perf_event_open: pid 20341  cpu 2  group_fd -1  flags 0x8 = 9
         sys_perf_event_open: pid 20342  cpu 2  group_fd -1  flags 0x8 = 10
         sys_perf_event_open: pid 20341  cpu 3  group_fd -1  flags 0x8 = 11
         sys_perf_event_open: pid 20342  cpu 3  group_fd -1  flags 0x8 = 12
         sys_perf_event_open: pid 20341  cpu 4  group_fd -1  flags 0x8 = 13
         sys_perf_event_open: pid 20342  cpu 4  group_fd -1  flags 0x8 = 14
         sys_perf_event_open: pid 20341  cpu 5  group_fd -1  flags 0x8 = 15
         sys_perf_event_open: pid 20342  cpu 5  group_fd -1  flags 0x8 = 16
         sys_perf_event_open: pid 20341  cpu 6  group_fd -1  flags 0x8 = 17
         sys_perf_event_open: pid 20342  cpu 6  group_fd -1  flags 0x8 = 18
         sys_perf_event_open: pid 20341  cpu 7  group_fd -1  flags 0x8 = 19
         sys_perf_event_open: pid 20342  cpu 7  group_fd -1  flags 0x8 = 20
         <SNIP>
         libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
         libperf: idx 0: mmapping fd 5
         libperf: idx 0: set output fd 6 -> 5
         libperf: idx 1: mmapping fd 7
         libperf: idx 1: set output fd 8 -> 7
         libperf: idx 2: mmapping fd 9
         libperf: idx 2: set output fd 10 -> 9
         libperf: idx 3: mmapping fd 11
         libperf: idx 3: set output fd 12 -> 11
         libperf: idx 4: mmapping fd 13
         libperf: idx 4: set output fd 14 -> 13
         libperf: idx 5: mmapping fd 15
         libperf: idx 5: set output fd 16 -> 15
         libperf: idx 6: mmapping fd 17
         libperf: idx 6: set output fd 18 -> 17
         libperf: idx 7: mmapping fd 19
         libperf: idx 7: set output fd 20 -> 19
         <SNIP>
         [ perf record: Woken up 7 times to write data ]
         [ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]
      
      Fixes: ae4f8ae1 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
      Reported-by: Tomáš Trnka <trnka@scm.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216441
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220905114209.8389-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 06 Sep, 2022 4 commits
  3. 02 Sep, 2022 1 commit
    • perf stat: Fix L2 Topdown metrics disappear for raw events · f0c86a2b
      Zhengjun Xing authored
      In perf/Documentation/perf-stat.txt, the default value "0" for
      "--td-level" means the maximum level that the current hardware supports.
      
      So stat_config.topdown_level needs to be initialized to TOPDOWN_MAX_LEVEL
      when "--td-level=0" is given or the "--td-level" option is absent.
      Otherwise, on hardware whose max level is 2, the level-2 metrics
      disappear for raw events.
      
      The issue cannot be observed with the perf stat default events or with
      the "--topdown" option. This commit fixes the raw-events case and removes
      the now-duplicated code for the perf stat default.
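
      A minimal sketch of the default-level rule, assuming the option value is
      0 both for "--td-level=0" and when the option is absent. HW_MAX_TD_LEVEL
      is a hypothetical stand-in for TOPDOWN_MAX_LEVEL, and the helper names
      are illustrative, not perf's. Before the fix, the raw-events path
      effectively kept the level at 0, so level-2 metrics were not shown.

```c
#include <assert.h>

/* Hypothetical stand-in for TOPDOWN_MAX_LEVEL on this model hardware. */
#define HW_MAX_TD_LEVEL 2

/* td_level_opt is 0 both for "--td-level=0" and when the option is absent. */
static int effective_td_level(int td_level_opt)
{
	assert(td_level_opt >= 0);
	/* 0 means "the max level that the current hardware supports" */
	return td_level_opt == 0 ? HW_MAX_TD_LEVEL : td_level_opt;
}

/* Level-2 metrics (heavy operations, fetch latency, ...) need level >= 2. */
static int shows_level2_metrics(int td_level_opt)
{
	return effective_td_level(td_level_opt) >= 2;
}
```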
      
      Before:
      
       # ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1
      
       Performance counter stats for 'sleep 1':
      
                    1.03 msec cpu-clock                        #    0.001 CPUs utilized
                       1      context-switches                 #  966.216 /sec
                       0      cpu-migrations                   #    0.000 /sec
                      60      page-faults                      #   57.973 K/sec
               1,132,112      instructions                     #    1.41  insn per cycle
                 803,872      cycles                           #    0.777 GHz
               1,909,120      ref-cycles                       #    1.845 G/sec
                 236,634      branches                         #  228.640 M/sec
                   6,367      branch-misses                    #    2.69% of all branches
               4,823,232      slots                            #    4.660 G/sec
               1,210,536      topdown-retiring                 #     25.1% Retiring
                 699,841      topdown-bad-spec                 #     14.5% Bad Speculation
               1,777,975      topdown-fe-bound                 #     36.9% Frontend Bound
               1,134,878      topdown-be-bound                 #     23.5% Backend Bound
                 189,146      topdown-heavy-ops                #  182.756 M/sec
                 662,012      topdown-br-mispredict            #  639.647 M/sec
               1,097,048      topdown-fetch-lat                #    1.060 G/sec
                 416,121      topdown-mem-bound                #  402.063 M/sec
      
             1.002423690 seconds time elapsed
      
             0.002494000 seconds user
             0.000000000 seconds sys
      
      After:
      
       # ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1
      
       Performance counter stats for 'sleep 1':
      
                    1.13 msec cpu-clock                        #    0.001 CPUs utilized
                       1      context-switches                 #  882.128 /sec
                       0      cpu-migrations                   #    0.000 /sec
                      61      page-faults                      #   53.810 K/sec
               1,137,612      instructions                     #    1.29  insn per cycle
                 881,477      cycles                           #    0.778 GHz
               2,093,496      ref-cycles                       #    1.847 G/sec
                 236,356      branches                         #  208.496 M/sec
                   7,090      branch-misses                    #    3.00% of all branches
               5,288,862      slots                            #    4.665 G/sec
               1,223,697      topdown-retiring                 #     23.1% Retiring
                 767,403      topdown-bad-spec                 #     14.5% Bad Speculation
               2,053,322      topdown-fe-bound                 #     38.8% Frontend Bound
               1,244,438      topdown-be-bound                 #     23.5% Backend Bound
                 186,665      topdown-heavy-ops                #      3.5% Heavy Operations       #     19.6% Light Operations
                 725,922      topdown-br-mispredict            #     13.7% Branch Mispredict      #      0.8% Machine Clears
               1,327,400      topdown-fetch-lat                #     25.1% Fetch Latency          #     13.7% Fetch Bandwidth
                 497,775      topdown-mem-bound                #      9.4% Memory Bound           #     14.1% Core Bound
      
             1.002701530 seconds time elapsed
      
             0.002744000 seconds user
             0.000000000 seconds sys
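
      The percentages in the "After" output are consistent with a simple
      scheme: each topdown-* count divided by slots, with each second column
      being its level-1 parent minus the measured level-2 child (for example
      23.1% Retiring - 3.5% Heavy Operations = 19.6% Light Operations). The
      sketch below only reproduces that arithmetic; the structs are
      hypothetical and perf's real metric code may compute it differently.

```c
#include <assert.h>

/* Counts as printed by perf stat (hypothetical struct, not a perf type). */
struct td_counts {
	double slots;
	double retiring, bad_spec, fe_bound, be_bound;
	double heavy_ops, br_mispredict, fetch_lat, mem_bound;
};

/* Fractions of slots; each level-2 pair splits one level-1 category. */
struct td_metrics {
	double retiring, bad_spec, fe_bound, be_bound;
	double heavy_ops, light_ops;
	double br_mispredict, machine_clears;
	double fetch_lat, fetch_bw;
	double mem_bound, core_bound;
};

static struct td_metrics td_compute(const struct td_counts *c)
{
	struct td_metrics m;

	assert(c->slots > 0);
	/* Level 1: share of all pipeline slots. */
	m.retiring = c->retiring / c->slots;
	m.bad_spec = c->bad_spec / c->slots;
	m.fe_bound = c->fe_bound / c->slots;
	m.be_bound = c->be_bound / c->slots;
	/* Level 2: the measured child, plus the remainder of its parent. */
	m.heavy_ops = c->heavy_ops / c->slots;
	m.light_ops = m.retiring - m.heavy_ops;
	m.br_mispredict = c->br_mispredict / c->slots;
	m.machine_clears = m.bad_spec - m.br_mispredict;
	m.fetch_lat = c->fetch_lat / c->slots;
	m.fetch_bw = m.fe_bound - m.fetch_lat;
	m.mem_bound = c->mem_bound / c->slots;
	m.core_bound = m.be_bound - m.mem_bound;
	return m;
}
```

      Fed with the "After" counts, this gives about 19.6% light operations,
      0.8% machine clears, 13.7% fetch bandwidth and 14.1% core bound.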
      
      Fixes: 63e39aa6 ("perf stat: Support L2 Topdown events")
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220826140057.3289401-1-zhengjun.xing@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 31 Aug, 2022 2 commits
    • perf script: Skip dummy event attr check · 35503ce1
      Jiri Olsa authored
      Hongtao Yu reported a problem when displaying uregs in perf script
      for a system-wide perf.data:
      
        # perf script -F uregs | head -10
        Samples for 'dummy:HG' event do not have UREGS attribute set. Cannot print 'uregs' field.
      
      The problem is the extra dummy event added in system-wide mode,
      which does not have a proper sample_type set up.
      
      Skip the attr check completely for the dummy event, as suggested
      by Namhyung, because it does not have any samples anyway.
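
      A minimal sketch of the idea, using a hypothetical simplified event
      description rather than perf's struct evsel: the per-field attribute
      check returns early for dummy events, since they never produce samples.
      SAMPLE_UREGS_BIT is a hypothetical flag standing in for the real
      user-registers sample_type bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit standing in for the "user registers" sample_type flag. */
#define SAMPLE_UREGS_BIT ((uint64_t)1 << 0)

/* Hypothetical, simplified event description (not perf's struct evsel). */
struct event_desc {
	bool is_dummy;        /* e.g. the extra 'dummy:HG' event */
	uint64_t sample_type; /* which fields this event's samples carry */
};

/* Returns 0 if 'field_bit' may be printed for this event, -1 otherwise. */
static int check_field_attr(const struct event_desc *ev, uint64_t field_bit)
{
	/* A dummy event never has samples, so there is nothing to check. */
	if (ev->is_dummy)
		return 0;
	return (ev->sample_type & field_bit) ? 0 : -1;
}
```
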
      Reported-by: Hongtao Yu <hoy@fb.com>
      Suggested-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220831124041.219925-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf metric: Return early if no CPU PMU table exists · 3f5df3ac
      Ian Rogers authored
      Previously, perf would segfault if there was no CPU PMU table and a
      metric was requested. To reproduce, compile with NO_JEVENTS=1 and then
      request a metric, for example "perf stat -M IPC true".
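
      The shape of the fix is a NULL check before the table is used. In this
      sketch, struct metric_table and find_metric() are hypothetical, with a
      NULL table modelling a build without a CPU PMU table (NO_JEVENTS=1);
      the -EINVAL/-ENOENT return codes are an illustrative choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical metric table; NULL when no CPU PMU table was built in. */
struct metric_table {
	const char *names[4]; /* unused trailing slots are NULL */
};

static int find_metric(const struct metric_table *table, const char *name)
{
	/* Return early instead of dereferencing a NULL table. */
	if (table == NULL)
		return -EINVAL;
	for (size_t i = 0; i < 4 && table->names[i] != NULL; i++) {
		if (strcmp(table->names[i], name) == 0)
			return 0;
	}
	return -ENOENT;
}
```

      With the early return, the caller gets an error it can report (here,
      perf stat prints its usage) instead of crashing.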
      
      Committer testing:
      
      Before:
      
        $ make -k NO_JEVENTS=1 BUILD_BPF_SKEL=1 O=/tmp/build/perf-urgent -C tools/perf install-bin
        $ perf stat -M IPC true
        Segmentation fault (core dumped)
        $
      
      After:
      
        $ perf stat -M IPC true
      
         Usage: perf stat [<options>] [<command>]
      
            -M, --metrics <metric/metric group list>
                                  monitor specified metrics or metric groups (separated by ,)
        $
      
      Fixes: 00facc76 ("perf jevents: Switch build to use jevents.py")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ian Rogers <rogers.email@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Miaoqian Lin <linmq006@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Link: https://lore.kernel.org/r/20220830164846.401143-3-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 29 Aug, 2022 2 commits
  6. 28 Aug, 2022 25 commits
  7. 27 Aug, 2022 5 commits