1. 14 Oct, 2020 6 commits
  2. 13 Oct, 2020 16 commits
    • perf config: Export the perf_config_from_file() function · 79bbbabd
      Arnaldo Carvalho de Melo authored
      We'll use it to ask for extra config files to be loaded, profile-like
      stuff that will be used first to make 'perf trace' mimic 'strace' output
      via a 'perf strace' command that just sets up 'perf trace' output.
      
      At some point it'll be used for regression tests, where we'll run some
      simple commands like:
      
        perf strace ls > perf-strace.output
        strace ls > strace.output
      
      And then use some diff-like tool that is aware of mutable syscall args
      to deal with arguments, such as those of mmap, that change at each
      execution: first ignoring them, and then properly tracking them when
      they are used across multiple syscalls.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf python: Autodetect python3 binary · 79373082
      James Clark authored
      Some distros don't come with python2 and only have python3 available.
      This causes the "'import perf' in python" self test to fail.
      
      This change adds python3 to the list of possible python versions
      that are autodetected but maintains the priorities for
      'python2' and 'python' detection. Python3 has the lowest priority.
      
      Committer notes:
      
      On a Fedora system without python2 packages, 'perf test python'
      continues to work:
      
        # python2
        bash: python2: command not found...
        Similar command is: 'python'
        # rpm -qa | grep python2
        #
      
      That "Similar command" gives the clue:
      
        # rpm -qf /usr/bin/python
        python-unversioned-command-3.8.5-5.fc32.noarch
        # rpm -ql python-unversioned-command
        /usr/bin/python
        /usr/share/man/man1/python.1.gz
        #
      
      With it in place the 'python' binary is found and perf builds the python
      binding using python3:
      
        # perf test -v python
        19: 'import perf' in python                                         :
        --- start ---
        test child forked, pid 379988
        python usage test: "echo "import sys ; sys.path.append('/tmp/build/perf/python'); import perf" | '/usr/bin/python' "
        test child finished with 0
        ---- end ----
        'import perf' in python: Ok
        #
      
      Looking at that path:
      
        # ls -la /tmp/build/perf/python
        total 1864
        drwxrwxr-x.  2 acme acme      60 Oct 13 16:20 .
        drwxrwxr-x. 18 acme acme    4420 Oct 13 16:28 ..
        -rwxrwxr-x.  1 acme acme 1907216 Oct 13 16:28 perf.cpython-38-x86_64-linux-gnu.so
        #
      
      And:
      
        # ldd ~/bin/perf | grep python
        	libpython3.8.so.1.0 => /lib64/libpython3.8.so.1.0 (0x00007f5471187000)
        #
      
      As soon as we remove it:
      
        # rpm -e python-unversioned-command-3.8.5-5.fc32.noarch
        # hash -r
        # python
        bash: python: command not found...
        Install package 'python-unversioned-command' to provide command 'python'? [N/y] n
        #
      
      And rebuilding perf now doesn't find python in the system:
      
        make: Entering directory '/home/acme/git/perf/tools/perf'
          BUILD:   Doing 'make -j24' parallel build
        <SNIP>
        Makefile.config:786: No python interpreter was found: disables Python support - please install python-devel/python-dev
        <SNIP>
      
      After this patch:
      
        $ rpm -qi python-unversioned-command
        package python-unversioned-command is not installed
        $
        $ python
        bash: python: command not found...
        Install package 'python-unversioned-command' to provide command 'python'? [N/y] ^C
        $
        $ m
        make: Entering directory '/home/acme/git/perf/tools/perf'
          BUILD:   Doing 'make -j24' parallel build
        <SNIP>
          CC       /tmp/build/perf/tests/attr.o
          CC       /tmp/build/perf/tests/python-use.o
          DESCEND  plugins
          GEN      /tmp/build/perf/python/perf.so
          INSTALL  trace_plugins
          LD       /tmp/build/perf/tests/perf-in.o
          LD       /tmp/build/perf/perf-in.o
          LINK     /tmp/build/perf/perf
        <SNIP>
        make: Leaving directory '/home/acme/git/perf/tools/perf'
        19: 'import perf' in python                                         : Ok
        $ ldd ~/bin/perf | grep python
        	libpython3.8.so.1.0 => /lib64/libpython3.8.so.1.0 (0x00007f2c8c708000)
        $ ls -la /tmp/build/perf/python
        total 1864
        drwxrwxr-x.  2 acme acme      60 Oct 13 16:20 .
        drwxrwxr-x. 18 acme acme    4420 Oct 13 16:31 ..
        -rwxrwxr-x.  1 acme acme 1907216 Oct 13 16:31 perf.cpython-38-x86_64-linux-gnu.so
        $

      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LPU-Reference: 20201005080645.6588-1-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tests: Show python test script in verbose mode · 0fd0f00f
      Arnaldo Carvalho de Melo authored
      To help figure out where it is getting the binding.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf build: Allow nested externs to enable BUILD_BUG() usage · 6cf4ecf5
      Vasily Gorbik authored
      Currently the BUILD_BUG() macro expands to something like the following:
      
         do {
                 extern void __compiletime_assert_0(void)
                         __attribute__((error("BUILD_BUG failed")));
                 if (!(!(1)))
                         __compiletime_assert_0();
         } while (0);
      
      If used in a function body, this would obviously produce build errors
      with -Wnested-externs and -Werror.
      
      To enable BUILD_BUG() usage in tools/arch/x86/lib/insn.c, which perf
      includes in intel-pt-decoder, build perf without -Wnested-externs.

      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # build tested
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/patch-1.thread-251403.git-2514037e9477.your-ad-here.call-01602244460-ext-7088@work.hours
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Fix off by ones in memset() after realloc() in arches using libaudit · f3013f7e
      Jiri Slaby authored
      'perf trace ls' started crashing after commit d21cb73a on
      !HAVE_SYSCALL_TABLE_SUPPORT configs (armv7l here) like this:
      
        0  strlen () at ../sysdeps/arm/armv6t2/strlen.S:126
        1  0xb6800780 in __vfprintf_internal (s=0xbeff9908, s@entry=0xbeff9900, format=0xa27160 "]: %s()", ap=..., mode_flags=<optimized out>) at vfprintf-internal.c:1688
        ...
        5  0x0056ecdc in fprintf (__fmt=0xa27160 "]: %s()", __stream=<optimized out>) at /usr/include/bits/stdio2.h:100
        6  trace__sys_exit (trace=trace@entry=0xbeffc710, evsel=evsel@entry=0xd968d0, event=<optimized out>, sample=sample@entry=0xbeffc3e8) at builtin-trace.c:2475
        7  0x00566d40 in trace__handle_event (sample=0xbeffc3e8, event=<optimized out>, trace=0xbeffc710) at builtin-trace.c:3122
        ...
        15 main (argc=2, argv=0xbefff6e8) at perf.c:538
      
      This is because the memset() in trace__read_syscall_info() zeroes the wrong memory:
      
      1) when initializing for the first time, it does not reset the last id.
      
      2) in other cases, it resets the last id of the previous buffer.

      ad 1) it causes the crash above, as the sc->name used in the fprintf()
            above contains garbage.

      ad 2) it sets nonexistent from true back to false for id 11 here. I am
            not sure what the consequences are.
      
      So fix it by introducing a special case for the initial initialization
      and do the right +1 in both cases.
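
      The intended bookkeeping can be illustrated with a minimal,
      self-contained sketch. It is not the builtin-trace.c code: struct
      syscall_entry, struct syscall_table and table_grow() are hypothetical
      names, and the sketch assumes the table tracks the highest valid id,
      starting at -1 while empty, so that the initial allocation and later
      growth share one formula.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for perf's per-syscall information. */
struct syscall_entry {
	const char *name;
	int nonexistent;
};

struct syscall_table {
	struct syscall_entry *entries;
	int max_id;	/* highest valid id, or -1 while the table is empty */
};

/*
 * Grow the table so that 'id' becomes a valid index, zeroing only the new
 * slots. The table holds max_id + 1 entries, so the first new slot is
 * max_id + 1 and the slots to clear are [max_id + 1, id], inclusive.
 */
static int table_grow(struct syscall_table *t, int id)
{
	struct syscall_entry *p;
	int first_new = t->max_id + 1;	/* 0 on the initial allocation */

	if (id <= t->max_id)
		return 0;	/* already large enough */

	p = realloc(t->entries, (size_t)(id + 1) * sizeof(*p));
	if (p == NULL)
		return -1;	/* the old table is still valid */

	/* The "+ 1" makes sure entry 'id' itself is cleared too. */
	memset(p + first_new, 0, (size_t)(id - first_new + 1) * sizeof(*p));

	t->entries = p;
	t->max_id = id;
	return 0;
}
```

      With max_id starting at -1, the first call zeroes entries 0..id, and
      later calls zero exactly max_id + 1..id, leaving the previous last
      entry untouched.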
      
      Fixes: d21cb73a ("perf trace: Grow the syscall table as needed when using libaudit")
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20201001093419.15761-1-jslaby@suse.cz
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf c2c: Update usage for showing memory events · edac75a2
      Leo Yan authored
      Since commit b027cc6f ("perf c2c: Fix 'perf c2c record -e list' to
      show the default events used"), the "perf c2c" tool can show the memory
      events properly, so there is no reason to keep suggesting that users run
      "perf mem record -e list" to show the events.
      
      This patch updates the usage for showing memory events with command
      "perf c2c record -e list".

      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lore.kernel.org/r/20201011121022.22409-1-leo.yan@linaro.org
    • Merge branch 'perf/urgent' into perf/core · dbaa1b3d
      Arnaldo Carvalho de Melo authored
      To pick fixes that missed v5.9.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools lib traceevent: Hide non API functions · a41c3210
      Tzvetomir Stoyanov (VMware) authored
      There are internal library functions that are not declared static.
      They are used inside the library from different files. Hide them from
      library users, as they are not part of the API.
      These functions are made hidden and renamed without the "tep_" prefix:
       tep_free_plugin_paths
       tep_peek_char
       tep_buffer_init
       tep_get_input_buf_ptr
       tep_get_input_buf
       tep_read_token
       tep_free_token
       tep_free_event
       tep_free_format_field
       __tep_parse_format
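
      Assuming a GCC- or Clang-compatible compiler, a common way to hide such
      functions is the visibility("hidden") attribute, sketched below. The
      __hidden macro name and peek_char() are illustrative assumptions, not
      taken from the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper macro: with GCC or Clang, visibility("hidden") keeps a
 * non-static function callable from other object files linked into the same
 * shared library, but leaves it out of the library's dynamic symbol table.
 */
#define __hidden __attribute__((visibility("hidden")))

/*
 * Illustrative internal helper, named without the public "tep_" prefix.
 * Returns the byte at 'pos', or -1 past the end of the buffer.
 */
__hidden int peek_char(const char *buf, size_t len, size_t pos)
{
	return pos < len ? (unsigned char)buf[pos] : -1;
}
```

      Building this into a shared object and listing its dynamic symbols
      (for example with nm -D) should show the exported tep_* API but not
      peek_char().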
      
      Link: https://lore.kernel.org/linux-trace-devel/e4afdd82deb5e023d53231bb13e08dca78085fb0.camel@decadent.org.uk/
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: linux-trace-devel@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20200930110733.280534-1-tz.stoyanov@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched: Show start of latency as well · dc000c45
      Joel Fernandes (Google) authored
      The 'perf sched latency' tool is really useful at showing worst-case
      latencies that tasks encountered since wakeup. However, it shows only
      the end of the latency. Oftentimes the start of a latency is interesting,
      as it can show what else was going on at the time to cause the latency.
      I certainly find myself spending a lot of time backtracking to the start
      of the latency in "perf sched script".
      
      This patch therefore adds a new column, "Max delay start". Accordingly,
      it also renames "Maximum delay at" to "Max delay end", as that is easier
      to understand.
      
      Example of the new output:
      
        ----------------------------------------------------------------------------------------------------------------------------------
         Task                  | Runtime ms  | Switches | Avg delay ms  | Max delay ms   | Max delay start         | Max delay end       |
        ----------------------------------------------------------------------------------------------------------------------------------
         MediaScannerSer:11936 |  651.296 ms |    67978 | avg: 0.113 ms | max: 77.250 ms | max start: 477.691360 s | max end: 477.768610 s
         audio@2.0-servi:(3)   |    0.000 ms |     3440 | avg: 0.034 ms | max: 72.267 ms | max start: 477.697051 s | max end: 477.769318 s
         AudioOut_1D:8112      |    0.000 ms |     2588 | avg: 0.083 ms | max: 64.020 ms | max start: 477.710740 s | max end: 477.774760 s
         Time-limited te:14973 | 7966.090 ms |    24807 | avg: 0.073 ms | max: 15.563 ms | max start: 477.162746 s | max end: 477.178309 s
         surfaceflinger:8049   |    9.680 ms |      603 | avg: 0.063 ms | max: 13.275 ms | max start: 476.931791 s | max end: 476.945067 s
         HeapTaskDaemon:(3)    | 1588.830 ms |     7040 | avg: 0.065 ms | max:  6.880 ms | max start: 473.666043 s | max end: 473.672922 s
         mount-passthrou:(3)   | 1370.809 ms |    68904 | avg: 0.011 ms | max:  6.524 ms | max start: 478.090630 s | max end: 478.097154 s
         ReferenceQueueD:(3)   |   11.794 ms |     1725 | avg: 0.014 ms | max:  6.521 ms | max start: 476.119782 s | max end: 476.126303 s
         writer:14077          |   18.410 ms |     1427 | avg: 0.036 ms | max:  6.131 ms | max start: 474.169675 s | max end: 474.175805 s
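
      The new column is consistent with the existing ones: in the first row,
      477.768610 s - 77.250 ms = 477.691360 s. A minimal sketch of keeping
      both ends of the worst-case delay, assuming the delay is measured from
      wakeup to switch-in; struct task_latency and latency_update() are
      hypothetical, and the real builtin-sched.c bookkeeping may differ.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical per-task record; timestamps and delays in nanoseconds. */
struct task_latency {
	u64 max_delay;		/* worst wakeup-to-switch-in delay so far */
	u64 max_delay_start;	/* when that delay began (wakeup) */
	u64 max_delay_end;	/* when it ended (task switched in) */
};

/* Account one wakeup -> switch-in interval, keeping the worst case. */
static void latency_update(struct task_latency *l, u64 wakeup_ts,
			   u64 sched_in_ts)
{
	u64 delay = sched_in_ts - wakeup_ts;

	if (delay > l->max_delay) {
		l->max_delay = delay;
		l->max_delay_start = wakeup_ts;
		l->max_delay_end = sched_in_ts;
	}
}
```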

      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20200925235634.4089867-1-joel@joelfernandes.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events: Fix typos in power8 PMU events · 70830f97
      Sandipan Das authored
      This replaces the incorrectly spelled word "localtion" with "location"
      in some power8 PMU event descriptions.
      
      Fixes: 2a81fa3b ("perf vendor events: Add power8 PMU events")
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Link: http://lore.kernel.org/lkml/20201012050205.328523-1-sandipan@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bench: Run inject-build-id with --buildid-all option too · bf7ef5dd
      Namhyung Kim authored
      For comparison, it now runs the benchmark twice: once with regular -b
      and once with --buildid-all.
      
        $ perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 21.002 msec (+- 0.172 msec)
          Average time per event: 2.059 usec (+- 0.017 usec)
          Average memory usage: 8169 KB (+- 0 KB)
          Average build-id-all injection took: 19.543 msec (+- 0.124 msec)
          Average time per event: 1.916 usec (+- 0.012 usec)
          Average memory usage: 7348 KB (+- 0 KB)

      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201012070214.2074921-7-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Add --buildid-all option · 27c9c342
      Namhyung Kim authored
      Like 'perf record', we can speed up build-id processing even more by
      just using all DSOs. Then we don't need to look at all the sample events
      anymore. The following patch will update 'perf bench' to show the result
      of the --buildid-all option too.

      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Original-patch-by: Stephane Eranian <eranian@google.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201012070214.2074921-6-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Do not load map/dso when injecting build-id · e7b60c5a
      Namhyung Kim authored
      There is no need to load symbols in a DSO when injecting build-ids. I
      guess the reason was to check whether the DSO is a special file, like
      anon files. Use some helper functions in map.c to check for them before
      reading the build-id. Also pass the sample event's cpumode to the new
      build-id event.
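
      The idea of classifying a mapping by name before touching the file can
      be sketched as below. is_special_mapping() and its list of names are
      assumptions for illustration (anonymous memory, SysV shared memory, heap
      and stack mappings are typical cases without an on-disk ELF file); this
      is not the map.c code.

```c
#include <stdbool.h>
#include <string.h>

/* True if 's' starts with the string literal 'prefix'. */
#define HAS_PREFIX(s, prefix) (strncmp((s), (prefix), sizeof(prefix) - 1) == 0)

/*
 * Hypothetical check: mappings that are not backed by a regular ELF file
 * have no build-id to read, so they can be skipped without loading the DSO.
 * The list is illustrative, not exhaustive.
 */
static bool is_special_mapping(const char *filename)
{
	return strcmp(filename, "//anon") == 0 ||
	       HAS_PREFIX(filename, "/dev/zero") ||
	       HAS_PREFIX(filename, "/anon_hugepage") ||
	       HAS_PREFIX(filename, "/SYSV") ||
	       HAS_PREFIX(filename, "[stack") ||
	       strcmp(filename, "[heap]") == 0;
}
```

      A caller would only try to read a build-id when this returns false.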
      
      It sped up the benchmark from 25 to 21 msec on my laptop.
      Also the memory usage (Max RSS) went down by ~200 KB.
      
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 21.389 msec (+- 0.138 msec)
          Average time per event: 2.097 usec (+- 0.014 usec)
          Average memory usage: 8225 KB (+- 0 KB)
      
      Committer notes:
      
      Before:
      
        $ perf stat -r5 perf bench internals inject-build-id > /dev/null
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  4,020.56 msec task-clock:u              #    1.271 CPUs utilized            ( +-  0.74% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                   123,354      page-faults:u             #    0.031 M/sec                    ( +-  0.81% )
             7,119,951,568      cycles:u                  #    1.771 GHz                      ( +-  1.74% )  (83.27%)
               230,086,969      stalled-cycles-frontend:u #    3.23% frontend cycles idle     ( +-  1.97% )  (83.41%)
             1,168,298,765      stalled-cycles-backend:u  #   16.41% backend cycles idle      ( +-  1.13% )  (83.44%)
            11,173,083,669      instructions:u            #    1.57  insn per cycle
                                                          #    0.10  stalled cycles per insn  ( +-  1.58% )  (83.31%)
             2,413,908,936      branches:u                #  600.392 M/sec                    ( +-  1.69% )  (83.26%)
                46,576,289      branch-misses:u           #    1.93% of all branches          ( +-  2.20% )  (83.31%)
      
                    3.1638 +- 0.0309 seconds time elapsed  ( +-  0.98% )
      
        $
      
      After:
      
        $ perf stat -r5 perf bench internals inject-build-id > /dev/null
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  2,379.94 msec task-clock:u              #    1.473 CPUs utilized            ( +-  0.18% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                    62,584      page-faults:u             #    0.026 M/sec                    ( +-  0.07% )
             2,372,389,668      cycles:u                  #    0.997 GHz                      ( +-  0.29% )  (83.14%)
               106,937,862      stalled-cycles-frontend:u #    4.51% frontend cycles idle     ( +-  4.89% )  (83.20%)
               581,697,915      stalled-cycles-backend:u  #   24.52% backend cycles idle      ( +-  0.71% )  (83.47%)
             3,659,692,199      instructions:u            #    1.54  insn per cycle
                                                          #    0.16  stalled cycles per insn  ( +-  0.10% )  (83.63%)
               791,372,961      branches:u                #  332.518 M/sec                    ( +-  0.27% )  (83.39%)
                10,648,083      branch-misses:u           #    1.35% of all branches          ( +-  0.22% )  (83.16%)
      
                   1.61570 +- 0.00172 seconds time elapsed  ( +-  0.11% )
      
        $

      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Original-patch-by: Stephane Eranian <eranian@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/20201012070214.2074921-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Enter namespace when reading build-id · 336c95b2
      Namhyung Kim authored
      It should be in the proper mnt namespace when accessing the file.

      I think this was not a problem so far, since the build-id was actually
      already read via map__load() -> dso__load(). But I'd like to change
      that in the following commit.

      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20201012070214.2074921-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf inject: Add missing callbacks in perf_tool · 2946eced
      Namhyung Kim authored
      I found that some events (like PERF_RECORD_CGROUP) are not copied by
      perf inject due to missing callbacks. Let's add them.

      While at it, I've changed the order of the callbacks to match
      struct perf_tool so that we can compare them easily.
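
      Why a missing callback means a lost event can be shown with a toy
      dispatcher. Everything here (enum event_type, struct tool, repipe(),
      deliver()) is a hypothetical simplification, assuming an event type
      without a handler is simply skipped; it is not struct perf_tool.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down event types and callback table. */
enum event_type { EV_MMAP2, EV_SAMPLE, EV_CGROUP, EV_NR };

struct tool;
typedef int (*event_cb)(struct tool *tool, enum event_type type);

struct tool {
	event_cb handlers[EV_NR];	/* NULL means "no callback" */
	int repiped;			/* events copied to the output */
	int dropped;			/* events silently skipped */
};

/* Callback that forwards the event unchanged to the output. */
static int repipe(struct tool *tool, enum event_type type)
{
	(void)type;
	tool->repiped++;
	return 0;
}

/* Dispatch: an event type without a callback never reaches the output. */
static int deliver(struct tool *tool, enum event_type type)
{
	if (tool->handlers[type] == NULL) {
		tool->dropped++;
		return 0;
	}
	return tool->handlers[type](tool, type);
}
```

      Filling in the missing handler with repipe() is enough for the
      previously dropped event type to be copied as well.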

      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20201012070214.2074921-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bench: Add build-id injection benchmark · 0bf02a0d
      Namhyung Kim authored
      Sometimes I can see that 'perf record' piped into 'perf inject' takes a
      long time processing build-ids.

      So introduce an inject-build-id benchmark to the internals benchmark
      suite to measure its overhead regularly.
      
      It runs the 'perf inject' command internally and feeds it the given
      number of synthesized events (basically MMAP2 + SAMPLE).
      
        Usage: perf bench internals inject-build-id <options>
      
          -i, --iterations <n>  Number of iterations used to compute average (default: 100)
          -m, --nr-mmaps <n>    Number of mmap events for each iteration (default: 100)
          -n, --nr-samples <n>  Number of sample events per mmap event (default: 100)
          -v, --verbose         be more verbose (show iteration count, DSO name, etc)
      
      By default, it measures the average processing time of 100 MMAP2 events
      and 10000 SAMPLE events.  Below is a result on my laptop.
      
        $ perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 25.789 msec (+- 0.202 msec)
          Average time per event: 2.528 usec (+- 0.020 usec)
          Average memory usage: 8411 KB (+- 7 KB)
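
      Figures of the form "average (+- spread)" are commonly accumulated with
      running statistics over the iterations. A minimal sketch using Welford's
      online algorithm; the names are hypothetical, and whether perf bench
      reports a standard deviation or a standard error of the mean is not
      stated in this commit, so the choice below is an assumption. Taking
      sqrt() of run_stats_var_of_mean() would give the standard error.

```c
/* Running statistics via Welford's online algorithm (names hypothetical). */
struct run_stats {
	double n;
	double mean;
	double m2;	/* sum of squared deviations from the running mean */
};

/* Fold one measurement (e.g. one iteration's time in msec) into 's'. */
static void run_stats_update(struct run_stats *s, double val)
{
	double delta = val - s->mean;	/* distance from the old mean */

	s->n += 1.0;
	s->mean += delta / s->n;
	s->m2 += delta * (val - s->mean);	/* uses old and new mean */
}

/* Sample variance divided by n; its square root is the standard error. */
static double run_stats_var_of_mean(const struct run_stats *s)
{
	if (s->n < 2.0)
		return 0.0;
	return s->m2 / (s->n - 1.0) / s->n;
}
```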
      
      Committer testing:
      
        $ perf bench
        Usage:
        	perf bench [<common options>] <collection> <benchmark> [<options>]
      
                # List of all available benchmark collections:
      
                 sched: Scheduler and IPC benchmarks
               syscall: System call benchmarks
                   mem: Memory access benchmarks
                  numa: NUMA scheduling and MM benchmarks
                 futex: Futex stressing benchmarks
                 epoll: Epoll stressing benchmarks
             internals: Perf-internals benchmarks
                   all: All benchmarks
      
        $ perf bench internals
      
                # List of available benchmarks for collection 'internals':
      
            synthesize: Benchmark perf event synthesis
        kallsyms-parse: Benchmark kallsyms parsing
        inject-build-id: Benchmark build-id injection
      
        $ perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.202 msec (+- 0.059 msec)
          Average time per event: 1.392 usec (+- 0.006 usec)
          Average memory usage: 12650 KB (+- 10 KB)
          Average build-id-all injection took: 12.831 msec (+- 0.071 msec)
          Average time per event: 1.258 usec (+- 0.007 usec)
          Average memory usage: 11895 KB (+- 10 KB)
        $
      
        $ perf stat -r5 perf bench internals inject-build-id
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.380 msec (+- 0.056 msec)
          Average time per event: 1.410 usec (+- 0.006 usec)
          Average memory usage: 12608 KB (+- 11 KB)
          Average build-id-all injection took: 11.889 msec (+- 0.064 msec)
          Average time per event: 1.166 usec (+- 0.006 usec)
          Average memory usage: 11838 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.246 msec (+- 0.065 msec)
          Average time per event: 1.397 usec (+- 0.006 usec)
          Average memory usage: 12744 KB (+- 10 KB)
          Average build-id-all injection took: 12.019 msec (+- 0.066 msec)
          Average time per event: 1.178 usec (+- 0.006 usec)
          Average memory usage: 11963 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.321 msec (+- 0.067 msec)
          Average time per event: 1.404 usec (+- 0.007 usec)
          Average memory usage: 12690 KB (+- 10 KB)
          Average build-id-all injection took: 11.909 msec (+- 0.041 msec)
          Average time per event: 1.168 usec (+- 0.004 usec)
          Average memory usage: 11938 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.287 msec (+- 0.059 msec)
          Average time per event: 1.401 usec (+- 0.006 usec)
          Average memory usage: 12864 KB (+- 10 KB)
          Average build-id-all injection took: 11.862 msec (+- 0.058 msec)
          Average time per event: 1.163 usec (+- 0.006 usec)
          Average memory usage: 12103 KB (+- 10 KB)
        # Running 'internals/inject-build-id' benchmark:
          Average build-id injection took: 14.402 msec (+- 0.053 msec)
          Average time per event: 1.412 usec (+- 0.005 usec)
          Average memory usage: 12876 KB (+- 10 KB)
          Average build-id-all injection took: 11.826 msec (+- 0.061 msec)
          Average time per event: 1.159 usec (+- 0.006 usec)
          Average memory usage: 12111 KB (+- 10 KB)
      
         Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
      
                  4,267.48 msec task-clock:u              #    1.502 CPUs utilized            ( +-  0.14% )
                         0      context-switches:u        #    0.000 K/sec
                         0      cpu-migrations:u          #    0.000 K/sec
                   102,092      page-faults:u             #    0.024 M/sec                    ( +-  0.08% )
             3,894,589,578      cycles:u                  #    0.913 GHz                      ( +-  0.19% )  (83.49%)
               140,078,421      stalled-cycles-frontend:u #    3.60% frontend cycles idle     ( +-  0.77% )  (83.34%)
               948,581,189      stalled-cycles-backend:u  #   24.36% backend cycles idle      ( +-  0.46% )  (83.25%)
             5,835,587,719      instructions:u            #    1.50  insn per cycle
                                                          #    0.16  stalled cycles per insn  ( +-  0.21% )  (83.24%)
             1,267,423,636      branches:u                #  296.996 M/sec                    ( +-  0.22% )  (83.12%)
                17,484,290      branch-misses:u           #    1.38% of all branches          ( +-  0.12% )  (83.55%)
      
                   2.84176 +- 0.00222 seconds time elapsed  ( +-  0.08% )
      
        $

      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20201012070214.2074921-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 07 Oct, 2020 1 commit
    • perf stat: Fix out of bounds CPU map access when handling armv8_pmu events · bef69bd7
      Namhyung Kim authored
      It was reported that 'perf stat' crashed when used with armv8_pmu (CPU)
      events in task mode. As 'perf stat' uses an empty cpu map for task
      mode but armv8_pmu has its own cpu mask, it got confused about which map
      it should use when accessing file descriptors, and this causes segfaults:
      
        (gdb) bt
        #0  0x0000000000603fc8 in perf_evsel__close_fd_cpu (evsel=<optimized out>,
            cpu=<optimized out>) at evsel.c:122
        #1  perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at evsel.c:156
        #2  0x00000000004d4718 in evlist__close (evlist=0x70a7cb0) at util/evlist.c:1242
        #3  0x0000000000453404 in __run_perf_stat (argc=3, argc@entry=1, argv=0x30,
            argv@entry=0xfffffaea2f90, run_idx=119, run_idx@entry=1701998435)
            at builtin-stat.c:929
        #4  0x0000000000455058 in run_perf_stat (run_idx=1701998435, argv=0xfffffaea2f90,
            argc=1) at builtin-stat.c:947
        #5  cmd_stat (argc=1, argv=0xfffffaea2f90) at builtin-stat.c:2357
        #6  0x00000000004bb888 in run_builtin (p=p@entry=0x9764b8 <commands+288>,
            argc=argc@entry=4, argv=argv@entry=0xfffffaea2f90) at perf.c:312
        #7  0x00000000004bbb54 in handle_internal_command (argc=argc@entry=4,
            argv=argv@entry=0xfffffaea2f90) at perf.c:364
        #8  0x0000000000435378 in run_argv (argcp=<synthetic pointer>,
            argv=<synthetic pointer>) at perf.c:408
        #9  main (argc=4, argv=0xfffffaea2f90) at perf.c:538
      
      To fix this, I simply used the given cpu map unless the evsel actually
      is not a system-wide event (like uncore events).
      
      Fixes: 7736627b ("perf stat: Use affinity for closing file descriptors")
      Reported-by: Wei Li <liwei391@huawei.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Barry Song <song.bao.hua@hisilicon.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20201007081311.1831003-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 01 Oct, 2020 3 commits
    • Jiri Olsa
      perf python scripting: Fix printable strings in python3 scripts · 6fcd5ddc
      Jiri Olsa authored
      Hagen reported broken strings in python3 tracepoint scripts:
      
        make PYTHON=python3
        perf record -e sched:sched_switch -a -- sleep 5
        perf script --gen-script py
        perf script -s ./perf-script.py
      
        [..]
        sched__sched_switch      7 563231.759525792        0 swapper   prev_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), prev_pid=0, prev_prio=120, prev_state=, next_comm=bytearray(b'mutex-thread-co\x00'),
      
      The problem is in the is_printable_array() function, which does not take
      the trailing zero bytes into account and claims such strings are not
      printable, so the code creates a byte array instead of a string.
      
      Committer testing:
      
      After this fix:
      
        sched__sched_switch 3 484522.497072626  1158680 kworker/3:0-eve  prev_comm=kworker/3:0, prev_pid=1158680, prev_prio=120, prev_state=I, next_comm=swapper/3, next_pid=0, next_prio=120
        Sample: {addr=0, cpu=3, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1158680, tid=1158680, time=484522497072626, transaction=0, values=[(0, 0)], weight=0}

        sched__sched_switch 4 484522.497085610  1225814 perf             prev_comm=perf, prev_pid=1225814, prev_prio=120, prev_state=, next_comm=migration/4, next_pid=30, next_prio=0
        Sample: {addr=0, cpu=4, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1225814, tid=1225814, time=484522497085610, transaction=0, values=[(0, 0)], weight=0}
      
      Fixes: 249de6e0 ("perf script python: Fix string vs byte array resolving")
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20200928201135.3633850-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Arnaldo Carvalho de Melo
      perf trace: Use the autogenerated mmap 'prot' string/id table · 388968d8
      Arnaldo Carvalho de Melo authored
      No change in behaviour:
      
        # perf trace -e mmap sleep 1
             0.000 ( 0.009 ms): sleep/751870 mmap(len: 143317, prot: READ, flags: PRIVATE, fd: 3)                  = 0x7fa96d0f7000
             0.028 ( 0.004 ms): sleep/751870 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS)           = 0x7fa96d0f5000
             0.037 ( 0.005 ms): sleep/751870 mmap(len: 1872744, prot: READ, flags: PRIVATE|DENYWRITE, fd: 3)       = 0x7fa96cf2b000
             0.044 ( 0.011 ms): sleep/751870 mmap(addr: 0x7fa96cf50000, len: 1376256, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x25000) = 0x7fa96cf50000
             0.056 ( 0.007 ms): sleep/751870 mmap(addr: 0x7fa96d0a0000, len: 307200, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x175000) = 0x7fa96d0a0000
             0.064 ( 0.007 ms): sleep/751870 mmap(addr: 0x7fa96d0eb000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bf000) = 0x7fa96d0eb000
             0.075 ( 0.005 ms): sleep/751870 mmap(addr: 0x7fa96d0f1000, len: 13160, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7fa96d0f1000
             0.253 ( 0.005 ms): sleep/751870 mmap(len: 218049136, prot: READ, flags: PRIVATE, fd: 3)               = 0x7fa95ff38000
        #
        #
        # set -o vi
        # strace -e mmap sleep 1
        mmap(NULL, 143317, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f333bd83000
        mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f333bd81000
        mmap(NULL, 1872744, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f333bbb7000
        mmap(0x7f333bbdc000, 1376256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f333bbdc000
        mmap(0x7f333bd2c000, 307200, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x7f333bd2c000
        mmap(0x7f333bd77000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f333bd77000
        mmap(0x7f333bd7d000, 13160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f333bd7d000
        mmap(NULL, 218049136, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f332ebc4000
        +++ exited with 0 +++
        #
      
      You can also tweak 'perf trace' output to match strace's more
      closely:
      
        # perf config trace.show_arg_names=no
        # perf config trace.show_duration=no
        # perf config trace.show_prefix=yes
        # perf config trace.show_timestamp=no
        # perf config trace.show_zeros=yes
        # perf config trace.no_inherit=yes
        # perf trace -e mmap sleep 1
        mmap(NULL, 143317, PROT_READ, MAP_PRIVATE, 3, 0)                      = 0x7f0d287ca000
        mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS)     = 0x7f0d287c8000
        mmap(NULL, 1872744, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0)       = 0x7f0d285fe000
        mmap(0x7f0d28623000, 1376256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f0d28623000
        mmap(0x7f0d28773000, 307200, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x7f0d28773000
        mmap(0x7f0d287be000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f0d287be000
        mmap(0x7f0d287c4000, 13160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f0d287c4000
        mmap(NULL, 218049136, PROT_READ, MAP_PRIVATE, 3, 0)                   = 0x7f0d1b60b000
        #
      
        # perf config | grep ^trace
        trace.show_arg_names=no
        trace.show_duration=no
        trace.show_prefix=yes
        trace.show_timestamp=no
        trace.show_zeros=yes
        trace.no_inherit=yes
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Arnaldo Carvalho de Melo
      tools beauty: Add script to generate table of mmap's 'prot' argument · 08fc4762
      Arnaldo Carvalho de Melo authored
      This will be wired up in the following csets:
      
        $ tools/perf/trace/beauty/mmap_prot.sh
        static const char *mmap_prot[] = {
        	[ilog2(0x1) + 1] = "READ",
        #ifndef PROT_READ
        #define PROT_READ 0x1
        #endif
        	[ilog2(0x2) + 1] = "WRITE",
        #ifndef PROT_WRITE
        #define PROT_WRITE 0x2
        #endif
        	[ilog2(0x4) + 1] = "EXEC",
        #ifndef PROT_EXEC
        #define PROT_EXEC 0x4
        #endif
        	[ilog2(0x8) + 1] = "SEM",
        #ifndef PROT_SEM
        #define PROT_SEM 0x8
        #endif
        	[ilog2(0x01000000) + 1] = "GROWSDOWN",
        #ifndef PROT_GROWSDOWN
        #define PROT_GROWSDOWN 0x01000000
        #endif
        	[ilog2(0x02000000) + 1] = "GROWSUP",
        #ifndef PROT_GROWSUP
        #define PROT_GROWSUP 0x02000000
        #endif
        };
        $
        $
        $
        $ tools/perf/trace/beauty/mmap_prot.sh alpha
        static const char *mmap_prot[] = {
        	[ilog2(0x4) + 1] = "EXEC",
        #ifndef PROT_EXEC
        #define PROT_EXEC 0x4
        #endif
        	[ilog2(0x01000000) + 1] = "GROWSDOWN",
        #ifndef PROT_GROWSDOWN
        #define PROT_GROWSDOWN 0x01000000
        #endif
        	[ilog2(0x02000000) + 1] = "GROWSUP",
        #ifndef PROT_GROWSUP
        #define PROT_GROWSUP 0x02000000
        #endif
        	[ilog2(0x1) + 1] = "READ",
        #ifndef PROT_READ
        #define PROT_READ 0x1
        #endif
        	[ilog2(0x8) + 1] = "SEM",
        #ifndef PROT_SEM
        #define PROT_SEM 0x8
        #endif
        	[ilog2(0x2) + 1] = "WRITE",
        #ifndef PROT_WRITE
        #define PROT_WRITE 0x2
        #endif
        };
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 30 Sep, 2020 1 commit
    • Arnaldo Carvalho de Melo
      perf beauty mmap_flags: Conditionally define the mmap flags · 61693228
      Arnaldo Carvalho de Melo authored
      So that on older systems we still get the defines used by the mmap flags
      scnprintf routines:
      
        $ tools/perf/trace/beauty/mmap_flags.sh  | head -9 2> /dev/null
        static const char *mmap_flags[] = {
        	[ilog2(0x40) + 1] = "32BIT",
        #ifndef MAP_32BIT
        #define MAP_32BIT 0x40
        #endif
        	[ilog2(0x01) + 1] = "SHARED",
        #ifndef MAP_SHARED
        #define MAP_SHARED 0x01
        #endif
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 29 Sep, 2020 2 commits
    • Arnaldo Carvalho de Melo
      perf trace beauty: Add script to autogenerate mremap's flags args string/id table · 9012e3dd
      Arnaldo Carvalho de Melo authored
      It'll also conditionally generate the defines, so that if we don't have
      them when building a new tool tarball on an older system, we still get
      them. We sometimes need them in the actual scnprintf routine, such as
      when checking whether a flag means there is an extra argument, as with
      MREMAP_FIXED.
      
        $ tools/perf/trace/beauty/mremap_flags.sh
        static const char *mremap_flags[] = {
        	[ilog2(1) + 1] = "MAYMOVE",
        #ifndef MREMAP_MAYMOVE
        #define MREMAP_MAYMOVE 1
        #endif
        	[ilog2(2) + 1] = "FIXED",
        #ifndef MREMAP_FIXED
        #define MREMAP_FIXED 2
        #endif
        	[ilog2(4) + 1] = "DONTUNMAP",
        #ifndef MREMAP_DONTUNMAP
        #define MREMAP_DONTUNMAP 4
        #endif
        };
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Arnaldo Carvalho de Melo
      perf tools: Separate the checking of headers only used to build beautification tables · d758d5d4
      Arnaldo Carvalho de Melo authored
      Some headers are not used to build the tools directly, but instead to
      generate tables that then get source-code included to do id->string and
      string->id lookups for things like syscall flags and commands.
      
      We were adding them directly to tools/include/, and this sometimes gets
      in the way of building using system headers, so let's untangle this a bit.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 28 Sep, 2020 11 commits