1. 17 Jul, 2020 5 commits
    • perf metric: Rename expr__add_id() to expr__add_val() · 2c46f542
      Jiri Olsa authored
      Rename expr__add_id() to expr__add_val() so we can use expr__add_id() to
      actually add just the id without any value in following changes.
      
      There's no functional change.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200712132634.138901-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Warn if the target function is a GNU indirect function · 3de2bf9d
      Masami Hiramatsu authored
      Warn if the probe target function is a GNU indirect function (GNU_IFUNC)
      because it may not be what the user wants to probe.
      
      A GNU indirect function ( https://sourceware.org/glibc/wiki/GNU_IFUNC )
      is a dynamic symbol that is resolved at run time. The IFUNC itself is a
      selector which is invoked from the ELF loader, but in the symbol table
      the function that the IFUNC selects has the same symbol address as the
      IFUNC. This can confuse users trying to probe such functions.
      
      For example, memcpy is an IFUNC.
      
        probe_libc:memcpy    (on __new_memcpy_ifunc@x86_64/multiarch/memcpy.c in /usr/lib64/libc-2.30.so)
      
      The probe is put on the IFUNC itself:
      
        perf  1742 [000] 26201.715632: probe_libc:memcpy: (7fdaa53824c0)
                    7fdaa53824c0 __new_memcpy_ifunc+0x0 (inlined)
                    7fdaa5d4a980 elf_machine_rela+0x6c0 (inlined)
                    7fdaa5d4a980 elf_dynamic_do_Rela+0x6c0 (inlined)
                    7fdaa5d4a980 _dl_relocate_object+0x6c0 (/usr/lib64/ld-2.30.so)
                    7fdaa5d42155 dl_main+0x1cc5 (/usr/lib64/ld-2.30.so)
                    7fdaa5d5831a _dl_sysdep_start+0x54a (/usr/lib64/ld-2.30.so)
                    7fdaa5d3ffeb _dl_start_final+0x25b (inlined)
                    7fdaa5d3ffeb _dl_start+0x25b (/usr/lib64/ld-2.30.so)
                    7fdaa5d3f117 .annobin_rtld.c+0x7 (inlined)
      
      And the event is invoked from the ELF loader instead of the target
      program's main code.
      
      Moreover, at this moment we cannot probe the function that will be
      selected by the IFUNC, because it is determined at run time, whereas the
      uprobe is prepared before the target binary runs.
      
      Thus, I decided to warn the user when 'perf probe' detects that the probe
      point is on a GNU IFUNC symbol. Someone who wants to probe an IFUNC
      symbol to debug the IFUNC function can ignore this warning.
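      
      For readers unfamiliar with the mechanism, here is a minimal sketch of
      how an IFUNC can be declared with GCC's ifunc attribute on an ELF/glibc
      target. The names count_bytes, count_bytes_generic and
      resolve_count_bytes are hypothetical and are not taken from glibc; a
      real resolver would normally choose between several implementations
      based on CPU features.

```c
#include <stddef.h>

/* Hypothetical implementation that the resolver hands out. */
static size_t count_bytes_generic(const char *s)
{
	size_t n = 0;

	while (s[n] != '\0')
		n++;
	return n;
}

/*
 * Resolver: run by the dynamic loader while it processes relocations,
 * before main(). It returns the function that later calls will reach.
 * This sketch has only one candidate, so the choice is unconditional.
 */
static size_t (*resolve_count_bytes(void))(const char *)
{
	return count_bytes_generic;
}

/*
 * The exported symbol has type IFUNC and its symbol-table address is
 * the resolver, which is why a probe on it fires in the loader.
 */
size_t count_bytes(const char *s)
	__attribute__((ifunc("resolve_count_bytes")));
```

      Running 'readelf -sW' on an object built from this sketch should list
      count_bytes with type IFUNC, like the libc symbols shown in the
      committer notes.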
      
      Committer notes:
      
      I.e., this warning will be emitted if the probe point is an IFUNC:
      
        "Warning: The probe function (%s) is a GNU indirect function.\n"
        "Consider identifying the final function used at run time and set the probe directly on that.\n"
      
      Complete set of steps:
      
        # readelf -sW /lib64/libc-2.29.so  | grep IFUNC | tail
         22196: 0000000000109a80   183 IFUNC   GLOBAL DEFAULT   14 __memcpy_chk
         22214: 00000000000b7d90   191 IFUNC   GLOBAL DEFAULT   14 __gettimeofday
         22336: 000000000008b690    60 IFUNC   GLOBAL DEFAULT   14 memchr
         22350: 000000000008b9b0    89 IFUNC   GLOBAL DEFAULT   14 __stpcpy
         22420: 000000000008bb10    76 IFUNC   GLOBAL DEFAULT   14 __strcasecmp_l
         22582: 000000000008a970    60 IFUNC   GLOBAL DEFAULT   14 strlen
         22585: 00000000000a54d0    92 IFUNC   WEAK   DEFAULT   14 wmemset
         22600: 000000000010b030    92 IFUNC   GLOBAL DEFAULT   14 __wmemset_chk
         22618: 000000000008b8a0   183 IFUNC   GLOBAL DEFAULT   14 __mempcpy
         22675: 000000000008ba70    76 IFUNC   WEAK   DEFAULT   14 strcasecmp
        #
        # perf probe -x /lib64/libc-2.29.so strlen
        Warning: The probe function (strlen) is a GNU indirect function.
        Consider identifying the final function used at run time and set the probe directly on that.
        Added new event:
          probe_libc:strlen    (on strlen in /usr/lib64/libc-2.29.so)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe_libc:strlen -aR sleep 1
      
        #
      Reported-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lore.kernel.org/lkml/159438669349.62703.5978345670436126948.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Fix memory leakage when the probe point is not found · 12d572e7
      Masami Hiramatsu authored
      Fix the memory leakage in debuginfo__find_trace_events() when the probe
      point is not found in the debuginfo. If there is no probe point found in
      the debuginfo, debuginfo__find_probes() will NOT return -ENOENT, but 0.
      
      Thus the caller of debuginfo__find_probes() must check the tf.ntevs and
      release the allocated memory for the array of struct probe_trace_event.
      
      The current code releases the memory only if debuginfo__find_probes()
      returns an error, without checking tf.ntevs. As a result, the memory
      allocated for *tevs is not released if tf.ntevs == 0.
      
      This fixes the memory leakage by checking tf.ntevs == 0 in addition to
      ret < 0.
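      
      The pattern can be sketched as follows. This is not the perf code: the
      struct, find_probes() and find_trace_events() are simplified,
      hypothetical stand-ins that only illustrate freeing the array both on
      error and when nothing was found.

```c
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-in for perf's struct probe_trace_event. */
struct trace_event {
	unsigned long address;
};

/*
 * Hypothetical finder: returns 0 both when it filled in events and when
 * it found nothing; only *ntevs tells the two cases apart.
 */
static int find_probes(int nr_matches, struct trace_event *tevs, int max,
		       int *ntevs)
{
	int i;

	if (nr_matches < 0)
		return -EINVAL;
	*ntevs = nr_matches < max ? nr_matches : max;
	for (i = 0; i < *ntevs; i++)
		tevs[i].address = 0x1000 + i;
	return 0;
}

/* Returns the number of events found, or a negative errno value. */
int find_trace_events(int nr_matches, struct trace_event **tevs, int max)
{
	int ntevs = 0;
	int ret;

	*tevs = calloc(max, sizeof(**tevs));
	if (*tevs == NULL)
		return -ENOMEM;

	ret = find_probes(nr_matches, *tevs, max, &ntevs);
	/* Release the array on error and also when no event was found. */
	if (ret < 0 || ntevs == 0) {
		free(*tevs);
		*tevs = NULL;
		ntevs = 0;
	}
	return ret < 0 ? ret : ntevs;
}
```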
      
      Fixes: ff741783 ("perf probe: Introduce debuginfo to encapsulate dwarf information")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/159438668346.62703.10887420400718492503.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Fix wrong variable warning when the probe point is not found · 11fd3eb8
      Masami Hiramatsu authored
      Fix a wrong "variable not found" warning when the probe point is not
      found in the debuginfo.
      
      Since debuginfo__find_probes() can return 0 even if it does not find
      the given probe point in the debuginfo, fill_empty_trace_arg() can be
      called with tf.ntevs == 0 and emit a wrong warning. To fix this, reject
      ntevs == 0 in fill_empty_trace_arg().
      
      E.g. without this patch:
      
        # perf probe -x /lib64/libc-2.30.so -a "memcpy arg1=%di"
        Failed to find the location of the '%di' variable at this address.
         Perhaps it has been optimized out.
         Use -V with the --range option to show '%di' location range.
        Added new events:
          probe_libc:memcpy    (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
          probe_libc:memcpy    (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe_libc:memcpy -aR sleep 1
      
      With this patch:
      
        # perf probe -x /lib64/libc-2.30.so -a "memcpy arg1=%di"
        Added new events:
          probe_libc:memcpy    (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
          probe_libc:memcpy    (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe_libc:memcpy -aR sleep 1
      
      Fixes: cb402730 ("perf probe: Trace a magic number if variable is not found")
      Reported-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Tested-by: Andi Kleen <ak@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/159438667364.62703.2200642186798763202.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Avoid setting probes on the same address for the same event · 26bbf45f
      Masami Hiramatsu authored
      In some cases, several symbols with the same name point to the same
      address. When that happens, 'perf probe' returns an error.
      
      E.g.
      
        # perf probe -x /lib64/libc-2.30.so -v -a "memcpy arg1=%di"
        probe-definition(0): memcpy arg1=%di
        symbol:memcpy file:(null) line:0 offset:0 return:0 lazy:(null)
        parsing arg: arg1=%di into name:arg1 %di
        1 arguments
        symbol:setjmp file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:longjmp file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:longjmp_target file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:lll_lock_wait_private file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_mallopt_arena_max file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_mallopt_arena_test file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_tunable_tcache_max_bytes file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_tunable_tcache_count file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_tunable_tcache_unsorted_limit file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_mallopt_trim_threshold file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_mallopt_top_pad file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_mallopt_mmap_threshold file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_mallopt_mmap_max file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_mallopt_perturb file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_mallopt_mxfast file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_heap_new file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_arena_reuse_free_list file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_arena_reuse file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_arena_reuse_wait file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_arena_new file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_arena_retry file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_sbrk_less file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_heap_free file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_heap_less file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_tcache_double_free file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_heap_more file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_sbrk_more file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_malloc_retry file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_memalign_retry file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_mallopt_free_dyn_thresholds file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_realloc_retry file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_calloc_retry file:(null) line:0 offset:0 return:0 lazy:(null)
        symbol:memory_mallopt file:(null) line:0 offset:0 return:0 lazy:(null)
        Open Debuginfo file: /usr/lib/debug/usr/lib64/libc-2.30.so.debug
        Try to find probe point from debuginfo.
        Opening /sys/kernel/debug/tracing//README write=0
        Failed to find the location of the '%di' variable at this address.
         Perhaps it has been optimized out.
         Use -V with the --range option to show '%di' location range.
        An error occurred in debuginfo analysis (-2).
        Trying to use symbols.
        Opening /sys/kernel/debug/tracing//uprobe_events write=1
        Writing event: p:probe_libc/memcpy /usr/lib64/libc-2.30.so:0x914c0 arg1=%di
        Writing event: p:probe_libc/memcpy /usr/lib64/libc-2.30.so:0x914c0 arg1=%di
        Failed to write event: File exists
          Error: Failed to add events. Reason: File exists (Code: -17)
      
      You can see that perf tried to write exactly the same probe
      definition twice, which caused an error.
      
      To fix this issue, check the symbol list and drop duplicated symbols
      (those with the same symbol name and address) from it.
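      
      A minimal sketch of such a duplicate check is shown below. It is not
      the code from this patch: struct sym and drop_duplicate_syms() are
      hypothetical, and a plain array is used instead of perf's own symbol
      structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified symbol record. */
struct sym {
	const char *name;
	uint64_t    start;
};

/*
 * Compact the array in place, keeping the first occurrence of every
 * (name, start) pair. Returns the number of entries kept.
 */
size_t drop_duplicate_syms(struct sym *syms, size_t n)
{
	size_t kept = 0;
	size_t i, j;

	for (i = 0; i < n; i++) {
		for (j = 0; j < kept; j++) {
			if (syms[j].start == syms[i].start &&
			    strcmp(syms[j].name, syms[i].name) == 0)
				break;
		}
		if (j == kept)	/* no earlier entry matched */
			syms[kept++] = syms[i];
	}
	return kept;
}
```

      Symbols that share an address but differ in name are kept, since only
      an identical name and address pair would produce an identical probe
      definition.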
      
      With this patch:
      
        # perf probe -x /lib64/libc-2.30.so -a "memcpy arg1=%di"
        Failed to find the location of the '%di' variable at this address.
         Perhaps it has been optimized out.
         Use -V with the --range option to show '%di' location range.
        Added new events:
          probe_libc:memcpy    (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
          probe_libc:memcpy    (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe_libc:memcpy -aR sleep 1
      
      Committer notes:
      
      Fix this build error on 32-bit arches by using PRIx64 for symbol->start,
      which is a u64:
      
        In file included from util/probe-event.c:27:
        util/probe-event.c: In function 'find_probe_trace_events_from_map':
        util/probe-event.c:2978:14: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=]
             pr_debug("Found duplicated symbol %s @ %lx\n",
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        util/debug.h:17:21: note: in definition of macro 'pr_fmt'
         #define pr_fmt(fmt) fmt
                             ^~~
        util/probe-event.c:2978:5: note: in expansion of macro 'pr_debug'
             pr_debug("Found duplicated symbol %s @ %lx\n",
             ^~~~~~~~
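      
      A minimal sketch of the portable form, assuming uint64_t as a stand-in
      for perf's u64 and a hypothetical format_duplicate() helper in place of
      pr_debug():

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * PRIx64 expands to the right conversion specifier for a 64-bit value
 * on the current target, so the same format string builds cleanly on
 * both 32-bit and 64-bit arches.
 */
int format_duplicate(char *buf, size_t len, const char *name, uint64_t start)
{
	return snprintf(buf, len, "Found duplicated symbol %s @ %" PRIx64 "\n",
			name, start);
}
```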
      Reported-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lore.kernel.org/lkml/159438666401.62703.15196394835032087840.stgit@devnote2
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 10 Jul, 2020 7 commits
    • perf kmem: Pass additional arguments to 'perf record' · be8299e4
      Ian Rogers authored
      'perf kmem' has an input file option, but currently passing an output file
      option fails:
      
        $ sudo perf kmem record -o /tmp/p.data sleep 1  
         Error: unknown switch `o'
      
        Usage: perf kmem [<options>] {record|stat}
      
           -f, --force           don't complain, do it
           -i, --input <file>    input file name
           -l, --line <num>      show n lines
           -s, --sort <key[,key2...]>
                                 sort by keys: ptr, callsite, bytes, hit, pingpong, frag, page, order, mig>
           -v, --verbose         be more verbose (show symbol address, etc)
               --alloc           show per-allocation statistics
               --caller          show per-callsite statistics
               --live            Show live page stat
               --page            Analyze page allocator
               --raw-ip          show raw ip instead of symbol
               --slab            Analyze slab allocator
               --time <str>      Time span of interest (start,stop)
      
      'perf sched' is similar in implementation and avoids the problem by
      passing additional arguments to 'perf record'.
      
      This change makes 'perf kmem' parse command line options consistently
      with 'perf sched', although neither actually lists -o as a supported
      option.
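      
      The general idea of forwarding the remaining arguments can be sketched
      as below. This is an illustration only: build_record_argv() is a
      hypothetical helper, and the fixed arguments a caller passes in are
      not claimed to match what 'perf kmem' really uses.

```c
#include <stdlib.h>

/*
 * Build a NULL-terminated argv made of fixed leading arguments followed
 * by the caller's extra arguments (for example "-o", "/tmp/p.data").
 * The strings are not copied; the caller frees only the returned array.
 */
const char **build_record_argv(const char *const *fixed, int nr_fixed,
			       const char *const *extra, int nr_extra)
{
	const char **argv;
	int i, n = 0;

	argv = calloc((size_t)nr_fixed + (size_t)nr_extra + 1, sizeof(*argv));
	if (argv == NULL)
		return NULL;

	for (i = 0; i < nr_fixed; i++)
		argv[n++] = fixed[i];
	for (i = 0; i < nr_extra; i++)
		argv[n++] = extra[i];
	argv[n] = NULL;
	return argv;
}
```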
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200708183919.4141023-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf parse-events: Report BPF errors · 5f634c8e
      Ian Rogers authored
      Setting the parse_events_error directly doesn't increment num_errors,
      so the error message is not displayed. Use the
      parse_events__handle_error() function, which sets num_errors and handles
      multiple errors.
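      
      Why the helper matters can be sketched with a simplified model. The
      struct and functions below are hypothetical and much smaller than
      perf's real ones; in particular, this sketch keeps only the first
      message and does not reproduce how the real helper treats later ones.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for perf's struct parse_events_error. */
struct parse_error {
	int   num_errors;	/* the reporter prints only if this is non-zero */
	char *str;		/* first error message, heap allocated */
};

/* Record an error through one helper so the counter stays in sync. */
void handle_error(struct parse_error *err, const char *msg)
{
	if (err->num_errors == 0) {
		size_t len = strlen(msg) + 1;

		err->str = malloc(len);
		if (err->str != NULL)
			memcpy(err->str, msg, len);
	}
	err->num_errors++;
}

/* Model of the reporting side: it looks at the counter, not at str. */
int should_report(const struct parse_error *err)
{
	return err->num_errors > 0;
}
```

      In this model, code that assigns str directly and never touches
      num_errors leaves should_report() returning 0, which corresponds to
      the missing message in the "Before" output.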
      
      Committer notes:
      
      Ian provided a before/after upon request:
      
      Before:
      
        $ /tmp/perf/perf record -e /tmp/perf/util/parse-events.o
        Run 'perf list' for a list of valid events
      
        Usage: perf record [<options>] [<command>]
           or: perf record [<options>] -- <command> [<options>]
      
           -e, --event <event>   event selector. use 'perf list' to list available event
      
      After:
      
        $ /tmp/perf/perf record -e /tmp/perf/util/parse-events.o
        event syntax error: '/tmp/perf/util/parse-events.o'
                            \___ Failed to load /tmp/perf/util/parse-events.o: BPF object format invalid
      
        (add -v to see detail)
        Run 'perf list' for a list of valid events
      
        Usage: perf record [<options>] [<command>]
           or: perf record [<options>] -- <command> [<options>]
      
           -e, --event <event>   event selector. use 'perf list' to list available events
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: KP Singh <kpsingh@chromium.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: http://lore.kernel.org/lkml/20200707211449.3868944-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Show text poke address symbol · 7eeb9855
      Adrian Hunter authored
      It is generally more useful to show the symbol with an address. In this
      case, the print function requires the 'machine' which means changing
      callers to provide it as a parameter. It is optional because most events
      do not need it and the callers that matter can provide it.
      
      Committer notes:
      
      Made 'union perf_event' continue to be the first parameter to the
      perf_event__fprintf() and perf_event__fprintf_text_poke() functions.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Link: http://lore.kernel.org/lkml/20200512121922.8997-16-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Add option --show-text-poke-events · 92ecf3a6
      Adrian Hunter authored
      Consistent with other new events, add an option to perf script to
      display text poke events and ksymbol events. Both text poke events and
      ksymbol events are displayed because some text pokes (e.g. ftrace
      trampolines) have corresponding ksymbol events.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Link: http://lore.kernel.org/lkml/20200512121922.8997-15-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Add support for text poke events · b22f90aa
      Adrian Hunter authored
      Select text poke events when available and the kernel is being traced.
      Process text poke events to invalidate entries in Intel PT's instruction
      cache.
      
      Example:
      
        The example requires kernel config:
          CONFIG_PROC_SYSCTL=y
          CONFIG_SCHED_DEBUG=y
          CONFIG_SCHEDSTATS=y
      
        Before:
      
          # perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
          # cat /proc/sys/kernel/sched_schedstats
          0
          # echo 1 > /proc/sys/kernel/sched_schedstats
          # cat /proc/sys/kernel/sched_schedstats
          1
          # echo 0 > /proc/sys/kernel/sched_schedstats
          # cat /proc/sys/kernel/sched_schedstats
          0
          # kill %1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 3.341 MB perf.data.before ]
          [1]+  Terminated                 perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M
          # perf script -i perf.data.before --itrace=e >/dev/null
          Warning:
          474 instruction trace errors
      
        After:
      
          # perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
          # cat /proc/sys/kernel/sched_schedstats
          0
          # echo 1 > /proc/sys/kernel/sched_schedstats
          # cat /proc/sys/kernel/sched_schedstats
          1
          # echo 0 > /proc/sys/kernel/sched_schedstats
          # cat /proc/sys/kernel/sched_schedstats
          0
          # kill %1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 2.646 MB perf.data.after ]
          [1]+  Terminated                 perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
          # perf script -i perf.data.after --itrace=e >/dev/null
      
      Example:
      
        The example requires kernel config:
          # CONFIG_FUNCTION_TRACER is not set
      
        Before:
          # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
          # perf probe __schedule
          Added new event:
            probe:__schedule     (on __schedule)
      
          You can now use it in all perf tools, such as:
      
                  perf record -e probe:__schedule -aR sleep 1
      
          # perf record -e probe:__schedule -aR sleep 1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.026 MB perf.data (68 samples) ]
          # perf probe -d probe:__schedule
          Removed event: probe:__schedule
          # kill %1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 41.268 MB t1 ]
          [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
          # perf script -i t1 --itrace=e >/dev/null
          Warning:
          207 instruction trace errors
      
        After:
          # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
          # perf probe __schedule
          Added new event:
            probe:__schedule     (on __schedule)
      
          You can now use it in all perf tools, such as:
      
              perf record -e probe:__schedule -aR sleep 1
      
          # perf record -e probe:__schedule -aR sleep 1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.028 MB perf.data (107 samples) ]
          # perf probe -d probe:__schedule
          Removed event: probe:__schedule
          # kill %1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 39.978 MB t1 ]
          [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
          # perf script -i t1 --itrace=e >/dev/null
          # perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
          6 565303693547 0x291f18 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027a000 len 4096 type 2 flags 0x0 name kprobe_insn_page
          6 565303697010 0x291f68 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 0 new len 6
          6 565303838278 0x291fa8 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027c000 len 4096 type 2 flags 0x0 name kprobe_optinsn_page
          6 565303848286 0x291ff8 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 0 new len 106
          6 565369336743 0x292af8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
          7 566434327704 0x217c208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
          6 566456313475 0x293198 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 106 new len 0
          6 566456314935 0x293238 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 6 new len 0
      
      Example:
      
        The example requires kernel config:
          CONFIG_FUNCTION_TRACER=y
      
        Before:
          # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
          # perf probe __kmalloc
          Added new event:
            probe:__kmalloc      (on __kmalloc)
      
          You can now use it in all perf tools, such as:
      
              perf record -e probe:__kmalloc -aR sleep 1
      
          # perf record -e probe:__kmalloc -aR sleep 1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.022 MB perf.data (6 samples) ]
          # perf probe -d probe:__kmalloc
          Removed event: probe:__kmalloc
          # kill %1
          [ perf record: Woken up 2 times to write data ]
          [ perf record: Captured and wrote 43.850 MB t1 ]
          [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
          # perf script -i t1 --itrace=e >/dev/null
          Warning:
          8 instruction trace errors
      
        After:
          # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
          # perf probe __kmalloc
          Added new event:
            probe:__kmalloc      (on __kmalloc)
      
          You can now use it in all perf tools, such as:
      
                  perf record -e probe:__kmalloc -aR sleep 1
      
          # perf record -e probe:__kmalloc -aR sleep 1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.037 MB perf.data (206 samples) ]
          # perf probe -d probe:__kmalloc
          Removed event: probe:__kmalloc
          # kill %1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 41.442 MB t1 ]
          [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
          # perf script -i t1 --itrace=e >/dev/null
          # perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
          5 312216133258 0x8bafe0 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc0360000 len 415 type 2 flags 0x0 name ftrace_trampoline
          5 312216133494 0x8bb030 [0x1d8]: PERF_RECORD_TEXT_POKE addr 0xffffffffc0360000 old len 0 new len 415
          5 312216229563 0x8bb208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
          5 312216239063 0x8bb248 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
          5 312216727230 0x8bb288 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
          5 312216739322 0x8bb2c8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
          5 312216748321 0x8bb308 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
          7 313287163462 0x2817430 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
          7 313287174890 0x2817470 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
          7 313287818979 0x28174b0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
          7 313287829357 0x28174f0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
          7 313287841246 0x2817530 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Link: http://lore.kernel.org/lkml/20200512121922.8997-14-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL · 789e2419
      Adrian Hunter authored
      PERF_RECORD_KSYMBOL_TYPE_OOL marks an executable page. Create a map
      backed only by memory, which will be populated as necessary by text poke
      events.
      
      Committer notes:
      
      From the patch:
      
      OOL stands for "Out of line" code such as kprobe-replaced instructions
      or optimized kprobes or ftrace trampolines.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Link: http://lore.kernel.org/lkml/20200512121922.8997-13-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add support for PERF_RECORD_TEXT_POKE · 246eba8e
      Adrian Hunter authored
      Add processing for PERF_RECORD_TEXT_POKE events. When a text poke event
      is processed, the kernel dso data cache is updated with the poked
      bytes.
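
      A minimal sketch of the cache update described above, assuming a flat
      byte buffer as the cache. The names text_cache and text_cache_poke are
      hypothetical and are not perf's actual data structures or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a cached copy of kernel text (not perf's type). */
struct text_cache {
    uint64_t start;     /* address corresponding to data[0] */
    size_t   len;       /* number of valid bytes in data[] */
    uint8_t  data[64];
};

/*
 * Overwrite the cached bytes at 'addr' with the 'new_len' poked bytes.
 * Returns 0 on success, -1 if the poke does not fit inside the cache.
 */
static int text_cache_poke(struct text_cache *c, uint64_t addr,
                           const uint8_t *new_bytes, size_t new_len)
{
    uint64_t off;

    assert(c != NULL && c->len <= sizeof(c->data));
    if (addr < c->start)
        return -1;
    off = addr - c->start;
    if (off > c->len || new_len > c->len - off)
        return -1;
    if (new_len)
        memcpy(c->data + off, new_bytes, new_len);
    return 0;
}
```

      With the cache patched this way, a later read of the same address
      range sees the bytes that were live after the poke rather than the
      original kernel image contents.
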
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Link: http://lore.kernel.org/lkml/20200512121922.8997-12-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 09 Jul, 2020 1 commit
    • perf annotate: Fix non-null terminated buffer returned by readlink() · b39730a6
      Numfor Mbiziwo-Tiapo authored
      Our local MSAN (Memory Sanitizer) build of perf throws a warning that
      comes from the "dso__disassemble_filename" function in
      "tools/perf/util/annotate.c" when running perf record.
      
      The warning stems from the call to readlink, in which "build_id_path"
      was being read into "linkname". Since readlink does not null terminate,
      an uninitialized memory access would later occur when "linkname" is
      passed into the strstr function. This is simply fixed by
      null-terminating "linkname" after the call to readlink.
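
      A minimal sketch of the pattern, not the actual patch: readlink()
      returns the number of bytes written and never appends a terminator,
      so the caller reserves one byte and writes it. The helper name
      readlink_str is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the target of symlink 'path' into 'buf' and null-terminate it.
 * Returns the target length, or -1 on error or if 'size' is 0.
 * A target longer than size - 1 bytes is truncated, as readlink() does.
 */
static ssize_t readlink_str(const char *path, char *buf, size_t size)
{
    ssize_t len;

    assert(path != NULL && buf != NULL);
    if (size == 0)
        return -1;
    len = readlink(path, buf, size - 1);    /* keep one byte for '\0' */
    if (len < 0)
        return -1;
    buf[len] = '\0';                        /* readlink() does not do this */
    return len;
}
```

      After this call the buffer is safe to pass to string functions such
      as strstr().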
      
      To reproduce this warning, build perf by running:
      
        $ make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-fsanitize=memory -fsanitize-memory-track-origins"
      
      (Additionally, llvm might have to be installed and clang might have to
      be specified as the compiler - export CC=/usr/bin/clang)
      
      Then run:
      
        tools/perf/perf record -o - ls / | tools/perf/perf --no-pager annotate -i - --stdio
      
      Please see the cover letter for why false positive warnings may be
      generated.
      Signed-off-by: Numfor Mbiziwo-Tiapo <nums@google.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Drayton <mbd@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20190729205750.193289-1-nums@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 08 Jul, 2020 2 commits
    • perf inject jit: Remove //anon mmap events · c8f6ae1f
      Steve MacLean authored
      ** perf-<pid>.map and jit-<pid>.dump designs:
      
      When a JIT generates code to be executed, it must allocate memory and
      mark it executable using an mmap call.
      
      *** perf-<pid>.map design
      
      The perf-<pid>.map assumes that any sample recorded in an anonymous
      memory page is JIT code. It then tries to resolve the symbol name by
      looking at the process' perf-<pid>.map.
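
      A minimal sketch of such a lookup. It assumes the map file line
      format "START SIZE symbolname" with START and SIZE in hex without a
      0x prefix, as described in perf's JIT interface documentation; the
      struct and function names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One symbol parsed from a map file line (hypothetical type). */
struct map_sym {
    uint64_t start;
    uint64_t size;
    char     name[128];
};

/* Parse "START SIZE symbolname"; returns 0 on success, -1 on a bad line. */
static int parse_map_line(const char *line, struct map_sym *sym)
{
    assert(line != NULL && sym != NULL);
    if (sscanf(line, "%" SCNx64 " %" SCNx64 " %127[^\n]",
               &sym->start, &sym->size, sym->name) != 3)
        return -1;
    return 0;
}

/* True if 'ip' falls inside [start, start + size). */
static int map_sym_contains(const struct map_sym *sym, uint64_t ip)
{
    return ip >= sym->start && ip - sym->start < sym->size;
}
```

      A sample whose ip lies in an anonymous executable mapping would be
      matched against each parsed entry with map_sym_contains().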
      
      *** jit-<pid>.dump design
      
      The jit-<pid>.dump mechanism takes a different approach. It requires a
      JIT to write a `<path>/jit-<pid>.dump` file. This file must also be
      mmapped so that perf inject --jit can find the file. The JIT must also
      add JIT_CODE_LOAD records for any functions it generates. The records
      are timestamped using a clock which can be correlated to the perf record
      clock.
      
      After perf record, the `perf inject --jit` pass parses the recording
      looking for a `<path>/jit-<pid>.dump` file. When it finds the file, it
      parses it and for each JIT_CODE_LOAD record:
      * creates an elf file `<path>/jitted-<pid>-<code_index>.so`
      * injects a new mmap record mapping the new elf file into the process.
      
      *** Coexistence design
      
      The kernel and perf support both of these mechanisms. We need to make
      sure perf works on an app supporting either or both of these mechanisms.
      Both designs rely on mmap records to determine how to resolve an ip
      address.
      
      The mmap records of both techniques by definition overlap. When the JIT
      compiles a method, it must:
      
      * allocate memory (mmap)
      * add execution privilege (mprotect or mmap; either will
      generate an mmap event from the kernel to perf)
      * compile code into memory
      * add a function record to perf-<pid>.map and/or jit-<pid>.dump
      
      Because the jit-<pid>.dump mechanism supports greater capabilities, perf
      prefers the symbols from jit-<pid>.dump. It implements this based on
      timestamp ordering of events. There is an implicit ASSUMPTION that the
      JIT_CODE_LOAD record timestamp will be later than that of the // anon
      mmap event generated when the memory was allocated or made executable.
      
      *** Problems with the ASSUMPTION
      
      The ASSUMPTION made in the Coexistence design section above is violated
      in the following scenario.
      
      *** Scenario
      
      While a JIT is jitting code it will eventually need to commit more
      pages and change these pages to executable permissions. Typically the
      JIT will want these collocated to minimize branch displacements.
      
      The kernel will coalesce these anonymous mappings with identical
      permissions before sending an MMAP event for the new pages. The address
      range of the new mmap will not be just the most recently mapped pages.
      It will include the entire coalesced mmap region.
      
      See mm/mmap.c
      
      unsigned long mmap_region(struct file *file, unsigned long addr,
                      unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
                      struct list_head *uf)
      {
      ...
              /*
               * Can we just expand an old mapping?
               */
      ...
              perf_event_mmap(vma);
      ...
      }
      
      *** Symptoms
      
      The coalesced // anon mmap event will be timestamped after the
      JIT_CODE_LOAD records. This means it will be used as the most recent
      mapping for that entire address range. For the remaining events, perf
      will look at the inferior perf-<pid>.map for symbols.
      
      If both mechanisms are supported, the symbol will appear twice with
      different module names. This causes weird behavior in reporting.
      
      If only jit-<pid>.dump is supported, the symbol will no longer be resolved.
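
      The effect can be sketched as a "latest covering mapping wins" lookup.
      This is a simplified model for illustration, not perf's map handling
      code; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mmap record: when it was emitted and what it covers. */
struct mmap_event {
    uint64_t    time;
    uint64_t    start;
    uint64_t    len;
    const char *filename;
};

/*
 * Return the most recently emitted mapping that covers 'ip' and was
 * emitted no later than 'sample_time', or NULL if there is none.
 */
static const struct mmap_event *
resolve_map(const struct mmap_event *ev, size_t n,
            uint64_t sample_time, uint64_t ip)
{
    const struct mmap_event *best = NULL;
    size_t i;

    assert(ev != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (ev[i].time > sample_time)
            continue;   /* mapping did not exist yet */
        if (ip < ev[i].start || ip - ev[i].start >= ev[i].len)
            continue;   /* mapping does not cover ip */
        if (best == NULL || ev[i].time >= best->time)
            best = &ev[i];
    }
    return best;
}
```

      With an injected jitted-<pid>-<code_index>.so mapping followed in
      time by a coalesced // anon mapping over the same range, a later
      sample in that range resolves to the // anon mapping, which is the
      symptom described above.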
      
      ** Implemented solution
      
      This patch solves the issue by removing // anon mmap events for any
      process which has a valid jit-<pid>.dump file.
      
      It tracks on a per process basis to handle the case where some running
      apps support jit-<pid>.dump, but some only support perf-<pid>.map.
      
      It adds new assumptions:
      * // anon mmap events are only required for perf-<pid>.map support.
      * An app that uses jit-<pid>.dump no longer needs
      perf-<pid>.map support. It assumes that any perf-<pid>.map info is
      inferior.
      
      *** Details
      
      Use thread->priv to store whether a jitdump file has been processed
      
      During "perf inject --jit", discard "//anon*" mmap events for any pid which
      has successfully processed a jitdump file.
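
      A minimal sketch of that filtering decision, assuming a per-process
      flag is available. The names proc_state and should_drop_mmap are
      hypothetical; the patch itself keeps the flag in thread->priv.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-process state tracked during perf inject --jit. */
struct proc_state {
    int  pid;
    bool jitdump_processed;   /* set once a jitdump file was parsed */
};

/*
 * Decide whether an mmap event should be discarded: only "//anon*"
 * mappings of a process that already has a processed jitdump file.
 */
static bool should_drop_mmap(const struct proc_state *p, const char *filename)
{
    assert(p != NULL && filename != NULL);
    return p->jitdump_processed &&
           strncmp(filename, "//anon", strlen("//anon")) == 0;
}
```

      Processes without a jitdump file keep their //anon events, so
      perf-<pid>.map resolution continues to work for them.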
      
      ** Testing:
      
      // jitdump case
      
        perf record <app with jitdump>
        perf inject --jit --input perf.data --output perfjit.data
      
      // verify mmap "//anon" events present initially
      
        perf script --input perf.data --show-mmap-events | grep '//anon'
      
      // verify mmap "//anon" events removed
      
        perf script --input perfjit.data --show-mmap-events | grep '//anon'
      
      // no jitdump case
      
        perf record <app without jitdump>
        perf inject --jit --input perf.data --output perfjit.data
      
      // verify mmap "//anon" events present initially
      
        perf script --input perf.data --show-mmap-events | grep '//anon'
      
      // verify mmap "//anon" events not removed
      
        perf script --input perfjit.data --show-mmap-events | grep '//anon'
      
      ** Repro:
      
      This issue was discovered while testing the initial CoreCLR jitdump
      implementation (https://github.com/dotnet/coreclr/pull/26897).
      
      ** Alternate solutions considered
      
      These were also briefly considered:
      
      * Change kernel to not coalesce mmap regions.
      
      * Change kernel reporting of coalesced mmap regions to perf. Only
      include newly mapped memory.
      
      * Only strip parts of // anon mmap events overlapping existing
      jitted-<pid>-<code_index>.so mmap events.
      Signed-off-by: Steve MacLean <Steve.MacLean@Microsoft.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/1590544271-125795-1-git-send-email-steve.maclean@linux.microsoft.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Merge remote-tracking branch 'torvalds/master' into perf/core · facbf0b9
      Arnaldo Carvalho de Melo authored
      To pick up fixes and move perf/core forward. There was a minor conflict:
      perf_evlist__add_dummy() lost its 'perf_' prefix as it operates on a
      'struct evlist', not on a 'struct perf_evlist', i.e. it is tools/perf/
      specific and not in libperf.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 07 Jul, 2020 8 commits
  6. 06 Jul, 2020 17 commits