1. 14 Aug, 2018 5 commits
  2. 13 Aug, 2018 25 commits
  3. 10 Aug, 2018 1 commit
  4. 09 Aug, 2018 1 commit
    • perf probe powerpc: Fix trace event post-processing · 354b064b
      Sandipan Das authored
      In some cases, a symbol may have multiple aliases. Attempting to add an
      entry probe for such symbols results in a probe being added at an
      incorrect location while it fails altogether for return probes. This is
      only applicable for binaries with debug information.
      
      During the arch-dependent post-processing, the offset from the start of
      the symbol at which the probe is to be attached is determined and added
      to the start address of the symbol to get the probe's location.  In case
      there are multiple aliases, this offset gets added once for each alias
      of the symbol and we end up with an incorrect probe location.
      
      This can be verified on a powerpc64le system as shown below.
      
        $ nm /lib/modules/$(uname -r)/build/vmlinux | grep "sys_open$"
        ...
        c000000000414290 T __se_sys_open
        c000000000414290 T sys_open
      
        $ objdump -d /lib/modules/$(uname -r)/build/vmlinux | grep -A 10 "<__se_sys_open>:"
      
        c000000000414290 <__se_sys_open>:
        c000000000414290:       19 01 4c 3c     addis   r2,r12,281
        c000000000414294:       70 c4 42 38     addi    r2,r2,-15248
        c000000000414298:       a6 02 08 7c     mflr    r0
        c00000000041429c:       e8 ff a1 fb     std     r29,-24(r1)
        c0000000004142a0:       f0 ff c1 fb     std     r30,-16(r1)
        c0000000004142a4:       f8 ff e1 fb     std     r31,-8(r1)
        c0000000004142a8:       10 00 01 f8     std     r0,16(r1)
        c0000000004142ac:       c1 ff 21 f8     stdu    r1,-64(r1)
        c0000000004142b0:       78 23 9f 7c     mr      r31,r4
        c0000000004142b4:       78 1b 7e 7c     mr      r30,r3
      
        For both the entry probe and the return probe, the probe location
        should be _text+4276888 (0xc000000000414298). Since another alias
        exists for 'sys_open', the post-processing code will end up adding
        the offset (8 for powerpc64le) twice and perf will attempt to add
        the probe at _text+4276896 (0xc0000000004142a0) instead.
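
      The double adjustment can be illustrated with a small standalone
      sketch. It is not the perf code: struct sym_entry, LEP_OFFSET and both
      functions are hypothetical names, and stopping at the first matching
      alias is only one possible way to avoid the problem, assumed here for
      illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol table entry (not perf's struct symbol). */
struct sym_entry {
	const char *name;
	uint64_t start;		/* offset of the symbol from _text */
};

/* Offset applied on powerpc64le, as quoted in the text above. */
#define LEP_OFFSET 8

/* Buggy variant: adjusts once for every alias that shares the address. */
static uint64_t probe_offset_buggy(const struct sym_entry *syms, size_t n,
				   uint64_t addr)
{
	uint64_t probe = addr;

	assert(syms != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (syms[i].start == addr)
			probe += LEP_OFFSET;
	return probe;
}

/* Assumed fix: adjust for the first matching symbol only, then stop. */
static uint64_t probe_offset_fixed(const struct sym_entry *syms, size_t n,
				   uint64_t addr)
{
	assert(syms != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (syms[i].start == addr)
			return addr + LEP_OFFSET;
	return addr;
}
```

      With both aliases at _text+4276880, the first function yields
      _text+4276896 and the second _text+4276888, matching the two locations
      quoted above.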
      
      Before:
      
        # perf probe -v -a sys_open
      
        probe-definition(0): sys_open
        symbol:sys_open file:(null) line:0 offset:0 return:0 lazy:(null)
        0 arguments
        Looking at the vmlinux_path (8 entries long)
        Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
        Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
        Try to find probe point from debuginfo.
        Symbol sys_open address found : c000000000414290
        Matched function: __se_sys_open [2ad03a0]
        Probe point found: __se_sys_open+0
        Found 1 probe_trace_events.
        Opening /sys/kernel/debug/tracing/kprobe_events write=1
        Writing event: p:probe/sys_open _text+4276896
        Added new event:
          probe:sys_open       (on sys_open)
        ...
      
        # perf probe -v -a sys_open%return $retval
      
        probe-definition(0): sys_open%return
        symbol:sys_open file:(null) line:0 offset:0 return:1 lazy:(null)
        0 arguments
        Looking at the vmlinux_path (8 entries long)
        Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
        Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
        Try to find probe point from debuginfo.
        Symbol sys_open address found : c000000000414290
        Matched function: __se_sys_open [2ad03a0]
        Probe point found: __se_sys_open+0
        Found 1 probe_trace_events.
        Opening /sys/kernel/debug/tracing/README write=0
        Opening /sys/kernel/debug/tracing/kprobe_events write=1
        Parsing probe_events: p:probe/sys_open _text+4276896
        Group:probe Event:sys_open probe:p
        Writing event: r:probe/sys_open__return _text+4276896
        Failed to write event: Invalid argument
          Error: Failed to add events. Reason: Invalid argument (Code: -22)
      
      After:
      
        # perf probe -v -a sys_open
      
        probe-definition(0): sys_open
        symbol:sys_open file:(null) line:0 offset:0 return:0 lazy:(null)
        0 arguments
        Looking at the vmlinux_path (8 entries long)
        Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
        Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
        Try to find probe point from debuginfo.
        Symbol sys_open address found : c000000000414290
        Matched function: __se_sys_open [2ad03a0]
        Probe point found: __se_sys_open+0
        Found 1 probe_trace_events.
        Opening /sys/kernel/debug/tracing/kprobe_events write=1
        Writing event: p:probe/sys_open _text+4276888
        Added new event:
          probe:sys_open       (on sys_open)
        ...
      
        # perf probe -v -a sys_open%return $retval
      
        probe-definition(0): sys_open%return
        symbol:sys_open file:(null) line:0 offset:0 return:1 lazy:(null)
        0 arguments
        Looking at the vmlinux_path (8 entries long)
        Using /lib/modules/4.18.0-rc8+/build/vmlinux for symbols
        Open Debuginfo file: /lib/modules/4.18.0-rc8+/build/vmlinux
        Try to find probe point from debuginfo.
        Symbol sys_open address found : c000000000414290
        Matched function: __se_sys_open [2ad03a0]
        Probe point found: __se_sys_open+0
        Found 1 probe_trace_events.
        Opening /sys/kernel/debug/tracing/README write=0
        Opening /sys/kernel/debug/tracing/kprobe_events write=1
        Parsing probe_events: p:probe/sys_open _text+4276888
        Group:probe Event:sys_open probe:p
        Writing event: r:probe/sys_open__return _text+4276888
        Added new event:
          probe:sys_open__return (on sys_open%return)
        ...
      
      Reported-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Fixes: 99e608b5 ("perf probe ppc64le: Fix probe location when using DWARF")
      Link: http://lkml.kernel.org/r/20180809161929.35058-1-sandipan@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 08 Aug, 2018 8 commits
    • perf map: Optimize maps__fixup_overlappings() · 6a9405b5
      Konstantin Khlebnikov authored
      This function splits and removes overlapping areas.
      
      Maps in the tree are ordered by start address, so we can find the first
      overlap and stop as soon as the next map does not overlap.
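
      The idea can be sketched on a sorted array instead of the tree. This is
      an illustration only: struct range and count_overlaps() are
      hypothetical, and the sketch assumes the existing maps do not overlap
      each other, so their end addresses are sorted as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end); not perf's struct map. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Count the entries of 'maps' that overlap 'new_map'. 'maps' must be sorted
 * by start address and its entries must not overlap each other. The index of
 * the first candidate is stored in *first.
 */
static size_t count_overlaps(const struct range *maps, size_t n,
			     struct range new_map, size_t *first)
{
	size_t lo = 0, hi = n, count = 0;

	assert(new_map.start < new_map.end);

	/* Binary search for the first entry ending beyond new_map.start. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (maps[mid].end <= new_map.start)
			lo = mid + 1;
		else
			hi = mid;
	}
	*first = lo;

	/* Walk forward and stop at the first entry starting past new_map. */
	for (size_t i = lo; i < n && maps[i].start < new_map.end; i++)
		count++;

	return count;
}
```

      Entries before the first candidate and after the first non-overlapping
      one are never visited, which is the saving the commit describes.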
      
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/153365189407.435244.7234821822450484712.stgit@buzz
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf map: Synthesize maps only for thread group leader · e5adfc3e
      Konstantin Khlebnikov authored
      Threads share map_groups, and all map events are merged into it.
      
      Thus we can send mmaps only for the thread group leader.  Otherwise it
      takes ages to attach and record anything from processes with many vmas
      and threads.
      
      The thread group leader could already be dead, but it seems perf cannot
      handle this case anyway.
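
      As a rough sketch of the filtering rule, assuming a hypothetical
      struct task_id that carries a thread id and its thread group id (perf
      stores this information differently), only entries whose two ids match
      would have their maps synthesized:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task descriptor; perf keeps this information elsewhere. */
struct task_id {
	int pid;	/* thread id */
	int tgid;	/* thread group id, equal to pid for the group leader */
};

/* A thread is the group leader when its id equals its thread group id. */
static int is_group_leader(const struct task_id *t)
{
	assert(t != NULL);
	return t->pid == t->tgid;
}

/* Count how many tasks would have their maps synthesized. */
static size_t count_map_synth_targets(const struct task_id *tasks, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (is_group_leader(&tasks[i]))
			count++;
	return count;
}
```

      For a process with 10000 threads this reduces the number of map walks
      from one per thread to one per process.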
      
      Testing dummy:
      
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/mman.h>
        #include <pthread.h>
        #include <unistd.h>
      
        /* Each thread just blocks until a signal arrives. */
        void *thread(void *arg) {
                pause();
                return NULL;
        }
      
        int main(int argc, char **argv) {
                int threads = 10000;
                int vmas = 50000;
                pthread_t th;
                for (int i = 0; i < threads; i++)
                        pthread_create(&th, NULL, thread, NULL);
                /* Alternate protections so adjacent mappings are not merged. */
                for (int i = 0; i < vmas; i++)
                        mmap(NULL, 4096, (i & 1) ? PROT_READ : PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
                sleep(60);
                return 0;
        }
      
      Comment by Jiri Olsa:
      
      We actually synthesize the group leader (if we found one) for the thread
      even if it's not present in the thread_map, so the process maps are
      always in data.
      
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/153363294102.396323.6277944760215058174.stgit@buzz
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Wire up the augmented syscalls with the syscalls:sys_enter_FOO beautifier · 88cf7084
      Arnaldo Carvalho de Melo authored
      We just check that the evsel is the one we associated with the
      bpf-output event associated with the "__augmented_syscalls__" eBPF map,
      to show that the formatting is done properly:
      
        # perf trace -e perf/tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null
           0.000 (         ): __augmented_syscalls__:dfd: CWD, filename: 0x43e06da8, flags: CLOEXEC
           0.006 (         ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x43e06da8, flags: CLOEXEC
           0.007 ( 0.004 ms): cat/11486 openat(dfd: CWD, filename: 0x43e06da8, flags: CLOEXEC                 ) = 3
           0.029 (         ): __augmented_syscalls__:dfd: CWD, filename: 0x4400ece0, flags: CLOEXEC
           0.030 (         ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x4400ece0, flags: CLOEXEC
           0.031 ( 0.004 ms): cat/11486 openat(dfd: CWD, filename: 0x4400ece0, flags: CLOEXEC                 ) = 3
           0.249 (         ): __augmented_syscalls__:dfd: CWD, filename: 0xc3700d6
           0.250 (         ): syscalls:sys_enter_openat:dfd: CWD, filename: 0xc3700d6
           0.252 ( 0.003 ms): cat/11486 openat(dfd: CWD, filename: 0xc3700d6                                  ) = 3
        #
      
      Now we just need to get the full blown enter/exit handlers to check if the
      evsel being processed is the augmented_syscalls one to go pick the pointer
      payloads from the end of the payload.
      
      We also need to state somehow what is the layout for multi pointer arg syscalls.
      
      Also handy would be to have a BTF file with the struct definitions used in
      syscalls, compact, generated at kernel build time and available for use in eBPF
      programs.
      
      Till we get there we can go on doing some manual coupling of the most relevant
      syscalls with some hand built beautifiers.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-r6ba5izrml82nwfmwcp7jpkm@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Setup the augmented syscalls bpf-output event fields · d3d1c4bd
      Arnaldo Carvalho de Melo authored
      The payload that is put in place by the eBPF script attached to
      syscalls:sys_enter_openat (and other syscalls with pointers, in the
      future) can be consumed by the existing sys_enter beautifiers if
      evsel->priv is set up with a struct syscall_tp with struct tp_fields for
      the 'syscall_id' and 'args' fields expected by the beautifiers. This
      patch does just that.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-xfjyog8oveg2fjys9r1yy1es@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bpf: Make bpf__setup_output_event() return the bpf-output event · 78e890ea
      Arnaldo Carvalho de Melo authored
      We're calling it to set up that event, and we'll need it later to decide
      if the bpf-output event we're handling is the one set up for a specific
      purpose. So return it, using ERR_PTR for errors, etc.
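
      For readers unfamiliar with the ERR_PTR convention, here is a minimal
      userspace sketch of returning either a valid pointer or an encoded
      errno from one function. The lowercase helpers, struct event and
      setup_output_event() are stand-ins written for this illustration, not
      the helpers or signatures used by perf.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Largest errno value that can be encoded in a pointer, as in the kernel. */
#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an encoded pointer. */
static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Pointers in the top MAX_ERRNO addresses are treated as errors. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical event type standing in for the bpf-output event. */
struct event {
	const char *name;
};

static struct event output_event = { "bpf-output" };

/* Hypothetical setup routine: returns the event or an encoded error. */
static struct event *setup_output_event(int simulate_failure)
{
	if (simulate_failure)
		return err_ptr(-ENOMEM);
	return &output_event;
}
```

      A caller checks is_err() first, extracts the code with ptr_err() on
      failure, and otherwise keeps the returned pointer to compare against
      events it handles later.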
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-zhachv7il2n1lopt9aonwhu7@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Handle "bpf-output" events associated with "__augmented_syscalls__" BPF map · e0b6d2ef
      Arnaldo Carvalho de Melo authored
      Add an example BPF script that writes syscalls:sys_enter_openat raw
      tracepoint payloads augmented with the first 64 bytes of the "filename"
      syscall pointer arg.
      
      Then catch it and print it just like with things written to the
      "__bpf_stdout__" map associated with a PERF_COUNT_SW_BPF_OUTPUT software
      event, by just letting the default tracepoint handler in 'perf trace',
      trace__event_handler(), use bpf_output__fprintf(trace, sample), just
      like it does with all other PERF_COUNT_SW_BPF_OUTPUT events, i.e. just
      do a dump on the payload, so that we can check if what is being printed
      has at least the first 64 bytes of the "filename" arg:
      
      The augmented_syscalls.c eBPF script:
      
        # cat tools/perf/examples/bpf/augmented_syscalls.c
        // SPDX-License-Identifier: GPL-2.0
      
        #include <stdio.h>
      
        struct bpf_map SEC("maps") __augmented_syscalls__ = {
             .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
             .key_size = sizeof(int),
             .value_size = sizeof(u32),
             .max_entries = __NR_CPUS__,
        };
      
        struct syscall_enter_openat_args {
      	unsigned long long common_tp_fields;
      	long		   syscall_nr;
      	long		   dfd;
      	char		   *filename_ptr;
      	long		   flags;
      	long		   mode;
        };
      
        struct augmented_enter_openat_args {
      	struct syscall_enter_openat_args args;
      	char				 filename[64];
        };
      
        int syscall_enter(openat)(struct syscall_enter_openat_args *args)
        {
      	struct augmented_enter_openat_args augmented_args;
      
      	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
      	probe_read_str(&augmented_args.filename, sizeof(augmented_args.filename), args->filename_ptr);
      	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU,
      			  &augmented_args, sizeof(augmented_args));
      	return 1;
        }
      
        license(GPL);
        #
      
      So it will just prepare a raw_syscalls:sys_enter payload for the
      "openat" syscall.
      
      This will eventually be done for all syscalls with pointer args,
      globally or just when the user asks, using some spec stating which args
      of which syscalls it wants "expanded" this way. We'll probably start with
      just the syscalls that have char * pointers with familiar names, the
      ones we already handle with the probe:vfs_getname kprobe if it is in
      place, hooking the kernel getname_flags() function used to copy the
      paths from user space.
      
      Running it we get:
      
        # perf trace -e perf/tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null
           0.000 (         ): __augmented_syscalls__:X?.C......................`\..................../etc/ld.so.cache..#......,....ao.k...............k......1.".........
           0.006 (         ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x5c600da8, flags: CLOEXEC
           0.008 ( 0.005 ms): cat/31292 openat(dfd: CWD, filename: 0x5c600da8, flags: CLOEXEC                 ) = 3
           0.036 (         ): __augmented_syscalls__:X?.C.......................\..................../lib64/libc.so.6......... .\....#........?.......=.C..../.".........
           0.037 (         ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x5c808ce0, flags: CLOEXEC
           0.039 ( 0.007 ms): cat/31292 openat(dfd: CWD, filename: 0x5c808ce0, flags: CLOEXEC                 ) = 3
           0.323 (         ): __augmented_syscalls__:X?.C.....................P....................../etc/passwd......>.C....@................>.C.....,....ao.>.C........
           0.325 (         ): syscalls:sys_enter_openat:dfd: CWD, filename: 0xe8be50d6
           0.327 ( 0.004 ms): cat/31292 openat(dfd: CWD, filename: 0xe8be50d6                                 ) = 3
        #
      
      We need to go on optimizing this to avoid sending trash or zeroes in the
      pointer content payload, using the return from bpf_probe_read_str(), but
      to keep things simple at this stage and make incremental progress, let's
      leave it at that for now.
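
      A possible shape for that optimization, sketched in plain userspace C
      rather than eBPF: copy_str() is a hypothetical stand-in that mimics the
      return convention of bpf_probe_read_str() (bytes copied including the
      trailing NUL, negative on error), and the struct layout is simplified.
      The returned size is what would be passed as the payload length instead
      of sizeof(augmented_args).

```c
#include <assert.h>
#include <stddef.h>

#define AUG_FILENAME_LEN 64

/* Simplified fixed part of the payload (hypothetical layout). */
struct fixed_args {
	long syscall_nr;
	long dfd;
	const char *filename_ptr;
	long flags;
	long mode;
};

struct augmented_args {
	struct fixed_args args;
	char filename[AUG_FILENAME_LEN];
};

/* Stand-in copy: returns bytes written including NUL, or -1 on error. */
static int copy_str(char *dst, size_t size, const char *src)
{
	size_t len = 0;

	if (src == NULL || size == 0)
		return -1;
	while (len < size - 1 && src[len] != '\0') {
		dst[len] = src[len];
		len++;
	}
	dst[len] = '\0';
	return (int)(len + 1);
}

/* Fill the payload and return how many bytes are worth emitting. */
static size_t fill_augmented(struct augmented_args *aug,
			     const struct fixed_args *args)
{
	int len;

	assert(aug != NULL && args != NULL);
	aug->args = *args;
	len = copy_str(aug->filename, sizeof(aug->filename), args->filename_ptr);
	if (len < 0)
		len = 0;	/* nothing copied: emit only the fixed part */
	return offsetof(struct augmented_args, filename) + (size_t)len;
}
```

      For "/etc/passwd" this emits the fixed part plus 12 bytes, rather than
      the full 64-byte buffer with whatever happened to follow the string.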
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-g360n1zbj6bkbk6q0qo11c28@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bpf: Add wrappers to BPF_FUNC_probe_read(_str) functions · 8fa25f30
      Arnaldo Carvalho de Melo authored
      Will be used shortly in the augmented syscalls work together with a
      PERF_COUNT_SW_BPF_OUTPUT software event to insert syscalls + pointer
      contents in the perf ring buffer, to be consumed by 'perf trace'
      beautifiers.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ajlkpz4cd688ulx1u30htkj3@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bpf: Add bpf__setup_output_event() strerror() counterpart · aa31be3a
      Arnaldo Carvalho de Melo authored
      That is just bpf__strerror_setup_stdout() renamed to the more general
      "setup_output_event" method, keep the existing stdout() as a wrapper.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-nwnveo428qn0b48axj50vkc7@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>