1. 18 Apr, 2020 10 commits
    • perf parser: Add support to specify rXXX event with pmu · 3a6c51e4
      Jiri Olsa authored
      The current rXXXX event specification creates the event under the
      PERF_TYPE_RAW pmu type. This change allows rXXXX to be used within the
      pmu syntax, so that the pmu's type is used, via the following syntax:
      
        -e 'cpu/r3c/'
        -e 'cpum_cf/r0/'
      
      The XXXX number goes directly to perf_event_attr::config the same way as
      in '-e rXXXX' event. The perf_event_attr::type is filled with pmu type.
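
The mapping described above can be sketched as a small model. This is not perf's parser: the function names and the `sysfs_root` parameter are hypothetical, and the sketch only assumes that the kernel exports each PMU's numeric type in `/sys/bus/event_source/devices/<pmu>/type`.

```python
import os
import re

# Illustrative model only; helper names are hypothetical, not perf's API.
PERF_TYPE_RAW = 4  # PERF_TYPE_RAW in the perf_event_open UAPI

_RAW = re.compile(r"^r([0-9a-fA-F]+)$")
_PMU_RAW = re.compile(r"^([A-Za-z0-9_]+)/r([0-9a-fA-F]+)/$")


def read_pmu_type(pmu, sysfs_root="/sys/bus/event_source/devices"):
    """Read the numeric type that the kernel exports for a PMU."""
    with open(os.path.join(sysfs_root, pmu, "type")) as f:
        return int(f.read().strip())


def resolve_raw_event(spec, sysfs_root="/sys/bus/event_source/devices"):
    """Return (type, config) for an 'rXXXX' or 'pmu/rXXXX/' event spec."""
    m = _RAW.match(spec)
    if m:
        # Plain rXXXX: the type is PERF_TYPE_RAW, XXXX goes to config.
        return PERF_TYPE_RAW, int(m.group(1), 16)
    m = _PMU_RAW.match(spec)
    if m:
        # pmu/rXXXX/: the type comes from the PMU, XXXX still goes to config.
        return read_pmu_type(m.group(1), sysfs_root), int(m.group(2), 16)
    raise ValueError(f"unsupported event spec: {spec!r}")
```

In both forms the hex digits land in config unchanged; only the source of the type differs.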
      
      Committer testing:
      
      So, let's see what goes in perf_event_attr::config for, say, the
      'instructions' PERF_TYPE_HARDWARE (0) event. First we should look at how
      to encode this event as a PERF_TYPE_RAW event for this specific CPU, an
      AMD Ryzen 5:
      
        # cat /sys/devices/cpu/events/instructions
        event=0xc0
        #
      
      Then try it _and_ the 'instructions' event, just to see that the counts
      are close enough:
      
        # perf stat -e rc0,instructions sleep 1
      
         Performance counter stats for 'sleep 1':
      
                   919,794      rc0
                   919,898      instructions
      
               1.000754579 seconds time elapsed
      
               0.000715000 seconds user
               0.000000000 seconds sys
        #
      
      Now we should try, before this patch, the PMU event encoding:
      
        # perf stat -e cpu/rc0/ sleep 1
        event syntax error: 'cpu/rc0/'
                                 \___ unknown term
      
        valid terms: event,edge,inv,umask,cmask,config,config1,config2,name,period,percore
        #
      
      Now with this patch, the three ways of specifying the 'instructions' CPU
      counter are accepted:
      
        # perf stat -e cpu/rc0/,rc0,instructions sleep 1
      
         Performance counter stats for 'sleep 1':
      
                   892,948      cpu/rc0/
                   893,052      rc0
                   893,156      instructions
      
               1.000931819 seconds time elapsed
      
               0.000916000 seconds user
               0.000000000 seconds sys
      
        #
      Requested-by: Thomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200416221405.437788-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf doc: allow ASCIIDOC_EXTRA to be an argument · e9cfa47e
      Ian Rogers authored
      This will allow parent makefiles to pass values to asciidoc.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiwei Sun <jiwei.sun@windriver.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: yuzhoujian <yuzhoujian@didichuxing.com>
      Link: http://lore.kernel.org/lkml/20200416162058.201954-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf pmu: Add support for PMU capabilities · 9fbc61f8
      Kan Liang authored
      The PMU capabilities information, which is located at
      /sys/bus/event_source/devices/<dev>/caps, is required by the perf tool.
      For example, the max LBR information is required to stitch LBR call
      stacks.
      
      Add perf_pmu__caps_parse() to parse the PMU capabilities information.
      The information is stored in a list.
      
      The following patch will store the capabilities information in perf
      header.
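
As a rough illustration of what parsing the caps directory into a list involves, here is a minimal sketch. It is not perf_pmu__caps_parse() itself: the Python function name and the `sysfs_root` parameter are assumptions made for the example, and it only relies on each capability being a small text file under `<dev>/caps`.

```python
import os


def parse_pmu_caps(pmu, sysfs_root="/sys/bus/event_source/devices"):
    """Return a list of (name, value) pairs, one per file in <pmu>/caps.

    Hypothetical helper for illustration; a PMU without a caps
    directory simply yields an empty list.
    """
    caps_dir = os.path.join(sysfs_root, pmu, "caps")
    if not os.path.isdir(caps_dir):
        return []
    caps = []
    for name in sorted(os.listdir(caps_dir)):
        path = os.path.join(caps_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as f:
            # Each capability file holds a single short value.
            caps.append((name, f.read().strip()))
    return caps
```

Values are kept as strings because capabilities mix numbers (such as `branches`) with names (such as `pmu_name`).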
      
      Committer notes:
      
      Here's an example of such directories and their files on a 7th gen i5
      machine:
      
        [root@seventh ~]# ls -lad /sys/bus/event_source/devices/*/caps
        drwxr-xr-x. 2 root root 0 Apr 14 13:33 /sys/bus/event_source/devices/cpu/caps
        drwxr-xr-x. 2 root root 0 Apr 14 13:33 /sys/bus/event_source/devices/intel_pt/caps
        [root@seventh ~]# ls -la /sys/bus/event_source/devices/intel_pt/caps
        total 0
        drwxr-xr-x. 2 root root    0 Apr 14 13:33 .
        drwxr-xr-x. 5 root root    0 Apr 14 13:12 ..
        -r--r--r--. 1 root root 4096 Apr 16 13:10 cr3_filtering
        -r--r--r--. 1 root root 4096 Apr 16 11:42 cycle_thresholds
        -r--r--r--. 1 root root 4096 Apr 16 13:10 ip_filtering
        -r--r--r--. 1 root root 4096 Apr 16 13:10 max_subleaf
        -r--r--r--. 1 root root 4096 Apr 14 13:33 mtc
        -r--r--r--. 1 root root 4096 Apr 14 13:33 mtc_periods
        -r--r--r--. 1 root root 4096 Apr 16 13:10 num_address_ranges
        -r--r--r--. 1 root root 4096 Apr 16 13:10 output_subsys
        -r--r--r--. 1 root root 4096 Apr 16 13:10 payloads_lip
        -r--r--r--. 1 root root 4096 Apr 16 13:10 power_event_trace
        -r--r--r--. 1 root root 4096 Apr 14 13:33 psb_cyc
        -r--r--r--. 1 root root 4096 Apr 14 13:33 psb_periods
        -r--r--r--. 1 root root 4096 Apr 16 13:10 ptwrite
        -r--r--r--. 1 root root 4096 Apr 16 13:10 single_range_output
        -r--r--r--. 1 root root 4096 Apr 16 12:03 topa_multiple_entries
        -r--r--r--. 1 root root 4096 Apr 16 13:10 topa_output
        [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/topa_output
        1
        [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/topa_multiple_entries
        1
        [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/mtc
        1
        [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/power_event_trace
        0
        [root@seventh ~]#
      
        [root@seventh ~]# ls -la /sys/bus/event_source/devices/cpu/caps/
        total 0
        drwxr-xr-x. 2 root root    0 Apr 14 13:33 .
        drwxr-xr-x. 6 root root    0 Apr 14 13:12 ..
        -r--r--r--. 1 root root 4096 Apr 16 13:10 branches
        -r--r--r--. 1 root root 4096 Apr 14 13:33 max_precise
        -r--r--r--. 1 root root 4096 Apr 16 13:10 pmu_name
        [root@seventh ~]# cat /sys/bus/event_source/devices/cpu/caps/max_precise
        3
        [root@seventh ~]# cat /sys/bus/event_source/devices/cpu/caps/branches
        32
        [root@seventh ~]# cat /sys/bus/event_source/devices/cpu/caps/pmu_name
        skylake
        [root@seventh ~]#
      
      Wow, first time I've heard about
      /sys/bus/event_source/devices/cpu/caps/max_precise, I think I'll use it!
      :-)
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200319202517.23423-2-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools lib traceevent: Take care of return value of asprintf · f8ff18be
      He Zhe authored
      According to the API, if memory allocation wasn't possible, or some
      other error occurs, asprintf will return -1, and the contents of strp
      below are undefined.
      
        int asprintf(char **strp, const char *fmt, ...);
      
      This patch handles the return value of asprintf to make the code less
      error prone and to prevent the following build warning:
      
        ignoring return value of ‘asprintf’, declared with attribute warn_unused_result [-Wunused-result]
      Signed-off-by: He Zhe <zhe.he@windriver.com>
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com>
      Cc: hewenliang4@huawei.com
      Link: http://lore.kernel.org/lkml/1582163930-233692-1-git-send-email-zhe.he@windriver.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Force error in fallback on :k events · bec49a9e
      Stephane Eranian authored
      When it is not possible for an unprivileged perf command to monitor at
      the kernel level (:k), the fallback code forces a :u. That works if the
      event was previously monitoring both levels.  But if the event was
      already constrained to kernel only, then it does not make sense to
      restrict it to user only.
      
      Given the code works by exclusion, a kernel only event would have:
      
        attr->exclude_user = 1
      
      The fallback code would add:
      
        attr->exclude_kernel = 1
      
      In the end, the event would not monitor at either the user level or the
      kernel level. In other words, it would count nothing.
      
      An event programmed to monitor kernel only cannot be switched to user
      only without seriously warning the user.
      
      This patch forces an error in this case to make it clear the request
      cannot really be satisfied.
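
The exclusion logic above can be modelled in a few lines. This is only a sketch of the behaviour the commit message describes, not the code in perf: `Attr`, `FallbackError` and `fallback_to_user_only` are hypothetical names that stand in for the two perf_event_attr bits and the fallback path.

```python
from dataclasses import dataclass


@dataclass
class Attr:
    # Stand-ins for perf_event_attr::exclude_user / exclude_kernel.
    exclude_user: bool = False
    exclude_kernel: bool = False


class FallbackError(Exception):
    """Raised when the request cannot be satisfied by falling back."""


def fallback_to_user_only(attr):
    """Model of the fallback applied when kernel monitoring is denied."""
    if attr.exclude_user:
        # Kernel-only event: adding exclude_kernel would exclude both
        # levels and count nothing, so report an error instead.
        raise FallbackError("kernel-only event cannot fall back to user-only")
    attr.exclude_kernel = True
    return attr
```

An event that monitored both levels comes back restricted to user level, while a kernel-only event now fails loudly instead of silently counting zero.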
      
      Behavior with paranoid 1:
      
        $ sudo bash -c "echo 1 > /proc/sys/kernel/perf_event_paranoid"
        $ perf stat -e cycles:k sleep 1
      
         Performance counter stats for 'sleep 1':
      
                 1,520,413      cycles:k
      
               1.002361664 seconds time elapsed
      
               0.002480000 seconds user
               0.000000000 seconds sys
      
      Old behavior with paranoid 2:
      
        $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
        $ perf stat -e cycles:k sleep 1
         Performance counter stats for 'sleep 1':
      
                         0      cycles:ku
      
               1.002358127 seconds time elapsed
      
               0.002384000 seconds user
               0.000000000 seconds sys
      
      New behavior with paranoid 2:
      
        $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
        $ perf stat -e cycles:k sleep 1
        Error:
        You may not have permission to collect stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
        which controls use of the performance events system by
        unprivileged users (without CAP_PERFMON or CAP_SYS_ADMIN).
      
        The current value is 2:
      
          -1: Allow use of (almost) all events by all users
              Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
        >= 0: Disallow ftrace function tracepoint by users without CAP_PERFMON or CAP_SYS_ADMIN
              Disallow raw tracepoint access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN
        >= 1: Disallow CPU event access by users without CAP_PERFMON or CAP_SYS_ADMIN
        >= 2: Disallow kernel profiling by users without CAP_PERFMON or CAP_SYS_ADMIN
      
        To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
      
                kernel.perf_event_paranoid = -1
      
      v2 of this patch addresses the review feedback from jolsa@redhat.com.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200414161550.225588-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add support for leader-sampling with AUX area events · e3459979
      Adrian Hunter authored
      When AUX area events are used in sampling mode, they must be the group
      leader, but the group leader is also used for leader-sampling. However,
      it is not desirable to use an AUX area event as the leader for
      leader-sampling, because it doesn't have any samples of its own. To support
      leader-sampling with AUX area events, use the 2nd event of the group as the
      "leader" for the purposes of leader-sampling.
      
      Example:
      
       # perf record --kcore --aux-sample -e '{intel_pt//,cycles,instructions}:S' -c 10000 uname
       [ perf record: Woken up 3 times to write data ]
       [ perf record: Captured and wrote 0.786 MB perf.data ]
       # perf report
       Samples: 380  of events 'anon group { cycles, instructions }', Event count (approx.): 3026164
                 Children              Self  Command  Shared Object      Symbol
       +   38.76%  42.65%     0.00%   0.00%  uname    [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
       +   35.82%  31.33%     0.00%   0.00%  uname    ld-2.28.so         [.] _dl_start_user
       +   34.29%  29.74%     0.55%   0.47%  uname    ld-2.28.so         [.] _dl_start
       +   33.73%  28.62%     1.60%   0.97%  uname    ld-2.28.so         [.] dl_main
       +   33.19%  29.04%     0.52%   0.32%  uname    ld-2.28.so         [.] _dl_sysdep_start
       +   27.83%  33.74%     0.00%   0.00%  uname    [kernel.kallsyms]  [k] do_syscall_64
       +   26.76%  33.29%     0.00%   0.00%  uname    [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
       +   23.78%  20.33%     5.97%   5.25%  uname    [kernel.kallsyms]  [k] page_fault
       +   23.18%  24.60%     0.00%   0.00%  uname    libc-2.28.so       [.] __libc_start_main
       +   22.64%  24.37%     0.00%   0.00%  uname    uname              [.] _start
       +   21.04%  23.27%     0.00%   0.00%  uname    uname              [.] main
       +   19.48%  18.08%     3.72%   3.64%  uname    ld-2.28.so         [.] _dl_relocate_object
       +   19.47%  21.81%     0.00%   0.00%  uname    libc-2.28.so       [.] setlocale
       +   19.44%  21.56%     0.52%   0.61%  uname    libc-2.28.so       [.] _nl_find_locale
       +   17.87%  19.66%     0.00%   0.00%  uname    libc-2.28.so       [.] _nl_load_locale_from_archive
       +   15.71%  13.73%     0.53%   0.52%  uname    [kernel.kallsyms]  [k] do_page_fault
       +   15.18%  13.21%     1.03%   0.68%  uname    [kernel.kallsyms]  [k] handle_mm_fault
       +   14.15%  12.53%     1.01%   1.12%  uname    [kernel.kallsyms]  [k] __handle_mm_fault
       +   12.03%   9.67%     0.54%   0.32%  uname    ld-2.28.so         [.] _dl_map_object
       +   10.55%   8.48%     0.00%   0.00%  uname    ld-2.28.so         [.] openaux
       +   10.55%  20.20%     0.52%   0.61%  uname    libc-2.28.so       [.] __run_exit_handlers
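
The leader choice described in this commit message can be modelled as follows. This is a simplified sketch, not the perf implementation: events are represented by plain strings, and `sampling_leader` and the caller-supplied `is_aux_event` predicate are hypothetical names.

```python
def sampling_leader(group, is_aux_event):
    """Pick the event that acts as "leader" for leader-sampling.

    group: list of events in group order, group[0] being the group leader.
    is_aux_event: caller-supplied predicate (hypothetical) telling whether
    an event is an AUX area event.
    """
    leader = group[0]
    if is_aux_event(leader) and len(group) > 1:
        # The AUX area event has no samples of its own, so the 2nd event
        # of the group takes the leader role for leader-sampling.
        return group[1]
    return leader
```

For the group used in the example above, `{intel_pt//,cycles,instructions}`, the model selects `cycles`, which matches the 'anon group { cycles, instructions }' heading in the report.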
      
      Committer notes:
      
      Fixed up this problem:
      
        util/record.c: In function ‘perf_evlist__config’:
        util/record.c:256:3: error: too few arguments to function ‘perf_evsel__config_leader_sampling’
          256 |   perf_evsel__config_leader_sampling(evsel);
              |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        util/record.c:190:13: note: declared here
          190 | static void perf_evsel__config_leader_sampling(struct evsel *evsel,
              |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-17-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Allow multiple read formats · 94d3820f
      Adrian Hunter authored
      Tools find the correct evsel, and therefore read format, using the event
      ID, so it isn't necessary for all read formats to be the same. In the
      case of leader-sampling of AUX area events, dummy tracking events will
      have a different read format, so relax the validation to become a debug
      message only.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-16-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Rearrange perf_evsel__config_leader_sampling() · 3713eb37
      Adrian Hunter authored
      In preparation for adding support for leader sampling with AUX area events.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-15-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Move leader-sampling configuration · 5f342788
      Adrian Hunter authored
      Move leader-sampling configuration in preparation for adding support for
      leader sampling with AUX area events.
      
      Committer notes:
      
      It only makes sense when configuring an evsel that is part of an evlist,
      so the only case where it is called outside perf_evlist__config(), in
      some 'perf test' entry, is safe, and even there we should just use
      perf_evlist__config(), but since in that case we have just one evsel in
      the evlist, it is equivalent.
      
      Also fixed up this problem:
      
        util/record.c: In function ‘perf_evlist__config’:
        util/record.c:223:3: error: too many arguments to function ‘perf_evsel__config_leader_sampling’
          223 |   perf_evsel__config_leader_sampling(evsel, evlist);
              |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        util/record.c:170:13: note: declared here
          170 | static void perf_evsel__config_leader_sampling(struct evsel *evsel)
              |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-14-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Move and globalize perf_evsel__find_pmu() and perf_evsel__is_aux_event() · e12ee9f7
      Adrian Hunter authored
      Move and globalize 2 functions from the auxtrace specific sources so
      that they can be reused.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-13-adrian.hunter@intel.com
      [ Move to pmu.c, as moving to evsel.h breaks the python binding ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 16 Apr, 2020 30 commits
    • perf intel-pt: Add support for synthesizing callchains for regular events · 2855c05c
      Adrian Hunter authored
      Currently, callchains can be synthesized only for synthesized events.
      Support also synthesizing callchains for regular events.
      
      Example:
      
       # perf record --kcore --aux-sample -e '{intel_pt//,cycles}' -c 10000 uname
       Linux
       [ perf record: Woken up 3 times to write data ]
       [ perf record: Captured and wrote 0.532 MB perf.data ]
       # perf script --itrace=Ge | head -20
       uname  4864 2419025.358181:      10000     cycles:
              ffffffffbba56965 apparmor_bprm_committing_creds+0x35 ([kernel.kallsyms])
              ffffffffbc400cd5 __indirect_thunk_start+0x5 ([kernel.kallsyms])
              ffffffffbba07422 security_bprm_committing_creds+0x22 ([kernel.kallsyms])
              ffffffffbb89805d install_exec_creds+0xd ([kernel.kallsyms])
              ffffffffbb90d9ac load_elf_binary+0x3ac ([kernel.kallsyms])
      
       uname  4864 2419025.358185:      10000     cycles:
              ffffffffbba56db0 apparmor_bprm_committed_creds+0x20 ([kernel.kallsyms])
              ffffffffbc400cd5 __indirect_thunk_start+0x5 ([kernel.kallsyms])
              ffffffffbba07452 security_bprm_committed_creds+0x22 ([kernel.kallsyms])
              ffffffffbb89809a install_exec_creds+0x4a ([kernel.kallsyms])
              ffffffffbb90d9ac load_elf_binary+0x3ac ([kernel.kallsyms])
      
       uname  4864 2419025.358189:      10000     cycles:
              ffffffffbb86fdf6 vma_adjust_trans_huge+0x6 ([kernel.kallsyms])
              ffffffffbb821660 __vma_adjust+0x160 ([kernel.kallsyms])
              ffffffffbb897be7 shift_arg_pages+0x97 ([kernel.kallsyms])
              ffffffffbb897ed9 setup_arg_pages+0x1e9 ([kernel.kallsyms])
              ffffffffbb90d9f2 load_elf_binary+0x3f2 ([kernel.kallsyms])
      
      Committer testing:
      
        # perf record --kcore --aux-sample -e '{intel_pt//,cycles}' -c 10000 uname
        Linux
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.233 MB perf.data ]
        #
      
      Then, before this patch:
      
        # perf script --itrace=Ge | head -20
           uname 28642 168664.856384: 10000 cycles: ffffffff9810aeaa commit_creds+0x2a ([kernel.kallsyms])
           uname 28642 168664.856388: 10000 cycles: ffffffff982a24f1 mprotect_fixup+0x151 ([kernel.kallsyms])
           uname 28642 168664.856392: 10000 cycles: ffffffff982a385b move_page_tables+0xbcb ([kernel.kallsyms])
           uname 28642 168664.856396: 10000 cycles: ffffffff982fd4ec __mod_memcg_state+0x1c ([kernel.kallsyms])
           uname 28642 168664.856400: 10000 cycles: ffffffff9829fddd do_mmap+0xfd ([kernel.kallsyms])
           uname 28642 168664.856404: 10000 cycles: ffffffff9829c879 __vma_adjust+0x479 ([kernel.kallsyms])
           uname 28642 168664.856408: 10000 cycles: ffffffff98238e94 __perf_addr_filters_adjust+0x34 ([kernel.kallsyms])
           uname 28642 168664.856412: 10000 cycles: ffffffff98a38e0b down_write+0x1b ([kernel.kallsyms])
           uname 28642 168664.856416: 10000 cycles: ffffffff983006a0 memcg_kmem_get_cache+0x0 ([kernel.kallsyms])
           uname 28642 168664.856421: 10000 cycles: ffffffff98396eaf load_elf_binary+0x92f ([kernel.kallsyms])
           uname 28642 168664.856425: 10000 cycles: ffffffff982e0222 kfree+0x62 ([kernel.kallsyms])
           uname 28642 168664.856428: 10000 cycles: ffffffff9846dfd4 file_has_perm+0x54 ([kernel.kallsyms])
           uname 28642 168664.856433: 10000 cycles: ffffffff98288911 vma_interval_tree_insert+0x51 ([kernel.kallsyms])
           uname 28642 168664.856437: 10000 cycles: ffffffff9823e577 perf_event_mmap_output+0x27 ([kernel.kallsyms])
           uname 28642 168664.856441: 10000 cycles: ffffffff98a26fa0 xas_load+0x40 ([kernel.kallsyms])
           uname 28642 168664.856445: 10000 cycles: ffffffff98004f30 arch_setup_additional_pages+0x0 ([kernel.kallsyms])
           uname 28642 168664.856448: 10000 cycles: ffffffff98a297c0 copy_user_generic_unrolled+0xa0 ([kernel.kallsyms])
           uname 28642 168664.856452: 10000 cycles: ffffffff9853a87a strnlen_user+0x10a ([kernel.kallsyms])
           uname 28642 168664.856456: 10000 cycles: ffffffff986638a7 randomize_page+0x27 ([kernel.kallsyms])
           uname 28642 168664.856460: 10000 cycles: ffffffff98a3b645 _raw_spin_lock+0x5 ([kernel.kallsyms])
      
        #
      
      And after:
      
        # perf script --itrace=Ge | head -20
        uname 28642 168664.856384:      10000     cycles:
        	ffffffff9810aeaa commit_creds+0x2a ([kernel.kallsyms])
        	ffffffff9831fe87 install_exec_creds+0x17 ([kernel.kallsyms])
        	ffffffff983968d9 load_elf_binary+0x359 ([kernel.kallsyms])
        	ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms])
        	ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms])
      
        uname 28642 168664.856388:      10000     cycles:
        	ffffffff982a24f1 mprotect_fixup+0x151 ([kernel.kallsyms])
        	ffffffff9831fa83 setup_arg_pages+0x123 ([kernel.kallsyms])
        	ffffffff9839691f load_elf_binary+0x39f ([kernel.kallsyms])
        	ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms])
        	ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms])
      
        uname 28642 168664.856392:      10000     cycles:
        	ffffffff982a385b move_page_tables+0xbcb ([kernel.kallsyms])
        	ffffffff9831f889 shift_arg_pages+0xa9 ([kernel.kallsyms])
        	ffffffff9831fb4f setup_arg_pages+0x1ef ([kernel.kallsyms])
        	ffffffff9839691f load_elf_binary+0x39f ([kernel.kallsyms])
        	ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms])
        #
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-12-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Add support for synthesized sample type · e11869a0
      Adrian Hunter authored
      For reporting purposes, an evsel sample can have a callchain synthesized
      from AUX area data. Add support for keeping track of synthesized sample
      types. Note, the recorded sample_type cannot be changed because it is
      needed to continue to parse events.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-11-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Be consistent when looking which evsel PERF_SAMPLE_ bits are set · 8e94b324
      Adrian Hunter authored
      Using 'type' variable for checking for callchains is equivalent to using
      evsel__has_callchain(evsel) and is how the other PERF_SAMPLE_ bits are checked
      in this function, so use it to be consistent.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-11-adrian.hunter@intel.com
      [ split from a larger patch ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf thread-stack: Add thread_stack__sample_late() · 4fef41bf
      Adrian Hunter authored
      Add a thread stack function to create a call chain for hardware events
      where the sample records get created some time after the event occurred.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-10-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf auxtrace: Add an option to synthesize callchains for regular events · 1c5c25b3
      Adrian Hunter authored
      Currently, callchains can be synthesized only for synthesized events. Add
      an itrace option to synthesize callchains for regular events.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-9-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf auxtrace: For reporting purposes, un-group AUX area event · 5c7bec0c
      Adrian Hunter authored
      An AUX area event must be the group leader when recording traces in
      sample mode, but that does not produce the expected results from
      'perf report' because it expects the leader to provide samples.
      
      Rather than teach 'perf report' about AUX area sampling, un-group the
      AUX area event during processing, making the 2nd event the leader.
      
      Example:
      
       $ perf record -e '{intel_pt//u,branch-misses:u}' -c 1 uname
       Linux
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.080 MB perf.data ]
      
       Before:
      
       $ perf report
      
       Samples: 800  of events 'anon group { intel_pt//u, branch-misses:u }', Event count (approx.): 800
              Children              Self  Command  Shared Object     Symbol
           0.00%  47.50%     0.00%  47.50%  uname    libc-2.28.so      [.] _dl_addr
           0.00%  16.38%     0.00%  16.38%  uname    ld-2.28.so        [.] __GI___tunables_init
           0.00%  54.75%     0.00%   4.75%  uname    ld-2.28.so        [.] dl_main
           0.00%   3.12%     0.00%   3.12%  uname    ld-2.28.so        [.] _dl_map_object_from_fd
           0.00%   2.38%     0.00%   2.38%  uname    ld-2.28.so        [.] strcmp
           0.00%   2.25%     0.00%   2.25%  uname    ld-2.28.so        [.] _dl_check_map_versions
           0.00%   2.00%     0.00%   2.00%  uname    ld-2.28.so        [.] _dl_important_hwcaps
           0.00%   2.00%     0.00%   2.00%  uname    ld-2.28.so        [.] _dl_map_object_deps
           0.00%  51.50%     0.00%   1.50%  uname    ld-2.28.so        [.] _dl_sysdep_start
           0.00%   1.25%     0.00%   1.25%  uname    ld-2.28.so        [.] _dl_load_cache_lookup
           0.00%  51.12%     0.00%   1.12%  uname    ld-2.28.so        [.] _dl_start
           0.00%  50.88%     0.00%   1.12%  uname    ld-2.28.so        [.] do_lookup_x
           0.00%  50.62%     0.00%   1.00%  uname    ld-2.28.so        [.] _dl_lookup_symbol_x
           0.00%   1.00%     0.00%   1.00%  uname    ld-2.28.so        [.] _dl_map_object
           0.00%   1.00%     0.00%   1.00%  uname    ld-2.28.so        [.] _dl_next_ld_env_entry
           0.00%   0.88%     0.00%   0.88%  uname    ld-2.28.so        [.] _dl_cache_libcmp
           0.00%   0.88%     0.00%   0.88%  uname    ld-2.28.so        [.] _dl_new_object
           0.00%  50.88%     0.00%   0.88%  uname    ld-2.28.so        [.] _dl_relocate_object
           0.00%   0.62%     0.00%   0.62%  uname    ld-2.28.so        [.] _dl_init_paths
           0.00%   0.62%     0.00%   0.62%  uname    ld-2.28.so        [.] _dl_name_match_p
           0.00%   0.50%     0.00%   0.50%  uname    ld-2.28.so        [.] get_common_indeces.constprop.1
           0.00%   0.50%     0.00%   0.50%  uname    ld-2.28.so        [.] memmove
           0.00%   0.50%     0.00%   0.50%  uname    ld-2.28.so        [.] memset
           0.00%   0.50%     0.00%   0.50%  uname    ld-2.28.so        [.] open_verify.constprop.11
           0.00%   0.38%     0.00%   0.38%  uname    ld-2.28.so        [.] _dl_check_all_versions
           0.00%   0.38%     0.00%   0.38%  uname    ld-2.28.so        [.] _dl_find_dso_for_object
           0.00%   0.38%     0.00%   0.38%  uname    ld-2.28.so        [.] init_tls
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] __tunable_get_val
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] _dl_add_to_namespace_list
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] _dl_determine_tlsoffset
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] _dl_discover_osversion
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] calloc@plt
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] malloc
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] malloc@plt
           0.00%   0.25%     0.00%   0.25%  uname    libc-2.28.so      [.] _nl_load_locale_from_archive
           0.00%   0.25%     0.00%   0.25%  uname    [unknown]         [k] 0xffffffffa3a00010
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] __libc_scratch_buffer_set_array_size
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] _dl_allocate_tls_storage
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] _dl_catch_exception
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] _dl_setup_hash
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] _dl_sort_maps
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] _dl_sysdep_read_whole_file
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] access
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] calloc
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] mmap64
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] openaux
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] rtld_lock_default_lock_recursive
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] rtld_lock_default_unlock_recursive
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] strchr
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] strlen
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] 0x0000000000001080
           0.00%   0.12%     0.00%   0.12%  uname    libc-2.28.so      [.] __strchrnul_avx2
           0.00%   0.12%     0.00%   0.12%  uname    libc-2.28.so      [.] _nl_normalize_codeset
           0.00%   0.12%     0.00%   0.12%  uname    libc-2.28.so      [.] malloc
           0.00%   0.12%     0.00%   0.12%  uname    [unknown]         [k] 0xffffffffa3a011f0
           0.00%  50.00%     0.00%   0.00%  uname    ld-2.28.so        [.] _dl_start_user
           0.00%  50.00%     0.00%   0.00%  uname    [unknown]         [.] 0000000000000000
      
       After:
      
       Samples: 800  of event 'branch-misses:u', Event count (approx.): 800
        Children      Self  Command  Shared Object     Symbol
          54.75%     4.75%  uname    ld-2.28.so        [.] dl_main
          51.50%     1.50%  uname    ld-2.28.so        [.] _dl_sysdep_start
          51.12%     1.12%  uname    ld-2.28.so        [.] _dl_start
          50.88%     0.88%  uname    ld-2.28.so        [.] _dl_relocate_object
          50.88%     1.12%  uname    ld-2.28.so        [.] do_lookup_x
          50.62%     1.00%  uname    ld-2.28.so        [.] _dl_lookup_symbol_x
          50.00%     0.00%  uname    ld-2.28.so        [.] _dl_start_user
          50.00%     0.00%  uname    [unknown]         [.] 0000000000000000
          47.50%    47.50%  uname    libc-2.28.so      [.] _dl_addr
          16.38%    16.38%  uname    ld-2.28.so        [.] __GI___tunables_init
           3.12%     3.12%  uname    ld-2.28.so        [.] _dl_map_object_from_fd
           2.38%     2.38%  uname    ld-2.28.so        [.] strcmp
           2.25%     2.25%  uname    ld-2.28.so        [.] _dl_check_map_versions
           2.00%     2.00%  uname    ld-2.28.so        [.] _dl_important_hwcaps
           2.00%     2.00%  uname    ld-2.28.so        [.] _dl_map_object_deps
           1.25%     1.25%  uname    ld-2.28.so        [.] _dl_load_cache_lookup
           1.00%     1.00%  uname    ld-2.28.so        [.] _dl_map_object
           1.00%     1.00%  uname    ld-2.28.so        [.] _dl_next_ld_env_entry
           0.88%     0.88%  uname    ld-2.28.so        [.] _dl_cache_libcmp
           0.88%     0.88%  uname    ld-2.28.so        [.] _dl_new_object
           0.62%     0.62%  uname    ld-2.28.so        [.] _dl_init_paths
           0.62%     0.62%  uname    ld-2.28.so        [.] _dl_name_match_p
           0.50%     0.50%  uname    ld-2.28.so        [.] get_common_indeces.constprop.1
           0.50%     0.50%  uname    ld-2.28.so        [.] memmove
           0.50%     0.50%  uname    ld-2.28.so        [.] memset
           0.50%     0.50%  uname    ld-2.28.so        [.] open_verify.constprop.11
           0.38%     0.38%  uname    ld-2.28.so        [.] _dl_check_all_versions
           0.38%     0.38%  uname    ld-2.28.so        [.] _dl_find_dso_for_object
           0.38%     0.38%  uname    ld-2.28.so        [.] init_tls
           0.25%     0.25%  uname    ld-2.28.so        [.] __tunable_get_val
           0.25%     0.25%  uname    ld-2.28.so        [.] _dl_add_to_namespace_list
           0.25%     0.25%  uname    ld-2.28.so        [.] _dl_determine_tlsoffset
           0.25%     0.25%  uname    ld-2.28.so        [.] _dl_discover_osversion
           0.25%     0.25%  uname    ld-2.28.so        [.] calloc@plt
           0.25%     0.25%  uname    ld-2.28.so        [.] malloc
           0.25%     0.25%  uname    ld-2.28.so        [.] malloc@plt
           0.25%     0.25%  uname    libc-2.28.so      [.] _nl_load_locale_from_archive
           0.25%     0.25%  uname    [unknown]         [k] 0xffffffffa3a00010
           0.12%     0.12%  uname    ld-2.28.so        [.] __libc_scratch_buffer_set_array_size
           0.12%     0.12%  uname    ld-2.28.so        [.] _dl_allocate_tls_storage
           0.12%     0.12%  uname    ld-2.28.so        [.] _dl_catch_exception
           0.12%     0.12%  uname    ld-2.28.so        [.] _dl_setup_hash
           0.12%     0.12%  uname    ld-2.28.so        [.] _dl_sort_maps
           0.12%     0.12%  uname    ld-2.28.so        [.] _dl_sysdep_read_whole_file
           0.12%     0.12%  uname    ld-2.28.so        [.] access
           0.12%     0.12%  uname    ld-2.28.so        [.] calloc
           0.12%     0.12%  uname    ld-2.28.so        [.] mmap64
           0.12%     0.12%  uname    ld-2.28.so        [.] openaux
           0.12%     0.12%  uname    ld-2.28.so        [.] rtld_lock_default_lock_recursive
           0.12%     0.12%  uname    ld-2.28.so        [.] rtld_lock_default_unlock_recursive
           0.12%     0.12%  uname    ld-2.28.so        [.] strchr
           0.12%     0.12%  uname    ld-2.28.so        [.] strlen
           0.12%     0.12%  uname    ld-2.28.so        [.] 0x0000000000001080
           0.12%     0.12%  uname    libc-2.28.so      [.] __strchrnul_avx2
           0.12%     0.12%  uname    libc-2.28.so      [.] _nl_normalize_codeset
           0.12%     0.12%  uname    libc-2.28.so      [.] malloc
           0.12%     0.12%  uname    [unknown]         [k] 0xffffffffa3a011f0
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-8-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Adrian Hunter's avatar
      perf s390-cpumsf: Implement ->evsel_is_auxtrace() callback · 113fcb46
      Adrian Hunter authored
      Implement ->evsel_is_auxtrace() callback.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-7-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Adrian Hunter's avatar
      perf cs-etm: Implement ->evsel_is_auxtrace() callback · a58ab57c
      Adrian Hunter authored
      Implement ->evsel_is_auxtrace() callback.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-6-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Adrian Hunter's avatar
      perf arm-spe: Implement ->evsel_is_auxtrace() callback · 508c71e3
      Adrian Hunter authored
      Implement ->evsel_is_auxtrace() callback.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-5-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Adrian Hunter's avatar
      perf intel-bts: Implement ->evsel_is_auxtrace() callback · 966246f5
      Adrian Hunter authored
      Implement ->evsel_is_auxtrace() callback.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-4-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Adrian Hunter's avatar
      perf intel-pt: Implement ->evsel_is_auxtrace() callback · 6b52bb07
      Adrian Hunter authored
      Implement ->evsel_is_auxtrace() callback.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Adrian Hunter's avatar
      perf auxtrace: Add ->evsel_is_auxtrace() callback · 853f37d7
      Adrian Hunter authored
      Add ->evsel_is_auxtrace() callback to identify if a selected event
      is an AUX area event.
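      
      As a rough illustration of how such a callback might be dispatched,
      the sketch below models a session with an optional AUX trace decoder.
      All class names, the attr_type field and the PMU type number 8 are
      made up for the example; the assumption that a decoder recognises its
      events by comparing PMU types is ours and is not stated in this
      commit.

```python
class Evsel:
    """Hypothetical stand-in for a selected event."""
    def __init__(self, name, attr_type):
        self.name = name
        self.attr_type = attr_type


class PmuTypeDecoder:
    """Hypothetical decoder that recognises its events by PMU type."""
    def __init__(self, pmu_type):
        self.pmu_type = pmu_type

    def evsel_is_auxtrace(self, evsel):
        return evsel.attr_type == self.pmu_type


class Session:
    def __init__(self, auxtrace=None):
        self.auxtrace = auxtrace      # None when no AUX area data is present


def evsel_is_auxtrace(session, evsel):
    """Ask the decoder, if any; without one no event is an AUX area event."""
    callback = getattr(session.auxtrace, "evsel_is_auxtrace", None)
    if callback is None:
        return False
    return callback(evsel)


session = Session(PmuTypeDecoder(pmu_type=8))
print(evsel_is_auxtrace(session, Evsel("intel_pt//u", 8)))
print(evsel_is_auxtrace(session, Evsel("branch-misses:u", 0)))
```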
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Andreas Gerstmayr's avatar
      perf script: Add flamegraph.py script · 5287f926
      Andreas Gerstmayr authored
      This script works in tandem with d3-flame-graph to generate flame graphs
      from perf. It supports two output formats: JSON and HTML (the default).
      The HTML format will look for a standalone d3-flame-graph template file
      in /usr/share/d3-flame-graph/d3-flamegraph-base.html and fill in the
      collected stacks.
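      
      To give an idea of what "filling in the collected stacks" can look
      like, here is a minimal sketch that folds call stacks into a nested
      name/value/children tree, the hierarchical shape d3-flame-graph reads.
      The function names and sample stacks are invented for illustration,
      and this is not the code in flamegraph.py.

```python
import json


def new_node(name):
    return {"name": name, "value": 0, "children": []}


def add_stack(root, frames, count=1):
    """Add one call stack (outermost frame first) to the tree."""
    node = root
    node["value"] += count
    for frame in frames:
        for child in node["children"]:
            if child["name"] == frame:
                break
        else:
            # No existing child for this frame: create one.
            child = new_node(frame)
            node["children"].append(child)
        child["value"] += count
        node = child
    return root


root = new_node("root")
add_stack(root, ["main", "parse", "read"])
add_stack(root, ["main", "parse", "read"])
add_stack(root, ["main", "render"])
print(json.dumps(root, indent=2))
```

      Each node's value here counts every sample that passed through it, so
      a parent is always at least as wide as the sum of its children.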
      
      Usage:
      
          perf record -a -g -F 99 sleep 60
          perf script report flamegraph
      
      Combined:
      
          perf script flamegraph -a -F 99 sleep 60
      
      Committer testing:
      
      Tested both with "PYTHON=python3" and with the default, that uses
      python2-devel:
      
      Complete set of instructions:
      
        $ mkdir /tmp/build/perf
        $ make PYTHON=python3 -C tools/perf O=/tmp/build/perf install-bin
        $ export PATH=~/bin:$PATH
        $ perf record -a -g -F 99 sleep 60
        $ perf script report flamegraph
      
      Now go and open the generated flamegraph.html file in a browser.
      
      At first this required building with PYTHON=python3, but after I
      reported this Andreas was kind enough to send a patch making it work
      with both python and python3.
      
      Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brendan Gregg <bgregg@netflix.com>
      Cc: Martin Spier <mspier@netflix.com>
      Link: http://lore.kernel.org/lkml/20200320151355.66302-1-agerstmayr@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Kajol Jain's avatar
      perf metricgroup: Split the metricgroup__add_metric function · 47352aba
      Kajol Jain authored
      This patch refactors the metricgroup__add_metric function, moving part
      of it into a new function, metricgroup__add_metric_param.  No logic change.
      
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20200401203340.31402-4-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Jiri Olsa's avatar
      perf expr: Add expr_scanner_ctx object · 871f9f59
      Jiri Olsa authored
      Add the expr_scanner_ctx object to hold user data for the expr scanner.
      Currently it holds only start_token; Kajol Jain will use it to hold the
      24x7 runtime param.
      
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20200401203340.31402-3-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Jiri Olsa's avatar
      perf expr: Add expr_ prefix for parse_ctx and parse_id · aecce63e
      Jiri Olsa authored
      Adding expr_ prefix for parse_ctx and parse_id, to straighten out the
      expr* namespace.
      
      There's no functional change.
      
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20200401203340.31402-2-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ian Rogers's avatar
      perf synthetic-events: save 4kb from 2 stack frames · 04ed4ccb
      Ian Rogers authored
      Reuse an existing char buffer to avoid two PATH_MAX sized char buffers.
      
      Reduces stack frame sizes by 4kb.
      
      perf_event__synthesize_mmap_events before 'sub $0x45b8,%rsp' after
      'sub $0x35b8,%rsp'.
      
      perf_event__get_comm_ids before 'sub $0x2028,%rsp' after
      'sub $0x1028,%rsp'.
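      
      The saving can be checked from the 'sub' immediates quoted above. The
      short calculation below uses only those four values; PATH_MAX being
      4096 is the usual Linux value and is stated here as an assumption.

```python
PATH_MAX = 4096  # assumed: the usual Linux value of PATH_MAX

# (stack adjustment before, stack adjustment after), from the 'sub' operands
frames = {
    "perf_event__synthesize_mmap_events": (0x45B8, 0x35B8),
    "perf_event__get_comm_ids": (0x2028, 0x1028),
}

savings = {name: before - after for name, (before, after) in frames.items()}
for name, saved in savings.items():
    print(f"{name}: {saved} bytes saved ({saved // PATH_MAX} x PATH_MAX)")
```

      Each frame shrinks by 0x1000 bytes, i.e. one PATH_MAX sized buffer.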
      
      The performance impact of this change is negligible.
      
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrey Zhizhikin <andrey.z@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/20200402154357.107873-4-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Stephane Eranian's avatar
      tools api fs: Make xxx__mountpoint() more scalable · c6fddb28
      Stephane Eranian authored
      The xxx_mountpoint() interface provided by fs.c finds mount points for
      common pseudo filesystems. The first time xxx_mountpoint() is invoked,
      it scans the mount table (/proc/mounts) looking for a match. If found,
      it is cached. The price to scan /proc/mounts is paid once if the mount
      is found.
      
      When the mount point is not found, subsequent calls to xxx_mountpoint()
      scan /proc/mounts over and over again.  There is no caching.
      
      This causes a scaling issue in perf record with hugetlbfs__mountpoint().
      The function is called for each process found in
      synthesize__mmap_events().  If the machine has thousands of processes
      and if the /proc/mounts has many entries this could cause major overhead
      in perf record. We have observed multi-second slowdowns on some
      configurations.
      
      As an example on a laptop:
      
      Before:
      
        $ sudo umount /dev/hugepages
        $ strace -e trace=openat -o /tmp/tt perf record -a ls
        $ fgrep mounts /tmp/tt
        285
      
      After:
      
        $ sudo umount /dev/hugepages
        $ strace -e trace=openat -o /tmp/tt perf record -a ls
        $ fgrep mounts /tmp/tt
        1
      
      One could argue that the non-caching in case the mount point is not
      found is intentional. That way subsequent calls may discover a mount
      point if the sysadmin mounts the filesystem. But the same argument could
      be made against caching the mount point. It could be unmounted causing
      errors.  It all depends on the intent of the interface. This patch
      assumes it is expected to scan /proc/mounts once. The patch documents
      the caching behavior in the fs.h header file.
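      
      The behaviour described above, scanning once and remembering the
      outcome even when nothing is found, can be sketched as follows. The
      MountpointCache class and its fields are hypothetical and only model
      the idea; the actual change is in fs.c.

```python
class MountpointCache:
    """Look up a filesystem's mount point, scanning the mount table once."""

    def __init__(self, fstype, read_mounts):
        self.fstype = fstype
        self.read_mounts = read_mounts  # callable returning /proc/mounts text
        self.checked = False            # True once a scan has been attempted
        self.path = None                # cached result; None means "not mounted"
        self.scans = 0                  # number of scans, for demonstration

    def mountpoint(self):
        if self.checked:                # covers both hits and misses
            return self.path
        self.checked = True
        self.scans += 1
        for line in self.read_mounts().splitlines():
            fields = line.split()       # device, mount point, fstype, ...
            if len(fields) >= 3 and fields[2] == self.fstype:
                self.path = fields[1]
                break
        return self.path


MOUNTS = "proc /proc proc rw 0 0\nsysfs /sys sysfs rw 0 0\n"
cache = MountpointCache("hugetlbfs", lambda: MOUNTS)
for _ in range(285):
    cache.mountpoint()
print(cache.scans)
```

      The trade-off is the one discussed above: a filesystem mounted after
      the first lookup is not noticed by this cache.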
      
      An alternative would be to fix only perf record. That would solve the
      problem with hugetlbfs__mountpoint(), but similar issues could arise
      (possibly down the line) with other xxx_mountpoint() calls in perf or
      other tools.
      
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrey Zhizhikin <andrey.z@gmail.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/20200402154357.107873-3-irogers@google.com
      Signed-off-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ian Rogers's avatar
      perf bench: Add event synthesis benchmark · 2a4b5166
      Ian Rogers authored
      Event synthesis may occur at the start or end (tail) of a perf command.
      In system-wide mode it can scan every process in /proc, which may add
      seconds of latency before event recording. Add a new benchmark that
      times how long event synthesis takes with and without data synthesis.
      
      An example execution looks like:
      
       $ perf bench internals synthesize
       # Running 'internals/synthesize' benchmark:
       Average synthesis took: 168.253800 usec
       Average data synthesis took: 208.104700 usec
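      
      The shape of such a measurement can be sketched generically: run the
      operation a number of times and report the mean in microseconds. The
      helper below is only an illustration with a stand-in workload; the
      function names and the iteration count are assumptions and do not
      describe the benchmark's actual code.

```python
import time


def average_usec(fn, iterations=100):
    """Return the mean wall-clock time of fn() in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


def fake_synthesis():
    # Stand-in workload; the real benchmark synthesizes perf events.
    sum(range(1000))


print(f"Average synthesis took: {average_usec(fake_synthesis):.6f} usec")
```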
      
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrey Zhizhikin <andrey.z@gmail.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/20200402154357.107873-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Adrian Hunter's avatar
      perf script: Simplify auxiliary event printing functions · 1a2725f3
      Adrian Hunter authored
      This simplifies the print functions for the following perf script
      options:
      
      	--show-task-events
      	--show-namespace-events
      	--show-cgroup-events
      	--show-mmap-events
      	--show-switch-events
      	--show-lost-events
      	--show-bpf-events
      
      Example:
      	# perf record --switch-events -a -e cycles -c 10000 sleep 1
       Before:
      	# perf script --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events > out-before.txt
       After:
      	# perf script --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events > out-after.txt
      	# diff -s out-before.txt out-after.txt
      	Files out-before.txt and out-after.txt are identical
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200402141548.21283-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Alexey Budankov's avatar
      doc/admin-guide: update kernel.rst with CAP_PERFMON information · 025b16f8
      Alexey Budankov authored
      Update the kernel.rst documentation file with information on using the
      CAP_PERFMON capability to secure performance monitoring and
      observability operations in the system.
      
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/84c32383-14a2-fa35-16b6-f9e59bd37240@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Alexey Budankov's avatar
      doc/admin-guide: Update perf-security.rst with CAP_PERFMON information · 902a8dcc
      Alexey Budankov authored
      Update the perf-security.rst documentation file with information on
      using the CAP_PERFMON capability to secure performance monitoring and
      observability operations in the system.
      
      Committer notes:
      
      While testing 'perf top' under cap_perfmon I noticed that it needs
      some more capability and Alexey pointed out cap_ipc_lock, as needed by
      this kernel chunk:
      
        kernel/events/core.c: 6101
             if ((locked > lock_limit) && perf_is_paranoid() &&
                     !capable(CAP_IPC_LOCK)) {
                     ret = -EPERM;
                     goto unlock;
             }
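      
      Read as a predicate, the check above denies the request only when all
      three conditions hold. The sketch below restates it; treating "perf
      is paranoid" as a perf_event_paranoid level above -1 is an assumption
      made for the example, as are the function and parameter names.

```python
def mmap_lock_denied(locked, lock_limit, paranoid_level, has_cap_ipc_lock):
    """Mirror of the quoted condition: True means the request gets -EPERM."""
    perf_is_paranoid = paranoid_level > -1   # assumed meaning of perf_is_paranoid()
    return locked > lock_limit and perf_is_paranoid and not has_cap_ipc_lock


# Over the limit, paranoid, no CAP_IPC_LOCK: denied.
print(mmap_lock_denied(600, 516, 2, False))
# Same request with CAP_IPC_LOCK: allowed.
print(mmap_lock_denied(600, 516, 2, True))
```

      The numbers 600 and 516 are arbitrary illustration values for the
      amount requested and the limit.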
      
      So I added it to the documentation, and also mentioned that if the
      libcap version doesn't yet support 'cap_perfmon', its numeric value can
      be used instead, i.e. if:
      
      	# setcap "cap_perfmon,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf
      
      Fails, try:
      
      	# setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf
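      
      The two command lines differ only in how the first capability is
      spelled. A small sketch of building the capability string, with the
      helper name and its boolean parameter invented for illustration:

```python
CAP_PERFMON = 38  # numeric value used in the fallback command above


def setcap_spec(libcap_knows_perfmon):
    """Build the capability string, falling back to the numeric value."""
    first = "cap_perfmon" if libcap_knows_perfmon else str(CAP_PERFMON)
    caps = [first, "cap_ipc_lock", "cap_sys_ptrace", "cap_syslog"]
    return ",".join(caps) + "=ep"


print(setcap_spec(True))
print(setcap_spec(False))
```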
      
      I also added a paragraph stating that using an unpatched libcap will
      fail the check for CAP_PERFMON, as it checks the cap number against a
      maximum to see if it is valid. This makes perf use 'cycles:u' as the
      default event, even though a cap_perfmon capable perf binary can get
      kernel samples. To work around that just use, e.g.:
      
        # perf top -e cycles
        # perf record -e cycles
      
      And it will sample kernel and user modes.
      
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/17278551-9399-9ebe-d665-8827016a217d@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Alexey Budankov's avatar
      drivers/oprofile: Open access for CAP_PERFMON privileged process · ab76878b
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged processes.
      Providing access under the CAP_PERFMON capability alone, without the
      rest of the CAP_SYS_ADMIN credentials, reduces the chance of misusing
      the credentials and makes the operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility, access to monitoring remains open to
      CAP_SYS_ADMIN privileged processes, but using CAP_SYS_ADMIN for secure
      monitoring is discouraged in favor of the CAP_PERFMON capability.
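      
      The resulting access rule is "CAP_PERFMON, or CAP_SYS_ADMIN for
      backward compatibility". A minimal model of that rule follows; the
      function name and the use of a plain set of capability numbers are
      illustrative assumptions, and 21 is assumed to be the numeric value
      of CAP_SYS_ADMIN.

```python
CAP_SYS_ADMIN = 21  # assumed numeric value on Linux
CAP_PERFMON = 38


def monitoring_allowed(effective_caps):
    """Allow access with CAP_PERFMON, or with CAP_SYS_ADMIN as a fallback."""
    return CAP_PERFMON in effective_caps or CAP_SYS_ADMIN in effective_caps


print(monitoring_allowed({CAP_PERFMON}))    # least-privilege setup
print(monitoring_allowed({CAP_SYS_ADMIN}))  # legacy setup, still accepted
print(monitoring_allowed(set()))            # unprivileged
```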
      
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Acked-by: James Morris <jamorris@linux.microsoft.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/691f1096-b15f-9b12-50a0-c2b93918149e@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ab76878b
    • Alexey Budankov's avatar
      drivers/perf: Open access for CAP_PERFMON privileged process · cea7d0d4
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged processes.  Granting
      access under the CAP_PERFMON capability alone, without the rest of the
      CAP_SYS_ADMIN credentials, removes the opportunity to misuse those
      credentials and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility, access to monitoring remains open to
      CAP_SYS_ADMIN privileged processes, but using CAP_SYS_ADMIN for secure
      monitoring is discouraged in favor of the CAP_PERFMON capability.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Acked-by: Will Deacon <will@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/4ec1d6f7-548c-8d1c-f84a-cebeb9674e4e@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cea7d0d4
    • Alexey Budankov's avatar
      parisc/perf: open access for CAP_PERFMON privileged process · cf91baf3
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged processes.  Granting
      access under the CAP_PERFMON capability alone, without the rest of the
      CAP_SYS_ADMIN credentials, removes the opportunity to misuse those
      credentials and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility, access to monitoring remains open to
      CAP_SYS_ADMIN privileged processes, but using CAP_SYS_ADMIN for secure
      monitoring is discouraged in favor of the CAP_PERFMON capability.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Acked-by: Helge Deller <deller@gmx.de>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/8cc98809-d35b-de0f-de02-4cf554f3cf62@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cf91baf3
    • Alexey Budankov's avatar
      powerpc/perf: open access for CAP_PERFMON privileged process · ff467583
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged processes.  Granting
      access under the CAP_PERFMON capability alone, without the rest of the
      CAP_SYS_ADMIN credentials, removes the opportunity to misuse those
      credentials and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility, access to monitoring remains open to
      CAP_SYS_ADMIN privileged processes, but using CAP_SYS_ADMIN for secure
      monitoring is discouraged in favor of the CAP_PERFMON capability.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Acked-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/ac98cd9f-b59e-673c-c70d-180b3e7695d2@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ff467583
    • Alexey Budankov's avatar
      trace/bpf_trace: Open access for CAP_PERFMON privileged process · 031258da
      Alexey Budankov authored
      Open access to bpf_trace monitoring for CAP_PERFMON privileged processes.
      Granting access under the CAP_PERFMON capability alone, without the rest
      of the CAP_SYS_ADMIN credentials, removes the opportunity to misuse those
      credentials and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility, access to bpf_trace monitoring remains open
      to CAP_SYS_ADMIN privileged processes, but using CAP_SYS_ADMIN for secure
      bpf_trace monitoring is discouraged in favor of the CAP_PERFMON
      capability.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/c0a0ae47-8b6e-ff3e-416b-3cd1faaf71c0@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      031258da
    • Alexey Budankov's avatar
      drm/i915/perf: Open access for CAP_PERFMON privileged process · 4e3d3456
      Alexey Budankov authored
      Open access to i915_perf monitoring for CAP_PERFMON privileged processes.
      Granting access under the CAP_PERFMON capability alone, without the rest
      of the CAP_SYS_ADMIN credentials, removes the opportunity to misuse those
      credentials and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility, access to the i915_events subsystem remains
      open to CAP_SYS_ADMIN privileged processes, but using CAP_SYS_ADMIN for
      secure i915_events monitoring is discouraged in favor of the CAP_PERFMON
      capability.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/e3e3292f-f765-ea98-e59c-fbe2db93fd34@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4e3d3456
    • Alexey Budankov's avatar
      perf tools: Support CAP_PERFMON capability · 6b3e0e2e
      Alexey Budankov authored
      Extend error messages to mention the CAP_PERFMON capability as a
      substitute for the CAP_SYS_ADMIN capability for secure system performance
      monitoring and observability operations. Make
      perf_event_paranoid_check() and __cmd_ftrace() aware of the
      CAP_PERFMON capability.
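
      As an illustration only, the decision described above can be sketched as
      follows. This is a simplified model, not the perf tool's code: it assumes
      the check passes when either capability is effective or when the
      perf_event_paranoid level is at or below the level the operation needs.
      The function and parameter names are hypothetical.

```python
# Capability numbers as defined in linux/capability.h.
CAP_SYS_ADMIN = 21
CAP_PERFMON = 38


def paranoid_check(effective_caps, paranoid_level, max_level):
    """Return True when monitoring that requires `max_level` is allowed.

    effective_caps: set of capability numbers effective for the process
    paranoid_level: current value of the perf_event_paranoid sysctl
    max_level:      highest paranoid level at which the operation is
                    permitted without extra privileges (assumed semantics)
    """
    # Either capability is treated as sufficient; CAP_SYS_ADMIN is kept
    # for backward compatibility.
    privileged = CAP_SYS_ADMIN in effective_caps or CAP_PERFMON in effective_caps
    return privileged or paranoid_level <= max_level
```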
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility, access to the perf_events subsystem remains
      open to CAP_SYS_ADMIN privileged processes, but using CAP_SYS_ADMIN for
      secure perf_events monitoring is discouraged in favor of the CAP_PERFMON
      capability.
      
      Committer testing:
      
      Using a libcap with this patch:
      
        diff --git a/libcap/include/uapi/linux/capability.h b/libcap/include/uapi/linux/capability.h
        index 78b2fd4c8a95..89b5b0279b60 100644
        --- a/libcap/include/uapi/linux/capability.h
        +++ b/libcap/include/uapi/linux/capability.h
        @@ -366,8 +366,9 @@ struct vfs_ns_cap_data {
      
         #define CAP_AUDIT_READ       37
      
        +#define CAP_PERFMON	     38
      
        -#define CAP_LAST_CAP         CAP_AUDIT_READ
        +#define CAP_LAST_CAP         CAP_PERFMON
      
         #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
      
      Note that using '38' in place of 'cap_perfmon' works to some degree with
      an old libcap; it's only when cap_get_flag() is called that libcap
      performs an error check based on the maximum capability value it knows
      about, and that check fails.
      
      This makes determining the default of perf_event_attr.exclude_kernel
      fail, as it can't be determined whether CAP_PERFMON is in place.
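
      The range check and the capability test discussed above can be sketched
      without libcap. The example below is an assumption-laden illustration: it
      mirrors the cap_valid() macro from the hunk above and reads the effective
      capability mask from the CapEff line of a /proc/<pid>/status style text.
      The helper names are hypothetical.

```python
CAP_AUDIT_READ = 37
CAP_PERFMON = 38  # value added by the capability.h hunk above


def cap_valid(cap, last_cap):
    """Mirror of the cap_valid() macro: 0 <= cap <= CAP_LAST_CAP."""
    return 0 <= cap <= last_cap


def parse_cap_eff(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, cap):
    """Test whether bit `cap` is set in the capability mask."""
    return bool((mask >> cap) & 1)


# Sample mask with cap_sys_ptrace (19), cap_syslog (34) and cap_perfmon (38).
sample_status = "Name:\tperf\nCapEff:\t0000004400080000\n"
mask = parse_cap_eff(sample_status)

# An old library whose last known capability is CAP_AUDIT_READ rejects 38,
# while one built with the patched header accepts it.
old_library_accepts = cap_valid(CAP_PERFMON, CAP_AUDIT_READ)
new_library_accepts = cap_valid(CAP_PERFMON, CAP_PERFMON)
```

      On a live system, the same helpers could be fed the contents of
      /proc/self/status instead of the sample text.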
      
      Using 'perf top -e cycles' avoids the default check and sets
      perf_event_attr.exclude_kernel to 1.
      
      As root, with a libcap supporting CAP_PERFMON:
      
        # groupadd perf_users
        # adduser perf -g perf_users
        # mkdir ~perf/bin
        # cp ~acme/bin/perf ~perf/bin/
        # chgrp perf_users ~perf/bin/perf
        # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" ~perf/bin/perf
        # getcap ~perf/bin/perf
        /home/perf/bin/perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep
        # ls -la ~perf/bin/perf
        -rwxr-xr-x. 1 root perf_users 16968552 Apr  9 13:10 /home/perf/bin/perf
      
      As the 'perf' user in the 'perf_users' group:
      
        $ perf top -a --stdio
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $
      
      Either add the cap_ipc_lock capability to the perf binary or reduce the
      ring buffer size to some smaller value:
      
        $ perf top -m10 -a --stdio
        rounding mmap pages size to 64K (16 pages)
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $ perf top -m4 -a --stdio
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $ perf top -m2 -a --stdio
         PerfTop: 762 irqs/sec  kernel:49.7%  exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 4 CPUs)
        ------------------------------------------------------------------------------------------------------
      
           9.83%  perf                [.] __symbols__insert
           8.58%  perf                [.] rb_next
           5.91%  [kernel]            [k] module_get_kallsym
           5.66%  [kernel]            [k] kallsyms_expand_symbol.constprop.0
           3.98%  libc-2.29.so        [.] __GI_____strtoull_l_internal
           3.66%  perf                [.] rb_insert_color
           2.34%  [kernel]            [k] vsnprintf
           2.30%  [kernel]            [k] string_nocheck
           2.16%  libc-2.29.so        [.] _IO_getdelim
           2.15%  [kernel]            [k] number
           2.13%  [kernel]            [k] format_decode
           1.58%  libc-2.29.so        [.] _IO_feof
           1.52%  libc-2.29.so        [.] __strcmp_avx2
           1.50%  perf                [.] rb_set_parent_color
           1.47%  libc-2.29.so        [.] __libc_calloc
           1.24%  [kernel]            [k] do_syscall_64
           1.17%  [kernel]            [k] __x86_indirect_thunk_rax
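
      The "rounding mmap pages size to 64K (16 pages)" message printed for
      -m10 above is consistent with the page count being rounded up to the next
      power of two. A minimal sketch of that arithmetic, assuming 4 KiB pages
      and assuming this is the rounding rule (the helper name is hypothetical):

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages, consistent with "64K (16 pages)"


def round_up_pow2(pages):
    """Round a positive page count up to the next power of two."""
    if pages < 1:
        raise ValueError("page count must be positive")
    return 1 << (pages - 1).bit_length()


def mmap_bytes(pages):
    """Data-buffer size in bytes after rounding the page count."""
    return round_up_pow2(pages) * PAGE_SIZE
```

      Under these assumptions, -m10 maps to 16 pages (64 KiB), while -m4 and
      -m2 are already powers of two and are left unchanged, which matches the
      absence of a rounding message for them.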
      
        $ perf record -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.552 MB perf.data (74 samples) ]
        $ perf evlist
        cycles
        $ perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
        $ perf report | head -20
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 74  of event 'cycles'
        # Event count (approx.): 15694834
        #
        # Overhead  Command          Shared Object               Symbol
        # ........  ...............  ..........................  ......................................
        #
            19.62%  perf             [kernel.vmlinux]            [k] strnlen_user
            13.88%  swapper          [kernel.vmlinux]            [k] intel_idle
            13.83%  ksoftirqd/0      [kernel.vmlinux]            [k] pfifo_fast_dequeue
            13.51%  swapper          [kernel.vmlinux]            [k] kmem_cache_free
             6.31%  gnome-shell      [kernel.vmlinux]            [k] kmem_cache_free
             5.66%  kworker/u8:3+ix  [kernel.vmlinux]            [k] delay_tsc
             4.42%  perf             [kernel.vmlinux]            [k] __set_cpus_allowed_ptr
             3.45%  kworker/2:1-eve  [kernel.vmlinux]            [k] shmem_truncate_range
             2.29%  gnome-shell      libgobject-2.0.so.0.6000.7  [.] g_closure_ref
        $
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/a66d5648-2b8e-577e-e1f2-1d56c017ab5e@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      6b3e0e2e
    • Alexey Budankov's avatar
      perf/core: open access to probes for CAP_PERFMON privileged process · c9e0924e
      Alexey Budankov authored
      Open access to monitoring via kprobes, uprobes, and eBPF tracing for
      CAP_PERFMON privileged processes. Granting access under the CAP_PERFMON
      capability alone, without the rest of the CAP_SYS_ADMIN credentials,
      removes the opportunity to misuse those credentials and makes operation
      more secure.
      
      perf kprobes and uprobes are used by ftrace and eBPF. perf probe uses
      ftrace to define new kprobe events, and those events are treated as
      tracepoint events. eBPF defines new probes via perf_event_open interface
      and then the probes are used in eBPF tracing.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility, access to the perf_events subsystem remains
      open to CAP_SYS_ADMIN privileged processes, but using CAP_SYS_ADMIN for
      secure perf_events monitoring is discouraged in favor of the CAP_PERFMON
      capability.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Link: http://lore.kernel.org/lkml/3c129d9a-ba8a-3483-ecc5-ad6c8e7c203f@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c9e0924e