1. 18 Apr, 2020 1 commit
  2. 16 Apr, 2020 38 commits
    • Adrian Hunter's avatar
      perf intel-pt: Add support for synthesizing callchains for regular events · 2855c05c
      Adrian Hunter authored
      Currently, callchains can be synthesized only for synthesized events.
      Support also synthesizing callchains for regular events.
      
      Example:
      
       # perf record --kcore --aux-sample -e '{intel_pt//,cycles}' -c 10000 uname
       Linux
       [ perf record: Woken up 3 times to write data ]
       [ perf record: Captured and wrote 0.532 MB perf.data ]
       # perf script --itrace=Ge | head -20
       uname  4864 2419025.358181:      10000     cycles:
              ffffffffbba56965 apparmor_bprm_committing_creds+0x35 ([kernel.kallsyms])
              ffffffffbc400cd5 __indirect_thunk_start+0x5 ([kernel.kallsyms])
              ffffffffbba07422 security_bprm_committing_creds+0x22 ([kernel.kallsyms])
              ffffffffbb89805d install_exec_creds+0xd ([kernel.kallsyms])
              ffffffffbb90d9ac load_elf_binary+0x3ac ([kernel.kallsyms])
      
       uname  4864 2419025.358185:      10000     cycles:
              ffffffffbba56db0 apparmor_bprm_committed_creds+0x20 ([kernel.kallsyms])
              ffffffffbc400cd5 __indirect_thunk_start+0x5 ([kernel.kallsyms])
              ffffffffbba07452 security_bprm_committed_creds+0x22 ([kernel.kallsyms])
              ffffffffbb89809a install_exec_creds+0x4a ([kernel.kallsyms])
              ffffffffbb90d9ac load_elf_binary+0x3ac ([kernel.kallsyms])
      
       uname  4864 2419025.358189:      10000     cycles:
              ffffffffbb86fdf6 vma_adjust_trans_huge+0x6 ([kernel.kallsyms])
              ffffffffbb821660 __vma_adjust+0x160 ([kernel.kallsyms])
              ffffffffbb897be7 shift_arg_pages+0x97 ([kernel.kallsyms])
              ffffffffbb897ed9 setup_arg_pages+0x1e9 ([kernel.kallsyms])
              ffffffffbb90d9f2 load_elf_binary+0x3f2 ([kernel.kallsyms])
      
      Committer testing:
      
        # perf record --kcore --aux-sample -e '{intel_pt//,cycles}' -c 10000 uname
        Linux
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.233 MB perf.data ]
        #
      
      Then, before this patch:
      
        # perf script --itrace=Ge | head -20
           uname 28642 168664.856384: 10000 cycles: ffffffff9810aeaa commit_creds+0x2a ([kernel.kallsyms])
           uname 28642 168664.856388: 10000 cycles: ffffffff982a24f1 mprotect_fixup+0x151 ([kernel.kallsyms])
           uname 28642 168664.856392: 10000 cycles: ffffffff982a385b move_page_tables+0xbcb ([kernel.kallsyms])
           uname 28642 168664.856396: 10000 cycles: ffffffff982fd4ec __mod_memcg_state+0x1c ([kernel.kallsyms])
           uname 28642 168664.856400: 10000 cycles: ffffffff9829fddd do_mmap+0xfd ([kernel.kallsyms])
           uname 28642 168664.856404: 10000 cycles: ffffffff9829c879 __vma_adjust+0x479 ([kernel.kallsyms])
           uname 28642 168664.856408: 10000 cycles: ffffffff98238e94 __perf_addr_filters_adjust+0x34 ([kernel.kallsyms])
           uname 28642 168664.856412: 10000 cycles: ffffffff98a38e0b down_write+0x1b ([kernel.kallsyms])
           uname 28642 168664.856416: 10000 cycles: ffffffff983006a0 memcg_kmem_get_cache+0x0 ([kernel.kallsyms])
           uname 28642 168664.856421: 10000 cycles: ffffffff98396eaf load_elf_binary+0x92f ([kernel.kallsyms])
           uname 28642 168664.856425: 10000 cycles: ffffffff982e0222 kfree+0x62 ([kernel.kallsyms])
           uname 28642 168664.856428: 10000 cycles: ffffffff9846dfd4 file_has_perm+0x54 ([kernel.kallsyms])
           uname 28642 168664.856433: 10000 cycles: ffffffff98288911 vma_interval_tree_insert+0x51 ([kernel.kallsyms])
           uname 28642 168664.856437: 10000 cycles: ffffffff9823e577 perf_event_mmap_output+0x27 ([kernel.kallsyms])
           uname 28642 168664.856441: 10000 cycles: ffffffff98a26fa0 xas_load+0x40 ([kernel.kallsyms])
           uname 28642 168664.856445: 10000 cycles: ffffffff98004f30 arch_setup_additional_pages+0x0 ([kernel.kallsyms])
           uname 28642 168664.856448: 10000 cycles: ffffffff98a297c0 copy_user_generic_unrolled+0xa0 ([kernel.kallsyms])
           uname 28642 168664.856452: 10000 cycles: ffffffff9853a87a strnlen_user+0x10a ([kernel.kallsyms])
           uname 28642 168664.856456: 10000 cycles: ffffffff986638a7 randomize_page+0x27 ([kernel.kallsyms])
           uname 28642 168664.856460: 10000 cycles: ffffffff98a3b645 _raw_spin_lock+0x5 ([kernel.kallsyms])
      
        #
      
      And after:
      
        # perf script --itrace=Ge | head -20
        uname 28642 168664.856384:      10000     cycles:
        	ffffffff9810aeaa commit_creds+0x2a ([kernel.kallsyms])
        	ffffffff9831fe87 install_exec_creds+0x17 ([kernel.kallsyms])
        	ffffffff983968d9 load_elf_binary+0x359 ([kernel.kallsyms])
        	ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms])
        	ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms])
      
        uname 28642 168664.856388:      10000     cycles:
        	ffffffff982a24f1 mprotect_fixup+0x151 ([kernel.kallsyms])
        	ffffffff9831fa83 setup_arg_pages+0x123 ([kernel.kallsyms])
        	ffffffff9839691f load_elf_binary+0x39f ([kernel.kallsyms])
        	ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms])
        	ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms])
      
        uname 28642 168664.856392:      10000     cycles:
        	ffffffff982a385b move_page_tables+0xbcb ([kernel.kallsyms])
        	ffffffff9831f889 shift_arg_pages+0xa9 ([kernel.kallsyms])
        	ffffffff9831fb4f setup_arg_pages+0x1ef ([kernel.kallsyms])
        	ffffffff9839691f load_elf_binary+0x39f ([kernel.kallsyms])
        	ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms])
        #
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-12-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2855c05c
    • Adrian Hunter's avatar
      perf evsel: Add support for synthesized sample type · e11869a0
      Adrian Hunter authored
      For reporting purposes, an evsel sample can have a callchain synthesized
      from AUX area data. Add support for keeping track of synthesized sample
      types. Note, the recorded sample_type cannot be changed because it is
      needed to continue to parse events.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-11-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e11869a0
    • Adrian Hunter's avatar
      perf evsel: Be consistent when looking which evsel PERF_SAMPLE_ bits are set · 8e94b324
      Adrian Hunter authored
      Using 'type' variable for checking for callchains is equivalent to using
      evsel__has_callchain(evsel) and is how the other PERF_SAMPLE_ bits are checked
      in this function, so use it to be consistent.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-11-adrian.hunter@intel.com
      [ split from a larger patch ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      8e94b324
    • Adrian Hunter's avatar
      perf thread-stack: Add thread_stack__sample_late() · 4fef41bf
      Adrian Hunter authored
      Add a thread stack function to create a call chain for hardware events
      where the sample records get created some time after the event occurred.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-10-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4fef41bf
    • Adrian Hunter's avatar
      perf auxtrace: Add an option to synthesize callchains for regular events · 1c5c25b3
      Adrian Hunter authored
      Currently, callchains can be synthesized only for synthesized events. Add
      an itrace option to synthesize callchains for regular events.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-9-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1c5c25b3
    • Adrian Hunter's avatar
      perf auxtrace: For reporting purposes, un-group AUX area event · 5c7bec0c
      Adrian Hunter authored
      An AUX area event must be the group leader when recording traces in
      sample mode, but that does not produce the expected results from
      'perf report' because it expects the leader to provide samples.
      
      Rather than teach 'perf report' about AUX area sampling, un-group the
      AUX area event during processing, making the 2nd event the leader.
      
      Example:
      
       $ perf record -e '{intel_pt//u,branch-misses:u}' -c 1 uname
       Linux
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.080 MB perf.data ]
      
       Before:
      
       $ perf report
      
       Samples: 800  of events 'anon group { intel_pt//u, branch-misses:u }', Event count (approx.): 800
              Children              Self  Command  Shared Object     Symbol
           0.00%  47.50%     0.00%  47.50%  uname    libc-2.28.so      [.] _dl_addr
           0.00%  16.38%     0.00%  16.38%  uname    ld-2.28.so        [.] __GI___tunables_init
           0.00%  54.75%     0.00%   4.75%  uname    ld-2.28.so        [.] dl_main
           0.00%   3.12%     0.00%   3.12%  uname    ld-2.28.so        [.] _dl_map_object_from_fd
           0.00%   2.38%     0.00%   2.38%  uname    ld-2.28.so        [.] strcmp
           0.00%   2.25%     0.00%   2.25%  uname    ld-2.28.so        [.] _dl_check_map_versions
           0.00%   2.00%     0.00%   2.00%  uname    ld-2.28.so        [.] _dl_important_hwcaps
           0.00%   2.00%     0.00%   2.00%  uname    ld-2.28.so        [.] _dl_map_object_deps
           0.00%  51.50%     0.00%   1.50%  uname    ld-2.28.so        [.] _dl_sysdep_start
           0.00%   1.25%     0.00%   1.25%  uname    ld-2.28.so        [.] _dl_load_cache_lookup
           0.00%  51.12%     0.00%   1.12%  uname    ld-2.28.so        [.] _dl_start
           0.00%  50.88%     0.00%   1.12%  uname    ld-2.28.so        [.] do_lookup_x
           0.00%  50.62%     0.00%   1.00%  uname    ld-2.28.so        [.] _dl_lookup_symbol_x
           0.00%   1.00%     0.00%   1.00%  uname    ld-2.28.so        [.] _dl_map_object
           0.00%   1.00%     0.00%   1.00%  uname    ld-2.28.so        [.] _dl_next_ld_env_entry
           0.00%   0.88%     0.00%   0.88%  uname    ld-2.28.so        [.] _dl_cache_libcmp
           0.00%   0.88%     0.00%   0.88%  uname    ld-2.28.so        [.] _dl_new_object
           0.00%  50.88%     0.00%   0.88%  uname    ld-2.28.so        [.] _dl_relocate_object
           0.00%   0.62%     0.00%   0.62%  uname    ld-2.28.so        [.] _dl_init_paths
           0.00%   0.62%     0.00%   0.62%  uname    ld-2.28.so        [.] _dl_name_match_p
           0.00%   0.50%     0.00%   0.50%  uname    ld-2.28.so        [.] get_common_indeces.constprop.1
           0.00%   0.50%     0.00%   0.50%  uname    ld-2.28.so        [.] memmove
           0.00%   0.50%     0.00%   0.50%  uname    ld-2.28.so        [.] memset
           0.00%   0.50%     0.00%   0.50%  uname    ld-2.28.so        [.] open_verify.constprop.11
           0.00%   0.38%     0.00%   0.38%  uname    ld-2.28.so        [.] _dl_check_all_versions
           0.00%   0.38%     0.00%   0.38%  uname    ld-2.28.so        [.] _dl_find_dso_for_object
           0.00%   0.38%     0.00%   0.38%  uname    ld-2.28.so        [.] init_tls
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] __tunable_get_val
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] _dl_add_to_namespace_list
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] _dl_determine_tlsoffset
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] _dl_discover_osversion
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] calloc@plt
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] malloc
           0.00%   0.25%     0.00%   0.25%  uname    ld-2.28.so        [.] malloc@plt
           0.00%   0.25%     0.00%   0.25%  uname    libc-2.28.so      [.] _nl_load_locale_from_archive
           0.00%   0.25%     0.00%   0.25%  uname    [unknown]         [k] 0xffffffffa3a00010
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] __libc_scratch_buffer_set_array_size
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] _dl_allocate_tls_storage
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] _dl_catch_exception
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] _dl_setup_hash
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] _dl_sort_maps
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] _dl_sysdep_read_whole_file
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] access
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] calloc
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] mmap64
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] openaux
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] rtld_lock_default_lock_recursive
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] rtld_lock_default_unlock_recursive
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] strchr
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] strlen
           0.00%   0.12%     0.00%   0.12%  uname    ld-2.28.so        [.] 0x0000000000001080
           0.00%   0.12%     0.00%   0.12%  uname    libc-2.28.so      [.] __strchrnul_avx2
           0.00%   0.12%     0.00%   0.12%  uname    libc-2.28.so      [.] _nl_normalize_codeset
           0.00%   0.12%     0.00%   0.12%  uname    libc-2.28.so      [.] malloc
           0.00%   0.12%     0.00%   0.12%  uname    [unknown]         [k] 0xffffffffa3a011f0
           0.00%  50.00%     0.00%   0.00%  uname    ld-2.28.so        [.] _dl_start_user
           0.00%  50.00%     0.00%   0.00%  uname    [unknown]         [.] 0000000000000000
      
       After:
      
       Samples: 800  of event 'branch-misses:u', Event count (approx.): 800
        Children      Self  Command  Shared Object     Symbol
          54.75%     4.75%  uname    ld-2.28.so        [.] dl_main
          51.50%     1.50%  uname    ld-2.28.so        [.] _dl_sysdep_start
          51.12%     1.12%  uname    ld-2.28.so        [.] _dl_start
          50.88%     0.88%  uname    ld-2.28.so        [.] _dl_relocate_object
          50.88%     1.12%  uname    ld-2.28.so        [.] do_lookup_x
          50.62%     1.00%  uname    ld-2.28.so        [.] _dl_lookup_symbol_x
          50.00%     0.00%  uname    ld-2.28.so        [.] _dl_start_user
          50.00%     0.00%  uname    [unknown]         [.] 0000000000000000
          47.50%    47.50%  uname    libc-2.28.so      [.] _dl_addr
          16.38%    16.38%  uname    ld-2.28.so        [.] __GI___tunables_init
           3.12%     3.12%  uname    ld-2.28.so        [.] _dl_map_object_from_fd
           2.38%     2.38%  uname    ld-2.28.so        [.] strcmp
           2.25%     2.25%  uname    ld-2.28.so        [.] _dl_check_map_versions
           2.00%     2.00%  uname    ld-2.28.so        [.] _dl_important_hwcaps
           2.00%     2.00%  uname    ld-2.28.so        [.] _dl_map_object_deps
           1.25%     1.25%  uname    ld-2.28.so        [.] _dl_load_cache_lookup
           1.00%     1.00%  uname    ld-2.28.so        [.] _dl_map_object
           1.00%     1.00%  uname    ld-2.28.so        [.] _dl_next_ld_env_entry
           0.88%     0.88%  uname    ld-2.28.so        [.] _dl_cache_libcmp
           0.88%     0.88%  uname    ld-2.28.so        [.] _dl_new_object
           0.62%     0.62%  uname    ld-2.28.so        [.] _dl_init_paths
           0.62%     0.62%  uname    ld-2.28.so        [.] _dl_name_match_p
           0.50%     0.50%  uname    ld-2.28.so        [.] get_common_indeces.constprop.1
           0.50%     0.50%  uname    ld-2.28.so        [.] memmove
           0.50%     0.50%  uname    ld-2.28.so        [.] memset
           0.50%     0.50%  uname    ld-2.28.so        [.] open_verify.constprop.11
           0.38%     0.38%  uname    ld-2.28.so        [.] _dl_check_all_versions
           0.38%     0.38%  uname    ld-2.28.so        [.] _dl_find_dso_for_object
           0.38%     0.38%  uname    ld-2.28.so        [.] init_tls
           0.25%     0.25%  uname    ld-2.28.so        [.] __tunable_get_val
           0.25%     0.25%  uname    ld-2.28.so        [.] _dl_add_to_namespace_list
           0.25%     0.25%  uname    ld-2.28.so        [.] _dl_determine_tlsoffset
           0.25%     0.25%  uname    ld-2.28.so        [.] _dl_discover_osversion
           0.25%     0.25%  uname    ld-2.28.so        [.] calloc@plt
           0.25%     0.25%  uname    ld-2.28.so        [.] malloc
           0.25%     0.25%  uname    ld-2.28.so        [.] malloc@plt
           0.25%     0.25%  uname    libc-2.28.so      [.] _nl_load_locale_from_archive
           0.25%     0.25%  uname    [unknown]         [k] 0xffffffffa3a00010
           0.12%     0.12%  uname    ld-2.28.so        [.] __libc_scratch_buffer_set_array_size
           0.12%     0.12%  uname    ld-2.28.so        [.] _dl_allocate_tls_storage
           0.12%     0.12%  uname    ld-2.28.so        [.] _dl_catch_exception
           0.12%     0.12%  uname    ld-2.28.so        [.] _dl_setup_hash
           0.12%     0.12%  uname    ld-2.28.so        [.] _dl_sort_maps
           0.12%     0.12%  uname    ld-2.28.so        [.] _dl_sysdep_read_whole_file
           0.12%     0.12%  uname    ld-2.28.so        [.] access
           0.12%     0.12%  uname    ld-2.28.so        [.] calloc
           0.12%     0.12%  uname    ld-2.28.so        [.] mmap64
           0.12%     0.12%  uname    ld-2.28.so        [.] openaux
           0.12%     0.12%  uname    ld-2.28.so        [.] rtld_lock_default_lock_recursive
           0.12%     0.12%  uname    ld-2.28.so        [.] rtld_lock_default_unlock_recursive
           0.12%     0.12%  uname    ld-2.28.so        [.] strchr
           0.12%     0.12%  uname    ld-2.28.so        [.] strlen
           0.12%     0.12%  uname    ld-2.28.so        [.] 0x0000000000001080
           0.12%     0.12%  uname    libc-2.28.so      [.] __strchrnul_avx2
           0.12%     0.12%  uname    libc-2.28.so      [.] _nl_normalize_codeset
           0.12%     0.12%  uname    libc-2.28.so      [.] malloc
           0.12%     0.12%  uname    [unknown]         [k] 0xffffffffa3a011f0
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-8-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      5c7bec0c
    • Adrian Hunter's avatar
      perf s390-cpumsf: Implement ->evsel_is_auxtrace() callback · 113fcb46
      Adrian Hunter authored
      Implement ->evsel_is_auxtrace() callback.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: default avatarThomas Richter <tmricht@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-7-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      113fcb46
    • Adrian Hunter's avatar
      perf cs-etm: Implement ->evsel_is_auxtrace() callback · a58ab57c
      Adrian Hunter authored
      Implement ->evsel_is_auxtrace() callback.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: default avatarMathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-6-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a58ab57c
    • Adrian Hunter's avatar
      perf arm-spe: Implement ->evsel_is_auxtrace() callback · 508c71e3
      Adrian Hunter authored
      Implement ->evsel_is_auxtrace() callback.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: default avatarLeo Yan <leo.yan@linaro.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-5-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      508c71e3
    • Adrian Hunter's avatar
      perf intel-bts: Implement ->evsel_is_auxtrace() callback · 966246f5
      Adrian Hunter authored
      Implement ->evsel_is_auxtrace() callback.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-4-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      966246f5
    • Adrian Hunter's avatar
      perf intel-pt: Implement ->evsel_is_auxtrace() callback · 6b52bb07
      Adrian Hunter authored
      Implement ->evsel_is_auxtrace() callback.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-3-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6b52bb07
    • Adrian Hunter's avatar
      perf auxtrace: Add ->evsel_is_auxtrace() callback · 853f37d7
      Adrian Hunter authored
      Add ->evsel_is_auxtrace() callback to identify if a selected event
      is an AUX area event.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200401101613.6201-2-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      853f37d7
    • Andreas Gerstmayr's avatar
      perf script: Add flamegraph.py script · 5287f926
      Andreas Gerstmayr authored
      This script works in tandem with d3-flame-graph to generate flame graphs
      from perf. It supports two output formats: JSON and HTML (the default).
      The HTML format will look for a standalone d3-flame-graph template file
      in /usr/share/d3-flame-graph/d3-flamegraph-base.html and fill in the
      collected stacks.
      
      Usage:
      
          perf record -a -g -F 99 sleep 60
          perf script report flamegraph
      
      Combined:
      
          perf script flamegraph -a -F 99 sleep 60
      
      Committer testing:
      
      Tested both with "PYTHON=python3" and with the default, that uses
      python2-devel:
      
      Complete set of instructions:
      
        $ mkdir /tmp/build/perf
        $ make PYTHON=python3 -C tools/perf O=/tmp/build/perf install-bin
        $ export PATH=~/bin:$PATH
        $ perf record -a -g -F 99 sleep 60
        $ perf script report flamegraph
      
      Now go and open the generated flamegraph.html file in a browser.
      
      At first this required building with PYTHON=python3, but after I
      reported this Andreas was kind enough to send a patch making it work
      with both python and python3.
      Signed-off-by: default avatarAndreas Gerstmayr <agerstmayr@redhat.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brendan Gregg <bgregg@netflix.com>
      Cc: Martin Spier <mspier@netflix.com>
      Link: http://lore.kernel.org/lkml/20200320151355.66302-1-agerstmayr@redhat.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      5287f926
    • Kajol Jain's avatar
      perf metrictroup: Split the metricgroup__add_metric function · 47352aba
      Kajol Jain authored
      This patch refactors metricgroup__add_metric function where some part of
      it move to function metricgroup__add_metric_param.  No logic change.
      Signed-off-by: default avatarKajol Jain <kjain@linux.ibm.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20200401203340.31402-4-kjain@linux.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      47352aba
    • Jiri Olsa's avatar
      perf expr: Add expr_scanner_ctx object · 871f9f59
      Jiri Olsa authored
      Add the expr_scanner_ctx object to hold user data for the expr scanner.
      Currently it holds only start_token, Kajol Jain will use it to hold 24x7
      runtime param.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20200401203340.31402-3-kjain@linux.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      871f9f59
    • Jiri Olsa's avatar
      perf expr: Add expr_ prefix for parse_ctx and parse_id · aecce63e
      Jiri Olsa authored
      Adding expr_ prefix for parse_ctx and parse_id, to straighten out the
      expr* namespace.
      
      There's no functional change.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lore.kernel.org/lkml/20200401203340.31402-2-kjain@linux.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      aecce63e
    • Ian Rogers's avatar
      perf synthetic-events: save 4kb from 2 stack frames · 04ed4ccb
      Ian Rogers authored
      Reuse an existing char buffer to avoid two PATH_MAX sized char buffers.
      
      Reduces stack frame sizes by 4kb.
      
      perf_event__synthesize_mmap_events before 'sub $0x45b8,%rsp' after
      'sub $0x35b8,%rsp'.
      
      perf_event__get_comm_ids before 'sub $0x2028,%rsp' after
      'sub $0x1028,%rsp'.
      
      The performance impact of this change is negligible.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrey Zhizhikin <andrey.z@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/20200402154357.107873-4-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      04ed4ccb
    • Stephane Eranian's avatar
      tools api fs: Make xxx__mountpoint() more scalable · c6fddb28
      Stephane Eranian authored
      The xxx_mountpoint() interface provided by fs.c finds mount points for
      common pseudo filesystems. The first time xxx_mountpoint() is invoked,
      it scans the mount table (/proc/mounts) looking for a match. If found,
      it is cached. The price to scan /proc/mounts is paid once if the mount
      is found.
      
      When the mount point is not found, subsequent calls to xxx_mountpoint()
      scan /proc/mounts over and over again.  There is no caching.
      
      This causes a scaling issue in perf record with hugeltbfs__mountpoint().
      The function is called for each process found in
      synthesize__mmap_events().  If the machine has thousands of processes
      and if the /proc/mounts has many entries this could cause major overhead
      in perf record. We have observed multi-second slowdowns on some
      configurations.
      
      As an example on a laptop:
      
      Before:
      
        $ sudo umount /dev/hugepages
        $ strace -e trace=openat -o /tmp/tt perf record -a ls
        $ fgrep mounts /tmp/tt
        285
      
      After:
      
        $ sudo umount /dev/hugepages
        $ strace -e trace=openat -o /tmp/tt perf record -a ls
        $ fgrep mounts /tmp/tt
        1
      
      One could argue that the non-caching in case the moint point is not
      found is intentional. That way subsequent calls may discover a moint
      point if the sysadmin mounts the filesystem. But the same argument could
      be made against caching the mount point. It could be unmounted causing
      errors.  It all depends on the intent of the interface. This patch
      assumes it is expected to scan /proc/mounts once. The patch documents
      the caching behavior in the fs.h header file.
      
      An alternative would be to just fix perf record. But it would solve the
      problem with hugetlbs__mountpoint() but there could be similar issues
      (possibly down the line) with other xxx_mountpoint() calls in perf or
      other tools.
      Signed-off-by: default avatarStephane Eranian <eranian@google.com>
      Reviewed-by: default avatarIan Rogers <irogers@google.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrey Zhizhikin <andrey.z@gmail.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/20200402154357.107873-3-irogers@google.comSigned-off-by: default avatarIan Rogers <irogers@google.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c6fddb28
    • Ian Rogers's avatar
      perf bench: Add event synthesis benchmark · 2a4b5166
      Ian Rogers authored
      Event synthesis may occur at the start or end (tail) of a perf command.
      In system-wide mode it can scan every process in /proc, which may add
      seconds of latency before event recording. Add a new benchmark that
      times how long event synthesis takes with and without data synthesis.
      
      An example execution looks like:
      
       $ perf bench internals synthesize
       # Running 'internals/synthesize' benchmark:
       Average synthesis took: 168.253800 usec
       Average data synthesis took: 208.104700 usec
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrey Zhizhikin <andrey.z@gmail.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/20200402154357.107873-2-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2a4b5166
    • Adrian Hunter's avatar
      perf script: Simplify auxiliary event printing functions · 1a2725f3
      Adrian Hunter authored
      This simplifies the print functions for the following perf script
      options:
      
      	--show-task-events
      	--show-namespace-events
      	--show-cgroup-events
      	--show-mmap-events
      	--show-switch-events
      	--show-lost-events
      	--show-bpf-events
      
      Example:
      	# perf record --switch-events -a -e cycles -c 10000 sleep 1
       Before:
      	# perf script --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events > out-before.txt
       After:
      	# perf script --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events > out-after.txt
      	# diff -s out-before.txt out-after.txt
      	Files out-before.txt and out-after.tx are identical
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200402141548.21283-1-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1a2725f3
    • Alexey Budankov's avatar
      doc/admin-guide: update kernel.rst with CAP_PERFMON information · 025b16f8
      Alexey Budankov authored
      Update the kernel.rst documentation file with the information related to
      usage of CAP_PERFMON capability to secure performance monitoring and
      observability operations in system.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/84c32383-14a2-fa35-16b6-f9e59bd37240@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      025b16f8
    • Alexey Budankov's avatar
      doc/admin-guide: Update perf-security.rst with CAP_PERFMON information · 902a8dcc
      Alexey Budankov authored
      Update perf-security.rst documentation file with the information
      related to usage of CAP_PERFMON capability to secure performance
      monitoring and observability operations in system.
      
      Committer notes:
      
      While testing 'perf top' under cap_perfmon I noticed that it needs
      some more capability and Alexey pointed out cap_ipc_lock, as needed by
      this kernel chunk:
      
        kernel/events/core.c: 6101
             if ((locked > lock_limit) && perf_is_paranoid() &&
                     !capable(CAP_IPC_LOCK)) {
                     ret = -EPERM;
                     goto unlock;
             }
      
      So I added it to the documentation, and also mentioned that if the
      libcap version doesn't yet supports 'cap_perfmon', its numeric value can
      be used instead, i.e. if:
      
      	# setcap "cap_perfmon,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf
      
      Fails, try:
      
      	# setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf
      
      I also added a paragraph stating that using an unpatched libcap will
      fail the check for CAP_PERFMON, as it checks the cap number against a
      maximum to see if it is valid, which makes it use as the default the
      'cycles:u' event, even tho a cap_perfmon capable perf binary can get
      kernel samples, to workaround that just use, e.g.:
      
        # perf top -e cycles
        # perf record -e cycles
      
      And it will sample kernel and user modes.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/17278551-9399-9ebe-d665-8827016a217d@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      902a8dcc
    • Alexey Budankov's avatar
      drivers/oprofile: Open access for CAP_PERFMON privileged process · ab76878b
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged process.  Providing
      the access under CAP_PERFMON capability singly, without the rest of
      CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
      and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to the monitoring remains open
      for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure monitoring is discouraged with respect to CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Acked-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/691f1096-b15f-9b12-50a0-c2b93918149e@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ab76878b
    • Alexey Budankov's avatar
      drivers/perf: Open access for CAP_PERFMON privileged process · cea7d0d4
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged process.  Providing
      the access under CAP_PERFMON capability singly, without the rest of
      CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
      and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to the monitoring remains open
      for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure monitoring is discouraged with respect to CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarWill Deacon <will@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/4ec1d6f7-548c-8d1c-f84a-cebeb9674e4e@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cea7d0d4
    • Alexey Budankov's avatar
      parisc/perf: open access for CAP_PERFMON privileged process · cf91baf3
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged process.  Providing
      the access under CAP_PERFMON capability singly, without the rest of
      CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
      and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to the monitoring remains open
      for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure monitoring is discouraged with respect to CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarHelge Deller <deller@gmx.de>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/8cc98809-d35b-de0f-de02-4cf554f3cf62@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cf91baf3
    • Alexey Budankov's avatar
      powerpc/perf: open access for CAP_PERFMON privileged process · ff467583
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged process.  Providing
      the access under CAP_PERFMON capability singly, without the rest of
      CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
      and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to the monitoring remains open
      for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure monitoring is discouraged with respect to CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/ac98cd9f-b59e-673c-c70d-180b3e7695d2@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ff467583
    • Alexey Budankov's avatar
      trace/bpf_trace: Open access for CAP_PERFMON privileged process · 031258da
      Alexey Budankov authored
      Open access to bpf_trace monitoring for CAP_PERFMON privileged process.
      Providing the access under CAP_PERFMON capability singly, without the
      rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
      credentials and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to bpf_trace monitoring
      remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
      usage for secure bpf_trace monitoring is discouraged with respect to
      CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/c0a0ae47-8b6e-ff3e-416b-3cd1faaf71c0@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      031258da
    • Alexey Budankov's avatar
      drm/i915/perf: Open access for CAP_PERFMON privileged process · 4e3d3456
      Alexey Budankov authored
      Open access to i915_perf monitoring for CAP_PERFMON privileged process.
      Providing the access under CAP_PERFMON capability singly, without the
      rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
      credentials and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to i915_events subsystem remains
      open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure i915_events monitoring is discouraged with respect to CAP_PERFMON
      capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarLionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/e3e3292f-f765-ea98-e59c-fbe2db93fd34@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4e3d3456
    • Alexey Budankov's avatar
      perf tools: Support CAP_PERFMON capability · 6b3e0e2e
      Alexey Budankov authored
      Extend error messages to mention CAP_PERFMON capability as an option to
      substitute CAP_SYS_ADMIN capability for secure system performance
      monitoring and observability operations. Make
      perf_event_paranoid_check() and __cmd_ftrace() to be aware of
      CAP_PERFMON capability.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to perf_events subsystem remains
      open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure perf_events monitoring is discouraged with respect to CAP_PERFMON
      capability.
      
      Committer testing:
      
      Using a libcap with this patch:
      
        diff --git a/libcap/include/uapi/linux/capability.h b/libcap/include/uapi/linux/capability.h
        index 78b2fd4c8a95..89b5b0279b60 100644
        --- a/libcap/include/uapi/linux/capability.h
        +++ b/libcap/include/uapi/linux/capability.h
        @@ -366,8 +366,9 @@ struct vfs_ns_cap_data {
      
         #define CAP_AUDIT_READ       37
      
        +#define CAP_PERFMON	     38
      
        -#define CAP_LAST_CAP         CAP_AUDIT_READ
        +#define CAP_LAST_CAP         CAP_PERFMON
      
         #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
      
      Note that using '38' in place of 'cap_perfmon' works to some degree with
      an old libcap, its only when cap_get_flag() is called that libcap
      performs an error check based on the maximum value known for
      capabilities that it will fail.
      
      This makes determining the default of perf_event_attr.exclude_kernel to
      fail, as it can't determine if CAP_PERFMON is in place.
      
      Using 'perf top -e cycles' avoids the default check and sets
      perf_event_attr.exclude_kernel to 1.
      
      As root, with a libcap supporting CAP_PERFMON:
      
        # groupadd perf_users
        # adduser perf -g perf_users
        # mkdir ~perf/bin
        # cp ~acme/bin/perf ~perf/bin/
        # chgrp perf_users ~perf/bin/perf
        # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" ~perf/bin/perf
        # getcap ~perf/bin/perf
        /home/perf/bin/perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep
        # ls -la ~perf/bin/perf
        -rwxr-xr-x. 1 root perf_users 16968552 Apr  9 13:10 /home/perf/bin/perf
      
      As the 'perf' user in the 'perf_users' group:
      
        $ perf top -a --stdio
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $
      
      Either add the cap_ipc_lock capability to the perf binary or reduce the
      ring buffer size to some smaller value:
      
        $ perf top -m10 -a --stdio
        rounding mmap pages size to 64K (16 pages)
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $ perf top -m4 -a --stdio
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $ perf top -m2 -a --stdio
         PerfTop: 762 irqs/sec  kernel:49.7%  exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 4 CPUs)
        ------------------------------------------------------------------------------------------------------
      
           9.83%  perf                [.] __symbols__insert
           8.58%  perf                [.] rb_next
           5.91%  [kernel]            [k] module_get_kallsym
           5.66%  [kernel]            [k] kallsyms_expand_symbol.constprop.0
           3.98%  libc-2.29.so        [.] __GI_____strtoull_l_internal
           3.66%  perf                [.] rb_insert_color
           2.34%  [kernel]            [k] vsnprintf
           2.30%  [kernel]            [k] string_nocheck
           2.16%  libc-2.29.so        [.] _IO_getdelim
           2.15%  [kernel]            [k] number
           2.13%  [kernel]            [k] format_decode
           1.58%  libc-2.29.so        [.] _IO_feof
           1.52%  libc-2.29.so        [.] __strcmp_avx2
           1.50%  perf                [.] rb_set_parent_color
           1.47%  libc-2.29.so        [.] __libc_calloc
           1.24%  [kernel]            [k] do_syscall_64
           1.17%  [kernel]            [k] __x86_indirect_thunk_rax
      
        $ perf record -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.552 MB perf.data (74 samples) ]
        $ perf evlist
        cycles
        $ perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
        $ perf report | head -20
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 74  of event 'cycles'
        # Event count (approx.): 15694834
        #
        # Overhead  Command          Shared Object               Symbol
        # ........  ...............  ..........................  ......................................
        #
            19.62%  perf             [kernel.vmlinux]            [k] strnlen_user
            13.88%  swapper          [kernel.vmlinux]            [k] intel_idle
            13.83%  ksoftirqd/0      [kernel.vmlinux]            [k] pfifo_fast_dequeue
            13.51%  swapper          [kernel.vmlinux]            [k] kmem_cache_free
             6.31%  gnome-shell      [kernel.vmlinux]            [k] kmem_cache_free
             5.66%  kworker/u8:3+ix  [kernel.vmlinux]            [k] delay_tsc
             4.42%  perf             [kernel.vmlinux]            [k] __set_cpus_allowed_ptr
             3.45%  kworker/2:1-eve  [kernel.vmlinux]            [k] shmem_truncate_range
             2.29%  gnome-shell      libgobject-2.0.so.0.6000.7  [.] g_closure_ref
        $
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/a66d5648-2b8e-577e-e1f2-1d56c017ab5e@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6b3e0e2e
    • Alexey Budankov's avatar
      perf/core: open access to probes for CAP_PERFMON privileged process · c9e0924e
      Alexey Budankov authored
      Open access to monitoring via kprobes and uprobes and eBPF tracing for
      CAP_PERFMON privileged process. Providing the access under CAP_PERFMON
      capability singly, without the rest of CAP_SYS_ADMIN credentials,
      excludes chances to misuse the credentials and makes operation more
      secure.
      
      perf kprobes and uprobes are used by ftrace and eBPF. perf probe uses
      ftrace to define new kprobe events, and those events are treated as
      tracepoint events. eBPF defines new probes via perf_event_open interface
      and then the probes are used in eBPF tracing.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to perf_events subsystem
      remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
      usage for secure perf_events monitoring is discouraged with respect to
      CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Link: http://lore.kernel.org/lkml/3c129d9a-ba8a-3483-ecc5-ad6c8e7c203f@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c9e0924e
    • Alexey Budankov's avatar
      perf/core: Open access to the core for CAP_PERFMON privileged process · 18aa1856
      Alexey Budankov authored
      Open access to monitoring of kernel code, CPUs, tracepoints and
      namespaces data for a CAP_PERFMON privileged process. Providing the
      access under CAP_PERFMON capability singly, without the rest of
      CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
      and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons the access to perf_events subsystem
      remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
      usage for secure perf_events monitoring is discouraged with respect to
      CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: linux-man@vger.kernel.org
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/471acaef-bb8a-5ce2-923f-90606b78eef9@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      18aa1856
    • Alexey Budankov's avatar
      capabilities: Introduce CAP_PERFMON to kernel and user space · 98073728
      Alexey Budankov authored
      Introduce the CAP_PERFMON capability designed to secure system
      performance monitoring and observability operations so that CAP_PERFMON
      can assist CAP_SYS_ADMIN capability in its governing role for
      performance monitoring and observability subsystems.
      
      CAP_PERFMON hardens system security and integrity during performance
      monitoring and observability operations by decreasing attack surface that
      is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access
      to system performance monitoring and observability operations under CAP_PERFMON
      capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes
      chances to misuse the credentials and makes the operation more secure.
      
      Thus, CAP_PERFMON implements the principle of least privilege for
      performance monitoring and observability operations (POSIX IEEE 1003.1e:
      2.2.2.39 principle of least privilege: A security design principle that
        states that a process or program be granted only those privileges
      (e.g., capabilities) necessary to accomplish its legitimate function,
      and only for the time that such privileges are actually required)
      
      CAP_PERFMON meets the demand to secure system performance monitoring and
      observability operations for adoption in security sensitive, restricted,
      multiuser production environments (e.g. HPC clusters, cloud and virtual compute
      environments), where root or CAP_SYS_ADMIN credentials are not available to
      mass users of a system, and securely unblocks applicability and scalability
      of system performance monitoring and observability operations beyond root
      and CAP_SYS_ADMIN use cases.
      
      CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system performance
      monitoring and observability operations and balances amount of CAP_SYS_ADMIN
      credentials following the recommendations in the capabilities man page [1]
      for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel
      developers, below." For backward compatibility reasons access to system
      performance monitoring and observability subsystems of the kernel remains
      open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability
      usage for secure system performance monitoring and observability operations
      is discouraged with respect to the designed CAP_PERFMON capability.
      
      Although the software running under CAP_PERFMON can not ensure avoidance
      of related hardware issues, the software can still mitigate these issues
      following the official hardware issues mitigation procedure [2]. The bugs
      in the software itself can be fixed following the standard kernel development
      process [3] to maintain and harden security of system performance monitoring
      and observability operations.
      
      [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
      [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
      [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.htmlSigned-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Acked-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarSerge E. Hallyn <serge@hallyn.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Acked-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/5590d543-82c6-490a-6544-08e6a5517db0@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      98073728
    • Jiri Olsa's avatar
      perf annotate: Add basic support for bpf_image · 3c29d448
      Jiri Olsa authored
      Add the DSO_BINARY_TYPE__BPF_IMAGE dso binary type to recognize BPF
      images that carry trampoline or dispatcher.
      
      Upcoming patches will add support to read the image data, store it
      within the BPF feature in perf.data and display it for annotation
      purposes.
      
      Currently we only display following message:
      
        # ./perf annotate bpf_trampoline_24456 --stdio
         Percent |      Source code & Disassembly of . for cycles (504  ...
        --------------------------------------------------------------- ...
                 :       to be implemented
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200312195610.346362-16-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3c29d448
    • Jiri Olsa's avatar
      perf machine: Set ksymbol dso as loaded on arrival · 7eddf7e7
      Jiri Olsa authored
      There's no special load action for ksymbol data on map__load/dso__load
      action, where the kernel is getting loaded. It only gets confused with
      kernel kallsyms/vmlinux load for bpf object, which fails and could mess
      up with the map.
      
      Disabling any further load of the map for ksymbol related dso/map.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200312195610.346362-15-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7eddf7e7
    • Jiri Olsa's avatar
      perf tools: Synthesize bpf_trampoline/dispatcher ksymbol event · 943930e4
      Jiri Olsa authored
      Synthesize bpf images (trampolines/dispatchers) on start, as ksymbol
      events from /proc/kallsyms. Having this perf can recognize samples from
      those images and perf report and top shows them correctly.
      
      The rest of the ksymbol handling is already in place from for the bpf
      programs monitoring, so only the initial state was needed.
      
      perf report output:
      
        # Overhead  Command     Shared Object                  Symbol
      
          12.37%  test_progs  [kernel.vmlinux]                 [k] entry_SYSCALL_64
          11.80%  test_progs  [kernel.vmlinux]                 [k] syscall_return_via_sysret
           9.63%  test_progs  bpf_prog_bcf7977d3b93787c_prog2  [k] bpf_prog_bcf7977d3b93787c_prog2
           6.90%  test_progs  bpf_trampoline_24456             [k] bpf_trampoline_24456
           6.36%  test_progs  [kernel.vmlinux]                 [k] memcpy_erms
      
      Committer notes:
      
      Use scnprintf() instead of strncpy() to overcome this on fedora:32,
      rawhide and OpenMandriva Cooker:
      
          CC       /tmp/build/perf/util/bpf-event.o
        In file included from /usr/include/string.h:495,
                         from /git/linux/tools/lib/bpf/libbpf_common.h:12,
                         from /git/linux/tools/lib/bpf/bpf.h:31,
                         from util/bpf-event.c:4:
        In function 'strncpy',
            inlined from 'process_bpf_image' at util/bpf-event.c:323:2,
            inlined from 'kallsyms_process_symbol' at util/bpf-event.c:358:9:
        /usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation]
          106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
              |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        cc1: all warnings being treated as errors
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200312195610.346362-14-jolsa@kernel.org/Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      943930e4
    • Arnaldo Carvalho de Melo's avatar
      perf stat: Honour --timeout for forked workloads · cfbd41b7
      Arnaldo Carvalho de Melo authored
      When --timeout is used and a workload is specified to be started by
      'perf stat', i.e.
      
        $ perf stat --timeout 1000 sleep 1h
      
      The --timeout wasn't being honoured, i.e. the workload, 'sleep 1h' in
      the above example, should be terminated after 1000ms, but it wasn't,
      'perf stat' was waiting for it to finish.
      
      Fix it by sending a SIGTERM when the timeout expires.
      
      Now it works:
      
        # perf stat -e cycles --timeout 1234 sleep 1h
        sleep: Terminated
      
         Performance counter stats for 'sleep 1h':
      
                 1,066,692      cycles
      
               1.234314838 seconds time elapsed
      
               0.000750000 seconds user
               0.000000000 seconds sys
      
        #
      
      Fixes: f1f8ad52 ("perf stat: Add support to print counts after a period of time")
      Reported-by: default avatarKonstantin Kharlamov <hi-angel@yandex.ru>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207243Tested-by: default avatarKonstantin Kharlamov <hi-angel@yandex.ru>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Tested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: yuzhoujian <yuzhoujian@didichuxing.com>
      Link: https://lore.kernel.org/lkml/20200415153803.GB20324@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cfbd41b7
    • Ingo Molnar's avatar
      Merge tag 'perf-urgent-for-mingo-5.7-20200414' of... · cd094335
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo-5.7-20200414' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
      
      perf stat:
      
        Jin Yao:
      
        - Fix no metric header if --per-socket and --metric-only set
      
      build system:
      
        - Fix python building when built with clang, that was failing if the clang
          version doesn't support -fno-semantic-interposition.
      
      tools UAPI headers:
      
        Arnaldo Carvalho de Melo:
      
        - Update various copies of kernel headers, some ended up automatically
          updating build-time generated tables to enable tools such as 'perf trace'
          to decode syscalls and tracepoints arguments.
      
          Now the tools/perf build is free of UAPI drift warnings.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      cd094335
    • Linus Torvalds's avatar
      Merge tag 'efi-urgent-2020-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 00086336
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "Misc EFI fixes, including the boot failure regression caused by the
        BSS section not being cleared by the loaders"
      
      * tag 'efi-urgent-2020-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/x86: Revert struct layout change to fix kexec boot regression
        efi/x86: Don't remap text<->rodata gap read-only for mixed mode
        efi/x86: Fix the deletion of variables in mixed mode
        efi/libstub/file: Merge file name buffers to reduce stack usage
        Documentation/x86, efi/x86: Clarify EFI handover protocol and its requirements
        efi/arm: Deal with ADR going out of range in efi_enter_kernel()
        efi/x86: Always relocate the kernel for EFI handover entry
        efi/x86: Move efi stub globals from .bss to .data
        efi/libstub/x86: Remove redundant assignment to pointer hdr
        efi/cper: Use scnprintf() for avoiding potential buffer overflow
      00086336
  3. 14 Apr, 2020 1 commit
    • Linus Torvalds's avatar
      Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux · 8632e9b5
      Linus Torvalds authored
      Pull hyperv fixes from Wei Liu:
      
       - a series from Tianyu Lan to fix crash reporting on Hyper-V
      
       - three miscellaneous cleanup patches
      
      * tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        x86/Hyper-V: Report crash data in die() when panic_on_oops is set
        x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set
        x86/Hyper-V: Report crash register data or kmsg before running crash kernel
        x86/Hyper-V: Trigger crash enlightenment only once during system crash.
        x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump
        x86/Hyper-V: Unload vmbus channel in hv panic callback
        x86: hyperv: report value of misc_features
        hv_debugfs: Make hv_debug_root static
        hv: hyperv_vmbus.h: Replace zero-length array with flexible-array member
      8632e9b5