• Jin Yao's avatar
    perf report: Sort by sampled cycles percent per block for stdio · 6f7164fa
    Jin Yao authored
    It would be useful to support sorting for all blocks by the sampled
    cycles percent per block. This is useful to concentrate on the globally
    hottest blocks.
    
    This patch implements a new option "--total-cycles" which sorts all
    blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent:
    
     percent = block sampled cycles aggregation / total sampled cycles
    
    Note that, this patch only supports "--stdio" mode.
    
    For example,
    
      # perf record -b ./div
      # perf report --total-cycles --stdio
      # To display the perf.data header info, please use --header/--header-only options.
      #
      # Total Lost Samples: 0
      #
      # Samples: 2M of event 'cycles'
      # Event count (approx.): 2753248
      #
      # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                             [Program Block Range]      Shared Object
      # ...............  ..............  ...........  ..........  ................................................  .................
      #
                 26.04%            2.8M        0.40%          18                            [div.c:42 -> div.c:39]                div
                 15.17%            1.2M        0.16%           7                [random_r.c:357 -> random_r.c:380]       libc-2.27.so
                  5.11%          402.0K        0.04%           2                            [div.c:27 -> div.c:28]                div
                  4.87%          381.6K        0.04%           2                    [random.c:288 -> random.c:291]       libc-2.27.so
                  4.53%          381.0K        0.04%           2                            [div.c:40 -> div.c:40]                div
                  3.85%          300.9K        0.02%           1                            [div.c:22 -> div.c:25]                div
                  3.08%          241.1K        0.02%           1                          [rand.c:26 -> rand.c:27]       libc-2.27.so
                  3.06%          240.0K        0.02%           1                    [random.c:291 -> random.c:291]       libc-2.27.so
                  2.78%          215.7K        0.02%           1                    [random.c:298 -> random.c:298]       libc-2.27.so
                  2.52%          198.3K        0.02%           1                    [random.c:293 -> random.c:293]       libc-2.27.so
                  2.36%          184.8K        0.02%           1                          [rand.c:28 -> rand.c:28]       libc-2.27.so
                  2.33%          180.5K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                  2.28%          176.7K        0.02%           1                    [random.c:295 -> random.c:295]       libc-2.27.so
                  2.20%          168.8K        0.02%           1                        [rand@plt+0 -> rand@plt+0]                div
                  1.98%          158.2K        0.02%           1                [random_r.c:388 -> random_r.c:388]       libc-2.27.so
                  1.57%          123.3K        0.02%           1                            [div.c:42 -> div.c:44]                div
                  1.44%          116.0K        0.42%          19                [random_r.c:357 -> random_r.c:394]       libc-2.27.so
                  0.25%          182.5K        0.02%           1                [random_r.c:388 -> random_r.c:391]       libc-2.27.so
                  0.00%              48        1.07%          48        [x86_pmu_enable+284 -> x86_pmu_enable+298]  [kernel.kallsyms]
                  0.00%              74        1.64%          74             [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92]  [kernel.kallsyms]
                  0.00%              73        1.62%          73                         [vm_mmap+0 -> vm_mmap+48]  [kernel.kallsyms]
                  0.00%              63        0.69%          31                       [up_write+0 -> up_write+34]  [kernel.kallsyms]
                  0.00%              13        0.29%          13      [setup_arg_pages+396 -> setup_arg_pages+413]  [kernel.kallsyms]
                  0.00%               3        0.07%           3      [setup_arg_pages+418 -> setup_arg_pages+450]  [kernel.kallsyms]
                  0.00%             616        6.84%         308   [security_mmap_file+0 -> security_mmap_file+72]  [kernel.kallsyms]
                  0.00%              23        0.51%          23  [security_mmap_file+77 -> security_mmap_file+87]  [kernel.kallsyms]
                  0.00%               4        0.02%           1                  [sched_clock+0 -> sched_clock+4]  [kernel.kallsyms]
                  0.00%               4        0.02%           1                 [sched_clock+9 -> sched_clock+12]  [kernel.kallsyms]
                  0.00%               1        0.02%           1                [rcu_nmi_exit+0 -> rcu_nmi_exit+9]  [kernel.kallsyms]
    
    Committer testing:
    
    This should provide material for hours of endless joy, both from looking
    for suspicious things in the implementation of this patch, such as the
    top one:
    
      # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                  2.17%            1.7M        0.08%         607   [compiler.h:199 -> common.c:221]              [kernel.vmlinux]
    
    As well from things that look legit:
    
      # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                          [Program Block Range]     Shared Object
                  0.16%          123.0K        0.60%        4.7K   [nospec-branch.h:265 -> nospec-branch.h:278]  [kernel.vmlinux]
    
    :-)
    
    Very short system wide taken branches session:
    
      # perf record -h -b
    
       Usage: perf record [<options>] [<command>]
          or: perf record [<options>] -- <command> [<options>]
    
          -b, --branch-any      sample any taken branches
    
      #
      # perf record -b
      ^C[ perf record: Woken up 595 times to write data ]
      [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
    
      #
      # perf evlist -v
      cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
      #
      # perf report --total-cycles --stdio
      # To display the perf.data header info, please use --header/--header-only options.
      #
      # Total Lost Samples: 0
      #
      # Samples: 6M of event 'cycles'
      # Event count (approx.): 6299936
      #
      # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
      # ...............  ..............  ...........  ..........  ......................................................................  ....................
      #
                  2.17%            1.7M        0.08%         607                                        [compiler.h:199 -> common.c:221]      [kernel.vmlinux]
                  1.75%            1.3M        8.34%       65.5K    [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151]          libc-2.29.so
                  0.72%          544.5K        0.03%         230                                      [entry_64.S:657 -> entry_64.S:662]      [kernel.vmlinux]
                  0.56%          541.8K        0.09%         672                                        [compiler.h:199 -> common.c:300]      [kernel.vmlinux]
                  0.39%          293.2K        0.01%         104                                    [list_debug.c:43 -> list_debug.c:61]      [kernel.vmlinux]
                  0.36%          278.6K        0.03%         272                                    [entry_64.S:1289 -> entry_64.S:1308]      [kernel.vmlinux]
                  0.30%          260.8K        0.07%         564                              [clear_page_64.S:47 -> clear_page_64.S:50]      [kernel.vmlinux]
                  0.28%          215.3K        0.05%         369                                            [traps.c:623 -> traps.c:628]      [kernel.vmlinux]
                  0.23%          178.1K        0.04%         278                                      [entry_64.S:271 -> entry_64.S:275]      [kernel.vmlinux]
                  0.20%          152.6K        0.09%         706                                      [paravirt.c:177 -> paravirt.c:179]      [kernel.vmlinux]
                  0.20%          155.8K        0.05%         373                                      [entry_64.S:153 -> entry_64.S:175]      [kernel.vmlinux]
                  0.18%          136.6K        0.03%         222                                                [msr.h:105 -> msr.h:166]      [kernel.vmlinux]
                  0.16%          123.0K        0.60%        4.7K                            [nospec-branch.h:265 -> nospec-branch.h:278]      [kernel.vmlinux]
                  0.16%          118.3K        0.01%          44                                      [entry_64.S:632 -> entry_64.S:657]      [kernel.vmlinux]
                  0.14%          104.5K        0.00%          28                                          [rwsem.c:1541 -> rwsem.c:1544]      [kernel.vmlinux]
                  0.13%           99.2K        0.01%          53                                      [spinlock.c:150 -> spinlock.c:152]      [kernel.vmlinux]
                  0.13%           95.5K        0.00%          35                                              [swap.c:456 -> swap.c:471]      [kernel.vmlinux]
                  0.12%           96.2K        0.05%         407                              [copy_user_64.S:175 -> copy_user_64.S:209]      [kernel.vmlinux]
                  0.11%           85.9K        0.00%          31                                        [swap.c:400 -> page-flags.h:188]      [kernel.vmlinux]
                  0.10%           73.0K        0.01%          52                                          [paravirt.h:763 -> list.h:131]      [kernel.vmlinux]
                  0.07%           56.2K        0.03%         214                                      [filemap.c:1524 -> filemap.c:1557]      [kernel.vmlinux]
                  0.07%           54.2K        0.02%         145                                        [memory.c:1032 -> memory.c:1049]      [kernel.vmlinux]
                  0.07%           50.3K        0.00%          39                                            [mmzone.c:49 -> mmzone.c:69]      [kernel.vmlinux]
                  0.06%           48.3K        0.01%          40                                   [paravirt.h:768 -> page_alloc.c:3304]      [kernel.vmlinux]
                  0.06%           46.7K        0.02%         155                                        [memory.c:1032 -> memory.c:1056]      [kernel.vmlinux]
                  0.06%           46.9K        0.01%         103                                              [swap.c:867 -> swap.c:902]      [kernel.vmlinux]
                  0.06%           47.8K        0.00%          34                                    [entry_64.S:1201 -> entry_64.S:1202]      [kernel.vmlinux]
    
     -----------------------------------------------------------
    
     v7:
     ---
     Use use_browser in report__browse_block_hists for supporting
     stdio and potential tui mode.
    
     v6:
     ---
     Create report__browse_block_hists in block-info.c (codes are
     moved from builtin-report.c). It's called from
     perf_evlist__tty_browse_hists.
    
     v5:
     ---
     1. Move all block functions to block-info.c
    
     2. Move the code of setting ms in block hist_entry to
        other patch.
    
     v4:
     ---
     1. Use new option '--total-cycles' to replace
        '-s total_cycles' in v3.
    
     2. Move block info collection out of block info
        printing.
    
     v3:
     ---
     1. Use common function block_info__process_sym to
        process the blocks per symbol.
    
     2. Remove the nasty hack for skipping calculation
        of column length
    
     3. Some minor cleanup
    Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
    Reviewed-by: default avatarJiri Olsa <jolsa@kernel.org>
    Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Jin Yao <yao.jin@intel.com>
    Cc: Kan Liang <kan.liang@linux.intel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
    6f7164fa
builtin-report.c 42.3 KB