    perf report: Add support for taken branch sampling · b50311dc
    Roberto Agostino Vitillo authored
    This patch adds support for taken branch sampling, i.e., the
    PERF_SAMPLE_BRANCH_STACK feature, to perf report. In other
    words, it displays histograms based on taken branches rather
    than on executed instruction addresses.
    
    The new option is called -b and it takes no argument. To
    generate meaningful output, the perf.data must have been
    obtained using perf record -b xxx ... where xxx is a branch
    filter option.
    
    The output shows symbols and modules, sorted by 'who branches
    where' most often. The percentages reported in the first
    column refer to the total number of branches captured,
    not to the usual number of samples.
    
    Here is a quick example.
    Here, branchy is a simple test program that looks as follows:
    
    void f2(void)
    {}
    void f3(void)
    {}
    void f1(unsigned long n)
    {
      if (n & 1UL)
        f2();
      else
        f3();
    }
    int main(void)
    {
      unsigned long i;
    
      for (i = 0; i < N; i++)
        f1(i);
      return 0;
    }
    
    Here is the output captured on Nehalem when we are
    only interested in user-level function calls.
    
    $ perf record -b any_call,u -e cycles:u branchy
    
    $ perf report -b --sort=symbol
        52.34%  [.] main                   [.] f1
        24.04%  [.] f1                     [.] f3
        23.60%  [.] f1                     [.] f2
         0.01%  [k] _IO_new_file_xsputn    [k] _IO_file_overflow
         0.01%  [k] _IO_vfprintf_internal  [k] _IO_new_file_xsputn
         0.01%  [k] _IO_vfprintf_internal  [k] strchrnul
         0.01%  [k] __printf               [k] _IO_vfprintf_internal
         0.01%  [k] main                   [k] __printf
    
    About half (52%) of the call branches captured are from main()
    -> f1(). The other half (24.04% + 23.60%) is split into two
    roughly equal shares between f1() -> f3() and f1() -> f2().
    The output is as expected given the code.
    
    Note that using -b in perf record does not eliminate any
    information from the perf.data file. Consequently, a
    typical profile can still be obtained from perf report by
    simply not using its -b option.
    
    It is possible to sort on branch related columns:
    
       - dso_from, symbol_from
       - dso_to, symbol_to
       - mispredict

    Signed-off-by: Roberto Agostino Vitillo <ravitillo@lbl.gov>
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: robert.richter@amd.com
    Cc: ming.m.lin@intel.com
    Cc: andi@firstfloor.org
    Cc: asharma@fb.com
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1328826068-11713-14-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar <mingo@elte.hu>