    perf sort: Fix the 'weight' sort key behavior · 784e8add
    Namhyung Kim authored
    Currently, the 'weight' field in the perf sample carries latency
    information for some instructions, such as memory accesses.  The perf
    tool has the 'weight' and 'local_weight' sort keys to display it.
    
    But it's somewhat confusing what it shows exactly.  In my understanding,
    'local_weight' shows a weight in a single sample, and (global) 'weight'
    shows a sum of the weights in the hist_entry.
    
    For example:
    
      $ perf mem record -t load dd if=/dev/zero of=/dev/null bs=4k count=1M
    
      $ perf report --stdio -n -s +local_weight
      ...
      #
      # Overhead  Samples  Command  Shared Object     Symbol                     Local Weight
      # ........  .......  .......  ................  .........................  ............
      #
          21.23%      313  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   32
          12.43%      183  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   35
          11.97%      159  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   36
          10.40%      141  dd       [kernel.vmlinux]  [k] lockref_put_return     32
           7.63%      113  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   33
           6.37%       92  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   34
           6.15%       90  dd       [kernel.vmlinux]  [k] lockref_put_return     33
      ...
    
    So let's look at the 'lockref_get_not_zero' symbols.  The top entry
    shows that 313 samples were captured with 'local_weight' 32, so the
    total weight should be 313 x 32 = 10016.  But it's not the case:
    
      $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
      ...
      #
      # Overhead  Samples  Command  Shared Object     Local Weight  Weight
      # ........  .......  .......  ................  ............  ......
      #
           1.36%        4  dd       [kernel.vmlinux]  36            144
           0.47%        4  dd       [kernel.vmlinux]  37            148
           0.42%        4  dd       [kernel.vmlinux]  32            128
           0.40%        4  dd       [kernel.vmlinux]  34            136
           0.35%        4  dd       [kernel.vmlinux]  36            144
           0.34%        4  dd       [kernel.vmlinux]  35            140
           0.30%        4  dd       [kernel.vmlinux]  36            144
           0.30%        4  dd       [kernel.vmlinux]  34            136
           0.30%        4  dd       [kernel.vmlinux]  32            128
           0.30%        4  dd       [kernel.vmlinux]  32            128
      ...
    
    With the 'weight' sort key, the output is split into entries of 4
    samples each, even though they share the same info ('comm', 'dso',
    'sym' and 'local_weight').  I don't think this is what we want.
    
    I found that this is caused by the way the 'weight' value is
    aggregated.  Since it's not a period, we should not sum it up in
    he->stat.  Otherwise, two samples with 'weight' 32 will turn into an
    entry with 'weight' 64.
    
    After that, a new sample with 'weight' 32 has no matching entry, so it
    creates a new entry, which the next such sample turns into another 64
    'weight' entry, again and again.  Later, hists__collapse_resort() merges
    pairs of them into 128 'weight' entries with 4 samples each, which
    results in the many duplicated entries shown above.
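    
    The effect can be reproduced with a toy model.  The Python sketch below
    is not the perf code: Entry, add_sample_buggy() and collapse_buggy() are
    hypothetical names, and a plain list scan stands in for the real entry
    lookup.  It only assumes what is described above, namely that the
    accumulated weight is also the value compared by the 'weight' sort key.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    sym: str
    weight: int          # accumulated AND used as the sort key (the bug)
    nr_samples: int = 1


def add_sample_buggy(entries, sym, weight):
    """Add one sample; a match requires the same sym and the same weight."""
    for e in entries:
        if e.sym == sym and e.weight == weight:
            e.weight += weight       # summing changes the entry's own key
            e.nr_samples += 1
            return
    entries.append(Entry(sym, weight))


def collapse_buggy(entries):
    """Merge entries with equal keys, again summing the weight."""
    merged = []
    for e in entries:
        for m in merged:
            if m.sym == e.sym and m.weight == e.weight:
                m.weight += e.weight
                m.nr_samples += e.nr_samples
                break
        else:
            merged.append(Entry(e.sym, e.weight, e.nr_samples))
    return merged


entries = []
for _ in range(8):                   # eight samples, all with weight 32
    add_sample_buggy(entries, "lockref_get_not_zero", 32)

before_collapse = [(e.weight, e.nr_samples) for e in entries]
after_collapse = [(e.weight, e.nr_samples) for e in collapse_buggy(entries)]
```

    In this model the eight identical samples first form four entries of
    weight 64 with 2 samples each, and the collapse step leaves two entries
    of weight 128 with 4 samples each, instead of a single entry.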
    
    Let's keep the weight and display it differently.  For 'local_weight',
    it can show the weight as is, and for (global) 'weight' it can display
    the number multiplied by the number of samples.
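    
    As a rough illustration of the idea (again a toy Python model with
    hypothetical names, aggregate() and report_rows(), not the actual change
    to sort.c), the weight stays untouched as part of the key and the total
    is computed only when a row is displayed:

```python
from collections import Counter


def aggregate(samples):
    """Count samples per (sym, weight) key; the weight itself is never summed."""
    return Counter(samples)


def report_rows(hist):
    """Return (sym, nr_samples, local_weight, weight) rows, most samples first."""
    rows = [(sym, n, w, w * n) for (sym, w), n in hist.items()]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows


# Sample counts taken from the first two rows of the example output above.
samples = [("lockref_get_not_zero", 32)] * 313
samples += [("lockref_get_not_zero", 35)] * 183

rows = report_rows(aggregate(samples))
```

    Here 'local_weight' is the per-sample value (32 and 35) and 'weight' is
    that value multiplied by the number of samples (313 x 32 = 10016 and
    183 x 35 = 6405).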
    
    With this change, I can see the expected numbers.
    
      $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
      ...
      #
      # Overhead  Samples  Command  Shared Object     Local Weight  Weight
      # ........  .......  .......  ................  ............  .....
      #
          21.23%      313  dd       [kernel.vmlinux]  32            10016
          12.43%      183  dd       [kernel.vmlinux]  35            6405
          11.97%      159  dd       [kernel.vmlinux]  36            5724
           7.63%      113  dd       [kernel.vmlinux]  33            3729
           6.37%       92  dd       [kernel.vmlinux]  34            3128
           4.17%       59  dd       [kernel.vmlinux]  37            2183
           0.08%        1  dd       [kernel.vmlinux]  269           269
           0.08%        1  dd       [kernel.vmlinux]  38            38
    
    Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Kan Liang <kan.liang@linux.intel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Link: https://lore.kernel.org/r/20211105225617.151364-1-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>