1. 17 Mar, 2020 2 commits
    • perf report: Fix no branch type statistics report issue · c3b10649
      Jin Yao authored
      Previously we could get the report of branch type statistics.
      
      For example:
      
        # perf record -j any,save_type ...
        # perf report --stdio
      
        #
        # Branch Statistics:
        #
        COND_FWD:  40.6%
        COND_BWD:   4.1%
        CROSS_4K:  24.7%
        CROSS_2M:  12.3%
            COND:  44.7%
          UNCOND:   0.0%
             IND:   6.1%
            CALL:  24.5%
             RET:  24.7%
      
      Recent versions of perf, however, no longer report the branch type
      statistics.
      
      This is a regression caused by commit 40c39e30 ("perf report: Fix
      a no annotate browser displayed issue"), which counts the branch
      type statistics only in browser mode.
      
      This patch moves the branch_type_count() call outside of the
      ui__has_annotation() check, so that branch type statistics work in
      stdio mode again.
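      
      The control-flow change described above can be sketched roughly as
      follows. This is a simplified, standalone illustration and not the
      actual perf code: the struct, count_branch_type(), process_sample()
      and the has_annotation/fixed flags are hypothetical stand-ins for
      branch_type_count(), the sample callback and ui__has_annotation().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-report branch type counters. */
struct branch_stats {
	unsigned long counted;
};

/* Stand-in for branch_type_count(): tally one branch sample. */
static void count_branch_type(struct branch_stats *st)
{
	st->counted++;
}

/*
 * fixed == false models the regressed code: counting sits inside the
 * annotation-only branch, so a mode without annotation never counts.
 * fixed == true models the patched code: counting is unconditional.
 */
static void process_sample(struct branch_stats *st, bool has_annotation,
			   bool fixed)
{
	if (has_annotation) {
		/* annotation-only work would go here */
		if (!fixed)
			count_branch_type(st);
	}

	if (fixed)
		count_branch_type(st);
}
```

      With has_annotation set to false, the regressed variant leaves the
      counter at zero while the patched variant counts the sample.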
      
      Fixes: 40c39e30 ("perf report: Fix a no annotate browser displayed issue")
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200313134607.12873-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Give synthetic mmap events an inode generation · 3b7a15b0
      Ian Rogers authored
      When mmap2 events are synthesized, the ino_generation field isn't
      set, leading to uninitialized memory being compared.
      
      Caught with clang's -fsanitize=memory:
      
      ==124733==WARNING: MemorySanitizer: use-of-uninitialized-value
          #0 0x55a96a6a65cc in __dso_id__cmp tools/perf/util/dsos.c:23:6
          #1 0x55a96a6a81d5 in dso_id__cmp tools/perf/util/dsos.c:38:9
          #2 0x55a96a6a717f in __dso__cmp_long_name tools/perf/util/dsos.c:74:15
          #3 0x55a96a6a6c4c in __dsos__findnew_link_by_longname_id tools/perf/util/dsos.c:106:12
          #4 0x55a96a6a851e in __dsos__findnew_by_longname_id tools/perf/util/dsos.c:178:9
          #5 0x55a96a6a7798 in __dsos__find_id tools/perf/util/dsos.c:191:9
          #6 0x55a96a6a7b57 in __dsos__findnew_id tools/perf/util/dsos.c:251:20
          #7 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17
          #8 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9
          #9 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10
          #10 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8
          #11 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #12 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #13 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #14 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #15 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #16 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #17 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #18 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #19 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #20 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #21 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #22 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #23 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #24 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #25 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #26 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #27 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #28 0x55a96a282223 in main tools/perf/perf.c:538:3
      
        Uninitialized value was stored to memory at
          #1 0x55a96a6a18f7 in dso__new_id tools/perf/util/dso.c:1230:14
          #2 0x55a96a6a78ee in __dsos__addnew_id tools/perf/util/dsos.c:233:20
          #3 0x55a96a6a7bcc in __dsos__findnew_id tools/perf/util/dsos.c:252:21
          #4 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17
          #5 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9
          #6 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10
          #7 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8
          #8 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #9 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #10 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #11 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #12 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #13 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #14 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #15 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #16 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #17 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #18 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #19 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
      
        Uninitialized value was stored to memory at
          #0 0x55a96a7725af in machine__process_mmap2_event tools/perf/util/machine.c:1646:25
          #1 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #2 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #3 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #4 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #5 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #6 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #7 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #8 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #9 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #10 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #11 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #12 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #13 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #14 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #15 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #16 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #17 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #18 0x55a96a282223 in main tools/perf/perf.c:538:3
      
        Uninitialized value was created by a heap allocation
          #0 0x55a96a22f60d in malloc llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:925:3
          #1 0x55a96a882948 in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:655:15
          #2 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #3 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #4 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #5 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #6 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #7 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #8 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #9 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #10 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #11 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #12 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #13 0x55a96a282223 in main tools/perf/perf.c:538:3
      
      SUMMARY: MemorySanitizer: use-of-uninitialized-value tools/perf/util/dsos.c:23:6 in __dso_id__cmp
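      
      The kind of fix described at the top of this message can be sketched
      as below. This is a minimal illustration under assumptions: struct
      fake_mmap2, synthesize_mmap2() and fake_id_cmp() are hypothetical,
      reduced stand-ins, and using zero as the generation value is an
      assumption rather than something the message states.

```c
#include <stdint.h>

/* Hypothetical, reduced mmap2-like record; field names are illustrative. */
struct fake_mmap2 {
	uint32_t maj;
	uint32_t min;
	uint64_t ino;
	uint64_t ino_generation;
};

/*
 * Fill a synthesized record. Every field that is compared later gets a
 * value, including ino_generation, so no uninitialized heap bytes are
 * ever read by the comparison below.
 */
static void synthesize_mmap2(struct fake_mmap2 *ev, uint32_t maj,
			     uint32_t min, uint64_t ino)
{
	ev->maj = maj;
	ev->min = min;
	ev->ino = ino;
	ev->ino_generation = 0;	/* assumed placeholder value */
}

/* Field-by-field comparison: returns <0, 0 or >0. */
static int fake_id_cmp(const struct fake_mmap2 *a, const struct fake_mmap2 *b)
{
	if (a->maj != b->maj)
		return a->maj < b->maj ? -1 : 1;
	if (a->min != b->min)
		return a->min < b->min ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	if (a->ino_generation != b->ino_generation)
		return a->ino_generation < b->ino_generation ? -1 : 1;
	return 0;
}
```

      Two records synthesized for the same file then compare equal no
      matter what the underlying buffer contained beforehand.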
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20200313053129.131264-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 13 Mar, 2020 1 commit
  3. 12 Mar, 2020 1 commit
  4. 11 Mar, 2020 12 commits
  5. 10 Mar, 2020 18 commits
    • perf vendor events intel: Add NO_NMI_WATCHDOG metric constraint · b95fcd2c
      Kan Liang authored
      Add NO_NMI_WATCHDOG metric constraint to Page_Walks_Utilization for Sky Lake
      and Cascade Lake.
      
      Committer testing:
      
      On a Lenovo T480S with an Intel(R) Core(TM) i7-8650U (Kaby Lake), which
      x86's mapfile.csv maps to skylake:
      
        $ grep -w skylake tools/perf/pmu-events/arch/x86/mapfile.csv
        GenuineIntel-6-[4589]E,v24,skylake,core
        $
      
      So it uses the constraint added by this patch to this file:
      
        tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
      
      Before:
      
        # perf stat -a -M Page_Walks_Utilization sleep 2
      
         Performance counter stats for 'system wide':
      
             <not counted>      itlb_misses.walk_pending                                      (0.00%)
             <not counted>      dtlb_load_misses.walk_pending                                     (0.00%)
             <not counted>      dtlb_store_misses.walk_pending                                     (0.00%)
             <not counted>      ept.walk_pending                                              (0.00%)
             <not counted>      cycles                                                        (0.00%)
      
               2.001750514 seconds time elapsed
      
        Some events weren't counted. Try disabling the NMI watchdog:
        	echo 0 > /proc/sys/kernel/nmi_watchdog
        	perf stat ...
        	echo 1 > /proc/sys/kernel/nmi_watchdog
        The events in group usually have to be from the same PMU. Try reorganizing the group.
        #
      
      After:
      
        # perf stat -a -M Page_Walks_Utilization sleep 2
        Splitting metric group Page_Walks_Utilization into standalone metrics.
        Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
            echo 0 > /proc/sys/kernel/nmi_watchdog
            perf stat ...
            echo 1 > /proc/sys/kernel/nmi_watchdog
        ,
         Performance counter stats for 'system wide':
      
                36,883,102      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization   (79.99%)
               123,104,146      dtlb_load_misses.walk_pending                                     (80.02%)
                13,720,795      dtlb_store_misses.walk_pending                                     (79.99%)
                         0      ept.walk_pending                                              (79.99%)
             1,519,948,400      cycles                                                        (80.01%)
      
               2.002170780 seconds time elapsed
      
        #
      
      Before and after, if we disable the nmi_watchdog we get:
      
        # echo 0 > /proc/sys/kernel/nmi_watchdog
        # perf stat -a -M Page_Walks_Utilization sleep 2
      
         Performance counter stats for 'system wide':
      
                33,721,658      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization
                84,070,996      dtlb_load_misses.walk_pending
                 9,816,071      dtlb_store_misses.walk_pending
                         0      ept.walk_pending
               704,920,899      cycles
      
               2.002331670 seconds time elapsed
      
        #
      
        More information about the metric expressions:
      
        # perf stat -v -a -M Page_Walks_Utilization sleep 2
        Using CPUID GenuineIntel-6-8E-A
        metric expr ( itlb_misses.walk_pending + dtlb_load_misses.walk_pending + dtlb_store_misses.walk_pending + ept.walk_pending ) / ( 2 * cycles ) for Page_Walks_Utilization
        found event itlb_misses.walk_pending
        found event dtlb_load_misses.walk_pending
        found event dtlb_store_misses.walk_pending
        found event ept.walk_pending
        found event cycles
        adding {itlb_misses.walk_pending,dtlb_load_misses.walk_pending,dtlb_store_misses.walk_pending,ept.walk_pending,cycles}:W
         -> cpu/umask=0x10,(null)=0x186a3,event=0x85/
         -> cpu/umask=0x10,(null)=0x1e8483,event=0x8/
         -> cpu/umask=0x10,(null)=0x1e8483,event=0x49/
         -> cpu/umask=0x10,(null)=0x1e8483,event=0x4f/
        itlb_misses.walk_pending: 8085772 16010162799 16010162799
        dtlb_load_misses.walk_pending: 28134579 16010162799 16010162799
        dtlb_store_misses.walk_pending: 7276535 16010162799 16010162799
        ept.walk_pending: 2 16010162799 16010162799
        cycles: 315140605 16010162799 16010162799
      
         Performance counter stats for 'system wide':
      
                 8,085,772      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization
                28,134,579      dtlb_load_misses.walk_pending
                 7,276,535      dtlb_store_misses.walk_pending
                         2      ept.walk_pending
               315,140,605      cycles
      
               2.002333181 seconds time elapsed
      
        #
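      
      As a cross-check of the verbose output above, the metric expression
      can be evaluated by hand. The helper below is only a sketch;
      page_walks_utilization() is a hypothetical name, and the formula is
      copied from the "metric expr" line printed by perf stat -v.

```c
/*
 * Evaluate the Page_Walks_Utilization expression:
 *   (itlb + dtlb_load + dtlb_store + ept) / (2 * cycles)
 * Doubles are used so the large counter values divide without truncation.
 */
static double page_walks_utilization(double itlb, double dtlb_load,
				     double dtlb_store, double ept,
				     double cycles)
{
	return (itlb + dtlb_load + dtlb_store + ept) / (2.0 * cycles);
}
```

      With the counts from the verbose run (8085772, 28134579, 7276535, 2
      and 315140605 cycles) this gives 43496888 / 630281210, roughly 0.069,
      which is consistent with the 0.1 shown at one decimal place.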
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-6-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf metricgroup: Support metric constraint · ab483d8b
      Kan Liang authored
      Some metric groups have metric constraints. A metric group can be
      scheduled as a group only when some constraints are applied.  For
      example, Page_Walks_Utilization has a metric constraint,
      "NO_NMI_WATCHDOG".
      
      When the NMI watchdog is disabled, the metric group can be scheduled as a
      group. Otherwise, the metric group is split into standalone metrics.
      
      Add a new function, metricgroup__has_constraint(), to check whether all
      constraints are applied. If not, split the metric group into
      standalone metrics.
      
      Currently, only one constraint, "NO_NMI_WATCHDOG", is checked. Print a
      warning for a metric group with that constraint when the NMI watchdog is
      enabled.
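      
      The decision described above might look roughly like the sketch
      below. It is not the metricgroup__has_constraint() implementation:
      struct fake_metric and metric_must_be_split() are hypothetical names,
      and ignoring unknown constraint strings is an assumption of this
      sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical reduced metric description. */
struct fake_metric {
	const char *name;
	const char *constraint;	/* NULL when the metric has none */
};

/*
 * Return true when the metric carries a constraint that is currently
 * violated, meaning its events should be added as standalone metrics
 * instead of one group.
 */
static bool metric_must_be_split(const struct fake_metric *m,
				 bool nmi_watchdog_enabled)
{
	if (!m->constraint)
		return false;

	/* The only constraint handled here, as in the message above. */
	if (!strcmp(m->constraint, "NO_NMI_WATCHDOG"))
		return nmi_watchdog_enabled;

	/* Assumption: unknown constraints do not force a split. */
	return false;
}
```

      A caller would group the events when this returns false and fall
      back to standalone metrics, plus a warning, when it returns true.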
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-5-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf util: Factor out sysctl__nmi_watchdog_enabled() · 2a14c1bf
      Kan Liang authored
      The NMI watchdog status is required for metric group constraint
      examination.  Factor out sysctl__nmi_watchdog_enabled() to retrieve the
      NMI watchdog status.
      
      Users may count more than one metric group each time. If so, the NMI
      watchdog status may be retrieved several times. To reduce the overhead,
      cache the NMI watchdog status.
      
      Replace the NMI watchdog status checking in print_footer() by
      sysctl__nmi_watchdog_enabled().
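      
      The caching idea can be illustrated with the standalone sketch below.
      It is not the sysctl__nmi_watchdog_enabled() code: read_flag() and
      nmi_watchdog_enabled_cached() are hypothetical names, treating an
      unreadable file as "disabled" is an assumption, and the static cache
      is not thread-safe.

```c
#include <stdbool.h>
#include <stdio.h>

/* Read an integer flag from a file; an unreadable file counts as 0. */
static bool read_flag(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = 0;

	if (!f)
		return false;
	if (fscanf(f, "%d", &value) != 1)
		value = 0;
	fclose(f);
	return value > 0;
}

/*
 * Return the NMI watchdog status, reading the sysctl file only on the
 * first call and serving later calls from the cached result.
 */
static bool nmi_watchdog_enabled_cached(void)
{
	static bool cached;
	static bool enabled;

	if (!cached) {
		enabled = read_flag("/proc/sys/kernel/nmi_watchdog");
		cached = true;
	}
	return enabled;
}
```

      Counting several metric groups in one run then costs a single file
      read instead of one read per group.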
      Suggested-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-4-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf metricgroup: Factor out metricgroup__add_metric_weak_group() · f742634a
      Kan Liang authored
      Factor out metricgroup__add_metric_weak_group(), which adds metrics into a
      weak group. The change improves code readability, because a following
      patch will introduce a function which adds standalone metrics.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-3-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf jevents: Support metric constraint · 03fe02b1
      Kan Liang authored
      A new field "MetricConstraint" is introduced in JSON event list.
      
      Extend jevents to parse the field and save the value in
      metric_constraint.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-2-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events s390: Add new deflate counters for IBM z15 · e7950166
      Thomas Richter authored
      Add support for new deflate counters:
      
      - Counter 247: cycles CPU spent obtaining access to Deflate unit
      - Counter 252: cycles CPU is using Deflate unit
      - Counter 264: Increments by one for every DEFLATE CONVERSION CALL
      	    instruction executed.
      - Counter 265: Increments by one for every DEFLATE CONVERSION CALL
      	    instruction executed that ended in Condition Codes
      	    0, 1 or 2.
      
      Also adjust some crypto counter descriptions to the latest documentation.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200310142937.32045-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf block-info: Support color ops to print block percents in color · f787feff
      Jin Yao authored
      It would be nice to print the block percents with colors.
      
      This patch prints the 'Sampled Cycles%' and 'Avg Cycles%' columns in
      color.
      
      For example,
      
      perf record -b ...
      perf report --total-cycles or perf report --total-cycles --stdio
      
      percent > 5%, colored in red
      percent > 0.5%, colored in green
      percent < 0.5%, default color
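      
      The thresholds listed above amount to a small lookup, sketched here.
      block_percent_color() is a hypothetical helper that returns a color
      name instead of calling perf's color routines, and treating exactly
      0.5% as the default color is an assumption, since the list above
      does not cover that value.

```c
/*
 * Map a block percentage to a color name using the thresholds from the
 * commit message: above 5% red, above 0.5% green, otherwise default.
 */
static const char *block_percent_color(double percent)
{
	if (percent > 5.0)
		return "red";
	if (percent > 0.5)
		return "green";
	return "default";
}
```

      Checking the larger threshold first keeps the two comparisons
      independent of each other.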
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-5-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf block-info: Allow selecting which columns to report and its order · cca0cc76
      Jin Yao authored
      Currently we use a predefined array to set the block info output
      formats, which is fixed and inflexible.
      
      This patch adds two parameters, "block_hpps" and "nr_hpps", to
      block_info__create_report() and other static functions, to let the
      user decide which columns to report and in which order.
      
      Buffers are allocated to hold the new fmts, so we need to
      release them before perf exits.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf diff: Use __block_info__cmp() to replace block_pair_cmp() · a8a9f6dc
      Jin Yao authored
      'perf diff' uses block_pair_cmp() to compare two blocks. But
      block_info__cmp() has similar functionality and is a bit more
      complete.
      
      This patch removes block_pair_cmp() and uses __block_info__cmp()
      instead. __block_info__cmp() is wrapped by block_info__cmp() and
      doesn't receive a perf_hpp_fmt parameter.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf block-info: Fix wrong block address comparison in block_info__cmp() · 3e152aa9
      Jin Yao authored
      Commit 60414418 ("perf block: Cleanup and refactor block info
      functions") introduces block_info__cmp(), which compares two blocks.
      
      But the issues are:
      
      1. It should return the strcmp cmp value only if it's not 0.
      
      2. When symbol names are matched, we need to compare the addresses
         of blocks further. But it wrongly uses the symbol addresses for
         comparison.
      
      3. If the syms are both NULL, we can't consider these two blocks as
         matched.
      
      This patch fixes the above three issues.
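      
      A comparison that respects the three points above could be sketched
      as follows. This is an illustration only, not the patched
      block_info__cmp(): struct fake_block and fake_block_cmp() are
      hypothetical, and the exact ordering chosen for symbol-less blocks is
      an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced block description. */
struct fake_block {
	const char *sym_name;	/* NULL when the symbol is unknown */
	uint64_t start;		/* block start address */
	uint64_t end;		/* block end address */
};

static int fake_block_cmp(const struct fake_block *l,
			  const struct fake_block *r)
{
	int cmp;

	/* Point 3: two blocks without symbols are never reported as equal. */
	if (!l->sym_name && !r->sym_name)
		return -1;
	if (!l->sym_name || !r->sym_name)
		return l->sym_name ? 1 : -1;

	/* Point 1: return the strcmp result only when the names differ. */
	cmp = strcmp(l->sym_name, r->sym_name);
	if (cmp)
		return cmp;

	/* Point 2: same symbol, so compare the block addresses themselves. */
	if (l->start != r->start)
		return l->start < r->start ? -1 : 1;
	if (l->end != r->end)
		return l->end < r->end ? -1 : 1;
	return 0;
}
```

      Only blocks with the same symbol name and the same start and end
      addresses compare equal in this sketch.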
      
      Fixes: 60414418 ("perf block: Cleanup and refactor block info functions")
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf expr: Make expr__parse() return -1 on error · d942815a
      Jiri Olsa authored
      This matches the error value of expr__find_other(), so that all
      exported expr functions return the same values:
      0 on success, -1 on error.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-6-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf expr: Straighten expr__parse()/expr__find_other() interface · 0f9b1e12
      Jiri Olsa authored
      Now that we have a flex parser we don't need to update the parsed string
      pointer, so the interface can just be passed the pointer to the
      expression instead of a pointer to pointer.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-5-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf expr: Increase EXPR_MAX_OTHER to support metrics with more than 15 variables · 58ca7076
      Jiri Olsa authored
      We have metrics that define more than 15 variables, like
      Branch_Misprediction_Cost. Increase the allowed variable count to 20.
      
      As Andi pointed out, we can't go too high here, because some of the
      code has O(n^2) complexity (already_seen) and we might want to make some
      other changes (like using hash tables) before increasing the maximum
      even more.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-4-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf expr: Move expr lexer to flex · 26226a97
      Jiri Olsa authored
      Add expr flex code in place of the manual parser code, so that it's
      easily extensible in upcoming changes.
      
      The new flex code is in flex.l object and gets compiled like all the
      other flexers we use.  It's defined as a reentrant flex parser.
      
      It's used by both the expr__parse and expr__find_other interfaces,
      with a separate starting point for each.
      
      There's no intended change of functionality ;-) the test expr is
      passing.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-3-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf expr: Add expr.c object · 576a65b6
      Jiri Olsa authored
      Add generic expr code into new expr.c object.
      
      The expr.c object will mainly be used in a following change that will
      get rid of the manual flex code.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-2-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf header: Add check for unexpected use of reserved members in event attr · 277ce1ef
      Kan Liang authored
      The perf.data may be generated by a newer version of the perf tool, which
      supports new input bits in attr, e.g. a new bit for branch_sample_type.
      
      The perf.data may later be parsed by an older version of the perf tool,
      which may parse it incorrectly. There is no warning message for this
      case.
      
      The current perf header code never checks for unknown input bits in attr.
      
      When reading the event desc from the header, check the stored event attr.
      The reserved bits, sample type, read format and branch sample type are
      checked.
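      
      The core of such a check is a mask test, sketched below under
      assumptions. has_unknown_bits() and attr_fields_known() are
      hypothetical helpers, and the caller is assumed to supply the masks
      of bits that the running tool version understands.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when 'value' has any bit set outside of 'known_mask'. */
static bool has_unknown_bits(uint64_t value, uint64_t known_mask)
{
	return (value & ~known_mask) != 0;
}

/*
 * Check several attr-like fields at once. Each field comes with the
 * mask of bits the reader knows about; any stray bit means the file
 * was probably written by a newer tool and deserves a warning.
 */
static bool attr_fields_known(uint64_t sample_type, uint64_t sample_mask,
			      uint64_t read_format, uint64_t read_mask,
			      uint64_t branch_type, uint64_t branch_mask)
{
	return !has_unknown_bits(sample_type, sample_mask) &&
	       !has_unknown_bits(read_format, read_mask) &&
	       !has_unknown_bits(branch_type, branch_mask);
}
```

      An older reader that only knows the low three bits of a field would
      flag a value with bit 3 set rather than silently misparsing it.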
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lkml.kernel.org/r/20200228163011.19358-4-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX · d3f85437
      Kan Liang authored
      A new branch sample type, PERF_SAMPLE_BRANCH_HW_INDEX, has been
      introduced in the latest kernel.

      Enable HW_INDEX by default in LBR call stack mode.

      If the kernel doesn't support the sample type, switch it off.

      Add HW_INDEX to attr_fprintf as well, so that users can check whether the
      branch sample type is set via the debug information or the header.
      
      Committer testing:
      
      First collect some samples with LBR callchains, system wide, for a few
      seconds:
      
        # perf record --call-graph lbr -a sleep 5
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.625 MB perf.data (224 samples) ]
        #
      
      Now let's use 'perf evlist -v' to look at the branch_sample_type:
      
        # perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
        #
      
      So the machine has the kernel feature, and it was correctly added to
      perf_event_attr.branch_sample_type, for the default 'cycles' event.
      
      If we do it on another machine, where the kernel lacks the HW_INDEX
      feature, we get:
      
        # perf record --call-graph lbr -a sleep 2s
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.690 MB perf.data (499 samples) ]
        # perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
        #
      
      No HW_INDEX in attr.branch_sample_type.
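      The "enable by default, switch off if unsupported" behaviour seen in
      the two runs above can be sketched as a retry. This is only an
      illustration under stated assumptions: the bit values, the open_fn
      callback and the two fake kernels are hypothetical, and the real tool
      opens events through the perf_event_open() system call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define BRANCH_CALL_STACK  (1ULL << 0)
#define BRANCH_HW_INDEX    (1ULL << 1)

/* Stand-in for opening an event: returns 0 on success, -1 on failure. */
typedef int (*open_fn)(uint64_t branch_sample_type);

/* Fake kernel without the feature: rejects any request with HW_INDEX. */
int open_on_old_kernel(uint64_t type)
{
	return (type & BRANCH_HW_INDEX) ? -1 : 0;
}

/* Fake kernel with the feature: accepts everything. */
int open_on_new_kernel(uint64_t type)
{
	(void)type;
	return 0;
}

/*
 * Ask for HW_INDEX first; if the open fails, clear the bit and retry once.
 * Returns the branch_sample_type that was accepted, or 0 if both failed.
 */
uint64_t open_with_fallback(open_fn try_open)
{
	uint64_t type = BRANCH_CALL_STACK | BRANCH_HW_INDEX;

	assert(try_open != NULL);
	if (try_open(type) == 0)
		return type;

	type &= ~BRANCH_HW_INDEX;
	if (try_open(type) == 0)
		return type;

	return 0;
}
```

      With the fake new kernel the returned mask still contains the
      HW_INDEX bit; with the fake old kernel it does not, which mirrors the
      difference between the two 'perf evlist -v' outputs.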
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200228163011.19358-3-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add hw_idx in struct branch_stack · 42bbabed
      Kan Liang authored
      The low level index of raw branch records for the most recent branch can
      be recorded in a sample with the PERF_SAMPLE_BRANCH_HW_INDEX
      branch_sample_type. Extend struct branch_stack to support it.

      However, if PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
      entries[] are output by the kernel. The pointer to entries[] could then
      be wrong, since the output format differs from the new struct
      branch_stack.  Add a variable no_hw_idx to struct perf_sample to
      indicate whether hw_idx is output.  Add get_branch_entry() to return
      the corresponding pointer to entries[0].

      To make the dummy branch sample consistent with the new branch sample,
      add hw_idx to struct dummy_branch_stack for cs-etm and intel-pt.

      Apply the new struct branch_stack to synthetic events as well.

      Extend the sample-parsing test case to support the new struct
      branch_stack.
      
      Committer notes:
      
      Renamed get_branch_entries() to perf_sample__branch_entries() to have
      proper namespacing and pave the way for this to be moved to libperf,
      eventually.
      
      Add 'static' to that inline as it is in a header.
      
      Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build
      on arm64.
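      The layout problem described above can be illustrated with a small
      sketch. It assumes simplified types: 'struct sample' is a hypothetical
      stand-in for struct perf_sample, and sample_branch_entries() is an
      illustrative name, not the accessor from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified branch record: just three 64-bit words. */
struct branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/* Layout when the kernel outputs hw_idx between nr and entries[]. */
struct branch_stack {
	uint64_t nr;
	uint64_t hw_idx;
	struct branch_entry entries[];
};

/* Hypothetical, reduced stand-in for struct perf_sample. */
struct sample {
	struct branch_stack *branch_stack;
	bool no_hw_idx;		/* true: raw data is nr, entries[] only */
};

/*
 * Return a pointer to the first branch entry. When hw_idx was not output,
 * the entries start one 64-bit word earlier than the struct layout says.
 */
struct branch_entry *sample_branch_entries(const struct sample *s)
{
	uint64_t *p = (uint64_t *)s->branch_stack;

	assert(p != NULL);
	p++;			/* skip nr */
	if (!s->no_hw_idx)
		p++;		/* skip hw_idx */
	return (struct branch_entry *)p;
}
```

      Reading entries[] directly through the struct would be off by one
      64-bit word whenever no_hw_idx is set, which is why callers go through
      an accessor instead.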
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 05 Mar, 2020 1 commit
    • tools headers UAPI: Update tools's copy of linux/perf_event.h · 6339998d
      Arnaldo Carvalho de Melo authored
      To get the changes in:
      
        bbfd5e4f ("perf/core: Add new branch sample type for HW index of raw branch records")
      
      This silences this perf tools build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
        diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h
      
      This update is a prerequisite to adding support for the HW index of raw
      branch records.
      Acked-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200304134902.GB12612@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 04 Mar, 2020 5 commits
    • tools lib traceevent: Remove extra '\n' in print_event_time() · 401d61cb
      Steven Rostedt (VMware) authored
      If the precision of print_event_time() is zero or greater than the
      timestamp, it uses a different format. That format had an extra newline
      at the end, which broke the output:
      
      cpus=2
                 sleep-3946  [001]111264306005
      : function:             inotify_inode_queue_event
                 sleep-3946  [001]111264307158
      : function:             __fsnotify_parent
                 sleep-3946  [001]111264307637
      : function:             inotify_dentry_parent_queue_event
                 sleep-3946  [001]111264307989
      : function:             fsnotify
                 sleep-3946  [001]111264308401
      : function:             audit_syscall_exit
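      The two formats mentioned above can be sketched like this. It is a
      simplified illustration, not the libtraceevent code: the function name
      and the buffer-based interface are assumptions, and "greater than the
      timestamp" is read here as 10^precision not being smaller than the
      timestamp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a timestamp into buf without a trailing newline.
 * With a usable precision the value is split into "seconds.fraction";
 * otherwise the raw number is printed in a 12 character wide field.
 * Returns what snprintf() returns.
 */
int format_event_time(char *buf, size_t len, unsigned long long time,
		      int precision)
{
	unsigned long long p10 = 1;

	assert(buf != NULL && len > 0);
	assert(precision >= 0 && precision <= 18);

	for (int i = 0; i < precision; i++)
		p10 *= 10;

	if (p10 > 1 && p10 < time)
		return snprintf(buf, len, "%5llu.%0*llu",
				time / p10, precision, time % p10);

	/* Fallback format: note there is no '\n' at the end. */
	return snprintf(buf, len, "%12llu", time);
}
```

      With precision 0 and the first timestamp from the output above, the
      fallback branch produces "111264306005" with nothing after it, so the
      ": function:" part can follow on the same line.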
      
      Fixes: 38847db9 ("libtraceevent, perf tools: Changes in tep_print_event_* APIs")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200303231852.6ab6882f@oasis.local.home
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • libperf: Add counting example · 76ce0265
      Michael Petlan authored
      The current libperf man pages mention the file counting.c as "coming with
      libperf package", but the file is missing. Add it.
      
      Fixes: 81de3bf3 ("libperf: Add man pages")
      Signed-off-by: Michael Petlan <mpetlan@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LPU-Reference: 20200227194424.28210-1-mpetlan@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Get rid of annotation->nr_jumps · dabce16b
      Ravi Bangoria authored
      The 'nr_jumps' field in 'struct annotation' has not been used since its
      inception in commit 2402e4a9 ("perf annotate browser: Show 'jumpy'
      functions").  Get rid of it.
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Link: http://lore.kernel.org/lkml/20200204045233.474937-7-ravi.bangoria@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf llvm: Add debug hint message about missing kernel-devel package · 357a5d24
      Arnaldo Carvalho de Melo authored
      To help in debugging, add this extra message:
      
        detect_kbuild_dir: Couldn't find "/lib/modules/5.4.20-200.fc31.x86_64/build/include/generated/autoconf.h", missing kernel-devel package?.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Show percore counts in per CPU output · 1af62ce6
      Jin Yao authored
      We already support the event modifier "percore", which sums up the event
      counts for all hardware threads in a core and shows the counts per core.
      
      For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1
      
        Performance counter stats for 'system wide':
      
       S0-D0-C0                395,072      cpu/event=cpu-cycles,percore/
       S0-D0-C1                851,248      cpu/event=cpu-cycles,percore/
       S0-D0-C2                954,226      cpu/event=cpu-cycles,percore/
       S0-D0-C3              1,233,659      cpu/event=cpu-cycles,percore/
      
      This patch provides a new option "--percore-show-thread". It is used
      together with the event modifier "percore" to sum up the event counts for
      all hardware threads in a core but show the counts per hardware thread.

      This is essentially a replacement for the any bit (which is gone in
      Icelake). Per core counts are useful for some formulas, e.g. CoreIPC.
      The original percore version was inconvenient to post-process. This
      variant matches the output of the any bit.
      
      With this patch, for example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,453,061      cpu/event=cpu-cycles,percore/
       CPU1               1,823,921      cpu/event=cpu-cycles,percore/
       CPU2               1,383,166      cpu/event=cpu-cycles,percore/
       CPU3               1,102,652      cpu/event=cpu-cycles,percore/
       CPU4               2,453,061      cpu/event=cpu-cycles,percore/
       CPU5               1,823,921      cpu/event=cpu-cycles,percore/
       CPU6               1,383,166      cpu/event=cpu-cycles,percore/
       CPU7               1,102,652      cpu/event=cpu-cycles,percore/
      
      We can see counts are duplicated in CPU pairs (CPU0/CPU4, CPU1/CPU5,
      CPU2/CPU6, CPU3/CPU7).
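      The duplication follows directly from the aggregation: every hardware
      thread reports the sum for its core. A minimal sketch, assuming a
      hypothetical core_id[] topology array and a made-up MAX_CORES bound
      (neither is taken from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CORES 64	/* hypothetical upper bound for this sketch */

/*
 * core_id[i] is the core that CPU i belongs to and count[i] is the raw
 * count of CPU i. On return, out[i] holds the sum over all CPUs that
 * share CPU i's core, so sibling threads end up with the same value.
 */
void percore_show_thread(const int *core_id, const uint64_t *count,
			 uint64_t *out, size_t nr_cpus)
{
	uint64_t core_sum[MAX_CORES] = { 0 };

	/* First pass: accumulate the per-core totals. */
	for (size_t i = 0; i < nr_cpus; i++) {
		assert(core_id[i] >= 0 && core_id[i] < MAX_CORES);
		core_sum[core_id[i]] += count[i];
	}

	/* Second pass: report the core total for every hardware thread. */
	for (size_t i = 0; i < nr_cpus; i++)
		out[i] = core_sum[core_id[i]];
}
```

      With core_id = {0, 1, 2, 3, 0, 1, 2, 3}, CPU0 and CPU4 receive the
      same total, as do the other pairs listed above.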
      
      The interval mode also works. For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -I 1000
       #           time CPU                    counts unit events
            1.000425421 CPU0                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU1                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU2                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU3               1,192,504      cpu/event=cpu-cycles,percore/
            1.000425421 CPU4                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU5                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU6                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU7               1,192,504      cpu/event=cpu-cycles,percore/
      
      If we offline CPU5, the result is:
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,752,148      cpu/event=cpu-cycles,percore/
       CPU1               1,009,312      cpu/event=cpu-cycles,percore/
       CPU2               2,784,072      cpu/event=cpu-cycles,percore/
       CPU3               2,427,922      cpu/event=cpu-cycles,percore/
       CPU4               2,752,148      cpu/event=cpu-cycles,percore/
       CPU6               2,784,072      cpu/event=cpu-cycles,percore/
       CPU7               2,427,922      cpu/event=cpu-cycles,percore/
      
              1.001416041 seconds time elapsed
      
       v4:
       ---
       Ravi Bangoria reported an issue in v3: once a CPU is offlined, the
       output is not correct. The fix is to use the cpu idx in
       print_percore_thread rather than the cpu value.
      
       v3:
       ---
       1. Fix the interval mode output error
       2. Use cpu value (not cpu index) in config->aggr_get_id().
       3. Refine the code according to Jiri's comments.
      
       v2:
       ---
       Add the explanation in change log. This is essentially a replacement
       for the any bit. No code change.
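      The cpu idx versus cpu value distinction from the v4 note can be
      sketched as follows. The helper name is hypothetical and the map is a
      plain array of online CPU numbers, which is a simplification of the
      tool's own cpu map type.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of 'cpu' in the map of online CPUs, or -1 if the CPU
 * is not in the map (for example because it is offline).
 */
int cpu_map_idx(const int *map, size_t nr, int cpu)
{
	assert(map != NULL);

	for (size_t i = 0; i < nr; i++) {
		if (map[i] == cpu)
			return (int)i;
	}
	return -1;
}
```

      With CPU5 offline the map is {0, 1, 2, 3, 4, 6, 7}: index 5 holds CPU6,
      so the two numbering schemes diverge from that point on and per-CPU
      arrays sized by the map length have to be indexed by idx.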
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200214080452.26402-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>