1. 11 Mar, 2020 6 commits
    • Leo Yan's avatar
      perf cs-etm: Optimize copying last branches · 695378b5
      Leo Yan authored
      If an instruction range packet can generate multiple instruction
      samples, these samples share the same last branches; it's not necessary
      to copy the same last branches repeatedly for these samples within the
      same packet.
      
      This patch moves out the last branches copying from function
      cs_etm__synth_instruction_sample(), and execute it prior to generating
      instruction samples.
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Reviewed-by: default avatarMathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: default avatarMike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-5-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      695378b5
    • Leo Yan's avatar
      perf cs-etm: Correct synthesizing instruction samples · c9f5baa1
      Leo Yan authored
      When 'etm->instructions_sample_period' is less than
      'tidq->period_instructions', the function cs_etm__sample() cannot handle
      this case properly with its logic.
      
      Let's see below flow as an example:
      
      - If we set itrace option '--itrace=i4', then function cs_etm__sample()
        has variables with initialized values:
      
        tidq->period_instructions = 0
        etm->instructions_sample_period = 4
      
      - When the first packet is coming:
      
        packet->instr_count = 10; the number of instructions executed in this
        packet is 10, thus update period_instructions as below:
      
        tidq->period_instructions = 0 + 10 = 10
        instrs_over = 10 - 4 = 6
        offset = 10 - 6 - 1 = 3
        tidq->period_instructions = instrs_over = 6
      
      - When the second packet is coming:
      
        packet->instr_count = 10; in the second pass, assume 10 instructions
        in the trace sample again:
      
        tidq->period_instructions = 6 + 10 = 16
        instrs_over = 16 - 4 = 12
        offset = 10 - 12 - 1 = -3  -> the negative value
        tidq->period_instructions = instrs_over = 12
      
      So after handle these two packets, there have below issues:
      
      The first issue is that cs_etm__instr_addr() returns the address within
      the current trace sample of the instruction related to offset, so the
      offset is supposed to be always unsigned value.  But in fact, function
      cs_etm__sample() might calculate a negative offset value (in handling
      the second packet, the offset is -3) and pass to cs_etm__instr_addr()
      with u64 type with a big positive integer.
      
      The second issue is it only synthesizes 2 samples for sample period = 4.
      In theory, every packet has 10 instructions so the two packets have
      total 20 instructions, 20 instructions should generate 5 samples
      (4 x 5 = 20).  This is because cs_etm__sample() only calls once
      cs_etm__synth_instruction_sample() to generate instruction sample per
      range packet.
      
      This patch fixes the logic in function cs_etm__sample(); the basic
      idea for handling coming packet is:
      
      - To synthesize the first instruction sample, it combines the left
        instructions from the previous packet and the head of the new
        packet; then generate continuous samples with sample period;
      - At the tail of the new packet, if it has the rest instructions,
        these instructions will be left for the sequential sample.
      Suggested-by: default avatarMike Leach <mike.leach@linaro.org>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Reviewed-by: default avatarMathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: default avatarMike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-4-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c9f5baa1
    • Leo Yan's avatar
      perf cs-etm: Continuously record last branch · f1410028
      Leo Yan authored
      Every time synthesize instruction sample, the last branch recording will
      be reset.  This is fine if the instruction period is big enough, for
      example if use the option '--itrace=i100000', the last branch array is
      reset for every sample with 100000 instructions per period; before
      generate the next instruction sample, there has the sufficient packets
      coming to fill the last branch array.
      
      On the other hand, if set a very small period, the packets will be
      significantly reduced between two continuous instruction samples, thus
      the last branch array is almost empty for new instruction sample by
      frequently resetting.
      
      To allow the last branches to work properly for any instruction periods,
      this patch avoids to reset the last branch for every instruction sample
      and only reset it when flush the trace data.  The last branches will be
      reset only for two cases, one is for trace starting, another case is for
      discontinuous trace; other cases can keep recording last branches for
      continuous instruction samples.
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Reviewed-by: default avatarMathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: default avatarMike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-3-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f1410028
    • Leo Yan's avatar
      perf cs-etm: Swap packets for instruction samples · d0175156
      Leo Yan authored
      If use option '--itrace=iNNN' with Arm CoreSight trace data, perf tool
      fails inject instruction samples; the root cause is the packets are only
      swapped for branch samples and last branches but not for instruction
      samples, so the new coming packets cannot be properly handled for only
      synthesizing instruction samples.
      
      To fix this issue, this patch refactors the code with a new function
      cs_etm__packet_swap() which is used to swap packets and adds the
      condition for instruction samples.
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Reviewed-by: default avatarMathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: default avatarMike Leach <mike.leach@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Walker <robert.walker@arm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight ml <coresight@lists.linaro.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20200219021811.20067-2-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d0175156
    • Arnaldo Carvalho de Melo's avatar
      perf map: Use strstarts() to look for Android libraries · bdadd647
      Arnaldo Carvalho de Melo authored
      And add the '/' to avoid looking at things like "/system/libsomething",
      when all we want to know if it is like "/system/lib/something", i.e. if
      it is in that system library dir.
      
      Using strstarts() avoids off-by-one errors like recently fixed in this
      file.
      
      Since this adds the '/' I separated this patch, another patch will make
      this consistent by removing other strncmp(str, prefix, manually
      calculated prefix length) usage.
      Reported-by: default avatarDominik Czarnota <dominik.b.czarnota@gmail.com>
      Acked-by: default avatarDominik Czarnota <dominik.b.czarnota@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: Link: http://lore.kernel.org/lkml/CABEVAa0_q-uC0vrrqpkqRHy_9RLOSXOJxizMLm1n5faHRy2AeA@mail.gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      bdadd647
    • disconnect3d's avatar
      perf map: Fix off by one in strncpy() size argument · b8fdcfb5
      disconnect3d authored
      This patch fixes an off-by-one error in strncpy size argument in
      tools/perf/util/map.c. The issue is that in:
      
              strncmp(filename, "/system/lib/", 11)
      
      the passed string literal: "/system/lib/" has 12 bytes (without the NULL
      byte) and the passed size argument is 11. As a result, the logic won't
      match the ending "/" byte and will pass filepaths that are stored in
      other directories e.g. "/system/libmalicious/bin" or just
      "/system/libmalicious".
      
      This functionality seems to be present only on Android. I assume the
      /system/ directory is only writable by the root user, so I don't think
      this bug has much (or any) security impact.
      
      Fixes: eca81836 ("perf tools: Add automatic remapping of Android libraries")
      Signed-off-by: default avatardisconnect3d <dominik.b.czarnota@gmail.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Keeping <john@metanate.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Lentine <mlentine@google.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200309104855.3775-1-dominik.b.czarnota@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b8fdcfb5
  2. 10 Mar, 2020 18 commits
    • Kan Liang's avatar
      perf vendor events intel: Add NO_NMI_WATCHDOG metric constraint · b95fcd2c
      Kan Liang authored
      Add NO_NMI_WATCHDOG metric constraint to Page_Walks_Utilization for Sky Lake
      and Cascade Lake.
      
      Committer testing:
      
      On a Lenovo T480S, Intel(R) Core(TM) i7-8650U Kaby Lake, that looking at x86's
      mapfile.csv file is a:
      
        $ grep -w skylake tools/perf/pmu-events/arch/x86/mapfile.csv
        GenuineIntel-6-[4589]E,v24,skylake,core
        $
      
      So uses the constraint added in this patch in this file:
      
        tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
      
      Before:
      
        # perf stat -a -M Page_Walks_Utilization sleep 2
      
         Performance counter stats for 'system wide':
      
             <not counted>      itlb_misses.walk_pending                                      (0.00%)
             <not counted>      dtlb_load_misses.walk_pending                                     (0.00%)
             <not counted>      dtlb_store_misses.walk_pending                                     (0.00%)
             <not counted>      ept.walk_pending                                              (0.00%)
             <not counted>      cycles                                                        (0.00%)
      
               2.001750514 seconds time elapsed
      
        Some events weren't counted. Try disabling the NMI watchdog:
        	echo 0 > /proc/sys/kernel/nmi_watchdog
        	perf stat ...
        	echo 1 > /proc/sys/kernel/nmi_watchdog
        The events in group usually have to be from the same PMU. Try reorganizing the group.
        #
      
      After:
      
        # perf stat -a -M Page_Walks_Utilization sleep 2
        Splitting metric group Page_Walks_Utilization into standalone metrics.
        Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
            echo 0 > /proc/sys/kernel/nmi_watchdog
            perf stat ...
            echo 1 > /proc/sys/kernel/nmi_watchdog
        ,
         Performance counter stats for 'system wide':
      
                36,883,102      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization   (79.99%)
               123,104,146      dtlb_load_misses.walk_pending                                     (80.02%)
                13,720,795      dtlb_store_misses.walk_pending                                     (79.99%)
                         0      ept.walk_pending                                              (79.99%)
             1,519,948,400      cycles                                                        (80.01%)
      
               2.002170780 seconds time elapsed
      
        #
      
      Before and after, if we disable the nmi_watchdog we get:
      
        # echo 0 > /proc/sys/kernel/nmi_watchdog
        # perf stat -a -M Page_Walks_Utilization sleep 2
      
         Performance counter stats for 'system wide':
      
                33,721,658      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization
                84,070,996      dtlb_load_misses.walk_pending
                 9,816,071      dtlb_store_misses.walk_pending
                         0      ept.walk_pending
               704,920,899      cycles
      
               2.002331670 seconds time elapsed
      
        #
      
        More information about the metric expressions:
      
        # perf stat -v -a -M Page_Walks_Utilization sleep 2
        Using CPUID GenuineIntel-6-8E-A
        metric expr ( itlb_misses.walk_pending + dtlb_load_misses.walk_pending + dtlb_store_misses.walk_pending + ept.walk_pending ) / ( 2 * cycles ) for Page_Walks_Utilization
        found event itlb_misses.walk_pending
        found event dtlb_load_misses.walk_pending
        found event dtlb_store_misses.walk_pending
        found event ept.walk_pending
        found event cycles
        adding {itlb_misses.walk_pending,dtlb_load_misses.walk_pending,dtlb_store_misses.walk_pending,ept.walk_pending,cycles}:W
         -> cpu/umask=0x10,(null)=0x186a3,event=0x85/
         -> cpu/umask=0x10,(null)=0x1e8483,event=0x8/
         -> cpu/umask=0x10,(null)=0x1e8483,event=0x49/
         -> cpu/umask=0x10,(null)=0x1e8483,event=0x4f/
        itlb_misses.walk_pending: 8085772 16010162799 16010162799
        dtlb_load_misses.walk_pending: 28134579 16010162799 16010162799
        dtlb_store_misses.walk_pending: 7276535 16010162799 16010162799
        ept.walk_pending: 2 16010162799 16010162799
        cycles: 315140605 16010162799 16010162799
      
         Performance counter stats for 'system wide':
      
                 8,085,772      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization
                28,134,579      dtlb_load_misses.walk_pending
                 7,276,535      dtlb_store_misses.walk_pending
                         2      ept.walk_pending
               315,140,605      cycles
      
               2.002333181 seconds time elapsed
      
        #
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-6-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b95fcd2c
    • Kan Liang's avatar
      perf metricgroup: Support metric constraint · ab483d8b
      Kan Liang authored
      Some metric groups have metric constraints. A metric group can be
      scheduled as a group only when some constraints are applied.  For
      example, Page_Walks_Utilization has a metric constraint,
      "NO_NMI_WATCHDOG".
      
      When NMI watchdog is disabled, the metric group can be scheduled as a
      group. Otherwise, splitting the metric group into standalone metrics.
      
      Add a new function, metricgroup__has_constraint(), to check whether all
      constraints are applied. If not, splitting the metric group into
      standalone metrics.
      
      Currently, only one constraint, "NO_NMI_WATCHDOG", is checked. Print a
      warning for the metric group with the constraint, when NMI WATCHDOG is
      enabled.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-5-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ab483d8b
    • Kan Liang's avatar
      perf util: Factor out sysctl__nmi_watchdog_enabled() · 2a14c1bf
      Kan Liang authored
      The NMI watchdog status is required for metric group constraint
      examination.  Factor out sysctl__nmi_watchdog_enabled() to retrieve the
      NMI watchdog status.
      
      Users may count more than one metric group each time. If so, the NMI
      watchdog status may be retrieved several times. To reduce the overhead,
      cache the NMI watchdog status.
      
      Replace the NMI watchdog status checking in print_footer() by
      sysctl__nmi_watchdog_enabled().
      Suggested-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2a14c1bf
    • Kan Liang's avatar
      perf metricgroup: Factor out metricgroup__add_metric_weak_group() · f742634a
      Kan Liang authored
      Factor out metricgroup__add_metric_weak_group() which add metrics into a
      weak group. The change can improve code readability. Because following
      patch will introduce a function which add standalone metrics.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-3-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f742634a
    • Kan Liang's avatar
      perf jevents: Support metric constraint · 03fe02b1
      Kan Liang authored
      A new field "MetricConstraint" is introduced in JSON event list.
      
      Extend jevents to parse the field and save the value in
      metric_constraint.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/1582581564-184429-2-git-send-email-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      03fe02b1
    • Thomas Richter's avatar
      perf vendor events s390: Add new deflate counters for IBM z15 · e7950166
      Thomas Richter authored
      Add support for new deflate counters:
      
      - Counter 247: cycles CPU spent obtaining access to Deflate unit
      - Counter 252: cycles CPU is using Deflate unit
      - Counter 264: Increments by one for every DEFLATE CONVERSION CALL
      	    instruction executed.
      - Counter 265: Increments by one for every DEFLATE CONVERSION CALL
      	    instruction executed that ended in Condition Codes
      	    0, 1 or 2.
      
      Also adjust the some crypto counter description to latest documentation.
      Signed-off-by: default avatarThomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: default avatarSumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200310142937.32045-1-tmricht@linux.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e7950166
    • Jin Yao's avatar
      perf block-info: Support color ops to print block percents in color · f787feff
      Jin Yao authored
      It would be nice to print the block percents with colors.
      
      This patch supports the 'Sampled Cycles%' and 'Avg Cycles%' printed in
      colors.
      
      For example,
      
      perf record -b ...
      perf report --total-cycles or perf report --total-cycles --stdio
      
      percent > 5%, colored in red
      percent > 0.5%, colored in green
      percent < 0.5%, default color
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-5-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f787feff
    • Jin Yao's avatar
      perf block-info: Allow selecting which columns to report and its order · cca0cc76
      Jin Yao authored
      Currently we use a predefined array to set the block info output
      formats, it's fixed and inflexible.
      
      This patch adds two parameters "block_hpps" and "nr_hpps" in
      block_info__create_report and other static functions, in order to let
      user decide which columns to report and with specified report ordering.
      It should be more flexible.
      
      Buffers will be allocated to contain the new fmts, of course, we need to
      release them before perf exits.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-4-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cca0cc76
    • Jin Yao's avatar
      perf diff: Use __block_info__cmp() to replace block_pair_cmp() · a8a9f6dc
      Jin Yao authored
      'perf diff' uses block_pair_cmp() to compare two blocks. But
      block_info__cmp() has the similar functionality and it's a bit more
      complete.
      
      This patch removes block_pair_cmp() and uses __block_info__cmp()
      instead. __block_info__cmp() is wrapped by block_info__cmp() and it
      doesn't receives a perf_hpp_fmt parameter.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-3-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a8a9f6dc
    • Jin Yao's avatar
      perf block-info: Fix wrong block address comparison in block_info__cmp() · 3e152aa9
      Jin Yao authored
      Commit 60414418 ("perf block: Cleanup and refactor block info
      functions") introduces block_info__cmp(), which compares two blocks.
      
      But the issues are:
      
      1. It should return the strcmp cmp value only if it's not 0.
      
      2. When symbol names are matched, we need to compare the addresses
         of blocks further. But it wrongly uses the symbol addresses for
         comparison.
      
      3. If the syms are both NULL, we can't consider these two blocks are
         matched.
      
      This patch fixes above 3 issues.
      
      Fixes: 60414418 ("perf block: Cleanup and refactor block info functions")
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200202141655.32053-2-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3e152aa9
    • Jiri Olsa's avatar
      perf expr: Make expr__parse() return -1 on error · d942815a
      Jiri Olsa authored
      To match the error value of the expr__find_other function, so all
      exported expr functions return the same values:
      0 on success, -1 on error.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-6-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d942815a
    • Jiri Olsa's avatar
      perf expr: Straighten expr__parse()/expr__find_other() interface · 0f9b1e12
      Jiri Olsa authored
      Now that we have a flex parser we don't need to update the parsed string
      pointer, so the interface can just be passed the pointer to the
      expression instead of a pointer to pointer.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-5-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0f9b1e12
    • Jiri Olsa's avatar
      perf expr: Increase EXPR_MAX_OTHER to support metrics with more than 15 variables · 58ca7076
      Jiri Olsa authored
      We have metrics that define more than 15 variables, like
      Branch_Misprediction_Cost. Increasing the allowed variables count to 20.
      
      As Andy pointed out, we can't go too high in here, because some of the
      code has O(n^2) complexity (already_seen) and we might want to do some
      other changes (like using hash tables) before increasing the maximum
      even more.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-4-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      58ca7076
    • Jiri Olsa's avatar
      perf expr: Move expr lexer to flex · 26226a97
      Jiri Olsa authored
      Adding expr flex code instead of the manual parser code. So it's easily
      extensible in upcoming changes.
      
      The new flex code is in flex.l object and gets compiled like all the
      other flexers we use.  It's defined as flex reentrant parser.
      
      It's used by both expr__parse and expr__find_other interfaces by
      separating the starting point.
      
      There's no intended change of functionality ;-) the test expr is
      passing.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-3-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      26226a97
    • Jiri Olsa's avatar
      perf expr: Add expr.c object · 576a65b6
      Jiri Olsa authored
      Add generic expr code into new expr.c object.
      
      The expr.c object will be mainly used in following change that will get
      rid of the manual flex code,
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Reviewed-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200228093616.67125-2-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      576a65b6
    • Kan Liang's avatar
      perf header: Add check for unexpected use of reserved membrs in event attr · 277ce1ef
      Kan Liang authored
      The perf.data may be generated by a newer version of perf tool, which
      support new input bits in attr, e.g. new bit for branch_sample_type.
      
      The perf.data may be parsed by an older version of perf tool later.  The
      old perf tool may parse the perf.data incorrectly. There is no warning
      message for this case.
      
      Current perf header never check for unknown input bits in attr.
      
      When read the event desc from header, check the stored event attr.  The
      reserved bits, sample type, read format and branch sample type will be
      checked.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lkml.kernel.org/r/20200228163011.19358-4-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      277ce1ef
    • Kan Liang's avatar
      perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX · d3f85437
      Kan Liang authored
      A new branch sample type PERF_SAMPLE_BRANCH_HW_INDEX has been introduced
      in latest kernel.
      
      Enable HW_INDEX by default in LBR call stack mode.
      
      If kernel doesn't support the sample type, switching it off.
      
      Add HW_INDEX in attr_fprintf as well. User can check whether the branch
      sample type is set via debug information or header.
      
      Committer testing:
      
      First collect some samples with LBR callchains, system wide, for a few
      seconds:
      
        # perf record --call-graph lbr -a sleep 5
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.625 MB perf.data (224 samples) ]
        #
      
      Now lets use 'perf evlist -v' to look at the branch_sample_type:
      
        # perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX
        #
      
      So the machine has the kernel feature, and it was correctly added to
      perf_event_attr.branch_sample_type, for the default 'cycles' event.
      
      If we do it in another machine, where the kernel lacks the HW_INDEX
      feature, we get:
      
        # perf record --call-graph lbr -a sleep 2s
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.690 MB perf.data (499 samples) ]
        # perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
        #
      
      No HW_INDEX in attr.branch_sample_type.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200228163011.19358-3-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d3f85437
    • Kan Liang's avatar
      perf tools: Add hw_idx in struct branch_stack · 42bbabed
      Kan Liang authored
      The low level index of raw branch records for the most recent branch can
      be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX
      branch_sample_type. Extend struct branch_stack to support it.
      
      However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
      entries[] will be output by kernel. The pointer of entries[] could be
      wrong, since the output format is different with new struct
      branch_stack.  Add a variable no_hw_idx in struct perf_sample to
      indicate whether the hw_idx is output.  Add get_branch_entry() to return
      corresponding pointer of entries[0].
      
      To make dummy branch sample consistent as new branch sample, add hw_idx
      in struct dummy_branch_stack for cs-etm and intel-pt.
      
      Apply the new struct branch_stack for synthetic events as well.
      
      Extend test case sample-parsing to support new struct branch_stack.
      
      Committer notes:
      
      Renamed get_branch_entries() to perf_sample__branch_entries() to have
      proper namespacing and pave the way for this to be moved to libperf,
      eventually.
      
      Add 'static' to that inline as it is in a header.
      
      Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build
      on arm64.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      42bbabed
  3. 05 Mar, 2020 1 commit
    • Arnaldo Carvalho de Melo's avatar
      tools headers UAPI: Update tools's copy of linux/perf_event.h · 6339998d
      Arnaldo Carvalho de Melo authored
      To get the changes in:
      
        bbfd5e4f ("perf/core: Add new branch sample type for HW index of raw branch records")
      
      This silences this perf tools build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
        diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h
      
      This update is a prerequisite to adding support for the HW index of raw
      branch records.
      Acked-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200304134902.GB12612@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6339998d
  4. 04 Mar, 2020 9 commits
    • Steven Rostedt (VMware)'s avatar
      tools lib traceevent: Remove extra '\n' in print_event_time() · 401d61cb
      Steven Rostedt (VMware) authored
      If the precision of print_event_time() is zero or greater than the
      timestamp, it uses a different format. But that format had an extra new
      line at the end, and caused the output to not look right:
      
      cpus=2
                 sleep-3946  [001]111264306005
      : function:             inotify_inode_queue_event
                 sleep-3946  [001]111264307158
      : function:             __fsnotify_parent
                 sleep-3946  [001]111264307637
      : function:             inotify_dentry_parent_queue_event
                 sleep-3946  [001]111264307989
      : function:             fsnotify
                 sleep-3946  [001]111264308401
      : function:             audit_syscall_exit
      
      Fixes: 38847db9 ("libtraceevent, perf tools: Changes in tep_print_event_* APIs")
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200303231852.6ab6882f@oasis.local.homeSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      401d61cb
    • Michael Petlan's avatar
      libperf: Add counting example · 76ce0265
      Michael Petlan authored
      Current libperf man pages mention file counting.c "coming with libperf package",
      however, the file is missing. Add the file then.
      
      Fixes: 81de3bf3 ("libperf: Add man pages")
      Signed-off-by: default avatarMichael Petlan <mpetlan@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LPU-Reference: 20200227194424.28210-1-mpetlan@redhat.com
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      76ce0265
    • Ravi Bangoria's avatar
      perf annotate: Get rid of annotation->nr_jumps · dabce16b
      Ravi Bangoria authored
      The 'nr_jumps' field in 'struct annotation' is not used since it's
      inception in commit 2402e4a9 ("perf annotate browser: Show 'jumpy'
      functions").  Get rid of it.
      Signed-off-by: default avatarRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Link: http://lore.kernel.org/lkml/20200204045233.474937-7-ravi.bangoria@linux.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      dabce16b
    • Arnaldo Carvalho de Melo's avatar
      perf llvm: Add debug hint message about missing kernel-devel package · 357a5d24
      Arnaldo Carvalho de Melo authored
      To help in debugging, add this extra message:
      
        detect_kbuild_dir: Couldn't find "/lib/modules/5.4.20-200.fc31.x86_64/build/include/generated/autoconf.h", missing kernel-devel package?.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      357a5d24
    • Jin Yao's avatar
      perf stat: Show percore counts in per CPU output · 1af62ce6
      Jin Yao authored
      We have supported the event modifier "percore" which sums up the event
      counts for all hardware threads in a core and show the counts per core.
      
      For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1
      
        Performance counter stats for 'system wide':
      
       S0-D0-C0                395,072      cpu/event=cpu-cycles,percore/
       S0-D0-C1                851,248      cpu/event=cpu-cycles,percore/
       S0-D0-C2                954,226      cpu/event=cpu-cycles,percore/
       S0-D0-C3              1,233,659      cpu/event=cpu-cycles,percore/
      
      This patch provides a new option "--percore-show-thread". It is used
      with event modifier "percore" together to sum up the event counts for
      all hardware threads in a core but show the counts per hardware thread.
      
      This is essentially a replacement for the any bit (which is gone in
      Icelake). Per core counts are useful for some formulas, e.g. CoreIPC.
      The original percore version was inconvenient to post process. This
      variant matches the output of the any bit.
      
      With this patch, for example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,453,061      cpu/event=cpu-cycles,percore/
       CPU1               1,823,921      cpu/event=cpu-cycles,percore/
       CPU2               1,383,166      cpu/event=cpu-cycles,percore/
       CPU3               1,102,652      cpu/event=cpu-cycles,percore/
       CPU4               2,453,061      cpu/event=cpu-cycles,percore/
       CPU5               1,823,921      cpu/event=cpu-cycles,percore/
       CPU6               1,383,166      cpu/event=cpu-cycles,percore/
       CPU7               1,102,652      cpu/event=cpu-cycles,percore/
      
      We can see counts are duplicated in CPU pairs (CPU0/CPU4, CPU1/CPU5,
      CPU2/CPU6, CPU3/CPU7).
      
      The interval mode also works. For example,
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -I 1000
       #           time CPU                    counts unit events
            1.000425421 CPU0                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU1                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU2                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU3               1,192,504      cpu/event=cpu-cycles,percore/
            1.000425421 CPU4                 925,032      cpu/event=cpu-cycles,percore/
            1.000425421 CPU5                 430,202      cpu/event=cpu-cycles,percore/
            1.000425421 CPU6                 436,843      cpu/event=cpu-cycles,percore/
            1.000425421 CPU7               1,192,504      cpu/event=cpu-cycles,percore/
      
      If we offline CPU5, the result is:
      
       # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
      
        Performance counter stats for 'system wide':
      
       CPU0               2,752,148      cpu/event=cpu-cycles,percore/
       CPU1               1,009,312      cpu/event=cpu-cycles,percore/
       CPU2               2,784,072      cpu/event=cpu-cycles,percore/
       CPU3               2,427,922      cpu/event=cpu-cycles,percore/
       CPU4               2,752,148      cpu/event=cpu-cycles,percore/
       CPU6               2,784,072      cpu/event=cpu-cycles,percore/
       CPU7               2,427,922      cpu/event=cpu-cycles,percore/
      
              1.001416041 seconds time elapsed
      
       v4:
       ---
       Ravi Bangoria reports an issue in v3. Once we offline a CPU,
       the output is not correct. The issue is we should use the cpu
       idx in print_percore_thread rather than using the cpu value.
      
       v3:
       ---
       1. Fix the interval mode output error
       2. Use cpu value (not cpu index) in config->aggr_get_id().
       3. Refine the code according to Jiri's comments.
      
       v2:
       ---
       Add the explanation in change log. This is essentially a replacement
       for the any bit. No code change.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Tested-by: default avatarRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200214080452.26402-1-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1af62ce6
    • Namhyung Kim's avatar
      tools lib api fs: Move cgroupsfs_find_mountpoint() · 7982a898
      Namhyung Kim authored
      Move it from tools/perf/util/cgroup.c as it can be used by other places.
      Note that cgroup filesystem is different from others since it's usually
      mounted separately (in v1) for each subsystem.
      
      I just copied the code with a little modification to pass a name of
      subsystem.
      Suggested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20200127100031.1368732-1-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7982a898
    • Arnaldo Carvalho de Melo's avatar
    • Nick Desaulniers's avatar
      perf diff: Fix undefined string comparison spotted by clang's -Wstring-compare · c395c355
      Nick Desaulniers authored
      clang warns:
      
        util/block-info.c:298:18: error: result of comparison against a string
        literal is unspecified (use an explicit string comparison function
        instead) [-Werror,-Wstring-compare]
                if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
                                ^  ~~~~~~~~~~~~~~~
        util/block-info.c:298:51: error: result of comparison against a string
        literal is unspecified (use an explicit string comparison function
        instead) [-Werror,-Wstring-compare]
                if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
                                                                 ^  ~~~~~~~~~~~~~~~
        util/block-info.c:298:18: error: result of comparison against a string
        literal is unspecified (use an explicit string
        comparison function instead) [-Werror,-Wstring-compare]
                if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
                                ^  ~~~~~~~~~~~~~~~
        util/block-info.c:298:51: error: result of comparison against a string
        literal is unspecified (use an explicit string comparison function
        instead) [-Werror,-Wstring-compare]
                if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
                                                                 ^  ~~~~~~~~~~~~~~~
        util/map.c:434:15: error: result of comparison against a string literal
        is unspecified (use an explicit string comparison function instead)
        [-Werror,-Wstring-compare]
                        if (srcline != SRCLINE_UNKNOWN)
                                    ^  ~~~~~~~~~~~~~~~
      
      Reviewer Notes:
      
      Looks good to me. Some more context:
      https://clang.llvm.org/docs/DiagnosticsReference.html#wstring-compare
      The spec says:
      J.1 Unspecified behavior
      The following are unspecified:
      .. Whether two string literals result in distinct arrays (6.4.5).
      Signed-off-by: default avatarNick Desaulniers <nick.desaulniers@gmail.com>
      Reviewed-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Keeping <john@metanate.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: clang-built-linux@googlegroups.com
      Link: https://github.com/ClangBuiltLinux/linux/issues/900
      Link: http://lore.kernel.org/lkml/20200223193456.25291-1-nick.desaulniers@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c395c355
    • Ingo Molnar's avatar
      Merge tag 'perf-urgent-for-mingo-5.6-20200303' of... · b95b4d5e
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo-5.6-20200303' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
      
      perf symbols:
      
        Arnaldo Carvalho de Melo:
      
        - Don't try to find a vmlinux file when looking for kernel modules,
          fixing symbol resolution in systems with compressed kernel modules.
      
      perf env:
      
        Arnaldo Carvalho de Melo:
      
        - Do not return pointers to local variables, fixing valid warning from
          gcc 10 for corner case that stops the build due to -Werror.
      
      perf tests:
      
        Arnaldo Carvalho de Melo:
      
        - Make global variable static in the bp_account entry to fix build
          with gcc 10.
      
      perf parse-events:
      
        Arnaldo Carvalho de Melo:
      
        - Use asprintf() instead of strncpy() to read tracepoint files, addressing
          compiler warning that stops the build as we use -Werror.
      
      perf bench:
      
        Arnaldo Carvalho de Melo:
      
        - Share some global variables to fix build with gcc 10.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b95b4d5e
  5. 03 Mar, 2020 3 commits
    • Linus Torvalds's avatar
      Merge tag '5.6-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 8b614cb8
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Five small cifs/smb3 fixes, two for stable (one for a reconnect
        problem and the other fixes a use case when renaming an open file)"
      
      * tag '5.6-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Use #define in cifs_dbg
        cifs: fix rename() by ensuring source handle opened with DELETE bit
        cifs: add missing mount option to /proc/mounts
        cifs: fix potential mismatch of UNC paths
        cifs: don't leak -EAGAIN for stat() during reconnect
      8b614cb8
    • Arnaldo Carvalho de Melo's avatar
      perf symbols: Don't try to find a vmlinux file when looking for kernel modules · b5c09518
      Arnaldo Carvalho de Melo authored
      The dso->kernel value is now set to everything that is in
      machine->kmaps, but that was being used to decide if vmlinux lookup is
      needed, which ended up making that lookup be made for kernel modules,
      that now have dso->kernel set, leading to these kinds of warnings when
      running on a machine with compressed kernel modules, like fedora:31:
      
        [root@five ~]# perf record -F 10000 -a sleep 2
        [ perf record: Woken up 1 times to write data ]
        lzma: fopen failed on vmlinux: 'No such file or directory'
        lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
        lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
        lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
        lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
        lzma: fopen failed on vmlinux: 'No such file or directory'
        lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
        lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
        lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
        lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
        lzma: fopen failed on vmlinux: 'No such file or directory'
        lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
        lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
        lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
        lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
        lzma: fopen failed on vmlinux: 'No such file or directory'
        lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
        lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
        lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
        lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
        lzma: fopen failed on vmlinux: 'No such file or directory'
        lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
        lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
        lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
        lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
        [ perf record: Captured and wrote 1.024 MB perf.data (1366 samples) ]
        [root@five ~]#
      
      This happens when collecting the buildid, when we find samples for
      kernel modules, fix it by checking if the looked up DSO is a kernel
      module by other means.
      
      Fixes: 02213cec ("perf maps: Mark module DSOs with kernel type")
      Tested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Kim Phillips <kim.phillips@amd.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200302191007.GD10335@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b5c09518
    • Arnaldo Carvalho de Melo's avatar
      perf bench: Share some global variables to fix build with gcc 10 · e4d9b04b
      Arnaldo Carvalho de Melo authored
      Noticed with gcc 10 (fedora rawhide) that those variables were not being
      declared as static, so end up with:
      
        ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
        ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
        ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
        ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
        ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
        ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here
        make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/bench/perf-in.o] Error 1
      
      Prefix those with bench__ and add them to bench/bench.h, so that we can
      share those on the tools needing to access those variables from signal
      handlers.
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/20200303155811.GD13702@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e4d9b04b
  6. 02 Mar, 2020 3 commits
    • Arnaldo Carvalho de Melo's avatar
      perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files · 7125f204
      Arnaldo Carvalho de Melo authored
      Make the code more compact by using asprintf() instead of malloc()+strncpy() which also uses
      less memory and avoids these warnings with gcc 10:
      
          CC       /tmp/build/perf/util/cloexec.o
        In file included from /usr/include/string.h:495,
                         from util/parse-events.h:12,
                         from util/parse-events.c:18:
        In function ‘strncpy’,
            inlined from ‘tracepoint_id_to_path’ at util/parse-events.c:271:5:
        /usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ offset [275, 511] from the object at ‘sys_dirent’ is out of the bounds of referenced subobject ‘d_name’ with type ‘char[256]’ at offset 19 [-Werror=array-bounds]
          106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
              |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        In file included from /usr/include/dirent.h:61,
                         from util/parse-events.c:5:
        util/parse-events.c: In function ‘tracepoint_id_to_path’:
        /usr/include/bits/dirent.h:33:10: note: subobject ‘d_name’ declared here
           33 |     char d_name[256];  /* We must not include limits.h! */
              |          ^~~~~~
        In file included from /usr/include/string.h:495,
                         from util/parse-events.h:12,
                         from util/parse-events.c:18:
        In function ‘strncpy’,
            inlined from ‘tracepoint_id_to_path’ at util/parse-events.c:273:5:
        /usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ offset [275, 511] from the object at ‘evt_dirent’ is out of the bounds of referenced subobject ‘d_name’ with type ‘char[256]’ at offset 19 [-Werror=array-bounds]
          106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
              |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        In file included from /usr/include/dirent.h:61,
                         from util/parse-events.c:5:
        util/parse-events.c: In function ‘tracepoint_id_to_path’:
        /usr/include/bits/dirent.h:33:10: note: subobject ‘d_name’ declared here
           33 |     char d_name[256];  /* We must not include limits.h! */
              |          ^~~~~~
          CC       /tmp/build/perf/util/call-path.o
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/20200302145535.GA28183@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7125f204
    • Arnaldo Carvalho de Melo's avatar
      perf env: Do not return pointers to local variables · ebcb9464
      Arnaldo Carvalho de Melo authored
      It is possible to return a pointer to a local variable when looking up
      the architecture name for the running system and no normalization is
      done on that value, i.e. we may end up returning the uts.machine local
      variable.
      
      While this doesn't happen on most arches, as normalization takes place,
      lets fix this by making that a static variable and optimize it a bit by
      not always running uname(), only the first time.
      
      Noticed in fedora rawhide running with:
      
        [perfbuilder@a5ff49d6e6e4 ~]$ gcc --version
        gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8)
      Reported-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ebcb9464
    • Arnaldo Carvalho de Melo's avatar
      perf tests bp_account: Make global variable static · cff20b31
      Arnaldo Carvalho de Melo authored
      To fix the build with newer gccs, that without this patch exit with:
      
          LD       /tmp/build/perf/tests/perf-in.o
        ld: /tmp/build/perf/tests/bp_account.o:/git/perf/tools/perf/tests/bp_account.c:22: multiple definition of `the_var'; /tmp/build/perf/tests/bp_signal.o:/git/perf/tools/perf/tests/bp_signal.c:38: first defined here
        make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/tests/perf-in.o] Error 1
      
      First noticed in fedora:rawhide/32 with:
      
        [perfbuilder@a5ff49d6e6e4 ~]$ gcc --version
        gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8)
      Reported-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cff20b31