1. 27 Mar, 2015 4 commits
    • perf/x86/intel: Add INST_RETIRED.ALL workarounds · 294fe0f5
      Andi Kleen authored
      On Broadwell INST_RETIRED.ALL cannot be used with any period
      that doesn't have the lowest 6 bits cleared. And the period
      should not be smaller than 128.
      
      This is erratum BDM11 and BDM55:
      
        http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf
      
      BDM11: When using a period < 100; we may get incorrect PEBS/PMI
      interrupts and/or an invalid counter state.
      BDM55: When bit0-5 of the period are !0 we may get redundant PEBS
      records on overflow.
      
      Add a new callback to enforce this, and set it for Broadwell.
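
As a rough illustration of the rule this callback enforces, here is a minimal sketch in Python rather than the kernel's C. The name `limit_period` and the two constants are assumptions made for this example, and the check that restricts the rule to the INST_RETIRED.ALL event is left out.

```python
MIN_PERIOD = 128       # smallest period allowed, per the text above
LOW_BITS_MASK = 0x3F   # the lowest 6 bits of the period must be clear

def limit_period(left: int) -> int:
    """Return the period that would actually be programmed (illustrative only)."""
    if left < MIN_PERIOD:
        left = MIN_PERIOD           # raise too-small periods to the minimum
    return left & ~LOW_BITS_MASK    # clear bits 0-5, i.e. round down to a multiple of 64

print(limit_period(100))   # 128
print(limit_period(1000))  # 960
```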
      
      How does this handle the case when an app requests a specific
      period with some of the bottom bits set?
      
      Short answer:
      
      Any useful instruction sampling period needs to be 4-6 orders
      of magnitude larger than 128, as a PMI every 128 instructions
      would instantly overwhelm the system and be throttled.
      So the +-64 error from this is really small compared to the
      period, much smaller than normal system jitter.
      
      Long answer (by Peterz):
      
      IFF we guarantee perf_event_attr::sample_period >= 128.
      
      Suppose we start out with sample_period=192; then we'll set period_left
      to 192, we'll end up with left = 128 (we truncate the lower bits). We
      get an interrupt, find that period_left = 64 (>0 so we return 0 and
      don't get an overflow handler), up that to 128. Then we trigger again,
      at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
      an overflow). We increment with sample_period so we get left = 128. We
      fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
      overflow). And on and on.
      
      So while the individual interrupts are 'wrong' we get them with
      interval=256,128 in exactly the right ratio to average out at 192. And
      this works for everything >=128.
      
      So the num_samples*fixed_period thing is still entirely correct +- 127,
      which is good enough I'd say, as you already have that error anyhow.
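
The accounting in the walkthrough can be replayed with a small simulation. This is a sketch under stated assumptions, not the kernel code: `limit_period`, `simulate` and the `granularity` parameter are hypothetical names. The walkthrough's numbers (192 truncated to 128) correspond to rounding down to a multiple of 128, whereas clearing only the lowest 6 bits would leave 192 unchanged, so the granularity is a parameter here.

```python
def limit_period(left, granularity=128, min_period=128):
    """Clamp to min_period, then round down to a multiple of granularity."""
    if left < min_period:
        left = min_period
    return left & ~(granularity - 1)  # granularity must be a power of two

def simulate(sample_period, num_interrupts, granularity=128):
    """Return the instruction counts n at which an overflow is reported."""
    assert sample_period >= 128       # the guarantee the argument relies on
    period_left = sample_period
    n = 0
    overflows = []
    programmed = limit_period(period_left, granularity)
    for _ in range(num_interrupts):
        n += programmed               # the counter fires after `programmed` events
        period_left -= programmed     # account for the events that elapsed
        if period_left <= 0:          # period used up: report an overflow
            period_left += sample_period
            overflows.append(n)
        programmed = limit_period(period_left, granularity)
    return overflows

print(simulate(192, 6))  # [256, 384, 640, 768]
```

With sample_period=192 the reported overflows alternate between intervals of 256 and 128, which average out to 192, and the k-th overflow is never more than 127 instructions past k*sample_period.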
      
      So no need to 'fix' the tools, all we need to do is refuse to create
      INST_RETIRED:ALL events with sample_period < 128.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      [ Updated comments and changelog a bit. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Add Broadwell core support · 91f1b705
      Andi Kleen authored
      Add Broadwell support to perf.
      
      The basic support is very similar to Haswell. We use the new cache
      event list added for Haswell earlier. The only differences
      are a few bits related to remote nodes. To avoid an extra,
      mostly identical, table these are patched up in the initialization code.
      
      Compared to Haswell, the constraint list has one new event that needs to be handled.
      
      Includes code and testing from Kan Liang.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Add new cache events table for Haswell · 0f1b5ca2
      Andi Kleen authored
      Haswell offcore events are quite different from Sandy Bridge.
      Add a new table to handle Haswell properly.
      
      Note that the offcore bits listed in the SDM are not quite correct
      (this is currently being fixed). An up-to-date list of bits is
      in the patch.
      
      The basic setup is similar to Sandy Bridge. The prefetch columns
      have been removed, as prefetch counting is not very reliable
      on Haswell. One L1 event that is no longer in the event list
      has also been removed.
      
      - data reads do not include code reads (comparable to earlier Sandy Bridge tables)
      - data counts include speculative execution (except L1 write, dtlb, bpu)
      - remote node access includes remote memory, remote cache, and remote mmio.
      - prefetches are not included in the counts for consistency
        (different from Sandy Bridge, which includes prefetches in the remote node)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      [ Removed the HSM30 comments; we don't have them for SNB/IVB either. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'perf-core-for-mingo' of... · 30fdaa6b
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
        - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)
      
        - Fix garbage output when intermixing syscalls from different threads in 'perf trace' (Arnaldo Carvalho de Melo)
      
        - Fix 'perf timechart' SIGBUS error on sparc64 (David Ahern)
      
      Infrastructure changes:
      
        - Set JOBS based on CPU or processor, making it work on SPARC, where
          /proc/cpuinfo has "CPU", not "processor" (David Ahern)
      
        - Zero should not be considered "not found" in libtraceevent's eval_flag() (Steven Rostedt)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 26 Mar, 2015 6 commits
  3. 24 Mar, 2015 19 commits
  4. 23 Mar, 2015 7 commits
  5. 22 Mar, 2015 2 commits
    • Merge tag 'perf-core-for-mingo-2' of... · 963a70b8
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-2' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
        - Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)
      
        - Indicate which callchain entries are annotated in the
          TUI hists browser (report/top) (Arnaldo Carvalho de Melo)
      
        - Fix failure to add multiple probes without debuginfo (He Kuang)
      
        - Fix 'trace' summary_only option (David Ahern)
      
        - Fix race in build_id_cache__add_s() in 'buildid-cache' (Milos Vyletel)
      
        - Don't allow empty argument for field-separator, fixing segfault (Wang Nan)
      
      Infrastructure:
      
        - Add destructor for format_field in libtraceevent (David Ahern)
      
        - Prep work for supporting lzma compressed kernel modules (Jiri Olsa)
      
        - Update .gitignore with recently added/renamed feature detection files (Yunlong Song)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'perf-core-for-mingo' of... · 08b3f913
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
        - Bash completion for subcommands (Yunlong Song)
      
        - Allow annotating entries in callchains in the hists browser (top/report).
          TODO: give some visual cue to what entries in callchains have samples and thus
          can be annotated and/or allow showing the source code for functions without
          samples (Arnaldo Carvalho de Melo)
      
        - Don't allow empty argument for '-t' in perf report, fixing segfault (Wang Nan)
      
      Infrastructure:
      
        - Prep work for moving the perf feature tests build system to tools/build (Jiri Olsa)
      
        - Fix perf-read-vdsox32 not building and lib64 install dir (H.J. Lu)
      
        - ARM64: fix building error and eh/debug frame offset cache fixes (Wang Nan)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 21 Mar, 2015 2 commits