1. 12 Sep, 2016 1 commit
    • Arnaldo Carvalho de Melo's avatar
      perf probe: Fix dwarf regs table for x86_64 · 7a023fd2
      Arnaldo Carvalho de Melo authored
      In 293d5b43 ("perf probe: Support probing on offline cross-arch binary")
      DWARF register tables were introduced for many architectures, with the one for
      the "dx" register being broken for x86_64, which got noticed by the 'perf test
      bpf' testcase, that has this difference from a successful run to one that
      fails, with the aforementioned patch:
      
        -Writing event: p:perf_bpf_probe/func _text+5197232 f_mode=+68(%di):x32 offset=%si:s64 orig=dx:s32
        -Failed to write event: Invalid argument
        -bpf_probe: failed to apply perf probe eventsFailed to add events selected by BPF
        +Writing event: p:perf_bpf_probe/func _text+5197232 f_mode=+68(%di):x32 offset=%si:s64 orig=%dx:s32
      
      Add the missing '%' to '%dx' to fix this.
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 293d5b43 ("perf probe: Support probing on offline cross-arch binary")
      Link: https://lkml.kernel.org/r/20160909145955.GC32585@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7a023fd2
  2. 10 Sep, 2016 9 commits
    • Kan Liang's avatar
      perf/x86/intel/uncore: Add Skylake server uncore support · cd34cd97
      Kan Liang authored
      This patch implements the uncore monitoring driver for Skylake server.
      The uncore subsystem in Skylake server is similar to previous
      server. There are some differences in config register encoding and pci
      device IDs. Besides, Skylake introduces many new boxes to reflect the
      MESH architecture changes.
      
      The control registers for IIO and UPI have been extended to 64 bit. This
      patch also introduces event_mask_ext to handle the high 32 bit mask.
      
      The CHA box number could vary for different machines. This patch gets
      the CHA box number by counting the CHA register space during
      initialization at runtime.
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1471378190-17276-3-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      cd34cd97
    • Harry Pan's avatar
      perf/x86/rapl: Enable Apollo Lake RAPL support · 2668c619
      Harry Pan authored
      This patch enables RAPL counters (energy consumption counters)
      support for Intel Apollo Lake (Goldmont) processors (Model 92):
      
      RAPL of Goldmont, unlikes ESU increment of Silvermont/Airmont,
      it likes the Haswell microarchitecture in 1/2^ESU joules and
      supports power domains in PP0/PP1/PKG/RAM.
      
      ESU and power domains refer to Intel Software Developers' Manual,
      Vol. 3C, Order No. 325384, Table 35-12.
      
      Usage example:
      
        $ perf list
        $ perf stat -a -e power/energy-cores/,power/energy-pkg/ sleep 10
      Signed-off-by: default avatarHarry Pan <harry.pan@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: bp@alien8.de
      Cc: gs0622@gmail.com
      Cc: hpa@zytor.com
      Cc: srinivas.pandruvada@linux.intel.com
      Link: http://lkml.kernel.org/r/1473325738-730-1-git-send-email-harry.pan@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2668c619
    • Ingo Molnar's avatar
      50069218
    • Peter Zijlstra's avatar
      perf/x86/intel: Fix PEBSv3 record drain · 8ef9b845
      Peter Zijlstra authored
      Alexander hit the WARN_ON_ONCE(!event) on his Skylake while running
      the perf fuzzer.
      
      This means the PEBSv3 record included a status bit for an inactive
      event, something that _should_ not happen.
      
      Move the code that filters the status bits against our known PEBS
      events up a spot to guarantee we only deal with events we know about.
      
      Further add "continue" statements to the WARN_ON_ONCE()s such that
      we'll not die nor generate silly events in case we ever do hit them
      again.
      Reported-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Tested-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: stable@vger.kernel.org
      Fixes: a3d86542 ("perf/x86/intel/pebs: Add PEBSv3 decoding")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8ef9b845
    • Alexander Shishkin's avatar
      perf/x86/intel/bts: Kill a silly warning · ef9ef3be
      Alexander Shishkin authored
      At the moment, intel_bts will WARN() out if there is more than one
      event writing to the same ring buffer, via SET_OUTPUT, and will only
      send data from one event to a buffer.
      
      There is no reason to have this warning in, so kill it.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160906132353.19887-6-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ef9ef3be
    • Alexander Shishkin's avatar
      perf/x86/intel/bts: Fix BTS PMI detection · 4d4c4741
      Alexander Shishkin authored
      Since BTS doesn't have a dedicated PMI status bit, the driver needs to
      take extra care to check for the condition that triggers it to avoid
      spurious NMI warnings.
      
      Regardless of the local BTS context state, the only way of knowing that
      the NMI is ours is to compare the write pointer against the interrupt
      threshold.
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160906132353.19887-5-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4d4c4741
    • Alexander Shishkin's avatar
      perf/x86/intel/bts: Fix confused ordering of PMU callbacks · a9a94401
      Alexander Shishkin authored
      The intel_bts driver is using a CPU-local 'started' variable to order
      callbacks and PMIs and make sure that AUX transactions don't get messed
      up. However, the ordering rules in regard to this variable is a complete
      mess, which recently resulted in perf_fuzzer-triggered warnings and
      panics.
      
      The general ordering rule that is patch is enforcing is that this
      cpu-local variable be set only when the cpu-local AUX transaction is
      active; consequently, this variable is to be checked before the AUX
      related bits can be touched.
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160906132353.19887-4-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a9a94401
    • Alexander Shishkin's avatar
      perf/core: Fix aux_mmap_count vs aux_refcount order · b79ccadd
      Alexander Shishkin authored
      The order of accesses to ring buffer's aux_mmap_count and aux_refcount
      has to be preserved across the users, namely perf_mmap_close() and
      perf_aux_output_begin(), otherwise the inversion can result in the latter
      holding the last reference to the aux buffer and subsequently free'ing
      it in atomic context, triggering a warning.
      
      > ------------[ cut here ]------------
      > WARNING: CPU: 0 PID: 257 at kernel/events/ring_buffer.c:541 __rb_free_aux+0x11a/0x130
      > CPU: 0 PID: 257 Comm: stopbug Not tainted 4.8.0-rc1+ #2596
      > Call Trace:
      >  [<ffffffff810f3e0b>] __warn+0xcb/0xf0
      >  [<ffffffff810f3f3d>] warn_slowpath_null+0x1d/0x20
      >  [<ffffffff8121182a>] __rb_free_aux+0x11a/0x130
      >  [<ffffffff812127a8>] rb_free_aux+0x18/0x20
      >  [<ffffffff81212913>] perf_aux_output_begin+0x163/0x1e0
      >  [<ffffffff8100c33a>] bts_event_start+0x3a/0xd0
      >  [<ffffffff8100c42d>] bts_event_add+0x5d/0x80
      >  [<ffffffff81203646>] event_sched_in.isra.104+0xf6/0x2f0
      >  [<ffffffff8120652e>] group_sched_in+0x6e/0x190
      >  [<ffffffff8120694e>] ctx_sched_in+0x2fe/0x5f0
      >  [<ffffffff81206ca0>] perf_event_sched_in+0x60/0x80
      >  [<ffffffff81206d1b>] ctx_resched+0x5b/0x90
      >  [<ffffffff81207281>] __perf_event_enable+0x1e1/0x240
      >  [<ffffffff81200639>] event_function+0xa9/0x180
      >  [<ffffffff81202000>] ? perf_cgroup_attach+0x70/0x70
      >  [<ffffffff8120203f>] remote_function+0x3f/0x50
      >  [<ffffffff811971f3>] flush_smp_call_function_queue+0x83/0x150
      >  [<ffffffff81197bd3>] generic_smp_call_function_single_interrupt+0x13/0x60
      >  [<ffffffff810a6477>] smp_call_function_single_interrupt+0x27/0x40
      >  [<ffffffff81a26ea9>] call_function_single_interrupt+0x89/0x90
      >  [<ffffffff81120056>] finish_task_switch+0xa6/0x210
      >  [<ffffffff81120017>] ? finish_task_switch+0x67/0x210
      >  [<ffffffff81a1e83d>] __schedule+0x3dd/0xb50
      >  [<ffffffff81a1efe5>] schedule+0x35/0x80
      >  [<ffffffff81128031>] sys_sched_yield+0x61/0x70
      >  [<ffffffff81a25be5>] entry_SYSCALL_64_fastpath+0x18/0xa8
      > ---[ end trace 6235f556f5ea83a9 ]---
      
      This patch puts the checks in perf_aux_output_begin() in the same order
      as that of perf_mmap_close().
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160906132353.19887-3-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b79ccadd
    • Alexander Shishkin's avatar
      perf/core: Fix a race between mmap_close() and set_output() of AUX events · 767ae086
      Alexander Shishkin authored
      In the mmap_close() path we need to stop all the AUX events that are
      writing data to the AUX area that we are unmapping, before we can
      safely free the pages. To determine if an event needs to be stopped,
      we're comparing its ->rb against the one that's getting unmapped.
      However, a SET_OUTPUT ioctl may turn up inside an AUX transaction
      and swizzle event::rb to some other ring buffer, but the transaction
      will keep writing data to the old ring buffer until the event gets
      scheduled out. At this point, mmap_close() will skip over such an
      event and will proceed to free the AUX area, while it's still being
      used by this event, which will set off a warning in the mmap_close()
      path and cause a memory corruption.
      
      To avoid this, always stop an AUX event before its ->rb is updated;
      this will release the (potentially) last reference on the AUX area
      of the buffer. If the event gets restarted, its new ring buffer will
      be used. If another SET_OUTPUT comes and switches it back to the
      old ring buffer that's getting unmapped, it's also fine: this
      ring buffer's aux_mmap_count will be zero and AUX transactions won't
      start any more.
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160906132353.19887-2-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      767ae086
  3. 09 Sep, 2016 2 commits
    • Sebastian Andrzej Siewior's avatar
      perf/x86/amd/uncore: Prevent use after free · 7d762e49
      Sebastian Andrzej Siewior authored
      The resent conversion of the cpu hotplug support in the uncore driver
      introduced a regression due to the way the callbacks are invoked at
      initialization time.
      
      The old code called the prepare/starting/online function on each online cpu
      as a block. The new code registers the hotplug callbacks in the core for
      each state. The core invokes the callbacks at each registration on all
      online cpus.
      
      The code implicitely relied on the prepare/starting/online callbacks being
      called as combo on a particular cpu, which was not obvious and completely
      undocumented.
      
      The resulting subtle wreckage happens due to the way how the uncore code
      manages shared data structures for cpus which share an uncore resource in
      hardware. The sharing is determined in the cpu starting callback, but the
      prepare callback allocates per cpu data for the upcoming cpu because
      potential sharing is unknown at this point. If the starting callback finds
      a online cpu which shares the hardware resource it takes a refcount on the
      percpu data of that cpu and puts the own data structure into a
      'free_at_online' pointer of that shared data structure. The online callback
      frees that.
      
      With the old model this worked because in a starting callback only one non
      unused structure (the one of the starting cpu) was available. The new code
      allocates the data structures for all cpus when the prepare callback is
      registered.
      
      Now the starting function iterates through all online cpus and looks for a
      data structure (skipping its own) which has a matching hardware id. The id
      member of the data structure is initialized to 0, but the hardware id can
      be 0 as well. The resulting wreckage is:
      
        CPU0 finds a matching id on CPU1, takes a refcount on CPU1 data and puts
        its own data structure into CPU1s data structure to be freed.
      
        CPU1 skips CPU0 because the data structure is its allegedly unsued own.
        It finds a matching id on CPU2, takes a refcount on CPU1 data and puts
        its own data structure into CPU2s data structure to be freed.
      
        ....
      
      Now the online callbacks are invoked.
      
        CPU0 has a pointer to CPU1s data and frees the original CPU0 data. So
        far so good.
      
        CPU1 has a pointer to CPU2s data and frees the original CPU1 data, which
        is still referenced by CPU0 ---> Booom
      
      So there are two issues to be solved here:
      
      1) The id field must be initialized at allocation time to a value which
         cannot be a valid hardware id, i.e. -1
      
         This prevents the above scenario, but now CPU1 and CPU2 both stick their
         own data structure into the free_at_online pointer of CPU0. So we leak
         CPU1s data structure.
      
      2) Fix the memory leak described in #1
      
         Instead of having a single pointer, use a hlist to enqueue the
         superflous data structures which are then freed by the first cpu
         invoking the online callback.
      
      Ideally we should know the sharing _before_ invoking the prepare callback,
      but that's way beyond the scope of this bug fix.
      
      [ tglx: Rewrote changelog ]
      
      Fixes: 96b2bd38 ("perf/x86/amd/uncore: Convert to hotplug state machine")
      Reported-and-tested-by: default avatarEric Sandeen <sandeen@sandeen.net>
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/20160909160822.lowgmkdwms2dheyv@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      7d762e49
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-20160908' of... · 14520d63
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-20160908' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
      - Add branch stack / basic block info to 'perf annotate --stdio', where for
        each branch, we add an asm comment after the instruction with information on
        how often it was taken and predicted. See example with color output at:
      
          http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png
      
        (Peter Zijlstra)
      
      - Only open an evsel in CPUs in its cpu map, fixing some use cases in
        systems with multiple PMUs with different CPU maps (Mark Rutland)
      
      - Fix handling of huge TLB maps, recognizing it as anonymous (Wang Nan)
      
      Infrastructure changes:
      
      - Remove the symbol filtering code, i.e. the callbacks passed to all functions
        that could end up loading a DSO symtab, simplifying the code, eventually
        allowing what we should have had since day one: removing the 'map' parameter
        from dso__load() functions (Arnaldo Carvalho de Melo)
      
      Arch specific build fixes:
      
      - Fix detached tarball build on powerpc, where we were still accessing a
        file outside tools/ (Ravi Bangoria)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      14520d63
  4. 08 Sep, 2016 7 commits
    • Ravi Bangoria's avatar
      perf powerpc: Fix build-test failure · 25b8592e
      Ravi Bangoria authored
      'make -C tools/perf build-test' is failing with below log for poewrpc.
      
        In file included from /tmp/tmp.3eEwmGlYaF/perf-4.8.0-rc4/tools/perf/perf.h:15:0,
                         from util/cpumap.h:8,
                         from util/env.c:1:
        /tmp/tmp.3eEwmGlYaF/perf-4.8.0-rc4/tools/perf/perf-sys.h:23:56:
        fatal error: ../../arch/powerpc/include/uapi/asm/unistd.h: No such file or directory
        compilation terminated.
      
      I bisected it and found it's failing from commit ad430729 ("Remove:
      kernel unistd*h files from perf's MANIFEST, not used").
      
      Header file '../../arch/powerpc/include/uapi/asm/unistd.h' is included
      only for powerpc in tools/perf/perf-sys.h.
      
      By looking closly at commit history, I found little weird thing:
      
      Commit f2d9cae9 ("perf powerpc: Use uapi/unistd.h to fix build
      error") replaced 'asm/unistd.h' with 'uapi/asm/unistd.h'
      
      Commit d2709c7c ("perf: Make perf build for x86 with UAPI
      disintegration applied") removes all arch specific 'uapi/asm/unistd.h'
      for all archs and adds generic <asm/unistd.h>.
      
      Commit f0b9abfb ("Merge branch 'linus' into perf/core") again
      includes 'uapi/asm/unistd.h' for powerpc. Don't know how exactly this
      happened as this change is not part of commit also.
      Signed-off-by: default avatarRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1472630591-5089-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Fixes: ad430729 ("Remove: kernel unistd*h files from perf's MANIFEST, not used")
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      25b8592e
    • Mark Rutland's avatar
      perf pmu: Support alternative sysfs cpumask · 7e3fcffe
      Mark Rutland authored
      The perf tools can read a cpumask file for a PMU, describing a subset of
      CPUs which that PMU covers. So far this has only been used to cater for
      uncore PMUs, which in practice happen to only have a single CPU
      described in the mask.
      
      Until recently, the perf tools only correctly handled cpumask containing
      a single CPU, and only when monitoring in system-wide mode. For example,
      prior to commit 00e727bb ("perf stat: Balance opening and
      reading events"), a mask with more than a single CPU could cause perf
      stat to hang. When a CPU PMU covers a subset of CPUs, but lacks a
      cpumask, perf record will fail to open events (on the cores the PMU does
      not support), and gives up.
      
      For systems with heterogeneous CPUs such as ARM big.LITTLE systems, this
      presents a problem. We have a PMU for each microarchitecture (e.g. a big
      PMU and a little PMU), and would like to expose a cpumask for each (so
      as to allow perf record and other tools to do the right thing). However,
      doing so kernel-side will cause old perf binaries to not function (e.g.
      hitting the issue solved by 00e727bb), and thus commits the
      cardinal sin of breaking (existing) userspace.
      
      To address this chicken-and-egg problem, this patch adds support got a
      new file, cpus, which is largely identical to the existing cpumask file.
      A kernel can expose this file, knowing that new perf binaries will
      correctly support it, while old perf binaries will not look for it (and
      thus will not be broken).
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1473330112-28528-8-git-send-email-mark.rutland@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7e3fcffe
    • Mark Rutland's avatar
      perf evlist: Only open events on CPUs an evsel permits · 9f21b815
      Mark Rutland authored
      In systems with heterogeneous CPU PMUs, it's possible for each evsel to
      cover a distinct set of CPUs, and hence the cpu_map associated with each
      evsel may have a distinct idx<->id mapping. Any of these may be distinct
      from the evlist's cpu map.
      
      Events can be tied to the same fd so long as they use the same per-cpu
      ringbuffer (i.e. so long as they are on the same CPU). To acquire the
      correct FDs, we must compare the Linux logical IDs rather than the evsel
      or evlist indices.
      
      This path adds logic to perf_evlist__mmap_per_evsel to handle this,
      translating IDs as required. As PMUs may cover a subset of CPUs from the
      evlist, we skip the CPUs a PMU cannot handle.
      
      Without this patch, perf record may try to mmap erroneous FDs on
      heterogeneous systems, and will bail out early rather than running the
      workload.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1473330112-28528-7-git-send-email-mark.rutland@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9f21b815
    • Peter Zijlstra's avatar
      perf annotate: Add branch stack / basic block · 70fbe057
      Peter Zijlstra authored
      I wanted to know the hottest path through a function and figured the
      branch-stack (LBR) information should be able to help out with that.
      
      The below uses the branch-stack to create basic blocks and generate
      statistics from them.
      
              from    to              branch_i
              * ----> *
                      |
                      | block
                      v
                      * ----> *
                      from    to      branch_i+1
      
      The blocks are broken down into non-overlapping ranges, while tracking
      if the start of each range is an entry point and/or the end of a range
      is a branch.
      
      Each block iterates all ranges it covers (while splitting where required
      to exactly match the block) and increments the 'coverage' count.
      
      For the range including the branch we increment the taken counter, as
      well as the pred counter if flags.predicted.
      
      Using these number we can find if an instruction:
      
       - had coverage; given by:
      
              br->coverage / br->sym->max_coverage
      
         This metric ensures each symbol has a 100% spot, which reflects the
         observation that each symbol must have a most covered/hottest
         block.
      
       - is a branch target: br->is_target && br->start == add
      
       - for targets, how much of a branch's coverages comes from it:
      
      	target->entry / branch->coverage
      
       - is a branch: br->is_branch && br->end == addr
      
       - for branches, how often it was taken:
      
              br->taken / br->coverage
      
         after all, all execution that didn't take the branch would have
         incremented the coverage and continued onward to a later branch.
      
       - for branches, how often it was predicted:
      
              br->pred / br->taken
      
      The coverage percentage is used to color the address and asm sections;
      for low (<1%) coverage we use NORMAL (uncolored), indicating that these
      instructions are not 'important'. For high coverage (>75%) we color the
      address RED.
      
      For each branch, we add an asm comment after the instruction with
      information on how often it was taken and predicted.
      
      Output looks like (sans color, which does loose a lot of the
      information :/)
      
      $ perf record --branch-filter u,any -e cycles:p ./branches 27
      $ perf annotate branches
      
       Percent |	Source code & Disassembly of branches for cycles:pu (217 samples)
      ---------------------------------------------------------------------------------
               :	branches():
          0.00 :	  40057a:       push   %rbp
          0.00 :	  40057b:       mov    %rsp,%rbp
          0.00 :	  40057e:       sub    $0x20,%rsp
          0.00 :	  400582:       mov    %rdi,-0x18(%rbp)
          0.00 :	  400586:       mov    %rsi,-0x20(%rbp)
          0.00 :	  40058a:       mov    -0x18(%rbp),%rax
          0.00 :	  40058e:       mov    %rax,-0x10(%rbp)
          0.00 :	  400592:       movq   $0x0,-0x8(%rbp)
          0.00 :	  40059a:       jmpq   400656 <branches+0xdc>
          1.84 :	  40059f:       mov    -0x10(%rbp),%rax	# +100.00%
          3.23 :	  4005a3:       and    $0x1,%eax
          1.84 :	  4005a6:       test   %rax,%rax
          0.00 :	  4005a9:       je     4005bf <branches+0x45>	# -54.50% (p:42.00%)
          0.46 :	  4005ab:       mov    0x200bbe(%rip),%rax        # 601170 <acc>
         12.90 :	  4005b2:       add    $0x1,%rax
          2.30 :	  4005b6:       mov    %rax,0x200bb3(%rip)        # 601170 <acc>
          0.46 :	  4005bd:       jmp    4005d1 <branches+0x57>	# -100.00% (p:100.00%)
          0.92 :	  4005bf:       mov    0x200baa(%rip),%rax        # 601170 <acc>	# +49.54%
         13.82 :	  4005c6:       sub    $0x1,%rax
          0.46 :	  4005ca:       mov    %rax,0x200b9f(%rip)        # 601170 <acc>
          2.30 :	  4005d1:       mov    -0x10(%rbp),%rax	# +50.46%
          0.46 :	  4005d5:       mov    %rax,%rdi
          0.46 :	  4005d8:       callq  400526 <lfsr>	# -100.00% (p:100.00%)
          0.00 :	  4005dd:       mov    %rax,-0x10(%rbp)	# +100.00%
          0.92 :	  4005e1:       mov    -0x18(%rbp),%rax
          0.00 :	  4005e5:       and    $0x1,%eax
          0.00 :	  4005e8:       test   %rax,%rax
          0.00 :	  4005eb:       je     4005ff <branches+0x85>	# -100.00% (p:100.00%)
          0.00 :	  4005ed:       mov    0x200b7c(%rip),%rax        # 601170 <acc>
          0.00 :	  4005f4:       shr    $0x2,%rax
          0.00 :	  4005f8:       mov    %rax,0x200b71(%rip)        # 601170 <acc>
          0.00 :	  4005ff:       mov    -0x10(%rbp),%rax	# +100.00%
          7.37 :	  400603:       and    $0x1,%eax
          3.69 :	  400606:       test   %rax,%rax
          0.00 :	  400609:       jne    400612 <branches+0x98>	# -59.25% (p:42.99%)
          1.84 :	  40060b:       mov    $0x1,%eax
         14.29 :	  400610:       jmp    400617 <branches+0x9d>	# -100.00% (p:100.00%)
          1.38 :	  400612:       mov    $0x0,%eax	# +57.65%
         10.14 :	  400617:       test   %al,%al	# +42.35%
          0.00 :	  400619:       je     40062f <branches+0xb5>	# -57.65% (p:100.00%)
          0.46 :	  40061b:       mov    0x200b4e(%rip),%rax        # 601170 <acc>
          2.76 :	  400622:       sub    $0x1,%rax
          0.00 :	  400626:       mov    %rax,0x200b43(%rip)        # 601170 <acc>
          0.46 :	  40062d:       jmp    400641 <branches+0xc7>	# -100.00% (p:100.00%)
          0.92 :	  40062f:       mov    0x200b3a(%rip),%rax        # 601170 <acc>	# +56.13%
          2.30 :	  400636:       add    $0x1,%rax
          0.92 :	  40063a:       mov    %rax,0x200b2f(%rip)        # 601170 <acc>
          0.92 :	  400641:       mov    -0x10(%rbp),%rax	# +43.87%
          2.30 :	  400645:       mov    %rax,%rdi
          0.00 :	  400648:       callq  400526 <lfsr>	# -100.00% (p:100.00%)
          0.00 :	  40064d:       mov    %rax,-0x10(%rbp)	# +100.00%
          1.84 :	  400651:       addq   $0x1,-0x8(%rbp)
          0.92 :	  400656:       mov    -0x8(%rbp),%rax
          5.07 :	  40065a:       cmp    -0x20(%rbp),%rax
          0.00 :	  40065e:       jb     40059f <branches+0x25>	# -100.00% (p:100.00%)
          0.00 :	  400664:       nop
          0.00 :	  400665:       leaveq
          0.00 :	  400666:       retq
      
      (Note: the --branch-filter u,any was used to avoid spurious target and
      branch points due to interrupts/faults, they show up as very small -/+
      annotations on 'weird' locations)
      
      Committer note:
      
      Please take a look at:
      
        http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png
      
      To see the colors.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      [ Moved sym->max_coverage to 'struct annotate', aka symbol__annotate(sym) ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      70fbe057
    • Wang Nan's avatar
      perf record: Mark MAP_HUGETLB when synthesizing mmap events · d7e404af
      Wang Nan authored
      When synthesizing mmap events, add MAP_HUGETLB map flag if the source of
      mapping is file in hugetlbfs.
      
      After this patch, perf can identify hugetlb mapping even if perf is
      started after the mapping of huge pages (like with 'perf top').
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Reviewed-by: default avatarNilay Vaish <nilayvaish@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Hou Pengyang <houpengyang@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1473137909-142064-4-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d7e404af
    • Wang Nan's avatar
      tools lib api fs: Add hugetlbfs filesystem detector · 5e7be3e1
      Wang Nan authored
      Detect hugetlbfs. hugetlbfs__mountpoint() will be used during recording
      to help identifying hugetlb mmaps: which should be recognized as anon
      mapping.
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Reviewed-by: default avatarNilay Vaish <nilayvaish@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Hou Pengyang <houpengyang@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1473137909-142064-3-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      5e7be3e1
    • Wang Nan's avatar
      perf tools: Recognize hugetlb mapping as anon mapping · 0ac3348e
      Wang Nan authored
      Hugetlbfs mapping should be recognized as anon mapping so user has a
      chance to create /tmp/perf-<pid>.map file for symbol resolving. This
      patch utilizes MAP_HUGETLB to identify hugetlb mapping.
      
      After this patch, if perf is started before a program starts using huge
      pages (so perf gets MMAP2 events from kernel), perf is able to recognize
      hugetlb mapping as anon mapping.
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1473137909-142064-2-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarHou Pengyang <houpengyang@huawei.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0ac3348e
  5. 06 Sep, 2016 1 commit
  6. 05 Sep, 2016 11 commits
  7. 04 Sep, 2016 4 commits
  8. 03 Sep, 2016 5 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 4b30b6d1
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "I'm still prepping a set of fixes for btrfs fsync, just nailing down a
        hard to trigger memory corruption.  For now, these are tested and ready."
      
      * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket()
        Btrfs: fix endless loop in balancing block groups
        Btrfs: kill invalid ASSERT() in process_all_refs()
      4b30b6d1
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 2bece1a0
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
       "arm64 and arm/perf fixes:
      
         - arm64 fix: debug exception unmasking on the CPU resume path
      
         - ARM PMU fixes: memory leak on error path and NULL pointer
           dereference"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1
        drivers/perf: arm_pmu: Fix NULL pointer dereference during probe
        drivers/perf: arm_pmu: Fix leak in error path
      2bece1a0
    • Linus Torvalds's avatar
      Merge tag 'char-misc-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 593ee4ed
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are a number of small driver fixes for 4.8-rc5.
      
        The largest thing here is deleting an obsolete driver,
        drivers/misc/bh1780gli.c, as the functionality of it was replaced by
        an iio driver a while ago.
      
        The other fixes are things that have been reported, or reverts of
        broken stuff (the binder change).  All of these changes have been in
        linux-next for a while with no reported issues"
      
      * tag 'char-misc-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        thunderbolt: Don't declare Falcon Ridge unsupported
        thunderbolt: Add support for INTEL_FALCON_RIDGE_2C controller.
        thunderbolt: Fix resume quirk for Falcon Ridge 4C.
        lkdtm: Mark lkdtm_rodata_do_nothing() notrace
        mei: me: disable driver on SPT SPS firmware
        Revert "android: binder: fix dangling pointer comparison"
        drivers/iio/light/Kconfig: SENSORS_BH1780 cleanup
        android: binder: fix dangling pointer comparison
        misc: delete bh1780 driver
      593ee4ed
    • Linus Torvalds's avatar
      Merge tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · 41488202
      Linus Torvalds authored
      Pull driver core fixes from Greg KH:
       "Here are three small fixes for 4.8-rc5.
      
        One for sysfs, one for kernfs, and one documentation fix, all for
        reported issues.  All of these have been in linux-next for a while"
      
      * tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        sysfs: correctly handle read offset on PREALLOC attrs
        documentation: drivers/core/of: fix name of of_node symlink
        kernfs: don't depend on d_find_any_alias() when generating notifications
      41488202
    • Linus Torvalds's avatar
      Merge tag 'staging-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 018c81b8
      Linus Torvalds authored
      Pull staging/IIO driver fixes from Greg KH:
       "Here are a number of small fixes for staging and IIO drivers that
        resolve reported problems.
      
        Full details are in the shortlog.  All of these have been in
        linux-next with no reported issues"
      
      * tag 'staging-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (35 commits)
        arm: dts: rockchip: add reset node for the exist saradc SoCs
        arm64: dts: rockchip: add reset saradc node for rk3368 SoCs
        iio: adc: rockchip_saradc: reset saradc controller before programming it
        iio: accel: kxsd9: Fix raw read return
        iio: adc: ti_am335x_adc: Increase timeout value waiting for ADC sample
        iio: adc: ti_am335x_adc: Protect FIFO1 from concurrent access
        include/linux: fix excess fence.h kernel-doc notation
        staging: wilc1000: correctly check if associatedsta has not been found
        staging: wilc1000: NULL dereference on error
        staging: wilc1000: txq_event: Fix coding error
        MAINTAINERS: Add file patterns for ion device tree bindings
        MAINTAINERS: Update maintainer entry for wilc1000
        iio: chemical: atlas-ph-sensor: fix typo in val assignment
        iio: fix sched WARNING "do not call blocking ops when !TASK_RUNNING"
        staging: comedi: ni_mio_common: fix AO inttrig backwards compatibility
        staging: comedi: dt2811: fix a precedence bug
        staging: comedi: adv_pci1760: Do not return EINVAL for CMDF_ROUND_DOWN.
        staging: comedi: ni_mio_common: fix wrong insn_write handler
        staging: comedi: comedi_test: fix timer race conditions
        staging: comedi: daqboard2000: bug fix board type matching code
        ...
      018c81b8