1. 18 Jan, 2011 3 commits
    • perf: Validate cpu early in perf_event_alloc() · 66832eb4
      Oleg Nesterov authored
      Starting from perf_event_alloc()->perf_init_event(), the kernel
      assumes that event->cpu is either -1 or a valid CPU number.
      
      Change perf_event_alloc() to validate this argument early. This
      also means we can remove the similar check in
      find_get_context().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: gregkh@suse.de
      Cc: stable@kernel.org
      LKML-Reference: <20110118161032.GC693@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
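The early check described above can be sketched as a small standalone function. This is a userspace model, not the kernel code: the name validate_event_cpu() is hypothetical, nr_cpu_ids is passed in as a parameter instead of being read from a global, and has_task stands in for whether the event is attached to a task. The (unsigned) cast folds the negative and too-large cases into one comparison, so -1 is the only out-of-range value that can pass, and only for a per-task event.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of an early cpu check.
 * Returns 0 if @cpu is acceptable, -EINVAL otherwise.
 * nr_cpu_ids is a parameter here (assumption); in the kernel it is a global.
 */
static int validate_event_cpu(int cpu, bool has_task, unsigned int nr_cpu_ids)
{
    /* Casting to unsigned turns -1 (and any negative value) into a huge
     * number, so one comparison catches both "negative" and "too large". */
    if ((unsigned int)cpu >= nr_cpu_ids) {
        /* The only out-of-range value allowed is -1 ("any CPU"),
         * and only for an event that follows a task. */
        if (!has_task || cpu != -1)
            return -EINVAL;
    }
    return 0;
}
```

With four CPUs, validate_event_cpu(3, false, 4) and validate_event_cpu(-1, true, 4) succeed, while validate_event_cpu(-1, false, 4) and validate_event_cpu(4, false, 4) return -EINVAL.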
    • perf: Find_get_context: fix the per-cpu-counter check · 22a4ec72
      Oleg Nesterov authored
      If task == NULL, find_get_context() should always check that cpu
      is correct.
      
      Afaics, the bug was introduced by 38a81da2 "perf events: Clean
      up pid passing", but even before that commit "&& cpu != -1" was
      not exactly right: the -ESRCH returned from find_task_by_vpid() is
      not accurate.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: gregkh@suse.de
      Cc: stable@kernel.org
      LKML-Reference: <20110118161008.GB693@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
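The difference the fix makes can be modelled with two predicates: one for the guard with the "&& cpu != -1" clause and one for the unconditional check. This is a hypothetical sketch: cpu_ok() stands in for the real validity test (which in the kernel also considers whether the CPU is online), has_task models task != NULL, and both accept_*() helpers are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "cpu names a valid CPU" (range check only). */
static bool cpu_ok(int cpu, int nr_cpus)
{
    return cpu >= 0 && cpu < nr_cpus;
}

/* Guard with the "&& cpu != -1" clause: a request with no task and
 * cpu == -1 skips validation and is accepted. Returns true if accepted. */
static bool accept_with_cpu_clause(bool has_task, int cpu, int nr_cpus)
{
    if (!has_task && cpu != -1)
        return cpu_ok(cpu, nr_cpus);
    return true;
}

/* Fixed guard: without a task the event is per-CPU, so cpu must always
 * be valid. Per-task requests are accepted here (other checks omitted). */
static bool accept_fixed(bool has_task, int cpu, int nr_cpus)
{
    if (!has_task)
        return cpu_ok(cpu, nr_cpus);
    return true;
}
```

The interesting input is (no task, cpu == -1): the first guard lets it through unchecked, the second rejects it.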
    • perf: Fix contexted inheritance · c5ed5145
      Peter Zijlstra authored
      Linus reported that the RCU lockdep annotation bits triggered for this
      rcu_dereference() because we're not holding rcu_read_lock().
      
      Going over the code, I cannot convince myself it's correct:
      
       - holding a ref on the parent_ctx doesn't prevent it from being uncloned
         concurrently (as the comment says), so we can race with a free.
      
       - holding parent_ctx->mutex doesn't prevent the above free from taking
         place either; at best it would prevent parent_ctx from being freed.
      
      I.e. the warning is correct. To fix the bug, serialize against the
      unclone_ctx() call by extending the reach of the parent_ctx->lock.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 17 Jan, 2011 3 commits
    • perf tools: Fix tracepoint id to string perf.data header table · ad7f4e3f
      Arnaldo Carvalho de Melo authored
      It was broken by f006d25a, which passed just the event name instead of
      the complete sys:event string that the code expected when opening the
      /sys/.../sys/sys:event/id file to get the id.
      
      Fix it by moving it to after parse_events in cmd_record, as at that point
      we can just traverse the evsel_list and use evsel->attr.config +
      event_name(evsel) instead of re-opening the /id file.
      Reported-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
      Cc: Franck Bui-Huu <vagabon.xyz@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Han Pingtian <phan@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <20110117202801.GG2085@ghostprotocols.net>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Fix handling of wildcards in tracepoint event selectors · dd9a9ad5
      Arnaldo Carvalho de Melo authored
      It wasn't accounting for the ':' when consuming bytes in the event
      selector string, so parse_events() would fail in this test:
      
                      if (!(*str == 0 || *str == ',' || isspace(*str)))
                              return -1;
      
      as *str would be pointing to '*', the last character in the '-e' arg in:
      
      $ perf record -q -a -D -e sched:sched_* | perf script -i - -s perf-script.py
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
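The byte accounting can be illustrated with a small parser for sys:event selectors. This is a hypothetical sketch, not perf's parse_events(): the function name, its count_colon switch and its return convention (bytes consumed, or -1) are made up. It consumes the subsystem name, the ':' and the event name (wildcards included), then applies the same terminator test quoted above. Dropping the 1 byte for the ':' leaves the cursor one byte short, on the trailing '*'. In perf the miscount affected the wildcard path only; this sketch has a single path, so its buggy mode fails for every selector.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: parse a "sys:event" selector at the start of @str.
 * Returns the number of bytes consumed, or -1 on failure.
 * count_colon == 0 reproduces the bug by not counting the ':'.
 */
static long parse_tracepoint_selector(const char *str, int count_colon)
{
    const char *colon = strchr(str, ':');
    const char *end;
    size_t sys_len, evt_len, consumed;

    if (!colon || colon == str)
        return -1;
    sys_len = (size_t)(colon - str);
    /* The event name runs up to ',', whitespace or the end; '*' is kept. */
    evt_len = strcspn(colon + 1, ", \t\n");
    if (evt_len == 0)
        return -1;

    consumed = sys_len + evt_len + (count_colon ? 1 : 0);
    end = str + consumed;

    /* Same terminator test as in parse_events(). */
    if (!(*end == 0 || *end == ',' || isspace((unsigned char)*end)))
        return -1;
    return (long)consumed;
}
```

For "sched:sched_*" the correct count is 13 bytes and the selector is accepted; without the ':' the cursor lands on '*' and the terminator test fails.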
    • powerpc: perf: Fix frequency calculation for overflowing counters · 4bca770e
      Anton Blanchard authored
      When profiling a benchmark that is almost 100% userspace, I noticed some wildly
      inaccurate profiles that showed almost all time spent in the kernel.
      
      Closer examination shows we were programming a tiny number of cycles into the
      PMU after each overflow (about 200 away from the next overflow). This gets us
      stuck in a loop which we eventually break out of by throttling the PMU (there
      are regular throttle/unthrottle events in the log).
      
      It looks like we aren't setting event->hw.last_period to something sane, and the
      frequency-to-period calculations in perf are going haywire.
      
      With the following patch we find the correct period after a few interrupts and
      stay there. I also see no more throttle events.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      LKML-Reference: <20110117161742.5feb3761@kryten>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 15 Jan, 2011 1 commit
  4. 14 Jan, 2011 2 commits
    • tracing: Remove syscall_exit_fields · 7f85803a
      Lai Jiangshan authored
      There is no need for syscall_exit_fields as the syscall
      exit event class can already host the fields in its structure,
      like most other trace events do by default. Use that
      default behavior instead.
      
      Following this scheme, we no longer need to override the
      get_fields() callback of the syscall exit event class either.
      
      Hence both syscall_exit_fields and syscall_get_exit_fields() can
      be removed.
      
      Also changed some indentation to keep the following under 80
      characters:
      
      ".fields		= LIST_HEAD_INIT(event_class_syscall_exit.fields),"
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4D301C0E.8090408@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
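The initializer quoted above relies on the kernel's LIST_HEAD_INIT(), which makes an empty circular list whose head points to itself, so the field list can live directly in the event class. The sketch re-creates the idea in userspace so it compiles on its own: struct list_head, LIST_HEAD_INIT() and list_empty() mirror the kernel helpers of the same names, while struct event_class here is a cut-down, hypothetical stand-in with only a name and a fields list.

```c
#include <assert.h>

/* Minimal userspace copy of the kernel's circular doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head whose next and prev both point back at itself. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

static int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Hypothetical, cut-down event class: the field list is embedded in the
 * class, so no separate syscall_exit_fields object is needed. */
struct event_class {
    const char *name;
    struct list_head fields;
};

/* Static initialization: the list head may refer to its own address. */
static struct event_class event_class_syscall_exit = {
    .name   = "syscall_exit",
    .fields = LIST_HEAD_INIT(event_class_syscall_exit.fields),
};
```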
    • tracing: Only process module tracepoints once · c94fbe1d
      Steven Rostedt authored
      The commit:
      
       3a9f987b3141f086de27832514aad9f50a53f754
       tracing: Include module.h in define_trace.h
      
      only solved half the problem. If the trace/events/module.h header is
      included at the time of define_trace.h (or in ftrace.h within it),
      the module.h TRACE_SYSTEM will override the current TRACE_SYSTEM
      macro.
      
      Since define_trace.h is included when CREATE_TRACE_POINTS is set,
      and the first thing it does is to #undef CREATE_TRACE_POINTS,
      by placing the module.h TRACE_SYSTEM inside an
       #ifdef CREATE_TRACE_POINTS
      we can prevent it from overriding the TRACE_SYSTEM that is
      being processed, and still process the module.h tracepoints
      when the module code defines CREATE_TRACE_POINTS and includes
      the trace/events/module.h header.
      
      As with commit 3a9f987b3141, this is only an issue if module.h
      is not included before the trace/events/<event>.h file is
      included, which (luckily) has not happened yet.
      Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 13 Jan, 2011 2 commits
    • perf record: Add "nodelay" mode, disabled by default · acac03fa
      Kirill Smelkov authored
      Sometimes there is a need to use perf in "live-log" mode. The problem
      is that, for infrequent events, the actual output is heavily delayed
      because perf-record reads sample data in whole pages.
      
      So for such scenarios, add a flag that puts perf-record in "nodelay"
      mode. To track, e.g., what's going on in icmp_rcv while ping is
      running, use it with something like this:
      
      (1) $ perf probe -L icmp_rcv | grep -U8 '^ *43\>'
                                          goto error;
                          }
               38         if (!pskb_pull(skb, sizeof(*icmph)))
                                  goto error;
                          icmph = icmp_hdr(skb);
      
               43         ICMPMSGIN_INC_STATS_BH(net, icmph->type);
                          /*
                           *      18 is the highest 'known' ICMP type. Anything else is a mystery
                           *
                           *      RFC 1122: 3.2.2  Unknown ICMP messages types MUST be silently
                           *                discarded.
                           */
               50         if (icmph->type > NR_ICMP_TYPES)
                                  goto error;
      
          $ perf probe icmp_rcv:43 'type=icmph->type'
      
      (2) $ cat trace-icmp.py
          [...]
          def trace_begin():
                  print "in trace_begin"
      
          def trace_end():
                  print "in trace_end"
      
          def probe__icmp_rcv(event_name, context, common_cpu,
                  common_secs, common_nsecs, common_pid, common_comm,
                  __probe_ip, type):
                          print_header(event_name, common_cpu, common_secs, common_nsecs,
                                  common_pid, common_comm)
      
                          print "__probe_ip=%u, type=%u\n" % \
                          (__probe_ip, type),
          [...]
      
      (3) $ perf record -a -D -e probe:icmp_rcv -o - | \
            perf script -i - -s trace-icmp.py
      
      Thanks to Peter Zijlstra for pointing out how to do it.
      
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>, Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <20110112140613.GA11698@tugrik.mns.mnsspb.ru>
      Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched: Fix list of events, dropping unsupported ':r' modifier · 9710118b
      Stephane Eranian authored
      Looks to me like the :r modifier is not supported anymore, so remove it from
      the list of events.
      
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      LKML-Reference: <AANLkTim=jawJyBj0iFd0r4-LCKzvjFW+NddzJMD5GUB9@mail.gmail.com>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 11 Jan, 2011 5 commits
    • Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return" · 4ad9f594
      Arnaldo Carvalho de Melo authored
      This reverts commit aa7bc7ef.
      
      It removed the fallback from hardware profiling to software profiling,
      e.g., in a VM with no PMU.
      Reported-by: David Ahern <daahern@cisco.com>
      Cc: David Ahern <daahern@cisco.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
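The fallback that the revert restores can be sketched as: open the hardware cycles event first and, if the kernel reports that it does not exist, retry with the software cpu-clock event. This is a hypothetical sketch, assuming the failure surfaces as ENOENT as the reverted commit's title suggests. open_fn stands in for a wrapper around sys_perf_event_open(), the ev_kind values are local placeholders rather than the PERF_TYPE_* constants, and the fake openers only exist to exercise the logic without a PMU.

```c
#include <assert.h>
#include <errno.h>

/* Local placeholders (assumption), not the <linux/perf_event.h> constants. */
enum ev_kind { EV_HW_CYCLES, EV_SW_CPU_CLOCK };

/* Hypothetical opener: returns an fd >= 0, or -errno on failure. */
typedef int (*open_fn)(enum ev_kind kind);

/*
 * Try hardware cycles first; if that event does not exist (no PMU, e.g. in
 * a VM), fall back to the software cpu-clock event. Any other error is
 * returned unchanged.
 */
static int open_cycles_with_fallback(open_fn open_event)
{
    int fd = open_event(EV_HW_CYCLES);

    if (fd == -ENOENT)
        fd = open_event(EV_SW_CPU_CLOCK);
    return fd;
}

/* Fake openers: hardware events get fd 4, software events get fd 3. */
static int open_no_pmu(enum ev_kind kind)
{
    return kind == EV_HW_CYCLES ? -ENOENT : 3;
}

static int open_with_pmu(enum ev_kind kind)
{
    return kind == EV_HW_CYCLES ? 4 : 3;
}

static int open_denied(enum ev_kind kind)
{
    (void)kind;
    return -EACCES;
}
```

Only "event does not exist" triggers the retry; a permission error, for example, is still reported to the user.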
    • perf top: Fix annotate segv · cc841580
      Arnaldo Carvalho de Melo authored
      Before, we had sym_counter, which was initialized to zero and used as
      an index into the global attrs variable. Now we have a list of evsel
      entries, and sym_counter became sym_evsel, which remained initialized
      to zero (NULL): b00m.
      
      Fix it by initializing it to the first entry in the evsel list.
      
      Bug-introduced: 69aad6f1
      Reported-by: Kirill Smelkov <kirr@mns.spb.ru>
      Tested-by: Kirill Smelkov <kirr@mns.spb.ru>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Kirill Smelkov <kirr@mns.spb.ru>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Fix order of event list deletion · bd3bfe9e
      Arnaldo Carvalho de Melo authored
      We need to defer calling perf_evsel_list__delete() until after the
      atexit-registered routines have run, because at least 'perf record'
      needs to traverse the events being recorded at that time.
      
      This fixes the problem reported by Thomas Renninger where cmd_record
      called by cmd_timechart would not write the tracing data to the perf.data
      file header because the evsel_list at atexit (control+C on 'perf timechart
      record') time would be empty, being already deleted by run_builtin(),
      and thus 'perf timechart' when trying to process such perf.data file would
      die with:
      
      "no trace data in the file"
      
      Problem introduced in 70d544d0.
      Reported-by: Thomas Renninger <trenn@suse.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf session: Fix infinite loop in __perf_session__process_events · 3d03e2ea
      Arnaldo Carvalho de Melo authored
      In this if statement:
      
              if (head + event->header.size >= mmap_size) {
                      if (mmaps[map_idx]) {
                              munmap(mmaps[map_idx], mmap_size);
                              mmaps[map_idx] = NULL;
                      }
      
                      page_offset = page_size * (head / page_size);
                      file_offset += page_offset;
                      head -= page_offset;
                      goto remap;
              }
      
      With, for instance, these values:
      
      head=2992
      event->header.size=48
      mmap_size=3040
      
      We end up endlessly looping back to remap. Off by one.
      
      Problem introduced in 55b4462.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Reported-by: David Ahern <daahern@cisco.com>
      Bisected-by: David Ahern <daahern@cisco.com>
      Tested-by: David Ahern <daahern@cisco.com>
      Cc: David Ahern <daahern@cisco.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
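Plugging the reported values into a model of the remap decision shows the loop. This is a hypothetical sketch, not perf's __perf_session__process_events(), and it assumes a 4096-byte page size. With >=, an event that ends exactly at the end of the mapping (2992 + 48 = 3040) is treated as not fitting, but rounding head down to a page boundary gives a page_offset of 0, so neither head nor file_offset changes and the code jumps back to remap with the same state. With >, the exact fit is processed normally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the event at @head of @size bytes is treated as running past the
 * mapping. @buggy selects the original ">=" comparison. */
static bool needs_remap(uint64_t head, uint64_t size, uint64_t mmap_size,
                        bool buggy)
{
    if (buggy)
        return head + size >= mmap_size;   /* off by one */
    return head + size > mmap_size;        /* an exact fit needs no remap */
}

/* How far the remap moves the window: head rounded down to a page. */
static uint64_t remap_page_offset(uint64_t head, uint64_t page_size)
{
    return page_size * (head / page_size);
}

/* In this model: the remap path is taken but the window does not move,
 * so the same state is reached again after remapping. */
static bool stuck(uint64_t head, uint64_t size, uint64_t mmap_size,
                  uint64_t page_size, bool buggy)
{
    return needs_remap(head, size, mmap_size, buggy) &&
           remap_page_offset(head, page_size) == 0;
}
```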
    • perf evsel: Support perf_evsel__open(cpus > 1 && threads > 1) · 0252208e
      Arnaldo Carvalho de Melo authored
      And a test for it:
      
      [acme@felicio linux]$ perf test
       1: vmlinux symtab matches kallsyms: Ok
       2: detect open syscall event: Ok
       3: detect open syscall event on all cpus: Ok
      [acme@felicio linux]$
      
      Translated from C, the test:
      
      1. generates a different number of open syscalls on each CPU
         by using sched_setaffinity
      2. verifies that the expected number of events is generated
         on each CPU
      
      It works as expected.
      
      LKML-Reference: <new-submission>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
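The first step of the test, producing a known number of open syscalls on a chosen CPU, can be sketched with sched_setaffinity(2). This is a hypothetical helper, not the perf test code: it pins the calling thread to one CPU and performs n open()/close() pairs on a caller-supplied path (/dev/null is an assumption). Counting the resulting events with perf_evsel__open() is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Pin the calling thread to @cpu, then issue @n open()/close() pairs on
 * @path. Returns the number of successful opens, or -1 if pinning failed.
 */
static int open_n_times_on_cpu(int cpu, int n, const char *path)
{
    cpu_set_t set;
    int ok = 0;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) < 0)
        return -1;

    for (int i = 0; i < n; i++) {
        int fd = open(path, O_RDONLY);

        if (fd >= 0) {
            ok++;
            close(fd);
        }
    }
    return ok;
}

/* Return the first CPU the calling thread is allowed to run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) < 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Calling open_n_times_on_cpu() once per CPU with a different n for each gives every CPU a distinct expected count, which is what the per-CPU event counts are then compared against.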
  7. 10 Jan, 2011 4 commits
  8. 09 Jan, 2011 2 commits
    • Merge branch 'tip/perf/core' of... · 4385428a
      Ingo Molnar authored
      Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
    • perf, x86: P4 PMU - Fix unflagged overflows handling · 047a3772
      Cyrill Gorcunov authored
      Don found that the P4 PMU code reads the CCCR register instead of the
      counter itself (in an attempt to catch unflagged events). This makes
      the P4 NMI handler consume all NMIs it observes, so other NMI users
      such as kgdb simply have no chance to get an NMI on their hands.
      
      Side note: at the moment there is no way to run the nmi-watchdog
      together with the perf tool, because both 'perf top' and the
      nmi-watchdog use the same event. While the nmi-watchdog reserves
      one event/counter for its own needs, there is no room left for the
      perf tool (there is a way to disable the nmi-watchdog at boot, of course).
      
      Ming has tested this patch with the following results:
      
       | 1. watchdog disabled
       |
       | kgdb tests on boot OK
       | perf works OK
       |
       | 2. watchdog enabled, without patch perf-x86-p4-nmi-4
       |
       | kgdb tests on boot hang
       |
       | 3. watchdog enabled, without patch perf-x86-p4-nmi-4 and do not run kgdb
       | tests on boot
       |
       | "perf top" partially works
       |   cpu-cycles            no
       |   instructions          yes
       |   cache-references      no
       |   cache-misses          no
       |   branch-instructions   no
       |   branch-misses         yes
       |   bus-cycles            no
       |
       | 4. watchdog enabled, with patch perf-x86-p4-nmi-4 applied
       |
       | kgdb tests on boot OK
       | perf does not work, NMI "Dazed and confused" messages show up
       |
      
      This means we still have problems on P4 boxes due to 'unknown'
      NMIs, but at least it should fix the kgdb test cases.
      Reported-by: Jason Wessel <jason.wessel@windriver.com>
      Reported-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Acked-by: Lin Ming <ming.m.lin@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <4D275E7E.3040903@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
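Deciding from the counter value itself whether it overflowed, rather than from a flag register, can be sketched as follows. The sketch assumes, as perf does on x86, that a counter is armed with the negative of the sample period, so its top bit stays set until the counter wraps. The 40-bit width and the helper names are assumptions for this example, not values taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CNTR_BITS 40                            /* assumed counter width */
#define CNTR_MASK ((1ULL << CNTR_BITS) - 1)
#define CNTR_TOP  (1ULL << (CNTR_BITS - 1))     /* top bit of the counter */

/* Arm the counter with -period, truncated to the counter width. */
static uint64_t arm_counter(uint64_t period)
{
    return ((uint64_t)0 - period) & CNTR_MASK;
}

/* Advance a counter by @events increments, wrapping at the counter width. */
static uint64_t tick(uint64_t value, uint64_t events)
{
    return (value + events) & CNTR_MASK;
}

/*
 * An armed counter keeps its top bit set; once it wraps past zero the top
 * bit is clear. That marks an overflow even when the hardware overflow
 * flag (kept in CCCR on P4) was not set.
 */
static bool counter_overflowed(uint64_t value)
{
    return !(value & CNTR_TOP);
}
```

This only works while the period is below half the counter range, so that the armed value really has its top bit set.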
  9. 08 Jan, 2011 6 commits
  10. 07 Jan, 2011 12 commits
    • tracing: Include module.h in define_trace.h · 3a9f987b
      Steven Rostedt authored
      While doing some developing, Peter Zijlstra and I have found
      that if a CREATE_TRACE_POINTS include is done before module.h
      is included, it can break the build.
      
      We have been lucky so far that this has not broken the build,
      since module.h is included in almost everything.
      Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • x86: Save rbp in pt_regs on irq entry · 625dbc3b
      Frederic Weisbecker authored
      From the x86_64 low level interrupt handlers, the frame pointer is
      saved right after the partial pt_regs frame.
      
      rbp is not supposed to be part of the irq partial saved registers,
      but saving it only requires extending the pt_regs frame by 8 bytes,
      plus a tiny stack offset fixup on irq exit.
      
      This slightly changes the semantics of get_irq_regs(), which is
      supposed to provide only the values of the caller-saved registers and
      the CPU-saved frame. However, it's a win for unwinders that can walk
      through stack frames on top of get_irq_regs() snapshots.
      
      A noticeable impact is that it makes callchains based on the cpu-clock
      and task-clock perf events work on x86_64.
      
      Let's then save rbp into the irq pt_regs.
      
      As a result with:
      
      	perf record -e cpu-clock perf bench sched messaging
      	perf report --stdio
      
      Before:
          20.94%             perf  [kernel.kallsyms]        [k] lock_acquire
                             |
                             --- lock_acquire
                                |
                                |--44.01%-- __write_nocancel
                                |
                                |--43.18%-- __read
                                |
                                |--6.08%-- fork
                                |          create_worker
                                |
                                |--0.88%-- _dl_fixup
                                |
                                |--0.65%-- do_lookup_x
                                |
                                |--0.53%-- __GI___libc_read
                                 --4.67%-- [...]
      
      After:
          19.23%         perf  [kernel.kallsyms]    [k] __lock_acquire
                         |
                         --- __lock_acquire
                            |
                            |--97.74%-- lock_acquire
                            |          |
                            |          |--21.82%-- _raw_spin_lock
                            |          |          |
                            |          |          |--37.26%-- unix_stream_recvmsg
                            |          |          |          sock_aio_read
                            |          |          |          do_sync_read
                            |          |          |          vfs_read
                            |          |          |          sys_read
                            |          |          |          system_call
                            |          |          |          __read
                            |          |          |
                            |          |          |--24.09%-- unix_stream_sendmsg
                            |          |          |          sock_aio_write
                            |          |          |          do_sync_write
                            |          |          |          vfs_write
                            |          |          |          sys_write
                            |          |          |          system_call
                            |          |          |          __write_nocancel
      
      v2: Fix cfi annotations.
      Reported-by: Soeren Sandmann Pedersen <sandmann@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Jan Beulich <JBeulich@novell.com>
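Why a saved rbp matters becomes clearer from how a frame-pointer unwinder works: each frame starts with the caller's saved frame pointer followed by the return address, so one valid frame pointer is enough to follow the whole chain, and without it the walk cannot start. The sketch walks such a chain over a hand-built fake stack. The layout mirrors the kernel's x86 struct stack_frame (next_frame, return_address), but walk_frames() and the addresses are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Frame-pointer frame layout: saved caller bp, then the return address. */
struct stack_frame {
    struct stack_frame *next_frame;
    unsigned long return_address;
};

/*
 * Follow the frame-pointer chain from @bp, storing up to @max return
 * addresses in @out (which may be NULL to only count frames).
 * Stops at a NULL frame pointer. Returns the number of frames visited.
 */
static size_t walk_frames(const struct stack_frame *bp,
                          unsigned long *out, size_t max)
{
    size_t n = 0;

    while (bp && n < max) {
        if (out)
            out[n] = bp->return_address;
        n++;
        bp = bp->next_frame;
    }
    return n;
}

/* A fake three-deep call chain, innermost frame last. */
static struct stack_frame outer  = { NULL,    0x3000 };
static struct stack_frame middle = { &outer,  0x2000 };
static struct stack_frame inner  = { &middle, 0x1000 };
```

Starting from &inner yields three frames; starting from a NULL frame pointer, which is roughly what an unwinder faces when rbp was never saved, yields none.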
    • x86, dumpstack: Fix unused variable warning · 39a6eebd
      Rakib Mullick authored
      In dump_stack(), bp has not been used since commit 9c0729dc. This
      patch removes bp completely.
      Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
      Cc: Soeren Sandmann <sandmann@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <AANLkTik9U_Z0WSZ7YjrykER_pBUfPDdgUUmtYx=R74nL@mail.gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • x86, NMI: Clean-up default_do_nmi() · f2fd4395
      Don Zickus authored
      Just re-arrange the code a bit to make it easier to follow what is
      going on.  Basically un-negating the if-statement and swapping the code
      inside the if-statement with code outside.
      
      No functional changes.
      Originally-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294348732-15030-7-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU · ab846f13
      Don Zickus authored
      In the original NMI handler, the NMI reason io port (0x61) is only
      processed on the BSP. This makes it impossible to hot-remove the BSP.
      To solve the issue, a raw spinlock is used to allow the port to be
      processed on any CPU.
      Originally-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294348732-15030-6-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, NMI: Remove DIE_NMI_IPI · c410b830
      Don Zickus authored
      With priorities in place and no one really understanding the difference between
      DIE_NMI and DIE_NMI_IPI, just remove DIE_NMI_IPI and convert everyone to DIE_NMI.
      
      This also simplifies default_do_nmi() a little bit.  Instead of calling the
      die_notifier in both the if and else part, just pull it out and call it before
      the if-statement.  This has the side benefit of avoiding a call to the ioport
      to see if there is an external NMI sitting around until after the (more frequent)
      internal NMIs are dealt with.
      Patch-Inspired-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294348732-15030-5-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, NMI: Add priorities to handlers · 166d7514
      Don Zickus authored
      In order to consolidate the NMI die_chain events, we need to set up the priorities
      for the die notifiers.
      
      I started by defining a bunch of common priorities that can be used by the
      notifier blocks.  Then I modified the notifier blocks to use the newly created
      priorities.
      
      Now that the priorities are straightened out, it should be easier to remove the
      event DIE_NMI_IPI.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294348732-15030-4-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
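How priorities order the die_chain can be sketched with a minimal notifier chain. As in the kernel's notifier_chain_register(), blocks are kept sorted by descending priority (equal priorities keep registration order), and the call walk stops once a handler returns a value with the stop bit set. Everything else (the simplified struct, the example handlers, demo()) is a hypothetical userspace stand-in.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the kernel's notifier return codes. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (0x0001 | NOTIFY_STOP_MASK)

struct notifier_block {
    int (*call)(struct notifier_block *nb, unsigned long val);
    struct notifier_block *next;
    int priority;
};

/* Insert @n so the chain stays sorted by descending priority. */
static void chain_register(struct notifier_block **head,
                           struct notifier_block *n)
{
    while (*head && n->priority <= (*head)->priority)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
}

/* Call each block in order; stop early once one claims the event. */
static int chain_call(struct notifier_block *head, unsigned long val)
{
    int ret = NOTIFY_DONE;

    for (; head; head = head->next) {
        ret = head->call(head, val);
        if (ret & NOTIFY_STOP_MASK)
            break;
    }
    return ret;
}

/* Example handlers: record the priority of the last one that ran. */
static int last_called = -1;

static int claim_handler(struct notifier_block *nb, unsigned long val)
{
    (void)val;
    last_called = nb->priority;
    return NOTIFY_STOP;
}

static int pass_handler(struct notifier_block *nb, unsigned long val)
{
    (void)val;
    last_called = nb->priority;
    return NOTIFY_DONE;
}

static struct notifier_block low  = { pass_handler,  NULL, 0 };
static struct notifier_block high = { claim_handler, NULL, 10 };
static struct notifier_block *chain;

/* Register low first, then high, deliver one event, and report which
 * handler ran last. */
static int demo(void)
{
    chain_register(&chain, &low);
    chain_register(&chain, &high);
    chain_call(chain, 0);
    return last_called;
}
```

Although the low-priority block is registered first, the high-priority one runs first and claims the event, so the low-priority handler never sees it.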
    • x86: Convert some devices to use DIE_NMIUNKNOWN · 673a6092
      Don Zickus authored
      There are a handful of places in the code that register a die_notifier
      as a catch-all in case no one claims the NMI. Unfortunately, they trigger
      on events like DIE_NMI and DIE_NMI_IPI, which, depending on when they
      registered, may collide with other handlers that have the ability to
      determine if the NMI is theirs or not.
      
      The function unknown_nmi_error() makes one last effort to walk the
      die_chain when no one else has claimed the NMI before spitting out
      messages that the NMI is unknown.
      
      This is a better spot for these devices to execute any code without
      colliding with the other handlers.
      
      The two drivers modified are only compiled on x86 arches I believe, so
      they shouldn't be affected by other arches that may not have
      DIE_NMIUNKNOWN defined.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Corey Minyard <minyard@acm.org>
      Cc: openipmi-developer@lists.sourceforge.net
      Cc: dann frazier <dannf@hp.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294348732-15030-3-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, NMI: Add NMI symbol constants and rename memory parity to PCI SERR · 1c7b74d4
      Huang Ying authored
      Replace the NMI related magic numbers with symbol constants.
      
      Memory parity error is only valid for the IBM PC-AT; newer machines use
      bit 7 (0x80) of port 0x61 for PCI SERR, while memory errors are usually
      reported via MCE. So the corresponding function name and kernel log
      string are changed.
      
      But on some machines, the PCI SERR line is still used to report memory
      errors. This is used by EDAC, so the corresponding EDAC call is kept.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1294348732-15030-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
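Decoding the reason byte from port 0x61 with named constants might look like this. Bit 7 (0x80) as PCI SERR comes from the text above; treating bit 6 (0x40) as the I/O channel check (IOCHK) bit is an assumption based on the PC/AT port layout. The constant, enum and function names are hypothetical rather than those introduced by the patch, and the port read itself (inb(0x61)) is left out so the logic runs in userspace.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for bits of the byte read from I/O port 0x61. */
#define REASON_PCI_SERR 0x80   /* bit 7: PCI SERR (memory parity on the PC-AT) */
#define REASON_IOCHK    0x40   /* bit 6: I/O channel check (assumption) */
#define REASON_MASK     (REASON_PCI_SERR | REASON_IOCHK)

enum nmi_source { NMI_NONE, NMI_SERR, NMI_IOCHK, NMI_BOTH };

/* Classify an external NMI from the port 0x61 reason byte,
 * ignoring the unrelated low bits. */
static enum nmi_source classify_reason(uint8_t reason)
{
    switch (reason & REASON_MASK) {
    case REASON_PCI_SERR:
        return NMI_SERR;
    case REASON_IOCHK:
        return NMI_IOCHK;
    case REASON_MASK:
        return NMI_BOTH;
    default:
        return NMI_NONE;
    }
}
```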
    • perf_events: Add perf_event_time() · 4158755d
      Stephane Eranian authored
      Adds perf_event_time() to try and centralize access to event
      timing and in particular ctx->time. Prepares for cgroup support.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4d22059c.122ae30a.5e0e.ffff8b8b@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events: Generalize use of event_filter_match() · 5632ab12
      Stephane Eranian authored
      Replace all occurrences of:
      	event->cpu != -1 && event->cpu != smp_processor_id()
      by a call to:
      	!event_filter_match(event)
      
      This makes the code more consistent and will make the cgroup
      patch smaller.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4d220593.2308e30a.48c5.ffff8ae9@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
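The helper's logic and its relation to the open-coded test can be checked in a few lines. This sketch assumes the kernel/perf_event.c definition of the time, where an event matches if it is not bound to any CPU (cpu == -1) or is bound to the current one. struct fake_event and the this_cpu parameter (standing in for smp_processor_id()) are hypothetical. By De Morgan's law, the negation of that match is cpu != -1 && cpu != this_cpu, the form a call site uses to skip an event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct perf_event. */
struct fake_event {
    int cpu;        /* -1: not bound to any CPU */
};

/* True if @event should be handled on @this_cpu. */
static bool event_filter_match(const struct fake_event *event, int this_cpu)
{
    return event->cpu == -1 || event->cpu == this_cpu;
}

/* The open-coded "skip this event" test, for comparison. */
static bool open_coded_skip(const struct fake_event *event, int this_cpu)
{
    return event->cpu != -1 && event->cpu != this_cpu;
}
```

For every (cpu, this_cpu) pair, open_coded_skip() returns exactly !event_filter_match().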
    • perf_events: Move code around to prepare for cgroup · 0b3fcf17
      Stephane Eranian authored
      In particular, this patch moves perf_event_exit_task() before
      cgroup_exit() to allow for cgroup support. The cgroup_exit()
      function detaches the cgroups attached to a task.
      
      Other movements include hoisting some definitions and inlines
      to the top of perf_event.c.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4d22058b.cdace30a.4657.ffff95b1@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>