1. 08 Jun, 2010 3 commits
  2. 07 Jun, 2010 1 commit
  3. 04 Jun, 2010 2 commits
  4. 03 Jun, 2010 3 commits
    • tracing: Remove ftrace_preempt_disable/enable · 5168ae50
      Steven Rostedt authored
      The ftrace_preempt_disable/enable functions were meant to address a
      recursive race caused by the function tracer. The function tracer
      traces all functions, which makes it easily susceptible to recursion.
      One such area was preempt_enable(). This would call the scheduler, and
      the scheduler would call the function tracer and loop.
      (Or so it was thought.)
      
      ftrace_preempt_disable/enable was made to protect against recursion
      inside the scheduler by saving the state of the NEED_RESCHED flag. If the
      flag was already set before ftrace_preempt_disable(), then
      ftrace_preempt_enable() would not call schedule(), on the assumption that
      a flag set beforehand would already have caused a reschedule unless the
      code was already inside the scheduler.
      
      This worked fine except in the case of SMP, where a task on one CPU would
      set the NEED_RESCHED flag for a task on another CPU and then send an IPI
      to trigger the reschedule. NEED_RESCHED could then be saved as set at
      ftrace_preempt_disable() while the IPI arrived inside the preempt-disabled
      section. ftrace_preempt_enable() would then not call the scheduler,
      because the flag was already set before entering the section.
      
      This bug would cause a missed preemption check and therefore higher latencies.
      
      Investigating further, I found that the recursion caused by the function
      tracer was not due to schedule(), but due to preempt_schedule(). Now
      that preempt_schedule() is completely annotated with notrace, the
      recursion is no longer an issue.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
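The SMP race described above can be replayed in a toy, single-threaded C model. Everything here (need_resched, preempt_count, run_scenario() and the *_model helpers) is a hypothetical stand-in, not kernel code: it only reproduces the ordering in which another CPU sets NEED_RESCHED, the IPI lands inside the preempt-disabled section, and the saved-flag logic of the old ftrace_preempt_enable() then skips the reschedule.

```c
#include <stdbool.h>

/* Toy single-CPU model (hypothetical names, not kernel code). */
static bool need_resched;   /* stands in for TIF_NEED_RESCHED */
static int preempt_count;
static int schedule_calls;

static void schedule_model(void)
{
    schedule_calls++;
    need_resched = false;
}

/* Return from the reschedule IPI: only preempts if preemption is enabled. */
static void ipi_return_model(void)
{
    if (preempt_count == 0 && need_resched)
        schedule_model();
}

/* Old ftrace_preempt_disable(): remembers NEED_RESCHED on entry. */
static bool old_disable(void)
{
    bool resched = need_resched;
    preempt_count++;
    return resched;
}

/* Old ftrace_preempt_enable(): trusts the flag saved on entry. */
static void old_enable(bool resched)
{
    preempt_count--;
    if (!resched && preempt_count == 0 && need_resched)
        schedule_model();
}

/* Plain preempt_enable() semantics: always re-check the flag. */
static void plain_enable(void)
{
    preempt_count--;
    if (preempt_count == 0 && need_resched)
        schedule_model();
}

/* Another CPU sets NEED_RESCHED, then its IPI arrives inside the
 * preempt-disabled section. Returns how often the model scheduler ran. */
static int run_scenario(bool use_old_scheme)
{
    preempt_count = 0;
    schedule_calls = 0;
    need_resched = true;            /* set remotely, IPI still in flight */

    if (use_old_scheme) {
        bool resched = old_disable();
        ipi_return_model();         /* preemption disabled: no switch */
        old_enable(resched);        /* flag was "already set": skipped */
    } else {
        preempt_count++;            /* plain preempt_disable() */
        ipi_return_model();
        plain_enable();             /* re-checks and schedules */
    }
    return schedule_calls;
}
```

Under these assumptions, run_scenario(true) never schedules, so the preemption is lost, while run_scenario(false), which re-checks the flag on enable the way a plain preempt_enable() does, schedules once.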
    • tracing/sched: Make preempt_schedule() notrace · d1f74e20
      Steven Rostedt authored
      The function tracer code uses ftrace_preempt_disable() to disable
      preemption instead of the normal preempt_disable(). But there's a slight
      race condition that may cause it to lose a preemption check.
      
      ftrace_preempt_disable() was introduced to keep the function tracer from
      recursing on itself: disabling preemption and then re-enabling it could
      call into the function tracer again, causing infinite recursion.
      
      The recursion was assumed to happen only when the call was in schedule(),
      but this is incorrect. It is caused by preempt_schedule(), which
      is called by preempt_enable(). Calling preempt_enable() while
      NEED_RESCHED was set would call preempt_schedule(), which would call
      the function tracer again.
      
      Making preempt_schedule() and add_preempt_count() notrace
      prevents the infinite recursion, because add_preempt_count() stops
      the preempt_enable() in the function tracer from calling
      preempt_schedule() again.
      
      The sub_preempt_count() is also made notrace just to keep it
      symmetric.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
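A minimal sketch of why annotating preempt_schedule() with notrace ends the recursion, assuming a toy model in which tracer_hook() runs on entry to every traced function and itself disables and re-enables preemption. All names are hypothetical, and a depth limit stands in for the stack overflow the kernel would hit.

```c
#include <stdbool.h>

/* Toy model (hypothetical names, not kernel code). */
#define MAX_DEPTH 32

static bool need_resched;
static int preempt_count;
static int depth;                  /* tracer nesting depth */
static bool runaway;               /* set when recursion exceeds MAX_DEPTH */
static bool preempt_schedule_traced;

static void preempt_schedule_model(void);

static void preempt_enable_model(void)
{
    preempt_count--;
    if (preempt_count == 0 && need_resched)
        preempt_schedule_model();
}

/* Function tracer entry hook: uses preempt_disable/enable internally. */
static void tracer_hook(void)
{
    if (++depth > MAX_DEPTH) {
        runaway = true;            /* would be unbounded recursion */
        depth--;
        return;
    }
    preempt_count++;               /* tracer's preempt_disable() */
    preempt_enable_model();        /* may call preempt_schedule() */
    depth--;
}

static void schedule_model(void)
{
    tracer_hook();                 /* schedule() itself is still traced */
    need_resched = false;
}

static void preempt_schedule_model(void)
{
    if (preempt_schedule_traced)
        tracer_hook();             /* fires before the count is raised */
    preempt_count++;               /* notrace add_preempt_count(PREEMPT_ACTIVE) */
    schedule_model();
    preempt_count--;               /* notrace sub_preempt_count(PREEMPT_ACTIVE) */
}

/* Enter one traced function with NEED_RESCHED set. Returns true if the
 * call chain finished without runaway recursion. */
static bool trace_one_call(bool traced)
{
    need_resched = true;
    preempt_count = 0;
    depth = 0;
    runaway = false;
    preempt_schedule_traced = traced;
    tracer_hook();
    return !runaway;
}
```

In this model trace_one_call(true) hits the depth limit, while trace_one_call(false) terminates: the notrace add_preempt_count() raises the count before the traced schedule() runs, so the tracer's own preempt_enable() no longer reaches preempt_schedule().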
    • perf: Fix crash in swevents · c6df8d5a
      Peter Zijlstra authored
      Frederic reported that because swevents handling doesn't disable IRQs
      anymore, we can get a recursion of perf_adjust_period(), once from
      overflow handling and once from the tick.
      
      If both call ->disable, we get a double hlist_del_rcu() and trigger
      a LIST_POISON2 dereference.
      
      Since we don't actually need to stop/start a swevent to re-program
      the hardware (there is no hardware to program), simply nop out these
      callbacks for the swevent pmu.
      Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1275557609.27810.35218.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
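To illustrate the double hlist_del_rcu(), here is a hedged userspace model. POISON stands in for LIST_POISON2, and struct pmu_model with its disable/enable callbacks is a hypothetical simplification of the swevent pmu; instead of dereferencing the poison value, the sketch records that it would have.

```c
#include <stdbool.h>
#include <stddef.h>

#define POISON ((struct node **)0x200)  /* stands in for LIST_POISON2 */

struct node { struct node *next; struct node **pprev; };

static bool poison_hit;

/* Unlink n; a second call on the same node finds the poison value. */
static void hlist_del_model(struct node *n)
{
    if (n->pprev == POISON) {
        poison_hit = true;      /* the kernel would dereference it and oops */
        return;
    }
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->pprev = POISON;
}

struct event;
struct pmu_model {
    void (*disable)(struct event *);
    void (*enable)(struct event *);
};
struct event { struct node entry; const struct pmu_model *pmu; };

static void unlink_disable(struct event *e) { hlist_del_model(&e->entry); }
static void nop_callback(struct event *e) { (void)e; }

static const struct pmu_model unlinking_pmu = { unlink_disable, nop_callback };
static const struct pmu_model nop_pmu = { nop_callback, nop_callback };

/* Overflow handling and the tick both adjust the period and call
 * ->disable before either calls ->enable. Returns true if that is safe. */
static bool adjust_period_twice(const struct pmu_model *pmu)
{
    struct node *head;
    struct event ev = { { NULL, NULL }, pmu };

    head = &ev.entry;           /* one-element hash chain */
    ev.entry.pprev = &head;
    poison_hit = false;
    ev.pmu->disable(&ev);
    ev.pmu->disable(&ev);
    return !poison_hit;
}
```

adjust_period_twice(&unlinking_pmu) reports the poisoned second deletion, while adjust_period_twice(&nop_pmu), mirroring the nop callbacks, is safe to call twice.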
  5. 02 Jun, 2010 1 commit
  6. 01 Jun, 2010 3 commits
    • perf buildid-list: Fix --with-hits event processing · b5c874f1
      Arnaldo Carvalho de Melo authored
      When we use plain 'perf buildid-list' we use only what is in the buildid
      table in the perf.data header. Those entries have absolute pathnames
      because at 'perf record' time we used __perf_session__process_events,
      which doesn't set up the path shortening code in map__new() that runs
      when symbol_conf.full_paths is false (the default).
      
      On the other hand, when we use 'perf buildid-list --with-hits' we
      process all the events using perf_session__process_events, adding
      entries to the global DSO list _after_ removing the current directory
      from the DSO name, for presentation purposes.
      
      Because of that we end up having two entries in the DSO list when
      recording events for binaries using relative pathnames.
      
      Fix it minimally by setting symbol_conf.full_paths to true when marking
      the DSOs with hits in 'perf buildid-list --with-hits', as used by 'perf
      archive'.
      
      The right long-term fix is to shorten the path only at presentation time.
      This will be done for 2.6.36.
      Reported-by: Stephane Eranian <eranian@google.com>
      Tested-by: Stephane Eranian <eranian@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <20100601183837.GC4093@ghostprotocols.net>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
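The duplicate DSO entries can be modeled with a toy list keyed by name. map_new_model() and dso_findnew() here are hypothetical simplifications of map__new() and the DSO lookup, assuming that shortening simply strips the current directory prefix when full paths are not requested.

```c
#include <stdbool.h>
#include <string.h>

/* Toy global DSO list (hypothetical, not perf's data structures). */
#define MAX_DSOS 8
static char dsos[MAX_DSOS][64];
static int nr_dsos;

/* Add name to the list unless an identical entry already exists. */
static void dso_findnew(const char *name)
{
    for (int i = 0; i < nr_dsos; i++)
        if (strcmp(dsos[i], name) == 0)
            return;
    if (nr_dsos < MAX_DSOS) {
        strncpy(dsos[nr_dsos], name, sizeof(dsos[0]) - 1);
        dsos[nr_dsos][sizeof(dsos[0]) - 1] = '\0';
        nr_dsos++;
    }
}

/* Strip "cwd/" from the front of path unless full_paths is set. */
static void map_new_model(const char *path, const char *cwd, bool full_paths)
{
    size_t len = strlen(cwd);

    if (!full_paths && strncmp(path, cwd, len) == 0 && path[len] == '/')
        path += len + 1;
    dso_findnew(path);
}

/* One binary seen twice: from the build-id table and from an mmap event. */
static int count_dsos(bool full_paths)
{
    const char *cwd = "/home/user";
    const char *binary = "/home/user/bin/app";

    nr_dsos = 0;
    dso_findnew(binary);                      /* absolute, from the header */
    map_new_model(binary, cwd, full_paths);   /* from event processing */
    return nr_dsos;
}
```

count_dsos(false) ends with two entries for the same binary, one absolute and one shortened, while count_dsos(true) keeps a single entry, which is the effect of forcing symbol_conf.full_paths for --with-hits.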
    • perf scripts python: Give field dict to unhandled callback · c0251485
      Pierre Tardy authored
      The trace_unhandled() callback does not allow access to the event fields;
      this patch resolves the problem.
      
      It can also be used as a more pythonic and flexible way for script writers
      to demux event types.
      
      This will, for example, greatly simplify pytimechart event demuxing.
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <1275340329-2397-1-git-send-email-tardyp@gmail.com>
      Signed-off-by: Pierre Tardy <tardyp@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf hist: fix objdump output parsing · 75d9ef17
      Konstantin Stepanyuk authored
      hist_entry__annotate() runs objdump with the -S option, so the output may
      contain lines of any format. If a line starts with a colon, strtoull()
      returns 0 and the calculated offset will be negative. This causes perf
      annotate to segfault.
      
      Make sure that strtoull() has parsed at least one digit.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Konstantin Stepanyuk <konstantin.stepanyuk@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 31 May, 2010 11 commits
    • perf-record: Check correct pid when forking · 2fb750e8
      Borislav Petkov authored
      When forking the child to be traced, we should check the correct
      return value from fork() and not a local variable which is otherwise
      unused.
      Signed-off-by: Borislav Petkov <bp@alien8.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      LKML-Reference: <20100531211818.GA30175@liondog.tnic>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
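A minimal sketch of the pattern the fix restores: test the value that fork() actually returned, held in the variable that received it. run_child() is a hypothetical helper; perf record's child would exec the workload instead of calling _exit().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code` and return the exit status
 * observed by the parent, or -1 on error. */
static int run_child(int code)
{
    pid_t pid = fork();          /* check this value, not another local */

    if (pid < 0)
        return -1;               /* fork() failed */
    if (pid == 0)
        _exit(code);             /* child: would exec the workload here */

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```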
    • perf: Do the comm inheritance per thread in event__process_task · dd833d71
      Frederic Weisbecker authored
      event__process_task() doesn't propagate the comm copy on clone,
      but only on process fork. So we lose all the tid:comm resolution
      for tasks that aren't a main process thread.
      
      Propagate the per-thread granularity to event__process_task for
      pid resolution.
      
      This fixes various unresolved pids in perf sched, especially when
      we trace multithreaded processes. The problem is quickly reproducible
      with the messaging benchmark in its multithreaded mode ("-t"):
      
      	perf sched record perf bench sched messaging -t
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
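A toy model of the difference, assuming a table of comms indexed by tid. process_fork_by_pid() approximates the old behavior, where only a new process copies its parent's comm, and process_fork_by_tid() the per-thread inheritance; both are hypothetical and are not perf's event__process_task().

```c
#include <string.h>

/* Toy thread table indexed by tid (hypothetical). */
#define MAX_TID 64
static char comm[MAX_TID][16];

static void set_comm(int tid, const char *name)
{
    strncpy(comm[tid], name, sizeof(comm[0]) - 1);
    comm[tid][sizeof(comm[0]) - 1] = '\0';
}

/* Old behavior: only a new process (pid != ppid) inherits a comm. */
static void process_fork_by_pid(int pid, int tid, int ppid, int ptid)
{
    (void)tid;
    (void)ptid;
    if (pid != ppid)
        set_comm(pid, comm[ppid]);
}

/* Per-thread behavior: every new thread inherits its parent thread's comm. */
static void process_fork_by_tid(int pid, int tid, int ppid, int ptid)
{
    (void)pid;
    (void)ppid;
    set_comm(tid, comm[ptid]);
}

static const char *resolve(int tid)
{
    return comm[tid][0] ? comm[tid] : "<unknown>";
}

/* Main thread 100 of "messaging" clones thread 101; resolve tid 101. */
static const char *scenario(int per_thread)
{
    memset(comm, 0, sizeof(comm));
    set_comm(100, "messaging");
    if (per_thread)
        process_fork_by_tid(100, 101, 100, 100);
    else
        process_fork_by_pid(100, 101, 100, 100);
    return resolve(101);
}
```

In this model scenario(0) leaves the cloned thread as "<unknown>", while scenario(1) resolves it to "messaging".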
    • perf: Use event__process_task from perf sched · af64865b
      Frederic Weisbecker authored
      perf sched uses event__process_comm(), which means it can resolve
      comms from:
      
      - tasks that have exec'ed (kernel comm events)
      - tasks that were running when perf record started the actual
        recording (synthesized comm events)
      
      But perf sched can't resolve the pids of tasks that were created
      after the recording started.
      
      To solve this, we need to inherit the comms on fork events using
      event__process_task().
      
      This fixes various unresolved pids in perf sched, easily visible
      with:
      	perf sched record perf bench sched messaging
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
    • perf: Process comm events by tid · 13eb04fd
      Frederic Weisbecker authored
      When we synthesize the existing running tasks through procfs,
      we walk through every thread of a process, queuing one comm
      event per tid.
      
      But then at report time, event__process_comm() only creates and
      sets the comm with per-process granularity. This is the right
      thing for comm events that came from the kernel, as they are
      only created on exec; sub-threads then inherit their comm
      from fork events. But that doesn't work with our synthesized
      comm events taken from procfs, since for those the per-thread
      granularity is carried by the comm events themselves.
      
      Hence we need event__process_comm() to work with the tid rather
      than the pid. It won't change anything for comm events coming
      from the kernel, but it will fix the synthesized ones.
      
      Before:
      
      	$ ./perf report -D | grep COMM | grep firefox
      
      	0x2c7b8 [0x18]: PERF_RECORD_COMM: firefox:5297
      	0x2c7d0 [0x18]: PERF_RECORD_COMM: firefox:5297
      	0x2c7e8 [0x18]: PERF_RECORD_COMM: firefox:5297
      	0x2c800 [0x18]: PERF_RECORD_COMM: firefox:5297
      	0x2c818 [0x18]: PERF_RECORD_COMM: firefox:5297
      	0x2c830 [0x18]: PERF_RECORD_COMM: firefox:5297
      
      After:
      	$ ./perf report -D | grep COMM | grep firefox
      
      	0x2c7b8 [0x18]: PERF_RECORD_COMM: firefox:5297
      	0x2c7d0 [0x18]: PERF_RECORD_COMM: firefox:5299
      	0x2c7e8 [0x18]: PERF_RECORD_COMM: firefox:5300
      	0x2c800 [0x18]: PERF_RECORD_COMM: firefox:5308
      	0x2c818 [0x18]: PERF_RECORD_COMM: firefox:5309
      	0x2c830 [0x18]: PERF_RECORD_COMM: firefox:5312
      
      This fixes various unresolved pids in perf sched.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
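Keying comm events by tid rather than by pid can be sketched as follows. The tid-indexed table and process_comm() are hypothetical; the three events mimic the synthesized firefox events in the sample output above, sharing one pid but carrying distinct tids.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical tid -> comm table. */
#define MAX_TID 8192
static const char *comm_of[MAX_TID];

struct comm_event { int pid, tid; const char *comm; };

static void process_comm(const struct comm_event *ev, bool by_tid)
{
    comm_of[by_tid ? ev->tid : ev->pid] = ev->comm;
}

/* Feed synthesized events for one 3-thread process; count resolvable tids. */
static int resolved_threads(bool by_tid)
{
    static const struct comm_event evs[] = {
        { 5297, 5297, "firefox" },
        { 5297, 5299, "firefox" },
        { 5297, 5300, "firefox" },
    };
    const size_t n_evs = sizeof(evs) / sizeof(evs[0]);
    int resolved = 0;

    memset(comm_of, 0, sizeof(comm_of));
    for (size_t i = 0; i < n_evs; i++)
        process_comm(&evs[i], by_tid);
    for (size_t i = 0; i < n_evs; i++)
        if (comm_of[evs[i].tid])
            resolved++;
    return resolved;
}
```

resolved_threads(false) resolves only the main thread; resolved_threads(true) resolves all three.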
    • blktrace: Fix new kernel-doc warnings · 546cf44a
      Randy Dunlap authored
      Fix blktrace.c kernel-doc warnings:
       Warning(kernel/trace/blktrace.c:858): No description found for parameter 'ignore'
       Warning(kernel/trace/blktrace.c:890): No description found for parameter 'ignore'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100529114507.c466fc1e.randy.dunlap@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events: Fix unincremented buffer base on partial copy · 74048f89
      Frederic Weisbecker authored
      If a sample crosses a page boundary, the copy will be made in
      more than one step. However, we forgot to advance the source
      offset for the next copy, leading to unexpected double copies
      that completely mess up the traces.
      
      This fixes various kinds of bad traces that have irrelevant
      data inside, as an example:
      
      	geany-4979  [001]  5758.077775: sched_switch: prev_comm=! prev_pid=121
      		prev_prio=0 prev_state=S|D|Z|X|x ==> next_comm= next_pid=7497072
      		next_prio=0
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1274988898-5639-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
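A minimal sketch of the corrected multi-step copy, assuming a toy buffer made of fixed-size pages. PAGE_SZ and NR_PAGES are illustrative values and output_copy() is not the kernel's function; the point is that the source pointer must advance by the same amount as the destination offset on every step.

```c
#include <stddef.h>
#include <string.h>

/* Toy ring buffer split into fixed-size pages (hypothetical layout). */
#define PAGE_SZ 8
#define NR_PAGES 4

static unsigned char pages[NR_PAGES][PAGE_SZ];

/* Copy len bytes to byte offset off, splitting at page boundaries. */
static void output_copy(size_t off, const void *src, size_t len)
{
    const unsigned char *buf = src;

    while (len) {
        size_t page = (off / PAGE_SZ) % NR_PAGES;
        size_t in_page = off % PAGE_SZ;
        size_t size = PAGE_SZ - in_page;

        if (size > len)
            size = len;
        memcpy(&pages[page][in_page], buf, size);
        off += size;
        len -= size;
        buf += size;   /* the step the buggy version was missing */
    }
}
```

For example, copying the 10 bytes "ABCDEFGHIJ" at offset 5 places "ABC" at the end of page 0 and "DEFGHIJ" at the start of page 1; without the buf advance, page 1 would receive "ABCDEFG" again.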
    • perf_events: Fix event scheduling issues introduced by transactional API · 90151c35
      Stephane Eranian authored
      The transactional API patch between the generic and model-specific
      code introduced several important bugs with event scheduling, at
      least on X86. If you had pinned events, e.g., watchdog, and were
      over-committing the PMU, you would get bogus counts. The bug was
      showing up on Intel CPUs because events would move around more
      often than on AMD. But the problem also existed on AMD, though
      it was harder to expose.
      
      The issues were:
      
       - group_sched_in() was missing a cancel_txn() in the error path
      
       - cpuc->n_added was not properly maintained, leading to missing
         actions in hw_perf_enable(), i.e., n_running being 0. You cannot
         update n_added until you know the transaction has succeeded. In
         case of failed transaction n_added was not adjusted back.
      
       - in case of failed transactions, event_sched_out() was called
         and eventually invoked x86_disable_event() to touch the HW reg.
         But with transactions, on X86, event_sched_in() does not touch
         HW registers, it simply collects events into a list. Thus, you
         could end up calling x86_disable_event() on a counter which
         did not correspond to the current event when idx != -1.
      
      The patch modifies the generic and X86 code to avoid all those problems.
      
      First, we keep track of the number of events added last. In case the
      transaction fails, we subtract them from n_added. This approach is
      necessary (as opposed to delaying updates to n_added) because not all
      event updates use the transaction API, e.g., single events.
      
      Second, we encapsulate the event_sched_in() and event_sched_out() in
      group_sched_in() inside the transaction. That makes the operations
      symmetrical and you can also detect that you are inside a transaction
      and skip the HW reg access by checking cpuc->group_flag.
      
      With this patch, you can now overcommit the PMU even with pinned
      system-wide events present and still get valid counts.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1274796225.5882.1389.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
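The n_added bookkeeping can be sketched with a toy per-CPU structure. struct cpuc_model, n_txn and the txn helpers are hypothetical simplifications of the X86 code; the idea shown is that events added inside a failed transaction are subtracted again on cancel, while events added outside a transaction are left alone.

```c
#include <stdbool.h>

/* Toy per-CPU PMU state (hypothetical). */
#define NR_COUNTERS 4

struct cpuc_model {
    int n_events;   /* events collected for this CPU */
    int n_added;    /* events added since the last enable */
    int n_txn;      /* events added inside the current transaction */
    bool in_txn;
};

static void start_txn(struct cpuc_model *c) { c->in_txn = true; c->n_txn = 0; }
static void commit_txn(struct cpuc_model *c) { c->in_txn = false; }

static bool add_event(struct cpuc_model *c)
{
    if (c->n_events >= NR_COUNTERS)
        return false;            /* PMU over-committed */
    c->n_events++;
    c->n_added++;
    if (c->in_txn)
        c->n_txn++;              /* remember what this transaction added */
    return true;
}

/* Roll back everything the failed transaction added. */
static void cancel_txn(struct cpuc_model *c)
{
    c->n_added -= c->n_txn;
    c->n_events -= c->n_txn;
    c->n_txn = 0;
    c->in_txn = false;
}

/* Schedule a group of `size` events all-or-nothing. */
static bool group_sched_in(struct cpuc_model *c, int size)
{
    start_txn(c);
    for (int i = 0; i < size; i++) {
        if (!add_event(c)) {
            cancel_txn(c);       /* the error path must cancel */
            return false;
        }
    }
    commit_txn(c);
    return true;
}
```

For instance, with one pinned event already scheduled on a 4-counter PMU, a 4-event group fails and leaves n_added at 1, and a following 3-event group brings it to 4.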
    • perf_events, trace: Fix perf_trace_destroy(), mutex went missing · 2e97942f
      Peter Zijlstra authored
      Steve spotted I forgot to do the destroy under event_mutex.
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1274451913.1674.1707.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events, trace: Fix probe unregister race · 3771f077
      Peter Zijlstra authored
      tracepoint_probe_unregister() does not synchronize against the probe
      callbacks, so do that explicitly. This properly serializes the callbacks
      and the free of the data used therein.
      
      Also, use this_cpu_ptr() where possible.
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1274438476.1674.1702.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events: Fix races in group composition · 8a49542c
      Peter Zijlstra authored
      Group siblings don't pin each other or the parent, so when we destroy
      events we must make sure to clean up all cross-referencing pointers.
      
      In particular, for destruction of a group leader we must be able to
      find all its siblings and remove their reference to it.
      
      This means that detaching an event from its context must not detach it
      from the group, otherwise we can end up failing to clear all pointers.
      
      Solve this by clearly separating the attachment to a context and
      attachment to a group, and keep the group composed until we destroy
      the events.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
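A hedged sketch of keeping context attachment and group attachment separate. struct event here is a hypothetical toy, not struct perf_event; detach_ctx_and_group() approximates the old coupling in which leaving the context also removed the event from its leader's sibling list, and destroy_leader() resets each sibling to lead itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy event with separate context and group links (hypothetical). */
#define MAX_SIBLINGS 4

struct event {
    struct event *group_leader;
    struct event *siblings[MAX_SIBLINGS];
    int nr_siblings;
    bool attached_to_ctx;
};

static void attach_group(struct event *leader, struct event *e)
{
    e->group_leader = leader;
    leader->siblings[leader->nr_siblings++] = e;
}

/* New scheme: leaving the context keeps the group intact. */
static void detach_ctx(struct event *e) { e->attached_to_ctx = false; }

/* Old coupling: also drops e from the leader's list, but e still
 * points at the leader. */
static void detach_ctx_and_group(struct event *e)
{
    struct event *l = e->group_leader;

    e->attached_to_ctx = false;
    for (int i = 0; i < l->nr_siblings; i++) {
        if (l->siblings[i] == e) {
            l->siblings[i] = l->siblings[--l->nr_siblings];
            break;
        }
    }
}

/* Destroying the leader clears every sibling reference it can find. */
static void destroy_leader(struct event *leader)
{
    for (int i = 0; i < leader->nr_siblings; i++)
        leader->siblings[i]->group_leader = leader->siblings[i];
    leader->nr_siblings = 0;
    detach_ctx(leader);
}

/* Returns true if no sibling is left pointing at the destroyed leader. */
static bool teardown(bool separate_group)
{
    struct event leader = { 0 }, a = { 0 }, b = { 0 };

    leader.group_leader = &leader;
    attach_group(&leader, &a);
    attach_group(&leader, &b);
    if (separate_group) {
        detach_ctx(&a);
        detach_ctx(&b);
    } else {
        detach_ctx_and_group(&a);
        detach_ctx_and_group(&b);
    }
    destroy_leader(&leader);
    return a.group_leader != &leader && b.group_leader != &leader;
}
```

teardown(false) leaves both siblings pointing at the destroyed leader; teardown(true) lets destroy_leader() find and reset them.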
    • perf_events: Fix races and clean up perf_event and perf_mmap_data interaction · ac9721f3
      Peter Zijlstra authored
      In order to move toward separate buffer objects, rework the whole
      perf_mmap_data construct to be a more self-sufficient entity, one
      with its own lifetime rules.
      
      This greatly sanitizes the whole output redirection code, which
      was riddled with bugs and races.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
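One way to picture a buffer "with its own lifetime rules" is a reference count held by every user of it. This is a single-threaded sketch with hypothetical names; the real perf_mmap_data object would additionally need atomic counts and RCU, which are omitted here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy refcounted output buffer (hypothetical). */
struct buffer {
    int refcount;
    bool *freed_flag;   /* lets a caller observe the free in this sketch */
};

static struct buffer *buffer_alloc(bool *freed_flag)
{
    struct buffer *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->refcount = 1;                /* reference owned by the creator */
    b->freed_flag = freed_flag;
    *freed_flag = false;
    return b;
}

static struct buffer *buffer_get(struct buffer *b)
{
    b->refcount++;
    return b;
}

static void buffer_put(struct buffer *b)
{
    if (--b->refcount == 0) {
        *b->freed_flag = true;
        free(b);
    }
}

struct event_model { struct buffer *output; };

/* Redirect e's output to target (or detach with NULL): take the new
 * reference before dropping the old one. */
static void set_output(struct event_model *e, struct buffer *target)
{
    struct buffer *old = e->output;

    e->output = target ? buffer_get(target) : NULL;
    if (old)
        buffer_put(old);
}
```

The buffer is freed only when the last event that writes into it lets go, regardless of which event created it.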
  8. 30 May, 2010 16 commits