1. 21 Jul, 2011 9 commits
  2. 15 Jul, 2011 12 commits
  3. 14 Jul, 2011 7 commits
    • Masami Hiramatsu's avatar
      tracing/kprobes: Rename probe_* to trace_probe_* · 7143f168
      Masami Hiramatsu authored
      Rename probe_* to trace_probe_* for avoiding namespace
      confliction. This also fixes improper names of find_probe_event()
      and cleanup_all_probes() to find_trace_probe() and
      release_all_trace_probes() respectively.
      Signed-off-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20110627072636.6528.60374.stgit@fedora15Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      7143f168
    • Cyrill Gorcunov's avatar
      perf, x86: P4 PMU - Introduce event alias feature · f9129870
      Cyrill Gorcunov authored
      Instead of hw_nmi_watchdog_set_attr() weak function
      and appropriate x86_pmu::hw_watchdog_set_attr() call
      we introduce even alias mechanism which allow us
      to drop this routines completely and isolate quirks
      of Netburst architecture inside P4 PMU code only.
      
      The main idea remains the same though -- to allow
      nmi-watchdog and perf top run simultaneously.
      
      Note the aliasing mechanism applies to generic
      PERF_COUNT_HW_CPU_CYCLES event only because arbitrary
      event (say passed as RAW initially) might have some
      additional bits set inside ESCR register changing
      the behaviour of event and we can't guarantee anymore
      that alias event will give the same result.
      
      P.S. Thanks a huge to Don and Steven for for testing
           and early review.
      Acked-by: default avatarDon Zickus <dzickus@redhat.com>
      Tested-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Stephane Eranian <eranian@google.com>
      CC: Lin Ming <ming.m.lin@intel.com>
      CC: Arnaldo Carvalho de Melo <acme@redhat.com>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20110708201712.GS23657@sunSigned-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      f9129870
    • Steven Rostedt's avatar
      tracing: Have dynamic size event stack traces · 4a9bd3f1
      Steven Rostedt authored
      Currently the stack trace per event in ftace is only 8 frames.
      This can be quite limiting and sometimes useless. Especially when
      the "ignore frames" is wrong and we also use up stack frames for
      the event processing itself.
      
      Change this to be dynamic by adding a percpu buffer that we can
      write a large stack frame into and then copy into the ring buffer.
      
      For interrupts and NMIs that come in while another event is being
      process, will only get to use the 8 frame stack. That should be enough
      as the task that it interrupted will have the full stack frame anyway.
      Requested-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      4a9bd3f1
    • Sonny Rao's avatar
      perf: Robustify proc and debugfs file recording · 259032bf
      Sonny Rao authored
      While attempting to create a timechart of boot up I found perf didn't
      tolerate modules being loaded/unloaded.  This patch fixes this by
      reading the file once and then writing the size read at the correct
      point in the file.  It also simplifies the code somewhat.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: default avatarSonny Rao <sonnyrao@chromium.org>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Link: http://lkml.kernel.org/r/10011.1310614483@neuling.orgSigned-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      259032bf
    • Steven Rostedt's avatar
      ftrace: Fix dynamic selftest failure on some archs · 6331c28c
      Steven Rostedt authored
      Archs that do not implement CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, will
      fail the dynamic ftrace selftest.
      
      The function tracer has a quick 'off' variable that will prevent
      the call back functions from being called. This variable is called
      function_trace_stop. In x86, this is implemented directly in the mcount
      assembly, but for other archs, an intermediate function is used called
      ftrace_test_stop_func().
      
      In dynamic ftrace, the function pointer variable ftrace_trace_function is
      used to update the caller code in the mcount caller. But for archs that
      do not have CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST set, it only calls
      ftrace_test_stop_func() instead, which in turn calls __ftrace_trace_function.
      
      When more than one ftrace_ops is registered, the function it calls is
      ftrace_ops_list_func(), which will iterate over all registered ftrace_ops
      and call the callbacks that have their hash matching.
      
      The issue happens when two ftrace_ops are registered for different functions
      and one is then unregistered. The __ftrace_trace_function is then pointed
      to the remaining ftrace_ops callback function directly. This mean it will
      be called for all functions that were registered to trace by both ftrace_ops
      that were registered.
      
      This is not an issue for archs with CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST,
      because the update of ftrace_trace_function doesn't happen until after all
      functions have been updated, and then the mcount caller is updated. But
      for those archs that do use the ftrace_test_stop_func(), the update is
      immediate.
      
      The dynamic selftest fails because it hits this situation, and the
      ftrace_ops that it registers fails to only trace what it was suppose to
      and instead traces all other functions.
      
      The solution is to delay the setting of __ftrace_trace_function until
      after all the functions have been updated according to the registered
      ftrace_ops. Also, function_trace_stop is set during the update to prevent
      function tracing from calling code that is caused by the function tracer
      itself.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      6331c28c
    • Steven Rostedt's avatar
      ftrace: Update filter when tracing enabled in set_ftrace_filter() · 072126f4
      Steven Rostedt authored
      Currently, if set_ftrace_filter() is called when the ftrace_ops is
      active, the function filters will not be updated. They will only be updated
      when tracing is disabled and re-enabled.
      
      Update the functions immediately during set_ftrace_filter().
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      072126f4
    • Steven Rostedt's avatar
      ftrace: Balance records when updating the hash · 41fb61c2
      Steven Rostedt authored
      Whenever the hash of the ftrace_ops is updated, the record counts
      must be balance. This requires disabling the records that are set
      in the original hash, and then enabling the records that are set
      in the updated hash.
      
      Moving the update into ftrace_hash_move() removes the bug where the
      hash was updated but the records were not, which results in ftrace
      triggering a warning and disabling itself because the ftrace_ops filter
      is updated while the ftrace_ops was registered, and then the failure
      happens when the ftrace_ops is unregistered.
      
      The current code will not trigger this bug, but new code will.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      41fb61c2
  4. 11 Jul, 2011 1 commit
  5. 08 Jul, 2011 2 commits
    • Steven Rostedt's avatar
      ftrace: Do not disable interrupts for modules in mcount update · 4376cac6
      Steven Rostedt authored
      When I mounted an NFS directory, it caused several modules to be loaded. At the
      time I was running the preemptirqsoff tracer, and it showed the following
      output:
      
      # tracer: preemptirqsoff
      #
      # preemptirqsoff latency trace v1.1.5 on 2.6.33.9-rt30-mrg-test
      # --------------------------------------------------------------------
      # latency: 1177 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
      #    -----------------
      #    | task: modprobe-19370 (uid:0 nice:0 policy:0 rt_prio:0)
      #    -----------------
      #  => started at: ftrace_module_notify
      #  => ended at:   ftrace_module_notify
      #
      #
      #                  _------=> CPU#
      #                 / _-----=> irqs-off
      #                | / _----=> need-resched
      #                || / _---=> hardirq/softirq
      #                ||| / _--=> preempt-depth
      #                |||| /_--=> lock-depth
      #                |||||/     delay
      #  cmd     pid   |||||| time  |   caller
      #     \   /      ||||||   \   |   /
      modprobe-19370   3d....    0us!: ftrace_process_locs <-ftrace_module_notify
      modprobe-19370   3d.... 1176us : ftrace_process_locs <-ftrace_module_notify
      modprobe-19370   3d.... 1178us : trace_hardirqs_on <-ftrace_module_notify
      modprobe-19370   3d.... 1178us : <stack trace>
       => ftrace_process_locs
       => ftrace_module_notify
       => notifier_call_chain
       => __blocking_notifier_call_chain
       => blocking_notifier_call_chain
       => sys_init_module
       => system_call_fastpath
      
      That's over 1ms that interrupts are disabled on a Real-Time kernel!
      
      Looking at the cause (being the ftrace author helped), I found that the
      interrupts are disabled before the code modification of mcounts into nops. The
      interrupts only need to be disabled on start up around this code, not when
      modules are being loaded.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      4376cac6
    • Steven Rostedt's avatar
      tracing: Still trace filtered irq functions when irq trace is disabled · e4a3f541
      Steven Rostedt authored
      If a function is set to be traced by the set_graph_function, but the
      option funcgraph-irqs is zero, and the traced function happens to be
      called from a interrupt, it will not be traced.
      
      The point of funcgraph-irqs is to not trace interrupts when we are
      preempted by an irq, not to not trace functions we want to trace that
      happen to be *in* a irq.
      
      Luckily the current->trace_recursion element is perfect to add a flag
      to help us be able to trace functions within an interrupt even when
      we are not tracing interrupts that preempt the trace.
      Reported-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Tested-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e4a3f541
  6. 07 Jul, 2011 4 commits
    • Steven Rostedt's avatar
      tracing, x86/irq: Do not trace arch_local_{*,irq_*}() functions · e08fbb78
      Steven Rostedt authored
      I triggered a triple fault with gcc 4.5.1 because it did not
      honor the inline annotation to arch_local_save_flags() function
      and that function was added to the pool of functions traced by
      the function tracer.
      
      When preempt_schedule() called arch_local_save_flags() (called
      by irqs_disabled()), it was traced, but the first thing the
      function tracer does is disable preemption. When it enables
      preemption, the NEED_RESCHED flag will not have been cleared and
      the preemption check will trigger the call to preempt_schedule()
      again.
      
      Although the dynamic function tracer crashed immediately, the
      static version of the function tracer (CONFIG_DYNAMIC_FTRACE is
      not set) actually was able to show where the problem was.
      
       swapper-1       3.N.. 103885us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103886us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103886us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103887us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103887us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103888us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103888us : arch_local_save_flags <-preempt_schedule
      
      It went on for a while before it triple faulted with a corrupted
      stack.
      
      The arch_local_save_flags and arch_local_irq_* functions should
      not be traced. Even though they are marked as inline, gcc may
      still make them a function and enable tracing of them.
      
      The simple solution is to just mark them as notrace. I had to
      add the <linux/types.h> for this file to include the notrace
      tag.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20110702033852.733414762@goodmis.orgSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      e08fbb78
    • Steven Rostedt's avatar
      ftrace: Fix regression of :mod:module function enabling · 43dd61c9
      Steven Rostedt authored
      The new code that allows different utilities to pick and choose
      what functions they trace broke the :mod: hook that allows users
      to trace only functions of a particular module.
      
      The reason is that the :mod: hook bypasses the hash that is setup
      to allow individual users to trace their own functions and uses
      the global hash directly. But if the global hash has not been
      set up, it will cause a bug:
      
      echo '*:mod:radeon' > /sys/kernel/debug/set_ftrace_filter
      
      produces:
      
       [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
       [drm:radeon_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
       BUG: unable to handle kernel paging request at ffffffff8160ec90
       IP: [<ffffffff810d9136>] add_hash_entry+0x66/0xd0
       PGD 1a05067 PUD 1a09063 PMD 80000000016001e1
       Oops: 0003 [#1] SMP Jul  7 04:02:28 phyllis kernel: [55303.858604] CPU 1
       Modules linked in: cryptd aes_x86_64 aes_generic binfmt_misc rfcomm bnep ip6table_filter hid radeon r8169 ahci libahci mii ttm drm_kms_helper drm video i2c_algo_bit intel_agp intel_gtt
      
       Pid: 10344, comm: bash Tainted: G        WC  3.0.0-rc5 #1 Dell Inc. Inspiron N5010/0YXXJJ
       RIP: 0010:[<ffffffff810d9136>]  [<ffffffff810d9136>] add_hash_entry+0x66/0xd0
       RSP: 0018:ffff88003a96bda8  EFLAGS: 00010246
       RAX: ffff8801301735c0 RBX: ffffffff8160ec80 RCX: 0000000000306ee0
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880137c92940
       RBP: ffff88003a96bdb8 R08: ffff880137c95680 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff81c9df78
       R13: ffff8801153d1000 R14: 0000000000000000 R15: 0000000000000000
       FS: 00007f329c18a700(0000) GS:ffff880137c80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffffffff8160ec90 CR3: 000000003002b000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process bash (pid: 10344, threadinfo ffff88003a96a000, task ffff88012fcfc470)
       Stack:
        0000000000000fd0 00000000000000fc ffff88003a96be38 ffffffff810d92f5
        ffff88011c4c4e00 ffff880000000000 000000000b69f4d0 ffffffff8160ec80
        ffff8800300e6f06 0000000081130295 0000000000000282 ffff8800300e6f00
       Call Trace:
        [<ffffffff810d92f5>] match_records+0x155/0x1b0
        [<ffffffff810d940c>] ftrace_mod_callback+0xbc/0x100
        [<ffffffff810dafdf>] ftrace_regex_write+0x16f/0x210
        [<ffffffff810db09f>] ftrace_filter_write+0xf/0x20
        [<ffffffff81166e48>] vfs_write+0xc8/0x190
        [<ffffffff81167001>] sys_write+0x51/0x90
        [<ffffffff815c7e02>] system_call_fastpath+0x16/0x1b
       Code: 48 8b 33 31 d2 48 85 f6 75 33 49 89 d4 4c 03 63 08 49 8b 14 24 48 85 d2 48 89 10 74 04 48 89 42 08 49 89 04 24 4c 89 60 08 31 d2
       RIP [<ffffffff810d9136>] add_hash_entry+0x66/0xd0
        RSP <ffff88003a96bda8>
       CR2: ffffffff8160ec90
       ---[ end trace a5d031828efdd88e ]---
      Reported-by: default avatarBrian Marete <marete@toshnix.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      43dd61c9
    • Steven Rostedt's avatar
      tracing: Have "enable" file use refcounts like the "filter" file · 40ee4dff
      Steven Rostedt authored
      The "enable" file for the event system can be removed when a module
      is unloaded and the event system only has events from that module.
      As the event system nr_events count goes to zero, it may be freed
      if its ref_count is also set to zero.
      
      Like the "filter" file, the "enable" file may be opened by a task and
      referenced later, after a module has been unloaded and the events for
      that event system have been removed.
      
      Although the "filter" file referenced the event system structure,
      the "enable" file only references a pointer to the event system
      name. Since the name is freed when the event system is removed,
      it is possible that an access to the "enable" file may reference
      a freed pointer.
      
      Update the "enable" file to use the subsystem_open() routine that
      the "filter" file uses, to keep a reference to the event system
      structure while the "enable" file is opened.
      
      Cc: <stable@kernel.org>
      Reported-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      40ee4dff
    • Steven Rostedt's avatar
      tracing: Fix bug when reading system filters on module removal · e9dbfae5
      Steven Rostedt authored
      The event system is freed when its nr_events is set to zero. This happens
      when a module created an event system and then later the module is
      removed. Modules may share systems, so the system is allocated when
      it is created and freed when the modules are unloaded and all the
      events under the system are removed (nr_events set to zero).
      
      The problem arises when a task opened the "filter" file for the
      system. If the module is unloaded and it removed the last event for
      that system, the system structure is freed. If the task that opened
      the filter file accesses the "filter" file after the system has
      been freed, the system will access an invalid pointer.
      
      By adding a ref_count, and using it to keep track of what
      is using the event system, we can free it after all users
      are finished with the event system.
      
      Cc: <stable@kernel.org>
      Reported-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e9dbfae5
  7. 05 Jul, 2011 2 commits
  8. 04 Jul, 2011 1 commit
  9. 03 Jul, 2011 1 commit
  10. 02 Jul, 2011 1 commit
    • Frederic Weisbecker's avatar
      x86: Don't use frame pointer to save old stack on irq entry · a2bbe750
      Frederic Weisbecker authored
      rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
      in order to restore it later in ret_from_intr.
      
      It is convenient because we save its value in the irq regs
      and it's easily restored using the leave instruction.
      
      However this is a kind of abuse of the frame pointer which
      role is to help unwinding the kernel by chaining frames
      together, each node following the return address to the
      previous frame.
      
      But although we are breaking the frame by changing the stack
      pointer, there is no preceding return address before the new
      frame. Hence using the frame pointer to link the two stacks
      breaks the stack unwinders that find a random value instead of
      a return address here.
      
      There is no workaround that can work in every case. We are using
      the fixup_bp_irq_link() function to dereference that abused frame
      pointer in the case of non nesting interrupt (which means stack
      changed).
      But that doesn't fix the case of interrupts that don't change the
      stack (but we still have the unconditional frame link), which is
      the case of hardirq interrupting softirq. We have no way to detect
      this transition so the frame irq link is considered as a real frame
      pointer and the return address is dereferenced but it is still a
      spurious one.
      
      There are two possible results of this: either the spurious return
      address, a random stack value, luckily belongs to the kernel text
      and then the unwinding can continue and we just have a weird entry
      in the stack trace. Or it doesn't belong to the kernel text and
      unwinding stops there.
      
      This is the reason why stacktraces (including perf callchains) on
      irqs that interrupted softirqs don't work very well.
      
      To solve this, we don't save the old stack pointer on rbp anymore
      but we save it to a scratch register that we push on the new
      stack and that we pop back later on irq return.
      
      This preserves the whole frame chain without spurious return addresses
      in the middle and drops the need for the horrid fixup_bp_irq_link()
      workaround.
      
      And finally irqs that interrupt softirq are sanely unwinded.
      
      Before:
      
          99.81%         perf  [kernel.kallsyms]  [k] perf_pending_event
                         |
                         --- perf_pending_event
                             irq_work_run
                             smp_irq_work_interrupt
                             irq_work_interrupt
                            |
                            |--41.60%-- __read
                            |          |
                            |          |--99.90%-- create_worker
                            |          |          bench_sched_messaging
                            |          |          cmd_bench
                            |          |          run_builtin
                            |          |          main
                            |          |          __libc_start_main
                            |           --0.10%-- [...]
      
      After:
      
           1.64%  swapper  [kernel.kallsyms]  [k] perf_pending_event
                  |
                  --- perf_pending_event
                      irq_work_run
                      smp_irq_work_interrupt
                      irq_work_interrupt
                     |
                     |--95.00%-- arch_irq_work_raise
                     |          irq_work_queue
                     |          __perf_event_overflow
                     |          perf_swevent_overflow
                     |          perf_swevent_event
                     |          perf_tp_event
                     |          perf_trace_softirq
                     |          __do_softirq
                     |          call_softirq
                     |          do_softirq
                     |          irq_exit
                     |          |
                     |          |--73.68%-- smp_apic_timer_interrupt
                     |          |          apic_timer_interrupt
                     |          |          |
                     |          |          |--96.43%-- amd_e400_idle
                     |          |          |          cpu_idle
                     |          |          |          start_secondary
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      a2bbe750