1. 25 Feb, 2009 8 commits
    • tracing/core: make the read callbacks reentrant · d7350c3f
      Frederic Weisbecker authored
      Now that several per-cpu files can be read or spliced at the
      same time, we want the read/splice callbacks for tracing files
      to be reentrant.
      
      Until now, a single global mutex (trace_types_lock) serialized
      the access to tracing_read_pipe(), tracing_splice_read_pipe(),
      and the seq helpers.
      
      That is, if a user tries to read trace_pipe0 and trace_pipe1 at
      the same time, access to tracing_read_pipe() is contended and
      one reader must wait for the other to finish its read call.
      
      The trace_types_lock mutex is mostly here to serialize access
      to the global current tracer (current_trace), which can be
      changed concurrently. Although the iter struct keeps a private
      pointer to this tracer, its callbacks can be changed by another
      function.
      
      The method used here is to no longer keep a private reference
      to the tracer inside the iterator but to store a copy of it
      there instead. Subsequent read calls then check whether the
      current tracer has changed. This is not costly because the
      current tracer is not expected to change often, so the check is
      optimized with a branch-prediction hint.
      
      Moreover, we add a private mutex to the iterator (there is one
      iterator per file descriptor) to serialize accesses in case of
      multiple consumers per file descriptor (which would be a silly
      thing for a user to do). Note that this is not to protect the
      ring buffer, since the ring buffer already serializes reader
      accesses; it only prevents garbled traces in case of concurrent
      consumers. These mutexes could be dropped without causing any
      crash, so feedback on whether to keep them is welcome.
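
      A minimal sketch of the two ideas above (the tracer-change
      bookkeeping and the copy-to-user step are simplified and
      illustrative, not the exact kernel/trace/trace.c code):

       static ssize_t
       tracing_read_pipe(struct file *filp, char __user *ubuf,
                         size_t cnt, loff_t *ppos)
       {
               struct trace_iterator *iter = filp->private_data;
               static struct tracer *old_tracer;
               ssize_t ret;

               /* refresh the iterator's private copy of the tracer;
                  cheap because the current tracer rarely changes */
               mutex_lock(&trace_types_lock);
               if (unlikely(old_tracer != current_trace && current_trace)) {
                       old_tracer = current_trace;
                       *iter->trace = *current_trace;
               }
               mutex_unlock(&trace_types_lock);

               /* serialize multiple readers of the same file
                  descriptor; the ring buffer itself already
                  serializes its readers */
               mutex_lock(&iter->mutex);
               ret = 0;
               /* ... read from the ring buffer and copy the result
                  to ubuf here ... */
               mutex_unlock(&iter->mutex);

               return ret;
       }
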
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d7350c3f
    • tracing/core: introduce per cpu tracing files · b04cc6b1
      Frederic Weisbecker authored
      Impact: split up tracing output per cpu
      
      Currently, the tracing debugfs directory provides three files
      that let the user extract the trace output:
      
      - trace is an iterator through the ring buffer. It is a reader
        but not a consumer; it doesn't block when no more traces are
        available.

      - latency_trace is pretty similar to the former, except that it
        adds more information such as the preempt count, irq flags, ...

      - trace_pipe is a reader and a consumer; it will also block
        waiting for traces if necessary (heh, yes, it's a pipe).
      
      The traces coming from different cpus are currently mixed up
      inside these files. Sometimes this muddles the information,
      sometimes it is useful, depending on what the tracer captures.
      
      The tracing_cpumask file is useful to filter the output and
      select only the traces captured on a user-defined set of cpus.
      But it is still not powerful enough to extract one trace buffer
      per cpu at the same time.
      
      So this patch creates a new directory: /debug/tracing/per_cpu/.
      
      Inside this directory, you will now find one trace_pipe file and
      one trace file per cpu.
      
      Which means if you have two cpus, you will have:
      
       trace0
       trace1
       trace_pipe0
       trace_pipe1
      
      And of course, reading these files has the same effect as
      reading the usual tracing files, except that you will only see
      the traces from the given cpu.
      
      The original all-in-one-cpu trace files are still available in
      their original place.
      
      Until now, only one consumer was allowed on trace_pipe to avoid
      racy consumption of the ring buffer. The approach has now
      changed a bit: you can have only one consumer per cpu.
      
      This means you may read trace_pipe0 and trace_pipe1
      concurrently, but you can't have two readers on trace_pipe0 or
      on trace_pipe1.
      
      Following the same logic, if there is a reader on the common
      trace_pipe, you cannot have another reader on trace_pipe0 or
      trace_pipe1 at the same time, because trace_pipe is in essence
      already a consumer of all the cpu buffers.
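
      A hypothetical sketch (directory, variable, and fops names are
      illustrative, not the exact kernel/trace/trace.c code) of how
      the per-cpu files could be created with debugfs:

       static void tracing_init_per_cpu_files(struct dentry *d_percpu)
       {
               char name[32];
               int cpu;

               for_each_possible_cpu(cpu) {
                       /* consuming, blocking reader for this cpu only */
                       snprintf(name, sizeof(name), "trace_pipe%d", cpu);
                       debugfs_create_file(name, 0444, d_percpu,
                                           (void *)(long)cpu,
                                           &tracing_pipe_fops);

                       /* non-consuming iterator for this cpu only */
                       snprintf(name, sizeof(name), "trace%d", cpu);
                       debugfs_create_file(name, 0644, d_percpu,
                                           (void *)(long)cpu,
                                           &tracing_fops);
               }
       }
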
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b04cc6b1
    • Merge branch 'tip/tracing/ftrace' of... · 2b1b858f
      Ingo Molnar authored
      Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
      2b1b858f
    • tracing: remove /debug/tracing/latency_trace · 886b5b73
      Ingo Molnar authored
      Impact: remove old debug/tracing API
      
      /debug/tracing/latency_trace is an old legacy format kept from
      the old latency tracer. Remove the file for now. If any useful
      bit is missing, we'll propagate it into the /debug/tracing/trace
      output.
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      886b5b73
    • tracing: make event directory structure · 1473e441
      Steven Rostedt authored
      This patch adds the directory /debug/tracing/events/ that will contain
      all the registered trace points.
      
       # ls /debug/tracing/events/
      sched_kthread_stop      sched_process_fork  sched_switch
      sched_kthread_stop_ret  sched_process_free  sched_wait_task
      sched_migrate_task      sched_process_wait  sched_wakeup
      sched_process_exit      sched_signal_send   sched_wakeup_new
      
       # ls /debug/tracing/events/sched_switch/
      enable
      
       # cat /debug/tracing/events/sched_switch/enable
      1
      
       # cat /debug/tracing/set_event
      sched_switch
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      1473e441
    • tracing: add schedule events to event trace · f3fe8e4a
      Steven Rostedt authored
      This patch changes trace/sched.h to use DEFINE_TRACE_FMT so
      that the sched tracepoints are automatically registered with
      the event tracer.
      
      It also adds the tracing sched headers to kernel/trace/events.c.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      f3fe8e4a
    • tracing: add event trace infrastructure · b77e38aa
      Steven Rostedt authored
      This patch creates the event tracing infrastructure of ftrace.
      It will create the files:
      
       /debug/tracing/available_events
       /debug/tracing/set_event
      
      The available_events will list the trace points that have been
      registered with the event tracer.
      
      set_event will allow the user to enable or disable an event hook.
      
      example:
      
       # echo sched_wakeup > /debug/tracing/set_event
      
      Will enable the sched_wakeup event (if it is registered).
      
       # echo "!sched_wakeup" >> /debug/tracing/set_event
      
      Will disable the sched_wakeup event (and only that event).
      
       # echo > /debug/tracing/set_event
      
      Will disable all events (notice the '>')
      
       # cat /debug/tracing/available_events > /debug/tracing/set_event
      
      Will enable all registered event hooks.
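
      A hypothetical sketch (types and names are illustrative, not
      the actual trace_events.c code) of how one token written to
      set_event might be applied: a leading '!' disables the named
      event hook, otherwise it is enabled (assumes strcmp() from
      <linux/string.h>).

       struct event_hook {
               const char *name;
               int enabled;
               void (*regfunc)(void);
               void (*unregfunc)(void);
       };

       static void set_clr_event(struct event_hook *events, int nr,
                                 char *tok)
       {
               int set = 1;
               int i;

               if (*tok == '!') {      /* "!name" means disable */
                       set = 0;
                       tok++;
               }

               for (i = 0; i < nr; i++) {
                       if (strcmp(events[i].name, tok) != 0)
                               continue;
                       if (set && !events[i].enabled) {
                               events[i].enabled = 1;
                               events[i].regfunc();
                       } else if (!set && events[i].enabled) {
                               events[i].enabled = 0;
                               events[i].unregfunc();
                       }
               }
       }
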
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      b77e38aa
    • tracing: add DEFINE_TRACE_FMT to tracepoint.h · 7c37730c
      Steven Rostedt authored
      This patch creates a DEFINE_TRACE_FMT macro that maps to
      DECLARE_TRACE. This allows developers to place format strings
      and args in with their tracepoint declaration. A tracer may now
      override the DEFINE_TRACE_FMT macro and use it to record a
      default format.
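
      A minimal sketch of such a mapping, with an illustrative use
      (the tracepoint prototype and format string below are made up
      for the example, not the actual sched.h declarations):

       #define DEFINE_TRACE_FMT(name, proto, args, fmt)         \
               DECLARE_TRACE(name, TPPROTO(proto), TPARGS(args))

       /* the format string travels with the declaration; a tracer
          can redefine DEFINE_TRACE_FMT to record it */
       DEFINE_TRACE_FMT(sched_wakeup,
               TPPROTO(struct rq *rq, struct task_struct *p),
               TPARGS(rq, p),
               "sched wakeup event");
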
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      7c37730c
  2. 22 Feb, 2009 2 commits
  3. 20 Feb, 2009 11 commits
    • ftrace: break out modify loop immediately on detection of error · 4377245a
      Steven Rostedt authored
      Impact: added precaution on failure detection
      
      Break out of the modifying loop as soon as a failure is detected.
      This is just an added precaution found by code review and was not
      found by any bug chasing.
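
      A minimal sketch of the loop pattern, with illustrative types
      and helper names (not the actual ftrace record-modification
      code):

       static int modify_all_records(struct dyn_record *recs, int nr)
       {
               int i, failed = 0;

               for (i = 0; i < nr; i++) {
                       failed = modify_one_record(&recs[i]);
                       if (failed) {
                               report_modify_failure(failed, recs[i].ip);
                               break;  /* stop on the first failure */
                       }
               }
               return failed;
       }
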
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      4377245a
    • ftrace: immediately stop code modification if failure is detected · 90c7ac49
      Steven Rostedt authored
      Impact: fix to prevent NMI lockup
      
      If the page fault handler produces a WARN_ON while text is
      being modified, and the system is set up to have a high
      frequency of NMIs, we can lock up the system on a failure to
      modify code.

      The NMI-safe code modification scheme lets every NMI perform
      the pending modification itself if it is about to run the
      affected code. This prevents a modifier on one CPU from
      modifying code running in NMI context on another CPU. The
      modification is done through stop_machine, so only NMIs must
      be considered.
      
      But if the write causes the page fault handler to produce a
      warning, the print can slow things down enough that, as soon as
      it is done, another NMI is taken before returning to process
      context. The new NMI will perform the write again, causing
      another print, and this will hang the box.
      
      This patch turns off the writing as soon as a failure is
      detected and does not wait for it to be turned off by process
      context. This keeps NMIs from getting stuck in this
      back-and-forth of printouts.
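
      A sketch of the idea (the variable and function names are
      modeled loosely on the x86 ftrace code and should be treated as
      illustrative): remember the failure and stop attempting the
      write from NMI context, instead of waiting for process context
      to turn it off.

       static void *mod_code_ip;
       static void *mod_code_newcode;
       static int mod_code_status;     /* non-zero once a write failed */

       static void ftrace_mod_code(void)
       {
               /* don't retry a write that already failed: the fault
                  handler's WARN_ON would fire again on every NMI */
               if (mod_code_status)
                       return;

               mod_code_status = probe_kernel_write(mod_code_ip,
                                                    mod_code_newcode,
                                                    MCOUNT_INSN_SIZE);
       }
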
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      90c7ac49
    • ftrace, x86: make kernel text writable only for conversions · 16239630
      Steven Rostedt authored
      Impact: keep kernel text read only
      
      Because dynamic ftrace converts the calls to mcount into and out of
      nops at run time, we needed to always keep the kernel text writable.
      
      But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts
      the kernel code to writable before ftrace modifies the text, and converts
      it back to read only afterward.
      
      The kernel text is converted to read/write, stop_machine is called to
      modify the code, then the kernel text is converted back to read only.
      
      The original version used SYSTEM_STATE to determine whether it
      was OK to change the code to rw or ro. Andrew Morton pointed
      out that using SYSTEM_STATE is a bad idea since there is no
      guarantee of what its state will actually be.
      
      Instead, I moved the check into the set_kernel_text_* functions
      themselves, and use a local variable to determine when it is
      OK to change the kernel text RW permissions.
      
      [ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ]
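
      A simplified sketch of the local-flag approach (modeled on the
      x86 code; details are illustrative): the text permissions are
      only flipped once the kernel text has actually been marked
      read-only at boot.

       static int kernel_set_to_readonly;

       void set_kernel_text_rw(void)
       {
               unsigned long start = PFN_ALIGN(_text);
               unsigned long end = PFN_ALIGN(__start_rodata);

               /* nothing to do before mark_rodata_ro() has run */
               if (!kernel_set_to_readonly)
                       return;

               set_memory_rw(start, (end - start) >> PAGE_SHIFT);
       }

       void set_kernel_text_ro(void)
       {
               unsigned long start = PFN_ALIGN(_text);
               unsigned long end = PFN_ALIGN(__start_rodata);

               if (!kernel_set_to_readonly)
                       return;

               set_memory_ro(start, (end - start) >> PAGE_SHIFT);
       }
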
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      16239630
    • tracing/markers: make markers select tracepoints · 91f73f90
      Frederic Weisbecker authored
      Sometimes Kconfig dependencies are not handled correctly, as in
      the following scenario:
      
      - config A
         bool
      
      - config B
         bool
         depends on A
      
      - config C
         bool
         select B
      
      If one selects C, it will select B without checking B's
      dependency on A. If A hasn't been selected elsewhere, this
      results in a build failure.
      
      This is what happens with the following build error:
      
       kernel/built-in.o: In function `marker_update_probe_range':
       (.text+0x52f64): undefined reference to `tracepoint_probe_register_noupdate'
       kernel/built-in.o: In function `marker_update_probe_range':
       (.text+0x52f74): undefined reference to `tracepoint_probe_unregister_noupdate'
       kernel/built-in.o: In function `marker_update_probe_range':
       (.text+0x52fb9): undefined reference to `tracepoint_probe_unregister_noupdate'
       kernel/built-in.o: In function `marker_update_probes':
       marker.c:(.text+0x530ba): undefined reference to `tracepoint_probe_update_all'
      
      Here, CONFIG_KVM_TRACE selects CONFIG_MARKERS, but the latter
      depends on CONFIG_TRACEPOINTS, which will not be selected. The
      fix is to have MARKERS select TRACEPOINTS instead of only
      depending on it, as sketched below.
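
      The fix, sketched in the same schematic form as the scenario
      above (the real entry lives in init/Kconfig and has more
      lines):

      - config MARKERS
         bool
         select TRACEPOINTS
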
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      91f73f90
    • ftrace: allow archs to perform pre and post processing for code modification · 000ab691
      Steven Rostedt authored
      This patch creates the weak functions
      ftrace_arch_code_modify_prepare() and
      ftrace_arch_code_modify_post_process(), which are called before
      and after stop_machine is used to modify the kernel text.
      
      If the arch needs to do pre or post processing, it only needs to define
      these functions.
      
      [ Update: Ingo Molnar suggested using the name ftrace_arch_code_modify_*
                over using ftrace_arch_modify_* ]
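
      A minimal sketch of the default weak implementations an
      architecture can override (assuming the usual int return
      convention for error reporting):

       int __weak ftrace_arch_code_modify_prepare(void)
       {
               return 0;
       }

       int __weak ftrace_arch_code_modify_post_process(void)
       {
               return 0;
       }
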
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      000ab691
    • Merge branch 'for-ingo' of... · 057685cf
      Ingo Molnar authored
      Merge branch 'for-ingo' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 into tracing/kmemtrace
      
      Conflicts:
      	mm/slub.c
      057685cf
    • tracing/function-graph-tracer: make set_graph_function file support ftrace regex · f9349a8f
      Frederic Weisbecker authored
      Impact: trace only functions matching a pattern
      
      The set_graph_function file lets one trace only one or several
      chosen functions and follow all of their code flow.

      Currently, only an exact function name is allowed, so this
      patch adds support for the ftrace_regex patterns:
      
      - matches all functions that end with "name":
        echo *name > set_graph_function
      
      - matches all functions that begin with "name":
        echo name* > set_graph_function
      
      - matches all functions that contain "name":
        echo *name* > set_graph_function
      
      Example:
      
      echo mutex* > set_graph_function
      
       0)               |  mutex_lock_nested() {
       0)   0.563 us    |    __might_sleep();
       0)   2.072 us    |  }
       0)               |  mutex_unlock() {
       0)   1.036 us    |    __mutex_unlock_slowpath();
       0)   2.433 us    |  }
       0)               |  mutex_unlock() {
       0)   0.691 us    |    __mutex_unlock_slowpath();
       0)   1.787 us    |  }
       0)               |  mutex_lock_interruptible_nested() {
       0)   0.548 us    |    __might_sleep();
       0)   1.945 us    |  }
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f9349a8f
    • SLUB: Introduce and use SLUB_MAX_SIZE and SLUB_PAGE_SHIFT constants · fe1200b6
      Christoph Lameter authored
      As a preparatory patch for bumping up the page allocator
      pass-through threshold, introduce two new constants,
      SLUB_MAX_SIZE and SLUB_PAGE_SHIFT, and convert mm/slub.c to use
      them.
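
      A sketch of what the two constants look like (the exact values
      below are the commonly used definitions and should be treated
      as illustrative):

       /* largest kmalloc object handled by SLUB caches; larger
          allocations are passed through to the page allocator */
       #define SLUB_MAX_SIZE (2 * PAGE_SIZE)

       /* number of kmalloc cache indices needed to cover objects up
          to SLUB_MAX_SIZE */
       #define SLUB_PAGE_SHIFT (PAGE_SHIFT + 2)
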
      Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      fe1200b6
    • x86: use the right protections for split-up pagetables · 07a66d7c
      Ingo Molnar authored
      Steven Rostedt found a bug where, in his modified kernel,
      ftrace was unable to modify the kernel text, due to the PMD
      itself having been marked read-only as well in
      split_large_page().
      
      The fix, suggested by Linus, is to not try to 'clone' the
      reference protection of a huge-page, but to use the standard
      (and permissive) page protection bits of KERNPG_TABLE.
      
      The 'cloning' makes sense for the ptes but it's a confused and
      incorrect concept at the page table level - because the
      pagetable entry is a set of all ptes and hence cannot
      'clone' any single protection attribute - the ptes can be any
      mixture of protections.
      
      With the permissive KERNPG_TABLE, even if the pte protections
      get changed after this point (due to ftrace doing code-patching
      or other similar activities like kprobes), the resulting combined
      protections will still be correct and the pte's restrictive
      (or permissive) protections will control it.
      
      Also update the comment.
      
      This bug was there for a long time but has not caused visible
      problems before as it needs a rather large read-only area to
      trigger. Steve possibly hacked his kernel with some really
      large arrays or so. Anyway, the bug is definitely worth fixing.
      
      [ Huang Ying also experienced problems in this area when writing
        the EFI code, but the real bug in split_large_page() was not
        realized back then. ]
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Huang Ying <ying.huang@intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      07a66d7c
    • x86, vmi: TSC going backwards check in vmi clocksource · 48ffc70b
      Alok N Kataria authored
      Impact: fix time warps under vmware
      
      Similar to the check for the TSC going backwards in the TSC
      clocksource, we also need this check for the VMI clocksource.
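
      A hypothetical sketch modeled on the equivalent TSC clocksource
      check (the clocksource and raw-read helper names are
      illustrative): never return a cycle value that is behind the
      clocksource's last recorded one.

       static cycle_t vmi_clocksource_read(void)
       {
               cycle_t ret = vmi_read_cycle_counter(); /* raw counter */

               /* clamp to the last value handed to the timekeeping
                  core so time never appears to go backwards */
               return ret >= clocksource_vmi.cycle_last ?
                       ret : clocksource_vmi.cycle_last;
       }
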
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
      48ffc70b
  4. 19 Feb, 2009 19 commits