1. 21 Dec, 2011 16 commits
    • tracing: Factorize filter creation · 38b78eb8
      Tejun Heo authored
      There are four places where a new filter for a given filter string is
      created, which involves several different steps.  This patch factors
      those steps into create_[system_]filter() functions which in turn make
      use of create_filter_{start|finish}() for common parts.
      
      The only functional change is that if replace_filter_string() is
      requested and fails, creation fails without any side effect instead of
      being ignored.
      
      Note that the system filter is now installed after the processing is
      complete, which makes freeing before and then restoring the filter string
      on error unnecessary.
      
      -v2: Rebased to resolve conflict with 49aa2951 and updated both
           create_filter() functions to always set *filterp instead of
           requiring the caller to clear it to %NULL on entry.
      
      Link: http://lkml.kernel.org/r/1323988305-1469-2-git-send-email-tj@kernel.org
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Have stack tracing set filtered functions at boot · 762e1207
      Steven Rostedt authored
      Add stacktrace_filter= to the kernel command line that lets
      the user pick specific functions to check the stack on.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Allow access to the boot time function enabling · 2a85a37f
      Steven Rostedt authored
      Change set_ftrace_early_filter() to ftrace_set_early_filter()
      and make it a global function. This allows other subsystems
      in the kernel to enable function tracing at startup and
      reuse the ftrace function parsing code.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Have stack_tracer use a separate list of functions · d2d45c7a
      Steven Rostedt authored
      The stack_tracer is used to look at every function and check
      if the current stack is bigger than the last recorded max stack size.
      When a new max is found, then it saves that stack off.
      
      Currently the stack tracer is limited by the global_ops of
      the function tracer. As the stack tracer has nothing to do with
      the ftrace function tracer, except that it uses it as its internal
      engine, the stack tracer should have its own list.
      
      A new file is added to the tracing debugfs directory called:
      
        stack_trace_filter
      
      that can be used to select which functions you want to check the stack
      on.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Decouple hash items from showing filtered functions · 69a3083c
      Steven Rostedt authored
      The set_ftrace_filter shows "hashed" functions, which are functions
      that are added with operations to them (like traceon and traceoff).
      
      As other subsystems may be able to show what functions they are
      using for function tracing, the hash items should no longer
      be shown just because the FILTER flag is set, as they have nothing
      to do with other subsystems' filters.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Allow other users of function tracing to use the output listing · fc13cb0c
      Steven Rostedt authored
      The function tracer is set up to allow any other subsystem (like perf)
      to use it. Ftrace already has a way to list what functions are enabled
      by the global_ops. It would be very helpful to let other users of
      the function tracer use the same code.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Create ftrace_hash_empty() helper routine · 06a51d93
      Steven Rostedt authored
      There are two types of hashes in the ftrace_ops; one type
      is the filter_hash and the other is the notrace_hash. Either
      one may be null, meaning it has no elements. But when elements
      are added, the hash is allocated.
      
      Throughout the code, a check needs to be made to see if a hash
      exists or the hash has elements, but the check if the hash exists
      is usually missing, which can cause a NULL pointer dereference bug.

      Add a helper routine called "ftrace_hash_empty()" that returns
      true if the hash doesn't exist or its count is zero, as they mean
      the same thing.
      Last-bug-reported-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
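The helper described in the commit above can be sketched in a few lines. This is a minimal userspace illustration, not the kernel source: `struct demo_hash` and `demo_hash_empty()` are hypothetical stand-ins, and only a `count` field is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a filter/notrace hash. */
struct demo_hash {
	unsigned long count;	/* number of entries currently stored */
};

/*
 * True when the hash was never allocated (NULL) or holds no entries.
 * Both cases are treated as "empty", so callers need only one check
 * and never dereference a NULL hash.
 */
static bool demo_hash_empty(const struct demo_hash *hash)
{
	return !hash || !hash->count;
}
```

Because `||` short-circuits, `hash->count` is only read when `hash` is non-NULL.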
    • ftrace: Fix ftrace hash record update with notrace · c842e975
      Steven Rostedt authored
      When disabling the "notrace" records, that means we want to trace them.
      If the notrace_hash is zero, it means that we want to trace all
      records. But to disable a zero notrace_hash means nothing.
      
      The check for the notrace_hash count was incorrect:

      	if (hash && !hash->count)
      		return;

      The comment above it correctly states that we do nothing
      if the notrace_hash has zero count. But !hash also means that
      the notrace hash has zero count. I think this was done to
      protect against dereferencing NULL. But if !hash is true, then
      we go through the following loop without doing a single thing.
      
      Fix it to:
      
      	if (!hash || !hash->count)
      		return;
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
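To make the difference between the two conditions concrete, the sketch below wraps each one in a function that reports whether the early return would be taken. The struct and function names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the notrace hash; only `count` is modelled. */
struct sketch_hash {
	unsigned long count;
};

/* Old condition: a NULL hash does not trigger the early return. */
static bool old_check_returns_early(const struct sketch_hash *hash)
{
	return hash && !hash->count;
}

/* Fixed condition: NULL and zero count both trigger the early return. */
static bool fixed_check_returns_early(const struct sketch_hash *hash)
{
	return !hash || !hash->count;
}
```

The two functions agree for every non-NULL hash and differ only when the hash pointer is NULL, which is the case the commit message describes.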
    • ftrace: Use bsearch to find record ip · 5855fead
      Steven Rostedt authored
      Now that each set of pages in the function list is sorted by
      ip, we can use bsearch to find a record within each set of pages.
      This speeds up the ftrace_location() function by orders of magnitude.
      
      For archs (like x86) that need to add a breakpoint at every function
      that will be converted from a nop to a callback and vice versa,
      the breakpoint callback needs to know if the breakpoint was for
      ftrace or not. It requires finding the breakpoint ip within the
      records. Doing a linear search is extremely inefficient. It is
      a must to be able to do a fast binary search to find these locations.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
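As a rough userspace illustration of the lookup, the sketch below uses the C library's `bsearch()` on one sorted group of records. `struct demo_rec`, `demo_cmp_ip()` and `demo_find_rec()` are hypothetical names; the kernel has its own record type and its own bsearch implementation, and it repeats the search per group of pages.

```c
#include <stdlib.h>

/* Hypothetical record: just the instruction pointer of a call site. */
struct demo_rec {
	unsigned long ip;
};

/* bsearch() comparator: key is an ip value, elem is a record. */
static int demo_cmp_ip(const void *key, const void *elem)
{
	unsigned long ip = *(const unsigned long *)key;
	const struct demo_rec *rec = elem;

	if (ip < rec->ip)
		return -1;
	if (ip > rec->ip)
		return 1;
	return 0;
}

/* Returns the record whose ip matches, or NULL. `recs` must be sorted by ip. */
static const struct demo_rec *demo_find_rec(const struct demo_rec *recs,
					    size_t n, unsigned long ip)
{
	return bsearch(&ip, recs, n, sizeof(*recs), demo_cmp_ip);
}
```

The lookup costs O(log n) comparisons per group instead of a linear scan, which is the property the breakpoint handler relies on.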
    • ftrace: Sort the mcount records on each page · 68950619
      Steven Rostedt authored
      Sort records by ip locations of the ftrace mcount calls on each of the
      set of pages in the function list. This helps in localizing cache
      usage when updating the function locations, and gives us
      the ability to quickly find an ip location in the list.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
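Sorting one group of records by ip can be sketched with the C library's `qsort()`. The record layout and the names `demo_sort_rec`, `demo_cmp_rec()` and `demo_sort_group()` are assumptions made for this example; the kernel uses its own sort helper and record structure.

```c
#include <stdlib.h>

/* Hypothetical record: call-site address plus some per-record flags. */
struct demo_sort_rec {
	unsigned long ip;
	unsigned long flags;
};

/* Order records by ascending ip without risking integer overflow. */
static int demo_cmp_rec(const void *a, const void *b)
{
	const struct demo_sort_rec *ra = a;
	const struct demo_sort_rec *rb = b;

	return (ra->ip > rb->ip) - (ra->ip < rb->ip);
}

/* Sort one contiguous group of records in place. */
static void demo_sort_group(struct demo_sort_rec *recs, size_t n)
{
	qsort(recs, n, sizeof(*recs), demo_cmp_rec);
}
```

Whole records are moved, so each record's flags stay attached to its ip after sorting.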
    • ftrace: Replace record newlist with record page list · 85ae32ae
      Steven Rostedt authored
      As new functions come in to be initialized from mcount to nop,
      they are done by groups of pages, whether it is the core kernel
      or a module. There's no need to keep track of these on a per record
      basis.
      
      At startup, and as any module is loaded, the functions to be
      traced are stored in a group of pages and added to the function
      list at the end. We just need to keep a pointer to the first
      page of the list that was added, and use that to know where to
      start on the list for initializing functions.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Allocate the mcount record pages as groups · a7900875
      Steven Rostedt authored
      Allocate the mcount record pages as a group of pages as big
      as can be allocated and waste no more than a single page.
      
      Grouping the mcount pages as much as possible helps with cache
      locality, as we do not need to redirect with descriptors as we
      cross from page to page. It also allows us to do more with the
      records later on (sort them with bigger benefits).
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
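One way to picture "as big as can be allocated and waste no more than a single page" is the sizing rule sketched below. It is an illustration under stated assumptions, not the kernel code: the page size, the record size and the name `demo_group_order()` are all hypothetical, and groups are assumed to be power-of-two page allocations with any leftover records going to a following group.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE	4096UL	/* assumed page size */
#define DEMO_ENTRY_SIZE	16UL	/* assumed size of one record */
#define DEMO_PER_PAGE	(DEMO_PAGE_SIZE / DEMO_ENTRY_SIZE)

/*
 * Pick the power-of-two page order for the next group when `count`
 * records still need a home. Start from the smallest order that covers
 * every record, then shrink while a full page or more would stay empty.
 * The returned group may hold fewer than `count` records.
 */
static unsigned int demo_group_order(unsigned long count)
{
	unsigned long pages = (count + DEMO_PER_PAGE - 1) / DEMO_PER_PAGE;
	unsigned int order = 0;

	while ((1UL << order) < pages)
		order++;

	while (order > 0 && (DEMO_PER_PAGE << order) >= count + DEMO_PER_PAGE)
		order--;

	return order;
}
```

With these assumed sizes a page holds 256 records, so 600 pending records yield an order-1 group (512 slots) rather than an order-2 group (1024 slots) that would leave more than a page unused.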
    • ftrace: Remove usage of "freed" records · 32082309
      Steven Rostedt authored
      Records that are added to the function trace table are
      permanently there, except for modules. By separating out the
      modules to their own pages that can be freed in one shot
      we can remove the "freed" flag and simplify some of the record
      management.
      
      Another benefit of doing this is that we can also move the
      records around, for example to sort them.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Allow archs to modify code without stop machine · c88fd863
      Steven Rostedt authored
      The stop machine method to modify all functions in the kernel
      (some 20,000 of them) is the safest way to do so across all archs.
      But some archs may not need this big hammer approach to modify code
      on SMP machines, and can simply update the code they need.

      Add a weak function arch_ftrace_update_code() that does the stop
      machine by default and lets any arch override this method.
      
      If the arch needs to check the system and then decide if it can
      avoid stop machine, it can still call ftrace_run_stop_machine() to
      use the old method.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
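The weak-default pattern the commit describes can be sketched with the GCC/Clang `weak` attribute. All names below are hypothetical stand-ins chosen for this example, and the "stop machine" path is reduced to a counter so the behaviour is observable.

```c
/* Counts how often the generic (stop-machine style) path ran. */
static int demo_stop_machine_calls;

/* Hypothetical stand-in for the generic, heavyweight update path. */
void demo_run_stop_machine(int command)
{
	(void)command;
	demo_stop_machine_calls++;
}

/*
 * Weak default: used when no other object file provides a strong
 * definition of the same symbol. An architecture could supply its own
 * version and still fall back to demo_run_stop_machine() when needed.
 */
__attribute__((weak)) void demo_arch_update_code(int command)
{
	demo_run_stop_machine(command);
}

/* Generic caller: it does not know which definition the linker picked. */
void demo_update(int command)
{
	demo_arch_update_code(command);
}
```

In this single-file sketch nothing overrides the weak symbol, so `demo_update()` always takes the generic path.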
    • ftrace: Do not function trace inlined functions · 45959ee7
      Steven Rostedt authored
      When gcc inlines a function, it does not mark it with the mcount
      prologue, which in turn means that inlined functions are not traced
      by the function tracer. But if CONFIG_OPTIMIZE_INLINING is set, then
      gcc is allowed not to inline a function that is marked inline.
      
      Depending on the options and the compiler, a function may or may
      not be traced by the function tracer, depending on whether gcc
      decides to inline a function or not. This has caused several
      problems in the past because gcc is not always consistent with
      what it decides to inline between different gcc versions.
      
      Some places should not be traced (like paravirt native_* functions)
      and these are mostly marked as inline. When gcc decides not to
      inline the function, and if that function should not be traced, then
      the ftrace function tracer will suddenly break when it used to work
      fine. This becomes even harder to debug when different versions of
      gcc will not inline that function, making the same kernel and config
      work for some gcc versions and not work for others.
      
      Making all functions marked inline not be traced removes
      the ambiguity that gcc adds when it comes to tracing functions marked
      inline. All gcc versions will then be consistent in which functions are
      traced, and code that works with one gcc version and breaks with
      another will be avoided.
      
      Note, only the inline macro when CONFIG_OPTIMIZE_INLINING is set needs
      to have notrace added, as the attribute __always_inline will force
      the function to be inlined and then not traced.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
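The idea of attaching a "do not trace" attribute to everything marked inline can be sketched as below. The macro names `demo_notrace` and `demo_inline` are hypothetical; the sketch only assumes the GCC/Clang `no_instrument_function` attribute, which keeps the compiler from adding profiling or instrumentation calls to a function even when it ends up not being inlined.

```c
/* Hypothetical macros for this example only. */
#define demo_notrace	__attribute__((no_instrument_function))
#define demo_inline	inline demo_notrace

/*
 * Whether or not the compiler actually inlines demo_add(), it carries
 * the attribute, so it is never instrumented on its own.
 */
static demo_inline int demo_add(int a, int b)
{
	return a + b;
}

/* An ordinary function: eligible for instrumentation as usual. */
int demo_caller(int x)
{
	return demo_add(x, 1);
}
```

The result is the same set of instrumented functions regardless of the compiler's inlining decisions.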
    • ftrace: Fix unregister ftrace_ops accounting · 30fb6aa7
      Jiri Olsa authored
      Multiple users of the function tracer can register their functions
      with the ftrace_ops structure. The accounting within ftrace will
      update the counter on each function record that is being traced.
      When the ftrace_ops filtering adds or removes functions, the
      function records will be updated accordingly if the ftrace_ops is
      still registered.
      
      When a ftrace_ops is removed, the counters of the function records
      that the ftrace_ops traces are decremented. When they reach zero
      the functions that they represent are modified to stop calling the
      mcount code.
      
      When changes are made, the code is updated via stop_machine() with
      a command passed to the function to tell it what to do. There is an
      ENABLE and DISABLE command that tells the called function to enable
      or disable the functions. But the ENABLE is really a misnomer as it
      should just update the records, as records that have been enabled
      and now have a count of zero should be disabled.
      
      The DISABLE command is used to disable all functions regardless of
      their counter values. This is the big off switch and is not the
      complement of the ENABLE command.
      
      To make matters worse, when a ftrace_ops is unregistered and there
      is another ftrace_ops registered, neither the DISABLE nor the
      ENABLE command is set when calling into the stop_machine() function
      and the records will not be updated to match their counter. A command
      is passed to that function that will update the mcount code to call
      the registered callback directly if it is the only one left. This
      means that the ftrace_ops that is still registered will have its callback
      called by all functions that have been set for it as well as the ftrace_ops
      that was just unregistered.
      
      Here's a way to trigger this bug. Compile the kernel with
      CONFIG_FUNCTION_PROFILER set and with CONFIG_FUNCTION_GRAPH not set:
      
       CONFIG_FUNCTION_PROFILER=y
       # CONFIG_FUNCTION_GRAPH is not set
      
      This will force the function profiler to use the function tracer instead
      of the function graph tracer.
      
        # cd /sys/kernel/debug/tracing
        # echo schedule > set_ftrace_filter
        # echo function > current_tracer
        # cat set_ftrace_filter
       schedule
        # cat trace
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 692/68108025   #P:4
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
            kworker/0:2-909   [000] ....   531.235574: schedule <-worker_thread
                 <idle>-0     [001] .N..   531.235575: schedule <-cpu_idle
            kworker/0:2-909   [000] ....   531.235597: schedule <-worker_thread
                   sshd-2563  [001] ....   531.235647: schedule <-schedule_hrtimeout_range_clock
      
        # echo 1 > function_profile_enabled
        # echo 0 > function_profile_enabled
        # cat set_ftrace_filter
       schedule
        # cat trace
       # tracer: function
       #
       # entries-in-buffer/entries-written: 159701/118821262   #P:4
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
                 <idle>-0     [002] ...1   604.870655: local_touch_nmi <-cpu_idle
                 <idle>-0     [002] d..1   604.870655: enter_idle <-cpu_idle
                 <idle>-0     [002] d..1   604.870656: atomic_notifier_call_chain <-enter_idle
                 <idle>-0     [002] d..1   604.870656: __atomic_notifier_call_chain <-atomic_notifier_call_chain
      
      The same problem could have happened with the trace_probe_ops,
      but they are modified with the set_ftrace_filter file which does the
      update at closure of the file.
      
      The simple solution is to change ENABLE to UPDATE and call it every
      time an ftrace_ops is unregistered.
      
      Link: http://lkml.kernel.org/r/1323105776-26961-3-git-send-email-jolsa@redhat.com
      
      Cc: stable@vger.kernel.org # 3.0+
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
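The distinction between an UPDATE command that brings each record in line with its counter and a DISABLE command that acts as a global off switch can be modelled in a few lines. This is a toy model with hypothetical names (`demo_cmd`, `demo_fn_rec`, `demo_run_cmd()`), not the kernel's record handling.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_cmd {
	DEMO_UPDATE,	/* make each record match its counter */
	DEMO_DISABLE,	/* turn everything off, ignoring counters */
};

/* Hypothetical function record. */
struct demo_fn_rec {
	unsigned int refs;	/* registered ops that trace this function */
	bool enabled;		/* call site currently calls the tracer */
};

static void demo_run_cmd(struct demo_fn_rec *recs, size_t n, enum demo_cmd cmd)
{
	for (size_t i = 0; i < n; i++) {
		if (cmd == DEMO_DISABLE)
			recs[i].enabled = false;
		else
			/* Enabled records whose count dropped to zero go off. */
			recs[i].enabled = recs[i].refs > 0;
	}
}
```

In this model, running `DEMO_UPDATE` after an ops is unregistered switches off exactly those records whose counter reached zero and leaves the rest tracing.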
  2. 12 Dec, 2011 3 commits
  3. 06 Dec, 2011 14 commits
  4. 05 Dec, 2011 7 commits