1. 18 Oct, 2010 5 commits
    • perf: Fix group moving · 74c3337c
      Peter Zijlstra authored
      Matt found we trigger the WARN_ON_ONCE() in perf_group_attach() when we take
      the move_group path in perf_event_open().
      
      Since we cannot de-construct the group (we rely on it to move the events), we
      have to simply ignore the double attach. The group state is context invariant
      and doesn't need changing.
      Reported-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1287135757.29097.1368.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • irq_work: Add generic hardirq context callbacks · e360adbe
      Peter Zijlstra authored
      Provide a mechanism that allows running code in IRQ context. It is
      most useful for NMI code that needs to interact with the rest of the
      system, such as waking up a task to drain buffers.
      
      Perf currently has such a mechanism, so extract that and provide it as
      a generic feature, independent of perf so that others may also
      benefit.
      
      The IRQ context callback is generated through self-IPIs where
      possible, or on architectures like powerpc the decrementer (the
      built-in timer facility) is set to generate an interrupt immediately.
      
      Architectures that don't have anything like this have to make do with
      a callback from the timer tick. These architectures can call
      irq_work_run() at the tail of any IRQ handlers that might enqueue such
      work (like the perf IRQ handler) to avoid undue latencies in
      processing the work.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [ various fixes ]
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
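The split described in e360adbe (enqueue from a restricted context, run later from IRQ context) can be pictured with a small user-space model. This is a rough sketch only, not the kernel implementation: the names `IrqWork`, `IrqWorkQueue`, `queue` and `run` are hypothetical, and a plain Python list stands in for whatever per-CPU structure the kernel uses.

```python
class IrqWork:
    """A deferred callback; `pending` prevents double-queueing."""
    def __init__(self, func):
        self.func = func
        self.pending = False


class IrqWorkQueue:
    """Toy stand-in for a list of pending work items (hypothetical)."""
    def __init__(self):
        self._items = []

    def queue(self, work):
        # Called from the restricted (NMI-like) context: only record the work.
        if work.pending:
            return False          # already queued, nothing to do
        work.pending = True
        self._items.append(work)
        return True

    def run(self):
        # Called later from the IRQ-like context (self-IPI, timer tick, or
        # the tail of an interrupt handler, as the commit message describes).
        items, self._items = self._items, []
        for work in items:
            work.pending = False  # allow the callback to be queued again
            work.func()
        return len(items)


woken = []
q = IrqWorkQueue()
wakeup = IrqWork(lambda: woken.append("drain buffers"))
q.queue(wakeup)      # first enqueue succeeds
q.queue(wakeup)      # second enqueue is ignored while still pending
q.run()              # the callback runs once
print(woken)
```

The point of the model is the ordering: the enqueue side does no real work, so it stays safe in a context where waking a task directly would not be.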
    • perf_events: Fix transaction recovery in group_sched_in() · 8e5fc1a7
      Stephane Eranian authored
      The group_sched_in() function uses a transactional approach to schedule
      a group of events. In a group, either all events can be scheduled or
      none are. To schedule each event in, the function calls event_sched_in().
      In case of error, event_sched_out() is called on each event in the group.
      
      The problem is that event_sched_out() does not completely cancel the
      effects of event_sched_in(). Furthermore, event_sched_out() changes the
      state of the event as if it had run, which is not true in this particular
      case.
      
      Those inconsistencies impact time tracking fields and may lead to events
      in a group not all reporting the same time_enabled and time_running values.
      This is demonstrated with the example below:
      
      $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
      1946101 unhalted_core_cycles (32.85% scaling, ena=829181, run=556827)
        11423 baclears (32.85% scaling, ena=829181, run=556827)
         7671 baclears (0.00% scaling, ena=556827, run=556827)
      
      2250443 unhalted_core_cycles (57.83% scaling, ena=962822, run=405995)
        11705 baclears (57.83% scaling, ena=962822, run=405995)
        11705 baclears (57.83% scaling, ena=962822, run=405995)
      
      Notice that in the first group, the last baclears event does not
      report the same timings as its siblings.
      
      This issue comes from the fact that tstamp_stopped is updated
      by event_sched_out() as if the event had actually run.
      
      To solve the issue, we must ensure that, in case of error, there is
      no change in the event state whatsoever. That means timings must
      remain as they were when entering group_sched_in().
      
      To do this we defer updating tstamp_running until we know the
      transaction succeeded. Therefore, we have split event_sched_in()
      in two parts separating the update to tstamp_running.
      
      Similarly, in case of error, we do not want to update tstamp_stopped.
      Therefore, we have split event_sched_out() in two parts separating
      the update to tstamp_stopped.
      
      With this patch, we now get the following output:
      
      $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
      2492050 unhalted_core_cycles (71.75% scaling, ena=1093330, run=308841)
        11243 baclears (71.75% scaling, ena=1093330, run=308841)
        11243 baclears (71.75% scaling, ena=1093330, run=308841)
      
      1852746 unhalted_core_cycles (0.00% scaling, ena=784489, run=784489)
         9253 baclears (0.00% scaling, ena=784489, run=784489)
         9253 baclears (0.00% scaling, ena=784489, run=784489)
      
      Note that the uneven timing between groups is a side effect of
      the process spending most of its time sleeping, i.e., not enough
      event rotations (but that's a separate issue).
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4cb86b4c.41e9d80a.44e9.3e19@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
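The all-or-nothing behaviour that 8e5fc1a7 describes can be sketched as a two-phase operation: schedule tentatively, then touch the timestamps only once the whole group is known to fit. The model below is a simplified illustration, not the kernel code. The `Event` class, the `hw_slots` parameter used to force a failure, and the exact timestamp arithmetic are assumptions made for the example.

```python
class Event:
    def __init__(self, name):
        self.name = name
        self.active = False
        self.tstamp_running = 0
        self.tstamp_stopped = 0


def group_sched_in(events, now, hw_slots):
    """Schedule all events or none; timestamps change only on success."""
    scheduled = []
    for ev in events:
        if len(scheduled) >= hw_slots:      # simulated scheduling failure
            for done in scheduled:
                done.active = False         # undo, but leave timestamps alone
            return False
        ev.active = True                    # tentative: no timestamp update yet
        scheduled.append(ev)
    for ev in scheduled:
        # Commit phase: the transaction succeeded, so record the start time.
        ev.tstamp_running += now - ev.tstamp_stopped
    return True


group = [Event("unhalted_core_cycles"), Event("baclears"), Event("baclears")]
ok = group_sched_in(group, now=100, hw_slots=2)   # third event does not fit
print(ok, [(e.active, e.tstamp_running, e.tstamp_stopped) for e in group])
```

After the failed attempt every event in the group still carries its original timestamps, which is the invariant the commit message asks for.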
    • perf_events: Fix bogus AMD64 generic TLB events · ba0cef3d
      Stephane Eranian authored
      PERF_COUNT_HW_CACHE_DTLB:READ:MISS had a bogus umask value of 0, which
      counts nothing. It needs to be 0x7 (to count all possibilities).
      
      PERF_COUNT_HW_CACHE_ITLB:READ:MISS had a bogus umask value of 0, which
      counts nothing. It needs to be 0x3 (to count all possibilities).
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: <stable@kernel.org> # as far back as it applies
      LKML-Reference: <4cb85478.41e9d80a.44e2.3f00@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
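To see why a umask of 0 counts nothing while 0x7 and 0x3 count "all possibilities", it helps to look at which bits each mask selects. The sketch below assumes the basic x86 raw-event layout (event select in bits 0-7, unit mask in bits 8-15). The event-select values 0x46 and 0x85 are placeholders chosen for illustration and should be checked against the AMD documentation. The helper names are hypothetical.

```python
def raw_config(event_select, umask):
    """Combine an 8-bit event select and an 8-bit unit mask into a raw config."""
    return ((umask & 0xFF) << 8) | (event_select & 0xFF)


def umask_bits(umask):
    """List the bit positions set in a unit mask."""
    return [bit for bit in range(8) if umask & (1 << bit)]


DTLB_MISS_EVENT = 0x46   # assumed event select, for illustration only
ITLB_MISS_EVENT = 0x85   # assumed event select, for illustration only

# umask 0 selects no sub-events at all; 0x7 and 0x3 select every defined bit.
print(hex(raw_config(DTLB_MISS_EVENT, 0x0)), umask_bits(0x0))
print(hex(raw_config(DTLB_MISS_EVENT, 0x7)), umask_bits(0x7))
print(hex(raw_config(ITLB_MISS_EVENT, 0x3)), umask_bits(0x3))
```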
    • perf_events: Fix bogus context time tracking · c530ccd9
      Stephane Eranian authored
      You can only call update_context_time() when the context
      is active, i.e., the thread it is attached to is still running.
      
      However, perf_event_read() can be called even when the context
      is inactive, e.g., when the user calls read() on the counters. The
      call to update_context_time() must be conditioned on the status of
      the context; otherwise, bogus time_enabled and time_running values
      may be returned. Here is an example on AMD64. The task program
      is an example from libpfm4. The -p option prints deltas every 1s.
      
      $ task -p -e cpu_clk_unhalted sleep 5
          2,266,610 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
      	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
      	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
      	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
      	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
      5,242,358,071 cpu_clk_unhalted (99.95% scaling, ena=5,000,359,984, run=2,319,270)
      
      Whereas if you don't read deltas, e.g., no call to perf_event_read() until
      the process terminates:
      
      $ task -e cpu_clk_unhalted sleep 5
          2,497,783 cpu_clk_unhalted (0.00% scaling, ena=2,376,899, run=2,376,899)
      
      Notice that time_enabled and time_running are bogus in the first
      example, causing bogus scaling.
      
      This patch fixes the problem, by conditionally calling update_context_time()
      in perf_event_read().
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      LKML-Reference: <4cb856dc.51edd80a.5ae0.38fb@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
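The effect described in c530ccd9 can be reproduced with a toy model of context time. This is a sketch under stated assumptions, not the kernel code: the `Context` class and the `check_active` switch are hypothetical, and the scaling formula (the share of enabled time during which the event was not running) is inferred from the figures printed in the commit message rather than taken from the libpfm4 source.

```python
class Context:
    def __init__(self, now):
        self.is_active = True
        self.time = 0          # accumulated time the context was active
        self.timestamp = now   # last moment `time` was brought up to date


def update_context_time(ctx, now):
    ctx.time += now - ctx.timestamp
    ctx.timestamp = now


def read_time(ctx, now, check_active=True):
    """Return context time; with check_active, skip the update when inactive."""
    if ctx.is_active or not check_active:
        update_context_time(ctx, now)
    return ctx.time


def scaling_percent(time_enabled, time_running):
    """Share of enabled time during which the event was not counting."""
    return 100.0 * (1.0 - time_running / time_enabled)


def demo(check_active):
    ctx = Context(now=0)
    update_context_time(ctx, now=2)   # thread ran for 2 time units
    ctx.is_active = False             # thread blocks (e.g. in sleep)
    return read_time(ctx, now=5000, check_active=check_active)


print(demo(check_active=False))  # unconditional update: time is inflated
print(demo(check_active=True))   # conditional update: time is unchanged
print(round(scaling_percent(5000359984, 2319270), 2))
```

Feeding the ena/run pair from the last line of the first example into `scaling_percent` gives the same 99.95% figure the tool printed, which is consistent with the inflated time_enabled being the source of the bogus scaling.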
  2. 16 Oct, 2010 2 commits
  3. 15 Oct, 2010 9 commits
    • ftrace: Use objtree for C version of recordmcount · 85caa993
      Steven Rostedt authored
      The C version of recordmcount is compiled to a binary, which will
      end up located in the objtree. If the kernel is built with O=path,
      the srctree will not include the binary recordmcount caller.
      
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: linux-kbuild@vger.kernel.org
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Do not process kernel/trace/ftrace.o with C recordmcount program · 44475863
      Steven Rostedt authored
      The file kernel/trace/ftrace.c references the mcount() call to
      convert the mcount() callers to nops. But because it references
      mcount(), the mcount() address is placed in the relocation table.
      
      The C version of recordmcount reads the relocation table of all
      object files, and it will add all references to mcount to the
      __mcount_loc table that is used to find the places that call mcount()
      and change the call to a nop. When recordmcount finds the mcount reference
      in kernel/trace/ftrace.o, it saves that location even though the code
      is not a call, but references mcount as data.
      
      On boot up, when all calls are converted to nops, the code has a safety
      check to determine what op code it is actually replacing before it
      replaces it. If that op code at the address does not match, then
      a warning is printed and the function tracer is disabled.
      
      The reference to mcount in ftrace.c causes this warning to trigger,
      since the reference is not a call to mcount(). The ftrace.c file is
      not compiled with the -pg flag, so no calls to mcount() should be
      expected.
      
      This patch simply makes recordmcount.c skip the kernel/trace/ftrace.c
      file. This was the same solution used by the perl version of
      recordmcount.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Cc: John Reiser <jreiser@bitwagon.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
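The boot-time safety check mentioned in 44475863 (verify what is at an address before overwriting it) can be sketched as follows. This is a simplified model and not the ftrace code: it only looks at the first opcode byte, whereas a real check would compare the whole instruction. The function name, the 16-byte "text" buffer and the sample address 0x1000 are made up for the example. The x86 byte values (0xE8 for a near call, `0F 1F 44 00 00` for a 5-byte nop) are used purely as a concrete instance.

```python
CALL_OPCODE = 0xE8                             # x86 near call, as an example
NOP5 = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00])   # a 5-byte x86 nop


def convert_calls_to_nops(text, mcount_locs):
    """Patch each recorded location; abort if one is not a call instruction."""
    for addr in mcount_locs:
        if text[addr] != CALL_OPCODE:
            # Unexpected bytes: report and give up instead of corrupting data.
            return False, addr
        text[addr:addr + 5] = NOP5
    return True, None


text = bytearray(16)
text[0] = CALL_OPCODE                          # a real call site at offset 0
text[8:12] = (0x1000).to_bytes(4, "little")    # an address stored as data
ok, bad = convert_calls_to_nops(text, [0, 8])  # offset 8 was recorded by mistake
print(ok, bad)
```

Offset 8 plays the role of the data reference in kernel/trace/ftrace.o: it was recorded as if it were a call site, so the check trips there. Skipping that object file keeps such entries out of the table in the first place.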
    • oprofile: make !CONFIG_PM function stubs static inline · cd254f29
      Robert Richter authored
      Make !CONFIG_PM function stubs static inline and remove section
      attribute.
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • oprofile: fix linker errors · b3b3a9b6
      Anand Gadiyar authored
      Commit e9677b3c (oprofile, ARM: Use oprofile_arch_exit() to
      cleanup on failure) caused oprofile_perf_exit to be called
      in the cleanup path of oprofile_perf_init. The __exit tag
      for oprofile_perf_exit should therefore be dropped.
      
      The same has to be done for exit_driverfs as well, as this
      function is called from oprofile_perf_exit. Else, we get
      the following two linker errors.
      
        LD      .tmp_vmlinux1
      `oprofile_perf_exit' referenced in section `.init.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
      make: *** [.tmp_vmlinux1] Error 1
      
        LD      .tmp_vmlinux1
      `exit_driverfs' referenced in section `.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
      make: *** [.tmp_vmlinux1] Error 1
      Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • oprofile: include platform_device.h to fix build break · 277dd984
      Anand Gadiyar authored
      oprofile_perf.c needs to include platform_device.h.
      Otherwise we get the following build break.
      
        CC      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.o
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: 'struct platform_device' declared inside parameter list
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: its scope is only this definition or declaration, which is probably not what you want
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:201: warning: 'struct platform_device' declared inside parameter list
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:210: error: variable 'oprofile_driver' has initializer but incomplete type
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: unknown field 'driver' specified in initializer
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: extra brace group at end of initializer
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: (near initialization for 'oprofile_driver')
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: excess elements in struct initializer
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: (near initialization for 'oprofile_driver')
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: error: unknown field 'resume' specified in initializer
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: excess elements in struct initializer
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: (near initialization for 'oprofile_driver')
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: error: unknown field 'suspend' specified in initializer
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: excess elements in struct initializer
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: (near initialization for 'oprofile_driver')
      arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c: In function 'init_driverfs':
      Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
      Cc: Matt Fleming <matt@console-pimps.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • Merge remote branch 'tip/perf/core' into oprofile/core · 6268464b
      Robert Richter authored
      Conflicts:
      	arch/arm/oprofile/common.c
      	kernel/perf_event.c
    • Merge branch 'tip/perf/recordmcount-2' of... · 0fdf1360
      Ingo Molnar authored
      Merge branch 'tip/perf/recordmcount-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
    • ftrace: Rename config option HAVE_C_MCOUNT_RECORD to HAVE_C_RECORDMCOUNT · cf4db259
      Steven Rostedt authored
      The config option used by archs to let the build system know that
      the C version of recordmcount works for said arch is currently
      called HAVE_C_MCOUNT_RECORD, which enables BUILD_C_RECORDMCOUNT. To
      be more consistent with the name that all archs may use, it has been
      renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since
      we are building a C recordmcount and not a mcount_record.
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: linux-kbuild@vger.kernel.org
      Cc: John Reiser <jreiser@bitwagon.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • Merge branch 'perf/core' of... · d9d572a9
      Ingo Molnar authored
      Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core
  4. 14 Oct, 2010 10 commits
  5. 13 Oct, 2010 1 commit
    • tracing: Fix function-graph build warning on 32-bit · 14cae9bd
      Borislav Petkov authored
      Fix
      
      kernel/trace/trace_functions_graph.c: In function ‘trace_print_graph_duration’:
      kernel/trace/trace_functions_graph.c:652: warning: comparison of distinct pointer types lacks a cast
      
      when building 36-rc6 on 32-bit, due to the strict type check failing
      in the min() macro.
      Signed-off-by: Borislav Petkov <bp@alien8.de>
      Cc: Chase Douglas <chase.douglas@canonical.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <20100929080823.GA13595@liondog.tnic>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  6. 12 Oct, 2010 1 commit
    • oprofile: disable write access to oprofilefs while profiler is running · 7df01d96
      Robert Richter authored
      Oprofile counters are set up when profiling is disabled. Thus, writing
      to oprofilefs has no immediate effect. Changes take effect only after
      oprofile is reenabled.
      
      To keep userland and kernel states synchronized, we now allow
      configuration of oprofile only if profiling is disabled. On a write,
      the kernel checks whether the profiler is running and, if so, denies
      write access to oprofilefs by returning -EBUSY. The change should be
      backward compatible with the current oprofile userland daemon.
      Acked-by: Maynard Johnson <maynardj@us.ibm.com>
      Cc: William Cohen <wcohen@redhat.com>
      Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
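The rule introduced by 7df01d96 (refuse configuration writes while the profiler runs) amounts to a guard at the top of the write path. The following is a minimal sketch of that guard, not the oprofilefs code: the `OprofileFs` class, its `write` method and the `count` setting are hypothetical names used only to show the shape of the check.

```python
import errno


class OprofileFs:
    def __init__(self):
        self.profiler_running = False
        self.settings = {}

    def write(self, name, value):
        """Store a setting, or refuse with -EBUSY while profiling is active."""
        if self.profiler_running:
            return -errno.EBUSY
        self.settings[name] = value
        return 0


fs = OprofileFs()
print(fs.write("count", 100000))   # accepted: profiler is stopped
fs.profiler_running = True
print(fs.write("count", 200000))   # rejected while the profiler runs
print(fs.settings)                 # the rejected write left no trace
```

Because a rejected write changes nothing, what userland believes it configured and what the kernel will use on the next start stay in step.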
  7. 11 Oct, 2010 12 commits