1. 10 Apr, 2009 8 commits
    • Zhaolei's avatar
      tracing, net, skb tracepoint: make skb tracepoint use the TRACE_EVENT() macro · 5cb3d1d9
      Zhaolei authored
      TRACE_EVENT is a more generic way to define a tracepoint.
      Doing so adds these new capabilities to this tracepoint:
      
        - zero-copy and per-cpu splice() tracing
        - binary tracing without printf overhead
        - structured logging records exposed under /debug/tracing/events
        - trace events embedded in function tracer output and other plugins
        - user-defined, per tracepoint filter expressions
      Signed-off-by: default avatarZhao Lei <zhaolei@cn.fujitsu.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Steven Rostedt ;" <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <49DD90D2.5020604@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      5cb3d1d9
    • Steven Rostedt's avatar
      x86, function-graph: only save return values on x86_64 · e71e99c2
      Steven Rostedt authored
      Impact: speed up
      
      The return to handler portion of the function graph tracer should only
      need to save the return values. The caller already saved off the
      registers that the callee can modify. The returning function already
      saved the registers it modified. When we call our own trace function
      it too will save the registers that the callee must restore.
      
      There's no reason to save off anything more that the registers used
      to return the values.
      
      Note, I did a complete kernel build with this modification and the
      function graph tracer running on x86_64.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      e71e99c2
    • Frederic Weisbecker's avatar
      tracing/lockdep: report the time waited for a lock · 2062501a
      Frederic Weisbecker authored
      While trying to optimize the new lock on reiserfs to replace
      the bkl, I find the lock tracing very useful though it lacks
      something important for performance (and latency) instrumentation:
      the time a task waits for a lock.
      
      That's what this patch implements:
      
        bash-4816  [000]   202.652815: lock_contended: lock_contended: &sb->s_type->i_mutex_key
        bash-4816  [000]   202.652819: lock_acquired: &rq->lock (0.000 us)
       <...>-4787  [000]   202.652825: lock_acquired: &rq->lock (0.000 us)
       <...>-4787  [000]   202.652829: lock_acquired: &rq->lock (0.000 us)
        bash-4816  [000]   202.652833: lock_acquired: &sb->s_type->i_mutex_key (16.005 us)
      
      As shown above, the "lock acquired" field is followed by the time
      it has been waiting for the lock. Usually, a lock contended entry
      is followed by a near lock_acquired entry with a non-zero time waited.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1238975373-15739-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      2062501a
    • Ingo Molnar's avatar
      Merge branch 'tracing/urgent' into tracing/core · 1cad1252
      Ingo Molnar authored
      Merge reason: pick up both v2.6.30-rc1 [which includes tracing/urgent fixes]
                    and pick up the current lineup of tracing/urgent fixes as well
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      1cad1252
    • Lai Jiangshan's avatar
      tracing: fix splice return too large · 93cfb3c9
      Lai Jiangshan authored
      I got these from strace:
      
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 16384
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
      
      I wanted to splice_read 4096 bytes, but it returns 8192 or larger.
      
      It is because the return value of tracing_buffers_splice_read()
      does not include "zero out any left over data" bytes.
      
      But tracing_buffers_read() includes these bytes, we make them
      consistent.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D46674.9030804@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      93cfb3c9
    • Lai Jiangshan's avatar
      tracing: update file->f_pos when splice(2) it · c7625a55
      Lai Jiangshan authored
      Impact: Cleanup
      
      These two lines:
      
      	if (unlikely(*ppos))
      		return -ESPIPE;
      
      in tracing_buffers_splice_read() are not needed, VFS layer
      has disabled seek(2).
      
      We remove these two lines, and then we can update file->f_pos.
      
      And tracing_buffers_read() updates file->f_pos, this fix
      make tracing_buffers_splice_read() updates file->f_pos too.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D46670.4010503@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c7625a55
    • Lai Jiangshan's avatar
      tracing: allocate page when needed · ddd538f3
      Lai Jiangshan authored
      Impact: Cleanup
      
      Sometimes, we open trace_pipe_raw, but we don't read(2) it,
      we just splice(2) it, thus, the page is not used.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D4666B.4010608@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      ddd538f3
    • Lai Jiangshan's avatar
      tracing: disable seeking for trace_pipe_raw · d1e7e02f
      Lai Jiangshan authored
      Impact: disable pread()
      
      We set tracing_buffers_fops.llseek to no_llseek,
      but we can still perform pread() to read this file.
      
      That is not expected.
      
      This fix uses nonseekable_open() to disable it.
      
      tracing_buffers_fops.llseek is still set to no_llseek,
      it mark this file is a "non-seekable device" and is used by
      sys_splice(). See also do_splice() or manual of splice(2):
      
      ERRORS
             EINVAL Target file system doesn't support  splicing;
                    neither  of the descriptors refers to a pipe;
                    or offset given for non-seekable device.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D46668.8030806@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d1e7e02f
  2. 09 Apr, 2009 25 commits
  3. 08 Apr, 2009 7 commits
    • Mikulas Patocka's avatar
      dm kcopyd: fix callback race · 340cd444
      Mikulas Patocka authored
      If the thread calling dm_kcopyd_copy is delayed due to scheduling inside
      split_job/segment_complete and the subjobs complete before the loop in
      split_job completes, the kcopyd callback could be invoked from the
      thread that called dm_kcopyd_copy instead of the kcopyd workqueue.
      
      dm_kcopyd_copy -> split_job -> segment_complete -> job->fn()
      
      Snapshots depend on the fact that callbacks are called from the singlethreaded
      kcopyd workqueue and expect that there is no racing between individual
      callbacks. The racing between callbacks can lead to corruption of exception
      store and it can also mean that exception store callbacks are called twice
      for the same exception - a likely reason for crashes reported inside
      pending_complete() / remove_exception().
      
      This patch fixes two problems:
      
      1. job->fn being called from the thread that submitted the job (see above).
      
      - Fix: hand over the completion callback to the kcopyd thread.
      
      2. job->fn(read_err, write_err, job->context); in segment_complete
      reports the error of the last subjob, not the union of all errors.
      
      - Fix: pass job->write_err to the callback to report all error bits
        (it is done already in run_complete_job)
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      340cd444
    • Mikulas Patocka's avatar
      dm kcopyd: prepare for callback race fix · 73830857
      Mikulas Patocka authored
      Use a variable in segment_complete() to point to the dm_kcopyd_client
      struct and only release job->pages in run_complete_job() if any are
      defined.  These changes are needed by the next patch.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      73830857
    • Mikulas Patocka's avatar
      dm: implement basic barrier support · af7e466a
      Mikulas Patocka authored
      Barriers are submitted to a worker thread that issues them in-order.
      
      The thread is modified so that when it sees a barrier request it waits
      for all pending IO before the request then submits the barrier and
      waits for it.  (We must wait, otherwise it could be intermixed with
      following requests.)
      
      Errors from the barrier request are recorded in a per-device barrier_error
      variable. There may be only one barrier request in progress at once.
      
      For now, the barrier request is converted to a non-barrier request when
      sending it to the underlying device.
      
      This patch guarantees correct barrier behavior if the underlying device
      doesn't perform write-back caching. The same requirement existed before
      barriers were supported in dm.
      
      Bottom layer barrier support (sending barriers by target drivers) and
      handling devices with write-back caches will be done in further patches.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      af7e466a
    • Mikulas Patocka's avatar
      dm: remove dm_request loop · 92c63902
      Mikulas Patocka authored
      Remove queue_io return value and a loop in dm_request.
      
      IO may be submitted to a worker thread with queue_io().  queue_io() sets
      DMF_QUEUE_IO_TO_THREAD so that all further IO is queued for the thread. When
      the thread finishes its work, it clears DMF_QUEUE_IO_TO_THREAD and from this
      point on, requests are submitted from dm_request again. This will be used
      for processing barriers.
      
      Remove the loop in dm_request. queue_io() can submit I/Os to the worker thread
      even if DMF_QUEUE_IO_TO_THREAD was not set.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      92c63902
    • Mikulas Patocka's avatar
      dm: rework queueing and suspension · 3b00b203
      Mikulas Patocka authored
      Rework shutting down on suspend and document the associated rules.
      
      Drop write lock in __split_and_process_bio to allow more processing
      concurrency.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      3b00b203
    • Alasdair G Kergon's avatar
      dm: simplify dm_request loop · 54d9a1b4
      Alasdair G Kergon authored
      Refactor the code in dm_request().
      
      Require the new DMF_BLOCK_FOR_SUSPEND flag on readahead bios we will
      discard so we don't drop such bios while processing a barrier.
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      54d9a1b4
    • Alasdair G Kergon's avatar
      dm: split DMF_BLOCK_IO flag into two · 1eb787ec
      Alasdair G Kergon authored
      Split the DMF_BLOCK_IO flag into two.
      
      DMF_BLOCK_IO_FOR_SUSPEND is set when I/O must be blocked while suspending a
      device.  DMF_QUEUE_IO_TO_THREAD is set when I/O must be queued to a
      worker thread for later processing.
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      1eb787ec