Commits · 5cb3d1d9d34ac04bcaa2034139345b2a5fea54c1 · Kirill Smelkov / linux

10 Apr, 2009 8 commits

tracing, net, skb tracepoint: make skb tracepoint use the TRACE_EVENT() macro · 5cb3d1d9

Zhaolei authored Apr 09, 2009

TRACE_EVENT is a more generic way to define a tracepoint.
Doing so adds these new capabilities to this tracepoint:

  - zero-copy and per-cpu splice() tracing
  - binary tracing without printf overhead
  - structured logging records exposed under /debug/tracing/events
  - trace events embedded in function tracer output and other plugins
  - user-defined, per tracepoint filter expressions
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "Steven Rostedt ;" <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <49DD90D2.5020604@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

5cb3d1d9

x86, function-graph: only save return values on x86_64 · e71e99c2

Steven Rostedt authored Mar 25, 2009

Impact: speed up

The return to handler portion of the function graph tracer should only
need to save the return values. The caller already saved off the
registers that the callee can modify. The returning function already
saved the registers it modified. When we call our own trace function
it too will save the registers that the callee must restore.

There's no reason to save off anything more that the registers used
to return the values.

Note, I did a complete kernel build with this modification and the
function graph tracer running on x86_64.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

e71e99c2

tracing/lockdep: report the time waited for a lock · 2062501a

Frederic Weisbecker authored Apr 06, 2009

While trying to optimize the new lock on reiserfs to replace
the bkl, I find the lock tracing very useful though it lacks
something important for performance (and latency) instrumentation:
the time a task waits for a lock.

That's what this patch implements:

  bash-4816  [000]   202.652815: lock_contended: lock_contended: &sb->s_type->i_mutex_key
  bash-4816  [000]   202.652819: lock_acquired: &rq->lock (0.000 us)
 <...>-4787  [000]   202.652825: lock_acquired: &rq->lock (0.000 us)
 <...>-4787  [000]   202.652829: lock_acquired: &rq->lock (0.000 us)
  bash-4816  [000]   202.652833: lock_acquired: &sb->s_type->i_mutex_key (16.005 us)

As shown above, the "lock acquired" field is followed by the time
it has been waiting for the lock. Usually, a lock contended entry
is followed by a near lock_acquired entry with a non-zero time waited.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1238975373-15739-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

2062501a

Merge branch 'tracing/urgent' into tracing/core · 1cad1252

Ingo Molnar authored Apr 10, 2009

Merge reason: pick up both v2.6.30-rc1 [which includes tracing/urgent fixes]
and pick up the current lineup of tracing/urgent fixes as well
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1cad1252

tracing: fix splice return too large · 93cfb3c9

Lai Jiangshan authored Apr 02, 2009

I got these from strace:

 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 16384
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192

I wanted to splice_read 4096 bytes, but it returns 8192 or larger.

It is because the return value of tracing_buffers_splice_read()
does not include "zero out any left over data" bytes.

But tracing_buffers_read() includes these bytes, we make them
consistent.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46674.9030804@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

93cfb3c9

tracing: update file->f_pos when splice(2) it · c7625a55

Lai Jiangshan authored Apr 02, 2009

Impact: Cleanup

These two lines:

	if (unlikely(*ppos))
		return -ESPIPE;

in tracing_buffers_splice_read() are not needed, VFS layer
has disabled seek(2).

We remove these two lines, and then we can update file->f_pos.

And tracing_buffers_read() updates file->f_pos, this fix
make tracing_buffers_splice_read() updates file->f_pos too.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46670.4010503@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

c7625a55

tracing: allocate page when needed · ddd538f3

Lai Jiangshan authored Apr 02, 2009

Impact: Cleanup

Sometimes, we open trace_pipe_raw, but we don't read(2) it,
we just splice(2) it, thus, the page is not used.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D4666B.4010608@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

ddd538f3

tracing: disable seeking for trace_pipe_raw · d1e7e02f

Lai Jiangshan authored Apr 02, 2009

Impact: disable pread()

We set tracing_buffers_fops.llseek to no_llseek,
but we can still perform pread() to read this file.

That is not expected.

This fix uses nonseekable_open() to disable it.

tracing_buffers_fops.llseek is still set to no_llseek,
it mark this file is a "non-seekable device" and is used by
sys_splice(). See also do_splice() or manual of splice(2):

ERRORS
       EINVAL Target file system doesn't support  splicing;
              neither  of the descriptors refers to a pipe;
              or offset given for non-seekable device.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46668.8030806@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

d1e7e02f

09 Apr, 2009 25 commits

MN10300: Kill MN10300's own profiling Kconfig · 62b8e680

David Howells authored Apr 09, 2009

Kill MN10300's own profiling Kconfig as this is superfluous given that the
profiling options have moved to init/Kconfig and arch/Kconfig. Not only is
this now superfluous, but the dependencies are not correct.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

62b8e680

FRV: Use <asm-generic/pgtable.h> in NOMMU mode · 6fde836b

David Howells authored Apr 09, 2009

asm-frv/pgtable.h could just #include <asm-generic/pgtable.h> in NOMMU mode
rather than #defining macros for lazy MMU and CPU stuff.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6fde836b

keys: Handle there being no fallback destination keyring for request_key() · 34574dd1

David Howells authored Apr 09, 2009

When request_key() is called, without there being any standard process
keyrings on which to fall back if a destination keyring is not specified, an
oops is liable to occur when construct_alloc_key() calls down_write() on
dest_keyring's semaphore.

Due to function inlining this may be seen as an oops in down_write() as called
from request_key_and_link().

This situation crops up during boot, where request_key() is called from within
the kernel (such as in CIFS mounts) where nobody is actually logged in, and so
PAM has not had a chance to create a session keyring and user keyrings to act
as the fallback.

To fix this, make construct_alloc_key() not attempt to cache a key if there is
no fallback key if no destination keyring is given specifically.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

34574dd1

afs: BUG to BUG_ON changes · 11ff5f6a

Stoyan Gaydarov authored Apr 09, 2009

Signed-off-by: Stoyan Gaydarov <stoyboyker@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11ff5f6a

Merge branch 'x86-fixes-for-linus' of... · e66dd190

Linus Torvalds authored Apr 09, 2009

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: cpu_debug remove execute permission
  x86: smarten /proc/interrupts output for new counters
  x86: DMI match for the Dell DXP061 as it needs BIOS reboot
  x86: make 64 bit to use default_inquire_remote_apic
  x86, setup: un-resequence mode setting for VGA 80x34 and 80x60 modes
  x86, intel-iommu: fix X2APIC && !ACPI build failure

e66dd190

Merge branch 'tracing-fixes-for-linus' of... · c2ea122c

Linus Torvalds authored Apr 09, 2009

Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: consolidate documents
  blktrace: pass the right pointer to kfree()
  tracing/syscalls: use a dedicated file header
  tracing: append a comma to INIT_FTRACE_GRAPH

c2ea122c

Merge branch 'sched-fixes-for-linus' of... · 17b2e9bf

Linus Torvalds authored Apr 09, 2009

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: do not count frozen tasks toward load
  sched: refresh MAINTAINERS entry
  sched: Print sched_group::__cpu_power in sched_domain_debug
  cpuacct: add per-cgroup utime/stime statistics
  posixtimers, sched: Fix posix clock monotonicity
  sched_rt: don't allocate cpumask in fastpath
  cpuacct: make cpuacct hierarchy walk in cpuacct_charge() safe when rcupreempt is used -v2

17b2e9bf

Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and... · 422a2534

Linus Torvalds authored Apr 09, 2009

Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  printk: fix wrong format string iter for printk
  futex: comment requeue key reference semantics

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  irq: fix cpumask memory leak on offstack cpumask kernels

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  posix-timers: fix RLIMIT_CPU && setitimer(CPUCLOCK_PROF)
  posix-timers: fix RLIMIT_CPU && fork()
  timers: add missing kernel-doc

422a2534

MN10300: Convert obsolete no_irq_type to no_irq_chip · 91e58b6e

Thomas Gleixner authored Apr 09, 2009

Convert the last remaining users to no_irq_chip.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

91e58b6e

Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm · df552929

Linus Torvalds authored Apr 09, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
  dm kcopyd: fix callback race
  dm kcopyd: prepare for callback race fix
  dm: implement basic barrier support
  dm: remove dm_request loop
  dm: rework queueing and suspension
  dm: simplify dm_request loop
  dm: split DMF_BLOCK_IO flag into two
  dm: rearrange dm_wq_work
  dm: remove limited barrier support
  dm: add integrity support

df552929

module: try_then_request_module must wait · 97c18e2c

Herbert Xu authored Apr 09, 2009

Since the whole point of try_then_request_module is to retry
the operation after a module has been loaded, we must wait for
the module to fully load.

Otherwise all sort of things start breaking, e.g., you won't
be able to read your encrypted disks on the first attempt.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Tested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

97c18e2c

sched: do not count frozen tasks toward load · e3c8ca83

Nathan Lynch authored Apr 08, 2009

Freezing tasks via the cgroup freezer causes the load average to climb
because the freezer's current implementation puts frozen tasks in
uninterruptible sleep (D state).

Some applications which perform job-scheduling functions consult the
load average when making decisions.  If a cgroup is frozen, the load
average does not provide a useful measure of the system's utilization
to such applications.  This is especially inconvenient if the job
scheduler employs the cgroup freezer as a mechanism for preempting low
priority jobs.  Contrast this with using SIGSTOP for the same purpose:
the stopped tasks do not count toward system load.

Change task_contributes_to_load() to return false if the task is
frozen.  This results in /proc/loadavg behavior that better meets
users' expectations.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Tested-by: Nigel Cunningham <nigel@tuxonice.net>
Cc: <stable@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: linux-pm@lists.linux-foundation.org
Cc: Matt Helsley <matthltc@us.ibm.com>
LKML-Reference: <20090408194512.47a99b95@manatee.lan>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

e3c8ca83

tracing: consolidate documents · 66bb7488

Li Zefan authored Apr 09, 2009

Move kmemtrace.txt, tracepoints.txt, ftrace.txt and mmiotrace.txt to
the new trace/ directory.

I didnt find any references to those documents in both source
files and documents, so no extra work needs to be done.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Pekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
LKML-Reference: <49DD6E2B.6090200@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

66bb7488

x86: cpu_debug remove execute permission · f20ab9c3

Jaswinder Singh Rajput authored Apr 08, 2009

It seems by mistake these files got execute permissions so removing it.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
LKML-Reference: <1239211186.9037.2.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f20ab9c3

blktrace: pass the right pointer to kfree() · 9eb85125

Li Zefan authored Apr 09, 2009

Impact: fix kfree crash with non-standard act_mask string

If passing a string with leading white spaces to strstrip(),
the returned ptr != the original ptr.

This bug was introduced by me.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49DD694C.8020902@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9eb85125

tracing/syscalls: use a dedicated file header · 47788c58

Frederic Weisbecker authored Apr 08, 2009

Impact: fix build warnings and possibe compat misbehavior on IA64

Building a kernel on ia64 might trigger these ugly build warnings:

CC      arch/ia64/ia32/sys_ia32.o
In file included from arch/ia64/ia32/sys_ia32.c:55:
arch/ia64/ia32/ia32priv.h:290:1: warning: "elf_check_arch" redefined
In file included from include/linux/elf.h:7,
                 from include/linux/module.h:14,
                 from include/linux/ftrace.h:8,
                 from include/linux/syscalls.h:68,
                 from arch/ia64/ia32/sys_ia32.c:18:
arch/ia64/include/asm/elf.h:19:1: warning: this is the location of the previous definition
[...]

sys_ia32.c includes linux/syscalls.h which in turn includes linux/ftrace.h
to import the syscalls tracing prototypes.

But including ftrace.h can pull too much things for a low level file,
especially on ia64 where the ia32 private headers conflict with higher
level headers.

Now we isolate the syscall tracing headers in their own lightweight file.
Reported-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Baron <jbaron@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Michael Davidson <md@google.com>
LKML-Reference: <20090408184058.GB6017@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

47788c58

Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus · f4efdd65

Linus Torvalds authored Apr 08, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  work_on_cpu(): rewrite it to create a kernel thread on demand
  kthread: move sched-realeted initialization from kthreadd context
  kthread: Don't looking for a task in create_kthread() #2

f4efdd65

Merge git://git.infradead.org/battery-2.6 · d8312204

Linus Torvalds authored Apr 08, 2009

* git://git.infradead.org/battery-2.6:
  pda_power: Add optional OTG transceiver and voltage regulator support
  pcf50633_charger: Remove unused mbc_set_status function
  pcf50633_charger: Enable periodic charging restart

d8312204

Merge branch 'for-linus' of... · 3d4d4c8b

Linus Torvalds authored Apr 08, 2009

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  cap_prctl: don't set error to 0 at 'no_change'

3d4d4c8b

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · c71d9caf

Linus Torvalds authored Apr 08, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  igb: remove sysfs entry that was used to set the number of vfs
  igbvf: add new driver to support 82576 virtual functions
  drivers/net/eql.c: Fix a dev leakage.
  niu: Fix unused variable warning.
  r6040: set MODULE_VERSION
  bnx2: Don't use reserved names
  FEC driver: add missing #endif
  niu: Fix error handling
  mv643xx_eth: don't reset the rx coal timer on interface up
  smsc911x: correct debugging message on mii read timeout
  ethoc: fix library build errors
  netfilter: ctnetlink: fix regression in expectation handling
  netfilter: fix selection of "LED" target in netfilter
  netfilter: ip6tables regression fix

c71d9caf

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 · a219ee88

Linus Torvalds authored Apr 08, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc: Hook up sys_preadv and sys_pwritev
  sparc64: add_node_ranges() must be __init
  serial: sunsu: sunsu_kbd_ms_init needs to be __devinit
  sparc: Fix section mismatch warnings in cs4231 sound driver.
  sparc64: Fix section mismatch warnings in PCI controller drivers.
  sparc64: Fix section mismatch warnings in power driver.
  sparc64: get_cells() can't be marked __init

a219ee88

Merge branch 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · a7b334de

Linus Torvalds authored Apr 08, 2009

* 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext3: Try to avoid starting a transaction in writepage for data=writepage
  block_write_full_page: switch synchronous writes to use WRITE_SYNC_PLUG

a7b334de

work_on_cpu(): rewrite it to create a kernel thread on demand · 6b44003e

Andrew Morton authored Apr 09, 2009

Impact: circular locking bugfix

The various implemetnations and proposed implemetnations of work_on_cpu()
are vulnerable to various deadlocks because they all used queues of some
form.

Unrelated pieces of kernel code thus gained dependencies wherein if one
work_on_cpu() caller holds a lock which some other work_on_cpu() callback
also takes, the kernel could rarely deadlock.

Fix this by creating a short-lived kernel thread for each work_on_cpu()
invokation.

This is not terribly fast, but the only current caller of work_on_cpu() is
pci_call_probe().

It would be nice to find some other way of doing the node-local
allocations in the PCI probe code so that we can zap work_on_cpu()
altogether.  The code there is rather nasty.  I can't think of anything
simple at this time...

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

6b44003e

kthread: move sched-realeted initialization from kthreadd context · 1c99315b

Oleg Nesterov authored Apr 09, 2009

kthreadd is the single thread which implements ths "create" request, move
sched_setscheduler/etc from create_kthread() to kthread_create() to
improve the scalability.

We should be careful with sched_setscheduler(), use _nochek helper.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Vitaliy Gusev <vgusev@openvz.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

1c99315b

kthread: Don't looking for a task in create_kthread() #2 · 3217ab97

Vitaliy Gusev authored Apr 09, 2009

Remove the unnecessary find_task_by_pid_ns(). kthread() can just
use "current" to get the same result.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

3217ab97

08 Apr, 2009 7 commits

dm kcopyd: fix callback race · 340cd444

Mikulas Patocka authored Apr 09, 2009

If the thread calling dm_kcopyd_copy is delayed due to scheduling inside
split_job/segment_complete and the subjobs complete before the loop in
split_job completes, the kcopyd callback could be invoked from the
thread that called dm_kcopyd_copy instead of the kcopyd workqueue.

dm_kcopyd_copy -> split_job -> segment_complete -> job->fn()

Snapshots depend on the fact that callbacks are called from the singlethreaded
kcopyd workqueue and expect that there is no racing between individual
callbacks. The racing between callbacks can lead to corruption of exception
store and it can also mean that exception store callbacks are called twice
for the same exception - a likely reason for crashes reported inside
pending_complete() / remove_exception().

This patch fixes two problems:

1. job->fn being called from the thread that submitted the job (see above).

- Fix: hand over the completion callback to the kcopyd thread.

2. job->fn(read_err, write_err, job->context); in segment_complete
reports the error of the last subjob, not the union of all errors.

- Fix: pass job->write_err to the callback to report all error bits
  (it is done already in run_complete_job)

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

340cd444

dm kcopyd: prepare for callback race fix · 73830857

Mikulas Patocka authored Apr 09, 2009

Use a variable in segment_complete() to point to the dm_kcopyd_client
struct and only release job->pages in run_complete_job() if any are
defined.  These changes are needed by the next patch.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

73830857

dm: implement basic barrier support · af7e466a

Mikulas Patocka authored Apr 09, 2009

Barriers are submitted to a worker thread that issues them in-order.

The thread is modified so that when it sees a barrier request it waits
for all pending IO before the request then submits the barrier and
waits for it.  (We must wait, otherwise it could be intermixed with
following requests.)

Errors from the barrier request are recorded in a per-device barrier_error
variable. There may be only one barrier request in progress at once.

For now, the barrier request is converted to a non-barrier request when
sending it to the underlying device.

This patch guarantees correct barrier behavior if the underlying device
doesn't perform write-back caching. The same requirement existed before
barriers were supported in dm.

Bottom layer barrier support (sending barriers by target drivers) and
handling devices with write-back caches will be done in further patches.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

af7e466a

dm: remove dm_request loop · 92c63902

Mikulas Patocka authored Apr 09, 2009

Remove queue_io return value and a loop in dm_request.

IO may be submitted to a worker thread with queue_io().  queue_io() sets
DMF_QUEUE_IO_TO_THREAD so that all further IO is queued for the thread. When
the thread finishes its work, it clears DMF_QUEUE_IO_TO_THREAD and from this
point on, requests are submitted from dm_request again. This will be used
for processing barriers.

Remove the loop in dm_request. queue_io() can submit I/Os to the worker thread
even if DMF_QUEUE_IO_TO_THREAD was not set.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

92c63902

dm: rework queueing and suspension · 3b00b203

Mikulas Patocka authored Apr 09, 2009

Rework shutting down on suspend and document the associated rules.

Drop write lock in __split_and_process_bio to allow more processing
concurrency.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

3b00b203

dm: simplify dm_request loop · 54d9a1b4

Alasdair G Kergon authored Apr 09, 2009

Refactor the code in dm_request().

Require the new DMF_BLOCK_FOR_SUSPEND flag on readahead bios we will
discard so we don't drop such bios while processing a barrier.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

54d9a1b4

dm: split DMF_BLOCK_IO flag into two · 1eb787ec

Alasdair G Kergon authored Apr 09, 2009

Split the DMF_BLOCK_IO flag into two.

DMF_BLOCK_IO_FOR_SUSPEND is set when I/O must be blocked while suspending a
device.  DMF_QUEUE_IO_TO_THREAD is set when I/O must be queued to a
worker thread for later processing.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

1eb787ec