Commits · 3f029d3c6d62068d59301d90c18dbde8ee402107 · nexedi / linux

02 Aug, 2009 9 commits

sched: Enhance the pre/post scheduling logic · 3f029d3c

Gregory Haskins authored Jul 29, 2009

We currently have an explicit "needs_post" vtable method which
returns a stack variable for whether we should later run
post-schedule.  This leads to an awkward exchange of the
variable as it bubbles back up out of the context switch. Peter
Zijlstra observed that this information could be stored in the
run-queue itself instead of handled on the stack.

Therefore, we revert to the method of having context_switch
return void, and update an internal rq->post_schedule variable
when we require further processing.

In addition, we fix a race condition where we try to access
current->sched_class without holding the rq->lock.  This is
technically racy, as the sched-class could change out from
under us.  Instead, we reference the per-rq post_schedule
variable with the runqueue unlocked, but with preemption
disabled to see if we need to reacquire the rq->lock.

Finally, we clean the code up slightly by removing the #ifdef
CONFIG_SMP conditionals from the schedule() call, and implement
some inline helper functions instead.

This patch passes checkpatch, and rt-migrate.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

3f029d3c

sched: Add new prio to cpupri before removing old prio · c3a2ae3d

Steven Rostedt authored Jul 29, 2009

We need to add the new prio to the cpupri accounting before
removing the old prio. This is because removing the old prio
first will open a race window where the cpu will be removed
from pri_active. In this case the cpu will not be visible for
RT push and pulls. This could cause a RT task to not migrate
appropriately, and create a very large latency.

This bug was found with the use of ftrace sched events and
trace_printk.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090729042526.438281019@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

c3a2ae3d

sched: Check for pushing rt tasks after all scheduling · da19ab51

Steven Rostedt authored Jul 29, 2009

The current method for pushing RT tasks after scheduling only
happens after a context switch. But we found cases where a task
is set up on a run queue to be pushed but the push never
happens because the schedule chooses the same task.

This bug was found with the help of Gregory Haskins and the use
of ftrace (trace_printk). It tooks several days for both of us
analyzing the code and the trace output to find this.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090729042526.205923666@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

da19ab51

sched: Optimize unused cgroup configuration · e7097159

Peter Zijlstra authored Jun 03, 2009

When cgroup group scheduling is built in, skip some code paths
if we don't have any (but the root) cgroups configured.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

e7097159

sched: Fix cgroup smp fairness · a5004278

Peter Zijlstra authored Jul 27, 2009

Commit ec4e0e2f ("fix
inconsistency when redistribute per-cpu tg->cfs_rq shares")
broke cgroup smp fairness.

In order to avoid starvation of newly placed tasks, we never
quite set the share of an empty cpu group-task to 0, but
instead we set it as if there's a single NICE-0 task present.

If however we actually set this in cfs_rq[cpu]->shares, that
means the total shares for that group will be slightly inflated
every time we balance, causing the observed unfairness.

Fix this by setting cfs_rq[cpu]->shares to 0 but actually
setting the effective weight of the related se to the inflated
number.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1248696557.6987.1615.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

a5004278

Merge branch 'sched/urgent' into sched/core · 8e9ed8b0
Ingo Molnar authored Aug 02, 2009
```
Merge reason: avoid upcoming patch conflict.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
```
8e9ed8b0

sched: Fix race in cpupri introduced by cpumask_var changes · 07903af1

Gregory Haskins authored Jul 30, 2009

Background:

Several race conditions in the scheduler have cropped up
recently, which Steven and I have tracked down using ftrace.
The most recent one turns out to be a race in how the scheduler
determines a suitable migration target for RT tasks, introduced
recently with commit:

    commit 68e74568
    Date:   Tue Nov 25 02:35:13 2008 +1030

        sched: convert struct cpupri_vec cpumask_var_t.

The original design of cpupri allowed lockless readers to
quickly determine a best-estimate target.  Races between the
pri_active bitmap and the vec->mask were handled in the
original code because we would detect and return "0" when this
occured.  The design was predicated on the *effective*
atomicity (*) of caching the result of cpus_and() between the
cpus_allowed and the vec->mask.

Commit 68e74568 changed the behavior such that vec->mask is
accessed multiple times.  This introduces a subtle race, the
result of which means we can have a result that returns "1",
but with an empty bitmap.

*) yes, we know cpus_and() is not a locked operator across the
   entire composite array, but it is implicitly atomic on a
   per-word basis which is all the design required to work.

Implementation:

Rather than forgoing the lockless design, or reverting to a
stack-based cpumask_t, we simply check for when the race has
been encountered and continue processing in the event that the
race is hit.  This renders the removal race as if the priority
bit had been atomically cleared as well, and allows the
algorithm to execute correctly.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090730145728.25226.92769.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

07903af1

sched: Fix latencytop and sleep profiling vs group scheduling · e414314c

Peter Zijlstra authored Jul 23, 2009

The latencytop and sleep accounting code assumes that any
scheduler entity represents a task, this is not so.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

e414314c

sched: Fix cond_resched_lock() in !CONFIG_PREEMPT · 716a4234

Frederic Weisbecker authored Jul 24, 2009

The might_sleep() test inside cond_resched_lock() assumes the
spinlock is held and then preemption is disabled. This is true
with CONFIG_PREEMPT but the preempt_count() doesn't change
otherwise.

Check by starting from the appropriate preempt offset depending
on the config.
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1248458723-12146-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

716a4234

01 Aug, 2009 1 commit
- Linux 2.6.31-rc5 · ed680c4a
  Linus Torvalds authored Jul 31, 2009
  
  ed680c4a
31 Jul, 2009 19 commits

Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs · f5266cbd

Linus Torvalds authored Jul 31, 2009

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: bump up nr_to_write in xfs_vm_writepage
  xfs: reduce bmv_count in xfs_vn_fiemap

f5266cbd

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · a5bc92cd

Linus Torvalds authored Jul 31, 2009

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  io context: fix ref counting
  block: make the end_io functions be non-GPL exports
  block: fix improper kobject release in blk_integrity_unregister
  block: always assign default lock to queues
  mg_disk: Add missing ready status check on mg_write()
  mg_disk: fix issue with data integrity on error in mg_write()
  mg_disk: fix reading invalid status when use polling driver
  mg_disk: remove prohibited sleep operation

a5bc92cd

Merge branch 'timers-fixes-for-linus' of... · 6eb80e00

Linus Torvalds authored Jul 31, 2009

Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource: Save mult_orig in clocksource_disable()

6eb80e00

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc · d27d4e2a

Linus Torvalds authored Jul 31, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
  mmc: orphan subsystem
  imxmmc: Remove unnecessary semicolons
  cb710: use SG_MITER_TO_SG/SG_MITER_FROM_SG
  sdhci: use SG_MITER_TO_SG/SG_MITER_FROM_SG
  lib/scatterlist: add a flags to signalize mapping direction

d27d4e2a

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 · dbe63a2c

Linus Torvalds authored Jul 31, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: sound/aoa: Add kmalloc NULL tests
  ALSA: hda - Increase PCM stream name buf in patch_realtek.c
  sound: mpu401.c: Buffer overflow
  sound: aedsp16: Buffer overflow
  ALSA: hda: fix out-of-bound hdmi_eld.sad[] write
  ALSA: hda - Add quirk for Dell Studio 1555

dbe63a2c

clocksource: Save mult_orig in clocksource_disable() · c7121843

Magnus Damm authored Jul 28, 2009

To fix the common case where ->enable() does not set up
mult, make sure mult_orig is saved in mult on disable.

Also add comments to explain why we do this.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: johnstul@us.ibm.com
Cc: lethal@linux-sh.org
Cc: akpm@linux-foundation.org
LKML-Reference: <20090618152432.10136.9932.sendpatchset@rx1.opensource.se>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

c7121843

mmc: orphan subsystem · 3822a0e3

Pierre Ossman authored Jul 31, 2009

I do not have the time to take care of this, so remove myself as
maintainer.
Signed-off-by: Pierre Ossman <pierre@ossman.eu>

3822a0e3

imxmmc: Remove unnecessary semicolons · a9239d75

Joe Perches authored Jun 28, 2009

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>

a9239d75

cb710: use SG_MITER_TO_SG/SG_MITER_FROM_SG · 4b2a108c

Sebastian Andrzej Siewior authored Jun 22, 2009

the code allready uses flush_kernel_dcache_page(). This patch updates the
driver to the recent sg API changes which require that either SG_MITER_TO_SG
or SG_MITER_FROM_SG is set. SG_MITER_TO_SG calls flush_kernel_dcache_page()
in sg_mitter_stop()
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Acked-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>

4b2a108c

sdhci: use SG_MITER_TO_SG/SG_MITER_FROM_SG · da60a91d

Sebastian Andrzej Siewior authored Jun 18, 2009

so the page will be flushed on unmap on ARCH which need it.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>

da60a91d

lib/scatterlist: add a flags to signalize mapping direction · 6de7e356

Sebastian Andrzej Siewior authored Jun 18, 2009

sg_miter_start() is currently unaware of the direction of the copy
process (to or from the scatter list). It is important to know the
direction because the page has to be flushed in case the data written
is seen on a different mapping in user land on cache incoherent
architectures.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>

6de7e356

Merge branch 'fix/oss' into for-linus · ec86fe52
Takashi Iwai authored Jul 31, 2009
```
* fix/oss:
  sound: mpu401.c: Buffer overflow
  sound: aedsp16: Buffer overflow
```
ec86fe52
Merge branch 'fix/misc' into for-linus · d62e345f
Takashi Iwai authored Jul 31, 2009
```
* fix/misc:
  ALSA: sound/aoa: Add kmalloc NULL tests
```
d62e345f

Merge branch 'fix/hda' into for-linus · 6280b61a

Takashi Iwai authored Jul 31, 2009

* fix/hda:
  ALSA: hda - Increase PCM stream name buf in patch_realtek.c
  ALSA: hda: fix out-of-bound hdmi_eld.sad[] write
  ALSA: hda - Add quirk for Dell Studio 1555

6280b61a

ALSA: sound/aoa: Add kmalloc NULL tests · f065fabc

Julia Lawall authored Jul 31, 2009

Check that the result of kzalloc is not NULL before a dereference.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
expression *x;
identifier f;
constant char *C;
@@

x = \(kmalloc\|kcalloc\|kzalloc\)(...);
... when != x == NULL
    when != x != NULL
    when != (x || ...)
(
kfree(x)
|
f(...,C,...,x,...)
|
*f(...,x,...)
|
*x->f
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

f065fabc

ALSA: hda - Increase PCM stream name buf in patch_realtek.c · aa563af7

Takashi Iwai authored Jul 31, 2009

The name buf with size 16 is too short for some codec names, e.g.
truncated like "ALC861-VD Analo".  Now the size is doubled.
Signed-off-by: Takashi Iwai <tiwai@suse.de>

aa563af7

io context: fix ref counting · cbb4f264

Li Zefan authored Jul 31, 2009

Commit d9c7d394
("block: prevent possible io_context->refcount overflow") mistakenly
changed atomic_inc(&ioc->nr_tasks) to atomic_long_inc(&ioc->refcount).
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cbb4f264

xfs: bump up nr_to_write in xfs_vm_writepage · c8a4051c

Eric Sandeen authored Jul 31, 2009

VM calculation for nr_to_write seems off.  Bump it way
up, this gets simple streaming writes zippy again.
To be reviewed again after Jens' writeback changes.
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Cc: Chris Mason <chris.mason@oracle.com>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>

c8a4051c

xfs: reduce bmv_count in xfs_vn_fiemap · 97db39a1

Eric Sandeen authored Jul 26, 2009

commit 6321e3ed caused
the full bmv_count's worth of getbmapx structures to get
allocated; telling it to do MAXEXTNUM was a bit insane,
resulting in ENOMEM every time.

Chop it down to something reasonable, the number of slots
in the caller's input buffer.  If this is too large the
caller may get ENOMEM but the reason should not be a
mystery, and they can try again with something smaller.

We add 1 to the value because in the normal getbmap
world, bmv_count includes the header and xfs_getbmap does:

        nex = bmv->bmv_count - 1;
        if (nex <= 0)
                return XFS_ERROR(EINVAL);
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Felix Blyakher <felixb@sgi.com>

97db39a1

30 Jul, 2009 11 commits

Merge branch 'tracing-fixes-for-linus' of... · b5929724

Linus Torvalds authored Jul 30, 2009

Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing/stat: Fix seqfile memory leak
  function-graph: Fix seqfile memory leak
  trace_stack: Fix seqfile memory leak
  profile: Suppress warning about large allocations when profile=1 is specified

b5929724

Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable · ec6a8679

Linus Torvalds authored Jul 30, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: be more polite in the async caching threads
  Btrfs: preserve commit_root for async caching

ec6a8679

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx · db06816c

Linus Torvalds authored Jul 30, 2009

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
  dmaengine: at_hdmac: add DMA slave transfers
  dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller
  dmaengine: dmatest: correct thread_count while using multiple thread per channel
  dmaengine: dmatest: add a maximum number of test iterations
  drivers/dma: Remove unnecessary semicolons
  drivers/dma/fsldma.c: Remove unnecessary semicolons
  dmaengine: move HIGHMEM64G restriction to ASYNC_TX_DMA
  fsldma: do not clear bandwidth control bits on the 83xx controller
  fsldma: enable external start for the 83xx controller
  fsldma: use PCI Read Multiple command

db06816c

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 · 784b1d6b

Linus Torvalds authored Jul 30, 2009

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  udf: Fix loading of VAT inode when drive wrongly reports number of recorded blocks

784b1d6b

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6 · 691c5f73
Linus Torvalds authored Jul 30, 2009
```
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6:
  quota: Silence lockdep on quota_on
```
691c5f73

Merge git://git.infradead.org/users/cbou/battery-2.6.31 · fbdbf838

Linus Torvalds authored Jul 30, 2009

* git://git.infradead.org/users/cbou/battery-2.6.31:
  Add ds2782 battery gas gauge driver
  olpc_battery: Ensure that the TRICKLE bit is checked
  olpc_battery: Fix up eeprom read function

fbdbf838

Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes · 79af3133

Linus Torvalds authored Jul 30, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: remove dcache entries for remote deleted inodes
  GFS2: Fix incorrent statfs consistency check
  GFS2: Don't put unlikely reclaim candidates on the reclaim list.
  GFS2: Don't try and dealloc own inode
  GFS2: Fix panic in glock memory shrinker
  GFS2: keep statfs info in sync on grows
  GFS2: Shrink the shrinker

79af3133

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · e1ca4aed

Linus Torvalds authored Jul 30, 2009

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Update defconfigs for embedded 6xx/7xxx, 8xx, 8{3,5,6}xxx
  powerpc/86xx: Update GE Fanuc sbc310 default configuration
  powerpc/86xx: Update defconfig for GE Fanuc's PPC9A
  cpm_uart: Don't use alloc_bootmem in cpm_uart_cpm2.c
  powerpc/83xx: Fix PCI IO base address on MPC837xE-RDB boards
  powerpc/85xx: Don't scan for TBI PHY addresses on MPC8569E-MDS boards
  powerpc/85xx: Fix ethernet link detection on MPC8569E-MDS boards
  powerpc/mm: Fix SMP issue with MMU context handling code

e1ca4aed

Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus · 6ae7d6f0

Linus Torvalds authored Jul 30, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  lguest and virtio: cleanup struct definitions to Linux style.
  lguest: update commentry
  lguest: fix comment style
  virtio: refactor find_vqs
  virtio: delete vq from list
  virtio: fix memory leak on device removal
  lguest: fix descriptor corruption in example launcher
  lguest: dereferencing freed mem in add_eventfd()

6ae7d6f0

kprobes: Use kernel_text_address() for checking probe address · ec30c5f3

Masami Hiramatsu authored Jul 28, 2009

Use kernel_text_address() for checking probe address instead of
__kernel_text_address(), because __kernel_text_address() returns true
for init functions even after relaseing those functions.

That will hit a BUG() in text_poke().
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ec30c5f3

Alan doesn't want to maintain tty code any more · 90a09c9c

Linus Torvalds authored Jul 30, 2009

Not that anybody can blame him. It's a morass. But hey, it's way
better than it _used_ to be, though, so thanks for all the fish.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

90a09c9c