Commits · 9ad4107ba137f743fc531f9f4ceb81b122f9ff25 · Kirill Smelkov / linux

03 Jul, 2008 19 commits

Merge branch 'i2c-fix' of git://aeryn.fluff.org.uk/bjdooks/linux · 9ad4107b

Linus Torvalds authored Jul 02, 2008

* 'i2c-fix' of git://aeryn.fluff.org.uk/bjdooks/linux:
  I2C: S3C2410: Add MODULE_ALIAS() for s3c2440 device.
  I2C: S3C2410: Fixup error codes returned rom a transfer.
  I2C: S3C2410: Check ACK on byte transmission

9ad4107b

Merge branch 'for-2.6.26' of git://git.kernel.dk/linux-2.6-block · 0e77a07f

Linus Torvalds authored Jul 02, 2008

* 'for-2.6.26' of git://git.kernel.dk/linux-2.6-block:
  Properly notify block layer of sync writes
  block: Fix the starving writes bug in the anticipatory IO scheduler

0e77a07f

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 · 23c0e4a2

Linus Torvalds authored Jul 02, 2008

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] export account_system_vtime
  [IA64] Bugfix for system with 32 cpus

23c0e4a2

Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb · 3a57a788

Linus Torvalds authored Jul 02, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
  V4L/DVB (8178): uvc: Fix compilation breakage for the other drivers, if uvc is selected
  V4L/DVB (8145a): USB Video Class driver

3a57a788

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 · a16b4bcd

Linus Torvalds authored Jul 02, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
  ide: fix /proc/ide/ide?/mate reporting
  Revert "BAST: Remove old IDE driver"

a16b4bcd

Merge master.kernel.org:/home/rmk/linux-2.6-arm · 15895b93

Linus Torvalds authored Jul 02, 2008

* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] 5131/1: Annotate platform_secondary_init with trace_hardirqs_off
  [ARM] 5117/1: pxafb: fix __devinit/exit annotations
  [ARM] Export dma_sync_sg_for_device()
  [ARM] 5109/1: Mark rtc sa1100 driver as wakeup source before registering it
  [ARM] 5116/1: pxafb: cleanup and fix order of failure handling
  [ARM] 5115/1: pxafb: fix ifdef for command line option handling
  ARM: OMAP: Correcting the gpmc prefetch control register address
  ARM: OMAP: DMA: Don't mark channel active in omap_enable_channel_irq

15895b93

tty: Fix inverted logic in send_break · 3e2a078c

Alan Cox authored Jun 30, 2008

Not sure how this came to get inverted but it appears to have been my
mess up.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3e2a078c

Merge branch 'sched-fixes-for-linus' of... · b2a4a7ce

Linus Torvalds authored Jul 02, 2008

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: fix divide error when trying to configure rt_period to zero

b2a4a7ce

Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6 · f7572da5

Linus Torvalds authored Jul 02, 2008

* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  i2c: Fix bad hint about irqs in i2c.h
  i2c: Documentation: fix device matching description

f7572da5

Merge branch 'core-fixes-for-linus' of... · c000131c

Linus Torvalds authored Jul 02, 2008

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: fix hotplug vs rcu race

c000131c

Merge branch 'x86-fixes-for-linus' of... · 041924ec

Linus Torvalds authored Jul 02, 2008

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix NODES_SHIFT Kconfig range

041924ec

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 · f36b7a2c

Linus Torvalds authored Jul 02, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] esp: tidy up target reference counting
  [SCSI] esp: Fix OOPS in esp_reset_cleanup().
  [SCSI] ses: Fix timeout

f36b7a2c

Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm · cefcade9
Linus Torvalds authored Jul 02, 2008
```
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
  dm crypt: use cond_resched
```
cefcade9

Merge branch 'for-2.6.26' of git://neil.brown.name/md · c6b96d19

Linus Torvalds authored Jul 02, 2008

* 'for-2.6.26' of git://neil.brown.name/md:
  Fix error paths if md_probe fails.
  Don't acknowlege that stripe-expand is complete until it really is.
  Ensure interrupted recovery completed properly (v1 metadata plus bitmap)

c6b96d19

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc · 79ff1ad2

Linus Torvalds authored Jul 02, 2008

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  powerpc/mpc5200: Fix lite5200b suspend/resume
  powerpc/legacy_serial: Bail if reg-offset/shift properties are present
  powerpc/bootwrapper: update for initrd with simpleImage

79ff1ad2

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 821b03ff

Linus Torvalds authored Jul 02, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (55 commits)
  net: fib_rules: fix error code for unsupported families
  netdevice: Fix wrong string handle in kernel command line parsing
  net: Tyop of sk_filter() comment
  netlink: Unneeded local variable
  net-sched: fix filter destruction in atm/hfsc qdisc destruction
  net-sched: change tcf_destroy_chain() to clear start of filter list
  ipv4: fix sysctl documentation of time related values
  mac80211: don't accept WEP keys other than WEP40 and WEP104
  hostap: fix sparse warnings
  hostap: don't report useless WDS frames by default
  textsearch: fix Boyer-Moore text search bug
  netfilter: nf_conntrack_tcp: fixing to check the lower bound of valid ACK
  ipv6 route: Convert rt6_device_match() to use RT6_LOOKUP_F_xxx flags.
  netlabel: Fix a problem when dumping the default IPv6 static labels
  net/inet_lro: remove setting skb->ip_summed when not LRO-able
  inet fragments: fix race between inet_frag_find and inet_frag_secret_rebuild
  CONNECTOR: add a proc entry to list connectors
  netlink: Fix some doc comments in net/netlink/attr.c
  tcp: /proc/net/tcp rto,ato values not scaled properly (v2)
  include/linux/netdevice.h: don't export MAX_HEADER to userspace
  ...

821b03ff

DRM/i915: only use tiled blits on 965+ · 3d25802e

Jesse Barnes authored Jul 01, 2008

When scheduled swaps occur, we need to blit between front & back
buffers.  If the buffers are tiled, we need to set the appropriate
XY_SRC_COPY tile bit, but only on 965 chips, since it will cause
corruption on pre-965 (e.g. 945).

Bug reported by and fix tested by Tomas Janousek <tomi@nomi.cz>.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Dave Airlie <airlied@linux.ie>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3d25802e

drivers/input/ff-core.c needs <linux/sched.h> · 83680cdb

Geert Uytterhoeven authored Jul 01, 2008

Commit 656acd2b ("Input: fix locking in
force-feedback core") causes the following regression on m68k:

| linux/drivers/input/ff-core.c: In function 'input_ff_upload':
| linux/drivers/input/ff-core.c:172: error: dereferencing pointer to incomplete type
| linux/drivers/input/ff-core.c: In function 'erase_effect':
| linux/drivers/input/ff-core.c:197: error: dereferencing pointer to incomplete type
| linux/drivers/input/ff-core.c:204: error: dereferencing pointer to incomplete type
| make[4]: *** [drivers/input/ff-core.o] Error 1

As the incomplete type is `struct task_struct', including <linux/sched.h> fixes
it.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

83680cdb

Merge branch 'for-2.6.26' of git://git.secretlab.ca/git/linux-2.6-mpc52xx into merge · 781c74b1
Paul Mackerras authored Jul 03, 2008

781c74b1

02 Jul, 2008 9 commits

V4L/DVB (8178): uvc: Fix compilation breakage for the other drivers, if uvc is selected · 06f3ed23

Mauro Carvalho Chehab authored Jul 02, 2008

UVC makefile defines obj as:
	obj-$(CONFIG_USB_VIDEO_CLASS) := uvcvideo.o
Instead of:
	obj-$(CONFIG_USB_VIDEO_CLASS) += uvcvideo.o

Due to that, if uvc is selected, all obj-y or obj-m that were added to
compilation were forget. This breaks a proper kernel build.
Acked-by: Laurent Pinchart <laurent.pinchart@skynet.be>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>

06f3ed23

dm crypt: use cond_resched · c7f1b204

Milan Broz authored Jul 02, 2008

Add cond_resched() to prevent monopolising CPU when processing large bios.

dm-crypt processes encryption of bios in sector units.  If the bio request
is big it can spend a long time in the encryption call.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Tested-by: Yan Li <elliot.li.tech@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

c7f1b204

net: fib_rules: fix error code for unsupported families · 2fe195cf

Patrick McHardy authored Jul 01, 2008

The errno code returned must be negative.

Fixes "RTNETLINK answers: Unknown error 18446744073709551519".
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2fe195cf

netdevice: Fix wrong string handle in kernel command line parsing · 93b3cff9

Wang Chen authored Jul 01, 2008

v1->v2: Use strlcpy() to ensure s[i].name be null-termination.

1. In netdev_boot_setup_add(), a long name will leak.
   ex. : dev=21,0x1234,0x1234,0x2345,eth123456789verylongname.........
2. In netdev_boot_setup_check(), mismatch will happen if s[i].name
   is a substring of dev->name.
   ex. : dev=...eth1 dev=...eth11

[ With feedback from Ben Hutchings. ]
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93b3cff9

net: Tyop of sk_filter() comment · 8fde8a07

Wang Chen authored Jul 01, 2008

Parameter "needlock" no long exists.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8fde8a07

netlink: Unneeded local variable · 84874607

Wang Chen authored Jul 01, 2008

We already have a variable, which has the same capability.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

84874607

net-sched: fix filter destruction in atm/hfsc qdisc destruction · a4aebb83

Patrick McHardy authored Jul 01, 2008

Filters need to be destroyed before beginning to destroy classes
since the destination class needs to still be alive to unbind the
filter.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a4aebb83

net-sched: change tcf_destroy_chain() to clear start of filter list · ff31ab56

Patrick McHardy authored Jul 01, 2008

Pass double tcf_proto pointers to tcf_destroy_chain() to make it
clear the start of the filter list for more consistency.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff31ab56

ipv4: fix sysctl documentation of time related values · 77a538d5

Stephen Hemminger authored Jul 01, 2008

These sysctl values are time related and all use the same routine
(proc_dointvec_jiffies) that internally converts from seconds to jiffies.
The code is fine, the documentation is just wrong.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77a538d5

01 Jul, 2008 12 commits

powerpc/mpc5200: Fix lite5200b suspend/resume · 18d76ac9

Tim Yamin authored Jun 17, 2008

Suspend/resume ("echo mem > /sys/power/state") does not work with
vanilla kernels -- the system does not suspend correctly and just
hangs. This patch fixes this so suspend/resume works:

1) of_iomap does not map the whole 0xC000 of the MPC5200 immr so
saving registers does not work.
2) PCI registers need to be saved and restored.
Signed-off-by: Tim Yamin <plasm@roo.me.uk>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>

18d76ac9

powerpc/legacy_serial: Bail if reg-offset/shift properties are present · 1e6d1f26

John Linn authored Jul 01, 2008

The legacy serial driver does not work with an 8250 type UART that is
described in the device tree with the reg-offset and reg-shift
properties.  This change makes legacy_serial ignore these devices.
Signed-off-by: John Linn <john.linn@xilinx.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>

1e6d1f26

i2c: Fix bad hint about irqs in i2c.h · 8e29da9e

Wolfram Sang authored Jul 01, 2008

i2c.h mentions -1 as a not-issued irq. This false hint was taken by
of_i2c and caused crashes. Don't give any advice as 'no irq' is not
consistent across all architectures yet and it is not needed internally
by the i2c-core.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

8e29da9e

i2c: Documentation: fix device matching description · 2260e63a

Ben Dooks authored Jul 01, 2008

The matching process described for new style clients in
Documentation/i2c/writing-clients is classed as out-of-date
as it requires the presence of an .id_table entry in the
driver's i2c_driver entry.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

2260e63a

powerpc/bootwrapper: update for initrd with simpleImage · 5d1a0411

John Linn authored Jul 01, 2008

This change to the makefile corrects the build of a simpleImage with initrd.
Signed-off-by: John Linn <john.linn@xilinx>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>

5d1a0411

I2C: S3C2410: Add MODULE_ALIAS() for s3c2440 device. · d150a4bb

Ben Dooks authored Jul 01, 2008

Add a MODULE_ALIAS() statement for the i2c-s3c2410 controller
to ensure that it can be autoloaded on the S3C2440 systems that
we support.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>

d150a4bb

I2C: S3C2410: Fixup error codes returned rom a transfer. · 63f5c289

Ben Dooks authored Jul 01, 2008

The driver should be returning -ENXIO for transfers that do not
pass the initial address byte stage.

Note, also small tidyups to the driver comments in the area.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>

63f5c289

I2C: S3C2410: Check ACK on byte transmission · 2709781b

Ben Dooks authored Jul 01, 2008

We should check for the reception of an ACK after transmitting each
data byte. The address send has been correctly checking this, but the
data write byte state should have also been checking for these failures.

As part of the same fix, we remove the ACK checking from the receive
path where it should not have been checking for an ACK which our hardware
was sending.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>

2709781b

rcu: fix hotplug vs rcu race · 8558f8f8

Gautham R Shenoy authored Jun 27, 2008

Dhaval Giani reported this warning during cpu hotplug stress-tests:

| On running kernel compiles in parallel with cpu hotplug:
|
| WARNING: at arch/x86/kernel/smp.c:118
| native_smp_send_reschedule+0x21/0x36()
| Modules linked in:
| Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1
| [...]
|  [<c0110355>] native_smp_send_reschedule+0x21/0x36
|  [<c014fe8f>] force_quiescent_state+0x47/0x57
|  [<c014fef0>] call_rcu+0x51/0x6d
|  [<c01713b3>] __fput+0x130/0x158
|  [<c0171231>] fput+0x17/0x19
|  [<c016fd99>] filp_close+0x4d/0x57
|  [<c016fdff>] sys_close+0x5c/0x97

IMHO the warning is a spurious one.

cpu_online_map is updated by the _cpu_down() using stop_machine_run().
Since force_quiescent_state is invoked from irqs disabled section,
stop_machine_run() won't be executing while a cpu is executing
force_quiescent_state(). Hence the cpu_online_map is stable while we're
in the irq disabled section.

However, a cpu might have been offlined _just_ before we disabled irqs
while entering force_quiescent_state(). And rcu subsystem might not yet
have handled the CPU_DEAD notification, leading to the offlined cpu's
bit being set in the rcp->cpumask.

Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent sending
smp_reschedule() to an offlined CPU.

Here's the timeline:

CPU_A						 CPU_B
--------------------------------------------------------------
cpu_down():					.
.					   	.
.						.
stop_machine(): /* disables preemption,		.
		 * and irqs */			.
.						.
.						.
take_cpu_down();				.
.						.
.						.
.						.
cpu_disable(); /*this removes cpu 		.
		*from cpu_online_map 		.
		*/				.
.						.
.						.
restart_machine(); /* enables irqs */		.
------WINDOW DURING WHICH rcp->cpumask is stale ---------------
.						call_rcu();
.						/* disables irqs here */
.						.force_quiescent_state();
.CPU_DEAD:					.for_each_cpu(rcp->cpumask)
.						.   smp_send_reschedule();
.						.
.						.   WARN_ON() for offlined CPU!
.
.
.
rcu_cpu_notify:
.
-------- WINDOW ENDS ------------------------------------------
rcu_offline_cpu() /* Which calls cpu_quiet()
		   * which removes
		   * cpu from rcp->cpumask.
		   */

If a new batch was started just before calling stop_machine_run(), the
"tobe-offlined" cpu is still present in rcp-cpumask.

During a cpu-offline, from take_cpu_down(), we queue an rt-prio idle
task as the next task to be picked by the scheduler. We also call
cpu_disable() which will disable any further interrupts and remove the
cpu's bit from the cpu_online_map.

Once the stop_machine_run() successfully calls take_cpu_down(), it calls
schedule(). That's the last time a schedule is called on the offlined
cpu, and hence the last time when rdp->passed_quiesc will be set to 1
through rcu_qsctr_inc().

But the cpu_quiet() will be on this cpu will be called only when the
next RCU_SOFTIRQ occurs on this CPU. So at this time, the offlined CPU
is still set in rcp->cpumask.

Now coming back to the idle_task which truely offlines the CPU, it does
check for a pending RCU and raises the softirq, since it will find
rdp->passed_quiesc to be 0 in this case. However, since the cpu is
offline I am not sure if the softirq will trigger on the CPU.

Even if it doesn't the rcu_offline_cpu() will find that rcp->completed
is not the same as rcp->cur, which means that our cpu could be holding
up the grace period progression. Hence we call cpu_quiet() and move
ahead.

But because of the window explained in the timeline, we could still have
a call_rcu() before the RCU subsystem executes it's CPU_DEAD
notification, and we send smp_send_reschedule() to offlined cpu while
trying to force the quiescent states. The appended patch adds comments
and prevents checking for offlined cpu everytime.

cpu_online_map is updated by the _cpu_down() using stop_machine_run().
Since force_quiescent_state is invoked from irqs disabled section,
stop_machine_run() won't be executing while a cpu is executing
force_quiescent_state(). Hence the cpu_online_map is stable while we're
in the irq disabled section.
Reported-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rusty Russel <rusty@rustcorp.com.au>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

8558f8f8

Properly notify block layer of sync writes · 18ce3751

Jens Axboe authored Jul 01, 2008

fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
then immediately wait on them. Conceptually, that makes them sync writes
and we should treat them as such so that the IO schedulers can handle
them appropriately.

This patch fixes a write starvation issue that Lin Ming reported, where
xx is stuck for more than 2 minutes because of a large number of
synchronous IO in the system:

INFO: task kjournald:20558 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kjournald     D ffff810010820978  6712 20558      2
ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
Call Trace:
[<ffffffff803ba6f2>] kobject_get+0x12/0x17
[<ffffffff80247537>] getnstimeofday+0x2f/0x83
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d195>] io_schedule+0x5d/0x9f
[<ffffffff8029c1e7>] sync_buffer+0x3b/0x3f
[<ffffffff8066d3f0>] __wait_on_bit+0x40/0x6f
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d48b>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff80243909>] wake_bit_function+0x0/0x23
[<ffffffff8029e3ad>] sync_dirty_buffer+0x98/0xcb
[<ffffffff8030056b>] journal_commit_transaction+0x97d/0xcb6
[<ffffffff8023a676>] lock_timer_base+0x26/0x4b
[<ffffffff8030300a>] kjournald+0xc1/0x1fb
[<ffffffff802438db>] autoremove_wake_function+0x0/0x2e
[<ffffffff80302f49>] kjournald+0x0/0x1fb
[<ffffffff802437bb>] kthread+0x47/0x74
[<ffffffff8022de51>] schedule_tail+0x28/0x5d
[<ffffffff8020cac8>] child_rip+0xa/0x12
[<ffffffff80243774>] kthread+0x0/0x74
[<ffffffff8020cabe>] child_rip+0x0/0x12

Lin Ming confirms that this patch fixes the issue. I've run tests with
it for the past week and no ill effects have been observed, so I'm
proposing it for inclusion into 2.6.26.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

18ce3751

block: Fix the starving writes bug in the anticipatory IO scheduler · d585d0b9

Divyesh Shah authored Jun 16, 2008

AS scheduler alternates between issuing read and write batches. It does
the batch switch only after all requests from the previous batch are
completed.

When switching to a write batch, if there is an on-going read request,
it waits for its completion and indicates its intention of switching by
setting ad->changed_batch and the new direction but does not update the
batch_expire_time for the new write batch which it does in the case of
no previous pending requests.
On completion of the read request, it sees that we were waiting for the
switch and schedules work for kblockd right away and resets the
ad->changed_data flag.
Now when kblockd enters dispatch_request where it is expected to pick
up a write request, it in turn ends the write batch because the
batch_expire_timer was not updated and shows the expire timestamp for
the previous batch.

This results in the write starvation for all the cases where there is
the intention for switching to a write batch, but there is a previous
in-flight read request and the batch gets reverted to a read_batch
right away.

This also holds true in the reverse case (switching from a write batch
to a read batch with an in-flight write request).

I've checked that this bug exists on 2.6.11, 2.6.18, 2.6.24 and
linux-2.6-block git HEAD. I've tested the fix on x86 platforms with
SCSI drives where the driver asks for the next request while a current
request is in-flight.

This patch is based off linux-2.6-block git HEAD.

Bug reproduction:
A simple scenario which reproduces this bug is:
- dd if=/dev/hda3 of=/dev/null &
- lilo
   The lilo takes forever to complete.

This can also be reproduced fairly easily with the earlier dd and
another test
program doing msync().

The example test program below should print out a message after every
iteration
but it simply hangs forever. With this bugfix it makes forward progress.

====
Example test program using msync() (thanks to suleiman AT google DOT
com)

inline uint64_t
rdtsc(void)
{
         int64_t tsc;

         __asm __volatile("rdtsc" : "=A" (tsc));
         return (tsc);
}

int
main(int argc, char **argv)
{
         struct stat st;
         uint64_t e, s, t;
         char *p, q;
         long i;
         int fd;

         if (argc < 2) {
                 printf("Usage: %s <file>\n", argv[0]);
                 return (1);
         }

         if ((fd = open(argv[1], O_RDWR | O_NOATIME)) < 0)
                 err(1, "open");

         if (fstat(fd, &st) < 0)
                 err(1, "fstat");

         p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);

         t = 0;
         for (i = 0; i < 1000; i++) {
                 *p = 0;
                 msync(p, 4096, MS_SYNC);
                 s = rdtsc();
                *p = 0;
                 __asm __volatile(""::: "memory");
                 e = rdtsc();
                 if (argc > 2)
                         printf("%d: %lld cycles %jd %jd\n",
                                i, e - s, (intmax_t)s, (intmax_t)e);
                 t += e - s;
         }
         printf("average time: %lld cycles\n", t / 1000);
         return (0);
}

Cc: <stable@kernel.org>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d585d0b9

x86: fix NODES_SHIFT Kconfig range · efac4189

Thomas Gleixner authored Jul 01, 2008

commit 43238382
       x86: change size of node ids from u8 to s16

set the range for NODES_SHIFT to 1..15.

The possible range is 1..9

Fixes Bugzilla #10726
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

efac4189