Commits · 1efe8fe1c2240acc476bed77740883df63373862 · Kirill Smelkov / linux

02 Feb, 2010 1 commit

cfq-iosched: Do not idle on async queues · 1efe8fe1

Vivek Goyal authored Feb 02, 2010

Few weeks back, Shaohua Li had posted similar patch. I am reposting it
with more test results.

This patch does two things.

- Do not idle on async queues.

- It also changes the write queue depth CFQ drives (cfq_may_dispatch()).
  Currently, we seem to driving queue depth of 1 always for WRITES. This is
  true even if there is only one write queue in the system and all the logic
  of infinite queue depth in case of single busy queue as well as slowly
  increasing queue depth based on last delayed sync request does not seem to
  be kicking in at all.

This patch will allow deeper WRITE queue depths (subjected to the other
WRITE queue depth contstraints like cfq_quantum and last delayed sync
request).

Shaohua Li had reported getting more out of his SSD. For me, I have got
one Lun exported from an HP EVA and when pure buffered writes are on, I
can get more out of the system. Following are test results of pure
buffered writes (with end_fsync=1) with vanilla and patched kernel. These
results are average of 3 sets of run with increasing number of threads.

AVERAGE[bufwfs][vanilla]
-------
job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
---       --- --  ------------   -----------    -------------  -----------
bufwfs    3   1   0              0              95349          474141
bufwfs    3   2   0              0              100282         806926
bufwfs    3   4   0              0              109989         2.7301e+06
bufwfs    3   8   0              0              116642         3762231
bufwfs    3   16  0              0              118230         6902970

AVERAGE[bufwfs] [patched kernel]
-------
bufwfs    3   1   0              0              270722         404352
bufwfs    3   2   0              0              206770         1.06552e+06
bufwfs    3   4   0              0              195277         1.62283e+06
bufwfs    3   8   0              0              260960         2.62979e+06
bufwfs    3   16  0              0              299260         1.70731e+06

I also ran buffered writes along with some sequential reads and some
buffered reads going on in the system on a SATA disk because the potential
risk could be that we should not be driving queue depth higher in presence
of sync IO going to keep the max clat low.

With some random and sequential reads going on in the system on one SATA
disk I did not see any significant increase in max clat. So it looks like
other WRITE queue depth control logic is doing its job. Here are the
results.

AVERAGE[brr, bsr, bufw together] [vanilla]
-------
job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
---       --- --  ------------   -----------    -------------  -----------
brr       3   1   850            546345         0              0
bsr       3   1   14650          729543         0              0
bufw      3   1   0              0              23908          8274517

brr       3   2   981.333        579395         0              0
bsr       3   2   14149.7        1175689        0              0
bufw      3   2   0              0              21921          1.28108e+07

brr       3   4   898.333        1.75527e+06    0              0
bsr       3   4   12230.7        1.40072e+06    0              0
bufw      3   4   0              0              19722.3        2.4901e+07

brr       3   8   900            3160594        0              0
bsr       3   8   9282.33        1.91314e+06    0              0
bufw      3   8   0              0              18789.3        23890622

AVERAGE[brr, bsr, bufw mixed] [patched kernel]
-------
job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
---       --- --  ------------   -----------    -------------  -----------
brr       3   1   837            417973         0              0
bsr       3   1   14357.7        591275         0              0
bufw      3   1   0              0              24869.7        8910662

brr       3   2   1038.33        543434         0              0
bsr       3   2   13351.3        1205858        0              0
bufw      3   2   0              0              18626.3        13280370

brr       3   4   913            1.86861e+06    0              0
bsr       3   4   12652.3        1430974        0              0
bufw      3   4   0              0              15343.3        2.81305e+07

brr       3   8   890            2.92695e+06    0              0
bsr       3   8   9635.33        1.90244e+06    0              0
bufw      3   8   0              0              17200.3        24424392

So looks like it might make sense to include this patch.

Thanks
Vivek
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

1efe8fe1

01 Feb, 2010 1 commit

blk-cgroup: Fix potential deadlock in blk-cgroup · bcf4dd43

Gui Jianfeng authored Feb 01, 2010

I triggered a lockdep warning as following.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.33-rc2 #1
-------------------------------------------------------
test_io_control/7357 is trying to acquire lock:
 (blkio_list_lock){+.+...}, at: [<c053a990>] blkiocg_weight_write+0x82/0x9e

but task is already holding lock:
 (&(&blkcg->lock)->rlock){......}, at: [<c053a949>] blkiocg_weight_write+0x3b/0x9e

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&(&blkcg->lock)->rlock){......}:
       [<c04583b7>] validate_chain+0x8bc/0xb9c
       [<c0458dba>] __lock_acquire+0x723/0x789
       [<c0458eb0>] lock_acquire+0x90/0xa7
       [<c0692b0a>] _raw_spin_lock_irqsave+0x27/0x5a
       [<c053a4e1>] blkiocg_add_blkio_group+0x1a/0x6d
       [<c053cac7>] cfq_get_queue+0x225/0x3de
       [<c053eec2>] cfq_set_request+0x217/0x42d
       [<c052c8a6>] elv_set_request+0x17/0x26
       [<c0532a0f>] get_request+0x203/0x2c5
       [<c0532ae9>] get_request_wait+0x18/0x10e
       [<c0533470>] __make_request+0x2ba/0x375
       [<c0531985>] generic_make_request+0x28d/0x30f
       [<c0532da7>] submit_bio+0x8a/0x8f
       [<c04d827a>] submit_bh+0xf0/0x10f
       [<c04d91d2>] ll_rw_block+0xc0/0xf9
       [<f86e9705>] ext3_find_entry+0x319/0x544 [ext3]
       [<f86eae58>] ext3_lookup+0x2c/0xb9 [ext3]
       [<c04c3e1b>] do_lookup+0xd3/0x172
       [<c04c56c8>] link_path_walk+0x5fb/0x95c
       [<c04c5a65>] path_walk+0x3c/0x81
       [<c04c5b63>] do_path_lookup+0x21/0x8a
       [<c04c66cc>] do_filp_open+0xf0/0x978
       [<c04c0c7e>] open_exec+0x1b/0xb7
       [<c04c1436>] do_execve+0xbb/0x266
       [<c04081a9>] sys_execve+0x24/0x4a
       [<c04028a2>] ptregs_execve+0x12/0x18

-> #1 (&(&q->__queue_lock)->rlock){..-.-.}:
       [<c04583b7>] validate_chain+0x8bc/0xb9c
       [<c0458dba>] __lock_acquire+0x723/0x789
       [<c0458eb0>] lock_acquire+0x90/0xa7
       [<c0692b0a>] _raw_spin_lock_irqsave+0x27/0x5a
       [<c053dd2a>] cfq_unlink_blkio_group+0x17/0x41
       [<c053a6eb>] blkiocg_destroy+0x72/0xc7
       [<c0467df0>] cgroup_diput+0x4a/0xb2
       [<c04ca473>] dentry_iput+0x93/0xb7
       [<c04ca4b3>] d_kill+0x1c/0x36
       [<c04cb5c5>] dput+0xf5/0xfe
       [<c04c6084>] do_rmdir+0x95/0xbe
       [<c04c60ec>] sys_rmdir+0x10/0x12
       [<c04027cc>] sysenter_do_call+0x12/0x32

-> #0 (blkio_list_lock){+.+...}:
       [<c0458117>] validate_chain+0x61c/0xb9c
       [<c0458dba>] __lock_acquire+0x723/0x789
       [<c0458eb0>] lock_acquire+0x90/0xa7
       [<c06929fd>] _raw_spin_lock+0x1e/0x4e
       [<c053a990>] blkiocg_weight_write+0x82/0x9e
       [<c0467f1e>] cgroup_file_write+0xc6/0x1c0
       [<c04bd2f3>] vfs_write+0x8c/0x116
       [<c04bd7c6>] sys_write+0x3b/0x60
       [<c04027cc>] sysenter_do_call+0x12/0x32

other info that might help us debug this:

1 lock held by test_io_control/7357:
 #0:  (&(&blkcg->lock)->rlock){......}, at: [<c053a949>] blkiocg_weight_write+0x3b/0x9e
stack backtrace:
Pid: 7357, comm: test_io_control Not tainted 2.6.33-rc2 #1
Call Trace:
 [<c045754f>] print_circular_bug+0x91/0x9d
 [<c0458117>] validate_chain+0x61c/0xb9c
 [<c0458dba>] __lock_acquire+0x723/0x789
 [<c0458eb0>] lock_acquire+0x90/0xa7
 [<c053a990>] ? blkiocg_weight_write+0x82/0x9e
 [<c06929fd>] _raw_spin_lock+0x1e/0x4e
 [<c053a990>] ? blkiocg_weight_write+0x82/0x9e
 [<c053a990>] blkiocg_weight_write+0x82/0x9e
 [<c0467f1e>] cgroup_file_write+0xc6/0x1c0
 [<c0454df5>] ? trace_hardirqs_off+0xb/0xd
 [<c044d93a>] ? cpu_clock+0x2e/0x44
 [<c050e6ec>] ? security_file_permission+0xf/0x11
 [<c04bcdda>] ? rw_verify_area+0x8a/0xad
 [<c0467e58>] ? cgroup_file_write+0x0/0x1c0
 [<c04bd2f3>] vfs_write+0x8c/0x116
 [<c04bd7c6>] sys_write+0x3b/0x60
 [<c04027cc>] sysenter_do_call+0x12/0x32

To prevent deadlock, we should take locks as following sequence:

blkio_list_lock -> queue_lock ->  blkcg_lock.

The following patch should fix this bug.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

bcf4dd43

30 Jan, 2010 1 commit

block: fix bugs in bio-integrity mempool usage · 9e9432c2

Chuck Ebbert authored Jan 30, 2010

Fix two bugs in the bio integrity code:

 use_bip_pool() always returns 0 because it checks against the wrong limit,
 causing the mempool to be used only when regular allocation fails.

 When the mempool is used as a fallback we don't free the data properly.
Signed-Off-By: Chuck Ebbert <cebbert@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

9e9432c2

28 Jan, 2010 1 commit

block: fix bio_add_page for non trivial merge_bvec_fn case · 1d616585

Dmitry Monakhov authored Jan 27, 2010

We have to properly decrease bi_size in order to merge_bvec_fn return
right result.  Otherwise this result in false merge rejects for two
absolutely valid bio_vecs.  This may cause significant performance
penalty for example fs_block_size == 1k and block device is raid0 with
small chunk_size = 8k. Then it is impossible to merge 7-th fs-block in
to bio which already has 6 fs-blocks.

Cc: <stable@kernel.org>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

1d616585

25 Jan, 2010 2 commits
- Merge branch 'for-jens' of git://git.drbd.org/linux-2.6-drbd into for-linus · c84a301d
  Jens Axboe authored Jan 25, 2010
  
  c84a301d
- drbd: null dereference bug · d3db7b48
  Dan Carpenter authored Jan 23, 2010
```
epoch is always NULL here.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
```
  d3db7b48
24 Jan, 2010 5 commits

Merge branch 'timers-fixes-for-linus' of... · f6760aa0

Linus Torvalds authored Jan 24, 2010

Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clockevent: Don't remove broadcast device when cpu is dead

f6760aa0

Merge branch 'for-linus/i2c' of git://git.fluff.org/bjdooks/linux · 90ea3019

Linus Torvalds authored Jan 24, 2010

* 'for-linus/i2c' of git://git.fluff.org/bjdooks/linux:
  i2c: imx: call ioremap only after request_mem_region
  i2c: mxc: let time to generate stop bit

90ea3019

Merge git://git.infradead.org/~dwmw2/mtd-2.6.33 · b8be634e

Linus Torvalds authored Jan 24, 2010

* git://git.infradead.org/~dwmw2/mtd-2.6.33:
  mtd: tests: fix read, speed and stress tests on NOR flash
  mtd: Really add ARM pismo support
  kmsg_dump: Dump on crash_kexec as well

b8be634e

i2c: imx: call ioremap only after request_mem_region · 4927fbf1

Uwe Kleine-König authored Jan 08, 2010

accordingly adapt order of release_mem_region and release_mem_region on
remove.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Cc: Richard Zhao <linuxzsc@gmail.com>
Cc: Darius Augulis <augulis.darius@gmail.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: linux-i2c@vger.kernel.org
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>

4927fbf1

i2c: mxc: let time to generate stop bit · a1ee06b7

Valentin Longchamp authored Jan 21, 2010

After generating the stop bit by changing MSTA from 1 to 0,
the i2c_imx->stopped was immediatly set to 1. The second test
on i2c_imx->stopped then is correct and the controller never
waits if the bus is busy. This patch corrects this.

On mx31moboard, stop bit was not generated on single write transfers.
This was kept unnoticed as other transfers are made afterwards that
help the write recipient to resynchronize.

Thanks to Philippe and Michael for the debugging.
Signed-off-by: Valentin Longchamp <valentin.longchamp@epfl.ch>
Signed-off by: Philippe Rétornaz <philippe.retornaz@epfl.ch>
Reported-by: Michael Bonani <michael.bonani@epfl.ch>
Acked-by; Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>

a1ee06b7

22 Jan, 2010 3 commits

drbd: fix max_segment_size initialization · 98ec286e

Lars Ellenberg authored Jan 21, 2010

blk_queue_make_request() internally calls blk_set_default_limits(),
so calling blk_queue_max_segment_size() before is useless.
Ergo: move the call to blk_queue_max_segment_size() down a few lines.

Impact:
If, after a fresh modprobe, you first connect a Diskless drbd,
then attach, this could result in a DRBD Protocol Error at first.
The next connection attempt would then succeeded.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

98ec286e

Merge branch 'for-linus/samsung' of git://git.fluff.org/bjdooks/linux · 298a4c3a
Linus Torvalds authored Jan 21, 2010
```
* 'for-linus/samsung' of git://git.fluff.org/bjdooks/linux:
  hmt: adjust for new pwm_backlight->notify prototype
```
298a4c3a

hmt: adjust for new pwm_backlight->notify prototype · 1619ce11

Peter Korsgaard authored Jan 21, 2010

Commit cfc38999f (backlight: Pass device through notify callback)
added a struct device argument to the notify callback, but didn't
update the user of it in mach-hmt.c
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>

1619ce11

21 Jan, 2010 21 commits

Linux 2.6.33-rc5 · 92dcffb9
Linus Torvalds authored Jan 21, 2010

92dcffb9

Merge branch 'perf-fixes-for-linus' of... · e80b1359

Linus Torvalds authored Jan 21, 2010

Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: x86: Add support for the ANY bit
  perf: Change the is_software_event() definition
  perf: Honour event state for aux stream data
  perf: Fix perf_event_do_pending() fallback callsite
  perf kmem: Print usage help for unknown commands
  perf kmem: Increase "Hit" column length
  hw-breakpoints, perf: Fix broken mmiotrace due to dr6 by reference change
  perf timechart: Use tid not pid for COMM change

e80b1359

Merge branch 'sched-fixes-for-linus' of... · 341031ca

Linus Torvalds authored Jan 21, 2010

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Reassign prev and switch_count when reacquire_kernel_lock() fail
  sched: Fix vmark regression on big machines

341031ca

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev · 836f48c5

Linus Torvalds authored Jan 21, 2010

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: retry FS IOs even if it has failed with AC_ERR_INVALID

836f48c5

Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 · bdeef61c

Linus Torvalds authored Jan 21, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  tty: fix race in tty_fasync
  serial: serial_cs: oxsemi quirk breaks resume
  serial: imx: bit &/| confusion
  serial: Fix crash if the minimum rate of the device is > 9600 baud
  serial-core: resume serial hardware with no_console_suspend
  serial: 8250_pnp: use wildcard for serial Wacom tablets
  nozomi: quick fix for the close/close bug
  compat_ioctl: Supress "unknown cmd" message on serial /dev/console

bdeef61c

Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6 · 4caca5f9

Linus Torvalds authored Jan 21, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6:
  Staging: hv: fix smp problems in the hyperv core code
  Staging: et131x: Fix 2.6.33rc1 regression in et131x
  Staging: asus_oled: fix oops in 2.6.32.2

4caca5f9

Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 · f8c7e6c2

Linus Torvalds authored Jan 21, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  Revert "sysdev: fix prototype for memory_sysdev_class show/store functions"
  driver-core: fix devtmpfs crash on s390

f8c7e6c2

Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 · c9140487

Linus Torvalds authored Jan 21, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
  USB: isp1362: fix build failure on ARM systems via irq_flags cleanup
  USB: isp1362: better 64bit printf warning fixes
  USB: fix usbstorage for 2770:915d delivers no FAT
  USB: Fix level of isp1760 Reloading ptd error message
  USB: FHCI: avoid NULL pointer dereference
  USB: Fix duplicate sysfs problem after device reset.
  USB: add speed values for USB 3.0 and wireless controllers
  USB: add missing delay during remote wakeup
  USB: EHCI & UHCI: fix race between root-hub suspend and port resume
  USB: EHCI: fix handling of unusual interrupt intervals
  USB: Don't use GFP_KERNEL while we cannot reset a storage device
  USB: fix bitmask merge error
  usb: serial: fix memory leak in generic driver
  USB: serial: fix USB serial fix kfifo_len locking

c9140487

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · 456eac94

Linus Torvalds authored Jan 21, 2010

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  fs/bio.c: fix shadows sparse warning
  drbd: The kernel code is now equivalent to out of tree release 8.3.7
  drbd: Allow online resizing of DRBD devices while peer not reachable (needs to be explicitly forced)
  drbd: Don't go into StandAlone mode when authentification failes because of network error
  drivers/block/drbd/drbd_receiver.c: correct NULL test
  cfq-iosched: Respect ioprio_class when preempting
  genhd: overlapping variable definition
  block: removed unused as_io_context
  DM: Fix device mapper topology stacking
  block: bdev_stack_limits wrapper
  block: Fix discard alignment calculation and printing
  block: Correct handling of bottom device misaligment
  drbd: check on CONFIG_LBDAF, not LBD
  drivers/block/drbd: Correct NULL test
  drbd: Silenced an assert that could triggered after changing write ordering method
  drbd: Kconfig fix
  drbd: Fix for a race between IO and a detach operation [Bugz 262]
  drbd: Use drbd_crypto_is_hash() instead of an open coded check

456eac94

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 · dedd0c2a

Linus Torvalds authored Jan 21, 2010

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (23 commits)
  ACPI: delete acpi_processor_power_verify_c2()
  ACPI: allow C3 > 1000usec
  ACPI: enable C2 and Turbo-mode on Nehalem notebooks on A/C
  ACPI: power_meter: remove double kfree()
  ACPI: processor: restrict early _PDC to opt-in platforms
  ACPI: Fix unused variable warning in sbs.c
  acpi: make ACPI device id constant
  sony-laptop - fix using of uninitialized variable
  ACPI: Fix section mismatch error for acpi_early_processor_set_pdc()
  eeepc-laptop: disable wireless hotplug for 1201N
  eeepc-laptop: add hotplug_disable parameter
  eeepc-laptop: switch to using sparse keymap library
  eeepc-laptop: dmi blacklist to disable pci hotplug code
  eeepc-laptop: disable cpu speed control on EeePC 701
  ACPI: don't cond_resched if irq is disabled
  ACPI: Remove unnecessary cast.
  ACPI: Advertise to BIOS in _OSC: _OST on _PPC changes
  ACPI: EC: Add wait for irq storm
  ACPI: SBS: Move SBS HC callback to faster Notify queue
  x86, ACPI: delete acpi_boot_table_init() return value
  ...

dedd0c2a

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 · 15e551e5

Linus Torvalds authored Jan 21, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  ecryptfs: use after free
  ecryptfs: Eliminate useless code
  ecryptfs: fix interpose/interpolate typos in comments
  ecryptfs: pass matching flags to interpose as defined and used there
  ecryptfs: remove unnecessary d_drop calls in ecryptfs_link
  ecryptfs: don't ignore return value from lock_rename
  ecryptfs: initialize private persistent file before dereferencing pointer
  eCryptfs: Remove mmap from directory operations
  eCryptfs: Add getattr function
  eCryptfs: Use notify_change for truncating lower inodes

15e551e5

Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable · 30a0f5e1

Linus Torvalds authored Jan 21, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: fix possible panic on unmount
  Btrfs: deal with NULL acl sent to btrfs_set_acl
  Btrfs: fix regression in orphan cleanup
  Btrfs: Fix race in btrfs_mark_extent_written
  Btrfs, fix memory leaks in error paths
  Btrfs: align offsets for btrfs_ordered_update_i_size
  btrfs: fix missing last-entry in readdir(3)

30a0f5e1

vmalloc: remove BUG_ON due to racy counting of VM_LAZY_FREE · 88f50044

Yongseok Koh authored Jan 19, 2010

In free_unmap_area_noflush(), va->flags is marked as VM_LAZY_FREE first, and
then vmap_lazy_nr is increased atomically.

But, in __purge_vmap_area_lazy(), while traversing of vmap_are_list, nr
is counted by checking VM_LAZY_FREE is set to va->flags.  After counting
the variable nr, kernel reads vmap_lazy_nr atomically and checks a
BUG_ON condition whether nr is greater than vmap_lazy_nr to prevent
vmap_lazy_nr from being negative.

The problem is that, if interrupted right after marking VM_LAZY_FREE,
increment of vmap_lazy_nr can be delayed.  Consequently, BUG_ON
condition can be met because nr is counted more than vmap_lazy_nr.

It is highly probable when vmalloc/vfree are called frequently.  This
scenario have been verified by adding delay between marking VM_LAZY_FREE
and increasing vmap_lazy_nr in free_unmap_area_noflush().

Even the vmap_lazy_nr is for checking high watermark, it never be the
strict watermark.  Although the BUG_ON condition is to prevent
vmap_lazy_nr from being negative, vmap_lazy_nr is signed variable.  So,
it could go down to negative value temporarily.

Consequently, removing the BUG_ON condition is proper.

A possible BUG_ON message is like the below.

   kernel BUG at mm/vmalloc.c:517!
   invalid opcode: 0000 [#1] SMP
   EIP: 0060:[<c04824a4>] EFLAGS: 00010297 CPU: 3
   EIP is at __purge_vmap_area_lazy+0x144/0x150
   EAX: ee8a8818 EBX: c08e77d4 ECX: e7c7ae40 EDX: c08e77ec
   ESI: 000081fe EDI: e7c7ae60 EBP: e7c7ae64 ESP: e7c7ae3c
   DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
   Call Trace:
   [<c0482ad9>] free_unmap_vmap_area_noflush+0x69/0x70
   [<c0482b02>] remove_vm_area+0x22/0x70
   [<c0482c15>] __vunmap+0x45/0xe0
   [<c04831ec>] vmalloc+0x2c/0x30
   Code: 8d 59 e0 eb 04 66 90 89 cb 89 d0 e8 87 fe ff ff 8b 43 20 89 da 8d 48 e0 8d 43 20 3b 04 24 75 e7 fe 05 a8 a5 a3 c0 e9 78 ff ff ff <0f> 0b eb fe 90 8d b4 26 00 00 00 00 56 89 c6 b8 ac a5 a3 c0 31
   EIP: [<c04824a4>] __purge_vmap_area_lazy+0x144/0x150 SS:ESP 0068:e7c7ae3c

[ See also http://marc.info/?l=linux-kernel&m=126335856228090&w=2 ]
Signed-off-by: Yongseok Koh <yongseok.koh@samsung.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

88f50044

Merge branch 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 · 970114a1

Linus Torvalds authored Jan 21, 2010

* 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh64: wire up sys_accept4.
  sh: unwire sys_recvmmsg.
  sh: ms7724: Correct sh-eth EEPROM polling timeout.

970114a1

Merge master.kernel.org:/home/rmk/linux-2.6-arm · def20529

Linus Torvalds authored Jan 21, 2010

* master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 5888/1: arm: Update comments in cacheflush.h and remove unnecessary V6 and V7 comments
  ARM: 5886/1: arm: Fix cpu_proc_fin() for proc-v7.S and make kexec work
  ARM: 5885/1: arm: Flush TLB entries in setup_mm_for_reboot()
  ARM: 5884/1: arm: Fix DCC console for v7
  ARM: 5883/1: Revert "disable NX support for OABI-supporting kernels"
  ARM: 5882/1: ARM: Fix uncompress code compile for different defines of flush(void)
  ARM: fix badly placed mach/plat entries in Kconfig & Makefile

def20529

perf: x86: Add support for the ANY bit · b27d515a

Stephane Eranian authored Jan 18, 2010

Propagate the ANY bit into the fixed counter config for v3 and higher.
Signed-off-by: Stephane Eranian <eranian@google.com>
[a.p.zijlstra@chello.nl: split from larger patch]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4b5430c6.0f975e0a.1bf9.ffff85fe@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

b27d515a

perf: Change the is_software_event() definition · 92b67598

Peter Zijlstra authored Jan 18, 2010

The is_software_event() definition always confuses me because its an
exclusive expression, make it an inclusive one.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

92b67598

perf: Honour event state for aux stream data · 22e19085

Peter Zijlstra authored Jan 18, 2010

Anton reported that perf record kept receiving events even after calling
ioctl(PERF_EVENT_IOC_DISABLE). It turns out that FORK,COMM and MMAP
events didn't respect the disabled state and kept flowing in.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Anton Blanchard <anton@samba.org>
LKML-Reference: <1263459187.4244.265.camel@laptop>
CC: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

22e19085

perf: Fix perf_event_do_pending() fallback callsite · fe432200

Peter Zijlstra authored Jan 18, 2010

Paul questioned the context in which we should call
perf_event_do_pending(). After looking at that I found that it should be
called from IRQ context these days, however the fallback call-site is
placed in softirq context. Ammend this by placing the callback in the IRQ
timer path.
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1263374859.4244.192.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fe432200

sched: Reassign prev and switch_count when reacquire_kernel_lock() fail · 6d558c3a

Yong Zhang authored Jan 11, 2010

Assume A->B schedule is processing, if B have acquired BKL before and it
need reschedule this time. Then on B's context, it will go to
need_resched_nonpreemptible for reschedule. But at this time, prev and
switch_count are related to A. It's wrong and will lead to incorrect
scheduler statistics.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <2674af741001102238w7b0ddcadref00d345e2181d11@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

6d558c3a

sched: Fix vmark regression on big machines · 50b926e4

Mike Galbraith authored Jan 04, 2010

SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
enabled, leading to many cache misses on large machines as we traverse
looking for an idle shared cache to wake to.  Change the enabler of
select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
sibling domain level.
Reported-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1262612696.15495.15.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

50b926e4

20 Jan, 2010 5 commits

USB: isp1362: fix build failure on ARM systems via irq_flags cleanup · 0a2fea2e

Lothar Wassmann authored Jan 15, 2010

There was some left over #ifdef ARM logic that is outdated but no one
really noticed.  So instead of relying on this tricky logic, properly
load and utilize the platform irq_flags resources.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Lothar Wassmann <LW@KARO-electronics.de>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

0a2fea2e

USB: isp1362: better 64bit printf warning fixes · 96b85179

Lothar Wassmann authored Jan 15, 2010

Some hosts that treat the return value of sizeof differently from unsigned
long might still hit warnings.  So use %zu for sizeof() values.  This is a
better version of the previous commit b0a9cf29.
Signed-off-by: Lothar Wassmann <LW@KARO-electronics.de>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

96b85179

USB: fix usbstorage for 2770:915d delivers no FAT · 10d2cdb6

Ryan May authored Jan 06, 2010

Resolves kernel.org bug 14914.

Remove entry for 2770:915d (usb digital camera with mass storage
support) from unusual_devs.h. The fix triggered by the entry causes
the file system on the camera to be completely inaccessible (no
partition table, the device is not mountable).

The patch works, but let me clarify a few things about it.  All the
patch does is remove the entry for this device from the
drivers/usb/storage/unusual_devs.h, which is supposed to help with a
problem with the device's reported size (I think).  I'm pretty sure it
was originally added for a reason, so I'm not sure removing it won't
cause other problems to reappear.  Also, I should note that this
unusual_devs.h entry was present (and activating workarounds) in
2.6.29, but in that version everything works fine.  Starting with
2.6.30, things no longer work.
Signed-off-by: Ryan May <rmay31@gmail.com>
Cc: Rohan Hart <rohan.hart17@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

10d2cdb6

USB: Fix level of isp1760 Reloading ptd error message · c0d74142

Colin Tuckley authored Jan 07, 2010

This error message is not actually an error, it's an information
message. It is triggered when a transfer which ended in a NAQ is
retried successfully by the hardware.
Signed-off-by: Colin Tuckley <colin.tuckley@arm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

c0d74142

USB: FHCI: avoid NULL pointer dereference · ae35fe9e

Alexander Beregalov authored Jan 07, 2010

Assign fhci only if usb is not NULL.
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ae35fe9e