Commits · 6b4517a7913a09d3259bb1d21c9cb300f12294bd · nexedi / linux

27 Apr, 2010 2 commits

block: implement bd_claiming and claiming block · 6b4517a7

Tejun Heo authored Apr 07, 2010

Currently, device claiming for exclusive open is done after low level
open - disk->fops->open() - has completed successfully.  This means
that exclusive open attempts while a device is already exclusively
open will fail only after disk->fops->open() is called.

cdrom driver issues commands during open() which means that O_EXCL
open attempt can unintentionally inject commands to in-progress
command stream for burning thus disturbing burning process.  In most
cases, this doesn't cause problems because the first command to be
issued is TUR which most devices can process in the middle of burning.
However, depending on how a device replies to TUR during burning,
cdrom driver may end up issuing further commands.

This can't be resolved trivially by moving bd_claim() before doing
actual open() because that means an open attempt which will end up
failing could interfere other legit O_EXCL open attempts.
ie. unconfirmed open attempts can fail others.

This patch resolves the problem by introducing claiming block which is
started by bd_start_claiming() and terminated either by bd_claim() or
bd_abort_claiming().  bd_claim() from inside a claiming block is
guaranteed to succeed and once a claiming block is started, other
bd_start_claiming() or bd_claim() attempts block till the current
claiming block is terminated.

bd_claim() can still be used standalone although now it always
synchronizes against claiming blocks, so the existing users will keep
working without any change.

blkdev_open() and open_bdev_exclusive() are converted to use claiming
blocks so that exclusive open attempts from these functions don't
interfere with the existing exclusive open.

This problem was discovered while investigating bko#15403.

  https://bugzilla.kernel.org/show_bug.cgi?id=15403

The burning problem itself can be resolved by updating userspace
probing tools to always open w/ O_EXCL.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Matthias-Christian Ott <ott@mirix.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

6b4517a7

block: factor out bd_may_claim() · 1a3cbbc5

Tejun Heo authored Apr 07, 2010

Factor out bd_may_claim() from bd_claim(), add comments and apply a
couple of cosmetic edits.  This is to prepare for further updates to
claim path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

1a3cbbc5

26 Apr, 2010 2 commits

blk-cgroup: config options re-arrangement · afc24d49

Vivek Goyal authored Apr 26, 2010

This patch fixes few usability and configurability issues.

o All the cgroup based controller options are configurable from
  "Genral Setup/Control Group Support/" menu. blkio is the only exception.
  Hence make this option visible in above menu and make it configurable from
  there to bring it inline with rest of the cgroup based controllers.

o Get rid of CONFIG_DEBUG_CFQ_IOSCHED.

  This option currently does two things.

  - Enable printing of cgroup paths in blktrace
  - Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional stat
    files in cgroup.

  If we are using group scheduling, blktrace data is of not really much use
  if cgroup information is not present. To get this data, currently one has to
  also enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn brings the overhead of
  all the additional debug stat files which is not desired.

  Hence, this patch moves printing of cgroup paths under
  CONFIG_CFQ_GROUP_IOSCHED.

  This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely. Now all
  the debug stat files are controlled only by CONFIG_DEBUG_BLK_CGROUP which
  can be enabled through config menu.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Divyesh Shah <dpshah@google.com>
Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

afc24d49

blkio: Fix another BUG_ON() crash due to cfqq movement across groups · e5ff082e

Vivek Goyal authored Apr 26, 2010

o Once in a while, I was hitting a BUG_ON() in blkio code. empty_time was
  assuming that upon slice expiry, group can't be marked empty already (except
  forced dispatch).

  But this assumption is broken if cfqq can move (group_isolation=0) across
  groups after receiving a request.

  I think most likely in this case we got a request in a cfqq and accounted
  the rq in one group, later while adding the cfqq to tree, we moved the queue
  to a different group which was already marked empty and after dispatch from
  slice we found group already marked empty and raised alarm.

  This patch does not error out if group is already marked empty. This can
  introduce some empty_time stat error only in case of group_isolation=0. This
  is better than crashing. In case of group_isolation=1 we should still get
  same stats as before this patch.

[  222.308546] ------------[ cut here ]------------
[  222.309311] kernel BUG at block/blk-cgroup.c:236!
[  222.309311] invalid opcode: 0000 [#1] SMP
[  222.309311] last sysfs file: /sys/devices/virtual/block/dm-3/queue/scheduler
[  222.309311] CPU 1
[  222.309311] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[  222.309311]
[  222.309311] Pid: 4780, comm: fio Not tainted 2.6.34-rc4-blkio-config #68 0A98h/HP xw8600 Workstation
[  222.309311] RIP: 0010:[<ffffffff8121ad88>]  [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83
[  222.309311] RSP: 0018:ffff8800ba6e79f8  EFLAGS: 00010002
[  222.309311] RAX: 0000000000000082 RBX: ffff8800a13b7990 RCX: ffff8800a13b7808
[  222.309311] RDX: 0000000000002121 RSI: 0000000000000082 RDI: ffff8800a13b7a30
[  222.309311] RBP: ffff8800ba6e7a18 R08: 0000000000000000 R09: 0000000000000001
[  222.309311] R10: 000000000002f8c8 R11: ffff8800ba6e7ad8 R12: ffff8800a13b78ff
[  222.309311] R13: ffff8800a13b7990 R14: 0000000000000001 R15: ffff8800a13b7808
[  222.309311] FS:  00007f3beec476f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
[  222.309311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  222.309311] CR2: 000000000040e7f0 CR3: 00000000a12d5000 CR4: 00000000000006e0
[  222.309311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  222.309311] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  222.309311] Process fio (pid: 4780, threadinfo ffff8800ba6e6000, task ffff8800b3d6bf00)
[  222.309311] Stack:
[  222.309311]  0000000000000001 ffff8800bab17a48 ffff8800bab17a48 ffff8800a13b7800
[  222.309311] <0> ffff8800ba6e7a68 ffffffff8121da35 ffff880000000001 00ff8800ba5c5698
[  222.309311] <0> ffff8800ba6e7a68 ffff8800a13b7800 0000000000000000 ffff8800bab17a48
[  222.309311] Call Trace:
[  222.309311]  [<ffffffff8121da35>] __cfq_slice_expired+0x2af/0x3ec
[  222.309311]  [<ffffffff8121fd7b>] cfq_dispatch_requests+0x2c8/0x8e8
[  222.309311]  [<ffffffff8120f1cd>] ? spin_unlock_irqrestore+0xe/0x10
[  222.309311]  [<ffffffff8120fb1a>] ? blk_insert_cloned_request+0x70/0x7b
[  222.309311]  [<ffffffff81210461>] blk_peek_request+0x191/0x1a7
[  222.309311]  [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
[  222.309311]  [<ffffffff810ae61f>] ? sync_page_killable+0x0/0x35
[  222.309311]  [<ffffffff81210fd4>] __generic_unplug_device+0x32/0x37
[  222.309311]  [<ffffffff81211274>] generic_unplug_device+0x2e/0x3c
[  222.309311]  [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
[  222.309311]  [<ffffffff8120ca37>] blk_unplug+0x29/0x2d
[  222.309311]  [<ffffffff8120ca4d>] blk_backing_dev_unplug+0x12/0x14
[  222.309311]  [<ffffffff81109a7a>] block_sync_page+0x35/0x39
[  222.309311]  [<ffffffff810ae616>] sync_page+0x41/0x4a
[  222.309311]  [<ffffffff810ae62d>] sync_page_killable+0xe/0x35
[  222.309311]  [<ffffffff8158aa59>] __wait_on_bit_lock+0x46/0x8f
[  222.309311]  [<ffffffff810ae4f5>] __lock_page_killable+0x66/0x6d
[  222.309311]  [<ffffffff81056f9c>] ? wake_bit_function+0x0/0x33
[  222.309311]  [<ffffffff810ae528>] lock_page_killable+0x2c/0x2e
[  222.309311]  [<ffffffff810afbc5>] generic_file_aio_read+0x361/0x4f0
[  222.309311]  [<ffffffff810ea044>] do_sync_read+0xcb/0x108
[  222.309311]  [<ffffffff811e42f7>] ? security_file_permission+0x16/0x18
[  222.309311]  [<ffffffff810ea6ab>] vfs_read+0xab/0x108
[  222.309311]  [<ffffffff810ea7c8>] sys_read+0x4a/0x6e
[  222.309311]  [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
[  222.309311] Code: 58 01 00 00 00 48 89 c6 75 0a 48 83 bb 60 01 00 00 00 74 09 48 8d bb a0 00 00 00 eb 35 41 fe cc 74 0d f6 83 c0 01 00 00 04 74 04 <0f> 0b eb fe 48 89 75 e8 e8 be e0 de ff 66 83 8b c0 01 00 00 04
[  222.309311] RIP  [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83
[  222.309311]  RSP <ffff8800ba6e79f8>
[  222.309311] ---[ end trace 32b4f71dffc15712 ]---
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Divyesh Shah <dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

e5ff082e

21 Apr, 2010 1 commit

blkio: Fix blkio crash during rq stat update · 7f1dc8a2

Vivek Goyal authored Apr 21, 2010

blkio + cfq was crashing even when two sequential readers were put in two
separate cgroups (group_isolation=0).

The reason being that cfqq can migrate across groups based on its being
sync-noidle or not, it can happen that at request insertion time, cfqq
belonged to one cfqg and at request dispatch time, it belonged to root
group. In this case request stats per cgroup can go wrong and it also runs
into BUG_ON().

This patch implements rq stashing away a cfq group pointer and not relying
on cfqq->cfqg pointer alone for rq stat accounting.

[   65.163523] ------------[ cut here ]------------
[   65.164301] kernel BUG at block/blk-cgroup.c:117!
[   65.164301] invalid opcode: 0000 [#1] SMP
[   65.164301] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:60:00.1/host9/rport-9:0-0/target9:0:0/9:0:0:2/block/sde/stat
[   65.164301] CPU 1
[   65.164301] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[   65.164301]
[   65.164301] Pid: 4505, comm: fio Not tainted 2.6.34-rc4-blk-for-35 #34 0A98h/HP xw8600 Workstation
[   65.164301] RIP: 0010:[<ffffffff8121924f>]  [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
[   65.164301] RSP: 0018:ffff8800ba5a79e8  EFLAGS: 00010046
[   65.164301] RAX: 0000000000000096 RBX: ffff8800bb268d60 RCX: 0000000000000000
[   65.164301] RDX: ffff8800bb268eb8 RSI: 0000000000000000 RDI: ffff8800bb268e00
[   65.164301] RBP: ffff8800ba5a7a08 R08: 0000000000000064 R09: 0000000000000001
[   65.164301] R10: 0000000000079640 R11: ffff8800a0bd5bf0 R12: ffff8800bab4af01
[   65.164301] R13: ffff8800bab4af00 R14: ffff8800bb1d8928 R15: 0000000000000000
[   65.164301] FS:  00007f18f75056f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
[   65.164301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   65.164301] CR2: 000000000040e7f0 CR3: 00000000ba52b000 CR4: 00000000000006e0
[   65.164301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   65.164301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   65.164301] Process fio (pid: 4505, threadinfo ffff8800ba5a6000, task ffff8800ba45ae80)
[   65.164301] Stack:
[   65.164301]  ffff8800ba5a7a08 ffff8800ba722540 ffff8800bab4af68 ffff8800bab4af68
[   65.164301] <0> ffff8800ba5a7a38 ffffffff8121d814 ffff8800ba722540 ffff8800bab4af68
[   65.164301] <0> ffff8800ba722540 ffff8800a08f6800 ffff8800ba5a7a68 ffffffff8121d8ca
[   65.164301] Call Trace:
[   65.164301]  [<ffffffff8121d814>] cfq_remove_request+0xe4/0x116
[   65.164301]  [<ffffffff8121d8ca>] cfq_dispatch_insert+0x84/0xe1
[   65.164301]  [<ffffffff8121e833>] cfq_dispatch_requests+0x767/0x8e8
[   65.164301]  [<ffffffff8120e524>] ? submit_bio+0xc3/0xcc
[   65.164301]  [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
[   65.164301]  [<ffffffff8120ea8d>] blk_peek_request+0x191/0x1a7
[   65.164301]  [<ffffffffa000109c>] ? dm_get_live_table+0x44/0x4f [dm_mod]
[   65.164301]  [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
[   65.164301]  [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
[   65.164301]  [<ffffffff8120f600>] __generic_unplug_device+0x32/0x37
[   65.164301]  [<ffffffff8120f8a0>] generic_unplug_device+0x2e/0x3c
[   65.164301]  [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
[   65.164301]  [<ffffffff8120b063>] blk_unplug+0x29/0x2d
[   65.164301]  [<ffffffff8120b079>] blk_backing_dev_unplug+0x12/0x14
[   65.164301]  [<ffffffff81108a82>] block_sync_page+0x35/0x39
[   65.164301]  [<ffffffff810ad64e>] sync_page+0x41/0x4a
[   65.164301]  [<ffffffff810ad665>] sync_page_killable+0xe/0x35
[   65.164301]  [<ffffffff81589027>] __wait_on_bit_lock+0x46/0x8f
[   65.164301]  [<ffffffff810ad52d>] __lock_page_killable+0x66/0x6d
[   65.164301]  [<ffffffff81055fd4>] ? wake_bit_function+0x0/0x33
[   65.164301]  [<ffffffff810ad560>] lock_page_killable+0x2c/0x2e
[   65.164301]  [<ffffffff810aebfd>] generic_file_aio_read+0x361/0x4f0
[   65.164301]  [<ffffffff810e906c>] do_sync_read+0xcb/0x108
[   65.164301]  [<ffffffff811e32a3>] ? security_file_permission+0x16/0x18
[   65.164301]  [<ffffffff810e96d3>] vfs_read+0xab/0x108
[   65.164301]  [<ffffffff810e97f0>] sys_read+0x4a/0x6e
[   65.164301]  [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
[   65.164301] Code: 00 74 1c 48 8b 8b 60 01 00 00 48 85 c9 75 04 0f 0b eb fe 48 ff c9 48 89 8b 60 01 00 00 eb 1a 48 8b 8b 58 01 00 00 48 85 c9 75 04 <0f> 0b eb fe 48 ff c9 48 89 8b 58 01 00 00 45 84 e4 74 16 48 8b
[   65.164301] RIP  [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
[   65.164301]  RSP <ffff8800ba5a79e8>
[   65.164301] ---[ end trace 1b2b828753032e68 ]---
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

7f1dc8a2

16 Apr, 2010 1 commit

blkio: Initialize blkg->stats_lock for the root cfqg too · 8d2a91f8

Divyesh Shah authored Apr 16, 2010

This fixes the lockdep warning reported by Gui Jianfeng.
Signed-off-by: Divyesh Shah <dpshah@google.com>
Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

8d2a91f8

15 Apr, 2010 1 commit

blkio: fix for modular blk-cgroup build · b6ac23af

Divyesh Shah authored Apr 15, 2010

After merging the block tree, 20100414's linux-next build (x86_64
allmodconfig) failed like this:

ERROR: "get_gendisk" [block/blk-cgroup.ko] undefined!
ERROR: "sched_clock" [block/blk-cgroup.ko] undefined!

This happens because the two symbols aren't exported and hence not available
when blk-cgroup code is built as a module. I've tried to stay consistent with
the use of EXPORT_SYMBOL or EXPORT_SYMBOL_GPL with the other symbols in the
respective files.
Signed-off-by: Divyesh Shah <dpshah@google.com>
Acked-by: Gui Jianfeng <guijianfeng@cn.fujitsu.cn>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

b6ac23af

14 Apr, 2010 2 commits

block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer · c0d97e9c

Richard Kennedy authored Apr 14, 2010

blk_rq_timed_out_timer() relied on blk_add_timer() never returning a
timer value of zero, but commit 7838c15b
removed the code that bumped this value when it was zero.
Therefore when jiffies is near wrap we could get unlucky & not set the
timeout value correctly.

This patch uses a flag to indicate that the timeout value was set and so
handles jiffies wrap correctly, and it keeps all the logic in one
function so should be easier to maintain in the future.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

c0d97e9c

blkio: Fix compile errors · 28baf442

Divyesh Shah authored Apr 14, 2010

Fixes compile errors in blk-cgroup code for empty_time stat and a merge fix in
CFQ. The first error was when CONFIG_DEBUG_CFQ_IOSCHED is not set.
Signed-off-by: Divyesh Shah <dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

28baf442

13 Apr, 2010 17 commits

Merge branch 'master' into for-2.6.35 · 4facdaec

Jens Axboe authored Apr 13, 2010

Conflicts:
	block/blk-cgroup.c
	block/cfq-iosched.c
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

4facdaec

block: Update to io-controller stats · a11cdaa7

Divyesh Shah authored Apr 13, 2010

Changelog from v1:
o Call blkiocg_update_idle_time_stats() at cfq_rq_enqueued() instead of at
  dispatch time.

Changelog from original patchset: (in response to Vivek Goyal's comments)
o group blkiocg_update_blkio_group_dequeue_stats() with other DEBUG functions
o rename blkiocg_update_set_active_queue_stats() to
  blkiocg_update_avg_queue_size_stats()
o s/request/io/ in blkiocg_update_request_add_stats() and
  blkiocg_update_request_remove_stats()
o Call cfq_del_timer() at request dispatch() instead of
  blkiocg_update_idle_time_stats()

Signed-off-by: Divyesh Shah<dpshah@google.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

a11cdaa7

io-controller: Document for blkio.weight_device · da69da18

Gui Jianfeng authored Apr 13, 2010

Here is the document for blkio.weight_device
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

da69da18

io-controller: Add a new interface "weight_device" for IO-Controller · 34d0f179

Gui Jianfeng authored Apr 13, 2010

Currently, IO Controller makes use of blkio.weight to assign weight for
all devices. Here a new user interface "blkio.weight_device" is introduced to
assign different weights for different devices. blkio.weight becomes the
default value for devices which are not configured by "blkio.weight_device"

You can use the following format to assigned specific weight for a given
device:
#echo "major:minor weight" > blkio.weight_device

major:minor represents device number.

And you can remove weight for a given device as following:
#echo "major:minor 0" > blkio.weight_device

V1->V2 changes:
- use user interface "weight_device" instead of "policy" suggested by Vivek
- rename some struct suggested by Vivek
- rebase to 2.6-block "for-linus" branch
- remove an useless list_empty check pointed out by Li Zefan
- some trivial typo fix

V2->V3 changes:
- Move policy_*_node() functions up to get rid of forward declarations
- rename related functions by adding prefix "blkio_"
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

34d0f179

Linux 2.6.34-rc4 · 0d0fb0f9
Linus Torvalds authored Apr 12, 2010

0d0fb0f9

Merge branch 'anonvma' · 64a8920f

Linus Torvalds authored Apr 12, 2010

* anonvma:
  anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma
  anon_vma: clone the anon_vma chain in the right order
  vma_adjust: fix the copying of anon_vma chains
  Simplify and comment on anon_vma re-use for anon_vma_prepare()

64a8920f

Merge master.kernel.org:/home/rmk/linux-2.6-arm · 50b88c46

Linus Torvalds authored Apr 12, 2010

* master.kernel.org:/home/rmk/linux-2.6-arm: (21 commits)
  ARM: Fix ioremap_cached()/ioremap_wc() for SMP platforms
  ARM: 6043/1: AT91 slow-clock resume: Don't wait for a disabled PLL to lock
  ARM: 6031/1: fix Thumb-2 decompressor
  ARM: 6029/1: ep93xx: gpio.c: local functions should be static
  ARM: 6028/1: ARM: add MAINTAINERS for U300
  ARM: 6024/1: bcmring: fix missing down on semaphore in dma.c
  MXC: mach_armadillo5x0: Add USB Host support.
  ARM mach-mx3: duplicated include
  ARM mach-mx3: duplicated include
  imx31: add watchdog device on litekit board.
  imx3: Add watchdog platform device support
  MXC: mach-mx31_3ds: add support for freescale mc13783 power management device.
  MXC: mach-mx31_3ds: Add SPI1 device support.
  MXC: mach-mx31_3ds: Add support for on board NAND Flash.
  MXC: mach-mx31_3ds: Update variable names over recent mach name modification.
  imx31: fix parent clock for rtc
  i.MX51: remove NFC AXI static mapping
  i.MX51: determine silicon revision dynamically
  i.MX51: map TZIC dynamically
  i.MX51: Use correct clock for gpt
  ...

50b88c46

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable · d6cf853d

Linus Torvalds authored Apr 12, 2010

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: make sure the chunk allocator doesn't create zero length chunks
  Btrfs: fix data enospc check overflow

d6cf853d

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 · 6a945f38

Linus Torvalds authored Apr 12, 2010

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  quota: Fix possible dq_flags corruption
  quota: Hide warnings about writes to the filesystem before quota was turned on
  ext3: symlink must be handled via filesystem specific operation
  ext2: symlink must be handled via filesystem specific operation

6a945f38

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 · 50fc88cb

Linus Torvalds authored Apr 12, 2010

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  udf: add speciffic ->setattr callback
  udf: potential integer overflow

50fc88cb

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus · 4505a493

Linus Torvalds authored Apr 12, 2010

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (36 commits)
  MIPS: Calculate proper ebase value for 64-bit kernels
  MIPS: Alchemy: DB1200: Remove custom wait implementation
  MIPS: Big Sur: Make defconfig more useful.
  MIPS: Fix __vmalloc() etc. on MIPS for non-GPL modules
  MIPS: Sibyte: Fix M3 TLB exception handler workaround.
  MIPS: BCM63xx: Fix build failure in board_bcm963xx.c
  MIPS: uasm: Add OR instruction.
  MIPS: Sibyte: Apply M3 workaround only on affected chip types and versions.
  MIPS: BCM63xx: Initialize gpio_out_low & out_high to current value at boot.
  MIPS: BCM63xx: Register SSB SPROM fallback in board's first stage callback
  MIPS: BCM63xx: Fix typo in cpu-feature-overrides file.
  MIPS: BCM63xx: Add support for second uart.
  MIPS: BCM63xx: Fix double gpio registration.
  MIPS: BCM63xx: Add DWVS0 board
  MIPS: BCM63xx: Add the RTA1025W-16 BCM6348-based board to suppported boards.
  MIPS: BCM63xx: Fix BCM6338 and BCM6345 gpio count
  MIPS: libgcc.h: Checkpatch cleanup
  MIPS: Loongson-2F: Flush the branch target history in BTB and RAS
  MIPS: Move signal trampolines off of the stack.
  MIPS: Preliminary VDSO
  ...

4505a493

Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux · fedfb947
Linus Torvalds authored Apr 12, 2010
```
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux:
  svcrdma: RDMA support not yet compatible with RPC6
```
fedfb947

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 · 44fa2b4b

Linus Torvalds authored Apr 12, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix typo "numer" -> "number" in alloc.c
  nilfs2: Remove an uninitialization warning in nilfs_btree_propagate_v()
  nilfs2: fix a wrong type conversion in nilfs_ioctl()

44fa2b4b

anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma · ea90002b

Linus Torvalds authored Apr 12, 2010

Otherwise we might be mapping in a page in a new mapping, but that page
(through the swapcache) would later be mapped into an old mapping too.
The page->mapping must be the case that works for everybody, not just
the mapping that happened to page it in first.

Here's the scenario:

 - page gets allocated/mapped by process A. Let's call the anon_vma we
   associate the page with 'A' to keep it easy to track.

 - Process A forks, creating process B. The anon_vma in B is 'B', and has
   a chain that looks like 'B' -> 'A'. Everything is fine.

 - Swapping happens. The page (with mapping pointing to 'A') gets swapped
   out (perhaps not to disk - it's enough to assume that it's just not
   mapped any more, and lives entirely in the swap-cache)

 - Process B pages it in, which goes like this:

        do_swap_page ->
          page = lookup_swap_cache(entry);
         ...
          set_pte_at(mm, address, page_table, pte);
          page_add_anon_rmap(page, vma, address);

   And think about what happens here!

   In particular, what happens is that this will now be the "first"
   mapping of that page, so page_add_anon_rmap() used to do

        if (first)
                __page_set_anon_rmap(page, vma, address);

   and notice what anon_vma it will use? It will use the anon_vma for
   process B!

   What happens then? Trivial: process 'A' also pages it in (nothing
   happens, it's not the first mapping), and then process 'B' execve's
   or exits or unmaps, making anon_vma B go away.

   End result: process A has a page that points to anon_vma B, but
   anon_vma B does not exist any more.  This can go on forever.  Forget
   about RCU grace periods, forget about locking, forget anything like
   that.  The bug is simply that page->mapping points to an anon_vma
   that was correct at one point, but was _not_ the one that was shared
   by all users of that possible mapping.

Changing it to always use the deepest anon_vma in the anonvma chain gets
us to the safest model.

This can be improved in certain cases: if we know the page is private to
just this particular mapping (for example, it's a new page, or it is the
only swapcache entry), we could pick the top (most specific) anon_vma.

But that's a future optimization. Make it _work_ reliably first.
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "What do you know, I think you fixed it!" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ea90002b

anon_vma: clone the anon_vma chain in the right order · 646d87b4

Linus Torvalds authored Apr 11, 2010

We want to walk the chain in reverse order when cloning it, so that the
order of the result chain will be the same as the order in the source
chain.  When we add entries to the chain, they go at the head of the
chain, so we want to add the source head last.
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "No, it still oopses" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

646d87b4

vma_adjust: fix the copying of anon_vma chains · 287d97ac

Linus Torvalds authored Apr 10, 2010

When we move the boundaries between two vma's due to things like
mprotect, we need to make sure that the anon_vma of the pages that got
moved from one vma to another gets properly copied around.  And that was
not always the case, in this rather hard-to-follow code sequence.

Clarify the code, and fix it so that it copies the anon_vma from the
right source.
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "Yeah, not so much this one either" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

287d97ac

Simplify and comment on anon_vma re-use for anon_vma_prepare() · d0e9fe17

Linus Torvalds authored Apr 10, 2010

This changes the anon_vma reuse case to require that we only reuse
simple anon_vma's - ie the case when the vma only has a single anon_vma
associated with it.

This means that a reuse of an anon_vma from an adjacent vma will always
guarantee that both vma's are associated not only with the same
anon_vma, they will also have the same anon_vma chain (of just a single
entry in this case).

And since anon_vma re-use was the only case where the same anon_vma
might be associated with different chains of anon_vma's, we now have the
case that every vma that shares the same anon_vma will always also have
the same chain.  That makes it much easier to think about merging vma's
that share the same anon_vma's: you can always just drop the other
anon_vma chain in anon_vma_merge() since you know that they are always
identical.

This also splits up the function to validate the anon_vma re-use, and
adds a lot of commentary about the possible races.
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "That didn't fix it" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d0e9fe17

12 Apr, 2010 14 commits

quota: Fix possible dq_flags corruption · 08261673

Andrew Perepechko authored Apr 12, 2010

dq_flags are modified non-atomically in do_set_dqblk via __set_bit calls and
atomically for example in mark_dquot_dirty or clear_dquot_dirty. Hence a
change done by an atomic operation can be overwritten by a change done by a
non-atomic one. Fix the problem by using atomic bitops even in do_set_dqblk.
Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com>
Signed-off-by: Jan Kara <jack@suse.cz>

08261673

quota: Hide warnings about writes to the filesystem before quota was turned on · 4c5e6c0e

Jan Kara authored Apr 06, 2010

For a root filesystem write to the filesystem before quota is turned on happens
regularly and there's no way around it because of writes to syslog, /etc/mtab,
and similar. So the warning is rather pointless for ordinary users. It's
still useful during development so we just hide the warning behind
__DQUOT_PARANOIA config option.
Signed-off-by: Jan Kara <jack@suse.cz>

4c5e6c0e

ext3: symlink must be handled via filesystem specific operation · 774f03fb

Dmitry Monakhov authored Mar 26, 2010

generic setattr implementation is no longer responsible for
quota transfer so synlinks must be handled via ext3_setattr.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>

774f03fb

ext2: symlink must be handled via filesystem specific operation · fc7683a3

Dmitry Monakhov authored Mar 26, 2010

generic setattr implementation is no longer responsible for
quota transfer so synlinks must be handled via ext2_setattr.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>

fc7683a3

MIPS: Calculate proper ebase value for 64-bit kernels · f6be75d0

David Daney authored Apr 06, 2010

The ebase is relative to CKSEG0 not CAC_BASE. On a 32-bit kernel they
are the same thing, for a 64-bit kernel they are not.

It happens to kind of work on a 64-bit kernel as they both reference
the same physical memory. However since the CPU uses the CKSEG0 base,
determining if a J instruction will reach always gives the wrong result
unless we use the same number the CPU uses.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/1093/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

f6be75d0

MIPS: Alchemy: DB1200: Remove custom wait implementation · d8000bee

Manuel Lauss authored Apr 03, 2010

While playing with the out-of-tree MAE driver module, the system would
panic after a while in the db1200 custom wait code after wakeup due to
a clobbered k0 register being used as target address of a store op.

Remove the custom wait implementation and revert back to the Alchemy-
recommended implementation already set as default.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
To: Linux-MIPS <linux-mips@linux-mips.org>
Patchwork: http://patchwork.linux-mips.org/patch/1092/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

d8000bee

MIPS: Big Sur: Make defconfig more useful. · 2844e49f
Ralf Baechle authored Apr 03, 2010
```
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
```
2844e49f

MIPS: Fix __vmalloc() etc. on MIPS for non-GPL modules · 7b3e543d

Anton Altaparmakov authored Mar 25, 2010

Commit b3594a089f1c17ff919f8f78505c3f20e1f6f8ce (lmo) rsp.
35133692 (kernel.org) break non-GPL modules
that use __vmalloc() or any of the vmap(), vm_map_ram(), etc functions on
MIPS.

All those functions are EXPORT_SYMBOL() so are meant to be allowed to be
used by non-GPL kernel modules.  These calls all take page protection as
an argument which is normally a constant like PAGE_KERNEL.

This commit causes all protection constants like PAGE_KERNEL to not be
constants and instead to contain the GPL-only symbol _page_cachable_default.

This means that all calls to __vmalloc(), vmap(), etc, cause non-GPL
modules to fail to link with the complaint that they are trying to use the
GPL-only symbol _page_cachable_default...

Change EXPORT_SYMBOL_GPL(_page_cachable_default) to EXPORT_SYMBOL() for
non-GPL modules that call __vmalloc(), vmap(), vm_map_ram() etc.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Cc: Chris Dearman <chris@mips.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: http://patchwork.linux-mips.org/patch/1084/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

7b3e543d

MIPS: Sibyte: Fix M3 TLB exception handler workaround. · 3d45285d

Ralf Baechle authored Mar 23, 2010

The M3 workaround needs to cmpare the region and VPN2 fields only.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

3d45285d

MIPS: BCM63xx: Fix build failure in board_bcm963xx.c · 5e3644a9

Florian Fainelli authored Mar 23, 2010

Since 2083e8327aeeaf818b0e4522a9d2539835c60423, the SPROM is now registered
in the board_prom_init callback, but it references variables and functions
which are declared below.  Move the variables and functions above
board_prom_init.
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
To: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/1077/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

5e3644a9

MIPS: uasm: Add OR instruction. · 5808184f

Ralf Baechle authored Mar 23, 2010

This is needed for the fix of the M3 workaround.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

5808184f

MIPS: Sibyte: Apply M3 workaround only on affected chip types and versions. · 8d9df29d

Ralf Baechle authored Mar 23, 2010

Previously it was unconditionally used on all Sibyte family SOCs. The
M3 bug has to be handled in the TLB exception handler which is extremly
performance sensitive, so this modification is expected to deliver around
2-3% performance improvment. This is important as required changes to the
M3 workaround will make it more costly.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

8d9df29d

MIPS: BCM63xx: Initialize gpio_out_low & out_high to current value at boot. · 9538ca63

Maxime Bizon authored Jan 30, 2010

To avoid a glitch during GPIO initialisation read GPIO output register
values left by the firmware.
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
To: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/903/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

9538ca63

MIPS: BCM63xx: Register SSB SPROM fallback in board's first stage callback · e23a90eb

Florian Fainelli authored Mar 02, 2010

Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
To: Maxime Bizon <mbizon@freebox.fr>
Cc: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/1017/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

e23a90eb