Commits · 0f78ab9899e9d6acb09d5465def618704255963b · Kirill Smelkov / linux

04 Oct, 2009 2 commits

Revert "Seperate read and write statistics of in_flight requests" · 0f78ab98

Jens Axboe authored Oct 04, 2009

This reverts commit a9327cac.

Corrado Zoccolo <czoccolo@gmail.com> reports:

"with 2.6.32-rc1 I started getting the following strange output from
"iostat -kx 2":
Linux 2.6.31bisect (et2) 	04/10/2009 	_i686_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10,70    0,00    3,16   15,75    0,00   70,38

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda              18,22     0,00    0,67    0,01    14,77     0,02
43,94     0,01   10,53 39043915,03 2629219,87
sdb              60,89     9,68   50,79    3,04  1724,43    50,52
65,95     0,70   13,06 488437,47 2629219,87

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2,72    0,00    0,74    0,00    0,00   96,53

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6,68    0,00    0,99    0,00    0,00   92,33

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,40    0,00    0,73    1,47    0,00   93,40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     4,00    0,00    3,00     0,00    28,00
18,67     0,06   19,50 333,33 100,00

Global values for service time and utilization are garbage. For
interval values, utilization is always 100%, and service time is
higher than normal.

I bisected it down to:
[a9327cac] Seperate read and write
statistics of in_flight requests
and verified that reverting just that commit indeed solves the issue
on 2.6.32-rc1."

So until this is debugged, revert the bad commit.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

0f78ab98

cfq-iosched: don't delay async queue if it hasn't dispatched at all · e00c54c3

Jens Axboe authored Oct 04, 2009

We cannot delay for the first dispatch of the async queue if it
hasn't dispatched at all, since that could present a local user
DoS attack vector using an app that just did slow timed sync reads
while filling memory.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

e00c54c3

03 Oct, 2009 5 commits

block: Topology ioctls · ac481c20

Martin K. Petersen authored Oct 03, 2009

Not all users of the topology information want to use libblkid.  Provide
the topology information through bdev ioctls.

Also clarify sector size comments for existing BLK ioctls.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ac481c20

cfq-iosched: use assigned slice sync value, not default · 61f0c1dc

Jens Axboe authored Oct 03, 2009

We should use the sysfs modified slice sync value, in case it differs
from the default.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

61f0c1dc

cfq-iosched: rename 'desktop' sysfs entry to 'low_latency' · 963b72fc

Jens Axboe authored Oct 03, 2009

Don't think that's necessarily a perfect description of what this
option fiddles with, but it's probably better than 'desktop'.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

963b72fc

cfq-iosched: implement slower async initiate and queue ramp up · 8e296755

Jens Axboe authored Oct 03, 2009

This slowly ramps up the async queue depth based on the time
passed since the sync IO, and doesn't allow async at all until
a sync slice period has passed.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

8e296755

cfq-iosched: delay async IO dispatch, if sync IO was just done · 365722bb

Vivek Goyal authored Oct 03, 2009

o Do not allow more than max_dispatch requests from an async queue, if some
  sync request has finished recently. This is in the hope that sync activity
  is still going on in the system and we might receive a sync request soon.
  Most likely from a sync queue which finished a request and we did not enable
  idling on it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

365722bb

02 Oct, 2009 1 commit

cfq-iosched: add a knob for desktop interactiveness · 1d223515

Jens Axboe authored Oct 02, 2009

This is basically identical to what Vivek Goyal posted, but combined
into one and labelled 'desktop' instead of 'fairness'. The goal
is to continue to improve on the latency side of things as it relates
to interactiveness, keeping the questionable bits under this sysfs
tunable so it would be easy for throughput-only people to turn off.

Apart from adding the interactive sysfs knob, it also adds the
behavioural change of allowing slice idling even if the hardware
does tagged command queuing.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

1d223515

01 Oct, 2009 32 commits

Add a tracepoint for block request remapping · b0da3f0d

Jun'ichi Nomura authored Oct 01, 2009

Since 2.6.31 now has request-based device-mapper, it's useful to have
a tracepoint for request-remapping as well as bio-remapping.
This patch adds a tracepoint for request-remapping, trace_block_rq_remap().
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

b0da3f0d

block: allow large discard requests · 67efc925

Christoph Hellwig authored Sep 30, 2009

Currently we set the bio size to the byte equivalent of the blocks to
be trimmed when submitting the initial DISCARD ioctl.  That means it
is subject to the max_hw_sectors limitation of the HBA which is
much lower than the size of a DISCARD request we can support.
Add a separate max_discard_sectors tunable to limit the size for discard
requests.

We limit the max discard request size in bytes to 32bit as that is the
limit for bio->bi_size.  This could be much larger if we had a way to pass
that information through the block layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

67efc925

block: use normal I/O path for discard requests · c15227de

Christoph Hellwig authored Sep 30, 2009

prepare_discard_fn() was being called in a place where memory allocation
was effectively impossible.  This makes it inappropriate for all but
the most trivial translations of Linux's DISCARD operation to the block
command set.  Additionally adding a payload there makes the ownership
of the bio backing unclear as it's now allocated by the device driver
and not the submitter as usual.

It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
the queue supports discard operations or not.  blkdev_issue_discard now
allocates a one-page, sector-length payload which is the right thing
for the common ATA and SCSI implementations.

The mtd implementation of prepare_discard_fn() is replaced with simply
checking for the request being a discard.

Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx>
which did the prepare_discard_fn but not the different payload allocation
yet.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

c15227de

swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL · 3bd0f0c7

Suresh Jayaraman authored Sep 30, 2009

While testing Swap over NFS patchset, I noticed an oops that was triggered
during swapon. Investigating further, the NULL pointer deference is due to the
SSD device check/optimization in the swapon code that assumes s_bdev could never
be NULL.

inode->i_sb->s_bdev could be NULL in a few cases. For e.g. one such case is
loopback NFS mount, there could be others as well. Fix this by ensuring s_bdev
is not NULL before we try to deference s_bdev.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

3bd0f0c7

fs/bio.c: move EXPORT* macros to line after function · a112a71d

H Hartley Sweeten authored Sep 26, 2009

As mentioned in Documentation/CodingStyle, move EXPORT* macro's
to the line immediately after the closing function brace line.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

a112a71d

Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs · 48c0d4d4

Zdenek Kabelac authored Sep 25, 2009

Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
introduced in commit 1d54ad6d.
Release kobject also in case the request_fn is NULL.

Problem was noticed via kmemleak backtrace when some sysfs entries were
note properly destroyed during  device removal:

unreferenced object 0xffff88001aa76640 (size 80):
  comm "lvcreate", pid 2120, jiffies 4294885144
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff  .........e......
    90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff  .f........S.....
  backtrace:
    [<ffffffff813f9cc6>] kmemleak_alloc+0x26/0x60
    [<ffffffff8111d693>] kmem_cache_alloc+0x133/0x1c0
    [<ffffffff81195891>] sysfs_new_dirent+0x41/0x120
    [<ffffffff81194b0c>] sysfs_add_file_mode+0x3c/0xb0
    [<ffffffff81197c81>] internal_create_group+0xc1/0x1a0
    [<ffffffff81197d93>] sysfs_create_group+0x13/0x20
    [<ffffffff810d8004>] blk_trace_init_sysfs+0x14/0x20
    [<ffffffff8123f45c>] blk_register_queue+0x3c/0xf0
    [<ffffffff812447e4>] add_disk+0x94/0x160
    [<ffffffffa00d8b08>] dm_create+0x598/0x6e0 [dm_mod]
    [<ffffffffa00de951>] dev_create+0x51/0x350 [dm_mod]
    [<ffffffffa00de823>] ctl_ioctl+0x1a3/0x240 [dm_mod]
    [<ffffffffa00de8f2>] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
    [<ffffffff81177bfd>] compat_sys_ioctl+0xcd/0x4f0
    [<ffffffff81036ed8>] sysenter_dispatch+0x7/0x2c
    [<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

48c0d4d4

cciss: fix build when !PROC_FS · 1e6f2dc1

Alexander Beregalov authored Sep 24, 2009

Fix these build errors when CONFIG_PROC_FS is not set:
drivers/block/cciss.c: In function 'cciss_show_raid_level':
drivers/block/cciss.c:623: error: 'RAID_UNKNOWN' undeclared (first use in this function)
drivers/block/cciss.c:626: error: 'raid_label' undeclared (first use in this function)
drivers/block/cciss.c: In function 'cciss_geometry_inquiry':
drivers/block/cciss.c:2696: error: 'RAID_UNKNOWN' undeclared (first use in this function)
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

1e6f2dc1

block: Do not clamp max_hw_sectors for stacking devices · 5dee2477

Martin K. Petersen authored Sep 21, 2009

Stacking devices do not have an inherent max_hw_sector limit.  Set the
default to INT_MAX so we are bounded only by capabilities of the
underlying storage.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

5dee2477

block: Set max_sectors correctly for stacking devices · 80ddf247

Martin K. Petersen authored Sep 18, 2009

The topology changes unintentionally caused SAFE_MAX_SECTORS to be set
for stacking devices.  Set the default limit to BLK_DEF_MAX_SECTORS and
provide SAFE_MAX_SECTORS in blk_queue_make_request() for legacy hw
drivers that depend on the old behavior.
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

80ddf247

cciss: cciss_host_attr_groups should be const · 9f792d9f
Jens Axboe authored Sep 18, 2009
```
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
```
9f792d9f

cciss: Dynamically allocate the drive_info_struct for each logical drive. · 9cef0d2f

Stephen M. Cameron authored Sep 17, 2009

cciss: Dynamically allocate the drive_info_struct for each logical drive.
This reduces the size of the per-hba ctlr_info structure from 106936
bytes to 8132 bytes. That's on 32-bit systems. On 64-bit systems, the
improvement is even bigger. Without this, the ctlr_info struct is so big
that the driver won't even load on a 64 bit system if CISS_MAX_LUN was
at it's current setting of 1024 logical drives.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

9cef0d2f

cciss: Add usage_count attribute to each logical drive in /sys · e272afec

Stephen M. Cameron authored Sep 17, 2009

Add usage_count attribute to each logical drive at
/sys/devices/<dev>/ccissX/cXdY/usage_count for controller X,
logical drive Y.  The usage count is the number of times
the device has currently been opened.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

e272afec

cciss: Add a "raid_level" attribute to each logical drive in /sys · 3ff1111d

Stephen M. Cameron authored Sep 17, 2009

and change get rid of some magic numbers in raid lavel decoding.

Add raid_level attribute to each logical drive at
/sys/devices/<dev>/ccissX/cXdY/raid_level for controller X,
logical drive Y
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

3ff1111d

cciss: fix some magic numbers in the raid-level decoding · fa52bec9

Stephen M. Cameron authored Sep 17, 2009

cciss: fix some magic numbers in the raid-level decoding
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

fa52bec9

cciss: Add lunid attribute to each logical drive in /sys · ce84a8ae

Stephen M. Cameron authored Sep 17, 2009

Add lunid attribute to each logical drive at
/sys/devices/<dev>/ccissX/cXdY/lunid for controller X,
logical drive Y
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ce84a8ae

cciss: Don't check h->busy_initializing in cciss_open(). · 2e043986

Stephen M. Cameron authored Sep 17, 2009

Don't check h->busy_initializing in cciss_open().  Open won't be
called before things are ready, but h->busy_initializing won't be
unset until after the initial rebuild_lun_table is finished.  But,
to read the partitions, cciss_open will be called for each logical
drive during rebuild_lun_table.  If cciss_open checks h->busy_initializing,
then the reading of the partition information during the initial
rebuild_lun_table will fail, which is especially bad news if it
happens to be your boot device.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

2e043986

cciss: Preserve all 8 bytes of LUN ID for logical drives. · 39ccf9a6

Stephen M. Cameron authored Sep 17, 2009

Preserve all 8 bytes of the LunID field returned
by CCISS_REPORT_LOGICAL instead of only saving 4 bytes.
This fixes a bug with logical volume addressing encountered on
an MSA2012.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

39ccf9a6

cciss: Silence noisy per-disk messages output by cciss_read_capacity · 983333cb

Stephen M. Cameron authored Sep 17, 2009

Silence noisy per-disk messages output by cciss_read_capacity
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

983333cb

cciss: Fix excessive gendisk freeing bug on driver unload. · 2c935593

Stephen M. Cameron authored Sep 17, 2009

Fix bug that free_hba was calling put_disk for all gendisk[]
pointers -- all 1024 of them -- regardless of whether the were
used or not (NULL).  This bug could cause rmmod to oops if logical
drives had been deleted during the driver's lifetime.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

2c935593

cciss: Fix usage_count check in rebuild_lun_table when triggered via sysfs. · 2d11d993

Stephen M. Cameron authored Sep 17, 2009

When rebuild_lun_table is reached via sysfs, the usage count that
is checked prior to messing with c0d0 has different constraints
(must be zero) than if rebuild_lun_table is reached via ioctl
(must be one.)  Fix rebuild_lun_table to take that into account.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

2d11d993

cciss: Clear all sysfs-exposed data for deleted logical drives. · 9ddb27b4

Stephen M. Cameron authored Sep 17, 2009

When removing a logical drive, clear all the information that is
now exposed by sysfs (e.g. vendor, model, serial number.)
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

9ddb27b4

cciss: Handle special case for sysfs attributes of the first logical drive. · 8ce51966

Stephen M. Cameron authored Sep 17, 2009

For c0dx where x is not 0, we handle deletion and addition simply,
but for c0d0, there is the special case that even when there's no
disk, the device node exists so that the controller may be accessed.
So, for c0d0, we only create the sysfs entries once, when a controller
is added, and only remove them once, when a controller is being
taken down.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

8ce51966

cciss: Handle cases when cciss_add_disk fails. · 361e9b07

Stephen M. Cameron authored Sep 17, 2009

Handle cases when cciss_add_disk fails.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

361e9b07

cciss: Handle failure of blk_init_queue gracefully in cciss_add_disk. · e8074f79

Stephen M. Cameron authored Sep 17, 2009

Handle failure of blk_init_queue gracefully in cciss_add_disk.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

e8074f79

cciss: Rearrange logical drive sysfs code to make the "changing a disk" path work. · 097d0264

Stephen M. Cameron authored Sep 17, 2009

Rearrange logical drive sysfs code to make the "changing a disk" path work.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

097d0264

cciss: Dynamically allocate struct device for each logical drive as needed. · 617e1344

Stephen M. Cameron authored Sep 17, 2009

Dynamically allocate struct device for each logical drive as needed
instead of allocating the maximum we would ever need at driver init time.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

617e1344

cciss: Remove some unused code in rebuild_lun_table() · 21d9db0b

Stephen M. Cameron authored Sep 17, 2009

Remove some unused code in rebuild_lun_table()
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

21d9db0b

cciss: Allow triggering of rescan of logical drive topology via sysfs entry · d6f4965d

Andrew Patterson authored Sep 17, 2009

Added /sys/bus/pci/devices/<dev>/ccissX/rescan sysfs entry used
to kick off a rescan that discovers logical drive topology changes.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d6f4965d

cciss: Use one scan thread per controller and fix hang during rmmod · b368c9dd

Andrew Patterson authored Sep 17, 2009

Replace the use of one scan kthread per controller with one per driver.
Use a queue to hold a list of controllers that need to be rescanned with
routines to add and remove controllers from the queue.

Fix locking and completion handling to prevent a hang during rmmod.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

b368c9dd

cciss: Remove sysfs entries for logical drives on driver cleanup. · c64bebcd

Andrew Patterson authored Sep 17, 2009

Sysfs entries for logical drives need to be removed when a drive is
deleted during driver cleanup.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

c64bebcd

cciss: fix schedule_timeout() parameters · 4d761609

Randy Dunlap authored Sep 18, 2009

Change schedule_timeout() parameter to not be specific to HZ=1000.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: "Cameron, Steve" <Steve.Cameron@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

4d761609

dac960: switch to seq_file · d5d03eec

Alexey Dobriyan authored Sep 18, 2009

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Yang Hongyang <yanghy@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d5d03eec