- 09 Oct, 2008 40 commits
-
-
Kiyoshi Ueda authored
This patch converts virtio_blk to use __blk_end_request() directly so that end_{queued|dequeued}_request() can be removed. Related 'uptodate' argument is converted to 'error'. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Define as 32, which is is what BDEVNAME_SIZE is/was as well. This keeps the user interface the same and gets rid of the difference between kernel and user api here. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Kiyoshi Ueda authored
This patch adds an new interface, blk_lld_busy(), to check lld's busy state from the block layer. blk_lld_busy() calls down into low-level drivers for the checking if the drivers set q->lld_busy_fn() using blk_queue_lld_busy(). This resolves a performance problem on request stacking devices below. Some drivers like scsi mid layer stop dispatching request when they detect busy state on its low-level device like host/target/device. It allows other requests to stay in the I/O scheduler's queue for a chance of merging. Request stacking drivers like request-based dm should follow the same logic. However, there is no generic interface for the stacked device to check if the underlying device(s) are busy. If the request stacking driver dispatches and submits requests to the busy underlying device, the requests will stay in the underlying device's queue without a chance of merging. This causes performance problem on burst I/O load. With this patch, busy state of the underlying device is exported via q->lld_busy_fn(). So the request stacking driver can check it and stop dispatching requests if busy. The underlying device driver must return the busy state appropriately: 1: when the device driver can't process requests immediately. 0: when the device driver can process requests immediately, including abnormal situations where the device driver needs to kill all requests. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Elias Oltmanns authored
blk_start_queueing() should act like the generic queue unplugging and kicking and ignore a stopped queue. Such a queue may not be run until after a call to blk_start_queue(). Signed-off-by: Elias Oltmanns <eo@nebensachen.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Sven Schuetz authored
This header file is of interest for user space programming, i.e. for tools that process blktrace data. We would like to use it for a tool on-top of blktrace which processes data provided by blktrace. For this purpose, it would be helpful if the blktrace API would make it to /usr/include/linux. The git tree for the blktrace tools comes with its own copy of this header file. I didn't manage to replace that copy with the file generated by the patch below yet. A few more cleanups would be needed. For example, the blktrace ioctl numbers, which are currently defined in usr/include/fs.h, might need to be moved. Should be feasible, though. Signed-off-by: Sven Schuetz <sven@linux.vnet.ibm.com> Signed-off-by: Martin Peschke <mp3@de.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
By only allowing async IO to consume 3/4 ths of the tag depth, we always have slots free to serve sync IO. This is important to avoid having writes fill the entire tag queue, thus starving reads. Original patch and idea from Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
We really need to know about the hardware tagging support as well, since if the SSD does not do tagging then we still want to idle. Otherwise have the same dependent sync IO vs flooding async IO problem as on rotational media. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
SSD devices should give an RPM setting of 1 in word 217 of the ID page. If we see such a device, tell the block layer about it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
We don't want to idle in AS/CFQ if the device doesn't have a seek penalty. So add a QUEUE_FLAG_NONROT to indicate a non-rotational device, low level drivers should set this flag upon discovery of an SSD or similar device type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Keith Wansbrough authored
The current floppy_struct allows floppies to number sectors starting from 0 or 1. This patch allows arbitrary first-sector numbers - for example, 0xC1 for Amstrad CPC disks. This extends the existing 1-bit field (FD_ZEROBASED, bit 2 of stretch) to 8 bits (FD_SECTMASK, bits 2 to 9). Currently 0x00 denotes a first sector number of 1, and 0x01 denotes a first sector number of 0. We extend this by interpreting FD_SECTMASK as the first sector number with the LSB flipped. Signed-off-by: Keith Wansbrough <keith@lochan.org> Cc: Alain Knaff <alain@linux.lu> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Karel Zak <kzak@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Julia Lawall authored
The kernel.h macro DIV_ROUND_UP performs the computation (((n) + (d) - 1) / (d)) but is perhaps more readable. An extract of the semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @haskernel@ @@ #include <linux/kernel.h> @depends on haskernel@ expression n,d; @@ ( - (n + d - 1) / d + DIV_ROUND_UP(n,d) | - (n + (d - 1)) / d + DIV_ROUND_UP(n,d) ) @depends on haskernel@ expression n,d; @@ - DIV_ROUND_UP((n),d) + DIV_ROUND_UP(n,d) @depends on haskernel@ expression n,d; @@ - DIV_ROUND_UP(n,(d)) + DIV_ROUND_UP(n,d) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: <mike.miller@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
scameron@beardog.cca.cpqcorp.net authored
Fix cciss SCSI rescan code to better notice device changes. If you hot-unplug a tape drive, then hot-plug a different tape drive into the same slot in a storage enclosure, the cciss driver wouldn't notice anything had changed, as it was only looking at the LUN address and device type. Now it looks at the inquiry page 0x83 device identifier, and vendor and model strings as well. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
FUJITA Tomonori authored
This example isn't the proper way to handle scatterlists (can't handle sg chaining). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Kiyoshi Ueda authored
This patch adds a queue flag to indicate the block device can be used for request stacking. Request stacking drivers need to stack their devices on top of only devices of which q->request_fn is functional. Since bio stacking drivers (e.g. md, loop) basically initialize their queue using blk_alloc_queue() and don't set q->request_fn, the check of (q->request_fn == NULL) looks enough for that purpose. However, dm will become both types of stacking driver (bio-based and request-based). And dm will always set q->request_fn even if the dm device is bio-based of which q->request_fn is not functional actually. So we need something else to distinguish the type of the device. Adding a queue flag is a solution for that. The reason why dm always sets q->request_fn is to keep the compatibility of dm user-space tools. Currently, all dm user-space tools are using bio-based dm without specifying the type of the dm device they use. To use request-based dm without changing such tools, the kernel must decide the type of the dm device automatically. The automatic type decision can't be done at the device creation time and needs to be deferred until such tools load a mapping table, since the actual type is decided by dm target type included in the mapping table. So a dm device has to be initialized using blk_init_queue() so that we can load either type of table. Then, all queue stuffs are set (e.g. q->request_fn) and we have no element to distinguish that it is bio-based or request-based, even after a table is loaded and the type of the device is decided. By the way, some stuffs of the queue (e.g. request_list, elevator) are needless when the dm device is used as bio-based. But the memory size is not so large (about 20[KB] per queue on ia64), so I hope the memory loss can be acceptable for bio-based dm users. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Kiyoshi Ueda authored
This patch adds blk_insert_cloned_request(), a generic request submission interface for request stacking drivers. Request-based dm will use it to submit their clones to underlying devices. blk_rq_check_limits() is also added because it is possible that the lower queue has stronger limitations than the upper queue if multiple drivers are stacking at request-level. Not only for blk_insert_cloned_request()'s internal use, the function will be used by request-based dm when the queue limitation is modified (e.g. by replacing dm's table). Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Kiyoshi Ueda authored
This patch adds blk_update_request(), which updates struct request with completing its data part, but doesn't complete the struct request itself. Though it looks like end_that_request_first() of older kernels, blk_update_request() should be used only by request stacking drivers. Request-based dm will use it in bio->bi_end_io callback to update the original request when a data part of a cloned request completes. Followings are additional background information of why request-based dm needs this interface. - Request stacking drivers can't use blk_end_request() directly from the lower driver's completion context (bio->bi_end_io or rq->end_io), because some device drivers (e.g. ide) may try to complete their request with queue lock held, and it may cause deadlock. See below for detailed description of possible deadlock: <http://marc.info/?l=linux-kernel&m=120311479108569&w=2> - To solve that, request-based dm offloads the completion of cloned struct request to softirq context (i.e. using blk_complete_request() from rq->end_io). - Though it is possible to use the same solution from bio->bi_end_io, it will delay the notification of bio completion to the original submitter. Also, it will cause inefficient partial completion, because the lower driver can't perform the cloned request anymore and request-based dm needs to requeue and redispatch it to the lower driver again later. That's not good. - So request-based dm needs blk_update_request() to perform the bio completion in the lower driver's completion context, which is more efficient. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
When a driver calls blk_cleanup_queue(), the device should be fully idle. However, the block layer may have pending plugging timers and the IO schedulers may have pending work in the work queues. So quisce the device by waiting for the timer and flushing the work queues. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Chris Lalancette authored
Until recently, the maximum number of xvd block devices you could attach to a Xen domU was 16. This limitation turned out to be problematic for some users, so it was expanded to handle a much larger number of disks. However, this requires a couple of changes in the way that blkfront scans for disks. This functionality is already present in the Xen linux-2.6.18-xen.hg tree; the attached patch adds this functionality to the mainline xen-blkfront implementation. I successfully tested it on a 2.6.25 tree, and build tested it on 2.6.27-rc3. Signed-off-by: Chris Lalancette <clalance@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Don't put functions that are only used in fs/bio-integrity.c in blkdev.h, it's much cleaner to just keep it in there. Also kill completely unused bdev_get_tag_size() Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
We cannot abort a request if we raced with the timeout handler already, or with the IO completion. So make blk_abort_request() mark the request as complete, and only continue if we succeeded. Found and suggested by Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Only works for the generic request timer handling. Allows one to sporadically ignore request completions, thus exercising the timeout handling. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Not all callers need (or want!) the mempool backing guarentee, it essentially means that you can only use bio_alloc() for short allocations and not for preallocating some bio's at setup or init time. So add bio_kmalloc() which does the same thing as bio_alloc(), except it just uses kmalloc() as the backing instead of the bio mempools. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Hugh Dickins authored
Two mods to blkdev_issue_discard(), thinking ahead to its use on swap: 1. Add gfp_mask argument, so swap allocation can use it where GFP_KERNEL might deadlock but GFP_NOIO is safe. 2. Enlarge nr_sects argument from unsigned to sector_t: unsigned long is enough to cover a whole swap area, but sector_t suits any partition. Change sb_issue_discard()'s nr_blocks to sector_t too; but no need seen for a gfp_mask there, just pass GFP_KERNEL down to blkdev_issue_discard(). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
FUJITA Tomonori authored
blk_rq_unmap_user in sg_finish_rem_req can take care of all the cases. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
FUJITA Tomonori authored
sg_read_xfer was used to copy data to user space for READ commands. blk_rq_unmap_user does the job so sg_read_xfer does nothing useful. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
FUJITA Tomonori authored
sg_write_xfer was used to copy data from user space for WRITE commands. blk_rq_map_user_iov and blk_rq_map_user do the job so sg_write_xfer does nothing useful. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
FUJITA Tomonori authored
Calling blk_rq_map_user() at a single place is better than at different two places. It makes the code more understandable. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
FUJITA Tomonori authored
__sg_start_req() was used temporarily to call blk_get_request() during converting sg to use the block layer. Now sg always calls blk_get_request() so we can move blk_get_request() to sg_start_req(). We don't need __sg_start_req anymore. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
FUJITA Tomonori authored
It's not used for anything useful after the block layer conversion. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
FUJITA Tomonori authored
sg had lots of the own functions for the direct IO but now sg uses the block layer functions for it. There are only five lines for the direct IO. SG_ALLOW_DIO_CODE define was used to compile out the direct IO code but we don't need the define. If someone wants to remove the direct IO code, he can do easily without the define. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
FUJITA Tomonori authored
old sg_rq_end_io() was used to wrap sg_cmd_done during converting sg to use the block layer (in order to cover the difference scsi_execute_async and blk_execute_rq_nowait). Now we don't need it so let's remove it. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Mike Anderson authored
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Mike Anderson authored
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Right now SCSI and others do their own command timeout handling. Move those bits to the block layer. Instead of having a timer per command, we try to be a bit more clever and simply have one per-queue. This avoids the overhead of having to tear down and setup a timer for each command, so it will result in a lot less timer fiddling. Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Andrew Patterson authored
We call flush_disk() to make sure the buffer cache for the disk is flushed after a disk resize. There are two resize cases, growing and shrinking. Given that users can shrink/then grow a disk before revalidate_disk() is called, we treat the grow case identically to shrinking. We need to flush the buffer cache after an online shrink because, as James Bottomley puts it, The two use cases for shrinking I can see are 1. planned: the fs is already shrunk to within the new boundaries and all data is relocated, so invalidate is fine (any dirty buffers that might exist in the shrunk region are there only because they were relocated but not yet written to their original location). 2. unplanned: In this case, the fs is probably toast, so whether we invalidate or not isn't going to make a whole lot of difference; it's still going to try to read or write from sectors beyond the new size and get I/O errors. Immediately invalidating shrunk disks will cause errors for outstanding I/Os for reads/write beyond the new end of the disk to be generated earlier then if we waited for the normal buffer cache operation. It also removes a potential security hole where we might keep old data around from beyond the end of the shrunk disk if the disk was not invalidated. Signed-off-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Andrew Patterson authored
We need to be able to flush the buffer cache for for more than just when a disk is changed, so we factor out common cache flush code in check_disk_change() to an internal flush_disk() routine. This routine will then be used for both disk changes and disk resizes (in a later patch). Include the disk name in the text indicating that there are busy inodes on the device and increase the KERN severity of the message. Signed-off-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Andrew Patterson authored
Modify the SCSI disk driver to call the revalidate_disk() wrapper. This allows us to do some housekeeping such as accounting for a disk being resized online. The wrapper will call sd_revalidate_disk() at the appropriate time. Signed-off-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Andrew Patterson authored
Check for device resize in the rescan_partitions() routine. If the device has been resized, the bdev size is set to match. The rescan_partitions() routine is called when opening the device and when calling the BLKRRPART ioctl. Signed-off-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Andrew Patterson authored
The revalidate_disk routine now checks if a disk has been resized by comparing the gendisk capacity to the bdev inode size. If they are different (usually because the disk has been resized underneath the kernel) the bdev inode size is adjusted to match the capacity. Signed-off-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Andrew Patterson authored
This is a wrapper for the lower-level revalidate_disk call-backs such as sd_revalidate_disk(). It allows us to perform pre and post operations when calling them. We will use this wrapper in a later patch to adjust block device sizes after an online resize (a _post_ operation). Signed-off-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-