- 28 Jun, 2016 3 commits
-
-
Jan Kara authored
Commit 9a7f38c4 (cfq-iosched: Convert from jiffies to nanoseconds) broke the condition for detecting starved sync IO in cfq_completed_request() because rq->start_time remained in jiffies but we compared it with nanosecond values. This manifested as a regression in bonnie++ rewrite performance because we always ended up considering sync IO starved and thus never increased async IO queue depth. Since rq->start_time is used in a lot of places, converting it to ns values would be non-trivial. So just revert the condition in CFQ to use comparison with jiffies. This will lead to suboptimal results if cfq_fifo_expire[1] will ever come close to 1 jiffie but so far we are relatively far from that with the storage used with CFQ (the default value is 128 ms). Fixes: 9a7f38c4Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jan Kara authored
slice_resid can be both positive and negative. Commit 9a7f38c4 (cfq-iosched: Convert from jiffies to nanoseconds) converted it from long to u64. Although this did not introduce any functional regression (the operations just overflow and the result was fine), it is certainly wrong and could cause issues in future. So convert the type to more appropriate s64. Fixes: 9a7f38c4Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jan Kara authored
Currently rq->fifo_time is unsigned long but CFQ stores nanosecond timestamp in it which would overflow on 32-bit archs. Convert it to u64 to avoid the overflow. Since the rq->fifo_time is unioned with struct call_single_data(), this does not change the size of struct request in any way. We have to slightly fixup block/deadline-iosched.c so that comparison happens in the right types. Fixes: 9a7f38c4Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 17 Jun, 2016 1 commit
-
-
Arnd Bergmann authored
The blktrace code stores the current time in a 32-bit word in its user interface. This is a bad idea because 32-bit seconds overflow at some point. We probably have until 2106 before this one overflows, as it seems to use an 'unsigned' variable, but we should confirm that user space treats it the same way. Aside from this, we want to stop using 'struct timespec' here, so I'm adding a comment about the overflow and change the code to use timespec64 instead to make the loss of range more obvious. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 14 Jun, 2016 3 commits
-
-
Bart Van Assche authored
Detected by sparse. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Bart Van Assche authored
This patch avoids that building with W=1 C=2 triggers the following warning: block/bio-integrity.c:35:6: warning: symbol 'blk_flush_integrity' was not declared. Should it be static? Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Bart Van Assche authored
A value is assigned to the variable 'info' but that value is never used. Hence remove the variable 'info'. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 10 Jun, 2016 1 commit
-
-
Ming Lei authored
No one need this macro now, so remove it. Basically only how many bvecs in one bio matters instead of how many bytes in this bio. The motivation is for supporting multipage bvecs, in which we only know what the max count of bvecs is supported in the bio. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 09 Jun, 2016 10 commits
-
-
Jens Axboe authored
If we're queuing REQ_PRIO IO and the task is running at an idle IO class, then temporarily boost the priority. This prevents livelocks due to priority inversion, when a low priority task is holding file system resources while attempting to do IO. An example of that is shown below. An ioniced idle task is holding the directory mutex, while a normal priority task is trying to do a directory lookup. [478381.198925] ------------[ cut here ]------------ [478381.200315] INFO: task ionice:1168369 blocked for more than 120 seconds. [478381.201324] Not tainted 4.0.9-38_fbk5_hotfix1_2936_g85409c6 #1 [478381.202278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [478381.203462] ionice D ffff8803692736a8 0 1168369 1 0x00000080 [478381.203466] ffff8803692736a8 ffff880399c21300 ffff880276adcc00 ffff880369273698 [478381.204589] ffff880369273fd8 0000000000000000 7fffffffffffffff 0000000000000002 [478381.205752] ffffffff8177d5e0 ffff8803692736c8 ffffffff8177cea7 0000000000000000 [478381.206874] Call Trace: [478381.207253] [<ffffffff8177d5e0>] ? bit_wait_io_timeout+0x80/0x80 [478381.208175] [<ffffffff8177cea7>] schedule+0x37/0x90 [478381.208932] [<ffffffff8177f5fc>] schedule_timeout+0x1dc/0x250 [478381.209805] [<ffffffff81421c17>] ? __blk_run_queue+0x37/0x50 [478381.210706] [<ffffffff810ca1c5>] ? ktime_get+0x45/0xb0 [478381.211489] [<ffffffff8177c407>] io_schedule_timeout+0xa7/0x110 [478381.212402] [<ffffffff810a8c2b>] ? prepare_to_wait+0x5b/0x90 [478381.213280] [<ffffffff8177d616>] bit_wait_io+0x36/0x50 [478381.214063] [<ffffffff8177d325>] __wait_on_bit+0x65/0x90 [478381.214961] [<ffffffff8177d5e0>] ? bit_wait_io_timeout+0x80/0x80 [478381.215872] [<ffffffff8177d47c>] out_of_line_wait_on_bit+0x7c/0x90 [478381.216806] [<ffffffff810a89f0>] ? wake_atomic_t_function+0x40/0x40 [478381.217773] [<ffffffff811f03aa>] __wait_on_buffer+0x2a/0x30 [478381.218641] [<ffffffff8123c557>] ext4_bread+0x57/0x70 [478381.219425] [<ffffffff8124498c>] __ext4_read_dirblock+0x3c/0x380 [478381.220467] [<ffffffff8124665d>] ext4_dx_find_entry+0x7d/0x170 [478381.221357] [<ffffffff8114c49e>] ? find_get_entry+0x1e/0xa0 [478381.222208] [<ffffffff81246bd4>] ext4_find_entry+0x484/0x510 [478381.223090] [<ffffffff812471a2>] ext4_lookup+0x52/0x160 [478381.223882] [<ffffffff811c401d>] lookup_real+0x1d/0x60 [478381.224675] [<ffffffff811c4698>] __lookup_hash+0x38/0x50 [478381.225697] [<ffffffff817745bd>] lookup_slow+0x45/0xab [478381.226941] [<ffffffff811c690e>] link_path_walk+0x7ae/0x820 [478381.227880] [<ffffffff811c6a42>] path_init+0xc2/0x430 [478381.228677] [<ffffffff813e6e26>] ? security_file_alloc+0x16/0x20 [478381.229776] [<ffffffff811c8c57>] path_openat+0x77/0x620 [478381.230767] [<ffffffff81185c6e>] ? page_add_file_rmap+0x2e/0x70 [478381.232019] [<ffffffff811cb253>] do_filp_open+0x43/0xa0 [478381.233016] [<ffffffff8108c4a9>] ? creds_are_invalid+0x29/0x70 [478381.234072] [<ffffffff811c0cb0>] do_open_execat+0x70/0x170 [478381.235039] [<ffffffff811c1bf8>] do_execveat_common.isra.36+0x1b8/0x6e0 [478381.236051] [<ffffffff811c214c>] do_execve+0x2c/0x30 [478381.236809] [<ffffffff811ca392>] ? getname+0x12/0x20 [478381.237564] [<ffffffff811c23be>] SyS_execve+0x2e/0x40 [478381.238338] [<ffffffff81780a1d>] stub_execve+0x6d/0xa0 [478381.239126] ------------[ cut here ]------------ [478381.239915] ------------[ cut here ]------------ [478381.240606] INFO: task python2.7:1168375 blocked for more than 120 seconds. [478381.242673] Not tainted 4.0.9-38_fbk5_hotfix1_2936_g85409c6 #1 [478381.243653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [478381.244902] python2.7 D ffff88005cf8fb98 0 1168375 1168248 0x00000080 [478381.244904] ffff88005cf8fb98 ffff88016c1f0980 ffffffff81c134c0 ffff88016c1f11a0 [478381.246023] ffff88005cf8ffd8 ffff880466cd0cbc ffff88016c1f0980 00000000ffffffff [478381.247138] ffff880466cd0cc0 ffff88005cf8fbb8 ffffffff8177cea7 ffff88005cf8fcc8 [478381.248252] Call Trace: [478381.248630] [<ffffffff8177cea7>] schedule+0x37/0x90 [478381.249382] [<ffffffff8177d08e>] schedule_preempt_disabled+0xe/0x10 [478381.250465] [<ffffffff8177e892>] __mutex_lock_slowpath+0x92/0x100 [478381.251409] [<ffffffff8177e91b>] mutex_lock+0x1b/0x2f [478381.252199] [<ffffffff817745ae>] lookup_slow+0x36/0xab [478381.253023] [<ffffffff811c690e>] link_path_walk+0x7ae/0x820 [478381.253877] [<ffffffff811aeb41>] ? try_charge+0xc1/0x700 [478381.254690] [<ffffffff811c6a42>] path_init+0xc2/0x430 [478381.255525] [<ffffffff813e6e26>] ? security_file_alloc+0x16/0x20 [478381.256450] [<ffffffff811c8c57>] path_openat+0x77/0x620 [478381.257256] [<ffffffff8115b2fb>] ? lru_cache_add_active_or_unevictable+0x2b/0xa0 [478381.258390] [<ffffffff8117b623>] ? handle_mm_fault+0x13f3/0x1720 [478381.259309] [<ffffffff811cb253>] do_filp_open+0x43/0xa0 [478381.260139] [<ffffffff811d7ae2>] ? __alloc_fd+0x42/0x120 [478381.260962] [<ffffffff811b95ac>] do_sys_open+0x13c/0x230 [478381.261779] [<ffffffff81011393>] ? syscall_trace_enter_phase1+0x113/0x170 [478381.262851] [<ffffffff811b96c2>] SyS_open+0x22/0x30 [478381.263598] [<ffffffff81780532>] system_call_fastpath+0x12/0x17 [478381.264551] ------------[ cut here ]------------ [478381.265377] ------------[ cut here ]------------ Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
-
Ming Lei authored
Use BIO_MAX_PAGES instead and we will remove BIO_MAX_SIZE. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
No one need this macro, so remove it. The motivation is for supporting multipage bvecs, in which we only know what the max count of bvecs is supported in the bio, instead of max size or max sectors. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
BIO_MAX_PAGES is used as maximum count of bvecs, so replace BIO_MAX_SECTORS with BIO_MAX_PAGES since BIO_MAX_SECTORS is to be removed. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
bvec has one native/mature iterator for long time, so not necessary to use the reinvented wheel for iterating bvecs in lib/iov_iter.c. Two ITER_BVEC test cases are run: - xfstest(-g auto) on loop dio/aio, no regression found - swap file works well under extreme stress(stress-ng --all 64 -t 800 -v), and lots of OOMs are triggerd, and the whole system still survives Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
bvec_iter_advance() only writes the parameter of iterator, so the base address of bvec can be marked as const safely. Without the change, we can see compiling warning in the following patch for implementing iterate_bvec(): lib/iov_iter.c with bvec iterator. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
This patch moves 'struct bio_vec' and 'struct bvec_iter' into 'include/linux/bvec.h', then always include this header into 'include/linux/blk_types.h'. With this change, both 'struct bvec_iter' and bvec iterator helpers don't depend on CONFIG_BLOCK any more, then we can use bvec iterator to implement iterate_bvec(): lib/iov_iter.c. Reviewed-by: Christoph Hellwig <hch@lst.de> Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
bvec iterator helpers should be used to implement by iterate_bvec():lib/iov_iter.c too, and move them into one header, so that we can keep bvec iterator header out of CONFIG_BLOCK. Then we can remove the reinventing of wheel in iterate_bvec(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
If ->queue_rq() returns BLK_MQ_RQ_QUEUE_OK, we use continue and skip over the rest of the loop body. However, dptr is assigned later in the loop body, and the BLK_MQ_RQ_QUEUE_OK case is exactly the case that we'd want it for. NVMe isn't actually using BLK_MQ_F_DEFER_ISSUE yet, nor is any other in-tree driver, but if the code's going to be there, it might as well work. Fixes: 74c45052 ("blk-mq: add a 'list' parameter to ->queue_rq()") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
Keep the 32-bit CPU and cmd_type flags together to avoid holes on 64-bit architectures. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 08 Jun, 2016 4 commits
-
-
Mike Christie authored
This was missed from my last patchset. This patch has ext4 crypto code use the bio op helper to set the operation. The operation (discard, write, writesame, etc) is now defined seperately from the other REQ bits. They still share the bi_rw field to save space, so we use these helpers so modules do not have to worry about setting/overwriting info. Jens, I am not sure how you handle patches on top of patches in the next branches. If you merge patches that fix issues in previous patches in next, then this patch could be part of commit 95fe6c1a Author: Mike Christie <mchristi@redhat.com> Date: Sun Jun 5 14:31:48 2016 -0500 block, fs, mm, drivers: use bio set/get op accessors Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jan Kara authored
Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jeff Moyer authored
Expose interfaces to tune time slices of CFQ IO scheduler in microseconds. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jeff Moyer authored
Convert all time-keeping in CFQ IO scheduler from jiffies to nanoseconds so that we can later make the intervals more fine-grained than jiffies. One jiffie is several miliseconds and even for today's rotating disks that is a noticeable amount of time and thus we leave disk unnecessarily idle. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 07 Jun, 2016 18 commits
-
-
Mike Christie authored
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
The last patch added a REQ_OP_FLUSH for request_fn drivers and the next patch renames REQ_FLUSH to REQ_PREFLUSH which will be used by file systems and make_request_fn drivers so they can send a write/flush combo. This patch drops xen's use of REQ_FLUSH to track if it supports REQ_OP_FLUSH requests, so REQ_FLUSH can be deleted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Juergen Gross <kernel@pfupf.net> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This adds a REQ_OP_FLUSH operation that is sent to request_fn based drivers by the block layer's flush code, instead of sending requests with the request->cmd_flags REQ_FLUSH bit set. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This patch drops the compat definition of req_op where it matches the rq_flag_bits definitions, and drops the related old and compat code that allowed users to set either the op or flags for the operation. We also then store the operation in the bi_rw/cmd_flags field similar to how we used to store the bio ioprio where it sat in the upper bits of the field. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
We don't need bi_rw to be so large on 64 bit archs, so reduce it to unsigned int. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
In the next patch, we move drop the compat code and make the op a separate value that is hidden in bi_rw. To give the op and rq bits flags room to grow this moves prio to its own field. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
The block layer will set the correct READ/WRITE operation flags/fields when creating a request, so there is not need for drivers to set the REQ_WRITE flag. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
Have blktrace use the req/bio op accessor to get the REQ_OP. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
The req operation REQ_OP is separated from the rq_flag_bits definition. This converts the block layer drivers to use req_op to get the op from the request struct. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This patch converts the is_sync helpers to use separate variables for the operation and flags. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This patch converts the block layer merging code to use separate variables for the operation and flags, and to check req_op for the REQ_OP. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
The bio and request operation and flags are going to be separate definitions, so we cannot pass them in as a bitmap. This patch converts the blkg_rwstat code and its caller, cfq, to pass in the values separately. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This patch converts the elevator code to use separate variables for the operation and flags, and to check req_op for the REQ_OP. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This patch modifies the blk mq request creation code to use separate variables for the operation and flags, because in the the next patches the struct request users will be converted like was done for bios. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This patch prepares *_get_request/*_put_request and freed_request, to use separate variables for the operation and flags. In the next patches the struct request users will be converted like was done for bios where the op and flags are set separately. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
The bio users should now always be setting up the bio op. This patch has the block layer copy that to the request. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
Separate the op from the rq_flag_bits and have xen set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
Separate the op from the rq_flag_bits and have the target layer set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-