- 17 Jul, 2020 3 commits
-
-
Coly Li authored
Currently REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL are defined as even numbers 6 and 8, such zone reset bios are treated as READ bios by bio_data_dir(), which is obviously misleading. The macro bio_data_dir() is defined in include/linux/bio.h as, 55 #define bio_data_dir(bio) \ 56 (op_is_write(bio_op(bio)) ? WRITE : READ) And op_is_write() is defined in include/linux/blk_types.h as, 397 static inline bool op_is_write(unsigned int op) 398 { 399 return (op & 1); 400 } The convention of op_is_write() is when there is data transfer then the op code should be odd number, and treat as a write op. bio_data_dir() treats all bio direction as READ if op_is_write() reports false, and WRITE if op_is_write() reports true. Because REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL are even numbers, although they don't transfer data but reporting them as READ bio by bio_data_dir() is misleading and might be wrong. Because these two commands will reset the writer pointers of the resetting zones, and all content after the reset write pointer will be invalid and unaccessible, obviously they are not READ bios in any means. This patch changes REQ_OP_ZONE_RESET from 6 to 15, and changes REQ_OP_ZONE_RESET_ALL from 8 to 17. Now bios with these two op code can be treated as WRITE by bio_data_dir(). Although they don't transfer data, now we keep them consistent with REQ_OP_DISCARD and REQ_OP_WRITE_ZEROES with the ituition that they change on-media content and should be WRITE request. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@fb.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Shaun Tancheff <shaun.tancheff@seagate.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Yufen Yu authored
Commit 7520872c ("block: don't defer flushes on blk-mq + scheduling") tried to fix deadlock for cycled wait between flush requests and data request into flush_data_in_flight. The former holded all driver tags and wait for data request completion, but the latter can not complete for waiting free driver tags. After commit 923218f6 ("blk-mq: don't allocate driver tag upfront for flush rq"), flush requests will not get driver tag before queuing into flush queue. * With elevator, flush request just get sched_tags before inserting flush queue. It will not get driver tag until issue them to driver. data request on list fq->flush_data_in_flight will complete in the end. * Without elevator, each flush request will get a driver tag when allocate request. Then data request on fq->flush_data_in_flight don't worry about lacking driver tag. In both of these cases, cycled wait cannot be true. So we may allow to defer flush request. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Wei Yongjun authored
The sparse tool complains as follows: block/blk-timeout.c:93:12: warning: symbol 'blk_timeout_init' was not declared. Should it be static? Function blk_timeout_init() is not used outside of blk-timeout.c, so mark it static. Fixes: 9054650f ("block: relax jiffies rounding for timeouts") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 Jul, 2020 6 commits
-
-
John Ogness authored
The reverse-order double lock dance in ioc_release_fn() is using a retry loop. This is a problem on PREEMPT_RT because it could preempt the task that would release q->queue_lock and thus live lock in the retry loop. RCU is already managing the freeing of the request queue and icq. If the trylock fails, use RCU to guarantee that the request queue and icq are not freed and re-acquire the locks in the correct order, allowing forward progress. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
John Ogness authored
The legacy CFQ IO scheduler could call put_io_context() in its exit_icq() elevator callback. This led to a lockdep warning, which was fixed in commit d8c66c5d ("block: fix lockdep warning on io_context release put_io_context()") by using a nested subclass for the ioc spinlock. However, with commit f382fb0b ("block: remove legacy IO schedulers") the CFQ IO scheduler no longer exists. The BFQ IO scheduler also implements the exit_icq() elevator callback but does not call put_io_context(). The nested subclass for the ioc spinlock is no longer needed. Since it existed as an exception and no longer applies, remove the nested subclass usage. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bd_start_claiming duplicates a lot of the work done in __blkdev_get. Integrate the two functions to avoid the duplicate work, and to do the right thing for the md -ERESTARTSYS corner case. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The arcane magic in bd_start_claiming is only needed to be able to claim a block_device that hasn't been fully set up. Switch the loop driver that claims from the ioctl path with a fully set up struct block_device to just use the much simpler bd_prepare_to_claim directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Move the locking and assignment of bd_claiming from bd_start_claiming to bd_prepare_to_claim. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Insted of duplicating all the cleanup logic jump to the code that cleans up anyway, and restart after that. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 15 Jul, 2020 3 commits
-
-
Jens Axboe authored
This reverts commit 826f2f48. Qian Cai reports that this commit causes stalls with swap. Revert until the reason can be figured out. Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
In theory, when GENHD_FL_NO_PART_SCAN is set, no partitions can be created on one disk. However, ioctl(BLKPG, BLKPG_ADD_PARTITION) doesn't check GENHD_FL_NO_PART_SCAN, so partitions still can be added even though GENHD_FL_NO_PART_SCAN is set. So far blk_drop_partitions() only removes partitions when disk_part_scan_enabled() return true. This way can make ghost partition on loop device after changing/clearing FD in case that PARTSCAN is disabled, such as partitions can be added via 'parted' on loop disk even though GENHD_FL_NO_PART_SCAN is set. Fix this issue by always removing partitions in blk_drop_partitions(), and this way is correct because the current code supposes that no partitions can be added in case of GENHD_FL_NO_PART_SCAN. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
In doing high IOPS testing, blk-mq is generally pretty well optimized. There are a few things that stuck out as using more CPU than what is really warranted, and one thing is the round_jiffies_up() that we do twice for each request. That accounts for about 0.8% of the CPU in my testing. We can make this cheaper by avoiding an integer division, by just adding a rough HZ mask that we can AND with instead. The timeouts are only on a second granularity already, we don't have to be that accurate here and this patch barely changes that. All we care about is nice grouping. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 Jul, 2020 2 commits
-
-
Baolin Wang authored
We've already validated the 'q->elevator' before calling ->ops.completed_request() in blk_mq_sched_completed_request(), thus no need to validate rq->internal_tag again. Rmove it. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Baolin Wang authored
Remove unnecessary local variable 'ret' in blk_mq_dispatch_hctx_list(). Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 08 Jul, 2020 12 commits
-
-
Christoph Hellwig authored
Except for pktdvd, the only places setting congested bits are file systems that allocate their own backing_dev_info structures. And pktdvd is a deprecated driver that isn't useful in stack setup either. So remove the dead congested_fn stacking infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [axboe: fixup unused variables in bcache/request.c] Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We never set any congested bits in the group writeback instances of it. And for the simpler bdi-wide case a simple scalar field is all that that is needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Just merge them into their only callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bdi_rw_congested returns congestion state, so calling it without looking at the return value doesn't make much sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The mmc driver doesn't support event notifications, which means that check_disk_change is a no-op. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The simdisk driver doesn't support event notifications, which means that check_disk_change is a no-op. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
check_disk_change isn't for consumers of the block layer, so remove the comment mentioning it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
flush_disk has only two callers, so open code it there. That also helps clarifying the error message for the particular case, and allows to remove setting bd_invalidated in check_disk_size_change, which will be cleared again instantly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
As well as the ->media_changed method. All these are left over from before the drivers were switched over to the check_events scheme. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
md is the last driver using the legacy media_changed method. Switch it over to (not so) new ->clear_events approach, which also removes the need for the ->revalidate_disk method. Signed-off-by: Christoph Hellwig <hch@lst.de> [axboe: remove unused 'bdops' variable in disk_clear_events()] Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Move .nr_active update and request assignment into blk_mq_get_driver_tag(), all are good to do during getting driver tag. Meantime blk-flush related code is simplified and flush request needn't to update the request table manually any more. Signed-off-by: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Current handling of q->mq_ops->queue_rq result is a bit ugly: - two branches which needs to 'continue' have to check if the dispatch local list is empty, otherwise one bad request may be retrieved via 'rq = list_first_entry(list, struct request, queuelist);' - the branch of 'if (unlikely(ret != BLK_STS_OK))' isn't easy to follow, since it is actually one error branch. Streamline this handling, so the code becomes more readable, meantime potential kernel oops can be avoided in case that the last request in local dispatch list is failed. Fixes: fc17b653 ("blk-mq: switch ->queue_rq return value to blk_status_t") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 07 Jul, 2020 1 commit
-
-
Christoph Hellwig authored
If blk_mq_submit_bio flushes the plug list, bios for other disks can show up on current->bio_list. As that doesn't involve any stacking of block device it is entirely harmless and we should not warn about this case. Fixes: ff93ea0c ("block: shortcut __submit_bio_noacct for blk-mq drivers") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 Jul, 2020 2 commits
-
-
Christoph Hellwig authored
bio_alloc_bioset references current->bio_list[1], so we need to initialize it for the blk-mq submission path as well. Fixes: ff93ea0c ("block: shortcut __submit_bio_noacct for blk-mq drivers") Reported-by: Qian Cai <cai@lca.pw> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This reverts commits the following commits: 37f4a24c 723bf178 36a3df5a The last one is the culprit, but we have to go a bit deeper to get this to revert cleanly. There's been a report that this breaks some MMC setups [1], and also causes an issue with swap [2]. Until this can be figured out, revert the offending commits. [1] https://lore.kernel.org/linux-block/57fb09b1-54ba-f3aa-f82c-d709b0e6b281@samsung.com/ [2] https://lore.kernel.org/linux-block/20200702043721.GA1087@lca.pw/Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 01 Jul, 2020 11 commits
-
-
Jens Axboe authored
Since merging the commit identified in Fixes below, we trigger this compile time warning: drivers/md/dm.c: In function ‘__map_bio’: drivers/md/dm.c:1296:24: warning: unused variable ‘md’ [-Wunused-variable] 1296 | struct mapped_device *md = io->md; | ^~ Remove the 'md' variable. Fixes: 5a6c35f9 ("block: remove direct_make_request") Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
John Garry authored
sbitmap works by maintaining separate bitmaps of set and cleared bits. The set bits are cleared in a batch, to save the burden of continuously locking the "word" map to unset. sbitmap_bitmap_show() only shows the set bits (in "word"), which is not too much use, so mask out the cleared bits. Fixes: ea86ea2c ("sbitmap: ammortize cost of clearing bits") Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Instead just iterate over the inodes for the block device superblock. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Just use bd_disk->queue instead. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We can trivially calculate the block size from the inodes i_blkbits variable. Use that instead of keeping two redundant copies of the information in slightly different formats. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The loop to increase the initial block size doesn't really make any sense, as the AND operation won't match for powers of two if it didn't for the initial block size. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bd_block_size contains a value that matches the logic block size when opening, so the statement is redundant. Even if it wasn't the dumb assignment would cause a a mismatch with bd_inode->i_blkbits. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Use the block_size helper instead of open coding it. Also remove the check for a 0 block size, as that can't happen. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Hongnan Li authored
ktime_to_ns(ktime_get()), which is expensive, does not need to be called if blk_iolatency_enabled() return false in blkcg_iolatency_done_bio(). Postponing ktime_to_ns(ktime_get()) execution reduces the CPU usage when blk_iolatency is disabled. Signed-off-by: Hongnan Li <hongnan.li@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Now that submit_bio_noacct has a decent blk-mq fast path there is no more need for this bypass. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-