- 27 Sep, 2018 2 commits
-
-
Omar Sandoval authored
Kyber will need this in a future change if it is built as a module. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Omar Sandoval authored
Commit 4bc6339a ("block: move blk_stat_add() to __blk_mq_end_request()") consolidated some calls using ktime_get() so we'd only need to call it once. Kyber's ->completed_request() hook also calls ktime_get(), so let's move it to the same place, too. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 26 Sep, 2018 13 commits
-
-
Bart Van Assche authored
Now that the blk-mq core processes power management requests (marked with RQF_PREEMPT) in other states than RPM_ACTIVE, enable runtime power management for blk-mq. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Instead of allowing requests that are not power management requests to enter the queue in runtime suspended status (RPM_SUSPENDED), make the blk_get_request() caller block. This change fixes a starvation issue: it is now guaranteed that power management requests will be executed no matter how many blk_get_request() callers are waiting. For blk-mq, instead of maintaining the q->nr_pending counter, rely on q->q_usage_counter. Call pm_runtime_mark_last_busy() every time a request finishes instead of only if the queue depth drops to zero. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
A later patch will call blk_freeze_queue_start() followed by blk_mq_unfreeze_queue() without waiting for q_usage_counter to drop to zero. Make sure that this doesn't cause a kernel warning to appear by switching from percpu_ref_reinit() to percpu_ref_resurrect(). The former namely requires that the refcount it operates on is zero. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
This function will be used in a later patch to switch the struct request_queue q_usage_counter from killed back to live. In contrast to percpu_ref_reinit(), this new function does not require that the refcount is zero. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Instead of scheduling runtime resume of a request queue after a request has been queued, schedule asynchronous resume during request allocation. The new pm_request_resume() calls occur after blk_queue_enter() has increased the q_usage_counter request queue member. This change is needed for a later patch that will make request allocation block while the queue status is not RPM_ACTIVE. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Move the pm_request_resume() and pm_runtime_mark_last_busy() calls into two new functions and thereby separate legacy block layer code from code that works for both the legacy block layer and blk-mq. A later patch will add calls to the new functions in the blk-mq code. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
The RQF_PREEMPT flag is used for three purposes: - In the SCSI core, for making sure that power management requests are executed even if a device is in the "quiesced" state. - For domain validation by SCSI drivers that use the parallel port. - In the IDE driver, for IDE preempt requests. Rename "preempt-only" into "pm-only" because the primary purpose of this mode is power management. Since the power management core may but does not have to resume a runtime suspended device before performing system-wide suspend and since a later patch will set "pm-only" mode as long as a block device is runtime suspended, make it possible to set "pm-only" mode from more than one context. Since with this change scsi_device_quiesce() is no longer idempotent, make that function return early if it is called for a quiesced queue. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Move the code for runtime power management from blk-core.c into the new source file blk-pm.c. Move the corresponding declarations from <linux/blkdev.h> into <linux/blk-pm.h>. For CONFIG_PM=n, leave out the declarations of the functions that are not used in that mode. This patch not only reduces the number of #ifdefs in the block layer core code but also reduces the size of header file <linux/blkdev.h> and hence should help to reduce the build time of the Linux kernel if CONFIG_PM is not defined. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Nothing Xen specific in these headers, which get included from a lot of code in the kernel. So prune the includes and move them to the Xen-specific files that actually use them instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Take the Xen check into the core code instead of delegating it to the architectures. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Having multiple externs in arch headers is not a good way to provide a common interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
BIOVEC_PHYS_MERGEABLE is only called from core block code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 24 Sep, 2018 9 commits
-
-
Christoph Hellwig authored
No need to pull in the BUG() defintion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Now that we don't need an override for BIOVEC_PHYS_MERGEABLE there is no need to drag this header in. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We only use it in biovec_phys_mergeable and a m68k paravirt driver, so just opencode it there. Also remove the pointless unsigned long cast for the offset in the opencoded instances. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
These two checks should always be performed together, so merge them into a single helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The actual recaculation of segments in __blk_recalc_rq_segments will do this check, so there is no point in forcing it if we know it won't succeed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Turn the macro into an inline, move it to blk.h and simplify the arch hooks a bit. Also rename the function to biovec_phys_mergeable as there is no need to shout. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
No need to expose these helpers outside the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Keep it close to the actual users instead of exposing the function to all drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
No need to expose these to drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 Sep, 2018 13 commits
-
-
Bart Van Assche authored
Make it easier to understand the purpose of the functions that iterate over requests by documenting their purpose. Fix several minor spelling and grammer mistakes in comments in these functions. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
blkg reference counting now uses percpu_ref rather than atomic_t. Let's make this consistent with css_tryget. This renames blkg_try_get to blkg_tryget and now returns a bool rather than the blkg or NULL. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
Now that every bio is associated with a blkg, this puts the use of blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over the refcnt in blkg to use percpu_ref. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
blk_get_rl is responsible for identifying which request_list a request should be allocated to. Try get logic was added earlier, but semantically the logic was not changed. This patch makes better use of the bio already having a reference to the blkg in the hot path. The cold path uses a better fallback of blkg_lookup_create rather than just blkg_lookup and then falling back to the q->root_rl. If lookup_create fails with anything but -ENODEV, it falls back to q->root_rl. A clarifying comment is added to explain why q->root_rl is used rather than the root blkg's rl. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
The previous patch in this series removed carrying around a pointer to the css in blkg. However, the blkg association logic still relied on taking a reference on the css to ensure we wouldn't fail in getting a reference for the blkg. Here the implicit dependency on the css is removed. The association continues to rely on the tryget logic walking up the blkg tree. This streamlines the three ways that association can happen: normal, swap, and writeback. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
Prior patches ensured that all bios are now associated with some blkg. This now makes bio->bi_css unnecessary as blkg maintains a reference to the blkcg already. This patch removes the field bi_css and transfers corresponding uses to access via bi_blkg. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
One of the goals of this series is to remove a separate reference to the css of the bio. This can and should be accessed via bio_blkcg. In this patch, the wbc_init_bio call is changed such that it must be called after a queue has been associated with the bio. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
A prior patch in this series added blkg association to bios issued by cgroups. There are two other paths that we want to attribute work back to the appropriate cgroup: swap and writeback. Here we modify the way swap tags bios to include the blkg. Writeback will be tackle in the next patch. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
bio_issue_init among other things initializes the timestamp for an IO. Rather than have this logic handled by policies, this consolidates it to be on the init paths (normal, clone, bounce clone). Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
Previously, blkg's were only assigned as needed by blk-iolatency and blk-throttle. bio->css was also always being associated while blkg was being looked up and then thrown away in blkcg_bio_issue_check. This patch begins the cleanup of bio->css and bio->bi_blkg by always associating a blkg in blkcg_bio_issue_check. This tries to create the blkg, but if it is not possible, falls back to using the root_blkg of the request_queue. Therefore, a bio will always be associated with a blkg. The duplicate association logic is removed from blk-throttle and blk-iolatency. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
There are several scenarios where blkg_lookup_create can fail. Examples include the blkcg dying, request_queue is dying, or simply being OOM. At the end of the day, most handle this by simply falling back to the q->root_blkg and calling it a day. This patch implements the notion of closest blkg. During blkg_lookup_create, if it fails to create, return the closest blkg found or the q->root_blkg. blkg_try_get_closest is introduced and used during association so a bio is always attached to a blkg. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
To know when to create a blkg, the general pattern is to do a blkg_lookup and if that fails, lock and then do a lookup again and if that fails finally create. It doesn't make much sense for everyone who wants to do creation to write this themselves. This changes blkg_lookup_create to do locking and implement this pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create. If a call site wants to do its own error handling or already owns the queue lock, they can use __blkg_lookup_create. This will be used in upcoming patches. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou (Facebook) authored
The accessor function bio_blkcg either returns the blkcg associated with the bio or finds one in the current context. This can cause an issue when trying to associate a bio with a blkcg. Particularly, it's the third case that is problematic: return css_to_blkcg(task_css(current, io_cgrp_id)); As the above may race against task migration and the cgroup exiting, it is not always ok to take a reference on the blkcg returned from bio_blkcg. This patch adds association ahead of calling bio_blkcg rather than after. This makes association a required and explicit step along the code paths for calling bio_blkcg. blk_get_rl is modified as well to get a reference to the blkcg it may use and blk_put_rl will always put the reference back. Association is also moved above the bio_blkcg call to ensure it will not return NULL in blk-iolatency. BFQ and CFQ utilize this flaw, but due to the complexity, I do not want to address this in this series. I've created a private version of the function with notes not to use it describing the flaw. Hopefully soon, that code can be cleaned up. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 Sep, 2018 1 commit
-
-
Liu Bo authored
As rbtree has native support of caching leftmost node, i.e. rb_root_cached, no need to do the caching by ourselves. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 Sep, 2018 1 commit
-
-
Christoph Hellwig authored
Replace a nasty hack with a different nasty hack to prepare for multipage bio_vecs. By moving the temporary page array as far up as possible in the space allocated for the bio_vec array we can iterate forward over it and thus use bio_add_page. Using bio_add_page means we'll be able to merge physically contiguous pages once support for multipath bio_vecs is merged. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 Sep, 2018 1 commit
-
-
Paolo Valente authored
To reduce latency for interactive and soft real-time applications, bfq privileges the bfq_queues containing the I/O of these applications. These privileged queues, referred-to as weight-raised queues, get a much higher share of the device throughput w.r.t. non-privileged queues. To preserve this higher share, the I/O of any non-weight-raised queue must be plugged whenever a sync weight-raised queue, while being served, remains temporarily empty. To attain this goal, bfq simply plugs any I/O (from any queue), if a sync weight-raised queue remains empty while in service. Unfortunately, this plugging typically lowers throughput with random I/O, on devices with internal queueing (because it reduces the filling level of the internal queues of the device). This commit addresses this issue by restricting the cases where plugging is performed: if a sync weight-raised queue remains empty while in service, then I/O plugging is performed only if some of the active bfq_queues are *not* weight-raised (which is actually the only circumstance where plugging is needed to preserve the higher share of the throughput of weight-raised queues). This restriction proved able to boost throughput in really many use cases needing only maximum throughput. Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-