Commit 71217df3 authored by Paolo Valente, committed by Jens Axboe

block, bfq: make waker-queue detection more robust

In the presence of many parallel I/O flows, the detection of waker
bfq_queues suffers from false positives. This commit addresses this
issue by making the filtering of actual wakers more selective. In more
detail, a candidate waker must be found to meet waker requirements
three times before being promoted to an actual waker.
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
parent 5a5436b9
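Before the diff, here is a minimal, self-contained sketch (not the kernel code) of the "three consecutive detections" filter this commit introduces: a candidate waker is promoted to waker_bfqq only after the same queue has been detected three times in a row, and switching to a different candidate restarts the count. The toy_queue struct and toy_check_waker() names are hypothetical stand-ins, and the sketch deliberately omits the think-time and 4 ms completion-recency filters that the real bfq_check_waker() also applies.

/* Illustrative sketch only -- simplified from the logic added by this commit. */
#include <stdio.h>

struct toy_queue {				/* hypothetical stand-in for bfq_queue */
	const char *name;
	struct toy_queue *waker;		/* confirmed waker (waker_bfqq) */
	struct toy_queue *tentative_waker;	/* current candidate (tentative_waker_bfqq) */
	unsigned int num_waker_detections;	/* consecutive detections of that candidate */
};

/* Called when 'q' turns non-empty right after 'last_completed' finished a request. */
static void toy_check_waker(struct toy_queue *q, struct toy_queue *last_completed)
{
	/* The real code also filters on think time and a 4 ms completion window. */
	if (!last_completed || last_completed == q || last_completed == q->waker)
		return;

	if (last_completed != q->tentative_waker) {
		/* First detection, or a different candidate: restart the count. */
		q->tentative_waker = last_completed;
		q->num_waker_detections = 1;
	} else {
		/* Same candidate seen again. */
		q->num_waker_detections++;
	}

	/* Promote only on the third consecutive detection. */
	if (q->num_waker_detections == 3) {
		q->waker = last_completed;
		q->tentative_waker = NULL;
	}
}

int main(void)
{
	struct toy_queue a = { .name = "A" }, b = { .name = "B" };

	for (int i = 1; i <= 3; i++) {
		toy_check_waker(&a, &b);
		printf("detection %d: waker of A = %s\n", i,
		       a.waker ? a.waker->name : "(none)");
	}
	return 0;
}

With the old scheme, two detections were enough to flag a waker; requiring a third consecutive hit on the same candidate is what the commit message describes as making the filtering of actual wakers more selective.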
block/bfq-iosched.c
@@ -158,7 +158,6 @@ BFQ_BFQQ_FNS(in_large_burst);
 BFQ_BFQQ_FNS(coop);
 BFQ_BFQQ_FNS(split_coop);
 BFQ_BFQQ_FNS(softrt_update);
-BFQ_BFQQ_FNS(has_waker);
 #undef BFQ_BFQQ_FNS \
 
 /* Expiration time of sync (0) and async (1) requests, in ns. */
@@ -1905,88 +1904,79 @@ static void bfq_update_io_intensity(struct bfq_queue *bfqq, u64 now_ns)
 	}
 }
 
-static void bfq_add_request(struct request *rq)
-{
-	struct bfq_queue *bfqq = RQ_BFQQ(rq);
-	struct bfq_data *bfqd = bfqq->bfqd;
-	struct request *next_rq, *prev;
-	unsigned int old_wr_coeff = bfqq->wr_coeff;
-	bool interactive = false;
-	u64 now_ns = ktime_get_ns();
-
-	bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
-	bfqq->queued[rq_is_sync(rq)]++;
-	bfqd->queued++;
-
-	if (RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_bfqq_sync(bfqq)) {
-		/*
-		 * Detect whether bfqq's I/O seems synchronized with
-		 * that of some other queue, i.e., whether bfqq, after
-		 * remaining empty, happens to receive new I/O only
-		 * right after some I/O request of the other queue has
-		 * been completed. We call waker queue the other
-		 * queue, and we assume, for simplicity, that bfqq may
-		 * have at most one waker queue.
-		 *
-		 * A remarkable throughput boost can be reached by
-		 * unconditionally injecting the I/O of the waker
-		 * queue, every time a new bfq_dispatch_request
-		 * happens to be invoked while I/O is being plugged
-		 * for bfqq. In addition to boosting throughput, this
-		 * unblocks bfqq's I/O, thereby improving bandwidth
-		 * and latency for bfqq. Note that these same results
-		 * may be achieved with the general injection
-		 * mechanism, but less effectively. For details on
-		 * this aspect, see the comments on the choice of the
-		 * queue for injection in bfq_select_queue().
-		 *
-		 * Turning back to the detection of a waker queue, a
-		 * queue Q is deemed as a waker queue for bfqq if, for
-		 * two consecutive times, bfqq happens to become non
-		 * empty right after a request of Q has been
-		 * completed. In particular, on the first time, Q is
-		 * tentatively set as a candidate waker queue, while
-		 * on the second time, the flag
-		 * bfq_bfqq_has_waker(bfqq) is set to confirm that Q
-		 * is a waker queue for bfqq. These detection steps
-		 * are performed only if bfqq has a long think time,
-		 * so as to make it more likely that bfqq's I/O is
-		 * actually being blocked by a synchronization. This
-		 * last filter, plus the above two-times requirement,
-		 * make false positives less likely.
-		 *
-		 * NOTE
-		 *
-		 * The sooner a waker queue is detected, the sooner
-		 * throughput can be boosted by injecting I/O from the
-		 * waker queue. Fortunately, detection is likely to be
-		 * actually fast, for the following reasons. While
-		 * blocked by synchronization, bfqq has a long think
-		 * time. This implies that bfqq's inject limit is at
-		 * least equal to 1 (see the comments in
-		 * bfq_update_inject_limit()). So, thanks to
-		 * injection, the waker queue is likely to be served
-		 * during the very first I/O-plugging time interval
-		 * for bfqq. This triggers the first step of the
-		 * detection mechanism. Thanks again to injection, the
-		 * candidate waker queue is then likely to be
-		 * confirmed no later than during the next
-		 * I/O-plugging interval for bfqq.
-		 */
-		if (bfqd->last_completed_rq_bfqq &&
-		    !bfq_bfqq_has_short_ttime(bfqq) &&
-		    now_ns - bfqd->last_completion <
-		    4 * NSEC_PER_MSEC) {
-			if (bfqd->last_completed_rq_bfqq != bfqq &&
-			    bfqd->last_completed_rq_bfqq !=
-			    bfqq->waker_bfqq) {
-				/*
-				 * First synchronization detected with
-				 * a candidate waker queue, or with a
-				 * different candidate waker queue
-				 * from the current one.
-				 */
-				bfqq->waker_bfqq = bfqd->last_completed_rq_bfqq;
-
-				/*
-				 * If the waker queue disappears, then
+/*
+ * Detect whether bfqq's I/O seems synchronized with that of some
+ * other queue, i.e., whether bfqq, after remaining empty, happens to
+ * receive new I/O only right after some I/O request of the other
+ * queue has been completed. We call waker queue the other queue, and
+ * we assume, for simplicity, that bfqq may have at most one waker
+ * queue.
+ *
+ * A remarkable throughput boost can be reached by unconditionally
+ * injecting the I/O of the waker queue, every time a new
+ * bfq_dispatch_request happens to be invoked while I/O is being
+ * plugged for bfqq. In addition to boosting throughput, this
+ * unblocks bfqq's I/O, thereby improving bandwidth and latency for
+ * bfqq. Note that these same results may be achieved with the general
+ * injection mechanism, but less effectively. For details on this
+ * aspect, see the comments on the choice of the queue for injection
+ * in bfq_select_queue().
+ *
+ * Turning back to the detection of a waker queue, a queue Q is deemed
+ * as a waker queue for bfqq if, for three consecutive times, bfqq
+ * happens to become non empty right after a request of Q has been
+ * completed. In particular, on the first time, Q is tentatively set
+ * as a candidate waker queue, while on the third consecutive time
+ * that Q is detected, the field waker_bfqq is set to Q, to confirm
+ * that Q is a waker queue for bfqq. These detection steps are
+ * performed only if bfqq has a long think time, so as to make it more
+ * likely that bfqq's I/O is actually being blocked by a
+ * synchronization. This last filter, plus the above three-times
+ * requirement, make false positives less likely.
+ *
+ * NOTE
+ *
+ * The sooner a waker queue is detected, the sooner throughput can be
+ * boosted by injecting I/O from the waker queue. Fortunately,
+ * detection is likely to be actually fast, for the following
+ * reasons. While blocked by synchronization, bfqq has a long think
+ * time. This implies that bfqq's inject limit is at least equal to 1
+ * (see the comments in bfq_update_inject_limit()). So, thanks to
+ * injection, the waker queue is likely to be served during the very
+ * first I/O-plugging time interval for bfqq. This triggers the first
+ * step of the detection mechanism. Thanks again to injection, the
+ * candidate waker queue is then likely to be confirmed no later than
+ * during the next I/O-plugging interval for bfqq.
+ *
+ * ISSUE
+ *
+ * On queue merging all waker information is lost.
+ */
+void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq, u64 now_ns)
+{
+	if (!bfqd->last_completed_rq_bfqq ||
+	    bfqd->last_completed_rq_bfqq == bfqq ||
+	    bfq_bfqq_has_short_ttime(bfqq) ||
+	    now_ns - bfqd->last_completion >= 4 * NSEC_PER_MSEC ||
+	    bfqd->last_completed_rq_bfqq == bfqq->waker_bfqq)
+		return;
+
+	if (bfqd->last_completed_rq_bfqq !=
+	    bfqq->tentative_waker_bfqq) {
+		/*
+		 * First synchronization detected with a
+		 * candidate waker queue, or with a different
+		 * candidate waker queue from the current one.
+		 */
+		bfqq->tentative_waker_bfqq =
+			bfqd->last_completed_rq_bfqq;
+		bfqq->num_waker_detections = 1;
+	} else /* Same tentative waker queue detected again */
+		bfqq->num_waker_detections++;
+
+	if (bfqq->num_waker_detections == 3) {
+		bfqq->waker_bfqq = bfqd->last_completed_rq_bfqq;
+		bfqq->tentative_waker_bfqq = NULL;
+
+		/*
+		 * If the waker queue disappears, then
@@ -2012,18 +2002,24 @@ static void bfq_add_request(struct request *rq)
 		hlist_del_init(&bfqq->woken_list_node);
 		hlist_add_head(&bfqq->woken_list_node,
 			       &bfqd->last_completed_rq_bfqq->woken_list);
-
-				bfq_clear_bfqq_has_waker(bfqq);
-			} else if (bfqd->last_completed_rq_bfqq ==
-				   bfqq->waker_bfqq &&
-				   !bfq_bfqq_has_waker(bfqq)) {
-				/*
-				 * synchronization with waker_bfqq
-				 * seen for the second time
-				 */
-				bfq_mark_bfqq_has_waker(bfqq);
-			}
-		}
+	}
+}
+
+static void bfq_add_request(struct request *rq)
+{
+	struct bfq_queue *bfqq = RQ_BFQQ(rq);
+	struct bfq_data *bfqd = bfqq->bfqd;
+	struct request *next_rq, *prev;
+	unsigned int old_wr_coeff = bfqq->wr_coeff;
+	bool interactive = false;
+	u64 now_ns = ktime_get_ns();
+
+	bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
+	bfqq->queued[rq_is_sync(rq)]++;
+	bfqd->queued++;
+
+	if (RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_bfqq_sync(bfqq)) {
+		bfq_check_waker(bfqd, bfqq, now_ns);
 
 		/*
 		 * Periodically reset inject limit, to make sure that
@@ -4569,7 +4565,7 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
 		    bfq_serv_to_charge(async_bfqq->next_rq, async_bfqq) <=
 		    bfq_bfqq_budget_left(async_bfqq))
 			bfqq = bfqq->bic->bfqq[0];
-		else if (bfq_bfqq_has_waker(bfqq) &&
+		else if (bfqq->waker_bfqq &&
 			 bfq_bfqq_busy(bfqq->waker_bfqq) &&
 			 bfqq->waker_bfqq->next_rq &&
 			 bfq_serv_to_charge(bfqq->waker_bfqq->next_rq,
@@ -4973,7 +4969,6 @@ void bfq_put_queue(struct bfq_queue *bfqq)
 	hlist_for_each_entry_safe(item, n, &bfqq->woken_list,
 				  woken_list_node) {
 		item->waker_bfqq = NULL;
-		bfq_clear_bfqq_has_waker(item);
 		hlist_del_init(&item->woken_list_node);
 	}
block/bfq-iosched.h
@@ -376,6 +376,11 @@ struct bfq_queue {
 	 * bfq_select_queue().
 	 */
 	struct bfq_queue *waker_bfqq;
+	/* pointer to the curr. tentative waker queue, see bfq_check_waker() */
+	struct bfq_queue *tentative_waker_bfqq;
+	/* number of times the same tentative waker has been detected */
+	unsigned int num_waker_detections;
+
 	/* node for woken_list, see below */
 	struct hlist_node woken_list_node;
 	/*
@@ -776,7 +781,6 @@ enum bfqq_state_flags {
 	 */
 	BFQQF_coop,		/* bfqq is shared */
 	BFQQF_split_coop,	/* shared bfqq will be split */
-	BFQQF_has_waker		/* bfqq has a waker queue */
 };
 
 #define BFQ_BFQQ_FNS(name) \
@@ -796,7 +800,6 @@ BFQ_BFQQ_FNS(in_large_burst);
 BFQ_BFQQ_FNS(coop);
 BFQ_BFQQ_FNS(split_coop);
 BFQ_BFQQ_FNS(softrt_update);
-BFQ_BFQQ_FNS(has_waker);
 #undef BFQ_BFQQ_FNS
 
 /* Expiration reasons. */