Commit 9778369a authored by Paolo Valente, committed by Jens Axboe

block, bfq: split sync bfq_queues on a per-actuator basis

Single-LUN multi-actuator SCSI drives, as well as all multi-actuator
SATA drives, appear as a single device to the I/O subsystem [1]. Yet
they address commands to different actuators internally, as a function
of Logical Block Addresses (LBAs). A given sector is reachable by
only one of the actuators. For example, the SATA version of Seagate's
multi-actuator drive contains two actuators and maps the lower half of
the SATA LBA space to the lower actuator and the upper half to the
upper actuator.
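
As an illustration of the mapping described above, here is a minimal
sketch of an LBA-to-actuator lookup. It assumes that the LBA space is
split into equal, contiguous ranges, one per actuator, which matches
the two-actuator SATA example but is not guaranteed for every drive.
The helper lba_to_actuator() and its parameters are hypothetical and
are not part of this commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of this commit: map an LBA to an
 * actuator index, assuming the LBA space is split into num_actuators
 * equal, contiguous ranges (lower range -> actuator 0, and so on).
 */
static unsigned int lba_to_actuator(uint64_t lba, uint64_t capacity,
				    unsigned int num_actuators)
{
	uint64_t range;

	assert(num_actuators > 0);
	assert(capacity >= num_actuators);
	assert(lba < capacity);

	/* Round up so that num_actuators ranges cover the whole drive. */
	range = (capacity + num_actuators - 1) / num_actuators;

	return (unsigned int)(lba / range);
}
```

With a capacity of 1000 sectors and two actuators, sectors 0..499 map
to actuator 0 and sectors 500..999 to actuator 1.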

Evidently, to fully utilize actuators, no actuator must be left idle
or underutilized while there is pending I/O for it. The block layer
must therefore control the load of each actuator individually. This
commit lays the groundwork for BFQ to provide such per-actuator
control.

BFQ associates an I/O-request sync bfq_queue with each process doing
synchronous I/O, or with a group of processes, in case of queue
merging. Then BFQ serves one bfq_queue at a time. While in service, a
bfq_queue is emptied in request-position order. Yet the same process,
or group of processes, may generate I/O for different actuators. In
this case, different streams of I/O (each for a different actuator)
get all inserted into the same sync bfq_queue. So there is basically
no individual control on when each stream is served, i.e., on when the
I/O requests of the stream are picked from the bfq_queue and
dispatched to the drive.

This commit enables BFQ to control the service of each actuator
individually for synchronous I/O, by simply splitting each sync
bfq_queue into N queues, one for each actuator. In other words, a sync
bfq_queue is now associated with a (process, actuator) pair. As a
consequence of this split, the per-queue proportional-share policy
implemented by BFQ will guarantee that the sync I/O generated for each
actuator, by each process, receives its fair share of service.
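
The split can be pictured with a small user-space model of the
per-process queue table. This is only a sketch: the structures are
reduced stand-ins for the kernel ones, and the accessor bodies are an
assumption, since the bfq-iosched.c part of the diff is collapsed on
this page. Only the names bic_to_bfqq, bic_set_bfqq, bfqq,
actuator_idx and BFQ_MAX_ACTUATORS come from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BFQ_MAX_ACTUATORS 8

/* Reduced stand-in for the kernel structure; illustrative only. */
struct bfq_queue {
	unsigned int actuator_idx;
};

struct bfq_io_cq {
	/* Row 0: async queues, row 1: sync queues; one column per actuator. */
	struct bfq_queue *bfqq[2][BFQ_MAX_ACTUATORS];
};

/* Assumed accessor body: look up the queue for (is_sync, actuator). */
static struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync,
				     unsigned int actuator_idx)
{
	assert(actuator_idx < BFQ_MAX_ACTUATORS);
	return bic->bfqq[is_sync ? 1 : 0][actuator_idx];
}

/* Assumed accessor body: install the queue for (is_sync, actuator). */
static void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq,
			 bool is_sync, unsigned int actuator_idx)
{
	assert(actuator_idx < BFQ_MAX_ACTUATORS);
	bic->bfqq[is_sync ? 1 : 0][actuator_idx] = bfqq;
}
```

In this model a process doing sync I/O on two actuators owns two
distinct sync queues, bfqq[1][0] and bfqq[1][1], each of which can be
scheduled on its own.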

This is just a preparatory patch. If the I/O of the same process
happens to be sent to different queues, then each of these queues may
undergo queue merging. To handle this event, the bfq_io_cq data
structure must be properly extended. In addition, stable merging must
be disabled to avoid loss of control on individual actuators. Finally,
async queues must be split as well. These issues are described in detail
and addressed in the next commits. As for this commit, although multiple
per-process bfq_queues are provided, the I/O of each process or group
of processes is still sent to only one queue, regardless of the
actuator the I/O is for. The forwarding to distinct bfq_queues will be
enabled after addressing the above issues.

[1] https://www.linaro.org/blog/budget-fair-queueing-bfq-linux-io-scheduler-optimizations-for-multi-actuator-sata-hard-drives/

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Gabriele Felici <felicigb@gmail.com>
Signed-off-by: Carmine Zaccagnino <carmine@carminezacc.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20230103145503.71712-2-paolo.valente@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
parent 6d796c50
@@ -712,40 +712,20 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	bfq_put_queue(bfqq);
 }

-/**
- * __bfq_bic_change_cgroup - move @bic to @bfqg.
- * @bfqd: the queue descriptor.
- * @bic: the bic to move.
- * @bfqg: the group to move to.
- *
- * Move bic to blkcg, assuming that bfqd->lock is held; which makes
- * sure that the reference to cgroup is valid across the call (see
- * comments in bfq_bic_update_cgroup on this issue)
- */
-static void __bfq_bic_change_cgroup(struct bfq_data *bfqd,
+static void bfq_sync_bfqq_move(struct bfq_data *bfqd,
+			       struct bfq_queue *sync_bfqq,
 			       struct bfq_io_cq *bic,
-			       struct bfq_group *bfqg)
+			       struct bfq_group *bfqg,
+			       unsigned int act_idx)
 {
-	struct bfq_queue *async_bfqq = bic_to_bfqq(bic, false);
-	struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, true);
-	struct bfq_entity *entity;
-
-	if (async_bfqq) {
-		entity = &async_bfqq->entity;
-
-		if (entity->sched_data != &bfqg->sched_data) {
-			bic_set_bfqq(bic, NULL, false);
-			bfq_release_process_ref(bfqd, async_bfqq);
-		}
-	}
+	struct bfq_queue *bfqq;

-	if (sync_bfqq) {
 	if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
 		/* We are the only user of this bfqq, just move it */
 		if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
 			bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
-	} else {
-		struct bfq_queue *bfqq;
+		return;
+	}

 	/*
 	 * The queue was merged to a different queue. Check
@@ -753,26 +733,53 @@ static void __bfq_bic_change_cgroup(struct bfq_data *bfqd,
 	 * cgroup.
 	 */
 	for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
-		if (bfqq->entity.sched_data !=
-		    &bfqg->sched_data)
+		if (bfqq->entity.sched_data != &bfqg->sched_data)
 			break;
 	if (bfqq) {
 		/*
-		 * Some queue changed cgroup so the merge is
-		 * not valid anymore. We cannot easily just
-		 * cancel the merge (by clearing new_bfqq) as
-		 * there may be other processes using this
-		 * queue and holding refs to all queues below
-		 * sync_bfqq->new_bfqq. Similarly if the merge
-		 * already happened, we need to detach from
-		 * bfqq now so that we cannot merge bio to a
-		 * request from the old cgroup.
+		 * Some queue changed cgroup so the merge is not valid
+		 * anymore. We cannot easily just cancel the merge (by
+		 * clearing new_bfqq) as there may be other processes
+		 * using this queue and holding refs to all queues
+		 * below sync_bfqq->new_bfqq. Similarly if the merge
+		 * already happened, we need to detach from bfqq now
+		 * so that we cannot merge bio to a request from the
+		 * old cgroup.
 		 */
 		bfq_put_cooperator(sync_bfqq);
 		bfq_release_process_ref(bfqd, sync_bfqq);
-		bic_set_bfqq(bic, NULL, true);
+		bic_set_bfqq(bic, NULL, true, act_idx);
 	}
 }
+
+/**
+ * __bfq_bic_change_cgroup - move @bic to @bfqg.
+ * @bfqd: the queue descriptor.
+ * @bic: the bic to move.
+ * @bfqg: the group to move to.
+ *
+ * Move bic to blkcg, assuming that bfqd->lock is held; which makes
+ * sure that the reference to cgroup is valid across the call (see
+ * comments in bfq_bic_update_cgroup on this issue)
+ */
+static void __bfq_bic_change_cgroup(struct bfq_data *bfqd,
+				    struct bfq_io_cq *bic,
+				    struct bfq_group *bfqg)
+{
+	unsigned int act_idx;
+
+	for (act_idx = 0; act_idx < bfqd->num_actuators; act_idx++) {
+		struct bfq_queue *async_bfqq = bic_to_bfqq(bic, false, act_idx);
+		struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, true, act_idx);
+
+		if (async_bfqq &&
+		    async_bfqq->entity.sched_data != &bfqg->sched_data) {
+			bic_set_bfqq(bic, NULL, false, act_idx);
+			bfq_release_process_ref(bfqd, async_bfqq);
+		}
+
+		if (sync_bfqq)
+			bfq_sync_bfqq_move(bfqd, sync_bfqq, bic, bfqg, act_idx);
 	}
 }
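
The merge-chain check in bfq_sync_bfqq_move() above can be modelled in
isolation. The following is a minimal user-space sketch, not kernel
code: struct group, struct queue and chain_leaves_group() are
hypothetical names, and only the new_bfqq link mirrors a field used in
the diff. It shows the idea of walking the chain of merged queues and
reporting whether any of them is scheduled in a group other than the
target one, which is the condition under which the patch detaches the
process from its sync queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields needed for the walk exist. */
struct group {
	int id;
};

struct queue {
	const struct group *grp;	/* group the queue is scheduled in */
	struct queue *new_bfqq;		/* next queue in the merge chain, or NULL */
};

/*
 * Return true if any queue in the merge chain starting at q belongs
 * to a group other than target. An empty chain never leaves the group.
 */
static bool chain_leaves_group(const struct queue *q,
			       const struct group *target)
{
	assert(target != NULL);

	for (; q; q = q->new_bfqq)
		if (q->grp != target)
			return true;

	return false;
}
```

In the patch, a chain that leaves the group leads to
bfq_put_cooperator(), bfq_release_process_ref() and
bic_set_bfqq(bic, NULL, true, act_idx) for the affected actuator only.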
This diff is collapsed.
@@ -33,6 +33,14 @@
  */
 #define BFQ_SOFTRT_WEIGHT_FACTOR	100

+/*
+ * Maximum number of actuators supported. This constant is used simply
+ * to define the size of the static array that will contain
+ * per-actuator data. The current value is hopefully a good upper
+ * bound to the possible number of actuators of any actual drive.
+ */
+#define BFQ_MAX_ACTUATORS 8
+
 struct bfq_entity;

 /**
@@ -228,11 +236,13 @@ struct bfq_ttime {
  *
  * A bfq_queue is a leaf request queue; it can be associated with an
  * io_context or more, if it is async or shared between cooperating
- * processes. @cgroup holds a reference to the cgroup, to be sure that it
- * does not disappear while a bfqq still references it (mostly to avoid
- * races between request issuing and task migration followed by cgroup
- * destruction).
- * All the fields are protected by the queue lock of the containing bfqd.
+ * processes. Besides, it contains I/O requests for only one actuator
+ * (an io_context is associated with a different bfq_queue for each
+ * actuator it generates I/O for). @cgroup holds a reference to the
+ * cgroup, to be sure that it does not disappear while a bfqq still
+ * references it (mostly to avoid races between request issuing and
+ * task migration followed by cgroup destruction). All the fields are
+ * protected by the queue lock of the containing bfqd.
  */
 struct bfq_queue {
 	/* reference counter */
@@ -397,6 +407,9 @@ struct bfq_queue {
 	 * the woken queues when this queue exits.
 	 */
 	struct hlist_head woken_list;
+
+	/* index of the actuator this queue is associated with */
+	unsigned int actuator_idx;
 };

 /**
@@ -405,8 +418,17 @@ struct bfq_queue {
 struct bfq_io_cq {
 	/* associated io_cq structure */
 	struct io_cq icq; /* must be the first member */
-	/* array of two process queues, the sync and the async */
-	struct bfq_queue *bfqq[2];
+	/*
+	 * Matrix of associated process queues: first row for async
+	 * queues, second row sync queues. Each row contains one
+	 * column for each actuator. An I/O request generated by the
+	 * process is inserted into the queue pointed by bfqq[i][j] if
+	 * the request is to be served by the j-th actuator of the
+	 * drive, where i==0 or i==1, depending on whether the request
+	 * is async or sync. So there is a distinct queue for each
+	 * actuator.
+	 */
+	struct bfq_queue *bfqq[2][BFQ_MAX_ACTUATORS];
 	/* per (request_queue, blkcg) ioprio */
 	int ioprio;
 #ifdef CONFIG_BFQ_GROUP_IOSCHED
@@ -772,6 +794,13 @@ struct bfq_data {
 	 */
 	unsigned int word_depths[2][2];
 	unsigned int full_depth_shift;
+
+	/*
+	 * Number of independent actuators. This is equal to 1 in
+	 * case of single-actuator drives.
+	 */
+	unsigned int num_actuators;
+
 };

 enum bfqq_state_flags {
@@ -969,8 +998,10 @@ struct bfq_group {
 extern const int bfq_timeout;

-struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync);
-void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync);
+struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync,
+				unsigned int actuator_idx);
+void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync,
+				unsigned int actuator_idx);
 struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
 void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
 void bfq_weights_tree_add(struct bfq_queue *bfqq);