1. 20 Apr, 2017 22 commits
  2. 19 Apr, 2017 18 commits
    • block: Optimize ioprio_best() · 9a87182c
      Bart Van Assche authored
      Since ioprio_best() translates IOPRIO_CLASS_NONE into IOPRIO_CLASS_BE
      and since lower numerical priority values represent a higher priority,
      a simple numerical comparison is sufficient.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Adam Manzanares <adam.manzanares@wdc.com>
      Tested-by: Adam Manzanares <adam.manzanares@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
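The reasoning in the ioprio_best() message can be sketched in userspace C. This is only an illustration: the macros mirror the ioprio encoding (class in the bits above a 13-bit data field), and `sketch_ioprio_best` is a hypothetical name, not the kernel function itself.

```c
#include <stdint.h>

/* Userspace mirror of the ioprio encoding: class above a 13-bit data field. */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_PRIO_VALUE(cls, data) (((cls) << IOPRIO_CLASS_SHIFT) | (data))
#define IOPRIO_PRIO_CLASS(v) ((v) >> IOPRIO_CLASS_SHIFT)
#define IOPRIO_NORM 4

enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };

/* Hypothetical sketch: map "no class" to best-effort, then take the minimum. */
uint16_t sketch_ioprio_best(uint16_t a, uint16_t b)
{
	if (IOPRIO_PRIO_CLASS(a) == IOPRIO_CLASS_NONE)
		a = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
	if (IOPRIO_PRIO_CLASS(b) == IOPRIO_CLASS_NONE)
		b = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);

	/* A lower number means a higher priority, for the class and for the level. */
	return a < b ? a : b;
}
```

Because the class occupies the most significant bits, the class dominates the comparison and the level within the class only breaks ties.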
    • block: Inline blk_rq_set_prio() · 0be0dee6
      Bart Van Assche authored
      Since only a single caller remains, inline blk_rq_set_prio(). Initialize
      req->ioprio even if no I/O priority has been set in either the bio or
      the I/O context.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Adam Manzanares <adam.manzanares@wdc.com>
      Tested-by: Adam Manzanares <adam.manzanares@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
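A minimal userspace sketch of the priority selection described above. `struct sketch_ioc` and `sketch_pick_req_ioprio` are hypothetical names, and the final fallback value (class NONE, level 0) is an assumption: the commit message only says that req->ioprio is always initialized.

```c
#include <stddef.h>
#include <stdint.h>

#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_PRIO_VALUE(cls, data) (((cls) << IOPRIO_CLASS_SHIFT) | (data))
#define IOPRIO_PRIO_CLASS(v) ((v) >> IOPRIO_CLASS_SHIFT)

enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };

/* Hypothetical stand-in for the I/O context of the submitting task. */
struct sketch_ioc {
	uint16_t ioprio;
};

/* Pick the request priority: bio first, then I/O context, then a default. */
uint16_t sketch_pick_req_ioprio(uint16_t bio_prio, const struct sketch_ioc *ioc)
{
	if (IOPRIO_PRIO_CLASS(bio_prio) != IOPRIO_CLASS_NONE)
		return bio_prio;
	if (ioc)
		return ioc->ioprio;
	/* Assumed default, so the field is never left uninitialized. */
	return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
}
```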
    • lightnvm: Use blk_init_request_from_bio() instead of open-coding it · 9460e280
      Bart Van Assche authored
      This patch changes the behavior of the lightnvm driver as follows:
      * REQ_FAILFAST_MASK is set for read-ahead requests.
      * If no I/O priority has been set in the bio, the I/O priority is
        copied from the I/O context.
      * The rq_disk member is initialized if bio->bi_bdev != NULL.
      * The bio sector offset is copied into req->__sector instead of
        retaining the value -1 set by blk_mq_alloc_request().
      * req->errors is initialized to zero.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Matias Bjørling <m@bjorling.me>
      Cc: Adam Manzanares <adam.manzanares@wdc.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
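The five behavior changes listed for lightnvm can be modeled with a small mock. Every type, field, and flag value below is a hypothetical stand-in chosen for illustration, not a kernel definition, and treating a priority of 0 as "not set" is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define SK_REQ_RAHEAD        (1u << 0)
#define SK_REQ_FAILFAST_MASK (7u << 1)

struct sk_bio {
	unsigned int opf;   /* operation flags */
	uint16_t prio;      /* 0 is treated as "no I/O priority set" */
	const void *bdev;   /* stand-in for bio->bi_bdev */
	const void *disk;   /* disk reachable through bdev */
	uint64_t sector;    /* starting sector of the bio */
};

struct sk_req {
	unsigned int cmd_flags;
	uint16_t ioprio;
	const void *rq_disk;
	uint64_t sector;    /* stand-in for req->__sector */
	int errors;
};

/* Mock of the shared initialization path, one statement per listed change. */
void sk_init_request_from_bio(struct sk_req *req, const struct sk_bio *bio,
			      uint16_t ioc_prio)
{
	if (bio->opf & SK_REQ_RAHEAD)
		req->cmd_flags |= SK_REQ_FAILFAST_MASK; /* read-ahead fails fast */
	req->ioprio = bio->prio ? bio->prio : ioc_prio; /* fall back to I/O context */
	if (bio->bdev)
		req->rq_disk = bio->disk;
	req->sector = bio->sector;                      /* replaces the initial -1 */
	req->errors = 0;
}
```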
    • null_blk: Use blk_init_request_from_bio() instead of open-coding it · 2644a3cc
      Bart Van Assche authored
      This patch changes the behavior of the null_blk driver for the
      LightNVM mode as follows:
      * REQ_FAILFAST_MASK is set for read-ahead requests.
      * If no I/O priority has been set in the bio, the I/O priority is
        copied from the I/O context.
      * The rq_disk member is initialized if bio->bi_bdev != NULL.
      * req->errors is initialized to zero.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Matias Bjørling <m@bjorling.me>
      Cc: Adam Manzanares <adam.manzanares@wdc.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Export blk_init_request_from_bio() · da8d7f07
      Bart Van Assche authored
      Export this function so that it becomes available to block
      drivers.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Matias Bjørling <m@bjorling.me>
      Cc: Adam Manzanares <adam.manzanares@wdc.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • lightnvm: assume 64-bit lba numbers · ef697902
      Arnd Bergmann authored
      The driver uses both u64 and sector_t to refer to offsets, and assigns between the
      two. This causes one harmless warning when sector_t is 32-bit:
      
      drivers/lightnvm/pblk-rb.c: In function 'pblk_rb_write_entry_gc':
      include/linux/lightnvm.h:215:20: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
      drivers/lightnvm/pblk-rb.c:324:22: note: in expansion of macro 'ADDR_EMPTY'
      
      As the driver is already doing this inconsistently, changing the type
      won't make it worse and is an easy way to avoid the warning.
      
      Fixes: a4bd217b ("lightnvm: physical block device (pblk) target")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
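The truncation behind the warning can be reproduced in userspace. This sketch assumes that ADDR_EMPTY is an all-ones 64-bit constant; `SK_ADDR_EMPTY`, `sk_sector32_t`, and both functions are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Assumed to mirror ADDR_EMPTY: a sentinel with all 64 bits set. */
#define SK_ADDR_EMPTY (~0ULL)

/* Stand-in for sector_t on a configuration with 32-bit sectors. */
typedef uint32_t sk_sector32_t;

/* Returns nonzero if the sentinel survives storage in a 32-bit sector type. */
int sk_empty_survives_32(void)
{
	sk_sector32_t s = (sk_sector32_t)SK_ADDR_EMPTY; /* upper 32 bits are lost */
	return (uint64_t)s == SK_ADDR_EMPTY;
}

/* Returns nonzero if the sentinel survives storage in a 64-bit type. */
int sk_empty_survives_64(void)
{
	uint64_t s = SK_ADDR_EMPTY;
	return s == SK_ADDR_EMPTY;
}
```

Without the explicit cast, the 32-bit assignment is the kind of implicit narrowing of a large constant that the compiler reports above.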
    • block: make __blk_end_bidi_request private · d0fac025
      Christoph Hellwig authored
      blk_insert_flush should be using __blk_end_request to start with.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: remove blk_end_request_cur · fa1a15c0
      Christoph Hellwig authored
      This function is not used anywhere in the kernel.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: remove blk_end_request_err and __blk_end_request_err · 314fe91b
      Christoph Hellwig authored
      Both functions are entirely unused.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: remove the osdblk driver · 10081552
      Christoph Hellwig authored
      This was just a proof of concept user for the SCSI OSD library, and
      never had any real users.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Boaz Harrosh <ooo@electrozaur.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Make writeback throttling defaults consistent for SQ devices · 8330cdb0
      Jan Kara authored
      When CFQ is used as an elevator, it disables writeback throttling
      because they don't play well together. Later when a different elevator
      is chosen for the device, writeback throttling doesn't get enabled
      again as it should. Make sure CFQ enables writeback throttling (if it
      should be enabled by default) when we switch from it to another IO
      scheduler.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
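The intended state transitions can be sketched as a toy model. `struct sk_queue` and both hooks are hypothetical; the sketch only captures the rule stated above, not the kernel's elevator or wbt interfaces.

```c
/* Hypothetical per-device state, for illustration only. */
struct sk_queue {
	int wbt_enabled;      /* is writeback throttling currently active? */
	int wbt_default_on;   /* should it be enabled by default on this device? */
};

/* Switching to CFQ: the two do not play well together, so turn throttling off. */
void sk_cfq_init(struct sk_queue *q)
{
	q->wbt_enabled = 0;
}

/* Switching away from CFQ: restore the default instead of leaving it off. */
void sk_cfq_exit(struct sk_queue *q)
{
	if (q->wbt_default_on)
		q->wbt_enabled = 1;
}
```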
    • block, bfq: split bfq-iosched.c into multiple source files · ea25da48
      Paolo Valente authored
      The BFQ I/O scheduler features an optimal fair-queuing
      (proportional-share) scheduling algorithm, enriched with several
      mechanisms to boost throughput and reduce latency for interactive and
      real-time applications. This makes BFQ a large and complex piece of
      code. This commit addresses this issue by splitting BFQ into three
      main, independent components, and by moving each component into a
      separate source file:
      1. Main algorithm: handles the interaction with the kernel, and
      decides which requests to dispatch; it uses the following two further
      components to achieve its goals.
      2. Scheduling engine (Hierarchical B-WF2Q+ scheduling algorithm):
      computes the schedule, using weights and budgets provided by the above
      component.
      3. cgroups support: handles group operations (creation, destruction,
      move, ...).
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block, bfq: remove all get and put of I/O contexts · 6fa3e8d3
      Paolo Valente authored
      When a bfq queue is set in service and when it is merged, a reference
      to the I/O context associated with the queue is taken. This reference
      is then released when the queue is deselected from service or
      split. More precisely, the release of the reference is postponed to
      when the scheduler lock is released, to avoid nesting between the
      scheduler and the I/O-context lock. In fact, such nesting would lead
      to deadlocks, because of other code paths that take the same locks in
      the opposite order. This postponing of I/O-context releases does
      complicate code.
      
      This commit addresses this issue by modifying the involved operations
      so that they no longer need to take the above I/O-context references.
      It then also removes every get and release of these references.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block, bfq: handle bursts of queue activations · e1b2324d
      Arianna Avanzini authored
      Many popular I/O-intensive services or applications spawn or
      reactivate many parallel threads/processes during short time
      intervals. Examples are systemd during boot or git grep.  These
      services or applications benefit mostly from a high throughput: the
      quicker the I/O generated by their processes is cumulatively served,
      the sooner the target job of these services or applications gets
      completed. As a consequence, it is almost always counterproductive to
      weight-raise any of the queues associated with the processes of these
      services or applications: in most cases it would just lower the
      throughput, mainly because weight-raising also implies device idling.
      
      To address this issue, an I/O scheduler needs, first, to detect which
      queues are associated with these services or applications. From the
      I/O-scheduler standpoint, these services or applications cause bursts
      of activations, i.e., activations of different queues occurring
      shortly after each other. However, a shorter burst of activations may
      also be caused by the start of an application that does not consist of
      many parallel I/O-bound threads (see the comments on the function
      bfq_handle_burst for details).
      
      In view of these facts, this commit introduces:
      1) a heuristic to detect (only) bursts of queue activations caused by
         services or applications consisting of many parallel I/O-bound
         threads;
      2) the prevention of device idling and weight-raising for the queues
         belonging to these bursts.
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
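A much-simplified sketch of such a burst detector. The interval and threshold values, the state struct, and `sk_handle_burst` are all hypothetical; the heuristic in bfq_handle_burst is more elaborate (for example, this sketch does not retroactively flag the queues that were activated before the threshold was reached).

```c
#include <stdint.h>

/* Hypothetical tuning values, for illustration only. */
#define SK_BURST_INTERVAL_MS  180  /* max gap between activations in a burst */
#define SK_LARGE_BURST_THRESH 8    /* activations needed for a "large" burst */

struct sk_burst_state {
	uint64_t last_activation_ms;
	unsigned int burst_size;
	int large_burst;
};

/*
 * Record the activation of a new queue at time now_ms. Returns nonzero if
 * this queue belongs to a large burst, i.e. if it should get neither
 * weight-raising nor device idling.
 */
int sk_handle_burst(struct sk_burst_state *st, uint64_t now_ms)
{
	if (st->burst_size > 0 &&
	    now_ms - st->last_activation_ms > SK_BURST_INTERVAL_MS) {
		/* Too long since the last activation: the old burst is over. */
		st->burst_size = 0;
		st->large_burst = 0;
	}

	st->last_activation_ms = now_ms;
	st->burst_size++;

	/* Short bursts stay unflagged; only many close activations count. */
	if (st->burst_size >= SK_LARGE_BURST_THRESH)
		st->large_burst = 1;

	return st->large_burst;
}
```

The threshold is what separates the two cases in the message: an application start causes a few close activations, while a service with many parallel I/O-bound threads keeps the counter growing.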
    • block, bfq: boost the throughput with random I/O on NCQ-capable HDDs · e01eff01
      Paolo Valente authored
      This patch is basically the counterpart, for NCQ-capable rotational
      devices, of the previous patch. Exactly as the previous patch does on
      flash-based devices and for any workload, this patch disables device
      idling on rotational devices, but only for random I/O. In fact, on
      NCQ-capable rotational devices, disabling idling boosts the throughput
      only for queues doing random I/O. To not break service guarantees,
      idling is disabled for NCQ-enabled rotational devices only when the
      same symmetry conditions considered in the previous patches hold.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block, bfq: boost the throughput on NCQ-capable flash-based devices · bf2b79e7
      Paolo Valente authored
      This patch boosts the throughput on NCQ-capable flash-based devices,
      while still preserving latency guarantees for interactive and soft
      real-time applications. The throughput is boosted by just not idling
      the device when the in-service queue remains empty, even if the queue
      is sync and has a non-null idle window. This helps to keep the drive's
      internal queue full, which is necessary to achieve maximum
      performance. This solution to boost the throughput is a port of
      commits a68bbdd and f7d7b7a7 for CFQ.
      
      As already highlighted in a previous patch, allowing the device to
      prefetch and internally reorder requests trivially causes loss of
      control on the request service order, and hence on service guarantees.
      Fortunately, as discussed in detail in the comments on the function
      bfq_bfqq_may_idle(), if every process has to receive the same
      fraction of the throughput, then the service order enforced by the
      internal scheduler of a flash-based device is relatively close to that
      enforced by BFQ. In particular, it is close enough to let service
      guarantees be substantially preserved.
      
      Things change in an asymmetric scenario, i.e., if not every process
      has to receive the same fraction of the throughput. In this case, to
      guarantee the desired throughput distribution, the device must be
      prevented from prefetching requests. This is exactly what this patch
      does in asymmetric scenarios.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
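The idling decision described in this message can be sketched as a small predicate. The struct and `sk_should_idle` are hypothetical simplifications; the actual logic in bfq_bfqq_may_idle() considers more conditions than the four modeled here.

```c
/* Hypothetical inputs to the decision, for illustration only. */
struct sk_idle_ctx {
	int ncq_flash;        /* NCQ-capable, flash-based device */
	int symmetric;        /* every process is to get the same throughput share */
	int queue_sync;       /* the in-service queue does sync I/O */
	int has_idle_window;  /* the queue has a non-null idle window */
};

/* Returns nonzero if the device should be idled when the queue runs empty. */
int sk_should_idle(const struct sk_idle_ctx *c)
{
	if (!c->queue_sync || !c->has_idle_window)
		return 0;   /* idling is not useful for this queue anyway */

	if (c->ncq_flash && c->symmetric)
		return 0;   /* keep the drive's internal queue full */

	/* Asymmetric scenario or other device: idle to keep control of the order. */
	return 1;
}
```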
    • block, bfq: reduce idling only in symmetric scenarios · 1de0c4cd
      Arianna Avanzini authored
      A seeky queue (i.e., a queue containing random requests) is assigned a
      very small device-idling slice, for throughput reasons. Unfortunately,
      given the process associated with a seeky queue, this behavior causes
      the following problem: if the process, say P, performs sync I/O and
      has a higher weight than some other processes doing I/O and associated
      with non-seeky queues, then BFQ may fail to guarantee to P its
      reserved share of the throughput. The reason is that idling is key
      for providing service guarantees to processes doing sync I/O [1].
      
      This commit addresses this issue by allowing the device-idling slice
      to be reduced for a seeky queue only if the scenario happens to be
      symmetric, i.e., if all the queues are to receive the same share of
      the throughput.
      
      [1] P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O
          Scheduler", Proceedings of the First Workshop on Mobile System
          Technologies (MST-2015), May 2015.
          http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdf
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Riccardo Pizzetti <riccardo.pizzetti@gmail.com>
      Signed-off-by: Samuele Zecchini <samuele.zecchini92@gmail.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
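The rule introduced by this commit can be sketched as follows. The function name and the reduced slice value are hypothetical; only the condition (reduce the slice for a seeky queue only if the scenario is symmetric) comes from the message above.

```c
/*
 * Return the device-idling slice, in microseconds, to use for a queue.
 * The reduced value below is a hypothetical placeholder.
 */
unsigned int sk_idle_slice_us(unsigned int full_slice_us, int seeky,
			      int symmetric)
{
	const unsigned int reduced_slice_us = 2000;

	/* Only shrink the slice when no queue needs a larger throughput share. */
	if (seeky && symmetric && full_slice_us > reduced_slice_us)
		return reduced_slice_us;

	return full_slice_us;
}
```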
    • block, bfq: add Early Queue Merge (EQM) · 36eca894
      Arianna Avanzini authored
      A set of processes may happen to perform interleaved reads, i.e.,
      read requests whose union would give rise to a sequential read pattern.
      There are two typical cases: first, processes reading fixed-size chunks
      of data at a fixed distance from each other; second, processes reading
      variable-size chunks at variable distances. The latter case occurs for
      example with QEMU, which splits the I/O generated by a guest into
      multiple chunks, and lets these chunks be served by a pool of I/O
      threads, iteratively assigning the next chunk of I/O to the first
      available thread. CFQ denotes as 'cooperating' a set of processes that
      are doing interleaved I/O, and when it detects cooperating processes,
      it merges their queues to obtain a sequential I/O pattern from the union
      of their I/O requests, and hence boost the throughput.
      
      Unfortunately, in the following frequent case, the mechanism
      implemented in CFQ for detecting cooperating processes and merging
      their queues is not responsive enough to also handle the fluctuating
      I/O pattern of the second type of processes. Suppose that one process
      of the second type issues a request close to the next request to serve
      of another process of the same type. At that time the two processes
      would be considered as cooperating. But, if the request issued by the
      first process is to be merged with some other already-queued request,
      then, from the moment at which this request arrives, to the moment
      when CFQ checks whether the two processes are cooperating, the two
      processes are likely to be already doing I/O in distant zones of the
      disk surface or device memory.
      
      However, CFQ uses preemption to get a sequential read pattern out of
      the read requests performed by the second type of processes too. As a
      consequence, CFQ uses two different mechanisms to achieve the same
      goal: boosting the throughput with interleaved I/O.
      
      This patch introduces Early Queue Merge (EQM), a unified mechanism to
      get a sequential read pattern with both types of processes. The main
      idea is to immediately check whether a newly-arrived request lets some
      pair of processes become cooperating, both in the case of actual
      request insertion and, to be responsive with the second type of
      processes, in the case of request merge. Both types of processes are
      then handled by just merging their queues.
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
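The core check behind EQM can be sketched as below: as soon as a request arrives (or is merged), look for another queue whose next request is close to it. The closeness threshold, `struct sk_queue`, and both functions are hypothetical; the sketch leaves out the queue merge itself and every other condition the scheduler applies.

```c
#include <stdint.h>

/* Hypothetical closeness threshold, in sectors. */
#define SK_CLOSE_THR_SECTORS 8192ULL

struct sk_queue {
	uint64_t next_rq_sector;  /* position of the queue's next request */
};

/* Two positions are "close" if their distance is within the threshold. */
int sk_is_close(uint64_t a, uint64_t b)
{
	uint64_t dist = a > b ? a - b : b - a;

	return dist <= SK_CLOSE_THR_SECTORS;
}

/*
 * Called for a request of queue `self` at new_rq_sector, both on insertion
 * and on merge. Returns the index of a cooperating queue, or -1 if none.
 */
int sk_find_cooperator(const struct sk_queue *queues, int n, int self,
		       uint64_t new_rq_sector)
{
	int i;

	for (i = 0; i < n; i++) {
		if (i != self &&
		    sk_is_close(queues[i].next_rq_sector, new_rq_sector))
			return i;
	}
	return -1;
}
```

Running the check at merge time as well as at insertion time is what makes the mechanism responsive for the second type of processes: the positions are compared before the processes drift apart.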