Commit 0e9da3fb authored by Linus Torvalds

Merge tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "This is the main pull request for block/storage for 4.21.

  Larger than usual, it was a busy round with lots of goodies queued up.
  Most notable is the removal of the old IO stack, which has been a long
  time coming. No new features for a while; everything coming in this
  week has been fixes for things that were previously merged.

  This contains:

   - Use atomic counters instead of semaphores for mtip32xx (Arnd)

   - Cleanup of the mtip32xx request setup (Christoph)

   - Fix for circular locking dependency in loop (Jan, Tetsuo)

   - bcache (Coly, Guoju, Shenghui)
      * Optimizations for writeback caching
      * Various fixes and improvements

   - nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
      * host and target support for NVMe over TCP
      * Error log page support
      * Support for separate read/write/poll queues
      * Much improved polling
      * discard OOM fallback
      * Tracepoint improvements

   - lightnvm (Hans, Hua, Igor, Matias, Javier)
      * Igor added packed metadata to pblk. Now drives without metadata
        per LBA can be used as well.
      * Fix from Geert on uninitialized value on chunk metadata reads.
      * Fixes from Hans and Javier to pblk recovery and write path.
      * Fix from Hua Su for a race condition in the pblk recovery
        code.
      * Scan optimization added to pblk recovery from Zhoujie.
      * Small geometry cleanup from me.

   - Conversion of the last few drivers that used the legacy path to
     blk-mq (me)

   - Removal of legacy IO path in SCSI (me, Christoph)

   - Removal of legacy IO stack and schedulers (me)

   - Support for much better polling, now without interrupts at all.
     blk-mq adds support for multiple queue maps, which enables us to
     have a map per type. This in turn enables nvme to have separate
     completion queues for polling, which can then be interrupt-less.
     Also means we're ready for async polled IO, which is hopefully
     coming in the next release (a brief sketch of the per-type queue
     maps follows this message).

   - Killing of (now) unused block exports (Christoph)

   - Unification of the blk-rq-qos and blk-wbt wait handling (Josef)

   - Support for zoned testing with null_blk (Masato)

   - sx8 conversion to per-host tag sets (Christoph)

   - IO priority improvements (Damien)

   - mq-deadline zoned fix (Damien)

   - Ref count blkcg series (Dennis)

   - Lots of blk-mq improvements and speedups (me)

   - sbitmap scalability improvements (me)

   - Make core inflight IO accounting per-cpu (Mikulas)

   - Export timeout setting in sysfs (Weiping)

   - Cleanup the direct issue path (Jianchao)

   - Export blk-wbt internals in block debugfs for easier debugging
     (Ming)

   - Lots of other fixes and improvements"
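
The "map per type" polling support mentioned above lets a blk-mq driver register separate default/read/poll queue maps, with the poll map backed by hardware queues that never raise interrupts. A rough sketch of what a ->map_queues callback could look like under that scheme follows; it is illustrative only (my_map_queues and my_queue_count are made-up names, and a real driver sizes the maps from its actual hardware resources):

    #include <linux/blk-mq.h>

    /* Illustrative split: how many hardware queues to give map 'type'
     * (HCTX_TYPE_DEFAULT, HCTX_TYPE_READ or HCTX_TYPE_POLL). A real driver
     * would derive this from its interrupt vectors and module parameters. */
    static unsigned int my_queue_count(unsigned int type)
    {
            return 1;
    }

    static int my_map_queues(struct blk_mq_tag_set *set)
    {
            unsigned int i, offset = 0;

            for (i = 0; i < set->nr_maps; i++) {
                    struct blk_mq_queue_map *map = &set->map[i];

                    /* Each map owns a contiguous range of hardware queues. */
                    map->nr_queues = my_queue_count(i);
                    map->queue_offset = offset;
                    offset += map->nr_queues;

                    /* Default CPU-to-queue spreading for this map. */
                    blk_mq_map_queues(map);
            }
            return 0;
    }

With a non-empty poll map, completions for polled IO are reaped by the submitting context through the poll path rather than being driven by an interrupt.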

* tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
  kyber: use sbitmap add_wait_queue/list_del wait helpers
  sbitmap: add helpers for add/del wait queue handling
  block: save irq state in blkg_lookup_create()
  dm: don't reuse bio for flushes
  nvme-pci: trace SQ status on completions
  nvme-rdma: implement polling queue map
  nvme-fabrics: allow user to pass in nr_poll_queues
  nvme-fabrics: allow nvmf_connect_io_queue to poll
  nvme-core: optionally poll sync commands
  block: make request_to_qc_t public
  nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
  nvme-tcp: fix endianess annotations
  nvmet-tcp: fix endianess annotations
  nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
  nvme-pci: only set nr_maps to 2 if poll queues are supported
  nvmet: use a macro for default error location
  nvmet: fix comparison of a u16 with -1
  blk-mq: enable IO poll if .nr_queues of type poll > 0
  blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
  blk-mq: skip zero-queue maps in blk_mq_map_swqueue
  ...
parents b12a9124 00203ba4
@@ -244,7 +244,7 @@ Description:
What:		/sys/block/<disk>/queue/zoned
Date:		September 2016
-Contact:	Damien Le Moal <damien.lemoal@hgst.com>
+Contact:	Damien Le Moal <damien.lemoal@wdc.com>
Description:
		zoned indicates if the device is a zoned block device
		and the zone model of the device if it is indeed zoned.
@@ -259,6 +259,14 @@ Description:
		zone commands, they will be treated as regular block
		devices and zoned will report "none".

+What:		/sys/block/<disk>/queue/nr_zones
+Date:		November 2018
+Contact:	Damien Le Moal <damien.lemoal@wdc.com>
+Description:
+		nr_zones indicates the total number of zones of a zoned block
+		device ("host-aware" or "host-managed" zone model). For regular
+		block devices, the value is always 0.
+
What:		/sys/block/<disk>/queue/chunk_sectors
Date:		September 2016
Contact:	Hannes Reinecke <hare@suse.com>
@@ -268,6 +276,6 @@ Description:
		indicates the size in 512B sectors of the RAID volume
		stripe segment. For a zoned block device, either
		host-aware or host-managed, chunk_sectors indicates the
-		size of 512B sectors of the zones of the device, with
+		size in 512B sectors of the zones of the device, with
		the eventual exception of the last zone of the device
		which may be smaller.
@@ -1879,8 +1879,10 @@ following two functions.
  wbc_init_bio(@wbc, @bio)
	Should be called for each bio carrying writeback data and
-	associates the bio with the inode's owner cgroup.  Can be
-	called anytime between bio allocation and submission.
+	associates the bio with the inode's owner cgroup and the
+	corresponding request queue.  This must be called after
+	a queue (device) has been associated with the bio and
+	before submission.

  wbc_account_io(@wbc, @page, @bytes)
	Should be called for each data segment being written out.
@@ -1899,7 +1901,7 @@ the configuration, the bio may be executed at a lower priority and if
the writeback session is holding shared resources, e.g. a journal
entry, may lead to priority inversion.  There is no one easy solution
for the problem.  Filesystems can try to work around specific problem
-cases by skipping wbc_init_bio() or using bio_associate_blkcg()
+cases by skipping wbc_init_bio() and using bio_associate_blkg()
directly.
...
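
The cgroup-v2 change above tightens when wbc_init_bio() may be called: the bio must already have its device (and therefore its request queue) set. A minimal sketch of the resulting call order in a filesystem writeback path, with assumed local names (my_submit_writeback_page is not a real kernel helper):

    #include <linux/bio.h>
    #include <linux/blkdev.h>
    #include <linux/writeback.h>

    static void my_submit_writeback_page(struct writeback_control *wbc,
                                         struct block_device *bdev,
                                         struct page *page, sector_t sector)
    {
            struct bio *bio = bio_alloc(GFP_NOFS, 1);

            bio_set_dev(bio, bdev);                 /* attach the device/queue first */
            bio->bi_iter.bi_sector = sector;
            bio->bi_opf = REQ_OP_WRITE;

            wbc_init_bio(wbc, bio);                 /* now legal: the queue is known */
            bio_add_page(bio, page, PAGE_SIZE, 0);
            wbc_account_io(wbc, page, PAGE_SIZE);   /* once per data segment */

            submit_bio(bio);
    }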
@@ -65,7 +65,6 @@ Description of Contents:
3.2.3 I/O completion
3.2.4 Implications for drivers that do not interpret bios (don't handle
 multiple segments)
-3.2.5 Request command tagging
3.3 I/O submission
4. The I/O scheduler
5. Scalability related changes
@@ -708,93 +707,6 @@ is crossed on completion of a transfer. (The end*request* functions should
be used if only if the request has come down from block/bio path, not for
direct access requests which only specify rq->buffer without a valid rq->bio)

-3.2.5 Generic request command tagging
-
-3.2.5.1 Tag helpers
-
-Block now offers some simple generic functionality to help support command
-queueing (typically known as tagged command queueing), ie manage more than
-one outstanding command on a queue at any given time.
-
-	blk_queue_init_tags(struct request_queue *q, int depth)
-
-	Initialize internal command tagging structures for a maximum
-	depth of 'depth'.
-
-	blk_queue_free_tags((struct request_queue *q)
-
-	Teardown tag info associated with the queue. This will be done
-	automatically by block if blk_queue_cleanup() is called on a queue
-	that is using tagging.
-
-The above are initialization and exit management, the main helpers during
-normal operations are:
-
-	blk_queue_start_tag(struct request_queue *q, struct request *rq)
-
-	Start tagged operation for this request. A free tag number between
-	0 and 'depth' is assigned to the request (rq->tag holds this number),
-	and 'rq' is added to the internal tag management. If the maximum depth
-	for this queue is already achieved (or if the tag wasn't started for
-	some other reason), 1 is returned. Otherwise 0 is returned.
-
-	blk_queue_end_tag(struct request_queue *q, struct request *rq)
-
-	End tagged operation on this request. 'rq' is removed from the internal
-	book keeping structures.
-
-To minimize struct request and queue overhead, the tag helpers utilize some
-of the same request members that are used for normal request queue management.
-This means that a request cannot both be an active tag and be on the queue
-list at the same time. blk_queue_start_tag() will remove the request, but
-the driver must remember to call blk_queue_end_tag() before signalling
-completion of the request to the block layer. This means ending tag
-operations before calling end_that_request_last()! For an example of a user
-of these helpers, see the IDE tagged command queueing support.
-
-3.2.5.2 Tag info
-
-Some block functions exist to query current tag status or to go from a
-tag number to the associated request. These are, in no particular order:
-
-	blk_queue_tagged(q)
-
-	Returns 1 if the queue 'q' is using tagging, 0 if not.
-
-	blk_queue_tag_request(q, tag)
-
-	Returns a pointer to the request associated with tag 'tag'.
-
-	blk_queue_tag_depth(q)
-
-	Return current queue depth.
-
-	blk_queue_tag_queue(q)
-
-	Returns 1 if the queue can accept a new queued command, 0 if we are
-	at the maximum depth already.
-
-	blk_queue_rq_tagged(rq)
-
-	Returns 1 if the request 'rq' is tagged.
-
-3.2.5.2 Internal structure
-
-Internally, block manages tags in the blk_queue_tag structure:
-
-	struct blk_queue_tag {
-		struct request **tag_index;	/* array or pointers to rq */
-		unsigned long *tag_map;		/* bitmap of free tags */
-		struct list_head busy_list;	/* fifo list of busy tags */
-		int busy;			/* queue depth */
-		int max_depth;			/* max queue depth */
-	};
-
-Most of the above is simple and straight forward, however busy_list may need
-a bit of explaining. Normally we don't care too much about request ordering,
-but in the event of any barrier requests in the tag queue we need to ensure
-that requests are restarted in the order they were queue.
-
3.3 I/O Submission
The routine submit_bio() is used to submit a single io. Higher level i/o
...
@@ -64,7 +64,7 @@ guess, the kernel will put the process issuing IO to sleep for an amount
of time, before entering a classic poll loop. This mode might be a
little slower than pure classic polling, but it will be more efficient.
If set to a value larger than 0, the kernel will put the process issuing
-IO to sleep for this amont of microseconds before entering classic
+IO to sleep for this amount of microseconds before entering classic
polling.

iostats (RW)
@@ -194,4 +194,31 @@ blk-throttle makes decision based on the samplings. Lower time means cgroups
have more smooth throughput, but higher CPU overhead. This exists only when
CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
+
+zoned (RO)
+----------
+This indicates if the device is a zoned block device and the zone model of the
+device if it is indeed zoned. The possible values indicated by zoned are
+"none" for regular block devices and "host-aware" or "host-managed" for zoned
+block devices. The characteristics of host-aware and host-managed zoned block
+devices are described in the ZBC (Zoned Block Commands) and ZAC
+(Zoned Device ATA Command Set) standards. These standards also define the
+"drive-managed" zone model. However, since drive-managed zoned block devices
+do not support zone commands, they will be treated as regular block devices
+and zoned will report "none".
+
+nr_zones (RO)
+-------------
+For zoned block devices (zoned attribute indicating "host-managed" or
+"host-aware"), this indicates the total number of zones of the device.
+This is always 0 for regular block devices.
+
+chunk_sectors (RO)
+------------------
+This has different meaning depending on the type of the block device.
+For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
+of the RAID volume stripe segment. For a zoned block device, either host-aware
+or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
+of the device, with the eventual exception of the last zone of the device which
+may be smaller.
+
Jens Axboe <jens.axboe@oracle.com>, February 2009
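
Both zoned and nr_zones described above are plain text attributes under /sys/block/<disk>/queue/. A small user-space example that reads them (the disk name "sda" is just a placeholder):

    #include <stdio.h>

    /* Print the zone model and zone count reported by the queue sysfs files. */
    static void print_zone_info(const char *disk)
    {
            char path[256], buf[64];
            FILE *f;

            snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", disk);
            f = fopen(path, "r");
            if (f) {
                    if (fgets(buf, sizeof(buf), f))
                            printf("%s zone model: %s", disk, buf);  /* none, host-aware or host-managed */
                    fclose(f);
            }

            snprintf(path, sizeof(path), "/sys/block/%s/queue/nr_zones", disk);
            f = fopen(path, "r");
            if (f) {
                    if (fgets(buf, sizeof(buf), f))
                            printf("%s nr_zones: %s", disk, buf);    /* 0 for regular block devices */
                    fclose(f);
            }
    }

    int main(void)
    {
            print_zone_info("sda");
            return 0;
    }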
@@ -97,11 +97,6 @@ parameters may be changed at runtime by the command
			allowing boot to proceed. none ignores them, expecting
			user space to do the scan.

-	scsi_mod.use_blk_mq=
-			[SCSI] use blk-mq I/O path by default
-			See SCSI_MQ_DEFAULT in drivers/scsi/Kconfig.
-			Format: <y/n>
-
	sim710=		[SCSI,HW]
			See header of drivers/scsi/sim710.c.
...
@@ -155,12 +155,6 @@ config BLK_CGROUP_IOLATENCY

	  Note, this is an experimental interface and could be changed someday.

-config BLK_WBT_SQ
-	bool "Single queue writeback throttling"
-	depends on BLK_WBT
-	---help---
-	  Enable writeback throttling by default on legacy single queue devices
-
config BLK_WBT_MQ
	bool "Multiqueue writeback throttling"
	default y
...
@@ -3,67 +3,6 @@ if BLOCK

menu "IO Schedulers"

-config IOSCHED_NOOP
-	bool
-	default y
-	---help---
-	  The no-op I/O scheduler is a minimal scheduler that does basic merging
-	  and sorting. Its main uses include non-disk based block devices like
-	  memory devices, and specialised software or hardware environments
-	  that do their own scheduling and require only minimal assistance from
-	  the kernel.
-
-config IOSCHED_DEADLINE
-	tristate "Deadline I/O scheduler"
-	default y
-	---help---
-	  The deadline I/O scheduler is simple and compact. It will provide
-	  CSCAN service with FIFO expiration of requests, switching to
-	  a new point in the service tree and doing a batch of IO from there
-	  in case of expiry.
-
-config IOSCHED_CFQ
-	tristate "CFQ I/O scheduler"
-	default y
-	---help---
-	  The CFQ I/O scheduler tries to distribute bandwidth equally
-	  among all processes in the system. It should provide a fair
-	  and low latency working environment, suitable for both desktop
-	  and server systems.
-
-	  This is the default I/O scheduler.
-
-config CFQ_GROUP_IOSCHED
-	bool "CFQ Group Scheduling support"
-	depends on IOSCHED_CFQ && BLK_CGROUP
-	---help---
-	  Enable group IO scheduling in CFQ.
-
-choice
-	prompt "Default I/O scheduler"
-	default DEFAULT_CFQ
-	help
-	  Select the I/O scheduler which will be used by default for all
-	  block devices.
-
-	config DEFAULT_DEADLINE
-		bool "Deadline" if IOSCHED_DEADLINE=y
-
-	config DEFAULT_CFQ
-		bool "CFQ" if IOSCHED_CFQ=y
-
-	config DEFAULT_NOOP
-		bool "No-op"
-
-endchoice
-
-config DEFAULT_IOSCHED
-	string
-	default "deadline" if DEFAULT_DEADLINE
-	default "cfq" if DEFAULT_CFQ
-	default "noop" if DEFAULT_NOOP
-
config MQ_IOSCHED_DEADLINE
	tristate "MQ deadline I/O scheduler"
	default y
...
@@ -3,7 +3,7 @@
# Makefile for the kernel block layer
#

-obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
+obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
			blk-flush.o blk-settings.o blk-ioc.o blk-map.o \
			blk-exec.o blk-merge.o blk-softirq.o blk-timeout.o \
			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
@@ -18,9 +18,6 @@ obj-$(CONFIG_BLK_DEV_BSGLIB) += bsg-lib.o
obj-$(CONFIG_BLK_CGROUP) += blk-cgroup.o
obj-$(CONFIG_BLK_DEV_THROTTLING) += blk-throttle.o
obj-$(CONFIG_BLK_CGROUP_IOLATENCY) += blk-iolatency.o
-obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
-obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
-obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
obj-$(CONFIG_MQ_IOSCHED_DEADLINE) += mq-deadline.o
obj-$(CONFIG_MQ_IOSCHED_KYBER) += kyber-iosched.o
bfq-y := bfq-iosched.o bfq-wf2q.o bfq-cgroup.o
...
@@ -334,7 +334,7 @@ static void bfqg_stats_xfer_dead(struct bfq_group *bfqg)

	parent = bfqg_parent(bfqg);

-	lockdep_assert_held(bfqg_to_blkg(bfqg)->q->queue_lock);
+	lockdep_assert_held(&bfqg_to_blkg(bfqg)->q->queue_lock);

	if (unlikely(!parent))
		return;
@@ -642,7 +642,7 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
	uint64_t serial_nr;

	rcu_read_lock();
-	serial_nr = bio_blkcg(bio)->css.serial_nr;
+	serial_nr = __bio_blkcg(bio)->css.serial_nr;

	/*
	 * Check whether blkcg has changed. The condition may trigger
@@ -651,7 +651,7 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
	if (unlikely(!bfqd) || likely(bic->blkcg_serial_nr == serial_nr))
		goto out;

-	bfqg = __bfq_bic_change_cgroup(bfqd, bic, bio_blkcg(bio));
+	bfqg = __bfq_bic_change_cgroup(bfqd, bic, __bio_blkcg(bio));
	/*
	 * Update blkg_path for bfq_log_* functions. We cache this
	 * path, and update it here, for the following
...
@@ -399,9 +399,9 @@ static struct bfq_io_cq *bfq_bic_lookup(struct bfq_data *bfqd,
		unsigned long flags;
		struct bfq_io_cq *icq;

-		spin_lock_irqsave(q->queue_lock, flags);
+		spin_lock_irqsave(&q->queue_lock, flags);
		icq = icq_to_bic(ioc_lookup_icq(ioc, q));
-		spin_unlock_irqrestore(q->queue_lock, flags);
+		spin_unlock_irqrestore(&q->queue_lock, flags);

		return icq;
	}
@@ -4066,7 +4066,7 @@ static void bfq_update_dispatch_stats(struct request_queue *q,
	 * In addition, the following queue lock guarantees that
	 * bfqq_group(bfqq) exists as well.
	 */
-	spin_lock_irq(q->queue_lock);
+	spin_lock_irq(&q->queue_lock);
	if (idle_timer_disabled)
		/*
		 * Since the idle timer has been disabled,
@@ -4085,7 +4085,7 @@ static void bfq_update_dispatch_stats(struct request_queue *q,
		bfqg_stats_set_start_empty_time(bfqg);
		bfqg_stats_update_io_remove(bfqg, rq->cmd_flags);
	}
-	spin_unlock_irq(q->queue_lock);
+	spin_unlock_irq(&q->queue_lock);
}
#else
static inline void bfq_update_dispatch_stats(struct request_queue *q,
@@ -4416,7 +4416,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
	rcu_read_lock();

-	bfqg = bfq_find_set_group(bfqd, bio_blkcg(bio));
+	bfqg = bfq_find_set_group(bfqd, __bio_blkcg(bio));
	if (!bfqg) {
		bfqq = &bfqd->oom_bfqq;
		goto out;
@@ -4669,11 +4669,11 @@ static void bfq_update_insert_stats(struct request_queue *q,
	 * In addition, the following queue lock guarantees that
	 * bfqq_group(bfqq) exists as well.
	 */
-	spin_lock_irq(q->queue_lock);
+	spin_lock_irq(&q->queue_lock);
	bfqg_stats_update_io_add(bfqq_group(bfqq), bfqq, cmd_flags);
	if (idle_timer_disabled)
		bfqg_stats_update_idle_time(bfqq_group(bfqq));
-	spin_unlock_irq(q->queue_lock);
+	spin_unlock_irq(&q->queue_lock);
}
#else
static inline void bfq_update_insert_stats(struct request_queue *q,
@@ -5414,9 +5414,9 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
	}
	eq->elevator_data = bfqd;

-	spin_lock_irq(q->queue_lock);
+	spin_lock_irq(&q->queue_lock);
	q->elevator = eq;
-	spin_unlock_irq(q->queue_lock);
+	spin_unlock_irq(&q->queue_lock);

	/*
	 * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
@@ -5756,7 +5756,7 @@ static struct elv_fs_entry bfq_attrs[] = {
};

static struct elevator_type iosched_bfq_mq = {
-	.ops.mq = {
+	.ops = {
		.limit_depth = bfq_limit_depth,
		.prepare_request = bfq_prepare_request,
		.requeue_request = bfq_finish_requeue_request,
@@ -5777,7 +5777,6 @@ static struct elevator_type iosched_bfq_mq = {
		.exit_sched = bfq_exit_queue,
	},

-	.uses_mq = true,
	.icq_size = sizeof(struct bfq_io_cq),
	.icq_align = __alignof__(struct bfq_io_cq),
	.elevator_attrs = bfq_attrs,
...
@@ -390,7 +390,6 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
	bip->bip_iter.bi_sector += bytes_done >> 9;
	bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}
-EXPORT_SYMBOL(bio_integrity_advance);

/**
 * bio_integrity_trim - Trim integrity vector
@@ -460,7 +459,6 @@ void bioset_integrity_free(struct bio_set *bs)
	mempool_exit(&bs->bio_integrity_pool);
	mempool_exit(&bs->bvec_integrity_pool);
}
-EXPORT_SYMBOL(bioset_integrity_free);

void __init bio_integrity_init(void)
{
...
@@ -244,7 +244,7 @@ struct bio_vec *bvec_alloc(gfp_t gfp_mask, int nr, unsigned long *idx,
void bio_uninit(struct bio *bio)
{
-	bio_disassociate_task(bio);
+	bio_disassociate_blkg(bio);
}
EXPORT_SYMBOL(bio_uninit);
@@ -571,14 +571,13 @@ void bio_put(struct bio *bio)
}
EXPORT_SYMBOL(bio_put);

-inline int bio_phys_segments(struct request_queue *q, struct bio *bio)
+int bio_phys_segments(struct request_queue *q, struct bio *bio)
{
	if (unlikely(!bio_flagged(bio, BIO_SEG_VALID)))
		blk_recount_segments(q, bio);
	return bio->bi_phys_segments;
}
-EXPORT_SYMBOL(bio_phys_segments);

/**
 * __bio_clone_fast - clone a bio that shares the original bio's biovec
@@ -610,7 +609,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
	bio->bi_iter = bio_src->bi_iter;
	bio->bi_io_vec = bio_src->bi_io_vec;

-	bio_clone_blkcg_association(bio, bio_src);
+	bio_clone_blkg_association(bio, bio_src);
+	blkcg_bio_issue_init(bio);
}
EXPORT_SYMBOL(__bio_clone_fast);
@@ -901,7 +901,6 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)

	return 0;
}
-EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);

static void submit_bio_wait_endio(struct bio *bio)
{
@@ -1592,7 +1591,6 @@ void bio_set_pages_dirty(struct bio *bio)
			set_page_dirty_lock(bvec->bv_page);
	}
}
-EXPORT_SYMBOL_GPL(bio_set_pages_dirty);

static void bio_release_pages(struct bio *bio)
{
@@ -1662,17 +1660,33 @@ void bio_check_pages_dirty(struct bio *bio)
	spin_unlock_irqrestore(&bio_dirty_lock, flags);
	schedule_work(&bio_dirty_work);
}
-EXPORT_SYMBOL_GPL(bio_check_pages_dirty);
+
+void update_io_ticks(struct hd_struct *part, unsigned long now)
+{
+	unsigned long stamp;
+again:
+	stamp = READ_ONCE(part->stamp);
+	if (unlikely(stamp != now)) {
+		if (likely(cmpxchg(&part->stamp, stamp, now) == stamp)) {
+			__part_stat_add(part, io_ticks, 1);
+		}
+	}
+	if (part->partno) {
+		part = &part_to_disk(part)->part0;
+		goto again;
+	}
+}

void generic_start_io_acct(struct request_queue *q, int op,
			   unsigned long sectors, struct hd_struct *part)
{
	const int sgrp = op_stat_group(op);
-	int cpu = part_stat_lock();
-	part_round_stats(q, cpu, part);
-	part_stat_inc(cpu, part, ios[sgrp]);
-	part_stat_add(cpu, part, sectors[sgrp], sectors);
+	part_stat_lock();
+
+	update_io_ticks(part, jiffies);
+	part_stat_inc(part, ios[sgrp]);
+	part_stat_add(part, sectors[sgrp], sectors);
	part_inc_in_flight(q, part, op_is_write(op));
	part_stat_unlock();
@@ -1682,12 +1696,15 @@ EXPORT_SYMBOL(generic_start_io_acct);
void generic_end_io_acct(struct request_queue *q, int req_op,
			 struct hd_struct *part, unsigned long start_time)
{
-	unsigned long duration = jiffies - start_time;
+	unsigned long now = jiffies;
+	unsigned long duration = now - start_time;
	const int sgrp = op_stat_group(req_op);
-	int cpu = part_stat_lock();
-	part_stat_add(cpu, part, nsecs[sgrp], jiffies_to_nsecs(duration));
-	part_round_stats(q, cpu, part);
+	part_stat_lock();
+
+	update_io_ticks(part, now);
+	part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
+	part_stat_add(part, time_in_queue, duration);
	part_dec_in_flight(q, part, op_is_write(req_op));
	part_stat_unlock();
@@ -1957,102 +1974,133 @@ EXPORT_SYMBOL(bioset_init_from_src);

#ifdef CONFIG_BLK_CGROUP

-#ifdef CONFIG_MEMCG
/**
- * bio_associate_blkcg_from_page - associate a bio with the page's blkcg
+ * bio_disassociate_blkg - puts back the blkg reference if associated
 * @bio: target bio
- * @page: the page to lookup the blkcg from
 *
- * Associate @bio with the blkcg from @page's owning memcg. This works like
- * every other associate function wrt references.
+ * Helper to disassociate the blkg from @bio if a blkg is associated.
 */
-int bio_associate_blkcg_from_page(struct bio *bio, struct page *page)
+void bio_disassociate_blkg(struct bio *bio)
{
-	struct cgroup_subsys_state *blkcg_css;
-
-	if (unlikely(bio->bi_css))
-		return -EBUSY;
-	if (!page->mem_cgroup)
-		return 0;
-	blkcg_css = cgroup_get_e_css(page->mem_cgroup->css.cgroup,
-				     &io_cgrp_subsys);
-	bio->bi_css = blkcg_css;
-	return 0;
+	if (bio->bi_blkg) {
+		blkg_put(bio->bi_blkg);
+		bio->bi_blkg = NULL;
+	}
}
-#endif /* CONFIG_MEMCG */
+EXPORT_SYMBOL_GPL(bio_disassociate_blkg);

/**
- * bio_associate_blkcg - associate a bio with the specified blkcg
+ * __bio_associate_blkg - associate a bio with the a blkg
 * @bio: target bio
- * @blkcg_css: css of the blkcg to associate
+ * @blkg: the blkg to associate
 *
- * Associate @bio with the blkcg specified by @blkcg_css. Block layer will
- * treat @bio as if it were issued by a task which belongs to the blkcg.
+ * This tries to associate @bio with the specified @blkg. Association failure
+ * is handled by walking up the blkg tree. Therefore, the blkg associated can
+ * be anything between @blkg and the root_blkg. This situation only happens
+ * when a cgroup is dying and then the remaining bios will spill to the closest
+ * alive blkg.
 *
- * This function takes an extra reference of @blkcg_css which will be put
- * when @bio is released. The caller must own @bio and is responsible for
- * synchronizing calls to this function.
+ * A reference will be taken on the @blkg and will be released when @bio is
+ * freed.
 */
-int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
+static void __bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
{
-	if (unlikely(bio->bi_css))
-		return -EBUSY;
-	css_get(blkcg_css);
-	bio->bi_css = blkcg_css;
-	return 0;
+	bio_disassociate_blkg(bio);
+
+	bio->bi_blkg = blkg_tryget_closest(blkg);
}
-EXPORT_SYMBOL_GPL(bio_associate_blkcg);

/**
- * bio_associate_blkg - associate a bio with the specified blkg
+ * bio_associate_blkg_from_css - associate a bio with a specified css
 * @bio: target bio
- * @blkg: the blkg to associate
+ * @css: target css
 *
- * Associate @bio with the blkg specified by @blkg. This is the queue specific
- * blkcg information associated with the @bio, a reference will be taken on the
- * @blkg and will be freed when the bio is freed.
+ * Associate @bio with the blkg found by combining the css's blkg and the
+ * request_queue of the @bio. This falls back to the queue's root_blkg if
+ * the association fails with the css.
 */
-int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
+void bio_associate_blkg_from_css(struct bio *bio,
+				 struct cgroup_subsys_state *css)
{
-	if (unlikely(bio->bi_blkg))
-		return -EBUSY;
-	if (!blkg_try_get(blkg))
-		return -ENODEV;
-	bio->bi_blkg = blkg;
-	return 0;
+	struct request_queue *q = bio->bi_disk->queue;
+	struct blkcg_gq *blkg;
+
+	rcu_read_lock();
+
+	if (!css || !css->parent)
+		blkg = q->root_blkg;
+	else
+		blkg = blkg_lookup_create(css_to_blkcg(css), q);
+
+	__bio_associate_blkg(bio, blkg);
+
+	rcu_read_unlock();
}
+EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);

+#ifdef CONFIG_MEMCG
/**
- * bio_disassociate_task - undo bio_associate_current()
+ * bio_associate_blkg_from_page - associate a bio with the page's blkg
 * @bio: target bio
+ * @page: the page to lookup the blkcg from
+ *
+ * Associate @bio with the blkg from @page's owning memcg and the respective
+ * request_queue. If cgroup_e_css returns %NULL, fall back to the queue's
+ * root_blkg.
 */
-void bio_disassociate_task(struct bio *bio)
+void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
{
-	if (bio->bi_ioc) {
-		put_io_context(bio->bi_ioc);
-		bio->bi_ioc = NULL;
-	}
-	if (bio->bi_css) {
-		css_put(bio->bi_css);
-		bio->bi_css = NULL;
-	}
-	if (bio->bi_blkg) {
-		blkg_put(bio->bi_blkg);
-		bio->bi_blkg = NULL;
-	}
+	struct cgroup_subsys_state *css;
+
+	if (!page->mem_cgroup)
+		return;
+
+	rcu_read_lock();
+
+	css = cgroup_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
+	bio_associate_blkg_from_css(bio, css);
+
+	rcu_read_unlock();
}
+#endif /* CONFIG_MEMCG */
+
+/**
+ * bio_associate_blkg - associate a bio with a blkg
+ * @bio: target bio
+ *
+ * Associate @bio with the blkg found from the bio's css and request_queue.
+ * If one is not found, bio_lookup_blkg() creates the blkg. If a blkg is
+ * already associated, the css is reused and association redone as the
+ * request_queue may have changed.
+ */
+void bio_associate_blkg(struct bio *bio)
+{
+	struct cgroup_subsys_state *css;
+
+	rcu_read_lock();
+
+	if (bio->bi_blkg)
+		css = &bio_blkcg(bio)->css;
+	else
+		css = blkcg_css();
+
+	bio_associate_blkg_from_css(bio, css);
+
+	rcu_read_unlock();
}
+EXPORT_SYMBOL_GPL(bio_associate_blkg);

/**
- * bio_clone_blkcg_association - clone blkcg association from src to dst bio
+ * bio_clone_blkg_association - clone blkg association from src to dst bio
 * @dst: destination bio
 * @src: source bio
 */
-void bio_clone_blkcg_association(struct bio *dst, struct bio *src)
+void bio_clone_blkg_association(struct bio *dst, struct bio *src)
{
-	if (src->bi_css)
-		WARN_ON(bio_associate_blkcg(dst, src->bi_css));
+	if (src->bi_blkg)
+		__bio_associate_blkg(dst, src->bi_blkg);
}
-EXPORT_SYMBOL_GPL(bio_clone_blkcg_association);
+EXPORT_SYMBOL_GPL(bio_clone_blkg_association);

#endif /* CONFIG_BLK_CGROUP */

static void __init biovec_init_slabs(void)
...
@@ -48,8 +48,6 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
			   struct request *rq, int at_head,
			   rq_end_io_fn *done)
{
-	int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;
-
	WARN_ON(irqs_disabled());
	WARN_ON(!blk_rq_is_passthrough(rq));
@@ -60,23 +58,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
	 * don't check dying flag for MQ because the request won't
	 * be reused after dying flag is set
	 */
-	if (q->mq_ops) {
-		blk_mq_sched_insert_request(rq, at_head, true, false);
-		return;
-	}
-
-	spin_lock_irq(q->queue_lock);
-
-	if (unlikely(blk_queue_dying(q))) {
-		rq->rq_flags |= RQF_QUIET;
-		__blk_end_request_all(rq, BLK_STS_IOERR);
-		spin_unlock_irq(q->queue_lock);
-		return;
-	}
-
-	__elv_add_request(q, rq, where);
-	__blk_run_queue(q);
-	spin_unlock_irq(q->queue_lock);
+	blk_mq_sched_insert_request(rq, at_head, true, false);
}
EXPORT_SYMBOL_GPL(blk_execute_rq_nowait);
...
@@ -28,7 +28,6 @@ void get_io_context(struct io_context *ioc)
	BUG_ON(atomic_long_read(&ioc->refcount) <= 0);
	atomic_long_inc(&ioc->refcount);
}
-EXPORT_SYMBOL(get_io_context);

static void icq_free_icq_rcu(struct rcu_head *head)
{
@@ -48,10 +47,8 @@ static void ioc_exit_icq(struct io_cq *icq)
	if (icq->flags & ICQ_EXITED)
		return;

-	if (et->uses_mq && et->ops.mq.exit_icq)
-		et->ops.mq.exit_icq(icq);
-	else if (!et->uses_mq && et->ops.sq.elevator_exit_icq_fn)
-		et->ops.sq.elevator_exit_icq_fn(icq);
+	if (et->ops.exit_icq)
+		et->ops.exit_icq(icq);

	icq->flags |= ICQ_EXITED;
}
@@ -113,9 +110,9 @@ static void ioc_release_fn(struct work_struct *work)
					struct io_cq, ioc_node);
		struct request_queue *q = icq->q;

-		if (spin_trylock(q->queue_lock)) {
+		if (spin_trylock(&q->queue_lock)) {
			ioc_destroy_icq(icq);
-			spin_unlock(q->queue_lock);
+			spin_unlock(&q->queue_lock);
		} else {
			spin_unlock_irqrestore(&ioc->lock, flags);
			cpu_relax();
@@ -162,7 +159,6 @@ void put_io_context(struct io_context *ioc)
	if (free_ioc)
		kmem_cache_free(iocontext_cachep, ioc);
}
-EXPORT_SYMBOL(put_io_context);

/**
 * put_io_context_active - put active reference on ioc
@@ -173,7 +169,6 @@ EXPORT_SYMBOL(put_io_context);
 */
void put_io_context_active(struct io_context *ioc)
{
-	struct elevator_type *et;
	unsigned long flags;
	struct io_cq *icq;
@@ -187,25 +182,12 @@ void put_io_context_active(struct io_context *ioc)
	 * reverse double locking. Read comment in ioc_release_fn() for
	 * explanation on the nested locking annotation.
	 */
-retry:
	spin_lock_irqsave_nested(&ioc->lock, flags, 1);
	hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) {
		if (icq->flags & ICQ_EXITED)
			continue;

-		et = icq->q->elevator->type;
-		if (et->uses_mq) {
-			ioc_exit_icq(icq);
-		} else {
-			if (spin_trylock(icq->q->queue_lock)) {
-				ioc_exit_icq(icq);
-				spin_unlock(icq->q->queue_lock);
-			} else {
-				spin_unlock_irqrestore(&ioc->lock, flags);
-				cpu_relax();
-				goto retry;
-			}
-		}
+		ioc_exit_icq(icq);
	}
	spin_unlock_irqrestore(&ioc->lock, flags);
@@ -232,7 +214,7 @@ static void __ioc_clear_queue(struct list_head *icq_list)
	while (!list_empty(icq_list)) {
		struct io_cq *icq = list_entry(icq_list->next,
					       struct io_cq, q_node);
		struct io_context *ioc = icq->ioc;

		spin_lock_irqsave(&ioc->lock, flags);
@@ -251,16 +233,11 @@ void ioc_clear_queue(struct request_queue *q)
{
	LIST_HEAD(icq_list);

-	spin_lock_irq(q->queue_lock);
+	spin_lock_irq(&q->queue_lock);
	list_splice_init(&q->icq_list, &icq_list);
+	spin_unlock_irq(&q->queue_lock);

-	if (q->mq_ops) {
-		spin_unlock_irq(q->queue_lock);
-		__ioc_clear_queue(&icq_list);
-	} else {
-		__ioc_clear_queue(&icq_list);
-		spin_unlock_irq(q->queue_lock);
-	}
+	__ioc_clear_queue(&icq_list);
}

int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
@@ -336,7 +313,6 @@ struct io_context *get_task_io_context(struct task_struct *task,

	return NULL;
}
-EXPORT_SYMBOL(get_task_io_context);

/**
 * ioc_lookup_icq - lookup io_cq from ioc
@@ -350,7 +326,7 @@ struct io_cq *ioc_lookup_icq(struct io_context *ioc, struct request_queue *q)
{
	struct io_cq *icq;

-	lockdep_assert_held(q->queue_lock);
+	lockdep_assert_held(&q->queue_lock);

	/*
	 * icq's are indexed from @ioc using radix tree and hint pointer,
@@ -409,16 +385,14 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct request_queue *q,
	INIT_HLIST_NODE(&icq->ioc_node);

	/* lock both q and ioc and try to link @icq */
-	spin_lock_irq(q->queue_lock);
+	spin_lock_irq(&q->queue_lock);
	spin_lock(&ioc->lock);

	if (likely(!radix_tree_insert(&ioc->icq_tree, q->id, icq))) {
		hlist_add_head(&icq->ioc_node, &ioc->icq_list);
		list_add(&icq->q_node, &q->icq_list);
-		if (et->uses_mq && et->ops.mq.init_icq)
-			et->ops.mq.init_icq(icq);
-		else if (!et->uses_mq && et->ops.sq.elevator_init_icq_fn)
-			et->ops.sq.elevator_init_icq_fn(icq);
+		if (et->ops.init_icq)
+			et->ops.init_icq(icq);
	} else {
		kmem_cache_free(et->icq_cache, icq);
		icq = ioc_lookup_icq(ioc, q);
@@ -427,7 +401,7 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct request_queue *q,
	}

	spin_unlock(&ioc->lock);
-	spin_unlock_irq(q->queue_lock);
+	spin_unlock_irq(&q->queue_lock);
	radix_tree_preload_end();
	return icq;
}
...
@@ -262,29 +262,25 @@ static inline void iolat_update_total_lat_avg(struct iolatency_grp *iolat,
				    stat->rqs.mean);
}

-static inline bool iolatency_may_queue(struct iolatency_grp *iolat,
-				       wait_queue_entry_t *wait,
-				       bool first_block)
+static void iolat_cleanup_cb(struct rq_wait *rqw, void *private_data)
{
-	struct rq_wait *rqw = &iolat->rq_wait;
+	atomic_dec(&rqw->inflight);
+	wake_up(&rqw->wait);
+}

-	if (first_block && waitqueue_active(&rqw->wait) &&
-	    rqw->wait.head.next != &wait->entry)
-		return false;
+static bool iolat_acquire_inflight(struct rq_wait *rqw, void *private_data)
+{
+	struct iolatency_grp *iolat = private_data;

	return rq_wait_inc_below(rqw, iolat->rq_depth.max_depth);
}

static void __blkcg_iolatency_throttle(struct rq_qos *rqos,
				       struct iolatency_grp *iolat,
-				       spinlock_t *lock, bool issue_as_root,
+				       bool issue_as_root,
				       bool use_memdelay)
-	__releases(lock)
-	__acquires(lock)
{
	struct rq_wait *rqw = &iolat->rq_wait;
	unsigned use_delay = atomic_read(&lat_to_blkg(iolat)->use_delay);
-	DEFINE_WAIT(wait);
-	bool first_block = true;

	if (use_delay)
		blkcg_schedule_throttle(rqos->q, use_memdelay);
@@ -301,27 +297,7 @@ static void __blkcg_iolatency_throttle(struct rq_qos *rqos,
		return;
	}

-	if (iolatency_may_queue(iolat, &wait, first_block))
-		return;
-
-	do {
-		prepare_to_wait_exclusive(&rqw->wait, &wait,
-					  TASK_UNINTERRUPTIBLE);
-
-		if (iolatency_may_queue(iolat, &wait, first_block))
-			break;
-		first_block = false;
-
-		if (lock) {
-			spin_unlock_irq(lock);
-			io_schedule();
-			spin_lock_irq(lock);
-		} else {
-			io_schedule();
-		}
-	} while (1);
-
-	finish_wait(&rqw->wait, &wait);
+	rq_qos_wait(rqw, iolat, iolat_acquire_inflight, iolat_cleanup_cb);
}

#define SCALE_DOWN_FACTOR 2
@@ -478,38 +454,15 @@ static void check_scale_change(struct iolatency_grp *iolat)
		scale_change(iolat, direction > 0);
}

-static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio,
-				     spinlock_t *lock)
+static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio)
{
	struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
-	struct blkcg *blkcg;
-	struct blkcg_gq *blkg;
-	struct request_queue *q = rqos->q;
+	struct blkcg_gq *blkg = bio->bi_blkg;
	bool issue_as_root = bio_issue_as_root_blkg(bio);

	if (!blk_iolatency_enabled(blkiolat))
		return;

-	rcu_read_lock();
-	blkcg = bio_blkcg(bio);
-	bio_associate_blkcg(bio, &blkcg->css);
-	blkg = blkg_lookup(blkcg, q);
-	if (unlikely(!blkg)) {
-		if (!lock)
-			spin_lock_irq(q->queue_lock);
-		blkg = blkg_lookup_create(blkcg, q);
-		if (IS_ERR(blkg))
-			blkg = NULL;
-		if (!lock)
-			spin_unlock_irq(q->queue_lock);
-	}
-	if (!blkg)
-		goto out;
-
-	bio_issue_init(&bio->bi_issue, bio_sectors(bio));
-	bio_associate_blkg(bio, blkg);
-out:
-	rcu_read_unlock();
	while (blkg && blkg->parent) {
		struct iolatency_grp *iolat = blkg_to_lat(blkg);
		if (!iolat) {
@@ -518,7 +471,7 @@ static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio,
		}

		check_scale_change(iolat);
-		__blkcg_iolatency_throttle(rqos, iolat, lock, issue_as_root,
+		__blkcg_iolatency_throttle(rqos, iolat, issue_as_root,
				     (bio->bi_opf & REQ_SWAP) == REQ_SWAP);
		blkg = blkg->parent;
	}
@@ -640,7 +593,7 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
	bool enabled = false;

	blkg = bio->bi_blkg;
-	if (!blkg)
+	if (!blkg || !bio_flagged(bio, BIO_TRACKED))
		return;

	iolat = blkg_to_lat(bio->bi_blkg);
@@ -730,7 +683,7 @@ static void blkiolatency_timer_fn(struct timer_list *t)
		 * We could be exiting, don't access the pd unless we have a
		 * ref on the blkg.
		 */
-		if (!blkg_try_get(blkg))
+		if (!blkg_tryget(blkg))
			continue;

		iolat = blkg_to_lat(blkg);
...
@@ -389,7 +389,6 @@ void blk_recount_segments(struct request_queue *q, struct bio *bio)

	bio_set_flag(bio, BIO_SEG_VALID);
}
-EXPORT_SYMBOL(blk_recount_segments);

static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
				   struct bio *nxt)
@@ -596,17 +595,6 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req,
	return ll_new_hw_segment(q, req, bio);
}

-/*
- * blk-mq uses req->special to carry normal driver per-request payload, it
- * does not indicate a prepared command that we cannot merge with.
- */
-static bool req_no_special_merge(struct request *req)
-{
-	struct request_queue *q = req->q;
-
-	return !q->mq_ops && req->special;
-}
-
static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
		struct request *next)
{
@@ -632,13 +620,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
	unsigned int seg_size =
		req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;

-	/*
-	 * First check if the either of the requests are re-queued
-	 * requests. Can't merge them if they are.
-	 */
-	if (req_no_special_merge(req) || req_no_special_merge(next))
-		return 0;
-
	if (req_gap_back_merge(req, next->bio))
		return 0;
@@ -703,12 +684,10 @@ static void blk_account_io_merge(struct request *req)
{
	if (blk_do_io_stat(req)) {
		struct hd_struct *part;
-		int cpu;

-		cpu = part_stat_lock();
+		part_stat_lock();
		part = req->part;

-		part_round_stats(req->q, cpu, part);
		part_dec_in_flight(req->q, part, rq_data_dir(req));

		hd_struct_put(part);
@@ -731,7 +710,8 @@ static inline bool blk_discard_mergable(struct request *req)
	return false;
}

-enum elv_merge blk_try_req_merge(struct request *req, struct request *next)
+static enum elv_merge blk_try_req_merge(struct request *req,
+					struct request *next)
{
	if (blk_discard_mergable(req))
		return ELEVATOR_DISCARD_MERGE;
@@ -748,9 +728,6 @@ enum elv_merge blk_try_req_merge(struct request *req, struct request *next)
static struct request *attempt_merge(struct request_queue *q,
				     struct request *req, struct request *next)
{
-	if (!q->mq_ops)
-		lockdep_assert_held(q->queue_lock);
-
	if (!rq_mergeable(req) || !rq_mergeable(next))
		return NULL;
@@ -758,8 +735,7 @@ static struct request *attempt_merge(struct request_queue *q,
		return NULL;

	if (rq_data_dir(req) != rq_data_dir(next)
-	    || req->rq_disk != next->rq_disk
-	    || req_no_special_merge(next))
+	    || req->rq_disk != next->rq_disk)
		return NULL;

	if (req_op(req) == REQ_OP_WRITE_SAME &&
@@ -773,6 +749,9 @@ static struct request *attempt_merge(struct request_queue *q,
	if (req->write_hint != next->write_hint)
		return NULL;

+	if (req->ioprio != next->ioprio)
+		return NULL;
+
	/*
	 * If we are allowed to merge, then append bio list
	 * from next to rq and release next. merge_requests_fn
@@ -828,10 +807,6 @@ static struct request *attempt_merge(struct request_queue *q,
	 */
	blk_account_io_merge(next);

-	req->ioprio = ioprio_best(req->ioprio, next->ioprio);
-	if (blk_rq_cpu_valid(next))
-		req->cpu = next->cpu;
-
	/*
	 * ownership of bio passed from next to req, return 'next' for
	 * the caller to free
@@ -863,16 +838,11 @@ struct request *attempt_front_merge(struct request_queue *q, struct request *rq)
int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
			  struct request *next)
{
-	struct elevator_queue *e = q->elevator;
	struct request *free;

-	if (!e->uses_mq && e->type->ops.sq.elevator_allow_rq_merge_fn)
-		if (!e->type->ops.sq.elevator_allow_rq_merge_fn(q, rq, next))
-			return 0;
-
	free = attempt_merge(q, rq, next);
	if (free) {
-		__blk_put_request(q, free);
+		blk_put_request(free);
		return 1;
	}
@@ -891,8 +861,8 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
	if (bio_data_dir(bio) != rq_data_dir(rq))
		return false;

-	/* must be same device and not a special request */
-	if (rq->rq_disk != bio->bi_disk || req_no_special_merge(rq))
+	/* must be same device */
+	if (rq->rq_disk != bio->bi_disk)
		return false;

	/* only merge integrity protected bio into ditto rq */
@@ -911,6 +881,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
	if (rq->write_hint != bio->bi_write_hint)
		return false;

+	if (rq->ioprio != bio_prio(bio))
+		return false;
+
	return true;
}
...
@@ -31,6 +31,10 @@ void blk_mq_debugfs_unregister_sched(struct request_queue *q);
int blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
				       struct blk_mq_hw_ctx *hctx);
void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
+int blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
+void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos);
+void blk_mq_debugfs_unregister_queue_rqos(struct request_queue *q);
#else
static inline int blk_mq_debugfs_register(struct request_queue *q)
{
@@ -78,6 +82,19 @@ static inline int blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
static inline void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx)
{
}
+
+static inline int blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
+{
+	return 0;
+}
+
+static inline void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
+{
+}
+
+static inline void blk_mq_debugfs_unregister_queue_rqos(struct request_queue *q)
+{
+}
#endif

#ifdef CONFIG_BLK_DEBUG_FS_ZONED
...
@@ -421,7 +421,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
	 * BIO based queues do not use a scheduler so only q->nr_zones
	 * needs to be updated so that the sysfs exposed value is correct.
	 */
-	if (!queue_is_rq_based(q)) {
+	if (!queue_is_mq(q)) {
		q->nr_zones = nr_zones;
		return 0;
	}
...