Commits · 8d96a1117c21faad4f88d3d2df8c62712b3f495d · nexedi / linux

27 Mar, 2020 3 commits

null_blk: use blk_mq_init_queue_data · 8d96a111

Christoph Hellwig authored Mar 27, 2020

Use the new blk_mq_init_queue_data instead of open coding the queue
allocation and initialization.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: add a blk_mq_init_queue_data helper · 2f227bb9

Christoph Hellwig authored Mar 27, 2020

This allows a driver to pass a queuedata member before ->init_hctx is
called.  null_blk currently open codes this logic, but I'd rather have
it in the core to ease future maintenance.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
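
The ordering requirement described above can be pictured with a small userspace toy model. This is only a sketch, not kernel code: `toy_queue`, `toy_init_queue_data` and `toy_init_hctx` are hypothetical names, and the one property taken from the commit message is that the queuedata pointer must be in place before the init callback runs.

```c
#include <stddef.h>

/* Toy stand-ins; none of these are kernel types. */
struct toy_queue {
	void *queuedata; /* driver-private pointer */
	int hctx_ready;  /* set by the init callback on success */
};

typedef int (*toy_init_hctx_fn)(struct toy_queue *q);

/* Hypothetical helper: store queuedata first, then run the callback. */
static int toy_init_queue_data(struct toy_queue *q, void *queuedata,
			       toy_init_hctx_fn init_hctx)
{
	q->queuedata = queuedata;
	q->hctx_ready = 0;
	return init_hctx(q);
}

/* Example callback that relies on queuedata already being set. */
static int toy_init_hctx(struct toy_queue *q)
{
	if (q->queuedata == NULL)
		return -1;
	q->hctx_ready = 1;
	return 0;
}
```

Keeping the assignment and the callback invocation in one helper is what removes the need for each driver to open code the sequence.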

block: move the ->devnode callback to struct block_device_operations · 348e114b

Christoph Hellwig authored Mar 27, 2020

There really isn't any good reason to stash a method directly into
struct gendisk.  Move it together with the other block device
operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

25 Mar, 2020 12 commits

block: move the part_stat* helpers from genhd.h to a new header · c6a564ff

Christoph Hellwig authored Mar 25, 2020

These macros are used by only a few files.  Move them out of genhd.h,
which is included everywhere, into a new standalone header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move block layer internals out of include/linux/genhd.h · 581e2600

Christoph Hellwig authored Mar 25, 2020

None of this needs to be exposed to drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move guard_bio_eod to bio.c · 29125ed6

Christoph Hellwig authored Mar 25, 2020

This is bio layer functionality and not related to buffer heads.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: unexport get_gendisk · 1b4d4dbd

Christoph Hellwig authored Mar 25, 2020

get_gendisk is not used by any modular code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: unexport disk_map_sector_rcu · a7818aed

Christoph Hellwig authored Mar 25, 2020

disk_map_sector_rcu is not used by any modular code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: unexport disk_get_part · 572e7fc8

Christoph Hellwig authored Mar 25, 2020

disk_get_part is not used by any modular code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: mark part_in_flight and part_in_flight_rw static · 6005771c

Christoph Hellwig authored Mar 25, 2020

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: mark block_depr static · 31eb6186

Christoph Hellwig authored Mar 25, 2020

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: factor out requeue handling from dispatch code · c92a4103

Johannes Thumshirn authored Mar 25, 2020

Factor out the requeue handling from the dispatch code.  This will make
the subsequent addition of different requeueing schemes easier.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block/diskstats: replace time_in_queue with sum of request times · 8cd5b8fc

Konstantin Khlebnikov authored Mar 25, 2020

Column "time_in_queue" in diskstats is supposed to show the total waiting
time of all requests, i.e. its value should equal the sum of the times in
the other columns. This does not hold, because "time_in_queue" is counted
separately in jiffies rather than in nanoseconds like the other times.

This patch removes the redundant counter for "time_in_queue" and shows the
total time of read, write, discard and flush requests instead.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
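
As a rough illustration of deriving "time_in_queue" from the per-type request times, the sketch below sums four nanosecond counters and converts the result to milliseconds. The type and function names are hypothetical, and reporting in milliseconds is an assumption made for the example rather than something the commit message states.

```c
#include <stdint.h>

/* Request types named in the commit message; the enum itself is made up. */
enum { TOY_READ, TOY_WRITE, TOY_DISCARD, TOY_FLUSH, TOY_NR_GROUPS };

#define TOY_NSEC_PER_MSEC 1000000ULL

struct toy_disk_stats {
	uint64_t nsecs[TOY_NR_GROUPS]; /* total request time per type, in ns */
};

/* Derived value: no separate counter, just the sum of the per-type times. */
static uint64_t toy_time_in_queue_ms(const struct toy_disk_stats *s)
{
	uint64_t total = 0;

	for (int i = 0; i < TOY_NR_GROUPS; i++)
		total += s->nsecs[i];
	return total / TOY_NSEC_PER_MSEC;
}
```

Because the value is computed from the same nanosecond counters as the other columns, it cannot drift away from their sum.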

block/diskstats: accumulate all per-cpu counters in one pass · ea18e0f0

Konstantin Khlebnikov authored Mar 25, 2020

Reading /proc/diskstats iterates over all cpus for summing each field.
It's faster to sum all fields in one pass.

Hammering /proc/diskstats with fio shows 2x performance improvement:

fio --name=test --numjobs=$JOBS --filename=/proc/diskstats \
    --size=1k --bs=1k --fallocate=none --create_on_open=1 \
    --time_based=1 --runtime=10 --invalidate=0 --group_report

         JOBS=1    JOBS=10
Before:  7k iops   64k iops
After:   18k iops  120k iops

The code is also more compact this way:

add/remove: 1/0 grow/shrink: 0/2 up/down: 194/-1540 (-1346)
Function                                     old     new   delta
part_stat_read_all                             -     194    +194
diskstats_show                              1344     631    -713
part_stat_show                              1219     392    -827
Total: Before=14966947, After=14965601, chg -0.01%
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
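
The difference between summing each field separately and summing all fields in one pass can be sketched in userspace as follows. The per-CPU layout, the field names and the CPU count are assumptions made for the example; only `part_stat_read_all` in the size report above hints at the real helper, and this sketch does not reproduce it.

```c
#include <stdint.h>

#define TOY_NR_CPUS 4
enum { TOY_IOS, TOY_SECTORS, TOY_NSECS, TOY_NR_FIELDS };

struct toy_cpu_stats {
	uint64_t field[TOY_NR_FIELDS];
};

/* Field-by-field: every field read walks over all CPUs again. */
static uint64_t toy_read_field(const struct toy_cpu_stats *percpu, int f)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		sum += percpu[cpu].field[f];
	return sum;
}

/* One pass: walk the CPUs once and accumulate every field. */
static void toy_read_all(const struct toy_cpu_stats *percpu,
			 struct toy_cpu_stats *out)
{
	for (int f = 0; f < TOY_NR_FIELDS; f++)
		out->field[f] = 0;
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		for (int f = 0; f < TOY_NR_FIELDS; f++)
			out->field[f] += percpu[cpu].field[f];
}
```

Both variants produce the same totals; the one-pass version touches each CPU's data once instead of once per field.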

block/diskstats: more accurate approximation of io_ticks for slow disks · 2b8bd423

Konstantin Khlebnikov authored Mar 25, 2020

Currently io_ticks is approximated by adding one at each start and end of
a request if the jiffies counter has changed. This works perfectly for
requests shorter than a jiffy or if a request starts or ends at each jiffy.

If the disk executes just one request at a time and each takes longer than
two jiffies, then only the first and last jiffies are accounted.

The fix is simple: at the end of a request, add to io_ticks the jiffies
passed since the last update rather than just one jiffy.

Example: a common HDD executes random 4k read requests in around 12ms each.

fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
iostat -x 10 sdb

Note changes of iostat's "%util" 8,43% -> 99,99% before/after patch:

Before:

Device: rrqm/s wrqm/s   r/s  w/s  rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb       0,00   0,00 82,60 0,00 330,40  0,00     8,00     0,96 12,09   12,09    0,00  1,02  8,43

After:

Device: rrqm/s wrqm/s   r/s  w/s  rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb       0,00   0,00 82,50 0,00 330,00  0,00     8,00     1,00 12,10   12,10    0,00 12,12 99,99

Now io_ticks does not lose time between the start and end of requests, but
for queue-depth > 1 some I/O time between adjacent starts might be lost.

For load estimation "%util" is not as useful as the average queue length,
but it clearly shows how often the disk queue is completely empty.

Fixes: 5b18b5a7 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
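
The two accounting schemes can be compared with a small userspace simulation. This is a simplified model written for illustration, not the kernel function: `toy_disk`, `toy_update_old`, `toy_update_new` and `toy_simulate` are hypothetical names, and the model ignores concurrency and per-CPU counters.

```c
#include <stdbool.h>

/* Toy model of one disk; not the kernel's data structure. */
struct toy_disk {
	unsigned long stamp;    /* jiffy of the last io_ticks update */
	unsigned long io_ticks; /* accumulated busy jiffies */
};

/* Old scheme: add one tick whenever the jiffy changed since the last update. */
static void toy_update_old(struct toy_disk *d, unsigned long now)
{
	if (d->stamp != now) {
		d->stamp = now;
		d->io_ticks += 1;
	}
}

/* New scheme: at request end, add all jiffies passed since the last update. */
static void toy_update_new(struct toy_disk *d, unsigned long now, bool end)
{
	if (d->stamp != now) {
		d->io_ticks += end ? now - d->stamp : 1;
		d->stamp = now;
	}
}

/* Run nr_req back-to-back requests of len jiffies each, one at a time. */
static unsigned long toy_simulate(bool use_new, unsigned long nr_req,
				  unsigned long len)
{
	struct toy_disk d = { 0, 0 };
	unsigned long now = 100; /* arbitrary start time */

	for (unsigned long i = 0; i < nr_req; i++) {
		/* request start */
		if (use_new)
			toy_update_new(&d, now, false);
		else
			toy_update_old(&d, now);
		now += len;
		/* request end */
		if (use_new)
			toy_update_new(&d, now, true);
		else
			toy_update_old(&d, now);
	}
	return d.io_ticks;
}
```

In this model, ten 12-jiffy requests span 120 jiffies of wall time. The old scheme accounts only 11 of them, while the new scheme accounts roughly the whole interval, which matches the direction of the "%util" change reported above.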

24 Mar, 2020 21 commits

block: merge partition-generic.c and check.c · 387048bf

Christoph Hellwig authored Mar 24, 2020

Merge block/partition-generic.c and block/partitions/check.c into
a single block/partitions/core.c as the content is closely related
and both files are tiny.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move the various x86 Unix label formats out of genhd.h · 3f4fc59c

Christoph Hellwig authored Mar 24, 2020

All these are just used in block/partitions/msdos.c, so move them out of the
genhd.h header included by every driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

partitions/msdos: remove LINUX_SWAP_PARTITION · cb0ab526

Christoph Hellwig authored Mar 24, 2020

Just always use NEW_SOLARIS_X86_PARTITION and explain the situation,
as that is less confusing than two names for a single value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move the *_PARTITION enum out of genhd.h · 0226e9ea

Christoph Hellwig authored Mar 24, 2020

The enum containing the *_PARTITION symbolic names is only relevant
for the partition parser.  More specifically most values are MSDOS
partition table system indicators and thus should go straight into
msdos.c.  One value is only used by the sun partition parser, and the
sun and sgi partition parsers use the same value as the x86 Linux
RAID indicator to also indicate RAID autodetection.  Duplicate them
in sun.c and sgi.c given that the different partition types use
entirely different values otherwise.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move struct partition out of genhd.h · 1442f76d

Christoph Hellwig authored Mar 24, 2020

struct partition is the on-disk format of a MSDOS partition table entry.
Move it out of genhd.h into a new msdos_partition.h header and give it
a msdos_ prefix to avoid confusion.
Also move the magic number from block/partitions/msdos.h to the new
header so that it can be used by the SCSI drivers looking at the DOS
partition tables.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
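
For readers unfamiliar with the on-disk format mentioned above, here is a userspace sketch of a classic 16-byte MBR partition table entry and its signature check. The commit message does not list the fields, so the names below follow the conventional MBR layout and are assumptions with respect to the kernel header; the `toy_` identifiers are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* One partition table entry as stored on disk (16 bytes, byte-aligned). */
struct toy_msdos_partition {
	uint8_t boot_ind;      /* 0x80 marks the active partition */
	uint8_t head;          /* CHS start */
	uint8_t sector;
	uint8_t cyl;
	uint8_t sys_ind;       /* partition type indicator */
	uint8_t end_head;      /* CHS end */
	uint8_t end_sector;
	uint8_t end_cyl;
	uint8_t start_sect[4]; /* first sector, little-endian */
	uint8_t nr_sects[4];   /* sector count, little-endian */
};

_Static_assert(sizeof(struct toy_msdos_partition) == 16,
	       "an MBR partition entry is 16 bytes");

/* Decode a little-endian 32-bit value independent of host byte order. */
static uint32_t toy_le32(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* The 0xAA55 signature is stored little-endian in the last two bytes. */
static int toy_has_msdos_magic(const uint8_t sector[512])
{
	return sector[510] == 0x55 && sector[511] == 0xAA;
}

/* Copy entry 0..3 out of the table that starts at byte offset 446. */
static int toy_read_entry(const uint8_t sector[512], int slot,
			  struct toy_msdos_partition *out)
{
	if (slot < 0 || slot > 3 || !toy_has_msdos_magic(sector))
		return -1;
	memcpy(out, sector + 446 + (size_t)slot * sizeof(*out), sizeof(*out));
	return 0;
}
```

Storing the multi-byte fields as byte arrays is a choice made for this sketch so that it needs neither packing attributes nor endianness helpers.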

block: remove block/partitions/sun.h · cbb5cb3b

Christoph Hellwig authored Mar 24, 2020

Just move the two defines to block/partitions/sun.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove block/partitions/sgi.h · 95f77ef3

Christoph Hellwig authored Mar 24, 2020

Just move the single define to block/partitions/sgi.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove block/partitions/osf.h · 3466f63a

Christoph Hellwig authored Mar 24, 2020

Just move the single define to block/partitions/osf.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove block/partitions/karma.h · f6d17358

Christoph Hellwig authored Mar 24, 2020

Just move the single define to block/partitions/karma.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: declare all partition detection routines in check.h · 3f1b95ef

Christoph Hellwig authored Mar 24, 2020

There is no good reason to include one header per partition type in
core.c.  Instead move the prototypes for the detection routines to
check.h, and remove all now empty headers in block/partitions/.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove warn_no_part · ffa9ed64

Christoph Hellwig authored Mar 24, 2020

warn_no_part is initialized to 1 and never changed.  Remove
it and execute the code keyed off from it unconditionally.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: cleanup how md_autodetect_dev is called · 74cc979c

Christoph Hellwig authored Mar 24, 2020

Add a new include/linux/raid/detect.h header to declare the
md_autodetect_dev prototype which can be shared between md and
the partition code.  Then use IS_BUILTIN to call it instead of the
ifdef magic.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
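
To show why an IS_BUILTIN-style test reads better than "ifdef magic", the sketch below re-creates the underlying preprocessor trick in userspace. It is written from scratch for illustration: the real macro lives in the kernel's kconfig header and may differ in detail, and every `toy_`/`TOY_` name, including `CONFIG_TOY_MD`, is hypothetical.

```c
/* Expands to "0," only when the option macro is defined as 1. */
#define TOY_ARG_PLACEHOLDER_1 0,
#define toy_take_second_arg(ignored, val, ...) val
#define toy_is_defined(x) toy__is_defined(x)
#define toy__is_defined(val) toy___is_defined(TOY_ARG_PLACEHOLDER_##val)
#define toy___is_defined(arg1_or_junk) \
	toy_take_second_arg(arg1_or_junk 1, 0, 0)

/* Evaluates to 1 if the option is defined as 1, otherwise to 0. */
#define TOY_IS_BUILTIN(option) toy_is_defined(option)

#define CONFIG_TOY_MD 1 /* pretend the md driver is built in */

static int toy_autodetect_calls;

/* Stand-in for the md autodetection hook. */
static void toy_md_autodetect_dev(int dev)
{
	(void)dev;
	toy_autodetect_calls++;
}

static void toy_add_partition(int dev)
{
	/* An ordinary C condition: the call is always parsed and type-checked. */
	if (TOY_IS_BUILTIN(CONFIG_TOY_MD))
		toy_md_autodetect_dev(dev);
}
```

Because the test is a constant expression, a compiler can drop the dead branch while still checking the call against the shared prototype, which is the benefit of declaring the function in one common header.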

block: unexport read_dev_sector and put_dev_sector · 1a9fba3a

Christoph Hellwig authored Mar 24, 2020

read_dev_sector and put_dev_sector are now only used by the partition
parsing code.  Remove the export for read_dev_sector and merge it into
the only caller.  Clean the mess up a bit by using goto labels and
the SECTOR_SHIFT constant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

scsi: simplify scsi_partsize · a10183d7

Christoph Hellwig authored Mar 24, 2020

Call scsi_bios_ptable from scsi_partsize instead of requiring boilerplate
code in the callers.  Also switch the calling convention to match that
of the ->bios_param instances calling this function, and use true/false
for the return value instead of the weird -1 convention.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

scsi: move scsicam_bios_param to the end of scsicam.c · 26ae3533

Christoph Hellwig authored Mar 24, 2020

This avoids the need for a forward declaration and generally keeps the
file in the lower level first, high level last order.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

scsi: simplify scsi_bios_ptable · e63105df

Christoph Hellwig authored Mar 24, 2020

Use read_mapping_page and kmemdup instead of the odd read_dev_sector and
put_dev_sector helpers from the partitioning code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove alloc_part_info and free_part_info · f17c21c1

Christoph Hellwig authored Mar 24, 2020

There isn't any good reason not to simply open code the allocation and
freeing of the partition_meta_info structure.  Especially as one of
the branches in alloc_part_info is entirely dead code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move sysfs methods shared by disks and partitions to genhd.c · 3ad5cee5

Christoph Hellwig authored Mar 24, 2020

Move the sysfs _show methods that are used both on the full disk and
partition nodes to genhd.c instead of hiding them in the partitioning
code.  Also move the declaration for these methods to block/blk.h so
that we don't expose them to drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move disk_name and related helpers out of partition-generic.c · 5cbd28e3

Christoph Hellwig authored Mar 24, 2020

These functions aren't really related to partition support, so move them
to a more suitable place.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove __bdevname · ea3edd4d

Christoph Hellwig authored Mar 24, 2020

There is no good reason for __bdevname to exist.  Just open code
printing the string in the callers.  For three of them the format
string can be trivially merged into existing printk statements,
and in init/do_mounts.c we can at least do the scnprintf once at
the start of the function, and independently of CONFIG_BLOCK, to
make the output for tiny configs a little more helpful.

Acked-by: Theodore Ts'o <tytso@mit.edu> # for ext4
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove the blk_lookup_devt export · d2332c5c

Christoph Hellwig authored Mar 24, 2020

This function is only used by init/do_mounts.c, which can't be modular.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21 Mar, 2020 4 commits

block, bfq: invoke flush_idle_tree after reparent_active_queues in pd_offline · 4d38a87f

Paolo Valente authored Mar 21, 2020

In bfq_pd_offline(), the function bfq_flush_idle_tree() is invoked to
flush the rb tree that contains all idle entities belonging to the pd
(cgroup) being destroyed. In particular, bfq_flush_idle_tree() is
invoked before bfq_reparent_active_queues(). Yet the latter may happen
to add some entities to the idle tree. It happens if, in some of the
calls to bfq_bfqq_move() performed by bfq_reparent_active_queues(),
the queue to move is empty and gets expired.

This commit simply reverses the invocation order between
bfq_flush_idle_tree() and bfq_reparent_active_queues().

Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
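
Why the invocation order matters can be seen in a deliberately tiny model that only counts entities. It is a sketch under stated assumptions, not BFQ code: `toy_group` and the `toy_` functions are hypothetical, and the model assumes that each empty queue encountered during reparenting ends up on the idle tree.

```c
/* Toy group: just counts of active queues and idle-tree entities. */
struct toy_group {
	int active;
	int idle;
};

/* Reparenting moves all active queues away; empty ones expire and go idle. */
static void toy_reparent_active(struct toy_group *g, int nr_empty)
{
	if (nr_empty > g->active)
		nr_empty = g->active;
	g->idle += nr_empty;
	g->active = 0;
}

/* Flushing empties the idle tree. */
static void toy_flush_idle(struct toy_group *g)
{
	g->idle = 0;
}

/* Returns the number of idle entities left behind after going offline. */
static int toy_offline(struct toy_group *g, int nr_empty, int flush_first)
{
	if (flush_first) {
		toy_flush_idle(g);
		toy_reparent_active(g, nr_empty);
	} else {
		toy_reparent_active(g, nr_empty);
		toy_flush_idle(g);
	}
	return g->idle;
}
```

With the flush first, any entity that reparenting adds to the idle tree is left over; with the flush last, nothing remains in this model.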

block, bfq: make reparent_leaf_entity actually work only on leaf entities · 576682fa

Paolo Valente authored Mar 21, 2020

bfq_reparent_leaf_entity() reparents the input leaf entity (a leaf
entity represents just a bfq_queue in an entity tree). Yet, the input
entity is guaranteed to always be a leaf entity only in two-level
entity trees. In this respect, because of the error fixed by
commit 14afc593 ("block, bfq: fix overwrite of bfq_group pointer
in bfq_find_set_group()"), all (wrongly collapsed) entity trees happened
to actually have only two levels. After the latter commit, this does not
hold any longer.

This commit fixes this problem by modifying
bfq_reparent_leaf_entity(), so that it searches an active leaf entity
down the path that stems from the input entity. Such a leaf entity is
guaranteed to exist when bfq_reparent_leaf_entity() is invoked.

Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: turn put_queue into release_process_ref in __bfq_bic_change_cgroup · c8997736

Paolo Valente authored Mar 21, 2020

A bfq_put_queue() may be invoked in __bfq_bic_change_cgroup(). The
goal of this put is to release a process reference to a bfq_queue. But
process-reference releases may also trigger some extra operations and,
to this end, are handled through bfq_release_process_ref(). So, turn
the invocation of bfq_put_queue() into an invocation of
bfq_release_process_ref().

Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: move forward the getting of an extra ref in bfq_bfqq_move · fd1bb3ae

Paolo Valente authored Mar 21, 2020

Commit ecedd3d7 ("block, bfq: get extra ref to prevent a queue
from being freed during a group move") gets an extra reference to a
bfq_queue before possibly deactivating it (temporarily), in
bfq_bfqq_move(). This prevents the bfq_queue from disappearing before
being reactivated in its new group.

Yet, the bfq_queue may also be expired (i.e., its service may be
stopped) before the bfq_queue is deactivated. An expiration may
also lead to a premature freeing. This commit fixes this issue by
simply moving forward the getting of the extra reference already
introduced by commit ecedd3d7 ("block, bfq: get extra ref to
prevent a queue from being freed during a group move").

Reported-by: cki-project@redhat.com
Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
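
The effect of taking the extra reference earlier can be sketched with a toy reference counter. This is not BFQ code: the `toy_` names are hypothetical, and the model assumes that an expiration drops one reference and that activation in the new group takes one.

```c
/* Toy refcounted queue; "freed" records that the count hit zero. */
struct toy_queue {
	int ref;
	int freed;
};

static void toy_get(struct toy_queue *q)
{
	q->ref++;
}

static void toy_put(struct toy_queue *q)
{
	if (--q->ref == 0)
		q->freed = 1;
}

/*
 * Move a queue to another group. Returns -1 if the queue was freed
 * prematurely, 0 if it survived the move.
 */
static int toy_move(struct toy_queue *q, int ref_before_expire)
{
	if (ref_before_expire)
		toy_get(q);  /* extra reference taken up front */
	toy_put(q);          /* modelled expiration drops a reference */
	if (q->freed)
		return -1;   /* too late: the queue is already gone */
	if (!ref_before_expire)
		toy_get(q);  /* extra reference taken only now */
	toy_get(q);          /* activation in the new group holds a reference */
	toy_put(q);          /* release the extra reference */
	return 0;
}
```

Starting from a single reference, the late variant loses the queue during expiration, while the early variant ends the move with the queue alive and one reference held.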
