Commits · 681cc5e8667e8579a2da8fa4090c48a2d73fc3bb · Kirill Smelkov / linux

07 Oct, 2020 2 commits

dm: fix request-based DM to not bounce through indirect dm_submit_bio · 681cc5e8

Mike Snitzer authored Oct 07, 2020

It is unnecessary to force request-based DM to call into bio-based
dm_submit_bio (via indirect disk->fops->submit_bio) only to have it then
call blk_mq_submit_bio().

Fix this by establishing a request-based DM block_device_operations
(dm_rq_blk_dops, which doesn't have .submit_bio) and update
dm_setup_md_queue() to set md->disk->fops to it for
DM_TYPE_REQUEST_BASED.

Remove DM_TYPE_REQUEST_BASED conditional in dm_submit_bio and unexport
blk_mq_submit_bio.

Fixes: c62b37d9 ("block: move ->make_request_fn to struct block_device_operations")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

681cc5e8
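
As an illustration of commit 681cc5e8 above, here is a minimal userspace sketch of the dispatch it relies on: when a disk's operations table has no ->submit_bio hook, the bio goes straight to blk-mq. The types below only mimic the shape of the kernel structures, and `model_submit()` and `enum submit_path` are hypothetical names invented for this sketch, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model; these types only mimic the kernel's shape. */
struct bio { int id; };

enum submit_path { PATH_BLK_MQ, PATH_DM_BIO_BASED };

struct block_device_operations {
	/* NULL means "no driver hook": the block core uses blk-mq directly. */
	enum submit_path (*submit_bio)(struct bio *bio);
};

/* Stand-in for the bio-based DM entry point. */
static enum submit_path dm_submit_bio(struct bio *bio)
{
	(void)bio;
	return PATH_DM_BIO_BASED;
}

/* Bio-based DM keeps its ->submit_bio hook... */
static const struct block_device_operations dm_blk_dops = {
	.submit_bio = dm_submit_bio,
};

/* ...while request-based DM leaves it unset. */
static const struct block_device_operations dm_rq_blk_dops = {
	.submit_bio = NULL,
};

/* Hypothetical stand-in for the block core's dispatch decision. */
static enum submit_path model_submit(const struct block_device_operations *fops,
				     struct bio *bio)
{
	assert(fops != NULL && bio != NULL);
	if (fops->submit_bio)
		return fops->submit_bio(bio);	/* indirect call into the driver */
	return PATH_BLK_MQ;			/* no hook: straight to blk-mq */
}
```

With `dm_rq_blk_dops` installed, `model_submit()` never enters `dm_submit_bio()`, which is why the DM_TYPE_REQUEST_BASED branch there becomes unnecessary.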

dm: remove special-casing of bio-based immutable singleton target on NVMe · 9c37de29

Mike Snitzer authored Oct 07, 2020

Since commit 5a6c35f9 ("block: remove direct_make_request") there
is no benefit to DM special-casing NVMe. Remove all code used to
establish DM_TYPE_NVME_BIO_BASED.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

9c37de29

01 Oct, 2020 3 commits

dm: export dm_copy_name_and_uuid · 61931c0e

Mike Snitzer authored Oct 01, 2020

Allow DM targets to access the configured name and uuid.
Also, bump DM ioctl version.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

61931c0e

dm: fix comment in __dm_suspend() · 0cede372

Mike Snitzer authored Sep 30, 2020

Fix stale references to functions that have been renamed and fix typo.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

0cede372

dm: fold dm_process_bio() into dm_submit_bio() · b2abdb1b

Mike Snitzer authored Sep 30, 2020

dm_process_bio() is only called by dm_submit_bio(), there is no benefit
to keeping dm_process_bio() factored out, so fold it.

While at it, cleanup dm_submit_bio()'s DMF_BLOCK_IO_FOR_SUSPEND related
branching and expand scope of dm_get_live_table() rcu reference on map
via common 'out' label to dm_put_live_table().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

b2abdb1b

30 Sep, 2020 1 commit

dm: fix missing imposition of queue_limits from dm_wq_work() thread · 0c2915b8

Mike Snitzer authored Sep 28, 2020

If a DM device was suspended when bios were issued to it, those bios
would be deferred using queue_io(). Once the DM device was resumed
dm_process_bio() could be called by dm_wq_work() for original bio that
still needs splitting. dm_process_bio()'s check for current->bio_list
(meaning call chain is within ->submit_bio) as a prerequisite for
calling blk_queue_split() for "abnormal IO" would result in
dm_process_bio() never imposing corresponding queue_limits
(e.g. discard_granularity, discard_max_bytes, etc).

Fix this by always having dm_wq_work() resubmit deferred bios using
submit_bio_noacct().

Side-effect is blk_queue_split() is always called for "abnormal IO" from
->submit_bio, be it from application thread or dm_wq_work() workqueue,
so proper bio splitting and depth-first bio submission is performed.
For sake of clarity, remove current->bio_list check before call to
blk_queue_split().

Also, remove dm_wq_work()'s use of dm_{get,put}_live_table() -- no
longer needed since IO will be reissued in terms of ->submit_bio.
And rename bio variable from 'c' to 'bio'.

Fixes: cf9c3786 ("dm: fix comment in dm_process_bio()")
Reported-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

0c2915b8

29 Sep, 2020 16 commits

dm snap persistent: simplify area_io() · 7d837c0d

Qinglang Miao authored Sep 21, 2020

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

7d837c0d

dm thin metadata: Remove unused local variable when create thin and snap · 399c9bdb

Huaisheng Ye authored Sep 15, 2020

The local disk details variables are not used when creating thin and snap
devices. Remove them from dm-thin-metadata, and add a validity check for the
value pointer in btree_lookup_raw so that the memory copy is skipped when the
caller doesn't need the value.
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

399c9bdb

dm raid: remove unnecessary discard limits for raid10 · f0e90b6c

Mike Snitzer authored Sep 24, 2020

Commit bcc90d28 ("md/raid10: improve raid10 discard request")
removes raid10's inability to properly handle large discards.  So
eliminate associated constraint from dm-raid's raid10 support.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

f0e90b6c

dm raid: fix discard limits for raid1 and raid10 · e0910c8e

Mike Snitzer authored Sep 24, 2020

Block core warned that discard_granularity was 0 for dm-raid with
personality of raid1.  Reason is that raid_io_hints() was incorrectly
special-casing raid1 rather than raid0.

But since commit 29efc390 ("md/md0: optimize raid0 discard
handling") even raid0 properly handles large discards.

Fix raid_io_hints() by removing discard limits settings for raid1.
Also, fix limits for raid10 by properly stacking underlying limits as
done in blk_stack_limits().

Depends-on: 29efc390 ("md/md0: optimize raid0 discard handling")
Fixes: 61697a6a ("dm: eliminate 'split_discard_bios' flag from DM target interface")
Cc: stable@vger.kernel.org
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

e0910c8e

dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY · cd746938

Mikulas Patocka authored Jul 09, 2020

Don't use crypto drivers that have the flag CRYPTO_ALG_ALLOCATES_MEMORY
set. These drivers allocate memory and thus they are unsuitable for block
I/O processing.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

cd746938

dm: use dm_table_get_device_name() where appropriate in targets · d4a512ed

Mike Snitzer authored Sep 19, 2020

dm_table_get_device_name() avoids calling dm_table_get_md() followed by
dm_device_name() -- saves intermediate dm_table_get_md() call.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

d4a512ed

dm table: make 'struct dm_table' definition accessible to all of DM core · 33bd6f06

Mike Snitzer authored Sep 19, 2020

Move 'struct dm_table' definition from dm-table.c to dm-core.h and
update DM core to access its members directly.

Helps optimize max_io_len() and other methods slightly.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

33bd6f06

dm: eliminate need for start_io_acct() forward declaration · 7465d7ac

Mike Snitzer authored Sep 17, 2020

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

7465d7ac

dm: simplify __process_abnormal_io() · 9679b5a7

Mike Snitzer authored Sep 15, 2020

Only call bio_op() once in switch statement.  Also remove the
excessive factoring out to one line functions.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

9679b5a7

dm: push use of on-stack flush_bio down to __send_empty_flush() · 828678b8

Mike Snitzer authored Sep 14, 2020

Eliminates duplicate code, no functional change.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

828678b8

dm: optimize max_io_len() by inlining max_io_len_target_boundary() · 3720281d

Mike Snitzer authored Sep 19, 2020

Saves redundant dm_target_offset() math.

Also, reverse argument order for max_io_len() to be consistent with
other similar functions.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

3720281d

dm: push md->immutable_target optimization down to __process_bio() · 094ee64d

Mike Snitzer authored Sep 14, 2020

Also, update associated stale comment in __bind().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

094ee64d

dm: change max_io_len() to use blk_max_size_offset() · 5091cdec

Mike Snitzer authored Sep 18, 2020

Using blk_max_size_offset() enables DM core's splitting to impose
ti->max_io_len (via q->limits.chunk_sectors) and also fallback to
respecting q->limits.max_sectors if chunk_sectors isn't set.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

5091cdec
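
The limit selection described in commit 5091cdec above can be sketched in userspace as follows. This is a simplified model of the behaviour the message attributes to blk_max_size_offset(), not the kernel source: `max_size_at_offset()` is a hypothetical name, and the limits are passed as plain integers rather than read from q->limits.

```c
#include <assert.h>
#include <stdint.h>

static int is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Largest IO (in sectors) allowed at 'offset': bounded by the distance to
 * the next chunk boundary when chunk_sectors is set, and by max_sectors
 * in every case.
 */
static uint32_t max_size_at_offset(uint32_t max_sectors,
				   uint32_t chunk_sectors, uint64_t offset)
{
	uint32_t remaining;

	assert(max_sectors > 0);	/* sketch-only sanity check */

	if (!chunk_sectors)		/* no chunking: fall back to max_sectors */
		return max_sectors;

	if (is_power_of_2(chunk_sectors))
		remaining = chunk_sectors - (uint32_t)(offset & (chunk_sectors - 1));
	else				/* non-power-of-2 chunks need a real modulo */
		remaining = chunk_sectors - (uint32_t)(offset % chunk_sectors);

	return remaining < max_sectors ? remaining : max_sectors;
}
```

For example, with a chunk size of 128 sectors and an IO starting at sector 100, only 28 sectors remain before the next boundary, so the IO would be split there.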

dm table: stack 'chunk_sectors' limit to account for target-specific splitting · 882ec4e6

Mike Snitzer authored Sep 14, 2020

If target set ti->max_io_len it must be used when stacking
DM device's queue_limits to establish a 'chunk_sectors' that is
compatible with the IO stack.

By using lcm_not_zero() care is taken to avoid blindly overriding the
chunk_sectors limit stacked up by blk_stack_limits().

Depends-on: 07d098e6 ("block: allow 'chunk_sectors' to be non-power-of-2")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

882ec4e6
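
To make the lcm_not_zero() stacking in commit 882ec4e6 above concrete, here is a hedged userspace sketch. The helper names (`gcd_u64`, `lcm_u64`, `lcm_not_zero_u64`, `stack_chunk_sectors`) are hypothetical; the sketch only assumes the semantics that the least common multiple is used when both values are non-zero, and the non-zero value otherwise.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Least common multiple, or 0 if either input is 0. */
static uint64_t lcm_u64(uint64_t a, uint64_t b)
{
	if (a && b)
		return (a / gcd_u64(a, b)) * b;
	return 0;
}

/* If either value is zero, return the other instead of zero. */
static uint64_t lcm_not_zero_u64(uint64_t a, uint64_t b)
{
	uint64_t l = lcm_u64(a, b);

	if (l)
		return l;
	return b ? b : a;
}

/*
 * Hypothetical helper: combine the chunk_sectors already stacked from
 * underlying devices with a target's max_io_len.
 */
static uint32_t stack_chunk_sectors(uint32_t stacked, uint32_t max_io_len)
{
	uint64_t l = lcm_not_zero_u64(stacked, max_io_len);

	assert(l <= UINT32_MAX);	/* sketch-only overflow guard */
	return (uint32_t)l;
}
```

An unset limit (0) on one side leaves the other side's value intact, which is the "avoid blindly overriding" behaviour the message describes; two non-zero values such as 128 and 96 combine to 384.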

Merge remote-tracking branch 'jens/for-5.10/block' into dm-5.10 · 1471308f

Mike Snitzer authored Sep 29, 2020

DM depends on these block 5.10 commits:

22ada802 block: use lcm_not_zero() when stacking chunk_sectors
07d098e6 block: allow 'chunk_sectors' to be non-power-of-2
021a2446 block: add QUEUE_FLAG_NOWAIT
6abc4946 dm: add support for REQ_NOWAIT and enable it for linear target
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

1471308f

block-mq: fix comments in blk_mq_queue_tag_busy_iter · 76cffccd

yangerkun authored Sep 19, 2020

Commit f5bbbbe4 ("blk-mq: sync the update nr_hw_queues with
blk_mq_queue_tag_busy_iter") introduced a bug where we may sleep while
holding the rcu read lock. Commit 530ca2c9 ("blk-mq: Allow blocking queue
tag iter callbacks") fixed it by taking a reference on the request_queue.
Then commit a9a80808 ("block: Remove the synchronize_rcu() call from
__blk_mq_update_nr_hw_queues()") removed the synchronize_rcu() call in
__blk_mq_update_nr_hw_queues(). Update the now-confusing comments in
blk_mq_queue_tag_busy_iter() accordingly.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

76cffccd

28 Sep, 2020 1 commit

blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps() · 8229cca8

Xianting Tian authored Sep 26, 2020

We found blk_mq_alloc_rq_maps() takes more time in kernel space when
testing nvme device hot-plugging. The test and analysis are as below.

Debug code,
1, blk_mq_alloc_rq_maps():
        u64 start, end;
        depth = set->queue_depth;
        start = ktime_get_ns();
        pr_err("[%d:%s switch:%ld,%ld] queue depth %d, nr_hw_queues %d\n",
                        current->pid, current->comm, current->nvcsw, current->nivcsw,
                        set->queue_depth, set->nr_hw_queues);
        do {
                err = __blk_mq_alloc_rq_maps(set);
                if (!err)
                        break;

                set->queue_depth >>= 1;
                if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN) {
                        err = -ENOMEM;
                        break;
                }
        } while (set->queue_depth);
        end = ktime_get_ns();
        pr_err("[%d:%s switch:%ld,%ld] all hw queues init cost time %lld ns\n",
                        current->pid, current->comm,
                        current->nvcsw, current->nivcsw, end - start);

2, __blk_mq_alloc_rq_maps():
        u64 start, end;
        for (i = 0; i < set->nr_hw_queues; i++) {
                start = ktime_get_ns();
                if (!__blk_mq_alloc_rq_map(set, i))
                        goto out_unwind;
                end = ktime_get_ns();
                pr_err("hw queue %d init cost time %lld ns\n", i, end - start);
        }

Testing nvme hot-plugging with the above debug code, we found that allocating
rqs for all 16 hw queues with depth 1023 took more than 3ms in total in kernel
space without being scheduled out; each hw queue took about 140-250us. The
time grows with the number of hw queues and with the queue depth. In the
extreme case where __blk_mq_alloc_rq_maps() returns -ENOMEM, the caller retries
with "queue_depth >>= 1" and even more time is consumed.
	[  428.428771] nvme nvme0: pci function 10000:01:00.0
	[  428.428798] nvme 10000:01:00.0: enabling device (0000 -> 0002)
	[  428.428806] pcieport 10000:00:00.0: can't derive routing for PCI INT A
	[  428.428809] nvme 10000:01:00.0: PCI INT A: no GSI
	[  432.593374] [4688:kworker/u33:8 switch:663,2] queue depth 30, nr_hw_queues 1
	[  432.593404] hw queue 0 init cost time 22883 ns
	[  432.593408] [4688:kworker/u33:8 switch:663,2] all hw queues init cost time 35960 ns
	[  432.595953] nvme nvme0: 16/0/0 default/read/poll queues
	[  432.595958] [4688:kworker/u33:8 switch:700,2] queue depth 1023, nr_hw_queues 16
	[  432.596203] hw queue 0 init cost time 242630 ns
	[  432.596441] hw queue 1 init cost time 235913 ns
	[  432.596659] hw queue 2 init cost time 216461 ns
	[  432.596877] hw queue 3 init cost time 215851 ns
	[  432.597107] hw queue 4 init cost time 228406 ns
	[  432.597336] hw queue 5 init cost time 227298 ns
	[  432.597564] hw queue 6 init cost time 224633 ns
	[  432.597785] hw queue 7 init cost time 219954 ns
	[  432.597937] hw queue 8 init cost time 150930 ns
	[  432.598082] hw queue 9 init cost time 143496 ns
	[  432.598231] hw queue 10 init cost time 147261 ns
	[  432.598397] hw queue 11 init cost time 164522 ns
	[  432.598542] hw queue 12 init cost time 143401 ns
	[  432.598692] hw queue 13 init cost time 148934 ns
	[  432.598841] hw queue 14 init cost time 147194 ns
	[  432.598991] hw queue 15 init cost time 148942 ns
	[  432.598993] [4688:kworker/u33:8 switch:700,2] all hw queues init cost time 3035099 ns
	[  432.602611]  nvme0n1: p1

So use this patch to trigger schedule between each hw queue init, to avoid
other threads getting stuck. It is not in atomic context when executing
__blk_mq_alloc_rq_maps(), so it is safe to call cond_resched().
Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8229cca8
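
The shape of the change in commit 8229cca8 above can be sketched in userspace as follows. This is a model under stated assumptions, not the kernel patch: sched_yield() stands in for cond_resched(), and `model_cond_resched()`, `model_alloc_rq_map()`, `model_alloc_rq_maps()` and the `fail_at` parameter are hypothetical names used only to show a voluntary scheduling point after each hw queue is initialised.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Counts scheduling points taken, so the sketch can be checked. */
static int yield_count;

/* Userspace stand-in for cond_resched(): offer the CPU to other threads. */
static void model_cond_resched(void)
{
	sched_yield();
	yield_count++;
}

/* Hypothetical per-queue allocation; fails for queue index 'fail_at'. */
static int model_alloc_rq_map(int hctx_idx, int fail_at)
{
	return hctx_idx != fail_at;
}

/* Allocate maps for every hw queue, yielding between queues. */
static int model_alloc_rq_maps(int nr_hw_queues, int fail_at)
{
	int i;

	assert(nr_hw_queues >= 0);
	for (i = 0; i < nr_hw_queues; i++) {
		if (!model_alloc_rq_map(i, fail_at))
			return -ENOMEM;		/* real code would unwind here */
		model_cond_resched();		/* one scheduling point per queue */
	}
	return 0;
}
```

Yielding once per queue bounds the uninterrupted stretch to a single queue's init time (about 140-250us in the log above) rather than the full 3ms for all 16 queues.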

25 Sep, 2020 17 commits

iocost: consider iocgs with active delays for debt forgiveness · bec02dbb

Tejun Heo authored Sep 18, 2020

An iocg may have 0 debt but non-zero delay. The current debt forgiveness
logic doesn't act on such iocgs. This can lead to unexpected behaviors - an
iocg with a little bit of debt will have its delay canceled through debt
forgiveness but one w/o any debt but active delay will have to wait out
until its delay decays out.

This patch updates the debt handling logic so that it treats delays the same
as debts. If either debt or delay is active, debt forgiveness logic kicks in
and acts on both the same way.

Also, avoid turning the debt and delay directly to zero as that can confuse
state transitions.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>