Commits · 563c81586d0ab2841487a61fb34d6e9cd5efded7 · Kirill Smelkov / linux

02 Feb, 2021 22 commits

nvme-tcp: use cancel tagset helper for tear down · 563c8158

Chao Leng authored Jan 21, 2021

Use nvme_cancel_tagset and nvme_cancel_admin_tagset to clean code for
tear down process.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

563c8158

nvme-rdma: use cancel tagset helper for tear down · c4189d68

Chao Leng authored Jan 21, 2021

Use nvme_cancel_tagset and nvme_cancel_admin_tagset to clean code for
tear down process.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c4189d68

nvme-tcp: add clean action for failed reconnection · 70a99574

Chao Leng authored Jan 21, 2021

If reconnect failed after start io queues, the queues will be unquiesced
and new requests continue to be delivered. Reconnection error handling
process directly free queues without cancel suspend requests. The
suppend request will time out, and then crash due to use the queue
after free.

Add sync queues and cancel suppend requests for reconnection error
handling.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

70a99574

nvme-rdma: add clean action for failed reconnection · 958dc1d3

Chao Leng authored Jan 21, 2021

A crash happens when inject failed reconnection.
If reconnect failed after start io queues, the queues will be unquiesced
and new requests continue to be delivered. Reconnection error handling
process directly free queues without cancel suspend requests. The
suppend request will time out, and then crash due to use the queue
after free.

Add sync queues and cancel suppend requests for reconnection error
handling.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

958dc1d3

nvme-core: add cancel tagset helpers · 25479069

Chao Leng authored Jan 21, 2021

Add nvme_cancel_tagset and nvme_cancel_admin_tagset for tear down and
reconnection error handling.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

25479069

nvme-core: get rid of the extra space · 8f8ea928

Chaitanya Kulkarni authored Jan 26, 2021

Remove the extra space in the nvme_free_cels() when calling
xa_for_each loop which is not a common practice
(except drivers/infiniband/core/ not sure why).
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

8f8ea928

nvme: add tracing of zns commands · 4a407d5e

Johannes Thumshirn authored Jan 27, 2021

When support for the NVMe ZNS commands was merged, tracing of these has
been omitted.

Add nvme_cmd_zone_mgmt_send, nvme_cmd_zone_mgmt_recv as well as
nvme_cmd_zone_append to the nvme driver's tracing facility.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

4a407d5e

nvme: parse format nvm command details when tracing · 3a98c51a

Michal Krakowiak authored Jan 04, 2021

Add detailed parsing of format nvm admin command to make the
trace log more consistent and human-readable.
Signed-off-by: Michal Krakowiak <michal.krakowiak@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

3a98c51a

nvme: update enumerations for status codes · 3254899e

Max Gurtovoy authored Jan 21, 2021

All the updates are mentioned in the ratified NVMe 1.4 spec.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

3254899e

nvmet: add lba to sect conversion helpers · 193fcf37

Chaitanya Kulkarni authored Jan 11, 2021

In this preparation patch, we add helpers to convert lbas to sectors &
sectors to lba. This is needed to eliminate code duplication in the ZBD
backend.

Use these helpers in the block device backend.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

193fcf37

nvmet: remove extra variable in identify ns · 3c7b224f

Chaitanya Kulkarni authored Jan 13, 2021

We remove the extra local variable struct nvmet_ns in
nvmet_execute_identify_ns() since req already has ns member that can be
reused, this also eliminates the explicit call to nvmet_put_namespace()
which is already present in the request completion path.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

3c7b224f

nvmet: remove extra variable in id-desclist · 3631c7f4

Chaitanya Kulkarni authored Jan 13, 2021

We remove the extra local variable struct nvmet_ns in
nvmet_execute_identify_desclist() since req already has ns member that
can be reused, this also eliminates the explicit call to
nvmet_put_namespace() which is already present in the request
completion path.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

3631c7f4

nvmet: remove extra variable in smart log nsid · 624e67fd

Chaitanya Kulkarni authored Jan 13, 2021

We remove the extra local variable struct nvmet_ns in
nvmet_get_smart_log_nsid() since req already has ns member that can be
reused, this also eliminates the explicit call to nvmet_put_namespace()
which is already present in the request completion path.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

624e67fd

nvme: refactor ns->ctrl by request · fc97e942

Minwoo Im authored Jan 13, 2021

Just for current code in nvme_cleanup_cmd(), we don't have to get
namespace instance, but we need controller instance.

Controller instance can be retrieved by namespace instance, but it can
be directly accessed by nvme_request instance from request.

	ctrl = nvme_req(req)->ctrl;

We don't have to go around namespace instance from request instance
through gendisk.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

fc97e942

nvme-tcp: pass multipage bvec to request iov_iter · 0dc9edaf

Sagi Grimberg authored Jan 14, 2021

iov_iter uses the right helpers so we should be able
to pass in a multipage bvec. Right now the iov_iter is
initialized with more segments that it needs which doesn't
fail because the iov_iter is capped by byte count, but it
is better to use a full multipage bvec iter.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

0dc9edaf

nvme-tcp: get rid of unused helper function · 60141aa0

Sagi Grimberg authored Jan 14, 2021

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

60141aa0

nvme-tcp: fix wrong setting of request iov_iter · cb9b870f

Sagi Grimberg authored Jan 14, 2021

We might set the iov_iter direction wrong, which is harmless for this
use-case, but get it right. Also this makes the code slightly cleaner.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

cb9b870f

nvme: support command retry delay for admin command · f9063a53

Minwoo Im authored Jan 08, 2021

The controller can request a delay retrying a failed command by setting
the Command Retry Delay (CRD) field in the Completion Queue Entry.

Currentlty this features is only applied to commands on the I/O queue, but
not to commands on the admin queue.  Retreive the nvme_ctrl from the
request so that no namespace is required and apply the feature to all
commands.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

f9063a53

nvme: constify static attribute_group structs · 60b152a5

Rikard Falkeborn authored Jan 09, 2021

The only usage of these is to put their addresses in arrays of pointers
to const attribute_groups. Make them const to allow the compiler to put
them in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

60b152a5

nvmet-fc: use RCU proctection for assoc_list · 4e2f02bf

Leonid Ravich authored Jan 03, 2021

searching assoc_list protected by rcu_read_lock if list not changed inline.
and according to the rcu list rules.

queue array embedded into nvmet_fc_tgt_assoc protected by rcu_read_lock
according to rcu dereference/assign rules.

queue and assoc object freed after grace period by call_rcu.

tgtport lock taken for changing assoc_list.
Reviewed-by: Eldad Zinger <Eldad.Zinger@dell.com>
Reviewed-by: Elad Grupi <Elad.Grupi@dell.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Leonid Ravich <Leonid.Ravich@emc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

4e2f02bf

nvmet: Fix nvmet_is_port_enabled indentation · 36ca03c8

Israel Rukshin authored Jan 07, 2021

Remove extra tab.
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

36ca03c8

nvmet: Use nvmet_is_port_enabled helper for pi_enable · cc345622

Israel Rukshin authored Jan 07, 2021

Remove code duplication.
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

cc345622

31 Jan, 2021 1 commit

drbd: Avoid comma separated statements · e8628013

Joe Perches authored Aug 24, 2020

Use semicolons and braces.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e8628013

26 Jan, 2021 11 commits

rsxx: remove redundant NULL check · 9abe47cc

Yang Li authored Jan 21, 2021

Fix below warnings reported by coccicheck:
./drivers/block/rsxx/dma.c:948:3-8: WARNING: NULL check
before some freeing functions is not needed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <abaci-bugfix@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9abe47cc

zram: fix NULL check before some freeing functions is not needed · 294ed6b9

Tian Tao authored Jan 25, 2021

fixed the below warning:
/drivers/block/zram/zram_drv.c:534:2-8: WARNING: NULL check
before some freeing functions is not needed.
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

294ed6b9

drbd: remove unused argument from drbd_request_prepare and __drbd_make_request · 370276ba

Guoqing Jiang authored Jan 21, 2021

We can remove start_jif since it is not used by drbd_request_prepare,
then remove it from __drbd_make_request further.

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: drbd-dev@lists.linbit.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

370276ba

mtip32xx: prefer pcie_capability_read_word() · 21269791

Bjorn Helgaas authored Jan 26, 2021

Replace pci_read_config_word() with pcie_capability_read_word().

pcie_capability_read_word() takes care of a few special cases when reading
the PCIe capability.  See 8c0d3a02 ("PCI: Add accessors for PCI Express
Capability").
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21269791

mtip32xx: use PCI #defines instead of numbers · 416c0547

Bjorn Helgaas authored Jan 26, 2021

Use PCI #defines for PCIe Device Control register values instead of
hard-coding bit positions.  No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

416c0547

loop: scale loop device by introducing per device lock · 6cc8e743

Pavel Tatashin authored Jan 26, 2021

Currently, loop device has only one global lock: loop_ctl_mutex.

This becomes hot in scenarios where many loop devices are used.

Scale it by introducing per-device lock: lo_mutex that protects
modifications of all fields in struct loop_device.

Keep loop_ctl_mutex to protect global data: loop_index_idr, loop_lookup,
loop_add.

The new lock ordering requirement is that loop_ctl_mutex must be taken
before lo_mutex.
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6cc8e743

bdev: Do not return EBUSY if bdev discard races with write · 767630c6

Jan Kara authored Jan 07, 2021

blkdev_fallocate() tries to detect whether a discard raced with an
overlapping write by calling invalidate_inode_pages2_range(). However
this check can give both false negatives (when writing using direct IO
or when writeback already writes out the written pagecache range) and
false positives (when write is not actually overlapping but ends in the
same page when blocksize < pagesize). This actually causes issues for
qemu which is getting confused by EBUSY errors.

Fix the problem by removing this conflicting write detection since it is
inherently racy and thus of little use anyway.
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
CC: "Darrick J. Wong" <darrick.wong@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20201111153913.41840-1-mlevitsk@redhat.comSigned-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

767630c6

block: inherit BIO_REMAPPED when cloning bios · 46bbf653

Christoph Hellwig authored Jan 26, 2021

Cloned bios are can be used to on the same device, in which case we need
to inherit the BIO_REMAPPED flag to avoid a double partition remap. When
the cloned bios are used on another device, bio_set_dev will clear the flag.

Fixes: 309dca30 ("block: store a block_device pointer in struct bio")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

46bbf653

bcache: use bio_set_dev to assign ->bi_bdev · f65b95fe

Christoph Hellwig authored Jan 26, 2021

Always use the bio_set_dev helper to assign ->bi_bdev to make sure
other state related to the device is uptodate.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f65b95fe

nvme: use bio_set_dev to assign ->bi_bdev · a7c7f7b2

Christoph Hellwig authored Jan 26, 2021

Always use the bio_set_dev helper to assign ->bi_bdev to make sure
other state related to the device is uptodate.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a7c7f7b2

bfq: bfq_check_waker() should be static · a5bf0a92

Jens Axboe authored Jan 25, 2021

It's only used in the same file, mark is appropriately static.

Fixes: 71217df3 ("block, bfq: make waker-queue detection more robust")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a5bf0a92

25 Jan, 2021 6 commits

block, bfq: make waker-queue detection more robust · 71217df3

Paolo Valente authored Jan 25, 2021

In the presence of many parallel I/O flows, the detection of waker
bfq_queues suffers from false positives. This commits addresses this
issue by making the filtering of actual wakers more selective. In more
detail, a candidate waker must be found to meet waker requirements
three times before being promoted to actual waker.
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

71217df3

block, bfq: save also injection state on queue merging · 5a5436b9

Paolo Valente authored Jan 25, 2021

To prevent injection information from being lost on bfq_queue merging,
also the amount of service that a bfq_queue receives must be saved and
restored when the bfq_queue is merged and split, respectively.
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5a5436b9

block, bfq: save also weight-raised service on queue merging · e673914d

Paolo Valente authored Jan 25, 2021

To prevent weight-raising information from being lost on bfq_queue merging,
also the amount of service that a bfq_queue receives must be saved and
restored when the bfq_queue is merged and split, respectively.
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e673914d

block, bfq: fix switch back from soft-rt weitgh-raising · d1f600fa

Paolo Valente authored Jan 25, 2021

A bfq_queue may happen to be deemed as soft real-time while it is
still enjoying interactive weight-raising. If this happens because of
a false positive, then the bfq_queue is likely to loose its soft
real-time status soon. Upon losing such a status, the bfq_queue must
get back its interactive weight-raising, if its interactive period is
not over yet. But this case is not handled. This commit corrects this
error.
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d1f600fa

block, bfq: re-evaluate convenience of I/O plugging on rq arrivals · 7f1995c2

Paolo Valente authored Jan 25, 2021

Upon an I/O-dispatch attempt, BFQ may detect that it was better to
plug I/O dispatch, and to wait for a new request to arrive for the
currently in-service queue. But the arrival of a new request for an
empty bfq_queue, and thus the switch from idle to busy of the
bfq_queue, may cause the scenario to change, and make plugging no
longer needed for service guarantees, or more convenient for
throughput. In this case, keeping I/O-dispatch plugged would certainly
lower throughput.

To address this issue, this commit makes such a check, and stops
plugging I/O if it is better to stop plugging I/O.
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7f1995c2

block, bfq: replace mechanism for evaluating I/O intensity · eb2fd80f

Paolo Valente authored Jan 25, 2021

Some BFQ mechanisms make their decisions on a bfq_queue basing also on
whether the bfq_queue is I/O bound. In this respect, the current logic
for evaluating whether a bfq_queue is I/O bound is rather rough. This
commits replaces this logic with a more effective one.

The new logic measures the percentage of time during which a bfq_queue
is active, and marks the bfq_queue as I/O bound if the latter if this
percentage is above a fixed threshold.
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

eb2fd80f