- 29 Sep, 2022 (2 commits)
-
Pankaj Raghav authored
There are two places in the block layer at the moment where the blk_mq_plug() helper could be used instead of directly accessing the plug from struct current. In both these cases, directly accessing the plug should not have any consequences for zoned devices. Make the intent explicit by adding comments instead of introducing unwanted checks with the blk_mq_plug() helper. [1]

[1] https://lore.kernel.org/linux-block/f6e54907-1035-2b2c-6387-ed178be05ccb@kernel.dk/

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220929144141.140077-1-p.raghav@samsung.com
[axboe: fixup multi-line comment style]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pankaj Raghav authored
The current implementation of blk_mq_plug() disables plugging for all operations that involve a transfer to the device, as it only checks whether the operation is a write via op_is_write(), i.e. the low bit of the opcode. Modify blk_mq_plug() to disable plugging only for REQ_OP_WRITE and REQ_OP_WRITE_ZEROES, as those are the operations that might require a zone lock.

Suggested-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220929074745.103073-2-p.raghav@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
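A minimal sketch of the narrowed check, assuming the 6.1-era helpers bdev_is_zoned() and bio_op(); this illustrates the idea rather than quoting the literal upstream hunk:

    static inline struct blk_plug *blk_mq_plug(struct bio *bio)
    {
            /* Zoned writes may need the zone write lock: do not plug them. */
            if (bdev_is_zoned(bio->bi_bdev) &&
                (bio_op(bio) == REQ_OP_WRITE ||
                 bio_op(bio) == REQ_OP_WRITE_ZEROES))
                    return NULL;

            /* Everything else can use the per-task plug, if one is active. */
            return current->plug;
    }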
-
- 28 Sep, 2022 (2 commits)
-
Jens Axboe authored
Pull NVMe updates from Christoph:

 "nvme updates for Linux 6.1

  - handle effects after freeing the request (Keith Busch)
  - copy firmware_rev on each init (Keith Busch)
  - restrict management ioctls to admin (Keith Busch)
  - ensure subsystem reset is single threaded (Keith Busch)
  - report the actual number of tagset maps in nvme-pci (Keith Busch)
  - small fabrics authentication fixups (Christoph Hellwig)
  - add common code for tagset allocation and freeing (Christoph Hellwig)
  - stop using the request_queue in nvmet (Christoph Hellwig)
  - set min_align_mask before calculating max_hw_sectors (Rishabh Bhatnagar)
  - send a rediscover uevent when a persistent discovery controller reconnects (Sagi Grimberg)
  - misc nvmet-tcp fixes (Varun Prakash, zhenwei pi)"

* tag 'nvme-6.1-2022-09-28' of git://git.infradead.org/nvme: (31 commits)
  nvmet: don't look at the request_queue in nvmet_bdev_set_limits
  nvmet: don't look at the request_queue in nvmet_bdev_zone_mgmt_emulate_all
  nvme: remove nvme_ctrl_init_connect_q
  nvme-loop: use the tagset alloc/free helpers
  nvme-loop: store the generic nvme_ctrl in set->driver_data
  nvme-loop: initialize sqsize later
  nvme-fc: use the tagset alloc/free helpers
  nvme-fc: store the generic nvme_ctrl in set->driver_data
  nvme-fc: keep ctrl->sqsize in sync with opts->queue_size
  nvme-rdma: use the tagset alloc/free helpers
  nvme-rdma: store the generic nvme_ctrl in set->driver_data
  nvme-tcp: use the tagset alloc/free helpers
  nvme-tcp: store the generic nvme_ctrl in set->driver_data
  nvme-tcp: remove the unused queue_size member in nvme_tcp_queue
  nvme: add common helpers to allocate and free tagsets
  nvme-auth: add a MAINTAINERS entry
  nvmet: add helpers to set the result field for connect commands
  nvme: improve the NVME_CONNECT_AUTHREQ* definitions
  nvmet-auth: don't try to cancel a non-initialized work_struct
  nvmet-tcp: remove nvmet_tcp_finish_cmd
  ...
-
Christoph Hellwig authored
As far as I can tell there is no need for the staged setup in dasd, so allocate the tagset and the disk with the queue in dasd_gendisk_alloc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20220928143945.1687114-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
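A hedged sketch of the non-staged allocation pattern; the struct and field names below are placeholders, not the real s390 dasd code:

    /* Allocate the tagset and the gendisk (which carries the queue)
     * together in dasd_gendisk_alloc(); illustrative only. */
    rc = blk_mq_alloc_tag_set(&block->tag_set);
    if (rc)
            return rc;

    gdp = blk_mq_alloc_disk(&block->tag_set, block);
    if (IS_ERR(gdp)) {
            blk_mq_free_tag_set(&block->tag_set);
            return PTR_ERR(gdp);
    }
    block->gdp = gdp;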
-
- 27 Sep, 2022 (36 commits)
-
Christoph Hellwig authored
blkg_conf_prep just creates a new blkg structure; there is no real need to update the lookup hint there, which should only be done on a successful lookup in the I/O path.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220927065425.257876-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
nvmet is a consumer of the block layer and should not directly look at the request_queue. Use the bdev_* helpers to retrieve the device limits instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
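For illustration, a sketch of reading limits through the bdev_* accessors instead of dereferencing the request_queue; the nvmet-side identifiers such as ns->bdev are assumed, not quoted from the patch:

    struct block_device *bdev = ns->bdev;   /* assumed namespace backing device */

    /* Queue limits via bdev_* helpers rather than bdev->bd_disk->queue. */
    unsigned int lbs    = bdev_logical_block_size(bdev);
    unsigned int pbs    = bdev_physical_block_size(bdev);
    unsigned int io_min = bdev_io_min(bdev);
    unsigned int io_opt = bdev_io_opt(bdev);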
-
Christoph Hellwig authored
nvmet is a consumer of the block layer and should not directly look at the request_queue. Just use the NUMA node ID from the gendisk instead of the request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
-
Keith Busch authored
The hctx's run_work may be racing with the elevator switch when reinitializing hardware queues. The queue is merely frozen in this context, but that only prevents requests from allocating and doesn't stop the hctx work from running. The work may get an elevator pointer that's being torn down, and can result in use-after-free errors and kernel panics (example below). Use the quiesced elevator switch instead, and make the previous one static since it is now only used locally.

  nvme nvme0: resetting controller
  nvme nvme0: 32/0/0 default/read/poll queues
  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 80000020c8861067 P4D 80000020c8861067 PUD 250f8c8067 PMD 0
  Oops: 0000 [#1] SMP PTI
  Workqueue: kblockd blk_mq_run_work_fn
  RIP: 0010:kyber_has_work+0x29/0x70
  ...
  Call Trace:
   __blk_mq_do_dispatch_sched+0x83/0x2b0
   __blk_mq_sched_dispatch_requests+0x12e/0x170
   blk_mq_sched_dispatch_requests+0x30/0x60
   __blk_mq_run_hw_queue+0x2b/0x50
   process_one_work+0x1ef/0x380
   worker_thread+0x2d/0x3e0

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220927155652.3260724-1-kbusch@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Replace blk_queue_nowait with a bdev_nowait helper that takes the block_device, given that the I/O submission path should not have to look into the request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20220927075815.269694-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
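A plausible shape for such a helper, sketched against the 6.1-era queue flags (not necessarily the exact upstream definition):

    /* Report whether the underlying queue supports REQ_NOWAIT submission. */
    static inline bool bdev_nowait(struct block_device *bdev)
    {
            return test_bit(QUEUE_FLAG_NOWAIT,
                            &bdev_get_queue(bdev)->queue_flags);
    }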
-
Christoph Hellwig authored
Unused now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
Use the common helpers to allocate and free the tagsets. To make this work the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_loop_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
Point the private data to the generic controller structure in preparation of using the common tagset init/exit code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
Defer initializing the sqsize field from the options until it has been capped by MAXCMD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
Use the common helpers to allocate and free the tagsets. To make this work the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_fc_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
-
Christoph Hellwig authored
Point the private data to the generic controller structure in preparation of using the common tagset init/exit code, and use the chance to clean up the init_hctx methods a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
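The shape these conversions move toward looks roughly like the sketch below; to_fc_ctrl() stands in for the transport's container_of-based helper and is an assumption, not a quote from the patch:

    /* set->driver_data now holds the generic nvme_ctrl, so init_hctx
     * derives the transport-specific controller from it. */
    static int nvme_fc_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
                                 unsigned int hctx_idx)
    {
            struct nvme_fc_ctrl *ctrl = to_fc_ctrl(data);

            hctx->driver_data = &ctrl->queues[hctx_idx + 1];
            return 0;
    }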
-
Christoph Hellwig authored
Also update the sqsize field when capping the queue size, and remove the check for a queue size that is larger than sqsize, given that sqsize is only initialized from opts->queue_size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
-
Christoph Hellwig authored
Use the common helpers to allocate and free the tagsets. To make this work the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_rdma_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
Point the private data to the generic controller structure in preparation of using the common tagset init/exit code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
Use the common helpers to allocate and free the tagsets. To make this work the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_tcp_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
Point the private data to the generic controller structure in preparation of using the common tagset init/exit code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
The queue_size member of struct nvme_tcp_queue is not used anywhere, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
Add common helpers to allocate and tear down the admin and I/O tag sets, including the special queues allocated with them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
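A hypothetical usage sketch of such helpers from a transport driver; the exact names and signatures here are assumptions for illustration, not the verbatim API:

    /* Hypothetical signatures, for illustration only. */
    int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl,
                                 struct blk_mq_tag_set *set,
                                 const struct blk_mq_ops *ops,
                                 unsigned int cmd_size);
    void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl);

    /* A transport such as nvme-loop would then do something like: */
    ret = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
                                   &nvme_loop_admin_mq_ops,
                                   sizeof(struct nvme_loop_iod));
    if (ret)
            goto out_free_ctrl;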
-
Christoph Hellwig authored
Add Hannes as the nvme-auth maintainer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
The code to set the result field for the admin and I/O connect commands is not only verbose and duplicated, but also violates the aliasing rules as it accesses both the u16 and u32 members in the union. Add a little helper to sort all that out.

Fixes: db1312dd ("nvmet: implement basic In-Band Authentication")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
-
Christoph Hellwig authored
Mark them as unsigned so that we don't need extra casts, and define them relative to cdword0 instead of requiring extra shifts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
-
Christoph Hellwig authored
Currently blktests nvme/002 trips up debugobjects if CONFIG_NVME_AUTH is enabled, but authentication is not on a queue. This is because nvmet_auth_sq_free cancels sq->auth_expired_work unconditionally, while auth_expired_work is only ever initialized if authentication is enabled for a given controller.

Fix this by calling most of what is nvmet_init_auth unconditionally when initializing the SQ, and just do the setting of the result field in the connect command handler.

Fixes: db1312dd ("nvmet: implement basic In-Band Authentication")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
-
zhenwei pi authored
There is only a single call-site of nvmet_tcp_finish_cmd(), so the helper has become redundant. Remove nvmet_tcp_finish_cmd() and use the original function body at the call site instead.

Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Varun Prakash authored
ttag is used as an index to get a cmd in nvmet_tcp_handle_h2c_data_pdu(); add a bounds check to avoid out-of-bounds access.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
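A sketch of the added check, assuming the queue's command array and size fields keep their usual nvmet-tcp names (illustrative, not the exact diff):

    u16 ttag = le16_to_cpu(data->ttag);

    /* Reject an out-of-range ttag before using it as an index. */
    if (unlikely(ttag >= queue->nr_cmds)) {
            pr_err("queue %d: received out of bound ttag %u, nr_cmds %u\n",
                   queue->idx, ttag, queue->nr_cmds);
            nvmet_tcp_fatal_error(queue);
            return -EPROTO;
    }
    cmd = &queue->cmds[ttag];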
-
Varun Prakash authored
As per the NVMe/TCP transport specification, the ICReq PDU is the first PDU received by the controller, and the controller should receive only one ICReq PDU. If the controller receives more than one ICReq PDU, this can be considered a fatal error.

The nvmet-tcp driver does not check for the ICReq PDU opcode if the queue state is NVMET_TCP_Q_LIVE. In the LIVE state an ICReq PDU is treated as a CapsuleCmd PDU, which can result in abnormal behavior.

Add a check for the ICReq PDU in nvmet_tcp_done_recv_pdu() to fix this issue.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
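A sketch of the check in nvmet_tcp_done_recv_pdu(), under the assumption that the usual queue state and PDU type names apply (illustrative, not the verbatim patch):

    /* An ICReq PDU is only valid while the queue is being established;
     * receiving another one on a live queue is a fatal error. */
    if (unlikely(queue->state == NVMET_TCP_Q_LIVE &&
                 hdr->type == nvme_tcp_icreq)) {
            pr_err("queue %d: received icreq pdu in state %d\n",
                   queue->idx, queue->state);
            nvmet_tcp_fatal_error(queue);
            return -EPROTO;
    }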
-
zhenwei pi authored
nvmet-tcp frees CMD buffers in nvmet_tcp_uninit_data_in_cmds(), and waits for the inflight IO requests in nvmet_sq_destroy(). While waiting for the inflight IO requests, the callback nvmet_tcp_queue_response() is called from the backend after IO completion, which leads to a typical use-after-free issue like this:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 107f80067 P4D 107f80067 PUD 10789e067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 1 PID: 123 Comm: kworker/1:1H Kdump: loaded Tainted: G E 6.0.0-rc2.bm.1-amd64 #15
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp]
  RIP: 0010:shash_ahash_digest+0x2b/0x110
  Code: 1f 44 00 00 41 57 41 56 41 55 41 54 55 48 89 fd 53 48 89 f3 48 83 ec 08 44 8b 67 30 45 85 e4 74 1c 48 8b 57 38 b8 00 10 00 00 <44> 8b 7a 08 44 29 f8 39 42 0c 0f 46 42 0c 41 39 c4 76 43 48 8b 03
  RSP: 0018:ffffc9000051bdd8 EFLAGS: 00010206
  RAX: 0000000000001000 RBX: ffff888100ab5470 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffff888100ab5470 RDI: ffff888100ab5420
  RBP: ffff888100ab5420 R08: ffff8881024d08c8 R09: ffff888103e1b4b8
  R10: 8080808080808080 R11: 0000000000000000 R12: 0000000000001000
  R13: 0000000000000000 R14: ffff88813412bd4c R15: ffff8881024d0800
  FS: 0000000000000000(0000) GS:ffff88883fa40000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000008 CR3: 0000000104b48000 CR4: 0000000000350ee0
  Call Trace:
   <TASK>
   nvmet_tcp_io_work+0xa52/0xb52 [nvmet_tcp]
   ? __switch_to+0x106/0x420
   process_one_work+0x1ae/0x380
   ? process_one_work+0x380/0x380
   worker_thread+0x30/0x360
   ? process_one_work+0x380/0x380
   kthread+0xe6/0x110
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x1f/0x30

Separate nvmet_tcp_uninit_data_in_cmds() into two steps:

  uninit data in cmds                   <- new step 1
  nvmet_sq_destroy();
  cancel_work_sync(&queue->io_work);
  free CMD buffers                      <- new step 2

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
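A hedged sketch of the reordered teardown in the queue release path; the buffer-freeing helper name below is a placeholder for whatever the patch actually introduces:

    /* Step 1: stop using the commands, but keep their buffers alive so a
     * late nvmet_tcp_queue_response() callback still sees valid memory. */
    nvmet_tcp_uninit_data_in_cmds(queue);

    /* Wait for inflight requests and for io_work to finish. */
    nvmet_sq_destroy(&queue->nvme_sq);
    cancel_work_sync(&queue->io_work);

    /* Step 2: only now free the per-command data buffers. */
    nvmet_tcp_free_cmd_buffers(queue);      /* placeholder helper name */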
-
Keith Busch authored
We've been reporting 2 maps regardless of whether the module parameter asked for anything beyond the default queues. A consequence of this is that blk-mq will reinitialize all the hardware contexts and io schedulers on every controller reset when the mapping is exactly the same as before. This unnecessary overhead adds several milliseconds to a reset for environments that don't need it. Report the actual number of mappings in use.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Rishabh Bhatnagar authored
If swiotlb is force enabled, dma_max_mapping_size ends up calling swiotlb_max_mapping_size, which takes into account the min align mask for the device. Set the min align mask for the nvme driver before calling dma_max_mapping_size while calculating max hw sectors.

Signed-off-by: Rishabh Bhatnagar <risbhat@amazon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
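A sketch of the resulting ordering in the PCI enable path, assuming the usual NVME_CTRL_PAGE_SIZE alignment and the pci_dev at hand (illustrative, not the exact hunk):

    /* Set the minimum alignment mask first, so that swiotlb's
     * dma_max_mapping_size() result already accounts for it ... */
    dma_set_min_align_mask(&pdev->dev, NVME_CTRL_PAGE_SIZE - 1);

    /* ... and only then derive max_hw_sectors from it. */
    dev->ctrl.max_hw_sectors = min_t(u32, NVME_MAX_KB_SZ << 1,
                                     dma_max_mapping_size(&pdev->dev) >> 9);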
-
Sagi Grimberg authored
When a discovery controller is disconnected, no AENs will arrive to notify the host about discovery log change events. In order to solve this, send a uevent notification when a persistent discovery controller reconnects. We add a new ctrl flag NVME_CTRL_STARTED_ONCE that will be set on the first start, and consecutive calls will find it set, and send the event to userspace if the controller is a discovery controller. Upon the event reception, userspace will re-read the discovery log page and will act upon changes as it sees fit.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
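Conceptually the change amounts to something like the following sketch in the controller start path; helper names such as nvme_discovery_ctrl() are assumed for illustration:

    /* On any start after the first one, tell userspace that a persistent
     * discovery controller is back so it re-reads the discovery log. */
    if (test_and_set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags) &&
        nvme_discovery_ctrl(ctrl)) {
            char *envp[2] = { "NVME_EVENT=rediscover", NULL };

            kobject_uevent_env(&ctrl->device->kobj, KOBJ_CHANGE, envp);
    }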
-
Sagi Grimberg authored
We expect to grow a few of these flags for various purposes so make them a proper enumeration.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
The subsystem reset writes to a register, so we have to ensure the device state is capable of handling that, otherwise the driver may access unmapped registers. Use the state machine to ensure the subsystem reset doesn't try to write registers on a device already undergoing this type of reset.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=214771
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
The passthrough commands already have this restriction, but the other operations do not. Require the same capabilities for all users as all of these operations, which include resets and rescans, can be disruptive.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
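The gist of the restriction is a capability check of roughly this form before the reset/rescan style ioctls are acted on (a sketch, not the exact hunk):

    /* Resets and rescans are disruptive: require the same privilege
     * the passthrough ioctls already demand. */
    if (!capable(CAP_SYS_ADMIN))
            return -EACCES;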
-
Keith Busch authored
The firmware revision can change after a reset, so copy the most recent info each time instead of just the first time, otherwise the sysfs firmware_rev entry may contain stale data.

Reported-by: Jeff Lien <jeff.lien@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
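In effect the fix moves the copy so it runs on every identify pass, along the lines of this sketch (struct member names assumed from the nvme core):

    /* Refresh the cached firmware revision on every (re)initialization,
     * not just the first, so sysfs firmware_rev reflects the device. */
    memcpy(ctrl->subsys->firmware_rev, id->fr,
           sizeof(ctrl->subsys->firmware_rev));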
-
Keith Busch authored
If a reset occurs after the scan work attempts to issue a command, the reset may quiesce the admin queue, which blocks the scan work's command from dispatching. The scan work will not be able to complete while the queue is quiesced.

Meanwhile, the reset work will cancel all outstanding admin tags and wait until all requests have transitioned to idle, which includes the passthrough request. But the passthrough request won't be set to idle until after the scan_work flushes, so we're deadlocked.

Fix this by handling the end effects after the request has been freed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216354
Reported-by: Jonathan Derrick <Jonathan.Derrick@solidigm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Prepare for storing the blkcg information in the gendisk instead of the request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220921180501.1539876-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Pass the gendisk to blkcg_schedule_throttle as part of moving the blk-cgroup infrastructure to be gendisk based. Remove the unused !BLK_CGROUP stub while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220921180501.1539876-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
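The new calling convention presumably looks like the sketch below, with callers passing the gendisk they already have at hand (assumed, for illustration):

    void blkcg_schedule_throttle(struct gendisk *disk, bool use_memdelay);

    /* A caller that used to pass a request_queue would now do e.g.: */
    blkcg_schedule_throttle(bdev->bd_disk, true);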
-