- 17 Jul, 2021 1 commit
-
-
Oleksandr Natalenko authored
After mq-deadline learned to deal with cgroups, the BLKCG_MAX_POLS value became too small for all the elevators to be registered properly. The following issue is seen:

```
calling bfq_init+0x0/0x8b @ 1
blkcg_policy_register: BLKCG_MAX_POLS too small
initcall bfq_init+0x0/0x8b returned -28 after 507 usecs
```

which renders BFQ non-functional. Increase BLKCG_MAX_POLS to allow enough space for everyone. Fixes: 08a9ad8b ("block/mq-deadline: Add cgroup support") Link: https://lore.kernel.org/lkml/8988303.mDXGIdCtx8@natalenko.name/ Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name> Link: https://lore.kernel.org/r/20210717123328.945810-1-oleksandr@natalenko.name Signed-off-by: Jens Axboe <axboe@kernel.dk>
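The failure in the log above can be illustrated with a small userspace model. This is a Python sketch, not kernel code: `PolicyTable` and the slot counts are hypothetical, and the only detail taken from the log is that registration fails with -28, which is -ENOSPC on Linux.

```python
import errno


class PolicyTable:
    """Fixed-size table of registered policies (hypothetical model)."""

    def __init__(self, max_pols):
        # One slot per policy; None marks a free slot.
        self.slots = [None] * max_pols

    def register(self, name):
        """Return the slot index on success, or -ENOSPC when the table is full."""
        for index, slot in enumerate(self.slots):
            if slot is None:
                self.slots[index] = name
                return index
        return -errno.ENOSPC
```

With a table of two slots, a third registration fails in the same way `bfq_init` does in the log; enlarging the table is the whole fix.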
-
- 15 Jul, 2021 4 commits
-
-
Christoph Hellwig authored
xen-blkfront has a weird protocol where a close message from the remote side can be delayed, and where hot removals are treated somewhat differently from regular removals, all leading to potential NULL pointer removals, and a del_gendisk from the block device release method, which will deadlock. Fix this by just performing normal hot removals even when the device is opened like all other Linux block drivers. Fixes: c76f48eb ("block: take bd_mutex around delete_partitions in del_gendisk") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20210715141711.1257293-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.14
- fix various races in nvme-pci when shutting down just after probing (Casey Chen)
- fix a net_device leak in nvme-tcp (Prabhakar Kushwaha)"

* tag 'nvme-5.14-2021-07-15' of git://git.infradead.org/nvme:
  nvme-pci: do not call nvme_dev_remove_admin from nvme_remove
  nvme-pci: fix multiple races in nvme_setup_io_queues
  nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACE
-
Wang Qing authored
We must release the queue before freeing the tagset. Fixes: 4af5f2e0 ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk") Reported-and-tested-by: syzbot+9ca43ff47167c0ee3466@syzkaller.appspotmail.com Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210706040016.1360412-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Guoqing Jiang authored
We must release the queue before freeing the tagset. Fixes: 262d431f ("pd: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210706010734.1356066-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 13 Jul, 2021 3 commits
-
-
Casey Chen authored
nvme_dev_remove_admin could free dev->admin_q and the admin_tagset while they are being accessed by nvme_dev_disable(), which can be called by nvme_reset_work via nvme_remove_dead_ctrl. Commit cb4bfda6 ("nvme-pci: fix hot removal during error handling") intended to avoid requests being stuck on a removed controller by killing the admin queue. But the later fix c8e9e9b7 ("nvme-pci: unquiesce admin queue on shutdown"), together with nvme_dev_disable(dev, true) right before nvme_dev_remove_admin() could help dispatch requests and fail them early, so we don't need nvme_dev_remove_admin() any more. Fixes: cb4bfda6 ("nvme-pci: fix hot removal during error handling") Signed-off-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Casey Chen authored
The two paths below could overlap each other if we power off a drive quickly after powering it on. There are multiple races in nvme_setup_io_queues() because shutdown_lock is missing and the NVMEQ_ENABLED bit is used improperly.

nvme_reset_work()                                nvme_remove()
  nvme_setup_io_queues()                           nvme_dev_disable()
  ...                                              ...
A1  clear NVMEQ_ENABLED bit for admin queue          lock
retry:                                           B1  nvme_suspend_io_queues()
A2    pci_free_irq() admin queue                 B2  nvme_suspend_queue() admin queue
A3    pci_free_irq_vectors()                         nvme_pci_disable()
A4    nvme_setup_irqs();                         B3    pci_free_irq_vectors()
      ...                                            unlock
A5    queue_request_irq() for admin queue
      set NVMEQ_ENABLED bit
      ...
      nvme_create_io_queues()
A6      result = queue_request_irq();
        set NVMEQ_ENABLED bit
      ...
      fail to allocate enough IO queues:
A7      nvme_suspend_io_queues()
        goto retry

1. If B3 runs in between A1 and A2, it will crash if irqaction hasn't been freed by A2. B2 is supposed to free the admin queue IRQ but it simply can't do the job, as A1 has cleared the NVMEQ_ENABLED bit. Fix: combine A1 and A2 so the IRQ gets freed as soon as the NVMEQ_ENABLED bit gets cleared.
2. After #1 is solved, A2 could race with B3 if A2 is freeing the IRQ while B3 is checking irqaction. A3 could also race with B2 if B2 is freeing the IRQ while A3 is checking irqaction. Fix: A2 and A3 take the lock for mutual exclusion.
3. A3 could race with B3 since they could run free_msi_irqs() in parallel. Fix: A3 takes the lock for mutual exclusion.
4. A4 could fail to allocate all needed IRQ vectors if A3 and A4 are interrupted by B3. Fix: A4 takes the lock for mutual exclusion.
5. If A5/A6 happen after B2/B1, B3 will crash since irqaction is not NULL; the IRQs were just allocated by A5/A6. Fix: lock queue_request_irq() and the setting of the NVMEQ_ENABLED bit.
6. A7 could get a chance to pci_free_irq() for a certain IO queue while B3 is checking irqaction. Fix: A7 takes the lock.
7. nvme_dev->online_queues needs to be protected by shutdown_lock. Since it is not atomic, both paths could modify it using their own copies.

Co-developed-by: Yuanyuan Zhong <yzhong@purestorage.com> Signed-off-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
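The first fix (clearing the enabled bit and freeing the IRQ as one step, under the lock) can be sketched as a userspace model. This is Python, not the driver code: `AdminQueue`, `suspend_queue` and `resume_queue` are hypothetical names, and a `threading.Lock` stands in for shutdown_lock.

```python
import threading


class AdminQueue:
    """Hypothetical model: 'enabled' stands in for NVMEQ_ENABLED."""

    def __init__(self):
        self.enabled = True
        self.irq_allocated = True


def suspend_queue(queue, shutdown_lock):
    """Clear the enabled flag and free the IRQ in one critical section.

    Returns True if this caller did the teardown, False if the queue
    was already suspended by the other path.
    """
    with shutdown_lock:
        if not queue.enabled:
            return False
        queue.enabled = False
        queue.irq_allocated = False  # freed together with the flag
        return True


def resume_queue(queue, shutdown_lock):
    """Request the IRQ and set the enabled flag under the same lock."""
    with shutdown_lock:
        queue.irq_allocated = True
        queue.enabled = True
```

Because both fields only change inside the lock, no other path can observe the flag cleared while the IRQ is still allocated, which is the window items 1 and 2 describe.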
-
Prabhakar Kushwaha authored
dev_get_by_name() finds a network device by name, but it also increases the reference count. If an nvme-tcp queue is present and the network device driver is removed before nvme_tcp, we will face the following continuous log: "kernel:unregister_netdevice: waiting for <eth> to become free. Usage count = 2" and rmmod hangs. A similar case arises during reboot/shutdown with an nvme-tcp queue present, and neither completes. To fix this, use __dev_get_by_name(), which finds the network device by name without increasing any reference counter. Fixes: 3ede8f72 ("nvme-tcp: allow selecting the network interface for connections") Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> [hch: remove the ->ndev member entirely] Signed-off-by: Christoph Hellwig <hch@lst.de>
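The difference between the two lookups can be modeled in a few lines. This is a Python sketch, not kernel code: `Registry`, `get_by_name`, `lookup_by_name` and `can_unregister` are hypothetical stand-ins, and the starting count of 1 is an assumption chosen so that a leaked reference shows up as a count of 2, as in the log.

```python
class NetDevice:
    def __init__(self, name):
        self.name = name
        self.refcount = 1  # assumed: one reference held by the registry itself


class Registry:
    def __init__(self):
        self.devices = {}

    def register(self, name):
        self.devices[name] = NetDevice(name)
        return self.devices[name]

    def get_by_name(self, name):
        """Models dev_get_by_name(): the caller now owns a reference."""
        dev = self.devices.get(name)
        if dev is not None:
            dev.refcount += 1
        return dev

    def lookup_by_name(self, name):
        """Models __dev_get_by_name(): plain lookup, no reference taken."""
        return self.devices.get(name)

    def can_unregister(self, name):
        """Unregistering has to wait until only the registry's reference is left."""
        return self.devices[name].refcount == 1
```

A caller that uses `get_by_name` and never drops the reference keeps `can_unregister` false indefinitely; a caller that only needs the lookup avoids the problem.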
-
- 07 Jul, 2021 3 commits
-
-
Yu Kuai authored
We run a test that creates millions of cgroups and blkgs, and then triggers blkg_destroy_all(). blkg_destroy_all() will hold the spin lock for a long time in such a situation. Thus release the lock when a batch of blkgs has been destroyed. blkcg_activate_policy() and blkcg_deactivate_policy() might have the same problem; however, as they are basically only called from module init/exit paths, let's leave them alone for now. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210707015649.1929797-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
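The batching idea can be sketched as follows. This is a Python model, not the kernel implementation: `destroy_all` is a hypothetical name, the batch size of 64 is an assumption, and a `threading.Lock` stands in for the spin lock.

```python
import threading


def destroy_all(items, lock, batch=64):
    """Remove every item, releasing the lock after each batch.

    Returns how many times the lock was taken, so the effect of the
    batch size is visible to the caller.
    """
    acquisitions = 0
    while True:
        with lock:
            acquisitions += 1
            # Destroy at most one batch while holding the lock.
            for _ in range(min(batch, len(items))):
                items.pop()
            if not items:
                break
        # The lock is released here, so other lock users can make progress.
    return acquisitions
```

The total work is unchanged; only the longest uninterrupted hold time shrinks from "all items" to "one batch".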
-
Chunguang Xu authored
On the IO submission path, blk_account_io_start() may be interrupted by a system interrupt. When the interrupt returns, the value of part->stamp may have been updated by other cores, so the time value collected before the interrupt may be less than part->stamp. When this happens, we should do nothing, which keeps io_ticks more accurate. For kernels older than 5.0, this may cause io_ticks to become smaller, which in turn may cause abnormal ioutil values. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/1625521646-1069-1-git-send-email-brookxu.cn@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
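The guard described above amounts to ignoring a timestamp that is not newer than the stored one. A simplified Python model follows; it is not the kernel function. `Part` and `update_io_ticks` are hypothetical names, the accounting rule (elapsed time on completion, one tick on submission) is an assumption, and counter wraparound and atomic updates are deliberately left out.

```python
class Part:
    """Hypothetical stand-in for per-partition statistics."""

    def __init__(self, stamp=0):
        self.stamp = stamp
        self.io_ticks = 0


def update_io_ticks(part, now, end):
    """Account busy time only when 'now' is strictly newer than the stamp."""
    if now > part.stamp:
        # Assumed rule: charge the elapsed time on completion, one tick on start.
        part.io_ticks += (now - part.stamp) if end else 1
        part.stamp = now
    # A stale 'now' (sampled before another core advanced the stamp) is
    # ignored, so io_ticks never moves backwards.
```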
-
Jens Axboe authored
Pull single NVMe fix from Christoph. * 'nvme-5.14' of git://git.infradead.org/nvme: nvme-tcp: can't set sk_user_data without write_lock
-
- 05 Jul, 2021 1 commit
-
-
Maurizio Lombardi authored
The sk_user_data pointer is supposed to be modified only while holding the write_lock "sk_callback_lock", otherwise we could race with other threads and crash the kernel. We can't take the write_lock in nvmet_tcp_state_change() because it would cause a deadlock, but the release_work queue will set the pointer to NULL later, so we can simply remove the assignment. Fixes: b5332a9f ("nvmet-tcp: fix incorrect locking in state_change sk callback") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 02 Jul, 2021 1 commit
-
-
Tetsuo Handa authored
Commit 0384264e ("block: pass a gendisk to bdev_disk_changed") changed to pass lo->lo_disk instead of lo->lo_device. Fixes: 0384264e ("block: pass a gendisk to bdev_disk_changed") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/20210702152714.7978-1-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 01 Jul, 2021 5 commits
-
-
Christoph Hellwig authored
There is no need to hold a bdev reference when removing the partition. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210701081638.246552-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Sending uevents requires the struct device to be alive. To ensure that, grab the device refcount instead of just an inode reference. Fixes: bc359d03 ("block: add a disk_uevent helper") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210701081638.246552-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kees Cook authored
In preparation for FORTIFY_SOURCE performing compile-time and run-time field array bounds checking for memcpy(), memmove(), and memset(), avoid intentionally reading across neighboring array fields. Add a wrapping structure to serve as the memcpy() source, so the compiler can do appropriate bounds checking, avoiding this future warning: In function '__fortify_memcpy', inlined from 'create_uid' at drivers/s390/block/dasd_eckd.c:749:2: ./include/linux/fortify-string.h:246:4: error: call to '__read_overflow2_field' declared with attribute error: detected read beyond size of field (2nd parameter) Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20210701142221.3408680-3-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
dasd_set_target_state is only used inside of dasd_mod.ko, so don't export it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20210701142221.3408680-2-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Yufen Yu authored
If the disk has been deleted, we should return failure for ioctl BLKPG_DEL_PARTITION. Otherwise, invalid symlinks may be left behind in the directory /sys/class/block. The race is as follows:

blkdev_open
                                del_gendisk
                                    disk->flags &= ~GENHD_FL_UP;
                                    blk_drop_partitions
blkpg_ioctl
    bdev_add_partition
    add_partition
        device_add
            device_add_class_symlinks

The ioctl may call add_partition after del_gendisk() has tried to delete the partitions. Then the symlinks will be created. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Link: https://lore.kernel.org/r/20210610023241.3646241-1-yuyufen@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
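The check that closes this race can be sketched as a userspace model: test the "disk is up" state under the same lock that deletion takes, and refuse to add a partition once the disk is gone. This is Python, not kernel code; `Disk`, `del_gendisk` and `add_partition` here are hypothetical models, and the -ENXIO return value is an assumption.

```python
import errno
import threading


class Disk:
    def __init__(self):
        self.lock = threading.Lock()
        self.up = True           # stands in for GENHD_FL_UP
        self.partitions = set()


def del_gendisk(disk):
    """Mark the disk as gone and drop its partitions under the lock."""
    with disk.lock:
        disk.up = False
        disk.partitions.clear()


def add_partition(disk, partno):
    """Add a partition only while the disk is still up."""
    with disk.lock:
        if not disk.up:
            return -errno.ENXIO  # assumed error code for a deleted disk
        disk.partitions.add(partno)
        return 0
```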
-
- 30 Jun, 2021 22 commits
-
-
Christoph Hellwig authored
Remove some leftovers of the fake major number parsing that cause complaints from some compilers. Fixes: 2933a1b2c6f3 ("ubd: remove the code to register as the legacy IDE driver") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210628093937.1325608-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Keith Busch authored
We don't have an nvme status to report if the driver's .queue_rq() returns an error without dispatching the requested nvme command. Check the return value from blk_execute_rq() for all passthrough commands so the caller may know their command was not successful. If the command is from the target passthrough interface and fails to dispatch, synthesize the response back to the host as an internal target error. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-5-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
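The rule described above (use the device's status when the command was dispatched, synthesize an internal error when it was not) can be modeled as follows. This is a Python sketch, not the driver code: `passthru_execute` and the `dispatch` callback are hypothetical, and the numeric constants are illustrative assumptions rather than values taken from this log.

```python
BLK_STS_OK = 0
BLK_STS_IOERR = 10       # illustrative block-layer error value (assumption)
NVME_SC_INTERNAL = 0x6   # illustrative "internal error" status (assumption)


def passthru_execute(dispatch, cmd):
    """Run a passthrough command and always return an NVMe-style status.

    'dispatch' is a hypothetical callback returning
    (block_status, nvme_status), where nvme_status is None when the
    command never reached the device.
    """
    block_status, nvme_status = dispatch(cmd)
    if block_status != BLK_STS_OK and nvme_status is None:
        # Nothing came back from the device: synthesize an internal error.
        return NVME_SC_INTERNAL
    return nvme_status
```

The point of the model is that the caller never sees "no status": a dispatch failure is turned into an explicit error instead of being mistaken for success.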
-
Keith Busch authored
The synchronous blk_execute_rq() had not provided a way for its callers to know if its request was successful or not. Return the blk_status_t result of the request. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-4-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Keith Busch authored
The generic blk_execute_rq() knows how to handle polled completions. Use that instead of implementing an nvme specific handler. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-3-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Keith Busch authored
Poll for completions if the request's hctx is a polling type. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-2-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
With the legacy IDE driver gone, drivers now use either REQ_OP_DRV_* or REQ_OP_SCSI_*, so unify the two concepts of passthrough requests into a single one. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
All driver uses are gone now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210624081012.256464-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Use idr_for_each_entry to simplify removing all devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
loop_lookup has two callers: one wants to do a find by index and the other wants any unbound loop device. Open code the respective functionality in each caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Passing a negative index to loop_lookup will return any unbound device. Doing that for a delete does not make much sense, so add a check to explicitly reject that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Move acquiring and releasing loop_ctl_mutex from the callers into loop_add. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Split loop_control_ioctl into a helper for each command. This keeps the code nicely separated for the upcoming locking changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
loop_add returns the right error if the slot wasn't available. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
None of the callers cares about the allocated struct loop_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
loop_ctl_mutex is only needed to iterate the IDR for removing the loop devices, so reduce the coverage. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Unregister the misc and blockdevice first to prevent further access, and only then iterate to remove the devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Fix a leftover hunk from the blk_mq_alloc_disk conversion. Fixes: 281ea6a5bfdc ("mmc: switch to blk_mq_alloc_disk") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210621080144.3655131-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Use the blk_mq_alloc_disk to allocate the request_queue and gendisk together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210616053934.880951-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The gendisk already acquires a reference to the queue when add_disk is called, which is dropped on put_disk. So remove the superfluous extra refcounting. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210616053934.880951-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Prasanna Kumar Kalever authored
Problem: On reconfigure of a device, there is no way to verify that the backend storage matches the initial backend storage. Say an initial connect request for backend "pool1/image1" got mapped to /dev/nbd0 and the userspace process is terminated. A subsequent reconfigure request within NBD_ATTR_DEAD_CONN_TIMEOUT is allowed to use /dev/nbd0 for a different backend "pool1/image2".

For example, an operation like the one below could be dangerous:

    $ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image
    /dev/nbd0
    $ sudo blkid /dev/nbd0
    /dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4"
    $ sudo pkill -9 rbd-nbd
    $ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image
    /dev/nbd0
    $ sudo blkid /dev/nbd0
    /dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs"

Solution: Provide a way for userspace processes to keep some metadata that ties the device to its backend, so that when a reconfigure request is made, we can compare and avoid such dangerous operations. With this solution, as part of the initial connect request, the backend path can be stored in the sysfs per-device config, so that on a reconfigure request it's easy to check whether the backend path matches the initial connect backend path. Please note, the ioctl interface to nbd will not have these changes, as there won't be any reconfigure. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210429102828.31248-1-prasanna.kalever@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
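The comparison described in the solution can be sketched as a small model: remember the backend identifier given at connect time and reject a reconfigure that names a different one. This is Python, not the nbd driver: `NbdDevice`, `connect` and `reconfigure` are hypothetical names, and the -EINVAL return value is an assumption.

```python
import errno


class NbdDevice:
    """Hypothetical model of a device that remembers its backend."""

    def __init__(self):
        self.backend = None


def connect(dev, backend):
    """Initial connect: record which backend this device serves."""
    dev.backend = backend
    return 0


def reconfigure(dev, backend):
    """Allow a reconfigure only for the backend recorded at connect time."""
    if dev.backend is not None and backend != dev.backend:
        return -errno.EINVAL  # assumed error code for a mismatch
    return 0
```

In terms of the shell session above, attaching `rbd-pool/xfs-image` to a device that was connected for `rbd-pool/ext4-image` would be refused instead of silently swapping the data under /dev/nbd0.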
-
Christoph Hellwig authored
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210614060759.3965724-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
With the legacy IDE driver long deprecated, and modern userspace being much more flexible about dev_t assignments, there is no reason to fake a registration as the legacy IDE driver in ubd. This registration is a little problematic as it registers the same request_queue for multiple gendisks, so just remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20210614060759.3965724-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-