Commit 1b95e817 authored by Ming Lei, committed by Keith Busch

nvme: fix possible hang when removing a controller during error recovery

Error recovery can be interrupted by controller removal, which leaves
the controller quiesced and can cause I/O to hang.

Fix the issue by unquiescing the controller unconditionally when
removing namespaces.

This is reasonable and safe, given that forward progress can be made
when removing namespaces.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reported-by: Chunguang Xu <brookxu.cn@gmail.com>
Closes: https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@shopee.com/
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
parent b8f6446b
@@ -3933,6 +3933,12 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	 */
 	nvme_mpath_clear_ctrl_paths(ctrl);
 
+	/*
+	 * Unquiesce io queues so any pending IO won't hang, especially
+	 * those submitted from scan work
+	 */
+	nvme_unquiesce_io_queues(ctrl);
+
 	/* prevent racing with ns scanning */
 	flush_work(&ctrl->scan_work);
 
@@ -3942,10 +3948,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	 * removing the namespaces' disks; fail all the queues now to avoid
 	 * potentially having to clean up the failed sync later.
 	 */
-	if (ctrl->state == NVME_CTRL_DEAD) {
+	if (ctrl->state == NVME_CTRL_DEAD)
 		nvme_mark_namespaces_dead(ctrl);
-		nvme_unquiesce_io_queues(ctrl);
-	}
 
 	/* this is a no-op when called from the controller reset handler */
 	nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING_NOIO);
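
The ordering problem described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_queue`, `toy_dispatch`, `toy_flush_scan_work`, and `toy_remove_namespaces` are hypothetical names that do not exist in the kernel, and a "hang" is modeled as a `false` return value rather than a blocked thread.

```c
#include <stdbool.h>

/* Toy stand-in for an I/O queue; NOT the kernel's request queue. */
struct toy_queue {
	bool quiesced;	/* while true, no request is dispatched */
	int pending;	/* requests submitted but not yet completed */
};

/* Complete all pending requests unless the queue is quiesced. */
static int toy_dispatch(struct toy_queue *q)
{
	int done;

	if (q->quiesced)
		return 0;
	done = q->pending;
	q->pending = 0;
	return done;
}

/*
 * Hypothetical model of waiting for scan work: the wait can only
 * finish once every pending request has completed. Returns false
 * where a real waiter would block forever.
 */
static bool toy_flush_scan_work(struct toy_queue *q)
{
	toy_dispatch(q);
	return q->pending == 0;
}

/*
 * Model of the removal path. With unquiesce_first == true the queue
 * is unquiesced before the flush, mirroring the ordering in the diff.
 */
static bool toy_remove_namespaces(struct toy_queue *q, bool unquiesce_first)
{
	if (unquiesce_first)
		q->quiesced = false;
	return toy_flush_scan_work(q);
}
```

In this model, calling `toy_remove_namespaces()` on a quiesced queue with pending requests returns `false` when `unquiesce_first` is `false` (the requests never complete) and `true` when it is `true`, which is the behavior the patch aims for by calling `nvme_unquiesce_io_queues()` before `flush_work()`.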