Commit cddce011 authored by Xie Yongji's avatar Xie Yongji Committed by Jens Axboe

nbd: Aovid double completion of a request

There is a race between iterating over requests in
nbd_clear_que() and completing requests in recv_work(),
which can lead to double completion of a request.

To fix it, flush the recv worker before iterating over
the requests and don't abort the completed request
while iterating.

Fixes: 96d97e17 ("nbd: clear_sock on netlink disconnect")
Reported-by: default avatarJiang Yadong <jiangyadong@bytedance.com>
Signed-off-by: default avatarXie Yongji <xieyongji@bytedance.com>
Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
parent 454bb677
......@@ -818,6 +818,10 @@ static bool nbd_clear_req(struct request *req, void *data, bool reserved)
{
struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
/* don't abort one completed request */
if (blk_mq_request_completed(req))
return true;
mutex_lock(&cmd->lock);
cmd->status = BLK_STS_IOERR;
mutex_unlock(&cmd->lock);
......@@ -2004,15 +2008,19 @@ static void nbd_disconnect_and_put(struct nbd_device *nbd)
{
mutex_lock(&nbd->config_lock);
nbd_disconnect(nbd);
nbd_clear_sock(nbd);
mutex_unlock(&nbd->config_lock);
sock_shutdown(nbd);
/*
* Make sure recv thread has finished, so it does not drop the last
* config ref and try to destroy the workqueue from inside the work
* queue.
* queue. And this also ensure that we can safely call nbd_clear_que()
* to cancel the inflight I/Os.
*/
if (nbd->recv_workq)
flush_workqueue(nbd->recv_workq);
nbd_clear_que(nbd);
nbd->task_setup = NULL;
mutex_unlock(&nbd->config_lock);
if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF,
&nbd->config->runtime_flags))
nbd_config_put(nbd);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment