    nbd: handle racing with error'ed out commands · 7ce23e8e
    Josef Bacik authored
    We hit the following warning in production:
    
    print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
    ------------[ cut here ]------------
    refcount_t: underflow; use-after-free.
    WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
    Workqueue: knbd-recv recv_work [nbd]
    RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
    Call Trace:
     blk_mq_free_request+0xb7/0xf0
     blk_mq_complete_request+0x62/0xf0
     recv_work+0x29/0xa1 [nbd]
     process_one_work+0x1f5/0x3f0
     worker_thread+0x2d/0x3d0
     ? rescuer_thread+0x340/0x340
     kthread+0x111/0x130
     ? kthread_create_on_node+0x60/0x60
     ret_from_fork+0x1f/0x30
    ---[ end trace b079c3c67f98bb7c ]---
    
    This was preceded by us timing out everything and shutting down the
    sockets for the device.  The problem is that we had a request in the
    queue at the same time, so we completed the request twice.  This can
    actually happen in a lot of cases, for example when we fail to get a ref
    on our config, or when we only have one connection and just error out
    the command.
    
    Fix this by checking cmd->status in nbd_read_stat.  We only change
    cmd->status under cmd->lock, so it is safe to check it here and see
    whether we have already error'ed this command out, which would indicate
    that we have completed it as well.
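
The check described above can be illustrated with a small userspace model. This is only a sketch and not the driver code: `struct cmd`, `complete_request`, `error_out_cmd` and `recv_reply` are hypothetical names, a pthread mutex stands in for cmd->lock, and a plain counter stands in for the request refcount. It covers only the ordering described in this message, where the error path runs first and a reply for the same command is processed afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace model only; all names here are hypothetical. */
enum cmd_status { CMD_OK = 0, CMD_IOERR = 1 };

struct cmd {
    pthread_mutex_t lock;   /* stands in for cmd->lock */
    enum cmd_status status; /* stands in for cmd->status */
    int refs;               /* stands in for the request refcount */
    int completions;        /* how often the request was completed */
};

static void cmd_init(struct cmd *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->status = CMD_OK;
    c->refs = 1;
    c->completions = 0;
}

/* Completing drops the only reference; a second call would underflow. */
static void complete_request(struct cmd *c)
{
    assert(c->refs > 0); /* models the refcount underflow warning */
    c->refs--;
    c->completions++;
}

/* Error path (e.g. a timeout): mark the command failed and complete it. */
static void error_out_cmd(struct cmd *c)
{
    pthread_mutex_lock(&c->lock);
    c->status = CMD_IOERR;
    complete_request(c);
    pthread_mutex_unlock(&c->lock);
}

/*
 * Receive path: check the status under the lock. If the command was
 * already error'ed out, it was completed as well, so skip it.
 * Returns true if this call completed the request.
 */
static bool recv_reply(struct cmd *c)
{
    bool completed = false;

    pthread_mutex_lock(&c->lock);
    if (c->status == CMD_OK) {
        complete_request(c);
        completed = true;
    }
    pthread_mutex_unlock(&c->lock);
    return completed;
}
```

Calling `error_out_cmd()` and then `recv_reply()` on the same command completes it once; without the status check in `recv_reply()`, the second completion would trip the assertion, which is the model's analogue of the refcount underflow in the trace.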
    
    Reviewed-by: Mike Christie <mchristi@redhat.com>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>