    nbd: fix race between nbd_alloc_config() and module removal · c55b2b98
    Yu Kuai authored
    When the nbd module is being removed, nbd_alloc_config() may be
    called concurrently by nbd_genl_connect(). In that case
    try_module_get() returns false, but nbd_alloc_config() does not
    handle the failure.
    
    The race may leak nbd_config and its related resources
    (e.g., recv_workq) and cause an oops in nbd_read_stat() once
    the nbd module has been unloaded, as shown below:
    
      BUG: kernel NULL pointer dereference, address: 0000000000000040
      Oops: 0000 [#1] SMP PTI
      CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
      Workqueue: knbd16-recv recv_work [nbd]
      RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
      Call Trace:
       recv_work+0x3b/0xb0 [nbd]
       process_one_work+0x1ed/0x390
       worker_thread+0x4a/0x3d0
       kthread+0x12a/0x150
       ret_from_fork+0x22/0x30
    
    Fix it by checking the return value of try_module_get()
    in nbd_alloc_config(). As nbd_alloc_config() may now return
    ERR_PTR(-ENODEV), assign nbd->config only when nbd_alloc_config()
    succeeds, so that nbd->config is always either a valid pointer
    or NULL and never an error pointer.
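
The pattern described above can be illustrated with a small userspace sketch. It is not the kernel patch itself: the ERR_PTR helpers are simplified stand-ins for the kernel macros, and module_going_away, try_module_get_stub(), nbd_alloc_config_sketch() and nbd_connect_sketch() are hypothetical names introduced only for this illustration. The struct layouts are reduced to the fields the example needs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(intptr_t error) { return (void *)error; }
static inline intptr_t PTR_ERR(const void *ptr) { return (intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Reduced, illustrative versions of the driver structures. */
struct nbd_config { int refs; };
struct nbd_device { struct nbd_config *config; };

/* Hypothetical flag that simulates "module removal in progress". */
static bool module_going_away;

/* Hypothetical stub: fails once removal has started, like try_module_get(). */
static bool try_module_get_stub(void)
{
	return !module_going_away;
}

/* Allocation refuses to proceed when the module reference cannot be taken. */
static struct nbd_config *nbd_alloc_config_sketch(void)
{
	struct nbd_config *config;

	if (!try_module_get_stub())
		return ERR_PTR(-ENODEV);

	config = calloc(1, sizeof(*config));
	if (!config)
		return ERR_PTR(-ENOMEM); /* real code would also drop the module ref */

	config->refs = 1;
	return config;
}

/* The caller stores the pointer only on success, so it stays valid or NULL. */
static int nbd_connect_sketch(struct nbd_device *nbd)
{
	struct nbd_config *config;

	assert(nbd->config == NULL); /* sketch assumes no config is attached yet */

	config = nbd_alloc_config_sketch();
	if (IS_ERR(config))
		return (int)PTR_ERR(config);

	nbd->config = config;
	return 0;
}
```

Storing the result in a local variable first is what keeps an error pointer out of nbd->config, so later code that only tests the field against NULL is not misled.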
    
    Also add a debug message that checks the reference counter
    of nbd_config during module removal.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Link: https://lore.kernel.org/r/20220521073749.3146892-3-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>