    block/rnbd-srv: Prevent a deadlock generated by accessing sysfs in parallel · b168e1d8
    Gioh Kim authored
    Lockdep reported the warning below.
    When the server force-closes a session, it takes the sysfs lock first
    and then srv_sess->lock.
    The problem is that the client can send a close request at the same time.
    While handling that request, the server takes srv_sess->lock first and
    then the sysfs lock in order to remove the sysfs interfaces.
    
    The simplest way to prevent this situation is to use mutex_trylock.
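The back-off idea can be illustrated with a minimal user-space sketch. It is not the driver code: it uses POSIX threads instead of the kernel mutex API, and every name in it (sysfs_lock, sess_lock, force_close, close_count) is a hypothetical stand-in. It also assumes that the force-close path is the one that backs off when the session lock is already taken.

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the two locks in the report:
 *   sysfs_lock ~ the kernfs reference held while a sysfs store runs
 *   sess_lock  ~ srv_sess->lock
 */
static pthread_mutex_t sysfs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sess_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts how many times the force-close path actually tore the session down. */
static int close_count;

/*
 * Force-close path: the caller already holds sysfs_lock and now needs
 * sess_lock. Blocking here could complete an ABBA cycle with a path that
 * holds sess_lock and waits for sysfs_lock, so try the lock and back off
 * if it is busy.
 *
 * Returns true if the close was performed, false if it backed off.
 */
static bool force_close(void)
{
    if (pthread_mutex_trylock(&sess_lock) != 0)
        return false;   /* someone else is already working on the session */

    close_count++;      /* placeholder for the real teardown work */
    pthread_mutex_unlock(&sess_lock);
    return true;
}

/* Sysfs-style entry point: takes sysfs_lock, then attempts the close. */
static bool force_close_store(void)
{
    bool done;

    pthread_mutex_lock(&sysfs_lock);
    done = force_close();
    pthread_mutex_unlock(&sysfs_lock);
    return done;
}
```

In this sketch, force_close_store() returns false without blocking whenever sess_lock is held elsewhere. That is acceptable under the stated assumption, because the holder of sess_lock is then already closing the session and the forced close has nothing left to do.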
    
    [  234.153965] ======================================================
    [  234.154093] WARNING: possible circular locking dependency detected
    [  234.154219] 5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10 Tainted: G           O
    [  234.154381] ------------------------------------------------------
    [  234.154531] kworker/1:1H/618 is trying to acquire lock:
    [  234.154651] ffff8887a09db0a8 (kn->count#132){++++}, at: kernfs_remove_by_name_ns+0x40/0x80
    [  234.154819]
                   but task is already holding lock:
    [  234.154965] ffff8887ae5f6518 (&srv_sess->lock){+.+.}, at: rnbd_srv_rdma_ev+0x144/0x1590 [rnbd_server]
    [  234.155132]
                   which lock already depends on the new lock.
    
    [  234.155311]
                   the existing dependency chain (in reverse order) is:
    [  234.155462]
                   -> #1 (&srv_sess->lock){+.+.}:
    [  234.155614]        __mutex_lock+0x134/0xcb0
    [  234.155761]        rnbd_srv_sess_dev_force_close+0x36/0x50 [rnbd_server]
    [  234.155889]        rnbd_srv_dev_session_force_close_store+0x69/0xc0 [rnbd_server]
    [  234.156042]        kernfs_fop_write+0x13f/0x240
    [  234.156162]        vfs_write+0xf3/0x280
    [  234.156278]        ksys_write+0xba/0x150
    [  234.156395]        do_syscall_64+0x62/0x270
    [  234.156513]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [  234.156632]
                   -> #0 (kn->count#132){++++}:
    [  234.156782]        __lock_acquire+0x129e/0x23a0
    [  234.156900]        lock_acquire+0xf3/0x210
    [  234.157043]        __kernfs_remove+0x42b/0x4c0
    [  234.157161]        kernfs_remove_by_name_ns+0x40/0x80
    [  234.157282]        remove_files+0x3f/0xa0
    [  234.157399]        sysfs_remove_group+0x4a/0xb0
    [  234.157519]        rnbd_srv_destroy_dev_session_sysfs+0x19/0x30 [rnbd_server]
    [  234.157648]        rnbd_srv_rdma_ev+0x14c/0x1590 [rnbd_server]
    [  234.157775]        process_io_req+0x29a/0x6a0 [rtrs_server]
    [  234.157924]        __ib_process_cq+0x8c/0x100 [ib_core]
    [  234.158709]        ib_cq_poll_work+0x31/0xb0 [ib_core]
    [  234.158834]        process_one_work+0x4e5/0xaa0
    [  234.158958]        worker_thread+0x65/0x5c0
    [  234.159078]        kthread+0x1e0/0x200
    [  234.159194]        ret_from_fork+0x24/0x30
    [  234.159309]
                   other info that might help us debug this:
    
    [  234.159513]  Possible unsafe locking scenario:
    
    [  234.159658]        CPU0                    CPU1
    [  234.159775]        ----                    ----
    [  234.159891]   lock(&srv_sess->lock);
    [  234.160005]                                lock(kn->count#132);
    [  234.160128]                                lock(&srv_sess->lock);
    [  234.160250]   lock(kn->count#132);
    [  234.160364]
                    *** DEADLOCK ***
    
    [  234.160536] 3 locks held by kworker/1:1H/618:
    [  234.160677]  #0: ffff8883ca1ed528 ((wq_completion)ib-comp-wq){+.+.}, at: process_one_work+0x40a/0xaa0
    [  234.160840]  #1: ffff8883d2d5fe10 ((work_completion)(&cq->work)){+.+.}, at: process_one_work+0x40a/0xaa0
    [  234.161003]  #2: ffff8887ae5f6518 (&srv_sess->lock){+.+.}, at: rnbd_srv_rdma_ev+0x144/0x1590 [rnbd_server]
    [  234.161168]
                   stack backtrace:
    [  234.161312] CPU: 1 PID: 618 Comm: kworker/1:1H Tainted: G           O      5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10
    [  234.161490] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00       09/04/2012
    [  234.161643] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
    [  234.161765] Call Trace:
    [  234.161910]  dump_stack+0x96/0xe0
    [  234.162028]  check_noncircular+0x29e/0x2e0
    [  234.162148]  ? print_circular_bug+0x100/0x100
    [  234.162267]  ? register_lock_class+0x1ad/0x8a0
    [  234.162385]  ? __lock_acquire+0x68e/0x23a0
    [  234.162505]  ? trace_event_raw_event_lock+0x190/0x190
    [  234.162626]  __lock_acquire+0x129e/0x23a0
    [  234.162746]  ? register_lock_class+0x8a0/0x8a0
    [  234.162866]  lock_acquire+0xf3/0x210
    [  234.162982]  ? kernfs_remove_by_name_ns+0x40/0x80
    [  234.163127]  __kernfs_remove+0x42b/0x4c0
    [  234.163243]  ? kernfs_remove_by_name_ns+0x40/0x80
    [  234.163363]  ? kernfs_fop_readdir+0x3b0/0x3b0
    [  234.163482]  ? strlen+0x1f/0x40
    [  234.163596]  ? strcmp+0x30/0x50
    [  234.163712]  kernfs_remove_by_name_ns+0x40/0x80
    [  234.163832]  remove_files+0x3f/0xa0
    [  234.163948]  sysfs_remove_group+0x4a/0xb0
    [  234.164068]  rnbd_srv_destroy_dev_session_sysfs+0x19/0x30 [rnbd_server]
    [  234.164196]  rnbd_srv_rdma_ev+0x14c/0x1590 [rnbd_server]
    [  234.164345]  ? _raw_spin_unlock_irqrestore+0x43/0x50
    [  234.164466]  ? lockdep_hardirqs_on+0x1a8/0x290
    [  234.164597]  ? mlx4_ib_poll_cq+0x927/0x1280 [mlx4_ib]
    [  234.164732]  ? rnbd_get_sess_dev+0x270/0x270 [rnbd_server]
    [  234.164859]  process_io_req+0x29a/0x6a0 [rtrs_server]
    [  234.164982]  ? rnbd_get_sess_dev+0x270/0x270 [rnbd_server]
    [  234.165130]  __ib_process_cq+0x8c/0x100 [ib_core]
    [  234.165279]  ib_cq_poll_work+0x31/0xb0 [ib_core]
    [  234.165404]  process_one_work+0x4e5/0xaa0
    [  234.165550]  ? pwq_dec_nr_in_flight+0x160/0x160
    [  234.165675]  ? do_raw_spin_lock+0x119/0x1d0
    [  234.165796]  worker_thread+0x65/0x5c0
    [  234.165914]  ? process_one_work+0xaa0/0xaa0
    [  234.166031]  kthread+0x1e0/0x200
    [  234.166147]  ? kthread_create_worker_on_cpu+0xc0/0xc0
    [  234.166268]  ret_from_fork+0x24/0x30
    [  234.251591] rnbd_server L243: </dev/loop1@close_device_session>: Device closed
    [  234.604221] rnbd_server L264: RTRS Session close_device_session disconnected
    Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
    Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
    Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
    Link: https://lore.kernel.org/r/20210419073722.15351-10-gi-oh.kim@ionos.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
rnbd-srv.c 22.8 KB