    Btrfs: fix deadlock when starting writeback of bg caches · 24b89d08
    Filipe Manana authored
    While starting the writes of the dirty block group caches, if we didn't
    find a block group item in the extent tree, we returned without releasing
    our path, then ran delayed references and looped again to process any new
    dirty block groups. However, this second iteration of the loop could
    deadlock: it tries to lock some other extent tree node/leaf that another
    task has already locked, and that task is itself blocked waiting for the
    lock on a node/leaf that is still in our unreleased path.
    We could also deadlock when running the delayed references, as we could
    end up trying to lock the same nodes/leaves that we still hold in our
    local path (with a different lock type).
    
    Ran into such a case when running xfstests:
    
    [20892.242791] ------------[ cut here ]------------
    [20892.243776] WARNING: CPU: 0 PID: 13299 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
    [20892.245874] BTRFS: Transaction aborted (error -2)
    (...)
    [20892.269378] Call Trace:
    [20892.269915]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
    [20892.271097]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
    [20892.272173]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
    [20892.273386]  [<ffffffffa0509a6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
    [20892.274857]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
    [20892.275851]  [<ffffffffa0509a6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
    [20892.277341]  [<ffffffffa0515e10>] write_one_cache_group+0x68/0xaf [btrfs]
    [20892.278628]  [<ffffffffa052088a>] btrfs_start_dirty_block_groups+0x18d/0x29b [btrfs]
    [20892.280191]  [<ffffffffa052f077>] btrfs_commit_transaction+0x130/0x9c9 [btrfs]
    (...)
    [20892.291316] ---[ end trace 597f77e664245373 ]---
    [20892.293955] BTRFS: error (device sdg) in write_one_cache_group:3184: errno=-2 No such entry
    [20892.297390] BTRFS info (device sdg): forced readonly
    [20892.298222] ------------[ cut here ]------------
    [20892.299190] WARNING: CPU: 0 PID: 13299 at fs/btrfs/ctree.c:2683 btrfs_search_slot+0x7e/0x7d2 [btrfs]()
    (...)
    [20892.326253] Call Trace:
    [20892.326904]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
    [20892.329503]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
    [20892.330815]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
    [20892.332556]  [<ffffffffa0510b73>] ? btrfs_search_slot+0x7e/0x7d2 [btrfs]
    [20892.333955]  [<ffffffff81045f62>] warn_slowpath_null+0x1a/0x1c
    [20892.335562]  [<ffffffffa0510b73>] btrfs_search_slot+0x7e/0x7d2 [btrfs]
    [20892.336849]  [<ffffffff8107b024>] ? arch_local_irq_save+0x9/0xc
    [20892.338222]  [<ffffffffa051ad52>] ? cache_save_setup+0x43/0x2a5 [btrfs]
    [20892.339823]  [<ffffffffa051ad66>] ? cache_save_setup+0x57/0x2a5 [btrfs]
    [20892.341275]  [<ffffffff814351a4>] ? _raw_spin_unlock+0x32/0x46
    [20892.342810]  [<ffffffffa0515de7>] write_one_cache_group+0x3f/0xaf [btrfs]
    [20892.344184]  [<ffffffffa052088a>] btrfs_start_dirty_block_groups+0x18d/0x29b [btrfs]
    [20892.347162]  [<ffffffffa052f077>] btrfs_commit_transaction+0x130/0x9c9 [btrfs]
    (...)
    [20892.361015] ---[ end trace 597f77e664245374 ]---
    [21120.688097] INFO: task kworker/u8:17:29854 blocked for more than 120 seconds.
    [21120.689881]       Tainted: G        W       4.0.0-rc5-btrfs-next-9+ #2
    [21120.691384] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    (...)
    [21120.703696] Call Trace:
    [21120.704310]  [<ffffffff8143107e>] schedule+0x74/0x83
    [21120.705490]  [<ffffffffa055f025>] btrfs_tree_lock+0xd7/0x236 [btrfs]
    [21120.706757]  [<ffffffff81075cd6>] ? signal_pending_state+0x31/0x31
    [21120.708156]  [<ffffffffa054ac1e>] lock_extent_buffer_for_io+0x3e/0x194 [btrfs]
    [21120.709892]  [<ffffffffa054bb86>] ? btree_write_cache_pages+0x273/0x385 [btrfs]
    [21120.711605]  [<ffffffffa054bc42>] btree_write_cache_pages+0x32f/0x385 [btrfs]
    [21120.723440]  [<ffffffffa0527552>] btree_writepages+0x23/0x5c [btrfs]
    [21120.724943]  [<ffffffff8110c4c8>] do_writepages+0x23/0x2c
    [21120.726008]  [<ffffffff81176dde>] __writeback_single_inode+0x73/0x2fa
    [21120.727230]  [<ffffffff8117714a>] ? writeback_sb_inodes+0xe5/0x38b
    [21120.728526]  [<ffffffff811771fb>] ? writeback_sb_inodes+0x196/0x38b
    [21120.729701]  [<ffffffff8117726a>] writeback_sb_inodes+0x205/0x38b
    (...)
    [21120.747853] INFO: task btrfs:13282 blocked for more than 120 seconds.
    [21120.749459]       Tainted: G        W       4.0.0-rc5-btrfs-next-9+ #2
    [21120.751137] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    (...)
    [21120.768457] Call Trace:
    [21120.769039]  [<ffffffff8143107e>] schedule+0x74/0x83
    [21120.770107]  [<ffffffffa052f25c>] btrfs_commit_transaction+0x315/0x9c9 [btrfs]
    [21120.771558]  [<ffffffff81075cd6>] ? signal_pending_state+0x31/0x31
    [21120.773659]  [<ffffffffa056fd8c>] prepare_to_relocate+0xcb/0xd2 [btrfs]
    [21120.776257]  [<ffffffffa05741da>] relocate_block_group+0x44/0x4a9 [btrfs]
    [21120.777755]  [<ffffffffa05747a0>] ? btrfs_relocate_block_group+0x161/0x288 [btrfs]
    [21120.779459]  [<ffffffffa05747a8>] btrfs_relocate_block_group+0x169/0x288 [btrfs]
    [21120.781153]  [<ffffffffa0550403>] btrfs_relocate_chunk.isra.29+0x3e/0xa7 [btrfs]
    [21120.783918]  [<ffffffffa05518fd>] btrfs_balance+0xaa4/0xc52 [btrfs]
    [21120.785436]  [<ffffffff8114306e>] ? cpu_cache_get.isra.39+0xe/0x1f
    [21120.786434]  [<ffffffffa0559252>] btrfs_ioctl_balance+0x23f/0x2b0 [btrfs]
    (...)
    [21120.889251] INFO: task fsstress:13288 blocked for more than 120 seconds.
    [21120.890526]       Tainted: G        W       4.0.0-rc5-btrfs-next-9+ #2
    [21120.891773] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    (...)
    [21120.899960] Call Trace:
    [21120.900743]  [<ffffffff8143107e>] schedule+0x74/0x83
    [21120.903004]  [<ffffffffa055f025>] btrfs_tree_lock+0xd7/0x236 [btrfs]
    [21120.904383]  [<ffffffff81075cd6>] ? signal_pending_state+0x31/0x31
    [21120.905608]  [<ffffffffa051125b>] btrfs_search_slot+0x766/0x7d2 [btrfs]
    [21120.906812]  [<ffffffff8114290e>] ? virt_to_head_page+0x9/0x2c
    [21120.907874]  [<ffffffff81144b7f>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
    [21120.909551]  [<ffffffffa05124e0>] btrfs_insert_empty_items+0x5d/0xa8 [btrfs]
    [21120.910914]  [<ffffffffa0512585>] btrfs_insert_item+0x5a/0xa5 [btrfs]
    [21120.912181]  [<ffffffffa0520271>] ? btrfs_create_pending_block_groups+0x96/0x130 [btrfs]
    [21120.913784]  [<ffffffffa052028a>] btrfs_create_pending_block_groups+0xaf/0x130 [btrfs]
    [21120.915374]  [<ffffffffa052ffc2>] __btrfs_end_transaction+0x84/0x366 [btrfs]
    [21120.916735]  [<ffffffffa05302b4>] btrfs_end_transaction+0x10/0x12 [btrfs]
    [21120.917996]  [<ffffffffa051ab26>] btrfs_check_data_free_space+0x11f/0x27c [btrfs]
    [21120.919478]  [<ffffffffa051ba25>] btrfs_delalloc_reserve_space+0x1e/0x51 [btrfs]
    [21120.921226]  [<ffffffffa05382f2>] btrfs_truncate_page+0x85/0x2c4 [btrfs]
    [21120.923121]  [<ffffffffa0538572>] btrfs_cont_expand+0x41/0x3ef [btrfs]
    [21120.924449]  [<ffffffffa0541091>] ? btrfs_file_write_iter+0x19a/0x431 [btrfs]
    [21120.926602]  [<ffffffff8107b024>] ? arch_local_irq_save+0x9/0xc
    [21120.927769]  [<ffffffffa0541091>] ? btrfs_file_write_iter+0x19a/0x431 [btrfs]
    [21120.929324]  [<ffffffffa05410a0>] ? btrfs_file_write_iter+0x1a9/0x431 [btrfs]
    [21120.930723]  [<ffffffffa05410d9>] btrfs_file_write_iter+0x1e2/0x431 [btrfs]
    [21120.931897]  [<ffffffff81067d85>] ? get_parent_ip+0xe/0x3e
    [21120.934446]  [<ffffffff811534c3>] new_sync_write+0x7c/0xa0
    [21120.935528]  [<ffffffff81153b58>] vfs_write+0xb2/0x117
    (...)
    
    Fixes: 1bbc621e ("Btrfs: allow block group cache writeout
                          outside critical section in commit")
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>