    btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error · 4dbd80fb
    Qu Wenruo authored
    [BUG]
    When btrfs_reloc_clone_csum() reports an error, it can underflow
    metadata and lead to a kernel assertion failure on outstanding extents
    in run_delalloc_nocow() and cow_file_range().
    
     BTRFS info (device vdb5): relocating block group 12582912 flags data
     BTRFS info (device vdb5): found 1 extents
     assertion failed: inode->outstanding_extents >= num_extents, file: fs/btrfs//extent-tree.c, line: 5858
    
    Currently, due to another bug blocking ordered extents, the bug is only
    reproducible with a certain block group layout and with error injection.
    
    a) Create one data block group with one 4K extent in it.
       This avoids the other bug, which hangs btrfs due to an ordered
       extent that never finishes
    b) Make btrfs_reloc_clone_csum() always fail
    c) Relocate that block group
    
    [CAUSE]
    run_delalloc_nocow() and cow_file_range() handle errors from
    btrfs_reloc_clone_csum() incorrectly:
    
    (The ASCII chart shows a more generic case of this bug than the one
    mentioned above)
    
    |<------------------ delalloc range --------------------------->|
    | OE 1 | OE 2 | ... | OE n |
                        |<----------- cleanup range --------------->|
    |<-----------  ----------->|
                 \/
     btrfs_finish_ordered_io() range
    
    So the error handler, which calls extent_clear_unlock_delalloc() with
    the EXTENT_DELALLOC and EXTENT_DO_ACCOUNT bits, and
    btrfs_finish_ordered_io() will both cover OE n and free its metadata,
    causing the metadata underflow.
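
The double release described above can be illustrated with a minimal userspace sketch. It is not kernel code: `toy_inode`, `toy_release_range()` and `toy_error_path()` are hypothetical names invented for this illustration, and a plain counter stands in for the per-inode outstanding extents accounting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode's accounting; not the kernel struct. */
struct toy_inode {
    long outstanding_extents;
};

/* Release one reservation for every ordered extent index in [first, last). */
void toy_release_range(struct toy_inode *inode, size_t first, size_t last)
{
    for (size_t i = first; i < last; i++)
        inode->outstanding_extents--;
}

/*
 * Model the error path after n ordered extents (OE) were created.
 * The finish-ordered-io path releases all n of them; the error handler
 * additionally releases everything from index cleanup_first onwards.
 * A negative return value models the underflow.
 */
long toy_error_path(size_t n, size_t cleanup_first)
{
    struct toy_inode inode = { .outstanding_extents = (long)n };

    assert(cleanup_first <= n);
    toy_release_range(&inode, 0, n);             /* finish-ordered-io range */
    toy_release_range(&inode, cleanup_first, n); /* error handler cleanup range */
    return inode.outstanding_extents;
}
```

With three ordered extents and a cleanup range that starts at the last one, `toy_error_path(3, 2)` returns -1, because OE n is released twice. A cleanup range that starts past the last ordered extent, `toy_error_path(3, 3)`, returns 0.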
    
    [FIX]
    The fix is to ensure that, after calling btrfs_add_ordered_extent(), we
    only call the error handler after increasing the iteration offset, so
    that the cleanup range won't cover any created ordered extent.
    
    |<------------------ delalloc range --------------------------->|
    | OE 1 | OE 2 | ... | OE n |
    |<-----------  ----------->|<---------- cleanup range --------->|
                 \/
     btrfs_finish_ordered_io() range
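
The ordering in the fix can be sketched in userspace as follows. This is a simplified model, not the actual loop in run_delalloc_nocow() or cow_file_range(): `toy_cleanup_start()` and all of its parameters are hypothetical, and fixed-size ordered extents are an assumption made only for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: split the delalloc range [start, end) into ordered
 * extents of at most oe_len bytes. The step following the creation of
 * extent number fail_at (0-based) fails. Returns the offset at which the
 * error handler's cleanup range begins, or end if nothing failed.
 */
uint64_t toy_cleanup_start(uint64_t start, uint64_t end, uint64_t oe_len,
                           uint64_t fail_at, bool advance_before_error)
{
    uint64_t offset = start;
    uint64_t index = 0;

    assert(oe_len > 0 && start <= end);
    while (offset < end) {
        uint64_t len = (end - offset < oe_len) ? end - offset : oe_len;
        /* At this point an ordered extent covers [offset, offset + len). */
        bool failed = (index == fail_at);

        if (failed && !advance_before_error)
            return offset; /* old ordering: cleanup overlaps this extent */

        offset += len;     /* increase the iteration offset first */
        index++;
        if (failed)
            return offset; /* new ordering: cleanup starts past the extent */
    }
    return end;
}
```

For a 16K range split into 4K extents with the failure on the second extent, the old ordering starts cleanup at 4096, inside the range already owned by that ordered extent, while the new ordering starts it at 8192.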
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: Liu Bo <bo.li.liu@oracle.com>