• Filipe Manana's avatar
    Btrfs: fix range cloning when same inode used as source and destination · df858e76
    Filipe Manana authored
    While searching for extents to clone we might find one where we only use
    a part of it coming from its tail. If our destination inode is the same
    the source inode, we end up removing the tail part of the extent item and
    insert after a new one that point to the same extent with an adjusted
    key file offset and data offset. After this we search for the next extent
    item in the fs/subvol tree with a key that has an offset incremented by
    one. But this second search leaves us at the new extent item we inserted
    previously, and since that extent item has a non-zero data offset, it
    it can make us call btrfs_drop_extents with an empty range (start == end)
    which causes the following warning:
    
    [23978.537119] WARNING: CPU: 6 PID: 16251 at fs/btrfs/file.c:550 btrfs_drop_extent_cache+0x43/0x385 [btrfs]()
    (...)
    [23978.557266] Call Trace:
    [23978.557978]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
    [23978.559191]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
    [23978.560699]  [<ffffffffa047f0ea>] ? btrfs_drop_extent_cache+0x43/0x385 [btrfs]
    [23978.562389]  [<ffffffff8104544d>] warn_slowpath_null+0x1a/0x1c
    [23978.563613]  [<ffffffffa047f0ea>] btrfs_drop_extent_cache+0x43/0x385 [btrfs]
    [23978.565103]  [<ffffffff810e3a18>] ? time_hardirqs_off+0x15/0x28
    [23978.566294]  [<ffffffff81079ff8>] ? trace_hardirqs_off+0xd/0xf
    [23978.567438]  [<ffffffffa047f73d>] __btrfs_drop_extents+0x6b/0x9e1 [btrfs]
    [23978.568702]  [<ffffffff8107c03f>] ? trace_hardirqs_on+0xd/0xf
    [23978.569763]  [<ffffffff811441c0>] ? ____cache_alloc+0x69/0x2eb
    [23978.570817]  [<ffffffff81142269>] ? virt_to_head_page+0x9/0x36
    [23978.571872]  [<ffffffff81143c15>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
    [23978.573466]  [<ffffffff811420d5>] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18
    [23978.574962]  [<ffffffffa0480d07>] btrfs_drop_extents+0x66/0x7f [btrfs]
    [23978.576179]  [<ffffffffa049aa35>] btrfs_clone+0x516/0xaf5 [btrfs]
    [23978.577311]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
    [23978.578520]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
    [23978.580282]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
    (...)
    [23978.591887] ---[ end trace 988ec2a653d03ed3 ]---
    
    Then we attempt to insert a new extent item with a key that already
    exists, which makes btrfs_insert_empty_item return -EEXIST resulting in
    abortion of the current transaction:
    
    [23978.594355] WARNING: CPU: 6 PID: 16251 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
    (...)
    [23978.622589] Call Trace:
    [23978.623181]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
    [23978.624359]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
    [23978.625573]  [<ffffffffa044ab6c>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
    [23978.626971]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
    [23978.628003]  [<ffffffff8108a6c8>] ? vprintk_default+0x1d/0x1f
    [23978.629138]  [<ffffffffa044ab6c>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
    [23978.630528]  [<ffffffffa049ad1b>] btrfs_clone+0x7fc/0xaf5 [btrfs]
    [23978.631635]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
    [23978.632886]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
    [23978.634119]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
    (...)
    [23978.647714] ---[ end trace 988ec2a653d03ed4 ]---
    
    This is wrong because we should not process the extent item that we just
    inserted previously, and instead process the extent item that follows it
    in the tree
    
    For example for the test case I wrote for fstests:
    
       bs=$((64 * 1024))
       mkfs.btrfs -f -l $bs -O ^no-holes /dev/sdc
       mount /dev/sdc /mnt
    
       xfs_io -f -c "pwrite -S 0xaa $(($bs * 2)) $(($bs * 2))" /mnt/foo
    
       $CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 /mnt/foo /mnt/foo
       $CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 /mnt/foo /mnt/foo
    
    The second clone call fails with -EEXIST, because when we process the
    first extent item (offset 262144), we drop part of it (counting from the
    end) and then insert a new extent item with a key greater then the key we
    found. The next time we search the tree we search for a key with offset
    262144 + 1, which leaves us at the new extent item we have just inserted
    but we think it refers to an extent that we need to clone.
    
    Fix this by ensuring the next search key uses an offset corresponding to
    the offset of the key we found previously plus the data length of the
    corresponding extent item. This ensures we skip new extent items that we
    inserted and works for the case of implicit holes too (NO_HOLES feature).
    
    A test case for fstests follows soon.
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarChris Mason <clm@fb.com>
    df858e76
ioctl.c 128 KB