    btrfs: send: avoid unaligned encoded writes when attempting to clone range · a11452a3
    Filipe Manana authored
    When trying to see if we can clone a file range, there are cases where we
    end up sending two write operations in case the inode from the source root
    has an i_size that is not sector size aligned and the length from the
    current offset to its i_size is less than the remaining length we are
    trying to clone.
    
    Issuing two write operations when we could instead issue a single write
    operation is not incorrect. However, it is not optimal, especially if the
    extents are compressed and the flag BTRFS_SEND_FLAG_COMPRESSED was passed
    to the send ioctl. In that case we can end up sending an encoded write
    with an offset that is not sector size aligned, which makes the receiver
    fall back to decompressing the data and writing it using regular buffered
    IO (so re-compressing the data in case the fs is mounted with compression
    enabled), because encoded writes fail with -EINVAL when an offset is not
    sector size aligned.
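
    The alignment condition can be sketched as follows. This is only an
    illustration, not the kernel or btrfs-progs code: the helper name
    is_sector_aligned is hypothetical, a 4K sector size is assumed, and the
    two offsets are the ones that appear in the receive output of the example
    in this changelog.

```shell
#!/bin/bash
# Assumption: 4K sector size (the default on most systems).
SECTORSIZE=4096

# Hypothetical helper: succeeds when the offset is a multiple of the
# sector size, fails otherwise.
is_sector_aligned() {
    local offset=$1
    (( offset % SECTORSIZE == 0 ))
}

for offset in 32768 33792; do
    if is_sector_aligned "$offset"; then
        echo "offset $offset: sector size aligned"
    else
        echo "offset $offset: not aligned, an encoded write here fails with -EINVAL"
    fi
done
```

    Only the second offset (33792 = 33K) is rejected, since 33K is not a
    multiple of 4K.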
    
    The following example shows the non-optimal behaviour caused by an
    unaligned encoded write. It also triggered a bug in the receiver's
    fallback logic (decompression + regular buffered IO), which is fixed by
    the patchset referred to by the Link tag at the bottom of this changelog:
    
       $ cat test.sh
       #!/bin/bash
    
       DEV=/dev/sdj
       MNT=/mnt/sdj
    
       mkfs.btrfs -f $DEV > /dev/null
       mount -o compress $DEV $MNT
    
       # File foo has a size of 33K, not aligned to the sector size.
       xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
    
       xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
    
       # Now clone the first 32K of file bar into foo at offset 0.
       xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
    
       # Snapshot the default subvolume and create a full send stream (v2).
       btrfs subvolume snapshot -r $MNT $MNT/snap
    
       btrfs send --compressed-data -f /tmp/test.send $MNT/snap
    
       echo -e "\nFile bar in the original filesystem:"
       od -A d -t x1 $MNT/snap/bar
    
       umount $MNT
       mkfs.btrfs -f $DEV > /dev/null
       mount $DEV $MNT
    
       echo -e "\nReceiving stream in a new filesystem..."
       btrfs receive -f /tmp/test.send $MNT
    
       echo -e "\nFile bar in the new filesystem:"
       od -A d -t x1 $MNT/snap/bar
    
       umount $MNT
    
    Before this patch, the send stream included one regular write and one
    encoded write for file 'bar', with the latter not being sector size
    aligned and causing the receiver to fall back to decompression + buffered
    writes.
    The output of the btrfs receive command in verbose mode (-vvv):
    
       (...)
       mkfile o258-7-0
       rename o258-7-0 -> bar
       utimes
       clone bar - source=foo source offset=0 offset=0 length=32768
       write bar - offset=32768 length=1024
       encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
       encoded_write bar - falling back to decompress and write due to errno 22 ("Invalid argument")
       (...)
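
    The lengths in the output above follow from the i_size of the clone
    source. A minimal sketch of that arithmetic, assuming the values from the
    example (source i_size of 33K, 32K remaining at offset 32K); the helper
    name split_at_i_size is hypothetical and this is a simplification, not
    the logic in send.c:

```shell
#!/bin/bash
# Values taken from the example: foo (the clone source) has an i_size of
# 33K, and 32K of bar remain to be sent starting at file offset 32K.
SRC_I_SIZE=$((33 * 1024))
OFFSET=$((32 * 1024))
REMAINING=$((32 * 1024))

# Hypothetical helper: prints the lengths of the two resulting writes.
# The range is split only when the distance from the offset to the source
# i_size is positive and smaller than the remaining length.
split_at_i_size() {
    local i_size=$1 offset=$2 remaining=$3
    local first=$(( i_size - offset ))
    if (( first > 0 && first < remaining )); then
        echo "$first $(( remaining - first ))"
    else
        echo "$remaining 0"
    fi
}

read -r first second <<< "$(split_at_i_size "$SRC_I_SIZE" "$OFFSET" "$REMAINING")"
echo "write:         offset=$OFFSET length=$first"
echo "encoded_write: offset=$(( OFFSET + first )) unencoded_file_len=$second"
```

    With these inputs the split is 1024 + 31744 bytes, and the second write
    starts at offset 33792, which is not sector size aligned.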
    
    This patch avoids the regular write followed by an unaligned encoded write
    so that we end up sending a single encoded write that is aligned. So after
    this patch the stream content is (output of btrfs receive -vvv):
    
       (...)
       mkfile o258-7-0
       rename o258-7-0 -> bar
       utimes
       clone bar - source=foo source offset=0 offset=0 length=32768
       encoded_write bar - offset=32768, len=4096, unencoded_offset=32768, unencoded_file_len=32768, unencoded_len=65536, compression=1, encryption=0
       (...)
    
    So we get more optimal behaviour and avoid the silent data loss bug in
    versions of btrfs-progs affected by the bug referred to by the Link tag
    below (btrfs-progs v5.19, v5.19.1, v6.0 and v6.0.1).
    
    Link: https://lore.kernel.org/linux-btrfs/cover.1668529099.git.fdmanana@suse.com/
    Reviewed-by: Boris Burkov <boris@bur.io>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>