    btrfs: send: optimize clone detection to increase extent sharing · c7499a64
    Filipe Manana authored
    Currently send does not make the best decisions when choosing between
    multiple clone sources, which results in clone operations for partial
    extent ranges. This has the following disadvantages:
    
    1) We get fewer shared extents at the destination;
    
    2) We have to read more data during the send operation and emit more
       write commands.
    
    Besides not being optimal behaviour, it also breaks user expectations and
    is often reported by users, with a recent example in the Link tag at the
    bottom of this change log.
    
    Part of the reason for this non-optimal behaviour is that the backref
    walking code does not provide information about the length of the file
    extent items that were found for each backref, so send is blind about
    which backref is the best to choose as a cloning source.
    
    The other existing reasons are just silliness, namely always preferring
    the inode with the lowest number when multiple are found for the same
    root and, when we can clone from multiple roots, always preferring the
    send root over any of the other clone roots. This does not make any
    sense since any inode or root is as good as any other inode or root.
    
    Fix this by making backref walking pass information about the number of
    bytes referenced by each file extent item and then have send's backref
    callback pick the inode with the highest number of bytes for each root.
    Finally select the root from which we can clone the most bytes.
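
    The two-step selection described above can be sketched in plain C as
    follows. This is only an illustration of the idea, not the code in
    send.c: the struct and function names (backref_hit, clone_candidate,
    record_backref, pick_clone_root) are hypothetical, and ties are
    resolved by keeping the first candidate seen, which is an assumption
    made here for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one backref found for an extent. */
struct backref_hit {
	uint64_t root;      /* id of the root holding the file extent item */
	uint64_t ino;       /* inode number referencing the extent */
	uint64_t num_bytes; /* bytes referenced by that file extent item */
};

/* Best candidate seen so far for one clone root (hypothetical). */
struct clone_candidate {
	uint64_t root;
	uint64_t ino;
	uint64_t num_bytes;
	int found;
};

/*
 * Step 1: for the root a backref belongs to, keep the inode whose file
 * extent item references the most bytes. Backrefs for roots that are not
 * in the candidate array are ignored.
 */
static void record_backref(struct clone_candidate *cands, size_t nr_roots,
			   const struct backref_hit *hit)
{
	for (size_t i = 0; i < nr_roots; i++) {
		struct clone_candidate *c = &cands[i];

		if (c->root != hit->root)
			continue;
		if (!c->found || hit->num_bytes > c->num_bytes) {
			c->ino = hit->ino;
			c->num_bytes = hit->num_bytes;
			c->found = 1;
		}
		return;
	}
}

/*
 * Step 2: among the roots that have a candidate, pick the one we can
 * clone the most bytes from. Returns NULL if no root has a candidate.
 */
static const struct clone_candidate *
pick_clone_root(const struct clone_candidate *cands, size_t nr_roots)
{
	const struct clone_candidate *best = NULL;

	for (size_t i = 0; i < nr_roots; i++) {
		if (!cands[i].found)
			continue;
		if (!best || cands[i].num_bytes > best->num_bytes)
			best = &cands[i];
	}
	return best;
}
```

    A caller would fill the candidate array with the ids of the clone
    roots, call record_backref() once per backref reported by the backref
    walk, and then call pick_clone_root() to choose the clone source.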
    
    Example reproducer:
    
       $ cat test.sh
       #!/bin/bash
    
       DEV=/dev/sdi
       MNT=/mnt/sdi
    
       mkfs.btrfs -f $DEV
       mount $DEV $MNT
    
       xfs_io -f -c "pwrite -S 0xab -b 2M 0 2M" $MNT/foo
       cp --reflink=always $MNT/foo $MNT/bar
       cp --reflink=always $MNT/foo $MNT/baz
       sync
    
       # Overwrite the second half of file foo.
       xfs_io -c "pwrite -S 0xcd -b 1M 1M 1M" $MNT/foo
       sync
    
       echo
       echo "*** fiemap in the original filesystem ***"
       echo
       xfs_io -c "fiemap -v" $MNT/foo
       xfs_io -c "fiemap -v" $MNT/bar
       xfs_io -c "fiemap -v" $MNT/baz
       echo
    
       btrfs filesystem du $MNT
    
       btrfs subvolume snapshot -r $MNT $MNT/snap
    
       btrfs send -f /tmp/send_stream $MNT/snap
    
       umount $MNT
       mkfs.btrfs -f $DEV &> /dev/null
       mount $DEV $MNT
    
       btrfs receive -f /tmp/send_stream $MNT
    
       echo
       echo "*** fiemap in the new filesystem ***"
       echo
       xfs_io -r -c "fiemap -v" $MNT/snap/foo
       xfs_io -r -c "fiemap -v" $MNT/snap/bar
       xfs_io -r -c "fiemap -v" $MNT/snap/baz
       echo
    
       btrfs filesystem du $MNT
    
       rm -f /tmp/send_stream
       rm -f /tmp/snap.fssum
    
       umount $MNT
    
    Before this change:
    
       $ ./test.sh
       (...)
    
       *** fiemap in the original filesystem ***
    
       /mnt/sdi/foo:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..2047]:       26624..28671      2048 0x2000
          1: [2048..4095]:    30720..32767      2048   0x1
       /mnt/sdi/bar:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..4095]:       26624..30719      4096 0x2001
       /mnt/sdi/baz:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..4095]:       26624..30719      4096 0x2001
    
            Total   Exclusive  Set shared  Filename
          2.00MiB     1.00MiB           -  /mnt/sdi/foo
          2.00MiB       0.00B           -  /mnt/sdi/bar
          2.00MiB       0.00B           -  /mnt/sdi/baz
          6.00MiB     1.00MiB     2.00MiB  /mnt/sdi
    
       Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap'
       At subvol /mnt/sdi/snap
       At subvol snap
    
       *** fiemap in the new filesystem ***
    
       /mnt/sdi/snap/foo:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..4095]:       26624..30719      4096 0x2001
       /mnt/sdi/snap/bar:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..2047]:       26624..28671      2048 0x2000
          1: [2048..4095]:    30720..32767      2048   0x1
       /mnt/sdi/snap/baz:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..2047]:       26624..28671      2048 0x2000
          1: [2048..4095]:    32768..34815      2048   0x1
    
            Total   Exclusive  Set shared  Filename
          2.00MiB       0.00B           -  /mnt/sdi/snap/foo
          2.00MiB     1.00MiB           -  /mnt/sdi/snap/bar
          2.00MiB     1.00MiB           -  /mnt/sdi/snap/baz
          6.00MiB     2.00MiB           -  /mnt/sdi/snap
          6.00MiB     2.00MiB     2.00MiB  /mnt/sdi
    
    We end up with two 1M extents that are not shared for files bar and baz.
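
    For readers less familiar with the fiemap output above, the sketch
    below shows how the FLAGS and TOTAL columns can be interpreted. The
    flag values are the ones defined in <linux/fiemap.h>; the helper
    names are hypothetical, and the 512-byte block size is inferred from
    the 2M files being reported as 4096 blocks.

```c
#include <stdint.h>

/* Flag values as defined in <linux/fiemap.h>. */
#define FIEMAP_EXTENT_LAST   0x00000001u /* last extent of the file */
#define FIEMAP_EXTENT_SHARED 0x00002000u /* extent shared with other files */

/* Block size used in the fiemap output (2M file == 4096 blocks). */
#define FIEMAP_BLOCK_BYTES 512u

/* Non-zero if the extent is shared, e.g. flags 0x2000 or 0x2001. */
static int extent_is_shared(uint32_t flags)
{
	return (flags & FIEMAP_EXTENT_SHARED) != 0;
}

/* Non-zero if this is the last extent of the file, e.g. 0x1 or 0x2001. */
static int extent_is_last(uint32_t flags)
{
	return (flags & FIEMAP_EXTENT_LAST) != 0;
}

/* Convert the TOTAL column (in blocks) to bytes: 2048 blocks is 1 MiB. */
static uint64_t blocks_to_bytes(uint64_t blocks)
{
	return blocks * FIEMAP_BLOCK_BYTES;
}
```

    With these helpers, a second extent reported with flags 0x1 and a
    total of 2048 blocks is a 1 MiB extent that is the last one in the
    file and is not shared.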
    
    After this change:
    
       $ ./test.sh
       (...)
    
       *** fiemap in the original filesystem ***
    
       /mnt/sdi/foo:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..2047]:       26624..28671      2048 0x2000
          1: [2048..4095]:    30720..32767      2048   0x1
       /mnt/sdi/bar:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..4095]:       26624..30719      4096 0x2001
       /mnt/sdi/baz:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..4095]:       26624..30719      4096 0x2001
    
            Total   Exclusive  Set shared  Filename
          2.00MiB     1.00MiB           -  /mnt/sdi/foo
          2.00MiB       0.00B           -  /mnt/sdi/bar
          2.00MiB       0.00B           -  /mnt/sdi/baz
          6.00MiB     1.00MiB     2.00MiB  /mnt/sdi
       Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap'
       At subvol /mnt/sdi/snap
       At subvol snap
    
       *** fiemap in the new filesystem ***
    
       /mnt/sdi/snap/foo:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..4095]:       26624..30719      4096 0x2001
       /mnt/sdi/snap/bar:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..2047]:       26624..28671      2048 0x2000
          1: [2048..4095]:    30720..32767      2048 0x2001
       /mnt/sdi/snap/baz:
        EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
          0: [0..2047]:       26624..28671      2048 0x2000
          1: [2048..4095]:    30720..32767      2048 0x2001
    
            Total   Exclusive  Set shared  Filename
          2.00MiB       0.00B           -  /mnt/sdi/snap/foo
          2.00MiB       0.00B           -  /mnt/sdi/snap/bar
          2.00MiB       0.00B           -  /mnt/sdi/snap/baz
          6.00MiB       0.00B           -  /mnt/sdi/snap
          6.00MiB       0.00B     3.00MiB  /mnt/sdi
    
    Now the sharing is much better: files bar and baz share 1M of the
    extent of file foo, and the second extent of files bar and baz is
    shared between them.
    
    This will later be turned into a test case for fstests.
    
    Link: https://lore.kernel.org/linux-btrfs/20221008005704.795b44b0@crass-HP-ZBook-15-G2/
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>