• ethanwu's avatar
    btrfs: backref, use correct count to resolve normal data refs · b25b0b87
    ethanwu authored
    With the following patches:
    
    - btrfs: backref, only collect file extent items matching backref offset
    - btrfs: backref, not adding refs from shared block when resolving normal backref
    - btrfs: backref, only search backref entries from leaves of the same root
    
    we only collect the normal data refs we want, so the imprecise upper
    bound total_refs of that EXTENT_ITEM could now be changed to the count
    of the normal backref entry we want to search.
    
    Background and how the patches fit together:
    
    Btrfs has two types of data backref.
    For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
    exact block number. Therefore, we need to call resolve_indirect_refs.
    It uses btrfs_search_slot to locate the leaf block. Then
    we need to walk through the leaves to search for the EXTENT_DATA items
    that have disk bytenr matching the extent item (add_all_parents).
    
    When resolving indirect refs, we could take entries that don't
    belong to the backref entry we are searching for right now.
    For that reason when searching backref entry, we always use total
    refs of that EXTENT_ITEM rather than individual count.
    
    For example:
    item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize
      extent refs 24 gen 7302 flags DATA
      shared data backref parent 394985472 count 10 #1
      extent data backref root 257 objectid 260 offset 1048576 count 3 #2
      extent data backref root 256 objectid 260 offset 65536 count 6 #3
      extent data backref root 257 objectid 260 offset 65536 count 5 #4
    
    For example, when searching backref entry #4, we'll use total_refs
    24, a very loose loop ending condition, instead of total_refs = 5.
    
    But using total_refs = 24 is not accurate. Sometimes, we'll never find
    all the refs from specific root.  As a result, the loop keeps on going
    until we reach the end of that inode.
    
    The first 3 patches, handle 3 different types refs we might encounter.
    These refs do not belong to the normal backref we are searching, and
    hence need to be skipped.
    
    This patch changes the total_refs to correct number so that we could
    end loop as soon as we find all the refs we want.
    
    btrfs send uses backref to find possible clone sources, the following
    is a simple test to compare the results with and without this patch:
    
     $ btrfs subvolume create /sub1
     $ for i in `seq 1 163840`; do
         dd if=/dev/zero of=/sub1/file bs=64K count=1 seek=$((i-1)) conv=notrunc oflag=direct
       done
     $ btrfs subvolume snapshot /sub1 /sub2
     $ for i in `seq 1 163840`; do
         dd if=/dev/zero of=/sub1/file bs=4K count=1 seek=$(((i-1)*16+10)) conv=notrunc oflag=direct
       done
     $ btrfs subvolume snapshot -r /sub1 /snap1
     $ time btrfs send /snap1 | btrfs receive /volume2
    
    Without this patch:
    
    real 69m48.124s
    user 0m50.199s
    sys  70m15.600s
    
    With this patch:
    
    real    1m59.683s
    user    0m35.421s
    sys     2m42.684s
    Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
    Signed-off-by: default avatarethanwu <ethanwu@synology.com>
    [ add patchset cover letter with background and numbers ]
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    b25b0b87
backref.c 60.5 KB