1. 05 Dec, 2022 40 commits
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_clear_delalloc_extent · bd54766e
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      bd54766e
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_split_delalloc_extent · 62798a49
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      62798a49
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_set_delalloc_extent · 4c5d166f
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      4c5d166f
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_merge_delalloc_extent · 2454151c
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      2454151c
    • David Sterba's avatar
      btrfs: switch extent_io_tree::private_data to btrfs_inode and rename · 0988fc7b
      David Sterba authored
      The extent_io_tree::private_data was meant to be a preparatory work for
      the metadata inode rework but that never materialized. Now it's used
      only for an inode so it's better to change the appropriate type and
      rename it.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      0988fc7b
    • David Sterba's avatar
      btrfs: drop private_data parameter from extent_io_tree_init · 35da5a7e
      David Sterba authored
      All callers except one pass NULL, so the parameter can be dropped and
      the inode::io_tree initialization can be open coded.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      35da5a7e
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_delete_subvolume · 3c4f91e2
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      3c4f91e2
    • David Sterba's avatar
      btrfs: pass btrfs_inode to __unlink_start_trans · e569b1d5
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e569b1d5
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_check_data_csum · 621af94a
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      621af94a
    • David Sterba's avatar
      btrfs: switch btrfs_writepage_fixup::inode to btrfs_inode · 36eeaef5
      David Sterba authored
      The btrfs_writepage_fixup structure is for internal interfaces so we
      should use the btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      36eeaef5
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_add_delalloc_inodes · 82ca5a04
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      82ca5a04
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_dirty_inode · 7152b425
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      7152b425
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_inode_unlock · e5d4d75b
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e5d4d75b
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_inode_lock · 29b6352b
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      29b6352b
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_truncate · d9dcae67
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d9dcae67
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_submit_dio_bio · bb41632e
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      bb41632e
    • David Sterba's avatar
      btrfs: switch btrfs_dio_private::inode to btrfs_inode · e2884c3d
      David Sterba authored
      The btrfs_dio_private structure is for internal interfaces so we should
      use the btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e2884c3d
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_repair_one_sector · d8f9268e
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d8f9268e
    • David Sterba's avatar
      btrfs: pass btrfs_inode to submit_one_bio · c5ca391b
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      c5ca391b
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_submit_dio_repair_bio · d781c1c3
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d781c1c3
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_submit_data_read_bio · b7620416
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      b7620416
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_submit_data_write_bio · 535a7e5d
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      535a7e5d
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_submit_metadata_bio · 644094fd
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      644094fd
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_wq_submit_bio · 5fcdadc2
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      5fcdadc2
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_submit_bio_start_direct_io · bfa17066
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      bfa17066
    • David Sterba's avatar
      btrfs: pass btrfs_inode to btrfs_submit_bio_start · 882681ac
      David Sterba authored
      The function is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      882681ac
    • David Sterba's avatar
      btrfs: switch async_submit_bio::inode to btrfs_inode · da67daab
      David Sterba authored
      The async bio submit is for internal interfaces so we should use the
      btrfs_inode.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      da67daab
    • David Sterba's avatar
      btrfs: simplify btree_submit_bio_start and btrfs_submit_bio_start parameters · ad65ecf3
      David Sterba authored
      After previous patches the unused parameters can be removed from
      btree_submit_bio_start and btrfs_submit_bio_start as they don't need to
      conform to the extent_submit_bio_start_t typedef.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ad65ecf3
    • David Sterba's avatar
      btrfs: change how submit bio callback is passed to btrfs_wq_submit_bio · ab2072b2
      David Sterba authored
      There's a callback function parameter for btrfs_wq_submit_bio that can
      be one of: metadata, buffered data, direct io data. The callback
      abstraction is unnecessary as we have all functions available.
      
      Replace the parameter with a command that leads to a direct call in
      run_one_async_start. The called functions can be then simplified and we
      can also remove the extent_submit_bio_start_t typedef.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ab2072b2
    • David Sterba's avatar
      btrfs: drop parameter compression_type from btrfs_submit_dio_repair_bio · 7920b773
      David Sterba authored
      Compression and direct io don't work together so the compression
      parameter can be dropped after previous patch that changed the call
      to direct.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      7920b773
    • David Sterba's avatar
      btrfs: change how repair action is passed to btrfs_repair_one_sector · 19af6a7d
      David Sterba authored
      There's a function pointer passed to btrfs_repair_one_sector that will
      submit the right bio for repair. However there are only two callbacks,
      for buffered and for direct IO. This can be simplified to a bool-based
      switch and call either function, indirect calls in this case is an
      unnecessary abstraction. This allows to remove the submit_bio_hook_t
      typedef.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      19af6a7d
    • David Sterba's avatar
      btrfs: convert btrfs_block_group::seq_zone to runtime flag · 961f5b8b
      David Sterba authored
      In zoned mode the sequential status of zone can be also tracked in the
      runtime flags of block group.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      961f5b8b
    • David Sterba's avatar
      btrfs: convert btrfs_block_group::needs_free_space to runtime flag · 0d7764ff
      David Sterba authored
      We already have flags in block group to track various status bits,
      convert needs_free_space as well and reduce size of btrfs_block_group.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      0d7764ff
    • David Sterba's avatar
      btrfs: zoned: use helper to check a power of two zone size · fd463ac4
      David Sterba authored
      We have a 64bit compatible helper to check if a value is a power of two,
      use it instead of open coding it.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      fd463ac4
    • David Sterba's avatar
      btrfs: zlib: use copy_page for full page copy · 9e5e6d4e
      David Sterba authored
      The copy_page helper may use an optimized version for full page copy
      (eg. on s390 there's a special instruction for that), there's one more
      left to convert.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      9e5e6d4e
    • Filipe Manana's avatar
      btrfs: send: bump the extent reference count limit for backref walking · e2a04165
      Filipe Manana authored
      After the previous patchset which is comprised of the following patches:
      
        01/17 btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
        02/17 btrfs: fix inode list leak during backref walking at find_parent_nodes()
        03/17 btrfs: fix ulist leaks in error paths of qgroup self tests
        04/17 btrfs: remove pointless and double ulist frees in error paths of qgroup tests
        05/17 btrfs: send: avoid unnecessary path allocations when finding extent clone
        06/17 btrfs: send: update comment at find_extent_clone()
        07/17 btrfs: send: drop unnecessary backref context field initializations
        08/17 btrfs: send: avoid unnecessary backref lookups when finding clone source
        09/17 btrfs: send: optimize clone detection to increase extent sharing
        10/17 btrfs: use a single argument for extent offset in backref walking functions
        11/17 btrfs: use a structure to pass arguments to backref walking functions
        12/17 btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes()
        13/17 btrfs: constify ulist parameter of ulist_next()
        14/17 btrfs: send: cache leaf to roots mapping during backref walking
        15/17 btrfs: send: skip unnecessary backref iterations
        16/17 btrfs: send: avoid double extent tree search when finding clone source
        17/17 btrfs: send: skip resolution of our own backref when finding clone source
      
      we have now much better performance when doing backref walking in the send
      code, so we can increase the current limit from 64 to 1024 references.
      This limit is still a bit conservative because there are still edge cases
      where backref walking will be too slow and spend a lot of cpu time, some IO
      reading b+tree nodes/leaves and memory. The goal is to eventually get rid
      of any limit, but for now bump it as it benefits users with extents shared
      more than 64 times and up to 1024 times, allowing for more deduplication
      at the destination without having to run a dedupe tool after a receive.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e2a04165
    • Filipe Manana's avatar
      btrfs: send: skip resolution of our own backref when finding clone source · adf02418
      Filipe Manana authored
      When doing backref walking to determine a source range to clone from, it
      is worthless to collect and resolve our own data backref, as we can't
      obviously use it as a clone source and it represents the range we want to
      clone into. Collecting the backref implies doing the extra work to resolve
      it, doing the search for a file extent item in a subvolume tree, etc.
      Skipping the data backref is valid as long as we only have the send root
      as the single clone root, otherwise the leaf with the file extent item may
      be accessible from another clone root due to shared subtrees created by
      snapshots, and therefore we have to collect the backref and resolve it.
      
      So add a callback to the backref walking code to guide it to skip data
      backrefs.
      
      This change is part of a patchset comprised of the following patches:
      
        01/17 btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
        02/17 btrfs: fix inode list leak during backref walking at find_parent_nodes()
        03/17 btrfs: fix ulist leaks in error paths of qgroup self tests
        04/17 btrfs: remove pointless and double ulist frees in error paths of qgroup tests
        05/17 btrfs: send: avoid unnecessary path allocations when finding extent clone
        06/17 btrfs: send: update comment at find_extent_clone()
        07/17 btrfs: send: drop unnecessary backref context field initializations
        08/17 btrfs: send: avoid unnecessary backref lookups when finding clone source
        09/17 btrfs: send: optimize clone detection to increase extent sharing
        10/17 btrfs: use a single argument for extent offset in backref walking functions
        11/17 btrfs: use a structure to pass arguments to backref walking functions
        12/17 btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes()
        13/17 btrfs: constify ulist parameter of ulist_next()
        14/17 btrfs: send: cache leaf to roots mapping during backref walking
        15/17 btrfs: send: skip unnecessary backref iterations
        16/17 btrfs: send: avoid double extent tree search when finding clone source
        17/17 btrfs: send: skip resolution of our own backref when finding clone source
      
      The following test was run on non-debug kernel (Debian's default kernel
      config) before and after applying the patchset:
      
         $ cat test-send-many-shared-extents.sh
         #!/bin/bash
      
         DEV=/dev/sdh
         MNT=/mnt/sdh
      
         umount $DEV &> /dev/null
         mkfs.btrfs -f $DEV
         mount $DEV $MNT
      
         num_files=50000
         num_clones_per_file=50
      
         for ((i = 1; i <= $num_files; i++)); do
             xfs_io -f -c "pwrite 0 64K" $MNT/file_$i > /dev/null
             echo -ne "\r$i files created..."
         done
         echo
      
         btrfs subvolume snapshot -r $MNT $MNT/snap1
      
         cloned=0
         for ((i = 1; i <= $num_clones_per_file; i++)); do
             for ((j = 1; j <= $num_files; j++)); do
                 cp --reflink=always $MNT/file_$j $MNT/file_${j}_clone_${i}
                 cloned=$((cloned + 1))
                 echo -ne "\r$cloned / $((num_files * num_clones_per_file)) clone operations"
             done
         done
         echo
      
         btrfs subvolume snapshot -r $MNT $MNT/snap2
      
         # Unmount and mount again to clear all cached metadata (and data).
         umount $DEV
         mount $DEV $MNT
      
         start=$(date +%s%N)
         btrfs send $MNT/snap2 > /dev/null
         end=$(date +%s%N)
      
         dur=$(( (end - start) / 1000000000 ))
         echo -e "\nFull send took $dur seconds"
      
         # Unmount and mount again to clear all cached metadata (and data).
         umount $DEV
         mount $DEV $MNT
      
         start=$(date +%s%N)
         btrfs send -p $MNT/snap1 $MNT/snap2 > /dev/null
         end=$(date +%s%N)
      
         dur=$(( (end - start) / 1000000000 ))
         echo -e "\nIncremental send took $dur seconds"
      
         umount $MNT
      
      Before applying the patchset:
      
         (...)
         Full send took 1108 seconds
         (...)
         Incremental send took 1135 seconds
      
      After applying the whole patchset:
      
         (...)
         Full send took 268 seconds            (-75.8%)
         (...)
         Incremental send took 316 seconds     (-72.2%)
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      adf02418
    • Filipe Manana's avatar
      btrfs: send: avoid double extent tree search when finding clone source · f73853c7
      Filipe Manana authored
      At find_extent_clone() we search twice for the extent item corresponding
      to the data extent that the current file extent items points to:
      
      1) Once with a call to extent_from_logical();
      
      2) Once again during backref walking, through iterate_extent_inodes()
         which eventually leads to find_parent_nodes() where we will search
         again the extent tree for the same extent item.
      
      The extent tree can be huge, so doing this one extra search for every
      extent we want to send adds up and it's expensive.
      
      The first call is there since the send code was introduced and it
      accomplishes two things:
      
      1) Check that the extent is flagged as a data extent in the extent tree.
         But it can not be anything else, otherwise we wouldn't have a file
         extent item in the send root pointing to it.
         This was probably added to catch bugs in the early days where send was
         yet too young and the interaction with everything else was far from
         perfect;
      
      2) Check how many direct references there are on the extent, and if
         there's too many (more than SEND_MAX_EXTENT_REFS), avoid doing the
         backred walking as it may take too long and slowdown send.
      
      So improve on this by having a callback in the backref walking code that
      is called when it finds the extent item in the extent tree, and have those
      checks done in the callback. When the callback returns anything different
      from 0, it stops the backref walking code. This way we do a single search
      on the extent tree for the extent item of our data extent.
      
      Also, before this change we were only checking the number of references on
      the data extent against SEND_MAX_EXTENT_REFS, but after starting backref
      walking we will end up resolving backrefs for extent buffers in the path
      from a leaf having a file extent item pointing to our data extent, up to
      roots of trees from which the extent buffer is accessible from, due to
      shared subtrees resulting from snapshoting. We were therefore allowing for
      the possibility for send taking too long due to some node in the path from
      the leaf to a root node being shared too many times. After this change we
      check for reference counts being greater than SEND_MAX_EXTENT_REFS for
      both data extents and metadata extents.
      
      This change is part of a patchset comprised of the following patches:
      
        01/17 btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
        02/17 btrfs: fix inode list leak during backref walking at find_parent_nodes()
        03/17 btrfs: fix ulist leaks in error paths of qgroup self tests
        04/17 btrfs: remove pointless and double ulist frees in error paths of qgroup tests
        05/17 btrfs: send: avoid unnecessary path allocations when finding extent clone
        06/17 btrfs: send: update comment at find_extent_clone()
        07/17 btrfs: send: drop unnecessary backref context field initializations
        08/17 btrfs: send: avoid unnecessary backref lookups when finding clone source
        09/17 btrfs: send: optimize clone detection to increase extent sharing
        10/17 btrfs: use a single argument for extent offset in backref walking functions
        11/17 btrfs: use a structure to pass arguments to backref walking functions
        12/17 btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes()
        13/17 btrfs: constify ulist parameter of ulist_next()
        14/17 btrfs: send: cache leaf to roots mapping during backref walking
        15/17 btrfs: send: skip unnecessary backref iterations
        16/17 btrfs: send: avoid double extent tree search when finding clone source
        17/17 btrfs: send: skip resolution of our own backref when finding clone source
      
      Performance test results are in the changelog of patch 17/17.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      f73853c7
    • Filipe Manana's avatar
      btrfs: send: skip unnecessary backref iterations · 88ffb665
      Filipe Manana authored
      When looking for a clone source for an extent, we are iterating over all
      the backreferences for an extent. This is often a waste of time, because
      once we find a good clone source we could stop immediately instead of
      continuing backref walking, which is expensive.
      
      Basically what happens currently is this:
      
      1) Call iterate_extent_inodes() to iterate over all the backreferences;
      
      2) It calls btrfs_find_all_leafs() which in turn calls the main function
         to walk over backrefs and collect them - find_parent_nodes();
      
      3) Then we collect all the references for our target data extent from the
         extent tree (and delayed refs if any), add them to the rb trees,
         resolve all the indirect backreferences and search for all the file
         extent items in fs trees, building a list of inodes for each one of
         them (struct extent_inode_elem);
      
      4) Then back at iterate_extent_inodes() we find all the roots associated
         to each found leaf, and call the callback __iterate_backrefs defined
         at send.c for each inode in the inode list associated to each leaf.
      
      Some times one the first backreferences we find in a fs tree is optimal
      to satisfy the clone operation that send wants to perform, and in that
      case we could stop immediately and avoid resolving all the remaining
      indirect backreferences (search fs trees for the respective file extent
      items, etc). This possibly if when we find a fs tree leaf with a file
      extent item we are able to know what are all the roots that can lead to
      the leaf - this is now possible after the previous patch in the series
      that adds a cache that maps leaves to a list of roots. So we can now
      shortcircuit backref walking during send, by having the callback we
      pass to iterate_extent_inodes() to be called when we find a file extent
      item for an indirect backreference, and have it return a special value
      when it found a suitable backreference and it does not need to look for
      more backreferences. This change does that.
      
      This change is part of a patchset comprised of the following patches:
      
        01/17 btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
        02/17 btrfs: fix inode list leak during backref walking at find_parent_nodes()
        03/17 btrfs: fix ulist leaks in error paths of qgroup self tests
        04/17 btrfs: remove pointless and double ulist frees in error paths of qgroup tests
        05/17 btrfs: send: avoid unnecessary path allocations when finding extent clone
        06/17 btrfs: send: update comment at find_extent_clone()
        07/17 btrfs: send: drop unnecessary backref context field initializations
        08/17 btrfs: send: avoid unnecessary backref lookups when finding clone source
        09/17 btrfs: send: optimize clone detection to increase extent sharing
        10/17 btrfs: use a single argument for extent offset in backref walking functions
        11/17 btrfs: use a structure to pass arguments to backref walking functions
        12/17 btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes()
        13/17 btrfs: constify ulist parameter of ulist_next()
        14/17 btrfs: send: cache leaf to roots mapping during backref walking
        15/17 btrfs: send: skip unnecessary backref iterations
        16/17 btrfs: send: avoid double extent tree search when finding clone source
        17/17 btrfs: send: skip resolution of our own backref when finding clone source
      
      Performance test results are in the changelog of patch 17/17.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      88ffb665
    • Filipe Manana's avatar
      btrfs: send: cache leaf to roots mapping during backref walking · 66d04209
      Filipe Manana authored
      During a send operation, when doing backref walking to determine which
      inodes/offsets/roots we can clone from, the most repetitive and expensive
      step is to map each leaf that has file extent items pointing to the target
      data extent to the IDs of the roots from which the leaves are accessible,
      which happens at iterate_extent_inodes(). That step requires finding every
      parent node of a leaf, then the parent of each parent, and so on until we
      reach a root node. So it's a naturally expensive operation, and repetitive
      because each leaf can have hundreds of file extent items (for a nodesize
      of 16K, that can be slightly over 200 file extent items). There's also
      temporal locality, as we process all file extent items from a leave before
      moving the next leaf.
      
      This change caches the mapping of leaves to root IDs, to avoid repeating
      those computations over and over again. The cache is limited to a maximum
      of 128 entries, with each entry being a struct with a size of 128 bytes,
      so the maximum cache size is 16K plus any nodes internally allocated by
      the maple tree that is used to index pointers to those structs. The cache
      is invalidated whenever we detect relocation happened since we started
      filling the cache, because if relocation happened then extent buffers for
      leaves and nodes of the trees used by a send operation may have been
      reallocated.
      
      This cache also allows for another important optimization that is
      introduced in the next patch in the series.
      
      This change is part of a patchset comprised of the following patches:
      
        01/17 btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
        02/17 btrfs: fix inode list leak during backref walking at find_parent_nodes()
        03/17 btrfs: fix ulist leaks in error paths of qgroup self tests
        04/17 btrfs: remove pointless and double ulist frees in error paths of qgroup tests
        05/17 btrfs: send: avoid unnecessary path allocations when finding extent clone
        06/17 btrfs: send: update comment at find_extent_clone()
        07/17 btrfs: send: drop unnecessary backref context field initializations
        08/17 btrfs: send: avoid unnecessary backref lookups when finding clone source
        09/17 btrfs: send: optimize clone detection to increase extent sharing
        10/17 btrfs: use a single argument for extent offset in backref walking functions
        11/17 btrfs: use a structure to pass arguments to backref walking functions
        12/17 btrfs: reuse roots ulist on each leaf iteration for iterate_extent_inodes()
        13/17 btrfs: constify ulist parameter of ulist_next()
        14/17 btrfs: send: cache leaf to roots mapping during backref walking
        15/17 btrfs: send: skip unnecessary backref iterations
        16/17 btrfs: send: avoid double extent tree search when finding clone source
        17/17 btrfs: send: skip resolution of our own backref when finding clone source
      
      Performance test results are in the changelog of patch 17/17.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      66d04209