    btrfs: drop extent map range more efficiently · db21370b
    Filipe Manana authored
    Currently when dropping extent maps for a file range, through
    btrfs_drop_extent_map_range(), we do the following non-optimal things:
    
    1) We look up extent maps one by one, always starting the search from
       the root of the extent map tree. This is not efficient if we have
       multiple extent maps in the range;
    
    2) We check on every iteration if we have the 'split' and 'split2' spare
       extent maps in case we need to split an extent map that intersects our
       range but also crosses its boundaries (to the left, to the right or
       both cases). If our target range is for example:
    
           [2M, 8M)
    
       And we have 3 extent maps in the range:
    
           [1M, 3M) [3M, 6M) [6M, 10M)
    
       Then on the first iteration we allocate two extent maps for 'split' and
       'split2', and use the 'split' to split the first extent map, so after
       the split we set 'split' to 'split2' and then set 'split2' to NULL.
    
       On the second iteration, we don't need to split the second extent map,
       but because 'split2' is now NULL, we allocate a new extent map for
       'split2'.
    
       On the third iteration we need to split the third extent map, so we
       use the extent map pointed to by 'split'.
    
       So we ended up allocating 3 extent maps for splitting, but all we
       needed was 2 extent maps. We never need to allocate more than 2,
       because extent maps that need to be split are always the first one
       and the last one in the target range.
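
    The allocation count in the example above can be reproduced with a small
    userspace model. This is only a sketch of the bookkeeping described here,
    not the kernel code: struct em_range and both functions are hypothetical
    names, the input is assumed to hold only the extent maps that intersect
    the target range, and a right side split is assumed to consume 'split'
    without replacing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent map: the half-open range [start, end). */
struct em_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Count the spare allocations made by the old scheme, which tops up both
 * 'split' and 'split2' at the start of every iteration. 'ems' holds only
 * the maps that intersect [start, end), sorted by offset.
 */
static int count_old_allocs(const struct em_range *ems, size_t n,
			    unsigned long start, unsigned long end)
{
	bool have_split = false, have_split2 = false;
	int allocs = 0;

	for (size_t i = 0; i < n; i++) {
		assert(ems[i].start < ems[i].end);

		if (!have_split) {
			have_split = true;
			allocs++;
		}
		if (!have_split2) {
			have_split2 = true;
			allocs++;
		}

		if (ems[i].start < start) {
			/* Left remainder consumes 'split'; 'split2' takes its place. */
			have_split = have_split2;
			have_split2 = false;
		}
		if (ems[i].end > end) {
			/* Right remainder consumes 'split' (assumed: nothing replaces it). */
			have_split = false;
		}
	}
	return allocs;
}

/* Count the splits that are actually required for the same input. */
static int count_splits_needed(const struct em_range *ems, size_t n,
			       unsigned long start, unsigned long end)
{
	int splits = 0;

	for (size_t i = 0; i < n; i++) {
		if (ems[i].start < start)
			splits++;	/* crosses the left boundary */
		if (ems[i].end > end)
			splits++;	/* crosses the right boundary */
	}
	return splits;
}
```

    With the three extent maps from the example (offsets in units of 1M) and
    the target range [2, 8), count_old_allocs() returns 3 while
    count_splits_needed() returns 2.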
    
    Improve on this by:
    
    1) Using rb_next() to move on to the next extent map. This results in
       iterating over fewer nodes of the tree and it does not require comparing
       the ranges of nodes to our start/end offset;
    
    2) Allocating the 2 extent maps for splitting before entering the loop
       and never allocating more than 2. In practice it's very rare for both
       extent map allocations to fail, since we have a dedicated slab for
       extent maps, and at the same time to need to split two extent maps.
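
    The shape of the improved loop can be sketched in userspace as follows.
    This is an illustration under stated assumptions, not the patch itself:
    a sorted array of non-overlapping ranges stands in for the extent map
    tree, advancing the index stands in for rb_next(), and struct em_node,
    struct drop_stats and drop_range() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent map: the half-open range [start, end). */
struct em_node {
	unsigned long start;
	unsigned long end;
};

/* What the walk did: maps dropped and preallocated spares consumed. */
struct drop_stats {
	int dropped;
	int spares_used;
};

/*
 * Drop every map intersecting [start, end). 'ems' must be sorted by offset
 * and non-overlapping, like an in-order walk of the tree.
 */
static struct drop_stats drop_range(const struct em_node *ems, size_t n,
				    unsigned long start, unsigned long end)
{
	struct drop_stats st = { 0, 0 };
	int spares = 2;	/* "allocated" once, before the loop */
	size_t i = 0;

	/* Single initial search: first map that ends after 'start'. */
	while (i < n && ems[i].end <= start)
		i++;

	/* Step to the successor instead of searching from the root again. */
	for (; i < n && ems[i].start < end; i++) {
		if (ems[i].start < start) {
			/* Only the first map can cross the left boundary. */
			assert(spares > 0);
			spares--;
			st.spares_used++;
		}
		if (ems[i].end > end) {
			/* Only the last map can cross the right boundary. */
			assert(spares > 0);
			spares--;
			st.spares_used++;
		}
		st.dropped++;
	}
	return st;
}
```

    Because the ranges are sorted and do not overlap, at most one map can
    cross each boundary, so the two spares set aside before the loop are
    always enough in this model.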
    
    This patch is part of a patchset comprised of the following patches:
    
       btrfs: fix missed extent on fsync after dropping extent maps
       btrfs: move btrfs_drop_extent_cache() to extent_map.c
       btrfs: use extent_map_end() at btrfs_drop_extent_map_range()
       btrfs: use cond_resched_rwlock_write() during inode eviction
       btrfs: move open coded extent map tree deletion out of inode eviction
       btrfs: add helper to replace extent map range with a new extent map
       btrfs: remove the refcount warning/check at free_extent_map()
       btrfs: remove unnecessary extent map initializations
       btrfs: assert tree is locked when clearing extent map from logging
       btrfs: remove unnecessary NULL pointer checks when searching extent maps
       btrfs: remove unnecessary next extent map search
       btrfs: avoid pointless extent map tree search when flushing delalloc
       btrfs: drop extent map range more efficiently
    
    And the following fio test was done before and after applying the whole
    patchset, on a non-debug kernel (Debian's default kernel config) on a
    12-core Intel box with 64G of RAM:
    
       $ cat test.sh
       #!/bin/bash
    
       DEV=/dev/nvme0n1
       MNT=/mnt/nvme0n1
       MOUNT_OPTIONS="-o ssd"
       MKFS_OPTIONS="-R free-space-tree -O no-holes"
    
       cat <<EOF > /tmp/fio-job.ini
       [writers]
       rw=randwrite
       fsync=8
       fallocate=none
       group_reporting=1
       direct=0
       bssplit=4k/20:8k/20:16k/20:32k/10:64k/10:128k/5:256k/5:512k/5:1m/5
       ioengine=psync
       filesize=2G
       runtime=300
       time_based
       directory=$MNT
       numjobs=8
       thread
       EOF
    
       echo performance | \
           tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
       echo
       echo "Using config:"
       echo
       cat /tmp/fio-job.ini
       echo
    
       umount $MNT &> /dev/null
       mkfs.btrfs -f $MKFS_OPTIONS $DEV
       mount $MOUNT_OPTIONS $DEV $MNT
    
       fio /tmp/fio-job.ini
    
       umount $MNT
    
    Result before applying the patchset:
    
       WRITE: bw=197MiB/s (206MB/s), 197MiB/s-197MiB/s (206MB/s-206MB/s), io=57.7GiB (61.9GB), run=300188-300188msec
    
    Result after applying the patchset:
    
       WRITE: bw=203MiB/s (213MB/s), 203MiB/s-203MiB/s (213MB/s-213MB/s), io=59.5GiB (63.9GB), run=300019-300019msec
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>