    btrfs: reduce inode lock critical section when setting and clearing delalloc · bdc0f89e
    Author: Filipe Manana <fdmanana@suse.com>
    
    
    When setting and clearing a delalloc range, in btrfs_set_delalloc_extent()
    and btrfs_clear_delalloc_extent(), we add/remove the inode to/from the
    root's list of delalloc inodes while holding the inode's lock. This is
    not necessary: we can add the inode to, and remove it from, the root's
    list without holding the inode's lock, because at that point we are
    already under the protection of the io tree's lock, so both functions
    cannot run concurrently for the same inode. Doing this reduces the size
    of the critical section delimited by the inode's lock, which is used in
    many other places, such as when finishing an ordered extent (when
    calling btrfs_update_inode_bytes() or btrfs_delalloc_release_metadata(),
    or decreasing the number of outstanding extents) or when reserving space
    for a buffered or direct IO write (calls to functions from
    delalloc-space.c).
    
    So move the operations that add the inode to, and remove it from, the
    root's list of delalloc inodes outside the critical section delimited by
    the inode's lock. This also allows us to get rid of the
    BTRFS_INODE_IN_DELALLOC_LIST flag, since we can rely on the inode's
    delalloc bytes counter to determine whether or not the inode is in the
    list.
    
    The following fio-based test, which exercises IO to multiple files in the
    same subvolume, was used:
    
       $ cat test.sh
       #!/bin/bash
    
       DEV=/dev/nullb0
       MNT=/mnt/nullb0
       MOUNT_OPTIONS="-o ssd"
    
       mkfs.btrfs -f $DEV &> /dev/null
       mount $MOUNT_OPTIONS $DEV $MNT
    
       fio --direct=0 --ioengine=sync --thread --directory=$MNT \
           --invalidate=1 --group_reporting=1 \
           --new_group --rw=randwrite --size=50m --numjobs=200 \
           --bs=4k --fsync_on_close=0 --fallocate=none --end_fsync=0 \
           --name=foo --filename_format=FioWorkloads.\$jobnum
    
       umount $MNT
    
    The test was run on a non-debug kernel (Debian's default kernel config)
    against a 16G null block device.
    
    Result before this patch:
    
       WRITE: bw=81.9MiB/s (85.9MB/s), 81.9MiB/s-81.9MiB/s (85.9MB/s-85.9MB/s), io=9.77GiB (10.5GB), run=122136-122136msec
    
    Result after this patch:
    
       WRITE: bw=86.8MiB/s (91.0MB/s), 86.8MiB/s-86.8MiB/s (91.0MB/s-91.0MB/s), io=9.77GiB (10.5GB), run=115180-115180msec

    That is about 6% more throughput and about 5.7% less run time.

    Reviewed-by: Boris Burkov <boris@bur.io>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>