    btrfs: avoid blocking on space reservation when doing nowait dio writes · d4135134
    Filipe Manana authored
    When doing a NOWAIT direct IO write, if we can NOCOW then it means we can
    proceed with the non-blocking, NOWAIT path. However, reserving the
    metadata space and qgroup meta space can often result in blocking:
    flushing delalloc, waiting for ordered extents to complete, triggering
    transaction commits, etc., which goes against the semantics of a NOWAIT
    write.
    
    So make the NOWAIT write path try to reserve all the metadata it needs
    without blocking: if we get -ENOSPC or -EDQUOT, return -EAGAIN to make
    the caller fall back to a blocking direct IO write.
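    As a minimal sketch of that fallback pattern (self-contained user-space
    C; the helper names and the free-space bookkeeping are illustrative
    assumptions, not the actual btrfs code):

      #include <errno.h>

      /*
       * Illustrative stand-in for a metadata reservation that is not
       * allowed to flush: it either succeeds with the space currently
       * available or fails, but never blocks.
       */
      static int reserve_metadata_no_flush(long long *free_bytes,
                                           long long want)
      {
              if (*free_bytes < want)
                      return -ENOSPC;
              *free_bytes -= want;
              return 0;
      }

      /*
       * The NOWAIT dio write pattern described above: a space or quota
       * failure from the non-blocking reservation is translated into
       * -EAGAIN, telling the caller to retry as a blocking write.
       */
      static int nowait_dio_reserve(long long *free_bytes, long long want)
      {
              int ret = reserve_metadata_no_flush(free_bytes, want);

              if (ret == -ENOSPC || ret == -EDQUOT)
                      return -EAGAIN;
              return ret;
      }

    With this, a NOWAIT submission (e.g. from io_uring) that cannot reserve
    space without blocking completes immediately with -EAGAIN and is retried
    from a context where blocking is acceptable.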
    
    This is part of a patchset comprised of the following patches:
    
      btrfs: avoid blocking on page locks with nowait dio on compressed range
      btrfs: avoid blocking nowait dio when locking file range
      btrfs: avoid double nocow check when doing nowait dio writes
      btrfs: stop allocating a path when checking if cross reference exists
      btrfs: free path at can_nocow_extent() before checking for checksum items
      btrfs: release path earlier at can_nocow_extent()
      btrfs: avoid blocking when allocating context for nowait dio read/write
      btrfs: avoid blocking on space reservation when doing nowait dio writes
    
    The following test was run before and after applying this patchset:
    
      $ cat io-uring-nodatacow-test.sh
      #!/bin/bash
    
      DEV=/dev/sdc
      MNT=/mnt/sdc
    
      MOUNT_OPTIONS="-o ssd -o nodatacow"
      MKFS_OPTIONS="-R free-space-tree -O no-holes"
    
      NUM_JOBS=4
      FILE_SIZE=8G
      RUN_TIME=300
    
      cat <<EOF > /tmp/fio-job.ini
      [io_uring_rw]
      rw=randrw
      fsync=0
      fallocate=posix
      group_reporting=1
      direct=1
      ioengine=io_uring
      iodepth=64
      bssplit=4k/20:8k/20:16k/20:32k/10:64k/10:128k/5:256k/5:512k/5:1m/5
      filesize=$FILE_SIZE
      runtime=$RUN_TIME
      time_based
      filename=foobar
      directory=$MNT
      numjobs=$NUM_JOBS
      thread
      EOF
    
      echo performance | \
         tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
      umount $MNT &> /dev/null
      mkfs.btrfs -f $MKFS_OPTIONS $DEV &> /dev/null
      mount $MOUNT_OPTIONS $DEV $MNT
    
      fio /tmp/fio-job.ini
    
      umount $MNT
    
    The test was run on a 12-core box with 64G of RAM, using a non-debug
    kernel config (Debian's default config) and a spinning disk.
    
    Result before the patchset:
    
     READ: bw=407MiB/s (427MB/s), 407MiB/s-407MiB/s (427MB/s-427MB/s), io=119GiB (128GB), run=300175-300175msec
    WRITE: bw=407MiB/s (427MB/s), 407MiB/s-407MiB/s (427MB/s-427MB/s), io=119GiB (128GB), run=300175-300175msec
    
    Result after the patchset:
    
     READ: bw=436MiB/s (457MB/s), 436MiB/s-436MiB/s (457MB/s-457MB/s), io=128GiB (137GB), run=300044-300044msec
    WRITE: bw=435MiB/s (456MB/s), 435MiB/s-435MiB/s (456MB/s-456MB/s), io=128GiB (137GB), run=300044-300044msec
    
    That's about +7.2% throughput for reads and +6.9% for writes.

    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>