    btrfs: prepare extents to be logged before locking a log tree path · e1f53ed8
    Filipe Manana authored
    
    
    When we want to log an extent, in the fast fsync path, we obtain a path
    to the leaf that will hold the file extent item either through a deletion
    search, via btrfs_drop_extents(), or through an insertion search using
    btrfs_insert_empty_item(). After that we fill the file extent item's
    fields one by one directly on the leaf.
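
    That pattern looks roughly like the following sketch (illustrative only,
    not the actual tree-log.c code: error handling and most of the item's
    fields are omitted, and the function name and parameters are made up; the
    btrfs_set_file_extent_*() accessors and btrfs_item_ptr() are the existing
    btrfs helpers):

      /*
       * The caller already obtained a path to the log tree leaf, so write
       * locks are held while every field is set directly on the leaf.
       */
      static void fill_extent_item_on_leaf(struct btrfs_trans_handle *trans,
                                           struct btrfs_path *path,
                                           u64 disk_bytenr,
                                           u64 disk_num_bytes,
                                           u64 num_bytes)
      {
              struct extent_buffer *leaf = path->nodes[0];
              struct btrfs_file_extent_item *fi;

              fi = btrfs_item_ptr(leaf, path->slots[0],
                                  struct btrfs_file_extent_item);
              /* Each field is written to the leaf while its lock is held. */
              btrfs_set_file_extent_generation(leaf, fi, trans->transid);
              btrfs_set_file_extent_type(leaf, fi, BTRFS_FILE_EXTENT_REG);
              btrfs_set_file_extent_disk_bytenr(leaf, fi, disk_bytenr);
              btrfs_set_file_extent_disk_num_bytes(leaf, fi, disk_num_bytes);
              btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
              btrfs_mark_buffer_dirty(leaf);
      }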
    
    Instead of doing that, we could prepare the file extent item before
    obtaining a btree path, and then copy the prepared extent item with a
    single operation once we get the path. This helps avoid some contention
    on the log tree, since filling the item field by field means holding
    write locks for longer than necessary, especially when the path is
    obtained via
    btrfs_drop_extents() through a deletion search, which always keeps a
    write lock on the nodes at levels 1 and 2 (besides the leaf).
    
    This change does that: we prepare the file extent item that is going to
    be inserted before acquiring a path, and then copy it into a leaf using
    a single copy operation once we get a path.
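
    As a rough sketch of the new pattern (again illustrative, with a made up
    function name and parameters; it assumes the existing stack item
    accessors btrfs_set_stack_file_extent_*() and write_extent_buffer(), and
    only shows the btrfs_insert_empty_item() case, not the
    btrfs_drop_extents() one):

      static int log_prepared_extent_item(struct btrfs_trans_handle *trans,
                                          struct btrfs_root *log,
                                          struct btrfs_path *path,
                                          const struct btrfs_key *key,
                                          u64 disk_bytenr,
                                          u64 disk_num_bytes,
                                          u64 num_bytes)
      {
              struct btrfs_file_extent_item fi = { 0 };
              struct extent_buffer *leaf;
              int ret;

              /* Prepare the item before taking any log tree locks. */
              btrfs_set_stack_file_extent_generation(&fi, trans->transid);
              btrfs_set_stack_file_extent_type(&fi, BTRFS_FILE_EXTENT_REG);
              btrfs_set_stack_file_extent_disk_bytenr(&fi, disk_bytenr);
              btrfs_set_stack_file_extent_disk_num_bytes(&fi, disk_num_bytes);
              btrfs_set_stack_file_extent_num_bytes(&fi, num_bytes);

              /* Only now get a path, which write locks log tree nodes. */
              ret = btrfs_insert_empty_item(trans, log, path, key,
                                            sizeof(fi));
              if (ret)
                      return ret;

              /* Copy the whole prepared item with a single operation. */
              leaf = path->nodes[0];
              write_extent_buffer(leaf, &fi,
                                  btrfs_item_ptr_offset(leaf,
                                                        path->slots[0]),
                                  sizeof(fi));
              btrfs_mark_buffer_dirty(leaf);
              return 0;
      }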
    
    This change is part of a patchset that is comprised of the following
    patches:
    
      1/6 btrfs: remove unnecessary leaf free space checks when pushing items
      2/6 btrfs: avoid unnecessary COW of leaves when deleting items from a leaf
      3/6 btrfs: avoid unnecessary computation when deleting items from a leaf
      4/6 btrfs: remove constraint on number of visited leaves when replacing extents
      5/6 btrfs: remove useless path release in the fast fsync path
      6/6 btrfs: prepare extents to be logged before locking a log tree path
    
    The following test was run to measure the impact of the whole patchset:
    
      $ cat test.sh
      #!/bin/bash
    
      DEV=/dev/sdi
      MNT=/mnt/sdi
      MOUNT_OPTIONS="-o ssd"
      MKFS_OPTIONS="-R free-space-tree -O no-holes"
    
      NUM_JOBS=8
      FILE_SIZE=128M
      RUN_TIME=200
    
      cat <<EOF > /tmp/fio-job.ini
      [writers]
      rw=randwrite
      fsync=1
      fallocate=none
      group_reporting=1
      direct=0
      bssplit=4k/20:8k/20:16k/20:32k/10:64k/10:128k/5:256k/5:512k/5:1m/5
      ioengine=sync
      filesize=$FILE_SIZE
      runtime=$RUN_TIME
      time_based
      directory=$MNT
      numjobs=$NUM_JOBS
      thread
      EOF
    
      echo "performance" | \
          tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
      echo
      echo "Using config:"
      echo
      cat /tmp/fio-job.ini
      echo
    
      umount $MNT &> /dev/null
      mkfs.btrfs -f $MKFS_OPTIONS $DEV
      mount $MOUNT_OPTIONS $DEV $MNT
    
      fio /tmp/fio-job.ini
    
      umount $MNT
    
    The test ran inside a VM (8 cores, 32G of RAM) with the target disk
    mapping to a raw NVMe device, and using a non-debug kernel config
    (Debian's default config).
    
    Before the patchset:
    
    WRITE: bw=116MiB/s (122MB/s), 116MiB/s-116MiB/s (122MB/s-122MB/s), io=22.7GiB (24.4GB), run=200013-200013msec
    
    After the patchset:
    
    WRITE: bw=125MiB/s (131MB/s), 125MiB/s-125MiB/s (131MB/s-131MB/s), io=24.3GiB (26.1GB), run=200007-200007msec
    
    A 7.8% gain in throughput and 7.0% more IO done in the same period of
    time (200 seconds).

    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>