    btrfs: do not commit logs and transactions during link and rename operations · 75b463d2
    Filipe Manana authored
    Since commit d4682ba0 ("Btrfs: sync log after logging new name") we
    started to commit logs, and fall back to transaction commits when we failed
    to log the new names or commit the logs, after link and rename operations
    when the target inodes (or their parents) were previously logged in the
    current transaction. This was to avoid losing directories despite an
    explicit fsync on them when they are ancestors of some inode that got a
    new name logged, due to a link or rename operation. However, that adds the
    cost of starting IO and waiting for it to complete, which can cause higher
    latencies for applications.
    
    Instead of doing that, just make sure that when we log a new name for an
    inode we don't mark any of its ancestors as logged, so that if anyone
    does an fsync against any of them, without doing any other change on them,
    the fsync commits the log. This way we only pay the cost of a log commit
    (or a transaction commit if something goes wrong or a new block group was
    created) if the application explicitly asks to fsync any of the parent
    directories.
    
    Using dbench, which mixes several filesystem operations including renames,
    revealed some significant latency gains. The following script that uses
    dbench was used to test this:
    
      #!/bin/bash
    
      DEV=/dev/nvme0n1
      MNT=/mnt/btrfs
      MOUNT_OPTIONS="-o ssd -o space_cache=v2"
      MKFS_OPTIONS="-m single -d single"
      THREADS=16
    
      echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
      mkfs.btrfs -f $MKFS_OPTIONS $DEV
      mount $MOUNT_OPTIONS $DEV $MNT
    
      dbench -t 300 -D $MNT $THREADS
    
      umount $MNT
    
    The test was run on bare metal, no virtualization, on a box with 12 cores
    (Intel i7-8700), 64GB of RAM and using an NVMe device, with a kernel
    configuration that is the default of typical distributions (Debian in this
    case), without debug options enabled (kasan, kmemleak, slub debug, debug
    of page allocations, lock debugging, etc).
    
    Results before this patch:
    
     Operation      Count    AvgLat    MaxLat
     ----------------------------------------
     NTCreateX    10750455     0.011   155.088
     Close         7896674     0.001     0.243
     Rename         455222     2.158  1101.947
     Unlink        2171189     0.067   121.638
     Deltree           256     2.425     7.816
     Mkdir             128     0.002     0.003
     Qpathinfo     9744323     0.006    21.370
     Qfileinfo     1707092     0.001     0.146
     Qfsinfo       1786756     0.001    11.228
     Sfileinfo      875612     0.003    21.263
     Find          3767281     0.025     9.617
     WriteX        5356924     0.011   211.390
     ReadX        16852694     0.003     9.442
     LockX           35008     0.002     0.119
     UnlockX         35008     0.001     0.138
     Flush          753458     4.252  1102.249
    
    Throughput 1128.35 MB/sec  16 clients  16 procs  max_latency=1102.255 ms
    
    Results after this patch:
    
     Operation      Count    AvgLat    MaxLat
     ----------------------------------------
     NTCreateX    11471098     0.012   448.281
     Close         8426396     0.001     0.925
     Rename         485746     0.123   267.183
     Unlink        2316477     0.080    63.433
     Deltree           288     2.830    11.144
     Mkdir             144     0.003     0.010
     Qpathinfo    10397420     0.006    10.288
     Qfileinfo     1822039     0.001     0.169
     Qfsinfo       1906497     0.002    14.039
     Sfileinfo      934433     0.004     2.438
     Find          4019879     0.026    10.200
     WriteX        5718932     0.011   200.985
     ReadX        17981671     0.003    10.036
     LockX           37352     0.002     0.076
     UnlockX         37352     0.001     0.109
     Flush          804018     5.015   778.033
    
    Throughput 1201.98 MB/sec  16 clients  16 procs  max_latency=778.036 ms
    (+6.5% throughput, -29.4% max latency, -75.8% max rename latency)
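
The percentages in the summary line can be rechecked from the two dbench reports. The following is a minimal sketch, not part of the patch: `pct_change` is a hypothetical helper name, and the inputs are the throughput, overall max latency, and Rename latency figures copied from the tables above. It also derives the change in average Rename latency, which the summary line does not state.

```shell
#!/bin/bash

# Hypothetical helper: percentage change from a "before" value ($1)
# to an "after" value ($2), printed with an explicit sign.
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%+.1f\n", (after - before) / before * 100 }'
}

# Values copied from the "before" and "after" dbench reports.
echo "throughput:         $(pct_change 1128.35 1201.98)%"
echo "max latency:        $(pct_change 1102.255 778.036)%"
echo "rename max latency: $(pct_change 1101.947 267.183)%"
echo "rename avg latency: $(pct_change 2.158 0.123)%"
```

The first three lines reproduce the figures quoted in the summary; the fourth shows that the average Rename latency fell by a larger margin than its maximum.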
    
    Test case generic/498 from fstests tests the scenario that the previously
    mentioned commit fixed.
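
For orientation, the kind of sequence at issue can be sketched as below. This is an illustrative assumption, not the fstests test itself: the directory names `A`, `B`, `C` and the `DIR` variable are hypothetical, and `sync FILE` relies on GNU coreutils (8.24 or later) issuing an fsync on its argument. The script only performs the operations; it cannot simulate the power failure and log replay that a real test needs.

```shell
#!/bin/bash
set -e

# Hypothetical scratch location; point DIR at a btrfs mount to exercise
# the filesystem discussed here. Defaults to a temporary directory.
DIR=${DIR:-$(mktemp -d)}

mkdir -p "$DIR/A/C" "$DIR/B"
touch "$DIR/B/foo"

# fsync the file, so on btrfs its inode is logged in the current transaction.
sync "$DIR/B/foo"

# Give the already-logged inode a new name under a different directory.
ln "$DIR/B/foo" "$DIR/A/C/foo"

# Explicit fsync of an ancestor directory of the new name.
sync "$DIR/A"

# A crash at this point is the interesting case: the explicit fsync on
# the directory should still be honored after log replay.
echo "link count of foo: $(stat -c %h "$DIR/B/foo")"
```

With this patch, the link step itself no longer commits the log; the cost is paid only by the final directory fsync.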
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>