• Filipe Manana's avatar
    btrfs: stop doing unnecessary log updates during a rename · 259c4b96
    Filipe Manana authored
    During a rename, we call __btrfs_unlink_inode(), which will call
    btrfs_del_inode_ref_in_log() and btrfs_del_dir_entries_in_log(), in order
    to remove an inode reference and a directory entry from the log. These
    are necessary when __btrfs_unlink_inode() is called from the unlink path,
    but not necessary when it's called from a rename context, because:
    
    1) For the btrfs_del_inode_ref_in_log() call, it's pointless to delete the
       inode reference related to the old name, because later in the rename
       path we call btrfs_log_new_name(), which will drop all inode references
       from the log and copy all inode references from the subvolume tree to
       the log tree. So we are doing one unnecessary btree operation which
       adds additional latency and lock contention in case there are other
       tasks accessing the log tree;
    
    2) For the btrfs_del_dir_entries_in_log() call, we are now doing the
       equivalent at btrfs_log_new_name() since the previous patch in the
       series, that has the subject "btrfs: avoid logging all directory
       changes during renames". In fact, having __btrfs_unlink_inode() call
       this function not only adds additional latency and lock contention due
       to the extra btree operation, but also can make btrfs_log_new_name()
       unnecessarily log a range item to track the deletion of the old name,
       since it has no way to known that the directory entry related to the
       old name was previously logged and already deleted by
       __btrfs_unlink_inode() through its call to
       btrfs_del_dir_entries_in_log().
    
    So skip those calls at __btrfs_unlink_inode() when we are doing a rename.
    Skipping them also allows us now to reduce the duration of time we are
    pinning a log transaction during renames, which is always beneficial as
    it's not delaying so much other tasks trying to sync the log tree, in
    particular we end up not holding the log transaction pinned while adding
    the new name (adding inode ref, directory entry, etc).
    
    This change is part of a patchset comprised of the following patches:
    
      1/5 btrfs: add helper to delete a dir entry from a log tree
      2/5 btrfs: pass the dentry to btrfs_log_new_name() instead of the inode
      3/5 btrfs: avoid logging all directory changes during renames
      4/5 btrfs: stop doing unnecessary log updates during a rename
      5/5 btrfs: avoid inode logging during rename and link when possible
    
    Just like the previous patch in the series, "btrfs: avoid logging all
    directory changes during renames", the following script mimics part of
    what a package installation/upgrade with zypper does, which is basically
    renaming a lot of files, in some directory under /usr, to a name with a
    suffix of "-RPMDELETE":
    
      $ cat test.sh
      #!/bin/bash
    
      DEV=/dev/nvme0n1
      MNT=/mnt/nvme0n1
    
      NUM_FILES=10000
    
      mkfs.btrfs -f $DEV
      mount $DEV $MNT
    
      mkdir $MNT/testdir
    
      for ((i = 1; i <= $NUM_FILES; i++)); do
          echo -n > $MNT/testdir/file_$i
      done
    
      sync
    
      # Do some change to testdir and fsync it.
      echo -n > $MNT/testdir/file_$((NUM_FILES + 1))
      xfs_io -c "fsync" $MNT/testdir
    
      echo "Renaming $NUM_FILES files..."
      start=$(date +%s%N)
      for ((i = 1; i <= $NUM_FILES; i++)); do
          mv $MNT/testdir/file_$i $MNT/testdir/file_$i-RPMDELETE
      done
      end=$(date +%s%N)
    
      dur=$(( (end - start) / 1000000 ))
      echo "Renames took $dur milliseconds"
    
      umount $MNT
    
    Testing this change on box a using a non-debug kernel (Debian's default
    kernel config) gave the following results:
    
    NUM_FILES=10000, before patchset:                   27399 ms
    NUM_FILES=10000, after patches 1/5 to 3/5 applied:   9093 ms (-66.8%)
    NUM_FILES=10000, after patches 1/5 to 4/5 applied:   9016 ms (-67.1%)
    
    NUM_FILES=5000, before patchset:                     9241 ms
    NUM_FILES=5000, after patches 1/5 to 3/5 applied:    4642 ms (-49.8%)
    NUM_FILES=5000, after patches 1/5 to 4/5 applied:    4553 ms (-50.7%)
    
    NUM_FILES=2000, before patchset:                     2550 ms
    NUM_FILES=2000, after patches 1/5 to 3/5 applied:    1788 ms (-29.9%)
    NUM_FILES=2000, after patches 1/5 to 4/5 applied:    1767 ms (-30.7%)
    
    NUM_FILES=1000, before patchset:                     1088 ms
    NUM_FILES=1000, after patches 1/5 to 3/5 applied:     905 ms (-16.9%)
    NUM_FILES=1000, after patches 1/5 to 4/5 applied:     883 ms (-18.8%)
    
    The next patch in the series (5/5), also contains dbench results after
    applying to whole patchset.
    
    Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1193549Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    259c4b96
tree-log.c 190 KB