• Filipe Manana's avatar
    btrfs: use inode_logged() at btrfs_record_unlink_dir() · d67ba263
    Filipe Manana authored
    At btrfs_record_unlink_dir() we directly check the logged_trans field of
    the given inodes to check if they were previously logged in the current
    transaction, and if any of them were, then we can avoid setting the field
    last_unlink_trans of the directory to the id of the current transaction if
    we are in a rename path. Avoiding that can later prevent falling back to
    a transaction commit if anyone attempts to log the directory.
    
    However the logged_trans field, store in struct btrfs_inode, is transient,
    not persisted in the inode item on its subvolume b+tree, so that means
    that if an inode is evicted and then loaded again, its original value is
    lost and it's reset to 0. So directly checking the logged_trans field can
    lead to some false negative, and that only results in a performance impact
    as mentioned before.
    
    Instead of directly checking the logged_trans field of the inodes, use the
    inode_logged() helper, which will check in the log tree if an inode was
    logged before in case its logged_trans field has a value of 0. This way
    we can avoid setting the directory inode's last_unlink_trans and cause
    future logging attempts of it to fallback to transaction commits. The
    following test script shows one example where this happens without this
    patch:
    
      $ cat test.sh
      #!/bin/bash
    
      DEV=/dev/nullb0
      MNT=/mnt/nullb0
    
      num_init_files=10000
      num_new_files=10000
    
      mkfs.btrfs -f $DEV
      mount -o ssd $DEV $MNT
    
      mkdir $MNT/testdir
      for ((i = 1; i <= $num_init_files; i++)); do
          echo -n > $MNT/testdir/file_$i
       done
    
      echo -n > $MNT/testdir/foo
    
      sync
    
      # Add some files so that there's more work in the transaction other
      # than just renaming file foo.
      for ((i = 1; i <= $num_new_files; i++)); do
          echo -n > $MNT/testdir/new_file_$i
      done
    
      # Change the file, fsync it.
      setfattr -n user.x1 -v 123 $MNT/testdir/foo
      xfs_io -c "fsync" $MNT/testdir/foo
    
      # Now triggger eviction of file foo but no eviction for our test
      # directory, since it is being used by the process below. This will
      # set logged_trans of the file's inode to 0 once it is loaded again.
      (
          cd $MNT/testdir
          while true; do
              :
          done
      ) &
      pid=$!
    
      echo 2 > /proc/sys/vm/drop_caches
    
      kill $pid
      wait $pid
    
      # Move foo out of our testdir. This will set last_unlink_trans
      # of the directory inode to the current transaction, because
      # logged_trans of both the directory and the file are set to 0.
      mv $MNT/testdir/foo $MNT/foo
    
      # Change file foo again and fsync it.
      # This fsync will result in a transaction commit because the rename
      # above has set last_unlink_trans of the parent directory to the id
      # of the current transaction and because our inode for file foo has
      # last_unlink_trans set to the current transaction, since it was
      # evicted and reloaded and it was previously modified in the current
      # transaction (the xattr addition).
      xfs_io -c "pwrite 0 64K" $MNT/foo
      start=$(date +%s%N)
      xfs_io -c "fsync" $MNT/foo
      end=$(date +%s%N)
      dur=$(( (end - start) / 1000000 ))
    
      echo "file fsync took: $dur milliseconds"
    
      umount $MNT
    
    Before this patch:   fsync took 19 milliseconds
    After this patch:    fsync took  5 milliseconds
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    d67ba263
tree-log.c 210 KB