• Filipe Manana's avatar
    Btrfs: fix stale dir entries after unlink, inode eviction and fsync · bde6c242
    Filipe Manana authored
    If we remove a hard link from an inode, the inode gets evicted, then
    we fsync the inode and then power fail/crash, when the log tree is
    replayed, the parent directory inode still has entries pointing to
    the name that no longer exists, while our inode no longer has the
    BTRFS_INODE_REF_KEY item matching the deleted hard link (as expected),
    leaving the filesystem in an inconsistent state. The stale directory
    entries can not be deleted (an attempt to delete them causes -ESTALE
    errors), which makes it impossible to delete the parent directory.
    
    This happens because we track the id of the transaction where the last
    unlink operation for the inode happened (last_unlink_trans) in an
    in-memory only field of the inode, that is, a value that is never
    persisted in the inode item stored on the fs/subvol btree. So if an
    inode is evicted and loaded again, the value for last_unlink_trans is
    set to 0, which prevents the fsync from logging the parent directory
    at btrfs_log_inode_parent(). So fix this by setting last_unlink_trans
    to the id of the transaction that last modified the inode when we
    load the inode. This is a pessimistic approach but it always ensures
    correctness with the trade off of ocassional full transaction commits
    when an fsync is done against the inode in the same transaction where
    it was evicted and reloaded when our inode is a directory and often
    logging its parent unnecessarily when our inode is not a directory.
    
    The following test case for fstests triggers the problem:
    
      seq=`basename $0`
      seqres=$RESULT_DIR/$seq
      echo "QA output created by $seq"
      tmp=/tmp/$$
      status=1	# failure is the default!
      trap "_cleanup; exit \$status" 0 1 2 3 15
    
      _cleanup()
      {
          _cleanup_flakey
          rm -f $tmp.*
      }
    
      # get standard environment, filters and checks
      . ./common/rc
      . ./common/filter
      . ./common/dmflakey
    
      # real QA test starts here
      _need_to_be_root
      _supported_fs generic
      _supported_os Linux
      _require_scratch
      _require_dm_flakey
      _require_metadata_journaling $SCRATCH_DEV
    
      rm -f $seqres.full
    
      _scratch_mkfs >>$seqres.full 2>&1
      _init_flakey
      _mount_flakey
    
      # Create our test file with 2 hard links.
      mkdir $SCRATCH_MNT/testdir
      touch $SCRATCH_MNT/testdir/foo
      ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/bar
    
      # Make sure everything done so far is durably persisted.
      sync
    
      # Now remove one of the links, trigger inode eviction and then fsync
      # our inode.
      unlink $SCRATCH_MNT/testdir/bar
      echo 2 > /proc/sys/vm/drop_caches
      $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/foo
    
      # Silently drop all writes on our scratch device to simulate a power failure.
      _load_flakey_table $FLAKEY_DROP_WRITES
      _unmount_flakey
    
      # Allow writes again and mount the fs to trigger log/journal replay.
      _load_flakey_table $FLAKEY_ALLOW_WRITES
      _mount_flakey
    
      # Now verify our directory entries.
      echo "Entries in testdir:"
      ls -1 $SCRATCH_MNT/testdir
    
      # If we remove our inode, its parent should become empty and therefore we should
      # be able to remove the parent.
      rm -f $SCRATCH_MNT/testdir/*
      rmdir $SCRATCH_MNT/testdir
    
      _unmount_flakey
    
      # The fstests framework will call fsck against our filesystem which will verify
      # that all metadata is in a consistent state.
    
      status=0
      exit
    
    The test failed on btrfs with:
    
      generic/098 4s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad)
        --- tests/generic/098.out	2015-07-23 18:01:12.616175932 +0100
        +++ /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad	2015-07-23 18:04:58.924138308 +0100
        @@ -1,3 +1,6 @@
         QA output created by 098
         Entries in testdir:
        +bar
         foo
        +rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/testdir/foo': Stale file handle
        +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
        ...
        (Run 'diff -u tests/generic/098.out /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad'  to see the entire diff)
      _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see /home/fdmanana/git/hub/xfstests/results//generic/098.full)
    
      $ cat /home/fdmanana/git/hub/xfstests/results//generic/098.full
      (...)
      checking fs roots
      root 5 inode 258 errors 2001, no inode item, link count wrong
         unresolved ref dir 257 index 0 namelen 3 name foo filetype 1 errors 6, no dir index, no inode ref
         unresolved ref dir 257 index 3 namelen 3 name bar filetype 1 errors 5, no dir item, no inode ref
      Checking filesystem on /dev/sdc
      (...)
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarChris Mason <clm@fb.com>
    bde6c242
inode.c 263 KB