    btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction · ecc64fab
    Filipe Manana authored
    When checking whether we need to log the new name of a renamed inode, we
    check if the inode and its old parent directory have been logged before,
    and if neither has, we don't log the new name. The check however is
    buggy, as it directly compares the logged_trans field of the inodes
    against the ID of the current transaction. The problem is that
    logged_trans is a transient field, only stored in memory and never
    persisted in the inode item, so if an inode was logged, then evicted and
    reloaded, its logged_trans field is reset to 0. The check then returns
    false and the new name of the renamed inode is not logged. If the old
    parent directory was previously fsynced and we deleted the logged
    directory entries corresponding to the old name, we end up with a log
    that, when replayed, deletes the renamed inode.
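    To make the failure mode concrete, here is a small userspace model in C
    (not kernel code). The struct model_inode type and the model_* helpers
    are hypothetical stand-ins; only the logged_trans name comes from the
    text above. The model shows how a direct comparison against the current
    transaction ID gives a false "not logged" answer once the in-memory
    field has been lost to eviction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the in-memory part of an inode.
 * logged_trans lives only in memory; it is never written to the inode item. */
struct model_inode {
	uint64_t logged_trans; /* transaction in which the inode was last logged */
};

/* Record that the inode was logged (e.g. by an fsync) in transaction transid. */
static void model_log_inode(struct model_inode *inode, uint64_t transid)
{
	inode->logged_trans = transid;
}

/* Eviction followed by a reload from disk: the transient field is not
 * restored, so it starts out as 0 again. */
static void model_evict_and_reload(struct model_inode *inode)
{
	inode->logged_trans = 0;
}

/* The buggy style of check: compare logged_trans directly with the
 * current transaction ID. After a reload this returns false even if the
 * inode was logged earlier in the same transaction. */
static bool model_was_logged_buggy(const struct model_inode *inode,
				   uint64_t transid)
{
	return inode->logged_trans == transid;
}
```

    In the reproducer that follows, directory A plays the role of an inode
    that is logged by the fsync, evicted by drop_caches, and then wrongly
    reported as "not logged" during the rename.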
    
    The following example triggers the problem:
    
      $ mkfs.btrfs -f /dev/sdc
      $ mount /dev/sdc /mnt
    
      $ mkdir /mnt/A
      $ mkdir /mnt/B
      $ echo -n "hello world" > /mnt/A/foo
    
      $ sync
    
      # Add a new file to A and fsync directory A.
      $ touch /mnt/A/bar
      $ xfs_io -c "fsync" /mnt/A
    
      # Now trigger inode eviction. We are only interested in triggering
      # eviction for the inode of directory A.
      $ echo 2 > /proc/sys/vm/drop_caches
    
      # Move foo from directory A to directory B.
      # This deletes the directory entries for foo in A from the log, and
      # does not add the new name for foo in directory B to the log, because
      # logged_trans of A is 0, which is less than the current transaction ID.
      $ mv /mnt/A/foo /mnt/B/foo
    
      # Now fsync anything other than A, B or any file inside them; for
      # example, create a file in the root directory and fsync it. This
      # syncs the log, which contains all the changes made by the previous
      # rename operation.
      $ touch /mnt/baz
      $ xfs_io -c "fsync" /mnt/baz
    
      <power fail>
    
      # Mount the filesystem and replay the log.
      $ mount /dev/sdc /mnt
    
      # Check the filesystem content.
      $ ls -1R /mnt
      /mnt/:
      A
      B
      baz
    
      /mnt/A:
      bar
    
      /mnt/B:
      $
    
      # File foo is gone: it is neither in A/ nor in B/.
    
    Fix this by using the inode_logged() helper in btrfs_log_new_name(),
    which safely checks whether an inode was logged before in the current
    transaction, even if it was evicted and reloaded in the meantime.
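    The idea behind the fix can be sketched as another userspace model,
    again not the actual kernel code. It assumes, as an approximation of
    inode_logged(), that a reloaded inode carries the ID of the last
    transaction that modified it plus a "needs full sync" flag, and that
    such an inode is treated as possibly logged when that transaction is
    the current one. All names here are hypothetical; the real helper in
    tree-log.c may check additional conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified inode model; field names are illustrative. */
struct model_inode {
	uint64_t logged_trans;  /* in memory only, 0 after a reload */
	uint64_t last_trans;    /* last transaction that modified the inode */
	bool needs_full_sync;   /* assumed to be set when the inode is (re)loaded */
};

/* Conservative "may it have been logged?" check. An inode that was
 * reloaded and modified in the current transaction is treated as possibly
 * logged, because its logged_trans value can no longer be trusted. */
static bool model_inode_logged(const struct model_inode *inode,
			       uint64_t transid)
{
	if (inode->logged_trans == transid)
		return true;
	if (inode->last_trans == transid && inode->needs_full_sync)
		return true;
	return false;
}

/* Decision at rename time: skip logging the new name only when neither
 * the inode nor its old parent directory may have been logged. */
static bool model_should_log_new_name(const struct model_inode *inode,
				      const struct model_inode *old_dir,
				      uint64_t transid)
{
	return model_inode_logged(inode, transid) ||
	       (old_dir != NULL && model_inode_logged(old_dir, transid));
}
```

    The check errs on the side of logging: a false positive only costs
    some extra logging work, while a false negative, as in the reproducer
    above, loses the inode on log replay.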
    
    A test case for fstests will follow soon.
    
    CC: stable@vger.kernel.org # 4.14+
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>