    Btrfs: sync log after logging new name · d4682ba0
    Filipe Manana authored
    When we add a new name for an inode which was logged in the current
    transaction, we update the inode in the log so that its new name and
    ancestors are added to the log. However, when we do this we do not persist
    the log, so the changes remain in memory only. As a consequence, any
    ancestors that were created in the current transaction are updated such
    that future calls to btrfs_inode_in_log() return true. A subsequent fsync
    against such new ancestor directories therefore returns immediately,
    without persisting the log, and after a power failure the new ancestor
    directories do not exist, despite fsync having been called against them
    explicitly.
    
    Example:
    
      $ mkfs.btrfs -f /dev/sdb
      $ mount /dev/sdb /mnt
    
      $ mkdir /mnt/A
      $ mkdir /mnt/B
      $ mkdir /mnt/A/C
      $ touch /mnt/B/foo
      $ xfs_io -c "fsync" /mnt/B/foo
      $ ln /mnt/B/foo /mnt/A/C/foo
      $ xfs_io -c "fsync" /mnt/A
      <power failure>
    
    After the power failure, directory "A" does not exist, despite the explicit
    fsync on it.
    
    Instead of fixing this by changing the behaviour of the explicit fsync on
    directory "A" to persist the log instead of doing nothing, make the logging
    of the new file name (which happens when creating a hard link or renaming)
    persist the log. This approach is not only simpler, since it does not
    require adding new fields to the in-memory inode structure, but also gives
    us the same behaviour as ext4, xfs and f2fs (possibly other filesystems
    too).
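
    The interaction described above can be illustrated with a small userspace
    model. This is only a sketch under stated assumptions, not the kernel
    code: every name prefixed with toy_ is hypothetical, the log is reduced
    to a list of pending inodes, and the check that btrfs_inode_in_log()
    performs is approximated by comparing a single transaction id. Only one
    new ancestor directory is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PENDING 8

/* Hypothetical, simplified inode: not the kernel's in-memory inode. */
struct toy_inode {
    uint64_t logged_trans; /* transaction in which it was last logged */
    bool on_disk;          /* true if it would survive a power failure */
};

/* Hypothetical log: inodes logged in memory but not yet persisted. */
struct toy_log {
    uint64_t transid;
    struct toy_inode *pending[TOY_MAX_PENDING];
    size_t npending;
};

/* Rough stand-in for the btrfs_inode_in_log() check (an assumption). */
static bool toy_inode_in_log(const struct toy_log *log,
                             const struct toy_inode *inode)
{
    return inode->logged_trans == log->transid;
}

/* Add an inode to the in-memory log and mark it as logged. */
static void toy_log_inode(struct toy_log *log, struct toy_inode *inode)
{
    assert(log->npending < TOY_MAX_PENDING);
    inode->logged_trans = log->transid;
    log->pending[log->npending++] = inode;
}

/* Persist everything that is currently pending in the log. */
static void toy_sync_log(struct toy_log *log)
{
    for (size_t i = 0; i < log->npending; i++)
        log->pending[i]->on_disk = true;
    log->npending = 0;
}

/* fsync: returns early when the inode already looks logged. */
static void toy_fsync(struct toy_log *log, struct toy_inode *inode)
{
    if (toy_inode_in_log(log, inode))
        return;
    toy_log_inode(log, inode);
    toy_sync_log(log);
}

/* Logging a new name also logs the new ancestor directory. */
static void toy_log_new_name(struct toy_log *log, struct toy_inode *inode,
                             struct toy_inode *ancestor, bool sync_after)
{
    toy_log_inode(log, inode);
    toy_log_inode(log, ancestor);
    if (sync_after) /* the behaviour this change introduces */
        toy_sync_log(log);
}

/* Replays the example: fsync foo, link it under A, fsync A, "crash". */
bool toy_dir_survives(bool sync_after)
{
    struct toy_log log = { .transid = 1 };
    struct toy_inode foo = { 0 };
    struct toy_inode dir_a = { 0 };

    toy_fsync(&log, &foo);
    toy_log_new_name(&log, &foo, &dir_a, sync_after);
    toy_fsync(&log, &dir_a);
    return dir_a.on_disk;
}
```

    In this model, toy_dir_survives(false) corresponds to the old behaviour,
    where the fsync on the directory returns early and nothing is persisted,
    while toy_dir_survives(true) corresponds to syncing the log right after
    logging the new name.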
    
    A test case for fstests follows soon.
    
    Fixes: 12fcfd22 ("Btrfs: tree logging unlink/rename fixes")
    Reported-by: Vijay Chidambaram <vvijay03@gmail.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
tree-log.c 161 KB