    Btrfs: fix fsync of files with multiple hard links in new directories · 41bd6067
    Filipe Manana authored
    The log tree has a long-standing problem: when a file is fsync'ed, we
    only check for new ancestors (created in the current transaction) by
    following the hard link for which the fsync was issued. We follow the
    ancestors using the VFS' dget_parent() API. This means that if we create a
    new link for a file in a directory that is new (or in any other new
    ancestor directory) and then fsync the file using an old hard link, we end
    up not logging the new ancestor, and on log replay neither that new hard
    link nor the ancestor exists. In some cases involving renames, the file
    will not exist at all.
    
    Example:
    
      mkfs.btrfs -f /dev/sdb
      mount /dev/sdb /mnt
    
      mkdir /mnt/A
      touch /mnt/foo
      ln /mnt/foo /mnt/A/bar
      xfs_io -c fsync /mnt/foo
    
      <power failure>
    
    In this example, after log replay only the hard link named 'foo' exists
    and directory A does not, which is unexpected. In other major Linux
    filesystems, such as ext4, xfs and f2fs, both hard links and directory A
    exist after mounting the filesystem again.
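    To make the gap concrete, here is a small user-space model of the example
    above. It is a sketch only: struct model_dir, struct model_inode,
    old_check() and all_links_check() are hypothetical names, not btrfs code.
    Each directory records the transaction that created it. old_check() walks
    only the parent chain of the link used for the fsync, as the
    dget_parent() based check does; all_links_check() walks the chains of
    every hard link, which models the more thorough per-link alternative, not
    the fix this commit adopts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a directory remembers the transaction that created
 * it and its parent directory (NULL for the filesystem root). */
struct model_dir {
	unsigned long long created_trans;
	const struct model_dir *parent;
};

/* Hypothetical model of an inode: links[i] is the directory holding the
 * i-th hard link. */
#define MODEL_MAX_LINKS 4
struct model_inode {
	int nlink;
	const struct model_dir *links[MODEL_MAX_LINKS];
};

/* Return true if 'dir' or any of its ancestors was created in 'cur_trans'. */
bool chain_has_new_dir(const struct model_dir *dir, unsigned long long cur_trans)
{
	for (; dir != NULL; dir = dir->parent)
		if (dir->created_trans == cur_trans)
			return true;
	return false;
}

/* Pre-fix behaviour: only the ancestors of the link used for the fsync are
 * examined (the kernel reaches them through dget_parent()). */
bool old_check(const struct model_inode *inode, int fsync_link,
	       unsigned long long cur_trans)
{
	assert(fsync_link >= 0 && fsync_link < inode->nlink);
	return chain_has_new_dir(inode->links[fsync_link], cur_trans);
}

/* Thorough alternative: examine the ancestors of every hard link. */
bool all_links_check(const struct model_inode *inode, unsigned long long cur_trans)
{
	for (int i = 0; i < inode->nlink; i++)
		if (chain_has_new_dir(inode->links[i], cur_trans))
			return true;
	return false;
}
```

    With the root created in transaction 1, A created in transaction 7 (the
    current one) and foo linked from both, old_check(&foo, 0, 7) returns false
    while all_links_check(&foo, 7) returns true, mirroring how directory A is
    missed when fsync is issued through the old link 'foo'.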
    
    Checking whether any ancestors are new and need to be logged was added in
    2009 by commit 12fcfd22 ("Btrfs: tree logging unlink/rename fixes"),
    however only for the ancestors of the hard link (dentry) for which the
    fsync was issued, instead of checking all ancestors of all of the
    inode's hard links.
    
    So fix this by tracking the id of the last transaction in which a hard
    link was created for an inode and then, on fsync, falling back to a full
    transaction commit when the inode has more than one hard link and at least
    one new hard link was created in the current transaction. This is the
    simplest solution, since this is not a common use case (frequently adding
    hard links for which there is an ancestor created in the current
    transaction and then fsyncing the file). If it ever becomes a common use
    case, a solution that iterates the fs/subvol btree for each hard link and
    checks whether any ancestor is new could be implemented.
    
    This solves many unexpected scenarios reported by Jayashree Mohan and
    Vijay Chidambaram, and for which there is a new test case for fstests
    under review.
    
    Fixes: 12fcfd22 ("Btrfs: tree logging unlink/rename fixes")
    CC: stable@vger.kernel.org # 4.4+
    Reported-by: Vijay Chidambaram <vvijay03@gmail.com>
    Reported-by: Jayashree Mohan <jayashree2912@gmail.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>