    Btrfs: fix fsync when extend references are added to an inode · 2c2c452b
    Filipe Manana authored
    If we added an extended reference to an inode and fsync'ed it, the log
    replay code would make our inode have an incorrect link count, which
    was lower than the expected/correct count.
    This resulted in stale directory index entries after deleting some of
    the hard links, and any access to the dangling directory entries resulted
    in -ESTALE errors because the entries pointed to inode items that don't
    exist anymore.
    
    This is easy to reproduce with the test case I made for xfstests, and
    the bulk of that test is:
    
        _scratch_mkfs "-O extref" >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
    
        # Create a test file with 3001 hard links. This number is large enough to
        # make btrfs start using extrefs at some point even if the fs has the maximum
        # possible leaf/node size (64KiB).
        echo "hello world" > $SCRATCH_MNT/foo
        for i in `seq 1 3000`; do
            ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_`printf "%04d" $i`
        done
    
        # Make sure all metadata and data are durably persisted.
        sync
    
        # Add one more link to the inode that ends up being a btrfs extref and fsync
        # the inode.
        ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_3001
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
    
        # Simulate a crash/power loss. This makes sure the next mount
        # will see an fsync log and will replay that log.
    
        _load_flakey_table $FLAKEY_DROP_WRITES
        _unmount_flakey
    
        _load_flakey_table $FLAKEY_ALLOW_WRITES
        _mount_flakey
    
        # Now after the fsync log replay btrfs left our inode with a wrong link count N,
        # which was smaller than the correct link count M (N < M).
        # So after removing N hard links, the remaining M - N directory entries were
        # still visible to user space but it was impossible to do anything with them
        # because they pointed to an inode that didn't exist anymore. This resulted in
        # stale file handle errors (-ESTALE) when accessing those dentries for example.
        #
        # So remove all hard links except the first one and then attempt to read the
        # file, to verify we don't get an -ESTALE error when accessing the inode.
        #
        # The btrfs fsck tool also detected the incorrect inode link count and it
        # reported an error message like the following:
        #
        # root 5 inode 257 errors 2001, no inode item, link count wrong
        #   unresolved ref dir 256 index 2978 namelen 13 name foo_link_2976 filetype 1 errors 4, no inode ref
        #
        # The fstests framework automatically calls fsck after a test is run, so we
        # don't need to call fsck explicitly here.
    
        rm -f $SCRATCH_MNT/foo_link_*
        cat $SCRATCH_MNT/foo
    
        status=0
        exit
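
    The symptom described in the test comments (an inode link count N that is
    smaller than the number M of directory entries pointing at the inode) can
    also be checked by hand. The helper below is only a sketch and is not part
    of the fstests test case: the function name link_count_report is made up
    for illustration, and it assumes GNU stat and GNU find are available.

```shell
#!/bin/bash
# Hypothetical helper (not from the fstests test above).
# Usage: link_count_report <file> <dir>
# Prints "<nlink> <names>" where:
#   nlink = link count stored in the inode, as reported by stat(1)
#   names = number of directory entries under <dir> that refer to the
#           same inode as <file> (on the same filesystem)
link_count_report() {
    local file=$1 dir=$2
    local nlink names

    # %h is the hard link count recorded in the inode item.
    nlink=$(stat -c %h "$file") || return 1

    # Print one dot per matching name and count the dots, so that file
    # names containing newlines cannot skew the result.
    names=$(find "$dir" -xdev -samefile "$file" -printf '.' | wc -c)

    # Arithmetic expansion strips any whitespace padding from wc.
    echo "$nlink $((names))"
}
```

    On a consistent filesystem where all links live under the given directory,
    both numbers match, for example "3001 3001" right after the final
    _mount_flakey in the test above. With the bug described here, the first
    number would be expected to come out smaller than the second after log
    replay.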
    
    So make sure an fsync always flushes the delayed inode item, so that the
    fsync log contains it (needed in order to trigger the link count fixup
    code) and fix the extref counting function, which always returned -ENOENT
    to its caller (and made it assume there were always 0 extrefs).
    
    This issue has been present since the introduction of the extrefs feature
    (2012).
    
    A test case for xfstests follows soon.

    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>
tree-log.c 120 KB