• Filipe Manana's avatar
    Btrfs: fix data corruption after fast fsync and writeback error · 8407f553
    Filipe Manana authored
    When we do a fast fsync, we start all ordered operations and then while
    they're running in parallel we visit the list of modified extent maps
    and construct their matching file extent items and write them to the
    log btree. After that, in btrfs_sync_log() we wait for all the ordered
    operations to finish (via btrfs_wait_logged_extents).
    
    The problem with this is that we were completely ignoring errors that
    can happen in the extent write path, such as -ENOSPC, a temporary -ENOMEM
    or -EIO errors for example. When such error happens, it means we have parts
    of the on disk extent that weren't written to, and so we end up logging
    file extent items that point to these extents that contain garbage/random
    data - so after a crash/reboot plus log replay, we get our inode's metadata
    pointing to those extents.
    
    This worked in contrast with the full (non-fast) fsync path, where we
    start all ordered operations, wait for them to finish and then write
    to the log btree. In this path, after each ordered operation completes
    we check if it's flagged with an error (BTRFS_ORDERED_IOERR) and return
    -EIO if so (via btrfs_wait_ordered_range).
    
    So if an error happens with any ordered operation, just return a -EIO
    error to userspace, so that it knows that not all of its previous writes
    were durably persisted and the application can take proper action (like
    redo the writes for e.g.) - and definitely not leave any file extent items
    in the log refer to non fully written extents.
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarChris Mason <clm@fb.com>
    8407f553
tree-log.h 2.77 KB