    Btrfs: deal with convert_extent_bit errors to avoid fs corruption · 663dfbb0
    Filipe Manana authored
    When committing a transaction or a log, we look for btree extents that
    need to be durably persisted by searching for ranges in an io tree that
    have some bits set (EXTENT_DIRTY or EXTENT_NEW). We then attempt to clear
    those bits and set the EXTENT_NEED_WAIT bit, with calls to the function
    convert_extent_bit, and then start writeback for the extents.
    
    That function can, however, return an error (at the moment only -ENOMEM
    is possible, especially when it does GFP_ATOMIC allocation requests
    through alloc_extent_state_atomic). In that case the ranges didn't get
    the EXTENT_NEED_WAIT bit set (or at least not for the whole range),
    which in turn means a call to btrfs_wait_marked_extents() won't find
    those ranges for which we started writeback, causing a transaction
    commit or a log commit to persist a new superblock without waiting
    for the writeback of extents in that range to finish first.
    
    Therefore if a crash happens after persisting the new superblock and
    before writeback finishes, we have a superblock pointing to roots that
    weren't fully persisted or roots that point to nodes or leafs that weren't
    fully persisted, causing all sorts of unexpected/bad behaviour as we end up
    reading garbage from disk or the content of some node/leaf from a past
    generation that got COWed or deleted and is no longer valid (for this latter
    case we end up getting error messages like "parent transid verify failed on
    X wanted Y found Z" when reading btree nodes/leafs from disk).
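
The failure mode described above can be illustrated with a small userspace toy model. This is only a sketch, not the kernel code: the names `toy_range`, `toy_convert`, `toy_write_marked`, `toy_wait_marked` and `toy_all_writeback_done` are hypothetical, and "waiting inline when tagging fails" is just one assumed way of dealing with the error, since the commit message itself does not spell out the fix.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one dirty btree range found in the io tree. */
struct toy_range {
    bool convert_fails;      /* simulate convert_extent_bit returning -ENOMEM */
    bool need_wait;          /* models the EXTENT_NEED_WAIT bit */
    bool writeback_started;
    bool writeback_done;
};

/* Models convert_extent_bit: tag the range for a later wait, or fail. */
static int toy_convert(struct toy_range *r)
{
    if (r->convert_fails)
        return -ENOMEM;
    r->need_wait = true;
    return 0;
}

/*
 * Write phase: start writeback for every range. When handle_errors is
 * false, a failed conversion is ignored (the behaviour the commit message
 * describes as buggy). When it is true, we wait for that range right away,
 * because the later wait phase will not be able to find it (assumed
 * strategy, for illustration only).
 */
static void toy_write_marked(struct toy_range *ranges, size_t n,
                             bool handle_errors)
{
    for (size_t i = 0; i < n; i++) {
        int err = toy_convert(&ranges[i]);

        ranges[i].writeback_started = true;
        if (err == -ENOMEM && handle_errors)
            ranges[i].writeback_done = true; /* waited inline */
    }
}

/* Wait phase: like btrfs_wait_marked_extents, it only sees tagged ranges. */
static void toy_wait_marked(struct toy_range *ranges, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ranges[i].need_wait) {
            ranges[i].writeback_done = true;
            ranges[i].need_wait = false;
        }
    }
}

/* A new superblock is only safe once every started writeback has finished. */
static bool toy_all_writeback_done(const struct toy_range *ranges, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ranges[i].writeback_started && !ranges[i].writeback_done)
            return false;
    }
    return true;
}
```

Running the write phase followed by the wait phase with `handle_errors` set to false leaves any range whose conversion failed with writeback still in flight, which is the window in which a crash leaves the superblock pointing at incompletely persisted data. With `handle_errors` set to true, every range is finished before the check.
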
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>
transaction.c 57.1 KB