    xfs: don't free EFIs before the EFDs are committed · 666d644c
    Dave Chinner authored
    Filesystems are occasionally being shut down with this error:
    
    xfs_trans_ail_delete_bulk: attempting to delete a log item that is
    not in the AIL.
    
    It was diagnosed as an EFI/EFD commit ordering problem that occurs
    when the EFI and EFD are in different checkpoints and the EFD is
    committed before the EFI; see:
    
    http://oss.sgi.com/archives/xfs/2013-01/msg00082.html
    
    The real problem is that a single bit (XFS_EFI_COMMITTED) cannot
    fully describe the states that EFI/EFD processing can be in. These
    completion states are:
    
    EFI                  EFI in AIL   EFD          Result
    committed/unpinned   Yes          committed    OK
    committed/pinned     No           committed    Shutdown
    uncommitted          No           committed    Shutdown

    Note that the "result" field is what should happen, not what does
    happen. The current logic is broken and handles the first two cases
    correctly by luck.  That is, the code will free the EFI if the
    XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The
    inverted logic "works" because if both EFI and EFD are committed,
    then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED
    bit, and the second frees the EFI item. Hence as long as
    xfs_efi_item_committed() has been called, everything appears to be
    fine.
    
    It is the third case where the logic fails - where
    xfs_efd_item_committed() is called before xfs_efi_item_committed(),
    and that results in the EFI being freed before it has been
    committed. That is the bug that triggered the shutdown, and hence
    keeping track of whether the EFI has been committed or not is
    insufficient to correctly order the EFI/EFD operations w.r.t. the
    AIL.
    
    What we really want is this: the EFI is always placed into the
    AIL before the last reference goes away. The only way to guarantee
    that is that the EFI is not freed until after it has been unpinned
    *and* the EFD has been committed. That is, restructure the logic so
    that the only case that can occur is the first case.
    
    This can be done easily by replacing the XFS_EFI_COMMITTED bit with
    an EFI reference count. The EFI is initialised with its own
    reference, and that reference is not released until the EFI is
    unpinned. However, there is a complication with this method: the
    high level EFI/EFD code in xfs_bmap_finish() does not hold direct
    references to the EFI structure, and runs a transaction commit
    between the EFI and EFD processing. Hence, with only that single
    reference, the EFI can be freed before the EFD has even been created.
    
    Further, log recovery uses the AIL to track EFIs and EFDs that need
    to be recovered, but it uses the AIL *differently* from the EFI
    transaction commit path. In particular, log recovery never pins or
    unpins EFIs, so we can't rely on an unpin to drop the EFI's own
    reference and free it.
    
    However, this doesn't prevent us from using a reference count here.
    There is a 1:1 relationship between EFIs and EFDs, so when we
    initialise the EFI we can take a second reference on behalf of the
    EFD. This solves the xfs_bmap_finish() issue: the EFI will never be
    freed until the EFD has been processed. For log recovery, when the
    EFD is committed we can check whether the XFS_EFI_RECOVERED bit is
    set and, if so, drop the EFI's own reference as well, which ensures
    recovered EFIs are also freed correctly.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Mark Tinguely <tinguely@sgi.com>
    Signed-off-by: Ben Myers <bpm@sgi.com>