• Brian Foster's avatar
    xfs: pass total block res. as total xfs_bmapi_write() parameter · dbd5c8c9
    Brian Foster authored
    The total field from struct xfs_alloc_arg is a bit of an unknown
    commodity. It is documented as the total block requirement for the
    transaction and is used in this manner from most call sites by virtue of
    passing the total block reservation of the transaction associated with
    an allocation. Several xfs_bmapi_write() callers pass hardcoded values
    of 0 or 1 for the total block requirement, which is a historical oddity
    without any clear reasoning.
    
    The xfs_iomap_write_direct() caller, for example, passes 0 for the total
    block requirement. This has been determined to cause problems in the
    form of ABBA deadlocks of AGF buffers due to incorrect AG selection in
    the block allocator. Specifically, the xfs_alloc_space_available()
    function incorrectly selects an AG that doesn't actually have sufficient
    space for the allocation. This occurs because the args.total field is 0
    and thus the remaining free space check on the AG doesn't actually
    consider the size of the allocation request. This locks the AGF buffer,
    the allocation attempt proceeds and ultimately fails (in
    xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next
    AG. In turn, this can lead to incorrect AG locking order (if the
    allocator wraps around, attempting to lock AG 0 after acquiring AG N)
    and thus deadlock if racing with another operation. This problem has
    been reproduced via generic/299 on smallish (1GB) ramdisk test devices.
    
    To avoid this problem, replace the undocumented hardcoded total
    parameters from the iomap and utility callers to pass the block
    reservation used for the associated transaction. This is consistent with
    other xfs_bmapi_write() callers throughout XFS. The assumption is that
    the total field allows the selection of an AG that can handle the entire
    operation rather than simply the allocation/range being requested (e.g.,
    resulting btree splits, etc.). This addresses the aforementioned
    generic/299 hang by ensuring AG selection only occurs when the
    allocation can be satisfied by the AG.
    Reported-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
    Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
    Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
    Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
    dbd5c8c9
xfs_iomap.c 24.2 KB