    xfs: fix race condition in AIL push trigger · 7ac95657
    Dave Chinner authored
    The recent conversion of the xfsaild functionality to a work queue
    introduced a hard-to-hit log space grant hang. The hang is caused by a
    race condition in determining whether there is a push in progress or
    not.
    
    The XFS_AIL_PUSHING_BIT is used to determine whether a push is
    currently in progress.  When the AIL push work completed, it checked
    whether the target had changed and then cleared the PUSHING bit to allow a
    new push to be requeued. The race condition is as follows:
    
    	Thread 1		push work
    
    	smp_wmb()
    				smp_rmb()
    				check ailp->xa_target unchanged
    	update ailp->xa_target
    	test/set PUSHING bit
    	does not queue
    				clear PUSHING bit
    				does not requeue
    
    Now that the push target is updated, new attempts to push the AIL
    will not trigger as the push target will be the same, and hence
    despite trying to push the AIL we won't ever wake it again.
    
    The fix is to ensure that the AIL push work clears the PUSHING bit
    before it checks if the target is unchanged.
    
    As a result, both push triggers operate on the same test/set bit
    criteria, so even if we race in the push work and miss the target
    update, the thread requesting the push will still set the PUSHING
    bit and queue the push work to occur. For safety's sake, the same
    queue check is done if the push work detects the target change,
    though only one of the two will queue new work due to the use
    of test_and_set_bit() checks.
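
The two orderings can be replayed in a small userspace model. This is
only a sketch under stated assumptions, not the kernel code: struct
ail_model and the helper names are hypothetical, C11 atomics stand in
for the kernel bit operations, the smp_wmb()/smp_rmb() barriers are not
modelled, and each interleaving is replayed step by step on a single
thread rather than raced for real.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the AIL push state (not kernel code). */
struct ail_model {
	_Atomic uint64_t target;	/* models ailp->xa_target */
	atomic_bool pushing;		/* models XFS_AIL_PUSHING_BIT */
	int queued;			/* times the push work was (re)queued */
};

static void ail_model_init(struct ail_model *ail, uint64_t target)
{
	atomic_init(&ail->target, target);
	atomic_init(&ail->pushing, true);	/* push work is already running */
	ail->queued = 0;
}

/* Queue the work only if the bit was clear, like test_and_set_bit(). */
static void try_queue(struct ail_model *ail)
{
	if (!atomic_exchange(&ail->pushing, true))
		ail->queued++;
}

/* Thread 1: publish the new target, then test/set the PUSHING bit. */
static void request_push(struct ail_model *ail, uint64_t new_target)
{
	atomic_store(&ail->target, new_target);
	try_queue(ail);
}

/* Old ordering: the work checks the target, then clears the bit. */
static int replay_old_order(void)
{
	struct ail_model ail;
	bool unchanged;

	ail_model_init(&ail, 100);
	unchanged = atomic_load(&ail.target) == 100;	/* work: check target */
	request_push(&ail, 200);		/* thread 1: bit set, no queue */
	atomic_store(&ail.pushing, false);	/* work: clear PUSHING bit */
	if (!unchanged)				/* work: saw no change */
		try_queue(&ail);
	return ail.queued;
}

/*
 * Fixed ordering: the work clears the bit, then checks the target.
 * 'when' selects where thread 1 runs: 0 = before the clear,
 * 1 = between the clear and the check, 2 = after the check.
 */
static int replay_fixed_order(int when)
{
	struct ail_model ail;

	ail_model_init(&ail, 100);
	if (when == 0)
		request_push(&ail, 200);
	atomic_store(&ail.pushing, false);	/* work: clear PUSHING bit first */
	if (when == 1)
		request_push(&ail, 200);
	if (atomic_load(&ail.target) != 100)	/* work: then check the target */
		try_queue(&ail);
	if (when == 2)
		request_push(&ail, 200);
	return ail.queued;
}
```

In this model replay_old_order() returns 0 (the push is lost), while
replay_fixed_order() returns 1 for each of the three interleavings:
exactly one side queues the work.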

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Alex Elder <aelder@sgi.com>
    
    (cherry picked from commit e4d3c4a4)
xfs_trans_ail.c 21.8 KB