    xfs: prevent deadlock trying to cover an active log · 192fedc5
    Dave Chinner authored
    commit 2c6e24ce upstream.
    
    Recent analysis of a deadlocked XFS filesystem from a kernel
    crash dump indicated that the filesystem was stuck waiting for log
    space. The short story of the hang on the RHEL6 kernel is this:
    
    	- the tail of the log is pinned by an inode
    	- the inode has been pushed by the xfsaild
    	- the inode has been flushed to its backing buffer and is
    	  currently flush locked and hence waiting for backing
    	  buffer IO to complete and remove it from the AIL
    	- the backing buffer is marked for write - it is on the
    	  delayed write queue
    	- the inode buffer has been modified directly and logged
    	  recently due to unlinked inode list modification
    	- the backing buffer is pinned in memory as it is in the
    	  active CIL context.
    	- the xfsbufd won't start buffer writeback because it is
    	  pinned
    	- xfssyncd won't force the log because it sees the log as
    	  needing to be covered and hence wants to issue a dummy
    	  transaction to move the log covering state machine along.
    
    Hence there is no trigger to force the CIL to the log and hence
    unpin the inode buffer and therefore complete the inode IO, remove
    it from the AIL and hence move the tail of the log along, allowing
    transactions to start again.
    
    Mainline kernels also have the same deadlock, though the signature
    is slightly different - the inode buffer never reaches the delayed
    write lists because xfs_buf_item_push() sees that it is pinned and
    hence never adds it to the delayed write list that the xfsaild
    flushes.
    
    There are two possible solutions here. The first is to simply force
    the log before trying to cover the log and so ensure that the CIL is
    emptied before we try to reserve space for the dummy transaction in
    the xfs_log_worker(). While this might work most of the time, it is
    still racy and is no guarantee that we don't get stuck in
    xfs_trans_reserve waiting for log space to come free. Hence it's not
    the best way to solve the problem.
    
    The second solution is to modify xfs_log_need_covered() to be aware
    of the CIL. We should only be attempting to cover the log if there
    is no current activity in the log - covering the log is the process
    of ensuring that the head and tail in the log on disk are identical
    (i.e. the log is clean and at idle). Hence, by definition, if there
    are items in the CIL then the log is not at idle and so we don't
    need to attempt to cover it.
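
    As a rough illustration of that check, here is a minimal user-space
    sketch. It is not the kernel code: struct log_model, its fields and
    log_need_covered() are hypothetical names invented for this example,
    and the real covering state machine has more states than the single
    flag used here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not the kernel's struct xlog. */
struct log_model {
	size_t cil_items;       /* items committed to the CIL, not yet in the log */
	bool   cover_requested; /* covering state machine wants a dummy transaction */
};

/*
 * Return true only when the log is idle and still needs covering.
 * Items in the CIL mean the log is active, so covering is skipped.
 */
static bool log_need_covered(const struct log_model *log)
{
	if (log->cil_items != 0)
		return false;	/* log is active: do not try to cover it */

	return log->cover_requested;
}
```

    With items in the CIL the sketch reports that no covering is needed,
    which leaves the caller free to force the log instead.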
    
    When we don't need to cover the log because it is active or idle, we
    issue a log force from xfs_log_worker() - if the log is idle, then
    this does nothing.  However, if the log is active due to there being
    items in the CIL, it will force the items in the CIL to the log and
    unpin them.
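
    The resulting worker behaviour can be modelled in the same spirit.
    This is only a sketch under stated assumptions: struct worker_log,
    enum worker_action and log_worker_tick() are hypothetical names, and
    the log force is reduced to clearing two counters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of what the periodic worker looks at. */
struct worker_log {
	size_t cil_items;       /* items waiting in the CIL */
	size_t pinned_buffers;  /* buffers pinned by the active CIL context */
	bool   cover_requested; /* covering state machine wants a dummy transaction */
};

enum worker_action {
	WORKER_COVER_LOG,	/* issue the dummy transaction (needs log space) */
	WORKER_FORCE_LOG	/* force the log; does nothing if it is idle */
};

/* One pass of the modelled worker. */
static enum worker_action log_worker_tick(struct worker_log *log)
{
	/* Only cover when nothing is pending in the CIL. */
	if (log->cil_items == 0 && log->cover_requested)
		return WORKER_COVER_LOG;

	/* Modelled log force: push the CIL to the log and unpin its items. */
	log->cil_items = 0;
	log->pinned_buffers = 0;
	return WORKER_FORCE_LOG;
}
```

    In this model a worker that finds items in the CIL never tries to
    reserve log space; it forces the log, and covering is only attempted
    on a later pass once the CIL is empty.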
    
    In the case of the above deadlock scenario, instead of
    xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to
    cover the log, it will instead force the log, thereby unpinning the
    inode buffer, allowing IO to be issued and complete and hence
    removing the inode that was pinning the tail of the log from the
    AIL. At that point, everything will start moving along again, i.e.
    the xfs_log_worker turns back into a watchdog that can alleviate
    deadlocks based around pinned items that prevent the tail of the log
    from being moved...
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Eric Sandeen <sandeen@redhat.com>
    Signed-off-by: Ben Myers <bpm@sgi.com>
    Signed-off-by: Jiri Slaby <jslaby@suse.cz>