    xfs: pin inode backing buffer to the inode log item · 298f7bec
    Dave Chinner authored
    When we dirty an inode, we are going to have to write it to disk at
    some point in the near future. This requires the inode cluster
    backing buffer to be present in memory. Unfortunately, under severe
    memory pressure we can reclaim the inode backing buffer while the
    inode is dirty in memory, resulting in stalling the AIL pushing
    because it has to do a read-modify-write cycle on the cluster
    buffer.
    
    When we have no memory available, the read of the cluster buffer
    blocks the AIL pushing process, and this causes all sorts of issues
    for memory reclaim as it requires inode writeback to make forwards
    progress. Allocating a cluster buffer causes more memory pressure,
    and results in more cluster buffers being reclaimed, resulting in
    more RMW cycles to be done in the AIL context and everything then
    backs up on AIL progress. Only the synchronous inode cluster
    writeback in the inode reclaim code provides some level of
    forwards progress guarantees that prevent OOM-killer rampages in
    this situation.
    
    Fix this by pinning the inode backing buffer to the inode log item
    when the inode is first dirtied (i.e. in xfs_trans_log_inode()).
    This may mean the first modification of an inode that has been held
    in cache for a long time may block on a cluster buffer read, but
    we can do that in transaction context and block safely until the
    buffer has been allocated and read.
    
    Once we have the cluster buffer, the inode log item takes a
    reference to it, pinning it in memory, and attaches it to the log
    item for future reference. This means we can always grab the cluster
    buffer from the inode log item when we need it.
    
    When the inode is finally cleaned and removed from the AIL, we can
    drop the reference the inode log item holds on the cluster buffer.
    Once all inodes on the cluster buffer are clean, the cluster buffer
    will be unpinned and it will be available for memory reclaim to
    reclaim again.
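
    The reference lifecycle described above can be sketched as a small
    user-space model. This is only an illustration under stated
    assumptions: struct toy_buf, struct toy_inode_log_item and the
    toy_*() helpers are hypothetical names invented for this sketch, and
    they are not the kernel's struct xfs_buf, struct xfs_inode_log_item
    or any real XFS function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode cluster buffer (not struct xfs_buf). */
struct toy_buf {
    int hold_count;          /* references held by dirty inode log items */
};

/* Hypothetical stand-in for an inode log item. */
struct toy_inode_log_item {
    struct toy_buf *bp;      /* attached cluster buffer; NULL while clean */
};

/*
 * First dirtying of an inode: take one reference on the cluster buffer
 * and attach it to the log item. Dirtying an already dirty inode again
 * takes no additional reference.
 */
static void toy_log_inode(struct toy_inode_log_item *iip, struct toy_buf *bp)
{
    if (iip->bp)
        return;
    bp->hold_count++;
    iip->bp = bp;
}

/*
 * Inode cleaned and removed from the AIL: drop the reference the log
 * item holds and detach the buffer.
 */
static void toy_inode_clean(struct toy_inode_log_item *iip)
{
    if (!iip->bp)
        return;
    assert(iip->bp->hold_count > 0);
    iip->bp->hold_count--;
    iip->bp = NULL;
}

/* The buffer may be reclaimed only when no dirty inode references it. */
static bool toy_buf_reclaimable(const struct toy_buf *bp)
{
    return bp->hold_count == 0;
}
```

    In this model, two inodes dirtied against the same cluster buffer
    leave hold_count at 2, and the buffer only becomes reclaimable
    after both have been cleaned.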
    
    This avoids the issues with needing to do RMW cycles in the AIL
    pushing context, and hence allows complete non-blocking inode
    flushing to be performed by the AIL pushing context.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
xfs_inode_buf.c 19.4 KB