• Brian Foster's avatar
    xfs: add missing ilock around dio write last extent alignment · 009c6e87
    Brian Foster authored
    The iomap codepath (via get_blocks()) acquires and release the inode
    lock in the case of a direct write that requires block allocation. This
    is because xfs_iomap_write_direct() allocates a transaction, which means
    the ilock must be dropped and reacquired after the transaction is
    allocated and reserved.
    
    xfs_iomap_write_direct() invokes xfs_iomap_eof_align_last_fsb() before
    the transaction is created and thus before the ilock is reacquired. This
    can lead to calls to xfs_iread_extents() and reads of the in-core extent
    list without any synchronization (via xfs_bmap_eof() and
    xfs_bmap_last_extent()). xfs_iread_extents() assert fails if the ilock
    is not held, but this is not currently seen in practice as the current
    callers had already invoked xfs_bmapi_read().
    
    What has been seen in practice are reports of crashes down in the
    xfs_bmap_eof() codepath on direct writes due to seemingly bogus pointer
    references from xfs_iext_get_ext(). While an explicit reproducer is not
    currently available to confirm the cause of the problem, crash analysis
    and code inspection from David Jeffrey had identified the insufficient
    locking.
    
    xfs_iomap_eof_align_last_fsb() is called from other contexts with the
    inode lock already held, so we cannot acquire it therein.
    __xfs_get_blocks() acquires and drops the ilock with variable flags to
    cover the event that the extent list must be read in. The common case is
    that __xfs_get_blocks() acquires the shared ilock. To provide locking
    around the last extent alignment call without adding more lock cycles to
    the dio path, update xfs_iomap_write_direct() to expect the shared ilock
    held on entry and do the extent alignment under its protection. Demote
    the lock, if necessary, from __xfs_get_blocks() and push the
    xfs_qm_dqattach() call outside of the shared lock critical section.
    Also, add an assert to document that the extent list is always expected
    to be present in this path. Otherwise, we risk a call to
    xfs_iread_extents() while under the shared ilock. This is safe as all
    current callers have executed an xfs_bmapi_read() call under the current
    iolock context.
    Reported-by: default avatarDavid Jeffery <djeffery@redhat.com>
    Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
    Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
    Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
    009c6e87
xfs_pnfs.c 8.09 KB