• Brian Foster's avatar
    xfs: trim writepage mapping to within eof · 40214d12
    Brian Foster authored
    The writeback rework in commit fbcc0256 ("xfs: Introduce
    writeback context for writepages") introduced a subtle change in
    behavior with regard to the block mapping used across the
    ->writepages() sequence. The previous xfs_cluster_write() code would
    only flush pages up to EOF at the time of the writepage, thus
    ensuring that any pages due to file-extending writes would be
    handled on a separate cycle and with a new, updated block mapping.
    
    The updated code establishes a block mapping in xfs_writepage_map()
    that could extend beyond EOF if the file has post-eof preallocation.
    Because we now use the generic writeback infrastructure and pass the
    cached mapping to each writepage call, there is no implicit EOF
    limit in place. If eofblocks trimming occurs during ->writepages(),
    any post-eof portion of the cached mapping becomes invalid. The
    eofblocks code has no means to serialize against writeback because
    there are no pages associated with post-eof blocks. Therefore if an
    eofblocks trim occurs and is followed by a file-extending buffered
    write, not only has the mapping become invalid, but we could end up
    writing a page to disk based on the invalid mapping.
    
    Consider the following sequence of events:
    
    - A buffered write creates a delalloc extent and post-eof
      speculative preallocation.
    - Writeback starts and on the first writepage cycle, the delalloc
      extent is converted to real blocks (including the post-eof blocks)
      and the mapping is cached.
    - The file is closed and xfs_release() trims post-eof blocks. The
      cached writeback mapping is now invalid.
    - Another buffered write appends the file with a delalloc extent.
    - The concurrent writeback cycle picks up the just written page
      because the writeback range end is LLONG_MAX. xfs_writepage_map()
      attributes it to the (now invalid) cached mapping and writes the
      data to an incorrect location on disk (and where the file offset is
      still backed by a delalloc extent).
    
    This problem is reproduced by xfstests test generic/464, which
    triggers racing writes, appends, open/closes and writeback requests.
    
    To address this problem, trim the mapping used during writeback to
    within EOF when the mapping is validated. This ensures the mapping
    is revalidated for any pages encountered beyond EOF as of the time
    the current mapping was cached or last validated.
    Reported-by: default avatarEryu Guan <eguan@redhat.com>
    Diagnosed-by: default avatarEryu Guan <eguan@redhat.com>
    Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
    Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
    40214d12
xfs_bmap.h 9.9 KB