    xfs: skip background cowblock trims on inodes open for write · 90a71daa
    Brian Foster authored
    The background blockgc scanner runs on a 5m interval by default and
    trims preallocation (post-eof and cow fork) from inodes that are
    otherwise idle. Idle effectively means that iolock can be acquired
    without blocking and that the inode has no dirty pagecache or I/O in
    flight.
    
    This simple mechanism and heuristic has worked fairly well for
    post-eof speculative preallocations. Support for reflink and COW
    fork preallocations came sometime later and plugged into the same
    mechanism, with similar heuristics. Some recent testing has shown
    that COW fork preallocation may be notably more sensitive to blockgc
    processing than post-eof preallocation, however.
    
    For example, consider an 8GB reflinked file with a COW extent size
    hint of 1MB. A worst case fully randomized overwrite of this file
    results in ~8k extents of an average size of ~1MB. If the same
    workload is interrupted a couple of times for blockgc processing
    (assuming the file goes idle), the resulting extent count explodes
    to over 100k extents with an average size <100kB. This is
    significantly worse than ideal and essentially defeats the COW
    extent size hint mechanism.
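
The figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes binary units (8GB = 8 << 30 bytes, 1MB = 1 << 20 bytes), which the text does not state explicitly.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of extents needed to cover file_bytes if
 * every extent is exactly hint_bytes long (rounded up).
 */
static uint64_t extents_at_hint(uint64_t file_bytes, uint64_t hint_bytes)
{
	return (file_bytes + hint_bytes - 1) / hint_bytes;
}

/*
 * Hypothetical helper: average extent size in bytes for a file of
 * file_bytes split into nr_extents extents (integer division).
 */
static uint64_t avg_extent_bytes(uint64_t file_bytes, uint64_t nr_extents)
{
	return file_bytes / nr_extents;
}
```

Under these assumptions, `extents_at_hint(8ULL << 30, 1ULL << 20)` gives 8192 extents (the "~8k" case), while `avg_extent_bytes(8ULL << 30, 100000)` gives 85899 bytes, which is in line with the "<100kB" average quoted for the fragmented case.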
    
    While this particular test is instrumented, it reflects a fairly
    reasonable pattern in practice where random I/Os might spread out
    over a long period of time with varying periods of (in)activity.
    For example, consider a cloned disk image file for a VM or container
    with long uptime and variable and bursty usage. A background blockgc
    scan that races and processes the image file when it happens to be
    clean and idle can have a significant effect on the future
    fragmentation level of the file, even when still in use.
    
    To help combat this, update the heuristic to skip cowblocks inodes
    that are currently opened for write access during non-sync blockgc
    scans. This allows COW fork preallocations to persist for as long as
    possible unless otherwise needed for functional purposes (i.e. a
    sync scan), the file is idle and closed, or the inode is being
    evicted from cache. While here, update the comments to help
    distinguish performance oriented heuristics from the logic that
    exists to maintain functional correctness.
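
The updated decision can be modeled roughly as follows. This is a standalone userspace sketch, not the kernel code: `struct inode_model` and `should_trim_cowblocks()` are hypothetical names, and treating the dirty/in-flight I/O check as the correctness condition is an assumption drawn from the description of "idle" above.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the per-inode state the scanner
 * consults. This is NOT the kernel's struct xfs_inode.
 */
struct inode_model {
	bool has_cow_blocks;        /* COW fork holds preallocated blocks */
	bool open_for_write;        /* at least one writable open exists */
	bool dirty_or_io_in_flight; /* dirty pagecache or outstanding I/O */
};

/* Decide whether a blockgc scan should trim this inode's COW fork. */
static bool should_trim_cowblocks(const struct inode_model *ip, bool sync_scan)
{
	/* Nothing to trim. */
	if (!ip->has_cow_blocks)
		return false;

	/*
	 * Performance heuristic: background (non-sync) scans leave inodes
	 * alone while they are open for write, so the preallocation can
	 * keep absorbing future overwrites.
	 */
	if (!sync_scan && ip->open_for_write)
		return false;

	/*
	 * Correctness (assumed here): never trim while data is dirty or
	 * I/O is in flight, regardless of scan type.
	 */
	return !ip->dirty_or_io_in_flight;
}
```

In this model a clean, idle inode that is still open for write is skipped by a background scan but trimmed by a sync scan, while a dirty inode is skipped by both.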
    
    Suggested-by: Darrick Wong <djwong@kernel.org>
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs_icache.c 57.6 KB