    mm: vmscan: reclaim writepage is IO cost · 96f8bf4f
    Johannes Weiner authored
    The VM tries to balance reclaim pressure between anon and file so as to
    reduce the amount of IO incurred due to the memory shortage.  It already
    counts refaults and swapins, but in addition it should also count
    writepage calls during reclaim.
    
    For swap, this is obvious: it's IO that wouldn't have occurred if the
    anonymous memory hadn't been under memory pressure.  From a relative
    balancing point of view this makes sense as well: even if anon is cold and
    reclaimable, a cache that isn't thrashing may have equally cold pages that
    don't require IO to reclaim.
    
    For file writeback, it's trickier: some of the reclaim writepage IO would
    have likely occurred anyway due to dirty expiration.  But not all of it -
    premature writeback reduces batching and generates additional writes.
    Since the flushers are already woken up by the time the VM starts writing
    cache pages one by one, let's assume that we're likely causing writes that
    wouldn't have happened without memory pressure.  In addition, the per-page
    cost of IO would have probably been much cheaper if written in larger
    batches from the flusher thread rather than the single-page-writes from
    kswapd.
    
    For our purposes - getting the trend right to accelerate convergence on a
    stable state that doesn't require paging at all - this is sufficiently
    accurate.  If we later want to optimize for sustained thrashing, we can
    still refine the measurements.
    
    Count all writepage calls from kswapd as IO cost toward the LRU that the
    page belongs to.
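
    The accounting described above can be sketched in user-space C.  This is
    an illustration only, not the code in this patch: the names lru_type,
    lru_cost, page_sketch, note_io_cost() and reclaim_one() are hypothetical
    stand-ins, and the "write" is merely simulated.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
enum lru_type { LRU_TYPE_ANON, LRU_TYPE_FILE, LRU_TYPE_COUNT };

struct lru_cost {
	/* Pages of reclaim-induced IO charged to each LRU type. */
	unsigned long io_cost[LRU_TYPE_COUNT];
};

struct page_sketch {
	enum lru_type lru;	/* which LRU the page sits on */
	bool dirty;		/* must be written before it can be freed */
	unsigned long nr_pages;	/* 1 for a base page, more for a compound page */
};

/* Charge IO cost to the LRU that the page belongs to. */
static void note_io_cost(struct lru_cost *cost, enum lru_type lru,
			 unsigned long nr_pages)
{
	cost->io_cost[lru] += nr_pages;
}

/*
 * Reclaim one page.  Returns true if a (simulated) writepage call was
 * needed, in which case the write is counted as IO cost.
 */
static bool reclaim_one(struct lru_cost *cost, const struct page_sketch *page)
{
	if (!page->dirty)
		return false;	/* clean page: freed without IO, no cost */

	/* A real implementation would submit the write here. */
	note_io_cost(cost, page->lru, page->nr_pages);
	return true;
}
```

    In this sketch a dirty anon page raises io_cost[LRU_TYPE_ANON] by its
    size, while a clean file page leaves both counters untouched.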
    
    Why do this dynamically?  Don't we know in advance that anon pages require
    IO to reclaim, and so could build in a static bias?
    
    First, scanning is not the same as reclaiming.  If all the anon pages are
    referenced, we may not swap for a while just because we're scanning the
    anon list.  During this time, however, it's important that we age
    anonymous memory and the page cache at the same rate so that their
    hot-cold gradients are comparable.  Everything else being equal, we still
    want to reclaim the coldest memory overall.
    
    Second, we keep copies in swap unless the page changes.  If there is
    swap-backed data that's mostly read (tmpfs file) and has been swapped out
    before, we can reclaim it without incurring additional IO.
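
    The two points above can be illustrated with a small user-space C
    sketch.  It is a simplification under stated assumptions, not kernel
    code: anon_page_sketch, anon_outcome and scan_anon_page() are
    hypothetical names, and each page is assumed to cost one unit of IO.

```c
#include <stdbool.h>

/* Hypothetical outcomes of scanning one anonymous page. */
enum anon_outcome {
	ANON_KEPT,		/* scanned but not reclaimed */
	ANON_FREED_NO_IO,	/* reclaimed, swap copy was still valid */
	ANON_FREED_WITH_IO	/* reclaimed, had to be written to swap */
};

struct anon_page_sketch {
	bool referenced;	/* recently used */
	bool has_swap_copy;	/* a copy already exists in swap */
	bool modified;		/* changed since that copy was written */
};

/*
 * Scan one anonymous page and charge IO cost only when a write is
 * actually required.  A static bias would charge every scanned page.
 */
static enum anon_outcome scan_anon_page(const struct anon_page_sketch *page,
					unsigned long *anon_io_cost)
{
	if (page->referenced)
		return ANON_KEPT;		/* scanning is not reclaiming */

	if (page->has_swap_copy && !page->modified)
		return ANON_FREED_NO_IO;	/* existing swap copy is reused */

	*anon_io_cost += 1;			/* simulated swap-out write */
	return ANON_FREED_WITH_IO;
}
```

    Only the third outcome adds to the anon cost here, which is why a
    dynamic count can differ widely from a fixed per-scan penalty.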
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-14-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
swap.c 31.9 KB