• Johannes Weiner's avatar
    mm: zero-seek shrinkers · 4b85afbd
    Johannes Weiner authored
    The page cache and most shrinkable slab caches hold data that has been
    read from disk, but there are some caches that only cache CPU work, such
    as the dentry and inode caches of procfs and sysfs, as well as the subset
    of radix tree nodes that track non-resident page cache.
    
    Currently, all these are shrunk at the same rate: using DEFAULT_SEEKS for
    the shrinker's seeks setting tells the reclaim algorithm that for every
    two page cache pages scanned it should scan one slab object.
    
    This is a bogus setting.  A virtual inode that required no IO to create is
    not twice as valuable as a page cache page; shadow cache entries with
    eviction distances beyond the size of memory aren't either.
    
    In most cases, the behavior in practice is still fine.  Such virtual
    caches don't tend to grow and assert themselves aggressively, and usually
    get picked up before they cause problems.  But there are scenarios where
    that's not true.
    
    Our database workloads suffer from two of those.  For one, their file
    workingset is several times bigger than available memory, which has the
    kernel aggressively create shadow page cache entries for the non-resident
    parts of it.  The workingset code does tell the VM that most of these are
    expendable, but the VM ends up balancing them 2:1 to cache pages as per
    the seeks setting.  This is a huge waste of memory.
    
    These workloads also deal with tens of thousands of open files and use
    /proc for introspection, which ends up growing the proc_inode_cache to
    absurdly large sizes - again at the cost of valuable cache space, which
    isn't a reasonable trade-off, given that proc inodes can be re-created
    without involving the disk.
    
    This patch implements a "zero-seek" setting for shrinkers that results in
    a target ratio of 0:1 between their objects and IO-backed caches.  This
    allows such virtual caches to grow when memory is available (they do
    cache/avoid CPU work after all), but effectively disables them as soon as
    IO-backed objects are under pressure.
    
    It then switches the shrinkers for procfs and sysfs metadata, as well as
    excess page cache shadow nodes, to the new zero-seek setting.
    
    Link: http://lkml.kernel.org/r/20181009184732.762-5-hannes@cmpxchg.orgSigned-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
    Reported-by: default avatarDomas Mituzas <dmituzas@fb.com>
    Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Reviewed-by: default avatarRik van Riel <riel@surriel.com>
    Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    4b85afbd
workingset.c 19.5 KB