    list_lru: allow explicit memcg and NUMA node selection · 0a97c01c
    Nhat Pham authored
    Patch series "workload-specific and memory pressure-driven zswap
    writeback", v8.
    
    There are currently several issues with zswap writeback:
    
    1. There is only a single global LRU for zswap, making it impossible to
       perform workload-specific shrinking - a memcg under memory pressure
       cannot determine which pages in the pool it owns, and often ends up
       writing pages from other memcgs. This issue has been previously
       observed in practice and mitigated by simply disabling
       memcg-initiated shrinking:
    
       https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
    
       But this solution leaves a lot to be desired, as we still do not
       have an avenue for a memcg to free up its own memory locked up in
       the zswap pool.
    
    2. We only shrink the zswap pool when the user-defined limit is hit.
       This means that if we set the limit too high, cold data that are
       unlikely to be used again will reside in the pool, wasting precious
       memory. It is hard to predict how much zswap space will be needed
       ahead of time, as this depends on the workload (specifically, on
       factors such as memory access patterns and compressibility of the
       memory pages).
    
    This patch series solves these issues by separating the global zswap LRU
    into per-memcg and per-NUMA LRUs, and by performing workload-specific
    (i.e., memcg- and NUMA-aware) zswap writeback under memory pressure.  The new
    shrinker does not have any parameter that must be tuned by the user, and
    can be opted in or out on a per-memcg basis.
    
    As a proof of concept, we ran the following synthetic benchmark: build the
    Linux kernel in a memory-limited cgroup, and allocate some cold data in
    tmpfs to see if the shrinker could write them out and improve the overall
    performance.  Depending on the amount of cold data generated, we observed
    a 14% to 35% reduction in kernel CPU time used in the kernel builds.
    
    
    This patch (of 6):
    
    The list_lru interface assumes that the list node and the data it
    represents belong to the same allocation, made on the correct
    node/memcg.  While this assumption holds for existing slab object LRUs
    such as dentries and inodes, it is undocumented and rather inflexible for
    certain potential list_lru users (such as the upcoming zswap shrinker and
    the THP shrinker).  It has caused us a lot of issues during our
    development.
    
    This patch changes the list_lru interface so that the caller must explicitly
    specify the NUMA node and memcg when adding and removing objects.  The old
    list_lru_add() and list_lru_del() are renamed to list_lru_add_obj() and
    list_lru_del_obj(), respectively.
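
To illustrate the idea of caller-selected placement, here is a minimal
userspace sketch.  It is not the kernel implementation: struct toy_lru,
toy_lru_add(), toy_lru_del(), MAX_NODES and MAX_MEMCGS are hypothetical
names invented for this example, and the memcg is reduced to a plain index.
The only point is that the caller, not the address of the list node, decides
which per-node, per-memcg sublist an item lands on.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES  2   /* hypothetical number of NUMA nodes */
#define MAX_MEMCGS 2   /* hypothetical number of memcgs */

/* Minimal circular doubly linked list, loosely modeled on list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_node_init(struct list_node *n)
{
	n->prev = n->next = n;
}

static bool list_node_empty(const struct list_node *n)
{
	return n->next == n;
}

/* Toy LRU: one sublist and one counter per (node, memcg) pair. */
struct toy_lru {
	struct list_node lists[MAX_NODES][MAX_MEMCGS];
	long nr_items[MAX_NODES][MAX_MEMCGS];
};

static void toy_lru_init(struct toy_lru *lru)
{
	for (int nid = 0; nid < MAX_NODES; nid++) {
		for (int cg = 0; cg < MAX_MEMCGS; cg++) {
			list_node_init(&lru->lists[nid][cg]);
			lru->nr_items[nid][cg] = 0;
		}
	}
}

/*
 * Add @item to the tail of the sublist named by the caller.
 * Returns false if the item is already linked on some list.
 */
static bool toy_lru_add(struct toy_lru *lru, struct list_node *item,
			int nid, int cg)
{
	struct list_node *head;

	assert(nid >= 0 && nid < MAX_NODES && cg >= 0 && cg < MAX_MEMCGS);
	if (!list_node_empty(item))
		return false;

	head = &lru->lists[nid][cg];
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
	lru->nr_items[nid][cg]++;
	return true;
}

/*
 * Remove @item from the sublist named by the caller.
 * Returns false if the item is not linked on any list.
 */
static bool toy_lru_del(struct toy_lru *lru, struct list_node *item,
			int nid, int cg)
{
	assert(nid >= 0 && nid < MAX_NODES && cg >= 0 && cg < MAX_MEMCGS);
	if (list_node_empty(item))
		return false;

	item->prev->next = item->next;
	item->next->prev = item->prev;
	list_node_init(item);
	lru->nr_items[nid][cg]--;
	return true;
}
```

In this sketch the caller must pass the same (nid, cg) pair to toy_lru_del()
that it passed to toy_lru_add(), otherwise the wrong counter is adjusted.
That burden on the caller is the trade-off of an explicit interface.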
    
    It also extends the list_lru API with a new function, list_lru_putback(),
    which undoes a previous list_lru_isolate() call.  Unlike list_lru_add(), it
    does not increment the LRU node count (as list_lru_isolate() does not
    decrement the node count).  list_lru_putback() also allows for explicit
    memcg and NUMA node selection.
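
The counting rule for putback can be sketched in userspace as follows.  This
is a toy model under stated assumptions, not the kernel code: struct
toy_lru_node, its two counters (list_items and node_items) and the toy_*()
helpers are hypothetical names.  The model assumes a node-level count that
isolate leaves untouched, so putback must leave it untouched as well, while
add increments it.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, loosely modeled on list_head. */
struct lnode {
	struct lnode *prev, *next;
};

static void lnode_init(struct lnode *n)
{
	n->prev = n->next = n;
}

static void lnode_add_tail(struct lnode *item, struct lnode *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void lnode_del_init(struct lnode *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	lnode_init(item);
}

/* Toy per-node LRU with two counters (an assumption of this sketch). */
struct toy_lru_node {
	struct lnode list;
	long list_items;   /* items currently linked on @list */
	long node_items;   /* items accounted to the node, isolated or not */
};

static void toy_node_init(struct toy_lru_node *n)
{
	lnode_init(&n->list);
	n->list_items = 0;
	n->node_items = 0;
}

/* A fresh insertion: both counters go up. */
static void toy_add(struct toy_lru_node *n, struct lnode *item)
{
	lnode_add_tail(item, &n->list);
	n->list_items++;
	n->node_items++;
}

/* Temporarily unlink an item; the node-level count is left alone. */
static void toy_isolate(struct toy_lru_node *n, struct lnode *item)
{
	lnode_del_init(item);
	n->list_items--;
}

/* Undo toy_isolate(): relink the item without touching node_items. */
static void toy_putback(struct toy_lru_node *n, struct lnode *item)
{
	lnode_add_tail(item, &n->list);
	n->list_items++;
}
```

Calling toy_add() instead of toy_putback() after an isolate would count the
same item twice at the node level, which is the mistake the separate
putback operation avoids in this model.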
    
    Link: https://lkml.kernel.org/r/20231130194023.4102148-1-nphamcs@gmail.com
    Link: https://lkml.kernel.org/r/20231130194023.4102148-2-nphamcs@gmail.com
    Signed-off-by: Nhat Pham <nphamcs@gmail.com>
    Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
    Cc: Chris Li <chrisl@kernel.org>
    Cc: Dan Streetman <ddstreet@ieee.org>
    Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Seth Jennings <sjenning@redhat.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Vitaly Wool <vitaly.wool@konsulko.com>
    Cc: Yosry Ahmed <yosryahmed@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>