• Shakeel Butt's avatar
    mm/vmscan.c: prevent useless kswapd loops · dffcac2c
    Shakeel Butt authored
    In production we have noticed hard lockups on large machines running
    large jobs due to kswaps hoarding lru lock within isolate_lru_pages when
    sc->reclaim_idx is 0 which is a small zone.  The lru was couple hundred
    GiBs and the condition (page_zonenum(page) > sc->reclaim_idx) in
    isolate_lru_pages() was basically skipping GiBs of pages while holding
    the LRU spinlock with interrupt disabled.
    
    On further inspection, it seems like there are two issues:
    
    (1) If kswapd on the return from balance_pgdat() could not sleep (i.e.
        node is still unbalanced), the classzone_idx is unintentionally set
        to 0 and the whole reclaim cycle of kswapd will try to reclaim only
        the lowest and smallest zone while traversing the whole memory.
    
    (2) Fundamentally isolate_lru_pages() is really bad when the
        allocation has woken kswapd for a smaller zone on a very large machine
        running very large jobs.  It can hoard the LRU spinlock while skipping
        over 100s of GiBs of pages.
    
    This patch only fixes (1).  (2) needs a more fundamental solution.  To
    fix (1), in the kswapd context, if pgdat->kswapd_classzone_idx is
    invalid use the classzone_idx of the previous kswapd loop otherwise use
    the one the waker has requested.
    
    Link: http://lkml.kernel.org/r/20190701201847.251028-1-shakeelb@google.com
    Fixes: e716f2eb ("mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx")
    Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
    Reviewed-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
    Acked-by: default avatarMel Gorman <mgorman@techsingularity.net>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Hillf Danton <hdanton@sina.com>
    Cc: Roman Gushchin <guro@fb.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    dffcac2c
vmscan.c 122 KB