    mm, oom: rework oom detection · 0a0337e0
    Michal Hocko authored
    __alloc_pages_slowpath has traditionally relied on direct reclaim
    and did_some_progress as an indicator that it makes sense to retry
    the allocation rather than declaring OOM.  shrink_zones had to rely on
    zone_reclaimable if shrink_zone didn't make any progress, to prevent
    a premature OOM killer invocation - the LRU might be full of dirty or
    writeback pages and direct reclaim cannot clean those up.
    
    zone_reclaimable allows rescanning the reclaimable lists several times
    and restarting if a page is freed.  This is really subtle behavior and
    it might lead to a livelock when a single freed page keeps the
    allocator looping but the current task is not able to allocate that
    single page.  Invoking the OOM killer would be more appropriate than
    looping without any progress for an unbounded amount of time.
    
    This patch changes the OOM detection logic and pulls it out of
    shrink_zone, which is too low-level to be appropriate for high-level
    decisions such as OOM, which is a per-zonelist property.  It is
    __alloc_pages_slowpath which knows how many attempts have been made
    and what the progress has been so far, so it is the more appropriate
    place to implement this logic.
    
    The new heuristic is implemented in the should_reclaim_retry helper,
    called from __alloc_pages_slowpath.  It tries to be more deterministic
    and easier to follow.  It builds on the assumption that retrying makes
    sense only if the currently reclaimable memory + free pages would allow
    the current allocation request to succeed (as per __zone_watermark_ok)
    for at least one zone in the usable zonelist.
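    
    As an illustration only, the feasibility test described above can be
    modelled in user space roughly as follows.  This is a sketch, not the
    code added by the patch: struct zone_model, watermark_ok_model and
    retry_makes_sense are hypothetical names, and the watermark test is a
    deliberately simplified stand-in for __zone_watermark_ok (it ignores
    lowmem reserves and per-order free lists).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a zone's counters (not struct zone). */
struct zone_model {
    unsigned long free_pages;        /* pages free right now */
    unsigned long reclaimable_pages; /* pages reclaim could still free */
    unsigned long min_watermark;     /* pages that must stay free */
};

/*
 * Simplified model of the watermark check: would a request of 2^order
 * pages still leave the zone at or above its watermark if `available`
 * pages were free?
 */
static bool watermark_ok_model(const struct zone_model *z, unsigned int order,
                               unsigned long available)
{
    /* Guard the shift below against an out-of-range order. */
    assert(order < sizeof(unsigned long) * CHAR_BIT);

    unsigned long request = 1UL << order;
    return available >= z->min_watermark + request;
}

/*
 * Retrying is worthwhile only if at least one zone could satisfy the
 * request once everything reclaimable in it had been freed.
 */
static bool retry_makes_sense(const struct zone_model *zones, size_t nr_zones,
                              unsigned int order)
{
    for (size_t i = 0; i < nr_zones; i++) {
        unsigned long available =
            zones[i].reclaimable_pages + zones[i].free_pages;

        if (watermark_ok_model(&zones[i], order, available))
            return true;
    }
    return false;
}
```

    In this model a single zone that passes the check is enough to keep
    retrying, which mirrors the "at least one zone" wording above.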
    
    This alone wouldn't be sufficient, though, because the writeback might
    get stuck and reclaimable pages might be pinned for a really long time
    or even depend on the current allocation context.  Therefore there is a
    backoff mechanism implemented which reduces the reclaim target after
    each reclaim round without any progress.  This means that we should
    eventually converge to only NR_FREE_PAGES as the target and fail on the
    wmark check and proceed to OOM.  The backoff is simple and linear:
    the target is reduced by 1/16 of the reclaimable pages for each round
    without any progress.  We are optimistic and reset the counter after
    a successful reclaim round.
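    
    A minimal user-space sketch of this linear backoff, assuming 16 steps
    to match the "1/16 per round" wording.  BACKOFF_STEPS, backoff_target
    and update_no_progress are hypothetical names used for illustration
    and are not taken from the patch; overflow of the intermediate product
    is ignored for brevity.

```c
#include <assert.h>

/* Assumed step count, chosen to match the "1/16 per round" description. */
#define BACKOFF_STEPS 16UL

/*
 * Pages counted as available after `no_progress_loops` consecutive
 * reclaim rounds without progress: the reclaimable part shrinks by
 * 1/16 per round, the free pages are always counted in full.
 */
static unsigned long backoff_target(unsigned long reclaimable,
                                    unsigned long free_pages,
                                    unsigned long no_progress_loops)
{
    /* Beyond 16 rounds the discount would exceed the reclaimable pages. */
    assert(no_progress_loops <= BACKOFF_STEPS);

    /* Round the discount up so that small zones also converge. */
    unsigned long discount =
        (no_progress_loops * reclaimable + BACKOFF_STEPS - 1) / BACKOFF_STEPS;

    return reclaimable - discount + free_pages;
}

/* Optimistic counter update: any successful round resets the backoff. */
static unsigned long update_no_progress(unsigned long loops, int did_progress)
{
    return did_progress ? 0 : loops + 1;
}
```

    With 1600 reclaimable and 100 free pages, the target in this model
    falls from 1700 to 1600 after one fruitless round and reaches the 100
    free pages alone after sixteen, at which point only the watermark
    check on free memory decides whether to retry.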
    
    Costly high-order allocations mostly preserve their semantics: those
    without __GFP_REPEAT fail right away, while those which have the flag
    set will back off after the amount of reclaimable pages reaches the
    equivalent of the requested order.  The only difference is that if
    there was no progress during the reclaim we rely on the zone watermark
    check.  This is a more logical thing to do than the previous 1<<order
    attempts, which were a result of zone_reclaimable faking the progress.
    
    [vdavydov@virtuozzo.com: check classzone_idx for shrink_zone]
    [hannes@cmpxchg.org: separate the heuristic into should_reclaim_retry]
    [rientjes@google.com: use zone_page_state_snapshot for NR_FREE_PAGES]
    [rientjes@google.com: shrink_zones doesn't need to return anything]
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
    Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <js1304@gmail.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>