• Shaohua Li's avatar
    mm: delete unnecessary TTU_* flags · a128ca71
    Shaohua Li authored
    Patch series "mm: fix some MADV_FREE issues", v5.
    
    We are trying to use MADV_FREE in jemalloc.  Several issues are found.
    Without solving the issues, jemalloc can't use the MADV_FREE feature.
    
     - Doesn't support system without swap enabled. Because if swap is off,
       we can't or can't efficiently age anonymous pages. And since
       MADV_FREE pages are mixed with other anonymous pages, we can't
       reclaim MADV_FREE pages. In current implementation, MADV_FREE will
       fallback to MADV_DONTNEED without swap enabled. But in our
       environment, a lot of machines don't enable swap. This will prevent
       our setup using MADV_FREE.
    
     - Increases memory pressure. page reclaim bias file pages reclaim
       against anonymous pages. This doesn't make sense for MADV_FREE pages,
       because those pages could be freed easily and refilled with very
       slight penality. Even page reclaim doesn't bias file pages, there is
       still an issue, because MADV_FREE pages and other anonymous pages are
       mixed together. To reclaim a MADV_FREE page, we probably must scan a
       lot of other anonymous pages, which is inefficient. In our test, we
       usually see oom with MADV_FREE enabled and nothing without it.
    
     - Accounting. There are two accounting problems. We don't have a global
       accounting. If the system is abnormal, we don't know if it's a
       problem from MADV_FREE side. The other problem is RSS accounting.
       MADV_FREE pages are accounted as normal anon pages and reclaimed
       lazily, so application's RSS becomes bigger. This confuses our
       workloads. We have monitoring daemon running and if it finds
       applications' RSS becomes abnormal, the daemon will kill the
       applications even kernel can reclaim the memory easily.
    
    To address the first the two issues, we can either put MADV_FREE pages
    into a separate LRU list (Minchan's previous patches and V1 patches), or
    put them into LRU_INACTIVE_FILE list (suggested by Johannes).  The
    patchset use the second idea.  The reason is LRU_INACTIVE_FILE list is
    tiny nowadays and should be full of used once file pages.  So we can
    still efficiently reclaim MADV_FREE pages there without interference
    with other anon and active file pages.  Putting the pages into inactive
    file list also has an advantage which allows page reclaim to prioritize
    MADV_FREE pages and used once file pages.  MADV_FREE pages are put into
    the lru list and clear SwapBacked flag, so PageAnon(page) &&
    !PageSwapBacked(page) will indicate a MADV_FREE pages.  These pages will
    directly freed without pageout if they are clean, otherwise normal swap
    will reclaim them.
    
    For the third issue, the previous post adds global accounting and a
    separate RSS count for MADV_FREE pages.  The problem is we never get
    accurate accounting for MADV_FREE pages.  The pages are mapped to
    userspace, can be dirtied without notice from kernel side.  To get
    accurate accounting, we could write protect the page, but then there is
    extra page fault overhead, which people don't want to pay.  Jemalloc
    guys have concerns about the inaccurate accounting, so this post drops
    the accounting patches temporarily.  The info exported to
    /proc/pid/smaps for MADV_FREE pages are kept, which is the only place we
    can get accurate accounting right now.
    
    This patch (of 6):
    
    Johannes pointed out TTU_LZFREE is unnecessary.  It's true because we
    always have the flag set if we want to do an unmap.  For cases we don't
    do an unmap, the TTU_LZFREE part of code should never run.
    
    Also the TTU_UNMAP is unnecessary.  If no other flags set (for example,
    TTU_MIGRATION), an unmap is implied.
    
    The patch includes Johannes's cleanup and dead TTU_ACTION macro removal
    code
    
    Link: http://lkml.kernel.org/r/4be3ea1bc56b26fd98a54d0a6f70bec63f6d8980.1487965799.git.shli@fb.comSigned-off-by: default avatarShaohua Li <shli@fb.com>
    Suggested-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
    Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
    Acked-by: default avatarMinchan Kim <minchan@kernel.org>
    Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
    Acked-by: default avatarMichal Hocko <mhocko@suse.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    a128ca71
memory-failure.c 49.2 KB