    Revert "Revert "mm, thp: restore node-local hugepage allocations"" · ac79f78d
    David Rientjes authored
    This reverts commit a8282608.
    
    The commit references the original intended semantic for MADV_HUGEPAGE
    which has subsequently taken on three unique purposes:
    
     - enables or disables thp for a range of memory depending on the system's
       config (is thp "enabled" set to "always" or "madvise"),
    
     - determines the synchronous compaction behavior for thp allocations at
       fault (is thp "defrag" set to "always", "defer+madvise", or "madvise"),
       and
    
     - reverts a previous MADV_NOHUGEPAGE (there is no madvise mode to only
       clear previous hugepage advice).
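For illustration, the first and third purposes listed above both go through the same madvise(2) call from userspace. The sketch below is a minimal example, not code from this patch: the helper names map_anon() and set_thp_advice() are hypothetical, and whether the advice has any effect depends on the system's thp "enabled" and "defrag" settings.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map len bytes of anonymous, private memory.
 * Returns NULL on failure. */
static void *map_anon(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: set hugepage advice on a range (want_thp != 0)
 * or mark the range as not wanting hugepages (want_thp == 0).
 * Returns the madvise(2) result: 0 on success, -1 with errno set
 * otherwise (for example on kernels built without thp support). */
static int set_thp_advice(void *addr, size_t len, int want_thp)
{
    return madvise(addr, len, want_thp ? MADV_HUGEPAGE : MADV_NOHUGEPAGE);
}
```

Because there is no madvise mode that only clears earlier advice, a caller that previously passed want_thp == 0 can undo it only by calling set_thp_advice() again with want_thp != 0, which also opts the range in to the other MADV_HUGEPAGE semantics.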
    
    These are the three purposes that exist in 5.2 and that userspace has
    been written around over the past several years.  Adding a NUMA locality
    preference adds a fourth dimension to an already conflated advice mode.
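Since what MADV_HUGEPAGE does depends on the thp "enabled" and "defrag" settings, userspace usually has to read them first. The sysfs files under /sys/kernel/mm/transparent_hugepage/ report the active value in square brackets, for example "always [madvise] never". The parser below is a minimal sketch; the function name thp_active_mode() is hypothetical and it works on a line the caller has already read.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the bracketed (active) token of a thp sysfs
 * line such as "always [madvise] never" into out, which holds n bytes.
 * Returns 0 on success, -1 if no bracketed token is found or it does
 * not fit in out. */
static int thp_active_mode(const char *line, char *out, size_t n)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    size_t len;

    if (!close)
        return -1;

    len = (size_t)(close - open - 1);
    if (len + 1 > n)            /* need room for the terminating NUL */
        return -1;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```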
    
    Based on the semantic that MADV_HUGEPAGE has provided over the past
    several years, there exist workloads that use the tunable based on these
    principles: specifically that the allocation should attempt to
    defragment a local node before falling back.  It is agreed that remote
    hugepages typically (but not always) have a better access latency than
    remote native pages, although on Naples this is at parity for
    intersocket.
    
    The revert commit that this patch reverts allows hugepage allocation to
    immediately allocate remotely when local memory is fragmented.  This is
    contrary to the semantic of MADV_HUGEPAGE over the past several years:
    that is, memory compaction should be attempted locally before falling
    back.
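The difference between the two behaviors can be sketched as a toy decision model. This is not the kernel's allocator code: the struct, the enum, and both function names are hypothetical, the "compactable" flag stands in for whether local compaction would succeed, and FALLBACK stands for whatever the allocator does next, which this message does not spell out.

```c
/* Hypothetical, simplified view of one NUMA node. */
struct node_state {
    int free_hugepages;   /* hugepages available right now */
    int compactable;      /* nonzero if compaction could produce one */
};

enum thp_outcome { LOCAL_HUGEPAGE, REMOTE_HUGEPAGE, FALLBACK };

/* Behavior this patch restores: try to defragment the local node
 * before falling back. */
static enum thp_outcome alloc_compact_local_first(struct node_state local)
{
    if (local.free_hugepages > 0)
        return LOCAL_HUGEPAGE;
    if (local.compactable)
        return LOCAL_HUGEPAGE;    /* made available by local compaction */
    return FALLBACK;
}

/* Behavior of the reverted commit: go remote as soon as the local
 * node is fragmented, without attempting local compaction. */
static enum thp_outcome alloc_remote_immediately(struct node_state local,
                                                 struct node_state remote)
{
    if (local.free_hugepages > 0)
        return LOCAL_HUGEPAGE;
    if (remote.free_hugepages > 0)
        return REMOTE_HUGEPAGE;
    return FALLBACK;
}
```

With a fragmented but compactable local node and a remote node that has free hugepages, the first function yields a local hugepage and the second a remote one, which is the case this message is concerned with.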
    
    On Rome, for example, remote hugepages have 53.5% higher access latency
    than local hugepages.  For this reason, the goal is to return to the
    behavior of 5.2 and earlier, which attempts local defragmentation before
    falling back.  With the patch that this patch reverts, we see
    performance degradations at the tail because the allocator happily
    allocates a remote hugepage rather than even attempting to make a local
    hugepage available.
    
    zone_reclaim_mode is not a solution to this problem since it impacts not
    only hugepage allocations but changes the memory allocation strategy for
    *all* page allocations.
    
    Signed-off-by: David Rientjes <rientjes@google.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>