    mempolicy: alloc_pages_mpol() for NUMA policy without vma · ddc1a5cb
    Hugh Dickins authored
    Shrink shmem's stack usage by eliminating the pseudo-vma from its folio
    allocation.  alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the
    principal actor for passing mempolicy choice down to __alloc_pages(),
    rather than vma_alloc_folio(gfp, order, vma, addr, hugepage).
    
    vma_alloc_folio() and alloc_pages() remain, but as wrappers around
    alloc_pages_mpol().  alloc_pages_bulk_*() is untouched, except that it now
    provides the additional args to policy_nodemask(), which subsumes
    policy_node().
    Cleanup throughout, cutting out some unhelpful "helpers".
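    The wrapper relationship described above can be sketched as follows.  This
    is an illustrative sketch only, with simplified stand-in types; the field
    names, NO_INTERLEAVE_INDEX value, and bodies are assumptions, not the
    kernel's actual definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (illustrative assumptions). */
typedef unsigned int gfp_t;
typedef unsigned long pgoff_t;

#define NO_INTERLEAVE_INDEX ((pgoff_t)-1)       /* "no vma" marker */

struct mempolicy { int mode; };
struct folio { int nid; };
struct vm_area_struct { struct mempolicy *vm_policy; };

static struct folio the_folio;

/* The principal actor: mempolicy, interleave index and preferred node
 * are all explicit arguments, so no pseudo-vma is needed to carry them. */
static struct folio *alloc_pages_mpol(gfp_t gfp, unsigned int order,
                                      struct mempolicy *pol, pgoff_t ilx,
                                      int nid)
{
        (void)gfp; (void)order; (void)pol; (void)ilx;
        the_folio.nid = nid;    /* real code resolves pol+ilx to a nodemask
                                 * and calls __alloc_pages() */
        return &the_folio;
}

/* vma_alloc_folio() survives as a wrapper: derive policy and ilx from
 * the vma, then delegate. */
static struct folio *vma_alloc_folio(gfp_t gfp, unsigned int order,
                                     struct vm_area_struct *vma,
                                     unsigned long addr)
{
        pgoff_t ilx = 0;        /* would come from get_vma_policy() */
        (void)addr;
        return alloc_pages_mpol(gfp, order, vma->vm_policy, ilx, 0);
}

/* Plain alloc_pages() has no vma, so it passes NO_INTERLEAVE_INDEX. */
static struct folio *alloc_pages_sketch(gfp_t gfp, unsigned int order,
                                        struct mempolicy *task_pol)
{
        return alloc_pages_mpol(gfp, order, task_pol, NO_INTERLEAVE_INDEX, 0);
}
```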
    
    It would all be much simpler without MPOL_INTERLEAVE, but that adds a
    dynamic to the constant mpol: complicated by v3.6 commit 09c231cb
    ("tmpfs: distribute interleave better across nodes"), which added ino bias
    to the interleave, hidden from mm/mempolicy.c until this commit.
    
    Hence "ilx" throughout, the "interleave index".  Originally I thought it
    could be done just with nid, but that's wrong: the nodemask may come from
    the shared policy layer below a shmem vma, or it may come from the task
    layer above a shmem vma; and until the final nodemask is known, the node
    id cannot be decided.  And how ilx is applied depends also on page order.
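    A rough picture of how an interleave index and the final nodemask combine
    to pick a node.  This is a hedged sketch, not kernel code: the kernel
    walks a nodemask bitmap, whereas here the allowed nodes are a plain
    array, and the shmem_ilx() ino-bias formula is a hypothetical shape of
    what commit 09c231cb does, not its exact arithmetic:

```c
#include <assert.h>

/* Select one of the allowed nodes by interleave index (illustrative). */
static int interleave_node(const int *nodes, int nnodes, unsigned long ilx)
{
        return nodes[ilx % nnodes];
}

/* Hypothetical ino bias: offsetting the interleave by something
 * inode-derived lets different tmpfs files start on different nodes. */
static unsigned long shmem_ilx(unsigned long ino, unsigned long pgoff)
{
        return pgoff + ino;
}
```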
    
    The interleave index is almost always irrelevant unless MPOL_INTERLEAVE:
    with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX
    passed down from vma-less alloc_pages() is also used as a hint not to use
    THP-style hugepage allocation - to avoid the overhead of a hugepage arg
    (though I don't understand why we never just added a GFP bit for THP - if
    it actually needs a different allocation strategy from other pages of the
    same order).  vma_alloc_folio() still carries its hugepage arg here, but
    it is not used, and should be removed when agreed.
    
    get_vma_policy() no longer allows a NULL vma: over time I believe we've
    eradicated all the places which used to need it: e.g. swapoff and madvise
    used to pass a NULL vma to read_swap_cache_async(), but now know the vma.
    
    [hughd@google.com: handle NULL mpol being passed to __read_swap_cache_async()]
      Link: https://lkml.kernel.org/r/ea419956-4751-0102-21f7-9c93cb957892@google.com
    Link: https://lkml.kernel.org/r/74e34633-6060-f5e3-aee-7040d43f2e93@google.com
    Link: https://lkml.kernel.org/r/1738368e-bac0-fd11-ed7f-b87142a939fe@google.com
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Huang Ying <ying.huang@intel.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Nhat Pham <nphamcs@gmail.com>
    Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Yosry Ahmed <yosryahmed@google.com>
    Cc: Domenico Cerasuolo <mimmocerasuolo@gmail.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>