Commit cc638f32 authored by Vlastimil Babka, committed by Linus Torvalds

mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations

THP page faults now attempt a __GFP_THISNODE allocation first, which
should only compact existing free memory, followed by another attempt
that can allocate from any node using the reclaim/compaction effort
specified by the global defrag setting and madvise.

This patch makes the following changes to the scheme:

 - Before the patch, the first allocation relies on a check for
   pageblock order and __GFP_IO to prevent excessive reclaim. However,
   this also affects the second attempt, which is not limited to a
   single node.

   Instead of that, reuse the existing check for costly order
   __GFP_NORETRY allocations, and make sure the first THP attempt uses
   __GFP_NORETRY. As a side-effect, all costly order __GFP_NORETRY
   allocations will bail out if compaction needs reclaim, while
   previously they only bailed out when compaction was deferred due to
   previous failures.

   This should still be acceptable within the __GFP_NORETRY semantics.

 - Before the patch, the second allocation attempt (on all nodes) was
   passing __GFP_NORETRY. This is redundant as the check for pageblock
   order (discussed above) was stronger. It also contradicts
   madvise(MADV_HUGEPAGE), which requests some effort to allocate a
   THP.

   After this patch, the second attempt passes neither __GFP_THISNODE
   nor __GFP_NORETRY.
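
The change to the slowpath bail-out condition described in the two points above can be sketched as a pair of predicates. This is a minimal userspace model, not kernel code: the `SK_`/`sk_` names and flag values are hypothetical, and the thresholds (pageblock order 9, costly meaning order greater than 3) are assumptions chosen to resemble a common configuration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these values are NOT the kernel's definitions. */
#define SK_GFP_IO            0x1u
#define SK_GFP_NORETRY       0x2u
#define SK_GFP_RETRY_MAYFAIL 0x4u
#define SK_PAGEBLOCK_ORDER   9u	/* assumption: typical THP-sized pageblock */
#define SK_COSTLY_ORDER      3u	/* assumption: "costly" means order > 3 */

enum sk_compact_result {
	SK_COMPACT_SKIPPED,	/* compaction would need reclaim first */
	SK_COMPACT_DEFERRED,	/* compaction recently failed */
	SK_COMPACT_OTHER,
};

/* Before the patch: pageblock-order check plus a weaker NORETRY check. */
static bool sk_bail_before(unsigned gfp, unsigned order,
			   enum sk_compact_result r)
{
	if (order >= SK_PAGEBLOCK_ORDER && (gfp & SK_GFP_IO) &&
	    !(gfp & SK_GFP_RETRY_MAYFAIL)) {
		if (r == SK_COMPACT_SKIPPED || r == SK_COMPACT_DEFERRED)
			return true;
	}
	if (order > SK_COSTLY_ORDER && (gfp & SK_GFP_NORETRY)) {
		if (r == SK_COMPACT_DEFERRED)
			return true;
	}
	return false;
}

/* After the patch: a single check keyed on costly order and NORETRY. */
static bool sk_bail_after(unsigned gfp, unsigned order,
			  enum sk_compact_result r)
{
	if (order > SK_COSTLY_ORDER && (gfp & SK_GFP_NORETRY))
		return r == SK_COMPACT_SKIPPED || r == SK_COMPACT_DEFERRED;
	return false;
}
```

In this model, a pageblock-order request without the NORETRY flag no longer bails out when compaction is skipped, while any costly-order NORETRY request now does, which is the side-effect mentioned above.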

To sum up, THP page faults now make the following attempts:

1. local node only THP allocation with no reclaim, just compaction.
2. for madvised VMAs, or always when synchronous compaction is enabled:
   THP allocation from any node with effort determined by the global
   defrag setting and VMA madvise.
3. fallback to base pages on any node.
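
The three-step sequence can be modelled in userspace as follows. This is a sketch under stated assumptions rather than the kernel implementation: the `sk_` names, the flag values, and the toy allocator (which simply reports whether the local or a remote node "has memory") are all hypothetical and exist only to make the control flow runnable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these values are NOT the kernel's definitions. */
#define SK_GFP_THISNODE       0x1u
#define SK_GFP_NORETRY        0x2u
#define SK_GFP_DIRECT_RECLAIM 0x4u

enum sk_outcome { SK_THP_LOCAL, SK_THP_ANY_NODE, SK_BASE_PAGES };

/* Toy state: which nodes can satisfy a THP, plus a record of the calls. */
struct sk_trace {
	bool local_has_memory;
	bool remote_has_memory;
	unsigned calls;
	unsigned last_gfp;
};

/* Toy allocator: THISNODE requests may only use the local node. */
static void *sk_toy_alloc(unsigned gfp, unsigned order, struct sk_trace *t)
{
	static char fake_page;

	(void)order;
	t->calls++;
	t->last_gfp = gfp;
	if (gfp & SK_GFP_THISNODE)
		return t->local_has_memory ? &fake_page : NULL;
	return (t->local_has_memory || t->remote_has_memory) ? &fake_page : NULL;
}

static enum sk_outcome sk_thp_fault(unsigned gfp, unsigned order,
				    struct sk_trace *t, void **page)
{
	assert(page != NULL);

	/* 1. Local node only; NORETRY keeps this attempt to compaction. */
	*page = sk_toy_alloc(gfp | SK_GFP_THISNODE | SK_GFP_NORETRY, order, t);
	if (*page)
		return SK_THP_LOCAL;

	/* 2. Any node, full effort, only if direct reclaim is allowed. */
	if (gfp & SK_GFP_DIRECT_RECLAIM) {
		*page = sk_toy_alloc(gfp, order, t);
		if (*page)
			return SK_THP_ANY_NODE;
	}

	/* 3. No THP: the caller falls back to base pages. */
	return SK_BASE_PAGES;
}
```

In the model, the presence of the direct-reclaim flag stands for "madvised VMA or synchronous compaction always enabled"; without it, only the first attempt is made before falling back.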

Link: http://lkml.kernel.org/r/08a3f4dd-c3ce-0009-86c5-9ee51aba8557@suse.cz
Fixes: b39d0ee2 ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent b3a987b0
@@ -2148,18 +2148,22 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		nmask = policy_nodemask(gfp, pol);
 		if (!nmask || node_isset(hpage_node, *nmask)) {
 			mpol_cond_put(pol);
+			/*
+			 * First, try to allocate THP only on local node, but
+			 * don't reclaim unnecessarily, just compact.
+			 */
 			page = __alloc_pages_node(hpage_node,
-						gfp | __GFP_THISNODE, order);
+				gfp | __GFP_THISNODE | __GFP_NORETRY, order);
 
 			/*
 			 * If hugepage allocations are configured to always
 			 * synchronous compact or the vma has been madvised
 			 * to prefer hugepage backing, retry allowing remote
-			 * memory as well.
+			 * memory with both reclaim and compact as well.
 			 */
 			if (!page && (gfp & __GFP_DIRECT_RECLAIM))
 				page = __alloc_pages_node(hpage_node,
-						gfp | __GFP_NORETRY, order);
+							gfp, order);
 
 			goto out;
 		}
......
@@ -4476,8 +4476,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		if (page)
 			goto got_pg;
 
-		if (order >= pageblock_order && (gfp_mask & __GFP_IO) &&
-		    !(gfp_mask & __GFP_RETRY_MAYFAIL)) {
+		/*
+		 * Checks for costly allocations with __GFP_NORETRY, which
+		 * includes some THP page fault allocations
+		 */
+		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
 			/*
 			 * If allocating entire pageblock(s) and compaction
 			 * failed because all zones are below low watermarks
@@ -4498,23 +4501,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 			if (compact_result == COMPACT_SKIPPED ||
 			    compact_result == COMPACT_DEFERRED)
 				goto nopage;
-		}
-
-		/*
-		 * Checks for costly allocations with __GFP_NORETRY, which
-		 * includes THP page fault allocations
-		 */
-		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
-			/*
-			 * If compaction is deferred for high-order allocations,
-			 * it is because sync compaction recently failed. If
-			 * this is the case and the caller requested a THP
-			 * allocation, we do not want to heavily disrupt the
-			 * system, so we fail the allocation instead of entering
-			 * direct reclaim.
-			 */
-			if (compact_result == COMPACT_DEFERRED)
-				goto nopage;
 
 			/*
 			 * Looks like reclaim/compaction is worth trying, but
......