Commit 08951e54 authored by Mel Gorman's avatar Mel Gorman Committed by Linus Torvalds

mm: vmscan: correct check for kswapd sleeping in sleeping_prematurely

During allocator-intensive workloads, kswapd will be woken frequently
causing free memory to oscillate between the high and min watermark.  This
is expected behaviour.  Unfortunately, if the highest zone is small, a
problem occurs.

This seems to happen most with recent sandybridge laptops but it's
probably a co-incidence as some of these laptops just happen to have a
small Normal zone.  The reproduction case is almost always during copying
large files that kswapd pegs at 100% CPU until the file is deleted or
cache is dropped.

The problem is mostly down to sleeping_prematurely() keeping kswapd awake
when the highest zone is small and unreclaimable and compounded by the
fact we shrink slabs even when not shrinking zones causing a lot of time
to be spent in shrinkers and a lot of memory to be reclaimed.

Patch 1 corrects sleeping_prematurely to check the zones matching
	the classzone_idx instead of all zones.

Patch 2 avoids shrinking slab when we are not shrinking a zone.

Patch 3 notes that sleeping_prematurely is checking lower zones against
	a high classzone which is not what allocators or balance_pgdat()
	is doing leading to an artifical belief that kswapd should be
	still awake.

Patch 4 notes that when balance_pgdat() gives up on a high zone that the
	decision is not communicated to sleeping_prematurely()

This problem affects 2.6.38.8 for certain and is expected to affect 2.6.39
and 3.0-rc4 as well.  If accepted, they need to go to -stable to be picked
up by distros and this series is against 3.0-rc4.  I've cc'd people that
reported similar problems recently to see if they still suffer from the
problem and if this fixes it.

This patch: correct the check for kswapd sleeping in sleeping_prematurely()

During allocator-intensive workloads, kswapd will be woken frequently
causing free memory to oscillate between the high and min watermark.  This
is expected behaviour.

A problem occurs if the highest zone is small.  balance_pgdat() only
considers unreclaimable zones when priority is DEF_PRIORITY but
sleeping_prematurely considers all zones.  It's possible for this sequence
to occur

  1. kswapd wakes up and enters balance_pgdat()
  2. At DEF_PRIORITY, marks highest zone unreclaimable
  3. At DEF_PRIORITY-1, ignores highest zone setting end_zone
  4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from
        highest zone, clearing all_unreclaimable. Highest zone
        is still unbalanced
  5. kswapd returns and calls sleeping_prematurely
  6. sleeping_prematurely looks at *all* zones, not just the ones
     being considered by balance_pgdat. The highest small zone
     has all_unreclaimable cleared but the zone is not
     balanced. all_zones_ok is false so kswapd stays awake

This patch corrects the behaviour of sleeping_prematurely to check the
zones balance_pgdat() checked.
Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
Reported-by: default avatarPádraig Brady <P@draigBrady.com>
Tested-by: default avatarPádraig Brady <P@draigBrady.com>
Tested-by: default avatarAndrew Lutomirski <luto@mit.edu>
Acked-by: default avatarRik van Riel <riel@redhat.com>
Reviewed-by: default avatarMinchan Kim <minchan.kim@gmail.com>
Reviewed-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@kernel.org>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 902daf65
......@@ -2326,7 +2326,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
return true;
/* Check the watermark levels */
for (i = 0; i < pgdat->nr_zones; i++) {
for (i = 0; i <= classzone_idx; i++) {
struct zone *zone = pgdat->node_zones + i;
if (!populated_zone(zone))
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment