Commit 6fa1d901 authored by Andrew Morton, committed by Linus Torvalds

[PATCH] Fix arithmetic in shrink_zone()

From: Nick Piggin <nickpiggin@yahoo.com.au>

If the zone has a very small number of inactive pages, local variable
`ratio' can be huge and we do way too much scanning.  So much so that Ingo
hit an NMI watchdog expiry, although that was because the zone would have
had a single refcount-zero page in it, and that logic recently got fixed up
via get_page_testone().

Nick's patch simply puts a sane-looking upper bound on the number of pages
which we'll scan in this round.
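
For illustration only (not part of the patch), here is a small standalone C sketch of the old vs. new arithmetic, using made-up zone sizes and plain local variables in place of the real struct zone fields and of do_div(); the constants mirror that era's SWAP_CLUSTER_MAX (32) and DEF_PRIORITY (12):

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32

int main(void)
{
	/* hypothetical zone: huge active list, almost empty inactive list */
	unsigned long nr_active = 100000, nr_inactive = 1;
	int priority = 12;					/* DEF_PRIORITY */
	unsigned long max_scan = (nr_active + nr_inactive) >> priority;
	unsigned long ratio, scan_active;

	/* old arithmetic: blows up when the inactive list is nearly empty */
	ratio = (unsigned long)SWAP_CLUSTER_MAX * nr_active /
				((nr_inactive | 1) * 2);

	/* new arithmetic: proportional to max_scan, capped at 4*max_scan */
	if (nr_active >= 4*(nr_inactive*2 + 1))
		scan_active = 4*max_scan;
	else
		scan_active = (unsigned long)((unsigned long long)max_scan *
				nr_active / (nr_inactive*2 + 1));

	printf("old: ratio       = %lu\n", ratio);		/* 1600000 */
	printf("new: scan_active = %lu\n", scan_active);	/* 96 */
	return 0;
}

With a single inactive page the old `ratio' comes out to 1,600,000 pages to scan, while the bounded calculation yields 96.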


It fixes another failure case: if the inactive list becomes very small
compared to the size of the active list, active list scanning (and therefore
inactive list refilling) also becomes small.

This patch causes inactive list scanning to be keyed off the size of the
active+inactive lists.  It has the plus of hiding the active/inactive
balancing implementation from the higher-level scanning code.  It will
slightly change other aspects of scanning behaviour, but probably not
significantly.
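
Again for illustration only (not part of the patch), a similar standalone sketch of why max_scan is now keyed off active+inactive rather than the inactive list alone, with made-up sizes:

#include <stdio.h>

int main(void)
{
	/* hypothetical zone: 1,000,000 active pages, only 100 inactive */
	unsigned long nr_active = 1000000, nr_inactive = 100;
	int priority = 12;					/* DEF_PRIORITY */

	unsigned long old_max_scan = nr_inactive >> priority;
	unsigned long new_max_scan = (nr_active + nr_inactive) >> priority;

	/*
	 * With the 4*max_scan cap described above, the old basis would allow
	 * at most 4*0 = 0 active pages to be scanned per pass, so the tiny
	 * inactive list would never be refilled.  The new basis keeps
	 * active-list scanning (and hence refilling) going.
	 */
	printf("old basis: max_scan = %lu, active scan cap = %lu\n",
			old_max_scan, 4*old_max_scan);		/* 0, 0 */
	printf("new basis: max_scan = %lu, active scan cap = %lu\n",
			new_max_scan, 4*new_max_scan);		/* 244, 976 */
	return 0;
}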
@@ -746,23 +746,33 @@ static int
 shrink_zone(struct zone *zone, int max_scan, unsigned int gfp_mask,
 		int *total_scanned, struct page_state *ps, int do_writepage)
 {
-	unsigned long ratio;
+	unsigned long scan_active;
 	int count;
 
 	/*
 	 * Try to keep the active list 2/3 of the size of the cache.  And
 	 * make sure that refill_inactive is given a decent number of pages.
 	 *
-	 * The "ratio+1" here is important.  With pagecache-intensive workloads
-	 * the inactive list is huge, and `ratio' evaluates to zero all the
-	 * time.  Which pins the active list memory.  So we add one to `ratio'
-	 * just to make sure that the kernel will slowly sift through the
-	 * active list.
+	 * The "scan_active + 1" here is important.  With pagecache-intensive
+	 * workloads the inactive list is huge, and `ratio' evaluates to zero
+	 * all the time.  Which pins the active list memory.  So we add one to
+	 * `scan_active' just to make sure that the kernel will slowly sift
+	 * through the active list.
 	 */
-	ratio = (unsigned long)SWAP_CLUSTER_MAX * zone->nr_active /
-				((zone->nr_inactive | 1) * 2);
-	atomic_add(ratio+1, &zone->nr_scan_active);
+	if (zone->nr_active >= 4*(zone->nr_inactive*2 + 1)) {
+		/* Don't scan more than 4 times the inactive list scan size */
+		scan_active = 4*max_scan;
+	} else {
+		unsigned long long tmp;
+
+		/* Cast to long long so the multiply doesn't overflow */
+		tmp = (unsigned long long)max_scan * zone->nr_active;
+		do_div(tmp, zone->nr_inactive*2 + 1);
+		scan_active = (unsigned long)tmp;
+	}
+
+	atomic_add(scan_active + 1, &zone->nr_scan_active);
 	count = atomic_read(&zone->nr_scan_active);
 	if (count >= SWAP_CLUSTER_MAX) {
 		atomic_set(&zone->nr_scan_active, 0);
@@ -812,7 +822,7 @@ shrink_caches(struct zone **zones, int priority, int *total_scanned,
 		if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 			continue;	/* Let kswapd poll it */
-		max_scan = zone->nr_inactive >> priority;
+		max_scan = (zone->nr_active + zone->nr_inactive) >> priority;
 		ret += shrink_zone(zone, max_scan, gfp_mask,
 				total_scanned, ps, do_writepage);
 	}
@@ -988,7 +998,8 @@ static int balance_pgdat(pg_data_t *pgdat, int nr_pages, struct page_state *ps)
 				all_zones_ok = 0;
 			}
 			zone->temp_priority = priority;
-			max_scan = zone->nr_inactive >> priority;
+			max_scan = (zone->nr_active + zone->nr_inactive)
+						>> priority;
 			reclaimed = shrink_zone(zone, max_scan, GFP_KERNEL,
 						&scanned, ps, do_writepage);
 			total_scanned += scanned;