Commit 3c7b8b3c authored by Andrew Morton, committed by Linus Torvalds

[PATCH] Fix interaction between batched lru addition and hot/cold

If a page is "freed" while it is sitting in the deferred-lru-addition queue,
the final reference to it is held by that queue.  When the queue gets
spilled onto the LRU, the page is actually freed.

Which is all expected and natural and works fine - it's a weird case.

But one of the AIM9 tests was taking a 20% performance hit (relative to
2.4) because it was going into the page allocator for new pages while
cache-hot pages were languishing out in the deferred-addition queue.

So the patch changes things so that we spill the CPU's
deferred-lru-addition queue before starting to free pages.  This way,
the recently-used pages actually make it to the hot/cold lists and are
available for new allocations.

It gets back 15 of the lost 20%.  The other 5% is lost to the general
additional complexity of all this stuff.  (But we're 250% faster than
2.4 when running four instances of the test on 4-way).
parent 2f83855c
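
For readers unfamiliar with the mechanism, the sketch below is a minimal userspace
model of the behaviour the message describes.  It is not the kernel code, and every
name, size and refcounting rule in it is illustrative.  It shows the one point that
matters here: a page "freed" while sitting in the per-CPU deferred-lru-addition
queue is only really freed when that queue is drained, so draining before entering
the unmap/free paths hands cache-warm pages straight back to the allocator.

/*
 * Minimal userspace model of the deferred-lru-addition queue (illustrative
 * only - not the kernel implementation).  A page queued for LRU addition is
 * pinned by an extra reference, so a "free" while it is queued only drops
 * the refcount; the real free happens when the queue is drained.
 */
#include <stdio.h>

#define PAGEVEC_SIZE 16

struct page {
	int refcount;
	int id;
};

/* models the per-CPU queue of pages awaiting addition to the LRU */
static struct page *lru_add_pvec[PAGEVEC_SIZE];
static int lru_add_count;

static void free_page_now(struct page *page)
{
	printf("page %d really freed (available to the allocator again)\n",
	       page->id);
}

static void put_page(struct page *page)
{
	if (--page->refcount == 0)
		free_page_now(page);
}

/* queue a page for LRU addition; the queue takes its own reference */
static void lru_cache_add(struct page *page)
{
	if (lru_add_count < PAGEVEC_SIZE) {
		page->refcount++;
		lru_add_pvec[lru_add_count++] = page;
	}
}

/* spill the queue; dropping its references may be what finally frees pages */
static void lru_add_drain(void)
{
	int i;

	for (i = 0; i < lru_add_count; i++)
		put_page(lru_add_pvec[i]);
	lru_add_count = 0;
}

int main(void)
{
	struct page page = { .refcount = 1, .id = 42 };

	lru_cache_add(&page);	/* queue holds a second reference */
	put_page(&page);	/* "freed", but still pinned by the queue */

	/*
	 * What this patch adds before the unmap/free paths: drain first,
	 * so the cache-warm page is reusable immediately rather than
	 * languishing in the queue.
	 */
	lru_add_drain();
	return 0;
}
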
@@ -405,6 +405,8 @@ void unmap_page_range(mmu_gather_t *tlb, struct vm_area_struct *vma, unsigned lo
 	BUG_ON(address >= end);
+	lru_add_drain();
 	dir = pgd_offset(vma->vm_mm, address);
 	tlb_start_vma(tlb, vma);
 	do {
...
@@ -447,6 +449,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned
 		return;
 	}
+	lru_add_drain();
 	spin_lock(&mm->page_table_lock);
 	/*
...
@@ -211,8 +211,19 @@ void release_pages(struct page **pages, int nr, int cold)
 	pagevec_free(&pages_to_free);
 }
 
+/*
+ * The pages which we're about to release may be in the deferred lru-addition
+ * queues.  That would prevent them from really being freed right now.  That's
+ * OK from a correctness point of view but is inefficient - those pages may be
+ * cache-warm and we want to give them back to the page allocator ASAP.
+ *
+ * So __pagevec_release() will drain those queues here.  __pagevec_lru_add()
+ * and __pagevec_lru_add_active() call release_pages() directly to avoid
+ * mutual recursion.
+ */
 void __pagevec_release(struct pagevec *pvec)
 {
+	lru_add_drain();
 	release_pages(pvec->pages, pagevec_count(pvec), pvec->cold);
 	pagevec_reinit(pvec);
 }
@@ -265,7 +276,8 @@ void __pagevec_lru_add(struct pagevec *pvec)
 	}
 	if (zone)
 		spin_unlock_irq(&zone->lru_lock);
-	pagevec_release(pvec);
+	release_pages(pvec->pages, pvec->nr, pvec->cold);
+	pagevec_reinit(pvec);
 }
 
 void __pagevec_lru_add_active(struct pagevec *pvec)
@@ -291,7 +303,8 @@ void __pagevec_lru_add_active(struct pagevec *pvec)
 	}
 	if (zone)
 		spin_unlock_irq(&zone->lru_lock);
-	pagevec_release(pvec);
+	release_pages(pvec->pages, pvec->nr, pvec->cold);
+	pagevec_reinit(pvec);
 }
 
 /*
...
@@ -300,6 +300,7 @@ void free_pages_and_swap_cache(struct page **pages, int nr)
 	const int chunk = 16;
 	struct page **pagep = pages;
 
+	lru_add_drain();
 	while (nr) {
 		int todo = min(chunk, nr);
 		int i;
...
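
The new comment above notes that __pagevec_lru_add() and __pagevec_lru_add_active()
call release_pages() directly "to avoid mutual recursion".  The stub program below
is a userspace illustration of the cycle the direct call breaks; the function bodies
are placeholders, and in the real kernel lru_add_drain() only re-enters
__pagevec_lru_add() when the per-CPU pagevec is non-empty, so this is the hazard
being avoided rather than a literal infinite loop.

/*
 * Stubbed illustration of the call cycle avoided by the patch.  None of
 * these bodies do real work; the depth counter just caps the demo.
 */
#include <stdio.h>

static int depth;

static void __pagevec_lru_add(void);

/* lru_add_drain() spills the per-CPU queue via __pagevec_lru_add() */
static void lru_add_drain(void)
{
	__pagevec_lru_add();
}

/* __pagevec_release() now drains before releasing (the change above) */
static void __pagevec_release(void)
{
	lru_add_drain();
	/* release_pages(...); pagevec_reinit(...); */
}

static void __pagevec_lru_add(void)
{
	if (++depth > 3) {
		printf("stopping demo at call depth %d\n", depth);
		return;
	}
	/*
	 * Hypothetical buggy variant: dropping the leftover references via
	 * pagevec_release()/__pagevec_release() would re-enter the drain
	 * path...
	 */
	__pagevec_release();
	/*
	 * ...which is why the patch calls release_pages() and
	 * pagevec_reinit() here directly instead.
	 */
}

int main(void)
{
	__pagevec_lru_add();
	return 0;
}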