    [PATCH] pagevec infrastructure · 6a952840
    Andrew Morton authored
    This is the first patch in a series of eight which address
    pagemap_lru_lock contention, and which simplify the VM locking
    hierarchy.
    
    Most testing has been done with all eight patches applied, so it would
    be best not to cherrypick, please.
    
The workload which was optimised: 4x500MHz PIII CPUs, mem=512m, six
disks, six filesystems, six processes each flat-out writing a large
file onto one of the disks.  i.e.: heavy page-replacement load.
    
    The frequency with which pagemap_lru_lock is taken is reduced by 90%.
    
    Lockmeter claims that pagemap_lru_lock contention on the 4-way has been
    reduced by 98%.  Total amount of system time lost to lock spinning went
    from 2.5% to 0.85%.
    
Anton ran a similar test on an 8-way PPC; the reduction in system
time was around 25%, and the reduction in time spent playing with
pagemap_lru_lock was 80%.
    
    	http://samba.org/~anton/linux/2.5.30/standard/
    versus
    	http://samba.org/~anton/linux/2.5.30/akpm/
    
    Throughput changes on uniprocessor are modest: a 1% speedup with this
    workload due to shortened code paths and improved cache locality.
    
    The patches do two main things:
    
1: In almost all places where the kernel was doing something with
   lots of pages one-at-a-time, convert the code to do the same thing
   sixteen-pages-at-a-time.  Take the lock once rather than sixteen
   times.  Take the lock for the minimum possible time.  (A sketch of
   this conversion follows the list.)
    
    2: Multithread the pagecache reclaim function: don't hold
       pagemap_lru_lock while reclaiming pagecache pages.  That function
       was massively expensive.
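
To make point 1 concrete, the sketch below shows the general shape of
such a conversion.  It is illustrative only: pagemap_lru_lock is the
real lock, but process_one_page() is a placeholder for whatever
per-page work the real call sites do, and the batch-draining loop is
simplified.

#include <linux/spinlock.h>
#include <linux/mm.h>

/* Sketch: assume the lock is visible here as a plain spinlock. */
extern spinlock_t pagemap_lru_lock;
void process_one_page(struct page *page);	/* placeholder per-page work */

/* Before: the lock is taken and dropped once per page. */
void do_pages_one_at_a_time(struct page **pages, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		spin_lock(&pagemap_lru_lock);
		process_one_page(pages[i]);
		spin_unlock(&pagemap_lru_lock);
	}
}

/*
 * After: pages are gathered into a batch of up to sixteen, and the
 * lock is taken once per batch.
 */
void do_pages_batched(struct page **pages, int nr)
{
	struct page *batch[16];
	int i, j, n = 0;

	for (i = 0; i < nr; i++) {
		batch[n++] = pages[i];
		if (n == 16 || i == nr - 1) {
			spin_lock(&pagemap_lru_lock);
			for (j = 0; j < n; j++)
				process_one_page(batch[j]);
			spin_unlock(&pagemap_lru_lock);
			n = 0;
		}
	}
}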
    
    One fallout from this work is that we never take any other locks while
    holding pagemap_lru_lock.  So this lock conceptually disappears from
    the VM locking hierarchy.
    
    
So.  This is all basically a code tweak to improve kernel scalability.
It does so by optimising the existing design, rather than by redesign.
There is little conceptual change to how the VM works.
    
    This is as far as I can tweak it.  It seems that the results are now
    acceptable on SMP.  But things are still bad on NUMA.  It is expected
    that the per-zone LRU and per-zone LRU lock patches will fix NUMA as
    well, but that has yet to be tested.
    
    
    This first patch introduces `struct pagevec', which is the basic unit
    of batched work.  It is simply:
    
    struct pagevec {
    	unsigned nr;
    	struct page *pages[16];
    };
    
    pagevecs are used in the following patches to get the VM away from
    page-at-a-time operations.
    
    This patch includes all the pagevec library functions which are used in
    later patches.
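
For orientation, here is a rough sketch of the kind of inline helpers
the pagevec approach implies, written against the struct shown above.
The names mirror the pagevec helpers, but the bodies and exact
signatures here are illustrative rather than copied from the patch:

/* Illustrative only -- not the literal helpers added by this patch. */
static inline void pagevec_init(struct pagevec *pvec)
{
	pvec->nr = 0;
}

static inline unsigned pagevec_count(struct pagevec *pvec)
{
	return pvec->nr;
}

static inline unsigned pagevec_space(struct pagevec *pvec)
{
	return 16 - pvec->nr;		/* free slots remaining */
}

/*
 * Add a page to the batch and return the space left afterwards: a
 * zero return tells the caller it is time to process and drain the
 * pagevec under a single lock hold.
 */
static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
{
	pvec->pages[pvec->nr++] = page;
	return pagevec_space(pvec);
}

The later patches then build their batched LRU, release and truncate
operations on top of helpers of this kind.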