    [PATCH] hot-n-cold pages: bulk page allocator · 38e419f5
    Andrew Morton authored
    This is the hot-n-cold-pages series.  It introduces a per-cpu lockless
    LIFO pool in front of the page allocator, for three reasons:
    
    1: To reduce lock contention on the buddy lock: we allocate and free
       pages in, typically, 16-page chunks.
    
    2: To return cache-warm pages to page allocation requests.
    
    3: As infrastructure for a page reservation API which can be used to
       ensure that the GFP_ATOMIC radix-tree node and pte_chain allocations
       cannot fail.  That code is not complete, and does not absolutely
       require hot-n-cold pages.  It'll work OK though.
    
    We add two queues per CPU.  The "hot" queue contains pages which the
    freeing code thought were likely to be cache-hot.  By default, new
    allocations are satisfied from this queue.
    
    The "cold" queue contains pages which the freeing code expected to be
    cache-cold.  The cold queue is mainly for lock amortisation, although
    it is possible to explicitly allocate cold pages.  The readahead code
    does that.
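
    As a rough illustration of the two queues described above, here is a
    minimal user-space sketch.  It is not the kernel code: the names
    cpu_pool, page_stack, pool_free(), pool_alloc() and the POOL_* sizes
    are all hypothetical, and plain integers stand in for struct page.
    Only the LIFO behaviour and the hot/cold split are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: 16 follows the "16-page chunks" mentioned above. */
#define POOL_BATCH 16
#define POOL_MAX   (2 * POOL_BATCH)

/* A LIFO stack: the most recently freed page is handed out first. */
struct page_stack {
    int pages[POOL_MAX];    /* page frame numbers stand in for struct page */
    size_t count;
};

/* One of these would exist per CPU, so no lock is needed to touch it. */
struct cpu_pool {
    struct page_stack hot;  /* pages the freeing code thinks are cache-hot */
    struct page_stack cold; /* pages the freeing code thinks are cache-cold */
};

/*
 * Free a page onto the hot or cold queue.  Returns 0 on success, or -1
 * when the queue is full; a real allocator would then hand a batch of
 * pages back to the buddy lists.
 */
static int pool_free(struct cpu_pool *pool, int pfn, int cold)
{
    struct page_stack *s;

    assert(pool != NULL);
    s = cold ? &pool->cold : &pool->hot;
    if (s->count == POOL_MAX)
        return -1;
    s->pages[s->count++] = pfn;
    return 0;
}

/*
 * Allocate a page from the hot queue by default, or from the cold queue
 * on request.  Returns the page frame number, or -1 when the queue is
 * empty; a real allocator would then refill it in bulk.
 */
static int pool_alloc(struct cpu_pool *pool, int cold)
{
    struct page_stack *s;

    assert(pool != NULL);
    s = cold ? &pool->cold : &pool->hot;
    if (s->count == 0)
        return -1;
    return s->pages[--s->count];
}
```

    In this model a caller that frees pages 10 and then 11 to the hot
    queue gets 11 back on the next default allocation, which is the
    cache-warmth property the hot queue is after.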
    
    I have been hot and cold on these patches for quite some time - the
    benefit is not great.
    
    - 4% speedup in Randy Hron's benching of the autoconf regression
      tests on a 4-way.  Most of this came from savings in pte_alloc and
      pmd_alloc: the pagetable clearing code liked the warmer pages (some
      architectures still have the pgt_cache, and can perhaps do away with
      it).
    
    - 1% to 2% speedup in kernel compiles on my 4-way and Martin's 32-way.
    
    - 60% speedup in a little test program which writes 80 kbytes to a
      file and ftruncates it to zero again.  I ran four instances of that
      on a 4-way and it loved the cache warmth.
    
    - 2.5% speedup in Specweb testing on an 8-way.
    
    - The thing which won me over: an 11% increase in throughput of the
      SDET benchmark on an 8-way PIII:
    
    	with hot & cold:
    
    	RESULT for 8 users is 17971    +12.1%
    	RESULT for 16 users is 17026   +12.0%
    	RESULT for 32 users is 17009   +10.4%
    	RESULT for 64 users is 16911   +10.3%
    
    	without:
    
    	RESULT for 8 users is 16038
    	RESULT for 16 users is 15200
    	RESULT for 32 users is 15406
    	RESULT for 64 users is 15331
    
      SDET is a very old SPEC test which simulates a development
      environment with a large number of users.  Lots of users running a
      mix of shell commands, basically.
    
    
    These patches were written by Martin Bligh and myself.
    
    This one implements rmqueue_bulk() - a function for removing multiple
    pages of a given order from the buddy lists.
    
    This is for lock amortisation: take the highly-contended zone->lock
    less frequently, and do more work each time it has been acquired.
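
    The amortisation idea can be sketched in user-space C as follows.
    This is a model under stated assumptions, not the rmqueue_bulk() in
    this patch: zone_model, zone_lock(), zone_unlock() and
    rmqueue_bulk_model() are hypothetical names, page order is ignored,
    and the lock is replaced by a counter so that the number of
    acquisitions can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone: a flat array instead of buddy lists. */
struct zone_model {
    int free_pages[64];
    size_t nr_free;
    unsigned long lock_acquisitions;    /* how often the "lock" was taken */
};

/* Stand-ins for the real lock; they only count acquisitions. */
static void zone_lock(struct zone_model *z)
{
    z->lock_acquisitions++;
}

static void zone_unlock(struct zone_model *z)
{
    (void)z;
}

/*
 * Remove up to `count` pages while holding the lock once, storing them
 * in `out`.  Returns the number of pages actually removed, which may be
 * fewer than requested if the zone runs dry.
 */
static size_t rmqueue_bulk_model(struct zone_model *z, size_t count, int *out)
{
    size_t n = 0;

    assert(z != NULL && out != NULL);
    zone_lock(z);
    while (n < count && z->nr_free > 0)
        out[n++] = z->free_pages[--z->nr_free];
    zone_unlock(z);
    return n;
}
```

    Pulling 16 pages this way costs one lock acquisition in the model,
    where 16 single-page removals would cost 16.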