1. 15 Aug, 2002 21 commits
    • [PATCH] make pagemap_lru_lock irq-safe · aaba9265
      Andrew Morton authored
      It is expensive for a CPU to take an interrupt while holding the page
      LRU lock, because other CPUs will pile up on the lock while the
      interrupt runs.
      
      Disabling interrupts while holding the lock reduces contention by an
      additional 30% on 4-way.  This is when the only source of interrupts is
      disk completion.  The improvement will be higher with more CPUs and it
      will be higher if there is networking happening.
      
      The maximum hold time of this lock is 17 microseconds on 500 MHz PIII,
      which is well inside the kernel's maximum interrupt latency (which was
      100 usecs when I last looked, a year ago).
      
      This optimisation is not needed on uniprocessor, but the patch disables
      IRQs while holding pagemap_lru_lock anyway, so it becomes an irq-safe
      spinlock, and pages can be moved from the LRU in interrupt context.
      
      pagemap_lru_lock has been renamed to _pagemap_lru_lock to pick up any
      missed uses, and to reliably break any out-of-tree patches which may be
      using the old semantics.
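
      The idea of keeping interrupts out of a critical section can be
      illustrated in userspace, assuming a signal handler stands in for
      the interrupt and sigprocmask() stands in for disabling IRQs.  This
      is only an analogue, not the kernel code: handler_runs, on_signal()
      and demo_masked_critical_section() are hypothetical names, and it
      assumes a single-threaded process.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Counts how many times the stand-in "interrupt handler" has run. */
static volatile sig_atomic_t handler_runs;

static void on_signal(int sig)
{
	(void)sig;
	handler_runs++;
}

/*
 * Returns 1 if the handler was held off for the whole critical section
 * and ran exactly once after the mask was restored, 0 if not, and -1 if
 * the handler could not be installed.
 */
static int demo_masked_critical_section(void)
{
	struct sigaction sa;
	sigset_t block, saved;
	int ran_inside;

	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;
	handler_runs = 0;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);

	/* Analogue of "disable interrupts, then take the lock". */
	sigprocmask(SIG_BLOCK, &block, &saved);

	/* Critical section: the "interrupt" arrives but stays pending. */
	raise(SIGUSR1);
	ran_inside = handler_runs;

	/* Analogue of "drop the lock, restore the previous interrupt state". */
	sigprocmask(SIG_SETMASK, &saved, NULL);

	return ran_inside == 0 && handler_runs == 1;
}
```

      Calling demo_masked_critical_section() shows the handler being
      deferred until the section ends, which is why nothing else has to
      wait on the lock while a handler runs.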
    • [PATCH] batched removal of pages from the LRU · 008f707c
      Andrew Morton authored
      Convert all the bulk callers of lru_cache_del() to use the batched
      pagevec_lru_del() function.
      
      Change truncate_complete_page() to not delete the page from the LRU.
      Do it in page_cache_release() instead.  (This reintroduces the problem
      with final-release-from-interrupt.  That gets fixed further on.)
      
      This patch changes the truncate locking somewhat.  The removal from the
      LRU now happens _after_ the page has been removed from the
      address_space and has been unlocked.  So there is now a window where
      the shrink_cache code can discover the to-be-freed page via the LRU
      list.  But that's OK - the page is clean, its buffers (if any) are
      clean.  It's not attached to any mapping.
    • [PATCH] batched addition of pages to the LRU · 9eb76ee2
      Andrew Morton authored
      The patch goes through the various places which were calling
      lru_cache_add() against bulk pages and batches them up.
      
      Also.  This whole patch series improves the behaviour of the system
      under heavy writeback load.  There is a reduction in page allocation
      failures, some reduction in loss of interactivity due to page
      allocators getting stuck on writeback from the VM.  (This is still bad
      though).
      
      I think it's due to the change here in mpage_writepages().  That
      function was originally unconditionally refiling written-back pages to
      the head of the inactive list.  The theory being that they should be
      moved out of the way of page allocators, who would end up waiting on
      them.
      
      It appears that this simply had the effect of pushing dirty, unwritten
      data closer to the tail of the inactive list, making things worse.
      
      So instead, if the caller is (typically) balance_dirty_pages() then
      leave the pages where they are on the LRU.
      
      If the caller is PF_MEMALLOC then the pages *have* to be refiled.  This
      is because VM writeback is clustered along mapping->dirty_pages, and
      it's almost certain that the pages which are being written are near the
      tail of the LRU.  If they were left there, page allocators would block
      on them too soon.  It would effectively become a synchronous write.
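
      The refiling rule described above comes down to one decision per
      written-back page.  A minimal sketch, assuming a hypothetical flag
      bit SKETCH_PF_MEMALLOC and hypothetical names lru_action and
      writeback_lru_action() (none of these are taken from the patch):

```c
#include <assert.h>

/* Hypothetical flag bit; the real PF_MEMALLOC value is not taken from the source. */
#define SKETCH_PF_MEMALLOC 0x1u

enum lru_action {
	LRU_LEAVE_IN_PLACE,	/* page keeps its current LRU position */
	LRU_REFILE		/* page is moved away from the tail of the LRU */
};

/*
 * Decide what happens to a page that has just been submitted for
 * writeback, based on who asked for the writeback.
 */
static enum lru_action writeback_lru_action(unsigned int caller_flags)
{
	/* Writeback driven by page reclaim: allocators must not trip over it. */
	if (caller_flags & SKETCH_PF_MEMALLOC)
		return LRU_REFILE;

	/* Ordinary writeback, typically balance_dirty_pages(): leave it alone. */
	return LRU_LEAVE_IN_PLACE;
}
```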
    • [PATCH] batched movement of lru pages in writeback · 823e0df8
      Andrew Morton authored
      Makes mpage_writepages() move pages around on the LRU sixteen-at-a-time
      rather than one-at-a-time.
    • [PATCH] multithread page reclaim · 3aa1dc77
      Andrew Morton authored
      This patch multithreads the main page reclaim function, shrink_cache().
      
      This function used to run under pagemap_lru_lock.  Instead, we grab
      that lock, put 32 pages from the LRU into a private list, drop the
      pagemap_lru_lock and then proceed to attempt to free those pages.
      
      Any pages which were successfully reclaimed are batch-freed.  Pages
      which were not reclaimed are re-added to the LRU.
      
      This patch reduces pagemap_lru_lock contention on the 4-way by a factor
      of thirty.
      
      The shrink_cache() code has been simplified somewhat.
      
      refill_inactive() was being called too often - often just to process
      two or three pages.  Fiddled with that so it processes pages at the
      same rate, but works on 32 pages at a time.
      
      Added a couple of mark_page_accessed() calls into mm/memory.c from 2.4.
      They seem appropriate.
      
      Change the shrink_caches() logic so that it will still trickle through
      the active list (via refill_inactive) even if the inactive list is much
      larger than the active list.
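
      The "grab the lock, take 32 pages, drop the lock, then reclaim"
      pattern can be sketched in userspace, assuming a pthread mutex in
      place of pagemap_lru_lock and a singly linked list in place of the
      LRU.  The types and the names isolate_batch(), shrink_batch() and
      demo_shrink_once() are hypothetical, and re-adding survivors at the
      head of the list is a simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define RECLAIM_BATCH 32

struct page {
	struct page *next;
	int reclaimable;	/* hypothetical: can the slow path free it? */
	int freed;
};

struct lru {
	pthread_mutex_t lock;
	struct page *head;
};

/* Under the lock, detach up to RECLAIM_BATCH pages onto a private list. */
static struct page *isolate_batch(struct lru *lru)
{
	struct page *batch = NULL, *page;
	int n = 0;

	pthread_mutex_lock(&lru->lock);
	while (n < RECLAIM_BATCH && (page = lru->head) != NULL) {
		lru->head = page->next;
		page->next = batch;
		batch = page;
		n++;
	}
	pthread_mutex_unlock(&lru->lock);
	return batch;
}

/* Reclaim one batch; returns the number of pages freed. */
static int shrink_batch(struct lru *lru)
{
	struct page *batch = isolate_batch(lru);
	struct page *keep = NULL, *page, *next;
	int nr_freed = 0;

	/* The expensive part runs on the private list, without the lock. */
	for (page = batch; page != NULL; page = next) {
		next = page->next;
		if (page->reclaimable) {
			page->freed = 1;
			nr_freed++;
		} else {
			page->next = keep;
			keep = page;
		}
	}

	/* Pages that could not be reclaimed go back in one lock acquisition. */
	if (keep != NULL) {
		pthread_mutex_lock(&lru->lock);
		while ((page = keep) != NULL) {
			keep = page->next;
			page->next = lru->head;
			lru->head = page;
		}
		pthread_mutex_unlock(&lru->lock);
	}
	return nr_freed;
}

/* Build a 40-page list, every other page reclaimable, and run one pass. */
static int demo_shrink_once(int *left_on_lru)
{
	static struct page pages[40];
	struct lru lru;
	struct page *p;
	int i, nr_freed;

	pthread_mutex_init(&lru.lock, NULL);
	lru.head = NULL;
	for (i = 0; i < 40; i++) {
		pages[i].reclaimable = i % 2;
		pages[i].freed = 0;
		pages[i].next = lru.head;
		lru.head = &pages[i];
	}

	nr_freed = shrink_batch(&lru);
	assert(nr_freed <= RECLAIM_BATCH);

	*left_on_lru = 0;
	for (p = lru.head; p != NULL; p = p->next)
		(*left_on_lru)++;
	pthread_mutex_destroy(&lru.lock);
	return nr_freed;
}
```

      In this sketch the lock is taken at most twice per batch of 32
      pages, and never while a page is actually being reclaimed.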
    • [PATCH] pagevec infrastructure · 6a952840
      Andrew Morton authored
      This is the first patch in a series of eight which address
      pagemap_lru_lock contention, and which simplify the VM locking
      hierarchy.
      
      Most testing has been done with all eight patches applied, so it would
      be best not to cherrypick, please.
      
      The workload which was optimised was: 4x500MHz PIII CPUs, mem=512m, six
      disks, six filesystems, six processes each flat-out writing a large
      file onto one of the disks.  ie: heavy page replacement load.
      
      The frequency with which pagemap_lru_lock is taken is reduced by 90%.
      
      Lockmeter claims that pagemap_lru_lock contention on the 4-way has been
      reduced by 98%.  Total amount of system time lost to lock spinning went
      from 2.5% to 0.85%.
      
      Anton ran a similar test on 8-way PPC, the reduction in system time was
      around 25%, and the reduction in time spent playing with
      pagemap_lru_lock was 80%.
      
      	http://samba.org/~anton/linux/2.5.30/standard/
      versus
      	http://samba.org/~anton/linux/2.5.30/akpm/
      
      Throughput changes on uniprocessor are modest: a 1% speedup with this
      workload due to shortened code paths and improved cache locality.
      
      The patches do two main things:
      
      1: In almost all places where the kernel was doing something with
         lots of pages one-at-a-time, convert the code to do the same thing
         sixteen-pages-at-a-time.  Take the lock once rather than sixteen
         times.  Take the lock for the minimum possible time.
      
      2: Multithread the pagecache reclaim function: don't hold
         pagemap_lru_lock while reclaiming pagecache pages.  That function
         was massively expensive.
      
      One fallout from this work is that we never take any other locks while
      holding pagemap_lru_lock.  So this lock conceptually disappears from
      the VM locking hierarchy.
      
      
      So.  This is all basically a code tweak to improve kernel scalability.
      It does it by optimising the existing design, rather than by redesign.
      There is little conceptual change to how the VM works.
      
      This is as far as I can tweak it.  It seems that the results are now
      acceptable on SMP.  But things are still bad on NUMA.  It is expected
      that the per-zone LRU and per-zone LRU lock patches will fix NUMA as
      well, but that has yet to be tested.
      
      
      This first patch introduces `struct pagevec', which is the basic unit
      of batched work.  It is simply:
      
      struct pagevec {
      	unsigned nr;
      	struct page *pages[16];
      };
      
      pagevecs are used in the following patches to get the VM away from
      page-at-a-time operations.
      
      This patch includes all the pagevec library functions which are used in
      later patches.
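
      To illustrate how a pagevec cuts down lock traffic, here is a
      userspace sketch assuming a pthread mutex in place of
      pagemap_lru_lock and a counter that records how often it is taken.
      struct pagevec follows the definition above; struct page, lru_lock,
      lru_lock_acquisitions and the function bodies are hypothetical
      stand-ins, not the library functions from the patch.  The counter is
      read outside the lock, so the sketch assumes a single thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define PAGEVEC_SIZE 16

struct page {
	int on_lru;		/* stand-in for the kernel's struct page */
};

struct pagevec {
	unsigned nr;
	struct page *pages[PAGEVEC_SIZE];
};

/* Hypothetical stand-ins for the LRU lock and an acquisition counter. */
static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long lru_lock_acquisitions;

static void pagevec_init(struct pagevec *pvec)
{
	pvec->nr = 0;
}

/* Queue a page; returns the number of free slots left in the pagevec. */
static unsigned pagevec_add(struct pagevec *pvec, struct page *page)
{
	assert(pvec->nr < PAGEVEC_SIZE);
	pvec->pages[pvec->nr++] = page;
	return PAGEVEC_SIZE - pvec->nr;
}

/* Put every queued page on the "LRU" under a single lock acquisition. */
static void pagevec_flush_to_lru(struct pagevec *pvec)
{
	unsigned i;

	if (pvec->nr == 0)
		return;
	pthread_mutex_lock(&lru_lock);
	lru_lock_acquisitions++;
	for (i = 0; i < pvec->nr; i++)
		pvec->pages[i]->on_lru = 1;
	pthread_mutex_unlock(&lru_lock);
	pvec->nr = 0;
}

/* Add npages pages in batches; returns how many times the lock was taken. */
static unsigned long add_pages_batched(struct page *pages, size_t npages)
{
	struct pagevec pvec;
	unsigned long before = lru_lock_acquisitions;
	size_t i;

	pagevec_init(&pvec);
	for (i = 0; i < npages; i++)
		if (pagevec_add(&pvec, &pages[i]) == 0)
			pagevec_flush_to_lru(&pvec);	/* pagevec is full */
	pagevec_flush_to_lru(&pvec);			/* leftover pages */
	return lru_lock_acquisitions - before;
}
```

      With this sketch, adding 40 pages takes the lock three times
      (16 + 16 + 8) instead of forty times.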
    • [PATCH] lockd shouldn't call posix_unblock_lock here · ecc9d325
      Matthew Wilcox authored
      nlmsvc_notify_blocked() is only called via the fl_notify() pointer which
      is only called immediately after we already did a locks_delete_block(),
      so calling posix_unblock_lock() here is always a NOP.
    • [PATCH] Modular x86 MTRR driver. · 6a85ced0
      Dave Jones authored
      This patch from Pat Mochel cleans up the hell that was mtrr.c
      into something a lot more modular and easy to understand, by
      doing the implementation-per-file as has been done to various
      other things by Pat and myself over the last months.
      
      It's functionally identical from a kernel internal point of view,
      and a userspace point of view, and is basically just a very large
      code clean up.
    • [PATCH] stale thread detach debugging removal · 3b307fd5
      Ingo Molnar authored
      one of the debugging tests triggered a false-positive BUG() when a
      detached thread was straced.
    • [PATCH] thread release infrastructure · d2b7244f
      Ingo Molnar authored
      it is much cleaner to pass in the address of the user-space VM lock -
      this will also enable arbitrary implementations of the stack-unlock, as
      the fifth clone() parameter.
    • [PATCH] init_tasks is not defined anywhere. · 86ae817e
      Rusty Russell authored
      It's referenced by mips and mips64 (both far out of date), but never
      actually defined anywhere.
    • Merge http://linuxusb.bkbits.net/linus-2.5 · edf3d92b
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
    • [PATCH] es1371 synchronize_irq · 17454310
      Petr Vandrovec authored
      Update ES1371 to new synchronize_irq() API.
    • [PATCH] broken cfb* support in the 2.5.31-bk · 9299c003
      Petr Vandrovec authored
      line_length, type and visual moved from the display struct to the fb_info's
      fix structure during the last fbdev updates. Unfortunately the generic code
      was not updated at the same time, so now every fbdev driver is broken.
    • [PATCH] Unicode characters 0x80-0x9F are valid ISO* characters · 26036678
      Petr Vandrovec authored
      Characters 0x80-0x9F from ISO encodings are U+0080-U+009F, so map
      them both ways. Otherwise you cannot use chars 0x80-0x9F in filenames
      on filesystems using NLS.
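
      The two-way mapping described above is an identity mapping over one
      32-character range.  A minimal sketch, assuming hypothetical helper
      names c1_char2uni() and c1_uni2char() and a 16-bit Unicode value
      (this is not the NLS table code itself):

```c
#include <assert.h>
#include <stdint.h>

/* Map an ISO byte in 0x80-0x9F to U+0080-U+009F; returns 0, or -1 if out of range. */
static int c1_char2uni(unsigned char c, uint16_t *uni)
{
	if (c < 0x80 || c > 0x9F)
		return -1;
	*uni = c;
	return 0;
}

/* Map U+0080-U+009F back to the ISO byte; returns 0, or -1 if out of range. */
static int c1_uni2char(uint16_t uni, unsigned char *c)
{
	if (uni < 0x0080 || uni > 0x009F)
		return -1;
	*c = (unsigned char)uni;
	return 0;
}

/* Returns 1 if every byte in 0x80-0x9F survives a round trip, else 0. */
static int c1_round_trip_ok(void)
{
	unsigned int b;

	for (b = 0x80; b <= 0x9F; b++) {
		uint16_t uni;
		unsigned char back;

		if (c1_char2uni((unsigned char)b, &uni) != 0)
			return 0;
		if (c1_uni2char(uni, &back) != 0 || back != b)
			return 0;
	}
	return 1;
}
```

      If either direction is missing, a filename containing one of these
      bytes cannot be converted and then converted back unchanged.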
    • Merge http://linux-scsi.bkbits.net/scsi-for-linus-2.5 · f9969cbe
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
    • Merge bk://ldm.bkbits.net/linux-2.5 · ad2d842b
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
    • [PATCH] Trivial: remove sti from aic7xxx_old · 0352f6f5
      Matthew Wilcox authored
      We don't need to reenable interrupts before calling panic.
    • [PATCH] umem per-disk gendisks · 49ae70c0
      Alexander Viro authored
    • [PATCH] dasd per-disk gendisks · 664aa7b2
      Alexander Viro authored
    • [PATCH] acsi per-disk gendisks · bedbeab4
      Alexander Viro authored
  2. 14 Aug, 2002 18 commits
  3. 13 Aug, 2002 1 commit
    • [PATCH] add FP exception mode prctl · fcc6fcc6
      Paul Mackerras authored
      This patch adds a prctl so that processes can set their
      floating-point exception mode on PPC and on PPC64.  We need this
      because the FP exception mode is controlled by bits in the machine
      state register, which can only be accessed by the kernel, and because
      the exception mode setting interacts with the lazy FPU save/restore
      that the kernel does.
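
      From userspace, the call might look like the sketch below.  It
      assumes the PR_SET_FPEXC, PR_GET_FPEXC and PR_FP_EXC_* names as they
      appear in current Linux <sys/prctl.h>; the commit message itself
      does not name them.  The wrappers set_fp_exc_mode() and
      get_fp_exc_mode() are hypothetical.  On architectures without this
      feature the call is expected to fail, so the wrappers return the
      errno value instead of aborting.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Set the FP exception mode; returns 0 on success or the errno value. */
static int set_fp_exc_mode(unsigned long mode)
{
	if (prctl(PR_SET_FPEXC, mode, 0, 0, 0) == 0)
		return 0;
	return errno;
}

/* Read the FP exception mode into *mode; returns 0 on success or the errno value. */
static int get_fp_exc_mode(unsigned int *mode)
{
	if (prctl(PR_GET_FPEXC, (unsigned long)mode, 0, 0, 0) == 0)
		return 0;
	return errno;
}
```

      A caller would typically try set_fp_exc_mode(PR_FP_EXC_PRECISE) and
      fall back to its default behaviour if a non-zero value comes back.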