1. 25 May, 2011 12 commits
    • mm: nommu: sort mm->mmap list properly · 6038def0
      Namhyung Kim authored
      When I was reading the nommu code, I found that it handles the vma
      list/tree in an unusual way.  IIUC, because there can be more than one
      identical/overlapped vma in the list/tree, it sorts the tree more
      strictly and does a linear search on the tree.  But this isn't applied
      to the list (i.e.  the list could be constructed in a different order
      than the tree, so we can't use the list when finding the first vma in
      that order).
      
      Since inserting/sorting a vma into the tree and into the list is done at
      the same time, we can easily construct both of them in the same order.
      And since a linear search on the tree can be more costly than one on the
      list, the search can be converted to use the list.
      
      Also, after commit 297c5eee ("mm: make the vma list be doubly
      linked") made the list doubly linked, there were a couple of places that
      needed to be fixed to construct the list properly.
      
      Patch 1/6 is a preparation.  It keeps the list sorted in the same order
      as the tree and constructs the doubly-linked list properly.  Patch 2/6
      is a simple optimization for vma deletion.  Patches 3/6 and 4/6 convert
      tree traversal to list traversal and the rest are simple fixes and
      cleanups.
      
      This patch:
      
      A @vma added into @mm should be sorted by start addr, end addr and VMA
      struct addr, in that order, because we may get identical VMAs in @mm.
      However this was true only for the rbtree, not for the list.
      
      This patch fixes this by remembering 'rb_prev' during the tree traversal
      like find_vma_prepare() does, and linking the @vma via __vma_link_list().
      After this patch, we can iterate over all VMAs in the correct order
      simply by walking the @mm->mmap list.
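
      A minimal sketch of the intended ordering, assuming the usual nommu
      vm_area_struct fields; the comparator below is hypothetical and only
      illustrates the (start addr, end addr, struct addr) sort key that both
      the rbtree and the @mm->mmap list now share:

        /*
         * Hypothetical comparator illustrating the sort key: start address,
         * then end address, then the address of the vm_area_struct itself,
         * so identical/overlapping mappings still have a total order.
         */
        static int vma_sort_cmp(struct vm_area_struct *a,
                                struct vm_area_struct *b)
        {
                if (a->vm_start != b->vm_start)
                        return a->vm_start < b->vm_start ? -1 : 1;
                if (a->vm_end != b->vm_end)
                        return a->vm_end < b->vm_end ? -1 : 1;
                return a < b ? -1 : (a > b ? 1 : 0);
        }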
      
      [akpm@linux-foundation.org: avoid duplicating __vma_link_list()]
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Acked-by: Greg Ungerer <gerg@uclinux.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6038def0
    • mmap: avoid merging cloned VMAs · 965f55de
      Shaohua Li authored
      Avoid merging a VMA with another VMA which is cloned from the parent process.
      
      The cloned VMA shares the anon_vma lock with the parent process's VMA.
      If we do the merge, more VMAs (even when the new range is only for the
      current process) use the parent process's anon_vma lock.  This
      introduces scalability issues.  find_mergeable_anon_vma() already
      considers this case.
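
      A rough sketch of the kind of check involved, assuming a shared
      anon_vma can be recognised by how many VMAs hang off it; the helper
      name below is hypothetical and not necessarily what the patch adds:

        /*
         * Hypothetical illustration: only reuse an existing vma's anon_vma
         * for a merge if it is not shared with a parent/child process, i.e.
         * the old vma is the only one on its anon_vma chain.
         */
        static bool anon_vma_reusable(struct vm_area_struct *old)
        {
                return old->anon_vma && list_is_singular(&old->anon_vma_chain);
        }
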
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      965f55de
    • mmap: avoid unnecessary anon_vma lock · 5f70b962
      Shaohua Li authored
      If we only change vma->vm_end, we can avoid taking anon_vma lock even if
      'insert' isn't NULL, which is the case of split_vma.
      
      As I understand it, we needed the lock before because rmap must be able
      to get the 'insert' VMA while we adjust the old VMA's vm_end (the
      'insert' VMA used to be linked onto the anon_vma list in
      __insert_vm_struct).
      
      But now this isn't true any more.  The 'insert' VMA is already linked
      onto the anon_vma list in __split_vma (via anon_vma_clone()) instead of
      in __insert_vm_struct, so there is no race in which rmap can't get the
      required VMAs.  The anon_vma lock is therefore unnecessary; removing it
      saves one lock acquisition in the brk case and improves scalability.
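
      A hedged sketch of the idea as a small predicate; the helper name and
      the exact condition are illustrative, not necessarily the patch's
      final form:

        /*
         * Hypothetical helper: the anon_vma lock is only needed if the
         * adjustment changes more than vma->vm_end, e.g. the start moves
         * or pages are imported from a neighbouring vma.
         */
        static bool vma_adjust_needs_anon_lock(struct vm_area_struct *vma,
                                               unsigned long start,
                                               struct vm_area_struct *importer)
        {
                return vma->anon_vma && (importer || start != vma->vm_start);
        }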
      
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5f70b962
    • mmap: add alignment for some variables · 34679d7e
      Shaohua Li authored
      Make some variables have the correct alignment/section to avoid cache
      issues.  In a workload which heavily does mmap/munmap, these variables
      are used frequently.
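
      As a hedged illustration of the technique (the exact variables the
      patch annotates are not listed here), read-mostly tunables can be kept
      out of write-hot cachelines and genuinely write-hot data can be
      cacheline aligned:

        /* Illustrative only: __read_mostly groups rarely-written globals
         * away from write-hot data; ____cacheline_aligned_in_smp avoids
         * false sharing for data that really is written frequently. */
        int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
        struct percpu_counter vm_committed_as ____cacheline_aligned_in_smp;
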
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34679d7e
    • arch, mm: filter disallowed nodes from arch specific show_mem functions · 7bf02ea2
      David Rientjes authored
      Architectures that implement their own show_mem() function did not pass
      the filter argument to show_free_areas() to appropriately avoid emitting
      the state of nodes that are disallowed in the current context.  This
      patch passes the filter argument to show_free_areas() so those nodes
      are avoided.
      
      This patch also removes the show_free_areas() wrapper around
      __show_free_areas() and converts existing callers to pass an empty filter.
      
      ia64 emits additional information for each node, so skip_free_areas_zone()
      must be made global to filter disallowed nodes; it is converted to take
      a nid argument rather than a zone for this use case.
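
      A sketch of what an arch-specific show_mem() looks like after the
      conversion, assuming the generic helpers described above (the body is
      simplified and illustrative):

        /* The nodemask filter flag is forwarded so show_free_areas() can
         * skip nodes that are disallowed in the current allocation
         * context before the arch prints its own statistics. */
        void show_mem(unsigned int filter)
        {
                show_free_areas(filter);
                /* ... arch-specific per-node/per-zone reporting ... */
        }
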
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7bf02ea2
    • xtensa/mm: remove WANT_PAGE_VIRTUAL · 851cc856
      Sebastian Andrzej Siewior authored
      This is not useful: it provides page->virtual and is used with highmem.
      xtensa has no support for highmem, and the HIGHMEM bits that grep does
      find are only partly implemented; the interesting functions like kmap()
      are missing.  If someone actually implements complete HIGHMEM support,
      they could use HASHED_PAGE_VIRTUAL like most other architectures do.
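
      For context, a condensed sketch of how the two options differ in
      include/linux/mm.h (simplified, not a verbatim quote of that header):

        /* Simplified: WANT_PAGE_VIRTUAL stores the kernel virtual address
         * in struct page itself, while HASHED_PAGE_VIRTUAL looks it up in
         * a hash table and avoids the extra struct page field. */
        #if defined(WANT_PAGE_VIRTUAL)
        #define page_address(page)      ((page)->virtual)
        #elif defined(HASHED_PAGE_VIRTUAL)
        void *page_address(struct page *page);
        #else
        #define page_address(page)      lowmem_page_address(page)
        #endif
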
      Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      851cc856
    • xtensa/mm: remove pgtable.c · 45fd9515
      Sebastian Andrzej Siewior authored
      It is not referenced by any Makefile.  pte_alloc_one_kernel() and
      pte_alloc_one() are implemented in arch/xtensa/include/asm/pgalloc.h.
      Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      45fd9515
    • drivers/video/backlight/adp5520_bl.c: check strict_strtoul() return value · 877947bc
      Liu Yuan authored
      It should check if strict_strtoul() succeeds.
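
      A hedged sketch of the pattern in a sysfs store handler; the handler
      name and attribute are illustrative, not the driver's actual code:

        /* Illustrative store handler: strict_strtoul() returns 0 on
         * success, so its return value must be checked before val is
         * trusted, instead of being silently ignored. */
        static ssize_t bl_store(struct device *dev,
                                struct device_attribute *attr,
                                const char *buf, size_t count)
        {
                unsigned long val;
                int ret = strict_strtoul(buf, 10, &val);

                if (ret)
                        return ret;     /* propagate the error, e.g. -EINVAL */
                /* ... apply val to the device ... */
                return count;
        }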
      
      [akpm@linux-foundation.org: don't override strict_strtoul() return value]
      Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
      Acked-by: Michael Hennerich <michael.hennerich@analog.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      877947bc
    • mm: vmscan: correctly check if reclaimer should schedule during shrink_slab · f06590bd
      Minchan Kim authored
      It has been reported on some laptops that kswapd is consuming large
      amounts of CPU and not being scheduled when SLUB is enabled during large
      amounts of file copying.  It is expected that this is due to kswapd
      missing every cond_resched() point because:
      
      shrink_page_list() calls cond_resched() if inactive pages were isolated
              which in turn may not happen if all_unreclaimable is set in
              shrink_zones(). If for whatever reason, all_unreclaimable is
              set on all zones, we can miss calling cond_resched().
      
      balance_pgdat() only calls cond_resched() if the zones are not
              balanced. For a high-order allocation that is balanced, it
              checks order-0 again. During that window, order-0 might have
              become unbalanced so it loops again for order-0 and returns
              that it was reclaiming for order-0 to kswapd(). It can then
              find that a caller has rewoken kswapd for a high-order and
              re-enters balance_pgdat() without ever calling cond_resched().
      
      shrink_slab() only calls cond_resched() if we are reclaiming slab
              pages. If there are a large number of direct reclaimers, the
              shrinker_rwsem can be contended and prevent kswapd from calling
              cond_resched().
      
      This patch modifies the shrink_slab() case.  If the semaphore is
      contended, the caller will still check cond_resched().  After each
      successful call into a shrinker, the check for cond_resched() remains in
      case one shrinker is particularly slow.
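
      A hedged sketch of the resulting control flow in shrink_slab()
      (heavily simplified; the real function does much more than this):

        /* Sketch: even when the semaphore is contended and we bail out,
         * give the scheduler a chance so kswapd cannot spin without ever
         * being scheduled. */
        if (!down_read_trylock(&shrinker_rwsem)) {
                ret = 1;        /* assume we'll be able to shrink next time */
                goto out;
        }
        /* ... walk the shrinkers, calling cond_resched() after each ... */
        up_read(&shrinker_rwsem);
out:
        cond_resched();
        return ret;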
      
      [mgorman@suse.de: preserve call to cond_resched after each call into shrinker]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Tested-by: Colin King <colin.king@canonical.com>
      Cc: Raghavendra D Prabhu <raghu.prabhu13@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: <stable@kernel.org>		[2.6.38+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f06590bd
    • mm: vmscan: correct use of pgdat_balanced in sleeping_prematurely · afc7e326
      Johannes Weiner authored
      There are a few reports of people experiencing hangs when copying large
      amounts of data with kswapd using a large amount of CPU which appear to be
      due to recent reclaim changes.  SLUB using high orders is the trigger but
      not the root cause as SLUB has been using high orders for a while.  The
      root cause was bugs introduced into reclaim which are addressed by the
      following two patches.
      
      Patch 1 corrects logic introduced by commit 1741c877 ("mm: kswapd:
              keep kswapd awake for high-order allocations until a percentage of
              the node is balanced") to allow kswapd to go to sleep when
              balanced for high orders.
      
      Patch 2 notes that it is possible for kswapd to miss every
              cond_resched() and updates shrink_slab() so it'll at least reach
              that scheduling point.
      
      Chris Wood reports that these two patches in isolation are sufficient to
      prevent the system hanging.  AFAIK, they should also resolve similar hangs
      experienced by James Bottomley.
      
      This patch:
      
      Johannes Weiner pointed out that the logic in commit 1741c877 ("mm:
      kswapd: keep kswapd awake for high-order allocations until a percentage
      of the node is balanced") is backwards.  Instead of allowing kswapd to
      go to sleep when balanced for high order allocations, it keeps kswapd
      running uselessly.
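
      A hedged sketch of the corrected tail of sleeping_prematurely(), which
      returns true when kswapd should stay awake (simplified, and the exact
      variable names are assumptions):

        /*
         * Sketch: for a high-order wakeup, kswapd may sleep once enough of
         * the node is balanced; returning the negated balance check is what
         * lets it sleep, which is what the inverted logic prevented.
         */
        if (order)
                return !pgdat_balanced(pgdat, balanced, classzone_idx);
        else
                return !all_zones_ok;
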
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Tested-by: Colin King <colin.king@canonical.com>
      Cc: Raghavendra D Prabhu <raghu.prabhu13@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: <stable@kernel.org>		[2.6.38+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      afc7e326
    • slub: Fix double bit unlock in debug mode · a71ae47a
      Christoph Lameter authored
      Commit 442b06bc ("slub: Remove node check in slab_free") added a
      call to deactivate_slab() in the debug case in __slab_alloc(), which
      unlocks the current slab used for allocation.  Going to the label
      'unlock_out' then does it again.
      
      Also, in the debug case we do not need all the other processing that the
      'unlock_out' path does: we always fall back to the slow path in the
      debug case, so the tid update is useless.
      
      Similarly, ALLOC_SLOWPATH would just be incremented for all allocations,
      which is also pretty useless.
      
      So simply restore irq flags and return the object.
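
      A hedged sketch of the debug-case tail of __slab_alloc() after the fix
      (simplified; field and variable names follow the SLUB code of that era
      and are assumptions here):

        /* Sketch: deactivate_slab() has already dropped the slab lock, so
         * do not jump to unlock_out (which would unlock it a second time);
         * just restore interrupts and hand back the object. */
        deactivate_slab(s, c);
        c->page = NULL;
        local_irq_restore(flags);
        return object;
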
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Reported-and-bisected-by: James Morris <jmorris@namei.org>
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Reported-by: Jens Axboe <jaxboe@fusionio.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a71ae47a
  2. 24 May, 2011 28 commits