1. 23 Mar, 2011 3 commits
    • David Rientjes's avatar
      oom: prevent unnecessary oom kills or kernel panics · 3a5dda7a
      David Rientjes authored
      This patch prevents unnecessary oom kills or kernel panics by reverting
      two commits:
      
      	495789a5 (oom: make oom_score to per-process value)
      	cef1d352 (oom: multi threaded process coredump don't make deadlock)
      
      First, 495789a5 (oom: make oom_score to per-process value) ignores the
      fact that all threads in a thread group do not necessarily exit at the
      same time.
      
      It is imperative that select_bad_process() detect threads that are in the
      exit path, specifically those with PF_EXITING set, to prevent needlessly
      killing additional tasks.  If a process is oom killed and the thread group
      leader exits, select_bad_process() cannot detect the other threads that
      are PF_EXITING by iterating over only processes.  Thus, it currently
      chooses another task unnecessarily for oom kill or panics the machine when
      nothing else is eligible.
      
      By iterating over threads instead, it is possible to detect threads that
      are exiting and nominate them for oom kill so they get access to memory
      reserves.
      
      Second, cef1d352 (oom: multi threaded process coredump don't make
      deadlock) erroneously avoids making the oom killer a no-op when an
      eligible thread other than current isfound to be exiting.  We want to
      detect this situation so that we may allow that exiting thread time to
      exit and free its memory; if it is able to exit on its own, that should
      free memory so current is no loner oom.  If it is not able to exit on its
      own, the oom killer will nominate it for oom kill which, in this case,
      only means it will get access to memory reserves.
      
      Without this change, it is easy for the oom killer to unnecessarily target
      tasks when all threads of a victim don't exit before the thread group
      leader or, in the worst case, panic the machine.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: <stable@kernel.org>		[2.6.38.x]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3a5dda7a
    • Mel Gorman's avatar
      mm: swap: unlock swapfile inode mutex before closing file on bad swapfiles · 52c50567
      Mel Gorman authored
      If an administrator tries to swapon a file backed by NFS, the inode mutex is
      taken (as it is for any swapfile) but later identified to be a bad swapfile
      due to the lack of bmap and tries to cleanup. During cleanup, an attempt is
      made to close the file but with inode->i_mutex still held. Closing an NFS
      file syncs it which tries to acquire the inode mutex leading to deadlock. If
      lockdep is enabled the following appears on the console;
      
          =============================================
          [ INFO: possible recursive locking detected ]
          2.6.38-rc8-autobuild #1
          ---------------------------------------------
          swapon/2192 is trying to acquire lock:
           (&sb->s_type->i_mutex_key#13){+.+.+.}, at: vfs_fsync_range+0x47/0x7c
      
          but task is already holding lock:
           (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7
      
          other info that might help us debug this:
          1 lock held by swapon/2192:
           #0:  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7
      
          stack backtrace:
          Pid: 2192, comm: swapon Not tainted 2.6.38-rc8-autobuild #1
          Call Trace:
              __lock_acquire+0x2eb/0x1623
              find_get_pages_tag+0x14a/0x174
              pagevec_lookup_tag+0x25/0x2e
              vfs_fsync_range+0x47/0x7c
              lock_acquire+0xd3/0x100
              vfs_fsync_range+0x47/0x7c
              nfs_flush_one+0x0/0xdf [nfs]
              mutex_lock_nested+0x40/0x2b1
              vfs_fsync_range+0x47/0x7c
              vfs_fsync_range+0x47/0x7c
              vfs_fsync+0x1c/0x1e
              nfs_file_flush+0x64/0x69 [nfs]
              filp_close+0x43/0x72
              sys_swapon+0xa39/0xae7
              sysret_check+0x2e/0x69
              system_call_fastpath+0x16/0x1b
      
      This patch releases the mutex if its held before calling filep_close()
      so swapon fails as expected without deadlock when the swapfile is backed
      by NFS.  If accepted for 2.6.39, it should also be considered a -stable
      candidate for 2.6.38 and 2.6.37.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: <stable@kernel.org>		[2.6.37+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      52c50567
    • Andrew Morton's avatar
      include/asm-generic/unistd.h: fix syncfs syscall number · c7a1fcd8
      Andrew Morton authored
      syncfs() is duplicating name_to_handle_at() due to a merging mistake.
      
      Cc: Sage Weil <sage@newdream.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c7a1fcd8
  2. 22 Mar, 2011 37 commits