1. 27 Jan, 2015 34 commits
  2. 16 Jan, 2015 6 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.18.3 · 219b188d
      Greg Kroah-Hartman authored
      219b188d
    • Linus Torvalds's avatar
      mm: Don't count the stack guard page towards RLIMIT_STACK · f2f5d44b
      Linus Torvalds authored
      commit 690eac53 upstream.
      
      Commit fee7e49d ("mm: propagate error from stack expansion even for
      guard page") made sure that we return the error properly for stack
      growth conditions.  It also theorized that counting the guard page
      towards the stack limit might break something, but also said "Let's see
      if anybody notices".
      
      Somebody did notice.  Apparently android-x86 sets the stack limit very
      close to the limit indeed, and including the guard page in the rlimit
      check causes the android 'zygote' process problems.
      
      So this adds the (fairly trivial) code to make the stack rlimit check be
      against the actual real stack size, rather than the size of the vma that
      includes the guard page.
      Reported-and-tested-by: default avatarChih-Wei Huang <cwhuang@android-x86.org>
      Cc: Jay Foad <jay.foad@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2f5d44b
    • Linus Torvalds's avatar
      mm: propagate error from stack expansion even for guard page · c03aed64
      Linus Torvalds authored
      commit fee7e49d upstream.
      
      Jay Foad reports that the address sanitizer test (asan) sometimes gets
      confused by a stack pointer that ends up being outside the stack vma
      that is reported by /proc/maps.
      
      This happens due to an interaction between RLIMIT_STACK and the guard
      page: when we do the guard page check, we ignore the potential error
      from the stack expansion, which effectively results in a missing guard
      page, since the expected stack expansion won't have been done.
      
      And since /proc/maps explicitly ignores the guard page (commit
      d7824370: "mm: fix up some user-visible effects of the stack guard
      page"), the stack pointer ends up being outside the reported stack area.
      
      This is the minimal patch: it just propagates the error.  It also
      effectively makes the guard page part of the stack limit, which in turn
      measn that the actual real stack is one page less than the stack limit.
      
      Let's see if anybody notices.  We could teach acct_stack_growth() to
      allow an extra page for a grow-up/grow-down stack in the rlimit test,
      but I don't want to add more complexity if it isn't needed.
      Reported-and-tested-by: default avatarJay Foad <jay.foad@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c03aed64
    • Vlastimil Babka's avatar
      mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed · 53bcf5c3
      Vlastimil Babka authored
      commit 9e5e3661 upstream.
      
      Charles Shirron and Paul Cassella from Cray Inc have reported kswapd
      stuck in a busy loop with nothing left to balance, but
      kswapd_try_to_sleep() failing to sleep.  Their analysis found the cause
      to be a combination of several factors:
      
      1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait
      
      2. The process has been killed (by OOM in this case), but has not yet been
         scheduled to remove itself from the waitqueue and die.
      
      3. kswapd checks for throttled processes in prepare_kswapd_sleep():
      
              if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
                      wake_up(&pgdat->pfmemalloc_wait);
      		return false; // kswapd will not go to sleep
      	}
      
         However, for a process that was already killed, wake_up() does not remove
         the process from the waitqueue, since try_to_wake_up() checks its state
         first and returns false when the process is no longer waiting.
      
      4. kswapd is running on the same CPU as the only CPU that the process is
         allowed to run on (through cpus_allowed, or possibly single-cpu system).
      
      5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd
         encounters no voluntary preemption points and repeatedly fails
         prepare_kswapd_sleep(), blocking the process from running and removing
         itself from the waitqueue, which would let kswapd sleep.
      
      So, the source of the problem is that we prevent kswapd from going to
      sleep until there are processes waiting on the pfmemalloc_wait queue,
      and a process waiting on a queue is guaranteed to be removed from the
      queue only when it gets scheduled.  This was done to make sure that no
      process is left sleeping on pfmemalloc_wait when kswapd itself goes to
      sleep.
      
      However, it isn't necessary to postpone kswapd sleep until the
      pfmemalloc_wait queue actually empties.  To prevent processes from being
      left sleeping, it's actually enough to guarantee that all processes
      waiting on pfmemalloc_wait queue have been woken up by the time we put
      kswapd to sleep.
      
      This patch therefore fixes this issue by substituting 'wake_up' with
      'wake_up_all' and removing 'return false' in the code snippet from
      prepare_kswapd_sleep() above.  Note that if any process puts itself in
      the queue after this waitqueue_active() check, or after the wake up
      itself, it means that the process will also wake up kswapd - and since
      we are under prepare_to_wait(), the wake up won't be missed.  Also we
      update the comment prepare_kswapd_sleep() to hopefully more clearly
      describe the races it is preventing.
      
      Fixes: 5515061d ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage")
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53bcf5c3
    • Johannes Weiner's avatar
      mm: protect set_page_dirty() from ongoing truncation · a78e877e
      Johannes Weiner authored
      commit 2d6d7f98 upstream.
      
      Tejun, while reviewing the code, spotted the following race condition
      between the dirtying and truncation of a page:
      
      __set_page_dirty_nobuffers()       __delete_from_page_cache()
        if (TestSetPageDirty(page))
                                           page->mapping = NULL
      				     if (PageDirty())
      				       dec_zone_page_state(page, NR_FILE_DIRTY);
      				       dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
          if (page->mapping)
            account_page_dirtied(page)
              __inc_zone_page_state(page, NR_FILE_DIRTY);
      	__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
      
      which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.
      
      Dirtiers usually lock out truncation, either by holding the page lock
      directly, or in case of zap_pte_range(), by pinning the mapcount with
      the page table lock held.  The notable exception to this rule, though,
      is do_wp_page(), for which this race exists.  However, do_wp_page()
      already waits for a locked page to unlock before setting the dirty bit,
      in order to prevent a race where clear_page_dirty() misses the page bit
      in the presence of dirty ptes.  Upgrade that wait to a fully locked
      set_page_dirty() to also cover the situation explained above.
      
      Afterwards, the code in set_page_dirty() dealing with a truncation race
      is no longer needed.  Remove it.
      Reported-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a78e877e
    • Oleg Nesterov's avatar
      exit: fix race between wait_consider_task() and wait_task_zombie() · d73437ad
      Oleg Nesterov authored
      commit 3245d6ac upstream.
      
      wait_consider_task() checks EXIT_ZOMBIE after EXIT_DEAD/EXIT_TRACE and
      both checks can fail if we race with EXIT_ZOMBIE -> EXIT_DEAD/EXIT_TRACE
      change in between, gcc needs to reload p->exit_state after
      security_task_wait().  In this case ->notask_error will be wrongly
      cleared and do_wait() can hang forever if it was the last eligible
      child.
      
      Many thanks to Arne who carefully investigated the problem.
      
      Note: this bug is very old but it was pure theoretical until commit
      b3ab0316 ("wait: completely ignore the EXIT_DEAD tasks").  Before
      this commit "-O2" was probably enough to guarantee that compiler won't
      read ->exit_state twice.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarArne Goedeke <el@laramies.com>
      Tested-by: default avatarArne Goedeke <el@laramies.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d73437ad