1. 22 Feb, 2014 36 commits
  2. 20 Feb, 2014 4 commits
    • Linux 3.10.31 · a43e02cf
      Greg Kroah-Hartman authored
    • mm: fix process accidentally killed by mce because of huge page migration · 6843d925
      Xishi Qiu authored
      Based on c8721bbb upstream, but only the
      bugfix portion pulled out.
      
      Hi Naoya or Greg,
      
      We found a bug in 3.10.x.
      The problem is that a hwpoisoned hugepage can accidentally end up in the
      free hugepage list. It can happen in the following scenario:
      
              process A                           process B
      
        migrate_huge_page
        put_page (old hugepage)
          linked to free hugepage list
                                           hugetlb_fault
                                             hugetlb_no_page
                                               alloc_huge_page
                                                 dequeue_huge_page_vma
                                                   dequeue_huge_page_node
                                                     (steal hwpoisoned hugepage)
        set_page_hwpoison_huge_page
        dequeue_hwpoisoned_huge_page
          (fail to dequeue)
      
      To reproduce the bug, I had one process keep allocating huge pages while
      I used the sysfs interface to soft offline a huge page, and then received:
      "MCE: Killing UCP:2717 due to hardware memory corruption fault at 8200034"
      
      Upstream kernel is free from this bug because of these two commits:
      
      f15bdfa8
      mm/memory-failure.c: fix memory leak in successful soft offlining
      
      c8721bbb
      mm: memory-hotplug: enable memory hotplug to handle hugepage
      
      Although the first commit addresses a memory leak, it also moves
      unset_migratetype_isolate(), which is important to avoid the race.
      The latter is not a bug fix and is too big, so I wrote a small one instead.
      
      The following patch fixes this bug (please apply f15bdfa8 first).
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() · 2d9258e4
      Jan Kara authored
      commit 603e7729 upstream.
      
      qib_user_sdma_queue_pkts() gets called with mmap_sem held for
      writing. Except for get_user_pages() deep down in
      qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all.  Even
      more interestingly, qib_user_sdma_queue_pkts() (and also
      qib_user_sdma_coalesce(), called somewhat later) calls copy_from_user(),
      which can hit a page fault, and we then deadlock trying to get mmap_sem
      when handling that fault.
      
      So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
      leave mmap_sem locking for mm.
      
      This deadlock has actually been observed in the wild when the node
      is under memory pressure.
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      [Backported to 3.10 (thanks to Ben Hutchings):
       - Adjust context
       - Adjust indentation and nr_pages argument in qib_user_sdma_pin_pages()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/memory-failure.c: fix memory leak in successful soft offlining · b0d4c0f8
      Naoya Horiguchi authored
      commit f15bdfa8 upstream.
      
      After a successful page migration by soft offlining, the source page is
      not properly freed and it's never reusable even if we unpoison it
      afterward.
      
      This is caused by a race between freeing the page and setting PG_hwpoison.
      In successful soft offlining, the source page is put (and the refcount
      becomes 0) by putback_lru_page() in unmap_and_move(), where it's linked
      to a pagevec and the actual freeing back to buddy is delayed.  So if
      PG_hwpoison is set for the page before it is freed, the freeing does not
      function as expected (in that case freeing aborts at the check in
      free_pages_prepare()).
      
      This patch tries to make sure to free the source page before setting
      PG_hwpoison on it.  To avoid reallocating, the page keeps
      MIGRATE_ISOLATE until after setting PG_hwpoison.
      
      This patch also removes obsolete comments about "keeping elevated
      refcount" because what they say is not true.  Unlike memory_failure(),
      soft_offline_page() uses no special page isolation code, and the
      soft-offlined pages have no elevated refcount.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
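
The ordering described in the soft-offlining fix above (free the source page first, keep it isolated, then set PG_hwpoison) can be illustrated with a small toy model. This is only a sketch and not kernel code: `Page`, `FreeList`, `soft_offline`, and the `keep_isolated` and `racing_alloc` parameters are hypothetical names invented for illustration. The `isolated` flag merely stands in for MIGRATE_ISOLATE, and the model assumes that an allocator never hands out an isolated page.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Page:
    pfn: int
    hwpoison: bool = False   # stands in for PG_hwpoison
    isolated: bool = False   # stands in for MIGRATE_ISOLATE (toy assumption)


@dataclass
class FreeList:
    pages: list = field(default_factory=list)

    def free(self, page):
        self.pages.append(page)

    def alloc(self):
        # Toy rule: isolated pages are never handed out.
        for i, page in enumerate(self.pages):
            if not page.isolated:
                return self.pages.pop(i)
        return None


def soft_offline(free_list, page, keep_isolated, racing_alloc):
    """Free the migrated source page, then poison it.

    racing_alloc models another task allocating in the window between
    the free and the poisoning. Returns whatever that task obtained.
    """
    page.isolated = True                # page is isolated during migration
    if not keep_isolated:
        page.isolated = False           # racy ordering: isolation dropped early
    free_list.free(page)                # source page goes back to the free list
    stolen = racing_alloc(free_list)    # window in which another task may allocate
    page.hwpoison = True                # poison is set only after the free
    page.isolated = False               # isolation dropped after poisoning
    if page in free_list.pages:
        free_list.pages.remove(page)    # poisoned page is taken out of circulation
    return stolen


def grab(free_list):
    return free_list.alloc()


racy_page = Page(pfn=1)
racy = soft_offline(FreeList(), racy_page, keep_isolated=False, racing_alloc=grab)

safe_page = Page(pfn=2)
safe = soft_offline(FreeList(), safe_page, keep_isolated=True, racing_alloc=grab)
```

In the racy ordering the competing allocation obtains the very page that is poisoned a moment later, which mirrors the "steal hwpoisoned hugepage" step in the scenario reported for 3.10.x. When isolation is kept until after poisoning, the competing allocation gets nothing from this page.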