1. 22 Feb, 2014 37 commits
  2. 20 Feb, 2014 3 commits
    • Linux 3.10.31 · a43e02cf
      Greg Kroah-Hartman authored
    • mm: fix process accidentally killed by mce because of huge page migration · 6843d925
      Xishi Qiu authored
      Based on c8721bbb upstream, but only the
      bugfix portion pulled out.
      
      Hi Naoya or Greg,
      
      We found a bug in 3.10.x.
      The problem is that a hwpoisoned hugepage can accidentally end up on the
      free hugepage list. It could happen in the following scenario:
      
              process A                           process B
      
        migrate_huge_page
        put_page (old hugepage)
          linked to free hugepage list
                                           hugetlb_fault
                                             hugetlb_no_page
                                               alloc_huge_page
                                                 dequeue_huge_page_vma
                                                   dequeue_huge_page_node
                                                     (steal hwpoisoned hugepage)
        set_page_hwpoison_huge_page
        dequeue_hwpoisoned_huge_page
          (fail to dequeue)
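
The interleaving above can be replayed in a small userspace model. This is only a sketch: `toy_page`, `toy_alloc`, `toy_replay` and the `isolated` flag are hypothetical names invented for illustration, not kernel APIs, and the `skip_isolated` guard is an assumption about how such a race can be closed in the model, not a description of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these types or helpers exist in the kernel. */
struct toy_page {
    bool on_free_list;
    bool hwpoison;
    bool isolated;   /* assumption: stands in for "page is being offlined" */
};

/* Models put_page() linking the old hugepage back onto the free list. */
static void toy_put_page(struct toy_page *p)
{
    p->on_free_list = true;
}

/* Models process B allocating any free hugepage.
 * skip_isolated is a hypothetical guard that ignores isolated pages. */
static struct toy_page *toy_alloc(struct toy_page *pool, size_t n,
                                  bool skip_isolated)
{
    assert(pool != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!pool[i].on_free_list)
            continue;
        if (skip_isolated && pool[i].isolated)
            continue;
        pool[i].on_free_list = false;
        return &pool[i];
    }
    return NULL;
}

/* Models dequeue_hwpoisoned_huge_page(): 0 on success,
 * -1 if the page is no longer on the free list. */
static int toy_dequeue_poisoned(struct toy_page *p)
{
    if (!p->on_free_list)
        return -1;
    p->on_free_list = false;
    return 0;
}

/* Replays the scenario step by step. Returns true when process B
 * ends up holding a hwpoisoned page and process A fails to dequeue it. */
static bool toy_replay(bool skip_isolated)
{
    struct toy_page pool[1] = { { false, false, true } };

    toy_put_page(&pool[0]);                               /* A: put_page */
    struct toy_page *stolen = toy_alloc(pool, 1, skip_isolated); /* B */
    pool[0].hwpoison = true;                 /* A: mark the page poisoned */
    int rc = toy_dequeue_poisoned(&pool[0]); /* A: try to remove it */

    return stolen != NULL && stolen->hwpoison && rc != 0;
}
```

Calling `toy_replay(false)` reproduces the bad ordering in the model, while `toy_replay(true)` shows the allocator leaving the page alone so that the dequeue succeeds.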
      
      I reproduced this bug: one process keeps allocating huge pages while I
      use the sysfs interface to soft offline a huge page, and then I received:
      "MCE: Killing UCP:2717 due to hardware memory corruption fault at 8200034"
      
      Upstream kernel is free from this bug because of these two commits:
      
      f15bdfa8
      mm/memory-failure.c: fix memory leak in successful soft offlining
      
      c8721bbb
      mm: memory-hotplug: enable memory hotplug to handle hugepage
      
      Although the first one addresses a memory leak, it also moves
      unset_migratetype_isolate(), which is important to avoid the race.
      The latter is not a bug fix and is too big, so I rewrote a small one.
      
      The following patch can fix this bug. (Please apply f15bdfa8 first.)
      
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() · 2d9258e4
      Jan Kara authored
      commit 603e7729 upstream.
      
      qib_user_sdma_queue_pkts() gets called with mmap_sem held for
      writing. Except for get_user_pages() deep down in
      qib_user_sdma_pin_pages(), we don't seem to need mmap_sem at all.  Even
      more interestingly, qib_user_sdma_queue_pkts() (and also
      qib_user_sdma_coalesce(), called somewhat later) calls copy_from_user(),
      which can hit a page fault, and we then deadlock trying to get mmap_sem
      when handling that fault.
      
      So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
      leave mmap_sem locking for mm.
      
      This deadlock has actually been observed in the wild when the node
      is under memory pressure.
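
The deadlock described above can be illustrated with a toy userspace model of a reader/writer semaphore. This is a sketch under stated assumptions: `toy_rwsem` and every `toy_*` helper are hypothetical names, not kernel APIs, and the try-style helpers return `false` at the point where real kernel code would sleep forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of a reader/writer semaphore (hypothetical). */
struct toy_rwsem {
    int  readers;
    bool writer;
};

/* Take the lock for writing if nobody holds it. */
static bool toy_try_write(struct toy_rwsem *s)
{
    if (s->writer || s->readers > 0)
        return false;
    s->writer = true;
    return true;
}

/* Take the lock for reading unless a writer holds it. */
static bool toy_try_read(struct toy_rwsem *s)
{
    if (s->writer)
        return false;
    s->readers++;
    return true;
}

static void toy_up_write(struct toy_rwsem *s)
{
    s->writer = false;
}

static void toy_up_read(struct toy_rwsem *s)
{
    assert(s->readers > 0);
    s->readers--;
}

/* Models the page-fault handler, which needs the lock for reading.
 * Returns true if the fault can make progress. */
static bool toy_fault(struct toy_rwsem *mmap_sem)
{
    if (!toy_try_read(mmap_sem))
        return false;   /* a blocking lock would wait here indefinitely */
    toy_up_read(mmap_sem);
    return true;
}

/* Models a user copy that faults. caller_holds_write_lock = true is the
 * old call path; false is the path where the caller takes no lock. */
static bool toy_copy_from_user_faults(bool caller_holds_write_lock)
{
    struct toy_rwsem mmap_sem = { 0, false };

    if (caller_holds_write_lock)
        toy_try_write(&mmap_sem);

    bool progressed = toy_fault(&mmap_sem);

    if (caller_holds_write_lock)
        toy_up_write(&mmap_sem);
    return progressed;
}
```

In this model `toy_copy_from_user_faults(true)` cannot make progress because the faulting thread already holds the lock for writing, while `toy_copy_from_user_faults(false)` can.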
      
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      [Backported to 3.10: (Thanks to Ben Hutchings)
       - Adjust context
       - Adjust indentation and nr_pages argument in qib_user_sdma_pin_pages()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>