1. 26 Jan, 2015 28 commits
  2. 14 Jan, 2015 7 commits
  3. 13 Jan, 2015 2 commits
    • Jiri Slaby's avatar
      Linux 3.12.36 · f6101957
      Jiri Slaby authored
      f6101957
    • Kirill A. Shutemov's avatar
      thp: close race between split and zap huge pages · 198717b6
      Kirill A. Shutemov authored
      commit b5a8cad3 upstream.
      
      [stable 3.12 note]
      This commit was supposed to fix a completely other issue. But in 3.12,
      with commit f72e7dcd (mm: let
      mm_find_pmd fix buggy race with THP fault), we need this commit as
      well (it fixes the issue as a by-product). Hugh Dickins writes:
      <== citation starts here>
      Fine for this to go in, but there is one catch, which I discovered
      when backporting to v3.11: it needed one more hunk.  I haven't checked
      your base tree, but if this applies then I believe you need it - most
      of the time no problem, but it can case page migration to fail to find
      a migration entry it inserted earlier, then BUG_ON(!PageLocked(p)) in
      migration_entry_to_page() soon after.  Here's what I wrote back then:
      
      Note on rebase to v3.11: added a hunk to replace the use of
      mm_find_pmd() in page_check_address_pmd().  This call had been
      similarly replaced by the time of my v3.16 commit, in Kirill
      Shutemov's v3.15 b5a8cad3 ("thp: close race between split and zap
      huge pages"): which we do not need as such, since it's fixing v3.13
      117b0791 ("mm, thp: move ptl taking inside
      page_check_address_pmd()"), from a split page-table-lock series we are
      not backporting.  But without this additional hunk, rmap sometimes
      broke when the new semantic for mm_find_pmd() was used here.
      <== end of citation>
      
      But instead of appending hunks to commits, I am taking a full,
      backported version of commit b5a8cad3 with this note prepended.
      
      So the changelog of b5a8cad3 is left below, but does not apply to
      3.12 yet.
      [=== stable 3.12 note ends here]
      
      Sasha Levin has reported two THP BUGs[1][2].  I believe both of them
      have the same root cause.  Let's look to them one by one.
      
      The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!".  It's
      BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page().  From my
      testing I see that page_mapcount() is higher than mapcount here.
      
      I think it happens due to race between zap_huge_pmd() and
      page_check_address_pmd().  page_check_address_pmd() misses PMD which is
      under zap:
      
      	CPU0						CPU1
      						zap_huge_pmd()
      						  pmdp_get_and_clear()
      __split_huge_page()
        anon_vma_interval_tree_foreach()
          __split_huge_page_splitting()
            page_check_address_pmd()
              mm_find_pmd()
      	  /*
      	   * We check if PMD present without taking ptl: no
      	   * serialization against zap_huge_pmd(). We miss this PMD,
      	   * it's not accounted to 'mapcount' in __split_huge_page().
      	   */
      	  pmd_present(pmd) == 0
      
        BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!
      
      						  page_remove_rmap(page)
      						    atomic_add_negative(-1, &page->_mapcount)
      
      The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
      It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().
      
      This happens in similar way:
      
      	CPU0						CPU1
      						zap_huge_pmd()
      						  pmdp_get_and_clear()
      						  page_remove_rmap(page)
      						    atomic_add_negative(-1, &page->_mapcount)
      __split_huge_page()
        anon_vma_interval_tree_foreach()
          __split_huge_page_splitting()
            page_check_address_pmd()
              mm_find_pmd()
      	  pmd_present(pmd) == 0	/* The same comment as above */
        /*
         * No crash this time since we already decremented page->_mapcount in
         * zap_huge_pmd().
         */
        BUG_ON(mapcount != page_mapcount(page))
      
        /*
         * We split the compound page here into small pages without
         * serialization against zap_huge_pmd()
         */
        __split_huge_page_refcount()
      						VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!
      
      So my understanding the problem is pmd_present() check in mm_find_pmd()
      without taking page table lock.
      
      The bug was introduced by me commit with commit 117b0791. Sorry for
      that. :(
      
      Let's open code mm_find_pmd() in page_check_address_pmd() and do the
      check under page table lock.
      
      Note that __page_check_address() does the same for PTE entires
      if sync != 0.
      
      I've stress tested split and zap code paths for 36+ hours by now and
      don't see crashes with the patch applied. Before it took <20 min to
      trigger the first bug and few hours for second one (if we ignore
      first).
      
      [1] https://lkml.kernel.org/g/<53440991.9090001@oracle.com>
      [2] https://lkml.kernel.org/g/<5310C56C.60709@oracle.com>
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      198717b6
  4. 10 Jan, 2015 1 commit
  5. 09 Jan, 2015 2 commits
    • Hugh Dickins's avatar
      mm: let mm_find_pmd fix buggy race with THP fault · e1c34dac
      Hugh Dickins authored
      commit f72e7dcd upstream.
      
      Trinity has reported:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
          IP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1))
          CPU: 6 PID: 16173 Comm: trinity-c364 Tainted: G        W
                                  3.15.0-rc1-next-20140415-sasha-00020-gaa90d09 #398
          lock_acquire (arch/x86/include/asm/current.h:14
                        kernel/locking/lockdep.c:3602)
          _raw_spin_lock (include/linux/spinlock_api_smp.h:143
                          kernel/locking/spinlock.c:151)
          remove_migration_pte (mm/migrate.c:137)
          rmap_walk (mm/rmap.c:1628 mm/rmap.c:1699)
          remove_migration_ptes (mm/migrate.c:224)
          migrate_pages (mm/migrate.c:922 mm/migrate.c:960 mm/migrate.c:1126)
          migrate_misplaced_page (mm/migrate.c:1733)
          __handle_mm_fault (mm/memory.c:3762 mm/memory.c:3812 mm/memory.c:3925)
          handle_mm_fault (mm/memory.c:3948)
          __get_user_pages (mm/memory.c:1851)
          __mlock_vma_pages_range (mm/mlock.c:255)
          __mm_populate (mm/mlock.c:711)
          SyS_mlockall (include/linux/mm.h:1799 mm/mlock.c:817 mm/mlock.c:791)
      
      I believe this comes about because, whereas collapsing and splitting THP
      functions take anon_vma lock in write mode (which excludes concurrent
      rmap walks), faulting THP functions (write protection and misplaced
      NUMA) do not - and mostly they do not need to.
      
      But they do use a pmdp_clear_flush(), set_pmd_at() sequence which, for
      an instant (indeed, for a long instant, given the inter-CPU TLB flush in
      there), leaves *pmd neither present not trans_huge.
      
      Which can confuse a concurrent rmap walk, as when removing migration
      ptes, seen in the dumped trace.  Although that rmap walk has a 4k page
      to insert, anon_vmas containing THPs are in no way segregated from
      4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to cope with
      that instant when a trans_huge pmd is temporarily absent.
      
      I don't think we need strengthen the locking at the THP end: it's easily
      handled with an ACCESS_ONCE() before testing both conditions.
      
      And since mm_find_pmd() had only one caller who wanted a THP rather than
      a pmd, let's slightly repurpose it to fail when it hits a THP or
      non-present pmd, and open code split_huge_page_address() again.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e1c34dac
    • Johan Hovold's avatar
      mfd: viperboard: Fix platform-device id collision · 0702a76d
      Johan Hovold authored
      commit b6684228 upstream.
      
      Allow more than one viperboard to be connected by registering with
      PLATFORM_DEVID_AUTO instead of PLATFORM_DEVID_NONE.
      
      The subdevices are currently registered with PLATFORM_DEVID_NONE, which
      will cause a name collision on the platform bus when a second viperboard
      is plugged in:
      
      viperboard 1-2.4:1.0: version 0.00 found at bus 001 address 004
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 181 at /home/johan/work/omicron/src/linux/fs/sysfs/dir.c:31 sysfs_warn_dup+0x74/0x84()
      sysfs: cannot create duplicate filename '/bus/platform/devices/viperboard-gpio'
      Modules linked in: i2c_viperboard viperboard netconsole [last unloaded: viperboard]
      CPU: 0 PID: 181 Comm: bash Tainted: G        W      3.17.0-rc6 #1
      [<c0016bf4>] (unwind_backtrace) from [<c0013860>] (show_stack+0x20/0x24)
      [<c0013860>] (show_stack) from [<c04305f8>] (dump_stack+0x24/0x28)
      [<c04305f8>] (dump_stack) from [<c0040fb4>] (warn_slowpath_common+0x80/0x98)
      [<c0040fb4>] (warn_slowpath_common) from [<c004100c>] (warn_slowpath_fmt+0x40/0x48)
      [<c004100c>] (warn_slowpath_fmt) from [<c016f1bc>] (sysfs_warn_dup+0x74/0x84)
      [<c016f1bc>] (sysfs_warn_dup) from [<c016f548>] (sysfs_do_create_link_sd.isra.2+0xcc/0xd0)
      [<c016f548>] (sysfs_do_create_link_sd.isra.2) from [<c016f588>] (sysfs_create_link+0x3c/0x48)
      [<c016f588>] (sysfs_create_link) from [<c02867ec>] (bus_add_device+0x12c/0x1e0)
      [<c02867ec>] (bus_add_device) from [<c0284820>] (device_add+0x410/0x584)
      [<c0284820>] (device_add) from [<c0289440>] (platform_device_add+0xd8/0x26c)
      [<c0289440>] (platform_device_add) from [<c02a5ae4>] (mfd_add_device+0x240/0x344)
      [<c02a5ae4>] (mfd_add_device) from [<c02a5ce0>] (mfd_add_devices+0xb8/0x110)
      [<c02a5ce0>] (mfd_add_devices) from [<bf00d1c8>] (vprbrd_probe+0x160/0x1b0 [viperboard])
      [<bf00d1c8>] (vprbrd_probe [viperboard]) from [<c030c000>] (usb_probe_interface+0x1bc/0x2a8)
      [<c030c000>] (usb_probe_interface) from [<c028768c>] (driver_probe_device+0x14c/0x3ac)
      [<c028768c>] (driver_probe_device) from [<c02879e4>] (__driver_attach+0xa4/0xa8)
      [<c02879e4>] (__driver_attach) from [<c0285698>] (bus_for_each_dev+0x70/0xa4)
      [<c0285698>] (bus_for_each_dev) from [<c0287030>] (driver_attach+0x2c/0x30)
      [<c0287030>] (driver_attach) from [<c030a288>] (usb_store_new_id+0x170/0x1ac)
      [<c030a288>] (usb_store_new_id) from [<c030a2f8>] (new_id_store+0x34/0x3c)
      [<c030a2f8>] (new_id_store) from [<c02853ec>] (drv_attr_store+0x30/0x3c)
      [<c02853ec>] (drv_attr_store) from [<c016eaa8>] (sysfs_kf_write+0x5c/0x60)
      [<c016eaa8>] (sysfs_kf_write) from [<c016dc68>] (kernfs_fop_write+0xd4/0x194)
      [<c016dc68>] (kernfs_fop_write) from [<c010fe40>] (vfs_write+0xb4/0x1c0)
      [<c010fe40>] (vfs_write) from [<c01104a8>] (SyS_write+0x4c/0xa0)
      [<c01104a8>] (SyS_write) from [<c000f900>] (ret_fast_syscall+0x0/0x48)
      ---[ end trace 98e8603c22d65817 ]---
      viperboard 1-2.4:1.0: Failed to add mfd devices to core.
      viperboard: probe of 1-2.4:1.0 failed with error -17
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0702a76d