1. 26 Jan, 2015 27 commits
  2. 14 Jan, 2015 7 commits
  3. 13 Jan, 2015 2 commits
    • Jiri Slaby's avatar
      Linux 3.12.36 · f6101957
      Jiri Slaby authored
      f6101957
    • Kirill A. Shutemov's avatar
      thp: close race between split and zap huge pages · 198717b6
      Kirill A. Shutemov authored
      commit b5a8cad3 upstream.
      
      [stable 3.12 note]
      This commit was supposed to fix a completely other issue. But in 3.12,
      with commit f72e7dcd (mm: let
      mm_find_pmd fix buggy race with THP fault), we need this commit as
      well (it fixes the issue as a by-product). Hugh Dickins writes:
      <== citation starts here>
      Fine for this to go in, but there is one catch, which I discovered
      when backporting to v3.11: it needed one more hunk.  I haven't checked
      your base tree, but if this applies then I believe you need it - most
      of the time no problem, but it can case page migration to fail to find
      a migration entry it inserted earlier, then BUG_ON(!PageLocked(p)) in
      migration_entry_to_page() soon after.  Here's what I wrote back then:
      
      Note on rebase to v3.11: added a hunk to replace the use of
      mm_find_pmd() in page_check_address_pmd().  This call had been
      similarly replaced by the time of my v3.16 commit, in Kirill
      Shutemov's v3.15 b5a8cad3 ("thp: close race between split and zap
      huge pages"): which we do not need as such, since it's fixing v3.13
      117b0791 ("mm, thp: move ptl taking inside
      page_check_address_pmd()"), from a split page-table-lock series we are
      not backporting.  But without this additional hunk, rmap sometimes
      broke when the new semantic for mm_find_pmd() was used here.
      <== end of citation>
      
      But instead of appending hunks to commits, I am taking a full,
      backported version of commit b5a8cad3 with this note prepended.
      
      So the changelog of b5a8cad3 is left below, but does not apply to
      3.12 yet.
      [=== stable 3.12 note ends here]
      
      Sasha Levin has reported two THP BUGs[1][2].  I believe both of them
      have the same root cause.  Let's look to them one by one.
      
      The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!".  It's
      BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page().  From my
      testing I see that page_mapcount() is higher than mapcount here.
      
      I think it happens due to race between zap_huge_pmd() and
      page_check_address_pmd().  page_check_address_pmd() misses PMD which is
      under zap:
      
      	CPU0						CPU1
      						zap_huge_pmd()
      						  pmdp_get_and_clear()
      __split_huge_page()
        anon_vma_interval_tree_foreach()
          __split_huge_page_splitting()
            page_check_address_pmd()
              mm_find_pmd()
      	  /*
      	   * We check if PMD present without taking ptl: no
      	   * serialization against zap_huge_pmd(). We miss this PMD,
      	   * it's not accounted to 'mapcount' in __split_huge_page().
      	   */
      	  pmd_present(pmd) == 0
      
        BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!
      
      						  page_remove_rmap(page)
      						    atomic_add_negative(-1, &page->_mapcount)
      
      The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
      It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().
      
      This happens in similar way:
      
      	CPU0						CPU1
      						zap_huge_pmd()
      						  pmdp_get_and_clear()
      						  page_remove_rmap(page)
      						    atomic_add_negative(-1, &page->_mapcount)
      __split_huge_page()
        anon_vma_interval_tree_foreach()
          __split_huge_page_splitting()
            page_check_address_pmd()
              mm_find_pmd()
      	  pmd_present(pmd) == 0	/* The same comment as above */
        /*
         * No crash this time since we already decremented page->_mapcount in
         * zap_huge_pmd().
         */
        BUG_ON(mapcount != page_mapcount(page))
      
        /*
         * We split the compound page here into small pages without
         * serialization against zap_huge_pmd()
         */
        __split_huge_page_refcount()
      						VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!
      
      So my understanding the problem is pmd_present() check in mm_find_pmd()
      without taking page table lock.
      
      The bug was introduced by me commit with commit 117b0791. Sorry for
      that. :(
      
      Let's open code mm_find_pmd() in page_check_address_pmd() and do the
      check under page table lock.
      
      Note that __page_check_address() does the same for PTE entires
      if sync != 0.
      
      I've stress tested split and zap code paths for 36+ hours by now and
      don't see crashes with the patch applied. Before it took <20 min to
      trigger the first bug and few hours for second one (if we ignore
      first).
      
      [1] https://lkml.kernel.org/g/<53440991.9090001@oracle.com>
      [2] https://lkml.kernel.org/g/<5310C56C.60709@oracle.com>
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      198717b6
  4. 10 Jan, 2015 1 commit
  5. 09 Jan, 2015 3 commits
    • Hugh Dickins's avatar
      mm: let mm_find_pmd fix buggy race with THP fault · e1c34dac
      Hugh Dickins authored
      commit f72e7dcd upstream.
      
      Trinity has reported:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
          IP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1))
          CPU: 6 PID: 16173 Comm: trinity-c364 Tainted: G        W
                                  3.15.0-rc1-next-20140415-sasha-00020-gaa90d09 #398
          lock_acquire (arch/x86/include/asm/current.h:14
                        kernel/locking/lockdep.c:3602)
          _raw_spin_lock (include/linux/spinlock_api_smp.h:143
                          kernel/locking/spinlock.c:151)
          remove_migration_pte (mm/migrate.c:137)
          rmap_walk (mm/rmap.c:1628 mm/rmap.c:1699)
          remove_migration_ptes (mm/migrate.c:224)
          migrate_pages (mm/migrate.c:922 mm/migrate.c:960 mm/migrate.c:1126)
          migrate_misplaced_page (mm/migrate.c:1733)
          __handle_mm_fault (mm/memory.c:3762 mm/memory.c:3812 mm/memory.c:3925)
          handle_mm_fault (mm/memory.c:3948)
          __get_user_pages (mm/memory.c:1851)
          __mlock_vma_pages_range (mm/mlock.c:255)
          __mm_populate (mm/mlock.c:711)
          SyS_mlockall (include/linux/mm.h:1799 mm/mlock.c:817 mm/mlock.c:791)
      
      I believe this comes about because, whereas collapsing and splitting THP
      functions take anon_vma lock in write mode (which excludes concurrent
      rmap walks), faulting THP functions (write protection and misplaced
      NUMA) do not - and mostly they do not need to.
      
      But they do use a pmdp_clear_flush(), set_pmd_at() sequence which, for
      an instant (indeed, for a long instant, given the inter-CPU TLB flush in
      there), leaves *pmd neither present not trans_huge.
      
      Which can confuse a concurrent rmap walk, as when removing migration
      ptes, seen in the dumped trace.  Although that rmap walk has a 4k page
      to insert, anon_vmas containing THPs are in no way segregated from
      4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to cope with
      that instant when a trans_huge pmd is temporarily absent.
      
      I don't think we need strengthen the locking at the THP end: it's easily
      handled with an ACCESS_ONCE() before testing both conditions.
      
      And since mm_find_pmd() had only one caller who wanted a THP rather than
      a pmd, let's slightly repurpose it to fail when it hits a THP or
      non-present pmd, and open code split_huge_page_address() again.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e1c34dac
    • Johan Hovold's avatar
      mfd: viperboard: Fix platform-device id collision · 0702a76d
      Johan Hovold authored
      commit b6684228 upstream.
      
      Allow more than one viperboard to be connected by registering with
      PLATFORM_DEVID_AUTO instead of PLATFORM_DEVID_NONE.
      
      The subdevices are currently registered with PLATFORM_DEVID_NONE, which
      will cause a name collision on the platform bus when a second viperboard
      is plugged in:
      
      viperboard 1-2.4:1.0: version 0.00 found at bus 001 address 004
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 181 at /home/johan/work/omicron/src/linux/fs/sysfs/dir.c:31 sysfs_warn_dup+0x74/0x84()
      sysfs: cannot create duplicate filename '/bus/platform/devices/viperboard-gpio'
      Modules linked in: i2c_viperboard viperboard netconsole [last unloaded: viperboard]
      CPU: 0 PID: 181 Comm: bash Tainted: G        W      3.17.0-rc6 #1
      [<c0016bf4>] (unwind_backtrace) from [<c0013860>] (show_stack+0x20/0x24)
      [<c0013860>] (show_stack) from [<c04305f8>] (dump_stack+0x24/0x28)
      [<c04305f8>] (dump_stack) from [<c0040fb4>] (warn_slowpath_common+0x80/0x98)
      [<c0040fb4>] (warn_slowpath_common) from [<c004100c>] (warn_slowpath_fmt+0x40/0x48)
      [<c004100c>] (warn_slowpath_fmt) from [<c016f1bc>] (sysfs_warn_dup+0x74/0x84)
      [<c016f1bc>] (sysfs_warn_dup) from [<c016f548>] (sysfs_do_create_link_sd.isra.2+0xcc/0xd0)
      [<c016f548>] (sysfs_do_create_link_sd.isra.2) from [<c016f588>] (sysfs_create_link+0x3c/0x48)
      [<c016f588>] (sysfs_create_link) from [<c02867ec>] (bus_add_device+0x12c/0x1e0)
      [<c02867ec>] (bus_add_device) from [<c0284820>] (device_add+0x410/0x584)
      [<c0284820>] (device_add) from [<c0289440>] (platform_device_add+0xd8/0x26c)
      [<c0289440>] (platform_device_add) from [<c02a5ae4>] (mfd_add_device+0x240/0x344)
      [<c02a5ae4>] (mfd_add_device) from [<c02a5ce0>] (mfd_add_devices+0xb8/0x110)
      [<c02a5ce0>] (mfd_add_devices) from [<bf00d1c8>] (vprbrd_probe+0x160/0x1b0 [viperboard])
      [<bf00d1c8>] (vprbrd_probe [viperboard]) from [<c030c000>] (usb_probe_interface+0x1bc/0x2a8)
      [<c030c000>] (usb_probe_interface) from [<c028768c>] (driver_probe_device+0x14c/0x3ac)
      [<c028768c>] (driver_probe_device) from [<c02879e4>] (__driver_attach+0xa4/0xa8)
      [<c02879e4>] (__driver_attach) from [<c0285698>] (bus_for_each_dev+0x70/0xa4)
      [<c0285698>] (bus_for_each_dev) from [<c0287030>] (driver_attach+0x2c/0x30)
      [<c0287030>] (driver_attach) from [<c030a288>] (usb_store_new_id+0x170/0x1ac)
      [<c030a288>] (usb_store_new_id) from [<c030a2f8>] (new_id_store+0x34/0x3c)
      [<c030a2f8>] (new_id_store) from [<c02853ec>] (drv_attr_store+0x30/0x3c)
      [<c02853ec>] (drv_attr_store) from [<c016eaa8>] (sysfs_kf_write+0x5c/0x60)
      [<c016eaa8>] (sysfs_kf_write) from [<c016dc68>] (kernfs_fop_write+0xd4/0x194)
      [<c016dc68>] (kernfs_fop_write) from [<c010fe40>] (vfs_write+0xb4/0x1c0)
      [<c010fe40>] (vfs_write) from [<c01104a8>] (SyS_write+0x4c/0xa0)
      [<c01104a8>] (SyS_write) from [<c000f900>] (ret_fast_syscall+0x0/0x48)
      ---[ end trace 98e8603c22d65817 ]---
      viperboard 1-2.4:1.0: Failed to add mfd devices to core.
      viperboard: probe of 1-2.4:1.0 failed with error -17
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0702a76d
    • Linus Walleij's avatar
      mfd: stmpe: Fix STMPE24xx GPMR LSB · 83ee5961
      Linus Walleij authored
      commit 871c3cf4 upstream.
      
      The least significat byte of the GPIO value read register
      on the STMPE24xx series is on addres 0xA4 not 0xA5. Correct
      against datasheet and tested on the STMPE2401 hardware.
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      83ee5961