    mm/hwpoison: fix panic due to split huge zero page · 7f6bf39b
    Wanpeng Li authored
    Bug:
    
      ------------[ cut here ]------------
      kernel BUG at mm/huge_memory.c:1957!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 snd_hda_codec_realtek snd_hda_codec_generic nfsv4 dns_re
      CPU: 2 PID: 2576 Comm: test_huge Not tainted 4.2.0-rc5-mm1+ #27
      Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
      task: ffff880204e3d600 ti: ffff8800db16c000 task.ti: ffff8800db16c000
      RIP: split_huge_page_to_list+0xdb/0x120
      Call Trace:
        memory_failure+0x32e/0x7c0
        madvise_hwpoison+0x8b/0x160
        SyS_madvise+0x40/0x240
        ? do_page_fault+0x37/0x90
        entry_SYSCALL_64_fastpath+0x12/0x71
      Code: ff f0 41 ff 4c 24 30 74 0d 31 c0 48 83 c4 08 5b 41 5c 41 5d c9 c3 4c 89 e7 e8 e2 58 fd ff 48 83 c4 08 31 c0
      RIP  split_huge_page_to_list+0xdb/0x120
       RSP <ffff8800db16fde8>
      ---[ end trace aee7ce0df8e44076 ]---
    
    Testcase:
    
        #define _GNU_SOURCE
        #include <stdlib.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <unistd.h>
        #include <fcntl.h>
        #include <sys/types.h>
        #include <errno.h>
        #include <string.h>
    
        #define MB (1024 * 1024)
    
        int main(void)
        {
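                char *mem;
                int ret;
    
                /* 2 MB alignment so the range can be backed by huge pages */
                ret = posix_memalign((void **)&mem, 2 * MB, 200 * MB);
                if (ret) {
                        fprintf(stderr, "posix_memalign: %s\n", strerror(ret));
                        return 1;
                }
    
                /* needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE */
                if (madvise(mem, 200 * MB, MADV_HWPOISON))
                        perror("madvise");
    
                free(mem);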
    
                return 0;
        }
    
    The huge zero page is allocated on a page fault without the
    FAULT_FLAG_WRITE flag.  get_user_pages_fast(), which is called in
    madvise_hwpoison(), gets the huge zero page if the page has not been
    allocated before.  The huge zero page is a transparent huge page, but it
    is not an anonymous page.  memory_failure() then tries to split the huge
    zero page and triggers BUG_ON(is_huge_zero_page(page)).
    
    After commit 98ed2b00 ("mm/memory-failure: give up error handling
    for non-tail-refcounted thp"), memory_failure() no longer catches
    non-anonymous thp coming from the madvise_hwpoison() path, so this bug
    occurs.
    
    Fix it by catching non-anonymous thp in memory_failure(), so that the
    huge zero page is not split in the madvise_hwpoison() path.
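
    The check order described above can be sketched as a small user-space
    model.  This is not the kernel code: struct model_page,
    model_split_huge_page() and model_memory_failure() are hypothetical names
    made up for illustration, the -EBUSY return value is an assumption, and
    assert() stands in for the kernel's BUG_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the few page properties that matter here. */
struct model_page {
        unsigned long pfn;
        bool trans_huge;        /* head of a transparent huge page */
        bool anon;              /* anonymous page */
        bool huge_zero;         /* the shared huge zero page */
        bool was_split;         /* set once the model has split the page */
};

/* Models the split step; assert() plays the role of BUG_ON(). */
static int model_split_huge_page(struct model_page *p)
{
        assert(!p->huge_zero);
        p->trans_huge = false;
        p->was_split = true;
        return 0;
}

/*
 * Models the ordering the fix relies on: a non-anonymous thp is rejected
 * before any split is attempted, so the huge zero page never reaches
 * model_split_huge_page().  The -EBUSY value is an assumption of this model.
 */
static int model_memory_failure(struct model_page *p)
{
        if (p->trans_huge) {
                if (!p->anon) {
                        fprintf(stderr, "MCE: %#lx: non anonymous thp\n",
                                p->pfn);
                        return -EBUSY;
                }
                if (model_split_huge_page(p))
                        return -EBUSY;
        }
        return 0;
}
```

    In this model a page with trans_huge set and anon clear, such as the huge
    zero page, is rejected with an error and left unsplit, while an anonymous
    thp is still split as before.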
    
    After this patch:
    
      Injecting memory failure for page 0x202800 at 0x7fd8ae800000
      MCE: 0x202800: non anonymous thp
      [...]
    
    [akpm@linux-foundation.org: remove second split, per Wanpeng]
    Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
    Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>