• Naoya Horiguchi's avatar
    x86/e820: put !E820_TYPE_RAM regions into memblock.reserved · 124049de
    Naoya Horiguchi authored
    There is a kernel panic that is triggered when reading /proc/kpageflags
    on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
    
      BUG: unable to handle kernel paging request at fffffffffffffffe
      PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
      Oops: 0000 [#1] SMP PTI
      CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
      RIP: 0010:stable_page_flags+0x27/0x3c0
      Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
      RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
      RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
      RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
      RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
      R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
      R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
      FS:  00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
      Call Trace:
       kpageflags_read+0xc7/0x120
       proc_reg_read+0x3c/0x60
       __vfs_read+0x36/0x170
       vfs_read+0x89/0x130
       ksys_pread64+0x71/0x90
       do_syscall_64+0x5b/0x160
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7efc42e75e23
      Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
    
    According to kernel bisection, this problem became visible due to commit
    f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
    which changes how struct pages are initialized.
    
    Memblock layout affects the pfn ranges covered by node/zone.  Consider
    that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
    the default (no memmap= given) memblock layout is like below:
    
      MEMBLOCK configuration:
       memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
       memory.cnt  = 0x4
       memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
       memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
       memory[0x2]     [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
       memory[0x3]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
       ...
    
    If you give memmap=1G!4G (so it just covers memory[0x2]),
    the range [0x100000000-0x13fffffff] is gone:
    
      MEMBLOCK configuration:
       memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
       memory.cnt  = 0x3
       memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
       memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
       memory[0x2]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
       ...
    
    This causes shrinking node 0's pfn range because it is calculated by the
    address range of memblock.memory.  So some of struct pages in the gap
    range are left uninitialized.
    
    We have a function zero_resv_unavail() which does zeroing the struct pages
    within the reserved unavailable range (i.e.  memblock.memory &&
    !memblock.reserved).  This patch utilizes it to cover all unavailable
    ranges by putting them into memblock.reserved.
    
    Link: http://lkml.kernel.org/r/20180615072947.GB23273@hori1.linux.bs1.fc.nec.co.jp
    Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
    Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Tested-by: default avatarOscar Salvador <osalvador@suse.de>
    Tested-by: default avatar"Herton R. Krzesinski" <herton@redhat.com>
    Acked-by: default avatarMichal Hocko <mhocko@suse.com>
    Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
    Cc: Steven Sistare <steven.sistare@oracle.com>
    Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    124049de
e820.c 35.5 KB