    powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes · 8f64e1f2
    Jon Tollefson authored
    If multiple memory blocks reserved via lmb_reserve() have contiguous
    addresses but sit on different NUMA nodes, we lose track of which
    address ranges to reserve in bootmem on which node.  I discovered this
    when I recently got to try 16GB huge pages on a system with more than 2 nodes.
    
    When scanning the device tree in early boot we call lmb_reserve() with
    the addresses of the 16G pages that we find so that the memory doesn't
    get used for something else.  For example the addresses for the pages
    could be 4000000000, 4400000000, 4800000000, 4C00000000, etc - 8 pages,
    one on each of eight nodes.  In the lmb after all the pages have been
    reserved it will look something like the following:
    
    lmb_dump_all:
        memory.cnt            = 0x2
        memory.size           = 0x3e80000000
        memory.region[0x0].base       = 0x0
                          .size     = 0x1e80000000
        memory.region[0x1].base       = 0x4000000000
                          .size     = 0x2000000000
        reserved.cnt          = 0x5
        reserved.size         = 0x3e80000000
        reserved.region[0x0].base       = 0x0
                          .size     = 0x7b5000
        reserved.region[0x1].base       = 0x2a00000
                          .size     = 0x78c000
        reserved.region[0x2].base       = 0x328c000
                          .size     = 0x43000
        reserved.region[0x3].base       = 0xf4e8000
                          .size     = 0xb18000
        reserved.region[0x4].base       = 0x4000000000
                          .size     = 0x2000000000
    
    The reserved.region[0x4] contains the 16G pages.  In
    arch/powerpc/mm/numa.c: do_init_bootmem() we loop through each of the
    node numbers looking for the reserved regions that belong to the
    particular node.  That loop is not able to identify region 0x4 as being
    a part of each of the 8 nodes, because it assumes that a reserved region
    is only ever on a single node.
    
    This patch moves the reserved region loop out of the loop that goes
    over each node.  For each reserved region it looks up the active region
    containing the start of the reserved region.  If the reserved region
    extends past that active region, the patch adjusts the size and gets the
    next active region containing the remainder.
    
    Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
numa.c 26.9 KB