    powerpc: Fix boot freeze on machine with empty memory node · 4a618669
    Dave Hansen authored
    I got a bug report about a distro kernel not booting on a particular
    machine.  It would freeze during boot:
    
    > ...
    > Could not find start_pfn for node 1
    > [boot]0015 Setup Done
    > Built 2 zonelists in Node order, mobility grouping on.  Total pages: 123783
    > Policy zone: DMA
    > Kernel command line:
    > [boot]0020 XICS Init
    > [boot]0021 XICS Done
    > PID hash table entries: 4096 (order: 12, 32768 bytes)
    > clocksource: timebase mult[7d0000] shift[22] registered
    > Console: colour dummy device 80x25
    > console handover: boot [udbg0] -> real [hvc0]
    > Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes)
    > Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes)
    > freeing bootmem node 0
    
    I've reproduced this on 2.6.27.7.  It is caused by commit
    8f64e1f2 ("powerpc: Reserve in bootmem
    lmb reserved regions that cross NUMA nodes").
    
    The problem is that Jon took a loop which was (in pseudocode):
    
    	for_each_node(nid) {
    		NODE_DATA(nid) = careful_alloc(nid);
    		setup_bootmem(nid);
    		reserve_node_bootmem(nid);
    	}
    
    and broke it up into:
    
    	for_each_node(nid) {
    		NODE_DATA(nid) = careful_alloc(nid);
    		setup_bootmem(nid);
    	}
    	for_each_node(nid)
    		reserve_node_bootmem(nid);
    
    The issue comes in when careful_alloc() is called on a node with no
    memory.  It falls back to using bootmem from a previously initialized
    node.  But with Jon's patch applied, the lmb reserved regions have not
    yet been reserved in that node's bootmem, so the allocation gives back
    bogus memory (0xc000000000000000) and the kernel pukes later in boot.
    
    The following patch merges the two loops back into one.  It also
    breaks the mark_reserved_regions_for_nid() code out into a function
    and adds some comments.  I think a big part of why this bug was
    introduced is that the for loop was too long and hard to read.
    
    The actual bug fix here is this hunk:
    
    +		if (end_pfn <= node->node_start_pfn ||
    +		    start_pfn >= node_end_pfn)
    +			continue;
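
    The condition skips an lmb reserved region unless it overlaps the
    node's page frame range.  Below is a hedged C sketch of that test,
    assuming both ranges are half-open ([start_pfn, end_pfn)), which is
    what the <= and >= comparisons imply, plus a clamp step of the kind a
    per-node reservation needs when a region crosses node boundaries.
    struct pfn_range, range_overlaps_node() and clamp_to_node() are
    hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical half-open page frame range: [start_pfn, end_pfn). */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/*
 * True if the range shares at least one page frame with the node
 * [node_start_pfn, node_end_pfn).  This is the negation of the
 * "continue" condition in the fix.
 */
static bool range_overlaps_node(struct pfn_range r,
				unsigned long node_start_pfn,
				unsigned long node_end_pfn)
{
	assert(r.start_pfn <= r.end_pfn);
	return !(r.end_pfn <= node_start_pfn || r.start_pfn >= node_end_pfn);
}

/* Trim an overlapping range to the part that lies inside the node. */
static struct pfn_range clamp_to_node(struct pfn_range r,
				      unsigned long node_start_pfn,
				      unsigned long node_end_pfn)
{
	if (r.start_pfn < node_start_pfn)
		r.start_pfn = node_start_pfn;
	if (r.end_pfn > node_end_pfn)
		r.end_pfn = node_end_pfn;
	return r;
}
```

    With half-open ranges, a region that ends exactly at node_start_pfn or
    starts exactly at node_end_pfn is skipped, while a region that covers
    the whole node is kept and clamped to the node's bounds.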

    Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
    Signed-off-by: Paul Mackerras <paulus@samba.org>