    mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page() · 65f67a3e
    Baolin Wang authored
    __pageblock_pfn_to_page() is now used by set_zone_contiguous() to check
    whether a given zone contains holes.  It uses pfn_to_online_page() to
    verify that the start pfn is online and valid, and pfn_valid() to
    verify the end pfn.
    
    However, in some situations __pageblock_pfn_to_page() may return
    non-NULL even if the end pfn of a pageblock is in a memory hole.  For
    example, if the pageblock order is MAX_ORDER, the pageblock spans 2
    sub-sections, and the end pfn of the pageblock may be in a hole even
    though the start pfn is online and valid.
    
    Take the memory layout below as an example, and suppose the pageblock
    order is MAX_ORDER.
    
    [    0.000000] Zone ranges:
    [    0.000000]   DMA      [mem 0x0000000040000000-0x00000000ffffffff]
    [    0.000000]   DMA32    empty
    [    0.000000]   Normal   [mem 0x0000000100000000-0x0000001fa7ffffff]
    [    0.000000] Movable zone start for each node
    [    0.000000] Early memory node ranges
    [    0.000000]   node   0: [mem 0x0000000040000000-0x0000001fa3c7ffff]
    [    0.000000]   node   0: [mem 0x0000001fa3c80000-0x0000001fa3ffffff]
    [    0.000000]   node   0: [mem 0x0000001fa4000000-0x0000001fa402ffff]
    [    0.000000]   node   0: [mem 0x0000001fa4030000-0x0000001fa40effff]
    [    0.000000]   node   0: [mem 0x0000001fa40f0000-0x0000001fa73cffff]
    [    0.000000]   node   0: [mem 0x0000001fa73d0000-0x0000001fa745ffff]
    [    0.000000]   node   0: [mem 0x0000001fa7460000-0x0000001fa746ffff]
    [    0.000000]   node   0: [mem 0x0000001fa7470000-0x0000001fa758ffff]
    [    0.000000]   node   0: [mem 0x0000001fa7590000-0x0000001fa7dfffff]
    
    Focus on the last memory range, [mem
    0x0000001fa7590000-0x0000001fa7dfffff]: it ends at 0x1fa7dfffff while
    the Normal zone ends at 0x1fa7ffffff, so there is a hole after it.
    Because pageblocks must be 4M aligned, the last pageblock covers the
    range from 0x1fa7c00000 to 0x1fa7ffffff, and it spans 2 sub-sections
    (sub-sections are 2M in size and 2M aligned).
    
    So the 1st sub-section (address range 0x1fa7c00000 - 0x1fa7dfffff) in
    this pageblock is marked valid by subsection_map_init(), called from
    free_area_init(), but the 2nd sub-section (address range 0x1fa7e00000 -
    0x1fa7ffffff) in this pageblock is not valid.
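
    The alignment arithmetic for this layout can be checked with a small
    userspace sketch.  It is not kernel code: the helper names and the
    constants are hypothetical, and the 4M pageblock and 2M sub-section
    sizes are assumptions taken from the example, not read from a running
    kernel.  The "backed" test is simplified to this one layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes from the example: 4M pageblocks, 2M sub-sections. */
#define PAGEBLOCK_BYTES  (4ULL << 20)
#define SUBSECTION_BYTES (2ULL << 20)

/* Last byte of the last "Early memory node ranges" entry in the log. */
#define LAST_MEM_END 0x1fa7dfffffULL

/* Round an address down to the start of its (4M aligned) pageblock. */
uint64_t pageblock_start(uint64_t addr)
{
    return addr & ~(PAGEBLOCK_BYTES - 1);
}

/* Start address of sub-section idx (0 or 1) inside a pageblock. */
uint64_t subsection_start(uint64_t block_start, unsigned int idx)
{
    return block_start + (uint64_t)idx * SUBSECTION_BYTES;
}

/*
 * Simplification for this layout only: a sub-section counts as backed
 * by memory if it starts at or before the end of the last memory range.
 */
bool subsection_backed(uint64_t sub_start)
{
    return sub_start <= LAST_MEM_END;
}
```

    With these helpers, pageblock_start(LAST_MEM_END) is 0x1fa7c00000, the
    sub-section at index 0 starts there and is backed, and the sub-section
    at index 1 starts at 0x1fa7e00000 and is not.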
    
    This has not broken anything so far, but the zone contiguity check is
    fragile in this scenario.  As suggested in a previous discussion [1],
    it is better to add some comments explaining this possible issue in
    case future pfn walkers rely on it.
    
    [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/
    
    Link: https://lkml.kernel.org/r/5c26368865e79c743a453dea48d30670b19d2e4f.1682425534.git.baolin.wang@linux.alibaba.com
    Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
    Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>