    mm_zone: add function to check if managed dma zone exists · 62b31070
    Baoquan He authored
    Patch series "Handle warning of allocation failure on DMA zone w/o
    managed pages", v4.
    
    ***Problem observed:
    On x86_64, when a crash is triggered and the system enters the kdump
    kernel, a page allocation failure is always seen.
    
     ---------------------------------
     DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
     swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
     CPU: 0 PID: 1 Comm: swapper/0
     Call Trace:
      dump_stack+0x7f/0xa1
      warn_alloc.cold+0x72/0xd6
      ......
      __alloc_pages+0x24d/0x2c0
      ......
      dma_atomic_pool_init+0xdb/0x176
      do_one_initcall+0x67/0x320
      ? rcu_read_lock_sched_held+0x3f/0x80
      kernel_init_freeable+0x290/0x2dc
      ? rest_init+0x24f/0x24f
      kernel_init+0xa/0x111
      ret_from_fork+0x22/0x30
     Mem-Info:
     ------------------------------------
    
    ***Root cause:
    The current kernel assumes that the DMA zone must have managed pages and
    tries to request pages from it if CONFIG_ZONE_DMA is enabled. This is not
    always true. E.g. in the kdump kernel on x86_64, only the low 1M is
    presented and it is locked down at a very early stage of boot, so this
    low 1M is never added to the buddy allocator to become managed pages of
    the DMA zone. This exception always causes a page allocation failure
    when a page is requested from the DMA zone.
    
    ***Investigation:
    This failure has happened since the commits below were merged into
    Linus's tree.
      1a6a9044 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
      23721c8e x86/crash: Remove crash_reserve_low_1M()
      f1d4d47c x86/setup: Always reserve the first 1M of RAM
      7c321eb2 x86/kdump: Remove the backup region handling
      6f599d84 x86/kdump: Always reserve the low 1M when the crashkernel option is specified
    
    Before them, on x86_64, the low 640K area was reused by the kdump kernel.
    So the content of the low 640K area was copied into a backup region for
    dumping before jumping into the kdump kernel. Then, except for the
    firmware-reserved regions in [0, 640K], the remaining area was added to
    the buddy allocator to become available managed pages of the DMA zone.
    
    However, after the above commits were applied, in the kdump kernel on
    x86_64 the low 1M is reserved by memblock but never released to the buddy
    allocator. So any later page allocation requested from the DMA zone will
    fail.
    
    Initially, the low 1M needed to be locked down whenever crashkernel is
    reserved, because AMD SME encrypts memory, which makes the old backup
    region mechanism impossible when switching into the kdump kernel.
    
    Later, it was also observed that some BIOSes corrupt memory under 1M. To
    solve this, since commit f1d4d47c the entire low 1M region is always
    reserved after the real mode trampoline is allocated.
    
    Besides, Intel engineers recently mentioned that TDX (Trusted Domain
    Extensions), which is under development in the kernel, also needs to lock
    down the low 1M. So we can't simply revert the above commits to fix the
    page allocation failure from the DMA zone, as someone suggested.
    
    ***Solution:
    Currently, only the DMA atomic pool and dma-kmalloc initialize and
    request page allocation with GFP_DMA during bootup.
    
    So only initialize the DMA atomic pool when the DMA zone has available
    managed pages; otherwise just skip the initialization.
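    
    The guard described above can be sketched as a small userspace model.
    This is only an illustration, not the kernel code: the names
    init_dma_atomic_pool_model(), pool_alloc_fn, failing_dma_alloc() and the
    pool_state values are hypothetical, and the boolean argument stands in
    for the result of has_managed_dma().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of trying to set up the GFP_DMA atomic pool. */
enum pool_state { POOL_SKIPPED, POOL_READY, POOL_FAILED };

/* Hypothetical allocator hook: true if 'size' bytes could be reserved. */
typedef bool (*pool_alloc_fn)(size_t size);

/* Counts how often the failing allocator below was actually called. */
static unsigned int alloc_attempts;

/* Models a DMA zone without managed pages: every request fails. */
static bool failing_dma_alloc(size_t size)
{
    (void)size;
    alloc_attempts++;
    return false;
}

/*
 * Only attempt the allocation when the DMA zone has managed pages.
 * When it has none, skip the pool entirely so that no allocation is
 * issued and therefore no failure warning can be triggered.
 */
static enum pool_state init_dma_atomic_pool_model(bool dma_zone_managed,
                                                  size_t size,
                                                  pool_alloc_fn alloc)
{
    assert(alloc != NULL);

    if (!dma_zone_managed)
        return POOL_SKIPPED;

    return alloc(size) ? POOL_READY : POOL_FAILED;
}
```

    With the guard in place, a kdump-like setup (no managed DMA pages) never
    reaches the allocator, which is the behaviour the series aims for.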
    
    For dma-kmalloc(), for the time being, let's mute the allocation failure
    warning when pages are requested from the DMA zone while it has no
    managed pages.  Meanwhile, change code to use the dma_alloc_xx/dma_map_xx
    APIs in place of kmalloc(GFP_DMA), or do not use GFP_DMA when calling
    kmalloc() if it is not necessary.  Christoph is posting patches to fix
    those under drivers/scsi/.  Finally, we can remove the need for
    dma-kmalloc(), as people suggested.
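    
    The muting rule for the warning can be modelled as a pure predicate.
    This is a hedged userspace sketch: MODEL_GFP_DMA and MODEL_GFP_NOWARN are
    made-up flag values (not the kernel's GFP bit values), and
    should_warn_alloc_failure() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only. */
#define MODEL_GFP_DMA    0x1u
#define MODEL_GFP_NOWARN 0x2u

/*
 * Decide whether a failed allocation should be reported.
 * The warning is suppressed when the caller asked for silence, or when
 * the request targeted the DMA zone and that zone has no managed pages
 * (the failure is then expected, as in the kdump kernel).
 */
static bool should_warn_alloc_failure(unsigned int gfp, bool dma_zone_managed)
{
    /* Only the two model flags defined above are understood here. */
    assert((gfp & ~(MODEL_GFP_DMA | MODEL_GFP_NOWARN)) == 0);

    if (gfp & MODEL_GFP_NOWARN)
        return false;
    if ((gfp & MODEL_GFP_DMA) && !dma_zone_managed)
        return false;
    return true;
}
```

    Note that a failure on a DMA zone that does have managed pages is still
    reported, so genuine exhaustion is not hidden.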
    
    This patch (of 3):
    
    Some places in the current kernel assume that the DMA zone must have
    managed pages if CONFIG_ZONE_DMA is enabled.  This is not always true.
    E.g. in the kdump kernel on x86_64, only the low 1M is presented and it
    is locked down at a very early stage of boot, so there are no managed
    pages at all in the DMA zone.  This exception always causes a page
    allocation failure when a page is requested from the DMA zone.
    
    Add the function has_managed_dma() and the relevant helper functions to
    check whether there is a DMA zone with managed pages.  It will be used in
    later patches.
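    
    The check itself can be illustrated with a simplified userspace model.
    The structures below are cut-down stand-ins for the kernel's zone and
    node types, and has_managed_dma_model() with its nodes/nr_nodes
    parameters is a hypothetical name for this sketch; the real helper takes
    no arguments and walks the kernel's own node list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's zone and node structures. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, MAX_NR_ZONES };

struct zone {
    unsigned long managed_pages;    /* pages owned by the buddy allocator */
};

struct pglist_data {
    struct zone node_zones[MAX_NR_ZONES];
};

/* A zone counts as managed only if the allocator owns at least one page. */
static bool managed_zone(const struct zone *zone)
{
    return zone->managed_pages != 0;
}

/*
 * Model of the check: walk every node and report whether any node's
 * ZONE_DMA has managed pages. Pages in other zones do not count.
 */
static bool has_managed_dma_model(const struct pglist_data *nodes,
                                  size_t nr_nodes)
{
    assert(nodes != NULL || nr_nodes == 0);

    for (size_t i = 0; i < nr_nodes; i++) {
        if (managed_zone(&nodes[i].node_zones[ZONE_DMA]))
            return true;
    }
    return false;
}
```

    In the kdump case described above, every node's DMA zone would have zero
    managed pages, so the model returns false even though other zones are
    populated.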
    
    Link: https://lkml.kernel.org/r/20211223094435.248523-1-bhe@redhat.com
    Link: https://lkml.kernel.org/r/20211223094435.248523-2-bhe@redhat.com
    Fixes: 6f599d84 ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: John Donnelly <john.p.donnelly@oracle.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: David Laight <David.Laight@ACULAB.COM>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Marek Szyprowski <m.szyprowski@samsung.com>
    Cc: Robin Murphy <robin.murphy@arm.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>