    mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page · ad2fa371
    Muchun Song authored
    When we free a HugeTLB page to the buddy allocator, we need to allocate
    the vmemmap pages associated with it.  However, we may not be able to
    allocate the vmemmap pages when the system is under memory pressure.  In
    this case, we just refuse to free the HugeTLB page.  This changes behavior
    in some corner cases as listed below:
    
     1) Failing to free a huge page triggered by the user (decrease nr_pages).
    
        User needs to try again later.
    
     2) Failing to free a surplus huge page when freed by the application.
    
        Try again later when freeing a huge page next time.
    
     3) Failing to dissolve a free huge page on ZONE_MOVABLE via
        offline_pages().
    
        This can happen when we have plenty of ZONE_MOVABLE memory, but
        not enough kernel memory to allocate vmemmap pages.  We may even
        be able to migrate huge page contents, but will not be able to
        dissolve the source huge page.  This will prevent an offline
        operation and is unfortunate as memory offlining is expected to
        succeed on movable zones.  Users that depend on memory hotplug
        to succeed for movable zones should carefully consider whether the
        memory savings gained from this feature are worth the risk of
        possibly not being able to offline memory in certain situations.
    
     4) Failing to dissolve a huge page on CMA/ZONE_MOVABLE via
        alloc_contig_range() - once we have that handling in place. Mainly
        affects CMA and virtio-mem.
    
        Similar to 3).  virtio-mem will handle migration errors gracefully.
        CMA might be able to fall back on other free areas within the CMA
        region.
    
    Vmemmap pages are allocated from the page freeing context.  To keep
    those allocations from being disruptive (e.g. triggering the OOM killer),
    __GFP_NORETRY is used.  hugetlb_lock is dropped for the allocation because
    a non-sleeping allocation would be too fragile and could fail too
    easily under memory pressure.  GFP_ATOMIC and other modes that access
    memory reserves are not used because we want to avoid consuming reserves
    under heavy hugetlb freeing.
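    
    The policy described above can be pictured with a small userspace model.
    This is only an illustrative sketch, not the kernel code: struct pool,
    try_alloc_vmemmap() and free_huge_page_to_buddy() are hypothetical names,
    the "locked" flag merely stands in for hugetlb_lock, and the number of
    vmemmap pages needed per huge page is a caller-supplied assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct pool {
	size_t nr_huge_pages;      /* huge pages currently held by the pool */
	size_t free_small_pages;   /* stand-in for allocatable kernel memory */
	size_t vmemmap_per_hpage;  /* assumed vmemmap pages needed per huge page */
	bool locked;               /* stand-in for hugetlb_lock */
};

/*
 * Models a fail-fast attempt in the spirit of __GFP_NORETRY: either the
 * pages are available right now or the call fails. It never dips into
 * reserves and never kills anything to make room.
 */
static bool try_alloc_vmemmap(struct pool *p, size_t nr)
{
	if (p->free_small_pages < nr)
		return false;
	p->free_small_pages -= nr;
	return true;
}

/*
 * Try to hand one huge page back to the (modelled) buddy allocator.
 * Returns 0 on success, -ENOMEM if the vmemmap pages could not be
 * allocated, in which case the huge page simply stays in the pool.
 */
static int free_huge_page_to_buddy(struct pool *p)
{
	bool ok;

	if (p->nr_huge_pages == 0)
		return -EINVAL;

	/* Drop the lock so the allocation is allowed to sleep. */
	p->locked = false;
	ok = try_alloc_vmemmap(p, p->vmemmap_per_hpage);
	p->locked = true;

	if (!ok)
		return -ENOMEM;  /* refuse to free; the caller may retry later */

	p->nr_huge_pages--;
	return 0;
}
```

    In this model a failed call leaves the pool unchanged, which mirrors
    cases 1) and 2) above: the caller retries once memory pressure has
    eased.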
    
    [mike.kravetz@oracle.com: fix dissolve_free_huge_page use of tail/head page]
      Link: https://lkml.kernel.org/r/20210527231225.226987-1-mike.kravetz@oracle.com
    [willy@infradead.org: fix alloc_vmemmap_page_list documentation warning]
      Link: https://lkml.kernel.org/r/20210615200242.1716568-6-willy@infradead.org
    
    Link: https://lkml.kernel.org/r/20210510030027.56044-7-songmuchun@bytedance.com
    Signed-off-by: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Cc: Barry Song <song.bao.hua@hisilicon.com>
    Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Chen Huang <chenhuang5@huawei.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Joao Martins <joao.m.martins@oracle.com>
    Cc: Joerg Roedel <jroedel@suse.de>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Oliver Neukum <oneukum@suse.com>
    Cc: Paul E. McKenney <paulmck@kernel.org>
    Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>