    mm/hugetlb: expand restore_reserve_on_error functionality · 846be085
    Mike Kravetz authored
    The routine restore_reserve_on_error is called to restore reservation
    information when an error occurs after page allocation.  The routine
    alloc_huge_page modifies the mapping reserve map and potentially the
    reserve count during allocation.  If code calling alloc_huge_page
    encounters an error after allocation and needs to free the page, the
    reservation information needs to be adjusted.
    
    Currently, restore_reserve_on_error only takes action on pages for which
    the reserve count was adjusted (HPageRestoreReserve flag).  There is
    nothing wrong with these adjustments.  However, alloc_huge_page ALWAYS
    modifies the reserve map during allocation even if the reserve count is
    not adjusted.  This can cause issues as observed during development of
    this patch [1].
    
    One specific series of operations causing an issue is:
    
     - Create a shared hugetlb mapping
       Reservations for all pages created by default
    
     - Fault in a page in the mapping
       Reservation exists so reservation count is decremented
    
     - Punch a hole in the file/mapping at index previously faulted
       Reservation and any associated pages will be removed
    
     - Allocate a page to fill the hole
       No reservation entry, so reserve count unmodified
       Reservation entry added to map by alloc_huge_page
    
     - Error after allocation and before instantiating the page
       Reservation entry remains in map
    
     - Allocate a page to fill the hole
       Reservation entry exists, so decrement reservation count
    
    This will cause a reservation count underflow as the reservation count
    was decremented twice for the same index.
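
    The sequence above can be replayed in a small user-space model.  This
    is only a sketch under simplifying assumptions: a single-page shared
    mapping, one boolean standing in for the reserve map entry, and an
    unsigned counter standing in for the reserve count.  All names
    (toy_state, toy_alloc, toy_punch_hole, toy_replay_sequence) are
    hypothetical and are not kernel interfaces.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one index of a shared hugetlb mapping (not kernel code). */
struct toy_state {
    bool map_entry;            /* reserve map has an entry for the index */
    unsigned long resv_count;  /* models the reserve count */
};

/*
 * Models the allocation step as described in the commit message: a
 * reservation is consumed only if a map entry already exists, and an
 * entry is always present afterwards.  Returns true if a reservation
 * was consumed.
 */
static bool toy_alloc(struct toy_state *s)
{
    assert(s != NULL);
    bool consumed = s->map_entry;

    if (consumed)
        s->resv_count--;       /* unsigned: wraps if already zero */
    s->map_entry = true;
    return consumed;
}

/* Models hole punch: the entry (and any page) for the index goes away. */
static void toy_punch_hole(struct toy_state *s)
{
    assert(s != NULL);
    s->map_entry = false;
}

/* Replays the failing sequence and returns the final reserve count. */
static unsigned long toy_replay_sequence(void)
{
    /* Create a shared mapping: reservation exists, count is 1. */
    struct toy_state s = { .map_entry = true, .resv_count = 1 };

    toy_alloc(&s);        /* fault: entry exists, count 1 -> 0 */
    toy_punch_hole(&s);   /* entry removed */
    toy_alloc(&s);        /* no entry: count unchanged, entry added */
    /* Error path without cleanup: page freed, stale entry remains. */
    toy_alloc(&s);        /* stale entry: count 0 -> ULONG_MAX */
    return s.resv_count;
}
```

    In this model toy_replay_sequence() returns ULONG_MAX, which is the
    wrapped value of an unsigned counter decremented below zero.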
    
    A user would observe a very large number for HugePages_Rsvd in
    /proc/meminfo.  This would also likely cause subsequent allocations of
    hugetlb pages to fail as it would 'appear' that all pages are reserved.
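
    One way to notice the symptom from user space is to compare
    HugePages_Rsvd against HugePages_Total.  The helper below is a
    hedged sketch: it parses text in the /proc/meminfo format from a
    buffer supplied by the caller, and the "reserved exceeds total"
    test is an assumed heuristic, not a check taken from this patch.
    The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find "key" (including its trailing colon) in meminfo-style text and
 * parse the number that follows.  Returns 0 on success, -1 otherwise.
 */
static int meminfo_field(const char *text, const char *key,
                         unsigned long long *out)
{
    assert(text && key && out);
    const char *p = strstr(text, key);

    if (!p)
        return -1;
    return sscanf(p + strlen(key), "%llu", out) == 1 ? 0 : -1;
}

/*
 * Heuristic (assumption): more reserved pages than total pages suggests
 * the reserve count has wrapped.  Returns 1 if suspicious, 0 if not,
 * -1 if either field is missing.
 */
static int rsvd_looks_underflowed(const char *text)
{
    unsigned long long total, rsvd;

    if (meminfo_field(text, "HugePages_Total:", &total) ||
        meminfo_field(text, "HugePages_Rsvd:", &rsvd))
        return -1;
    return rsvd > total;
}
```

    A caller would read /proc/meminfo into a buffer and pass it to
    rsvd_looks_underflowed().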
    
    This sequence of operations is unlikely to happen; however, it was
    easily reproduced and observed using hacked-up code as described in [1].
    
    Address the issue by having the routine restore_reserve_on_error take
    action on pages where HPageRestoreReserve is not set.  In this case, we
    need to remove any reserve map entry created by alloc_huge_page.  A new
    helper routine vma_del_reservation assists with this operation.
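
    The intent of the change can be illustrated with a user-space model
    of a single index in a shared mapping.  This is a sketch, not the
    kernel implementation: the boolean map entry, the counter, and the
    names fix_state, fix_page, fix_alloc, fix_restore_on_error and
    fix_replay_sequence are all hypothetical.  The restore_reserve field
    plays the role of HPageRestoreReserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reservation state for one index (not kernel code). */
struct fix_state {
    bool map_entry;            /* reserve map has an entry for the index */
    unsigned long resv_count;  /* models the reserve count */
};

/* Toy page: remembers whether allocation consumed a reservation. */
struct fix_page {
    bool restore_reserve;      /* stands in for HPageRestoreReserve */
};

/* Allocation: consume a reservation if an entry exists; always add one. */
static struct fix_page fix_alloc(struct fix_state *s)
{
    assert(s != NULL);
    struct fix_page p = { .restore_reserve = s->map_entry };

    if (s->map_entry)
        s->resv_count--;
    s->map_entry = true;
    return p;
}

/* Error path: undo whatever the allocation did to reservation state. */
static void fix_restore_on_error(struct fix_state *s, const struct fix_page *p)
{
    assert(s != NULL && p != NULL);
    if (p->restore_reserve)
        s->resv_count++;       /* give the consumed reservation back */
    else
        s->map_entry = false;  /* drop the entry the allocation added */
}

/* Replays the problem sequence with cleanup on the error path. */
static unsigned long fix_replay_sequence(void)
{
    struct fix_state s = { .map_entry = true, .resv_count = 1 };
    struct fix_page p;

    fix_alloc(&s);                 /* fault: count 1 -> 0 */
    s.map_entry = false;           /* punch hole: entry removed */
    p = fix_alloc(&s);             /* no entry: count unchanged */
    fix_restore_on_error(&s, &p);  /* error: added entry removed */
    fix_alloc(&s);                 /* no entry: count stays 0 */
    return s.resv_count;
}
```

    With the entry removed on the error path, the second attempt to fill
    the hole finds no reservation to consume, and the modeled count
    stays at 0 instead of wrapping.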
    
    There are three callers of alloc_huge_page which do not currently call
    restore_reserve_on_error before freeing a page on error paths.  Add
    those missing calls.
    
    [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.com/
    
    Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com
    Fixes: 96b96a96 ("mm/hugetlb: fix huge page reservation leak in private mapping error paths")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Mina Almasry <almasrymina@google.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>