    mm/hugetlb: restore the reservation if needed · df7a6d1f
    Breno Leitao authored
    Patch series "mm/hugetlb: Restore the reservation", v2.
    
    This is a fix for a case where a backing huge page could be stolen after
    madvise(MADV_DONTNEED).
    
    A full reproducer is in selftest. See
    https://lore.kernel.org/all/20240105155419.1939484-1-leitao@debian.org/
    
    In order to test this patch, I instrumented the kernel with LOCKDEP and
    KASAN, and ran the following tests, without any regressions:
      * The self test that reproduces the problem
      * All mm hugetlb selftests
    	SUMMARY: PASS=9 SKIP=0 FAIL=0
      * All libhugetlbfs tests
    	PASS:     0     86
    	FAIL:     0      0
    
    
    This patch (of 2):
    
    Currently there is a bug where a huge page can be stolen, and when the
    original owner tries to fault it back in, the fault fails with SIGBUS.
    
    You can reproduce this by:
      1) Creating a single huge page
    	echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    
      2) mmap() the page above with MAP_HUGETLB into (void *ptr1).
    	* This will mark the page as reserved
      3) touch the page, which causes a page fault and allocates the page
    	* This will move the page out of the free list.
    	* It will also unreserve the page, since there are no more free
    	  pages
      4) madvise(MADV_DONTNEED) the page
    	* This will free the page, but not mark it as reserved.
      5) Allocate a secondary page with mmap(MAP_HUGETLB) into (void *ptr2).
    	* It should fail, since there is no more available page.
    	* But, since the page above is not reserved, this mmap() succeeds.
      6) Faulting at ptr1 will cause a SIGBUS
    	* it will try to allocate a huge page, but there is none
    	  available
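
    The steps above can be sketched as a small userspace check. This is
    not the selftest from the link, only a minimal illustration. It assumes
    2 MB huge pages, that step 1 was already done as root so the pool holds
    exactly one page, and that nothing else is using huge pages. The names
    check_hugetlb_reservation(), map_hugepage() and enum resv_result are
    hypothetical. SIGBUS is caught with sigsetjmp() so the check can report
    the failure instead of dying.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: 2 MB huge pages, matching hugepages-2048kB in step 1. */
#define HPAGE_SIZE (2UL * 1024 * 1024)

/* Hypothetical result codes for this sketch. */
enum resv_result { RESV_SKIP, RESV_OK, RESV_STOLEN };

static sigjmp_buf sigbus_env;

static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(sigbus_env, 1);
}

static void *map_hugepage(void)
{
	return mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
}

/*
 * Returns RESV_SKIP if no huge page could be mapped or touched at all,
 * RESV_STOLEN if the re-fault of ptr1 raised SIGBUS, RESV_OK otherwise.
 * RESV_OK is only meaningful when the pool holds exactly one page.
 */
static enum resv_result check_hugetlb_reservation(void)
{
	struct sigaction sa, old_sa;
	enum resv_result res;
	void *volatile ptr2 = MAP_FAILED;	/* modified after sigsetjmp() */
	volatile int refault = 0;
	char *ptr1;

	ptr1 = map_hugepage();				/* step 2 */
	if (ptr1 == MAP_FAILED)
		return RESV_SKIP;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, &old_sa);

	if (sigsetjmp(sigbus_env, 1) == 0) {
		*(volatile char *)ptr1 = 1;		/* step 3 */
		madvise(ptr1, HPAGE_SIZE, MADV_DONTNEED); /* step 4 */
		ptr2 = map_hugepage();			/* step 5 */
		refault = 1;
		*(volatile char *)ptr1 = 1;		/* step 6 */
		res = RESV_OK;
	} else {
		/* SIGBUS: only the second touch counts as a stolen page. */
		res = refault ? RESV_STOLEN : RESV_SKIP;
	}

	sigaction(SIGBUS, &old_sa, NULL);
	if (ptr2 != MAP_FAILED)
		munmap(ptr2, HPAGE_SIZE);
	munmap(ptr1, HPAGE_SIZE);
	return res;
}
```

    Calling check_hugetlb_reservation() from main() with a one-page pool
    is expected to give RESV_STOLEN on an affected kernel and RESV_OK once
    the reservation is restored; with a larger pool the result says nothing
    about the bug.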
    
    A full reproducer is in selftest. See
    https://lore.kernel.org/all/20240105155419.1939484-1-leitao@debian.org/
    
    Fix this by restoring the reserved page if necessary.
    
    These are the conditions for the page restore:
    
     * The system is not using surplus pages. The goal is to reduce the
       surplus usage for this case.
     * The VMA has the HPAGE_RESV_OWNER flag set, and is PRIVATE. This is
       safely checked using __vma_private_lock()
     * The page is anonymous
    
    Once this scenario is found, set the `hugetlb_restore_reserve` bit in
    the folio. Then check whether the resv reservations need to be
    adjusted. This is done later, after the spinlock is released, since the
    vma_xxxx_reservation() functions might touch the file system lock.
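
    As an illustration only, the conditions and the deferred adjustment
    described above can be modelled in plain userspace C. This is not the
    kernel code: the model_* structs and functions are hypothetical
    stand-ins for the hstate, the VMA and the folio, and the lock is only
    marked by comments.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects named in the text. */
struct model_hstate {
	unsigned long surplus_huge_pages;
};

struct model_vma {
	bool resv_owner;	/* models HPAGE_RESV_OWNER being set */
	bool is_private;	/* models a PRIVATE mapping */
};

struct model_folio {
	bool anon;		/* the page is anonymous */
	bool restore_reserve;	/* models the hugetlb_restore_reserve bit */
};

/* All three conditions from the list must hold. */
static bool model_should_restore(const struct model_hstate *h,
				 const struct model_vma *vma,
				 const struct model_folio *folio)
{
	return h->surplus_huge_pages == 0 &&
	       vma->resv_owner && vma->is_private &&
	       folio->anon;
}

/*
 * Models unmapping one page. Returns true when the caller still has to
 * adjust the reservation, which must happen after the lock is dropped.
 */
static bool model_zap_one_page(const struct model_hstate *h,
			       const struct model_vma *vma,
			       struct model_folio *folio)
{
	bool adjust_reservation = false;

	/* The page table spinlock would be held from here ... */
	if (model_should_restore(h, vma, folio)) {
		folio->restore_reserve = true;
		adjust_reservation = true;
	}
	/* ... to here. The reservation is adjusted by the caller. */

	return adjust_reservation;
}
```

    Splitting the decision (under the lock) from the adjustment (after it)
    mirrors the ordering constraint stated above: the reservation helpers
    may take other locks, so they are not called while the spinlock is
    held.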
    
    Link: https://lkml.kernel.org/r/20240205191843.4009640-1-leitao@debian.org
    Link: https://lkml.kernel.org/r/20240205191843.4009640-2-leitao@debian.org
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Suggested-by: Rik van Riel <riel@surriel.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Lorenzo Stoakes <lstoakes@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>