Commit deb4c93a authored by Peter Xu, committed by Andrew Morton

mm/khugepaged: don't recycle vma pgtable if uffd-wp registered

When we're trying to collapse a 2M huge shmem page, don't retract the pmd
pgtable page if the VMA is registered with uffd-wp, because that pgtable
could have pte markers installed.  Recycling that pgtable means we'll lose
the pte markers, which could cause data loss for a uffd-wp enabled
application on shmem.

Instead of disabling khugepaged on these files, simply skip retracting
these special VMAs.  The page cache can then still be merged into a huge
THP, and other mms/VMAs can still map the file range with a huge THP when
appropriate.
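
The skip-per-VMA policy can be illustrated with a small userspace model.
This is a sketch under stated assumptions, not kernel code: `MODEL_VM_UFFD_WP`,
`struct model_vma` and `model_retract_range()` are hypothetical names, and the
loop only mimics the idea of walking every mapping of the collapsed range.
The page cache collapse itself is not modeled; it happens regardless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for VM_UFFD_WP in this model. */
#define MODEL_VM_UFFD_WP 0x1u

/* Hypothetical model of how one VMA maps a 2M-aligned file range. */
struct model_vma {
	unsigned int flags;
	bool has_pte_table;  /* range is mapped through a pte page table */
	int pte_markers;     /* uffd-wp pte markers stored in that table */
	bool huge_mappable;  /* pmd cleared: a later fault may map the THP */
};

/*
 * Visit every VMA that maps the collapsed range and retract its pte
 * table, except for uffd-wp registered VMAs, whose table may hold
 * markers that would be lost together with the table.
 */
static void model_retract_range(struct model_vma *vmas, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct model_vma *v = &vmas[i];

		if (v->flags & MODEL_VM_UFFD_WP)
			continue;  /* keep table and markers; stays small-page mapped */

		v->has_pte_table = false;  /* freeing the table ... */
		v->pte_markers = 0;        /* ... discards anything stored in it */
		v->huge_mappable = true;
	}
}
```

For instance, with one uffd-wp registered VMA holding three markers and one
plain VMA over the same range, only the plain VMA loses its pte table and
becomes eligible for a huge mapping; the registered VMA keeps its markers.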

Note that VM_UFFD_WP must be checked with mmap_sem held for write; that
avoids a race like:

         khugepaged                             user thread
         ==========                             ===========
     check VM_UFFD_WP, not set
                                       UFFDIO_REGISTER with uffd-wp on shmem
                                       wr-protect some pages (install markers)
     take mmap_sem write lock
     erase pmd and free pmd page
      --> pte markers are dropped unnoticed!

Link: https://lkml.kernel.org/r/20220405014921.14994-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
parent bc70fbf2
@@ -1456,6 +1456,10 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
 	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
 		return;
 
+	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
+	if (userfaultfd_wp(vma))
+		return;
+
 	hpage = find_lock_page(vma->vm_file->f_mapping,
 			       linear_page_index(vma, haddr));
 	if (!hpage)
@@ -1591,7 +1595,15 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		 * reverse order. Trylock is a way to avoid deadlock.
 		 */
 		if (mmap_write_trylock(mm)) {
-			if (!khugepaged_test_exit(mm))
+			/*
+			 * When a vma is registered with uffd-wp, we can't
+			 * recycle the pmd pgtable because there can be pte
+			 * markers installed. Skip it only, so the rest mm/vma
+			 * can still have the same file mapped hugely, however
+			 * it'll always mapped in small page size for uffd-wp
+			 * registered ranges.
+			 */
+			if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma))
 				collapse_and_free_pmd(mm, vma, addr, pmd);
 			mmap_write_unlock(mm);
 		} else {
...