Commit 4cd7ba16 authored by Ram Tummala's avatar Ram Tummala Committed by Andrew Morton

mm: fix old/young bit handling in the faulting path

Commit 3bd786f7 ("mm: convert do_set_pte() to set_pte_range()")
replaced do_set_pte() with set_pte_range() and that introduced a
regression in the following faulting path of non-anonymous vmas which
caused the PTE for the faulting address to be marked as old instead of
young.

handle_pte_fault()
  do_pte_missing()
    do_fault()
      do_read_fault() || do_cow_fault() || do_shared_fault()
        finish_fault()
          set_pte_range()

The polarity of prefault calculation is incorrect.  This leads to prefault
being incorrectly set for the faulting address.  The following check will
incorrectly mark the PTE old rather than young.  On some architectures
this will cause a double fault to mark it young when the access is
retried.

    if (prefault && arch_wants_old_prefaulted_pte())
        entry = pte_mkold(entry);

On a subsequent fault on the same address, the faulting path will see a
non NULL vmf->pte and instead of reaching the do_pte_missing() path, PTE
will then be correctly marked young in handle_pte_fault() itself.

Due to this bug, performance degradation in the fault handling path will
be observed due to unnecessary double faulting.

Link: https://lkml.kernel.org/r/20240710014539.746200-1-rtummala@nvidia.com
Fixes: 3bd786f7 ("mm: convert do_set_pte() to set_pte_range()")
Signed-off-by: default avatarRam Tummala <rtummala@nvidia.com>
Reviewed-by: default avatarYin Fengwei <fengwei.yin@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
parent 34e526f6
...@@ -4780,7 +4780,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio, ...@@ -4780,7 +4780,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
{ {
struct vm_area_struct *vma = vmf->vma; struct vm_area_struct *vma = vmf->vma;
bool write = vmf->flags & FAULT_FLAG_WRITE; bool write = vmf->flags & FAULT_FLAG_WRITE;
bool prefault = in_range(vmf->address, addr, nr * PAGE_SIZE); bool prefault = !in_range(vmf->address, addr, nr * PAGE_SIZE);
pte_t entry; pte_t entry;
flush_icache_pages(vma, page, nr); flush_icache_pages(vma, page, nr);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment