Commit 38e08854 authored by Lorenzo Stoakes, committed by Linus Torvalds

mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing

The NUMA balancing logic uses an arch-specific PROT_NONE page table flag
defined by pte_protnone() or pmd_protnone() to mark PTEs or huge page
PMDs respectively as requiring balancing upon a subsequent page fault.
User-defined PROT_NONE memory regions which also have this flag set will
not normally invoke the NUMA balancing code as do_page_fault() will send
a segfault to the process before handle_mm_fault() is even called.

However, if access_remote_vm() is invoked to access a PROT_NONE region of
memory, handle_mm_fault() is called via faultin_page() and
__get_user_pages() without any access checks being performed, meaning
the NUMA balancing logic is incorrectly invoked on a non-NUMA memory
region.

A simple means of triggering this problem is to access PROT_NONE mmap'd
memory using /proc/self/mem, which reliably results in the NUMA handling
functions being invoked when CONFIG_NUMA_BALANCING is set.

This issue was reported in bugzilla (issue 99101), which includes some
simple repro code.
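
The repro boils down to a few lines of userspace code. Below is a minimal
sketch, reconstructed from the description above rather than taken from the
bugzilla entry; the error handling is illustrative only:

  /*
   * Sketch of a repro (a reconstruction, not the bugzilla code): map a
   * page PROT_NONE, then read it via /proc/self/mem so the access goes
   * through access_remote_vm() instead of a normal page fault.
   */
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
          char buf;
          char *p = mmap(NULL, getpagesize(), PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          int fd = open("/proc/self/mem", O_RDONLY);

          if (p == MAP_FAILED || fd < 0) {
                  perror("setup");
                  return 1;
          }

          /*
           * A direct read of *p would segfault in do_page_fault()
           * before handle_mm_fault() runs.  Reading through
           * /proc/self/mem instead reaches faultin_page() with no
           * access checks, so the protnone PTE is mistaken for a
           * NUMA hinting fault on a PROT_NONE (non-NUMA) page.
           */
          if (pread(fd, &buf, 1, (off_t)(uintptr_t)p) < 0)
                  perror("pread");

          close(fd);
          return 0;
  }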

There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page(),
added in commit c0e7cad9 to avoid accidentally provoking strange
behaviour by attempting to apply NUMA balancing to pages that are in
fact PROT_NONE.  The BUG_ON()s are consistently triggered by the repro.

This patch moves the PROT_NONE check into mm/memory.c rather than
invoking BUG_ON(), as faulting in these pages via faultin_page() is a
valid reason for reaching the NUMA check with the PROT_NONE page table
flag set, and is therefore not always a bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 831e45d8
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1138,9 +1138,6 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
 	bool was_writable;
 	int flags = 0;
 
-	/* A PROT_NONE fault should not end up here */
-	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
-
 	fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
 	if (unlikely(!pmd_same(pmd, *fe->pmd)))
 		goto out_unlock;
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3351,9 +3351,6 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
 	bool was_writable = pte_write(pte);
 	int flags = 0;
 
-	/* A PROT_NONE fault should not end up here */
-	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
-
 	/*
 	 * The "pte" at this point cannot be used safely without
 	 * validation through pte_unmap_same(). It's of NUMA type but
@@ -3458,6 +3455,11 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
 	return VM_FAULT_FALLBACK;
 }
 
+static inline bool vma_is_accessible(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE);
+}
+
 /*
  * These routines also need to handle stuff like marking pages dirty
  * and/or accessed for architectures that don't do it in hardware (most
@@ -3524,7 +3526,7 @@ static int handle_pte_fault(struct fault_env *fe)
 	if (!pte_present(entry))
 		return do_swap_page(fe, entry);
 
-	if (pte_protnone(entry))
+	if (pte_protnone(entry) && vma_is_accessible(fe->vma))
 		return do_numa_page(fe, entry);
 
 	fe->ptl = pte_lockptr(fe->vma->vm_mm, fe->pmd);
@@ -3590,7 +3592,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 		barrier();
 		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
-			if (pmd_protnone(orig_pmd))
+			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&fe, orig_pmd);
 
 			if ((fe.flags & FAULT_FLAG_WRITE) &&