    mm/mprotect: push mmu notifier to PUDs · 7f06e3aa
    Peter Xu authored
    mprotect() issues mmu notifiers at the PMD level.  This has been the case
    since 2014, when it was introduced by commit a5338093 ("mm: move mmu
    notifier call from change_protection to change_pmd_range").
    
    At that time, the issue was that NUMA balancing could be applied to a huge
    range of VM memory even if nothing was populated.  The notification can
    be avoided in this case if no valid pmd is detected, where a valid pmd is
    either a THP or a PTE pgtable page.
    
    Now, to pave the way for PUD handling, this isn't enough.  We need to
    generate mmu notifications properly on PUD entries as well.  mprotect() is
    currently broken on PUDs (e.g., one can already easily trigger a kernel
    error with dax 1G mappings); this is the start of fixing it.
    
    To fix that, this patch proposes pushing such notifications up to the PUD
    layer.
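
    The idea can be illustrated with a minimal user-space sketch.  It is not
    the kernel code: walk_pud_range(), struct notifier_stats and the
    populated[] array are hypothetical stand-ins, and the sketch assumes the
    notifier is started lazily on the first non-none PUD entry and ended once
    after the walk, so a fully empty range generates no notification at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters standing in for mmu notifier calls (not kernel APIs). */
struct notifier_stats {
    int starts; /* simulated invalidate_range_start calls */
    int ends;   /* simulated invalidate_range_end calls */
};

/*
 * Walk n simulated PUD slots; populated[i] == false models a none PUD.
 * The notifier is started at most once, on the first populated slot,
 * and ended once after the walk if it was started.
 * Returns the number of populated slots visited.
 */
static size_t walk_pud_range(const bool *populated, size_t n,
                             struct notifier_stats *stats)
{
    bool started = false;
    size_t visited = 0;

    assert(populated != NULL && stats != NULL);

    for (size_t i = 0; i < n; i++) {
        if (!populated[i])
            continue;        /* none PUD: nothing to notify about */
        if (!started) {
            stats->starts++; /* one start covers the rest of the walk */
            started = true;
        }
        visited++;           /* lower levels (PMD/PTE) would be handled here */
    }

    if (started)
        stats->ends++;

    return visited;
}
```

    With every slot empty, the function returns 0 and both counters stay at
    zero; with any populated slot, exactly one start/end pair is recorded no
    matter how many slots are populated.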
    
    There is a risk of regressing the problem Rik wanted to resolve before,
    but I think it shouldn't really happen, and I still chose this solution
    for a few reasons:
    
      1) Consider a large VM, which should contain many GBs of memory: it's
      highly likely that its unpopulated PUDs are also none.  In this case
      there will be no regression.
    
      2) KVM has evolved a lot over the years to get rid of rmap walks, which
      might have been the major cause of the previous soft-lockup.  At least
      the TDP MMU has already got rid of rmap as long as it is not nested
      (which should be the major use case, IIUC).  The TDP MMU pgtable walker
      will then simply see an empty VM pgtable (e.g. EPT on x86), so
      invalidating a fully empty region should in most cases be pretty fast
      now, compared to 2014.
    
      3) KVM now has explicit code paths that give way to mmu notifiers
      just like this one, e.g. in commit d02c357e ("KVM: x86/mmu: Retry
      fault before acquiring mmu_lock if mapping is changing").  That also
      avoids contention that may contribute to a soft-lockup.
    
      4) Sticking with the PMD layer simply doesn't work when PUD mappings
      are there...  We need one way or another to fix PUD mappings on
      mprotect().
    
    Pushing it to the PUD level should be the safest approach as of now, e.g.
    there's no sign yet of huge P4D mappings coming on any known arch.
    
    Link: https://lkml.kernel.org/r/20240812181225.1360970-3-peterx@redhat.com
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Dave Jiang <dave.jiang@intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Kirill A. Shutemov <kirill@shutemov.name>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>