    Revert "rmap: do not call mmu_notifier_invalidate_page() under ptl" · 785373b4
    Linus Torvalds authored
    This reverts commit aac2fea9.
    
    It turns out that that patch was complete and utter garbage, and broke
    KVM, resulting in odd oopses.
    
    Quoting Andrea Arcangeli:
     "The aforementioned commit has 3 bugs.
    
      1) mmu_notifier_invalidate_range cannot be used in replacement of
         mmu_notifier_invalidate_range_start/end.
    
         For KVM mmu_notifier_invalidate_range is a noop and rightfully so.
    
         A MMU notifier implementation has to implement either
         ->invalidate_range method or the invalidate_range_start/end
         methods, not both. And if you implement invalidate_range_start/end
         like KVM is forced to do, calling mmu_notifier_invalidate_range in
         common code is a noop for KVM.
    
         For those MMU notifiers that can get away only implementing
         ->invalidate_range, the ->invalidate_range is implicitly called by
         mmu_notifier_invalidate_range_end(). And only those secondary MMUs
         that share the same pagetable with the primary MMU (like AMD
         iommuv2) can get away only implementing ->invalidate_range.
    
         So all cases (THP on/off) are broken right now.
    
         To fix this is enough to replace mmu_notifier_invalidate_range with
         mmu_notifier_invalidate_range_start;mmu_notifier_invalidate_range_end.
         Either that or call multiple mmu_notifier_invalidate_page like
         before.
    
      2) address + (1UL << compound_order(page)) is buggy, it should be
         PAGE_SIZE << compound_order(page), it's bytes not pages, 2M not
         512.
    
      3) The whole invalidate_range thing was an attempt to call a single
         invalidate while walking multiple 4k ptes that maps the same THP
         (after a pmd virtual split without physical compound page THP
         split).
    
         It's unclear if the rmap_walk will always provide an address that
         is 2M aligned as parameter to try_to_unmap_one, in presence of THP.
         I think it needs also an address &= (PAGE_SIZE <<
         compound_order(page)) - 1 to be safe"
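
    The arithmetic behind points 2 and 3 can be sketched in plain userspace C.
    This is only an illustration under stated assumptions, not kernel code:
    DEMO_PAGE_SIZE and the demo_* helpers are hypothetical names, a 4 KiB base
    page is assumed, and the "order" argument stands in for the result of
    compound_order(page). Note that masking with (size - 1), as written in the
    quote, yields the offset inside the compound page; rounding the address
    down to the start of that page needs the complement of the mask. That last
    point is a reading of the quote, not something the commit states.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages. The kernel's PAGE_SIZE is per-architecture. */
#define DEMO_PAGE_SIZE 4096UL

/* Hypothetical stand-in for PAGE_SIZE << compound_order(page): span in bytes. */
static unsigned long demo_span_bytes(unsigned int order)
{
    assert(order < 20); /* keep the shift well inside unsigned long */
    return DEMO_PAGE_SIZE << order;
}

/* The expression criticised in point 2: it counts pages, not bytes. */
static unsigned long demo_span_pages(unsigned int order)
{
    assert(order < 20);
    return 1UL << order;
}

/* Offset of address inside its compound page: address & (size - 1). */
static unsigned long demo_offset_in_span(unsigned long address,
                                         unsigned int order)
{
    return address & (demo_span_bytes(order) - 1);
}

/* Start of the compound page containing address: address & ~(size - 1). */
static unsigned long demo_align_down(unsigned long address,
                                     unsigned int order)
{
    return address & ~(demo_span_bytes(order) - 1);
}
```

    For an order-9 compound page, demo_span_bytes(9) is 2097152 (2M) while
    demo_span_pages(9) is 512, which is the "2M not 512" mismatch from point 2.
    An address such as 0x40201000 rounds down to 0x40200000 with
    demo_align_down(), and its offset within the 2M span is 0x1000.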
    
    In general, we should stop making excuses for horrible MMU notifier
    users.  It's much more important that the core VM is sane and safe, than
    letting MMU notifiers sleep.
    
    So if some MMU notifier is sleeping under a spinlock, we need to fix the
    notifier, not try to make excuses for that garbage in the core VM.
    
    Reported-and-tested-by: Bernhard Held <berny156@gmx.de>
    Reported-and-tested-by: Adam Borowski <kilobyte@angband.pl>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Radim Krčmář <rkrcmar@redhat.com>
    Cc: Wanpeng Li <kernellwp@gmail.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Takashi Iwai <tiwai@suse.de>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Jérôme Glisse <jglisse@redhat.com>
    Cc: axie <axie@amd.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rmap.c 48.9 KB