    mm/munlock: mlock_pte_range() when mlocking or munlocking · 34b67923
    Hugh Dickins authored
    Fill in missing pieces: reimplementation of munlock_vma_pages_range(),
    required to lower the mlock_counts when munlocking without munmapping;
    and its complement, implementation of mlock_vma_pages_range(), required
    to raise the mlock_counts on pages already there when a range is mlocked.
    
    Combine them into just the one function mlock_vma_pages_range(), using
    walk_page_range() to run mlock_pte_range().  This approach fixes the
    "Very slow unlockall()" of unpopulated PROT_NONE areas, reported in
    https://lore.kernel.org/linux-mm/70885d37-62b7-748b-29df-9e94f3291736@gmail.com/
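
The benefit of walking page tables, rather than visiting every page address in the range, can be sketched with a toy userspace model. This is not kernel code: `toy_range`, `toy_walk()` and `toy_pte_range()` are hypothetical names, and the two-level layout is a simplifying assumption. The point it illustrates is that a walker can step over a stretch with no populated page table in a single check, which is presumably why unpopulated PROT_NONE areas stop being expensive.

```c
#include <assert.h>
#include <stddef.h>

#define PTES_PER_TABLE 512   /* slots per toy pte table (assumed) */
#define NR_TABLES      1024  /* upper-level slots covering the toy range */

struct toy_page { int mlock_count; };

/* tables[t] == NULL models a stretch where no pte table was ever populated. */
struct toy_range {
    struct toy_page **tables[NR_TABLES];
};

/* Per-table callback, loosely in the spirit of mlock_pte_range(). */
static unsigned long toy_pte_range(struct toy_page **ptes, int delta)
{
    unsigned long touched = 0;

    for (size_t i = 0; i < PTES_PER_TABLE; i++) {
        if (ptes[i]) {
            ptes[i]->mlock_count += delta;
            touched++;
        }
    }
    return touched;
}

/*
 * Walk the whole range, stepping over each unpopulated table in one check.
 * Returns the number of slots examined; *touched gets the pages adjusted.
 */
static unsigned long toy_walk(struct toy_range *r, int delta,
                              unsigned long *touched)
{
    unsigned long examined = 0;

    assert(delta == 1 || delta == -1);  /* +1 to mlock, -1 to munlock */
    *touched = 0;
    for (size_t t = 0; t < NR_TABLES; t++) {
        examined++;                     /* one look at the upper-level slot */
        if (!r->tables[t])
            continue;                   /* nothing mapped here: skip it all */
        examined += PTES_PER_TABLE;
        *touched += toy_pte_range(r->tables[t], delta);
    }
    return examined;
}
```

With a single populated table, `toy_walk()` examines NR_TABLES + PTES_PER_TABLE slots, whereas a page-by-page lookup over the same toy range would take NR_TABLES * PTES_PER_TABLE steps. The same function serves both directions, mirroring how the commit combines mlock and munlock into one walk.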
    
    Munlock clears VM_LOCKED at the start, under exclusive mmap_lock; but if
    a racing truncate or holepunch (depending on i_mmap_rwsem) gets to the
    pte first, it will not try to munlock the page: leaving release_pages()
    to correct it when the last reference to the page is gone - that's okay,
    a page is not evictable anyway while it is held by an extra reference.
    
    Mlock sets VM_LOCKED at the start, under exclusive mmap_lock; but if
    a racing remove_migration_pte() or try_to_unmap_one() (depending on
    i_mmap_rwsem) gets to the pte first, it will try to mlock the page,
    then mlock_pte_range() will mlock it a second time.  This is harder to
    reproduce, but a more serious race because it could leave the page
    unevictable indefinitely though the area is munlocked afterwards.
    Guard against it by setting the (inappropriate) VM_IO flag,
    and modifying mlock_vma_page() to decline such vmas.
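
The double-mlock race and the VM_IO guard described above can be illustrated with a single-threaded toy model that replays one fixed interleaving. Everything here is an assumption made for illustration: the `toy_` names, the flag values, the one-page "range", and the simplified count handling are hypothetical and do not reproduce the kernel's actual mlock_count logic.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_VM_LOCKED 0x1u   /* stand-in for VM_LOCKED (value is arbitrary) */
#define TOY_VM_IO     0x2u   /* stand-in for VM_IO (value is arbitrary) */

struct toy_page { int mlock_count; };
struct toy_vma  { unsigned int flags; };

/* Path taken by a racing caller: decline any vma that also has TOY_VM_IO. */
static bool toy_mlock_vma_page(const struct toy_vma *vma, struct toy_page *page)
{
    if ((vma->flags & (TOY_VM_LOCKED | TOY_VM_IO)) != TOY_VM_LOCKED)
        return false;
    page->mlock_count++;
    return true;
}

/*
 * Mlock a one-page "range". When racer_runs is true, the racing caller is
 * replayed after the flags are set but before the walk reaches the page.
 */
static void toy_mlock_range(struct toy_vma *vma, struct toy_page *page,
                            bool guard, bool racer_runs)
{
    vma->flags |= TOY_VM_LOCKED;
    if (guard)
        vma->flags |= TOY_VM_IO;        /* mark the vma as "walk in progress" */

    if (racer_runs)
        toy_mlock_vma_page(vma, page);  /* racer gets to the pte first */

    page->mlock_count++;                /* the walk's own mlock of the page */
    vma->flags &= ~TOY_VM_IO;           /* drop the temporary flag afterwards */
}

/* Munlock lowers the count once, as a single munlock of the range would. */
static void toy_munlock_range(struct toy_vma *vma, struct toy_page *page)
{
    vma->flags &= ~TOY_VM_LOCKED;
    if (page->mlock_count > 0)
        page->mlock_count--;
}
```

Replaying the race with `guard` false leaves `mlock_count` at 1 after `toy_munlock_range()`, which models a page stuck unevictable; with `guard` true the racer is declined and the count returns to 0.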
    
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
internal.h 22.8 KB