• Hugh Dickins's avatar
    mm,thp,rmap: simplify compound page mapcount handling · cb67f428
    Hugh Dickins authored
    Compound page (folio) mapcount calculations have been different for anon
    and file (or shmem) THPs, and involved the obscure PageDoubleMap flag. 
    And each huge mapping and unmapping of a file (or shmem) THP involved
    atomically incrementing and decrementing the mapcount of every subpage of
    that huge page, dirtying many struct page cachelines.
    
    Add subpages_mapcount field to the struct folio and first tail page, so
    that the total of subpage mapcounts is available in one place near the
    head: then page_mapcount() and total_mapcount() and page_mapped(), and
    their folio equivalents, are so quick that anon and file and hugetlb don't
    need to be optimized differently.  Delete the unloved PageDoubleMap.
    
    page_add and page_remove rmap functions must now maintain the
    subpages_mapcount as well as the subpage _mapcount, when dealing with pte
    mappings of huge pages; and correct maintenance of NR_ANON_MAPPED and
    NR_FILE_MAPPED statistics still needs reading through the subpages, using
    nr_subpages_unmapped() - but only when first or last pmd mapping finds
    subpages_mapcount raised (double-map case, not the common case).
    
    But are those counts (used to decide when to split an anon THP, and in
    vmscan's pagecache_reclaimable heuristic) correctly maintained?  Not
    quite: since page_remove_rmap() (and also split_huge_pmd()) is often
    called without page lock, there can be races when a subpage pte mapcount
    0<->1 while compound pmd mapcount 0<->1 is scanning - races which the
    previous implementation had prevented.  The statistics might become
    inaccurate, and even drift down until they underflow through 0.  That is
    not good enough, but is better dealt with in a followup patch.
    
    Update a few comments on first and second tail page overlaid fields. 
    hugepage_add_new_anon_rmap() has to "increment" compound_mapcount, but
    subpages_mapcount and compound_pincount are already correctly at 0, so
    delete its reinitialization of compound_pincount.
    
    A simple 100 X munmap(mmap(2GB, MAP_SHARED|MAP_POPULATE, tmpfs), 2GB) took
    18 seconds on small pages, and used to take 1 second on huge pages, but
    now takes 119 milliseconds on huge pages.  Mapping by pmds a second time
    used to take 860ms and now takes 92ms; mapping by pmds after mapping by
    ptes (when the scan is needed) used to take 870ms and now takes 495ms. 
    But there might be some benchmarks which would show a slowdown, because
    tail struct pages now fall out of cache until final freeing checks them.
    
    Link: https://lkml.kernel.org/r/47ad693-717-79c8-e1ba-46c3a6602e48@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
    Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Zach O'Keefe <zokeefe@google.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    cb67f428
khugepaged.c 70 KB