    mm/memory: factor out zapping of present pte into zap_present_pte() · 789753e1
    David Hildenbrand authored
    Patch series "mm/memory: optimize unmap/zap with PTE-mapped THP", v3.
    
    This series is based on [1].  Similar to what we did with fork(), let's
    implement PTE batching during unmap/zap when processing PTE-mapped THPs.
    
    We collect consecutive PTEs that map consecutive pages of the same large
    folio, making sure that the other PTE bits are compatible, and (a) adjust
    the refcount only once per batch, (b) call rmap handling functions only
    once per batch, (c) perform batch PTE setting/updates and (d) perform TLB
    entry removal once per batch.
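
    As a rough illustration of the batching idea described above, the
    following userspace sketch models it with toy types.  Everything here
    (struct toy_pte, struct toy_folio, toy_pte_batch(), toy_zap_range()) is
    hypothetical and is not the kernel's implementation; it assumes every
    present entry in the range maps a page of the one folio passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy PTE: a hypothetical model, not the kernel's pte_t. */
struct toy_pte {
	bool present;
	unsigned long pfn;	/* page frame number this entry maps */
	unsigned int flags;	/* other bits that must match within a batch */
};

/* Toy folio: a physically contiguous run of page frames. */
struct toy_folio {
	unsigned long first_pfn;
	unsigned long nr_pages;
};

/* Counters standing in for per-batch refcount/rmap/TLB bookkeeping. */
struct toy_stats {
	size_t batches;
	size_t pages;
};

/*
 * Return how many entries, starting at ptes[0], map consecutive pages of
 * @folio with identical flags.  ptes[0] must be present and map a page of
 * @folio, so the result is always at least 1.
 */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t max_nr,
			    const struct toy_folio *folio)
{
	unsigned long folio_end = folio->first_pfn + folio->nr_pages;
	size_t nr = 1;

	assert(max_nr >= 1 && ptes[0].present);
	assert(ptes[0].pfn >= folio->first_pfn && ptes[0].pfn < folio_end);

	while (nr < max_nr) {
		const struct toy_pte *p = &ptes[nr];

		/* Stop at a hole or at an entry with incompatible bits. */
		if (!p->present || p->flags != ptes[0].flags)
			break;
		/* Stop if the page is not the next one inside the folio. */
		if (p->pfn != ptes[0].pfn + nr || p->pfn >= folio_end)
			break;
		nr++;
	}
	return nr;
}

/*
 * Clear all present entries in the range, doing the (modelled)
 * bookkeeping once per batch instead of once per entry.
 */
static struct toy_stats toy_zap_range(struct toy_pte *ptes, size_t nr_ptes,
				      const struct toy_folio *folio)
{
	struct toy_stats stats = { 0, 0 };
	size_t i = 0;

	while (i < nr_ptes) {
		size_t nr, j;

		if (!ptes[i].present) {
			i++;
			continue;
		}

		nr = toy_pte_batch(&ptes[i], nr_ptes - i, folio);
		for (j = 0; j < nr; j++)
			ptes[i + j].present = false;

		stats.batches++;	/* one bookkeeping step per batch */
		stats.pages += nr;
		i += nr;
	}
	return stats;
}
```

    In this toy model, an order-0 folio always yields batches of one entry,
    so the per-entry path is unchanged; larger folios are what reduce the
    number of bookkeeping steps.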
    
    Ryan was previously working on this in the context of cont-pte for arm64,
    in the latest iteration [2] with a focus on arm64 with cont-pte only.  This
    series implements the optimization for all architectures, independent of
    such PTE bits, teaches MMU gather/TLB code to be fully aware of such
    large-folio-pages batches as well, and makes use of our new rmap batching
    function when removing the rmap.
    
    To achieve that, we have to enlighten MMU gather / page freeing code
    (i.e., everything that consumes encoded_page) to process unmapping of
    consecutive pages that all belong to the same large folio.  I'm being very
    careful to not degrade order-0 performance, and it looks like I managed to
    achieve that.
    
    While this series should -- similar to [1] -- be beneficial for adding
    cont-pte support on arm64[2], it's one of the requirements for maintaining
    a total mapcount[3] for large folios with minimal added overhead and
    further changes[4] that build up on top of the total mapcount.
    
    Independent of all that, this series results in a speedup during munmap()
    and similar unmapping (process teardown, MADV_DONTNEED on larger ranges)
    with PTE-mapped THP, which is the default with THPs that are smaller than
    a PMD (for example, 16KiB to 1024KiB mTHPs for anonymous memory[5]).
    
    On an Intel Xeon Silver 4210R CPU, munmap'ing a 1GiB VMA backed by
    PTE-mapped folios of the same size (stddev < 1%) results in the following
    runtimes for munmap() in seconds (shorter is better):
    
    Folio Size | mm-unstable |      New | Change
    ---------------------------------------------
          4KiB |    0.058110 | 0.057715 |   - 1%
         16KiB |    0.044198 | 0.035469 |   -20%
         32KiB |    0.034216 | 0.023522 |   -31%
         64KiB |    0.029207 | 0.018434 |   -37%
        128KiB |    0.026579 | 0.014026 |   -47%
        256KiB |    0.025130 | 0.011756 |   -53%
        512KiB |    0.024292 | 0.010703 |   -56%
       1024KiB |    0.023812 | 0.010294 |   -57%
       2048KiB |    0.023785 | 0.009910 |   -58%
    
    [1] https://lkml.kernel.org/r/20240129124649.189745-1-david@redhat.com
    [2] https://lkml.kernel.org/r/20231218105100.172635-1-ryan.roberts@arm.com
    [3] https://lkml.kernel.org/r/20230809083256.699513-1-david@redhat.com
    [4] https://lkml.kernel.org/r/20231124132626.235350-1-david@redhat.com
    [5] https://lkml.kernel.org/r/20231207161211.2374093-1-ryan.roberts@arm.com
    
    
    This patch (of 10):
    
    Let's prepare for further changes by factoring out processing of present
    PTEs.
    
    Link: https://lkml.kernel.org/r/20240214204435.167852-1-david@redhat.com
    Link: https://lkml.kernel.org/r/20240214204435.167852-2-david@redhat.com
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Alexander Gordeev <agordeev@linux.ibm.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Heiko Carstens <hca@linux.ibm.com>
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Yin Fengwei <fengwei.yin@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>