    mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing · e61abd44
    David Hildenbrand authored
    In tlb_batch_pages_flush(), we can end up freeing up to 512 pages, or now
    up to 256 folio fragments that each span more than one page, before we
    conditionally reschedule.
    
    It's a pain that we have to handle cond_resched() in
    tlb_batch_pages_flush() manually and cannot simply handle it in
    release_pages() -- release_pages() can be called from atomic context. 
    Well, in a perfect world we wouldn't have to make our code more
    complicated at all.
    
    With page poisoning and init_on_free, we might now run into soft lockups
    when we free a lot of rather large folio fragments, because page freeing
    time then depends on the actual memory size we are freeing instead of on
    the number of folios that are involved.
    
    In the absolute (unlikely) worst case, on arm64 with 64k pages, we will be
    able to free up to 256 folio fragments that each span 512 MiB: zeroing out
    128 GiB does sound like it might take a while.  But instead of ignoring
    this unlikely case, let's just handle it.
    
    So, let's teach tlb_batch_pages_flush() that there are some configurations
    where page freeing is horribly slow, and let's reschedule more frequently
    -- similar to what we did before we had large folio fragments in there.
    Avoid yet another loop over all encoded pages in the common case by
    handling that separately.
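
The chunking idea described above can be sketched as a small userspace
model. This is not the kernel code: next_chunk(), count_resched_points(),
MAX_PER_CHUNK and the slow_free flag are hypothetical names, and the model
assumes one array entry per fragment, whereas the real batch encodes a
multi-page fragment in two entries. It only illustrates budgeting by entry
count in the common case and by page count when freeing is expensive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical budget for this model: entries (fast path) or pages (slow path). */
#define MAX_PER_CHUNK 512u

/*
 * Return how many of the next `remaining` fragments to free before the next
 * rescheduling point. frag_pages[i] is the size of fragment i in pages.
 */
static size_t next_chunk(const unsigned int *frag_pages, size_t remaining,
                         bool slow_free)
{
    size_t nr = 0;
    unsigned long pages = 0;

    assert(frag_pages != NULL || remaining == 0);

    if (!slow_free) {
        /* Common case: cost tracks the entry count, so no extra loop. */
        return remaining < MAX_PER_CHUNK ? remaining : MAX_PER_CHUNK;
    }

    /* Slow freeing: cost tracks memory size, so budget by pages instead. */
    while (nr < remaining && pages < MAX_PER_CHUNK)
        pages += frag_pages[nr++];

    return nr;
}

/* Count the rescheduling points needed to drain a whole batch. */
static size_t count_resched_points(const unsigned int *frag_pages, size_t n,
                                   bool slow_free)
{
    size_t points = 0;

    while (n > 0) {
        size_t nr = next_chunk(frag_pages, n, slow_free);

        /* The real code would free the chunk here, then cond_resched(). */
        frag_pages += nr;
        n -= nr;
        points++;
    }
    return points;
}
```

In this model, a chunk in the slow path always contains at least one
fragment, so a single very large fragment is still freed in one go, and the
budget check happens before each fragment is added rather than after.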
    
    Note that with page poisoning/zeroing, we might now end up freeing only a
    single folio fragment at a time that might exceed the old 512 pages limit:
    but if we cannot even free a single MAX_ORDER page on a system without
    running into soft lockups, something else is already completely bogus. 
    Freeing a PMD-mapped THP would similarly cause trouble.
    
    In theory, we might even free 511 order-0 pages + a single MAX_ORDER page,
    effectively having to zero out 8703 pages on arm64 with 64k pages,
    translating to ~544 MiB of memory: however, if 512 MiB doesn't result in
    soft lockups, 544 MiB is unlikely to result in soft lockups, so we won't
    care about that for the time being.
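
The sizes quoted in this message can be checked with a few lines of
arithmetic. The sketch below assumes the configuration named in the text
(arm64 with 64 KiB base pages and a 512 MiB largest folio); the macro and
function names are made up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define KIB UINT64_C(1024)
#define MIB (KIB * KIB)
#define GIB (KIB * MIB)

/* Assumed configuration from the text: 64 KiB pages, 512 MiB largest folio. */
#define PAGE_BYTES      (64 * KIB)
#define MAX_FOLIO_BYTES (512 * MIB)

static_assert(MAX_FOLIO_BYTES % PAGE_BYTES == 0,
              "largest folio must be a whole number of pages");

/* Pages spanned by one largest folio: 512 MiB / 64 KiB. */
static uint64_t max_folio_pages(void)
{
    return MAX_FOLIO_BYTES / PAGE_BYTES;
}

/* Bytes to zero when a batch holds nr_fragments largest-size fragments. */
static uint64_t worst_case_bytes(uint64_t nr_fragments)
{
    return nr_fragments * MAX_FOLIO_BYTES;
}

/* Pages to zero for nr_small order-0 pages plus one largest folio. */
static uint64_t mixed_batch_pages(uint64_t nr_small)
{
    return nr_small + max_folio_pages();
}
```

Under these assumptions, max_folio_pages() is 8192, worst_case_bytes(256)
is 128 GiB, and mixed_batch_pages(511) is 8703 pages, which at 64 KiB per
page is about 543.9 MiB, matching the ~544 MiB figure.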
    
    In the future, we might want to detect if handling cond_resched() is
    required at all, and just not do any of that with full preemption enabled.
    
    Link: https://lkml.kernel.org/r/20240214204435.167852-10-david@redhat.com
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Alexander Gordeev <agordeev@linux.ibm.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Heiko Carstens <hca@linux.ibm.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Yin Fengwei <fengwei.yin@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>