    arm64/mm: Modify range-based tlbi to decrement scale · e2768b79
    Ryan Roberts authored
    In preparation for adding support for LPA2 to the tlb invalidation
    routines, modify the algorithm used by range-based tlbi to start at the
    highest 'scale' and decrement instead of starting at the lowest 'scale'
    and incrementing. This new approach makes it possible to maintain 64K
    alignment as we work through the range, until the last op (at scale=0).
    This alignment is required when LPA2 is enabled (LPA2 support itself
    will be added in a subsequent commit).
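    To illustrate the decrementing order, below is a minimal user-space C
    sketch that models the loop without issuing any TLBI. The macros are
    modelled on the __TLBI_RANGE_PAGES()/__TLBI_RANGE_NUM() helpers in
    tlbflush.h, where one range op covers (num + 1) * 2^(5 * scale + 1)
    pages. The names sim_decrement(), range_ops_aligned() and struct
    sim_op are hypothetical, and the sketch assumes a stride of one page,
    range-op support, and a start page that is already 64K-aligned
    (16 pages with a 4K granule).

```c
#include <assert.h>

/* Pages covered by one range op: (num + 1) * 2^(5 * scale + 1). */
#define RANGE_PAGES(num, scale) \
	((unsigned long)((num) + 1) << (5 * (scale) + 1))
/* 5-bit NUM field for this scale, or -1 if that field is zero. */
#define RANGE_NUM(pages, scale) \
	((int)(((pages) >> (5 * (scale) + 1)) & 0x1fUL) - 1)
/* Larger page counts are out of scope for this sketch. */
#define MAX_RANGE_PAGES RANGE_PAGES(31, 3)

/* One simulated invalidation; scale < 0 marks a non-range op. */
struct sim_op {
	unsigned long start; /* first page number covered */
	unsigned long pages; /* number of pages covered */
	int scale;
};

/* Hypothetical model of the decrementing loop. Returns the number of
 * ops recorded, or -1 if pages is too large or ops[] is too small. */
static int sim_decrement(unsigned long start, unsigned long pages,
			 struct sim_op *ops, int max_ops)
{
	int n = 0, scale = 3;

	if (pages >= MAX_RANGE_PAGES)
		return -1;
	while (pages > 0) {
		if (n == max_ops)
			return -1;
		if (pages == 1) {
			/* Leftover odd page: non-range op, issued last. */
			ops[n++] = (struct sim_op){ start, 1, -1 };
			start++;
			pages--;
			continue;
		}
		/* Bits above bit 0 are consumed by scale 0 at the latest. */
		assert(scale >= 0);
		int num = RANGE_NUM(pages, scale);
		if (num >= 0) {
			unsigned long chunk = RANGE_PAGES(num, scale);
			ops[n++] = (struct sim_op){ start, chunk, scale };
			start += chunk;
			pages -= chunk;
		}
		scale--;
	}
	return n;
}

/* Returns 1 if every range op starts on a multiple of align_pages. */
static int range_ops_aligned(const struct sim_op *ops, int n,
			     unsigned long align_pages)
{
	for (int i = 0; i < n; i++)
		if (ops[i].scale >= 0 && ops[i].start % align_pages != 0)
			return 0;
	return 1;
}
```

    With start = 16 and pages = 70001, this model emits range ops of
    65536, 4096, 320 and 48 pages at scales 3 down to 0, each starting on
    a 16-page boundary, followed by one non-range op for the odd page.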
    
    This change is separated into its own patch because it will also impact
    non-LPA2 systems, and I want to make it easy to bisect in case it leads
    to a performance regression (see below for benchmarks that suggest this
    should not be a problem).
    
    The original commit (d1d3aa98 "arm64: tlb: Use the TLBI RANGE feature in
    arm64") stated this as the reason for _incrementing_ scale:
    
      However, in most scenarios, the pages = 1 when flush_tlb_range() is
      called. Start from scale = 3 or other proper value (such as scale
      =ilog2(pages)), will incur extra overhead. So increase 'scale' from 0
      to maximum.
    
    But pages=1 is already special cased by the non-range invalidation path,
    which will take care of it the first time through the loop (both in the
    original commit and in my change), so I don't think switching to
    decrement scale should have any extra performance impact after all.
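    The argument can be checked with a small counting model of both loops.
    This is a hypothetical C sketch, not the kernel code: count_increment()
    assumes the original loop peels an odd page count off with a non-range
    op (a "pages % 2 == 1" check) before walking scale up from 0, while
    count_decrement() walks scale down from 3 and only takes the non-range
    path once pages == 1. Both assume a stride of one page and fewer than
    2^21 pages; the first_single out-parameter reports whether the first
    op issued is a non-range op.

```c
#include <assert.h>

/* Pages covered by one range op: (num + 1) * 2^(5 * scale + 1). */
#define RANGE_PAGES(num, scale) \
	((unsigned long)((num) + 1) << (5 * (scale) + 1))
/* 5-bit NUM field for this scale, or -1 if that field is zero. */
#define RANGE_NUM(pages, scale) \
	((int)(((pages) >> (5 * (scale) + 1)) & 0x1fUL) - 1)

/* Hypothetical model of the incrementing (pre-patch) loop. */
static int count_increment(unsigned long pages, int *first_single)
{
	int ops = 0, scale = 0;

	*first_single = 0;
	while (pages > 0) {
		if (pages % 2 == 1) {
			/* Odd count: non-range op before any range op. */
			if (ops == 0)
				*first_single = 1;
			ops++;
			pages--;
			continue;
		}
		assert(scale <= 3); /* holds for pages < 2^21 */
		int num = RANGE_NUM(pages, scale);
		if (num >= 0) {
			ops++;
			pages -= RANGE_PAGES(num, scale);
		}
		scale++;
	}
	return ops;
}

/* Hypothetical model of the decrementing (post-patch) loop. */
static int count_decrement(unsigned long pages, int *first_single)
{
	int ops = 0, scale = 3;

	*first_single = 0;
	while (pages > 0) {
		if (pages == 1) {
			/* Single page: non-range op (first or last). */
			if (ops == 0)
				*first_single = 1;
			ops++;
			pages--;
			continue;
		}
		assert(scale >= 0); /* holds for pages < 2^21 */
		int num = RANGE_NUM(pages, scale);
		if (num >= 0) {
			ops++;
			pages -= RANGE_PAGES(num, scale);
		}
		scale--;
	}
	return ops;
}
```

    In this model, pages = 1 takes the non-range path on the first
    iteration in both versions, and a larger count such as 70001 issues
    the same five ops either way (one per non-zero 5-bit field plus one
    for the odd page); only the order changes.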
    
    Indeed, benchmarking kernel compilation (a TLBI-heavy workload) in a
    virtual machine on Apple M2 suggests that this new approach actually
    _improves_ performance slightly:
    
    The table shows the time to execute a kernel compilation workload with
    8 jobs, relative to a baseline without this patch (a more negative
    number means a bigger speedup). The workload was repeated 9 times
    across 3 system reboots:
    
    | counter   |       mean |     stdev |
    |:----------|-----------:|----------:|
    | real-time |      -0.6% |      0.0% |
    | kern-time |      -1.6% |      0.5% |
    | user-time |      -0.4% |      0.1% |

    Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
    Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20231127111737.1897081-2-ryan.roberts@arm.com