    arm64: rework EL0 MRS emulation · f5962add
    Mark Rutland authored
    On CPUs without FEAT_IDST, ID register emulation is slower than it needs
    to be, as all threads contend for the same lock to perform the
    emulation. This patch reworks the emulation to avoid this unnecessary
    contention.
    
    On CPUs with FEAT_IDST (which is mandatory from ARMv8.4 onwards), EL0
    accesses to ID registers result in a SYS trap, and emulation of these is
    handled with a sys64_hook. These hooks are statically allocated, and no
    locking is required to iterate through the hooks and perform the
    emulation, allowing emulation to occur in parallel with no contention.
    
    On CPUs without FEAT_IDST, EL0 accesses to ID registers result in an
    UNDEFINED exception, and emulation of these accesses is handled with an
    undef_hook. When an EL0 MRS instruction is trapped to EL1, the kernel
    finds the relevant handler by iterating through all of the undef_hooks,
    requiring undef_lock to be held during this lookup.
    
    This locking is only required to safely traverse the list of undef_hooks
    (as it can be concurrently modified), and the actual emulation of the
    MRS does not require any mutual exclusion. This locking is an
    unfortunate bottleneck, especially given that MRS emulation is enabled
    unconditionally and is never disabled.
    
    This patch reworks the non-FEAT_IDST MRS emulation logic so that it can
    be invoked directly from do_el0_undef(). This removes the bottleneck,
    allowing MRS traps to be handled entirely in parallel, and is a stepping
    stone to making all of the undef_hooks lock-free.
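
    The difference between the two lookup paths can be sketched in
    userspace as below. This is an illustration only, not the kernel code:
    struct hook, hooks_lock, register_hook(), handle_undef_locked() and
    handle_undef_direct() are hypothetical names, a pthread mutex stands in
    for undef_lock, and emulate_mrs() is a placeholder. The mask and value
    follow the A64 MRS encoding.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A64 "MRS <Xt>, <sysreg>" instructions match this mask/value pair. */
#define MRS_MASK  0xfff00000u
#define MRS_VALUE 0xd5300000u

/* Hypothetical, simplified stand-in for an undef_hook list entry. */
struct hook {
	uint32_t mask;              /* instruction bits to compare */
	uint32_t value;             /* expected value after masking */
	bool (*fn)(uint32_t insn);  /* emulation handler */
	struct hook *next;
};

static struct hook *hooks;
static pthread_mutex_t hooks_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder: a real handler would emulate the register read. */
static bool emulate_mrs(uint32_t insn)
{
	(void)insn;
	return true;
}

/* The list can change concurrently, so modification takes the lock. */
static void register_hook(struct hook *h)
{
	pthread_mutex_lock(&hooks_lock);
	h->next = hooks;
	hooks = h;
	pthread_mutex_unlock(&hooks_lock);
}

/*
 * Before: every trap takes the shared lock just to find its handler.
 * The handler itself runs outside the lock, but all threads still
 * serialise on the lookup.
 */
static bool handle_undef_locked(uint32_t insn)
{
	bool (*fn)(uint32_t) = NULL;

	pthread_mutex_lock(&hooks_lock);
	for (struct hook *h = hooks; h; h = h->next) {
		if ((insn & h->mask) == h->value) {
			fn = h->fn;
			break;
		}
	}
	pthread_mutex_unlock(&hooks_lock);

	return fn ? fn(insn) : false;
}

/*
 * After: MRS is recognised and emulated directly, with no shared
 * state touched. Anything else falls back to the locked list walk.
 */
static bool handle_undef_direct(uint32_t insn)
{
	if ((insn & MRS_MASK) == MRS_VALUE)
		return emulate_mrs(insn);

	return handle_undef_locked(insn);
}
```

    In this sketch, handle_undef_direct(0xd5380600) (the encoding of
    "mrs x0, ID_AA64ISAR0_EL1") is handled without touching hooks_lock,
    even if no hook has been registered.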
    
    I've tested this in a 64-vCPU VM on a 64-CPU ThunderX2 host, with a
    benchmark which spawns a number of threads which each try to read
    ID_AA64ISAR0_EL1 1000000 times. This is vastly more contention than will
    ever be seen in realistic usage, but clearly demonstrates the removal of
    the bottleneck:
    
      | Threads || Time (seconds)                       |
      |         || Before           || After            |
      |         || Real   | System  || Real   | System  |
      |---------++--------+---------++--------+---------|
      |       1 ||   0.29 |    0.20 ||   0.24 |    0.12 |
      |       2 ||   0.35 |    0.51 ||   0.23 |    0.27 |
      |       4 ||   1.08 |    3.87 ||   0.24 |    0.56 |
      |       8 ||   4.31 |   33.60 ||   0.24 |    1.11 |
      |      16 ||   9.47 |  149.39 ||   0.23 |    2.15 |
      |      32 ||  19.07 |  605.27 ||   0.24 |    4.38 |
      |      64 ||  65.40 | 3609.09 ||   0.33 |   11.27 |
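
    The benchmark itself is not included in this message. A minimal
    sketch of a comparable one, assuming pthreads and GCC/Clang inline
    assembly, might look like the following. read_id_aa64isar0(),
    worker(), run_benchmark() and MAX_THREADS are hypothetical names, and
    the non-arm64 branch is only a stub so that the sketch builds
    elsewhere.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 64

/* On arm64 Linux this EL0 read traps to the kernel and is emulated. */
static inline uint64_t read_id_aa64isar0(void)
{
#if defined(__aarch64__)
	uint64_t val;

	__asm__ volatile("mrs %0, ID_AA64ISAR0_EL1" : "=r"(val));
	return val;
#else
	return 0;	/* stub: no trap happens on other architectures */
#endif
}

struct worker_arg {
	unsigned long iters;	/* reads this thread should perform */
	unsigned long done;	/* reads this thread has performed */
	uint64_t last;		/* last value read, kept to use the result */
};

static void *worker(void *p)
{
	struct worker_arg *arg = p;

	for (unsigned long i = 0; i < arg->iters; i++) {
		arg->last = read_id_aa64isar0();
		arg->done++;
	}
	return NULL;
}

/* Run nthreads workers in parallel; return the total number of reads. */
static unsigned long run_benchmark(int nthreads, unsigned long iters)
{
	pthread_t tids[MAX_THREADS];
	struct worker_arg args[MAX_THREADS];
	unsigned long total = 0;
	int started = 0;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		return 0;

	for (int i = 0; i < nthreads; i++) {
		args[i].iters = iters;
		args[i].done = 0;
		args[i].last = 0;
		if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
			break;
		started++;
	}

	for (int i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		total += args[i].done;
	}
	return total;
}
```

    Calling run_benchmark(64, 1000000) from main() and running the
    binary under time(1) would give real and system times in the same
    form as the table above, though not necessarily the same numbers.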
    
    Aside from the speedup, there should be no functional change as a result
    of this patch.
    
    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: James Morse <james.morse@arm.com>
    Cc: Joey Gouly <joey.gouly@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Will Deacon <will@kernel.org>
    Link: https://lore.kernel.org/r/20221019144123.612388-6-mark.rutland@arm.com
    Signed-off-by: Will Deacon <will@kernel.org>