    powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask · 0cef77c7
    Nicholas Piggin authored
    When a single-threaded process has a non-local mm_cpumask, use that
    opportunity to flush its TLB entries out of the other CPUs in the cpumask.
    
    An IPI is used for clearing remote CPUs for a few reasons:
    - An IPI can end lazy TLB use of the mm, which is required to prevent
      TLB entries being created on the remote CPU. The alternative is to
      drop lazy TLB switching completely, which costs 7.5% in a context
      switch ping-pong test between a process and the kernel idle thread.
    - An IPI can have remote CPUs flush the entire PID, but the local CPU
      can flush a specific VA. tlbie would require over-flushing of the
      local CPU (where the process is running).
    - A single-threaded process that is migrated to a different CPU is
      likely to have a relatively small mm_cpumask, so an IPI is reasonable.
    
    No other thread can concurrently switch to this mm, because it must
    have been given a reference to mm_users by the current thread before it
    can use_mm. mm_users can be asynchronously incremented (by
    mm_activate or mmget_not_zero), but those users must use remote mm
    access and can't use_mm or access user address space. Existing code
    already makes this assumption: for example, sparc64 has reset
    mm_cpumask using this condition since the start of history (see
    arch/sparc/kernel/smp_64.c).
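
    The reasoning above can be sketched as a small user-space model. This is
    not the code in tlb-radix.c: struct mm_model, struct cpu_model, cpus[],
    NR_CPUS_MODEL, exit_lazy_and_flush() and trim_cpumask() are hypothetical
    names, the "mm_users <= 1" test is an assumed way to express
    "single-threaded", and the loop stands in for sending an IPI to each
    remote CPU in the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 8 /* hypothetical small machine */

struct mm_model {
    int      mm_users; /* users holding the address space (assumed field) */
    uint64_t cpumask;  /* bit n set: CPU n may hold TLB entries for this mm */
};

struct cpu_model {
    const struct mm_model *active_mm; /* mm loaded, possibly only lazily */
    bool has_tlb_entries;             /* stale translations for the mm's PID */
};

static struct cpu_model cpus[NR_CPUS_MODEL];

/* Assumption: one user, and that user is the caller, means no other
 * thread can concurrently switch to this mm. */
static bool mm_is_singlethreaded(const struct mm_model *mm, bool caller_owns_mm)
{
    return mm->mm_users <= 1 && caller_owns_mm;
}

/* Stand-in for the IPI handler run on one remote CPU. */
static void exit_lazy_and_flush(struct cpu_model *cpu, const struct mm_model *mm)
{
    if (cpu->active_mm == mm)
        cpu->active_mm = NULL;    /* end lazy TLB use: no new entries appear */
    cpu->has_tlb_entries = false; /* flush the entire PID on that CPU */
}

/* Clean up remote CPUs and shrink the mask to the local CPU.
 * Returns the number of remote CPUs that were cleaned up. */
static int trim_cpumask(struct mm_model *mm, int local_cpu, bool caller_owns_mm)
{
    assert(local_cpu >= 0 && local_cpu < NR_CPUS_MODEL);
    uint64_t local_bit = UINT64_C(1) << local_cpu;

    /* Nothing to do for a multi-threaded mm or an already-local mask. */
    if (!mm_is_singlethreaded(mm, caller_owns_mm) ||
        (mm->cpumask & ~local_bit) == 0)
        return 0;

    int cleaned = 0;
    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
        if (cpu == local_cpu || !(mm->cpumask & (UINT64_C(1) << cpu)))
            continue;
        exit_lazy_and_flush(&cpus[cpu], mm);
        cleaned++;
    }
    mm->cpumask = local_bit;
    return cleaned;
}
```

    In this model an asynchronous mm_users increment only makes the check
    fail, so the mask is left alone rather than trimmed unsafely.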
    
    This reduces tlbies for a kernel compile workload from 0.90M to 0.12M.
    tlbiels increase significantly, due to the PID flushes used to clean up
    remote CPUs and to increased local flushes (a PID flush takes 128
    tlbiels vs 1 tlbie).
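
    The arithmetic behind these figures can be checked with a short sketch.
    The helper names tlbiels_for_pid_flushes() and percent_reduction() are
    hypothetical; only the 128 tlbiels per PID flush and the 0.90M and 0.12M
    counts come from the text above.

```c
#include <assert.h>

/* tlbiels needed for one PID flush, as stated in the commit message. */
#define TLBIELS_PER_PID_FLUSH 128UL

/* Total tlbiel instructions issued for a given number of PID flushes. */
static unsigned long tlbiels_for_pid_flushes(unsigned long pid_flushes)
{
    return pid_flushes * TLBIELS_PER_PID_FLUSH;
}

/* Relative reduction, in percent, when a count drops from before to after. */
static double percent_reduction(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

    With the numbers quoted above, percent_reduction(0.90, 0.12) comes to
    roughly 87%, while each PID flush costs 128 tlbiels where a single
    tlbie was used before.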

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
tlb-radix.c 26.9 KB