    powerpc/mm/radix: Workaround prefetch issue with KVM · a25bd72b
    Benjamin Herrenschmidt authored
    There's a somewhat architectural issue with Radix MMU and KVM.
    
    When coming out of a guest with AIL (Alternate Interrupt Location, i.e.
    MMU enabled), we start executing hypervisor code with the PID register
    still containing whatever the guest has been using.
    
    The problem is that the CPU can (and will) then start prefetching or
    speculatively loading from whatever host context has that same PID (if
    any), thus bringing translations for that context into the TLB, which
    Linux doesn't know about.
    
    This can cause stale translations and subsequent crashes.
    
    Fixing this in a way that is neither racy nor a huge performance
    impact is difficult. We could just make the host invalidations always
    use broadcast forms but that would hurt single threaded programs for
    example.
    
    We chose to fix it instead by partitioning the PID space between guest
    and host. This is possible because today Linux only uses 19 out of the
    20 bits of PID space, so existing guests will work if we make the host
    use the top half of the 20-bit space.
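    
    As a rough illustration of the split described above, here is a minimal
    user-space sketch. It is not the kernel code: struct pid_range,
    guest_pid_range(), host_pid_range() and pid_in_host_range() are
    hypothetical names, and it assumes the guest simply gets the lower half
    of the space and the host the upper half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: an inclusive range of PID values. */
struct pid_range {
    uint32_t first;
    uint32_t last;
};

/* Lower half of a pid_bits-wide PID space (assumed to be left to guests). */
static struct pid_range guest_pid_range(unsigned int pid_bits)
{
    assert(pid_bits >= 2 && pid_bits <= 31);
    struct pid_range r = { 0, (UINT32_C(1) << (pid_bits - 1)) - 1 };
    return r;
}

/* Upper half of the same space (used by the host in this scheme). */
static struct pid_range host_pid_range(unsigned int pid_bits)
{
    assert(pid_bits >= 2 && pid_bits <= 31);
    struct pid_range r = {
        UINT32_C(1) << (pid_bits - 1),
        (UINT32_C(1) << pid_bits) - 1
    };
    return r;
}

/* True if pid falls in the half reserved for the host. */
static bool pid_in_host_range(uint32_t pid, unsigned int pid_bits)
{
    struct pid_range host = host_pid_range(pid_bits);
    return pid >= host.first && pid <= host.last;
}
```

    With 20 bits, this gives 0x00000-0x7ffff to guests and 0x80000-0xfffff
    to the host, so a guest that only ever used 19 bits never collides with
    a host PID.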
    
    We additionally add support for a property to indicate to Linux the
    size of the PID register which will be useful if we eventually have
    processors with a larger PID space available.
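    
    A sketch of how such a property could be consumed, under stated
    assumptions: the value is taken to be a single big-endian 32-bit cell
    (the usual device tree encoding), and a missing or malformed property
    falls back to 20 bits. pid_bits_from_prop() and DEFAULT_PID_BITS are
    hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fallback when the property is absent: the 20-bit PID space. */
#define DEFAULT_PID_BITS 20u

/*
 * Decode the PID-size property from its raw bytes.
 * prop/len describe the property value as found in the device tree blob;
 * pass prop == NULL when the property does not exist.
 */
static unsigned int pid_bits_from_prop(const unsigned char *prop, size_t len)
{
    if (prop == NULL || len != 4)
        return DEFAULT_PID_BITS;

    /* Device tree cells are big-endian, so assemble the bytes explicitly. */
    return ((uint32_t)prop[0] << 24) |
           ((uint32_t)prop[1] << 16) |
           ((uint32_t)prop[2] << 8)  |
            (uint32_t)prop[3];
}
```

    Decoding byte by byte keeps the sketch independent of host endianness.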
    
    There is still an issue with malicious guests purposefully setting the
    PID register to a value in the host's PID range. Hopefully future HW
    can prevent that, but in the meantime, we handle it with a pair of
    kludges:
    
     - On the way out of a guest, before we clear the current VCPU in the
       PACA, we check the PID and if it's outside of the permitted range
       we flush the TLB for that PID.
    
     - When context switching, if the mm is "new" on that CPU (the
       corresponding bit was set for the first time in the mm cpumask), we
       check if any sibling thread is in KVM (has a non-NULL VCPU pointer
       in the PACA). If that is the case, we also flush the PID for that
       CPU (core).
    
    This second part is needed to handle the case where a process is
    migrated (or starts a new pthread) on a sibling thread of the CPU
    coming out of KVM, as there's a window where stale translations can
    exist before we detect it and flush them out.
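    
    The two checks above can be modelled in plain C as follows. This is a
    simplified sketch of the decision logic only, not the actual asm or
    context switch code: struct paca_model, THREADS_PER_CORE and the
    function names are hypothetical, and the real TLB flush is reduced to
    a boolean "needs flush" result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for the sketch: 4 hardware threads per core. */
#define THREADS_PER_CORE 4

/* Stand-in for the per-thread PACA: non-NULL vcpu means "in KVM". */
struct paca_model {
    const void *kvm_vcpu;
};

/*
 * Kludge 1: on guest exit, flush if the guest left a PID outside the
 * range it is permitted to use (here: anything in or above the upper half).
 */
static bool guest_exit_needs_flush(uint32_t guest_pid, unsigned int pid_bits)
{
    assert(pid_bits >= 2 && pid_bits <= 31);
    return guest_pid >= (UINT32_C(1) << (pid_bits - 1));
}

/* True if any other thread on the same core as cpu is currently in KVM. */
static bool sibling_in_kvm(const struct paca_model *pacas, size_t cpu)
{
    size_t first = cpu - (cpu % THREADS_PER_CORE);

    for (size_t i = first; i < first + THREADS_PER_CORE; i++) {
        if (i != cpu && pacas[i].kvm_vcpu != NULL)
            return true;
    }
    return false;
}

/*
 * Kludge 2: on context switch, flush the PID only when the mm is new on
 * this CPU and a sibling thread is in KVM.
 */
static bool switch_mm_needs_flush(bool mm_new_on_cpu,
                                  const struct paca_model *pacas, size_t cpu)
{
    return mm_new_on_cpu && sibling_in_kvm(pacas, cpu);
}
```

    Gating the second check on "mm is new on this CPU" keeps the cost off
    the common context switch path, where the bit is already set in the mm
    cpumask.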
    
    A future optimization could be added by keeping track of whether the
    PID has ever been used and skipping the flush for completely fresh PIDs.
    We could similarly mark PIDs that have been the subject of a global
    invalidation as "fresh". But for now this will do.
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    [mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop
          unneeded include of kvm_book3s_asm.h]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
pgtable-radix.c 21.9 KB