• Andi Kleen's avatar
    x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation · 6b28baca
    Andi Kleen authored
    When PTEs are set to PROT_NONE the kernel just clears the Present bit and
    preserves the PFN, which creates attack surface for L1TF speculation
    speculation attacks.
    
    This is important inside guests, because L1TF speculation bypasses physical
    page remapping. While the host has its own migitations preventing leaking
    data from other VMs into the guest, this would still risk leaking the wrong
    page inside the current guest.
    
    This uses the same technique as Linus' swap entry patch: while an entry is
    is in PROTNONE state invert the complete PFN part part of it. This ensures
    that the the highest bit will point to non existing memory.
    
    The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and
    pte/pmd/pud_pfn undo it.
    
    This assume that no code path touches the PFN part of a PTE directly
    without using these primitives.
    
    This doesn't handle the case that MMIO is on the top of the CPU physical
    memory. If such an MMIO region was exposed by an unpriviledged driver for
    mmap it would be possible to attack some real memory.  However this
    situation is all rather unlikely.
    
    For 32bit non PAE the inversion is not done because there are really not
    enough bits to protect anything.
    
    Q: Why does the guest need to be protected when the HyperVisor already has
       L1TF mitigations?
    
    A: Here's an example:
    
       Physical pages 1 2 get mapped into a guest as
       GPA 1 -> PA 2
       GPA 2 -> PA 1
       through EPT.
    
       The L1TF speculation ignores the EPT remapping.
    
       Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and
       they belong to different users and should be isolated.
    
       A sets the GPA 1 PA 2 PTE to PROT_NONE to bypass the EPT remapping and
       gets read access to the underlying physical page. Which in this case
       points to PA 2, so it can read process B's data, if it happened to be in
       L1, so isolation inside the guest is broken.
    
       There's nothing the hypervisor can do about this. This mitigation has to
       be done in the guest itself.
    
    [ tglx: Massaged changelog ]
    Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Reviewed-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: default avatarMichal Hocko <mhocko@suse.com>
    Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
    Acked-by: default avatarDave Hansen <dave.hansen@intel.com>
    
    
    6b28baca
pgtable_64.h 9.32 KB