1. 26 Apr, 2023 1 commit
    • KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated · edbdb43f
      Sean Christopherson authored
      Preserve TDP MMU roots until they are explicitly invalidated by gifting
      the TDP MMU itself a reference to a root when it is allocated.  Keeping a
      reference in the TDP MMU fixes a flaw where the TDP MMU exhibits terrible
      performance, and can potentially even soft-hang a vCPU, if a vCPU
      frequently unloads its roots, e.g. when KVM is emulating SMI+RSM.
      
      When KVM emulates something that invalidates _all_ TLB entries, e.g. SMI
      and RSM, KVM unloads all of the vCPU's roots (KVM keeps a small per-vCPU
      cache of previous roots).  Unloading roots is a simple way to ensure KVM
      flushes and synchronizes all roots for the vCPU, as KVM flushes and syncs
      when allocating a "new" root (from the vCPU's perspective).
      
      In the shadow MMU, KVM keeps track of all shadow pages, roots included, in
      a per-VM hash table.  Unloading a shadow MMU root just wipes it from the
      per-vCPU cache; the root is still tracked in the per-VM hash table.  When
      KVM loads a "new" root for the vCPU, KVM will find the old, unloaded root
      in the per-VM hash table.
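
The shadow MMU behavior described above can be sketched with a toy userspace model. Everything here (`toy_vm`, `toy_vcpu`, `toy_get_root`, the fixed-size table, the single-entry cache) is a hypothetical simplification for illustration and is not KVM's actual data structure or API; the point is only that unloading clears the per-vCPU pointer while the per-VM entry stays findable.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TABLE_SIZE 8

/* Hypothetical stand-ins; the names and layout are not KVM's. */
struct toy_page {
	unsigned long key;	/* whatever identifies the root */
	int in_use;
};

struct toy_vm {
	struct toy_page table[TOY_TABLE_SIZE];	/* per-VM tracking */
	int allocations;			/* how many roots were built */
};

struct toy_vcpu {
	struct toy_page *root;	/* per-vCPU cache, one entry for brevity */
};

/* Find the root in the per-VM table, building it only if it is missing. */
static struct toy_page *toy_get_root(struct toy_vm *vm, unsigned long key)
{
	struct toy_page *free_slot = NULL;

	for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
		struct toy_page *p = &vm->table[i];

		if (p->in_use && p->key == key)
			return p;
		if (!p->in_use && !free_slot)
			free_slot = p;
	}
	assert(free_slot);	/* the toy table is assumed never to fill up */
	free_slot->key = key;
	free_slot->in_use = 1;
	vm->allocations++;
	return free_slot;
}

/* Loading a "new" root from the vCPU's perspective. */
static void toy_load_root(struct toy_vm *vm, struct toy_vcpu *vcpu,
			  unsigned long key)
{
	vcpu->root = toy_get_root(vm, key);
}

/* Unloading only forgets the per-vCPU pointer; the per-VM entry survives. */
static void toy_unload_root(struct toy_vcpu *vcpu)
{
	vcpu->root = NULL;
}
```

In this model, loading a root, unloading it, and loading it again with the same key returns the same entry and leaves `allocations` at 1.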
      
      Unlike the shadow MMU, the TDP MMU doesn't track "inactive" roots in a
      per-VM structure, where "active" in this case means a root is either
      in-use or cached as a previous root by at least one vCPU.  When a TDP MMU
      root becomes inactive, i.e. the last vCPU reference to the root is put,
      KVM immediately frees the root (asterisk on "immediately" as the actual
      freeing may be done by a worker, but for all intents and purposes the root
      is gone).
      
      The TDP MMU behavior is especially problematic for 1-vCPU setups, as
      unloading all roots effectively frees all roots.  The issue is mitigated
      to some degree in multi-vCPU setups as a different vCPU usually holds a
      reference to an unloaded root and thus keeps the root alive, allowing the
      vCPU to reuse its old root after unloading (with a flush+sync).
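
A minimal sketch of the reference counting at play, as a userspace toy rather than kernel code. The type and helpers (`toy_root`, `toy_root_alloc`, `toy_root_put`, `toy_root_get`, `toy_root_invalidate`) are hypothetical names chosen for illustration; the sketch assumes a single vCPU and models "freeing" as a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a TDP MMU root; not a kernel type. */
struct toy_root {
	int refcount;	/* number of outstanding references */
	bool invalid;	/* set once the root is explicitly invalidated */
	bool freed;	/* stands in for actually releasing the page tables */
};

/*
 * Allocate a root on behalf of a vCPU.  When mmu_holds_ref is true, the
 * MMU itself is gifted a second reference, as the commit message describes.
 */
static void toy_root_alloc(struct toy_root *root, bool mmu_holds_ref)
{
	root->refcount = mmu_holds_ref ? 2 : 1;
	root->invalid = false;
	root->freed = false;
}

/* Drop one reference; the root is freed when the last reference goes away. */
static void toy_root_put(struct toy_root *root)
{
	assert(root->refcount > 0);
	if (--root->refcount == 0)
		root->freed = true;
}

/* A vCPU reloading a previously unloaded root takes a new reference. */
static bool toy_root_get(struct toy_root *root)
{
	if (root->freed || root->invalid)
		return false;	/* nothing to reuse, a fresh root must be built */
	root->refcount++;
	return true;
}

/*
 * Explicit invalidation: mark the root and drop the MMU's own reference.
 * Only meaningful for roots allocated with mmu_holds_ref set.
 */
static void toy_root_invalidate(struct toy_root *root)
{
	assert(!root->invalid);
	root->invalid = true;
	toy_root_put(root);
}
```

With `mmu_holds_ref` false, the sole vCPU's `toy_root_put()` frees the root and a later `toy_root_get()` fails, which is the 1-vCPU problem described above. With it true, the same put leaves the root alive for reuse until `toy_root_invalidate()` is called.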
      
      The TDP MMU flaw has been known for some time, as until very recently,
      KVM's handling of CR0.WP also triggered unloading of all roots.  The
      CR0.WP toggling scenario was eventually addressed by not unloading roots
      when _only_ CR0.WP is toggled, but such an approach doesn't Just Work
      for emulating SMM as KVM must emulate a full TLB flush on entry and exit
      to/from SMM.  Given that the shadow MMU plays nice with unloading roots
      at will, teaching the TDP MMU to do the same is far less complex than
      modifying KVM to track which roots need to be flushed before reuse.
      
      Note, preserving all possible TDP MMU roots is not a concern with respect
      to memory consumption.  Now that the role for direct MMUs doesn't include
      information about the guest, e.g. CR0.PG, CR0.WP, CR4.SMEP, etc., there
      are _at most_ six possible roots (where "guest_mode" here means L2):
      
        1. 4-level !SMM !guest_mode
        2. 4-level  SMM !guest_mode
        3. 5-level !SMM !guest_mode
        4. 5-level  SMM !guest_mode
        5. 4-level !SMM guest_mode
        6. 5-level !SMM guest_mode
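
The count of six can be checked by enumerating the three attributes in the list. This is a sketch under stated assumptions: `toy_role` and `toy_enumerate_roles` are hypothetical names, the struct is not KVM's real role layout, and the exclusion of SMM combined with guest_mode is taken directly from the list above rather than derived here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified role for a direct (TDP) root. */
struct toy_role {
	int level;		/* 4 or 5 paging levels */
	bool smm;
	bool guest_mode;	/* true means the root is for L2 */
};

/*
 * Write every combination from the list into out[] (up to max entries)
 * and return the total number of combinations.  SMM together with
 * guest_mode is skipped because the list does not include it.
 */
static size_t toy_enumerate_roles(struct toy_role *out, size_t max)
{
	static const int levels[] = { 4, 5 };
	size_t n = 0;

	assert(out != NULL);

	for (int gm = 0; gm <= 1; gm++) {
		for (size_t l = 0; l < 2; l++) {
			for (int smm = 0; smm <= 1; smm++) {
				if (smm && gm)
					continue;
				if (n < max)
					out[n] = (struct toy_role){ levels[l], smm, gm };
				n++;
			}
		}
	}
	return n;
}
```

The loops are ordered so that the entries come out in the same order as the numbered list.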
      
      And because each vCPU can track 4 valid roots, a VM can already have all
      6 root combinations live at any given time.  Not to mention that, in
      practice, no sane VMM will advertise different guest.MAXPHYADDR values
      across vCPUs, i.e. KVM won't ever use both 4-level and 5-level roots for
      a single VM.  Furthermore, the vast majority of modern hypervisors will
      utilize EPT/NPT when available, thus the guest_mode=%true cases are also
      unlikely to be utilized.

      Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
      Link: https://lore.kernel.org/all/959c5bce-beb5-b463-7158-33fc4a4f910c@linux.microsoft.com
      Link: https://lkml.kernel.org/r/20220209170020.1775368-1-pbonzini%40redhat.com
      Link: https://lore.kernel.org/all/20230322013731.102955-1-minipli@grsecurity.net
      Link: https://lore.kernel.org/all/000000000000a0bc2b05f9dd7fab@google.com
      Link: https://lore.kernel.org/all/000000000000eca0b905fa0f7756@google.com
      Cc: Ben Gardon <bgardon@google.com>
      Cc: David Matlack <dmatlack@google.com>
      Cc: stable@vger.kernel.org
      Tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
      Link: https://lore.kernel.org/r/20230426220323.3079789-1-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
  2. 10 Apr, 2023 2 commits
  3. 04 Apr, 2023 13 commits
  4. 22 Mar, 2023 2 commits
  5. 17 Mar, 2023 17 commits
  6. 16 Mar, 2023 5 commits