1. 06 Sep, 2021 8 commits
    • KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality · ca41c34c
      Sean Christopherson authored
      Move "tdp_mmu_page" into the 1-byte void left by the recently removed
      "mmio_cached" so that it resides in the first 64 bytes of kvm_mmu_page,
      i.e. in the same cache line as the most commonly accessed fields.
      
      Don't bother wrapping tdp_mmu_page in CONFIG_X86_64, including the field in
      32-bit builds doesn't affect the size of kvm_mmu_page, and a future patch
      can always wrap the field in the unlikely event KVM gains a 1-byte flag
      that is 32-bit specific.
      
      Note, the size of kvm_mmu_page is also unchanged on CONFIG_X86_64=y due
      to it previously sharing an 8-byte chunk with write_flooding_count.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210901221023.1303578-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
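The layout argument in this commit can be illustrated with a small userspace sketch. The two structs below are hypothetical stand-ins, not the real kvm_mmu_page; the field names other than tdp_mmu_page and write_flooding_count, the 64-byte cache line, and an ABI with 8-byte alignment for uint64_t (as on x86-64) are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumed cache line size */

/* Hypothetical "before" layout: the flag lives at the tail of the struct,
 * sharing an 8-byte chunk with a 4-byte counter. */
struct page_before {
	uint64_t hot_a;
	uint64_t hot_b;
	bool     unsync;
	uint8_t  other_flag;
	/* padding hole here, left by a removed 1-byte field */
	uint64_t hot_c;
	uint64_t cold[8];
	int      write_flooding_count;
	bool     tdp_mmu_page;
};

/* Hypothetical "after" layout: the flag fills the hole near the hot fields. */
struct page_after {
	uint64_t hot_a;
	uint64_t hot_b;
	bool     unsync;
	uint8_t  other_flag;
	bool     tdp_mmu_page;
	uint64_t hot_c;
	uint64_t cold[8];
	int      write_flooding_count;
};

/* True if a field starting at this offset begins in the first cache line. */
static inline bool in_first_cache_line(size_t offset)
{
	return offset < CACHE_LINE_SIZE;
}
```

Under those assumptions, offsetof() reports the flag outside the first 64 bytes in the "before" layout and inside them in the "after" layout, while sizeof() is the same for both, because the flag only moved from one padded region to another.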
    • Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()" · e7177339
      Sean Christopherson authored
      Revert a misguided illegal GPA check when "translating" a non-nested GPA.
      The check is woefully incomplete as it does not fill in @exception as
      expected by all callers, which leads to KVM attempting to inject a bogus
      exception, potentially exposing kernel stack information in the process.
      
       WARNING: CPU: 0 PID: 8469 at arch/x86/kvm/x86.c:525 exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
       CPU: 1 PID: 8469 Comm: syz-executor531 Not tainted 5.14.0-rc7-syzkaller #0
       RIP: 0010:exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
       Call Trace:
        x86_emulate_instruction+0xef6/0x1460 arch/x86/kvm/x86.c:7853
        kvm_mmu_page_fault+0x2f0/0x1810 arch/x86/kvm/mmu/mmu.c:5199
        handle_ept_misconfig+0xdf/0x3e0 arch/x86/kvm/vmx/vmx.c:5336
        __vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6021 [inline]
        vmx_handle_exit+0x336/0x1800 arch/x86/kvm/vmx/vmx.c:6038
        vcpu_enter_guest+0x2a1c/0x4430 arch/x86/kvm/x86.c:9712
        vcpu_run arch/x86/kvm/x86.c:9779 [inline]
        kvm_arch_vcpu_ioctl_run+0x47d/0x1b20 arch/x86/kvm/x86.c:10010
        kvm_vcpu_ioctl+0x49e/0xe50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3652
      
      The bug has escaped notice because practically speaking the GPA check is
      useless.  The GPA check in question only comes into play when KVM is
      walking guest page tables (or "translating" CR3), and KVM already handles
      illegal GPA checks by setting reserved bits in rsvd_bits_mask for each
      PxE, or in the case of CR3 for loading PDPTRs, manually checks for an
      illegal CR3.  This particular failure doesn't hit the existing reserved
      bits checks because syzbot sets guest.MAXPHYADDR=1, and IA32 architecture
      simply doesn't allow for such an absurd MAXPHYADDR, e.g. 32-bit paging
      doesn't define any reserved PA bits checks, which KVM emulates by only
      incorporating the reserved PA bits into the "high" bits, i.e. bits 63:32.
      
      Simply remove the bogus check.  There is zero meaningful value and no
      architectural justification for supporting guest.MAXPHYADDR < 32, and
      properly filling the exception would introduce non-trivial complexity.
      
      This reverts commit ec7771ab.
      
      Fixes: ec7771ab ("KVM: x86: mmu: Add guest physical address check in translate_gpa()")
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+200c08e88ae818f849ce@syzkaller.appspotmail.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210831164224.1119728-2-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
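The point about reserved physical-address bits and a tiny guest.MAXPHYADDR can be sketched as follows. This is an illustration only, not KVM's code: the helper names are hypothetical, and the upper bound of bit 51 for physical-address bits is an assumption stated here for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits lo..hi (inclusive) set; assumes 0 <= lo and hi <= 63. */
static inline uint64_t bit_range_mask(int lo, int hi)
{
	if (hi < lo)
		return 0;
	return ((2ULL << (hi - lo)) - 1) << lo;
}

/* Hypothetical helper: PA bits at or above MAXPHYADDR, up to bit 51
 * (assumed upper limit), are treated as reserved. */
static inline uint64_t reserved_pa_bits(int maxphyaddr)
{
	return bit_range_mask(maxphyaddr, 51);
}

/* The part of the reserved mask that would land in bits 31:0, i.e. the
 * part that a check covering only bits 63:32 never looks at. */
static inline uint32_t reserved_pa_bits_low32(int maxphyaddr)
{
	return (uint32_t)reserved_pa_bits(maxphyaddr);
}
```

With MAXPHYADDR of 32 or more, the low half of the mask is empty, so a check restricted to bits 63:32 loses nothing. With MAXPHYADDR=1, almost all of the low 32 bits would be "reserved", which is the case the existing checks were never meant to cover.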
    • KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page · 678a305b
      Jia He authored
      After the fast TLB invalidation patch series was reverted and then
      restored, mmio_cached was not removed. Hence an unused field is left in
      kvm_mmu_page.
      
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Jia He <justin.he@arm.com>
      Message-Id: <20210830145336.27183-1-justin.he@arm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710 · 1dbaf04c
      Eduardo Habkost authored
      Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4,
      so increase KVM_SOFT_MAX_VCPUS to 710.
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Message-Id: <20210903211600.2002377-4-ehabkost@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: Increase MAX_VCPUS to 1024 · 074c82c8
      Eduardo Habkost authored
      Increase KVM_MAX_VCPUS to 1024, so we can test larger VMs.
      
      I'm not changing KVM_SOFT_MAX_VCPUS yet because I'm afraid it
      might involve complicated questions around the meaning of
      "supported" and "recommended" in the upstream tree.
      KVM_SOFT_MAX_VCPUS will be changed in a separate patch.
      
      For reference, visible effects of this change are:
      - KVM_CAP_MAX_VCPUS will now return 1024 (of course)
      - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (0x40000005)].EAX
        will now be 1024
      - KVM_MAX_VCPU_ID will change from 1152 to 4096
      - Size of struct kvm will increase from 19328 to 22272 bytes
        (in x86_64)
      - Size of struct kvm_ioapic will increase from 1780 to 5084 bytes
        (in x86_64)
      - Bitmap stack variables that will grow:
        - In kvm_hv_flush_tlb() and kvm_hv_send_ipi(),
          vp_bitmap[] and vcpu_bitmap[] will now be 128 bytes long
        - vcpu_bitmap in ioapic_write_indirect() will be 128 bytes long
          once patch "KVM: x86: Fix stack-out-of-bounds memory access
          from ioapic_write_indirect()" is applied
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Message-Id: <20210903211600.2002377-3-ehabkost@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
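The 128-byte figure for the on-stack bitmaps follows from one bit per VCPU, rounded up to whole unsigned longs. A minimal userspace sketch of that arithmetic, loosely modelled on how kernel bitmaps are sized (the macro and function names here are hypothetical, not kernel identifiers):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits in one unsigned long on the build target. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes occupied by a bitmap of nbits bits stored as an array of
 * unsigned long, rounding up to a whole number of longs. */
static inline size_t bitmap_bytes(size_t nbits)
{
	size_t longs = (nbits + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

	return longs * sizeof(unsigned long);
}
```

For 1024 bits this gives 1024 / 8 = 128 bytes regardless of whether unsigned long is 4 or 8 bytes wide, which matches the size quoted for vp_bitmap[] and vcpu_bitmap[].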
    • kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS · 4ddacd52
      Eduardo Habkost authored
      Instead of requiring KVM_MAX_VCPU_ID to be manually increased
      every time we increase KVM_MAX_VCPUS, set it to 4*KVM_MAX_VCPUS.
      This should be enough for CPU topologies where Cores-per-Package
      and Packages-per-Socket are not powers of 2.
      
      In practice, this increases KVM_MAX_VCPU_ID from 1023 to 1152.
      The only side effect of this change is making some fields in
      struct kvm_ioapic larger, increasing the struct size from 1628 to
      1780 bytes (in x86_64).
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Message-Id: <20210903211600.2002377-2-ehabkost@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
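As a worked check of the numbers in this commit, the derived limit can be written as a one-line function. The function name is hypothetical, and the input value 288 is not stated in the message; it is inferred here from 1152 / 4.

```c
#include <assert.h>

/* Sketch of the rule "maximum VCPU ID = 4 * maximum VCPU count". */
static inline unsigned int max_vcpu_id_for(unsigned int max_vcpus)
{
	return 4 * max_vcpus;
}
```

Under that rule a limit of 288 VCPUs yields 1152, and raising the VCPU limit to 1024 yields 4096 with no second constant to update by hand.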
    • KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation · 81b4b56d
      Maxim Levitsky authored
      If we are emulating an invalid guest state, we don't have a correct
      exit reason, and thus we shouldn't do anything in this function.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 95b5a48c ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host · a717a780
      Sean Christopherson authored
      Include pml5_root in the set of special roots if and only if the host,
      and thus NPT, is using 5-level paging.  mmu_alloc_special_roots() expects
      special roots to be allocated as a bundle, i.e. they're either all valid
      or all NULL.  But for pml5_root, that expectation only holds true if the
      host uses 5-level paging, which causes KVM to WARN about pml5_root being
      NULL when the other special roots are valid.
      
      The silver lining of 4-level vs. 5-level NPT being tied to the host
      kernel's paging level is that KVM's shadow root level is constant; unlike
      VMX's EPT, KVM can't choose 4-level NPT based on guest.MAXPHYADDR.  That
      means KVM can still expect pml5_root to be bundled with the other special
      roots, it just needs to be conditioned on the shadow root level.
      
      Fixes: cb0f722a ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host")
      Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210824005824.205536-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
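The "all valid or all NULL" bundle rule, with pml5_root counted only when the shadow root level is 5, can be sketched as a small classifier. This is a simplified illustration and not the code in mmu_alloc_special_roots(): the enum, the function name, and the use of booleans in place of the root pointers are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum roots_state { ROOTS_ALL_NULL, ROOTS_ALL_VALID, ROOTS_INCONSISTENT };

/* Each bool stands for "this special root is allocated (non-NULL)". */
static inline enum roots_state classify_special_roots(bool pae_root,
						      bool pml4_root,
						      bool pml5_root,
						      int shadow_root_level)
{
	/* pml5_root belongs to the bundle only with a 5-level shadow root. */
	bool need_pml5 = shadow_root_level > 4;

	if (pae_root && pml4_root && (!need_pml5 || pml5_root))
		return ROOTS_ALL_VALID;
	if (!pae_root && !pml4_root && !(need_pml5 && pml5_root))
		return ROOTS_ALL_NULL;
	return ROOTS_INCONSISTENT;
}
```

With a 4-level shadow root, valid pae_root and pml4_root plus a NULL pml5_root classify as a complete bundle; the same combination with a 5-level shadow root is flagged as inconsistent, which is the only case where a warning would be justified.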
  2. 20 Aug, 2021 32 commits