1. 06 Sep, 2021 4 commits
    • Eduardo Habkost's avatar
      kvm: x86: Increase MAX_VCPUS to 1024 · 074c82c8
      Eduardo Habkost authored
      Increase KVM_MAX_VCPUS to 1024, so we can test larger VMs.
      
      I'm not changing KVM_SOFT_MAX_VCPUS yet because I'm afraid it
      might involve complicated questions around the meaning of
      "supported" and "recommended" in the upstream tree.
      KVM_SOFT_MAX_VCPUS will be changed in a separate patch.
      
      For reference, visible effects of this change are:
      - KVM_CAP_MAX_VCPUS will now return 1024 (of course)
      - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
        will now be 1024
      - KVM_MAX_VCPU_ID will change from 1151 to 4096
      - Size of struct kvm will increase from 19328 to 22272 bytes
        (in x86_64)
      - Size of struct kvm_ioapic will increase from 1780 to 5084 bytes
        (in x86_64)
      - Bitmap stack variables that will grow:
        - At kvm_hv_flush_tlb() kvm_hv_send_ipi(),
          vp_bitmap[] and vcpu_bitmap[] will now be 128 bytes long
        - vcpu_bitmap at bioapic_write_indirect() will be 128 bytes long
          once patch "KVM: x86: Fix stack-out-of-bounds memory access
          from ioapic_write_indirect()" is applied
      Signed-off-by: default avatarEduardo Habkost <ehabkost@redhat.com>
      Message-Id: <20210903211600.2002377-3-ehabkost@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      074c82c8
    • Eduardo Habkost's avatar
      kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS · 4ddacd52
      Eduardo Habkost authored
      Instead of requiring KVM_MAX_VCPU_ID to be manually increased
      every time we increase KVM_MAX_VCPUS, set it to 4*KVM_MAX_VCPUS.
      This should be enough for CPU topologies where Cores-per-Package
      and Packages-per-Socket are not powers of 2.
      
      In practice, this increases KVM_MAX_VCPU_ID from 1023 to 1152.
      The only side effect of this change is making some fields in
      struct kvm_ioapic larger, increasing the struct size from 1628 to
      1780 bytes (in x86_64).
      Signed-off-by: default avatarEduardo Habkost <ehabkost@redhat.com>
      Message-Id: <20210903211600.2002377-2-ehabkost@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4ddacd52
    • Maxim Levitsky's avatar
      KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation · 81b4b56d
      Maxim Levitsky authored
      If we are emulating an invalid guest state, we don't have a correct
      exit reason, and thus we shouldn't do anything in this function.
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 95b5a48c ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18)
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      81b4b56d
    • Sean Christopherson's avatar
      KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host · a717a780
      Sean Christopherson authored
      Include pml5_root in the set of special roots if and only if the host,
      and thus NPT, is using 5-level paging.  mmu_alloc_special_roots() expects
      special roots to be allocated as a bundle, i.e. they're either all valid
      or all NULL.  But for pml5_root, that expectation only holds true if the
      host uses 5-level paging, which causes KVM to WARN about pml5_root being
      NULL when the other special roots are valid.
      
      The silver lining of 4-level vs. 5-level NPT being tied to the host
      kernel's paging level is that KVM's shadow root level is constant; unlike
      VMX's EPT, KVM can't choose 4-level NPT based on guest.MAXPHYADDR.  That
      means KVM can still expect pml5_root to be bundled with the other special
      roots, it just needs to be conditioned on the shadow root level.
      
      Fixes: cb0f722a ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host")
      Reported-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210824005824.205536-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a717a780
  2. 20 Aug, 2021 34 commits
  3. 13 Aug, 2021 2 commits
    • Peter Xu's avatar
      KVM: Allow to have arch-specific per-vm debugfs files · 3165af73
      Peter Xu authored
      Allow archs to create arch-specific nodes under kvm->debugfs_dentry directory
      besides the stats fields.  The new interface kvm_arch_create_vm_debugfs() is
      defined but not yet used.  It's called after kvm->debugfs_dentry is created, so
      it can be referenced directly in kvm_arch_create_vm_debugfs().  Arch should
      define their own versions when they want to create extra debugfs nodes.
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Message-Id: <20210730220455.26054-2-peterx@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3165af73
    • Sean Christopherson's avatar
      KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter · f7782bb8
      Sean Christopherson authored
      Clear nested.pi_pending on nested VM-Enter even if L2 will run without
      posted interrupts enabled.  If nested.pi_pending is left set from a
      previous L2, vmx_complete_nested_posted_interrupt() will pick up the
      stale flag and exit to userspace with an "internal emulation error" due
      the new L2 not having a valid nested.pi_desc.
      
      Arguably, vmx_complete_nested_posted_interrupt() should first check for
      posted interrupts being enabled, but it's also completely reasonable that
      KVM wouldn't screw up a fundamental flag.  Not to mention that the mere
      existence of nested.pi_pending is a long-standing bug as KVM shouldn't
      move the posted interrupt out of the IRR until it's actually processed,
      e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit
      with a "pending" posted interrupt.  Fixing the mess is a future problem.
      
      Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI
      descriptor as an error, this was a benign bug as the null PI descriptor
      effectively served as a check on PI not being enabled.  Even then, the
      new flow did not become problematic until KVM started checking the result
      of kvm_check_nested_events().
      
      Fixes: 705699a1 ("KVM: nVMX: Enable nested posted interrupt processing")
      Fixes: 966eefb8 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable")
      Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails")
      Cc: stable@vger.kernel.org
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210810144526.2662272-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f7782bb8