1. 15 Feb, 2017 8 commits
    • Paolo Bonzini's avatar
      kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection · b95234c8
      Paolo Bonzini authored
      Since bf9f6ac8 ("KVM: Update Posted-Interrupts Descriptor when vCPU
      is blocked", 2015-09-18) the posted interrupt descriptor is checked
      unconditionally for PIR.ON.  Therefore we don't need KVM_REQ_EVENT to
      trigger the scan and, if NMIs or SMIs are not involved, we can avoid
      the complicated event injection path.
      
      Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
      there since APICv was introduced.
      
      However, without the KVM_REQ_EVENT safety net KVM needs to be much
      more careful about races between vmx_deliver_posted_interrupt and
      vcpu_enter_guest.  First, the IPI for posted interrupts may be issued
      between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
      If that happens, kvm_trigger_posted_interrupt returns true, but
      smp_kvm_posted_intr_ipi doesn't do anything about it.  The guest is
      entered with PIR.ON, but the posted interrupt IPI has not been sent
      and the interrupt is only delivered to the guest on the next vmentry
      (if any).  To fix this, disable interrupts before setting vcpu->mode.
      This ensures that the IPI is delayed until the guest enters non-root mode;
      it is then trapped by the processor causing the interrupt to be injected.
      
      Second, the IPI may be issued between kvm_x86_ops->sync_pir_to_irr(vcpu)
      and vcpu->mode = IN_GUEST_MODE.  In this case, kvm_vcpu_kick is called
      but it (correctly) doesn't do anything because it sees vcpu->mode ==
      OUTSIDE_GUEST_MODE.  Again, the guest is entered with PIR.ON but no
      posted interrupt IPI is pending; this time, the fix for this is to move
      the RVI update after IN_GUEST_MODE.
      
      Both issues were mostly masked by the liberal usage of KVM_REQ_EVENT,
      though the second could actually happen with VT-d posted interrupts.
      In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
      in another vmentry which would inject the interrupt.
      
      This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b95234c8
    • Paolo Bonzini's avatar
      KVM: x86: do not scan IRR twice on APICv vmentry · 76dfafd5
      Paolo Bonzini authored
      Calls to apic_find_highest_irr are scanning IRR twice, once
      in vmx_sync_pir_from_irr and once in apic_search_irr.  Change
      sync_pir_from_irr to get the new maximum IRR from kvm_apic_update_irr;
      now that it does the computation, it can also do the RVI write.
      
      In order to avoid complications in svm.c, make the callback optional.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      76dfafd5
    • Paolo Bonzini's avatar
    • Paolo Bonzini's avatar
      KVM: x86: preparatory changes for APICv cleanups · 810e6def
      Paolo Bonzini authored
      Add return value to __kvm_apic_update_irr/kvm_apic_update_irr.
      Move vmx_sync_pir_to_irr around.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      810e6def
    • Paolo Bonzini's avatar
      kvm: nVMX: move nested events check to kvm_vcpu_running · 0ad3bed6
      Paolo Bonzini authored
      vcpu_run calls kvm_vcpu_running, not kvm_arch_vcpu_runnable,
      and the former does not call check_nested_events.
      
      Once KVM_REQ_EVENT is removed from the APICv interrupt injection
      path, however, this would leave no place to trigger a vmexit
      from L2 to L1, causing a missed interrupt delivery while in guest
      mode.  This is caught by the "ack interrupt on exit" test in
      vmx.flat.
      
      [This does not change the calls to check_nested_events in
       inject_pending_event.  That is material for a separate cleanup.]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      0ad3bed6
    • Paolo Bonzini's avatar
      KVM: vmx: clear pending interrupts on KVM_SET_LAPIC · 967235d3
      Paolo Bonzini authored
      Pending interrupts might be in the PI descriptor when the
      LAPIC is restored from an external state; we do not want
      them to be injected.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      967235d3
    • Paolo Bonzini's avatar
      kvm: vmx: Use the hardware provided GPA instead of page walk · db1c056c
      Paolo Bonzini authored
      As in the SVM patch, the guest physical address is passed by
      VMX to x86_emulate_instruction already, so mark the GPA as available
      in vcpu->arch.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      db1c056c
    • Paolo Bonzini's avatar
      Merge branch 'kvm-ppc-next' of... · ee106891
      Paolo Bonzini authored
      Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      
      This brings in two fixes for potential host crashes, from Ben
      Herrenschmidt and Nick Piggin.
      ee106891
  2. 09 Feb, 2017 3 commits
  3. 08 Feb, 2017 15 commits
  4. 07 Feb, 2017 8 commits
  5. 06 Feb, 2017 2 commits
  6. 03 Feb, 2017 4 commits
    • James Hogan's avatar
      KVM: MIPS: Allow multiple VCPUs to be created · 12ed1fae
      James Hogan authored
      Increase the maximum number of MIPS KVM VCPUs to 8, and implement the
      KVM_CAP_NR_VCPUS and KVM_CAP_MAX_CPUS capabilities which expose the
      recommended and maximum number of VCPUs to userland. The previous
      maximum of 1 didn't allow for any form of SMP guests.
      
      We calculate the values similarly to ARM, recommending as many VCPUs as
      there are CPUs online in the system. This will allow userland to know
      how many VCPUs it is possible to create.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      12ed1fae
    • James Hogan's avatar
      KVM: MIPS/T&E: Expose read-only CP0_IntCtl register · ad58d4d4
      James Hogan authored
      Expose the CP0_IntCtl register through the KVM register access API,
      which is a required register since MIPS32r2. It is currently read-only
      since the VS field isn't implemented due to lack of Config3.VInt or
      Config3.VEIC.
      
      It is implemented in trap_emul.c so that a VZ implementation can allow
      writes.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      ad58d4d4
    • James Hogan's avatar
      KVM: MIPS/T&E: Expose CP0_EntryLo0/1 registers · 013044cc
      James Hogan authored
      Expose the CP0_EntryLo0 and CP0_EntryLo1 registers through the KVM
      register access API. This is fairly straightforward for trap & emulate
      since we don't support the RI and XI bits. For the sake of future
      proofing (particularly for VZ) it is explicitly specified that the API
      always exposes the 64-bit version of these registers (i.e. with the RI
      and XI bits in bit positions 63 and 62 respectively), and they are
      implemented in trap_emul.c rather than mips.c to allow them to be
      implemented differently for VZ.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      013044cc
    • James Hogan's avatar
      KVM: MIPS/T&E: Default to reset vector · be67a0be
      James Hogan authored
      Set the default VCPU state closer to the architectural reset state, with
      PC pointing at the reset vector (uncached PA 0x1fc00000, which for KVM
      T&E is VA 0x5fc00000), and with CP0_Status.BEV and CP0_Status.ERL to 1.
      
      Although QEMU at least will overwrite this state, it makes sense to do
      this now that CP0_EBase is properly implemented to check BEV, and now
      that we support a sparse GPA layout potentially with a boot ROM at GPA
      0x1fc00000.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      be67a0be