1. 24 Oct, 2018 1 commit
  2. 23 Oct, 2018 1 commit
• Revert "kvm: x86: optimize dr6 restore" · f9dcf08e
  Radim Krčmář authored
      This reverts commit 0e0a53c5.
      
      As Christian Ehrhardt noted:
      
        The most common case is that vcpu->arch.dr6 and the host's %dr6 value
        are not related at all because ->switch_db_regs is zero. To do this
        all correctly, we must handle the case where the guest leaves an arbitrary
        unused value in vcpu->arch.dr6 before disabling breakpoints again.
      
        However, this means that vcpu->arch.dr6 is not suitable to detect the
        need for a %dr6 clear.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  3. 21 Oct, 2018 1 commit
  4. 20 Oct, 2018 1 commit
• KVM: PPC: Optimize clearing TCEs for sparse tables · 6e301a8e
  Alexey Kardashevskiy authored
The powernv platform maintains 2 TCE tables for VFIO: a hardware TCE
table and a table of userspace addresses. These tables are radix trees;
we allocate indirect levels when they are written to. Since
memory allocation is problematic in real mode, we have 2 accessors
to the entries:
- for virtual mode: it allocates the memory and is always expected
to return non-NULL;
- for real mode: it does not allocate and can return NULL.
      
Also, DMA windows can span up to 55 bits of the address space, and since
we never have that much RAM, such windows are sparse. However, the
SPAPR TCE IOMMU driver currently walks through all TCEs to unpin DMA memory.
      
Since we maintain a userspace-address table for VFIO that mirrors
the hardware table, we can use it to find which parts of the DMA
window have never been mapped and skip them; this is what this patch does.
      
      The bare metal systems do not have this problem as they use a bypass mode
      of a PHB which maps RAM directly.
      
This helps a lot with sparse DMA windows, reducing the shutdown time from
about 3 minutes per 1 billion TCEs to a few seconds for a 32GB sparse guest.
Just skipping the last level seems to be good enough.
      
As the non-allocating accessor is now used in virtual mode as well, rename it
from IOMMU_TABLE_USERSPACE_ENTRY_RM (real mode) to _RO (read only).
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  5. 19 Oct, 2018 5 commits
  6. 18 Oct, 2018 2 commits
  7. 17 Oct, 2018 8 commits
• KVM: arm64: Fix caching of host MDCR_EL2 value · da5a3ce6
  Mark Rutland authored
      At boot time, KVM stashes the host MDCR_EL2 value, but only does this
      when the kernel is not running in hyp mode (i.e. is non-VHE). In these
      cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can
      lead to CONSTRAINED UNPREDICTABLE behaviour.
      
      Since we use this value to derive the MDCR_EL2 value when switching
to/from a guest, after a guest has been run, the performance counters
      do not behave as expected. This has been observed to result in accesses
      via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
      counters, resulting in events not being counted. In these cases, only
      the fixed-purpose cycle counter appears to work as expected.
      
      Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.
      
Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: 1e947bad ("arm64: KVM: Skip HYP setup when already running in HYP")
Tested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
• KVM: VMX: enable nested virtualization by default · 1e58e5e5
  Paolo Bonzini authored
      With live migration support and finally a good solution for exception
      event injection, nested VMX should be ready for having a stable userspace
      ABI.  The results of syzkaller fuzzing are not perfect but not horrible
      either (and might be partially due to running on GCE, so that effectively
      we're testing three-level nesting on a fork of upstream KVM!).  Enabling
      it by default seems like a nice way to conclude the 4.20 pull request. :)
      
      Unfortunately, enabling nested SVM in 2009 (commit 4b6e4dca) was a
      bit premature.  However, until live migration support is in place we can
      reasonably expect that it does not offer much in terms of ABI guarantees.
      Therefore we are still in time to break things and conform as much as
      possible to the interface used for VMX.
Suggested-by: Jim Mattson <jmattson@google.com>
Suggested-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Celebrated-by: Liran Alon <liran.alon@oracle.com>
Celebrated-by: Wanpeng Li <kernellwp@gmail.com>
Celebrated-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• KVM/x86: Use 32bit xor to clear registers in svm.c · 43ce76ce
  Uros Bizjak authored
On x86_64, a 32-bit xor is zero-extended to the full 64-bit register, so
the shorter 32-bit encoding is sufficient for clearing a register.

Also add a comment and remove an unnecessary instruction suffix in vmx.c.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD · c4f55198
  Jim Mattson authored
      This is a per-VM capability which can be enabled by userspace so that
      the faulting linear address will be included with the information
      about a pending #PF in L2, and the "new DR6 bits" will be included
      with the information about a pending #DB in L2. With this capability
      enabled, the L1 hypervisor can now intercept #PF before CR2 is
      modified. Under VMX, the L1 hypervisor can now intercept #DB before
      DR6 and DR7 are modified.
      
      When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should
      generally provide an appropriate payload when injecting a #PF or #DB
      exception via KVM_SET_VCPU_EVENTS. However, to support restoring old
      checkpoints, this payload is not required.
      
      Note that bit 16 of the "new DR6 bits" is set to indicate that a debug
      exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM
      region while advanced debugging of RTM transactional regions was
      enabled. This is the reverse of DR6.RTM, which is cleared in this
      scenario.
      
      This capability also enables exception.pending in struct
      kvm_vcpu_events, which allows userspace to distinguish between pending
      and injected exceptions.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• kvm: vmx: Defer setting of DR6 until #DB delivery · f10c729f
  Jim Mattson authored
      When exception payloads are enabled by userspace (which is not yet
      possible) and a #DB is raised in L2, defer the setting of DR6 until
      later. Under VMX, this allows the L1 hypervisor to intercept the fault
      before DR6 is modified. Under SVM, DR6 is modified before L1 can
      intercept the fault (as has always been the case with DR7).
      
      Note that the payload associated with a #DB exception includes only
the "new DR6 bits." When the payload is delivered, DR6.B0-B3 will be
      cleared and DR6.RTM will be set prior to merging in the new DR6 bits.
      
      Also note that bit 16 in the "new DR6 bits" is set to indicate that a
      debug exception (#DB) or a breakpoint exception (#BP) occurred inside
      an RTM region while advanced debugging of RTM transactional regions
      was enabled. Though the reverse of DR6.RTM, this makes the #DB payload
      field compatible with both the pending debug exceptions field under
      VMX and the exit qualification for #DB exceptions under VMX.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• kvm: x86: Defer setting of CR2 until #PF delivery · da998b46
  Jim Mattson authored
      When exception payloads are enabled by userspace (which is not yet
      possible) and a #PF is raised in L2, defer the setting of CR2 until
      the #PF is delivered. This allows the L1 hypervisor to intercept the
      fault before CR2 is modified.
      
      For backwards compatibility, when exception payloads are not enabled
      by userspace, kvm_multiple_exception modifies CR2 when the #PF
      exception is raised.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• kvm: x86: Add payload operands to kvm_multiple_exception · 91e86d22
  Jim Mattson authored
      kvm_multiple_exception now takes two additional operands: has_payload
      and payload, so that updates to CR2 (and DR6 under VMX) can be delayed
      until the exception is delivered. This is necessary to properly
      emulate VMX or SVM hardware behavior for nested virtualization.
      
      The new behavior is triggered by
      vcpu->kvm->arch.exception_payload_enabled, which will (later) be set
      by a new per-VM capability, KVM_CAP_EXCEPTION_PAYLOAD.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• kvm: x86: Add exception payload fields to kvm_vcpu_events · 59073aaf
  Jim Mattson authored
      The per-VM capability KVM_CAP_EXCEPTION_PAYLOAD (to be introduced in a
      later commit) adds the following fields to struct kvm_vcpu_events:
      exception_has_payload, exception_payload, and exception.pending.
      
      With this capability set, all of the details of vcpu->arch.exception,
      including the payload for a pending exception, are reported to
      userspace in response to KVM_GET_VCPU_EVENTS.
      
      With this capability clear, the original ABI is preserved, and the
      exception.injected field is set for either pending or injected
      exceptions.
      
      When userspace calls KVM_SET_VCPU_EVENTS with
      KVM_CAP_EXCEPTION_PAYLOAD clear, exception.injected is no longer
      translated to exception.pending. KVM_SET_VCPU_EVENTS can now only
      establish a pending exception when KVM_CAP_EXCEPTION_PAYLOAD is set.
Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 16 Oct, 2018 21 commits