1. 03 May, 2017 2 commits
  2. 02 May, 2017 1 commit
  3. 28 Apr, 2017 1 commit
  4. 27 Apr, 2017 12 commits
  5. 26 Apr, 2017 4 commits
  6. 21 Apr, 2017 9 commits
    • KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK · e891a32e
      Marcelo Tosatti authored
      The disabling of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK
      attempts to prevent software suspend from causing "non atomic behaviour"
      of the operation:
      
          Add a helper function to compute the kernel time and convert nanoseconds
          back to CPU specific cycles.  Note that these must not be called in preemptible
          context, as that would mean the kernel could enter software suspend state,
          which would cause non-atomic operation.
      
      However, assume the kernel can enter software suspend at the following 2 points:
      
              ktime_get_ts(&ts);
      1.  <- suspend/resume could happen here; imagine a
             hypothetical_ktime_get_ts(&ts) performed right after resume
              monotonic_to_bootbased(&ts);
      2.  <- suspend/resume could happen here
      
      monotonic_to_bootbased() is correct relative to a hypothetical_ktime_get_ts(&ts)
      performed after point 1 (that is, after resuming from software suspend).
      
      Therefore it is also correct for the ktime_get_ts(&ts) before point 1,
      which is
      
      	ktime_get_ts(&ts) = hypothetical_ktime_get_ts(&ts) + time-to-execute-suspend-code
      
      Note CLOCK_MONOTONIC does not count during suspension.
      
      With this reasoning, and given the -RT bug that the irq disablement causes
      (spin_lock is a sleeping lock on -RT kernels), remove the IRQ protection,
      which triggers the following warning on -RT kernels:
      
       [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m
       [ 1064.668110] INFO: lockdep is turned off.
       [ 1064.668110] irq event stamp: 0
       [ 1064.668112] hardirqs last  enabled at (0): [<          (null)>]  )
       [ 1064.668116] hardirqs last disabled at (0): [] c0
       [ 1064.668118] softirqs last  enabled at (0): [] c0
       [ 1064.668118] softirqs last disabled at (0): [<          (null)>]  )
       [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1
       [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5
       [ 1064.668123]  ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3
       [ 1064.668125]  ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0
       [ 1064.668126]  00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0
       [ 1064.668126] Call Trace:
       [ 1064.668132]  [] dump_stack+0x19/0x1b
       [ 1064.668135]  [] __might_sleep+0x12d/0x1f0
       [ 1064.668138]  [] rt_spin_lock+0x24/0x60
       [ 1064.668155]  [] __get_kvmclock_ns+0x36/0x110 [k]
       [ 1064.668159]  [] ? futex_wait_queue_me+0x103/0x10
       [ 1064.668171]  [] kvm_arch_vm_ioctl+0xa2/0xd70 [k]
       [ 1064.668173]  [] ? futex_wait+0x1ac/0x2a0
      
      v2: notice get_kvmclock_ns with the same problem (Pankaj).
      v3: remove useless helper function (Pankaj).
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
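      
      For reference, KVM_GET_CLOCK/KVM_SET_CLOCK are plain VM ioctls; a minimal
      userspace sketch of the save/restore pair they serve (standard linux/kvm.h
      definitions; error handling mostly elided):
      
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
      
          /* Read the VM's kvmclock value and write it back later, as a
           * suspend/migration cycle would.  vm_fd comes from KVM_CREATE_VM. */
          static int save_restore_kvmclock(int vm_fd)
          {
                  struct kvm_clock_data data;
      
                  if (ioctl(vm_fd, KVM_GET_CLOCK, &data) < 0)
                          return -1;
                  /* ... suspend/resume or migration happens here ... */
                  return ioctl(vm_fd, KVM_SET_CLOCK, &data);
          }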
    • kvm: better MWAIT emulation for guests · 668fffa3
      Michael S. Tsirkin authored
      Guests that are heavy on futexes end up IPI'ing each other a lot. That
      can lead to significant slowdowns and latency increase for those guests
      when running within KVM.
      
      If only a single guest is needed on a host, we have a lot of spare host
      CPU time we can throw at the problem. Modern CPUs implement a feature
      called "MWAIT" which allows guests to wake up sleeping remote CPUs without
      an IPI - thus without an exit - at the expense of never going out of guest
      context.
      
      The decision whether this is something sensible to use should be up to the
      VM admin, i.e. to user space. We can, however, allow MWAIT execution on
      systems that properly support it in hardware.
      
      This patch adds a CAP to user space and a KVM cpuid leaf to indicate
      availability of native MWAIT execution. With that enabled, the worst a
      guest can do is waste as many cycles as a "jmp ." would do, so it's not
      a privilege problem.
      
      We consciously do *not* expose the feature in our CPUID bitmap, as most
      people will want to benefit from sleeping vCPUs to allow for overcommit.
      Reported-by: "Gabriel L. Somlo" <gsomlo@gmail.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      [agraf: fix amd, change commit message]
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
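      
      A minimal sketch of how a VMM would probe for this (assuming the capability
      name KVM_CAP_X86_GUEST_MWAIT introduced by this series and a new enough
      linux/kvm.h):
      
          #include <fcntl.h>
          #include <stdio.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
      
          int main(void)
          {
                  int kvm = open("/dev/kvm", O_RDWR);
      
                  if (kvm < 0)
                          return 1;
                  /* > 0 means native MWAIT execution is available; whether to
                   * expose MWAIT to the guest remains the VMM's decision. */
                  if (ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_X86_GUEST_MWAIT) > 0)
                          printf("native MWAIT execution available\n");
                  return 0;
          }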
    • KVM: x86: virtualize cpuid faulting · db2336a8
      Kyle Huey authored
      Hardware support for faulting on the cpuid instruction is not required to
      emulate it, because cpuid triggers a VM exit anyway. KVM handles the relevant
      MSRs (MSR_PLATFORM_INFO and MSR_MISC_FEATURES_ENABLES) and, upon a
      cpuid-induced VM exit, checks the cpuid faulting state and the CPL.
      kvm_require_cpl is even kind enough to inject the GP fault for us.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      [Return "1" from kvm_emulate_cpuid, it's not void. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
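      
      A condensed sketch of the resulting exit path (approximate, not the verbatim
      kernel code; emulate_cpuid() is a hypothetical stand-in for the regular
      emulation path, and cpuid_fault_enabled() stands for reading the faulting
      bit the guest set via MSR_MISC_FEATURES_ENABLES):
      
          int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
          {
                  /* kvm_require_cpl() checks the CPL and queues #GP(0) itself
                   * when the check fails, so we only need to bail out. */
                  if (cpuid_fault_enabled(vcpu) && !kvm_require_cpl(vcpu, 0))
                          return 1;       /* fault injected, re-enter the guest */
      
                  return emulate_cpuid(vcpu);     /* regular cpuid emulation */
          }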
    • Merge tag 'kvm-s390-next-4.12-2' of... · bd17117b
      Paolo Bonzini authored
      Merge tag 'kvm-s390-next-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      KVM: s390: Guarded storage fixup and keyless subset mode
      
      - detect and use the keyless subset mode (guests without
        storage keys)
      - fix vSIE support for sdnxc
      - fix machine check data for guarded storage
    • Merge branch 'kvm-ppc-next' of... · ec594c47
      Paolo Bonzini authored
      Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
    • Merge branch 'x86/process' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD · 8afd74c2
      Paolo Bonzini authored
      Required for KVM support of the CPUID faulting feature.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: drop vmm_exclusive module parameter · fe0e80be
      David Hildenbrand authored
      vmm_exclusive=0 leads to KVM always setting X86_CR4_VMXE and calling
      VMXON only when a vcpu is loaded. X86_CR4_VMXE is used in
      cpu_emergency_vmxoff() (called on kdump) as an indication of whether
      VMXOFF has to be called. That indication is obviously wrong if the two
      are managed independently: calling VMXOFF without a previous VMXON
      results in an exception.
      
      In addition, X86_CR4_VMXE is used as a means to test whether VMX is
      already in use by another VMM in hardware_enable(), so there can't really
      be coexistence. If the other VMM is prepared for coexistence and does a
      similar check, only one VMM can exist. If the other VMM is not prepared
      and blindly sets/clears X86_CR4_VMXE, we will get inconsistencies with
      X86_CR4_VMXE.
      
      As we have also had bug reports related to VMCS clearing with
      vmm_exclusive=0, this path seems to be pretty much untested, so let's drop it.
      
      While at it, directly move setting/clearing X86_CR4_VMXE into
      kvm_cpu_vmxon/off.
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
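      
      A simplified sketch of the pairing that results (illustrative only; the
      in-tree functions also handle Intel PT and fault cases):
      
          /* CR4.VMXE now always changes together with VMXON/VMXOFF, so
           * cpu_emergency_vmxoff() can trust it as an indicator. */
          static void kvm_cpu_vmxon(u64 vmxon_pointer)
          {
                  cr4_set_bits(X86_CR4_VMXE);     /* mark VMX in use first */
                  asm volatile ("vmxon %0" : : "m"(vmxon_pointer));
          }
      
          static void kvm_cpu_vmxoff(void)
          {
                  asm volatile ("vmxoff");
                  cr4_clear_bits(X86_CR4_VMXE);   /* clear only after VMXOFF */
          }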
    • KVM: s390: Support keyless subset guest mode · 730cd632
      Farhan Ali authored
      If the KSS facility is available on the machine, we also make it
      available for our KVM guests.
      
      The KSS facility bypasses storage key management as long as the guest
      does not issue a related instruction. When that happens, control is
      returned to the host, which has to turn off KSS for the guest vcpu
      before retrying the instruction.
      Signed-off-by: Corey S. McQuay <csmcquay@linux.vnet.ibm.com>
      Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
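      
      The intercept-and-retry flow this describes, as a hedged sketch (both
      helper names here are hypothetical stand-ins, not the actual kvm-s390
      functions):
      
          /* Guest executed a storage-key instruction while in keyless subset
           * mode: leave KSS for this vcpu, then retry the instruction, which
           * now runs with real storage key management. */
          static int handle_kss_intercept(struct kvm_vcpu *vcpu)
          {
                  vcpu_disable_kss(vcpu);         /* hypothetical helper */
                  return vcpu_retry_insn(vcpu);   /* hypothetical: re-enter SIE */
          }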
    • s390/sclp: Detect KSS facility · 71cb1bf6
      Farhan Ali authored
      Let's detect the keyless subset facility.
      Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
  7. 20 Apr, 2017 11 commits
    • ARM: KVM: Fix idmap stub entry when running Thumb-2 code · 1edb6321
      Marc Zyngier authored
      When entering the hyp stub implemented in the idmap, we try to
      be mindful of the fact that we could be running a Thumb-2 kernel
      by adding 1 to the address we compute. Unfortunately, the assembler
      also knows about this trick, and has already generated an address
      that has bit 0 set in the literal pool.
      
      Our superfluous correction ends up confusing the CPU entirely,
      as we now branch to the stub in ARM mode instead of Thumb, and on
      a possibly unaligned address for good measure. From that point,
      nothing really good happens.
      
      The obvious fix is to remove this stupid target PC correction.
      
      Fixes: 6bebcecb ("ARM: KVM: Allow the main HYP code to use the init hyp stub implementation")
      Reported-by: Christoffer Dall <cdall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
    • ARM: hyp-stub: Fix Thumb-2 compilation · 5b560525
      Marc Zyngier authored
      The assembler defaults to emitting the short form of ADR, leading
      to an out-of-range immediate. Using the wide version solves this
      issue.
      
      Fixes: bc845e4f ("ARM: KVM: Implement HVC_RESET_VECTORS stub hypercall in the init code")
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
    • KVM: PPC: Book3S PR: Do not fail emulation with mtspr/mfspr for unknown SPRs · feafd13c
      Thomas Huth authored
      According to the PowerISA 2.07, mtspr and mfspr should not always
      generate an illegal instruction exception when used with an
      undefined SPR; depending on the SPR number, the instruction should
      instead be treated as a NOP or a privilege exception should be injected.
      Also turn the printk here into a ratelimited print statement, so that
      the guest cannot flood the host's dmesg log by issuing lots of
      illegal mtspr/mfspr instructions.
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
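      
      A condensed sketch of that fallback policy (approximate; the in-tree
      Book3S PR emulation has a few more cases, and SPR numbers with bit 0x10
      set are the privileged ones):
      
          /* Unknown SPR: privileged numbers trap from problem state with a
           * privilege exception, everything else is treated as a NOP.  The
           * print is ratelimited so a guest cannot flood the host's dmesg. */
          static int emulate_unknown_spr(struct kvm_vcpu *vcpu, int sprn)
          {
                  pr_info_ratelimited("KVM: invalid SPR access: %d\n", sprn);
                  if ((sprn & 0x10) && (kvmppc_get_msr(vcpu) & MSR_PR)) {
                          kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
                          return EMULATE_AGAIN;
                  }
                  return EMULATE_DONE;    /* treat as NOP */
          }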
    • KVM: PPC: VFIO: Add in-kernel acceleration for VFIO · 121f80ba
      Alexey Kardashevskiy authored
      This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
      and H_STUFF_TCE requests targeting an IOMMU TCE table used for VFIO
      without passing them to user space, which saves time on switching
      to user space and back.
      
      This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
      KVM tries to handle a TCE request in real mode; if that fails,
      it passes the request to virtual mode to complete the operation.
      If the virtual mode handler fails as well, the request is passed to
      user space; this is not expected to happen, though.
      
      To avoid dealing with page use counters (which is tricky in real mode),
      this only accelerates SPAPR TCE IOMMU v2 clients which are required
      to pre-register the userspace memory. The very first TCE request will
      be handled in the VFIO SPAPR TCE driver anyway as the userspace view
      of the TCE table (iommu_table::it_userspace) is not allocated till
      the very first mapping happens and we cannot call vmalloc in real mode.
      
      If we fail to update a hardware IOMMU table for an unexpected reason, we just
      clear it and move on, as there is nothing we can really do about it -
      for example, if we hot plug a VFIO device to a guest, existing TCE tables
      will be mirrored automatically to the hardware and there is no interface
      to report to the guest about possible failures.
      
      This adds a new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to
      the VFIO KVM device. It takes a VFIO group fd and an SPAPR TCE table fd
      and associates a physical IOMMU table with the SPAPR TCE table (which
      is a guest view of the hardware IOMMU table). The iommu_table object
      is cached and referenced so we do not have to look it up in real mode
      (see the usage sketch at the end of this entry).
      
      This does not implement the UNSET counterpart as there is no use for it -
      once the acceleration is enabled, the existing userspace won't
      disable it unless a VFIO container is destroyed; this adds necessary
      cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.
      
      This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
      space.
      
      This adds a real mode version of WARN_ON_ONCE(), as the generic version
      causes problems with rcu_sched. Since the code tests what vmalloc_to_phys()
      returns, this also adds a check to the already existing
      vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().
      
      This finally makes use of vfio_external_user_iommu_id() which was
      introduced quite some time ago and was considered for removal.
      
      Tests show that this patch increases transmission speed from 220MB/s
      to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb ethernet card).
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
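      
      A userspace sketch of wiring a VFIO group to a TCE table through the new
      attribute (fds assumed already open; assumes the uapi struct
      kvm_vfio_spapr_tce with groupfd/tablefd fields from this series):
      
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
      
          /* Associate a VFIO group with an SPAPR TCE table so the in-kernel
           * H_PUT_TCE handlers can update the hardware IOMMU table directly.
           * vfio_dev_fd is the fd of a KVM_DEV_TYPE_VFIO device. */
          static int set_spapr_tce(int vfio_dev_fd, int group_fd, int table_fd)
          {
                  struct kvm_vfio_spapr_tce param = {
                          .groupfd = group_fd,  /* VFIO group fd */
                          .tablefd = table_fd,  /* fd from KVM_CREATE_SPAPR_TCE */
                  };
                  struct kvm_device_attr attr = {
                          .group = KVM_DEV_VFIO_GROUP,
                          .attr  = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
                          .addr  = (__u64)(unsigned long)&param,
                  };
      
                  return ioctl(vfio_dev_fd, KVM_SET_DEVICE_ATTR, &attr);
          }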
    • KVM: PPC: iommu: Unify TCE checking · b1af23d8
      Alexey Kardashevskiy authored
      This reworks the helpers for checking TCE update parameters in such a way
      that they can be used in KVM.
      
      This should cause no behavioral change.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Use preregistered memory API to access TCE list · da6f59e1
      Alexey Kardashevskiy authored
      VFIO on sPAPR already implements guest memory pre-registration
      when the entire guest RAM gets pinned. This can be used to translate
      the physical address of a guest page containing the TCE list
      from H_PUT_TCE_INDIRECT.
      
      This makes use of the pre-registered memory API to access TCE list
      pages, in order to avoid unnecessary locking on the KVM memory
      reverse map: we know that all of guest memory is pinned and
      we have a flat array mapping GPA to HPA, which makes it simpler and
      quicker to index into that array (even with looking up the
      kernel page tables in vmalloc_to_phys) than it is to find the memslot,
      lock the rmap entry, look up the user page tables, and unlock the rmap
      entry. Note that the rmap pointer is initialized to NULL
      where declared (not in this patch).
      
      If a requested chunk of memory has not been preregistered, this will
      fall back to the non-preregistered case and lock the rmap.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Pass kvm* to kvmppc_find_table() · 503bfcbe
      Alexey Kardashevskiy authored
      The guest-view TCE tables are per-KVM anyway (not per-VCPU), so pass kvm*
      there. This will be used in the following patches, where we will be
      attaching VFIO containers to LIOBNs via an ioctl() to KVM (rather than
      to a VCPU); the resulting signature is sketched below.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
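      
      The resulting lookup, roughly (a sketch of the post-patch signature):
      
          struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm *kvm,
                                                           unsigned long liobn);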
    • KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently · e91aa8e6
      Alexey Kardashevskiy authored
      It does not make much sense to have KVM on book3s-64 without
      the IOMMU bits for PCI passthrough support, as they cost little
      and allow VFIO to function on book3s KVM.
      
      Having IOMMU_API always enabled makes it unnecessary to have a lot of
      "#ifdef IOMMU_API" in arch/powerpc/kvm/book3s_64_vio*. With those
      ifdefs we could only have user-space-emulated devices accelerated
      (but not VFIO), which does not seem very useful.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number · 4898d3f4
      Alexey Kardashevskiy authored
      This adds a capability number for in-kernel support for VFIO on
      the SPAPR platform.
      
      The capability tells user space whether the in-kernel handlers of
      H_PUT_TCE can handle VFIO-targeted requests or not. If not, user space
      must not create a TCE table in the host kernel via
      the KVM_CREATE_SPAPR_TCE ioctl, because in that case TCE requests
      would not be passed on to user space, which is the desired action in
      such a situation.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next · 644d2d6f
      Paul Mackerras authored
      This merges in the commits in the topic/ppc-kvm branch of the powerpc
      tree to get the changes to arch/powerpc which subsequent patches will
      rely on.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Align the table size to system page size · 3762d45a
      Alexey Kardashevskiy authored
      At the moment userspace can request a table smaller than a page size,
      and this value will be stored as kvmppc_spapr_tce_table::size.
      However, the actual allocated size will still be aligned to the system
      page size, since alloc_page() is used there.
      
      This aligns the table size up to the system page size. It should not
      change the existing behaviour, but when the in-kernel TCE acceleration
      patchset reaches the upstream kernel, it will allow small TCE tables to be
      accelerated as well: the PCI IODA iommu_table allocator already aligns
      the size and, without this patch, an IOMMU group won't attach to a LIOBN
      due to the mismatched table size.
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
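      
      The alignment itself is a one-liner; a hedged sketch assuming 8-byte TCE
      entries, so one system page holds PAGE_SIZE / 8 of them (not the verbatim
      kernel line):
      
          /* Round the requested size (in TCE entries) up to a whole page's
           * worth of entries, matching what alloc_page() returns anyway. */
          size = ALIGN(size, PAGE_SIZE / sizeof(u64));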