1. 02 Nov, 2017 1 commit
    • Paolo Bonzini's avatar
      Merge branch 'kvm-ppc-next' of... · 6d6ab940
      Paolo Bonzini authored
      Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      
      Apart from various bugfixes and code cleanups, the major new feature
      is the ability to run guests using the hashed page table (HPT) MMU
      mode on a host that is using the radix MMU mode.  Because of limitations
      in the current POWER9 chip (all SMT threads in each core must use the
      same MMU mode, HPT or radix), this requires the host to be configured
      to run similar to POWER8: the host runs in single-threaded mode (only
      thread 0 of each core online), and have KVM be able to wake up the other
      threads when a KVM guest is to be run, and use the other threads for
      running guest VCPUs.  A new module parameter, called "indep_threads_mode",
      is normally Y on POWER9 but must be set to N before any HPT guests can
      be run on a radix host:
      
          # echo N >/sys/module/kvm_hv/parameters/indep_threads_mode
          # ppc64_cpu --smt=off
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6d6ab940
  2. 01 Nov, 2017 10 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts · c0101509
      Paul Mackerras authored
      This patch removes the restriction that a radix host can only run
      radix guests, allowing us to run HPT (hashed page table) guests as
      well.  This is useful because it provides a way to run old guest
      kernels that know about POWER8 but not POWER9.
      
      Unfortunately, POWER9 currently has a restriction that all threads
      in a given code must either all be in HPT mode, or all in radix mode.
      This means that when entering a HPT guest, we have to obtain control
      of all 4 threads in the core and get them to switch their LPIDR and
      LPCR registers, even if they are not going to run a guest.  On guest
      exit we also have to get all threads to switch LPIDR and LPCR back
      to host values.
      
      To make this feasible, we require that KVM not be in the "independent
      threads" mode, and that the CPU cores be in single-threaded mode from
      the host kernel's perspective (only thread 0 online; threads 1, 2 and
      3 offline).  That allows us to use the same code as on POWER8 for
      obtaining control of the secondary threads.
      
      To manage the LPCR/LPIDR changes required, we extend the kvm_split_info
      struct to contain the information needed by the secondary threads.
      All threads perform a barrier synchronization (where all threads wait
      for every other thread to reach the synchronization point) on guest
      entry, both before and after loading LPCR and LPIDR.  On guest exit,
      they all once again perform a barrier synchronization both before
      and after loading host values into LPCR and LPIDR.
      
      Finally, it is also currently necessary to flush the entire TLB every
      time we enter a HPT guest on a radix host.  We do this on thread 0
      with a loop of tlbiel instructions.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      c0101509
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Allow for running POWER9 host in single-threaded mode · 516f7898
      Paul Mackerras authored
      This patch allows for a mode on POWER9 hosts where we control all the
      threads of a core, much as we do on POWER8.  The mode is controlled by
      a module parameter on the kvm_hv module, called "indep_threads_mode".
      The normal mode on POWER9 is the "independent threads" mode, with
      indep_threads_mode=Y, where the host is in SMT4 mode (or in fact any
      desired SMT mode) and each thread independently enters and exits from
      KVM guests without reference to what other threads in the core are
      doing.
      
      If indep_threads_mode is set to N at the point when a VM is started,
      KVM will expect every core that the guest runs on to be in single
      threaded mode (that is, threads 1, 2 and 3 offline), and will set the
      flag that prevents secondary threads from coming online.  We can still
      use all four threads; the code that implements dynamic micro-threading
      on POWER8 will become active in over-commit situations and will allow
      up to three other VCPUs to be run on the secondary threads of the core
      whenever a VCPU is run.
      
      The reason for wanting this mode is that this will allow us to run HPT
      guests on a radix host on a POWER9 machine that does not support
      "mixed mode", that is, having some threads in a core be in HPT mode
      while other threads are in radix mode.  It will also make it possible
      to implement a "strict threads" mode in future, if desired.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      516f7898
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Add infrastructure for running HPT guests on radix host · 18c3640c
      Paul Mackerras authored
      This sets up the machinery for switching a guest between HPT (hashed
      page table) and radix MMU modes, so that in future we can run a HPT
      guest on a radix host on POWER9 machines.
      
      * The KVM_PPC_CONFIGURE_V3_MMU ioctl can now specify either HPT or
        radix mode, on a radix host.
      
      * The KVM_CAP_PPC_MMU_HASH_V3 capability now returns 1 on POWER9
        with HV KVM on a radix host.
      
      * The KVM_PPC_GET_SMMU_INFO returns information about the HPT MMU on a
        radix host.
      
      * The KVM_PPC_ALLOCATE_HTAB ioctl on a radix host will switch the
        guest to HPT mode and allocate a HPT.
      
      * For simplicity, we now allocate the rmap array for each memslot,
        even on a radix host, since it will be needed if the guest switches
        to HPT mode.
      
      * Since we cannot yet run a HPT guest on a radix host, the KVM_RUN
        ioctl will return an EINVAL error in that case.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      18c3640c
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Unify dirty page map between HPT and radix · e641a317
      Paul Mackerras authored
      Currently, the HPT code in HV KVM maintains a dirty bit per guest page
      in the rmap array, whether or not dirty page tracking has been enabled
      for the memory slot.  In contrast, the radix code maintains a dirty
      bit per guest page in memslot->dirty_bitmap, and only does so when
      dirty page tracking has been enabled.
      
      This changes the HPT code to maintain the dirty bits in the memslot
      dirty_bitmap like radix does.  This results in slightly less code
      overall, and will mean that we do not lose the dirty bits when
      transitioning between HPT and radix mode in future.
      
      There is one minor change to behaviour as a result.  With HPT, when
      dirty tracking was enabled for a memslot, we would previously clear
      all the dirty bits at that point (both in the HPT entries and in the
      rmap arrays), meaning that a KVM_GET_DIRTY_LOG ioctl immediately
      following would show no pages as dirty (assuming no vcpus have run
      in the meantime).  With this change, the dirty bits on HPT entries
      are not cleared at the point where dirty tracking is enabled, so
      KVM_GET_DIRTY_LOG would show as dirty any guest pages that are
      resident in the HPT and dirty.  This is consistent with what happens
      on radix.
      
      This also fixes a bug in the mark_pages_dirty() function for radix
      (in the sense that the function no longer exists).  In the case where
      a large page of 64 normal pages or more is marked dirty, the
      addressing of the dirty bitmap was incorrect and could write past
      the end of the bitmap.  Fortunately this case was never hit in
      practice because a 2MB large page is only 32 x 64kB pages, and we
      don't support backing the guest with 1GB huge pages at this point.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      e641a317
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Rename hpte_setup_done to mmu_ready · 1b151ce4
      Paul Mackerras authored
      This renames the kvm->arch.hpte_setup_done field to mmu_ready because
      we will want to use it for radix guests too -- both for setting things
      up before vcpu execution, and for excluding vcpus from executing while
      MMU-related things get changed, such as in future switching the MMU
      from radix to HPT mode or vice-versa.
      
      This also moves the call to kvmppc_setup_partition_table() that was
      done in kvmppc_hv_setup_htab_rma() for HPT guests, and the setting
      of mmu_ready, into the caller in kvmppc_vcpu_run_hv().
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      1b151ce4
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Don't rely on host's page size information · 8dc6cca5
      Paul Mackerras authored
      This removes the dependence of KVM on the mmu_psize_defs array (which
      stores information about hardware support for various page sizes) and
      the things derived from it, chiefly hpte_page_sizes[], hpte_page_size(),
      hpte_actual_page_size() and get_sllp_encoding().  We also no longer
      rely on the mmu_slb_size variable or the MMU_FTR_1T_SEGMENTS feature
      bit.
      
      The reason for doing this is so we can support a HPT guest on a radix
      host.  In a radix host, the mmu_psize_defs array contains information
      about page sizes supported by the MMU in radix mode rather than the
      page sizes supported by the MMU in HPT mode.  Similarly, mmu_slb_size
      and the MMU_FTR_1T_SEGMENTS bit are not set.
      
      Instead we hard-code knowledge of the behaviour of the HPT MMU in the
      POWER7, POWER8 and POWER9 processors (which are the only processors
      supported by HV KVM) - specifically the encoding of the LP fields in
      the HPT and SLB entries, and the fact that they have 32 SLB entries
      and support 1TB segments.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      8dc6cca5
    • Paul Mackerras's avatar
      Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next · 3e8f150a
      Paul Mackerras authored
      This merges in the ppc-kvm topic branch of the powerpc tree to get the
      commit that reverts the patch "KVM: PPC: Book3S HV: POWER9 does not
      require secondary thread management".  This is needed for subsequent
      patches which will be applied on this branch.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      3e8f150a
    • Nicholas Piggin's avatar
      KVM: PPC: Book3S: Fix gas warning due to using r0 as immediate 0 · 93897a1f
      Nicholas Piggin authored
      This fixes the message:
      
      arch/powerpc/kvm/book3s_segment.S: Assembler messages:
      arch/powerpc/kvm/book3s_segment.S:330: Warning: invalid register expression
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      93897a1f
    • Greg Kurz's avatar
      KVM: PPC: Book3S PR: Only install valid SLBs during KVM_SET_SREGS · f4093ee9
      Greg Kurz authored
      Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
      some of which are valid (ie, SLB_ESID_V is set) and the rest are
      likely all-zeroes (with QEMU at least).
      
      Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
      assumes to find the SLB index in the 3 lower bits of its rb argument.
      When passed zeroed arguments, it happily overwrites the 0th SLB entry
      with zeroes. This is exactly what happens while doing live migration
      with QEMU when the destination pushes the incoming SLB descriptors to
      KVM PR. When reloading the SLBs at the next synchronization, QEMU first
      clears its SLB array and only restore valid ones, but the 0th one is
      now gone and we cannot access the corresponding memory anymore:
      
      (qemu) x/x $pc
      c0000000000b742c: Cannot access memory
      
      To avoid this, let's filter out non-valid SLB entries. While here, we
      also force a full SLB flush before installing new entries. Since SLB
      is for 64-bit only, we now build this path conditionally to avoid a
      build break on 32-bit, which doesn't define SLB_ESID_V.
      Signed-off-by: default avatarGreg Kurz <groug@kaod.org>
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      f4093ee9
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Don't call real-mode XICS hypercall handlers if not enabled · 00bb6ae5
      Paul Mackerras authored
      When running a guest on a POWER9 system with the in-kernel XICS
      emulation disabled (for example by running QEMU with the parameter
      "-machine pseries,kernel_irqchip=off"), the kernel does not pass
      the XICS-related hypercalls such as H_CPPR up to userspace for
      emulation there as it should.
      
      The reason for this is that the real-mode handlers for these
      hypercalls don't check whether a XICS device has been instantiated
      before calling the xics-on-xive code.  That code doesn't check
      either, leading to potential NULL pointer dereferences because
      vcpu->arch.xive_vcpu is NULL.  Those dereferences won't cause an
      exception in real mode but will lead to kernel memory corruption.
      
      This fixes it by adding kvmppc_xics_enabled() checks before calling
      the XICS functions.
      
      Cc: stable@vger.kernel.org # v4.11+
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      00bb6ae5
  3. 20 Oct, 2017 4 commits
    • Wanpeng Li's avatar
      KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0 · 9ffd986c
      Wanpeng Li authored
      Both Intel SDM and AMD APM mentioned that MCi_STATUS, when the register is
      implemented, this register can be cleared by explicitly writing 0s to this
      register. Writing 1s to this register will cause a general-protection
      exception.
      
      The mce is emulated in qemu, so just the guest attempts to write 1 to this
      register should cause a #GP, this patch does it.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      9ffd986c
    • Wanpeng Li's avatar
      KVM: VMX: Fix VPID capability detection · 61f1dd90
      Wanpeng Li authored
      In my setup, EPT is not exposed to L1, the VPID capability is exposed and
      can be observed by vmxcap tool in L1:
      INVVPID supported                        yes
      Individual-address INVVPID               yes
      Single-context INVVPID                   yes
      All-context INVVPID                      yes
      Single-context-retaining-globals INVVPID yes
      
      However, the module parameter of VPID observed in L1 is always N, the
      cpu_has_vmx_invvpid() check in L1 KVM fails since vmx_capability.vpid
      is 0 and it is not read from MSR due to EPT is not exposed.
      
      The VPID can be used to tag linear mappings when EPT is not enabled. However,
      current logic just detects VPID capability if EPT is enabled, this patch
      fixes it.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      61f1dd90
    • Wanpeng Li's avatar
      KVM: nVMX: Fix EPT switching advertising · 575b3a2c
      Wanpeng Li authored
      I can use vmxcap tool to observe "EPTP Switching   yes" even if EPT is not
      exposed to L1.
      
      EPT switching is advertised unconditionally since it is emulated, however,
      it can be treated as an extended feature for EPT and it should not be
      advertised if EPT itself is not exposed. This patch fixes it.
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      575b3a2c
    • Michael Ellerman's avatar
      KVM: PPC: Tie KVM_CAP_PPC_HTM to the user-visible TM feature · 2a3d6553
      Michael Ellerman authored
      Currently we use CPU_FTR_TM to decide if the CPU/kernel can support
      TM (Transactional Memory), and if it's true we advertise that to
      Qemu (or similar) via KVM_CAP_PPC_HTM.
      
      PPC_FEATURE2_HTM is the user-visible feature bit, which indicates that
      the CPU and kernel can support TM. Currently CPU_FTR_TM and
      PPC_FEATURE2_HTM always have the same value, either true or false, so
      using the former for KVM_CAP_PPC_HTM is correct.
      
      However some Power9 CPUs can operate in a mode where TM is enabled but
      TM suspended state is disabled. In this mode CPU_FTR_TM is true, but
      PPC_FEATURE2_HTM is false. Instead a different PPC_FEATURE2 bit is
      set, to indicate that this different mode of TM is available.
      
      It is not safe to let guests use TM as-is, when the CPU is in this
      mode. So to prevent that from happening, use PPC_FEATURE2_HTM to
      determine the value of KVM_CAP_PPC_HTM.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      2a3d6553
  4. 19 Oct, 2017 1 commit
  5. 18 Oct, 2017 1 commit
  6. 15 Oct, 2017 1 commit
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Explicitly disable HPT operations on radix guests · 891f1ebf
      Paul Mackerras authored
      This adds code to make sure that we don't try to access the
      non-existent HPT for a radix guest using the htab file for the VM
      in debugfs, a file descriptor obtained using the KVM_PPC_GET_HTAB_FD
      ioctl, or via the KVM_PPC_RESIZE_HPT_{PREPARE,COMMIT} ioctls.
      
      At present nothing bad happens if userspace does access these
      interfaces on a radix guest, mostly because kvmppc_hpt_npte()
      gives 0 for a radix guest, which in turn is because 1 << -4
      comes out as 0 on POWER processors.  However, that relies on
      undefined behaviour, so it is better to be explicit about not
      accessing the HPT for a radix guest.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      891f1ebf
  7. 14 Oct, 2017 5 commits
  8. 12 Oct, 2017 17 commits