1. 27 Sep, 2016 1 commit
    • KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread · 88b02cf9
      Paul Mackerras authored
      POWER8 has one virtual timebase (VTB) register per subcore, not one
      per CPU thread.  The HV KVM code currently treats VTB as a per-thread
      register, which can lead to spurious soft lockup messages from guests
      which use the VTB as the time source for the soft lockup detector.
      (CPUs before POWER8 did not have the VTB register.)
      
      For HV KVM, this fixes the problem by making only the primary thread
      in each virtual core save and restore the VTB value.  With this,
      the VTB state becomes part of the kvmppc_vcore structure.  This
      also means that "piggybacking" of multiple virtual cores onto one
      subcore is not possible on POWER8, because then the virtual cores
      would share a single VTB register.
      
      PR KVM emulates a VTB register, which is per-vcpu because PR KVM
      has no notion of CPU threads or SMT.  For PR KVM we move the VTB
      state into the kvmppc_vcpu_book3s struct.
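
      The HV KVM rule described above (only the primary thread saves and
      restores the shared register) can be sketched in plain C. This is a
      userspace model, not the kernel code: the struct and function names
      are hypothetical, the hardware VTB is modelled as an ordinary
      variable, and thread 0 is assumed to be the primary thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware: one VTB register per subcore. */
struct subcore_sketch {
    uint64_t vtb;
};

/* Hypothetical stand-in for kvmppc_vcore: VTB state is kept once per virtual core. */
struct vcore_sketch {
    uint64_t saved_vtb;
};

/* Guest exit: only the primary thread (assumed to be thread 0) saves the shared VTB. */
static void save_vtb(struct vcore_sketch *vc, const struct subcore_sketch *sc,
                     int thread_id)
{
    if (thread_id == 0)
        vc->saved_vtb = sc->vtb;
}

/* Guest entry: only the primary thread restores it; secondary threads do nothing. */
static void restore_vtb(const struct vcore_sketch *vc, struct subcore_sketch *sc,
                        int thread_id)
{
    if (thread_id == 0)
        sc->vtb = vc->saved_vtb;
}
```

      Because the saved value lives in the per-vcore structure, secondary
      threads cannot overwrite it with their own view of the register.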
      
      Cc: stable@vger.kernel.org # v3.14+
      Reported-by: Thomas Huth <thuth@redhat.com>
      Tested-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  2. 20 Sep, 2016 5 commits
  3. 16 Sep, 2016 6 commits
  4. 13 Sep, 2016 2 commits
  5. 12 Sep, 2016 12 commits
    • KVM: PPC: e500: Use kmalloc_array() in kvmppc_e500_tlb_init() · 90235dc1
      Markus Elfring authored
      * A multiplication for the size determination of a memory allocation
        indicated that an array data structure should be processed.
        Thus use the corresponding function "kmalloc_array".
      
      * Replace the specification of a data structure by a pointer dereference
        to make the corresponding size determination a bit safer according to
        the Linux coding style convention.
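
      The two points above can be illustrated with a userspace analogue.
      This is a sketch only: alloc_array_sketch() is a hypothetical helper
      that mimics the overflow check an array allocator performs, and
      struct tlb_entry_sketch is an invented element type, not the e500
      structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of an array allocator: refuse the
 * request when n * size would overflow size_t instead of silently
 * allocating a buffer that is too small.
 */
static void *alloc_array_sketch(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;
    return malloc(n * size);
}

/* Invented element type, used only for this illustration. */
struct tlb_entry_sketch {
    uint64_t word0;
    uint64_t word1;
};

static struct tlb_entry_sketch *alloc_entries(size_t count)
{
    struct tlb_entry_sketch *entries;

    /*
     * sizeof(*entries) follows the pointer's type, so the size stays
     * correct if the declaration above is ever changed.
     */
    entries = alloc_array_sketch(count, sizeof(*entries));
    return entries;
}
```
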
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: e500: Replace kzalloc() calls by kcalloc() in two functions · b0ac477b
      Markus Elfring authored
      * A multiplication for the size determination of a memory allocation
        indicated that an array data structure should be processed.
        Thus use the corresponding function "kcalloc".
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      
        This issue was detected also by using the Coccinelle software.
      
      * Replace the specification of data structures by pointer dereferences
        to make the corresponding size determination a bit safer according to
        the Linux coding style convention.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: e500: Delete an unnecessary initialisation in kvm_vcpu_ioctl_config_tlb() · cfb60813
      Markus Elfring authored
      The local variable "g2h_bitmap" will be set to an appropriate value
      a bit later. Thus omit the explicit initialisation at the beginning.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: e500: Less function calls in kvm_vcpu_ioctl_config_tlb() after error detection · 46d4e747
      Markus Elfring authored
      The kfree() function was called in two cases by the
      kvm_vcpu_ioctl_config_tlb() function during error handling
      even if the passed data structure element contained a null pointer.
      
      * Split a condition check for memory allocation failures.
      
      * Adjust jump targets according to the Linux coding style convention.
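
      A minimal sketch of the pattern described above, assuming two
      allocations that previously shared one failure check. The struct,
      the function name, and the fail_first/fail_second parameters
      (hooks for simulating allocation failure) are hypothetical and not
      taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical container for the two buffers being configured. */
struct tlb_config_sketch {
    int *privs;
    int *g2h_bitmap;
};

/*
 * Each allocation gets its own failure check, and each jump target
 * frees only what has been allocated so far, so free() is never
 * called on a pointer known to be NULL.
 */
static int config_sketch(struct tlb_config_sketch *cfg, size_t n,
                         int fail_first, int fail_second)
{
    int *privs, *bitmap;
    int ret;

    privs = fail_first ? NULL : calloc(n, sizeof(*privs));
    if (!privs) {
        ret = -ENOMEM;
        goto out;               /* nothing allocated yet */
    }

    bitmap = fail_second ? NULL : calloc(n, sizeof(*bitmap));
    if (!bitmap) {
        ret = -ENOMEM;
        goto free_privs;        /* only the first buffer exists */
    }

    cfg->privs = privs;
    cfg->g2h_bitmap = bitmap;
    return 0;

free_privs:
    free(privs);
out:
    return ret;
}
```
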
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: e500: Use kmalloc_array() in kvm_vcpu_ioctl_config_tlb() · f3c0ce86
      Markus Elfring authored
      * A multiplication for the size determination of a memory allocation
        indicated that an array data structure should be processed.
        Thus use the corresponding function "kmalloc_array".
      
        This issue was detected by using the Coccinelle software.
      
      * Replace the specification of a data type by a pointer dereference
        to make the corresponding size determination a bit safer according to
        the Linux coding style convention.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Counters for passthrough IRQ stats · 65e7026a
      Suresh Warrier authored
      Add VCPU stat counters to track affinity for passthrough
      interrupts.
      
      pthru_all: Counts all passthrough interrupts whose IRQ mappings are
                 in the kvmppc_passthru_irq_map structure.
      pthru_host: Counts all cached passthrough interrupts that were injected
                  from the host through kvm_set_irq (i.e. not handled in
                  real mode).
      pthru_bad_aff: Counts how many cached passthrough interrupts have
                     bad affinity (receiving CPU is not running VCPU that is
                     the target of the virtual interrupt in the guest).
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Set server for passed-through interrupts · 5d375199
      Paul Mackerras authored
      When a guest has a PCI pass-through device with an interrupt, it
      will direct the interrupt to a particular guest VCPU.  In fact the
      physical interrupt might arrive on any CPU, and then get
      routed to the target VCPU in the emulated XICS (guest interrupt
      controller), and eventually delivered to the target VCPU.
      
      Now that we have code to handle device interrupts in real mode
      without exiting to the host kernel, there is an advantage to having
      the device interrupt arrive on the same (sub)core as the target
      VCPU is running on.  In this situation, the interrupt can be
      delivered to the target VCPU without any exit to the host kernel
      (using a hypervisor doorbell interrupt between threads if
      necessary).
      
      This patch aims to get passed-through device interrupts arriving
      on the correct core by setting the interrupt server in the real
      hardware XICS for the interrupt to the first thread in the (sub)core
      where its target VCPU is running.  We do this in the real-mode H_EOI
      code because the H_EOI handler already needs to look at the
      emulated ICS state for the interrupt (whereas the H_XIRR handler
      doesn't), and we know we are running in the target VCPU context
      at that point.
      
      We set the server CPU in hardware using an OPAL call, regardless of
      what the IRQ affinity mask for the interrupt says, and without
      updating the affinity mask.  This amounts to saying that when an
      interrupt is passed through to a guest, as a matter of policy we
      allow the guest's affinity for the interrupt to override the host's.
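
      The retargeting policy above can be sketched as follows. This is a
      simplified model under stated assumptions: CPU thread numbers
      within a (sub)core are contiguous, the thread count per (sub)core
      is a power of two, and the OPAL call is replaced by a plain field
      update. All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for the server setting held by the hardware XICS. */
struct hw_irq_sketch {
    int server;
};

/* First thread of the (sub)core that contains the given CPU thread. */
static int first_thread_of_subcore(int cpu, int threads_per_subcore)
{
    /* The mask trick below is only valid for a power-of-two thread count. */
    assert(threads_per_subcore > 0 &&
           (threads_per_subcore & (threads_per_subcore - 1)) == 0);
    return cpu & ~(threads_per_subcore - 1);
}

/*
 * Called from the (modelled) H_EOI path, where we know we are running
 * in the target VCPU's context on current_cpu.
 */
static void retarget_on_eoi(struct hw_irq_sketch *irq, int current_cpu,
                            int threads_per_subcore)
{
    int target = first_thread_of_subcore(current_cpu, threads_per_subcore);

    /* Stands in for the OPAL call; the host affinity mask is not consulted. */
    if (irq->server != target)
        irq->server = target;
}
```
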
      
      This is inspired by an earlier patch from Suresh Warrier, although
      none of this code came from that earlier patch.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode · 366274f5
      Suresh Warrier authored
      When a passthrough IRQ is handled completely within KVM real
      mode code, it has to also update the IRQ stats since this
      does not go through the generic IRQ handling code.
      
      However, the per CPU kstat_irqs field is an allocated (not static)
      field and so cannot be directly accessed in real mode safely.
      
      The function this_cpu_inc_rm() is introduced to safely increment
      per CPU fields (currently coded for unsigned integers only) that
      are allocated and could thus be vmalloced also.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass · 644abbb2
      Suresh Warrier authored
      Add a module parameter kvm_irq_bypass for kvm_hv.ko to
      disable IRQ bypass for passthrough interrupts. The default
      value of this tunable is 1, that is, the feature is enabled.
      
      Since the tunable is used by built-in kernel code, we use
      the module_param_cb macro to achieve this.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Dump irqmap in debugfs · af893c7d
      Suresh Warrier authored
      Dump the passthrough irqmap structure associated with a
      guest as part of /sys/kernel/debug/powerpc/kvm-xics-*.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Complete passthrough interrupt in host · f7af5209
      Suresh Warrier authored
      In existing real mode ICP code, when updating the virtual ICP
      state, if there is a required action that cannot be completely
      handled in real mode (for instance, a VCPU needs to be woken
      up), flags are set in the ICP to indicate the required action.
      These flags are checked when returning from hypercalls to decide
      whether the call needs to switch back to the host, where the action
      can be performed in virtual mode. Note that if h_ipi_redirect is
      enabled, real mode code will first try to message a free host CPU
      to complete this job instead of returning to the host to do it
      ourselves.
      
      Currently, the real mode PCI passthrough interrupt handling code
      checks if any of these flags are set and simply returns to the host.
      This is not good enough as the trap value (0x500) is treated as an
      external interrupt by the host code. It is only when the trap value
      is a hypercall that the host code searches for and acts on unfinished
      work by calling kvmppc_xics_rm_complete.
      
      This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD
      which is returned by KVM if there is unfinished business to be
      completed in host virtual mode after handling a PCI passthrough
      interrupt. The host checks for this special interrupt condition
      and calls into the kvmppc_xics_rm_complete, which is made an
      exported function for this reason.
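
      The control flow described above can be modelled roughly as below.
      Only the value 0x500 comes from the text; the numeric values chosen
      for the hypercall trap and for the new special trap are
      illustrative assumptions, and all identifiers are hypothetical.

```c
#include <assert.h>

#define TRAP_EXTERNAL_SKETCH    0x500   /* external interrupt, as stated above */
#define TRAP_HCALL_SKETCH       0xc00   /* assumed value, for illustration only */
#define TRAP_HV_RM_HARD_SKETCH  0x5555  /* assumed value, for illustration only */

enum host_action_sketch {
    HOST_HANDLE_EXTERNAL_IRQ,   /* generic external interrupt handling */
    HOST_HANDLE_HCALL,          /* hypercall path, which also completes pending work */
    HOST_COMPLETE_RM_WORK,      /* finish what real mode could not do */
    HOST_OTHER
};

/*
 * Real-mode side: after handling a passthrough interrupt, report the
 * special trap only if some action was left unfinished. Returning 0
 * models staying in the guest.
 */
static int rm_passthru_exit_trap(unsigned int pending_rm_action)
{
    return pending_rm_action ? TRAP_HV_RM_HARD_SKETCH : 0;
}

/* Host side: the special trap is no longer mistaken for a plain 0x500. */
static enum host_action_sketch host_dispatch(int trap)
{
    switch (trap) {
    case TRAP_HV_RM_HARD_SKETCH:
        return HOST_COMPLETE_RM_WORK;
    case TRAP_HCALL_SKETCH:
        return HOST_HANDLE_HCALL;
    case TRAP_EXTERNAL_SKETCH:
        return HOST_HANDLE_EXTERNAL_IRQ;
    default:
        return HOST_OTHER;
    }
}
```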
      
      [paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
       in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.]
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Handle passthrough interrupts in guest · e3c13e56
      Suresh Warrier authored
      Currently, KVM switches back to the host to handle any external
      interrupt (when the interrupt is received while running in the
      guest). This patch updates real-mode KVM to check if an interrupt
      is generated by a passthrough adapter that is owned by this guest.
      If so, the real mode KVM will directly inject the corresponding
      virtual interrupt to the guest VCPU's ICS and also EOI the interrupt
      in hardware. In short, the interrupt is handled entirely in real
      mode in the guest context without switching back to the host.
      
      In some rare cases, the interrupt cannot be completely handled in
      real mode, for instance, a VCPU that is sleeping needs to be woken
      up. In this case, KVM simply switches back to the host with trap
      reason set to 0x500. This works, but it is clearly not very efficient.
      A following patch will distinguish this case and handle it
      correctly in the host. Note that we can use the existing
      check_too_hard() routine even though we are not in a hypercall to
      determine if there is unfinished business that needs to be
      completed in host virtual mode.
      
      The patch assumes that the mapping between hardware interrupt IRQ
      and virtual IRQ to be injected to the guest already exists for the
      PCI passthrough interrupts that need to be handled in real mode.
      If the mapping does not exist, KVM falls back to the default
      existing behavior.
      
      The KVM real mode code reads mappings from the mapped array in the
      passthrough IRQ map without taking any lock.  We carefully order the
      loads and stores of the fields in the kvmppc_irq_map data structure
      using memory barriers to avoid an inconsistent mapping being seen by
      the reader. Thus, although it is possible to miss a map entry, it is
      not possible to read a stale value.
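
      The lock-free ordering described above can be approximated in
      portable C11. Note the substitution: the patch uses kernel memory
      barriers, whereas this sketch uses release/acquire atomics to get
      the same publish-then-read ordering. Field and function names are
      hypothetical, and 0 is assumed to mark an unused slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_MAPPED_SKETCH 8

struct irq_map_sketch {
    _Atomic uint32_t r_hwirq;   /* host hardware IRQ; 0 is assumed to mean "unused" */
    uint32_t v_hwirq;           /* IRQ number presented to the guest */
};

struct passthru_irqmap_sketch {
    struct irq_map_sketch mapped[MAX_MAPPED_SKETCH];
};

/*
 * Writer: fill in the payload first, then publish the key with release
 * ordering so the payload is visible to anyone who observes the key.
 */
static void publish_mapping(struct irq_map_sketch *slot, uint32_t r, uint32_t v)
{
    slot->v_hwirq = v;
    atomic_store_explicit(&slot->r_hwirq, r, memory_order_release);
}

/*
 * Reader: takes no lock. The acquire load pairs with the release store,
 * so a matching key implies the payload written before it is visible.
 * Returns 1 and sets *v_out on a hit, 0 on a miss.
 */
static int lookup_mapping(struct passthru_irqmap_sketch *map, uint32_t r,
                          uint32_t *v_out)
{
    if (r == 0)
        return 0;
    for (int i = 0; i < MAX_MAPPED_SKETCH; i++) {
        uint32_t key = atomic_load_explicit(&map->mapped[i].r_hwirq,
                                            memory_order_acquire);
        if (key == r) {
            *v_out = map->mapped[i].v_hwirq;
            return 1;
        }
    }
    return 0;
}
```

      A reader that runs before the release store simply misses the
      entry, which matches the "may miss, never inconsistent" behaviour
      described above. The sketch does not cover reusing a slot while
      readers are active.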
      
      [paulus@ozlabs.org - get irq_chip from irq_map rather than pimap,
       pulled out powernv eoi change into a separate patch, made
       kvmppc_read_intr get the vcpu from the paca rather than being
       passed in, rewrote the logic at the end of kvmppc_read_intr to
       avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S
       since we were always restoring SRR0/1 anyway, get rid of the cached
       array (just use the mapped array), removed the kick_all_cpus_sync()
       call, clear saved_xirr PACA field when we handle the interrupt in
       real mode, fix compilation with CONFIG_KVM_XICS=n.]
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  6. 09 Sep, 2016 9 commits
    • KVM: PPC: Book3S HV: Enable IRQ bypass · c57875f5
      Suresh Warrier authored
      Add the irq_bypass_add_producer and irq_bypass_del_producer
      functions. These functions get called whenever a GSI is being
      defined for a guest. They create/remove the mapping between
      host real IRQ numbers and the guest GSI.
      
      Add the following helper functions to manage the
      passthrough IRQ map.
      
      kvmppc_set_passthru_irq()
        Creates a mapping in the passthrough IRQ map that maps a host
        IRQ to a guest GSI. It allocates the structure (one per guest VM)
        the first time it is called.
      
      kvmppc_clr_passthru_irq()
        Removes the passthrough IRQ map entry given a guest GSI.
        The passthrough IRQ map structure is not freed even when the
        number of mapped entries goes to zero. It is only freed when
        the VM is destroyed.
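
      A userspace sketch of the two helpers' behaviour as described
      above: allocate the map lazily on first use, and clear an entry by
      zeroing its host IRQ field rather than compacting the array. The
      types, the fixed capacity, the -1 error value, and the policy of
      reusing a cleared slot for the same GSI are assumptions made for
      illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PASSTHRU_SKETCH 4

struct irq_entry_sketch {
    uint32_t r_hwirq;   /* host IRQ; 0 marks a cleared entry */
    uint32_t v_hwirq;   /* guest GSI */
};

struct irqmap_sketch {
    int n_mapped;
    struct irq_entry_sketch mapped[MAX_PASSTHRU_SKETCH];
};

/* Hypothetical stand-in for the per-VM state. */
struct vm_sketch {
    struct irqmap_sketch *pimap;
};

static int set_passthru_irq(struct vm_sketch *vm, uint32_t host_irq,
                            uint32_t guest_gsi)
{
    struct irqmap_sketch *map = vm->pimap;
    int i;

    if (!map) {
        /* First call for this VM: allocate the (zeroed) map. */
        map = calloc(1, sizeof(*map));
        if (!map)
            return -1;
        vm->pimap = map;
    }

    /* Reuse a slot that already names this GSI, if it was cleared. */
    for (i = 0; i < map->n_mapped; i++) {
        if (map->mapped[i].v_hwirq == guest_gsi) {
            if (map->mapped[i].r_hwirq)
                return -1;      /* already mapped */
            break;
        }
    }
    if (i == MAX_PASSTHRU_SKETCH)
        return -1;              /* map is full */

    map->mapped[i].v_hwirq = guest_gsi;
    map->mapped[i].r_hwirq = host_irq;
    if (i == map->n_mapped)
        map->n_mapped++;
    return 0;
}

static int clr_passthru_irq(struct vm_sketch *vm, uint32_t guest_gsi)
{
    struct irqmap_sketch *map = vm->pimap;

    if (!map)
        return -1;
    for (int i = 0; i < map->n_mapped; i++) {
        if (map->mapped[i].v_hwirq == guest_gsi && map->mapped[i].r_hwirq) {
            /* Zero the host IRQ; the entry and the map itself stay allocated. */
            map->mapped[i].r_hwirq = 0;
            return 0;
        }
    }
    return -1;
}
```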
      
      [paulus@ozlabs.org - modified to use is_pnv_opal_msi() rather than
       requiring all passed-through interrupts to use the same irq_chip;
       changed deletion so it zeroes out the r_hwirq field rather than
       copying the last entry down and decrementing the number of entries.]
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap · 8daaafc8
      Suresh Warrier authored
      This patch introduces an IRQ mapping structure, the
      kvmppc_passthru_irqmap structure that is to be used
      to map the real hardware IRQ in the host with the virtual
      hardware IRQ (gsi) that is injected into a guest by KVM for
      passthrough adapters.
      
      Currently, we assume a separate IRQ mapping structure for
      each guest. Each kvmppc_passthru_irqmap has a mapping array
      containing all defined real<->virtual IRQs.
      
      [paulus@ozlabs.org - removed irq_chip field from struct
       kvmppc_passthru_irqmap; changed parameter for
       kvmppc_get_passthru_irqmap from struct kvm_vcpu * to struct
       kvm *, removed small cached array.]
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: select IRQ_BYPASS_MANAGER · 9576730d
      Suresh Warrier authored
      Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set.
      Add the PPC producer functions for add and del producer.
      
      [paulus@ozlabs.org - Moved new functions from book3s.c to powerpc.c
       so booke compiles; added kvm_arch_has_irq_bypass implementation.]
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function · 37f55d30
      Suresh Warrier authored
      Modify kvmppc_read_intr to make it a C function.  Because it is called
      from kvmppc_check_wake_reason, any of the assembler code that calls
      either kvmppc_read_intr or kvmppc_check_wake_reason now has to assume
      that the volatile registers might have been modified.
      
      This also adds in the optimization of clearing saved_xirr in the case
      where we completely handle and EOI an IPI.  Without this, the next
      device interrupt will require two trips through the host interrupt
      handling code.
      
      [paulus@ozlabs.org - made kvmppc_check_wake_reason create a stack frame
       when it is calling kvmppc_read_intr, which means we can set r12 to
       the trap number (0x500) after the call to kvmppc_read_intr, instead
       of using r31.  Also moved the deliver_guest_interrupt label so as to
       restore XER and CTR, plus other minor tweaks.]
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • Merge branch 'kvm-ppc-infrastructure' into kvm-ppc-next · 99212c86
      Paul Mackerras authored
      This merges the topic branch 'kvm-ppc-infrastructure' into kvm-ppc-next
      so that I can then apply further patches that need the changes in the
      kvm-ppc-infrastructure branch.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • powerpc: move hmi.c to arch/powerpc/kvm/ · 3f257774
      Paolo Bonzini authored
      hmi.c functions are unused unless sibling_subcore_state is nonzero, and
      that in turn happens only if KVM is in use.  So move the code to
      arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
      rather than CONFIG_PPC_BOOK3S_64.  The sibling_subcore_state is also
      included in struct paca_struct only if KVM is supported by the kernel.
      
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: kvm-ppc@vger.kernel.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • powerpc/powernv: Provide facilities for EOI, usable from real mode · 4ee11c1a
      Suresh Warrier authored
      This adds a new function pnv_opal_pci_msi_eoi() which does the part of
      end-of-interrupt (EOI) handling of an MSI which involves doing an
      OPAL call.  This function can be called in real mode.  This doesn't
      just export pnv_ioda2_msi_eoi() because that does a call to
      icp_native_eoi(), which does not work in real mode.
      
      This also adds a function, is_pnv_opal_msi(), which KVM can call to
      check whether an interrupt is one for which we should be calling
      pnv_opal_pci_msi_eoi() when we need to do an EOI.
      
      [paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi()
       from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough
       interrupts in guest"; added is_pnv_opal_msi(); wrote description.]
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • powerpc: Add simple cache inhibited MMIO accessors · 07b1fdf5
      Suresh Warrier authored
      Add simple cache inhibited accessors for memory mapped I/O.
      Unlike the accessors built from the DEF_MMIO_* macros, these
      don't include any hardware memory barriers; callers need to
      manage memory barriers on their own. These can only be called
      in hypervisor real mode.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      [paulus@ozlabs.org - added line to comment]
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • powerpc/mm: Speed up computation of base and actual page size for a HPTE · 0eeede0c
      Paul Mackerras authored
      This replaces a 2-D search through an array with a simple 8-bit table
      lookup for determining the actual and/or base page size for a HPT entry.
      
      The encoding in the second doubleword of the HPTE is designed to encode
      the actual and base page sizes without using any more bits than would be
      needed for a 4k page number, by using between 1 and 8 low-order bits of
      the RPN (real page number) field to encode the page sizes.  A single
      "large page" bit in the first doubleword indicates that these low-order
      bits are to be interpreted like this.
      
      We can determine the page sizes by using the low-order 8 bits of the RPN
      to look up a 256-entry table.  For actual page sizes less than 1MB, some
      of the upper bits of these 8 bits are going to be real address bits, but
      we can cope with that by replicating the entries for those smaller page
      sizes.
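
      The replication idea above can be sketched with a toy table. The
      two encodings used here (a 16M page marked by eight zero low-order
      bits, a 64K page marked by the low four bits being 0001) are
      illustrative assumptions rather than a statement of the hardware
      encoding, the table stores only an actual page shift, and all
      names are hypothetical. The RPN is taken to be in units of 4k
      pages.

```c
#include <assert.h>
#include <stdint.h>

/* Indexed by the low-order 8 bits of the RPN; 0 means "encoding not recognised". */
static uint8_t page_shift_table[256];

/*
 * Record one encoding. Only the low enc_bits bits of the index carry
 * the encoding; the bits above them are real address bits for this
 * page size, so the same entry is written for every value they can take.
 */
static void fill_entries(unsigned int pattern, unsigned int enc_bits,
                         uint8_t shift)
{
    unsigned int copies = 1u << (8 - enc_bits);

    for (unsigned int hi = 0; hi < copies; hi++)
        page_shift_table[(hi << enc_bits) | pattern] = shift;
}

static void init_page_shift_table(void)
{
    fill_entries(0x00, 8, 24);  /* 16M: all 8 bits are encoding bits, one entry */
    fill_entries(0x01, 4, 16);  /* 64K: upper 4 bits are address bits, 16 copies */
}

/* One table lookup replaces the search; small pages need no lookup. */
static unsigned int actual_page_shift(uint64_t rpn, int large)
{
    if (!large)
        return 12;
    return page_shift_table[rpn & 0xff];
}
```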
      
      While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
      functions from a KVM-specific header to a header for 64-bit HPT systems,
      since this computation doesn't have anything specifically to do with KVM.
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  7. 08 Sep, 2016 5 commits