1. 05 Mar, 2012 40 commits
    • Alexander Graf's avatar
      KVM: PPC: Rename MMIO register identifiers · b3c5d3c2
      Alexander Graf authored
      We need the KVM_REG namespace for generic register settings now, so
      let's rename the existing users to something different, enabling
      us to reuse the namespace for more visible interfaces.
      
      While at it, also move these private constants to a private header.
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      b3c5d3c2
    • Paul Mackerras's avatar
      KVM: PPC: Move kvm_vcpu_ioctl_[gs]et_one_reg down to platform-specific code · 31f3438e
      Paul Mackerras authored
      This moves the get/set_one_reg implementation down from powerpc.c into
      booke.c, book3s_pr.c and book3s_hv.c.  This avoids #ifdefs in C code,
      but more importantly, it fixes a bug on Book3s HV where we were
      accessing beyond the end of the kvm_vcpu struct (via the to_book3s()
      macro) and corrupting memory, causing random crashes and file corruption.
      
      On Book3s HV we only accept setting the HIOR to zero, since the guest
      runs in supervisor mode and its vectors are never offset from zero.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      [agraf update to apply on top of changed ONE_REG patches]
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      31f3438e
    • Alexander Graf's avatar
      KVM: PPC: Add support for explicit HIOR setting · 1022fc3d
      Alexander Graf authored
      Until now, we always set HIOR based on the PVR, but this is just wrong.
      Instead, we should be setting HIOR explicitly, so user space can decide
      what the initial HIOR value is - just like on real hardware.
      
      We keep the old PVR based way around for backwards compatibility, but
      once user space uses the SET_ONE_REG based method, we drop the PVR logic.
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      1022fc3d
    • Alexander Graf's avatar
      KVM: PPC: Add generic single register ioctls · e24ed81f
      Alexander Graf authored
      Right now we transfer a static struct every time we want to get or set
      registers. Unfortunately, over time we realize that there are more of
      these than we thought of before and the extensibility and flexibility of
      transferring a full struct every time is limited.
      
      So this is a new approach to the problem. With these new ioctls, we can
      get and set a single register that is identified by an ID. This allows for
      very precise and limited transmittal of data. When we later realize that
      it's a better idea to shove over multiple registers at once, we can reuse
      most of the infrastructure and simply implement a GET_MANY_REGS / SET_MANY_REGS
      interface.
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      e24ed81f
    • Sasha Levin's avatar
      KVM: PPC: Use the vcpu kmem_cache when allocating new VCPUs · 6b75e6bf
      Sasha Levin authored
      Currently the code kzalloc()s new VCPUs instead of using the kmem_cache
      which is created when KVM is initialized.
      
      Modify it to allocate VCPUs from that kmem_cache.
      Signed-off-by: default avatarSasha Levin <levinsasha928@gmail.com>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      6b75e6bf
    • Liu Yu's avatar
      KVM: PPC: booke: Add booke206 TLB trace · d37b1a03
      Liu Yu authored
      The existing kvm_stlb_write/kvm_gtlb_write were a poor match for
      the e500/book3e MMU -- mas1 was passed as "tid", mas2 was limited
      to "unsigned int" which will be a problem on 64-bit, mas3/7 got
      split up rather than treated as a single 64-bit word, etc.
      Signed-off-by: default avatarLiu Yu <yu.liu@freescale.com>
      [scottwood@freescale.com: made mas2 64-bit, and added mas8 init]
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      d37b1a03
    • Paul Mackerras's avatar
      KVM: PPC: Book3s HV: Implement get_dirty_log using hardware changed bit · 82ed3616
      Paul Mackerras authored
      This changes the implementation of kvm_vm_ioctl_get_dirty_log() for
      Book3s HV guests to use the hardware C (changed) bits in the guest
      hashed page table.  Since this makes the implementation quite different
      from the Book3s PR case, this moves the existing implementation from
      book3s.c to book3s_pr.c and creates a new implementation in book3s_hv.c.
      That implementation calls kvmppc_hv_get_dirty_log() to do the actual
      work by calling kvm_test_clear_dirty on each page.  It iterates over
      the HPTEs, clearing the C bit if set, and returns 1 if any C bit was
      set (including the saved C bit in the rmap entry).
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      82ed3616
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Use the hardware referenced bit for kvm_age_hva · 55514893
      Paul Mackerras authored
      This uses the host view of the hardware R (referenced) bit to speed
      up kvm_age_hva() and kvm_test_age_hva().  Instead of removing all
      the relevant HPTEs in kvm_age_hva(), we now just reset their R bits
      if set.  Also, kvm_test_age_hva() now scans the relevant HPTEs to
      see if any of them have R set.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      55514893
    • Paul Mackerras's avatar
      KVM: PPC: Book3s HV: Maintain separate guest and host views of R and C bits · bad3b507
      Paul Mackerras authored
      This allows both the guest and the host to use the referenced (R) and
      changed (C) bits in the guest hashed page table.  The guest has a view
      of R and C that is maintained in the guest_rpte field of the revmap
      entry for the HPTE, and the host has a view that is maintained in the
      rmap entry for the associated gfn.
      
      Both view are updated from the guest HPT.  If a bit (R or C) is zero
      in either view, it will be initially set to zero in the HPTE (or HPTEs),
      until set to 1 by hardware.  When an HPTE is removed for any reason,
      the R and C bits from the HPTE are ORed into both views.  We have to
      be careful to read the R and C bits from the HPTE after invalidating
      it, but before unlocking it, in case of any late updates by the hardware.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      bad3b507
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Keep HPTE locked when invalidating · a92bce95
      Paul Mackerras authored
      This reworks the implementations of the H_REMOVE and H_BULK_REMOVE
      hcalls to make sure that we keep the HPTE locked and in the reverse-
      mapping chain until we have finished invalidating it.  Previously
      we would remove it from the chain and unlock it before invalidating
      it, leaving a tiny window when the guest could access the page even
      though we believe we have removed it from the guest (e.g.,
      kvm_unmap_hva() has been called for the page and has found no HPTEs
      in the chain).  In addition, we'll need this for future patches where
      we will need to read the R and C bits in the HPTE after invalidating
      it.
      
      Doing this required restructuring kvmppc_h_bulk_remove() substantially.
      Since we want to batch up the tlbies, we now need to keep several
      HPTEs locked simultaneously.  In order to avoid possible deadlocks,
      we don't spin on the HPTE bitlock for any except the first HPTE in
      a batch.  If we can't acquire the HPTE bitlock for the second or
      subsequent HPTE, we terminate the batch at that point, do the tlbies
      that we have accumulated so far, unlock those HPTEs, and then start
      a new batch to do the remaining invalidations.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      a92bce95
    • Matt Evans's avatar
      KVM: PPC: Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS · b5434032
      Matt Evans authored
      PPC KVM lacks these two capabilities, and as such a userland system must assume
      a max of 4 VCPUs (following api.txt).  With these, a userland can determine
      a more realistic limit.
      Signed-off-by: default avatarMatt Evans <matt@ozlabs.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      b5434032
    • Matt Evans's avatar
      KVM: PPC: Fix vcpu_create dereference before validity check. · 03cdab53
      Matt Evans authored
      Fix usage of vcpu struct before check that it's actually valid.
      Signed-off-by: default avatarMatt Evans <matt@ozlabs.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      03cdab53
    • Paul Mackerras's avatar
      KVM: PPC: Allow for read-only pages backing a Book3S HV guest · 4cf302bc
      Paul Mackerras authored
      With this, if a guest does an H_ENTER with a read/write HPTE on a page
      which is currently read-only, we make the actual HPTE inserted be a
      read-only version of the HPTE.  We now intercept protection faults as
      well as HPTE not found faults, and for a protection fault we work out
      whether it should be reflected to the guest (e.g. because the guest HPTE
      didn't allow write access to usermode) or handled by switching to
      kernel context and calling kvmppc_book3s_hv_page_fault, which will then
      request write access to the page and update the actual HPTE.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      4cf302bc
    • Paul Mackerras's avatar
      KVM: Add barriers to allow mmu_notifier_retry to be used locklessly · a355aa54
      Paul Mackerras authored
      This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an
      smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give
      the correct answer when called without kvm->mmu_lock being held.
      PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than
      a single global spinlock in order to improve the scalability of updates
      to the guest MMU hashed page table, and so needs this.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Acked-by: default avatarAvi Kivity <avi@redhat.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      a355aa54
    • Paul Mackerras's avatar
      KVM: PPC: Implement MMU notifiers for Book3S HV guests · 342d3db7
      Paul Mackerras authored
      This adds the infrastructure to enable us to page out pages underneath
      a Book3S HV guest, on processors that support virtualized partition
      memory, that is, POWER7.  Instead of pinning all the guest's pages,
      we now look in the host userspace Linux page tables to find the
      mapping for a given guest page.  Then, if the userspace Linux PTE
      gets invalidated, kvm_unmap_hva() gets called for that address, and
      we replace all the guest HPTEs that refer to that page with absent
      HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
      set, which will cause an HDSI when the guest tries to access them.
      Finally, the page fault handler is extended to reinstantiate the
      guest HPTE when the guest tries to access a page which has been paged
      out.
      
      Since we can't intercept the guest DSI and ISI interrupts on PPC970,
      we still have to pin all the guest pages on PPC970.  We have a new flag,
      kvm->arch.using_mmu_notifiers, that indicates whether we can page
      guest pages out.  If it is not set, the MMU notifier callbacks do
      nothing and everything operates as before.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      342d3db7
    • Paul Mackerras's avatar
      KVM: PPC: Implement MMIO emulation support for Book3S HV guests · 697d3899
      Paul Mackerras authored
      This provides the low-level support for MMIO emulation in Book3S HV
      guests.  When the guest tries to map a page which is not covered by
      any memslot, that page is taken to be an MMIO emulation page.  Instead
      of inserting a valid HPTE, we insert an HPTE that has the valid bit
      clear but another hypervisor software-use bit set, which we call
      HPTE_V_ABSENT, to indicate that this is an absent page.  An
      absent page is treated much like a valid page as far as guest hcalls
      (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
      an absent HPTE doesn't need to be invalidated with tlbie since it
      was never valid as far as the hardware is concerned.
      
      When the guest accesses a page for which there is an absent HPTE, it
      will take a hypervisor data storage interrupt (HDSI) since we now set
      the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
      looks up the hash table and if it finds an absent HPTE mapping the
      requested virtual address, will switch to kernel mode and handle the
      fault in kvmppc_book3s_hv_page_fault(), which at present just calls
      kvmppc_hv_emulate_mmio() to set up the MMIO emulation.
      
      This is based on an earlier patch by Benjamin Herrenschmidt, but since
      heavily reworked.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      697d3899
    • Paul Mackerras's avatar
      KVM: PPC: Maintain a doubly-linked list of guest HPTEs for each gfn · 06ce2c63
      Paul Mackerras authored
      This expands the reverse mapping array to contain two links for each
      HPTE which are used to link together HPTEs that correspond to the
      same guest logical page.  Each circular list of HPTEs is pointed to
      by the rmap array entry for the guest logical page, pointed to by
      the relevant memslot.  Links are 32-bit HPT entry indexes rather than
      full 64-bit pointers, to save space.  We use 3 of the remaining 32
      bits in the rmap array entries as a lock bit, a referenced bit and
      a present bit (the present bit is needed since HPTE index 0 is valid).
      The bit lock for the rmap chain nests inside the HPTE lock bit.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      06ce2c63
    • Paul Mackerras's avatar
      KVM: PPC: Allow I/O mappings in memory slots · 9d0ef5ea
      Paul Mackerras authored
      This provides for the case where userspace maps an I/O device into the
      address range of a memory slot using a VM_PFNMAP mapping.  In that
      case, we work out the pfn from vma->vm_pgoff, and record the cache
      enable bits from vma->vm_page_prot in two low-order bits in the
      slot_phys array entries.  Then, in kvmppc_h_enter() we check that the
      cache bits in the HPTE that the guest wants to insert match the cache
      bits in the slot_phys array entry.  However, we do allow the guest to
      create what it thinks is a non-cacheable or write-through mapping to
      memory that is actually cacheable, so that we can use normal system
      memory as part of an emulated device later on.  In that case the actual
      HPTE we insert is a cacheable HPTE.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      9d0ef5ea
    • Paul Mackerras's avatar
      KVM: PPC: Allow use of small pages to back Book3S HV guests · da9d1d7f
      Paul Mackerras authored
      This relaxes the requirement that the guest memory be provided as
      16MB huge pages, allowing it to be provided as normal memory, i.e.
      in pages of PAGE_SIZE bytes (4k or 64k).  To allow this, we index
      the kvm->arch.slot_phys[] arrays with a small page index, even if
      huge pages are being used, and use the low-order 5 bits of each
      entry to store the order of the enclosing page with respect to
      normal pages, i.e. log_2(enclosing_page_size / PAGE_SIZE).
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      da9d1d7f
    • Paul Mackerras's avatar
      KVM: PPC: Only get pages when actually needed, not in prepare_memory_region() · c77162de
      Paul Mackerras authored
      This removes the code from kvmppc_core_prepare_memory_region() that
      looked up the VMA for the region being added and called hva_to_page
      to get the pfns for the memory.  We have no guarantee that there will
      be anything mapped there at the time of the KVM_SET_USER_MEMORY_REGION
      ioctl call; userspace can do that ioctl and then map memory into the
      region later.
      
      Instead we defer looking up the pfn for each memory page until it is
      needed, which generally means when the guest does an H_ENTER hcall on
      the page.  Since we can't call get_user_pages in real mode, if we don't
      already have the pfn for the page, kvmppc_h_enter() will return
      H_TOO_HARD and we then call kvmppc_virtmode_h_enter() once we get back
      to kernel context.  That calls kvmppc_get_guest_page() to get the pfn
      for the page, and then calls back to kvmppc_h_enter() to redo the HPTE
      insertion.
      
      When the first vcpu starts executing, we need to have the RMO or VRMA
      region mapped so that the guest's real mode accesses will work.  Thus
      we now have a check in kvmppc_vcpu_run() to see if the RMO/VRMA is set
      up and if not, call kvmppc_hv_setup_rma().  It checks if the memslot
      starting at guest physical 0 now has RMO memory mapped there; if so it
      sets it up for the guest, otherwise on POWER7 it sets up the VRMA.
      The function that does that, kvmppc_map_vrma, is now a bit simpler,
      as it calls kvmppc_virtmode_h_enter instead of creating the HPTE itself.
      
      Since we are now potentially updating entries in the slot_phys[]
      arrays from multiple vcpu threads, we now have a spinlock protecting
      those updates to ensure that we don't lose track of any references
      to pages.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      c77162de
    • Paul Mackerras's avatar
      KVM: PPC: Make the H_ENTER hcall more reliable · 075295dd
      Paul Mackerras authored
      At present, our implementation of H_ENTER only makes one try at locking
      each slot that it looks at, and doesn't even retry the ldarx/stdcx.
      atomic update sequence that it uses to attempt to lock the slot.  Thus
      it can return the H_PTEG_FULL error unnecessarily, particularly when
      the H_EXACT flag is set, meaning that the caller wants a specific PTEG
      slot.
      
      This improves the situation by making a second pass when no free HPTE
      slot is found, where we spin until we succeed in locking each slot in
      turn and then check whether it is full while we hold the lock.  If the
      second pass fails, then we return H_PTEG_FULL.
      
      This also moves lock_hpte to a header file (since later commits in this
      series will need to use it from other source files) and renames it to
      try_lock_hpte, which is a somewhat less misleading name.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      075295dd
    • Paul Mackerras's avatar
      KVM: PPC: Add an interface for pinning guest pages in Book3s HV guests · 93e60249
      Paul Mackerras authored
      This adds two new functions, kvmppc_pin_guest_page() and
      kvmppc_unpin_guest_page(), and uses them to pin the guest pages where
      the guest has registered areas of memory for the hypervisor to update,
      (i.e. the per-cpu virtual processor areas, SLB shadow buffers and
      dispatch trace logs) and then unpin them when they are no longer
      required.
      
      Although it is not strictly necessary to pin the pages at this point,
      since all guest pages are already pinned, later commits in this series
      will mean that guest pages aren't all pinned.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      93e60249
    • Paul Mackerras's avatar
      KVM: PPC: Keep page physical addresses in per-slot arrays · b2b2f165
      Paul Mackerras authored
      This allocates an array for each memory slot that is added to store
      the physical addresses of the pages in the slot.  This array is
      vmalloc'd and accessed in kvmppc_h_enter using real_vmalloc_addr().
      This allows us to remove the ram_pginfo field from the kvm_arch
      struct, and removes the 64GB guest RAM limit that we had.
      
      We use the low-order bits of the array entries to store a flag
      indicating that we have done get_page on the corresponding page,
      and therefore need to call put_page when we are finished with the
      page.  Currently this is set for all pages except those in our
      special RMO regions.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      b2b2f165
    • Paul Mackerras's avatar
      KVM: PPC: Keep a record of HV guest view of hashed page table entries · 8936dda4
      Paul Mackerras authored
      This adds an array that parallels the guest hashed page table (HPT),
      that is, it has one entry per HPTE, used to store the guest's view
      of the second doubleword of the corresponding HPTE.  The first
      doubleword in the HPTE is the same as the guest's idea of it, so we
      don't need to store a copy, but the second doubleword in the HPTE has
      the real page number rather than the guest's logical page number.
      This allows us to remove the back_translate() and reverse_xlate()
      functions.
      
      This "reverse mapping" array is vmalloc'd, meaning that to access it
      in real mode we have to walk the kernel's page tables explicitly.
      That is done by the new real_vmalloc_addr() function.  (In fact this
      returns an address in the linear mapping, so the result is usable
      both in real mode and in virtual mode.)
      
      There are also some minor cleanups here: moving the definitions of
      HPT_ORDER etc. to a header file and defining HPT_NPTE for HPT_NPTEG << 3.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      8936dda4
    • Paul Mackerras's avatar
      KVM: PPC: Make wakeups work again for Book3S HV guests · 4e72dbe1
      Paul Mackerras authored
      When commit f43fdc15fa ("KVM: PPC: booke: Improve timer register
      emulation") factored out some code in arch/powerpc/kvm/powerpc.c
      into a new helper function, kvm_vcpu_kick(), an error crept in
      which causes Book3s HV guest vcpus to stall.  This fixes it.
      On POWER7 machines, guest vcpus are grouped together into virtual
      CPU cores that share a single waitqueue, so it's important to use
      vcpu->arch.wqp rather than &vcpu->wq.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      4e72dbe1
    • Liu Yu-B13201's avatar
      KVM: PPC: Avoid patching paravirt template code · befdc0a6
      Liu Yu-B13201 authored
      Currently we patch the whole code include paravirt template code.
      This isn't safe for scratch area and has impact to performance.
      Signed-off-by: default avatarLiu Yu <yu.liu@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      befdc0a6
    • Scott Wood's avatar
      KVM: PPC: e500: use hardware hint when loading TLB0 entries · 57013524
      Scott Wood authored
      The hardware maintains a per-set next victim hint.  Using this
      reduces conflicts, especially on e500v2 where a single guest
      TLB entry is mapped to two shadow TLB entries (user and kernel).
      We want those two entries to go to different TLB ways.
      
      sesel is now only used for TLB1.
      Reported-by: default avatarLiu Yu <yu.liu@freescale.com>
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      57013524
    • Scott Wood's avatar
      KVM: PPC: e500: Fix TLBnCFG in KVM_CONFIG_TLB · 7b11dc99
      Scott Wood authored
      The associativity, not just total size, can differ from the host
      hardware.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      7b11dc99
    • Alexander Graf's avatar
      KVM: PPC: Book3S: PR: Fix signal check race · e371f713
      Alexander Graf authored
      As Scott put it:
      
      > If we get a signal after the check, we want to be sure that we don't
      > receive the reschedule IPI until after we're in the guest, so that it
      > will cause another signal check.
      
      we need to have interrupts disabled from the point we do signal_check()
      all the way until we actually enter the guest.
      
      This patch fixes potential signal loss races.
      Reported-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      e371f713
    • Alexander Graf's avatar
      KVM: PPC: align vcpu_kick with x86 · ae21216b
      Alexander Graf authored
      Our vcpu kick implementation differs a bit from x86 which resulted in us not
      disabling preemption during the kick. Get it a bit closer to what x86 does.
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      ae21216b
    • Alexander Graf's avatar
      KVM: PPC: Use get/set for to_svcpu to help preemption · 468a12c2
      Alexander Graf authored
      When running the 64-bit Book3s PR code without CONFIG_PREEMPT_NONE, we were
      doing a few things wrong, most notably access to PACA fields without making
      sure that the pointers stay stable accross the access (preempt_disable()).
      
      This patch moves to_svcpu towards a get/put model which allows us to disable
      preemption while accessing the shadow vcpu fields in the PACA. That way we
      can run preemptible and everyone's happy!
      Reported-by: default avatarJörg Sommer <joerg@alea.gnuu.de>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      468a12c2
    • Alexander Graf's avatar
      KVM: PPC: Book3s: PR: No irq_disable in vcpu_run · d33ad328
      Alexander Graf authored
      Somewhere during merges we ended up from
      
        local_irq_enable()
        foo();
        local_irq_disable()
      
      to always keeping irqs enabled during that part. However, we now
      have the following code:
      
        foo();
        local_irq_disable()
      
      which disables interrupts without the surrounding code enabling them
      again! So let's remove that disable and be happy.
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      d33ad328
    • Alexander Graf's avatar
      KVM: PPC: Book3s: PR: Disable preemption in vcpu_run · 7d82714d
      Alexander Graf authored
      When entering the guest, we want to make sure we're not getting preempted
      away, so let's disable preemption on entry, but enable it again while handling
      guest exits.
      Reported-by: default avatarJörg Sommer <joerg@alea.gnuu.de>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      7d82714d
    • Scott Wood's avatar
      KVM: PPC: booke: Improve timer register emulation · dfd4d47e
      Scott Wood authored
      Decrementers are now properly driven by TCR/TSR, and the guest
      has full read/write access to these registers.
      
      The decrementer keeps ticking (and setting the TSR bit) regardless of
      whether the interrupts are enabled with TCR.
      
      The decrementer stops at zero, rather than going negative.
      
      Decrementers (and FITs, once implemented) are delivered as
      level-triggered interrupts -- dequeued when the TSR bit is cleared, not
      on delivery.
      Signed-off-by: default avatarLiu Yu <yu.liu@freescale.com>
      [scottwood@freescale.com: significant changes]
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      dfd4d47e
    • Scott Wood's avatar
      KVM: PPC: Paravirtualize SPRG4-7, ESR, PIR, MASn · b5904972
      Scott Wood authored
      This allows additional registers to be accessed by the guest
      in PR-mode KVM without trapping.
      
      SPRG4-7 are readable from userspace.  On booke, KVM will sync
      these registers when it enters the guest, so that accesses from
      guest userspace will work.  The guest kernel, OTOH, must consistently
      use either the real registers or the shared area between exits.  This
      also applies to the already-paravirted SPRG3.
      
      On non-booke, it's not clear to what extent SPRG4-7 are supported
      (they're not architected for book3s, but exist on at least some classic
      chips).  They are copied in the get/set regs ioctls, but I do not see any
      non-booke emulation.  I also do not see any syncing with real registers
      (in PR-mode) including the user-readable SPRG3.  This patch should not
      make that situation any worse.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      b5904972
    • Scott Wood's avatar
      KVM: PPC: booke: Paravirtualize wrtee · 940b45ec
      Scott Wood authored
      Also fix wrteei 1 paravirt to check for a pending interrupt.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      940b45ec
    • Scott Wood's avatar
      KVM: PPC: booke: Fix int_pending calculation for MSR[EE] paravirt · 29ac26ef
      Scott Wood authored
      int_pending was only being lowered if a bit in pending_exceptions
      was cleared during exception delivery -- but for interrupts, we clear
      it during IACK/TSR emulation.  This caused paravirt for enabling
      MSR[EE] to be ineffective.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      29ac26ef
    • Scott Wood's avatar
      KVM: PPC: booke: Check for MSR[WE] in prepare_to_enter · c59a6a3e
      Scott Wood authored
      This prevents us from inappropriately blocking in a KVM_SET_REGS
      ioctl -- the MSR[WE] will take effect when the guest is next entered.
      
      It also causes SRR1[WE] to be set when we enter the guest's interrupt
      handler, which is what e500 hardware is documented to do.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      c59a6a3e
    • Scott Wood's avatar
      KVM: PPC: Move prepare_to_enter call site into subarch code · 25051b5a
      Scott Wood authored
      This function should be called with interrupts disabled, to avoid
      a race where an exception is delivered after we check, but the
      resched kick is received before we disable interrupts (and thus doesn't
      actually trigger the exit code that would recheck exceptions).
      
      booke already does this properly in the lightweight exit case, but
      not on initial entry.
      
      For now, move the call of prepare_to_enter into subarch-specific code so
      that booke can do the right thing here.  Ideally book3s would do the same
      thing, but I'm having a hard time seeing where it does any interrupt
      disabling of this sort (plus it has several additional call sites), so
      I'm deferring the book3s fix to someone more familiar with that code.
      book3s behavior should be unchanged by this patch.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      25051b5a
    • Scott Wood's avatar
      KVM: PPC: Rename deliver_interrupts to prepare_to_enter · 7e28e60e
      Scott Wood authored
      This function also updates paravirt int_pending, so rename it
      to be more obvious that this is a collection of checks run prior
      to (re)entering a guest.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      7e28e60e