1. 02 Nov, 2015 2 commits
  2. 29 Oct, 2015 3 commits
    • Christian Borntraeger's avatar
      KVM: s390: use simple switch statement as multiplexer · 46b708ea
      Christian Borntraeger authored
      We currently do some magic shifting (by exploiting that exit codes
      are always a multiple of 4) and a table lookup to jump into the
      exit handlers. This causes some calculations and checks, just to
      do an potentially expensive function call.
      
      Changing that to a switch statement gives the compiler the chance
      to inline and dynamically decide between jump tables or inline
      compare and branches. In addition it makes the code more readable.
      
      bloat-o-meter gives me a small reduction in code size:
      
      add/remove: 0/7 grow/shrink: 1/1 up/down: 986/-1334 (-348)
      function                                     old     new   delta
      kvm_handle_sie_intercept                      72    1058    +986
      handle_prog                                  704     696      -8
      handle_noop                                   54       -     -54
      handle_partial_execution                      60       -     -60
      intercept_funcs                              120       -    -120
      handle_instruction                           198       -    -198
      handle_validity                              210       -    -210
      handle_stop                                  316       -    -316
      handle_external_interrupt                    368       -    -368
      
      Right now my gcc does conditional branches instead of jump tables.
      The inlining seems to give us enough cycles as some micro-benchmarking
      shows minimal improvements, but still in noise.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: default avatarCornelia Huck <cornelia.huck@de.ibm.com>
      46b708ea
    • Christian Borntraeger's avatar
      KVM: s390: drop useless newline in debugging data · 58c383c6
      Christian Borntraeger authored
      the s390 debug feature does not need newlines. In fact it will
      result in empty lines. Get rid of 4 leftovers.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: default avatarCornelia Huck <cornelia.huck@de.ibm.com>
      58c383c6
    • David Hildenbrand's avatar
      KVM: s390: SCA must not cross page boundaries · c5c2c393
      David Hildenbrand authored
      We seemed to have missed a few corner cases in commit f6c137ff
      ("KVM: s390: randomize sca address").
      
      The SCA has a maximum size of 2112 bytes. By setting the sca_offset to
      some unlucky numbers, we exceed the page.
      
      0x7c0 (1984) -> Fits exactly
      0x7d0 (2000) -> 16 bytes out
      0x7e0 (2016) -> 32 bytes out
      0x7f0 (2032) -> 48 bytes out
      
      One VCPU entry is 32 bytes long.
      
      For the last two cases, we actually write data to the other page.
      1. The address of the VCPU.
      2. Injection/delivery/clearing of SIGP externall calls via SIGP IF.
      
      Especially the 2. happens regularly. So this could produce two problems:
      1. The guest losing/getting external calls.
      2. Random memory overwrites in the host.
      
      So this problem happens on every 127 + 128 created VM with 64 VCPUs.
      
      Cc: stable@vger.kernel.org # v3.15+
      Acked-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      c5c2c393
  3. 21 Oct, 2015 4 commits
    • Gautham R. Shenoy's avatar
      KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path · 70aa3961
      Gautham R. Shenoy authored
      Currently a CPU running a guest can receive a H_DOORBELL in the
      following two cases:
      1) When the CPU is napping due to CEDE or there not being a guest
      vcpu.
      2) The CPU is running the guest vcpu.
      
      Case 1), the doorbell message is not cleared since we were waking up
      from nap. Hence when the EE bit gets set on transition from guest to
      host, the H_DOORBELL interrupt is delivered to the host and the
      corresponding handler is invoked.
      
      However in Case 2), the message gets cleared by the action of taking
      the H_DOORBELL interrupt. Since the CPU was running a guest, instead
      of invoking the doorbell handler, the code invokes the second-level
      interrupt handler to switch the context from the guest to the host. At
      this point the setting of the EE bit doesn't result in the CPU getting
      the doorbell interrupt since it has already been delivered once. So,
      the handler for this doorbell is never invoked!
      
      This causes softlockups if the missed DOORBELL was an IPI sent from a
      sibling subcore on the same CPU.
      
      This patch fixes it by explitly invoking the doorbell handler on the
      exit path if the exit reason is H_DOORBELL similar to the way an
      EXTERNAL interrupt is handled. Since this will also handle Case 1), we
      can unconditionally clear the doorbell message in
      kvmppc_check_wake_reason.
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      70aa3961
    • Nikunj A Dadhania's avatar
      KVM: PPC: Implement extension to report number of memslots · bfec5c2c
      Nikunj A Dadhania authored
      QEMU assumes 32 memslots if this extension is not implemented. Although,
      current value of KVM_USER_MEM_SLOTS is 32, once KVM_USER_MEM_SLOTS
      changes QEMU would take a wrong value.
      Signed-off-by: default avatarNikunj A Dadhania <nikunj@linux.vnet.ibm.com>
      Reviewed-by: default avatarThomas Huth <thuth@redhat.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      bfec5c2c
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEs · c64dfe2a
      Paul Mackerras authored
      This fixes a bug where the old HPTE value returned by H_REMOVE has
      the valid bit clear if the HPTE was an absent HPTE, as happens for
      HPTEs for emulated MMIO pages and for RAM pages that have been paged
      out by the host.  If the absent bit is set, we clear it and set the
      valid bit, because from the guest's point of view, the HPTE is valid.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      c64dfe2a
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl · 572abd56
      Paul Mackerras authored
      Currently the KVM_PPC_ALLOCATE_HTAB will try to allocate the requested
      size of HPT, and if that is not possible, then try to allocate smaller
      sizes (by factors of 2) until either a minimum is reached or the
      allocation succeeds.  This is not ideal for userspace, particularly in
      migration scenarios, where the destination VM really does require the
      size requested.  Also, the minimum HPT size of 256kB may be
      insufficient for the guest to run successfully.
      
      This removes the fallback to smaller sizes on allocation failure for
      the KVM_PPC_ALLOCATE_HTAB ioctl.  The fallback still exists for the
      case where the HPT is allocated at the time the first VCPU is run, if
      no HPT has been allocated by ioctl by that time.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      572abd56
  4. 19 Oct, 2015 2 commits
    • Takuya Yoshikawa's avatar
      KVM: x86: MMU: Initialize force_pt_level before calling mapping_level() · 8c85ac1c
      Takuya Yoshikawa authored
      Commit fd136902 ("KVM: x86: MMU: Move mapping_level_dirty_bitmap()
      call in mapping_level()") forgot to initialize force_pt_level to false
      in FNAME(page_fault)() before calling mapping_level() like
      nonpaging_map() does.  This can sometimes result in forcing page table
      level mapping unnecessarily.
      
      Fix this and move the first *force_pt_level check in mapping_level()
      before kvm_vcpu_gfn_to_memslot() call to make it a bit clearer that
      the variable must be initialized before mapping_level() gets called.
      
      This change can also avoid calling kvm_vcpu_gfn_to_memslot() when
      !check_hugepage_cache_consistency() check in tdp_page_fault() forces
      page table level mapping.
      Signed-off-by: default avatarTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8c85ac1c
    • Paolo Bonzini's avatar
      kvm: x86: zero EFER on INIT · 5690891b
      Paolo Bonzini authored
      Not zeroing EFER means that a 32-bit firmware cannot enter paging mode
      without clearing EFER.LME first (which it should not know about).
      Yang Zhang from Intel confirmed that the manual is wrong and EFER is
      cleared to zero on INIT.
      
      Fixes: d28bc9dd
      Cc: stable@vger.kernel.org
      Cc: Yang Z Zhang <yang.z.zhang@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5690891b
  5. 16 Oct, 2015 16 commits
  6. 15 Oct, 2015 4 commits
  7. 14 Oct, 2015 9 commits
    • Wanpeng Li's avatar
      KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid · dd5f5341
      Wanpeng Li authored
      Introduce __vmx_flush_tlb() to handle specific vpid.
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      dd5f5341
    • Wanpeng Li's avatar
      KVM: VMX: adjust interface to allocate/free_vpid · 991e7a0e
      Wanpeng Li authored
      Adjust allocate/free_vid so that they can be reused for the nested vpid.
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      991e7a0e
    • Kosuke Tatsukawa's avatar
      kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c · 6003a420
      Kosuke Tatsukawa authored
      async_pf_execute() seems to be missing a memory barrier which might
      cause the waker to not notice the waiter and miss sending a wake_up as
      in the following figure.
      
              async_pf_execute                    kvm_vcpu_block
      ------------------------------------------------------------------------
      spin_lock(&vcpu->async_pf.lock);
      if (waitqueue_active(&vcpu->wq))
      /* The CPU might reorder the test for
         the waitqueue up here, before
         prior writes complete */
                                          prepare_to_wait(&vcpu->wq, &wait,
                                            TASK_INTERRUPTIBLE);
                                          /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                           /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                            ...
                                            return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                              !vcpu->arch.apf.halted)
                                              || !list_empty_careful(&vcpu->async_pf.done)
                                           ...
                                           return 0;
      list_add_tail(&apf->link,
        &vcpu->async_pf.done);
      spin_unlock(&vcpu->async_pf.lock);
                                          waited = true;
                                          schedule();
      ------------------------------------------------------------------------
      
      The attached patch adds the missing memory barrier.
      
      I found this issue when I was looking through the linux source code
      for places calling waitqueue_active() before wake_up*(), but without
      preceding memory barriers, after sending a patch to fix a similar
      issue in drivers/tty/n_tty.c  (Details about the original issue can be
      found here: https://lkml.org/lkml/2015/9/28/849).
      Signed-off-by: default avatarKosuke Tatsukawa <tatsu@ab.jp.nec.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6003a420
    • Radim Krčmář's avatar
      KVM: x86: don't notify userspace IOAPIC on edge EOI · 13db7734
      Radim Krčmář authored
      On real hardware, edge-triggered interrupts don't set a bit in TMR,
      which means that IOAPIC isn't notified on EOI.  Do the same here.
      
      Staying in guest/kernel mode after edge EOI is what we want for most
      devices.  If some bugs could be nicely worked around with edge EOI
      notifications, we should invest in a better interface.
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      13db7734
    • Radim Krčmář's avatar
      KVM: x86: fix edge EOI and IOAPIC reconfig race · db2bdcbb
      Radim Krčmář authored
      KVM uses eoi_exit_bitmap to track vectors that need an action on EOI.
      The problem is that IOAPIC can be reconfigured while an interrupt with
      old configuration is pending and eoi_exit_bitmap only remembers the
      newest configuration;  thus EOI from the pending interrupt is not
      recognized.
      
      (Reconfiguration is not a problem for level interrupts, because IOAPIC
       sends interrupt with the new configuration.)
      
      For an edge interrupt with ACK notifiers, like i8254 timer; things can
      happen in this order
       1) IOAPIC inject a vector from i8254
       2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
          on original VCPU gets cleared
       3) guest's handler for the vector does EOI
       4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
          not in that VCPU's eoi_exit_bitmap
       5) i8254 stops working
      
      A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the
      vector is in PIR/IRR/ISR.
      
      This creates an unwanted situation if the vector is reused by a
      non-IOAPIC source, but I think it is so rare that we don't want to make
      the solution more sophisticated.  The simple solution also doesn't work
      if we are reconfiguring the vector.  (Shouldn't happen in the wild and
      I'd rather fix users of ACK notifiers instead of working around that.)
      
      The are no races because ioapic injection and reconfig are locked.
      
      Fixes: b053b2ae ("KVM: x86: Add EOI exit bitmap inference")
      [Before b053b2ae, this bug happened only with APICv.]
      Fixes: c7c9c56c ("x86, apicv: add virtual interrupt delivery support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      db2bdcbb
    • Radim Krčmář's avatar
      kvm: x86: set KVM_REQ_EVENT when updating IRR · c77f3fab
      Radim Krčmář authored
      After moving PIR to IRR, the interrupt needs to be delivered manually.
      Reported-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c77f3fab
    • Paolo Bonzini's avatar
      Merge branch 'kvm-master' into HEAD · bff98d3b
      Paolo Bonzini authored
      Merge more important SMM fixes.
      bff98d3b
    • Paolo Bonzini's avatar
      KVM: x86: fix RSM into 64-bit protected mode · b10d92a5
      Paolo Bonzini authored
      In order to get into 64-bit protected mode, you need to enable
      paging while EFER.LMA=1.  For this to work, CS.L must be 0.
      Currently, we load the segments before CR0 and CR4, which means
      that if RSM returns into 64-bit protected mode CS.L is already 1
      and everything breaks.
      
      Luckily, CS.L=0 is always the case when executing RSM, because it
      is forbidden to execute RSM from 64-bit protected mode.  Hence it
      is enough to load CR0 and CR4 first, and only then the segments.
      
      Fixes: 660a5d51
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b10d92a5
    • Paolo Bonzini's avatar
      KVM: x86: fix previous commit for 32-bit · 25188b99
      Paolo Bonzini authored
      Unfortunately I only noticed this after pushing.
      
      Fixes: f0d648bd
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      25188b99