1. 02 Nov, 2015 1 commit
    • Merge tag 'kvm-s390-next-20151028' of... · 4d5140c5
      Paolo Bonzini authored
      Merge tag 'kvm-s390-next-20151028' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      KVM: s390: Bugfix and cleanups
      
      There is one important bug fix for a potential memory corruption
      and/or guest errors for guests with 63 or 64 vCPUs. This fix would
      qualify for 4.3 but comes a few days too late, given that we are
      about to release 4.3.
      Given that this patch is Cc: stable >= 3.15 anyway, we can handle
      it via the 4.4 merge window.
      
      This pull request also contains two cleanups.
      4d5140c5
  2. 29 Oct, 2015 3 commits
    • KVM: s390: use simple switch statement as multiplexer · 46b708ea
      Christian Borntraeger authored
      We currently do some magic shifting (by exploiting that exit codes
      are always a multiple of 4) and a table lookup to jump into the
      exit handlers. This causes some calculations and checks, just to
      do a potentially expensive function call.
      
      Changing that to a switch statement gives the compiler the chance
      to inline and dynamically decide between jump tables or inline
      compare and branches. In addition it makes the code more readable.
      
      bloat-o-meter gives me a small reduction in code size:
      
      add/remove: 0/7 grow/shrink: 1/1 up/down: 986/-1334 (-348)
      function                                     old     new   delta
      kvm_handle_sie_intercept                      72    1058    +986
      handle_prog                                  704     696      -8
      handle_noop                                   54       -     -54
      handle_partial_execution                      60       -     -60
      intercept_funcs                              120       -    -120
      handle_instruction                           198       -    -198
      handle_validity                              210       -    -210
      handle_stop                                  316       -    -316
      handle_external_interrupt                    368       -    -368
      
      Right now my gcc generates conditional branches instead of jump tables.
      The inlining seems to give us enough cycles, as some micro-benchmarking
      shows minimal improvements, still within the noise.
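The shift-and-table versus switch trade-off described above can be sketched in plain C. All codes and handler names here are illustrative stand-ins, not the kernel's actual s390 intercept handlers; the point is only the dispatch pattern:

```c
#include <assert.h>

/* Hypothetical intercept codes, modeled on the fact that s390 exit
 * codes are always multiples of 4. */
enum { ICPT_NOOP = 0x04, ICPT_INST = 0x08, ICPT_PROG = 0x0c };

static int handle_noop(void) { return 0; }
static int handle_inst(void) { return 1; }
static int handle_prog(void) { return 2; }

typedef int (*intercept_handler_t)(void);

/* Old style: shift the code into a table index, then make an
 * indirect call through the table. */
static const intercept_handler_t intercept_funcs[] = {
	[ICPT_NOOP >> 2] = handle_noop,
	[ICPT_INST >> 2] = handle_inst,
	[ICPT_PROG >> 2] = handle_prog,
};

static int dispatch_table(int code)
{
	intercept_handler_t func;

	if (code >> 2 >= (int)(sizeof(intercept_funcs) /
			       sizeof(intercept_funcs[0])))
		return -1;
	func = intercept_funcs[code >> 2];
	return func ? func() : -1;
}

/* New style: a plain switch lets the compiler inline the handlers
 * and choose between a jump table and compare-and-branch on its own. */
static int dispatch_switch(int code)
{
	switch (code) {
	case ICPT_NOOP: return handle_noop();
	case ICPT_INST: return handle_inst();
	case ICPT_PROG: return handle_prog();
	default:        return -1;
	}
}
```

Both versions behave identically; the switch simply exposes the dispatch to the optimizer instead of hiding it behind an indirect call.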
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      46b708ea
    • KVM: s390: drop useless newline in debugging data · 58c383c6
      Christian Borntraeger authored
      The s390 debug feature does not need newlines; in fact they
      result in empty lines. Get rid of 4 leftovers.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      58c383c6
    • KVM: s390: SCA must not cross page boundaries · c5c2c393
      David Hildenbrand authored
      We seem to have missed a few corner cases in commit f6c137ff
      ("KVM: s390: randomize sca address").
      
      The SCA has a maximum size of 2112 bytes. By setting the sca_offset to
      some unlucky numbers, we exceed the page.
      
      0x7c0 (1984) -> Fits exactly
      0x7d0 (2000) -> 16 bytes out
      0x7e0 (2016) -> 32 bytes out
      0x7f0 (2032) -> 48 bytes out
      
      One VCPU entry is 32 bytes long.
      
      For the last two cases, we actually write data to the other page:
      1. the address of the VCPU;
      2. injection/delivery/clearing of SIGP external calls via SIGP IF.
      
      Case 2 in particular happens regularly, so this could produce two problems:
      1. the guest losing or spuriously receiving external calls;
      2. random memory overwrites in the host.
      
      So this problem hits the 127th and 128th of every 128 VMs created with 64 VCPUs.
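The arithmetic above can be checked with a minimal sketch. The constants come from the commit message (maximum SCA size 2112 bytes, 32-byte VCPU entries, 4 KiB pages); `sca_fits_in_page` is a hypothetical helper for illustration, not kernel code:

```c
#include <assert.h>

/* Constants from the commit message. */
#define SCA_MAX_SIZE 2112   /* maximum size of the SCA in bytes */
#define VCPU_ENTRY   32     /* one VCPU entry is 32 bytes */
#define PAGE_SIZE    4096

/* Returns 1 if an SCA placed at sca_offset stays within one page. */
static int sca_fits_in_page(unsigned int sca_offset)
{
	return sca_offset + SCA_MAX_SIZE <= PAGE_SIZE;
}
```

With these numbers, 0x7c0 (1984) fits exactly (1984 + 2112 = 4096), while 0x7d0, 0x7e0 and 0x7f0 run 16, 32 and 48 bytes past the page, matching the table above.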
      
      Cc: stable@vger.kernel.org # v3.15+
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      c5c2c393
  3. 19 Oct, 2015 2 commits
    • KVM: x86: MMU: Initialize force_pt_level before calling mapping_level() · 8c85ac1c
      Takuya Yoshikawa authored
      Commit fd136902 ("KVM: x86: MMU: Move mapping_level_dirty_bitmap()
      call in mapping_level()") forgot to initialize force_pt_level to false
      in FNAME(page_fault)() before calling mapping_level() like
      nonpaging_map() does.  This can sometimes result in forcing page table
      level mapping unnecessarily.
      
      Fix this and move the first *force_pt_level check in mapping_level()
      before kvm_vcpu_gfn_to_memslot() call to make it a bit clearer that
      the variable must be initialized before mapping_level() gets called.
      
      This change can also avoid calling kvm_vcpu_gfn_to_memslot() when
      !check_hugepage_cache_consistency() check in tdp_page_fault() forces
      page table level mapping.
      Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8c85ac1c
    • kvm: x86: zero EFER on INIT · 5690891b
      Paolo Bonzini authored
      Not zeroing EFER means that a 32-bit firmware cannot enter paging mode
      without clearing EFER.LME first (which it should not know about).
      Yang Zhang from Intel confirmed that the manual is wrong and EFER is
      cleared to zero on INIT.
      
      Fixes: d28bc9dd
      Cc: stable@vger.kernel.org
      Cc: Yang Z Zhang <yang.z.zhang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5690891b
  4. 16 Oct, 2015 15 commits
  5. 14 Oct, 2015 9 commits
    • KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid · dd5f5341
      Wanpeng Li authored
      Introduce __vmx_flush_tlb() to handle specific vpid.
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      dd5f5341
    • KVM: VMX: adjust interface to allocate/free_vpid · 991e7a0e
      Wanpeng Li authored
      Adjust allocate/free_vpid so that they can be reused for the nested vpid.
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      991e7a0e
    • kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c · 6003a420
      Kosuke Tatsukawa authored
      async_pf_execute() seems to be missing a memory barrier which might
      cause the waker to not notice the waiter and miss sending a wake_up as
      in the following figure.
      
              async_pf_execute                    kvm_vcpu_block
      ------------------------------------------------------------------------
      spin_lock(&vcpu->async_pf.lock);
      if (waitqueue_active(&vcpu->wq))
      /* The CPU might reorder the test for
         the waitqueue up here, before
         prior writes complete */
                                          prepare_to_wait(&vcpu->wq, &wait,
                                            TASK_INTERRUPTIBLE);
                                          /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                           /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                            ...
                                            return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                              !vcpu->arch.apf.halted)
                                              || !list_empty_careful(&vcpu->async_pf.done)
                                           ...
                                           return 0;
      list_add_tail(&apf->link,
        &vcpu->async_pf.done);
      spin_unlock(&vcpu->async_pf.lock);
                                          waited = true;
                                          schedule();
      ------------------------------------------------------------------------
      
      The attached patch adds the missing memory barrier.
      
      I found this issue while looking through the Linux source code
      for places that call waitqueue_active() before wake_up*() without
      a preceding memory barrier, after sending a patch to fix a similar
      issue in drivers/tty/n_tty.c (details about the original issue:
      https://lkml.org/lkml/2015/9/28/849).
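A userspace sketch of the fix, modeling the kernel's smp_mb() with a C11 seq_cst fence. All names here are illustrative; in the real code the barrier on the waker side pairs with the barrier the waiter gets from prepare_to_wait():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model state: work_done stands in for adding apf to the done list,
 * waiter_queued for waitqueue_active(&vcpu->wq). */
static atomic_bool work_done;
static atomic_bool waiter_queued;
static atomic_int  wakeups;

static void waker_side(void)
{
	/* Publish the completed work (list_add_tail in the kernel). */
	atomic_store_explicit(&work_done, true, memory_order_relaxed);

	/* The fix: a full barrier between publishing the work and
	 * testing for waiters, so the CPU cannot reorder the
	 * waitqueue_active() check before the store above. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&waiter_queued, memory_order_relaxed))
		atomic_fetch_add(&wakeups, 1);   /* wake_up(&vcpu->wq) */
}
```

Without the fence, the load of `waiter_queued` may be speculated before the `work_done` store, which is exactly the reordering the figure shows: the waker misses the waiter, and the waiter sleeps on work that is already done.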
      Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6003a420
    • KVM: x86: don't notify userspace IOAPIC on edge EOI · 13db7734
      Radim Krčmář authored
      On real hardware, edge-triggered interrupts don't set a bit in TMR,
      which means that IOAPIC isn't notified on EOI.  Do the same here.
      
      Staying in guest/kernel mode after edge EOI is what we want for most
      devices.  If some bugs could be nicely worked around with edge EOI
      notifications, we should invest in a better interface.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      13db7734
    • KVM: x86: fix edge EOI and IOAPIC reconfig race · db2bdcbb
      Radim Krčmář authored
      KVM uses eoi_exit_bitmap to track vectors that need an action on EOI.
      The problem is that IOAPIC can be reconfigured while an interrupt with
      old configuration is pending and eoi_exit_bitmap only remembers the
      newest configuration;  thus EOI from the pending interrupt is not
      recognized.
      
      (Reconfiguration is not a problem for level interrupts, because IOAPIC
       sends interrupt with the new configuration.)
      
      For an edge interrupt with ACK notifiers, like the i8254 timer,
      things can happen in this order:
       1) IOAPIC injects a vector from the i8254
       2) guest reconfigures that vector's VCPU, so eoi_exit_bitmap
          on the original VCPU gets cleared
       3) guest's handler for the vector does EOI
       4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
          not in that VCPU's eoi_exit_bitmap
       5) i8254 stops working
      
      A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the
      vector is in PIR/IRR/ISR.
      
      This creates an unwanted situation if the vector is reused by a
      non-IOAPIC source, but I think it is so rare that we don't want to make
      the solution more sophisticated.  The simple solution also doesn't work
      if we are reconfiguring the vector.  (Shouldn't happen in the wild and
      I'd rather fix users of ACK notifiers instead of working around that.)
      
      There are no races because IOAPIC injection and reconfig are locked.
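The simple solution can be sketched with simplified types: when recomputing the bitmap from the new IOAPIC configuration, also keep any vector that is still pending. The struct and field names here are illustrative stand-ins (a single `irr` array stands in for PIR/IRR/ISR), not KVM's actual data structures:

```c
#include <assert.h>
#include <stdint.h>

#define NR_VECTORS 256

struct vcpu_state {
	uint64_t eoi_exit_bitmap[4];  /* vectors needing action on EOI */
	uint64_t irr[4];              /* stand-in for PIR/IRR/ISR */
};

static void set_bit64(uint64_t *map, int vec)
{
	map[vec / 64] |= 1ULL << (vec % 64);
}

static int test_bit64(const uint64_t *map, int vec)
{
	return (map[vec / 64] >> (vec % 64)) & 1;
}

static void recompute_eoi_exit_bitmap(struct vcpu_state *v,
				      const uint64_t *new_config)
{
	for (int i = 0; i < 4; i++)
		v->eoi_exit_bitmap[i] = 0;

	for (int vec = 0; vec < NR_VECTORS; vec++)
		if (test_bit64(new_config, vec) ||
		    test_bit64(v->irr, vec))   /* the fix: keep pending vectors */
			set_bit64(v->eoi_exit_bitmap, vec);
}
```

An interrupt delivered under the old configuration thus still gets its EOI forwarded, even though the new configuration no longer names that vector.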
      
      Fixes: b053b2ae ("KVM: x86: Add EOI exit bitmap inference")
      [Before b053b2ae, this bug happened only with APICv.]
      Fixes: c7c9c56c ("x86, apicv: add virtual interrupt delivery support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      db2bdcbb
    • kvm: x86: set KVM_REQ_EVENT when updating IRR · c77f3fab
      Radim Krčmář authored
      After moving PIR to IRR, the interrupt needs to be delivered manually.
      Reported-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c77f3fab
    • Merge branch 'kvm-master' into HEAD · bff98d3b
      Paolo Bonzini authored
      Merge more important SMM fixes.
      bff98d3b
    • KVM: x86: fix RSM into 64-bit protected mode · b10d92a5
      Paolo Bonzini authored
      In order to get into 64-bit protected mode, you need to enable
      paging while EFER.LMA=1.  For this to work, CS.L must be 0.
      Currently, we load the segments before CR0 and CR4, which means
      that if RSM returns into 64-bit protected mode CS.L is already 1
      and everything breaks.
      
      Luckily, CS.L=0 is always the case when executing RSM, because it
      is forbidden to execute RSM from 64-bit protected mode.  Hence it
      is enough to load CR0 and CR4 first, and only then the segments.
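A toy model of the ordering constraint described above. All structures and names here are illustrative, not KVM's emulator state; the model only captures the invariant that, with EFER.LMA=1, paging may be enabled only while CS.L is still 0:

```c
#include <assert.h>
#include <stdbool.h>

struct cpu {
	bool efer_lma;
	bool cr0_pg;
	bool cs_l;
};

/* Returns 0 on success, -1 if paging would be enabled while
 * EFER.LMA=1 and CS.L=1 (a fault on real hardware). */
static int set_cr0_pg(struct cpu *c)
{
	if (c->efer_lma && c->cs_l)
		return -1;
	c->cr0_pg = true;
	return 0;
}

/* Fixed order: CR0/CR4 first.  CS.L is still 0 at this point,
 * because RSM cannot be executed from 64-bit mode. */
static int rsm_restore_fixed(struct cpu *c)
{
	if (set_cr0_pg(c))
		return -1;
	c->cs_l = true;          /* now load the saved segments */
	return 0;
}

/* Broken order: segments first, so CS.L=1 before paging is enabled. */
static int rsm_restore_broken(struct cpu *c)
{
	c->cs_l = true;
	return set_cr0_pg(c);
}
```

Loading control registers before segments therefore succeeds precisely because RSM is forbidden in 64-bit mode, so CS.L=0 is guaranteed at the moment paging is enabled.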
      
      Fixes: 660a5d51
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b10d92a5
    • KVM: x86: fix previous commit for 32-bit · 25188b99
      Paolo Bonzini authored
      Unfortunately I only noticed this after pushing.
      
      Fixes: f0d648bd
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      25188b99
  6. 13 Oct, 2015 10 commits