1. 10 May, 2016 6 commits
    • Paolo Bonzini's avatar
      Merge tag 'kvm-s390-next-4.7-2' of... · 6ac0f61f
      Paolo Bonzini authored
      Merge tag 'kvm-s390-next-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      KVM: s390: features and fixes for 4.7 part2
      
      - Use hardware provided information about facility bits that do not
        need any hypervisor activitiy
      - Add missing documentation for KVM_CAP_S390_RI
      - Some updates/fixes for handling cpu models and facilities
      6ac0f61f
    • James Hogan's avatar
      MIPS: KVM: Add missing disable FPU hazard barriers · 4ac33429
      James Hogan authored
      Add the necessary hazard barriers after disabling the FPU in
      kvm_lose_fpu(), just to be safe.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim KrčmáÅ" <rkrcmar@redhat.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4ac33429
    • James Hogan's avatar
      MIPS: KVM: Fix preemption warning reading FPU capability · 556f2a52
      James Hogan authored
      Reading the KVM_CAP_MIPS_FPU capability returns cpu_has_fpu, however
      this uses smp_processor_id() to read the current CPU capabilities (since
      some old MIPS systems could have FPUs present on only a subset of CPUs).
      
      We don't support any such systems, so work around the warning by using
      raw_cpu_has_fpu instead.
      
      We should probably instead claim not to support FPU at all if any one
      CPU is lacking an FPU, but this should do for now.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim KrčmáÅ" <rkrcmar@redhat.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      556f2a52
    • James Hogan's avatar
      MIPS: KVM: Fix preemptable kvm_mips_get_*_asid() calls · f049729c
      James Hogan authored
      There are a couple of places in KVM fault handling code which implicitly
      use smp_processor_id() via kvm_mips_get_kernel_asid() and
      kvm_mips_get_user_asid() from preemptable context. This is unsafe as a
      preemption could cause the guest kernel ASID to be changed, resulting in
      a host TLB entry being written with the wrong ASID.
      
      Fix by disabling preemption around the kvm_mips_get_*_asid() call and
      the corresponding kvm_mips_host_tlb_write().
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim KrčmáÅ" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f049729c
    • James Hogan's avatar
      MIPS: KVM: Fix timer IRQ race when writing CP0_Compare · b45bacd2
      James Hogan authored
      Writing CP0_Compare clears the timer interrupt pending bit
      (CP0_Cause.TI), but this wasn't being done atomically. If a timer
      interrupt raced with the write of the guest CP0_Compare, the timer
      interrupt could end up being pending even though the new CP0_Compare is
      nowhere near CP0_Count.
      
      We were already updating the hrtimer expiry with
      kvm_mips_update_hrtimer(), which used both kvm_mips_freeze_hrtimer() and
      kvm_mips_resume_hrtimer(). Close the race window by expanding out
      kvm_mips_update_hrtimer(), and clearing CP0_Cause.TI and setting
      CP0_Compare between the freeze and resume. Since the pending timer
      interrupt should not be cleared when CP0_Compare is written via the KVM
      user API, an ack argument is added to distinguish the source of the
      write.
      
      Fixes: e30492bb ("MIPS: KVM: Rewrite count/compare timer emulation")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim KrčmáÅ" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.16.x-
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b45bacd2
    • James Hogan's avatar
      MIPS: KVM: Fix timer IRQ race when freezing timer · 4355c44f
      James Hogan authored
      There's a particularly narrow and subtle race condition when the
      software emulated guest timer is frozen which can allow a guest timer
      interrupt to be missed.
      
      This happens due to the hrtimer expiry being inexact, so very
      occasionally the freeze time will be after the moment when the emulated
      CP0_Count transitions to the same value as CP0_Compare (so an IRQ should
      be generated), but before the moment when the hrtimer is due to expire
      (so no IRQ is generated). The IRQ won't be generated when the timer is
      resumed either, since the resume CP0_Count will already match CP0_Compare.
      
      With VZ guests in particular this is far more likely to happen, since
      the soft timer may be frozen frequently in order to restore the timer
      state to the hardware guest timer. This happens after 5-10 hours of
      guest soak testing, resulting in an overflow in guest kernel timekeeping
      calculations, hanging the guest. A more focussed test case to
      intentionally hit the race (with the help of a new hypcall to cause the
      timer state to migrated between hardware & software) hits the condition
      fairly reliably within around 30 seconds.
      
      Instead of relying purely on the inexact hrtimer expiry to determine
      whether an IRQ should be generated, read the guest CP0_Compare and
      directly check whether the freeze time is before or after it. Only if
      CP0_Count is on or after CP0_Compare do we check the hrtimer expiry to
      determine whether the last IRQ has already been generated (which will
      have pushed back the expiry by one timer period).
      
      Fixes: e30492bb ("MIPS: KVM: Rewrite count/compare timer emulation")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim KrčmáÅ" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.16.x-
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4355c44f
  2. 09 May, 2016 9 commits
  3. 04 May, 2016 2 commits
  4. 03 May, 2016 1 commit
  5. 29 Apr, 2016 1 commit
    • Bruce Rogers's avatar
      KVM: x86: fix ordering of cr0 initialization code in vmx_cpu_reset · f2463247
      Bruce Rogers authored
      Commit d28bc9dd reversed the order of two lines which initialize cr0,
      allowing the current (old) cr0 value to mess up vcpu initialization.
      This was observed in the checks for cr0 X86_CR0_WP bit in the context of
      kvm_mmu_reset_context(). Besides, setting vcpu->arch.cr0 after vmx_set_cr0()
      is completely redundant. Change the order back to ensure proper vcpu
      initialization.
      
      The combination of booting with ovmf firmware when guest vcpus > 1 and kvm's
      ept=N option being set results in a VM-entry failure. This patch fixes that.
      
      Fixes: d28bc9dd ("KVM: x86: INIT and reset sequences are different")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBruce Rogers <brogers@suse.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      f2463247
  6. 25 Apr, 2016 1 commit
  7. 20 Apr, 2016 8 commits
  8. 13 Apr, 2016 1 commit
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 5e1b59ab
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "ARM fixes:
         - Wrong indentation in the PMU code from the merge window
         - A long-time bug occuring with running ntpd on the host, candidate
           for stable
         - Properly handle (and warn about) the unsupported configuration of
           running on systems with less than 40 bits of PA space
         - More fixes to the PM and hotplug notifier stuff from the merge
           window
      
        x86:
         - leak of guest xcr0 (typically shows up as SIGILL)
         - new maintainer (who is sending the pull request too)
         - fix for merge window regression
         - fix for guest CPUID"
      
      Paolo Bonzini points out:
       "For the record, this tag is signed by me because I prepared the pull
        request.  Further pull requests for 4.6 will be signed and sent out by
        Radim directly"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: mask CPUID(0xD,0x1).EAX against host value
        kvm: x86: do not leak guest xcr0 into host interrupt handlers
        KVM: MMU: fix permission_fault()
        KVM: new maintainer on the block
        arm64: KVM: unregister notifiers in hyp mode teardown path
        arm64: KVM: Warn when PARange is less than 40 bits
        KVM: arm/arm64: Handle forward time correction gracefully
        arm64: KVM: Add braces to multi-line if statement in virtual PMU code
      5e1b59ab
  9. 11 Apr, 2016 7 commits
  10. 10 Apr, 2016 4 commits
    • Linus Torvalds's avatar
      Revert "ext4: allow readdir()'s of large empty directories to be interrupted" · 9f2394c9
      Linus Torvalds authored
      This reverts commit 1028b55b.
      
      It's broken: it makes ext4 return an error at an invalid point, causing
      the readdir wrappers to write the the position of the last successful
      directory entry into the position field, which means that the next
      readdir will now return that last successful entry _again_.
      
      You can only return fatal errors (that terminate the readdir directory
      walk) from within the filesystem readdir functions, the "normal" errors
      (that happen when the readdir buffer fills up, for example) happen in
      the iterorator where we know the position of the actual failing entry.
      
      I do have a very different patch that does the "signal_pending()"
      handling inside the iterator function where it is allowable, but while
      that one passes all the sanity checks, I screwed up something like four
      times while emailing it out, so I'm not going to commit it today.
      
      So my track record is not good enough, and the stars will have to align
      better before that one gets committed.  And it would be good to get some
      review too, of course, since celestial alignments are always an iffy
      debugging model.
      
      IOW, let's just revert the commit that caused the problem for now.
      Reported-by: default avatarGreg Thelen <gthelen@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9f2394c9
    • Paolo Bonzini's avatar
      KVM: x86: mask CPUID(0xD,0x1).EAX against host value · 316314ca
      Paolo Bonzini authored
      This ensures that the guest doesn't see XSAVE extensions
      (e.g. xgetbv1 or xsavec) that the host lacks.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      316314ca
    • David Matlack's avatar
      kvm: x86: do not leak guest xcr0 into host interrupt handlers · fc5b7f3b
      David Matlack authored
      An interrupt handler that uses the fpu can kill a KVM VM, if it runs
      under the following conditions:
       - the guest's xcr0 register is loaded on the cpu
       - the guest's fpu context is not loaded
       - the host is using eagerfpu
      
      Note that the guest's xcr0 register and fpu context are not loaded as
      part of the atomic world switch into "guest mode". They are loaded by
      KVM while the cpu is still in "host mode".
      
      Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
      interrupt handler will look something like this:
      
      if (irq_fpu_usable()) {
              kernel_fpu_begin();
      
              [... code that uses the fpu ...]
      
              kernel_fpu_end();
      }
      
      As long as the guest's fpu is not loaded and the host is using eager
      fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
      returns true). The interrupt handler proceeds to use the fpu with
      the guest's xcr0 live.
      
      kernel_fpu_begin() saves the current fpu context. If this uses
      XSAVE[OPT], it may leave the xsave area in an undesirable state.
      According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
      if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
      xcr0[i] == 0 following an XSAVE.
      
      kernel_fpu_end() restores the fpu context. Now if any bit i in
      XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
      fault is trapped and SIGSEGV is delivered to the current process.
      
      Only pre-4.2 kernels appear to be vulnerable to this sequence of
      events. Commit 653f52c3 ("kvm,x86: load guest FPU context more eagerly")
      from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
      
      This patch fixes the bug by keeping the host's xcr0 loaded outside
      of the interrupts-disabled region where KVM switches into guest mode.
      
      Cc: stable@vger.kernel.org
      Suggested-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      [Move load after goto cancel_injection. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fc5b7f3b
    • Xiao Guangrong's avatar
      KVM: MMU: fix permission_fault() · 7a98205d
      Xiao Guangrong authored
      kvm-unit-tests complained about the PFEC is not set properly, e.g,:
      test pte.rw pte.d pte.nx pde.p pde.rw pde.pse user fetch: FAIL: error code 15
      expected 5
      Dump mapping: address: 0x123400000000
      ------L4: 3e95007
      ------L3: 3e96007
      ------L2: 2000083
      
      It's caused by the reason that PFEC returned to guest is copied from the
      PFEC triggered by shadow page table
      
      This patch fixes it and makes the logic of updating errcode more clean
      Signed-off-by: default avatarXiao Guangrong <guangrong.xiao@linux.intel.com>
      [Do not assume pfec.p=1. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7a98205d