- 29 Jan, 2014 1 commit
-
-
Paolo Bonzini authored
Self-explanatory. Reported-by:
Radim Krcmar <rkrcmar@redhat.com> Cc: Vadim Rozenfeld <vrozenfe@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 27 Jan, 2014 1 commit
-
-
Jan Kiszka authored
Check for invalid state transitions on guest-initiated updates of MSR_IA32_APICBASE. This addresses both enabling of the x2APIC when it is not supported and all invalid transitions described in SDM section 10.12.5. It also checks that no reserved bit is set in APICBASE by the guest. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> [Use cpuid_maxphyaddr instead of guest_cpuid_get_phys_bits. - Paolo] Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
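For illustration, a minimal sketch of the transition rules the patch enforces (SDM 10.12.5); the macro and helper names here are local to the sketch, not KVM's actual code:

    #include <stdbool.h>
    #include <stdint.h>

    #define APICBASE_EXTD (1ULL << 10)  /* x2APIC enable */
    #define APICBASE_EN   (1ULL << 11)  /* xAPIC global enable */

    static bool apicbase_transition_valid(uint64_t old, uint64_t new,
                                          bool guest_has_x2apic)
    {
        bool old_extd = old & APICBASE_EXTD, new_extd = new & APICBASE_EXTD;
        bool old_en   = old & APICBASE_EN,   new_en   = new & APICBASE_EN;

        if (new_extd && !guest_has_x2apic)
            return false;           /* x2APIC not exposed in guest CPUID */
        if (new_extd && !new_en)
            return false;           /* EXTD=1, EN=0 is an invalid combination */
        if (!old_en && new_extd)
            return false;           /* disabled -> x2APIC is not allowed */
        if (old_extd && old_en && new_en && !new_extd)
            return false;           /* x2APIC -> xAPIC without disabling first */
        return true;
    }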
-
- 24 Jan, 2014 1 commit
-
-
Vadim Rozenfeld authored
Signed-off-by:
Vadim Rozenfeld <vrozenfe@redhat.com> Reviewed-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 23 Jan, 2014 1 commit
-
-
Vadim Rozenfeld authored
Signed-off-by:
Vadim Rozenfeld <vrozenfe@redhat.com> Reviewed-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 17 Jan, 2014 3 commits
-
-
Jan Kiszka authored
In contrast to VMX, SVM does not automatically transfer DR6 into the VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor hook to obtain the current value. And as SVM now picks the DR6 state from its VMCB, we also need a set callback in order to write updates of DR6 back. Fixes a regression of 020df079. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
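A sketch of the vendor hook shape described above (names and types are illustrative, not the kernel's actual kvm_x86_ops layout):

    #include <stdint.h>

    struct vcpu;    /* opaque stand-in for the real VCPU structure */

    struct debug_regs_ops {
        uint64_t (*get_dr6)(struct vcpu *vcpu);
        void     (*set_dr6)(struct vcpu *vcpu, uint64_t value);
    };

    /* On a guest DR6 read, ask the vendor code: SVM keeps the live value
     * in the VMCB, so a cached arch.dr6 copy may be stale. */
    static uint64_t read_guest_dr6(struct vcpu *vcpu,
                                   const struct debug_regs_ops *ops)
    {
        return ops->get_dr6(vcpu);
    }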
-
Jan Kiszka authored
Whenever we change arch.dr7, we also have to call kvm_update_dr7. In case guest debugging is off, this will synchronize the new state into hardware. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
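The rule above, as a fragment against KVM's internal API (set_guest_dr7() is a hypothetical wrapper; kvm_update_dr7() is the real helper named in the message):

    static void set_guest_dr7(struct kvm_vcpu *vcpu, unsigned long val)
    {
        vcpu->arch.dr7 = val;
        kvm_update_dr7(vcpu);   /* pushes the value to hardware when guest debugging is off */
    }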
-
Vadim Rozenfeld authored
Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Gleb Natapov Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com> After some consideration I decided to submit only Hyper-V reference counters support this time. I will submit iTSC support as a separate patch as soon as it is ready.
v1 -> v2:
1. Mark the TSC page dirty, as suggested by Eric Northup <digitaleric@google.com> and Gleb.
2. Disable local irqs when calling get_kernel_ns, as was done by Peter Lieven <pl@amp.de>.
3. Move the check for TSC page enable from the second patch to this one.
v3 -> v4: Get rid of the ref counter offset.
v4 -> v5: Replace __copy_to_user with kvm_write_guest when updating the iTSC page. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
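For reference, the computation the Hyper-V reference TSC page enables, sketched with local names (layout as in the Hyper-V TLFS; times are in 100 ns units):

    #include <stdint.h>

    struct ref_tsc_page {           /* illustrative struct, TLFS layout */
        uint32_t tsc_sequence;
        uint32_t reserved;
        uint64_t tsc_scale;
        int64_t  tsc_offset;
    };

    static uint64_t read_reference_time(const struct ref_tsc_page *p, uint64_t tsc)
    {
        /* reference_time = ((tsc * tsc_scale) >> 64) + tsc_offset */
        return (uint64_t)(((unsigned __int128)tsc * p->tsc_scale) >> 64) +
               p->tsc_offset;
    }

A real guest additionally rereads tsc_sequence around the computation and falls back to the MSR-based counter while the page is being updated.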
-
- 15 Jan, 2014 3 commits
-
-
Paolo Bonzini authored
After the previous patch from Marcelo, the comment before this write became obsolete. In fact, the write is unnecessary. The calls to kvm_write_tsc ultimately result in a master clock update as soon as all TSCs agree and the master clock is re-enabled. This master clock update will rewrite tsc_timestamp. So, together with the comment, delete the dead write too. Reviewed-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Marcelo Tosatti authored
To fix a problem related to different resolution of TSC and system clock, the offset in TSC units is approximated by delta = vcpu->hv_clock.tsc_timestamp - vcpu->last_guest_tsc, where hv_clock.tsc_timestamp is the guest TSC value at the last kvm_guest_time_update call and last_guest_tsc is the guest TSC value at the last VM-exit. Delta is then later scaled using the mult,shift pair found in the hv_clock structure (which is correct against tsc_timestamp in that structure). However, if a frequency change is performed between these two points, this delta is measured using different TSC frequencies, but scaled using the mult,shift pair for one frequency only. The end result is an incorrect delta. The bug which this code works around is not the only cause of clock-backwards events. The global accumulator is still necessary, so remove the max_kernel_ns fix and rely on the global accumulator to prevent clock-backwards events. Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
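The scaling step referred to above, sketched (this mirrors the pvclock mult/shift algorithm; the problem is that delta may span a frequency change while mult/shift describe only one frequency):

    #include <stdint.h>

    static uint64_t scale_tsc_delta(uint64_t delta, uint32_t mult, int8_t shift)
    {
        if (shift >= 0)
            delta <<= shift;
        else
            delta >>= -shift;
        return (uint64_t)(((unsigned __int128)delta * mult) >> 32);
    }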
-
Marcelo Tosatti authored
Limit the PIT timer frequency similarly to the limit applied to the LAPIC timer. Cc: stable@kernel.org Reviewed-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
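A sketch of the kind of clamp this implies; the constant is a placeholder, not KVM's actual minimum period:

    #include <stdint.h>

    #define MIN_TIMER_PERIOD_NS 200000ULL   /* placeholder: 200 us */

    /* periods shorter than the minimum are raised so a guest cannot
     * program an interrupt storm */
    static uint64_t clamp_timer_period(uint64_t period_ns)
    {
        return period_ns < MIN_TIMER_PERIOD_NS ? MIN_TIMER_PERIOD_NS : period_ns;
    }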
-
- 13 Dec, 2013 2 commits
-
-
Takuya Yoshikawa authored
Giving proper names to the 0 and 1 was once suggested. But since 0 is returned to userspace, giving it another name can introduce extra confusion. This patch just explains the meanings instead. Signed-off-by:
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Takuya Yoshikawa authored
Since commit 15ad7146 ("KVM: Use the scheduler preemption notifiers to make kvm preemptible"), the remaining stuff in this function is a simple cond_resched() call with an extra need_resched() check which was there to avoid dropping VCPUs unnecessarily. Now it is meaningless. Signed-off-by:
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
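The simplification described above, sketched as a fragment against the kernel's scheduling helpers:

    /* before: an extra need_resched() check around the reschedule point */
    void vcpu_relax_before(void)
    {
        if (!need_resched())
            return;
        cond_resched();
    }

    /* after: the wrapper collapses to a bare cond_resched() at the call site */
    void vcpu_relax_after(void)
    {
        cond_resched();
    }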
-
- 12 Dec, 2013 1 commit
-
-
Andy Honig authored
In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the potential to corrupt kernel memory if userspace provides an address that is at the end of a page. This patch converts those functions to use kvm_write_guest_cached and kvm_read_guest_cached. It also checks the vapic_address specified by userspace during ioctl processing and returns an error to userspace if the address is not a valid GPA. This is generally not guest-triggerable, because the required write is done by firmware that runs before the guest. Also, it only affects AMD processors and oldish Intel processors that do not have the FlexPriority feature (unless you disable FlexPriority, of course; then newer processors are also affected). Fixes: b93463aa ('KVM: Accelerated apic support') Reported-by:
Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by:
Andrew Honig <ahonig@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
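A sketch of the pattern the patch moves to, as a fragment against KVM's internal cache helpers (field name and length here are approximations, not the exact diff): the GPA is validated and cached once at ioctl time, and later accesses go through the cache instead of a raw write that could run off the end of a page.

    static int set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
    {
        if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apic->vapic_cache,
                                      vapic_addr, sizeof(u32)))
            return -EINVAL;   /* reject addresses that are not valid GPAs */
        return 0;
    }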
-
- 06 Nov, 2013 1 commit
-
-
Michael S. Tsirkin authored
I noticed that srcu_read_lock/unlock both have a memory barrier, so just by moving srcu_read_unlock earlier we can get rid of one call to smp_mb(), using smp_mb__after_srcu_read_unlock instead. Unsurprisingly, the gain is small but measurable using the unit-test microbenchmark: before, vmcall is in the ballpark of 1410 cycles; after, vmcall is in the ballpark of 1360 cycles. Signed-off-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
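The transformation, sketched as a fragment (both primitives are real kernel APIs; smp_mb__after_srcu_read_unlock() exists for exactly this pattern and can be a no-op when unlock already implies a full barrier):

    /* before: unlock, then a separate full barrier */
    srcu_read_unlock(&kvm->srcu, idx);
    smp_mb();

    /* after: rely on the barrier already implied by srcu_read_unlock() */
    srcu_read_unlock(&kvm->srcu, idx);
    smp_mb__after_srcu_read_unlock();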
-
- 31 Oct, 2013 2 commits
-
-
Paolo Bonzini authored
The loop was always using 0 as the index. This means that any rubbish after the first element of the array went undetected. It seems reasonable to assume that no KVM userspace did that. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
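A generic illustration of the bug class (hypothetical names, not the actual KVM code):

    #include <stddef.h>

    static int validate_entries(const int *entries, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (entries[0] < 0)     /* bug: should test entries[i] */
                return -1;
        return 0;
    }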
-
Paolo Bonzini authored
The KVM_SET_XCRS ioctl must accept anything that KVM_GET_XCRS could return. XCR0's bit 0 is always 1 in real processors with XSAVE, and KVM_GET_XCRS will always leave bit 0 set even if the emulated processor does not have XSAVE. So, KVM_SET_XCRS must ignore that bit when checking for attempts to enable unsupported save states. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
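The check described above, sketched with a hypothetical helper: bit 0 of XCR0 (x87 state) always reads back as 1, so it must not be treated as an attempt to enable an unsupported state.

    #include <stdbool.h>
    #include <stdint.h>

    #define XCR0_X87 (1ULL << 0)    /* always set in values returned by KVM_GET_XCRS */

    static bool xcr0_value_acceptable(uint64_t requested, uint64_t host_supported)
    {
        /* ignore bit 0 when looking for unsupported save states */
        return (requested & ~(host_supported | XCR0_X87)) == 0;
    }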
-
- 30 Oct, 2013 5 commits
-
-
Alex Williamson authored
We currently use some ad-hoc arch variables tied to legacy KVM device assignment to manage emulation of instructions that depend on whether non-coherent DMA is present. Create an interface for this, adapting legacy KVM device assignment and adding VFIO via the KVM-VFIO device. For now we assume that non-coherent DMA is possible any time we have a VFIO group. Eventually an interface can be developed as part of the VFIO external user interface to query the coherency of a group. Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alex Williamson authored
Default to operating in coherent mode. This simplifies the logic when we switch to a model of registering and unregistering noncoherent I/O with KVM. Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
Call it EmulateOnUD which is exactly what we're trying to do with vendor-specific instructions. Rename ->only_vendor_specific_insn to something shorter, while at it. Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
Add a field to the current emulation context which contains the instruction opcode length. This will streamline handling of opcodes of different length. Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
Add a kvm ioctl which states which system functionality kvm emulates. The format used is that of CPUID and we return the corresponding CPUID bits set for which we do emulate functionality. Make sure ->padding is being passed on clean from userspace so that we can use it for something in the future, after the ioctl gets cast in stone. s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at it. Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
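A minimal userspace sketch of calling the new ioctl (error handling trimmed; KVM_GET_EMULATED_CPUID and struct kvm_cpuid2 are the uapi names, the buffer size is an assumption):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);
        int nent = 64;   /* assumed big enough for this sketch */
        struct kvm_cpuid2 *cpuid =
            calloc(1, sizeof(*cpuid) + nent * sizeof(struct kvm_cpuid_entry2));

        cpuid->nent = nent;
        if (kvm >= 0 && ioctl(kvm, KVM_GET_EMULATED_CPUID, cpuid) == 0)
            printf("%u emulated CPUID leaves\n", cpuid->nent);
        free(cpuid);
        return 0;
    }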
-
- 17 Oct, 2013 1 commit
-
-
Aneesh Kumar K.V authored
We will use that in a later patch to find the kvm ops handler. Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
Alexander Graf <agraf@suse.de>
-
- 15 Oct, 2013 1 commit
-
-
chai wen authored
Page pinning is not mandatory in kvm async page fault processing, since after the async page fault event is delivered to the guest it accesses the page once again and does its own GUP. Drop the FOLL_GET flag in GUP in the async_pf code, and do some simplification of the check/clear processing. Suggested-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Gu zheng <guz.fnst@cn.fujitsu.com> Signed-off-by:
chai wen <chaiw.fnst@cn.fujitsu.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
- 03 Oct, 2013 4 commits
-
-
Paolo Bonzini authored
kvm_mmu initialization is mostly filling in function pointers; there is no way for it to fail. Clean up unused return values. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Paolo Bonzini authored
The new_cr3 MMU callback has been a wrapper for mmu_free_roots since commit e676505a (KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation, 2012-07-08). The commit message mentioned that "mmu_free_roots() is somewhat of an overkill, but fixing that is more complicated and will be done after this minimal fix". One year has passed, and no one really felt the need to do a different fix. Wrap the call with a kvm_mmu_new_cr3 function for clarity, but remove the callback. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Paolo Bonzini authored
This makes the interface more deterministic for userspace, which can expect (after configuring only the features it supports) to get exactly the same state from the kernel, independent of the host CPU and kernel version. Suggested-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Paolo Bonzini authored
A guest can still attempt to save and restore XSAVE states even if they have been masked in CPUID leaf 0Dh. This usually is not visible to the guest, but is still wrong: "Any attempt to set a reserved bit (as determined by the contents of EAX and EDX after executing CPUID with EAX=0DH, ECX=0H) in XCR0 for a given processor will result in a #GP exception". The patch also performs the same checks as __kvm_set_xcr in KVM_SET_XSAVE. This catches migration from a newer to an older kernel/processor before the guest starts running. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
- 30 Sep, 2013 1 commit
-
-
Paolo Bonzini authored
In commit e935b837 ("KVM: Convert kvm_lock to raw_spinlock"), the kvm_lock was made a raw lock. However, the kvm mmu_shrink() function tries to grab the (non-raw) mmu_lock within the scope of the raw locked kvm_lock being held. This leads to the following:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]
Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
Call Trace:
[<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
[<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
[<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
[<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
[<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
[<ffffffff8111824a>] balance_pgdat+0x54a/0x730
[<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
[<ffffffff811185bf>] kswapd+0x18f/0x490
[<ffffffff81070961>] ? get_parent_ip+0x11/0x50
[<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
[<ffffffff81118430>] ? balance_pgdat+0x730/0x730
[<ffffffff81060d2b>] kthread+0xdb/0xe0
[<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
[<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
[<ffffffff81060c50>] ? __init_kthread_worker+0x
After the previous patch, kvm_lock need not be a raw spinlock anymore, so change it back. Reported-by:
Paul Gortmaker <paul.gortmaker@windriver.com> Cc: kvm@vger.kernel.org Cc: gleb@redhat.com Cc: jan.kiszka@siemens.com Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 28 Aug, 2013 2 commits
-
-
Marcelo Tosatti authored
The offset to add to the hosts monotonic time, kvmclock_offset, is calculated against the monotonic time at KVM_SET_CLOCK ioctl time. Request a master clock update at this time, to reduce a potentially unbounded difference between the values of the masterclock and the clock value used to calculate kvmclock_offset. Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Paolo Bonzini authored
Support for single-step in the emulator (new in 3.12) does not work for MMIO or PIO writes, because they are completed without returning to the emulator. This is not worse than what we had in 3.11; still, add comments so that the issue is not forgotten. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
- 26 Aug, 2013 2 commits
-
-
Raghavendra K T authored
Note that we are using APIC_DM_REMRD, which has reserved usage. In the future, if APIC_DM_REMRD usage is standardized, then we should find some other way or go back to the old method. Suggested-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by:
Gleb Natapov <gleb@redhat.com> Acked-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Srivatsa Vaddagiri authored
kvm_hc_kick_cpu allows the calling vcpu to kick another vcpu out of halt state. The presence of these hypercalls is indicated to the guest via kvm_feature_pv_unhalt. Fold the pv_unhalt flag into the GET_MP_STATE ioctl to aid migration: during migration, any vcpu that got kicked but did not become runnable (still in halted state) should be runnable after migration. Signed-off-by:
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by:
Suzuki Poulose <suzuki@in.ibm.com> [Raghu: Apic related changes, folding pvunhalted into vcpu_runnable Added flags for future use (suggested by Gleb)] [ Raghu: fold pv_unhalt flag as suggested by Eric Northup] Signed-off-by:
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by:
Gleb Natapov <gleb@redhat.com> Acked-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
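Guest-side use of the hypercall, sketched as a fragment against the guest's kvm_para.h (the wrapper function is hypothetical; the feature bit and hypercall number are the real ABI names):

    #include <linux/kvm_para.h>   /* kvm_hypercall2(), KVM_HC_KICK_CPU, KVM_FEATURE_PV_UNHALT */

    static void kvm_pv_kick_vcpu(int apicid)
    {
        if (kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
            kvm_hypercall2(KVM_HC_KICK_CPU, 0, apicid);   /* flags, target APIC id */
    }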
-
- 07 Aug, 2013 1 commit
-
-
Nadav Har'El authored
kvm_set_cr3() attempts to check if the new cr3 is a valid guest physical address. The problem is that with nested EPT, cr3 is an *L2* physical address, not an L1 physical address as this test expects. As the comment above this test explains, it isn't necessary, and doesn't correspond to anything a real processor would do. So this patch removes it. Note that this wrong test could have also theoretically caused problems in nested NPT, not just in nested EPT. However, in practice, the problem was avoided: nested_svm_vmexit()/vmrun() do not call kvm_set_cr3 in the nested NPT case, and instead set the vmcb (and arch.cr3) directly, thus circumventing the problem. Additional potential calls to the buggy function are avoided in that we don't trap cr3 modifications when nested NPT is enabled. However, because in nested VMX we did want to use kvm_set_cr3() (as requested in Avi Kivity's review of the original nested VMX patches), we can't avoid this problem and need to fix it. Reviewed-by:
Orit Wasserman <owasserm@redhat.com> Reviewed-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by:
Nadav Har'El <nyh@il.ibm.com> Signed-off-by:
Jun Nakajima <jun.nakajima@intel.com> Signed-off-by:
Xinhao Xu <xinhao.xu@intel.com> Signed-off-by:
Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 29 Jul, 2013 3 commits
-
-
Paolo Bonzini authored
This lets debugging work better during emulation of invalid guest state. This time the check is done after emulation, but before writeback of the flags; we need to check the flags *before* execution of the instruction, and we cannot check singlestep_rip because the CS base may have already been modified. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Conflicts: arch/x86/kvm/x86.c
-
Paolo Bonzini authored
This lets debugging work better during emulation of invalid guest state. The check is done before emulating the instruction, and (in the case of guest debugging) reuses EMULATE_DO_MMIO to exit with KVM_EXIT_DEBUG. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The next patch will reuse it for userspace exits other than MMIO, namely debug events. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 18 Jul, 2013 4 commits
-
-
Nadav Har'El authored
Fix read/write of the IA32_FEATURE_CONTROL MSR in a nested environment. This patch simulates this MSR in nested_vmx; the default value is 0x0. BIOS should set it to 0x5 before VMXON. After the lock bit is set, writes to it will cause #GP(0). Another QEMU patch is also needed to handle emulation of reset and migration. Reset of a vCPU should clear this MSR, and migration should preserve its value. This patch is based on Nadav's previous commit. http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/88478 Signed-off-by:
Nadav Har'El <nyh@math.technion.ac.il> Signed-off-by:
Arthur Chunqi Li <yzt356@gmail.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
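Why 0x5: the architectural bit layout of IA32_FEATURE_CONTROL, sketched (bit 0 is the lock bit, bit 2 enables VMXON outside SMX; macro names are local to the sketch):

    #include <stdint.h>

    #define FEATURE_CONTROL_LOCKED            (1ULL << 0)
    #define FEATURE_CONTROL_VMXON_OUTSIDE_SMX (1ULL << 2)

    /* the 0x5 the BIOS is expected to write before VMXON */
    static const uint64_t feature_control_bios_value =
        FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_OUTSIDE_SMX;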
-
Mathias Krause authored
Void pointers don't need no casting, drop it. Signed-off-by:
Mathias Krause <minipli@googlemail.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Takuya Yoshikawa authored
Now that kvm_arch_memslots_updated() catches every increment of the memslots->generation, checking if the mmio generation has reached its maximum value is enough. Signed-off-by:
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Takuya Yoshikawa authored
This is called right after the memslots is updated, i.e. when the result of update_memslots() gets installed in install_new_memslots(). Since the memslots needs to be updated twice when we delete or move a memslot, kvm_arch_commit_memory_region() does not correspond to this exactly. In the following patch, x86 will use this new API to check if the mmio generation has reached its maximum value, in which case mmio sptes need to be flushed out. Signed-off-by:
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Acked-by:
Alexander Graf <agraf@suse.de> Reviewed-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-