- 19 May, 2015 8 commits
-
-
Xiao Guangrong authored
It's used to abstract the code from kvm_handle_hva_range and will be used by a later patch. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Xiao Guangrong authored
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Xiao Guangrong authored
It's used to walk all the sptes on the rmap to clean up the code. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
This reverts commit ff7bbb9c. Sasha Levin is seeing odd jumps in time values during boot of a KVM guest:

    [...]
    [ 0.000000] tsc: Detected 2260.998 MHz processor
    [3376355.247558] Calibrating delay loop (skipped) preset value..
    [...]

and bisected them to this commit.

Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Xiao Guangrong authored
KVM may turn a user page into a kernel page when the kernel writes to a read-only user page if CR0.WP = 1. This shadow page entry will be reused after SMAP is enabled, so the kernel is allowed to access this user page. Fix it by setting SMAP && !CR0.WP into the shadow page's role and resetting the mmu once CR4.SMAP is updated. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
When a REP-string is executed in 64-bit mode with an address-size prefix, ECX/EDI/ESI are used as counter and pointers. When ECX is initially zero, Intel CPUs clear the high 32-bits of RCX, and recent Intel CPUs update the high bits of the pointers in MOVS/STOS. This behavior is specific to Intel according to a few experiments. As one may guess, this is an undocumented behavior. Yet, it is observable in the guest, since at least VMX traps REP-INS/OUTS even when ECX=0. Note that VMware appears to get it right. The behavior can be observed using the following code:

    #include <stdio.h>
    #include <sys/io.h>   /* iopl() */

    /* Low 32 bits clear, so ECX == 0 while the high half of RCX is set. */
    #define LOW_MASK (0xffffffff00000000ull)
    #define ALL_MASK (0xffffffffffffffffull)

    /* Emit REPNE (0xf2), the address-size prefix (0x67) and the string opcode. */
    #define TEST(opcode) \
        do { \
            asm volatile(".byte 0xf2 \n\t .byte 0x67 \n\t .byte " opcode "\n\t" \
                         : "=S"(s), "=c"(c), "=D"(d) \
                         : "S"(ALL_MASK), "c"(LOW_MASK), "D"(ALL_MASK)); \
            printf("opcode %s rcx=%llx rsi=%llx rdi=%llx\n", \
                   opcode, c, s, d); \
        } while(0)

    int main(void)
    {
        unsigned long long s, d, c;

        iopl(3);   /* permit the port I/O opcodes (INS/OUTS) from user space */
        TEST("0x6c"); TEST("0x6d"); TEST("0x6e"); TEST("0x6f");
        TEST("0xa4"); TEST("0xa5"); TEST("0xa6"); TEST("0xa7");
        TEST("0xaa"); TEST("0xab"); TEST("0xae"); TEST("0xaf");
        return 0;
    }

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
When a REP-string instruction is preceded by an address-size prefix, ECX/EDI/ESI are used as the operation counter and pointers. When they are updated, the high 32-bits of RCX/RDI/RSI are cleared, similarly to the way they are updated on every 32-bit register operation. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
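
The register update rule described in this commit can be sketched as below. This is a minimal illustration and not the emulator's actual code: `write_reg_masked` and its `ad_bytes` parameter (effective address size in bytes) are hypothetical names chosen for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the new 64-bit register contents when a
 * counter or pointer is written back with a given effective address size.
 */
static uint64_t write_reg_masked(uint64_t old, uint64_t val, int ad_bytes)
{
    switch (ad_bytes) {
    case 2:
        /* 16-bit update: only the low 16 bits change. */
        return (old & ~0xffffULL) | (val & 0xffffULL);
    case 4:
        /* 32-bit update: the result is zero-extended, clearing bits 63:32. */
        return val & 0xffffffffULL;
    default:
        /* 64-bit update: the whole register is replaced. */
        return val;
    }
}
```

With `ad_bytes == 4`, a register that held `0xffffffff00000005` and is decremented to 4 ends up as `0x4`, which is the behavior the fix restores for ECX/EDI/ESI.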
-
Nadav Amit authored
If the host sets hardware breakpoints to debug the guest, and a task-switch occurs in the guest, the architectural DR7 will not be updated. The effective DR7 would be updated instead. This fix puts the DR7 update during task-switch emulation, so it now uses the standard DR setting mechanism instead of the one that was previously used. As a bonus, the update of DR7 will now be effective for AMD as well. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 11 May, 2015 1 commit
-
-
Paolo Bonzini authored
Merge tag 'kvm-s390-next-20150508' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes and features for 4.2 (kvm/next)

Mostly a bunch of fixes, reworks and optimizations for s390. There is one new feature (EDAT-2 inside the guest), which boils down to 2GB pages.
-
- 08 May, 2015 15 commits
-
-
David Hildenbrand authored
Our implementation will never trigger interception code 12 as the responsible setting is never enabled - and never will be. The handler is dead code. Let's get rid of it. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
David Hildenbrand authored
This patch factors out the search for a floating irq destination VCPU as well as the kicking of the found VCPU. The search is optimized in the following ways:

1. stopped VCPUs can't take any floating interrupts, so try to find an operating one. We have to take care of the special case where all VCPUs are stopped and we don't have any valid destination.
2. use online_vcpus, not KVM_MAX_VCPU. This speeds up the search especially if KVM_MAX_VCPU is increased one day. As these VCPU objects are initialized prior to increasing online_vcpus, we can be sure that they exist.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
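
A destination search along these lines might look as follows. This is a sketch under stated assumptions, not the s390 code: `struct vcpu_sketch`, `find_dest_vcpu` and the round-robin cursor `next` are hypothetical, and the commit message does not say how ties between running VCPUs are broken.

```c
#include <stdbool.h>

/* Hypothetical, reduced VCPU state: only what the search needs. */
struct vcpu_sketch {
    bool stopped;
};

/*
 * Scan only the first online_vcpus entries, starting at *next, and return
 * the index of the first VCPU that is not stopped. Returns -1 when every
 * online VCPU is stopped, i.e. there is no valid destination.
 * The caller is assumed to keep *next within [0, online_vcpus).
 */
static int find_dest_vcpu(const struct vcpu_sketch *vcpus, int online_vcpus,
                          int *next)
{
    if (online_vcpus <= 0)
        return -1;

    for (int i = 0; i < online_vcpus; i++) {
        int idx = (*next + i) % online_vcpus;

        if (!vcpus[idx].stopped) {
            /* Assumed policy: continue after this VCPU next time. */
            *next = (idx + 1) % online_vcpus;
            return idx;
        }
    }
    return -1;
}
```

The loop bound is the number of online VCPUs rather than a compile-time maximum, so the cost of the search does not grow if that maximum is raised.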
-
Jens Freimann authored
We can avoid checking guest control registers and guest PSW as well as all the masking and calculations on the interrupt masks when no interrupts are pending. Also, the check for IRQ_PEND_COUNT can be removed, because we won't enter the while loop if no interrupts are pending and invalid interrupt types can't be injected. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Christian Borntraeger authored
Some updates to the control blocks need to be done in a way that ensures that no CPU is within SIE. Provide wrappers around the s390_vcpu_block functions and adapt the TOD migration code to update in a guaranteed fashion. Also rename these functions to have the kvm_s390_ prefix like everything else. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
-
Christian Borntraeger authored
exit_sie_sync is used to kick CPUs out of SIE and prevent reentering at any point in time. This is used to reload the prefix pages and to set the IBS stuff in a way that guarantees that after this function returns we are no longer in SIE. All current users trigger KVM requests. The request must be set before we block the CPUs to avoid races. Let's make this implicit by adding the request into a new function kvm_s390_sync_requests that replaces exit_sie_sync and split out s390_vcpu_block and s390_vcpu_unblock, that can be used to keep CPUs out of SIE independent of requests. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
-
Guenther Hutzl authored
1. Enable EDAT2 in the list of KVM facilities
2. Handle 2G frames in pfmf instruction
   If we support EDAT2, we may enable handling of 2G frames if not in 24 bit mode.
3. Enable EDAT2 in sie_block
   If the EDAT2 facility is available we enable GED2 mode control in the sie_block.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Guenther Hutzl <hutzl@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Guenther Hutzl authored
We should only enable EDAT1 for the guest if the host actually supports it and the cpu model for the guest has EDAT-1 enabled. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Guenther Hutzl <hutzl@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Christian Borntraeger authored
The fast path for a sie exit is that no kvm request is pending. Make an early check to skip all single bit checks. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
David Hildenbrand authored
Commit ea5f4969 ("KVM: s390: only one external call may be pending at a time") introduced a bug on machines that don't have SIGP interpretation facility installed. The injection of an external call will now always fail with -EBUSY (if none is already pending). This leads to the following symptoms:

- An external call will be injected but with the wrong "src cpu id", as this id will not be remembered.
- The target vcpu will not be woken up, therefore the guest will hang if it cannot deal with unexpected failures of the SIGP EXTERNAL CALL instruction.
- If an external call is already pending, -EBUSY will not be reported.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # v4.0 Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Paolo Bonzini authored
smep_andnot_wp is initialized in kvm_init_shadow_mmu and shadow pages should not be reused for different values of it. Thus, it has to be added to the mask in kvm_mmu_pte_write. Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
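
The role comparison this commit extends can be sketched with plain bit flags. This is only an illustration: the bit positions, the `ROLE_*` names and `role_mismatch` are hypothetical, and which other fields belong in the mask is an assumption. The commit message only states that smep_andnot_wp has to be part of it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for a few role fields. */
#define ROLE_CR0_WP          (1u << 0)
#define ROLE_CR4_PAE         (1u << 1)
#define ROLE_NXE             (1u << 2)
#define ROLE_SMEP_ANDNOT_WP  (1u << 3)

/*
 * Return true when a shadow page's role differs from the current base
 * role in any field covered by the mask, meaning the page must not be
 * reused as-is.
 */
static bool role_mismatch(uint32_t sp_role, uint32_t base_role)
{
    const uint32_t mask = ROLE_CR0_WP | ROLE_CR4_PAE | ROLE_NXE |
                          ROLE_SMEP_ANDNOT_WP;

    return ((sp_role ^ base_role) & mask) != 0;
}
```

Leaving `ROLE_SMEP_ANDNOT_WP` out of `mask` would make two roles that differ only in that bit compare as equal, which is the reuse problem the commit describes.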
-
Xiao Guangrong authored
The current permission check assumes that the RSVD bit in PFEC is always zero. However, this is not true, since MMIO #PF uses it to quickly identify MMIO accesses. Fix it by clearing the bit if walking the guest page table is needed. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
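
Clearing the bit before the guest walk amounts to a single mask operation. The sketch below assumes the architectural x86 page-fault error code layout (bit 3 is RSVD); `pfec_for_guest_walk` is a hypothetical helper name, not a function from the patch.

```c
#include <stdint.h>

/* x86 page-fault error code bits. */
#define PFERR_PRESENT_MASK (1u << 0)
#define PFERR_WRITE_MASK   (1u << 1)
#define PFERR_USER_MASK    (1u << 2)
#define PFERR_RSVD_MASK    (1u << 3)

/*
 * Hypothetical helper: derive the error code handed to the permission
 * check of a guest page table walk by dropping the RSVD bit, which the
 * MMIO fast path may have left set.
 */
static uint32_t pfec_for_guest_walk(uint32_t error_code)
{
    return error_code & ~PFERR_RSVD_MASK;
}
```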
-
Heiko Carstens authored
On cpu hotplug only KVM emits an unconditional message that its notifier has been called. It certainly can be assumed that calling cpu hotplug notifiers works, therefore there is no added value if KVM prints a message. If an error happens on cpu online KVM will still emit a warning. So let's remove this superfluous message. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jan Kiszka authored
vcpu->arch.apic is NULL when a userspace irqchip is active. But instead of letting the test incorrectly depend on in-kernel irqchip mode, open-code it to catch also userspace x2APICs. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
Far call in 64-bit has a 32-bit operand size. Remove the marking of this operation as Stack so it can be emulated correctly in 64-bit. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Radim Krčmář authored
Caching memslot value and using mark_page_dirty_in_slot() avoids another O(log N) search when dirtying the page. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1428695247-27603-1-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
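
The saving comes from doing the slot lookup once and reusing the result. The sketch below is a simplified stand-in, not KVM's data structures: `struct memslot_sketch`, `find_memslot` and the ascending sort order are assumptions made for the example, and only the name `mark_page_dirty_in_slot` is taken from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced memory slot: a gfn range plus a dirty bitmap. */
struct memslot_sketch {
    uint64_t base_gfn;
    uint64_t npages;
    uint8_t *dirty_bitmap;   /* one bit per page, may be NULL */
};

/* O(log N) lookup over slots sorted by ascending base_gfn (assumed order). */
static struct memslot_sketch *find_memslot(struct memslot_sketch *slots,
                                           size_t n, uint64_t gfn)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (gfn < slots[mid].base_gfn)
            hi = mid;
        else if (gfn >= slots[mid].base_gfn + slots[mid].npages)
            lo = mid + 1;
        else
            return &slots[mid];
    }
    return NULL;
}

/* O(1): the caller already holds the slot, so no second search is needed. */
static void mark_page_dirty_in_slot(struct memslot_sketch *slot, uint64_t gfn)
{
    if (slot && slot->dirty_bitmap) {
        uint64_t rel = gfn - slot->base_gfn;

        slot->dirty_bitmap[rel / 8] |= (uint8_t)(1u << (rel % 8));
    }
}
```

A caller that keeps the pointer returned by `find_memslot` for the access itself can pass it straight to `mark_page_dirty_in_slot` afterwards instead of searching by gfn again.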
-
- 07 May, 2015 14 commits
-
-
Paolo Bonzini authored
Code and format roughly based on Xen's vmcs_dump_vcpu. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Marcelo Tosatti authored
Drop unnecessary rdtsc_barrier(), as has been determined empirically, see 057e6a8c for details. Noticed by Andy Lutomirski. Improves clock_gettime() by approximately 15% on Intel i7-3520M @ 2.90GHz. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Julia Lawall authored
If the null test is needed, the call to cancel_delayed_work_sync would have already crashed. Normally, the destroy function should only be called if the init function has succeeded, in which case ioapic is not null. Problem found using Coccinelle. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Radim Krčmář authored
PAT should be 0007_0406_0007_0406h on RESET and not modified on INIT. VMX used a wrong value (host's PAT) and while SVM used the right one, it never got to arch.pat. This is not an issue with QEMU as it will force the correct value. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
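
For reference, the reset value quoted above decodes into eight one-byte entries, each holding a 3-bit memory type. The helpers below are an illustrative sketch with hypothetical names (`pat_entry`, `pat_type_name`); the memory type encodings are the architectural ones.

```c
#include <stdint.h>

#define PAT_RESET_VALUE 0x0007040600070406ULL

/* Return the memory type stored in PAT entry i (0..7). */
static unsigned pat_entry(uint64_t pat, unsigned i)
{
    return (unsigned)((pat >> (8 * i)) & 0x7);
}

/* Map an architectural memory type encoding to its usual abbreviation. */
static const char *pat_type_name(unsigned type)
{
    switch (type) {
    case 0: return "UC";
    case 1: return "WC";
    case 4: return "WT";
    case 5: return "WP";
    case 6: return "WB";
    case 7: return "UC-";
    default: return "reserved";
    }
}
```

Entries 0 to 3 of the reset value are WB, WT, UC- and UC, and entries 4 to 7 repeat the same pattern.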
-
Rik van Riel authored
Currently KVM will clear the FPU bits in CR0.TS in the VMCS, and trap to re-load them every time the guest accesses the FPU after a switch back into the guest from the host. This patch copies the x86 task switch semantics for FPU loading, with the FPU loaded eagerly after first use if the system uses eager fpu mode, or if the guest uses the FPU frequently. In the latter case, after loading the FPU for 255 times, the fpu_counter will roll over, and we will revert to loading the FPU on demand, until it has been established that the guest is still actively using the FPU. This mirrors the x86 task switch policy, which seems to work. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
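
The counter policy can be sketched as follows. This is a simplified model, not the patch itself: `struct vcpu_fpu_sketch` and `should_deactivate_fpu` are hypothetical, and the threshold of 5 is an assumption, since the commit message only states that the 8-bit counter rolls over after 255 loads.

```c
#include <stdbool.h>
#include <stdint.h>

#define EAGER_THRESHOLD 5   /* assumed value, not stated in the commit message */

/* Hypothetical, reduced per-VCPU FPU state. */
struct vcpu_fpu_sketch {
    uint8_t fpu_counter;   /* wraps from 255 back to 0 */
    bool eager_fpu;        /* system-wide eager FPU mode */
};

/*
 * Called when the guest FPU state is put away after having been used.
 * Returns true when the FPU should be deactivated again (load on demand),
 * false when it should stay loaded (eager behavior).
 */
static bool should_deactivate_fpu(struct vcpu_fpu_sketch *v)
{
    if (v->eager_fpu)
        return false;

    /* Few uses so far, or the counter just wrapped: fall back to lazy. */
    return ++v->fpu_counter < EAGER_THRESHOLD;
}
```

Because the counter is 8 bits wide, the wrap-around periodically forces a return to on-demand loading, which re-checks whether the guest is still an active FPU user.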
-
James Sullivan authored
An MSI interrupt should only be delivered to the lowest priority CPU when it has RH=1, regardless of the delivery mode. Modified kvm_is_dm_lowest_prio() to check for either irq->delivery_mode == APIC_DM_LOWPRI or irq->msi_redir_hint. Moved kvm_is_dm_lowest_prio() into lapic.h and renamed to kvm_lowest_prio_delivery(). Changed a check in kvm_irq_delivery_to_apic_fast() from irq->delivery_mode == APIC_DM_LOWPRI to kvm_is_dm_lowest_prio(). Signed-off-by: James Sullivan <sullivan.james.f@gmail.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
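
The combined check described here reduces to a two-term predicate. The sketch below uses a hypothetical, reduced `struct lapic_irq_sketch` in place of struct kvm_lapic_irq; `APIC_DM_LOWPRI` is given the value of the lowest-priority delivery mode encoding in ICR bits 10:8.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_DM_FIXED  0x000u
#define APIC_DM_LOWPRI 0x100u

/* Hypothetical, reduced view of the fields the check reads. */
struct lapic_irq_sketch {
    uint32_t delivery_mode;
    bool msi_redir_hint;
};

/* True when the interrupt should go only to the lowest priority CPU. */
static bool lowest_prio_delivery(const struct lapic_irq_sketch *irq)
{
    return irq->delivery_mode == APIC_DM_LOWPRI || irq->msi_redir_hint;
}
```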
-
James Sullivan authored
Extended struct kvm_lapic_irq with bool msi_redir_hint, which will be used to determine if the delivery of the MSI should target only the lowest priority CPU in the logical group specified for delivery. (In physical dest mode, the RH bit is not relevant). Initialized the value of msi_redir_hint to true when RH=1 in kvm_set_msi_irq(), and initialized to false in all other cases. Added value of msi_redir_hint to a debug message dump of an IRQ in apic_send_ipi(). Signed-off-by: James Sullivan <sullivan.james.f@gmail.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
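
Deriving the hint from an MSI message is a single bit test on the low address word. The sketch assumes the x86 MSI address layout, where bit 3 is the Redirection Hint (RH) and bit 2 the destination mode; `MSI_ADDR_RH` and `msi_redir_hint_from_addr` are hypothetical names for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSI_ADDR_DEST_MODE (1u << 2)   /* DM: logical destination mode */
#define MSI_ADDR_RH        (1u << 3)   /* RH: redirection hint */

/* True when the MSI address has RH=1, false in all other cases. */
static bool msi_redir_hint_from_addr(uint32_t address_lo)
{
    return (address_lo & MSI_ADDR_RH) != 0;
}
```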
-
Paolo Bonzini authored
Change to u16 if they only contain data in the low 16 bits. Change the level field to bool, since we assign 1 sometimes, but just mask icr_low with APIC_INT_ASSERT in apic_send_ipi. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
The x86 architecture defines differences between the reset and INIT sequences. INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU, MSRs (in general), MTRRs, machine-check, APIC ID, APIC arbitration ID and BSP.

References (from Intel SDM):

"If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions]

[Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]

"If the processor is reset by asserting the INIT# pin, the x87 FPU state is not changed." [9.2: X87 FPU INITIALIZATION]

"The state of the local APIC following an INIT reset is the same as it is after a power-up or hardware reset, except that the APIC ID and arbitration ID registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset ("Wait-for-SIPI" State)]

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Nadav Amit authored
Introduce KVM_CAP_DISABLE_QUIRKS for disabling x86 quirks that were previously created in order to overcome QEMU issues. Those issues were mostly the result of an invalid VM BIOS. Currently there are two quirks that can be disabled:

1. KVM_QUIRK_LINT0_REENABLED - LINT0 was enabled after boot
2. KVM_QUIRK_CD_NW_CLEARED - CD and NW are cleared after boot

These two issues are already resolved in recent releases of QEMU, and would therefore be disabled by QEMU.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1428879221-29996-1-git-send-email-namit@cs.technion.ac.il> [Report capability from KVM_CHECK_EXTENSION too. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
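
How a disabled-quirks mask gates legacy behavior can be sketched as below. This is an illustration only: `struct vm_sketch` and `quirk_active` are hypothetical, and the bit values assigned to the two quirk names are assumptions for the example rather than values quoted from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Quirk names from the commit message; bit values assumed for the sketch. */
#define KVM_QUIRK_LINT0_REENABLED (1u << 0)
#define KVM_QUIRK_CD_NW_CLEARED   (1u << 1)

/* Hypothetical per-VM state: the set of quirks userspace has disabled. */
struct vm_sketch {
    uint32_t disabled_quirks;
};

/* A quirk stays in effect unless userspace explicitly disabled it. */
static bool quirk_active(const struct vm_sketch *vm, uint32_t quirk)
{
    return !(vm->disabled_quirks & quirk);
}
```

Defaulting to "quirk active" keeps existing userspace working unchanged, while a newer QEMU can opt out of each quirk individually.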
-
Paolo Bonzini authored
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christian Borntraeger authored
Use __kvm_guest_{enter|exit} instead of kvm_guest_{enter|exit} where interrupts are disabled. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christian Borntraeger authored
Several kvm architectures disable interrupts before kvm_guest_enter. kvm_guest_enter then uses local_irq_save/restore to disable interrupts again or for the first time. Let's provide underscore versions of kvm_guest_{enter|exit} that assume being called locked. kvm_guest_enter now disables interrupts for the full function and thus we can remove the check for preemptible. This patch then adapts s390/kvm to use local_irq_disable/enable calls, which are slightly cheaper than local_irq_save/restore, and to call these new functions. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Luiz Capitulino authored
If you try to enable NOHZ_FULL on a guest today, you'll get the following error when the guest tries to deactivate the scheduler tick:

    WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290()
    NO_HZ FULL will not work with unstable sched clock
    CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb7 #204
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    Workqueue: events flush_to_ldisc
     ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002
     ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8
     0000000000000000 0000000000000003 0000000000000001 0000000000000001
    Call Trace:
     <IRQ>
     [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b
     [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0
     [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50
     [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290
     [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0
     [<ffffffff810511c5>] irq_exit+0xc5/0x130
     [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60
     [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80
     <EOI>
     [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60
     [<ffffffff8108bbc8>] __wake_up+0x48/0x60
     [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0
     [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70
     [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20
     [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120
     [<ffffffff81064d05>] process_one_work+0x1d5/0x540
     [<ffffffff81064c81>] ? process_one_work+0x151/0x540
     [<ffffffff81065191>] worker_thread+0x121/0x470
     [<ffffffff81065070>] ? process_one_work+0x540/0x540
     [<ffffffff8106b4df>] kthread+0xef/0x110
     [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
     [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70
     [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
    ---[ end trace 06e3507544a38866 ]---

However, it turns out that kvmclock does provide a stable sched_clock callback. So, let the scheduler know this which in turn makes NOHZ_FULL work in the guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 04 May, 2015 2 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Linus Torvalds authored
Pull ext4 fixes from Ted Ts'o:

"Some miscellaneous bug fixes and some final on-disk and ABI changes for ext4 encryption which provide better security and performance"

* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix growing of tiny filesystems
  ext4: move check under lock scope to close a race.
  ext4: fix data corruption caused by unwritten and delayed extents
  ext4 crypto: remove duplicated encryption mode definitions
  ext4 crypto: do not select from EXT4_FS_ENCRYPTION
  ext4 crypto: add padding to filenames before encrypting
  ext4 crypto: simplify and speed up filename encryption
-