- 18 Feb, 2022 8 commits
-
-
Paolo Bonzini authored
A few vendor callbacks are only used by VMX, but they return an integer or bool value. Introduce KVM_X86_OP_OPTIONAL_RET0 for them: if a func is NULL in struct kvm_x86_ops, it will be changed to __static_call_return0 when updating static calls. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
All their invocations are conditional on vcpu->arch.apicv_active, meaning that they need not be implemented by vendor code: even though at the moment both vendors implement APIC virtualization, all of them can be optional. In fact SVM does not need many of them, and their implementation can be deleted now. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Use the newly corrected KVM_X86_OP annotations to warn about possible NULL pointer dereferences as soon as the vendor module is loaded. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The original use of KVM_X86_OP_NULL, which was to mark calls that do not follow a specific naming convention, is not in use anymore. Instead, let's mark calls that are optional because they are always invoked within conditionals or with static_call_cond. Those that are _not_, i.e. those that are defined with KVM_X86_OP, must be defined by both vendor modules or some kind of NULL pointer dereference is bound to happen at runtime. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
SVM implements neither update_emulated_instruction nor set_apic_access_page_addr. Remove an "if" by calling them with static_call_cond(). Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The two ioctls used to implement userspace-accelerated TPR, KVM_TPR_ACCESS_REPORTING and KVM_SET_VAPIC_ADDR, are available even if hardware-accelerated TPR can be used. So there is no reason not to report KVM_CAP_VAPIC. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
I managed to get hold of a machine that has SEV but not SEV-ES, and sev_migrate_tests fails because sev_vm_create(true) returns ENOTTY. Fix this, and while at it also return KSFT_SKIP on machines that do not have SEV at all, instead of returning 0. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peter Gonda authored
For SEV-ES VMs with mirrors to be intra-host migrated they need to be able to migrate with the mirror. This is due to that fact that all VMSAs need to be added into the VM with LAUNCH_UPDATE_VMSA before lAUNCH_FINISH. Allowing migration with mirrors allows users of SEV-ES to keep the mirror VMs VMSAs during migration. Adds a list of mirror VMs for the original VM iterate through during its migration. During the iteration the owner pointers can be updated from the source to the destination. This fixes the ASID leaking issue which caused the blocking of migration of VMs with mirrors. Signed-off-by: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20220211193634.3183388-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 14 Feb, 2022 4 commits
-
-
Sean Christopherson authored
Use "avic" instead of "svm" for SVM's all of APICv hooks and make a few additional funciton name tweaks so that the AVIC functions conform to their associated kvm_x86_ops hooks. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Merge bugfix patches from Linux 5.17-rc.
-
Jim Mattson authored
AMD's event select is 3 nybbles, with the high nybble in bits 35:32 of a PerfEvtSeln MSR. Don't mask off the high nybble when configuring a RAW perf event. Fixes: ca724305 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20220203014813.2130559-2-jmattson@google.com> Reviewed-by: David Dunn <daviddunn@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
AMD's event select is 3 nybbles, with the high nybble in bits 35:32 of a PerfEvtSeln MSR. Don't drop the high nybble when setting up the config field of a perf_event_attr structure for a call to perf_event_create_kernel_counter(). Fixes: ca724305 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20220203014813.2130559-1-jmattson@google.com> Reviewed-by: David Dunn <daviddunn@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 11 Feb, 2022 6 commits
-
-
Maxim Levitsky authored
If svm_deliver_avic_intr is called just after the target vcpu's AVIC got inhibited, it might read a stale value of vcpu->arch.apicv_active which can lead to the target vCPU not noticing the interrupt. To fix this use load-acquire/store-release so that, if the target vCPU is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the AVIC. If AVIC has been disabled in the meanwhile, proceed with the KVM_REQ_EVENT-based delivery. Incomplete IPI vmexit has the same races as svm_deliver_avic_intr, and in fact it can be handled in exactly the same way; the only difference lies in who has set IRR, whether svm_deliver_interrupt or the processor. Therefore, svm_complete_interrupt_delivery can be used to fix incomplete IPI vmexits as well. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
SVM has to set IRR for both the AVIC and the software-LAPIC case, so pull it up to the common function that handles both configurations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
The check on the current CPU adds an extra level of indentation to svm_deliver_avic_intr and conflates documentation on what happens if the vCPU exits (of interest to svm_deliver_avic_intr) and migrates (only of interest to avic_ring_doorbell, which calls get/put_cpu()). Extract the wrmsr to a separate function and rewrite the comment in svm_deliver_avic_intr(). Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Muhammad Usama Anjum authored
There is no vmx_pi_mmio_test file. Remove it to get rid of error while creation of selftest archive: rsync: [sender] link_stat "/kselftest/kvm/x86_64/vmx_pi_mmio_test" failed: No such file or directory (2) rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3] Fixes: 6a581508 ("selftest: KVM: Add intra host migration tests") Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Message-Id: <20220210172352.1317554-1-usama.anjum@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Merge tag 'kvmarm-fixes-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take #3 - Fix pending state read of a HW interrupt
-
Marc Zyngier authored
It appears that a read access to GIC[DR]_I[CS]PENDRn doesn't always result in the pending interrupts being accurately reported if they are mapped to a HW interrupt. This is particularily visible when acking the timer interrupt and reading the GICR_ISPENDR1 register immediately after, for example (the interrupt appears as not-pending while it really is...). This is because a HW interrupt has its 'active and pending state' kept in the *physical* distributor, and not in the virtual one, as mandated by the spec (this is what allows the direct deactivation). The virtual distributor only caries the pending and active *states* (note the plural, as these are two independent and non-overlapping states). Fix it by reading the HW state back, either from the timer itself or from the distributor if necessary. Reported-by: Ricardo Koller <ricarkol@google.com> Tested-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220208123726.3604198-1-maz@kernel.org
-
- 10 Feb, 2022 22 commits
-
-
Oliver Upton authored
There is a local that contains a pointer to vcpu_vmx already. Just use that instead to get at the structure directly instead of doing pointer arithmetic. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20220204204705.3538240-8-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Introduce a new test for Hyper-V nSVM extensions (Hyper-V on KVM) and add a test for enlightened MSR-Bitmap feature: - Intercept access to MSR_FS_BASE in L1 and check that this works with enlightened MSR-Bitmap disabled. - Enabled enlightened MSR-Bitmap and check that the intercept still works as expected. - Intercept access to MSR_GS_BASE but don't clear the corresponding bit from clean fields mask, KVM is supposed to skip updating MSR-Bitmap02 and thus the consequent access to the MSR from L2 will not get intercepted. - Finally, clear the corresponding bit from clean fields mask and check that access to MSR_GS_BASE is now intercepted. The test works with the assumption, that access to MSR_FS_BASE/MSR_GS_BASE is not intercepted for L1. If this ever becomes not true the test will fail as nested_svm_exit_handled_msr() always checks L1's MSR-Bitmap for L2 irrespective of clean fields. The behavior is correct as enlightened MSR-Bitmap feature is just an optimization, KVM is not obliged to ignore updates when the corresponding bit in clean fields stays clear. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-7-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
There's a copy of 'struct vmcb_control_area' definition in KVM selftests, update it to allow testing of the newly introduced features. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Similar to VMX, allocate memory for MSR-Bitmap and fill in 'msrpm_base_pa' in VMCB. To use it, tests will need to set INTERCEPT_MSR_PROT interception along with the required bits in the MSR-Bitmap. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Introduce a test for enlightened MSR-Bitmap feature (Hyper-V on KVM): - Intercept access to MSR_FS_BASE in L1 and check that this works with enlightened MSR-Bitmap disabled. - Enabled enlightened MSR-Bitmap and check that the intercept still works as expected. - Intercept access to MSR_GS_BASE but don't clear the corresponding bit from 'hv_clean_fields', KVM is supposed to skip updating MSR-Bitmap02 and thus the consequent access to the MSR from L2 will not get intercepted. - Finally, clear the corresponding bit from 'hv_clean_fields' and check that access to MSR_GS_BASE is now intercepted. The test works with the assumption, that access to MSR_FS_BASE/MSR_GS_BASE is not intercepted for L1. If this ever becomes not true the test will fail as nested_vmx_exit_handled_msr() always checks L1's MSR-Bitmap for L2 irrespective of 'hv_clean_fields'. The behavior is correct as enlightened MSR-Bitmap feature is just an optimization, KVM is not obliged to ignore updates when the corresponding bit in 'hv_clean_fields' stays clear. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Instead of just resetting 'hv_clean_fields' to 0 on every enlightened vmresume, do the expected cleaning of the corresponding bit on enlightened vmwrite. Avoid direct access to 'current_evmcs' from evmcs_test to support the change. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
CPUID 0x40000000.EAX is now always present as it has Enlightened MSR-Bitmap feature bit set. Adapt the test accordingly. Opportunistically add a check for the supported eVMCS version range. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220203104620.277031-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Similar to nVMX commit 502d2bf5 ("KVM: nVMX: Implement Enlightened MSR Bitmap feature"), add support for the feature for nSVM (Hyper-V on KVM). Notable differences from nVMX implementation: - As the feature uses SW reserved fields in VMCB control, KVM needs to make sure it's dealing with a Hyper-V guest (kvm_hv_hypercall_enabled()). - 'msrpm_base_pa' needs to be always be overwritten in nested_svm_vmrun_msrpm(), even when the update is skipped. As an optimization, nested_vmcb02_prepare_control() copies it from VMCB01 so when MSR-Bitmap feature for L2 is disabled nothing needs to be done. - 'struct vmcb_ctrl_area_cached' needs to be extended with clean fields/sw reserved data and __nested_copy_vmcb_control_to_cache() needs to copy it so nested_svm_vmrun_msrpm() can use it later. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
In preparation to implementing Enlightened MSR-Bitmap feature for Hyper-V on KVM, split off the required definitions into common 'svm/hyperv.h' header. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
In preparation for using kvm_hv_hypercall_enabled() from SVM code, make it static inline to avoid the need to export it. The function is a simple check with only two call sites currently. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Similar to nVMX commit ed2a4800 ("KVM: nVMX: Track whether changes in L0 require MSR bitmap for L2 to be rebuilt"), introduce a flag to keep track of whether MSR bitmap for L2 needs to be rebuilt due to changes in MSR bitmap for L1 or switching to a different L2. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
Add an option to dirty_log_perf_test.c to disable KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE and KVM_DIRTY_LOG_INITIALLY_SET so the legacy dirty logging code path can be tested. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-19-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
Add a tracepoint that records whenever KVM eagerly splits a huge page and the error status of the split to indicate if it succeeded or failed and why. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-18-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
When using KVM_DIRTY_LOG_INITIALLY_SET, huge pages are not write-protected when dirty logging is enabled on the memslot. Instead they are write-protected once userspace invokes KVM_CLEAR_DIRTY_LOG for the first time and only for the specific sub-region being cleared. Enhance KVM_CLEAR_DIRTY_LOG to also try to split huge pages prior to write-protecting to avoid causing write-protection faults on vCPU threads. This also allows userspace to smear the cost of huge page splitting across multiple ioctls, rather than splitting the entire memslot as is the case when initially-all-set is not used. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-17-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
When dirty logging is enabled without initially-all-set, try to split all huge pages in the memslot down to 4KB pages so that vCPUs do not have to take expensive write-protection faults to split huge pages. Eager page splitting is best-effort only. This commit only adds the support for the TDP MMU, and even there splitting may fail due to out of memory conditions. Failures to split a huge page is fine from a correctness standpoint because KVM will always follow up splitting by write-protecting any remaining huge pages. Eager page splitting moves the cost of splitting huge pages off of the vCPU threads and onto the thread enabling dirty logging on the memslot. This is useful because: 1. Splitting on the vCPU thread interrupts vCPUs execution and is disruptive to customers whereas splitting on VM ioctl threads can run in parallel with vCPU execution. 2. Splitting all huge pages at once is more efficient because it does not require performing VM-exit handling or walking the page table for every 4KiB page in the memslot, and greatly reduces the amount of contention on the mmu_lock. For example, when running dirty_log_perf_test with 96 virtual CPUs, 1GiB per vCPU, and 1GiB HugeTLB memory, the time it takes vCPUs to write to all of their memory after dirty logging is enabled decreased by 95% from 2.94s to 0.14s. Eager Page Splitting is over 100x more efficient than the current implementation of splitting on fault under the read lock. For example, taking the same workload as above, Eager Page Splitting reduced the CPU required to split all huge pages from ~270 CPU-seconds ((2.94s - 0.14s) * 96 vCPU threads) to only 1.55 CPU-seconds. Eager page splitting does increase the amount of time it takes to enable dirty logging since it has split all huge pages. For example, the time it took to enable dirty logging in the 96GiB region of the aforementioned test increased from 0.001s to 1.55s. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-16-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
Separate the allocation of shadow pages from their initialization. This is in preparation for splitting huge pages outside of the vCPU fault context, which requires a different allocation mechanism. No functional changed intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-15-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
Derive the page role from the parent shadow page, since the only thing that changes is the level. This is in preparation for splitting huge pages during VM-ioctls which do not have access to the vCPU MMU context. No functional change intended. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-14-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
The vCPU's mmu_role already has the correct values for direct, has_4_byte_gpte, access, and ad_disabled. Remove the code that was redundantly overwriting these fields with the same values. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-13-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
Instead of passing a pointer to the root page table and the root level separately, pass in a pointer to the root kvm_mmu_page struct. This reduces the number of arguments by 1, cutting down on line lengths. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-12-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
restore_acc_track_spte() is pure SPTE bit manipulation, making it a good fit for spte.h. And now that the WARN_ON_ONCE() calls have been removed, there isn't any good reason to not inline it. This move also prepares for a follow-up commit that will need to call restore_acc_track_spte() from spte.c No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-11-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
The new_spte local variable is unnecessary. Deleting it can save a line of code and simplify the remaining lines a bit. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-10-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
The warnings in restore_acc_track_spte() can be removed because the only caller checks is_access_track_spte(), and is_access_track_spte() checks !spte_ad_enabled(). In other words, the warning can never be triggered. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-9-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-