- 16 Nov, 2021 18 commits
-
-
Paolo Bonzini authored
- Cleanups for the perf test infrastructure and mapping hugepages
- Avoid contention on mmap_sem when the guests start to run
- Add event channel upcall support to xen_shinfo_test
-
David Matlack authored
Change memslot_modification_stress_test to use perf_test_destroy_vm instead of manually calling ucall_uninit and kvm_vm_free.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111001257.1446428-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
Thread creation requires taking the mmap_sem in write mode, which causes vCPU threads running in guest mode to block while they are populating memory. Fix this by waiting for all vCPU threads to be created and start running before entering guest mode on any one vCPU thread.

This substantially improves the "Populate memory time" when using 1GiB pages since it allows all vCPUs to zero pages in parallel rather than blocking because a writer is waiting (which is waiting for another vCPU that is busy zeroing a 1GiB page).

Before:

  $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
  ...
  Populate memory time: 52.811184013s

After:

  $ ./dirty_log_perf_test -v256 -s anonymous_hugetlb_1gb
  ...
  Populate memory time: 10.204573342s

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111001257.1446428-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
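The fix amounts to a start-up rendezvous before any vCPU enters guest mode. A minimal sketch of that idea, using hypothetical names (vcpu_thread_main, start_vcpu_threads) rather than the selftests' actual helpers:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int nr_vcpus_started;
static atomic_bool all_vcpus_started;

static void *vcpu_thread_main(void *arg)
{
	/* Announce that this vCPU thread now exists. */
	atomic_fetch_add(&nr_vcpus_started, 1);

	/*
	 * Spin until every vCPU thread has been created, so that no thread
	 * starts faulting in guest memory while pthread_create() still needs
	 * mmap_sem (mmap_lock) in write mode for the remaining threads.
	 */
	while (!atomic_load(&all_vcpus_started))
		;

	/* run_vcpu(arg);  -- enter guest mode only after the rendezvous */
	return NULL;
}

static void start_vcpu_threads(pthread_t *threads, int nr_vcpus)
{
	for (int i = 0; i < nr_vcpus; i++)
		pthread_create(&threads[i], NULL, vcpu_thread_main, NULL);

	/* Wait for all threads to check in, then release them together. */
	while (atomic_load(&nr_vcpus_started) < nr_vcpus)
		;
	atomic_store(&all_vcpus_started, true);
}
```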
-
David Matlack authored
Move vCPU thread creation and joining to common helper functions. This is in preparation for the next commit which ensures that all vCPU threads are fully created before entering guest mode on any one vCPU.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111001257.1446428-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Matlack authored
Start at iteration 0 instead of -1 to avoid having to initialize vcpu_last_completed_iteration when setting up vCPU threads. This simplifies the next commit where we move vCPU thread initialization out to a common helper.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111001257.1446428-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Copy perf_test_args to the guest during VM creation instead of relying on the caller to do so at their leisure. Ideally, tests wouldn't even be able to modify perf_test_args, i.e. they would have no motivation to do the sync, but enforcing that is arguably a net negative for readability.

No functional change intended.

[Set wr_fract=1 by default and add helper to override it since the new access_tracking_perf_test needs to set it dynamically.]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111000310.1435032-13-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
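A minimal sketch of the override helper mentioned in the bracketed note; sync_global_to_guest() and the perf_test_args global are existing selftest symbols, but the helper body here is assumed from the description:

```c
/*
 * Sketch only: the real helper lives in the KVM selftests' perf_test_util;
 * the body here is inferred from the commit description.
 */
void perf_test_set_wr_fract(struct kvm_vm *vm, int wr_fract)
{
	perf_test_args.wr_fract = wr_fract;

	/* Re-sync so the guest sees an override made after VM creation. */
	sync_global_to_guest(vm, perf_test_args);
}
```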
-
Sean Christopherson authored
Fill the per-vCPU args when creating the perf_test VM instead of having the caller do so. This helps ensure that any adjustments to the number of pages (and thus vcpu_memory_bytes) are reflected in the per-VM args. Automatically filling the per-vCPU args will also allow a future patch to do the sync to the guest during creation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
[Updated access_tracking_perf_test as well.]
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111000310.1435032-12-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use the already computed guest_num_pages when creating the so called extra VM pages for a perf test, and add a comment explaining why the pages are allocated as extra pages.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-11-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Remove perf_test_args.host_page_size and instead use getpagesize() so that it's somewhat obvious that, for tests that care about the host page size, they care about the system page size, not the hardware page size, e.g. that the logic is unchanged if hugepages are in play.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-10-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move the per-VM GPA into perf_test_args instead of storing it as a separate global variable. It's not obvious that guest_test_phys_mem holds a GPA, nor that it's connected/coupled with per_vcpu->gpa.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-9-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Grab the per-vCPU GPA and number of pages from perf_util in the demand paging test instead of duplicating perf_util's calculations.

Note, this may or may not result in a functional change. It's not clear that the test's calculations are guaranteed to yield the same value as perf_util, e.g. if guest_percpu_mem_size != vcpu_args->pages.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-8-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Capture the per-vCPU GPA in perf_test_vcpu_args so that tests can get the GPA without having to calculate the GPA on their own.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use 'pta' as a local pointer to the global perf_test_args in order to shorten line lengths and make the code borderline readable.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Assert that the GPA for a memslot backed by a hugepage is aligned to the hugepage size and fix perf_test_util accordingly. Lack of GPA alignment prevents KVM from backing the guest with hugepages, e.g. x86's write-protection of hugepages when dirty logging is activated is otherwise not exercised.

Add a comment explaining that guest_page_size is for non-huge pages to try and avoid confusion about what it actually tracks.

Cc: Ben Gardon <bgardon@google.com>
Cc: Yanan Wang <wangyanan55@huawei.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
[Used get_backing_src_pagesz() to determine alignment dynamically.]
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
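A hedged sketch of how the alignment could be applied; perf_test_first_gpa() is a hypothetical name, while get_backing_src_pagesz(), align_down(), vm_get_max_gfn() and TEST_ASSERT() are existing selftest utilities:

```c
/* Illustrative sketch of the alignment fix described above. */
static uint64_t perf_test_first_gpa(struct kvm_vm *vm, uint64_t guest_num_pages,
				    uint64_t guest_page_size,
				    enum vm_mem_backing_src_type src_type)
{
	size_t backing_pagesz = get_backing_src_pagesz(src_type);
	uint64_t gpa = (vm_get_max_gfn(vm) - guest_num_pages) * guest_page_size;

	/* Align down so a hugepage-backed memslot can actually use hugepages. */
	gpa = align_down(gpa, backing_pagesz);

	TEST_ASSERT(gpa % backing_pagesz == 0,
		    "Guest physical address must be aligned to the backing page size");
	return gpa;
}
```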
-
Sean Christopherson authored
Manually padding and aligning the mmap region is only needed when using THP. When using HugeTLB, mmap will always return an address aligned to the HugeTLB page size. Add a comment to clarify this and assert the mmap behavior for HugeTLB.

[Removed requirement that HugeTLB mmaps must be padded per Yanan's feedback and added assertion that mmap returns aligned addresses when using HugeTLB.]

Cc: Ben Gardon <bgardon@google.com>
Cc: Yanan Wang <wangyanan55@huawei.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
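A sketch of the described behavior, assuming illustrative names (setup_guest_mem) rather than the actual perf_test_util code; align_ptr_up() comes from the previous patch in the series:

```c
#include <sys/mman.h>
#include <stdint.h>

static void *setup_guest_mem(size_t size, size_t huge_page_size, bool is_thp,
			     bool is_hugetlb)
{
	/* Only THP needs manual padding so the region can be realigned. */
	size_t mmap_size = is_thp ? size + huge_page_size : size;
	int flags = MAP_PRIVATE | MAP_ANONYMOUS | (is_hugetlb ? MAP_HUGETLB : 0);
	void *mem = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, flags, -1, 0);

	TEST_ASSERT(mem != MAP_FAILED, "mmap failed");

	if (is_thp)
		mem = align_ptr_up(mem, huge_page_size);
	else if (is_hugetlb)
		/* HugeTLB mmap() always returns a hugepage-aligned address. */
		TEST_ASSERT((uintptr_t)mem % huge_page_size == 0,
			    "mmap should return a HugeTLB-aligned address");

	return mem;
}
```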
-
Sean Christopherson authored
Refactor align() to work with non-pointers and split into separate helpers for aligning up vs. down. Add align_ptr_up() for use with pointers. Expose all helpers so that they can be used by tests and/or other utilities. The align_down() helper in particular will be used to ensure gpa alignment for hugepages.

No functional change intended.

[Added separate up/down helpers and replaced open-coded alignment bit math throughout the KVM selftests.]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20211111000310.1435032-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
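A plausible shape for the helpers (sketch only; the real definitions live in the selftests' utility headers, and all assume a power-of-two size):

```c
#include <stdint.h>

static inline uint64_t align_up(uint64_t x, uint64_t size)
{
	uint64_t mask = size - 1;

	return (x + mask) & ~mask;
}

static inline uint64_t align_down(uint64_t x, uint64_t size)
{
	return x & ~(size - 1);
}

static inline void *align_ptr_up(void *x, uint64_t size)
{
	return (void *)align_up((uint64_t)x, size);
}
```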
-
Sean Christopherson authored
Explicitly state the indices when populating vm_guest_mode_params to make it marginally easier to visualize what's going on.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
[Added indices for new guest modes.]
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211111000310.1435032-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
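The style change looks roughly like the sketch below; the mode names are real selftest constants, but the table values here are illustrative:

```c
/* pa_bits, va_bits, page_size, page_shift -- values shown for illustration. */
static const struct vm_guest_mode_params vm_guest_mode_params[] = {
	[VM_MODE_P52V48_4K]	= { 52, 48, 0x1000,  12 },
	[VM_MODE_P52V48_64K]	= { 52, 48, 0x10000, 16 },
	[VM_MODE_P48V48_4K]	= { 48, 48, 0x1000,  12 },
	[VM_MODE_P40V48_4K]	= { 40, 48, 0x1000,  12 },
	/* ... remaining guest modes are listed the same way ... */
};
```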
-
David Woodhouse authored
When I first looked at this, there was no support for guest exception handling in the KVM selftests. In fact it was merged into 5.10 before the Xen support got merged in 5.11, and I could have used it from the start.

Hook it up now, to exercise the Xen upcall delivery. I'm about to make things a bit more interesting by handling the full 2level event channel stuff in-kernel on top of the basic vector injection that we already have, and I'll want to build more tests on top.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 12 Nov, 2021 3 commits
-
-
Paolo Bonzini authored
Merge tag 'kvmarm-fixes-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm64 fixes for 5.16, take #1

- Fix the host S2 finalization by solely iterating over the memblocks instead of the whole IPA space
- Tighten the return value of kvm_vcpu_preferred_target() now that 32bit support is long gone
- Make sure the extraction of ESR_ELx.EC is limited to the architected bits
- Comment fixups
-
Paolo Bonzini authored
Use the same cleanup code independent of whether the cgroup to be uncharged and unref'd is the source or the destination cgroup. Use a bool to track whether the destination cgroup has been charged, which also fixes a bug in the error case: the destination cgroup must be uncharged only if it does not match the source.

Fixes: b5663931 ("KVM: SEV: Add support for SEV intra host migration")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
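An excerpt-style sketch of the unwind path implied by the description; the helper names and surrounding context are assumptions, not the exact upstream code:

```c
	bool charged = false;
	int ret;

	if (dst_sev->misc_cg != src_sev->misc_cg) {
		ret = sev_misc_cg_try_charge(dst_sev);
		if (ret)
			goto out_dst_cgroup;
		charged = true;
	}

	/* ... perform the intra-host migration ... */
	return 0;

out_dst_cgroup:
	/* Only uncharge the destination cgroup if it was actually charged. */
	if (charged)
		sev_misc_cg_uncharge(dst_sev);
	put_misc_cg(dst_sev->misc_cg);
	dst_sev->misc_cg = NULL;
	return ret;
```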
-
Paolo Bonzini authored
When UBSAN is enabled, the code emitted for the call to guest_pv_has includes a call to __ubsan_handle_load_invalid_value. objtool complains that this call happens with UACCESS enabled; to avoid the warning, pull the calls to user_access_begin into both arms of the "if" statement, after the check for guest_pv_has.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
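An excerpt-style sketch of the restructuring, assuming the surrounding record_steal_time() context; it follows the description above rather than quoting the upstream diff:

```c
	/* guest_pv_has() (UBSAN-instrumented) now runs before UACCESS begins. */
	if (guest_pv_has(vcpu, KVM_FEATURE_PV_TLB_FLUSH)) {
		if (!user_access_begin(st, sizeof(*st)))
			return;
		/* ... atomic xchg of st->preempted and TLB-flush handling ... */
		user_access_end();
	} else {
		if (!user_access_begin(st, sizeof(*st)))
			return;
		unsafe_put_user(0, &st->preempted, out);
		vcpu->arch.st.preempted = 0;
out:
		user_access_end();
	}
```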
-
- 11 Nov, 2021 19 commits
-
-
Paolo Bonzini authored
* Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status
* Fix selftests on APICv machines
* Fix sparse warnings
* Fix detection of KVM features in CPUID
* Cleanups for bogus writes to MSR_KVM_PV_EOI_EN
* Fixes and cleanups for MSR bitmap handling
* Cleanups for INVPCID
* Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
-
Paolo Bonzini authored
Add support for AMD SEV and SEV-ES intra-host migration. Intra host migration provides a low-cost mechanism for userspace VMM upgrades.

In the common case for intra host migration, we can rely on the normal ioctls for passing data from one VMM to the next. SEV, SEV-ES, and other confidential compute environments make most of this information opaque, and render KVM ioctls such as "KVM_GET_REGS" irrelevant. As a result, we need the ability to pass this opaque metadata from one VMM to the next. The easiest way to do this is to leave this data in the kernel, and transfer ownership of the metadata from one KVM VM (or vCPU) to the next. In-kernel hand off makes it possible to move any data that would be unsafe/impossible for the kernel to hand directly to userspace, and cannot be reproduced using data that can be handed to userspace.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports either num_online_cpus() or num_present_cpus(), s390 has multiple constants depending on hardware features. On x86, KVM reports an arbitrary value of '710' which is supposed to be the maximum tested value but it's possible to test all KVM_MAX_VCPUS even when there are fewer physical CPUs available.

Drop the arbitrary '710' value and return num_online_cpus() on x86 as well. The recommendation will match other architectures and will mean 'no CPU overcommit'.

For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning when the requested vCPU number exceeds it. The static limit of '710' is quite weird as smaller systems with just a few physical CPUs should certainly "recommend" less.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211111134733.86601-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
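The x86 change boils down to a case in kvm_vm_ioctl_check_extension(); a sketch, with KVM_CAP_MAX_VCPUS shown for contrast between the recommended and the hard limit:

```c
	case KVM_CAP_NR_VCPUS:
		/* "Recommended" maximum: no overcommit of online physical CPUs. */
		r = num_online_cpus();
		break;
	case KVM_CAP_MAX_VCPUS:
		/* Hard limit, unchanged by this patch. */
		r = KVM_MAX_VCPUS;
		break;
```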
-
Vipin Sharma authored
Handle #GP on INVPCID due to an invalid type in the common switch statement instead of relying on the callers (VMX and SVM) to manually validate the type.

Unlike INVVPID and INVEPT, INVPCID is not explicitly documented to check the type before reading the operand from memory, so deferring the type validity check until after that point is architecturally allowed.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109174426.2350547-3-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vipin Sharma authored
handle_invept(), handle_invvpid(), and handle_invpcid() read the same reg2 field in vmcs.VMX_INSTRUCTION_INFO to get the index of the GPR that holds the invalidation type. Add a helper to retrieve reg2 from VMX instruction info to consolidate and document the shift+mask magic.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109174426.2350547-2-vipinsh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
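A sketch of such a helper; per the SDM's VMX instruction-information format, bits 31:28 hold the second register operand, though the exact upstream name may differ:

```c
/* Sketch: extract the reg2 operand from the VMX instruction-info field. */
static inline int vmx_get_instr_info_reg2(u32 vmx_instr_info)
{
	return (vmx_instr_info >> 28) & 0xf;
}
```

Callers would then fetch the GPR with something like kvm_register_read(vcpu, vmx_get_instr_info_reg2(info)).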
-
Sean Christopherson authored
Clean up the x2APIC MSR bitmap interception code for L2, which is the last holdout of open coded bitmap manipulations. Freshen up the SDM/PRM comment, rename the function to make it abundantly clear the funky behavior is x2APIC specific, and explain _why_ vmcs01's bitmap is ignored (the previous comment was flat out wrong for x2APIC behavior).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add builder macros to generate the MSR bitmap helpers to reduce the amount of copy-paste code, especially with respect to all the magic numbers needed to calc the correct bit location.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
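A hedged sketch of one such builder macro (the upstream helpers may differ in naming and also generate set/clear variants). The VMX MSR bitmap layout: bytes 0x000-0x3ff track reads of MSRs 0x00000000-0x00001fff, bytes 0x400-0x7ff track reads of 0xc0000000-0xc0001fff, and the corresponding write bitmaps follow at byte offsets 0x800 and 0xc00.

```c
#define BUILD_VMX_MSR_BITMAP_TEST(access, base)				       \
static inline bool vmx_test_msr_bitmap_##access(unsigned long *bitmap, u32 msr) \
{									       \
	int f = sizeof(unsigned long);	/* convert byte offsets to longs */   \
									       \
	if (msr <= 0x1fff)						       \
		return test_bit(msr, bitmap + base / f);		       \
	if (msr >= 0xc0000000 && msr <= 0xc0001fff)			       \
		return test_bit(msr & 0x1fff, bitmap + (base + 0x400) / f);   \
	return true;	/* out-of-range MSRs are always intercepted */	       \
}

BUILD_VMX_MSR_BITMAP_TEST(read,  0x0)
BUILD_VMX_MSR_BITMAP_TEST(write, 0x800)
```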
-
Sean Christopherson authored
Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2, and always update the relevant bits in vmcs02. This fixes two distinct, but intertwined bugs related to dynamic MSR bitmap modifications.

The first issue is that KVM fails to enable MSR interception in vmcs02 for the FS/GS base MSRs if L1 first runs L2 with interception disabled, and later enables interception.

The second issue is that KVM fails to honor userspace MSR filtering when preparing vmcs02.

Fix both issues simultaneously as fixing only one of the issues (doesn't matter which) would create a mess that no one should have to bisect. Fixing only the first bug would exacerbate the MSR filtering issue as userspace would see inconsistent behavior depending on the whims of L1. Fixing only the second bug (MSR filtering) effectively requires fixing the first, as the nVMX code only knows how to transition vmcs02's bitmap from 1->0.

Move the various accessor/mutators that are currently buried in vmx.c into vmx.h so that they can be shared by the nested code.

Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
Fixes: d69129b4 ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
Cc: stable@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Check the current VMCS controls to determine if an MSR write will be intercepted due to MSR bitmaps being disabled. In the nested VMX case, KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or if KVM can't map L1's bitmaps for whatever reason.

Note, the bad behavior is relatively benign in the current code base as KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and only if L0 KVM also disables interception of an MSR, and only uses the buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it isn't strictly necessary.

Tag the fix for stable in case a future fix wants to use msr_write_intercepted(), in which case a buggy implementation in older kernels could prove subtly problematic.

Fixes: d28b387f ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
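A sketch of the corrected helper, following the commit description (it reuses the bitmap accessors from the follow-up cleanup above):

```c
static bool msr_write_intercepted(struct vcpu_vmx *vmx, u32 msr)
{
	/* With MSR bitmaps disabled in the current VMCS, every access exits. */
	if (!(exec_controls_get(vmx) & CPU_BASED_USE_MSR_BITMAPS))
		return true;

	return vmx_test_msr_bitmap_write(vmx->loaded_vmcs->msr_bitmap, msr);
}
```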
-
Vitaly Kuznetsov authored
KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN

When the kvm_gfn_to_hva_cache_init() call from kvm_lapic_set_pv_eoi() fails, the MSR write to MSR_KVM_PV_EOI_EN results in #GP, so it is reasonable to expect that the value we keep internally in KVM wasn't updated.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211108152819.12485-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
kvm_lapic_enable_pv_eoi() is a misnomer as the function is also used to disable PV EOI. Rename it to kvm_lapic_set_pv_eoi().

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211108152819.12485-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paul Durrant authored
Currently when kvm_update_cpuid_runtime() runs, it assumes that the KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true, however, if Hyper-V support is enabled. In this case the KVM leaves will be offset.

This patch introduces a new 'kvm_cpuid_base' field into struct kvm_vcpu_arch to track the location of the KVM leaves and function kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now target the correct leaf.

NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is introduced into processor.h to avoid having duplicate code for the iteration over possible hypervisor base leaves.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Message-Id: <20211105095101.5384-3-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
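A sketch of the signature scan; it follows the description closely but is not guaranteed to match the upstream code line for line:

```c
#define KVM_SIGNATURE "KVMKVMKVM\0\0\0"

static void kvm_update_kvm_cpuid_base(struct kvm_vcpu *vcpu)
{
	u32 function, signature[3];
	struct kvm_cpuid_entry2 *entry;

	vcpu->arch.kvm_cpuid_base = 0;

	/* Hypervisor leaves may sit at 0x40000000, 0x40000100, 0x40000200, ... */
	for_each_possible_hypervisor_cpuid_base(function) {
		entry = kvm_find_cpuid_entry(vcpu, function, 0);
		if (!entry)
			continue;

		signature[0] = entry->ebx;
		signature[1] = entry->ecx;
		signature[2] = entry->edx;

		if (!memcmp(signature, KVM_SIGNATURE, sizeof(signature))) {
			vcpu->arch.kvm_cpuid_base = function;
			break;
		}
	}
}
```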
-
Sean Christopherson authored
Move the core logic of SET_CPUID and SET_CPUID2 to a common helper; the only difference between the two ioctls() is the format of the userspace struct. A future fix will add yet more code to the core logic.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211105095101.5384-2-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Junaid Shahid authored
The fast page fault path bails out on write faults to huge pages in order to accommodate dirty logging. This change adds a check to do that only when dirty logging is actually enabled, so that access tracking for huge pages can still use the fast path for write faults in the common case.

Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211104003359.2201967-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
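An excerpt-style sketch of the added condition in fast_page_fault(); exact field and helper names are assumptions:

```c
		/*
		 * Bail to the slow path for a write to a huge page only when
		 * dirty logging actually requires 4K mappings; otherwise
		 * access-tracked huge pages keep using the fast path.
		 */
		if (fault->write && sp->role.level > PG_LEVEL_4K &&
		    kvm_slot_dirty_track_enabled(fault->slot))
			break;
```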
-
Sean Christopherson authored
Wrap the read of iter->sptep in tdp_mmu_map_handle_target_level() with rcu_dereference(). Shadow pages in the TDP MMU, and thus their SPTEs, are protected by rcu.

This fixes a Sparse warning at tdp_mmu.c:900:51:

  warning: incorrect type in argument 1 (different address spaces)
     expected unsigned long long [usertype] *sptep
     got unsigned long long [noderef] [usertype] __rcu *[usertype] sptep

Fixes: 7158bee4 ("KVM: MMU: pass kvm_mmu_page struct to make_spte")
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211103161833.3769487-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using standard kvm's inject_pending_event, and not via APICv/AVIC.

Since this is a debug feature, just inhibit APICv/AVIC while KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.

Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
These function names sound like predicates, and they have siblings, *is_valid_msr(), which _are_ predicates. Moreover, there are comments that essentially warn that these functions behave unexpectedly.

Flip the polarity of the return values, so that they become predicates, and convert the boolean result to a success/failure code at the outer call site.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211105202058.1048757-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Woodhouse authored
In commit b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed") we switched to using a gfn_to_pfn_cache for accessing the guest steal time structure in order to allow for an atomic xchg of the preempted field. This has a couple of problems.

Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a guest vCPU using an IOMEM page for its steal time would never have its preempted field set.

Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it should have been. There are two stages to the GFN->PFN conversion; first the GFN is converted to a userspace HVA, and then that HVA is looked up in the process page tables to find the underlying host PFN. Correct invalidation of the latter would require being hooked up to the MMU notifiers, but that doesn't happen---so it just keeps mapping and unmapping the *wrong* PFN after the userspace page tables change.

In the !IOMEM case at least the stale page *is* pinned all the time it's cached, so it won't be freed and reused by anyone else while still receiving the steal time updates. The map/unmap dance only takes care of the KVM administrivia such as marking the page dirty.

Until the gfn_to_pfn cache handles the remapping automatically by integrating with the MMU notifiers, we might as well not get a kernel mapping of it, and use the perfectly serviceable userspace HVA that we already have. We just need to implement the atomic xchg on the userspace address with appropriate exception handling, which is fairly trivial.

Cc: stable@vger.kernel.org
Fixes: b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
[I didn't entirely agree with David's assessment of the usefulness of the gfn_to_pfn cache, and integrated the outcome of the discussion in the above commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peter Gonda authored
Add test cases for intra-host migration for SEV and SEV-ES. Also add a locking test to confirm no deadlock exists.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20211021174303.385706-6-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-