Commits · 7a36d680658ba5a0d350f2ad275b97156b8d4333 · Kirill Smelkov / linux

05 Mar, 2024 5 commits

KVM: x86/xen: fix recursive deadlock in timer injection · 7a36d680

David Woodhouse authored Feb 27, 2024

The fast-path timer delivery introduced a recursive locking deadlock
when userspace configures a timer which has already expired and is
delivered immediately. The call to kvm_xen_inject_timer_irqs() can
call kvm_xen_set_evtchn(), which may take kvm->arch.xen.xen_lock,
which is already held in kvm_xen_vcpu_get_attr().

 ============================================
 WARNING: possible recursive locking detected
 6.8.0-smp--5e10b4d51d77-drs #232 Tainted: G           O
 --------------------------------------------
 xen_shinfo_test/250013 is trying to acquire lock:
 ffff938c9930cc30 (&kvm->arch.xen.xen_lock){+.+.}-{3:3}, at: kvm_xen_set_evtchn+0x74/0x170 [kvm]

 but task is already holding lock:
 ffff938c9930cc30 (&kvm->arch.xen.xen_lock){+.+.}-{3:3}, at: kvm_xen_vcpu_get_attr+0x38/0x250 [kvm]

Now that the gfn_to_pfn_cache has its own self-sufficient locking, its
callers no longer need to ensure serialization, so just stop taking
kvm->arch.xen.xen_lock from kvm_xen_set_evtchn().
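
As a rough illustration of the failure mode (not the kernel code itself), the sketch below models a non-recursive lock as a flag, so that a second acquisition is reported rather than hanging. The names toy_lock, deliver_event() and get_attr_with_expired_timer() are hypothetical stand-ins for kvm->arch.xen.xen_lock, kvm_xen_set_evtchn() and kvm_xen_vcpu_get_attr().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a flag, so re-acquisition can be detected. */
struct toy_lock {
	bool held;
};

/* Returns false if the lock is already held. For a real non-recursive
 * mutex taken twice on one thread, this would be a deadlock. */
static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical stand-in for event delivery. With take_lock set it
 * behaves like the pre-fix code and acquires the lock itself. */
static bool deliver_event(struct toy_lock *l, bool take_lock)
{
	if (take_lock) {
		if (!toy_lock_acquire(l))
			return false;	/* recursive acquisition detected */
		toy_lock_release(l);
	}
	return true;
}

/* Hypothetical stand-in for the attribute getter: it holds the lock
 * while an already-expired timer is delivered immediately. */
static bool get_attr_with_expired_timer(struct toy_lock *l, bool take_lock)
{
	bool ok;

	if (!toy_lock_acquire(l))
		return false;
	ok = deliver_event(l, take_lock);
	toy_lock_release(l);
	return ok;
}
```

With take_lock set, the nested acquisition is reported. With it clear, which corresponds to the change described above, delivery completes while the outer lock is still held.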

Fixes: 77c9b9de ("KVM: x86/xen: Use fast path for Xen timer delivery")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20240227115648.3104-6-dwmw2@infradead.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: pfncache: simplify locking and make more self-contained · 6addfcf2

David Woodhouse authored Feb 27, 2024

The locking on the gfn_to_pfn_cache is... interesting. And awful.

There is a rwlock in ->lock which readers take to ensure protection
against concurrent changes. But __kvm_gpc_refresh() makes assumptions
that certain fields will not change even while it drops the write lock
and performs MM operations to revalidate the target PFN and kernel
mapping.

Commit 93984f19 ("KVM: Fully serialize gfn=>pfn cache refresh via
mutex") partly addressed that — not by fixing it, but by adding a new
mutex, ->refresh_lock. This prevented concurrent __kvm_gpc_refresh()
calls on a given gfn_to_pfn_cache, but is still only a partial solution.

There is still a theoretical race where __kvm_gpc_refresh() runs in
parallel with kvm_gpc_deactivate(). While __kvm_gpc_refresh() has
dropped the write lock, kvm_gpc_deactivate() clears the ->active flag
and unmaps ->khva. Then __kvm_gpc_refresh() determines that the previous
->pfn and ->khva are still valid, and reinstalls those values into the
structure. This leaves the gfn_to_pfn_cache with the ->valid bit set,
but ->active clear. And a ->khva which looks like a reasonable kernel
address but is actually unmapped.

All it takes is a subsequent reactivation to cause that ->khva to be
dereferenced. This would theoretically cause an oops which would look
something like this:

[1724749.564994] BUG: unable to handle page fault for address: ffffaa3540ace0e0
[1724749.565039] RIP: 0010:__kvm_xen_has_interrupt+0x8b/0xb0

I say "theoretically" because theoretically, that oops that was seen in
production cannot happen. The code which uses the gfn_to_pfn_cache is
supposed to have its *own* locking, to further paper over the fact that
the gfn_to_pfn_cache's own papering-over (->refresh_lock) of its own
rwlock abuse is not sufficient.

For the Xen vcpu_info that external lock is the vcpu->mutex, and for the
shared info it's kvm->arch.xen.xen_lock. Those locks ought to protect
the gfn_to_pfn_cache against concurrent deactivation vs. refresh in all
but the cases where the vcpu or kvm object is being *destroyed*, in
which case the subsequent reactivation should never happen.

Theoretically.

Nevertheless, this locking abuse is awful and should be fixed, even if
no clear explanation can be found for how the oops happened. So expand
the use of the ->refresh_lock mutex to ensure serialization of
activate/deactivate vs. refresh and make the pfncache locking entirely
self-sufficient.
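
The race and the effect of the wider serialization can be sketched in a single-threaded toy model. Everything here is an assumption made for illustration: toy_gpc carries only the fields named above, the "concurrent" deactivation is simulated by a call made at the point where the write lock would be dropped, and a refused deactivation stands in for a mutex that would block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cache with only the fields needed to show the race. */
struct toy_gpc {
	bool active;
	bool valid;
	const char *khva;	/* NULL means "unmapped" */
	bool refresh_locked;	/* models holding ->refresh_lock */
};

/* Deactivation. With serialize set it refuses to run while a refresh
 * holds the toy refresh lock (a real mutex would block instead). */
static bool toy_deactivate(struct toy_gpc *g, bool serialize)
{
	if (serialize && g->refresh_locked)
		return false;
	g->active = false;
	g->valid = false;
	g->khva = NULL;
	return true;
}

/* Refresh. It samples the old mapping, "drops the write lock" (where a
 * racing deactivate is simulated), then reinstalls the old mapping on
 * the assumption that nothing changed in between. */
static void toy_refresh(struct toy_gpc *g, bool serialize,
			bool racing_deactivate)
{
	const char *old_khva;

	g->refresh_locked = true;
	old_khva = g->khva;

	/* write lock dropped here */
	if (racing_deactivate)
		toy_deactivate(g, serialize);
	/* write lock retaken here */

	g->khva = old_khva;
	g->valid = true;
	g->refresh_locked = false;
}
```

Without serialization the toy cache ends up valid but inactive, with a stale khva. With serialization the deactivation cannot run inside the window and has to happen after the refresh completes.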

This means that a future commit can simplify the locking in the callers,
such as the Xen emulation code which has an outstanding problem with
recursive locking of kvm->arch.xen.xen_lock, which will no longer be
necessary.

The rwlock abuse described above is still not best practice, although
it's harmless now that the ->refresh_lock is held for the entire duration
while the offending code drops the write lock, does some other stuff,
then takes the write lock again and assumes nothing changed. That can
also be fixed^W cleaned up in a subsequent commit, but this commit is
a simpler basis for the Xen deadlock fix mentioned above.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20240227115648.3104-5-dwmw2@infradead.org
[sean: use guard(mutex) to fix a missed unlock]
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery · 66e3cf72

David Woodhouse authored Feb 27, 2024

The kvm_xen_inject_vcpu_vector() function has a comment saying "the fast
version will always work for physical unicast", justifying its use of
kvm_irq_delivery_to_apic_fast() and the WARN_ON_ONCE() when that fails.

In fact that assumption isn't true if X2APIC isn't in use by the guest
and there is (8-bit x)APIC ID aliasing. A single "unicast" destination
APIC ID *may* then be delivered to multiple vCPUs. Remove the warning,
and in fact it might as well just call kvm_irq_delivery_to_apic().
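
A minimal sketch of how aliasing can make a "unicast" destination match more than one vCPU. It assumes, purely for illustration, that aliasing arises because only the low 8 bits of a wider vCPU ID take part in the comparison; count_xapic_matches() is a hypothetical helper, not a KVM function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how many vCPUs a physical-mode destination would match if only
 * the low 8 bits of each vCPU's ID are compared (assumed model). */
static size_t count_xapic_matches(const uint32_t *vcpu_ids, size_t n,
				  uint8_t dest)
{
	size_t matches = 0;

	assert(vcpu_ids != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((uint8_t)(vcpu_ids[i] & 0xff) == dest)
			matches++;
	}
	return matches;
}
```

With IDs 0, 1, 256 and 257, destination 0 matches two vCPUs under this model, which is the situation in which a unicast-only fast path cannot be assumed to succeed.
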
Reported-by: Michal Luczaj <mhal@rbox.co>
Fixes: fde0451b ("KVM: x86/xen: Support per-vCPU event channel upcall via local APIC")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20240227115648.3104-4-dwmw2@infradead.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled · 8e62bf2b

David Woodhouse authored Feb 27, 2024

Linux guests since commit b1c3497e ("x86/xen: Add support for
HVMOP_set_evtchn_upcall_vector") in v6.0 onwards will use the per-vCPU
upcall vector when it's advertised in the Xen CPUID leaves.

This upcall is injected through the guest's local APIC as an MSI, unlike
the older system vector which was merely injected by the hypervisor any
time the CPU was able to receive an interrupt and the upcall_pending
flag was set in its vcpu_info.

Effectively, that makes the per-CPU upcall edge triggered instead of
level triggered, which results in the upcall being lost if the MSI is
delivered when the local APIC is *disabled*.

Xen checks the vcpu_info->evtchn_upcall_pending flag when the local APIC
for a vCPU is software enabled (in fact, on any write to the SPIV
register which doesn't disable the APIC). Do the same in KVM since KVM
doesn't provide a way for userspace to intervene and trap accesses to
the SPIV register of a local APIC emulated by KVM.
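
The lost-upcall problem and the re-check on enable can be modelled with a few lines of state. This is a toy under stated assumptions: toy_vcpu, toy_send_upcall() and toy_enable_apic() are hypothetical names, and "injected" merely counts deliveries.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vcpu {
	bool apic_enabled;
	bool upcall_pending;	/* models vcpu_info->evtchn_upcall_pending */
	unsigned int injected;	/* number of vectors actually delivered */
};

/* Edge-like delivery: the vector is only delivered if the APIC happens
 * to be enabled at the moment the event is raised. */
static void toy_send_upcall(struct toy_vcpu *v)
{
	v->upcall_pending = true;
	if (v->apic_enabled)
		v->injected++;
}

/* Enabling the APIC. With recheck_pending set, a pending upcall that
 * was missed while the APIC was disabled is delivered now. */
static void toy_enable_apic(struct toy_vcpu *v, bool recheck_pending)
{
	v->apic_enabled = true;
	if (recheck_pending && v->upcall_pending)
		v->injected++;
}
```

Raising an event while the APIC is disabled and then enabling it without the re-check leaves the pending flag set with nothing delivered. With the re-check, the missed upcall is delivered at enable time.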

Fixes: fde0451b ("KVM: x86/xen: Support per-vCPU event channel upcall via local APIC")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240227115648.3104-3-dwmw2@infradead.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: x86/xen: improve accuracy of Xen timers · 451a7078

David Woodhouse authored Feb 27, 2024

A test program such as http://david.woodhou.se/timerlat.c confirms user
reports that timers are increasingly inaccurate as the lifetime of a
guest increases. Reporting the actual delay observed when asking for
100µs of sleep, it starts off OK on a newly-launched guest but gets
worse over time, giving incorrect sleep times:

root@ip-10-0-193-21:~# ./timerlat -c -n 5
00000000 latency 103243/100000 (3.2430%)
00000001 latency 103243/100000 (3.2430%)
00000002 latency 103242/100000 (3.2420%)
00000003 latency 103245/100000 (3.2450%)
00000004 latency 103245/100000 (3.2450%)

The biggest problem is that get_kvmclock_ns() returns inaccurate values
when the guest TSC is scaled. The guest sees a TSC value scaled from the
host TSC by a mul/shift conversion (hopefully done in hardware). The
guest then converts that guest TSC value into nanoseconds using the
mul/shift conversion given to it by the KVM pvclock information.

But get_kvmclock_ns() performs only a single conversion directly from
host TSC to nanoseconds, giving a different result. A test program at
http://david.woodhou.se/tsdrift.c demonstrates the cumulative error
over a day.
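
To see why one conversion and two chained conversions can disagree, here is a small sketch of truncating mul/shift arithmetic. The helper names and the tiny multipliers are assumptions chosen to make the truncation visible; they are not real TSC or pvclock parameters.

```c
#include <assert.h>
#include <stdint.h>

/* (value * mul) >> shift with a 128-bit intermediate so the product
 * cannot overflow. unsigned __int128 is a GCC/Clang extension. */
static uint64_t mul_shift(uint64_t value, uint64_t mul, unsigned int shift)
{
	assert(shift < 128);
	return (uint64_t)(((unsigned __int128)value * mul) >> shift);
}

/* Roughly what the guest does: host TSC -> guest TSC -> nanoseconds,
 * truncating after each step. */
static uint64_t two_step_ns(uint64_t host_tsc,
			    uint64_t tsc_mul, unsigned int tsc_shift,
			    uint64_t ns_mul, unsigned int ns_shift)
{
	uint64_t guest_tsc = mul_shift(host_tsc, tsc_mul, tsc_shift);

	return mul_shift(guest_tsc, ns_mul, ns_shift);
}
```

For example, two_step_ns(7, 1, 1, 3, 1) truncates twice and yields 4, whereas the mathematically combined single step mul_shift(7, 3, 2) yields 5. Real scaling factors have far more precision, so the per-read error is tiny, but it is systematic rather than random.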

It's non-trivial to fix get_kvmclock_ns(), although I'll come back to
that. The actual guest hv_clock is per-CPU, and *theoretically* each
vCPU could be running at a *different* frequency. But this patch is
needed anyway because...

The other issue with Xen timers was that the code would snapshot the
host CLOCK_MONOTONIC at some point in time, and then... after a few
interrupts may have occurred, some preemption perhaps... would also read
the guest's kvmclock. Then it would proceed under the false assumption
that those two happened at the *same* time. Any time which *actually*
elapsed between reading the two clocks was introduced as inaccuracies
in the time at which the timer fired.

Fix it to use a variant of kvm_get_time_and_clockread(), which reads the
host TSC just *once*, then use the returned TSC value to calculate the
kvmclock (making sure to do that the way the guest would instead of
making the same mistake get_kvmclock_ns() does).
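
A sketch of why the two clocks need to come from one instant. It assumes a hypothetical clock_pair holding both readings and a helper that converts an absolute guest expiry into an absolute host expiry through the delta; clamping a negative delta to zero is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Host and guest clock values that are meant to describe one instant. */
struct clock_pair {
	int64_t host_monotonic_ns;
	int64_t guest_kvmclock_ns;
};

/* Translate an absolute guest expiry into an absolute host expiry by
 * applying the guest-side delta to the host clock. */
static int64_t host_expiry_ns(const struct clock_pair *now,
			      int64_t guest_expiry_ns)
{
	int64_t delta;

	assert(now != 0);
	delta = guest_expiry_ns - now->guest_kvmclock_ns;
	if (delta < 0)
		delta = 0;	/* already expired: fire immediately */
	return now->host_monotonic_ns + delta;
}
```

If the guest clock is sampled 30 ns after the host clock, so the pair is {5000, 1030} instead of {5000, 1000}, a guest expiry of 1100 maps to host time 5070 instead of 5100. The time that elapsed between the two reads shows up directly as timer error.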

Sadly, hrtimers based on CLOCK_MONOTONIC_RAW are not supported, so Xen
timers still have to use CLOCK_MONOTONIC. In practice the difference
between the two won't matter over the timescales involved, as the
*absolute* values don't matter; just the delta.

This does mean a new variant of kvm_get_time_and_clockread() is needed;
called kvm_get_monotonic_and_clockread() because that's what it does.

Fixes: 53639526 ("KVM: x86/xen: handle PV timers oneshot mode")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20240227115648.3104-2-dwmw2@infradead.org
[sean: massage moved comment, tweak if statement formatting]
Signed-off-by: Sean Christopherson <seanjc@google.com>


22 Feb, 2024 7 commits

KVM: x86/xen: allow vcpu_info content to be 'safely' copied · 003d9142

Paul Durrant authored Feb 15, 2024

If the guest sets an explicit vcpu_info GPA then, for any of the first 32
vCPUs, the content of the default vcpu_info in the shared_info page must be
copied into the new location. Because this copy may race with event
delivery (which updates the 'evtchn_pending_sel' field in vcpu_info),
event delivery needs to be deferred until the copy is complete.

Happily there is already a shadow of 'evtchn_pending_sel' in kvm_vcpu_xen
that is used in atomic context if the vcpu_info PFN cache has been
invalidated so that the update of vcpu_info can be deferred until the
cache can be refreshed (on the vCPU thread's way back into guest context).

Use this shadow if the vcpu_info cache has been *deactivated*, so that
the VMM can safely copy the vcpu_info content and then re-activate the
cache with the new GPA. To do this, stop considering an inactive vcpu_info
cache as a hard error in kvm_xen_set_evtchn_fast(), and let the existing
kvm_gpc_check() fail and kick the vCPU (if necessary).
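
The deferral through the shadow can be sketched as follows. The types and helpers are hypothetical, the selector is modelled as a fixed 64-bit word, and a NULL info pointer stands in for a deactivated vcpu_info cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_vcpu_info {
	uint64_t evtchn_pending_sel;
};

struct toy_vcpu_xen {
	struct toy_vcpu_info *info;	/* NULL while the cache is inactive */
	uint64_t shadow_pending_sel;	/* holds bits until info is usable */
};

/* Event delivery: write to vcpu_info if it is mapped, otherwise park
 * the bit in the shadow so nothing is lost during the copy. */
static void toy_set_pending(struct toy_vcpu_xen *v, unsigned int word)
{
	assert(word < 64);
	if (v->info)
		v->info->evtchn_pending_sel |= UINT64_C(1) << word;
	else
		v->shadow_pending_sel |= UINT64_C(1) << word;
}

/* Re-activation at the new location: fold the shadow into vcpu_info. */
static void toy_reactivate(struct toy_vcpu_xen *v,
			   struct toy_vcpu_info *new_info)
{
	assert(new_info != NULL);
	v->info = new_info;
	new_info->evtchn_pending_sel |= v->shadow_pending_sel;
	v->shadow_pending_sel = 0;
}
```

The intended sequence is: deactivate (info becomes NULL), copy the old content to the new location, let any events land in the shadow meanwhile, then re-activate and merge.
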
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-21-paul@xen.org
[sean: add a bit of verbosity to the changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: pfncache: check the need for invalidation under read lock first · 9fa336e3

Paul Durrant authored Feb 15, 2024

When processing mmu_notifier invalidations for gpc caches, pre-check for
overlap with the invalidation event while holding gpc->lock for read, and
only take gpc->lock for write if the cache needs to be invalidated. Doing
a pre-check without taking gpc->lock for write avoids unnecessarily
contending the lock for unrelated invalidations, which is very beneficial
for caches that are heavily used (but rarely subjected to mmu_notifier
invalidations).
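
The pattern can be sketched without real locks by counting how often each lock mode would be taken. The struct, the overlap test and the counters are assumptions of the sketch; the comments mark where read_lock()/write_lock() calls would sit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_gpc {
	bool valid;
	uintptr_t uhva;
	unsigned int read_locks;	/* times the lock was taken for read */
	unsigned int write_locks;	/* times the lock was taken for write */
};

static bool toy_overlaps(const struct toy_gpc *g, uintptr_t start,
			 uintptr_t end)
{
	return g->valid && g->uhva >= start && g->uhva < end;
}

/* Invalidate the cache if its uhva falls in [start, end). */
static void toy_invalidate_range(struct toy_gpc *g, uintptr_t start,
				 uintptr_t end)
{
	bool hit;

	assert(start <= end);

	g->read_locks++;		/* read_lock() would go here */
	hit = toy_overlaps(g, start, end);
					/* read_unlock() */
	if (!hit)
		return;			/* unrelated range: no write lock */

	g->write_locks++;		/* write_lock() would go here */
	/* Re-check: the state may have changed between the two locks. */
	if (toy_overlaps(g, start, end))
		g->valid = false;
					/* write_unlock() */
}
```

An invalidation for an unrelated range only bumps the read counter. The write lock is taken only when the pre-check finds an overlap, and the condition is re-checked once it is held.
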
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-20-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: x86/xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability · 615451d8

Paul Durrant authored Feb 15, 2024

Now that all relevant kernel changes and selftests are in place, enable the
new capability.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-17-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: selftests: re-map Xen's vcpu_info using HVA rather than GPA · b4dfbfdc

Paul Durrant authored Feb 15, 2024

If the relevant capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present
then re-map vcpu_info using the HVA part way through the tests to make sure
that there is no functional change.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-16-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: selftests: map Xen's shared_info page using HVA rather than GFN · 9397b533

Paul Durrant authored Feb 15, 2024

Using the HVA of the shared_info page is more efficient, so if the
capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present use that method
to do the mapping.

NOTE: Have the juggle_shinfo_state() thread map and unmap using both
GFN and HVA, to make sure the older mechanism is not broken.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-15-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: x86/xen: allow vcpu_info to be mapped by fixed HVA · 3991f358

Paul Durrant authored Feb 15, 2024

If the guest does not explicitly set the GPA of vcpu_info structure in
memory then, for guests with 32 vCPUs or fewer, the vcpu_info embedded
in the shared_info page may be used. As described in a previous commit,
the shared_info page is an overlay at a fixed HVA within the VMM, so in
this case it is also more efficient to activate the vcpu_info cache with a
fixed HVA to avoid unnecessary invalidation if the guest memory layout
is modified.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-14-paul@xen.org
[sean: use kvm_gpc_is_{gpa,hva}_active()]
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: x86/xen: allow shared_info to be mapped by fixed HVA · b9220d32

Paul Durrant authored Feb 15, 2024

The shared_info page is not guest memory as such. It is a dedicated page
allocated by the VMM and overlaid onto guest memory in a GFN chosen by the
guest and specified in the XENMEM_add_to_physmap hypercall. The guest may
even request that shared_info be moved from one GFN to another by
re-issuing that hypercall, but the HVA is never going to change.

Because the shared_info page is an overlay the memory slots need to be
updated in response to the hypercall. However, memory slot adjustment is
not atomic and, whilst all vCPUs are paused, there is still the possibility
that events may be delivered (which requires the shared_info page to be
updated) whilst the shared_info GPA is absent. The HVA is never absent
though, so it makes much more sense to use that as the basis for the
kernel's mapping.

Hence add a new KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA attribute type for this
purpose and a KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag to advertize its
availability. Don't actually advertize it yet though. That will be done in
a subsequent patch, which will also add tests for the new attribute type.

Also update the KVM API documentation with the new attribute and also fix
it up to consistently refer to 'shared_info' (with the underscore).
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-13-paul@xen.org
[sean: store "hva" as a user pointer, use kvm_gpc_is_{gpa,hva}_active()]
Signed-off-by: Sean Christopherson <seanjc@google.com>


20 Feb, 2024 11 commits

KVM: x86/xen: re-initialize shared_info if guest (32/64-bit) mode is set · 18b99e4d

Paul Durrant authored Feb 15, 2024

If the shared_info PFN cache has already been initialized then the content
of the shared_info page needs to be re-initialized whenever the guest
mode is (re)set.
Setting the guest mode is either done explicitly by the VMM via the
KVM_XEN_ATTR_TYPE_LONG_MODE attribute, or implicitly when the guest writes
the MSR to set up the hypercall page.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-12-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: x86/xen: separate initialization of shared_info cache and content · c01c55a3

Paul Durrant authored Feb 15, 2024

A subsequent patch will allow shared_info to be initialized using either a
GPA or a user-space (i.e. VMM) HVA. To make that patch cleaner, separate
the initialization of the shared_info content from the activation of the
pfncache.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-11-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA · 721f5b0d

Paul Durrant authored Feb 15, 2024

Some pfncache pages may actually be overlays on guest memory that have a
fixed HVA within the VMM. It's pointless to invalidate such cached
mappings if the overlay is moved, so allow a cache to be activated directly
with the HVA to cater for such cases. A subsequent patch will make use
of this facility.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-10-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot() · 9e7325ac

Sean Christopherson authored Feb 15, 2024

Rename kvm_is_error_gpa() to kvm_is_gpa_in_memslot() and invert the
polarity accordingly in order to (a) free up kvm_is_error_gpa() to match
with kvm_is_error_{hva,page}(), and (b) to make it more obvious that the
helper is doing a memslot lookup, i.e. not simply checking for INVALID_GPA.

No functional change intended.

Link: https://lore.kernel.org/r/20240215152916.1158-9-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: pfncache: include page offset in uhva and use it consistently · 406c1096

Paul Durrant authored Feb 15, 2024

Currently the pfncache page offset is sometimes determined using the gpa
and sometimes the khva, whilst the uhva is always page-aligned. After a
subsequent patch is applied the gpa will not always be valid so adjust
the code to include the page offset in the uhva and use it consistently
as the source of truth.

Also, where a page-aligned address is required, use PAGE_ALIGN_DOWN()
for clarity.
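
As a small illustration of keeping the page offset in the uhva, the helpers below are userspace re-creations written for this sketch. They assume 4 KiB pages and are not the kernel's PAGE_ALIGN_DOWN() or offset_in_page() macros themselves.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE ((uintptr_t)4096)	/* assumed 4 KiB pages */

/* Round an address down to the start of its page. */
static uintptr_t toy_page_align_down(uintptr_t addr)
{
	return addr & ~(TOY_PAGE_SIZE - 1);
}

/* Byte offset of an address within its page. */
static uintptr_t toy_offset_in_page(uintptr_t addr)
{
	return addr & (TOY_PAGE_SIZE - 1);
}

/* Kernel address for an access: the page-aligned kernel mapping plus
 * the offset carried by the (non-aligned) uhva. */
static uintptr_t toy_khva(uintptr_t mapped_page, uintptr_t uhva)
{
	assert(toy_offset_in_page(mapped_page) == 0);
	return mapped_page + toy_offset_in_page(uhva);
}
```

With a uhva of 0x12345, the aligned page is 0x12000 and the offset is 0x345, so neither the gpa nor the khva needs to be consulted to recover the offset.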

No functional change intended.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-8-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: pfncache: stop open-coding offset_in_page() · 53e63e95

Paul Durrant authored Feb 15, 2024

Some code in pfncache uses offset_in_page() but in other places it is open-
coded. Use offset_in_page() consistently everywhere.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-7-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: pfncache: remove KVM_GUEST_USES_PFN usage · a4bff3df

Paul Durrant authored Feb 15, 2024

As noted in [1] the KVM_GUEST_USES_PFN usage flag is never set by any
callers of kvm_gpc_init(), and for good reason: the implementation is
incomplete/broken.  And it's not clear that there will ever be a user of
KVM_GUEST_USES_PFN, as coordinating vCPUs with mmu_notifier events is
non-trivial.

Remove KVM_GUEST_USES_PFN and all related code, e.g. dropping
KVM_GUEST_USES_PFN also makes the 'vcpu' argument redundant, to avoid
having to reason about broken code as __kvm_gpc_refresh() evolves.

Moreover, all existing callers specify KVM_HOST_USES_PFN so the usage
check in hva_to_pfn_retry() and hence the 'usage' argument to
kvm_gpc_init() are also redundant.

[1] https://lore.kernel.org/all/ZQiR8IpqOZrOpzHC@google.com

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-6-paul@xen.org
[sean: explicitly call out that guest usage is incomplete]
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: pfncache: add a mark-dirty helper · 78b74638

Paul Durrant authored Feb 15, 2024

At the moment pages are marked dirty by open-coded calls to
mark_page_dirty_in_slot(), directly dereferencing the gpa and memslot
from the cache. After a subsequent patch these may not always be set,
so add a helper now so that callers will be protected from the need to know
about this detail.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-5-paul@xen.org
[sean: decrease indentation, use gpa_to_gfn()]
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: x86/xen: mark guest pages dirty with the pfncache lock held · 4438355e

Paul Durrant authored Feb 15, 2024

Sampling gpa and memslot from an unlocked pfncache may yield inconsistent
values so, since there is no problem with calling mark_page_dirty_in_slot()
with the pfncache lock held, relocate the calls in
kvm_xen_update_runstate_guest() and kvm_xen_inject_pending_events()
accordingly.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-4-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: pfncache: remove unnecessary exports · 41496fff

Paul Durrant authored Feb 15, 2024

There is no need for the existing kvm_gpc_XXX() functions to be exported.
Clean up now before additional functions are added in subsequent patches.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-3-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


KVM: pfncache: Add a map helper function · f39b80e3

Paul Durrant authored Feb 15, 2024

There is a pfncache unmap helper but mapping is open-coded. Arguably this
is fine because mapping is done in only one place, hva_to_pfn_retry(), but
adding the helper does make that function more readable.

No functional change intended.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240215152916.1158-2-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>


08 Feb, 2024 13 commits

KVM: remove unnecessary #ifdef · db7d6fbc

Paolo Bonzini authored Jan 11, 2024

KVM_CAP_IRQ_ROUTING is always defined, so there is no need to check if it is.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


KVM: define __KVM_HAVE_GUEST_DEBUG unconditionally · 6bda055d

Paolo Bonzini authored Jan 11, 2024

Since all architectures (for historical reasons) have to define
struct kvm_guest_debug_arch, and since userspace has to check
KVM_CHECK_EXTENSION(KVM_CAP_SET_GUEST_DEBUG) anyway, there is
no advantage in masking the capability #define itself.  Remove
the #define __KVM_HAVE_GUEST_DEBUG from architecture-specific
headers.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


kvm: replace __KVM_HAVE_READONLY_MEM with Kconfig symbol · 8886640d

Paolo Bonzini authored Jan 11, 2024

KVM uses __KVM_HAVE_* symbols in the architecture-dependent uapi/asm/kvm.h to mask
unused definitions in include/uapi/linux/kvm.h. __KVM_HAVE_READONLY_MEM however
was nothing but a misguided attempt to define KVM_CAP_READONLY_MEM only on
architectures where KVM_CHECK_EXTENSION(KVM_CAP_READONLY_MEM) could possibly
return nonzero. This however does not make sense, and it prevented userspace
from supporting this architecture-independent feature without recompilation.

Therefore, these days __KVM_HAVE_READONLY_MEM does not mask anything and
is only used in virt/kvm/kvm_main.c. Userspace does not need to test it
and there should be no need for it to exist. Remove it and replace it
with a Kconfig symbol within Linux source code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


KVM: arm64: move ARM-specific defines to uapi/asm/kvm.h · 5d9cb716

Paolo Bonzini authored Jan 11, 2024

While this in principle breaks userspace code that mentions KVM_ARM_DEV_*
on architectures other than aarch64, this seems unlikely to be
a problem considering that run->s.regs.device_irq_level is only
defined on that architecture.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


KVM: s390: move s390-specific structs to uapi/asm/kvm.h · 71cd774a

Paolo Bonzini authored Jan 11, 2024

While this in principle breaks the appearance of KVM_S390_* ioctls on architectures
other than s390, this seems unlikely to be a problem considering that there are
already many "struct kvm_s390_*" definitions in arch/s390/include/uapi.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


KVM: powerpc: move powerpc-specific structs to uapi/asm/kvm.h · d750951c

Paolo Bonzini authored Jan 11, 2024

While this in principle breaks the appearance of KVM_PPC_* ioctls on architectures
other than powerpc, this seems unlikely to be a problem considering that there are
already many "struct kvm_ppc_*" definitions in arch/powerpc/include/uapi.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


KVM: x86: move x86-specific structs to uapi/asm/kvm.h · bcac0477

Paolo Bonzini authored Jan 11, 2024

Several capabilities that exist only on x86 nevertheless have their
structs defined in include/uapi/linux/kvm.h.  Move them to
arch/x86/include/uapi/asm/kvm.h for cleanliness.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


KVM: remove more traces of device assignment UAPI · c0a41190

Paolo Bonzini authored Jan 31, 2024

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


kvm: x86: use a uapi-friendly macro for GENMASK · 45882241

Paolo Bonzini authored Dec 12, 2023

Change uapi header uses of GENMASK to instead use the uapi/linux/bits.h bit
macros, since GENMASK is not defined in uapi headers.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


kvm: x86: use a uapi-friendly macro for BIT · 882dd4ae

Dionna Glaze authored Dec 07, 2023

Change uapi header uses of BIT to instead use the uapi/linux/const.h bit
macros, since BIT is not defined in uapi headers.

The PMU mask uses _BITUL since it targets a 32 bit flag field, whereas
the longmode definition is meant for a 64 bit flag field.
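
The width distinction can be illustrated with locally defined look-alikes. TOY_BITUL and TOY_BITULL are assumptions written for this sketch to mirror the shape of the uapi helpers; they are not the header's own definitions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BITUL(n)	(1UL << (n))	/* unsigned long: fine for 32-bit fields */
#define TOY_BITULL(n)	(1ULL << (n))	/* unsigned long long: at least 64 bits */

/* A flag in a 32-bit field; bit numbers above 31 would not fit. */
static uint32_t toy_flag32(unsigned int bit)
{
	assert(bit < 32);
	return (uint32_t)TOY_BITUL(bit);
}

/* A flag in a 64-bit field; the ULL form is needed because unsigned
 * long may be only 32 bits wide on some ABIs. */
static uint64_t toy_flag64(unsigned int bit)
{
	assert(bit < 64);
	return (uint64_t)TOY_BITULL(bit);
}
```

Using the unsigned long form for bit 40 would be undefined behaviour where unsigned long is 32 bits, which is why a 64-bit flag field wants the unsigned long long variant.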

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Message-Id: <20231207001142.3617856-1-dionnaglaze@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


uapi: introduce uapi-friendly macros for GENMASK · 3c7a8e19

Paolo Bonzini authored Dec 12, 2023

Move __GENMASK and __GENMASK_ULL from include/ to include/uapi/ so that they can
be used to define masks in userspace API headers. Compared to what is already
in include/linux/bits.h, the definitions need to use the uglified versions of
UL(), ULL(), BITS_PER_LONG and BITS_PER_LONG_LONG (which did not even exist),
but otherwise expand to the same content.
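
For readers unfamiliar with the macro, here is a self-contained re-creation of a GENMASK-style definition that uses only standard C. The TOY_ names are assumptions of this sketch; the real uapi macros use the kernel's own underscore-prefixed helpers.

```c
#include <assert.h>
#include <limits.h>

#define TOY_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Mask with bits h..l (inclusive) set, for 0 <= l <= h < BITS_PER_LONG.
 * Left operand: all bits from l upward. Right operand: all bits from h
 * downward. Their intersection is the requested range. */
#define TOY_GENMASK(h, l) \
	(((~0UL) - (1UL << (l)) + 1) & \
	 (~0UL >> (TOY_BITS_PER_LONG - 1 - (h))))

/* Extract the field described by bits h..l from a register value. */
static unsigned long toy_field_get(unsigned long reg, int h, int l)
{
	assert(l >= 0 && l <= h && h < TOY_BITS_PER_LONG);
	return (reg & TOY_GENMASK(h, l)) >> l;
}
```

TOY_GENMASK(7, 4) evaluates to 0xF0, and toy_field_get(0xABCD, 7, 4) picks out the nibble 0xC.
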
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


Merge tag 'v6.8-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 04737196

Linus Torvalds authored Feb 08, 2024

Pull crypto fixes from Herbert Xu:
 "Fix regressions in cbc and algif_hash, as well as an older
  NULL-pointer dereference in ccp"

* tag 'v6.8-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: algif_hash - Remove bogus SGL free on zero-length error path
  crypto: cbc - Ensure statesize is zero
  crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked


Merge tag 'percpu-for-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu · 860d7dcb

Linus Torvalds authored Feb 08, 2024

Pull percpu fix from Dennis Zhou:

 - fix riscv wrong size passed to local_flush_tlb_range_asid()

* tag 'percpu-for-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
  riscv: Fix wrong size passed to local_flush_tlb_range_asid()


07 Feb, 2024 4 commits

Merge tag 'loongarch-fixes-6.8-2' of... · 547ab8fc

Linus Torvalds authored Feb 07, 2024

Merge tag 'loongarch-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Fix acpi_core_pic[] array overflow, fix earlycon parameter if KASAN
  enabled, disable UBSAN instrumentation for vDSO build, and two Kconfig
  cleanups"

* tag 'loongarch-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: vDSO: Disable UBSAN instrumentation
  LoongArch: Fix earlycon parameter if KASAN enabled
  LoongArch: Change acpi_core_pic[NR_CPUS] to acpi_core_pic[MAX_CORE_PIC]
  LoongArch: Select HAVE_ARCH_SECCOMP to use the common SECCOMP menu
  LoongArch: Select ARCH_ENABLE_THP_MIGRATION instead of redefining it


Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 5c24ba20

Linus Torvalds authored Feb 07, 2024

Pull kvm fixes from Paolo Bonzini:
 "x86 guest:

   - Avoid false positive for check that only matters on AMD processors

  x86:

   - Give a hint when Win2016 might fail to boot due to XSAVES &&
     !XSAVEC configuration

   - Do not allow creating an in-kernel PIT unless an IOAPIC already
     exists

  RISC-V:

   - Allow ISA extensions that were enabled for bare metal in 6.8 (Zbc,
     scalar and vector crypto, Zfh[min], Zihintntl, Zvfh[min], Zfa)

  S390:

   - fix CC for successful PQAP instruction

   - fix a race when creating a shadow page"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86/coco: Define cc_vendor without CONFIG_ARCH_HAS_CC_PLATFORM
  x86/kvm: Fix SEV check in sev_map_percpu_data()
  KVM: x86: Give a hint when Win2016 might fail to boot due to XSAVES erratum
  KVM: x86: Check irqchip mode before create PIT
  KVM: riscv: selftests: Add Zfa extension to get-reg-list test
  RISC-V: KVM: Allow Zfa extension for Guest/VM
  KVM: riscv: selftests: Add Zvfh[min] extensions to get-reg-list test
  RISC-V: KVM: Allow Zvfh[min] extensions for Guest/VM
  KVM: riscv: selftests: Add Zihintntl extension to get-reg-list test
  RISC-V: KVM: Allow Zihintntl extension for Guest/VM
  KVM: riscv: selftests: Add Zfh[min] extensions to get-reg-list test
  RISC-V: KVM: Allow Zfh[min] extensions for Guest/VM
  KVM: riscv: selftests: Add vector crypto extensions to get-reg-list test
  RISC-V: KVM: Allow vector crypto extensions for Guest/VM
  KVM: riscv: selftests: Add scaler crypto extensions to get-reg-list test
  RISC-V: KVM: Allow scalar crypto extensions for Guest/VM
  KVM: riscv: selftests: Add Zbc extension to get-reg-list test
  RISC-V: KVM: Allow Zbc extension for Guest/VM
  KVM: s390: fix cc for successful PQAP
  KVM: s390: vsie: fix race during shadow creation


Merge tag 'nfsd-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · c8d80f83

Linus Torvalds authored Feb 07, 2024

Pull nfsd fix from Chuck Lever:

 - Address a deadlock regression in RELEASE_LOCKOWNER

* tag 'nfsd-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: don't take fi_lock in nfsd_break_deleg_cb()


Merge tag 'for-6.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 6d280f4d

Linus Torvalds authored Feb 07, 2024

Pull btrfs fixes from David Sterba:

 - two fixes preventing deletion and manual creation of subvolume qgroup

 - unify error code returned for unknown send flags

 - fix assertion during subvolume creation when anonymous device could
   be allocated by other thread (e.g. due to backref walk)

* tag 'for-6.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: do not ASSERT() if the newly created subvolume already got read
  btrfs: forbid deleting live subvol qgroup
  btrfs: forbid creating subvol qgroups
  btrfs: send: return EOPNOTSUPP on unknown flags
