Commits · e0135a104c52ccc977bc04a972bc889e0298b068 · Kirill Smelkov / linux

11 Jun, 2020 10 commits

KVM: x86: do not pass poisoned hva to __kvm_set_memory_region · e0135a10

Paolo Bonzini authored Jun 11, 2020

__kvm_set_memory_region does not use the hva at all, so trying to
catch use-after-delete is pointless and, worse, it fails access_ok
now that we apply it to all memslots including private kernel ones.
This fixes an AVIC regression.

Fixes: 09d952c9 ("KVM: check userspace_addr for all memslots")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e0135a10

KVM: selftests: fix sync_with_host() in smm_test · cfb65c15

Vitaly Kuznetsov authored Jun 10, 2020

It was reported that older GCCs compile smm_test in a way that breaks
it completely:

  kvm_exit:             reason EXIT_CPUID rip 0x4014db info 0 0
  func 7ffffffd idx 830 rax 0 rbx 0 rcx 0 rdx 0, cpuid entry not found
  ...
  kvm_exit:             reason EXIT_MSR rip 0x40abd9 info 0 0
  kvm_msr:              msr_read 487 = 0x0 (#GP)
  ...

Note, '7ffffffd' was supposed to be '80000001' as we're checking for
SVM. Dropping '-O2' from compiler flags help. Turns out, asm block in
sync_with_host() is wrong. We us 'in 0xe, %%al' instruction to sync
with the host and in 'AL' register we actually pass the parameter
(stage) but after sync 'AL' gets written to but GCC thinks the value
is still there and uses it to compute 'EAX' for 'cpuid'.

smm_test can't fully use standard ucall() framework as we need to
write a very simple SMI handler there. Fix the immediate issue by
making RAX input/output operand. While on it, make sync_with_host()
static inline.
Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610164116.770811-1-vkuznets@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cfb65c15

KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected · 2a18b7e7

Vitaly Kuznetsov authored Jun 10, 2020

'Page not present' event may or may not get injected depending on
guest's state. If the event wasn't injected, there is no need to
inject the corresponding 'page ready' event as the guest may get
confused. E.g. Linux thinks that the corresponding 'page not present'
event wasn't delivered *yet* and allocates a 'dummy entry' for it.
This entry is never freed.

Note, 'wakeup all' events have no corresponding 'page not present'
event and always get injected.

s390 seems to always be able to inject 'page not present', the
change is effectively a nop.
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610175532.779793-2-vkuznets@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=208081Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2a18b7e7

KVM: async_pf: Cleanup kvm_setup_async_pf() · 7863e346

Vitaly Kuznetsov authored Jun 10, 2020

schedule_work() returns 'false' only when the work is already on the queue
and this can't happen as kvm_setup_async_pf() always allocates a new one.
Also, to avoid potential race, it makes sense to to schedule_work() at the
very end after we've added it to the queue.

While on it, do some minor cleanup. gfn_to_pfn_async() mentioned in a
comment does not currently exist and, moreover, we can check
kvm_is_error_hva() at the very beginning, before we try to allocate work so
'retry_sync' label can go away completely.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610175532.779793-1-vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7863e346

kvm: i8254: remove redundant assignment to pointer s · cd18eaea

Colin Ian King authored Jun 10, 2020

The pointer s is being assigned a value that is never read, the
assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Message-Id: <20200609233121.1118683-1-colin.king@canonical.com>
Fixes: 7837699f ("KVM: In kernel PIT model")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cd18eaea

KVM: x86: respect singlestep when emulating instruction · 384dea1c

Felipe Franciosi authored May 19, 2020

When userspace configures KVM_GUESTDBG_SINGLESTEP, KVM will manage the
presence of X86_EFLAGS_TF via kvm_set/get_rflags on vcpus. The actual
rflag bit is therefore hidden from callers.

That includes init_emulate_ctxt() which uses the value returned from
kvm_get_flags() to set ctxt->tf. As a result, x86_emulate_instruction()
will skip a single step, leaving singlestep_rip stale and not returning
to userspace.

This resolves the issue by observing the vcpu guest_debug configuration
alongside ctxt->tf in x86_emulate_instruction(), performing the single
step if set.

Cc: stable@vger.kernel.org
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Message-Id: <20200519081048.8204-1-felipe@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

384dea1c

KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported · 7e464770

Vitaly Kuznetsov authored Jun 10, 2020

KVM_CAP_HYPERV_ENLIGHTENED_VMCS will be reported as supported even when
nested VMX is not, fix evmcs_test/hyperv_cpuid tests to check for both.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610135847.754289-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7e464770

KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check · 41a23ab3

Vitaly Kuznetsov authored Jun 10, 2020

state_test/smm_test use KVM_CAP_NESTED_STATE check as an indicator for
nested VMX/SVM presence and this is incorrect. Check for the required
features dirrectly.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610135847.754289-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

41a23ab3

Merge branch 'kvm-basic-exit-reason' into HEAD · 77f81f37

Paolo Bonzini authored Jun 09, 2020

Using a topic branch so that stable branches can simply cherry-pick the
patch.
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

77f81f37

KVM: nVMX: Consult only the "basic" exit reason when routing nested exit · 2ebac8bb

Sean Christopherson authored Feb 27, 2020

Consult only the basic exit reason, i.e. bits 15:0 of vmcs.EXIT_REASON,
when determining whether a nested VM-Exit should be reflected into L1 or
handled by KVM in L0.

For better or worse, the switch statement in nested_vmx_exit_reflected()
currently defaults to "true", i.e. reflects any nested VM-Exit without
dedicated logic.  Because the case statements only contain the basic
exit reason, any VM-Exit with modifier bits set will be reflected to L1,
even if KVM intended to handle it in L0.

Practically speaking, this only affects EXIT_REASON_MCE_DURING_VMENTRY,
i.e. a #MC that occurs on nested VM-Enter would be incorrectly routed to
L1, as "failed VM-Entry" is the only modifier that KVM can currently
encounter.  The SMM modifiers will never be generated as KVM doesn't
support/employ a SMI Transfer Monitor.  Ditto for "exit from enclave",
as KVM doesn't yet support virtualizing SGX, i.e. it's impossible to
enter an enclave in a KVM guest (L1 or L2).

Fixes: 644d711a ("KVM: nVMX: Deciding if L0 or L1 should handle an L2 exit")
Cc: Jim Mattson <jmattson@google.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200227174430.26371-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2ebac8bb

09 Jun, 2020 2 commits

KVM: x86: Unexport x86_fpu_cache and make it static · 80fbd280

Sean Christopherson authored Jun 08, 2020

Make x86_fpu_cache static now that FPU allocation and destruction is
handled entirely by common x86 code.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200608180218.20946-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

80fbd280

KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K · 20670285

Sean Christopherson authored May 27, 2020

Explicitly set the VA width to 48 bits for the x86_64-only PXXV48_4K VM
mode instead of asserting the guest VA width is 48 bits.  The fact that
KVM supports 5-level paging is irrelevant unless the selftests opt-in to
5-level paging by setting CR4.LA57 for the guest.  The overzealous
assert prevents running the selftests on a kernel with 5-level paging
enabled.

Incorporate LA57 into the assert instead of removing the assert entirely
as a sanity check of KVM's CPUID output.

Fixes: 567a9f1e ("KVM: selftests: Introduce VM_MODE_PXXV48_4K")
Reported-by: Sergio Perez Gonzalez <sergio.perez.gonzalez@intel.com>
Cc: Adriana Cervantes Jimenez <adriana.cervantes.jimenez@intel.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200528021530.28091-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

20670285

08 Jun, 2020 6 commits

KVM: x86: Fix APIC page invalidation race · e649b3f0

Eiichi Tsukata authored Jun 06, 2020

Commit b1394e74 ("KVM: x86: fix APIC page invalidation") tried
to fix inappropriate APIC page invalidation by re-introducing arch
specific kvm_arch_mmu_notifier_invalidate_range() and calling it from
kvm_mmu_notifier_invalidate_range_start. However, the patch left a
possible race where the VMCS APIC address cache is updated *before*
it is unmapped:

  (Invalidator) kvm_mmu_notifier_invalidate_range_start()
  (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)
  (KVM VCPU) vcpu_enter_guest()
  (KVM VCPU) kvm_vcpu_reload_apic_access_page()
  (Invalidator) actually unmap page

Because of the above race, there can be a mismatch between the
host physical address stored in the APIC_ACCESS_PAGE VMCS field and
the host physical address stored in the EPT entry for the APIC GPA
(0xfee0000).  When this happens, the processor will not trap APIC
accesses, and will instead show the raw contents of the APIC-access page.
Because Windows OS periodically checks for unexpected modifications to
the LAPIC register, this will show up as a BSOD crash with BugCheck
CRITICAL_STRUCTURE_CORRUPTION (109) we are currently seeing in
https://bugzilla.redhat.com/show_bug.cgi?id=1751017.

The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range()
cannot guarantee that no additional references are taken to the pages in
the range before kvm_mmu_notifier_invalidate_range_end().  Fortunately,
this case is supported by the MMU notifier API, as documented in
include/linux/mmu_notifier.h:

	 * If the subsystem
         * can't guarantee that no additional references are taken to
         * the pages in the range, it has to implement the
         * invalidate_range() notifier to remove any references taken
         * after invalidate_range_start().

The fix therefore is to reload the APIC-access page field in the VMCS
from kvm_mmu_notifier_invalidate_range() instead of ..._range_start().

Cc: stable@vger.kernel.org
Fixes: b1394e74 ("KVM: x86: fix APIC page invalidation")
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Message-Id: <20200606042627.61070-1-eiichi.tsukata@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e649b3f0

KVM: SVM: fix calls to is_intercept · fb7333df

Paolo Bonzini authored Jun 08, 2020

is_intercept takes an INTERCEPT_* constant, not SVM_EXIT_*; because
of this, the compiler was removing the body of the conditionals,
as if is_intercept returned 0.

This unveils a latent bug: when clearing the VINTR intercept,
int_ctl must also be changed in the L1 VMCB (svm->nested.hsave),
just like the intercept itself is also changed in the L1 VMCB.
Otherwise V_IRQ remains set and, due to the VINTR intercept being clear,
we get a spurious injection of a vector 0 interrupt on the next
L2->L1 vmexit.
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fb7333df

KVM: selftests: fix vmx_preemption_timer_test build with GCC10 · 75ad6e80

Vitaly Kuznetsov authored Jun 08, 2020

GCC10 fails to build vmx_preemption_timer_test:

gcc -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99
-fno-stack-protector -fno-PIE -I../../../../tools/include
 -I../../../../tools/arch/x86/include -I../../../../usr/include/
 -Iinclude -Ix86_64 -Iinclude/x86_64 -I..  -pthread  -no-pie
 x86_64/evmcs_test.c ./linux/tools/testing/selftests/kselftest_harness.h
 ./linux/tools/testing/selftests/kselftest.h
 ./linux/tools/testing/selftests/kvm/libkvm.a
 -o ./linux/tools/testing/selftests/kvm/x86_64/evmcs_test
/usr/bin/ld: ./linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):
 ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:603:
 multiple definition of `ctrl_exit_rev'; /tmp/ccMQpvNt.o:
 ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:603:
 first defined here
/usr/bin/ld: ./linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):
 ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:602:
 multiple definition of `ctrl_pin_rev'; /tmp/ccMQpvNt.o:
 ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:602:
 first defined here
 ...

ctrl_exit_rev/ctrl_pin_rev/basic variables are only used in
vmx_preemption_timer_test.c, just move them there.

Fixes: 8d7fbf01 ("KVM: selftests: VMX preemption timer migration test")
Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200608112346.593513-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

75ad6e80

KVM: selftests: Add x86_64/debug_regs to .gitignore · 5ae1452f

Vitaly Kuznetsov authored Jun 08, 2020

Add x86_64/debug_regs to .gitignore.
Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com>
Fixes: 449aa906 ("KVM: selftests: Add KVM_SET_GUEST_DEBUG test")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200608112346.593513-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5ae1452f

Revert "KVM: x86: work around leak of uninitialized stack contents" · 25597f64

Vitaly Kuznetsov authored Jun 05, 2020

handle_vmptrst()/handle_vmread() stopped injecting #PF unconditionally
and switched to nested_vmx_handle_memory_failure() which just kills the
guest with KVM_EXIT_INTERNAL_ERROR in case of MMIO access, zeroing
'exception' in kvm_write_guest_virt_system() is not needed anymore.

This reverts commit 541ab2ae.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200605115906.532682-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

25597f64

KVM: VMX: Properly handle kvm_read/write_guest_virt*() result · 7a35e515

Vitaly Kuznetsov authored Jun 05, 2020

Syzbot reports the following issue:

WARNING: CPU: 0 PID: 6819 at arch/x86/kvm/x86.c:618
 kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
...
Call Trace:
...
RIP: 0010:kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
...
 nested_vmx_get_vmptr+0x1f9/0x2a0 arch/x86/kvm/vmx/nested.c:4638
 handle_vmon arch/x86/kvm/vmx/nested.c:4767 [inline]
 handle_vmon+0x168/0x3a0 arch/x86/kvm/vmx/nested.c:4728
 vmx_handle_exit+0x29c/0x1260 arch/x86/kvm/vmx/vmx.c:6067

'exception' we're trying to inject with kvm_inject_emulated_page_fault()
comes from:

  nested_vmx_get_vmptr()
   kvm_read_guest_virt()
     kvm_read_guest_virt_helper()
       vcpu->arch.walk_mmu->gva_to_gpa()

but it is only set when GVA to GPA conversion fails. In case it doesn't but
we still fail kvm_vcpu_read_guest_page(), X86EMUL_IO_NEEDED is returned and
nested_vmx_get_vmptr() calls kvm_inject_emulated_page_fault() with zeroed
'exception'. This happen when the argument is MMIO.

Paolo also noticed that nested_vmx_get_vmptr() is not the only place in
KVM code where kvm_read/write_guest_virt*() return result is mishandled.
VMX instructions along with INVPCID have the same issue. This was already
noticed before, e.g. see commit 541ab2ae ("KVM: x86: work around
leak of uninitialized stack contents") but was never fully fixed.

KVM could've handled the request correctly by going to userspace and
performing I/O but there doesn't seem to be a good need for such requests
in the first place.

Introduce vmx_handle_memory_failure() as an interim solution.

Note, nested_vmx_get_vmptr() now has three possible outcomes: OK, PF,
KVM_EXIT_INTERNAL_ERROR and callers need to know if userspace exit is
needed (for KVM_EXIT_INTERNAL_ERROR) in case of failure. We don't seem
to have a good enum describing this tristate, just add "int *ret" to
nested_vmx_get_vmptr() interface to pass the information.

Reported-by: syzbot+2a7156e11dc199bdbd8a@syzkaller.appspotmail.com
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200605115906.532682-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7a35e515

05 Jun, 2020 2 commits

KVM: x86: emulate reserved nops from 0f/18 to 0f/1f · 34d2618d

Paolo Bonzini authored May 15, 2020

Instructions starting with 0f18 up to 0f1f are reserved nops, except those
that were assigned to MPX.  These include the endbr markers used by CET.
List them correctly in the opcode table.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

34d2618d

KVM: selftests: Fix build with "make ARCH=x86_64" · b80db73d

Vitaly Kuznetsov authored Jun 05, 2020

Marcelo reports that kvm selftests fail to build with
"make ARCH=x86_64":

gcc -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99
 -fno-stack-protector -fno-PIE -I../../../../tools/include
 -I../../../../tools/arch/x86_64/include  -I../../../../usr/include/
 -Iinclude -Ilib -Iinclude/x86_64 -I.. -c lib/kvm_util.c
 -o /var/tmp/20200604202744-bin/lib/kvm_util.o

In file included from lib/kvm_util.c:11:
include/x86_64/processor.h:14:10: fatal error: asm/msr-index.h: No such
 file or directory

 #include <asm/msr-index.h>
          ^~~~~~~~~~~~~~~~~
compilation terminated.

"make ARCH=x86", however, works. The problem is that arch specific headers
for x86_64 live in 'tools/arch/x86/include', not in
'tools/arch/x86_64/include'.

Fixes: 66d69e08 ("selftests: fix kvm relocatable native/cross builds and installs")
Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200605142028.550068-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b80db73d

04 Jun, 2020 20 commits

Merge tag 'kvm-ppc-next-5.8-1' of... · ba4e6279

Paolo Bonzini authored Jun 04, 2020

Merge tag 'kvm-ppc-next-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

PPC KVM update for 5.8

- Updates and bug fixes for secure guest support
- Other minor bug fixes and cleanups.

ba4e6279

KVM: x86: minor code refactor and comments fixup around dirty logging · 3741679b

Anthony Yznaga authored Jun 02, 2020

Consolidate the code and correct the comments to show that the actions
taken to update existing mappings to disable or enable dirty logging
are not necessary when creating, moving, or deleting a memslot.
Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Message-Id: <1591128450-11977-4-git-send-email-anthony.yznaga@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3741679b

KVM: x86: avoid unnecessary rmap walks when creating/moving slots · 4b442955

Anthony Yznaga authored Jun 02, 2020

On large memory guests it has been observed that creating a memslot
for a very large range can take noticeable amount of time.
Investigation showed that the time is spent walking the rmaps to update
existing sptes to remove write access or set/clear dirty bits to support
dirty logging. These rmap walks are unnecessary when creating or moving
a memslot. A newly created memslot will not have any existing mappings,
and the existing mappings of a moved memslot will have been invalidated
and flushed. Any mappings established once the new/moved memslot becomes
visible will be set using the properties of the new slot.
Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Message-Id: <1591128450-11977-3-git-send-email-anthony.yznaga@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4b442955

KVM: x86: remove unnecessary rmap walk of read-only memslots · 5688fed6

Anthony Yznaga authored Jun 02, 2020

There's no write access to remove. An existing memslot cannot be updated
to set or clear KVM_MEM_READONLY, and any mappings established in a newly
created or moved read-only memslot will already be read-only.
Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Message-Id: <1591128450-11977-2-git-send-email-anthony.yznaga@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5688fed6

KVM: Use vmemdup_user() · 7ec28e26

Denis Efremov authored Jun 03, 2020

Replace opencoded alloc and copy with vmemdup_user().
Signed-off-by: Denis Efremov <efremov@linux.com>
Message-Id: <20200603101131.2107303-1-efremov@linux.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7ec28e26

x86/kvm: Remove defunct KVM_DEBUG_FS Kconfig · 0e96edd9

Sean Christopherson authored May 27, 2020

Remove KVM_DEBUG_FS, which can easily be misconstrued as controlling
KVM-as-a-host.  The sole user of CONFIG_KVM_DEBUG_FS was removed by
commit cfd8983f ("x86, locking/spinlocks: Remove ticket (spin)lock
implementation").
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200528031121.28904-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0e96edd9

KVM: MIPS: Enable KVM support for Loongson-3 · 0f78355c

Huacai Chen authored May 23, 2020

This patch enable KVM support for Loongson-3 by selecting HAVE_KVM, but
only enable KVM/VZ on Loongson-3A R4+ (because VZ of early processors
are incomplete). Besides, Loongson-3 support SMP guests, so we clear the
linked load bit of LLAddr in kvm_vz_vcpu_load() if the guest has more
than one VCPUs.
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-15-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0f78355c

KVM: MIPS: Add more MMIO load/store instructions emulation · dc6d95b1

Huacai Chen authored May 23, 2020

This patch add more MMIO load/store instructions emulation, which can
be observed in QXL and some other device drivers:

1, LWL, LWR, LDW, LDR, SWL, SWR, SDL and SDR for all MIPS;
2, GSLBX, GSLHX, GSLWX, GSLDX, GSSBX, GSSHX, GSSWX and GSSDX for
   Loongson-3.
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-14-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

dc6d95b1

KVM: MIPS: Add CONFIG6 and DIAG registers emulation · 8a5097ee

Huacai Chen authored May 23, 2020

Loongson-3 has CONFIG6 and DIAG registers which need to be emulated.
CONFIG6 is mostly used to enable/disable FTLB and SFB, while DIAG is
mostly used to flush BTB, ITLB, DTLB, VTLB and FTLB.
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-13-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8a5097ee

KVM: MIPS: Add CPUCFG emulation for Loongson-3 · 7f2a83f1

Huacai Chen authored May 23, 2020

Loongson-3 overrides lwc2 instructions to implement CPUCFG and CSR
read/write functions. These instructions all cause guest exit so CSR
doesn't benifit KVM guest (and there are always legacy methods to
provide the same functions as CSR). So, we only emulate CPUCFG and let
it return a reduced feature list (which means the virtual CPU doesn't
have any other advanced features, including CSR) in KVM.
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-12-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7f2a83f1

KVM: MIPS: Add Loongson-3 Virtual IPI interrupt support · f21db309

Huacai Chen authored May 23, 2020

This patch add Loongson-3 Virtual IPI interrupt support in the kernel.
The current implementation of IPI emulation in QEMU is based on GIC for
MIPS, but Loongson-3 doesn't use GIC. Furthermore, IPI emulation in QEMU
is too expensive for performance (because of too many context switches
between Host and Guest). With current solution, the IPI delay may even
cause RCU stall warnings in a multi-core Guest. So, we design a faster
solution that emulate IPI interrupt in kernel (only used by Loongson-3
now).
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-11-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f21db309

KVM: MIPS: Add more types of virtual interrupts · 3f51d8fc

Huacai Chen authored May 23, 2020

In current implementation, MIPS KVM uses IP2, IP3, IP4 and IP7 for
external interrupt, two kinds of IPIs and timer interrupt respectively,
but Loongson-3 based machines prefer to use IP2, IP3, IP6 and IP7 for
two kinds of external interrupts, IPI and timer interrupt. So we define
two priority-irq mapping tables: kvm_loongson3_priority_to_irq[] for
Loongson-3, and kvm_default_priority_to_irq[] for others. The virtual
interrupt infrastructure is updated to deliver all types of interrupts
from IP2, IP3, IP4, IP6 and IP7.
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-10-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3f51d8fc

KVM: MIPS: Let indexed cacheops cause guest exit on Loongson-3 · 49bb9600

Huacai Chen authored May 23, 2020

Loongson-3's indexed cache operations need a node-id in the address,
but in KVM guest the node-id may be incorrect. So, let indexed cache
operations cause guest exit on Loongson-3.
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-9-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

49bb9600

KVM: MIPS: Use root tlb to control guest's CCA for Loongson-3 · 52c07e1c

Huacai Chen authored May 23, 2020

KVM guest has two levels of address translation: guest tlb translates
GVA to GPA, and root tlb translates GPA to HPA. By default guest's CCA
is controlled by guest tlb, but Loongson-3 maintains all cache coherency
by hardware (including multi-core coherency and I/O DMA coherency) so it
prefers all guest mappings be cacheable mappings. Thus, we use root tlb
to control guest's CCA for Loongson-3.
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-8-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

52c07e1c

KVM: MIPS: Introduce and use cpu_guest_has_ldpte · 3210e2c2

Huacai Chen authored May 23, 2020

Loongson-3 has lddir/ldpte instructions and their related CP0 registers
are the same as HTW. So we introduce a cpu_guest_has_ldpte flag and use
it to indicate whether we need to save/restore HTW related CP0 registers
(PWBase, PWSize, PWField and PWCtl).
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-7-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3210e2c2

KVM: MIPS: Use lddir/ldpte instructions to lookup gpa_mm.pgd · 8bf31295

Huacai Chen authored May 23, 2020

Loongson-3 can use lddir/ldpte instuctions to accelerate page table
walking, so use them to lookup gpa_mm.pgd.
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-6-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8bf31295

KVM: MIPS: Add EVENTFD support which is needed by VHOST · bf10efbb

Huacai Chen authored May 23, 2020

Add EVENTFD support for KVM/MIPS, which is needed by VHOST. Tested on
Loongson-3 platform.
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-5-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bf10efbb

KVM: MIPS: Increase KVM_MAX_VCPUS and KVM_USER_MEM_SLOTS to 16 · 210b4b91

Huacai Chen authored May 23, 2020

Loongson-3 based machines can have as many as 16 CPUs, and so does
memory slots, so increase KVM_MAX_VCPUS and KVM_USER_MEM_SLOTS to 16.
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <1590220602-3547-4-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

210b4b91

KVM: MIPS: Fix VPN2_MASK definition for variable cpu_vmbits · 5816c76d

Xing Li authored May 23, 2020

If a CPU support more than 32bit vmbits (which is true for 64bit CPUs),
VPN2_MASK set to fixed 0xffffe000 will lead to a wrong EntryHi in some
functions such as _kvm_mips_host_tlb_inv().

The cpu_vmbits definition of 32bit CPU in cpu-features.h is 31, so we
still use the old definition.

Cc: Stable <stable@vger.kernel.org>
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Xing Li <lixing@loongson.cn>
[Huacai: Improve commit messages]
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Message-Id: <1590220602-3547-3-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5816c76d

KVM: MIPS: Define KVM_ENTRYHI_ASID to cpu_asid_mask(&boot_cpu_data) · fe2b73db

Xing Li authored May 23, 2020

The code in decode_config4() of arch/mips/kernel/cpu-probe.c

        asid_mask = MIPS_ENTRYHI_ASID;
        if (config4 & MIPS_CONF4_AE)
                asid_mask |= MIPS_ENTRYHI_ASIDX;
        set_cpu_asid_mask(c, asid_mask);

set asid_mask to cpuinfo->asid_mask.

So in order to support variable ASID_MASK, KVM_ENTRYHI_ASID should also
be changed to cpu_asid_mask(&boot_cpu_data).

Cc: Stable <stable@vger.kernel.org>  #4.9+
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Xing Li <lixing@loongson.cn>
[Huacai: Change current_cpu_data to boot_cpu_data for optimization]
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Message-Id: <1590220602-3547-2-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fe2b73db