1. 06 Sep, 2021 17 commits
    • Zelin Deng's avatar
      KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted · d9130a2d
      Zelin Deng authored
      When MSR_IA32_TSC_ADJUST is written by guest due to TSC ADJUST feature
      especially there's a big tsc warp (like a new vCPU is hot-added into VM
      which has been up for a long time), tsc_offset is added by a large value
      then go back to guest. This causes system time jump as tsc_timestamp is
      not adjusted in the meantime and pvclock monotonic character.
      To fix this, just notify kvm to update vCPU's guest time before back to
      guest.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarZelin Deng <zelin.deng@linux.alibaba.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Message-Id: <1619576521-81399-2-git-send-email-zelin.deng@linux.alibaba.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d9130a2d
    • Paolo Bonzini's avatar
      KVM: MMU: mark role_regs and role accessors as maybe unused · 4ac21457
      Paolo Bonzini authored
      It is reasonable for these functions to be used only in some configurations,
      for example only if the host is 64-bits (and therefore supports 64-bit
      guests).  It is also reasonable to keep the role_regs and role accessors
      in sync even though some of the accessors may be used only for one of the
      two sets (as is the case currently for CR4.LA57)..
      
      Because clang reports warnings for unused inlines declared in a .c file,
      mark both sets of accessors as __maybe_unused.
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4ac21457
    • Huacai Chen's avatar
      KVM: MIPS: Remove a "set but not used" variable · a3cf527e
      Huacai Chen authored
      This fix a build warning:
      
         arch/mips/kvm/vz.c: In function '_kvm_vz_restore_htimer':
      >> arch/mips/kvm/vz.c:392:10: warning: variable 'freeze_time' set but not used [-Wunused-but-set-variable]
           392 |  ktime_t freeze_time;
               |          ^~~~~~~~~~~
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
      Message-Id: <20210406024911.2008046-1-chenhuacai@loongson.cn>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a3cf527e
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · e99314a3
      Paolo Bonzini authored
      KVM/arm64 updates for 5.15
      
      - Page ownership tracking between host EL1 and EL2
      
      - Rely on userspace page tables to create large stage-2 mappings
      
      - Fix incompatibility between pKVM and kmemleak
      
      - Fix the PMU reset state, and improve the performance of the virtual PMU
      
      - Move over to the generic KVM entry code
      
      - Address PSCI reset issues w.r.t. save/restore
      
      - Preliminary rework for the upcoming pKVM fixed feature
      
      - A bunch of MM cleanups
      
      - a vGIC fix for timer spurious interrupts
      
      - Various cleanups
      e99314a3
    • Paolo Bonzini's avatar
      Merge tag 'kvm-s390-next-5.15-1' of... · 0d0a1939
      Paolo Bonzini authored
      Merge tag 'kvm-s390-next-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      KVM: s390: Fix and feature for 5.15
      
      - enable interpretion of specification exceptions
      - fix a vcpu_idx vs vcpu_id mixup
      0d0a1939
    • Lai Jiangshan's avatar
      x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait · a40b2fd0
      Lai Jiangshan authored
      Commit f4e61f0c ("x86/kvm: Fix broken irq restoration in kvm_wait")
      replaced "local_irq_restore() when IRQ enabled" with "local_irq_enable()
      when IRQ enabled" to suppress a warnning.
      
      Although there is no similar debugging warnning for doing local_irq_enable()
      when IRQ enabled as doing local_irq_restore() in the same IRQ situation.  But
      doing local_irq_enable() when IRQ enabled is no less broken as doing
      local_irq_restore() and we'd better avoid it.
      
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarLai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20210814035129.154242-1-jiangshanlai@gmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a40b2fd0
    • Jing Zhang's avatar
      KVM: stats: Add VM stat for remote tlb flush requests · 3cc4e148
      Jing Zhang authored
      Add a new stat that counts the number of times a remote TLB flush is
      requested, regardless of whether it kicks vCPUs out of guest mode. This
      allows us to look at how often flushes are initiated.
      
      Unlike remote_tlb_flush, this one applies to ARM's instruction-set-based
      TLB flush implementation, so apply it there too.
      Original-by: default avatarDavid Matlack <dmatlack@google.com>
      Signed-off-by: default avatarJing Zhang <jingzhangos@google.com>
      Message-Id: <20210817002639.3856694-1-jingzhangos@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3cc4e148
    • Sean Christopherson's avatar
      KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count() · fdde13c1
      Sean Christopherson authored
      Don't export KVM's MMU notifier count helpers, under no circumstance
      should any downstream module, including x86's vendor code, have a
      legitimate reason to piggyback KVM's MMU notifier logic.  E.g in the x86
      case, only KVM's MMU should be elevating the notifier count, and that
      code is always built into the core kvm.ko module.
      
      Fixes: edb298c6 ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range")
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210902175951.1387989-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fdde13c1
    • Sean Christopherson's avatar
      KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page · 1148bfc4
      Sean Christopherson authored
      Move "lpage_disallowed_link" out of the first 64 bytes, i.e. out of the
      first cache line, of kvm_mmu_page so that "spt" and to a lesser extent
      "gfns" land in the first cache line.  "lpage_disallowed_link" is accessed
      relatively infrequently compared to "spt", which is accessed any time KVM
      is walking and/or manipulating the shadow page tables.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210901221023.1303578-4-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1148bfc4
    • Sean Christopherson's avatar
      KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality · ca41c34c
      Sean Christopherson authored
      Move "tdp_mmu_page" into the 1-byte void left by the recently removed
      "mmio_cached" so that it resides in the first 64 bytes of kvm_mmu_page,
      i.e. in the same cache line as the most commonly accessed fields.
      
      Don't bother wrapping tdp_mmu_page in CONFIG_X86_64, including the field in
      32-bit builds doesn't affect the size of kvm_mmu_page, and a future patch
      can always wrap the field in the unlikely event KVM gains a 1-byte flag
      that is 32-bit specific.
      
      Note, the size of kvm_mmu_page is also unchanged on CONFIG_X86_64=y due
      to it previously sharing an 8-byte chunk with write_flooding_count.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210901221023.1303578-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ca41c34c
    • Sean Christopherson's avatar
      Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()" · e7177339
      Sean Christopherson authored
      Revert a misguided illegal GPA check when "translating" a non-nested GPA.
      The check is woefully incomplete as it does not fill in @exception as
      expected by all callers, which leads to KVM attempting to inject a bogus
      exception, potentially exposing kernel stack information in the process.
      
       WARNING: CPU: 0 PID: 8469 at arch/x86/kvm/x86.c:525 exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
       CPU: 1 PID: 8469 Comm: syz-executor531 Not tainted 5.14.0-rc7-syzkaller #0
       RIP: 0010:exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
       Call Trace:
        x86_emulate_instruction+0xef6/0x1460 arch/x86/kvm/x86.c:7853
        kvm_mmu_page_fault+0x2f0/0x1810 arch/x86/kvm/mmu/mmu.c:5199
        handle_ept_misconfig+0xdf/0x3e0 arch/x86/kvm/vmx/vmx.c:5336
        __vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6021 [inline]
        vmx_handle_exit+0x336/0x1800 arch/x86/kvm/vmx/vmx.c:6038
        vcpu_enter_guest+0x2a1c/0x4430 arch/x86/kvm/x86.c:9712
        vcpu_run arch/x86/kvm/x86.c:9779 [inline]
        kvm_arch_vcpu_ioctl_run+0x47d/0x1b20 arch/x86/kvm/x86.c:10010
        kvm_vcpu_ioctl+0x49e/0xe50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3652
      
      The bug has escaped notice because practically speaking the GPA check is
      useless.  The GPA check in question only comes into play when KVM is
      walking guest page tables (or "translating" CR3), and KVM already handles
      illegal GPA checks by setting reserved bits in rsvd_bits_mask for each
      PxE, or in the case of CR3 for loading PTDPTRs, manually checks for an
      illegal CR3.  This particular failure doesn't hit the existing reserved
      bits checks because syzbot sets guest.MAXPHYADDR=1, and IA32 architecture
      simply doesn't allow for such an absurd MAXPHYADDR, e.g. 32-bit paging
      doesn't define any reserved PA bits checks, which KVM emulates by only
      incorporating the reserved PA bits into the "high" bits, i.e. bits 63:32.
      
      Simply remove the bogus check.  There is zero meaningful value and no
      architectural justification for supporting guest.MAXPHYADDR < 32, and
      properly filling the exception would introduce non-trivial complexity.
      
      This reverts commit ec7771ab.
      
      Fixes: ec7771ab ("KVM: x86: mmu: Add guest physical address check in translate_gpa()")
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+200c08e88ae818f849ce@syzkaller.appspotmail.com
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210831164224.1119728-2-seanjc@google.com>
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e7177339
    • Jia He's avatar
      KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page · 678a305b
      Jia He authored
      After reverting and restoring the fast tlb invalidation patch series,
      the mmio_cached is not removed. Hence a unused field is left in
      kvm_mmu_page.
      
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarJia He <justin.he@arm.com>
      Message-Id: <20210830145336.27183-1-justin.he@arm.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      678a305b
    • Eduardo Habkost's avatar
      kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710 · 1dbaf04c
      Eduardo Habkost authored
      Support for 710 VCPUs was tested by Red Hat since RHEL-8.4,
      so increase KVM_SOFT_MAX_VCPUS to 710.
      Signed-off-by: default avatarEduardo Habkost <ehabkost@redhat.com>
      Message-Id: <20210903211600.2002377-4-ehabkost@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1dbaf04c
    • Eduardo Habkost's avatar
      kvm: x86: Increase MAX_VCPUS to 1024 · 074c82c8
      Eduardo Habkost authored
      Increase KVM_MAX_VCPUS to 1024, so we can test larger VMs.
      
      I'm not changing KVM_SOFT_MAX_VCPUS yet because I'm afraid it
      might involve complicated questions around the meaning of
      "supported" and "recommended" in the upstream tree.
      KVM_SOFT_MAX_VCPUS will be changed in a separate patch.
      
      For reference, visible effects of this change are:
      - KVM_CAP_MAX_VCPUS will now return 1024 (of course)
      - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
        will now be 1024
      - KVM_MAX_VCPU_ID will change from 1151 to 4096
      - Size of struct kvm will increase from 19328 to 22272 bytes
        (in x86_64)
      - Size of struct kvm_ioapic will increase from 1780 to 5084 bytes
        (in x86_64)
      - Bitmap stack variables that will grow:
        - At kvm_hv_flush_tlb() kvm_hv_send_ipi(),
          vp_bitmap[] and vcpu_bitmap[] will now be 128 bytes long
        - vcpu_bitmap at bioapic_write_indirect() will be 128 bytes long
          once patch "KVM: x86: Fix stack-out-of-bounds memory access
          from ioapic_write_indirect()" is applied
      Signed-off-by: default avatarEduardo Habkost <ehabkost@redhat.com>
      Message-Id: <20210903211600.2002377-3-ehabkost@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      074c82c8
    • Eduardo Habkost's avatar
      kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS · 4ddacd52
      Eduardo Habkost authored
      Instead of requiring KVM_MAX_VCPU_ID to be manually increased
      every time we increase KVM_MAX_VCPUS, set it to 4*KVM_MAX_VCPUS.
      This should be enough for CPU topologies where Cores-per-Package
      and Packages-per-Socket are not powers of 2.
      
      In practice, this increases KVM_MAX_VCPU_ID from 1023 to 1152.
      The only side effect of this change is making some fields in
      struct kvm_ioapic larger, increasing the struct size from 1628 to
      1780 bytes (in x86_64).
      Signed-off-by: default avatarEduardo Habkost <ehabkost@redhat.com>
      Message-Id: <20210903211600.2002377-2-ehabkost@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4ddacd52
    • Maxim Levitsky's avatar
      KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation · 81b4b56d
      Maxim Levitsky authored
      If we are emulating an invalid guest state, we don't have a correct
      exit reason, and thus we shouldn't do anything in this function.
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 95b5a48c ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18)
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      81b4b56d
    • Sean Christopherson's avatar
      KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host · a717a780
      Sean Christopherson authored
      Include pml5_root in the set of special roots if and only if the host,
      and thus NPT, is using 5-level paging.  mmu_alloc_special_roots() expects
      special roots to be allocated as a bundle, i.e. they're either all valid
      or all NULL.  But for pml5_root, that expectation only holds true if the
      host uses 5-level paging, which causes KVM to WARN about pml5_root being
      NULL when the other special roots are valid.
      
      The silver lining of 4-level vs. 5-level NPT being tied to the host
      kernel's paging level is that KVM's shadow root level is constant; unlike
      VMX's EPT, KVM can't choose 4-level NPT based on guest.MAXPHYADDR.  That
      means KVM can still expect pml5_root to be bundled with the other special
      roots, it just needs to be conditioned on the shadow root level.
      
      Fixes: cb0f722a ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host")
      Reported-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210824005824.205536-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a717a780
  2. 27 Aug, 2021 2 commits
  3. 26 Aug, 2021 2 commits
  4. 20 Aug, 2021 19 commits