1. 16 Apr, 2019 16 commits
    • Paolo Bonzini's avatar
      selftests: kvm: fix for compilers that do not support -no-pie · c2390f16
      Paolo Bonzini authored
      -no-pie was added to GCC at the same time as their configuration option
      --enable-default-pie.  Compilers that were built before do not have
      -no-pie, but they also do not need it.  Detect the option at build
      time.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c2390f16
    • Paolo Bonzini's avatar
      selftests: kvm/evmcs_test: complete I/O before migrating guest state · c68c21ca
      Paolo Bonzini authored
      Starting state migration after an IO exit without first completing IO
      may result in test failures.  We already have two tests that need this
      (this patch in fact fixes evmcs_test, similar to what was fixed for
      state_test in commit 0f73bbc8, "KVM: selftests: complete IO before
      migrating guest state", 2019-03-13) and a third is coming.  So, move the
      code to vcpu_save_state, and while at it do not access register state
      until after I/O is complete.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c68c21ca
    • Sean Christopherson's avatar
      KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels · b68f3cc7
      Sean Christopherson authored
      Invoking the 64-bit variation on a 32-bit kenrel will crash the guest,
      trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
      rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
      thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
      
      KVM allows userspace to report long mode support via CPUID, even though
      the guest is all but guaranteed to crash if it actually tries to enable
      long mode.  But, a pure 32-bit guest that is ignorant of long mode will
      happily plod along.
      
      SMM complicates things as 64-bit CPUs use a different SMRAM save state
      area.  KVM handles this correctly for 64-bit kernels, e.g. uses the
      legacy save state map if userspace has hid long mode from the guest,
      but doesn't fare well when userspace reports long mode support on a
      32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
      
      Since the alternative is to crash the guest, e.g. by not loading state
      or explicitly requesting shutdown, unconditionally use the legacy SMRAM
      save state map for 32-bit KVM.  If a guest has managed to get far enough
      to handle SMIs when running under a weird/buggy userspace hypervisor,
      then don't deliberately crash the guest since there are no downsides
      (from KVM's perspective) to allow it to continue running.
      
      Fixes: 660a5d51 ("KVM: x86: save/load state on SMM switch")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b68f3cc7
    • Sean Christopherson's avatar
      KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU · 8f4dc2e7
      Sean Christopherson authored
      Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save
      state area, i.e. don't save/restore EFER across SMM transitions.  KVM
      somewhat models this, e.g. doesn't clear EFER on entry to SMM if the
      guest doesn't support long mode.  But during RSM, KVM unconditionally
      clears EFER so that it can get back to pure 32-bit mode in order to
      start loading CRs with their actual non-SMM values.
      
      Clear EFER only when it will be written when loading the non-SMM state
      so as to preserve bits that can theoretically be set on 32-bit vCPUs,
      e.g. KVM always emulates EFER_SCE.
      
      And because CR4.PAE is cleared only to play nice with EFER, wrap that
      code in the long mode check as well.  Note, this may result in a
      compiler warning about cr4 being consumed uninitialized.  Re-read CR4
      even though it's technically unnecessary, as doing so allows for more
      readable code and RSM emulation is not a performance critical path.
      
      Fixes: 660a5d51 ("KVM: x86: save/load state on SMM switch")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8f4dc2e7
    • Sean Christopherson's avatar
      KVM: x86: clear SMM flags before loading state while leaving SMM · 9ec19493
      Sean Christopherson authored
      RSM emulation is currently broken on VMX when the interrupted guest has
      CR4.VMXE=1.  Stop dancing around the issue of HF_SMM_MASK being set when
      loading SMSTATE into architectural state, e.g. by toggling it for
      problematic flows, and simply clear HF_SMM_MASK prior to loading
      architectural state (from SMRAM save state area).
      Reported-by: default avatarJon Doron <arilou@gmail.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Fixes: 5bea5123 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Tested-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      9ec19493
    • Sean Christopherson's avatar
      KVM: x86: Open code kvm_set_hflags · c5833c7a
      Sean Christopherson authored
      Prepare for clearing HF_SMM_MASK prior to loading state from the SMRAM
      save state map, i.e. kvm_smm_changed() needs to be called after state
      has been loaded and so cannot be done automatically when setting
      hflags from RSM.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c5833c7a
    • Sean Christopherson's avatar
      KVM: x86: Load SMRAM in a single shot when leaving SMM · ed19321f
      Sean Christopherson authored
      RSM emulation is currently broken on VMX when the interrupted guest has
      CR4.VMXE=1.  Rather than dance around the issue of HF_SMM_MASK being set
      when loading SMSTATE into architectural state, ideally RSM emulation
      itself would be reworked to clear HF_SMM_MASK prior to loading non-SMM
      architectural state.
      
      Ostensibly, the only motivation for having HF_SMM_MASK set throughout
      the loading of state from the SMRAM save state area is so that the
      memory accesses from GET_SMSTATE() are tagged with role.smm.  Load
      all of the SMRAM save state area from guest memory at the beginning of
      RSM emulation, and load state from the buffer instead of reading guest
      memory one-by-one.
      
      This paves the way for clearing HF_SMM_MASK prior to loading state,
      and also aligns RSM with the enter_smm() behavior, which fills a
      buffer and writes SMRAM save state in a single go.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ed19321f
    • Liran Alon's avatar
      KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU · e51bfdb6
      Liran Alon authored
      Issue was discovered when running kvm-unit-tests on KVM running as L1 on
      top of Hyper-V.
      
      When vmx_instruction_intercept unit-test attempts to run RDPMC to test
      RDPMC-exiting, it is intercepted by L1 KVM which it's EXIT_REASON_RDPMC
      handler raise #GP because vCPU exposed by Hyper-V doesn't support PMU.
      Instead of unit-test expectation to be reflected with EXIT_REASON_RDPMC.
      
      The reason vmx_instruction_intercept unit-test attempts to run RDPMC
      even though Hyper-V doesn't support PMU is because L1 expose to L2
      support for RDPMC-exiting. Which is reasonable to assume that is
      supported only in case CPU supports PMU to being with.
      
      Above issue can easily be simulated by modifying
      vmx_instruction_intercept config in x86/unittests.cfg to run QEMU with
      "-cpu host,+vmx,-pmu" and run unit-test.
      
      To handle issue, change KVM to expose RDPMC-exiting only when guest
      supports PMU.
      Reported-by: default avatarSaar Amar <saaramar@microsoft.com>
      Reviewed-by: default avatarMihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e51bfdb6
    • Liran Alon's avatar
      KVM: x86: Raise #GP when guest vCPU do not support PMU · 672ff6cf
      Liran Alon authored
      Before this change, reading a VMware pseduo PMC will succeed even when
      PMU is not supported by guest. This can easily be seen by running
      kvm-unit-test vmware_backdoors with "-cpu host,-pmu" option.
      Reviewed-by: default avatarMihai Carabas <mihai.carabas@oracle.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      672ff6cf
    • WANG Chao's avatar
      x86/kvm: move kvm_load/put_guest_xcr0 into atomic context · 1811d979
      WANG Chao authored
      guest xcr0 could leak into host when MCE happens in guest mode. Because
      do_machine_check() could schedule out at a few places.
      
      For example:
      
      kvm_load_guest_xcr0
      ...
      kvm_x86_ops->run(vcpu) {
        vmx_vcpu_run
          vmx_complete_atomic_exit
            kvm_machine_check
              do_machine_check
                do_memory_failure
                  memory_failure
                    lock_page
      
      In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule
      out, host cpu has guest xcr0 loaded (0xff).
      
      In __switch_to {
           switch_fpu_finish
             copy_kernel_to_fpregs
               XRSTORS
      
      If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will
      generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in
      and tries to reinitialize fpu by restoring init fpu state. Same story as
      last #GP, except we get DOUBLE FAULT this time.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWANG Chao <chao.wang@ucloud.cn>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1811d979
    • Vitaly Kuznetsov's avatar
      KVM: x86: svm: make sure NMI is injected after nmi_singlestep · 99c22179
      Vitaly Kuznetsov authored
      I noticed that apic test from kvm-unit-tests always hangs on my EPYC 7401P,
      the hanging test nmi-after-sti is trying to deliver 30000 NMIs and tracing
      shows that we're sometimes able to deliver a few but never all.
      
      When we're trying to inject an NMI we may fail to do so immediately for
      various reasons, however, we still need to inject it so enable_nmi_window()
      arms nmi_singlestep mode. #DB occurs as expected, but we're not checking
      for pending NMIs before entering the guest and unless there's a different
      event to process, the NMI will never get delivered.
      
      Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure
      pending NMIs are checked and possibly injected.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      99c22179
    • Suthikulpanit, Suravee's avatar
      svm/avic: Fix invalidate logical APIC id entry · e44e3eac
      Suthikulpanit, Suravee authored
      Only clear the valid bit when invalidate logical APIC id entry.
      The current logic clear the valid bit, but also set the rest of
      the bits (including reserved bits) to 1.
      
      Fixes: 98d90582 ('svm: Fix AVIC DFR and LDR handling')
      Signed-off-by: default avatarSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e44e3eac
    • Suthikulpanit, Suravee's avatar
      Revert "svm: Fix AVIC incomplete IPI emulation" · 4a58038b
      Suthikulpanit, Suravee authored
      This reverts commit bb218fbc.
      
      As Oren Twaig pointed out the old discussion:
      
        https://patchwork.kernel.org/patch/8292231/
      
      that the change coud potentially cause an extra IPI to be sent to
      the destination vcpu because the AVIC hardware already set the IRR bit
      before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running).
      Since writting to ICR and ICR2 will also set the IRR. If something triggers
      the destination vcpu to get scheduled before the emulation finishes, then
      this could result in an additional IPI.
      
      Also, the issue mentioned in the commit bb218fbc was misdiagnosed.
      
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Reported-by: default avatarOren Twaig <oren@scalemp.com>
      Signed-off-by: default avatarSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4a58038b
    • Ben Gardon's avatar
      kvm: mmu: Fix overflow on kvm mmu page limit calculation · bc8a3d89
      Ben Gardon authored
      KVM bases its memory usage limits on the total number of guest pages
      across all memslots. However, those limits, and the calculations to
      produce them, use 32 bit unsigned integers. This can result in overflow
      if a VM has more guest pages that can be represented by a u32. As a
      result of this overflow, KVM can use a low limit on the number of MMU
      pages it will allocate. This makes KVM unable to map all of guest memory
      at once, prompting spurious faults.
      
      Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
      	introduced no new failures.
      Signed-off-by: default avatarBen Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bc8a3d89
    • Paolo Bonzini's avatar
      KVM: nVMX: always use early vmcs check when EPT is disabled · 2b27924b
      Paolo Bonzini authored
      The remaining failures of vmx.flat when EPT is disabled are caused by
      incorrectly reflecting VMfails to the L1 hypervisor.  What happens is
      that nested_vmx_restore_host_state corrupts the guest CR3, reloading it
      with the host's shadow CR3 instead, because it blindly loads GUEST_CR3
      from the vmcs01.
      
      For simplicity let's just always use hardware VMCS checks when EPT is
      disabled.  This way, nested_vmx_restore_host_state is not reached at
      all (or at least shouldn't be reached).
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2b27924b
    • Paolo Bonzini's avatar
      KVM: nVMX: allow tests to use bad virtual-APIC page address · 69090810
      Paolo Bonzini authored
      As mentioned in the comment, there are some special cases where we can simply
      clear the TPR shadow bit from the CPU-based execution controls in the vmcs02.
      Handle them so that we can remove some XFAILs from vmx.flat.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      69090810
  2. 15 Apr, 2019 1 commit
    • Sean Christopherson's avatar
      KVM: x86/mmu: Fix an inverted list_empty() check when zapping sptes · cfd32acf
      Sean Christopherson authored
      A recently introduced helper for handling zap vs. remote flush
      incorrectly bails early, effectively leaking defunct shadow pages.
      Manifests as a slab BUG when exiting KVM due to the shadow pages
      being alive when their associated cache is destroyed.
      
      ==========================================================================
      BUG kvm_mmu_page_header: Objects remaining in kvm_mmu_page_header on ...
      --------------------------------------------------------------------------
      Disabling lock debugging due to kernel taint
      INFO: Slab 0x00000000fc436387 objects=26 used=23 fp=0x00000000d023caee ...
      CPU: 6 PID: 4315 Comm: rmmod Tainted: G    B             5.1.0-rc2+ #19
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      Call Trace:
       dump_stack+0x46/0x5b
       slab_err+0xad/0xd0
       ? on_each_cpu_mask+0x3c/0x50
       ? ksm_migrate_page+0x60/0x60
       ? on_each_cpu_cond_mask+0x7c/0xa0
       ? __kmalloc+0x1ca/0x1e0
       __kmem_cache_shutdown+0x13a/0x310
       shutdown_cache+0xf/0x130
       kmem_cache_destroy+0x1d5/0x200
       kvm_mmu_module_exit+0xa/0x30 [kvm]
       kvm_arch_exit+0x45/0x60 [kvm]
       kvm_exit+0x6f/0x80 [kvm]
       vmx_exit+0x1a/0x50 [kvm_intel]
       __x64_sys_delete_module+0x153/0x1f0
       ? exit_to_usermode_loop+0x88/0xc0
       do_syscall_64+0x4f/0x100
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: a2113634 ("KVM: x86/mmu: Split remote_flush+zap case out of kvm_mmu_flush_or_zap()")
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      cfd32acf
  3. 10 Apr, 2019 3 commits
  4. 09 Apr, 2019 3 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 869e3305
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Off by one and bounds checking fixes in NFC, from Dan Carpenter.
      
       2) There have been many weird regressions in r8169 since we turned ASPM
          support on, some are still not understood nor completely resolved.
          Let's turn this back off for now. From Heiner Kallweit.
      
       3) Signess fixes for ethtool speed value handling, from Michael
          Zhivich.
      
       4) Handle timestamps properly in macb driver, from Paul Thomas.
      
       5) Two erspan fixes, it's the usual "skb ->data potentially reallocated
          and we're holding a stale protocol header pointer". From Lorenzo
          Bianconi.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        bnxt_en: Reset device on RX buffer errors.
        bnxt_en: Improve RX consumer index validity check.
        net: macb driver, check for SKBTX_HW_TSTAMP
        qlogic: qlcnic: fix use of SPEED_UNKNOWN ethtool constant
        broadcom: tg3: fix use of SPEED_UNKNOWN ethtool constant
        ethtool: avoid signed-unsigned comparison in ethtool_validate_speed()
        net: ip6_gre: fix possible use-after-free in ip6erspan_rcv
        net: ip_gre: fix possible use-after-free in erspan_rcv
        r8169: disable ASPM again
        MAINTAINERS: ieee802154: update documentation file pattern
        net: vrf: Fix ping failed when vrf mtu is set to 0
        selftests: add a tc matchall test case
        nfc: nci: Potential off by one in ->pipes[] array
        NFC: nci: Add some bounds checking in nci_hci_cmd_received()
      869e3305
    • Linus Torvalds's avatar
      Merge branch 'fixes-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · a556810d
      Linus Torvalds authored
      Pull TPM fixes from James Morris:
       "From Jarkko: These are critical fixes for v5.1. Contains also couple
        of new selftests for v5.1 features (partial reads in /dev/tpm0)"
      
      * 'fixes-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        selftests/tpm2: Open tpm dev in unbuffered mode
        selftests/tpm2: Extend tests to cover partial reads
        KEYS: trusted: fix -Wvarags warning
        tpm: Fix the type of the return value in calc_tpm2_event_size()
        KEYS: trusted: allow trusted.ko to initialize w/o a TPM
        tpm: fix an invalid condition in tpm_common_poll
        tpm: turn on TPM on suspend for TPM 1.x
      a556810d
    • Linus Torvalds's avatar
      Merge tag 'xtensa-20190408' of git://github.com/jcmvbkbc/linux-xtensa · 10d43397
      Linus Torvalds authored
      Pull xtensa fixes from Max Filippov:
      
       - fix syscall number passed to trace_sys_exit
      
       - fix syscall number initialization in start_thread
      
       - fix level interpretation in the return_address
      
       - fix format string warning in init_pmd
      
      * tag 'xtensa-20190408' of git://github.com/jcmvbkbc/linux-xtensa:
        xtensa: fix format string warning in init_pmd
        xtensa: fix return_address
        xtensa: fix initialization of pt_regs::syscall in start_thread
        xtensa: use actual syscall number in do_syscall_trace_leave
      10d43397
  5. 08 Apr, 2019 17 commits