1. 01 Sep, 2022 5 commits
    • Merge tag 'kvm-riscv-fixes-6.0-1' of https://github.com/kvm-riscv/linux into HEAD · 35906d23
      Paolo Bonzini authored
      KVM/riscv fixes for 6.0, take #1
      
      - Fix unused variable warnings in vcpu_timer.c
      - Move extern sbi_ext declarations to a header
      35906d23
    • KVM: x86: check validity of argument to KVM_SET_MP_STATE · 22c6a0ef
      Paolo Bonzini authored
      An invalid argument to KVM_SET_MP_STATE has no effect other than making the
      vCPU fail to run at the next KVM_RUN.  Since it is extremely unlikely that
      any userspace is relying on it, fail with -EINVAL just like for other
      architectures.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      22c6a0ef
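The validation described in the commit above can be pictured with a minimal user-space sketch. It is not the kernel code: the `mp_state` enum, the `struct vcpu` layout, and `set_mp_state()` are hypothetical stand-ins, and the real KVM_MP_STATE_* values and the set accepted on x86 may differ.

```c
#include <errno.h>

/* Hypothetical stand-ins for the multiprocessing states a vCPU can be put in. */
enum mp_state {
    MP_STATE_RUNNABLE = 0,
    MP_STATE_UNINITIALIZED,
    MP_STATE_HALTED,
    MP_STATE_COUNT          /* number of valid states, not a state itself */
};

/* Hypothetical, heavily reduced vCPU: only the field this sketch needs. */
struct vcpu {
    unsigned int mp_state;
};

/*
 * Reject unknown states up front instead of storing them and letting the
 * vCPU fail later when it is next run.
 */
int set_mp_state(struct vcpu *vcpu, unsigned int requested)
{
    if (requested >= MP_STATE_COUNT)
        return -EINVAL;

    vcpu->mp_state = requested;
    return 0;
}
```

With this shape, a bogus value leaves the stored state untouched and the caller sees the error immediately.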
    • perf/x86/core: Completely disable guest PEBS via guest's global_ctrl · 87693645
      Like Xu authored
      When a guest PEBS counter is cross-mapped by a host counter, software
      will remove the corresponding bit in the arr[global_ctrl].guest and
      expect hardware to perform a change of state "from enable to disable"
      via the msr_slot[] switch during the VMX transition.
      
      In practice, if the user sets the counter overflow value small enough,
      this still opens a tiny race window in which the previously PEBS-enabled
      counter can write cross-mapped PEBS records into the guest's PEBS buffer:
      arr[global_ctrl].guest has been prioritised (switch_msr_special stuff)
      and has already switched into the enabled state, while
      arr[pebs_enable].guest has not.
      
      Close this window by clearing invalid bits in the arr[global_ctrl].guest.
      
      Cc: linux-perf-users@vger.kernel.org
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Fixes: 85425032 ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations")
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20220831033524.58561-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      87693645
    • KVM: x86: fix memoryleak in kvm_arch_vcpu_create() · 3c0ba05c
      Miaohe Lin authored
      When allocating memory for mci_ctl2_banks fails, KVM doesn't release
      mce_banks, leading to a memory leak. Fix this issue by calling kfree()
      for it when kcalloc() fails.
      
      Fixes: 281b5278 ("KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Message-Id: <20220901122300.22298-1-linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3c0ba05c
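The leak pattern fixed above (a second allocation fails while the first is still held) can be sketched in user space as follows. This is an illustration only: `struct vcpu_arch_sketch`, `vcpu_arch_create_sketch()`, and the `fail_second` knob are hypothetical, and `calloc()`/`free()` stand in for the kernel's `kcalloc()`/`kfree()`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reduced structure holding the two per-vCPU bank arrays. */
struct vcpu_arch_sketch {
    unsigned long *mce_banks;
    unsigned long *mci_ctl2_banks;
};

/*
 * Allocate both arrays or neither. `fail_second` is a test knob (an
 * assumption of this sketch) that simulates the second allocation failing.
 */
int vcpu_arch_create_sketch(struct vcpu_arch_sketch *arch, size_t nbanks,
                            int fail_second)
{
    arch->mce_banks = calloc(nbanks * 4, sizeof(*arch->mce_banks));
    arch->mci_ctl2_banks = fail_second
        ? NULL
        : calloc(nbanks, sizeof(*arch->mci_ctl2_banks));

    if (!arch->mce_banks || !arch->mci_ctl2_banks)
        goto fail_free_banks;

    return 0;

fail_free_banks:
    /* free(NULL) is a no-op, so both pointers can be released unconditionally. */
    free(arch->mce_banks);
    free(arch->mci_ctl2_banks);
    arch->mce_banks = NULL;
    arch->mci_ctl2_banks = NULL;
    return -ENOMEM;
}
```

Routing every failure through one label that releases everything allocated so far keeps the error path correct when further allocations are added later.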
    • KVM: x86: Mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIES · 0204750b
      Jim Mattson authored
      KVM should not claim to virtualize unknown IA32_ARCH_CAPABILITIES
      bits. When kvm_get_arch_capabilities() was originally written, there
      were only a few bits defined in this MSR, and KVM could virtualize all
      of them. However, over the years, several bits have been defined that
      KVM cannot just blindly pass through to the guest without additional
      work (such as virtualizing an MSR promised by the
      IA32_ARCH_CAPABILITIES feature bit).
      
      Define a mask of supported IA32_ARCH_CAPABILITIES bits, and mask off
      any other bits that are set in the hardware MSR.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Fixes: 5b76a3cf ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Vipin Sharma <vipinsh@google.com>
      Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Message-Id: <20220830174947.2182144-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0204750b
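The allow-list approach described above amounts to a single AND with a mask of known bits. The sketch below assumes placeholder bit positions (0, 1 and 3) and hypothetical names; it does not reproduce the bits that the real mask in KVM contains.

```c
#include <stdint.h>

#define CAP_BIT(n) (UINT64_C(1) << (n))

/*
 * Placeholder allow-list: only bits the hypervisor knows how to virtualize.
 * The positions chosen here are illustrative, not the real MSR layout.
 */
#define SUPPORTED_ARCH_CAP_MASK (CAP_BIT(0) | CAP_BIT(1) | CAP_BIT(3))

/* Report to the guest only those host capability bits that are on the allow-list. */
uint64_t filter_arch_capabilities(uint64_t host_msr_value)
{
    return host_msr_value & SUPPORTED_ARCH_CAP_MASK;
}
```

Because the mask is an allow-list rather than a deny-list, bits that hardware defines in the future are dropped by default until someone adds them deliberately.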
  2. 19 Aug, 2022 19 commits
    • riscv: kvm: move extern sbi_ext declarations to a header · 3e5e56c6
      Conor Dooley authored
      Sparse complains about missing statics in the declarations of several
      variables:
      arch/riscv/kvm/vcpu_sbi_replace.c:38:37: warning: symbol 'vcpu_sbi_ext_time' was not declared. Should it be static?
      arch/riscv/kvm/vcpu_sbi_replace.c:73:37: warning: symbol 'vcpu_sbi_ext_ipi' was not declared. Should it be static?
      arch/riscv/kvm/vcpu_sbi_replace.c:126:37: warning: symbol 'vcpu_sbi_ext_rfence' was not declared. Should it be static?
      arch/riscv/kvm/vcpu_sbi_replace.c:170:37: warning: symbol 'vcpu_sbi_ext_srst' was not declared. Should it be static?
      arch/riscv/kvm/vcpu_sbi_base.c:69:37: warning: symbol 'vcpu_sbi_ext_base' was not declared. Should it be static?
      arch/riscv/kvm/vcpu_sbi_base.c:90:37: warning: symbol 'vcpu_sbi_ext_experimental' was not declared. Should it be static?
      arch/riscv/kvm/vcpu_sbi_base.c:96:37: warning: symbol 'vcpu_sbi_ext_vendor' was not declared. Should it be static?
      arch/riscv/kvm/vcpu_sbi_hsm.c:115:37: warning: symbol 'vcpu_sbi_ext_hsm' was not declared. Should it be static?
      
      These variables are however used in vcpu_sbi.c where they are declared
      as extern. Move them to kvm_vcpu_sbi.h which is handily already
      included by the three other files.
      
      Fixes: a046c2d8 ("RISC-V: KVM: Reorganize SBI code by moving SBI v0.1 to its own file")
      Fixes: 5f862df5 ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2")
      Fixes: 3e1d8656 ("RISC-V: KVM: Add SBI HSM extension in KVM")
      Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
      Signed-off-by: Anup Patel <anup@brainfault.org>
      3e5e56c6
    • riscv: kvm: vcpu_timer: fix unused variable warnings · fd0cd59f
      Conor Dooley authored
      In two places, csr is set but never used:
      
      arch/riscv/kvm/vcpu_timer.c:302:23: warning: variable 'csr' set but not used [-Wunused-but-set-variable]
              struct kvm_vcpu_csr *csr;
                                   ^
      arch/riscv/kvm/vcpu_timer.c:327:23: warning: variable 'csr' set but not used [-Wunused-but-set-variable]
              struct kvm_vcpu_csr *csr;
                                   ^
      
      Remove the variable.
      
      Fixes: 8f5cb44b ("RISC-V: KVM: Support sstc extension")
      Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
      Signed-off-by: Anup Patel <anup@brainfault.org>
      fd0cd59f
    • KVM: selftests: Fix ambiguous mov in KVM_ASM_SAFE() · 372d0708
      David Matlack authored
      Change the mov in KVM_ASM_SAFE() that zeroes @vector to a movb to
      make it unambiguous.
      
      This fixes a build failure with Clang since, unlike the GNU assembler,
      the LLVM integrated assembler rejects ambiguous X86 instructions that
      don't have suffixes:
      
        In file included from x86_64/hyperv_features.c:13:
        include/x86_64/processor.h:825:9: error: ambiguous instructions require an explicit suffix (could be 'movb', 'movw', 'movl', or 'movq')
                return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
                       ^
        include/x86_64/processor.h:802:15: note: expanded from macro 'kvm_asm_safe'
                asm volatile(KVM_ASM_SAFE(insn)                 \
                             ^
        include/x86_64/processor.h:788:16: note: expanded from macro 'KVM_ASM_SAFE'
                "1: " insn "\n\t"                                       \
                              ^
        <inline asm>:5:2: note: instantiated into assembly here
                mov $0, 15(%rsp)
                ^
      
      It seems like this change could introduce undesirable behavior in the
      future, e.g. if someone used a type larger than a u8 for @vector, since
      KVM_ASM_SAFE() will only zero the bottom byte. I tried changing the type
      of @vector to an int to see what would happen. GCC failed to compile due
      to a size mismatch between `movb` and `%eax`. Clang succeeded in
      compiling, but the generated code looked correct, so perhaps it will not
      be an issue. That being said it seems like there could be a better
      solution to this issue that does not assume @vector is a u8.
      
      Fixes: 3b23054c ("KVM: selftests: Add x86-64 support for exception fixup")
      Signed-off-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220722234838.2160385-3-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      372d0708
    • KVM: selftests: Fix KVM_EXCEPTION_MAGIC build with Clang · 67ef8664
      David Matlack authored
      Change KVM_EXCEPTION_MAGIC to use the all-caps "ULL", rather than lower
      case. This fixes a build failure with Clang:
      
        In file included from x86_64/hyperv_features.c:13:
        include/x86_64/processor.h:825:9: error: unexpected token in argument list
                return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
                       ^
        include/x86_64/processor.h:802:15: note: expanded from macro 'kvm_asm_safe'
                asm volatile(KVM_ASM_SAFE(insn)                 \
                             ^
        include/x86_64/processor.h:785:2: note: expanded from macro 'KVM_ASM_SAFE'
                "mov $" __stringify(KVM_EXCEPTION_MAGIC) ", %%r9\n\t"   \
                ^
        <inline asm>:1:18: note: instantiated into assembly here
                mov $0xabacadabaull, %r9
                                ^
      
      Fixes: 3b23054c ("KVM: selftests: Add x86-64 support for exception fixup")
      Signed-off-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220722234838.2160385-2-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      67ef8664
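The reason the literal's suffix matters at all is that the macro is stringified into the assembly text, so the assembler sees the suffix exactly as it was spelled in C. The sketch below shows that mechanism with a local `STRINGIFY()` helper standing in for the kernel's `__stringify()`; the `MAGIC_LOWER`/`MAGIC_UPPER` names and the two accessor functions are hypothetical.

```c
#include <string.h>

/* Two-level stringification so that macro arguments are expanded first. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x)   STRINGIFY_1(x)

/* Same numeric value, different spelling of the integer suffix. */
#define MAGIC_LOWER 0xabacadabaull
#define MAGIC_UPPER 0xabacadabaULL

/* The suffix survives stringification and ends up in the assembly operand. */
const char *asm_text_lower(void)
{
    return "mov $" STRINGIFY(MAGIC_LOWER) ", %r9";
}

const char *asm_text_upper(void)
{
    return "mov $" STRINGIFY(MAGIC_UPPER) ", %r9";
}
```

To the C compiler the two constants are identical; only the generated assembly text differs, which is where, per the commit message, Clang's integrated assembler rejected the lower-case form.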
    • KVM: VMX: Heed the 'msr' argument in msr_write_intercepted() · 020dac41
      Jim Mattson authored
      Regardless of the 'msr' argument passed to the VMX version of
      msr_write_intercepted(), the function always checks to see if a
      specific MSR (IA32_SPEC_CTRL) is intercepted for write.  This behavior
      seems unintentional and unexpected.
      
      Modify the function so that it checks to see if the provided 'msr'
      index is intercepted for write.
      
      Fixes: 67f4b996 ("KVM: nVMX: Handle dynamic MSR intercept toggling")
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220810213050.2655000-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      020dac41
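A simplified model of the corrected lookup is sketched below. The single 64-bit bitmap, the `NR_TRACKED_MSRS` limit, and the choice to treat out-of-range indices as intercepted are all assumptions of this sketch; the real VMX MSR bitmap has a different layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_TRACKED_MSRS 64u  /* hypothetical: one bit per tracked MSR index */

/* Hypothetical, flattened write-intercept bitmap: bit set means intercepted. */
struct msr_bitmap_sketch {
    uint64_t write_intercept;
};

/*
 * Look up the bit that belongs to the index the caller passed in, rather
 * than a hard-coded MSR. Indices outside the bitmap are reported as
 * intercepted, which is this sketch's conservative default.
 */
bool msr_write_intercepted_sketch(const struct msr_bitmap_sketch *bm,
                                  unsigned int msr)
{
    if (msr >= NR_TRACKED_MSRS)
        return true;

    return (bm->write_intercept >> msr) & 1u;
}
```

The bug class fixed by the commit is easy to miss in review because a function that ignores its argument still returns plausible answers for the one MSR callers happened to ask about.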
    • kvm: x86: mmu: Always flush TLBs when enabling dirty logging · b64d740e
      Junaid Shahid authored
      When A/D bits are not available, KVM uses a software access tracking
      mechanism, which involves making the SPTEs inaccessible. However,
      the clear_young() MMU notifier does not flush TLBs. So it is possible
      that there may still be stale, potentially writable, TLB entries.
      This is usually fine, but can be problematic when enabling dirty
      logging, because it currently only does a TLB flush if any SPTEs were
      modified. But if all SPTEs are in access-tracked state, then there
      won't be a TLB flush, which means that the guest could still possibly
      write to memory and not have it reflected in the dirty bitmap.
      
      So just unconditionally flush the TLBs when enabling dirty logging.
      As an alternative, KVM could explicitly check the MMU-Writable bit when
      write-protecting SPTEs to decide if a flush is needed (instead of
      checking the Writable bit), but given that a flush almost always happens
      anyway, just making it unconditional seems simpler.
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Message-Id: <20220810224939.2611160-1-junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b64d740e
    • kvm: x86: mmu: Drop the need_remote_flush() function · 1441ca14
      Junaid Shahid authored
      This is only used by kvm_mmu_pte_write(), which no longer actually
      creates the new SPTE and instead just clears the old SPTE. So we
      just need to check if the old SPTE was shadow-present instead of
      calling need_remote_flush(). Hence we can drop this function. It was
      incomplete anyway as it didn't take access-tracking into account.
      
      This patch should not result in any functional change.
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220723024316.2725328-1-junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1441ca14
    • Merge tag 'kvmarm-fixes-6.0-1' of... · 959d6c4a
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.0, take #1
      
      - Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK
      
      - Tidy-up handling of AArch32 on asymmetric systems
      959d6c4a
    • KVM: Drop unnecessary initialization of "ops" in kvm_ioctl_create_device() · eceb6e1d
      Li kunyu authored
      The variable is initialized but it is only used after its assignment.
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Li kunyu <kunyu@nfschina.com>
      Message-Id: <20220819021535.483702-1-kunyu@nfschina.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      eceb6e1d
    • KVM: Drop unnecessary initialization of "npages" in hva_to_pfn_slow() · 28249139
      Li kunyu authored
      The variable is initialized but it is only used after its assignment.
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Li kunyu <kunyu@nfschina.com>
      Message-Id: <20220819022804.483914-1-kunyu@nfschina.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      28249139
    • x86/kvm: Fix "missing ENDBR" BUG for fastop functions · 3d9606b0
      Josh Poimboeuf authored
      The following BUG was reported:
      
        traps: Missing ENDBR: andw_ax_dx+0x0/0x10 [kvm]
        ------------[ cut here ]------------
        kernel BUG at arch/x86/kernel/traps.c:253!
        invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
         <TASK>
         asm_exc_control_protection+0x2b/0x30
        RIP: 0010:andw_ax_dx+0x0/0x10 [kvm]
        Code: c3 cc cc cc cc 0f 1f 44 00 00 66 0f 1f 00 48 19 d0 c3 cc cc cc
              cc 0f 1f 40 00 f3 0f 1e fa 20 d0 c3 cc cc cc cc 0f 1f 44 00 00
              <66> 0f 1f 00 66 21 d0 c3 cc cc cc cc 0f 1f 40 00 66 0f 1f 00 21
              d0
      
         ? andb_al_dl+0x10/0x10 [kvm]
         ? fastop+0x5d/0xa0 [kvm]
         x86_emulate_insn+0x822/0x1060 [kvm]
         x86_emulate_instruction+0x46f/0x750 [kvm]
         complete_emulated_mmio+0x216/0x2c0 [kvm]
         kvm_arch_vcpu_ioctl_run+0x604/0x650 [kvm]
         kvm_vcpu_ioctl+0x2f4/0x6b0 [kvm]
         ? wake_up_q+0xa0/0xa0
      
      The BUG occurred because the ENDBR in the andw_ax_dx() fastop function
      had been incorrectly "sealed" (converted to a NOP) by apply_ibt_endbr().
      
      Objtool marked it to be sealed because KVM has no compile-time
      references to the function.  Instead KVM calculates its address at
      runtime.
      
      Prevent objtool from annotating fastop functions as sealable by creating
      throwaway dummy compile-time references to the functions.
      
      Fixes: 6649fa87 ("x86/ibt,kvm: Add ENDBR to fastops")
      Reported-by: Pengfei Xu <pengfei.xu@intel.com>
      Debugged-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Message-Id: <0d4116f90e9d0c1b754bb90c585e6f0415a1c508.1660837839.git.jpoimboe@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3d9606b0
    • x86/kvm: Simplify FOP_SETCC() · 22472d12
      Josh Poimboeuf authored
      SETCC_ALIGN and FOP_ALIGN are both 16.  Remove the special casing for
      FOP_SETCC() and just make it a normal fastop.
      Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Message-Id: <7c13d94d1a775156f7e36eed30509b274a229140.1660837839.git.jpoimboe@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      22472d12
    • x86/ibt, objtool: Add IBT_NOSEAL() · e27e5bea
      Josh Poimboeuf authored
      Add a macro which prevents a function from getting sealed if there are
      no compile-time references to it.
      Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Message-Id: <20220818213927.e44fmxkoq4yj6ybn@treble>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e27e5bea
    • KVM: Rename mmu_notifier_* to mmu_invalidate_* · 20ec3ebd
      Chao Peng authored
      The motivation for this renaming is to make these variables and related
      helper functions less bound to mmu_notifier, so that they can also be used
      for page invalidation that is not mmu_notifier based. mmu_invalidate_* was
      chosen to better describe the purpose of these variables, which is
      'invalidating' a page.
      
        - mmu_notifier_seq/range_start/range_end are renamed to
          mmu_invalidate_seq/range_start/range_end.
      
        - mmu_notifier_retry{_hva} helper functions are renamed to
          mmu_invalidate_retry{_hva}.
      
        - mmu_notifier_count is renamed to mmu_invalidate_in_progress to
          avoid confusion with mn_active_invalidate_count.
      
        - While here, also update kvm_inc/dec_notifier_count() to
          kvm_mmu_invalidate_begin/end() to match the change for
          mmu_notifier_count.
      
      No functional change intended.
      Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
      Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      20ec3ebd
    • KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS · bdd1c37a
      Chao Peng authored
      KVM_INTERNAL_MEM_SLOTS better reflects the fact that those slots are used
      internally by KVM (invisible to userspace) and avoids confusion with future
      private slots that can have a different meaning.
      Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
      Message-Id: <20220816125322.1110439-2-chao.p.peng@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bdd1c37a
    • KVM: MIPS: remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS · b0754508
      Paolo Bonzini authored
      KVM_PRIVATE_MEM_SLOTS defaults to zero, so it is not necessary to
      define it in MIPS's asm/kvm_host.h.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b0754508
    • KVM: Move coalesced MMIO initialization (back) into kvm_create_vm() · c2b82397
      Sean Christopherson authored
      Invoke kvm_coalesced_mmio_init() from kvm_create_vm() now that allocating
      and initializing coalesced MMIO objects is separate from registering any
      associated devices.  Moving coalesced MMIO cleans up the last oddity
      where KVM does VM creation/initialization after kvm_create_vm(), and more
      importantly after kvm_arch_post_init_vm() is called and the VM is added
      to the global vm_list, i.e. after the VM is fully created as far as KVM
      is concerned.
      
      Originally, kvm_coalesced_mmio_init() was called by kvm_create_vm(), but
      the original implementation was completely devoid of error handling.
      Commit 6ce5a090 ("KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s
      error handling") fixed the various bugs, and in doing so rightly moved the
      call to after kvm_create_vm() because kvm_coalesced_mmio_init() also
      registered the coalesced MMIO device.  Commit 2b3c246a ("KVM: Make
      coalesced mmio use a device per zone") cleaned up that mess by having
      each zone register a separate device, i.e. moved device registration to
      its logical home in kvm_vm_ioctl_register_coalesced_mmio().  As a result,
      kvm_coalesced_mmio_init() is now a "pure" initialization helper and can
      be safely called from kvm_create_vm().
      
      Opportunistically drop the #ifdef, KVM provides stubs for
      kvm_coalesced_mmio_{init,free}() when CONFIG_KVM_MMIO=n (s390).
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220816053937.2477106-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c2b82397
    • KVM: Unconditionally get a ref to /dev/kvm module when creating a VM · 405294f2
      Sean Christopherson authored
      Unconditionally get a reference to the /dev/kvm module when creating a VM
      instead of using try_module_get(), which will fail if the module is in
      the process of being forcefully unloaded.  The error handling when
      try_module_get() fails doesn't properly unwind all that has been done,
      e.g. doesn't call kvm_arch_pre_destroy_vm() and doesn't remove the VM
      from the global list.  Not removing VMs from the global list tends to be
      fatal, e.g. leads to use-after-free explosions.
      
      The obvious alternative would be to add proper unwinding, but the
      justification for using try_module_get(), "rmmod --wait", is completely
      bogus as support for "rmmod --wait", i.e. delete_module() without
      O_NONBLOCK, was removed by commit 3f2b9c9c ("module: remove rmmod
      --wait option.") nearly a decade ago.
      
      It's still possible for try_module_get() to fail due to the module dying
      (more like being killed), as the module will be tagged MODULE_STATE_GOING
      by "rmmod --force", i.e. delete_module(..., O_TRUNC), but playing nice
      with forced unloading is an exercise in futility and gives a false sense
      of security.  Using try_module_get() only prevents acquiring _new_
      references, it doesn't magically put the references held by other VMs,
      and forced unloading doesn't wait, i.e. "rmmod --force" on KVM is all but
      guaranteed to cause spectacular fireworks; the window where KVM will fail
      try_module_get() is tiny compared to the window where KVM is building and
      running the VM with an elevated module refcount.
      
      Addressing KVM's inability to play nice with "rmmod --force" is firmly
      out-of-scope.  Forcefully unloading any module taints the kernel (for obvious
      reasons) _and_ requires the kernel to be built with
      CONFIG_MODULE_FORCE_UNLOAD=y, which is off by default and comes with the
      amusing disclaimer that it's "mainly for kernel developers and desperate
      users".  In other words, KVM is free to scoff at bug reports due to using
      "rmmod --force" while VMs may be running.
      
      Fixes: 5f6de5cb ("KVM: Prevent module exit until all VMs are freed")
      Cc: stable@vger.kernel.org
      Cc: David Matlack <dmatlack@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220816053937.2477106-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      405294f2
    • KVM: Properly unwind VM creation if creating debugfs fails · 4ba4f419
      Sean Christopherson authored
      Properly unwind VM creation if kvm_create_vm_debugfs() fails.  A recent
      change to invoke kvm_create_vm_debugfs() in kvm_create_vm() was led astray
      by buggy try_module_get() handling added by commit 5f6de5cb ("KVM:
      Prevent module exit until all VMs are freed").  The debugfs error path
      effectively inherits the bad error path of try_module_get(), e.g. KVM
      leaves the to-be-freed VM on vm_list even though KVM appears to do the
      right thing by calling module_put() and falling through.
      
      Opportunistically hoist kvm_create_vm_debugfs() above the call to
      kvm_arch_post_init_vm() so that the "post-init" arch hook is actually
      invoked after the VM is initialized (ignoring kvm_coalesced_mmio_init()
      for the moment).  x86 is the only non-nop implementation of the post-init
      hook, and it doesn't allocate/initialize any objects that are reachable
      via debugfs code (spawns a kthread worker for the NX huge page mitigation).
      
      Leave the buggy try_module_get() alone for now, it will be fixed in a
      separate commit.
      
      Fixes: b74ed7a6 ("KVM: Actually create debugfs in kvm_create_vm()")
      Reported-by: syzbot+744e173caec2e1627ee0@syzkaller.appspotmail.com
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
      Message-Id: <20220816053937.2477106-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4ba4f419
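One way to read the ordering concern in the commit above is "do every step that can fail before the object becomes globally visible". The sketch below illustrates that idea only; `struct vm_sketch`, `create_debugfs_sketch()`, the `simulate_failure` knob, and the `nr_listed_vms` counter (standing in for the global vm_list) are hypothetical and do not mirror the actual control flow of kvm_create_vm().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reduced VM object. */
struct vm_sketch {
    bool debugfs_ready;
};

/* Stand-in for the global VM list: counts VMs that have been published. */
int nr_listed_vms;

/* Fallible setup step; `simulate_failure` is a test knob of this sketch. */
int create_debugfs_sketch(struct vm_sketch *vm, bool simulate_failure)
{
    if (simulate_failure)
        return -ENOMEM;

    vm->debugfs_ready = true;
    return 0;
}

/*
 * Run the fallible step first and publish the VM last, so the error path
 * never has to take an already-visible VM back off the global list.
 */
struct vm_sketch *create_vm_sketch(bool simulate_debugfs_failure, int *err)
{
    struct vm_sketch *vm = calloc(1, sizeof(*vm));

    if (!vm) {
        *err = -ENOMEM;
        return NULL;
    }

    *err = create_debugfs_sketch(vm, simulate_debugfs_failure);
    if (*err)
        goto out_free;

    nr_listed_vms++;    /* publish only after all fallible steps succeeded */
    return vm;

out_free:
    free(vm);
    return NULL;
}
```

When a fallible step cannot be moved ahead of publication, the alternative is an explicit unwind label that removes the object from the list before freeing it.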
  3. 17 Aug, 2022 2 commits
  4. 14 Aug, 2022 10 commits
    • Linux 6.0-rc1 · 568035b0
      Linus Torvalds authored
      568035b0
    • radix-tree: replace gfp.h inclusion with gfp_types.h · 9f162193
      Yury Norov authored
      Radix tree header includes gfp.h for __GFP_BITS_SHIFT only. Now we
      have gfp_types.h for this.
      
      Fixes powerpc allmodconfig build:
      
         In file included from include/linux/nodemask.h:97,
                          from include/linux/mmzone.h:17,
                          from include/linux/gfp.h:7,
                          from include/linux/radix-tree.h:12,
                          from include/linux/idr.h:15,
                          from include/linux/kernfs.h:12,
                          from include/linux/sysfs.h:16,
                          from include/linux/kobject.h:20,
                          from include/linux/pci.h:35,
                          from arch/powerpc/kernel/prom_init.c:24:
         include/linux/random.h: In function 'add_latent_entropy':
      >> include/linux/random.h:25:46: error: 'latent_entropy' undeclared (first use in this function); did you mean 'add_latent_entropy'?
            25 |         add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy));
               |                                              ^~~~~~~~~~~~~~
               |                                              add_latent_entropy
         include/linux/random.h:25:46: note: each undeclared identifier is reported only once for each function it appears in
      Reported-by: kernel test robot <lkp@intel.com>
      CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f162193
    • Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 74cbb480
      Linus Torvalds authored
      Pull vfs lseek fix from Al Viro:
       "Fix proc_reg_llseek() breakage. Always had been possible if somebody
        left NULL ->proc_lseek, became a practical issue now"
      
      * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        take care to handle NULL ->proc_lseek()
      74cbb480
    • take care to handle NULL ->proc_lseek() · 3f61631d
      Al Viro authored
      Easily done now, just by clearing FMODE_LSEEK in ->f_mode
      during proc_reg_open() for such entries.
      
      Fixes: 868941b1 ("fs: remove no_llseek")
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      3f61631d
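The idea of clearing a capability flag at open time when the backing operation is missing can be sketched as below. The flag value, the `*_sketch` structures, and `seek_passthrough()` are hypothetical; the real FMODE_LSEEK constant and proc_reg_open() are not reproduced here.

```c
#include <stddef.h>

#define FMODE_LSEEK_SKETCH 0x4u  /* hypothetical flag value for this sketch */

/* Hypothetical reduced operations table: only the seek hook. */
struct proc_ops_sketch {
    long (*proc_lseek)(long offset, int whence);
};

/* Hypothetical reduced open-file object: only the mode flags. */
struct file_sketch {
    unsigned int f_mode;
};

/* Trivial seek hook so that a non-NULL ->proc_lseek can be demonstrated. */
long seek_passthrough(long offset, int whence)
{
    (void)whence;
    return offset;
}

/*
 * At open time, drop the "seekable" flag for entries that provide no seek
 * hook, so later seek attempts are refused by the generic mode check and
 * never reach a NULL function pointer.
 */
void proc_reg_open_sketch(struct file_sketch *file,
                          const struct proc_ops_sketch *ops)
{
    if (!ops->proc_lseek)
        file->f_mode &= ~FMODE_LSEEK_SKETCH;
}
```

Deciding once at open time keeps the seek path itself free of NULL checks.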
    • Merge tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 5d6a0f4d
      Linus Torvalds authored
      Pull more xen updates from Juergen Gross:
      
       - fix the handling of the "persistent grants" feature negotiation
         between Xen blkfront and Xen blkback drivers
      
       - a cleanup of xen.config and adding xen.config to Xen section in
         MAINTAINERS
      
       - support HVMOP_set_evtchn_upcall_vector, which is more compliant with
         "normal" interrupt handling than the global callback used up to now
      
       - further small cleanups
      
      * tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        MAINTAINERS: add xen config fragments to XEN HYPERVISOR sections
        xen: remove XEN_SCRUB_PAGES in xen.config
        xen/pciback: Fix comment typo
        xen/xenbus: fix return type in xenbus_file_read()
        xen-blkfront: Apply 'feature_persistent' parameter when connect
        xen-blkback: Apply 'feature_persistent' parameter when connect
        xen-blkback: fix persistent grants negotiation
        x86/xen: Add support for HVMOP_set_evtchn_upcall_vector
      5d6a0f4d
    • Merge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of... · 96f86ff0
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull more perf tool updates from Arnaldo Carvalho de Melo:
      
       - 'perf c2c' now supports ARM64, adjust its output to cope with
         differences with what is in x86_64. Now go find false sharing on
         ARM64 (at least Neoverse) as well!
      
       - Refactor the JSON processing, making the output more compact and thus
         reducing the size of the resulting perf binary
      
       - Improvements for 'perf offcpu' profiling, including tracking child
         processes
      
       - Update Intel JSON metrics and events files for broadwellde,
         broadwellx, cascadelakex, haswellx, icelakex, ivytown, jaketown,
         knightslanding, sapphirerapids, skylakex and snowridgex
      
       - Add 'perf stat' JSON output and a 'perf test' entry for it
      
       - Ignore memfd and anonymous mmap events if jitdump present
      
       - Refactor 'perf test' shell tests allowing subdirs
      
       - Fix an error handling path in 'parse_perf_probe_command()'
      
       - Fixes for the guest Intel PT tracing patchkit in the 1st batch of
         this merge window
      
       - Print debuginfod queries if -v option is used, to explain delays in
         processing when debuginfo servers are enabled to fetch DSOs with
         richer symbol tables
      
       - Improve error message for 'perf record -p not_existing_pid'
      
       - Fix openssl and libbpf feature detection
      
       - Add PMU pai_crypto event description for IBM z16 on 'perf list'
      
       - Fix typos and duplicated words on comments in various places
      
      * tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (81 commits)
        perf test: Refactor shell tests allowing subdirs
        perf vendor events: Update events for snowridgex
        perf vendor events: Update events and metrics for skylakex
        perf vendor events: Update metrics for sapphirerapids
        perf vendor events: Update events for knightslanding
        perf vendor events: Update metrics for jaketown
        perf vendor events: Update metrics for ivytown
        perf vendor events: Update events and metrics for icelakex
        perf vendor events: Update events and metrics for haswellx
        perf vendor events: Update events and metrics for cascadelakex
        perf vendor events: Update events and metrics for broadwellx
        perf vendor events: Update metrics for broadwellde
        perf jevents: Fold strings optimization
        perf jevents: Compress the pmu_events_table
        perf metrics: Copy entire pmu_event in find metric
        perf pmu-events: Hide the pmu_events
        perf pmu-events: Don't assume pmu_event is an array
        perf pmu-events: Move test events/metrics to JSON
        perf test: Use full metric resolution
        perf pmu-events: Hide pmu_events_map
        ...
      96f86ff0
    • Merge tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · d785610f
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Ensure we never emit lwarx with EH=1 on 32-bit, because some 32-bit
         CPUs trap on it rather than ignoring it as they should.
      
       - Fix ftrace when building with clang, which was broken by some
         refactoring.
      
       - A couple of other minor fixes.
      
      Thanks to Christophe Leroy, Naveen N. Rao, Nick Desaulniers, Ondrej
      Mosnacek, Pali Rohár, Russell Currey, and Segher Boessenkool.
      
      * tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/kexec: Fix build failure from uninitialised variable
        powerpc/ppc-opcode: Fix PPC_RAW_TW()
        powerpc64/ftrace: Fix ftrace for clang builds
        powerpc: Make eh value more explicit when using lwarx
        powerpc: Don't hide eh field of lwarx behind a macro
        powerpc: Fix eh field when calling lwarx on PPC32
      d785610f
    • Merge tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · aea23e7c
      Linus Torvalds authored
      Pull /proc/mounts fix from Al Viro:
       "Fix for /proc/mounts escaping - escape the '#' character too"
      
      * tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        vfs: escape hash as well
      aea23e7c
    • Merge tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 · 332019e2
      Linus Torvalds authored
      Pull more cifs updates from Steve French:
      
       - two fixes for stable, one for a lock length miscalculation, and
         another fixes a lease break timeout bug
      
       - improvement to handle leases, allows the close timeout to be
         configured more safely
      
       - five restructuring/cleanup patches
      
      * tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Do not access tcon->cfids->cfid directly from is_path_accessible
        cifs: Add constructor/destructors for tcon->cfid
        SMB3: fix lease break timeout when multiple deferred close handles for the same file.
        smb3: allow deferred close timeout to be configurable
        cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir
        cifs: Move cached-dir functions into a separate file
        cifs: Remove {cifs,nfs}_fscache_release_page()
        cifs: fix lock length calculation
      332019e2
    • afs: Enable multipage folio support · 8549a263
      David Howells authored
      Enable multipage folio support for the afs filesystem.
      
      Support has already been implemented in netfslib, fscache and cachefiles
      and in most of afs, but I've waited for Matthew Wilcox's latest folio
      changes.
      
      Note that it does require a change to afs_write_begin() to return the
      correct subpage.  This is a "temporary" change as we're working on
      getting rid of the need for ->write_begin() and ->write_end()
      completely, at least as far as network filesystems are concerned - but
      it doesn't prevent afs from making use of the capability.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: kafs-testing@auristor.com
      Cc: Marc Dionne <marc.dionne@auristor.com>
      Cc: linux-afs@lists.infradead.org
      Link: https://lore.kernel.org/lkml/2274528.1645833226@warthog.procyon.org.uk/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8549a263
  5. 13 Aug, 2022 4 commits
    • Merge tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f6eb0fed
      Linus Torvalds authored
      Pull timer fixes from Ingo Molnar:
       "Misc timer fixes:
      
         - fix a potential use-after-free bug in posix timers
      
         - correct a prototype
      
         - address a build warning"
      
      * tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        posix-cpu-timers: Cleanup CPU timers before freeing them during exec
        time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64
        posix-timers: Make do_clock_gettime() static
      f6eb0fed
    • Merge tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c5f1e32e
      Linus Torvalds authored
      Pull x86 fix from Ingo Molnar:
       "Fix the 'IBPB mitigated RETBleed' mode of operation on AMD CPUs (not
        turned on by default), which also need STIBP enabled (if available) to
        be '100% safe' on even the shortest speculation windows"
      
      * tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/bugs: Enable STIBP for IBPB mitigated RETBleed
      c5f1e32e
    • Merge tag 'i2c-for-5.20-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 04734361
      Linus Torvalds authored
      Pull more i2c updates from Wolfram Sang:
      
       - two driver fixes for issues introduced this cycle
      
       - one trivial driver improvement regarding ACPI
      
       - more DTS conversion and additions
      
       - documentation updates
      
       - subsystem-wide move from strlcpy to strscpy
      
      * tag 'i2c-for-5.20-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        docs: i2c: i2c-sysfs: fix hyperlinks
        docs: i2c: i2c-sysfs: improve wording
        docs: i2c: instantiating-devices: add syntax coloring to dts and C blocks
        docs: i2c: smbus-protocol: improve DataLow/DataHigh definition
        docs: i2c: i2c-protocol: remove unused legend items
        docs: i2c: i2c-protocol,smbus-protocol: remove nonsense words
        docs: i2c: i2c-protocol: update introductory paragraph
        i2c: move core from strlcpy to strscpy
        i2c: move drivers from strlcpy to strscpy
        i2c: kempld: Support ACPI I2C device declaration
        i2c: mediatek: add i2c compatible for MT8188
        dt-bindings: i2c: update bindings for mt8188 soc
        i2c: microchip-corei2c: fix erroneous late ack send
        dt-bindings: i2c: qcom,i2c-cci: convert to dtschema
        i2c: qcom-geni: Fix GPI DMA buffer sync-back
      04734361
    • Merge tag 'ntb-5.20' of https://github.com/jonmason/ntb · a976835f
      Linus Torvalds authored
      Pull NTB updates from Jon Mason:
       "Non-Transparent Bridge updates.
      
        Fix of heap data and clang warnings, support for a new Intel NTB
        device, and NTB EndPoint Function (EPF) support and the various fixes
        for that"
      
      * tag 'ntb-5.20' of https://github.com/jonmason/ntb:
        MAINTAINERS: add PCI Endpoint NTB drivers to NTB files
        NTB: EPF: Tidy up some bounds checks
        NTB: EPF: Fix error code in epf_ntb_bind()
        PCI: endpoint: pci-epf-vntb: reduce several globals to statics
        PCI: endpoint: pci-epf-vntb: fix error handle in epf_ntb_mw_bar_init()
        PCI: endpoint: Fix Kconfig dependency
        NTB: EPF: set pointer addr to null using NULL rather than 0
        Documentation: PCI: extend subheading underline for "lspci output" section
        Documentation: PCI: Use code-block block for scratchpad registers diagram
        Documentation: PCI: Add specification for the PCI vNTB function device
        PCI: endpoint: Support NTB transfer between RC and EP
        NTB: epf: Allow more flexibility in the memory BAR map method
        PCI: designware-ep: Allow pci_epc_set_bar() update inbound map address
        ntb: intel: add GNR support for Intel PCIe gen5 NTB
        NTB: ntb_tool: uninitialized heap data in tool_fn_write()
        ntb: idt: fix clang -Wformat warnings
      a976835f