1. 03 Jun, 2023 1 commit
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-fixes-6.4-2' of... · 26f31498
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.4, take #2
      
      - Address some fallout of the locking rework, this time affecting
        the way the vgic is configured
      
      - Fix an issue where the page table walker frees a subtree and
        then proceeds with walking what it has just freed...
      
      - Check that a given PA donated to the gues is actually memory
        (only affecting pKVM)
      
      - Correctly handle MTE CMOs by Set/Way
      26f31498
  2. 24 May, 2023 3 commits
  3. 21 May, 2023 3 commits
  4. 19 May, 2023 9 commits
    • Michal Luczaj's avatar
      KVM: Fix vcpu_array[0] races · afb2acb2
      Michal Luczaj authored
      In kvm_vm_ioctl_create_vcpu(), add vcpu to vcpu_array iff it's safe to
      access vcpu via kvm_get_vcpu() and kvm_for_each_vcpu(), i.e. when there's
      no failure path requiring vcpu removal and destruction. Such order is
      important because vcpu_array accessors may end up referencing vcpu at
      vcpu_array[0] even before online_vcpus is set to 1.
      
      When online_vcpus=0, any call to kvm_get_vcpu() goes through
      array_index_nospec() and ends with an attempt to xa_load(vcpu_array, 0):
      
      	int num_vcpus = atomic_read(&kvm->online_vcpus);
      	i = array_index_nospec(i, num_vcpus);
      	return xa_load(&kvm->vcpu_array, i);
      
      Similarly, when online_vcpus=0, a kvm_for_each_vcpu() does not iterate over
      an "empty" range, but actually [0, ULONG_MAX]:
      
      	xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, \
      			  (atomic_read(&kvm->online_vcpus) - 1))
      
      In both cases, such online_vcpus=0 edge case, even if leading to
      unnecessary calls to XArray API, should not be an issue; requesting
      unpopulated indexes/ranges is handled by xa_load() and xa_for_each_range().
      
      However, this means that when the first vCPU is created and inserted in
      vcpu_array *and* before online_vcpus is incremented, code calling
      kvm_get_vcpu()/kvm_for_each_vcpu() already has access to that first vCPU.
      
      This should not pose a problem assuming that once a vcpu is stored in
      vcpu_array, it will remain there, but that's not the case:
      kvm_vm_ioctl_create_vcpu() first inserts to vcpu_array, then requests a
      file descriptor. If create_vcpu_fd() fails, newly inserted vcpu is removed
      from the vcpu_array, then destroyed:
      
      	vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
      	r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
      	kvm_get_kvm(kvm);
      	r = create_vcpu_fd(vcpu);
      	if (r < 0) {
      		xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);
      		kvm_put_kvm_no_destroy(kvm);
      		goto unlock_vcpu_destroy;
      	}
      	atomic_inc(&kvm->online_vcpus);
      
      This results in a possible race condition when a reference to a vcpu is
      acquired (via kvm_get_vcpu() or kvm_for_each_vcpu()) moments before said
      vcpu is destroyed.
      Signed-off-by: default avatarMichal Luczaj <mhal@rbox.co>
      Message-Id: <20230510140410.1093987-2-mhal@rbox.co>
      Cc: stable@vger.kernel.org
      Fixes: c5b07754 ("KVM: Convert the kvm->vcpus array to a xarray", 2021-12-08)
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      afb2acb2
    • Jacob Xu's avatar
      KVM: VMX: Fix header file dependency of asm/vmx.h · 3367eeab
      Jacob Xu authored
      Include a definition of WARN_ON_ONCE() before using it.
      
      Fixes: bb1fcc70 ("KVM: nVMX: Allow L1 to use 5-level page walks for nested EPT")
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarJacob Xu <jacobhxu@google.com>
      [reworded commit message; changed <asm/bug.h> to <linux/bug.h>]
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220225012959.1554168-1-jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3367eeab
    • Sean Christopherson's avatar
      KVM: Don't enable hardware after a restart/shutdown is initiated · e0ceec22
      Sean Christopherson authored
      Reject hardware enabling, i.e. VM creation, if a restart/shutdown has
      been initiated to avoid re-enabling hardware between kvm_reboot() and
      machine_{halt,power_off,restart}().  The restart case is especially
      problematic (for x86) as enabling VMX (or clearing GIF in KVM_RUN on
      SVM) blocks INIT, which results in the restart/reboot hanging as BIOS
      is unable to wake and rendezvous with APs.
      
      Note, this bug, and the original issue that motivated the addition of
      kvm_reboot(), is effectively limited to a forced reboot, e.g. `reboot -f`.
      In a "normal" reboot, userspace will gracefully teardown userspace before
      triggering the kernel reboot (modulo bugs, errors, etc), i.e. any process
      that might do ioctl(KVM_CREATE_VM) is long gone.
      
      Fixes: 8e1c1815 ("KVM: VMX: Disable VMX when system shutdown")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Acked-by: default avatarMarc Zyngier <maz@kernel.org>
      Message-Id: <20230512233127.804012-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e0ceec22
    • Sean Christopherson's avatar
      KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown · 6735150b
      Sean Christopherson authored
      Use syscore_ops.shutdown to disable hardware virtualization during a
      reboot instead of using the dedicated reboot_notifier so that KVM disables
      virtualization _after_ system_state has been updated.  This will allow
      fixing a race in KVM's handling of a forced reboot where KVM can end up
      enabling hardware virtualization between kernel_restart_prepare() and
      machine_restart().
      
      Rename KVM's hook to match the syscore op to avoid any possible confusion
      from wiring up a "reboot" helper to a "shutdown" hook (neither "shutdown
      nor "reboot" is completely accurate as the hook handles both).
      
      Opportunistically rewrite kvm_shutdown()'s comment to make it less VMX
      specific, and to explain why kvm_rebooting exists.
      
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: James Morse <james.morse@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Zenghui Yu <yuzenghui@huawei.com>
      Cc: kvmarm@lists.linux.dev
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
      Cc: Anup Patel <anup@brainfault.org>
      Cc: Atish Patra <atishp@atishpatra.org>
      Cc: kvm-riscv@lists.infradead.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Acked-by: default avatarMarc Zyngier <maz@kernel.org>
      Message-Id: <20230512233127.804012-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6735150b
    • Will Deacon's avatar
      KVM: arm64: Prevent unconditional donation of unmapped regions from the host · 09cce60b
      Will Deacon authored
      Since host stage-2 mappings are created lazily, we cannot rely solely on
      the pte in order to recover the target physical address when checking a
      host-initiated memory transition as this permits donation of unmapped
      regions corresponding to MMIO or "no-map" memory.
      
      Instead of inspecting the pte, move the addr_is_allowed_memory() check
      into the host callback function where it is passed the physical address
      directly from the walker.
      
      Cc: Quentin Perret <qperret@google.com>
      Fixes: e82edcc7 ("KVM: arm64: Implement do_share() helper for sharing memory")
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20230518095844.1178-1-will@kernel.org
      09cce60b
    • Jean-Philippe Brucker's avatar
      KVM: arm64: vgic: Fix a comment · 62548732
      Jean-Philippe Brucker authored
      It is host userspace, not the guest, that issues
      KVM_DEV_ARM_VGIC_GRP_CTRL
      Signed-off-by: default avatarJean-Philippe Brucker <jean-philippe@linaro.org>
      Reviewed-by: default avatarOliver Upton <oliver.upton@linux.dev>
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20230518100914.2837292-5-jean-philippe@linaro.org
      62548732
    • Jean-Philippe Brucker's avatar
      KVM: arm64: vgic: Fix locking comment · c38b8400
      Jean-Philippe Brucker authored
      It is now config_lock that must be held, not kvm lock. Replace the
      comment with a lockdep annotation.
      
      Fixes: f0032773 ("KVM: arm64: Use config_lock to protect vgic state")
      Signed-off-by: default avatarJean-Philippe Brucker <jean-philippe@linaro.org>
      Reviewed-by: default avatarOliver Upton <oliver.upton@linux.dev>
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20230518100914.2837292-4-jean-philippe@linaro.org
      c38b8400
    • Jean-Philippe Brucker's avatar
      KVM: arm64: vgic: Wrap vgic_its_create() with config_lock · 9cf2f840
      Jean-Philippe Brucker authored
      vgic_its_create() changes the vgic state without holding the
      config_lock, which triggers a lockdep warning in vgic_v4_init():
      
      [  358.667941] WARNING: CPU: 3 PID: 178 at arch/arm64/kvm/vgic/vgic-v4.c:245 vgic_v4_init+0x15c/0x7a8
      ...
      [  358.707410]  vgic_v4_init+0x15c/0x7a8
      [  358.708550]  vgic_its_create+0x37c/0x4a4
      [  358.709640]  kvm_vm_ioctl+0x1518/0x2d80
      [  358.710688]  __arm64_sys_ioctl+0x7ac/0x1ba8
      [  358.711960]  invoke_syscall.constprop.0+0x70/0x1e0
      [  358.713245]  do_el0_svc+0xe4/0x2d4
      [  358.714289]  el0_svc+0x44/0x8c
      [  358.715329]  el0t_64_sync_handler+0xf4/0x120
      [  358.716615]  el0t_64_sync+0x190/0x194
      
      Wrap the whole of vgic_its_create() with config_lock since, in addition
      to calling vgic_v4_init(), it also modifies the global kvm->arch.vgic
      state.
      
      Fixes: f0032773 ("KVM: arm64: Use config_lock to protect vgic state")
      Signed-off-by: default avatarJean-Philippe Brucker <jean-philippe@linaro.org>
      Reviewed-by: default avatarOliver Upton <oliver.upton@linux.dev>
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20230518100914.2837292-3-jean-philippe@linaro.org
      9cf2f840
    • Jean-Philippe Brucker's avatar
      KVM: arm64: vgic: Fix a circular locking issue · 59112e9c
      Jean-Philippe Brucker authored
      Lockdep reports a circular lock dependency between the srcu and the
      config_lock:
      
      [  262.179917] -> #1 (&kvm->srcu){.+.+}-{0:0}:
      [  262.182010]        __synchronize_srcu+0xb0/0x224
      [  262.183422]        synchronize_srcu_expedited+0x24/0x34
      [  262.184554]        kvm_io_bus_register_dev+0x324/0x50c
      [  262.185650]        vgic_register_redist_iodev+0x254/0x398
      [  262.186740]        vgic_v3_set_redist_base+0x3b0/0x724
      [  262.188087]        kvm_vgic_addr+0x364/0x600
      [  262.189189]        vgic_set_common_attr+0x90/0x544
      [  262.190278]        vgic_v3_set_attr+0x74/0x9c
      [  262.191432]        kvm_device_ioctl+0x2a0/0x4e4
      [  262.192515]        __arm64_sys_ioctl+0x7ac/0x1ba8
      [  262.193612]        invoke_syscall.constprop.0+0x70/0x1e0
      [  262.195006]        do_el0_svc+0xe4/0x2d4
      [  262.195929]        el0_svc+0x44/0x8c
      [  262.196917]        el0t_64_sync_handler+0xf4/0x120
      [  262.198238]        el0t_64_sync+0x190/0x194
      [  262.199224]
      [  262.199224] -> #0 (&kvm->arch.config_lock){+.+.}-{3:3}:
      [  262.201094]        __lock_acquire+0x2b70/0x626c
      [  262.202245]        lock_acquire+0x454/0x778
      [  262.203132]        __mutex_lock+0x190/0x8b4
      [  262.204023]        mutex_lock_nested+0x24/0x30
      [  262.205100]        vgic_mmio_write_v3_misc+0x5c/0x2a0
      [  262.206178]        dispatch_mmio_write+0xd8/0x258
      [  262.207498]        __kvm_io_bus_write+0x1e0/0x350
      [  262.208582]        kvm_io_bus_write+0xe0/0x1cc
      [  262.209653]        io_mem_abort+0x2ac/0x6d8
      [  262.210569]        kvm_handle_guest_abort+0x9b8/0x1f88
      [  262.211937]        handle_exit+0xc4/0x39c
      [  262.212971]        kvm_arch_vcpu_ioctl_run+0x90c/0x1c04
      [  262.214154]        kvm_vcpu_ioctl+0x450/0x12f8
      [  262.215233]        __arm64_sys_ioctl+0x7ac/0x1ba8
      [  262.216402]        invoke_syscall.constprop.0+0x70/0x1e0
      [  262.217774]        do_el0_svc+0xe4/0x2d4
      [  262.218758]        el0_svc+0x44/0x8c
      [  262.219941]        el0t_64_sync_handler+0xf4/0x120
      [  262.221110]        el0t_64_sync+0x190/0x194
      
      Note that the current report, which can be triggered by the vgic_irq
      kselftest, is a triple chain that includes slots_lock, but after
      inverting the slots_lock/config_lock dependency, the actual problem
      reported above remains.
      
      In several places, the vgic code calls kvm_io_bus_register_dev(), which
      synchronizes the srcu, while holding config_lock (#1). And the MMIO
      handler takes the config_lock while holding the srcu read lock (#0).
      
      Break dependency #1, by registering the distributor and redistributors
      without holding config_lock. The ITS also uses kvm_io_bus_register_dev()
      but already relies on slots_lock to serialize calls.
      
      The distributor iodev is created on the first KVM_RUN call. Multiple
      threads will race for vgic initialization, and only the first one will
      see !vgic_ready() under the lock. To serialize those threads, rely on
      slots_lock rather than config_lock.
      
      Redistributors are created earlier, through KVM_DEV_ARM_VGIC_GRP_ADDR
      ioctls and vCPU creation. Similarly, serialize the iodev creation with
      slots_lock, and the rest with config_lock.
      
      Fixes: f0032773 ("KVM: arm64: Use config_lock to protect vgic state")
      Signed-off-by: default avatarJean-Philippe Brucker <jean-philippe@linaro.org>
      Reviewed-by: default avatarOliver Upton <oliver.upton@linux.dev>
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20230518100914.2837292-2-jean-philippe@linaro.org
      59112e9c
  5. 17 May, 2023 1 commit
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-fixes-6.4-1' of... · 39428f6e
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.4, take #1
      
      - Plug a race in the stage-2 mapping code where the IPA and the PA
        would end up being out of sync
      
      - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...)
      
      - FP/SVE/SME documentation update, in the hope that this field
        becomes clearer...
      
      - Add workaround for the usual Apple SEIS brokenness
      
      - Random comment fixes
      39428f6e
  6. 14 May, 2023 13 commits
  7. 13 May, 2023 10 commits