1. 02 Dec, 2021 2 commits
      workqueue: Fix unbind_workers() VS wq_worker_sleeping() race · 45c753f5
      Frederic Weisbecker authored
      At CPU-hotplug time, unbind_workers() may preempt a worker while it is
      going to sleep. In that case the following scenario can happen:
      
          unbind_workers()                    wq_worker_sleeping()
          ----------------                    --------------------
                                              if (worker->flags & WORKER_NOT_RUNNING)
                                                  return;
                                              //PREEMPTED by unbind_workers
          worker->flags |= WORKER_UNBOUND;
          [...]
          atomic_set(&pool->nr_running, 0);
          //resume to worker
                                              atomic_dec_and_test(&pool->nr_running);
      
      After unbind_workers() resets pool->nr_running, the value is expected
      to remain 0 until the pool is eventually rebound, should cpu_up() be
      called on the target CPU in the future. But here the race leaves
      pool->nr_running at -1, triggering the following warning when the
      worker goes idle:
      
              WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0
              Modules linked in:
              CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34
              Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
              Workqueue:  0x0 (rcu_par_gp)
              RIP: 0010:worker_enter_idle+0x95/0xc0
              Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93  0
              RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086
              RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000
              RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140
              RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080
              R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20
              R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140
              FS:  0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000
              CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
              CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0
              DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
              DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
              Call Trace:
                <TASK>
                worker_thread+0x89/0x3d0
                ? process_one_work+0x400/0x400
                kthread+0x162/0x190
                ? set_kthread_struct+0x40/0x40
                ret_from_fork+0x22/0x30
                </TASK>
      
      Also, due to this incorrect "nr_running == -1", all sorts of hazards
      can follow, starting with queued works being ignored because no
      workers are awakened at insert_work() time.
      
      Fix this by checking the worker flags again while pool->lock is held.
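      The fixed re-check can be modeled in ordinary userspace C. The sketch below replays the changelog's interleaving by hand (plain ints instead of atomics, no real lock or preemption, and hypothetical helper names), so it only illustrates why re-reading worker->flags after the reset avoids the spurious decrement; it is not the kernel code.

      ```c
      #include <stdio.h>
      
      #define WORKER_UNBOUND 0x80   /* stands in for the WORKER_NOT_RUNNING flags */
      
      struct pool   { int nr_running; };
      struct worker { int flags; };
      
      /* Buggy path: acts on a flag value sampled before the preemption window. */
      static void sleeping_buggy(struct pool *p, int stale_not_running)
      {
          if (stale_not_running)
              return;
          p->nr_running--;          /* decrements a counter that was already reset */
      }
      
      /* Fixed path: re-reads worker->flags, modeling the re-check performed
       * while pool->lock is held. */
      static void sleeping_fixed(struct worker *w, struct pool *p)
      {
          if (w->flags & WORKER_UNBOUND)
              return;
          p->nr_running--;
      }
      
      int main(void)
      {
          struct pool pool = { .nr_running = 0 };
          struct worker worker = { .flags = 0 };
      
          /* wq_worker_sleeping() samples the flags: the worker looks runnable. */
          int stale_not_running = worker.flags & WORKER_UNBOUND;   /* 0 */
      
          /* ...preempted; unbind_workers() runs to completion... */
          worker.flags |= WORKER_UNBOUND;
          pool.nr_running = 0;               /* atomic_set(&pool->nr_running, 0) */
      
          sleeping_buggy(&pool, stale_not_running);
          printf("buggy: nr_running = %d\n", pool.nr_running);     /* -1 */
      
          pool.nr_running = 0;
          sleeping_fixed(&worker, &pool);
          printf("fixed: nr_running = %d\n", pool.nr_running);     /* 0 */
          return 0;
      }
      ```

      The same model explains the companion wq_worker_running() race below, where the stale check leads to a spurious increment instead of a decrement.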
      
      Fixes: b945efcd ("sched: Remove pointless preemption disable in sched_submit_work()")
      Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Tested-by: Paul E. McKenney <paulmck@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      workqueue: Fix unbind_workers() VS wq_worker_running() race · 07edfece
      Frederic Weisbecker authored
      At CPU-hotplug time, unbind_workers() may preempt a worker while it is
      waking up. In that case the following scenario can happen:
      
          unbind_workers()                    wq_worker_running()
          ----------------                    -------------------
                                              if (!(worker->flags & WORKER_NOT_RUNNING))
                                              //PREEMPTED by unbind_workers
          worker->flags |= WORKER_UNBOUND;
          [...]
          atomic_set(&pool->nr_running, 0);
          //resume to worker
                                              atomic_inc(&worker->pool->nr_running);
      
      After unbind_workers() resets pool->nr_running, the value is expected
      to remain 0 until the pool is eventually rebound, should cpu_up() be
      called on the target CPU in the future. But here the race leaves
      pool->nr_running at 1, triggering the following warning when the
      worker goes idle:
      
      	WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0
      	Modules linked in:
      	CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34
      	Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
      	Workqueue:  0x0 (rcu_par_gp)
      	RIP: 0010:worker_enter_idle+0x95/0xc0
      	Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93  0
      	RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086
      	RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000
      	RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140
      	RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080
      	R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20
      	R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140
      	FS:  0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000
      	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      	CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0
      	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      	Call Trace:
      	  <TASK>
      	  worker_thread+0x89/0x3d0
      	  ? process_one_work+0x400/0x400
      	  kthread+0x162/0x190
      	  ? set_kthread_struct+0x40/0x40
      	  ret_from_fork+0x22/0x30
      	  </TASK>
      
      Also, due to this incorrect "nr_running == 1", further queued work may
      end up not being served, because no worker is awakened at
      work-insertion time. This can raise rcutorture writer stalls, for
      example.
      
      Fix this by disabling preemption in the right place in
      wq_worker_running().
      
      It's worth noting that if the worker migrates and runs concurrently with
      unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update
      due to set_cpus_allowed_ptr() acquiring/releasing rq->lock.
      
      Fixes: 6d25be57 ("sched/core, workqueues: Distangle worker accounting from rq lock")
      Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Tested-by: Paul E. McKenney <paulmck@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 01 Dec, 2021 1 commit
      workqueue: Upgrade queue_work_on() comment · 443378f0
      Paul E. McKenney authored
      The current queue_work_on() docbook comment says that the caller must
      ensure that the specified CPU can't go away, but does not spell out the
      consequences, which turn out to be quite mild.  Therefore expand this
      comment to explicitly say that the penalty for failing to nail down the
      specified CPU is that the workqueue handler might find itself executing
      on some other CPU.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  3. 30 Nov, 2021 22 commits
      MAINTAINERS: co-maintain random.c · 58e1100f
      Jason A. Donenfeld authored
      random.c is a bit understaffed, and folks want more prompt reviews. I've
      got the crypto background and the interest to do these reviews, and have
      authored parts of the file already.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · f080815f
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "ARM64:
      
         - Fix constant sign extension affecting TCR_EL2 and preventing
           running on ARMv8.7 models due to spurious bits being set
      
         - Fix use of helpers using PSTATE early on exit by always sampling it
           as soon as the exit takes place
      
         - Move pkvm's 32bit handling into a common helper
      
        RISC-V:
      
         - Fix incorrect KVM_MAX_VCPUS value
      
         - Unmap stage2 mapping when deleting/moving a memslot
      
        x86:
      
         - Fix and downgrade BUG_ON due to uninitialized cache
      
         - Many APICv and MOVE_ENC_CONTEXT_FROM fixes
      
         - Correctly emulate TLB flushes around nested vmentry/vmexit and when
           the nested hypervisor uses VPID
      
         - Prevent modifications to CPUID after the VM has run
      
         - Other smaller bugfixes
      
        Generic:
      
         - Memslot handling bugfixes"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
        KVM: fix avic_set_running for preemptable kernels
        KVM: VMX: clear vmx_x86_ops.sync_pir_to_irr if APICv is disabled
        KVM: SEV: accept signals in sev_lock_two_vms
        KVM: SEV: do not take kvm->lock when destroying
        KVM: SEV: Prohibit migration of a VM that has mirrors
        KVM: SEV: Do COPY_ENC_CONTEXT_FROM with both VMs locked
        selftests: sev_migrate_tests: add tests for KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
        KVM: SEV: move mirror status to destination of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
        KVM: SEV: initialize regions_list of a mirror VM
        KVM: SEV: cleanup locking for KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
        KVM: SEV: do not use list_replace_init on an empty list
        KVM: x86: Use a stable condition around all VT-d PI paths
        KVM: x86: check PIR even for vCPUs with disabled APICv
        KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled
        KVM: selftests: page_table_test: fix calculation of guest_test_phys_mem
        KVM: x86/mmu: Handle "default" period when selectively waking kthread
        KVM: MMU: shadow nested paging does not have PKU
        KVM: x86/mmu: Remove spurious TLB flushes in TDP MMU zap collapsible path
        KVM: x86/mmu: Use yield-safe TDP MMU root iter in MMU notifier unmapping
        KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg()
        ...
      tools: Fix math.h breakage · d6e6a27d
      Matthew Wilcox (Oracle) authored
      Commit 98e1385e ("include/linux/radix-tree.h: replace kernel.h with
      the necessary inclusions") broke the radix tree test suite in two
      different ways; first by including math.h which didn't exist in the
      tools directory, and second by removing an implicit include of
      spinlock.h before lockdep.h.  Fix both issues.
      
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      KVM: fix avic_set_running for preemptable kernels · 7cfc5c65
      Paolo Bonzini authored
      avic_set_running() passes the current CPU to avic_vcpu_load(), albeit
      via vcpu->cpu rather than smp_processor_id().  If the thread is migrated
      while avic_set_running runs, the call to avic_vcpu_load() can use a stale
      value for the processor id.  Avoid this by blocking preemption over the
      entire execution of avic_set_running().
      Reported-by: Sean Christopherson <seanjc@google.com>
      Fixes: 8221c137 ("svm: Manage vcpu load/unload when enable AVIC")
      Cc: stable@vger.kernel.org
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: VMX: clear vmx_x86_ops.sync_pir_to_irr if APICv is disabled · e90e51d5
      Paolo Bonzini authored
      There is nothing to synchronize if APICv is disabled, since neither
      other vCPUs nor assigned devices can set PIR.ON.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: SEV: accept signals in sev_lock_two_vms · c9d61dcb
      Paolo Bonzini authored
      Generally, kvm->lock is not taken for a long time, but
      sev_lock_two_vms is different: it takes vCPU locks
      inside, so userspace can hold it back just by calling
      a vCPU ioctl.  Play it safe and use mutex_lock_killable.
      
      Message-Id: <20211123005036.2954379-13-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: SEV: do not take kvm->lock when destroying · 10a37929
      Paolo Bonzini authored
      Taking the lock is useless since there are no other references,
      and there are already accesses (e.g. to sev->enc_context_owner)
      that do not take it.  So get rid of it.
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123005036.2954379-12-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: SEV: Prohibit migration of a VM that has mirrors · 17d44a96
      Paolo Bonzini authored
      VMs that mirror an encryption context rely on the owner to keep the
      ASID allocated.  Performing a KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
      would cause a dangling ASID:
      
      1. copy context from A to B (gets ref to A)
      2. move context from A to L (moves ASID from A to L)
      3. close L (releases ASID from L, B still references it)
      
      The right way to do the handoff instead is to create a fresh mirror VM
      on the destination first:
      
      1. copy context from A to B (gets ref to A)
      [later] 2. close B (releases ref to A)
      3. move context from A to L (moves ASID from A to L)
      4. copy context from L to M
      
      So, catch the situation by adding a count of how many VMs are
      mirroring this one's encryption context.
      
      Fixes: 0b020f5a ("KVM: SEV: Add support for SEV-ES intra host migration")
      Message-Id: <20211123005036.2954379-11-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: SEV: Do COPY_ENC_CONTEXT_FROM with both VMs locked · bf42b02b
      Paolo Bonzini authored
      Now that we have a facility to lock two VMs with deadlock
      protection, use it for the creation of mirror VMs as well.  One of
      COPY_ENC_CONTEXT_FROM(dst, src) and COPY_ENC_CONTEXT_FROM(src, dst)
      would always fail, so the combination is nonsensical and it is okay to
      return -EBUSY if it is attempted.
      
      This sidesteps the question of what happens if a VM is
      MOVE_ENC_CONTEXT_FROM'd at the same time as it is
      COPY_ENC_CONTEXT_FROM'd: the locking prevents that from
      happening.
      
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123005036.2954379-10-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      selftests: sev_migrate_tests: add tests for KVM_CAP_VM_COPY_ENC_CONTEXT_FROM · dc79c9f4
      Paolo Bonzini authored
      I am putting the tests in sev_migrate_tests because the failure conditions are
      very similar and some of the setup code can be reused, too.
      
      The tests cover both successful creation of a mirror VM, and error
      conditions.
      
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123005036.2954379-9-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: SEV: move mirror status to destination of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM · 642525e3
      Paolo Bonzini authored
      Allow intra-host migration of a mirror VM; the destination VM will be
      a mirror of the same ASID as the source.
      
      Fixes: b5663931 ("KVM: SEV: Add support for SEV intra host migration")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123005036.2954379-8-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: SEV: initialize regions_list of a mirror VM · 2b347a38
      Paolo Bonzini authored
      This was broken before the introduction of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM,
      but technically harmless because the region list was unused for a mirror
      VM.  However, it is untidy and it now causes a NULL pointer access when
      attempting to move the encryption context of a mirror VM.
      
      Fixes: 54526d1f ("KVM: x86: Support KVM VMs sharing SEV context")
      Message-Id: <20211123005036.2954379-7-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: SEV: cleanup locking for KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM · 501b580c
      Paolo Bonzini authored
      Encapsulate the handling of the migration_in_progress flag for both VMs in
      two functions sev_lock_two_vms and sev_unlock_two_vms.  It does not matter
      if KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM locks the destination struct kvm a bit
      later, and this change 1) keeps the cleanup chain of labels smaller and
      2) makes it possible for KVM_CAP_VM_COPY_ENC_CONTEXT_FROM to reuse the
      logic.
      
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123005036.2954379-6-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: SEV: do not use list_replace_init on an empty list · 4674164f
      Paolo Bonzini authored
      list_replace_init cannot be used if the source is an empty list,
      because "new->next->prev = new" will overwrite "old->next":
      
      				new				old
      				prev = new, next = new		prev = old, next = old
      new->next = old->next		prev = new, next = old		prev = old, next = old
      new->next->prev = new		prev = new, next = old		prev = old, next = new
      new->prev = old->prev		prev = old, next = old		prev = old, next = old
      new->next->prev = new		prev = old, next = old		prev = new, next = new
      
      The desired outcome instead would be to leave both old and new the same
      as they were (two empty circular lists).  Use list_cut_before, which
      already has the necessary check and is documented to discard the
      previous contents of the list that will hold the result.
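      As a userspace illustration of the list_cut_before() behavior relied on here, the sketch below re-implements the list helpers (modeled on the kernel's include/linux/list.h rather than including it; helper names and layout here are this sketch's, not the kernel's): cutting before the head moves the whole source list, and the head->next == entry check makes the empty-source case degrade to simply reinitializing the destination, leaving both heads as valid empty circular lists.

      ```c
      #include <assert.h>
      #include <stdio.h>
      
      struct list_head { struct list_head *prev, *next; };
      
      static void INIT_LIST_HEAD(struct list_head *h) { h->prev = h; h->next = h; }
      
      static int list_empty(const struct list_head *h) { return h->next == h; }
      
      static void list_add_tail(struct list_head *n, struct list_head *h)
      {
          n->prev = h->prev;
          n->next = h;
          h->prev->next = n;
          h->prev = n;
      }
      
      /* Move everything in [head->next, entry) onto list, discarding list's
       * previous contents.  When entry == head (cut the whole list, or the
       * source is empty), list is simply reinitialized -- the property the
       * fix relies on. */
      static void list_cut_before(struct list_head *list, struct list_head *head,
                                  struct list_head *entry)
      {
          if (head->next == entry) {
              INIT_LIST_HEAD(list);
              return;
          }
          list->next = head->next;
          list->next->prev = list;
          list->prev = entry->prev;
          list->prev->next = list;
          head->next = entry;
          entry->prev = head;
      }
      
      int main(void)
      {
          struct list_head src, dst, a, b;
      
          /* Empty source: both lists end up as valid empty circular lists. */
          INIT_LIST_HEAD(&src);
          INIT_LIST_HEAD(&dst);
          list_cut_before(&dst, &src, &src);
          assert(list_empty(&src) && list_empty(&dst));
      
          /* Non-empty source: cutting before the head moves every element. */
          INIT_LIST_HEAD(&src);
          list_add_tail(&a, &src);
          list_add_tail(&b, &src);
          list_cut_before(&dst, &src, &src);
          assert(list_empty(&src));
          assert(dst.next == &a && a.next == &b && b.next == &dst);
          assert(dst.prev == &b && b.prev == &a && a.prev == &dst);
      
          printf("ok\n");
          return 0;
      }
      ```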
      
      Fixes: b5663931 ("KVM: SEV: Add support for SEV intra host migration")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123005036.2954379-5-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: x86: Use a stable condition around all VT-d PI paths · 53b7ca1a
      Paolo Bonzini authored
      Currently, checks for whether VT-d PI can be used refer to the current
      status of the feature in the current vCPU; or they more or less pick
      vCPU 0 in case a specific vCPU is not available.
      
      However, these checks do not attempt to synchronize with changes to
      the IRTE.  In particular, there is no path that updates the IRTE when
      APICv is re-activated on vCPU 0; and there is no path to wake up a CPU
      that has APICv disabled, if the wakeup occurs because of an IRTE
      that points to a posted interrupt.
      
      To fix this, always go through the VT-d PI path as long as there are
      assigned devices and APICv is available on both the host and the VM side.
      Since the relevant condition was copied over three times, take the hint
      and factor it into a separate function.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Message-Id: <20211123004311.2954158-5-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: x86: check PIR even for vCPUs with disabled APICv · 37c4dbf3
      Paolo Bonzini authored
      The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even
      if APICv is disabled on the vCPU that receives it.  In that case, the
      interrupt will just cause a vmexit and leave the ON bit set together
      with the PIR bit corresponding to the interrupt.
      
      Right now, the interrupt would not be delivered until APICv is re-enabled.
      However, fixing this is just a matter of always doing the PIR->IRR
      synchronization, even if the vCPU has temporarily disabled APICv.
      
      This is not a problem for performance, or if anything it is an
      improvement.  First, in the common case where vcpu->arch.apicv_active is
      true, one fewer check has to be performed.  Second, static_call_cond will
      elide the function call if APICv is not present or disabled.  Finally,
      in the case for AMD hardware we can remove the sync_pir_to_irr callback:
      it is only needed for apic_has_interrupt_for_ppr, and that function
      already has a fallback for !APICv.
      
      Cc: stable@vger.kernel.org
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Message-Id: <20211123004311.2954158-4-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled · 7e1901f6
      Paolo Bonzini authored
      If APICv is disabled for this vCPU, assigned devices may still attempt to
      post interrupts.  In that case, we need to cancel the vmentry and deliver
      the interrupt with KVM_REQ_EVENT.  Extend the existing code that handles
      injection of L1 interrupts into L2 to cover this case as well.
      
      vmx_hwapic_irr_update is only called when APICv is active so it would be
      confusing to add a check for vcpu->arch.apicv_active in there.  Instead,
      just use vmx_set_rvi directly in vmx_sync_pir_to_irr.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211123004311.2954158-3-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: selftests: page_table_test: fix calculation of guest_test_phys_mem · 81835ee1
      Maciej S. Szmigiero authored
      A kvm_page_table_test run with its default settings fails on VMX due to
      memory region add failure:
      > ==== Test Assertion Failure ====
      >  lib/kvm_util.c:952: ret == 0
      >  pid=10538 tid=10538 errno=17 - File exists
      >     1  0x00000000004057d1: vm_userspace_mem_region_add at kvm_util.c:947
      >     2  0x0000000000401ee9: pre_init_before_test at kvm_page_table_test.c:302
      >     3   (inlined by) run_test at kvm_page_table_test.c:374
      >     4  0x0000000000409754: for_each_guest_mode at guest_modes.c:53
      >     5  0x0000000000401860: main at kvm_page_table_test.c:500
      >     6  0x00007f82ae2d8554: ?? ??:0
      >     7  0x0000000000401894: _start at ??:?
      >  KVM_SET_USER_MEMORY_REGION IOCTL failed,
      >  rc: -1 errno: 17
      >  slot: 1 flags: 0x0
      >  guest_phys_addr: 0xc0000000 size: 0x40000000
      
      This is because the memory range that this test is trying to add
      (0x0c0000000 - 0x100000000) conflicts with LAPIC mapping at 0x0fee00000.
      
      Looking at the code, it seems that the guest_test_*phys*_mem variable
      gets mistakenly overwritten with guest_test_*virt*_mem while trying to
      adjust the former for alignment.
      With the correct variable adjusted, this test runs successfully.
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <52e487458c3172923549bbcf9dfccfbe6faea60b.1637940473.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: x86/mmu: Handle "default" period when selectively waking kthread · f47491d7
      Sean Christopherson authored
      Account for the '0' being a default, "let KVM choose" period, when
      determining whether or not the recovery worker needs to be awakened in
      response to userspace reducing the period.  Failure to do so results in
      the worker not being awakened properly, e.g. when changing the period
      from '0' to any small-ish value.
      
      Fixes: 4dfe4f40 ("kvm: x86: mmu: Make NX huge page recovery period configurable")
      Cc: stable@vger.kernel.org
      Cc: Junaid Shahid <junaids@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211120015706.3830341-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: MMU: shadow nested paging does not have PKU · 28f091bc
      Paolo Bonzini authored
      Initialize the mask for PKU permissions as if CR4.PKE=0, avoiding
      incorrect interpretations of the nested hypervisor's page tables.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: x86/mmu: Remove spurious TLB flushes in TDP MMU zap collapsible path · 4b85c921
      Sean Christopherson authored
      Drop the "flush" param and return values to/from the TDP MMU's helper for
      zapping collapsible SPTEs.  Because the helper runs with mmu_lock held
      for read, not write, it uses tdp_mmu_zap_spte_atomic(), and the atomic
      zap handles the necessary remote TLB flush.
      
      Similarly, because mmu_lock is dropped and re-acquired between zapping
      legacy MMUs and zapping TDP MMUs, kvm_mmu_zap_collapsible_sptes() must
      handle remote TLB flushes from the legacy MMU before calling into the TDP
      MMU.
      
      Fixes: e2209710 ("KVM: x86/mmu: Skip rmap operations if rmaps not allocated")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211120045046.3940942-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: x86/mmu: Use yield-safe TDP MMU root iter in MMU notifier unmapping · 75333772
      Sean Christopherson authored
      Use the yield-safe variant of the TDP MMU iterator when handling an
      unmapping event from the MMU notifier, as most occurrences of the event
      allow yielding.
      
      Fixes: e1eed584 ("KVM: x86/mmu: Allow yielding during MMU notifier unmap/zap, if possible")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211120015008.3780032-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 29 Nov, 2021 1 commit
  5. 28 Nov, 2021 8 commits
  6. 27 Nov, 2021 6 commits