- 02 Dec, 2021 2 commits
-
-
Frederic Weisbecker authored
At CPU-hotplug time, unbind_workers() may preempt a worker while it is going to sleep. In that case the following scenario can happen: unbind_workers() wq_worker_sleeping() -------------- ------------------- if (worker->flags & WORKER_NOT_RUNNING) return; //PREEMPTED by unbind_workers worker->flags |= WORKER_UNBOUND; [...] atomic_set(&pool->nr_running, 0); //resume to worker atomic_dec_and_test(&pool->nr_running); After unbind_worker() resets pool->nr_running, the value is expected to remain 0 until the pool ever gets rebound in case cpu_up() is called on the target CPU in the future. But here the race leaves pool->nr_running with a value of -1, triggering the following warning when the worker goes idle: WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0 Modules linked in: CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Workqueue: 0x0 (rcu_par_gp) RIP: 0010:worker_enter_idle+0x95/0xc0 Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93 0 RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086 RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140 RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080 R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20 R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140 FS: 0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> worker_thread+0x89/0x3d0 ? process_one_work+0x400/0x400 kthread+0x162/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Also due to this incorrect "nr_running == -1", all sorts of hazards can happen, starting with queued works being ignored because no workers are awaken at insert_work() time. Fix this with checking again the worker flags while pool->lock is locked. Fixes: b945efcd ("sched: Remove pointless preemption disable in sched_submit_work()") Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
Frederic Weisbecker authored
At CPU-hotplug time, unbind_worker() may preempt a worker while it is waking up. In that case the following scenario can happen: unbind_workers() wq_worker_running() -------------- ------------------- if (!(worker->flags & WORKER_NOT_RUNNING)) //PREEMPTED by unbind_workers worker->flags |= WORKER_UNBOUND; [...] atomic_set(&pool->nr_running, 0); //resume to worker atomic_inc(&worker->pool->nr_running); After unbind_worker() resets pool->nr_running, the value is expected to remain 0 until the pool ever gets rebound in case cpu_up() is called on the target CPU in the future. But here the race leaves pool->nr_running with a value of 1, triggering the following warning when the worker goes idle: WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0 Modules linked in: CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Workqueue: 0x0 (rcu_par_gp) RIP: 0010:worker_enter_idle+0x95/0xc0 Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93 0 RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086 RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140 RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080 R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20 R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140 FS: 0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> worker_thread+0x89/0x3d0 ? process_one_work+0x400/0x400 kthread+0x162/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Also due to this incorrect "nr_running == 1", further queued work may end up not being served, because no worker is awaken at work insert time. This raises rcutorture writer stalls for example. Fix this with disabling preemption in the right place in wq_worker_running(). It's worth noting that if the worker migrates and runs concurrently with unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update due to set_cpus_allowed_ptr() acquiring/releasing rq->lock. Fixes: 6d25be57 ("sched/core, workqueues: Distangle worker accounting from rq lock") Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 01 Dec, 2021 1 commit
-
-
Paul E. McKenney authored
The current queue_work_on() docbook comment says that the caller must ensure that the specified CPU can't go away, but does not spell out the consequences, which turn out to be quite mild. Therefore expand this comment to explicitly say that the penalty for failing to nail down the specified CPU is that the workqueue handler might find itself executing on some other CPU. Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 30 Nov, 2021 22 commits
-
-
Jason A. Donenfeld authored
random.c is a bit understaffed, and folks want more prompt reviews. I've got the crypto background and the interest to do these reviews, and have authored parts of the file already. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm fixes from Paolo Bonzini: "ARM64: - Fix constant sign extension affecting TCR_EL2 and preventing running on ARMv8.7 models due to spurious bits being set - Fix use of helpers using PSTATE early on exit by always sampling it as soon as the exit takes place - Move pkvm's 32bit handling into a common helper RISC-V: - Fix incorrect KVM_MAX_VCPUS value - Unmap stage2 mapping when deleting/moving a memslot x86: - Fix and downgrade BUG_ON due to uninitialized cache - Many APICv and MOVE_ENC_CONTEXT_FROM fixes - Correctly emulate TLB flushes around nested vmentry/vmexit and when the nested hypervisor uses VPID - Prevent modifications to CPUID after the VM has run - Other smaller bugfixes Generic: - Memslot handling bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits) KVM: fix avic_set_running for preemptable kernels KVM: VMX: clear vmx_x86_ops.sync_pir_to_irr if APICv is disabled KVM: SEV: accept signals in sev_lock_two_vms KVM: SEV: do not take kvm->lock when destroying KVM: SEV: Prohibit migration of a VM that has mirrors KVM: SEV: Do COPY_ENC_CONTEXT_FROM with both VMs locked selftests: sev_migrate_tests: add tests for KVM_CAP_VM_COPY_ENC_CONTEXT_FROM KVM: SEV: move mirror status to destination of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM KVM: SEV: initialize regions_list of a mirror VM KVM: SEV: cleanup locking for KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM KVM: SEV: do not use list_replace_init on an empty list KVM: x86: Use a stable condition around all VT-d PI paths KVM: x86: check PIR even for vCPUs with disabled APICv KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled KVM: selftests: page_table_test: fix calculation of guest_test_phys_mem KVM: x86/mmu: Handle "default" period when selectively waking kthread KVM: MMU: shadow nested paging does not have PKU KVM: x86/mmu: Remove spurious TLB flushes in TDP MMU zap collapsible path KVM: x86/mmu: Use yield-safe TDP MMU root iter in MMU notifier unmapping KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg() ...
-
Matthew Wilcox (Oracle) authored
Commit 98e1385e ("include/linux/radix-tree.h: replace kernel.h with the necessary inclusions") broke the radix tree test suite in two different ways; first by including math.h which didn't exist in the tools directory, and second by removing an implicit include of spinlock.h before lockdep.h. Fix both issues. Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Paolo Bonzini authored
avic_set_running() passes the current CPU to avic_vcpu_load(), albeit via vcpu->cpu rather than smp_processor_id(). If the thread is migrated while avic_set_running runs, the call to avic_vcpu_load() can use a stale value for the processor id. Avoid this by blocking preemption over the entire execution of avic_set_running(). Reported-by: Sean Christopherson <seanjc@google.com> Fixes: 8221c137 ("svm: Manage vcpu load/unload when enable AVIC") Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
There is nothing to synchronize if APICv is disabled, since neither other vCPUs nor assigned devices can set PIR.ON. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Generally, kvm->lock is not taken for a long time, but sev_lock_two_vms is different: it takes vCPU locks inside, so userspace can hold it back just by calling a vCPU ioctl. Play it safe and use mutex_lock_killable. Message-Id: <20211123005036.2954379-13-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Taking the lock is useless since there are no other references, and there are already accesses (e.g. to sev->enc_context_owner) that do not take it. So get rid of it. Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211123005036.2954379-12-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
VMs that mirror an encryption context rely on the owner to keep the ASID allocated. Performing a KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM would cause a dangling ASID: 1. copy context from A to B (gets ref to A) 2. move context from A to L (moves ASID from A to L) 3. close L (releases ASID from L, B still references it) The right way to do the handoff instead is to create a fresh mirror VM on the destination first: 1. copy context from A to B (gets ref to A) [later] 2. close B (releases ref to A) 3. move context from A to L (moves ASID from A to L) 4. copy context from L to M So, catch the situation by adding a count of how many VMs are mirroring this one's encryption context. Fixes: 0b020f5a ("KVM: SEV: Add support for SEV-ES intra host migration") Message-Id: <20211123005036.2954379-11-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Now that we have a facility to lock two VMs with deadlock protection, use it for the creation of mirror VMs as well. One of COPY_ENC_CONTEXT_FROM(dst, src) and COPY_ENC_CONTEXT_FROM(src, dst) would always fail, so the combination is nonsensical and it is okay to return -EBUSY if it is attempted. This sidesteps the question of what happens if a VM is MOVE_ENC_CONTEXT_FROM'd at the same time as it is COPY_ENC_CONTEXT_FROM'd: the locking prevents that from happening. Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211123005036.2954379-10-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
I am putting the tests in sev_migrate_tests because the failure conditions are very similar and some of the setup code can be reused, too. The tests cover both successful creation of a mirror VM, and error conditions. Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Message-Id: <20211123005036.2954379-9-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Allow intra-host migration of a mirror VM; the destination VM will be a mirror of the same ASID as the source. Fixes: b5663931 ("KVM: SEV: Add support for SEV intra host migration") Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211123005036.2954379-8-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
This was broken before the introduction of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM, but technically harmless because the region list was unused for a mirror VM. However, it is untidy and it now causes a NULL pointer access when attempting to move the encryption context of a mirror VM. Fixes: 54526d1f ("KVM: x86: Support KVM VMs sharing SEV context") Message-Id: <20211123005036.2954379-7-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Encapsulate the handling of the migration_in_progress flag for both VMs in two functions sev_lock_two_vms and sev_unlock_two_vms. It does not matter if KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM locks the destination struct kvm a bit later, and this change 1) keeps the cleanup chain of labels smaller 2) makes it possible for KVM_CAP_VM_COPY_ENC_CONTEXT_FROM to reuse the logic. Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Message-Id: <20211123005036.2954379-6-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
list_replace_init cannot be used if the source is an empty list, because "new->next->prev = new" will overwrite "old->next": new old prev = new, next = new prev = old, next = old new->next = old->next prev = new, next = old prev = old, next = old new->next->prev = new prev = new, next = old prev = old, next = new new->prev = old->prev prev = old, next = old prev = old, next = old new->next->prev = new prev = old, next = old prev = new, next = new The desired outcome instead would be to leave both old and new the same as they were (two empty circular lists). Use list_cut_before, which already has the necessary check and is documented to discard the previous contents of the list that will hold the result. Fixes: b5663931 ("KVM: SEV: Add support for SEV intra host migration") Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211123005036.2954379-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Currently, checks for whether VT-d PI can be used refer to the current status of the feature in the current vCPU; or they more or less pick vCPU 0 in case a specific vCPU is not available. However, these checks do not attempt to synchronize with changes to the IRTE. In particular, there is no path that updates the IRTE when APICv is re-activated on vCPU 0; and there is no path to wakeup a CPU that has APICv disabled, if the wakeup occurs because of an IRTE that points to a posted interrupt. To fix this, always go through the VT-d PI path as long as there are assigned devices and APICv is available on both the host and the VM side. Since the relevant condition was copied over three times, take the hint and factor it into a separate function. Suggested-by: Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: David Matlack <dmatlack@google.com> Message-Id: <20211123004311.2954158-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even if APICv is disabled on the vCPU that receives it. In that case, the interrupt will just cause a vmexit and leave the ON bit set together with the PIR bit corresponding to the interrupt. Right now, the interrupt would not be delivered until APICv is re-enabled. However, fixing this is just a matter of always doing the PIR->IRR synchronization, even if the vCPU has temporarily disabled APICv. This is not a problem for performance, or if anything it is an improvement. First, in the common case where vcpu->arch.apicv_active is true, one fewer check has to be performed. Second, static_call_cond will elide the function call if APICv is not present or disabled. Finally, in the case for AMD hardware we can remove the sync_pir_to_irr callback: it is only needed for apic_has_interrupt_for_ppr, and that function already has a fallback for !APICv. Cc: stable@vger.kernel.org Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: David Matlack <dmatlack@google.com> Message-Id: <20211123004311.2954158-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
If APICv is disabled for this vCPU, assigned devices may still attempt to post interrupts. In that case, we need to cancel the vmentry and deliver the interrupt with KVM_REQ_EVENT. Extend the existing code that handles injection of L1 interrupts into L2 to cover this case as well. vmx_hwapic_irr_update is only called when APICv is active so it would be confusing to add a check for vcpu->arch.apicv_active in there. Instead, just use vmx_set_rvi directly in vmx_sync_pir_to_irr. Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211123004311.2954158-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maciej S. Szmigiero authored
A kvm_page_table_test run with its default settings fails on VMX due to memory region add failure: > ==== Test Assertion Failure ==== > lib/kvm_util.c:952: ret == 0 > pid=10538 tid=10538 errno=17 - File exists > 1 0x00000000004057d1: vm_userspace_mem_region_add at kvm_util.c:947 > 2 0x0000000000401ee9: pre_init_before_test at kvm_page_table_test.c:302 > 3 (inlined by) run_test at kvm_page_table_test.c:374 > 4 0x0000000000409754: for_each_guest_mode at guest_modes.c:53 > 5 0x0000000000401860: main at kvm_page_table_test.c:500 > 6 0x00007f82ae2d8554: ?? ??:0 > 7 0x0000000000401894: _start at ??:? > KVM_SET_USER_MEMORY_REGION IOCTL failed, > rc: -1 errno: 17 > slot: 1 flags: 0x0 > guest_phys_addr: 0xc0000000 size: 0x40000000 This is because the memory range that this test is trying to add (0x0c0000000 - 0x100000000) conflicts with LAPIC mapping at 0x0fee00000. Looking at the code it seems that guest_test_*phys*_mem variable gets mistakenly overwritten with guest_test_*virt*_mem while trying to adjust the former for alignment. With the correct variable adjusted this test runs successfully. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <52e487458c3172923549bbcf9dfccfbe6faea60b.1637940473.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Account for the '0' being a default, "let KVM choose" period, when determining whether or not the recovery worker needs to be awakened in response to userspace reducing the period. Failure to do so results in the worker not being awakened properly, e.g. when changing the period from '0' to any small-ish value. Fixes: 4dfe4f40 ("kvm: x86: mmu: Make NX huge page recovery period configurable") Cc: stable@vger.kernel.org Cc: Junaid Shahid <junaids@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211120015706.3830341-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Initialize the mask for PKU permissions as if CR4.PKE=0, avoiding incorrect interpretations of the nested hypervisor's page tables. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Drop the "flush" param and return values to/from the TDP MMU's helper for zapping collapsible SPTEs. Because the helper runs with mmu_lock held for read, not write, it uses tdp_mmu_zap_spte_atomic(), and the atomic zap handles the necessary remote TLB flush. Similarly, because mmu_lock is dropped and re-acquired between zapping legacy MMUs and zapping TDP MMUs, kvm_mmu_zap_collapsible_sptes() must handle remote TLB flushes from the legacy MMU before calling into the TDP MMU. Fixes: e2209710 ("KVM: x86/mmu: Skip rmap operations if rmaps not allocated") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211120045046.3940942-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use the yield-safe variant of the TDP MMU iterator when handling an unmapping event from the MMU notifier, as most occurences of the event allow yielding. Fixes: e1eed584 ("KVM: x86/mmu: Allow yielding during MMU notifier unmap/zap, if possible") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211120015008.3780032-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 29 Nov, 2021 1 commit
-
-
David Howells authored
Adjust the netfslib docs in light of the foliation changes. Also un-kdoc-mark netfs_skip_folio_read() since it's internal and isn't part of the API. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-cachefs@redhat.com cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/163706992597.3179783.18360472879717076435.stgit@warthog.procyon.org.uk/Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 28 Nov, 2021 8 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds authored
Pull vhost,virtio,vdpa bugfixes from Michael Tsirkin: "Misc fixes all over the place. Revert of virtio used length validation series: the approach taken does not seem to work, breaking too many guests in the process. We'll need to do length validation using some other approach" [ This merge also ends up reverting commit f7a36b03 ("vsock/virtio: suppress used length validation"), which came in through the networking tree in the meantime, and was part of that whole used length validation series - Linus ] * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa_sim: avoid putting an uninitialized iova_domain vhost-vdpa: clean irqs before reseting vdpa device virtio-blk: modify the value type of num in virtio_queue_rq() vhost/vsock: cleanup removing `len` variable vhost/vsock: fix incorrect used length reported to the guest Revert "virtio_ring: validate used buffer length" Revert "virtio-net: don't let virtio core to validate used length" Revert "virtio-blk: don't let virtio core to validate used length" Revert "virtio-scsi: don't let virtio core to validate used buffer length"
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 build fix from Thomas Gleixner: "A single fix for a missing __init annotation of prepare_command_line()" * tag 'x86-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Mark prepare_command_line() __init
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fix from Thomas Gleixner: "A single scheduler fix to ensure that there is no stale KASAN shadow state left on the idle task's stack when a CPU is brought up after it was brought down before" * tag 'sched-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/scs: Reset task stack state in bringup_cpu()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fix from Thomas Gleixner: "A single fix for perf to prevent it from sending SIGTRAP to another task from a trace point event as it's not possible to deliver a synchronous signal to a different task from there" * tag 'perf-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Ignore sigtrap for tracepoints destined for other tasks
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking fixes from Thomas Gleixner: "Two regression fixes for reader writer semaphores: - Plug a race in the lock handoff which is caused by inconsistency of the reader and writer path and can lead to corruption of the underlying counter. - down_read_trylock() is suboptimal when the lock is contended and multiple readers trylock concurrently. That's due to the initial value being read non-atomically which results in at least two compare exchange loops. Making the initial readout atomic reduces this significantly. Whith 40 readers by 11% in a benchmark which enforces contention on mmap_sem" * tag 'locking-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Optimize down_read_trylock() under highly contended case locking/rwsem: Make handoff bit handling more consistent
-
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds authored
Pull another tracing fix from Steven Rostedt: "Fix the fix of pid filtering The setting of the pid filtering flag tested the "trace only this pid" case twice, and ignored the "trace everything but this pid" case. The 5.15 kernel does things a little differently due to the new sparse pid mask introduced in 5.16, and as the bug was discovered running the 5.15 kernel, and the first fix was initially done for that kernel, that fix handled both cases (only pid and all but pid), but the forward port to 5.16 created this bug" * tag 'trace-v5.16-rc2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Test the 'Do not trace this pid' case in create event
-
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds authored
Pull iommu fixes from Joerg Roedel: - Intel VT-d fixes: - Remove unused PASID_DISABLED - Fix RCU locking - Fix for the unmap_pages call-back - Rockchip RK3568 address mask fix - AMD IOMMUv2 log message clarification * tag 'iommu-fixes-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Fix unmap_pages support iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock() iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568 iommu/amd: Clarify AMD IOMMUv2 initialization messages iommu/vt-d: Remove unused PASID_DISABLED
-
- 27 Nov, 2021 6 commits
-
-
git://git.samba.org/ksmbdLinus Torvalds authored
Pull ksmbd fixes from Steve French: "Five ksmbd server fixes, four of them for stable: - memleak fix - fix for default data stream on filesystems that don't support xattr - error logging fix - session setup fix - minor doc cleanup" * tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: fix memleak in get_file_stream_info() ksmbd: contain default data stream even if xattr is empty ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec() docs: filesystem: cifs: ksmbd: Fix small layout issues ksmbd: Fix an error handling path in 'smb2_sess_setup()'
-
Guenter Roeck authored
Use the architecture independent Kconfig option PAGE_SIZE_LESS_THAN_64KB to indicate that VMXNET3 requires a page size smaller than 64kB. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Guenter Roeck authored
NTFS_RW code allocates page size dependent arrays on the stack. This results in build failures if the page size is 64k or larger. fs/ntfs/aops.c: In function 'ntfs_write_mst_block': fs/ntfs/aops.c:1311:1: error: the frame size of 2240 bytes is larger than 2048 bytes Since commit f22969a6 ("powerpc/64s: Default to 64K pages for 64 bit book3s") this affects ppc:allmodconfig builds, but other architectures supporting page sizes of 64k or larger are also affected. Increasing the maximum frame size for affected architectures just to silence this error does not really help. The frame size would have to be set to a really large value for 256k pages. Also, a large frame size could potentially result in stack overruns in this code and elsewhere and is therefore not desirable. Make NTFS_RW dependent on page sizes smaller than 64k instead. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Guenter Roeck authored
NTFS_RW and VMXNET3 require a page size smaller than 64kB. Add generic Kconfig option for use outside architecture code to avoid architecture specific Kconfig options in that code. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Steven Rostedt (VMware) authored
When creating a new event (via a module, kprobe, eprobe, etc), the descriptors that are created must add flags for pid filtering if an instance has pid filtering enabled, as the flags are used at the time the event is executed to know if pid filtering should be done or not. The "Only trace this pid" case was added, but a cut and paste error made that case checked twice, instead of checking the "Trace all but this pid" case. Link: https://lore.kernel.org/all/202111280401.qC0z99JB-lkp@intel.com/ Fixes: 6cb20650 ("tracing: Check pid filtering when creating events") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds authored
Pull xfs fixes from Darrick Wong: "Fixes for a resource leak and a build robot complaint about totally dead code: - Fix buffer resource leak that could lead to livelock on corrupt fs. - Remove unused function xfs_inew_wait to shut up the build robots" * tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove xfs_inew_wait xfs: Fix the free logic of state in xfs_attr_node_hasname
-