Commits · 04c4f2ee3f68c9a4bf1653d15f1a9a435ae33f7a · Kirill Smelkov / linux

13 Apr, 2021 1 commit

KVM: VMX: Don't use vcpu->run->internal.ndata as an array index · 04c4f2ee

Reiji Watanabe authored Apr 13, 2021

__vmx_handle_exit() uses vcpu->run->internal.ndata as an index for
an array access.  Since vcpu->run is (can be) mapped to a user address
space with a writer permission, the 'ndata' could be updated by the
user process at anytime (the user process can set it to outside the
bounds of the array).
So, it is not safe that __vmx_handle_exit() uses the 'ndata' that way.

Fixes: 1aa561b1 ("kvm: x86: Add "last CPU" to some KVM_EXIT information")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20210413154739.490299-1-reijiw@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

04c4f2ee

08 Apr, 2021 1 commit

KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp · 315f02c6

Paolo Bonzini authored Apr 06, 2021

Right now, if a call to kvm_tdp_mmu_zap_sp returns false, the caller
will skip the TLB flush, which is wrong.  There are two ways to fix
it:

- since kvm_tdp_mmu_zap_sp will not yield and therefore will not flush
  the TLB itself, we could change the call to kvm_tdp_mmu_zap_sp to
  use "flush |= ..."

- or we can chain the flush argument through kvm_tdp_mmu_zap_sp down
  to __kvm_tdp_mmu_zap_gfn_range.  Note that kvm_tdp_mmu_zap_sp will
  neither yield nor flush, so flush would never go from true to
  false.

This patch does the former to simplify application to stable kernels,
and to make it further clearer that kvm_tdp_mmu_zap_sp will not flush.

Cc: seanjc@google.com
Fixes: 048f4980 ("KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping")
Cc: <stable@vger.kernel.org> # 5.10.x: 048f4980: KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
Cc: <stable@vger.kernel.org> # 5.10.x: 33a31641: KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
Cc: <stable@vger.kernel.org>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

315f02c6

01 Apr, 2021 7 commits

selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0) · 55626ca9

Vitaly Kuznetsov authored Mar 26, 2021

Add a test for the issue when KVM_SET_CLOCK(0) call could cause
TSC page value to go very big because of a signedness issue around
hv_clock->system_time.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210326155551.17446-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

55626ca9

KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update() · 77fcbe82

Vitaly Kuznetsov authored Mar 31, 2021

When guest time is reset with KVM_SET_CLOCK(0), it is possible for
'hv_clock->system_time' to become a small negative number. This happens
because in KVM_SET_CLOCK handling we set 'kvm->arch.kvmclock_offset' based
on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled,
kvm_guest_time_update() does (masterclock in use case):

hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset;

And 'master_kernel_ns' represents the last time when masterclock
got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a
problem, the difference is very small, e.g. I'm observing
hv_clock.system_time = -70 ns. The issue comes from the fact that
'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in
compute_tsc_page_parameters() becomes a very big number.

Use 'master_kernel_ns' instead of get_kvmclock_ns() when masterclock is in
use and get_kvmclock_base_ns() when it's not to prevent 'system_time' from
going negative.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210331124130.337992-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

77fcbe82

KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken · a83829f5

Paolo Bonzini authored Mar 25, 2021

pvclock_gtod_sync_lock can be taken with interrupts disabled if the
preempt notifier calls get_kvmclock_ns to update the Xen
runstate information:

   spin_lock include/linux/spinlock.h:354 [inline]
   get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587
   kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69
   kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100
   kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline]
   kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062

So change the users of the spinlock to spin_lock_irqsave and
spin_unlock_irqrestore.

Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a83829f5

KVM: x86: reduce pvclock_gtod_sync_lock critical sections · c2c647f9

Paolo Bonzini authored Mar 25, 2021

There is no need to include changes to vcpu->requests into
the pvclock_gtod_sync_lock critical section.  The changes to
the shared data structures (in pvclock_update_vm_gtod_copy)
already occur under the lock.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c2c647f9

Merge branch 'kvm-fix-svm-races' into kvm-master · 6ebae23c
Paolo Bonzini authored Mar 31, 2021

6ebae23c

KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit · 3c346c0c

Paolo Bonzini authored Mar 31, 2021

Fixing nested_vmcb_check_save to avoid all TOC/TOU races
is a bit harder in released kernels, so do the bare minimum
by avoiding that EFER.SVME is cleared.  This is problematic
because svm_set_efer frees the data structures for nested
virtualization if EFER.SVME is cleared.

Also check that EFER.SVME remains set after a nested vmexit;
clearing it could happen if the bit is zero in the save area
that is passed to KVM_SET_NESTED_STATE (the save area of the
nested state corresponds to the nested hypervisor's state
and is restored on the next nested vmexit).

Cc: stable@vger.kernel.org
Fixes: 2fcf4876 ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3c346c0c

KVM: SVM: load control fields from VMCB12 before checking them · a58d9166

Paolo Bonzini authored Mar 31, 2021

Avoid races between check and use of the nested VMCB controls.  This
for example ensures that the VMRUN intercept is always reflected to the
nested hypervisor, instead of being processed by the host.  Without this
patch, it is possible to end up with svm->nested.hsave pointing to
the MSR permission bitmap for nested guests.

This bug is CVE-2021-29657.
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Cc: stable@vger.kernel.org
Fixes: 2fcf4876 ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a58d9166

31 Mar, 2021 1 commit
- Merge commit 'kvm-tdp-fix-flushes' into kvm-master · 825e34d3
  Paolo Bonzini authored Mar 31, 2021
  
  825e34d3
30 Mar, 2021 10 commits

KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages · 33a31641

Sean Christopherson authored Mar 25, 2021

Prevent the TDP MMU from yielding when zapping a gfn range during NX
page recovery.  If a flush is pending from a previous invocation of the
zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU
has not accumulated a flush for the current invocation, then yielding
will release mmu_lock with stale TLB entries.

That being said, this isn't technically a bug fix in the current code, as
the TDP MMU will never yield in this case.  tdp_mmu_iter_cond_resched()
will yield if and only if it has made forward progress, as defined by the
current gfn vs. the last yielded (or starting) gfn.  Because zapping a
single shadow page is guaranteed to (a) find that page and (b) step
sideways at the level of the shadow page, the TDP iter will break its loop
before getting a chance to yield.

But that is all very, very subtle, and will break at the slightest sneeze,
e.g. zapping while holding mmu_lock for read would break as the TDP MMU
wouldn't be guaranteed to see the present shadow page, and thus could step
sideways at a lower level.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210325200119.1359384-4-seanjc@google.com>
[Add lockdep assertion. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

33a31641

KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping · 048f4980

Sean Christopherson authored Mar 25, 2021

Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which
does the flush itself if and only if it yields (which it will never do in
this particular scenario), and otherwise expects the caller to do the
flush.  If pages are zapped from the TDP MMU but not the legacy MMU, then
no flush will occur.

Fixes: 29cf0f50 ("kvm: x86/mmu: NX largepage recovery for TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210325200119.1359384-3-seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

048f4980

KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap · a835429c

Sean Christopherson authored Mar 25, 2021

When flushing a range of GFNs across multiple roots, ensure any pending
flush from a previous root is honored before yielding while walking the
tables of the current root.

Note, kvm_tdp_mmu_zap_gfn_range() now intentionally overwrites its local
"flush" with the result to avoid redundant flushes.  zap_gfn_range()
preserves and return the incoming "flush", unless of course the flush was
performed prior to yielding and no new flush was triggered.

Fixes: 1af4a960 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
Cc: stable@vger.kernel.org
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210325200119.1359384-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a835429c

KVM: make: Fix out-of-source module builds · 6fb3084a

Siddharth Chandrasekaran authored Mar 24, 2021

Building kvm module out-of-source with,

    make -C $SRC O=$BIN M=arch/x86/kvm

fails to find "irq.h" as the include dir passed to cflags-y does not
prefix the source dir. Fix this by prefixing $(srctree) to the include
dir path.
Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de>
Message-Id: <20210324124347.18336-1-sidcha@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6fb3084a

selftests: kvm: make hardware_disable_test less verbose · f982fb62

Vitaly Kuznetsov authored Mar 23, 2021

hardware_disable_test produces 512 snippets like
...
 main: [511] waiting semaphore
 run_test: [511] start vcpus
 run_test: [511] all threads launched
 main: [511] waiting 368us
 main: [511] killing child

and this doesn't have much value, let's print this info with pr_debug().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210323104331.1354800-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f982fb62

KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have... · 1973cadd

Vitaly Kuznetsov authored Mar 23, 2021

KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE

MSR_F15H_PERF_CTL0-5, MSR_F15H_PERF_CTR0-5 MSRs are only available when
X86_FEATURE_PERFCTR_CORE CPUID bit was exposed to the guest. KVM, however,
allows these MSRs unconditionally because kvm_pmu_is_valid_msr() ->
amd_msr_idx_to_pmc() check always passes and because kvm_pmu_set_msr() ->
amd_pmu_set_msr() doesn't fail.

In case of a counter (CTRn), no big harm is done as we only increase
internal PMC's value but in case of an eventsel (CTLn), we go deep into
perf internals with a non-existing counter.

Note, kvm_get_msr_common() just returns '0' when these MSRs don't exist
and this also seems to contradict architectural behavior which is #GP
(I did check one old Opteron host) but changing this status quo is a bit
scarier.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210323084515.1346540-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1973cadd

KVM: x86: remove unused declaration of kvm_write_tsc() · ecaf088f

Dongli Zhang authored Mar 26, 2021

kvm_write_tsc() was renamed and made static since commit 0c899c25
("KVM: x86: do not attempt TSC synchronization on guest writes"). Remove
its unused declaration.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <20210326070334.12310-1-dongli.zhang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ecaf088f

KVM: clean up the unused argument · d632826f

Haiwei Li authored Mar 13, 2021

kvm_msr_ignored_check function never uses vcpu argument. Clean up the
function and invokers.
Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Message-Id: <20210313051032.4171-1-lihaiwei.kernel@gmail.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d632826f

tools/kvm_stat: Add restart delay · 75f94ecb

Stefan Raspl authored Mar 25, 2021

If this service is enabled and the system rebooted, Systemd's initial
attempt to start this unit file may fail in case the kvm module is not
loaded. Since we did not specify a delay for the retries, Systemd
restarts with a minimum delay a number of times before giving up and
disabling the service. Which means a subsequent kvm module load will
have kvm running without monitoring.
Adding a delay to fix this.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20210325122949.1433271-1-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

75f94ecb

Merge tag 'kvmarm-fixes-5.12-3' of... · 41793e7f

Paolo Bonzini authored Mar 30, 2021

Merge tag 'kvmarm-fixes-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.12, take #3

- Fix GICv3 MMIO compatibility probing
- Prevent guests from using the ARMv8.4 self-hosted tracing extension

41793e7f

28 Mar, 2021 9 commits

Linux 5.12-rc5 · a5e13c6d
Linus Torvalds authored Mar 28, 2021

a5e13c6d

Merge tag 'perf-tools-fixes-for-v5.12-2020-03-28' of... · f9e2bb42

Linus Torvalds authored Mar 28, 2021

Merge tag 'perf-tools-fixes-for-v5.12-2020-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tooling fixes from Arnaldo Carvalho de Melo:

- Avoid write of uninitialized memory when generating PERF_RECORD_MMAP*
records.

- Fix 'perf top' BPF support related crash with perf_event_paranoid=3 +
kptr_restrict.

- Validate raw event with sysfs exported format bits.

- Fix waipid on SIGCHLD delivery bugs in 'perf daemon'.

- Change to use bash for daemon test on Debian, where the default is
dash and thus fails for use of bashisms in this test.

- Fix memory leak in vDSO found using ASAN.

- Remove now useless (due to the fact that BPF now supports static
vars) failing sub test "BPF relocation checker".

- Fix auxtrace queue conflict.

- Sync linux/kvm.h with the kernel sources.

* tag 'perf-tools-fixes-for-v5.12-2020-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf test: Change to use bash for daemon test
perf record: Fix memory leak in vDSO found using ASAN
perf test: Remove now useless failing sub test "BPF relocation checker"
perf daemon: Return from kill functions
perf daemon: Force waipid for all session on SIGCHLD delivery
perf top: Fix BPF support related crash with perf_event_paranoid=3 + kptr_restrict
perf pmu: Validate raw event with sysfs exported format bits
perf synthetic events: Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records
tools headers UAPI: Sync linux/kvm.h with the kernel sources
perf synthetic-events: Fix uninitialized 'kernel_thread' variable
perf auxtrace: Fix auxtrace queue conflict

f9e2bb42

Merge tag 'auxdisplay-for-linus-v5.12-rc6' of git://github.com/ojeda/linux · 3fef15f8

Linus Torvalds authored Mar 28, 2021

Pull auxdisplay fix from Miguel Ojeda:
 "Remove in_interrupt() usage (Sebastian Andrzej Siewior)"

* tag 'auxdisplay-for-linus-v5.12-rc6' of git://github.com/ojeda/linux:
  auxdisplay: Remove in_interrupt() usage.

3fef15f8

Merge tag 'x86-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 36a14638

Linus Torvalds authored Mar 28, 2021

Pull x86 fixes from Ingo Molnar:
 "Two fixes:

   - Fix build failure on Ubuntu with new GCC packages that turn
     on -fcf-protection

   - Fix SME memory encryption PTE encoding bug - AFAICT the code
     worked on 4K page sizes (level 1) but had the wrong shift at
     higher page level orders (level 2 and higher)"

* tag 'x86-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Turn off -fcf-protection for realmode targets
  x86/mem_encrypt: Correct physical address calculation in __set_clr_pte_enc()

36a14638

Merge tag 'locking-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 47fbbc94

Linus Torvalds authored Mar 28, 2021

Pull locking fix from Ingo Molnar:
 "Fix the non-debug mutex_lock_io_nested() method to map to
  mutex_lock_io() instead of mutex_lock().

  Right now nothing uses this API explicitly, but this is an
  accident waiting to happen"

* tag 'locking-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/mutex: Fix non debug version of mutex_lock_io_nested()

47fbbc94

Merge tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6 · 81b1d39f

Linus Torvalds authored Mar 28, 2021

Pull cifs fixes from Steve French:
 "Five cifs/smb3 fixes, two for stable.

  Includes an important fix for encryption and an ACL fix, as well as a
  fix for possible reflink data corruption"

* tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix cached file size problems in duplicate extents (reflink)
  cifs: Silently ignore unknown oplock break handle
  cifs: revalidate mapping when we open files for SMB1 POSIX
  cifs: Fix chmod with modefromsid when an older ACE already exists.
  cifs: Adjust key sizes and key generation routines for AES256 encryption

81b1d39f

Merge tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block · b44d1ddc

Linus Torvalds authored Mar 28, 2021

Pull io_uring fixes from Jens Axboe:

 - Use thread info versions of flag testing, as discussed last week.

 - The series enabling PF_IO_WORKER to just take signals, instead of
   needing to special case that they do not in a bunch of places. Ends
   up being pretty trivial to do, and then we can revert all the special
   casing we're currently doing.

 - Kill dead pointer assignment

 - Fix hashed part of async work queue trace

 - Fix sign extension issue for IORING_OP_PROVIDE_BUFFERS

 - Fix a link completion ordering regression in this merge window

 - Cancellation fixes

* tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
  io_uring: remove unsued assignment to pointer io
  io_uring: don't cancel extra on files match
  io_uring: don't cancel-track common timeouts
  io_uring: do post-completion chore on t-out cancel
  io_uring: fix timeout cancel return code
  Revert "signal: don't allow STOP on PF_IO_WORKER threads"
  Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
  Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"
  Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"
  kernel: stop masking signals in create_io_thread()
  io_uring: handle signals for IO threads like a normal thread
  kernel: don't call do_exit() for PF_IO_WORKER threads
  io_uring: maintain CQE order of a failed link
  io-wq: fix race around pending work on teardown
  io_uring: do ctx sqd ejection in a clear context
  io_uring: fix provide_buffers sign extension
  io_uring: don't skip file_end_write() on reissue
  io_uring: correct io_queue_async_work() traces
  io_uring: don't use {test,clear}_tsk_thread_flag() for current

b44d1ddc

Merge tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block · abed516e

Linus Torvalds authored Mar 28, 2021

Pull block fixes from Jens Axboe:

 - Fix regression from this merge window with the xarray partition
   change, which allowed partition counts that overflow the u8 that
   holds the partition number (Ming)

 - Fix zone append warning (Johannes)

 - Segmentation count fix for multipage bvecs (David)

 - Partition scan fix (Chris)

* tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
  block: don't create too many partitions
  block: support zone append bvecs
  block: recalculate segment count for multi-segment discards correctly
  block: clear GD_NEED_PART_SCAN later in bdev_disk_changed

abed516e

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · e8cfe8fa

Linus Torvalds authored Mar 28, 2021

Pull SCSI fixes from James Bottomley:
 "Seven fixes, all in drivers (qla2xxx, mkt3sas, qedi, target,
  ibmvscsi).

  The most serious are the target pscsi oom and the qla2xxx revert which
  can otherwise cause a use after free"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: pscsi: Clean up after failure in pscsi_map_sg()
  scsi: target: pscsi: Avoid OOM in pscsi_map_sg()
  scsi: mpt3sas: Fix error return code of mpt3sas_base_attach()
  scsi: qedi: Fix error return code of qedi_alloc_global_queues()
  scsi: Revert "qla2xxx: Make sure that aborted commands are freed"
  scsi: ibmvfc: Make ibmvfc_wait_for_ops() MQ aware
  scsi: ibmvfc: Fix potential race in ibmvfc_wait_for_ops()

e8cfe8fa

27 Mar, 2021 11 commits

io_uring: remove unsued assignment to pointer io · 2b8ed1c9

Colin Ian King authored Mar 26, 2021

There is an assignment to io that is never read after the assignment,
the assignment is redundant and can be removed.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2b8ed1c9

io_uring: don't cancel extra on files match · 78d9d7c2

Pavel Begunkov authored Mar 25, 2021

As tasks always wait and kill their io-wq on exec/exit, files are of no
more concern to us, so we don't need to specifically cancel them by hand
in those cases. Moreover we should not, because io_match_task() looks at
req->task->files now, which is always true and so leads to extra
cancellations, that wasn't a case before per-task io-wq.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0566c1de9b9dd417f5de345c817ca953580e0e2e.1616696997.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

78d9d7c2

io_uring: don't cancel-track common timeouts · 2482b58f

Pavel Begunkov authored Mar 25, 2021

Don't account usual timeouts (i.e. not linked) as REQ_F_INFLIGHT but
keep behaviour prior to dd59a3d5 ("io_uring: reliably cancel linked
timeouts").
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/104441ef5d97e3932113d44501fda0df88656b83.1616696997.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

2482b58f

io_uring: do post-completion chore on t-out cancel · 80c4cbdb

Pavel Begunkov authored Mar 25, 2021

Don't forget about io_commit_cqring() + io_cqring_ev_posted() after
exit/exec cancelling timeouts. Both functions declared only after
io_kill_timeouts(), so to avoid tons of forward declarations move
it down.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/72ace588772c0f14834a6a4185d56c445a366fb4.1616696997.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

80c4cbdb

io_uring: fix timeout cancel return code · 1ee4160c

Pavel Begunkov authored Mar 25, 2021

When we cancel a timeout we should emit a sensible return code, like
-ECANCELED but not 0, otherwise it may trick users.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7b0ad1065e3bd1994722702bd0ba9e7bc9b0683b.1616696997.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

1ee4160c

Revert "signal: don't allow STOP on PF_IO_WORKER threads" · 1e4cf0d3

Jens Axboe authored Mar 25, 2021

This reverts commit 4db4b1a0.

The IO threads allow and handle SIGSTOP now, so don't special case them
anymore in task_set_jobctl_pending().
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1e4cf0d3

Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" · d3dc04cd

Jens Axboe authored Mar 25, 2021

This reverts commit 15b2219f.

Before IO threads accepted signals, the freezer using take signals to wake
up an IO thread would cause them to loop without any way to clear the
pending signal. That is no longer the case, so stop special casing
PF_IO_WORKER in the freezer.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d3dc04cd

Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals" · e8b33b8c

Jens Axboe authored Mar 25, 2021

This reverts commit 6fb8f43c.

The IO threads do allow signals now, including SIGSTOP, and we can allow
ptrace attach. Attaching won't reveal anything interesting for the IO
threads, but it will allow eg gdb to attach to a task with io_urings
and IO threads without complaining. And once attached, it will allow
the usual introspection into regular threads.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e8b33b8c

Revert "signal: don't allow sending any signals to PF_IO_WORKER threads" · 5a842a74

Jens Axboe authored Mar 25, 2021

This reverts commit 5be28c8f.

IO threads now take signals just fine, so there's no reason to limit them
specifically. Revert the change that prevented that from happening.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5a842a74

kernel: stop masking signals in create_io_thread() · b16b3855

Jens Axboe authored Mar 26, 2021

This is racy - move the blocking into when the task is created and
we're marking it as PF_IO_WORKER anyway. The IO threads are now
prepared to handle signals like SIGSTOP as well, so clear that from
the mask to allow proper stopping of IO threads.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b16b3855

io_uring: handle signals for IO threads like a normal thread · dbe1bdbb

Jens Axboe authored Mar 25, 2021

We go through various hoops to disallow signals for the IO threads, but
there's really no reason why we cannot just allow them. The IO threads
never return to userspace like a normal thread, and hence don't go through
normal signal processing. Instead, just check for a pending signal as part
of the work loop, and call get_signal() to handle it for us if anything
is pending.

With that, we can support receiving signals, including special ones like
SIGSTOP.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dbe1bdbb