- 25 Mar, 2021 7 commits
-
-
Paolo Valente authored
Many throughput-sensitive workloads are made of several parallel I/O flows, with all flows generated by the same application, or more generically by the same task (e.g., system boot). The most counterproductive action with these workloads is plugging I/O dispatch when one of the bfq_queues associated with these flows remains temporarily empty. To avoid this plugging, BFQ has been using a burst-handling mechanism for years now. This mechanism has proven effective for throughput, and not detrimental for service guarantees. This commit pushes this mechanism a little bit further, basing on the following two facts. First, all the I/O flows of a the same application or task contribute to the execution/completion of that common application or task. So the performance figures that matter are total throughput of the flows and task-wide I/O latency. In particular, these flows do not need to be protected from each other, in terms of individual bandwidth or latency. Second, the above fact holds regardless of the number of flows. Putting these two facts together, this commits merges stably the bfq_queues associated with these I/O flows, i.e., with the processes that generate these IO/ flows, regardless of how many the involved processes are. To decide whether a set of bfq_queues is actually associated with the I/O flows of a common application or task, and to merge these queues stably, this commit operates as follows: given a bfq_queue, say Q2, currently being created, and the last bfq_queue, say Q1, created before Q2, Q2 is merged stably with Q1 if - very little time has elapsed since when Q1 was created - Q2 has the same ioprio as Q1 - Q2 belongs to the same group as Q1 Merging bfq_queues also reduces scheduling overhead. A fio test with ten random readers on /dev/nullb shows a throughput boost of 40%, with a quadcore. Since BFQ's execution time amounts to ~50% of the total per-request processing time, the above throughput boost implies that BFQ's overhead is reduced by more than 50%. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Link: https://lore.kernel.org/r/20210304174627.161-7-paolo.valente@linaro.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
Shared queues are likely to receive I/O at a high rate. This may deceptively let them be considered as wakers of other queues. But a false waker will unjustly steal bandwidth to its supposedly woken queue. So considering also shared queues in the waking mechanism may cause more control troubles than throughput benefits. This commit keeps shared queues out of the waker-detection mechanism. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Link: https://lore.kernel.org/r/20210304174627.161-6-paolo.valente@linaro.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
When the io_latency heuristic is off, bfq_queues must not start to be weight-raised. Unfortunately, by mistake, this may happen when the state of a previously weight-raised bfq_queue is resumed after a queue split. This commit fixes this error. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Link: https://lore.kernel.org/r/20210304174627.161-5-paolo.valente@linaro.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
Consider a bfq_queue bfqq that is about to be merged with another bfq_queue new_bfqq. The processes associated with bfqq are cooperators of the processes associated with new_bfqq. So, if bfqq has a waker, then it is reasonable (and beneficial for throughput) to assume that all these processes will be happy to let bfqq's waker freely inject I/O when they have no I/O. So this commit makes new_bfqq inherit bfqq's waker. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Link: https://lore.kernel.org/r/20210304174627.161-4-paolo.valente@linaro.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
Consider a new I/O request that arrives for a bfq_queue bfqq. If, when this happens, the only active bfq_queues are bfqq and either its waker bfq_queue or one of its woken bfq_queues, then there is no point in queueing this new I/O request in bfqq for service. In fact, the in-service queue and bfqq agree on serving this new I/O request as soon as possible. So this commit puts this new I/O request directly into the dispatch list. Tested-by: Jan Kara <jack@suse.cz> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Link: https://lore.kernel.org/r/20210304174627.161-3-paolo.valente@linaro.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
Suppose that I/O dispatch is plugged, to wait for new I/O for the in-service bfq-queue, say bfqq. Suppose then that there is a further bfq_queue woken by bfqq, and that this woken queue has pending I/O. A woken queue does not steal bandwidth from bfqq, because it remains soon without I/O if bfqq is not served. So there is virtually no risk of loss of bandwidth for bfqq if this woken queue has I/O dispatched while bfqq is waiting for new I/O. In contrast, this extra I/O injection boosts throughput. This commit performs this extra injection. Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Link: https://lore.kernel.org/r/20210304174627.161-2-paolo.valente@linaro.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Bhaskar Chowdhury authored
Sentence reconstruction for better readability. Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 Mar, 2021 23 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds authored
Pull ext4 fixes from Ted Ts'o: "Miscellaneous ext4 bug fixes for v5.12" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: initialize ret to suppress smatch warning ext4: stop inode update before return ext4: fix rename whiteout with fast commit ext4: fix timer use-after-free on failed mount ext4: fix potential error in ext4_do_update_inode ext4: do not try to set xattr into ea_inode if value is empty ext4: do not iput inode under running transaction in ext4_rename() ext4: find old entry again if failed to rename whiteout ext4: fix error handling in ext4_end_enable_verity() ext4: fix bh ref count on error paths fs/ext4: fix integer overflow in s_log_groups_per_flex ext4: add reclaim checks to xattr code ext4: shrink race window in ext4_should_retry_alloc()
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull io_uring followup fixes from Jens Axboe: - The SIGSTOP change from Eric, so we properly ignore that for PF_IO_WORKER threads. - Disallow sending signals to PF_IO_WORKER threads in general, we're not interested in having them funnel back to the io_uring owning task. - Stable fix from Stefan, ensuring we properly break links for short send/sendmsg recv/recvmsg if MSG_WAITALL is set. - Catch and loop when needing to run task_work before a PF_IO_WORKER threads goes to sleep. * tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block: io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL io-wq: ensure task is running before processing task_work signal: don't allow STOP on PF_IO_WORKER threads signal: don't allow sending any signals to PF_IO_WORKER threads
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds authored
Pull staging and IIO driver fixes from Greg KH: "Some small staging and IIO driver fixes: - MAINTAINERS changes for the move of the staging mailing list - comedi driver fixes to get request_irq() to work correctly - counter driver fixes for reported issues with iio devices - tiny iio driver fixes for reported issues. All of these have been in linux-next with no reported problems" * tag 'staging-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: vt665x: fix alignment constraints staging: comedi: cb_pcidas64: fix request_irq() warn staging: comedi: cb_pcidas: fix request_irq() warn MAINTAINERS: move the staging subsystem to lists.linux.dev MAINTAINERS: move some real subsystems off of the staging mailing list iio: gyro: mpu3050: Fix error handling in mpu3050_trigger_handler iio: hid-sensor-temperature: Fix issues of timestamp channel iio: hid-sensor-humidity: Fix alignment issue of timestamp channel counter: stm32-timer-cnt: fix ceiling miss-alignment with reload register counter: stm32-timer-cnt: fix ceiling write max value counter: stm32-timer-cnt: Report count function when SLAVE_MODE_DISABLED iio: adc: ab8500-gpadc: Fix off by 10 to 3 iio:adc:stm32-adc: Add HAS_IOMEM dependency iio: adis16400: Fix an error code in adis16400_initial_setup() iio: adc: adi-axi-adc: add proper Kconfig dependencies iio: adc: ad7949: fix wrong ADC result due to incorrect bit mask iio: hid-sensor-prox: Fix scale not correct issue iio:adc:qcom-spmi-vadc: add default scale to LR_MUX2_BAT_ID channel
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds authored
Pull USB and Thunderbolt driver fixes from Greg KH: "Here are some small Thunderbolt and USB driver fixes for some reported issues: - thunderbolt fixes for minor problems - typec fixes for power issues - usb-storage quirk addition - usbip bugfix - dwc3 bugfix when stopping transfers - cdnsp bugfix for isoc transfers - gadget use-after-free fix All have been in linux-next this week with no reported issues" * tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: tcpm: Skip sink_cap query only when VDM sm is busy usb: dwc3: gadget: Prevent EP queuing while stopping transfers usb: typec: tcpm: Invoke power_supply_changed for tcpm-source-psy- usb: typec: Remove vdo[3] part of tps6598x_rx_identity_reg struct usb-storage: Add quirk to defeat Kindle's automatic unload usb: gadget: configfs: Fix KASAN use-after-free usbip: Fix incorrect double assignment to udc->ud.tcp_rx usb: cdnsp: Fixes incorrect value in ISOC TRB thunderbolt: Increase runtime PM reference count on DP tunnel discovery thunderbolt: Initialize HopID IDAs in tb_switch_alloc()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fix from Ingo Molnar: "A change to robustify force-threaded IRQ handlers to always disable interrupts, plus a DocBook fix. The force-threaded IRQ handler change has been accelerated from the normal schedule of such a change to keep the bad pattern/workaround of spin_lock_irqsave() in handlers or IRQF_NOTHREAD as a kludge from spreading" * tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Disable interrupts for force threaded handlers genirq/irq_sim: Fix typos in kernel doc (fnode -> fwnode)
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fixes from Ingo Molnar: "Boundary condition fixes for bugs unearthed by the perf fuzzer" * tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix unchecked MSR access error caused by VLBR_EVENT perf/x86/intel: Fix a crash caused by zero PEBS status
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking fixes from Ingo Molnar: - Get static calls & modules right. Hopefully. - WW mutex fixes * tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: static_call: Fix static_call_update() sanity check static_call: Align static_call_is_init() patching condition static_call: Fix static_call_set_init() locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini() locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull EFI fixes from Ingo Molnar: - another missing RT_PROP table related fix, to ensure that the efivarfs pseudo filesystem fails gracefully if variable services are unsupported - use the correct alignment for literal EFI GUIDs - fix a use after unmap issue in the memreserve code * tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: use 32-bit alignment for efi_guid_t literals firmware/efi: Fix a use after bug in efi_mem_reserve_persistent efivars: respect EFI_UNSUPPORTED return from firmware
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Borislav Petkov: "The freshest pile of shiny x86 fixes for 5.12: - Add the arch-specific mapping between physical and logical CPUs to fix devicetree-node lookups - Restore the IRQ2 ignore logic - Fix get_nr_restart_syscall() to return the correct restart syscall number. Split in a 4-patches set to avoid kABI breakage when backporting to dead kernels" * tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/of: Fix CPU devicetree-node lookups x86/ioapic: Ignore IRQ2 again x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall() x86: Move TS_COMPAT back to asm/thread_info.h kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fixes from Michael Ellerman: - Fix a possible stack corruption and subsequent DLPAR failure in the rpadlpar_io PCI hotplug driver - Two build fixes for uncommon configurations Thanks to Christophe Leroy and Tyrel Datwyler. * tag 'powerpc-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: PCI: rpadlpar: Fix potential drc_name corruption in store functions powerpc: Force inlining of cpu_has_feature() to avoid build failure powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
-
Stefan Metzmacher authored
Without that it's not safe to use them in a linked combination with others. Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE should be possible. We already handle short reads and writes for the following opcodes: - IORING_OP_READV - IORING_OP_READ_FIXED - IORING_OP_READ - IORING_OP_WRITEV - IORING_OP_WRITE_FIXED - IORING_OP_WRITE - IORING_OP_SPLICE - IORING_OP_TEE Now we have it for these as well: - IORING_OP_SENDMSG - IORING_OP_SEND - IORING_OP_RECVMSG - IORING_OP_RECV For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC flags in order to call req_set_fail_links(). There might be applications arround depending on the behavior that even short send[msg]()/recv[msg]() retuns continue an IOSQE_IO_LINK chain. It's very unlikely that such applications pass in MSG_WAITALL, which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'. It's expected that the low level sock_sendmsg() call just ignores MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set SO_ZEROCOPY. We also expect the caller to know about the implicit truncation to MAX_RW_COUNT, which we don't detect. cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.orgSigned-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Mark the current task as running if we need to run task_work from the io-wq threads as part of work handling. If that is the case, then return as such so that the caller can appropriately loop back and reset if it was part of a going-to-sleep flush. Fixes: 3bfe6106 ("io-wq: fork worker threads from original task") Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Eric W. Biederman authored
Just like we don't allow normal signals to IO threads, don't deliver a STOP to a task that has PF_IO_WORKER set. The IO threads don't take signals in general, and have no means of flushing out a stop either. Longer term, we may want to look into allowing stop of these threads, as it relates to eg process freezing. For now, this prevents a spin issue if a SIGSTOP is delivered to the parent task. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-
Jens Axboe authored
They don't take signals individually, and even if they share signals with the parent task, don't allow them to be delivered through the worker thread. Linux does allow this kind of behavior for regular threads, but it's really a compatability thing that we need not care about for the IO threads. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Theodore Ts'o authored
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Pan Bian authored
The inode update should be stopped before returing the error code. Signed-off-by: Pan Bian <bianpan2016@163.com> Link: https://lore.kernel.org/r/20210117085732.93788-1-bianpan2016@163.com Fixes: 8016e29f ("ext4: fast commit recovery path") Cc: stable@kernel.org Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Harshad Shirwadkar authored
This patch adds rename whiteout support in fast commits. Note that the whiteout object that gets created is actually char device. Which imples, the function ext4_inode_journal_mode(struct inode *inode) would return "JOURNAL_DATA" for this inode. This has a consequence in fast commit code that it will make creation of the whiteout object a fast-commit ineligible behavior and thus will fall back to full commits. With this patch, this can be observed by running fast commits with rename whiteout and seeing the stats generated by ext4_fc_stats tracepoint as follows: ext4_fc_stats: dev 254:32 fc ineligible reasons: XATTR:0, CROSS_RENAME:0, JOURNAL_FLAG_CHANGE:0, NO_MEM:0, SWAP_BOOT:0, RESIZE:0, RENAME_DIR:0, FALLOC_RANGE:0, INODE_JOURNAL_DATA:16; num_commits:6, ineligible: 6, numblks: 3 So in short, this patch guarantees that in case of rename whiteout, we fall back to full commits. Amir mentioned that instead of creating a new whiteout object for every rename, we can create a static whiteout object with irrelevant nlink. That will make fast commits to not fall back to full commit. But until this happens, this patch will ensure correctness by falling back to full commits. Fixes: 8016e29f ("ext4: fast commit recovery path") Cc: stable@kernel.org Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210316221921.1124955-1-harshadshirwadkar@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
When filesystem mount fails because of corrupted filesystem we first cancel the s_err_report timer reminding fs errors every day and only then we flush s_error_work. However s_error_work may report another fs error and re-arm timer thus resulting in timer use-after-free. Fix the problem by first flushing the work and only after that canceling the s_err_report timer. Reported-by: syzbot+628472a2aac693ab0fcd@syzkaller.appspotmail.com Fixes: 2d01ddc8 ("ext4: save error info to sb through journal if available") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210315165906.2175-1-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Shijie Luo authored
If set_large_file = 1 and errors occur in ext4_handle_dirty_metadata(), the error code will be overridden, go to out_brelse to avoid this situation. Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Link: https://lore.kernel.org/r/20210312065051.36314-1-luoshijie1@huawei.com Cc: stable@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
zhangyi (F) authored
Syzbot report a warning that ext4 may create an empty ea_inode if set an empty extent attribute to a file on the file system which is no free blocks left. WARNING: CPU: 6 PID: 10667 at fs/ext4/xattr.c:1640 ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640 ... Call trace: ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640 ext4_xattr_block_set+0x1d0/0x1b1c fs/ext4/xattr.c:1942 ext4_xattr_set_handle+0x8a0/0xf1c fs/ext4/xattr.c:2390 ext4_xattr_set+0x120/0x1f0 fs/ext4/xattr.c:2491 ext4_xattr_trusted_set+0x48/0x5c fs/ext4/xattr_trusted.c:37 __vfs_setxattr+0x208/0x23c fs/xattr.c:177 ... Now, ext4 try to store extent attribute into an external inode if ext4_xattr_block_set() return -ENOSPC, but for the case of store an empty extent attribute, store the extent entry into the extent attribute block is enough. A simple reproduce below. fallocate test.img -l 1M mkfs.ext4 -F -b 2048 -O ea_inode test.img mount test.img /mnt dd if=/dev/zero of=/mnt/foo bs=2048 count=500 setfattr -n "user.test" /mnt/foo Reported-by: syzbot+98b881fdd8ebf45ab4ae@syzkaller.appspotmail.com Fixes: 9c6e7853 ("ext4: reserve space for xattr entries/names") Cc: stable@kernel.org Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210305120508.298465-1-yi.zhang@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
zhangyi (F) authored
In ext4_rename(), when RENAME_WHITEOUT failed to add new entry into directory, it ends up dropping new created whiteout inode under the running transaction. After commit <9b88f9fb> ("ext4: Do not iput inode under running transaction"), we follow the assumptions that evict() does not get called from a transaction context but in ext4_rename() it breaks this suggestion. Although it's not a real problem, better to obey it, so this patch add inode to orphan list and stop transaction before final iput(). Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210303131703.330415-2-yi.zhang@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
zhangyi (F) authored
If we failed to add new entry on rename whiteout, we cannot reset the old->de entry directly, because the old->de could have moved from under us during make indexed dir. So find the old entry again before reset is needed, otherwise it may corrupt the filesystem as below. /dev/sda: Entry '00000001' in ??? (12) has deleted/unused inode 15. CLEARED. /dev/sda: Unattached inode 75 /dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. Fixes: 6b4b8e6b ("ext4: fix bug for rename with RENAME_WHITEOUT") Cc: stable@vger.kernel.org Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210303131703.330415-1-yi.zhang@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 20 Mar, 2021 7 commits
-
-
Thomas Gleixner authored
With interrupt force threading all device interrupt handlers are invoked from kernel threads. Contrary to hard interrupt context the invocation only disables bottom halfs, but not interrupts. This was an oversight back then because any code like this will have an issue: thread(irq_A) irq_handler(A) spin_lock(&foo->lock); interrupt(irq_B) irq_handler(B) spin_lock(&foo->lock); This has been triggered with networking (NAPI vs. hrtimers) and console drivers where printk() happens from an interrupt which interrupted the force threaded handler. Now people noticed and started to change the spin_lock() in the handler to spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the interrupt request which in turn breaks RT. Fix the root cause and not the symptom and disable interrupts before invoking the force threaded handler which preserves the regular semantics and the usefulness of the interrupt force threading as a general debugging tool. For not RT this is not changing much, except that during the execution of the threaded handler interrupts are delayed until the handler returns. Vs. scheduling and softirq processing there is no difference. For RT kernels there is no issue. Fixes: 8d32a307 ("genirq: Provide forced interrupt threading") Reported-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Johan Hovold <johan@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V fixes from Palmer Dabbelt: "A handful of fixes for 5.12: - fix the SBI remote fence numbers for hypervisor fences, which had been transcribed in the wrong order in Linux. These fences are only used with the KVM patches applied. - fix a whole host of build warnings, these should have no functional change. - fix init_resources() to prevent an off-by-one error from causing an out-of-bounds array reference. This was manifesting during boot on vexriscv. - ensure the KASAN mappings are visible before proceeding to use them" * tag 'riscv-for-linus-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Correct SPARSEMEM configuration RISC-V: kasan: Declare kasan_shallow_populate() static riscv: Ensure page table writes are flushed when initializing KASAN vmalloc RISC-V: Fix out-of-bounds accesses in init_resources() riscv: Fix compilation error with Canaan SoC ftrace: Fix spelling mistake "disabed" -> "disabled" riscv: fix bugon.cocci warnings riscv: process: Fix no prototype for arch_dup_task_struct riscv: ftrace: Use ftrace_get_regs helper riscv: process: Fix no prototype for show_regs riscv: syscall_table: Reduce W=1 compilation warnings noise riscv: time: Fix no prototype for time_init riscv: ptrace: Fix no prototype warnings riscv: sbi: Fix comment of __sbi_set_timer_v01 riscv: irq: Fix no prototype warning riscv: traps: Fix no prototype warnings RISC-V: correct enum sbi_ext_rfence_fid
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull cifs fixes from Steve French: "Five cifs/smb3 fixes - three for stable, including an important ACL fix and security signature fix" * tag '5.12-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix allocation size on newly created files cifs: warn and fail if trying to use rootfs without the config option fs/cifs/: fix misspellings using codespell tool cifs: Fix preauth hash corruption cifs: update new ACE pointer after populate_new_aces.
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Eight fixes, all in drivers, all fairly minor either being fixes in error legs, memory leaks on teardown, context errors or semantic problems" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpt3sas: Do not use GFP_KERNEL in atomic context scsi: ufs: ufs-mediatek: Correct operator & -> && scsi: sd_zbc: Update write pointer offset cache scsi: lpfc: Fix some error codes in debugfs scsi: qla2xxx: Fix broken #endif placement scsi: st: Fix a use after free in st_open() scsi: myrs: Fix a double free in myrs_cleanup() scsi: ibmvfc: Free channel_setup_buf during device tear down
-
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefsLinus Torvalds authored
Pull zonefs fixes from Damien Le Moal: - fix inode write open reference count (Chao) - Fix wrong write offset for asynchronous O_APPEND writes (me) - Prevent use of sequential zone file as swap files (me) * tag 'zonefs-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: fix to update .i_wr_refcnt correctly in zonefs_open_zone() zonefs: Fix O_APPEND async write handling zonefs: prevent use of seq files as swap file
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: "Just an NVMe pull request this week: - fix tag allocation for keep alive - fix a unit mismatch for the Write Zeroes limits - various TCP transport fixes (Sagi Grimberg, Elad Grupi) - fix iosqes and iocqes validation for discovery controllers (Sagi Grimberg)" * tag 'block-5.12-2021-03-19' of git://git.kernel.dk/linux-block: nvmet-tcp: fix kmap leak when data digest in use nvmet: don't check iosqes,iocqes for discovery controllers nvme-rdma: fix possible hang when failing to set io queues nvme-tcp: fix possible hang when failing to set io queues nvme-tcp: fix misuse of __smp_processor_id with preemption enabled nvme-tcp: fix a NULL deref when receiving a 0-length r2t PDU nvme: fix Write Zeroes limitations nvme: allocate the keep alive request using BLK_MQ_REQ_NOWAIT nvme: merge nvme_keep_alive into nvme_keep_alive_work nvme-fabrics: only reserve a single tag
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull io_uring fixes from Jens Axboe: "Quieter week this time, which was both expected and desired. About half of the below is fixes for this release, the other half are just fixes in general. In detail: - Fix the freezing of IO threads, by making the freezer not send them fake signals. Make them freezable by default. - Like we did for personalities, move the buffer IDR to xarray. Kills some code and avoids a use-after-free on teardown. - SQPOLL cleanups and fixes (Pavel) - Fix linked timeout race (Pavel) - Fix potential completion post use-after-free (Pavel) - Cleanup and move internal structures outside of general kernel view (Stefan) - Use MSG_SIGNAL for send/recv from io_uring (Stefan)" * tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block: io_uring: don't leak creds on SQO attach error io_uring: use typesafe pointers in io_uring_task io_uring: remove structures from include/linux/io_uring.h io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls io_uring: fix sqpoll cancellation via task_work io_uring: add generic callback_head helpers io_uring: fix concurrent parking io_uring: halt SQO submission on ctx exit io_uring: replace sqd rw_semaphore with mutex io_uring: fix complete_post use ctx after free io_uring: fix ->flags races by linked timeouts io_uring: convert io_buffer_idr to XArray io_uring: allow IO worker threads to be frozen kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing
-
- 19 Mar, 2021 3 commits
-
-
Johan Hovold authored
Architectures that describe the CPU topology in devicetree and do not have an identity mapping between physical and logical CPU ids must override the default implementation of arch_match_cpu_phys_id(). Failing to do so breaks CPU devicetree-node lookups using of_get_cpu_node() and of_cpu_device_node_get() which several drivers rely on. It also causes the CPU struct devices exported through sysfs to point to the wrong devicetree nodes. On x86, CPUs are described in devicetree using their APIC ids and those do not generally coincide with the logical ids, even if CPU0 typically uses APIC id 0. Add the missing implementation of arch_match_cpu_phys_id() so that CPU-node lookups work also with SMP. Apart from fixing the broken sysfs devicetree-node links this likely does not affect current users of mainline kernels on x86. Fixes: 4e07db9c ("x86/devicetree: Use CPU description from Device Tree") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210312092033.26317-1-johan@kernel.org
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm fixes from Paolo Bonzini: "Fixes for kvm on x86: - new selftests - fixes for migration with HyperV re-enlightenment enabled - fix RCU/SRCU usage - fixes for local_irq_restore misuse false positive" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: documentation/kvm: additional explanations on KVM_SET_BOOT_CPU_ID x86/kvm: Fix broken irq restoration in kvm_wait KVM: X86: Fix missing local pCPU when executing wbinvd on all dirty pCPUs KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish selftests: kvm: add set_boot_cpu_id test selftests: kvm: add _vm_ioctl selftests: kvm: add get_msr_index_features selftests: kvm: Add basic Hyper-V clocksources tests KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment KVM: x86: hyper-v: Track Hyper-V TSC page status KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS KVM: x86/mmu: Store the address space ID in the TDP iterator KVM: x86/mmu: Factor out tdp_iter_return_to_root KVM: x86/mmu: Fix RCU usage when atomically zapping SPTEs KVM: x86/mmu: Fix RCU usage in handle_removed_tdp_mmu_page
-
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linuxLinus Torvalds authored
Pull gpio fixes from Bartosz Golaszewski: "Two fixes for the GPIO subsystem. Both address issues in the core GPIO code: - fix the return value in error path in gpiolib_dev_init() - fix the 'gpio-line-names' property handling correctly this time" * tag 'gpio-fixes-for-v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: Assign fwnode to parent's if no primary one provided gpiolib: Fix error return code in gpiolib_dev_init()
-