- 11 Apr, 2021 30 commits
-
-
Pavel Begunkov authored
There are two problems: 1) we always allocate refnodes in advance and free them if those haven't been used. It's expensive, takes two allocations, where one of them is percpu. And it may be pretty common not actually using them. 2) Current API with allocating a refnode and setting some of the fields is error prone, we don't ever want to have a file node runninng fixed buffer callback... Solve both with pre-init/get API. Pre-init just leaves the node for later if not used, and for get (i.e. io_rsrc_refnode_get()), you need to explicitly pass all arguments setting callbacks/etc., so it's more resilient. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Emphasize that return value of io_flush_cached_reqs() depends on number of requests in the cache. It looks nicer and might help tools from false-negative analyses. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Move the case of successfully issued request by doing that check first. It's not much of a difference, just generates slightly better code for me. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Inline __io_queue_linked_timeout(), we don't need it Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Don't do a function call (io_dismantle_req()) in the middle and place it to near other function calls, otherwise may lead to excessive register spilling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
First of all, w need to set tctx->sqpoll only when we add a new entry into ->xa, so move it from the hot path. Also extract a hot path for io_uring_add_task_file() as an inline helper. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Add unlikely annotations, because my compiler pretty much mispredicts every first check, and apart jumping around in the fast path, it also generates extra instructions, like in advance setting ret value. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
__tctx_task_work() guarantees that ctx won't be killed while running task_works, so we can remove now unnecessary ctx pinning for internally armed polling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We can set canceled == true and complete out-of-line, ensure that we catch that and correctly return -ECANCELED if the poll operation got canceled. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
The correct function is io_iopoll_complete(), which deals with completions of IOPOLL requests, not io_poll_complete(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We have to dig quite deep to check for particularly whether or not a file supports a fast-path nonblock attempt. For fixed files, we can do this lookup once and cache the state instead. This adds two new bits to track whether we support async read/write attempt, and lines up the REQ_F_ISREG bit with those two. The file slot re-uses the last 3 (or 2, for 32-bit) of the file pointer to cache that state, and then we mask it in when we go and use a fixed file. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We don't allow them at registration time, so limit the check for needing inflight tracking in io_file_get() to the non-fixed path. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Use a more comprehensible() max instead of hand coding it with ifs in io_sqd_update_thread_idle(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
io_uring manipulates references twice for each request, and hence is very sensitive to performance of the reference count. This commit borrows a trick from: commit f958d7b5 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Thu Apr 11 10:06:20 2019 -0700 mm: make page ref count overflow check tighter and more explicit and switches to atomic_t for references, while still retaining overflow and underflow checks. This is good for a 2-3% increase in peak IOPS on a single core. Before: IOPS=2970879, IOS/call=31/31, inflight=128 (128) IOPS=2952597, IOS/call=31/31, inflight=128 (128) IOPS=2943904, IOS/call=31/31, inflight=128 (128) IOPS=2930006, IOS/call=31/31, inflight=96 (96) and after: IOPS=3054354, IOS/call=31/31, inflight=128 (128) IOPS=3059038, IOS/call=31/31, inflight=128 (128) IOPS=3060320, IOS/call=31/31, inflight=128 (128) IOPS=3068256, IOS/call=31/31, inflight=96 (96) Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
No functional changes in this patch, just in preparation for handling the references a bit more efficiently. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
If not for async_data NULL check, io_resubmit_prep() is already an rw specific version of io_req_prep_async(), but slower because 1) it always goes through io_import_iovec() even if following io_setup_async_rw() the result 2) instead of initialising iovec/iter in-place it does it on-stack and then copies with io_setup_async_rw(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Merge two function and do renaming in favour of the second one, it relays the meaning better. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
needs_async_data controls allocation of async_data, and used in two cases. 1) when async setup requires it (by io_req_prep_async() or handler themselves), and 2) when op always needs additional space to operate, like timeouts do. Opcode preps already don't bother about the second case and do allocation unconditionally, restrict needs_async_data to the first case only and rename it into needs_async_setup. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [axboe: update for IOPOLL fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
All opcode handlers pretty well know whether they need async data or not, and can skip testing for needs_async_data. The exception is rw the generic path, but those test the flag by hand anyway. So, check the flag and make io_alloc_async_data() allocating unconditionally. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
IORING_OP_[SEND,RECV] don't need async setup neither will get into io_req_prep_async(). Remove them from io_req_prep_async() and remove needs_async_data checks from the related setup functions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
__io_cqring_fill_event() takes cflags as long to squeeze it into u32 in an CQE, awhile all users pass int or unsigned. Replace it with unsigned int and store it as u32 in struct io_completion to match CQE. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Always complete request holding the mutex instead of doing that strange dancing with conditional ordering. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Add a simple helper doing CQE posting, marking request for link-failure, and putting both submission and completion references. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_fixed_file_slot() and io_file_from_index() behave pretty similarly, DRY and call one from another. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Use io_req_task_queue_fail() on the fail path of io_req_task_queue(). It's unlikely to happen, so don't care about additional overhead, but allows to keep all the req->result invariant in a single function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Don't bother to take a ctx->refs for io_req_task_cancel() because it take uring_lock before putting a request, and the context is promised to stay alive until unlock happens. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fix from David Sterba: "One more patch that we'd like to get to 5.12 before release. It's changing where and how the superblock is stored in the zoned mode. It is an on-disk format change but so far there are no implications for users as the proper mkfs support hasn't been merged and is waiting for the kernel side to settle. Until now, the superblocks were derived from the zone index, but zone size can differ per device. This is changed to be based on fixed offset values, to make it independent of the device zone size. The work on that got a bit delayed, we discussed the exact locations to support potential device sizes and usecases. (Partially delayed also due to my vacation.) Having that in the same release where the zoned mode is declared usable is highly desired, there are userspace projects that need to be updated to recognize the feature. Pushing that to the next release would make things harder to test" * tag 'for-5.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: move superblock logging zone location
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking fixlets from Ingo Molnar: "Two minor fixes: one for a Clang warning, the other improves an ambiguous/confusing kernel log message" * tag 'locking-urgent-2021-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Address clang -Wformat warning printing for %hd lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Borislav Petkov: - Fix the vDSO exception handling return path to disable interrupts again. - A fix for the CE collector to return the proper return values to its callers which are used to convey what the collector has done with the error address. * tag 'x86_urgent_for_v5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/traps: Correct exc_general_protection() and math_error() return paths RAS/CEC: Correct ce_add_elem()'s returned values
-
- 10 Apr, 2021 10 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpuLinus Torvalds authored
Pull percpu fix from Dennis Zhou: "This contains a fix for sporadically failing atomic percpu allocations. I only caught it recently while I was reviewing a new series [1] and simultaneously saw reports by btrfs in xfstests [2] and [3]. In v5.9, memcg accounting was extended to percpu done by adding a second type of chunk. I missed an interaction with the free page float count used to ensure we can support atomic allocations. If one type of chunk has no free pages, but the other has enough to satisfy the free page float requirement, we will not repopulate the free pages for the former type of chunk. This led to the sporadically failing atomic allocations" Link: https://lore.kernel.org/linux-mm/20210324190626.564297-1-guro@fb.com/ [1] Link: https://lore.kernel.org/linux-mm/20210401185158.3275.409509F4@e16-tech.com/ [2] Link: https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/ [3] * 'for-5.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: percpu: make pcpu_nr_empty_pop_pages per chunk type
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Seven fixes, all in drivers. The hpsa three are the most extensive and the most problematic: it's a packed structure misalignment that oopses on ia64 but looks like it would also oops on quite a few non-x86 architectures. The pm80xx is a regression and the rest are bug fixes for patches in the misc tree" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: scsi_transport_srp: Don't block target in SRP_PORT_LOST state scsi: target: iscsi: Fix zero tag inside a trace event scsi: pm80xx: Fix chip initialization failure scsi: ufs: core: Fix wrong Task Tag used in task management request UPIUs scsi: ufs: core: Fix task management request completion timeout scsi: hpsa: Add an assert to prevent __packed reintroduction scsi: hpsa: Fix boot on ia64 (atomic_t alignment) scsi: hpsa: Use __packed on individual structs, not header-wide
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fixes from Michael Ellerman: "Some some more powerpc fixes for 5.12: - Fix an oops triggered by ptrace when CONFIG_PPC_FPU_REGS=n - Fix an oops on sigreturn when the VDSO is unmapped on 32-bit - Fix vdso_wrapper.o not being rebuilt everytime vdso.so is rebuilt Thanks to Christophe Leroy" * tag 'powerpc-5.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/vdso: Make sure vdso_wrapper.o is rebuilt everytime vdso.so is rebuilt powerpc/signal32: Fix Oops on sigreturn with unmapped VDSO powerpc/ptrace: Don't return error when getting/setting FP regs without CONFIG_PPC_FPU_REGS
-
Linus Torvalds authored
Merge tag 'driver-core-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fix from Greg KH: "Here is a single driver core fix for 5.12-rc7 to resolve a reported problem that caused some devices to lockup when booting. It has been in linux-next with no reported issues" * tag 'driver-core-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: Fix locking bug in deferred_probe_timeout_work_func()
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds authored
Pull USB/Thunderbolt fixes from Greg KH: "Here are a few small USB and Thunderbolt driver fixes for 5.12-rc7 for reported issues: - thunderbolt leaks and off-by-one fix - cdnsp deque fix - usbip fixes for syzbot-reported issues All have been in linux-next with no reported problems" * tag 'usb-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usbip: synchronize event handler with sysfs code paths usbip: vudc synchronize sysfs code paths usbip: stub-dev synchronize sysfs code paths usbip: add sysfs_lock to synchronize sysfs code paths thunderbolt: Fix off by one in tb_port_find_retimer() thunderbolt: Fix a leak in tb_retimer_add() usb: cdnsp: Fixes issue with dequeuing requests after disabling endpoint
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fixes from Wolfram Sang: "A mixture of driver and documentation bugfixes for I2C" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx: mention Oleksij as maintainer of the binding docs i2c: exynos5: correct top kerneldoc i2c: designware: Adjust bus_freq_hz when refuse high speed mode set i2c: hix5hd2: use the correct HiSilicon copyright i2c: gpio: update email address in binding docs i2c: imx: drop me as maintainer of binding docs i2c: stm32f4: Mundane typo fix I2C: JZ4780: Fix bug for Ingenic X1000. i2c: turn recovery error on init to debug
-
Naohiro Aota authored
Moves the location of the superblock logging zones. The new locations of the logging zones are now determined based on fixed block addresses instead of on fixed zone numbers. The old placement method based on fixed zone numbers causes problems when one needs to inspect a file system image without access to the drive zone information. In such case, the super block locations cannot be reliably determined as the zone size is unknown. By locating the superblock logging zones using fixed addresses, we can scan a dumped file system image without the zone information since a super block copy will always be present at or after the fixed known locations. Introduce the following three pairs of zones containing fixed offset locations, regardless of the device zone size. - primary superblock: offset 0B (and the following zone) - first copy: offset 512G (and the following zone) - Second copy: offset 4T (4096G, and the following zone) If a logging zone is outside of the disk capacity, we do not record the superblock copy. The first copy position is much larger than for a non-zoned filesystem, which is at 64M. This is to avoid overlapping with the log zones for the primary superblock. This higher location is arbitrary but allows supporting devices with very large zone sizes, plus some space around in between. Such large zone size is unrealistic and very unlikely to ever be seen in real devices. Currently, SMR disks have a zone size of 256MB, and we are expecting ZNS drives to be in the 1-4GB range, so this limit gives us room to breathe. For now, we only allow zone sizes up to 8GB. The maximum zone size that would still fit in the space is 256G. The fixed location addresses are somewhat arbitrary, with the intent of maintaining superblock reliability for smaller and larger devices, with the preference for the latter. For this reason, there are two superblocks under the first 1T. This should cover use cases for physical devices and for emulated/device-mapper devices. The superblock logging zones are reserved for superblock logging and never used for data or metadata blocks. Note that we only reserve the two zones per primary/copy actually used for superblock logging. We do not reserve the ranges of zones possibly containing superblocks with the largest supported zone size (0-16GB, 512G-528GB, 4096G-4112G). The zones containing the fixed location offsets used to store superblocks on a non-zoned volume are also reserved to avoid confusion. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linuxLinus Torvalds authored
Pull clk fixes from Stephen Boyd: "Here's the latest pile of clk driver and clk framework fixes for this release: - Two clk framework fixes for a long standing issue in clk_notifier_{register,unregister}() where we used a pointer that was for a struct containing a list head when there was no container struct - A compile warning fix for socfpga that's good to have - A double free problem with devm registered fixed factor clks - One last fix to the Qualcomm camera clk driver to use the right clk ops so clks don't get stuck and stop working because the firmware takes them for a ride" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: fixed: fix double free in resource managed fixed-factor clock clk: fix invalid usage of list cursor in unregister clk: fix invalid usage of list cursor in register clk: qcom: camcc: Update the clock ops for the SC7180 clk: socfpga: fix iomem pointer cast on 64-bit
-
Linus Torvalds authored
Merge tag 'perf-tools-fixes-for-v5.12-2020-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tool fixes from Arnaldo Carvalho de Melo: - Fix wrong LBR block sorting in 'perf report' - Fix 'perf inject' repipe usage when consuming perf.data files - Avoid potential buffer overrun when decoding ARM SPE hardware tracing packets, bug found using a fuzzer * tag 'perf-tools-fixes-for-v5.12-2020-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf arm-spe: Avoid potential buffer overrun perf report: Fix wrong LBR block sorting perf inject: Fix repipe usage
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: mm (kasan, gup, pagecache, and kfence), MAINTAINERS, mailmap, nds32, gcov, ocfs2, ia64, and lib" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: lib: fix kconfig dependency on ARCH_WANT_FRAME_POINTERS kfence, x86: fix preemptible warning on KPTI-enabled systems lib/test_kasan_module.c: suppress unused var warning kasan: fix conflict with page poisoning fs: direct-io: fix missing sdio->boundary ia64: fix user_stack_pointer() for ptrace() ocfs2: fix deadlock between setattr and dio_end_io_write gcov: re-fix clang-11+ support nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff mm/gup: check page posion status for coredump. .mailmap: fix old email addresses mailmap: update email address for Jordan Crouse treewide: change my e-mail address, fix my name MAINTAINERS: update CZ.NIC's Turris information
-