1. 08 Mar, 2024 2 commits
    • io_uring/net: remove dependency on REQ_F_PARTIAL_IO for sr->done_io · 9817ad85
      Jens Axboe authored
      Ensure that prep handlers always initialize sr->done_io before any
      potential failure conditions, and with that, we know it has always been
      set, even in the failure case.
      
      With that, we no longer need the REQ_F_PARTIAL_IO flag to gate on whether
      sr->done_io is valid. Additionally, req->cqe.res should not be overwritten
      unless sr->done_io is actually positive.
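
      As a minimal illustration of the pattern (names follow io_uring/net.c, but
      this is a condensed sketch rather than the exact diff):

      int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
      {
              struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);

              /* set before any early-return failure path, so it is always valid */
              sr->done_io = 0;

              /* ... remaining prep work and its failure checks ... */
              return 0;
      }

      static void io_sendrecv_fail(struct io_kiocb *req)
      {
              struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);

              /* only report partial progress if some was actually made */
              if (sr->done_io > 0)
                      req->cqe.res = sr->done_io;
      }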
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring/net: correctly handle multishot recvmsg retry setup · deaef31b
      Jens Axboe authored
      If we loop for multishot receive on the initial attempt, and then abort
      later on to wait for more, we miss a case where we should be copying the
      io_async_msghdr from the stack to stable storage. This leads to the next
      retry potentially failing, if the application had the msghdr on the
      stack.
      
      Cc: stable@vger.kernel.org
      Fixes: 9bb66906 ("io_uring: support multishot in recvmsg")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 07 Mar, 2024 3 commits
  3. 04 Mar, 2024 2 commits
  4. 01 Mar, 2024 2 commits
    • io_uring/sqpoll: statistics of the true utilization of sq threads · 3fcb9d17
      Xiaobing Li authored
      Count the running time and the actual IO processing time of the sqpoll
      thread, and output the statistics to fdinfo.

      Variable description:
      "work_time" in the code is the cumulative time, in jiffies, that the sq
      thread spends actually processing IO. "total_time" is the total time that
      has elapsed from the start of the sq thread's loop to the current point.
      The ratio of the two therefore gives the thread's true utilization.
      
      The test tool is fio, and its parameters are as follows:
      [global]
      ioengine=io_uring
      direct=1
      group_reporting
      bs=128k
      norandommap=1
      randrepeat=0
      refill_buffers
      ramp_time=30s
      time_based
      runtime=1m
      clocksource=clock_gettime
      overwrite=1
      log_avg_msec=1000
      numjobs=1
      
      [disk0]
      filename=/dev/nvme0n1
      rw=read
      iodepth=16
      hipri
      sqthread_poll=1
      
      The test results are as follows:
      Every 2.0s: cat /proc/9230/fdinfo/6 | grep -E Sq
      SqMask: 0x3
      SqHead: 3197153
      SqTail: 3197153
      CachedSqHead:   3197153
      SqThread:       9231
      SqThreadCpu:    11
      SqTotalTime:    18099614
      SqWorkTime:     16748316
      
      The test results corresponding to different iodepths are as follows:
      |-------------|-------|-------|-------|-------|-------|
      |   iodepth   |   1   |   4   |   8   |  16   |  64   |
      |-------------|-------|-------|-------|-------|-------|
      | utilization | 2.9%  | 8.8%  | 10.9% | 92.9% | 84.4% |
      |-------------|-------|-------|-------|-------|-------|
      |    idle     | 97.1% | 91.2% | 89.1% | 7.1%  | 15.6% |
      |-------------|-------|-------|-------|-------|-------|
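
      For reference, the utilization figures above correspond to
      SqWorkTime / SqTotalTime from fdinfo; a small, illustrative userspace
      helper (the pid/fd path is hardcoded to match the example output above):

      #include <stdio.h>

      int main(void)
      {
              unsigned long long total = 0, work = 0;
              char line[256];
              FILE *f = fopen("/proc/9230/fdinfo/6", "r");

              if (!f)
                      return 1;
              while (fgets(line, sizeof(line), f)) {
                      /* pick out the two new fields, ignore everything else */
                      sscanf(line, "SqTotalTime: %llu", &total);
                      sscanf(line, "SqWorkTime: %llu", &work);
              }
              fclose(f);
              if (total)
                      printf("utilization: %.1f%%\n", 100.0 * work / total);
              return 0;
      }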
      Signed-off-by: Xiaobing Li <xiaobing.li@samsung.com>
      Link: https://lore.kernel.org/r/20240228091251.543383-1-xiaobing.li@samsung.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring/net: move recv/recvmsg flags out of retry loop · eb18c29d
      Jens Axboe authored
      The flags don't change, so just initialize them once rather than on every
      loop iteration for multishot.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 27 Feb, 2024 4 commits
    • io_uring/kbuf: flag request if buffer pool is empty after buffer pick · c3f9109d
      Jens Axboe authored
      Normally we do an extra roundtrip for retries even if the buffer pool has
      been depleted, as we don't check for that upfront. Rather than add such a
      check, have the buffer selection methods mark the request with
      REQ_F_BL_EMPTY if the buffer group in use is out of buffers after this
      selection. This is very cheap to do since we're already inside the
      selection path anyway, and it gives the caller a chance to make better
      decisions on how to proceed.
      
      For example, recv/recvmsg multishot could check this flag when it
      decides whether to keep receiving or not.
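
      For illustration, such a check could look like the following (a
      hypothetical helper, not in-tree code):

      static inline bool io_mshot_should_rearm(struct io_kiocb *req, int ret)
      {
              /* the buffer group ran dry on the last pick - don't blindly rearm */
              if (req->flags & REQ_F_BL_EMPTY)
                      return false;
              /* otherwise keep the multishot armed while receives succeed */
              return ret > 0;
      }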
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring/net: improve the usercopy for sendmsg/recvmsg · 792060de
      Jens Axboe authored
      We're spending a considerable amount of the sendmsg/recvmsg time just
      copying in the message header, and, for provided buffers, the known
      single-entry iovec.

      Be a bit smarter about it and enable/disable user access around our
      copying. In a test case that does both sendmsg and recvmsg, the runtimes
      before and after this change (averaged over multiple runs, with very
      stable times) are:
      
      Kernel		Time		Diff
      ====================================
      -git		4720 usec
      -git+commit	4311 usec	-8.7%
      
      and looking at a profile diff, we see the following:
      
      0.25%     +9.33%  [kernel.kallsyms]     [k] _copy_from_user
      4.47%     -3.32%  [kernel.kallsyms]     [k] __io_msg_copy_hdr.constprop.0
      
      where we drop more than 9% of _copy_from_user() time and consequently
      add time to __io_msg_copy_hdr(), to which the copies are now attributed,
      for a net win of 6%.
      
      In comparison, the same test case with send/recv runs in 3745 usec, which
      is (expectedly) still quite a bit faster. But at least sendmsg/recvmsg is
      now only ~13% slower, where it was ~21% slower before.
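
      Enabling/disabling user access around the copy, as described above, maps
      to the kernel's user_access_begin() / unsafe_get_user() pattern; a
      condensed, illustrative sketch (not the exact diff):

      static int io_copy_msghdr_from_user(struct user_msghdr *msg,
                                          struct user_msghdr __user *umsg)
      {
              if (!user_access_begin(umsg, sizeof(*umsg)))
                      return -EFAULT;
              /* one access window for the whole header instead of many copies */
              unsafe_get_user(msg->msg_name, &umsg->msg_name, ua_end);
              unsafe_get_user(msg->msg_namelen, &umsg->msg_namelen, ua_end);
              unsafe_get_user(msg->msg_iov, &umsg->msg_iov, ua_end);
              unsafe_get_user(msg->msg_iovlen, &umsg->msg_iovlen, ua_end);
              unsafe_get_user(msg->msg_control, &umsg->msg_control, ua_end);
              unsafe_get_user(msg->msg_controllen, &umsg->msg_controllen, ua_end);
              unsafe_get_user(msg->msg_flags, &umsg->msg_flags, ua_end);
              user_access_end();
              return 0;
      ua_end:
              user_access_end();
              return -EFAULT;
      }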
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring/net: move receive multishot out of the generic msghdr path · c5597802
      Jens Axboe authored
      Move the actual user_msghdr / compat_msghdr copying into the send and
      receive sides, respectively, so we can move the uaddr receive handling
      into its own handler, and likewise for the multishot-with-buffer-selection
      logic.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring/net: unify how recvmsg and sendmsg copy in the msghdr · 52307ac4
      Jens Axboe authored
      For recvmsg, we roll our own msghdr copy since we support buffer
      selection. This isn't the case for sendmsg right now but, in preparation
      for doing so, make the recvmsg copy helpers generic so we can call them
      from the sendmsg side as well.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 15 Feb, 2024 2 commits
    • io_uring/napi: enable even with a timeout of 0 · b4ccc4dd
      Jens Axboe authored
      1 usec is not as short as it used to be, and it makes sense to allow 0
      for a busy poll timeout - this means just do one polling loop to check if
      we have anything available. Add a separate ->napi_enabled flag to track
      whether napi has been enabled or not.
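
      From userspace this corresponds to something like the following liburing
      call (illustrative; the helper name comes from liburing, not this patch):

      #include <liburing.h>

      /* a 0 usec timeout is now accepted and means "do one polling pass" */
      static int enable_napi_single_pass(struct io_uring *ring)
      {
              struct io_uring_napi napi = {
                      .busy_poll_to = 0,
                      .prefer_busy_poll = 1,
              };

              return io_uring_register_napi(ring, &napi);
      }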
      
      While at it, move the writing of the ctx napi values to after we've
      copied the old values back to userspace. This ensures that if the call
      fails, we'll be left in the same state as before, rather than in some
      indeterminate state.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: kill stale comment for io_cqring_overflow_kill() · 871760eb
      Jens Axboe authored
      This function now deals only with discarding overflow entries on ring
      free and exit, and it no longer returns whether we successfully flushed
      all entries as there's no CQE posting involved anymore. Kill the
      outdated comment.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 14 Feb, 2024 3 commits
    • io_uring/sqpoll: use the correct check for pending task_work · c8d8fc3b
      Jens Axboe authored
      A previous commit moved to using just the private task_work list for
      SQPOLL, but it neglected to update the check for whether we have
      pending task_work. Normally this is fine as we'll attempt to run it
      unconditionally, but if we race with going to sleep AND task_work
      being added, then we certainly need the right check here.
      
      Fixes: af5d68f8 ("io_uring/sqpoll: manage task_work privately")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: wake SQPOLL task when task_work is added to an empty queue · 78f9b61b
      Jens Axboe authored
      If there's no current work on the list, we still need to potentially
      wake the SQPOLL task if it is sleeping. This is ordered with the
      wait queue addition in sqpoll, which adds to the wait queue before
      checking for pending work items.
      
      Fixes: af5d68f8 ("io_uring/sqpoll: manage task_work privately")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring/napi: ensure napi polling is aborted when work is available · 428f1382
      Jens Axboe authored
      While testing io_uring NAPI with DEFER_TASKRUN, I ran into slowdowns and
      stalls in packet delivery. Turns out that while
      io_napi_busy_loop_should_end() aborts appropriately on regular
      task_work, it does not abort if we have local task_work pending.
      
      Move io_has_work() into the private io_uring.h header, and gate whether
      we should continue polling on that as well. This makes NAPI polling on
      send/receive work as designed with IORING_SETUP_DEFER_TASKRUN as well.
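
      A condensed sketch of the resulting check (illustrative, not the exact
      diff):

      static bool io_napi_busy_loop_should_end(void *data, unsigned long start_time)
      {
              struct io_wait_queue *iowq = data;

              if (signal_pending(current))
                      return true;
              /* abort on regular wakeups and on pending (local) task_work alike */
              if (io_should_wake(iowq) || io_has_work(iowq->ctx))
                      return true;

              /* ... busy-poll timeout check elided ... */
              return false;
      }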
      
      Fixes: 8d0c12a8 ("io-uring: add napi busy poll support")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 13 Feb, 2024 1 commit
  9. 09 Feb, 2024 9 commits
  10. 08 Feb, 2024 12 commits