Commits · b303fe2e5a3802b0b1fb8d997e5c9caef48f6dd8 · Kirill Smelkov / linux

12 Apr, 2021 36 commits

io_uring: track inflight requests through counter · b303fe2e

Pavel Begunkov authored Apr 11, 2021

Instead of keeping requests in an inflight_list, just track them with a
per tctx atomic counter. Apart from it being much easier and more
consistent with task cancel, it frees ->inflight_entry from being shared
between iopoll and cancel-track, so less headache for us.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3c2ee0863cd7eeefa605f3eaff4c1c461a6f1157.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b303fe2e
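
The counter-based tracking described in this commit can be sketched in userspace C11 as follows. This is only an illustration under stated assumptions, not the kernel code: struct task_ctx, struct request, and all three helper names are hypothetical stand-ins, and C11 atomics replace the kernel's atomic_t API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task io_uring context (tctx). */
struct task_ctx {
    atomic_long inflight_tracked;   /* number of tracked requests in flight */
};

/* Hypothetical stand-in for a request. */
struct request {
    struct task_ctx *tctx;
    bool tracked;                   /* set once the request has been counted */
};

/* Start tracking: bump the counter instead of linking a list node. */
static void req_track_inflight(struct request *req)
{
    if (!req->tracked) {
        req->tracked = true;
        atomic_fetch_add(&req->tctx->inflight_tracked, 1);
    }
}

/* Stop tracking when the request completes. */
static void req_untrack_inflight(struct request *req)
{
    if (req->tracked) {
        long old = atomic_fetch_sub(&req->tctx->inflight_tracked, 1);

        assert(old > 0);            /* the counter must never go negative */
        req->tracked = false;
    }
}

/* A cancel loop only needs to know whether anything is still in flight. */
static bool tctx_has_inflight(struct task_ctx *tctx)
{
    return atomic_load(&tctx->inflight_tracked) != 0;
}
```

With a counter, a cancel loop can poll tctx_has_inflight() without walking or locking a list, which is the simplification the message points at.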

io_uring: unify task and files cancel loops · 368b2080

Pavel Begunkov authored Apr 11, 2021

Move tracked inflight number check up the stack into
__io_uring_files_cancel() so it's similar to task cancel. Will be used
for further cleaning.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dca5a395efebd1e3e0f3bbc6b9640c5e8aa7e468.1618101759.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

368b2080

io_uring: simplify apoll hash removal · 0ea13b44

Pavel Begunkov authored Apr 09, 2021

hash_del() works well with non-hashed nodes, there's no need to check
if it is hashed first.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0ea13b44
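
Why removal is safe on a non-hashed node can be sketched with a small userspace hash-list. The types and helper names below are hypothetical; they only mimic the pprev-based layout of kernel hlists, where a NULL pprev marks a node as unhashed and the delete helper checks for that itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace mimic of a hash-list node and bucket head. */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

static void hnode_init(struct hnode *n)
{
    n->next = NULL;
    n->pprev = NULL;                /* NULL pprev means "not hashed" */
}

static bool hnode_unhashed(const struct hnode *n)
{
    return n->pprev == NULL;
}

static void hnode_add_head(struct hnode *n, struct hhead *h)
{
    assert(hnode_unhashed(n));      /* adding a node twice would corrupt the list */
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Removal that tolerates a node that was never hashed or is already removed. */
static void hnode_del_init(struct hnode *n)
{
    if (hnode_unhashed(n))
        return;                     /* nothing to do, so callers need no check */
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    hnode_init(n);
}
```

Because the delete helper performs the unhashed check internally, a caller-side "is it hashed?" test before removal is redundant.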

io_uring: refactor io_poll_complete() · e27414be

Pavel Begunkov authored Apr 09, 2021

Remove error parameter from io_poll_complete(), 0 is always passed,
and do a bit of cleaning on top.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e27414be

io_uring: clean up io_poll_task_func() · f40b964a

Pavel Begunkov authored Apr 09, 2021

io_poll_complete() always fills an event (even an overflowed one), so we
should always call io_cqring_ev_posted() afterwards. That is what
currently happens anyway, because the second EPOLLONESHOT check is always
true: it can't return !done for oneshots.

Remove that branching; the result is much easier to read.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f40b964a

io-wq: Fix io_wq_worker_affinity() · e0051d7d

Peter Zijlstra authored Apr 08, 2021

Do not include private headers and do not frob in internals.

On top of that, while the previous code restores the affinity, it
doesn't ensure the task actually moves there if it was running,
leading to the fun situation that it can be observed running outside
of its allowed mask for potentially significant time.

Use the proper API instead.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/YG7QkiUzlEbW85TU@hirez.programming.kicks-ass.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e0051d7d

io_uring: don't attempt re-add of multishot poll request if racing · cb3b200e

Jens Axboe authored Apr 06, 2021

We currently allow racy updates to multishot requests, but we can end up
double adding the poll request if both completion and update do it.
Ensure that we skip re-add on the update side if someone else is
completing it.

Fixes: b69de288 ("io_uring: allow events and user_data update of running poll requests")
Reported-by: Joakim Hassila <joj@mac.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cb3b200e

io-wq: simplify code in __io_worker_busy() · 417b5052

Hao Xu authored Apr 06, 2021

Leverage XOR to simplify the code in __io_worker_busy.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1617678525-3129-1-git-send-email-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

417b5052
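
The kind of simplification XOR allows can be sketched as below. This is not the kernel's actual code: the flag names and both helper functions are hypothetical, and the sketch only shows that a two-way "bound vs. unbound mismatch" test collapses to a single XOR of two booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's names and values differ. */
#define WORKER_F_BOUND (1u << 0)
#define WORK_F_UNBOUND (1u << 0)

/* Long form: test both directions of the mismatch separately. */
static bool needs_switch_branchy(unsigned int worker_flags, unsigned int work_flags)
{
    bool worker_bound = (worker_flags & WORKER_F_BOUND) != 0;
    bool work_bound = (work_flags & WORK_F_UNBOUND) == 0;

    if (worker_bound && !work_bound)
        return true;
    if (!worker_bound && work_bound)
        return true;
    return false;
}

/* XOR form: true exactly when the two booleans differ. */
static bool needs_switch_xor(unsigned int worker_flags, unsigned int work_flags)
{
    bool worker_bound = (worker_flags & WORKER_F_BOUND) != 0;
    bool work_bound = (work_flags & WORK_F_UNBOUND) == 0;

    return worker_bound ^ work_bound;
}

/* Exhaustively confirm that both forms agree on all four combinations. */
static void check_equivalent(void)
{
    for (unsigned int w = 0; w < 2; w++)
        for (unsigned int k = 0; k < 2; k++)
            assert(needs_switch_branchy(w, k) == needs_switch_xor(w, k));
}
```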

io_uring: kill outdated comment about splice punt · 53a31267

Pavel Begunkov authored Apr 01, 2021

The splice/tee comment in io_prep_async_work() isn't relevant since the
section was moved, delete it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/892a549c89c3d422b679677b8e68ffd3fcb736b6.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

53a31267

io_uring: encapsulate fixed files into struct · a04b0ac0

Pavel Begunkov authored Apr 01, 2021

Add struct io_fixed_file representing a single registered file, first to
hide ugly struct file **, which may be misleading, and secondly to
retype it to unsigned long as conversions to it and back to file * for
handling and masking FFS_* flags are getting nasty.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/78669731a605a7614c577c3de552631cfaf0869a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a04b0ac0
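
Storing a file pointer as an integer with flag bits in its low bits might look roughly like the sketch below. It is an assumption-laden illustration: FFS_ASYNC_READ is named in this series, but FFS_ASYNC_WRITE, FFS_MASK, the flag values, the struct layout, and the helper names are assumed here, and uintptr_t stands in for the kernel's unsigned long.

```c
#include <assert.h>
#include <stdint.h>

struct file;                        /* opaque, as seen by the table code */

/* Assumed flag values living in the low, alignment-guaranteed pointer bits. */
#define FFS_ASYNC_READ  ((uintptr_t)0x1)
#define FFS_ASYNC_WRITE ((uintptr_t)0x2)
#define FFS_MASK        (~(FFS_ASYNC_READ | FFS_ASYNC_WRITE))

/* One registered file: pointer and flags packed into a single integer. */
struct fixed_file {
    uintptr_t file_ptr;
};

static void fixed_file_set(struct fixed_file *slot, struct file *file,
                           uintptr_t flags)
{
    /* The pointer must leave the flag bits free, and flags must fit in them. */
    assert(((uintptr_t)file & ~FFS_MASK) == 0);
    assert((flags & FFS_MASK) == 0);
    slot->file_ptr = (uintptr_t)file | flags;
}

static struct file *fixed_file_get(const struct fixed_file *slot)
{
    return (struct file *)(slot->file_ptr & FFS_MASK);
}

static uintptr_t fixed_file_flags(const struct fixed_file *slot)
{
    return slot->file_ptr & ~FFS_MASK;
}
```

Keeping the packed value as an integer means the masking happens in two small accessors instead of repeated casts between struct file * and an integer type at every use site.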

io_uring: refactor file tables alloc/free · 846a4ef2

Pavel Begunkov authored Apr 01, 2021

Introduce a helper io_free_file_tables() doing all the cleaning, there
are several places where it's hand coded. Also move all allocations into
io_sqe_alloc_file_tables() and rename it, so all of it is in one place.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/502a84ebf41ff119b095e59661e678eacb752bf8.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

846a4ef2

io_uring: don't quiesce intial files register · f4f7d21c

Pavel Begunkov authored Apr 01, 2021

There is no reason why we would want to fully quiesce ring on
IORING_REGISTER_FILES, if it's already registered we fail.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/563bb8060bb2d3efbc32fce6101678281c574d2a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f4f7d21c

io_uring: set proper FFS* flags on reg file update · 9a321c98

Pavel Begunkov authored Apr 01, 2021

Set FFS_* flags (e.g. FFS_ASYNC_READ) not only in initial registration
but also on registered files update. Not a bug, but without it updated
files may miss out on the benefit of the feature.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/df29a841a2d3d3695b509cdffce5070777d9d942.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9a321c98

io_uring: deduplicate NOSIGNAL setting · 04411806

Pavel Begunkov authored Apr 01, 2021

Set MSG_NOSIGNAL and REQ_F_NOWAIT in send/recv prep routines and don't
duplicate it in all four send/recv handlers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1133a3ed1c0e192975b7341ea4b0bf91f63b132.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

04411806

io_uring: put link timeout req consistently · df9727af

Pavel Begunkov authored Apr 01, 2021

Don't put linked timeout req in io_async_find_and_cancel() but do it in
io_link_timeout_fn(), so we have only one point for that and won't have
to do it differently as is done now (put vs put_deferred). Also slightly
improve io_async_find_and_cancel()'s locking.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d75b70957f245275ab7cba83e0ac9c1b86aae78a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

df9727af

io_uring: simplify overflow handling · c4ea060e

Pavel Begunkov authored Apr 01, 2021

Overflowed CQEs don't lock requests anymore, so we don't care so much
about cancelling them, so kill cq_overflow_flushed and simplify the
code.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5799867aeba9e713c32f49aef78e5e1aef9fbc43.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c4ea060e

io_uring: lock annotate timeouts and poll · e07785b0

Pavel Begunkov authored Apr 01, 2021

Add timeout and poll ->completion_lock annotations for Sparse, which
makes life easier while looking at the functions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2345325643093d41543383ba985a735aeb899eac.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e07785b0

io_uring: kill unused forward decls · 47e90392

Pavel Begunkov authored Apr 01, 2021

Kill unused forward declarations for io_ring_file_put() and
io_queue_next(). Also btw rename the first one.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/64aa27c3f9662e14615cc119189f5eaf12989671.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

47e90392

io_uring: store reg buffer end instead of length · 4751f53d

Pavel Begunkov authored Apr 01, 2021

It's a bit more convenient for us to store a registered buffer end
address instead of length, see struct io_mapped_ubuf, as it allows us to
avoid recomputing it every time.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/39164403fe92f1dc437af134adeec2423cdf9395.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4751f53d
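
The benefit of caching the end address can be sketched as follows. The struct and helper names are hypothetical and much simpler than struct io_mapped_ubuf; the point is only that a range check compares against a stored end instead of recomputing start + length on each call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified registered buffer: start and cached end address. */
struct mapped_buf {
    uint64_t ubuf;                  /* first byte of the buffer */
    uint64_t ubuf_end;              /* one past the last byte, computed once */
};

static void mapped_buf_init(struct mapped_buf *b, uint64_t addr, uint64_t len)
{
    assert(addr + len >= addr);     /* registration must reject wrapping ranges */
    b->ubuf = addr;
    b->ubuf_end = addr + len;
}

/* Is [addr, addr + len) fully inside the registered buffer? */
static bool mapped_buf_contains(const struct mapped_buf *b,
                                uint64_t addr, uint64_t len)
{
    uint64_t end = addr + len;

    if (end < addr)                 /* the requested range itself wraps */
        return false;
    return addr >= b->ubuf && end <= b->ubuf_end;
}
```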

io_uring: improve import_fixed overflow checks · 75769e3f

Pavel Begunkov authored Apr 01, 2021

Replace a hand-coded overflow check with a specialised function. Even
though compilers are smart enough to generate identical binary code (i.e.
check the carry bit), it's more foolproof and conveys the intention
better.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e437dcdc929bacbb6f11a4824ecbbf17225cb82a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

75769e3f
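
The difference between a hand-coded check and a dedicated overflow helper can be sketched in userspace. The function names are hypothetical, and the GCC/Clang builtin __builtin_add_overflow() is used here as a stand-in for the kernel's own overflow helper; both variants report the same result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hand-coded: compute the sum, then infer wraparound from the result. */
static bool range_end_handcoded(uint64_t addr, uint64_t len, uint64_t *end)
{
    *end = addr + len;
    return *end < addr;             /* true if the addition wrapped */
}

/* Dedicated helper: the intent ("did this add overflow?") is explicit. */
static bool range_end_builtin(uint64_t addr, uint64_t len, uint64_t *end)
{
    return __builtin_add_overflow(addr, len, end);
}

/* Both variants must agree on the overflow flag and the wrapped sum. */
static void check_agree(uint64_t addr, uint64_t len)
{
    uint64_t e1, e2;
    bool o1 = range_end_handcoded(addr, len, &e1);
    bool o2 = range_end_builtin(addr, len, &e2);

    assert(o1 == o2 && e1 == e2);
}
```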

io_uring: refactor io_async_cancel() · 0aec38fd

Pavel Begunkov authored Apr 01, 2021

Remove extra tctx==NULL checks that are already done by
io_async_cancel_one().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/70c2a8b958d942e86958a28af0452966ce1095b0.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0aec38fd

io_uring: remove unused hash_wait · e146a4a3

Pavel Begunkov authored Apr 01, 2021

No users of io_uring_ctx::hash_wait left, kill it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e25cb83c233a5f75f15275596b49fbafbea606fa.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e146a4a3

io_uring: better ref handling in poll_remove_one · 7394161c

Pavel Begunkov authored Apr 01, 2021

Instead of io_put_req() to drop not a final ref, use req_ref_put(),
which is slimmer and will also check the invariant.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/85b5774ce13ae55cc2e705abdc8cbafe1212f1bd.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7394161c

io_uring: combine lock/unlock sections on exit · 89b5066e

Pavel Begunkov authored Apr 01, 2021

io_ring_exit_work() already does uring_lock lock/unlock, no need to
repeat it for lock waiting trick in io_ring_ctx_free(). Move the waiting
with comments and spinlocking into io_ring_exit_work.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a8ae0589b0ea64ad4791e2c282e4e9b713dd7024.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

89b5066e

io_uring: remove useless is_dying check on quiesce · 215c3902

Pavel Begunkov authored Apr 01, 2021

rsrc_data refs should always be valid for potential submitters,
io_rsrc_ref_quiesce() restores it before unlocking, so
percpu_ref_is_dying() check in io_sqe_files_unregister() does nothing
and is misleading. Concurrent quiesce is prevented with
struct io_rsrc_data::quiesce.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bf97055e1748ee3a382e66daf384a469eb90b931.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

215c3902

io_uring: reuse io_rsrc_node_destroy() · 28a9fe25

Pavel Begunkov authored Apr 01, 2021

Reuse io_rsrc_node_destroy() in __io_rsrc_put_work(). Also move it to a
more appropriate place -- to the other node routines, and remove forward
declaration.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cccafba41aee1e5bb59988704885b1340aef3a27.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

28a9fe25

io_uring: ctx-wide rsrc nodes · a7f0ed5a

Pavel Begunkov authored Apr 01, 2021

If we're going to ever support multiple types of resources we need
shared rsrc nodes to not bloat requests, that is implemented in this
patch. It also gives a nicer API and saves one pointer dereference
in io_req_set_rsrc_node().

We may say that all requests bound to a resource belong to one and only
one rsrc node, and considering that nodes are removed and recycled
strictly in-order, this separates requests into generations, where the
generation changes on each node switch (i.e. io_rsrc_node_switch()).

The API is simple: io_rsrc_node_switch() switches to a new generation if
needed, and also optionally kills a passed-in io_rsrc_data. Each call to
io_rsrc_node_switch() has to be preceded by
io_rsrc_node_switch_start(). The start function is idempotent and need
not be followed by a switch.

One difference is that once a node is set, the ctx always retains a valid
rsrc node, even on unregister. It may be a nuisance at the moment, but
makes much sense for multiple types of resources. Another change is that
nodes are bound to/associated with an io_rsrc_data later, just before
killing (i.e. switching).
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e9c693b4b9a2f47aa784b616ce29843021bb65a.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a7f0ed5a

io_uring: refactor io_queue_rsrc_removal() · e7c78371

Pavel Begunkov authored Apr 01, 2021

Pass rsrc_node into io_queue_rsrc_removal() explicitly. Just a
simple preparation patch, makes following changes nicer.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/002889ce4de7baf287f2b010eef86ffe889174c6.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e7c78371

io_uring: move rsrc_put callback into io_rsrc_data · 40ae0ff7

Pavel Begunkov authored Apr 01, 2021

io_rsrc_node's callback operates only on a single io_rsrc_data and only
with its resources, so rsrc_put() callback is actually a property of
io_rsrc_data. Move it there, it makes code much nicer.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9417c2fba3c09e8668f05747006a603d416d34b4.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

40ae0ff7

io_uring: encapsulate rsrc node manipulations · 82fbcfa9

Pavel Begunkov authored Apr 01, 2021

io_rsrc_node_get() and io_rsrc_node_set() are always used together,
merge them into one so most users don't even see io_rsrc_node and don't
need to care about it.

It helped to catch io_sqe_files_register() inferring rsrc data argument
for get and set differently, not a problem but a good sign.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0827b080b2e61b3dec795380f7e1a1995595d41f.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

82fbcfa9

io_uring: use rsrc prealloc infra for files reg · f3baed39

Pavel Begunkov authored Apr 01, 2021

Keep it consistent with update and use io_rsrc_node_prealloc() +
io_rsrc_node_get() in io_sqe_files_register() as well. That will be used
in future patches, is less error prone, and allows deduplicating
rsrc_node init.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cf87321e6be5e38f4dc7fe5079d2aa6945b1ace0.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f3baed39

io_uring: simplify io_rsrc_node_ref_zero · 221aa924

Pavel Begunkov authored Apr 01, 2021

Replace queue_delayed_work() with mod_delayed_work() in
io_rsrc_node_ref_zero() as the latter can schedule a new work, and
clean it up further for better readability.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3b2b23e3a1ea4bbf789cd61815d33e05d9ff945e.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

221aa924

io_uring: name rsrc bits consistently · b895c9a6

Pavel Begunkov authored Apr 01, 2021

Keep resource related structs' and functions' naming consistent, in
particular use "io_rsrc" prefix for everything.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/962f5acdf810f3a62831e65da3932cde24f6d9df.1617287883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b895c9a6

io-wq: cancel task_work on exit only targeting the current 'wq' · c80ca470

Jens Axboe authored Apr 01, 2021

When using task_work_cancel(), we're potentially canceling task_work
that isn't related to this specific io_wq. Use the newly added
task_work_cancel_match() to ensure that we only remove and cancel work
items that are specific to this io_wq.

Fixes: 685fe7fe ("io-wq: eliminate the need for a manager thread")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c80ca470

task_work: add helper for more targeted task_work canceling · c7aab1a7

Jens Axboe authored Apr 01, 2021

The only exported helper we have right now is task_work_cancel(), which
cancels any task_work from a given task where func matches the queued
work item. This is a bit too coarse for some use cases. Add a
task_work_cancel_match() that allows individual work items to be
targeted more specifically, beyond just the callback function used.

task_work_cancel() can be trivially implemented on top of that, hence do
so.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c7aab1a7
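
The relationship between a predicate-based cancel and the function-only cancel can be sketched in userspace. Everything below is a simplified assumption: the work queue is a plain singly linked list without locking, and the struct, the owner field, and all helper names are hypothetical rather than the kernel's task_work types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct work_item;
typedef void (*work_fn)(struct work_item *);

/* Hypothetical simplified work item; "owner" plays the role of a specific wq. */
struct work_item {
    struct work_item *next;
    work_fn func;
    void *owner;
};

struct work_queue { struct work_item *head; };

typedef bool (*work_match_fn)(struct work_item *item, void *data);

static void work_add(struct work_queue *q, struct work_item *item)
{
    item->next = q->head;
    q->head = item;
}

/* Unlink and return the first item accepted by match(), or NULL if none. */
static struct work_item *work_cancel_match(struct work_queue *q,
                                           work_match_fn match, void *data)
{
    assert(match != NULL);
    for (struct work_item **pp = &q->head; *pp; pp = &(*pp)->next) {
        struct work_item *item = *pp;

        if (match(item, data)) {
            *pp = item->next;
            item->next = NULL;
            return item;
        }
    }
    return NULL;
}

/* Predicate reproducing the coarse behaviour: match on the callback only. */
static bool match_func(struct work_item *item, void *data)
{
    return item->func == *(work_fn *)data;
}

/* The function-only cancel is a thin wrapper over the match variant. */
static struct work_item *work_cancel(struct work_queue *q, work_fn func)
{
    return work_cancel_match(q, match_func, &func);
}

/* A more targeted predicate: match only items that belong to one owner. */
static bool match_owner(struct work_item *item, void *data)
{
    return item->owner == data;
}

/* Two distinct dummy callbacks, used only to give items different funcs. */
static int runs_a, runs_b;
static void work_a(struct work_item *w) { (void)w; runs_a++; }
static void work_b(struct work_item *w) { (void)w; runs_b++; }
```

With a predicate such as match_owner(), a caller can remove only the items tied to its own queue even when other items share the same callback.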

io_uring: fix race around poll update and poll triggering · b2e720ac

Jens Axboe authored Mar 31, 2021

Joakim reports that in some conditions he sees a multishot poll request
being canceled, and that it coincides with getting -EALREADY on
modification. As part of the poll update procedure, there's a small window
where the request is marked as canceled, and if this coincides with the
event actually triggering, then we can get a spurious -ECANCELED and
termination of the multishot request.

Don't mark the poll request as being canceled for update. We also don't
care if we race on removal unless it's a one-shot request, we can safely
update in either case.

Fixes: b69de288 ("io_uring: allow events and user_data update of running poll requests")
Reported-by: Joakim Hassila <joj@mac.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b2e720ac

11 Apr, 2021 4 commits

io_uring: reg buffer overflow checks hardening · 50e96989

Pavel Begunkov authored Mar 24, 2021

We are safe with overflows in io_sqe_buffer_register() because it will
just yield alloc failure, but it's nicer to check explicitly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2b0625551be3d97b80a5fd21c8cd79dc1c91f0b5.1616624589.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

50e96989

io_uring: allow SQPOLL without CAP_SYS_ADMIN or CAP_SYS_NICE · 548d819d

Jens Axboe authored Mar 25, 2021

Now that we have any worker being attached to the original task as
threads, accounting of CPU time is directly attributed to the original
task as well. This means that we no longer have to restrict SQPOLL to
needing elevated privileges, as it's really no different from just having
the task spawn a busy looping thread in userspace.
Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

548d819d

io-wq: eliminate the need for a manager thread · 685fe7fe

Jens Axboe authored Mar 08, 2021

io-wq relies on a manager thread to create/fork new workers, as needed.
But there's really no strong need for it anymore. We have the following
cases that fork a new worker:

1) Work queue. This is done from the task itself always, and it's trivial
to create a worker off that path, if needed.

2) All workers have gone to sleep, and we have more work. This is called
off the sched out path. For this case, use a task_work item to queue
a fork-worker operation.

3) Hashed work completion. Don't think we need to do anything off this
case. If need be, it could just use approach 2 as well.

Part of this change is incrementing the running worker count before the
fork, to avoid cases where we observe we need a worker and then queue
creation of one. Then new work comes in, we fork a new one. That last
queue operation should have waited for the previous worker to come up,
it's quite possible we don't even need it. Hence move the worker running
from before we fork it off to more efficiently handle that case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

685fe7fe

kernel: allow fork with TIF_NOTIFY_SIGNAL pending · 66ae0d1e

Jens Axboe authored Mar 22, 2021

fork() fails if signal_pending() is true, but there are two conditions
that can lead to that:

1) An actual signal is pending. We want fork to fail for that one, like
   we always have.

2) TIF_NOTIFY_SIGNAL is pending, because the task has pending task_work.
   We don't need to make it fail for that case.

Allow fork() to proceed if just task_work is pending, by changing the
signal_pending() check to task_sigpending().
Signed-off-by: Jens Axboe <axboe@kernel.dk>

66ae0d1e
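
The two conditions listed in this commit can be sketched with a pair of flag bits. This is a simplified userspace model, not kernel code: struct task, the FLAG_* values, and the may_fork_*() helpers are hypothetical, while signal_pending() and task_sigpending() reuse the names from the message with their bodies reduced to the distinction described there.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; the kernel's thread-info flags differ. */
#define FLAG_SIGPENDING    (1u << 0)   /* an actual signal is pending */
#define FLAG_NOTIFY_SIGNAL (1u << 1)   /* only task_work is pending */

struct task { unsigned int flags; };

/* True only when a real signal is pending. */
static bool task_sigpending(const struct task *t)
{
    return (t->flags & FLAG_SIGPENDING) != 0;
}

/* True for a real signal or for pending task_work notification. */
static bool signal_pending(const struct task *t)
{
    if (t->flags & FLAG_NOTIFY_SIGNAL)
        return true;
    return task_sigpending(t);
}

/* Old behaviour: any pending condition makes fork fail. */
static bool may_fork_old(const struct task *t)
{
    return !signal_pending(t);
}

/* New behaviour: only a real signal makes fork fail. */
static bool may_fork_new(const struct task *t)
{
    return !task_sigpending(t);
}

/* Walk through both cases from the commit message. */
static void demo(void)
{
    struct task t = { FLAG_NOTIFY_SIGNAL };

    assert(!may_fork_old(&t));      /* case 2 used to fail */
    assert(may_fork_new(&t));       /* case 2 now proceeds */
    t.flags |= FLAG_SIGPENDING;
    assert(!may_fork_new(&t));      /* case 1 still fails */
}
```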