  1. 20 Mar, 2022 1 commit
    • io_uring: recycle provided before arming poll · abdad709
      Jens Axboe authored
      We currently have a race where we recycle the selected buffer if poll
      returns IO_APOLL_OK. But that's too late, as the poll could already be
      triggering or have triggered. If that race happens, then we're putting a
      buffer that's already being used.
      
      Fix this by recycling before we arm poll. This does mean that we'll
      sometimes almost instantly re-select the buffer, but it's rare enough in
      testing that it should not pose a performance issue.
      
      Fixes: b1c62645 ("io_uring: recycle provided buffers if request goes async")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
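The ordering problem described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name below (`demo_pool`, `demo_request`, `demo_arm_poll_fires_now`, and so on) is hypothetical and not a kernel API, and the race is modelled as poll firing synchronously at arm time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model. None of these names exist in the kernel. */
struct demo_pool { int free_count; };
struct demo_request {
    struct demo_pool *pool;
    bool has_buf;     /* request currently owns a selected buffer */
    bool buf_in_use;  /* the retried request is reading into the buffer */
};

void demo_select_buffer(struct demo_request *req)
{
    assert(req->pool->free_count > 0);
    req->pool->free_count--;
    req->has_buf = true;
}

void demo_recycle_buffer(struct demo_request *req)
{
    if (!req->has_buf)
        return;
    req->pool->free_count++;
    req->has_buf = false;
}

/* Models poll triggering immediately: the retry picks a buffer if it has none. */
void demo_arm_poll_fires_now(struct demo_request *req)
{
    if (!req->has_buf)
        demo_select_buffer(req);
    req->buf_in_use = true;
}

/* Fixed ordering: recycle first, then arm. Returns true if state is consistent. */
bool demo_go_async_fixed(struct demo_request *req)
{
    demo_recycle_buffer(req);
    demo_arm_poll_fires_now(req);
    return req->has_buf && req->buf_in_use;
}

/* Old ordering: arm first, recycle afterwards, which is too late. */
bool demo_go_async_racy(struct demo_request *req)
{
    demo_arm_poll_fires_now(req);  /* buffer is now in use */
    demo_recycle_buffer(req);      /* puts back a buffer that is in use */
    return req->has_buf && req->buf_in_use;
}
```

In the fixed ordering the buffer is returned and then re-selected, which matches the "almost instantly re-select" cost mentioned in the message. In the old ordering the pool ends up listing a buffer as free while the request is still using it.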
  2. 18 Mar, 2022 2 commits
  3. 17 Mar, 2022 9 commits
  4. 16 Mar, 2022 4 commits
    • io_uring: cache poll/double-poll state with a request flag · 91eac1c6
      Jens Axboe authored
      With commit "io_uring: cache req->apoll->events in req->cflags" applied,
      we now have just io_poll_remove_entries() dipping into req->apoll when
      it isn't strictly necessary.
      
      Mark poll and double-poll with a flag, so we know if we need to look
      at apoll->double_poll. This avoids pulling in those cachelines if we
      don't need them. The common case is that the poll wake handler already
      removed these entries while hot off the completion path.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
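A rough sketch of the idea, assuming hypothetical flag names and struct layouts (the `DEMO_F_*` bits and `demo_*` types are illustrative, not the kernel's definitions): the removal path consults flags stored in the request itself and only dereferences the separately allocated poll state when a flag says an entry is still queued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits and types, for illustration only. */
#define DEMO_F_SINGLE_POLL (1u << 0)
#define DEMO_F_DOUBLE_POLL (1u << 1)

struct demo_poll_entry { int queued; };
struct demo_apoll {
    struct demo_poll_entry poll;
    struct demo_poll_entry *double_poll;
};
struct demo_request {
    unsigned int flags;        /* lives in the already-hot request */
    struct demo_apoll *apoll;  /* separate allocation, likely cold */
};

/* Returns how many entries were removed. */
int demo_remove_entries(struct demo_request *req)
{
    int removed = 0;

    /* Common case: the wake handler already removed everything. */
    if (!(req->flags & (DEMO_F_SINGLE_POLL | DEMO_F_DOUBLE_POLL)))
        return 0;

    if (req->flags & DEMO_F_SINGLE_POLL) {
        req->apoll->poll.queued = 0;
        removed++;
    }
    if (req->flags & DEMO_F_DOUBLE_POLL) {
        req->apoll->double_poll->queued = 0;
        removed++;
    }
    req->flags &= ~(DEMO_F_SINGLE_POLL | DEMO_F_DOUBLE_POLL);
    return removed;
}
```

When no flag is set, `req->apoll` is never touched, which is the cacheline saving the message describes.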
    • io_uring: cache req->apoll->events in req->cflags · 81459350
      Jens Axboe authored
      When we arm poll on behalf of a different type of request, like a network
      receive, then we allocate req->apoll as our poll entry. Running network
      workloads shows io_poll_check_events() as the most expensive part of
      io_uring, and it's all due to having to pull in req->apoll instead of
      just the request which we have hot already.
      
      Cache poll->events in req->cflags, which isn't used until the request
      completes anyway. This isn't strictly needed for regular poll, where
      req->poll.events is used and thus already hot, but for the sake of
      unification we do it all around.
      
      This saves 3-4% of overhead in certain request workloads.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
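The caching described here might look roughly like the following in a user-space model. The `demo_*` names and the event bit values are assumptions made for the example; only the idea of copying the event mask into an otherwise idle request field comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits; real code would use the poll mask values. */
#define DEMO_EV_IN  0x1u
#define DEMO_EV_OUT 0x4u

struct demo_apoll { unsigned int events; };
struct demo_request {
    unsigned int cflags;       /* unused until completion, so free for caching */
    struct demo_apoll *apoll;  /* separately allocated poll entry */
};

/* Arm poll and keep a copy of the mask in the request itself. */
void demo_arm(struct demo_request *req, struct demo_apoll *ap, unsigned int mask)
{
    ap->events = mask;
    req->apoll = ap;
    req->cflags = mask;
}

/* The hot check reads only the request; req->apoll is not dereferenced. */
int demo_check_events(const struct demo_request *req, unsigned int ready)
{
    return (ready & req->cflags) != 0;
}
```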
    • io_uring: move req->poll_refs into previous struct hole · 521d61fc
      Jens Axboe authored
      This serves two purposes:
      
      - We now have the last cacheline mostly unused for generic workloads,
        instead of having to pull in the poll refs explicitly for workloads
        that rely on poll arming.
      
      - It shrinks the io_kiocb from 232 to 224 bytes.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
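The effect of filling a struct hole can be shown with a toy layout. The structs below are invented for illustration and are not the real io_kiocb; the sizes in the comments assume a typical 64-bit ABI where `uint64_t` is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout before the move. */
struct demo_before {
    uint64_t a;     /* offset 0 */
    uint32_t b;     /* offset 8, followed by a 4-byte hole */
    uint64_t c;     /* offset 16 */
    uint32_t refs;  /* offset 24, followed by 4 bytes of tail padding */
};

/* Same fields, with refs moved into the hole after b. */
struct demo_after {
    uint64_t a;     /* offset 0 */
    uint32_t b;     /* offset 8 */
    uint32_t refs;  /* offset 12, fills the former hole */
    uint64_t c;     /* offset 16 */
};

/* Bytes saved by the reordering on the current ABI. */
size_t demo_bytes_saved(void)
{
    return sizeof(struct demo_before) - sizeof(struct demo_after);
}
```

Under that alignment assumption the first struct occupies 32 bytes and the second 24, the same 8-byte saving the commit reports for io_kiocb (232 to 224 bytes). On kernel builds, a tool such as pahole is commonly used to find holes like this.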
    • io_uring: make tracing format consistent · 052ebf1f
      Dylan Yudaken authored
      Make the tracing formatting for user_data and flags consistent.
      
      Having consistent formatting allows one for example to grep for a specific
      user_data/flags and be able to trace a single sqe through easily.
      
      Change user_data to 0x%llx and flags to 0x%x everywhere. The '0x' is
      useful to disambiguate for example "user_data 100".
      
      Additionally remove the '=' for flags in io_uring_req_failed, again for consistency.
      Signed-off-by: Dylan Yudaken <dylany@fb.com>
      Link: https://lore.kernel.org/r/20220316095204.2191498-1-dylany@fb.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
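As a small illustration of the format choice, the helper below prints both fields with the `0x%llx` and `0x%x` conversions named in the message. The function name `demo_format_trace` and the exact surrounding text of the line are assumptions for the example, not the kernel's trace event definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats one trace line with explicit 0x prefixes. */
int demo_format_trace(char *buf, size_t len,
                      unsigned long long user_data, unsigned int flags)
{
    return snprintf(buf, len, "user_data 0x%llx, flags 0x%x", user_data, flags);
}
```

Without the prefix, a line containing "user_data 100" could mean decimal 100 or hexadecimal 0x100 (256). With it, a single grep pattern such as `user_data 0x100` matches the same sqe across every trace point.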
  5. 15 Mar, 2022 1 commit
    • io_uring: recycle apoll_poll entries · 4d9237e3
      Jens Axboe authored
      Particularly for networked workloads, io_uring intensively uses its
      poll based backend to get a notification when data/space is available.
      Profiling workloads, we see 3-4% of alloc+free that is directly attributed
      to just the apoll allocation and free (and the rest being skb alloc+free).
      
      For the fast path, we have ctx->uring_lock held already for both issue
      and the inline completions, and we can utilize that to avoid any extra
      locking needed to have a basic recycling cache for the apoll entries on
      both the alloc and free side.
      
      Double poll still requires an allocation. But those are rare and not
      a fast path item.
      
      With the simple cache in place, we see a 3-4% reduction in overhead for
      the workload.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
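A minimal sketch of such a recycling cache, assuming a singly linked free list and a caller that already holds the lock protecting it (standing in for ctx->uring_lock). The `demo_*` names are hypothetical, and `malloc`/`free` stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types; the caller is assumed to hold the protecting lock. */
struct demo_apoll {
    struct demo_apoll *next;  /* free-list link, used only while cached */
    unsigned int events;
};
struct demo_ctx {
    struct demo_apoll *cache; /* head of the recycled entries */
};

/* Alloc side: reuse a cached entry if there is one, else allocate. */
struct demo_apoll *demo_apoll_get(struct demo_ctx *ctx)
{
    struct demo_apoll *ap = ctx->cache;

    if (ap) {
        ctx->cache = ap->next;
        return ap;
    }
    return malloc(sizeof(*ap));
}

/* Free side: push the entry onto the cache instead of freeing it. */
void demo_apoll_put(struct demo_ctx *ctx, struct demo_apoll *ap)
{
    ap->next = ctx->cache;
    ctx->cache = ap;
}

/* Teardown: release everything still held in the cache. */
void demo_apoll_cache_free(struct demo_ctx *ctx)
{
    while (ctx->cache) {
        struct demo_apoll *ap = ctx->cache;

        ctx->cache = ap->next;
        free(ap);
    }
}
```

Because both get and put run under a lock that is already held on the fast path, the cache needs no synchronization of its own, which is the point the message makes about avoiding extra locking.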
  6. 12 Mar, 2022 1 commit
  7. 10 Mar, 2022 22 commits