1. 27 Feb, 2024 4 commits
      io_uring/kbuf: flag request if buffer pool is empty after buffer pick · c3f9109d
      Jens Axboe authored
      Normally we do an extra roundtrip for retries even if the buffer pool is
      depleted, as we don't check that upfront. Rather than add this check, have
      the buffer selection methods mark the request with REQ_F_BL_EMPTY if the
      used buffer group is out of buffers after this selection. This is very
      cheap to do once we're all the way inside there anyway, and it gives the
      caller a chance to make better decisions on how to proceed.
      
      For example, recv/recvmsg multishot could check this flag when it
      decides whether to keep receiving or not.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
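      To illustrate the idea, here is a rough, userspace-compilable sketch of
      how a caller like multishot recv might consult the flag. Only the
      REQ_F_BL_EMPTY name comes from this commit; the types and helpers below
      are simplified stand-ins, not the kernel's struct io_kiocb or its buffer
      selection code.

      #include <stdbool.h>
      #include <stdio.h>

      /* Simplified stand-in for the request flag this commit introduces. */
      #define REQ_F_BL_EMPTY  (1U << 0)

      struct fake_req {
              unsigned int flags;
      };

      /* Buffer selection marks the request when it hands out the last buffer
       * in the group, so the caller learns this without a second lookup. */
      static void pick_buffer(struct fake_req *req, unsigned int left_after_pick)
      {
              if (left_after_pick == 0)
                      req->flags |= REQ_F_BL_EMPTY;
      }

      /* A multishot receive can then decide not to re-arm itself when the
       * pool is dry, instead of retrying and only then finding it empty. */
      static bool keep_receiving(const struct fake_req *req)
      {
              return !(req->flags & REQ_F_BL_EMPTY);
      }

      int main(void)
      {
              struct fake_req req = { .flags = 0 };

              pick_buffer(&req, 0);   /* pretend we just took the last buffer */
              printf("keep receiving: %s\n", keep_receiving(&req) ? "yes" : "no");
              return 0;
      }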
      io_uring/net: improve the usercopy for sendmsg/recvmsg · 792060de
      Jens Axboe authored
      We're spending a considerable amount of the sendmsg/recvmsg time just
      copying in the message header and, for provided buffers, the known
      single-entry iovec.
      
      Be a bit smarter about it and enable/disable user access around our
      copying. In a test case that does both sendmsg and recvmsg, the
      runtimes before and after this change (averaged over multiple runs,
      with very stable times) are:
      
      Kernel		Time		Diff
      ====================================
      -git		4720 usec
      -git+commit	4311 usec	-8.7%
      
      and looking at a profile diff, we see the following:
      
      0.25%     +9.33%  [kernel.kallsyms]     [k] _copy_from_user
      4.47%     -3.32%  [kernel.kallsyms]     [k] __io_msg_copy_hdr.constprop.0
      
      where we drop more than 9% of the _copy_from_user() time and consequently
      add time to __io_msg_copy_hdr(), where the copies are now attributed,
      for a net win of 6%.
      
      In comparison, the same test case with send/recv runs in 3745 usec, which
      is (expectedly) still quite a bit faster. But at least sendmsg/recvmsg is
      now only ~13% slower, where it was ~21% slower before.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
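      The saving comes from batching the user access overhead: open the user
      access window once, copy the individual fields with the cheap unsafe
      variants, then close it, instead of paying the full copy_from_user()
      cost per field. The sketch below is userspace-compilable; the mock_*
      helpers stand in for the kernel's user_access_begin() /
      unsafe_get_user() / user_access_end(), and the struct is an illustrative
      subset rather than the real user_msghdr.

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdio.h>

      /* Mocks: in the kernel, user_access_begin() validates the range and
       * opens the access window once; unsafe_get_user() is then a raw load. */
      static bool mock_user_access_begin(const void *uptr, size_t len)
      {
              (void)len;
              return uptr != NULL;
      }
      #define mock_unsafe_get_user(dst, uptr) ((dst) = *(uptr))
      static void mock_user_access_end(void) { }

      struct hdr {
              void *name;
              int namelen;
              size_t iovlen;
      };

      /* Copy all header fields inside a single open/close of the window. */
      static int copy_hdr(struct hdr *dst, const struct hdr *umsg)
      {
              if (!mock_user_access_begin(umsg, sizeof(*umsg)))
                      return -1;      /* the kernel would return -EFAULT */
              mock_unsafe_get_user(dst->name, &umsg->name);
              mock_unsafe_get_user(dst->namelen, &umsg->namelen);
              mock_unsafe_get_user(dst->iovlen, &umsg->iovlen);
              mock_user_access_end();
              return 0;
      }

      int main(void)
      {
              struct hdr user_hdr = { .name = NULL, .namelen = 0, .iovlen = 1 };
              struct hdr kernel_hdr;

              if (copy_hdr(&kernel_hdr, &user_hdr) == 0)
                      printf("copied header, iovlen=%zu\n", kernel_hdr.iovlen);
              return 0;
      }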
      io_uring/net: move receive multishot out of the generic msghdr path · c5597802
      Jens Axboe authored
      Move the actual user_msghdr / compat_msghdr copying into the send and
      receive sides, respectively, so we can move the uaddr receive handling
      into its own handler, and do the same for the multishot with buffer
      selection logic.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      io_uring/net: unify how recvmsg and sendmsg copy in the msghdr · 52307ac4
      Jens Axboe authored
      For recvmsg, we roll our own since we support buffer selection. This
      isn't the case for sendmsg right now, but in preparation for doing so,
      make the recvmsg copy helpers generic so we can call them from the
      sendmsg side as well.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 15 Feb, 2024 2 commits
      io_uring/napi: enable even with a timeout of 0 · b4ccc4dd
      Jens Axboe authored
      1 usec is not as short as it used to be, and it makes sense to allow 0
      for a busy poll timeout - this means just doing one loop to check if we
      have anything available. Add a separate ->napi_enabled flag to track
      whether napi has been enabled or not.
      
      While at it, move the writing of the ctx napi values after we've copied
      the old values back to userspace. This ensures that if the call fails,
      we'll be in the same state as we were before, rather than some
      indeterminate state.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
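      The second half of this change is the usual "report old state before
      committing new state" pattern, so that a failed copy back to userspace
      cannot leave the context half-updated. A simplified sketch; the struct
      layout, register_napi() and copy_out() below are illustrative stand-ins
      for the kernel's registration path and copy_to_user(), not its actual
      code.

      #include <errno.h>
      #include <stdbool.h>
      #include <string.h>

      struct napi_ctx {
              unsigned int busy_poll_to;      /* 0 usec is now a valid timeout */
              bool prefer_busy_poll;
              bool napi_enabled;              /* explicit flag, not implied by timeout != 0 */
      };

      struct napi_args {
              unsigned int busy_poll_to;
              unsigned char prefer_busy_poll;
      };

      /* Stand-in for copy_to_user(); returns 0 on success. */
      static int copy_out(struct napi_args *uarg, const struct napi_args *src)
      {
              memcpy(uarg, src, sizeof(*src));
              return 0;
      }

      /* Copy the old values back to the caller first, and only then write the
       * new ones, so a copy failure leaves the context exactly as it was. */
      static int register_napi(struct napi_ctx *ctx, struct napi_args *uarg,
                               const struct napi_args *new_vals)
      {
              struct napi_args old = {
                      .busy_poll_to = ctx->busy_poll_to,
                      .prefer_busy_poll = ctx->prefer_busy_poll,
              };

              if (copy_out(uarg, &old))
                      return -EFAULT;

              ctx->busy_poll_to = new_vals->busy_poll_to;
              ctx->prefer_busy_poll = new_vals->prefer_busy_poll;
              ctx->napi_enabled = true;       /* enabled even with busy_poll_to == 0 */
              return 0;
      }

      int main(void)
      {
              struct napi_ctx ctx = { .busy_poll_to = 50 };
              struct napi_args uarg, new_vals = { .busy_poll_to = 0, .prefer_busy_poll = 1 };

              return register_napi(&ctx, &uarg, &new_vals) ? 1 : 0;
      }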
      io_uring: kill stale comment for io_cqring_overflow_kill() · 871760eb
      Jens Axboe authored
      This function now deals only with discarding overflow entries on ring
      free and exit, and it no longer returns whether we successfully flushed
      all entries as there's no CQE posting involved anymore. Kill the
      outdated comment.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 14 Feb, 2024 3 commits
      io_uring/sqpoll: use the correct check for pending task_work · c8d8fc3b
      Jens Axboe authored
      A previous commit moved to using just the private task_work list for
      SQPOLL, but it neglected to update the check for whether we have
      pending task_work. Normally this is fine as we'll attempt to run it
      unconditionally, but if we race with going to sleep AND task_work
      being added, then we certainly need the right check here.
      
      Fixes: af5d68f8 ("io_uring/sqpoll: manage task_work privately")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
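      In spirit, the fix is that the "anything pending?" check the SQPOLL
      thread consults before sleeping must also cover its private task_work
      list. A simplified, userspace-compilable sketch; the structures and the
      sq_tw_pending() name below are illustrative stand-ins, not the kernel's
      llist-based implementation.

      #include <stdbool.h>
      #include <stddef.h>

      struct work_node {
              struct work_node *next;
      };

      struct sq_data {
              struct work_node *private_list; /* SQPOLL's private task_work */
      };

      struct task_ctx {
              bool generic_tw_pending;        /* stand-in for the generic pending check */
      };

      /* Before the SQPOLL thread decides to sleep, the check must look at the
       * private list as well; otherwise a racing "queue work, then sleep"
       * sequence can strand the queued work until some later wakeup. */
      static bool sq_tw_pending(const struct sq_data *sqd, const struct task_ctx *tctx)
      {
              return sqd->private_list != NULL || tctx->generic_tw_pending;
      }

      int main(void)
      {
              struct work_node item = { .next = NULL };
              struct sq_data sqd = { .private_list = &item };
              struct task_ctx tctx = { .generic_tw_pending = false };

              /* The generic check alone would wrongly report nothing to do. */
              return sq_tw_pending(&sqd, &tctx) ? 0 : 1;
      }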
      io_uring: wake SQPOLL task when task_work is added to an empty queue · 78f9b61b
      Jens Axboe authored
      If there's no current work on the list, we still need to potentially
      wake the SQPOLL task if it is sleeping. This is ordered with the
      wait queue addition in sqpoll, which adds to the wait queue before
      checking for pending work items.
      
      Fixes: af5d68f8 ("io_uring/sqpoll: manage task_work privately")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
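      The underlying pattern: the producer adds the work item first and, if
      the queue was empty before the add, wakes the consumer, while the
      consumer adds itself to the wait queue before it checks for pending
      work, so one side always observes the other. A single-threaded,
      illustrative sketch; in the kernel the queue is a lock-free llist whose
      llist_add() reports whether the list was previously empty.

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdio.h>

      struct work_node {
              struct work_node *next;
      };

      struct work_queue {
              struct work_node *first;
      };

      /* Returns true if the queue was empty before this insertion. */
      static bool queue_add(struct work_queue *q, struct work_node *node)
      {
              bool was_empty = (q->first == NULL);

              node->next = q->first;
              q->first = node;
              return was_empty;
      }

      /* Hypothetical wake hook standing in for waking the SQPOLL thread. */
      static void wake_sqpoll_thread(void)
      {
              printf("waking SQPOLL thread\n");
      }

      /* Even with no work previously queued, the consumer may already be (or
       * be about to go) asleep, so the first item added must wake it. */
      static void add_task_work(struct work_queue *q, struct work_node *node)
      {
              if (queue_add(q, node))
                      wake_sqpoll_thread();
      }

      int main(void)
      {
              struct work_queue q = { .first = NULL };
              struct work_node n = { .next = NULL };

              add_task_work(&q, &n);  /* first item on an empty queue -> wake */
              return 0;
      }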
      io_uring/napi: ensure napi polling is aborted when work is available · 428f1382
      Jens Axboe authored
      While testing io_uring NAPI with DEFER_TASKRUN, I ran into slowdowns and
      stalls in packet delivery. Turns out that while
      io_napi_busy_loop_should_end() aborts appropriately on regular
      task_work, it does not abort if we have local task_work pending.
      
      Move io_has_work() into the private io_uring.h header, and gate whether
      we should continue polling on that as well. This makes NAPI polling on
      send/receive work as designed with IORING_SETUP_DEFER_TASKRUN as well.
      
      Fixes: 8d0c12a8 ("io-uring: add napi busy poll support")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
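      Roughly, the busy poll loop's termination check now also covers the
      local (DEFER_TASKRUN) work list. The sketch below is illustrative and
      userspace-compilable; the state struct and helper names are stand-ins
      for the kernel's io_has_work() / io_napi_busy_loop_should_end(), not
      their actual implementations.

      #include <stdbool.h>

      struct poll_state {
              bool signal_pending;            /* stand-in for a pending signal */
              bool regular_task_work;         /* generic task_work queued */
              bool local_task_work;           /* DEFER_TASKRUN work pending */
              bool cq_has_overflow;
              unsigned long long now_ns, end_ns;
      };

      /* Stand-in for io_has_work(): completion-side work to process,
       * including the local task_work list this commit adds to the check. */
      static bool has_work(const struct poll_state *s)
      {
              return s->cq_has_overflow || s->local_task_work;
      }

      /* Busy polling should stop as soon as anything actionable shows up;
       * before the fix, pending local task_work did not end the loop, which
       * stalled packet delivery with IORING_SETUP_DEFER_TASKRUN. */
      static bool busy_loop_should_end(const struct poll_state *s)
      {
              return s->signal_pending ||
                     s->regular_task_work ||
                     has_work(s) ||
                     s->now_ns >= s->end_ns;
      }

      int main(void)
      {
              struct poll_state s = {
                      .local_task_work = true,        /* only local work is pending */
                      .now_ns = 0, .end_ns = 1000,
              };

              return busy_loop_should_end(&s) ? 0 : 1;
      }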
  4. 13 Feb, 2024 1 commit
  5. 09 Feb, 2024 9 commits
  6. 08 Feb, 2024 18 commits
  7. 07 Feb, 2024 1 commit
  8. 04 Feb, 2024 2 commits
      Linux 6.8-rc3 · 54be6c6c
      Linus Torvalds authored
      Merge tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 3f24fcda
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Miscellaneous bug fixes and cleanups in ext4's multi-block allocator
        and extent handling code"
      
      * tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
        ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type
        ext4: make ext4_map_blocks() distinguish delalloc only extent
        ext4: add a hole extent entry in cache after punch
        ext4: correct the hole length returned by ext4_map_blocks()
        ext4: convert to exclusive lock while inserting delalloc extents
        ext4: refactor ext4_da_map_blocks()
        ext4: remove 'needed' in trace_ext4_discard_preallocations
        ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations
        ext4: remove unused return value of ext4_mb_release_group_pa
        ext4: remove unused return value of ext4_mb_release_inode_pa
        ext4: remove unused return value of ext4_mb_release
        ext4: remove unused ext4_allocation_context::ac_groups_considered
        ext4: remove unneeded return value of ext4_mb_release_context
        ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_*()
        ext4: remove unused return value of __mb_check_buddy
        ext4: mark the group block bitmap as corrupted before reporting an error
        ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
        ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
        ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
        ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
        ...