1. 02 Apr, 2021 1 commit
    • Jens Axboe's avatar
      io_uring: move reissue into regular IO path · 230d50d4
      Jens Axboe authored
      It's non-obvious how retry is done for block backed files, when it happens
      off the kiocb done path. It also makes it tricky to deal with the iov_iter
      handling.
      
      Just mark the req as needing a reissue, and handling it from the
      submission path instead. This makes it directly obvious that we're not
      re-importing the iovec from userspace past the submit point, and it means
      that we can just reuse our usual -EAGAIN retry path from the read/write
      handling.
      
      At some point in the future, we'll gain the ability to always reliably
      return -EAGAIN through the stack. A previous attempt on the block side
      didn't pan out and got reverted, hence the need to check for this
      information out-of-band right now.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      230d50d4
  2. 01 Apr, 2021 3 commits
  3. 30 Mar, 2021 1 commit
    • Jens Axboe's avatar
      io_uring: drop sqd lock before handling signals for SQPOLL · 82734c5b
      Jens Axboe authored
      Don't call into get_signal() with the sqd mutex held, it'll fail if we're
      freezing the task and we'll get complaints on locks still being held:
      
      ====================================
      WARNING: iou-sqp-8386/8387 still has locks held!
      5.12.0-rc4-syzkaller #0 Not tainted
      ------------------------------------
      1 lock held by iou-sqp-8386/8387:
       #0: ffff88801e1d2470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread+0x24c/0x13a0 fs/io_uring.c:6731
      
       stack backtrace:
       CPU: 1 PID: 8387 Comm: iou-sqp-8386 Not tainted 5.12.0-rc4-syzkaller #0
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       Call Trace:
        __dump_stack lib/dump_stack.c:79 [inline]
        dump_stack+0x141/0x1d7 lib/dump_stack.c:120
        try_to_freeze include/linux/freezer.h:66 [inline]
        get_signal+0x171a/0x2150 kernel/signal.c:2576
        io_sq_thread+0x8d2/0x13a0 fs/io_uring.c:6748
      
      Fold the get_signal() case in with the parking checks, as we need to drop
      the lock in both cases, and since we need to be checking for parking when
      juggling the lock anyway.
      
      Reported-by: syzbot+796d767eb376810256f5@syzkaller.appspotmail.com
      Fixes: dbe1bdbb ("io_uring: handle signals for IO threads like a normal thread")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      82734c5b
  4. 29 Mar, 2021 2 commits
  5. 27 Mar, 2021 11 commits
  6. 26 Mar, 2021 1 commit
    • Jens Axboe's avatar
      kernel: don't call do_exit() for PF_IO_WORKER threads · 10442994
      Jens Axboe authored
      Right now we're never calling get_signal() from PF_IO_WORKER threads, but
      in preparation for doing so, don't handle a fatal signal for them. The
      workers have state they need to cleanup when exiting, so just return
      instead of calling do_exit() on their behalf. The threads themselves will
      detect a fatal signal and do proper shutdown.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      10442994
  7. 25 Mar, 2021 2 commits
    • Pavel Begunkov's avatar
      io_uring: maintain CQE order of a failed link · 90b87490
      Pavel Begunkov authored
      Arguably we want CQEs of linked requests be in a strict order of
      submission as it always was. Now if init of a request fails its CQE may
      be posted before all prior linked requests including the head of the
      link. Fix it by failing it last.
      
      Fixes: de59bc10 ("io_uring: fail links more in io_submit_sqe()")
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/b7a96b05832e7ab23ad55f84092a2548c4a888b0.1616699075.git.asml.silence@gmail.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      90b87490
    • Jens Axboe's avatar
      io-wq: fix race around pending work on teardown · f5d2d23b
      Jens Axboe authored
      syzbot reports that it's triggering the warning condition on having
      pending work on shutdown:
      
      WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_destroy fs/io-wq.c:1061 [inline]
      WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_put+0x153/0x260 fs/io-wq.c:1072
      Modules linked in:
      CPU: 1 PID: 12346 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:io_wq_destroy fs/io-wq.c:1061 [inline]
      RIP: 0010:io_wq_put+0x153/0x260 fs/io-wq.c:1072
      Code: 8d e8 71 90 ea 01 49 89 c4 41 83 fc 40 7d 4f e8 33 4d 97 ff 42 80 7c 2d 00 00 0f 85 77 ff ff ff e9 7a ff ff ff e8 1d 4d 97 ff <0f> 0b eb b9 8d 6b ff 89 ee 09 de bf ff ff ff ff e8 18 51 97 ff 09
      RSP: 0018:ffffc90001ebfb08 EFLAGS: 00010293
      RAX: ffffffff81e16083 RBX: ffff888019038040 RCX: ffff88801e86b780
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000040
      RBP: 1ffff1100b2f8a80 R08: ffffffff81e15fce R09: ffffed100b2f8a82
      R10: ffffed100b2f8a82 R11: 0000000000000000 R12: 0000000000000000
      R13: dffffc0000000000 R14: ffff8880597c5400 R15: ffff888019038000
      FS:  00007f8dcd89c700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055e9a054e160 CR3: 000000001dfb8000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       io_uring_clean_tctx+0x1b7/0x210 fs/io_uring.c:8802
       __io_uring_files_cancel+0x13c/0x170 fs/io_uring.c:8820
       io_uring_files_cancel include/linux/io_uring.h:47 [inline]
       do_exit+0x258/0x2340 kernel/exit.c:780
       do_group_exit+0x168/0x2d0 kernel/exit.c:922
       get_signal+0x1734/0x1ef0 kernel/signal.c:2773
       arch_do_signal_or_restart+0x3c/0x610 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:208
       __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
       syscall_exit_to_user_mode+0x48/0x180 kernel/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x465f69
      
      which shouldn't happen, but seems to be possible due to a race on whether
      or not the io-wq manager sees a fatal signal first, or whether the io-wq
      workers do. If we race with queueing work and then send a fatal signal to
      the owning task, and the io-wq worker sees that before the manager sets
      IO_WQ_BIT_EXIT, then it's possible to have the worker exit and leave work
      behind.
      
      Just turn the WARN_ON_ONCE() into a cancelation condition instead.
      
      Reported-by: syzbot+77a738a6bc947bf639ca@syzkaller.appspotmail.com
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f5d2d23b
  8. 24 Mar, 2021 1 commit
    • Pavel Begunkov's avatar
      io_uring: do ctx sqd ejection in a clear context · a185f1db
      Pavel Begunkov authored
      WARNING: CPU: 1 PID: 27907 at fs/io_uring.c:7147 io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
      CPU: 1 PID: 27907 Comm: iou-sqp-27905 Not tainted 5.12.0-rc4-syzkaller #0
      RIP: 0010:io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
      Call Trace:
       io_ring_ctx_wait_and_kill+0x214/0x700 fs/io_uring.c:8619
       io_uring_release+0x3e/0x50 fs/io_uring.c:8646
       __fput+0x288/0x920 fs/file_table.c:280
       task_work_run+0xdd/0x1a0 kernel/task_work.c:140
       io_run_task_work fs/io_uring.c:2238 [inline]
       io_run_task_work fs/io_uring.c:2228 [inline]
       io_uring_try_cancel_requests+0x8ec/0xc60 fs/io_uring.c:8770
       io_uring_cancel_sqpoll+0x1cf/0x290 fs/io_uring.c:8974
       io_sqpoll_cancel_cb+0x87/0xb0 fs/io_uring.c:8907
       io_run_task_work_head+0x58/0xb0 fs/io_uring.c:1961
       io_sq_thread+0x3e2/0x18d0 fs/io_uring.c:6763
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
      May happen that last ctx ref is killed in io_uring_cancel_sqpoll(), so
      fput callback (i.e. io_uring_release()) is enqueued through task_work,
      and run by same cancellation. As it's deeply nested we can't do parking
      or taking sqd->lock there, because its state is unclear. So avoid
      ctx ejection from sqd list from io_ring_ctx_wait_and_kill() and do it
      in a clear context in io_ring_exit_work().
      
      Fixes: f6d54255 ("io_uring: halt SQO submission on ctx exit")
      Reported-by: syzbot+e3a3f84f5cecf61f0583@syzkaller.appspotmail.com
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/e90df88b8ff2cabb14a7534601d35d62ab4cb8c7.1616496707.git.asml.silence@gmail.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a185f1db
  9. 22 Mar, 2021 3 commits
  10. 21 Mar, 2021 5 commits
  11. 18 Mar, 2021 4 commits
  12. 15 Mar, 2021 6 commits
    • Pavel Begunkov's avatar
      io_uring: fix sqpoll cancellation via task_work · b7f5a0bf
      Pavel Begunkov authored
      Running sqpoll cancellations via task_work_run() is a bad idea because
      it depends on other task works to be run, but those may be locked in
      currently running task_work_run() because of how it's (splicing the list
      in batches).
      
      Enqueue and run them through a separate callback head, namely
      struct io_sq_data::park_task_work. As a nice bonus we now precisely
      control where it's run, that's much safer than guessing where it can
      happen as it was before.
      Reported-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b7f5a0bf
    • Pavel Begunkov's avatar
      io_uring: add generic callback_head helpers · 9b465711
      Pavel Begunkov authored
      We already have helpers to run/add callback_head but taking ctx and
      working with ctx->exit_task_work. Extract generic versions of them
      implemented in terms of struct callback_head, it will be used later.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9b465711
    • Pavel Begunkov's avatar
      io_uring: fix concurrent parking · 9e138a48
      Pavel Begunkov authored
      If io_sq_thread_park() of one task got rescheduled right after
      set_bit(), before it gets back to mutex_lock() there can happen
      park()/unpark() by another task with SQPOLL locking again and
      continuing running never seeing that first set_bit(SHOULD_PARK),
      so won't even try to put the mutex down for parking.
      
      It will get parked eventually when SQPOLL drops the lock for reschedule,
      but may be problematic and will get in the way of further fixes.
      
      Account number of tasks waiting for parking with a new atomic variable
      park_pending and adjust SHOULD_PARK accordingly. It doesn't entirely
      replaces SHOULD_PARK bit with this atomic var because it's convenient
      to have it as a bit in the state and will help to do optimisations
      later.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9e138a48
    • Pavel Begunkov's avatar
      io_uring: halt SQO submission on ctx exit · f6d54255
      Pavel Begunkov authored
      io_sq_thread_finish() is called in io_ring_ctx_free(), so SQPOLL task is
      potentially running submitting new requests. It's not a disaster because
      of using a "try" variant of percpu_ref_get, but is far from nice.
      
      Remove ctx from the sqd ctx list earlier, before cancellation loop, so
      SQPOLL can't find it and so won't submit new requests.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f6d54255
    • Pavel Begunkov's avatar
      io_uring: replace sqd rw_semaphore with mutex · 09a6f4ef
      Pavel Begunkov authored
      The only user of read-locking of sqd->rw_lock is sq_thread itself, which
      is by definition alone, so we don't really need rw_semaphore, but mutex
      will do. Replace it with a mutex, and kill read-to-write upgrading and
      extra task_work handling in io_sq_thread().
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      09a6f4ef
    • Pavel Begunkov's avatar
      io_uring: fix complete_post use ctx after free · 180f829f
      Pavel Begunkov authored
      If io_req_complete_post() put not a final ref, we can't rely on the
      request's ctx ref, and so ctx may potentially be freed while
      complete_post() is in io_cqring_ev_posted()/etc.
      
      In that case get an additional ctx reference, and put it in the end, so
      protecting following io_cqring_ev_posted(). And also prolong ctx
      lifetime until spin_unlock happens, as we do with mutexes, so added
      percpu_ref_get() doesn't race with ctx free.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      180f829f