1. 19 Oct, 2021 3 commits
    • nvme: don't memset() the normal read/write command · a9a7e30f
      Jens Axboe authored
      This memset in the fast path costs a lot of cycles on my setup. Here's a
      top-of-profile of doing ~6.7M IOPS:
      
      +    5.90%  io_uring  [nvme]            [k] nvme_queue_rq
      +    5.32%  io_uring  [nvme_core]       [k] nvme_setup_cmd
      +    5.17%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
      +    4.97%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
      
      and a perf diff with this patch:
      
           0.92%     +4.40%  [nvme_core]       [k] nvme_setup_cmd
      
      reducing it from 5.3% to only 0.9%. This takes it from the second-largest
      cycle consumer to something that's mostly irrelevant.
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
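The idea behind the commit above can be sketched in user space: rather than zeroing the whole command and then overwriting some of its fields, the fast path assigns each field it owns exactly once. The struct layout, function names, and parameters below are hypothetical stand-ins chosen for illustration, not the kernel's definitions, and the sketch assumes that `command_id`, `prp1`, and `prp2` are filled in by other code afterwards. Whether this saves cycles on a given machine is something only a profile can show.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte command, loosely modelled on an NVMe read/write
 * submission entry. This is NOT the kernel's struct nvme_rw_command. */
struct demo_rw_cmd {
    uint8_t  opcode;
    uint8_t  flags;
    uint16_t command_id;   /* assumed to be set later by the caller */
    uint32_t nsid;
    uint64_t rsvd2;
    uint64_t metadata;
    uint64_t prp1;         /* assumed to be set later by data mapping */
    uint64_t prp2;         /* assumed to be set later by data mapping */
    uint64_t slba;
    uint16_t length;
    uint16_t control;
    uint32_t dsmgmt;
    uint32_t reftag;
    uint16_t apptag;
    uint16_t appmask;
};

static_assert(sizeof(struct demo_rw_cmd) == 64, "command must be 64 bytes");

/* Baseline: clear all 64 bytes, then write the fields that matter. */
void setup_with_memset(struct demo_rw_cmd *cmd, uint8_t opcode,
                       uint32_t nsid, uint64_t slba, uint16_t nblocks)
{
    memset(cmd, 0, sizeof(*cmd));
    cmd->opcode = opcode;
    cmd->nsid = nsid;
    cmd->slba = slba;
    cmd->length = nblocks;
}

/* Alternative: no memset; every field owned by this helper is written
 * exactly once, including the ones that must read as zero. */
void setup_field_by_field(struct demo_rw_cmd *cmd, uint8_t opcode,
                          uint32_t nsid, uint64_t slba, uint16_t nblocks)
{
    cmd->opcode = opcode;
    cmd->flags = 0;
    cmd->nsid = nsid;
    cmd->rsvd2 = 0;
    cmd->metadata = 0;
    cmd->slba = slba;
    cmd->length = nblocks;
    cmd->control = 0;
    cmd->dsmgmt = 0;
    cmd->reftag = 0;
    cmd->apptag = 0;
    cmd->appmask = 0;
}
```

The trade-off is maintenance: the field-by-field variant is only safe if no field is forgotten, so any field the helper does not assign has to be provably written elsewhere before the command is submitted.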
    • nvme: move command clear into the various setup helpers · 9c3d2929
      Jens Axboe authored
      We don't have to worry about doing extra memsets by moving it outside
      the protection of RQF_DONTPREP, as nvme doesn't do partial completions.
      
      This is in preparation for making the read/write fast path not do a full
      memset of the command.
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: ataflop: fix breakage introduced at blk-mq refactoring · 86d46fda
      Michael Schmitz authored
      Refactoring of the Atari floppy driver when converting to blk-mq
      has broken the state machine in not-so-subtle ways:
      
      finish_fdc() must be called when operations on the floppy device
      have completed. This is crucial in order to release the ST-DMA
      lock, which protects against concurrent access to the ST-DMA
      controller by other drivers (some DMA related, most just related
      to device register access - broken beyond compare, I know).
      
      When rewriting the driver's old do_request() function, the fact
      that finish_fdc() was called only when all queued requests had
      completed appears to have been overlooked. Instead, the new
      request function calls finish_fdc() immediately after the last
      request has been queued. finish_fdc() executes a dummy seek after
      most requests, and this overwrites the state machine's interrupt
      handler that was set up to wait for completion of the read/write
      request just prior. To make matters worse, finish_fdc() is called
      before device interrupts are re-enabled, making certain that the
      read/write interrupt is missed.
      
      Shifting the finish_fdc() call into the read/write request
      completion handler ensures the driver waits for the request to
      actually complete. With a queue depth of 2, we won't see long
      request sequences, so calling finish_fdc() unconditionally just
      adds a little overhead for the dummy seeks, and keeps the code
      simple.
      
      While we're at it, kill ataflop_commit_rqs() which does nothing
      but run finish_fdc() unconditionally, again likely wiping out an
      in-flight request.
      Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
      Fixes: 6ec3938c ("ataflop: convert to blk-mq")
      CC: linux-block@vger.kernel.org
      CC: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Link: https://lore.kernel.org/r/20211019061321.26425-1-schmitzmic@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
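The ordering problem described in the ataflop commit can be illustrated with a toy state machine that has a single "next interrupt handler" slot. This is a simplified simulation under assumed names (`start_rw`, `finish_fdc_sketch`, `fire_interrupt`, and so on are all hypothetical), not code from the driver. It shows only why issuing the dummy seek right after queueing replaces the read/write completion handler, and why issuing it from the completion handler does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*irq_handler_t)(void);

static irq_handler_t pending_handler; /* handler the next interrupt runs */
static int rw_completed;              /* read/write completions observed */
static bool dma_locked;               /* stand-in for the ST-DMA lock */
static bool finish_in_completion;     /* true models the fixed ordering */

/* Dummy seek finished: release the lock. */
static void seek_done(void) { dma_locked = false; }

/* Issue the dummy seek; this overwrites whatever handler was pending. */
static void finish_fdc_sketch(void) { pending_handler = seek_done; }

/* Read/write finished: count it, then (fixed ordering) wind down. */
static void rw_done(void)
{
    rw_completed++;
    if (finish_in_completion)
        finish_fdc_sketch();
}

/* Queue a read/write: take the lock and wait for its interrupt. */
static void start_rw(void) { dma_locked = true; pending_handler = rw_done; }

/* Deliver one interrupt to whichever handler is currently installed. */
static void fire_interrupt(void)
{
    irq_handler_t h = pending_handler;
    pending_handler = NULL;
    if (h)
        h();
}

static void reset(bool fixed)
{
    pending_handler = NULL;
    rw_completed = 0;
    dma_locked = false;
    finish_in_completion = fixed;
}

/* Broken ordering: wind down immediately after queueing the request. */
int run_broken(void)
{
    reset(false);
    start_rw();
    finish_fdc_sketch();  /* replaces rw_done with seek_done */
    fire_interrupt();     /* read/write interrupt lands in seek_done */
    return rw_completed;
}

/* Fixed ordering: wind down from the read/write completion handler. */
int run_fixed(void)
{
    reset(true);
    start_rw();
    fire_interrupt();     /* rw_done runs, then schedules the dummy seek */
    fire_interrupt();     /* seek_done runs and releases the lock */
    return rw_completed;
}
```

In this model `run_broken()` never observes the read/write completion because its handler was replaced before the interrupt arrived, while `run_fixed()` observes it and still releases the lock afterwards.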
  2. 18 Oct, 2021 37 commits