1. 07 May, 2024 4 commits
  2. 02 May, 2024 1 commit
    • Jens Axboe's avatar
      Merge commit '50abcc17' of... · fb15ffd0
      Jens Axboe authored
      Merge commit '50abcc17' of git://git.infradead.org/nvme into block-6.9
      
      Pull NVMe fixes from Keith.
      
      * git://git.infradead.org/nvme:
        nvme-tcp: strict pdu pacing to avoid send stalls on TLS
        nvmet: fix nvme status code when namespace is disabled
        nvmet-tcp: fix possible memory leak when tearing down a controller
        nvme: cancel pending I/O if nvme controller is in terminal state
        nvmet-auth: replace pr_debug() with pr_err() to report an error.
        nvmet-auth: return the error code to the nvmet_auth_host_hash() callers
        nvme: find numa distance only if controller has valid numa id
        nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
      fb15ffd0
  3. 01 May, 2024 7 commits
  4. 30 Apr, 2024 1 commit
    • Uday Shankar's avatar
      ublk: remove segment count and size limits · eaf4a9b1
      Uday Shankar authored
      ublk_drv currently creates block devices with the default max_segments
      and max_segment_size limits of BLK_MAX_SEGMENTS (128) and
      BLK_MAX_SEGMENT_SIZE (65536) respectively. These defaults can
      artificially constrain the I/O size seen by the ublk server - for
      example, suppose that the ublk server has configured itself to accept
      I/Os up to 1M and the application is also issuing 1M sized I/Os. If the
      I/O buffer used by the application is backed by 4K pages, the buffer
      could consist of up to 1M / 4K = 256 physically discontiguous segments
      (even if the buffer is virtually contiguous). As such, the I/O could
      exceed the default max_segments limit and get split. This can cause
      unnecessary performance issues if the ublk server is optimized to handle
      1M I/Os. The block layer's segment count/size limits exist to model
      hardware constraints which don't exist in ublk_drv's case, so just
      remove those limits for the block devices created by ublk_drv.
      Signed-off-by: default avatarUday Shankar <ushankar@purestorage.com>
      Reviewed-by: default avatarRiley Thomasson <riley@purestorage.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20240430211623.2802036-1-ushankar@purestorage.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      eaf4a9b1
  5. 19 Apr, 2024 1 commit
    • Li Nan's avatar
      blk-iocost: do not WARN if iocg was already offlined · 01bc4fda
      Li Nan authored
      In iocg_pay_debt(), warn is triggered if 'active_list' is empty, which
      is intended to confirm iocg is active when it has debt. However, warn
      can be triggered during a blkcg or disk removal, if iocg_waitq_timer_fn()
      is run at that time:
      
        WARNING: CPU: 0 PID: 2344971 at block/blk-iocost.c:1402 iocg_pay_debt+0x14c/0x190
        Call trace:
        iocg_pay_debt+0x14c/0x190
        iocg_kick_waitq+0x438/0x4c0
        iocg_waitq_timer_fn+0xd8/0x130
        __run_hrtimer+0x144/0x45c
        __hrtimer_run_queues+0x16c/0x244
        hrtimer_interrupt+0x2cc/0x7b0
      
      The warn in this situation is meaningless. Since this iocg is being
      removed, the state of the 'active_list' is irrelevant, and 'waitq_timer'
      is canceled after removing 'active_list' in ioc_pd_free(), which ensures
      iocg is freed after iocg_waitq_timer_fn() returns.
      
      Therefore, add the check if iocg was already offlined to avoid warn
      when removing a blkcg or disk.
      Signed-off-by: default avatarLi Nan <linan122@huawei.com>
      Reviewed-by: default avatarYu Kuai <yukuai3@huawei.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20240419093257.3004211-1-linan666@huaweicloud.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      01bc4fda
  6. 18 Apr, 2024 1 commit
  7. 12 Apr, 2024 2 commits
    • Yu Kuai's avatar
      block: fix that blk_time_get_ns() doesn't update time after schedule · 3ec48489
      Yu Kuai authored
      While monitoring the throttle time of IO from iocost, it's found that
      such time is always zero after the io_schedule() from ioc_rqos_throttle,
      for example, with the following debug patch:
      
      +       printk("%s-%d: %s enter %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
              while (true) {
                      set_current_state(TASK_UNINTERRUPTIBLE);
                      if (wait.committed)
                              break;
                      io_schedule();
              }
      +       printk("%s-%d: %s exit  %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
      
      It can be observerd that blk_time_get_ns() always return the same time:
      
      [ 1068.096579] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.272587] fio-1268: ioc_rqos_throttle exit  1067901962288
      [ 1068.274389] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.472690] fio-1268: ioc_rqos_throttle exit  1067901962288
      [ 1068.474485] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.672656] fio-1268: ioc_rqos_throttle exit  1067901962288
      [ 1068.674451] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.872655] fio-1268: ioc_rqos_throttle exit  1067901962288
      
      And I think the root cause is that 'PF_BLOCK_TS' is always cleared
      by blk_flush_plug() before scheduel(), hence blk_plug_invalidate_ts()
      will never be called:
      
      blk_time_get_ns
       plug->cur_ktime = ktime_get_ns();
       current->flags |= PF_BLOCK_TS;
      
      io_schedule:
       io_schedule_prepare
        blk_flush_plug
         __blk_flush_plug
          /* the flag is cleared, while time is not */
          current->flags &= ~PF_BLOCK_TS;
       schedule
       sched_update_worker
        /* the flag is not set, hence plug->cur_ktime is not cleared */
        if (tsk->flags & PF_BLOCK_TS)
         blk_plug_invalidate_ts()
      
      blk_time_get_ns
       /* got the time stashed before schedule */
       return plug->cur_ktime;
      
      Fix the problem by clearing cached time in __blk_flush_plug().
      
      Fixes: 06b23f92 ("block: update cached timestamp post schedule/preemption")
      Signed-off-by: default avatarYu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20240411032349.3051233-2-yukuai1@huaweicloud.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      3ec48489
    • Yi Zhang's avatar
      nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH · 0bc2e80b
      Yi Zhang authored
      Move the stray '.' that is currently at the end of the line after
      newline '\n' to before newline character which is the right position.
      
      Fixes: ce8d7861 ("nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH")
      Signed-off-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Reviewed-by: default avatarChaitanya Kulkarni <kch@nvidia.com>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      0bc2e80b
  8. 09 Apr, 2024 1 commit
  9. 07 Apr, 2024 2 commits
  10. 06 Apr, 2024 1 commit
    • Rik van Riel's avatar
      blk-iocost: avoid out of bounds shift · beaa51b3
      Rik van Riel authored
      UBSAN catches undefined behavior in blk-iocost, where sometimes
      iocg->delay is shifted right by a number that is too large,
      resulting in undefined behavior on some architectures.
      
      [  186.556576] ------------[ cut here ]------------
      UBSAN: shift-out-of-bounds in block/blk-iocost.c:1366:23
      shift exponent 64 is too large for 64-bit type 'u64' (aka 'unsigned long long')
      CPU: 16 PID: 0 Comm: swapper/16 Tainted: G S          E    N 6.9.0-0_fbk700_debug_rc2_kbuilder_0_gc85af715 #1
      Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A23 12/08/2020
      Call Trace:
       <IRQ>
       dump_stack_lvl+0x8f/0xe0
       __ubsan_handle_shift_out_of_bounds+0x22c/0x280
       iocg_kick_delay+0x30b/0x310
       ioc_timer_fn+0x2fb/0x1f80
       __run_timer_base+0x1b6/0x250
      ...
      
      Avoid that undefined behavior by simply taking the
      "delay = 0" branch if the shift is too large.
      
      I am not sure what the symptoms of an undefined value
      delay will be, but I suspect it could be more than a
      little annoying to debug.
      Signed-off-by: default avatarRik van Riel <riel@surriel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20240404123253.0f58010f@imladris.surriel.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      beaa51b3
  11. 04 Apr, 2024 5 commits
  12. 02 Apr, 2024 4 commits
  13. 31 Mar, 2024 10 commits