1. 02 May, 2024 1 commit
    • Jens Axboe's avatar
      Merge commit '50abcc17' of... · fb15ffd0
      Jens Axboe authored
      Merge commit '50abcc17' of git://git.infradead.org/nvme into block-6.9
      
      Pull NVMe fixes from Keith.
      
      * git://git.infradead.org/nvme:
        nvme-tcp: strict pdu pacing to avoid send stalls on TLS
        nvmet: fix nvme status code when namespace is disabled
        nvmet-tcp: fix possible memory leak when tearing down a controller
        nvme: cancel pending I/O if nvme controller is in terminal state
        nvmet-auth: replace pr_debug() with pr_err() to report an error.
        nvmet-auth: return the error code to the nvmet_auth_host_hash() callers
        nvme: find numa distance only if controller has valid numa id
        nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
      fb15ffd0
  2. 01 May, 2024 7 commits
  3. 30 Apr, 2024 1 commit
    • Uday Shankar's avatar
      ublk: remove segment count and size limits · eaf4a9b1
      Uday Shankar authored
      ublk_drv currently creates block devices with the default max_segments
      and max_segment_size limits of BLK_MAX_SEGMENTS (128) and
      BLK_MAX_SEGMENT_SIZE (65536) respectively. These defaults can
      artificially constrain the I/O size seen by the ublk server - for
      example, suppose that the ublk server has configured itself to accept
      I/Os up to 1M and the application is also issuing 1M sized I/Os. If the
      I/O buffer used by the application is backed by 4K pages, the buffer
      could consist of up to 1M / 4K = 256 physically discontiguous segments
      (even if the buffer is virtually contiguous). As such, the I/O could
      exceed the default max_segments limit and get split. This can cause
      unnecessary performance issues if the ublk server is optimized to handle
      1M I/Os. The block layer's segment count/size limits exist to model
      hardware constraints which don't exist in ublk_drv's case, so just
      remove those limits for the block devices created by ublk_drv.
      Signed-off-by: default avatarUday Shankar <ushankar@purestorage.com>
      Reviewed-by: default avatarRiley Thomasson <riley@purestorage.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20240430211623.2802036-1-ushankar@purestorage.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      eaf4a9b1
  4. 19 Apr, 2024 1 commit
    • Li Nan's avatar
      blk-iocost: do not WARN if iocg was already offlined · 01bc4fda
      Li Nan authored
      In iocg_pay_debt(), warn is triggered if 'active_list' is empty, which
      is intended to confirm iocg is active when it has debt. However, warn
      can be triggered during a blkcg or disk removal, if iocg_waitq_timer_fn()
      is run at that time:
      
        WARNING: CPU: 0 PID: 2344971 at block/blk-iocost.c:1402 iocg_pay_debt+0x14c/0x190
        Call trace:
        iocg_pay_debt+0x14c/0x190
        iocg_kick_waitq+0x438/0x4c0
        iocg_waitq_timer_fn+0xd8/0x130
        __run_hrtimer+0x144/0x45c
        __hrtimer_run_queues+0x16c/0x244
        hrtimer_interrupt+0x2cc/0x7b0
      
      The warn in this situation is meaningless. Since this iocg is being
      removed, the state of the 'active_list' is irrelevant, and 'waitq_timer'
      is canceled after removing 'active_list' in ioc_pd_free(), which ensures
      iocg is freed after iocg_waitq_timer_fn() returns.
      
      Therefore, add the check if iocg was already offlined to avoid warn
      when removing a blkcg or disk.
      Signed-off-by: default avatarLi Nan <linan122@huawei.com>
      Reviewed-by: default avatarYu Kuai <yukuai3@huawei.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20240419093257.3004211-1-linan666@huaweicloud.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      01bc4fda
  5. 18 Apr, 2024 1 commit
  6. 12 Apr, 2024 2 commits
    • Yu Kuai's avatar
      block: fix that blk_time_get_ns() doesn't update time after schedule · 3ec48489
      Yu Kuai authored
      While monitoring the throttle time of IO from iocost, it's found that
      such time is always zero after the io_schedule() from ioc_rqos_throttle,
      for example, with the following debug patch:
      
      +       printk("%s-%d: %s enter %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
              while (true) {
                      set_current_state(TASK_UNINTERRUPTIBLE);
                      if (wait.committed)
                              break;
                      io_schedule();
              }
      +       printk("%s-%d: %s exit  %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
      
      It can be observerd that blk_time_get_ns() always return the same time:
      
      [ 1068.096579] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.272587] fio-1268: ioc_rqos_throttle exit  1067901962288
      [ 1068.274389] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.472690] fio-1268: ioc_rqos_throttle exit  1067901962288
      [ 1068.474485] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.672656] fio-1268: ioc_rqos_throttle exit  1067901962288
      [ 1068.674451] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.872655] fio-1268: ioc_rqos_throttle exit  1067901962288
      
      And I think the root cause is that 'PF_BLOCK_TS' is always cleared
      by blk_flush_plug() before scheduel(), hence blk_plug_invalidate_ts()
      will never be called:
      
      blk_time_get_ns
       plug->cur_ktime = ktime_get_ns();
       current->flags |= PF_BLOCK_TS;
      
      io_schedule:
       io_schedule_prepare
        blk_flush_plug
         __blk_flush_plug
          /* the flag is cleared, while time is not */
          current->flags &= ~PF_BLOCK_TS;
       schedule
       sched_update_worker
        /* the flag is not set, hence plug->cur_ktime is not cleared */
        if (tsk->flags & PF_BLOCK_TS)
         blk_plug_invalidate_ts()
      
      blk_time_get_ns
       /* got the time stashed before schedule */
       return plug->cur_ktime;
      
      Fix the problem by clearing cached time in __blk_flush_plug().
      
      Fixes: 06b23f92 ("block: update cached timestamp post schedule/preemption")
      Signed-off-by: default avatarYu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20240411032349.3051233-2-yukuai1@huaweicloud.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      3ec48489
    • Yi Zhang's avatar
      nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH · 0bc2e80b
      Yi Zhang authored
      Move the stray '.' that is currently at the end of the line after
      newline '\n' to before newline character which is the right position.
      
      Fixes: ce8d7861 ("nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH")
      Signed-off-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Reviewed-by: default avatarChaitanya Kulkarni <kch@nvidia.com>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      0bc2e80b
  7. 09 Apr, 2024 1 commit
  8. 07 Apr, 2024 2 commits
  9. 06 Apr, 2024 1 commit
    • Rik van Riel's avatar
      blk-iocost: avoid out of bounds shift · beaa51b3
      Rik van Riel authored
      UBSAN catches undefined behavior in blk-iocost, where sometimes
      iocg->delay is shifted right by a number that is too large,
      resulting in undefined behavior on some architectures.
      
      [  186.556576] ------------[ cut here ]------------
      UBSAN: shift-out-of-bounds in block/blk-iocost.c:1366:23
      shift exponent 64 is too large for 64-bit type 'u64' (aka 'unsigned long long')
      CPU: 16 PID: 0 Comm: swapper/16 Tainted: G S          E    N 6.9.0-0_fbk700_debug_rc2_kbuilder_0_gc85af715 #1
      Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A23 12/08/2020
      Call Trace:
       <IRQ>
       dump_stack_lvl+0x8f/0xe0
       __ubsan_handle_shift_out_of_bounds+0x22c/0x280
       iocg_kick_delay+0x30b/0x310
       ioc_timer_fn+0x2fb/0x1f80
       __run_timer_base+0x1b6/0x250
      ...
      
      Avoid that undefined behavior by simply taking the
      "delay = 0" branch if the shift is too large.
      
      I am not sure what the symptoms of an undefined value
      delay will be, but I suspect it could be more than a
      little annoying to debug.
      Signed-off-by: default avatarRik van Riel <riel@surriel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20240404123253.0f58010f@imladris.surriel.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      beaa51b3
  10. 04 Apr, 2024 5 commits
  11. 02 Apr, 2024 4 commits
  12. 31 Mar, 2024 12 commits
  13. 30 Mar, 2024 2 commits
    • Mikulas Patocka's avatar
      objtool: Fix compile failure when using the x32 compiler · 6205125b
      Mikulas Patocka authored
      When compiling the v6.9-rc1 kernel with the x32 compiler, the following
      errors are reported. The reason is that we take an "unsigned long"
      variable and print it using "PRIx64" format string.
      
      	In file included from check.c:16:
      	check.c: In function ‘add_dead_ends’:
      	/usr/src/git/linux-2.6/tools/objtool/include/objtool/warn.h:46:17: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘long unsigned int’ [-Werror=format=]
      	   46 |                 "%s: warning: objtool: " format "\n",   \
      	      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
      	check.c:613:33: note: in expansion of macro ‘WARN’
      	  613 |                                 WARN("can't find unreachable insn at %s+0x%" PRIx64,
      	      |                                 ^~~~
      	...
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      6205125b
    • Linus Torvalds's avatar
      Merge tag 'xfs-6.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 712e1425
      Linus Torvalds authored
      Pull xfs fixes from Chandan Babu:
      
       - Allow stripe unit/width value passed via mount option to be written
         over existing values in the super block
      
       - Do not set current->journal_info to avoid its value from being miused
         by another filesystem context
      
      * tag 'xfs-6.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: don't use current->journal_info
        xfs: allow sunit mount option to repair bad primary sb stripe values
      712e1425