1. 07 Oct, 2020 2 commits
  2. 01 Oct, 2020 3 commits
  3. 30 Sep, 2020 1 commit
    • Mike Snitzer's avatar
      dm: fix missing imposition of queue_limits from dm_wq_work() thread · 0c2915b8
      Mike Snitzer authored
      If a DM device was suspended when bios were issued to it, those bios
      would be deferred using queue_io(). Once the DM device was resumed
      dm_process_bio() could be called by dm_wq_work() for original bio that
      still needs splitting. dm_process_bio()'s check for current->bio_list
      (meaning call chain is within ->submit_bio) as a prerequisite for
      calling blk_queue_split() for "abnormal IO" would result in
      dm_process_bio() never imposing corresponding queue_limits
      (e.g. discard_granularity, discard_max_bytes, etc).
      
      Fix this by always having dm_wq_work() resubmit deferred bios using
      submit_bio_noacct().
      
      Side-effect is blk_queue_split() is always called for "abnormal IO" from
      ->submit_bio, be it from application thread or dm_wq_work() workqueue,
      so proper bio splitting and depth-first bio submission is performed.
      For sake of clarity, remove current->bio_list check before call to
      blk_queue_split().
      
      Also, remove dm_wq_work()'s use of dm_{get,put}_live_table() -- no
      longer needed since IO will be reissued in terms of ->submit_bio.
      And rename bio variable from 'c' to 'bio'.
      
      Fixes: cf9c3786 ("dm: fix comment in dm_process_bio()")
      Reported-by: default avatarJeffle Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      0c2915b8
  4. 29 Sep, 2020 16 commits
  5. 28 Sep, 2020 1 commit
    • Xianting Tian's avatar
      blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps() · 8229cca8
      Xianting Tian authored
      We found blk_mq_alloc_rq_maps() takes more time in kernel space when
      testing nvme device hot-plugging. The test and anlysis as below.
      
      Debug code,
      1, blk_mq_alloc_rq_maps():
              u64 start, end;
              depth = set->queue_depth;
              start = ktime_get_ns();
              pr_err("[%d:%s switch:%ld,%ld] queue depth %d, nr_hw_queues %d\n",
                              current->pid, current->comm, current->nvcsw, current->nivcsw,
                              set->queue_depth, set->nr_hw_queues);
              do {
                      err = __blk_mq_alloc_rq_maps(set);
                      if (!err)
                              break;
      
                      set->queue_depth >>= 1;
                      if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN) {
                              err = -ENOMEM;
                              break;
                      }
              } while (set->queue_depth);
              end = ktime_get_ns();
              pr_err("[%d:%s switch:%ld,%ld] all hw queues init cost time %lld ns\n",
                              current->pid, current->comm,
                              current->nvcsw, current->nivcsw, end - start);
      
      2, __blk_mq_alloc_rq_maps():
              u64 start, end;
              for (i = 0; i < set->nr_hw_queues; i++) {
                      start = ktime_get_ns();
                      if (!__blk_mq_alloc_rq_map(set, i))
                              goto out_unwind;
                      end = ktime_get_ns();
                      pr_err("hw queue %d init cost time %lld ns\n", i, end - start);
              }
      
      Test nvme hot-plugging with above debug code, we found it totally cost more
      than 3ms in kernel space without being scheduled out when alloc rqs for all
      16 hw queues with depth 1023, each hw queue cost about 140-250us. The cost
      time will be increased with hw queue number and queue depth increasing. And
      in an extreme case, if __blk_mq_alloc_rq_maps() returns -ENOMEM, it will try
      "queue_depth >>= 1", more time will be consumed.
      	[  428.428771] nvme nvme0: pci function 10000:01:00.0
      	[  428.428798] nvme 10000:01:00.0: enabling device (0000 -> 0002)
      	[  428.428806] pcieport 10000:00:00.0: can't derive routing for PCI INT A
      	[  428.428809] nvme 10000:01:00.0: PCI INT A: no GSI
      	[  432.593374] [4688:kworker/u33:8 switch:663,2] queue depth 30, nr_hw_queues 1
      	[  432.593404] hw queue 0 init cost time 22883 ns
      	[  432.593408] [4688:kworker/u33:8 switch:663,2] all hw queues init cost time 35960 ns
      	[  432.595953] nvme nvme0: 16/0/0 default/read/poll queues
      	[  432.595958] [4688:kworker/u33:8 switch:700,2] queue depth 1023, nr_hw_queues 16
      	[  432.596203] hw queue 0 init cost time 242630 ns
      	[  432.596441] hw queue 1 init cost time 235913 ns
      	[  432.596659] hw queue 2 init cost time 216461 ns
      	[  432.596877] hw queue 3 init cost time 215851 ns
      	[  432.597107] hw queue 4 init cost time 228406 ns
      	[  432.597336] hw queue 5 init cost time 227298 ns
      	[  432.597564] hw queue 6 init cost time 224633 ns
      	[  432.597785] hw queue 7 init cost time 219954 ns
      	[  432.597937] hw queue 8 init cost time 150930 ns
      	[  432.598082] hw queue 9 init cost time 143496 ns
      	[  432.598231] hw queue 10 init cost time 147261 ns
      	[  432.598397] hw queue 11 init cost time 164522 ns
      	[  432.598542] hw queue 12 init cost time 143401 ns
      	[  432.598692] hw queue 13 init cost time 148934 ns
      	[  432.598841] hw queue 14 init cost time 147194 ns
      	[  432.598991] hw queue 15 init cost time 148942 ns
      	[  432.598993] [4688:kworker/u33:8 switch:700,2] all hw queues init cost time 3035099 ns
      	[  432.602611]  nvme0n1: p1
      
      So use this patch to trigger schedule between each hw queue init, to avoid
      other threads getting stuck. It is not in atomic context when executing
      __blk_mq_alloc_rq_maps(), so it is safe to call cond_resched().
      Signed-off-by: default avatarXianting Tian <tian.xianting@h3c.com>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8229cca8
  6. 25 Sep, 2020 17 commits