1. 10 Mar, 2016 13 commits
  2. 23 Feb, 2016 15 commits
  3. 22 Feb, 2016 11 commits
    • dm: allocate blk_mq_tag_set rather than embed in mapped_device · 1c357a1e
      Mike Snitzer authored
      The blk_mq_tag_set is only needed for dm-mq support.  There is no point
      wasting space in 'struct mapped_device' for non-dm-mq devices.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> # check kzalloc return
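
      A minimal sketch of the allocate-on-demand pattern this commit describes, assuming a
      pointer field 'md->tag_set' and an illustrative init helper; 'dm_mq_ops' and the sizing
      values stand in for whatever dm.c actually uses:

        static int dm_mq_init_tag_set(struct mapped_device *md)
        {
                int err;

                /* allocate only for dm-mq devices rather than embedding the
                 * tag set in every 'struct mapped_device' */
                md->tag_set = kzalloc(sizeof(struct blk_mq_tag_set), GFP_KERNEL);
                if (!md->tag_set)
                        return -ENOMEM;   /* the kzalloc return must be checked */

                md->tag_set->ops = &dm_mq_ops;   /* assumed dm-mq blk_mq_ops */
                md->tag_set->nr_hw_queues = 1;
                md->tag_set->queue_depth = 2048;
                md->tag_set->numa_node = NUMA_NO_NODE;
                md->tag_set->cmd_size = sizeof(struct dm_rq_target_io);
                md->tag_set->driver_data = md;

                err = blk_mq_alloc_tag_set(md->tag_set);
                if (err) {
                        kfree(md->tag_set);
                        md->tag_set = NULL;
                }
                return err;
        }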
    • dm: add 'dm_mq_nr_hw_queues' and 'dm_mq_queue_depth' module params · faad87df
      Mike Snitzer authored
      Allow user to change these values via module params or sysfs.
      
      'dm_mq_nr_hw_queues' defaults to 1 (max 32).
      
      'dm_mq_queue_depth' defaults to 2048 (up from 64, which proved far too
      small under moderately sized workloads -- the dm-multipath device would
      continuously block waiting for tags (requests) to become available).
      The maximum is BLK_MQ_MAX_DEPTH (currently 10240).
      
      Keep in mind the total number of pre-allocated requests per
      request-based dm-mq device is 'dm_mq_nr_hw_queues' * 'dm_mq_queue_depth'
      (1 * 2048 = 2048 with the current defaults).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
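
      A short sketch of how two such module parameters are typically declared; the plain
      module_param() form below is an assumption (clamping to the stated maximums would need a
      custom setter, as the real dm.c does):

        #include <linux/module.h>

        static unsigned dm_mq_nr_hw_queues = 1;     /* default 1, max 32 */
        module_param(dm_mq_nr_hw_queues, uint, S_IRUGO | S_IWUSR);
        MODULE_PARM_DESC(dm_mq_nr_hw_queues, "Number of hardware queues for dm-mq devices");

        static unsigned dm_mq_queue_depth = 2048;   /* default 2048, max BLK_MQ_MAX_DEPTH */
        module_param(dm_mq_queue_depth, uint, S_IRUGO | S_IWUSR);
        MODULE_PARM_DESC(dm_mq_queue_depth, "Queue depth for request-based dm-mq devices");

      With S_IWUSR set, the values are also adjustable at runtime under
      /sys/module/dm_mod/parameters/, matching the "module params or sysfs" wording above.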
    • dm: optimize dm_request_fn() · c91852ff
      Mike Snitzer authored
      DM multipath is the only request-based DM target, and request-based DM
      only supports tables with a single target that is immutable.  Leverage
      this fact in dm_request_fn().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
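
      A hedged sketch of what that leverage can look like: resolve the single immutable target
      once, outside the dispatch loop, instead of per request.  'md->immutable_target' and
      map_request() are stand-ins for the dm.c internals, not the exact upstream code:

        static void dm_request_fn(struct request_queue *q)
        {
                struct mapped_device *md = q->queuedata;
                /* the single immutable target, cached when the table was bound */
                struct dm_target *ti = md->immutable_target;
                struct request *rq;

                while (!blk_queue_stopped(q)) {
                        rq = blk_peek_request(q);
                        if (!rq)
                                return;
                        blk_start_request(rq);
                        map_request(ti, rq, md);   /* no per-request table lookup */
                }
        }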
    • dm: optimize dm_mq_queue_rq() · 16f12266
      Mike Snitzer authored
      DM multipath is the only dm-mq target.  But that aside, request-based DM
      only supports tables with a single target that is immutable.  Leverage
      this fact in dm_mq_queue_rq() by using the 'immutable_target' stored in
      the mapped_device when the table was made active.  This saves the need
      to even take the read-side of the SRCU via dm_{get,put}_live_table.
      
      If the active DM table does not have an immutable target (e.g. the "error"
      target was swapped in) then fall back to the slow path where the target
      is looked up from the live table.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
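
      A minimal sketch of the fast/slow-path split described above, with an illustrative helper
      name; dm_get_live_table()/dm_table_find_target() are the SRCU-protected lookup, while the
      cached 'md->immutable_target' is what lets the fast path skip them:

        static struct dm_target *dm_mq_get_target(struct mapped_device *md,
                                                  struct dm_table **map, int *srcu_idx)
        {
                struct dm_target *ti = md->immutable_target;

                if (ti) {
                        *map = NULL;    /* fast path: no SRCU read-side taken */
                        return ti;
                }

                /* slow path: e.g. the "error" target was swapped in */
                *map = dm_get_live_table(md, srcu_idx);
                if (!*map)
                        return NULL;
                return dm_table_find_target(*map, 0);
        }

      On the slow path the caller still owes a dm_put_live_table(md, *srcu_idx) once the
      request has been dispatched.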
    • dm: set DM_TARGET_WILDCARD feature on "error" target · f083b09b
      Mike Snitzer authored
      The DM_TARGET_WILDCARD feature indicates that the "error" target may
      replace any target; even immutable targets.  This feature will be useful
      to preserve the ability to replace the "multipath" target even once it
      is formally converted over to having the DM_TARGET_IMMUTABLE feature.
      
      Also, implicit in the DM_TARGET_WILDCARD feature flag being set is that
      .map, .map_rq, .clone_and_map_rq and .release_clone_rq are all defined
      in the target_type.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
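
      A sketch of what the feature flag looks like on the "error" target's target_type; the
      io_err_* handlers are assumed to exist elsewhere in dm-target.c and the version field is
      omitted:

        static struct target_type error_target = {
                .name             = "error",
                .features         = DM_TARGET_WILDCARD,      /* may replace any target */
                .ctr              = io_err_ctr,
                .dtr              = io_err_dtr,
                .map              = io_err_map,              /* bio-based */
                .map_rq           = io_err_map_rq,           /* .request_fn request-based */
                .clone_and_map_rq = io_err_clone_and_map_rq, /* blk-mq request-based */
                .release_clone_rq = io_err_release_clone_rq,
        };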
    • dm: cleanup dm_any_congested() · e522c039
      Mike Snitzer authored
      The request-based DM support for checking queue congestion doesn't
      require access to the live DM table.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
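
      A hedged sketch of the resulting shape of dm_any_congested() (the suspend-flag check of
      the real function is omitted for brevity); for request-based DM only the top-level queue
      is consulted, so the live-table access is confined to the bio-based branch:

        static int dm_any_congested(void *congested_data, int bdi_bits)
        {
                struct mapped_device *md = congested_data;
                struct dm_table *map;
                int r = bdi_bits;

                if (dm_request_based(md))
                        /* request-based: no need to look at the live table */
                        return md->queue->backing_dev_info.wb.state & bdi_bits;

                map = dm_get_live_table_fast(md);
                if (map)
                        r = dm_table_any_congested(map, bdi_bits);
                dm_put_live_table_fast(md);

                return r;
        }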
    • dm: remove unused dm_get_rq_mapinfo() · ae6ad75e
      Mike Snitzer authored
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix excessive dm-mq context switching · 6acfe68b
      Mike Snitzer authored
      Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
      than if an underlying null_blk device were used directly.  One of the
      reasons for this drop in performance is that blk_insert_clone_request()
      was calling blk_mq_insert_request() with @async=true.  This forced the
      use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
      which ushered in ping-ponging between process context (fio in this case)
      and kblockd's kworker to submit the cloned request.  The ftrace
      function_graph tracer showed:
      
        kworker-2013  =>   fio-12190
        fio-12190    =>  kworker-2013
        ...
        kworker-2013  =>   fio-12190
        fio-12190    =>  kworker-2013
        ...
      
      Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
      _not_ use kblockd to submit the cloned requests isn't enough to
      eliminate the observed context switches.
      
      In addition to this dm-mq specific blk-core fix, there are 2 DM core
      fixes to dm-mq that (when paired with the blk-core fix) completely
      eliminate the observed context switching:
      
      1)  don't blk_mq_run_hw_queues in blk-mq request completion
      
          Motivated by desire to reduce overhead of dm-mq, punting to kblockd
          just increases context switches.
      
          In my testing against a really fast null_blk device there was no benefit
          to running blk_mq_run_hw_queues() on completion (and no other blk-mq
          driver does this).  So hopefully this change doesn't induce the need for
          yet another revert like commit 621739b0 !
      
      2)  use blk_mq_complete_request() in dm_complete_request()
      
          blk_complete_request() doesn't offer the traditional q->mq_ops vs
          .request_fn branching pattern that other historic block interfaces
          do (e.g. blk_get_request).  Using blk_mq_complete_request() for
          blk-mq requests is important for performance.  It should be noted
          that, like blk_complete_request(), blk_mq_complete_request() doesn't
          natively handle partial completions -- but the request-based
          DM-multipath target does provide the required partial completion
          support by dm.c:end_clone_bio() triggering requeueing of the request
          via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.
      
      dm-mq fix #2 is _much_ more important than #1 for eliminating the
      context switches.
      Before: cpu          : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
      After:  cpu          : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472
      
      With these changes multithreaded async read IOPs improved from ~950K
      to ~1350K for this dm-mq stacked on null_blk test-case.  The raw read
      IOPs of the underlying null_blk device for the same workload is ~1950K.
      
      Fixes: 7fb4898e ("block: add blk-mq support to blk_insert_cloned_request()")
      Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM")
      Cc: stable@vger.kernel.org # 4.1+
      Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
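
      A sketch of dm-mq fix #2 as described above, branching on whether the queue is blk-mq;
      tio_from_request(), the tio->error field and the two-argument blk_mq_complete_request()
      follow the dm.c and blk-mq of that kernel era but are assumptions here:

        static void dm_complete_request(struct request *rq, int error)
        {
                struct dm_rq_target_io *tio = tio_from_request(rq);

                tio->error = error;
                if (!rq->q->mq_ops)
                        blk_complete_request(rq);            /* legacy .request_fn path */
                else
                        blk_mq_complete_request(rq, error);  /* blk-mq path */
        }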
    • dm: fix sparse "unexpected unlock" warnings in ioctl code · 956a4025
      Mike Snitzer authored
      Rename dm_get_live_table_for_ioctl to dm_grab_bdev_for_ioctl and have it
      do the dm_{get,put}_live_table() rather than split those operations.
      
      The dm_grab_bdev_for_ioctl() callers only care about the block_device
      associated with a singleton DM device so there isn't any need to retain
      a reference to the live DM table.  It is sufficient to:
      1) dm_get_live_table()
      2) bdgrab() the bdev associated with the singleton table's target
      3) dm_put_live_table()
      4) bdput() the bdev
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
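
      A minimal sketch of that 1)-4) sequence, assuming the singleton target exposes its
      block_device via the ->prepare_ioctl hook; error handling and the pass-through cases of
      the real helper are simplified away:

        static struct block_device *dm_grab_bdev_for_ioctl(struct mapped_device *md)
        {
                struct block_device *bdev = NULL;
                struct dm_table *map;
                struct dm_target *ti;
                fmode_t mode;
                int srcu_idx;

                map = dm_get_live_table(md, &srcu_idx);           /* 1) get live table */
                if (!map || dm_table_get_num_targets(map) != 1)
                        goto out;

                ti = dm_table_get_target(map, 0);
                if (!ti->type->prepare_ioctl ||
                    ti->type->prepare_ioctl(ti, &bdev, &mode) < 0) {
                        bdev = NULL;
                        goto out;
                }

                bdgrab(bdev);                                     /* 2) pin the bdev */
        out:
                dm_put_live_table(md, srcu_idx);                  /* 3) drop the table */
                return bdev;        /* caller does 4) bdput(bdev) when the ioctl is done */
        }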
    • dm: do not return target from dm_get_live_table_for_ioctl() · 66482026
      Mike Snitzer authored
      None of the callers actually used the returned target.
      Also, just reuse bdev pointer passed to dm_blk_ioctl().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix dm_rq_target_io leak on faults with .request_fn DM w/ blk-mq paths · 4328daa2
      Mike Snitzer authored
      Using request-based DM mpath configured with the following stacking
      (.request_fn DM mpath on top of scsi-mq paths):
      
      echo Y > /sys/module/scsi_mod/parameters/use_blk_mq
      echo N > /sys/module/dm_mod/parameters/use_blk_mq
      
      'struct dm_rq_target_io' would leak if a request is requeued before a
      blk-mq clone is allocated (or fails to allocate).  free_rq_tio()
      wasn't being called.
      
      kmemleak reported:
      
      unreferenced object 0xffff8800b90b98c0 (size 112):
        comm "kworker/7:1H", pid 5692, jiffies 4295056109 (age 78.589s)
        hex dump (first 32 bytes):
          00 d0 5c 2c 03 88 ff ff 40 00 bf 01 00 c9 ff ff  ..\,....@.......
          e0 d9 b1 34 00 88 ff ff 00 00 00 00 00 00 00 00  ...4............
        backtrace:
          [<ffffffff81672b6e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811dbb63>] kmem_cache_alloc+0xc3/0x1e0
          [<ffffffff8117eae5>] mempool_alloc_slab+0x15/0x20
          [<ffffffff8117ec1e>] mempool_alloc+0x6e/0x170
          [<ffffffffa00029ac>] dm_old_prep_fn+0x3c/0x180 [dm_mod]
          [<ffffffff812fbd78>] blk_peek_request+0x168/0x290
          [<ffffffffa0003e62>] dm_request_fn+0xb2/0x1b0 [dm_mod]
          [<ffffffff812f66e3>] __blk_run_queue+0x33/0x40
          [<ffffffff812f9585>] blk_delay_work+0x25/0x40
          [<ffffffff81096fff>] process_one_work+0x14f/0x3d0
          [<ffffffff81097715>] worker_thread+0x125/0x4b0
          [<ffffffff8109ce88>] kthread+0xd8/0xf0
          [<ffffffff8167cb8f>] ret_from_fork+0x3f/0x70
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      crash> struct -o dm_rq_target_io
      struct dm_rq_target_io {
          ...
      }
      SIZE: 112
      
      Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
      Cc: stable@vger.kernel.org # 4.0+
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
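
      A hedged sketch of the fix's idea: when the original request is requeued before a blk-mq
      clone exists, return the already-allocated tio to its mempool instead of leaking it.
      free_rq_tio() and dm_requeue_original_request() follow the dm.c names of that era, but the
      helper and the exact call site are assumptions:

        static void dm_requeue_unmapped_tio(struct dm_rq_target_io *tio)
        {
                struct mapped_device *md = tio->md;
                struct request *rq = tio->orig;

                if (!tio->clone)
                        free_rq_tio(tio);   /* no clone was ever allocated: don't leak */

                dm_requeue_original_request(md, rq);
        }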
  4. 20 Feb, 2016 1 commit