1. 11 Aug, 2017 1 commit
    • blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op · 8ffac713
      Wen Xiong authored
      BugLink: http://bugs.launchpad.net/bugs/1689946

      When formatting an NVMe device to 512B/4K + T10 DIF/DIX, dd with a split
      op returns "Input/output error". It looks like the block layer splits
      the bio after bio_integrity_prep(bio) has already been called. This
      patch fixes the issue.
      
      Below is how we debugged this issue:
      (1) format the nvme device to a 4K block size with type 2 DIF
      (2) dd with a block size bigger than 1024k and oflag=direct
      dd: error writing '/dev/nvme0n1': Input/output error
      
      We added some debug code to the nvme device driver. It showed us that
      the first op and the second op had the same bi and pi address, which
      is not correct.
      
      1st op: nvme0n1 Op:Wr slba 0x505 length 0x100, PI ctrl=0x1400,
      	dsmgmt=0x0, AT=0x0 & RT=0x505
      	Guard 0x00b1, AT 0x0000, RT physical 0x00000505 RT virtual 0x00002828
      
      2nd op: nvme0n1 Op:Wr slba 0x605 length 0x1, PI ctrl=0x1400, dsmgmt=0x0,
      	AT=0x0 & RT=0x605  ==> This op fails, as do the 5 subsequent retries.
      	Guard 0x00b1, AT 0x0000, RT physical 0x00000605 RT virtual 0x00002828
      
      With the fix, it showed us that both the first op and the second op have
      the correct bi and pi addresses.
      
      1st op: nvme2n1 Op:Wr slba 0x505 length 0x100, PI ctrl=0x1400,
      	dsmgmt=0x0, AT=0x0 & RT=0x505
      	Guard 0x5ccb, AT 0x0000, RT physical 0x00000505 RT virtual
      	0x00002828
      2nd op: nvme2n1 Op:Wr slba 0x605 length 0x1, PI ctrl=0x1400, dsmgmt=0x0,
      	AT=0x0 & RT=0x605
      	Guard 0xab4c, AT 0x0000, RT physical 0x00000605 RT virtual
      	0x00003028
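      As a hedged sketch of what the fix changes (helper names follow the
      blk-mq make_request path of that era; unrelated code is elided), the
      idea is simply to split before preparing integrity data:

          static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
          {
                  blk_queue_bounce(q, &bio);
                  blk_queue_split(q, &bio, q->bio_split);   /* split first... */

                  /* ...so each split bio gets its own protection information,
                   * instead of preparing PI once and then splitting */
                  if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
                          bio_io_error(bio);
                          return BLK_QC_T_NONE;
                  }
                  /* ... */
          }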
      Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      (cherry picked from commit f36ea50c)
      Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
      Acked-by: Stefan Bader <stefan.bader@canonical.com>
      Acked-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
  2. 08 Mar, 2017 2 commits
    • blk-mq: Fix failed allocation path when mapping queues · ce7a95c5
      Gabriel Krisman Bertazi authored
      BugLink: http://bugs.launchpad.net/bugs/1662666
      
      In blk_mq_map_swqueue, there is a memory optimization that frees the
      tags of a queue that has gone unmapped.  Later, if that hctx is remapped
      after another topology change, the tags need to be reallocated.
      
      If this allocation fails, a simple WARN_ON triggers, but the block layer
      ends up with an active hctx without any corresponding set of tags.
      Then, any incoming IO to that hctx can trigger an Oops.
      
      I can reproduce it consistently by running IO, flipping CPUs on and off
      and eventually injecting a memory allocation failure in that path.
      
      In the fix below, if the system experiences a failed allocation of any
      hctx's tags, we remap all the ctxs of that queue to hctx_0, which
      should always keep its tags.  There is a minor performance hit, since
      our mapping just got worse after the error path, but this is
      the simplest solution to handle this error path.  The performance hit
      will disappear after another su...
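      A hedged sketch of that fallback in blk_mq_map_swqueue() (alloc_tags()
      is a hypothetical stand-in for the real tag-allocation helper):

          hctx_idx = q->mq_map[i];
          /* an unmapped hw queue can be remapped after a topology change */
          if (!set->tags[hctx_idx] && !alloc_tags(set, hctx_idx)) {
                  /* tag allocation failed: fall back to hctx_0, whose
                   * tags are never freed (alloc_tags() is hypothetical) */
                  q->mq_map[i] = 0;
                  hctx_idx = 0;
          }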
    • blk-mq: Avoid memory reclaim when remapping queues · 2fd8868a
      Gabriel Krisman Bertazi authored
      BugLink: http://bugs.launchpad.net/bugs/1662666

      While stressing memory and IO and changing SMT settings at the same
      time, we were able to consistently trigger deadlocks in the mm system,
      which froze the entire machine.
      
      I think that under memory stress conditions, the large allocations
      performed by blk_mq_init_rq_map may trigger a reclaim, which stalls
      waiting on the block layer remapping completion, thus deadlocking the
      system.  The trace below was collected after the machine stalled,
      waiting for the hotplug event completion.
      
      The simplest fix for this is to make allocations in this path
      non-reclaimable, with GFP_NOIO.  With this patch, we couldn't hit the
      issue anymore.
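      As a sketch of the idea (the exact allocation sites differ; GFP_NOIO is
      the standard flag that keeps reclaim from issuing new I/O):

          /* request-map allocations on the remap path must not recurse
           * into the block layer via direct reclaim */
          tags->rqs = kzalloc_node(set->queue_depth * sizeof(struct request *),
                                   GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
                                   set->numa_node);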
      
      This should apply on top of Jens's for-next branch cleanly.
      
      Changes since v1:
        - Use GFP_NOIO instead of GFP_NOWAIT.
      
       Call Trace:
      [c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
      [c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
      [c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
      [c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
      [c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
      [c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
      [c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
      [c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
      [c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
      [c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
      [c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
      [c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
      [c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
      [c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
      [c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
      [c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
      [c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
      [c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
      [c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
      [c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
      [c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
      [c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
      [c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
      [c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
      [c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
      [c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
      [c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
      [c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
      [c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
      [c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
      [c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
      [c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
      [c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
      [c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
      [c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
      [c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
      [c000000f0160be30] [c000000000009204] system_call+0x38/0xec
      Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Cc: Brian King <brking@linux.vnet.ibm.com>
      Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
      Cc: linux-block@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
      (cherry picked from commit 36e1f3d1)
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Acked-by: Stefan Bader <stefan.bader@canonical.com>
      Acked-by: Brad Figg <brad.figg@canonical.com>
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
  3. 20 Jan, 2017 1 commit
  4. 18 Jan, 2017 1 commit
  5. 10 Jan, 2017 1 commit
  6. 06 Dec, 2016 1 commit
  7. 20 Oct, 2016 1 commit
  8. 19 Sep, 2016 3 commits
    • blk-mq: don't overwrite rq->mq_ctx · f463371e
      Jens Axboe authored
      BugLink: http://bugs.launchpad.net/bugs/1620317

      We do this in a few places if the CPU is offline. This isn't allowed,
      though, since on multi queue hardware we can't just move a request
      from one software queue to another if they map to different hardware
      queues. The request and tag aren't valid on another hardware queue.
      
      This can happen if plugging races with CPU offlining. But it does
      no harm, since it can only happen in the window where we are
      currently busy freezing the queue and flushing IO, in preparation
      for redoing the software <-> hardware queue mappings.
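      A hedged sketch of the resulting rule (helper names approximate the
      blk-mq code of that era): a request is always inserted through the ctx
      it was allocated on, never a substituted one.

          struct blk_mq_ctx *ctx = rq->mq_ctx;    /* never overwritten */
          struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);

          spin_lock(&ctx->lock);
          __blk_mq_insert_request(hctx, rq, false);
          spin_unlock(&ctx->lock);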
      Signed-off-by: Jens Axboe <axboe@fb.com>
      (cherry picked from commit e57690fe)
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Acked-by: Brad Figg <brad.figg@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • blk-mq: improve warning for running a queue on the wrong CPU · 8b6f6135
      Jens Axboe authored
      BugLink: http://bugs.launchpad.net/bugs/1620317

      __blk_mq_run_hw_queue() currently warns if we are running the queue on a
      CPU that isn't set in its mask. However, this can happen if a CPU is
      being offlined, and the workqueue handling will place the work on CPU0
      instead. Improve the warning so that it only triggers if the batch cpu
      in the hardware queue is currently online.  If it triggers for that
      case, then it's indicative of a flow problem in blk-mq, so we want to
      retain it for that case.
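      A sketch of the reworked check (paraphrased; the exact message text is
      illustrative):

          /* in __blk_mq_run_hw_queue(): only complain when we are on the
           * wrong CPU while the queue's chosen batch CPU is online --
           * otherwise the workqueue legitimately fell back to another CPU */
          if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
              cpu_online(hctx->next_cpu)) {
                  printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
                         raw_smp_processor_id(),
                         cpumask_empty(hctx->cpumask) ? "inactive" : "active");
                  dump_stack();
          }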
      Signed-off-by: Jens Axboe <axboe@fb.com>
      (cherry picked from commit 0e87e58b)
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Acked-by: Brad Figg <brad.figg@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • blk-mq: Allow timeouts to run while queue is freezing · 702b11d1
      Gabriel Krisman Bertazi authored
      BugLink: http://bugs.launchpad.net/bugs/1620317

      In case a submitted request gets stuck for some reason, the block layer
      can prevent request starvation by starting the scheduled timeout work.
      If this stuck request occurs at the same time another thread has started
      a queue freeze, the blk_mq_timeout_work will not be able to acquire the
      queue reference and will return silently, thus not issuing the timeout.
      But since the request is already holding a q_usage_counter reference and
      is unable to complete, it will never release its reference, preventing
      the queue from completing the freeze started by the first thread.  This
      puts the request_queue in a hung state, forever waiting for the freeze
      completion.
      
      This was observed while running IO to a NVMe device at the same time we
      toggled the CPU hotplug code. Eventually, once a request got stuck
      requiring a timeout during a queue freeze, we saw the CPU Hotplug
      notification code get stuck inside blk_mq_freeze_queue_wait, as shown in
      the trace below.
      
      [c000000deaf13690] [c000000deaf13738] 0xc000000deaf13738 (unreliable)
      [c000000deaf13860] [c000000000015ce8] __switch_to+0x1f8/0x350
      [c000000deaf138b0] [c000000000ade0e4] __schedule+0x314/0x990
      [c000000deaf13940] [c000000000ade7a8] schedule+0x48/0xc0
      [c000000deaf13970] [c0000000005492a4] blk_mq_freeze_queue_wait+0x74/0x110
      [c000000deaf139e0] [c00000000054b6a8] blk_mq_queue_reinit_notify+0x1a8/0x2e0
      [c000000deaf13a40] [c0000000000e7878] notifier_call_chain+0x98/0x100
      [c000000deaf13a90] [c0000000000b8e08] cpu_notify_nofail+0x48/0xa0
      [c000000deaf13ac0] [c0000000000b92f0] _cpu_down+0x2a0/0x400
      [c000000deaf13b90] [c0000000000b94a8] cpu_down+0x58/0xa0
      [c000000deaf13bc0] [c0000000006d5dcc] cpu_subsys_offline+0x2c/0x50
      [c000000deaf13bf0] [c0000000006cd244] device_offline+0x104/0x140
      [c000000deaf13c30] [c0000000006cd40c] online_store+0x6c/0xc0
      [c000000deaf13c80] [c0000000006c8c78] dev_attr_store+0x68/0xa0
      [c000000deaf13cc0] [c0000000003974d0] sysfs_kf_write+0x80/0xb0
      [c000000deaf13d00] [c0000000003963e8] kernfs_fop_write+0x188/0x200
      [c000000deaf13d50] [c0000000002e0f6c] __vfs_write+0x6c/0xe0
      [c000000deaf13d90] [c0000000002e1ca0] vfs_write+0xc0/0x230
      [c000000deaf13de0] [c0000000002e2cdc] SyS_write+0x6c/0x110
      [c000000deaf13e30] [c000000000009204] system_call+0x38/0xb4
      
      The fix is to allow the timeout work to execute in the window between
      dropping the initial refcount reference and the release of the last
      reference, which actually marks the freeze completion.  This can be
      achieved with percpu_ref_tryget, which does not require the counter
      to be alive.  This way the timeout work can do its job and terminate a
      stuck request even during a freeze, returning its reference and avoiding
      the deadlock.
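      A minimal sketch of that change in the timeout worker
      (percpu_ref_tryget succeeds even while the ref is dying, unlike
      percpu_ref_tryget_live):

          static void blk_mq_timeout_work(struct work_struct *work)
          {
                  struct request_queue *q =
                          container_of(work, struct request_queue, timeout_work);

                  /* tryget still works during a freeze; tryget_live would
                   * fail and silently skip the timeout handling */
                  if (!percpu_ref_tryget(&q->q_usage_counter))
                          return;

                  /* ... iterate tagged requests and expire them ... */

                  blk_queue_exit(q);      /* drop the reference we took */
          }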
      
      Allowing the timeout to run is just part of the fix, since for some
      devices we might get stuck again inside the device driver's timeout
      handler, should it attempt to allocate a new request in that path -
      a quite common action for Abort commands, which need to be sent
      after a timeout.  In NVMe, for instance, we call blk_mq_alloc_request
      from inside the timeout handler, which will fail during a freeze, since
      it also tries to acquire a queue reference.
      
      I considered a similar change to blk_mq_alloc_request as a generic
      solution for further device driver hangs, but we can't do that, since it
      would allow new requests to disturb the freeze process.  I thought about
      creating a new function in the block layer to support unfreezable
      requests for these occasions, but after working on it for a while, I
      feel like this should be handled on a per-driver basis.  I'm now
      experimenting with changes to the NVMe timeout path, but I'm open to
      suggestions of ways to make this generic.
      Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Cc: Brian King <brking@linux.vnet.ibm.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-block@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      (cherry picked from commit 71f79fb3)
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Acked-by: Brad Figg <brad.figg@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
  9. 06 Sep, 2016 2 commits
  10. 09 Aug, 2016 1 commit
  11. 10 Jun, 2016 1 commit
  12. 29 Feb, 2016 1 commit
  13. 21 Nov, 2015 1 commit
    • blk-mq: fix calling unplug callbacks with preempt disabled · b094f89c
      Jens Axboe authored
      
      Liu reported that running certain parts of xfstests threw the
      following error:
      
      BUG: sleeping function called from invalid context at mm/page_alloc.c:3190
      in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u16:0
      3 locks held by kworker/u16:0/6:
       #0:  ("writeback"){++++.+}, at: [<ffffffff8107f083>] process_one_work+0x173/0x730
       #1:  ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff8107f083>] process_one_work+0x173/0x730
       #2:  (&type->s_umount_key#44){+++++.}, at: [<ffffffff811e6805>] trylock_super+0x25/0x60
      CPU: 5 PID: 6 Comm: kworker/u16:0 Tainted: G           OE   4.3.0+ #3
      Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
      Workqueue: writeback wb_workfn (flush-btrfs-108)
       ffffffff81a3abab ffff88042e282ba8 ffffffff8130191b ffffffff81a3abab
       0000000000000c76 ffff88042e282ba8 ffff88042e27c180 ffff88042e282bd8
       ffffffff8108ed95 ffff880400000004 0000000000000000 0000000000000c76
      Call Trace:
       [<ffffffff8130191b>] dump_stack+0x4f/0x74
       [<ffffffff8108ed95>] ___might_sleep+0x185/0x240
       [<ffffffff8108eea2>] __might_sleep+0x52/0x90
       [<ffffffff811817e8>] __alloc_pages_nodemask+0x268/0x410
       [<ffffffff8109a43c>] ? sched_clock_local+0x1c/0x90
       [<ffffffff8109a6d1>] ? local_clock+0x21/0x40
       [<ffffffff810b9eb0>] ? __lock_release+0x420/0x510
       [<ffffffff810b534c>] ? __lock_acquired+0x16c/0x3c0
       [<ffffffff811ca265>] alloc_pages_current+0xc5/0x210
       [<ffffffffa0577105>] ? rbio_is_full+0x55/0x70 [btrfs]
       [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0
       [<ffffffff81666d50>] ? _raw_spin_unlock_irqrestore+0x40/0x60
       [<ffffffffa0578c0a>] full_stripe_write+0x5a/0xc0 [btrfs]
       [<ffffffffa0578ca9>] __raid56_parity_write+0x39/0x60 [btrfs]
       [<ffffffffa0578deb>] run_plug+0x11b/0x140 [btrfs]
       [<ffffffffa0578e33>] btrfs_raid_unplug+0x23/0x70 [btrfs]
       [<ffffffff812d36c2>] blk_flush_plug_list+0x82/0x1f0
       [<ffffffff812e0349>] blk_sq_make_request+0x1f9/0x740
       [<ffffffff812ceba2>] ? generic_make_request_checks+0x222/0x7c0
       [<ffffffff812cf264>] ? blk_queue_enter+0x124/0x310
       [<ffffffff812cf1d2>] ? blk_queue_enter+0x92/0x310
       [<ffffffff812d0ae2>] generic_make_request+0x172/0x2c0
       [<ffffffff812d0ad4>] ? generic_make_request+0x164/0x2c0
       [<ffffffff812d0ca0>] submit_bio+0x70/0x140
       [<ffffffffa0577b29>] ? rbio_add_io_page+0x99/0x150 [btrfs]
       [<ffffffffa0578a89>] finish_rmw+0x4d9/0x600 [btrfs]
       [<ffffffffa0578c4c>] full_stripe_write+0x9c/0xc0 [btrfs]
       [<ffffffffa057ab7f>] raid56_parity_write+0xef/0x160 [btrfs]
       [<ffffffffa052bd83>] btrfs_map_bio+0xe3/0x2d0 [btrfs]
       [<ffffffffa04fbd6d>] btrfs_submit_bio_hook+0x8d/0x1d0 [btrfs]
       [<ffffffffa05173c4>] submit_one_bio+0x74/0xb0 [btrfs]
       [<ffffffffa0517f55>] submit_extent_page+0xe5/0x1c0 [btrfs]
       [<ffffffffa0519b18>] __extent_writepage_io+0x408/0x4c0 [btrfs]
       [<ffffffffa05179c0>] ? alloc_dummy_extent_buffer+0x140/0x140 [btrfs]
       [<ffffffffa051dc88>] __extent_writepage+0x218/0x3a0 [btrfs]
       [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0
       [<ffffffffa051e2c9>] extent_write_cache_pages.clone.0+0x2f9/0x400 [btrfs]
       [<ffffffffa051e422>] extent_writepages+0x52/0x70 [btrfs]
       [<ffffffffa05001f0>] ? btrfs_set_inode_index+0x70/0x70 [btrfs]
       [<ffffffffa04fcc17>] btrfs_writepages+0x27/0x30 [btrfs]
       [<ffffffff81184df3>] do_writepages+0x23/0x40
       [<ffffffff81212229>] __writeback_single_inode+0x89/0x4d0
       [<ffffffff81212a60>] ? writeback_sb_inodes+0x260/0x480
       [<ffffffff81212a60>] ? writeback_sb_inodes+0x260/0x480
       [<ffffffff8121295f>] ? writeback_sb_inodes+0x15f/0x480
       [<ffffffff81212ad2>] writeback_sb_inodes+0x2d2/0x480
       [<ffffffff810b1397>] ? down_read_trylock+0x57/0x60
       [<ffffffff811e6805>] ? trylock_super+0x25/0x60
       [<ffffffff810d629f>] ? rcu_read_lock_sched_held+0x4f/0x90
       [<ffffffff81212d0c>] __writeback_inodes_wb+0x8c/0xc0
       [<ffffffff812130b5>] wb_writeback+0x2b5/0x500
       [<ffffffff810b7ed8>] ? mark_held_locks+0x78/0xa0
       [<ffffffff810660a8>] ? __local_bh_enable_ip+0x68/0xc0
       [<ffffffff81213362>] ? wb_do_writeback+0x62/0x310
       [<ffffffff812133c1>] wb_do_writeback+0xc1/0x310
       [<ffffffff8107c3d9>] ? set_worker_desc+0x79/0x90
       [<ffffffff81213842>] wb_workfn+0x92/0x330
       [<ffffffff8107f133>] process_one_work+0x223/0x730
       [<ffffffff8107f083>] ? process_one_work+0x173/0x730
       [<ffffffff8108035f>] ? worker_thread+0x18f/0x430
       [<ffffffff810802ed>] worker_thread+0x11d/0x430
       [<ffffffff810801d0>] ? maybe_create_worker+0xf0/0xf0
       [<ffffffff810801d0>] ? maybe_create_worker+0xf0/0xf0
       [<ffffffff810858df>] kthread+0xef/0x110
       [<ffffffff8108f74e>] ? schedule_tail+0x1e/0xd0
       [<ffffffff810857f0>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff816673bf>] ret_from_fork+0x3f/0x70
       [<ffffffff810857f0>] ? __init_kthread_worker+0x70/0x70
      
      The issue is that we've got the software context pinned while
      calling blk_flush_plug_list(), which flushes callbacks that
      are allowed to sleep. btrfs and raid have such callbacks.
      
      Flip the checks around a bit, so we can enable preemption a bit
      earlier and flush plugs without having preemption disabled.
      
      This only affects blk-mq driven devices, and only those that
      register a single queue.
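      A sketch of the reordering in the single-queue make_request path (names
      from the blk-mq code of that era):

          /* release the pinned software ctx -- re-enabling preemption --
           * BEFORE flushing the plug list, whose callbacks may sleep */
          blk_mq_put_ctx(data.ctx);

          if (request_count >= BLK_MAX_REQUEST_COUNT) {
                  blk_flush_plug_list(plug, false);
                  trace_block_plug(q);
          }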
      Reported-by: Liu Bo <bo.li.liu@oracle.com>
      Tested-by: Liu Bo <bo.li.liu@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
  14. 11 Nov, 2015 1 commit
  15. 07 Nov, 2015 4 commits
    • blk-mq: return tag/queue combo in the make_request_fn handlers · 7b371636
      Jens Axboe authored
      
      Return a cookie, blk_qc_t, from the blk-mq make request functions, that
      allows a later caller to uniquely identify a specific IO. The cookie
      doesn't mean anything to the caller, but the caller can pass it back to
      the block layer later. The block layer can then identify the hardware
      queue and request from that cookie.
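      The cookie is just a packed (hardware queue, tag) pair; helpers along
      these lines (matching the encoding this commit introduces) let the
      block layer take it apart again:

          typedef unsigned int blk_qc_t;
          #define BLK_QC_T_NONE   -1U
          #define BLK_QC_T_SHIFT  16

          static inline blk_qc_t blk_tag_to_qc_t(unsigned int tag,
                                                 unsigned int queue_num)
          {
                  return tag | (queue_num << BLK_QC_T_SHIFT);
          }

          static inline unsigned int blk_qc_t_to_queue_num(blk_qc_t cookie)
          {
                  return cookie >> BLK_QC_T_SHIFT;
          }

          static inline unsigned int blk_qc_t_to_tag(blk_qc_t cookie)
          {
                  return cookie & ((1u << BLK_QC_T_SHIFT) - 1);
          }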
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Keith Busch <keith.busch@intel.com>
    • block: change ->make_request_fn() and users to return a queue cookie · dece1635
      Jens Axboe authored
      
      No functional changes in this patch, but it prepares us for returning
      a more useful cookie related to the IO that was queued up.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Keith Busch <keith.busch@intel.com>
    • mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM · 71baba4b
      Mel Gorman authored
      
      Clearing __GFP_WAIT was used to signal that the caller was in atomic
      context and could not sleep.  Now it is possible to distinguish between
      true atomic context and callers that are not willing to sleep.  The
      latter should clear __GFP_DIRECT_RECLAIM so kswapd will still wake.  As
      clearing __GFP_WAIT behaves differently, there is a risk that people
      will clear the wrong flags.  This patch renames __GFP_WAIT to
      __GFP_RECLAIM to clearly indicate what it does -- setting it allows all
      reclaim activity, clearing it prevents it.
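      In flag terms the rename makes the umbrella explicit; the definition is
      essentially:

          /* __GFP_RECLAIM covers both reclaim variants: setting it allows
           * all reclaim, clearing one half narrows it, clearing both
           * forbids reclaim entirely */
          #define __GFP_RECLAIM ((__force gfp_t)(___GFP_DIRECT_RECLAIM | \
                                                 ___GFP_KSWAPD_RECLAIM))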
      
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd · d0164adc
      Mel Gorman authored
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access to one of two watermarks lower than "min", which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT, leading to a situation where
      an optimistic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites:
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and were depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
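      For reference, the helper mentioned above amounts to (paraphrased):

          /* use this instead of open-coding a __GFP_WAIT test */
          static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
          {
                  return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
          }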
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 03 Nov, 2015 1 commit
    • blk-mq: avoid excessive boot delays with large lun counts · 2404e607
      Jeff Moyer authored
      Hi,
      
      Zhangqing Luo reported long boot times on a system with thousands of
      LUNs when scsi-mq was enabled.  He narrowed the problem down to
      blk_mq_add_queue_tag_set, where every queue is frozen in order to set
      the BLK_MQ_F_TAG_SHARED flag.  Each added device will freeze all queues
      added before it in sequence, which involves waiting for an RCU grace
      period for each one.  We don't need to do this.  After the second queue
      is added, only new queues need to be initialized with the shared tag.
      We can do that by percolating the flag up to the blk_mq_tag_set, and
      updating the newly added queue's hctxs if the flag is set.
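      A hedged sketch of that percolation (names follow the commit's
      description and may not match the final code exactly):

          /* in blk_mq_add_queue_tag_set(): freeze-and-mark existing queues
           * only on the transition to shared; later additions just inherit
           * the flag from the set without freezing anything */
          if (!list_empty(&set->tag_list) &&
              !(set->flags & BLK_MQ_F_TAG_SHARED)) {
                  set->flags |= BLK_MQ_F_TAG_SHARED;
                  blk_mq_update_tag_set_depth(set, true);   /* existing queues */
          }
          if (set->flags & BLK_MQ_F_TAG_SHARED)
                  queue_set_hctx_shared(q, true);           /* the new queue */
          list_add_tail(&q->tag_set_list, &set->tag_list);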
      
      This problem was introduced by commit 0d2602ca (blk-mq: improve
      support for shared tags maps).
      Reported-and-tested-by: Jason Luo <zhangqing.luo@oracle.com>
      Reviewed-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  17. 21 Oct, 2015 5 commits
  18. 15 Oct, 2015 1 commit
  19. 01 Oct, 2015 2 commits
  20. 29 Sep, 2015 6 commits
    • blk-mq: fix deadlock when reading cpu_list · 60de074b
      Akinobu Mita authored
      
      CPU hotplug handling for blk-mq (blk_mq_queue_reinit) acquires
      all_q_mutex in blk_mq_queue_reinit_notify() and then removes sysfs
      entries by blk_mq_sysfs_unregister().  Removing a sysfs entry is
      blocked until the active reference of the kernfs_node drops to zero.
      
      On the other hand, reading the blk_mq_hw_sysfs_cpu sysfs entry (e.g.
      /sys/block/nullb0/mq/0/cpu_list) acquires all_q_mutex in
      blk_mq_hw_sysfs_cpus_show().
      
      If these happen at the same time, a deadlock can occur, because one
      side waits for the active reference to reach zero while holding
      all_q_mutex, and the other tries to acquire all_q_mutex while holding
      the active reference.
      
      The reason that all_q_mutex is acquired in blk_mq_hw_sysfs_cpus_show()
      is to avoid reading an incomplete hctx->cpumask.  Since reading a sysfs
      entry for blk-mq needs to acquire q->sysfs_lock, we can avoid the
      deadlock and reading an incomplete hctx->cpumask by holding
      q->sysfs_lock while hctx->cpumask is being updated.
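      A sketch of the locking this describes (simplified; names approximate):

          /* CPU hotplug path: rewrite hctx->cpumask under the same lock
           * the sysfs reader already holds, instead of all_q_mutex */
          mutex_lock(&q->sysfs_lock);
          blk_mq_map_swqueue(q, online_mask);   /* updates hctx->cpumask */
          mutex_unlock(&q->sysfs_lock);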
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: Ming Lei <tom.leiming@gmail.com>
      Cc: Ming Lei <tom.leiming@gmail.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: avoid inserting requests before establishing new mapping · 5778322e
      Akinobu Mita authored
      Notifier callbacks for the CPU_ONLINE action can run on a CPU other
      than the one which was just onlined.  So it is possible for the
      process running on the just-onlined CPU to insert a request and run
      a hw queue before the new mapping is established by
      blk_mq_queue_reinit_notify().
      
      This can cause a problem when the CPU has just been onlined for the
      first time since the request queue was initialized.  At this time
      ctx->index_hw for the CPU, which is the index in hctx->ctxs[] for this
      ctx, is still zero before blk_mq_queue_reinit_notify() is called by the
      notifier callbacks for the CPU_ONLINE action.
      
      For example, there is a single hw queue (hctx) and two CPU queues
      (ctx0 for CPU0, and ctx1 for CPU1).  Now CPU1 is just onlined and
      a request is inserted into ctx1->rq_list, and bit 0 is set in the
      pending bitmap, as ctx1->index_hw is still zero.
      
      And then while running hw queue, flush_busy_ctxs() finds bit0 is set
      in pending bitmap and tries to retrieve requests in
      ...
    • blk-mq: fix q->mq_usage_counter access race · 0e626368
      Akinobu Mita authored
      
      CPU hotplug handling for blk-mq (blk_mq_queue_reinit) accesses
      q->mq_usage_counter while freezing all request queues in all_q_list.
      On the other hand, q->mq_usage_counter is deinitialized in
      blk_mq_free_queue() before deleting the queue from all_q_list.
      
      So if a CPU hotplug event occurs in the window, percpu_ref_kill() is
      called with q->mq_usage_counter which has already been marked dead,
      and it triggers a warning.  Fix it by deleting the queue from all_q_list
      earlier than destroying q->mq_usage_counter.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: Ming Lei <tom.leiming@gmail.com>
      Cc: Ming Lei <tom.leiming@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: Fix use-after-free of q->mq_map · a723bab3
      Akinobu Mita authored
      
      CPU hotplug handling for blk-mq (blk_mq_queue_reinit) updates
      q->mq_map by blk_mq_update_queue_map() for all request queues in
      all_q_list.  On the other hand, q->mq_map is released before deleting
      the queue from all_q_list.
      
      So if a CPU hotplug event occurs in the window, an invalid memory access
      can happen.  Fix it by releasing q->mq_map in blk_mq_release() so that it
      happens later than the removal from all_q_list.
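      A minimal sketch of the ordering fix:

          /* free the CPU-to-hctx map only in blk_mq_release(), i.e. after
           * the queue has left all_q_list, so hotplug callbacks can no
           * longer dereference a freed q->mq_map */
          void blk_mq_release(struct request_queue *q)
          {
                  /* ... free hctxs and ctxs ... */
                  kfree(q->mq_map);
                  q->mq_map = NULL;
          }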
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Suggested-by: Ming Lei <tom.leiming@gmail.com>
      Reviewed-by: Ming Lei <tom.leiming@gmail.com>
      Cc: Ming Lei <tom.leiming@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: fix sysfs registration/unregistration race · 4593fdbe
      Akinobu Mita authored
      
      There is a race between cpu hotplug handling and adding/deleting
      gendisk for blk-mq, where both are trying to register and unregister
      the same sysfs entries.
      
      null_add_dev
          --> blk_mq_init_queue
              --> blk_mq_init_allocated_queue
                  --> add to 'all_q_list' (*)
          --> add_disk
              --> blk_register_queue
                  --> blk_mq_register_disk (++)
      
      null_del_dev
          --> del_gendisk
              --> blk_unregister_queue
                  --> blk_mq_unregister_disk (--)
          --> blk_cleanup_queue
              --> blk_mq_free_queue
                  --> del from 'all_q_list' (*)
      
      blk_mq_queue_reinit
          --> blk_mq_sysfs_unregister (-)
          --> blk_mq_sysfs_register (+)
      
      While the request queue is on 'all_q_list' (*),
      blk_mq_queue_reinit() can be called for the queue at any time by the
      CPU hotplug callback.  But blk_mq_sysfs_unregister (-) and
      blk_mq_sysfs_register (+) in blk_mq_queue_reinit must not be called
      before blk_mq_register_disk (++) or after blk_mq_unregister_disk (--)
      has finished, because '/sys/block/*/mq/' does not exist then.
      
      There is already a BLK_MQ_F_SYSFS_UP flag in hctx->flags which can
      be used to track this sysfs state, but it only fixes this issue
      partially.
      
      In order to fix it completely, we just need a per-queue flag instead of
      a per-hctx flag, with appropriate locking.  So this introduces
      q->mq_sysfs_init_done, which is properly protected with all_q_mutex.
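      A sketch of the guard this adds (close to the commit's shape):

          void blk_mq_sysfs_unregister(struct request_queue *q)
          {
                  struct blk_mq_hw_ctx *hctx;
                  int i;

                  /* nothing to tear down until the disk's sysfs tree is up;
                   * the flag flips under all_q_mutex */
                  if (!q->mq_sysfs_init_done)
                          return;

                  queue_for_each_hw_ctx(q, hctx, i)
                          blk_mq_unregister_hctx(hctx);
          }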
      
      Also, we need to ensure that blk_mq_map_swqueue() is called with
      all_q_mutex held.  Since hctx->nr_ctx is reset temporarily and
      updated in blk_mq_map_swqueue(), we should prevent
      blk_mq_register_hctx() from seeing the temporary hctx->nr_ctx value
      during CPU hotplug handling or when adding/deleting a gendisk.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: Ming Lei <tom.leiming@gmail.com>
      Cc: Ming Lei <tom.leiming@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: avoid setting hctx->tags->cpumask before allocation · 1356aae0
      Akinobu Mita authored
      When an unmapped hw queue is remapped after the CPU topology is changed,
      hctx->tags->cpumask has to be set after hctx->tags is set up in
      blk_mq_map_swqueue(); otherwise it causes a null pointer dereference.
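      A sketch of the reordering inside blk_mq_map_swqueue() (helper names
      approximate the code of that era):

          /* final pass, after hctx->tags has been (re)allocated above, so
           * hctx->tags can no longer be NULL here */
          queue_for_each_ctx(q, ctx, i) {
                  hctx = q->mq_ops->map_queue(q, i);
                  cpumask_set_cpu(i, hctx->tags->cpumask);
          }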
      
      Fixes: f26cdc85 ("blk-mq: Shared tag enhancements")
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Ming Lei <tom.leiming@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  21. 23 Sep, 2015 1 commit
  22. 15 Aug, 2015 1 commit
    • blk-mq: fix race between timeout and freeing request · 0048b483
      Ming Lei authored
      
      Inside the timeout handler, blk_mq_tag_to_rq() is called
      to retrieve the request from a tag. This is obviously
      wrong because the request can be freed at any time and some
      fields of the request can't be trusted, so a kernel oops
      might be triggered [1].
      
      Currently, wrt. blk_mq_tag_to_rq(), the only special case is
      that the flush request can share the same tag with the request it
      was cloned from, and the two requests can't be active at the same
      time, so this patch fixes the above issue by updating tags->rqs[tag]
      with the active request (either the flush rq or the request it was
      cloned from) for the tag.
      
      Also blk_mq_tag_to_rq() gets much simplified with this patch.
      
      Given that blk_mq_tag_to_rq() is mainly for drivers and the caller must
      make sure the request can't be freed, in bt_for_each() this
      helper is replaced with tags->rqs[tag].
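      After the change, the helper reduces to a plain table lookup,
      essentially:

          /* tags->rqs[tag] now always points at the currently active
           * request for the tag (flush rq or the original request) */
          struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags,
                                           unsigned int tag)
          {
                  return tags->rqs[tag];
          }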
      
      [1] kernel oops log
      [  439.696220] BUG: unable to handle kernel NULL pointer dereference at 0000000000000158
      [  439.697162] IP: [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e
      [  439.700653] PGD 7ef765067 PUD 7ef764067 PMD 0
      [  439.700653] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [  439.700653] Dumping ftrace buffer:
      [  439.700653]    (ftrace buffer empty)
      [  439.700653] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw
      [  439.700653] CPU: 6 PID: 2779 Comm: stress-ng-sigfd Not tainted 4.2.0-rc5-next-20150805+ #265
      [  439.730500] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [  439.730500] task: ffff880605308000 ti: ffff88060530c000 task.ti: ffff88060530c000
      [  439.730500] RIP: 0010:[<ffffffff812d89ba>]  [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e
      [  439.730500] RSP: 0018:ffff880819203da0  EFLAGS: 00010283
      [  439.730500] RAX: ffff880811b0e000 RBX: ffff8800bb465f00 RCX: 0000000000000002
      [  439.730500] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
      [  439.730500] RBP: ffff880819203db0 R08: 0000000000000002 R09: 0000000000000000
      [  439.730500] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000202
      [  439.730500] R13: ffff880814104800 R14: 0000000000000002 R15: ffff880811a2ea00
      [  439.730500] FS:  00007f165b3f5740(0000) GS:ffff880819200000(0000) knlGS:0000000000000000
      [  439.730500] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  439.730500] CR2: 0000000000000158 CR3: 00000007ef766000 CR4: 00000000000006e0
      [  439.730500] Stack:
      [  439.730500]  0000000000000008 ffff8808114eed90 ffff880819203e00 ffffffff812dc104
      [  439.755663]  ffff880819203e40 ffffffff812d9f5e 0000020000000000 ffff8808114eed80
      [  439.755663] Call Trace:
      [  439.755663]  <IRQ>
      [  439.755663]  [<ffffffff812dc104>] bt_for_each+0x6e/0xc8
      [  439.755663]  [<ffffffff812d9f5e>] ? blk_mq_rq_timed_out+0x6a/0x6a
      [  439.755663]  [<ffffffff812d9f5e>] ? blk_mq_rq_timed_out+0x6a/0x6a
      [  439.755663]  [<ffffffff812dc1b3>] blk_mq_tag_busy_iter+0x55/0x5e
      [  439.755663]  [<ffffffff812d88b4>] ? blk_mq_bio_to_request+0x38/0x38
      [  439.755663]  [<ffffffff812d8911>] blk_mq_rq_timer+0x5d/0xd4
      [  439.755663]  [<ffffffff810a3e10>] call_timer_fn+0xf7/0x284
      [  439.755663]  [<ffffffff810a3d1e>] ? call_timer_fn+0x5/0x284
      [  439.755663]  [<ffffffff812d88b4>] ? blk_mq_bio_to_request+0x38/0x38
      [  439.755663]  [<ffffffff810a46d6>] run_timer_softirq+0x1ce/0x1f8
      [  439.755663]  [<ffffffff8104c367>] __do_softirq+0x181/0x3a4
      [  439.755663]  [<ffffffff8104c76e>] irq_exit+0x40/0x94
      [  439.755663]  [<ffffffff81031482>] smp_apic_timer_interrupt+0x33/0x3e
      [  439.755663]  [<ffffffff815559a4>] apic_timer_interrupt+0x84/0x90
      [  439.755663]  <EOI>
      [  439.755663]  [<ffffffff81554350>] ? _raw_spin_unlock_irq+0x32/0x4a
      [  439.755663]  [<ffffffff8106a98b>] finish_task_switch+0xe0/0x163
      [  439.755663]  [<ffffffff8106a94d>] ? finish_task_switch+0xa2/0x163
      [  439.755663]  [<ffffffff81550066>] __schedule+0x469/0x6cd
      [  439.755663]  [<ffffffff8155039b>] schedule+0x82/0x9a
      [  439.789267]  [<ffffffff8119b28b>] signalfd_read+0x186/0x49a
      [  439.790911]  [<ffffffff8106d86a>] ? wake_up_q+0x47/0x47
      [  439.790911]  [<ffffffff811618c2>] __vfs_read+0x28/0x9f
      [  439.790911]  [<ffffffff8117a289>] ? __fget_light+0x4d/0x74
      [  439.790911]  [<ffffffff811620a7>] vfs_read+0x7a/0xc6
      [  439.790911]  [<ffffffff8116292b>] SyS_read+0x49/0x7f
      [  439.790911]  [<ffffffff81554c17>] entry_SYSCALL_64_fastpath+0x12/0x6f
      [  439.790911] Code: 48 89 e5 e8 a9 b8 e7 ff 5d c3 0f 1f 44 00 00 55 89
      f2 48 89 e5 41 54 41 89 f4 53 48 8b 47 60 48 8b 1c d0 48 8b 7b 30 48 8b
      53 38 <48> 8b 87 58 01 00 00 48 85 c0 75 09 48 8b 97 88 0c 00 00 eb 10
      [  439.790911] RIP  [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e
      [  439.790911]  RSP <ffff880819203da0>
      [  439.790911] CR2: 0000000000000158
      [  439.790911] ---[ end trace d40af58949325661 ]---
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  23. 13 Aug, 2015 1 commit
    • block: make generic_make_request handle arbitrarily sized bios · 54efd50b
      Kent Overstreet authored
      
      The way the block layer is currently written, it goes to great lengths
      to avoid having to split bios; upper layer code (such as bio_add_page())
      checks what the underlying device can handle and tries to always create
      bios that don't need to be split.
      
      But this approach becomes unwieldy and eventually breaks down with
      stacked devices and devices with dynamic limits, and it adds a lot of
      complexity. If the block layer could split bios as needed, we could
      eliminate a lot of complexity elsewhere - particularly in stacked
      drivers. Code that creates bios can then create whatever size bios are
      convenient, and more importantly stacked drivers don't have to deal with
      both their own bio size limitations and the limitations of the
      (potentially multiple) devices underneath them.  In the future this will
      let us delete merge_bvec_fn and a bunch of other code.
      
      We do this by adding calls to blk_queue_split() to the various
      make_request functions that need it - a few can already handle arbitrary
      size bios. Note that we add the call _after_ any call to
      blk_queue_bounce(); this means that blk_queue_split() and
      blk_recalc_rq_segments() don't need to be concerned with bouncing
      affecting segment merging.
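      The resulting pattern in an affected make_request_fn looks roughly like
      this (a sketch; example_make_request() is a hypothetical driver hook):

          static void example_make_request(struct request_queue *q,
                                           struct bio *bio)
          {
                  blk_queue_bounce(q, &bio);              /* bounce first... */
                  blk_queue_split(q, &bio, q->bio_split); /* ...then split */

                  /* ... queue the now suitably-sized bio as before ... */
          }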
      
      Some make_request_fn() callbacks were simple enough to audit and verify
      they don't need blk_queue_split() calls. The skipped ones are:
      
       * nfhd_make_request (arch/m68k/emu/nfblock.c)
       * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
       * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
       * brd_make_request (ramdisk - drivers/block/brd.c)
       * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
       * loop_make_request
       * null_queue_bio
       * bcache's make_request fns
      
      Some others are almost certainly safe to remove now, but will be left
      for future patches.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Jim Paris <jim@jtan.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Andreas Dilger <andreas.dilger@intel.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits)
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
      Signed-off-by: Dongsu Park <dpark@posteo.net>
      Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>