1. 11 Apr, 2021 4 commits
  2. 08 Apr, 2021 4 commits
    • Merge branch 'md-next' of... · ff917638
      Jens Axboe authored
      Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.13/drivers
      
      Pull MD updates from Song:
      
      "These patches fix a race condition with md_release() and md_open()."
      
      * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md: split mddev_find
        md: factor out a mddev_find_locked helper from mddev_find
        md: md_open returns -EBUSY when entering racing area
    • md: split mddev_find · 65aa97c4
      Christoph Hellwig authored
      Split mddev_find into a simple mddev_find that just finds an existing
      mddev by the unit number, and a more complicated mddev_find_or_alloc
      that deals with finding or allocating an mddev.
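
      A simplified user-space model of that split might look like the sketch
      below. It is not the md code: struct model_mddev, the model_* functions,
      the singly linked list and the pthread mutex are hypothetical stand-ins
      for struct mddev, the all_mddevs list and its lock. The point it shows
      is that only the _or_alloc variant can create an entry, so an open path
      built on the search-only variant cannot resurrect an array that is
      being torn down.

      ```c
      #include <assert.h>
      #include <pthread.h>
      #include <stdlib.h>

      /* Hypothetical, heavily reduced stand-in for struct mddev. */
      struct model_mddev {
          unsigned int unit;           /* stands in for the dev_t "unit" */
          int active;                  /* reference count */
          struct model_mddev *next;    /* link in the model all_mddevs list */
      };

      static struct model_mddev *model_all_mddevs;
      static pthread_mutex_t model_all_mddevs_lock = PTHREAD_MUTEX_INITIALIZER;

      /* Pure lookup; the caller must hold model_all_mddevs_lock. */
      static struct model_mddev *model_find_locked(unsigned int unit)
      {
          struct model_mddev *m;

          for (m = model_all_mddevs; m; m = m->next)
              if (m->unit == unit)
                  return m;
          return NULL;
      }

      /* Search only: takes a reference on an existing entry, never creates one. */
      static struct model_mddev *model_find(unsigned int unit)
      {
          struct model_mddev *m;

          pthread_mutex_lock(&model_all_mddevs_lock);
          m = model_find_locked(unit);
          if (m)
              m->active++;
          pthread_mutex_unlock(&model_all_mddevs_lock);
          return m;
      }

      /* Find or allocate: only for paths that are meant to create an array. */
      static struct model_mddev *model_find_or_alloc(unsigned int unit)
      {
          struct model_mddev *m;
          struct model_mddev *new = calloc(1, sizeof(*new)); /* allocate unlocked */

          if (!new)
              return NULL;

          pthread_mutex_lock(&model_all_mddevs_lock);
          m = model_find_locked(unit);
          if (m) {
              m->active++;             /* already present: reuse it */
              pthread_mutex_unlock(&model_all_mddevs_lock);
              free(new);
              return m;
          }
          new->unit = unit;
          new->active = 1;
          new->next = model_all_mddevs;
          model_all_mddevs = new;
          pthread_mutex_unlock(&model_all_mddevs_lock);
          return new;
      }
      ```

      In this model, an open routine would call model_find() and treat NULL
      as "no such device", leaving creation to the assembly path.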
      
      This turns out to fix the following bug reported by Zhao Heming.
      
      ----------------------------- snip ------------------------------
      Commit d3374825 ("md: make devices disappear when they are no longer
      needed.") introduced protection between mddev creation and removal:
      md_open should not create an mddev when the all_mddevs list does not
      contain it. With the current code logic, it is very easy to trigger a
      soft lockup in a non-preemptible environment.
      
      *** env ***
      kvm-qemu VM (2C1G: 2 vCPUs, 1 GB RAM) with 2 iSCSI LUNs
      the kernel must be non-preemptible
      
      *** script ***
      
      Triggers about once in every 10 runs.
      
      ```
      1  node1="15sp3-mdcluster1"
      2  node2="15sp3-mdcluster2"
      3
      4  mdadm -Ss
      5  ssh ${node2} "mdadm -Ss"
      6  wipefs -a /dev/sda /dev/sdb
      7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
         /dev/sdb --assume-clean
      8
      9  for i in {1..100}; do
      10    echo ==== $i ====;
      11
      12    echo "test  ...."
      13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
      14    sleep 1
      15
      16    echo "clean  ....."
      17    ssh ${node2} "mdadm -Ss"
      18 done
      ```

      I used an md-cluster environment to trigger the soft lockup, but this is
      not an md-cluster-specific bug. Stopping an array in an md-cluster
      environment does more work than stopping a non-clustered array, which
      leaves a wide enough window for the kernel to run md_open.
      
      *** stack ***
      
      ```
      ID: 2831   TASK: ffff8dd7223b5040  CPU: 0   COMMAND: "mdadm"
       #0 [ffffa15d00a13b90] __schedule at ffffffffb8f1935f
       #1 [ffffa15d00a13ba8] exact_lock at ffffffffb8a4a66d
       #2 [ffffa15d00a13bb0] kobj_lookup at ffffffffb8c62fe3
       #3 [ffffa15d00a13c28] __blkdev_get at ffffffffb89273b9
       #4 [ffffa15d00a13c98] blkdev_get at ffffffffb8927964
       #5 [ffffa15d00a13cb0] do_dentry_open at ffffffffb88dc4b4
       #6 [ffffa15d00a13ce0] path_openat at ffffffffb88f0ccc
       #7 [ffffa15d00a13db8] do_filp_open at ffffffffb88f32bb
       #8 [ffffa15d00a13ee0] do_sys_open at ffffffffb88ddc7d
       #9 [ffffa15d00a13f38] do_syscall_64 at ffffffffb86053cb ffffffffb900008c
      ```

      or:

      ```
      [  884.226509]  mddev_put+0x1c/0xe0 [md_mod]
      [  884.226515]  md_open+0x3c/0xe0 [md_mod]
      [  884.226518]  __blkdev_get+0x30d/0x710
      [  884.226520]  ? bd_acquire+0xd0/0xd0
      [  884.226522]  blkdev_get+0x14/0x30
      [  884.226524]  do_dentry_open+0x204/0x3a0
      [  884.226531]  path_openat+0x2fc/0x1520
      [  884.226534]  ? seq_printf+0x4e/0x70
      [  884.226536]  do_filp_open+0x9b/0x110
      [  884.226542]  ? md_release+0x20/0x20 [md_mod]
      [  884.226543]  ? seq_read+0x1d8/0x3e0
      [  884.226545]  ? kmem_cache_alloc+0x18a/0x270
      [  884.226547]  ? do_sys_open+0x1bd/0x260
      [  884.226548]  do_sys_open+0x1bd/0x260
      [  884.226551]  do_syscall_64+0x5b/0x1e0
      [  884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      ```

      *** rootcause ***
      
      "mdadm -A" (or any other array assembly command) starts an "mdadm
      --monitor" daemon by default. When "mdadm -Ss" runs, the stop action
      wakes up "mdadm --monitor", which immediately reads /proc/mdstat. At
      this point the mddev still exists in the kernel, so /proc/mdstat still
      lists the md device, which makes "mdadm --monitor" open /dev/md0.

      The earlier "mdadm -Ss" is a removal action, while the open from
      "mdadm --monitor" triggers md_open, which is a creation action. The two
      race with each other.
      
      ```
      <thread 1>: "mdadm -Ss"
      md_release
        mddev_put deletes mddev from all_mddevs
        queue_work for mddev_delayed_delete
        at this time, "/dev/md0" is still available for opening

      <thread 2>: "mdadm --monitor ..."
      md_open
       + mddev_find can't find the mddev of /dev/md0, so it creates a new
       |    mddev and returns it.
       + hits "if (mddev->gendisk != bdev->bd_disk)" and returns
            -ERESTARTSYS.
      ```

      In a non-preemptible kernel, <thread 2> keeps retrying the open and
      monopolizes the current CPU, so the mddev_delayed_delete work queued by
      <thread 1> never gets scheduled.

      A preemptible kernel can hit the same race, but there the scheduler
      does not let one thread run on a CPU indefinitely. After <thread 2> has
      run for some time, the next "mdadm -A" (line 13 of the script above)
      calls md_alloc to allocate a new gendisk for the mddev. The check
      "if (mddev->gendisk != bdev->bd_disk)" in md_open then no longer
      matches, md_open returns 0 to its caller, and the soft lockup is broken.
      ------------------------------ snip ------------------------------
      
      Cc: stable@vger.kernel.org
      Fixes: d3374825 ("md: make devices disappear when they are no longer needed.")
      Reported-by: Heming Zhao <heming.zhao@suse.com>
      Reviewed-by: Heming Zhao <heming.zhao@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: factor out a mddev_find_locked helper from mddev_find · 8b57251f
      Christoph Hellwig authored
      Factor out a self-contained helper to just lookup a mddev by the dev_t
      "unit".
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Heming Zhao <heming.zhao@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: md_open returns -EBUSY when entering racing area · 6a4db2a6
      Zhao Heming authored
      Commit d3374825 ("md: make devices disappear when they are no longer
      needed.") introduced protection between mddev creation and removal:
      md_open should not create an mddev when the all_mddevs list does not
      contain it. With the current code logic, it is very easy to trigger a
      soft lockup in a non-preemptible environment.
      
      This patch makes md_open return -EBUSY instead of -ERESTARTSYS, which
      breaks the infinite retry loop when md_open enters the racing area.
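
      The effect of the return-value change can be modelled in user space.
      In the hypothetical sketch below, model_blkdev_open() stands in for the
      caller that retries while the open reports a restart, and max_tries is
      an artificial bound added only so the demo terminates; the kernel loop
      described in this commit has no such bound. ERESTARTSYS is not visible
      to user space, so its kernel value (512) is defined locally.

      ```c
      #include <assert.h>
      #include <errno.h>

      /* ERESTARTSYS is kernel-internal and absent from user-space <errno.h>;
       * 512 is its value in include/linux/errno.h. */
      #define MODEL_ERESTARTSYS 512

      /* Hypothetical md_open model: when the race is lost it fails with
       * -ret_on_race, otherwise it succeeds. */
      static int model_md_open(int lost_race, int ret_on_race)
      {
          return lost_race ? -ret_on_race : 0;
      }

      /* Hypothetical caller that retries for as long as a restart is requested.
       * max_tries exists only so this demo terminates; *tries reports how many
       * open attempts were made. */
      static int model_blkdev_open(int lost_race, int ret_on_race, int max_tries,
                                   int *tries)
      {
          int ret;

          *tries = 0;
          do {
              ret = model_md_open(lost_race, ret_on_race);
              (*tries)++;
          } while (ret == -MODEL_ERESTARTSYS && *tries < max_tries);
          return ret;
      }
      ```

      With -EBUSY the first failed attempt ends the loop and the error goes
      back to the caller, instead of the open being retried on the same CPU.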
      
      This patch is only a partial fix for the soft lockup. The full fix
      requires splitting mddev_find into two functions, mddev_find and
      mddev_find_or_alloc, with md_open calling the new mddev_find (which
      only searches).
      
      For more details, please refer to Christoph's "split mddev_find" patch
      in later commits.
      
      *** env ***
      kvm-qemu VM (2C1G: 2 vCPUs, 1 GB RAM) with 2 iSCSI LUNs
      the kernel must be non-preemptible
      
      *** script ***
      
      Triggers almost every time with the script below.
      
      ```
      1  node1="mdcluster1"
      2  node2="mdcluster2"
      3
      4  mdadm -Ss
      5  ssh ${node2} "mdadm -Ss"
      6  wipefs -a /dev/sda /dev/sdb
      7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
         /dev/sdb --assume-clean
      8
      9  for i in {1..10}; do
      10    echo ==== $i ====;
      11
      12    echo "test  ...."
      13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
      14    sleep 1
      15
      16    echo "clean  ....."
      17    ssh ${node2} "mdadm -Ss"
      18 done
      ```
      
      I used an md-cluster environment to trigger the soft lockup, but this is
      not an md-cluster-specific bug. Stopping an array in an md-cluster
      environment does more work than stopping a non-clustered array, which
      leaves a wide enough window for the kernel to run md_open.
      
      *** stack ***
      
      ```
      [  884.226509]  mddev_put+0x1c/0xe0 [md_mod]
      [  884.226515]  md_open+0x3c/0xe0 [md_mod]
      [  884.226518]  __blkdev_get+0x30d/0x710
      [  884.226520]  ? bd_acquire+0xd0/0xd0
      [  884.226522]  blkdev_get+0x14/0x30
      [  884.226524]  do_dentry_open+0x204/0x3a0
      [  884.226531]  path_openat+0x2fc/0x1520
      [  884.226534]  ? seq_printf+0x4e/0x70
      [  884.226536]  do_filp_open+0x9b/0x110
      [  884.226542]  ? md_release+0x20/0x20 [md_mod]
      [  884.226543]  ? seq_read+0x1d8/0x3e0
      [  884.226545]  ? kmem_cache_alloc+0x18a/0x270
      [  884.226547]  ? do_sys_open+0x1bd/0x260
      [  884.226548]  do_sys_open+0x1bd/0x260
      [  884.226551]  do_syscall_64+0x5b/0x1e0
      [  884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      ```
      
      *** rootcause ***
      
      "mdadm -A" (or any other array assembly command) starts an "mdadm
      --monitor" daemon by default. When "mdadm -Ss" runs, the stop action
      wakes up "mdadm --monitor", which immediately reads /proc/mdstat. At
      this point the mddev still exists in the kernel, so /proc/mdstat still
      lists the md device, which makes "mdadm --monitor" open /dev/md0.

      The earlier "mdadm -Ss" is a removal action, while the open from
      "mdadm --monitor" triggers md_open, which is a creation action. The two
      race with each other.
      
      ```
      <thread 1>: "mdadm -Ss"
      md_release
        mddev_put deletes mddev from all_mddevs
        queue_work for mddev_delayed_delete
        at this time, "/dev/md0" is still available for opening
      
      <thread 2>: "mdadm --monitor ..."
      md_open
       + mddev_find can't find the mddev of /dev/md0, so it creates a new
       |    mddev and returns it.
       + trigger "if (mddev->gendisk != bdev->bd_disk)" and return
            -ERESTARTSYS.
      ```
      
      In a non-preemptible kernel, <thread 2> keeps retrying the open and
      monopolizes the current CPU, so the mddev_delayed_delete work queued by
      <thread 1> never gets scheduled.

      A preemptible kernel can hit the same race, but there the scheduler
      does not let one thread run on a CPU indefinitely. After <thread 2> has
      run for some time, the next "mdadm -A" (line 13 of the script above)
      calls md_alloc to allocate a new gendisk for the mddev. The check
      "if (mddev->gendisk != bdev->bd_disk)" in md_open then no longer
      matches, md_open returns 0 to its caller, and the soft lockup is broken.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Zhao Heming <heming.zhao@suse.com>
      Signed-off-by: Song Liu <song@kernel.org>
  3. 06 Apr, 2021 20 commits
    • drbd: use DEFINE_SPINLOCK() for spinlock · 9c282c29
      Guobin Huang authored
      spinlock can be initialized automatically with DEFINE_SPINLOCK()
      rather than explicitly calling spin_lock_init().
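
      DEFINE_SPINLOCK() itself only exists inside the kernel. As a loose
      user-space analogy (POSIX mutexes, not kernel spinlocks), pthreads
      offers the same choice between a static initializer and a runtime init
      call; every name below is hypothetical.

      ```c
      #include <assert.h>
      #include <pthread.h>

      /* Static initializer: usable from program start with no init call to
       * forget, much like DEFINE_SPINLOCK(name) in the kernel. */
      static pthread_mutex_t model_static_lock = PTHREAD_MUTEX_INITIALIZER;
      static int model_counter;

      /* Runtime initialization: the counterpart of declaring a spinlock_t and
       * calling spin_lock_init() on it before first use. */
      static pthread_mutex_t model_runtime_lock;

      static int model_init_runtime_lock(void)
      {
          return pthread_mutex_init(&model_runtime_lock, NULL);
      }

      static int model_bump(void)
      {
          int v;

          pthread_mutex_lock(&model_static_lock);
          v = ++model_counter;
          pthread_mutex_unlock(&model_static_lock);
          return v;
      }
      ```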
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Guobin Huang <huangguobin4@huawei.com>
      Link: https://lore.kernel.org/r/1617710988-49205-1-git-send-email-huangguobin4@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • swim3: support highmem · b60b270b
      Christoph Hellwig authored
      swim3 only uses the virtual address of a bio to stash it into the data
      transfer using virt_to_bus.  But the ppc32 virt_to_bus just uses the
      physical address with an offset.  Replace virt_to_bus with a local hack
      that performs the equivalent transformation and stop asking for block
      layer bounce buffering.
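
      The address arithmetic involved can be sketched as follows, assuming
      4 KiB pages and treating the platform's bus offset as a parameter (its
      real value is platform specific). The segment layout and function names
      are hypothetical. The property that matters is that nothing here needs a
      kernel virtual address, so a highmem page works as well as a lowmem one.

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define MODEL_PAGE_SHIFT 12    /* assumption: 4 KiB pages */

      /* Hypothetical bio segment: a page frame number plus an offset into it. */
      struct model_segment {
          uint64_t pfn;
          uint32_t offset;
      };

      /* Physical address derived from the page, like page_to_phys(page) + offset. */
      static uint64_t model_segment_phys(const struct model_segment *seg)
      {
          assert(seg->offset < (UINT32_C(1) << MODEL_PAGE_SHIFT));
          return (seg->pfn << MODEL_PAGE_SHIFT) + seg->offset;
      }

      /* Bus address = physical address + fixed offset, the relationship the
       * commit message describes for ppc32 virt_to_bus. */
      static uint64_t model_segment_bus(const struct model_segment *seg,
                                        uint64_t bus_offset)
      {
          return model_segment_phys(seg) + bus_offset;
      }
      ```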
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20210406061839.811588-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • floppy: always use the track buffer · 3d86739c
      Christoph Hellwig authored
      Always use the track buffer that is already used for addresses outside
      the 16MB address capability of the floppy controller.  This allows
      removing a lot of code that relies on kernel virtual addresses.  With
      this gone there is just a single place left that looks at the bio,
      which can be converted to memcpy_{from,to}_page, thus removing the need
      for the extra block-layer bounce buffering for highmem pages.
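
      A user-space model of the two copy helpers is sketched below. In the
      kernel, memcpy_from_page() and memcpy_to_page() temporarily map the
      page, copy, and unmap it again, which is what lets them handle highmem
      pages; here the "mapping" is just an array, and struct model_page, the
      model_* names and the 4 KiB page size are assumptions.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      #define MODEL_PAGE_SIZE 4096    /* assumption: 4 KiB pages */

      /* Hypothetical stand-in for the contents of a struct page. */
      struct model_page {
          unsigned char data[MODEL_PAGE_SIZE];
      };

      /* Copy len bytes at offset out of the page (e.g. into the track buffer). */
      static void model_memcpy_from_page(void *to, const struct model_page *page,
                                         size_t offset, size_t len)
      {
          assert(offset + len <= MODEL_PAGE_SIZE);
          memcpy(to, page->data + offset, len);
      }

      /* Copy len bytes into the page at offset (e.g. out of the track buffer). */
      static void model_memcpy_to_page(struct model_page *page, size_t offset,
                                       const void *from, size_t len)
      {
          assert(offset + len <= MODEL_PAGE_SIZE);
          memcpy(page->data + offset, from, len);
      }
      ```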
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20210406061755.811522-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • swim: don't call blk_queue_bounce_limit · 4c6e5bc8
      Christoph Hellwig authored
      m68k doesn't support highmem, so don't bother enabling the block layer
      bounce buffer code.  Just for safety, also add a dependency on !HIGHMEM.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20210406061725.811389-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • gdrom: support highmem · 1d2c8200
      Christoph Hellwig authored
      The gdrom driver only has a single reference to the virtual address of
      the bio data, and uses that only to get the physical address.  Switch
      to deriving the physical address from the page directly and thus avoid
      bounce buffering highmem data.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20210406061648.811275-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: drbd: drbd_nl: Demote half-complete kernel-doc headers · a425711c
      Lee Jones authored
      Fixes the following W=1 kernel build warning(s):
      
       from drivers/block/drbd/drbd_nl.c:24:
       drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_attach’:
       drivers/block/drbd/drbd_nl.c:1968:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
       drivers/block/drbd/drbd_nl.c:930: warning: Function parameter or member 'flags' not described in 'drbd_determine_dev_size'
       drivers/block/drbd/drbd_nl.c:930: warning: Function parameter or member 'rs' not described in 'drbd_determine_dev_size'
       drivers/block/drbd/drbd_nl.c:1148: warning: Function parameter or member 'dc' not described in 'drbd_check_al_size'
      
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Link: https://lore.kernel.org/r/20210312105530.2219008-12-lee.jones@linaro.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: xen-blkfront: Demote kernel-doc abuses · 5fdbd5bc
      Lee Jones authored
      Fixes the following W=1 kernel build warning(s):
      
       drivers/block/xen-blkfront.c:1960: warning: Function parameter or member 'dev' not described in 'blkfront_probe'
       drivers/block/xen-blkfront.c:1960: warning: Function parameter or member 'id' not described in 'blkfront_probe'
       drivers/block/xen-blkfront.c:1960: warning: expecting prototype for Allocate the basic(). Prototype was for blkfront_probe() instead
       drivers/block/xen-blkfront.c:2085: warning: Function parameter or member 'dev' not described in 'blkfront_resume'
       drivers/block/xen-blkfront.c:2085: warning: expecting prototype for or a backend(). Prototype was for blkfront_resume() instead
       drivers/block/xen-blkfront.c:2444: warning: wrong kernel-doc identifier on line:
      
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: xen-devel@lists.xenproject.org
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Link: https://lore.kernel.org/r/20210312105530.2219008-11-lee.jones@linaro.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: drbd: drbd_receiver: Demote less than half complete kernel-doc header · 6ec2a0f2
      Lee Jones authored
      Fixes the following W=1 kernel build warning(s):
      
       drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'op' not described in 'drbd_submit_peer_request'
       drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'op_flags' not described in 'drbd_submit_peer_request'
       drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'fault_type' not described in 'drbd_submit_peer_request'
      
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Link: https://lore.kernel.org/r/20210312105530.2219008-10-lee.jones@linaro.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: drbd: drbd_main: Fix a bunch of function documentation discrepancies · 584164c8
      Lee Jones authored
      Fixes the following W=1 kernel build warning(s):
      
       drivers/block/drbd/drbd_main.c:278: warning: Function parameter or member 'connection' not described in 'tl_clear'
       drivers/block/drbd/drbd_main.c:278: warning: Excess function parameter 'device' description in 'tl_clear'
       drivers/block/drbd/drbd_main.c:489: warning: Function parameter or member 'cpu_mask' not described in 'drbd_calc_cpu_mask'
       drivers/block/drbd/drbd_main.c:528: warning: Excess function parameter 'device' description in 'drbd_thread_current_set_cpu'
       drivers/block/drbd/drbd_main.c:549: warning: Function parameter or member 'connection' not described in 'drbd_header_size'
       drivers/block/drbd/drbd_main.c:1204: warning: Function parameter or member 'device' not described in 'send_bitmap_rle_or_plain'
       drivers/block/drbd/drbd_main.c:1204: warning: Function parameter or member 'c' not described in 'send_bitmap_rle_or_plain'
       drivers/block/drbd/drbd_main.c:1335: warning: Function parameter or member 'peer_device' not described in '_drbd_send_ack'
       drivers/block/drbd/drbd_main.c:1335: warning: Excess function parameter 'device' description in '_drbd_send_ack'
       drivers/block/drbd/drbd_main.c:1379: warning: Function parameter or member 'peer_device' not described in 'drbd_send_ack'
       drivers/block/drbd/drbd_main.c:1379: warning: Excess function parameter 'device' description in 'drbd_send_ack'
       drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'connection' not described in 'drbd_send_all'
       drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'sock' not described in 'drbd_send_all'
       drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'buffer' not described in 'drbd_send_all'
       drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'size' not described in 'drbd_send_all'
       drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'msg_flags' not described in 'drbd_send_all'
       drivers/block/drbd/drbd_main.c:3525: warning: Function parameter or member 'flags' not described in 'drbd_queue_bitmap_io'
       drivers/block/drbd/drbd_main.c:3563: warning: Function parameter or member 'flags' not described in 'drbd_bitmap_io'
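
      For reference, a kernel-doc header that passes both checks documents
      every parameter in the prototype and nothing else. The functions below
      are hypothetical. The second comment shows the other remedy used in this
      series: a header demoted from "/**" to "/*" is not parsed by
      scripts/kernel-doc, so it cannot produce these warnings.

      ```c
      #include <assert.h>

      /**
       * model_clamp() - clamp a value into a closed range
       * @val: value to clamp
       * @lo: lower bound
       * @hi: upper bound, must not be below @lo
       *
       * Return: @lo if @val < @lo, @hi if @val > @hi, otherwise @val.
       */
      static int model_clamp(int val, int lo, int hi)
      {
          assert(lo <= hi);
          if (val < lo)
              return lo;
          if (val > hi)
              return hi;
          return val;
      }

      /*
       * model_is_even - plain comment, not kernel-doc: with a single leading
       * asterisk, kernel-doc ignores it and cannot flag missing parameters.
       */
      static int model_is_even(int v)
      {
          return v % 2 == 0;
      }
      ```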
      
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Link: https://lore.kernel.org/r/20210312105530.2219008-9-lee.jones@linaro.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: drbd: drbd_nl: Make conversion to 'enum drbd_ret_code' explicit · 1f1e87b4
      Lee Jones authored
      Fixes the following W=1 kernel build warning(s):
      
       from drivers/block/drbd/drbd_nl.c:24:
       drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_set_role’:
       drivers/block/drbd/drbd_nl.c:793:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
       drivers/block/drbd/drbd_nl.c:795:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
       drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_attach’:
       drivers/block/drbd/drbd_nl.c:1965:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
       drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_connect’:
       drivers/block/drbd/drbd_nl.c:2690:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
       drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_disconnect’:
       drivers/block/drbd/drbd_nl.c:2803:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
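
      A reduced, hypothetical version of the pattern: a state-change helper
      returns one enum type while the reply code uses another. Assigning one
      to the other implicitly is what -Wenum-conversion flags; an explicit
      cast records that the mix is intentional without changing the value.
      The enum names and values here are illustrative, not drbd's.

      ```c
      #include <assert.h>

      /* Hypothetical stand-ins for enum drbd_state_rv / enum drbd_ret_code. */
      enum model_state_rv {
          MODEL_SS_SUCCESS = 1,
          MODEL_SS_NOTHING_TO_DO = 2,
      };

      enum model_ret_code {
          MODEL_NO_ERROR = 101,
          MODEL_ERR_INTR = 129,
      };

      static enum model_state_rv model_change_role(int changed)
      {
          return changed ? MODEL_SS_SUCCESS : MODEL_SS_NOTHING_TO_DO;
      }

      static enum model_ret_code model_adm_set_role(int changed)
      {
          enum model_state_rv rv = model_change_role(changed);
          enum model_ret_code retcode;

          /* "retcode = rv;" would warn; the cast makes the conversion explicit. */
          retcode = (enum model_ret_code)rv;
          return retcode;
      }
      ```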
      
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Link: https://lore.kernel.org/r/20210312105530.2219008-8-lee.jones@linaro.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: drbd: drbd_main: Remove duplicate field initialisation · f58a0d18
      Lee Jones authored
      [P_RETRY_WRITE] is initialised more than once.
      
      Fixes the following W=1 kernel build warning(s):
      
       drivers/block/drbd/drbd_main.c: In function ‘cmdname’:
       drivers/block/drbd/drbd_main.c:3660:22: warning: initialized field overwritten [-Woverride-init]
       drivers/block/drbd/drbd_main.c:3660:22: note: (near initialization for ‘cmdnames[44]’)
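
      A hypothetical name table shows the shape of the fix: each designated
      index appears exactly once. If the same index were listed twice, C would
      keep the later value and silently discard the earlier one, which is what
      -Woverride-init reports. The packet names are illustrative.

      ```c
      #include <assert.h>

      /* Hypothetical packet types; the real list is drbd's own enum. */
      enum model_packet {
          MODEL_P_DATA,
          MODEL_P_RETRY_WRITE,
          MODEL_P_PING,
          MODEL_P_MAX,
      };

      /* One designated initializer per index, none repeated. */
      static const char *const model_cmdnames[MODEL_P_MAX] = {
          [MODEL_P_DATA]        = "Data",
          [MODEL_P_RETRY_WRITE] = "retry_write",
          [MODEL_P_PING]        = "Ping",
      };

      static const char *model_cmdname(unsigned int cmd)
      {
          if (cmd >= MODEL_P_MAX || !model_cmdnames[cmd])
              return "Unknown";
          return model_cmdnames[cmd];
      }
      ```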
      
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Link: https://lore.kernel.org/r/20210312105530.2219008-7-lee.jones@linaro.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: drbd: drbd_receiver: Demote non-conformant kernel-doc headers · 9b48ff07
      Lee Jones authored
      Fixes the following W=1 kernel build warning(s):
      
       drivers/block/drbd/drbd_receiver.c:265: warning: Function parameter or member 'peer_device' not described in 'drbd_alloc_pages'
       drivers/block/drbd/drbd_receiver.c:265: warning: Excess function parameter 'device' description in 'drbd_alloc_pages'
       drivers/block/drbd/drbd_receiver.c:1362: warning: Function parameter or member 'connection' not described in 'drbd_may_finish_epoch'
       drivers/block/drbd/drbd_receiver.c:1362: warning: Excess function parameter 'device' description in 'drbd_may_finish_epoch'
       drivers/block/drbd/drbd_receiver.c:1451: warning: Function parameter or member 'resource' not described in 'drbd_bump_write_ordering'
       drivers/block/drbd/drbd_receiver.c:1451: warning: Function parameter or member 'bdev' not described in 'drbd_bump_write_ordering'
       drivers/block/drbd/drbd_receiver.c:1451: warning: Excess function parameter 'connection' description in 'drbd_bump_write_ordering'
       drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'op' not described in 'drbd_submit_peer_request'
       drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'op_flags' not described in 'drbd_submit_peer_request'
       drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'fault_type' not described in 'drbd_submit_peer_request'
       drivers/block/drbd/drbd_receiver.c:1643: warning: Excess function parameter 'rw' description in 'drbd_submit_peer_request'
       drivers/block/drbd/drbd_receiver.c:3055: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_0p'
       drivers/block/drbd/drbd_receiver.c:3138: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_1p'
       drivers/block/drbd/drbd_receiver.c:3195: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_2p'
       drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'peer_device' not described in 'receive_bitmap_plain'
       drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'size' not described in 'receive_bitmap_plain'
       drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'p' not described in 'receive_bitmap_plain'
       drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'c' not described in 'receive_bitmap_plain'
       drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'peer_device' not described in 'recv_bm_rle_bits'
       drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'p' not described in 'recv_bm_rle_bits'
       drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'c' not described in 'recv_bm_rle_bits'
       drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'len' not described in 'recv_bm_rle_bits'
       drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'peer_device' not described in 'decode_bitmap_c'
       drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'p' not described in 'decode_bitmap_c'
       drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'c' not described in 'decode_bitmap_c'
       drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'len' not described in 'decode_bitmap_c'
      
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Link: https://lore.kernel.org/r/20210312105530.2219008-6-lee.jones@linaro.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: drbd: drbd_state: Fix some function documentation issues · 49ece311
      Lee Jones authored
      Fixes the following W=1 kernel build warning(s):
      
       drivers/block/drbd/drbd_state.c:913: warning: Function parameter or member 'connection' not described in 'is_valid_soft_transition'
       drivers/block/drbd/drbd_state.c:913: warning: Excess function parameter 'device' description in 'is_valid_soft_transition'
       drivers/block/drbd/drbd_state.c:1054: warning: Function parameter or member 'warn' not described in 'sanitize_state'
       drivers/block/drbd/drbd_state.c:1054: warning: Excess function parameter 'warn_sync_abort' description in 'sanitize_state'
       drivers/block/drbd/drbd_state.c:1703: warning: Function parameter or member 'state_change' not described in 'after_state_ch'
      
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Link: https://lore.kernel.org/r/20210312105530.2219008-5-lee.jones@linaro.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: mtip32xx: mtip32xx: Mark debugging variable 'start' as __maybe_unused · d0e0cb97
      Lee Jones authored
      Fixes the following W=1 kernel build warning(s):
      
       drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_standby_immediate’:
       drivers/block/mtip32xx/mtip32xx.c:1216:16: warning: variable ‘start’ set but not used [-Wunused-but-set-variable]
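
      The pattern behind this warning, reduced to a hypothetical user-space
      function: a variable is assigned unconditionally but only read in a
      debug-only branch. maybe_unused below is a local stand-in for the
      kernel's __maybe_unused, defined with the GCC/Clang unused attribute;
      with the debug switch off, the attribute keeps the build warning-free.

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <time.h>

      /* Local stand-in for the kernel's __maybe_unused (GCC/Clang attribute). */
      #define maybe_unused __attribute__((__unused__))

      /* Hypothetical debug switch: when 0, 'start' is set but never read. */
      #define MODEL_DEBUG 0

      static int model_standby_immediate(void)
      {
          clock_t start maybe_unused;
          int rv = 0;    /* pretend the standby command succeeded */

          start = clock();
          /* ... issue the command and wait for completion here ... */
      #if MODEL_DEBUG
          printf("standby took %ld clock ticks\n", (long)(clock() - start));
      #endif
          return rv;
      }
      ```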
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Link: https://lore.kernel.org/r/20210312105530.2219008-4-lee.jones@linaro.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: drbd: drbd_interval: Demote some kernel-doc abuses and fix another header · b8b87103
      Lee Jones authored
      Fixes the following W=1 kernel build warning(s):
      
       drivers/block/drbd/drbd_interval.c:11: warning: Function parameter or member 'node' not described in 'interval_end'
       drivers/block/drbd/drbd_interval.c:26: warning: Function parameter or member 'root' not described in 'drbd_insert_interval'
       drivers/block/drbd/drbd_interval.c:26: warning: Function parameter or member 'this' not described in 'drbd_insert_interval'
       drivers/block/drbd/drbd_interval.c:70: warning: Function parameter or member 'root' not described in 'drbd_contains_interval'
       drivers/block/drbd/drbd_interval.c:96: warning: Function parameter or member 'root' not described in 'drbd_remove_interval'
       drivers/block/drbd/drbd_interval.c:96: warning: Function parameter or member 'this' not described in 'drbd_remove_interval'
       drivers/block/drbd/drbd_interval.c:113: warning: Function parameter or member 'root' not described in 'drbd_find_overlap'
      
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: drbd-dev@lists.linbit.com
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Link: https://lore.kernel.org/r/20210312105530.2219008-3-lee.jones@linaro.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge tag 'nvme-5.13-2021-04-06' of git://git.infradead.org/nvme into for-5.13/drivers · 762d6bd2
      Jens Axboe authored
      Pull NVMe updates from Christoph:
      
      "nvme updates for Linux 5.13
      
       - fix handling of very large MDTS values (Bart Van Assche)
       - retrigger ANA log update if group descriptor isn't found
         (Hannes Reinecke)
       - fix locking contexts in nvme-tcp and nvmet-tcp (Sagi Grimberg)
       - return proper error code from discovery ctrl (Hou Pu)
       - verify the SGLS field in nvmet-tcp and nvmet-fc (Max Gurtovoy)
       - disallow passthru cmd from targeting a nsid != nsid of the block dev
         (Niklas Cassel)
       - do not allow model_number exceed 40 bytes in nvmet (Noam Gottlieb)
       - enable optional queue idle period tracking in nvmet-tcp
         (Mark Wunderlich)
       - various cleanups and optimizations (Chaitanya Kulkarni, Kanchan Joshi)
       - expose fast_io_fail_tmo in sysfs (Daniel Wagner)
       - implement non-MDTS command limits (Keith Busch)
       - reduce warnings for unhandled command effects (Keith Busch)
       - allocate storage for the SQE as part of the nvme_request (Keith Busch)"
      
      * tag 'nvme-5.13-2021-04-06' of git://git.infradead.org/nvme: (33 commits)
        nvme: fix handling of large MDTS values
        nvme: implement non-mdts command limits
        nvme: disallow passthru cmd from targeting a nsid != nsid of the block dev
        nvme: retrigger ANA log update if group descriptor isn't found
        nvme: export fast_io_fail_tmo to sysfs
        nvme: remove superfluous else in nvme_ctrl_loss_tmo_store
        nvme: use sysfs_emit instead of sprintf
        nvme-fc: check sgl supported by target
        nvme-tcp: check sgl supported by target
        nvmet-tcp: enable optional queue idle period tracking
        nvmet-tcp: fix incorrect locking in state_change sk callback
        nvme-tcp: block BH in sk state_change sk callback
        nvmet: return proper error code from discovery ctrl
        nvme: warn of unhandled effects only once
        nvme: use driver pdu command for passthrough
        nvme-pci: allocate nvme_command within driver pdu
        nvmet: do not allow model_number exceed 40 bytes
        nvmet: remove unnecessary ctrl parameter
        nvmet-fc: update function documentation
        nvme-fc: fix the function documentation comment
        ...
    • Bart Van Assche
      nvme: fix handling of large MDTS values · 8609c63f
      Bart Van Assche authored
      Instead of triggering an integer overflow and undefined behavior if MDTS is
      large, set max_hw_sectors to UINT_MAX.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      [hch: rebased to account for the new nvme_mps_to_sectors helper]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
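MDTS is reported as a power of two in units of the controller's minimum memory page size (2^(12 + CAP.MPSMIN) bytes), and a value of 0 means "no limit". Converting it to 512-byte sectors is therefore a left shift, and shifting an `unsigned int` by its width or more is undefined behavior in C. A minimal sketch of the saturating conversion the commit describes, under a hypothetical function name (the kernel's own helper is `nvme_mps_to_sectors`, whose exact body may differ):

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: MDTS (log2, in minimum-page units) to sectors.
 * Saturates to UINT_MAX instead of performing an undefined shift.
 */
static unsigned int mdts_to_max_hw_sectors(unsigned int mdts, unsigned int mpsmin)
{
	unsigned int shift;

	if (mdts == 0)
		return UINT_MAX;	/* MDTS 0: no transfer size limit */

	/* bytes = 2^mdts * 2^(12 + mpsmin); sectors = bytes / 2^9 */
	shift = mdts + 12 + mpsmin - 9;
	if (shift >= sizeof(unsigned int) * CHAR_BIT)
		return UINT_MAX;	/* 1U << shift would be undefined */
	return 1U << shift;
}
```

For example, MDTS 5 with 4 KiB minimum pages is 128 KiB, i.e. 256 sectors, while a very large MDTS simply yields `UINT_MAX`.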
    • Keith Busch
      nvme: implement non-mdts command limits · 5befc7c2
      Keith Busch authored
      Commands that access LBA contents without a data transfer to or from the
      host historically have not had a spec-defined upper limit. The driver
      set the queue constraints for such commands to the max data transfer
      size just to be safe, but this artificial constraint frequently limits
      devices below their capabilities.
      
      The NVMe Workgroup has ratified TP4040, which defines how a controller may
      advertise its non-MDTS limits. Use these if provided and default to
      the current constraints if not. Since the Dataset Management command
      limits are defined in logical blocks, but there is no namespace to tell us
      the logical block size, the code defaults to the safe 512b size.
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
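The fallback rule above (use a reported limit, otherwise keep the MDTS-derived one) and the 512-byte assumption for Dataset Management can be sketched as follows. The function names are hypothetical, and the sketch assumes the advertised limits have already been converted to sectors, which glosses over the unit handling of the real TP4040 fields:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the limit (in 512-byte sectors) for a
 * command without host data transfer. Zero means "not reported".
 */
static uint32_t non_mdts_limit(uint32_t advertised, uint32_t mdts_sectors)
{
	if (!advertised)
		return mdts_sectors;	/* keep the old, conservative limit */
	return advertised;
}

/*
 * Dataset Management limits are counted in logical blocks and no
 * namespace (hence no block size) is known yet, so assume 512-byte
 * blocks: N blocks become N sectors. A larger real block size only
 * makes the true limit larger, so this can under-estimate but never
 * over-estimate.
 */
static uint32_t dsm_limit_sectors(uint32_t advertised_blocks, uint32_t mdts_sectors)
{
	return non_mdts_limit(advertised_blocks, mdts_sectors);
}
```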
    • Niklas Cassel
      nvme: disallow passthru cmd from targeting a nsid != nsid of the block dev · c881a23f
      Niklas Cassel authored
      When a passthru command targets a specific namespace, the ns parameter to
      nvme_user_cmd()/nvme_user_cmd64() is set. However, there is currently no
      validation that the nsid specified in the passthru command targets the
      namespace/nsid represented by the block device that the ioctl was
      performed on.
      
      Add a check that validates that the nsid in the passthru command matches
      that of the supplied namespace.
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Reviewed-by: Javier González <javier@javigon.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
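The added check is a single comparison, guarded by whether the ioctl was issued on a namespace at all. A minimal sketch with hypothetical, heavily simplified structures (not the kernel's `struct nvme_ns` or `struct nvme_passthru_cmd`):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures. */
struct toy_ns {
	uint32_t ns_id;		/* nsid of the block device's namespace */
};

struct toy_passthru_cmd {
	uint8_t opcode;
	uint32_t nsid;		/* nsid chosen by user space */
};

/*
 * ns is NULL when the ioctl was not issued on a namespace block device
 * (for example, on the controller character device); only a namespace
 * pins the command to one nsid.
 */
static int toy_check_passthru_nsid(const struct toy_ns *ns,
				   const struct toy_passthru_cmd *cmd)
{
	if (ns && cmd->nsid != ns->ns_id)
		return -EINVAL;
	return 0;
}
```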
    • Hannes Reinecke
      nvme: retrigger ANA log update if group descriptor isn't found · dd8f7fa9
      Hannes Reinecke authored
      If ANA is enabled but no ANA group descriptor is found when creating
      a new namespace, the ANA log is most likely out of date, so trigger
      a re-read. The namespace will be tagged with the NS_ANA_PENDING flag
      to exclude it from path selection until the ANA log has been re-read.
      
      Fixes: 32acab31 ("nvme: implement multipath access to nvme subsystems")
      Reported-by: Martin George <marting@netapp.com>
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
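The flow described above (look up the namespace's ANA group; if it is missing, mark the namespace pending, request a re-read and keep it out of path selection) can be modeled in a few lines. This is a hypothetical sketch: `toy_ns.ana_pending` only plays the role of `NS_ANA_PENDING`, and the real path selector also considers the ANA state itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a cached ANA log entry: group id and state. */
struct toy_ana_desc {
	uint32_t grpid;
	int state;
};

struct toy_ns {
	uint32_t anagrpid;
	bool ana_pending;	/* plays the role of NS_ANA_PENDING */
	int ana_state;
};

static const struct toy_ana_desc *
toy_find_ana_group(const struct toy_ana_desc *log, size_t n, uint32_t grpid)
{
	for (size_t i = 0; i < n; i++)
		if (log[i].grpid == grpid)
			return &log[i];
	return NULL;
}

/* Returns true when the caller should schedule an ANA log re-read. */
static bool toy_ns_init_ana(struct toy_ns *ns,
			    const struct toy_ana_desc *log, size_t n)
{
	const struct toy_ana_desc *desc =
		toy_find_ana_group(log, n, ns->anagrpid);

	if (!desc) {
		ns->ana_pending = true;	/* log is probably stale */
		return true;
	}
	ns->ana_pending = false;
	ns->ana_state = desc->state;
	return false;
}

/* Path selection skips namespaces still waiting for the re-read. */
static bool toy_path_usable(const struct toy_ns *ns)
{
	return !ns->ana_pending;
}
```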
  4. 02 Apr, 2021 12 commits
    • Daniel Wagner
      nvme: export fast_io_fail_tmo to sysfs · 09fbed63
      Daniel Wagner authored
      Commit 8c4dfea9 ("nvme-fabrics: reject I/O to offline device")
      introduced fast_io_fail_tmo but didn't export the value to sysfs. The
      value can only be set during 'nvme connect'. Export the timeout value
      to user space via sysfs to allow runtime configuration.
      
      Cc: Victor Gladkov <Victor.Gladkov@kioxia.com>
      Signed-off-by: Daniel Wagner <dwagner@suse.de>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Himanshu Madhani <himanshu.madhaani@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
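A runtime-configurable sysfs attribute needs a show and a store side. The sketch below models them in user space, assuming (as in nvme-fabrics) that a negative value disables the timeout, is stored as -1 and is displayed as `off`; the function names are hypothetical, and the kernel side would use `kstrtoint()` and `sysfs_emit()` rather than `strtol()` and `snprintf()`:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Store side: parse a decimal timeout, accepting one trailing newline. */
static int toy_tmo_store(const char *buf, int *tmo)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno || end == buf || (*end != '\0' && *end != '\n') ||
	    val < INT_MIN || val > INT_MAX)
		return -EINVAL;

	*tmo = val < 0 ? -1 : (int)val;	/* any negative value means "off" */
	return 0;
}

/* Show side: print "off" for the disabled value, else the number. */
static int toy_tmo_show(int tmo, char *buf, size_t size)
{
	if (tmo == -1)
		return snprintf(buf, size, "off\n");
	return snprintf(buf, size, "%d\n", tmo);
}
```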
    • Daniel Wagner
      nvme: remove superfluous else in nvme_ctrl_loss_tmo_store · 25a64e4e
      Daniel Wagner authored
      If there is an error, we leave the function early, so there
      is no need for an else. Remove it.
      Signed-off-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
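The early-return style the commit adopts looks like this in a store handler. This is a hypothetical user-space sketch, modeled loosely on a ctrl_loss_tmo store that turns a timeout into a reconnect count (negative meaning "reconnect forever"); it is not the kernel function itself:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical store helper; reconnect_delay is assumed to be > 0. */
static int toy_ctrl_loss_tmo_store(const char *buf, int reconnect_delay,
				   int *max_reconnects)
{
	char *end;
	long tmo = strtol(buf, &end, 10);

	if (end == buf)
		return -EINVAL;		/* error: leave the function early */

	/* No "else" around this: it only runs when parsing succeeded. */
	if (tmo < 0)
		*max_reconnects = -1;	/* reconnect forever */
	else	/* round up, like DIV_ROUND_UP() */
		*max_reconnects = (int)((tmo + reconnect_delay - 1) / reconnect_delay);
	return 0;
}
```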
    • Daniel Wagner
      nvme: use sysfs_emit instead of sprintf · bff4bcf3
      Daniel Wagner authored
      sysfs_emit is the recommended API to use for formatting strings to be
      returned to user space. It is equivalent to scnprintf and aware of the
      PAGE_SIZE buffer size.
      Suggested-by: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
      Signed-off-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
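The difference that matters is the return value: `sprintf`/`snprintf` report the length the output *would* have had, while `scnprintf` reports what was actually written, so a show() callback can never claim more bytes than it put in the page. A user-space analogue, assuming 4 KiB pages and using hypothetical `toy_` names:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Like the kernel's vscnprintf(): return chars written, excluding NUL. */
static int toy_vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
{
	int ret;

	if (size == 0)
		return 0;
	ret = vsnprintf(buf, size, fmt, args);
	if (ret < 0)
		return 0;
	/* vsnprintf reports the untruncated length; clamp to what fit. */
	return (size_t)ret < size ? ret : (int)(size - 1);
}

static int toy_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int ret;

	va_start(args, fmt);
	ret = toy_vscnprintf(buf, size, fmt, args);
	va_end(args);
	return ret;
}

/* Like sysfs_emit(): format into a buffer known to be one page long. */
static int toy_sysfs_emit(char *page, const char *fmt, ...)
{
	va_list args;
	int ret;

	va_start(args, fmt);
	ret = toy_vscnprintf(page, TOY_PAGE_SIZE, fmt, args);
	va_end(args);
	return ret;
}
```

With a 4-byte buffer, `toy_scnprintf(buf, 4, "%s", "abcdef")` returns 3 and leaves `"abc"`, whereas `snprintf` would return 6.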
    • Max Gurtovoy
      nvme-fc: check sgl supported by target · 8df1bff5
      Max Gurtovoy authored
      SGL support is mandatory for NVMe/FC, so make sure that the target is
      aligned to the specification.
      Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • Max Gurtovoy
      nvme-tcp: check sgl supported by target · 73ffcefc
      Max Gurtovoy authored
      SGL support is mandatory for NVMe/TCP, so make sure that the target is
      aligned to the specification.
      Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
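Both the nvme-fc and nvme-tcp checks come down to inspecting the SGLS field of the Identify Controller data. In the NVMe specification, bits 1:0 of SGLS are 00b when SGLs are not supported and 01b or 10b (differing only in alignment rules) when they are. A minimal sketch with hypothetical names; the real drivers log an error and abort controller setup, whose exact error code is not shown here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SGLS bits 1:0 nonzero => SGLs supported for the NVM command set. */
static bool toy_ctrl_sgl_supported(uint32_t sgls)
{
	return (sgls & 0x3) != 0;
}

/* Returns 0 if the target is usable, -1 if mandatory SGLs are missing. */
static int toy_check_fabrics_target(uint32_t sgls)
{
	if (!toy_ctrl_sgl_supported(sgls))
		return -1;	/* the driver would fail controller setup */
	return 0;
}
```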
    • Wunderlich, Mark
      nvmet-tcp: enable optional queue idle period tracking · d8e7b462
      Wunderlich, Mark authored
      Add 'idle_poll_period_usecs' option used by io_work() to support
      network devices enabled with advanced interrupt moderation
      supporting a relaxed interrupt model. It was discovered that
      such a NIC used on the target was unable to support initiator
      connection establishment, caused by the existing io_work()
      flow that immediately exits after a loop with no activity and
      does not re-queue itself.
      
      With this new option, a queue is assigned a period of time during
      which no activity must occur for it to become 'idle'. Until
      the queue is idle, the work item is requeued.
      
      The new module option is defined as changeable, making it
      flexible for testing purposes.
      
      The pre-existing legacy behavior is preserved when no module option
      for idle_poll_period_usecs is specified.
      Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
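The requeue decision described above can be modeled as a small pure function. This is a hypothetical sketch: `now` is an abstract tick counter, counter wrap-around is ignored (kernel code comparing jiffies would use `time_after()`), and only the behavior stated in the commit text is modeled: with the option unset, only pending work requeues; otherwise any activity re-arms a deadline and polling continues until a full idle period passes:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue state. */
struct toy_queue {
	unsigned long poll_end;	/* last tick at which the queue counts as busy */
};

/*
 * idle_period == 0 models the option being unset (legacy behavior).
 * ops is the amount of activity seen in this io_work() pass.
 */
static bool toy_should_requeue(struct toy_queue *q, unsigned long now,
			       unsigned long idle_period, int ops, bool pending)
{
	bool busy = false;

	if (idle_period) {
		if (ops)			/* activity: push the deadline out */
			q->poll_end = now + idle_period;
		busy = now <= q->poll_end;	/* not idle yet */
	}
	return pending || busy;
}
```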
    • Sagi Grimberg
      nvmet-tcp: fix incorrect locking in state_change sk callback · b5332a9f
      Sagi Grimberg authored
      We are not changing anything in the TCP connection state so
      we should not take a write_lock but rather a read lock.
      
      This caused a deadlock when running nvmet-tcp and nvme-tcp
      on the same system, where state_change callbacks on the
      host and on the controller side have a causal relationship,
      and made lockdep report it with blktests:
      
      ================================
      WARNING: inconsistent lock state
      5.12.0-rc3 #1 Tainted: G          I
      --------------------------------
      inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-R} usage.
      nvme/1324 [HC0[0]:SC0[0]:HE1:SE1] takes:
      ffff888363151000 (clock-AF_INET){++-?}-{2:2}, at: nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
      {IN-SOFTIRQ-W} state was registered at:
        __lock_acquire+0x79b/0x18d0
        lock_acquire+0x1ca/0x480
        _raw_write_lock_bh+0x39/0x80
        nvmet_tcp_state_change+0x21/0x170 [nvmet_tcp]
        tcp_fin+0x2a8/0x780
        tcp_data_queue+0xf94/0x1f20
        tcp_rcv_established+0x6ba/0x1f00
        tcp_v4_do_rcv+0x502/0x760
        tcp_v4_rcv+0x257e/0x3430
        ip_protocol_deliver_rcu+0x69/0x6a0
        ip_local_deliver_finish+0x1e2/0x2f0
        ip_local_deliver+0x1a2/0x420
        ip_rcv+0x4fb/0x6b0
        __netif_receive_skb_one_core+0x162/0x1b0
        process_backlog+0x1ff/0x770
        __napi_poll.constprop.0+0xa9/0x5c0
        net_rx_action+0x7b3/0xb30
        __do_softirq+0x1f0/0x940
        do_softirq+0xa1/0xd0
        __local_bh_enable_ip+0xd8/0x100
        ip_finish_output2+0x6b7/0x18a0
        __ip_queue_xmit+0x706/0x1aa0
        __tcp_transmit_skb+0x2068/0x2e20
        tcp_write_xmit+0xc9e/0x2bb0
        __tcp_push_pending_frames+0x92/0x310
        inet_shutdown+0x158/0x300
        __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
        nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
        nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
        nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
        nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
        kernfs_fop_write_iter+0x2c7/0x460
        new_sync_write+0x36c/0x610
        vfs_write+0x5c0/0x870
        ksys_write+0xf9/0x1d0
        do_syscall_64+0x33/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      irq event stamp: 10687
      hardirqs last  enabled at (10687): [<ffffffff9ec376bd>] _raw_spin_unlock_irqrestore+0x2d/0x40
      hardirqs last disabled at (10686): [<ffffffff9ec374d8>] _raw_spin_lock_irqsave+0x68/0x90
      softirqs last  enabled at (10684): [<ffffffff9f000608>] __do_softirq+0x608/0x940
      softirqs last disabled at (10649): [<ffffffff9cdedd31>] do_softirq+0xa1/0xd0
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(clock-AF_INET);
        <Interrupt>
          lock(clock-AF_INET);
      
       *** DEADLOCK ***
      
      5 locks held by nvme/1324:
       #0: ffff8884a01fe470 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
       #1: ffff8886e435c090 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460
       #2: ffff888104d90c38 (kn->active#255){++++}-{0:0}, at: kernfs_remove_self+0x22d/0x330
       #3: ffff8884634538d0 (&queue->queue_lock){+.+.}-{3:3}, at: nvme_tcp_stop_queue+0x52/0xb0 [nvme_tcp]
       #4: ffff888363150d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x59/0x300
      
      stack backtrace:
      CPU: 26 PID: 1324 Comm: nvme Tainted: G          I       5.12.0-rc3 #1
      Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020
      Call Trace:
       dump_stack+0x93/0xc2
       mark_lock_irq.cold+0x2c/0xb3
       ? verify_lock_unused+0x390/0x390
       ? stack_trace_consume_entry+0x160/0x160
       ? lock_downgrade+0x100/0x100
       ? save_trace+0x88/0x5e0
       ? _raw_spin_unlock_irqrestore+0x2d/0x40
       mark_lock+0x530/0x1470
       ? mark_lock_irq+0x1d10/0x1d10
       ? enqueue_timer+0x660/0x660
       mark_usage+0x215/0x2a0
       __lock_acquire+0x79b/0x18d0
       ? tcp_schedule_loss_probe.part.0+0x38c/0x520
       lock_acquire+0x1ca/0x480
       ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
       ? rcu_read_unlock+0x40/0x40
       ? tcp_mtu_probe+0x1ae0/0x1ae0
       ? kmalloc_reserve+0xa0/0xa0
       ? sysfs_file_ops+0x170/0x170
       _raw_read_lock+0x3d/0xa0
       ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
       nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
       ? sysfs_file_ops+0x170/0x170
       inet_shutdown+0x189/0x300
       __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
       nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
       nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
       nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
       nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
       kernfs_fop_write_iter+0x2c7/0x460
       new_sync_write+0x36c/0x610
       ? new_sync_read+0x600/0x600
       ? lock_acquire+0x1ca/0x480
       ? rcu_read_unlock+0x40/0x40
       ? lock_is_held_type+0x9a/0x110
       vfs_write+0x5c0/0x870
       ksys_write+0xf9/0x1d0
       ? __ia32_sys_read+0xa0/0xa0
       ? lockdep_hardirqs_on_prepare.part.0+0x198/0x340
       ? syscall_enter_from_user_mode+0x27/0x70
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 872d26a3 ("nvmet-tcp: add NVMe over TCP target driver")
      Reported-by: Yi Zhang <yi.zhang@redhat.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
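The reasoning "we only read, so take the read lock" can be illustrated with a user-space analogy. This uses POSIX `pthread_rwlock_t`, not the kernel's `rwlock_t` on `sk->sk_callback_lock`, and the names are hypothetical: readers do not exclude each other, so a read-only callback can run while another reader holds the lock, whereas a writer is shut out until every reader leaves:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_rwlock_t toy_callback_lock = PTHREAD_RWLOCK_INITIALIZER;
static int toy_user_data = 7;	/* stands in for sk->sk_user_data */

/* Read-only callback: nothing is modified, so a read lock suffices. */
static int toy_state_change(void)
{
	int val;

	pthread_rwlock_rdlock(&toy_callback_lock);
	val = toy_user_data;
	pthread_rwlock_unlock(&toy_callback_lock);
	return val;
}
```

Build with `-pthread` on older C libraries.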
    • Sagi Grimberg
      nvme-tcp: block BH in sk state_change sk callback · 8b73b45d
      Sagi Grimberg authored
      The TCP stack can run from process context for a long time
      so we should disable BH here.
      
      Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver")
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
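Disabling bottom halves around the lock keeps a softirq on the same CPU from interrupting the holder and trying to take the same lock. A rough user-space analogy (hypothetical names, not kernel code): a signal handler plays the softirq, and blocking the signal around the critical section defers the handler until the section ends, after which the pending signal is delivered:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t toy_in_section;
static volatile sig_atomic_t toy_handler_saw_section;
static volatile sig_atomic_t toy_handler_runs;

/* The "bottom half": records whether it interrupted the section. */
static void toy_handler(int sig)
{
	(void)sig;
	toy_handler_runs++;
	if (toy_in_section)
		toy_handler_saw_section = 1;	/* the self-deadlock case */
}

static void toy_install_handler(void)
{
	struct sigaction sa;

	sa.sa_handler = toy_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGUSR1, &sa, NULL);
}

static void toy_critical_section(void)
{
	sigset_t block, old;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, &old);	/* analogous to disabling BH */

	toy_in_section = 1;
	raise(SIGUSR1);		/* the "interrupt" arrives but stays pending */
	toy_in_section = 0;

	sigprocmask(SIG_SETMASK, &old, NULL);	/* pending handler runs here */
}
```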
    • Hou Pu
      nvmet: return proper error code from discovery ctrl · 79695dcd
      Hou Pu authored
      Return NVME_SC_INVALID_FIELD from the discovery controller, as a normal
      controller does, when executing an identify or get log page command.
      Signed-off-by: Hou Pu <houpu.main@gmail.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
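A simplified sketch of the behavior: a discovery controller supports only a subset of identify CNS values and log pages, and anything else is rejected the way a normal controller rejects it. The constants mirror the NVMe specification (status 02h "Invalid Field in Command", CNS 01h Identify Controller, log page 70h Discovery) and Linux's encoding of the Do Not Retry bit (0x4000), but the names and the single supported CNS are assumptions of this sketch:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SC_SUCCESS		0x0
#define TOY_SC_INVALID_FIELD	0x2	/* Invalid Field in Command */
#define TOY_SC_DNR		0x4000	/* Do Not Retry, as Linux encodes it */
#define TOY_CNS_CTRL		0x01	/* Identify Controller */
#define TOY_LOG_DISC		0x70	/* Discovery log page */

static uint16_t toy_disc_identify(uint8_t cns)
{
	if (cns != TOY_CNS_CTRL)
		return TOY_SC_INVALID_FIELD | TOY_SC_DNR;
	return TOY_SC_SUCCESS;
}

static uint16_t toy_disc_get_log_page(uint8_t lid)
{
	if (lid != TOY_LOG_DISC)
		return TOY_SC_INVALID_FIELD | TOY_SC_DNR;
	return TOY_SC_SUCCESS;
}
```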
    • Keith Busch
      nvme: warn of unhandled effects only once · ed4a854b
      Keith Busch authored
      We don't need to repeatedly spam the kernel logs with the same warning
      about unhandled passthrough IO effects. Just one warning is sufficient
      to show that this condition occurs.
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
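A warn-once helper only needs a static flag that remembers the message was printed. A minimal user-space sketch in the spirit of the kernel's `*_once` printing helpers; the function name and message are hypothetical, and the flag is not thread-safe:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int toy_warnings_printed;	/* counts emitted warnings */

static void toy_warn_unhandled_effects(unsigned int effects)
{
	static bool warned;		/* persists across calls */

	if (warned)
		return;
	warned = true;
	toy_warnings_printed++;
	fprintf(stderr, "unhandled passthrough effects 0x%x\n", effects);
}
```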
    • Keith Busch
      nvme: use driver pdu command for passthrough · f4b9e6c9
      Keith Busch authored
      All nvme transport drivers preallocate an nvme command for each request.
      Have nvme_setup_cmd() use that command instead of requiring
      drivers to pass a pointer to it. All nvme drivers must initialize the
      generic nvme_request 'cmd' to point to the transport's preallocated
      nvme_command.
      
      The generic nvme_request cmd pointer had previously been used only as a
      temporary copy for passthrough commands. Since it now points to the
      command that gets dispatched, passthrough commands must directly set it
      up prior to executing the request.
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
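The new contract can be sketched with hypothetical, heavily simplified structures: each transport points the generic request's `cmd` pointer at the command it preallocated, generic setup writes straight into that storage, and passthrough copies the user's command into the same place before execution:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
struct toy_command {
	uint8_t opcode;
	uint32_t nsid;
};

struct toy_request {			/* generic nvme layer view */
	struct toy_command *cmd;	/* points at the transport's command */
};

struct toy_transport_pdu {		/* per-request transport data */
	struct toy_request req;
	struct toy_command cmd;		/* preallocated command storage */
};

/* Every transport must aim the generic pointer at its own storage. */
static void toy_init_request(struct toy_transport_pdu *pdu)
{
	pdu->req.cmd = &pdu->cmd;
}

/* Generic setup fills in the command that will be dispatched. */
static void toy_setup_read(struct toy_request *req, uint32_t nsid)
{
	memset(req->cmd, 0, sizeof(*req->cmd));
	req->cmd->opcode = 0x02;	/* NVMe Read */
	req->cmd->nsid = nsid;
}

/* Passthrough sets the command up directly before execution. */
static void toy_setup_passthru(struct toy_request *req,
			       const struct toy_command *user_cmd)
{
	memcpy(req->cmd, user_cmd, sizeof(*req->cmd));
}
```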
    • Keith Busch
      nvme-pci: allocate nvme_command within driver pdu · af7fae85
      Keith Busch authored
      Except for pci, all the nvme transport drivers allocate a command within
      the driver's pdu. Align pci with everyone else by allocating the nvme
      command within pci's pdu and replace the .queue_rq() stack variable with
      this.
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
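blk-mq allocates each request together with a driver-sized pdu placed directly after it, and `blk_mq_rq_to_pdu()` returns `rq + 1`. The sketch below mimics that layout with hypothetical types so that `queue_rq` builds the command inside the pdu instead of in a stack variable. It relies on `sizeof(struct toy_request)` keeping the pdu suitably aligned, which holds for these toy types:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real request and iod are far larger. */
struct toy_request {
	int tag;
};

struct toy_command {
	uint8_t opcode;
	uint32_t nsid;
};

struct toy_pci_iod {			/* driver pdu */
	struct toy_command cmd;		/* command lives here, not on the stack */
	int nents;
};

/* Same idea as blk_mq_rq_to_pdu(): the pdu sits right after the request. */
static void *toy_rq_to_pdu(struct toy_request *rq)
{
	return rq + 1;
}

/* One zeroed allocation holds the request followed by the driver's pdu. */
static struct toy_request *toy_alloc_request(size_t pdu_size)
{
	return calloc(1, sizeof(struct toy_request) + pdu_size);
}

/* queue_rq-style submit: build the command inside the pdu. */
static int toy_queue_rq(struct toy_request *rq, uint32_t nsid)
{
	struct toy_pci_iod *iod = toy_rq_to_pdu(rq);

	iod->cmd.opcode = 0x02;		/* NVMe Read */
	iod->cmd.nsid = nsid;
	return 0;
}
```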