1. 18 Oct, 2019 2 commits
    • Merge branch 'nvme-5.4' of git://git.infradead.org/nvme into for-linus · b55f0097
      Jens Axboe authored
      Pull NVMe updates from Keith:
      
      "This is a collection of bug fixes committed since the previous pull
       request that address deadlocks, double resets, memory leaks, and other
       regressions."
      
      * 'nvme-5.4' of git://git.infradead.org/nvme:
        nvme-pci: Set the prp2 correctly when using more than 4k page
        nvme-tcp: fix possible leakage during error flow
        nvmet-loop: fix possible leakage during error flow
        nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL
        nvme: Wait for reset state when required
        nvme: Prevent resets during paused controller state
        nvme: Restart request timers in resetting state
        nvme: Remove ADMIN_ONLY state
        nvme-pci: Free tagset if no IO queues
        nvme: retain split access workaround for capability reads
        nvme: fix possible deadlock when nvme_update_formats fails
    • nvme-pci: Set the prp2 correctly when using more than 4k page · a4f40484
      Kevin Hao authored
      In the current code, the NVMe driver uses a fixed 4k PRP entry size,
      but if the kernel uses a page size larger than 4k, we must consider
      that bv_offset may be larger than dev->ctrl.page_size. Otherwise we
      may fail to set prp2, and the command then cannot be executed
      correctly.
      
      Fixes: dff824b2 ("nvme-pci: optimize mapping of small single segment requests")
      Cc: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
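The prp2 condition described above can be sketched in userspace C. This is a minimal illustration, not the driver code: `struct prp_pair` and `setup_prp_simple` are hypothetical names, and the sketch assumes a single contiguous segment whose DMA address already includes the offset. The key step is masking `bv_offset` with the controller page size before computing how many bytes the first PRP entry covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two PRP entries of one command. */
struct prp_pair {
    uint64_t prp1;
    uint64_t prp2;     /* meaningful only when has_prp2 is true */
    bool has_prp2;
};

/*
 * dma_addr:       bus address of the first data byte
 * bv_offset:      offset of the data inside the kernel page, which may be
 *                 larger than the controller page size
 * bv_len:         length of the segment in bytes
 * ctrl_page_size: controller page size, assumed to be a power of two
 */
static struct prp_pair setup_prp_simple(uint64_t dma_addr, uint32_t bv_offset,
                                        uint32_t bv_len, uint32_t ctrl_page_size)
{
    assert(ctrl_page_size != 0 &&
           (ctrl_page_size & (ctrl_page_size - 1)) == 0);

    /* Reduce the offset to its position inside one controller page. */
    uint32_t offset = bv_offset & (ctrl_page_size - 1);
    /* Bytes covered by the first PRP entry. */
    uint32_t first_prp_len = ctrl_page_size - offset;

    struct prp_pair p = { .prp1 = dma_addr, .prp2 = 0, .has_prp2 = false };

    /* Data that spills past the first controller page needs prp2. */
    if (bv_len > first_prp_len) {
        p.prp2 = dma_addr + first_prp_len;
        p.has_prp2 = true;
    }
    return p;
}
```

With a 64k kernel page, `bv_offset = 0x1800`, and a 4k controller page, the masked offset is `0x800`, so a 4096-byte transfer crosses a controller page boundary and needs prp2. Subtracting the unmasked offset from 4096 would wrap around as an unsigned value, and the length check would never fire.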
  2. 17 Oct, 2019 2 commits
  3. 16 Oct, 2019 2 commits
  4. 15 Oct, 2019 6 commits
    • libata/ahci: Fix PCS quirk application · 09d6ac8d
      Dan Williams authored
      Commit c312ef17 "libata/ahci: Drop PCS quirk for Denverton and
      beyond" got the polarity wrong on the check for which board-ids should
      have the quirk applied. The board type board_ahci_pcs7 is defined at the
      end of the list such that "pcs7" boards can be special cased in the
      future if they need the quirk. All prior Intel board ids "<
      board_ahci_pcs7" should proceed with applying the quirk.
      Reported-by: Andreas Friedrich <afrie@gmx.net>
      Reported-by: Stephen Douthit <stephend@silicom-usa.com>
      Fixes: c312ef17 ("libata/ahci: Drop PCS quirk for Denverton and beyond")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
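The polarity of the check can be illustrated with a small sketch. The enum below is a hypothetical, shortened board list and `needs_pcs_quirk` is an invented helper; only the ordering, with `board_ahci_pcs7` placed after the older Intel board IDs, is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, shortened board list; only the ordering matters here. */
enum board_id {
    board_ahci,
    board_ahci_mobile,
    board_ahci_pcs7,   /* placed after the older board IDs */
    board_count
};

/*
 * Returns true when the PCS quirk should be applied: only for Intel
 * controllers whose board ID sorts before board_ahci_pcs7.
 */
static bool needs_pcs_quirk(bool is_intel, enum board_id board)
{
    assert(board < board_count);
    return is_intel && board < board_ahci_pcs7;
}
```

An inverted comparison would apply the quirk only to the "pcs7" boards and skip every older Intel board, which is the regression the commit describes.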
    • blk-rq-qos: fix first node deletion of rq_qos_del() · 307f4065
      Tejun Heo authored
      rq_qos_del() incorrectly assigns the node being deleted to the head if
      it was the first on the list in the !prev path.  Fix it by iterating
      with ** instead.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Fixes: a7905043 ("blk-rq-qos: refactor out common elements of blk-wbt")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
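Iterating "with **" refers to the pointer-to-pointer idiom for singly linked lists. A minimal sketch, using a hypothetical `struct node` and `node_del` rather than the kernel's `struct rq_qos`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked node standing in for the kernel structure. */
struct node {
    int id;
    struct node *next;
};

/*
 * Unlink victim from the list starting at *head.  cur always points at
 * the link that leads to the current node (either the head pointer or
 * some node's next field), so the first node needs no special case.
 */
static void node_del(struct node **head, struct node *victim)
{
    assert(head != NULL && victim != NULL);

    struct node **cur;
    for (cur = head; *cur; cur = &(*cur)->next) {
        if (*cur == victim) {
            *cur = victim->next;   /* bypass the victim */
            break;
        }
    }
}
```

Because the same assignment updates either the head or a predecessor's `next` field, there is no separate `!prev` branch to get wrong.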
    • blkcg: Fix multiple bugs in blkcg_activate_policy() · 9d179b86
      Tejun Heo authored
      blkcg_activate_policy() has the following bugs.
      
      * cf09a8ee ("blkcg: pass @q and @blkcg into
        blkcg_pol_alloc_pd_fn()") added @blkcg to ->pd_alloc_fn(); however,
        blkcg_activate_policy() ends up using pd's allocated for the root
        blkcg for all preallocations, so ->pd_init_fn() for non-root blkcgs
        can be passed in pd's which are allocated for the root blkcg.
      
        For blk-iocost, this means that ->pd_init_fn() can write beyond the
        end of the allocated object as it determines the length of the flex
        array at the end based on the blkcg's nesting level.
      
      * Each pd is initialized as it gets allocated.  If alloc fails, the
        policy will get freed with pd's initialized on it.
      
      * After the above partial failure, the partial pds are not freed.
      
      This patch fixes all the above issues by
      
      * Restructuring blkcg_activate_policy() so that alloc and init passes
        are separate.  Init takes place only after all allocs succeeded and
        on failure all allocated pds are freed.
      
      * Unifying and fixing the cleanup of the remaining pd_prealloc.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: cf09a8ee ("blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
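The restructuring into separate alloc and init passes can be sketched as follows. This is only an illustration of the two-pass pattern with full rollback. `struct pd`, `activate_policy`, and the `fail_at` parameter (used to simulate an allocation failure) are hypothetical, and the sketch does not model the per-blkcg allocation that the real fix also addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical policy data; the real pd layout is policy specific. */
struct pd {
    int level;
    bool initialized;
};

/*
 * Allocate n pds, then initialize them.  fail_at simulates an allocation
 * failure at that index (pass a value >= n for no simulated failure).
 * On failure every pd allocated so far is freed and no pd is initialized.
 */
static bool activate_policy(struct pd **pds, size_t n, size_t fail_at)
{
    assert(pds != NULL);

    size_t i;

    /* Pass 1: allocation only. */
    for (i = 0; i < n; i++) {
        pds[i] = (i == fail_at) ? NULL : calloc(1, sizeof(*pds[i]));
        if (!pds[i])
            goto free_all;
    }

    /* Pass 2: initialization, reached only if every allocation succeeded. */
    for (i = 0; i < n; i++) {
        pds[i]->level = (int)i;
        pds[i]->initialized = true;
    }
    return true;

free_all:
    /* Free the pds at indexes i-1 down to 0. */
    while (i-- > 0) {
        free(pds[i]);
        pds[i] = NULL;
    }
    return false;
}
```

Keeping initialization out of the first loop means a failed activation never leaves initialized pds behind.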
    • io_uring: consider the overflow of sequence for timeout req · 5da0fb1a
      yangerkun authored
      Currently we compute the sequence of a timeout as 'req->sequence =
      ctx->cached_sq_head + count - 1' and find its insertion point in
      timeout_list by comparing the number of requests each entry still
      expects to complete. But this does not account for overflow:
      
      1. ctx->cached_sq_head + count - 1 may overflow, so a new timeout req
      with a larger count can end up with a smaller req->sequence.
      
      2. The current cached_sq_head may have overflowed relative to an
      earlier req, which also gives the new timeout req a smaller
      req->sequence.
      
      This overflow misorders timeout_list, which can cause the timeouts to
      complete in the wrong order. Fix it by reusing req->submit.sequence to
      store the count, and by changing the insertion-sort logic in
      io_timeout.
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
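The overflow itself is easy to reproduce with 32-bit unsigned arithmetic. The sketch below does not reproduce the patch, which stores the count and adjusts the insertion sort. It uses a different, simpler technique, comparing each sequence by its modular distance from the current head, purely to show why absolute comparison misorders entries. All function names are hypothetical, and `count > 0` is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sequence at which a timeout fires; wraps modulo 2^32. */
static uint32_t timeout_sequence(uint32_t head, uint32_t count)
{
    assert(count > 0);   /* assumption of this sketch */
    return head + count - 1;
}

/* Naive ordering: compares absolute sequence numbers. */
static bool fires_before_naive(uint32_t seq_a, uint32_t seq_b)
{
    return seq_a < seq_b;
}

/*
 * Wrap-safe ordering: compares how far each sequence lies ahead of the
 * current head.  Unsigned subtraction is modular, so the distance stays
 * correct even when a sequence has wrapped past zero.
 */
static bool fires_before_wrapsafe(uint32_t head, uint32_t seq_a, uint32_t seq_b)
{
    return (uint32_t)(seq_a - head) < (uint32_t)(seq_b - head);
}
```

With `head = UINT32_MAX - 1`, a timeout with count 1 gets sequence `0xFFFFFFFE`, while one with count 5 wraps to sequence 2. The naive comparison places the count-5 timeout first. The distance-based comparison keeps the count-1 timeout first.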
    • nvme-tcp: fix possible leakage during error flow · 28a4cac4
      Max Gurtovoy authored
      During nvme_tcp_setup_cmd_pdu error flow, one must call nvme_cleanup_cmd
      since it's symmetric to nvme_setup_cmd.
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
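The symmetry between setup and cleanup on the error path can be sketched like this. `struct cmd`, `setup_cmd`, `cleanup_cmd`, and `queue_request` are hypothetical stand-ins, not the NVMe functions, and `pdu_ok` simulates whether the step after setup succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical command that owns one allocation made during setup. */
struct cmd {
    void *special;
};

static bool setup_cmd(struct cmd *c)
{
    c->special = malloc(16);
    return c->special != NULL;
}

/* Symmetric counterpart of setup_cmd(). */
static void cleanup_cmd(struct cmd *c)
{
    free(c->special);
    c->special = NULL;
}

/* pdu_ok simulates success or failure of the step following setup. */
static bool queue_request(struct cmd *c, bool pdu_ok)
{
    assert(c != NULL);

    if (!setup_cmd(c))
        return false;

    if (!pdu_ok) {
        /* Undo setup_cmd(); skipping this would leak c->special. */
        cleanup_cmd(c);
        return false;
    }
    return true;
}
```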
    • nvmet-loop: fix possible leakage during error flow · 5812d04c
      Max Gurtovoy authored
      During nvme_loop_queue_rq error flow, one must call nvme_cleanup_cmd since
      it's symmetric to nvme_setup_cmd.
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
  5. 14 Oct, 2019 7 commits
  6. 11 Oct, 2019 1 commit
  7. 10 Oct, 2019 2 commits
  8. 09 Oct, 2019 1 commit
  9. 08 Oct, 2019 1 commit
  10. 06 Oct, 2019 3 commits
  11. 05 Oct, 2019 2 commits
    • nvme: retain split access workaround for capability reads · 3a8ecc93
      Ard Biesheuvel authored
      Commit 7fd8930f
      
        "nvme: add a common helper to read Identify Controller data"
      
      has re-introduced an issue that we have attempted to work around in the
      past, in commit a310acd7 ("NVMe: use split lo_hi_{read,write}q").
      
      The problem is that some PCIe NVMe controllers do not implement 64-bit
      outbound accesses correctly, which is why the commit above switched
      to using lo_hi_[read|write]q for all 64-bit BAR accesses occurring in
      the code.
      
      In the meantime, the NVMe subsystem has been refactored, and now calls
      into the PCIe support layer for NVMe via a .reg_read64() method, which
      fails to use lo_hi_readq(), and thus reintroduces the problem that the
      workaround above aimed to address.
      
      Given that, at the moment, .reg_read64() is only used to read the
      capability register [which is known to tolerate split reads], let's
      switch .reg_read64() to lo_hi_readq() as well.
      
      This fixes a boot issue on some ARM boxes with NVMe behind a Synopsys
      DesignWare PCIe host controller.
      
      Fixes: 7fd8930f ("nvme: add a common helper to read Identify Controller data")
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
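A split read performs two 32-bit accesses, low half first, and combines them. The sketch below models this in userspace over an ordinary array standing in for the register. `split_lo_hi_read64` is a hypothetical name, and real MMIO access in the kernel goes through `readl()` rather than plain pointer dereferences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read a 64-bit register as two 32-bit accesses: the low word at reg[0]
 * first, then the high word at reg[1].  This is safe only for registers
 * that tolerate being read in two halves.
 */
static uint64_t split_lo_hi_read64(const volatile uint32_t *reg)
{
    assert(reg != NULL);

    uint32_t lo = reg[0];
    uint32_t hi = reg[1];
    return ((uint64_t)hi << 32) | lo;
}
```

The two halves are not read atomically, which is why the commit message notes that the capability register is known to tolerate split reads.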
    • nvme: fix possible deadlock when nvme_update_formats fails · 6abff1b9
      Sagi Grimberg authored
      nvme_update_formats may fail to revalidate the namespace and
      attempt to remove the namespace. This may lead to a deadlock
      as nvme_ns_remove will attempt to acquire the subsystem lock
      which is already acquired by the passthru command with effects.
      
      Move the invalid namespace removal to after the passthru command
      releases the subsystem lock.
      Reported-by: Judy Brock <judy.brock@samsung.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  12. 04 Oct, 2019 1 commit
  13. 03 Oct, 2019 3 commits
  14. 01 Oct, 2019 4 commits
    • Revert "s390/dasd: Add discard support for ESE volumes" · 964ce509
      Stefan Haberland authored
      This reverts commit 7e64db15.
      
      The thin provisioning feature introduces an IOCTL and the discard support
      to allow userspace tools and filesystems to release unused and previously
      allocated space respectively.
      
      During some internal performance improvements and further tests, the
      release of allocated space revealed some issues that may lead to data
      corruption in some configurations when filesystems are mounted with
      discard support enabled.
      
      While we're working on a fix and trying to clarify the situation,
      this commit reverts the discard support for ESE volumes to prevent
      potential data corruption.
      
      Cc: <stable@vger.kernel.org> # 5.3
      Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • s390/dasd: Fix error handling during online processing · dd454839
      Jan Höppner authored
      It is possible that the CCW commands for reading volume and extent pool
      information are not supported, either by the storage server (for
      dedicated DASDs) or by z/VM (for virtual devices, such as MDISKs).
      
      As a command reject will occur in such a case, the current error
      handling causes online processing to fail, and thus the DASD can't be
      used at all.
      
      Since the data being read is not essential for a fully operational
      DASD, the error handling can be removed. Information about the failing
      command is sent to the s390dbf debug feature.
      
      Fixes: c729696b ("s390/dasd: Recognise data for ESE volumes")
      Cc: <stable@vger.kernel.org> # 5.3
      Reported-by: Frank Heimes <frank.heimes@canonical.com>
      Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
      Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: use __kernel_timespec in timeout ABI · bdf20073
      Arnd Bergmann authored
      All system calls use struct __kernel_timespec instead of the old struct
      timespec, but this one was just added with the old-style ABI. Change it
      now to enforce the use of __kernel_timespec, avoiding ABI confusion and
      the need for compat handlers on 32-bit architectures.
      
      Any user space caller will have to use __kernel_timespec now, but this
      is unambiguous and works for any C library regardless of the time_t
      definition. A nicer way to specify the timeout would have been a less
      ambiguous 64-bit nanosecond value, but I suppose it's too late now to
      change that as this would impact both 32-bit and 64-bit users.
      
      Fixes: 5262f567 ("io_uring: IORING_OP_TIMEOUT support")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
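The point of a fixed-width timespec is that its layout does not depend on the C library's `time_t`. The sketch below uses a hypothetical mirror type, `struct timespec64_abi`, with two 64-bit fields. It is not the kernel header definition, and `timeout_from_ms` is an invented helper for filling it in.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical mirror of a time_t-independent timespec: both fields are
 * 64 bits wide on 32-bit and 64-bit architectures alike.
 */
struct timespec64_abi {
    int64_t tv_sec;
    long long tv_nsec;
};

/* Convert a timeout in milliseconds into seconds plus nanoseconds. */
static struct timespec64_abi timeout_from_ms(uint64_t ms)
{
    struct timespec64_abi ts;

    ts.tv_sec = (int64_t)(ms / 1000);
    ts.tv_nsec = (long long)(ms % 1000) * 1000000LL;

    /* The nanosecond part must stay below one second. */
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000LL);
    return ts;
}
```

Because the structure has the same size and field widths everywhere, a 32-bit caller can pass it without a compat translation layer.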
    • loop: change queue block size to match when using DIO · 85560117
      Martijn Coenen authored
      The loop driver assumes that if the passed in fd is opened with
      O_DIRECT, the caller wants to use direct I/O on the loop device.
      However, if the underlying block device has a different block size than
      the loop block queue, direct I/O can't be enabled. Instead of requiring
      userspace to manually change the blocksize and re-enable direct I/O,
      just change the queue block sizes to match, as well as the io_min size.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Martijn Coenen <maco@android.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  15. 27 Sep, 2019 3 commits
    • Merge branch 'nvme-5.4' of git://git.infradead.org/nvme into for-linus · 2d5ba0c7
      Jens Axboe authored
      Pull NVMe changes from Sagi:
      
      "This set consists of various fixes and cleanups:
       - controller removal race fix from Balbir
       - quirk additions from Gabriel and Jian-Hong
       - nvme-pci power state save fix from Mario
       - Add 64bit user commands (for 64bit registers) from Marta
       - nvme-rdma/nvme-tcp fixes from Max, Mark and Me
       - Minor cleanups and nits from James, Dan and John"
      
      * 'nvme-5.4' of git://git.infradead.org/nvme:
        nvme-rdma: fix possible use-after-free in connect timeout
        nvme: Move ctrl sqsize to generic space
        nvme: Add ctrl attributes for queue_count and sqsize
        nvme: allow 64-bit results in passthru commands
        nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T
        nvmet-tcp: remove superflous check on request sgl
        Added QUIRKs for ADATA XPG SX8200 Pro 512GB
        nvme-rdma: Fix max_hw_sectors calculation
        nvme: fix an error code in nvme_init_subsystem()
        nvme-pci: Save PCI state before putting drive into deepest state
        nvme-tcp: fix wrong stop condition in io_work
        nvme-pci: Fix a race in controller removal
        nvmet: change ppl to lpp
    • blk-mq: apply normal plugging for HDD · 3154df26
      Ming Lei authored
      Some HDDs may expose multiple hardware queues, such as MegaRAID.
      Let's apply the normal plugging for such devices because sequential IO
      may benefit a lot from plug merging.
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: honor IO scheduler for multiqueue devices · a12de1d4
      Ming Lei authored
      If a device is using multiple queues, the IO scheduler may be bypassed.
      This may hurt performance for some slow MQ devices, and it also breaks
      zoned devices which depend on mq-deadline for respecting the write order
      in one zone.
      
      Don't bypass the IO scheduler if we have one set up.
      
      This patch can essentially double sequential write performance on MQ
      scsi_debug when mq-deadline is applied.
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Javier González <javier@javigon.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>