1. 08 Dec, 2020 5 commits
    • null_blk: Improve implicit zone close · 2e8c6e0e
      Damien Le Moal authored
      When open zone resource management is enabled, that is, when a null_blk
      zoned device is created with zone_max_open different from 0, implicitly
      or explicitly opening a zone may require implicitly closing a zone
      that is already implicitly open. This operation is done using the
      function null_close_first_imp_zone(), which searches for an implicitly
      open zone to close starting from the first sequential zone. This
      implementation is simple but may result in the same zone, namely the
      lowest-numbered zone being written, being constantly implicitly closed
      and then implicitly reopened on write.
      
      Avoid this by starting the search for an implicitly open zone from the
      zone following the last zone that was implicitly closed. The function
      null_close_first_imp_zone() is renamed null_close_imp_open_zone().
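      A minimal user-space sketch of the round-robin search described above,
      assuming a simplified device model. The names struct zdev,
      close_imp_open_zone(), imp_close_zone_no and the zone condition enum are
      hypothetical and are not the null_blk code. Conventional zones come
      first, and the scan wraps around over the sequential zones only.

```c
#include <assert.h>

/* Hypothetical zone conditions, loosely modelled on the ZBC/ZAC states. */
enum zcond { ZC_EMPTY, ZC_IMP_OPEN, ZC_EXP_OPEN, ZC_CLOSED, ZC_FULL };

#define MAX_ZONES 16

/* Hypothetical device model: conventional zones come first, followed by
 * sequential zones; nr_zones must not exceed MAX_ZONES. */
struct zdev {
	unsigned int nr_conv_zones;
	unsigned int nr_zones;
	unsigned int imp_close_zone_no; /* where the next search starts */
	enum zcond cond[MAX_ZONES];
};

/*
 * Implicitly close one implicitly open sequential zone. The scan starts at
 * imp_close_zone_no and wraps around, so the next search begins after the
 * zone closed last time. Returns the closed zone number, or -1 if no zone
 * is implicitly open.
 */
static int close_imp_open_zone(struct zdev *d)
{
	unsigned int nr_seq = d->nr_zones - d->nr_conv_zones;
	unsigned int zno = d->imp_close_zone_no;

	/* Start from the first sequential zone if the saved position is invalid. */
	if (zno < d->nr_conv_zones || zno >= d->nr_zones)
		zno = d->nr_conv_zones;

	for (unsigned int i = 0; i < nr_seq; i++) {
		unsigned int cur = zno;

		/* Advance with wrap-around over the sequential zones only. */
		if (++zno >= d->nr_zones)
			zno = d->nr_conv_zones;

		if (d->cond[cur] == ZC_IMP_OPEN) {
			d->cond[cur] = ZC_CLOSED;
			d->imp_close_zone_no = zno; /* resume after this zone */
			return (int)cur;
		}
	}
	return -1;
}
```

      With one conventional zone and four implicitly open sequential zones,
      successive calls close zones 1 and 2; if zone 1 is then reopened by a
      write, the next call closes zone 3 rather than zone 1 again.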
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: improve zone locking · 2b8b7ed7
      Damien Le Moal authored
      With memory backing disabled, using a single spinlock to protect both
      zone information and zone resource management prevents IO requests to
      different zones from executing in parallel on multiple queues.
      Furthermore, regardless of whether memory backing is used, if a null_blk
      device is created without limits on the number of open and active zones,
      zone resource management accounting is not necessary.
      
      From these observations, zone locking is changed as follows to improve
      performance:
      1) The zone_lock spinlock is renamed zone_res_lock and is used only if
         zone resource management is necessary, that is, if either
         zone_max_open or zone_max_active is not 0. This is indicated using
         the new boolean need_zone_res_mgmt in the nullb_device structure.
         null_zone_write() is modified to reduce the amount of code executed
         with the zone_res_lock spinlock held.
      2) With memory backing disabled, per-zone locking is changed to use one
         spinlock per zone.
      3) Introduce the structure nullb_zone to replace the use of
         struct blk_zone for zone information. This new structure includes a
         union of a spinlock and a mutex for zone locking. The spinlock is
         used when memory backing is disabled and the mutex is used with
         memory backing.
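      The locking scheme in the list above can be modelled in user space
      roughly as follows. This is a sketch under stated assumptions, not the
      driver code: an atomic_flag spin loop and a pthread mutex stand in for
      the kernel's spinlock_t and struct mutex, and struct model_zone,
      struct model_dev, model_zone_write() and the nr_open counter are
      hypothetical names. Only need_zone_res_mgmt and zone_res_lock mirror
      names from the text.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: an atomic_flag spin loop replaces spinlock_t. */
static void model_spin_lock(atomic_flag *f)
{
	while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
		; /* busy-wait until the holder clears the flag */
}

static void model_spin_unlock(atomic_flag *f)
{
	atomic_flag_clear_explicit(f, memory_order_release);
}

struct model_zone {
	union {                        /* only one member is ever initialized */
		atomic_flag spinlock;  /* memory backing disabled */
		pthread_mutex_t mutex; /* memory backing enabled (sleeping lock) */
	};
	unsigned long long wp;
};

struct model_dev {
	bool memory_backed;
	unsigned int zone_max_open, zone_max_active;
	bool need_zone_res_mgmt;
	atomic_flag zone_res_lock;
	unsigned int nr_open; /* hypothetical resource counter */
};

static void model_dev_init(struct model_dev *dev, struct model_zone *zones,
			   unsigned int nr_zones)
{
	/* Resource accounting is only needed when a limit is set. */
	dev->need_zone_res_mgmt = dev->zone_max_open || dev->zone_max_active;
	atomic_flag_clear(&dev->zone_res_lock);
	dev->nr_open = 0;
	for (unsigned int i = 0; i < nr_zones; i++) {
		if (dev->memory_backed)
			pthread_mutex_init(&zones[i].mutex, NULL);
		else
			atomic_flag_clear(&zones[i].spinlock);
		zones[i].wp = 0;
	}
}

static void model_zone_lock(struct model_dev *dev, struct model_zone *z)
{
	if (dev->memory_backed)
		pthread_mutex_lock(&z->mutex);
	else
		model_spin_lock(&z->spinlock);
}

static void model_zone_unlock(struct model_dev *dev, struct model_zone *z)
{
	if (dev->memory_backed)
		pthread_mutex_unlock(&z->mutex);
	else
		model_spin_unlock(&z->spinlock);
}

static void model_zone_write(struct model_dev *dev, struct model_zone *z,
			     unsigned int nr_sectors)
{
	model_zone_lock(dev, z);
	/* Take the device-wide lock only for accounting, and only when limits
	 * exist; writes to different zones otherwise never share a lock. */
	if (dev->need_zone_res_mgmt && z->wp == 0) {
		model_spin_lock(&dev->zone_res_lock);
		dev->nr_open++; /* empty zone becomes (implicitly) open */
		model_spin_unlock(&dev->zone_res_lock);
	}
	z->wp += nr_sectors;
	model_zone_unlock(dev, z);
}
```

      The union reflects item 3: a device only ever uses one kind of zone
      lock, chosen once at creation time, so the two lock types can share
      storage.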
      
      With these changes, fio performance with zonemode=zbd for 4K random
      read and random write on a dual-socket (24 cores per socket) machine
      using the none scheduler is as follows:
      
      before patch:
      	write (psync x 96 jobs) = 465 KIOPS
      	read (libaio@qd=8 x 96 jobs) = 1361 KIOPS
      after patch:
      	write (psync x 96 jobs) = 456 KIOPS
      	read (libaio@qd=8 x 96 jobs) = 4096 KIOPS
      
      Write performance remains essentially unchanged (456 vs. 465 KIOPS,
      about 2% lower), while read performance is about three times higher
      (4096 vs. 1361 KIOPS). Performance when using the mq-deadline scheduler
      is not changed by this patch because mq-deadline itself becomes the
      bottleneck for a multi-queue device.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Align max_hw_sectors to logical blocksize · 817046ec
      Damien Le Moal authored
      Block device drivers do not have to call blk_queue_max_hw_sectors() to
      set a limit on request size if the default limit BLK_SAFE_MAX_SECTORS
      is acceptable. However, this limit (255 sectors) may not be aligned
      to the device logical block size, in which case it cannot be used as is
      as the maximum request size. This is the case for the null_blk device
      driver.
      
      Modify blk_queue_max_hw_sectors() to make sure that the request size
      limits specified by the max_hw_sectors and max_sectors queue limits
      are always aligned to the device logical block size. Additionally, to
      avoid making the result depend on whether this function is called
      before or after blk_queue_logical_block_size(), also modify
      blk_queue_logical_block_size() to perform the same alignment when the
      logical block size is set after max_hw_sectors.
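      A minimal sketch of the alignment, assuming 512-byte sectors and a
      power-of-two logical block size. struct model_limits and the model_*
      helpers are hypothetical stand-ins for the queue limits and for
      blk_queue_max_hw_sectors() / blk_queue_logical_block_size(), not the
      block layer code. Re-aligning in both setters is what makes the result
      independent of call order.

```c
#include <assert.h>

#define SECTOR_SHIFT 9           /* 512-byte sectors */
#define SAFE_MAX_SECTORS 255u    /* BLK_SAFE_MAX_SECTORS, per the text above */

/* Hypothetical stand-in for the queue limits touched by the patch. */
struct model_limits {
	unsigned int logical_block_size; /* bytes, power of two >= 512 */
	unsigned int max_hw_sectors;
	unsigned int max_sectors;
};

/* Round a sector count down to a whole number of logical blocks. */
static unsigned int align_sectors(unsigned int sectors, unsigned int lbs)
{
	unsigned int lbs_sectors = lbs >> SECTOR_SHIFT;

	return sectors - (sectors % lbs_sectors);
}

static void model_set_max_hw_sectors(struct model_limits *l, unsigned int max)
{
	l->max_hw_sectors = align_sectors(max, l->logical_block_size);
	/* Simplification: max_sectors simply mirrors max_hw_sectors here. */
	l->max_sectors = l->max_hw_sectors;
}

static void model_set_logical_block_size(struct model_limits *l,
					 unsigned int lbs)
{
	l->logical_block_size = lbs;
	/* Re-align limits that may have been set before the block size. */
	l->max_hw_sectors = align_sectors(l->max_hw_sectors, lbs);
	l->max_sectors = align_sectors(l->max_sectors, lbs);
}
```

      With a 4096-byte logical block size, the 255-sector default becomes
      248 sectors (31 blocks of 4 KiB) whichever setter runs first.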
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Fail zone append to conventional zones · 2e896d89
      Damien Le Moal authored
      Conventional zones do not have a write pointer and so cannot accept zone
      append writes. Make sure to fail any zone append write command issued to
      a conventional zone.
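      As an illustration, assuming a hypothetical in-memory zone model
      (struct model_zone and model_zone_append() are not null_blk names), the
      check amounts to rejecting a zone append before the write pointer is
      consulted. All values are in 512-byte sectors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone model; all values in 512-byte sectors. */
struct model_zone {
	bool conventional;
	unsigned long long start, capacity, wp;
};

/*
 * Zone append: the device writes at the zone write pointer and reports
 * the sector used. Returns that sector, or -1 on error.
 */
static long long model_zone_append(struct model_zone *z, unsigned int nr_sectors)
{
	long long sector;

	/* Conventional zones have no write pointer: reject the command. */
	if (z->conventional)
		return -1;
	/* Reject writes that would go past the zone capacity. */
	if (z->wp + nr_sectors > z->start + z->capacity)
		return -1;

	sector = (long long)z->wp;
	z->wp += nr_sectors;
	return sector;
}
```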
      Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
      Fixes: e0489ed5 ("null_blk: Support REQ_OP_ZONE_APPEND")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Fix zone size initialization · 0ebcdd70
      Damien Le Moal authored
      A null_blk device with zoned mode enabled is currently initialized
      with a number of zones equal to the device capacity divided by the zone
      size, without considering whether the device capacity is a multiple of
      the zone size. If the zone size is not a divisor of the capacity, the
      zones end up not covering the entire capacity, potentially resulting in
      out-of-bounds accesses to the zone array.
      
      Fix this by adding one last smaller zone with a size equal to the
      remainder of the disk capacity divided by the zone size if the capacity
      is not a multiple of the zone size. For such a smaller last zone, the
      zone capacity is also checked so that it does not exceed the smaller
      zone size.
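      A minimal sketch of the fixed zone layout, with hypothetical names
      (struct model_zone_geom, model_init_zones()) and all sizes in sectors:
      the zone count is rounded up, the last zone is shortened to the
      remainder, and its capacity is clamped to its size.

```c
#include <assert.h>

/* Hypothetical geometry record; all values in sectors. */
struct model_zone_geom {
	unsigned long long start, len, capacity;
};

/*
 * Split dev_capacity into zones of zone_size sectors. If the capacity is
 * not a multiple of zone_size, the last zone is smaller and its capacity
 * is clamped to its size. Returns the number of zones, or 0 if more than
 * max_zones would be needed.
 */
static unsigned int model_init_zones(unsigned long long dev_capacity,
				     unsigned long long zone_size,
				     unsigned long long zone_capacity,
				     struct model_zone_geom *zones,
				     unsigned int max_zones)
{
	/* Round up so that a partial last zone is counted. */
	unsigned long long nr = (dev_capacity + zone_size - 1) / zone_size;
	unsigned long long start = 0;

	if (nr > max_zones)
		return 0;

	for (unsigned int i = 0; i < nr; i++) {
		unsigned long long len = zone_size;

		if (start + len > dev_capacity)
			len = dev_capacity - start; /* smaller last zone */
		zones[i].start = start;
		zones[i].len = len;
		zones[i].capacity = zone_capacity < len ? zone_capacity : len;
		start += len;
	}
	return (unsigned int)nr;
}
```

      For example, a 1000-sector device with 256-sector zones and a zone
      capacity of 240 sectors gets four zones; the last one starts at sector
      768, and both its size and its capacity are 232 sectors.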
      Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
      Fixes: ca4b2a01 ("null_blk: add zone support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 07 Dec, 2020 2 commits
  3. 04 Dec, 2020 8 commits
  4. 02 Dec, 2020 1 commit
    • Merge tag 'nvme-5.11-20201202' of git://git.infradead.org/nvme into for-5.11/drivers · 3b9351f0
      Jens Axboe authored
      Pull NVMe updates from Christoph:
      
      "nvme updates for 5.11
      
       - nvmet passthrough improvements (Chaitanya Kulkarni)
       - fcloop error injection support (James Smart)
       - read-only support for zoned namespaces without Zone Append
         (Javier González)
       - improve some error messages (Minwoo Im)
       - reject I/O to offline fabrics namespaces (Victor Gladkov)
       - PCI queue allocation cleanups (Niklas Schnelle)
       - remove an unused allocation in nvmet (Amit Engel)
       - a Kconfig spelling fix (Colin Ian King)
       - nvme_req_qid simplification (Baolin Wang)"
      
      * tag 'nvme-5.11-20201202' of git://git.infradead.org/nvme: (23 commits)
        nvme: export zoned namespaces without Zone Append support read-only
        nvme: rename bdev operations
        nvme: rename controller base dev_t char device
        nvme: remove unnecessary return values
        nvme: print a warning for when listing active namespaces fails
        nvme: improve an error message on Identify failure
        nvme-fabrics: reject I/O to offline device
        nvmet: fix a spelling mistake "incuding" -> "including" in Kconfig
        nvmet: make sure discovery change log event is protected
        nvmet: remove unused ctrl->cqs
        nvme-pci: don't allocate unused I/O queues
        nvme-pci: drop min() from nr_io_queues assignment
        nvmet: use inline bio for passthru fast path
        nvmet: use blk_rq_bio_prep instead of blk_rq_append_bio
        nvmet: remove op_flags for passthru commands
        nvme: split nvme_alloc_request()
        block: move blk_rq_bio_prep() to linux/blk-mq.h
        nvmet: add passthru io timeout value attr
        nvmet: add passthru admin timeout value attr
        nvme: use consistent macro name for timeout
        ...
  5. 01 Dec, 2020 23 commits
  6. 30 Nov, 2020 1 commit
    • Merge branch 'md-next' of... · 48332ff2
      Jens Axboe authored
      Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.11/drivers
      
      Pull MD changes from Song:
      
      "Summary:
       1. Fix race condition in md_ioctl(), by Dae R. Jeong;
       2. Initialize read_slot properly for raid10, by Kevin Vigor;
       3. Code cleanup, by Pankaj Gupta;
       4. md-cluster resync/reshape fix, by Zhao Heming."
      
      * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md/cluster: fix deadlock when node is doing resync job
        md/cluster: block reshape with remote resync job
        md: use current request time as base for ktime comparisons
        md: add comments in md_flush_request()
        md: improve variable names in md_flush_request()
        md/raid10: initialize r10_bio->read_slot before use.
        md: fix a warning caused by a race between concurrent md_ioctl()s