1. 17 May, 2023 2 commits
  2. 03 May, 2023 3 commits
    • nvme-pci: clamp max_hw_sectors based on DMA optimized limitation · 3710e2b0
      Adrian Huang authored
      When running the fio test on a 448-core AMD server with an NVMe disk,
      a soft lockup or hard lockup call trace is shown:
      
      [soft lockup]
      watchdog: BUG: soft lockup - CPU#126 stuck for 23s! [swapper/126:0]
      RIP: 0010:_raw_spin_unlock_irqrestore+0x21/0x50
      ...
      Call Trace:
       <IRQ>
       fq_flush_timeout+0x7d/0xd0
       ? __pfx_fq_flush_timeout+0x10/0x10
       call_timer_fn+0x2e/0x150
       run_timer_softirq+0x48a/0x560
       ? __pfx_fq_flush_timeout+0x10/0x10
       ? clockevents_program_event+0xaf/0x130
       __do_softirq+0xf1/0x335
       irq_exit_rcu+0x9f/0xd0
       sysvec_apic_timer_interrupt+0xb4/0xd0
       </IRQ>
       <TASK>
       asm_sysvec_apic_timer_interrupt+0x1f/0x30
      ...
      
      Obviously, fq_flush_timeout spends over 20 seconds. Here is the ftrace log:
      
                     |  fq_flush_timeout() {
                     |    fq_ring_free() {
                     |      put_pages_list() {
         0.170 us    |        free_unref_page_list();
         0.810 us    |      }
                     |      free_iova_fast() {
                     |        free_iova() {
       * 85622.66 us |          _raw_spin_lock_irqsave();
         2.860 us    |          remove_iova();
         0.600 us    |          _raw_spin_unlock_irqrestore();
         0.470 us    |          lock_info_report();
         2.420 us    |          free_iova_mem.part.0();
       * 85638.27 us |        }
       * 85638.84 us |      }
                     |      put_pages_list() {
         0.230 us    |        free_unref_page_list();
         0.470 us    |      }
         ...            ...
       $ 31017069 us |  }
      
      Most of the cores are contending for iova_rbtree_lock because of
      the IOVA flush queue mechanism.
      
      [hard lockup]
      NMI watchdog: Watchdog detected hard LOCKUP on cpu 351
      RIP: 0010:native_queued_spin_lock_slowpath+0x2d8/0x330
      
      Call Trace:
       <IRQ>
       _raw_spin_lock_irqsave+0x4f/0x60
       free_iova+0x27/0xd0
       free_iova_fast+0x4d/0x1d0
       fq_ring_free+0x9b/0x150
       iommu_dma_free_iova+0xb4/0x2e0
       __iommu_dma_unmap+0x10b/0x140
       iommu_dma_unmap_sg+0x90/0x110
       dma_unmap_sg_attrs+0x4a/0x50
       nvme_unmap_data+0x5d/0x120 [nvme]
       nvme_pci_complete_batch+0x77/0xc0 [nvme]
       nvme_irq+0x2ee/0x350 [nvme]
       ? __pfx_nvme_pci_complete_batch+0x10/0x10 [nvme]
       __handle_irq_event_percpu+0x53/0x1a0
       handle_irq_event_percpu+0x19/0x60
       handle_irq_event+0x3d/0x60
       handle_edge_irq+0xb3/0x210
       __common_interrupt+0x7f/0x150
       common_interrupt+0xc5/0xf0
       </IRQ>
       <TASK>
       asm_common_interrupt+0x2b/0x40
      ...
      
      ftrace shows that fq_ring_free spends over 10 seconds [1]. Again, most
      of the cores are contending for iova_rbtree_lock because of the IOVA
      flush queue mechanism.
      
      [Root Cause]
      The root cause is that the max_hw_sectors_kb of the NVMe disk (mdts=10)
      is 4096 KB, and streaming DMA mappings cannot benefit from the
      scalable IOVA mechanism introduced by commit 9257b4a2
      ("iommu/iova: introduce per-cpu caching to iova allocation") if
      the length is greater than 128 KB.
      
      To fix the lock contention issue, clamp max_hw_sectors based on the
      DMA optimized limitation in order to leverage the scalable IOVA mechanism.
      
      Note: The issue does not happen with another NVME disk (mdts = 5
      and max_hw_sectors_kb = 128)
      
      [1] https://gist.github.com/AdrianHuang/bf8ec7338204837631fbdaed25d19cc4

      Suggested-by: Keith Busch <kbusch@kernel.org>
      Reported-and-tested-by: Jiwei Sun <sunjw10@lenovo.com>
      Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
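The numbers in the root-cause analysis above can be checked with a small userspace sketch. It assumes a 4 KiB minimum memory page size (MDTS is expressed as a power of two in units of that page size) and takes the 128 KB threshold from the commit message. The helper names `mdts_to_kb` and `clamp_max_hw_sectors` are hypothetical and are not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/* Largest transfer in KiB implied by MDTS for a given minimum page size. */
static uint32_t mdts_to_kb(unsigned int mdts, uint32_t min_page_bytes)
{
    assert(mdts < 20); /* keep the shift well inside 32 bits */
    return (min_page_bytes >> 10) << mdts;
}

/* Clamp a sector limit to an "optimal mapping size" given in bytes. */
static uint32_t clamp_max_hw_sectors(uint32_t max_hw_sectors,
                                     uint32_t opt_mapping_bytes)
{
    uint32_t opt_sectors = opt_mapping_bytes >> SECTOR_SHIFT;

    return max_hw_sectors < opt_sectors ? max_hw_sectors : opt_sectors;
}
```

With these assumptions, `mdts_to_kb(10, 4096)` gives 4096 KiB (8192 sectors) and `mdts_to_kb(5, 4096)` gives 128 KiB, matching the two disks described. Clamping 8192 sectors to a 128 KiB optimal size yields 256 sectors.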
    • nvme-pci: add quirk for missing secondary temperature thresholds · bd375fee
      Hristo Venev authored
      On Kingston KC3000 and Kingston FURY Renegade (both have the same PCI
      IDs) accessing temp3_{min,max} fails with an invalid field error (note
      that there is no problem setting the thresholds for temp1).
      
      This contradicts the NVM Express Base Specification 2.0b, page 292:
      
        The over temperature threshold and under temperature threshold
        features shall be implemented for all implemented temperature sensors
        (i.e., all Temperature Sensor fields that report a non-zero value).
      
      Define NVME_QUIRK_NO_SECONDARY_TEMP_THRESH that disables the thresholds
      for all but the composite temperature and set it for this device.
      Signed-off-by: Hristo Venev <hristo@venev.name>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-pci: add NVME_QUIRK_BOGUS_NID for HS-SSD-FUTURE 2048G · 1616d6c3
      Sagi Grimberg authored
      Add a quirk to fix HS-SSD-FUTURE 2048G SSD drives reporting duplicate
      nsids.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217384
      Reported-by: Andrey God <andreygod83@protonmail.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  3. 30 Apr, 2023 1 commit
  4. 28 Apr, 2023 9 commits
  5. 27 Apr, 2023 3 commits
  6. 25 Apr, 2023 2 commits
  7. 24 Apr, 2023 1 commit
  8. 20 Apr, 2023 2 commits
    • nbd: fix incomplete validation of ioctl arg · 55793ea5
      Zhong Jinghua authored
      We tested and found a warning caused by an nbd_ioctl arg that is not
      validated. The UBSAN warning call trace looks like this:
      
      UBSAN: Undefined behaviour in fs/buffer.c:1709:35
      signed integer overflow:
      -9223372036854775808 - 1 cannot be represented in type 'long long int'
      CPU: 3 PID: 2523 Comm: syz-executor.0 Not tainted 4.19.90 #1
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
       show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x170/0x1dc lib/dump_stack.c:118
       ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
       handle_overflow+0x188/0x1dc lib/ubsan.c:192
       __ubsan_handle_sub_overflow+0x34/0x44 lib/ubsan.c:206
       __block_write_full_page+0x94c/0xa20 fs/buffer.c:1709
       block_write_full_page+0x1f0/0x280 fs/buffer.c:2934
       blkdev_writepage+0x34/0x40 fs/block_dev.c:607
       __writepage+0x68/0xe8 mm/page-writeback.c:2305
       write_cache_pages+0x44c/0xc70 mm/page-writeback.c:2240
       generic_writepages+0xdc/0x148 mm/page-writeback.c:2329
       blkdev_writepages+0x2c/0x38 fs/block_dev.c:2114
       do_writepages+0xd4/0x250 mm/page-writeback.c:2344
      
      The warning is triggered by an overflow of i_size_read(inode) - 1 in
      __block_write_full_page().
      inode->i_size is assigned from bytesize in __nbd_ioctl() -> nbd_set_size().
      We think it is necessary to limit the size of arg to prevent errors.
      
      Moreover, in __nbd_ioctl() -> nbd_add_socket(), arg will be cast to int.
      Assuming the value of arg is 0x8000000000000001 (on a 64-bit machine),
      it will become 1 after the conversion, which leads to unexpected results.
      
      Fix it by adding checks to prevent passing in too large numbers.
      Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/r/20230206145805.2645671-1-zhongjinghua@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
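The two hazards described in the nbd message, a 64-bit argument truncated to int and a size whose sign bit is set, can be reproduced in userspace. This is a minimal sketch and not the actual patch. The helper names are hypothetical, and the example value is assumed to be 0x8000000000000001, a 64-bit value whose low 32 bits equal 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* What survives when a 64-bit ioctl argument is squeezed into an int:
 * only the low 32 bits. */
static int truncate_to_int(uint64_t arg)
{
    return (int)(uint32_t)arg;
}

/* Hypothetical guard: accept only values that an int can represent. */
static bool fits_in_int(uint64_t arg)
{
    return arg <= (uint64_t)INT_MAX;
}

/* Hypothetical guard: accept only sizes that fit a signed 64-bit i_size,
 * so that "size - 1" can never start from a negative value. */
static bool size_is_valid(uint64_t bytesize)
{
    return bytesize <= (uint64_t)INT64_MAX;
}
```

Here `truncate_to_int(0x8000000000000001ULL)` yields 1, which is why an unchecked argument can silently select the wrong descriptor, while `fits_in_int` rejects the same value. `size_is_valid(0x8000000000000000ULL)` is false. Read as a signed 64-bit number, that bit pattern is the -9223372036854775808 shown in the UBSAN report.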
    • ublk: don't return 0 in case of any failure · 7c75661c
      Ming Lei authored
      Commit 2d786e66 ("block: ublk: switch to ioctl command encoding")
      started resetting the local variable 'ret' to zero, so if any failure
      happens when handling the three IO commands, 0 can be returned to the
      ublk server.
      
      Fix it by returning -EINVAL in case of command handling failure.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Fixes: 2d786e66 ("block: ublk: switch to ioctl command encoding")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20230420091104.1092972-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. 19 Apr, 2023 3 commits
    • sed-opal: geometry feature reporting command · 9e05a259
      Ondrej Kozina authored
      Locking range start and locking range length
      attributes may be required to satisfy restrictions
      exposed by OPAL2 geometry feature reporting.
      
      Geometry reporting feature is described in TCG OPAL SSC,
      section 3.1.1.4 (ALIGN, LogicalBlockSize, AlignmentGranularity
      and LowestAlignedLBA).
      
      4.3.5.2.1.1 RangeStart Behavior:
      
      [ StartAlignment = (RangeStart modulo AlignmentGranularity) - LowestAlignedLBA ]
      
      When processing a Set method or CreateRow method on the Locking
      table for a non-Global Range row, if:
      
      a) the AlignmentRequired (ALIGN above) column in the LockingInfo
         table is TRUE;
      b) RangeStart is non-zero; and
      c) StartAlignment is non-zero, then the method SHALL fail and
         return an error status code INVALID_PARAMETER.
      
      4.3.5.2.1.2 RangeLength Behavior:
      
      If RangeStart is zero, then
      	[ LengthAlignment = (RangeLength modulo AlignmentGranularity) - LowestAlignedLBA ]
      
      If RangeStart is non-zero, then
      	[ LengthAlignment = (RangeLength modulo AlignmentGranularity) ]
      
      When processing a Set method or CreateRow method on the Locking
      table for a non-Global Range row, if:
      
      a) the AlignmentRequired (ALIGN above) column in the LockingInfo
         table is TRUE;
      b) RangeLength is non-zero; and
      c) LengthAlignment is non-zero, then the method SHALL fail and
         return an error status code INVALID_PARAMETER
      
      In userspace we stick to the logical block size reported by the general
      block device (via sysfs or ioctl), but we cannot read
      'AlignmentGranularity' or 'LowestAlignedLBA' anywhere else. We need
      to get those values from the sed-opal interface; otherwise we will
      not be able to report or avoid the locking range setup
      INVALID_PARAMETER errors above.
      Signed-off-by: Ondrej Kozina <okozina@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Tested-by: Milan Broz <gmazyland@gmail.com>
      Link: https://lore.kernel.org/r/20230411090931.9193-2-okozina@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
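The RangeStart and RangeLength rules quoted in the sed-opal message can be written as a userspace pre-check. This is a sketch that follows the quoted formulas literally. The struct and function names are hypothetical and do not correspond to the sed-opal interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the geometry values named in the message. */
struct opal_geometry_sketch {
    bool align_required;            /* ALIGN / AlignmentRequired */
    uint64_t alignment_granularity; /* AlignmentGranularity, must be non-zero */
    uint64_t lowest_aligned_lba;    /* LowestAlignedLBA */
};

/* StartAlignment = (RangeStart modulo AlignmentGranularity) - LowestAlignedLBA */
static int64_t start_alignment(const struct opal_geometry_sketch *g,
                               uint64_t range_start)
{
    assert(g->alignment_granularity != 0);
    return (int64_t)(range_start % g->alignment_granularity) -
           (int64_t)g->lowest_aligned_lba;
}

/* LengthAlignment subtracts LowestAlignedLBA only when RangeStart is zero. */
static int64_t length_alignment(const struct opal_geometry_sketch *g,
                                uint64_t range_start, uint64_t range_length)
{
    assert(g->alignment_granularity != 0);
    int64_t rem = (int64_t)(range_length % g->alignment_granularity);

    return range_start == 0 ? rem - (int64_t)g->lowest_aligned_lba : rem;
}

/* True if a Set/CreateRow with these values would not hit INVALID_PARAMETER
 * under the quoted rules. */
static bool range_is_acceptable(const struct opal_geometry_sketch *g,
                                uint64_t range_start, uint64_t range_length)
{
    if (!g->align_required)
        return true;
    if (range_start != 0 && start_alignment(g, range_start) != 0)
        return false;
    if (range_length != 0 &&
        length_alignment(g, range_start, range_length) != 0)
        return false;
    return true;
}
```

For example, with AlignmentGranularity 8 and LowestAlignedLBA 2, a range starting at LBA 10 with length 16 passes, while a range starting at LBA 12 is rejected. A userspace tool can only run such a check once it can read both values, which is what the reporting command provides.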
    • null_blk: Always check queue mode setting from configfs · 63f8793e
      Chaitanya Kulkarni authored
      Make sure to check the device queue mode in null_validate_conf() and
      return an error for NULL_Q_RQ, as we don't allow the legacy I/O path.
      Without this patch we get an Oops when queue mode is set to 1 from
      configfs. Repro steps:
      
      modprobe null_blk nr_devices=0
      mkdir config/nullb/nullb0
      echo 1 > config/nullb/nullb0/memory_backed
      echo 4096 > config/nullb/nullb0/blocksize
      echo 20480 > config/nullb/nullb0/size
      echo 1 > config/nullb/nullb0/queue_mode
      echo 1 > config/nullb/nullb0/power
      
      Entering kdb (current=0xffff88810acdd080, pid 2372) on processor 42 Oops: (null)
      due to oops @ 0xffffffffc041c329
      CPU: 42 PID: 2372 Comm: sh Tainted: G           O     N 6.3.0-rc5lblk+ #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      RIP: 0010:null_add_dev.part.0+0xd9/0x720 [null_blk]
      Code: 01 00 00 85 d2 0f 85 a1 03 00 00 48 83 bb 08 01 00 00 00 0f 85 f7 03 00 00 80 bb 62 01 00 00 00 48 8b 75 20 0f 85 6d 02 00 00 <48> 89 6e 60 48 8b 75 20 bf 06 00 00 00 e8 f5 37 2c c1 48 8b 75 20
      RSP: 0018:ffffc900052cbde0 EFLAGS: 00010246
      RAX: 0000000000000001 RBX: ffff88811084d800 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888100042e00
      RBP: ffff8881053d8200 R08: ffffc900052cbd68 R09: ffff888105db2000
      R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
      R13: ffff888104765200 R14: ffff88810eec1748 R15: ffff88810eec1740
      FS:  00007fd445fd1740(0000) GS:ffff8897dfc80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000060 CR3: 0000000166a00000 CR4: 0000000000350ee0
      DR0: ffffffff8437a488 DR1: ffffffff8437a489 DR2: ffffffff8437a48a
      DR3: ffffffff8437a48b DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       nullb_device_power_store+0xd1/0x120 [null_blk]
       configfs_write_iter+0xb4/0x120
       vfs_write+0x2ba/0x3c0
       ksys_write+0x5f/0xe0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7fd4460c57a7
      Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      RSP: 002b:00007ffd3792a4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4460c57a7
      RDX: 0000000000000002 RSI: 000055b43c02e4c0 RDI: 0000000000000001
      RBP: 000055b43c02e4c0 R08: 000000000000000a R09: 00007fd44615b4e0
      R10: 00007fd44615b3e0 R11: 0000000000000246 R12: 0000000000000002
      R13: 00007fd446198520 R14: 0000000000000002 R15: 00007fd446198700
       </TASK>
      Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
      Link: https://lore.kernel.org/r/20230416220339.43845-1-kch@nvidia.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
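The idea of the null_blk fix, rejecting the legacy request mode during validation so that every configuration path hits the same check, can be sketched as follows. The message states that queue mode 1 is NULL_Q_RQ. The values 0 for bio mode and 2 for multiqueue mode are assumptions, and the names here are hypothetical and not the driver's code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of the queue modes: 1 is the legacy request mode. */
enum queue_mode_sketch {
    Q_BIO_SKETCH = 0, /* assumed value */
    Q_RQ_SKETCH = 1,  /* legacy request path, no longer supported */
    Q_MQ_SKETCH = 2,  /* assumed value */
};

/* Return 0 for a usable mode and -EINVAL for the legacy or an unknown mode. */
static int validate_queue_mode(int queue_mode)
{
    switch (queue_mode) {
    case Q_BIO_SKETCH:
    case Q_MQ_SKETCH:
        return 0;
    case Q_RQ_SKETCH:
    default:
        return -EINVAL;
    }
}
```

With a check of this shape in the shared validation step, writing 1 to queue_mode and then powering the device on would fail with an error instead of reaching the code that oopsed.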
    • block: ublk: switch to ioctl command encoding · 2d786e66
      Ming Lei authored
      All ublk commands (control, IO) should have used ioctl command encoding
      from the beginning, because 1) ioctl command encoding defines each code
      uniquely, so the driver can easily detect a wrong command sent from
      userspace; 2) it might help the security subsystem audit uring cmd [1].
      
      Unfortunately we didn't do it that way, and it could be one lesson for
      the ublk driver.
      
      So switch to ioctl command encoding now. Commands encoded the old way
      are still supported, but they become legacy definitions. Any new command
      should use ioctl encoding.
      
      See ublksrv code for switching to ioctl command encoding in [2].
      
      [1] https://lore.kernel.org/io-uring/CAHC9VhSVzujW9LOj5Km80AjU0EfAuukoLrxO6BEfnXeK_s6bAg@mail.gmail.com/
      [2] https://github.com/ming1/ubdsrv/commits/ioctl_cmd_encoding
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ken Kurematsu <k.kurematsu@nskint.co.jp>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230418131810.855959-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
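Why ioctl encoding lets a driver spot a wrong command can be shown with the standard `_IOC` macros from `<linux/ioctl.h>`. This is a sketch for a Linux userspace build. The struct, the opcode value 0x20, the `'u'` type character and all `DEMO_`/`demo_` names are illustrative assumptions and not the ublk UAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical command payload. */
struct demo_io_cmd {
    uint16_t q_id;
    uint16_t tag;
    int32_t result;
    uint64_t addr;
};

/* A plain opcode (old style) and the same opcode wrapped in ioctl encoding,
 * which adds direction, type character and payload size. */
#define DEMO_LEGACY_FETCH_REQ 0x20
#define DEMO_U_FETCH_REQ _IOWR('u', DEMO_LEGACY_FETCH_REQ, struct demo_io_cmd)

/* Accept only commands whose encoded fields match what we expect. */
static bool demo_cmd_is_valid(uint32_t cmd_op)
{
    return _IOC_TYPE(cmd_op) == 'u' &&
           _IOC_DIR(cmd_op) == (_IOC_READ | _IOC_WRITE) &&
           _IOC_SIZE(cmd_op) == sizeof(struct demo_io_cmd);
}
```

The encoded command passes the check and `_IOC_NR(DEMO_U_FETCH_REQ)` recovers the original opcode. The bare opcode fails because it has no type, direction or size bits. A driver that keeps supporting legacy commands would need a separate path for the bare values.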
  10. 16 Apr, 2023 5 commits
    • blk-mq: fix the blk_mq_add_to_requeue_list call in blk_kick_flush · 26a42b61
      Christoph Hellwig authored
      Commit b12e5c6c accidentally changed blk_kick_flush to do a head
      insert into the requeue list. Fix this up.
      
      Fixes: b12e5c6c ("blk-mq: pass a flags argument to blk_mq_add_to_requeue_list")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230416073553.966161-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: Fix division by zero error on zero wsum · e53413f8
      Colin Ian King authored
      When the weighted sum is zero the calculation of limit causes
      a division by zero error. Fix this by continuing to the next level.
      
      This was discovered by running as root:
      
      stress-ng --ioprio 0
      
      This fixes the following division error oops:
      
      [  521.450556] divide error: 0000 [#1] SMP NOPTI
      [  521.450766] CPU: 2 PID: 2684464 Comm: stress-ng-iopri Not tainted 6.2.1-1280.native #1
      [  521.451117] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
      [  521.451627] RIP: 0010:bfqq_request_over_limit+0x207/0x400
      [  521.451875] Code: 01 48 8d 0c c8 74 0b 48 8b 82 98 00 00 00 48 8d 0c c8 8b 85 34 ff ff ff 48 89 ca 41 0f af 41 50 48 d1 ea 48 98 48 01 d0 31 d2 <48> f7 f1 41 39 41 48 89 85 34 ff ff ff 0f 8c 7b 01 00 00 49 8b 44
      [  521.452699] RSP: 0018:ffffb1af84eb3948 EFLAGS: 00010046
      [  521.452938] RAX: 000000000000003c RBX: 0000000000000000 RCX: 0000000000000000
      [  521.453262] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb1af84eb3978
      [  521.453584] RBP: ffffb1af84eb3a30 R08: 0000000000000001 R09: ffff8f88ab8a4ba0
      [  521.453905] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f88ab8a4b18
      [  521.454224] R13: ffff8f8699093000 R14: 0000000000000001 R15: ffffb1af84eb3970
      [  521.454549] FS:  00005640b6b0b580(0000) GS:ffff8f88b3880000(0000) knlGS:0000000000000000
      [  521.454912] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  521.455170] CR2: 00007ffcbcae4e38 CR3: 00000002e46de001 CR4: 0000000000770ee0
      [  521.455491] PKRU: 55555554
      [  521.455619] Call Trace:
      [  521.455736]  <TASK>
      [  521.455837]  ? bfq_request_merge+0x3a/0xc0
      [  521.456027]  ? elv_merge+0x115/0x140
      [  521.456191]  bfq_limit_depth+0xc8/0x240
      [  521.456366]  __blk_mq_alloc_requests+0x21a/0x2c0
      [  521.456577]  blk_mq_submit_bio+0x23c/0x6c0
      [  521.456766]  __submit_bio+0xb8/0x140
      [  521.457236]  submit_bio_noacct_nocheck+0x212/0x300
      [  521.457748]  submit_bio_noacct+0x1a6/0x580
      [  521.458220]  submit_bio+0x43/0x80
      [  521.458660]  ext4_io_submit+0x23/0x80
      [  521.459116]  ext4_do_writepages+0x40a/0xd00
      [  521.459596]  ext4_writepages+0x65/0x100
      [  521.460050]  do_writepages+0xb7/0x1c0
      [  521.460492]  __filemap_fdatawrite_range+0xa6/0x100
      [  521.460979]  file_write_and_wait_range+0xbf/0x140
      [  521.461452]  ext4_sync_file+0x105/0x340
      [  521.461882]  __x64_sys_fsync+0x67/0x100
      [  521.462305]  ? syscall_exit_to_user_mode+0x2c/0x1c0
      [  521.462768]  do_syscall_64+0x3b/0xc0
      [  521.463165]  entry_SYSCALL_64_after_hwframe+0x5a/0xc4
      [  521.463621] RIP: 0033:0x5640b6c56590
      [  521.464006] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d 71 70 0e 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Link: https://lore.kernel.org/r/20230413133009.1605335-1-colin.i.king@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
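The shape of the bfq fix, skipping a level whose weighted sum is zero instead of dividing by it, can be illustrated outside the kernel. This is a sketch under assumptions. The struct, the function name and the rounded division are illustrative and are not the bfq code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-level data: this entity's weight and the weighted sum
 * of all entities at the same level. */
struct level_sketch {
    unsigned long weight;
    unsigned long wsum;
};

/* Scale a limit by weight / wsum at each level, rounding to nearest.
 * A level with wsum == 0 is skipped, since there is nothing to divide by. */
static unsigned long scale_limit(unsigned long limit,
                                 const struct level_sketch *levels, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (levels[i].wsum == 0)
            continue;
        limit = (limit * levels[i].weight + levels[i].wsum / 2) /
                levels[i].wsum;
    }
    return limit;
}
```

Starting from a limit of 64 with levels (1, 2), (5, 0) and (1, 4), the first level halves the limit to 32, the second is skipped, and the third yields 8. Without the guard, the second level would raise the divide error shown in the trace.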
    • fault-inject: fix build error when FAULT_INJECTION_CONFIGFS=y and CONFIGFS_FS=m · d325c162
      Akinobu Mita authored
      This fixes a build error when CONFIG_FAULT_INJECTION_CONFIGFS=y and
      CONFIG_CONFIGFS_FS=m.
      
      Since the fault-injection library cannot be built as a module, avoid
      building configfs as a module.
      
      Fixes: 4668c7a2 ("fault-inject: allow configuration via configfs")
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/oe-kbuild-all/202304150025.K0hczLR4-lkp@intel.com/
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: store bdev->bd_disk->fops->submit_bio state in bdev · 9f4107b0
      Jens Axboe authored
      We have a long chain of memory dereferencing just to check whether or
      not this disk has a special submit_bio helper. As that's not necessarily
      the common case, add a bd_has_submit_bio state in the bdev to avoid
      traversing this memory dependency chain if we don't need to.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
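The optimization described in this commit, caching the answer to a multi-pointer lookup in the structure that is already hot, can be sketched with stand-in types. Every type and function name below is hypothetical. Only the idea of a cached "has submit_bio" flag comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bio_sketch;

struct fops_sketch {
    void (*submit_bio)(struct bio_sketch *bio); /* NULL if not provided */
};

struct disk_sketch {
    const struct fops_sketch *fops;
};

struct bdev_sketch {
    struct disk_sketch *disk;
    bool has_submit_bio; /* cached result of disk->fops->submit_bio != NULL */
};

/* Stand-in handler so the example has a non-NULL submit_bio to point at. */
static void demo_submit_bio(struct bio_sketch *bio)
{
    (void)bio;
}

/* Done once, at setup time: walk the pointer chain and remember the answer. */
static void bdev_cache_submit_bio(struct bdev_sketch *bdev)
{
    bdev->has_submit_bio = bdev->disk->fops->submit_bio != NULL;
}

/* Done on the hot path: a single load from the bdev itself. */
static bool bdev_needs_submit_bio(const struct bdev_sketch *bdev)
{
    return bdev->has_submit_bio;
}
```

The trade-off is one extra field that must be set wherever the ops are assigned, in exchange for avoiding two dependent pointer loads on each submission.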
    • block: re-arrange the struct block_device fields for better layout · 3838c406
      Jens Axboe authored
      This moves struct device out-of-line as it's just used at open/close
      time, so we can keep some of the commonly used fields closer together.
      On a standard setup, it also reduces the size from 864 bytes to 848
      bytes. Yes, struct device is a pig...
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 14 Apr, 2023 9 commits