1. 19 Oct, 2021 9 commits
    • Jens Axboe's avatar
      block: change plugging to use a singly linked list · bc490f81
      Jens Axboe authored
      Use a singly linked list for the blk_plug. This saves 8 bytes in the
      blk_plug struct, and makes for faster list manipulations than doubly
      linked lists. As we don't use the doubly linked lists for anything,
      singly linked is just fine.
      
      This yields a bump in default (merging enabled) performance from 7.0
      to 7.1M IOPS, and ~7.5M IOPS with merging disabled.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bc490f81
    • Andrea Righi's avatar
      blk-wbt: prevent NULL pointer dereference in wb_timer_fn · 480d42dc
      Andrea Righi authored
      The timer callback used to evaluate if the latency is exceeded can be
      executed after the corresponding disk has been released, causing the
      following NULL pointer dereference:
      
      [ 119.987108] BUG: kernel NULL pointer dereference, address: 0000000000000098
      [ 119.987617] #PF: supervisor read access in kernel mode
      [ 119.987971] #PF: error_code(0x0000) - not-present page
      [ 119.988325] PGD 7c4a4067 P4D 7c4a4067 PUD 7bf63067 PMD 0
      [ 119.988697] Oops: 0000 [#1] SMP NOPTI
      [ 119.988959] CPU: 1 PID: 9353 Comm: cloud-init Not tainted 5.15-rc5+arighi #rc5+arighi
      [ 119.989520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      [ 119.990055] RIP: 0010:wb_timer_fn+0x44/0x3c0
      [ 119.990376] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
      [ 119.991578] RSP: 0000:ffffb5f580957da8 EFLAGS: 00010246
      [ 119.991937] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
      [ 119.992412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88f476d7f780
      [ 119.992895] RBP: ffffb5f580957dd0 R08: 0000000000000000 R09: 0000000000000000
      [ 119.993371] R10: 0000000000000004 R11: 0000000000000002 R12: ffff88f476c84500
      [ 119.993847] R13: ffff88f4434390c0 R14: 0000000000000000 R15: ffff88f4bdc98c00
      [ 119.994323] FS: 00007fb90bcd9c00(0000) GS:ffff88f4bdc80000(0000) knlGS:0000000000000000
      [ 119.994952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 119.995380] CR2: 0000000000000098 CR3: 000000007c0d6000 CR4: 00000000000006e0
      [ 119.995906] Call Trace:
      [ 119.996130] ? blk_stat_free_callback_rcu+0x30/0x30
      [ 119.996505] blk_stat_timer_fn+0x138/0x140
      [ 119.996830] call_timer_fn+0x2b/0x100
      [ 119.997136] __run_timers.part.0+0x1d1/0x240
      [ 119.997470] ? kvm_clock_get_cycles+0x11/0x20
      [ 119.997826] ? ktime_get+0x3e/0xa0
      [ 119.998110] ? native_apic_msr_write+0x2c/0x30
      [ 119.998456] ? lapic_next_event+0x20/0x30
      [ 119.998779] ? clockevents_program_event+0x94/0xf0
      [ 119.999150] run_timer_softirq+0x2a/0x50
      [ 119.999465] __do_softirq+0xcb/0x26f
      [ 119.999764] irq_exit_rcu+0x8c/0xb0
      [ 120.000057] sysvec_apic_timer_interrupt+0x43/0x90
      [ 120.000429] ? asm_sysvec_apic_timer_interrupt+0xa/0x20
      [ 120.000836] asm_sysvec_apic_timer_interrupt+0x12/0x20
      
      In this case simply return from the timer callback (no action
      required) to prevent the NULL pointer dereference.
      
      BugLink: https://bugs.launchpad.net/bugs/1947557
      Link: https://lore.kernel.org/linux-mm/YWRNVTk9N8K0RMst@arighi-desktop/
      Fixes: 34dbad5d ("blk-stat: convert to callback-based statistics reporting")
      Signed-off-by: default avatarAndrea Righi <andrea.righi@canonical.com>
      Link: https://lore.kernel.org/r/YW6N2qXpBU3oc50q@arighi-desktopSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      480d42dc
    • Jens Axboe's avatar
      block: align blkdev_dio inlined bio to a cacheline · 6155631a
      Jens Axboe authored
      We get all sorts of unreliable and funky results since the bio is
      designed to align on a cacheline, which it does not when inlined like
      this.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      6155631a
    • Jens Axboe's avatar
      block: move blk_mq_tag_to_rq() inline · e028f167
      Jens Axboe authored
      This is in the fast path of driver issue or completion, and it's a single
      array index operation. Move it inline to avoid a function call for it.
      
      This does mean making struct blk_mq_tags block layer public, but there's
      not really much in there.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e028f167
    • Jens Axboe's avatar
      block: get rid of plug list sorting · df87eb0f
      Jens Axboe authored
      Even if we have multiple queues in the plug list, chances that they
      are very interspersed is minimal. Don't bother spending CPU cycles
      sorting the list.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      df87eb0f
    • Jens Axboe's avatar
      block: return whether or not to unplug through boolean · 87c037d1
      Jens Axboe authored
      Instead of returning the same queue request through a request pointer,
      use a boolean to accomplish the same.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      87c037d1
    • Christoph Hellwig's avatar
      block: don't call blk_status_to_errno in blk_update_request · 8a7d267b
      Christoph Hellwig authored
      We only need to call it to resolve the blk_status_t -> errno mapping for
      tracing, so move the conversion into the tracepoints that are not called
      at all when tracing isn't enabled.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8a7d267b
    • Jens Axboe's avatar
      block: move bdev_read_only() into the header · db9a02ba
      Jens Axboe authored
      This is called for every write in the fast path, move it inline next
      to get_disk_ro() which is called internally.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarChaitanya Kulkarni <kch@nvidia.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      db9a02ba
    • Jens Axboe's avatar
      block: fix too broad elevator check in blk_mq_free_request() · e0d78afe
      Jens Axboe authored
      We added RQF_ELV to tell whether there's an IO scheduler attached, and
      RQF_ELVPRIV tells us whether there's an IO scheduler with private data
      attached. Don't check RQF_ELV in blk_mq_free_request(), what we care
      about here is just if we have scheduler private data attached.
      
      This fixes a boot crash
      
      Fixes: 2ff0682d ("block: store elevator state in request")
      Reported-by: default avatarYi Zhang <yi.zhang@redhat.com>
      Reported-by: syzbot+eb8104072aeab6cc1195@syzkaller.appspotmail.com
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e0d78afe
  2. 18 Oct, 2021 31 commits