    blk-mq: use quiesced elevator switch when reinitializing queues · 8237c01f
    Keith Busch authored
    The hctx's run_work may race with the elevator switch when
    reinitializing hardware queues. The queue is merely frozen in this
    context, which only prevents new requests from being allocated and
    doesn't stop the hctx work from running. The work may get an elevator
    pointer that's being torn down, which can result in use-after-free
    errors and kernel panics (example below). Use the quiesced elevator
    switch instead, and make the previous one static since it is now only
    used locally.
    
      nvme nvme0: resetting controller
      nvme nvme0: 32/0/0 default/read/poll queues
      BUG: kernel NULL pointer dereference, address: 0000000000000008
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 80000020c8861067 P4D 80000020c8861067 PUD 250f8c8067 PMD 0
      Oops: 0000 [#1] SMP PTI
      Workqueue: kblockd blk_mq_run_work_fn
      RIP: 0010:kyber_has_work+0x29/0x70
    
    ...
    
      Call Trace:
       __blk_mq_do_dispatch_sched+0x83/0x2b0
       __blk_mq_sched_dispatch_requests+0x12e/0x170
       blk_mq_sched_dispatch_requests+0x30/0x60
       __blk_mq_run_hw_queue+0x2b/0x50
       process_one_work+0x1ef/0x380
       worker_thread+0x2d/0x3e0
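
    The difference between a frozen and a quiesced queue described above
    can be illustrated with a toy userspace model. This is only a sketch
    under stated assumptions: every name in it (toy_queue, toy_run_work,
    and so on) is hypothetical and not a kernel API, the "race" is modeled
    as an inline call rather than a real concurrent worker, and a torn-down
    elevator is represented by a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; all names are hypothetical, not kernel APIs. */
struct toy_elevator {
    int pending;                    /* requests held by the scheduler */
};

struct toy_queue {
    bool frozen;                    /* blocks new request allocation only */
    bool quiesced;                  /* blocks dispatch (the hctx run_work) */
    struct toy_elevator *elevator;  /* NULL models "being torn down" */
};

enum toy_result { TOY_SKIPPED, TOY_DISPATCHED, TOY_WOULD_CRASH };

/* Allocation is the only thing a freeze gates in this model. */
static bool toy_alloc_request(const struct toy_queue *q)
{
    return !q->frozen;
}

/* Dispatch work: gated by quiesce, not by freeze. */
static enum toy_result toy_run_work(const struct toy_queue *q)
{
    if (q->quiesced)
        return TOY_SKIPPED;
    if (q->elevator == NULL)
        return TOY_WOULD_CRASH;     /* real code would dereference here */
    return TOY_DISPATCHED;
}

/* Switch under a freeze only: the work still sees the torn-down elevator. */
static enum toy_result toy_switch_frozen_only(struct toy_queue *q)
{
    q->frozen = true;
    q->elevator = NULL;                     /* teardown in progress */
    enum toy_result r = toy_run_work(q);    /* racing work, modeled inline */
    q->frozen = false;
    return r;
}

/* Switch under freeze plus quiesce: the work is skipped during teardown. */
static enum toy_result toy_switch_quiesced(struct toy_queue *q,
                                           struct toy_elevator *next)
{
    q->frozen = true;
    q->quiesced = true;
    q->elevator = NULL;                     /* teardown in progress */
    enum toy_result r = toy_run_work(q);    /* racing work, modeled inline */
    q->elevator = next;                     /* new elevator installed */
    q->quiesced = false;
    q->frozen = false;
    return r;
}
```

    In this model, toy_switch_frozen_only() reports TOY_WOULD_CRASH because
    nothing stops the dispatch path while the elevator is gone, whereas
    toy_switch_quiesced() reports TOY_SKIPPED and leaves the queue with a
    valid elevator afterwards.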

    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20220927155652.3260724-1-kbusch@fb.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk-mq.c 119 KB