    blk-mq: drain I/O when all CPUs in a hctx are offline · bf0beec0
    Ming Lei authored
    Most blk-mq drivers depend on managed IRQ auto-affinity to set up
    their queue mapping. Thomas mentioned the following point[1]:
    
    "That was the constraint of managed interrupts from the very beginning:
    
     The driver/subsystem has to quiesce the interrupt line and the associated
     queue _before_ it gets shutdown in CPU unplug and not fiddle with it
     until it's restarted by the core when the CPU is plugged in again."
    
    However, the current blk-mq implementation doesn't quiesce the hw queue
    before the last CPU in the hctx is shut down.  Even worse,
    CPUHP_BLK_MQ_DEAD is a cpuhp state handled after the CPU is down, so
    there isn't any chance to quiesce the hctx before shutting down the CPU.
    
    Add a new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq
    hctxs where the last CPU goes away, and wait for completion of in-flight
    requests.  This guarantees that there is no in-flight I/O before shutting
    down the managed IRQ.
    
    Add a BLK_MQ_F_STACKING flag and set it for dm-rq and loop, so we don't
    need to wait for completion of in-flight requests from these drivers,
    which avoids a potential deadlock. It is safe to do this for stacking
    drivers as those do not use interrupts at all and their I/O completions
    are triggered by the underlying devices' I/O completions.
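
    The behaviour described above can be sketched as a small userspace
    model. This is only an illustration under stated assumptions, not the
    code in this patch: struct hctx_model and the model_* helpers are
    hypothetical names, CPU masks are plain 64-bit bitmaps (so cpu < 64),
    and "waiting" for in-flight requests is simulated by completing them
    in a loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's struct blk_mq_hw_ctx. */
struct hctx_model {
    uint64_t cpumask;   /* CPUs mapped to this hardware queue */
    bool stacking;      /* models a queue flagged BLK_MQ_F_STACKING */
    bool inactive;      /* set once allocation from this queue must stop */
    int inflight;       /* requests issued but not yet completed */
};

/* True if 'cpu' is the only online CPU left in the queue's mask. */
static bool last_online_cpu_in_hctx(const struct hctx_model *h,
                                    unsigned int cpu, uint64_t online_mask)
{
    return (h->cpumask & online_mask) == (UINT64_C(1) << cpu);
}

/* Allocation is refused once the queue has been marked inactive. */
static bool model_alloc_request(struct hctx_model *h)
{
    if (h->inactive)
        return false;
    h->inflight++;
    return true;
}

/* Stand-in for the device completing one request. */
static void model_complete_one(struct hctx_model *h)
{
    if (h->inflight > 0)
        h->inflight--;
}

/*
 * Models the offline step. It is called while 'cpu' is still set in
 * online_mask, i.e. before the CPU is actually taken down.
 * Returns the number of requests drained.
 */
static int model_cpu_offline(struct hctx_model *h, unsigned int cpu,
                             uint64_t online_mask)
{
    int drained = 0;

    if (h->stacking)
        return 0;   /* stacking queues are skipped: no IRQ to protect */
    if (!last_online_cpu_in_hctx(h, cpu, online_mask))
        return 0;   /* another online CPU can still serve this queue */

    h->inactive = true;             /* stop new allocations first */
    while (h->inflight > 0) {       /* a real implementation would sleep */
        model_complete_one(h);
        drained++;
    }
    return drained;
}
```

    In this model a queue mapped to CPUs 1 and 2 is left alone when CPU 1
    goes offline, and is marked inactive and drained only when CPU 2, the
    last online CPU in its mask, goes offline. A queue with the stacking
    flag set is never drained by this path.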
    
    [1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
    
    [hch: different retry mechanism, merged two patches, minor cleanups]
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Daniel Wagner <dwagner@suse.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk-mq-debugfs.c 24.5 KB