• Yufen Yu's avatar
    block: only update parent bi_status when bio fail · 3edf5346
    Yufen Yu authored
    For multiple split bios, if one of the bio is fail, the whole
    should return error to application. But we found there is a race
    between bio_integrity_verify_fn and bio complete, which return
    io success to application after one of the bio fail. The race as
    following:
    
    split bio(READ)          kworker
    
    nvme_complete_rq
    blk_update_request //split error=0
      bio_endio
        bio_integrity_endio
          queue_work(kintegrityd_wq, &bip->bip_work);
    
                             bio_integrity_verify_fn
                             bio_endio //split bio
                              __bio_chain_endio
                                 if (!parent->bi_status)
    
                                   <interrupt entry>
                                   nvme_irq
                                     blk_update_request //parent error=7
                                     req_bio_endio
                                        bio->bi_status = 7 //parent bio
                                   <interrupt exit>
    
                                   parent->bi_status = 0
                            parent->bi_end_io() // return bi_status=0
    
    The bio has been split as two: split and parent. When split
    bio completed, it depends on kworker to do endio, while
    bio_integrity_verify_fn have been interrupted by parent bio
    complete irq handler. Then, parent bio->bi_status which have
    been set in irq handler will overwrite by kworker.
    
    In fact, even without the above race, we also need to conside
    the concurrency beteen mulitple split bio complete and update
    the same parent bi_status. Normally, multiple split bios will
    be issued to the same hctx and complete from the same irq
    vector. But if we have updated queue map between multiple split
    bios, these bios may complete on different hw queue and different
    irq vector. Then the concurrency update parent bi_status may
    cause the final status error.
    Suggested-by: default avatarKeith Busch <kbusch@kernel.org>
    Signed-off-by: default avatarYufen Yu <yuyufen@huawei.com>
    Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
    3edf5346
bio.c 44.7 KB