    NVMe: Fix reset/remove race · 9bf2b972
    Keith Busch authored
    This fixes a scenario where the device is present and being reset when
    a request to unbind the driver occurs.
    
    A previous patch series addressing a device failure removal scenario
    flushed reset_work after controller disable to unblock reset_work waiting
    on a completion that wouldn't occur. This isn't safe as-is. The broken
    scenario can potentially be induced with:
    
      modprobe nvme && modprobe -r nvme
    
    To fix, the reset work is flushed immediately after setting the controller
    removing flag, and any subsequent reset will not proceed with controller
    initialization if the flag is set.
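
    The ordering described above can be sketched as a small userspace model.
    This is only an illustration under stated assumptions, not the code in
    pci.c: every name below (model_ctrl, model_reset_work, model_remove, and
    so on) is hypothetical, and the "flush" is modeled as running a queued
    reset synchronously rather than waiting on a real work queue.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names come from pci.c. */
struct model_ctrl {
    bool removing;      /* set as the first step of the remove path */
    bool reset_pending; /* a reset has been queued but has not run yet */
    bool initialized;   /* controller initialization has completed */
};

/* Stand-in for the reset work item. */
static void model_reset_work(struct model_ctrl *c)
{
    c->reset_pending = false;
    c->initialized = false;     /* a reset tears the controller down first */
    if (c->removing)
        return;                 /* removal in progress: skip initialization */
    c->initialized = true;
}

/* Stand-in for scheduling a reset. */
static void model_queue_reset(struct model_ctrl *c)
{
    c->reset_pending = true;
}

/* Stand-in for flushing: run the queued reset to completion, if any. */
static void model_flush_reset(struct model_ctrl *c)
{
    if (c->reset_pending)
        model_reset_work(c);
}

/* Remove path in the order the commit message describes. */
static void model_remove(struct model_ctrl *c)
{
    c->removing = true;         /* 1. set the removing flag */
    model_flush_reset(c);       /* 2. flush the reset work immediately */
    c->initialized = false;     /* 3. disable the controller */
}
```

    In this model, a reset that is queued before or after model_remove()
    never leaves the controller initialized, because the flag is already
    set by the time the reset work reaches its initialization step.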
    
    The controller status must be polled while active, so the watchdog timer
    is also left active until the controller is disabled to clean up requests
    that may be stuck during namespace removal.
    
    [Fixes: ff23a2a1]
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>
pci.c 55.2 KB