• Keith Busch's avatar
    nvme-pci: Remove watchdog timer · b2a0eb1a
    Keith Busch authored
    The controller status polling was added to preemptively reset a failed
    controller. This early detection would allow commands that would normally
    timeout a chance for a retry, or find broken links when the platform
    didn't support hotplug.
    
    This once-per-second MMIO read, however, created more problems than
    it solves. This often races with PCIe Hotplug events that required
    complicated syncing between work queues, frequently triggered PCIe
    Completion Timeout errors that also lead to fatal machine checks, and
    unnecessarily disrupts low power modes by running on idle controllers.
    
    This patch removes the watchdog timer, and instead checks controller
    health only on an IO timeout when we have a reason to believe something
    is wrong. If the controller is failed, the driver will disable immediately
    and request scheduling a reset.
    Suggested-by: default avatarAndy Lutomirski <luto@amacapital.net>
    Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
    Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
    b2a0eb1a
pci.c 63.8 KB