    ionic: stretch heartbeat detection · ec8ee714
    Shannon Nelson authored
    The driver can be premature in detecting stalled firmware
    when the heartbeat is not updated because the firmware can
    occasionally take a long time (more than 2 seconds) to service
    a request, and doesn't update the heartbeat during that time.
    
    The firmware heartbeat is not necessarily a steady 1 second
    periodic beat, but better described as something that should
    progress at least once in every DEVCMD_TIMEOUT period.
    The single-threaded design in the FW means that if a devcmd
    or adminq request launches a large internal job, it is stuck
    waiting for that job to finish before it can get back to
    updating the heartbeat.  Since all requests are "guaranteed"
    to finish within the DEVCMD_TIMEOUT period, the driver needs
    to be less aggressive in checking the heartbeat progress.
    
    We change our current 2 second window to something bigger than
    DEVCMD_TIMEOUT which should take care of most of the issue.
    We stop checking for the heartbeat while waiting for a request,
    as long as we're still watching for the FW status.  Lastly,
    we make sure our FW status is up to date before running a
    devcmd request.
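
    As a rough illustration of the widened window, the sketch below
    treats the firmware as alive as long as the heartbeat value has
    changed within a window slightly larger than the devcmd timeout.
    It is not the driver's code: EXAMPLE_DEVCMD_TIMEOUT_SEC, the
    one-second margin, struct hb_state and hb_check() are all
    hypothetical names and values chosen for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration only; the driver has its own DEVCMD_TIMEOUT. */
#define EXAMPLE_DEVCMD_TIMEOUT_SEC 10
/* Check window a little larger than the devcmd timeout (margin is an assumption). */
#define EXAMPLE_HB_WINDOW_SEC (EXAMPLE_DEVCMD_TIMEOUT_SEC + 1)

/* Hypothetical bookkeeping for the last observed heartbeat. */
struct hb_state {
    uint32_t last_hb;      /* last heartbeat value read from the device */
    uint64_t last_change;  /* time, in seconds, when that value last changed */
};

/*
 * Returns true if the firmware still looks alive: either the heartbeat
 * moved since the last check, or it has been stuck for no longer than
 * the window.
 */
static bool hb_check(struct hb_state *s, uint32_t hb_now, uint64_t now_sec)
{
    if (hb_now != s->last_hb) {
        s->last_hb = hb_now;
        s->last_change = now_sec;
        return true;
    }
    return (now_sec - s->last_change) <= EXAMPLE_HB_WINDOW_SEC;
}
```

    With these assumed values, a heartbeat that stalls for 3 seconds
    (longer than the old 2 second window) is still reported as alive,
    and only a stall longer than 11 seconds is reported as a failure.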
    
    Once we do this, we need to not check the heartbeat on DEV
    commands because it may be stalled while we're on the fw_down
    path.  Instead, we can rely on the is_fw_running check.
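
    The wait logic described above might look roughly like the
    following sketch: the FW status is checked before the request
    starts and again on every poll, and the heartbeat is not consulted
    at all while waiting. struct fake_dev, dev_is_fw_running(),
    dev_cmd_done() and devcmd_wait() are hypothetical stand-ins for
    this example, not the driver's functions.

```c
#include <stdbool.h>

enum wait_result { WAIT_DONE, WAIT_FW_DOWN, WAIT_TIMEOUT };

/* Hypothetical device model used only for this sketch. */
struct fake_dev {
    int polls_until_done;  /* polls remaining before the request completes */
    bool fw_running;       /* stand-in for the FW status check */
};

static bool dev_is_fw_running(const struct fake_dev *d)
{
    return d->fw_running;
}

/* Each call consumes one poll; reports completion once none remain. */
static bool dev_cmd_done(struct fake_dev *d)
{
    if (d->polls_until_done > 0)
        d->polls_until_done--;
    return d->polls_until_done == 0;
}

/*
 * Wait for a request to complete, relying on the FW status check
 * rather than the heartbeat, which may be stalled while the firmware
 * is busy servicing the request.
 */
static enum wait_result devcmd_wait(struct fake_dev *d, int max_polls)
{
    /* Make sure the FW status is current before issuing the request. */
    if (!dev_is_fw_running(d))
        return WAIT_FW_DOWN;

    for (int i = 0; i < max_polls; i++) {
        if (dev_cmd_done(d))
            return WAIT_DONE;
        if (!dev_is_fw_running(d))
            return WAIT_FW_DOWN;
    }
    return WAIT_TIMEOUT;
}
```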
    
    Fixes: b2b9a8d7 ("ionic: avoid races in ionic_heartbeat_check")
    Signed-off-by: Brett Creeley <brett@pensando.io>
    Signed-off-by: Shannon Nelson <snelson@pensando.io>
    Signed-off-by: David S. Miller <davem@davemloft.net>
ionic_main.c 16.9 KB