    [SCSI] return success after retries in scsi_eh_tur · e47373ec
    Alan Stern authored
    The problem lies in the way the error handler uses TEST UNIT READY to
    tell whether error recovery has succeeded.  The scsi_eh_tur function
    gives up after one round of retrying; after that it decides that more
    error recovery is needed.
    
    However, TUR is liable to report sense data indicating a retry is needed
    when in fact error recovery has succeeded.  A typical example might be
    SK=2, ASC=4, ASCQ=1 (Logical unit in process of becoming ready).  The mere
    fact that we were able to get a sensible reply to the TUR should indicate
    that the device is working well enough to stop error recovery.
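
The behaviour described above can be sketched in isolation. This is a minimal user-space model, not the kernel code: the names `eh_result`, `sense`, `classify_sense`, `send_tur_fn` and `eh_tur_sketch` are all hypothetical, and the sense classification is deliberately reduced to the single 2/4/1 case mentioned in the text. The point it illustrates is only that running out of retries on a "retry needed" reply is treated as success rather than as a reason for more error recovery.

```c
/* Hypothetical stand-ins for the error handler's dispositions. */
enum eh_result { EH_SUCCESS, EH_NEEDS_RETRY, EH_FAILED };

/* Hypothetical, simplified sense data: sense key, ASC, ASCQ. */
struct sense { unsigned char sk, asc, ascq; };

/* Simplified classification (assumption): only the 2/4/1 case is modelled. */
static enum eh_result classify_sense(struct sense s)
{
    if (s.sk == 0)
        return EH_SUCCESS;                 /* no sense: unit is ready */
    if (s.sk == 2 && s.asc == 4 && s.ascq == 1)
        return EH_NEEDS_RETRY;             /* becoming ready */
    return EH_FAILED;
}

/* Hypothetical callback: sends one TEST UNIT READY, returns its disposition. */
typedef enum eh_result (*send_tur_fn)(void *dev);

/* Returns 0 if error recovery can stop, 1 if more recovery is needed. */
static int eh_tur_sketch(send_tur_fn send_tur, void *dev)
{
    int retries_left = 1;                  /* one round of retrying */

    for (;;) {
        enum eh_result r = send_tur(dev);

        if (r == EH_SUCCESS)
            return 0;
        if (r == EH_NEEDS_RETRY) {
            if (retries_left-- > 0)
                continue;
            /* Retries exhausted, but the device gave a sensible reply. */
            return 0;
        }
        return 1;                          /* no usable reply */
    }
}

/* Simulated drive that is still spinning up; dev counts the TURs sent. */
static enum eh_result spinning_up_drive(void *dev)
{
    struct sense s = { 2, 4, 1 };
    int *calls = dev;

    ++*calls;
    return classify_sense(s);
}

/* Simulated drive that does not give a usable reply at all. */
static enum eh_result dead_drive(void *dev)
{
    int *calls = dev;

    ++*calls;
    return EH_FAILED;
}
```

With this model, a drive that keeps answering 2/4/1 is sent two TURs (the initial one plus one retry) and is then considered recovered, while a drive that gives no usable reply still asks for more recovery after the first attempt.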
    
    I ran across a case back in January where this happened.  A CD-ROM drive
    timed out the INQUIRY command, and a device reset fixed the blockage.
    But then the drive kept responding with 2/4/1 -- because it was spinning
    up I suppose -- until the error handler gave up and placed it offline.
    If the initial INQUIRY had received the 2/4/1 instead, everything would
    have worked okay.  It doesn't seem reasonable for things to fail just
    because the error handler had started running.

    Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
    Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_error.c 53 KB