1. 13 Oct, 2008 8 commits
    • [SCSI] scsi: add transport host byte errors (v3) · a4dfaa6f
      Mike Christie authored
      Currently, if there is a transport problem the iscsi drivers will return
      outstanding commands (commands being executed by the driver/fw/hw) with
      DID_BUS_BUSY and block the session so no new commands can be queued.
      Commands that are caught between the failure handling and blocking are
      failed with DID_IMM_RETRY or one of the scsi ml queuecommand return values.
      When the recovery_timeout fires, the iscsi drivers then fail IO with
      DID_NO_CONNECT.
      
      For fcp, some drivers will fail some outstanding IO (disk but possibly not
      tape) with DID_BUS_BUSY or DID_ERROR or some other value that causes a retry
      and hits the scsi_error.c failfast check, block the rport, and commands
      caught in the race are failed with DID_IMM_RETRY. Other drivers may
      hold onto all IO and wait for the terminate_rport_io or dev_loss_tmo_callbk
      to be called.
      
      The following patches attempt to unify what upper layers will see so that
      drivers like multipath can make a good guess. This relies on drivers being
      hooked into their transport class.
      
      This first patch just defines two new host byte errors so drivers can
      return the same value for when a rport/session is blocked and for
      when the fast_io_fail_tmo fires.
      
      The idea is that if the LLD/class detects a problem and is going to block
      a rport/session, then if the LLD wants or must return the command to scsi-ml,
      then it can return it with DID_TRANSPORT_DISRUPTED. This will requeue
      the IO into the same scsi queue it came from, until the fast io fail timer
      fires and the class decides what to do.
      
      When using multipath and the fast_io_fail_tmo fires, the class
      can fail commands with DID_TRANSPORT_FAILFAST, or drivers can use
      DID_TRANSPORT_FAILFAST in their terminate_rport_io callbacks or
      the equivalent in iscsi if we ever implement more advanced recovery methods.
      An LLD, like lpfc, could continue to return DID_ERROR and then it will hit
      the normal failfast path, so drivers do not have to be fully ported to
      work better. The point of the patches is that upper layers will
      not see a failure that could be recovered from while the rport/session is
      blocked until fast_io_fail_tmo/recovery_timeout fires.
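
As a rough illustration of the split described above, the following user-space sketch models how a result carrying one of the two new host bytes might be classified. The numeric values are believed to match include/scsi/scsi.h; the `disposition` enum and `classify_result()` are hypothetical names invented for this example and are not kernel APIs.

```c
#include <assert.h>

/* Host byte codes; values believed to match include/scsi/scsi.h. */
#define DID_OK                  0x00
#define DID_TRANSPORT_DISRUPTED 0x0e
#define DID_TRANSPORT_FAILFAST  0x0f

/* The host byte occupies bits 16..23 of the SCSI result word. */
#define host_byte(result) (((result) >> 16) & 0xff)

/* Hypothetical dispositions, used only by this sketch. */
enum disposition {
    DISP_COMPLETE,      /* command finished normally */
    DISP_REQUEUE,       /* put back on the queue it came from */
    DISP_FAIL_TO_UPPER  /* report the failure to upper layers */
};

static enum disposition classify_result(int result)
{
    assert(result >= 0);

    switch (host_byte(result)) {
    case DID_OK:
        return DISP_COMPLETE;
    case DID_TRANSPORT_DISRUPTED:
        /* rport/session is being blocked: requeue and wait for the class */
        return DISP_REQUEUE;
    case DID_TRANSPORT_FAILFAST:
        /* fast io fail timer fired: let multipath see the failure */
        return DISP_FAIL_TO_UPPER;
    default:
        /* simplification: every other host byte is treated as a failure */
        return DISP_FAIL_TO_UPPER;
    }
}
```

The default branch is a deliberate simplification; the real retry logic for other host bytes is more involved and is not modelled here.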
      
      V3
      Remove some comments.
      V2
      Fixed patch/diff errors and renamed DID_TRANSPORT_BLOCKED to
      DID_TRANSPORT_DISRUPTED.
      V1
      initial patch.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] ibmvfc, qla2xxx, lpfc: remove scsi_target_unblock calls in terminate callbacks · 9cc328f5
      Mike Christie authored
      The fc class now calls scsi_target_unblock after calling the
      terminate callback, so this patch removes the calls from the
      drivers.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] fc class: unblock target after calling terminate callback (take 2) · fff9d40c
      Mike Christie authored
      When we block a rport and the driver implements the terminate
      callback, we will quickly fail IO that was running. However,
      IO that was in the scsi_device/block queue sits there until
      the dev_loss_tmo fires, and this can make it look like IO is
      lost because new IO will get executed but that IO stuck in
      the blocked queue sits there for some time longer.
      
      With this patch when the fast io fail tmo fires, we will
      fail the blocked IO and any new IO. This patch also allows
      all drivers to partially support the fast io fail tmo. If the
      terminate io callback is not implemented, we will still fail blocked
      IO and any new IO, so multipath can handle that.
      
      This patch also allows the fc and iscsi classes to implement the
      same behavior. The timers are just unfortunately named differently.
      
      This patch also fixes the problem where drivers were unblocking
      the target in their terminate callback, which was needed for
      rport removal, but for fast io fail timeout it would cause
      IO to bounce around the scsi/block layer and the LLD queuecommand.
      And for drivers that could have IO stuck but did not have
      a terminate callback the unblock calls in the class will fix
      them.
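
The behavior described in this message can be sketched as a small user-space state model. Everything here (`rport_model`, `decide_io()`, `fast_io_fail_fired()`) is hypothetical and only illustrates the intent: while the rport is blocked, IO waits in the queue; once the fast io fail timer has fired, both blocked and new IO are failed so multipath can react. The DID_TRANSPORT_FAILFAST value is believed to match include/scsi/scsi.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DID_TRANSPORT_FAILFAST 0x0f  /* believed to match include/scsi/scsi.h */

/* Hypothetical, simplified view of a remote port. */
enum rport_state { RPORT_ONLINE, RPORT_BLOCKED };

struct rport_model {
    enum rport_state state;
    bool fast_fail_timedout;  /* set when the fast io fail timer fires */
};

enum io_action { IO_DISPATCH, IO_HOLD_IN_QUEUE, IO_FAIL };

struct io_decision {
    enum io_action action;
    int result;  /* SCSI result word, meaningful only for IO_FAIL */
};

/* Decide what happens to a queued or newly submitted command. */
static struct io_decision decide_io(const struct rport_model *rport)
{
    struct io_decision d = { IO_DISPATCH, 0 };

    assert(rport != NULL);

    if (rport->state == RPORT_BLOCKED) {
        if (rport->fast_fail_timedout) {
            /* timer fired: fail blocked IO and any new IO */
            d.action = IO_FAIL;
            d.result = DID_TRANSPORT_FAILFAST << 16;
        } else {
            /* still inside the grace period: leave IO queued */
            d.action = IO_HOLD_IN_QUEUE;
        }
    }
    return d;
}

/* Model of the timer callback: mark the port so queued IO gets failed. */
static void fast_io_fail_fired(struct rport_model *rport)
{
    assert(rport != NULL);
    rport->fast_fail_timedout = true;
}
```

In this model the decision does not depend on whether the driver has a terminate callback, which mirrors the point that all drivers get at least partial fast io fail support.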
      
      v2.
      - fix up bit setting style to meet JamesS's pref.
      - Broke out new host byte error changes to make it easier to read.
      - added JamesS's ack from list.
      v1
      - initial patch
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: James Smart <James.Smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] lpfc: use SCSI_MLQUEUE_TARGET_BUSY when catching the rport transition race · a93ce024
      Mike Christie authored
      We do not want to call right back into the queuecommand during the race,
      so we can just use SCSI_MLQUEUE_TARGET_BUSY.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: James Smart <James.Smart@Emulex.Com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] libiscsi: Use SCSI_MLQUEUE_TARGET_BUSY · d6d13ee1
      Mike Christie authored
      For the conditions below we do not want the queuecommand
      function to be called right back, so return SCSI_MLQUEUE_TARGET_BUSY.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] qla2xxx: return SCSI_MLQUEUE_TARGET_BUSY when driver has detected rport error or race · 7b594131
      Mike Christie authored
      If the fcport is not online then we do not want to block IO to all ports on
      the host. We just want to stop IO on the port that is not online, so we should be using
      the SCSI_MLQUEUE_TARGET_BUSY return value.
      
      For the case where we race with the rport memset initialization
      we do not want the queuecommand to be called again so we can just use
      SCSI_MLQUEUE_TARGET_BUSY for this.
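
To make the difference in scope concrete, here is a small user-space model. The types and functions (`host_model`, `apply_busy()`, `can_dispatch()`) are hypothetical and exist only for this sketch; they stand in for the effect of returning a host-level versus a target-level busy status from queuecommand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_TARGETS 3

/* Hypothetical stand-ins for the scope of a busy return value. */
enum busy_scope { BUSY_NONE, BUSY_TARGET, BUSY_HOST };

struct host_model {
    bool host_blocked;                 /* stalls every target on the host */
    bool target_blocked[NUM_TARGETS];  /* stalls one target only */
};

/* Record the effect of a busy return for a command sent to 'target'. */
static void apply_busy(struct host_model *h, size_t target,
                       enum busy_scope scope)
{
    assert(h != NULL && target < NUM_TARGETS);

    if (scope == BUSY_HOST)
        h->host_blocked = true;
    else if (scope == BUSY_TARGET)
        h->target_blocked[target] = true;
}

/* A command can be dispatched only if neither its host nor target is stalled. */
static bool can_dispatch(const struct host_model *h, size_t target)
{
    assert(h != NULL && target < NUM_TARGETS);
    return !h->host_blocked && !h->target_blocked[target];
}
```

With a target-level return, IO to the other ports on the same host keeps flowing; with a host-level return, every port stalls, which is the behavior this patch avoids.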
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] qla4xxx: return SCSI_MLQUEUE_TARGET_BUSY when driver has detected session error · c5e98e91
      Mike Christie authored
      When qla4xxx begins recovery and the iscsi class is firing up to handle
      it, we need to return SCSI_MLQUEUE_TARGET_BUSY from the driver instead
      of host busy, because the session recovery only affects the one target.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: David C Somayajulu <david.somayajulu@qlogic.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] Add helper code so transport classes/driver can control queueing (v3) · f0c0a376
      Mike Christie authored
      SCSI-ml manages the queueing limits for the device and host, but
      does not do so at the target level. However, something similar
      can come in useful when a driver is transitioning a transport object to
      the blocked state, because at that time we do not want to queue
      io and we do not want the queuecommand to be called again.
      
      The patch adds code similar to the existing SCSI_MLQUEUE_*_BUSY handlers.
      You can now return SCSI_MLQUEUE_TARGET_BUSY when we hit
      a transport level queueing issue like the hw cannot allocate some
      resource at the iscsi session/connection level, or the target has temporarily
      closed or shrunk the queueing window, or if we are transitioning
      to the blocked state.
      
      bnx2i, when they rework their firmware according to netdev
      developers' requests, will also need to be able to limit queueing at this
      level. bnx2i will hook into libiscsi, but will allocate a scsi host per
      netdevice/hba, so unlike pure software iscsi/iser which is allocating
      a host per session, it cannot set the scsi_host->can_queue and return
      SCSI_MLQUEUE_HOST_BUSY to reflect queueing limits on the transport.
      
      The iscsi class/driver can also set a scsi_target->can_queue value which
      reflects the max commands the driver/class can support. For iscsi this
      reflects the number of commands we can support for each session due to
      session/connection hw limits, driver limits, and to also reflect the
      session/target's queueing window.
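
A per-target limit of this kind can be sketched in user space as a simple in-flight counter. The names below (`target_model`, `target_try_queue()`, `target_complete()`, `QUEUE_TARGET_BUSY`) are hypothetical, and treating a `can_queue` of 0 as "no limit" is an assumption of this sketch rather than something stated in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch; not the kernel's values. */
enum queue_status { QUEUE_OK, QUEUE_TARGET_BUSY };

struct target_model {
    unsigned int can_queue;  /* max commands in flight; 0 = no limit (assumed) */
    unsigned int busy;       /* commands currently in flight */
};

/* Try to account for one more command on this target. */
static enum queue_status target_try_queue(struct target_model *t)
{
    assert(t != NULL);

    if (t->can_queue > 0 && t->busy >= t->can_queue)
        return QUEUE_TARGET_BUSY;  /* window is full: do not call the driver */

    t->busy++;
    return QUEUE_OK;
}

/* Account for a completed command, reopening one slot in the window. */
static void target_complete(struct target_model *t)
{
    assert(t != NULL);
    if (t->busy > 0)
        t->busy--;
}
```

Because the counter lives on the target rather than the host, a driver that shares one host across many sessions can still throttle each session independently.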
      
      Changes:
      v1 - initial patch.
      v2 - Fix scsi_run_queue handling of multiple blocked targets.
      Previously we would break from the main loop if a device was added back on
      the starved list. We now run over the list and check if any target is
      blocked.
      v3 - Rediff for scsi-misc.
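
The v2 change to the starved-list handling can be illustrated with a simplified loop. This is a user-space sketch with hypothetical types (`starved_dev`, `run_starved()`), not the scsi_run_queue code itself: a device whose target is blocked is skipped, and the walk continues so devices on other targets still get serviced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry on a starved list. */
struct starved_dev {
    int id;
    bool target_blocked;  /* the device's target cannot accept IO now */
    bool ran;             /* set when the device's queue was run */
};

/* Walk the whole list; returns how many device queues were run. */
static size_t run_starved(struct starved_dev *devs, size_t n)
{
    size_t ran = 0;

    assert(devs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (devs[i].target_blocked) {
            /* skip instead of breaking out: later devices may be runnable */
            continue;
        }
        devs[i].ran = true;
        ran++;
    }
    return ran;
}
```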
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
  2. 12 Oct, 2008 32 commits