1. 13 Oct, 2008 13 commits
    • [SCSI] modify scsi to handle new fail fast flags. · 4a27446f
      Mike Christie authored
      This checks the errors that scsi-ml determined were retryable
      and returns whether we should fast fail them based on the request's
      fail fast flags.
      
      Without the patch, drivers like lpfc, qla2xxx and fcoe would return
      DID_ERROR for what they determine is a temporary communication problem.
      There is no loss of connectivity at that time and the driver thinks
      that it would be fast to retry at the driver level. SCSI-ml, however,
      sees fast fail on the request and DID_ERROR and will fast fail the IO.
      This will then cause dm-multipath to fail the path and possibly switch
      target controllers when we should be retrying at the scsi layer.
      
      We were also fast failing device errors to dm-multipath when,
      unless the scsi_dh modules think otherwise, we want to retry at
      the scsi layer, because multipath can only retry the IO like scsi
      should have done. Multipath is a little dumber, though, because it
      does not know what the error was for and assumes that it should fail
      the paths.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
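The fast-fail decision described in this commit can be sketched as follows. This is a simplified user-space model, not the kernel code: the names `ff_flag`, `err_class` and `should_fast_fail` are hypothetical, and the classification of each error is assumed to have been done already. A retryable error is failed upward only when the request carries the failfast flag for that class of error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-class failfast request flags. */
enum ff_flag {
    FF_DEV       = 1u << 0,  /* fast fail device errors */
    FF_TRANSPORT = 1u << 1,  /* fast fail transport errors */
    FF_DRIVER    = 1u << 2   /* fast fail driver errors */
};

/* Hypothetical classification of an error already judged retryable. */
enum err_class { ERR_DEVICE, ERR_TRANSPORT, ERR_DRIVER };

/*
 * Return true if the error should be failed upward right away,
 * false if it should be retried at the SCSI layer.
 */
bool should_fast_fail(enum err_class err, unsigned int flags)
{
    switch (err) {
    case ERR_DEVICE:
        return (flags & FF_DEV) != 0;
    case ERR_TRANSPORT:
        return (flags & FF_TRANSPORT) != 0;
    case ERR_DRIVER:
        return (flags & FF_DRIVER) != 0;
    }
    return false;
}
```

Assuming DID_ERROR is treated as a driver-class error, a request that only asks for transport fast fail (as a multipath layer would) gets `false` here, so the IO is retried at the SCSI layer instead of failing the path.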
    • [SCSI] block: separate failfast into multiple bits. · 6000a368
      Mike Christie authored
      Multipath is best at handling transport errors. If it gets a device
      error then there is not much the multipath layer can do. It will just
      access the same device but from a different path.
      
      This patch breaks up failfast into device, transport and driver errors.
      The multipath layers (md and dm multipath) only ask the lower levels to
      fast fail transport errors. The other user of failfast, read ahead, will
      ask to fast fail on all errors.
      
      Note that blk_noretry_request will return true if any failfast bit
      is set. This allows drivers that do not support the multipath failfast
      bits to continue to fail on any failfast error like before. Drivers
      like scsi that are able to fail fast specific errors can check
      for the specific fail fast type. In the next patch I will convert
      scsi.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
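A minimal sketch of the bit split described in this commit, assuming a simplified request structure. All `SK_`/`sk_` names are hypothetical and do not reproduce the kernel's definitions. It shows an "any failfast bit" check, which is how the message describes blk_noretry_request, next to per-class checks that a driver such as scsi could use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit names: one bit per class of error. */
#define SK_FAILFAST_DEV        (1u << 0)
#define SK_FAILFAST_TRANSPORT  (1u << 1)
#define SK_FAILFAST_DRIVER     (1u << 2)
#define SK_FAILFAST_MASK \
    (SK_FAILFAST_DEV | SK_FAILFAST_TRANSPORT | SK_FAILFAST_DRIVER)

/* Simplified request: only the flags matter for this sketch. */
struct sk_request {
    unsigned int flags;
};

/* True if any failfast bit is set (the behavior older drivers rely on). */
bool sk_noretry_request(const struct sk_request *rq)
{
    return (rq->flags & SK_FAILFAST_MASK) != 0;
}

/* Per-class checks for drivers that can distinguish error types. */
bool sk_failfast_transport(const struct sk_request *rq)
{
    return (rq->flags & SK_FAILFAST_TRANSPORT) != 0;
}

bool sk_failfast_dev(const struct sk_request *rq)
{
    return (rq->flags & SK_FAILFAST_DEV) != 0;
}

/* Multipath layers only ask for transport errors to be fast failed. */
struct sk_request sk_multipath_request(void)
{
    struct sk_request rq = { SK_FAILFAST_TRANSPORT };
    return rq;
}

/* Read ahead asks for every class of error to be fast failed. */
struct sk_request sk_readahead_request(void)
{
    struct sk_request rq = { SK_FAILFAST_MASK };
    return rq;
}
```

A driver that only calls the any-bit check keeps failing on every failfast request as before, while a driver that uses the per-class checks would retry a device error on a multipath request.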
    • [SCSI] qla2xxx: use new host byte transport errors. · 056a4483
      Mike Christie authored
      This has qla2xxx use the new transport error values instead of
      DID_BUS_BUSY. I am not sure if all the errors I changed in qla_isr.c
      are transport related. We end up blocking/deleting the rport for all
      of them, so it is better to use the new transport error since the fc
      class will decide when to fail the IO.
      
      With this patch, if I pull a cable then IO that had reached
      the driver will be failed with DID_TRANSPORT_DISRUPTED (not including
      tape). The fc class will then fail the IO when the fast io fail tmo
      has fired, and the driver will flush any other commands running.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] fc class: Add support for new transport errors · f46e307d
      Mike Christie authored
      If the target is blocked and fast io fail tmo has not fired
      then we requeue with DID_TRANSPORT_DISRUPTED. Once that
      tmo fires we fail with DID_TRANSPORT_FAILFAST.
      
      v2
      - separate from
      "fc class: unblock target after calling terminate callback"
      to make it easier to review.
      - Add JamesS's ack from list.
      v1
      - initial patch
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: James Smart <James.Smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
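The rule in this commit (requeue while the target is blocked, fail once the fast io fail tmo has fired) can be sketched as a small state check. This is an illustrative model only: `rport_model`, `hb_model` and `rport_result` are hypothetical names, and the enum values are placeholders rather than the kernel's numeric host byte codes.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder results standing in for the DID_* host byte values. */
enum hb_model {
    HB_OK,                   /* port usable, command may proceed */
    HB_TRANSPORT_DISRUPTED,  /* blocked, timer still running: requeue */
    HB_TRANSPORT_FAILFAST    /* fast io fail tmo fired: fail the IO */
};

/* Hypothetical view of the remote port state the class tracks. */
struct rport_model {
    bool blocked;             /* target is currently blocked */
    bool fast_io_fail_fired;  /* fast io fail tmo has expired */
};

/* Decide how a command for this port should be completed. */
enum hb_model rport_result(const struct rport_model *rp)
{
    if (rp->fast_io_fail_fired)
        return HB_TRANSPORT_FAILFAST;
    if (rp->blocked)
        return HB_TRANSPORT_DISRUPTED;
    return HB_OK;
}
```

The timer check comes first in this sketch so that, once the tmo has fired, commands are failed whether or not the target is still marked blocked.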
    • [SCSI] iscsi class, libiscsi and qla4xxx: convert to new transport host byte values · 56d7fcfa
      Mike Christie authored
      This patch converts the iscsi drivers to the new host byte values.
      
      v2
      Drop some conversions. Want to avoid conflicts with other patches.
      v1
      initial patch.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] scsi: add transport host byte errors (v3) · a4dfaa6f
      Mike Christie authored
      Currently, if there is a transport problem the iscsi drivers will return
      outstanding commands (commands being executed by the driver/fw/hw) with
      DID_BUS_BUSY and block the session so no new commands can be queued.
      Commands that are caught between the failure handling and blocking are
      failed with DID_IMM_RETRY or one of the scsi ml queuecommand return values.
      When the recovery_timeout fires, the iscsi drivers then fail IO with
      DID_NO_CONNECT.
      
      For fcp, some drivers will fail some outstanding IO (disk but possibly not
      tape) with DID_BUS_BUSY or DID_ERROR or some other value that causes a retry
      and hits the scsi_error.c failfast check, block the rport, and commands
      caught in the race are failed with DID_IMM_RETRY. Other drivers may
      hold onto all IO and wait for the terminate_rport_io or dev_loss_tmo_callbk
      to be called.
      
      The following patches attempt to unify what upper layers will see, so that
      drivers like multipath can make a good guess. This relies on drivers being
      hooked into their transport class.
      
      This first patch just defines two new host byte errors so drivers can
      return the same value for when a rport/session is blocked and for
      when the fast_io_fail_tmo fires.
      
      The idea is that if the LLD/class detects a problem and is going to block
      a rport/session, then if the LLD wants or must return the command to scsi-ml,
      then it can return it with DID_TRANSPORT_DISRUPTED. This will requeue
      the IO into the same scsi queue it came from, until the fast io fail timer
      fires and the class decides what to do.
      
      When using multipath and the fast_io_fail_tmo fires, the class
      can fail commands with DID_TRANSPORT_FAILFAST, or drivers can use
      DID_TRANSPORT_FAILFAST in their terminate_rport_io callbacks or
      the equivalent in iscsi if we ever implement more advanced recovery methods.
      An LLD, like lpfc, could continue to return DID_ERROR and then it will hit
      the normal failfast path, so drivers do not have to be fully ported to
      work better. The point of the patches is that upper layers will
      not see a failure that could be recovered from while the rport/session is
      blocked until fast_io_fail_tmo/recovery_timeout fires.
      
      V3
      Remove some comments.
      V2
      Fixed patch/diff errors and renamed DID_TRANSPORT_BLOCKED to
      DID_TRANSPORT_DISRUPTED.
      V1
      initial patch.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
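How the mid layer might treat the two new host byte values, compared with a driver that keeps returning DID_ERROR, can be sketched as below. This is an assumption-laden model of the behavior the commit message describes, not the scsi-ml implementation: `cmd_result`, `disposition` and `classify_result` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder completion codes standing in for DID_* host bytes. */
enum cmd_result {
    RES_TRANSPORT_DISRUPTED,  /* rport/session blocked, timer running */
    RES_TRANSPORT_FAILFAST,   /* fast_io_fail_tmo has fired */
    RES_ERROR                 /* unported driver still returning DID_ERROR */
};

/* What happens to the command next. */
enum disposition {
    DISP_REQUEUE,         /* put back on the same queue it came from */
    DISP_FAIL_UPWARD,     /* hand the failure to the upper layer */
    DISP_FAILFAST_CHECK   /* fall into the normal failfast-flag path */
};

enum disposition classify_result(enum cmd_result res)
{
    switch (res) {
    case RES_TRANSPORT_DISRUPTED:
        /* Upper layers never see this; the IO waits for the timer. */
        return DISP_REQUEUE;
    case RES_TRANSPORT_FAILFAST:
        /* The class has given up, so multipath can try another path. */
        return DISP_FAIL_UPWARD;
    case RES_ERROR:
        /* Unported drivers behave as before. */
        return DISP_FAILFAST_CHECK;
    }
    return DISP_FAILFAST_CHECK;
}
```

Under this model a recoverable disruption stays invisible to the upper layers until the timer decides the outcome, which is the stated point of the patch series.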
    • [SCSI] ibmvfc, qla2xxx, lpfc: remove scsi_target_unblock calls in terminate callbacks · 9cc328f5
      Mike Christie authored
      The fc class now calls scsi_target_unblock after calling the
      terminate callback, so this patch removes the calls from the
      drivers.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] fc class: unblock target after calling terminate callback (take 2) · fff9d40c
      Mike Christie authored
      When we block a rport and the driver implements the terminate
      callback, we will quickly fail IO that was running. However,
      IO that was in the scsi_device/block queue sits there until
      the dev_loss_tmo fires, and this can make it look like IO is
      lost, because new IO will get executed while the IO stuck in
      the blocked queue sits there for some time longer.
      
      With this patch when the fast io fail tmo fires, we will
      fail the blocked IO and any new IO. This patch also allows
      all drivers to partially support the fast io fail tmo. If the
      terminate io callback is not implemented, we will still fail blocked
      IO and any new IO, so multipath can handle that.
      
      This patch also allows the fc and iscsi classes to implement the
      same behavior. The timers are just unfortunately named differently.
      
      This patch also fixes the problem where drivers were unblocking
      the target in their terminate callback, which was needed for
      rport removal, but for fast io fail timeout it would cause
      IO to bounce around the scsi/block layer and the LLD queuecommand.
      And for drivers that could have IO stuck but did not have
      a terminate callback, the unblock calls in the class will fix
      them.
      
      v2.
      - fix up bit setting style to meet JamesS's pref.
      - Broke out new host byte error changes to make it easier to read.
      - added JamesS's ack from list.
      v1
      - initial patch
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: James Smart <James.Smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] lpfc: use SCSI_MLQUEUE_TARGET_BUSY when catching the rport transition race · a93ce024
      Mike Christie authored
      We do not want to be called right back into the queuecommand during
      the race, so we can just use SCSI_MLQUEUE_TARGET_BUSY.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: James Smart <James.Smart@Emulex.Com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] libiscsi: Use SCSI_MLQUEUE_TARGET_BUSY · d6d13ee1
      Mike Christie authored
      For the conditions below we do not want the queuecommand
      function to call us right back, so return SCSI_MLQUEUE_TARGET_BUSY.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] qla2xxx: return SCSI_MLQUEUE_TARGET_BUSY when driver has detected rport error or race · 7b594131
      Mike Christie authored
      If the fcport is not online then we do not want to block IO to all ports on
      the host. We just want to stop IO on the port that is not online, so we
      should be using the SCSI_MLQUEUE_TARGET_BUSY return value.
      
      For the case where we race with the rport memset initialization
      we do not want the queuecommand to be called again so we can just use
      SCSI_MLQUEUE_TARGET_BUSY for this.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] qla4xxx: return SCSI_MLQUEUE_TARGET_BUSY when driver has detected session error · c5e98e91
      Mike Christie authored
      When qla4xxx begins recovery and the iscsi class is firing up to handle
      it, we need to return SCSI_MLQUEUE_TARGET_BUSY from the driver instead
      of host busy, because the session recovery only affects the one target.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: David C Somayajulu <david.somayajulu@qlogic.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] Add helper code so transport classes/driver can control queueing (v3) · f0c0a376
      Mike Christie authored
      SCSI-ml manages the queueing limits for the device and host, but
      does not do so at the target level. However, something similar
      can be useful when a driver is transitioning a transport object to
      the blocked state, because at that time we do not want to queue
      io and we do not want the queuecommand to be called again.
      
      The patch adds code similar to the existing SCSI_MLQUEUE_*_BUSY handlers.
      You can now return SCSI_MLQUEUE_TARGET_BUSY when we hit
      a transport level queueing issue, such as when the hw cannot allocate some
      resource at the iscsi session/connection level, the target has temporarily
      closed or shrunk the queueing window, or we are transitioning
      to the blocked state.
      
      bnx2i, when they rework their firmware according to netdev
      developers' requests, will also need to be able to limit queueing at this
      level. bnx2i will hook into libiscsi, but will allocate a scsi host per
      netdevice/hba, so unlike pure software iscsi/iser, which allocates
      a host per session, it cannot set the scsi_host->can_queue and return
      SCSI_MLQUEUE_HOST_BUSY to reflect queueing limits on the transport.
      
      The iscsi class/driver can also set a scsi_target->can_queue value which
      reflects the max commands the driver/class can support. For iscsi this
      reflects the number of commands we can support for each session due to
      session/connection hw limits, driver limits, and to also reflect the
      session/target's queueing window.
      
      Changes:
      v1 - initial patch.
      v2 - Fix scsi_run_queue handling of multiple blocked targets.
      Previously we would break from the main loop if a device was added back on
      the starved list. We now run over the list and check if any target is
      blocked.
      v3 - Rediff for scsi-misc.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
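Target-level queueing as described in this commit can be sketched with a per-target limit and a busy counter. This is a user-space model under stated assumptions: `target_model`, `queue_result`, `target_queue` and `target_complete` are hypothetical names, and treating a `can_queue` of zero as "no limit" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder return values standing in for the SCSI_MLQUEUE_* codes. */
enum queue_result {
    QUEUE_OK,
    QUEUE_TARGET_BUSY
};

/* Hypothetical per-target queueing state. */
struct target_model {
    int  can_queue;  /* max commands in flight; 0 means no limit here */
    int  busy;       /* commands currently in flight */
    bool blocked;    /* transport object is moving to the blocked state */
};

/* Try to queue one command to the target. */
enum queue_result target_queue(struct target_model *t)
{
    if (t->blocked)
        return QUEUE_TARGET_BUSY;
    if (t->can_queue > 0 && t->busy >= t->can_queue)
        return QUEUE_TARGET_BUSY;
    t->busy++;
    return QUEUE_OK;
}

/* Account for one completed command, freeing a slot on the target. */
void target_complete(struct target_model *t)
{
    assert(t->busy > 0);
    t->busy--;
}
```

Because the state lives in the target, one target reaching its limit or becoming blocked does not stop commands to other targets on the same host, which is the case a host-wide busy return cannot express.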
  2. 12 Oct, 2008 27 commits