  20 Aug, 2019 (4 commits)
    • scsi: lpfc: Update lpfc version to 12.4.0.0 · 10541f03
      James Smart authored
      Update lpfc version to 12.4.0.0
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair · c00f62e6
      James Smart authored
      Currently, each hardware queue, typically allocated per-cpu,
      consists of a WQ/CQ pair per protocol, meaning that if both SCSI
      and NVMe are supported, two WQ/CQ pairs exist for each hardware
      queue. Separate queues are unnecessary: the current implementation
      wastes memory backing the second set of queues, and using double
      the SLI-4 WQ/CQs means fewer hardware queues can be supported, so
      there may not always be enough to have a pair per cpu. If only one
      pair is needed per cpu, more cpus can get their own WQ/CQ.
      
      Rework the implementation so that a single WQ/CQ pair is shared by
      both protocols.
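      
      As a rough before/after sketch of the shape of the change (the
      struct and field names follow the lpfc style but are simplified
      here for illustration, not the driver's exact layout):
      
          struct lpfc_queue;      /* opaque SLI-4 queue object */
      
          /* Before: one WQ/CQ pair per protocol per hardware queue. */
          struct lpfc_sli4_hdw_queue_old {
                  struct lpfc_queue *fcp_cq;   /* SCSI completion queue */
                  struct lpfc_queue *fcp_wq;   /* SCSI work queue */
                  struct lpfc_queue *nvme_cq;  /* NVMe completion queue */
                  struct lpfc_queue *nvme_wq;  /* NVMe work queue */
          };
      
          /* After: a single pair services both protocols, halving the
           * SLI-4 queue count consumed per hardware queue. */
          struct lpfc_sli4_hdw_queue_new {
                  struct lpfc_queue *io_cq;    /* shared completion queue */
                  struct lpfc_queue *io_wq;    /* shared work queue */
          };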
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Add NVMe sequence level error recovery support · 0d8af096
      James Smart authored
      FC-NVMe-2 added support for sequence level error recovery (SLER)
      in the FC-NVMe protocol. This allows detection of errors and lost
      frames and immediate retransmission of data, avoiding the exchange
      terminations that would otherwise escalate into NVMeoFC connection
      and association failures. It is a significant RAS improvement.
      
      The driver is modified to indicate support for SLER in the NVMe
      PRLI it issues and to check for support in the PRLI response. When
      both sides support it, the driver sets a bit in the WQE to enable
      the recovery behavior on the exchange. The adapter then takes care
      of all detection and retransmission.
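      
      A minimal sketch of that negotiation logic (all bit, flag, and
      function names here are hypothetical stand-ins, not the driver's
      identifiers):
      
          #include <stdbool.h>
      
          #define PRLI_SLER       (1u << 0)  /* hypothetical PRLI bit */
          #define WQE_SLER_EN     (1u << 0)  /* hypothetical WQE bit */
      
          struct remote_port {
                  bool sler_supported;  /* set when parsing PRLI resp */
          };
      
          /* Outbound PRLI: advertise sequence level error recovery. */
          static void build_nvme_prli(unsigned int *prli_flags)
          {
                  *prli_flags |= PRLI_SLER;
          }
      
          /* I/O submission: only when both sides negotiated SLER does
           * the WQE ask the adapter to detect and retransmit. */
          static void build_io_wqe(unsigned int *wqe_flags,
                                   const struct remote_port *rport)
          {
                  if (rport->sler_supported)
                          *wqe_flags |= WQE_SLER_EN;
          }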
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware. · d79c9e9d
      James Smart authored
      Typical SLI-4 hardware supports registering up to two 4KB pages
      per XRI to contain the exchange's Scatter/Gather List, which caps
      the number of SGL elements the list can hold. There is no
      extension mechanism to grow the list beyond the two pages.
      
      The G7 hardware adds an SGE type that allows the SGL to be
      vectored to a different scatter/gather list segment, and that
      segment can itself contain an SGE pointing to yet another segment,
      and so on. The initial segment must still be pre-registered for
      the XRI, but it can be much smaller (256 bytes) since the list can
      now be grown dynamically. This much smaller allocation handles the
      SG list for most normal I/O, and the chaining allows the list to
      cover many MBs if needed.
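      
      Conceptually, the new SGE type turns the fixed SGL into a linked
      chain of segments, along these lines (an illustrative layout only;
      the real SLI-4 SGE packs its type into word flags):
      
          #include <stdint.h>
      
          enum sge_type {
                  SGE_TYPE_DATA = 0,  /* entry maps an I/O data buffer */
                  SGE_TYPE_LSP,       /* "list segment pointer": address
                                       * points at the next SGL segment
                                       * instead of at data */
          };
      
          struct sge {
                  uint64_t addr;    /* DMA address of data or segment */
                  uint32_t length;  /* data (or segment) length, bytes */
                  uint32_t type;    /* enum sge_type */
          };
      
          /* A 256-byte initial segment holds a handful of SGEs; making
           * its last entry an LSP SGE chains in another segment, which
           * can chain onward, so the effective list is unbounded. */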
      
      The implementation creates a pool of "segments", initially sized
      to hold one small segment per XRI. If an I/O requires additional
      segments, they are allocated from the pool; if the pool runs out,
      it is grown to match the new demand. After the I/O completes, the
      additional segments are returned to the pool for use by other
      I/Os. Once allocated, segments are not released, on the assumption
      that "if needed once, it will be needed again". Pools are kept on
      a per-hardware-queue basis; a hardware queue is typically 1:1 with
      a cpu but may be shared by multiple cpus.
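      
      That pool behavior amounts to a grow-only free list per hardware
      queue, roughly as below (names are illustrative, locking is
      omitted, and the grow path is reduced to a single allocation):
      
          #include <stdlib.h>
      
          struct sgl_segment {
                  struct sgl_segment *next;  /* free-list link; 256B of
                                              * SGE storage would follow */
          };
      
          struct hdwq_sgl_pool {
                  struct sgl_segment *free_list;
          };
      
          /* Grow path: taken only when an I/O needs a segment and the
           * pool is empty; the pool keeps the segment afterwards. */
          static struct sgl_segment *pool_grow(struct hdwq_sgl_pool *p)
          {
                  (void)p;
                  return calloc(1, sizeof(struct sgl_segment));
          }
      
          static struct sgl_segment *pool_get(struct hdwq_sgl_pool *p)
          {
                  struct sgl_segment *seg = p->free_list;
      
                  if (seg)
                          p->free_list = seg->next;
                  else
                          seg = pool_grow(p);
                  return seg;
          }
      
          /* Completion path: segments go back on the free list rather
           * than being freed ("if needed once, needed again"). */
          static void pool_put(struct hdwq_sgl_pool *p,
                               struct sgl_segment *seg)
          {
                  seg->next = p->free_list;
                  p->free_list = seg;
          }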
      
      The switch to the smaller initial allocation significantly reduces
      the driver's memory footprint, which now grows only if large I/Os
      are issued. With the several thousand XRIs an adapter exposes, the
      8KB->256B reduction per XRI can conserve 32MB or more (for
      example, 4K XRIs saving roughly 7.75KB each is about 31MB).
      
      It has been observed with per-cpu resource pools that a resource
      allocated on CPU A may be put back on the pool of CPU B. While the
      get routines are distributed evenly across CPUs, only a limited
      subset of CPUs may end up handling the put routines. This can
      strain the lpfc_put_cmd_rsp_buf_per_cpu routine, since all the
      resources are being returned on that limited subset of CPUs.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>