    scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware. · d79c9e9d
    James Smart authored
    Typical SLI-4 hardware supports registering up to two 4KB pages per
    XRI to contain the exchange's Scatter/Gather List. This caps the
    number of SGL elements that the list can hold, and there is no
    extension mechanism to grow beyond the two pages.
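    With SLI-4's 16-byte SGE descriptors, two 4KB pages hold at most
    2 * 4096 / 16 = 512 entries, which bounds how fragmented a single
    I/O's buffer list can be.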
    
    The G7 hardware adds an SGE type that allows the SGL to be vectored
    to a different scatter/gather list segment, and that segment can in
    turn contain an SGE pointing to yet another segment, and so on. The
    initial segment must still be pre-registered for the XRI, but it can
    now be much smaller (256 bytes) because the list can be grown
    dynamically. This smaller allocation handles the SG list for most
    normal I/O, and the dynamic chaining allows many megabytes to be
    supported when needed.
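    As a rough illustration only (hypothetical names, not the driver's
    actual definitions), a chained SGL segment might be modeled as:

        /* Illustrative model of a chained SGL. An SLI-4 SGE is a
         * 16-byte descriptor; a "link" SGE at the end of one segment
         * vectors to the next segment. */
        #include <stdint.h>

        struct sge {
                uint64_t addr;     /* DMA address of data, or of next segment */
                uint32_t flags;    /* type bits: DATA, or LINK to a segment */
                uint32_t length;   /* byte count of data, or of next segment */
        };

        /* A 256-byte initial segment holds 16 SGEs; the last one can
         * be a LINK SGE that grows the list dynamically. */
        #define SGES_PER_SEGMENT (256 / sizeof(struct sge))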
    
    The implementation creates a pool of "segments", initially sized to
    hold one small segment per XRI. If an I/O requires additional
    segments, they are allocated from the pool. If the pool has no more
    segments, it is grown to meet the new demand. After the I/O
    completes, the additional segments are returned to the pool for use
    by other I/Os. Once allocated, additional segments are never
    released, on the assumption that "if needed once, it will be needed
    again". Pools are kept on a per-hardware-queue basis, typically one
    queue per CPU, though a queue may be shared by multiple CPUs.
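    A minimal userspace model of such a grow-only pool (hypothetical
    names; the driver's actual routines and locking differ):

        #include <stdlib.h>
        #include <pthread.h>

        struct seg {
                struct seg *next;
                char body[256];            /* one SGL segment */
        };

        struct seg_pool {                  /* one per hardware queue */
                pthread_spinlock_t lock;   /* queue may be shared by CPUs */
                struct seg *free_list;     /* segments not in use by any I/O */
        };

        static struct seg *pool_get(struct seg_pool *p)
        {
                struct seg *s;

                pthread_spin_lock(&p->lock);
                s = p->free_list;
                if (s)
                        p->free_list = s->next;
                pthread_spin_unlock(&p->lock);
                if (!s)                    /* empty: grow the pool on demand */
                        s = calloc(1, sizeof(*s));
                return s;
        }

        /* On I/O completion the segment returns to the pool and stays
         * there: "if needed once, it will be needed again". */
        static void pool_put(struct seg_pool *p, struct seg *s)
        {
                pthread_spin_lock(&p->lock);
                s->next = p->free_list;
                p->free_list = s;
                pthread_spin_unlock(&p->lock);
        }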
    
    The switch to the smaller initial allocation significantly reduces
    the driver's memory footprint, which now grows only when large I/Os
    are issued. Given the several thousand XRIs on an adapter, the
    8KB-to-256B reduction can conserve 32MB or more.
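    For example, at 4096 XRIs the reduction saves
    4096 * (8192 - 256) bytes, about 31MiB, and adapters with more XRIs
    save proportionally more.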
    
    It has been observed with per-CPU resource pools that a resource
    allocated on CPU A may be put back on CPU B. While the get routines
    are distributed evenly across CPUs, only a limited subset of CPUs
    may be handling the put routines. This can put a strain on the
    lpfc_put_cmd_rsp_buf_per_cpu routine because all the resources are
    being put back on that limited subset of CPUs.
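    Building on the pool model above (again, hypothetical names), the
    imbalance arises because gets run on the submitting CPU while puts
    run on whichever CPU services the completion:

        #define _GNU_SOURCE
        #include <sched.h>         /* sched_getcpu() */

        extern struct seg_pool per_cpu_pool[];   /* pools from the model above */

        static struct seg *submit_path(void)
        {
                /* spread evenly: runs on whichever CPU issued the I/O */
                return pool_get(&per_cpu_pool[sched_getcpu()]);
        }

        static void completion_path(struct seg *s)
        {
                /* concentrated: completion interrupts are steered to a
                 * smaller set of CPUs, so their pools absorb the puts */
                pool_put(&per_cpu_pool[sched_getcpu()], s);
        }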
    Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
    Signed-off-by: James Smart <jsmart2021@gmail.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>