• James Smart's avatar
    scsi: lpfc: Separate CQ processing for nvmet_fc upcalls · d74a89aa
    James Smart authored
    Currently the driver is notified of new command frame receipt by CQEs. As
    part of the CQE processing, the driver upcalls the nvmet_fc transport to
    deliver the command. nvmet_fc, as part of receiving the command builds out
    a context for it, where one of the first steps is to allocate memory for
    the io.
    
    When running with tests that do large ios (1MB), it was found on some
    systems, the total number of outstanding I/O's, at 1MB per, completely
    consumed the system's memory. Thus additional ios were getting blocked in
    the memory allocator.  Given that this blocked the lpfc thread processing
    CQEs, there were lots of other commands that were received and which are
    then held up, and given CQEs are serially processed, the aggregate delays
    for an IO waiting behind the others became cummulative - enough so that the
    initiator hit timeouts for the ios.
    
    The basic fix is to avoid the direct upcall and instead schedule a work
    item for each io as it is received. This allows the cq processing to
    complete very quickly, and each io can then run or block on it's own.
    However, this general solution hurts latency when there are few ios.  As
    such, implemented the fix such that the driver watches how many CQEs it has
    processed sequentially in one run. As long as the count is below a
    threshold, the direct nvmet_fc upcall will be made. Only when the count is
    exceeded will it revert to work scheduling.
    
    Given that debug of this showed a surprisingly long delay in cq processing,
    the io timer stats were updated to better reflect the processing of the
    different points.
    Signed-off-by: default avatarDick Kennedy <dick.kennedy@broadcom.com>
    Signed-off-by: default avatarJames Smart <jsmart2021@gmail.com>
    Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
    d74a89aa
lpfc_sli4.h 33.6 KB