    RDMA/hns: Add the detection for CMDQ status in the device initialization process · e8ea058e
    Yangyang Li authored
    CMDQ may fail during HNS ROCEE initialization. The following is the log
    when the execution fails:
    
      hns3 0000:bd:00.2: In reset process RoCE client reinit.
      hns3 0000:bd:00.2: CMDQ move tail from 840 to 839
      hns3 0000:bd:00.2 hns_2: failed to set gid, ret = -11!
      hns3 0000:bd:00.2: CMDQ move tail from 840 to 839
      <...>
      hns3 0000:bd:00.2: CMDQ move tail from 840 to 839
      hns3 0000:bd:00.2: CMDQ move tail from 840 to 0
      hns3 0000:bd:00.2: [cmd]token 14e mailbox 20 timeout.
      hns3 0000:bd:00.2 hns_2: set HEM step 0 failed!
      hns3 0000:bd:00.2 hns_2: set HEM address to HW failed!
      hns3 0000:bd:00.2 hns_2: failed to alloc mtpt, ret = -16.
      infiniband hns_2: Couldn't create ib_mad PD
      infiniband hns_2: Couldn't open port 1
      hns3 0000:bd:00.2: Reset done, RoCE client reinit finished.
    
    However, even if ib_mad client registration failed, ib_register_device()
    still returns success to the driver.
    
    In the device initialization process, CMDQ execution fails because HW/FW
    is abnormal. Therefore, if CMDQ fails, the initialization function should
    set CMDQ to a fatal error state and return a failure to the caller.
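
    The behaviour described above can be sketched in isolation as follows.
    This is only an illustration, not the driver's code: every name in it
    (demo_dev, demo_cmdq_send, DEMO_CMDQ_STATE_FATAL_ERR, the hw_healthy
    flag) is hypothetical, and rejecting later commands once the fatal
    state is latched is an assumption about how such a state would be used.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; not the hns driver's real definitions. */
enum demo_cmdq_state {
	DEMO_CMDQ_STATE_NORMAL,
	DEMO_CMDQ_STATE_FATAL_ERR,
};

struct demo_dev {
	enum demo_cmdq_state cmdq_state;
	bool hw_healthy;	/* stands in for the HW/FW condition */
};

/* Post one command; refuse immediately once the queue is marked fatal. */
static int demo_cmdq_send(struct demo_dev *dev)
{
	if (dev->cmdq_state == DEMO_CMDQ_STATE_FATAL_ERR)
		return -EIO;
	if (!dev->hw_healthy)
		return -EAGAIN;	/* abnormal HW/FW: the command cannot complete */
	return 0;
}

/* Initialization step: a CMDQ failure is latched and returned to the caller. */
static int demo_dev_init(struct demo_dev *dev)
{
	int ret;

	assert(dev != NULL);
	dev->cmdq_state = DEMO_CMDQ_STATE_NORMAL;

	ret = demo_cmdq_send(dev);
	if (ret) {
		dev->cmdq_state = DEMO_CMDQ_STATE_FATAL_ERR;
		return ret;
	}
	return 0;
}
```

    With hw_healthy set to false, demo_dev_init() returns a negative errno
    and leaves the queue in the fatal state, so the caller can abort
    instead of continuing with a half-initialized device.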
    
    Fixes: 9a443537 ("IB/hns: Add driver files for hns RoCE driver")
    Link: https://lore.kernel.org/r/20220429093104.26687-1-liangwenpeng@huawei.com
    Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
    Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
hns_roce_device.h 32.1 KB