    irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4 · 35727af2
    Shanker Donthineni authored
    The T241 platform suffers from the T241-FABRIC-4 erratum, which causes
    unexpected behavior in the GIC when multiple transactions are received
    simultaneously from different sources. This hardware issue impacts
    NVIDIA server platforms that interconnect more than two T241 chips.
    Each chip has support for 320 {E}SPIs.
    
    This issue occurs when multiple packets from different GICs are
    incorrectly interleaved at the target chip. The erratum text below
    specifies exactly which operations generate multi-transfer packets
    that are susceptible to interleaving and GIC state corruption. GIC
    state corruption can lead to a range of problems, including kernel
    panics and unexpected behavior.
    
    From the erratum text:
      "In some cases, inter-socket AXI4 Stream packets with multiple
      transfers, may be interleaved by the fabric when presented to ARM
      Generic Interrupt Controller. GIC expects all transfers of a packet
      to be delivered without any interleaving.
    
      The following GICv3 commands may result in multiple transfer packets
      over inter-socket AXI4 Stream interface:
       - Register reads from GICD_I* and GICD_N*
       - Register writes to 64-bit GICD registers other than GICD_IROUTERn*
       - ITS command MOVALL
    
      Multiple commands in GICv4+ utilize multiple transfer packets,
      including VMOVP, VMOVI, VMAPP, and 64-bit register accesses.
    
      This issue impacts system configurations with more than 2 sockets,
      that require multi-transfer packets to be sent over inter-socket
      AXI4 Stream interface between GIC instances on different sockets.
      GICv4 cannot be supported. GICv3 SW model can only be supported
      with the workaround. Single and Dual socket configurations are not
      impacted by this issue and support GICv3 and GICv4."
    
    Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf
    
    Writing to the chip alias region of the GICD_In{E} registers, except
    for GICD_ICENABLERn, has the same effect as writing to the global
    distributor. The SPI interrupt deactivate path is not impacted by
    the erratum.
    
    To fix this problem, implement a workaround that ensures read accesses
    to the GICD_In{E} registers are directed to the chip that owns the
    SPI, and disable GICv4.x features. To simplify code changes, the
    gic_configure_irq() function uses the same alias region for both read
    and write operations to GICD_ICFGR.
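
The routing described above can be sketched in plain C. This is only an illustration under stated assumptions, not the code in this patch: the commit text gives the 320 {E}SPIs per chip figure, while the exact INTID-to-chip layout (chips 0-2 sharing the SPI range starting at INTID 32, chip 3 using the ESPI range starting at INTID 4096) is assumed here. The names t241_chip_for_intid(), t241_dist_base(), alias_base[] and global_base are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural GICv3 INTID range starts. */
#define SPI_FIRST		32u
#define ESPI_FIRST		4096u

/* From the commit text: each T241 chip supports 320 {E}SPIs. */
#define T241_SPIS_PER_CHIP	320u
#define T241_MAX_CHIPS		4u

/*
 * Assumed layout (not stated in the commit text): chips 0-2 own
 * consecutive 320-wide slices of the SPI range, chip 3 owns the
 * first 320 ESPIs. Returns the owning chip, or -1 if the INTID is
 * not covered by this layout.
 */
static int t241_chip_for_intid(uint32_t intid)
{
	if (intid >= SPI_FIRST && intid < SPI_FIRST + 3 * T241_SPIS_PER_CHIP)
		return (int)((intid - SPI_FIRST) / T241_SPIS_PER_CHIP);
	if (intid >= ESPI_FIRST && intid < ESPI_FIRST + T241_SPIS_PER_CHIP)
		return 3;
	return -1;
}

/*
 * Pick the distributor base for a GICD_In{E} access. alias_base[]
 * stands in for the per-chip alias mappings and global_base for the
 * global distributor; both are illustrative parameters. Reads go to
 * the owning chip's alias. Writes may use the alias too, except for
 * GICD_ICENABLERn, which must target the global distributor.
 */
static uintptr_t t241_dist_base(uint32_t intid, int is_icenabler_write,
				const uintptr_t alias_base[T241_MAX_CHIPS],
				uintptr_t global_base)
{
	int chip;

	assert(alias_base);

	if (is_icenabler_write)
		return global_base;

	chip = t241_chip_for_intid(intid);
	if (chip < 0)
		return global_base;	/* not an {E}SPI in the assumed layout */

	return alias_base[chip];
}
```

With this layout, INTID 400 resolves to chip 1, so a GICD_ICFGR read or write for it would use chip 1's alias, while a GICD_ICENABLERn write for the same interrupt would still go to the global distributor.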
    
    Co-developed-by: Vikram Sethi <vsethi@nvidia.com>
    Signed-off-by: Vikram Sethi <vsethi@nvidia.com>
    Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
    Acked-by: Sudeep Holla <sudeep.holla@arm.com> (for SMCCC/SOC ID bits)
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20230319024314.3540573-2-sdonthineni@nvidia.com