    powerpc/powernv: Fix killed EEH event · 5c7a35e3
    Gavin Shan authored
    On the PowerNV platform, EEH errors are reported either by IO accessors
    or by the interrupt-driven poller. Once a PE has been isolated, no
    further EEH event is produced for it. With the current implementation,
    an EEH event can be lost in the following way:
    
    The interrupt handler queues one "special" event, which drives the poller.
    Before the EEH thread picks up the special event, an IO accessor kicks in:
    the frozen PE is marked as "isolated" and an EEH event is queued to the list.
    The EEH thread then runs because of the special event and purges all
    existing EEH events. However, another EEH event is never produced for the
    frozen PE. As a result, the PE stays marked as "isolated" and there is no
    EEH event left to recover it.
    
    The patch fixes the issue by keeping EEH events for PEs that have been
    marked as "isolated", with the help of an additional "force" argument to
    eeh_remove_event().
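
The race and the effect of the "force" argument can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct pe`, `struct event_queue`, `queue_event()` and `remove_events()` are hypothetical stand-ins, a NULL queue entry models the "special" event, and passing a NULL PE to `remove_events()` is assumed to mean "purge everything".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 8

/* Hypothetical stand-in for a PE; only the "isolated" state is modelled. */
struct pe {
    int id;
    bool isolated;
};

/* Hypothetical event list; a NULL entry models the "special" event. */
struct event_queue {
    struct pe *events[MAX_EVENTS];
    size_t count;
};

static void queue_event(struct event_queue *q, struct pe *pe)
{
    assert(q->count < MAX_EVENTS);
    q->events[q->count++] = pe;
}

/*
 * Remove queued events and return how many were dropped.
 * pe == NULL selects every event (assumed "purge all" convention).
 * Unless force is set, events bound to an isolated PE are kept,
 * because no further event would ever be produced for that PE.
 */
static size_t remove_events(struct event_queue *q, const struct pe *pe,
                            bool force)
{
    size_t kept = 0, removed = 0;

    for (size_t i = 0; i < q->count; i++) {
        struct pe *ev = q->events[i];
        bool selected = (pe == NULL) || (ev == pe);
        bool keep = !force && ev != NULL && ev->isolated;

        if (selected && !keep)
            removed++;
        else
            q->events[kept++] = ev;
    }
    q->count = kept;
    return removed;
}
```

In this model, queuing the special event, marking a PE isolated, queuing its event and then purging with `force == false` drops only the special event; the event for the isolated PE survives so recovery can still run. Purging with `force == true` empties the list unconditionally.
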
    Reported-by: Rolf Brudeseth <rolfb@us.ibm.com>
    Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
eeh_driver.c 23 KB