    cxlflash: Fix to avoid bypassing context cleanup · a82544c7
    Matthew R. Ochs authored
    Contexts may be skipped over for cleanup in situations where contention
    for the adapter's table-list mutex is experienced in the presence of a
    signal during the execution of the release handler.
    
    This can lead to two known issues:
    
     - A hang condition on remove, as that path waits for users to clean
       up. That wait never completes in this scenario because, from the
       user's perspective, cleanup has already happened.
    
     - An Oops in the unmap_mapping_range() call that is made as part of
       the user waiting mechanism that is invoked on remove when contexts
       are found to still exist.
    
    The root cause of this issue can be found in get_context() and how the
    table-list mutex is acquired. As this code path is shared by several
    different access points within the driver, a decision was made during
    the development cycle to acquire this mutex in this location using the
    interruptible version of the mutex locking service. In almost all of
    the use-cases and environmental scenarios this holds up, even when the
    mutex is contended. However, for critical system threads (such as the
    release handler), failing to acquire the mutex and bailing with the
    intention of the user being able to try again later is unacceptable.
    
    In such a scenario, the context _must_ be derived as it is on an
    irreversible path to being freed. Without being able to derive the
    context, the code mistakenly assumes that it has already been freed
    and proceeds to free up the underlying CXL context resources. From
    this point on, any usage of [the now stale] CXL context resources
    will result in undefined behavior. This is the root cause of the Oops
    mentioned as the second known issue as the mapping passed to the
    unmap_mapping_range() service is owned by the CXL context.
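
    The failure mode described above can be sketched in userspace C. This
    is only an illustration under stated assumptions: fake_ctx, fake_table,
    release_sketch() and table_is_stale() are hypothetical names, not the
    driver's real types or functions, and the lookup_failed flag stands in
    for get_context() bailing out after an interrupted lock attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the cxlflash driver's real structures. */
struct fake_ctx { bool cxl_freed; };           /* context backed by CXL resources */
struct fake_table { struct fake_ctx *slot; };  /* one-entry "table-list" */

/*
 * Sketch of a release path. lookup_failed models get_context() returning
 * nothing because the interruptible lock attempt was aborted by a signal.
 */
static void release_sketch(struct fake_table *t, struct fake_ctx *ctx,
                           bool lookup_failed)
{
    struct fake_ctx *found;

    assert(t != NULL && ctx != NULL);
    found = lookup_failed ? NULL : t->slot;

    if (found)
        t->slot = NULL; /* normal path: detach the context from the table */

    /*
     * On a failed lookup the code assumes the context is already gone,
     * yet still releases the underlying resources.
     */
    ctx->cxl_freed = true;
}

/* True when the table still references a context whose resources are gone. */
static bool table_is_stale(const struct fake_table *t)
{
    return t->slot != NULL && t->slot->cxl_freed;
}
```

    With lookup_failed set, the table keeps pointing at a context whose
    resources were released, which is the state the remove path later
    trips over. With it clear, the entry is detached before the release.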
    
    To fix this problem, acquisition of the table-list mutex within
    get_context() is simply changed to use the uninterruptible version
    of the mutex locking service. This is safe as the timing windows for
    holding this mutex are short and also protected against blocking.
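
    As a rough userspace sketch of the difference between the two locking
    services (not the actual patch): fake_mutex, lock_interruptible(),
    lock_uninterruptible(), get_context_before() and get_context_after()
    are hypothetical names, and the interrupted_while_waiting flag is an
    assumed stand-in for "the mutex was contended and a signal arrived
    while waiting".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not kernel or driver types. */
struct fake_mutex { bool held; };
struct fake_ctx { int id; };

/* Simulates a signal arriving while the caller waits on a contended mutex. */
static bool interrupted_while_waiting;

/* Models mutex_lock_interruptible(): may give up with -EINTR. */
static int lock_interruptible(struct fake_mutex *m)
{
    assert(m != NULL);
    if (interrupted_while_waiting)
        return -EINTR;
    m->held = true;
    return 0;
}

/* Models mutex_lock(): keeps waiting, so it always ends up holding the lock. */
static void lock_uninterruptible(struct fake_mutex *m)
{
    assert(m != NULL);
    m->held = true;
}

static void unlock(struct fake_mutex *m)
{
    m->held = false;
}

/* Before the fix: the lookup can fail even though the context exists. */
static struct fake_ctx *get_context_before(struct fake_mutex *m,
                                           struct fake_ctx *entry)
{
    struct fake_ctx *ctx;

    if (lock_interruptible(m))
        return NULL; /* caller cannot tell this apart from "no context" */
    ctx = entry;
    unlock(m);
    return ctx;
}

/* After the fix: the lookup always reaches the table entry. */
static struct fake_ctx *get_context_after(struct fake_mutex *m,
                                          struct fake_ctx *entry)
{
    struct fake_ctx *ctx;

    lock_uninterruptible(m);
    ctx = entry;
    unlock(m);
    return ctx;
}
```

    In this model, a pending signal makes get_context_before() return
    NULL for a context that is still present, while get_context_after()
    returns it regardless.
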
    Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
    Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
    Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
    Signed-off-by: James Bottomley <JBottomley@Odin.com>