    ACPI: APEI: do not add task_work to kernel thread to avoid memory leak · 415fed69
    Shuai Xue authored
    If an error is detected as a result of a user-space process accessing a
    corrupt memory location, the CPU may take an abort. The platform firmware
    then reports the error to the kernel via NMI-like notifications, e.g.
    NOTIFY_SEA, NOTIFY_SOFTWARE_DELEGATED, etc.
    
    For NMI-like notifications, commit 7f17b4a1 ("ACPI: APEI: Kick the
    memory_failure() queue for synchronous errors") keeps track of whether
    memory_failure() work was queued, and makes task_work pending to flush out
    the queue so that the work is processed before returning to user-space.
    
    The code uses init_mm to check whether the error occurs in user space:
    
        if (current->mm != &init_mm)
    
    The condition is always true, because _nobody_ ever has "init_mm" as a real
    VM any more.
    
    In addition to aborts, errors can also be signaled as asynchronous
    exceptions, such as interrupts and SError. In such cases, the interrupted
    current process could be any kind of thread. When a kernel thread is
    interrupted, the work ghes_kick_task_work deferred to task_work will never
    be processed because entry_handler returns by calling ret_to_kernel()
    instead of ret_to_user(). Consequently, the estatus_node allocated from
    ghes_estatus_pool in ghes_in_nmi_queue_one_entry() will not be freed.
    After around 200 allocations on our platform, the ghes_estatus_pool will
    run out of memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. As a
    result, the event fails to be processed.
    
        sdei: event 805 on CPU 113 failed with error: -2
    
    Finally, a lot of unhandled events may cause platform firmware to exceed
    some threshold and reboot.
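The leak described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct pool`, `POOL_NODES`, `handle_event()` and `events_until_failure()` are hypothetical names, the capacity of 200 merely echoes the "around 200 allocations" figure, and the model assumes a node is released only when the deferred task_work runs on return to user space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed capacity, chosen to echo the "around 200 allocations" figure. */
#define POOL_NODES 200

/* Hypothetical stand-in for ghes_estatus_pool: only counts nodes in use. */
struct pool {
    int in_use;
};

/* Take one node, or fail with -ENOMEM once the pool is exhausted. */
static int pool_alloc(struct pool *p)
{
    assert(p != NULL);
    if (p->in_use >= POOL_NODES)
        return -ENOMEM;
    p->in_use++;
    return 0;
}

/* Give one node back to the pool. */
static void pool_free(struct pool *p)
{
    assert(p != NULL && p->in_use > 0);
    p->in_use--;
}

/*
 * Model one NMI-like event: a node is allocated, and it is freed only if
 * the deferred task_work runs, which in this model happens only when the
 * interrupted task returns to user space.
 */
static int handle_event(struct pool *p, bool returns_to_user)
{
    int err = pool_alloc(p);

    if (err)
        return err;
    if (returns_to_user)
        pool_free(p);
    return 0;
}

/* Count how many events succeed before the first failure (at most max_events). */
static int events_until_failure(bool returns_to_user, int max_events)
{
    struct pool p = { .in_use = 0 };
    int handled = 0;

    while (handled < max_events && handle_event(&p, returns_to_user) == 0)
        handled++;
    return handled;
}
```

In this model, events that interrupt a user task never exhaust the pool, while events that interrupt a kernel thread stop succeeding once `POOL_NODES` nodes have leaked.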
    
    The condition should generally just do
    
        if (current->mm)
    
    as described in active_mm.rst documentation.
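The difference between the two conditions can be sketched in user space. The types below are hypothetical stand-ins for the kernel's `struct mm_struct`, `struct task_struct` and `init_mm`, and `old_check()` / `new_check()` are illustrative names rather than functions from ghes.c. The sketch assumes, as the message states, that a kernel thread has a NULL `mm` and that no task uses `init_mm` as its `mm`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the mm pointer matters. */
struct mm_struct {
    int unused;
};

struct task_struct {
    struct mm_struct *mm;   /* NULL for kernel threads */
};

/* Stand-in for the kernel's init_mm. */
static struct mm_struct init_mm;

/* Old condition: true for every task, since no task has mm == &init_mm. */
static bool old_check(const struct task_struct *task)
{
    assert(task != NULL);
    return task->mm != &init_mm;
}

/* New condition: true only for tasks that own a user address space. */
static bool new_check(const struct task_struct *task)
{
    assert(task != NULL);
    return task->mm != NULL;
}
```

For a task with `mm == NULL`, `old_check()` still returns true, so task_work would be queued on a kernel thread, whereas `new_check()` returns false and the queueing is skipped.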
    
    Then, if an asynchronous error is detected while a kernel thread is running
    (e.g. when detected by a background scrubber), do not add task_work to it,
    as the original patch intended.
    
    Fixes: 7f17b4a1 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
    Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
    Reviewed-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>