    mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned · 47af12ba
    Aili Yao authored
    When memory_failure() is called with MF_ACTION_REQUIRED on a page that
    has already been hwpoisoned, memory_failure() could fail to send SIGBUS
    to the affected process, which results in an infinite loop of MCEs.
    
    Currently memory_failure() returns 0 if it is called for an already
    hwpoisoned page, so the caller, kill_me_maybe(), could return without
    sending SIGBUS to the current process.  An action-required MCE is raised
    when the current process accesses the broken memory, so without SIGBUS
    the current process continues to run and soon accesses the error page
    again, running into an MCE loop.
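
    The loop described above can be illustrated with a toy userspace
    model.  This is only a sketch: struct toy_page,
    toy_memory_failure_old() and toy_count_mces_old() are hypothetical
    names invented for illustration and are not kernel code.  The model
    assumes the page is already poisoned but still mapped, and that the
    caller sends SIGBUS only on a non-zero return.

```c
#include <stdbool.h>

/* Toy model only: one page with two flags. Not kernel code. */
struct toy_page {
    bool hwpoison;  /* page has been marked poisoned */
    bool mapped;    /* page is still mapped into the task */
};

/* Old contract (assumed for this sketch): 0 is returned both on
 * success and when the page was already poisoned. */
static int toy_memory_failure_old(struct toy_page *p)
{
    if (p->hwpoison)
        return 0;           /* already poisoned: nothing is done */
    p->hwpoison = true;
    return 0;
}

/* Count how many MCEs the task takes, capped at max_mces so the
 * model terminates. Each access to a mapped, poisoned page is
 * counted as one MCE. */
static int toy_count_mces_old(struct toy_page *p, int max_mces)
{
    int mces = 0;

    while (p->mapped && mces < max_mces) {
        mces++;                              /* access raises an MCE */
        if (toy_memory_failure_old(p) != 0)
            break;                           /* caller would send SIGBUS */
        /* return value 0: no SIGBUS, the task resumes and retries */
    }
    return mces;
}
```

    With a page that starts out poisoned and mapped, the count always
    reaches the cap, which stands in for the unbounded MCE loop.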
    
    This issue can arise for example in the following scenarios:
    
     - Two or more threads access the poisoned page concurrently. If
       local MCE is enabled, the MCE handler handles each MCE event
       independently. So there's a race among MCE events, and the second
       and later threads fall into the situation in question.
    
     - If there was a preceding memory error event and memory_failure() for
       that event failed to unmap the error page for some reason, a
       subsequent memory access to the error page triggers the MCE loop
       situation.
    
    To fix the issue, make memory_failure() return an error code when the
    error page has already been hwpoisoned.  This allows the memory error
    handler to control how it sends signals to userspace.  Also make sure
    that any process touching a hwpoisoned page gets a SIGBUS even in the
    "already hwpoisoned" path of memory_failure(), as is done in the page
    fault path.
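
    A minimal userspace sketch of the new return contract follows.  It
    is not the kernel implementation: struct sketch_page,
    sketch_memory_failure() and sketch_classify() are hypothetical names
    used only for illustration, and the fallback value for EHWPOISON is
    the asm-generic one, assumed here for builds whose errno.h lacks it.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133   /* assumption: asm-generic value, for non-Linux builds */
#endif

/* Hypothetical stand-in for a page; only the poison flag is modelled. */
struct sketch_page {
    bool hwpoison;
};

/* New contract: a distinct error code for "already poisoned". */
static int sketch_memory_failure(struct sketch_page *p)
{
    if (p->hwpoison)
        return -EHWPOISON;  /* no longer indistinguishable from success */
    p->hwpoison = true;
    return 0;
}

enum sketch_outcome {
    SKETCH_HANDLED,          /* first report, page handled now */
    SKETCH_ALREADY_POISONED, /* page was poisoned by an earlier event */
    SKETCH_FAILED,           /* any other error */
};

/* The caller can now tell the three cases apart and decide how the
 * signal is delivered in each of them. */
static enum sketch_outcome sketch_classify(int ret)
{
    if (ret == 0)
        return SKETCH_HANDLED;
    if (ret == -EHWPOISON)
        return SKETCH_ALREADY_POISONED;
    return SKETCH_FAILED;
}
```

    Calling sketch_memory_failure() twice on the same page yields 0 and
    then -EHWPOISON, so a second thread that races on the same page no
    longer sees a plain success.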
    
    Link: https://lkml.kernel.org/r/20210521030156.2612074-3-nao.horiguchi@gmail.com
    Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
    Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Jue Wang <juew@google.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memory-failure.c 54 KB