    mm: suppress mm fault logging if fatal signal already pending · 5f0bc0b0
    Linus Torvalds authored
    Commit eda00472 ("mm: make the page fault mmap locking killable")
    intentionally made it much easier to trigger the "page fault fails
    because a fatal signal is pending" situation, by having the mmap locking
    fail early in that case.
    
    We have long aborted page faults in other fatal cases when the actual IO
    for a page is interrupted by SIGKILL - which is particularly useful for
    the traditional case of NFS hanging due to network issues, but local
    filesystems could cause it too if you happened to get the SIGKILL while
    waiting for a page to be faulted in (e.g. lock_folio_maybe_drop_mmap()).
    
    So aborting the page fault wasn't a new condition - but it now triggers
    earlier, before we even get to 'handle_mm_fault()'.  And as a result the
    error doesn't go through our 'fault_signal_pending()' logic, and doesn't
    get filtered away there.
    
    Normally you'd never even notice, because if a fatal signal is pending,
    the new SIGSEGV we send ends up being ignored anyway.
    
    But it turns out that there is one very noticeable exception: if you
    enable 'show_unhandled_signals', the aborted page fault will be logged
    in the kernel messages, and you'll get a scary line looking something
    like this in your logs:
    
      pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0)
    
    which is rather misleading.  It's not really a segfault at all, it's
    just "the thread was killed before the page fault completed, so we
    aborted the page fault".
    
    Fix this by just making it clear that a pending fatal signal means that
    any new signal coming in after that is implicitly handled.  This will
    avoid the misleading logging, since now the signal isn't 'unhandled' any
    more.
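
    The rule described above can be sketched as a small user-space model.
    This is only an illustration, not the patch itself: 'struct task_model',
    'enum handler_kind' and 'model_unhandled_signal()' are hypothetical
    names, and the init-task and ptrace checks are assumptions about the
    surrounding "is this signal unhandled?" logic rather than something
    stated in this message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real task_struct. */
enum handler_kind { HANDLER_DEFAULT, HANDLER_IGNORE, HANDLER_USER };

struct task_model {
	bool is_global_init;        /* assumption: init is treated specially */
	enum handler_kind handler;  /* disposition of the newly sent signal */
	bool fatal_signal_pending;  /* e.g. a SIGKILL is already queued */
	bool ptraced;               /* assumption: a tracer may handle it */
};

/* Returns true if the new signal counts as "unhandled" and may be logged. */
static bool model_unhandled_signal(const struct task_model *t)
{
	if (t->is_global_init)
		return true;

	/* A user-installed handler means the signal is handled. */
	if (t->handler == HANDLER_USER)
		return false;

	/* A dying task implicitly handles any new signal by ignoring it. */
	if (t->fatal_signal_pending)
		return false;

	/* Otherwise, leave it to the tracer if there is one. */
	return !t->ptraced;
}
```

    With a check of this shape, the SIGSEGV sent for the aborted page fault
    is no longer considered unhandled once a fatal signal is pending, so a
    logging path gated on it stays quiet.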
    
    Reported-and-tested-by: Fiona Ebner <f.ebner@proxmox.com>
    Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
    Link: https://lore.kernel.org/lkml/8d063a26-43f5-0bb7-3203-c6a04dc159f8@proxmox.com/
    Acked-by: Oleg Nesterov <oleg@redhat.com>
    Fixes: eda00472 ("mm: make the page fault mmap locking killable")
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signal.c 125 KB