    bigfile: Terminate only current thread if loading-through-mmap from file fails · f9379a1c
    Kirill Smelkov authored
    On a regular kernel, if a file is mmapped, the mapped memory is then
    read, and the underlying file implementation returns -EIO, the kernel
    sends SIGBUS to the client thread; if that SIGBUS is not handled, the
    whole client process is terminated with a coredump.
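
The default SIGBUS behaviour described above can be observed from user space
with a small sketch. It assumes Linux and does not reproduce the -EIO case:
as a stand-in, the child reads a shared file mapping after the file has been
truncated to zero bytes, which is another way to make the kernel deliver
SIGBUS. The names `CHILD_SOURCE` and `run_faulting_child` are hypothetical and
are not part of bigfile.

```python
import signal
import subprocess
import sys
import textwrap

# Child program: map one page of a temporary file, shrink the file to zero
# bytes, then read the mapping. The mapped page no longer has file data
# behind it, so the kernel cannot satisfy the read.
CHILD_SOURCE = textwrap.dedent("""
    import mmap, os, tempfile
    fd, path = tempfile.mkstemp()
    os.unlink(path)
    os.write(fd, b"x" * mmap.PAGESIZE)
    m = mmap.mmap(fd, mmap.PAGESIZE, prot=mmap.PROT_READ)
    os.ftruncate(fd, 0)
    m[0]
""")


def run_faulting_child():
    """Run the child in a separate process and return its return code.

    subprocess reports death by signal N as the return code -N.
    """
    return subprocess.run([sys.executable, "-c", CHILD_SOURCE]).returncode


if __name__ == "__main__":
    rc = run_faulting_child()
    print("killed by SIGBUS:", rc == -signal.SIGBUS)
```

The child installs no SIGBUS handler, so the whole child process dies, which
is the behaviour the rest of this message sets out to narrow down to a single
thread.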
    
    In bigfile/virtmem.c we have been doing a similar thing until now: if
    a read request to loadblk() fails in vma_on_pagefault(), we abort the
    whole process.
    
    This is, however, not very convenient: in a multithreaded server where
    each request is mapped to a thread, if the thread handling just 1
    request fails this way, we kill the whole server process.
    
    What could be convenient is to somehow propagate the error to the
    calling thread, e.g. by unwinding the stack in the style of C++
    exceptions and turning that back into a Python exception at some
    point. Maybe in the future we could try to do it.
    
    For now we take a small step forward: we terminate only the thread
    which caused the failed loadblk(). I.e. we still kill the code without
    providing it a way to recover, but we kill only the working thread,
    not the whole process.
    
    To test the functionality, we leverage our tfault framework, which is
    now extended to verify more than the function in which a testcase
    dies: it now examines the whole traceback (so that we can trace a
    coredump to the thread that terminated), and it also verifies the exit
    status code and terminating signal of the dying process.
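
As an illustration of the last check, telling a normal exit status apart from
a terminating signal can be sketched as below. This is not the tfault
implementation; `describe_exit` and `run_and_describe` are hypothetical
helpers that rely only on the documented behaviour of Python's `subprocess`
module on POSIX, where a negative return code -N means the child was
terminated by signal N.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a (kind, value) pair.

    ("signal", <Signals member>) if the child was killed by a signal,
    ("exit", <status>) if it exited on its own.
    """
    if returncode < 0:
        return ("signal", signal.Signals(-returncode))
    return ("exit", returncode)


def run_and_describe(source):
    """Run a Python snippet in a child process and describe how it ended."""
    proc = subprocess.run([sys.executable, "-c", source])
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # os.abort() raises SIGABRT in the child; sys.exit(3) exits normally.
    print(run_and_describe("import os; os.abort()"))
    print(run_and_describe("import sys; sys.exit(3)"))
```

A test harness can then compare the pair against the expectation recorded for
each testcase, for example that a faulting testcase ends with a given signal
while a passing one exits with status 0.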
    
    NOTE on Linux it is not easy to terminate only 1 thread, produce a
    coredump for debugging, and still have the right process exit status
    if e.g. the main thread is terminated this way.
    
    The reason is that Linux hardcodes termination-with-coredump to kill
    all threads of a process, and even separate processes which happen to
    share the virtual memory layout with the thread being killed.
    
    So to do such termination we use hacks and workarounds; see the
    comments in the newly introduced __abort_thread().
    
    NOTE2 for getting separate coredump files for several faulting threads
    
        /proc/sys/kernel/core_pattern   or
        /proc/sys/kernel/core_uses_pid
    
    are your friends.
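
To see how a machine is currently configured, both settings can be read back
with a short sketch. The helper names below are hypothetical. The rule they
apply is the one documented in core(5): the PID appears in the core file name
if core_pattern contains %p, or if core_uses_pid is nonzero and the pattern
has no %p. Patterns starting with "|" hand the dump to a helper program, so
the sketch reports those as unknown.

```python
from pathlib import Path


def read_core_settings(sysctl_dir="/proc/sys/kernel"):
    """Return core_pattern and core_uses_pid as stripped strings.

    A value is None if the corresponding file cannot be read.
    """
    settings = {}
    for name in ("core_pattern", "core_uses_pid"):
        try:
            settings[name] = (Path(sysctl_dir) / name).read_text().strip()
        except OSError:
            settings[name] = None
    return settings


def core_names_include_pid(settings):
    """True/False if core file names will or will not carry the PID.

    Returns None when this cannot be told from the settings alone
    (unreadable pattern, or a pattern piped to a helper program).
    """
    pattern = settings.get("core_pattern")
    if pattern is None or pattern.startswith("|"):
        return None
    # "%%" is a literal percent sign, so drop it before looking for "%p".
    if "%p" in pattern.replace("%%", ""):
        return True
    return settings.get("core_uses_pid") not in (None, "0")


if __name__ == "__main__":
    current = read_core_settings()
    print(current, core_names_include_pid(current))
```

If the result is False, core files written by different processes into the
same directory share one name and overwrite each other.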
    
    /cc @Tyagov
    /cc @klaus