Kirill Smelkov authored
On a usual kernel, if a file is mmapped, the memory is then read, and the underlying file implementation returns -EIO, the kernel sends SIGBUS to the client thread, and if that SIGBUS is not handled, the whole client process is terminated with a coredump.

In bigfile/virtmem.c we have been doing a similar thing until now - if a read request to loadblk() fails in vma_on_pagefault(), we abort the whole process.

This is however not very convenient: if there is a multithreaded server with each request mapped to a thread, and the handling thread for only 1 request fails this way, we kill the whole server process.

What could be convenient is to somehow propagate the error to the calling thread, e.g. by unwinding the stack in a C++-style exceptions way and turning that back into a Python exception at some point. Maybe in the future we could try to do it.

For now we take a small step forward - we terminate only the thread which caused the failed loadblk() - i.e. we still kill the code, without providing it a way to recover, but we kill only the working thread, not the whole process.

To test the functionality, we leverage our tfault framework, which is now extended to verify not only in which function a testcase dies: more generally it now examines the traceback (so that we can track from the coredump which thread was terminated), and it also verifies the exit status code and the terminating signal of the dying process.

NOTE on Linux it is not easy to terminate only 1 thread, produce a coredump for debugging, and still have the right process exit status if e.g. the main thread is terminated this way. The reason is that Linux hardcodes termination-with-coredump to kill all threads of a process, and even separate processes which happen to share virtual memory layout with the killing thread. So to do such termination we use hacks and workarounds - see comments in the newly introduced __abort_thread().
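As an illustration of the exit status / terminating signal check mentioned above for the tfault framework, here is a minimal sketch in Python. It is not the actual tfault code: the helper name run_and_classify() and the child snippets are hypothetical, and it only assumes a POSIX system, where subprocess reports death by signal N as returncode -N.

```python
import signal
import subprocess
import sys


def run_and_classify(child_code):
    """Run child_code in a fresh Python process and report how it ended.

    Hypothetical helper, for illustration only.
    Returns ("exit", status) for a normal exit, or ("signal", signum)
    if the child was terminated by a signal.
    """
    proc = subprocess.run([sys.executable, "-c", child_code])
    if proc.returncode < 0:
        # On POSIX a negative returncode -N means "killed by signal N".
        return ("signal", -proc.returncode)
    return ("exit", proc.returncode)


# Child that dies from SIGBUS, similar to what the kernel does on -EIO.
# The core size limit is set to 0 first so the sketch leaves no core file.
DIE_WITH_SIGBUS = (
    "import os, resource, signal\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"
    "os.kill(os.getpid(), signal.SIGBUS)\n"
)

if __name__ == "__main__":
    print(run_and_classify("import sys; sys.exit(3)"))
    print(run_and_classify(DIE_WITH_SIGBUS))
```

A test harness can then compare the returned pair against what a testcase is expected to produce, e.g. ("signal", signal.SIGBUS) for a fault that should take the whole process down.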
NOTE2 for getting separate coredump files for several faulting threads, /proc/sys/kernel/core_pattern or /proc/sys/kernel/core_uses_pid are your friends.

/cc @Tyagov
/cc @klaus
f9379a1c