    MDEV-29023 MTR hangs after multiple failures · 18488048
    Aleksey Midenkov authored
    Passing $opt_parallel as $childs is wrong: a child can be killed before
    it connects, and $childs is then never decremented for it.
    
    Another problem (and the actual cause of this bug) is that a child can
    be killed without ever closing the server socket. This can happen, for
    example, after an unmaskable KILL signal. In that case the socket is
    closed only when the child is reaped, but reaping never happens inside
    the socket-reading loop in run_test_server().
    
    The proper design is to reap children without waiting inside the socket
    loop and to finish the loop once there are no more children. There is a
    Windows variant where we do not control the children via waitpid(). On
    Windows all clients must normally close the socket, and only that can
    finish the socket loop. In the Unix variant we treat this as the case
    where all children closed the socket but not all of them have died yet.
    For that case we do a final blocking waitpid(), as was also done before
    the patch.
    
    To be more complete, we now handle three end-of-game scenarios on Unix:
    
       1. all children closed the socket, all children died: everything is
          handled by the socket loop;
    
       2. all children closed the socket, not all have died yet: we wait for
          the remaining children to die after exiting the socket loop;
    
       3. not all children closed the socket, all children died: everything
          is handled by the socket loop.
    
    On Windows there is only one end-of-game scenario:
    
       All children close the socket.
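    The Unix end-of-game handling above can be pictured with the minimal
    Python sketch below. It is not the Perl code in mysql-test-run.pl, and
    the names reap_nowait(), serve() and demo() are hypothetical. Each loop
    iteration first reaps exited children with waitpid(-1, WNOHANG). The
    loop then ends when no child is alive (scenarios 1 and 3) or every
    child has closed its connection (scenario 2). Children still alive
    after that get a final blocking waitpid(). demo() forks three children
    and SIGKILLs the first before it connects, which is the situation that
    used to hang the loop. The sketch needs os.fork(), so it is Unix only.

```python
import os
import select
import signal
import socket


def reap_nowait(alive):
    """Reap every child that has already exited, without blocking.

    `alive` is a set of child PIDs; reaped PIDs are removed from it.
    """
    while alive:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # no child processes left at all
            alive.clear()
            return
        if pid == 0:  # children exist, but none has exited yet
            return
        alive.discard(pid)


def serve(listener, alive, n_children, timeout=0.1):
    """Hypothetical socket loop that also reaps children, so a child killed
    before it connects (or before it closes its socket) cannot hang it."""
    clients = []
    closed = 0
    while True:
        reap_nowait(alive)
        # `not alive` covers scenarios 1 and 3; `closed == n_children`
        # covers scenario 2 (and may also fire first in scenario 1).
        if not alive or closed == n_children:
            break
        # The timeout makes the loop come back to reap_nowait() regularly.
        readable, _, _ = select.select([listener] + clients, [], [], timeout)
        for s in readable:
            if s is listener:
                conn, _addr = listener.accept()
                clients.append(conn)
                continue
            try:
                data = s.recv(4096)
            except ConnectionResetError:
                data = b""
            if not data:  # EOF: this child closed its socket
                clients.remove(s)
                s.close()
                closed += 1
    for c in clients:
        c.close()
    # Scenario 2: blocking wait for the children that are still alive.
    for pid in list(alive):
        os.waitpid(pid, 0)
        alive.discard(pid)
    return closed


def demo(n_children=3):
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]
    alive = set()
    for i in range(n_children):
        pid = os.fork()
        if pid == 0:  # child process
            try:
                if i == 0:
                    # Unmaskable KILL before the child ever connects.
                    os.kill(os.getpid(), signal.SIGKILL)
                with socket.create_connection(("127.0.0.1", port)) as s:
                    s.sendall(b"done")
            finally:
                os._exit(0)
        alive.add(pid)
    closed = serve(listener, alive, n_children)
    listener.close()
    return closed, alive
```

    Counting connections alone would wait forever for the killed child.
    Checking the set of live PIDs in the same loop is what lets it finish.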