• Johannes Berg's avatar
    um: time-travel: fix signal blocking race/hang · 2cf3a3c4
    Johannes Berg authored
    When signals are hard-blocked in order to do time-travel
    socket processing, we set signals_blocked and then handle
    SIGIO signals by setting the SIGIO bit in signals_pending.
    When unblocking, we first set signals_blocked to 0, and
    then handle all pending signals. We have to set it first,
    so that we can again properly block/unblock inside the
    unblock, if the time-travel handlers need to be processed.
    
    Unfortunately, this is racy. We can get into this situation:
    
    // signals_pending = SIGIO_MASK
    
    unblock_signals_hard()
       signals_blocked = 0;
       if (signals_pending && signals_enabled) {
         block_signals();
         unblock_signals()
           ...
           sig_handler_common(SIGIO, NULL, NULL);
             sigio_handler()
               ...
               sigio_reg_handler()
                 irq_do_timetravel_handler()
                   reg->timetravel_handler() ==
                   vu_req_interrupt_comm_handler()
                     vu_req_read_message()
                       vhost_user_recv_req()
                         vhost_user_recv()
                           vhost_user_recv_header()
                             // reads 12 bytes header of
                             // 20 bytes message
    <-- receive SIGIO here <--
    sig_handler()
       int enabled = signals_enabled; // 1
       if ((signals_blocked || !enabled) && (sig == SIGIO)) {
         if (!signals_blocked && time_travel_mode == TT_MODE_EXTERNAL)
           sigio_run_timetravel_handlers()
             _sigio_handler()
               sigio_reg_handler()
                 ... as above ...
                   vhost_user_recv_header()
                     // reads 8 bytes that were message payload
                     // as if it were header - but aborts since
                     // it then gets -EAGAIN
    ...
    --> end signal handler -->
                           // continue in vhost_user_recv()
                           // full_read() for 8 bytes payload busy loops
                           // entire process hangs here
    
    Conceptually, to fix this, we need to ensure that the
    signal handler cannot run while we hard-unblock signals.
    The thing that makes this more complex is that we can be
    doing hard-block/unblock while unblocking. Introduce a
    new signals_blocked_pending variable that we can keep at
    non-zero as long as pending signals are being processed,
    then we only need to ensure it's decremented safely and
    the signal handler will only increment it if it's already
    non-zero (or signals_blocked is set, of course.)
    
    Note also that only the outermost call to hard-unblock is
    allowed to decrement signals_blocked_pending, since it
    could otherwise reach zero in an inner call, and leave
    the same race happening if the timetravel_handler loops,
    but that's basically required of it.
    
    Fixes: d6b399a0 ("um: time-travel/signals: fix ndelay() in interrupt")
    Link: https://patch.msgid.link/20240703110144.28034-2-johannes@sipsolutions.netSigned-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
    2cf3a3c4
signal.c 12 KB