    MDEV-30780 optimistic parallel slave hangs after hit an error · d4339620
    Andrei authored
    The hang shows up as SHOW SLAVE STATUS displaying an error like
        Last_Error: Could not execute Write_rows_v1
    along with
        Slave_SQL_Running: Yes
    
    accompanied by one of the replication threads in SHOW PROCESSLIST
    characteristically having a status like
    
       2394 | system user  |    | NULL | Slave_worker | 50852| closing tables
    
    It turns out that the worker in the "closing tables" state was trapped
    in an endless loop in mark_start_commit_inner(), walking across
    already garbage-collected gco items.
    
    The gco links had been reclaimed because, after the Last_Error, groups
    of events could actually terminate out of order.
    This patch enforces the correct ordering in which finish_event_group's
    cleanup actions are performed, including unlinking gco:s
    from the active list.
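
The ordering idea can be illustrated with a small standalone sketch. It is not the code from rpl_parallel.cc: `ToyGco`, `ToyActiveList`, `start_group` and `finish_group` are hypothetical names made up for this illustration, and the real gco structures and locking are not modelled. The assumption shown is only that groups may finish in any order, while entries leave the active list strictly from the head, so a traversal starting at a still-active entry never steps onto a reclaimed one.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a gco entry; not the real MariaDB structure.
struct ToyGco {
  std::uint64_t id;
  bool finished;
};

// Hypothetical active list, oldest group first.
struct ToyActiveList {
  std::deque<ToyGco> items;

  void start_group(std::uint64_t id) { items.push_back(ToyGco{id, false}); }

  // Mark one group as finished. Groups may finish in any order, for
  // instance after an error, but entries are unlinked only from the head
  // of the list, so the unlink order always matches the list order.
  // Returns the ids unlinked by this call.
  std::vector<std::uint64_t> finish_group(std::uint64_t id) {
    for (ToyGco &g : items) {
      if (g.id == id) g.finished = true;
    }
    std::vector<std::uint64_t> unlinked;
    while (!items.empty() && items.front().finished) {
      unlinked.push_back(items.front().id);
      items.pop_front();
    }
    return unlinked;
  }
};
```

In this sketch, finishing group 2 before group 1 unlinks nothing; once group 1 finishes, both are unlinked in order and group 3 stays at the head of the list.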
rpl_parallel.cc 92.7 KB