MDEV-30780 optimistic parallel slave hangs after hit an error
The hang could be seen as show slave status displaying an error like Last_Error: Could not execute Write_rows_v1 along with Slave_SQL_Running: Yes accompanied with one of the replication threads in show-processlist characteristically having status like 2394 | system user | | NULL | Slave_worker | 50852| closing tables It turns out that closing tables worker got entrapped in endless looping in mark_start_commit_inner() across already garbage-collected gco items. The reclaimed gco links are explained with actually possible out-of-order groups of events termination due to the Last_Error. This patch reinforces the correct ordering to perform finish_event_group's cleanup actions, incl unlinking gco:s from the active list.
Showing
Please register or sign in to comment