MDEV-31755 Replica's DML event deadlocks with online alter table
The deadlock was caused by the start ALTER acquiring too strong an MDL.

Replication of ALTER TABLE on the replica consists of two phases:
1. Start ALTER (SA) -- the event is emitted at the very beginning, allowing
   the replica to start the ALTER in parallel
2. Commit ALTER (CA) -- ensures that the master has finished successfully

CA is normally received by the wait_for_master call.

If a parallel DML was run, the following sequence takes place:
|- SA
|- DML
|- CA

If CA is handled after the MDL upgrade, it will deadlock with the DML.

While the MDL is still shared, the start ALTER should wait for its second
part (CA), so that concurrent DMLs can grab the lock.

The fix relies on wait_for_master being reentrant -- there is no need to
avoid a second call at the end of mysql_alter_table.

Since SA and CA are marked with FL_DDL, a DML issued in between cannot be
rescheduled before or after them. However, SA "commits" (by the call to
write_bin_log_start_alter and, subsequently, thd->wakeup_subsequent_commits)
before the copy stage begins, unblocking the DMLs so that they can run on
this table. That is, these DMLs are executed concurrently with the copy
stage, making online alter effective on replicas as well.

Co-authored-by: Nikita Malyavin (nikitamalyavin@gmail.com)
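The wait cycle described above can be illustrated with a toy wait-for graph.
This is only a simplified sketch, not server code: the node names, the
`wait_graph` helper and its `wait_for_ca_before_upgrade` flag are hypothetical,
and the edges assume that the ALTER thread waits for CA, that CA is ordered
after the DML, and that the DML is blocked only once the MDL has been upgraded.

```python
def has_cycle(waits_for):
    """Return True if the wait-for graph contains a cycle (a deadlock)."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in waits_for.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(waits_for))


def wait_graph(wait_for_ca_before_upgrade):
    """Build a hypothetical wait-for graph for the SA / DML / CA sequence."""
    graph = {
        "SA": ["CA"],   # the ALTER started by SA waits for its commit event
        "CA": ["DML"],  # CA follows the DML in the replicated sequence
        "DML": [],      # a shared MDL does not block the DML
    }
    if not wait_for_ca_before_upgrade:
        # Assumed old behaviour: the MDL is upgraded first, so the DML
        # has to wait for the ALTER to release it.
        graph["DML"].append("SA")
    return graph


deadlock_before_fix = has_cycle(wait_graph(wait_for_ca_before_upgrade=False))
deadlock_after_fix = has_cycle(wait_graph(wait_for_ca_before_upgrade=True))
```

In this model the cycle SA -> CA -> DML -> SA exists only when the upgrade
happens before the wait for CA; waiting for CA under the shared MDL removes
the DML -> SA edge.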