    MDEV-31755 Replica's DML event deadlocks with online alter table · bac8f189
    Andrei authored
    The deadlock was caused by an overly strong MDL acquired by the start ALTER.
    
    Replica's ALTER TABLE replication consists of two phases:
    1. Start ALTER (SA) -- the event is emitted at the very beginning,
    allowing the replica to start the ALTER in parallel
    2. Commit ALTER (CA) -- signals that the master has finished successfully
    
    CA is normally received by the wait_for_master call.
    If a parallel DML is run, the following sequence takes place:
    
    |- SA
    |- DML
    |- CA
    
    If CA is handled after the MDL upgrade, it will deadlock with the DML.
    
    While the MDL is still shared, the start ALTER now waits for its second
    part (CA), which allows concurrent DMLs to grab the lock.
    
    The fix relies on wait_for_master being reentrant -- there is no need to
    avoid the second call at the end of mysql_alter_table.
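
The ordering argument above can be illustrated with a toy wait-for graph. This is only a sketch, not MariaDB code: the node names and the helpers `wait_for_graph` and `has_cycle` are hypothetical, and the model assumes that CA cannot be handled until the preceding DML has committed (SA, DML, CA keep their order).

```python
def has_cycle(waits_for):
    """Return True if the wait-for graph (node -> set of nodes) has a cycle."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        for nxt in waits_for.get(node, ()):
            if visit(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in waits_for)


def wait_for_graph(wait_for_ca_before_upgrade):
    """Build a toy wait-for graph for the SA / DML / CA sequence (assumed model)."""
    graph = {
        "CA": {"DML"},     # assumption: CA is handled only after the DML commits
        "ALTER": {"CA"},   # the ALTER started by SA waits for its second part
    }
    if wait_for_ca_before_upgrade:
        # ALTER still holds a shared MDL while waiting, so the DML is not blocked.
        graph["DML"] = set()
    else:
        # ALTER has already upgraded its MDL, so the DML waits for the ALTER.
        graph["DML"] = {"ALTER"}
    return graph


if __name__ == "__main__":
    print("upgrade before CA:", has_cycle(wait_for_graph(False)))
    print("wait for CA first:", has_cycle(wait_for_graph(True)))
```

In this model, upgrading the lock before CA is handled closes the cycle ALTER, CA, DML, ALTER, while waiting for CA under the shared lock leaves the graph acyclic.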
    
    Since SA and CA are marked with FL_DDL, the DML issued in-between cannot be
    rescheduled before or after them. However, SA "commits" (by the call of
    write_bin_log_start_alter and, subsequently,
    thd->wakeup_subsequent_commits) before the copy stage begins, unblocking
    the DMLs to run on this table. That is, these DMLs will be executed
    concurrently with the copy stage, making online ALTER effective on replicas
    as well.
    
    Co-authored-by: Nikita Malyavin (nikitamalyavin@gmail.com)
rpl_alter_online_debug.result 1.06 KB