    MDEV-8302: Duplicate key with parallel replication · 9b9c5e89
    Kristian Nielsen authored
    This bug is essentially another variant of MDEV-7458.
    
    If a transaction conflict caused a deadlock kill of T2 in record_gtid()
    during commit, the code would do a rollback _before_ running
    rgi->unmark_start_commit(). This creates a race where following transactions
    could start too early (before T2 has completed its transaction retry). This
    in turn could lead to replication failure, if there was a conflict that
    caused e.g. a duplicate key error or similar.
    
    The fix is to remove these rollbacks (in Query_log_event::do_apply_event()
    and Xid_log_event::do_apply_event()). They seem out-of-place; code in
    log_event.cc generally does not roll back on error, this is handled higher
    up.
    
    In addition, because of the extreme difficulty of reproducing bugs like
    MDEV-7458 and MDEV-8302, this patch adds some extra precautions to try to
    detect (in debug builds) or prevent (in release builds) similar bugs.
    ha_rollback_trans() will now call unmark_start_commit() if needed (and
    assert in debug build when a caller does rollback without unmark first).
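
The safety net described above can be illustrated with a minimal sketch. This is not the code from rpl_parallel.cc: GroupInfoSketch, pending_commits and rollback_sketch() are hypothetical stand-ins, and only the names mark_start_commit() and unmark_start_commit() are taken from the commit message. The sketch assumes the mark is tracked by a flag plus a shared counter.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the per-transaction replication
// state. The real structure in the server is considerably more involved.
struct GroupInfoSketch {
    bool did_mark_start_commit = false;
    int  pending_commits = 0;  // assumed stand-in for a shared commit counter

    void mark_start_commit() {
        if (did_mark_start_commit) return;   // already marked, nothing to do
        did_mark_start_commit = true;
        ++pending_commits;
    }

    void unmark_start_commit() {
        if (!did_mark_start_commit) return;  // nothing to undo
        assert(pending_commits > 0);
        did_mark_start_commit = false;
        --pending_commits;
    }
};

// Sketch of a rollback that undoes the mark itself when the caller forgot.
// Returns true when the safety net had to step in; a debug build in the
// spirit of the patch would assert at that point instead of continuing.
bool rollback_sketch(GroupInfoSketch &rgi) {
    const bool caller_forgot_unmark = rgi.did_mark_start_commit;
    if (caller_forgot_unmark)
        rgi.unmark_start_commit();
    // ... the actual transaction rollback would happen here ...
    return caller_forgot_unmark;
}
```

The point of the ordering is that the mark is withdrawn before the rollback proceeds, so later transactions do not see this one as already committing while it is about to be retried.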
    
    We also add an extra check for thd->killed() so that we avoid doing
    mark_start_commit() if we already have a pending deadlock kill.
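
A rough sketch of this extra check follows. ThdSketch, RgiSketch and try_mark_start_commit() are hypothetical names for illustration only, and the kill state is modelled as a plain boolean, which is an assumption rather than how the server represents it.

```cpp
#include <cassert>

// Hypothetical stand-in for the connection state; only the kill flag matters.
struct ThdSketch {
    bool killed = false;  // assumed: set when a deadlock kill is pending
};

// Hypothetical stand-in for the replication group info.
struct RgiSketch {
    bool did_mark_start_commit = false;

    void mark_start_commit() {
        assert(!did_mark_start_commit);  // in this sketch, marking twice is a bug
        did_mark_start_commit = true;
    }
};

// Only mark the start of commit when no kill is pending. Returns true if
// the mark was set, false if it was skipped because of the pending kill.
bool try_mark_start_commit(const ThdSketch &thd, RgiSketch &rgi) {
    if (thd.killed)
        return false;  // the transaction will be retried, so do not mark it
    rgi.mark_start_commit();
    return true;
}
```

Skipping the mark when a kill is already pending means there is nothing to unmark later on the retry path.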
    
    And we add a missing unmark_start_commit() call in the error case, found by
    the above assertion.
rpl_parallel.cc 71.9 KB