    MDEV-23089 rpl_parallel2 fails in 10.5 · 706a7101
    Sachin authored
    Problem:- rpl_parallel2 was failing non-deterministically
    Analysis:-
    When FLUSH TABLES WITH READ LOCK (FTWRL) is executed, it allows all worker
    threads to complete their ongoing transactions and then pauses them.
    At this stage FTWRL proceeds to acquire the global read lock. FTWRL first
    blocks threads from starting new commits, then upgrades the lock to block
    the commit of existing transactions.
      Step1:
        FLUSH TABLES WITH READ LOCK - Blocks new commits
      Step2:
        * STOP SLAVE command sets 'force_abort=1', which unblocks the workers
          so that they continue to execute events.
        * T1: Waits in 'record_gtid' call to update 'gtid_slave_pos' table with
          its current GTID, but it is blocked because of Step1.
        * T2: Holds COMMIT lock and waits for T1 to commit.
      Step3:
        FLUSH TABLES WITH READ LOCK - Waiting to get BLOCK_COMMIT.
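    This results in a deadlock: T1 is blocked by the FTWRL of Step1, T2 waits
    for T1, and FTWRL waits for T2 to release the COMMIT lock. When the STOP
    SLAVE command allows paused workers to proceed, workers should skip the
    execution of all further events, similar to 'conservative' parallel mode.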
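
The three steps above form a circular wait, which can be checked with a small wait-for graph. This is only an illustrative sketch: `WaitForGraph`, `has_wait_cycle`, `graph_before_fix` and `graph_after_fix` are hypothetical names and are not part of the server code, and the model assumes each party waits for at most one other party.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical model: key waits for value. Not part of the MariaDB sources.
using WaitForGraph = std::map<std::string, std::string>;

// Follow the wait-for edges from 'start'; a revisited node means a cycle.
bool has_wait_cycle(const WaitForGraph &g, const std::string &start)
{
  std::set<std::string> seen;
  std::string cur = start;
  while (seen.insert(cur).second)
  {
    auto it = g.find(cur);
    if (it == g.end())
      return false;            // cur waits for nobody, so the chain can drain
    cur = it->second;
  }
  return true;                 // cur was already visited: circular wait
}

// Situation described in Step1..Step3 of the analysis.
WaitForGraph graph_before_fix()
{
  return {
    {"T1", "FTWRL"},           // record_gtid is blocked by Step1
    {"T2", "T1"},              // T2 holds COMMIT lock, waits for T1 to commit
    {"FTWRL", "T2"},           // Step3 waits to get BLOCK_COMMIT
  };
}

// Assumed situation once aborted workers skip their event groups:
// neither T1 nor T2 reaches the commit path, so no edges remain.
WaitForGraph graph_after_fix()
{
  return {};
}
```

Calling `has_wait_cycle(graph_before_fix(), "FTWRL")` reports a cycle, while the same call on `graph_after_fix()` does not.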
    Solution:-
    We will assign 1 to skip_event_group when we are aborted in do_ftwrl_wait.
    rpl_parallel_entry->pause_sub_id is only reset when force_abort is off in
    rpl_pause_after_ftwrl.
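
The two changes can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual patch: the structs `GroupSketch` and `EntrySketch`, the functions `ftwrl_wait_sketch` and `pause_after_ftwrl_sketch`, and the `NO_PAUSE` sentinel are hypothetical stand-ins, and the real functions take different arguments and handle locking that is omitted here.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sentinel meaning "no pause point set"; the real value may differ.
const std::uint64_t NO_PAUSE = std::numeric_limits<std::uint64_t>::max();

// Stand-in for the per-event-group state of a worker.
struct GroupSketch
{
  int skip_event_group = 0;    // 1 means: skip the remaining events of the group
};

// Stand-in for rpl_parallel_entry, reduced to the two fields discussed above.
struct EntrySketch
{
  bool force_abort = false;    // set by STOP SLAVE
  std::uint64_t pause_sub_id = NO_PAUSE;
};

// Models the end of the wait in do_ftwrl_wait: a worker that wakes up because
// of force_abort marks its group as skipped instead of executing it.
// Returns true when the wait was aborted.
bool ftwrl_wait_sketch(GroupSketch &rgi, const EntrySketch &entry)
{
  bool aborted = entry.force_abort;
  if (aborted)
    rgi.skip_event_group = 1;
  return aborted;
}

// Models the second change: pause_sub_id is reset only when force_abort is off.
void pause_after_ftwrl_sketch(EntrySketch &entry)
{
  if (!entry.force_abort)
    entry.pause_sub_id = NO_PAUSE;
}
```

With `force_abort` set, `ftwrl_wait_sketch` leaves `skip_event_group` at 1 and `pause_after_ftwrl_sketch` leaves `pause_sub_id` untouched; with `force_abort` off, the group is not skipped and the pause point is cleared.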
rpl_parallel2.result 4.07 KB