    MDEV-31949 parallel slave xa Round-Robin distribution · 96bd9e6b
    Andrei authored
    The XA-Prepare group of events
    
      XA START xid
      ...
      XA END xid
      XA PREPARE xid
    
    and its XA-"complete" terminator
    
      XA COMMIT or
      XA ROLLBACK
    
    are now distributed Round-Robin across the slave parallel workers.
    The former hash-based policy was shown to contribute to execution
    latency by creating a large queue of binlog-ordered transactions
    waiting to commit, many times larger than the size of the worker
    pool.
    
    Acronyms and notations used below:
    
      XAP := XA-Prepare event or the whole prepared XA group of events
      XAC := XA-"complete", which is a solitary group of events
      |W| := the size of the slave worker pool
      Subscripts like `_k' denote the order in a corresponding sequence
         (e.g. the binlog file).
    
    KEY CHANGES:
    
    The parallel slave
    ------------------
    The driver thread now maintains a list of the XAPs currently
    in processing. Its purpose is to avoid "wild" parallel execution of XAs
    with duplicate xids (unlikely, but that's the user's right).
    The list is arranged as a sliding window of size 2*|W| to account
    for the largest possible dependency (in the group-of-events count
    sense), XAP_k -> XAP_{k+2|W|-1}.
    Say k=1 and |W|, the number of workers, is 4. As transactions are
    distributed Round-Robin, T^*_1 -> T^*_8 can be the largest
    dependency at runtime ('*' marks the dependent transactions).
    This can be seen from the worker queues pictured below,
    where each queue Q_i grows downward:
    
      Q1   Q2   Q3   Q4
      1^*  2    3    4
      5    6    7    8^*
    
    Worker #1 is assigned T_1 and T_5.
    Worker #4 can take on its T_8 while T_1 is still at the
    beginning of its processing, that is, even before the XA START of that XAP.
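
    The Round-Robin layout and the 2*|W| window described above can be
    modelled with a short sketch. It is only an illustration, not the
    server code: round_robin_queues() and XapWindow are hypothetical
    names, and the duplicate-xid lookup is an assumed simplification of
    what the driver thread does with its list.

```python
from collections import deque


def round_robin_queues(num_transactions, num_workers):
    """Distribute T_1..T_n over the worker queues in Round-Robin order."""
    queues = [[] for _ in range(num_workers)]
    for t in range(1, num_transactions + 1):
        queues[(t - 1) % num_workers].append(t)
    return queues


class XapWindow:
    """Hypothetical model of the driver's list of XAPs in processing."""

    def __init__(self, num_workers):
        # 2*|W| slots hold XAP_k .. XAP_{k+2|W|-1}, the largest dependency span.
        self.window = deque(maxlen=2 * num_workers)
        self.next_seq = 1

    def register(self, xid):
        """Record a new XAP and return the sequence number of the most
        recent earlier XAP in the window with the same xid, or None."""
        seq = self.next_seq
        self.next_seq += 1
        self.window.append((seq, xid))  # the oldest entry drops out when full
        earlier = list(self.window)[:-1]
        for old_seq, old_xid in reversed(earlier):
            if old_xid == xid:
                return old_seq
        return None


# |W| = 4: T_8 reuses the xid of T_1, so T_1 -> T_8 is detected.
queues = round_robin_queues(8, 4)
window = XapWindow(4)
deps = [window.register(x) for x in ["a", "b", "c", "d", "e", "f", "g", "a"]]
```

    In this model a duplicate xid that arrives 2*|W| or more positions
    after its predecessor finds the predecessor already out of the
    window, which matches the size chosen for the list.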
    
    XA related
    ----------
    XID_cache_element is extended with two pointers to resolve
    two types of dependencies: the duplicate-xid one, XAP_k -> XAP_{k+i},
    and the ordinary one of a completion on its prepare, XAP_k -> XAC_{k+j}.
    The former is handled by a wait-for-xid protocol conducted by
    xid_cache_delete() and xid_cache_insert_maybe_wait().
    The latter is handled analogously by xid_cache_search_maybe_wait() and
    slave_applier_reset_xa_trans().
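
    The general shape of such a wait-for-xid handshake can be sketched
    as below. This is a toy model under stated assumptions, not the
    actual implementation: XidCacheModel and its methods are
    hypothetical, and a single condition variable stands in for the two
    pointers added to XID_cache_element.

```python
import threading


class XidCacheModel:
    """Hypothetical model: one owner per xid, later inserters wait."""

    def __init__(self):
        self._cond = threading.Condition()
        self._owners = {}

    def insert_maybe_wait(self, xid, owner):
        # Block while an earlier XAP with the same xid still holds it.
        with self._cond:
            self._cond.wait_for(lambda: xid not in self._owners)
            self._owners[xid] = owner

    def delete(self, xid):
        # Release the xid and wake up any waiter for it.
        with self._cond:
            del self._owners[xid]
            self._cond.notify_all()

    def owner_of(self, xid):
        with self._cond:
            return self._owners.get(xid)


cache = XidCacheModel()
log = []
cache.insert_maybe_wait("x", "XAP_1")


def second_prepare():
    cache.insert_maybe_wait("x", "XAP_2")  # waits for XAP_1 to release "x"
    log.append("XAP_2 inserted")


worker = threading.Thread(target=second_prepare)
worker.start()
log.append("XAP_1 deleted")  # logged before the release, so the order is fixed
cache.delete("x")
worker.join()
```

    The second prepare can only proceed after the first one has
    released the xid, which is the ordering the protocol has to enforce.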
    
    An XA-"complete" is allowed to go forward before its XAP parent
    has released the xid (all recovery concerns are covered in MDEV-21496,
    MDEV-21777).
    Yet the XAC will wait for it at a critical point of execution,
    namely when it "completes" the work in the Engine.
    
    CAVEAT: storage/innobase/trx/trx0undo.cc changes are due to MDEV-32144,
            which is possibly already fixed.
            TODO: to be verified.
    
    Thanks to Brandon Nesterenko at mariadb.com for the initial review and
    a lot of creative effort in advancing this work!