MDEV-31949 parallel slave xa Round-Robin distribution

The XA-Prepare group of events

  XA START xid
  ...
  XA END xid
  XA PREPARE xid

and its XA-"complete" terminator

  XA COMMIT or XA ROLLBACK

are now distributed Round-Robin across the slave parallel workers.
The former hash-based policy was shown to contribute to execution
latency by building up a queue of binlog-ordered transactions to commit
that was many times larger than the size of the worker pool.

Acronyms and notations used below:

  XAP := XA-Prepare event or the whole prepared XA group of events
  XAC := XA-"complete", which is a solitary group of events
  |W| := the size of the slave worker pool
  Subscripts like `_k' denote order in a corresponding
  sequence (e.g. binlog file).

KEY CHANGES:

The parallel slave
------------------
The driver thread now maintains a list of XAPs currently in processing.
Its purpose is to avoid "wild" parallel execution of XAs with
duplicate xids (unlikely, but that's the user's right).
The list is arranged as a sliding window of size 2*|W| to account
for the possibility of XAP_k -> XAP_(k+2|W|-1) being the largest (in the
group-of-events count sense) dependency.

Say k=1, and |W|, the number of workers, is 4. As transactions are
distributed Round-Robin, it's possible to have T^*_1 -> T^*_8 as the
largest dependency ('*' marks the dependents) at runtime.
It can be seen from the worker queues, as in the picture below.
Let the Q_i worker queues develop downward:

  Q1    ...         Q4
  1^*   2     3     4
  5     6     7     8^*

Worker #1 is assigned T_1 and T_5.
Worker #4 can take on its T_8 while T_1 is still at the very beginning
of its processing, that is even before the XA START of that XAP.

XA related
----------
XID_cache_element is extended with two pointers to resolve
two types of dependencies: the duplicate xid XAP_k -> XAP_k+i
and the ordinary completion on the prepare XAP_k -> XAC_k+j.
The former is handled by a wait-for-xid protocol conducted by
xid_cache_delete() and xid_cache_insert_maybe_wait().
The latter is done analogously by xid_cache_search_maybe_wait() and
slave_applier_reset_xa_trans().
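
The duplicate-xid handshake described above can be pictured with a toy
model. This is only a sketch under assumptions, not the server code:
ToyXidCache, insert_maybe_wait(), delete() and demo() are hypothetical
names, and the assumed behaviour is that an insert blocks while the same
xid is still active and that a delete wakes the waiters.

```python
import threading


class ToyXidCache:
    """Toy stand-in for an xid cache; NOT MariaDB's XID_cache_element."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = set()  # xids currently claimed by an applier

    def insert_maybe_wait(self, xid, timeout=None):
        """Claim xid; if a duplicate is still active, wait for its release.

        Returns True once the xid is claimed, False if the wait timed out.
        """
        with self._cond:
            free = self._cond.wait_for(lambda: xid not in self._active, timeout)
            if free:
                self._active.add(xid)
            return free

    def delete(self, xid):
        """Release xid and wake any applier waiting to reuse it."""
        with self._cond:
            self._active.discard(xid)
            self._cond.notify_all()


def demo():
    """Two appliers use the same xid; the second cannot claim it early."""
    cache = ToyXidCache()
    log = []
    first_claimed = threading.Event()
    release_first = threading.Event()

    def first():
        cache.insert_maybe_wait("xid-1")
        log.append("T1 claimed")
        first_claimed.set()
        release_first.wait()
        log.append("T1 released")  # logged just before the actual release
        cache.delete("xid-1")

    def second():
        first_claimed.wait()
        cache.insert_maybe_wait("xid-1")  # blocks while T1 holds the xid
        log.append("T2 claimed")
        cache.delete("xid-1")

    threads = [threading.Thread(target=first), threading.Thread(target=second)]
    for t in threads:
        t.start()
    release_first.set()
    for t in threads:
        t.join()
    return log
```

In this model the second applier can only log its claim after the first
one has released the xid, whatever the thread scheduling happens to be.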

An XA-"complete" is allowed to go forward before its XAP parent has
released the xid (all recovery concerns are covered in MDEV-21496,
MDEV-21777). Yet the XAC is going to wait for it at a critical point of
execution, which is where it "completes" the work in the Engine.

CAVEAT: storage/innobase/trx/trx0undo.cc changes are due to the possibly
        fixed MDEV-32144. TODO: to be verified.

Thanks to Brandon Nesterenko at mariadb.com for the initial review and
a lot of creative effort to advance this work!