    MDEV-32282: Galera node remains paused after interleaving FTWRLs · ef7fc586
    Teemu Ollakka authored
    After two concurrent FTWRL/UNLOCK TABLES sequences, the node can stay
    in the paused state, and a following CREATE TABLE fails with
    
      ER_UNKNOWN_COM_ERROR (1047): Aborting TOI: Replication paused on
      node for FTWRL/BACKUP STAGE.
    
    The cause is the use of the global `wsrep_locked_seqno` to determine
    whether the node should be resumed on UNLOCK TABLES. In some
    executions, `wsrep_locked_seqno` is cleared by the first UNLOCK TABLES
    after the second FTWRL gets past `make_global_read_lock_block_commit()`,
    so the second UNLOCK TABLES does not resume the node.
    
    As a fix, use `thd->wsrep_desynced_backup_stage` to determine
    if the thread should resume the node on UNLOCK TABLES.
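
    The interleaving can be illustrated with a toy model. This is only a
    sketch, not the code in lock.cc: `Node`, `Session`, `ftwrl()`, the two
    unlock functions and the nested pause counter are hypothetical
    stand-ins, and the assumption that pauses nest is made for
    illustration only.

```cpp
#include <cassert>

// Toy model only: every name here is a hypothetical stand-in, not MariaDB code.
constexpr long long SEQNO_UNDEFINED = -1;

struct Node {
    int pause_count = 0;                       // assumption: pauses nest and are counted
    long long locked_seqno = SEQNO_UNDEFINED;  // models the global wsrep_locked_seqno
    long long next_seqno = 10;                 // arbitrary starting value
    bool paused() const { return pause_count > 0; }
};

struct Session {
    bool paused_node = false;  // models thd->wsrep_desynced_backup_stage
};

// FTWRL: pause the node, record the seqno globally, remember it per session.
void ftwrl(Node& n, Session& s) {
    ++n.pause_count;
    n.locked_seqno = n.next_seqno++;
    s.paused_node = true;
}

// Old decision: resume only if the global seqno is still set.
void unlock_global(Node& n, Session&) {
    if (n.locked_seqno != SEQNO_UNDEFINED) {
        n.locked_seqno = SEQNO_UNDEFINED;
        --n.pause_count;
    }
}

// New decision: resume if this session paused the node.
void unlock_per_thread(Node& n, Session& s) {
    if (s.paused_node) {
        s.paused_node = false;
        --n.pause_count;
    }
}

// Runs FTWRL(a), FTWRL(b), UNLOCK(a), UNLOCK(b) and reports whether
// the node is still paused afterwards.
template <typename Unlock>
bool paused_after_interleaving(Unlock unlock) {
    Node n;
    Session a, b;
    ftwrl(n, a);
    ftwrl(n, b);
    unlock(n, a);  // with the global check, this clears the shared seqno
    unlock(n, b);  // with the global check, this sees no seqno and skips resume
    return n.paused();
}
```

    In this model `paused_after_interleaving(unlock_global)` returns true
    (the node is left paused), while
    `paused_after_interleaving(unlock_per_thread)` returns false.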
    
    Add MTR test galera.galera_ftwrl_concurrent to reproduce the
    race. The test also contains cases for BACKUP STAGE, which
    uses a similar mechanism for desyncing and pausing the node.

    Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
lock.cc 37.7 KB