    MDEV-31413 : Node has been dropped from the cluster on Startup / Shutdown with async replica · 277968aa
    Jan Lindström authored
    There were two related problems:
    
    (1) A Galera node that is defined as a slave to an async MariaDB
    master might do an SST (state snapshot transfer) at restart, and
    as part of that it copies the mysql.gtid_slave_pos table.
    The problem is that updates to that table are not replicated
    within the cluster. Therefore, the table is copied from a donor
    that is not a slave, the joiner loses the GTID position it was at,
    and it starts executing events from the wrong position in the binlog.
    This incorrect position could break replication, causing the
    node to be dropped from the cluster and requiring user action.
    
    (2) The slave SQL thread might start executing events before
    Galera is ready (wsrep_ready=ON), which could also
    cause the node to be dropped from the cluster.
    
    This fix enables replication of the mysql.gtid_slave_pos
    table within the cluster. This way, all nodes in the cluster
    know the GTID slave position, and even after an SST the joiner
    knows the correct GTID position to start from.
    
    Furthermore, we wait for Galera to be ready before the slave
    SQL thread executes any events, to prevent too-early
    execution.
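
The waiting behaviour described above can be illustrated with a small model. This is only a sketch of the general idea (block the applier until a readiness flag is set), not the actual server change: `ReadinessGate`, `apply_events`, and the event names are hypothetical stand-ins, and the real fix lives in the MariaDB server code around the wsrep_ready state.

```python
import threading


class ReadinessGate:
    """Hypothetical stand-in for the wsrep_ready state; not MariaDB code."""

    def __init__(self):
        self._ready = threading.Event()

    def mark_ready(self):
        # Called once the node has joined the cluster and can apply changes.
        self._ready.set()

    def wait_until_ready(self, timeout=None):
        # Returns True if the gate opened, False if the timeout expired.
        return self._ready.wait(timeout)


def apply_events(gate, events, applied, timeout=5.0):
    """Model of a slave SQL thread: wait for the gate, then apply events in order."""
    if not gate.wait_until_ready(timeout):
        return False  # node never became ready, so nothing is applied
    applied.extend(events)
    return True


gate = ReadinessGate()
applied = []
worker = threading.Thread(target=apply_events, args=(gate, ["ev1", "ev2"], applied))
worker.start()
applied_before_ready = list(applied)  # snapshot taken while the gate is still closed
gate.mark_ready()
worker.join()
```

In this model no event can be applied before `mark_ready()` is called, which mirrors the intent of holding the slave SQL thread back until the node is ready.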
    Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
galera_restart_replica.cnf 293 Bytes