    MDEV-31833 replication breaks when using optimistic replication and replica is a galera node · a3cbc44b
    sjaakola authored
    Previously, the MariaDB asynchronous replication SQL thread was stopped on
    any failure to apply a replication event, and the error logged for that
    failure was "Node has dropped from cluster". The assumption was that an
    event-apply failure is always caused by the node dropping out of the cluster.
    With optimistic parallel replication, applying an event can fail for
    legitimate reasons (for example, a conflict between transactions applied in
    parallel), and the apply should then be retried. This retry logic was never
    exercised, because the slave SQL thread was stopped at the first apply
    failure.
    
    To support the retry logic of optimistic parallel replication, this commit
    skips aborting the replication slave if the node remains in the cluster
    (wsrep_ready==ON) and replication is configured for optimistic or aggressive
    parallel mode (slave_parallel_mode).
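    The abort decision described above can be sketched as a small predicate.
    This is a hypothetical illustration, not the code in slave.cc: the enum
    ParallelMode and the function should_abort_slave_on_apply_error are
    invented names that only mirror the slave_parallel_mode values and the
    wsrep_ready condition mentioned in this commit message.

```cpp
#include <cassert>  // for assert() in the usage checks

// Hypothetical mirror of the slave_parallel_mode values.
// This is NOT MariaDB's internal type; it exists only for illustration.
enum class ParallelMode { None, Minimal, Conservative, Optimistic, Aggressive };

// Hypothetical predicate: returns true when the replication SQL thread
// should be stopped after a failure to apply a replication event.
bool should_abort_slave_on_apply_error(bool wsrep_ready, ParallelMode mode) {
  // The node has left the cluster: keep the old behaviour and stop,
  // since the failure may really be caused by the node dropping out.
  if (!wsrep_ready) {
    return true;
  }
  // The node is still in the cluster and the mode retries failed
  // transactions: do not abort, let the parallel retry logic run.
  if (mode == ParallelMode::Optimistic || mode == ParallelMode::Aggressive) {
    return false;
  }
  // Any other mode has no such retry path here, so stop as before.
  return true;
}
```

    In this sketch, only the combination "still in the cluster" plus
    "optimistic or aggressive mode" avoids the abort; every other combination
    keeps the previous stop-on-failure behaviour.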
    
    During the development of this fix, the galera.galera_as_slave_nonprim test
    showed some problems. The test was analyzed, and it appears to need further
    attention. One excessive sleep command was removed in this commit, but more
    fixes are still needed to make the test fully deterministic. After this
    commit, galera_as_slave_nonprim passes, though.

    Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>