Commit c9fd311a authored by guilhem@mysql.com

In the slave I/O thread (in flush_master_info()), it seems less harmful to
flush the relay log before flushing master.info.
If the slave crashes between the two flushes, doing it 'before' leads to a
duplicate event, while doing it 'after' leads to a missing event.
Both can be equally destructive, but a duplicate lets us add detection code
later to catch it, whereas a missing event can't be caught: the I/O thread
can produce legal position jumps, for example when it has ignored an event
originating from this slave (remember that, starting from 4.1.1, the I/O
thread filters events by server id).
parent 292b0a18
@@ -2084,6 +2084,30 @@ bool flush_master_info(MASTER_INFO* mi, bool flush_relay_log_cache)
   DBUG_ENTER("flush_master_info");
   DBUG_PRINT("enter",("master_pos: %ld", (long) mi->master_log_pos));
+  /*
+    Flush the relay log to disk. If we don't do it, then the relay log will
+    have some part (its last kilobytes) in memory only, so if the slave server
+    dies now, with, say, master positions 100 to 150 in memory only (not on
+    disk) and position 150 in master.info, then when the slave restarts, the
+    I/O thread will fetch binlogs from 150, so in the relay log we will have
+    "[0, 100] U [150, infinity[" and nobody will notice it; the SQL thread
+    will jump from 100 to 150, and replication will silently break.
+    When we get here, the relay log may or may not be initialized; the
+    caller is responsible for setting 'flush_relay_log_cache' accordingly.
+  */
+  if (flush_relay_log_cache)
+    flush_io_cache(mi->rli.relay_log.get_log_file());
+  /*
+    We flushed the relay log BEFORE the master.info file, because if we crash
+    now, we will get a duplicate event in the relay log at restart. If we
+    flushed in the other order, we would get a hole in the relay log.
+    And a duplicate is better than a hole (with a duplicate, in later versions
+    we can add detection and scrap one event; with a hole there's nothing we
+    can do).
+  */
   /*
     In certain cases this code may create master.info files that seem
     corrupted, because of extra lines filled with garbage at the end
@@ -2101,20 +2125,6 @@ bool flush_master_info(MASTER_INFO* mi, bool flush_relay_log_cache)
               (int)(mi->ssl), mi->ssl_ca, mi->ssl_capath, mi->ssl_cert,
               mi->ssl_cipher, mi->ssl_key);
   flush_io_cache(file);
-  /*
-    Flush the relay log to disk. If we don't do it, then the relay log will
-    have some part (its last kilobytes) in memory only, so if the slave server
-    dies now, with, say, master positions 100 to 150 in memory only (not on
-    disk) and position 150 in master.info, then when the slave restarts, the
-    I/O thread will fetch binlogs from 150, so in the relay log we will have
-    "[0, 100] U [150, infinity[" and nobody will notice it; the SQL thread
-    will jump from 100 to 150, and replication will silently break.
-    When we get here, the relay log may or may not be initialized; the
-    caller is responsible for setting 'flush_relay_log_cache' accordingly.
-  */
-  if (flush_relay_log_cache)
-    flush_io_cache(mi->rli.relay_log.get_log_file());
   DBUG_RETURN(0);
 }