In the slave I/O thread (in flush_master_info()), it seems less bad to flush the relay log before flushing master.info. Flushing the relay log first can lead to a duplicate event after a crash; flushing it afterwards can lead to a missing event. Both can be equally destructive, but a duplicate lets us add detection code later to catch it, whereas a missing event cannot be caught: the I/O thread can produce legal position jumps, for example when it has ignored an event that originated from this slave (remember that starting from 4.1.1, the I/O thread filters on the server id).
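The trade-off described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MySQL code: `ToySlave`, `run`, and the integer "positions" are hypothetical names invented for the illustration, and a crash is modelled simply as losing everything that was not flushed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model (not MySQL code): each event is identified by its
// master position, and a crash keeps only what was flushed to "disk".
struct ToySlave {
  std::vector<int> relay_on_disk;    // events durable in the relay log
  std::vector<int> relay_in_memory;  // events still in the I/O cache
  int info_on_disk = 0;              // next position stored in master.info
  int info_in_memory = 0;            // next position known to the I/O thread

  void receive(int pos) {
    assert(pos == info_in_memory);   // the I/O thread fetches in order
    relay_in_memory.push_back(pos);
    info_in_memory = pos + 1;
  }
  void flush_relay() {
    relay_on_disk.insert(relay_on_disk.end(),
                         relay_in_memory.begin(), relay_in_memory.end());
    relay_in_memory.clear();
  }
  void flush_info() { info_on_disk = info_in_memory; }
  void crash() {                     // unflushed state is lost
    relay_in_memory.clear();
    info_in_memory = info_on_disk;
  }
};

// Receive events 0..2 and flush both files, receive events 3..4, crash
// between the two flushes, then restart and fetch up to event 5.
std::vector<int> run(bool relay_first) {
  ToySlave s;
  for (int p = 0; p < 3; ++p) s.receive(p);
  s.flush_relay();
  s.flush_info();
  for (int p = 3; p < 5; ++p) s.receive(p);
  if (relay_first) s.flush_relay(); else s.flush_info();
  s.crash();
  for (int p = s.info_on_disk; p < 6; ++p) s.receive(p);
  s.flush_relay();
  s.flush_info();
  return s.relay_on_disk;
}
```

In this toy run, `run(true)` (relay log flushed first) yields 0 1 2 3 4 3 4 5, so events 3 and 4 appear twice, while `run(false)` (master.info flushed first) yields 0 1 2 5, so events 3 and 4 are silently missing.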

Commit 817752f0 authored Nov 28, 2003 by unknown

Showing with 24 additions and 14 deletions

sql/slave.cc +24 -14

--- a/sql/slave.cc
+++ b/sql/slave.cc
@@ -2084,6 +2084,30 @@ bool flush_master_info(MASTER_INFO* mi, bool flush_relay_log_cache)
  DBUG_ENTER("flush_master_info");
  DBUG_PRINT("enter",("master_pos: %ld", (long) mi->master_log_pos));

+  /*
+    Flush the relay log to disk. If we don't do it, then the relay log will
+    have some part (its last kilobytes) in memory only, so if the slave server
+    dies now, with, say, from master's position 100 to 150 in memory only (not
+    on disk), and with position 150 in master.info, then when the slave
+    restarts, the I/O thread will fetch binlogs from 150, so in the relay log
+    we will have "[0, 100] U [150, infinity[" and nobody will notice it, so the
+    SQL thread will jump from 100 to 150, and replication will silently break.
+
+    When we come to this place in code, relay log may or may not be initialized;
+    the caller is responsible for setting 'flush_relay_log_cache' accordingly.
+  */
+  if (flush_relay_log_cache)
+    flush_io_cache(mi->rli.relay_log.get_log_file());
+
+  /*
+    We flushed the relay log BEFORE the master.info file, because if we crash
+    now, we will get a duplicate event in the relay log at restart. If we
+    flushed in the other order, we would get a hole in the relay log.
+    And duplicate is better than hole (with a duplicate, in later versions we
+    can add detection and scrap one event; with a hole there's nothing we can
+    do).
+  */
+
  /*
     In certain cases this code may create master.info files that seems 
     corrupted, because of extra lines filled with garbage in the end 
@@ -2101,20 +2125,6 @@ bool flush_master_info(MASTER_INFO* mi, bool flush_relay_log_cache)
              (int)(mi->ssl), mi->ssl_ca, mi->ssl_capath, mi->ssl_cert,
              mi->ssl_cipher, mi->ssl_key);
  flush_io_cache(file);
-  /*
-    Flush the relay log to disk. If we don't do it, then the relay log will
-    have some part (its last kilobytes) in memory only, so if the slave server
-    dies now, with, say, from master's position 100 to 150 in memory only (not
-    on disk), and with position 150 in master.info, then when the slave
-    restarts, the I/O thread will fetch binlogs from 150, so in the relay log
-    we will have "[0, 100] U [150, infinity[" and nobody will notice it, so the
-    SQL thread will jump from 100 to 150, and replication will silently break.
-
-    When we come to this place in code, relay log may or may not be initialized;
-    the caller is responsible for setting 'flush_relay_log_cache' accordingly.
-  */
-  if (flush_relay_log_cache)
-    flush_io_cache(mi->rli.relay_log.get_log_file());
  DBUG_RETURN(0);
 }