Commit 6e10224d authored by unknown's avatar unknown

* Fix for a potential bug:

when the SQL thread stops, set rli->inside_transaction to 0. This is needed if the user
later restarts replication from a completely different place where there are only autocommit
statements.
* Detect the case where the master died while flushing the binlog cache to the binlog
and stop with error. Cannot add a testcase for this in 4.0 (I tested it manually)
as the slave always runs with --skip-innodb.


sql/log_event.cc:
  Detect the case where the master died while flushing the binlog cache to the binlog:
  in that case, we have a BEGIN with no COMMIT/ROLLBACK in the relay log; we detect
  this with rli->inside_transaction in Rotate_log_event::exec_event() (which is the
  only right place to detect this, see comments). When we see it, we stop with error.
  In 4.1, I had put code in Start_log_event::exec_event(); I'll remove it next time
  I push in the 4.1 tree.
sql/slave.cc:
  * Use slave_print_error instead of sql_print_error, to put the info in SHOW SLAVE STATUS too.
  * Fix for a potential bug:
  when the SQL thread stops, set rli->inside_transaction to 0. This is not needed if
  replication later restarts from the same position; but this is needed if the user
  restarts replication from a completely different place where there are only autocommit
  statements (in that case, if we didn't set to 0, the position would never increment in SHOW
  SLAVE STATUS, even if queries are processed well).
parent e3563c79
...@@ -2068,9 +2068,6 @@ Fatal error running LOAD DATA INFILE on table '%s'. Default database: '%s'", ...@@ -2068,9 +2068,6 @@ Fatal error running LOAD DATA INFILE on table '%s'. Default database: '%s'",
TODO TODO
- Remove all active user locks - Remove all active user locks
- If we have an active transaction at this point, the master died
in the middle while writing the transaction to the binary log.
In this case we should stop the slave.
*/ */
int Start_log_event::exec_event(struct st_relay_log_info* rli) int Start_log_event::exec_event(struct st_relay_log_info* rli)
...@@ -2098,7 +2095,9 @@ int Start_log_event::exec_event(struct st_relay_log_info* rli) ...@@ -2098,7 +2095,9 @@ int Start_log_event::exec_event(struct st_relay_log_info* rli)
break; break;
case BINLOG_FORMAT_323_GEQ_57 : case BINLOG_FORMAT_323_GEQ_57 :
/* Can distinguish, based on the value of 'created' */ /* Can distinguish, based on the value of 'created' */
if (created) /* this was generated at master startup*/ if (!created)
break;
/* otherwise this was generated at master startup*/
close_temporary_tables(thd); close_temporary_tables(thd);
break; break;
default : default :
...@@ -2156,10 +2155,28 @@ int Stop_log_event::exec_event(struct st_relay_log_info* rli) ...@@ -2156,10 +2155,28 @@ int Stop_log_event::exec_event(struct st_relay_log_info* rli)
We can't rotate the slave as this will cause infinitive rotations We can't rotate the slave as this will cause infinitive rotations
in a A -> B -> A setup. in a A -> B -> A setup.
NOTES
As a transaction NEVER spans on 2 or more binlogs:
if we have an active transaction at this point, the master died while
writing the transaction to the binary log, i.e. while flushing the binlog
cache to the binlog. As the write was started, the transaction had been
committed on the master, so we lack of information to replay this
transaction on the slave; all we can do is stop with error.
If we didn't detect it, then positions would start to become garbage (as we
are incrementing rli->relay_log_pos whereas we are in a transaction: the new
rli->relay_log_pos will be
relay_log_pos of the BEGIN + size of the Rotate event = garbage.
Since MySQL 4.0.14, the master ALWAYS sends a Rotate event when it starts
sending the next binlog, so we are sure to receive a Rotate event just
after the end of the "dead master"'s binlog; so this exec_event() is the
right place to catch the problem. If we would wait until
Start_log_event::exec_event() it would be too late, rli->relay_log_pos would
already be garbage.
RETURN VALUES RETURN VALUES
0 ok 0 ok
*/ */
int Rotate_log_event::exec_event(struct st_relay_log_info* rli) int Rotate_log_event::exec_event(struct st_relay_log_info* rli)
{ {
...@@ -2167,6 +2184,18 @@ int Rotate_log_event::exec_event(struct st_relay_log_info* rli) ...@@ -2167,6 +2184,18 @@ int Rotate_log_event::exec_event(struct st_relay_log_info* rli)
DBUG_ENTER("Rotate_log_event::exec_event"); DBUG_ENTER("Rotate_log_event::exec_event");
pthread_mutex_lock(&rli->data_lock); pthread_mutex_lock(&rli->data_lock);
if (rli->inside_transaction)
{
slave_print_error(rli, 0,
"there is an unfinished transaction in the relay log \
(could find neither COMMIT nor ROLLBACK in the relay log); it could be that \
the master died while writing the transaction to its binary log. Now the slave \
is rolling back the transaction.");
pthread_mutex_unlock(&rli->data_lock);
DBUG_RETURN(1);
}
memcpy(log_name, new_log_ident, ident_len+1); memcpy(log_name, new_log_ident, ident_len+1);
rli->master_log_pos = pos; rli->master_log_pos = pos;
rli->relay_log_pos += get_event_len(); rli->relay_log_pos += get_event_len();
......
...@@ -2260,7 +2260,7 @@ static int exec_relay_log_event(THD* thd, RELAY_LOG_INFO* rli) ...@@ -2260,7 +2260,7 @@ static int exec_relay_log_event(THD* thd, RELAY_LOG_INFO* rli)
} }
else else
{ {
sql_print_error("\ slave_print_error(rli, 0, "\
Could not parse relay log event entry. The possible reasons are: the master's \ Could not parse relay log event entry. The possible reasons are: the master's \
binary log is corrupted (you can check this by running 'mysqlbinlog' on the \ binary log is corrupted (you can check this by running 'mysqlbinlog' on the \
binary log), the slave's relay log is corrupted (you can check this by running \ binary log), the slave's relay log is corrupted (you can check this by running \
...@@ -2695,6 +2695,12 @@ the slave SQL thread with \"SLAVE START\". We stopped at log \ ...@@ -2695,6 +2695,12 @@ the slave SQL thread with \"SLAVE START\". We stopped at log \
DBUG_ASSERT(rli->slave_running == 1); // tracking buffer overrun DBUG_ASSERT(rli->slave_running == 1); // tracking buffer overrun
/* When master_pos_wait() wakes up it will check this and terminate */ /* When master_pos_wait() wakes up it will check this and terminate */
rli->slave_running= 0; rli->slave_running= 0;
/*
Going out of the transaction. Necessary to mark it, in case the user
restarts replication from a non-transactional statement (with CHANGE
MASTER).
*/
rli->inside_transaction= 0;
/* Wake up master_pos_wait() */ /* Wake up master_pos_wait() */
pthread_mutex_unlock(&rli->data_lock); pthread_mutex_unlock(&rli->data_lock);
DBUG_PRINT("info",("Signaling possibly waiting master_pos_wait() functions")); DBUG_PRINT("info",("Signaling possibly waiting master_pos_wait() functions"));
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment