Commit a9ead586 authored by Luis Soares

BUG#12400313 RELAY_LOG_SPACE_LIMIT IS NOT WORKING IN MANY CASES

BUG#64503: mysql frequently ignores --relay-log-space-limit

When the SQL thread goes to sleep, waiting for more events, it sets
the flag ignore_log_space_limit to true. This gives the IO thread a
chance to queue some more events, and ultimately the SQL thread will
be able to purge the log once it is rotated. At that point the SQL
thread resets ignore_log_space_limit to false. However, between the
time the SQL thread sets the flag and the time it resets it, the IO
thread keeps queuing events in the relay log, possibly going way over
the limit.
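
The overshoot described above can be illustrated with a small
single-threaded model. This is only a sketch: the struct, the helper
functions, the 100-byte event size and the starting total of 8200 are
assumptions made for illustration, not server code. Only the field
names are borrowed from Relay_log_info.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the pre-patch accounting.
struct OldRelaySpace {
  std::uint64_t log_space_limit;
  std::uint64_t log_space_total;
  bool ignore_log_space_limit;
};

// The I/O thread may queue when it is not over the limit, or when it
// has been told to ignore the limit.
inline bool io_may_queue(const OldRelaySpace &s) {
  return s.log_space_total <= s.log_space_limit || s.ignore_log_space_limit;
}

// Queue up to `events` events of `event_size` bytes while allowed.
// Returns the number of events actually queued.
inline int io_queue_events(OldRelaySpace &s, int events,
                           std::uint64_t event_size) {
  int queued = 0;
  for (int i = 0; i < events && io_may_queue(s); ++i) {
    s.log_space_total += event_size;  // the flag is NOT cleared here
    ++queued;
  }
  return queued;
}

// Scenario: space is already over the limit, the SQL thread goes to
// sleep and sets the flag, and no purge happens in the meantime.
// Returns how many bytes over the limit the relay log ends up.
inline std::uint64_t overshoot_after(int events) {
  OldRelaySpace s{8192, 8200, false};
  s.ignore_log_space_limit = true;  // SQL thread goes to sleep
  io_queue_events(s, events, 100);  // I/O thread keeps queuing
  return s.log_space_total - s.log_space_limit;
}
```

In this model the overshoot grows linearly with the number of events
the IO thread manages to queue before the flag is reset, which is the
unbounded growth the bug reports describe.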

This patch makes the IO and SQL threads synchronize when they reach
the space limit and ask for only one event at a time. The SQL thread
sets the ignore_log_space_limit flag, and the IO thread resets it to
false every time it processes one more event. In addition, every time
the SQL thread processes the next event and the limit has been
reached, it checks whether the IO thread should rotate. If it should,
it instructs the IO thread to rotate, giving the SQL thread a chance
to purge the logs (freeing space). Finally, this patch removes the
resetting of the ignore_log_space_limit flag from purge_first_log,
because the flag is now reset by the IO thread every time it processes
the next event when the limit has been reached.
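
The handshake described above can be sketched as a single-threaded
model. The field names follow the patch; the struct, the helper
functions, the `rotations` counter (standing in for a call to
rotate_relay_log) and the sizes used are assumptions made for
illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the post-patch state shared by both threads.
struct RelaySpace {
  std::uint64_t log_space_limit;
  std::uint64_t log_space_total;
  bool ignore_log_space_limit;
  bool sql_force_rotate_relay;
  int rotations;  // counts simulated relay log rotations
};

// SQL thread side: grant the I/O thread exactly one more event and
// optionally ask it to rotate first.
inline void sql_grant_one_event(RelaySpace &s, bool force_rotate) {
  s.sql_force_rotate_relay = force_rotate;
  s.ignore_log_space_limit = true;
}

// I/O thread side: returns true if the event was queued. A grant is
// consumed as soon as it is used, so the limit is honored again.
inline bool io_try_queue(RelaySpace &s, std::uint64_t event_size) {
  bool over_limit = s.log_space_limit < s.log_space_total;
  if (over_limit && !s.ignore_log_space_limit)
    return false;  // the real thread would block on log_space_cond
  if (s.ignore_log_space_limit) {
    if (s.sql_force_rotate_relay) {
      ++s.rotations;
      s.sql_force_rotate_relay = false;
    }
    s.ignore_log_space_limit = false;
  }
  s.log_space_total += event_size;
  return true;
}

// How many of `attempts` events get through after a single grant.
inline int accepted_after_one_grant(int attempts, bool force_rotate) {
  RelaySpace s{8192, 8200, false, false, 0};
  sql_grant_one_event(s, force_rotate);
  int accepted = 0;
  for (int i = 0; i < attempts; ++i)
    if (io_try_queue(s, 100)) ++accepted;
  return accepted;
}

// How many rotations a single grant triggers across two attempts.
inline int rotations_after_one_grant(bool force_rotate) {
  RelaySpace s{8192, 8200, false, false, 0};
  sql_grant_one_event(s, force_rotate);
  io_try_queue(s, 100);
  io_try_queue(s, 100);
  return s.rotations;
}
```

Because the IO side clears the flag itself, one grant lets through one
event regardless of how many times the IO thread retries, so purging
no longer needs to reset the flag.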

If the SQL thread is in a transaction, it cannot purge, so there is no
point in asking the IO thread to rotate. The only thing it can do is
ask for more events until the transaction is over (then it can ask the
IO thread to rotate and purge the log right away). Otherwise, there
would be a deadlock: the SQL thread would not be able to purge, and the
IO thread would not be able to queue the events that the SQL thread
needs to finish the transaction.
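
The decision the SQL thread makes when it goes to sleep can be written
as a small pure function. This is a sketch under assumptions: the
`SleepRequest` type and the function name are hypothetical, and
`in_group` stands in for the result of rli->is_in_group().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: what the SQL thread asks of the I/O thread.
struct SleepRequest {
  bool force_rotate;  // maps to sql_force_rotate_relay
  bool ignore_limit;  // maps to ignore_log_space_limit
};

// Decide what to request when the SQL thread runs out of events.
// A limit of 0 means no limit is configured.
inline SleepRequest request_on_sleep(std::uint64_t log_space_limit,
                                     std::uint64_t log_space_total,
                                     bool in_group) {
  SleepRequest r{false, false};
  if (log_space_limit != 0 && log_space_limit < log_space_total) {
    r.force_rotate = !in_group;  // rotating only helps if we can purge
    r.ignore_limit = true;       // always ask for one more event
  }
  return r;
}
```

Inside a transaction the request is "one more event, no rotation",
which keeps the SQL thread fed until the group ends. Outside a
transaction it is "rotate, then one more event", so the old file
becomes purgeable.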
parent acb4ba73
include/master-slave.inc
[connection master]
include/assert.inc [Assert that relay log space is close to the limit]
include/rpl_end.inc
--relay-log-space-limit=8192 --relay-log-purge --max-relay-log-size=4096
@@ -3194,8 +3194,6 @@ int MYSQL_BIN_LOG::purge_first_log(Relay_log_info* rli, bool included)
   pthread_mutex_lock(&rli->log_space_lock);
   rli->relay_log.purge_logs(to_purge_if_included, included,
                             0, 0, &rli->log_space_total);
-  // Tell the I/O thread to take the relay_log_space_limit into account
-  rli->ignore_log_space_limit= 0;
   pthread_mutex_unlock(&rli->log_space_lock);
   /*
@@ -189,6 +189,13 @@ class Relay_log_info : public Slave_reporting_capability
   ulonglong log_space_limit,log_space_total;
   bool ignore_log_space_limit;
+  /*
+    Used by the SQL thread to instruct the IO thread to rotate
+    the logs when the SQL thread needs to purge to release some
+    disk space.
+  */
+  bool sql_force_rotate_relay;
   /*
     When it commits, InnoDB internally stores the master log position it has
     processed so far; the position to store is the one of the end of the
@@ -1450,6 +1450,54 @@ Waiting for the slave SQL thread to free enough relay log space");
          !(slave_killed=io_slave_killed(thd,mi)) &&
          !rli->ignore_log_space_limit)
     pthread_cond_wait(&rli->log_space_cond, &rli->log_space_lock);
+  /*
+    Makes the IO thread read only one event at a time
+    until the SQL thread is able to purge the relay
+    logs, freeing some space.
+    Therefore, once the SQL thread processes this next
+    event, it goes to sleep (no more events in the queue),
+    sets ignore_log_space_limit=true and wakes the IO thread.
+    However, this event may have been enough already for
+    the SQL thread to purge some log files, freeing
+    rli->log_space_total.
+    This guarantees that the SQL and IO thread move
+    forward only one event at a time (to avoid deadlocks),
+    when the relay space limit is reached. It also
+    guarantees that when the SQL thread is prepared to
+    rotate (to be able to purge some logs), the IO thread
+    will know about it and will rotate.
+    NOTE: The ignore_log_space_limit is only set when the SQL
+    thread sleeps waiting for events.
+  */
+  if (rli->ignore_log_space_limit)
+  {
+#ifndef DBUG_OFF
+    {
+      char llbuf1[22], llbuf2[22];
+      DBUG_PRINT("info", ("log_space_limit=%s "
+                          "log_space_total=%s "
+                          "ignore_log_space_limit=%d "
+                          "sql_force_rotate_relay=%d",
+                          llstr(rli->log_space_limit,llbuf1),
+                          llstr(rli->log_space_total,llbuf2),
+                          (int) rli->ignore_log_space_limit,
+                          (int) rli->sql_force_rotate_relay));
+    }
+#endif
+    if (rli->sql_force_rotate_relay)
+    {
+      rotate_relay_log(rli->mi);
+      rli->sql_force_rotate_relay= false;
+    }
+    rli->ignore_log_space_limit= false;
+  }
   thd->exit_cond(save_proc_info);
   DBUG_RETURN(slave_killed);
 }
@@ -4260,19 +4308,45 @@ static Log_event* next_event(Relay_log_info* rli)
     constraint, because we do not want the I/O thread to block because of
     space (it's ok if it blocks for any other reason (e.g. because the
     master does not send anything). Then the I/O thread stops waiting
-    and reads more events.
-    The SQL thread decides when the I/O thread should take log_space_limit
-    into account again : ignore_log_space_limit is reset to 0
-    in purge_first_log (when the SQL thread purges the just-read relay
-    log), and also when the SQL thread starts. We should also reset
-    ignore_log_space_limit to 0 when the user does RESET SLAVE, but in
-    fact, no need as RESET SLAVE requires that the slave
+    and reads one more event and starts honoring log_space_limit again.
+    If the SQL thread needs more events to be able to rotate the log (it
+    might need to finish the current group first), then it can ask for one
+    more at a time. Thus we don't outgrow the relay log indefinitely,
+    but rather in a controlled manner, until the next rotate.
+    When the SQL thread starts it sets ignore_log_space_limit to false.
+    We should also reset ignore_log_space_limit to 0 when the user does
+    RESET SLAVE, but in fact, no need as RESET SLAVE requires that the slave
     be stopped, and the SQL thread sets ignore_log_space_limit to 0 when
     it stops.
   */
   pthread_mutex_lock(&rli->log_space_lock);
-  // prevent the I/O thread from blocking next times
-  rli->ignore_log_space_limit= 1;
+  /*
+    If we have reached the limit of the relay space and we
+    are going to sleep, waiting for more events:
+    1. If outside a group, SQL thread asks the IO thread
+       to force a rotation so that the SQL thread purges
+       logs next time it processes an event (thus space is
+       freed).
+    2. If in a group, SQL thread asks the IO thread to
+       ignore the limit and queues yet one more event
+       so that the SQL thread finishes the group and
+       is able to rotate and purge sometime soon.
+  */
+  if (rli->log_space_limit &&
+      rli->log_space_limit < rli->log_space_total)
+  {
+    /* force rotation if not in an unfinished group */
+    rli->sql_force_rotate_relay= !rli->is_in_group();
+    /* ask for one more event */
+    rli->ignore_log_space_limit= true;
+  }
   /*
     If the I/O thread is blocked, unblock it. Ok to broadcast
     after unlock, because the mutex is only destroyed in