Commit bb04ac97 authored by Marc Alff's avatar Marc Alff

Bug#56324 Race Condition while shutting down MySQL: "PSI_server"

Before this fix, the server could crash during shutdown,
due to race conditions, that occured when killing the server.

In particular, the performance schema instrumentation handle,
PSI_server, and the performance schema itself would be cleaned up
too soon, causing race conditions with a running kill server thread.

The specifics of the race condition found are that:
the main thread executing "PSI_server= NULL" can cause crashes in
other threads still running, which are executing
"if (PSI_server != NULL) PSI_server->xxx()"
as part of the performance schema instrumentation.

While the bug was reported for the kill server thread,
in theory the same crash could happen with the signal thread,
as found by code analysis.

The correct fix would be to only shutdown the performance schema
and set PSI_server to NULL after every other thread is guaranteed
to be completed, including the kill_server_thread.

However, due to the existing mysqld server design, this is not the case.
See in particular bug number 56666.

The work around used to fix this race condition is to simply not
perform the call to shutdown_performance_schema() when the server exits,
and to keep the PSI_server pointer unchanged.

This will cause memory leaks to be reported by tools like valgrind,
but no memory leak actually happen because the process is about to exit().

As a result, the file mysql-test/valgrind.supp has been updated
to filter out these false positive messages.

This code has been tested with running in a loop the following
tests in parallel, which have been known to fail with race conditions
in the past:
- rpl_change_master
- binlog_max_extension
- events_restart
- rpl_heartbeat_basic
and no crash of test failure has been seen with the changed code.
parent c4913bc3
......@@ -745,3 +745,28 @@
Memcheck:Addr1
fun:buf_buddy_relocate
}
#
# See related Bug#56666
# Race condition between the server main thread and the kill server thread.
#
# Because of this race condition, the call to shutdown_performance_schema()
# was commented in sql/mysqld.cc, causing the reported leaks.
#
{
missing shutdown_performance_schema 1
Memcheck:Leak
fun:malloc
fun:_Z10pfs_mallocmi
}
{
missing shutdown_performance_schema 2
Memcheck:Leak
fun:malloc
fun:my_malloc
fun:_lf_alloc_new
fun:lf_hash_insert
}
......@@ -1395,6 +1395,12 @@ static void mysqld_exit(int exit_code)
mysql_audit_finalize();
clean_up_mutexes();
clean_up_error_log_mutex();
#ifdef WITH_PERFSCHEMA_STORAGE_ENGINE
/*
Bug#56666 needs to be fixed before calling:
shutdown_performance_schema();
*/
#endif
my_end(opt_endinfo ? MY_CHECK_ERROR | MY_GIVE_INFO : 0);
exit(exit_code); /* purecov: inspected */
}
......@@ -2732,6 +2738,11 @@ pthread_handler_t signal_hand(void *arg __attribute__((unused)))
if (!abort_loop)
{
abort_loop=1; // mark abort for threads
#ifdef HAVE_PSI_INTERFACE
/* Delete the instrumentation for the signal thread */
if (likely(PSI_server != NULL))
PSI_server->delete_current_thread();
#endif
#ifdef USE_ONE_SIGNAL_HAND
pthread_t tmp;
if (mysql_thread_create(0, /* Not instrumented */
......@@ -4587,6 +4598,15 @@ int mysqld_main(int argc, char **argv)
#endif
#endif /* __WIN__ */
#ifdef HAVE_PSI_INTERFACE
/*
Disable the main thread instrumentation,
to avoid recording events during the shutdown.
*/
if (PSI_server)
PSI_server->delete_current_thread();
#endif
/* Wait until cleanup is done */
mysql_mutex_lock(&LOCK_thread_count);
while (!ready_to_exit)
......@@ -4604,18 +4624,6 @@ int mysqld_main(int argc, char **argv)
}
#endif
clean_up(1);
#ifdef HAVE_PSI_INTERFACE
/*
Disable the instrumentation, to avoid recording events
during the shutdown.
*/
if (PSI_server)
{
PSI_server->delete_current_thread();
PSI_server= NULL;
}
shutdown_performance_schema();
#endif
mysqld_exit(0);
}
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment