Commit 577c61e8 authored by Marko Mäkelä's avatar Marko Mäkelä

MDEV-23888: Potential server hang on replication with InnoDB

In MDEV-21452, SAFE_MUTEX flagged an ordering problem that involved
trx_t::mutex, LOCK_global_system_variables, and LOCK_commit_ordered
when running
./mtr --no-reorder\
 binlog.binlog_checksum,mix binlog.binlog_commit_wait,mix

Because LOCK_commit_ordered is acquired by replication code before
innobase_commit_ordered() is invoked, and because LOCK_commit_ordered
should be below LOCK_global_system_variables in the global latching
order, it turns out that we must avoid acquiring
LOCK_global_system_variables in any low-level code.

It also turns out that lock_rec_lock() acquires lock_sys_t::mutex
and then carries on to call lock_rec_enqueue_waiting(), which may
invoke THDVAR() via thd_lock_wait_timeout(). This call is problematic
if THDVAR() had never been invoked in that thread earlier.

innobase_trx_init(): Let us invoke THDVAR() at the start of an InnoDB
transaction so that future invocations of THDVAR() will avoid
LOCK_global_system_variables acquisition on the same THD. Because
the first call to intern_sys_var_ptr() will initialize all session
variables by not passing the offset to sync_dynamic_session_variables(),
this will indeed make any future THDVAR() invocation mutex-free.

There are some THDVAR() calls in other code (related to indexed virtual
columns, fulltext indexes, and DDL operations). No SAFE_MUTEX warning
was known for those, but there does not appear to be any replication
test coverage for indexed virtual columns or fulltext indexes. DDL should
be covered, and perhaps DDL code paths were already invoking THDVAR()
while not holding any InnoDB mutex.

Side note: MySQL should avoid this type of deadlocks since
mysql/mysql-server@4d275c89954685e2ed1b368812b3b5a29ddf9389.
MariaDB never defined alloc_and_copy_thd_dynamic_variables(),
because we prefer to avoid overhead during connection creation.

An important part of the deadlock could be the current handling of
SET GLOBAL binlog_checksum=NONE; and similar assignments.
In binlog_checksum_update(), we would hold LOCK_global_system_variables
while potentially acquiring LOCK_commit_ordered in MYSQL_BIN_LOG::open().
Even if that code was changed later to release
LOCK_global_system_variables during the write to mysql_bin_log,
it could be a good idea for performance to avoid invoking the
expensive code path of THDVAR() while holding any InnoDB mutexes,
such as lock_sys.mutex in lock_rec_enqueue_waiting().

Thanks to Andrei Elkin for debugging the SAFE_MUTEX issue, and to
Sergei Golubchik for the suggestion to invoke THDVAR() early.
parent 01ffccd6
...@@ -2717,6 +2717,12 @@ innobase_trx_init( ...@@ -2717,6 +2717,12 @@ innobase_trx_init(
DBUG_ENTER("innobase_trx_init"); DBUG_ENTER("innobase_trx_init");
DBUG_ASSERT(thd == trx->mysql_thd); DBUG_ASSERT(thd == trx->mysql_thd);
/* Ensure that thd_lock_wait_timeout(), which may be called
while holding lock_sys.mutex, by lock_rec_enqueue_waiting(),
will not end up acquiring LOCK_global_system_variables in
intern_sys_var_ptr(). */
THDVAR(thd, lock_wait_timeout);
trx->check_foreigns = !thd_test_options( trx->check_foreigns = !thd_test_options(
thd, OPTION_NO_FOREIGN_KEY_CHECKS); thd, OPTION_NO_FOREIGN_KEY_CHECKS);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment