Commit e039720b authored by Marko Mäkelä

MDEV-32096 Parallel replication lags because innobase_kill_query() may fail to interrupt a lock wait

lock_sys_t::cancel(trx_t*): Remove, and merge to its only caller
innobase_kill_query().

innobase_kill_query(): Before reading trx->lock.wait_lock,
do acquire lock_sys.wait_mutex, like we did before
commit e71e6133 (MDEV-24671).
In this way, we should not miss a recently started lock wait
by the killee transaction.
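
The ordering requirement described above can be illustrated with a minimal standalone sketch. This is not InnoDB code: `Trx`, `Lock`, `Err`, `wait_for_lock()` and `kill_query()` are hypothetical stand-ins, and a plain `std::mutex` plays the role of lock_sys.wait_mutex. It only shows why the killer should read the waiter's `wait_lock` while holding the same mutex under which the waiter publishes it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for illustration only; these are not InnoDB types.
struct Lock {};
enum class Err { Success, Interrupted };

struct Trx {
  Lock *wait_lock = nullptr;        // protected by wait_mutex
  Err error_state = Err::Success;   // written by the killer under wait_mutex
};

std::mutex wait_mutex;              // models lock_sys.wait_mutex
std::condition_variable wait_cond;  // wakes up a waiting transaction

// Waiter side: publish wait_lock under the mutex, then sleep until
// another thread clears it (lock granted or wait canceled).
Err wait_for_lock(Trx &trx, Lock &lock) {
  std::unique_lock<std::mutex> guard(wait_mutex);
  trx.wait_lock = &lock;
  wait_cond.wait(guard, [&] { return trx.wait_lock == nullptr; });
  return trx.error_state;
}

// Killer side: acquire the mutex BEFORE reading wait_lock, so the read
// cannot race with a waiter that is just registering its wait.
// Returns true if a pending wait was found and canceled.
bool kill_query(Trx &trx) {
  std::lock_guard<std::mutex> guard(wait_mutex);
  if (!trx.wait_lock)
    return false;                   // not waiting for any lock
  trx.error_state = Err::Interrupted;
  trx.wait_lock = nullptr;
  wait_cond.notify_all();
  return true;
}
```

In this model, a killer that checked `trx.wait_lock` without the mutex could observe a stale null pointer and return without canceling a wait that starts immediately afterwards.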

lock_rec_lock(): Add a DEBUG_SYNC "lock_rec" for the test case.

lock_wait(): Invoke trx_is_interrupted() before entering the wait,
in case innobase_kill_query() was invoked some time earlier and
some longer-running operation did not check for interrupts.
As suggested by Vladislav Lesin, do not overwrite
trx->error_state==DB_INTERRUPTED with DB_SUCCESS.
This would avoid a call to trx_is_interrupted() when the test is
modified to use the DEBUG_SYNC point lock_wait_start instead of lock_rec.
Avoid some redundant loads of trx->lock.wait_lock; cache the value
in the local variable wait_lock.
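
A rough sketch of the check-before-wait logic described for lock_wait(), under stated assumptions: `Trx`, `Err`, `is_interrupted()` and `lock_wait_sketch()` are hypothetical names, the `killed` flag merely stands in for whatever trx_is_interrupted() consults, and the blocking wait is passed in as a callable. The real function is considerably more involved.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for illustration only; these are not InnoDB types.
enum class Err { Success, Interrupted };

struct Trx {
  std::atomic<bool> killed{false};  // stand-in for the state behind trx_is_interrupted()
  Err error_state = Err::Success;
};

bool is_interrupted(const Trx &trx) { return trx.killed.load(); }

// do_wait is a stand-in for the actual blocking wait; it returns the
// outcome of the wait.
template <typename WaitFn>
Err lock_wait_sketch(Trx &trx, WaitFn do_wait) {
  // An interrupt already recorded in error_state is preserved, not
  // overwritten with Success, and no further interrupt check is needed.
  if (trx.error_state == Err::Interrupted)
    return Err::Interrupted;
  // A kill may have arrived while a longer-running operation was not
  // checking for interrupts; notice it before going to sleep.
  if (is_interrupted(trx)) {
    trx.error_state = Err::Interrupted;
    return trx.error_state;
  }
  trx.error_state = do_wait();
  return trx.error_state;
}
```

In both early-return paths the sketch never enters the wait, which corresponds to the intent of checking for interrupts before the wait begins.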

Deadlock::check_and_resolve(): Take wait_lock as a parameter and
return wait_lock (or -1 or nullptr). We only need to reload
trx->lock.wait_lock if lock_sys.wait_mutex had been released
and reacquired.

trx_t::error_state: Correctly document the data member.

trx_lock_t::was_chosen_as_deadlock_victim: Clarify that other threads
may set the field (or flags in it) while holding lock_sys.wait_mutex.

Thanks to Johannes Baumgarten for reporting the problem and testing
the fix, as well as to Kristian Nielsen for suggesting the fix.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
parent 0dd25f28
@@ -5023,7 +5023,11 @@ static void innobase_kill_query(handlerton*, THD *thd, enum thd_kill_levels)
 if (trx_t* trx= thd_to_trx(thd))
 {
 ut_ad(trx->mysql_thd == thd);
-if (!trx->lock.wait_lock);
+mysql_mutex_lock(&lock_sys.wait_mutex);
+lock_t *lock= trx->lock.wait_lock;
+if (!lock)
+/* The transaction is not waiting for any lock. */;
 #ifdef WITH_WSREP
 else if (trx->is_wsrep() && wsrep_thd_is_aborting(thd))
 /* if victim has been signaled by BF thread and/or aborting is already
@@ -5031,7 +5035,18 @@ static void innobase_kill_query(handlerton*, THD *thd, enum thd_kill_levels)
 Also, BF thread should own trx mutex for the victim. */;
 #endif /* WITH_WSREP */
 else
-lock_sys_t::cancel(trx);
+{
+if (!trx->dict_operation)
+{
+/* Dictionary transactions must be immune to KILL, because they
+may be executed as part of a multi-transaction DDL operation, such
+as rollback_inplace_alter_table() or ha_innobase::delete_table(). */;
+trx->error_state= DB_INTERRUPTED;
+lock_sys_t::cancel<false>(trx, lock);
+}
+lock_sys.deadlock_check();
+}
+mysql_mutex_unlock(&lock_sys.wait_mutex);
 }
 DBUG_VOID_RETURN;
......
@@ -898,8 +898,6 @@ class lock_sys_t
 @retval DB_LOCK_WAIT if the lock was canceled */
 template<bool check_victim>
 static dberr_t cancel(trx_t *trx, lock_t *lock);
-/** Cancel a waiting lock request (if any) when killing a transaction */
-static void cancel(trx_t *trx);
 /** Note that a record lock wait started */
 inline void wait_start();
......
@@ -336,7 +336,10 @@ struct trx_lock_t
 #if defined(UNIV_DEBUG) || !defined(DBUG_OFF)
 /** 2=high priority WSREP thread has marked this trx to abort;
-1=another transaction chose this as a victim in deadlock resolution. */
+1=another transaction chose this as a victim in deadlock resolution.
+Other threads than the one that is executing the transaction may set
+flags in this while holding lock_sys.wait_mutex. */
 Atomic_relaxed<byte> was_chosen_as_deadlock_victim;
 /** Flag the lock owner as a victim in Galera conflict resolution. */
@@ -355,13 +358,14 @@ struct trx_lock_t
 #else /* defined(UNIV_DEBUG) || !defined(DBUG_OFF) */
 /** High priority WSREP thread has marked this trx to abort or
-another transaction chose this as a victim in deadlock resolution. */
+another transaction chose this as a victim in deadlock resolution.
+Other threads than the one that is executing the transaction may set
+this while holding lock_sys.wait_mutex. */
 Atomic_relaxed<bool> was_chosen_as_deadlock_victim;
 /** Flag the lock owner as a victim in Galera conflict resolution. */
-void set_wsrep_victim() {
-was_chosen_as_deadlock_victim= true;
-}
+void set_wsrep_victim() { was_chosen_as_deadlock_victim= true; }
 #endif /* defined(UNIV_DEBUG) || !defined(DBUG_OFF) */
 /** Next available rec_pool[] entry */
@@ -806,11 +810,13 @@ struct trx_t : ilist_node<>
 /*!< how many tables the current SQL
 statement uses, except those
 in consistent read */
-dberr_t error_state; /*!< 0 if no error, otherwise error
-number; NOTE That ONLY the thread
-doing the transaction is allowed to
-set this field: this is NOT protected
-by any mutex */
+/** DB_SUCCESS or error code; usually only the thread that is running
+the transaction is allowed to modify this field. The only exception is
+when a thread invokes lock_sys_t::cancel() in order to abort a
+lock_wait(). That is protected by lock_sys.wait_mutex and lock.wait_lock. */
+dberr_t error_state;
 const dict_index_t*error_info; /*!< if the error number indicates a
 duplicate key error, a pointer to
 the problematic index is stored here */
......
This diff is collapsed.