MDEV-32096 Parallel replication lags because innobase_kill_query() may fail to...

MDEV-32096 Parallel replication lags because innobase_kill_query() may fail to interrupt a lock wait

lock_sys_t::cancel(trx_t*): Remove, and merge to its only caller
innobase_kill_query().

innobase_kill_query(): Before reading trx->lock.wait_lock,
do acquire lock_sys.wait_mutex, like we did before
commit e71e6133 (MDEV-24671).
In this way, we should not miss a recently started lock wait
by the killee transaction.

lock_rec_lock(): Add a DEBUG_SYNC "lock_rec" for the test case.

lock_wait(): Invoke trx_is_interrupted() before entering the wait,
in case innobase_kill_query() was invoked some time earlier and
some longer-running operation did not check for interrupts.
As suggested by Vladislav Lesin, do not overwrite
trx->error_state==DB_INTERRUPTED with DB_SUCCESS.
This would avoid a call to trx_is_interrupted() when the test
is modified to use the DEBUG_SYNC point lock_wait_start
instead of lock_rec.
Avoid some redundant loads of trx->lock.wait_lock; cache the value
in the local variable wait_lock.

Deadlock::check_and_resolve(): Take wait_lock as a parameter and
return wait_lock (or -1 or nullptr). We only need to reload
trx->lock.wait_lock if lock_sys.wait_mutex had been released
and reacquired.

trx_t::error_state: Correctly document the data member.

trx_lock_t::was_chosen_as_deadlock_victim: Clarify that other threads
may set the field (or flags in it) while holding lock_sys.wait_mutex.

Thanks to Johannes Baumgarten for reporting the problem and testing
the fix, as well as to Kristian Nielsen for suggesting the fix.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
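
The locking protocol described in this message can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the InnoDB code: Trx, LockSys, Err, kill_requested and kill_interrupts_wait() are hypothetical stand-ins, a boolean replaces the trx->lock.wait_lock pointer, and the kill flag is set under the mutex for simplicity. It shows why the killer must hold the wait mutex while inspecting the wait state, why the waiter checks for an earlier kill before blocking, and why an interrupted error state is not reset to success.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for illustration; none of these are InnoDB types.
enum class Err { Success, Interrupted, LockWaitTimeout };

struct Trx {
  bool wait_lock = false;         // models trx->lock.wait_lock != nullptr
  Err error_state = Err::Success; // models trx->error_state
  bool kill_requested = false;    // models what trx_is_interrupted() would see
};

struct LockSys {
  std::mutex wait_mutex;          // models lock_sys.wait_mutex
  std::condition_variable cond;
};

// Waiter side: register the wait and block, all under wait_mutex.
Err lock_wait(LockSys &sys, Trx &trx, std::chrono::milliseconds timeout) {
  std::unique_lock<std::mutex> guard(sys.wait_mutex);
  // Check for a kill that arrived before the wait was registered.
  if (trx.kill_requested) {
    trx.error_state = Err::Interrupted;
    return trx.error_state;
  }
  trx.wait_lock = true;
  const bool woken =
      sys.cond.wait_for(guard, timeout, [&] { return !trx.wait_lock; });
  if (!woken) {
    trx.wait_lock = false;
    trx.error_state = Err::LockWaitTimeout;
  }
  // error_state is returned as is: an Interrupted value set by the
  // killer is never overwritten with Success here.
  return trx.error_state;
}

// Killer side: hold wait_mutex before reading wait_lock, so that a wait
// registered a moment ago cannot be missed.
void kill_query(LockSys &sys, Trx &trx) {
  std::lock_guard<std::mutex> guard(sys.wait_mutex);
  trx.kill_requested = true;
  if (!trx.wait_lock)
    return; // not waiting; the pre-wait check in lock_wait() handles it
  trx.error_state = Err::Interrupted;
  trx.wait_lock = false; // cancel the wait
  sys.cond.notify_all();
}

// Runs one kill against one waiter; returns true if the waiter observed
// the interruption, regardless of which thread got the mutex first.
bool kill_interrupts_wait() {
  LockSys sys;
  Trx trx;
  Err result = Err::Success;
  std::thread waiter(
      [&] { result = lock_wait(sys, trx, std::chrono::seconds(30)); });
  kill_query(sys, trx);
  waiter.join();
  return result == Err::Interrupted;
}
```

In this model both interleavings end the same way: if kill_query() acquires the mutex first, the waiter sees the pending kill and never blocks; if the waiter registers first, kill_query() finds wait_lock set and cancels it. Reading wait_lock without the mutex, as the removed lock_sys_t::cancel(trx_t*) path effectively allowed, would leave a window in which the killer sees no wait and the waiter then blocks until the timeout.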

Commit e039720b authored Sep 11, 2023 by Marko Mäkelä
4 changed files
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -5023,7 +5023,11 @@ static void innobase_kill_query(handlerton*, THD *thd, enum thd_kill_levels)
  if (trx_t* trx= thd_to_trx(thd))
  {
    ut_ad(trx->mysql_thd == thd);
-    if (!trx->lock.wait_lock);
+    mysql_mutex_lock(&lock_sys.wait_mutex);
+    lock_t *lock= trx->lock.wait_lock;
+
+    if (!lock)
+      /* The transaction is not waiting for any lock. */;
 #ifdef WITH_WSREP
    else if (trx->is_wsrep() && wsrep_thd_is_aborting(thd))
      /* if victim has been signaled by BF thread and/or aborting is already
@@ -5031,7 +5035,18 @@ static void innobase_kill_query(handlerton*, THD *thd, enum thd_kill_levels)
      Also, BF thread should own trx mutex for the victim. */;
 #endif /* WITH_WSREP */
    else
-      lock_sys_t::cancel(trx);
+    {
+      if (!trx->dict_operation)
+      {
+        /* Dictionary transactions must be immune to KILL, because they
+        may be executed as part of a multi-transaction DDL operation, such
+        as rollback_inplace_alter_table() or ha_innobase::delete_table(). */;
+        trx->error_state= DB_INTERRUPTED;
+        lock_sys_t::cancel<false>(trx, lock);
+      }
+      lock_sys.deadlock_check();
+    }
+    mysql_mutex_unlock(&lock_sys.wait_mutex);
  }

  DBUG_VOID_RETURN;

--- a/storage/innobase/include/lock0lock.h
+++ b/storage/innobase/include/lock0lock.h
@@ -898,8 +898,6 @@ class lock_sys_t
  @retval DB_LOCK_WAIT  if the lock was canceled */
  template<bool check_victim>
  static dberr_t cancel(trx_t *trx, lock_t *lock);
-  /** Cancel a waiting lock request (if any) when killing a transaction */
-  static void cancel(trx_t *trx);

  /** Note that a record lock wait started */
  inline void wait_start();

--- a/storage/innobase/include/trx0trx.h
+++ b/storage/innobase/include/trx0trx.h
@@ -336,7 +336,10 @@ struct trx_lock_t

 #if  defined(UNIV_DEBUG) || !defined(DBUG_OFF)
  /** 2=high priority WSREP thread has marked this trx to abort;
-  1=another transaction chose this as a victim in deadlock resolution. */
+  1=another transaction chose this as a victim in deadlock resolution.
+
+  Other threads than the one that is executing the transaction may set
+  flags in this while holding lock_sys.wait_mutex. */
  Atomic_relaxed<byte> was_chosen_as_deadlock_victim;

  /** Flag the lock owner as a victim in Galera conflict resolution. */
@@ -355,13 +358,14 @@ struct trx_lock_t
 #else /* defined(UNIV_DEBUG) || !defined(DBUG_OFF) */

  /** High priority WSREP thread has marked this trx to abort or
-  another transaction chose this as a victim in deadlock resolution. */
+  another transaction chose this as a victim in deadlock resolution.
+
+  Other threads than the one that is executing the transaction may set
+  this while holding lock_sys.wait_mutex. */
  Atomic_relaxed<bool> was_chosen_as_deadlock_victim;

  /** Flag the lock owner as a victim in Galera conflict resolution. */
-  void set_wsrep_victim() {
-    was_chosen_as_deadlock_victim= true;
-  }
+  void set_wsrep_victim() { was_chosen_as_deadlock_victim= true; }
 #endif /* defined(UNIV_DEBUG) || !defined(DBUG_OFF) */

  /** Next available rec_pool[] entry */
@@ -806,11 +810,13 @@ struct trx_t : ilist_node<>
 					/*!< how many tables the current SQL
 					statement uses, except those
 					in consistent read */
-	dberr_t		error_state;	/*!< 0 if no error, otherwise error
-					number; NOTE That ONLY the thread
-					doing the transaction is allowed to
-					set this field: this is NOT protected
-					by any mutex */
+
+  /** DB_SUCCESS or error code; usually only the thread that is running
+  the transaction is allowed to modify this field. The only exception is
+  when a thread invokes lock_sys_t::cancel() in order to abort a
+  lock_wait(). That is protected by lock_sys.wait_mutex and lock.wait_lock. */
+  dberr_t error_state;
+
 	const dict_index_t*error_info;	/*!< if the error number indicates a
 					duplicate key error, a pointer to
 					the problematic index is stored here */

--- a/storage/innobase/lock/lock0lock.cc
+++ b/storage/innobase/lock/lock0lock.cc