MDEV-33669 mariabackup --backup hangs

This is a server hang, not an issue with backup. While concurrent
DDLs in the server are hung, mariabackup waits for them to finish
while trying to acquire MDL_BACKUP_BLOCK_DDL.

The server hang is serious and is caused by the thread pool state being
incorrectly set to "thread creation pending" while no creation is
actually pending. Once a thread pool reaches this state, no new threads
are created in the pool.

While this could affect any thread pool in the server, the InnoDB
thread pool is the victim in the current bug: an IO job is blocked
because the pool is stuck with far fewer threads than intended.
The available workers are blocked in purge, waiting for a page lock to
be released by the IO write (SX lock), causing a complete deadlock.

The issue is caused by the state variable m_thread_creation_pending
introduced by MDEV-31095: 9e62ab7a. We check and set the variable
early when attempting to create a new thread in the pool, but fail to
reset it if we exit the flow for other reasons, such as the maximum
number of threads being reached or taking the thread creation
throttling path.
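
The failure mode described above can be reproduced in a few lines. This is a
minimal sketch only: toy_pool, its members and demo_stuck_pool() are
hypothetical names, not the tpool code, and a counter increment stands in for
starting a real worker thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for a thread pool (not the MariaDB code).
struct toy_pool
{
  std::atomic_flag creation_pending = ATOMIC_FLAG_INIT;
  std::size_t n_threads = 0;
  std::size_t max_threads = 2;

  // Buggy ordering: the flag is set before the early-exit checks.
  bool add_thread_buggy()
  {
    if (creation_pending.test_and_set())
      return false;              // a creation is (supposedly) already pending
    if (n_threads >= max_threads)
      return false;              // BUG: flag stays set, yet nothing is pending
    ++n_threads;                 // stands in for starting a worker thread
    creation_pending.clear();    // assumption: the new worker clears the flag
    return true;
  }
};

// Drives the pool into the stuck state and checks that it never recovers.
inline void demo_stuck_pool()
{
  toy_pool pool;
  pool.max_threads = 1;
  assert(pool.add_thread_buggy());   // first worker is created
  assert(!pool.add_thread_buggy());  // limit reached, flag is left set
  pool.max_threads = 4;              // even with room for more workers...
  assert(!pool.add_thread_buggy());  // ...every later attempt returns early
}
```

After the second call the flag is never cleared, so the pool stays at one
worker regardless of how much headroom it is given later.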

Fix: The simple fix is to make sure that the state is reset back in case
we don't actually attempt to create the thread.
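
A minimal sketch of the corrected ordering, under the same kind of
simplification: fixed_pool, start_worker() and fail_next_start are
hypothetical names used for illustration, and the worker start is simulated
with a counter. The flag is taken only after the early-exit checks and is
cleared again if the start attempt fails.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical, simplified stand-in for a thread pool (not the MariaDB code).
struct fixed_pool
{
  std::atomic_flag creation_pending = ATOMIC_FLAG_INIT;
  std::size_t n_threads = 0;
  std::size_t max_threads = 2;
  bool fail_next_start = false;  // test hook: simulate a failed thread start

  void start_worker()
  {
    if (fail_next_start)
    {
      fail_next_start = false;
      throw std::runtime_error("cannot start thread");
    }
    ++n_threads;
    creation_pending.clear();    // assumption: the new worker clears the flag
  }

  bool add_thread()
  {
    if (n_threads >= max_threads)
      return false;              // early exit happens before the flag is set
    if (creation_pending.test_and_set())
      return false;              // another creation really is pending
    try
    {
      start_worker();
    }
    catch (const std::exception &)
    {
      creation_pending.clear();  // creation failed: reset the flag
      return false;
    }
    return true;
  }
};

// A failed start or a reached limit must not block later attempts.
inline void demo_fixed_pool()
{
  fixed_pool pool;
  pool.max_threads = 1;
  pool.fail_next_start = true;
  assert(!pool.add_thread());    // start fails, flag is reset
  assert(pool.add_thread());     // retry succeeds
  assert(!pool.add_thread());    // limit reached, flag untouched
  pool.max_threads = 2;
  assert(pool.add_thread());     // pool can still grow
}
```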
parent ef7a2344
@@ -722,9 +722,6 @@ static int throttling_interval_ms(size_t n_threads,size_t concurrency)
 /* Create a new worker.*/
 bool thread_pool_generic::add_thread()
 {
-  if (m_thread_creation_pending.test_and_set())
-    return false;
   size_t n_threads = thread_count();
   if (n_threads >= m_max_threads)
@@ -750,6 +747,14 @@ bool thread_pool_generic::add_thread()
     }
   }
+  /* Check and set "thread creation pending" flag before creating the thread. We
+  reset the flag in thread_pool_generic::worker_main in new thread created. The
+  flag must be reset back in case we fail to create the thread. If this flag is
+  not reset all future attempt to create thread for this pool would not work as
+  we would return from here. */
+  if (m_thread_creation_pending.test_and_set())
+    return false;
   worker_data *thread_data = m_thread_data_cache.get();
   m_active_threads.push_back(thread_data);
   try
@@ -769,6 +774,7 @@ bool thread_pool_generic::add_thread()
         "current number of threads in pool %zu\n", e.what(), thread_count());
       warning_written = true;
     }
+    m_thread_creation_pending.clear();
     return false;
   }
   return true;