Commit e81e55e7 authored by unknown's avatar unknown

A fix for Bug#29431 killing an insert delayed thread causes crash

No test case, since the bug requires a stress case with 30 INSERT DELAYED
threads and 1 killer thread to repeat. The patch is verified
manually.
Review fixes.

The server that is running DELAYED inserts would deadlock itself
or crash under high load if some of the delayed threads were KILLed
in the meanwhile.

The fix is to change internal lock acquisition order of delayed inserts
subsystem and to ensure that
Delayed_insert::table_list::db does not point to volatile memory in some 
cases.
For details, please see a comment for sql_insert.cc.


sql/sql_insert.cc:
  A fix for Bug#29431 killing an insert delayed thread causes crash
  
  1) The deadlock was caused by different lock acquisition order
  between delayed_get_table and handle_delayed_insert.
  
  delayed_get_table would:
  - acquire LOCK_delayed_create
  - create a new Delayed_insert instance
  - acquire instance mutex (di->mutex)
  - "lock the instance in memory" by 
  increasing di->locked_in_memory variable
  - start the delayed thread
  - release di->mutex
  - let the delayed thread open the delayed table
  - discover that the delayed thread was killed
  - try to unlock() the delayed insert instance in memory
  - in Delayed_insert::unlock() do
   * lock LOCK_delayed_insert
   * decrease locks_in_memory and discover it's 0
   * attempt to lock di->mutex to broadcast di->cond condition <-- deadlock
   here
  
  Meanwhile, the delayed thread would
   * lock di->mutex
   * open the table
   * get killed
   * not notice that and attempt to lock LOCK_delayed_insert
  to register itself in the delayed insert list <-- deadlock here.
  
  Simply put, delayed_get_table used to lock LOCK_delayed_insert and then 
  di->mutex, and handle_delayed_insert would lock di->mutex and then 
  LOCK_delayed_insert.
  
  Fixed by moving registration in the list of delayed insert threads from 
  handle_delayed_insert to delayed_get_table - so that now
  handle_delayed_insert doesn't need to acquire LOCK_delayed_insert mutex.
  
  
  2) di->table_list.db was copied by-pointer from table_list.db of the first
  producer -- the one who initiated creation of the delayed insert thread.
  This producer might be long gone when the member is needed
  (handle_delayed_insert:open_ltable),
  e.g. by having been killed with KILL CONNECTION statement.
  Fixed by using a pointer to the consumer's deep copy of THD::db.
  
  3) In find_handler, we shouldn't assume that Delayed_insert object
  already (or still) has a table opened all the time it is
  present in the delayed insert list. E.g. it's not the case
  when Delayed_insert decided to terminate, closed the table, but haven't
  yet unregistered from the list (see the end of handle_delayed_insert).
parent 6981af6f
......@@ -1705,8 +1705,8 @@ Delayed_insert *find_handler(THD *thd, TABLE_LIST *table_list)
Delayed_insert *tmp;
while ((tmp=it++))
{
if (!strcmp(tmp->thd.db,table_list->db) &&
!strcmp(table_list->table_name,tmp->table->s->table_name))
if (!strcmp(table_list->db, tmp->table_list.db) &&
!strcmp(table_list->table_name, tmp->table_list.table_name))
{
tmp->lock();
break;
......@@ -1739,7 +1739,27 @@ Delayed_insert *find_handler(THD *thd, TABLE_LIST *table_list)
Two latter cases indicate a request for lock upgrade.
XXX: why do we regard INSERT DELAYED into a view as an error and
do not simply a lock upgrade?
do not simply perform a lock upgrade?
TODO: The approach with using two mutexes to work with the
delayed thread list -- LOCK_delayed_insert and
LOCK_delayed_create -- is redundant, and we only need one of
them to protect the list. The reason we have two locks is that
we do not want to block look-ups in the list while we're waiting
for the newly created thread to open the delayed table. However,
this wait itself is redundant -- we always call get_local_table
later on, and there wait again until the created thread acquires
a table lock.
As is redundant the concept of locks_in_memory, since we already
have another counter with similar semantics - tables_in_use,
both of them are devoted to counting the number of producers for
a given consumer (delayed insert thread), only at different
stages of producer-consumer relationship.
'dead' and 'status' variables in Delayed_insert are redundant
too, since there is already 'di->thd.killed' and
di->stacked_inserts.
*/
static
......@@ -1788,7 +1808,9 @@ bool delayed_get_table(THD *thd, TABLE_LIST *table_list)
goto end_create;
}
tmp->table_list= *table_list; // Needed to open table
/* Replace volatile strings with local copies */
tmp->table_list.alias= tmp->table_list.table_name= tmp->thd.query;
tmp->table_list.db= tmp->thd.db;
tmp->lock();
pthread_mutex_lock(&tmp->mutex);
if ((error=pthread_create(&tmp->thd.real_id,&connection_attrib,
......@@ -1834,6 +1856,9 @@ bool delayed_get_table(THD *thd, TABLE_LIST *table_list)
tmp->unlock();
goto end_create;
}
pthread_mutex_lock(&LOCK_delayed_insert);
delayed_threads.append(tmp);
pthread_mutex_unlock(&LOCK_delayed_insert);
}
pthread_mutex_unlock(&LOCK_delayed_create);
}
......@@ -2176,11 +2201,6 @@ pthread_handler_t handle_delayed_insert(void *arg)
}
di->table->copy_blobs=1;
/* One can now use this */
pthread_mutex_lock(&LOCK_delayed_insert);
delayed_threads.append(di);
pthread_mutex_unlock(&LOCK_delayed_insert);
/* Tell client that the thread is initialized */
pthread_cond_signal(&di->cond_client);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment