sql/sql_base.h · 5770bd1326fb0b055ba35f25cdc0cb291a85cbf4 · nexedi / MariaDB

A temporary workaround for bug #56405 "Deadlock in the
MDL deadlock detector".
Dmitry Lenev authored Sep 06, 2010 · 5770bd13

A deadlock could occur when a workload containing a mix
of DML, DDL and FLUSH TABLES statements affecting the same
set of tables was executed in a heavily concurrent environment.

This deadlock occurred when several connections tried to
perform deadlock detection in the metadata locking subsystem.
The first connection started traversing the wait-for graph,
encountered a sub-graph representing a wait for flush, acquired
LOCK_open and dived into sub-graph inspection. It then
encountered a sub-graph corresponding to a wait for a metadata
lock and blocked while trying to acquire an rd-lock on
MDL_lock::m_rwlock (*) protecting this sub-graph, since some
other thread held a wr-lock on it. When this wr-lock was
released it could happen (if there was another pending wr-lock
against this rwlock) that the rd-lock request from the first
connection was left unsatisfied while a new rd-lock request
from the second connection sneaked in and was satisfied (for
this to be possible the second rd-lock request must arrive
exactly after the wr-lock is released but before the pending
wr-lock manages to grab the rwlock, which is possible both on
Linux and in our own rwlock implementation). If this second
connection continued traversing the wait-for graph and
encountered a sub-graph representing a wait for flush, it tried
to acquire LOCK_open, and thus a deadlock was created.

This patch tries to work around this problem by not allowing
the deadlock detector to lock the LOCK_open mutex if some other
thread doing deadlock detection already owns it and the current
search depth is greater than 0. Instead a deadlock is reported.
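
The check described above can be sketched roughly as follows.
This is a minimal illustration and not the code from the patch:
the names dd_owns_lock_open, m_current_search_depth and
abort_traversal() come from this commit message, while Visitor,
LOCK_dd_owns_lock_open, m_found_deadlock, enter_flush_subgraph()
and leave_flush_subgraph() are hypothetical, and std::mutex
stands in for the server's own mutex wrappers.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins, not the actual server definitions.
static std::mutex LOCK_open;               // stand-in for the global LOCK_open
static std::mutex LOCK_dd_owns_lock_open;  // assumed name; protects the counter
static unsigned dd_owns_lock_open = 0;     // detectors owning/acquiring LOCK_open

// Simplified visitor: only the two members this sketch needs.
struct Visitor {
  unsigned m_current_search_depth = 0;
  bool m_found_deadlock = false;  // hypothetical flag
  void abort_traversal() { m_found_deadlock = true; }
};

// Called before inspecting a "wait for flush" sub-graph.
// Returns true if LOCK_open was acquired, false if the
// traversal was aborted and a deadlock reported instead.
bool enter_flush_subgraph(Visitor &v) {
  {
    std::lock_guard<std::mutex> guard(LOCK_dd_owns_lock_open);
    if (v.m_current_search_depth > 0 && dd_owns_lock_open > 0) {
      // Another detector owns (or is acquiring) LOCK_open and we
      // may already hold an rd-lock higher up: do not block.
      v.abort_traversal();
      return false;
    }
    ++dd_owns_lock_open;
  }
  LOCK_open.lock();
  return true;
}

// Called after the sub-graph inspection is finished.
void leave_flush_subgraph() {
  LOCK_open.unlock();
  std::lock_guard<std::mutex> guard(LOCK_dd_owns_lock_open);
  assert(dd_owns_lock_open > 0);
  --dd_owns_lock_open;
}
```

In this sketch a detector at search depth 0 holds no rwlock yet,
so it may safely block on LOCK_open; only nested searches give
up early.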

Other possible solutions are either known to have negative
effects on performance or require much more time for proper
implementation and testing.

No test case is provided as this bug is very hard to repeat
in the MTR environment but is repeatable with the help of RQG
tests.

sql/mdl.cc:
  Moved Deadlock_detection_visitor::m_current_search_depth to
  parent class to make it available in
  TABLE_SHARE::visit_subgraph().
  Added MDL_wait_for_graph_visitor::abort_traversal() method
  which allows aborting traversal of a wait-for graph and
  reporting a deadlock.
sql/mdl.h:
  Moved Deadlock_detection_visitor::m_current_search_depth to
  parent class to make it available in
  TABLE_SHARE::visit_subgraph().
  Added MDL_wait_for_graph_visitor::abort_traversal() method
  which allows aborting traversal of a wait-for graph and
  reporting a deadlock.
sql/sql_base.cc:
  Added dd_owns_lock_open counter and mutex protecting it to
  track number of connections which do deadlock detection and
  own or try to acquire LOCK_open.
sql/sql_base.h:
  Added dd_owns_lock_open counter and mutex protecting it to
  track number of connections which do deadlock detection and
  own or try to acquire LOCK_open.
sql/table.cc:
  Work around bug #56405 by not allowing the MDL deadlock
  detector to lock the LOCK_open mutex if some other thread doing
  deadlock detection already owns it and the current search depth
  is greater than 0. Instead report a deadlock.
sql_base.h 21.9 KB