    Dmitry Lenev authored commit 0afd0a18

    A better fix for bug #56405 "Deadlock in the MDL deadlock
    detector" that doesn't introduce bug #56715 "Concurrent
    transactions + FLUSH result in sporadical unwarranted
    deadlock errors".
    
    A deadlock could occur when a workload containing a mix
    of DML, DDL and FLUSH TABLES statements affecting the same
    set of tables was executed in a heavily concurrent environment.
    
    This deadlock occurred when several connections tried to
    perform deadlock detection in the metadata locking subsystem.
    The first connection started traversing the wait-for graph,
    encountered a sub-graph representing a wait for flush, acquired
    LOCK_open and dived into sub-graph inspection. Then it
    encountered a sub-graph corresponding to a wait for a metadata
    lock and blocked while trying to acquire a rd-lock on
    MDL_lock::m_rwlock, since some other thread had a wr-lock on it.
    When this wr-lock was released it could happen (if there
    was another pending wr-lock against this rwlock) that the rd-lock
    request from the first connection was left unsatisfied while
    a new rd-lock request from the second connection sneaked
    in and was satisfied (for this to be possible the second
    rd-request had to arrive exactly after the wr-lock was released
    but before the pending wr-lock managed to grab the rwlock, which
    is possible both on Linux and in our own rwlock implementation).
    If this second connection continued traversing the wait-for graph
    and encountered a sub-graph representing a wait for flush, it tried
    to acquire LOCK_open and thus the deadlock was created.
    
    The previous patch tried to work around this problem by not
    allowing the deadlock detector to lock the LOCK_open mutex if
    some other thread doing deadlock detection already owned it
    and the current search depth was greater than 0. Instead, a
    deadlock was reported. As a result it introduced bug #56715.
    
    This patch solves the problem in a different way.
    It introduces a new rw_pr_lock_t implementation to be used
    by the MDL subsystem instead of one based on Linux rwlocks or
    our own rwlock implementation. This new implementation
    never allows a situation in which an rwlock is rd-locked and
    there is a blocked pending rd-lock. Thus the situation which
    caused this bug becomes impossible with this implementation.
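
    The invariant above (an rd-locked rwlock never has a blocked
    pending rd-lock) can be sketched with a mutex and a condition
    variable. This is only an illustration under assumptions, not the
    code of this patch: the type pr_lock_t and all function and field
    names below are hypothetical. Readers hold the mutex only long
    enough to bump a counter, a writer keeps the mutex for its whole
    critical section, and a writer waiting for readers sleeps on the
    condition variable with the mutex released, so new readers are
    never blocked behind it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical reader-preferring lock; every name here is illustrative. */
typedef struct {
  pthread_mutex_t mutex;             /* held briefly by readers, and for the
                                        whole critical section by a writer */
  pthread_cond_t  no_active_readers; /* signalled when a writer may retry */
  unsigned        active_readers;    /* number of rd-locks currently held */
  unsigned        waiting_writers;   /* writers sleeping on the condvar */
  int             active_writer;     /* 1 while a wr-lock is held */
} pr_lock_t;

int pr_init(pr_lock_t *l) {
  l->active_readers = 0;
  l->waiting_writers = 0;
  l->active_writer = 0;
  pthread_mutex_init(&l->mutex, NULL);
  pthread_cond_init(&l->no_active_readers, NULL);
  return 0;
}

int pr_destroy(pr_lock_t *l) {
  pthread_cond_destroy(&l->no_active_readers);
  pthread_mutex_destroy(&l->mutex);
  return 0;
}

/* A reader waits only for the mutex, i.e. only for an active writer or
   another short bookkeeping section, never for a pending writer. */
int pr_rdlock(pr_lock_t *l) {
  pthread_mutex_lock(&l->mutex);
  l->active_readers++;
  pthread_mutex_unlock(&l->mutex);
  return 0;
}

int pr_rdunlock(pr_lock_t *l) {
  pthread_mutex_lock(&l->mutex);
  assert(l->active_readers > 0);
  if (--l->active_readers == 0 && l->waiting_writers > 0)
    pthread_cond_signal(&l->no_active_readers);
  pthread_mutex_unlock(&l->mutex);
  return 0;
}

/* A writer waits until no readers are active. pthread_cond_wait()
   releases the mutex while sleeping, so readers can still get in. */
int pr_wrlock(pr_lock_t *l) {
  pthread_mutex_lock(&l->mutex);
  l->waiting_writers++;
  while (l->active_readers != 0)
    pthread_cond_wait(&l->no_active_readers, &l->mutex);
  l->waiting_writers--;
  l->active_writer = 1;
  return 0;                          /* mutex stays locked until pr_wrunlock */
}

int pr_wrunlock(pr_lock_t *l) {
  assert(l->active_writer);
  l->active_writer = 0;
  if (l->waiting_writers > 0)        /* hand over to another sleeping writer */
    pthread_cond_signal(&l->no_active_readers);
  pthread_mutex_unlock(&l->mutex);
  return 0;
}
```

    In this sketch an uncontended wr-lock/unlock pair costs one mutex
    lock and one mutex unlock. The price is that a steady stream of
    readers can starve writers.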
    
    Because this implementation is optimized for the
    wr-lock/unlock scenario, which is the most common one in the MDL
    subsystem, it doesn't introduce noticeable performance
    regressions in sysbench tests. Moreover, it significantly
    improves the situation for the POINT_SELECT test when many
    connections are used.
    
    No test case is provided as this bug is very hard to repeat
    in the MTR environment, but it is repeatable with the help of
    RQG tests.
    This patch also doesn't include a test for bug #56715
    "Concurrent transactions + FLUSH result in sporadical
    unwarranted deadlock errors" as it takes too much time to
    be run as part of normal test-suite runs.
    
    config.h.cmake:
      We no longer need to check for the presence of
      pthread_rwlockattr_setkind_np as we no longer
      use the Linux-specific implementation of rw_pr_lock_t
      which uses this function.
    configure.cmake:
      We no longer need to check for the presence of
      pthread_rwlockattr_setkind_np as we no longer
      use the Linux-specific implementation of rw_pr_lock_t
      which uses this function.
    configure.in:
      We no longer need to check for the presence of
      pthread_rwlockattr_setkind_np as we no longer
      use the Linux-specific implementation of rw_pr_lock_t
      which uses this function.
    include/my_pthread.h:
      Introduced a new implementation of rw_pr_lock_t.
      Since it never allows a situation in which an rwlock is
      rd-locked and there is a blocked pending rd-lock, it is not
      affected by bug #56405 "Deadlock in the MDL deadlock detector".
      This implementation is also optimized for the wr-lock/unlock
      scenario, which is the most common one in the MDL subsystem, so
      it doesn't introduce noticeable performance regressions in
      sysbench tests (compared to the old Linux-specific
      implementation). Moreover, it significantly improves the
      situation for the POINT_SELECT test when many connections are
      used.
      As part of this change, removed the try-lock part of the API
      for this type of lock. It is not used in our code and it would
      be hard to implement correctly within the constraints of the
      new implementation.
      Finally, removed support for preferring readers from the
      my_rw_lock_t implementation as the only user of this
      feature was the old rw_pr_lock_t implementation.
    include/mysql/psi/mysql_thread.h:
      Removed the try-lock part of the prlock API.
      It is not used in our code and it would be hard
      to implement correctly within the constraints of the new
      prlock implementation.
    mysys/thr_rwlock.c:
      Introduced a new implementation of rw_pr_lock_t.
      Since it never allows a situation in which an rwlock is
      rd-locked and there is a blocked pending rd-lock, it is not
      affected by bug #56405 "Deadlock in the MDL deadlock detector".
      This implementation is also optimized for the wr-lock/unlock
      scenario, which is the most common one in the MDL subsystem, so
      it doesn't introduce noticeable performance regressions in
      sysbench tests (compared to the old Linux-specific
      implementation). Moreover, it significantly improves the
      situation for the POINT_SELECT test when many connections are
      used.
      Also removed support for preferring readers from the
      my_rw_lock_t implementation as the only user of this
      feature was the old rw_pr_lock_t implementation.
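
    For context, the function whose configure check is dropped is a
    glibc extension that selects the reader/writer preference of a
    pthread rwlock. The snippet below is a rough sketch of how a
    reader-preferring rwlock can be set up with it on Linux. It is
    not the removed code, and the helper name
    init_reader_preferring_rwlock is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                  /* exposes the *_np glibc extensions */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: initialise a glibc rwlock that prefers readers.
   Returns 0 on success or a pthread error code. */
int init_reader_preferring_rwlock(pthread_rwlock_t *rw) {
  pthread_rwlockattr_t attr;
  int rc;

  assert(rw != NULL);
  rc = pthread_rwlockattr_init(&attr);
  if (rc != 0)
    return rc;
  /* Non-portable call: only available with glibc. */
  rc = pthread_rwlockattr_setkind_np(&attr, PTHREAD_RWLOCK_PREFER_READER_NP);
  if (rc == 0)
    rc = pthread_rwlock_init(rw, &attr);
  pthread_rwlockattr_destroy(&attr);
  return rc;
}
```

    Because the call is non-portable, code relying on it needs a
    build-time check for the function, which is what the removed
    configure checks provided.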