    Patch that refactors global read lock implementation and fixes
    bug #57006 "Deadlock between HANDLER and FLUSH TABLES WITH READ
    LOCK" and bug #54673 "It takes too long to get readlock for
    'FLUSH TABLES WITH READ LOCK'".

    Commit 378cdc58, authored by Dmitry Lenev.
    
    The first bug manifested itself as a deadlock which occurred
    when a connection that had a table open through a HANDLER
    statement tried to update data through a DML statement while
    another connection tried to execute FLUSH TABLES WITH READ
    LOCK.
    
    What happened was that FLUSH TABLES WITH READ LOCK (FTWRL) in
    the second connection managed to perform the first step of
    global read lock (GRL) acquisition and thus blocked all
    upcoming DML. After that, it started to wait for the table
    opened through the HANDLER statement to be flushed. When the
    first connection tried to execute DML, it started to wait for
    the GRL, i.e. for the second connection, creating a deadlock.
    
    The second bug manifested itself as starvation of FLUSH TABLES
    WITH READ LOCK statements when there was a constant stream of
    concurrent DML statements (in two or more connections).
    
    This happened because requests for protection against GRL made
    by DML statements ignored the presence of a pending GRL request,
    so the latter was starved.
    
    This patch solves both these problems by re-implementing GRL
    using metadata locks.
    
    As in the old implementation, GRL acquisition in the new
    implementation is a two-step process. During the first step we
    block all concurrent DML and DDL statements by acquiring a
    global S metadata lock (each DML and DDL statement acquires a
    global IX lock for its duration). During the second step we
    block commits by acquiring a global S lock in the COMMIT
    namespace (commit code acquires a global IX lock in this
    namespace).
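    The two steps can be illustrated with a toy model that only
    counts granted S and IX locks in the two namespaces. This is a
    sketch under simplifying assumptions, not the server's MDL code:
    Ns, Mode and GlobalLocks are hypothetical names, and waiting,
    queues and other lock types are left out.

```cpp
// Hypothetical model of the two-step GRL acquisition; the names Ns, Mode
// and GlobalLocks are illustrative and are not server identifiers.
#include <cassert>
#include <map>

enum class Ns { GLOBAL, COMMIT };  // the two namespaces used for GRL
enum class Mode { S, IX };         // the only lock modes this sketch needs

struct GlobalLocks {
  std::map<Ns, int> s_granted;   // number of granted S locks per namespace
  std::map<Ns, int> ix_granted;  // number of granted IX locks per namespace

  // S is compatible with S and IX with IX, but S and IX conflict.
  bool can_grant(Ns ns, Mode m) {
    return m == Mode::S ? ix_granted[ns] == 0 : s_granted[ns] == 0;
  }

  // Grants the lock and returns true, or returns false if the caller
  // would have to wait.
  bool try_acquire(Ns ns, Mode m) {
    if (!can_grant(ns, m)) return false;
    ++(m == Mode::S ? s_granted : ix_granted)[ns];
    return true;
  }

  void release(Ns ns, Mode m) {
    int &count = (m == Mode::S ? s_granted : ix_granted)[ns];
    assert(count > 0);
    --count;
  }
};
```

    In this model a DML statement calls try_acquire(Ns::GLOBAL,
    Mode::IX) for its duration and commit calls
    try_acquire(Ns::COMMIT, Mode::IX). Once FTWRL holds S in GLOBAL
    (step one) new statements are refused, and once it also holds S
    in COMMIT (step two) commits are refused as well.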
    
    Note that, unlike in the old implementation, acquisition of
    protection against GRL in DML and DDL is semi-automatic.
    We assume that any statement which should be blocked by GRL
    will either open tables and acquire write locks on them, or
    acquire metadata locks on the objects it is going to modify.
    For any such statement, a global IX metadata lock is
    automatically acquired for its duration.
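    The semi-automatic rule can be sketched as a pass over a
    statement's lock requests that prepends one global IX request
    whenever any request targets an object the statement will
    modify. Request, LockKind, is_modifying() and add_global_ix()
    are hypothetical names chosen for illustration; the lock kinds
    only loosely mirror metadata lock types and this is not how the
    server actually builds its request list.

```cpp
// Hypothetical sketch of automatic global IX acquisition; none of these
// names are real server functions.
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class LockKind { SHARED_READ, SHARED_WRITE, EXCLUSIVE, GLOBAL_IX };

struct Request {
  std::string object;  // e.g. "db1.t1"; empty for the global lock
  LockKind kind;
};

// True for the lock kinds a statement takes on objects it modifies.
bool is_modifying(LockKind k) {
  return k == LockKind::SHARED_WRITE || k == LockKind::EXCLUSIVE;
}

// If any request targets an object the statement will modify, prepend a
// single global IX request so that the statement is blocked by GRL.
std::vector<Request> add_global_ix(std::vector<Request> reqs) {
  bool needs_ix = std::any_of(reqs.begin(), reqs.end(), [](const Request &r) {
    return is_modifying(r.kind);
  });
  if (needs_ix) reqs.insert(reqs.begin(), Request{"", LockKind::GLOBAL_IX});
  assert(!needs_ix || reqs.front().kind == LockKind::GLOBAL_IX);
  return reqs;
}
```

    A read-only statement passes through unchanged, so it is not
    blocked by GRL; an UPDATE-like statement gains the extra global
    IX request at the front of its list.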
    
    The first problem is solved because waits for GRL become
    visible to the deadlock detector in the metadata locking
    subsystem, so deadlocks like the one in the first bug are no
    longer possible.
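    Why visibility matters can be shown with a minimal wait-for
    graph and a depth-first cycle check. This is a generic sketch,
    not the server's detector: Graph, reaches() and is_deadlocked()
    are hypothetical, and the connection names stand for the two
    connections of bug #57006.

```cpp
// Hypothetical wait-for graph: each waiter maps to the contexts it waits for.
#include <cassert>
#include <map>
#include <set>
#include <string>

using Graph = std::map<std::string, std::set<std::string>>;

// Depth-first search: can `target` be reached by following wait edges?
bool reaches(const Graph &g, const std::string &from,
             const std::string &target, std::set<std::string> &seen) {
  auto it = g.find(from);
  if (it == g.end()) return false;  // `from` is not waiting for anyone
  for (const std::string &next : it->second) {
    if (next == target) return true;
    if (seen.insert(next).second && reaches(g, next, target, seen))
      return true;
  }
  return false;
}

// A context is deadlocked if its own waits lead back to itself.
bool is_deadlocked(const Graph &g, const std::string &ctx) {
  std::set<std::string> seen;
  return reaches(g, ctx, ctx, seen);
}
```

    If only the FTWRL connection's wait for the HANDLER table flush
    is recorded (conn2 waits for conn1), no cycle is found. Once the
    DML statement's wait for GRL is also recorded (conn1 waits for
    conn2), the cycle is detected for either connection.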
    
    The second problem is solved because the global S locks used to
    implement GRL are given preference over the IX locks acquired
    by concurrent DML (we can switch to fair scheduling in the
    future if needed).
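    The effect of this preference can be simulated with a single
    global lock and two connections whose DML statements always
    overlap. A sketch under stated assumptions: GlobalLock and
    ftwrl_acquired() are hypothetical, and prefer_s = false stands
    for the old behavior where protection requests ignored a
    pending GRL.

```cpp
// Hypothetical sketch of one global lock's grant policy; GlobalLock and
// ftwrl_acquired() are illustrative names, not server code.
#include <cassert>

struct GlobalLock {
  int granted_s = 0;   // granted S locks (FTWRL)
  int granted_ix = 0;  // granted IX locks (running DML/DDL statements)
  int waiting_s = 0;   // pending S requests
  bool prefer_s;       // true: a pending S request blocks new IX requests

  explicit GlobalLock(bool prefer) : prefer_s(prefer) {}

  bool try_ix() {
    if (granted_s > 0 || (prefer_s && waiting_s > 0)) return false;
    ++granted_ix;
    return true;
  }
  void release_ix() {
    assert(granted_ix > 0);
    --granted_ix;
  }
  // Grants S at once if no IX is held, otherwise queues the request.
  bool request_s() {
    if (granted_ix == 0) { ++granted_s; return true; }
    ++waiting_s;
    return false;
  }
  // Grants a queued S request once every IX lock has been released.
  bool grant_waiting_s() {
    if (waiting_s == 0 || granted_ix > 0) return false;
    --waiting_s;
    ++granted_s;
    return true;
  }
};

// Two connections run DML back to back so that their statements always
// overlap; returns whether a concurrent FTWRL ever gets its S lock.
bool ftwrl_acquired(bool prefer_s, int rounds) {
  GlobalLock lock(prefer_s);
  lock.try_ix();                  // connection A starts a DML statement
  bool got_s = lock.request_s();  // FTWRL arrives and has to wait
  for (int i = 0; i < rounds && !got_s; ++i) {
    lock.try_ix();      // the other connection starts DML (may be refused)
    lock.release_ix();  // the older DML statement finishes
    got_s = lock.grant_waiting_s();
  }
  return got_s;
}
```

    Without the preference, a new IX lock is always granted before
    the previous one is released, so the IX count never drops to
    zero and FTWRL waits forever. With it, the next DML statement
    is refused while S is pending, and FTWRL gets its lock as soon
    as the running statement finishes.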
    
    Important change:
    FTWRL/GRL no longer blocks DML and DDL on temporary tables.
    Before this patch, behavior was not consistent in this respect:
    in some cases DML/DDL statements on temporary tables were
    blocked, while in others they were not. Since the main use
    cases for FTWRL are various forms of backup, and temporary
    tables are not preserved in backups, we have opted to
    consistently allow DML/DDL on temporary tables during FTWRL/GRL.
    
    Important change:
    This patch changes the thread state names used when DML/DDL
    or FTWRL is waiting for the global read lock. The state is now
    either "Waiting for global read lock" or "Waiting for commit
    lock", depending on which stage FTWRL has reached.
    
    Incompatible change:
    To resolve a deadlock in the events code that was exposed by
    this patch, we had to replace the LOCK_event_metadata mutex
    with metadata locks on events. As a result, DDL on events is
    now prohibited under LOCK TABLES.
    
    This patch also adds extensive test coverage for interaction
    of DML/DDL and FTWRL.
    
    The performance of the new and old global read lock
    implementations was compared in sysbench tests. There was no
    significant difference between the two implementations.