    MDEV-31658 : Deadlock found when trying to get lock during applying · ee974ca5
    Jan Lindström authored
    The problem was that there were two non-conflicting local idle
    transactions in node_1 that each inserted a key into the primary key.
    Then two transactions from other nodes also inserted
    a key into the primary key, so that the insert from node_2 conflicted
    with one of the local transactions in node_1: there would
    be a duplicate key if both were committed. Because of this, the insert
    from the other node tries to acquire an S-lock on this record,
    and because this insert is a high priority brute force (BF)
    transaction, it kills the idle local transaction.
    
    Concurrently, the second insert, from node_3, conflicts with the second
    idle insert transaction in node_1. Again, it tries to acquire an
    S-lock on this record and kills the idle local transaction.
    
    At this point we have two non-conflicting high priority
    transactions holding S-locks on different records in node_1,
    for example: rec s-lock-node2-rec s-lock-node3-rec rec.
    
    Because these high priority BF-transactions do not wait for
    each other, the insert from node_3, which has a later seqno than
    the insert from node_2, can continue. It will try to acquire an
    insert intention lock for the record it tries to insert (to prevent a
    duplicate key from being inserted by a local transaction). However,
    it will notice that there is a conflicting S-lock in the same gap
    between records. This leads to a deadlock error, because we have
    defined that BF-transactions may not wait for a record lock,
    but we can't kill the conflicting BF-transaction because
    it has a lower seqno and should commit first.
    
    The BF-transactions are executed concurrently because their
    primary key values are different, i.e. they do not
    conflict.
    
    Galera certification makes sure that inserts from
    other nodes, i.e. these high priority BF-transactions,
    can't insert duplicate keys. Local transactions naturally
    can, but they will be killed when the BF-transaction
    acquires the required record locks.
    
    Therefore, we can allow a situation where there is a conflicting
    S-lock and insert intention lock, regardless of their seqno
    order, and let both continue with no wait. This leads
    to a situation where we need to allow a BF-transaction
    to wait when lock_rec_has_to_wait_in_queue is called,
    because this function is also called from
    lock_rec_queue_validate, and because the lock is waiting
    the following assertion would otherwise fail:
    ut_a(lock->is_gap()
    || lock_rec_has_to_wait_in_queue(cell, lock));
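
    The rule described above can be sketched as a small stand-alone
    model. This is only an illustration under stated assumptions: the
    types Trx and RecLock and the functions basic_conflict, has_to_wait
    and check_commit_scenario are hypothetical and are not the InnoDB
    lock_t/trx_t structures or the actual lock_rec_has_to_wait_wsrep
    code; the conflict rule is deliberately simplified.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model. These types are NOT InnoDB's lock_t/trx_t.
enum class LockMode { S, X };

struct Trx {
    bool is_bf;           // high priority (BF) applier transaction
    std::uint64_t seqno;  // replication order, meaningful only for BF
};

struct RecLock {
    const Trx* trx;
    LockMode mode;
    bool insert_intention;
};

// Simplified conflict rule (assumption): nobody waits for an insert
// intention lock, an insert intention request conflicts with any ordinary
// lock held by another transaction, and otherwise only S/S is compatible.
bool basic_conflict(const RecLock& req, const RecLock& held) {
    if (req.trx == held.trx) return false;
    if (held.insert_intention) return false;
    if (req.insert_intention) return true;
    return !(req.mode == LockMode::S && held.mode == LockMode::S);
}

// Decision described in the text: a BF insert intention request does not
// wait for an S-lock held by another BF transaction, whatever the seqno
// order of the two transactions is.
bool has_to_wait(const RecLock& req, const RecLock& held) {
    if (!basic_conflict(req, held)) return false;
    const bool both_bf = req.trx->is_bf && held.trx->is_bf;
    if (both_bf && req.insert_intention && held.mode == LockMode::S) {
        return false;  // both continue with no wait
    }
    return true;
}

// Replays the scenario from this commit message on the model.
void check_commit_scenario() {
    const Trx node2{true, 10}, node3{true, 11}, local{false, 0};
    const RecLock s_node2{&node2, LockMode::S, false};
    const RecLock ii_node3{&node3, LockMode::X, true};
    const RecLock ii_local{&local, LockMode::X, true};
    assert(!has_to_wait(ii_node3, s_node2));  // BF vs BF: no wait
    assert(has_to_wait(ii_local, s_node2));   // local vs BF: conflict
}
```

    In this model the local transaction still conflicts with the BF
    S-lock; in the real code that conflict is resolved by killing the
    local transaction, which the sketch does not attempt to show.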
    
    lock_wait_wsrep_kill
      Add debug sync points for a BF-transaction killing
      a local transaction.
    
    wsrep_assert_no_bf_bf_wait
      Also print the requested lock information.
    
    lock_rec_has_to_wait
      Add function to handle wsrep transaction lock wait
      cases.
    
    lock_rec_has_to_wait_wsrep
      New function to handle wsrep transaction lock wait
      exceptions.
    
    lock_rec_has_to_wait_in_queue
      Remove the wsrep exception; in this function all
      conflicting locks need to wait in the queue.
      Conflicts between BF and local transactions
      are handled in lock_wait.
    Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>