    MDEV-23328 Server hang due to Galera lock conflict resolution · 29bbcac0
    Sergei Golubchik authored
    Mutex order violation here.
    When a wsrep BF (brute force) thread kills a conflicting trx, the stack is
    
      wsrep_thd_LOCK()
      wsrep_kill_victim()
      lock_rec_other_has_conflicting()
      lock_clust_rec_read_check_and_lock()
      row_search_mvcc()
      ha_innobase::index_read()
      ha_innobase::rnd_pos()
      handler::ha_rnd_pos()
      handler::rnd_pos_by_record()
      handler::ha_rnd_pos_by_record()
      Rows_log_event::find_row()
      Update_rows_log_event::do_exec_row()
      Rows_log_event::do_apply_event()
      Log_event::apply_event()
      wsrep_apply_events()
    
    and mutexes are taken in the order
    
      lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
    
    When a normal KILL statement is executed, the stack is
    
      innobase_kill_query()
      kill_handlerton()
      plugin_foreach_with_mask()
      ha_kill_query()
      THD::awake()
      kill_one_thread()
    
    and mutexes are taken in the order
    
      victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
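The two acquisition orders above form a cycle, which is what allows the hang. Here is a minimal sketch of that check, assuming the mutexes are modelled only by their names. `LockOrderGraph` is a hypothetical helper written for illustration and is not part of the server code.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: records "held -> acquired next" edges for every
// observed lock sequence and reports whether the edges form a cycle.
class LockOrderGraph {
public:
  // seq lists mutex names in the order one code path acquires them.
  void add_sequence(const std::vector<std::string>& seq) {
    for (std::size_t i = 0; i + 1 < seq.size(); ++i)
      edges_[seq[i]].insert(seq[i + 1]);
  }

  // A cycle means two paths can each hold a mutex the other one needs.
  bool has_cycle() const {
    std::map<std::string, int> state;  // 0 = new, 1 = on stack, 2 = done
    for (const auto& kv : edges_)
      if (visit(kv.first, state)) return true;
    return false;
  }

private:
  // Depth-first search; meeting a node that is still on the stack is a cycle.
  bool visit(const std::string& node, std::map<std::string, int>& state) const {
    if (state[node] == 1) return true;
    if (state[node] == 2) return false;
    state[node] = 1;
    auto it = edges_.find(node);
    if (it != edges_.end())
      for (const auto& next : it->second)
        if (visit(next, state)) return true;
    state[node] = 2;
    return false;
  }

  std::map<std::string, std::set<std::string>> edges_;
};
```

Feeding it the BF order alone reports no cycle. Adding the KILL order closes the loop `lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data -> lock_sys->mutex`.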
    
    To fix the mutex order violation, we kill the victim thd asynchronously,
    from the manager thread.
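The idea of deferring the kill can be sketched as a small work queue. This is an illustration under stated assumptions, not the actual patch. `KillManager`, `enqueue`, `shutdown` and the `kill_fn` callback are hypothetical names, and the server's real manager thread interface is not shown here. The conflicting thread only records the victim's id while holding its own locks. A separate thread performs the kill later and is free to take `LOCK_thd_data` first.

```cpp
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a manager thread: owns a queue of victim ids
// and runs the kill callback outside of the requester's locks.
class KillManager {
public:
  explicit KillManager(std::function<void(std::uint64_t)> kill_fn)
      : kill_fn_(std::move(kill_fn)), worker_([this] { run(); }) {}

  ~KillManager() { shutdown(); }

  // Called by the requester while it may still hold other mutexes;
  // it touches nothing but the queue mutex.
  void enqueue(std::uint64_t victim_id) {
    {
      std::lock_guard<std::mutex> guard(mu_);
      pending_.push(victim_id);
    }
    cv_.notify_one();
  }

  // Lets the worker drain remaining requests, then joins it.
  // Calling it more than once is harmless.
  void shutdown() {
    {
      std::lock_guard<std::mutex> guard(mu_);
      stopping_ = true;
    }
    cv_.notify_one();
    if (worker_.joinable()) worker_.join();
  }

private:
  void run() {
    std::unique_lock<std::mutex> lk(mu_);
    for (;;) {
      cv_.wait(lk, [this] { return stopping_ || !pending_.empty(); });
      if (pending_.empty()) return;  // reached only once stopping_ is set
      std::uint64_t id = pending_.front();
      pending_.pop();
      lk.unlock();
      kill_fn_(id);  // runs with the queue mutex released
      lk.lock();
    }
  }

  std::function<void(std::uint64_t)> kill_fn_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::uint64_t> pending_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};
```

In this sketch the callback would be the place to take `victim_thread->LOCK_thd_data` before any storage engine mutex, which matches the order used by a normal KILL statement.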
wsrep_mysqld.cc 86 KB