    fix a 3-way deadlock in galera_sr.galera-features#56 · b91e77cf
    Sergei Golubchik authored
    Rarely (try --repeat 1000), the following happens:
    
    * from wsrep_bf_abort (when a thread is being killed), wsrep-lib
    starts streaming_rollback, which needs to
    convert_streaming_client_to_applier. wsrep_create_streaming_applier
    creates a new THD(). All of this happens while the other THD is
    being killed, that is, under LOCK_thd_kill and LOCK_thd_data.
    In particular, THD::init() takes LOCK_global_system_variables
    under LOCK_thd_kill.
    
    * updating @@wsrep_slave_threads takes LOCK_global_system_variables
    and LOCK_wsrep_cluster_config (in that order) and invokes
    wsrep_slave_threads_update() that takes LOCK_wsrep_slave_threads
    
    * wsrep_replication_process() takes LOCK_wsrep_slave_threads and
    invokes wsrep_close_applier(), that does thd->set_killed() which
    takes LOCK_thd_kill.
    
    et voilà.
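
The three paths above form a cycle in the lock-order graph. The following is a minimal sketch of that reasoning, not code from the server: it records each "held while acquiring" pair from the description as an edge and looks for a cycle with a depth-first search. The helper names (`LockOrder`, `has_lock_cycle`, `before_fix`) are hypothetical and used only for illustration; the lock names are taken from the text.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Each pair means "first is held while second is acquired".
using LockOrder = std::vector<std::pair<std::string, std::string>>;
using Graph = std::map<std::string, std::vector<std::string>>;

// Depth-first search; returns true if a node already on the current
// path is reached again, i.e. the lock order contains a cycle.
static bool dfs(const std::string& node, const Graph& graph,
                std::set<std::string>& on_path, std::set<std::string>& done) {
  if (on_path.count(node)) return true;
  if (done.count(node)) return false;
  on_path.insert(node);
  auto it = graph.find(node);
  if (it != graph.end())
    for (const auto& next : it->second)
      if (dfs(next, graph, on_path, done)) return true;
  on_path.erase(node);
  done.insert(node);
  return false;
}

// Hypothetical helper: true if the given acquisition orders can deadlock.
bool has_lock_cycle(const LockOrder& edges) {
  Graph graph;
  for (const auto& e : edges) graph[e.first].push_back(e.second);
  std::set<std::string> on_path, done;
  for (const auto& kv : graph)
    if (dfs(kv.first, graph, on_path, done)) return true;
  return false;
}

// Lock orders as described in the three bullet points above.
LockOrder before_fix() {
  return {
      // bf_abort path: THD::init() under LOCK_thd_kill
      {"LOCK_thd_kill", "LOCK_global_system_variables"},
      // SET @@wsrep_slave_threads path
      {"LOCK_global_system_variables", "LOCK_wsrep_cluster_config"},
      {"LOCK_wsrep_cluster_config", "LOCK_wsrep_slave_threads"},
      // wsrep_replication_process() path: set_killed() under
      // LOCK_wsrep_slave_threads
      {"LOCK_wsrep_slave_threads", "LOCK_thd_kill"},
  };
}
```

Calling `has_lock_cycle(before_fix())` reports a cycle. Removing the edge into `LOCK_wsrep_slave_threads`, which is what happens if the update no longer runs under the two outer mutexes, breaks it.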
    
    As a fix I copied a workaround from wsrep_cluster_address_update()
    to wsrep_slave_threads_update(). It seems to be safe: without the
    mutexes a race condition is possible, and a concurrent SET might
    change wsrep_slave_threads. But wsrep_slave_threads_update() always
    verifies whether there is anything to do, so it will not run twice
    in this case; the second call will be a no-op.
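
A rough sketch of the idea, assuming the workaround amounts to releasing the two mutexes held by the SET path for the duration of the update and re-acquiring them afterwards. This is an illustration with `std::mutex`, not the server code: the mutex objects stand in for the server locks of similar names, and `running_appliers`, `adjustments`, and the `*_sketch` functions are hypothetical.

```cpp
#include <cassert>
#include <mutex>

// Stand-ins for the server mutexes named in the commit message.
std::mutex global_system_variables;   // LOCK_global_system_variables
std::mutex wsrep_cluster_config;      // LOCK_wsrep_cluster_config
std::mutex wsrep_slave_threads_lock;  // LOCK_wsrep_slave_threads

long wsrep_slave_threads = 1;  // value requested via SET
long running_appliers = 1;     // hypothetical: appliers currently running
int adjustments = 0;           // hypothetical: counts real adjustments

// Assumption: the caller holds global_system_variables and
// wsrep_cluster_config, in that order, as the SET path does.
void slave_threads_update_sketch() {
  long wanted = wsrep_slave_threads;

  // Release the outer mutexes so LOCK_wsrep_slave_threads is never
  // taken while they are held.
  wsrep_cluster_config.unlock();
  global_system_variables.unlock();
  {
    std::lock_guard<std::mutex> guard(wsrep_slave_threads_lock);
    // Always check whether there is anything to do; a repeated call
    // with the same target is a no-op.
    if (wanted != running_appliers) {
      running_appliers = wanted;
      ++adjustments;
    }
  }
  // Re-acquire in the original order before returning to the caller.
  global_system_variables.lock();
  wsrep_cluster_config.lock();
}

// Hypothetical SET path: takes both outer mutexes, stores the value,
// then invokes the update.
void set_slave_threads_sketch(long value) {
  std::lock_guard<std::mutex> outer(global_system_variables);
  std::lock_guard<std::mutex> inner(wsrep_cluster_config);
  wsrep_slave_threads = value;
  slave_threads_update_sketch();
}
```

Calling `set_slave_threads_sketch(4)` twice adjusts the applier count once; the second call finds nothing to do.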
wsrep_var.cc 28.9 KB