    MDEV-30260: Slave crashed:reload_acl_and_cache during shutdown · 952ab9a5
    Brandon Nesterenko authored
    The signal handler thread can use various runtime resources when
    processing a SIGHUP (e.g. master-info information) because it calls
    into reload_acl_and_cache(). Currently, the shutdown process waits
    for the signal thread to terminate only after performing cleanup.
    As a result, resources actively used by the signal handler could be
    freed while reload_acl_and_cache() is still processing.
    
    The specific race that caused MDEV-30260 involves the
    hostname_cache: mysqld would delete it in
    clean_up()::hostname_cache_free() before the signal handler used
    it in reload_acl_and_cache()::hostname_cache_refresh().
    
    A similar resource is active_mi/master_info_index. There was a
    race between its deletion by the main thread in end_slave() and
    its use by the signal handler as a part of
    Master_info_index::flush_all_relay_logs.read(active_mi) in
    reload_acl_and_cache().
    
    This patch fixes these race conditions by moving the point where
    server shutdown waits for the signal handler to die, so that the
    wait now happens right after server-level threads have been killed
    (i.e., as the last step of close_connections()). With respect to
    the hostname_cache, active_mi and master_info_index, this ensures
    that they cannot be destroyed while the signal handler is still
    active and potentially using them.
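
The ordering described above can be sketched in isolation. This is only an illustrative model, not the server code: HostCache, shutdown_in_safe_order() and the worker loop are hypothetical stand-ins for hostname_cache, the shutdown path and the signal handler thread. The point it shows is that the thread is joined before the shared resource is released.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <string>
#include <thread>

// Hypothetical stand-in for a shared runtime resource such as hostname_cache.
struct HostCache {
  std::string lookup() const { return "cached"; }
};

// Runs the sketch and returns how many "reloads" the worker completed while
// the resource was guaranteed to be alive.
int shutdown_in_safe_order() {
  auto cache = std::make_unique<HostCache>();
  std::atomic<bool> stop{false};
  int reloads = 0;

  // Stand-in for the signal handler thread servicing SIGHUP-style reloads.
  std::thread signal_thread([&] {
    do {
      if (cache->lookup() == "cached") ++reloads;
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    } while (!stop.load());
  });

  // Step 1 (close_connections() analogue): ask the thread to stop, then wait.
  stop.store(true);
  signal_thread.join();

  // Step 2 (clean_up() analogue): only now release what the thread was using.
  cache.reset();
  return reloads;
}
```

Swapping steps 1 and 2 in this sketch would reproduce the shape of the bug: the worker could dereference the cache after it had been freed.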
    
    Additionally:
    
     1) This requires that Events memory still be in place for SIGHUP
    handling's mysql_print_status(). Event deinitialization is
    therefore moved into clean_up(), while the event scheduler is still
    stopped in close_connections(), at the same spot as before.
    
     2) The function kill_server_thread is no longer used, so it is
    deleted.
    
     3) The timeout to wait for the death of the signal thread was not
    consistent with the comment. The comment mentioned up to 10 seconds,
    whereas it was actually 0.01s. The code has been fixed to wait up to
    10 seconds.
    
     4) A warning has been added if the signal handler thread fails to
    exit in time.
    
     5) Added pthread_join() to the end of
    wait_for_signal_thread_to_end(); if the thread has not ended within
    10s, a warning is issued before joining. Note this also removes
    the pthread_detached attribute from the signal_thread to allow
    for the pthread_join().
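
Items 3 to 5 can be illustrated with a minimal sketch of a bounded wait followed by a join. It is written under assumptions and is not the actual wait_for_signal_thread_to_end(): SignalThreadState, signal_thread_body(), wait_for_thread_to_end() and demo_timed_wait() are hypothetical names, and the real server may signal thread exit differently. Only standard POSIX calls are used.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <pthread.h>

// Hypothetical shared state: the thread clears `running` just before exiting.
struct SignalThreadState {
  pthread_mutex_t lock;
  pthread_cond_t ended;
  bool running = true;
  SignalThreadState() {
    pthread_mutex_init(&lock, nullptr);
    pthread_cond_init(&ended, nullptr);
  }
  ~SignalThreadState() {
    pthread_cond_destroy(&ended);
    pthread_mutex_destroy(&lock);
  }
};

// Stand-in thread body: report the exit, then return.
static void *signal_thread_body(void *arg) {
  auto *st = static_cast<SignalThreadState *>(arg);
  pthread_mutex_lock(&st->lock);
  st->running = false;
  pthread_cond_signal(&st->ended);
  pthread_mutex_unlock(&st->lock);
  return nullptr;
}

// Waits up to timeout_sec for the exit report, warns on timeout, then joins.
// Returns true if the thread reported ending before the deadline.
bool wait_for_thread_to_end(pthread_t tid, SignalThreadState *st,
                            int timeout_sec) {
  timespec deadline;
  clock_gettime(CLOCK_REALTIME, &deadline);  // absolute deadline, in seconds
  deadline.tv_sec += timeout_sec;

  pthread_mutex_lock(&st->lock);
  int rc = 0;
  while (st->running && rc == 0)
    rc = pthread_cond_timedwait(&st->ended, &st->lock, &deadline);
  bool ended_in_time = !st->running;
  pthread_mutex_unlock(&st->lock);

  if (!ended_in_time)
    std::fprintf(stderr, "Warning: signal thread did not exit within %d s\n",
                 timeout_sec);

  // Joining requires a joinable (non-detached) thread.
  pthread_join(tid, nullptr);
  return ended_in_time;
}

// Creates a joinable thread (default attributes) and waits for it.
bool demo_timed_wait() {
  SignalThreadState st;
  pthread_t tid;
  if (pthread_create(&tid, nullptr, signal_thread_body, &st) != 0)
    return false;
  return wait_for_thread_to_end(tid, &st, 10);
}
```

Expressing the timeout in whole seconds on an absolute deadline keeps the unit visible at the call site, which helps avoid the kind of comment/code mismatch (10 seconds versus 0.01s) described in item 3.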
    
    Reviewed By:
    ===========
    Vladislav Vaintroub <wlad@mariadb.com>
    Andrei Elkin <andrei.elkin@mariadb.com>
rpl_shutdown_sighup.result 1.79 KB