    MDEV-31448: Killing a replica thread awaiting its GCO can hang/crash a parallel replica · 5d61442c
    Kristian Nielsen authored
    
    
    The problem is that when a worker thread is (user) killed in
    wait_for_prior_commit, the event group may complete out-of-order since the
    wait for prior commit was aborted by the kill.
    
    This fix ensures that event groups will always complete in-order, even
    in the error case. This is done in finish_event_group() by doing an
    extra wait_for_prior_commit(), if necessary, that ignores kills.
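
    The in-order guarantee described above can be illustrated with a small
    standalone sketch. This is not the MariaDB code: ToyOrder,
    toy_wait_for_prior(), toy_finish_group() and toy_run() are hypothetical
    names invented for this example, and the kill is modelled as a simple
    per-worker flag. It only shows the idea that a kill may abort the normal
    wait, while the completion step waits again with kills ignored.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical toy model, not the MariaDB data structures.
struct ToyOrder {
  std::mutex mtx;
  std::condition_variable cv;
  int completed = 0;              // groups 1..completed have finished
  std::vector<int> finish_order;  // order in which groups completed
};

// Wait until group seq-1 has completed. When honor_kill is true, a set kill
// flag aborts the wait and false is returned; otherwise kills are ignored.
bool toy_wait_for_prior(ToyOrder &o, int seq, const std::atomic<bool> &killed,
                        bool honor_kill) {
  std::unique_lock<std::mutex> lk(o.mtx);
  while (o.completed < seq - 1) {
    if (honor_kill && killed.load())
      return false;
    // Poll with a short timeout so a kill flag is noticed promptly.
    o.cv.wait_for(lk, std::chrono::milliseconds(1));
  }
  return true;
}

// Complete a group. If the earlier wait was aborted, wait once more with
// kills ignored, so completion stays in order even in the error case.
void toy_finish_group(ToyOrder &o, int seq, const std::atomic<bool> &killed,
                      bool wait_was_aborted) {
  if (wait_was_aborted)
    toy_wait_for_prior(o, seq, killed, /*honor_kill=*/false);
  std::lock_guard<std::mutex> lk(o.mtx);
  assert(o.completed == seq - 1);  // in-order invariant
  o.finish_order.push_back(seq);
  o.completed = seq;
  o.cv.notify_all();
}

// Run n groups on n threads, started in reverse order; the worker for
// killed_seq behaves as if it had been killed before its wait.
std::vector<int> toy_run(int n, int killed_seq) {
  ToyOrder o;
  std::vector<std::thread> workers;
  for (int seq = n; seq >= 1; --seq) {
    workers.emplace_back([&o, seq, killed_seq] {
      std::atomic<bool> killed(seq == killed_seq);
      bool ok = toy_wait_for_prior(o, seq, killed, /*honor_kill=*/true);
      toy_finish_group(o, seq, killed, /*wait_was_aborted=*/!ok);
    });
  }
  for (auto &t : workers)
    t.join();
  return o.finish_order;
}
```

    In this toy model, calling toy_run(4, 3) kills the worker for group 3
    while it waits, yet the groups still complete in sequence order because
    toy_finish_group() repeats the wait with kills ignored.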
    
    This fix supersedes the fix for MDEV-30780, so the earlier fix for
    that is reverted in this patch.
    
    Also fix a bug where an error from wait_for_prior_commit() inside
    finish_event_group() was not signalled to
    wakeup_subsequent_commits().
    
    Based on earlier work by Brandon Nesterenko and Andrei Elkin, with
    some changes to simplify the semantics of wait_for_prior_commit() and
    make the code more robust to future changes.
    Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
    Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
sql_class.h 230 KB