Kristian Nielsen authored
The bug occurred in parallel replication when retrying transactions that failed due to deadlock. In this case, the relay log file is re-opened and the events are read out again. This reading requires a format description event of the appropriate version. But the code was using a description event stored in rli, and access to it is not thread-safe. This could lead to various rare races if the format description event was replaced by the SQL driver thread at the exact moment when a worker thread was trying to use it.

The fix is to make the retry code create and maintain its own format description event. When the relay log file is opened, we first read the format description event from the start of the file, before seeking to the current position. This now uses the same code as when the SQL driver thread starts from a given relay log position. This also makes sure that the correct format description event version is used in cases where the version of the binlog could change during replication.
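The idea behind the fix can be sketched in isolation. The following is a simplified model, not the server code: `FormatDescription`, `Event`, `RelayLogFile` and `RetryReader` are hypothetical names invented for illustration, and the relay log is modelled as an in-memory vector. It shows a reader that takes its own copy of the format description event from the start of the file before seeking, so it never touches state that another thread may replace. Handling a format description event encountered later in the file is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names are taken
// from the server source.
struct FormatDescription {
    int binlog_version;
};

struct Event {
    bool is_format_description;
    int binlog_version;   // meaningful only for format description events
    std::string payload;  // meaningful only for ordinary events
};

// A relay log file modelled as a sequence of events, with the format
// description event expected at index 0.
using RelayLogFile = std::vector<Event>;

// Reader owned by one worker thread for the duration of a retry.
class RetryReader {
public:
    RetryReader(const RelayLogFile& file, std::size_t start_pos)
        : file_(file), pos_(start_pos < 1 ? 1 : start_pos) {
        if (file_.empty() || !file_.front().is_format_description)
            throw std::runtime_error("no format description event at start of file");
        // Read the format description from the start of the file first and
        // keep a private copy; pos_ (already set above) is where reading
        // resumes afterwards, which models the seek.
        fdev_.binlog_version = file_.front().binlog_version;
    }

    // Decodes the next ordinary event into `out`. Returns false at end of file.
    bool next(std::string& out) {
        while (pos_ < file_.size()) {
            const Event& ev = file_[pos_++];
            if (ev.is_format_description) {
                // Assumption of this sketch: a later format description
                // event replaces the private copy, never a shared one.
                fdev_.binlog_version = ev.binlog_version;
                continue;
            }
            out = "v" + std::to_string(fdev_.binlog_version) + ":" + ev.payload;
            return true;
        }
        return false;
    }

    int version() const { return fdev_.binlog_version; }

private:
    const RelayLogFile& file_;
    std::size_t pos_;
    FormatDescription fdev_{0};  // private to this reader, not shared
};
```

Because each `RetryReader` holds its format description by value, two readers over the same file cannot interfere with each other, which is the property the shared event in rli did not provide.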
8a3e2f29