-
Marko Mäkelä authored
https://jepsen.io/analyses/mysql-8.0.34 highlights that the transaction isolation levels in the InnoDB storage engine do not correspond to any widely accepted definitions, such as "Generalized Isolation Level Definitions" https://pmg.csail.mit.edu/papers/icde00.pdf (PL-1 = READ UNCOMMITTED, PL-2 = READ COMMITTED, PL-2.99 = REPEATABLE READ, PL-3 = SERIALIZABLE). Only READ UNCOMMITTED in InnoDB seems to match the above definition. The issue is that InnoDB does not detect write/write conflicts (Section 4.4.3, Definition 6) in the above. It appears that as soon as we implement write/write conflict detection (SET SESSION innodb_snapshot_isolation=ON), the default isolation level (SET TRANSACTION ISOLATION LEVEL REPEATABLE READ) will become Snapshot Isolation (similar to Postgres), as defined in Section 4.2 of "A Critique of ANSI SQL Isolation Levels", MSR-TR-95-51, June 1995 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf Locking reads inside InnoDB used to read the latest committed version, ignoring what should actually be visible to the transaction. The added test innodb.lock_isolation illustrates this. The statement UPDATE t SET a=3 WHERE b=2; is executed in a transaction that was started before a read view or a snapshot of the current transaction was created, and committed before the current transaction attempts to execute UPDATE t SET b=3; If SET innodb_snapshot_isolation=ON is in effect when the second transaction was started, the second transaction will be aborted with the error ER_CHECKREAD. By default (innodb_snapshot_isolation=OFF), the second transaction would execute inconsistently, displaying an incorrect SELECT COUNT(*) FROM t in its read view. If innodb_snapshot_isolation=ON, if an attempt to acquire a lock on a record that does not exist in the current read view is made, an error DB_RECORD_CHANGED (HA_ERR_RECORD_CHANGED, ER_CHECKREAD) will be raised. This error will be treated in the same way as a deadlock: the transaction will be rolled back. lock_clust_rec_read_check_and_lock(): If the current transaction has a read view where the record is not visible and innodb_snapshot_isolation=ON, fail before trying to acquire the lock. row_sel_build_committed_vers_for_mysql(): If innodb_snapshot_isolation=ON, disable the "semi-consistent read" logic that had been implemented by myself on the directions of Heikki Tuuri in order to address https://bugs.mysql.com/bug.php?id=3300 that was motivated by a customer wanting UPDATE to skip locked rows that do not match the WHERE condition. It looks like my changes were included in the MySQL 5.1.5 commit ad126d90; at that time, employees of Innobase Oy (a recent acquisition of Oracle) had lost write access to the repository. The only reason why we set innodb_snapshot_isolation=OFF by default is backward compatibility with applications, such as the one that motivated the implementation of "semi-consistent read" back in 2005. In a later major release, we can default to innodb_snapshot_isolation=ON. Thanks to Peter Alvaro, Kyle Kingsbury and Alexey Gotsman for their work on https://github.com/jepsen-io/ and to Kyle and Alexey for explanations and some testing of this fix. Thanks to Vladislav Lesin for the initial test for MDEV-26643, as well as reviewing these changes.
b8a67198