    MDEV-26642/MDEV-26643/MDEV-32898 Implement innodb_snapshot_isolation · b8a67198
    Marko Mäkelä authored
    https://jepsen.io/analyses/mysql-8.0.34 highlights that the
    transaction isolation levels in the InnoDB storage engine do not
    correspond to any widely accepted definitions, such as
    "Generalized Isolation Level Definitions"
    https://pmg.csail.mit.edu/papers/icde00.pdf
    (PL-1 = READ UNCOMMITTED, PL-2 = READ COMMITTED, PL-2.99 = REPEATABLE READ,
    PL-3 = SERIALIZABLE).
    Only READ UNCOMMITTED in InnoDB seems to match the above definition.
    
    The issue is that InnoDB does not detect write/write conflicts
    (Section 4.4.3, Definition 6 of the above paper).
    
    It appears that as soon as we implement write/write conflict detection
    (SET SESSION innodb_snapshot_isolation=ON), the default isolation level
    (SET TRANSACTION ISOLATION LEVEL REPEATABLE READ) will become
    Snapshot Isolation (similar to Postgres), as defined in Section 4.2 of
    "A Critique of ANSI SQL Isolation Levels", MSR-TR-95-51, June 1995
    https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf
    
    Locking reads inside InnoDB used to read the latest committed version,
    ignoring what should actually be visible to the transaction.
    The added test innodb.lock_isolation illustrates this. The statement
    	UPDATE t SET a=3 WHERE b=2;
    is executed in a transaction that was started before a read view or
    a snapshot of the current transaction was created, and committed before
    the current transaction attempts to execute
    	UPDATE t SET b=3;
    If innodb_snapshot_isolation=ON was in effect when the second
    transaction was started, the second transaction will be aborted with
    the error ER_CHECKREAD. By default (innodb_snapshot_isolation=OFF),
    the second transaction would execute inconsistently, displaying an
    incorrect result for SELECT COUNT(*) FROM t in its read view.
    
    If innodb_snapshot_isolation=ON, an attempt to acquire a lock on a
    record that does not exist in the current read view will raise the
    error DB_RECORD_CHANGED (HA_ERR_RECORD_CHANGED, ER_CHECKREAD).
    This error will be treated in the same way as a deadlock:
    the transaction will be rolled back.
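
The rule above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not InnoDB code: the names Store, Txn, lock_for_update, RecordChanged and demo are hypothetical, timestamps stand in for read views, and the read view is created when the Txn object is constructed, which is a simplification.

```python
class RecordChanged(Exception):
    """Toy stand-in for DB_RECORD_CHANGED / ER_CHECKREAD (hypothetical)."""


class Store:
    """Keeps only the latest committed version of each record."""

    def __init__(self):
        self.clock = 0        # logical commit counter
        self.commit_ts = {}   # key -> commit time of the latest version
        self.value = {}       # key -> latest committed value

    def commit_write(self, key, value):
        # Models another transaction committing a change to `key`.
        self.clock += 1
        self.commit_ts[key] = self.clock
        self.value[key] = value


class Txn:
    def __init__(self, store, snapshot_isolation):
        self.store = store
        self.snapshot_isolation = snapshot_isolation
        # Simplification: the read view sees every commit made so far.
        self.read_view = store.clock
        self.rolled_back = False

    def lock_for_update(self, key):
        newest = self.store.commit_ts.get(key, 0)
        if self.snapshot_isolation and newest > self.read_view:
            # The latest version is not visible in the read view:
            # fail before acquiring the lock and roll back.
            self.rolled_back = True
            raise RecordChanged(key)
        # With the check disabled, the locking read sees the latest
        # committed version, regardless of the read view.
        return self.store.value[key]


def demo(snapshot_isolation):
    store = Store()
    store.commit_write("row", "b=2")     # initial committed row
    t2 = Txn(store, snapshot_isolation)  # second transaction opens its view
    store.commit_write("row", "a=3")     # first transaction commits later
    try:
        return "locked " + t2.lock_for_update("row")
    except RecordChanged:
        return "rolled back"
```

In this model, demo(True) ends in a rollback, while demo(False) locks the newer version that the read view of the second transaction cannot see.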
    
    lock_clust_rec_read_check_and_lock(): If the current transaction has
    a read view where the record is not visible and
    innodb_snapshot_isolation=ON, fail before trying to acquire the lock.
    
    row_sel_build_committed_vers_for_mysql(): If innodb_snapshot_isolation=ON,
    disable the "semi-consistent read" logic that I had implemented
    at the direction of Heikki Tuuri in order to address
    https://bugs.mysql.com/bug.php?id=3300, which was motivated by a customer
    wanting UPDATE to skip locked rows that do not match the WHERE condition.
    It looks like my changes were included in the MySQL 5.1.5
    commit ad126d90; at that time, employees
    of Innobase Oy (a recent acquisition of Oracle) had lost write access to
    the repository.
    
    The only reason why we set innodb_snapshot_isolation=OFF by default is
    backward compatibility with applications, such as the one that motivated
    the implementation of "semi-consistent read" back in 2005. In a later
    major release, we can default to innodb_snapshot_isolation=ON.
    
    Thanks to Peter Alvaro, Kyle Kingsbury and Alexey Gotsman for their work
    on https://github.com/jepsen-io/ and to Kyle and Alexey for explanations
    and some testing of this fix.
    
    Thanks to Vladislav Lesin for the initial test for MDEV-26643,
    as well as reviewing these changes.