    MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC · 3c09f148
    Marko Mäkelä authored
    Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed.
    
    [TODO: It appears that the resetting is not taking place as often as
    it could be. We should test that a simple INSERT eventually causes
    row_purge_reset_trx_id() to be invoked, unless DROP TABLE is
    invoked soon enough.]
    
    The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR
    are used by multi-versioning. After the history is no longer needed, these
    columns can safely be reset: DB_TRX_ID to 0 and DB_ROLL_PTR to 1<<55
    (the insert flag, indicating a fresh insert).
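    As a rough illustration of the reset values on the page, the sketch below encodes the two system columns in big-endian order, assuming the usual widths of 6 bytes for DB_TRX_ID and 7 bytes for DB_ROLL_PTR. The names (write_be, encode_system_columns, kInsertFlag) are hypothetical and are not taken from the InnoDB source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed widths of the system columns in a clustered index leaf record.
constexpr std::size_t kTrxIdLen = 6;    // DB_TRX_ID
constexpr std::size_t kRollPtrLen = 7;  // DB_ROLL_PTR
// Insert flag in DB_ROLL_PTR: the value 1<<55 marks a fresh insert.
constexpr std::uint64_t kInsertFlag = std::uint64_t{1} << 55;

// Hypothetical helper: store `value` big-endian in `len` bytes at `out`.
void write_be(std::uint8_t* out, std::uint64_t value, std::size_t len) {
  for (std::size_t i = len; i-- > 0;) {
    out[i] = static_cast<std::uint8_t>(value & 0xff);
    value >>= 8;
  }
}

// Hypothetical helper: encode DB_TRX_ID followed by DB_ROLL_PTR.
std::array<std::uint8_t, kTrxIdLen + kRollPtrLen>
encode_system_columns(std::uint64_t trx_id, std::uint64_t roll_ptr) {
  assert(trx_id < (std::uint64_t{1} << 48));    // fits in 6 bytes
  assert(roll_ptr < (std::uint64_t{1} << 56));  // fits in 7 bytes
  std::array<std::uint8_t, kTrxIdLen + kRollPtrLen> bytes{};
  write_be(bytes.data(), trx_id, kTrxIdLen);
  write_be(bytes.data() + kTrxIdLen, roll_ptr, kRollPtrLen);
  return bytes;
}
```

    Under these assumptions, encode_system_columns(0, kInsertFlag) yields six 0x00 bytes, then 0x80, then six more 0x00 bytes.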
    
    When a reader sees 0 in the DB_TRX_ID column, it can instantly determine
    that the record is present in the read view. There is no need to acquire
    the transaction system mutex to check if the transaction exists, because
    writes can never be conducted by a transaction whose ID is 0.
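    A simplified read view shows where a DB_TRX_ID=0 check fits. SimpleReadView below is a hypothetical sketch, not the real ReadView class; its fields assume the usual MVCC bookkeeping (IDs below up_limit_id committed before the view was created, IDs at or above low_limit_id started afterwards, plus a sorted list of IDs that were active at creation time).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using trx_id_t = std::uint64_t;

// Hypothetical, simplified read view (not the InnoDB ReadView class).
struct SimpleReadView {
  trx_id_t up_limit_id;   // IDs below this were committed before the view
  trx_id_t low_limit_id;  // IDs at or above this started after the view
  std::vector<trx_id_t> active_ids;  // sorted IDs active when the view was created
  trx_id_t creator_trx_id;           // the reading transaction itself

  // Returns true if changes made by transaction `id` are visible in this view.
  bool changes_visible(trx_id_t id) const {
    assert(up_limit_id <= low_limit_id);
    // Fast path: DB_TRX_ID=0 means purge removed the history, and no
    // transaction ever writes with ID 0, so the record is always visible.
    if (id == 0) return true;
    if (id < up_limit_id || id == creator_trx_id) return true;
    if (id >= low_limit_id) return false;
    return !std::binary_search(active_ids.begin(), active_ids.end(), id);
  }
};
```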
    
    The persistent InnoDB undo log used to be split into two parts:
    insert_undo and update_undo. The insert_undo log was discarded at
    transaction commit or rollback, and the update_undo log was processed
    by the purge subsystem. As part of this change, we will only generate
    a single undo log for new transactions, and the purge subsystem will
    reset the DB_TRX_ID whenever a clustered index record is touched.
    That is, all persistent undo log will be preserved at transaction commit
    or rollback, to be removed by purge.
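    With a single undo log, purge visits every clustered index record that a committed transaction touched. The toy model below is only a sketch under stated assumptions: the clustered index is a std::map, the types and purge_one() are hypothetical, and purge is assumed to act only if the record still carries the DB_TRX_ID and DB_ROLL_PTR of the undo record being processed (otherwise a later change owns the current values and its history is still needed).

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using trx_id_t = std::uint64_t;
using roll_ptr_t = std::uint64_t;

// DB_ROLL_PTR value written by the reset: only the insert flag is set.
constexpr roll_ptr_t kFreshInsertRollPtr = roll_ptr_t{1} << 55;

// Hypothetical toy model of a clustered index leaf record.
struct Record {
  trx_id_t trx_id;      // DB_TRX_ID
  roll_ptr_t roll_ptr;  // DB_ROLL_PTR
  bool delete_marked;
};

// Hypothetical toy model of one undo log record processed by purge.
struct UndoRec {
  std::uint64_t key;    // primary key of the affected row
  trx_id_t trx_id;      // transaction that wrote this undo record
  roll_ptr_t roll_ptr;  // pointer to this undo record
  bool is_delete_mark;  // true for DELETE, false for INSERT or UPDATE
};

// Processes one undo record; returns true if the index was modified.
bool purge_one(std::map<std::uint64_t, Record>& index, const UndoRec& undo) {
  auto it = index.find(undo.key);
  if (it == index.end()) return false;  // the row is already gone
  Record& rec = it->second;
  // Assumption: act only if no later change has been made to the row,
  // i.e. the record still points to this very undo record.
  if (rec.trx_id != undo.trx_id || rec.roll_ptr != undo.roll_ptr) return false;
  if (undo.is_delete_mark) {
    assert(rec.delete_marked);
    index.erase(it);  // remove the delete-marked record
  } else {
    // The record is kept: reset its system columns, which is the role
    // row_purge_reset_trx_id() plays in this model.
    rec.trx_id = 0;
    rec.roll_ptr = kFreshInsertRollPtr;
  }
  return true;
}
```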
    
    The InnoDB redo log format is changed in two ways:
    We remove the redo log record type MLOG_UNDO_HDR_REUSE, and
    we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the
    DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table.
    
    This also changes the format of persistent InnoDB data files:
    undo log and clustered index leaf page records. It will still be
    possible to exchange data files with earlier versions of MariaDB
    via import and export. The change to clustered index leaf page records
    is simple: we allow DB_TRX_ID to be 0.
    
    When it comes to the undo log, we must be able to upgrade from earlier
    MariaDB versions after a clean shutdown (no redo log to apply).
    While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0)
    before an upgrade, to empty the undo logs, we cannot assume that this
    has been done. So, a separate insert_undo log may exist for recovered
    uncommitted transactions. These transactions may be automatically
    rolled back, or they may be in XA PREPARE state, in which case InnoDB
    will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK.
    
    Upgrade has been tested by starting up MariaDB 10.2 with
    ./mysql-test-run --manual-gdb innodb.read_only_recovery
    and then starting up this patched server with
    and without --innodb-read-only.
    
    trx_undo_ptr_t::undo: Renamed from update_undo.
    
    trx_undo_ptr_t::old_insert: Renamed from insert_undo.
    
    trx_rseg_t::undo_list: Renamed from update_undo_list.
    
    trx_rseg_t::undo_cached: Merged from update_undo_cached
    and insert_undo_cached.
    
    trx_rseg_t::old_insert_list: Renamed from insert_undo_list.
    
    row_purge_reset_trx_id(): New function to reset the columns.
    This will be called for all undo processing in purge
    that does not remove the clustered index record.
    
    trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the
    old DB_TRX_ID of the record to the undo log.
    
    ReadView::changes_visible(): Allow id==0. (Return true for it.
    This is what speeds up the MVCC.)
    
    row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read():
    Implement a fast path for DB_TRX_ID=0.
    
    Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type.
    
    MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format!
    
    innobase_start_or_create_for_mysql(): Set srv_undo_sources before
    starting any transactions.
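    The DB_TRX_ID=0 fast path in row_vers_impl_x_locked_low() lets a caller skip the transaction system entirely when no transaction can hold an implicit lock on the record. The sketch below models this with a hypothetical TrxSys class (a mutex-protected set of active transaction IDs with a lookup counter) and a hypothetical implicit_lock_holder() function; neither is the InnoDB API.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>

using trx_id_t = std::uint64_t;

// Hypothetical stand-in for the transaction system: active transaction IDs
// protected by a mutex, plus a counter of how often a lookup was needed.
class TrxSys {
 public:
  void begin(trx_id_t id) {
    std::lock_guard<std::mutex> guard(mutex_);
    active_.insert(id);
  }
  void commit(trx_id_t id) {
    std::lock_guard<std::mutex> guard(mutex_);
    active_.erase(id);
  }
  bool is_active(trx_id_t id) {
    std::lock_guard<std::mutex> guard(mutex_);
    ++lookups_;
    return active_.count(id) != 0;
  }
  unsigned lookups() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return lookups_;
  }

 private:
  mutable std::mutex mutex_;
  std::set<trx_id_t> active_;
  unsigned lookups_ = 0;
};

// Returns the ID of the transaction holding an implicit lock on a record
// whose DB_TRX_ID is `db_trx_id`, or 0 if there is none.
trx_id_t implicit_lock_holder(TrxSys& trx_sys, trx_id_t db_trx_id) {
  assert(db_trx_id < (trx_id_t{1} << 48));  // DB_TRX_ID is 6 bytes
  // Fast path: purge reset DB_TRX_ID to 0, so no transaction can hold an
  // implicit lock and the transaction system is not consulted at all.
  if (db_trx_id == 0) return 0;
  return trx_sys.is_active(db_trx_id) ? db_trx_id : 0;
}
```

    In this model a record whose DB_TRX_ID was reset never causes a mutex acquisition, which is where the saving comes from.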
    
    The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully
    tested by running the following:
    ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680
    grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err