    MDEV-13564 Mariabackup does not work with TRUNCATE · 055a3334
    Marko Mäkelä authored
    Implement undo tablespace truncation via normal redo logging.
    
    Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name,
    CREATE, and DROP.
    
    Note: Orphan #sql-ib*.ibd files may be left behind if MariaDB Server 10.2
    is killed before the DROP operation is committed. If MariaDB Server 10.2
    is killed during TRUNCATE, it is also possible that the old table
    was renamed to #sql-ib*.ibd but the data dictionary still refers to the
    table by its original name.
    
    In MariaDB Server 10.3, RENAME inside InnoDB is transactional,
    and #sql-* tables will be dropped on startup. So, this new TRUNCATE
    will be fully crash-safe in 10.3.
    
    ha_mroonga::wrapper_truncate(): Pass table options to the underlying
    storage engine, now that ha_innobase::truncate() will need them.
    
    rpl_slave_state::truncate_state_table(): Before truncating
    mysql.gtid_slave_pos, evict any cached table handles from
    the table definition cache, so that there will be no stale
    references to the old table after truncating.
    
    == TRUNCATE TABLE ==
    
    WL#6501 in MySQL 5.7 introduced separate log files for implementing
    atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB
    undo and redo log. Some convoluted logic was added to the InnoDB
    crash recovery, and some extra synchronization (including a redo log
    checkpoint) was introduced to make this work. This synchronization
    has caused performance problems and race conditions, and the extra
    log files cannot be copied or applied by external backup programs.
    
    In order to support crash-upgrade from MariaDB 10.2, we will keep
    the logic for parsing and applying the extra log files, but we will
    no longer generate those files in TRUNCATE TABLE.
    
    A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE
    (with full redo and undo logging and proper rollback). This will
    be implemented in MDEV-14717.
    
    ha_innobase::truncate(): Invoke RENAME, create(), delete_table().
    Because RENAME cannot be fully rolled back before MariaDB 10.3
    due to missing undo logging, add some explicit rename-back in
    case the operation fails.
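    The RENAME, create(), delete_table() sequence with an explicit
    rename-back can be sketched as below. This is a minimal sketch under
    stated assumptions: Catalog, truncate_table() and the fail_create
    fault-injection flag are hypothetical stand-ins for the data dictionary,
    not InnoDB APIs, and RENAME is assumed not to be undo-logged (as before
    MariaDB 10.3), which is why a failed CREATE must be compensated by hand.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory catalog standing in for the data dictionary.
// Each table maps to a "generation" so a test can tell the old table
// from the freshly created one. None of these names are InnoDB APIs.
struct Catalog {
  std::map<std::string, int> tables;
  int next_generation = 1;
  bool fail_create = false;  // fault injection for the sketch

  bool rename(const std::string& from, const std::string& to) {
    auto it = tables.find(from);
    if (it == tables.end() || tables.count(to)) return false;
    int gen = it->second;
    tables.erase(it);
    tables[to] = gen;
    return true;
  }
  bool create(const std::string& name) {
    if (fail_create || tables.count(name)) return false;
    tables[name] = next_generation++;
    return true;
  }
  bool drop(const std::string& name) { return tables.erase(name) == 1; }
};

// Hypothetical TRUNCATE: RENAME to a temporary #sql-ib name, CREATE, DROP.
bool truncate_table(Catalog& cat, const std::string& name,
                    const std::string& tmp_name) {
  if (!cat.rename(name, tmp_name)) return false;
  if (!cat.create(name)) {
    // RENAME cannot be rolled back in this model, so rename back explicitly.
    bool restored = cat.rename(tmp_name, name);
    assert(restored);
    (void) restored;
    return false;
  }
  // A kill at this point would leave tmp_name behind as an orphan.
  return cat.drop(tmp_name);
}
```

    The window between create() and the final drop() is where the orphan
    #sql-ib*.ibd files mentioned in the note above could appear.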
    
    ha_innobase::delete_table(): Introduce a variant that takes sqlcom as
    a parameter. In TRUNCATE TABLE, we do not want to touch any
    FOREIGN KEY constraints.
    
    ha_innobase::create(): Add the parameters file_per_table, trx.
    In TRUNCATE, the new table must be created in the same transaction
    that renames the old table.
    
    create_table_info_t::create_table_info_t(): Add the parameters
    file_per_table, trx.
    
    row_drop_table_for_mysql(): Replace a bool parameter with sqlcom.
    
    row_drop_table_after_create_fail(): New function, wrapping
    row_drop_table_for_mysql().
    
    dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(),
    fil_prepare_for_truncate(), fil_reinit_space_header_for_table(),
    row_truncate_table_for_mysql(), TruncateLogger,
    row_truncate_prepare(), row_truncate_rollback(),
    row_truncate_complete(), row_truncate_fts(),
    row_truncate_update_system_tables(),
    row_truncate_foreign_key_checks(), row_truncate_sanity_checks():
    Remove.
    
    row_upd_check_references_constraints(): Remove a check for
    TRUNCATE, now that the table is no longer truncated in place.
    
    The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some
    race-condition-like scenarios. The test innodb-innodb.truncate does
    not use any synchronization.
    
    We add a redo log subformat to indicate the backup-friendly format.
    MariaDB 10.4 will remove support for the old TRUNCATE logging,
    so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve
    limitations.
    
    == Undo tablespace truncation ==
    
    MySQL 5.7 implements undo tablespace truncation. It is only
    possible when innodb_undo_tablespaces is set to at least 2.
    The logging is implemented similar to the WL#6501 TRUNCATE,
    that is, using separate log files and a redo log checkpoint.
    
    We can simply implement undo tablespace truncation within
    a single mini-transaction that reinitializes the undo log
    tablespace file. Unfortunately, due to the redo log format
    of some operations, the total redo log currently written by
    undo tablespace truncation will exceed the size of the
    truncated undo tablespace. It should be acceptable
    to have a little more than 1 megabyte of log in a single
    mini-transaction. This will be fixed in MDEV-17138 in
    MariaDB Server 10.4.
    
    recv_sys_t: Add truncated_undo_spaces[] to remember for which undo
    tablespaces a MLOG_FILE_CREATE2 record was seen.
    
    namespace undo: Remove some unnecessary declarations.
    
    fil_space_t::is_being_truncated: Document that this flag now
    only applies to undo tablespaces. Remove some references.
    
    fil_space_t::is_stopping(): Do not refer to is_being_truncated.
    This check is for tablespaces of tables. Potentially used
    tablespaces are never truncated any more.
    
    buf_dblwr_process(): Suppress the out-of-bounds warning
    for undo tablespaces.
    
    fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero
    page number (new size of the tablespace in pages) to inform
    crash recovery that the undo tablespace size has been reduced.
    
    fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2
    can be written for undo tablespaces (without .ibd file suffix)
    for a nonzero page number.
    
    os_file_truncate(): Add the parameter allow_shrink=false
    so that undo tablespaces can actually be shrunk using this function.
    
    fil_name_parse(): For undo tablespace truncation,
    buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[].
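    How recovery might remember undo tablespace truncations while parsing
    can be modelled with a small per-tablespace array. This is a sketch
    under assumptions: RecoverySketch, UndoTruncation, on_file_create2()
    and the capacity kMaxUndoSpaces are hypothetical, and only the idea
    (a nonzero page number in MLOG_FILE_CREATE2 on an undo tablespace
    carries the new size, remembered together with its LSN) comes from
    the text above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using lsn_t = std::uint64_t;

// Hypothetical capacity; not taken from the InnoDB source.
constexpr std::size_t kMaxUndoSpaces = 127;

// One slot per undo tablespace, indexed by space_id - first_undo_id.
struct UndoTruncation {
  lsn_t lsn = 0;            // LSN of the MLOG_FILE_CREATE2 record; 0 = none
  std::uint32_t pages = 0;  // new size of the tablespace, in pages
};

// Hypothetical model of what recovery buffers while parsing the redo log.
struct RecoverySketch {
  std::uint32_t first_undo_id;
  std::array<UndoTruncation, kMaxUndoSpaces> truncated_undo_spaces{};

  explicit RecoverySketch(std::uint32_t first) : first_undo_id(first) {}

  // Called for each parsed MLOG_FILE_CREATE2 record. A nonzero page number
  // on an undo tablespace is read as "shrunk to page_no pages". Returns
  // whether the record was buffered as an undo truncation.
  bool on_file_create2(std::uint32_t space_id, std::uint32_t page_no,
                       lsn_t lsn) {
    if (page_no == 0 || space_id < first_undo_id ||
        space_id - first_undo_id >= kMaxUndoSpaces) {
      return false;
    }
    UndoTruncation& slot = truncated_undo_spaces[space_id - first_undo_id];
    // The log is parsed in LSN order, so a later record replaces an earlier one.
    slot.lsn = lsn;
    slot.pages = page_no;
    return true;
  }
};
```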
    
    recv_read_in_area(): Avoid reading pages for which no redo log
    records remain buffered, after recv_addr_trim() removed them.
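    The read-ahead filter can be pictured as selecting, from an area of
    pages, only those that still have buffered redo records. This is a
    minimal sketch: PendingRecords and pages_to_read() are hypothetical,
    and the area is simplified to the half-open range
    [first, first + area_size) rather than whatever alignment the real
    recv_read_in_area() uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of buffered redo log records: page number -> number
// of records still waiting to be applied (0 once they were discarded).
using PendingRecords = std::map<std::uint32_t, std::size_t>;

// Collect the pages of [first, first + area_size) that still have buffered
// records. Pages whose records were discarded (for example after a
// truncation) are skipped, so they are never read from disk.
std::vector<std::uint32_t> pages_to_read(const PendingRecords& pending,
                                         std::uint32_t first,
                                         std::uint32_t area_size) {
  std::vector<std::uint32_t> out;
  for (auto it = pending.lower_bound(first);
       it != pending.end() && it->first < first + area_size; ++it) {
    if (it->second != 0) out.push_back(it->first);
  }
  return out;
}
```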
    
    trx_rseg_header_create(): Add a FIXME comment that we could write
    much less redo log.
    
    trx_undo_truncate_tablespace(): Reinitialize the undo tablespace
    in a single mini-transaction, which will be flushed to the redo log
    before the file size is trimmed.
    
    recv_addr_trim(): Discard any redo log records for pages beyond
    the new end of a file that were logged before the truncation LSN.
    If the rec_list becomes empty, reduce n_addrs. After removing
    any affected records, actually truncate the file.
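    The trimming rule can be illustrated with a hypothetical hash keyed by
    (space_id, page_no). In this sketch, RecvSketch, add(), trim() and the
    simulated file_pages map are assumptions rather than InnoDB code: for
    pages at or beyond the new size, records with an LSN below the
    truncation LSN are dropped, a page whose list becomes empty is removed
    and n_addrs is decremented, and the simulated file is then shrunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical redo-log hash: (space_id, page_no) -> LSNs of buffered records.
struct RecvSketch {
  std::map<std::pair<std::uint32_t, std::uint32_t>, std::vector<lsn_t>> addrs;
  std::size_t n_addrs = 0;  // number of pages with buffered records
  std::map<std::uint32_t, std::uint32_t> file_pages;  // simulated file sizes

  void add(std::uint32_t space, std::uint32_t page, lsn_t lsn) {
    auto& list = addrs[{space, page}];
    if (list.empty()) ++n_addrs;
    list.push_back(lsn);
  }

  // Discard records for pages >= pages of `space` that were written before
  // trunc_lsn, then shrink the simulated file to `pages`.
  void trim(std::uint32_t space, std::uint32_t pages, lsn_t trunc_lsn) {
    auto it = addrs.lower_bound({space, pages});
    while (it != addrs.end() && it->first.first == space) {
      auto& list = it->second;
      list.erase(std::remove_if(list.begin(), list.end(),
                                [trunc_lsn](lsn_t l) { return l < trunc_lsn; }),
                 list.end());
      if (list.empty()) {
        --n_addrs;
        it = addrs.erase(it);
      } else {
        ++it;
      }
    }
    if (file_pages[space] > pages) file_pages[space] = pages;
  }
};
```

    Records written after the truncation LSN survive, which matters if the
    tablespace is extended again before the crash.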
    
    recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before
    applying any log records. The undo tablespace files must be open
    at this point.
    
    buf_flush_or_remove_pages(), buf_flush_dirty_pages(),
    buf_LRU_flush_or_remove_pages(): Add a parameter for specifying
    the number of the first page to flush or remove (default 0).
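    The effect of the new first-page parameter can be sketched on a toy
    buffer pool. PageSketch and flush_or_remove_pages() are hypothetical,
    and the sketch simply counts a write for each dirty page and evicts
    every matching page; the real functions distinguish flushing from
    removal. With the default first = 0 the whole tablespace is affected,
    as before.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>

// Hypothetical buffer pool entry; not the real buf_page_t.
struct PageSketch {
  std::uint32_t space_id;
  std::uint32_t page_no;
  bool dirty;
};

// Write back dirty pages and evict every page of `space_id` whose number is
// at least `first`. Returns how many dirty pages were written back.
std::size_t flush_or_remove_pages(std::list<PageSketch>& pool,
                                  std::uint32_t space_id,
                                  std::uint32_t first = 0) {
  std::size_t written = 0;
  for (auto it = pool.begin(); it != pool.end();) {
    if (it->space_id == space_id && it->page_no >= first) {
      if (it->dirty) ++written;  // stand-in for a page write
      it = pool.erase(it);
    } else {
      ++it;
    }
  }
  return written;
}
```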
    
    trx_purge_initiate_truncate(): Remove the log checkpoints, the
    extra logging, and some unnecessary crash points. Merge the code
    from trx_undo_truncate_tablespace(). First, flush all to-be-discarded
    pages (beyond the new end of the file), then trim the space->size
    to make the page allocation deterministic. At the only remaining
    crash injection point, flush the redo log, so that the recovery
    can be tested.
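    The ordering that trx_purge_initiate_truncate() is described as
    following (discard pages beyond the new end, trim space->size, write
    the reinitialization in one mini-transaction, make that log durable,
    and only then shrink the file) can be modelled step by step. This is a
    sketch under assumptions: UndoSpaceSketch and truncate_to() are
    hypothetical, the mini-transaction is modelled as advancing the LSN by
    an arbitrary amount, and each step is recorded so its order can be
    checked.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical model of the undo tablespace truncation order; not InnoDB code.
struct UndoSpaceSketch {
  std::uint32_t size_pages;  // stand-in for space->size
  std::uint32_t file_pages;  // size of the data file on disk
  lsn_t log_written = 0;     // end of the log buffered in memory
  lsn_t log_flushed = 0;     // end of the log durably written
  std::vector<std::string> steps;

  explicit UndoSpaceSketch(std::uint32_t pages)
      : size_pages(pages), file_pages(pages) {}

  void truncate_to(std::uint32_t new_pages) {
    // 1. Write back and evict the to-be-discarded pages.
    steps.push_back("flush pages >= " + std::to_string(new_pages));
    // 2. Trim the in-memory size so page allocation is deterministic.
    size_pages = new_pages;
    steps.push_back("trim space size");
    // 3. Reinitialize the tablespace in a single mini-transaction.
    log_written += 100;  // arbitrary amount of redo log for the sketch
    const lsn_t end_lsn = log_written;
    steps.push_back("commit mini-transaction");
    // 4. Make that log durable, so recovery always sees the truncation.
    log_flushed = end_lsn;
    steps.push_back("flush redo log");
    // 5. Only now shrink the file.
    assert(log_flushed >= end_lsn);
    file_pages = new_pages;
    steps.push_back("shrink file");
  }
};
```

    Because the redo log is durable before the file shrinks, a crash at any
    point leaves either the old file size with no truncation record, or a
    truncation record that recovery can replay.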