    MDEV-12699 Improve crash recovery of corrupted data pages · 169c0099
    Marko Mäkelä authored
    InnoDB crash recovery used to read every data page for which
    redo log exists. This is unnecessary for those pages that are
    initialized by the redo log. If a newly created page is corrupted,
    recovery could unnecessarily fail. It would suffice to reinitialize
    the page based on the redo log records.
    
    To add insult to injury, InnoDB crash recovery could hang if it
    encountered a corrupted page. We will also fix that problem.
    InnoDB would normally refuse to start up if it encounters a
    corrupted page on recovery, but that can be overridden by
    setting innodb_force_recovery=1.
    
    Data pages are completely initialized by the records
    MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS.
    MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE,
    which indicates that a page has been freed and its contents
    can be discarded (filled with zeroes).
    
    The record MLOG_INDEX_LOAD indicates that redo logging has
    been re-enabled after being disabled. We can avoid loading
    the page if all buffered redo log records predate the
    MLOG_INDEX_LOAD record.
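
    The rule above can be sketched as a predicate. This is only an
    illustration: can_skip_page_read() and the convention that
    enable_lsn == 0 means "no MLOG_INDEX_LOAD seen" are assumptions
    made for this sketch, not InnoDB definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the InnoDB log sequence number type.
using lsn_t = std::uint64_t;

// Returns true if every buffered record for a page predates the latest
// MLOG_INDEX_LOAD record of its tablespace (enable_lsn), in which case
// the page would not need to be loaded.
// Assumption: enable_lsn == 0 means no MLOG_INDEX_LOAD was recovered.
bool can_skip_page_read(const std::vector<lsn_t> &buffered_lsns,
                        lsn_t enable_lsn)
{
  if (enable_lsn == 0) {
    return false;
  }
  return std::all_of(buffered_lsns.begin(), buffered_lsns.end(),
                     [enable_lsn](lsn_t lsn) { return lsn < enable_lsn; });
}
```

    With enable_lsn = 200, buffered records at LSN 100 and 150 allow
    the read to be skipped, while a record at LSN 250 does not.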
    
    For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD
    records were written before commit aa3f7a10.
    Hence, we will skip these optimizations for tables whose
    name starts with FTS_.
    
    This is joint work with Thirunarayanan Balathandayuthapani.
    
    fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the
    latest recovered MLOG_INDEX_LOAD record for a tablespace.
    
    mlog_init: Page initialization operations discovered during
    redo log scanning. FIXME: This really belongs in recv_sys->addr_hash,
    and should be removed in MDEV-19176.
    
    recv_addr_state: Add the new state RECV_WILL_NOT_READ to
    indicate that according to mlog_init, the page will be
    initialized based on redo log record contents.
    
    recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state
    if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS
    as page initialization. This works around bugs in the crash
    recovery of ROW_FORMAT=COMPRESSED tables.
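
    A minimal sketch of the bookkeeping described for mlog_init and
    RECV_WILL_NOT_READ follows. The types init_tracker, addr_state and
    rec_type are hypothetical simplifications for illustration; the
    real data structures and record handling differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical, simplified stand-ins; not the InnoDB definitions.
using lsn_t = std::uint64_t;
using page_id = std::pair<std::uint32_t, std::uint32_t>; // (space, page no)

enum class addr_state { NOT_PROCESSED, WILL_NOT_READ };
enum class rec_type { INIT_FILE_PAGE2, ZIP_PAGE_COMPRESS, OTHER };

struct init_tracker {
  // Latest page initialization LSN seen for each page during the scan.
  std::map<page_id, lsn_t> inits;

  // Register one scanned record and return the state the page should have.
  addr_state add(page_id id, rec_type type, lsn_t lsn)
  {
    // As in the commit message, ZIP_PAGE_COMPRESS is deliberately not
    // treated as a page initialization in this sketch.
    if (type == rec_type::INIT_FILE_PAGE2) {
      lsn_t &init = inits[id];
      if (init < lsn) {
        init = lsn;
      }
    }
    // Once an initialization is known, the page need not be read.
    return inits.count(id) ? addr_state::WILL_NOT_READ
                           : addr_state::NOT_PROCESSED;
  }
};
```

    In this model a page stays in WILL_NOT_READ once an initialization
    record has been seen for it; records for other pages are unaffected.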
    
    recv_mark_log_index_load(): Process an MLOG_INDEX_LOAD record
    by resetting the state to RECV_NOT_PROCESSED and by updating
    file_name_t::enable_lsn.
    
    recv_init_crash_recovery_spaces(): Copy file_name_t::enable_lsn
    to fil_space_t::enable_lsn.
    
    recv_recover_page(): Add the parameter init_lsn, to ignore
    any log records that precede the page initialization.
    Add DBUG output about skipped operations.
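
    The effect of init_lsn can be sketched as a filter over the
    buffered records of one page. The names log_rec and
    records_to_apply() are hypothetical, and treating init_lsn == 0 as
    "no initialization known" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; not the InnoDB definitions.
using lsn_t = std::uint64_t;

struct log_rec {
  lsn_t start_lsn; // LSN at which the record starts
  int payload;     // placeholder for the record body
};

// Returns the records that would still be applied to a page whose
// stored LSN is page_lsn and whose latest initialization record is at
// init_lsn (assumed to be 0 if no initialization is known).
std::vector<log_rec> records_to_apply(const std::vector<log_rec> &recs,
                                      lsn_t page_lsn, lsn_t init_lsn)
{
  std::vector<log_rec> out;
  for (const log_rec &rec : recs) {
    if (rec.start_lsn < page_lsn) {
      continue; // already reflected in the page
    }
    if (rec.start_lsn < init_lsn) {
      continue; // precedes the page initialization
    }
    out.push_back(rec);
  }
  return out;
}
```

    For records at LSN 100, 200 and 300 with init_lsn = 200, only the
    records at 200 and 300 remain to be applied.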
    
    buf_page_create(): Initialize FIL_PAGE_LSN, so that
    recv_recover_page() will not wrongly skip applying
    the page-initialization record due to the field containing
    some newer LSN as a leftover from a different page.
    Do not invoke ibuf_merge_or_delete_for_page() during
    crash recovery.
    
    recv_apply_hashed_log_recs(): Remove some unnecessary lookups.
    Note if a corrupted page was found during recovery.
    After invoking buf_page_create(), do invoke
    ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge()
    in the last recovery batch.
    
    ibuf_merge_or_delete_for_page(): Relax a debug assertion.
    
    innobase_start_or_create_for_mysql(): Abort startup if
    a corrupted page was found during recovery. Corrupted pages
    will not be flagged if innodb_force_recovery is set.
    However, the recv_sys->found_corrupt_fs flag can be set
    regardless of innodb_force_recovery if file names are found
    to be incorrect (for example, multiple files with the same
    tablespace ID).