    MDEV-24626 Remove synchronous write of page0 file during file creation · 86dc7b4d
    Marko Mäkelä authored
    During data file creation, InnoDB holds the dict_sys mutex, tries to
    write page 0 of the file and flushes the file. This not only causes
    unnecessary contention but also deviates from the write-ahead
    logging protocol.
    
    The clean sequence of operations is that we first start a dictionary
    transaction and write SYS_TABLES and SYS_INDEXES records that identify
    the tablespace. Then, we durably write a FILE_CREATE record to the
    write-ahead log and create the file.
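The ordering rule above can be illustrated with a toy model. This is a minimal sketch, not InnoDB code: `ToyLog`, `ToyFs` and `create_table()` are hypothetical names, and the SYS_TABLES/SYS_INDEXES writes are simplified to plain log appends. The point it shows is that the file may only be created after the FILE_CREATE record is durable, and that page 0 is not written at creation time.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical redo log model: each record gets the next LSN, and
// flush() makes everything appended so far durable.
struct ToyLog {
  std::vector<std::string> records;
  uint64_t lsn = 0;          // LSN of the last appended record
  uint64_t flushed_lsn = 0;  // LSN up to which the log is durable

  uint64_t append(const std::string& rec) {
    records.push_back(rec);
    return ++lsn;
  }
  void flush() { flushed_lsn = lsn; }
};

// Hypothetical file system model enforcing the write-ahead rule:
// a file may be created only once its FILE_CREATE record is durable.
struct ToyFs {
  std::vector<std::string> files;

  void create(const ToyLog& log, uint64_t file_create_lsn,
              const std::string& name) {
    if (file_create_lsn > log.flushed_lsn)
      throw std::logic_error("FILE_CREATE is not durable yet");
    files.push_back(name);  // the file stays zero-filled; no page 0 write
  }
};

// The clean sequence: dictionary records first (simplified here to log
// appends), then a durably written FILE_CREATE, then the file itself.
inline void create_table(ToyLog& log, ToyFs& fs, const std::string& name) {
  log.append("SYS_TABLES " + name);
  log.append("SYS_INDEXES " + name);
  uint64_t lsn = log.append("FILE_CREATE " + name);
  log.flush();
  fs.create(log, lsn, name);
}
```

Creating a file before flushing its FILE_CREATE record throws in this model, which is the deviation the old code path allowed.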
    
    Recovery should not unnecessarily insist that the first page of each
    data file that is referred to by the redo log is valid. It must be
    enough that page 0 of the tablespace can be initialized based on the
    redo log contents.
    
    We introduce a new data structure deferred_spaces that keeps track
    of corrupted-looking files during recovery. The data structure holds
    the last LSN of a FILE_ record referring to the data file, the
    tablespace identifier, and the last known file name.
    
    Two scenarios can occur during recovery:
    i) Sufficient memory: InnoDB can reconstruct the
    tablespace after parsing all redo log records.
    
    ii) Insufficient memory (multiple apply phases): InnoDB should
    store the redo log records of the deferred tablespace even though
    the tablespace is not present. InnoDB should start constructing
    the tablespace when it first encounters the deferred tablespace
    id.
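Both scenarios can be modeled by one parse/apply loop: with sufficient memory there is a single batch, otherwise there are several. This is a hypothetical sketch (`Rec`, `ToyRecovery`, `parse()` and `apply_batch()` are illustrative names, not the real `recv_sys_t` members). As a simplification, records for tablespaces that are neither loaded nor deferred are dropped. Records of a deferred tablespace are buffered without a tablespace object, and the tablespace is constructed the first time its id is seen during apply.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical redo record addressed to (space_id, page_no).
struct Rec {
  uint32_t space_id;
  uint32_t page_no;
  std::string body;
};

struct ToyRecovery {
  std::set<uint32_t> loaded;    // tablespaces that were opened normally
  std::set<uint32_t> deferred;  // tablespaces whose loading was deferred
  std::map<uint32_t, std::vector<Rec>> buffered;  // parsed, not applied
  std::vector<uint32_t> created;  // deferred spaces, in construction order
  std::vector<Rec> applied;

  // Parse: keep records for loaded AND deferred tablespaces, even though
  // no tablespace object exists for the deferred ones yet.
  void parse(const Rec& r) {
    if (loaded.count(r.space_id) || deferred.count(r.space_id))
      buffered[r.space_id].push_back(r);
  }

  // Apply one batch. The first time a deferred id shows up, construct
  // the tablespace (its page 0 is rebuilt from the buffered records).
  void apply_batch() {
    for (auto& kv : buffered) {
      uint32_t id = kv.first;
      if (deferred.erase(id)) {  // true only on first encounter
        created.push_back(id);
        loaded.insert(id);
      }
      applied.insert(applied.end(), kv.second.begin(), kv.second.end());
    }
    buffered.clear();
  }
};
```

Running a second batch for the same id does not construct the tablespace again, because it has already moved from the deferred set to the loaded set.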
    
    In backup_fix_ddl(), Mariabackup copies the zero-filled .ibd file as
    a file with the .new extension. The Mariabackup test case performs
    page flushing when it deals with DDL operations during the backup.
    
    fil_ibd_create(): Remove the write of page 0 and the flushing of the file.
    
    fil_ibd_load(): Return FIL_LOAD_DEFER if the tablespace has
    a zero-filled page 0.
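The check behind FIL_LOAD_DEFER can be sketched as "an all-zero first page is not corruption during recovery". This is an illustration only: `LoadStatus`, `is_zero_filled()` and `load_page0()` are hypothetical names, `checksum_ok` stands in for InnoDB's real page 0 validation, and the `in_recovery` flag is an assumption modeled on the Datafile note below.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result codes; only FIL_LOAD_DEFER is named in the commit.
enum class LoadStatus { OK, DEFER, INVALID };

// True if every byte of the page is zero.
inline bool is_zero_filled(const std::vector<uint8_t>& page) {
  return std::all_of(page.begin(), page.end(),
                     [](uint8_t b) { return b == 0; });
}

// Files are now created without writing page 0, so during recovery an
// all-zero page 0 defers loading instead of being reported as an error.
// checksum_ok stands in for the real page 0 validation.
inline LoadStatus load_page0(const std::vector<uint8_t>& page0,
                             bool in_recovery, bool checksum_ok) {
  if (in_recovery && is_zero_filled(page0))
    return LoadStatus::DEFER;
  return checksum_ok ? LoadStatus::OK : LoadStatus::INVALID;
}
```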
    
    Datafile: Clean up the error handling, and do not report errors
    if we are in the middle of recovery. The caller will check
    Datafile::m_defer.
    
    fil_node_t::deferred: Indicates whether the tablespace loading was
    deferred during recovery.
    
    FIL_LOAD_DEFER: Returned by fil_ibd_load() to indicate that the
    tablespace file could not be loaded yet and its loading is deferred.
    
    recv_sys_t::recover_deferred(): Invoke deferred_spaces.create() to
    initialize fil_space_t based on buffered metadata and records to
    initialize page 0. Ignore the flags in fil_name_t, because they are
    intentionally invalid.
    
    fil_name_process(): Update deferred_spaces.
    
    recv_sys_t::parse(): Store the redo log records if the tablespace id
    is present in deferred_spaces.
    
    recv_sys_t::recover_low(): Recover the first page of
    the tablespace even if the tablespace instance is not
    present.
    
    recv_sys_t::apply(): Initialize the deferred tablespace
    before applying the deferred tablespace records.
    recv_validate_tablespace(): Skip the validation for deferred_spaces.
    
    recv_rename_files(): Moved and revised from recv_sys_t::apply().
    Do not attempt to rename a file if a deferred-recovery tablespace
    is associated with the name.
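The rename rule can be sketched as a filter over pending renames. This is hypothetical: `PendingRename` and `plan_renames()` are illustrative names, and because the commit message does not say whether the source or the target name is meant, the sketch checks both as an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical pending rename collected from FILE_RENAME records.
struct PendingRename {
  uint32_t space_id;
  std::string from;
  std::string to;
};

// Return the renames that would actually be performed. A rename is
// skipped when a deferred-recovery tablespace is associated with the
// name (assumption: either the source or the target name).
inline std::vector<PendingRename> plan_renames(
    const std::vector<PendingRename>& pending,
    const std::set<std::string>& deferred_names) {
  std::vector<PendingRename> out;
  for (const PendingRename& r : pending) {
    if (deferred_names.count(r.from) || deferred_names.count(r.to))
      continue;  // leave this file alone
    out.push_back(r);
  }
  return out;
}
```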
    
    recv_recovery_from_checkpoint_start(): Invoke recv_rename_files()
    and initialize all deferred tablespaces before applying redo log.
    
    fil_node_t::read_page0(): Skip page 0 validation if the tablespace
    is deferred.
    
    buf_page_create_deferred(): A variant of buf_page_create() for use
    when the fil_space_t is not available yet.
    
    This is joint work with Thirunarayanan Balathandayuthapani,
    who implemented an initial prototype.