    MDEV-20755 InnoDB: Database page corruption on disk or a failed file read of... · 985ede92
    Vlad Lesin authored
    MDEV-20755 InnoDB: Database page corruption on disk or a failed file read of tablespace upon prepare of mariabackup incremental backup
    
    The problem:
    
    When an incremental backup is taken, delta files are created for InnoDB
    tables that are marked as new during InnoDB DDL tracking. When such a
    tablespace is opened during prepare in xb_delta_open_matching_space(),
    it is "created", i.e. xb_space_create_file() is invoked instead of
    opening the existing file, even if a tablespace with the same name
    exists in the base backup directory.
    
    xb_space_create_file() writes the page 0 header of the tablespace.
    This header does not contain crypt data, as mariabackup does not have
    any information about crypt data in the delta file metadata for
    tablespaces.
    
    After the delta file is applied, the recovery process is started. As
    the order in which different pages are recovered is not defined, the
    crypt data redo log record may be applied only after some other page
    has already been read for recovery. When a page is read for recovery,
    it is decrypted using the crypt data stored in the tablespace header in
    page 0. If there is no crypt data, the page is not decrypted and does
    not pass the corruption check.
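
    The ordering dependency described above can be illustrated with a toy
    model. This is only a sketch: Tablespace, EncryptedPage and
    read_page_for_recovery() are hypothetical names, and a single XOR byte
    stands in for real InnoDB encryption and page checksums.

```cpp
#include <cstdint>

// Hypothetical stand-in for the tablespace header in page 0.
// has_crypt_data is false for a page 0 that was freshly "created".
struct Tablespace {
    bool has_crypt_data = false;
    std::uint8_t key = 0;  // toy key; real crypt data is far richer
};

// Hypothetical encrypted page: one payload byte plus the plaintext value
// that a (toy) corruption check expects to see after decryption.
struct EncryptedPage {
    std::uint8_t payload;
    std::uint8_t expected_plain;
};

// Reads a page for recovery: decrypts only if crypt data is present,
// then runs the toy corruption check. Returns true if the page passes.
bool read_page_for_recovery(const Tablespace& space, const EncryptedPage& page) {
    std::uint8_t data = page.payload;
    if (space.has_crypt_data) {
        data = static_cast<std::uint8_t>(data ^ space.key);
    }
    return data == page.expected_plain;
}
```

    In this model, a page read before the crypt data reaches page 0 fails
    the check, and the same page passes once the crypt data is in place.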
    
    This causes an error in incremental backup --prepare for encrypted
    tablespaces.
    
    The error is intermittent because the crypt data redo log record
    updates the crypt data on page 0, and recovery of different pages can
    be executed in an undefined order.
    
    The fix:
    
    When a delta file is created, the corresponding write filter copies only
    the pages whose LSN is greater than the incremental LSN. When a new file
    is created during incremental backup, the LSN of all its pages must be
    greater than the incremental LSN, so there is no need to create a delta
    for such a table; we can just copy it completely.
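
    The LSN argument can be sketched as follows. This is a simplified
    illustration, not the mariabackup write filter itself: Page,
    filter_delta_pages() and delta_equals_full_copy() are hypothetical
    names introduced only for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified page descriptor: page number and page LSN.
struct Page {
    std::uint32_t page_no;
    std::uint64_t lsn;
};

// Returns the pages an incremental write filter would put into a delta:
// only those whose LSN is greater than incremental_lsn.
std::vector<Page> filter_delta_pages(const std::vector<Page>& pages,
                                     std::uint64_t incremental_lsn) {
    std::vector<Page> delta;
    for (const Page& p : pages) {
        if (p.lsn > incremental_lsn) {
            delta.push_back(p);
        }
    }
    return delta;
}

// True when the filter keeps every page, i.e. the delta carries the same
// pages as a full copy of the file.
bool delta_equals_full_copy(const std::vector<Page>& pages,
                            std::uint64_t incremental_lsn) {
    return filter_delta_pages(pages, incremental_lsn).size() == pages.size();
}
```

    For a file created after the incremental LSN, every page passes the
    filter, so the delta holds no fewer pages than the file itself.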
    
    The fix is to copy the whole file that was tracked by the InnoDB DDL
    tracker during incremental backup, and to copy it to the base directory
    during --prepare instead of applying a delta.
    
    There is also a DBUG_EXECUTE_IF() in the InnoDB code that skips writing
    the redo log record for the crypt data update on page 0, to make the
    test case stable.
    
    Note:
    
    The issue is not reproducible in 10.5, as optimized DDLs are deprecated
    in 10.5. But the fix is still useful because it reduces the amount of
    data copied during backup, as a delta file contains some extra info.
    The test case should be removed for 10.5, as it will always pass.
xtrabackup.cc 184 KB