    MDEV-24449 Corruption of system tablespace or last recovered page · 5b9ee8d8
    Marko Mäkelä authored
    This corresponds to 10.5 commit 39378e13.
    
    With a patched version of the test innodb.ibuf_not_empty (so that
    it would trigger crash recovery after using the change buffer),
    and patched code that would modify the os_thread_sleep() in
    recv_apply_hashed_log_recs() to be 1ms as well as add a sleep of
    the same duration to the end of recv_recover_page() when
    recv_sys->n_addrs=0, we can demonstrate a race condition.
    
    After disabling some debug checks in buf_all_freed_instance(),
    buf_pool_invalidate_instance() and buf_validate(), we managed to
    trigger an assertion failure in fseg_free_step(), on the XDES_FREE_BIT.
    In other words, a trx_undo_seg_free() call during
    trx_rollback_resurrected() was attempting a double-free of a page.
    This was reproduced about once in 400 to 500 test runs. With the fix
    applied, the test passed 2,000 runs.
    
    recv_apply_hashed_log_recs(): Do not only wait for recv_sys->n_addrs
    to reach 0, but also wait for buf_get_n_pending_read_ios() to reach 0,
    to guarantee that buf_page_io_complete() will not be executing
    ibuf_merge_or_delete_for_page().
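
    The difference between the old and the new wait condition can be
    illustrated with a small standalone model. This is only a sketch, not
    the code in log0recv.cc: RecoveryState, io_complete_step() and the
    other names are hypothetical, and the three-phase ordering (apply
    redo, then change buffer merge, then decrement the pending read
    count) is an assumption drawn from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for the recovery state; NOT the real InnoDB structures.
struct RecoveryState {
  unsigned n_addrs = 0;        // models recv_sys->n_addrs
  unsigned pending_reads = 0;  // models buf_get_n_pending_read_ios()
  bool merge_done = false;     // models ibuf_merge_or_delete_for_page() having finished
};

// Wait condition before the fix: only the redo log hash table must be empty.
bool old_wait_done(const RecoveryState& s) { return s.n_addrs == 0; }

// Wait condition after the fix: additionally, no page read may be in flight.
bool new_wait_done(const RecoveryState& s) {
  return s.n_addrs == 0 && s.pending_reads == 0;
}

// One simulated read completion, split into three phases (assumed ordering):
//   phase 0: redo records are applied to the page, n_addrs drops
//   phase 1: the change buffer merge runs on the page
//   phase 2: the read is no longer counted as pending
void io_complete_step(RecoveryState& s, int phase) {
  switch (phase) {
    case 0: --s.n_addrs; break;
    case 1: s.merge_done = true; break;
    case 2: --s.pending_reads; break;
  }
}

// Advances the simulated read completion until the given wait condition
// holds, then reports whether the change buffer merge had finished by then.
template <typename Pred>
bool merge_finished_when_wait_ends(Pred wait_done) {
  RecoveryState s;
  s.n_addrs = 1;
  s.pending_reads = 1;
  for (int phase = 0; !wait_done(s); ++phase) {
    assert(phase < 3);  // the model has exactly three phases
    io_complete_step(s, phase);
  }
  return s.merge_done;
}
```

    In this model, merge_finished_when_wait_ends(old_wait_done) returns
    false, because the waiter proceeds while the merge is still
    outstanding, whereas merge_finished_when_wait_ends(new_wait_done)
    returns true.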
log0recv.cc 118 KB