    MDEV-27416 InnoDB hang in buf_flush_wait_flushed(), on log checkpoint · 4c3ad244
    Marko Mäkelä authored
    InnoDB could sometimes hang when triggering a log checkpoint. This was
    caused by commit 7b1252c0 (MDEV-24278),
    which introduced an untimed wait in buf_flush_page_cleaner().
    
    The hang was noticed by occasional failures of IMPORT TABLESPACE tests,
    such as innodb.innodb-wl5522, which would (unnecessarily) invoke
    log_make_checkpoint() from row_import_cleanup().
    
    The reason for the hang was that buf_flush_page_cleaner() would enter
    an untimed sleep despite buf_flush_sync_lsn being set. The exact failure
    scenario is unclear, because buf_flush_sync_lsn should actually be
    protected by buf_pool.flush_list_mutex. We prevent the hang by
    invoking buf_pool.page_cleaner_set_idle(false) whenever we are
    setting buf_flush_sync_lsn and signaling buf_pool.do_flush_list.
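
    The ordering described above can be illustrated with a minimal,
    single-threaded sketch. This is not the InnoDB code: CleanerModel,
    request_sync() and next_action() are hypothetical names, and the
    assumption that the cleaner checks its idle flag before looking at
    the sync target is only a guess at how an untimed sleep could be
    entered while a target is set.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical model; names echo the commit message but this is not InnoDB code.
struct CleanerModel {
  enum class Next { UntimedSleep, TimedSleep, Flush };

  std::mutex flush_list_mutex;            // models buf_pool.flush_list_mutex
  std::condition_variable do_flush_list;  // models buf_pool.do_flush_list
  bool idle = false;           // true: the cleaner may sleep without a timeout
  std::uint64_t sync_lsn = 0;  // models buf_flush_sync_lsn; 0 means "no target"

  // Cleaner side: record whether there is any work left.
  void set_idle(bool value) {
    std::lock_guard<std::mutex> guard(flush_list_mutex);
    idle = value;
  }

  // Requester side: publish the target, clear the idle flag, and signal,
  // all within one mutex hold so the cleaner never sees a partial update.
  void request_sync(std::uint64_t lsn) {
    assert(lsn != 0);
    std::lock_guard<std::mutex> guard(flush_list_mutex);
    if (sync_lsn < lsn) sync_lsn = lsn;
    idle = false;
    do_flush_list.notify_one();
  }

  // Cleaner side: decide what the next loop iteration does.
  // Assumption: the idle flag is consulted first, so a stale idle == true
  // would hide a pending sync_lsn. Clearing it in request_sync() avoids that.
  Next next_action() {
    std::lock_guard<std::mutex> guard(flush_list_mutex);
    if (idle) return Next::UntimedSleep;
    if (sync_lsn != 0) return Next::Flush;
    return Next::TimedSleep;
  }
};
```

    In this model, calling set_idle(true) and then request_sync(100)
    makes next_action() choose a flush rather than an untimed sleep,
    because the idle flag is cleared together with the target update.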
    
    The bulk of these changes was originally developed in preparation
    for MDEV-26827, to invoke buf_flush_list() from fewer threads,
    and tested on 10.6 by Matthias Leich.
    
    This fix was tested by running 100 repetitions of 100 concurrent instances
    of the test innodb.innodb-wl5522 on a RelWithDebInfo build, using ext4fs
    and innodb_flush_method=O_DIRECT on a SATA SSD with 4096-byte block size.
    During the test, the call to log_make_checkpoint() in row_import_cleanup()
    was present.
    
    buf_flush_list(): Make static.
    
    buf_flush_wait(): Wait for buf_pool.get_oldest_modification()
    to reach a target, by work done in the buf_flush_page_cleaner.
    If buf_flush_sync_lsn is going to be set, we will invoke
    buf_pool.page_cleaner_set_idle(false).
    
    buf_flush_ahead(): If buf_flush_sync_lsn or buf_flush_async_lsn
    is going to be set and the page cleaner woken up, we will invoke
    buf_pool.page_cleaner_set_idle(false).
    
    buf_flush_wait_flushed(): Invoke buf_flush_wait().
    
    buf_flush_sync(): Invoke recv_sys.apply() at the start in case
    crash recovery is active. Invoke buf_flush_wait().
    
    buf_flush_sync_batch(): A lower-level variant of buf_flush_sync()
    that is only called by recv_sys_t::apply().
    
    buf_flush_sync_for_checkpoint(): Do not trigger log apply
    or checkpoint during recovery.
    
    buf_dblwr_t::create(): Only initiate a buffer pool flush, not
    a checkpoint.
    
    row_import_cleanup(): Do not unnecessarily invoke log_make_checkpoint().
    Invoking buf_flush_list_space() before starting to generate redo log
    for the imported tablespace should suffice.
    
    srv_prepare_to_delete_redo_log_file():
    Set recv_sys.recovery_on in order to prevent
    buf_flush_sync_for_checkpoint() from initiating a checkpoint
    while the log is inaccessible. Remove a wait loop that is already
    part of buf_flush_sync().
    Do not invoke fil_names_clear() if the log is being upgraded,
    because the FILE_MODIFY record is specific to the latest format.
    
    create_log_file(): Clear recv_sys.recovery_on only after calling
    log_make_checkpoint(), to prevent buf_flush_page_cleaner from
    invoking a checkpoint.
    
    innodb_shutdown(): Simplify the logic in mariadb-backup --prepare.
    
    os_aio_wait_until_no_pending_writes(): Update the function comment.
    Apart from row_quiesce_table_start() during FLUSH TABLES...FOR EXPORT,
    this is being called by buf_flush_list_space(), which is invoked
    by ALTER TABLE...IMPORT TABLESPACE as well as some encryption operations.