storage/innobase/buf/buf0dblwr.cc · 39e3ca8bd25fc539ed08ff464e8a3189ff9f7fa3 · nexedi / MariaDB

MDEV-31826 InnoDB may fail to recover after being killed in fil_delete_tablespace()

Marko Mäkelä authored Oct 26, 2023

InnoDB was violating the write-ahead-logging protocol when a file
was being deleted, like this:

1. fil_delete_tablespace() sets the fil_space_t::STOPPING flag.
2. The buf_flush_page_cleaner() thread discards some changed pages for
this tablespace and advances the log checkpoint a little.
3. The server process is killed before fil_delete_tablespace() has written
a FILE_DELETE record.
4. Recovery will try to apply log to pages of the tablespace, because
there was no FILE_DELETE record. This will fail, because some pages
that had been modified since the latest checkpoint had not been written
by the page cleaner.

Page writes must not be stopped before a FILE_DELETE record has been
durably written.
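
The failure scenario above can be illustrated with a small toy model.
This is only a sketch under simplifying assumptions: ToyTablespace,
Policy, page_cleaner_pass() and recovery_succeeds() are hypothetical
names invented for illustration and are not part of InnoDB, and
recovery is reduced to a single boolean condition.

```cpp
#include <cassert>

// Hypothetical toy model; none of these names exist in InnoDB.
struct ToyTablespace {
  bool delete_requested = false;     // deletion has started (step 1)
  bool file_delete_durable = false;  // FILE_DELETE record is durably logged
  unsigned dirty_pages = 0;          // pages modified since the last checkpoint
  unsigned discarded_pages = 0;      // dirty pages dropped without being written
};

enum class Policy {
  DiscardOnRequest,          // old behavior: stop page writes at step 1
  DiscardAfterDurableDelete  // fixed behavior: stop only after FILE_DELETE
};

// One pass of a toy page cleaner, after which the checkpoint may advance.
void page_cleaner_pass(ToyTablespace& s, Policy p) {
  // A durable FILE_DELETE implies that deletion was requested first.
  assert(!s.file_delete_durable || s.delete_requested);
  const bool may_discard = (p == Policy::DiscardOnRequest)
                               ? s.delete_requested
                               : s.file_delete_durable;
  if (may_discard) {
    s.discarded_pages += s.dirty_pages;  // changes are lost from the data file
  }
  s.dirty_pages = 0;  // written or discarded; the checkpoint can now advance
}

// Recovery may skip the tablespace if FILE_DELETE was logged; otherwise it
// needs every page modified before the checkpoint to be in the data file.
bool recovery_succeeds(const ToyTablespace& s) {
  return s.file_delete_durable || s.discarded_pages == 0;
}

// Replays steps 1 to 4 with a kill before FILE_DELETE is written.
bool crash_before_file_delete_is_recoverable(Policy p) {
  ToyTablespace s;
  s.dirty_pages = 3;
  s.delete_requested = true;    // step 1
  page_cleaner_pass(s, p);      // step 2
  // step 3: the process is killed here, before FILE_DELETE is written
  return recovery_succeeds(s);  // step 4
}
```

In this model, only the policy that keeps writing pages until the
FILE_DELETE record is durable survives the kill.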

fil_space_t::drop(): Replaces fil_space_t::check_pending_operations().
Add the parameter detached_handle, and return a tablespace pointer
if this thread was the first one to stop I/O on the tablespace.

mtr_t::commit_file(): Remove the parameter detached_handle, and
move some handling to fil_space_t::drop().

fil_space_t: STOPPING_READS, STOPPING_WRITES: Separate flags for STOPPING.
We want to stop reads (and encryption) before stopping page writes.

fil_space_t::is_stopping_writes(), fil_space_t::get_for_write():
Special accessors for the write path.
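
A minimal sketch of how two separate stop flags and write-path
accessors could fit together, assuming the flags are bits in one atomic
word. ToySpace, the bit positions, stop_reads(), stop_writes() and
get_for_read() are assumptions made for illustration. The accessors
that share a name with the ones listed above are simplified stand-ins
and do not reproduce the real signatures or reference counting.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for fil_space_t; the bit values are assumptions.
class ToySpace {
 public:
  static constexpr std::uint32_t STOPPING_READS = 1U << 31;
  static constexpr std::uint32_t STOPPING_WRITES = 1U << 30;

  bool is_stopping_reads() const {
    return (flags_.load(std::memory_order_acquire) & STOPPING_READS) != 0;
  }
  bool is_stopping_writes() const {
    return (flags_.load(std::memory_order_acquire) & STOPPING_WRITES) != 0;
  }

  // Stops reads. Returns true only for the first caller, mirroring the idea
  // of reporting which thread was first to stop I/O on the tablespace.
  bool stop_reads() {
    const std::uint32_t old =
        flags_.fetch_or(STOPPING_READS, std::memory_order_acq_rel);
    return (old & STOPPING_READS) == 0;
  }

  // Stops writes. The caller is assumed to have durably written the
  // FILE_DELETE record first; reads must already have been stopped.
  void stop_writes() {
    assert(is_stopping_reads());
    flags_.fetch_or(STOPPING_WRITES, std::memory_order_acq_rel);
  }

  // Read path: unavailable as soon as reads are being stopped.
  ToySpace* get_for_read() { return is_stopping_reads() ? nullptr : this; }

  // Write path: stays available until writes are being stopped.
  ToySpace* get_for_write() { return is_stopping_writes() ? nullptr : this; }

 private:
  std::atomic<std::uint32_t> flags_{0};
};
```

Between stop_reads() and stop_writes(), get_for_read() already returns
nullptr while get_for_write() still returns the tablespace, which is
the window in which page writes continue.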

fil_space_t::flush_low(): Ignore the STOPPING_READS flag and only
stop if STOPPING_WRITES is set, to avoid an infinite loop in
fil_flush_file_spaces(), which was occasionally reproduced by
running the test encryption.create_or_replace.
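
The commit message does not spell out how the loop failed to terminate.
The sketch below shows one plausible mechanism under stated
assumptions: a flush-all loop that retries until every tablespace is
flushed, combined with a per-tablespace flush that bails out as soon as
any stop flag is set. ToySpace, StopCheck, toy_flush_low() and
toy_flush_all() are hypothetical, and max_passes exists only so that
the example terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical toy types; not the real InnoDB data structures.
struct ToySpace {
  bool stopping_reads = false;
  bool stopping_writes = false;
  bool needs_flush = true;
};

enum class StopCheck {
  AnyStopping,  // bail out if reads or writes are being stopped
  WritesOnly    // bail out only if writes are being stopped
};

// Toy per-tablespace flush: does nothing if the stop check triggers.
void toy_flush_low(ToySpace& s, StopCheck check) {
  const bool stop = (check == StopCheck::AnyStopping)
                        ? (s.stopping_reads || s.stopping_writes)
                        : s.stopping_writes;
  if (stop) {
    return;
  }
  s.needs_flush = false;
}

// Toy flush-all loop: retries until the unflushed list is empty.
// Returns false if it did not converge within max_passes.
bool toy_flush_all(std::list<ToySpace*> unflushed, StopCheck check,
                   std::size_t max_passes) {
  for (std::size_t pass = 0; pass < max_passes && !unflushed.empty(); ++pass) {
    for (auto it = unflushed.begin(); it != unflushed.end();) {
      toy_flush_low(**it, check);
      // A tablespace leaves the list once it is flushed or once its
      // writes are stopped (it is about to be deleted anyway).
      if (!(*it)->needs_flush || (*it)->stopping_writes) {
        it = unflushed.erase(it);
      } else {
        ++it;
      }
    }
  }
  return unflushed.empty();
}
```

In this model, a tablespace with only stopping_reads set never leaves
the list under StopCheck::AnyStopping, while StopCheck::WritesOnly
flushes it on the first pass.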

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich

buf0dblwr.cc 25.6 KB
