Commit a55b951e authored by Marko Mäkelä

MDEV-26827 Make page flushing even faster

For more convenient monitoring of something that could greatly affect
the volume of page writes, we add the status variable
Innodb_buffer_pool_pages_split that was previously only available
via information_schema.innodb_metrics as "index_page_splits".
This was suggested by Axel Schwenke.

buf_flush_page_count: Replaced with buf_pool.stat.n_pages_written.
We protect buf_pool.stat (except n_page_gets) with buf_pool.mutex
and remove unnecessary export_vars indirection.

buf_pool.flush_list_bytes: Moved from buf_pool.stat.flush_list_bytes.
Protected by buf_pool.flush_list_mutex.

buf_pool_t::page_cleaner_status: Replaces buf_pool_t::n_flush_LRU_,
buf_pool_t::n_flush_list_, and buf_pool_t::page_cleaner_is_idle.
Protected by buf_pool.flush_list_mutex. Only the buf_flush_page_cleaner
thread will broadcast buf_pool.done_flush_list, and we will only wait
for it when communicating with buf_flush_page_cleaner.
There is no need to keep a count of pending writes by the
buf_pool.flush_list processing. A single flag suffices for that.
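
As a rough illustration of folding two counters and a flag into one
status word, consider the following minimal sketch. The bit layout, the
constant names and the class page_cleaner_state_sketch are assumptions
made for this example; the actual definition of
buf_pool_t::page_cleaner_status is not shown in this excerpt and may
differ. std::mutex stands in for buf_pool.flush_list_mutex.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical bit layout; an assumption for illustration only.
constexpr uint32_t CLEANER_IDLE= 1;       // the page cleaner is sleeping
constexpr uint32_t FLUSH_LIST_ACTIVE= 2;  // a flush_list batch is running
constexpr uint32_t LRU_FLUSH= 4;          // increment per running LRU batch

class page_cleaner_state_sketch
{
  std::mutex flush_list_mutex;  // stands in for buf_pool.flush_list_mutex
  uint32_t status= 0;           // every access holds flush_list_mutex

public:
  void set_idle(bool idle)
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    status= idle ? (status | CLEANER_IDLE) : (status & ~CLEANER_IDLE);
  }

  bool is_idle()
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    return status & CLEANER_IDLE;
  }

  // In this sketch a flag is enough for flush_list processing.
  // Returns false if a batch was already marked as running.
  bool try_start_flush_list()
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    if (status & FLUSH_LIST_ACTIVE)
      return false;
    status|= FLUSH_LIST_ACTIVE;
    return true;
  }

  void end_flush_list()
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    assert(status & FLUSH_LIST_ACTIVE);
    status&= ~FLUSH_LIST_ACTIVE;
  }

  // LRU batches are counted in the bits above the two flags.
  void begin_lru_flush()
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    status+= LRU_FLUSH;
  }

  void end_lru_flush()
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    assert(status >= LRU_FLUSH);
    status-= LRU_FLUSH;
  }

  // Number of LRU batches currently in progress.
  uint32_t n_flush()
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    return status / LRU_FLUSH;
  }
};
```

Keeping all three pieces of state in one word under one mutex means a
reader sees them consistently without taking several locks.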

Waits for page write completion can be performed by
simply waiting on block->page.lock, or by invoking
buf_dblwr.wait_for_page_writes().

buf_LRU_block_free_non_file_page(): Broadcast buf_pool.done_free and
set buf_pool.try_LRU_scan when freeing a page. This would be
executed also as part of buf_page_write_complete().
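
The effect of broadcasting instead of signalling can be sketched as
follows. This is a simplified stand-in, not the InnoDB code: the struct
free_list_sketch and its member n_free are hypothetical, and std::mutex
and std::condition_variable take the place of buf_pool.mutex and
buf_pool.done_free.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the buffer pool free list.
struct free_list_sketch
{
  std::mutex mutex;                   // stands in for buf_pool.mutex
  std::condition_variable done_free;  // stands in for buf_pool.done_free
  bool try_LRU_scan= false;           // hint that an LRU scan may succeed
  size_t n_free= 0;                   // number of blocks on the free list

  // Called when a block is returned to the free list.
  void free_block()
  {
    std::lock_guard<std::mutex> g(mutex);
    n_free++;
    try_LRU_scan= true;
    done_free.notify_all();  // broadcast: wake every waiter, not just one
  }

  // Wait until a free block is available, then take it.
  void get_free_block()
  {
    std::unique_lock<std::mutex> lk(mutex);
    done_free.wait(lk, [this] { return n_free > 0; });
    n_free--;
  }
};
```

With a broadcast, every thread waiting for a free block rechecks the
free list, so a wakeup cannot be consumed by a single waiter while
others keep sleeping.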

buf_page_write_complete(): Do not broadcast buf_pool.done_flush_list,
and do not acquire buf_pool.mutex unless buf_pool.LRU eviction is needed.
Let buf_dblwr count all writes to persistent pages and broadcast a
condition variable when no outstanding writes remain.
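
The counting scheme described above can be sketched in standard C++.
The member names writes_pending, write_cond, pending_writes(),
write_completed() and wait_for_page_writes() follow the buf_dblwr_t
changes in this commit; the struct write_tracker_sketch, the method
add_write() and the use of std::mutex in place of mysql_mutex_t are
assumptions made for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Simplified stand-in for the write accounting in buf_dblwr_t.
struct write_tracker_sketch
{
  std::mutex mutex;                    // protects writes_pending
  std::condition_variable write_cond;  // signalled when writes_pending == 0
  size_t writes_pending= 0;            // number of submitted, unfinished writes

  // Register a page write at submission time.
  void add_write()
  {
    std::lock_guard<std::mutex> g(mutex);
    writes_pending++;
  }

  // Called on write completion; wakes all waiters when the count hits 0.
  void write_completed()
  {
    std::lock_guard<std::mutex> g(mutex);
    assert(writes_pending);
    if (!--writes_pending)
      write_cond.notify_all();
  }

  // Snapshot of the number of outstanding writes.
  size_t pending_writes()
  {
    std::lock_guard<std::mutex> g(mutex);
    return writes_pending;
  }

  // Block until no writes are outstanding.
  void wait_for_page_writes()
  {
    std::unique_lock<std::mutex> lk(mutex);
    write_cond.wait(lk, [this] { return writes_pending == 0; });
  }
};
```

Because the broadcast happens only on the transition to zero, the
completion path stays cheap while writes are still in flight.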

buf_flush_page_cleaner(): Prioritize LRU flushing and eviction right after
"furious flushing" (lsn_limit). Simplify the conditions and reduce the
hold time of buf_pool.flush_list_mutex. Refuse to shut down
or sleep if buf_pool.ran_out(), that is, LRU eviction is needed.

buf_pool_t::page_cleaner_wakeup(): Add the optional parameter for_LRU.

buf_LRU_get_free_block(): Protect buf_lru_free_blocks_error_printed
with buf_pool.mutex. Invoke buf_pool.page_cleaner_wakeup(true) to
ensure that buf_flush_page_cleaner() will process the LRU flush
request.

buf_do_LRU_batch(), buf_flush_list(), buf_flush_list_space():
Update buf_pool.stat.n_pages_written when submitting writes
(while holding buf_pool.mutex), not when completing them.

buf_page_t::flush(), buf_flush_discard_page(): Require that
the page U-latch be acquired upfront, and remove
buf_page_t::ready_for_flush().

buf_pool_t::delete_from_flush_list(): Remove the parameter "bool clear".

buf_flush_page(): Count pending page writes via buf_dblwr.

buf_flush_try_neighbors(): Take the block of page_id as a parameter.
If the tablespace is dropped before our page has been written out,
release the page U-latch.

buf_pool_invalidate(): Let the caller ensure that there are no
outstanding writes.

buf_flush_wait_batch_end(false),
buf_flush_wait_batch_end_acquiring_mutex(false):
Replaced with buf_dblwr.wait_for_page_writes().

buf_flush_wait_LRU_batch_end(): Replaces buf_flush_wait_batch_end(true).

buf_flush_list(): Remove some broadcasts of buf_pool.done_flush_list.

buf_flush_buffer_pool(): Invoke also buf_dblwr.wait_for_page_writes().

buf_pool_t::io_pending(), buf_pool_t::n_flush_list(): Remove.
Outstanding writes are reflected by buf_dblwr.pending_writes().

buf_dblwr_t::init(): New function, to initialize the mutex and
the condition variables, but not the backing store.

buf_dblwr_t::is_created(): Replaces buf_dblwr_t::is_initialised().

buf_dblwr_t::pending_writes(), buf_dblwr_t::writes_pending:
Keep track of writes of persistent data pages.

buf_flush_LRU(): Allow calls while LRU flushing may be in progress
in another thread.

Tested by Matthias Leich (correctness) and Axel Schwenke (performance)
parent 9593cccf
......@@ -199,7 +199,7 @@ compress_pages_page_decompressed compression 0 NULL NULL NULL 0 NULL NULL NULL N
compress_pages_page_compression_error compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of page compression errors
compress_pages_encrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages encrypted
compress_pages_decrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages decrypted
index_page_splits index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page splits
index_page_splits index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 status_counter Number of index page splits
index_page_merge_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page merge attempts
index_page_merge_successful index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of successful index page merges
index_page_reorg_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page reorganization attempts
......
......@@ -23,6 +23,7 @@ INNODB_BUFFER_POOL_PAGES_OLD
INNODB_BUFFER_POOL_PAGES_TOTAL
INNODB_BUFFER_POOL_PAGES_LRU_FLUSHED
INNODB_BUFFER_POOL_PAGES_LRU_FREED
INNODB_BUFFER_POOL_PAGES_SPLIT
INNODB_BUFFER_POOL_READ_AHEAD_RND
INNODB_BUFFER_POOL_READ_AHEAD
INNODB_BUFFER_POOL_READ_AHEAD_EVICTED
......
......@@ -2975,6 +2975,8 @@ btr_page_split_and_insert(
ut_ad(*err == DB_SUCCESS);
ut_ad(dtuple_check_typed(tuple));
buf_pool.pages_split++;
if (cursor->index()->is_spatial()) {
/* Split rtree page and update parent */
return rtr_page_split_and_insert(flags, cursor, offsets, heap,
......@@ -3371,8 +3373,6 @@ btr_page_split_and_insert(
left_block, right_block, mtr);
}
MONITOR_INC(MONITOR_INDEX_SPLIT);
ut_ad(page_validate(buf_block_get_frame(left_block),
page_cursor->index));
ut_ad(page_validate(buf_block_get_frame(right_block),
......
......@@ -1401,8 +1401,10 @@ inline bool buf_pool_t::withdraw_blocks()
true);
mysql_mutex_unlock(&buf_pool.mutex);
buf_dblwr.flush_buffered_writes();
mysql_mutex_lock(&buf_pool.flush_list_mutex);
buf_flush_wait_LRU_batch_end();
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
mysql_mutex_lock(&buf_pool.mutex);
buf_flush_wait_batch_end(true);
}
/* relocate blocks/buddies in withdrawn area */
......@@ -2265,13 +2267,15 @@ buf_page_t* buf_page_get_zip(const page_id_t page_id, ulint zip_size)
return bpage;
must_read_page:
if (dberr_t err= buf_read_page(page_id, zip_size))
{
switch (dberr_t err= buf_read_page(page_id, zip_size)) {
case DB_SUCCESS:
case DB_SUCCESS_LOCKED_REC:
goto lookup;
default:
ib::error() << "Reading compressed page " << page_id
<< " failed with error: " << err;
return nullptr;
}
goto lookup;
}
/********************************************************************//**
......@@ -2511,21 +2515,24 @@ buf_page_get_low(
corrupted, or if an encrypted page with a valid
checksum cannot be decypted. */
if (dberr_t local_err = buf_read_page(page_id, zip_size)) {
if (local_err != DB_CORRUPTION
&& mode != BUF_GET_POSSIBLY_FREED
switch (dberr_t local_err = buf_read_page(page_id, zip_size)) {
case DB_SUCCESS:
case DB_SUCCESS_LOCKED_REC:
buf_read_ahead_random(page_id, zip_size, ibuf_inside(mtr));
break;
default:
if (mode != BUF_GET_POSSIBLY_FREED
&& retries++ < BUF_PAGE_READ_MAX_RETRIES) {
DBUG_EXECUTE_IF("intermittent_read_failure",
retries = BUF_PAGE_READ_MAX_RETRIES;);
} else {
}
/* fall through */
case DB_PAGE_CORRUPTED:
if (err) {
*err = local_err;
}
return nullptr;
}
} else {
buf_read_ahead_random(page_id, zip_size, ibuf_inside(mtr));
}
ut_d(if (!(++buf_dbg_counter % 5771)) buf_pool.validate());
goto loop;
......@@ -3279,12 +3286,12 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size,
buf_unzip_LRU_add_block(reinterpret_cast<buf_block_t*>(bpage), FALSE);
}
buf_pool.stat.n_pages_created++;
mysql_mutex_unlock(&buf_pool.mutex);
mtr->memo_push(reinterpret_cast<buf_block_t*>(bpage), MTR_MEMO_PAGE_X_FIX);
bpage->set_accessed();
buf_pool.stat.n_pages_created++;
/* Delete possible entries for the page from the insert buffer:
such can exist if the page belonged to an index which was dropped */
......@@ -3534,7 +3541,6 @@ dberr_t buf_page_t::read_complete(const fil_node_t &node)
ut_d(auto n=) buf_pool.n_pend_reads--;
ut_ad(n > 0);
buf_pool.stat.n_pages_read++;
const byte *read_frame= zip.data ? zip.data : frame;
ut_ad(read_frame);
......@@ -3686,9 +3692,6 @@ void buf_pool_invalidate()
{
mysql_mutex_lock(&buf_pool.mutex);
buf_flush_wait_batch_end(true);
buf_flush_wait_batch_end(false);
/* It is possible that a write batch that has been posted
earlier is still not complete. For buffer pool invalidation to
proceed we must ensure there is NO write activity happening. */
......@@ -3839,8 +3842,8 @@ void buf_pool_t::print()
<< UT_LIST_GET_LEN(flush_list)
<< ", n pending decompressions=" << n_pend_unzip
<< ", n pending reads=" << n_pend_reads
<< ", n pending flush LRU=" << n_flush_LRU_
<< " list=" << n_flush_list_
<< ", n pending flush LRU=" << n_flush()
<< " list=" << buf_dblwr.pending_writes()
<< ", pages made young=" << stat.n_pages_made_young
<< ", not young=" << stat.n_pages_not_made_young
<< ", pages read=" << stat.n_pages_read
......@@ -3952,13 +3955,13 @@ void buf_stats_get_pool_info(buf_pool_info_t *pool_info)
pool_info->flush_list_len = UT_LIST_GET_LEN(buf_pool.flush_list);
pool_info->n_pend_unzip = UT_LIST_GET_LEN(buf_pool.unzip_LRU);
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
pool_info->n_pend_reads = buf_pool.n_pend_reads;
pool_info->n_pending_flush_lru = buf_pool.n_flush_LRU_;
pool_info->n_pending_flush_lru = buf_pool.n_flush();
pool_info->n_pending_flush_list = buf_pool.n_flush_list_;
pool_info->n_pending_flush_list = buf_dblwr.pending_writes();
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
current_time = time(NULL);
time_elapsed = 0.001 + difftime(current_time,
......
......@@ -46,7 +46,17 @@ inline buf_block_t *buf_dblwr_trx_sys_get(mtr_t *mtr)
0, RW_X_LATCH, mtr);
}
/** Initialize the doublewrite buffer data structure.
void buf_dblwr_t::init()
{
if (!active_slot)
{
active_slot= &slots[0];
mysql_mutex_init(buf_dblwr_mutex_key, &mutex, nullptr);
pthread_cond_init(&cond, nullptr);
}
}
/** Initialise the persistent storage of the doublewrite buffer.
@param header doublewrite page header in the TRX_SYS page */
inline void buf_dblwr_t::init(const byte *header)
{
......@@ -54,8 +64,6 @@ inline void buf_dblwr_t::init(const byte *header)
ut_ad(!active_slot->reserved);
ut_ad(!batch_running);
mysql_mutex_init(buf_dblwr_mutex_key, &mutex, nullptr);
pthread_cond_init(&cond, nullptr);
block1= page_id_t(0, mach_read_from_4(header + TRX_SYS_DOUBLEWRITE_BLOCK1));
block2= page_id_t(0, mach_read_from_4(header + TRX_SYS_DOUBLEWRITE_BLOCK2));
......@@ -74,7 +82,7 @@ inline void buf_dblwr_t::init(const byte *header)
@return whether the operation succeeded */
bool buf_dblwr_t::create()
{
if (is_initialised())
if (is_created())
return true;
mtr_t mtr;
......@@ -343,7 +351,7 @@ dberr_t buf_dblwr_t::init_or_load_pages(pfs_os_file_t file, const char *path)
void buf_dblwr_t::recover()
{
ut_ad(recv_sys.parse_start_lsn);
if (!is_initialised())
if (!is_created())
return;
uint32_t page_no_dblwr= 0;
......@@ -452,10 +460,9 @@ void buf_dblwr_t::recover()
/** Free the doublewrite buffer. */
void buf_dblwr_t::close()
{
if (!is_initialised())
if (!active_slot)
return;
/* Free the double write data structures. */
ut_ad(!active_slot->reserved);
ut_ad(!active_slot->first_free);
ut_ad(!batch_running);
......@@ -469,19 +476,24 @@ void buf_dblwr_t::close()
mysql_mutex_destroy(&mutex);
memset((void*) this, 0, sizeof *this);
active_slot= &slots[0];
}
/** Update the doublewrite buffer on write completion. */
void buf_dblwr_t::write_completed()
void buf_dblwr_t::write_completed(bool with_doublewrite)
{
ut_ad(this == &buf_dblwr);
ut_ad(srv_use_doublewrite_buf);
ut_ad(is_initialised());
ut_ad(!srv_read_only_mode);
mysql_mutex_lock(&mutex);
ut_ad(writes_pending);
if (!--writes_pending)
pthread_cond_broadcast(&write_cond);
if (with_doublewrite)
{
ut_ad(is_created());
ut_ad(srv_use_doublewrite_buf);
ut_ad(batch_running);
slot *flush_slot= active_slot == &slots[0] ? &slots[1] : &slots[0];
ut_ad(flush_slot->reserved);
......@@ -499,6 +511,7 @@ void buf_dblwr_t::write_completed()
batch_running= false;
pthread_cond_broadcast(&cond);
}
}
mysql_mutex_unlock(&mutex);
}
......@@ -642,7 +655,7 @@ void buf_dblwr_t::flush_buffered_writes_completed(const IORequest &request)
{
ut_ad(this == &buf_dblwr);
ut_ad(srv_use_doublewrite_buf);
ut_ad(is_initialised());
ut_ad(is_created());
ut_ad(!srv_read_only_mode);
ut_ad(!request.bpage);
ut_ad(request.node == fil_system.sys_space->chain.start);
......@@ -708,7 +721,7 @@ posted, and also when we may have to wait for a page latch!
Otherwise a deadlock of threads can occur. */
void buf_dblwr_t::flush_buffered_writes()
{
if (!is_initialised() || !srv_use_doublewrite_buf)
if (!is_created() || !srv_use_doublewrite_buf)
{
fil_flush_file_spaces();
return;
......@@ -741,6 +754,7 @@ void buf_dblwr_t::add_to_batch(const IORequest &request, size_t size)
const ulint buf_size= 2 * block_size();
mysql_mutex_lock(&mutex);
writes_pending++;
for (;;)
{
......
......@@ -136,7 +136,6 @@ static void buf_LRU_block_free_hashed_page(buf_block_t *block)
@param[in] bpage control block */
static inline void incr_LRU_size_in_bytes(const buf_page_t* bpage)
{
/* FIXME: use atomics, not mutex */
mysql_mutex_assert_owner(&buf_pool.mutex);
buf_pool.stat.LRU_bytes += bpage->physical_size();
......@@ -400,6 +399,7 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex)
DBUG_EXECUTE_IF("recv_ran_out_of_buffer",
if (recv_recovery_is_on()
&& recv_sys.apply_log_recs) {
mysql_mutex_lock(&buf_pool.mutex);
goto flush_lru;
});
get_mutex:
......@@ -445,20 +445,32 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex)
if ((block = buf_LRU_get_free_only()) != nullptr) {
goto got_block;
}
if (!buf_pool.n_flush_LRU_) {
break;
mysql_mutex_unlock(&buf_pool.mutex);
mysql_mutex_lock(&buf_pool.flush_list_mutex);
const auto n_flush = buf_pool.n_flush();
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
mysql_mutex_lock(&buf_pool.mutex);
if (!n_flush) {
goto not_found;
}
if (!buf_pool.try_LRU_scan) {
mysql_mutex_lock(&buf_pool.flush_list_mutex);
buf_pool.page_cleaner_wakeup(true);
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
my_cond_wait(&buf_pool.done_free,
&buf_pool.mutex.m_mutex);
}
my_cond_wait(&buf_pool.done_free, &buf_pool.mutex.m_mutex);
}
#ifndef DBUG_OFF
not_found:
#endif
mysql_mutex_unlock(&buf_pool.mutex);
if (n_iterations > 1) {
MONITOR_INC( MONITOR_LRU_GET_FREE_WAITS );
}
if (n_iterations > 20 && !buf_lru_free_blocks_error_printed
if (n_iterations == 21 && !buf_lru_free_blocks_error_printed
&& srv_buf_pool_old_size == srv_buf_pool_size) {
buf_lru_free_blocks_error_printed = true;
mysql_mutex_unlock(&buf_pool.mutex);
ib::warn() << "Difficult to find free blocks in the buffer pool"
" (" << n_iterations << " search iterations)! "
<< flush_failures << " failed attempts to"
......@@ -472,12 +484,7 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex)
<< os_n_file_writes << " OS file writes, "
<< os_n_fsyncs
<< " OS fsyncs.";
buf_lru_free_blocks_error_printed = true;
}
if (n_iterations > 1) {
MONITOR_INC( MONITOR_LRU_GET_FREE_WAITS );
mysql_mutex_lock(&buf_pool.mutex);
}
/* No free block was found: try to flush the LRU list.
......@@ -491,8 +498,6 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex)
#ifndef DBUG_OFF
flush_lru:
#endif
mysql_mutex_lock(&buf_pool.mutex);
if (!buf_flush_LRU(innodb_lru_flush_size, true)) {
MONITOR_INC(MONITOR_LRU_SINGLE_FLUSH_FAILURE_COUNT);
++flush_failures;
......@@ -1039,7 +1044,8 @@ buf_LRU_block_free_non_file_page(
} else {
UT_LIST_ADD_FIRST(buf_pool.free, &block->page);
ut_d(block->page.in_free_list = true);
pthread_cond_signal(&buf_pool.done_free);
buf_pool.try_LRU_scan= true;
pthread_cond_broadcast(&buf_pool.done_free);
}
MEM_NOACCESS(block->page.frame, srv_page_size);
......
......@@ -226,6 +226,7 @@ static buf_page_t* buf_page_init_for_read(ulint mode, const page_id_t page_id,
buf_LRU_add_block(bpage, true/* to old blocks */);
}
buf_pool.stat.n_pages_read++;
mysql_mutex_unlock(&buf_pool.mutex);
buf_pool.n_pend_reads++;
goto func_exit_no_mutex;
......@@ -245,20 +246,18 @@ buffer buf_pool if it is not already there, in which case does nothing.
Sets the io_fix flag and sets an exclusive lock on the buffer frame. The
flag is cleared and the x-lock released by an i/o-handler thread.
@param[out] err DB_SUCCESS or DB_TABLESPACE_DELETED
if we are trying
to read from a non-existent tablespace
@param[in,out] space tablespace
@param[in] sync true if synchronous aio is desired
@param[in] mode BUF_READ_IBUF_PAGES_ONLY, ...,
@param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@param[in] unzip true=request uncompressed page
@return whether a read request was queued */
@return error code
@retval DB_SUCCESS if the page was read
@retval DB_SUCCESS_LOCKED_REC if the page exists in the buffer pool already */
static
bool
dberr_t
buf_read_page_low(
dberr_t* err,
fil_space_t* space,
bool sync,
ulint mode,
......@@ -268,15 +267,12 @@ buf_read_page_low(
{
buf_page_t* bpage;
*err = DB_SUCCESS;
if (buf_dblwr.is_inside(page_id)) {
ib::error() << "Trying to read doublewrite buffer page "
<< page_id;
ut_ad(0);
nothing_read:
space->release();
return false;
return DB_PAGE_CORRUPTED;
}
if (sync) {
......@@ -299,8 +295,9 @@ buf_read_page_low(
completed */
bpage = buf_page_init_for_read(mode, page_id, zip_size, unzip);
if (bpage == NULL) {
goto nothing_read;
if (!bpage) {
space->release();
return DB_SUCCESS_LOCKED_REC;
}
ut_ad(bpage->in_file());
......@@ -320,7 +317,6 @@ buf_read_page_low(
? IORequest::READ_SYNC
: IORequest::READ_ASYNC),
page_id.page_no() * len, len, dst, bpage);
*err = fio.err;
if (UNIV_UNLIKELY(fio.err != DB_SUCCESS)) {
ut_d(auto n=) buf_pool.n_pend_reads--;
......@@ -329,14 +325,14 @@ buf_read_page_low(
} else if (sync) {
thd_wait_end(NULL);
/* The i/o was already completed in space->io() */
*err = bpage->read_complete(*fio.node);
fio.err = bpage->read_complete(*fio.node);
space->release();
if (*err == DB_FAIL) {
*err = DB_PAGE_CORRUPTED;
if (fio.err == DB_FAIL) {
fio.err = DB_PAGE_CORRUPTED;
}
}
return true;
return fio.err;
}
/** Applies a random read-ahead in buf_pool if there are at least a threshold
......@@ -414,24 +410,26 @@ buf_read_ahead_random(const page_id_t page_id, ulint zip_size, bool ibuf)
continue;
if (space->is_stopping())
break;
dberr_t err;
space->reacquire();
if (buf_read_page_low(&err, space, false, ibuf_mode, i, zip_size, false))
if (buf_read_page_low(space, false, ibuf_mode, i, zip_size, false) ==
DB_SUCCESS)
count++;
}
if (count)
{
DBUG_PRINT("ib_buf", ("random read-ahead %zu pages from %s: %u",
count, space->chain.start->name,
low.page_no()));
space->release();
mysql_mutex_lock(&buf_pool.mutex);
/* Read ahead is considered one I/O operation for the purpose of
LRU policy decision. */
buf_LRU_stat_inc_io();
buf_pool.stat.n_ra_pages_read_rnd+= count;
srv_stats.buf_pool_reads.add(count);
mysql_mutex_unlock(&buf_pool.mutex);
}
space->release();
return count;
}
......@@ -441,8 +439,9 @@ on the buffer frame. The flag is cleared and the x-lock
released by the i/o-handler thread.
@param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@retval DB_SUCCESS if the page was read and is not corrupted,
@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
@retval DB_SUCCESS if the page was read and is not corrupted
@retval DB_SUCCESS_LOCKED_REC if the page was not read
@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted
@retval DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match.
@retval DB_TABLESPACE_DELETED if tablespace .ibd file is missing */
......@@ -456,13 +455,9 @@ dberr_t buf_read_page(const page_id_t page_id, ulint zip_size)
return DB_TABLESPACE_DELETED;
}
dberr_t err;
if (buf_read_page_low(&err, space, true, BUF_READ_ANY_PAGE,
page_id, zip_size, false))
srv_stats.buf_pool_reads.add(1);
buf_LRU_stat_inc_io();
return err;
buf_LRU_stat_inc_io(); /* NOT protected by buf_pool.mutex */
return buf_read_page_low(space, true, BUF_READ_ANY_PAGE,
page_id, zip_size, false);
}
/** High-level function which reads a page asynchronously from a file to the
......@@ -475,12 +470,8 @@ released by the i/o-handler thread.
void buf_read_page_background(fil_space_t *space, const page_id_t page_id,
ulint zip_size)
{
dberr_t err;
if (buf_read_page_low(&err, space, false, BUF_READ_ANY_PAGE,
page_id, zip_size, false)) {
srv_stats.buf_pool_reads.add(1);
}
buf_read_page_low(space, false, BUF_READ_ANY_PAGE,
page_id, zip_size, false);
/* We do not increment number of I/O operations used for LRU policy
here (buf_LRU_stat_inc_io()). We use this in heuristics to decide
......@@ -638,23 +629,26 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
continue;
if (space->is_stopping())
break;
dberr_t err;
space->reacquire();
count+= buf_read_page_low(&err, space, false, ibuf_mode, new_low, zip_size,
false);
if (buf_read_page_low(space, false, ibuf_mode, new_low, zip_size, false) ==
DB_SUCCESS)
count++;
}
if (count)
{
DBUG_PRINT("ib_buf", ("random read-ahead %zu pages from %s: %u",
count, space->chain.start->name,
new_low.page_no()));
space->release();
mysql_mutex_lock(&buf_pool.mutex);
/* Read ahead is considered one I/O operation for the purpose of
LRU policy decision. */
buf_LRU_stat_inc_io();
buf_pool.stat.n_ra_pages_read+= count;
mysql_mutex_unlock(&buf_pool.mutex);
}
space->release();
return count;
}
......@@ -709,13 +703,12 @@ void buf_read_recv_pages(ulint space_id, const uint32_t* page_nos, ulint n)
}
}
dberr_t err;
space->reacquire();
buf_read_page_low(&err, space, false,
BUF_READ_ANY_PAGE, cur_page_id, zip_size,
true);
if (err != DB_SUCCESS) {
switch (buf_read_page_low(space, false, BUF_READ_ANY_PAGE,
cur_page_id, zip_size, true)) {
case DB_SUCCESS: case DB_SUCCESS_LOCKED_REC:
break;
default:
sql_print_error("InnoDB: Recovery failed to read page "
UINT32PF " from %s",
cur_page_id.page_no(),
......
......@@ -1209,8 +1209,6 @@ rtr_page_split_and_insert(
ut_ad(!rec || rec_offs_validate(rec, cursor->index(), *offsets));
#endif
MONITOR_INC(MONITOR_INDEX_SPLIT);
return(rec);
}
......
......@@ -915,43 +915,37 @@ static SHOW_VAR innodb_status_variables[]= {
(char*) &export_vars.innodb_buffer_pool_resize_status, SHOW_CHAR},
{"buffer_pool_load_incomplete",
&export_vars.innodb_buffer_pool_load_incomplete, SHOW_BOOL},
{"buffer_pool_pages_data",
&export_vars.innodb_buffer_pool_pages_data, SHOW_SIZE_T},
{"buffer_pool_pages_data", &UT_LIST_GET_LEN(buf_pool.LRU), SHOW_SIZE_T},
{"buffer_pool_bytes_data",
&export_vars.innodb_buffer_pool_bytes_data, SHOW_SIZE_T},
{"buffer_pool_pages_dirty",
&export_vars.innodb_buffer_pool_pages_dirty, SHOW_SIZE_T},
{"buffer_pool_bytes_dirty",
&export_vars.innodb_buffer_pool_bytes_dirty, SHOW_SIZE_T},
{"buffer_pool_pages_flushed", &buf_flush_page_count, SHOW_SIZE_T},
{"buffer_pool_pages_free",
&export_vars.innodb_buffer_pool_pages_free, SHOW_SIZE_T},
&UT_LIST_GET_LEN(buf_pool.flush_list), SHOW_SIZE_T},
{"buffer_pool_bytes_dirty", &buf_pool.flush_list_bytes, SHOW_SIZE_T},
{"buffer_pool_pages_flushed", &buf_pool.stat.n_pages_written, SHOW_SIZE_T},
{"buffer_pool_pages_free", &UT_LIST_GET_LEN(buf_pool.free), SHOW_SIZE_T},
#ifdef UNIV_DEBUG
{"buffer_pool_pages_latched",
&export_vars.innodb_buffer_pool_pages_latched, SHOW_SIZE_T},
#endif /* UNIV_DEBUG */
{"buffer_pool_pages_made_not_young",
&export_vars.innodb_buffer_pool_pages_made_not_young, SHOW_SIZE_T},
&buf_pool.stat.n_pages_not_made_young, SHOW_SIZE_T},
{"buffer_pool_pages_made_young",
&export_vars.innodb_buffer_pool_pages_made_young, SHOW_SIZE_T},
&buf_pool.stat.n_pages_made_young, SHOW_SIZE_T},
{"buffer_pool_pages_misc",
&export_vars.innodb_buffer_pool_pages_misc, SHOW_SIZE_T},
{"buffer_pool_pages_old",
&export_vars.innodb_buffer_pool_pages_old, SHOW_SIZE_T},
{"buffer_pool_pages_old", &buf_pool.LRU_old_len, SHOW_SIZE_T},
{"buffer_pool_pages_total",
&export_vars.innodb_buffer_pool_pages_total, SHOW_SIZE_T},
{"buffer_pool_pages_LRU_flushed", &buf_lru_flush_page_count, SHOW_SIZE_T},
{"buffer_pool_pages_LRU_freed", &buf_lru_freed_page_count, SHOW_SIZE_T},
{"buffer_pool_pages_split", &buf_pool.pages_split, SHOW_SIZE_T},
{"buffer_pool_read_ahead_rnd",
&export_vars.innodb_buffer_pool_read_ahead_rnd, SHOW_SIZE_T},
{"buffer_pool_read_ahead",
&export_vars.innodb_buffer_pool_read_ahead, SHOW_SIZE_T},
&buf_pool.stat.n_ra_pages_read_rnd, SHOW_SIZE_T},
{"buffer_pool_read_ahead", &buf_pool.stat.n_ra_pages_read, SHOW_SIZE_T},
{"buffer_pool_read_ahead_evicted",
&export_vars.innodb_buffer_pool_read_ahead_evicted, SHOW_SIZE_T},
{"buffer_pool_read_requests",
&export_vars.innodb_buffer_pool_read_requests, SHOW_SIZE_T},
{"buffer_pool_reads",
&export_vars.innodb_buffer_pool_reads, SHOW_SIZE_T},
&buf_pool.stat.n_ra_pages_evicted, SHOW_SIZE_T},
{"buffer_pool_read_requests", &buf_pool.stat.n_page_gets, SHOW_SIZE_T},
{"buffer_pool_reads", &buf_pool.stat.n_pages_read, SHOW_SIZE_T},
{"buffer_pool_wait_free", &buf_pool.stat.LRU_waits, SHOW_SIZE_T},
{"buffer_pool_write_requests",
&export_vars.innodb_buffer_pool_write_requests, SHOW_SIZE_T},
......
/*****************************************************************************
Copyright (c) 1995, 2017, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2017, 2020, MariaDB Corporation.
Copyright (c) 2017, 2022, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -54,9 +54,9 @@ class buf_dblwr_t
};
/** the page number of the first doublewrite block (block_size() pages) */
page_id_t block1= page_id_t(0, 0);
page_id_t block1{0, 0};
/** the page number of the second doublewrite block (block_size() pages) */
page_id_t block2= page_id_t(0, 0);
page_id_t block2{0, 0};
/** mutex protecting the data members below */
mysql_mutex_t mutex;
......@@ -72,11 +72,15 @@ class buf_dblwr_t
ulint writes_completed;
/** number of pages written by flush_buffered_writes_completed() */
ulint pages_written;
/** condition variable for !writes_pending */
pthread_cond_t write_cond;
/** number of pending page writes */
size_t writes_pending;
slot slots[2];
slot *active_slot= &slots[0];
slot *active_slot;
/** Initialize the doublewrite buffer data structure.
/** Initialise the persistent storage of the doublewrite buffer.
@param header doublewrite page header in the TRX_SYS page */
inline void init(const byte *header);
......@@ -84,6 +88,8 @@ class buf_dblwr_t
bool flush_buffered_writes(const ulint size);
public:
/** Initialise the doublewrite buffer data structures. */
void init();
/** Create or restore the doublewrite buffer in the TRX_SYS page.
@return whether the operation succeeded */
bool create();
......@@ -118,7 +124,7 @@ class buf_dblwr_t
void recover();
/** Update the doublewrite buffer on data page write completion. */
void write_completed();
void write_completed(bool with_doublewrite);
/** Flush possible buffered writes to persistent storage.
It is very important to call this function after a batch of writes has been
posted, and also when we may have to wait for a page latch!
......@@ -137,14 +143,14 @@ class buf_dblwr_t
@param size payload size in bytes */
void add_to_batch(const IORequest &request, size_t size);
/** Determine whether the doublewrite buffer is initialized */
bool is_initialised() const
/** Determine whether the doublewrite buffer has been created */
bool is_created() const
{ return UNIV_LIKELY(block1 != page_id_t(0, 0)); }
/** @return whether a page identifier is part of the doublewrite buffer */
bool is_inside(const page_id_t id) const
{
if (!is_initialised())
if (!is_created())
return false;
ut_ad(block1 < block2);
if (id < block1)
......@@ -155,14 +161,45 @@ class buf_dblwr_t
/** Wait for flush_buffered_writes() to be fully completed */
void wait_flush_buffered_writes()
{
if (is_initialised())
{
mysql_mutex_lock(&mutex);
while (batch_running)
my_cond_wait(&cond, &mutex.m_mutex);
mysql_mutex_unlock(&mutex);
}
/** Register an unbuffered page write */
void add_unbuffered()
{
mysql_mutex_lock(&mutex);
writes_pending++;
mysql_mutex_unlock(&mutex);
}
size_t pending_writes()
{
mysql_mutex_lock(&mutex);
const size_t pending{writes_pending};
mysql_mutex_unlock(&mutex);
return pending;
}
/** Wait for writes_pending to reach 0 */
void wait_for_page_writes()
{
mysql_mutex_lock(&mutex);
while (writes_pending)
my_cond_wait(&write_cond, &mutex.m_mutex);
mysql_mutex_unlock(&mutex);
}
/** Wait for writes_pending to reach 0 */
void wait_for_page_writes(const timespec &abstime)
{
mysql_mutex_lock(&mutex);
while (writes_pending)
my_cond_timedwait(&write_cond, &mutex.m_mutex, &abstime);
mysql_mutex_unlock(&mutex);
}
};
......
......@@ -30,10 +30,8 @@ Created 11/5/1995 Heikki Tuuri
#include "log0log.h"
#include "buf0buf.h"
/** Number of pages flushed. Protected by buf_pool.mutex. */
extern ulint buf_flush_page_count;
/** Number of pages flushed via LRU. Protected by buf_pool.mutex.
Also included in buf_flush_page_count. */
Also included in buf_pool.stat.n_pages_written. */
extern ulint buf_lru_flush_page_count;
/** Number of pages freed without flushing. Protected by buf_pool.mutex. */
extern ulint buf_lru_freed_page_count;
......@@ -96,9 +94,8 @@ after releasing buf_pool.mutex.
@retval 0 if a buf_pool.LRU batch is already running */
ulint buf_flush_LRU(ulint max_n, bool evict);
/** Wait until a flush batch ends.
@param lru true=buf_pool.LRU; false=buf_pool.flush_list */
void buf_flush_wait_batch_end(bool lru);
/** Wait until a LRU flush batch ends. */
void buf_flush_wait_LRU_batch_end();
/** Wait until all persistent pages are flushed up to a limit.
@param sync_lsn buf_pool.get_oldest_modification(LSN_MAX) to wait for */
ATTRIBUTE_COLD void buf_flush_wait_flushed(lsn_t sync_lsn);
......
......@@ -33,10 +33,11 @@ Created 11/5/1995 Heikki Tuuri
buffer buf_pool if it is not already there. Sets the io_fix flag and sets
an exclusive lock on the buffer frame. The flag is cleared and the x-lock
released by the i/o-handler thread.
@param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@retval DB_SUCCESS if the page was read and is not corrupted,
@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
@param page_id page id
@param zip_size ROW_FORMAT=COMPRESSED page size, or 0
@retval DB_SUCCESS if the page was read and is not corrupted
@retval DB_SUCCESS_LOCKED_REC if the page was not read
@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted
@retval DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match.
@retval DB_TABLESPACE_DELETED if tablespace .ibd file is missing */
......
......@@ -1170,7 +1170,7 @@ struct fil_node_t final
inline bool fil_space_t::use_doublewrite() const
{
return !UT_LIST_GET_FIRST(chain)->atomic_write && srv_use_doublewrite_buf &&
buf_dblwr.is_initialised();
buf_dblwr.is_created();
}
inline void fil_space_t::set_imported()
......
......@@ -108,10 +108,6 @@ struct srv_stats_t
/** Store the number of write requests issued */
ulint_ctr_1_t buf_pool_write_requests;
/** Number of buffer pool reads that led to the reading of
a disk page */
ulint_ctr_1_t buf_pool_reads;
/** Number of bytes saved by page compression */
ulint_ctr_n_t page_compression_saved;
/* Number of pages compressed with page compression */
......@@ -670,24 +666,12 @@ struct export_var_t{
char innodb_buffer_pool_resize_status[512];/*!< Buf pool resize status */
my_bool innodb_buffer_pool_load_incomplete;/*!< Buf pool load incomplete */
ulint innodb_buffer_pool_pages_total; /*!< Buffer pool size */
ulint innodb_buffer_pool_pages_data; /*!< Data pages */
ulint innodb_buffer_pool_bytes_data; /*!< File bytes used */
ulint innodb_buffer_pool_pages_dirty; /*!< Dirty data pages */
ulint innodb_buffer_pool_bytes_dirty; /*!< File bytes modified */
ulint innodb_buffer_pool_pages_misc; /*!< Miscellanous pages */
ulint innodb_buffer_pool_pages_free; /*!< Free pages */
#ifdef UNIV_DEBUG
ulint innodb_buffer_pool_pages_latched; /*!< Latched pages */
#endif /* UNIV_DEBUG */
ulint innodb_buffer_pool_pages_made_not_young;
ulint innodb_buffer_pool_pages_made_young;
ulint innodb_buffer_pool_pages_old;
ulint innodb_buffer_pool_read_requests; /*!< buf_pool.stat.n_page_gets */
ulint innodb_buffer_pool_reads; /*!< srv_buf_pool_reads */
ulint innodb_buffer_pool_write_requests;/*!< srv_stats.buf_pool_write_requests */
ulint innodb_buffer_pool_read_ahead_rnd;/*!< srv_read_ahead_rnd */
ulint innodb_buffer_pool_read_ahead; /*!< srv_read_ahead */
ulint innodb_buffer_pool_read_ahead_evicted;/*!< srv_read_ahead evicted*/
ulint innodb_checkpoint_age;
ulint innodb_checkpoint_max_age;
ulint innodb_data_pending_reads; /*!< Pending reads */
......
......@@ -1173,14 +1173,6 @@ ATTRIBUTE_COLD void logs_empty_and_mark_files_at_shutdown()
if (!buf_pool.is_initialised()) {
ut_ad(!srv_was_started);
} else if (ulint pending_io = buf_pool.io_pending()) {
if (srv_print_verbose_log && count > 600) {
ib::info() << "Waiting for " << pending_io << " buffer"
" page I/Os to complete";
count = 0;
}
goto loop;
} else {
buf_flush_buffer_pool();
}
......
......@@ -909,7 +909,7 @@ static monitor_info_t innodb_counter_info[] =
MONITOR_DEFAULT_START, MONITOR_MODULE_INDEX},
{"index_page_splits", "index", "Number of index page splits",
MONITOR_NONE,
MONITOR_EXISTING,
MONITOR_DEFAULT_START, MONITOR_INDEX_SPLIT},
{"index_page_merge_attempts", "index",
......@@ -1411,10 +1411,12 @@ srv_mon_process_existing_counter(
/* Get the value from corresponding global variable */
switch (monitor_id) {
/* export_vars.innodb_buffer_pool_reads. Num Reads from
disk (page not in buffer) */
case MONITOR_INDEX_SPLIT:
value = buf_pool.pages_split;
break;
case MONITOR_OVLD_BUF_POOL_READS:
value = srv_stats.buf_pool_reads;
value = buf_pool.stat.n_pages_read;
break;
/* innodb_buffer_pool_read_requests, the number of logical
......@@ -1475,7 +1477,7 @@ srv_mon_process_existing_counter(
/* innodb_buffer_pool_bytes_dirty */
case MONITOR_OVLD_BUF_POOL_BYTES_DIRTY:
value = buf_pool.stat.flush_list_bytes;
value = buf_pool.flush_list_bytes;
break;
/* innodb_buffer_pool_pages_free */
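
Note on the `MONITOR_NONE` to `MONITOR_EXISTING` change for `index_page_splits`: the two `srv_mon_process_existing_counter()` hunks above suggest that an "existing" counter is not maintained by the monitor module itself but is read on demand from the variable that already owns the value. A minimal sketch of that pattern follows. `MonitorId`, `BufPoolSketch`, and `read_existing_counter()` are hypothetical names that only echo the identifiers in the diff; none of this is the real InnoDB monitor code.

```cpp
#include <cstdint>

// Hypothetical stand-ins; the names echo the diff but are invented here.
enum MonitorId { INDEX_SPLIT, BUF_POOL_READS, BUF_POOL_BYTES_DIRTY };

// Plays the role of the object that owns the counters (buf_pool in the diff).
struct BufPoolSketch
{
  uint64_t pages_split= 0;
  uint64_t n_pages_read= 0;
  uint64_t flush_list_bytes= 0;
};

// Returns the current value of an "existing" counter by reading the
// variable that owns it, rather than a separately maintained copy.
uint64_t read_existing_counter(MonitorId id, const BufPoolSketch &pool)
{
  switch (id) {
  case INDEX_SPLIT:
    return pool.pages_split;
  case BUF_POOL_READS:
    return pool.n_pages_read;
  case BUF_POOL_BYTES_DIRTY:
    return pool.flush_list_bytes;
  }
  return 0;  // unknown id: nothing to report
}
```

With this shape there is a single source of truth per counter, so moving a field (as with `flush_list_bytes` above) only requires updating the one `case` that reads it.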
......
......@@ -675,6 +675,7 @@ void srv_boot()
if (transactional_lock_enabled())
sql_print_information("InnoDB: Using transactional memory");
#endif
buf_dblwr.init();
srv_thread_pool_init();
trx_pool_init();
srv_init();
......@@ -1001,59 +1002,22 @@ srv_export_innodb_status(void)
export_vars.innodb_data_writes = os_n_file_writes;
ulint dblwr = 0;
if (buf_dblwr.is_initialised()) {
buf_dblwr.lock();
dblwr = buf_dblwr.submitted();
ulint dblwr = buf_dblwr.submitted();
export_vars.innodb_dblwr_pages_written = buf_dblwr.written();
export_vars.innodb_dblwr_writes = buf_dblwr.batches();
buf_dblwr.unlock();
}
export_vars.innodb_data_written = srv_stats.data_written + dblwr;
export_vars.innodb_buffer_pool_read_requests
= buf_pool.stat.n_page_gets;
export_vars.innodb_buffer_pool_write_requests =
srv_stats.buf_pool_write_requests;
export_vars.innodb_buffer_pool_reads = srv_stats.buf_pool_reads;
export_vars.innodb_buffer_pool_read_ahead_rnd =
buf_pool.stat.n_ra_pages_read_rnd;
export_vars.innodb_buffer_pool_read_ahead =
buf_pool.stat.n_ra_pages_read;
export_vars.innodb_buffer_pool_read_ahead_evicted =
buf_pool.stat.n_ra_pages_evicted;
export_vars.innodb_buffer_pool_pages_data =
UT_LIST_GET_LEN(buf_pool.LRU);
export_vars.innodb_buffer_pool_bytes_data =
buf_pool.stat.LRU_bytes
+ (UT_LIST_GET_LEN(buf_pool.unzip_LRU)
<< srv_page_size_shift);
export_vars.innodb_buffer_pool_pages_dirty =
UT_LIST_GET_LEN(buf_pool.flush_list);
export_vars.innodb_buffer_pool_pages_made_young
= buf_pool.stat.n_pages_made_young;
export_vars.innodb_buffer_pool_pages_made_not_young
= buf_pool.stat.n_pages_not_made_young;
export_vars.innodb_buffer_pool_pages_old = buf_pool.LRU_old_len;
export_vars.innodb_buffer_pool_bytes_dirty =
buf_pool.stat.flush_list_bytes;
export_vars.innodb_buffer_pool_pages_free =
UT_LIST_GET_LEN(buf_pool.free);
#ifdef UNIV_DEBUG
export_vars.innodb_buffer_pool_pages_latched =
buf_get_latched_pages_number();
......
......@@ -1997,7 +1997,7 @@ void innodb_shutdown()
ut_ad(dict_sys.is_initialised() || !srv_was_started);
ut_ad(trx_sys.is_initialised() || !srv_was_started);
ut_ad(buf_dblwr.is_initialised() || !srv_was_started
ut_ad(buf_dblwr.is_created() || !srv_was_started
|| srv_read_only_mode
|| srv_force_recovery >= SRV_FORCE_NO_TRX_UNDO);
ut_ad(lock_sys.is_initialised() || !srv_was_started);
......
......@@ -181,7 +181,7 @@ compress_pages_page_decompressed compression 0 NULL NULL NULL 0 NULL NULL NULL N
compress_pages_page_compression_error compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of page compression errors
compress_pages_encrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages encrypted
compress_pages_decrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages decrypted
index_page_splits index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page splits
index_page_splits index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 status_counter Number of index page splits
index_page_merge_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page merge attempts
index_page_merge_successful index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of successful index page merges
index_page_reorg_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page reorganization attempts
......