Commit a55b951e authored by Marko Mäkelä

MDEV-26827 Make page flushing even faster

For more convenient monitoring of something that could greatly affect
the volume of page writes, we add the status variable
Innodb_buffer_pool_pages_split that was previously only available
via information_schema.innodb_metrics as "index_page_splits".
This was suggested by Axel Schwenke.

buf_flush_page_count: Replaced with buf_pool.stat.n_pages_written.
We protect buf_pool.stat (except n_page_gets) with buf_pool.mutex
and remove unnecessary export_vars indirection.

buf_pool.flush_list_bytes: Moved from buf_pool.stat.flush_list_bytes.
Protected by buf_pool.flush_list_mutex.

buf_pool_t::page_cleaner_status: Replaces buf_pool_t::n_flush_LRU_,
buf_pool_t::n_flush_list_, and buf_pool_t::page_cleaner_is_idle.
Protected by buf_pool.flush_list_mutex. Only the buf_flush_page_cleaner
thread will broadcast buf_pool.done_flush_list, and other threads will
only wait for it when communicating with buf_flush_page_cleaner.
There is no need to keep a count of pending writes by the
buf_pool.flush_list processing. A single flag suffices for that.
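
As an illustration only, a single status word protected by one mutex
could replace two counters and a flag roughly as follows. The class name,
the bit layout and the method names are assumptions made for this sketch;
they are not taken from the patch.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch, not the InnoDB definition: one status word
// holding two flags and a counter, all protected by one mutex.
class cleaner_status
{
  static constexpr unsigned IDLE= 1;              // cleaner thread is idle
  static constexpr unsigned FLUSH_LIST_ACTIVE= 2; // a flush_list batch runs
  static constexpr unsigned LRU_FLUSH= 4;         // one pending LRU batch

  std::mutex flush_list_mutex; // protects status
  unsigned status= 0;
public:
  void set_idle(bool idle)
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    status= idle ? (status | IDLE) : (status & ~IDLE);
  }
  bool is_idle()
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    return (status & IDLE) != 0;
  }
  // A single flag is enough for flush_list processing; no count is kept.
  void set_flush_list_active(bool active)
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    status= active
      ? (status | FLUSH_LIST_ACTIVE) : (status & ~FLUSH_LIST_ACTIVE);
  }
  bool flush_list_active()
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    return (status & FLUSH_LIST_ACTIVE) != 0;
  }
  // LRU batches are counted in the bits above the two flags.
  void lru_batch_begin()
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    status+= LRU_FLUSH;
  }
  void lru_batch_end()
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    assert(status >= LRU_FLUSH);
    status-= LRU_FLUSH;
  }
  unsigned n_flush()
  {
    std::lock_guard<std::mutex> g(flush_list_mutex);
    return status / LRU_FLUSH;
  }
};
```

Packing everything into one word means that a waiter can evaluate
"idle", "flush_list batch active" and "LRU batches pending" under a
single mutex acquisition.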

Waits for page write completion can be performed by
simply waiting on block->page.lock, or by invoking
buf_dblwr.wait_for_page_writes().

buf_LRU_block_free_non_file_page(): Broadcast buf_pool.done_free and
set buf_pool.try_LRU_scan when freeing a page. This is also
executed as part of buf_page_write_complete().

buf_page_write_complete(): Do not broadcast buf_pool.done_flush_list,
and do not acquire buf_pool.mutex unless buf_pool.LRU eviction is needed.
Let buf_dblwr count all writes to persistent pages and broadcast a
condition variable when no outstanding writes remain.
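
The counting scheme described above can be sketched in portable C++ as
shown below. This is a simplified stand-in under stated assumptions: the
class pending_write_counter and write_submitted() are hypothetical, and
std::mutex/std::condition_variable take the place of the mysql_mutex_t
and pthread_cond_t used in the server. Only the names writes_pending,
write_cond, write_completed(), pending_writes() and
wait_for_page_writes() echo identifiers that appear in this commit.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the write accounting; not the InnoDB code.
class pending_write_counter
{
  std::mutex m;                       // protects writes_pending
  std::condition_variable write_cond; // signalled when the count hits 0
  std::size_t writes_pending= 0;
public:
  // Called when a page write is submitted.
  void write_submitted()
  {
    std::lock_guard<std::mutex> g(m);
    ++writes_pending;
  }

  // Called from the write completion callback.
  void write_completed()
  {
    std::lock_guard<std::mutex> g(m);
    assert(writes_pending);
    if (!--writes_pending)
      write_cond.notify_all(); // wake up every waiter, not just one
  }

  // Number of writes submitted but not yet completed.
  std::size_t pending_writes()
  {
    std::lock_guard<std::mutex> g(m);
    return writes_pending;
  }

  // Block until no outstanding writes remain.
  void wait_for_page_writes()
  {
    std::unique_lock<std::mutex> l(m);
    write_cond.wait(l, [this] { return !writes_pending; });
  }
};
```

Broadcasting only on the transition to zero keeps the completion path
cheap, while the predicate in the wait guards against spurious wakeups.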

buf_flush_page_cleaner(): Prioritize LRU flushing and eviction right after
"furious flushing" (lsn_limit). Simplify the conditions and reduce the
hold time of buf_pool.flush_list_mutex. Refuse to shut down
or sleep if buf_pool.ran_out(), that is, LRU eviction is needed.

buf_pool_t::page_cleaner_wakeup(): Add the optional parameter for_LRU.

buf_LRU_get_free_block(): Protect buf_lru_free_blocks_error_printed
with buf_pool.mutex. Invoke buf_pool.page_cleaner_wakeup(true)
to ensure that buf_flush_page_cleaner() will process the LRU flush
request.

buf_do_LRU_batch(), buf_flush_list(), buf_flush_list_space():
Update buf_pool.stat.n_pages_written when submitting writes
(while holding buf_pool.mutex), not when completing them.

buf_page_t::flush(), buf_flush_discard_page(): Require that
the page U-latch be acquired upfront, and remove
buf_page_t::ready_for_flush().

buf_pool_t::delete_from_flush_list(): Remove the parameter "bool clear".

buf_flush_page(): Count pending page writes via buf_dblwr.

buf_flush_try_neighbors(): Take the block of page_id as a parameter.
If the tablespace is dropped before our page has been written out,
release the page U-latch.

buf_pool_invalidate(): Let the caller ensure that there are no
outstanding writes.

buf_flush_wait_batch_end(false),
buf_flush_wait_batch_end_acquiring_mutex(false):
Replaced with buf_dblwr.wait_for_page_writes().

buf_flush_wait_LRU_batch_end(): Replaces buf_flush_wait_batch_end(true).

buf_flush_list(): Remove some broadcast of buf_pool.done_flush_list.

buf_flush_buffer_pool(): Invoke also buf_dblwr.wait_for_page_writes().

buf_pool_t::io_pending(), buf_pool_t::n_flush_list(): Remove.
Outstanding writes are reflected by buf_dblwr.pending_writes().

buf_dblwr_t::init(): New function, to initialize the mutex and
the condition variables, but not the backing store.

buf_dblwr_t::is_created(): Replaces buf_dblwr_t::is_initialised().

buf_dblwr_t::pending_writes(), buf_dblwr_t::writes_pending:
Keep track of writes of persistent data pages.

buf_flush_LRU(): Allow calls while LRU flushing may be in progress
in another thread.

Tested by Matthias Leich (correctness) and Axel Schwenke (performance)
parent 9593cccf
...@@ -199,7 +199,7 @@ compress_pages_page_decompressed compression 0 NULL NULL NULL 0 NULL NULL NULL N ...@@ -199,7 +199,7 @@ compress_pages_page_decompressed compression 0 NULL NULL NULL 0 NULL NULL NULL N
compress_pages_page_compression_error compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of page compression errors compress_pages_page_compression_error compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of page compression errors
compress_pages_encrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages encrypted compress_pages_encrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages encrypted
compress_pages_decrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages decrypted compress_pages_decrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages decrypted
index_page_splits index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page splits index_page_splits index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 status_counter Number of index page splits
index_page_merge_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page merge attempts index_page_merge_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page merge attempts
index_page_merge_successful index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of successful index page merges index_page_merge_successful index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of successful index page merges
index_page_reorg_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page reorganization attempts index_page_reorg_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page reorganization attempts
......
...@@ -23,6 +23,7 @@ INNODB_BUFFER_POOL_PAGES_OLD ...@@ -23,6 +23,7 @@ INNODB_BUFFER_POOL_PAGES_OLD
INNODB_BUFFER_POOL_PAGES_TOTAL INNODB_BUFFER_POOL_PAGES_TOTAL
INNODB_BUFFER_POOL_PAGES_LRU_FLUSHED INNODB_BUFFER_POOL_PAGES_LRU_FLUSHED
INNODB_BUFFER_POOL_PAGES_LRU_FREED INNODB_BUFFER_POOL_PAGES_LRU_FREED
INNODB_BUFFER_POOL_PAGES_SPLIT
INNODB_BUFFER_POOL_READ_AHEAD_RND INNODB_BUFFER_POOL_READ_AHEAD_RND
INNODB_BUFFER_POOL_READ_AHEAD INNODB_BUFFER_POOL_READ_AHEAD
INNODB_BUFFER_POOL_READ_AHEAD_EVICTED INNODB_BUFFER_POOL_READ_AHEAD_EVICTED
......
...@@ -2975,6 +2975,8 @@ btr_page_split_and_insert( ...@@ -2975,6 +2975,8 @@ btr_page_split_and_insert(
ut_ad(*err == DB_SUCCESS); ut_ad(*err == DB_SUCCESS);
ut_ad(dtuple_check_typed(tuple)); ut_ad(dtuple_check_typed(tuple));
buf_pool.pages_split++;
if (cursor->index()->is_spatial()) { if (cursor->index()->is_spatial()) {
/* Split rtree page and update parent */ /* Split rtree page and update parent */
return rtr_page_split_and_insert(flags, cursor, offsets, heap, return rtr_page_split_and_insert(flags, cursor, offsets, heap,
...@@ -3371,8 +3373,6 @@ btr_page_split_and_insert( ...@@ -3371,8 +3373,6 @@ btr_page_split_and_insert(
left_block, right_block, mtr); left_block, right_block, mtr);
} }
MONITOR_INC(MONITOR_INDEX_SPLIT);
ut_ad(page_validate(buf_block_get_frame(left_block), ut_ad(page_validate(buf_block_get_frame(left_block),
page_cursor->index)); page_cursor->index));
ut_ad(page_validate(buf_block_get_frame(right_block), ut_ad(page_validate(buf_block_get_frame(right_block),
......
...@@ -1401,8 +1401,10 @@ inline bool buf_pool_t::withdraw_blocks() ...@@ -1401,8 +1401,10 @@ inline bool buf_pool_t::withdraw_blocks()
true); true);
mysql_mutex_unlock(&buf_pool.mutex); mysql_mutex_unlock(&buf_pool.mutex);
buf_dblwr.flush_buffered_writes(); buf_dblwr.flush_buffered_writes();
mysql_mutex_lock(&buf_pool.flush_list_mutex);
buf_flush_wait_LRU_batch_end();
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
mysql_mutex_lock(&buf_pool.mutex); mysql_mutex_lock(&buf_pool.mutex);
buf_flush_wait_batch_end(true);
} }
/* relocate blocks/buddies in withdrawn area */ /* relocate blocks/buddies in withdrawn area */
...@@ -2265,13 +2267,15 @@ buf_page_t* buf_page_get_zip(const page_id_t page_id, ulint zip_size) ...@@ -2265,13 +2267,15 @@ buf_page_t* buf_page_get_zip(const page_id_t page_id, ulint zip_size)
return bpage; return bpage;
must_read_page: must_read_page:
if (dberr_t err= buf_read_page(page_id, zip_size)) switch (dberr_t err= buf_read_page(page_id, zip_size)) {
{ case DB_SUCCESS:
case DB_SUCCESS_LOCKED_REC:
goto lookup;
default:
ib::error() << "Reading compressed page " << page_id ib::error() << "Reading compressed page " << page_id
<< " failed with error: " << err; << " failed with error: " << err;
return nullptr; return nullptr;
} }
goto lookup;
} }
/********************************************************************//** /********************************************************************//**
...@@ -2511,20 +2515,23 @@ buf_page_get_low( ...@@ -2511,20 +2515,23 @@ buf_page_get_low(
corrupted, or if an encrypted page with a valid corrupted, or if an encrypted page with a valid
checksum cannot be decypted. */ checksum cannot be decypted. */
if (dberr_t local_err = buf_read_page(page_id, zip_size)) { switch (dberr_t local_err = buf_read_page(page_id, zip_size)) {
if (local_err != DB_CORRUPTION case DB_SUCCESS:
&& mode != BUF_GET_POSSIBLY_FREED case DB_SUCCESS_LOCKED_REC:
buf_read_ahead_random(page_id, zip_size, ibuf_inside(mtr));
break;
default:
if (mode != BUF_GET_POSSIBLY_FREED
&& retries++ < BUF_PAGE_READ_MAX_RETRIES) { && retries++ < BUF_PAGE_READ_MAX_RETRIES) {
DBUG_EXECUTE_IF("intermittent_read_failure", DBUG_EXECUTE_IF("intermittent_read_failure",
retries = BUF_PAGE_READ_MAX_RETRIES;); retries = BUF_PAGE_READ_MAX_RETRIES;);
} else {
if (err) {
*err = local_err;
}
return nullptr;
} }
} else { /* fall through */
buf_read_ahead_random(page_id, zip_size, ibuf_inside(mtr)); case DB_PAGE_CORRUPTED:
if (err) {
*err = local_err;
}
return nullptr;
} }
ut_d(if (!(++buf_dbg_counter % 5771)) buf_pool.validate()); ut_d(if (!(++buf_dbg_counter % 5771)) buf_pool.validate());
...@@ -3279,12 +3286,12 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size, ...@@ -3279,12 +3286,12 @@ static buf_block_t *buf_page_create_low(page_id_t page_id, ulint zip_size,
buf_unzip_LRU_add_block(reinterpret_cast<buf_block_t*>(bpage), FALSE); buf_unzip_LRU_add_block(reinterpret_cast<buf_block_t*>(bpage), FALSE);
} }
buf_pool.stat.n_pages_created++;
mysql_mutex_unlock(&buf_pool.mutex); mysql_mutex_unlock(&buf_pool.mutex);
mtr->memo_push(reinterpret_cast<buf_block_t*>(bpage), MTR_MEMO_PAGE_X_FIX); mtr->memo_push(reinterpret_cast<buf_block_t*>(bpage), MTR_MEMO_PAGE_X_FIX);
bpage->set_accessed(); bpage->set_accessed();
buf_pool.stat.n_pages_created++;
/* Delete possible entries for the page from the insert buffer: /* Delete possible entries for the page from the insert buffer:
such can exist if the page belonged to an index which was dropped */ such can exist if the page belonged to an index which was dropped */
...@@ -3534,7 +3541,6 @@ dberr_t buf_page_t::read_complete(const fil_node_t &node) ...@@ -3534,7 +3541,6 @@ dberr_t buf_page_t::read_complete(const fil_node_t &node)
ut_d(auto n=) buf_pool.n_pend_reads--; ut_d(auto n=) buf_pool.n_pend_reads--;
ut_ad(n > 0); ut_ad(n > 0);
buf_pool.stat.n_pages_read++;
const byte *read_frame= zip.data ? zip.data : frame; const byte *read_frame= zip.data ? zip.data : frame;
ut_ad(read_frame); ut_ad(read_frame);
...@@ -3686,9 +3692,6 @@ void buf_pool_invalidate() ...@@ -3686,9 +3692,6 @@ void buf_pool_invalidate()
{ {
mysql_mutex_lock(&buf_pool.mutex); mysql_mutex_lock(&buf_pool.mutex);
buf_flush_wait_batch_end(true);
buf_flush_wait_batch_end(false);
/* It is possible that a write batch that has been posted /* It is possible that a write batch that has been posted
earlier is still not complete. For buffer pool invalidation to earlier is still not complete. For buffer pool invalidation to
proceed we must ensure there is NO write activity happening. */ proceed we must ensure there is NO write activity happening. */
...@@ -3839,8 +3842,8 @@ void buf_pool_t::print() ...@@ -3839,8 +3842,8 @@ void buf_pool_t::print()
<< UT_LIST_GET_LEN(flush_list) << UT_LIST_GET_LEN(flush_list)
<< ", n pending decompressions=" << n_pend_unzip << ", n pending decompressions=" << n_pend_unzip
<< ", n pending reads=" << n_pend_reads << ", n pending reads=" << n_pend_reads
<< ", n pending flush LRU=" << n_flush_LRU_ << ", n pending flush LRU=" << n_flush()
<< " list=" << n_flush_list_ << " list=" << buf_dblwr.pending_writes()
<< ", pages made young=" << stat.n_pages_made_young << ", pages made young=" << stat.n_pages_made_young
<< ", not young=" << stat.n_pages_not_made_young << ", not young=" << stat.n_pages_not_made_young
<< ", pages read=" << stat.n_pages_read << ", pages read=" << stat.n_pages_read
...@@ -3952,13 +3955,13 @@ void buf_stats_get_pool_info(buf_pool_info_t *pool_info) ...@@ -3952,13 +3955,13 @@ void buf_stats_get_pool_info(buf_pool_info_t *pool_info)
pool_info->flush_list_len = UT_LIST_GET_LEN(buf_pool.flush_list); pool_info->flush_list_len = UT_LIST_GET_LEN(buf_pool.flush_list);
pool_info->n_pend_unzip = UT_LIST_GET_LEN(buf_pool.unzip_LRU); pool_info->n_pend_unzip = UT_LIST_GET_LEN(buf_pool.unzip_LRU);
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
pool_info->n_pend_reads = buf_pool.n_pend_reads; pool_info->n_pend_reads = buf_pool.n_pend_reads;
pool_info->n_pending_flush_lru = buf_pool.n_flush_LRU_; pool_info->n_pending_flush_lru = buf_pool.n_flush();
pool_info->n_pending_flush_list = buf_pool.n_flush_list_; pool_info->n_pending_flush_list = buf_dblwr.pending_writes();
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
current_time = time(NULL); current_time = time(NULL);
time_elapsed = 0.001 + difftime(current_time, time_elapsed = 0.001 + difftime(current_time,
......
...@@ -46,7 +46,17 @@ inline buf_block_t *buf_dblwr_trx_sys_get(mtr_t *mtr) ...@@ -46,7 +46,17 @@ inline buf_block_t *buf_dblwr_trx_sys_get(mtr_t *mtr)
0, RW_X_LATCH, mtr); 0, RW_X_LATCH, mtr);
} }
/** Initialize the doublewrite buffer data structure. void buf_dblwr_t::init()
{
if (!active_slot)
{
active_slot= &slots[0];
mysql_mutex_init(buf_dblwr_mutex_key, &mutex, nullptr);
pthread_cond_init(&cond, nullptr);
}
}
/** Initialise the persistent storage of the doublewrite buffer.
@param header doublewrite page header in the TRX_SYS page */ @param header doublewrite page header in the TRX_SYS page */
inline void buf_dblwr_t::init(const byte *header) inline void buf_dblwr_t::init(const byte *header)
{ {
...@@ -54,8 +64,6 @@ inline void buf_dblwr_t::init(const byte *header) ...@@ -54,8 +64,6 @@ inline void buf_dblwr_t::init(const byte *header)
ut_ad(!active_slot->reserved); ut_ad(!active_slot->reserved);
ut_ad(!batch_running); ut_ad(!batch_running);
mysql_mutex_init(buf_dblwr_mutex_key, &mutex, nullptr);
pthread_cond_init(&cond, nullptr);
block1= page_id_t(0, mach_read_from_4(header + TRX_SYS_DOUBLEWRITE_BLOCK1)); block1= page_id_t(0, mach_read_from_4(header + TRX_SYS_DOUBLEWRITE_BLOCK1));
block2= page_id_t(0, mach_read_from_4(header + TRX_SYS_DOUBLEWRITE_BLOCK2)); block2= page_id_t(0, mach_read_from_4(header + TRX_SYS_DOUBLEWRITE_BLOCK2));
...@@ -74,7 +82,7 @@ inline void buf_dblwr_t::init(const byte *header) ...@@ -74,7 +82,7 @@ inline void buf_dblwr_t::init(const byte *header)
@return whether the operation succeeded */ @return whether the operation succeeded */
bool buf_dblwr_t::create() bool buf_dblwr_t::create()
{ {
if (is_initialised()) if (is_created())
return true; return true;
mtr_t mtr; mtr_t mtr;
...@@ -343,7 +351,7 @@ dberr_t buf_dblwr_t::init_or_load_pages(pfs_os_file_t file, const char *path) ...@@ -343,7 +351,7 @@ dberr_t buf_dblwr_t::init_or_load_pages(pfs_os_file_t file, const char *path)
void buf_dblwr_t::recover() void buf_dblwr_t::recover()
{ {
ut_ad(recv_sys.parse_start_lsn); ut_ad(recv_sys.parse_start_lsn);
if (!is_initialised()) if (!is_created())
return; return;
uint32_t page_no_dblwr= 0; uint32_t page_no_dblwr= 0;
...@@ -452,10 +460,9 @@ void buf_dblwr_t::recover() ...@@ -452,10 +460,9 @@ void buf_dblwr_t::recover()
/** Free the doublewrite buffer. */ /** Free the doublewrite buffer. */
void buf_dblwr_t::close() void buf_dblwr_t::close()
{ {
if (!is_initialised()) if (!active_slot)
return; return;
/* Free the double write data structures. */
ut_ad(!active_slot->reserved); ut_ad(!active_slot->reserved);
ut_ad(!active_slot->first_free); ut_ad(!active_slot->first_free);
ut_ad(!batch_running); ut_ad(!batch_running);
...@@ -469,35 +476,41 @@ void buf_dblwr_t::close() ...@@ -469,35 +476,41 @@ void buf_dblwr_t::close()
mysql_mutex_destroy(&mutex); mysql_mutex_destroy(&mutex);
memset((void*) this, 0, sizeof *this); memset((void*) this, 0, sizeof *this);
active_slot= &slots[0];
} }
/** Update the doublewrite buffer on write completion. */ /** Update the doublewrite buffer on write completion. */
void buf_dblwr_t::write_completed() void buf_dblwr_t::write_completed(bool with_doublewrite)
{ {
ut_ad(this == &buf_dblwr); ut_ad(this == &buf_dblwr);
ut_ad(srv_use_doublewrite_buf);
ut_ad(is_initialised());
ut_ad(!srv_read_only_mode); ut_ad(!srv_read_only_mode);
mysql_mutex_lock(&mutex); mysql_mutex_lock(&mutex);
ut_ad(batch_running); ut_ad(writes_pending);
slot *flush_slot= active_slot == &slots[0] ? &slots[1] : &slots[0]; if (!--writes_pending)
ut_ad(flush_slot->reserved); pthread_cond_broadcast(&write_cond);
ut_ad(flush_slot->reserved <= flush_slot->first_free);
if (!--flush_slot->reserved) if (with_doublewrite)
{ {
mysql_mutex_unlock(&mutex); ut_ad(is_created());
/* This will finish the batch. Sync data files to the disk. */ ut_ad(srv_use_doublewrite_buf);
fil_flush_file_spaces(); ut_ad(batch_running);
mysql_mutex_lock(&mutex); slot *flush_slot= active_slot == &slots[0] ? &slots[1] : &slots[0];
ut_ad(flush_slot->reserved);
ut_ad(flush_slot->reserved <= flush_slot->first_free);
if (!--flush_slot->reserved)
{
mysql_mutex_unlock(&mutex);
/* This will finish the batch. Sync data files to the disk. */
fil_flush_file_spaces();
mysql_mutex_lock(&mutex);
/* We can now reuse the doublewrite memory buffer: */ /* We can now reuse the doublewrite memory buffer: */
flush_slot->first_free= 0; flush_slot->first_free= 0;
batch_running= false; batch_running= false;
pthread_cond_broadcast(&cond); pthread_cond_broadcast(&cond);
}
} }
mysql_mutex_unlock(&mutex); mysql_mutex_unlock(&mutex);
...@@ -642,7 +655,7 @@ void buf_dblwr_t::flush_buffered_writes_completed(const IORequest &request) ...@@ -642,7 +655,7 @@ void buf_dblwr_t::flush_buffered_writes_completed(const IORequest &request)
{ {
ut_ad(this == &buf_dblwr); ut_ad(this == &buf_dblwr);
ut_ad(srv_use_doublewrite_buf); ut_ad(srv_use_doublewrite_buf);
ut_ad(is_initialised()); ut_ad(is_created());
ut_ad(!srv_read_only_mode); ut_ad(!srv_read_only_mode);
ut_ad(!request.bpage); ut_ad(!request.bpage);
ut_ad(request.node == fil_system.sys_space->chain.start); ut_ad(request.node == fil_system.sys_space->chain.start);
...@@ -708,7 +721,7 @@ posted, and also when we may have to wait for a page latch! ...@@ -708,7 +721,7 @@ posted, and also when we may have to wait for a page latch!
Otherwise a deadlock of threads can occur. */ Otherwise a deadlock of threads can occur. */
void buf_dblwr_t::flush_buffered_writes() void buf_dblwr_t::flush_buffered_writes()
{ {
if (!is_initialised() || !srv_use_doublewrite_buf) if (!is_created() || !srv_use_doublewrite_buf)
{ {
fil_flush_file_spaces(); fil_flush_file_spaces();
return; return;
...@@ -741,6 +754,7 @@ void buf_dblwr_t::add_to_batch(const IORequest &request, size_t size) ...@@ -741,6 +754,7 @@ void buf_dblwr_t::add_to_batch(const IORequest &request, size_t size)
const ulint buf_size= 2 * block_size(); const ulint buf_size= 2 * block_size();
mysql_mutex_lock(&mutex); mysql_mutex_lock(&mutex);
writes_pending++;
for (;;) for (;;)
{ {
......
This diff is collapsed.
...@@ -136,7 +136,6 @@ static void buf_LRU_block_free_hashed_page(buf_block_t *block) ...@@ -136,7 +136,6 @@ static void buf_LRU_block_free_hashed_page(buf_block_t *block)
@param[in] bpage control block */ @param[in] bpage control block */
static inline void incr_LRU_size_in_bytes(const buf_page_t* bpage) static inline void incr_LRU_size_in_bytes(const buf_page_t* bpage)
{ {
/* FIXME: use atomics, not mutex */
mysql_mutex_assert_owner(&buf_pool.mutex); mysql_mutex_assert_owner(&buf_pool.mutex);
buf_pool.stat.LRU_bytes += bpage->physical_size(); buf_pool.stat.LRU_bytes += bpage->physical_size();
...@@ -400,6 +399,7 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex) ...@@ -400,6 +399,7 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex)
DBUG_EXECUTE_IF("recv_ran_out_of_buffer", DBUG_EXECUTE_IF("recv_ran_out_of_buffer",
if (recv_recovery_is_on() if (recv_recovery_is_on()
&& recv_sys.apply_log_recs) { && recv_sys.apply_log_recs) {
mysql_mutex_lock(&buf_pool.mutex);
goto flush_lru; goto flush_lru;
}); });
get_mutex: get_mutex:
...@@ -445,20 +445,32 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex) ...@@ -445,20 +445,32 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex)
if ((block = buf_LRU_get_free_only()) != nullptr) { if ((block = buf_LRU_get_free_only()) != nullptr) {
goto got_block; goto got_block;
} }
if (!buf_pool.n_flush_LRU_) { mysql_mutex_unlock(&buf_pool.mutex);
break; mysql_mutex_lock(&buf_pool.flush_list_mutex);
const auto n_flush = buf_pool.n_flush();
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
mysql_mutex_lock(&buf_pool.mutex);
if (!n_flush) {
goto not_found;
}
if (!buf_pool.try_LRU_scan) {
mysql_mutex_lock(&buf_pool.flush_list_mutex);
buf_pool.page_cleaner_wakeup(true);
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
my_cond_wait(&buf_pool.done_free,
&buf_pool.mutex.m_mutex);
} }
my_cond_wait(&buf_pool.done_free, &buf_pool.mutex.m_mutex);
} }
#ifndef DBUG_OFF
not_found: not_found:
#endif if (n_iterations > 1) {
mysql_mutex_unlock(&buf_pool.mutex); MONITOR_INC( MONITOR_LRU_GET_FREE_WAITS );
}
if (n_iterations > 20 && !buf_lru_free_blocks_error_printed if (n_iterations == 21 && !buf_lru_free_blocks_error_printed
&& srv_buf_pool_old_size == srv_buf_pool_size) { && srv_buf_pool_old_size == srv_buf_pool_size) {
buf_lru_free_blocks_error_printed = true;
mysql_mutex_unlock(&buf_pool.mutex);
ib::warn() << "Difficult to find free blocks in the buffer pool" ib::warn() << "Difficult to find free blocks in the buffer pool"
" (" << n_iterations << " search iterations)! " " (" << n_iterations << " search iterations)! "
<< flush_failures << " failed attempts to" << flush_failures << " failed attempts to"
...@@ -472,12 +484,7 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex) ...@@ -472,12 +484,7 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex)
<< os_n_file_writes << " OS file writes, " << os_n_file_writes << " OS file writes, "
<< os_n_fsyncs << os_n_fsyncs
<< " OS fsyncs."; << " OS fsyncs.";
mysql_mutex_lock(&buf_pool.mutex);
buf_lru_free_blocks_error_printed = true;
}
if (n_iterations > 1) {
MONITOR_INC( MONITOR_LRU_GET_FREE_WAITS );
} }
/* No free block was found: try to flush the LRU list. /* No free block was found: try to flush the LRU list.
...@@ -491,8 +498,6 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex) ...@@ -491,8 +498,6 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex)
#ifndef DBUG_OFF #ifndef DBUG_OFF
flush_lru: flush_lru:
#endif #endif
mysql_mutex_lock(&buf_pool.mutex);
if (!buf_flush_LRU(innodb_lru_flush_size, true)) { if (!buf_flush_LRU(innodb_lru_flush_size, true)) {
MONITOR_INC(MONITOR_LRU_SINGLE_FLUSH_FAILURE_COUNT); MONITOR_INC(MONITOR_LRU_SINGLE_FLUSH_FAILURE_COUNT);
++flush_failures; ++flush_failures;
...@@ -1039,7 +1044,8 @@ buf_LRU_block_free_non_file_page( ...@@ -1039,7 +1044,8 @@ buf_LRU_block_free_non_file_page(
} else { } else {
UT_LIST_ADD_FIRST(buf_pool.free, &block->page); UT_LIST_ADD_FIRST(buf_pool.free, &block->page);
ut_d(block->page.in_free_list = true); ut_d(block->page.in_free_list = true);
pthread_cond_signal(&buf_pool.done_free); buf_pool.try_LRU_scan= true;
pthread_cond_broadcast(&buf_pool.done_free);
} }
MEM_NOACCESS(block->page.frame, srv_page_size); MEM_NOACCESS(block->page.frame, srv_page_size);
......
...@@ -226,6 +226,7 @@ static buf_page_t* buf_page_init_for_read(ulint mode, const page_id_t page_id, ...@@ -226,6 +226,7 @@ static buf_page_t* buf_page_init_for_read(ulint mode, const page_id_t page_id,
buf_LRU_add_block(bpage, true/* to old blocks */); buf_LRU_add_block(bpage, true/* to old blocks */);
} }
buf_pool.stat.n_pages_read++;
mysql_mutex_unlock(&buf_pool.mutex); mysql_mutex_unlock(&buf_pool.mutex);
buf_pool.n_pend_reads++; buf_pool.n_pend_reads++;
goto func_exit_no_mutex; goto func_exit_no_mutex;
...@@ -245,20 +246,18 @@ buffer buf_pool if it is not already there, in which case does nothing. ...@@ -245,20 +246,18 @@ buffer buf_pool if it is not already there, in which case does nothing.
Sets the io_fix flag and sets an exclusive lock on the buffer frame. The Sets the io_fix flag and sets an exclusive lock on the buffer frame. The
flag is cleared and the x-lock released by an i/o-handler thread. flag is cleared and the x-lock released by an i/o-handler thread.
@param[out] err DB_SUCCESS or DB_TABLESPACE_DELETED
if we are trying
to read from a non-existent tablespace
@param[in,out] space tablespace @param[in,out] space tablespace
@param[in] sync true if synchronous aio is desired @param[in] sync true if synchronous aio is desired
@param[in] mode BUF_READ_IBUF_PAGES_ONLY, ..., @param[in] mode BUF_READ_IBUF_PAGES_ONLY, ...,
@param[in] page_id page id @param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0 @param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@param[in] unzip true=request uncompressed page @param[in] unzip true=request uncompressed page
@return whether a read request was queued */ @return error code
@retval DB_SUCCESS if the page was read
@retval DB_SUCCESS_LOCKED_REC if the page exists in the buffer pool already */
static static
bool dberr_t
buf_read_page_low( buf_read_page_low(
dberr_t* err,
fil_space_t* space, fil_space_t* space,
bool sync, bool sync,
ulint mode, ulint mode,
...@@ -268,15 +267,12 @@ buf_read_page_low( ...@@ -268,15 +267,12 @@ buf_read_page_low(
{ {
buf_page_t* bpage; buf_page_t* bpage;
*err = DB_SUCCESS;
if (buf_dblwr.is_inside(page_id)) { if (buf_dblwr.is_inside(page_id)) {
ib::error() << "Trying to read doublewrite buffer page " ib::error() << "Trying to read doublewrite buffer page "
<< page_id; << page_id;
ut_ad(0); ut_ad(0);
nothing_read:
space->release(); space->release();
return false; return DB_PAGE_CORRUPTED;
} }
if (sync) { if (sync) {
...@@ -299,8 +295,9 @@ buf_read_page_low( ...@@ -299,8 +295,9 @@ buf_read_page_low(
completed */ completed */
bpage = buf_page_init_for_read(mode, page_id, zip_size, unzip); bpage = buf_page_init_for_read(mode, page_id, zip_size, unzip);
if (bpage == NULL) { if (!bpage) {
goto nothing_read; space->release();
return DB_SUCCESS_LOCKED_REC;
} }
ut_ad(bpage->in_file()); ut_ad(bpage->in_file());
...@@ -320,7 +317,6 @@ buf_read_page_low( ...@@ -320,7 +317,6 @@ buf_read_page_low(
? IORequest::READ_SYNC ? IORequest::READ_SYNC
: IORequest::READ_ASYNC), : IORequest::READ_ASYNC),
page_id.page_no() * len, len, dst, bpage); page_id.page_no() * len, len, dst, bpage);
*err = fio.err;
if (UNIV_UNLIKELY(fio.err != DB_SUCCESS)) { if (UNIV_UNLIKELY(fio.err != DB_SUCCESS)) {
ut_d(auto n=) buf_pool.n_pend_reads--; ut_d(auto n=) buf_pool.n_pend_reads--;
...@@ -329,14 +325,14 @@ buf_read_page_low( ...@@ -329,14 +325,14 @@ buf_read_page_low(
} else if (sync) { } else if (sync) {
thd_wait_end(NULL); thd_wait_end(NULL);
/* The i/o was already completed in space->io() */ /* The i/o was already completed in space->io() */
*err = bpage->read_complete(*fio.node); fio.err = bpage->read_complete(*fio.node);
space->release(); space->release();
if (*err == DB_FAIL) { if (fio.err == DB_FAIL) {
*err = DB_PAGE_CORRUPTED; fio.err = DB_PAGE_CORRUPTED;
} }
} }
return true; return fio.err;
} }
/** Applies a random read-ahead in buf_pool if there are at least a threshold /** Applies a random read-ahead in buf_pool if there are at least a threshold
...@@ -414,24 +410,26 @@ buf_read_ahead_random(const page_id_t page_id, ulint zip_size, bool ibuf) ...@@ -414,24 +410,26 @@ buf_read_ahead_random(const page_id_t page_id, ulint zip_size, bool ibuf)
continue; continue;
if (space->is_stopping()) if (space->is_stopping())
break; break;
dberr_t err;
space->reacquire(); space->reacquire();
if (buf_read_page_low(&err, space, false, ibuf_mode, i, zip_size, false)) if (buf_read_page_low(space, false, ibuf_mode, i, zip_size, false) ==
DB_SUCCESS)
count++; count++;
} }
if (count) if (count)
{
DBUG_PRINT("ib_buf", ("random read-ahead %zu pages from %s: %u", DBUG_PRINT("ib_buf", ("random read-ahead %zu pages from %s: %u",
count, space->chain.start->name, count, space->chain.start->name,
low.page_no())); low.page_no()));
space->release(); mysql_mutex_lock(&buf_pool.mutex);
/* Read ahead is considered one I/O operation for the purpose of
/* Read ahead is considered one I/O operation for the purpose of LRU policy decision. */
LRU policy decision. */ buf_LRU_stat_inc_io();
buf_LRU_stat_inc_io(); buf_pool.stat.n_ra_pages_read_rnd+= count;
mysql_mutex_unlock(&buf_pool.mutex);
}
buf_pool.stat.n_ra_pages_read_rnd+= count; space->release();
srv_stats.buf_pool_reads.add(count);
return count; return count;
} }
...@@ -441,8 +439,9 @@ on the buffer frame. The flag is cleared and the x-lock ...@@ -441,8 +439,9 @@ on the buffer frame. The flag is cleared and the x-lock
released by the i/o-handler thread. released by the i/o-handler thread.
@param[in] page_id page id @param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0 @param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@retval DB_SUCCESS if the page was read and is not corrupted, @retval DB_SUCCESS if the page was read and is not corrupted
@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted, @retval DB_SUCCESS_LOCKED_REC if the page was not read
@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted
@retval DB_DECRYPTION_FAILED if page post encryption checksum matches but @retval DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. after decryption normal page checksum does not match.
@retval DB_TABLESPACE_DELETED if tablespace .ibd file is missing */ @retval DB_TABLESPACE_DELETED if tablespace .ibd file is missing */
...@@ -456,13 +455,9 @@ dberr_t buf_read_page(const page_id_t page_id, ulint zip_size) ...@@ -456,13 +455,9 @@ dberr_t buf_read_page(const page_id_t page_id, ulint zip_size)
return DB_TABLESPACE_DELETED; return DB_TABLESPACE_DELETED;
} }
dberr_t err; buf_LRU_stat_inc_io(); /* NOT protected by buf_pool.mutex */
if (buf_read_page_low(&err, space, true, BUF_READ_ANY_PAGE, return buf_read_page_low(space, true, BUF_READ_ANY_PAGE,
page_id, zip_size, false)) page_id, zip_size, false);
srv_stats.buf_pool_reads.add(1);
buf_LRU_stat_inc_io();
return err;
} }
/** High-level function which reads a page asynchronously from a file to the /** High-level function which reads a page asynchronously from a file to the
...@@ -475,12 +470,8 @@ released by the i/o-handler thread. ...@@ -475,12 +470,8 @@ released by the i/o-handler thread.
void buf_read_page_background(fil_space_t *space, const page_id_t page_id, void buf_read_page_background(fil_space_t *space, const page_id_t page_id,
ulint zip_size) ulint zip_size)
{ {
dberr_t err; buf_read_page_low(space, false, BUF_READ_ANY_PAGE,
page_id, zip_size, false);
if (buf_read_page_low(&err, space, false, BUF_READ_ANY_PAGE,
page_id, zip_size, false)) {
srv_stats.buf_pool_reads.add(1);
}
/* We do not increment number of I/O operations used for LRU policy /* We do not increment number of I/O operations used for LRU policy
here (buf_LRU_stat_inc_io()). We use this in heuristics to decide here (buf_LRU_stat_inc_io()). We use this in heuristics to decide
...@@ -638,23 +629,26 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf) ...@@ -638,23 +629,26 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
continue; continue;
if (space->is_stopping()) if (space->is_stopping())
break; break;
dberr_t err;
space->reacquire(); space->reacquire();
count+= buf_read_page_low(&err, space, false, ibuf_mode, new_low, zip_size, if (buf_read_page_low(space, false, ibuf_mode, new_low, zip_size, false) ==
false); DB_SUCCESS)
count++;
} }
if (count) if (count)
{
DBUG_PRINT("ib_buf", ("random read-ahead %zu pages from %s: %u", DBUG_PRINT("ib_buf", ("random read-ahead %zu pages from %s: %u",
count, space->chain.start->name, count, space->chain.start->name,
new_low.page_no())); new_low.page_no()));
space->release(); mysql_mutex_lock(&buf_pool.mutex);
/* Read ahead is considered one I/O operation for the purpose of
/* Read ahead is considered one I/O operation for the purpose of LRU policy decision. */
LRU policy decision. */ buf_LRU_stat_inc_io();
buf_LRU_stat_inc_io(); buf_pool.stat.n_ra_pages_read+= count;
mysql_mutex_unlock(&buf_pool.mutex);
}
buf_pool.stat.n_ra_pages_read+= count; space->release();
return count; return count;
} }
...@@ -709,13 +703,12 @@ void buf_read_recv_pages(ulint space_id, const uint32_t* page_nos, ulint n) ...@@ -709,13 +703,12 @@ void buf_read_recv_pages(ulint space_id, const uint32_t* page_nos, ulint n)
} }
} }
dberr_t err;
space->reacquire(); space->reacquire();
buf_read_page_low(&err, space, false, switch (buf_read_page_low(space, false, BUF_READ_ANY_PAGE,
BUF_READ_ANY_PAGE, cur_page_id, zip_size, cur_page_id, zip_size, true)) {
true); case DB_SUCCESS: case DB_SUCCESS_LOCKED_REC:
break;
if (err != DB_SUCCESS) { default:
sql_print_error("InnoDB: Recovery failed to read page " sql_print_error("InnoDB: Recovery failed to read page "
UINT32PF " from %s", UINT32PF " from %s",
cur_page_id.page_no(), cur_page_id.page_no(),
......
...@@ -1209,8 +1209,6 @@ rtr_page_split_and_insert( ...@@ -1209,8 +1209,6 @@ rtr_page_split_and_insert(
ut_ad(!rec || rec_offs_validate(rec, cursor->index(), *offsets)); ut_ad(!rec || rec_offs_validate(rec, cursor->index(), *offsets));
#endif #endif
MONITOR_INC(MONITOR_INDEX_SPLIT);
return(rec); return(rec);
} }
......
...@@ -915,43 +915,37 @@ static SHOW_VAR innodb_status_variables[]= { ...@@ -915,43 +915,37 @@ static SHOW_VAR innodb_status_variables[]= {
(char*) &export_vars.innodb_buffer_pool_resize_status, SHOW_CHAR}, (char*) &export_vars.innodb_buffer_pool_resize_status, SHOW_CHAR},
{"buffer_pool_load_incomplete", {"buffer_pool_load_incomplete",
&export_vars.innodb_buffer_pool_load_incomplete, SHOW_BOOL}, &export_vars.innodb_buffer_pool_load_incomplete, SHOW_BOOL},
{"buffer_pool_pages_data", {"buffer_pool_pages_data", &UT_LIST_GET_LEN(buf_pool.LRU), SHOW_SIZE_T},
&export_vars.innodb_buffer_pool_pages_data, SHOW_SIZE_T},
{"buffer_pool_bytes_data", {"buffer_pool_bytes_data",
&export_vars.innodb_buffer_pool_bytes_data, SHOW_SIZE_T}, &export_vars.innodb_buffer_pool_bytes_data, SHOW_SIZE_T},
{"buffer_pool_pages_dirty", {"buffer_pool_pages_dirty",
&export_vars.innodb_buffer_pool_pages_dirty, SHOW_SIZE_T}, &UT_LIST_GET_LEN(buf_pool.flush_list), SHOW_SIZE_T},
{"buffer_pool_bytes_dirty", {"buffer_pool_bytes_dirty", &buf_pool.flush_list_bytes, SHOW_SIZE_T},
&export_vars.innodb_buffer_pool_bytes_dirty, SHOW_SIZE_T}, {"buffer_pool_pages_flushed", &buf_pool.stat.n_pages_written, SHOW_SIZE_T},
{"buffer_pool_pages_flushed", &buf_flush_page_count, SHOW_SIZE_T}, {"buffer_pool_pages_free", &UT_LIST_GET_LEN(buf_pool.free), SHOW_SIZE_T},
{"buffer_pool_pages_free",
&export_vars.innodb_buffer_pool_pages_free, SHOW_SIZE_T},
#ifdef UNIV_DEBUG #ifdef UNIV_DEBUG
{"buffer_pool_pages_latched", {"buffer_pool_pages_latched",
&export_vars.innodb_buffer_pool_pages_latched, SHOW_SIZE_T}, &export_vars.innodb_buffer_pool_pages_latched, SHOW_SIZE_T},
#endif /* UNIV_DEBUG */ #endif /* UNIV_DEBUG */
{"buffer_pool_pages_made_not_young", {"buffer_pool_pages_made_not_young",
&export_vars.innodb_buffer_pool_pages_made_not_young, SHOW_SIZE_T}, &buf_pool.stat.n_pages_not_made_young, SHOW_SIZE_T},
{"buffer_pool_pages_made_young", {"buffer_pool_pages_made_young",
&export_vars.innodb_buffer_pool_pages_made_young, SHOW_SIZE_T}, &buf_pool.stat.n_pages_made_young, SHOW_SIZE_T},
{"buffer_pool_pages_misc", {"buffer_pool_pages_misc",
&export_vars.innodb_buffer_pool_pages_misc, SHOW_SIZE_T}, &export_vars.innodb_buffer_pool_pages_misc, SHOW_SIZE_T},
{"buffer_pool_pages_old", {"buffer_pool_pages_old", &buf_pool.LRU_old_len, SHOW_SIZE_T},
&export_vars.innodb_buffer_pool_pages_old, SHOW_SIZE_T},
{"buffer_pool_pages_total", {"buffer_pool_pages_total",
&export_vars.innodb_buffer_pool_pages_total, SHOW_SIZE_T}, &export_vars.innodb_buffer_pool_pages_total, SHOW_SIZE_T},
{"buffer_pool_pages_LRU_flushed", &buf_lru_flush_page_count, SHOW_SIZE_T}, {"buffer_pool_pages_LRU_flushed", &buf_lru_flush_page_count, SHOW_SIZE_T},
{"buffer_pool_pages_LRU_freed", &buf_lru_freed_page_count, SHOW_SIZE_T}, {"buffer_pool_pages_LRU_freed", &buf_lru_freed_page_count, SHOW_SIZE_T},
{"buffer_pool_pages_split", &buf_pool.pages_split, SHOW_SIZE_T},
{"buffer_pool_read_ahead_rnd", {"buffer_pool_read_ahead_rnd",
&export_vars.innodb_buffer_pool_read_ahead_rnd, SHOW_SIZE_T}, &buf_pool.stat.n_ra_pages_read_rnd, SHOW_SIZE_T},
{"buffer_pool_read_ahead", {"buffer_pool_read_ahead", &buf_pool.stat.n_ra_pages_read, SHOW_SIZE_T},
&export_vars.innodb_buffer_pool_read_ahead, SHOW_SIZE_T},
{"buffer_pool_read_ahead_evicted", {"buffer_pool_read_ahead_evicted",
&export_vars.innodb_buffer_pool_read_ahead_evicted, SHOW_SIZE_T}, &buf_pool.stat.n_ra_pages_evicted, SHOW_SIZE_T},
{"buffer_pool_read_requests", {"buffer_pool_read_requests", &buf_pool.stat.n_page_gets, SHOW_SIZE_T},
&export_vars.innodb_buffer_pool_read_requests, SHOW_SIZE_T}, {"buffer_pool_reads", &buf_pool.stat.n_pages_read, SHOW_SIZE_T},
{"buffer_pool_reads",
&export_vars.innodb_buffer_pool_reads, SHOW_SIZE_T},
{"buffer_pool_wait_free", &buf_pool.stat.LRU_waits, SHOW_SIZE_T}, {"buffer_pool_wait_free", &buf_pool.stat.LRU_waits, SHOW_SIZE_T},
{"buffer_pool_write_requests", {"buffer_pool_write_requests",
&export_vars.innodb_buffer_pool_write_requests, SHOW_SIZE_T}, &export_vars.innodb_buffer_pool_write_requests, SHOW_SIZE_T},
......
/***************************************************************************** /*****************************************************************************
Copyright (c) 1995, 2017, Oracle and/or its affiliates. All Rights Reserved. Copyright (c) 1995, 2017, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2017, 2020, MariaDB Corporation. Copyright (c) 2017, 2022, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software the terms of the GNU General Public License as published by the Free Software
...@@ -54,9 +54,9 @@ class buf_dblwr_t ...@@ -54,9 +54,9 @@ class buf_dblwr_t
}; };
/** the page number of the first doublewrite block (block_size() pages) */ /** the page number of the first doublewrite block (block_size() pages) */
page_id_t block1= page_id_t(0, 0); page_id_t block1{0, 0};
/** the page number of the second doublewrite block (block_size() pages) */ /** the page number of the second doublewrite block (block_size() pages) */
page_id_t block2= page_id_t(0, 0); page_id_t block2{0, 0};
/** mutex protecting the data members below */ /** mutex protecting the data members below */
mysql_mutex_t mutex; mysql_mutex_t mutex;
...@@ -72,11 +72,15 @@ class buf_dblwr_t ...@@ -72,11 +72,15 @@ class buf_dblwr_t
ulint writes_completed; ulint writes_completed;
/** number of pages written by flush_buffered_writes_completed() */ /** number of pages written by flush_buffered_writes_completed() */
ulint pages_written; ulint pages_written;
/** condition variable for !writes_pending */
pthread_cond_t write_cond;
/** number of pending page writes */
size_t writes_pending;
slot slots[2]; slot slots[2];
slot *active_slot= &slots[0]; slot *active_slot;
/** Initialize the doublewrite buffer data structure. /** Initialise the persistent storage of the doublewrite buffer.
@param header doublewrite page header in the TRX_SYS page */ @param header doublewrite page header in the TRX_SYS page */
inline void init(const byte *header); inline void init(const byte *header);
...@@ -84,6 +88,8 @@ class buf_dblwr_t ...@@ -84,6 +88,8 @@ class buf_dblwr_t
bool flush_buffered_writes(const ulint size); bool flush_buffered_writes(const ulint size);
public: public:
/** Initialise the doublewrite buffer data structures. */
void init();
/** Create or restore the doublewrite buffer in the TRX_SYS page. /** Create or restore the doublewrite buffer in the TRX_SYS page.
@return whether the operation succeeded */ @return whether the operation succeeded */
bool create(); bool create();
...@@ -118,7 +124,7 @@ class buf_dblwr_t ...@@ -118,7 +124,7 @@ class buf_dblwr_t
void recover(); void recover();
/** Update the doublewrite buffer on data page write completion. */ /** Update the doublewrite buffer on data page write completion. */
void write_completed(); void write_completed(bool with_doublewrite);
/** Flush possible buffered writes to persistent storage. /** Flush possible buffered writes to persistent storage.
It is very important to call this function after a batch of writes has been It is very important to call this function after a batch of writes has been
posted, and also when we may have to wait for a page latch! posted, and also when we may have to wait for a page latch!
...@@ -137,14 +143,14 @@ class buf_dblwr_t ...@@ -137,14 +143,14 @@ class buf_dblwr_t
@param size payload size in bytes */ @param size payload size in bytes */
void add_to_batch(const IORequest &request, size_t size); void add_to_batch(const IORequest &request, size_t size);
/** Determine whether the doublewrite buffer is initialized */ /** Determine whether the doublewrite buffer has been created */
bool is_initialised() const bool is_created() const
{ return UNIV_LIKELY(block1 != page_id_t(0, 0)); } { return UNIV_LIKELY(block1 != page_id_t(0, 0)); }
/** @return whether a page identifier is part of the doublewrite buffer */ /** @return whether a page identifier is part of the doublewrite buffer */
bool is_inside(const page_id_t id) const bool is_inside(const page_id_t id) const
{ {
if (!is_initialised()) if (!is_created())
return false; return false;
ut_ad(block1 < block2); ut_ad(block1 < block2);
if (id < block1) if (id < block1)
...@@ -156,13 +162,44 @@ class buf_dblwr_t ...@@ -156,13 +162,44 @@ class buf_dblwr_t
/** Wait for flush_buffered_writes() to be fully completed */ /** Wait for flush_buffered_writes() to be fully completed */
void wait_flush_buffered_writes() void wait_flush_buffered_writes()
{ {
if (is_initialised()) mysql_mutex_lock(&mutex);
{ while (batch_running)
mysql_mutex_lock(&mutex); my_cond_wait(&cond, &mutex.m_mutex);
while (batch_running) mysql_mutex_unlock(&mutex);
my_cond_wait(&cond, &mutex.m_mutex); }
mysql_mutex_unlock(&mutex);
} /** Register an unbuffered page write */
void add_unbuffered()
{
mysql_mutex_lock(&mutex);
writes_pending++;
mysql_mutex_unlock(&mutex);
}
size_t pending_writes()
{
mysql_mutex_lock(&mutex);
const size_t pending{writes_pending};
mysql_mutex_unlock(&mutex);
return pending;
}
/** Wait for writes_pending to reach 0 */
void wait_for_page_writes()
{
mysql_mutex_lock(&mutex);
while (writes_pending)
my_cond_wait(&write_cond, &mutex.m_mutex);
mysql_mutex_unlock(&mutex);
}
/** Wait for writes_pending to reach 0 */
void wait_for_page_writes(const timespec &abstime)
{
mysql_mutex_lock(&mutex);
while (writes_pending)
my_cond_timedwait(&write_cond, &mutex.m_mutex, &abstime);
mysql_mutex_unlock(&mutex);
} }
}; };
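
The `writes_pending` counter and `write_cond` condition variable added above follow a common "wait until a counter drains to zero" pattern. The following is a minimal standalone sketch of that pattern, assuming `std::mutex` and `std::condition_variable` in place of `mysql_mutex_t` and `pthread_cond_t`. The class name `PendingWrites` and its method names are hypothetical and are not part of this commit; only the idea of broadcasting when no outstanding writes remain is taken from the commit message.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the writes_pending / write_cond pair.
// This is an illustration only, not the buf_dblwr_t implementation.
class PendingWrites
{
  std::mutex mutex;
  std::condition_variable write_cond;
  std::size_t writes_pending= 0;

public:
  // Register a write that has been submitted but not yet completed.
  void add()
  {
    std::lock_guard<std::mutex> guard(mutex);
    ++writes_pending;
  }

  // Mark one write as completed; wake all waiters when none remain.
  void complete()
  {
    std::lock_guard<std::mutex> guard(mutex);
    assert(writes_pending > 0);
    if (!--writes_pending)
      write_cond.notify_all();
  }

  // Return a snapshot of the number of pending writes.
  std::size_t pending()
  {
    std::lock_guard<std::mutex> guard(mutex);
    return writes_pending;
  }

  // Block until the counter reaches 0. The predicate is rechecked
  // after every wakeup, so spurious wakeups are harmless.
  void wait()
  {
    std::unique_lock<std::mutex> lock(mutex);
    write_cond.wait(lock, [this] { return writes_pending == 0; });
  }
};
```

In this sketch a submitter would call `add()` before posting a write, the completion callback would call `complete()`, and any thread that needs all writes to be finished would call `wait()`. Keeping the counter and the condition variable under one mutex means a waiter cannot miss the final notification.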
......
...@@ -30,10 +30,8 @@ Created 11/5/1995 Heikki Tuuri ...@@ -30,10 +30,8 @@ Created 11/5/1995 Heikki Tuuri
#include "log0log.h" #include "log0log.h"
#include "buf0buf.h" #include "buf0buf.h"
/** Number of pages flushed. Protected by buf_pool.mutex. */
extern ulint buf_flush_page_count;
/** Number of pages flushed via LRU. Protected by buf_pool.mutex. /** Number of pages flushed via LRU. Protected by buf_pool.mutex.
Also included in buf_flush_page_count. */ Also included in buf_pool.stat.n_pages_written. */
extern ulint buf_lru_flush_page_count; extern ulint buf_lru_flush_page_count;
/** Number of pages freed without flushing. Protected by buf_pool.mutex. */ /** Number of pages freed without flushing. Protected by buf_pool.mutex. */
extern ulint buf_lru_freed_page_count; extern ulint buf_lru_freed_page_count;
...@@ -96,9 +94,8 @@ after releasing buf_pool.mutex. ...@@ -96,9 +94,8 @@ after releasing buf_pool.mutex.
@retval 0 if a buf_pool.LRU batch is already running */ @retval 0 if a buf_pool.LRU batch is already running */
ulint buf_flush_LRU(ulint max_n, bool evict); ulint buf_flush_LRU(ulint max_n, bool evict);
/** Wait until a flush batch ends. /** Wait until a LRU flush batch ends. */
@param lru true=buf_pool.LRU; false=buf_pool.flush_list */ void buf_flush_wait_LRU_batch_end();
void buf_flush_wait_batch_end(bool lru);
/** Wait until all persistent pages are flushed up to a limit. /** Wait until all persistent pages are flushed up to a limit.
@param sync_lsn buf_pool.get_oldest_modification(LSN_MAX) to wait for */ @param sync_lsn buf_pool.get_oldest_modification(LSN_MAX) to wait for */
ATTRIBUTE_COLD void buf_flush_wait_flushed(lsn_t sync_lsn); ATTRIBUTE_COLD void buf_flush_wait_flushed(lsn_t sync_lsn);
......
...@@ -33,10 +33,11 @@ Created 11/5/1995 Heikki Tuuri ...@@ -33,10 +33,11 @@ Created 11/5/1995 Heikki Tuuri
buffer buf_pool if it is not already there. Sets the io_fix flag and sets buffer buf_pool if it is not already there. Sets the io_fix flag and sets
an exclusive lock on the buffer frame. The flag is cleared and the x-lock an exclusive lock on the buffer frame. The flag is cleared and the x-lock
released by the i/o-handler thread. released by the i/o-handler thread.
@param[in] page_id page id @param page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0 @param zip_size ROW_FORMAT=COMPRESSED page size, or 0
@retval DB_SUCCESS if the page was read and is not corrupted, @retval DB_SUCCESS if the page was read and is not corrupted
@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted, @retval DB_SUCCESS_LOCKED_REC if the page was not read
@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted
@retval DB_DECRYPTION_FAILED if page post encryption checksum matches but @retval DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. after decryption normal page checksum does not match.
@retval DB_TABLESPACE_DELETED if tablespace .ibd file is missing */ @retval DB_TABLESPACE_DELETED if tablespace .ibd file is missing */
......
...@@ -1170,7 +1170,7 @@ struct fil_node_t final ...@@ -1170,7 +1170,7 @@ struct fil_node_t final
inline bool fil_space_t::use_doublewrite() const inline bool fil_space_t::use_doublewrite() const
{ {
return !UT_LIST_GET_FIRST(chain)->atomic_write && srv_use_doublewrite_buf && return !UT_LIST_GET_FIRST(chain)->atomic_write && srv_use_doublewrite_buf &&
buf_dblwr.is_initialised(); buf_dblwr.is_created();
} }
inline void fil_space_t::set_imported() inline void fil_space_t::set_imported()
......
...@@ -108,10 +108,6 @@ struct srv_stats_t ...@@ -108,10 +108,6 @@ struct srv_stats_t
/** Store the number of write requests issued */ /** Store the number of write requests issued */
ulint_ctr_1_t buf_pool_write_requests; ulint_ctr_1_t buf_pool_write_requests;
/** Number of buffer pool reads that led to the reading of
a disk page */
ulint_ctr_1_t buf_pool_reads;
/** Number of bytes saved by page compression */ /** Number of bytes saved by page compression */
ulint_ctr_n_t page_compression_saved; ulint_ctr_n_t page_compression_saved;
/* Number of pages compressed with page compression */ /* Number of pages compressed with page compression */
...@@ -670,24 +666,12 @@ struct export_var_t{ ...@@ -670,24 +666,12 @@ struct export_var_t{
char innodb_buffer_pool_resize_status[512];/*!< Buf pool resize status */ char innodb_buffer_pool_resize_status[512];/*!< Buf pool resize status */
my_bool innodb_buffer_pool_load_incomplete;/*!< Buf pool load incomplete */ my_bool innodb_buffer_pool_load_incomplete;/*!< Buf pool load incomplete */
ulint innodb_buffer_pool_pages_total; /*!< Buffer pool size */ ulint innodb_buffer_pool_pages_total; /*!< Buffer pool size */
ulint innodb_buffer_pool_pages_data; /*!< Data pages */
ulint innodb_buffer_pool_bytes_data; /*!< File bytes used */ ulint innodb_buffer_pool_bytes_data; /*!< File bytes used */
ulint innodb_buffer_pool_pages_dirty; /*!< Dirty data pages */
ulint innodb_buffer_pool_bytes_dirty; /*!< File bytes modified */
ulint innodb_buffer_pool_pages_misc; /*!< Miscellanous pages */ ulint innodb_buffer_pool_pages_misc; /*!< Miscellanous pages */
ulint innodb_buffer_pool_pages_free; /*!< Free pages */
#ifdef UNIV_DEBUG #ifdef UNIV_DEBUG
ulint innodb_buffer_pool_pages_latched; /*!< Latched pages */ ulint innodb_buffer_pool_pages_latched; /*!< Latched pages */
#endif /* UNIV_DEBUG */ #endif /* UNIV_DEBUG */
ulint innodb_buffer_pool_pages_made_not_young;
ulint innodb_buffer_pool_pages_made_young;
ulint innodb_buffer_pool_pages_old;
ulint innodb_buffer_pool_read_requests; /*!< buf_pool.stat.n_page_gets */
ulint innodb_buffer_pool_reads; /*!< srv_buf_pool_reads */
ulint innodb_buffer_pool_write_requests;/*!< srv_stats.buf_pool_write_requests */ ulint innodb_buffer_pool_write_requests;/*!< srv_stats.buf_pool_write_requests */
ulint innodb_buffer_pool_read_ahead_rnd;/*!< srv_read_ahead_rnd */
ulint innodb_buffer_pool_read_ahead; /*!< srv_read_ahead */
ulint innodb_buffer_pool_read_ahead_evicted;/*!< srv_read_ahead evicted*/
ulint innodb_checkpoint_age; ulint innodb_checkpoint_age;
ulint innodb_checkpoint_max_age; ulint innodb_checkpoint_max_age;
ulint innodb_data_pending_reads; /*!< Pending reads */ ulint innodb_data_pending_reads; /*!< Pending reads */
......
...@@ -1173,14 +1173,6 @@ ATTRIBUTE_COLD void logs_empty_and_mark_files_at_shutdown() ...@@ -1173,14 +1173,6 @@ ATTRIBUTE_COLD void logs_empty_and_mark_files_at_shutdown()
if (!buf_pool.is_initialised()) { if (!buf_pool.is_initialised()) {
ut_ad(!srv_was_started); ut_ad(!srv_was_started);
} else if (ulint pending_io = buf_pool.io_pending()) {
if (srv_print_verbose_log && count > 600) {
ib::info() << "Waiting for " << pending_io << " buffer"
" page I/Os to complete";
count = 0;
}
goto loop;
} else { } else {
buf_flush_buffer_pool(); buf_flush_buffer_pool();
} }
......
...@@ -909,7 +909,7 @@ static monitor_info_t innodb_counter_info[] = ...@@ -909,7 +909,7 @@ static monitor_info_t innodb_counter_info[] =
MONITOR_DEFAULT_START, MONITOR_MODULE_INDEX}, MONITOR_DEFAULT_START, MONITOR_MODULE_INDEX},
{"index_page_splits", "index", "Number of index page splits", {"index_page_splits", "index", "Number of index page splits",
MONITOR_NONE, MONITOR_EXISTING,
MONITOR_DEFAULT_START, MONITOR_INDEX_SPLIT}, MONITOR_DEFAULT_START, MONITOR_INDEX_SPLIT},
{"index_page_merge_attempts", "index", {"index_page_merge_attempts", "index",
...@@ -1411,10 +1411,12 @@ srv_mon_process_existing_counter( ...@@ -1411,10 +1411,12 @@ srv_mon_process_existing_counter(
/* Get the value from corresponding global variable */ /* Get the value from corresponding global variable */
switch (monitor_id) { switch (monitor_id) {
/* export_vars.innodb_buffer_pool_reads. Num Reads from case MONITOR_INDEX_SPLIT:
disk (page not in buffer) */ value = buf_pool.pages_split;
break;
case MONITOR_OVLD_BUF_POOL_READS: case MONITOR_OVLD_BUF_POOL_READS:
value = srv_stats.buf_pool_reads; value = buf_pool.stat.n_pages_read;
break; break;
/* innodb_buffer_pool_read_requests, the number of logical /* innodb_buffer_pool_read_requests, the number of logical
...@@ -1475,7 +1477,7 @@ srv_mon_process_existing_counter( ...@@ -1475,7 +1477,7 @@ srv_mon_process_existing_counter(
/* innodb_buffer_pool_bytes_dirty */ /* innodb_buffer_pool_bytes_dirty */
case MONITOR_OVLD_BUF_POOL_BYTES_DIRTY: case MONITOR_OVLD_BUF_POOL_BYTES_DIRTY:
value = buf_pool.stat.flush_list_bytes; value = buf_pool.flush_list_bytes;
break; break;
/* innodb_buffer_pool_pages_free */ /* innodb_buffer_pool_pages_free */
......
...@@ -675,6 +675,7 @@ void srv_boot() ...@@ -675,6 +675,7 @@ void srv_boot()
if (transactional_lock_enabled()) if (transactional_lock_enabled())
sql_print_information("InnoDB: Using transactional memory"); sql_print_information("InnoDB: Using transactional memory");
#endif #endif
buf_dblwr.init();
srv_thread_pool_init(); srv_thread_pool_init();
trx_pool_init(); trx_pool_init();
srv_init(); srv_init();
...@@ -1001,59 +1002,22 @@ srv_export_innodb_status(void) ...@@ -1001,59 +1002,22 @@ srv_export_innodb_status(void)
export_vars.innodb_data_writes = os_n_file_writes; export_vars.innodb_data_writes = os_n_file_writes;
ulint dblwr = 0; buf_dblwr.lock();
ulint dblwr = buf_dblwr.submitted();
if (buf_dblwr.is_initialised()) { export_vars.innodb_dblwr_pages_written = buf_dblwr.written();
buf_dblwr.lock(); export_vars.innodb_dblwr_writes = buf_dblwr.batches();
dblwr = buf_dblwr.submitted(); buf_dblwr.unlock();
export_vars.innodb_dblwr_pages_written = buf_dblwr.written();
export_vars.innodb_dblwr_writes = buf_dblwr.batches();
buf_dblwr.unlock();
}
export_vars.innodb_data_written = srv_stats.data_written + dblwr; export_vars.innodb_data_written = srv_stats.data_written + dblwr;
export_vars.innodb_buffer_pool_read_requests
= buf_pool.stat.n_page_gets;
export_vars.innodb_buffer_pool_write_requests = export_vars.innodb_buffer_pool_write_requests =
srv_stats.buf_pool_write_requests; srv_stats.buf_pool_write_requests;
export_vars.innodb_buffer_pool_reads = srv_stats.buf_pool_reads;
export_vars.innodb_buffer_pool_read_ahead_rnd =
buf_pool.stat.n_ra_pages_read_rnd;
export_vars.innodb_buffer_pool_read_ahead =
buf_pool.stat.n_ra_pages_read;
export_vars.innodb_buffer_pool_read_ahead_evicted =
buf_pool.stat.n_ra_pages_evicted;
export_vars.innodb_buffer_pool_pages_data =
UT_LIST_GET_LEN(buf_pool.LRU);
export_vars.innodb_buffer_pool_bytes_data = export_vars.innodb_buffer_pool_bytes_data =
buf_pool.stat.LRU_bytes buf_pool.stat.LRU_bytes
+ (UT_LIST_GET_LEN(buf_pool.unzip_LRU) + (UT_LIST_GET_LEN(buf_pool.unzip_LRU)
<< srv_page_size_shift); << srv_page_size_shift);
export_vars.innodb_buffer_pool_pages_dirty =
UT_LIST_GET_LEN(buf_pool.flush_list);
export_vars.innodb_buffer_pool_pages_made_young
= buf_pool.stat.n_pages_made_young;
export_vars.innodb_buffer_pool_pages_made_not_young
= buf_pool.stat.n_pages_not_made_young;
export_vars.innodb_buffer_pool_pages_old = buf_pool.LRU_old_len;
export_vars.innodb_buffer_pool_bytes_dirty =
buf_pool.stat.flush_list_bytes;
export_vars.innodb_buffer_pool_pages_free =
UT_LIST_GET_LEN(buf_pool.free);
#ifdef UNIV_DEBUG #ifdef UNIV_DEBUG
export_vars.innodb_buffer_pool_pages_latched = export_vars.innodb_buffer_pool_pages_latched =
buf_get_latched_pages_number(); buf_get_latched_pages_number();
......
...@@ -1997,7 +1997,7 @@ void innodb_shutdown() ...@@ -1997,7 +1997,7 @@ void innodb_shutdown()
ut_ad(dict_sys.is_initialised() || !srv_was_started); ut_ad(dict_sys.is_initialised() || !srv_was_started);
ut_ad(trx_sys.is_initialised() || !srv_was_started); ut_ad(trx_sys.is_initialised() || !srv_was_started);
ut_ad(buf_dblwr.is_initialised() || !srv_was_started ut_ad(buf_dblwr.is_created() || !srv_was_started
|| srv_read_only_mode || srv_read_only_mode
|| srv_force_recovery >= SRV_FORCE_NO_TRX_UNDO); || srv_force_recovery >= SRV_FORCE_NO_TRX_UNDO);
ut_ad(lock_sys.is_initialised() || !srv_was_started); ut_ad(lock_sys.is_initialised() || !srv_was_started);
......
...@@ -181,7 +181,7 @@ compress_pages_page_decompressed compression 0 NULL NULL NULL 0 NULL NULL NULL N ...@@ -181,7 +181,7 @@ compress_pages_page_decompressed compression 0 NULL NULL NULL 0 NULL NULL NULL N
compress_pages_page_compression_error compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of page compression errors compress_pages_page_compression_error compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of page compression errors
compress_pages_encrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages encrypted compress_pages_encrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages encrypted
compress_pages_decrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages decrypted compress_pages_decrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages decrypted
index_page_splits index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page splits index_page_splits index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 status_counter Number of index page splits
index_page_merge_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page merge attempts index_page_merge_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page merge attempts
index_page_merge_successful index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of successful index page merges index_page_merge_successful index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of successful index page merges
index_page_reorg_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page reorganization attempts index_page_reorg_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page reorganization attempts
......