MDEV-34458 wait_for_read in buf_page_get_low hurts performance

The performance regression seen while loading BP is caused by the
deadlock fix given in MDEV-33543. The area of impact is wider but is
more visible when BP is being loaded initially via DMLs.  Specifically
the response time could be impacted in DML doing pessimistic operation
on index(split/merge) and the leaf pages are not found in buffer pool.
It is more likely to occur with small BP size.

The origin of the issue dates back to MDEV-30400 that introduced
btr_cur_t::search_leaf() replacing btr_cur_search_to_nth_level() for
leaf page searches. In btr_latch_prev, we use RW_NO_LATCH to get the
previous page fixed in BP without latching. When the page is not in BP,
we try to acquire and wait for S latch violating the latching order.

This deadlock was analyzed in MDEV-33543 and fixed by using the already
present wait logic in buf_page_get_gen() instead of waiting for latch.
The wait logic is inferior to usual S latch wait and is simply a
repeated sleep 100 of micro-sec (The actual sleep time could be more
depending on platforms). The bug was seen with "change-buffering" code
path and the idea was that this path should be less exercised. The
judgement was not correct and the path is actually quite frequent and
does impact performance when pages are not in BP and being loaded by
DML expanding/shrinking large data.

Fix: While trying to get a page with RW_NO_LATCH and we are attempting
"out of order" latch, return from buf_page_get_gen immediately instead
of waiting and follow the ordered latching path.
parent dcd8a648
...@@ -1262,7 +1262,7 @@ void btr_drop_temporary_table(const dict_table_t &table) ...@@ -1262,7 +1262,7 @@ void btr_drop_temporary_table(const dict_table_t &table)
{ {
if (buf_block_t *block= buf_page_get_low({SRV_TMP_SPACE_ID, index->page}, 0, if (buf_block_t *block= buf_page_get_low({SRV_TMP_SPACE_ID, index->page}, 0,
RW_X_LATCH, nullptr, BUF_GET, &mtr, RW_X_LATCH, nullptr, BUF_GET, &mtr,
nullptr, false)) nullptr, false, nullptr))
{ {
btr_free_but_not_root(block, MTR_LOG_NO_REDO); btr_free_but_not_root(block, MTR_LOG_NO_REDO);
mtr.set_log_mode(MTR_LOG_NO_REDO); mtr.set_log_mode(MTR_LOG_NO_REDO);
......
...@@ -971,12 +971,21 @@ static int btr_latch_prev(buf_block_t *block, page_id_t page_id, ...@@ -971,12 +971,21 @@ static int btr_latch_prev(buf_block_t *block, page_id_t page_id,
buffer-fixes on both blocks will prevent eviction. */ buffer-fixes on both blocks will prevent eviction. */
retry: retry:
/* Pass no_wait pointer to ensure that we don't wait on the current page
latch while holding the next page latch to avoid latch ordering violation. */
bool no_wait= false;
int ret= 1;
buf_block_t *prev= buf_page_get_gen(page_id, zip_size, RW_NO_LATCH, nullptr, buf_block_t *prev= buf_page_get_gen(page_id, zip_size, RW_NO_LATCH, nullptr,
BUF_GET, mtr, err, false); BUF_GET, mtr, err, false, &no_wait);
if (UNIV_UNLIKELY(!prev)) if (UNIV_UNLIKELY(!prev))
{
/* Check if we had to return because we couldn't wait on latch. */
if (no_wait)
goto ordered_latch;
return 0; return 0;
}
int ret= 1;
static_assert(MTR_MEMO_PAGE_S_FIX == mtr_memo_type_t(BTR_SEARCH_LEAF), ""); static_assert(MTR_MEMO_PAGE_S_FIX == mtr_memo_type_t(BTR_SEARCH_LEAF), "");
static_assert(MTR_MEMO_PAGE_X_FIX == mtr_memo_type_t(BTR_MODIFY_LEAF), ""); static_assert(MTR_MEMO_PAGE_X_FIX == mtr_memo_type_t(BTR_MODIFY_LEAF), "");
...@@ -996,6 +1005,7 @@ static int btr_latch_prev(buf_block_t *block, page_id_t page_id, ...@@ -996,6 +1005,7 @@ static int btr_latch_prev(buf_block_t *block, page_id_t page_id,
{ {
ut_ad(mtr->at_savepoint(mtr->get_savepoint() - 1)->page.id() == page_id); ut_ad(mtr->at_savepoint(mtr->get_savepoint() - 1)->page.id() == page_id);
mtr->release_last_page(); mtr->release_last_page();
ordered_latch:
if (rw_latch == RW_S_LATCH) if (rw_latch == RW_S_LATCH)
block->page.lock.s_unlock(); block->page.lock.s_unlock();
else else
......
...@@ -2416,6 +2416,9 @@ BUF_PEEK_IF_IN_POOL, or BUF_GET_IF_IN_POOL_OR_WATCH ...@@ -2416,6 +2416,9 @@ BUF_PEEK_IF_IN_POOL, or BUF_GET_IF_IN_POOL_OR_WATCH
while reading the page from file while reading the page from file
then it makes sure that it does merging of change buffer changes while then it makes sure that it does merging of change buffer changes while
reading the page from file. reading the page from file.
@param[in,out] no_wait If not NULL on input, then we must not
wait for current page latch. On output, the value is set to true if we had to
return because we could not wait on page latch.
@return pointer to the block or NULL */ @return pointer to the block or NULL */
TRANSACTIONAL_TARGET TRANSACTIONAL_TARGET
buf_block_t* buf_block_t*
...@@ -2427,7 +2430,8 @@ buf_page_get_low( ...@@ -2427,7 +2430,8 @@ buf_page_get_low(
ulint mode, ulint mode,
mtr_t* mtr, mtr_t* mtr,
dberr_t* err, dberr_t* err,
bool allow_ibuf_merge) bool allow_ibuf_merge,
bool* no_wait)
{ {
unsigned access_time; unsigned access_time;
ulint retries = 0; ulint retries = 0;
...@@ -2584,14 +2588,17 @@ buf_page_get_low( ...@@ -2584,14 +2588,17 @@ buf_page_get_low(
in buf_page_t::read_complete() or in buf_page_t::read_complete() or
buf_pool_t::corrupted_evict(), or buf_pool_t::corrupted_evict(), or
after buf_zip_decompress() in this function. */ after buf_zip_decompress() in this function. */
if (rw_latch != RW_NO_LATCH) { if (!no_wait) {
block->page.lock.s_lock(); block->page.lock.s_lock();
} else if (!block->page.lock.s_lock_try()) { } else if (!block->page.lock.s_lock_try()) {
/* For RW_NO_LATCH, we should not try to acquire S or X ut_ad(rw_latch == RW_NO_LATCH);
latch directly as we could be violating the latching /* We should not wait trying to acquire S latch for
order resulting in deadlock. Instead we try latching the current page while holding latch for the next page.
page and retry in case of a failure. */ It would violate the latching order resulting in
goto wait_for_read; possible deadlock. Caller must handle the failure. */
block->page.unfix();
*no_wait= true;
return nullptr;
} }
state = block->page.state(); state = block->page.state();
ut_ad(state < buf_page_t::READ_FIX ut_ad(state < buf_page_t::READ_FIX
...@@ -2647,7 +2654,6 @@ buf_page_get_low( ...@@ -2647,7 +2654,6 @@ buf_page_get_low(
if (UNIV_UNLIKELY(!block->page.frame)) { if (UNIV_UNLIKELY(!block->page.frame)) {
if (!block->page.lock.x_lock_try()) { if (!block->page.lock.x_lock_try()) {
wait_for_unzip: wait_for_unzip:
wait_for_read:
/* The page is being read or written, or /* The page is being read or written, or
another thread is executing buf_zip_decompress() another thread is executing buf_zip_decompress()
in buf_page_get_low() on it. */ in buf_page_get_low() on it. */
...@@ -2936,6 +2942,9 @@ BUF_PEEK_IF_IN_POOL, or BUF_GET_IF_IN_POOL_OR_WATCH ...@@ -2936,6 +2942,9 @@ BUF_PEEK_IF_IN_POOL, or BUF_GET_IF_IN_POOL_OR_WATCH
@param[out] err DB_SUCCESS or error code @param[out] err DB_SUCCESS or error code
@param[in] allow_ibuf_merge Allow change buffer merge while @param[in] allow_ibuf_merge Allow change buffer merge while
reading the pages from file. reading the pages from file.
@param[in,out] no_wait If not NULL on input, then we must not
wait for current page latch. On output, the value is set to true if we had to
return because we could not wait on page latch.
@return pointer to the block or NULL */ @return pointer to the block or NULL */
buf_block_t* buf_block_t*
buf_page_get_gen( buf_page_get_gen(
...@@ -2946,12 +2955,14 @@ buf_page_get_gen( ...@@ -2946,12 +2955,14 @@ buf_page_get_gen(
ulint mode, ulint mode,
mtr_t* mtr, mtr_t* mtr,
dberr_t* err, dberr_t* err,
bool allow_ibuf_merge) bool allow_ibuf_merge,
bool* no_wait)
{ {
buf_block_t *block= recv_sys.recover(page_id); buf_block_t *block= recv_sys.recover(page_id);
if (UNIV_LIKELY(!block)) if (UNIV_LIKELY(!block))
return buf_page_get_low(page_id, zip_size, rw_latch, return buf_page_get_low(page_id, zip_size, rw_latch,
guess, mode, mtr, err, allow_ibuf_merge); guess, mode, mtr, err, allow_ibuf_merge,
no_wait);
else if (UNIV_UNLIKELY(block == reinterpret_cast<buf_block_t*>(-1))) else if (UNIV_UNLIKELY(block == reinterpret_cast<buf_block_t*>(-1)))
{ {
corrupted: corrupted:
......
...@@ -209,6 +209,9 @@ BUF_PEEK_IF_IN_POOL, or BUF_GET_IF_IN_POOL_OR_WATCH ...@@ -209,6 +209,9 @@ BUF_PEEK_IF_IN_POOL, or BUF_GET_IF_IN_POOL_OR_WATCH
@param[out] err DB_SUCCESS or error code @param[out] err DB_SUCCESS or error code
@param[in] allow_ibuf_merge Allow change buffer merge while @param[in] allow_ibuf_merge Allow change buffer merge while
reading the pages from file. reading the pages from file.
@param[in,out] no_wait If not NULL on input, then we must not
wait for current page latch. On output, the value is set to true if we had to
return because we could not wait on page latch.
@return pointer to the block or NULL */ @return pointer to the block or NULL */
buf_block_t* buf_block_t*
buf_page_get_gen( buf_page_get_gen(
...@@ -219,7 +222,8 @@ buf_page_get_gen( ...@@ -219,7 +222,8 @@ buf_page_get_gen(
ulint mode, ulint mode,
mtr_t* mtr, mtr_t* mtr,
dberr_t* err = NULL, dberr_t* err = NULL,
bool allow_ibuf_merge = false) bool allow_ibuf_merge = false,
bool* no_wait = nullptr)
MY_ATTRIBUTE((nonnull(6), warn_unused_result)); MY_ATTRIBUTE((nonnull(6), warn_unused_result));
/** This is the low level function used to get access to a database page. /** This is the low level function used to get access to a database page.
...@@ -236,6 +240,11 @@ BUF_PEEK_IF_IN_POOL, or BUF_GET_IF_IN_POOL_OR_WATCH ...@@ -236,6 +240,11 @@ BUF_PEEK_IF_IN_POOL, or BUF_GET_IF_IN_POOL_OR_WATCH
while reading the page from file while reading the page from file
then it makes sure that it does merging of change buffer changes while then it makes sure that it does merging of change buffer changes while
reading the page from file. reading the page from file.
@param[in] holds_next_page_latch True if caller holds next page latch.
We must not wait for current page latch.
@param[in,out] no_wait If not NULL on input, then we must not
wait for current page latch. On output, the value is set to true if we had to
return because we could not wait on page latch.
@return pointer to the block or NULL */ @return pointer to the block or NULL */
buf_block_t* buf_block_t*
buf_page_get_low( buf_page_get_low(
...@@ -246,7 +255,8 @@ buf_page_get_low( ...@@ -246,7 +255,8 @@ buf_page_get_low(
ulint mode, ulint mode,
mtr_t* mtr, mtr_t* mtr,
dberr_t* err, dberr_t* err,
bool allow_ibuf_merge); bool allow_ibuf_merge,
bool* no_wait);
/** Initialize a page in the buffer pool. The page is usually not read /** Initialize a page in the buffer pool. The page is usually not read
from a file even if it cannot be found in the buffer buf_pool. This is one from a file even if it cannot be found in the buffer buf_pool. This is one
......
...@@ -2165,7 +2165,7 @@ dberr_t PageConverter::operator()(buf_block_t* block) UNIV_NOTHROW ...@@ -2165,7 +2165,7 @@ dberr_t PageConverter::operator()(buf_block_t* block) UNIV_NOTHROW
we no longer evict the pages on DISCARD TABLESPACE. */ we no longer evict the pages on DISCARD TABLESPACE. */
buf_page_get_low(block->page.id(), get_zip_size(), RW_NO_LATCH, buf_page_get_low(block->page.id(), get_zip_size(), RW_NO_LATCH,
nullptr, BUF_PEEK_IF_IN_POOL, nullptr, BUF_PEEK_IF_IN_POOL,
nullptr, nullptr, false); nullptr, nullptr, false, nullptr);
uint16_t page_type; uint16_t page_type;
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment