MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool
By design, InnoDB has always hung when permanently running out of buffer pool, for example when several threads are waiting to allocate a block, and all of the buffer pool is buffer-fixed by the active threads. The hang that we are fixing here occurs when the buffer pool is only temporarily running out and the situation could be rescued by writing out some dirty pages or evicting some clean pages. buf_LRU_get_free_block(): Simplify the way how we wait for the buf_flush_page_cleaner thread. This fixes occasional hangs of the test encryption.innochecksum that were introduced by commit a55b951e (MDEV-26827). To play it safe, we use a timed wait when waiting for the buf_flush_page_cleaner() thread to perform its job. Should that thread get stuck, we will invoke buf_pool.LRU_warn() in order to display a message that pages could not be freed, and keep trying to wake up the buf_flush_page_cleaner() thread. The INFORMATION_SCHEMA.INNODB_METRICS counters buffer_LRU_single_flush_failure_count and buffer_LRU_get_free_waits will be removed. The latter is represented by buffer_pool_wait_free. Also removed will be the message "InnoDB: Difficult to find free blocks in the buffer pool" because in d34479dc we introduced a more precise message "InnoDB: Could not free any blocks in the buffer pool" in the buf_flush_page_cleaner thread. buf_pool_t::LRU_warn(): Issue the warning message that we could not free any blocks in the buffer pool. This may also be invoked by buf_LRU_get_free_block() if buf_flush_page_cleaner() appears to be stuck. buf_pool_t::n_flush_dec(): Remove. buf_pool_t::n_flush_dec_holding_mutex(): Rename to n_flush_dec(). buf_flush_LRU_list_batch(): Increment the eviction counter for blocks of temporary, discarded or dropped tablespaces. buf_flush_LRU(): Make static, and remove the constant parameter evict=false. The only caller will be the buf_flush_page_cleaner() thread. IORequest::is_LRU(): Remove. The only case of evicting pages on write completion will be when we are writing out pages of the temporary tablespace. Those pages are not in buf_pool.flush_list, only in buf_pool.LRU. buf_page_t::flush(): Remove the parameter evict. buf_page_t::write_complete(): Change the parameter "bool temporary" to "bool persistent" and add a parameter for an already read state(). Reviewed by: Debarun Banerjee
Showing
This diff is collapsed.
Please register or sign in to comment