MDEV-34458 wait_for_read in buf_page_get_low hurts performance
The performance regression seen while loading BP is caused by the deadlock fix given in MDEV-33543. The area of impact is wider but is more visible when BP is being loaded initially via DMLs. Specifically the response time could be impacted in DML doing pessimistic operation on index(split/merge) and the leaf pages are not found in buffer pool. It is more likely to occur with small BP size. The origin of the issue dates back to MDEV-30400 that introduced btr_cur_t::search_leaf() replacing btr_cur_search_to_nth_level() for leaf page searches. In btr_latch_prev, we use RW_NO_LATCH to get the previous page fixed in BP without latching. When the page is not in BP, we try to acquire and wait for S latch violating the latching order. This deadlock was analyzed in MDEV-33543 and fixed by using the already present wait logic in buf_page_get_gen() instead of waiting for latch. The wait logic is inferior to usual S latch wait and is simply a repeated sleep 100 of micro-sec (The actual sleep time could be more depending on platforms). The bug was seen with "change-buffering" code path and the idea was that this path should be less exercised. The judgement was not correct and the path is actually quite frequent and does impact performance when pages are not in BP and being loaded by DML expanding/shrinking large data. Fix: While trying to get a page with RW_NO_LATCH and we are attempting "out of order" latch, return from buf_page_get_gen immediately instead of waiting and follow the ordered latching path.
Showing
Please register or sign in to comment