    MDEV-20605 Awaken transaction can miss inserted by other transaction records... · 20e9e804
    Vlad Lesin authored
    MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration
    
    sel_restore_position_for_mysql() moves the persistent cursor forward
    after the btr_pcur_restore_position() call if the cursor's relative
    position is BTR_PCUR_ON and the cursor points to a record whose field
    values differ from those of the stored record (plus some other
    conditions that are not important for this case).
    
    This was done because btr_pcur_restore_position() sets
    page_cur_mode_t mode to PAGE_CUR_LE for cursor->rel_pos == BTR_PCUR_ON
    before opening the cursor, so the search looks for a record less than
    or equal to the stored one. If the found record is not equal to the
    stored one, then it is less, and the cursor must be moved forward.
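
    The less-or-equal search and the forward move can be sketched with a
    toy model. This is only an illustration: Rec, Index, search_le() and
    restore_on() are hypothetical names, a sorted std::vector stands in
    for a B-tree leaf, and none of it is InnoDB code.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Toy index record: (unique key, remaining fields), ordered
// lexicographically. Hypothetical model, not an InnoDB type.
using Rec = std::pair<int, int>;
using Index = std::vector<Rec>;  // must be kept sorted

// PAGE_CUR_LE-like search: position of the last record <= target,
// or -1 if every record is greater than target.
int search_le(const Index& index, const Rec& target) {
  auto it = std::upper_bound(index.begin(), index.end(), target);
  return static_cast<int>(it - index.begin()) - 1;
}

struct Restored {
  int pos;                // cursor position after restoration
  bool needs_processing;  // true: the record at pos was not visited yet
};

// Restore a cursor that was stored ON `stored` during a forward scan.
Restored restore_on(const Index& index, const Rec& stored) {
  int pos = search_le(index, stored);
  if (pos >= 0 && index[pos] == stored) {
    // Same record found: it was already visited before the suspension.
    return {pos, false};
  }
  // The found record is less than the stored one (or there is none),
  // so step forward to the first record greater than the stored one.
  return {pos + 1, true};
}
```

    In this model, if the stored record (10, 5) is purged from
    {(5, 0), (10, 5), (20, 0)}, the search lands on (5, 0) and the
    forward move correctly continues the scan at (20, 0).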
    
    But the stored record can be purged and a new record with the same
    key but a different value inserted while row_search_mvcc() is
    suspended. In this case, when the thread is awakened, it invokes
    sel_restore_position_for_mysql(), which in turn invokes
    btr_pcur_restore_position(), which returns false because the found
    record does not match the stored record, and
    sel_restore_position_for_mysql() moves the cursor forward.
    
    As a result, the awakened row_search_mvcc() may not see records
    inserted by other transactions while it slept. The mtr test case
    shows an example of how this can happen.
    
    The fix is to return a special value from the persistent cursor
    restoring function to notify its caller that the unique fields of the
    restored record and the stored record are the same; in this case
    sel_restore_position_for_mysql() does not move the cursor forward.
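
    A minimal sketch of the idea, again on a toy model rather than on
    InnoDB structures. RestoreStatus, restore_position() and
    next_to_process() are hypothetical names chosen for illustration;
    the first element of Rec plays the role of the unique fields.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Rec = std::pair<int, int>;  // (unique key, remaining fields)
using Index = std::vector<Rec>;   // must be kept sorted

// Hypothetical status values; the names are illustrative only.
enum class RestoreStatus { SameAll, SameUniq, NotSame };

struct Restored {
  int pos;  // position found by the less-or-equal search, -1 if none
  RestoreStatus status;
};

// Less-or-equal search for the stored record, plus a comparison that
// distinguishes "all fields match" from "only the unique fields match".
Restored restore_position(const Index& index, const Rec& stored) {
  auto it = std::upper_bound(index.begin(), index.end(), stored);
  int pos = static_cast<int>(it - index.begin()) - 1;
  if (pos < 0) return {pos, RestoreStatus::NotSame};
  if (index[pos] == stored) return {pos, RestoreStatus::SameAll};
  if (index[pos].first == stored.first)
    return {pos, RestoreStatus::SameUniq};
  return {pos, RestoreStatus::NotSame};
}

// Position of the next record a forward scan should process.
// With fixed == false the caller ignores SameUniq (the old behavior).
int next_to_process(const Index& index, const Rec& stored, bool fixed) {
  Restored r = restore_position(index, stored);
  if (fixed && r.status == RestoreStatus::SameUniq) {
    // A new record with the same unique fields replaced the purged
    // one: stay on it so that it gets processed.
    return r.pos;
  }
  // SameAll: the stored record was already processed, continue after it.
  // NotSame: the found record is smaller, so move forward.
  return r.pos + 1;
}
```

    With the stored record (10, 5) purged and (10, 3) inserted in its
    place, the old logic in this model continues at the record after
    (10, 3) and so skips it, while the fixed logic stays on (10, 3).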
    
    Delete-marked records are correctly processed in row_search_mvcc().
    Non-unique secondary indexes are "uniquified" by adding the PK, so
    index->n_uniq should then be index->n_fields. So there is no need for
    additional checks in the fix.
    
    If the transaction's read view can't see the changes made in a
    secondary index record, row_search_mvcc() requests the clustered index
    record to check its transaction id and get the corresponding record
    version. After this, row_search_mvcc() commits the mtr to preserve the
    clustered index latching order and starts a new mtr. Between that mtr
    commit and start, the secondary index pages are unlatched, and purge
    can remove the record stored in the cursor. This causes duplicate rows
    in the result set for non-locking reads, as the cursor position is
    restored to the previously visited record.
    
    To solve this, the changes are simply switched off for non-locking
    reads. It is quite a simple solution; besides, the changes don't make
    sense for non-locking reads.
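
    The switch can be pictured as one extra condition on the lock type.
    The sketch below is an assumption about the shape of the check, not
    the actual code: LockType, RestoreStatus and
    keep_cursor_on_same_uniq() are hypothetical stand-ins.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are not the InnoDB definitions.
enum class LockType { None, Shared, Exclusive };
enum class RestoreStatus { SameAll, SameUniq, NotSame };

// True if the restored cursor should stay on a record that only shares
// its unique fields with the stored one. Non-locking reads
// (LockType::None) keep the old behavior and move forward instead.
bool keep_cursor_on_same_uniq(RestoreStatus status, LockType lock) {
  return status == RestoreStatus::SameUniq && lock != LockType::None;
}
```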
    
    A more complex solution, and a more effective one from the performance
    perspective, is to create an mtr savepoint before requesting the
    clustered index record and to roll back to that savepoint afterwards.
    See MDEV-27557.
    
    One more solution is to have a per-record transaction id for secondary
    indexes. See MDEV-17598.
    
    If any of those is implemented, just remove the select_lock_type
    argument of sel_restore_position_for_mysql().
btr0pcur.h 22 KB