1. 21 Oct, 2022 1 commit
    • MDEV-24402: InnoDB CHECK TABLE ... EXTENDED · ab019010
      Marko Mäkelä authored
      Until now, the attribute EXTENDED of CHECK TABLE was ignored by InnoDB,
      and InnoDB only counted the records in each index according
      to the current read view. Unless the attribute QUICK was specified, the
      function btr_validate_index() would be invoked to validate the B-tree
      structure (the sibling and child links between index pages).
      
      The EXTENDED check will not only count all index records according to the
      current read view, but also ensure that any delete-marked records in the
      clustered index are waiting for the purge of history, and that all
      secondary index records point to a version of the clustered index record
      that is waiting for the purge of history. In other words, no index may
      contain orphan records. Normal MVCC reads and the non-EXTENDED version
      of CHECK TABLE would ignore these orphans.
      
      Unpurged records merely result in warnings (at most one per index),
      not errors, and no indexes will be flagged as corrupted due to such
      garbage. It will remain possible to SELECT data from such indexes or
      tables (which will skip such records) or to rebuild the table to
      reclaim some space.
      
      We introduce purge_sys.end_view that will be (almost) a copy of
      purge_sys.view at the end of a batch of purging committed transaction
      history. It is not an exact copy, because if the size of a purge batch
      is limited by innodb_purge_batch_size, some records that
      purge_sys.view would allow to be purged will be left over for
      subsequent batches.
      
      The purge_sys.view is relevant in the purge of committed transaction
      history, to determine if records are safe to remove. The new
      purge_sys.end_view is relevant in MVCC operations and in
      CHECK TABLE ... EXTENDED. It tells which undo log records are
      safe to access (have not been discarded at the end of a purge batch).
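      
      The relationship between the two views can be illustrated with a toy
      model. This is only a sketch under strong assumptions: ToyView,
      ToyPurgeSys, run_batch() and undo_may_be_discarded() are hypothetical
      names, a view is reduced to a single transaction ID limit, and the
      history is a plain queue. It is not the InnoDB implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, heavily simplified read view: every transaction ID
// below `limit` counts as visible to the view.
struct ToyView {
  uint64_t limit = 0;
  bool changes_visible(uint64_t trx_id) const { return trx_id < limit; }
};

// Hypothetical stand-in for purge_sys with its two views.
struct ToyPurgeSys {
  ToyView view;                  // what purge is allowed to remove
  ToyView end_view;              // what purge has actually finished removing
  std::deque<uint64_t> history;  // committed trx IDs, ascending, unpurged

  // Purge at most batch_size entries that `view` allows, then record
  // in end_view how far the batch really got.
  void run_batch(std::size_t batch_size) {
    std::size_t done = 0;
    while (done < batch_size && !history.empty() &&
           view.changes_visible(history.front())) {
      history.pop_front();
      ++done;
    }
    // If purgeable history was left over because of the batch limit,
    // end_view stops right before it; otherwise it catches up with view.
    if (!history.empty() && view.changes_visible(history.front()))
      end_view.limit = history.front();
    else
      end_view.limit = view.limit;
    assert(end_view.limit <= view.limit);
  }

  // True if the undo log written by trx_id may already be gone,
  // so a reader must not try to access it.
  bool undo_may_be_discarded(uint64_t trx_id) const {
    return end_view.changes_visible(trx_id);
  }
};
```

      In this model, a batch that is cut short by its size limit leaves
      end_view behind view, which is the "almost a copy" situation described
      above.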
      
      purge_sys.clone_oldest_view<true>(): In trx_lists_init_at_db_start(),
      clone the oldest read view similar to purge_sys_t::clone_end_view()
      so that CHECK TABLE ... EXTENDED will not report bogus failures between
      InnoDB restart and the completed purge of committed transaction history.
      
      purge_sys_t::is_purgeable(): Replaces purge_sys_t::changes_visible()
      in the case that purge_sys.latch will not be held by the caller.
      Among other things, this guards access to BLOBs. It is not safe to
      dereference any BLOBs of a delete-marked purgeable record, because
      they may have already been freed.
      
      purge_sys_t::view_guard::view(): Return a reference to purge_sys.view
      that will be protected by purge_sys.latch, held by purge_sys_t::view_guard.
      
      purge_sys_t::end_view_guard::view(): Return a reference to
      purge_sys.end_view while it is protected by purge_sys.end_latch.
      Whenever a thread needs to retrieve an older version of a clustered
      index record, it will hold a page latch on the clustered index page
      and potentially also on a secondary index page that points to the
      clustered index page. If these pages contain purgeable records that
      would be accessed by a currently running purge batch, the progress of
      the purge batch would be blocked by the page latches. Hence, it is
      safe to make a copy of purge_sys.end_view while holding an index page
      latch, and consult the copy of the view to determine whether a record
      should already have been purged.
      
      btr_validate_index(): Remove a redundant check.
      
      row_check_index_match(): Check if a secondary index record and a
      version of a clustered index record match each other.
      
      row_check_index(): Replaces row_scan_index_for_mysql().
      Count the records in each index directly, duplicating the relevant
      logic from row_search_mvcc(). Initialize check_table_extended_view
      for CHECK ... EXTENDED while holding an index leaf page latch.
      If we encounter an orphan record, the copy of purge_sys.end_view that
      we make is safe for visibility checks, and trx_undo_get_undo_rec() will
      check for the safety to access each undo log record. Should that check
      fail, we should return DB_MISSING_HISTORY to report a corrupted index.
      The EXTENDED check tries to match each secondary index record with
      every available clustered index record version, by duplicating the logic
      of row_vers_build_for_consistent_read() and invoking
      trx_undo_prev_version_build() directly.
      
      Before invoking row_check_index_match() on delete-marked clustered index
      record versions, we will consult purge_sys.is_purgeable() in order to
      avoid accessing freed BLOBs.
      
      We will always check that the DB_TRX_ID or PAGE_MAX_TRX_ID does not
      exceed the global maximum. Orphan secondary index records will be
      flagged only if everything up to PAGE_MAX_TRX_ID has been purged.
      We warn also about clustered index records whose nonzero DB_TRX_ID
      should have been reset in purge or rollback.
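      
      The decision rule for a secondary index record might be summarized
      roughly as follows. This is a sketch, not the code of row_check_index():
      ToyVerdict and toy_check_sec_rec() are hypothetical names, the purge
      progress is reduced to one number (purged_up_to), and treating the
      global maximum as an inclusive bound is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome of checking one secondary index record.
enum class ToyVerdict { ok, warn_orphan, corrupt_future_trx_id };

// purged_up_to is a hypothetical stand-in for purge_sys.end_view: the
// history of all transactions with ID < purged_up_to is assumed purged.
ToyVerdict toy_check_sec_rec(uint64_t page_max_trx_id,
                             uint64_t global_max_trx_id,
                             uint64_t purged_up_to,
                             bool matches_some_clust_version) {
  assert(purged_up_to <= global_max_trx_id);
  // An ID beyond the global maximum can never be valid (assumed inclusive).
  if (page_max_trx_id > global_max_trx_id)
    return ToyVerdict::corrupt_future_trx_id;
  if (matches_some_clust_version)
    return ToyVerdict::ok;
  // No matching version: it is an orphan only if purge can no longer
  // be expected to remove it.
  if (page_max_trx_id < purged_up_to)
    return ToyVerdict::warn_orphan;
  return ToyVerdict::ok;  // purge may still be about to clean this up
}
```

      The last branch is what keeps a record that is merely waiting for
      purge from being reported.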
      
      trx_set_rw_mode(): Move an assertion from ReadView::set_creator_trx_id().
      
      trx_undo_prev_version_build(): Remove two debug-only parameters,
      and return an error code instead of a Boolean.
      
      trx_undo_get_undo_rec(): Return a pointer to the undo log record,
      or nullptr if one cannot be retrieved. Instead of consulting the
      purge_sys.view, consult the purge_sys.end_view to determine which
      records can be accessed.
      
      trx_undo_get_rec_if_purgeable(): A variant of trx_undo_get_undo_rec()
      that will consult purge_sys.view instead of purge_sys.end_view.
      
      TRX_UNDO_CHECK_PURGEABILITY: A new parameter to
      trx_undo_prev_version_build(), passed by row_vers_old_has_index_entry()
      so that purge_sys.view instead of purge_sys.end_view will be consulted
      to determine whether a secondary index record may be safely purged.
      
      row_upd_changes_disowned_external(): Remove. This should be more
      expensive than briefly latching purge_sys in trx_undo_prev_version_build()
      (which may make use of transactional memory).
      
      row_sel_reset_old_vers_heap(): New function, split from
      row_sel_build_prev_vers_for_mysql().
      
      row_sel_build_prev_vers_for_mysql(): Reorder some parameters
      to simplify the call to row_sel_reset_old_vers_heap().
      
      row_search_for_mysql(): Replaced with direct calls to row_search_mvcc().
      
      sel_node_get_nth_plan(): Define inline in row0sel.h.
      
      open_step(): Define at the call site, in simplified form.
      
      sel_node_reset_cursor(): Merged with the only caller open_step().
      ---
      ReadViewBase::check_trx_id_sanity(): Remove.
      Let us handle "future" DB_TRX_ID in a more meaningful way:
      
      row_sel_clust_sees(): Return DB_SUCCESS if the record is visible,
      DB_SUCCESS_LOCKED_REC if it is invisible, and DB_CORRUPTION if
      the DB_TRX_ID is in the future.
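      
      The three-way result can be pictured with a small model. This is a
      sketch only: toy_err, ToyReadView and toy_clust_sees() are hypothetical,
      the enum values merely echo the names above and do not carry InnoDB's
      numeric codes, and "in the future" is assumed to mean "not less than the
      next transaction ID to be assigned".

```cpp
#include <cassert>
#include <cstdint>

// Toy mirror of the three outcomes named in the commit message.
enum toy_err {
  TOY_DB_SUCCESS,             // record version is visible
  TOY_DB_SUCCESS_LOCKED_REC,  // record version is not visible
  TOY_DB_CORRUPTION           // DB_TRX_ID is in the future
};

// Hypothetical one-number read view: IDs below low_limit_id are visible.
struct ToyReadView {
  uint64_t low_limit_id;
  bool changes_visible(uint64_t trx_id) const { return trx_id < low_limit_id; }
};

toy_err toy_clust_sees(uint64_t rec_trx_id, const ToyReadView &view,
                       uint64_t next_trx_id) {
  assert(view.low_limit_id <= next_trx_id);
  // A DB_TRX_ID that has not been assigned yet cannot be legitimate.
  if (rec_trx_id >= next_trx_id)
    return TOY_DB_CORRUPTION;
  return view.changes_visible(rec_trx_id) ? TOY_DB_SUCCESS
                                          : TOY_DB_SUCCESS_LOCKED_REC;
}
```

      Separating "invisible" from "corrupted" lets the caller fall back to
      an older version in the first case and report an error in the second.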
      
      row_undo_mod_must_purge(), row_undo_mod_clust(): Silently ignore
      corrupted DB_TRX_ID. We are in ROLLBACK, and we should have noticed
      that corruption when we were about to modify the record in the first
      place (leading us to refuse the operation).
      
      row_vers_build_for_consistent_read(): Return DB_CORRUPTION if
      DB_TRX_ID is in the future.
      
      Tested by: Matthias Leich
      Reviewed by: Vladislav Lesin
  2. 20 Oct, 2022 1 commit
  3. 18 Oct, 2022 1 commit
  4. 16 Oct, 2022 1 commit
  5. 15 Oct, 2022 2 commits
  6. 14 Oct, 2022 6 commits
  7. 13 Oct, 2022 6 commits
  8. 12 Oct, 2022 5 commits
    • MDEV-29753 An error is wrongly reported during INSERT with vcol index · 128356b4
      Nikita Malyavin authored
      See also commits aa8a31da and 64678c for a Bug #22990029 fix.
      
      In this scenario, INSERT chose to check whether delete-unmarking is
      available for a just-deleted record. To build an update vector, it needed
      to calculate the vcols as well. Since this INSERT was not IGNORE-flagged,
      the recalculation failed.
      
      Solution: temporarily set abort_on_warning=true while calculating the
      column for a delete-unmarked insert.
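      
      One common way to set a flag temporarily is a scope guard that restores
      the previous value on exit. The following is a sketch of that pattern
      only: ScopedFlag, ToyThd and toy_compute_for_delete_unmark() are
      hypothetical names, and the actual patch may save and restore the flag
      in a different way.

```cpp
#include <cassert>

// Hypothetical RAII guard: sets a bool and restores the old value
// when the guard goes out of scope, including on early return.
class ScopedFlag {
  bool &flag_;
  bool saved_;
public:
  ScopedFlag(bool &flag, bool value) : flag_(flag), saved_(flag) {
    flag_ = value;
  }
  ~ScopedFlag() { flag_ = saved_; }
  ScopedFlag(const ScopedFlag &) = delete;
  ScopedFlag &operator=(const ScopedFlag &) = delete;
};

// Hypothetical stand-in for the session object that owns the flag.
struct ToyThd {
  bool abort_on_warning = false;
};

// Returns the value of the flag that the computation would observe.
bool toy_compute_for_delete_unmark(ToyThd &thd) {
  ScopedFlag guard(thd.abort_on_warning, true);
  return thd.abort_on_warning;  // the column would be computed here
}

// Usage: the flag is set inside the call and restored afterwards.
void toy_scoped_flag_example() {
  ToyThd thd;
  assert(toy_compute_for_delete_unmark(thd));
  assert(!thd.abort_on_warning);
}
```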
    • MDEV-29299 SELECT from table with vcol index reports warning · 3cd2c1e8
      Nikita Malyavin authored
      Currently, InnoDB does not store a trx_id for each record in a secondary
      index. The idea is the following: store only a per-page max_trx_id, and
      delete-mark the records when they are deleted or updated.
      
      When a read starts, it remembers the lowest ID among the currently active
      transactions. InnoDB refers to it as trx->read_view->m_up_limit_id.
      See also ReadView::open.
      
      When the page is fetched, its max_trx_id is compared to m_up_limit_id.
      If the value is lower, and the secondary index record is not delete-marked,
      then the record is safe to read as is. Otherwise, the clustered index
      may need to be accessed. See the page_get_max_trx_id call in
      row_search_mvcc, and the corresponding
      switch (row_search_idx_cond_check(...)) below it.
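      
      The fast-path test described above boils down to one comparison and one
      flag. A minimal sketch, assuming the hypothetical helper name
      toy_sec_rec_safe_as_is() and plain integers in place of the real page
      and read view structures:

```cpp
#include <cassert>
#include <cstdint>

// True if a secondary index record can be returned without consulting
// the clustered index: no transaction that was still active when the
// read view was opened can have modified the page, and the record is
// not delete-marked.
bool toy_sec_rec_safe_as_is(uint64_t page_max_trx_id, uint64_t up_limit_id,
                            bool delete_marked) {
  return page_max_trx_id < up_limit_id && !delete_marked;
}

// Usage: only the first case avoids the clustered index lookup.
void toy_sec_rec_example() {
  assert(toy_sec_rec_safe_as_is(90, 100, false));
  assert(!toy_sec_rec_safe_as_is(90, 100, true));    // delete-marked
  assert(!toy_sec_rec_safe_as_is(100, 100, false));  // page changed too recently
}
```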
      
      Virtual columns must be recomputed if the record was delete-marked.
      The motivation is documented in
      Row_sel_get_clust_rec_for_mysql::operator() near the
      row_sel_sec_rec_is_for_clust_rec call.
      
      The above explains why virtual column computation can normally happen
      during SELECT and, more generally, during any access to a vcol index.
      
      Sometimes the statistics tables are updated by InnoDB. This starts a new
      transaction, which may not have finished by the time the SELECT is
      executed, forcing recomputation of the virtual columns. If the
      computation would normally produce a warning, such as division by zero,
      the warning could be emitted in a racy manner.
      
      The solution is to suppress the warnings when a column is computed
      for the described purpose.
      An ignore_warnings argument is added to innobase_get_computed_value().
      Currently, it is only true for a call from
      row_sel_sec_rec_is_for_clust_rec.
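      
      The effect of such a flag can be sketched with a toy computation. This
      is an illustration only: ToyDiagnostics and toy_compute_vcol() are
      hypothetical, the virtual column is assumed to be a / b, and the real
      innobase_get_computed_value() has a different signature.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-statement warning list.
struct ToyDiagnostics {
  std::vector<std::string> warnings;
};

// Computes a toy virtual column a / b. Division by zero yields no value
// (SQL NULL) and normally a warning; with ignore_warnings the warning
// is suppressed but the result is the same.
std::optional<long> toy_compute_vcol(long a, long b, ToyDiagnostics &diag,
                                     bool ignore_warnings) {
  if (b == 0) {
    if (!ignore_warnings)
      diag.warnings.push_back("Division by 0");
    return std::nullopt;
  }
  return a / b;
}

// Usage: an internal consistency check passes ignore_warnings=true so
// that the user's SELECT does not see a timing-dependent warning.
void toy_vcol_example() {
  ToyDiagnostics diag;
  assert(!toy_compute_vcol(1, 0, diag, true).has_value());
  assert(diag.warnings.empty());
}
```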
    • Merge 10.5 into 10.6 · a992c615
      Marko Mäkelä authored
    • Fixes after 10.4 --> 10.5 merge · 5fffdbc8
      Jan Lindström authored
      * MDEV-29142 : Ignore inconsistency warning as we kill cluster
      * galera_parallel_apply_3nodes : Disabled because it is unstable
      * MDEV-26597 : Add missing code
      * galera_sr.galera_sr_ws_size2 : Remove incorrect assertion
    • Merge 10.4 into 10.5 · 977c385d
      Marko Mäkelä authored
  9. 11 Oct, 2022 9 commits
  10. 10 Oct, 2022 4 commits
  11. 09 Oct, 2022 4 commits