1. 20 Feb, 2023 9 commits
  2. 19 Feb, 2023 3 commits
  3. 17 Feb, 2023 9 commits
    • Nikita Malyavin's avatar
      MDEV-16329 [5/5] ALTER ONLINE TABLE · 5d633a74
      Nikita Malyavin authored
      * Log rows in online_alter_binlog.
      * Table online data is replicated within a dedicated binlog file.
      * Cached data is written on commit.
      * Versioning is fully supported.
      * Works both with and without binlog enabled.
      
      * For now, setting savepoints is forbidden while ONLINE ALTER is in progress.
        Extra support is required. We could simply log the SAVEPOINT query events
        and replicate them together with row events, but that is not implemented
        yet.
      
      * Cache flipping:
      
        We want to address, in advance, a possible bottleneck in reading and
        writing the online alter binlog.
      
        IO_CACHE does not provide anything better than sequential access.
        Besides, only a single write is mutex-protected, which is not suitable,
        since we should write a transaction atomically.
      
        To solve this, a special layer on top of Event_log is implemented.
        There are two IO_CACHE files underneath: one for reading, and one for
        writing.
      
        Once the read cache is empty, an exclusive lock is acquired (we may wait
        for a currently active transaction to finish writing), and flip() is
        invoked, i.e. the write cache is reopened for reading, and the read cache
        is emptied and reopened for writing.
      
        This resembles the buffer flip that happens in accelerated graphics
        (DirectX/OpenGL/etc).
      
        Cache_flip_event_log is considered non-blocking for a single reader and a
        single writer in this sense: the only lock is held by the reader during
        the flip.
      
        An alternative approach, implementing a fair concurrent circular buffer,
        is described in MDEV-24676.
      
      * Cache managers:
        We have two cache sinks: statement and transactional.
        It is important that the changes are first cached per-statement and
        per-transaction.
        If a statement fails, then only statement data is rolled back. The
        transaction moves along, however.
      
        It turns out there's no guarantee that TABLE will persist in
        thd->open_tables until the transaction commits.
        If an error occurs, tables from the statement are purged.
        Therefore, we can't store the caches in TABLE. Ideally, it should be
        handlerton, but we cut the corner and store them in THD in a list.
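
The cache flipping described in this commit message can be pictured with a small in-memory sketch. Everything in it is an assumption made for illustration: `Flip_log` and its members are hypothetical names, and two `std::vector` buffers stand in for the two IO_CACHE files. It is not the Cache_flip_event_log code itself, only the idea of one writer-side buffer, one reader-side buffer, and a swap under a lock once the reader has drained its side.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: vectors replace the two IO_CACHE files.
class Flip_log
{
  std::mutex lock;                     // guards write_buf and the flip
  std::vector<std::string> write_buf;  // writers append whole transactions
  std::vector<std::string> read_buf;   // touched only by the single reader
  std::size_t read_pos= 0;

public:
  // Append all events of one transaction atomically.
  void write_transaction(const std::vector<std::string> &events)
  {
    std::lock_guard<std::mutex> guard(lock);
    write_buf.insert(write_buf.end(), events.begin(), events.end());
  }

  // Return the next event. When the read side is drained, flip the buffers:
  // the emptied read buffer becomes the new write buffer and vice versa.
  std::optional<std::string> read_event()
  {
    if (read_pos == read_buf.size())
    {
      std::lock_guard<std::mutex> guard(lock);  // waits for an active writer
      read_buf.clear();
      read_pos= 0;
      std::swap(read_buf, write_buf);
      if (read_buf.empty())
        return std::nullopt;                    // nothing was written yet
    }
    assert(read_pos < read_buf.size());
    return read_buf[read_pos++];
  }
};
```

In this sketch the reader takes the lock only when it runs out of data, which mirrors the statement that the only lock held by the reader is the one taken during the flip.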
      5d633a74
    • Nikita Malyavin's avatar
      MDEV-16329 [4/5] Refactor MYSQL_BIN_LOG: extract Event_log ancestor · 3b0c2cf1
      Nikita Malyavin authored
      Event_log is supposed to be a basic logging class that can write events in
      a single file.
      
      MYSQL_BIN_LOG, in comparison, is dedicated to the general-purpose binlog
      and will have:
      * rotation support
      * index files
      * purging
      * gtid and transactional information handling
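
The split between a basic single-file log and a full binlog might be sketched as follows. The class names `Event_log_sketch` and `Bin_log_sketch`, the event-count rotation threshold, and the in-memory "files" are all hypothetical. The sketch only illustrates the ancestor/descendant relationship described here, with rotation and an index layered on top of plain event writing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base: writes events into a single "file" (a vector here).
class Event_log_sketch
{
protected:
  std::vector<std::string> file;
public:
  virtual ~Event_log_sketch() = default;
  virtual void write_event(const std::string &ev) { file.push_back(ev); }
  std::size_t events_in_current_file() const { return file.size(); }
};

// Hypothetical descendant: adds rotation and an index of finished files.
class Bin_log_sketch : public Event_log_sketch
{
  std::size_t max_events;                         // rotation threshold
  std::vector<std::vector<std::string>> index;    // rotated files
public:
  explicit Bin_log_sketch(std::size_t max_events_arg)
    : max_events(max_events_arg)
  {
    assert(max_events_arg > 0);
  }
  void write_event(const std::string &ev) override
  {
    if (file.size() >= max_events)
      rotate();
    Event_log_sketch::write_event(ev);
  }
  void rotate()
  {
    index.push_back(std::move(file));
    file.clear();                                 // start a fresh file
  }
  std::size_t rotated_files() const { return index.size(); }
};
```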
      3b0c2cf1
    • Nikita Malyavin's avatar
      MDEV-16329 [3/5] use binlog_cache_data directly in most places · 40a0d3c0
      Nikita Malyavin authored
      * Eliminate most usages of THD::use_trans_table. Only 3 left, and they are
        at quite high levels, and really essential.
      * Eliminate is_transactional argument when possible. Lots of places are
        left though, because of some WSREP error handling in
        MYSQL_BIN_LOG::set_write_error.
      * Remove junk binlog functions from THD
      * binlog_prepare_pending_rows_event is moved to log.cc inside MYSQL_BIN_LOG
        and is no longer a template. Instead, it accepts an event factory with a
        type code and a callback to a constructing function in it.
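
A minimal sketch of the "factory with a type code and a constructing callback" idea follows. All names here (`Rows_event_sketch`, `Rows_event_factory_sketch`, the simplified `prepare_pending_rows_event` signature, and the type codes 1 and 2) are assumptions for illustration and do not reproduce the server's declarations. The point is that the template parameter moves into a small factory object, so the function that uses it no longer needs to be a template.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified rows event.
struct Rows_event_sketch
{
  int type_code;
  std::string table;
};

// Hypothetical factory: a type code plus a pointer to the constructing function.
struct Rows_event_factory_sketch
{
  int type_code;
  std::unique_ptr<Rows_event_sketch> (*create)(const std::string &table);

  // Only this small helper stays a template.
  template <int TYPE_CODE>
  static Rows_event_factory_sketch get()
  {
    return { TYPE_CODE,
             [](const std::string &table)
             {
               return std::make_unique<Rows_event_sketch>(
                 Rows_event_sketch{TYPE_CODE, table});
             } };
  }
};

// Non-template function: reuses the pending event when type and table match,
// otherwise asks the factory to construct a new one.
Rows_event_sketch *prepare_pending_rows_event(
  std::unique_ptr<Rows_event_sketch> &pending,
  const std::string &table,
  const Rows_event_factory_sketch &factory)
{
  if (!pending || pending->type_code != factory.type_code ||
      pending->table != table)
    pending= factory.create(table);
  return pending.get();
}
```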
      40a0d3c0
    • Nikita Malyavin's avatar
      MDEV-16329 [2/5] refactor binlog and cache_mngr · 9668d9dd
      Nikita Malyavin authored
      Pull binlog and cache manager up to the level of binlog_log_row_internal
      9668d9dd
    • Nikita Malyavin's avatar
      d811a812
    • Nikita Malyavin's avatar
      rpl: repack table_def · dd7cb7cd
      Nikita Malyavin authored
      1. Change m_size to uint. This removes some implicit conversions.
        See unpack_row, for instance:
        uint max_cols= MY_MIN(tabledef->size(), cols->n_bits);
      2. Improve table_def memory layout by reordering columns
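
To illustrate why reordering can improve a memory layout, here is a sketch with two hypothetical structs. It assumes that "columns" in item 2 refers to class members. The member names and types are invented for the example and are not the actual table_def members. Grouping pointer-sized members first and small members last usually removes padding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical "before" layout: small members interleaved with pointers,
// so the compiler typically inserts padding after each small member.
struct Layout_before
{
  unsigned char m_flag_a;
  void *m_ptr_a;
  unsigned char m_flag_b;
  void *m_ptr_b;
  unsigned long m_size;
};

// Hypothetical "after" layout: pointers first, then the unsigned int size,
// then the one-byte members packed together.
struct Layout_after
{
  void *m_ptr_a;
  void *m_ptr_b;
  unsigned int m_size;
  unsigned char m_flag_a;
  unsigned char m_flag_b;
};

// Holds on common ABIs; the exact sizes are platform-dependent.
static_assert(sizeof(Layout_after) <= sizeof(Layout_before),
              "reordered layout should not be larger");

// With an unsigned int size, a minimum against another unsigned int
// needs no implicit conversion.
inline unsigned int max_cols(const Layout_after &def, unsigned int n_bits)
{
  return def.m_size < n_bits ? def.m_size : n_bits;
}
```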
      dd7cb7cd
    • Nikita Malyavin's avatar
      Copy_field: add const to arguments · f4dea91e
      Nikita Malyavin authored
      f4dea91e
    • Sergei Golubchik's avatar
      rename tests · fd1a63aa
      Sergei Golubchik authored
      alter_table_online -> alter_table_locknone
      gis-alter_table_online -> gis-alter_table
      fd1a63aa
    • Sergei Golubchik's avatar
  4. 16 Feb, 2023 16 commits
    • Sergei Golubchik's avatar
      fix for --view-protocol · da114c70
      Sergei Golubchik authored
      da114c70
    • Marko Mäkelä's avatar
    • Marko Mäkelä's avatar
      Merge 10.11 into 11.0 · 2e431ff7
      Marko Mäkelä authored
      2e431ff7
    • Marko Mäkelä's avatar
      Merge 10.10 into 10.11 · 1fd00998
      Marko Mäkelä authored
      1fd00998
    • Marko Mäkelä's avatar
      Merge 10.9 into 10.10 · 345356b8
      Marko Mäkelä authored
      345356b8
    • Marko Mäkelä's avatar
      Merge 10.8 into 10.9 · 0d55914d
      Marko Mäkelä authored
      0d55914d
    • Marko Mäkelä's avatar
      Merge 10.6 into 10.8 · b12cd88c
      Marko Mäkelä authored
      b12cd88c
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · 67a6ad0a
      Marko Mäkelä authored
      67a6ad0a
    • Marko Mäkelä's avatar
      d3f35aa4
    • Marko Mäkelä's avatar
      Fix clang -Winconsistent-missing-override · 0c79ae94
      Marko Mäkelä authored
      0c79ae94
    • Marko Mäkelä's avatar
      MDEV-27774 fixup: Correct a comment · 34f0433c
      Marko Mäkelä authored
      34f0433c
    • Marko Mäkelä's avatar
      Merge 10.6 into 10.8 · 5abbe092
      Marko Mäkelä authored
      5abbe092
    • Marko Mäkelä's avatar
      MDEV-30638 Deadlock between INSERT and InnoDB non-persistent statistics update · 201cfc33
      Marko Mäkelä authored
      This is a partial revert of
      commit 8b6a308e (MDEV-29883)
      and a follow-up to the
      merge commit 394fc71f (MDEV-24569).
      
      The latching order related to any operation that accesses the allocation
      metadata of an InnoDB index tree is as follows:
      
      1. Acquire dict_index_t::lock in non-shared mode.
      2. Acquire the index root page latch in non-shared mode.
      3. Possibly acquire further index page latches. Unless an exclusive
      dict_index_t::lock is held, this must follow the root-to-leaf,
      left-to-right order.
      4. Acquire a *non-shared* fil_space_t::latch.
      5. Acquire latches on the allocation metadata pages.
      6. Possibly allocate and write some pages, or free some pages.
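
The latching order listed above can be illustrated with a small rank checker. This is a sketch under assumptions: `Latch_rank`, `Latch_order_checker`, and the numeric ranks are hypothetical and are not InnoDB code. It models only steps 1 to 5 as increasing ranks and rejects any acquisition that would go backwards. It does not model the root-to-leaf, left-to-right rule within step 3.

```cpp
#include <cassert>
#include <vector>

// Hypothetical ranks following the order of steps 1 to 5.
enum Latch_rank
{
  INDEX_LOCK= 1,          // dict_index_t::lock
  ROOT_PAGE_LATCH,        // index root page latch
  INDEX_PAGE_LATCH,       // further index page latches
  SPACE_LATCH,            // fil_space_t::latch
  ALLOC_METADATA_LATCH    // allocation metadata page latches
};

class Latch_order_checker
{
  std::vector<Latch_rank> held;
public:
  // Returns false if acquiring `rank` now would violate the order.
  // Equal ranks are allowed, e.g. several index page latches.
  bool acquire(Latch_rank rank)
  {
    if (!held.empty() && rank < held.back())
      return false;
    held.push_back(rank);
    return true;
  }
  void release_all() { held.clear(); }
};
```

A checker of this kind would flag, for example, an attempt to latch an index page while a tablespace latch is already held.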
      
      btr_get_size_and_reserved(), dict_stats_update_transient_for_index(),
      dict_stats_analyze_index(): Acquire an exclusive fil_space_t::latch
      in order to avoid a deadlock in fseg_n_reserved_pages() in case of
      concurrent access to multiple indexes sharing the same "inode page".
      
      fseg_page_is_allocated(): Acquire an exclusive fil_space_t::latch
      in order to avoid deadlocks. All callers are holding latches
      on a buffer pool page, or an index, or both.
      Before commit edbde4a1 (MDEV-24167)
      a third mode was available that would not conflict with the shared
      fil_space_t::latch acquired by ha_innobase::info_low(),
      i_s_sys_tablespaces_fill_table(),
      or i_s_tablespaces_encryption_fill_table().
      Because those calls should be rather rare, it makes sense to use
      the simple rw_lock with only shared and exclusive modes.
      
      fil_crypt_get_page_throttle(): Avoid invoking fseg_page_is_allocated()
      on an allocation bitmap page (which can never be freed), to avoid
      acquiring a shared latch on top of an exclusive one.
      
      mtr_t::s_lock_space(), MTR_MEMO_SPACE_S_LOCK: Remove.
      201cfc33
    • Marko Mäkelä's avatar
      MDEV-30134 Assertion failed in buf_page_t::unfix() in buf_pool_t::watch_unset() · 54c0ac72
      Marko Mäkelä authored
      buf_pool_t::watch_set(): Always buffer-fix a block if one was found,
      no matter if it is a watch sentinel or a buffer page. The type of
      the block descriptor will be rechecked in buf_page_t::watch_unset().
      Do not expect the caller to acquire the page hash latch. Starting with
      commit bd5a6403 it is safe to release
      buf_pool.mutex before acquiring a buf_pool.page_hash latch.
      
      buf_page_get_low(): Adjust to the changed buf_pool_t::watch_set().
      
      This simplifies the logic and fixes a bug that was reproduced when
      using debug builds and the setting innodb_change_buffering_debug=1.
      54c0ac72
    • Marko Mäkelä's avatar
      MDEV-30397: MariaDB crash due to DB_FAIL reported for a corrupted page · 9c157994
      Marko Mäkelä authored
      buf_read_page_low(): Map the buf_page_t::read_complete() return
      value DB_FAIL to DB_PAGE_CORRUPTED. The purpose of the DB_FAIL
      return value is to avoid error log noise when read-ahead brings
      in an unused page that is typically filled with NUL bytes.
      
      If a synchronous read is bringing in a corrupted page where the
      page frame does not contain the expected tablespace identifier and
      page number, that must be treated as an attempt to read a corrupted
      page. The correct error code for this is DB_PAGE_CORRUPTED.
      The error code DB_FAIL is not handled by row_mysql_handle_errors().
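
The mapping described above might look roughly like this. The enum `dberr_sketch`, its values, and the function `map_sync_read_result` are hypothetical stand-ins. Only the idea is taken from the commit message: a "fail" result from read completion is reported as page corruption for a synchronous read.

```cpp
#include <cassert>

// Hypothetical subset of error codes; the numeric values are arbitrary.
enum dberr_sketch
{
  SK_SUCCESS,
  SK_FAIL,            // "quiet" failure, meant for read-ahead of unused pages
  SK_PAGE_CORRUPTED   // the error that callers know how to handle
};

// For a synchronous read, translate the quiet failure into corruption;
// every other result is passed through unchanged.
dberr_sketch map_sync_read_result(dberr_sketch read_complete_result)
{
  return read_complete_result == SK_FAIL ? SK_PAGE_CORRUPTED
                                         : read_complete_result;
}
```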
      
      This was missed in commit 0b47c126
      (MDEV-13542).
      9c157994
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · cc27e5fd
      Marko Mäkelä authored
      cc27e5fd
  5. 15 Feb, 2023 3 commits
    • Julius Goryavsky's avatar
      MDEV-30318: galera error messages in mariadb log without galera enabled · 80b4fa54
      Julius Goryavsky authored
      Post-fix to MDEV-30318 and MDEV-22570-related changes:
      unify the handling of wsrep_provider in the code so that "none"
      is matched case-insensitively everywhere and an empty string
      is supported everywhere.
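
A unified check of this kind could be sketched as a single helper. The function name `provider_is_unset` is hypothetical, and treating a null pointer like an empty string is an assumption of this sketch rather than something the commit message states.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true when no provider is configured, i.e. the value
// is empty (or null, by assumption) or equals "none" in any letter case.
bool provider_is_unset(const char *provider)
{
  if (!provider || !*provider)
    return true;
  const std::string none= "none";
  const std::string value(provider);
  if (value.size() != none.size())
    return false;
  for (std::size_t i= 0; i < none.size(); i++)
    if (std::tolower(static_cast<unsigned char>(value[i])) != none[i])
      return false;
  return true;
}
```

Routing every check through one helper is what keeps the behaviour consistent across call sites.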
      80b4fa54
    • Marko Mäkelä's avatar
      MDEV-30657 InnoDB: Not applying UNDO_APPEND due to corruption · 5300c0fb
      Marko Mäkelä authored
      This almost completely reverts
      commit acd23da4 and
      retains a safe optimization:
      
      recv_sys_t::parse(): Remove any old redo log records for the
      truncated tablespace, to free up memory earlier.
      If recovery consists of multiple batches, then recv_sys_t::apply()
      must invoke recv_sys_t::trim() again to avoid wrongly
      applying old log records to an already truncated undo tablespace.
      5300c0fb
    • Vicențiu Ciorbaru's avatar
      MDEV-30324: Wrong result upon SELECT DISTINCT ... WITH TIES · 4afa3b64
      Vicențiu Ciorbaru authored
      WITH TIES would not take effect if SELECT DISTINCT was used in a
      context where an INDEX is used to resolve the ORDER BY clause.
      
      WITH TIES relies on `JOIN::order` containing the non-constant
      fields in order to test the equality of the ORDER BY fields, as WITH TIES requires.
      
      The cause of the problem was a premature removal of the `JOIN::order`
      member during a DISTINCT optimization. This led to WITH TIES code assuming
      ORDER BY only contained "constant" elements.
      
      Disable this optimization when WITH TIES is in effect.
      
      (side-note: the order by removal does not impact any current tests, thus
      it will be removed in a future version)
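
For readers unfamiliar with WITH TIES, the following sketch shows the semantics that depend on comparing ORDER BY values. The function `fetch_first_with_ties` is hypothetical and works on a plain vector of already sorted keys. It is not the server's implementation, only an illustration of why the ORDER BY fields must still be available when the row limit is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the first n keys, plus any following keys equal to the n-th one.
// The input must already be sorted by the ORDER BY key.
std::vector<int> fetch_first_with_ties(const std::vector<int> &sorted_keys,
                                       std::size_t n)
{
  std::vector<int> out;
  for (std::size_t i= 0; i < sorted_keys.size(); i++)
  {
    // Past the limit, keep a row only if it ties with the last row
    // that was inside the limit.
    if (i >= n && (n == 0 || sorted_keys[i] != sorted_keys[n - 1]))
      break;
    out.push_back(sorted_keys[i]);
  }
  return out;
}
```

With keys 1, 2, 2, 2, 3 and a limit of 2, the sketch returns 1, 2, 2, 2, because the third and fourth rows tie with the second.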
      
      Reviewed by: monty@mariadb.org
      4afa3b64