1. 26 Oct, 2021 14 commits
    • Sergei Golubchik's avatar
      the error should be on the second row, not the first · f845a983
      Sergei Golubchik authored
      otherwise how can we know that the row counter is incremented?
      f845a983
    • Rucha Deodhar's avatar
      MDEV-26832: ROW_NUMBER in SIGNAL/RESIGNAL causes a syntax error · ff5de38d
      Rucha Deodhar authored
      Analysis: The parser was missing ROW_NUMBER as syntax for SIGNAL and RESIGNAL.
      Fix: Fix the parser and copy m_row_number the same way as the other attributes,
      to prevent ROW_NUMBER from assuming the default value.
      ff5de38d
    • Aleksey Midenkov's avatar
      MDEV-26767 Server crashes when rename table and alter storage engine · b15a5f6f
      Aleksey Midenkov authored
      Removed a leftover wrong assertion. m_sql_cmd can be allocated by any
      ALTER subcommand, and it is checked for NULL before allocation.
      b15a5f6f
    • Aleksey Midenkov's avatar
      MDEV-22165 CONVERT TABLE: move in partition from existing table · 69724805
      Aleksey Midenkov authored
      Syntax for CONVERT TABLE
      
      ALTER TABLE tbl_name CONVERT TABLE tbl_name TO PARTITION partition_name partition_spec
      
      Examples:
      
          ALTER TABLE t1 CONVERT TABLE tp2 TO PARTITION p2 VALUES LESS THAN MAXVALUE;
      
      The new ALTER_PARTITION_CONVERT_IN command for
      fast_alter_partition_table() is implemented in the
      alter_partition_convert_in() function, which basically does
      ha_rename_table().

      The table structure and data checks are basically the same as in the
      EXCHANGE PARTITION command. They are done by
      compare_table_with_partition() and check_table_data().
      
      Atomic DDL is done by the scheme from MDEV-22166 (see the
      corresponding commit message). The only difference is that it also has
      to drop the source table frm, and that is done by WFRM_DROP_CONVERTED_FROM.
      
      Initial patch was done by Dmitry Shulga <dmitry.shulga@mariadb.com>
      69724805
    • Aleksey Midenkov's avatar
      Review and crash-safety fix · 7da721be
      Aleksey Midenkov authored
      7da721be
    • Sergei Golubchik's avatar
      42802452
    • Aleksey Midenkov's avatar
      MDEV-22166 CONVERT PARTITION: move out partition into a table · b7bba721
      Aleksey Midenkov authored
      Syntax for CONVERT keyword
      
      ALTER TABLE tbl_name
          [alter_option [, alter_option] ...] |
          [partition_options]
      
      partition_option: {
          ...
          | CONVERT PARTITION partition_name TO TABLE tbl_name
      }
      
      Examples:
      
          ALTER TABLE t1 CONVERT PARTITION p2 TO TABLE tp2;
      
      The new ALTER_PARTITION_CONVERT_OUT command for
      fast_alter_partition_table() is implemented in the
      alter_partition_convert_out() function, which basically does
      ha_rename_table().

      The partition to extract is marked with the same flag as a dropped
      partition: PART_TO_BE_DROPPED. Note that we cannot have multiple
      partitioning commands in one ALTER.

      For DDL logging, the principle is basically the same as for other
      fast_alter_partition_table() commands. The only difference is that it
      integrates the recent Atomic DDL functions and introduces an additional
      phase, WFRM_BACKUP_ORIGINAL. That is required for binlog consistency,
      because otherwise we could not revert after WFRM_INSTALL_SHADOW
      is done. If we crash or fail before the DDL log is complete, the
      altered table will already be new but the binlog will miss that ALTER
      command. Note that this differs from all other atomic DDL in that
      it rolls back until ddl_log_complete() is done, even if everything
      else was fully done before the crash.
      
      Test cases added to:
      
        parts.alter_table \
        parts.partition_debug \
        versioning.partition \
        atomic.alter_partition
      b7bba721
    • Aleksey Midenkov's avatar
      MDEV-26471 Syntax extension: do not require PARTITION keyword in partition definition · f6b0e34c
      Aleksey Midenkov authored
      Instead of
      
        create or replace table t1 (x int)
        partition by range(x) (
          partition p1 values less than (10),
          partition pn values less than maxvalue);
      
      it should be possible to type the shorter form:
      
        create or replace table t1 (x int)
        partition by range(x) (
          p1 values less than (10),
          pn values less than maxvalue);
      
      As the above examples demonstrate, make the PARTITION keyword in the
      partition definition optional.
      f6b0e34c
    • Dmitry Shulga's avatar
      MDEV-22165: Prerequisite patch that adds missing data member initializers in constructors of the class Alter_table_ctx · 379ddf49
      Dmitry Shulga authored
      
      The static analyzer built into Eclipse CDT complained about missing
      initializers in constructors of the class Alter_table_ctx, so I've added
      them in order to eliminate the annoying warnings.
      379ddf49
    • Aleksey Midenkov's avatar
      Vanilla cleanups and refactorings · d324c03d
      Aleksey Midenkov authored
      Dead code cleanup:
      
      part_info->num_parts usage was wrong and working incorrectly in
      mysql_drop_partitions() because num_parts is already updated in
      prep_alter_part_table(). We don't have to update part_info->partitions
      because part_info is destroyed at alter_partition_lock_handling().
      
      Cleanups:
      
      - DBUG_EVALUATE_IF() macro replaced by shorter form DBUG_IF();
      - Fixed typo in ER_KEY_COLUMN_DOES_NOT_EXITS.
      
      Refactorings:
      
      - Split write_log_replace_delete_frm() into write_log_delete_frm()
        and write_log_replace_frm();
      - partition_info via DDL_LOG_STATE;
      - set_part_info_exec_log_entry() removed.
      
      DBUG_EVALUATE removed
      
      DBUG_EVALUATE was only added for consistency together with
      DBUG_EVALUATE_IF. It is not used anywhere in the code.
      
      DBUG_SUICIDE() fix on release build
      
      On release builds DBUG_SUICIDE() was a statement. That was wrong, as
      DBUG_SUICIDE() is used in expression context.
      d324c03d
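The statement-versus-expression distinction behind the DBUG_SUICIDE() fix can be illustrated with a minimal sketch. The macro and function names below (`SUICIDE_STMT`, `SUICIDE_EXPR`, `value_after_hook`) are hypothetical and are not the server's actual definitions; they only show why a no-op macro used inside an expression must itself be an expression.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT the server's real macro definitions.
// Statement form: fine as `SUICIDE_STMT();` on its own line, but it cannot
// appear inside an expression.
#define SUICIDE_STMT() do { } while (0)

// Expression form: a no-op that is still a valid operand, for example of the
// comma operator.
#define SUICIDE_EXPR() ((void) 0)

// Uses the macro in expression context: the hook is evaluated, then `v`.
inline int value_after_hook(int v)
{
  // `(SUICIDE_STMT(), v)` would not compile here, because `do` cannot start
  // an expression.
  return (SUICIDE_EXPR(), v);
}
```

A call such as `value_after_hook(7)` simply yields its argument; the point is only that the expression form compiles where the statement form would not.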
    • Aleksey Midenkov's avatar
      MDEV-25292 Better debug trace · 2dc3c320
      Aleksey Midenkov authored
      Improves readability of DDL log debug traces.
      2dc3c320
    • Thirunarayanan Balathandayuthapani's avatar
      MDEV-24621 In bulk insert, pre-sort and build indexes one page at a time · 045757af
      Thirunarayanan Balathandayuthapani authored
      When inserting a number of rows into an empty table,
      InnoDB will buffer and pre-sort the records for each index, and
      build the indexes one page at a time.
      
      For each index, a buffer of innodb_sort_buffer_size will be created.
      
      If the buffer runs out of memory, we will create temporary files
      for storing the data.
      
      At the end of the statement, we will sort and apply the buffered
      records. Ideally, we would do this at the end of the transaction
      or only when starting to execute a non-INSERT statement on the table.
      However, it could be awkward if duplicate keys or similar errors
      would be reported during the execution of a later statement.
      This will be addressed in MDEV-25036.
      
      Any columns longer than 2000 bytes will be buffered in temporary files.

      innodb_prepare_commit_versioned(): Apply all buffered bulk insert
      operations at the end of each statement.

      ha_commit_trans(): Handle errors from innodb_prepare_commit_versioned().

      row_merge_buf_write(): This function should also accept a blob
      file handle, and it should write the field data that is
      longer than 2000 bytes.
      
      row_merge_bulk_t: Data structure to maintain the data during
      bulk insert operation.
      
      trx_mod_table_time_t::start_bulk_insert(): Notify the start of a
      bulk insert operation and create a new buffer for the given table.

      trx_mod_table_time_t::add_tuple(): Buffer a record.

      trx_mod_table_time_t::write_bulk(): Do the bulk insert operation
      present in the transaction.

      trx_mod_table_time_t::bulk_buffer_exist(): Whether the buffer
      storage exists for the bulk transaction.

      trx_mod_table_time_t::write_bulk(): Write all buffered insert
      operations for the transaction and the table.

      row_ins_clust_index_entry_low(): Insert the data into the
      bulk buffer if it already exists.

      row_ins_sec_index_entry(): Insert the secondary tuple
      if the bulk buffer already exists.

      row_merge_bulk_buf_add(): Insert the tuple into the buffer
      of the bulk insert operation.

      row_merge_buf_blob(): Write the field data whose length is
      more than 2000 bytes into the blob temporary file. Write the
      file offset and length into the tuple field.

      row_merge_copy_blob_from_file(): Copy the blob from the blob file
      handle, based on the reference in the given tuple.

      row_merge_insert_index_tuples(): Handle blobs for the bulk insert
      operation.
      
      row_merge_bulk_t::row_merge_bulk_t(): Constructor. Initialize
      the buffer and file for all the indexes except the FTS index.

      row_merge_bulk_t::create_tmp_file(): Create a new temporary file
      for the given index.

      row_merge_bulk_t::write_to_tmp_file(): Write the content of the
      buffer to the disk file for the given index.

      row_merge_bulk_t::add_tuple(): Insert the tuple into the merge
      buffer for the given index. If the memory runs out, InnoDB
      should sort the buffer and write it into a file.

      row_merge_bulk_t::write_to_index(): Do the bulk insert operation
      from the merge file or merge buffer for the given index.

      row_merge_bulk_t::write_to_table(): Do the bulk insert operation
      for all the indexes.
      
      dict_stats_update(): If a bulk insert transaction is in progress,
      treat the table as empty. The index creation could hold latches
      for extended amounts of time.
      045757af
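The buffer, pre-sort, and spill scheme described in the MDEV-24621 message can be sketched roughly as follows. This is a simplified illustration and not InnoDB code: `BulkBufferSketch` is a hypothetical class, plain `int` keys stand in for index records, a fixed element count stands in for innodb_sort_buffer_size, and in-memory vectors stand in for the temporary files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a per-index bulk insert buffer.
class BulkBufferSketch
{
public:
  explicit BulkBufferSketch(std::size_t capacity) : capacity_(capacity)
  { assert(capacity > 0); }

  // Buffer one key; if the buffer is already full, sort and spill it first.
  void add(int key)
  {
    if (buf_.size() == capacity_)
      spill();
    buf_.push_back(key);
  }

  std::size_t spilled_runs() const { return runs_.size(); }

  // Turn the remaining buffer into a final sorted run, then merge all
  // sorted runs into one ordered stream (a k-way merge with a min-heap).
  std::vector<int> finish()
  {
    if (!buf_.empty())
      spill();
    using Item = std::tuple<int, std::size_t, std::size_t>; // key, run, pos
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (std::size_t r = 0; r < runs_.size(); r++)
      heap.emplace(runs_[r][0], r, 0);
    std::vector<int> out;
    while (!heap.empty())
    {
      auto [key, r, pos] = heap.top();
      heap.pop();
      out.push_back(key);
      if (pos + 1 < runs_[r].size())
        heap.emplace(runs_[r][pos + 1], r, pos + 1);
    }
    runs_.clear();
    return out;
  }

private:
  // Sort the buffer and move it aside as one run; runs are never empty.
  void spill()
  {
    std::sort(buf_.begin(), buf_.end());
    runs_.push_back(std::move(buf_));
    buf_.clear();
  }

  std::size_t capacity_;
  std::vector<int> buf_;
  std::vector<std::vector<int>> runs_; // stands in for temporary files
};
```

Because every run is sorted before it is set aside, the final merge emits keys in index order, which is what allows an index to be built one page at a time instead of by individual random inserts.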
    • Marko Mäkelä's avatar
      Merge 10.6 into 10.7 · c8e309a6
      Marko Mäkelä authored
      c8e309a6
    • Marko Mäkelä's avatar
      MDEV-26903: Assertion ctx->trx->state == TRX_STATE_ACTIVE on DROP INDEX · 58fe6b47
      Marko Mäkelä authored
      rollback_inplace_alter_table(): Tolerate a case where the transaction
      is not in an active state. If ha_innobase::commit_inplace_alter_table()
      failed with a deadlock, the transaction would already have been
      rolled back. This omission of error handling was introduced in
      commit 1bd681c8 (MDEV-25506 part 3).
      
      After commit c3c53926 (MDEV-26554)
      it became easier to trigger DB_DEADLOCK during exclusive table lock
      acquisition in ha_innobase::commit_inplace_alter_table().
      
      lock_table_low(): Add DBUG injection "innodb_table_deadlock".
      58fe6b47
  2. 25 Oct, 2021 5 commits
  3. 22 Oct, 2021 7 commits
    • Monty's avatar
    • Marko Mäkelä's avatar
      6bfaa68c
    • Marko Mäkelä's avatar
      Merge 10.6 into 10.7 · 71d4ecf1
      Marko Mäkelä authored
      71d4ecf1
    • Marko Mäkelä's avatar
      MDEV-26769 InnoDB does not support hardware lock elision · 1f022809
      Marko Mäkelä authored
      This implements memory transaction support for:
      
      * Intel Restricted Transactional Memory (RTM), also known as TSX-NI
      (Transactional Synchronization Extensions New Instructions)
      * POWER v2.09 Hardware Transactional Memory (HTM) on GNU/Linux
      
      transactional_lock_guard, transactional_shared_lock_guard:
      RAII lock guards that try to elide the lock acquisition
      when transactional memory is available.
      
      buf_pool.page_hash: Try to elide latches whenever feasible.
      Related to the InnoDB change buffer and ROW_FORMAT=COMPRESSED
      tables, this is not always possible.
      In buf_page_get_low(), memory transactions only work reasonably
      well for validating a guessed block address.
      
      TMLockGuard, TMLockTrxGuard, TMLockMutexGuard: RAII lock guards
      that try to elide lock_sys.latch and related latches.
      1f022809
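The shape of an RAII guard that tries to elide a lock and otherwise falls back to acquiring it can be sketched as below. This is not the transactional_lock_guard from the commit: `NoTransactions`, `ElidingLockGuardSketch`, and `was_elided()` are hypothetical names, and the backend shown never starts a memory transaction, so no hardware instructions are involved and the guard always takes the fallback path.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical backend that never starts a memory transaction, so the guard
// always falls back to the lock. A real backend would wrap hardware
// instructions and, inside the transaction, would also have to check that
// the lock is free and abort otherwise; none of that is modeled here.
struct NoTransactions
{
  static bool begin() { return false; }
  static void end() {}
};

// RAII guard: try to elide the lock; if that is not possible, acquire it.
template <class Mutex, class Backend = NoTransactions>
class ElidingLockGuardSketch
{
public:
  explicit ElidingLockGuardSketch(Mutex &m) : m_(m), elided_(Backend::begin())
  {
    if (!elided_)
      m_.lock();
  }

  ~ElidingLockGuardSketch()
  {
    if (elided_)
      Backend::end();   // commit the memory transaction
    else
      m_.unlock();      // release the lock that was really acquired
  }

  ElidingLockGuardSketch(const ElidingLockGuardSketch &) = delete;
  ElidingLockGuardSketch &operator=(const ElidingLockGuardSketch &) = delete;

  bool was_elided() const { return elided_; }

private:
  Mutex &m_;
  bool elided_;
};
```

Keeping the elision decision in the constructor and the matching cleanup in the destructor means callers write the same code whether or not transactional memory is available.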
    • Marko Mäkelä's avatar
      MDEV-26826 Duplicated computations of buf_pool.page_hash addresses · c091a0bc
      Marko Mäkelä authored
      Since commit bd5a6403 (MDEV-26033)
      we can actually calculate the buf_pool.page_hash cell and latch
      addresses while not holding buf_pool.mutex.
      
      buf_page_alloc_descriptor(): Remove the MEM_UNDEFINED.
      We now expect buf_page_t::hash to be zero-initialized.
      
      buf_pool_t::hash_chain: Dedicated data type for buf_pool.page_hash.array.
      
      buf_LRU_free_one_page(): Merged to the only caller
      buf_pool_t::corrupted_evict().
      c091a0bc
    • Marko Mäkelä's avatar
      MDEV-26828 Spinning on buf_pool.page_hash is wasting CPU cycles · fdae71f8
      Marko Mäkelä authored
      page_hash_latch: Only use the spinlock implementation on
      SUX_LOCK_GENERIC platforms (those for which we do not implement
      a futex-like interface). Use srw_spin_mutex on 32-bit systems
      (except Microsoft Windows) to satisfy the size constraints.
      
      rw_lock::is_read_locked(): Remove. We will use the slightly
      broader assertion is_locked().
      
      srw_lock_: Implement is_locked(), is_write_locked() in a hacky
      way for the Microsoft Windows SRWLOCK. This should be acceptable,
      because we are only using these predicates in debug assertions
      (or later, in lock elision), and false positives should not matter.
      fdae71f8
    • Marko Mäkelä's avatar
      MDEV-26883 InnoDB hang due to table lock conflict · 5caff202
      Marko Mäkelä authored
      In a stress test campaign of a 10.6-based branch by Matthias Leich,
      a deadlock between two InnoDB threads occurred, involving
      lock_sys.wait_mutex and a dict_table_t::lock_mutex.
      
      The cause of the hang is a latching order violation in
      lock_sys_t::cancel(). That function and the latching order
      violation were originally introduced in
      commit 8d16da14 (MDEV-24789).
      
      lock_sys_t::cancel(): Invoke table->lock_mutex_trylock() in order
      to avoid a deadlock. If that fails, release lock_sys.wait_mutex,
      and acquire both latches. In that way, we will be obeying the
      latching order and no hangs will occur.
      
      This hang should mostly affect DDL operations. DML operations will
      acquire only IX or IS table locks, which are compatible with each other.
      5caff202
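The trylock-and-back-off pattern described for lock_sys_t::cancel() can be sketched with two standard mutexes. This is an illustration only: `acquire_first_while_holding_second` is a hypothetical helper, and the assumed latching order (`first` before `second`) is a property of this sketch, not a statement about the server's latch definitions.

```cpp
#include <cassert>
#include <mutex>

// Assumed latching order for this sketch: `first` must be acquired before
// `second`.
// Precondition: the caller holds `second` and does not hold `first`.
// Postcondition: the caller holds both mutexes.
// Returns true if `second` had to be released and reacquired, in which case
// the caller should revalidate any state that `second` protects.
inline bool acquire_first_while_holding_second(std::mutex &first,
                                               std::mutex &second)
{
  if (first.try_lock())
    return false;     // acquired without waiting, so no deadlock is possible
  second.unlock();    // back off instead of waiting in the wrong order
  first.lock();       // now wait in the assumed latching order
  second.lock();
  return true;
}
```

The non-blocking attempt is safe in either order because it never waits; only the blocking acquisitions have to follow the latching order.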
  4. 21 Oct, 2021 14 commits