- 26 Oct, 2021 10 commits
-
-
Aleksey Midenkov authored
-
Sergei Golubchik authored
-
Aleksey Midenkov authored
Syntax for CONVERT keyword

    ALTER TABLE tbl_name
        [alter_option [, alter_option] ...] |
        [partition_options]

    partition_option: {
        ...
        | CONVERT PARTITION partition_name TO TABLE tbl_name
    }

Examples:

    ALTER TABLE t1 CONVERT PARTITION p2 TO TABLE tp2;

The new ALTER_PARTITION_CONVERT_OUT command for fast_alter_partition_table() is implemented in the alter_partition_convert_out() function, which basically does ha_rename_table().

The partition to extract is marked with the same flag as a dropped partition: PART_TO_BE_DROPPED. Note that we cannot have multiple partitioning commands in one ALTER.

For DDL logging, the principle is basically the same as for other fast_alter_partition_table() commands. The only difference is that it integrates late Atomic DDL functions and introduces an additional phase, WFRM_BACKUP_ORIGINAL. That is required for binlog consistency, because otherwise we could not revert after WFRM_INSTALL_SHADOW is done. If we crash or fail before the DDL log is complete, the altered table will already be new, but the binlog will miss that ALTER command. Note that this is different from all other atomic DDL in that it rolls back until ddl_log_complete() is done, even if everything was done fully before the crash.

Test cases added to:
    parts.alter_table
    parts.partition_debug
    versioning.partition
    atomic.alter_partition
-
Aleksey Midenkov authored
Instead of

    create or replace table t1 (x int)
    partition by range(x) (
        partition p1 values less than (10),
        partition pn values less than maxvalue);

it should be possible to type the shorter form:

    create or replace table t1 (x int)
    partition by range(x) (
        p1 values less than (10),
        pn values less than maxvalue);

As the above examples demonstrate, make the PARTITION keyword in a partition definition optional.
-
Dmitry Shulga authored
MDEV-22165: Prerequisite patch that adds missing data member initializers in constructors of the class Alter_table_ctx

The static analyzer built into Eclipse CDT complained about missing initializers in constructors of the class Alter_table_ctx, so I've added them in order to eliminate the annoying warnings.
-
Aleksey Midenkov authored
Dead code cleanup:

part_info->num_parts usage was wrong and working incorrectly in mysql_drop_partitions() because num_parts is already updated in prep_alter_part_table(). We don't have to update part_info->partitions because part_info is destroyed at alter_partition_lock_handling().

Cleanups:
- DBUG_EVALUATE_IF() macro replaced by shorter form DBUG_IF();
- Typo in ER_KEY_COLUMN_DOES_NOT_EXITS.

Refactorings:
- Split write_log_replace_delete_frm() into write_log_delete_frm() and write_log_replace_frm();
- partition_info via DDL_LOG_STATE;
- set_part_info_exec_log_entry() removed.

DBUG_EVALUATE removed

DBUG_EVALUATE was only added for consistency together with DBUG_EVALUATE_IF. It is not used anywhere in the code.

DBUG_SUICIDE() fix on release build

On release builds, DBUG_SUICIDE() was a statement. That was wrong, as DBUG_SUICIDE() is used in expression context.
-
Aleksey Midenkov authored
Improves readability of DDL log debug traces.
-
Thirunarayanan Balathandayuthapani authored
When inserting a number of rows into an empty table, InnoDB will buffer and pre-sort the records for each index, and build the indexes one page at a time.

For each index, a buffer of innodb_sort_buffer_size will be created. If the buffer runs out of memory, we will create temporary files for storing the data.

At the end of the statement, we will sort and apply the buffered records. Ideally, we would do this at the end of the transaction or only when starting to execute a non-INSERT statement on the table. However, it could be awkward if duplicate keys or similar errors would be reported during the execution of a later statement. This will be addressed in MDEV-25036.

Any columns longer than 2000 bytes will be buffered in temporary files.

innodb_prepare_commit_versioned(): Apply all bulk buffered insert operations at the end of each statement.
ha_commit_trans(): Handle errors from innodb_prepare_commit_versioned().
row_merge_buf_write(): Also accept a blob file handle, and write the field data that is longer than 2000 bytes.
row_merge_bulk_t: Data structure to maintain the data during a bulk insert operation.
trx_mod_table_time_t::start_bulk_insert(): Notify the start of a bulk insert operation and create a new buffer for the given table.
trx_mod_table_time_t::add_tuple(): Buffer a record.
trx_mod_table_time_t::write_bulk(): Do the bulk insert operation present in the transaction.
trx_mod_table_time_t::bulk_buffer_exist(): Whether buffer storage exists for the bulk transaction.
trx_mod_table_time_t::write_bulk(): Write all buffered insert operations for the transaction and the table.
row_ins_clust_index_entry_low(): Insert the data into the bulk buffer if it already exists.
row_ins_sec_index_entry(): Insert the secondary tuple if the bulk buffer already exists.
row_merge_bulk_buf_add(): Insert the tuple into the bulk insert buffer.
row_merge_buf_blob(): Write the field data whose length is more than 2000 bytes into the blob temporary file. Write the file offset and length into the tuple field.
row_merge_copy_blob_from_file(): Copy the blob from the blob file handle, based on the reference in the given tuple.
row_merge_insert_index_tuples(): Handle blobs for the bulk insert operation.
row_merge_bulk_t::row_merge_bulk_t(): Constructor. Initialize the buffer and file for all the indexes except the FTS index.
row_merge_bulk_t::create_tmp_file(): Create a new temporary file for the given index.
row_merge_bulk_t::write_to_tmp_file(): Write the content from the buffer to the disk file for the given index.
row_merge_bulk_t::add_tuple(): Insert the tuple into the merge buffer for the given index. If memory runs out, InnoDB should sort the buffer and write it to the file.
row_merge_bulk_t::write_to_index(): Do the bulk insert operation from the merge file or merge buffer for the given index.
row_merge_bulk_t::write_to_table(): Do the bulk insert operation for all the indexes.
dict_stats_update(): If a bulk insert transaction is in progress, treat the table as empty. The index creation could hold latches for extended amounts of time.
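The buffer-sort-spill-merge flow described in this commit message can be illustrated with a small sketch. It is not InnoDB code: the class `BulkBuffer` is hypothetical, keys are plain integers, and the "temporary files" are in-memory vectors so that the example stays self-contained. Only the member names `add_tuple()` and `write_to_index()` are borrowed from the message, and their behavior here is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-index bulk insert buffer.
class BulkBuffer
{
public:
  explicit BulkBuffer(std::size_t capacity) : capacity_(capacity) {}

  // Buffer one key. If the buffer is already full, sort it and spill it
  // as one sorted run before accepting the new key.
  void add_tuple(int key)
  {
    if (buf_.size() == capacity_)
      spill();
    buf_.push_back(key);
  }

  // Sort whatever is still buffered, then merge all sorted runs into one
  // sequence in index order.
  std::vector<int> write_to_index()
  {
    spill();
    std::vector<int> out;
    for (const std::vector<int> &run : runs_)
    {
      std::vector<int> merged;
      std::merge(out.begin(), out.end(), run.begin(), run.end(),
                 std::back_inserter(merged));
      out.swap(merged);
    }
    runs_.clear();
    return out;
  }

  // Number of runs written out so far (the "temporary file" contents).
  std::size_t spilled_runs() const { return runs_.size(); }

private:
  // Sort the in-memory buffer and move it to the list of sorted runs.
  void spill()
  {
    if (buf_.empty())
      return;
    std::sort(buf_.begin(), buf_.end());
    runs_.push_back(std::move(buf_));
    buf_.clear();
  }

  std::size_t capacity_;
  std::vector<int> buf_;
  std::vector<std::vector<int>> runs_;
};
```

With a capacity of 2, adding the keys 5, 1, 4, 2, 3 spills two sorted runs while inserting, and `write_to_index()` then returns the keys in ascending order.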
-
Marko Mäkelä authored
-
Marko Mäkelä authored
rollback_inplace_alter_table(): Tolerate a case where the transaction is not in an active state. If ha_innobase::commit_inplace_alter_table() failed with a deadlock, the transaction would already have been rolled back. This omission of error handling was introduced in commit 1bd681c8 (MDEV-25506 part 3). After commit c3c53926 (MDEV-26554) it became easier to trigger DB_DEADLOCK during exclusive table lock acquisition in ha_innobase::commit_inplace_alter_table(). lock_table_low(): Add DBUG injection "innodb_table_deadlock".
-
- 25 Oct, 2021 5 commits
-
-
Sergei Golubchik authored
this is CentOOOOOOS 7
-
Vladislav Vaintroub authored
The reason for the crash was a bug in MDEV-19275, after which shutdown does not wait for binlog threads anymore.
-
Vladislav Vaintroub authored
-
Marko Mäkelä authored
We have observed hangs of the io_uring subsystem when using a Linux kernel newer than 5.10. Also 5.15-rc6 is affected by this. The exact cause of the hangs has not been diagnosed yet. As a safety measure, we will disable innodb_use_native_aio by default when the server has been configured with io_uring and the kernel version is between 5.11 and 5.15. If the start-up parameter innodb_use_native_aio=ON is set, then we will issue a warning to the server error log.
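The version gate described above could look roughly like the following. This is a sketch under assumptions, not the server's code: the helper name `io_uring_may_hang` is hypothetical, the input is assumed to be a release string of the kind reported by uname (for example "5.13.0-19-generic"), and both ends of the 5.11 to 5.15 range are treated as inclusive.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not a MariaDB function): returns true when the
// kernel release string names a version from 5.11 to 5.15 inclusive.
bool io_uring_may_hang(const char *release)
{
  assert(release != nullptr);
  unsigned major = 0, minor = 0;
  // Only the leading "major.minor" part matters for this check.
  if (std::sscanf(release, "%u.%u", &major, &minor) != 2)
    return false; // Unparsable release string: do not disable anything.
  return major == 5 && minor >= 11 && minor <= 15;
}
```

A caller would use the result only to pick the default; an explicit innodb_use_native_aio=ON would still be honored, with a warning written to the error log as the message says.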
-
Vladislav Vaintroub authored
Error C2440 'initializing': cannot convert from 'MYSQL_RES *(__stdcall *)(MYSQL *)' to 'MYSQL_RES *(__cdecl *)(MYSQL *)'
-
- 22 Oct, 2021 7 commits
-
-
Monty authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
This implements memory transaction support for:
* Intel Restricted Transactional Memory (RTM), also known as TSX-NI (Transactional Synchronization Extensions New Instructions)
* POWER v2.09 Hardware Transactional Memory (HTM) on GNU/Linux

transactional_lock_guard, transactional_shared_lock_guard: RAII lock guards that try to elide the lock acquisition when transactional memory is available.

buf_pool.page_hash: Try to elide latches whenever feasible. Related to the InnoDB change buffer and ROW_FORMAT=COMPRESSED tables, this is not always possible. In buf_page_get_low(), memory transactions only work reasonably well for validating a guessed block address.

TMLockGuard, TMLockTrxGuard, TMLockMutexGuard: RAII lock guards that try to elide lock_sys.latch and related latches.
-
Marko Mäkelä authored
Since commit bd5a6403 (MDEV-26033) we can actually calculate the buf_pool.page_hash cell and latch addresses while not holding buf_pool.mutex. buf_page_alloc_descriptor(): Remove the MEM_UNDEFINED. We now expect buf_page_t::hash to be zero-initialized. buf_pool_t::hash_chain: Dedicated data type for buf_pool.page_hash.array. buf_LRU_free_one_page(): Merged to the only caller buf_pool_t::corrupted_evict().
-
Marko Mäkelä authored
page_hash_latch: Only use the spinlock implementation on SUX_LOCK_GENERIC platforms (those for which we do not implement a futex-like interface). Use srw_spin_mutex on 32-bit systems (except Microsoft Windows) to satisfy the size constraints. rw_lock::is_read_locked(): Remove. We will use the slightly broader assertion is_locked(). srw_lock_: Implement is_locked(), is_write_locked() in a hacky way for the Microsoft Windows SRWLOCK. This should be acceptable, because we are only using these predicates in debug assertions (or later, in lock elision), and false positives should not matter.
-
Marko Mäkelä authored
In a stress test campaign of a 10.6-based branch by Matthias Leich, a deadlock between two InnoDB threads occurred, involving lock_sys.wait_mutex and a dict_table_t::lock_mutex. The cause of the hang is a latching order violation in lock_sys_t::cancel(). That function and the latching order violation were originally introduced in commit 8d16da14 (MDEV-24789). lock_sys_t::cancel(): Invoke table->lock_mutex_trylock() in order to avoid a deadlock. If that fails, release lock_sys.wait_mutex, and acquire both latches. In that way, we will be obeying the latching order and no hangs will occur. This hang should mostly affect DDL operations. DML operations will acquire only IX or IS table locks, which are compatible with each other.
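The try-lock fallback described for lock_sys_t::cancel() follows a general pattern that can be sketched with standard mutexes. The names `Latches` and `acquire_table_mutex` are hypothetical, the two `std::mutex` members merely stand in for dict_table_t::lock_mutex and lock_sys.wait_mutex, and the latching order (table mutex first) is an assumption made for this sketch.

```cpp
#include <mutex>

// Stand-ins for the two latches. Assumed latching order for this sketch:
// table_mutex must be acquired before wait_mutex.
struct Latches
{
  std::mutex table_mutex;
  std::mutex wait_mutex;
};

// Precondition: the caller holds wait_mutex.
// Postcondition: the caller holds both mutexes.
// Returns true if wait_mutex had to be released and reacquired, in which
// case the caller must revalidate any state that wait_mutex protects.
bool acquire_table_mutex(Latches &l)
{
  if (l.table_mutex.try_lock())
    return false; // Fast path: no blocking wait, so no deadlock risk.
  // Slow path: back off, then take both latches in the proper order.
  l.wait_mutex.unlock();
  l.table_mutex.lock();
  l.wait_mutex.lock();
  return true;
}
```

The point of the design is that a thread never blocks on the table mutex while holding the wait mutex; it either gets the table mutex immediately or gives up the wait mutex before waiting.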
-
- 21 Oct, 2021 15 commits
-
-
Vladislav Vaintroub authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
page_create_low(): Fix -Warray-bounds
log_buffer_extend(): Fix -Wstringop-overflow
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
Based on mysql/mysql-server@bc9c46bf2894673d0df17cd0ee872d0d99663121 but without sleeps. The test was verified to hit the debug assertion if the change to fts_add_doc_by_id() in commit 2d98b967 was reverted.
-
Marko Mäkelä authored
fts_cache_t::total_size_at_sync: New field, to sample total_size.
fts_add_doc_by_id(): Invoke sync if total_size has grown too much since the previous sync request. (Maintain cache->total_size_at_sync.)
ib_wqueue_t::length: Caches ib_list_len(*items).
ib_wqueue_len(): Removed. We will refer to fts_optimize_wq->length directly.

Based on mysql/mysql-server@bc9c46bf2894673d0df17cd0ee872d0d99663121
-
Marko Mäkelä authored
trx_commit_in_memory(): Do not release the rseg reference before trx_undo_commit_cleanup() has been invoked and the current transaction is truly done with the rollback segment. The purpose of the reference count is to prevent data races with trx_purge_truncate_history(). This is based on mysql/mysql-server@ac79aa1522f33e6eb912133a81fa2614db764c9c.
-
Thirunarayanan Balathandayuthapani authored
An InnoDB commit fails when the difference between consecutive FTS_DOC_ID values is greater than 4294967295. The fix removes the limitation on the FTS_DOC_ID delta, makes FTS encode 8-byte values, and removes the FTS_DOC_ID_MAX_STEP variable. The fts0vlc.ic file is replaced with fts0vlc.h.

fts_encode_int(): Should be able to encode a value that takes up to 10 bytes.
fts_get_encoded_len(): Should return the length of a value that takes up to 10 bytes.
fts_decode_vlc(): Add a debug assertion to verify that the maximum length allowed is 10.
mach_read_uint64_little_endian(): Read a 64-bit value stored in little-endian format.

Added a unit test that checks the FTS encoding of the minimum and maximum values.
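The 10-byte limit follows from the arithmetic of a 7-bit variable-length code: nine bytes carry only 63 payload bits, so a full 64-bit value needs a tenth byte. The sketch below shows one such code. The function names `vlc_encoded_len`, `vlc_encode` and `vlc_decode` are hypothetical, and the byte layout (most significant group first, 0x80 marking the last byte) is an assumption of this sketch rather than a statement about the InnoDB on-disk format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of 7-bit groups needed for val. This is at most 10 for a 64-bit
// value, because 9 groups cover only 63 bits.
std::size_t vlc_encoded_len(std::uint64_t val)
{
  std::size_t len = 1;
  while (val >>= 7)
    len++;
  return len;
}

// Write val into buf (at least 10 bytes), most significant group first.
// The last byte carries the 0x80 terminator bit. Returns the length.
std::size_t vlc_encode(std::uint64_t val, unsigned char *buf)
{
  const std::size_t len = vlc_encoded_len(val);
  for (std::size_t i = len; i-- > 0; val >>= 7)
    buf[i] = static_cast<unsigned char>(val & 0x7F);
  buf[len - 1] |= 0x80;
  return len;
}

// Read one value starting at buf and store the number of bytes consumed.
std::uint64_t vlc_decode(const unsigned char *buf, std::size_t *consumed)
{
  std::uint64_t val = 0;
  std::size_t i = 0;
  for (;;)
  {
    assert(i < 10); // A 64-bit value never needs more than 10 bytes.
    const unsigned char b = buf[i++];
    val = (val << 7) | (b & 0x7F);
    if (b & 0x80)
      break;
  }
  *consumed = i;
  return val;
}
```

Under this layout, 4294967295 and 4294967296 both encode to 5 bytes, while the largest 64-bit value encodes to 10 bytes.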
-
Marko Mäkelä authored
-
Marko Mäkelä authored
In commit 1811fd51 the assertion should have said error_reported instead of !error_reported. But, that revised assertion would still fail in main.defaults where ER_BAD_DATA is reported during CREATE TABLE.
-
Sergei Krivonos authored
-
- 20 Oct, 2021 3 commits
-
-
Marko Mäkelä authored
-
Nikita Malyavin authored
Assertion `!pk->has_virtual()' failed in dict_index_build_internal_clust while creating a PRIMARY key longer than can be stored in the page.

This happened because the key was wrongly deduced to be a supported Long UNIQUE key; however, a PRIMARY KEY cannot be of that type. The main reason is that only 8 bytes are used to store the hash, see HA_HASH_FIELD_LENGTH. This is also why the HA_NOSAME flag is removed (which in turn caused the assertion) in open_table_from_share:

    if (key_info->algorithm == HA_KEY_ALG_LONG_HASH)
    {
      key_part_end++;
      key_info->flags&= ~HA_NOSAME;
    }

To make it unique, the additional check is done by the check_duplicate_long_entries call from ha_write_row, and a similar one from ha_update_row.

A PRIMARY key is already forbidden, which is checked by the first test in main.long_unique; however, is_hash_field_needed was wrongly deduced to be true in mysql_prepare_create_table in this particular case.

FIX:
* Improve the check for the Key::PRIMARY type
* Simplify the is_hash_field_needed deduction to make it easier to read
-
Marko Mäkelä authored
-