- 26 Oct, 2021 11 commits
-
-
Aleksey Midenkov authored
Syntax for CONVERT TABLE:

  ALTER TABLE tbl_name CONVERT TABLE tbl_name TO PARTITION partition_name partition_spec

Examples:

  ALTER TABLE t1 CONVERT TABLE tp2 TO PARTITION p2 VALUES LESS THAN MAX_VALUE();

The new ALTER_PARTITION_CONVERT_IN command for fast_alter_partition_table() is implemented in the alter_partition_convert_in() function, which basically does ha_rename_table(). The table structure and data checks are essentially the same as in the EXCHANGE PARTITION command; they are done by compare_table_with_partition() and check_table_data(). Atomic DDL follows the scheme from MDEV-22166 (see the corresponding commit message). The only difference is that it also has to drop the source table's frm, which is done by WFRM_DROP_CONVERTED_FROM.

The initial patch was done by Dmitry Shulga <dmitry.shulga@mariadb.com>
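For illustration, a minimal end-to-end sketch of the new syntax (table names, partition names and values are hypothetical, not taken from the commit); as with EXCHANGE PARTITION, the source table must match the partitioned table's structure and its data must fit the new partition:

  -- hypothetical example of CONVERT TABLE ... TO PARTITION
  CREATE TABLE t1 (x INT)
    PARTITION BY RANGE (x)
      (PARTITION p0 VALUES LESS THAN (100));
  CREATE TABLE tp1 (x INT);
  INSERT INTO tp1 VALUES (150);
  -- absorb tp1 into t1 as a new partition p1 covering [100, 200)
  ALTER TABLE t1 CONVERT TABLE tp1 TO PARTITION p1 VALUES LESS THAN (200);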
-
Aleksey Midenkov authored
-
Sergei Golubchik authored
-
Aleksey Midenkov authored
Syntax for the CONVERT keyword:

  ALTER TABLE tbl_name
      [alter_option [, alter_option] ...] |
      [partition_options]

  partition_option: {
      ...
      | CONVERT PARTITION partition_name TO TABLE tbl_name
  }

Examples:

  ALTER TABLE t1 CONVERT PARTITION p2 TO TABLE tp2;

The new ALTER_PARTITION_CONVERT_OUT command for fast_alter_partition_table() is implemented in the alter_partition_convert_out() function, which basically does ha_rename_table(). The partition to extract is marked with the same flag as a dropped partition: PART_TO_BE_DROPPED. Note that we cannot have multiple partitioning commands in one ALTER.

For DDL logging the principle is basically the same as for the other fast_alter_partition_table() commands. The only difference is that it integrates the late Atomic DDL functions and introduces an additional phase, WFRM_BACKUP_ORIGINAL. That is required for binlog consistency: otherwise we could not revert back after WFRM_INSTALL_SHADOW is done, and if we crash or fail before the DDL log is complete, the altered table will already be new but the binlog will miss that ALTER command. Note that this is different from all other atomic DDL in that it rolls back until ddl_log_complete() is done, even if everything was fully completed before the crash.

Test cases added to:
  parts.alter_table
  parts.partition_debug
  versioning.partition
  atomic.alter_partition
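For illustration, a minimal sketch of extracting a partition into a standalone table (table and partition names are hypothetical):

  -- hypothetical example of CONVERT PARTITION ... TO TABLE
  CREATE TABLE t1 (x INT)
    PARTITION BY RANGE (x)
      (PARTITION p0 VALUES LESS THAN (100),
       PARTITION p1 VALUES LESS THAN (200));
  INSERT INTO t1 VALUES (50), (150);
  -- p1 becomes the standalone table tp1; t1 keeps only the rows of p0
  ALTER TABLE t1 CONVERT PARTITION p1 TO TABLE tp1;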
-
Aleksey Midenkov authored
Instead of

  create or replace table t1 (x int)
    partition by range(x) (
      partition p1 values less than (10),
      partition pn values less than maxvalue);

it should be possible to type the shorter form:

  create or replace table t1 (x int)
    partition by range(x) (
      p1 values less than (10),
      pn values less than maxvalue);

As the above examples demonstrate, this makes the PARTITION keyword in a partition definition optional.
-
Dmitry Shulga authored
MDEV-22165: Prerequisite patch that adds missing data member initializers in constructors of the class Alter_table_ctx.

The static analyzer built into Eclipse CDT complained about missing initializers in constructors of the class Alter_table_ctx, so I've added them in order to eliminate the annoying warnings.
-
Aleksey Midenkov authored
Dead code cleanup:

part_info->num_parts usage was wrong and working incorrectly in mysql_drop_partitions() because num_parts is already updated in prep_alter_part_table(). We don't have to update part_info->partitions because part_info is destroyed at alter_partition_lock_handling().

Cleanups:

- DBUG_EVALUATE_IF() macro replaced by the shorter form DBUG_IF();
- Fixed typo in ER_KEY_COLUMN_DOES_NOT_EXITS.

Refactorings:

- Split write_log_replace_delete_frm() into write_log_delete_frm() and write_log_replace_frm();
- partition_info via DDL_LOG_STATE;
- set_part_info_exec_log_entry() removed.

DBUG_EVALUATE removed: DBUG_EVALUATE was only added for consistency together with DBUG_EVALUATE_IF. It is not used anywhere in the code.

DBUG_SUICIDE() fix on release builds: in release builds DBUG_SUICIDE() was a statement. That was wrong, as DBUG_SUICIDE() is used in expression context.
-
Aleksey Midenkov authored
Improves readability of DDL log debug traces.
-
Thirunarayanan Balathandayuthapani authored
When inserting a number of rows into an empty table, InnoDB will buffer and pre-sort the records for each index, and build the indexes one page at a time. For each index, a buffer of innodb_sort_buffer_size will be created. If the buffer runs out of memory then we will create temporary files for storing the data. At the end of the statement, we will sort and apply the buffered records.

Ideally, we would do this at the end of the transaction or only when starting to execute a non-INSERT statement on the table. However, it could be awkward if duplicate keys or similar errors would be reported during the execution of a later statement. This will be addressed in MDEV-25036. Any columns longer than 2000 bytes will be buffered in temporary files. (A usage sketch of this scenario follows after the function list below.)

innodb_prepare_commit_versioned(): Apply all buffered bulk insert operations at the end of each statement.

ha_commit_trans(): Handle errors from innodb_prepare_commit_versioned().

row_merge_buf_write(): This function should accept a blob file handle too, and it should write field data that is longer than 2000 bytes.

row_merge_bulk_t: Data structure to maintain the data during a bulk insert operation.

trx_mod_table_time_t::start_bulk_insert(): Notify the start of a bulk insert operation and create a new buffer for the given table.

trx_mod_table_time_t::add_tuple(): Buffer a record.

trx_mod_table_time_t::write_bulk(): Do the bulk insert operation present in the transaction.

trx_mod_table_time_t::bulk_buffer_exist(): Whether the buffer storage exists for the bulk transaction.

trx_mod_table_time_t::write_bulk(): Write all buffered insert operations for the transaction and the table.

row_ins_clust_index_entry_low(): Insert the data into the bulk buffer if it already exists.

row_ins_sec_index_entry(): Insert the secondary tuple if the bulk buffer already exists.

row_merge_bulk_buf_add(): Insert the tuple into the bulk buffer insert operation.

row_merge_buf_blob(): Write field data whose length is more than 2000 bytes into the blob temporary file. Write the file offset and length into the tuple field.

row_merge_copy_blob_from_file(): Copy the blob from the blob file handle based on the reference in the given tuple.

row_merge_insert_index_tuples(): Handle blobs for the bulk insert operation.

row_merge_bulk_t::row_merge_bulk_t(): Constructor. Initialize the buffer and file for all indexes except the fts index.

row_merge_bulk_t::create_tmp_file(): Create a new temporary file for the given index.

row_merge_bulk_t::write_to_tmp_file(): Write the content from the buffer to the disk file for the given index.

row_merge_bulk_t::add_tuple(): Insert the tuple into the merge buffer for the given index. If memory runs out then InnoDB should sort the buffer and write it into the file.

row_merge_bulk_t::write_to_index(): Do the bulk insert operation from the merge file/merge buffer for the given index.

row_merge_bulk_t::write_to_table(): Do the bulk insert operation for all indexes.

dict_stats_update(): If a bulk insert transaction is in progress, treat the table as empty. The index creation could hold latches for extended amounts of time.
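The scenario the buffering targets can be sketched as follows (a hedged illustration: table and column names are hypothetical, the exact conditions under which bulk buffering engages are not spelled out here, and the SEQUENCE engine is assumed to be available):

  -- rows inserted into an empty InnoDB table can be buffered and
  -- pre-sorted per index, then each index is built one page at a time
  CREATE TABLE t1 (a INT PRIMARY KEY, b VARCHAR(100), KEY (b)) ENGINE=InnoDB;
  INSERT INTO t1 SELECT seq, CONCAT('row-', seq) FROM seq_1_to_100000;
  -- at the end of the statement the buffered records are sorted and applied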
-
Marko Mäkelä authored
-
Marko Mäkelä authored
rollback_inplace_alter_table(): Tolerate a case where the transaction is not in an active state. If ha_innobase::commit_inplace_alter_table() failed with a deadlock, the transaction would already have been rolled back. This omission of error handling was introduced in commit 1bd681c8 (MDEV-25506 part 3).

After commit c3c53926 (MDEV-26554) it became easier to trigger DB_DEADLOCK during exclusive table lock acquisition in ha_innobase::commit_inplace_alter_table().

lock_table_low(): Add DBUG injection "innodb_table_deadlock".
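A hedged sketch of how the new injection point might be exercised in a debug build (the statement and flow are assumptions; only the injection name comes from the commit):

  -- debug builds only: make the table lock acquisition in
  -- ha_innobase::commit_inplace_alter_table() report DB_DEADLOCK
  SET debug_dbug = '+d,innodb_table_deadlock';
  ALTER TABLE t1 ADD INDEX (b), ALGORITHM=INPLACE;  -- expected to fail with a deadlock error
  SET debug_dbug = '-d,innodb_table_deadlock';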
-
- 25 Oct, 2021 5 commits
-
-
Sergei Golubchik authored
this is CentOOOOOOS 7
-
Vladislav Vaintroub authored
The reason for the crash was a bug in MDEV-19275, after which shutdown does not wait for binlog threads anymore.
-
Vladislav Vaintroub authored
-
Marko Mäkelä authored
We have observed hangs of the io_uring subsystem when using a Linux kernel newer than 5.10. Also 5.15-rc6 is affected by this. The exact cause of the hangs has not been diagnosed yet. As a safety measure, we will disable innodb_use_native_aio by default when the server has been configured with io_uring and the kernel version is between 5.11 and 5.15. If the start-up parameter innodb_use_native_aio=ON is set, then we will issue a warning to the server error log.
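The parameter involved can be checked at runtime; a small hedged sketch (nothing here beyond the variable name is taken from the commit):

  -- shows whether native AIO ended up enabled on this build/kernel; on an
  -- affected kernel (5.11 .. 5.15) with io_uring it now defaults to OFF,
  -- and forcing innodb_use_native_aio=ON logs a warning to the error log
  SELECT @@global.innodb_use_native_aio;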
-
Vladislav Vaintroub authored
Error C2440 'initializing': cannot convert from 'MYSQL_RES *(__stdcall *)(MYSQL *)' to 'MYSQL_RES *(__cdecl *)(MYSQL *)'
-
- 22 Oct, 2021 7 commits
-
-
Monty authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
This implements memory transaction support for:
* Intel Restricted Transactional Memory (RTM), also known as TSX-NI (Transactional Synchronization Extensions New Instructions)
* POWER v2.09 Hardware Trace Monitor (HTM) on GNU/Linux

transactional_lock_guard, transactional_shared_lock_guard: RAII lock guards that try to elide the lock acquisition when transactional memory is available.

buf_pool.page_hash: Try to elide latches whenever feasible. Related to the InnoDB change buffer and ROW_FORMAT=COMPRESSED tables, this is not always possible. In buf_page_get_low(), memory transactions only work reasonably well for validating a guessed block address.

TMLockGuard, TMLockTrxGuard, TMLockMutexGuard: RAII lock guards that try to elide lock_sys.latch and related latches.
-
Marko Mäkelä authored
Since commit bd5a6403 (MDEV-26033) we can actually calculate the buf_pool.page_hash cell and latch addresses while not holding buf_pool.mutex.

buf_page_alloc_descriptor(): Remove the MEM_UNDEFINED. We now expect buf_page_t::hash to be zero-initialized.

buf_pool_t::hash_chain: Dedicated data type for buf_pool.page_hash.array.

buf_LRU_free_one_page(): Merged to the only caller buf_pool_t::corrupted_evict().
-
Marko Mäkelä authored
page_hash_latch: Only use the spinlock implementation on SUX_LOCK_GENERIC platforms (those for which we do not implement a futex-like interface). Use srw_spin_mutex on 32-bit systems (except Microsoft Windows) to satisfy the size constraints.

rw_lock::is_read_locked(): Remove. We will use the slightly broader assertion is_locked().

srw_lock_: Implement is_locked(), is_write_locked() in a hacky way for the Microsoft Windows SRWLOCK. This should be acceptable, because we are only using these predicates in debug assertions (or later, in lock elision), and false positives should not matter.
-
Marko Mäkelä authored
In a stress test campaign of a 10.6-based branch by Matthias Leich, a deadlock between two InnoDB threads occurred, involving lock_sys.wait_mutex and a dict_table_t::lock_mutex. The cause of the hang is a latching order violation in lock_sys_t::cancel(). That function and the latching order violation were originally introduced in commit 8d16da14 (MDEV-24789).

lock_sys_t::cancel(): Invoke table->lock_mutex_trylock() in order to avoid a deadlock. If that fails, release lock_sys.wait_mutex, and acquire both latches. In that way, we will be obeying the latching order and no hangs will occur.

This hang should mostly affect DDL operations. DML operations will acquire only IX or IS table locks, which are compatible with each other.
-
- 21 Oct, 2021 15 commits
-
-
Vladislav Vaintroub authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
page_create_low(): Fix -Warray-bounds

log_buffer_extend(): Fix -Wstringop-overflow
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
Based on mysql/mysql-server@bc9c46bf2894673d0df17cd0ee872d0d99663121 but without sleeps. The test was verified to hit the debug assertion if the change to fts_add_doc_by_id() in commit 2d98b967 was reverted.
-
Marko Mäkelä authored
fts_cache_t::total_size_at_sync: New field, to sample total_size.

fts_add_doc_by_id(): Invoke sync if total_size has grown too much since the previous sync request. (Maintain cache->total_size_at_sync.)

ib_wqueue_t::length: Caches ib_list_len(*items).

ib_wqueue_len(): Removed. We will refer to fts_optimize_wq->length directly.

Based on mysql/mysql-server@bc9c46bf2894673d0df17cd0ee872d0d99663121
-
Marko Mäkelä authored
trx_commit_in_memory(): Do not release the rseg reference before trx_undo_commit_cleanup() has been invoked and the current transaction is truly done with the rollback segment. The purpose of the reference count is to prevent data races with trx_purge_truncate_history(). This is based on mysql/mysql-server@ac79aa1522f33e6eb912133a81fa2614db764c9c.
-
Thirunarayanan Balathandayuthapani authored
InnoDB commit fails when a consecutive FTS_DOC_ID value is greater than 4294967295. The fix is that InnoDB should remove the delta FTS_DOC_ID value limitation and FTS should encode 8-byte values; remove the FTS_DOC_ID_MAX_STEP variable. Replaced the fts0vlc.ic file with fts0vlc.h.

fts_encode_int(): Should be able to encode a 10-byte value.

fts_get_encoded_len(): Should get the length of a value which takes up to 10 bytes.

fts_decode_vlc(): Add a debug assertion to verify that the maximum length allowed is 10 bytes.

mach_read_uint64_little_endian(): Reads a 64-bit value stored in little-endian format.

Added a unit test case which checks the minimum and maximum values of the FTS encoding.
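A hedged illustration of the scenario described above (column values are hypothetical; whether this exact statement sequence reproduced the failure is an assumption based on the commit text):

  -- user-supplied FTS_DOC_ID values whose consecutive delta exceeds
  -- 4294967295 used to make the InnoDB commit fail before this change
  CREATE TABLE t1 (FTS_DOC_ID BIGINT UNSIGNED NOT NULL, txt TEXT,
                   FULLTEXT KEY (txt)) ENGINE=InnoDB;
  INSERT INTO t1 VALUES (1, 'first document');
  INSERT INTO t1 VALUES (4294967297, 'doc id jump larger than 32 bits');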
-
Marko Mäkelä authored
-
Marko Mäkelä authored
In commit 1811fd51 the assertion should have said error_reported instead of !error_reported. But, that revised assertion would still fail in main.defaults where ER_BAD_DATA is reported during CREATE TABLE.
-
Sergei Krivonos authored
-
- 20 Oct, 2021 2 commits
-
-
Marko Mäkelä authored
-
Nikita Malyavin authored
Assertion `!pk->has_virtual()' failed in dict_index_build_internal_clust while creating a PRIMARY KEY longer than possible to store in the page. This happened because the key was wrongly deduced as Long UNIQUE supported, however a PRIMARY KEY cannot be of that type. The main reason is that only 8 bytes are used to store the hash, see HA_HASH_FIELD_LENGTH. This is also why the HA_NOSAME flag is removed (and caused the assertion in turn) in open_table_from_share:

  if (key_info->algorithm == HA_KEY_ALG_LONG_HASH)
  {
    key_part_end++;
    key_info->flags&= ~HA_NOSAME;
  }

To make it unique, the additional check is done by a check_duplicate_long_entries call from ha_write_row, and a similar one from ha_update_row. A PRIMARY KEY of this kind is already forbidden, which is checked by the first test in main.long_unique, however is_hash_field_needed was wrongly deduced to true in mysql_prepare_create_table in this particular case.

FIX:
* Improve the check for the Key::PRIMARY type
* Simplify the is_hash_field_needed deduction for a neater reading
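For illustration, a hedged SQL sketch of the failing scenario (table name, column and prefix length are hypothetical; the exact error reported after the fix is an assumption):

  -- a PRIMARY KEY longer than the page/index limit must not be silently
  -- turned into a long-hash key; it is expected to be rejected instead
  CREATE TABLE t1 (pk TEXT, PRIMARY KEY (pk(4000))) ENGINE=InnoDB;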
-