Commits · b83c379420a8846ae4b28768d3c81fa354cca056 · nexedi / MariaDB

08 Nov, 2023 6 commits

Merge branch '10.5' into 10.6 · b83c3794
Oleksandr Byelkin authored Nov 08, 2023


MDEV-32728: Wrong mutex usage 'LOCK_thd_data' and 'wait_mutex' · 2a4c5733

Kristian Nielsen authored Nov 08, 2023

Checking for kill with thd_kill_level() or check_killed() runs apc
requests, which takes the LOCK_thd_kill mutex. But this is dangerous,
as checking for kill needs to be called while holding many different
mutexes, and can lead to cyclic mutex dependency and deadlock.

But running apc is only "best effort", so skip running the apc if the
LOCK_thd_kill is not available. The apc will then be run on next check
of kill signal.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
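
The "skip if the mutex is not available" idea can be illustrated with a non-blocking lock attempt. This is only a minimal sketch under stated assumptions: `Session`, `enqueue()` and the queue are hypothetical stand-ins rather than the server's THD or apc code, and `lock_kill` merely plays the role of LOCK_thd_kill.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for a connection object; not the real THD class.
struct Session {
  std::mutex lock_kill;                    // plays the role of LOCK_thd_kill
  std::queue<std::function<void()>> apc;   // pending requests, guarded by lock_kill
  std::atomic<bool> killed{false};

  // Queue a request; this path is allowed to block on the mutex.
  void enqueue(std::function<void()> request) {
    std::lock_guard<std::mutex> guard(lock_kill);
    apc.push(std::move(request));
  }

  // Report the kill flag. Pending requests run only if the mutex can be
  // taken without blocking; otherwise they stay queued for the next check.
  bool check_killed() {
    std::unique_lock<std::mutex> guard(lock_kill, std::try_to_lock);
    if (guard.owns_lock()) {
      while (!apc.empty()) {
        apc.front()();
        apc.pop();
      }
    }
    return killed.load();
  }
};
```

Because `check_killed()` never waits for the mutex, it cannot add an edge to a lock-order cycle, whatever other mutexes the caller happens to hold.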


Merge branch '10.4' into 10.5 · 6cfd2ba3
Oleksandr Byelkin authored Nov 08, 2023

MDEV-31851 Doublewrite recovery fixup · a44869d8
Thirunarayanan Balathandayuthapani authored Nov 07, 2023
```
recv_dblwr_t::find_page(): Tablespace flags validity should be
checked only for page 0.
```
MDEV-32656: ASAN errors in base_list_iterator::next / setup_table_map upon 2nd execution of PS · fefd6d55
Oleksandr Byelkin authored Nov 08, 2023
```
Correctly suppress error issuing when saving value in field for comparison
```

MDEV-32682: Assertion `range->rows >= s->found_records' failed in best_access_path · 16977474

Sergei Petrunia authored Nov 07, 2023

Fix the issue introduced in ec2574fd, fix for MDEV-31983:

get_quick_record_count() must set quick_count=0 when it got
IMPOSSIBLE_RANGE from test_quick_select.

Failure to do so will cause an assertion in 11.0, when the number of
quick select rows (0) is checked to be lower than the number of
found_records (which is capped up to 1).


06 Nov, 2023 1 commit
- MDEV-31851 Doublewrite recovery fixup · b52b7b41
  Thirunarayanan Balathandayuthapani authored Nov 06, 2023
```
recv_dblwr_t::find_page(): Tablespace flags validity should be
checked only for page 0.
```
04 Nov, 2023 1 commit
- MDEV-31826: File handle leak on failed IMPORT TABLESPACE · 1fc2843e
  Marko Mäkelä authored Nov 04, 2023
```
fil_space_t::drop(): If the caller is not interested in a
detached handle, close it immediately.
```
01 Nov, 2023 1 commit
- update C/C - compilation failure with gcc7 on s390x-sles-12 · 90e11488
  Sergei Golubchik authored Nov 01, 2023
  
31 Oct, 2023 1 commit
- MDEV-31826: Memory leak on failed IMPORT TABLESPACE · 0cc809f9
  Marko Mäkelä authored Oct 31, 2023
```
fil_delete_tablespace(): Invoke fil_space_free_low() directly.
This fixes up commit 39e3ca8b
```
30 Oct, 2023 2 commits
- MDEV-32531 MSAN / Valgrind errors in Item_func_like::get_mm_leaf with temporal field · 6f091434
  Monty authored Oct 30, 2023
```
Added missing initializer
```
- Make the test more stable · c4143f90
  Oleksandr Byelkin authored Oct 30, 2023
  
28 Oct, 2023 2 commits

MDEV-32612 Assertion `tab->select->quick' failed in test_if_skip_sort_order · ab6139dd
Rex authored Oct 28, 2023
```
Fixup for MDEV-31983, incorrect test for checking ability to use quick select.

Approved by Sergei Petrunia
```

MDEV-32351: Significant slowdown with outer joins: fix embedded. · 86351f5e

Sergei Petrunia authored Oct 28, 2023

For some reason, in embedded server, a command

let $a=`$query`

ignores local context. Make a workaround: use SET STATEMENT to set
debug_dbug in the same statement.


27 Oct, 2023 7 commits

Fix of Backport block-nl-join.r_unpack_time_ms. · 1cd8a5ef
Oleksandr Byelkin authored Oct 27, 2023


MDEV-32351: Significant slowdown with outer joins: Test coverage · 9bf2e5e3

Sergei Petrunia authored Oct 26, 2023

Make ANALYZE FORMAT=JSON print block-nl-join.r_unpack_ops when
analyze_print_r_unpack_ops debug flag is set.

Then, add a testcase.


ANALYZE FORMAT=JSON: Backport block-nl-join.r_unpack_time_ms from 11.0 +fix MDEV-30830. · 4ed59006
Sergei Petrunia authored Mar 10, 2023
```
Also fix it to work with hashed join (MDEV-30830).

Reviewed by: Monty <monty@mariadb.org>
```

MDEV-32351 Significant slowdown for query with many outer joins · 954a6dec

Igor Babaev authored Oct 19, 2023

This patch fixes a performance regression introduced in the patch for
bug MDEV-21104. The regression could affect queries that used a join
buffer for an outer join whose ON expression contained an extractable
conjunctive condition depending only on outer tables. If the number of
records in the join buffer for which this condition was false greatly
exceeded the number of other records, the slowdown could be significant.

If a conjunctive condition depending only on outer tables can be extracted
from the ON expression, this condition is evaluated when the interesting
fields of each surviving record of the outer tables are put into the join
buffer. Each such set of fields for any join operation is supplied with a
match flag field used to generate null-complemented rows. If the condition
evaluates to false, the flag is set to MATCH_IMPOSSIBLE. When looking in
the join buffer for records matching a record of the right operand of the
outer join operation, records with this flag need not be unpacked into
record buffers for evaluation of ON expressions.

The patch for MDEV-21104, which fixed a wrong-result problem involving the
'not exists' optimization, by mistake broke the code that allowed ignoring
records with the match flag set to MATCH_IMPOSSIBLE when looking for
matching records. As a result, such records were unpacked for each
record of the right operand of the outer join operation. This caused a
significant execution penalty in some cases.
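
The effect of honoring the flag can be sketched as follows. This is an illustration only, not the join cache code: `BufferedRecord`, `probe()` and the integer key are hypothetical, and only the flag name MATCH_IMPOSSIBLE is taken from the message above.

```cpp
#include <cstddef>
#include <vector>

// Flag values assumed for the sketch; only MATCH_IMPOSSIBLE is named in the text.
enum MatchFlag { MATCH_NOT_FOUND, MATCH_FOUND, MATCH_IMPOSSIBLE };

struct BufferedRecord {
  MatchFlag flag;
  int outer_key;  // hypothetical packed field compared by the ON expression
};

// Probe the buffer with one row of the right operand and return how many
// buffered records had to be unpacked to evaluate the ON expression.
std::size_t probe(std::vector<BufferedRecord>& buffer, int inner_key) {
  std::size_t unpacked = 0;
  for (BufferedRecord& rec : buffer) {
    if (rec.flag == MATCH_IMPOSSIBLE)
      continue;  // the outer-only condition was false: never unpack
    ++unpacked;  // stands for unpacking into record buffers
    if (rec.outer_key == inner_key)
      rec.flag = MATCH_FOUND;
  }
  return unpacked;
}
```

When most buffered records carry MATCH_IMPOSSIBLE, the per-probe cost drops from the buffer size to the number of records that can still match.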

One of the test cases added in the patch can be used only for demonstration
of the restored performance for the reported query. The second test case is
needed to demonstrate the validity of the fix.


fixed typo · 11abc219
Oleksandr Byelkin authored Oct 27, 2023


MDEV-32578 row_merge_fts_doc_tokenize() handles parser plugin inconsistently · 15ae97b1

Marko Mäkelä authored Oct 27, 2023

When mysql/mysql-server@0c954c2289a75d90d1088356b1092437ebf45a1d
added a plugin interface for FULLTEXT INDEX tokenization to MySQL 5.7,
fts_tokenize_ctx::processed_len got a second meaning, which is only
partly implemented in row_merge_fts_doc_tokenize().

This inconsistency could cause a crash when using FULLTEXT...WITH PARSER.
A test case that would crash MySQL 8.0 when using an n-gram parser and
single-character words would fail to crash in MySQL 5.7, because the
buf_full condition in row_merge_fts_doc_tokenize() was not met.

This change is inspired by
mysql/mysql-server@38e9a0779aeea2d197c727e306a910c56b26a47c
that appeared in MySQL 5.7.44.


MDEV-32593 Assertion failure upon CREATE SEQUENCE · 728bca44
Andrei authored Oct 27, 2023
```
The assert conditions recently added by MDEV-32593 are corrected.
```

26 Oct, 2023 6 commits

MDEV-32282: Galera node remains paused after interleaving FTWRLs · ef7fc586

Teemu Ollakka authored Sep 28, 2023

After two concurrent FTWRL/UNLOCK TABLES, the node stays in paused state
and the following CREATE TABLE fails with

  ER_UNKNOWN_COM_ERROR (1047): Aborting TOI: Replication paused on
  node for FTWRL/BACKUP STAGE.

The cause is the use of global `wsrep_locked_seqno` to determine
if the node should be resumed on UNLOCK TABLES. In some executions
the `wsrep_locked_seqno` is cleared by the first UNLOCK TABLES
after the second FTWRL gets past `make_global_read_lock_block_commit()`.

As a fix, use `thd->wsrep_desynced_backup_stage` to determine
if the thread should resume the node on UNLOCK TABLES.

Add MTR test galera.galera_ftwrl_concurrent to reproduce the
race. The test also contains cases for BACKUP STAGE, which
uses a similar mechanism for desyncing and pausing the node.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>


MDEV-32586 incorrect error about cyclic reference about JSON type virtual column · c9f87b88

Sergei Golubchik authored Oct 26, 2023

remove the hack where NO_DEFAULT_VALUE_FLAG was temporarily removed
from a field to initialize DEFAULT() functions in CHECK constraints
while disabling self-reference field checks.

Instead, initialize DEFAULT() functions in CHECK explicitly,
don't call check_field_expression_processor() for CHECK at all.


MDEV-32365 detailize the semisync replication magic number error · 9c433432

Andrei authored Oct 26, 2023

The semisync ack receiver thread (master side) now reports
details of the errors it encounters.
In case of a 'magic byte' error, a hexdump of the received packet
is always written to the error log at NOTE level.
In other cases, the exact server-level error is printed
as a warning (as it may not be critical) under log_warnings > 2.

An MTR test is added for the magic byte error. The other cases are
covered by existing MTR tests, provided log_warnings > 2 is set.


MDEV-32588 InnoDB may hang when running out of buffer pool · 5b53342a

Marko Mäkelä authored Oct 26, 2023

buf_flush_LRU_list_batch(): Do not skip pages that are actually clean
but in buf_pool.flush_list due to the "lazy removal" optimization of
commit 22b62eda, but try to evict them.
After acquiring buf_pool.flush_list_mutex, reread oldest_modification
to ensure that the block still remains in buf_pool.flush_list.

In addition to server hangs, this bug could also cause
InnoDB: Failing assertion: list.count > 0
in invocations of UT_LIST_REMOVE(flush_list, ...).

This fixes a regression that was caused by
commit a55b951e
and possibly made more likely to hit due to
commit aa719b50.


MDEV-31826 InnoDB may fail to recover after being killed in fil_delete_tablespace() · 39e3ca8b

Marko Mäkelä authored Oct 26, 2023

InnoDB was violating the write-ahead-logging protocol when a file
was being deleted, like this:

1. fil_delete_tablespace() set the fil_space_t::STOPPING flag
2. The buf_flush_page_cleaner() thread discards some changed pages for
this tablespace and advances the log checkpoint a little.
3. The server process is killed before fil_delete_tablespace() wrote
a FILE_DELETE record.
4. Recovery will try to apply log to pages of the tablespace, because
there was no FILE_DELETE record. This will fail, because some pages
that had been modified since the latest checkpoint had not been written
by the page cleaner.

Page writes must not be stopped before a FILE_DELETE record has been
durably written.

fil_space_t::drop(): Replaces fil_space_t::check_pending_operations().
Add the parameter detached_handle, and return a tablespace pointer
if this thread was the first one to stop I/O on the tablespace.

mtr_t::commit_file(): Remove the parameter detached_handle, and
move some handling to fil_space_t::drop().

fil_space_t: STOPPING_READS, STOPPING_WRITES: Separate flags for STOPPING.
We want to stop reads (and encryption) before stopping page writes.
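
The required ordering can be sketched with a toy model. Everything here is an assumption made for illustration: `Space`, `Log`, `drop_space()` and the flag bit values are hypothetical, and only the flag names and the FILE_DELETE record come from the message above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical flag bits; the names follow the commit message.
enum : unsigned { STOPPING_READS = 1u, STOPPING_WRITES = 2u };

struct Space {
  unsigned flags = 0;
  bool writes_allowed() const { return (flags & STOPPING_WRITES) == 0; }
};

// Toy durable log: each entry remembers whether page writes were still
// allowed at the moment the record was written.
struct Log {
  std::vector<std::pair<std::string, bool>> records;
  void write_durably(const std::string& rec, bool writes_allowed) {
    records.emplace_back(rec, writes_allowed);
  }
};

void drop_space(Space& space, Log& log) {
  space.flags |= STOPPING_READS;   // 1. stop reads first
  log.write_durably("FILE_DELETE", space.writes_allowed());  // 2. log the deletion
  space.flags |= STOPPING_WRITES;  // 3. only now stop page writes
}
```

In this model, a crash between steps 1 and 2 leaves page writes enabled, so recovery would not meet pages whose changes were discarded without a FILE_DELETE record.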

fil_space_t::is_stopping_writes(), fil_space_t::get_for_write():
Special accessors for the write path.

fil_space_t::flush_low(): Ignore the STOPPING_READS flag and only
stop if STOPPING_WRITES is set, to avoid an infinite loop in
fil_flush_file_spaces(), which was occasionally reproduced by
running the test encryption.create_or_replace.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich


Fix --view-protocol failures · cb4c2713
Oleksandr Byelkin authored Oct 26, 2023


25 Oct, 2023 13 commits

MDEV-31983 jointable materialization subquery optimization ignoring · ec2574fd

Rex authored Sep 15, 2023

...errors, then failing ASSERT.

UPDATE queries treat warnings as errors. In this case, an invalid
condition "datetime_key_col >= '2012-01'" caused warning-as-error inside
SQL_SELECT::test_quick_select().

The code that called test_quick_select() ignored this error and continued
join optimization. Then it eventually reached a thd->is_error() check
and failed to set up SJ-Materialization, which failed an assert.

Fixed this by making SQL_SELECT::test_quick_select() return the error in
its return value, and making any code that calls it check for the error
condition and abort the query if an error is returned.

Places in the code that didn't check for errors from
SQL_SELECT::test_quick_select but now do:
- get_quick_record_count() call in make_join_statistics(),
- test_if_skip_sort_order(),
- "Range checked for each record" code.

Extra error handling fixes and commit text wording by Sergei Petrunia,

Reviewed-by: Sergei Petrunia, Oleg Smirnov


MDEV-32475 Add logging of test_if_skip_sort_order to optimizer trace · 68542cae
Oleg Smirnov authored Oct 18, 2023


MDEV-32475: Skip sorting if we will read one row · 680f732f

Oleg Smirnov authored Oct 14, 2023

test_if_skip_sort_order() should catch the join types JT_EQ_REF,
JT_CONST and JT_SYSTEM and skip sort order for these.

Such join types imply retrieving of a single row of data, and sorting
of a single row can always be skipped.
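
The check described here amounts to a small predicate. The sketch below is a simplified assumption: the enum lists only a few access types and the helper name `reads_at_most_one_row()` is hypothetical; the real test_if_skip_sort_order() does considerably more.

```cpp
// Hypothetical subset of join access types; names mirror the commit message.
enum JoinType { JT_SYSTEM, JT_CONST, JT_EQ_REF, JT_REF, JT_RANGE, JT_ALL };

// True when the access method retrieves a single row, in which case any
// requested ordering is trivially satisfied and the sort can be skipped.
constexpr bool reads_at_most_one_row(JoinType type) {
  return type == JT_SYSTEM || type == JT_CONST || type == JT_EQ_REF;
}
```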


MDEV-32050: Boost innodb_purge_batch_size on slow shutdown · 2ba97021

Marko Mäkelä authored Oct 25, 2023

A slow shutdown using the previous default innodb_purge_batch_size=300
could be extremely slow, employing at most a few CPU cores on the average.
Let us use the maximum batch size in order to increase throughput.

Reviewed by: Vladislav Lesin


MDEV-32050: Do not copy undo records in purge · aa719b50

Marko Mäkelä authored Oct 25, 2023

Also, default to innodb_purge_batch_size=1000,
replacing the old default value of processing 300 undo log pages
in a batch. Axel Schwenke found this value to help reduce purge lag
without having a significant impact on workload throughput.

In purge, we can simply acquire a shared latch on the undo log page
(to avoid a race condition like the one that was fixed in
commit b102872a) and retain a buffer-fix
after releasing the latch. The buffer-fix will prevent the undo log
page from being evicted from the buffer pool. Concurrent modification
is prevented by design. Only the purge_coordinator_task
(or its accomplice purge_truncation_task) may free the undo log pages,
after any purge_worker_task have completed execution. Hence, we do not
have to worry about any overwriting or reuse of the undo log records.

trx_undo_rec_copy(): Remove. The only remaining caller would have been
trx_undo_get_undo_rec_low(), which is where the logic was merged.

purge_sys_t::m_initialized: Replaces heap.

purge_sys_t::pages: A cache of buffer-fixed pages that have been
looked up from buf_pool.page_hash.

purge_sys_t::get_page(): Return a buffer-fixed undo page, using the
pages cache.

trx_purge_t::batch_cleanup(): Renamed from clone_end_view().
Clear the pages cache and clone the end_view at the end of a batch.

purge_sys_t::n_pages_handled(): Return pages.size(). This determines
if innodb_purge_batch_size was exceeded.
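
The interplay of the pages cache, get_page(), batch_cleanup() and n_pages_handled() can be sketched as below. This is not the InnoDB implementation: `Page`, `PageCache`, the `fix_count` counter and the `lookup` callback are hypothetical stand-ins for buffer-fixing and for the buf_pool.page_hash lookup.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical page: a nonzero fix_count stands for "buffer-fixed",
// that is, protected against eviction from the buffer pool.
struct Page {
  int fix_count = 0;
};

class PageCache {
  std::unordered_map<std::uint64_t, Page*> pages_;

public:
  // Return a buffer-fixed page, consulting the cache before the slower
  // lookup (which stands in for a page hash lookup).
  template <typename Lookup>
  Page* get_page(std::uint64_t id, Lookup lookup) {
    auto it = pages_.find(id);
    if (it != pages_.end())
      return it->second;
    Page* page = lookup(id);
    if (page == nullptr)
      return nullptr;
    ++page->fix_count;  // keep the fix until the end of the batch
    pages_.emplace(id, page);
    return page;
  }

  // Number of distinct pages touched in the current batch.
  std::size_t n_pages_handled() const { return pages_.size(); }

  // Release every buffer-fix and forget the pages at the end of a batch.
  void batch_cleanup() {
    for (auto& entry : pages_)
      --entry.second->fix_count;
    pages_.clear();
  }
};
```

Each page is looked up and fixed once per batch no matter how many undo records on it are processed, and the cache size doubles as the batch progress counter.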

purge_sys_t::rseg_get_next_history_log(): Replaces
trx_purge_rseg_get_next_history_log().

purge_sys_t::choose_next_log(): Replaces trx_purge_choose_next_log()
and trx_purge_read_undo_rec().

purge_sys_t::get_next_rec(): Replaces trx_purge_get_next_rec()
and trx_undo_get_next_rec().

purge_sys_t::fetch_next_rec(): Replaces trx_purge_fetch_next_rec()
and some use of trx_undo_get_first_rec().

trx_purge_attach_undo_recs(): Do not allow purge_sys.n_pages_handled()
to exceed the innodb_purge_batch_size or ¾ of the buffer pool, whichever
is smaller.
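
As a worked example of this limit, the helper below computes the cap in pages. The function name `purge_batch_limit()` and the integer rounding are assumptions of the sketch, not taken from the source.

```cpp
#include <algorithm>
#include <cstddef>

// Upper bound on the pages handled in one purge batch: the configured
// batch size or three quarters of the buffer pool, whichever is smaller.
// Integer division rounds the three-quarters figure down.
constexpr std::size_t purge_batch_limit(std::size_t batch_size,
                                        std::size_t buf_pool_pages) {
  return std::min(batch_size, buf_pool_pages / 4 * 3);
}
```

With a batch size of 1000 and a buffer pool of 8192 pages the batch size is the limit; with a tiny pool of 400 pages the limit becomes 300.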

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich and Axel Schwenke


MDEV-32050: Look up tables in the purge coordinator · 88733282

Marko Mäkelä authored Oct 25, 2023

The InnoDB table lookup in purge worker threads is a bottleneck that can
degrade a slow shutdown to utilize less than 2 threads. Let us fix that
bottleneck by constructing a local lookup table that does not require any
synchronization while the undo log records of the current batch
are being processed.

TRX_PURGE_TABLE_BUCKETS: The initial number of std::unordered_map
hash buckets used during a purge batch. This could avoid some
resizing and rehashing in trx_purge_attach_undo_recs().

purge_node_t::tables: A lookup table from table ID to an already
looked up and locked table. Replaces many fields.

trx_purge_attach_undo_recs(): Look up each table in the purge batch
only once.
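
A batch-local lookup table of this kind might look like the sketch below. `Table`, `build_batch_tables()`, the `open` callback and the default bucket count are hypothetical; the source only states that a std::unordered_map keyed by table ID is pre-sized and filled once per batch.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an opened and locked table.
struct Table {
  std::uint64_t id;
};

// Resolve every table ID of a batch exactly once. The resulting map is
// local to the batch, so later readers need no synchronization.
template <typename OpenFn>
std::unordered_map<std::uint64_t, Table*> build_batch_tables(
    const std::vector<std::uint64_t>& table_ids, OpenFn open,
    std::size_t initial_buckets = 128) {  // assumed value, avoids early rehashing
  std::unordered_map<std::uint64_t, Table*> tables(initial_buckets);
  for (std::uint64_t id : table_ids) {
    if (tables.find(id) == tables.end())
      tables.emplace(id, open(id));  // the expensive lookup happens once per ID
  }
  return tables;
}
```

The expensive open step runs in one thread while the map is built; workers that process the batch afterwards only read the map.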

trx_purge(): Close all tables and release MDL at the end of the batch.

trx_purge_table_open(), trx_purge_table_acquire(): Open a table in purge
and acquire a metadata lock on it. This replaces
dict_table_open_on_id<true>() and dict_acquire_mdl_shared().

purge_sys_t::close_and_reopen(): In case of an MDL conflict, close and
reopen all tables that are covered by the current purge batch.
It may be that some of the tables have been dropped meanwhile and can
be ignored. This replaces wait_SYS() and wait_FTS().

row_purge_parse_undo_rec(): Make purge_coordinator_task issue a
MDL warrant to any purge_worker_task which might need it
when innodb_purge_threads>1.

purge_node_t::end(): Clear the MDL warrant.

Reviewed by: Vladislav Lesin and Vladislav Vaintroub


MDEV-32050: Allow table to be guarded by an MDL of another thread · 39bb5ebb

Nikita Malyavin authored Oct 12, 2023

Add a debug-only field MDL_context::lock_warrant. This field can be set
to an MDL context other than the one in which the current execution is done.

The lock warrantor has to hold an MDL for at least the duration of the
table's lifetime.

This is needed in the subsequent commit so that the shared MDL acquired by
the InnoDB purge_coordinator_task can be shared by purge_worker_task
that access index records that include virtual columns.

Reviewed by: Vladislav Vaintroub


MDEV-32050: Revert the throttling of MDEV-26356 · d70a98ae

Marko Mäkelä authored Oct 25, 2023

purge_coordinator_state::do_purge(): Simply use all innodb_purge_threads,
no matter what the LSN age is. During shutdown with innodb_fast_shutdown=0
this code could degrade to using only 1 thread.

Also, restore periodical "InnoDB: to purge" messages that were
accidentally disabled in commit 80585c9d.

Reviewed by: Vladislav Lesin and Vladislav Vaintroub


MDEV-32050: Hold exclusive purge_sys.rseg->latch longer · 2027c482

Marko Mäkelä authored Oct 25, 2023

Let the purge_coordinator_task acquire purge_sys.rseg->latch
less frequently and hold it longer at a time. This may throttle
concurrent DML and prevent purge lag a little.

Remove an unnecessary std::this_thread::yield(), because the
trx_purge_attach_undo_recs() is supposed to terminate the scan
when running out of undo log records. Ultimately, this will
result in purge_coordinator_state::do_purge() and
purge_coordinator_callback() returning control to the thread pool.

Reviewed by: Vladislav Lesin and Vladislav Vaintroub


MDEV-32050: Improve srv_wake_purge_thread_if_not_active() · 44689eb7

Marko Mäkelä authored Oct 25, 2023

purge_sys_t::wake_if_not_active(): Replaces
srv_wake_purge_thread_if_not_active().

innodb_ddl_recovery_done(): Move the wakeup call to
srv_init_purge_tasks().

purge_coordinator_timer: Remove. The srv_master_callback() already
invokes purge_sys.wake_if_not_active() once per second.

Reviewed by: Vladislav Lesin and Vladislav Vaintroub


MDEV-32050: Deprecate&ignore innodb_purge_rseg_truncate_frequency · 14685b10

Marko Mäkelä authored Oct 25, 2023

The motivation of introducing the parameter
innodb_purge_rseg_truncate_frequency in
mysql/mysql-server@28bbd66ea5f6acf80fcb381057bb7ca5b7b188d2 and
mysql/mysql-server@8fc2120fed11d2498ecb3635d87f414c76985fce
seems to have been to avoid stalls due to freeing undo log pages
or truncating undo log tablespaces. In MariaDB Server,
innodb_undo_log_truncate=ON should be a much lighter operation
than in MySQL, because it will not involve any log checkpoint.

Another source of performance stalls should be
trx_purge_truncate_rseg_history(), which is shrinking the history list
by freeing the undo log pages whose undo records have been purged.
To alleviate that, we will introduce a purge_truncation_task that will
offload this from the purge_coordinator_task. In that way, the next
innodb_purge_batch_size pages may be parsed and purged while the pages
from the previous batch are being freed and the history list being shrunk.

The processing of innodb_undo_log_truncate=ON will still remain the
responsibility of the purge_coordinator_task.

purge_coordinator_state::count: Remove. We will ignore
innodb_purge_rseg_truncate_frequency, and act as if it had been
set to 1 (the maximum shrinking frequency).

purge_coordinator_state::do_purge(): Invoke an asynchronous task
purge_truncation_callback() to free the undo log pages.

purge_sys_t::iterator::free_history(): Free those undo log pages
that have been processed. This used to be a part of
trx_purge_truncate_history().

purge_sys_t::clone_end_view(): Take a new value of purge_sys.head
as a parameter, so that it will be updated while holding exclusive
purge_sys.latch. This is needed for race-free access to the field
in purge_truncation_callback().

Reviewed by: Vladislav Lesin


MDEV-32050: Clean up online ALTER · 21bec970

Marko Mäkelä authored Oct 25, 2023

UndorecApplier::assign_rec(): Remove. We will pass the undo record to
UndorecApplier::apply_undo_rec(). There is no need to copy the
undo record, because nothing else can write to the undo log pages
that belong to an active or incomplete transaction.

trx_t::apply_log(): Buffer-fix the undo page across mini-transaction
boundary in order to avoid repeated page lookups.

Reviewed by: Vladislav Lesin


MDEV-32050: Clean up log parsing · 9bb5d9fe

Marko Mäkelä authored Oct 25, 2023

purge_node_t, undo_node_t: Change the type of rec_type and cmpl_info
to byte, because this data is being extracted from a single byte.

UndorecApplier: Change type and cmpl_info to be of type byte, and
move them next to the 16-bit offset field to minimize alignment bloat.

row_purge_parse_undo_rec(): Remove some redundant code. Purge will
be started by innodb_ddl_recovery_done(), at which point all
necessary subsystems will have been initialized.

trx_purge_rec_t::undo_rec: Point to const.

Reviewed by: Vladislav Lesin
