Commits · aa719b5010c929132b4460b78113fbd07497d9c8 · nexedi / MariaDB

25 Oct, 2023 10 commits

MDEV-32050: Do not copy undo records in purge · aa719b50

Marko Mäkelä authored Oct 25, 2023

Also, default to innodb_purge_batch_size=1000,
replacing the old default value of processing 300 undo log pages
in a batch. Axel Schwenke found this value to help reduce purge lag
without having a significant impact on workload throughput.

In purge, we can simply acquire a shared latch on the undo log page
(to avoid a race condition like the one that was fixed in
commit b102872a) and retain a buffer-fix
after releasing the latch. The buffer-fix will prevent the undo log
page from being evicted from the buffer pool. Concurrent modification
is prevented by design. Only the purge_coordinator_task
(or its accomplice purge_truncation_task) may free the undo log pages,
after any purge_worker_task have completed execution. Hence, we do not
have to worry about any overwriting or reuse of the undo log records.

trx_undo_rec_copy(): Remove. The only remaining caller would have been
trx_undo_get_undo_rec_low(), which is where the logic was merged.

purge_sys_t::m_initialized: Replaces heap.

purge_sys_t::pages: A cache of buffer-fixed pages that have been
looked up from buf_pool.page_hash.

purge_sys_t::get_page(): Return a buffer-fixed undo page, using the
pages cache.

purge_sys_t::batch_cleanup(): Renamed from clone_end_view().
Clear the pages cache and clone the end_view at the end of a batch.

purge_sys_t::n_pages_handled(): Return pages.size(). This determines
if innodb_purge_batch_size was exceeded.
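
A minimal sketch of how a cache of buffer-fixed pages might behave, for illustration only. It is not the InnoDB implementation: ToyPage, ToyBufferPool and ToyPurgeSys are hypothetical stand-ins, and buffer-fixing is modeled as a plain reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a buffer pool page.
// fix_count models buffer-fixing: a page with fix_count > 0 must not be evicted.
struct ToyPage {
  uint64_t id = 0;
  unsigned fix_count = 0;
};

// Hypothetical stand-in for the buffer pool and its page hash.
struct ToyBufferPool {
  std::unordered_map<uint64_t, ToyPage> page_hash;
  std::size_t lookups = 0;  // counts the (expensive) hash lookups

  // Look up a page (creating it here for simplicity) and buffer-fix it.
  ToyPage *fix_page(uint64_t id) {
    ++lookups;
    ToyPage &page = page_hash[id];
    page.id = id;
    ++page.fix_count;
    return &page;
  }
};

// Hypothetical stand-in for the purge subsystem.
struct ToyPurgeSys {
  ToyBufferPool &pool;
  std::unordered_map<uint64_t, ToyPage *> pages;  // cache of buffer-fixed pages

  explicit ToyPurgeSys(ToyBufferPool &p) : pool(p) {}

  // Return a buffer-fixed page, consulting the cache before the buffer pool.
  ToyPage *get_page(uint64_t id) {
    auto it = pages.find(id);
    if (it != pages.end())
      return it->second;
    ToyPage *page = pool.fix_page(id);
    pages.emplace(id, page);
    return page;
  }

  // The number of distinct pages touched in the current batch.
  std::size_t n_pages_handled() const { return pages.size(); }

  // At the end of a batch, release every buffer-fix and empty the cache.
  void batch_cleanup() {
    for (auto &entry : pages) {
      assert(entry.second->fix_count > 0);
      --entry.second->fix_count;
    }
    pages.clear();
  }
};
```

In this toy model, repeated get_page() calls for the same page cost one buffer pool lookup, and the cache size doubles as the batch progress counter.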

purge_sys_t::rseg_get_next_history_log(): Replaces
trx_purge_rseg_get_next_history_log().

purge_sys_t::choose_next_log(): Replaces trx_purge_choose_next_log()
and trx_purge_read_undo_rec().

purge_sys_t::get_next_rec(): Replaces trx_purge_get_next_rec()
and trx_undo_get_next_rec().

purge_sys_t::fetch_next_rec(): Replaces trx_purge_fetch_next_rec()
and some use of trx_undo_get_first_rec().

trx_purge_attach_undo_recs(): Do not allow purge_sys.n_pages_handled()
to exceed the innodb_purge_batch_size or ¾ of the buffer pool, whichever
is smaller.
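
The limit described above amounts to a simple minimum. The following is a sketch under assumptions: max_purge_pages() and its parameter names are hypothetical, and the buffer pool size is taken to be expressed in pages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: the most undo pages one purge batch may handle,
// i.e. the smaller of the configured batch size and 3/4 of the buffer pool.
// Both arguments are counted in pages.
inline std::size_t max_purge_pages(std::size_t batch_size,
                                   std::size_t buf_pool_pages) {
  assert(batch_size > 0);
  return std::min(batch_size, buf_pool_pages * 3 / 4);
}
```

With a batch size of 1000, a buffer pool of 400 pages would cap the batch at 300 pages, while a buffer pool of 100000 pages would leave the configured 1000 in effect.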

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich and Axel Schwenke

aa719b50

MDEV-32050: Look up tables in the purge coordinator · 88733282

Marko Mäkelä authored Oct 25, 2023

The InnoDB table lookup in purge worker threads is a bottleneck that can
degrade a slow shutdown to utilize less than 2 threads. Let us fix that
bottleneck by constructing a local lookup table that does not require any
synchronization while the undo log records of the current batch
are being processed.

TRX_PURGE_TABLE_BUCKETS: The initial number of std::unordered_map
hash buckets used during a purge batch. This could avoid some
resizing and rehashing in trx_purge_attach_undo_recs().

purge_node_t::tables: A lookup table from table ID to an already
looked up and locked table. Replaces many fields.

trx_purge_attach_undo_recs(): Look up each table in the purge batch
only once.
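
As a rough illustration of the per-batch lookup table, the sketch below resolves every table ID once and leaves later readers with a map that needs no synchronization. ToyTable, ToyDictionary, attach_tables() and the bucket count of 128 are all hypothetical; the real code also acquires metadata locks, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an open table handle.
struct ToyTable {
  uint64_t id;
  std::string name;
};

// Hypothetical stand-in for the data dictionary, whose lookups are
// assumed to be expensive because they require synchronization.
struct ToyDictionary {
  std::unordered_map<uint64_t, ToyTable> tables;
  std::size_t lookups = 0;

  const ToyTable *open_on_id(uint64_t id) {
    ++lookups;
    auto it = tables.find(id);
    return it == tables.end() ? nullptr : &it->second;
  }
};

// Illustrative initial bucket count, chosen arbitrarily for this sketch.
constexpr std::size_t kInitialBuckets = 128;

// Coordinator step: resolve each table ID that occurs in the batch exactly
// once. A nullptr entry means the table no longer exists and its records
// can be skipped.
inline std::unordered_map<uint64_t, const ToyTable *>
attach_tables(ToyDictionary &dict, const std::vector<uint64_t> &record_table_ids) {
  std::unordered_map<uint64_t, const ToyTable *> tables(kInitialBuckets);
  for (uint64_t id : record_table_ids) {
    if (tables.find(id) == tables.end())
      tables.emplace(id, dict.open_on_id(id));
  }
  return tables;
}
```

Workers that receive the resulting map only read it, so no latch is needed while the batch is being processed.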

trx_purge(): Close all tables and release MDL at the end of the batch.

trx_purge_table_open(), trx_purge_table_acquire(): Open a table in purge
and acquire a metadata lock on it. This replaces
dict_table_open_on_id<true>() and dict_acquire_mdl_shared().

purge_sys_t::close_and_reopen(): In case of an MDL conflict, close and
reopen all tables that are covered by the current purge batch.
It may be that some of the tables have been dropped meanwhile and can
be ignored. This replaces wait_SYS() and wait_FTS().

row_purge_parse_undo_rec(): Make purge_coordinator_task issue an
MDL warrant to any purge_worker_task which might need it
when innodb_purge_threads>1.

purge_node_t::end(): Clear the MDL warrant.

Reviewed by: Vladislav Lesin and Vladislav Vaintroub

88733282

MDEV-32050: Allow table to be guarded by an MDL of another thread · 39bb5ebb

Nikita Malyavin authored Oct 12, 2023

Add a debug-only field MDL_context::lock_warrant. This field can be set
to an MDL context other than the one in which the current thread executes.

The lock warrantor has to hold an MDL for at least the lifetime of the
table.

This is needed in the subsequent commit so that the shared MDL acquired by
the InnoDB purge_coordinator_task can be shared by purge_worker_task
that access index records that include virtual columns.

Reviewed by: Vladislav Vaintroub

39bb5ebb

MDEV-32050: Revert the throttling of MDEV-26356 · d70a98ae

Marko Mäkelä authored Oct 25, 2023

purge_coordinator_state::do_purge(): Simply use all innodb_purge_threads,
no matter what the LSN age is. During shutdown with innodb_fast_shutdown=0
this code could degrade to using only 1 thread.

Also, restore periodical "InnoDB: to purge" messages that were
accidentally disabled in commit 80585c9d.

Reviewed by: Vladislav Lesin and Vladislav Vaintroub

d70a98ae

MDEV-32050: Hold exclusive purge_sys.rseg->latch longer · 2027c482

Marko Mäkelä authored Oct 25, 2023

Let the purge_coordinator_task acquire purge_sys.rseg->latch
less frequently and hold it longer at a time. This may throttle
concurrent DML and reduce purge lag a little.

Remove an unnecessary std::this_thread::yield(), because the
trx_purge_attach_undo_recs() is supposed to terminate the scan
when running out of undo log records. Ultimately, this will
result in purge_coordinator_state::do_purge() and
purge_coordinator_callback() returning control to the thread pool.

Reviewed by: Vladislav Lesin and Vladislav Vaintroub

2027c482

MDEV-32050: Improve srv_wake_purge_thread_if_not_active() · 44689eb7

Marko Mäkelä authored Oct 25, 2023

purge_sys_t::wake_if_not_active(): Replaces
srv_wake_purge_thread_if_not_active().

innodb_ddl_recovery_done(): Move the wakeup call to
srv_init_purge_tasks().

purge_coordinator_timer: Remove. The srv_master_callback() already
invokes purge_sys.wake_if_not_active() once per second.

Reviewed by: Vladislav Lesin and Vladislav Vaintroub

44689eb7

MDEV-32050: Deprecate&ignore innodb_purge_rseg_truncate_frequency · 14685b10

Marko Mäkelä authored Oct 25, 2023

The motivation of introducing the parameter
innodb_purge_rseg_truncate_frequency in
mysql/mysql-server@28bbd66ea5f6acf80fcb381057bb7ca5b7b188d2 and
mysql/mysql-server@8fc2120fed11d2498ecb3635d87f414c76985fce
seems to have been to avoid stalls due to freeing undo log pages
or truncating undo log tablespaces. In MariaDB Server,
innodb_undo_log_truncate=ON should be a much lighter operation
than in MySQL, because it will not involve any log checkpoint.

Another source of performance stalls should be
trx_purge_truncate_rseg_history(), which is shrinking the history list
by freeing the undo log pages whose undo records have been purged.
To alleviate that, we will introduce a purge_truncation_task that will
offload this from the purge_coordinator_task. In that way, the next
innodb_purge_batch_size pages may be parsed and purged while the pages
from the previous batch are being freed and the history list being shrunk.

The processing of innodb_undo_log_truncate=ON will still remain the
responsibility of the purge_coordinator_task.

purge_coordinator_state::count: Remove. We will ignore
innodb_purge_rseg_truncate_frequency, and act as if it had been
set to 1 (the maximum shrinking frequency).

purge_coordinator_state::do_purge(): Invoke an asynchronous task
purge_truncation_callback() to free the undo log pages.

purge_sys_t::iterator::free_history(): Free those undo log pages
that have been processed. This used to be a part of
trx_purge_truncate_history().

purge_sys_t::clone_end_view(): Take a new value of purge_sys.head
as a parameter, so that it will be updated while holding exclusive
purge_sys.latch. This is needed for race-free access to the field
in purge_truncation_callback().

Reviewed by: Vladislav Lesin

14685b10

MDEV-32050: Clean up online ALTER · 21bec970

Marko Mäkelä authored Oct 25, 2023

UndorecApplier::assign_rec(): Remove. We will pass the undo record to
UndorecApplier::apply_undo_rec(). There is no need to copy the
undo record, because nothing else can write to the undo log pages
that belong to an active or incomplete transaction.

trx_t::apply_log(): Buffer-fix the undo page across mini-transaction
boundary in order to avoid repeated page lookups.

Reviewed by: Vladislav Lesin

21bec970

MDEV-32050: Clean up log parsing · 9bb5d9fe

Marko Mäkelä authored Oct 25, 2023

purge_node_t, undo_node_t: Change the type of rec_type and cmpl_info
to byte, because this data is being extracted from a single byte.

UndorecApplier: Change type and cmpl_info to be of type byte, and
move them next to the 16-bit offset field to minimize alignment bloat.
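
The effect of grouping one-byte fields next to a 16-bit field can be seen in a small layout comparison. Both structs below are hypothetical and do not mirror the real UndorecApplier; exact sizes are implementation-defined, although on a typical 64-bit ABI they are likely to be 24 and 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: one-byte fields scattered between wider ones,
// so the compiler has to insert padding after each of them.
struct ScatteredFields {
  uint8_t type;
  const void *undo_rec;
  uint16_t offset;
  uint8_t cmpl_info;
};

// The same fields, with the two bytes placed right after the 16-bit
// offset so that they share one padded slot.
struct GroupedFields {
  const void *undo_rec;
  uint16_t offset;
  uint8_t type;
  uint8_t cmpl_info;
};

// Grouping never makes the struct larger on common ABIs.
static_assert(sizeof(GroupedFields) <= sizeof(ScatteredFields),
              "grouping small fields should not increase the size");
```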

row_purge_parse_undo_rec(): Remove some redundant code. Purge will
be started by innodb_ddl_recovery_done(), at which point all
necessary subsystems will have been initialized.

trx_purge_rec_t::undo_rec: Point to const.

Reviewed by: Vladislav Lesin

9bb5d9fe

MDEV-32050 preparation: Simplify ROLLBACK · ea42c4ba

Marko Mäkelä authored Oct 25, 2023

undo_node_t::state: Replaced with bool is_temp.

row_undo_rec_get(): Do not copy the undo log record.
The motivation of the copying was to not hold latches on the undo pages
and therefore to avoid deadlocks due to lock order inversion a.k.a.
latching order violation: It is not allowed to wait for an index page latch
while holding an undo page latch, because MVCC reads would first acquire
an index page latch and then an undo page latch. But, in rollback, we
do not actually need any latch on our own undo pages. The transaction
that is being rolled back is the exclusive owner of its undo log records.
They cannot be overwritten by other threads until the rollback is complete.
Therefore, a buffer fix will protect the undo log record just fine,
by preventing page eviction. We still must initially acquire a shared latch
on each undo page, to avoid a race condition like the one that was fixed in
commit b102872a.

row_undo_ins_parse_undo_rec(): The first two bytes of the undo log record
now are the pointer to the next record within the page, not a length.
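
To illustrate the difference between a stored length and a next-record pointer, here is a sketch over a toy page buffer. The helper names are hypothetical and the page layout is simplified; the only format detail relied on is that InnoDB stores integers most significant byte first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a 16-bit integer stored most significant byte first.
inline uint16_t read_be16(const unsigned char *p) {
  return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical helper: the first two bytes of a record on the page hold
// the page offset of the next record.
inline uint16_t next_rec_offset(const unsigned char *page, uint16_t rec_offset) {
  return read_be16(page + rec_offset);
}

// Hypothetical helper: with a next-record pointer, the size of a record
// is derived from two offsets instead of being read from the record.
inline uint16_t rec_size(const unsigned char *page, uint16_t rec_offset) {
  const uint16_t next = next_rec_offset(page, rec_offset);
  assert(next > rec_offset);
  return static_cast<uint16_t>(next - rec_offset);
}
```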

Reviewed by: Vladislav Lesin

ea42c4ba

24 Oct, 2023 1 commit

MDEV-32530 Race condition in lock_wait_rpl_report() · b78b77e7

Marko Mäkelä authored Oct 24, 2023

After acquiring lock_sys.latch, always load trx->lock.wait_lock.
It could have been changed by another thread that did lock_rec_move()
and released lock_sys.latch right before lock_sys.wr_lock_try()
succeeded.
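
The general pattern, reloading shared state after the latch has been acquired rather than trusting a value read earlier, can be sketched as follows. ToyLock, ToyTrx, ToyLockSys and lock_to_report() are hypothetical and use a plain std::mutex in place of lock_sys.latch.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for a lock object, a transaction and the lock system.
struct ToyLock {
  int id;
};

struct ToyTrx {
  std::atomic<const ToyLock *> wait_lock{nullptr};
};

struct ToyLockSys {
  std::mutex latch;
};

// Return the lock that the transaction is waiting for. Any value of
// wait_lock that was read before the latch was acquired may be stale,
// so the field is always loaded again while holding the latch.
inline const ToyLock *lock_to_report(ToyLockSys &sys, ToyTrx &trx) {
  std::lock_guard<std::mutex> guard(sys.latch);
  return trx.wait_lock.load(std::memory_order_relaxed);
}
```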

This regression was introduced in
commit e039720b (MDEV-32096).

Reviewed by: Vladislav Lesin

b78b77e7

23 Oct, 2023 3 commits

Merge 10.5 into 10.6 · b21f52ee
Marko Mäkelä authored Oct 23, 2023

b21f52ee

MDEV-32552 Write-ahead logging is broken for freed pages · b5e43a1d

Marko Mäkelä authored Oct 23, 2023

buf_page_free(): Flag the freed page as modified if it is found in
the buffer pool.

buf_flush_page(): If the page has been freed, ensure that the log
for it has been durably written, before removing the page
from buf_pool.flush_list.
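
The write-ahead logging rule for freed pages can be pictured with a toy model: the log must be durable up to the page's LSN before the page leaves the flush list, even if the page itself is never written. ToyRedoLog, ToyDirtyPage and toy_flush_page() are hypothetical and are not the buf_flush_page() logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the redo log.
struct ToyRedoLog {
  uint64_t durable_lsn = 0;

  // Make the log durable up to at least the given LSN.
  void write_and_flush_up_to(uint64_t lsn) {
    if (lsn > durable_lsn)
      durable_lsn = lsn;
  }
};

// Hypothetical stand-in for a modified page in the buffer pool.
struct ToyDirtyPage {
  uint64_t page_lsn = 0;
  bool freed = false;
  bool in_flush_list = true;
  bool written_to_file = false;
};

// Flush one page. The log is made durable first in every case; a freed
// page is then dropped from the flush list without being written.
inline void toy_flush_page(ToyRedoLog &log, ToyDirtyPage &page) {
  log.write_and_flush_up_to(page.page_lsn);
  if (!page.freed)
    page.written_to_file = true;
  page.in_flush_list = false;
}
```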

FindBlockX: Find also MTR_MEMO_PAGE_X_MODIFY in order to avoid an
occasional failure of innodb.innodb_defrag_concurrent, which involves
freeing and reallocating pages in the same mini-transaction.

This fixes a regression that was introduced in
commit a35b4ae8 (MDEV-15528).

This logic was tested by commenting out the $shutdown_timeout line
from a test and running the following:

./mtr --rr innodb.scrub
rr replay var/log/mysqld.1.rr/mariadbd-0

A breakpoint in the modified buf_flush_page() was hit, and the
FIL_PAGE_LSN of that page had been last modified during the
mtr_t::commit() of a mini-transaction where buf_page_free()
had been executed on that page.

b5e43a1d

new CC v3.3 · 0a4103e6
Oleksandr Byelkin authored Oct 23, 2023

0a4103e6

19 Oct, 2023 7 commits

MDEV-32113: utf8mb3_key_col=utf8mb4_value cannot be used for ref · 4941ac91

Sergei Petrunia authored Sep 19, 2023

(Variant#3: Allow cross-charset comparisons, use a special
CHARSET_INFO to create lookup keys. Review input addressed.)

Equalities that compare utf8mb{3,4}_general_ci strings, like:

  WHERE ... utf8mb3_key_col=utf8mb4_value    (MB3-4-CMP)

can now be used to construct ref[const] access and also participate
in multiple-equalities.
This means that utf8mb3_key_col can be used for key lookups when
compared with a utf8mb4 constant, field or expression using the '=' or
'<=>' comparison operators.

This is controlled by optimizer_switch='cset_narrowing=on', which is
OFF by default.

IMPLEMENTATION
Item value comparison in (MB3-4-CMP) is done using utf8mb4_general_ci.
This is valid as any utf8mb3 value is also a utf8mb4 value.

When making index lookup value for utf8mb3_key_col, we do "Charset
Narrowing": characters that are in the Basic Multilingual Plane (=BMP) are
copied as-is, as they can be represented in utf8mb3. Characters that are
outside the BMP cannot be represented in utf8mb3 and are replaced
with U+FFFD, the "Replacement Character".

In utf8mb4_general_ci, the Replacement Character compares as equal to any
character that's not in BMP. Because of this, the constructed lookup value
will find all index records that would be considered equal by the original
condition (MB3-4-CMP).
Approved-by: Monty <monty@mariadb.org>
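
The narrowing step described above can be sketched on decoded code points. This is an illustration under assumptions, not the server's CHARSET_INFO code: narrow_to_bmp(), toy_weight() and toy_equal() are hypothetical, and the toy comparison ignores case folding and models only the rule that U+FFFD compares equal to any non-BMP character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr char32_t kReplacementChar = 0xFFFD;

// Build a lookup key: BMP code points are copied as-is, and code points
// outside the BMP are replaced with U+FFFD.
inline std::u32string narrow_to_bmp(const std::u32string &value) {
  std::u32string key;
  key.reserve(value.size());
  for (char32_t cp : value)
    key.push_back(cp <= 0xFFFF ? cp : kReplacementChar);
  return key;
}

// Toy comparison weight: every non-BMP code point weighs the same as U+FFFD.
inline char32_t toy_weight(char32_t cp) {
  return cp > 0xFFFF ? kReplacementChar : cp;
}

// Toy equality that follows the rule above (no case folding).
inline bool toy_equal(const std::u32string &a, const std::u32string &b) {
  if (a.size() != b.size())
    return false;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (toy_weight(a[i]) != toy_weight(b[i]))
      return false;
  return true;
}
```

Under this toy rule the narrowed key always compares equal to the original value, which is why a lookup with the narrowed key does not miss matching index records.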

4941ac91

MDEV-32476 LeakSanitizer errors in get_quick_select or Assertion ... · 6a674c31

Monty authored Oct 19, 2023

The problem was that JOIN_TAB::cleanup() was not run because
JOIN::top_join_tab_count was not set in case of early errors.

Fixed by setting JOIN::top_join_tab_count when JOIN_TABs are allocated.

Something that should eventually be fixed:
- Cleaning up JOIN_TABs is now done in 3 different loops.
  JOIN_TAB::cleanup() is only doing a partial cleanup. Other cleanups
  are done outside of JOIN_TAB::cleanup().

The above should be fixed so that JOIN_TAB::cleanup() frees
everything related to its own memory, including all its sub JOIN_TABs.
JOIN::cleanup() should only loop over all its top JOIN_TABs and call
JOIN_TAB::cleanup() on these.
This will greatly simplify and speed up the current code (as we now do some
cleanups twice).

6a674c31

Fixed crash in is_stat_table() when using hash joins. · a1b6befc

Monty authored Oct 18, 2023

Other usage of persistent statistics checks 'stats_is_read' in the
caller, which is why this was not noticed earlier.

Other things:
- Simplified no_stat_values_provided

a1b6befc

Merge 10.5 into 10.6 · 6991b1c4
Marko Mäkelä authored Oct 19, 2023

6991b1c4

MDEV-31851 After crash recovery, undo tablespace fails to open · 85751ed8

Thirunarayanan Balathandayuthapani authored Oct 19, 2023

srv_all_undo_tablespaces_open(): While opening the extra unused
undo tablespaces, InnoDB should use ULINT_UNDEFINED instead of
SRV_SPACE_ID_UPPER_BOUND.

85751ed8

MDEV-31851 After crash recovery, undo tablespace fails to open · dbba1bb1

Thirunarayanan Balathandayuthapani authored Oct 19, 2023

recv_recovery_from_checkpoint_start(): InnoDB should add the
redo log block header + trailer size while checking the log
sequence number in log file with log sequence number in the
system tablespace first page.

dbba1bb1

MDEV-32144 fixup · 2d6dc65d

Marko Mäkelä authored Oct 19, 2023

In commit 384eb570 the debug check
was relaxed in trx_undo_header_create(), not in the intended function
trx_undo_write_xid().

2d6dc65d

18 Oct, 2023 2 commits

MDEV-32511: Race condition between checkpoint and page write · cfd17881

Marko Mäkelä authored Oct 18, 2023

fil_aio_callback(): Invoke fil_node_t::complete_write() before
releasing any page latch, so that in case a log checkpoint is
executed roughly concurrently with the first write into a file
since the previous checkpoint, we will not miss a fdatasync()
or fsync() call to make the write durable.

cfd17881

MDEV-32511 Assertion !os_aio_pending_writes() failed · bf7c6fc2

Marko Mäkelä authored Oct 18, 2023

In MemorySanitizer builds of 10.10 and 10.11, we would rather often
have the assertion fail in innodb_init() during mariadb-backup --prepare.
The assertion could also fail during InnoDB startup, but less often.

Before commit 685d958e in 10.8 the
log file cleanup after a successfully applied backup is different,
and the os_aio_pending_writes() assertion is in srv0start.cc.

IORequest::write_complete(): Invoke node->complete_write() before
releasing the page latch, so that a log checkpoint that is about to
execute concurrently will not miss a fdatasync() or fsync() on the
file, in case this was the first write since the last such call.

create_log_file(), srv_start(): Replace the debug assertion with
a debug check. For all intents and purposes, all writes could have
been completed but some write_io_callback() may not have invoked
io_slots::release() yet.

bf7c6fc2

17 Oct, 2023 1 commit

MDEV-31851 After crash recovery, undo tablespace fails to open · 3da5d047

Thirunarayanan Balathandayuthapani authored Oct 17, 2023

Problem:
========
- InnoDB fails to open undo tablespace when page0 is corrupted
and fails to throw error.

Solution:
=========
- InnoDB throws DB_CORRUPTION error when InnoDB encounters
page0 corruption of undo tablespace.

- InnoDB restores the page0 of undo tablespace from
doublewrite buffer if it encounters page corruption

- Moved Datafile::restore_from_doublewrite() to
recv_dblwr_t::restore_first_page(). So that undo
tablespace and system tablespace can use this function
instead of duplicating the code

srv_undo_tablespace_open(): Returns 0 if file doesn't exist
or ULINT_UNDEFINED if page0 is corrupted.

3da5d047

16 Oct, 2023 2 commits

MDEV-28122 Optimize table crash while applying online log · ee5cadd5

Thirunarayanan Balathandayuthapani authored Oct 16, 2023

- InnoDB fails to check the overflow buffer while applying
the operation to the table that was rebuilt. This is caused
by commit 3cef4f8f (MDEV-515).

ee5cadd5

Post fix for MDEV-32449 · cca95478
Monty authored Oct 16, 2023

cca95478

14 Oct, 2023 3 commits

MDEV-32449 Server crashes in Alter_info::add_stat_drop_index upon CREATE TABLE · 1c554459

Monty authored Oct 14, 2023

Fixed missing initialization of Alter_info()

This could cause crashes in some CREATE TABLE-like scenarios
where some generated indexes were automatically dropped.

I also added a test that we do not try to drop from index_stats for
temporary tables.

1c554459

Do not create histograms for single column unique key · ec277a70

Monty authored Oct 14, 2023

The intention was always to not create histograms for single-value
unique keys (as histograms are not useful in this case), but because of
a bug in the code this was still done.

The changes in the test cases were mainly because hist_size is now NULL
for these kinds of columns.

ec277a70

Revert "MDEV-29091: Correct event_name in PFS for wait caused by FOR UPDATE" · ea0b1ccd

Sergei Golubchik authored Oct 07, 2023

This reverts commit 03c9a4ef.

The fix is wrong. It was doing this: if the uninitialized
wait->m_class has some specific value, then don't initialize it.

ea0b1ccd

13 Oct, 2023 4 commits

make perfschema.show_aggregate test more reliable · c378efee
Sergei Golubchik authored Oct 13, 2023

c378efee

MDEV-32272 lock_release_on_prepare_try() does not release lock if supremum bit... · 18fa00a5

Vlad Lesin authored Oct 12, 2023

MDEV-32272 lock_release_on_prepare_try() does not release lock if supremum bit is set along with other bits set in lock's bitmap

The error was introduced by the fix for MDEV-30165 in the following commit:
d13a57ae

There is a logical error in lock_release_on_prepare_try():

        if (supremum_bit)
          lock_rec_unlock_supremum(*cell, lock);
        else
          lock_rec_dequeue_from_page(lock, false);

This is wrong because other bits can be set in the lock's bitmap and the
lock type can meet the release criteria, yet the above logic releases
only the supremum bit of the lock.

The fix is to release the lock if it meets the release criteria, and
otherwise to unlock the supremum if the supremum is locked.
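
The difference between the two orderings can be shown on a toy lock with a small bitmap. This is a sketch under assumptions rather than the actual patch: ToyRecLock and both functions are hypothetical, the releasable flag stands for "the lock type meets the release criteria", and dequeuing is modeled by clearing a flag.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Bit position used for the supremum pseudo-record in this toy model.
constexpr std::size_t kSupremumBit = 1;

// Hypothetical stand-in for a record lock.
struct ToyRecLock {
  std::bitset<8> bitmap;    // one bit per locked record
  bool releasable = false;  // does the lock type meet the release criteria?
  bool queued = true;       // is the lock still in the lock queue?
};

inline void toy_dequeue(ToyRecLock &lock) {
  lock.bitmap.reset();
  lock.queued = false;
}

// The flawed ordering: a set supremum bit short-circuits the release, so
// a releasable lock that also covers other records stays queued.
inline void release_on_prepare_buggy(ToyRecLock &lock) {
  const bool supremum_bit = lock.bitmap.test(kSupremumBit);
  if (!supremum_bit && !lock.releasable)
    return;  // nothing to release
  if (supremum_bit)
    lock.bitmap.reset(kSupremumBit);
  else
    toy_dequeue(lock);
}

// The corrected ordering: release the whole lock if it is releasable,
// otherwise clear only the supremum bit if it is set.
inline void release_on_prepare_fixed(ToyRecLock &lock) {
  if (lock.releasable)
    toy_dequeue(lock);
  else if (lock.bitmap.test(kSupremumBit))
    lock.bitmap.reset(kSupremumBit);
}
```

For a releasable lock that has both the supremum bit and another record bit set, the first function leaves the lock queued while the second removes it.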

There is also a test for the case, which was reported by the QA team. I
placed it in a separate file, because it requires a debug build.

Reviewed by: Marko Mäkelä

18fa00a5

make perfschema.show_aggregate test more debuggable · e3e66a57
Sergei Golubchik authored Oct 07, 2023

e3e66a57

MDEV-31098 InnoDB Recovery doesn't display encryption message when no... · cbad0bcd

Thirunarayanan Balathandayuthapani authored Oct 13, 2023

MDEV-31098 InnoDB Recovery doesn't display encryption message when no encryption configuration passed

- InnoDB fails to report the error when encryption configuration
wasn't passed. This patch addresses the issue by adding
the error while loading the tablespace and deferring the
tablespace creation.

cbad0bcd

12 Oct, 2023 3 commits

MDEV-18200 MariaBackup full backup failed with InnoDB: Failing assertion: success · fbd11d5f

Daniel Black authored Oct 13, 2023

Review cleanups.

fbd11d5f

MDEV-18200 MariaBackup full backup failed with InnoDB: Failing assertion: success · c79ca7c7

Daniel Black authored Sep 19, 2023

There are many filesystem related errors that can occur with
MariaBackup. These are already output to stderr with a good description of
the error. Many of these are permission or resource (file descriptor)
limits where the assertion and resulting core crash doesn't offer
developers anything more than the log message. To the user, assertions
and core crashes come across as poor error handling.

As such we return an error and handle this all the way up the stack.

c79ca7c7

Cleanup: Remove innobase_init_vc_templ() · f9d471e2

Marko Mäkelä authored Oct 12, 2023

This fixes up a merge of commit 4fb8f7d0
with respect to commit ea37b144.

f9d471e2

10 Oct, 2023 4 commits

MDEV-32388 MSAN / Valgrind errors in Item_func_like::get_mm_leaf upon query from partitioned table · 8bf17c57

Monty authored Oct 10, 2023

The problem was that RANGE_OPT_PARAM was not completely initialized in
some cases.
Added bzero() to ensure that all elements are always initialized.

8bf17c57

Removed warning from ssl_cipher.test · 55534a26
Monty authored Oct 10, 2023

55534a26

MDEV-31957 Concurrent ALTER and ANALYZE collecting statistics can result in stale statistical data · b159f05a

Monty authored Oct 07, 2023

Fixed hang when renaming index to original name

b159f05a

Remember first error in Dummy_error_handler · fdcb443e

Monty authored Oct 05, 2023

Use Dummy_error_handler in open_stat_tables() to ignore all errors
when opening statistics tables.

fdcb443e