Commits · main · nexedi / MariaDB

19 Sep, 2024 1 commit

MDEV-34380: Set optimizer_switch='cset_narrowing=on' by default · 8478a06c
Sergei Petrunia authored Jul 26, 2024

8478a06c

10 Sep, 2024 1 commit

MDEV-28009 Deprecate spider_table_crd_thread_count and spider_table_sts_thread_count · fe3432b3

Yuchen Pei authored Sep 10, 2024

These variables/parameters have the default read-only value of 1, and
the only way to change them is through a command line flag together
with a command line flag loading spider. After this change, the flag
will have no effect.

fe3432b3

05 Sep, 2024 1 commit

MDEV-33853 Async rollback prepared transactions during binlog crash recovery · 5bbda971

Libing Song authored Jun 11, 2024

Summary
=======
During server recovery, active transactions are rolled back
automatically by the InnoDB background rollback thread. Prepared
transactions are committed or rolled back by binlog recovery, which
runs in the main thread before the server can serve users. If a big
transaction has to be rolled back, the server will be unavailable for
a long time.

This patch provides a way to roll back the prepared transactions
asynchronously, so that the rollback does not block server startup.

Design
======
- Handler::recover_rollback_by_xid()
  This patch provides a new handler interface to roll back transactions
  in the recovery phase. InnoDB just sets the transaction's state to
  active. The transaction will then be rolled back by the background
  rollback thread.

- Handler::signal_tc_log_recover_done()
  This function is called after the tc log (typically the binlog) has
  been opened. When it is called, all transactions that are to be rolled
  back have been reverted to the ACTIVE state, so it starts the rollback
  thread to roll them back.

- Background rollback thread
  With this patch, the start of the background rollback thread is
  deferred until binlog recovery is finished. It is started by
  innobase_tc_log_recovery_done().
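
The design above can be modelled in a few lines. The following is only a minimal, single-threaded sketch under stated assumptions: `ToyEngine`, `Trx`, `TrxState`, `run_background_rollback()` and `state_of()` are hypothetical names invented for illustration, and the real server runs the rollback in a separate thread rather than in a synchronous pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, single-threaded model of the recovery flow; none of these
// types are the real handler or InnoDB classes.
enum class TrxState { Prepared, Active, RolledBack };

struct Trx {
  std::uint64_t xid;
  TrxState state;
};

class ToyEngine {
public:
  explicit ToyEngine(std::vector<Trx> trxs) : trxs_(std::move(trxs)) {}

  // Analogue of recover_rollback_by_xid(): only flip the state to Active;
  // the actual undo work is left to the background rollback.
  bool recover_rollback_by_xid(std::uint64_t xid) {
    for (Trx &t : trxs_) {
      if (t.xid == xid && t.state == TrxState::Prepared) {
        t.state = TrxState::Active;
        return true;
      }
    }
    return false;
  }

  // Analogue of signal_tc_log_recover_done(): from now on the background
  // rollback is allowed to run. Assumed to be signalled exactly once.
  void signal_tc_log_recover_done() {
    assert(!recovery_done_);
    recovery_done_ = true;
  }

  // One pass of the (here synchronous) background rollback. Returns the
  // number of transactions rolled back; does nothing before the signal.
  std::size_t run_background_rollback() {
    if (!recovery_done_) return 0;
    std::size_t n = 0;
    for (Trx &t : trxs_) {
      if (t.state == TrxState::Active) {
        t.state = TrxState::RolledBack;
        ++n;
      }
    }
    return n;
  }

  // Lookup helper for the sketch; an unknown xid is reported as RolledBack.
  TrxState state_of(std::uint64_t xid) const {
    for (const Trx &t : trxs_)
      if (t.xid == xid) return t.state;
    return TrxState::RolledBack;
  }

private:
  std::vector<Trx> trxs_;
  bool recovery_done_ = false;
};
```

In this model, recovery only performs a cheap state change per transaction, and the expensive undo work starts after the signal, which is the property that keeps startup from blocking.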

5bbda971

04 Sep, 2024 3 commits

MDEV-34857: Implement --slave-abort-blocking-timeout · db5d1cde

Kristian Nielsen authored Aug 31, 2024

If a slave replicating an event has waited for more than
@@slave_abort_blocking_timeout for a conflicting metadata lock held by a
non-replication thread, the blocking query is killed to allow replication to
proceed and not be blocked indefinitely by a user query.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
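
The rule described in this message can be written as a single predicate. This is a hedged sketch only: `should_kill_blocker()` and its parameters are hypothetical and do not correspond to any function in the server.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical decision helper; names are illustrative, not server code.
bool should_kill_blocker(std::chrono::milliseconds waited,
                         std::chrono::milliseconds timeout,
                         bool blocker_is_replication_thread) {
  assert(waited.count() >= 0);
  // Only non-replication threads are killed, and only once the replication
  // thread has waited longer than the configured timeout.
  return !blocker_is_replication_thread && waited > timeout;
}
```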

db5d1cde

Merge 11.6 into main · 669d8ffe
Marko Mäkelä authored Sep 04, 2024

669d8ffe
Merge 11.4 into 11.6 · a5b80531
Marko Mäkelä authored Sep 04, 2024

a5b80531

29 Aug, 2024 7 commits

Merge 11.2 into 11.4 · 44733aa8
Marko Mäkelä authored Aug 29, 2024

44733aa8
Merge 10.11 into 11.2 · e91a7994
Marko Mäkelä authored Aug 29, 2024

e91a7994

MDEV-34750 SET GLOBAL innodb_log_file_size is not crash safe · 984606d7

Marko Mäkelä authored Aug 29, 2024

The recent commit 4ca355d8 (MDEV-33894)
caused a serious regression for online InnoDB ib_logfile0 resizing,
breaking crash-safety unless the memory-mapped log file interface is
being used. However, the log resizing was broken also before this.

To prevent such regressions in the future, we extend the test
innodb.log_file_size_online with a kill and restart of the server
and with some writes running concurrently with the log size change.
When run sufficiently many times, this test revealed all the bugs that
are being fixed by the code changes.

log_t::resize_start(): Do not allow the resized log to start before
the current log sequence number. In this way, there is no need to
copy anything to the first block of resize_buf. The previous logic
regarding that was incorrect in two ways. First, we would have to
copy from the last written buffer (buf or flush_buf). Second, we failed
to ensure that the mini-transaction end marker bytes would be 1
in the buffer. If the source ib_logfile0 had wrapped around an odd number
of times, the end marker would be 0. This was occasionally observed
when running the test innodb.log_file_size_online.

log_t::resize_write_buf(): To adjust for the resize_start() change,
do not write anything that would be before the resize_lsn.
Take the buffer (resize_buf or resize_flush_buf) as a parameter.
Starting with commit 4ca355d8
we no longer swap buffers when rewriting the last log block.

log_t::append(): Define as a static function; only some debug
assertions need to refer to the log_sys object.

innodb_log_file_size_update(): Wake up the buf_flush_page_cleaner()
if needed, and wait for it to complete a batch while waiting for
the log resizing to be completed. If the current LSN is behind the
resize target LSN, we will write redundant FILE_CHECKPOINT records to
ensure that the log resizing completes. If the buf_pool.flush_list is
empty or the buf_flush_page_cleaner() is stuck for some reason, our wait
will time out in 5 seconds, so that we can periodically check if the
execution of SET GLOBAL innodb_log_file_size was aborted. Previously,
we could get into a busy loop here while the buf_flush_page_cleaner()
would remain idle.
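
The bounded wait in innodb_log_file_size_update() follows a common pattern: wait in slices so that an abort is noticed even if nobody signals completion. The sketch below illustrates that pattern with standard C++ primitives. `ResizeState`, `WaitResult` and `wait_for_resize()` are hypothetical names, and the InnoDB code uses its own synchronization primitives rather than `std::condition_variable`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical state shared between the waiting thread and whoever
// completes or aborts the resize.
struct ResizeState {
  std::mutex m;
  std::condition_variable cv;
  bool done = false;
  bool aborted = false;
};

enum class WaitResult { Completed, Aborted };

// Wait in bounded slices (5 seconds in the commit message) instead of
// spinning, rechecking both flags after every wakeup or timeout.
template <class Rep, class Period>
WaitResult wait_for_resize(ResizeState &s,
                           std::chrono::duration<Rep, Period> slice) {
  assert(slice.count() > 0);
  std::unique_lock<std::mutex> lk(s.m);
  for (;;) {
    if (s.done) return WaitResult::Completed;
    if (s.aborted) return WaitResult::Aborted;
    // Returns on notification, timeout, or spurious wakeup; the loop
    // handles all three the same way.
    s.cv.wait_for(lk, slice);
  }
}
```

A caller would pass `std::chrono::seconds(5)` as the slice. The thread that completes or aborts the resize sets the flag under `s.m` and may call `s.cv.notify_all()` to cut the wait short.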

984606d7

Merge branch '10.6' into 10.11 · 3a1ff739
Oleksandr Byelkin authored Aug 29, 2024

3a1ff739
Merge branch '10.5' into 10.6 · a4654ecc
Oleksandr Byelkin authored Aug 29, 2024

a4654ecc
MDEV-34833 Assertion failure in Item_float::do_build_clone (Item_static_float_func) · 03a5455c
Oleksandr Byelkin authored Aug 29, 2024
```
Added missing method of Item_static_float_func
```
03a5455c
Merge 10.6 into 10.11 · cfcf27c6
Marko Mäkelä authored Aug 29, 2024

cfcf27c6

28 Aug, 2024 8 commits

MDEV-34704 Quick mode produces the bug for mariadb client · 872dbec9
Oleksandr Byelkin authored Aug 05, 2024
```
  --quick-max-column-width parameter added to limit field
    width in --quick mode.
```
872dbec9
Merge 10.5 into 10.6 · 0e76c1ba
Marko Mäkelä authored Aug 28, 2024

0e76c1ba

MDEV-34802 Recovery fails to note some log corruption · 1ff6b6f0

Marko Mäkelä authored Aug 28, 2024

recv_recovery_from_checkpoint_start(): Abort startup due to log
corruption if we were unable to parse the entire log between
the latest log checkpoint and the corresponding FILE_CHECKPOINT record.

Also, reduce some code bloat related to log output and log_sys.mutex.

Reviewed by: Debarun Banerjee

1ff6b6f0

MDEV-33756: Deprecate binlog_optimize_thread_scheduling · 9811d23b

Brandon Nesterenko authored Jul 11, 2024

The option binlog_optimize_thread_scheduling was initially added
to provide a safe alternative for the newly added binlog group
commit logic, such that when 0, it would disable a leader thread
from performing the binlog write for all transactions that are a
part of the group commit. Any problems related to the binlog group
commit optimization should be sorted out by now, so we can
deprecate-to-eventually-remove the option altogether.

This commit performs the deprecation, and the removal is tracked
by MDEV-33745. Note, as the option is only able to be provided
via configuration at startup time, users will not see a
deprecation message unless looking through the CLI help
message.

Reviewed By
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Sergei Golubchik <serg@mariadb.org>

9811d23b

MDEV-34829 LOCALTIME returns a wrong data type · c67149b8

Alexander Barkov authored Aug 28, 2024

Changing the alias LOCALTIME->CURRENT_TIMESTAMP to LOCALTIME->CURRENT_TIME.

This changes the return type of LOCALTIME from DATETIME to TIME,
according to the SQL Standard.

c67149b8

MDEV-32627 Spider: use CONNECTION string in SQLDriverConnect · 18d3f63a
Yuchen Pei authored Jun 11, 2024
```
This is the CS part of the implementation of MENT-2070.
```
18d3f63a

MDEV-34803 innodb_lru_flush_size is no longer used · bda40ccb

Marko Mäkelä authored Aug 28, 2024

In commit fa8a46eb (MDEV-33613)
the parameter innodb_lru_flush_size ceased to have any effect.

Let us declare the parameter as deprecated and additionally as
MARIADB_REMOVED_OPTION, so that there will be a warning written
to the error log in case the option is specified in the command line.

Let us also do the same for the parameter
innodb_purge_rseg_truncate_frequency
that was deprecated and ignored earlier in MDEV-32050.

Reviewed by: Debarun Banerjee

bda40ccb

Update markdown files for `main` branch · e6df06d4
Andrew Hutchings authored Aug 27, 2024
```
Coding standards and PR template now reference `main`.
```
e6df06d4

27 Aug, 2024 7 commits

MDEV-34754 packaging prep for Oracular · fea36b19
Daniel Black authored Aug 14, 2024
```
Add Oracular to the allowed Ubuntu names.
```
fea36b19
MDEV-24923 fixup: Correct a function comment · e7bb9b7c
Marko Mäkelä authored Aug 27, 2024

e7bb9b7c
MDEV-34704 Quick mode produces the bug for mariadb client · 7a65dcb5
Oleksandr Byelkin authored Aug 05, 2024
```
  --quick-max-column-width parameter added to limit field
    width in --quick mode.
```
7a65dcb5
Merge 10.5 into 10.6 · 48becffd
Marko Mäkelä authored Aug 27, 2024

48becffd

MDEV-34515 fixup: innodb.innodb_defrag_concurrent fails · 8cc82228

Marko Mäkelä authored Aug 27, 2024

Let us avoid EXTENDED in the CHECK TABLE after a defragmentation,
because it would occasionally report an orphan delete-marked record
in the index "third". That error does not seem to be reproducible
when using the regular OPTIMIZE TABLE.

Also, let us make the test --repeat safe by removing the defragmentation
related statistics after DROP TABLE.

The defragmentation feature was removed in later releases in
commit 7ca89af6 (MDEV-30545)
along with this test case.

8cc82228

[fixup] Spider: Restored lines accidentally deleted in MDEV-32157 · 58bc83e1
Yuchen Pei authored Aug 27, 2024
```
Also restored a change that resulted in off-by-one, as well as
appending the correctly indexed key_hint.
```
58bc83e1

MDEV-34515: Fix a bogus debug assertion · 36ab75a4

Marko Mäkelä authored Aug 27, 2024

purge_sys_t::stop_FTS(): Fix an incorrect debug assertion that
commit d58734d7 added.
The assertion would fail if there had been prior invocations of
purge_sys.stop_SYS() without purge_sys.resume_SYS().
The intention of the assertion is to check that the number of pending
stop_FTS() calls stays below 65536.

36ab75a4

26 Aug, 2024 9 commits

Fix sporadic failure of test case rpl.rpl_start_stop_slave · 8642453c

Kristian Nielsen authored Aug 19, 2024

The test was expecting the I/O thread to be in a specific state, but thread
scheduling may cause it to not yet have reached that state. So just have a
loop that waits for the expected state to occur.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

8642453c

Skip mariabackup.slave_provision_nolock in --valgrind, it uses a lot of CPU · 25e02248
Kristian Nielsen authored Aug 19, 2024
```
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
```
25e02248

Fix sporadic failure of test case rpl.rpl_old_master · 214e6c5b

Kristian Nielsen authored Aug 19, 2024

Remove the test for MDEV-14528. This is supposed to test that parallel
replication from pre-10.0 master will update Seconds_Behind_Master. But
after MDEV-12179 the SQL thread is blocked from even beginning to fetch
events from the relay log due to FLUSH TABLES WITH READ LOCK, so the test
case is no longer testing what it was intended to. And pre-10.0 versions are
long since out of support, so it does not seem worthwhile to try to rewrite
the test to work another way.

The root cause of the test failure is MDEV-34778. Briefly, depending on
exact timing during slave stop, the rli->sql_thread_caught_up flag may end
up with a different value. If it ends up as "true", this causes
Seconds_Behind_Master to be 0 during the next slave start; and this caused
a test case timeout as the test was waiting for Seconds_Behind_Master to
become non-zero.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

214e6c5b

Fix sporadic test failure in rpl.rpl_create_drop_event · 7dc4ea56

Kristian Nielsen authored Aug 16, 2024

Depending on timing, an extra event run could start just when the event
scheduler is shut down and delay running until after the table has been
dropped; this would cause the test to fail with a "table does not exist"
error in the log.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

7dc4ea56

Restore skipping rpl.rpl_mdev6020 under Valgrind · 33854d73

Kristian Nielsen authored Aug 03, 2024

(Revert a change done by mistake when XtraDB was removed.)
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

33854d73

MDEV-34696: do_gco_wait() completes too early on InnoDB dict stats updates · b4c2e239

Kristian Nielsen authored Aug 03, 2024

Before doing mark_start_commit(), check that there is no pending deadlock
kill. If there is a pending kill, we won't commit (we will abort, roll back,
and retry). Then we should not mark the commit as started, since that could
potentially make the following GCO start too early, before we completed the
commit after the retry.

This condition could trigger in some corner cases, where InnoDB would
temporarily take table/row locks that are released again immediately, not
held until the transaction commits. This happens with dict_stats updates and
possibly auto-increment locks.

Such locks can be passed to thd_rpl_deadlock_check() and cause a deadlock
kill to be scheduled in the background. But since the blocking locks are
held only temporarily, they can be released before the background kill
happens. This way, the kill can be delayed until after mark_start_commit()
has been called. Thus we need to check the synchronous indication
rgi->killed_for_retry, not just the asynchronous thd->killed.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
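
The ordering argument above can be illustrated with a reduced model. This is a sketch under assumptions: `WorkerState` and `try_mark_start_commit()` are hypothetical stand-ins, and only the field names echo the flags named in the message.

```cpp
#include <cassert>

// Hypothetical, reduced stand-in for the per-worker replication state.
struct WorkerState {
  bool killed_async = false;      // plays the role of thd->killed
  bool killed_for_retry = false;  // plays the role of rgi->killed_for_retry
  bool commit_marked = false;     // set once the commit is marked as started
};

// Returns true if the commit was marked as started. A pending deadlock
// kill means the transaction will roll back and retry, so the commit
// must not be marked yet, even if the asynchronous kill has not arrived.
bool try_mark_start_commit(WorkerState &w) {
  assert(!w.commit_marked);
  if (w.killed_for_retry) return false;
  w.commit_marked = true;
  return true;
}
```

The point of the sketch is the case where `killed_for_retry` is already set while `killed_async` is still false: checking only the asynchronous flag would let the commit be marked too early.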

b4c2e239

MDEV-34515: Reduce context switching in purge · 76f6b6d8

Marko Mäkelä authored Aug 26, 2024

Before this patch, the InnoDB purge coordinator task submitted
innodb_purge_threads-1 tasks even if there was not enough work
for all of them. For example, if there are undo log records
only for 1 table, only 1 task can be employed, and that task had better
be the purge coordinator.

srv_purge_worker_task_low(): Split from purge_worker_callback().

trx_purge_attach_undo_recs(): Remove the parameter n_purge_threads,
and add the parameter n_work_items, to keep track of the amount of
work.

trx_purge(): Launch purge worker tasks only if necessary. The work of
one thread will be executed by this purge coordinator thread.

que_fork_scheduler_round_robin(): Merged to trx_purge().

Thanks to Vladislav Vaintroub for supplying a prototype of this.

Reviewed by: Debarun Banerjee
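
The scheduling idea can be expressed as a small calculation. The helper below is a sketch, not the logic of trx_purge(): `extra_purge_workers()` is a hypothetical name, and it assumes that one work item can occupy at most one task and that the coordinator always processes one share itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many worker tasks to submit in addition to the
// coordinator, which executes one share of the work itself.
std::size_t extra_purge_workers(std::size_t n_work_items,
                                std::size_t n_purge_threads) {
  assert(n_purge_threads >= 1);
  // No more tasks than work items, and no more than the configured threads.
  const std::size_t usable = std::min(n_work_items, n_purge_threads);
  return usable > 0 ? usable - 1 : 0;
}
```

With a single work item the result is 0, matching the example in the message where the coordinator alone does the work. The earlier behaviour corresponds to always returning `n_purge_threads - 1`.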

76f6b6d8

MDEV-34515: Contention between purge and workload · b7b9f3ce

Marko Mäkelä authored Aug 26, 2024

In a Sysbench oltp_update_index workload that involves 1 table,
a serious contention between the workload and the purge of history
was observed. This was the worst when the table contained only 1 record.

This turned out to be fixed by setting innodb_purge_batch_size=128,
which corresponds to the number of usable persistent rollback segments.
When we go above that, there would be contention between row_purge_poss_sec()
and the workload, typically on the clustered index page latch, sometimes
also on a secondary index page latch. It might be that with smaller
batches, trx_sys.history_size() will end up pausing all concurrent
transaction start/commit frequently enough so that purge will be able
to make some progress, so that there would be less contention on the
index page latches between purge and SQL execution.

In commit aa719b50 (part of MDEV-32050)
the interpretation of the parameter innodb_purge_batch_size was slightly
changed. It would correspond to the maximum desired size of the
purge_sys.pages cache. Before that change, the parameter was referring to
a number of undo log pages, but the accounting might have been inaccurate.

To avoid a regression, we will reduce the default value to
innodb_purge_batch_size=127, which will also be compatible with
innodb_undo_tablespaces>1 (which will disable rollback segment 0).

Additionally, some logic in the purge and MVCC checks is simplified.
The purge tasks will make use of purge_sys.pages when accessing undo
log pages to find out if a secondary index record can be removed.
If an undo page needs to be looked up in buf_pool.page_hash, we will
merely buffer-fix it. This is correct, because the undo pages are
append-only in nature. Holding purge_sys.latch or purge_sys.end_latch
or the fact that the current thread is executing as a part of an
in-progress purge batch will prevent the contents of the undo page from
being freed and subsequently reused. The buffer-fix will prevent the
page from being evicted from the buffer pool. Thanks to this logic,
we can refer to the undo log record directly in the buffer pool page
and avoid copying the record.

buf_pool_t::page_fix(): Look up and buffer-fix a page. This is useful
for accessing undo log pages, which are append-only by nature.
There will be no need to deal with change buffer or ROW_FORMAT=COMPRESSED
in that case.

purge_sys_t::view_guard::view_guard(): Allow the type of guard to be
acquired: end_latch, latch, or no latch (in case we are a purge thread).

purge_sys_t::view_guard::get(): Read-only accessor to purge_sys.pages.

purge_sys_t::get_page(): Invoke buf_pool_t::page_fix().

row_vers_old_has_index_entry(): Replaced with row_purge_is_unsafe()
and row_undo_mod_sec_unsafe().

trx_undo_get_undo_rec(): Merged to trx_undo_prev_version_build().

row_purge_poss_sec(): Add the parameter mtr and remove redundant
or unused parameters sec_pcur, sec_mtr, is_tree. We will use the
caller's mtr object but release any acquired page latches before
returning.

btr_cur_get_page(), page_cur_get_page(): Do not invoke page_align().

row_purge_remove_sec_if_poss_leaf(): Return the value of PAGE_MAX_TRX_ID
to be checked against the page in row_purge_remove_sec_if_poss_tree().
If the secondary index page was not changed meanwhile, it will be
unnecessary to invoke row_purge_poss_sec() again.

trx_undo_prev_version_build(): Access any undo log pages using
the caller's mini-transaction object.

row_purge_vc_matches_cluster(): Moved to the only compilation unit that
needs it.

Reviewed by: Debarun Banerjee

b7b9f3ce

MDEV-34520 purge_sys_t::wait_FTS sleeps 10ms, even if it does not have to · d58734d7

Marko Mäkelä authored Aug 26, 2024

There were two separate Atomic_counter<uint32_t>, purge_sys.m_SYS_paused
and purge_sys.m_FTS_paused. In purge_sys.wait_FTS() we have to read both
atomically. We used to use an overkill solution for this, acquiring
purge_sys.latch and waiting 10 milliseconds between samples. To make
matters worse, the 10-millisecond wait was unconditional, which would
unnecessarily suspend the purge_coordinator_task every now and then.

It turns out that we can fold both "reference counts" into a single
Atomic_relaxed<uint32_t> and avoid the purge_sys.latch.
To assess whether std::memory_order_relaxed is acceptable, we should
consider the operations that read these "reference counts", that is,
purge_sys_t::wait_FTS(bool) and purge_sys_t::must_wait_FTS().

Outside debug assertions, purge_sys.must_wait_FTS() is only invoked in
trx_purge_table_acquire(), which is covered by a shared dict_sys.latch.
We would increment the counter as part of a DDL operation, but before
acquiring an exclusive dict_sys.latch. So, a
purge_sys_t::close_and_reopen() loop could be triggered slightly
prematurely, before a problematic DDL operation is actually executed.
Decrementing the counter is less of an issue; purge_sys.resume_FTS()
or purge_sys.resume_SYS() would mostly be invoked while holding an
exclusive dict_sys.latch; ha_innobase::delete_table() does it outside
that critical section. Still, this would only cause some extra wait in
the purge_coordinator_task, just like at the start of a DDL operation.

There are two calls to purge_sys_t::wait_FTS(bool): in the above mentioned
purge_sys_t::close_and_reopen() and in purge_sys_t::clone_oldest_view(),
both invoked by the purge_coordinator_task. There is also a
purge_sys.clone_oldest_view<true>() call at startup when no DDL operation
can be in progress.

purge_sys_t::m_SYS_paused: Merged into m_FTS_paused, using a new
multiplier PAUSED_SYS = 65536.

purge_sys_t::wait_FTS(): Remove an unnecessary sleep as well as the
access to purge_sys.latch. It suffices to poll purge_sys.m_FTS_paused.

purge_sys_t::stop_FTS(): Do not acquire purge_sys.latch.

Reviewed by: Debarun Banerjee
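
Folding two counts into one 32-bit word can be sketched as follows. This is an illustration under assumptions, not the InnoDB code: `PauseCounter`, `fts_paused()`, `sys_paused()` and `must_wait()` are hypothetical names, and `std::atomic` with `std::memory_order_relaxed` stands in for `Atomic_relaxed<uint32_t>`. Only the multiplier `PAUSED_SYS = 65536` is taken from the message.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical packed counter: FTS pauses live in the low 16 bits,
// SYS pauses are counted in multiples of PAUSED_SYS.
class PauseCounter {
public:
  static constexpr std::uint32_t PAUSED_SYS = 65536;

  void stop_FTS() {
    const std::uint32_t old = count_.fetch_add(1, std::memory_order_relaxed);
    // Only the low bits belong to FTS; pending SYS pauses are masked out
    // so that they cannot trip the overflow check.
    assert((old & (PAUSED_SYS - 1)) != PAUSED_SYS - 1);
    (void)old;
  }
  void resume_FTS() { count_.fetch_sub(1, std::memory_order_relaxed); }
  void stop_SYS() { count_.fetch_add(PAUSED_SYS, std::memory_order_relaxed); }
  void resume_SYS() { count_.fetch_sub(PAUSED_SYS, std::memory_order_relaxed); }

  std::uint32_t fts_paused() const {
    return count_.load(std::memory_order_relaxed) & (PAUSED_SYS - 1);
  }
  std::uint32_t sys_paused() const {
    return count_.load(std::memory_order_relaxed) / PAUSED_SYS;
  }

  // A single load observes both counts at once, so no latch is needed to
  // read them together.
  bool must_wait(bool also_sys) const {
    const std::uint32_t v = count_.load(std::memory_order_relaxed);
    return also_sys ? v != 0 : (v & (PAUSED_SYS - 1)) != 0;
  }

private:
  std::atomic<std::uint32_t> count_{0};
};
```

The masked assertion in `stop_FTS()` reflects the intent stated for the follow-up fix in 36ab75a4: the check concerns the number of pending stop_FTS() calls only, regardless of how many SYS pauses are outstanding.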

d58734d7

25 Aug, 2024 1 commit

Trivial fix: Make test_if_cheaper_ordering() use actual_rec_per_key() · 9020baf1

Sergei Petrunia authored Aug 24, 2024

Discovered this while working on MDEV-34720: test_if_cheaper_ordering()
uses rec_per_key, while the original estimate for the access method
is produced in best_access_path() by using actual_rec_per_key().

Make test_if_cheaper_ordering() also use actual_rec_per_key().
Also make several getter functions "const" to make this compile.
Also adjust the test case to handle this (the change is backported from
11.0).

9020baf1

23 Aug, 2024 2 commits

MDEV-34759: buf_page_get_low() is unnecessarily acquiring exclusive latch · 9db2b327

Marko Mäkelä authored Aug 23, 2024

buf_page_ibuf_merge_try(): A new, separate function for invoking
ibuf_merge_or_delete_for_page() when needed. Use the already requested
page latch for determining if the call is necessary. If it is and
if we are currently holding rw_latch==RW_S_LATCH, upgrading to an exclusive
latch may involve waiting for another thread to acquire and release
a U or X latch on the page. If we have to wait, we must recheck if the
call to ibuf_merge_or_delete_for_page() is still needed. If the page
turns out to be corrupted, we will release it and fail the operation.
Finally, the exclusive page latch will be downgraded to the originally
requested latch.

ssux_lock_impl::rd_u_upgrade_try(): Attempt to upgrade a shared lock to
an update lock.

sux_lock::s_x_upgrade_try(): Attempt to upgrade a shared lock to
exclusive.

sux_lock::s_x_upgrade(): Upgrade a shared lock to exclusive.
Return whether a wait was elided.

ssux_lock_impl::u_rd_downgrade(), sux_lock::u_s_downgrade():
Downgrade an update lock to shared.

9db2b327

MDEV-34765: rpl.master_last_event_time_stmt fails with Result Length Mismatch · 9e845107

Brandon Nesterenko authored Aug 20, 2024

When executing a Query_log_event that is a COMMIT query,
gtid_slave_pos is updated before other replication status
variables, so when an MTR test syncs a replica with
primaries via GTID, there is a slight window where accessing
status variables, e.g. via SHOW ALL SLAVES STATUS, results
in "stale" values because gtid_slave_pos has been updated
before the *_last_event_time fields have been updated.

This patch only fixes the test by switching from using
GTIDs to using binlog file coordinates when synchronizing
replicas with their primaries.

9e845107