Commits · 383ee364dc2a9234901d3098de0560a9a3e7adc4 · nexedi / MariaDB

07 May, 2024 1 commit
- Merge 10.6 to 10.11 · 383ee364
  Kristian Nielsen authored May 07, 2024
  
  383ee364
06 May, 2024 1 commit

MDEV-30929 spider.spider_fixes_part: wait and restart slave · 64314d30

Yuchen Pei authored May 06, 2024

In the absence of insight into the cause of the spider.spider_fixes_part
failure described in MDEV-30929, this is a workaround that could help
narrow the possibilities down to whether the slave SQL thread attempts
to read from a file that may not yet be on disk. It does not otherwise
affect the coverage of the test.

I have pushed this commit 4 times, but have yet to encounter the
failure as described in MDEV-30929, so it could also fix the test and
stop the CI pollution.

Also replaced START SLAVE; with --source include/start_slave.inc
inside the slave_test_init.inc files.

64314d30

05 May, 2024 2 commits

MDEV-34042: Deadlock kill of XA PREPARE can break replication /... · 4b4db4a8

Kristian Nielsen authored May 04, 2024

MDEV-34042: Deadlock kill of XA PREPARE can break replication / rpl.rpl_parallel_multi_domain_xa sporadic failure

Refinement of the original patch.

Move the code to reset the kill up into the parent class
Xid_apply_log_event, to also fix the similar issue for XA COMMIT.

Increase the number of slave retries in the test case
rpl.rpl_parallel_multi_domain_xa to fix some sporadic failures. The test
generates massive amounts of conflicting transactions in multiple
independent domains, which can cause multiple rollback+retry for a
transaction as it conflicts with transactions in other domains one-by-one.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

4b4db4a8

MDEV-33798: Follow-up patch · 2a2019e1

Kristian Nielsen authored May 05, 2024

Don't deadlock kill event groups in other domains if they are not
SPECULATE_OPTIMISTIC. Such event groups may not be able to safely roll back
and retry (e.g. DDL).

But do deadlock kill a transaction T2 from a blocked transaction U in another
domain, even if T2 has lower sub_id than U. Otherwise, in case of a cycle
T2->T1->U->T2, we might not break the cycle if U is not SPECULATE_OPTIMISTIC.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

2a2019e1

02 May, 2024 5 commits

sporadic failures of binlog_encryption.rpl_parallel_gco_wait_kill · 3ee6f69d

Sergei Golubchik authored May 02, 2024

CURRENT_TEST: binlog_encryption.rpl_parallel_gco_wait_kill
mysqltest: In included file "./suite/rpl/t/rpl_parallel_gco_wait_kill.test":
included from /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/suite/binlog_encryption/rpl_parallel_gco_wait_kill.test at line 2:
At line 334: Can't initialize replace from 'replace_result $thd_id THD_ID'

An sql thread can reach the "Slave has read all relay log" state
and then start reading relay log again. Let's use a more generic
pattern to retrieve the sql thread ID even if it's not
in the "read all relay log" state.

3ee6f69d

MDEV-34042: Deadlock kill of XA PREPARE can break replication /... · 596921da

Kristian Nielsen authored Apr 30, 2024

MDEV-34042: Deadlock kill of XA PREPARE can break replication / rpl.rpl_parallel_multi_domain_xa sporadic failure

Clear any pending deadlock kill after completing XA PREPARE, and before
updating the mysql.gtid_slave_pos table in a separate transaction.
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

596921da

MDEV-33798: ROW base optimistic deadlock with concurrent writes on same table · e365877b

Kristian Nielsen authored Apr 30, 2024

One case is conflicting transactions T1 and T2 with different domain id, in
optimistic parallel replication in non-GTID mode. Then T2 will
wait_for_prior_commit on T1; and if T1 got a row lock wait on T2 it would
hang, as different domains caused the deadlock kill to be skipped in
thd_rpl_deadlock_check().

More generally, if we have transactions T1 and T2 in one domain/master
connection, and independent transactions U in another, then we can
still deadlock like this:

  T1 row lock wait on U
  U row lock wait on T2
  T2 wait_for_prior_commit on T1

This commit enforces the deadlock kill in these cases. If the waited-for
transaction is speculatively applied, then it will be deadlock killed in
case of a conflict, even if the two transactions are in different domains
or master connections.
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
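
To see why the three waits listed in this message form a deadlock, they can be modelled as a wait-for graph. The sketch below is only an illustration and is not how thd_rpl_deadlock_check() works internally; `Wait_for_graph` and `waits_in_cycle` are hypothetical names, and the graph assumes each transaction waits for at most one other.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical wait-for graph: key waits for value.
// Assumption: each transaction waits for at most one other transaction.
using Wait_for_graph= std::map<std::string, std::string>;

// Returns true if following the wait-for edges from 'start'
// eventually leads back to 'start', i.e. 'start' is part of a cycle.
bool waits_in_cycle(const Wait_for_graph &graph, const std::string &start)
{
  std::set<std::string> visited;
  std::string current= start;
  for (;;)
  {
    auto edge= graph.find(current);
    if (edge == graph.end())
      return false;               // 'current' waits for nothing: no cycle
    current= edge->second;
    if (current == start)
      return true;                // came back to where we started
    if (!visited.insert(current).second)
      return false;               // a cycle exists, but 'start' is not on it
  }
}
```

With the edges T1 -> U, U -> T2 and T2 -> T1, `waits_in_cycle(graph, "T1")` reports a cycle. Removing any one edge, which is the effect of a deadlock kill, breaks it.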

e365877b

MDEV-33543 Server hang caused by InnoDB change buffer · 90b95c61

mariadb-DebarunBanerjee authored Apr 30, 2024

Issue: When getting a page (buf_page_get_gen) with no latch option
(RW_NO_LATCH), the caller is not expected to follow the B-tree latching
order. However in buf_page_get_low we try to acquire shared page latch
unconditionally to wait for a page that is being loaded by another
thread concurrently. In general it could lead to latch order violation
and deadlock.

Currently it affects the change buffer insert path btr_latch_prev(),
which tries to load the previous page out of order with RW_NO_LATCH, so
that two concurrent inserts into the IBUF tree cause a deadlock. This
problem was introduced in 10.6 by the following commit:
commit 9436c778 (MDEV-27058)

Fix: While trying to latch a page with RW_NO_LATCH, always use the
"*lock_try" interface and retry operation on failure after unfixing the
page.
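
The try-and-retry pattern described in the fix can be sketched outside InnoDB as follows. This is a minimal illustration under stated assumptions, not the actual buf_page_get_low() code: `Page_sketch` and `wait_for_page_no_latch` are hypothetical names, and `std::shared_mutex` stands in for the page latch.

```cpp
#include <atomic>
#include <cassert>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in for a buffer page; not InnoDB's page descriptor.
struct Page_sketch
{
  std::shared_mutex latch;        // held exclusively while the page is loaded
  std::atomic<int> fix_count{0};  // buffer-fix reference count
};

// Waits until the page is no longer exclusively latched, without ever
// blocking on the latch. Returns the number of failed attempts.
int wait_for_page_no_latch(Page_sketch &page)
{
  int retries= 0;
  for (;;)
  {
    page.fix_count.fetch_add(1);        // buffer-fix the page
    if (page.latch.try_lock_shared())   // non-blocking attempt only
    {
      page.latch.unlock_shared();       // no-latch request: keep no latch
      return retries;                   // the page stays fixed for the caller
    }
    page.fix_count.fetch_sub(1);        // unfix before retrying
    ++retries;
    std::this_thread::yield();          // let the loading thread progress
  }
}
```

Because the caller never sleeps while holding or waiting for the latch, it cannot take part in a latch-order cycle. The cost is a retry loop while the page is being loaded.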

90b95c61

fix sporadic failures of main.lock_sync · 9dfef3fb
Sergei Golubchik authored Dec 22, 2023
```
wait for all connections to disconnect before the cleanup
```
9dfef3fb

30 Apr, 2024 8 commits

atomic.alter_table test is too slow for MSAN · dba9d192
Sergei Golubchik authored Apr 30, 2024

dba9d192

MDEV-31161 Assertion failures upon adding a too long key to table with COMPRESSED row · 156761db

Thirunarayanan Balathandayuthapani authored Apr 30, 2024

Problem:
=======
During an InnoDB non-rebuild online ALTER operation, InnoDB sets the
dummy log as the clustered index online log. Concurrent DML can use
this to identify whether the table is undergoing online DDL.
InnoDB fails to reset the dummy log of the clustered index if an
error happens during the prepare phase.

Solution:
========
Reset the InnoDB clustered index online log in case of error during
prepare phase.

156761db

don't use normal diffs in *.rdiff files · b663c935

Sergei Golubchik authored Apr 30, 2024

they aren't robust enough and can easily apply incorrectly

(this fixes the failure of innodb.insert_into_empty,4k after the merge)

b663c935

Merge branch '10.6' into 10.11 · 0aae11ac
Sergei Golubchik authored Apr 30, 2024

0aae11ac

MDEV-34030 rpl.rpl_using_gtid_default can fail in (BB) mtr · ae03374f

Andrei authored Apr 29, 2024

The test's header does not strictly follow the correct order of checks
performed by mtr at test start, which may lead to an error. E.g.

./mtr --mysqld=--binlog-format=row rpl.rpl_using_gtid_default

leads to
At line 175: query 'SET GLOBAL gtid_slave_pos= ""' failed: ER_SLAVE_MUST_STOP (1198): This operation cannot be performed as you have a running slave ''; run STOP SLAVE '' first

Fixed to require the binlog format first in the test header.

ae03374f

MDEV-34029 rpl.rpl_heartbeat can fail when (BB) mtr reorders tests · 6a63204c

Andrei authored Apr 29, 2024

rpl.rpl_heartbeat turns out to be missing the standard
include/master-slave header, which made it fail potentially in BB and
actually with manual mtr, as it may have used a previous slave GTID state.

Fixed by installing the standard rpl suite header/footer in the
test file.

6a63204c

Fixed slow bootstrap introduced in 10.6 · 814dc467

Monty authored Apr 27, 2024

The problem was that the signal thread was not killed when using
unireg_abort().

The bug was introduced by:
MDEV-30260: Slave crashed:reload_acl_and_cache during shutdown

Other things fixed:
- Don't produce memory leaks with safemalloc if all threads were not
  ended properly (not useful)

814dc467

MDEV-33852: Rework systemd installation on Debian · ec09c034

Tuukka Pasanen authored Apr 17, 2024

Let dh_systemd handle most of the systemd side and
get rid of custom scripts

Rework installation of systemd service and socket files
based on Michael Biebl's merge requests:

https://salsa.debian.org/mariadb-team/mariadb-server/-/merge_requests/63
https://salsa.debian.org/mariadb-team/mariadb-server/-/merge_requests/75

ec09c034

29 Apr, 2024 3 commits

Merge branch '10.5' into 10.6 · c1f3eff5
Sergei Golubchik authored Apr 29, 2024

c1f3eff5
MDEV-30727 Check spider_hton_ptr in spider udfs · 267dd5a9
Yuchen Pei authored Apr 29, 2024
```
We have to #undef my_error and find it from udfs when spider is not
installed.
```
267dd5a9

MDEV-33669 mariabackup --backup hangs · 52f6df99

mariadb-DebarunBanerjee authored Apr 24, 2024

This is a server hang and not an issue with backup. While concurrent
DDLs in the server are hung, mariabackup waits for the DDLs to
finish while trying to acquire MDL_BACKUP_BLOCK_DDL.

The server hang is serious in nature and caused by thread pool state
being incorrectly set to thread creation pending state while no creation
is actually pending. Once a thread pool reaches such state no new thread
gets created in the pool.

While it could possibly affect all thread pools in the server, the InnoDB
thread pool is the victim in the current bug, where an IO job gets blocked
when the pool is stuck with far fewer threads than intended.
Available workers are blocked in purge waiting for a page lock to be
released by an IO write (SX lock), causing a complete deadlock.

The issue is caused by the state variable m_thread_creation_pending
introduced by MDEV-31095: 9e62ab7a. We check and set the variable
early while attempting to create a new thread in pool but fail to reset
it if we exit the flow for other reasons like maximum threads reached
or get into thread creation throttling path.

Fix: The simple fix is to make sure that the state is reset back in case
we don't actually attempt to create the thread.
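
The bug pattern and its fix can be sketched in a few lines. This is a simplified, single-threaded illustration and not the server's thread pool: only the name m_thread_creation_pending comes from the message above, while `Pool_sketch`, `maybe_add_thread` and the other members are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified pool; no real threads are started.
class Pool_sketch
{
public:
  explicit Pool_sketch(std::size_t max_threads) : m_max_threads(max_threads) {}

  // Returns true if a new worker was (notionally) created.
  bool maybe_add_thread()
  {
    // Check and set the flag early; only one caller may proceed.
    bool expected= false;
    if (!m_thread_creation_pending.compare_exchange_strong(expected, true))
      return false;

    if (m_threads >= m_max_threads)
    {
      // Early exit without creating a thread: the flag must be reset here,
      // otherwise every later call would be refused.
      m_thread_creation_pending.store(false);
      return false;
    }

    ++m_threads;                             // stands in for thread creation
    m_thread_creation_pending.store(false);  // creation is complete
    return true;
  }

  // A worker finished; the counter is unsynchronized in this sketch.
  void thread_exited() { if (m_threads) --m_threads; }

  bool creation_pending() const { return m_thread_creation_pending.load(); }
  std::size_t threads() const { return m_threads; }

private:
  std::atomic<bool> m_thread_creation_pending{false};
  std::size_t m_threads= 0;
  std::size_t m_max_threads;
};
```

Without the reset in the early-exit branch, a pool that once hit its maximum would never grow again, even after workers exit. That matches the stuck state described above.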

52f6df99

28 Apr, 2024 2 commits
- require boost 1.53 for columnstore · bda8d4fd
  Oleksandr Byelkin authored Apr 26, 2024
  
  bda8d4fd
- PCRE2-10.43 · a09ebe55
  Oleksandr Byelkin authored Apr 26, 2024
```
pcre2 - fix CMAKE_C_FLAGS for MSVC for external project by Vladislav Vaintroub <vvaintroub@gmail.com>
```
  a09ebe55
27 Apr, 2024 1 commit

MDEV-33534 UBSAN: Negation of -X cannot be represented in type 'long long... · 3141a68b

Alexander Barkov authored Apr 26, 2024

MDEV-33534 UBSAN: Negation of -X cannot be represented in type 'long long int'; cast to an unsigned type to negate this value to itself in my_double_round from sql/item_func.cc

The negation in this line:
ulonglong abs_dec= dec_negative ? -dec : dec;
did not take into account that 'dec' can be the smallest possible
signed negative value -9223372036854775808. Negating that value
is undefined behavior.

Fixing the code to use Longlong_hybrid, which implements a safe
method to get an absolute value.
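
The pitfall can be reproduced outside the server. The sketch below is not the Longlong_hybrid implementation; `safe_abs` is a hypothetical helper showing one well-defined way to obtain the magnitude of a signed 64-bit value: convert to unsigned first, then negate in unsigned arithmetic, where wraparound is defined.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper, not MariaDB code: returns |value| as an unsigned
// 64-bit integer without ever negating a signed value.
unsigned long long safe_abs(long long value)
{
  // Conversion to unsigned is well defined (modulo 2^64).
  unsigned long long magnitude= static_cast<unsigned long long>(value);
  // Unsigned subtraction wraps around instead of overflowing, so
  // LLONG_MIN maps to 9223372036854775808.
  return value < 0 ? 0ULL - magnitude : magnitude;
}
```

For example, `safe_abs(LLONG_MIN)` yields 9223372036854775808, whereas evaluating `-dec` on a signed `dec` holding LLONG_MIN is undefined behavior.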

3141a68b

26 Apr, 2024 5 commits

sporadic failures of rpl.rpl_parallel_multi_domain_xa · 7ff64931

Sergei Golubchik authored Apr 26, 2024

it's a slow test, the slave needs to catch up, reading >1500
transactions. The default MASTER_GTID_WAIT() timeout in
sync_with_master_gtid.inc is 120 seconds, which might not be
enough for a slow/overloaded slave.

Let's wait forever or until ./mtr --testcase-timeout,
whichever comes first.

7ff64931

MDEV-33574 Improve mysqlbinlog error message · 3d417476

Hugo Wen authored Mar 20, 2024

Previously, when running mysqlbinlog without providing a binlog file, it
would print the entire help text, which was very verbose and made it
difficult to identify the actual issue.

Now change the behavior to print a more concise error message instead:

"ERROR: Please provide the log file(s). Run with '--help' for usage instructions."

This makes the error output more user-friendly and easier to understand,
especially when running the tool in scripts or automated processes.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.

3d417476

Fixup · ef7a2344

Daniele Sciascia authored Apr 25, 2024

0ccdf54b removed the stack-allocated THD objects from the function
Wsrep_schema::replay_transaction(). However, it inadvertently
moved the destruction of the THD earlier, causing assertions and use
of the THD after it was destroyed.
The fix consists of extracting the original function into a separate
function, and leaving the allocation and destruction of the THD object
in Wsrep_schema::replay_transaction(), making sure that using the
heap-allocated THD has no side effects.
Same for Wsrep_schema::recover_sr_transactions().
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

ef7a2344

MDEV-33492 fix installation of rpm/deb packages · 22a69c78
Sergei Golubchik authored Apr 25, 2024
```
followup for 02715174
```
22a69c78
Merge branch '10.6' into 10.11 · c9b1ebee
Oleksandr Byelkin authored Apr 26, 2024

c9b1ebee

25 Apr, 2024 8 commits

MDEV-33896 : Galera test failure on galera_3nodes.MDEV-29171 · b3e531a3

Jan Lindström authored Apr 12, 2024

Based on the logs, we might start SST before the donor has reached
Primary state. Because this test shuts down all nodes, we
need to make sure when we start nodes that the previous nodes
have reached Primary state and joined the cluster.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

b3e531a3

MDEV-26450 fixup: Remove a bogus assertion · 10d251e0

Marko Mäkelä authored Apr 25, 2024

mtr_t::commit_shrink(): Do not assert that some previously clean pages
will be flagged as modified by this mini-transaction. It could be the
case that there had been no recent write-back of any of the undo
tablespace pages that we are modifying when truncating the tablespace.
It suffices to assert that some pages were modified again:
ut_ad(m_modifications).

This fixes up commit f5fddae3

10d251e0

sporadic failures of rpl.rpl_parallel_sbm · 9e925820

Sergei Golubchik authored Apr 25, 2024

the test waits for the event to get stuck on MASTER_DELAY,
but on a slow/overloaded slave the event might pass MASTER_DELAY
before the test starts waiting.

Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY),
the event cannot avoid that.

9e925820

MDEV-33993 Possible server hang on DROP INDEX or RENAME INDEX · 0936c138

Marko Mäkelä authored Apr 25, 2024

commit_try_norebuild(): Add the parameter statistics_exist,
similar to commit_try_rebuild(). If the InnoDB statistics tables
did not exist, we will not attempt to update statistics later on
during the transaction.

Thanks to Matthias Leich for originally reproducing this scenario.

0936c138

MDEV-33602: Sporadic test failure in rpl.rpl_gtid_stop_start · 553a4d62

Kristian Nielsen authored Apr 23, 2024

The test could fail with a duplicate key error because switching to non-GTID
mode could start at the wrong old-style position. The position could be
wrong when the previous GTID connect was stopped before receiving the fake
GTID list event which gives the old-style position corresponding to the GTID
connected position.

Work-around by injecting an extra event and syncing the slave before
switching to non-GTID mode.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

553a4d62

MDEV-33974 Enable GNU libstdc++ debugging · a1c1f502

Marko Mäkelä authored Apr 25, 2024

Starting with GCC 10, let us enable _GLIBCXX_DEBUG as well as
_GLIBCXX_ASSERTIONS which have an impact on the GNU libstdc++.
On GCC 8, we observed a compilation failure related to some
missing type conversion.

Even though clang on GNU/Linux would default to using libstdc++
and enabling the debugging seems to work with clang-18, we will
not enable this on clang, in case it would lead to compilation
errors.

For the clang libc++ before clang-15 there was _LIBCPP_DEBUG,
but according to
llvm/llvm-project@f3966eaf869b7bdd9113ab9d5b78469eb0f5f028 and
llvm/llvm-project@13ea1343231fa4ae12fe9fba4c789728465783d7 and
llvm/llvm-project@ff573a42cd1f1d05508f165dc3e645a0ec17edb5 it
looks like, for proper results, a specially built debug version
of libc++ would have to be used in order to enable equivalent checks.

This should help catch bugs like the one that
commit 455a15fd fixed.

Reviewed by: Sergei Golubchik

a1c1f502

MDEV-33979 Disallow bulk insert operation during partition update statement · 8c8b7da0

Thirunarayanan Balathandayuthapani authored Apr 25, 2024

Problem:
========
- A partition update operation enables bulk insert for the
transaction while moving the row between partitions. This leads
to a debug assertion failure while removing the row from one
of the partitions.

Solution:
========
- Disallow the bulk insert operation for non-insert operations
on a partitioned table.

8c8b7da0

MDEV-23974 fixup: Cover all debug builds · 72293842

Marko Mäkelä authored Apr 25, 2024

While commit 75b7cd68 was a significant
improvement, we occasionally got test failures of debug builds. One of
the affected tests is innodb.innodb-64k-crash.

72293842

24 Apr, 2024 4 commits
- cleanup: use THD_STAGE_INFO, not thd_proc_info · 9cf71885
  Sergei Golubchik authored Apr 24, 2024
```
and put master-slave.inc *last* in the series of includes
```
  9cf71885
- MDEV-20157 perfschema.stage_mdl_function failed in buildbot with wrong result · 7d5e08de
  Sergei Golubchik authored Apr 24, 2024
```
MDL wait consists of short 1 second waits (this is not configurable)
repeated until lock_wait_timeout is reached. The stage is changed
to Waiting and back every second. To have predictable result in the
test the query should filter all sequences of X, "Waiting for MDL", X,
leaving just X.
```
  7d5e08de
- disable mariabackup.incremental_encrypted,64k on 32bit · 259394ae
  Sergei Golubchik authored Apr 23, 2024
```
it allocates 1GB of memory, it causes failures in CI
```
  259394ae
- fix galera_3nodes.galera_gtid_consistency to work with nc · e2f95ebb
  Sergei Golubchik authored Apr 23, 2024
```
like other galera tests do
```
  e2f95ebb
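
The filtering rule from the MDEV-20157 message above (collapse every sequence X, "Waiting for MDL", X into a single X) can be sketched as follows. The actual test does this in a query; the function below is only a hypothetical illustration of the rule, and `collapse_mdl_waits` is not a name from the source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration: collapses X, "Waiting for MDL", X into X.
std::vector<std::string>
collapse_mdl_waits(const std::vector<std::string> &stages)
{
  const std::string waiting= "Waiting for MDL";  // stage text as quoted above
  std::vector<std::string> result;
  for (const std::string &stage : stages)
  {
    const std::size_t n= result.size();
    // If the output ends in X, "Waiting for MDL" and the next stage is X
    // again, drop the wait and do not repeat X.
    if (n >= 2 && result[n - 1] == waiting && result[n - 2] == stage)
      result.pop_back();
    else
      result.push_back(stage);
  }
  return result;
}
```

Applied to the sequence A, "Waiting for MDL", A, "Waiting for MDL", A, B, this leaves A, B regardless of how many one-second waits occurred, which is what makes the result predictable.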