Commits · 85db6df41200bfd438eb45e8e1eb7794fd9064be · nexedi / MariaDB

18 Sep, 2023 1 commit

MDEV-32151 InnoDB scrubbing doesn't write zero while freeing the page for temporary tablespace · 85db6df4

Thirunarayanan Balathandayuthapani authored Sep 18, 2023

- InnoDB fails to mark the page status as FREED when freeing a page
of the temporary tablespace. As a result, scrubbing does not write
all zeroes to the file even though the pages are freed.

mtr_t::free(): Mark the page as freed for the temporary tablespace
as well

85db6df4

15 Sep, 2023 9 commits

Merge branch '10.5' into 10.6 · 0f870914
Yuchen Pei authored Sep 15, 2023

0f870914
Merge branch '10.4' into 10.5 · cf816263
Yuchen Pei authored Sep 15, 2023

cf816263

MDEV-32157 MDEV-28856 Spider: Tests, documentation, small fixes and cleanups · 18990f00

Yuchen Pei authored Sep 15, 2023

Removed some redundant hint related string literals from
spd_db_conn.cc

Clean up SPIDER_PARAM_*_[CHAR]LEN[S]

Add tests covering monitoring_kind=2. It reads from
mysql.spider_link_mon_servers with matching db_name, table_name,
link_id, and does not do anything about that...

How monitoring_* can be useful: in the deprecated spider high
availability feature, when one remote fails, spider will try another
remote, which apparently makes use of these table parameters.

A test covering the query_cache_sync table param. Some further tests
on some spider table params.

Wrapper should be case insensitive.

Code documentation on spider priority binary tree.

Add an assertion that static_key_cardinality is always -1. All tests
still pass.

18990f00

MDEV-32157 MDEV-28856 Spider: drop server in tests · 3b3200e2

Yuchen Pei authored Sep 15, 2023

This helps eliminate "server exists" failures

Also, spider/bugfix.mdev_29676, when enabled after MDEV-29525 is
pushed will fail because we have not --recorded the result. But the
failure will only emerge when working on MDEV-31138 where we manually
re-enable this test, so let's worry about that then.

3b3200e2

Merge branch '10.5' into 10.6 · b70d8fbf
Yuchen Pei authored Sep 15, 2023

b70d8fbf

MDEV-29502 Fix some issues with spider direct aggregate · 68a00207

Yuchen Pei authored Sep 14, 2023

The direct aggregate (DA) mechanism seems to be intended to work only
when a full table scan query would otherwise be executed from the spider
node, with the aggregation also done at the spider node. Typically this
happens in sub_select(). In the test spider.direct_aggregate_part,
direct aggregate allows sending COUNT statements directly to the data
nodes and adding up the results at the spider node, instead of iterating
over the rows one by one at the spider node.

By contrast, the group by handler (GBH) typically sends aggregated
queries directly to data nodes, in which case DA does not improve the
situation here.

That is why we should fix it by disabling DA when GBH is used.

There are other reasons supporting this change. First, the creation of
GBH results in a call to change_to_use_tmp_fields() (as opposed to
setup_copy_fields()) which causes the spider DA function
spider_db_fetch_for_item_sum_funcs() to work on wrong items. Second,
the spider DA function only calls direct_add() on the items, and the
follow-up add() needs to be called by the sql layer code. In
do_select(), after executing the query with the GBH, it seems that the
required add() would not necessarily be called.

Disabling DA when GBH is used does fix the bug. There are a few
other things included in this commit to improve the situation with
spider DA:

1. Add a session variable that allows the user to disable DA completely.
This will help as a temporary measure if/when further bugs with DA
emerge.

2. Move the increment of direct_aggregate_count to the spider DA
function. Currently this is done in rather bizarre and random
locations.

3. Fix the spider_db_mbase_row creation so that the last of its row
fields (the sentinel) is NULL. The code already does a null check, but
the sentinel field somehow ends up at an invalid address, causing
segfaults. With a correct implementation of the row creation, we can
avoid such segfaults.
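
The sentinel idea in item 3 can be illustrated with a minimal sketch. The type `row_sketch` and the function `count_fields` are hypothetical and are not the real spider_db_mbase_row; the only point is that the terminating NULL slot has to be part of the row's own storage so that a null check on it reads valid memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row holder, loosely modelled on the description above.
// It is NOT the real spider_db_mbase_row.
struct row_sketch
{
  // The value pointers, followed by exactly one nullptr sentinel.
  std::vector<const char*> fields;

  explicit row_sketch(const std::vector<const char*> &values)
    : fields(values)
  {
    // Store the sentinel inside the row's own allocation.
    fields.push_back(nullptr);
  }
};

// Walk the row until the sentinel is reached. This is safe only because
// the sentinel slot is guaranteed to exist and to be nullptr.
inline std::size_t count_fields(const row_sketch &row)
{
  std::size_t n = 0;
  while (row.fields[n] != nullptr)
    ++n;
  return n;
}

// Usage sketch.
inline void sentinel_example()
{
  std::vector<const char*> values{"a", "b"};
  row_sketch row(values);
  assert(count_fields(row) == 2);
}
```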

68a00207

Merge branch '10.4' into 10.5 · e95e9a22
Yuchen Pei authored Sep 15, 2023

e95e9a22

MDEV-31787 MDEV-26151 Add a test exercising non-0 spider_casual_read · 96760d3a

Yuchen Pei authored Sep 14, 2023

Also:
- clean up spider_check_and_get_casual_read_conn() and
  spider_check_and_set_autocommit()
- remove a couple of commented out code blocks

96760d3a

MDEV-31673 [fixup] Fixing indentation from previous mdev-31673 patch · d59334da
Yuchen Pei authored Sep 15, 2023

d59334da

14 Sep, 2023 10 commits

MDEV-32004: Cosmetic fixes · 15cd8542

Anel Husakovic authored Aug 25, 2023

- Reviewer: <knielsen@knielsen-hq.org>
            <brandon.nesterenko@mariadb.com>

15cd8542

MDEV-32004: Remove extra `server_<num>_1` connections during initialization · 8d6ae0f2

Anel Husakovic authored Aug 25, 2023

- Remove extra connections in the form of `server_number_1` for the same server
  during initialization of servers in the `rpl_init.inc` file.
- Remove disconnecting and reconnecting to the same connections,
  since they are not used by the test.
- Update comments about the above.

- Reviewer: <knielsen@knielsen-hq.org>
            <brandon.nesterenko@mariadb.com>

8d6ae0f2

MDEV-32004: Parse error in mtr tests when using rpl_check_server_ids parameter · 2534e5bc

Anel Husakovic authored Aug 24, 2023

- Fix the calling of the assertion condition when `rpl_check_server_ids` parameter is used.
- Fix comments regarding the default usage and configuration files
extension in this case.

- Reviewer: <knielsen@knielsen-hq.org>
            <brandon.nesterenko@mariadb.com>

2534e5bc

Merge 10.5 into 10.6 · 6a470db5
Marko Mäkelä authored Sep 14, 2023

6a470db5

MDEV-32163 Crash recovery fails after DROP TABLE in system tablespace · 81e60f1a

Marko Mäkelä authored Sep 14, 2023

fseg_free_extent(): After fsp_free_extent() succeeded, properly
mark the affected pages as freed. We failed to write FREE_PAGE records.

This bug was revealed or caused by
commit e938d7c1 (MDEV-32028).

81e60f1a

Remove duplicated default client include from replication my.cnf · b1ab4ec4

Anel Husakovic authored Aug 24, 2023

- `default_client` is already included in `rpl_1slave_base.cnf`, so
remove it from `my.cnf`
- Remove the option group for the `mysqld` server and add a comment on
how to override specific settings for a specific server

- Reviewer: <brandon.nesterenko@mariadb.com>

b1ab4ec4

MDEV-31673 MDEV-29502 Remove spider_db_handler::need_lock_before_set_sql_for_exec · d8e9f3d9
Yuchen Pei authored Sep 14, 2023

This function trivially returns false

d8e9f3d9
Merge branch '10.4' into 10.5 · cb1965bd
Yuchen Pei authored Sep 14, 2023

cb1965bd
Merge 10.5 into 10.6 · 0f9acce3
Marko Mäkelä authored Sep 14, 2023

0f9acce3

Fix cmake -DWITH_INNODB_AHI=OFF · cce76df5

Marko Mäkelä authored Sep 14, 2023

This fixes up commit 6cc88c3d

Thanks to Markus Mäkelä for reporting the build failure.

cce76df5

13 Sep, 2023 5 commits

MDEV-31177: SHOW SLAVE STATUS Last_SQL_Errno Race Condition on Errored Slave Restart · 1407f999

Brandon Nesterenko authored Sep 13, 2023

The SQL thread and a user connection executing SHOW SLAVE STATUS
race on Last_SQL_Errno: when a slave that previously errored and
stopped is started again, SHOW SLAVE STATUS can show that the SQL
thread is running while the previous error is still displayed.

The fix is to clear the last error, when the SQL thread starts,
before setting the status of Slave_SQL_Running.
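
The ordering described above can be sketched in a single thread. Everything here is hypothetical (`slave_status`, `start_sql_thread`, the `observe` callback); the real server protects this state with its own locking. The callback stands in for a SHOW SLAVE STATUS that happens to run between the two steps.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, simplified view of what SHOW SLAVE STATUS reports.
struct slave_status
{
  bool sql_running = false;
  int last_sql_errno = 0;
};

// Start-up in the fixed order: clear the stale error first, and only
// then publish that the SQL thread is running. `observe` models a
// concurrent reader looking at the state after each step.
inline void start_sql_thread(
    slave_status &st,
    const std::function<void(const slave_status&)> &observe)
{
  st.last_sql_errno = 0;   // step 1: clear the previous error
  observe(st);
  st.sql_running = true;   // step 2: report the thread as running
  observe(st);
}

// Usage sketch: no observation sees "running" together with an old error.
inline void ordering_example()
{
  slave_status st;
  st.last_sql_errno = 1062;   // left over from an earlier failure
  start_sql_thread(st, [](const slave_status &s) {
    assert(!(s.sql_running && s.last_sql_errno != 0));
  });
}
```

With the two steps swapped, the first observation would see a running thread together with the stale error number, which is the symptom reported in this commit.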

Thanks to Kristian Nielsen for his work diagnosing the problem!

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>

1407f999

MDEV-31038: rpl.rpl_xa_prepare_gtid_fail clean up · 7de0c7b5

Brandon Nesterenko authored Aug 24, 2023

- Removed commented out and unused lines.
- Updated test to reference true failure of timeout
  rather than deadlock
- Switched save variables from MTR to user
- Forced relay-log purge to not potentially re-execute
  an already prepared transaction

7de0c7b5

MDEV-31369 Disable TLS v1.0 and 1.1 for MariaDB · 1831f8e4

Daniel Black authored Jul 06, 2023

Remove TLSv1.1 from the default tls_version system variable.

Output a warning if TLSv1.0 or TLSv1.1 are selected.

Thanks Tingyao Nian for the feature request.

1831f8e4

post-merge fix · 9e9cefde
Sergei Golubchik authored Sep 13, 2023

9e9cefde

MDEV-31315 Add client_ed25519.dll to the list of plugins shipped with HeidiSQL · 5fe8d0d5

Oleg Smirnov authored Jul 12, 2023

There is a list of plugins in the WiX configuration file for HeidiSQL,
and the installer only installs DLLs from that list although the HeidiSQL
portable archive may include other plugins.

This commit adds client_ed25519.dll to this list and also rearranges
the list alphabetically, so it is easier to verify its contents

5fe8d0d5

12 Sep, 2023 4 commits

MDEV-32150 InnoDB reports corruption on 32-bit platforms with ibd files sizes > 4GB · d20a4da2

Marko Mäkelä authored Sep 12, 2023

buf_read_page_low(): Use 64-bit arithmetic when computing the
file byte offset. In other calls to fil_space_t::io() the offset
was being computed correctly, for example by
buf_page_t::physical_offset().
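
A minimal sketch of the kind of overflow involved, assuming the offset was being computed in a 32-bit type. The helper names are hypothetical and do not correspond to the actual InnoDB functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the actual InnoDB function): byte offset of a
// page within its data file, computed with 64-bit arithmetic.
constexpr std::uint64_t page_byte_offset(std::uint32_t page_no,
                                         std::uint32_t physical_size)
{
  // Widen one operand first so that the multiplication itself is 64-bit.
  return std::uint64_t{page_no} * physical_size;
}

// The same product truncated to 32 bits, which is what a computation in
// a 32-bit type would yield once the file grows past 4 GiB.
constexpr std::uint32_t truncated_offset(std::uint32_t page_no,
                                         std::uint32_t physical_size)
{
  return static_cast<std::uint32_t>(std::uint64_t{page_no} * physical_size);
}

// Usage sketch: page 262144 of 16 KiB pages starts exactly at 4 GiB.
inline void offset_example()
{
  assert(page_byte_offset(262144, 16384) == 4294967296ULL);
  assert(truncated_offset(262144, 16384) == 0);  // wrapped around to 0
}
```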

d20a4da2

MDEV-30100 fixup: Remove a failing debug assertion · 736901b4

Marko Mäkelä authored Sep 12, 2023

trx_purge_truncate_history(): Remove a debug assertion that
had originally been added in
commit 0de3be8c (MDEV-30671).
In trx_t::commit_empty() we do not have any efficient way to rewind
rseg.needs_purge to an accurate value that would satisfy this
debug assertion.

Note: No correctness property should be violated here. At the point
where the debug assertion was located, we had already established
that purge_sys.sees(rseg.needs_purge) holds, that is, it is safe
to remove everything from rseg.

736901b4

MDEV-26782 fixup: Remove dead code · 3c840ae7

Marko Mäkelä authored Sep 12, 2023

trx_undo_reuse_cached(): Assert that this is being invoked on the
persistent rollback segment of the transaction, and remove dead code
that was handling cached temporary undo log. This was missed in
commit 51e62cb3 (MDEV-26782).

3c840ae7

MDEV-31833 replication breaks when using optimistic replication and replica is a galera node · a3cbc44b

sjaakola authored Sep 12, 2023

The MariaDB async replication SQL thread was stopped on any failure
to apply replication events, and the error message logged for the failure
was: "Node has dropped from cluster". The assumption was that a failure to
apply an event is always due to the node dropping out of the cluster.
With optimistic parallel replication, applying an event can fail for natural
reasons, and applying should be retried to handle the failure. This retry
logic was never exercised because the slave SQL thread was stopped on the
first apply failure.

To support the optimistic parallel replication retry logic, this commit
skips the replication slave abort if the node remains in the cluster
(wsrep_ready==ON) and replication is configured for optimistic or
aggressive retry logic.
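
The condition described above can be written as a small predicate. This is only a sketch: the enum and the function name are hypothetical, and the enumerators simply follow the values of the slave_parallel_mode setting.

```cpp
#include <cassert>

// Hypothetical enum mirroring the slave_parallel_mode values.
enum class parallel_mode { none, minimal, conservative, optimistic, aggressive };

// Returns true when a failed event apply should NOT abort the slave SQL
// thread, so that the parallel replication retry logic gets a chance.
constexpr bool skip_slave_abort(bool wsrep_ready, parallel_mode mode)
{
  return wsrep_ready &&
         (mode == parallel_mode::optimistic ||
          mode == parallel_mode::aggressive);
}

// Usage sketch.
inline void abort_example()
{
  assert(skip_slave_abort(true, parallel_mode::optimistic));
  assert(!skip_slave_abort(false, parallel_mode::optimistic));   // node left the cluster
  assert(!skip_slave_abort(true, parallel_mode::conservative));  // no retry mode
}
```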

During the development of this fix, galera.galera_as_slave_nonprim test showed
some problems. The test was analyzed, and it appears to need some attention.
One excessive sleep command was removed in this commit, but it will need more
fixes still to be fully deterministic. After this commit galera_as_slave_nonprim
is successful, though.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

a3cbc44b

11 Sep, 2023 11 commits

galera: wsrep-lib submodule update · 1adfdfbd
Julius Goryavsky authored Sep 12, 2023

1adfdfbd

MDEV-32051 Failed to insert streaming client · ef4b59fa

Daniele Sciascia authored Sep 01, 2023

- Deterministic test to reproduce the warning
- Update wsrep-lib to fix the issue
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

ef4b59fa

MDEV-31988 : galera_partition test: assertion due to unallowed state transition · fee138a1

Jan Lindström authored Aug 24, 2023

Test case is starting too many servers that are not really
needed for original problem testing. This fix reduces
number of servers to make test case smaller and more
robust.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

fee138a1

MDEV-29861 : Galera "notify" test cases hang · 632a503c

Jan Lindström authored Aug 23, 2023

The problem was that if wsrep_notify_cmd was set, the script was called
with the new status "joined" and tried to connect to the server
to update some table, but the server was not initialized yet and
was not listening for connections. So the server waited for the
script to finish, the script waited for the mariadb client to connect,
and the client could not connect because the server was not listening.

The fix is to call the script only when Galera has already formed a
view or when the node is synced or a donor.
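
As a rough sketch of that rule, with hypothetical names that do not mirror the wsrep identifiers:

```cpp
#include <cassert>

// Hypothetical node states for illustration only.
enum class node_status { joining, joined, donor, synced };

// Run the notification script only once a view has been formed, or when
// the node is synced or acting as a donor.
constexpr bool may_run_notify_script(node_status status, bool view_formed)
{
  return view_formed ||
         status == node_status::synced ||
         status == node_status::donor;
}

// Usage sketch.
inline void notify_example()
{
  // "joined" before any view exists: the script must not be called yet.
  assert(!may_run_notify_script(node_status::joined, false));
  assert(may_run_notify_script(node_status::joined, true));
  assert(may_run_notify_script(node_status::donor, false));
}
```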

This fix also enables following test cases:
* galera.MW-284
* galera.galera_binlog_checksum
* galera_var_notify_ssl_ipv6
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

632a503c

MDEV-32145 Disable read-ahead for temporary tablespace · a03b8cd0

Thirunarayanan Balathandayuthapani authored Sep 11, 2023

- The lifetime of temporary tables is expected to be short, so it
seems reasonable to assume that all temporary tablespace pages
will remain in the buffer pool. Hence it doesn't make sense to have
read-ahead for pages of the temporary tablespace

a03b8cd0

MDEV-32134 InnoDB hang in buf_flush_wait_LRU_batch_end() · cdd2fa7f

Marko Mäkelä authored Sep 11, 2023

buf_flush_page_cleaner(): Before finishing a batch, wake up any threads
that are waiting for buf_pool.done_flush_LRU.

This should fix a hung shutdown that we observed
after SET GLOBAL innodb_buffer_pool_size was executed
to shrink the InnoDB buffer pool.

cdd2fa7f

MDEV-32103 InnoDB ALTER TABLE is not crash-safe · 466d9f5f

Marko Mäkelä authored Sep 11, 2023

Starting with commit 4ff5311d,
log_write_up_to(trx->commit_lsn, true) in DDL operations could end up
being a no-op, because trx->commit_lsn would be 0.

trx_flush_log_if_needed(): Revert an incorrect attempt to ensure
that DDL operations are crash-safe.

trx_t::commit(std::vector<pfs_os_file_t> &), ha_innobase::rename_table():
Set trx_t::flush_log_later so that trx_t::commit_in_memory() will
retain trx_t::commit_lsn for the final durability call.

Tested by: Matthias Leich

466d9f5f

MDEV-30531 Corrupt index(es) on busy table when using FOREIGN KEY · 4a8291fc

Marko Mäkelä authored Sep 11, 2023

lock_wait(): Never return the transient error code DB_LOCK_WAIT.
In commit 78a04a4c (MDEV-29869),
some assignments trx->error_state = DB_SUCCESS were removed,
and it was possible that the field was left at its initial value
DB_LOCK_WAIT.

The test case for this is nondeterministic; without this fix, it
would only occasionally fail.

Reviewed by: Vladislav Lesin

4a8291fc

MDEV-32096 Parallel replication lags because innobase_kill_query() may fail to interrupt a lock wait · e039720b

Marko Mäkelä authored Sep 11, 2023

lock_sys_t::cancel(trx_t*): Remove, and merge to its only caller
innobase_kill_query().

innobase_kill_query(): Before reading trx->lock.wait_lock,
do acquire lock_sys.wait_mutex, like we did before
commit e71e6133 (MDEV-24671).
In this way, we should not miss a recently started lock wait
by the killee transaction.

lock_rec_lock(): Add a DEBUG_SYNC "lock_rec" for the test case.

lock_wait(): Invoke trx_is_interrupted() before entering the wait,
in case innobase_kill_query() was invoked some time earlier and
some longer-running operation did not check for interrupts.
As suggested by Vladislav Lesin, do not overwrite
trx->error_state==DB_INTERRUPTED with DB_SUCCESS.
This would avoid a call to trx_is_interrupted() when the test is
modified to use the DEBUG_SYNC point lock_wait_start instead of lock_rec.
Avoid some redundant loads of trx->lock.wait_lock; cache the value
in the local variable wait_lock.

Deadlock::check_and_resolve(): Take wait_lock as a parameter and
return wait_lock (or -1 or nullptr). We only need to reload
trx->lock.wait_lock if lock_sys.wait_mutex had been released
and reacquired.

trx_t::error_state: Correctly document the data member.

trx_lock_t::was_chosen_as_deadlock_victim: Clarify that other threads
may set the field (or flags in it) while holding lock_sys.wait_mutex.

Thanks to Johannes Baumgarten for reporting the problem and testing
the fix, as well as to Kristian Nielsen for suggesting the fix.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich

e039720b

Merge 10.5 into 10.6 · 0dd25f28
Marko Mäkelä authored Sep 11, 2023

0dd25f28

MDEV-21679 fixup for s390x · ef569c32

Marko Mäkelä authored Sep 11, 2023

Some s390x environments include
https://github.com/madler/zlib/pull/410
and a more pessimistic compressBound: (sourceLen * 16 + 2308) / 8 + 6.
Let us adjust the recently enabled tests accordingly.
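
For a sense of scale, the bound quoted above can be transcribed literally. The function name is hypothetical; only the formula comes from the commit message. It works out to roughly twice the input length plus a few hundred bytes.

```cpp
#include <cassert>
#include <cstdint>

// The more pessimistic bound quoted above, transcribed literally.
// The function name is hypothetical.
constexpr std::uint64_t pessimistic_compress_bound(std::uint64_t source_len)
{
  return (source_len * 16 + 2308) / 8 + 6;
}

// Usage sketch.
inline void bound_example()
{
  // For a 16 KiB input: (262144 + 2308) / 8 + 6 = 33056 + 6
  assert(pessimistic_compress_bound(16384) == 33062);
}
```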

ef569c32