Commits · 040069f4baead789bcb9dd55bb4932f6d1388d7c · nexedi / MariaDB

17 Apr, 2024 6 commits

MDEV-33431 Latching order violation reported fil_system.sys_space.latch and... · 040069f4

mariadb-DebarunBanerjee authored Apr 17, 2024

MDEV-33431 Latching order violation reported fil_system.sys_space.latch and ibuf_pessimistic_insert_mutex

Issue:
------
The actual order of acquisition of the IBUF pessimistic insert mutex
(SYNC_IBUF_PESS_INSERT_MUTEX) and IBUF header page latch
(SYNC_IBUF_HEADER) w.r.t space latch (SYNC_FSP) differs from the order
defined in sync0types.h. It was not discovered earlier as the path to
ibuf_remove_free_page was not covered by the mtr test. Ideal order and
one defined in sync0types.h is as follows.
SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX -> SYNC_FSP

In ibuf_remove_free_page, we acquire space latch earlier and we have
the order as follows resulting in the assert with innodb_sync_debug=on.
SYNC_FSP -> SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX

Fix:
---
We do maintain this order in other places and there doesn't seem to be
any real issue here. To reduce impact in GA versions, we avoid doing
extensive changes in mutex ordering to match the current
SYNC_IBUF_PESS_INSERT_MUTEX order. Instead we relax the ordering check
for IBUF pessimistic insert mutex using SYNC_NO_ORDER_CHECK.

040069f4

MDEV-33840 tpool- switch to longer maintenance timer interval, if pool is idle · f6e9600f

Vladislav Vaintroub authored Apr 17, 2024

The previous solution, which switched the timer off entirely, turned out
to be deadlock-prone.

This patch fixes the previous attempt to switch between long and short
interval periods in MDEV-24295. The initial state of the timer is now
fixed (it is ON). Also, avoid switching the timer to longer periods if
there is any activity in the pool.

f6e9600f

Revert "MDEV-33840 tpool : switch off maintenance timer when not needed." · 2ba79aba
Vladislav Vaintroub authored Apr 17, 2024
```
This reverts commit 09bae92c.
```
2ba79aba
Merge 10.4 into 10.5 · 3a3fe300
Marko Mäkelä authored Apr 17, 2024

3a3fe300
Tests: remove a duplicated check · 9164c2b8
Marko Mäkelä authored Apr 17, 2024
```
This fixes up the merge commit 9b182756
```
9164c2b8

MDEV-33895 : Galera test failure on galera_sr.MDEV-25718 · 4aeba259

Jan Lindström authored Apr 12, 2024

The test was waiting for the INSERT statement to roll back, but
the wait_condition was too tight: the state could be either
"Freeing items" or "Rollback". Fixed wait_condition
to expect either of them.

4aeba259

16 Apr, 2024 3 commits

MDEV-33889 Read only server throws error when running a create temporary table as select statement · 41e7ceb0

Sergei Golubchik authored Apr 15, 2024

create_partitioning_metadata() should only mark transaction r/w
if it actually did anything (that is, the table is partitioned).

otherwise it's a no-op, called even for temporary tables and
it shouldn't do anything at all

41e7ceb0

Merge branch '10.4' into 10.5 · 9b182756
Oleksandr Byelkin authored Apr 16, 2024

9b182756

MDEV-33861 main.query_cache fails with embedded after enabling WITH_PROTECT_STATEMENT_MEMROOT · 50998a6c

Oleksandr Byelkin authored Apr 15, 2024

Synopsis: If SELECT returned answer from Query Cache it is not really executed.

The reason for firing of assertion
DBUG_ASSERT((mem_root->flags & ROOT_FLAG_READ_ONLY) == 0);
is that in case the query_cache is on and the same query run by different
stored routines the following use case can take place:
First, let's say that the bodies of the routines used by the test case are
the same and contain only the query 'SELECT * FROM t1';
call p1() -- a result set is stored in query cache for further use.
call p2() -- the same query is run against the table t1, that result in
not running the actual query but using its cached result.
On finishing execution of this routine, its memory root is
marked for read only since every SP instruction that this
routine contains has been executed.
INSERT INTO t1 VALUE (1); -- force following invalidation of query cache
call p2() -- query the table t1 will result in assertion failure since its
execution would require allocation on the memory root that
has been already marked as read only memory root

The root cause of firing the assertion is that memory root of the stored
routine 'p2' was marked as read only although actual execution of the query
contained inside hadn't been performed.

To fix the issue, mark an SP instruction as not yet run in case its execution
doesn't result in real query processing and the result set is taken from the
query cache instead.

Note that this issue only affects servers built in debug mode AND with the
protect statement memory root feature turned on. It doesn't affect servers
built in release mode.

50998a6c

15 Apr, 2024 5 commits

Fix windows build failure · ce104d41
Kristian Nielsen authored Apr 15, 2024
```
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
```
ce104d41
Merge from 10.4 to 10.5 · 16aa4b5f
Kristian Nielsen authored Apr 15, 2024
```
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
```
16aa4b5f

Distinguish "manager stopped" from "manager not started" · 10272f37

Kristian Nielsen authored Apr 15, 2024

This way, if manager thread somehow starts and stops again quickly before
main thread wakes up to check if it started correctly, we will not hang.

Patch suggested by Monty as follow-up to
7f498fba

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
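The idea of a separate "stopped" state can be sketched as follows. This is a minimal illustration, not the server's manager code: `ToyManager`, `ManagerState`, and both method names are hypothetical. The waiter only checks that the state has left `NOT_STARTED`, so a manager that starts and stops before the waiter wakes up does not cause a hang.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical three-state flag; the names are illustrative only.
enum class ManagerState { NOT_STARTED, RUNNING, STOPPED };

class ToyManager
{
  std::mutex m;
  std::condition_variable cv;
  ManagerState state= ManagerState::NOT_STARTED;

public:
  // Called by the manager thread when it starts and when it stops.
  void set_state(ManagerState s)
  {
    assert(s != ManagerState::NOT_STARTED);  // the state never goes back
    {
      std::lock_guard<std::mutex> lk(m);
      state= s;
    }
    cv.notify_all();
  }

  // Called by the main thread. Returns as soon as the manager has left
  // NOT_STARTED, whether it is still RUNNING or already STOPPED.
  ManagerState wait_until_started()
  {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [this] { return state != ManagerState::NOT_STARTED; });
    return state;
  }
};
```

With a plain boolean "running" flag, a quick start followed by a stop would be indistinguishable from "never started", and the waiter would block forever.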

10272f37

MDEV-33559 matched_rec::block should be allocated from the buffer pool · a032f14b

Marko Mäkelä authored Apr 15, 2024

matched_rec::rec_buf[], matched_rec::bufp: Remove.

matched_rec::block: Make this a pointer to something that
is allocated by buf_block_alloc(). In this way, the only
case where buf_block_t is constructed outside buf_pool
is ALTER TABLE...IMPORT TABLESPACE.

rtr_info::heap: Remove. This was only used for allocating matched_rec,
which now is smaller.

mtr_t::memmove(): Simplify some code to avoid GCC 9.4.0 -Wconversion
in the 10.6 branch as a result of these changes.

Reviewed by: Debarun Banerjee

a032f14b

MDEV-30676 rpl.parallel_backup* tests sometimes fail · ea810b04
Daniel Black authored Mar 06, 2024
```
Raise innodb_lock_wait_timeout from 1 to 5
```
ea810b04

14 Apr, 2024 3 commits

MDEV-33777 Spider: Correct checks for show index column numbers · 051a1fa0
Yuchen Pei authored Mar 27, 2024
```
It was updated for 10.6+ in MDEV-7317. Because a lower version spider
node may connect to a higher version data node, we need to change this
for 10.4 and 10.5 as well.
```
051a1fa0

MDEV-28993 Spider: Push down CASE statement · 18b93d6e
Yuchen Pei authored Mar 20, 2024

18b93d6e

MDEV-28993 spider: revert removal of ITEM_FUNC_CASE_PARAMS_ARE_PUBLIC · 99dc0f03
Yuchen Pei authored Mar 20, 2024
```
It was done in MDEV-29447.
```
99dc0f03

13 Apr, 2024 6 commits

feedback plugin: abort sending the report on server shutdown · 8bc32410

Sergei Golubchik authored Apr 13, 2024

network timeouts might be rather large and feedback plugin
waits forever for the sender thread to exit.

an alternative could've been to use GNU-specific pthread_timedjoin_np(),
where _np means "not portable".

8bc32410

Fixed random failure in main.kill_processlist-6619 (take 3) · 6a4ac4c7

Sergei Golubchik authored Apr 13, 2024

followup for 81f75ca8

improve over take 2. It's technically possible, though unlikely,
to see THD after it already reset the info to NULL, but has not
changed the command to COM_SLEEP yet (see THD::mark_connection_idle()).

Let's wait for "Sleep", not for NULL.

6a4ac4c7

galera/suite.pm: perl warning · 69b5fdf3
Sergei Golubchik authored Apr 11, 2024
```
Unescaped left brace in regex is passed through in regex
```
69b5fdf3

Minor improvements to options error handling · 79706fd3

Tony Chen authored Mar 08, 2024

- Add additional MTRs for more coverage on invalid options
- Updating a few error messages to be more informative
- Use the exit code from handle_options() when there is an error processing
  user options

All new code of the whole pull request, including one or several files that are
either new files or modified ones, are contributed under the BSD-new license. I
am contributing on behalf of my employer Amazon Web Services, Inc.

79706fd3

MDEV-33469 Fix behavior on invalid arguments · 47d75cdd

Tony Chen authored Feb 25, 2024

When passing in an invalid value (e.g. incorrect data type) for a variable, the
server startup will fail with misleading error messages.

The behavior **before** this change:

For server options:
- The error message will indicate that the argument is being adjusted to a valid value
- Server startup still fails

For plugin options:
- The error message will indicate that the argument is being adjusted to a valid value
- The plugin is still disabled
- Server startup fails with a message that it does not recognize the plugin option

The behavior **after** this change:

For server options:
- Output that an invalid argument was provided
- Exit server startup

For plugin options:
- Output that an invalid argument was provided
- Disable the plugin
- Attempt to continue server startup

All new code of the whole pull request, including one or several files that are
either new files or modified ones, are contributed under the BSD-new license. I
am contributing on behalf of my employer Amazon Web Services, Inc.

47d75cdd

Simplify MTR for handling multiple invalid options · dd639985

Tony Chen authored Mar 08, 2024

In 69a4d6ae, an MTR test was added to verify that we handled multiple invalid
options. However, the logic to perform this test relied on a non-trivial regex
to filter out the noise in the logs.

Instead, we now just simply search for what we expect to be in the logs.

All new code of the whole pull request, including one or several files that are
either new files or modified ones, are contributed under the BSD-new license. I
am contributing on behalf of my employer Amazon Web Services, Inc.

dd639985

12 Apr, 2024 1 commit

MDEV-33802 Weird read view after ROLLBACK of other transactions. · d7fc975c

Vlad Lesin authored Apr 02, 2024

If some unique key fields are nullable, a unique index can contain
several records with the same key fields, provided that at least
one key field is NULL, because NULL != NULL.

When a transaction is resumed after waiting on a record with at least one
key field equal to NULL, and the record stored in the persistent cursor has
been deleted, the persistent cursor can be restored to a record whose key
fields all equal the stored ones, with at least one field equal to
NULL. Such a record was wrongly treated as having the same unique
key as the record stored in the persistent cursor, which is wrong because
NULL != NULL.

The fix is to check if at least one unique field is NULL in restored
persistent cursor position, and, if so, then don't treat the record as
one with the same unique key as in the stored record key.
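
The comparison rule can be sketched as follows. This is a simplified illustration, not the InnoDB persistent cursor code: `Key` and `same_unique_key()` are hypothetical names, and an empty `std::optional` stands in for SQL NULL (C++17).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical key representation: an empty optional models SQL NULL.
using Key= std::vector<std::optional<int>>;

// Returns true only if every field is non-NULL and equal in both keys.
// Any NULL field makes the keys distinct, because NULL != NULL.
bool same_unique_key(const Key &stored, const Key &restored)
{
  // Both keys are assumed to come from the same index.
  assert(stored.size() == restored.size());
  for (std::size_t i= 0; i < stored.size(); i++)
  {
    if (!stored[i] || !restored[i])
      return false;                  // a NULL field never matches
    if (*stored[i] != *restored[i])
      return false;
  }
  return true;
}
```

For example, two keys `(1, NULL)` and `(1, NULL)` compare as different here, while `(1, 2)` and `(1, 2)` compare as the same unique key.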

dict_index_t::nulls_equal was removed, as it was initially developed for
"intrinsic tables", which never existed in MariaDB, and there is no code
that would set it to "true".

Reviewed by Marko Mäkelä.

d7fc975c

11 Apr, 2024 4 commits

MDEV-10684: rpl.rpl_domain_id_filter_restart fails in buildbot · a6aecbb0

Brandon Nesterenko authored Apr 11, 2024

The test failure in rpl.rpl_domain_id_filter_restart is caused by
MDEV-33887. That is, the test uses master_pos_wait() (called
indirectly by sync_slave_with_master) to try and wait for the
replica to catch up to the master. However, the waited on
transaction is ignored by the configured
  CHANGE MASTER TO IGNORE_DOMAIN_IDS=()
As MDEV-33887 reports, due to the IO thread updating the binlog
coordinates and the SQL thread updating the GTID state, if the
replica is stopped in-between these updates, the replica state will
be inconsistent. That is, the test expects that the GTID state will
be updated, so upon restart, the replica will be up-to-date.
However, if the replica is stopped before the SQL thread updates its
GTID state, then upon restart, the replica will fetch the previously
ignored event, which is no longer ignored upon restart, and execute
it. This leads to the sporadic extra row in t2.

This patch changes master_pos_wait() to use master_gtid_wait() to
ensure the replica state is consistent with the master state.

a6aecbb0

Fix g++-14 -Wtemplate-id-cdtor · 04be12a8
Marko Mäkelä authored Apr 11, 2024

04be12a8

Link beginner instructions in README.md · f131c609

anson1014 authored Apr 09, 2024

When navigating through the existing links in the README, it is not
immediately obvious where to go to find instructions in building
and testing the source code. Since the README is often the first
thing people see when looking at a repository, this information
should be front and centre so that newcomers to the project can
get set up as quickly as possible.

f131c609

Update README.md · 8785b797
Ian Gilfillan authored Apr 08, 2024

8785b797

10 Apr, 2024 8 commits

MDEV-32458 ASAN unknown-crash in Inet6::ascii_to_fbt when casting character string to inet6 · 37fd497c

Alexander Barkov authored Apr 10, 2024

The condition checked the value of the leftmost byte before checking if
at least one byte is still available in the buffer.
Changed the order in the condition: check that a byte is available before
checking the byte value.
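
The ordering can be illustrated with a small sketch. The helper below is hypothetical and is not the actual Inet6::ascii_to_fbt() code; it only shows the pattern of testing availability before reading a byte.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts the leading ':' characters in the buffer
// [str, str + length), which is not assumed to be NUL-terminated.
std::size_t count_leading_colons(const char *str, std::size_t length)
{
  assert(str != nullptr || length == 0);
  std::size_t pos= 0;
  // Availability is tested first. Because && short-circuits, str[pos]
  // is read only when pos < length holds.
  while (pos < length && str[pos] == ':')
    pos++;
  return pos;
}
```

Writing the condition as `str[pos] == ':' && pos < length` would read one byte past the end of the buffer whenever the whole input consists of colons, which is the kind of access that ASAN reports.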

37fd497c

sporadic failures of rpl.rpl_semi_sync_master_shutdown · 2d2172a5

Sergei Golubchik authored Apr 10, 2024

increase the MASTER_CONNECT_RETRY time under valgrind,
otherwise the slave gives up retrying before the master is ready

also, cosmetic cleanup of rpl_semi_sync_master_shutdown.test

2d2172a5

MDEV-31779 Server crash in Rows_log_event::update_sequence upon replaying binary log · 0da1653f

Andrei authored Apr 10, 2024

The crash when running mysqlbinlog on a binlog file containing SEQUENCE
events was caused by the MDEV-29621 fixes, which did not check whether the
slave or the binlog applier executes a block introduced there.

The block is meaningful only for the parallel slave applier, so
it's safe to fix this bug by identifying the actual applier and
skipping the block when it's the mysqlbinlog one.

0da1653f

MDEV-29149 Assertion `!is_valid_datetime() ||... · b697dce8

Alexander Barkov authored Apr 10, 2024

MDEV-29149 Assertion `!is_valid_datetime() || fraction_remainder(((item->decimals) < (6) ? (item->decimals) : (6))) == 0' failed in Datetime_truncation_not_needed::Datetime_truncation_not_needed

TIME-alike string and numeric arguments to TIMEDIFF()
can get additional fractional seconds during the supported
TIME range adjustment in get_time().

For example, during TIMEDIFF('839:00:00','00:00:00') evaluation
in Item_func_timediff::get_date(), the call for args[0]->get_time()
returns MYSQL_TIME '838:59:59.999999'.

Item_func_timediff::get_date() did not handle these extra digits
and returned a MYSQL_TIME result with fractional digits outside
of Item_func_timediff::decimals. This mismatch could further be
caught by a DBUG_ASSERT() in various other pieces of the code,
leading to a crash.

Fix:

If get_time() returned MYSQL_TIMESTAMP_TIME,
let's truncate all extra digits using my_time_trunc(&l_time,decimals).
This guarantees that the rest of the code returns a MYSQL_TIME
with second_part not conflicting with Item_func_timediff::decimals.
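
The effect of the truncation on the fractional part can be sketched as follows. `truncate_second_part()` is a hypothetical stand-in written for illustration; it is not the implementation of my_time_trunc(), and it only models the microsecond field.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keeps only the first `decimals` fractional digits
// of a microsecond value (0..999999) and zeroes the rest.
std::uint32_t truncate_second_part(std::uint32_t usec, unsigned decimals)
{
  assert(usec < 1000000 && decimals <= 6);
  std::uint32_t divisor= 1;
  for (unsigned i= decimals; i < 6; i++)
    divisor*= 10;                    // 10^(6 - decimals)
  return usec - usec % divisor;
}
```

With `decimals == 0`, the fractional part 999999 from the '838:59:59.999999' example becomes 0, so the value no longer carries digits beyond the declared precision.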

b697dce8

MDEV-33512 Corrupted table after IMPORT TABLESPACE and restart · d8249775

Marko Mäkelä authored Apr 10, 2024

In commit d74d9596 (MDEV-18543)
there was an error that would cause the hidden metadata record
to be deleted, and therefore cause the table to appear corrupted
when it is reloaded into the data dictionary cache.

PageConverter::update_records(): Do not delete the metadata record,
but do validate it.

RecIterator::open(): Make the API more similar to 10.6, to simplify
merges.

d8249775

MDEV-25089 : Assertion `error.len > 0' failed in galera::ReplicatorSMM::handle_apply_error() · 0304dbc3
Jan Lindström authored Nov 01, 2023
```
Additional corrections after merge from 10.4 branch
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
```
0304dbc3

MDEV-28366 GLOBAL debug_dbug setting affected by collation_connection=utf16... · 9fb8881e

Alexander Barkov authored Apr 09, 2024

When the system variable @@debug_dbug was assigned to
some expression, Sys_debug_dbug::do_check() did not properly
convert the value from the expression character set to utf8.
So the value was erroneously reinterpreted as utf8 without
conversion. In case of a tricky expression character set
(e.g. utf16le), this led to unexpected results.

Fix:

Re-using Sys_var_charptr::do_string_check() in Sys_debug_dbug::do_check().

9fb8881e

MDEV-33661 MENT-1591 Keep spider in memory until exit in ASAN builds · 662bb176

Yuchen Pei authored Mar 13, 2024

Same as MDEV-29579. For some reason, libodbc does not clean up
properly if unloaded too early with the dlclose() of spider. So we add
UNIQUE symbols to spider so that spider is not unloaded in dlclose().

This change, however, uncovers some hidden problems in the spider
codebase, for which we move the initialisation of some spider global
variables into the initialisation of spider itself.

Spider has some global variables. Their initialisation should be done
in the initialisation of spider itself; otherwise, if spider were
re-initialised without these symbols being unloaded, the values could
be inconsistent and cause issues.

One such issue is caused by the variables
spider_mon_table_cache_version and spider_mon_table_cache_version_req.
They are used for resetting the spider monitoring table cache and have
initial values of 0 and 1 respectively. We have that always
spider_mon_table_cache_version_req >= spider_mon_table_cache_version,
and when the relation is strict, the cache is reset,
spider_mon_table_cache_version is brought to be equal to
spider_mon_table_cache_version_req, and the cache is searched for
matching table_name, db_name and link_idx. If the relation is equal,
no reset would happen and the cache would be searched directly.

When spider is re-initialised without resetting the values of
spider_mon_table_cache_version and spider_mon_table_cache_version_req,
which were made equal by the previous cache reset, no reset happens
even though the cache was emptied in the previous spider deinit. This
unexpectedly results in HA_ERR_KEY_NOT_FOUND.

An alternative way to fix this issue would be to call the spider udf
spider_flush_mon_cache_table(), which increments
spider_mon_table_cache_version_req thus making sure the inequality is
strict. However, there's no reason for spider to initialise these
global variables on dlopen(), rather than on spider init, which is
cleaner and "purer".

To reproduce this issue, simply revert the changes involving the two
variables and then run:

mtr --no-reorder spider.ha{,_part}
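
The version/request relation described in this message can be sketched with a toy model. `ToyMonCache` and its members are hypothetical names, the single reloaded entry is invented for illustration, and the in-class initialisers stand in for values set once at dlopen(); none of this is the spider code.

```cpp
#include <cassert>
#include <map>
#include <string>

struct ToyMonCache
{
  long version= 0;       // models spider_mon_table_cache_version
  long version_req= 1;   // models spider_mon_table_cache_version_req
  std::map<std::string, int> entries;

  // Models moving the initialisation into plugin init.
  void init() { version= 0; version_req= 1; entries.clear(); }
  // Models plugin deinit: the cache is emptied, versions are untouched.
  void deinit() { entries.clear(); }
  // Models the flush UDF: makes the inequality strict again.
  void request_flush() { version_req++; }
  // Hypothetical reload from the backing monitoring table.
  void reload() { entries= {{"db.t1", 1}}; }

  bool lookup(const std::string &name)
  {
    assert(version_req >= version);
    if (version_req > version)   // strict: reset the cache first
    {
      reload();
      version= version_req;
    }
    return entries.count(name) != 0;
  }
};
```

After a first lookup the two versions are equal. A deinit() that is not followed by init() then leaves an empty cache that is never reloaded, which mirrors the unexpected "key not found" result; either init() or request_flush() restores the strict inequality.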

662bb176

09 Apr, 2024 4 commits

MDEV-30260: Slave crashed:reload_acl_and_cache during shutdown · 952ab9a5

Brandon Nesterenko authored Apr 08, 2024

The signal handler thread can use various different runtime
resources when processing a SIGHUP (e.g. master-info information)
due to calling into reload_acl_and_cache(). Currently, the shutdown
process waits for the termination of the signal thread after
performing cleanup. However, this could cause resources actively
used by the signal handler to be freed while reload_acl_and_cache()
is processing.

The specific resource that caused MDEV-30260 is a race condition for
the hostname_cache, such that mysqld would delete it in
clean_up()::hostname_cache_free(), before the signal handler would
use it in reload_acl_and_cache()::hostname_cache_refresh().

Another similar resource is the active_mi/master_info_index. There
was a race between its deletion by the main thread in end_slave(),
and their usage by the Signal Handler as a part of
Master_info_index::flush_all_relay_logs.read(active_mi) in
reload_acl_and_cache().

This patch fixes these race conditions by relocating where server
shutdown waits for the signal handler to die until after
server-level threads have been killed (i.e., as a last step of
close_connections()). With respect to the hostname_cache, active_mi
and master_info_cache, this ensures that they cannot be destroyed
while the signal handler is still active, and potentially using
them.

Additionally:

 1) This requires that Events memory is still in place for SIGHUP
handling's mysql_print_status(). So event deinitialization is moved
into clean_up(), but the event scheduler still needs to be stopped
in close_connections() at the same spot.

 2) The function kill_server_thread is no longer used, so it is
deleted

 3) The timeout to wait for the death of the signal thread was not
consistent with the comment. The comment mentioned up to 10 seconds,
whereas it was actually 0.01s. The code has been fixed to wait up to
10 seconds.

 4) A warning has been added if the signal handler thread fails to
exit in time.

 5) Added pthread_join() to end of wait_for_signal_thread_to_end()
if it hadn't ended in 10s with a warning. Note this also removes
the pthread_detached attribute from the signal_thread to allow
for the pthread_join().

Reviewed By:
===========
Vladislav Vaintroub <wlad@mariadb.com>
Andrei Elkin <andrei.elkin@mariadb.com>

952ab9a5

MDEV-33867 main.query_cache_debug fails with heap-use-after-free · 4980fcb9

Sergei Golubchik authored Apr 09, 2024

What's happening:
1. Query_cache::insert() locks the QC and verifies that it's enabled
2. parallel thread tries to disable it. trylock fails (QC is locked)
   so the status becomes DISABLE_REQUEST
3. Query_cache::insert() calls Query_cache::write_result_data()
   which allocates a new block and unlocks the QC.
4. Query_cache::unlock() notices there are no more QC users and a
   pending DISABLE_REQUEST so it disables the QC and frees all the
   memory, including the new block that was just allocated
5. Query_cache::write_result_data() proceeds to write into the freed block

Fix: change m_cache_status under a mutex.
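
The principle of the fix can be sketched as follows. This is a simplified toy, not the Query_cache class: `ToyCache`, `CacheStatus`, and the method names are hypothetical, and memory management is left out. Every read and write of the status happens while holding the same mutex, so a disable request and the last user's release cannot interleave.

```cpp
#include <cassert>
#include <mutex>

enum class CacheStatus { OK, DISABLE_REQUEST, DISABLED };

class ToyCache
{
  std::mutex guard;
  CacheStatus status= CacheStatus::OK;
  unsigned users= 0;

public:
  // A writer registers itself; fails if the cache is being disabled.
  bool begin_use()
  {
    std::lock_guard<std::mutex> lk(guard);
    if (status != CacheStatus::OK)
      return false;
    users++;
    return true;
  }

  // The last user honours a pending disable request.
  void end_use()
  {
    std::lock_guard<std::mutex> lk(guard);
    assert(users > 0);
    if (--users == 0 && status == CacheStatus::DISABLE_REQUEST)
      status= CacheStatus::DISABLED;
  }

  // Disables immediately when idle, otherwise leaves a request behind.
  void request_disable()
  {
    std::lock_guard<std::mutex> lk(guard);
    if (status == CacheStatus::DISABLED)
      return;
    status= users ? CacheStatus::DISABLE_REQUEST : CacheStatus::DISABLED;
  }

  CacheStatus current()
  {
    std::lock_guard<std::mutex> lk(guard);
    return status;
  }
};
```

In this toy, a writer that is still between begin_use() and end_use() keeps the cache from reaching DISABLED, so nothing it has allocated would be freed underneath it.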

Approved by Oleksandr Byelkin <sanja@mariadb.com>

4980fcb9

MDEV-18898 SELECT using wrong index when using operator IN with mixed types · d4936c8b

Alexander Barkov authored Apr 09, 2024

These patches:

  # commit 74891ed2
  #
  #  MDEV-11514, MDEV-11497, MDEV-11554, MDEV-11555 - IN and CASE type aggregation problems

  # commit 53499cd1
  #
  # MDEV-31303 Key not used when IN clause has both signed and unsigned values

earlier fixed MDEV-18898.

Adding only an MTR case.

	modified:   mysql-test/main/func_in.result
	modified:   mysql-test/main/func_in.test

d4936c8b

MDEV-33828 : Transactional commit not supported by involved engine(s) · 7aa86eb1

Jan Lindström authored Apr 04, 2024

The problem was a too-tight condition in ha_commit_trans that did not
allow non-transactional storage engines to participate in 2pc
in the Galera case. This is required because a transaction
using e.g. procedures might read the mysql.proc table inside
a transaction, and these tables currently use the Aria
storage engine, which does not support 2pc.

Fixed by allowing read-only transactions on storage
engines that do not support two-phase commit to participate in a
2pc transaction. These will be committed later separately.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

7aa86eb1