Commits · 76f6b6d818dda85b01a4cb0a62e45a341329dc5e · nexedi / MariaDB

26 Aug, 2024 3 commits

MDEV-34515: Reduce context switching in purge · 76f6b6d8

Marko Mäkelä authored Aug 26, 2024

Before this patch, the InnoDB purge coordinator task submitted
innodb_purge_threads-1 tasks even if there was not sufficient amount
of work for all of them. For example, if there are undo log records
only for 1 table, only 1 task can be employed, and that task had better
be the purge coordinator.

srv_purge_worker_task_low(): Split from purge_worker_callback().

trx_purge_attach_undo_recs(): Remove the parameter n_purge_threads,
and add the parameter n_work_items, to keep track of the amount of
work.

trx_purge(): Launch purge worker tasks only if necessary. The work of
one thread will be executed by this purge coordinator thread.

que_fork_scheduler_round_robin(): Merged to trx_purge().

Thanks to Vladislav Vaintroub for supplying a prototype of this.

Reviewed by: Debarun Banerjee

76f6b6d8

MDEV-34515: Contention between purge and workload · b7b9f3ce

Marko Mäkelä authored Aug 26, 2024

In a Sysbench oltp_update_index workload that involves 1 table,
a serious contention between the workload and the purge of history
was observed. This was the worst when the table contained only 1 record.

This turned out to be fixed by setting innodb_purge_batch_size=128,
which corresponds to the number of usable persistent rollback segments.
When we go above that, there would be contention between row_purge_poss_sec()
and the workload, typically on the clustered index page latch, sometimes
also on a secondary index page latch. It might be that with smaller
batches, trx_sys.history_size() will end up pausing all concurrent
transaction start/commit frequently enough so that purge will be able
to make some progress, so that there would be less contention on the
index page latches between purge and SQL execution.

In commit aa719b50 (part of MDEV-32050)
the interpretation of the parameter innodb_purge_batch_size was slightly
changed. It would correspond to the maximum desired size of the
purge_sys.pages cache. Before that change, the parameter was referring to
a number of undo log pages, but the accounting might have been inaccurate.

To avoid a regression, we will reduce the default value to
innodb_purge_batch_size=127, which will also be compatible with
innodb_undo_tablespaces>1 (which will disable rollback segment 0).

Additionally, some logic in the purge and MVCC checks is simplified.
The purge tasks will make use of purge_sys.pages when accessing undo
log pages to find out if a secondary index record can be removed.
If an undo page needs to be looked up in buf_pool.page_hash, we will
merely buffer-fix it. This is correct, because the undo pages are
append-only in nature. Holding purge_sys.latch or purge_sys.end_latch
or the fact that the current thread is executing as a part of an
in-progress purge batch will prevent the contents of the undo page from
being freed and subsequently reused. The buffer-fix will prevent the
page from being evicted form the buffer pool. Thanks to this logic,
we can refer to the undo log record directly in the buffer pool page
and avoid copying the record.

buf_pool_t::page_fix(): Look up and buffer-fix a page. This is useful
for accessing undo log pages, which are append-only by nature.
There will be no need to deal with change buffer or ROW_FORMAT=COMPRESSED
in that case.

purge_sys_t::view_guard::view_guard(): Allow the type of guard to be
acquired: end_latch, latch, or no latch (in case we are a purge thread).

purge_sys_t::view_guard::get(): Read-only accessor to purge_sys.pages.

purge_sys_t::get_page(): Invoke buf_pool_t::page_fix().

row_vers_old_has_index_entry(): Replaced with row_purge_is_unsafe()
and row_undo_mod_sec_unsafe().

trx_undo_get_undo_rec(): Merged to trx_undo_prev_version_build().

row_purge_poss_sec(): Add the parameter mtr and remove redundant
or unused parameters sec_pcur, sec_mtr, is_tree. We will use the
caller's mtr object but release any acquired page latches before
returning.

btr_cur_get_page(), page_cur_get_page(): Do not invoke page_align().

row_purge_remove_sec_if_poss_leaf(): Return the value of PAGE_MAX_TRX_ID
to be checked against the page in row_purge_remove_sec_if_poss_tree().
If the secondary index page was not changed meanwhile, it will be
unnecessary to invoke row_purge_poss_sec() again.

trx_undo_prev_version_build(): Access any undo log pages using
the caller's mini-transaction object.

row_purge_vc_matches_cluster(): Moved to the only compilation unit that
needs it.

Reviewed by: Debarun Banerjee

b7b9f3ce

MDEV-34520 purge_sys_t::wait_FTS sleeps 10ms, even if it does not have to · d58734d7

Marko Mäkelä authored Aug 26, 2024

There were two separate Atomic_counter<uint32_t>, purge_sys.m_SYS_paused
and purge_sys.m_FTS_paused. In purge_sys.wait_FTS() we have to read both
atomically. We used to use an overkill solution for this, acquiring
purge_sys.latch and waiting 10 milliseconds between samples. To make
matters worse, the 10-millisecond wait was unconditional, which would
unnecessarily suspend the purge_coordinator_task every now and then.

It turns out that we can fold both "reference counts" into a single
Atomic_relaxed<uint32_t> and avoid the purge_sys.latch.
To assess whether std::memory_order_relaxed is acceptable, we should
consider the operations that read these "reference counts", that is,
purge_sys_t::wait_FTS(bool) and purge_sys_t::must_wait_FTS().

Outside debug assertions, purge_sys.must_wait_FTS() is only invoked in
trx_purge_table_acquire(), which is covered by a shared dict_sys.latch.
We would increment the counter as part of a DDL operation, but before
acquiring an exclusive dict_sys.latch. So, a
purge_sys_t::close_and_reopen() loop could be triggered slightly
prematurely, before a problematic DDL operation is actually executed.
Decrementing the counter is less of an issue; purge_sys.resume_FTS()
or purge_sys.resume_SYS() would mostly be invoked while holding an
exclusive dict_sys.latch; ha_innobase::delete_table() does it outside
that critical section. Still, this would only cause some extra wait in
the purge_coordinator_task, just like at the start of a DDL operation.

There are two calls to purge_sys_t::wait_FTS(bool): in the above mentioned
purge_sys_t::close_and_reopen() and in purge_sys_t::clone_oldest_view(),
both invoked by the purge_coordinator_task. There is also a
purge_sys.clone_oldest_view<true>() call at startup when no DDL operation
can be in progress.

purge_sys_t::m_SYS_paused: Merged into m_FTS_paused, using a new
multiplier PAUSED_SYS = 65536.

purge_sys_t::wait_FTS(): Remove an unnecessary sleep as well as the
access to purge_sys.latch. It suffices to poll purge_sys.m_FTS_paused.

purge_sys_t::stop_FTS(): Do not acquire purge_sys.latch.

Reviewed by: Debarun Banerjee

d58734d7

25 Aug, 2024 1 commit

Trivial fix: Make test_if_cheaper_ordering() use actual_rec_per_key() · 9020baf1

Sergei Petrunia authored Aug 24, 2024

Discovered this while working on MDEV-34720: test_if_cheaper_ordering()
uses rec_per_key, while the original estimate for the access method
is produced in best_access_path() by using actual_rec_per_key().

Make test_if_cheaper_ordering() also use actual_rec_per_key().
Also make several getter function "const" to make this compile.
Also adjusted the testcase to handle this (the change backported from
11.0)

9020baf1

23 Aug, 2024 1 commit

MDEV-34759: buf_page_get_low() is unnecessarily acquiring exclusive latch · 9db2b327

Marko Mäkelä authored Aug 23, 2024

buf_page_ibuf_merge_try(): A new, separate function for invoking
ibuf_merge_or_delete_for_page() when needed. Use the already requested
page latch for determining if the call is necessary. If it is and
if we are currently holding rw_latch==RW_S_LATCH, upgrading to an exclusive
latch may involve waiting that another thread acquires and releases
a U or X latch on the page. If we have to wait, we must recheck if the
call to ibuf_merge_or_delete_for_page() is still needed. If the page
turns out to be corrupted, we will release and fail the operation.
Finally, the exclusive page latch will be downgraded to the originally
requested latch.

ssux_lock_impl::rd_u_upgrade_try(): Attempt to upgrade a shared lock to
an update lock.

sux_lock::s_x_upgrade_try(): Attempt to upgrade a shared lock to
exclusive.

sux_lock::s_x_upgrade(): Upgrade a shared lock to exclusive.
Return whether a wait was elided.

ssux_lock_impl::u_rd_downgrade(), sux_lock::u_s_downgrade():
Downgrade an update lock to shared.

9db2b327

22 Aug, 2024 1 commit

MDEV-34799: "Could not write packet" err message args off by 1 · 3e5e97b2

Brandon Nesterenko authored Aug 22, 2024

MDEV-33582 (3541bd63) changed the "Could not write packet"
error message in net_serv.cc to use the function
sql_print_warning(), instead of my_printf_error(). The flags
argument was not removed in this change though, so the old
flags were printed in place of the file descriptor, and all
other args are presenting for the wrong field (and length is
never showed).

This patch removes flags as a parameter to sql_print_warning().

3e5e97b2

20 Aug, 2024 1 commit
- Merge branch '10.5' into 10.6 · fc5772ce
  Oleksandr Byelkin authored Aug 20, 2024
  
  fc5772ce
19 Aug, 2024 5 commits

MDEV-34776 Assertion failure in Item_string::do_build_clone · ae02999c
Oleksandr Byelkin authored Aug 19, 2024
```
Added missed methods to Item_string child.
```
ae02999c
MDEV-34771 Types mismatch when cloning items causes debug assertion · fccfdc28
Oleksandr Byelkin authored Aug 19, 2024
```
Missing methods added to Item_bin_string
```
fccfdc28

Sort result from table_statistics and index_statistics · db8ab4ac

Monty authored Aug 19, 2024

This is needed as the order of rows are not deterministic,
especially in future versions of table statistics.

db8ab4ac

Revert "mtr: remove not_valgrind_build" · e51d55a6

Monty authored Aug 19, 2024

The original code is correct.

valgrind and asan binaries should be built with a specialiced version of
mem_root that makes it easier to find memory overwrites.
This is what the BUILD scripts is doing.

The specialiced mem_root code allocates a new block for every allocation
which is visiable for any test that depenmds on the default original malloc
size and usage.

e51d55a6

MDEV-34718: Trigger doesn't work correctly with bulk update · ba5482ff

Dmitry Shulga authored Aug 16, 2024

Running an UPDATE statement in PS mode and having positional
parameter(s) bound with an array of actual values (that is
prepared to be run in bulk mode) results in incorrect behaviour
in presence of on update trigger that also executes an UPDATE
statement. The same is true for handling a DELETE statement in
presence of on delete trigger. Typically, the visible effect of
such incorrect behaviour is expressed in a wrong number of
updated/deleted rows of a target table. Additionally, in case UPDATE
statement, a number of modified rows and a state message returned
by a statement contains wrong information about a number of modified rows.

The reason for incorrect number of updated/deleted rows is that
a data structure used for binding positional argument with its
actual values is stored in THD (this is thd->bulk_param) and reused
on processing every INSERT/UPDATE/DELETE statement. It leads to
consuming actual values bound with top-level UPDATE/DELETE statement
by other DML statements used by triggers' body.

To fix the issue, reset the thd->bulk_param temporary to the value
nullptr before invoking triggers and restore its value on finishing
its execution.

The second part of the problem relating with wrong value of affected
rows reported by Connector/C API is caused by the fact that diagnostics
area is reused by an original DML statement and a statement invoked
by a trigger. This fact should be take into account on finalizing a
state of diagnostics area on completion running of a statement.

Important remark: in case the macros DBUG_OFF is on, call of the method
  Diagnostics_area::reset_diagnostics_area()
results in reset of the data members
  m_affected_rows, m_statement_warn_count.
Values of these data members of the class Diagnostics_area are used on
sending OK and EOF messages. In case DML statement is executed in PS bulk
mode such resetting results in sending wrong result values to a client
for affected rows in case the DML statement fires a triggers. So, reset
these data members only in case the current statement being processed
is not run in bulk mode.

ba5482ff

15 Aug, 2024 3 commits
- Fix typo in xtrabackup --help output · f41a1202
  Tim van Dijen authored Aug 13, 2024
  
  f41a1202
- Merge 10.5 into 10.6 · 267c0fce
  Marko Mäkelä authored Aug 15, 2024
  
  267c0fce
- Fix clang++-19 -Wunused-but-set-variable · e40dfcdd
  Marko Mäkelä authored Aug 15, 2024
  
  e40dfcdd
14 Aug, 2024 3 commits

Merge 10.5 into 10.6 · 757c3681
Marko Mäkelä authored Aug 14, 2024

757c3681

MDEV-34755 g++- -Wstringop-truncation due to safe_strcpy() · ecd910ae

Marko Mäkelä authored Aug 14, 2024

The #pragma that was removed in
commit e255837e (MDEV-34266)
turns out to be necessary for silencing all cases of
-Wstringop-truncation.

ecd910ae

MDEV-34678 pthread_mutex_init() without pthread_mutex_destroy() · 4f8803c0

Marko Mäkelä authored Aug 14, 2024

When SUX_LOCK_GENERIC is defined, the srw_mutex, srw_lock, sux_lock are
implemented based on pthread_mutex_t and pthread_cond_t.  This is the
only option for systems that lack a futex-like system call.

In the SUX_LOCK_GENERIC mode, if pthread_mutex_init() is allocating
some resources that need to be freed by pthread_mutex_destroy(),
a memory leak could occur when we are repeatedly invoking
pthread_mutex_init() without a pthread_mutex_destroy() in between.

pthread_mutex_wrapper::initialized: A debug field to track whether
pthread_mutex_init() has been invoked.  This also helps find bugs
like the one that was fixed by
commit 1c8af2ae (MDEV-34422);
one simply needs to add -DSUX_LOCK_GENERIC to the CMAKE_CXX_FLAGS
to catch that particular bug on the initial server bootstrap.

buf_block_init(), buf_page_init_for_read(): Invoke block_lock::init()
because buf_page_t::init() will no longer do that.

buf_page_t::init(): Instead of invoking lock.init(), assert that it
has already been invoked (the lock is vacant).

add_fts_index(), build_fts_hidden_table(): Explicitly invoke
index_lock::init() in order to avoid a pthread_mutex_destroy()
invocation on an uninitialized object.

srw_lock_debug::destroy(): Invoke readers_lock.destroy().

trx_sys_t::create(): Invoke trx_rseg_t::init() on all rollback segments
in order to guarantee a deterministic state for shutdown, even if
InnoDB fails to start up.

trx_rseg_array_init(), trx_temp_rseg_create(), trx_rseg_create():
Invoke trx_rseg_t::destroy() before trx_rseg_t::init() in order to
balance pthread_mutex_init() and pthread_mutex_destroy() calls.

4f8803c0

13 Aug, 2024 1 commit

MDEV-14231 MATCH() AGAINST( IN BOOLEAN MODE), results mismatch · b304ec30

Thirunarayanan Balathandayuthapani authored Aug 13, 2024

- Added plugin_debug.test, multiple_index.test to innodb_fts suite
from mysql-5.7.

- commit c5b28e55 removes the warning
for InnoDB rebuilding table to add FTS_DOC_ID

- multiple_index test case  has MATCH(a) values are smaller
than in MySQL because ROLLBACK updates the stat_n_rows.

- st_mysql_ftparser_boolean_info structure conveys boolean
metadata to mysql search engine for every word in the query.
This structure misses the position value to store the correct
offset of every word. So phrase search queries in plugin_debug
test case with boolean mode for simple parser throws
wrong result.

b304ec30

12 Aug, 2024 4 commits

MDEV-30686: Endless loop when trying to establish connection · 2c5d8376

Julius Goryavsky authored Aug 07, 2024

With wsrep_sst_rsync, node goes into endless loop when trying
to establish connection to donor for IST/SST if the database
is bind on specific IP address, not the "*".

This commit fixes this problem. Separate tests are not
required - the problem can occur in normal configurations
on a number of systems when selecting a bing address other
than "*", especially on FreeBSD and with the IPv6 addresses.

2c5d8376

MDEV-34594 : Assertion `client_state.transaction().active()' failed in · cd8b8bb9

Jan Lindström authored Jul 18, 2024

int wsrep_thd_append_key(THD*, const wsrep_key*, int, Wsrep_service_key_type)

CREATE TABLE [SELECT|REPLACE SELECT] is CTAS and idea was that
we force ROW format. However, it was not correctly enforced
and keys were appended before wsrep transaction was started.

At THD::decide_logging_format we should force used stmt binlog
format to ROW in CTAS case and produce a warning if used
binlog format was not ROW.

At ha_innodb::update_row we should not append keys similarly
as in ha_innodb::write_row if sql_command is SQLCOM_CREATE_TABLE.
Improved error logging on ::write_row, ::update_row and ::delete_row
if wsrep key append fails.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

cd8b8bb9

MDEV-34376 Wrong data types when mixing an utf8 *TEXT column and a short binary · 0e273510

Alexander Barkov authored Jun 12, 2024

A mixture of a multi-byte *TEXT column and a short binary column
produced a too large column.
For example, COALESCE(tinytext_utf8mb4, short_varbinary)
produced a BLOB column instead of an expected TINYBLOB.

- Adding a virtual method Type_all_attributes::character_octet_length(),
  returning max_length by default.
- Overriding Item_field::character_octet_length() to extract
  the octet length from the underlying Field.
- Overriding Item_ref::character_octet_length() to extract
  the octet length from the references Item (e.g. as VIEW fields).
- Fixing Type_numeric_attributes::find_max_octet_length() to
  take the octet length using the new method character_octet_length()
  instead of accessing max_length directly.

0e273510

Update sponsors · c83ba513
Ian Gilfillan authored Aug 06, 2024

c83ba513

09 Aug, 2024 2 commits
- Merge branch '10.6' into mariadb-10.6.19 · 2e580dc2
  Oleksandr Byelkin authored Aug 09, 2024
  
  2e580dc2
- Merge branch '10.5' into mariadb-10.5.26 · 662bb507
  Oleksandr Byelkin authored Aug 09, 2024
  
  662bb507
08 Aug, 2024 7 commits

MDEV-34714 perror-win test failure on localized Windows · e997bf58
Vladislav Vaintroub authored Aug 06, 2024
```
Let perror produce OS error messages in English
```
e997bf58

Fix server on windows, so it does not write to error log byte-by-byte · 4a67bd51

Vladislav Vaintroub authored Aug 08, 2024


fprintf() on Windows, when used on unbuffered FILE*, writes bytewise.
This can make crash handler messages harder to read, if they are mixed up
with other error log output.

Fixed , on Windows, by using a small buffer for formatting, and fwrite
instead of fprintf, if buffer is large enough for message.

4a67bd51

OpenSSL - set all heap functions in CRYPTO_set_mem_functions. · 62fd7b4c

Vladislav Vaintroub authored Jul 29, 2024

The reason is that on Windows, OpenSSL can be built with different C runtime
than the server (e.g Debug runtime in debug vcpkg build).

Overwriting only malloc(), with CRT that server is using can cause
mixup of incompatible malloc() and free() inside openssl.

To fix, overwrite all memory functions.

62fd7b4c

Support -DCONC_WITH_SSL parameter passed to CMake · b619be35

Vladislav Vaintroub authored Jul 29, 2024

Usually it is not needed, but sometimes we need to testing interoperability,
e.g OpenSSL server / SCHANNEL client, or WolfSSL server / OpenSSL client.

b619be35

Fix incorrect setting of opt_local_file in mysqlimport, for named pipe · d1713666

Vladislav Vaintroub authored Jul 19, 2024

For named pipe, server and client are on the same machine, and
opt_local_infile just adds unnecessary copying via "load data local infile"

d1713666

bump the VERSION · f15de425
Daniel Bartholomew authored Aug 08, 2024

f15de425
bump the VERSION · 56fb04aa
Daniel Bartholomew authored Aug 08, 2024

56fb04aa

07 Aug, 2024 3 commits

MDEV-34632 Assertion failed in handler::assert_icp_limitations · 25e2d0a6

Nikita Malyavin authored Aug 06, 2024

Assertion `table->field[0]->ptr >= table->record[0] &&
table->field[0]->ptr <= table->record[0] + table->s->reclength' failed in
handler::assert_icp_limitations.

table->move_fields has some limitations:
1. It cannot be used in cascade
2. It should always have a restoring pair.

Rule 1 is covered by assertions in handler::assert_icp_limitations
and handler::ptr_in_record (commit 30894fe9).

Rule 2 should be manually maintained with care. Hopefully, the rule 1 assertions
may sometimes help as well.

In ha_myisam::repair, both rules are broken. table->move_fields is used
asymmetrically there: it is set on every param->fix_record call
(i.e. in compute_vcols) but is restored only once, in the end of repair.

The reason to updating field ptr's for every call is that compute_vcols can
(supposedly) be called in parallel, that is, with the same table, but different
records.

The condition to "unmove" the pointers in ha_myisam::restore_vcos_after_repair
is incorrect, when stored vcols are available, and myisam stores a VIRTUAL field
if it's the only field in the table (the record cannot be of zero length).

This patch solves the problem by "unmoving" the pointers symmetrically, in
compute_vcols. That is, both rules will be preserved maintained.

25e2d0a6

MDEV-34682 Return the return value of ddl recovery done in ha_initialize_handlerton · fa8ce92c
Yuchen Pei authored Aug 02, 2024
```
Otherwise it could cause false negative when ddl recovery done is part
of the plugin initialization
```
fa8ce92c
MDEV-34682 Reset spider_hton_ptr in error mode of spider_db_init() · bce3f3f6
Yuchen Pei authored Aug 02, 2024

bce3f3f6

04 Aug, 2024 5 commits

MDEV-34640 : galera_var_ignore_apply_errors test freezes · 8b51d344

Jan Lindström authored Jul 23, 2024

Test improvements only to make test more robust.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

8b51d344

MDEV-25614 : Galera test failure on GCF-354 · 71f289e5

Jan Lindström authored Jul 30, 2024

Modified node config with longer timeouts for suspect,
inactive, install and wait_prim timeout. Increased
node_1 weight to keep it primary component when
other nodes are voted out.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

71f289e5

MDEV-34647 : 'INSERT...SELECT' on MyISAM table suddenly replicated by Galera · eb30a9d6

Jan Lindström authored Jul 26, 2024

Replication of MyISAM and Aria DML is experimental and best
effort only. Earlier change make INSERT SELECT on both
MyISAM and Aria to replicate using TOI and STATEMENT
replication. Replication should happen only if user
has set needed wsrep_mode setting.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

eb30a9d6

MDEV-32778 : galera_ssl_reload failed with warning message · cb80ef93

Jan Lindström authored Jul 31, 2024

Fixed used configuration and added suppression for warning
message. Test case changes only.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

cb80ef93

MDEV-32782 : galera_sst_mysqldump_with_key test failed · 0ba6068a

Jan Lindström authored Aug 01, 2024

Modified test configuration file to use wsrep_sync_wait
to make sure committed transactions are replicated before
next operation.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

0ba6068a