Commits · knielsen_xa_sched_minimal_fix · nexedi / MariaDB

06 Apr, 2024 1 commit

MDEV-33668: More efficient XA dependency tracking in SQL driver thread · 6253f0dc

Kristian Nielsen authored Apr 05, 2024

Avoid a linear scan of all recently queued XIDs in the SQL driver thread,
which can be expensive in XA-heavy workloads with a large number of parallel
replication worker threads.

Instead, keep a hash in rpl_parallel_entry that records where recently queued
XIDs were scheduled. This allows direct lookup of any potential scheduling
dependency.

Keep a list of recently queued XIDs in each scheduling bucket, and purge the
list (based on generations) when queueing the next XA event group.

Also implement a more fine-grained dependency check based on sub_id
comparison. This can sometimes avoid a scheduling dependency that would
otherwise look necessary based solely on the generation check.
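The hash-plus-purge idea can be illustrated with a small sketch. It is not the code from this commit: the class `XidScheduleTracker`, its methods, and the notion of an `oldest_active_gen` supplied by the caller are all hypothetical. The assumption is that anything queued in a generation older than `oldest_active_gen` is known to have finished, so it no longer constrains scheduling. The sub_id refinement mentioned above is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch (not MariaDB's rpl_parallel code): remember where
// recently queued XIDs were scheduled, so a later XA event group with the
// same XID is found by one hash lookup instead of a linear scan.
// Assumption: the caller keeps a monotonically increasing generation
// counter, and everything older than oldest_active_gen has finished.
class XidScheduleTracker {
public:
  explicit XidScheduleTracker(uint32_t n_buckets) : bucket_xids_(n_buckets) {
    assert(n_buckets > 0);
  }

  // Bucket the XID must be queued on, or -1 if it has no possibly active
  // predecessor and may be scheduled freely.
  int lookup(const std::string &xid, uint64_t oldest_active_gen) const {
    auto it = xid_hash_.find(xid);
    if (it == xid_hash_.end() || it->second.generation < oldest_active_gen)
      return -1;
    return static_cast<int>(it->second.bucket);
  }

  // Record that xid was queued on bucket in generation gen.
  void record(const std::string &xid, uint32_t bucket, uint64_t gen) {
    assert(bucket < bucket_xids_.size());
    xid_hash_[xid] = Entry{bucket, gen};
    bucket_xids_[bucket].push_back(xid);
  }

  // Drop this bucket's entries that are older than oldest_active_gen,
  // e.g. when the next XA event group is queued on the bucket.
  void purge(uint32_t bucket, uint64_t oldest_active_gen) {
    assert(bucket < bucket_xids_.size());
    std::vector<std::string> kept;
    for (const std::string &xid : bucket_xids_[bucket]) {
      auto it = xid_hash_.find(xid);
      if (it == xid_hash_.end() || it->second.bucket != bucket)
        continue;  // already purged, or re-recorded on another bucket
      if (it->second.generation < oldest_active_gen)
        xid_hash_.erase(it);
      else
        kept.push_back(xid);
    }
    bucket_xids_[bucket].swap(kept);
  }

  std::size_t size() const { return xid_hash_.size(); }

private:
  struct Entry {
    uint32_t bucket;
    uint64_t generation;
  };
  std::unordered_map<std::string, Entry> xid_hash_;     // XID -> where queued
  std::vector<std::vector<std::string>> bucket_xids_;   // per-bucket XIDs
};
```

Keeping a per-bucket list means a purge only touches the XIDs of one bucket, so the cost of cleanup stays proportional to the entries actually being expired.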
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

6253f0dc

05 Apr, 2024 4 commits

MDEV-33668: Simpler and better accounting of scheduling generations · 50472b2b

Kristian Nielsen authored Apr 04, 2024

We can use the first-in-first-out property of the scheduling to know
exactly when the next generation starts, no need to approximate it
conservatively with clener index comparison.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

50472b2b

MDEV-33668: More precise dependency tracking of XA XID in parallel replication · 7418f47b

Kristian Nielsen authored Feb 27, 2024

Keep track of each recently active XID, recording which worker it was queued
on. If an XID might still be active, queue event groups that refer to the same
XID on that same worker to avoid conflicts.

Otherwise, schedule the XID freely in the next round-robin slot.

This way, XA PREPARE can normally be scheduled without restrictions (unless
duplicate XID transactions come close together). This improves scheduling
and parallelism over the old method, where the worker thread to schedule XA
PREPARE on was fixed based on a hash value of the XID.

XA COMMIT will normally be scheduled on the same worker as XA PREPARE, but
can be a different one if the XA PREPARE is far back in the event history.

Test case and code for trimming the dynamic array are due to Andrei.
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

7418f47b

MDEV-33668: Refactor parallel replication round-robin scheduling to use explicit FIFO · 4c95985d

Kristian Nielsen authored Feb 27, 2024

This is a preparatory patch to facilitate the next commit to improve
the scheduling of XA transactions in parallel replication.

When choosing the scheduling bucket for the next event group in
rpl_parallel_entry::choose_thread(), use an explicit FIFO for the
round-robin selection instead of a simple cyclic counter i := (i+1) % N.

This makes it possible to schedule XA COMMIT/ROLLBACK dependencies explicitly
without changing the round-robin scheduling of other event groups.
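A minimal sketch of round-robin selection through an explicit FIFO, as opposed to a cyclic counter. The class name and the policy of moving an explicitly chosen bucket to the tail are assumptions made for illustration, not a description of rpl_parallel_entry::choose_thread() itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical sketch: buckets are kept in a FIFO; the head is the next
// round-robin choice. With no explicit picks this yields 0, 1, ..., N-1, 0, ...
// exactly like i := (i+1) % N.
class RoundRobinFifo {
public:
  explicit RoundRobinFifo(uint32_t n_buckets) {
    assert(n_buckets > 0);
    for (uint32_t i = 0; i < n_buckets; ++i)
      fifo_.push_back(i);
  }

  // Ordinary event group: take the head bucket and move it to the tail.
  uint32_t next() {
    uint32_t b = fifo_.front();
    fifo_.pop_front();
    fifo_.push_back(b);
    return b;
  }

  // Event group that must go to a given bucket (e.g. an XA COMMIT that
  // depends on an earlier XA PREPARE). Assumed policy: move that bucket to
  // the tail, leaving the relative order of the other buckets unchanged.
  uint32_t pick(uint32_t b) {
    auto it = std::find(fifo_.begin(), fifo_.end(), b);
    assert(it != fifo_.end() && "bucket must exist");
    if (it == fifo_.end())
      return b;  // defensive no-op in NDEBUG builds
    fifo_.erase(it);
    fifo_.push_back(b);
    return b;
  }

private:
  std::deque<uint32_t> fifo_;
};
```

With a plain counter, an explicit pick either disturbs the counter or leaves the chosen bucket to be reused again soon; with the FIFO, the explicitly used bucket simply becomes the most recently used one.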
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

4c95985d

MDEV-33668: Testcase with many overlapping XA, to test generation code · 50567f05
Kristian Nielsen authored Apr 05, 2024
```
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
```
50567f05

14 Mar, 2024 8 commits
- Merge branch '10.5' into 10.6 · 55cea0c2
  Sergei Golubchik authored Mar 14, 2024
  
  55cea0c2
- fix galera tests after 9a132d42 · bb451d2c
  Sergei Golubchik authored Mar 14, 2024
  
  bb451d2c
- after merge fix · dbd36bb1
  Sergei Golubchik authored Mar 14, 2024
  
  dbd36bb1
- cmake: append to the array correctly · 51e3f1da
  Sergei Golubchik authored Mar 14, 2024
```
it was generating broken spec files
```
  51e3f1da
- MDEV-33665: MSAN failure due to uninitialized Item_func::not_null_tables_cache · 9d5a8bd6
  Sergei Petrunia authored Mar 13, 2024
```
eliminate_item_equal() uses quick_fix_field() for Item objects it creates.
It computes some of their attributes on its own (see update_used_tables()
call) but it doesn't update not_null_tables_cache.

Recompute not_null_tables_cache as well. Not computing it is currently
harmless, except that it produces an MSAN error when some other code
propagates the wrong value of not_null_tables_cache to another item.
```
  9d5a8bd6
- build failure with cmake < 3.10 · 49cf702e
  Sergei Golubchik authored Mar 14, 2024
```
cmake bug #14362
```
  49cf702e
- update s3.partition result after 57ffcd68 · 7eb6d5aa
  Sergei Golubchik authored Mar 14, 2024
  
  7eb6d5aa
- MDEV-33635 innodb.innodb-64k-crash - Found warnings/errors in server log file · 967a1489
  Thirunarayanan Balathandayuthapani authored Mar 13, 2024
```
- Suppress the "Difficult to find free blocks" warning
globally to avoid many different test cases failing.

- Demote the error message in validate_first_page() to a note,
so that the first page can be recovered from the doublewrite buffer,
and an error is reported only if the page is not found in the doublewrite buffer.
```
  967a1489
13 Mar, 2024 10 commits
- perfschema: LOCK_all_status_vars not LOCK_status · 61f6dc5e
  Sergei Golubchik authored Dec 18, 2023
```
to iterate over all status variables one should use
LOCK_all_status_vars not LOCK_status

this fixes sporadic mutex lock inversion in plugins.password_reuse_check:
* acl_cache->lock is taken over complex operations that might increment
  status counters (under LOCK_status).
* acl_cache->lock is needed to get the values of Acl% status variables
  when iterating over status variables
```
  61f6dc5e
- Merge branch '10.5' into 10.6 · f71d7f2f
  Sergei Golubchik authored Mar 13, 2024
  
  f71d7f2f
- MDEV-33313 Incorrect error message for "ALTER TABLE ... DROP CONSTRAINT ..., DROP col, DROP col" · 0e8cda61
  Sergei Golubchik authored Jan 25, 2024
  
  0e8cda61
- remove `exit 1` from search_pattern_in_file.inc · 67115405
  Sergei Golubchik authored Jan 25, 2024
```
it broke tests on Windows. Use SEARCH_ABORT instead.
also, remove redundant features and simplify
```
  67115405
- cleanup: remove SEARCH_TYPE from search_pattern_in_file.inc · bc46f1a7
  Sergei Golubchik authored Jan 25, 2024
  
  bc46f1a7
- cleanup: reduce code duplication · 424210ab
  Sergei Golubchik authored Jan 11, 2024
  
  424210ab
- Merge branch '10.4' into 10.5 · 4cda50af
  Sergei Golubchik authored Mar 13, 2024
  
  4cda50af
- MDEV-33344 REGEXP empty string inconsistent · 62a9a54a
  Sergei Golubchik authored Feb 01, 2024
  
  62a9a54a
- MDEV-33318 ORDER BY COLLATE improperly applied to non-character columns · 7828aadd
  Sergei Golubchik authored Jan 26, 2024
```
when changing charset from latin1 to utf8, adjust max_length accordingly
```
  7828aadd
- MDEV-33549: Incorrect handling of UPDATE in PS mode in case a table's column declared as NOT NULL · ac20edd7
  Dmitry Shulga authored Mar 13, 2024
```
Follow-up to fix compiler warnings caused by the presence of
the override specifier in the declaration of the method Item_param::cleanup
```
  ac20edd7
12 Mar, 2024 4 commits

MDEV-33622 Server crashes when the UPDATE statement (which has duplicate key)... · cfa8268e

Monty authored Mar 12, 2024

MDEV-33622 Server crashes when the UPDATE statement (which has duplicate key) is run after setting a low thread_stack

This was caused by a wrong allocation of a variable on the stack
(4K of data was allocated instead of 512 bytes).

No test case, as the original MDEV test case is not usable for mtr.

cfa8268e

MDEV-33549: Incorrect handling of UPDATE in PS mode in case a table's column declared as NOT NULL · 428a6731

Dmitry Shulga authored Mar 12, 2024

An UPDATE statement that is run in PS mode and uses a positional parameter
handles columns declared with the clause DEFAULT NULL incorrectly when
the keyword DEFAULT is passed as the actual value of the positional
parameter of the prepared statement. A similar issue occurs when
an expression is specified in the DEFAULT clause of a table's column definition.

The reason for the incorrect processing of columns declared as DEFAULT NULL
is that setting the null flag of the field being updated was missing
from the implementation of the method Item_param::assign_default().
The incorrect handling of an expression in the DEFAULT clause is
likewise caused by a missing save of the field inside the implementation
of the method Item_param::assign_default().

428a6731

MDEV-24167 fixup: Stricter assertion · 4ac8c4c8

Marko Mäkelä authored Mar 12, 2024

log_free_check(): Assert that the current thread is not holding
lock_sys.latch in any mode.

This fixes up commit 5f2dcd11

4ac8c4c8

Merge 10.5 into 10.6 · c3a00dfa
Marko Mäkelä authored Mar 12, 2024

c3a00dfa

11 Mar, 2024 5 commits
- MDEV-33642: MemorySanitizer: SEGV on unknown address on shutdown · 0a9cec22
  Marko Mäkelä authored Mar 11, 2024
```
signal_hand(): Remove the cmake -DWITH_DBUG_TRACE=ON instrumentation.
It can cause a crash on shutdown when the only other thread is
waiting in wait_for_signal_thread_to_end().
```
  0a9cec22
- MDEV-33593 Auto increment deadlock error causes ASSERT in subsequent save point · 67abdb9f
  mariadb-DebarunBanerjee authored Mar 11, 2024
```
innodb.autoinc_debug: Correct the test case for predictable deadlock.
```
  67abdb9f
- Merge 10.4 into 10.5 · f703e72b
  Marko Mäkelä authored Mar 11, 2024
  
  f703e72b
- MDEV-33209 Stack overflow in main.json_debug_nonembedded due to incorrect debug injection · 09ea2dc7
  Marko Mäkelä authored Mar 11, 2024
```
In the JSON functions, the debug injection for stack overflows is
inaccurate and may cause actual stack overflows. Let us simply
inject stack overflow errors without actually relying on the ability
of check_stack_overrun() to do so.

Reviewed by: Rucha Deodhar
```
  09ea2dc7
- MDEV-14448 fixup: clang -Wunused-function · 015f69a7
  Marko Mäkelä authored Mar 11, 2024
  
  015f69a7
09 Mar, 2024 1 commit
- Suppressed new warning for rpl_get_lock on amd-freebsd and aarch64-macos · b3d507ff
  Monty authored Mar 09, 2024
  
  b3d507ff
08 Mar, 2024 3 commits

MDEV-33540 Avoid writes to TRX_SYS page during mariabackup operations · 648d2da8

Daniele Sciascia authored Mar 07, 2024

Fix a scenario where `mariabackup --prepare` fails with assertion
`!m_modifications || !recv_no_log_write` in `mtr_t::commit()`. This
happens if the prepare step of the backup encounters a data directory
that happens to store the wsrep xid position in the TRX SYS page (this is no
longer the case since 10.3.5). Since MDEV-17458,
`trx_rseg_array_init()` handles this case by copying the xid position
to the rollback segments before clearing the xid from the TRX SYS page.
However, this step should be avoided when `trx_rseg_array_init()` is
invoked from mariabackup. The relevant code is now guarded by the
condition `srv_operation == SRV_OPERATION_NORMAL`. An additional check
ensures that we do not try to copy an xid position that has
already been zeroed.

648d2da8

MDEV-33623 Partitioning is broken on big endian architectures · f838b2d7

Monty authored Mar 08, 2024

MDEV-33502 Slowdown when running nested statement with many partitions
caused this error, as I failed to take big-endian architectures into account.

This patch also introduces bitmap_import() and bitmap_export() to be used
when one wants to store bitmaps in files/logs in a portable way.
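One way a bitmap becomes non-portable is when its in-memory words are copied byte for byte into a file or log: a little-endian and a big-endian host then produce different bytes for the same bitmap. The sketch below shows the general fix of writing each word in a fixed byte order. The function names are hypothetical and the signatures of the real bitmap_import()/bitmap_export() are not reproduced here; the 64-bit word layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not MariaDB's bitmap_export()/bitmap_import().
// Assumption: bit i of the bitmap lives in word i / 64 at position i % 64.
// Each word is written least significant byte first, so the stored bytes
// are the same regardless of the host's endianness.
std::vector<uint8_t> export_bitmap_le(const std::vector<uint64_t> &words) {
  std::vector<uint8_t> out;
  out.reserve(words.size() * 8);
  for (uint64_t w : words)
    for (int byte = 0; byte < 8; ++byte)
      out.push_back(static_cast<uint8_t>(w >> (8 * byte)));
  return out;
}

// Inverse of export_bitmap_le(): rebuild the words from little-endian bytes.
std::vector<uint64_t> import_bitmap_le(const std::vector<uint8_t> &bytes) {
  assert(bytes.size() % 8 == 0);
  std::vector<uint64_t> words(bytes.size() / 8, 0);
  for (std::size_t i = 0; i < bytes.size(); ++i)
    words[i / 8] |= static_cast<uint64_t>(bytes[i]) << (8 * (i % 8));
  return words;
}
```

Because only shifts and masks are used, the same code yields identical bytes on any architecture, and importing the exported bytes restores the original words.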
Reviewed-by: Kristian Nielsen <knielsen@knielsen-hq.org>

f838b2d7

MDEV-33620 Improve times and states in show processlist for replication · 9a132d42

Monty authored Mar 07, 2024

This makes it easier to find out what replication workers are
doing and what they are waiting for.

Things changed in processlist:
- Slave_SQL time was not consistent. Now the time for state "Slave has
  read all relay log; waiting for more updates" shows how long it has
  waited for the next event.
- Slave_worker threads often showed "Closing tables" for a long
  time. Now the state is reverted to the previous state after
  "Closing tables" is done.
- Commit and Rollback states were not shown for replication (and some
  other threads). Now Commit and Rollback states are always shown and
  the state is reverted to the previous state when the Commit/Rollback
  has finished.

Code changes:
- Added thd->set_time_for_next_stage() for parallel replication when
  starting to wait for prior transactions to commit, group commit,
  and FTWRL, and for free space in the thread pool.
  Previously, the time was reset only after the above events.
- Moved THD_STAGE_INFO(stage_rollback) and THD_STAGE_INFO(stage_commit)
  from sql_parse.cc to transaction.cc to ensure this is done for
  all commits and not only 'normal connection queries'.

Test case changes:
- close_thread_tables() reverting the stage to the previous stage caused the
  counter in performance_schema to be increased. In many cases it is
  the 'sql/starting' stage that was affected.
- We only change to the "Commit" stage if there is a need for a commit.
  This caused some "Commit" stages to disappear from perfschema reports.

TODO in 11.#:
- Slave_IO always shows "Waiting for master to send event" and the time is
  from SLAVE START. In 11.# we should change this to be the time since
  reading the last event.

9a132d42

07 Mar, 2024 1 commit

MDEV-33593 Auto increment deadlock error causes ASSERT in subsequent save point · afe96329

mariadb-DebarunBanerjee authored Mar 05, 2024

The issue here is that ha_innobase::get_auto_increment() could cause a
deadlock involving the auto-increment lock and roll back the transaction
implicitly. For such cases, storage engines usually call
thd_mark_transaction_to_rollback() to inform the SQL layer about it, which
in turn takes appropriate actions and closes the transaction. In InnoDB,
we call it while converting the InnoDB error code to a MySQL error code.

However, since ::innobase_get_autoinc() returns void, we skip the call
for error code conversion and also miss marking the transaction for
rollback on a deadlock error. We eventually hit an assertion while releasing
a savepoint, as the transaction state is not active.

Since convert_error_code_to_mysql() performs some generic error
handling, like invoking the callback when needed, we should call
that function in ha_innobase::get_auto_increment() even if we don't
return the resulting MySQL error code.

afe96329

06 Mar, 2024 3 commits

Fixed some mtr results found in Jenkins after MDEV-33582 push · 0df4651c

Monty authored Mar 06, 2024

MDEV-33582 Add more warnings to be able to better diagnose network issues

- Disabled "Semisync ack receiver got hangup" warning
  - One could get this warning from semisync if running
    mtr --mysqld=log-warnings=3 rpl.rpl_semi_sync_shutdown_await_ack
- Fixed result file for engines/funcs/rpl_get_lock.test

0df4651c

MDEV-32445 InnoDB may corrupt its log before upgrading it on startup · 6e5333fc

Thirunarayanan Balathandayuthapani authored Mar 04, 2024

Problem:
========
During upgrade, InnoDB writes redo log records for adjusting
the tablespace size or tablespace flags even before the log
has been upgraded to the configured format. This could lead to data
inconsistency if a crash happens during the upgrade process.

Fix:
===
srv_start(): Write the redo log for the tablespace flags adjustment and
the increased tablespace size only after the redo log has been upgraded.

log_write_low(), log_reserve_and_write_fast(): Check whether
the redo log is in physical format.

6e5333fc

MDEV-32346 Assertion failure sym_node->table != NULL in pars_retrieve_table_def on UPDATE · 738da491
Thirunarayanan Balathandayuthapani authored Mar 05, 2024
```
- During an update operation, InnoDB should avoid initializing
the FTS_DOC_ID of the foreign table if the foreign table is discarded
```
738da491