Commits · fa8a46eb68299776589e769844372813ebb16a99 · nexedi / MariaDB

22 Mar, 2024 1 commit

MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool · fa8a46eb

Marko Mäkelä authored Mar 22, 2024

By design, InnoDB has always hung when permanently running out of
buffer pool, for example when several threads are waiting to allocate
a block, and all of the buffer pool is buffer-fixed by the active threads.

The hang that we are fixing here occurs when the buffer pool is only
temporarily running out and the situation could be rescued by writing out
some dirty pages or evicting some clean pages.

buf_LRU_get_free_block(): Simplify how we wait for
the buf_flush_page_cleaner thread. This fixes occasional hangs
of the test encryption.innochecksum that were introduced by
commit a55b951e (MDEV-26827).
To play it safe, we use a timed wait when waiting for the
buf_flush_page_cleaner() thread to perform its job. Should that
thread get stuck, we will invoke buf_pool.LRU_warn() in order to
display a message that pages could not be freed, and keep trying
to wake up the buf_flush_page_cleaner() thread.
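
The timed wait described above can be sketched roughly as follows. This is a minimal illustration with standard C++ primitives, not InnoDB code: PoolSketch, get_free_block_sketch(), the max_rounds bound and both callbacks are hypothetical names, with warn standing in for buf_pool.LRU_warn() and wake_cleaner for waking up the page cleaner.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>

// Hypothetical stand-in for the buffer pool state; not InnoDB's buf_pool_t.
struct PoolSketch {
  std::mutex mutex;
  std::condition_variable done_free;  // notified when a block has been freed
  std::size_t free_blocks = 0;
};

// Wait for a free block with a timed wait. After every timeout, call warn()
// and retry, waking the cleaner again. max_rounds exists only so that the
// sketch terminates. Returns true if a block was taken.
bool get_free_block_sketch(PoolSketch &pool,
                           std::chrono::milliseconds timeout,
                           int max_rounds,
                           const std::function<void()> &wake_cleaner,
                           const std::function<void()> &warn) {
  std::unique_lock<std::mutex> lock(pool.mutex);
  for (int round = 0; round < max_rounds; ++round) {
    wake_cleaner();
    // The predicate form re-checks the condition after spurious wakeups.
    if (pool.done_free.wait_for(lock, timeout,
                                [&pool] { return pool.free_blocks > 0; })) {
      --pool.free_blocks;
      return true;
    }
    warn();  // timed out: the cleaner appears to be stuck
  }
  return false;
}
```

The point of the timed wait is that a waiter is never parked indefinitely on a single wakeup that might be lost or delayed; it periodically reports the problem and nudges the cleaner again.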

The INFORMATION_SCHEMA.INNODB_METRICS counters
buffer_LRU_single_flush_failure_count and
buffer_LRU_get_free_waits will be removed.
The latter is represented by buffer_pool_wait_free.

Also removed will be the message
"InnoDB: Difficult to find free blocks in the buffer pool"
because in d34479dc we
introduced a more precise message
"InnoDB: Could not free any blocks in the buffer pool"
in the buf_flush_page_cleaner thread.

buf_pool_t::LRU_warn(): Issue the warning message that we could
not free any blocks in the buffer pool. This may also be invoked
by buf_LRU_get_free_block() if buf_flush_page_cleaner() appears
to be stuck.

buf_pool_t::n_flush_dec(): Remove.

buf_pool_t::n_flush_dec_holding_mutex(): Rename to n_flush_dec().

buf_flush_LRU_list_batch(): Increment the eviction counter for blocks
of temporary, discarded or dropped tablespaces.

buf_flush_LRU(): Make static, and remove the constant parameter
evict=false. The only caller will be the buf_flush_page_cleaner()
thread.

IORequest::is_LRU(): Remove. The only case of evicting pages on
write completion will be when we are writing out pages of the
temporary tablespace. Those pages are not in buf_pool.flush_list,
only in buf_pool.LRU.

buf_page_t::flush(): Remove the parameter evict.

buf_page_t::write_complete(): Change the parameter "bool temporary"
to "bool persistent" and add a parameter for an already read state().

Reviewed by: Debarun Banerjee

fa8a46eb

21 Mar, 2024 1 commit

MDEV-33551: Semi-sync Wait Point AFTER_COMMIT Slow on Workloads with Heavy Concurrency · 75c7c6dc

Brandon Nesterenko authored Feb 27, 2024

When using semi-sync replication with
rpl_semi_sync_master_wait_point=AFTER_COMMIT, the performance of the
primary can degrade significantly compared to AFTER_SYNC's
performance for workloads with many concurrent users executing
transactions. This is because all connections on the primary share
the same cond_wait variable/mutex pair, so any time an ACK is
received from a replica, all waiting connections are awoken to check
whether the ACK was for them, which is done in mutual exclusion.

This patch changes this such that the waiting THD will use its own
local condition variable, and the ACK receiver thread only signals
connections which have been ACKed for wakeup. That is, the
THD::LOCK_wakeup_ready condition variable is re-used for this
purpose, and the Active_tranx queue nodes are extended to hold the
waiting thread, so it can be signalled once ACKed.
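
The idea of signalling only the ACKed waiters can be sketched as below. This is an illustration under stated assumptions, not the server's code: WaiterSketch stands in for the THD and its LOCK_wakeup_ready condition variable, TranxNodeSketch and ActiveTranxSketch are hypothetical names for the extended queue nodes, and the queue is assumed to be ordered by log position.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical per-connection state with its own condition variable.
struct WaiterSketch {
  std::condition_variable wakeup;
  bool acked = false;
};

// Hypothetical queue node: the log position waited for, plus the waiter.
struct TranxNodeSketch {
  std::uint64_t log_pos;
  WaiterSketch *waiter;
};

class ActiveTranxSketch {
 public:
  // Register a waiter; callers are assumed to enqueue in log position order.
  void enqueue(std::uint64_t log_pos, WaiterSketch *waiter) {
    std::lock_guard<std::mutex> guard(mutex_);
    queue_.push_back(TranxNodeSketch{log_pos, waiter});
  }

  // ACK receiver side: signal only the waiters at or before acked_pos.
  // Returns how many waiters were signalled.
  std::size_t on_ack(std::uint64_t acked_pos) {
    std::lock_guard<std::mutex> guard(mutex_);
    std::size_t signalled = 0;
    while (!queue_.empty() && queue_.front().log_pos <= acked_pos) {
      WaiterSketch *waiter = queue_.front().waiter;
      waiter->acked = true;
      waiter->wakeup.notify_one();
      queue_.pop_front();
      ++signalled;
    }
    return signalled;
  }

  // Connection side: block until this waiter has been ACKed.
  void wait_for_ack(WaiterSketch &waiter) {
    std::unique_lock<std::mutex> lock(mutex_);
    waiter.wakeup.wait(lock, [&waiter] { return waiter.acked; });
  }

 private:
  std::mutex mutex_;
  std::deque<TranxNodeSketch> queue_;
};
```

Compared with a single shared condition variable and a broadcast, a connection whose transaction has not been ACKed is never woken just to find out that it must keep waiting.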

Additionally:

 1)  Removed part of MDEV-11853 additions, which allowed suspended
connection threads awaiting their semi-sync ACKs to live until their
ACKs had been received. This part, however, wasn't needed.  That is,
all that was needed was for the Ack_thread to survive.  So now the
connection threads are killed during phase 1. Consequently,
THD::is_awaiting_semisync_ack and all its related code were removed.

 2) COND_binlog_send is repurposed to signal on the condition when
Active_tranx is emptied during clear_active_tranx_nodes.

 3) At master shutdown (when waiting for slaves), instead of the
main loop individually waiting for each ACK, await_slave_reply()
(renamed await_all_slave_replies()) just waits once for the
repurposed COND_binlog_send to signal it is empty.

 4) Test rpl_semi_sync_shutdown_await_ack is updated as follows:
   4.1) Added test case (adapted from Kristian Nielsen) to ensure
that if a thread awaiting its ACK is killed while SHUTDOWN WAIT FOR
ALL SLAVES is issued, the primary will still wait for the ACK from
the killed thread.
   4.2) As connections which by-passed phase 1 of thread killing no
longer are delayed for kill until phase 2, we can no longer query
yes/no tx after receiving an ACK/timeout. The check for these
variables is removed.
   4.3) Comments that mention the connection being alive are updated
to refer to the Ack_thread instead.

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>

75c7c6dc

20 Mar, 2024 1 commit

MDEV-26642/MDEV-26643/MDEV-32898 Implement innodb_snapshot_isolation · b8a67198

Marko Mäkelä authored Mar 20, 2024

https://jepsen.io/analyses/mysql-8.0.34 highlights that the
transaction isolation levels in the InnoDB storage engine do not
correspond to any widely accepted definitions, such as
"Generalized Isolation Level Definitions"
https://pmg.csail.mit.edu/papers/icde00.pdf
(PL-1 = READ UNCOMMITTED, PL-2 = READ COMMITTED, PL-2.99 = REPEATABLE READ,
PL-3 = SERIALIZABLE).
Only READ UNCOMMITTED in InnoDB seems to match the above definition.

The issue is that InnoDB does not detect write/write conflicts
(Section 4.4.3, Definition 6) in the above.

It appears that as soon as we implement write/write conflict detection
(SET SESSION innodb_snapshot_isolation=ON), the default isolation level
(SET TRANSACTION ISOLATION LEVEL REPEATABLE READ) will become
Snapshot Isolation (similar to Postgres), as defined in Section 4.2 of
"A Critique of ANSI SQL Isolation Levels", MSR-TR-95-51, June 1995
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf

Locking reads inside InnoDB used to read the latest committed version,
ignoring what should actually be visible to the transaction.
The added test innodb.lock_isolation illustrates this. The statement
	UPDATE t SET a=3 WHERE b=2;
is executed in a transaction that was started before a read view or
a snapshot of the current transaction was created, and committed before
the current transaction attempts to execute
	UPDATE t SET b=3;
If SET innodb_snapshot_isolation=ON is in effect when the second
transaction was started, the second transaction will be aborted with
the error ER_CHECKREAD. By default (innodb_snapshot_isolation=OFF),
the second transaction would execute inconsistently, displaying an
incorrect SELECT COUNT(*) FROM t in its read view.

If innodb_snapshot_isolation=ON and an attempt is made to acquire a
lock on a record that does not exist in the current read view, the error
DB_RECORD_CHANGED (HA_ERR_RECORD_CHANGED, ER_CHECKREAD) will
be raised. This error will be treated in the same way as a deadlock:
the transaction will be rolled back.

lock_clust_rec_read_check_and_lock(): If the current transaction has
a read view where the record is not visible and
innodb_snapshot_isolation=ON, fail before trying to acquire the lock.
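
A heavily simplified sketch of that check follows. It is not the InnoDB implementation: ReadViewSketch models visibility with a single transaction ID limit (a real read view also tracks the set of transactions that were active when it was created), and all names here are hypothetical.

```cpp
#include <cstdint>

enum class LockResultSketch { Locked, RecordChanged };

// Hypothetical, simplified read view.
struct ReadViewSketch {
  bool open;                   // whether the transaction has a read view
  std::uint64_t low_limit_id;  // changes by transaction IDs >= this are invisible
};

// A record version is visible if this transaction wrote it, or if its writer
// has an ID below the view's limit (assumed to have committed beforehand).
inline bool is_visible_sketch(const ReadViewSketch &view,
                              std::uint64_t own_trx_id,
                              std::uint64_t rec_trx_id) {
  return rec_trx_id == own_trx_id || rec_trx_id < view.low_limit_id;
}

// With snapshot isolation enabled, refuse the lock before acquiring it when
// the latest record version is not visible in the read view. The caller is
// expected to roll the transaction back, as it would for a deadlock.
LockResultSketch check_and_lock_sketch(bool snapshot_isolation,
                                       const ReadViewSketch &view,
                                       std::uint64_t own_trx_id,
                                       std::uint64_t rec_trx_id) {
  if (snapshot_isolation && view.open &&
      !is_visible_sketch(view, own_trx_id, rec_trx_id)) {
    return LockResultSketch::RecordChanged;  // write/write conflict detected
  }
  return LockResultSketch::Locked;  // the actual lock acquisition is omitted
}
```

With the setting off, the same call succeeds and the locking read operates on the latest committed version, which is the pre-existing behaviour described above.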

row_sel_build_committed_vers_for_mysql(): If innodb_snapshot_isolation=ON,
disable the "semi-consistent read" logic that had been implemented by
myself on the directions of Heikki Tuuri in order to address
https://bugs.mysql.com/bug.php?id=3300 that was motivated by a customer
wanting UPDATE to skip locked rows that do not match the WHERE condition.
It looks like my changes were included in the MySQL 5.1.5
commit ad126d90; at that time, employees
of Innobase Oy (a recent acquisition of Oracle) had lost write access to
the repository.

The only reason why we set innodb_snapshot_isolation=OFF by default is
backward compatibility with applications, such as the one that motivated
the implementation of "semi-consistent read" back in 2005. In a later
major release, we can default to innodb_snapshot_isolation=ON.

Thanks to Peter Alvaro, Kyle Kingsbury and Alexey Gotsman for their work
on https://github.com/jepsen-io/ and to Kyle and Alexey for explanations
and some testing of this fix.

Thanks to Vladislav Lesin for the initial test for MDEV-26643,
as well as reviewing these changes.

b8a67198

19 Mar, 2024 3 commits

MDEV-33716: rpl.rpl_semi_sync_slave_enabled_consistent Fails with Error Condition Reached · ca07f629

Brandon Nesterenko authored Mar 18, 2024

Though the test itself doesn't create any transactions
directly, the added test suppressions are replicated,
and when the SQL thread is stopped mid-execution,
it is set into an error state because these are
non-transactional events being aborted.

This patch fixes the test by ensuring that the test
suppressions are fully replicated before continuing

ca07f629

MDEV-33542 Inplace algorithm occupies more disk space compared to copy algorithm · c3a6248b

Thirunarayanan Balathandayuthapani authored Mar 19, 2024

Problem:
=======
- With a large file size, InnoDB eagerly adds a new extent
even though the segment has many existing unused pages.
The reason is that with a larger file size, the threshold
(1/8 of reserved pages) for adding a new extent is
reached frequently.

Solution:
=========
- Try to utilise the unused pages in the segment before adding
a new extent to the file segment.

need_for_new_extent(): For larger file sizes, try to use
4 * FSP_EXTENT_SIZE as the threshold for allocating a new extent.

fseg_alloc_free_page_low(): Rewrote the function to allocate
the page in the following order:
1) Try to get the page from an existing extent of the segment.
2) Check whether the segment needs a new extent
(need_for_new_extent()); if so, allocate the new extent
and find the page in it.
3) Take an individual page from the unused pages of the
segment or tablespace.
4) Allocate a new extent and take the first page from it.
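
The allocation order above can be sketched as a small decision function. This is only an illustration of the ordering, not the InnoDB code: SegmentStateSketch and its flags are hypothetical, and the result of need_for_new_extent() is treated as an input because its exact threshold logic is not reproduced here.

```cpp
// Where the page comes from, in the order the steps are tried.
enum class PageSourceSketch {
  ExistingExtent,   // step 1
  ThresholdExtent,  // step 2
  IndividualPage,   // step 3
  FallbackExtent,   // step 4
  None              // nothing could be allocated
};

// Hypothetical summary of the segment and tablespace state.
struct SegmentStateSketch {
  bool extent_has_free_page;  // an extent of the segment has a free page
  bool needs_new_extent;      // assumed result of need_for_new_extent()
  bool can_allocate_extent;   // the tablespace can supply a fresh extent
  bool has_individual_page;   // an unused single page is available
};

PageSourceSketch choose_page_source_sketch(const SegmentStateSketch &s) {
  if (s.extent_has_free_page)
    return PageSourceSketch::ExistingExtent;
  if (s.needs_new_extent && s.can_allocate_extent)
    return PageSourceSketch::ThresholdExtent;
  if (s.has_individual_page)
    return PageSourceSketch::IndividualPage;
  if (s.can_allocate_extent)
    return PageSourceSketch::FallbackExtent;
  return PageSourceSketch::None;
}
```

The ordering is what addresses the problem statement: a fresh extent is taken early only when the threshold check asks for it, and otherwise unused pages are consumed first.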

Removed the FSEG_FILLFACTOR and FSEG_FRAG_LIMIT variables.

c3a6248b

MDEV-23224 Windows threadpool - use better threadpool_max_threads default. · 5b4e69c0
Vladislav Vaintroub authored Mar 19, 2024
```
Use max_connections in the calculation, to prevent a possible deadlock if
max_connections is high.
```
5b4e69c0

18 Mar, 2024 4 commits

Post-fix · 01d994b3

Vladislav Vaintroub authored Mar 18, 2024

Do *not* check if socket is closed by another thread. This is
race-condition prone, unnecessary, and harmful. VIO state was introduced
to debug the errors, not to change the behavior.

Rather than checking if socket is closed, add a DBUG_ASSERT that it is
*not* closed, because this is an actual logic error, and can potentially
lead to all sorts of funny behavior like writing error packets to InnoDB
files.

Unlike closesocket(), shutdown(2) is not actually race-condition prone,
and it breaks poll() and read(), and it worked for longer than a decade,
and it does not need any state check in the code.

01d994b3

Merge 10.5 into 10.6 · 50715bd2
Marko Mäkelä authored Mar 18, 2024

50715bd2

Work around missing MSAN instrumentation · 4592af2e

Marko Mäkelä authored Mar 18, 2024

Let us skip the recently added test main.mysql-interactive if
an instrumented ncurses library is not available.

In InnoDB, let us work around an uninstrumented libnuma, by
declaring that the objects returned by numa_get_mems_allowed()
are initialized.

4592af2e

MDEV-33478: Tests massively fail with clang-18 -fsanitize=memory · 09d991d0

Marko Mäkelä authored Mar 18, 2024

Starting with clang-16, MemorySanitizer appears to check that
uninitialized values are neither passed by value nor returned.
Previously, it was allowed to copy uninitialized data in such cases.

get_foreign_key_info(): Remove a local variable that was passed
uninitialized to a function.

DsMrr_impl: Initialize key_buffer, because DsMrr_impl::dsmrr_init()
is reading it.

test_bind_result_ext1(): MYSQL_TYPE_LONG is 32 bits, hence we must
use a 32-bit type, such as int. sizeof(long) differs between
LP64 and LLP64 targets.
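
The size mismatch can be illustrated with a short sketch. The helper names are hypothetical and no client library API is used; store_int32_column_sketch() merely stands in for code that writes exactly four bytes into a caller-supplied buffer.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(std::int32_t) == 4, "a 32-bit column value is 4 bytes");

// Hypothetical stand-in for a library routine that stores a 32-bit column
// value into the caller's buffer: exactly 4 bytes are written.
void store_int32_column_sketch(void *buffer, std::int32_t value) {
  std::memcpy(buffer, &value, sizeof value);
}

// Receiving the value in a variable that is exactly 32 bits wide means every
// byte of the variable is written. If the variable were a long, then on LP64
// targets (where long is 8 bytes) the upper 4 bytes would stay uninitialized.
std::int32_t fetch_int32_sketch(std::int32_t value) {
  std::int32_t out;
  store_int32_column_sketch(&out, value);
  return out;
}
```

Because int is 32 bits on both LP64 and LLP64 targets, it is a safe receiving type here; a fixed-width type such as std::int32_t makes the same intent explicit.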

09d991d0

15 Mar, 2024 5 commits

MDEV-25923: Aria parallel repair MY_THREAD_SPECIFIC mismatch in realloc · 51abae5e

Kristian Nielsen authored Feb 29, 2024

maria_repair_parallel() clears the MY_THREAD_SPECIFIC flag for allocations
since it uses different threads. But it still did one _ma_alloc_buffer()
call as thread-specific which would later assert if another thread needed
to extend the buffer with realloc.

This patch, due to Monty, removes the MY_THREAD_SPECIFIC flag for
allocations that need to realloc in different threads, and preserves
it for those that are allocated/freed in the user's thread.

Also fixes MDEV-33562.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

51abae5e

MDEV-24622: Replication does not support bulk insert into empty table · 77b9b28a

Kristian Nielsen authored Mar 08, 2024

Remove work-around that disables bulk insert optimization in replication

The root cause of the original problem is now fixed (MDEV-33475). However, the
bulk insert optimization will still be disabled in replication, as it is
only enabled in special circumstances meant for loading a mysqldump.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

77b9b28a

MDEV-33303: slave_parallel_mode=optimistic should not report the mode's specific temporary errors · 1fb00f37

Kristian Nielsen authored Mar 08, 2024

An earlier patch for MDEV-13577 fixed the most common instances of this, but
missed one case for tables without primary key when the scan reaches the end
of the table. This patch adds similar code to handle this case, converting
the error to HA_ERR_RECORD_CHANGED when doing optimistic parallel apply.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

1fb00f37

Fix occasional test failure of rpl.rpl_parallel_stop_slave · fb774eb1
Kristian Nielsen authored Mar 15, 2024
```
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
```
fb774eb1

Fix "Assertion `THR_PFS_initialized' failed" in main.bootstrap · 7f498fba

Kristian Nielsen authored Mar 15, 2024

This patch makes the server wait for the manager thread to actually start
before proceeding with server startup.

Without this, if thread scheduling is really slow and the server shuts down
quickly, then it is possible that the manager thread is not yet started when
shutdown_performance_schema() is called. If the manager thread starts at just
the wrong moment and just before the main server reaches exit(), the thread
can try to access no longer available performance schema data. This was seen
as an occasional assertion failure in the main.bootstrap test.

As an additional improvement, make sure to run all pending actions before
exiting the manager thread.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

7f498fba

14 Mar, 2024 8 commits
- Merge branch '10.5' into 10.6 · 55cea0c2
  Sergei Golubchik authored Mar 14, 2024
  
  55cea0c2
- fix galera tests after 9a132d42 · bb451d2c
  Sergei Golubchik authored Mar 14, 2024
  
  bb451d2c
- after merge fix · dbd36bb1
  Sergei Golubchik authored Mar 14, 2024
  
  dbd36bb1
- cmake: append to the array correctly · 51e3f1da
  Sergei Golubchik authored Mar 14, 2024
```
it was generating broken spec files
```
  51e3f1da
- MDEV-33665: MSAN failure due to uninitialized Item_func::not_null_tables_cache · 9d5a8bd6
  Sergei Petrunia authored Mar 13, 2024
```
eliminate_item_equal() uses quick_fix_field() for Item objects it creates.
It computes some of their attributes on its own (see update_used_tables()
call) but it doesn't update not_null_tables_cache.

Recompute not_null_tables_cache also. Not computing it is currently
harmless, except for producing an MSAN error when some other code
propagates the wrong value of not_null_tables_cache to another item.
```
  9d5a8bd6
- build failure with cmake < 3.10 · 49cf702e
  Sergei Golubchik authored Mar 14, 2024
```
cmake bug #14362
```
  49cf702e
- update s3.partition result after 57ffcd68 · 7eb6d5aa
  Sergei Golubchik authored Mar 14, 2024
  
  7eb6d5aa
- MDEV-33635 innodb.innodb-64k-crash - Found warnings/errors in server log file · 967a1489
  Thirunarayanan Balathandayuthapani authored Mar 13, 2024
```
- Suppress the "Difficult to find free blocks" warning
globally to avoid failures in many different test cases.

- Demote the error information in validate_first_page() to a note,
so that the first page can be recovered from the doublewrite buffer;
an error is thrown if the page is not found in the doublewrite buffer.
```
  967a1489
13 Mar, 2024 10 commits
- perfschema: LOCK_all_status_vars not LOCK_status · 61f6dc5e
  Sergei Golubchik authored Dec 18, 2023
```
to iterate over all status variables one should use
LOCK_all_status_vars not LOCK_status

this fixes sporadic mutex lock inversion in plugins.password_reuse_check:
* acl_cache->lock is taken over complex operations that might increment
  status counters (under LOCK_status).
* acl_cache->lock is needed to get the values of Acl% status variables
  when iterating over status variables
```
  61f6dc5e
- Merge branch '10.5' into 10.6 · f71d7f2f
  Sergei Golubchik authored Mar 13, 2024
  
  f71d7f2f
- MDEV-33313 Incorrect error message for "ALTER TABLE ... DROP CONSTRAINT ..., DROP col, DROP col" · 0e8cda61
  Sergei Golubchik authored Jan 25, 2024
  
  0e8cda61
- remove `exit 1` from search_pattern_in_file.inc · 67115405
  Sergei Golubchik authored Jan 25, 2024
```
it broke tests on Windows. Use SEARCH_ABORT instead.
also, remove redundant features and simplify
```
  67115405
- cleanup: remove SEARCH_TYPE from search_pattern_in_file.inc · bc46f1a7
  Sergei Golubchik authored Jan 25, 2024
  
  bc46f1a7
- cleanup: reduce code duplication · 424210ab
  Sergei Golubchik authored Jan 11, 2024
  
  424210ab
- Merge branch '10.4' into 10.5 · 4cda50af
  Sergei Golubchik authored Mar 13, 2024
  
  4cda50af
- MDEV-33344 REGEXP empty string inconsistent · 62a9a54a
  Sergei Golubchik authored Feb 01, 2024
  
  62a9a54a
- MDEV-33318 ORDER BY COLLATE improperly applied to non-character columns · 7828aadd
  Sergei Golubchik authored Jan 26, 2024
```
when changing charset from latin1 to utf8, adjust max_length accordingly
```
  7828aadd
- MDEV-33549: Incorrect handling of UPDATE in PS mode in case a table's column declared as NOT NULL · ac20edd7
  Dmitry Shulga authored Mar 13, 2024
```
Follow-up to fix compiler warnings caused by the presence of
the override clause in the declaration of the method Item_param::cleanup
```
  ac20edd7
12 Mar, 2024 4 commits

MDEV-33622 Server crashes when the UPDATE statement (which has duplicate key)... · cfa8268e

Monty authored Mar 12, 2024

MDEV-33622 Server crashes when the UPDATE statement (which has duplicate key) is run after setting a low thread_stack

This was caused by an incorrect allocation of a variable on the stack
(4K of data was allocated instead of 512 bytes).

No test case, as the original MDEV test case is not usable for mtr.

cfa8268e

MDEV-33549: Incorrect handling of UPDATE in PS mode in case a table's column declared as NOT NULL · 428a6731

Dmitry Shulga authored Mar 12, 2024

An UPDATE statement that is run in PS mode and uses a positional parameter
handles columns declared with the clause DEFAULT NULL incorrectly when
the clause DEFAULT is passed as the actual value for the positional
parameter of the prepared statement. A similar issue occurs when
an expression is specified in the DEFAULT clause of the table's column
definition.

Columns declared as DEFAULT NULL were processed incorrectly because
the null flag of the field being updated was not set
in the implementation of the method Item_param::assign_default().
An expression in the DEFAULT clause was handled incorrectly because
saving of the field was also missing from the implementation of the method
Item_param::assign_default().

428a6731

MDEV-24167 fixup: Stricter assertion · 4ac8c4c8

Marko Mäkelä authored Mar 12, 2024

log_free_check(): Assert that the current thread is not holding
lock_sys.latch in any mode.

This fixes up commit 5f2dcd11

4ac8c4c8

Merge 10.5 into 10.6 · c3a00dfa
Marko Mäkelä authored Mar 12, 2024

c3a00dfa

11 Mar, 2024 3 commits
- MDEV-33642: MemorySanitizer: SEGV on unknown address on shutdown · 0a9cec22
  Marko Mäkelä authored Mar 11, 2024
```
signal_hand(): Remove the cmake -DWITH_DBUG_TRACE=ON instrumentation.
It can cause a crash on shutdown when the only other thread is
waiting in wait_for_signal_thread_to_end().
```
  0a9cec22
- MDEV-33593 Auto increment deadlock error causes ASSERT in subsequent save point · 67abdb9f
  mariadb-DebarunBanerjee authored Mar 11, 2024
```
innodb.autoinc_debug: Correct the test case for predictable deadlock.
```
  67abdb9f
- Merge 10.4 into 10.5 · f703e72b
  Marko Mäkelä authored Mar 11, 2024
  
  f703e72b