- 29 Apr, 2024 1 commit
mariadb-DebarunBanerjee authored
This is a server hang, not an issue with backup. When concurrent DDLs in the server hang, mariabackup waits for them to finish while trying to acquire MDL_BACKUP_BLOCK_DDL.

The server hang is serious. It is caused by the thread pool state being incorrectly left in the "thread creation pending" state while no creation is actually pending. Once a thread pool reaches this state, no new thread is ever created in the pool. While this could affect any thread pool in the server, the InnoDB thread pool is the victim in the current bug: an IO job gets blocked because the pool is stuck with far fewer threads than intended. The available workers are blocked in purge, waiting for a page latch (SX lock) held by an IO write to be released, causing a complete deadlock.

The issue is caused by the state variable m_thread_creation_pending introduced by MDEV-31095: 9e62ab7a. We check and set the variable early while attempting to create a new thread in the pool, but fail to reset it if we exit the flow for other reasons, such as reaching the maximum number of threads or entering the thread creation throttling path.

Fix: Make sure that the state is reset if we do not actually attempt to create the thread.
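A minimal C++ sketch of the fix, assuming a simplified, single-threaded pool. The class toy_pool, its counters, and the throttled parameter are hypothetical and are not MariaDB's tpool API; only the flag name m_thread_creation_pending comes from the message above. The point is that every early exit taken after the flag has been claimed must release it again.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified pool used only to illustrate the flag handling.
class toy_pool {
public:
  explicit toy_pool(int max_threads) : m_max_threads(max_threads) {}

  // Returns true if a thread was "created".
  bool maybe_create_thread(bool throttled) {
    bool expected = false;
    // Claim the right to create a thread; only one creation may be pending.
    if (!m_thread_creation_pending.compare_exchange_strong(expected, true))
      return false;
    if (m_threads >= m_max_threads || throttled) {
      // The fix: release the claim when bailing out without creating.
      m_thread_creation_pending.store(false);
      return false;
    }
    ++m_threads;
    // In a real pool the new thread would clear the flag once it starts;
    // this sketch clears it immediately.
    m_thread_creation_pending.store(false);
    return true;
  }

  bool creation_pending() const { return m_thread_creation_pending.load(); }
  int threads() const { return m_threads; }

private:
  std::atomic<bool> m_thread_creation_pending{false};
  int m_threads = 0;
  const int m_max_threads;
};
```

Without the store(false) in the early-exit branch, a single throttled or capped attempt would leave creation_pending() true forever, and every later call would return false: the stuck state described above.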
-
- 26 Apr, 2024 1 commit
Daniele Sciascia authored
Commit 0ccdf54b removed stack-allocated THD objects from the function Wsrep_schema::replay_transaction(). However, it inadvertently moved the destruction of the THD too early, causing assertion failures and use of the THD after it had been destroyed. The fix extracts the original function body into a separate function and leaves the allocation and destruction of the THD object in Wsrep_schema::replay_transaction(), making sure that using the heap-allocated THD has no side effects. The same is done for Wsrep_schema::recover_sr_transactions().
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
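The ownership split described above can be sketched in C++ as follows, assuming a hypothetical toy_thd type and hypothetical function names (replay_transaction_impl is not the server's name). The outer function is the only owner of the heap-allocated object and keeps it alive until the extracted inner function has returned.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a THD; counts live instances so leaks show up.
struct toy_thd {
  static int live;
  std::string query;
  toy_thd() { ++live; }
  ~toy_thd() { --live; }
};
int toy_thd::live = 0;

// Extracted inner function: uses the THD but never owns or frees it.
static int replay_transaction_impl(toy_thd &thd, const std::string &event) {
  thd.query = event;  // safe: the caller guarantees thd is alive
  return static_cast<int>(thd.query.size());
}

// Outer function: sole owner of the heap-allocated THD for the whole call.
int replay_transaction(const std::string &event) {
  auto thd = std::make_unique<toy_thd>();
  int rc = replay_transaction_impl(*thd, event);
  return rc;  // thd is destroyed here, after its last use
}
```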
-
- 25 Apr, 2024 2 commits
Marko Mäkelä authored
commit_try_norebuild(): Add the parameter statistics_exist, similar to commit_try_rebuild(). If the InnoDB statistics tables did not exist, we will not attempt to update statistics later on during the transaction. Thanks to Matthias Leich for originally reproducing this scenario.
-
Thirunarayanan Balathandayuthapani authored
Problem:
========
- The partition update operation enables bulk insert for the transaction while moving a row between partitions. This leads to a debug assertion failure while removing the row from one of the partitions.

Solution:
========
- Disallow the bulk insert operation for non-insert operations on partitioned tables.
-
- 23 Apr, 2024 2 commits
Monty authored
I checked all potential stack overflow problems found with gcc -Wstack-usage=16384 and clang -Wframe-larger-than=16384 -no-inline.

Fixes:
- Added '#pragma clang diagnostic ignored "-Wframe-larger-than="' to many functions where stack usage is large but reasonable.
- Added stack check warnings to BUILD scripts when using clang and debug.

Functions changed to use malloc() instead of allocating things on the stack:
- read_bootstrap_query() now allocates line_buffer (20000 bytes) with malloc() instead of using the stack. This has a small performance impact, but that is not relevant for bootstrap.
- mroonga grn_select() used 65856 bytes on the stack. Changed it to use malloc().
- Wsrep_schema::replay_transaction() and Wsrep_schema::recover_sr_transactions().
- Connect zipOpen3()

Not fixed:
- mroonga/vendor/groonga/lib/expr.c grn_proc_call() uses 43712 bytes on the stack. However, this is not easy to fix, as the stack usage is caused by a lot of code generated by defines.
- Most changes in mroonga/groonga were only the addition of pragmas to disable stack warnings.
- rocksdb/options/options_helper.cc uses 20288 bytes of stack space (no reason to fix except to get rid of the compiler warning).
- Cases using alloca() where the allocation size is reasonable.
- An issue in libmariadb (reported to connectors).
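The read_bootstrap_query() change follows a common pattern: replace a large stack array with a heap allocation that is released automatically. A minimal C++ sketch, assuming a hypothetical read_first_line() function; only the 20000-byte size comes from the message above, and the server code itself may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// 20000 bytes is the size quoted in the commit message.
constexpr std::size_t LINE_BUFFER_SIZE = 20000;

// Releases a malloc()ed buffer when the owning unique_ptr goes out of scope.
struct free_deleter {
  void operator()(char *p) const { std::free(p); }
};

// Hypothetical helper: copies the first line of `input` into a heap buffer
// and returns its length.
std::size_t read_first_line(const char *input) {
  // Before: char line_buffer[LINE_BUFFER_SIZE];  // 20000 bytes of stack
  std::unique_ptr<char, free_deleter> line_buffer(
      static_cast<char *>(std::malloc(LINE_BUFFER_SIZE)));
  if (!line_buffer)
    return 0;  // allocation failure; real code would report an error
  char *buf = line_buffer.get();
  std::size_t len = 0;
  while (input[len] != '\0' && input[len] != '\n' &&
         len + 1 < LINE_BUFFER_SIZE) {
    buf[len] = input[len];
    ++len;
  }
  buf[len] = '\0';
  return std::strlen(buf);
}
```

The cost is one malloc()/free() pair per call, which matches the "small performance impact" noted for bootstrap.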
-
Marko Mäkelä authored
-
- 22 Apr, 2024 2 commits
Jan Lindström authored
The problem was an assertion assuming that we always hold the THD::LOCK_thd_data mutex, which is not true. In most cases it is held, but the function is also used from the InnoDB lock manager, where we cannot take THD::LOCK_thd_data without violating the mutex ordering. Removed the assertion, as the wsrep transaction state cannot change even in that case.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
-
Sergei Golubchik authored
update results
-
- 21 Apr, 2024 3 commits
Sergei Golubchik authored
In $case=2, it is wrong to kill after the first binlog EOF, because that might happen between INSERT(4) and INSERT(5). So, wait for the slave to acknowledge INSERT(5) before killing the master; that is, both connection threads must pass repl_semisync_master.wait_after_sync().
-
Sergei Golubchik authored
-
Sergei Golubchik authored
fixes sporadic failures under --valgrind
-
- 20 Apr, 2024 5 commits
Sergei Golubchik authored
do CHANGE MASTER before sync_with_master to have the slave in a predictable fully synced state before the next test
-
Sergei Golubchik authored
It always has to be current_thd; DBUG_SYNC asserts that. Fixes sporadic SIGABRTs in binlog_encryption.rpl_parallel_slave_bgc_kill.
-
Sergei Golubchik authored
-
Kristian Nielsen authored
The slave IO thread sets MYSQL_SET_CHARSET_DIR. However, the code for this option in sql-common/client.c is not thread-safe: the value is temporarily written to the mysys global variable `charsets-dir`, where it can be seen by other threads running in parallel, which can result in a use-after-free error. The problem was visible as random failures of test cases in the multi_source suite with Valgrind or MSAN. Work around it by not setting this option for the slave connection; it is redundant anyway, as it just sets the default value.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
-
Kristian Nielsen authored
The root cause of the failure is a bug in the Linux network stack: https://lore.kernel.org/netdev/87sf0ldk41.fsf@urd.knielsen-hq.org/T/#u If the slave does a connect(2) at the exact moment that kill -9 of the master process closes the listening socket, the FIN or RST packet is lost in the kernel, and the slave ends up timing out while waiting for the initial communication from the server. This timeout defaults to --slave-net-timeout=120, which causes include/master_gtid_wait.inc to time out first and fail the test. Work around this problem by reducing --slave-net-timeout for this test case. If this problem turns up in other tests, we can consider reducing the default value for all tests.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
-
- 19 Apr, 2024 6 commits
Sergei Golubchik authored
disable until fixed
-
Zhibo Zhang authored
As of version 3.2.0, OpenSSL updated the error message in new versions ("https://github.com/openssl/openssl/commit/81b741f68984"). Update the tests and result files such that they are compatible with both original and new error messages. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.
-
Marko Mäkelä authored
-
Marko Mäkelä authored
mtr_t::memmove(): Revert to the parent of commit a032f14b. That commit was supposed to contain an equivalent change that would avoid a warning in some old version of GCC, back when the change was part of another 10.6-based development branch. For some reason, the change is not equivalent and causes massive numbers of backup failures in the stress tests run by Matthias Leich, caught by commit 4179f93d in 10.6.
-
Marko Mäkelä authored
ibuf_remove_free_page(): Correct the calculation of root_savepoint(). The first entry acquired by ibuf_tree_root_get() will be ibuf.index.lock and not the change buffer root page. Thanks to Matthias Leich for finding this bug in RQG. Unfortunately, this code is very difficult to cover in our regression test suite.
-
Marko Mäkelä authored
The libpmem dependency that had been added in commit 3daef523 (MDEV-17084) did not achieve any measurable performance improvement when comparing the same PMEM device with and without "mount -o dax" using the Linux ext4 file system. Because Red Hat has deprecated libpmem, let us remove the code altogether. Note: This is a 10.6 version of commit 3f9f5ca4 which will retain PMEM support in MariaDB Server 10.11.
-
- 18 Apr, 2024 3 commits
Vladislav Vaintroub authored
-
Marko Mäkelä authored
This excludes commit 040069f4 because it is specific to innodb_sync_debug, which had been removed in commit ff5d306e.
-
mariadb-DebarunBanerjee authored
When the change buffer records for a page span multiple change buffer leaf pages, or the starting record is at the beginning of a page with a left sibling, ibuf_delete_recs deletes only the records in the first page and fails to move on to subsequent pages. Subsequently, a slow shutdown hangs trying to delete those leftover records.
Fix-A: Position the cursor on a user record in the B-tree and exit only when all records are exhausted.
Fix-B: Make sure we call ibuf_delete_recs during slow shutdown for pages with IBUF entries, to clean up any previously leftover records.
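Fix-A can be illustrated with a simplified model, assuming hypothetical types: each leaf page is a sorted vector of keys (the target page number of each buffered record) and the leaves are visited left to right. The sketch is not the real ibuf_delete_recs; it only shows why the loop must keep moving to the right sibling until a larger key appears instead of stopping after the first page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: sorted record keys per leaf, leaves chained left to right.
using leaf_page = std::vector<int>;

// Deletes every record whose key equals page_no. A leaf that is exhausted
// (or contains only smaller keys) does not end the scan; only a larger key
// proves that no matching records remain further to the right.
std::size_t delete_recs_for_page(std::vector<leaf_page> &leaves, int page_no) {
  std::size_t deleted = 0;
  for (leaf_page &leaf : leaves) {
    std::size_t i = 0;
    while (i < leaf.size() && leaf[i] < page_no) ++i;  // position on a user record
    while (i < leaf.size() && leaf[i] == page_no) {
      leaf.erase(leaf.begin() + static_cast<std::ptrdiff_t>(i));
      ++deleted;
    }
    if (i < leaf.size()) break;  // larger key found: all matches are gone
    // Otherwise this leaf is exhausted: continue with the right sibling.
  }
  return deleted;
}
```

With matching records spread over three leaves, a version that stopped after the first leaf would delete only two of the five records and leave the rest behind for a slow shutdown to trip over.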
-
- 17 Apr, 2024 14 commits
Brandon Nesterenko authored
If an event is replicated in ROW format and InnoDB detects a deadlock while searching for a row, the row event will fail and roll back in InnoDB, and indicate that the binlog cache also needs to be cleared, i.e. by setting thd->transaction_rollback_request. In the normal case, this triggers an error in Rows_log_event::do_apply_event() and causes a rollback. During the Rows_log_event::do_apply_event() cleanup of a successful event application, there is a DBUG_ASSERT in log_event_server.cc::rows_event_stmt_cleanup(), which sets the expectation that thd->transaction_rollback_request cannot be set, because the general rollback (i.e. not the InnoDB rollback) should already have happened. However, if the replica is configured to skip deadlock errors, the rows event logic will clear the error and continue as if no error had happened. This results in thd->transaction_rollback_request being set in rows_event_stmt_cleanup(), thereby triggering the assertion.

This patch fixes this in the following ways:
1) The assertion is invalid, and is therefore removed.
2) The rollback case is forced in rows_event_stmt_cleanup() if transaction_rollback_request is set.

Note the differing behavior between transactions that are skipped due to deadlock errors and those skipped due to other errors. When a transaction is skipped due to an ignored deadlock error, the entire transaction is rolled back and skipped (though note MDEV-33930, which allows statements in the same transaction after the deadlock-inducing one to commit). When a transaction is skipped due to ignoring a different error, only the erroring statements are rolled back and skipped; the rest of the transaction executes as normal. The effect of this can be seen in the test results. The test case added to rpl_skip_error.test shows that, in larger transactions, only the statements that fail with non-deadlock errors are skipped.
A diff between rpl_temporary_error2_skip_all.result and rpl_temporary_error2.result shows that all statements in the errored transaction are rolled back (diff pasted below):

    diff rpl_temporary_error2.result rpl_temporary_error2_skip_all.result
    49c49
    < 2 1
    ---
    > 2 NULL
    51c51
    < 4 1
    ---
    > 4 NULL
    53c53
    < * There will be two rows in t2 due to the retry.
    ---
    > * There will be one row in t2 because the ignored deadlock does not retry.
    57d56
    < 1
    59c58
    < 1
    ---
    > 0

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
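A minimal C++ sketch of fix (2), assuming a heavily simplified, hypothetical toy_thd and counters; only the flag name transaction_rollback_request and the function name rows_event_stmt_cleanup() come from the message above. Instead of asserting that the flag is clear, the cleanup now performs the rollback that the ignored error path skipped.

```cpp
#include <cassert>

// Hypothetical, simplified state; not the server's THD.
struct toy_thd {
  bool transaction_rollback_request = false;
  int rollbacks = 0;
  int normal_cleanups = 0;
};

// Before the fix, this began with DBUG_ASSERT(!thd.transaction_rollback_request),
// which fails when a skipped deadlock error leaves the flag set.
void rows_event_stmt_cleanup(toy_thd &thd) {
  if (thd.transaction_rollback_request) {
    // Force the rollback that the ignored error path would have performed.
    ++thd.rollbacks;
    thd.transaction_rollback_request = false;
    return;
  }
  ++thd.normal_cleanups;  // ordinary end-of-statement processing
}
```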
-
Vladislav Vaintroub authored
On Windows systems, ERROR_SHARING_VIOLATION, caused by conflicting share modes between processes accessing the same file, can make CreateFile fail. mysys' my_open() already incorporates a workaround by implementing wait/retry logic on Windows. But this does not help if files are opened using shell redirection, as mysqltest traditionally did, i.e. via
--exec echo "some text" > output_file
In such cases it is cmd.exe that opens output_file, and it does not do any sharing-violation retries.

This commit addresses the issue by introducing a new built-in command, 'write_line', in mysqltest. The new command serves as a concise alternative to 'write_file' for single-line output, and it also resolves variables like "exec" would. Internally, this command uses my_open() and therefore its retry-on-error logic. Hopefully this will eliminate the very sporadic "can't open file because it is used by another process" error on CI.
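The retry-on-error idea that write_line inherits from my_open() can be sketched portably. This is not mysys code: open_result, open_with_retry(), the attempt count, and the delay are all hypothetical, and the opener is injected as a callback so the sketch runs on any platform. On Windows the transient case would correspond to ERROR_SHARING_VIOLATION from CreateFile().

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical outcome codes for a single open attempt.
enum class open_result { ok, sharing_violation, other_error };

// Retries only the transient sharing violation, up to max_attempts times,
// sleeping between attempts; any other outcome is returned immediately.
// Returns other_error if max_attempts <= 0 (nothing was tried).
open_result open_with_retry(const std::function<open_result()> &try_open,
                            int max_attempts,
                            std::chrono::milliseconds delay) {
  open_result r = open_result::other_error;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    r = try_open();
    if (r != open_result::sharing_violation)
      return r;
    if (attempt + 1 < max_attempts)  // no pointless sleep after the last try
      std::this_thread::sleep_for(delay);
  }
  return r;  // still a sharing violation after all attempts
}
```

A shell redirection has no such loop, which is why a momentary conflict with another process surfaced as a test failure.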
-
Vladislav Vaintroub authored
The latest Visual Studio complains about an invalid format, which breaks formatting in the IDE.
-
Vladislav Vaintroub authored
It uses shutdown/restart etc., features that are not compatible with the embedded server. Also add have_debug.inc, since the test uses the debug_dbug variable.
-
Vladislav Vaintroub authored
Also, disable MSVC LTO for static client libraries - they won't be usable for end-users.
-
Marko Mäkelä authored
We have quite a few assertions ut_a(m_prebuilt->trx == thd_to_trx(ha_thd())); in low-level functions. These had better be debug assertions for performance reasons. It should suffice to check that condition in the less frequently invoked ha_innobase::change_active_index(). convert_search_mode_to_innobase(): Return whether the mode is unsupported, and optionally update ha_innobase::m_last_match_mode. ha_innobase::index_read(): Only branch on find_flag once, and simplify the error handling after invoking row_search_mvcc(). ha_innobase::rnd_pos(): Remove an assertion that is duplicating one in ha_innobase::index_read(), which we are calling unconditionally. ha_innobase::records_in_range(): Check only once whether min_key, max_key are null pointers. row_sel_convert_mysql_key_to_innobase(): Declare all parameters except the conversion buffer pointer (buf) to be nonnull. Reviewed by: Debarun Banerjee
-
Marko Mäkelä authored
-
Marko Mäkelä authored
rtr_pcur_getnext_from_path(): Remove a bogus assertion that may cause a data race with buf_LRU_block_free_non_file_page(). If my_latch_mode == BTR_MODIFY_LEAF, we would have released all page latches and buffer-fixes by invoking mtr->rollback_to_savepoint(1). After this point, btr_cur->page_cur.block is no longer valid and must not be accessed. Before 03ca6495, this assertion had been disabled, because the preprocessor symbol UNIV_RTR_DEBUG had never been enabled (except when explicitly specified in CMAKE_CXX_FLAGS).
Reviewed by: Debarun Banerjee
-
mariadb-DebarunBanerjee authored
MDEV-33431 Latching order violation reported fil_system.sys_space.latch and ibuf_pessimistic_insert_mutex

Issue:
------
The actual order of acquisition of the IBUF pessimistic insert mutex (SYNC_IBUF_PESS_INSERT_MUTEX) and the IBUF header page latch (SYNC_IBUF_HEADER) with respect to the space latch (SYNC_FSP) differs from the order defined in sync0types.h. It was not discovered earlier because the path to ibuf_remove_free_page was not covered by the mtr tests.

The ideal order, which is the one defined in sync0types.h, is:
SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX -> SYNC_FSP

In ibuf_remove_free_page, we acquire the space latch earlier, so the order is as follows, resulting in an assertion failure with innodb_sync_debug=on:
SYNC_FSP -> SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX

Fix:
---
We maintain this order in other places as well, and there does not seem to be any real issue here. To reduce the impact on GA versions, we avoid extensive changes to the mutex ordering to match the current SYNC_IBUF_PESS_INSERT_MUTEX order. Instead, we relax the ordering check for the IBUF pessimistic insert mutex by using SYNC_NO_ORDER_CHECK.
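The effect of SYNC_NO_ORDER_CHECK can be shown with a toy latch-order checker. This is a hypothetical sketch, not the innodb_sync_debug implementation: the enum values encode only the intended acquisition order quoted above, release is not modeled, and the real level numbering in sync0types.h may differ. A latch registered as NO_ORDER_CHECK is neither checked when acquired nor considered when later latches are checked.

```cpp
#include <cassert>
#include <vector>

// Hypothetical levels in the intended acquisition order:
// IBUF_HEADER -> IBUF_PESS_INSERT_MUTEX -> FSP.
enum latch_level {
  IBUF_HEADER = 1,
  IBUF_PESS_INSERT_MUTEX = 2,
  FSP = 3,
  NO_ORDER_CHECK = 100  // exempt from ordering checks
};

class latch_order_checker {
public:
  // Returns false (an ordering violation) if an already held, ordered latch
  // does not come strictly before the one being acquired.
  bool acquire(latch_level level) {
    if (level != NO_ORDER_CHECK)
      for (latch_level held : m_held)
        if (held != NO_ORDER_CHECK && held >= level)
          return false;
    m_held.push_back(level);
    return true;
  }

private:
  std::vector<latch_level> m_held;  // latches held by this (single) thread
};
```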
-
Vladislav Vaintroub authored
The previous solution, which would switch the timer off entirely, turned out to be deadlock-prone. This patch fixes the previous attempt (MDEV-24295) to switch between long and short interval periods. Now the initial state of the timer is fixed (it is ON). Also, avoid switching the timer to longer periods if there is any activity in the pool.
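A minimal sketch of the interval policy described above, assuming a hypothetical maintenance_timer struct and made-up interval values. The timer is never switched off; on each tick it only chooses the next period, and it moves to the longer period only when the pool has been idle since the previous tick.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

// Hypothetical policy object; the interval values are illustrative only.
struct maintenance_timer {
  milliseconds short_interval{10};
  milliseconds long_interval{500};
  milliseconds current{10};  // initial state: ON, with the short period

  // Called on each tick. `activity` tells whether the pool did any work
  // since the previous tick; any activity keeps the short period.
  milliseconds next_period(bool activity) {
    current = activity ? short_interval : long_interval;
    return current;
  }
};
```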
-
Vladislav Vaintroub authored
This reverts commit 09bae92c.
-
Marko Mäkelä authored
-
Marko Mäkelä authored
This fixes up the merge commit 9b182756
-
Jan Lindström authored
The test waited for the INSERT statement to roll back, but the wait_condition was too strict: the state could be either "Freeing items" or "Rollback". Fixed the wait_condition to accept either of them.
-
- 16 Apr, 2024 1 commit
Sergei Golubchik authored
create_partitioning_metadata() should only mark the transaction read-write if it actually did anything (that is, if the table is partitioned). Otherwise it is a no-op; it is called even for temporary tables and should not do anything at all.
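The control flow of this change can be sketched with hypothetical stand-in types (toy_table, toy_trx); only the function name and the "no-op unless partitioned" rule come from the message above. The early return keeps the transaction read-only when there is nothing to write.

```cpp
#include <cassert>

// Hypothetical stand-ins; only the control flow mirrors the commit message.
struct toy_table { bool partitioned; bool temporary; };
struct toy_trx { bool read_write = false; };

int create_partitioning_metadata(const toy_table &table, toy_trx &trx) {
  if (!table.partitioned)
    return 0;             // no-op: leave the transaction read-only
  trx.read_write = true;  // only now is there metadata to write
  // ... writing of the partitioning metadata would happen here ...
  return 0;
}
```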
-