- 08 Apr, 2024 3 commits
-
-
Brandon Nesterenko authored
A GTID event can have variable length, with contributing factors such as the variable length from the flags2 and optional extra flags fields. These fields are bitmaps, where each set bit indicates an additional value that should be appended to the event, e.g. multi-engine transactions append a number to indicate the number of additional engines a transaction uses. However, if a flags bit is set, and no additional fields are appended to the event, MDEV-33672 reports that the server can still try to read from memory as if it did exist. Note, however, in debug builds, this condition is asserted for FL_EXTRA_MULTI_ENGINE. This patch fixes this to check that the length of the event is aligned with the expectation set by the flags for FL_PREPARED_XA, FL_COMPLETED_XA, and FL_EXTRA_MULTI_ENGINE. Reviewed By ============ Kristian Nielsen <knielsen@knielsen-hq.org>
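The length check described above can be sketched roughly as follows. This is a minimal illustration only: the flag names, bit values and field sizes below are hypothetical and are not claimed to match the server's real GTID event layout. The point is simply that every set flag bit raises the minimum length the event buffer must have before any optional field is read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical flag bits and field sizes, chosen for illustration only.
constexpr uint8_t kFlagHasXid = 0x01;        // an 8-byte identifier follows
constexpr uint8_t kFlagHasExtraCount = 0x02; // a 1-byte counter follows
constexpr std::size_t kFixedLen = 4;         // fixed part; flags live in byte 0
constexpr std::size_t kXidLen = 8;

// Returns true only if the buffer is long enough to hold every optional
// field that the flags byte announces.
bool event_length_matches_flags(const uint8_t *buf, std::size_t len)
{
  if (len < kFixedLen)
    return false;
  const uint8_t flags = buf[0];
  std::size_t expected = kFixedLen;
  if (flags & kFlagHasXid)
    expected += kXidLen;
  if (flags & kFlagHasExtraCount)
    expected += 1;
  return len >= expected;
}
```

A reader built this way would reject an event whose flags promise a field that the buffer is too short to contain, instead of reading past the end of the event.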
-
Alexander Barkov authored
The problem happened when running mariabackup against a pre-MDEV-30971 server, i.e. one that does not yet have the system variable @@aria_log_dir_path. As a result, backup_start() called the function backup_files_from_datadir() with a NULL value, which further caused a crash. Fix: Perform this call: backup_files_from_datadir(.., aria_log_dir_path, ..) only if aria_log_dir_path is not NULL. Otherwise, assume that Aria log files are in their default location, so they've just been copied by the previous call: backup_files_from_datadir(.., fil_path_to_mysql_datadir, ..) Thanks to Walter Doekes for a patch proposal.
-
Marko Mäkelä authored
In commit aa719b50 (part of MDEV-32050) a bug was introduced in the function purge_sys_t::choose_next_log(), which reimplements some logic that previously was part of trx_purge_read_undo_rec(). We must invoke trx_undo_get_first_rec() with the page number and offset of the undo log header, but we were incorrectly invoking it on the current undo page number, which caused us to parse undo records starting at an incorrect offset. purge_sys_t::choose_next_log(): Pass the correct parameter to trx_undo_page_get_first_rec(). trx_undo_page_get_next_rec(), trx_undo_page_get_first_rec(), trx_undo_page_get_last_rec(): Add debug assertions and make the code more robust by returning nullptr on corruption. Should we detect any corrupted undo logs during the purge of committed transaction history, the sanest thing to do is to pretend that the end of an undo log was reached. If any garbage is left in the tables, it will be ignored by anything else than CHECK TABLE ... EXTENDED, and it can be removed by OPTIMIZE TABLE. Thanks to Matthias Leich for providing an "rr replay" trace where this bug could be found. Reviewed by: Vladislav Lesin
-
- 05 Apr, 2024 1 commit
-
-
Vlad Lesin authored
Post-push fix: the purge queue array can't be of fixed size, because the elements of the array are the analogue of undo logs, which must be processed in the order of transaction commits, and the array can contain more elements than trx_sys.rseg_array. It is also necessary to maintain the min-heap property by the trx_no of the transaction that produced the first non-purged undo log in all rsegs. That's why an element of the purge queue array must contain not only the trx_sys.rseg_array index, but also the trx_no of the committed transaction, i.e. the pair (trx_no, trx_sys.rseg_array index), which is encoded as uint64_t((trx_no << 8) | (trx_sys.rseg_array index)). Reviewed by: Marko Mäkelä
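The encoding can be illustrated with a small sketch. It uses std::priority_queue with std::greater as a stand-in min-heap; the actual purge queue is InnoDB's own structure, and the helper names here are hypothetical. The sketch assumes that the rseg index fits in 8 bits and that trx_no fits in the remaining 56 bits, so comparing the packed values orders entries by trx_no first.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Pack (trx_no, rseg index) into one integer; trx_no occupies the high bits,
// so plain integer comparison orders elements by commit order.
constexpr uint64_t encode(uint64_t trx_no, uint8_t rseg_index)
{
  return (trx_no << 8) | rseg_index;
}
constexpr uint64_t decode_trx_no(uint64_t e) { return e >> 8; }
constexpr uint8_t decode_rseg(uint64_t e) { return uint8_t(e & 0xFF); }

// Min-heap on the packed value (illustrative stand-in for the purge queue).
using PurgeQueue =
    std::priority_queue<uint64_t, std::vector<uint64_t>, std::greater<uint64_t>>;

// Drain the queue and report the rseg indexes in the order they would be purged.
std::vector<uint8_t> drain_order(PurgeQueue q)
{
  std::vector<uint8_t> order;
  while (!q.empty())
  {
    order.push_back(decode_rseg(q.top()));
    q.pop();
  }
  return order;
}
```

Because the element is a single 64-bit integer, the heap needs no custom comparator, and the same rollback segment may appear in the queue more than once with different trx_no values.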
-
- 03 Apr, 2024 2 commits
-
-
Brandon Nesterenko authored
MDEV-26473 fixed a segmentation fault at startup between the handle manager thread and the binlog background thread, such that the binlog background thread could be started and submit a job to the handle manager, before it had initialized. Where MDEV-26473 made it so the handle manager would initialize before the main thread started the normal binary logs, it did not account for the recovery case. That is, there is still a possibility of a segmentation fault when a server is recovering using the binary logs such that it can open the binary logs, start the binlog background thread, and submit a job to the handle manager before it is initialized. This patch fixes this by moving the initialization of the mysql handler manager to happen prior to recovery. Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>
-
Vlad Lesin authored
TrxUndoRsegs is a wrapper for a vector of trx_rseg_t*. It has two constructors, both of which initialize the vector with only one element. They are used to push a transaction's rseg (singular) to the purge queue. There is no function to add elements to the vector. The default constructor is used only for the declaration of NullElement. The TrxUndoRsegs was introduced in WL#6915 in MySQL 5.7. MySQL 5.7 would unnecessarily let the purge of history parse the temporary undo records, and then look up the table (via a global hash table), and only at the point of processing the parsed undo log record determine that the table is a temporary table and the undo record must be thrown away. In MariaDB 10.2 we have two disjoint sets of rollback segments (128 for persistent, 128 for temporary), and purge does not even see the temporary tables. The only reason why temporary tables are visible to other threads is a SQL layer bug (MDEV-17805). purge_sys_t::choose_next_log(): merge the relevant part of TrxUndoRsegsIterator::set_next() to the start of purge_sys_t::choose_next_log(). purge_sys_t::rseg_get_next_history_log(): add a tail call of purge_sys_t::choose_next_log() and adjust the callers, to simplify the control flow further. purge_sys.pq_mutex and purge_sys.purge_queue: make them private by adding some simple accessor functions. trx_purge_cleanse_purge_queue(): make it a member of purge_sys_t so that it has access to the private purge_sys.pq_mutex and purge_sys.purge_queue, and simplify the code by using a simple array copy and clearing the purge queue instead of popping each purge queue element. rseg_t::last_commit_and_offset: exchange trx_no and offset bits to avoid bitwise operations during pushing to/popping from the purge queue. Thanks to Marko Mäkelä for the historical overview of TrxUndoRsegs development. Reviewed by: Marko Mäkelä
-
- 27 Mar, 2024 7 commits
-
-
Marko Mäkelä authored
-
Alexander Barkov authored
Item_func_group_concat::print() did not take into account that Item_func_group_concat::separator can be of a different character set than the "String *str" (that the printing is being done to). Therefore, printing did not work correctly for: - non-ASCII separators when GROUP_CONCAT is done on 8bit data or multi-byte data with mbminlen==1. - all separators (even including simple ones like comma) when GROUP_CONCAT is done on ucs2/utf16/utf32 data (mbminlen>1). Because of this problem, VIEW definitions did not print correctly to their FRM files. This later led to wrong SELECT and SHOW CREATE output. Fix: - Adding new String methods: bool append_for_single_quote_using_mb_wc(const char *str, size_t length, CHARSET_INFO *cs); bool append_for_single_quote_opt_convert(const char *str, size_t length, CHARSET_INFO *cs) which perform both escaping and character set conversion at the same time. - Adding a new String method escaped_wc_for_single_quote(), to reuse the code between the old and the new methods. - Fixing Item_func_group_concat::print() to use the new method append_for_single_quote_opt_convert().
-
Dave Gosselin authored
Queries that select concatenated constant strings now have colname and value that match. For example, SELECT '123' 'x'; will return a result where the column name and value both are '123x'. Review: Daniel Black
-
Daniel Black authored
.. even with MDEV-9095 fix. CapabilityBounding sets require filesystem setcap attributes for the executable to gain privileges during execution. A side effect of this, however, is that getauxval(AT_SECURE) gets set, so the secure_getenv call in OpenSSL internals ignores the OPENSSL_CONF environment variable (openssl gh issue 21770). According to capabilities(7), ambient capabilities don't cause ld.so to enter the secure execution mode. Include SELinux and AppArmor capabilities for ipc_lock.
-
Daniel Black authored
This was the original implementation that reverted with a bunch of commits. This reverts commit a13e521b. Revert "cmake: append to the array correctly" This reverts commit 51e3f1da. Revert "build failure with cmake < 3.10" This reverts commit 49cf702e. Revert "MDEV-33301 memlock with systemd still not working" This reverts commit 8a1904d7.
-
Jan Lindström authored
We should not set debug sync point when holding a mutex to avoid mutex ordering failure. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
-
Denis Protivensky authored
User transactions may acquire explicit MDL locks from InnoDB level when persistent statistics is re-read for a table. If such a transaction would be subject to BF-abort, it was improperly detected as a system transaction and wouldn't get aborted. The fix: Check if a transaction holding explicit MDL locks is a user transaction in the MDL conflict handling code. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
-
- 26 Mar, 2024 2 commits
-
-
Vladislav Vaintroub authored
Add a "real ip:<ip_or_localhost>" part to the aborted message. This is done only for proxy-protocol connections, so that it does not cause confusion for normal users.
-
Jan Lindström authored
The problem is that not all conflicting transactions have a THD object. Therefore, we must check that the victim has a THD before its identification is added to the victim list, because the victim's thread identification is later requested using the thd_get_thread_id function, which requires a valid pointer to a THD object in trx->mysql_thd. A victim might not have trx->mysql_thd in two cases: (1) An incomplete transaction that was recovered from undo logs on server startup (and not yet rolled back). (2) A transaction that is in XA PREPARE state and whose client connection was disconnected. Neither of these can complete before lock_wait_wsrep() releases lock_sys.latch. (1) trx_t::commit_in_memory() is clearing both trx_t::state and trx_t::is_recovered before it invokes lock_release(trx_t*) (which would be blocked by the exclusive lock_sys.latch that we are holding here). Hence, it is not possible to write a debug assertion to document this scenario. (2) If the transaction is in XA PREPARE state, it would eventually be rolled back and the lock conflict would be resolved when an XA COMMIT or XA ROLLBACK statement is executed in some other connection. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
-
- 25 Mar, 2024 3 commits
-
-
Daniel Black authored
.. is not updating some system tables. Some schema changes from "MDEV-24312 master_host has 60 character limit, increase to 255 bytes" failed to happen in the upgrade for tables in the mysql schema: * mysql.global_priv * mysql.procs_priv * mysql.proxies_priv * mysql.roles_mapping
-
Julius Goryavsky authored
-
Jan Lindström authored
MDEV-32787 : Assertion `!wsrep_has_changes(thd) || (thd->lex->sql_command == SQLCOM_CREATE_TABLE && !thd->is_current_stmt_binlog_format_row()) || thd->wsrep_cs().transaction().state() == wsrep::transaction::s_aborted' failed in void wsrep_commit_empty(THD*, bool) When we commit empty transaction we should allow wsrep transaction to be on s_must_replay state for DDL that was killed during certification. Fix is tested with RQG because deterministic mtr-testcase was not found. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
-
- 22 Mar, 2024 4 commits
-
-
Marko Mäkelä authored
-
Marko Mäkelä authored
MONITOR_INC_VALUE_CUMULATIVE is a multiline macro, so the second statement will be executed always, regardless of "if" condition. These problems first started with commit b1ab211d (MDEV-15053). Thanks to Yury Chaikou from ServiceNow for the report.
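The pitfall is easy to reproduce in isolation. The macros below are hypothetical stand-ins, not the real MONITOR_INC_VALUE_CUMULATIVE definition, and the do/while(0) wrapper is shown as one common remedy rather than as the fix that was actually applied. When a macro expands to two statements and is used as the body of an unbraced "if", only the first statement is guarded.

```cpp
#include <cassert>

// Expands to two separate statements: only the first is guarded by an "if".
#define INC_BOTH_UNSAFE(a, b) ++(a); ++(b)
// Wrapped in do/while(0): behaves as a single statement.
#define INC_BOTH_SAFE(a, b) do { ++(a); ++(b); } while (0)

// Returns the value of the second counter after a conditional increment.
int second_counter_unsafe(bool cond)
{
  int first = 0, second = 0;
  if (cond)
    INC_BOTH_UNSAFE(first, second); // ++second runs even when cond is false
  return second;
}

int second_counter_safe(bool cond)
{
  int first = 0, second = 0;
  if (cond)
    INC_BOTH_SAFE(first, second);
  return second;
}
```

Calling second_counter_unsafe(false) returns 1 even though the condition is false, whereas second_counter_safe(false) returns 0. Adding braces around the "if" body at each call site avoids the problem as well.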
-
Marko Mäkelä authored
From the correctness point of view, it should be safe to release all locks on index records that were not modified by the transaction. Doing so should make the locks after XA PREPARE fully compatible with what would happen if the server were restarted: InnoDB table IX locks and exclusive record locks would be resurrected based on undo log records. Concurrently running transactions that are waiting for a lock may invoke lock_rec_convert_impl_to_expl() to create an explicit record lock object on behalf of the lock-owning transaction so that they can attach their waiting lock request to the explicit record lock object. Explicit locks would be released by trx_t::release_locks() during commit or rollback. Any clustered index record whose DB_TRX_ID belongs to a transaction that is in active or XA PREPARE state will be implicitly locked by that transaction. On XA PREPARE, we can release explicit exclusive locks on records whose DB_TRX_ID does not match the current transaction identifier. lock_rec_unlock_unmodified(): Release record locks that are not implicitly held by the current transaction. lock_release_on_prepare_try(), lock_release_on_prepare(): Invoke lock_rec_unlock_unmodified(). row_trx_id_offset(): Declare non-static. lock_rec_unlock(): Replaces lock_rec_unlock_supremum(). Reviewed by: Vladislav Lesin
-
Marko Mäkelä authored
By design, InnoDB has always hung when permanently running out of buffer pool, for example when several threads are waiting to allocate a block, and all of the buffer pool is buffer-fixed by the active threads. The hang that we are fixing here occurs when the buffer pool is only temporarily running out and the situation could be rescued by writing out some dirty pages or evicting some clean pages. buf_LRU_get_free_block(): Simplify the way how we wait for the buf_flush_page_cleaner thread. This fixes occasional hangs of the test encryption.innochecksum that were introduced by commit a55b951e (MDEV-26827). To play it safe, we use a timed wait when waiting for the buf_flush_page_cleaner() thread to perform its job. Should that thread get stuck, we will invoke buf_pool.LRU_warn() in order to display a message that pages could not be freed, and keep trying to wake up the buf_flush_page_cleaner() thread. The INFORMATION_SCHEMA.INNODB_METRICS counters buffer_LRU_single_flush_failure_count and buffer_LRU_get_free_waits will be removed. The latter is represented by buffer_pool_wait_free. Also removed will be the message "InnoDB: Difficult to find free blocks in the buffer pool" because in d34479dc we introduced a more precise message "InnoDB: Could not free any blocks in the buffer pool" in the buf_flush_page_cleaner thread. buf_pool_t::LRU_warn(): Issue the warning message that we could not free any blocks in the buffer pool. This may also be invoked by buf_LRU_get_free_block() if buf_flush_page_cleaner() appears to be stuck. buf_pool_t::n_flush_dec(): Remove. buf_pool_t::n_flush_dec_holding_mutex(): Rename to n_flush_dec(). buf_flush_LRU_list_batch(): Increment the eviction counter for blocks of temporary, discarded or dropped tablespaces. buf_flush_LRU(): Make static, and remove the constant parameter evict=false. The only caller will be the buf_flush_page_cleaner() thread. IORequest::is_LRU(): Remove. 
The only case of evicting pages on write completion will be when we are writing out pages of the temporary tablespace. Those pages are not in buf_pool.flush_list, only in buf_pool.LRU. buf_page_t::flush(): Remove the parameter evict. buf_page_t::write_complete(): Change the parameter "bool temporary" to "bool persistent" and add a parameter for an already read state(). Reviewed by: Debarun Banerjee
-
- 21 Mar, 2024 1 commit
-
-
Brandon Nesterenko authored
When using semi-sync replication with rpl_semi_sync_master_wait_point=AFTER_COMMIT, the performance of the primary can drop significantly compared to AFTER_SYNC's performance for workloads with many concurrent users executing transactions. This is because all connections on the primary share the same cond_wait variable/mutex pair, so any time an ACK is received from a replica, all waiting connections are awoken to check if the ACK was for itself, which is done in mutual exclusion. This patch changes this such that the waiting THD will use its own local condition variable, and the ACK receiver thread only signals connections which have been ACKed for wakeup. That is, the THD::LOCK_wakeup_ready condition variable is re-used for this purpose, and the Active_tranx queue nodes are extended to hold the waiting thread, so it can be signalled once ACKed. Additionally: 1) Removed part of MDEV-11853 additions, which allowed suspended connection threads awaiting their semi-sync ACKs to live until their ACKs had been received. This part, however, wasn't needed. That is, all that was needed was for the Ack_thread to survive. So now the connection threads are killed during phase 1. Thereby THD::is_awaiting_semisync_ack, and all its related code was removed. 2) COND_binlog_send is repurposed to signal on the condition when Active_tranx is emptied during clear_active_tranx_nodes. 3) At master shutdown (when waiting for slaves), instead of the main loop individually waiting for each ACK, await_slave_reply() (renamed await_all_slave_replies()) just waits once for the repurposed COND_binlog_send to signal it is empty. 4) Test rpl_semi_sync_shutdown_await_ack is updated as follows: 4.1) Added test case (adapted from Kristian Nielsen) to ensure that if a thread awaiting its ACK is killed while SHUTDOWN WAIT FOR ALL SLAVES is issued, the primary will still wait for the ACK from the killed thread. 
4.2) As connections which by-passed phase 1 of thread killing no longer are delayed for kill until phase 2, we can no longer query yes/no tx after receiving an ACK/timeout. The check for these variables is removed. 4.3) Comment descriptions are updated which mention that the connection is alive; and adjusted to be the Ack_thread. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>
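The wake-up scheme described in this commit (one condition variable per waiting connection, signalled only when that connection's transaction has been acknowledged) can be sketched with standard C++ primitives. This is a simplified model under stated assumptions: the class and member names are hypothetical, a single mutex guards the registry, and binlog positions are reduced to plain integers. It is not the server's Active_tranx implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical registry of connections waiting for a replica ACK.
class AckWaiters
{
public:
  // Called by a connection thread: block until `pos` has been acknowledged.
  void wait_for_ack(uint64_t pos)
  {
    Waiter self;
    std::unique_lock<std::mutex> lock(mutex_);
    waiters_.emplace(pos, &self);
    self.cv.wait(lock, [&self] { return self.acked; });
  }

  // Called by the ACK receiver: signal only the waiters at or below `pos`.
  // Returns the number of waiters that were signalled.
  std::size_t ack_up_to(uint64_t pos)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    std::size_t signalled = 0;
    const auto end = waiters_.upper_bound(pos);
    for (auto it = waiters_.begin(); it != end; it = waiters_.erase(it))
    {
      it->second->acked = true;
      it->second->cv.notify_one();
      ++signalled;
    }
    return signalled;
  }

  // Number of connections still waiting.
  std::size_t waiting()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return waiters_.size();
  }

private:
  struct Waiter
  {
    std::condition_variable cv; // private to one waiting connection
    bool acked = false;
  };
  std::mutex mutex_;
  std::multimap<uint64_t, Waiter *> waiters_; // ordered by awaited position
};
```

With a shared condition variable, every ACK would wake every waiter just to re-check its own position under the mutex. Here an ACK for position 10 leaves a connection waiting for position 20 asleep.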
-
- 20 Mar, 2024 1 commit
-
-
Marko Mäkelä authored
https://jepsen.io/analyses/mysql-8.0.34 highlights that the transaction isolation levels in the InnoDB storage engine do not correspond to any widely accepted definitions, such as "Generalized Isolation Level Definitions" https://pmg.csail.mit.edu/papers/icde00.pdf (PL-1 = READ UNCOMMITTED, PL-2 = READ COMMITTED, PL-2.99 = REPEATABLE READ, PL-3 = SERIALIZABLE). Only READ UNCOMMITTED in InnoDB seems to match the above definition. The issue is that InnoDB does not detect write/write conflicts (Section 4.4.3, Definition 6) in the above. It appears that as soon as we implement write/write conflict detection (SET SESSION innodb_snapshot_isolation=ON), the default isolation level (SET TRANSACTION ISOLATION LEVEL REPEATABLE READ) will become Snapshot Isolation (similar to Postgres), as defined in Section 4.2 of "A Critique of ANSI SQL Isolation Levels", MSR-TR-95-51, June 1995 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf Locking reads inside InnoDB used to read the latest committed version, ignoring what should actually be visible to the transaction. The added test innodb.lock_isolation illustrates this. The statement UPDATE t SET a=3 WHERE b=2; is executed in a transaction that was started before a read view or a snapshot of the current transaction was created, and committed before the current transaction attempts to execute UPDATE t SET b=3; If SET innodb_snapshot_isolation=ON is in effect when the second transaction was started, the second transaction will be aborted with the error ER_CHECKREAD. By default (innodb_snapshot_isolation=OFF), the second transaction would execute inconsistently, displaying an incorrect SELECT COUNT(*) FROM t in its read view. If innodb_snapshot_isolation=ON, if an attempt to acquire a lock on a record that does not exist in the current read view is made, an error DB_RECORD_CHANGED (HA_ERR_RECORD_CHANGED, ER_CHECKREAD) will be raised. 
This error will be treated in the same way as a deadlock: the transaction will be rolled back. lock_clust_rec_read_check_and_lock(): If the current transaction has a read view where the record is not visible and innodb_snapshot_isolation=ON, fail before trying to acquire the lock. row_sel_build_committed_vers_for_mysql(): If innodb_snapshot_isolation=ON, disable the "semi-consistent read" logic that had been implemented by myself on the directions of Heikki Tuuri in order to address https://bugs.mysql.com/bug.php?id=3300 that was motivated by a customer wanting UPDATE to skip locked rows that do not match the WHERE condition. It looks like my changes were included in the MySQL 5.1.5 commit ad126d90; at that time, employees of Innobase Oy (a recent acquisition of Oracle) had lost write access to the repository. The only reason why we set innodb_snapshot_isolation=OFF by default is backward compatibility with applications, such as the one that motivated the implementation of "semi-consistent read" back in 2005. In a later major release, we can default to innodb_snapshot_isolation=ON. Thanks to Peter Alvaro, Kyle Kingsbury and Alexey Gotsman for their work on https://github.com/jepsen-io/ and to Kyle and Alexey for explanations and some testing of this fix. Thanks to Vladislav Lesin for the initial test for MDEV-26643, as well as reviewing these changes.
-
- 19 Mar, 2024 4 commits
-
-
Brandon Nesterenko authored
Though the test itself doesn't create any transactions directly, the added test suppressions are replicated, and when the SQL thread is stopped mid-execution, it is set into an error state because these are non-transactional events being aborted. This patch fixes the test by ensuring that the test suppressions are fully replicated before continuing
-
Thirunarayanan Balathandayuthapani authored
Problem:
=======
- In case of a large file size, InnoDB eagerly adds a new extent even though there are many existing unused pages in the segment. The reason is that with a larger file size, the threshold (1/8 of reserved pages) for adding a new extent is reached frequently.

Solution:
=========
- Try to utilise the unused pages in the segment before adding a new extent to the file segment.

need_for_new_extent(): In case of a larger file size, try to use 4 * FSP_EXTENT_SIZE as the threshold for allocating a new extent.

fseg_alloc_free_page_low(): Rewrote the function to allocate the page in the following order:
1) Try to get the page from an existing segment extent.
2) Check whether the segment needs a new extent (need_for_new_extent()), allocate the new extent, and find the page.
3) Take an individual page from the unused pages of the segment or tablespace.
4) Allocate a new extent and take the first page from it.

Removed the FSEG_FILLFACTOR and FSEG_FRAG_LIMIT variables.
-
Vladislav Vaintroub authored
Make WITHOUT_DYNAMIC_PLUGINS ignore Mroonga also in its own DIY version of MYSQL_ADD_PLUGIN
-
Vladislav Vaintroub authored
Use max_connections in the calculation, to prevent a possible deadlock if max_connections is high.
-
- 18 Mar, 2024 5 commits
-
-
Vladislav Vaintroub authored
Do *not* check if socket is closed by another thread. This is race-condition prone, unnecessary, and harmful. VIO state was introduced to debug the errors, not to change the behavior. Rather than checking if socket is closed, add a DBUG_ASSERT that it is *not* closed, because this is an actual logic error, and can potentially lead to all sorts of funny behavior like writing error packets to Innodb files. Unlike closesocket(), shutdown(2) is not actually race-condition prone, and it breaks poll() and read(), and it worked for longer than a decade, and it does not need any state check in the code.
-
Daniel Black authored
Post-fix for 51e3f1da: mariadbd should be the executable name, rather than setting capabilities on a symlink.
-
Marko Mäkelä authored
-
Marko Mäkelä authored
Let us skip the recently added test main.mysql-interactive if an instrumented ncurses library is not available. In InnoDB, let us work around an uninstrumented libnuma, by declaring that the objects returned by numa_get_mems_allowed() are initialized.
-
Marko Mäkelä authored
Starting with clang-16, MemorySanitizer appears to check that uninitialized values not be passed by value nor returned. Previously, it was allowed to copy uninitialized data in such cases. get_foreign_key_info(): Remove a local variable that was passed uninitialized to a function. DsMrr_impl: Initialize key_buffer, because DsMrr_impl::dsmrr_init() is reading it. test_bind_result_ext1(): MYSQL_TYPE_LONG is 32 bits, hence we must use a 32-bit type, such as int. sizeof(long) differs between LP64 and LLP64 targets.
-
- 15 Mar, 2024 5 commits
-
-
Kristian Nielsen authored
maria_repair_parallel() clears the MY_THREAD_SPECIFIC flag for allocations since it uses different threads. But it still did one _ma_alloc_buffer() call as thread-specific which would later assert if another thread needed to extend the buffer with realloc. This patch, due to Monty, removes the MY_THREAD_SPECIFIC flag for allocations that need to realloc in different threads, and preserves it for those that are allocated/freed in the user's thread. Also fixes MDEV-33562. Reviewed-by: Monty <monty@mariadb.org> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
-
Kristian Nielsen authored
Remove work-around that disables bulk insert optimization in replication. The root cause of the original problem is now fixed (MDEV-33475). The bulk insert optimization will still be disabled in replication, though, as it is only enabled in special circumstances meant for loading a mysqldump. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
-
Kristian Nielsen authored
An earlier patch for MDEV-13577 fixed the most common instances of this, but missed one case for tables without primary key when the scan reaches the end of the table. This patch adds similar code to handle this case, converting the error to HA_ERR_RECORD_CHANGED when doing optimistic parallel apply. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
-
Kristian Nielsen authored
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
-
Kristian Nielsen authored
This patch makes the server wait for the manager thread to actually start before proceeding with server startup. Without this, if thread scheduling is really slow and the server shuts down quickly, then it is possible that the manager thread is not yet started when shutdown_performance_schema() is called. If the manager thread starts at just the wrong moment and just before the main server reaches exit(), the thread can try to access performance schema data that is no longer available. This was seen as an occasional assertion failure in the main.bootstrap test. As an additional improvement, make sure to run all pending actions before exiting the manager thread. Reviewed-by: Monty <monty@mariadb.org> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
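Both ideas in this commit (block startup until the background thread is really running, and drain pending work before the thread exits) can be sketched with standard C++ threading. The class below is a hypothetical model, not the server's handle manager; "work" is reduced to a counter of pending actions.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical background manager; all names are illustrative.
class ManagerLike
{
public:
  // Launch the worker and return only after it has signalled that it runs.
  void start()
  {
    worker_ = std::thread([this] { run(); });
    std::unique_lock<std::mutex> lock(mutex_);
    started_cv_.wait(lock, [this] { return started_; });
  }

  // Queue one pending action for the worker.
  void submit()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++pending_;
    }
    work_cv_.notify_one();
  }

  // Ask the worker to exit; it handles all pending actions first.
  void stop()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    work_cv_.notify_one();
    worker_.join();
  }

  // Safe to call after stop(), because join() has synchronized with the worker.
  int handled() const { return handled_; }

private:
  void run()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    started_ = true;
    started_cv_.notify_all();
    for (;;)
    {
      work_cv_.wait(lock, [this] { return pending_ > 0 || stop_; });
      handled_ += pending_; // run everything that is pending, even on stop
      pending_ = 0;
      if (stop_)
        return;
    }
  }

  std::mutex mutex_;
  std::condition_variable started_cv_, work_cv_;
  std::thread worker_;
  bool started_ = false;
  bool stop_ = false;
  int pending_ = 0;
  int handled_ = 0;
};
```

Because start() does not return until the worker has set its started flag, a caller that shuts down immediately afterwards can no longer race with a thread that has been created but has not yet begun to run.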
-
- 14 Mar, 2024 2 commits
-
-
Sergei Golubchik authored
-
Sergei Golubchik authored
-