- 15 Feb, 2022 1 commit
-
-
Vlad Lesin authored
MDEV-20605 Awakened transaction can miss records inserted by other transactions due to wrong persistent cursor restoration

Post-push fix: remove an unstable test. The test was developed to find the cause of the duplicated rows produced by the MDEV-20605 fix; it is no longer needed, as the cause was found and the bug was fixed.
-
- 14 Feb, 2022 7 commits
-
-
Vlad Lesin authored
MDEV-20605 Awakened transaction can miss records inserted by other transactions due to wrong persistent cursor restoration

sel_restore_position_for_mysql() moves the persistent cursor position forward after the btr_pcur_restore_position() call if the cursor's relative position is BTR_PCUR_ON and the cursor points to a record whose field values are NOT the same as in the stored record (plus some other conditions unimportant for this case). This was done because btr_pcur_restore_position() sets the page_cur_mode_t mode to PAGE_CUR_LE for cursor->rel_pos == BTR_PCUR_ON before opening the cursor. So we are searching for a record less than or equal to the stored one, and if the found record is not equal to the stored one, then it is less, and we need to move the cursor forward.

But there can be a situation where the stored record was purged and a new one with the same key but a different value was inserted while row_search_mvcc() was suspended. In this case, when the thread is awakened, it invokes sel_restore_position_for_mysql(), which in turn invokes btr_pcur_restore_position(), which returns false because the found record does not match the stored record, and sel_restore_position_for_mysql() moves the cursor position forward. As a result, the awakened row_search_mvcc() may not see records inserted by other transactions while it slept. The mtr test case shows an example of how this can happen.

The fix is to return a special value from the persistent cursor restore function to notify its caller that the unique fields of the restored record and the stored record are the same; in this case sel_restore_position_for_mysql() does not move the cursor forward. Delete-marked records are correctly processed in row_search_mvcc(). Non-unique secondary indexes are "uniquified" by adding the PK, so index->n_uniq should then be index->n_fields; hence no additional checks are needed in the fix.

If a transaction's read view cannot see the changes made to a secondary index record, row_search_mvcc() requests the clustered index record to check its transaction id and get the corresponding record version. After this, row_search_mvcc() commits its mtr to preserve the clustered index latching order, and starts a new mtr. Between that mtr commit and restart, the secondary index pages are unlatched, and purge can remove the record stored in the cursor, which causes row duplication in the result set for non-locking reads, as the cursor position is restored to a previously visited record. To solve this, the changes are simply switched off for non-locking reads; this is a simple solution, and besides, the changes make no sense for non-locking reads. A more complex solution, better from a performance perspective, is to create an mtr savepoint before requesting the clustered record and roll back to that savepoint afterwards; see MDEV-27557. Another solution is to have a per-record transaction id for secondary indexes; see MDEV-17598. If either of those is implemented, just remove the select_lock_type argument from sel_restore_position_for_mysql().
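To make the restore contract concrete, a minimal sketch assuming a three-valued restore status replaces the old boolean; the enum and helper below are invented for illustration, not the actual patch:

  // Hypothetical simplification of the fix: the restore function reports
  // not just "found / not found" but also whether a record with the same
  // unique fields was found, so the caller knows not to step forward.
  enum class pcur_restore_status {
    SAME_ALL,   // the exact stored record was restored
    SAME_UNIQ,  // a record with the same unique fields was restored
    NOT_SAME    // only a preceding record was found (PAGE_CUR_LE)
  };

  // Caller side, modelled on sel_restore_position_for_mysql():
  bool must_advance_cursor(pcur_restore_status status) {
    // Before the fix, anything other than an exact match moved the
    // cursor forward, skipping a record re-inserted with the same key
    // while the thread slept. After the fix, SAME_UNIQ also keeps the
    // cursor in place; delete-marked records are then handled by
    // row_search_mvcc().
    return status == pcur_restore_status::NOT_SAME;
  }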
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-
- 12 Feb, 2022 4 commits
-
-
Daniel Bartholomew authored
-
Daniel Bartholomew authored
-
Daniel Bartholomew authored
-
Daniel Bartholomew authored
-
- 11 Feb, 2022 4 commits
-
-
Vladislav Vaintroub authored
Fixed inlining flags. Remove /Ob1 added by CMake for RelWithDebInfo (the actual compiler default is /Ob2 if optimizations are enabled). Allow defining a custom /Ob flag with the new variable MSVC_INLINE, if desired.
-
Marko Mäkelä authored
-
Vlad Lesin authored
MDEV-27746 Wrong comparison of a BLOB's empty prefix with a non-prefixed BLOB causes a row-count mismatch between clustered and secondary indexes during non-locking reads

row_sel_sec_rec_is_for_clust_rec() treats an empty BLOB prefix field in a secondary index as equal to any external BLOB field in the clustered index. Row_sel_get_clust_rec_for_mysql::operator() does not zero out the clustered record pointer in row_search_mvcc(), so row_search_mvcc() thinks that the delete-marked secondary index record has an old-versioned clustered index record visible to CHECK TABLE's read view, and row_scan_index_for_mysql() counts it as a row. The fix is to execute row_sel_sec_rec_is_for_blob() in row_sel_sec_rec_is_for_clust_rec() if the clustered field contains a BLOB reference.
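A sketch of the repaired comparison under invented stand-in types; prefix_matches_external_blob() plays the role of row_sel_sec_rec_is_for_blob(), and the real code inspects InnoDB field metadata rather than this struct:

  #include <cstring>

  struct clust_field {
    const unsigned char* data;  // inline data (or BLOB reference)
    size_t len;                 // inline length
    bool extern_stored;         // field holds an external BLOB reference
  };

  // Stub standing in for row_sel_sec_rec_is_for_blob().
  bool prefix_matches_external_blob(const clust_field&,
                                    const unsigned char*, size_t) {
    return false;  // real code fetches the BLOB and compares the prefix
  }

  // When the clustered field is an external BLOB reference, always go
  // through the BLOB comparison, even for an empty secondary-index
  // prefix (which previously compared as "equal" to any external BLOB).
  bool sec_prefix_is_for_clust_field(const clust_field& clust,
                                     const unsigned char* sec,
                                     size_t sec_len) {
    if (clust.extern_stored)
      return prefix_matches_external_blob(clust, sec, sec_len);
    return sec_len <= clust.len &&
           memcmp(sec, clust.data, sec_len) == 0;
  }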
-
Samuel Thibault authored
Fixes the build on GNU/Hurd and kFreeBSD: per the C++ standard, uintptr_t is declared in <cstdint>. ref: https://www.cplusplus.com/reference/cstdint/ Fixes: 0d44792a
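For illustration, the portable C++ spelling:

  #include <cstdint>   // the standard header for uintptr_t in C++

  // std::uintptr_t is the portable name; the unqualified global name
  // is only conditionally provided by <cstdint>.
  static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
                "uintptr_t must be able to hold a pointer value");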
-
- 10 Feb, 2022 10 commits
-
-
Sergei Golubchik authored
-
Sergei Golubchik authored
-
Sergei Golubchik authored
-
Sergei Petrunia authored
The assertion failure was caused by this query:

select /*id=1*/ from t1 where col= ( select /*id=2*/ from ... where corr_cond1 union select /*id=4*/ from ... where corr_cond2)

Here:
- the select with id=2 was correlated due to corr_cond1;
- the select with id=4 was initially correlated due to corr_cond2, but then the optimizer optimized the correlation away, making the select with id=4 uncorrelated.

However, since the select with id=2 remained correlated, the execution had to re-compute the whole UNION. When it tried to execute the select with id=4, it hit an assertion (join buffer already freed). This is because the select with id=4 freed its execution structures after it was executed once: the select is uncorrelated, so it did not expect to be executed a second time.

Fixed this by adding this logic in st_select_lex::optimize_unflattened_subqueries(): if a member of a UNION is correlated, mark all its members as correlated, so that they are prepared to be executed multiple times.
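A minimal sketch of the fix's shape; select_lex here is an invented stand-in, not the server's st_select_lex:

  #include <vector>

  struct select_lex {
    bool is_correlated = false;
    bool keep_exec_structures = false;  // do not free after first run
  };

  void mark_union_correlated(std::vector<select_lex*>& union_members) {
    bool any_correlated = false;
    for (const select_lex* s : union_members)
      any_correlated |= s->is_correlated;
    if (!any_correlated) return;
    // One correlated member forces re-execution of the whole UNION,
    // so all members must retain their execution structures.
    for (select_lex* s : union_members) {
      s->is_correlated = true;
      s->keep_exec_structures = true;
    }
  }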
-
Vladislav Vaintroub authored
Fixed tpool::pread() and tpool::pwrite() to return SSIZE_T on Windows, so that huge numbers are not converted to negatives. Also, make sure to never attempt reading/writing more bytes than a DWORD can accommodate (4G).
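A hedged sketch of the pattern; my_pread() and its shape are illustrative, only ReadFile() and the OVERLAPPED offset encoding are the real Win32 API:

  #ifdef _WIN32
  #include <windows.h>
  #include <algorithm>

  SSIZE_T my_pread(HANDLE file, void* buffer, size_t count,
                   unsigned long long offset) {
    // Never ask the OS for more than a DWORD's worth of bytes,
    // since ReadFile() takes the byte count as a DWORD.
    DWORD to_read =
        static_cast<DWORD>(std::min<size_t>(count, MAXDWORD));
    OVERLAPPED ov{};
    ov.Offset = static_cast<DWORD>(offset);
    ov.OffsetHigh = static_cast<DWORD>(offset >> 32);
    DWORD bytes_read = 0;
    if (!ReadFile(file, buffer, to_read, &bytes_read, &ov))
      return -1;  // a signed return type can report errors as negative
    return static_cast<SSIZE_T>(bytes_read);
  }
  #endif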
-
Sergei Golubchik authored
don't let Aria create a table that it cannot open
-
Sergei Golubchik authored
Use the correct check: before invoking handler methods we need to know that the table was opened, not only created.
-
Oleksandr Byelkin authored
Do not assume that the subquery Item is always present.
-
Sergei Golubchik authored
Fix a debug assert to account for temp tables that were not opened.
-
Monty authored
Removed all dependencies on command-line argument positions in an array (this kind of code should never have been written). Instead, use option names, which are stable. Reviewer: Sergei Golubchik
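An illustrative contrast, with a hypothetical find_option_value() helper; the names and layout are invented:

  #include <string>
  #include <vector>

  // Look an option up by its stable name instead of assuming it sits
  // at a fixed index in the argument array.
  const char* find_option_value(const std::vector<std::string>& args,
                                const char* name) {
    const std::string prefix = std::string("--") + name + "=";
    for (const std::string& arg : args)
      if (arg.compare(0, prefix.size(), prefix) == 0)
        return arg.c_str() + prefix.size();
    return nullptr;
  }

  // Fragile (what the removed code did):  args[7]
  // Stable (what the fix does instead):   find_option_value(args, "datadir")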
-
- 09 Feb, 2022 6 commits
-
-
Marko Mäkelä authored
mtr_t::is_block_dirtied(), mtr_t::memo_push(): Never set m_made_dirty for pages of the temporary tablespace. Ever since commit 5eb53955 we never add those pages to buf_pool.flush_list.

mtr_t::commit(): Implement part of mtr_t::prepare_write() here, and avoid acquiring log_sys.mutex if no log is written. During IMPORT TABLESPACE fixup, we do not write log, but we must add pages to buf_pool.flush_list and, for that, be prepared to acquire log_sys.flush_order_mutex.

mtr_t::do_write(): Replaces mtr_t::prepare_write().
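A hypothetical outline of the resulting commit path; the member and function names follow the message, but this type is an illustration only, not InnoDB's mtr_t:

  struct mtr_sketch {
    bool log_written = false;   // redo log records were generated
    bool dirty_pages = false;   // pages must go to buf_pool.flush_list

    void commit() {
      if (log_written) {
        do_write();   // needs log_sys.mutex (replaces prepare_write())
      } else if (dirty_pages) {
        // IMPORT TABLESPACE fixup: no redo log, but the dirtied pages
        // still need flush-list insertion, so only
        // log_sys.flush_order_mutex is acquired, never log_sys.mutex.
        add_to_flush_list();
      }
    }

    void do_write() { /* write redo log, then add_to_flush_list() */ }
    void add_to_flush_list() { /* take flush_order_mutex, link pages */ }
  };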
-
Oleksandr Byelkin authored
-
Oleksandr Byelkin authored
-
Oleksandr Byelkin authored
-
Oleksandr Byelkin authored
-
Marko Mäkelä authored
The aim of the InnoDB change buffer is to avoid delays when a leaf page of a secondary index is not present in the buffer pool and a record needs to be inserted, delete-marked, or purged. Instead of reading the page into the buffer pool to make such a modification, we may insert a record into the change buffer (a special index tree in the InnoDB system tablespace). The buffered changes are guaranteed to be merged if the index page actually needs to be read later.

The change buffer can be useful when the database is stored on a rotational medium (hard disk), where random seeks are slower than sequential reads or writes. Obviously, the change buffer causes write amplification, due to the potentially large amount of metadata that is written to it: we have to write redo log records for modifying the change buffer tree as well as the user tablespace. Furthermore, in the user tablespace we must maintain a change buffer bitmap page that uses 2 bits for estimating the amount of free space in pages and 1 bit to specify whether buffered changes exist. This bitmap needs to be updated on every operation, which can reduce performance.

Even if the change buffer were free of bugs such as MDEV-24449 (potentially causing the corruption of any page in the system tablespace) or MDEV-26977 (corruption of secondary indexes due to a currently unknown reason), it makes the diagnosis of other data corruption harder. Because of all this, it is best to disable the change buffer by default.
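To make the bitmap bookkeeping cost concrete, a toy encoding of one tracked page's state (a 2-bit free-space estimate plus a buffered-changes flag); this packing is invented for illustration and is not InnoDB's on-disk format:

  #include <cstdint>

  struct ibuf_bitmap_entry {
    uint8_t bits;  // 0b0yxx: xx = free-space estimate, y = buffered flag

    unsigned free_space_estimate() const { return bits & 0x3; }
    bool has_buffered_changes() const { return (bits >> 2) & 0x1; }

    void set(unsigned free_estimate, bool buffered) {
      // Every insert/delete-mark/purge on the page updates this entry,
      // which is the write amplification the message refers to.
      bits = static_cast<uint8_t>((free_estimate & 0x3) |
                                  (buffered ? 0x4 : 0x0));
    }
  };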
-
- 08 Feb, 2022 8 commits
-
-
Daniel Bartholomew authored
-
Daniel Bartholomew authored
-
Daniel Bartholomew authored
-
Daniel Bartholomew authored
-
Monty authored
The problem was that "group_min_max optimization" does not work if some aggregate functions, like COUNT(*), is used. The function get_best_group_min_max() is using the join->sum_funcs array to check which aggregate functions are used. The bug was that aggregates in HAVING where not yet added to join->sum_funcs at the time get_best_group_min_max() was called. Fixed by populate join->sum_funcs already in prepare, which means that all sum functions will be in join->sum_funcs in get_best_group_min_max(). A benefit of this approach is that we can remove several calls to make_sum_func_list() from the code and simplify the function. I removed some wrong setting of 'sort_and_group'. This variable is set when alloc_group_fields() is called, as part of allocating the cache needed by end_send_group() and does not need to be set by other functions. One problematic thing was that Spider is using *join->sum_funcs to detect at which stage the optimizer is and do internal calculations of aggregate functions. Updating join->sum_funcs early caused Spider to fail when trying to find min/max values in opt_sum_query(). Fixed by temporarily resetting sum_funcs during opt_sum_query(). Reviewer: Sergei Petrunia
-
Monty authored
The problem was that get_best_group_min_max() did not check whether fields used by the "group_min_max optimization" were used in subqueries. Because of this, it did not detect that a key (b,a) was used in the WHERE clause of the statement:

SELECT DISTINCT b FROM t1 WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 )

Fixed by also traversing the subqueries when checking whether a field is used. This disables the group_min_max optimization for the above query.

Reviewer: Sergei Petrunia
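A simplified model of the added traversal, with an invented Item tree; the real code walks the server's Item hierarchy:

  #include <vector>

  struct Item {
    int field_id = -1;            // >= 0 when this node is a field
    std::vector<Item*> children;  // arguments, including subqueries
  };

  // Descend into subqueries too, instead of inspecting only the
  // outer WHERE clause.
  bool field_is_used(const Item* item, int field_id) {
    if (!item) return false;
    if (item->field_id == field_id) return true;
    for (const Item* child : item->children)
      if (field_is_used(child, field_id)) return true;
    return false;
  }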
-
Monty authored
MENT-328 wrongly assumed that the backup failed because of mariabackup's warnings about files that were not found. Such warnings are normal (and the error message should be deleted).

randgen failed because mariabackup did not retry BACKUP STAGE BLOCK DDL if it failed with a deadlock. To simplify things, I implemented the retry loop in the server, as this particular deadlock should be quickly resolved.
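A sketch of such a retry loop; try_block_ddl(), is_deadlock(), and the bounds are invented placeholders for the server's internals:

  #include <chrono>
  #include <thread>

  int try_block_ddl();        // 0 on success, else an error code
  bool is_deadlock(int err);  // true for the retryable deadlock error

  bool block_ddl_with_retry() {
    for (int attempt = 0; attempt < 100; ++attempt) {
      int err = try_block_ddl();
      if (err == 0) return true;            // stage acquired
      if (!is_deadlock(err)) return false;  // real failure: give up
      // Expected-to-clear deadlock: back off briefly and retry.
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return false;
  }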
-
Monty authored
-