Commits · ac717eec1ef599945f5da1698f359f803a67cf0e · nexedi / MariaDB

20 Feb, 2023 8 commits
- don't copy stmt IO_CACHE to trx IO_CACHE at the stmt end · ac717eec
  Sergei Golubchik authored May 29, 2022
```
instead use only one (trx) IO_CACHE and truncate it if the
statement is rolled back.

don't use binlog_cache_mngr to accumulate the data,
use binlog_cache_data instead.

(binlog_cache_data owns one IO_CACHE, binlog_cache_mngr owns
two binlog_cache_data's, trx and stmt).
```
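The single-cache idea from this message can be sketched as follows. This is a minimal illustration, not the MariaDB code: `TrxCache` and its methods are hypothetical names, and a `std::string` stands in for the IO_CACHE. The statement start offset is remembered so that a failed statement can be undone by truncation, without a second cache.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the transaction cache (not the IO_CACHE API).
class TrxCache {
public:
  // Remember the offset at which the current statement starts writing.
  void begin_statement() { stmt_start_ = buf_.size(); }

  // Append data produced by the current statement.
  void write(const std::string &data) { buf_ += data; }

  // Statement rolled back: truncate to the saved offset, so that the data
  // of earlier statements in the same transaction is kept.
  void rollback_statement() {
    assert(stmt_start_ <= buf_.size());
    buf_.resize(stmt_start_);
  }

  // Transaction rolled back: discard everything.
  void rollback_transaction() {
    buf_.clear();
    stmt_start_ = 0;
  }

  const std::string &contents() const { return buf_; }

private:
  std::string buf_;
  std::size_t stmt_start_ = 0;
};
```

With this shape, nothing has to be copied at the end of a successful statement: its data is already in the transaction cache.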
- don't do DROP SYSTEM VERSIONING online · 55db8722
  Sergei Golubchik authored Jun 02, 2022
```
because ALTER TABLE ... DROP SYSTEM VERSIONING
is not just a change in the table structure, it also deletes
all historical rows
```
- set read_set early, before row reads · 6bfc841f
  Sergei Golubchik authored May 31, 2022
```
also

* don't modify write_set
* backup/restore rpl_write_set
```
- no ALTER TABLE should return ER_NO_DEFAULT_FOR_FIELD · f4f3f718
  Sergei Golubchik authored May 31, 2022
  
- online alter always uses ALGORITHM=COPY, LOCK=NONE · 12104fb9
  Sergei Golubchik authored May 25, 2022
```
so any other value of ALGORITHM or LOCK disables online alter
```
- remove handler::open_read_view() · 0dc367a4
  Sergei Golubchik authored May 25, 2022
```
use ht->start_consistent_snapshot() instead
```
- cleanup · 1424cbb3
  Sergei Golubchik authored May 24, 2022
```
no functional changes here
```
- support 'alter online table t1 page_checksum=0' · b879325e
  Sergei Golubchik authored May 24, 2022
  
19 Feb, 2023 3 commits
- tests: move around, add new · af38e389
  Sergei Golubchik authored May 25, 2022
```
two new tests:
* alter table times out because of a long concurrent trx
* alter table adds a column in the middle
```
- set table->pos_in_table_list in online alter · f5db54e3
  Nikita Malyavin authored Feb 20, 2023
  
- fix after rebasing onto MDEV-30378 fix · 86179862
  Nikita Malyavin authored Feb 17, 2023
  
17 Feb, 2023 9 commits

MDEV-16329 [5/5] ALTER ONLINE TABLE · 5d633a74

Nikita Malyavin authored Nov 26, 2020

* Log rows in online_alter_binlog.
* Table online data is replicated within a dedicated binlog file.
* Cached data is written on commit.
* Versioning is fully supported.
* Works both with and without binlog enabled.

* For now savepoints setup is forbidden while ONLINE ALTER goes on.
  Extra support is required. We can simply log the SAVEPOINT query events
  and replicate them together with row events. But it's not implemented
  for now.

* Cache flipping:

  We want to care for the possible bottleneck in the online alter binlog
  reading/writing in advance.

  IO_CACHE does not provide anything better than sequential access.
  Besides, only a single write is mutex-protected, which is not suitable,
  since we should write a transaction atomically.

  To solve this, a special layer on top of Event_log is implemented.
  There are two IO_CACHE files underneath: one for reading, and one for
  writing.

  Once the read cache is empty, an exclusive lock is acquired (we may have to
  wait for a currently active transaction to finish writing), and flip() is
  performed, i.e. the write cache is reopened for reading, and the read cache
  is emptied and reopened for writing.

  This resembles the buffer flip that happens in accelerated graphics
  (DirectX/OpenGL/etc).

  Cache_flip_event_log is considered non-blocking for a single reader and a
  single writer in this sense, with the only lock held by reader during flip.

  An alternative approach, implementing a fair concurrent circular buffer,
  is described in MDEV-24676.

* Cache managers:
  We have two cache sinks: statement and transactional.
  It is important that the changes are first cached per-statement and
  per-transaction.
  If a statement fails, then only statement data is rolled back. The
  transaction moves along, however.

  It turns out there's no guarantee that TABLE will persist in
  thd->open_tables until the transaction commits.
  If an error occurs, tables from the statement are purged.
  Therefore, we can't store the caches in TABLE. Ideally, they should be stored
  in the handlerton, but we cut a corner and store them in THD in a list.
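The cache flipping described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Cache_flip_event_log implementation: `FlipLog` and its members are hypothetical names, two `std::vector` buffers stand in for the two IO_CACHE files, and a single mutex stands in for the lock. The writer appends a whole transaction under the lock, and the single reader takes the lock only when its buffer is exhausted and the buffers must be swapped.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical two-buffer log for one reader and one writer.
class FlipLog {
public:
  // Writer side: append one whole transaction under the lock, so that a
  // flip can never observe half of it.
  void write_transaction(const std::vector<std::string> &events) {
    std::lock_guard<std::mutex> guard(lock_);
    write_buf_.insert(write_buf_.end(), events.begin(), events.end());
  }

  // Reader side: fetch the next event. Returns false if nothing is
  // available, even after trying a flip.
  bool read(std::string &out) {
    if (read_pos_ == read_buf_.size() && !flip())
      return false;
    out = read_buf_[read_pos_++];
    return true;
  }

private:
  // Empty the exhausted read buffer and swap it with the write buffer.
  // This is the only place where the reader takes the lock.
  bool flip() {
    std::lock_guard<std::mutex> guard(lock_);
    read_buf_.clear();
    read_pos_ = 0;
    read_buf_.swap(write_buf_);
    return !read_buf_.empty();
  }

  std::mutex lock_;
  std::vector<std::string> read_buf_;   // touched only by the single reader
  std::vector<std::string> write_buf_;  // guarded by lock_
  std::size_t read_pos_ = 0;
};
```

Events come out in the order in which the transactions were written, because each flip hands the reader everything written since the previous flip.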


MDEV-16329 [4/5] Refactor MYSQL_BIN_LOG: extract Event_log ancestor · 3b0c2cf1

Nikita Malyavin authored Dec 12, 2021

Event_log is supposed to be a basic logging class that can write events in
a single file.

MYSQL_BIN_LOG in comparison will have:
* rotation support
* index files
* purging
* gtid and transactional information handling.
* a dedicated role as the general-purpose binlog


MDEV-16329 [3/5] use binlog_cache_data directly in most places · 40a0d3c0

Nikita Malyavin authored Nov 25, 2020

* Eliminate most usages of THD::use_trans_table. Only 3 left, and they are
  at quite high levels, and really essential.
* Eliminate is_transactional argument when possible. Lots of places are
  left though, because of some WSREP error handling in
  MYSQL_BIN_LOG::set_write_error.
* Remove junk binlog functions from THD
* binlog_prepare_pending_rows_event is moved to log.cc inside MYSQL_BIN_LOG
  and is no longer a template. Instead it accepts an event factory with a type
  code and a callback to a constructing function in it.


MDEV-16329 [2/5] refactor binlog and cache_mngr · 9668d9dd
Nikita Malyavin authored Oct 06, 2020
```
pump up binlog and cache manager to level of binlog_log_row_internal
```
MDEV-16329 [1/5] add THD::binlog_get_cache_mngr · d811a812
Nikita Malyavin authored Mar 04, 2020


rpl: repack table_def · dd7cb7cd

Nikita Malyavin authored Dec 13, 2021

1. Change m_size to uint. This removes some implicit conversions.
  See unpack_row, for instance:
  uint max_cols= MY_MIN(tabledef->size(), cols->n_bits);
2. Improve table_def memory layout by reordering columns
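The effect of reordering members on memory layout can be illustrated with a small sketch. The two structs below are hypothetical and do not list the real table_def members. They only show how grouping members by decreasing alignment removes padding on typical ABIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout with small members interleaved between pointers:
// the compiler typically pads after each one-byte member.
struct Loose {
  std::uint8_t  flag_a;
  void         *types;
  std::uint8_t  flag_b;
  void         *metadata;
  std::uint32_t size;
};

// Same members, ordered from the largest alignment to the smallest.
struct Packed {
  void         *types;
  void         *metadata;
  std::uint32_t size;
  std::uint8_t  flag_a;
  std::uint8_t  flag_b;
};

// Exact sizes are ABI-dependent, so only the relation is checked.
static_assert(sizeof(Packed) <= sizeof(Loose),
              "reordering should not grow the struct");
```

The exact sizes depend on the platform, which is why the sketch asserts only that the reordered struct is not larger.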


Copy_field: add const to arguments · f4dea91e
Nikita Malyavin authored Nov 26, 2020


rename tests · fd1a63aa

Sergei Golubchik authored May 25, 2022

alter_table_online -> alter_table_locknone
gis-alter_table_online -> gis-alter_table


binlog_combinations.inc -> binlog_format_combinations.inc · 7e4ada00
Sergei Golubchik authored May 24, 2022


16 Feb, 2023 16 commits

fix for --view-protocol · da114c70
Sergei Golubchik authored Feb 16, 2023

MDEV-22570 fixup: Silence clang -Wunneeded-internal-declaration · a6874341
Marko Mäkelä authored Feb 16, 2023

Merge 10.11 into 11.0 · 2e431ff7
Marko Mäkelä authored Feb 16, 2023

Merge 10.10 into 10.11 · 1fd00998
Marko Mäkelä authored Feb 16, 2023

Merge 10.9 into 10.10 · 345356b8
Marko Mäkelä authored Feb 16, 2023

Merge 10.8 into 10.9 · 0d55914d
Marko Mäkelä authored Feb 16, 2023

Merge 10.6 into 10.8 · b12cd88c
Marko Mäkelä authored Feb 16, 2023

Merge 10.5 into 10.6 · 67a6ad0a
Marko Mäkelä authored Feb 16, 2023

MDEV-30552 fixup: Fix the test for non-debug · d3f35aa4
Marko Mäkelä authored Feb 16, 2023

Fix clang -Winconsistent-missing-override · 0c79ae94
Marko Mäkelä authored Feb 16, 2023

MDEV-27774 fixup: Correct a comment · 34f0433c
Marko Mäkelä authored Feb 16, 2023

Merge 10.6 into 10.8 · 5abbe092
Marko Mäkelä authored Feb 16, 2023


MDEV-30638 Deadlock between INSERT and InnoDB non-persistent statistics update · 201cfc33

Marko Mäkelä authored Feb 16, 2023

This is a partial revert of
commit 8b6a308e (MDEV-29883)
and a follow-up to the
merge commit 394fc71f (MDEV-24569).

The latching order related to any operation that accesses the allocation
metadata of an InnoDB index tree is as follows:

1. Acquire dict_index_t::lock in non-shared mode.
2. Acquire the index root page latch in non-shared mode.
3. Possibly acquire further index page latches. Unless an exclusive
dict_index_t::lock is held, this must follow the root-to-leaf,
left-to-right order.
4. Acquire a *non-shared* fil_space_t::latch.
5. Acquire latches on the allocation metadata pages.
6. Possibly allocate and write some pages, or free some pages.
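The point of a fixed latching order can be sketched with standard C++ locks. This is only an illustration: `Index`, `Tablespace` and `read_reserved_pages` are hypothetical names, not InnoDB code. `std::shared_mutex` offers only shared and exclusive modes, so the sketch takes every non-shared latch exclusively. Because every caller acquires the latches in the same order (index lock, root latch, tablespace latch), two threads working on different indexes in the same tablespace cannot wait for each other in a cycle.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-ins for the InnoDB objects named in the list above.
struct Tablespace {
  std::shared_mutex latch;
};

struct Index {
  std::shared_mutex lock;
  std::shared_mutex root_latch;
  unsigned reserved_pages = 0;  // pretend allocation metadata
};

// Read the pretend allocation metadata, following the documented order.
unsigned read_reserved_pages(Index &index, Tablespace &space) {
  std::unique_lock<std::shared_mutex> index_lock(index.lock);        // step 1
  std::unique_lock<std::shared_mutex> root_latch(index.root_latch);  // step 2
  // Step 3 (further index page latches) is omitted in this sketch.
  std::unique_lock<std::shared_mutex> space_latch(space.latch);      // step 4
  return index.reserved_pages;                                       // step 5
}
```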

btr_get_size_and_reserved(), dict_stats_update_transient_for_index(),
dict_stats_analyze_index(): Acquire an exclusive fil_space_t::latch
in order to avoid a deadlock in fseg_n_reserved_pages() in case of
concurrent access to multiple indexes sharing the same "inode page".

fseg_page_is_allocated(): Acquire an exclusive fil_space_t::latch
in order to avoid deadlocks. All callers are holding latches
on a buffer pool page, or an index, or both.
Before commit edbde4a1 (MDEV-24167)
a third mode was available that would not conflict with the shared
fil_space_t::latch acquired by ha_innobase::info_low(),
i_s_sys_tablespaces_fill_table(),
or i_s_tablespaces_encryption_fill_table().
Because those calls should be rather rare, it makes sense to use
the simple rw_lock with only shared and exclusive modes.

fil_crypt_get_page_throttle(): Avoid invoking fseg_page_is_allocated()
on an allocation bitmap page (which can never be freed), to avoid
acquiring a shared latch on top of an exclusive one.

mtr_t::s_lock_space(), MTR_MEMO_SPACE_S_LOCK: Remove.


MDEV-30134 Assertion failed in buf_page_t::unfix() in buf_pool_t::watch_unset() · 54c0ac72

Marko Mäkelä authored Feb 16, 2023

buf_pool_t::watch_set(): Always buffer-fix a block if one was found,
no matter if it is a watch sentinel or a buffer page. The type of
the block descriptor will be rechecked in buf_page_t::watch_unset().
Do not expect the caller to acquire the page hash latch. Starting with
commit bd5a6403 it is safe to release
buf_pool.mutex before acquiring a buf_pool.page_hash latch.

buf_page_get_low(): Adjust to the changed buf_pool_t::watch_set().

This simplifies the logic and fixes a bug that was reproduced when
using debug builds and the setting innodb_change_buffering_debug=1.


MDEV-30397: MariaDB crash due to DB_FAIL reported for a corrupted page · 9c157994

Marko Mäkelä authored Feb 16, 2023

buf_read_page_low(): Map the buf_page_t::read_complete() return
value DB_FAIL to DB_PAGE_CORRUPTED. The purpose of the DB_FAIL
return value is to avoid error log noise when read-ahead brings
in an unused page that is typically filled with NUL bytes.

If a synchronous read is bringing in a corrupted page where the
page frame does not contain the expected tablespace identifier and
page number, that must be treated as an attempt to read a corrupted
page. The correct error code for this is DB_PAGE_CORRUPTED.
The error code DB_FAIL is not handled by row_mysql_handle_errors().

This was missed in commit 0b47c126
(MDEV-13542).
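The mapping described in this message amounts to a one-line translation. The sketch below uses a hypothetical `ReadStatus` enum and function name in place of the InnoDB error codes and of buf_read_page_low(). It only illustrates the rule that a "fail" result from read completion is reported as page corruption.

```cpp
#include <cassert>

// Hypothetical subset of error codes, standing in for the InnoDB ones.
enum class ReadStatus { Success, Fail, PageCorrupted };

// Translate the read-completion result: Fail is a quiet, internal result
// that callers further up do not handle, so report it as a corrupted page.
ReadStatus map_read_result(ReadStatus completed) {
  return completed == ReadStatus::Fail ? ReadStatus::PageCorrupted
                                       : completed;
}
```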


Merge 10.5 into 10.6 · cc27e5fd
Marko Mäkelä authored Feb 16, 2023


15 Feb, 2023 4 commits

MDEV-30318: galera error messages in mariadb log without galera enabled · 80b4fa54

Julius Goryavsky authored Feb 15, 2023

Post-fix to MDEV-30318 and MDEV-22570-related changes:
unify the handling of wsrep_provider in the code so that "none"
is matched case-insensitively everywhere and an empty string
is accepted everywhere.
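A single helper along the following lines would give one consistent check. The function name is hypothetical, and treating an empty string the same as "none" is an assumption of this sketch, not something the message spells out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if no provider is configured.
// Assumption: an empty string and "none" (in any case) both mean "unset".
bool provider_is_unset(const std::string &provider) {
  static const std::string none = "none";
  if (provider.empty())
    return true;
  if (provider.size() != none.size())
    return false;
  for (std::size_t i = 0; i < none.size(); ++i) {
    // Cast to unsigned char before tolower to avoid undefined behaviour
    // for negative char values.
    if (std::tolower(static_cast<unsigned char>(provider[i])) != none[i])
      return false;
  }
  return true;
}
```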


MDEV-30657 InnoDB: Not applying UNDO_APPEND due to corruption · 5300c0fb

Marko Mäkelä authored Feb 15, 2023

This almost completely reverts
commit acd23da4 and
retains a safe optimization:

recv_sys_t::parse(): Remove any old redo log records for the
truncated tablespace, to free up memory earlier.
If recovery consists of multiple batches, then recv_sys_t::apply()
must invoke recv_sys_t::trim() again to avoid wrongly
applying old log records to an already truncated undo tablespace.


MDEV-30324: Wrong result upon SELECT DISTINCT ... WITH TIES · 4afa3b64

Vicențiu Ciorbaru authored Feb 15, 2023

WITH TIES would not take effect if SELECT DISTINCT was used in a
context where an INDEX is used to resolve the ORDER BY clause.

WITH TIES relies on the `JOIN::order` to contain the non-constant
fields to test the equality of ORDER BY fields required for WITH TIES.

The cause of the problem was a premature removal of the `JOIN::order`
member during a DISTINCT optimization. This led to WITH TIES code assuming
ORDER BY only contained "constant" elements.

Disable this optimization when WITH TIES is in effect.

(side-note: the order by removal does not impact any current tests, thus
it will be removed in a future version)

Reviewed by: monty@mariadb.org
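For readers unfamiliar with WITH TIES, its semantics can be sketched on an already sorted input. This is an illustration only, with a hypothetical function name and plain integers as the ORDER BY key. It is not the server's implementation. After the limit is reached, rows keep being returned while their key equals the key of the last row kept, which is why the comparison needs the non-constant ORDER BY fields.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the first `limit` keys of a sorted input, plus any following
// keys that tie with the last one kept.
std::vector<int> fetch_first_with_ties(const std::vector<int> &sorted,
                                       std::size_t limit) {
  std::vector<int> out;
  for (int key : sorted) {
    // Past the limit, continue only while the key ties with the last row.
    if (out.size() >= limit && (out.empty() || key != out.back()))
      break;
    out.push_back(key);
  }
  return out;
}
```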


Whitespace fix · d2b773d9
Vicențiu Ciorbaru authored Feb 04, 2023
