Commits · 21c15cc20b3f17d10863ade48488e6fa7fff801a · nexedi / MariaDB

21 Oct, 2020 8 commits

Fixed compiler warning in connect/filemazip.cpp · 21c15cc2
Monty authored Oct 16, 2020

21c15cc2
Fixed typo in mtr_cases.pm · 2912f1f8
Monty authored Oct 16, 2020

2912f1f8

Trivial fixups, no code changes · 3c4b8440

Monty authored Sep 13, 2020

- Indentation changes
- Fixed wrong name used in DBUG_ENTER
- Added some code comments

3c4b8440

Update S3 engine to maturity Gamma · dd757ee0
Monty authored Sep 10, 2020

dd757ee0

MDEV-23730 s3.replication_partition 'innodb,mix' segv · 2c8c1548

Monty authored Oct 16, 2020

This failure was caused by several bugs:
- Someone had removed s3-slave-ignore-updates=1 from slave.cnf, which
  caused the slave to remove files that the master was working on.
- Bug in ha_partition::change_partitions() that didn't reset m_new_file
  in case of errors. This caused crashes in ha_maria::extra() as the
  maria handler was called on files that were already closed.
- In ma_pagecache there was a bug: when one got a read error on a
  big block (s3 block), it left the flag PCBLOCK_BIG_READ on for the page,
  which caused an assert when the page was flushed.
- Flush all cached tables in case of ignored ALTER TABLE

Note that when merging code from 10.3, that fixes the partition bug, use
the code from this patch instead.

Changes to ma_pagecache.cc written or reviewed by Sanja

2c8c1548

MDEV-23691 S3 storage engine: delayed slave can drop the table · 71d263a1

Monty authored Sep 13, 2020

This commit fixed the problems with S3 after the "DROP TABLE FORCE" changes.
It also fixes all failing replication S3 tests.

A slave is delayed if it is trying to execute replicated queries on a
table that is already converted to S3 by the master later in the binlog.

Fixes for replication events on S3 tables for delayed slaves:
- INSERT and INSERT ... SELECT and CREATE TABLE are ignored but written
  to the binary log.   UPDATE & DELETE will be fixed in a future commit.

Other things:
- On slaves with --s3-slave-ignore-updates set, allow S3 tables to be
  opened in read-write mode. This was done to be able to
  ignore-but-replicate queries like insert.  Without this change any
  open of an S3 table failed with 'Table is read only' which is too
  early to be able to replicate the original query.
- Errors are now printed if handler::extra() call fails in
  wait_while_tables_are_used().
- Error message for row changes is changed from HA_ERR_WRONG_COMMAND
  to HA_ERR_TABLE_READONLY.
- Disable some maria_extra() calls for S3 tables. This could cause
  S3 tables to fail in some cases.
- Added missing thr_lock_delete() to ma_open() in case of failure.
- Removed from mysql_prepare_insert() the not needed argument 'table'.

71d263a1

Added wait-for-pos-timeout=NUM argument to mtr · 5902d5e0
Monty authored Sep 13, 2020
```
Other things:
- Updated help text for --gdb
```
5902d5e0
Disable from valgrind big innodb tests that don't run well in valgrind · f9c432c5
Monty authored Sep 10, 2020

f9c432c5

20 Oct, 2020 3 commits
- MCS engine ref update · edfeb129
  Roman Nozdrin authored Oct 20, 2020
  
  edfeb129
- MDEV-19275 Provide SQL service to plugins. · e3fc9c1d
  Alexey Botchkov authored Oct 20, 2020
```
Duplicating lines removed from the debian script.
```
  e3fc9c1d
- MDEV-23852 alter table rename column to uppercase doesn't work · d1667fb8
  Aleksey Midenkov authored Oct 20, 2020
```
Case-sensitive compare to detect column name case change in inplace
alter rename.
```
  d1667fb8
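
The idea in MDEV-23852 can be illustrated with a minimal sketch: a rename that only changes letter case is invisible to a case-insensitive comparison, so a case-sensitive comparison is also needed to notice it. The helper names below are hypothetical and the comparison is ASCII-only; this is not the server's actual code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: ASCII-only case-insensitive equality.
inline bool equal_ignore_case(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i])))
      return false;
  }
  return true;
}

// Hypothetical helper: true when the two names denote the same column
// (case-insensitively) but differ in letter case, i.e. the rename must
// still be applied.
inline bool is_case_only_rename(const std::string& old_name,
                                const std::string& new_name) {
  return equal_ignore_case(old_name, new_name) && old_name != new_name;
}
```

With only the case-insensitive check, renaming `abc` to `ABC` would look like "no change" and be skipped.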
19 Oct, 2020 1 commit
- MDEV-19275 Provide SQL service to plugins. · 5ca14daf
  Alexey Botchkov authored Oct 19, 2020
```
Debian scripts fixed.
```
  5ca14daf
16 Oct, 2020 3 commits

MDEV-23399 fixup: Remove double-free of a buffer page · 6dc037a9

Marko Mäkelä authored Oct 16, 2020

In commit 7cffb5f6 we changed the
interface of buf_page_create() so that the free_block is allocated
by the caller. Both calls to buf_LRU_block_free_non_file_page()
should have been removed.

This caused an assertion failure 'block->page.state() == BUF_BLOCK_MEMORY'
in buf_LRU_block_free_non_file_page().

The bug only affected ROW_FORMAT=COMPRESSED pages.

6dc037a9

MDEV-23973 Change buffer corruption when reallocating a recently freed page · bdbec5a2

Marko Mäkelä authored Oct 16, 2020

After commit abb678b6
(a follow-up fix to MDEV-19514 to prevent potential hangs)
and MDEV-23399, the probability for hitting a dormant bug
that is related to MDEV-19514 was increased.

buf_page_create(): Call ibuf_merge_or_delete_for_page() also
when reusing a previously freed page.

Reviewed by: Thirunarayanan Balathandayuthapani

bdbec5a2

Fixup · a0113683

Marko Mäkelä authored Oct 16, 2020

We forgot to change innodb_autoextend_increment from ULONG to
UINT (always 32-bit) in Mariabackup.

a0113683

15 Oct, 2020 9 commits

Cleanup: Make InnoDB page numbers uint32_t · 9028cc6b

Marko Mäkelä authored Oct 15, 2020

InnoDB stores a 32-bit page number in page headers and in some
data structures, such as FIL_ADDR (consisting of a 32-bit page number
and a 16-bit byte offset within a page). For better compile-time
error detection and to reduce the memory footprint in some data
structures, let us use a uint32_t for the page number, instead
of ulint (size_t) which can be 64 bits.
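
As a rough illustration of the memory footprint argument, the sketch below compares two hypothetical address structs, one holding the page number in a size_t and one in a uint32_t. The struct names and layout are assumptions for illustration and are not the InnoDB definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: page number stored in a size_t (ulint-like).
struct FilAddrWide {
  std::size_t page;
  std::uint16_t boffset;
};

// Hypothetical stand-in: page number stored in a fixed 32-bit type.
struct FilAddrNarrow {
  std::uint32_t page;
  std::uint16_t boffset;
};

// The narrow form is never larger; where size_t is 64 bits it is
// typically half the size once padding is taken into account.
static_assert(sizeof(FilAddrNarrow) <= sizeof(FilAddrWide),
              "narrow form must not be larger");

inline FilAddrNarrow make_addr(std::uint32_t page, std::uint16_t boffset) {
  return FilAddrNarrow{page, boffset};
}
```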

9028cc6b

Cleanup: Remove export_vars.innodb_num_open_files · 61161d51
Marko Mäkelä authored Oct 15, 2020

61161d51
Cleanup: Compare page_id_t directly · ecb913c2
Marko Mäkelä authored Oct 15, 2020

ecb913c2

MDEV-19514 fixup: Simplify buf_page_read_complete() · abb678b6

Marko Mäkelä authored Oct 15, 2020

False positives for buf_page_t::ibuf_exist are acceptable,
because it does not hurt to unnecessarily invoke
ibuf_merge_or_delete_for_page().

Invoking buf_page_get_gen() in a read completion function
is a definite no-no, because it could trigger a page flush
or cause the server to run out of buffer pool.

With some MDEV-23855 changes present, the test innodb.purge_secondary
occasionally failed due to the table having been dropped while
ibuf_page_exists() invoked buf_page_get_gen().

Reviewed by: Thirunarayanan Balathandayuthapani

abb678b6

MDEV-23399: Performance regression with write workloads · 7cffb5f6

Marko Mäkelä authored Oct 15, 2020

The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted
the performance bottleneck to the page flushing.

The configuration parameters will be changed as follows:

innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction)
innodb_lru_scan_depth=1536 (old: 1024)
innodb_max_dirty_pages_pct=90 (old: 75)
innodb_max_dirty_pages_pct_lwm=75 (old: 0)

Note: The parameter innodb_lru_scan_depth will only affect LRU
eviction of buffer pool pages when a new page is being allocated. The
page cleaner thread will no longer evict any pages. It used to
guarantee that some pages will remain free in the buffer pool. Now, we
perform that eviction 'on demand' in buf_LRU_get_free_block().
The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows:
 * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks()
 * As a buf_pool.free limit in buf_LRU_list_batch() for terminating
   the flushing that is initiated e.g., by buf_LRU_get_free_block()
The parameter also used to serve as an initial limit for unzip_LRU
eviction (evicting uncompressed page frames while retaining
ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit
of 100 or unlimited for invoking buf_LRU_scan_and_free_block().

The status variables will be changed as follows:

innodb_buffer_pool_pages_flushed: This includes also the count of
innodb_buffer_pool_pages_LRU_flushed and should work reliably,
updated one by one in buf_flush_page() to give more real-time
statistics. The function buf_flush_stats(), which we are removing,
was not called in every code path. For both counters, we will use
regular variables that are incremented in a critical section of
buf_pool.mutex. Note that show_innodb_vars() directly links to the
variables, and reads of the counters will *not* be protected by
buf_pool.mutex, so you cannot get a consistent snapshot of both variables.

The following INFORMATION_SCHEMA.INNODB_METRICS counters will be
removed, because the page cleaner no longer deals with writing or
evicting least recently used pages, and because the single-page writes
have been removed:
* buffer_LRU_batch_flush_avg_time_slot
* buffer_LRU_batch_flush_avg_time_thread
* buffer_LRU_batch_flush_avg_time_est
* buffer_LRU_batch_flush_avg_pass
* buffer_LRU_single_flush_scanned
* buffer_LRU_single_flush_num_scan
* buffer_LRU_single_flush_scanned_per_call

When moving to a single buffer pool instance in MDEV-15058, we missed
some opportunity to simplify the buf_flush_page_cleaner thread. It was
unnecessarily using a mutex and some complex data structures, even
though we always have a single page cleaner thread.

Furthermore, the buf_flush_page_cleaner thread had separate 'recovery'
and 'shutdown' modes where it was waiting to be triggered by some
other thread, adding unnecessary latency and potential for hangs in
relatively rarely executed startup or shutdown code.

The page cleaner was also running two kinds of batches in an
interleaved fashion: "LRU flush" (writing out some least recently used
pages and evicting them on write completion) and the normal batches
that aim to increase the MIN(oldest_modification) in the buffer pool,
to help the log checkpoint advance.

The buf_pool.flush_list flushing was being blocked by
buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN
of a page is ahead of log_sys.get_flushed_lsn(), that is, what has
been persistently written to the redo log, we would trigger a log
flush and then resume the page flushing. This would unnecessarily
limit the performance of the page cleaner thread and trigger the
infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms.
The settings might not be optimal" that were suppressed in
commit d1ab8903 unless log_warnings>2.

Our revised algorithm will make log_sys.get_flushed_lsn() advance at
the start of buf_flush_lists(), and then execute a 'best effort' to
write out all pages. The flush batches will skip pages that were modified
since the log was written, or are currently exclusively locked.
The MDEV-13670 message "page_cleaner: 1000ms intended loop took"
will be removed, because by design, the buf_flush_page_cleaner() should
not be blocked during a batch for extended periods of time.
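
The 'best effort' batch described above can be sketched as follows. This is a simplified model, assuming a hypothetical `Page` struct and a plain vector in place of the real buffer pool lists; it only shows the skip conditions, not actual I/O.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical page descriptor for illustration only.
struct Page {
  std::uint64_t page_lsn;  // LSN of the latest modification
  bool x_latched;          // currently exclusively locked by a writer
  bool written;            // already written out by an earlier batch
};

// Writes every page that can be written without waiting and returns the
// number of pages written. Pages modified after flushed_lsn, or currently
// exclusively latched, are skipped and left for a later batch.
inline std::size_t flush_best_effort(std::vector<Page>& pages,
                                     std::uint64_t flushed_lsn) {
  std::size_t n = 0;
  for (Page& p : pages) {
    if (p.written) continue;
    if (p.page_lsn > flushed_lsn || p.x_latched) continue;  // skip, no wait
    p.written = true;
    ++n;
  }
  return n;
}
```

Because the batch never waits, its duration is bounded by the write work itself, which is why a "loop took too long" warning would no longer be meaningful.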

We will remove the single-page flushing altogether. Related to this,
the debug parameter innodb_doublewrite_batch_size will be removed,
because all of the doublewrite buffer will be used for flushing
batches. If a page needs to be evicted from the buffer pool and all
100 least recently used pages in the buffer pool have unflushed
changes, buf_LRU_get_free_block() will execute buf_flush_lists() to
write out and evict innodb_lru_flush_size pages. At most one thread
will execute buf_flush_lists() in buf_LRU_get_free_block(); other
threads will wait for that LRU flushing batch to finish.
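
The "at most one thread flushes, the others wait" rule can be sketched with a standard mutex and condition variable. The class and member names below are hypothetical; the real code uses buf_pool.mutex and its own counters and condition variables.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical gate that lets one caller run an LRU flush batch while
// concurrent callers wait for that batch to finish.
class LruFlushGate {
  std::mutex m_;
  std::condition_variable done_;
  bool active_ = false;
  unsigned batches_ = 0;

 public:
  // Returns true if this caller ran the batch, false if it only waited
  // for a batch that another caller was already running.
  template <typename F>
  bool flush_or_wait(F run_batch) {
    std::unique_lock<std::mutex> lk(m_);
    if (active_) {
      done_.wait(lk, [this] { return !active_; });
      return false;
    }
    active_ = true;
    lk.unlock();
    run_batch();  // performed without holding the mutex
    lk.lock();
    active_ = false;
    ++batches_;
    lk.unlock();
    done_.notify_all();
    return true;
  }

  unsigned batch_count() {
    std::lock_guard<std::mutex> g(m_);
    return batches_;
  }
};
```

A waiter returns as soon as the running batch completes and can then rescan for a free block instead of starting a second batch of its own.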

To improve concurrency, we will replace the InnoDB ib_mutex_t and
os_event_t with native mutexes and condition variables in this area of code.
Most notably, this means that the buffer pool mutex (buf_pool.mutex)
is no longer instrumented via any InnoDB interfaces. It will continue
to be instrumented via PERFORMANCE_SCHEMA.

For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be
declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical
sections of buf_pool.flush_list_mutex should be shorter than those for
buf_pool.mutex, because in the worst case, they cover a linear scan of
buf_pool.flush_list, while the worst case of a critical section of
buf_pool.mutex covers a linear scan of the potentially much longer
buf_pool.LRU list.

mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable
with SAFE_MUTEX. Some InnoDB debug assertions need this predicate
instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner().

buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list:
Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[].
The number of active flush operations.

buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t
instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA
and SAFE_MUTEX instrumentation.

buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU.

buf_pool_t::done_flush_list: Condition variable for !n_flush_list.

buf_pool_t::do_flush_list: Condition variable to wake up the
buf_flush_page_cleaner when a log checkpoint needs to be written
or the server is being shut down. Replaces buf_flush_event.
We will keep using timed waits (the page cleaner thread will wake
_at least_ once per second), because the calculations for
innodb_adaptive_flushing depend on fixed time intervals.

buf_dblwr: Allocate statically, and move all code to member functions.
Use a native mutex and condition variable. Remove code to deal with
single-page flushing.

buf_dblwr_check_block(): Make the check debug-only. We were spending
a significant amount of execution time in page_simple_validate_new().

flush_counters_t::unzip_LRU_evicted: Remove.

IORequest: Make more members const. FIXME: m_fil_node should be removed.

buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex
(which we are removing).
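
As an illustration of protecting such a value with std::atomic instead of a mutex, the sketch below raises a target LSN to at least a requested value with a compare-and-swap loop. The variable and function names are hypothetical, and the "only ever raise the target" policy is an assumption made for the example, not something stated in this commit.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a flush target LSN shared between threads.
std::atomic<std::uint64_t> sync_lsn_target{0};

// Raise the target to at least lsn without taking any mutex.
// On a failed compare-exchange, cur is reloaded with the current value
// and the loop condition is evaluated again.
inline void request_sync_up_to(std::uint64_t lsn) {
  std::uint64_t cur = sync_lsn_target.load(std::memory_order_relaxed);
  while (cur < lsn &&
         !sync_lsn_target.compare_exchange_weak(cur, lsn,
                                                std::memory_order_acq_rel,
                                                std::memory_order_relaxed)) {
  }
}
```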

page_cleaner_slot_t, page_cleaner_t: Remove many redundant members.

pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot().

recv_writer_thread: Remove. Recovery works just fine without it, if we
simply invoke buf_flush_sync() at the end of each batch in
recv_sys_t::apply().

recv_recovery_from_checkpoint_finish(): Remove. We can simply call
recv_sys.debug_free() directly.

srv_started_redo: Replaces srv_start_state.

SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown()
can communicate with the normal page cleaner loop via the new function
flush_buffer_pool().

buf_flush_remove(): Assert that the calling thread is holding
buf_pool.flush_list_mutex. This removes unnecessary mutex operations
from buf_flush_remove_pages() and buf_flush_dirty_pages(),
which replace buf_LRU_flush_or_remove_pages().

buf_flush_lists(): Renamed from buf_flush_batch(), with simplified
interface. Return the number of flushed pages. Clarified comments and
renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions
buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this
function, which was their only caller, and remove 2 unnecessary
buf_pool.mutex release/re-acquisition that we used to perform around
the buf_flush_batch() call. At the start, if not all log has been
durably written, wait for a background task to do it, or start a new
task to do it. This allows the log write to run concurrently with our
page flushing batch. Any pages that were skipped due to too recent
FIL_PAGE_LSN or due to them being latched by a writer should be flushed
during the next batch, unless there are further modifications to those
pages. It is possible that a page that we must flush due to small
oldest_modification also carries a recent FIL_PAGE_LSN or is being
constantly modified. In the worst case, all writers would then end up
waiting in log_free_check() to allow the flushing and the checkpoint
to complete.

buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_flush_space(): Auxiliary function to look up a tablespace for
page flushing.

buf_flush_page(): Defer the computation of space->full_crc32(). Never
call log_write_up_to(), but instead skip persistent pages whose latest
modification (FIL_PAGE_LSN) is newer than the redo log. Also skip
pages on which we cannot acquire a shared latch without waiting.

buf_flush_try_neighbors(): Do not bother checking buf_fix_count
because buf_flush_page() will no longer wait for the page latch.
Take the tablespace as a parameter, and only execute this function
when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold().

buf_flush_relocate_on_flush_list(): Declare as cold, and push down
a condition from the callers.

buf_flush_check_neighbor(): Take id.fold() as a parameter.

buf_flush_sync(): Ensure that the buf_pool.flush_list is empty,
because the flushing batch will skip pages whose modifications have
not yet been written to the log or were latched for modification.

buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables.

buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize
the counters, and report n->evicted.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_do_LRU_batch(): Return the number of pages flushed.

buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if
adaptive hash index entries are pointing to the block.

buf_LRU_get_free_block(): Do not wake up the page cleaner, because it
will no longer perform any useful work for us, and we do not want it
to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0)
writes out and evicts at most innodb_lru_flush_size pages. (The
function buf_do_LRU_batch() may complete after writing fewer pages if
more than innodb_lru_scan_depth pages end up in buf_pool.free list.)
Eliminate some mutex release-acquire cycles, and wait for the LRU
flush batch to complete before rescanning.

buf_LRU_check_size_of_non_data_objects(): Simplify the code.

buf_page_write_complete(): Remove the parameter evict, and always
evict pages that were part of an LRU flush.

buf_page_create(): Take a pre-allocated page as a parameter.

buf_pool_t::free_block(): Free a pre-allocated block.

recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block
while not holding recv_sys.mutex. During page allocation, we may
initiate a page flush, which in turn may initiate a log flush, which
would require acquiring log_sys.mutex, which should always be acquired
before recv_sys.mutex in order to avoid deadlocks. Therefore, we must
not be holding recv_sys.mutex while allocating a buffer pool block.

BtrBulk::logFreeCheck(): Skip a redundant condition.

row_undo_step(): Do not invoke srv_inc_activity_count() for every row
that is being rolled back. It should suffice to invoke the function in
trx_flush_log_if_needed() during trx_t::commit_in_memory() when the
rollback completes.

sync_check_enable(): Remove. We will enable innodb_sync_debug from the
very beginning.

Reviewed by: Vladislav Vaintroub

7cffb5f6

MDEV-23399: Remove buf_pool.flush_rbt · 46b1f500

Marko Mäkelä authored Oct 15, 2020

Normally, buf_pool.flush_list must be sorted by
buf_page_t::oldest_modification, so that log_checkpoint()
can choose MIN(oldest_modification) as the checkpoint LSN.

During recovery, buf_pool.flush_rbt used to guarantee the
ordering. However, we can allow the buf_pool.flush_list to
be in an arbitrary order during recovery, and simply ensure
that it is in the correct order by the time a log checkpoint
needs to be executed.
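
The ordering argument can be sketched with a plain vector of oldest_modification values. The function names are hypothetical, and ascending order is assumed here purely for the example; the point is only that the list may be unordered while pages are being recovered, as long as order is restored before a checkpoint reads the minimum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: restore the ordering once, before a checkpoint is taken.
inline void restore_order(std::vector<std::uint64_t>& flush_list) {
  std::sort(flush_list.begin(), flush_list.end());
}

// Hypothetical: with a sorted list, MIN(oldest_modification) is simply
// the first element, so the checkpoint LSN can be read in constant time.
inline std::uint64_t checkpoint_lsn(const std::vector<std::uint64_t>& flush_list) {
  assert(!flush_list.empty());
  assert(std::is_sorted(flush_list.begin(), flush_list.end()));
  return flush_list.front();
}
```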

recv_sys_t::apply(): To keep it simple, we will always flush the
buffer pool at the end of each batch.

Note that log_checkpoint() will invoke recv_sys_t::apply() in case
a checkpoint is initiated during the last batch of recovery,
when we already allow writes to data pages and the redo log.

Reviewed by: Vladislav Vaintroub

46b1f500

MDEV-23399: Remove recv_writer_thread · b535a790

Marko Mäkelä authored Oct 15, 2020

Recovery works just fine without a separate thread whose only
task is to tell the page cleaner thread to do its job.

recv_sys_t::apply(): Flush the buffer pool at the end of each batch.

Reviewed by: Vladislav Vaintroub

b535a790

MDEV-23399 preparation: Remove buf_pool.zip_clean · fa70c146

Marko Mäkelä authored Oct 15, 2020

The debug data structure may have been useful during the development of
ROW_FORMAT=COMPRESSED page frames. Let us simplify code by removing it.

fa70c146

MDEV-23190 after-merge fix: remove unused code · 308f8350
Marko Mäkelä authored Oct 15, 2020
```
The merge commit 4d4865de
introduced fil_space_t::max_page_number_of_io() with no callers.
```
308f8350

14 Oct, 2020 1 commit

Travis-CI: Use new Ubuntu 20.04 as base, streamline and document · cea6a666

Otto Kekäläinen authored Apr 15, 2020

Simplify Travis-CI file and extend inline comments.

Upgrade to using Ubuntu 20.04 (Focal) as the baseline distro version
now that Travis-CI has made it available. Drop Xenial and all the
excess repositories Xenial needed. Now we only have Focal and one Bionic
build to keep things simple and streamlined.

Keep GCC-7/Clang-7 as the older compiler, and start using GCC-10
and Clang-10 as the newer compiler. Assume that if both of them
build OK, then the intermediate versions would be OK as well.

Print 'apt-cache policy' to make it transparent in build logs what
repositories were used for build dependencies.

Remove temporary workaround from homebrew install step as Travis-CI has
fixed the original issue.

Revert ignoring results from the build that previously failed on the test
main.thread_pool_info as MDEV-20372 is not fixed.

Keep arm64 failures ignored due to MDEV-23955.

Allow failures for the test main.column_compression 'innodb' due
to MDEV-23954.

cea6a666

09 Oct, 2020 1 commit

MDEV-23927 Crash in ./mtr --skip-innodb-fast-shutdown innodb.temporary_tables · a891fe6a

Marko Mäkelä authored Oct 09, 2020

innodb_preshutdown(): On innodb_fast_shutdown=0, only wait for
transactions to exit if the transaction system had been initialized.

Reviewed by: Vladislav Vaintroub

a891fe6a

08 Oct, 2020 1 commit

MDEV-23909 innodb_flush_neighbors=2 is treated like innodb_flush_neighbors=0 · d312d641

Marko Mäkelä authored Oct 08, 2020

In MDEV-15053 (commit b1ab211d)
we inadvertently removed a check whether innodb_flush_neighbors is 0,
and thus started treating only the value 1 in a special way.

buf_flush_check_neighbors(): Add the parameter contiguous,
which can be set to skip the check for non-contiguous page number ranges.
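
To make the difference between the three settings concrete, here is a minimal sketch that selects which pages of one extent would be flushed together with a target page. It assumes the documented meaning of innodb_flush_neighbors (0: no neighbors, 1: contiguous dirty neighbors, 2: all dirty pages in the extent); the function and its vector-of-flags representation are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dirty[i] tells whether page i of the extent is dirty; target is the
// page that must be flushed; mode is the innodb_flush_neighbors value.
inline std::vector<std::size_t> pages_to_flush(const std::vector<bool>& dirty,
                                               std::size_t target,
                                               unsigned mode) {
  assert(target < dirty.size());
  std::vector<std::size_t> out;
  if (mode == 0) {  // no neighbor flushing
    out.push_back(target);
    return out;
  }
  if (mode == 1) {  // only the contiguous run of dirty pages around target
    std::size_t lo = target, hi = target;
    while (lo > 0 && dirty[lo - 1]) --lo;
    while (hi + 1 < dirty.size() && dirty[hi + 1]) ++hi;
    for (std::size_t i = lo; i <= hi; ++i) out.push_back(i);
    return out;
  }
  // mode 2: every dirty page in the extent, contiguous or not
  for (std::size_t i = 0; i < dirty.size(); ++i)
    if (dirty[i] || i == target) out.push_back(i);
  return out;
}
```

If the contiguity check were applied unconditionally, mode 2 would degrade to the behavior of mode 1; if neighbor selection were skipped for every value other than 1, mode 2 would behave like mode 0, which is the symptom named in the commit title.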

Reviewed by: Thirunarayanan Balathandayuthapani

d312d641

07 Oct, 2020 2 commits
- Merge tag 'mariadb-10.5.6' into 10.5 · 2ff2e846
  Sergei Golubchik authored Oct 07, 2020
  
  2ff2e846
- bump the VERSION · 0c7f5293
  Daniel Bartholomew authored Oct 07, 2020
  
  0c7f5293
05 Oct, 2020 7 commits
- Merge branch '10.4' into 10.5 · 5b8ab193
  Sergei Golubchik authored Oct 05, 2020
  
  5b8ab193
- Merge branch '10.3' into 10.4 · a6e451dc
  Sergei Golubchik authored Oct 05, 2020
  
  a6e451dc
- Merge branch '10.2' into 10.3 · a707c7f0
  Sergei Golubchik authored Oct 05, 2020
  
  a707c7f0
- Merge branch '10.1' into 10.2 · a4649177
  Sergei Golubchik authored Oct 05, 2020
  
  a4649177
- bump VERSION · f4c85ef5
  Sergei Golubchik authored Oct 05, 2020
  
  f4c85ef5
- MDEV-23884 donor uses invalid SST methods · 418850b2
  Sergei Golubchik authored Oct 04, 2020
  
  418850b2
- MDEV-22871 fixup: Remove SYNC_BUF_PAGE_HASH · 861cd4ce
  Marko Mäkelä authored Oct 05, 2020
```
This was missed in commit 5155a300.
```
  861cd4ce
02 Oct, 2020 4 commits
- MDEV-16264 fixup: Remove unused fts_optimize_wq->event · 7fba16d5
  Marko Mäkelä authored Oct 02, 2020
```
This was missed not only in
commit 5e62b6a5 but also in
commit a9550c47.
```
  7fba16d5
- MDEV-19275 Provide SQL service to plugins. · 0ccdf8b1
  Alexey Botchkov authored Oct 02, 2020
```
test_sql_service plugin added and employed in test_sql_service.test.
```
  0ccdf8b1
- Cleanup: Remove non-existing parameters · 82bc007f
  Marko Mäkelä authored Oct 02, 2020
  
  82bc007f
- Cleanup: Remove unused mutex keys · 91d39f63
  Marko Mäkelä authored Oct 02, 2020
  
  91d39f63