Commits · bb-10.5-svoj-MDEV-17084 · nexedi / MariaDB

20 Sep, 2019 2 commits
- MDEV-17084 - Optimize append only files for NVDIMM · 7b6ff1f2
  Monty authored Sep 20, 2019
```
Minor fixes.
```
  7b6ff1f2
- Fixup: one background thread for all caches · a82c6dd9
  Sergey Vojtovich authored Sep 18, 2019
```
In reply to:
There is one thread per cache. Isn't that a bit overkill as we will have 3-4 caches?
```
  a82c6dd9
16 Sep, 2019 1 commit

Fixup: recreate file on size/n_caches change · 5bdb2ae3

Sergey Vojtovich authored Sep 16, 2019

In reply to:
If we call it with n_caches > what is in the file, we should flush the old cache and then re-initialize the cache for more files. This is a likely scenario when we add a new storage engine that also wants to use the append_cache, in which case n_caches can increase for the same server instance.

5bdb2ae3

12 Sep, 2019 3 commits

Fixup: reshuffle code to avoid going negative · 370474b6

Sergey Vojtovich authored Sep 12, 2019

In reply to:
Note that as the above is uint, it can never be < 0, so it's easier to just test == 0
/* Wait for preceding concurrent writes completion */
     while ((uint64_t) my_atomic_load64_explicit((int64*) &cache->cached_eof,
                                                 MY_MEMORY_ORDER_RELAXED) <
            start)
       LF_BACKOFF();
Why wait? Can't we start writing at the beginning of the cache buffer, up to the last flushed byte?
Isn't the cache a round-buffer?  From the code it looks like we write to the end always, then flush and then start from the beginning.
hm.. It's probably right that we test for <= 0 above, but we need to cast the full expression to int or just make avail an int64_t
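
The point about unsigned arithmetic can be shown with a small sketch. The helper names `avail_unsigned` and `avail_signed` are hypothetical and do not come from the patch; they only contrast an unsigned difference, which wraps around instead of going negative, with the same expression cast to a signed type.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from the patch. */

/* Unsigned difference: wraps around when used > size, so it is never < 0. */
static uint64_t avail_unsigned(uint64_t size, uint64_t used)
{
  return size - used;
}

/*
  Signed difference: the full expression is cast to int64_t, so a
  "negative" result can be detected with <= 0. The conversion relies on
  two's complement behaviour, which is what common platforms provide.
*/
static int64_t avail_signed(uint64_t size, uint64_t used)
{
  return (int64_t) (size - used);
}
```

With size 8 and used 10, the unsigned variant yields a huge positive number, while the signed variant yields -2.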

370474b6

Fixup: flush all caches · f5ae90f8

Sergey Vojtovich authored Sep 12, 2019

In reply to:
if we get an error when flushing one file, we should continue flush all the other files!
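
A minimal sketch of the requested behaviour, assuming a hypothetical `demo_cache` type and `demo_flush_one()` callback (the real structures in the patch differ): every cache is flushed, and a failure is remembered rather than aborting the loop.

```c
#include <stddef.h>

/* Hypothetical cache type; "fail" simulates a flush error. */
typedef struct { int fail; int flushed; } demo_cache;

static int demo_flush_one(demo_cache *c)
{
  if (c->fail)
    return -1;
  c->flushed= 1;
  return 0;
}

/* Flush every cache; return -1 if any flush failed, but never stop early. */
static int demo_flush_all(demo_cache *caches, size_t n)
{
  int res= 0;
  for (size_t i= 0; i < n; i++)
    if (demo_flush_one(&caches[i]))
      res= -1;
  return res;
}
```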

f5ae90f8

Fixup: size_t written · 10b8c5e5

Sergey Vojtovich authored Sep 12, 2019

In reply to:
ssize_t written; is passed to my_pwrite, which takes size_t. Isn't it better to use size_t to avoid conversion warnings on Windows?

10b8c5e5

11 Sep, 2019 2 commits

Fixup: 8k min cache size · 717c4704

Sergey Vojtovich authored Sep 11, 2019

In reply to:
Anyway, I think we should have a minimum required cache size to avoid stupid length checking, like at least 8K per chunk.

717c4704

Fixup: 32bit n_caches and magic · 6b583394

Sergey Vojtovich authored Sep 11, 2019

In reply to:
Why is n_caches uint64? When reading the code, I would assume that this means it's very big. Why not just uint ?
I can understand you want the header structure aligned, but that is independent of the user interface that is using 'n'
Also, please rename the parameter 'n' to cache_slot or something more readable.
Another thing to think about is to replace argument checking to use assert instead of return...
if (n >= dir->header->n_caches)
return -1;

6b583394

10 Sep, 2019 4 commits

Fixup: replaced dir->dummy with !dir->header · 8981c2da

Sergey Vojtovich authored Sep 10, 2019

In reply to:
In create_directory, shouldn't you start with setting dir->header to 0 to make it easy to check if the structure is uninitialized or not?
Better to use header than 'dummy' which doesn't tell the user what it's all about.

8981c2da

Fixup: update magic · e0c4708c

Sergey Vojtovich authored Sep 10, 2019

In reply to:
+/* PMAC0\0\0\0 */
+static const uint64_t pmem_append_cache_magic= 0x0000003043414d50;
In almost all MariaDB code, we usually have a magic prefix of:
255,255,id,version
The current max id is 12, so you could use 13.
sorry, should be 254,254
For example:
uchar    maria_file_magic[]=
{ (uchar) 254, (uchar) 254, (uchar) 9, '\003', };
uchar    maria_pack_file_magic[]=
{ (uchar) 254, (uchar) 254, (uchar) 10, '\001', };
Most systems have updated 'file' to support MariaDB files...
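
As a sketch of the suggested convention, a header check could look like the following. The id 13 is the value proposed in the review; the version byte 1 and the names `demo_magic` and `demo_check_magic` are assumptions made for illustration only.

```c
#include <string.h>

/* 254, 254, id, version. The version byte is an assumed value. */
static const unsigned char demo_magic[4]= { 254, 254, 13, 1 };

/* Return 1 if the header starts with the expected magic prefix, else 0. */
static int demo_check_magic(const unsigned char *header, size_t len)
{
  return len >= sizeof(demo_magic) &&
         memcmp(header, demo_magic, sizeof(demo_magic)) == 0;
}
```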

e0c4708c

MDEV-17084 - Optimize append only files for NVDIMM · 9ee181f8
Sergey Vojtovich authored Jul 31, 2019
```
Integration of persistent memory append cache with MariaDB binary log.
```
9ee181f8

MDEV-17084 - Optimize append only files for NVDIMM · fe08c25f

Sergey Vojtovich authored Jul 18, 2019

Append cache implementation. Based on libpmem, which is mostly needed for
efficient data flushing from CPU caches.

When the append cache is enabled for a particular file, data is first stored in
an mmap()-ed circular buffer on faster persistent storage. A background thread
flushes this buffer to a file on slower persistent storage.

Append caches are stored in regular append cache files. One append cache file
may contain multiple caches.
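
The circular-buffer idea can be sketched as follows. This is a single-threaded toy under stated assumptions: the buffer is ordinary memory rather than an mmap()-ed persistent region, no libpmem calls are made, and the "slow file" is just an output array. The names `demo_ring`, `demo_append` and `demo_flush` are hypothetical; only the notion of a cached end-of-file offset comes from the review comments.

```c
#include <stdint.h>

#define DEMO_CAP 8

typedef struct {
  unsigned char buf[DEMO_CAP];
  uint64_t cached_eof;   /* total bytes appended so far */
  uint64_t flushed_eof;  /* total bytes already copied to the slow file */
} demo_ring;

/* Append len bytes; return -1 if there is no room until the next flush. */
static int demo_append(demo_ring *r, const void *data, uint64_t len)
{
  const unsigned char *src= data;
  if (len > DEMO_CAP - (r->cached_eof - r->flushed_eof))
    return -1;
  for (uint64_t i= 0; i < len; i++)
    r->buf[(r->cached_eof + i) % DEMO_CAP]= src[i];
  r->cached_eof+= len;
  return 0;
}

/* Copy all unflushed bytes to out (stand-in for the slow file). */
static uint64_t demo_flush(demo_ring *r, unsigned char *out)
{
  uint64_t n= r->cached_eof - r->flushed_eof;
  for (uint64_t i= 0; i < n; i++)
    out[i]= r->buf[(r->flushed_eof + i) % DEMO_CAP];
  r->flushed_eof+= n;
  return n;
}
```

Keeping both offsets as ever-growing file positions and reducing them modulo the capacity only when indexing makes the used space a plain subtraction.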

fe08c25f

09 Sep, 2019 1 commit

Fix connect RESTSDK support. · 2f99e1a2

Vladislav Vaintroub authored Sep 09, 2019

Remove debug output,
remove overriding of the Windows C runtime flags(linker warning)
do not add code that depends on restsdk if library is not going
to be linked.

freaking Connect

2f99e1a2

08 Sep, 2019 1 commit

MDEV-20525 rocksdb debug compilation fails on Windows due to unresolved my_assert variable · b09c5887

Vladislav Vaintroub authored Sep 08, 2019

MYSQL_PLUGIN_IMPORT did not work correctly for the Rocksdb helper library
rocksdb_aux_lib, because that library was not compiled with
-DMYSQL_DYNAMIC_PLUGIN.

Fix dbug such that it does not depend on exported data, only on functions
(which do not need MYSQL_PLUGIN_IMPORT decoration)

Use a "getter" function _db_my_assert() instead of DLL-exported variable.
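
The general pattern, sketched with hypothetical names (`demo_my_assert`, `demo_get_my_assert`, `demo_set_my_assert`; the actual signature of _db_my_assert() is not shown here): the variable stays private to its translation unit, and only functions cross the library boundary.

```c
/* Private to the library: no data symbol needs to be exported. */
static int demo_my_assert= 1;

/* Exported functions; callers need no import decoration for data. */
int demo_get_my_assert(void)
{
  return demo_my_assert;
}

void demo_set_my_assert(int value)
{
  demo_my_assert= value;
}
```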

b09c5887

06 Sep, 2019 4 commits
- Merge 10.4 into 10.5 · 4081b7b2
  Marko Mäkelä authored Sep 06, 2019
  
  4081b7b2
- Merge 10.4 into 10.5 · 780d2bb8
  Marko Mäkelä authored Sep 06, 2019
  
  780d2bb8
- Merge branch '10.3' into 10.4 · 244f0e6d
  Sergei Golubchik authored Sep 06, 2019
  
  244f0e6d
- MDEV-20496 Assertion `field.is_sane()' failed in Protocol_text::store_field_metadata · db9e41dd
  Alexander Barkov authored Sep 06, 2019
  
  db9e41dd
05 Sep, 2019 3 commits

MDEV-20425: Enable a test for debug builds · 2842c369
Marko Mäkelä authored Sep 05, 2019

2842c369
Simplify trx_state_eq() · 67e2252b
Marko Mäkelä authored Sep 05, 2019

67e2252b

MDEV-15326 after-merge fixes · 2c9e75cc

Marko Mäkelä authored Sep 05, 2019

trx_t::is_recovered: Revert most of the changes that were made by the
merge of MDEV-15326 from 10.2. The trx_sys.rw_trx_hash and the recovery
of transactions at startup is quite different in 10.3.

trx_free_at_shutdown(): Avoid excessive mutex protection. Reading fields
that can only be modified by the current thread (owning the transaction)
can be done outside mutex.

trx_t::commit_state(): Restore a tighter assertion.

trx_rollback_recovered(): Clarify why there is no potential race condition
with other transactions.

lock_trx_release_locks(): Merge with trx_t::release_locks(),
and avoid holding lock_sys.mutex unnecessarily long.

rw_trx_hash_t::find(): Remove redundant code, and avoid starving the
committer by checking trx_t::state before trx_t::reference().

2c9e75cc

04 Sep, 2019 14 commits

Merge 10.2 into 10.3 · 537f8594
Marko Mäkelä authored Sep 04, 2019

537f8594
more tests for DEFAULT and DEFAULT(column) in INSERT · f605ce08
Sergei Golubchik authored Sep 04, 2019
```
this is not ideal and needs to be fixed eventually,
but it's consistent over all forms of INSERT.
```
f605ce08

MDEV-20403 Assertion `0' or Assertion `btr_validate_index(index, 0)' failed in... · 8dca4cf5

Sergei Golubchik authored Sep 04, 2019

MDEV-20403 Assertion `0' or Assertion `btr_validate_index(index, 0)' failed in row_upd_sec_index_entry or error code 126: Index is corrupted upon UPDATE with TIMESTAMP..ON UPDATE

remove a special treatment of a bare DEFAULT keyword that made it
behave inconsistently and differently from DEFAULT(column).
Now all forms of the explicit assignment of a default column value
behave identically, and all count as an explicitly assigned value
(for the purpose of ON UPDATE NOW).

followup for c7c481f4

8dca4cf5

MDEV-20137 rpl.mdev_17588 fails in buildbot with "Table doesn't exist" · 53ec9047
Sachin authored Jul 27, 2019
```
Fix the test case.
```
53ec9047

MDEV-20079 When setting back the system time while mysqld is running, NOW()... · 647d5b24

Sergei Golubchik authored Sep 03, 2019

MDEV-20079 When setting back the system time while mysqld is running, NOW() and UNIX_TIMESTAMP() results get stuck

typo. system_time.start wasn't updated when system_time.sec
and system_time.sec_part were.

647d5b24

MDEV-16871 in_predicate_conversion_threshold cannot be set in my.cnf · 08b01ace
Sergei Golubchik authored Sep 01, 2019

08b01ace

Fix of query cache bug in Aria · 01e455db

Monty authored Sep 04, 2019

MDEV-5817 query cache bug (returning inconsistent/old result
set) with aria table parallel inserts, row format = page

The problem is that for transactional aria tables
(row_type=PAGE and transactional=1), maria_lock_database()
didn't flush the state or the query cache.
Not flushing the state is correct for transactional tables as
this is done by checkpoint, but not flushing the query cache
was wrong and could cause concurrent SELECT queries to not
be deleted from the cache.

Fixed by introducing a flush of the query cache as part of commit, if the table has changed.

01e455db

MDEV-15326: Backport trx_t::is_referenced() · dae1b3b0

Marko Mäkelä authored Sep 03, 2019

Backport the applicable part of Sergey Vojtovich's commit
0ca2ea1a from MariaDB Server 10.3.

trx reference counter was updated under mutex and read without any
protection. This is both slow and unsafe. Use atomic operations for
reference counter accesses.
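
A minimal sketch of an atomically maintained reference counter, written with C11 <stdatomic.h> rather than the server's own atomics wrappers. The type `demo_trx`, the function names, and the relaxed memory ordering are assumptions for illustration; the ordering used by the actual commit may differ.

```c
#include <stdatomic.h>

typedef struct {
  atomic_int n_ref;   /* number of outstanding references */
} demo_trx;

static void demo_reference(demo_trx *trx)
{
  atomic_fetch_add_explicit(&trx->n_ref, 1, memory_order_relaxed);
}

static void demo_release_reference(demo_trx *trx)
{
  atomic_fetch_sub_explicit(&trx->n_ref, 1, memory_order_relaxed);
}

/* Readers no longer need a mutex to get a well-defined value. */
static int demo_is_referenced(demo_trx *trx)
{
  return atomic_load_explicit(&trx->n_ref, memory_order_relaxed) != 0;
}
```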

dae1b3b0

MDEV-15326: InnoDB: Failing assertion: !other_lock · b07beff8

Marko Mäkelä authored Sep 03, 2019

MySQL 5.7.9 (and MariaDB 10.2.2) introduced a race condition
between InnoDB transaction commit and the conversion of implicit
locks into explicit ones.

The assertion failure can be triggered with a test that runs
3 concurrent single-statement transactions in a loop on a simple
table:

CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
thread1: INSERT INTO t SET a=1;
thread2: DELETE FROM t;
thread3: SELECT * FROM t FOR UPDATE; -- or DELETE FROM t;

The failure scenarios are like the following:
(1) The INSERT statement is being committed, waiting for lock_sys->mutex.
(2) At the time of the failure, both the DELETE and SELECT transactions
are active but have not logged any changes yet.
(3) The transaction where the !other_lock assertion fails started
lock_rec_convert_impl_to_expl().
(4) After this point, the commit of the INSERT removed the transaction from
trx_sys->rw_trx_set, in trx_erase_lists().
(5) The other transaction consulted trx_sys->rw_trx_set and determined
that there is no implicit lock. Hence, it grabbed the lock.
(6) The !other_lock assertion fails in lock_rec_add_to_queue()
for the lock_rec_convert_impl_to_expl(), because the lock was 'stolen'.
This assertion failure looks genuine, because the INSERT transaction
is still active (trx->state=TRX_STATE_ACTIVE).

The problematic step (4) was introduced in
mysql/mysql-server@e27e0e0bb75b4d35e87059816f1cc370c09890ad
which fixed something related to MVCC (covered by the test
innodb.innodb-read-view). Basically, it reintroduced an error
that had been mentioned in an earlier commit
mysql/mysql-server@a17be6963fc0d9210fa0642d3985b7219cdaf0c5:
"The active transaction was removed from trx_sys->rw_trx_set prematurely."

Our fix goes along the following lines:

(a) Implicit locks will be released by assigning
trx->state=TRX_STATE_COMMITTED_IN_MEMORY as the first step.
This transition will no longer be protected by lock_sys_t::mutex,
only by trx->mutex. This idea is by Sergey Vojtovich.
(b) We detach the transaction from trx_sys before starting to release
explicit locks.
(c) All callers of trx_rw_is_active() and trx_rw_is_active_low() must
recheck trx->state after acquiring trx->mutex.
(d) Before releasing any explicit locks, we will ensure that any activity
by other threads to convert implicit locks into explicit will have ceased,
by checking !trx_is_referenced(trx). There was a glitch
in this check when it was part of lock_trx_release_locks(); at the end
we would release trx->mutex and acquire lock_sys->mutex and trx->mutex,
and fail to recheck (trx_is_referenced() is protected by trx_t::mutex).
(e) Explicit locks can be released in batches (LOCK_RELEASE_INTERVAL=1000)
just like we did before.
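
Points (a) and (c) can be sketched with plain pthreads. The types and names below (`demo_txn`, `demo_commit_state`, `demo_holds_implicit_lock`) are hypothetical stand-ins, not InnoDB code: the state transition happens under the transaction's own mutex, and a caller that found the transaction earlier rechecks the state after acquiring that mutex.

```c
#include <pthread.h>

enum demo_state { DEMO_ACTIVE, DEMO_COMMITTED_IN_MEMORY };

typedef struct {
  pthread_mutex_t mutex;
  enum demo_state state;   /* protected by mutex */
} demo_txn;

/* (a) Releasing implicit locks is just a state change under trx->mutex. */
static void demo_commit_state(demo_txn *trx)
{
  pthread_mutex_lock(&trx->mutex);
  trx->state= DEMO_COMMITTED_IN_MEMORY;
  pthread_mutex_unlock(&trx->mutex);
}

/* (c) Recheck the state under the mutex; an earlier lookup may be stale. */
static int demo_holds_implicit_lock(demo_txn *trx)
{
  int active;
  pthread_mutex_lock(&trx->mutex);
  active= (trx->state == DEMO_ACTIVE);
  pthread_mutex_unlock(&trx->mutex);
  return active;
}
```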

trx_t::state: Document that the transition to COMMITTED is only
protected by trx_t::mutex, no longer by lock_sys_t::mutex.

trx_rw_is_active_low(), trx_rw_is_active(): Document that the transaction
state should be rechecked after acquiring trx_t::mutex.

trx_t::commit_state(): New function to change a transaction to committed
state, to release implicit locks.

trx_t::release_locks(): New function to release the explicit locks
after commit_state().

lock_trx_release_locks(): Move much of the logic to the caller
(which must invoke trx_t::commit_state() and trx_t::release_locks()
as needed), and assert that the transaction will have locks.

trx_get_trx_by_xid(): Make the parameter a pointer to const.

lock_rec_other_trx_holds_expl(): Recheck trx->state after acquiring
trx->mutex, and avoid a redundant lookup of the transaction.

lock_rec_queue_validate(): Recheck impl_trx->state while holding
impl_trx->mutex.

row_vers_impl_x_locked(), row_vers_impl_x_locked_low():
Document that the transaction state must be rechecked after
trx_mutex_enter().

trx_free_prepared(): Adjust for the changes to lock_trx_release_locks().

b07beff8

MDEV-15326 preparation: Remove trx_sys_t::n_prepared_trx · 7c79c127
Marko Mäkelä authored Sep 02, 2019
```
This is a backport of 900b0790
from MariaDB Server 10.3.
```
7c79c127

MDEV-15326 preparation: Test slow shutdown after XA PREPARE · 154bd095

Marko Mäkelä authored Sep 03, 2019

We were missing a test that would exercise trx_free_prepared()
with innodb_fast_shutdown=0. Add a test.

Note: if shutdown hangs due to the XA PREPARE transactions,
in MariaDB 10.2 the test would unfortunately pass, but take
2*60 seconds longer, because of two shutdown_server statements
timing out after 60 seconds. Starting with MariaDB 10.3, the
hung server would be killed with SIGABRT, and the test could
fail thanks to a backtrace message.

154bd095

MVCC::view_close(): Correct comments · b2775ae8
Marko Mäkelä authored Sep 02, 2019

b2775ae8
Merge 10.2 (up to commit ef00ac4c ) into 10.3 · 7e08ac0b
Alexander Barkov authored Sep 04, 2019

7e08ac0b
Disable galera.galera_var_node_address test case. · cbb85f0d
Jan Lindström authored Sep 04, 2019

cbb85f0d

03 Sep, 2019 5 commits

[NFC] range-forify loops · 18af13b8
Eugene Kosov authored Sep 03, 2019

18af13b8

MDEV-20479 assertion failure in dict_table_get_nth_col() after INSTANT DROP COLUMN · 7bccd291

Eugene Kosov authored Sep 03, 2019

get_col_list_to_be_dropped() incorrectly returned an uninteresting, instantly
dropped column that was missing in a new dict_index_t.

get_col_list_to_be_dropped(): Rename to collect_columns_from_dropped_indexes()
and stop returning dropped columns.

7bccd291

C/C · 6ee7a9a4
Sergei Golubchik authored Sep 03, 2019

6ee7a9a4

MDEV-20403 Assertion `0' or Assertion `btr_validate_index(index, 0)' failed in... · c7c481f4

Sergei Golubchik authored Sep 02, 2019

MDEV-20403 Assertion `0' or Assertion `btr_validate_index(index, 0)' failed in row_upd_sec_index_entry or error code 126: Index is corrupted upon UPDATE with TIMESTAMP..ON UPDATE

Three issues here:
* ON UPDATE DEFAULT NOW columns were updated after generated columns
  were computed - this broke indexed virtual columns
* ON UPDATE DEFAULT NOW columns were updated after BEFORE triggers,
  so triggers didn't see the correct NEW value
* in case of a multi-update generated columns were also updated
  after BEFORE triggers

c7c481f4

don't compare unassigned columns · 3789692d

Sergei Golubchik authored Sep 02, 2019

on UPDATE, compare_record() was comparing all columns that are marked
for writing. But generated columns that are written to the table are
always deterministic and cannot change unless normal non-generated
columns were changed. So it's enough to compare only non-generated
columns that were explicitly assigned values in the SET clause.
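
The comparison rule can be sketched as below. The `demo_col` structure and `demo_record_changed()` are hypothetical and far simpler than the server's record and bitmap handling; they only show that generated columns and columns not assigned in the SET clause are skipped.

```c
#include <stddef.h>

typedef struct {
  int old_val;
  int new_val;
  int generated;   /* 1 for a generated column */
  int assigned;    /* 1 if explicitly assigned in the SET clause */
} demo_col;

/* Return 1 if any explicitly assigned, non-generated column changed. */
static int demo_record_changed(const demo_col *cols, size_t n)
{
  for (size_t i= 0; i < n; i++)
    if (!cols[i].generated && cols[i].assigned &&
        cols[i].old_val != cols[i].new_val)
      return 1;
  return 0;
}
```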

3789692d