Commits · 629b822913348cec56ec7a80a236f0ba2e613585 · Kirill Smelkov / mariadb

03 Jun, 2014 1 commit

MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel · 629b8229

unknown authored Jun 03, 2014

replication causing replication to fail.

In parallel replication, we run transactions from the master in parallel, but
force them to commit in the same order they did on the master. If we force T1
to commit before T2, but T2 holds eg. a row lock that is needed by T1, we get
a deadlock when T2 waits until T1 has committed.

Usually, we do not run T1 and T2 in parallel if there is a chance that they
can have conflicting locks like this, but there are certain edge cases where
it can occasionally happen (eg. MDEV-5914, MDEV-5941, MDEV-6020). The bug was
that this would cause replication to hang, eventually getting a lock timeout
and causing the slave to stop with error.

With this patch, InnoDB will report back to the upper layer whenever a
transaction T1 is about to do a lock wait on T2. If T1 and T2 are parallel
replication transactions, and T2 needs to commit later than T1, we can thus
detect the deadlock; we then kill T2, setting a flag that causes it to catch
the kill and convert it to a deadlock error; this error will then cause T2 to
roll back and release its locks (so that T1 can commit), and later T2 will be
re-tried and eventually also committed.
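
The detection rule described above can be sketched as follows. This is a minimal illustration, not the actual server code: `ReplTxn`, `commit_order`, `kill_requested`, and `report_lock_wait` are hypothetical names, and the commit order is modelled as a plain integer.

```python
from dataclasses import dataclass


@dataclass
class ReplTxn:
    """Hypothetical stand-in for one parallel replication transaction."""
    name: str
    commit_order: int            # position in the master's commit order
    kill_requested: bool = False


def report_lock_wait(waiter: ReplTxn, blocker: ReplTxn) -> bool:
    """Called when `waiter` is about to wait for a lock held by `blocker`.

    If the blocker must commit later than the waiter, the wait can never
    resolve on its own: the blocker cannot commit until the waiter has,
    and the waiter cannot proceed until the blocker releases its lock.
    In that case the blocker is flagged to be killed and retried.
    """
    if blocker.commit_order > waiter.commit_order:
        blocker.kill_requested = True
        return True
    return False


t1 = ReplTxn("T1", commit_order=1)
t2 = ReplTxn("T2", commit_order=2)
deadlock = report_lock_wait(t1, t2)       # T1 waits on T2: deadlock, T2 flagged
ordinary_wait = report_lock_wait(t2, t1)  # T2 waits on T1: resolves normally
```

A wait in the other direction (T2 waiting on T1) is left alone, since T1 commits first and releases its locks without help.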

The kill happens asynchronously in a slave background thread; this is
necessary, as the reporting from InnoDB about lock waits happens deep inside
the locking code, at a point where it is not possible to directly call
THD::awake() due to mutexes held.
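
The hand-off to a background thread can be illustrated with a simple queue. This is only a sketch of the general pattern, assuming a single worker; `request_kill`, `background_killer`, and the `killed` list are hypothetical, and appending to the list stands in for the real kill.

```python
import queue
import threading

kill_queue = queue.Queue()
killed = []  # records which transactions the worker has "killed"


def request_kill(txn_name):
    """Called from the lock-wait report: only enqueue, never kill directly."""
    kill_queue.put(txn_name)


def background_killer():
    """Worker loop: performs the kills outside the locking code."""
    while True:
        txn_name = kill_queue.get()
        if txn_name is None:       # sentinel used to stop the worker
            break
        killed.append(txn_name)    # stand-in for the actual kill


worker = threading.Thread(target=background_killer)
worker.start()
request_kill("T2")
kill_queue.put(None)
worker.join()
```

The reporting side does nothing beyond enqueueing a request, which is the property that matters when the caller holds mutexes.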

Deadlock is assumed to be (very) rarely occurring, so this patch tries to
minimise the performance impact on the normal case where no deadlocks occur,
rather than optimise the handling of the occasional deadlock.

Also fix transaction retry due to deadlock when it happens after a transaction
already signalled to later transactions that it started to commit. In this
case we need to undo this signalling (and later redo it when we commit again
during retry), so following transactions will not start too early.

Also add a missing thd->send_kill_message() that got triggered during testing
(this corrects an incorrect fix for MySQL Bug#58933).

629b8229

15 May, 2014 1 commit

MDEV-5262: Missing retry after temp error in parallel replication · 787c470c

unknown authored May 15, 2014

Handle retry of event groups that span multiple relay log files.

 - If retry reaches the end of one relay log file, move on to the next.

 - Handle refcounting of relay log files, and avoid purging relay log
   files until all event groups have completed that might have needed
   them for transaction retry.
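
The reference counting in the second point can be sketched as below. The class and method names are hypothetical and the real bookkeeping in the server is more involved; the sketch only shows the rule that a relay log file may be purged once no in-flight event group still refers to it.

```python
class RelayLogRefs:
    """Hypothetical per-file reference counts for relay log files."""

    def __init__(self):
        self._refs = {}

    def acquire(self, log_name):
        """An event group starts using events from this relay log file."""
        self._refs[log_name] = self._refs.get(log_name, 0) + 1

    def release(self, log_name):
        """An event group has completed and can no longer need a retry."""
        self._refs[log_name] -= 1
        if self._refs[log_name] == 0:
            del self._refs[log_name]

    def can_purge(self, log_name):
        """A file may be purged only when nothing references it."""
        return log_name not in self._refs


refs = RelayLogRefs()
refs.acquire("relay-bin.000001")   # group A starts in file 1 ...
refs.acquire("relay-bin.000002")   # ... and spans into file 2
purge_while_running = refs.can_purge("relay-bin.000001")
refs.release("relay-bin.000001")
refs.release("relay-bin.000002")
purge_after_done = refs.can_purge("relay-bin.000001")
```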

787c470c

13 May, 2014 1 commit

MDEV-5262: Missing retry after temp error in parallel replication · d6091569

unknown authored May 13, 2014

If the first retry fails, make another attempt.

Add testcases to test multi-retry that succeeds in second attempt, and
multi-retry that eventually fails due to exceeding slave_trans_retries.
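
A bounded retry loop of this kind might look like the sketch below. `TemporaryError`, `apply_with_retry`, and the `max_retries` parameter are hypothetical; `max_retries` plays the role of slave_trans_retries, and the exact counting in the server may differ.

```python
class TemporaryError(Exception):
    """Hypothetical marker for a retryable error such as a deadlock."""


def apply_with_retry(apply_event_group, max_retries):
    """Run apply_event_group(), retrying on temporary errors.

    The event group is attempted once, then retried up to max_retries
    times; the last temporary error is re-raised once retries run out.
    """
    retries = 0
    while True:
        try:
            return apply_event_group()
        except TemporaryError:
            if retries >= max_retries:
                raise
            retries += 1


attempts = {"count": 0}


def fails_once_then_succeeds():
    attempts["count"] += 1
    if attempts["count"] < 3:      # first attempt and first retry both fail
        raise TemporaryError("deadlock")
    return "committed"


result = apply_with_retry(fails_once_then_succeeds, max_retries=10)
```

Here the first retry also fails and the second one succeeds, which mirrors the first of the two test scenarios named above.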

d6091569

08 May, 2014 1 commit

MDEV-5262: Missing retry after temp error in parallel replication · b0b60f24

unknown authored May 08, 2014

Start implementing that an event group can be re-tried in parallel replication
if it fails with a temporary error (like deadlock).

Patch is very incomplete, just some very basic retry works.

Stuff still missing (not complete list):

 - Handle moving to the next relay log file, if event group to be retried
   spans multiple relay log files.

 - Handle refcounting of relay log files, to ensure that we do not purge a
   relay log file and then later attempt to re-execute events out of it.

 - Handle description_event_for_exec - we need to save this somehow for the
   possible retry - and use the correct one in case it differs between relay
   logs.

 - Do another retry attempt in case the first retry also fails.

 - Limit the max number of retries.

 - Lots of testing will be needed for the various edge cases.

b0b60f24

07 Jul, 2014 1 commit

MDEV-6120: When slave stops with error, error message should indicate the failing GTID · 2b4b857d

Kristian Nielsen authored Jul 07, 2014

Follow-up patch. The original patch added an extra argument to the
rli->report() function, but the calls were not adjusted accordingly in a
few places.

This patch updates the remaining calls as needed. In files log_event_old.cc
and rpl_record_old.cc, it just adds NULL, since this is only for old event
formats from ancient master servers, which would not have any GTID information
to add to the error messages in any case.

2b4b857d

04 Jul, 2014 2 commits

MDEV-6318: MariaDB with XtraDB uses times more of IO events · d2098b96

Jan Lindström authored Jul 04, 2014

than with InnoDB plugin

Fix: os0file.h in XtraDB had OS_AIO_N_PENDING_IOS_PER_THREAD 256
when on InnoDB it is OS_AIO_N_PENDING_IOS_PER_THREAD 32. Changed
XtraDB also to use 32.

d2098b96

MDEV-6288: Innodb causes server crash after disk full, · 43c85143
Jan Lindström authored Jul 04, 2014
```
then can't ALTER TABLE any more.

Fix for InnoDB storage engine.
```
43c85143

03 Jul, 2014 1 commit
- MDEV-6288: Innodb causes server crash after disk full, then can't · 6bd2f900
  Jan Lindström authored Jul 03, 2014
```
ALTER TABLE any more.
```
  6bd2f900
30 Jun, 2014 2 commits

MDEV-6073 Merge gis test cases from 5.6. · 80a02037

Alexey Botchkov authored Jul 01, 2014

        Tests were merged.
        As the implementation is different, the 'internal debugging' part
        was not merged, only a stub for it created.

80a02037

Fix test failures in rpl.rpl_checksum and rpl.rpl_gtid_errorlog. · 439f75f8

Kristian Nielsen authored Jun 30, 2014

These tests use search_pattern_in_file.inc to search the error log for
expected output. However, search_pattern_in_file.inc by default searched only
the first 50000 bytes, so if the error log grew too big the tests would fail.

This patch extends search_pattern_in_file.inc with an option to specify how
much of the file to search, and whether to search from the start of the file
or from the end. Then the rpl.rpl_checksum and rpl.rpl_gtid_errorlog test
cases are fixed to search the last 50000 bytes of the error log, which will
work no matter how large prior tests have made it.
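
The effect of searching from the end can be shown with a small Python stand-in. The real search_pattern_in_file.inc is a test include file, not Python, and the parameter names `max_bytes` and `from_end` are assumptions made for this sketch.

```python
import os
import re
import tempfile


def search_pattern_in_file(path, pattern, max_bytes=50000, from_end=False):
    """Search at most max_bytes of the file, from its start or its end."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if from_end and size > max_bytes:
            f.seek(size - max_bytes)   # skip everything but the tail
        data = f.read(max_bytes)
    return re.search(pattern.encode(), data) is not None


# Demo: the interesting line sits beyond the first 50000 bytes of the log.
with tempfile.NamedTemporaryFile("wb", suffix=".err", delete=False) as log:
    log.write(b"noise\n" * 10000)              # 60000 bytes from prior tests
    log.write(b"Slave SQL: expected error\n")
    log_path = log.name

found_from_start = search_pattern_in_file(log_path, "expected error")
found_from_end = search_pattern_in_file(log_path, "expected error",
                                        from_end=True)
os.remove(log_path)
```

Searching from the start misses the line once earlier output exceeds the limit, while searching the tail finds it regardless of how large the log has grown.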

439f75f8

27 Jun, 2014 2 commits

MDEV-6386: Assertion `thd->transaction.stmt.is_empty() || thd->in_sub_stmt ||... · 370318f8

Kristian Nielsen authored Jun 27, 2014

MDEV-6386: Assertion `thd->transaction.stmt.is_empty() || thd->in_sub_stmt || (thd->state_flags & Open_tables_state::BACKUPS_AVAIL)' fails with parallel replication

The direct cause of the assertion was missing error handling in
record_gtid(). If ha_commit_trans() fails for the statement commit, there was
missing code to catch the error and do ha_rollback_trans() in this case; this
caused close_thread_tables() to assert.

Normally, this error case is not hit, but in this case it was triggered due to
another bug: When a transaction T1 fails during parallel replication, the code
would signal following transactions that they could start to run without
properly marking the error condition. This caused subsequent transactions to
incorrectly start replicating, only to get an error later during their own
commit step. This was particularly serious if the subsequent transactions were
DDL or MyISAM updates, which cannot be rolled back and would leave replication
in an inconsistent state.

Fixed by 1) in case of error, only signal following transactions to continue
once the error has been properly marked and those transactions will know not
to start; and 2) implement proper error handling in record_gtid() in the case
that statement commit fails.
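
The ordering in fix 1) can be sketched as below: the error is recorded before followers are woken, so a follower that wakes up always sees it. `CommitSlot`, `finish`, and `wait_for_prior` are hypothetical names, and a `threading.Event` stands in for the server's own signalling.

```python
import threading


class CommitSlot:
    """Hypothetical hand-off point between T1 and following transactions."""

    def __init__(self):
        self._done = threading.Event()
        self.failed = False

    def finish(self, failed):
        """Called by T1 when it has committed or failed."""
        self.failed = failed   # 1) mark the error condition first
        self._done.set()       # 2) only then signal the followers

    def wait_for_prior(self):
        """Called by a follower; returns True only if it may start."""
        self._done.wait()
        return not self.failed


slot = CommitSlot()
slot.finish(failed=True)           # T1 hit an error
may_start = slot.wait_for_prior()  # the follower must not start
```

With the two steps in `finish` swapped, a follower could wake up, see `failed` still unset, and start work that cannot be rolled back.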

370318f8

MDEV-6401 SET ROLE returning ERROR 1959 Invalid role specification for valid role · b9ddeeff
Sergei Golubchik authored Jun 27, 2014
```
Use user's ip address when verifying privileges for SET ROLE (just like check_access() does)
```
b9ddeeff

25 Jun, 2014 2 commits

MDEV-6120: When slave stops with error, error message should indicate the failing GTID · 86362129

Kristian Nielsen authored Jun 25, 2014

If replication breaks in GTID mode, it is not trivial to determine the GTID of
the failing event group. This is a problem, as this GTID is needed eg. to
explicitly set @@gtid_slave_pos to skip to after that event group, or to
compare errors on different servers, etc.

Fix by ensuring that relevant slave errors logged to the error log include the
GTID of the event group containing the problem event.

86362129

MDEV-5799: Error messages written upon LOST EVENTS incident are corrupted · 00467e13

Kristian Nielsen authored Jun 25, 2014

This is MySQL Bug#59123. The message string stored in an INCIDENT event was
not zero-terminated. This caused any following checksum bytes (if enabled on
the master) to be output to the error log as trailing garbage when the message
was printed to the error log.

Backport the patch from MySQL 5.6:

  revno: 2876.228.200
  revision-id: zhenxing.he@sun.com-20110111051323-w2xnzvcjn46x6h6u
  committer: He Zhenxing <zhenxing.he@sun.com>
  timestamp: Tue 2011-01-11 13:13:23 +0800
  message:
    BUG#59123 rpl_stm_binlog_max_cache_size fails sporadically with found warnings

Also add a test case.

00467e13

24 Jun, 2014 3 commits

semisync maturity: Unknown -> Gamma · 5591ef01
Sergei Golubchik authored Jun 24, 2014

5591ef01
metadata_lock_info: Beta -> Gamma · 0aff48ae
Sergei Golubchik authored Jun 24, 2014

0aff48ae

MDEV-6364: Migrate a slave from MySQL 5.6 to MariaDB 10 break replication · 312219cc

Kristian Nielsen authored Jun 24, 2014

MySQL 5.6 implemented WL#344, which is about a MASTER_DELAY option to CHANGE
MASTER. But as part of this worklog, the format of the relay-log.info file was
changed. The new format is not understood by earlier versions, nor by
MariaDB 10.0, so changing server to those versions would cause the slave to
abort with an error due to reading incorrect data out of relay-log.info.

Fix this by backporting from the WL#344 patch just the code that understands
the new relay-log.info format. We still write out the old format, and none of
the MASTER_DELAY feature is backported with this commit.

312219cc

23 Jun, 2014 1 commit
- long overdue: change maturity level for built-in auth plugins to stable · e0c8d729
  Sergei Golubchik authored Jun 23, 2014
  
  e0c8d729
20 Jun, 2014 1 commit
- Increased the version number · c26bee40
  Elena Stepanova authored Jun 20, 2014
  
  c26bee40
18 Jun, 2014 4 commits

MDEV-6039 - WebScaleSQL patches · d2a4b785

Sergey Vojtovich authored Jun 18, 2014

Stop spawning dummy threads on client library initialization

Let's revert the fix for Bug#24507.  To quote Monty from 2006:

"After 1/2 a year, when all glibc versions are updated, we can delete
this code."

Note: The upstream glibc bug was fixed in 2006.

d2a4b785

MDEV-6039 - WebScaleSQL patches · b6c175aa

Sergey Vojtovich authored Jun 18, 2014

Preserve CLIENT_REMEMBER_OPTIONS flag for compressed connections

Code cleanup: removed reference to CLIENT_REMEMBER_OPTIONS from server
code. This flag is ignored in MariaDB.

b6c175aa

MDEV-6180: Error 1590 is not autoskippable · 643738ee

unknown authored Jun 18, 2014

The INCIDENT_EVENT always caused slave error and abort, without checking
--slave-skip-errors.

Now, if error 1590, ER_SLAVE_INCIDENT is included in the --slave-skip-errors
list, incident events will be ignored.
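
In outline, the new check behaves like the sketch below. Only the error number 1590 and the name ER_SLAVE_INCIDENT come from the text above; the function name and its string return values are hypothetical.

```python
ER_SLAVE_INCIDENT = 1590


def handle_incident_event(skip_errors):
    """Decide what the slave does on an incident event.

    skip_errors is the set of error numbers from --slave-skip-errors.
    """
    if ER_SLAVE_INCIDENT in skip_errors:
        return "ignored"            # event is skipped, replication continues
    return "stop with error"        # previous behaviour in all cases


with_skip = handle_incident_event({1062, 1590})
without_skip = handle_incident_event({1062})
```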

This is a merge of this MySQL 5.6 patch:

revision-id: frazer@mysql.com-20110314170916-ypgin17otj3ucx95
committer: Frazer Clement <frazer@mysql.com>
timestamp: Mon 2011-03-14 17:09:16 +0000
message:
  Bug#11799671 NOT POSSIBLE TO SKIP INCIDENT ERRORS

643738ee

MDEV-6039 - WebScaleSQL patches · da808ae0

Sergey Vojtovich authored Jun 18, 2014

Use single quotes for perl paths, in case of special symbols

Double-quoted string literals are subject to backslash and variable
substitution.

da808ae0

10 Jun, 2014 1 commit
- MDEV-6314 - Compile/run MariaDB with ASan · 7832f1bb
  Sergey Vojtovich authored Jun 10, 2014
```
Fixed some compilation errors/warnings with ASan.
```
  7832f1bb
13 Jun, 2014 1 commit
- promote server_audit and sequence plugins to stable · 55b01023
  Sergei Golubchik authored Jun 13, 2014
  
  55b01023
12 Jun, 2014 1 commit
- valgrind warning. initialize found_rows earlier, before any "goto err". · 0c64cd83
  Sergei Golubchik authored Jun 12, 2014
  
  0c64cd83
11 Jun, 2014 5 commits
- avoid uppercase table aliases tests - they're not portable · 3d4dbe4d
  Sergei Golubchik authored Jun 11, 2014
  
  3d4dbe4d
- MDEV-5995 MySQL Bug#12750920: EMBEDDED SERVER START/STOP. · bcb85f0e
  Alexey Botchkov authored Jun 11, 2014
```
  Some variables weren't cleared properly so consecutive embedded server start/stop failed.
  Cleanups added. Also mysql_client_test.c extended to test that (taken from Mattias Johnson's patch)
```
  bcb85f0e
- MDEV-6253 MySQL Users Break when Migrating from MySQL 5.1 to MariaDB 10.0.10 · 1eaf2106
  Sergei Golubchik authored Jun 11, 2014
```
When plugin=mysql_native_password (or mysql_old_password) take the password
from *either* password *or* authentication_string, whichever is set.
This makes no sense, but alas, that's what MySQL-5.6 does.
```
  1eaf2106
- MDEV-6065 MySQL Bug#13623473 "MISSING ROWS ON SELECT AND JOIN WITH TIME/DATETIME COMPARE · 805d302d
  Sergei Golubchik authored Jun 11, 2014
```
fix for ranges like "indexed_datetime OP time"
(test case is in the previous revision)
```
  805d302d
- MDEV-6065 MySQL Bug#13623473 "MISSING ROWS ON SELECT AND JOIN WITH TIME/DATETIME COMPARE" · 6e8d49b8
  Sergei Golubchik authored Jun 11, 2014
```
fix for ref like "indexed_time = datetime"
```
  6e8d49b8
09 Jun, 2014 2 commits
- cleanup: remove special case from store_key::store_key(), add Field_blob::new_key_field · 2510f9c6
  Sergei Golubchik authored Jun 09, 2014
```
(prep for MDEV-6065)
```
  2510f9c6
- MDEV-6249 mark P_S STABLE and disable it by default · dc9b2a95
  Sergei Golubchik authored Jun 09, 2014
  
  dc9b2a95
10 Jun, 2014 7 commits

Merge · 2436d58e
Igor Babaev authored Jun 10, 2014

2436d58e
Merge · 02720fd7
Sergey Petrunya authored Jun 10, 2014

02720fd7
Merge · b80a02cb
Sergey Petrunya authored Jun 10, 2014

b80a02cb
Merge. · 1f7e6804
Igor Babaev authored Jun 10, 2014

1f7e6804

Fixed bug mdev-6071. · d42e6d3a

Igor Babaev authored Jun 10, 2014

The method JOIN_CACHE::init may fail (return 1) if some conditions on the
used join buffer are not satisfied. For example it fails if join_buffer_size
is greater than join_buffer_space_limit. The conditions should be checked
when running the EXPLAIN command for the query. That's why the method
JOIN_CACHE::init has to be called for EXPLAIN commands as well.

d42e6d3a

MDEV-4440 IF NOT EXISTS in multi-action ALTER does not work when the problem... · 6b84ecdc

Alexey Botchkov authored Jun 10, 2014

MDEV-4440 IF NOT EXISTS in multi-action ALTER does not work when the problem is created by a previous part of the ALTER.
Loops added to the handle_if_exists_option() to check the
CREATE/DROP lists for duplicates.
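
The duplicate check can be illustrated for the ADD COLUMN case as below. This is a simplified sketch, not the logic of handle_if_exists_option() itself: the function name and the `(column_name, if_not_exists)` tuple layout are assumptions.

```python
def filter_add_columns(existing_columns, add_actions):
    """Return the column names that an ALTER should actually add.

    add_actions is a list of (column_name, if_not_exists) pairs, in the
    order they appear in the statement. A column counts as existing if it
    is already in the table or was added by an earlier part of the same
    ALTER.
    """
    seen = set(existing_columns)
    kept = []
    for name, if_not_exists in add_actions:
        if name in seen:
            if if_not_exists:
                continue             # duplicate is skipped, not an error
            raise ValueError(f"Duplicate column name '{name}'")
        seen.add(name)
        kept.append(name)
    return kept


# The second ADD collides with the first one in the same statement.
added = filter_add_columns(
    existing_columns=["id"],
    add_actions=[("c1", False), ("c1", True), ("c2", True)],
)
```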

6b84ecdc

MDEV-5985: EITS: selectivity estimates look illogical for join and non-key equalities · aeb62282

Sergey Petrunya authored Jun 10, 2014

Part#1. 

table_cond_selectivity() should discount selectivity of table's
conditions only when it counts that selectivity to begin with.

For non-ref-based access methods (ALL/range/index_merge/etc),
we start with sel=1.0 and hence do not need to discount any
selectivities.

aeb62282