1. 03 Jun, 2014 1 commit
    • MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel replication causing replication to fail. · 629b8229
      unknown authored
      
      In parallel replication, we run transactions from the master in parallel, but
      force them to commit in the same order they did on the master. If we force T1
      to commit before T2, but T2 holds e.g. a row lock that is needed by T1, we get
      a deadlock when T2 waits until T1 has committed.
      
      Usually, we do not run T1 and T2 in parallel if there is a chance that they
      can have conflicting locks like this, but there are certain edge cases where
      it can occasionally happen (e.g. MDEV-5914, MDEV-5941, MDEV-6020). The bug was
      that this would cause replication to hang, eventually getting a lock timeout
      and causing the slave to stop with error.
      
      With this patch, InnoDB will report back to the upper layer whenever a
      transaction T1 is about to do a lock wait on T2. If T1 and T2 are parallel
      replication transactions, and T2 needs to commit later than T1, we can thus
      detect the deadlock; we then kill T2, setting a flag that causes it to catch
      the kill and convert it to a deadlock error; this error will then cause T2 to
      roll back and release its locks (so that T1 can commit), and later T2 will be
      re-tried and eventually also committed.
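      The decision described above can be sketched as follows. This is a simplified
      model, not the server's code: the struct, its fields and report_lock_wait()
      are hypothetical names, and "same stream" stands in for whatever condition
      means the two transactions are subject to the same enforced commit order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a replicated transaction (names are illustrative).
struct ReplTrx {
    bool is_parallel_repl;       // executed by a parallel replication worker
    std::uint64_t stream_id;     // transactions in one stream commit in master order
    std::uint64_t commit_order;  // position in the master's commit order
    bool kill_requested;         // set when chosen as deadlock victim
};

// Hypothetically called when `waiter` is about to wait for a lock held by
// `holder`. Returns true if `holder` was marked to be killed and retried.
bool report_lock_wait(const ReplTrx& waiter, ReplTrx& holder) {
    if (!waiter.is_parallel_repl || !holder.is_parallel_repl)
        return false;  // ordinary lock wait, handled by the storage engine as usual
    if (waiter.stream_id != holder.stream_id)
        return false;  // no enforced commit order between the two
    if (holder.commit_order <= waiter.commit_order)
        return false;  // holder commits first anyway, so waiting is safe
    // The holder must commit after the waiter, yet the waiter needs the
    // holder's lock: waiting would deadlock. Mark the holder so that it rolls
    // back (releasing its locks) and is re-tried later.
    holder.kill_requested = true;
    return true;
}
```

      The check is cheap and only compares a few fields, in line with the goal of
      keeping the no-deadlock case fast.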
      
      The kill happens asynchronously in a slave background thread; this is
      necessary, as the reporting from InnoDB about lock waits happens deep inside
      the locking code, at a point where it is not possible to directly call
      THD::awake() due to mutexes held.
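      A minimal sketch of such a hand-off, assuming the reporter only needs to record
      a victim id: the lock-wait path just enqueues, and a background thread performs
      the actual kill (THD::awake() in the real server) with no lock-manager mutex
      held. DeferredKillQueue and the integer victim ids are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical deferred-kill queue: enqueue() is cheap and safe to call from
// deep inside locking code; run() executes in a background thread.
class DeferredKillQueue {
public:
    void enqueue(int victim_id) {
        {
            std::lock_guard<std::mutex> lk(mu_);
            pending_.push_back(victim_id);
        }
        cv_.notify_one();
    }
    void shutdown() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            stop_ = true;
        }
        cv_.notify_one();
    }
    // Background loop: drain all pending victims, then exit once stopped.
    template <typename KillFn>
    void run(KillFn kill) {
        std::unique_lock<std::mutex> lk(mu_);
        for (;;) {
            cv_.wait(lk, [this] { return stop_ || !pending_.empty(); });
            while (!pending_.empty()) {
                int id = pending_.front();
                pending_.pop_front();
                lk.unlock();  // perform the kill without holding our own mutex
                kill(id);
                lk.lock();
            }
            if (stop_) return;
        }
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<int> pending_;
    bool stop_ = false;
};
```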
      
      Deadlock is assumed to be (very) rarely occurring, so this patch tries to
      minimise the performance impact on the normal case where no deadlocks occur,
      rather than optimise the handling of the occasional deadlock.
      
      Also fix transaction retry due to deadlock when it happens after a transaction
      already signalled to later transactions that it started to commit. In this
      case we need to undo this signalling (and later redo it when we commit again
      during retry), so following transactions will not start too early.
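      The undo/redo of the commit signal can be pictured with this hypothetical
      sketch: a single flag stands in for the real wait-for-prior-commit machinery,
      and `attempt` stands in for executing and committing the event group once.

```cpp
#include <cassert>

// Hypothetical flag that a following transaction checks before it may start.
struct CommitSignal {
    bool started_commit = false;
};

bool may_follower_start(const CommitSignal& prev) { return prev.started_commit; }

enum class Outcome { Committed, DeadlockRetry };

// Signal when starting to commit; if a deadlock forces a retry, undo the
// signal so followers do not start too early, and signal again on the next
// attempt. Returns the number of attempts used. (Unbounded here for brevity;
// a real implementation would limit the number of retries.)
template <typename Attempt>
int commit_with_retry(CommitSignal& sig, Attempt attempt) {
    int attempts = 0;
    for (;;) {
        ++attempts;
        sig.started_commit = true;   // tell followers we have started to commit
        if (attempt(attempts) == Outcome::Committed)
            return attempts;
        sig.started_commit = false;  // undo: we are going back to re-execute
    }
}
```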
      
      Also add a missing thd->send_kill_message() that got triggered during testing
      (this corrects an incorrect fix for MySQL Bug#58933).
  2. 15 May, 2014 1 commit
    • MDEV-5262: Missing retry after temp error in parallel replication · 787c470c
      unknown authored
      Handle retry of event groups that span multiple relay log files.
      
       - If retry reaches the end of one relay log file, move on to the next.
      
       - Handle refcounting of relay log files, and avoid purging relay log
         files until all event groups have completed that might have needed
         them for transaction retry.
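      The refcounting idea can be sketched like this, assuming each in-flight event
      group takes a reference on every relay log file it has events in. RelayLogRefs
      and the file names are hypothetical; a real server must additionally never
      purge the relay log file still being written.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-file reference counts for relay log files.
class RelayLogRefs {
public:
    void acquire(const std::string& file) { ++refs_[file]; }
    void release(const std::string& file) {
        auto it = refs_.find(file);
        assert(it != refs_.end() && it->second > 0);
        if (--it->second == 0) refs_.erase(it);
    }
    // A file may be purged only when no event group might still re-read it.
    bool can_purge(const std::string& file) const { return refs_.count(file) == 0; }
private:
    std::map<std::string, int> refs_;
};
```

      An event group that spans two files holds a reference on both until it has
      completed, so a retry can move on to the next file without finding it purged.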
  3. 13 May, 2014 1 commit
  4. 08 May, 2014 1 commit
    • MDEV-5262: Missing retry after temp error in parallel replication · b0b60f24
      unknown authored
      Start implementing re-try of an event group in parallel replication when it
      fails with a temporary error (like deadlock).
      
      The patch is very incomplete; only some very basic retry works so far.
      
      Stuff still missing (not a complete list):
      
       - Handle moving to the next relay log file, if event group to be retried
         spans multiple relay log files.
      
       - Handle refcounting of relay log files, to ensure that we do not purge a
         relay log file and then later attempt to re-execute events out of it.
      
       - Handle description_event_for_exec - we need to save this somehow for the
         possible retry - and use the correct one in case it differs between relay
         logs.
      
       - Do another retry attempt in case the first retry also fails.
      
       - Limit the max number of retries.
      
       - Lots of testing will be needed for the various edge cases.
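      Retrying more than once while limiting the number of retries could look
      roughly like this hypothetical loop; ExecResult, run_event_group() and
      max_retries are illustrative names, not the server's.

```cpp
#include <cassert>
#include <functional>

enum class ExecResult { Ok, TemporaryError, FatalError };

// Hypothetical: execute an event group, re-trying after temporary errors
// (such as deadlock) at most max_retries times. Anything other than Ok in the
// returned result would make the slave stop with an error.
ExecResult run_event_group(const std::function<ExecResult()>& execute,
                           unsigned max_retries, unsigned* attempts_out = nullptr) {
    unsigned attempts = 0;
    ExecResult r;
    do {
        ++attempts;
        r = execute();  // a failed attempt is assumed to have been rolled back
    } while (r == ExecResult::TemporaryError && attempts <= max_retries);
    if (attempts_out) *attempts_out = attempts;
    return r;
}
```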
  5. 07 Jul, 2014 1 commit
    • MDEV-6120: When slave stops with error, error message should indicate the failing GTID · 2b4b857d
      Kristian Nielsen authored
      Follow-up patch. The original patch added an extra argument to the
      rli->report() function, but a few of its callers were not updated
      accordingly.
      
      This patch updates the remaining calls as needed. In files log_event_old.cc
      and rpl_record_old.cc, it just adds NULL, since this is only for old event
      formats from ancient master servers, which would not have any GTID information
      to add to the error messages in any case.
  6. 04 Jul, 2014 2 commits
  7. 03 Jul, 2014 1 commit
  8. 30 Jun, 2014 2 commits
    • MDEV-6073 Merge gis test cases from 5.6. · 80a02037
      Alexey Botchkov authored
      Tests were merged.
      As the implementation is different, the 'internal debugging' part
      was not merged; only a stub for it was created.
    • Fix test failures in rpl.rpl_checksum and rpl.rpl_gtid_errorlog. · 439f75f8
      Kristian Nielsen authored
      These tests use search_pattern_in_file.inc to search the error log for
      expected output. However, search_pattern_in_file.inc by default searched only
      the first 50000 bytes, so if the error log grew too big the tests would fail.
      
      This patch extends search_pattern_in_file.inc with an option to specify how
      much of the file to search, and whether to search from the start of the file
      or from the end. Then the rpl.rpl_checksum and rpl.rpl_gtid_errorlog test
      cases are fixed to search the last 50000 bytes of the error log, which will
      work no matter how large prior tests have made it.
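      The idea of bounding the search to a window at either end of the file can be
      sketched as below. This is an illustration only: search_window() and
      SearchFrom are hypothetical, the real .inc file is mysqltest/Perl with its own
      parameter names, and plain substring search stands in for its pattern match.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class SearchFrom { Start, End };

// Search for `pattern` only within `range` bytes taken from the start or the
// end of `contents` (the whole contents if it is shorter than `range`).
bool search_window(const std::string& contents, const std::string& pattern,
                   std::size_t range, SearchFrom from) {
    std::string window;
    if (contents.size() <= range)
        window = contents;
    else if (from == SearchFrom::Start)
        window = contents.substr(0, range);
    else
        window = contents.substr(contents.size() - range);
    return window.find(pattern) != std::string::npos;
}
```

      Searching from the end keeps the cost and the result independent of how much
      earlier tests have already written to the shared error log.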
  9. 27 Jun, 2014 2 commits
    • MDEV-6386: Assertion `thd->transaction.stmt.is_empty() || thd->in_sub_stmt || (thd->state_flags & Open_tables_state::BACKUPS_AVAIL)' fails with parallel replication · 370318f8
      Kristian Nielsen authored
      
      The direct cause of the assertion was missing error handling in
      record_gtid(). If ha_commit_trans() fails for the statement commit, there was
      missing code to catch the error and do ha_rollback_trans() in this case; this
      caused close_thread_tables() to assert.
      
      Normally, this error case is not hit, but in this case it was triggered due to
      another bug: When a transaction T1 fails during parallel replication, the code
      would signal following transactions that they could start to run without
      properly marking the error condition. This caused subsequent transactions to
      incorrectly start replicating, only to get an error later during their own
      commit step. This was particularly serious if the subsequent transactions were
      DDL or MyISAM updates, which cannot be rolled back and would leave replication
      in an inconsistent state.
      
      Fixed by 1) in case of error, only signal following transactions to continue
      once the error has been properly marked and those transactions will know not
      to start; and 2) implement proper error handling in record_gtid() in the case
      that statement commit fails.
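      The shape of fix 2) can be sketched with callbacks standing in for writing the
      GTID row, ha_commit_trans() and ha_rollback_trans(); record_gtid_sketch() and
      its parameters are hypothetical, and rolling back on a failed row write as
      well is an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>

// Hypothetical: if the statement does not commit, roll it back before tables
// are closed (otherwise close_thread_tables() sees an unfinished statement
// transaction), and return the error to the caller.
int record_gtid_sketch(const std::function<int()>& write_gtid_row,
                       const std::function<int()>& commit_stmt,
                       const std::function<void()>& rollback_stmt) {
    int err = write_gtid_row();
    if (!err)
        err = commit_stmt();
    if (err)
        rollback_stmt();
    return err;
}
```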
    • MDEV-6401 SET ROLE returning ERROR 1959 Invalid role specification for valid role · b9ddeeff
      Sergei Golubchik authored
      Use the user's IP address when verifying privileges for SET ROLE (just like check_access() does)
  10. 25 Jun, 2014 2 commits
    • MDEV-6120: When slave stops with error, error message should indicate the failing GTID · 86362129
      Kristian Nielsen authored
      If replication breaks in GTID mode, it is not trivial to determine the GTID of
      the failing event group. This is a problem, as this GTID is needed e.g. to
      explicitly set @@gtid_slave_pos to skip to after that event group, or to
      compare errors on different servers, etc.
      
      Fix by ensuring that relevant slave errors logged to the error log include the
      GTID of the event group containing the problem event.
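      A hypothetical sketch of adding the GTID (in MariaDB's domain_id-server_id-seq_no
      form) to an error message. format_slave_error() and the message wording are
      illustrative only; a null pointer models the case where no GTID is known.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct Gtid {
    std::uint32_t domain_id;
    std::uint32_t server_id;
    std::uint64_t seq_no;
};

// Append the failing event group's GTID when one is available.
std::string format_slave_error(const std::string& msg, const Gtid* gtid) {
    if (!gtid)
        return msg;
    return msg + ", Gtid " + std::to_string(gtid->domain_id) + "-" +
           std::to_string(gtid->server_id) + "-" + std::to_string(gtid->seq_no);
}
```

      With the GTID in the log, skipping past the failed event group amounts to
      setting @@gtid_slave_pos to that GTID.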
    • MDEV-5799: Error messages written upon LOST EVENTS incident are corrupted · 00467e13
      Kristian Nielsen authored
      This is MySQL Bug#59123. The message string stored in an INCIDENT event was
      not zero-terminated. This caused any following checksum bytes (if enabled on
      the master) to be output to the error log as trailing garbage when the message
      was printed to the error log.
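      One way to avoid printing trailing bytes is to always copy exactly the stored
      message length. The layout below (one length byte, then the message, then any
      following bytes such as a checksum) is an assumption for illustration, and
      read_length_prefixed() is a hypothetical helper, not the backported patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copy exactly the stored number of message bytes; never rely on a NUL
// terminator and never read past the end of the buffer.
std::string read_length_prefixed(const unsigned char* buf, std::size_t buf_len) {
    if (buf_len == 0)
        return std::string();
    std::size_t len = buf[0];
    if (len > buf_len - 1)
        len = buf_len - 1;  // truncated event: clamp to what is present
    return std::string(reinterpret_cast<const char*>(buf + 1), len);
}
```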
      
      Backport the patch from MySQL 5.6:
      
        revno: 2876.228.200
        revision-id: zhenxing.he@sun.com-20110111051323-w2xnzvcjn46x6h6u
        committer: He Zhenxing <zhenxing.he@sun.com>
        timestamp: Tue 2011-01-11 13:13:23 +0800
        message:
          BUG#59123 rpl_stm_binlog_max_cache_size fails sporadically with found warnings
      
      Also add a test case.
  11. 24 Jun, 2014 3 commits
  12. 23 Jun, 2014 1 commit
  13. 20 Jun, 2014 1 commit
  14. 18 Jun, 2014 4 commits
    • MDEV-6039 - WebScaleSQL patches · d2a4b785
      Sergey Vojtovich authored
      Stop spawning dummy threads on client library initialization
      
      Let's revert the fix for Bug#24507.  To quote Monty from 2006:
      
      "After 1/2 a year, when all glibc versions are updated, we can delete
      this code."
      
      Note: The upstream glibc bug was fixed in 2006.
    • MDEV-6039 - WebScaleSQL patches · b6c175aa
      Sergey Vojtovich authored
      Preserve CLIENT_REMEMBER_OPTIONS flag for compressed connections
      
      Code cleanup: removed reference to CLIENT_REMEMBER_OPTIONS from server
      code. This flag is ignored in MariaDB.
    • MDEV-6180: Error 1590 is not autoskippable · 643738ee
      unknown authored
      The INCIDENT_EVENT always caused slave error and abort, without checking
      --slave-skip-errors.
      
      Now, if error 1590 (ER_SLAVE_INCIDENT) is included in the --slave-skip-errors
      list, incident events will be ignored.
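      The new check amounts to consulting the skip list before treating an incident
      as fatal. In this hypothetical sketch the --slave-skip-errors list is modelled
      as a plain set of error codes, and on_incident_event() is an illustrative name.

```cpp
#include <cassert>
#include <set>

constexpr int kErSlaveIncident = 1590;  // ER_SLAVE_INCIDENT

enum class IncidentAction { StopWithError, Ignore };

// Ignore the incident only if its error code is in the skip list.
IncidentAction on_incident_event(const std::set<int>& skip_errors) {
    return skip_errors.count(kErSlaveIncident) ? IncidentAction::Ignore
                                               : IncidentAction::StopWithError;
}
```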
      
      This is a merge of this MySQL 5.6 patch:
      
      revision-id: frazer@mysql.com-20110314170916-ypgin17otj3ucx95
      committer: Frazer Clement <frazer@mysql.com>
      timestamp: Mon 2011-03-14 17:09:16 +0000
      message:
        Bug#11799671 NOT POSSIBLE TO SKIP INCIDENT ERRORS
    • MDEV-6039 - WebScaleSQL patches · da808ae0
      Sergey Vojtovich authored
      Use single quotes for perl paths, in case of special symbols
      
      Double-quoted string literals are subject to backslash and variable
      substitution.
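      Inside Perl single quotes there is no variable interpolation, and only \\ and \'
      are escape sequences. The hypothetical helper below goes one step beyond what
      the commit message describes: it also escapes those two characters, so that a
      path containing a quote or a trailing backslash still yields a valid literal.

```cpp
#include <cassert>
#include <string>

// Build a Perl single-quoted string literal for an arbitrary path.
std::string perl_single_quote(const std::string& path) {
    std::string out = "'";
    for (char c : path) {
        if (c == '\\' || c == '\'')
            out += '\\';  // the only two escapes Perl recognises in '...'
        out += c;
    }
    out += '\'';
    return out;
}
```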
  15. 10 Jun, 2014 1 commit
  16. 13 Jun, 2014 1 commit
  17. 12 Jun, 2014 1 commit
  18. 11 Jun, 2014 5 commits
  19. 09 Jun, 2014 2 commits
  20. 10 Jun, 2014 7 commits