1. 07 Jan, 2015 1 commit
    • MDEV-7326: Server deadlock in connection with parallel replication · f27817c1
      Kristian Nielsen authored
      The bug occurs when a transaction does a retry after all transactions have
      done mark_start_commit() in a batch of group commit from the master. In this
      case, the retrying transaction can unmark_start_commit() after the following
      batch has already started running and de-allocated the GCO. Then after retry,
      the transaction will re-do mark_start_commit() on a de-allocated GCO, and also
      wakeup of later GCOs can be lost.
      
      This was seen "in the wild" by a user, even though it is not known exactly
      what circumstances can lead to retry of one transaction after all transactions
      in a group have reached the commit phase.
      
      The lifetime around GCO was somewhat clunky anyway. With this patch, a GCO
      lives until rpl_parallel_entry::last_committed_sub_id has reached the last
      transaction in the GCO. This guarantees that the GCO will still be alive when
      a transaction does mark_start_commit(). Also, we now loop over the list of
      active GCOs for wakeup, to ensure we do not lose a wakeup even in the
      problematic case.
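The lifetime rule described in this commit can be illustrated with a minimal sketch. This is not the server code: `Gco`, `Entry`, `wait_count`, `count_committing`, and `finish_commit()` are simplified, hypothetical stand-ins. Only `mark_start_commit()`, `unmark_start_commit()`, and `last_committed_sub_id` are names taken from the commit message. The sketch assumes a GCO may start once enough earlier transactions have reached their commit phase.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

// Hypothetical, simplified model of a group commit orderer (GCO).
struct Gco {
    uint64_t last_sub_id;  // sub_id of the last transaction in this group
    uint64_t wait_count;   // start-commit count required before this group may run
    bool     woken;        // true once the group has been told it may start
};

// Hypothetical, simplified model of rpl_parallel_entry.
struct Entry {
    uint64_t last_committed_sub_id = 0;
    uint64_t count_committing = 0;   // transactions that reached the commit phase
    std::list<Gco> active_gcos;      // oldest group first

    // Loop over all active GCOs, so a wakeup is not lost even if the
    // counter went down (retry) and up again in the meantime.
    void wake_ready_gcos() {
        for (Gco &g : active_gcos)
            if (!g.woken && count_committing >= g.wait_count)
                g.woken = true;
    }

    void mark_start_commit() {
        ++count_committing;
        wake_ready_gcos();
    }

    // Called when a transaction has to roll back and retry.
    void unmark_start_commit() {
        assert(count_committing > 0);
        --count_committing;
    }

    // A GCO is released only once last_committed_sub_id has reached
    // the last transaction in that GCO.
    void finish_commit(uint64_t sub_id) {
        if (sub_id > last_committed_sub_id)
            last_committed_sub_id = sub_id;
        while (!active_gcos.empty() &&
               active_gcos.front().last_sub_id <= last_committed_sub_id)
            active_gcos.pop_front();
    }
};
```

In this model, a transaction that calls `unmark_start_commit()` and later `mark_start_commit()` again always finds its own GCO still in `active_gcos`, because the GCO cannot be released before that transaction itself has committed.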
  2. 06 Jan, 2015 2 commits
    • MDEV-7403: should not pass recv_writer_thread_handle to CloseHandle() · 4a325159
      Jan Lindström authored
      Analysis: For some reason, the actual thread handle was not
      returned on Windows. Instead, lpThreadId was returned and the
      thread handle was closed right after thread creation. Later,
      CloseHandle() was called on recv_writer_thread_handle
      and psort_info->thread_hdl.
      
      Fix: Return the thread handle from os_thread_create()
      on Windows as well, and store these thread handles
      in srv0start.cc so that they can be closed later.
    • MDEV-7353: rpl_mdev6386 fails sporadically in buildbot · 6e0a00ed
      Kristian Nielsen authored
      Use include/sync_with_master_gtid.inc instead of --sync_with_master to avoid a
      race in the test case.
      
      In parallel replication, the old-style slave position (which is used by
      --sync_with_master) is updated out-of-order between parallel threads. This
      makes it possible for the position to be updated past DROP TEMPORARY TABLE t2
      just before the commit of INSERT INTO t1 SELECT * FROM t2 becomes visible.
      
      In this case, there is a small window where a SELECT just after
      --sync_with_master may not see the changes from the INSERT.
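The race described in this commit can be pictured with a toy model. It is a sketch under stated assumptions, not the server's logic: `ToySlave` and all of its members are hypothetical. It assumes the old-style position is simply the highest end offset any worker has reported, and that a GTID-based wait completes only once every transaction up to the target has a visible commit.

```cpp
#include <cstdint>
#include <set>

// Hypothetical toy model of a slave applying transactions in parallel.
struct ToySlave {
    uint64_t old_style_pos = 0;    // highest end offset reported by any worker
    std::set<uint64_t> committed;  // sequence numbers with a visible commit
    uint64_t gtid_pos = 0;         // all of 1..gtid_pos are committed

    // Workers may report their end offset in any order.
    void advance_position(uint64_t end_pos) {
        if (end_pos > old_style_pos)
            old_style_pos = end_pos;
    }

    // Called when the commit of transaction seq_no becomes visible.
    void commit_visible(uint64_t seq_no) {
        committed.insert(seq_no);
        while (committed.count(gtid_pos + 1))
            ++gtid_pos;
    }

    // Position-based wait (the --sync_with_master style in this model).
    bool position_wait_done(uint64_t pos) const { return old_style_pos >= pos; }

    // GTID-based wait (the sync_with_master_gtid.inc style in this model).
    bool gtid_wait_done(uint64_t seq_no) const { return gtid_pos >= seq_no; }
};
```

If the INSERT is transaction 1 ending at offset 100 and the DROP TEMPORARY TABLE is transaction 2 ending at offset 200, then calling `advance_position(200)` before `commit_visible(1)` makes `position_wait_done(200)` return true while `gtid_wait_done(2)` is still false. That is the window in which a SELECT may not see the INSERT.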
  3. 30 Dec, 2014 1 commit
  4. 28 Dec, 2014 1 commit
  5. 19 Dec, 2014 1 commit
  6. 18 Dec, 2014 1 commit
    • MDEV-7342: Test failure in perfschema.setup_instruments_defaults · 826d7c68
      Kristian Nielsen authored
      Fix a possible race in the test case when restarting the server.
      
      Make sure we have disconnected before waiting for the reconnect
      that signals that the server is back up. Otherwise, we may in
      rare cases continue the test while the old server is shutting
      down, eventually leading to "connection lost" failure.
  7. 12 Dec, 2014 3 commits
  8. 10 Dec, 2014 1 commit
  9. 07 Dec, 2014 1 commit
  10. 05 Dec, 2014 2 commits
  11. 04 Dec, 2014 1 commit
  12. 03 Dec, 2014 6 commits
  13. 02 Dec, 2014 4 commits
  14. 03 Dec, 2014 1 commit
    • MDEV-4393: show_explain.test times out randomly · d79cce86
      Kristian Nielsen authored
      The problem was a race between the debug code in the server and the SHOW
      EXPLAIN FOR in the test case.
      
      The test case would wait for a query to reach the first point of interest
      (inside dbug_serve_apcs()), then send it a SHOW EXPLAIN FOR, then wait for the
      query to reach the next point of interest. However, the second wait was
      insufficient. It was possible for the second wait to complete immediately,
      causing both the first and the second SHOW EXPLAIN FOR to hit the same
      invocation of dbug_serve_apcs(). Then a later invocation would miss its
      intended SHOW EXPLAIN FOR and hang, and the test case would eventually time
      out.
      
      The fix is to make sure that the second wait cannot trigger during the first
      invocation of dbug_serve_apcs(). We do this by clearing the thd_proc_info
      (that the wait is looking for) before processing the SHOW EXPLAIN FOR; this
      way the second wait cannot start until the thd_proc_info from the first
      invocation has been cleared.
  15. 02 Dec, 2014 3 commits
  16. 01 Dec, 2014 5 commits
  17. 22 Nov, 2014 2 commits
  18. 01 Dec, 2014 1 commit
    • MDEV-7237: Parallel replication: incorrect relaylog position after stop/start the slave · 52b25934
      Kristian Nielsen authored
      The replication relay log position was sometimes updated incorrectly at the
      end of a transaction in parallel replication. This happened because the relay
      log file name was taken from the current Relay_log_info (SQL driver thread),
      not the correct value for the transaction in question.
      
      The result was that if a transaction was applied while the SQL driver thread
      was at least one relay log file ahead, _and_ the SQL thread was subsequently
      stopped before applying any events from the most recent relay log file, then
      the relay log position would be incorrect - wrong relay log file name. Thus,
      when the slave was started again, usually a relay log read error would result,
      or in rare cases, if the position happened to be readable, the slave might
      even skip an arbitrary number of events.
      
      In GTID mode, the relay log position is reset when both slave threads are
      restarted, so this bug would only be seen in non-GTID mode, or in GTID mode
      when only the SQL thread, not the IO thread, was stopped.
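The difference between the buggy and the corrected behaviour can be sketched as follows. All types and function names here (`RelayPos`, `DriverState`, `QueuedTrx`, `buggy_position()`, `fixed_position()`) are hypothetical illustrations, not the server's structures. The sketch assumes each queued transaction remembers the relay log file it was read from.

```cpp
#include <cstdint>
#include <string>

// Hypothetical representation of a relay log position.
struct RelayPos {
    std::string file;
    uint64_t    offset;
};

// Hypothetical state of the SQL driver thread, which reads ahead.
struct DriverState {
    std::string current_file;  // relay log file the driver is reading now
};

// Hypothetical per-transaction record, filled in when the transaction is queued.
struct QueuedTrx {
    std::string relay_file;    // file this transaction was read from
    uint64_t    end_offset;    // offset just past the transaction in that file
};

// Buggy variant: combines the driver's current file with the
// transaction's offset, which is wrong once the driver has moved on.
RelayPos buggy_position(const DriverState &d, const QueuedTrx &t) {
    return RelayPos{d.current_file, t.end_offset};
}

// Corrected variant: file name and offset both come from the transaction.
RelayPos fixed_position(const QueuedTrx &t) {
    return RelayPos{t.relay_file, t.end_offset};
}
```

With the driver already reading a later file than the one the transaction came from, `buggy_position()` pairs the newer file name with an offset that belongs to the older file, which matches the wrong-file-name symptom described above.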
  19. 28 Nov, 2014 1 commit
  20. 27 Nov, 2014 2 commits
    • MDEV-7037: MariaDB 10.0 does not build on Debian / kfreebsd-i386/amd64 due to... · 74e581b7
      Kristian Nielsen authored
      MDEV-7037: MariaDB 10.0 does not build on Debian / kfreebsd-i386/amd64 due to MTR failure: multi_source.gtid
      MDEV-7106: Sporadic test failure in multi_source.gtid
      MDEV-7153: Yet another sporadic failure of multi_source.gtid in buildbot
      
      This patch fixes three races in the multi_source.gtid test case that could
      cause sporadic failures:
      
      1. Do not put SHOW ALL SLAVES STATUS in the output, since its output is not stable.
      
      2. Ensure that slave1 has replicated as far as expected, before stopping its
      connection to master1 (otherwise the following wait will time out due to rows
      not replicated from master1).
      
      3. Ensure that slave2 has replicated far enough before connecting slave1 to it
      (otherwise we get an error during connect that slave1 is ahead of slave2).
    • Backporting a cleanup in boolean function from 10.1: · 5ae1639c
      Alexander Barkov authored
      Moving Item_bool_func2 and Item_func_opt_neg from Item_int_func to
      Item_bool_func. Now all functions that return is_bool_func()=true
      have a common root class Item_bool_func.
      This change is needed to fix MDEV-7149 properly.