1. 15 Jan, 2015 1 commit
    • MDEV-7430: rpl.rpl_gtid_crash still fails in buildbot · df2db863
      Kristian Nielsen authored
      The problem was a slave reconnect timeout that was too low. It was set to 9
      seconds (10 retries with 1 second in between). This is occasionally too short
      on some Buildbot hosts when the test crashes and restarts the master while the
      slave IO thread is running.
      
      Fix by increasing --master-retry-count for this test.
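The 9-second figure in the message above can be reproduced with a small sketch. It is a simplified model based only on the commit message (N retries spaced by a fixed interval leave N-1 gaps); the function name and the larger retry count are hypothetical, and per-attempt connection timeouts are not modeled.

```python
def reconnect_window_seconds(retry_count: int, retry_interval_s: int) -> int:
    """Approximate time spent retrying: N attempts have N-1 gaps between them.

    Simplified model; the server's real reconnect timing is not reproduced here.
    """
    if retry_count < 1:
        return 0
    return (retry_count - 1) * retry_interval_s


# Values from the commit message: 10 retries, 1 second apart.
print(reconnect_window_seconds(10, 1))  # 9

# A hypothetical larger retry count widens the window proportionally.
print(reconnect_window_seconds(60, 1))  # 59
```

Under this model, raising the retry count is the direct way to give a slow host more time to restart the master before the slave IO thread gives up.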
  2. 14 Jan, 2015 2 commits
  3. 13 Jan, 2015 1 commit
  4. 07 Jan, 2015 1 commit
    • MDEV-7326: Server deadlock in connection with parallel replication · f27817c1
      Kristian Nielsen authored
      The bug occurs when a transaction does a retry after all transactions have
      done mark_start_commit() in a batch of group commit from the master. In this
      case, the retrying transaction can unmark_start_commit() after the following
      batch has already started running and de-allocated the GCO. Then after retry,
      the transaction will re-do mark_start_commit() on a de-allocated GCO, and also
      wakeup of later GCOs can be lost.
      
      This was seen "in the wild" by a user, even though it is not known exactly
      what circumstances can lead to retry of one transaction after all transactions
      in a group have reached the commit phase.
      
      The lifetime around GCO was somewhat clunky anyway. With this patch, a GCO
      lives until rpl_parallel_entry::last_committed_sub_id has reached the last
      transaction in the GCO. This guarantees that the GCO will still be alive when
      a transaction does mark_start_commit(). Also, we now loop over the list of
      active GCOs for wakeup, to ensure we do not lose a wakeup even in the
      problematic case.
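The lifetime rule described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the server code: the `Gco` and `Entry` classes and their fields are hypothetical stand-ins, and only `last_committed_sub_id` is a name taken from the commit message. The wakeup loop over active GCOs is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gco:
    """Toy stand-in for a GCO covering a range of transaction sub_ids."""
    first_sub_id: int
    last_sub_id: int


@dataclass
class Entry:
    """Toy stand-in for rpl_parallel_entry (hypothetical fields)."""
    last_committed_sub_id: int = 0
    active_gcos: List[Gco] = field(default_factory=list)

    def commit(self, sub_id: int) -> None:
        # Record the commit, then release only those GCOs whose last
        # transaction has been committed.
        self.last_committed_sub_id = max(self.last_committed_sub_id, sub_id)
        self.active_gcos = [
            g for g in self.active_gcos
            if g.last_sub_id > self.last_committed_sub_id
        ]

    def is_alive(self, gco: Gco) -> bool:
        # Identity check: is this exact GCO object still on the active list?
        return any(g is gco for g in self.active_gcos)


entry = Entry(active_gcos=[Gco(1, 3), Gco(4, 6)])
first, second = entry.active_gcos

entry.commit(2)
print(entry.is_alive(first))   # True: transaction 3 has not committed yet

entry.commit(3)
print(entry.is_alive(first))   # False: the last transaction in the GCO committed
print(entry.is_alive(second))  # True
```

In this model a transaction with sub_id 3 that retries before committing still finds its GCO allocated, which is the guarantee the patch describes.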
  5. 06 Jan, 2015 2 commits
    • MDEV-7403: should not pass recv_writer_thread_handle to CloseHandle() · 4a325159
      Jan Lindström authored
      Analysis: For some reason, the actual thread handle was not
      returned on Windows; instead, lpThreadId was returned and
      the thread handle was closed right after thread creation.
      Later, CloseHandle() was called for recv_writer_thread_handle
      and psort_info->thread_hdl.
      
      Fix: Return the thread handle from os_thread_create()
      on Windows as well, and store these thread handles
      in srv0start.cc so that they can be closed later.
    • MDEV-7353: rpl_mdev6386 fails sporadically in buildbot · 6e0a00ed
      Kristian Nielsen authored
      Use include/sync_with_master_gtid.inc instead of --sync_with_master to avoid a
      race in the test case.
      
      In parallel replication, the old-style slave position (which is used by
      --sync_with_master) is updated out-of-order between parallel threads. This
      makes it possible for the position to be updated past DROP TEMPORARY TABLE t2
      just before the commit of INSERT INTO t1 SELECT * FROM t2 becomes visible.
      
      In this case, there is a small window where a SELECT just after
      --sync_with_master may not see the changes from the INSERT.
  6. 30 Dec, 2014 1 commit
  7. 28 Dec, 2014 1 commit
  8. 19 Dec, 2014 1 commit
  9. 18 Dec, 2014 1 commit
    • MDEV-7342: Test failure in perfschema.setup_instruments_defaults · 826d7c68
      Kristian Nielsen authored
      Fix a possible race in the test case when restarting the server.
      
      Make sure we have disconnected before waiting for the reconnect
      that signals that the server is back up. Otherwise, we may in
      rare cases continue the test while the old server is shutting
      down, eventually leading to "connection lost" failure.
  10. 12 Dec, 2014 3 commits
  11. 10 Dec, 2014 1 commit
  12. 07 Dec, 2014 1 commit
  13. 05 Dec, 2014 2 commits
  14. 04 Dec, 2014 1 commit
  15. 03 Dec, 2014 6 commits
  16. 02 Dec, 2014 4 commits
  17. 03 Dec, 2014 1 commit
    • MDEV-4393: show_explain.test times out randomly · d79cce86
      Kristian Nielsen authored
      The problem was a race between the debug code in the server and the SHOW
      EXPLAIN FOR in the test case.
      
      The test case would wait for a query to reach the first point of interest
      (inside dbug_serve_apcs()), then send it a SHOW EXPLAIN FOR, then wait for the
      query to reach the next point of interest. However, the second wait was
      insufficient. It was possible for the second wait to complete immediately,
      causing both the first and the second SHOW EXPLAIN FOR to hit the same
      invocation of dbug_serve_apcs(). Then a later invocation would miss its
      intended SHOW EXPLAIN FOR and hang, and the test case would eventually time
      out.
      
      The fix is to make sure that the second wait cannot trigger during the first
      invocation of dbug_serve_apcs(). We do this by clearing the thd_proc_info
      (that the wait is looking for) before processing the SHOW EXPLAIN FOR; this
      way the second wait cannot start until the thd_proc_info from the first
      invocation has been cleared.
  18. 02 Dec, 2014 3 commits
  19. 01 Dec, 2014 5 commits
  20. 22 Nov, 2014 2 commits