1. 07 May, 2024 1 commit
  2. 06 May, 2024 1 commit
    • MDEV-30929 spider.spider_fixes_part: wait and restart slave · 64314d30
      Yuchen Pei authored
      In the absence of insight into the cause of the spider.spider_fixes_part
      failure described in MDEV-30929, this is a workaround that could help
      determine whether the slave SQL thread attempts to read from a file
      that may not yet be on disk. It does not otherwise affect the coverage
      of the test.
      
      I have pushed this commit 4 times, but have yet to encounter the
      failure as described in MDEV-30929, so it could also fix the test and
      stop the CI pollution.
      
      Also replaced START SLAVE; with --source include/start_slave.inc
      inside the slave_test_init.inc files.
      64314d30
  3. 05 May, 2024 2 commits
    • MDEV-34042: Deadlock kill of XA PREPARE can break replication /... · 4b4db4a8
      Kristian Nielsen authored
      MDEV-34042: Deadlock kill of XA PREPARE can break replication / rpl.rpl_parallel_multi_domain_xa sporadic failure
      
      Refinement of the original patch.
      
      Move the code to reset the kill up into the parent class
      Xid_apply_log_event, to also fix the similar issue for XA COMMIT.
      
      Increase the number of slave retries in the test case
      rpl.rpl_parallel_multi_domain_xa to fix some sporadic failures. The test
      generates massive amounts of conflicting transactions in multiple
      independent domains, which can cause multiple rollback+retry for a
      transaction as it conflicts with transactions in other domains one-by-one.
      Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
      4b4db4a8
    • MDEV-33798: Follow-up patch · 2a2019e1
      Kristian Nielsen authored
      Don't deadlock kill event groups in other domains if they are not
      SPECULATE_OPTIMISTIC. Such event groups may not be able to safely roll back
      and retry (e.g. DDL).
      
      But do deadlock kill a transaction T2 from a blocked transaction U in another
      domain, even if T2 has lower sub_id than U. Otherwise, in case of a cycle
      T2->T1->U->T2, we might not break the cycle if U is not SPECULATE_OPTIMISTIC.
      Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
      2a2019e1
  4. 02 May, 2024 5 commits
    • sporadic failures of binlog_encryption.rpl_parallel_gco_wait_kill · 3ee6f69d
      Sergei Golubchik authored
      CURRENT_TEST: binlog_encryption.rpl_parallel_gco_wait_kill
      mysqltest: In included file "./suite/rpl/t/rpl_parallel_gco_wait_kill.test":
      included from /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/suite/binlog_encryption/rpl_parallel_gco_wait_kill.test at line 2:
      At line 334: Can't initialize replace from 'replace_result $thd_id THD_ID'
      
      An sql thread can reach the "Slave has read all relay log" state
      and then start reading relay log again. Let's use a more generic
      pattern to retrieve the sql thread ID even if it's not
      in the "read all relay log" state.
      3ee6f69d
    • MDEV-34042: Deadlock kill of XA PREPARE can break replication /... · 596921da
      Kristian Nielsen authored
      MDEV-34042: Deadlock kill of XA PREPARE can break replication / rpl.rpl_parallel_multi_domain_xa sporadic failure
      
      Clear any pending deadlock kill after completing XA PREPARE, and before
      updating the mysql.gtid_slave_pos table in a separate transaction.
      Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
      Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
      596921da
    • MDEV-33798: ROW base optimistic deadlock with concurrent writes on same table · e365877b
      Kristian Nielsen authored
      One case is conflicting transactions T1 and T2 with different domain id, in
      optimistic parallel replication in non-GTID mode. Then T2 will
      wait_for_prior_commit on T1; and if T1 got a row lock wait on T2 it would
      hang, as different domains caused the deadlock kill to be skipped in
      thd_rpl_deadlock_check().
      
      More generally, if we have transactions T1 and T2 in one domain/master
      connection, and independent transactions U in another, then we can
      still deadlock like this:
      
        T1 row lock wait on U
        U row lock wait on T2
        T2 wait_for_prior_commit on T1
      
      This commit enforces the deadlock kill in these cases. If the waited-for
      transaction is speculatively applied, then it will be deadlock killed in
      case of a conflict, even if the two transactions are in different domains
      or master connections.
      Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
      Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
      e365877b
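The three-way wait described in this commit message can be pictured as a cycle in a wait-for graph. The following is only an illustrative sketch, not the logic of thd_rpl_deadlock_check(): the `WaitGraph` type and the `in_wait_cycle()` helper are hypothetical names, and the model assumes each transaction waits on at most one other.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical model: waits_for[a] == b means "a is blocked waiting on b".
using WaitGraph = std::map<std::string, std::string>;

// Returns true if following the wait edges from `start` leads back to `start`.
bool in_wait_cycle(const WaitGraph &waits_for, const std::string &start)
{
  std::set<std::string> seen;
  std::string cur = start;
  for (;;)
  {
    auto it = waits_for.find(cur);
    if (it == waits_for.end())
      return false;            // the chain ends: somebody can make progress
    cur = it->second;
    if (cur == start)
      return true;             // we came back to where we started
    if (!seen.insert(cur).second)
      return false;            // a cycle exists, but `start` is not on it
  }
}
```

With the edges T1 -> U, U -> T2 and T2 -> T1, every transaction is on the cycle, so at least one of them has to be killed and retried for the others to proceed.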
    • MDEV-33543 Server hang caused by InnoDB change buffer · 90b95c61
      mariadb-DebarunBanerjee authored
      Issue: When getting a page (buf_page_get_gen) with no latch option
      (RW_NO_LATCH), the caller is not expected to follow the B-tree latching
      order. However in buf_page_get_low we try to acquire shared page latch
      unconditionally to wait for a page that is being loaded by another
      thread concurrently. In general it could lead to latch order violation
      and deadlock.
      
      Currently it affects the change buffer insert path btr_latch_prev(),
      which tries to load the previous page out of order with RW_NO_LATCH;
      two concurrent inserts into the IBUF tree then cause a deadlock. This
      problem was introduced in 10.6 by the following commit:
      commit 9436c778 (MDEV-27058)
      
      Fix: While trying to latch a page with RW_NO_LATCH, always use the
      "*lock_try" interface and retry operation on failure after unfixing the
      page.
      90b95c61
    • fix sporadic failures of main.lock_sync · 9dfef3fb
      Sergei Golubchik authored
      wait for all connections to disconnect before the cleanup
      9dfef3fb
  5. 30 Apr, 2024 8 commits
  6. 29 Apr, 2024 3 commits
    • Merge branch '10.5' into 10.6 · c1f3eff5
      Sergei Golubchik authored
      c1f3eff5
    • MDEV-30727 Check spider_hton_ptr in spider udfs · 267dd5a9
      Yuchen Pei authored
      We have to #undef my_error and find it from udfs when spider is not
      installed.
      267dd5a9
    • MDEV-33669 mariabackup --backup hangs · 52f6df99
      mariadb-DebarunBanerjee authored
      This is a server hang and not an issue with backup. While concurrent
      DDLs in the server are hung, mariabackup waits for the DDLs to
      finish while trying to acquire MDL_BACKUP_BLOCK_DDL.
      
      The server hang is serious in nature and caused by thread pool state
      being incorrectly set to thread creation pending state while no creation
      is actually pending. Once a thread pool reaches such state no new thread
      gets created in the pool.
      
      While it could possibly affect all thread pools in the server, the
      InnoDB thread pool is the victim in the current bug, where an IO job
      gets blocked when the pool is stuck with far fewer threads than
      intended. Available workers are blocked in purge waiting for a page
      lock to be released by IO write (SX lock), causing a complete deadlock.
      
      The issue is caused by the state variable m_thread_creation_pending
      introduced by MDEV-31095: 9e62ab7a. We check and set the variable
      early while attempting to create a new thread in the pool but fail to
      reset it if we exit the flow for other reasons, such as reaching the
      maximum number of threads or entering the thread creation throttling
      path.
      
      Fix: The simple fix is to make sure that the state is reset back in case
      we don't actually attempt to create the thread.
      52f6df99
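The flag handling described in this commit message can be sketched as follows. This is a simplified, single-threaded model under stated assumptions, not the server's thread pool: only the name m_thread_creation_pending comes from the commit message, while `PoolModel`, `maybe_add_thread()`, `on_thread_started()`, `throttled` and the other members are hypothetical.

```cpp
#include <cstddef>

// Hypothetical, single-threaded model of the bookkeeping. The real pool is
// concurrent; every name except m_thread_creation_pending is made up here.
struct PoolModel
{
  std::size_t threads = 0;
  std::size_t max_threads = 2;
  bool throttled = false;
  bool m_thread_creation_pending = false;

  // Returns true if a thread creation was actually started.
  bool maybe_add_thread()
  {
    if (m_thread_creation_pending)
      return false;                    // a creation is already in flight
    m_thread_creation_pending = true;  // check-and-set early

    if (threads >= max_threads || throttled)
    {
      // Early exit: no creation will happen, so clear the flag again.
      // Leaving it set here would block every later attempt.
      m_thread_creation_pending = false;
      return false;
    }
    return true;  // stays pending until on_thread_started() runs
  }

  // Called once the new thread is running.
  void on_thread_started()
  {
    ++threads;
    m_thread_creation_pending = false;
  }
};
```

In this model, a pool that hits `max_threads` and later has its limit raised can still create threads, because the early-exit branch leaves the flag cleared.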
  7. 28 Apr, 2024 2 commits
  8. 27 Apr, 2024 1 commit
    • MDEV-33534 UBSAN: Negation of -X cannot be represented in type 'long long... · 3141a68b
      Alexander Barkov authored
      MDEV-33534 UBSAN: Negation of -X cannot be represented in type 'long long int'; cast to an unsigned type to negate this value to itself in my_double_round from sql/item_func.cc
      
      The negation in this line:
        ulonglong abs_dec= dec_negative ? -dec : dec;
      did not take into account that 'dec' can be the smallest possible
      signed negative value -9223372036854775808. Negating it is
      undefined behavior.
      
      Fixing the code to use Longlong_hybrid, which implements a safe
      method to get an absolute value.
      3141a68b
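One common way to take an absolute value without negating a signed operand is to do the negation in unsigned arithmetic. The sketch below illustrates that general technique only; `safe_abs()` is a hypothetical helper and is not the Longlong_hybrid interface used by the commit.

```cpp
#include <climits>

// Hypothetical helper (not the Longlong_hybrid API): returns |value| as an
// unsigned 64-bit number without ever negating a signed operand.
unsigned long long safe_abs(long long value)
{
  // Converting to unsigned is well defined (modulo 2^64), and so is
  // subtracting from zero in unsigned arithmetic.
  unsigned long long u = static_cast<unsigned long long>(value);
  return value < 0 ? 0ULL - u : u;
}
```

For LLONG_MIN this yields 9223372036854775808, which fits in the unsigned type even though it has no signed counterpart.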
  9. 26 Apr, 2024 5 commits
    • sporadic failures of rpl.rpl_parallel_multi_domain_xa · 7ff64931
      Sergei Golubchik authored
      it's a slow test, the slave needs to catch up, reading >1500
      transactions. The default MASTER_GTID_WAIT() timeout in
      sync_with_master_gtid.inc is 120 seconds, which might not be
      enough for a slow/overloaded slave.
      
      Let's wait forever or until ./mtr --testcase-timeout,
      whichever comes first.
      7ff64931
    • MDEV-33574 Improve mysqlbinlog error message · 3d417476
      Hugo Wen authored
      Previously, when running mysqlbinlog without providing a binlog file, it
      would print the entire help text, which was very verbose and made it
      difficult to identify the actual issue.
      
      Now change the behavior to print a more concise error message instead:
      
          "ERROR: Please provide the log file(s). Run with '--help' for usage instructions."
      
      This makes the error output more user-friendly and easier to understand,
      especially when running the tool in scripts or automated processes.
      
      All new code of the whole pull request, including one or several files
      that are either new files or modified ones, are contributed under the
      BSD-new license. I am contributing on behalf of my employer
      Amazon Web Services, Inc.
      3d417476
    • Fixup 0ccdf54b · ef7a2344
      Daniele Sciascia authored
      0ccdf54b removed stack allocated THD objects from the function
      Wsrep_schema::replay_transaction(). However, it inadvertently
      brought forward the destruction of the THD, causing assertions and
      use of the THD after it was destroyed.
      The fix consists of extracting the original function into a separate
      function and leaving the allocation and destruction of the THD object
      in Wsrep_schema::replay_transaction(), making sure that using the heap
      allocated THD has no side effects.
      Same for Wsrep_schema::recover_sr_transactions().
      Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
      ef7a2344
    • MDEV-33492 fix installation of rpm/deb packages · 22a69c78
      Sergei Golubchik authored
      followup for 02715174
      22a69c78
    • Merge branch '10.6' into 10.11 · c9b1ebee
      Oleksandr Byelkin authored
      c9b1ebee
  10. 25 Apr, 2024 8 commits
    • MDEV-33896 : Galera test failure on galera_3nodes.MDEV-29171 · b3e531a3
      Jan Lindström authored
      Based on logs we might start SST before the donor has reached
      Primary state. Because this test shuts down all nodes, we
      need to make sure, when we start nodes, that previous nodes
      have reached Primary state and joined the cluster.
      Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
      b3e531a3
    • MDEV-26450 fixup: Remove a bogus assertion · 10d251e0
      Marko Mäkelä authored
      mtr_t::commit_shrink(): Do not assert that some previously clean pages
      will be flagged as modified by this mini-transaction. It could be the
      case that there had been no recent write-back of any of the undo
      tablespace pages that we are modifying when truncating the tablespace.
      It suffices to assert that some pages were modified again:
      ut_ad(m_modifications).
      
      This fixes up commit f5fddae3
      10d251e0
    • sporadic failures of rpl.rpl_parallel_sbm · 9e925820
      Sergei Golubchik authored
      the test waits for the event to get stuck on MASTER_DELAY,
      but on a slow/overloaded slave the event might pass MASTER_DELAY
      before the test starts waiting.
      
      Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY),
      the event cannot avoid that.
      9e925820
    • MDEV-33993 Possible server hang on DROP INDEX or RENAME INDEX · 0936c138
      Marko Mäkelä authored
      commit_try_norebuild(): Add the parameter statistics_exist,
      similar to commit_try_rebuild(). If the InnoDB statistics tables
      did not exist, we will not attempt to update statistics later on
      during the transaction.
      
      Thanks to Matthias Leich for originally reproducing this scenario.
      0936c138
    • MDEV-33602: Sporadic test failure in rpl.rpl_gtid_stop_start · 553a4d62
      Kristian Nielsen authored
      The test could fail with a duplicate key error because switching to non-GTID
      mode could start at the wrong old-style position. The position could be
      wrong when the previous GTID connect was stopped before receiving the fake
      GTID list event which gives the old-style position corresponding to the GTID
      connected position.
      
      Work-around by injecting an extra event and syncing the slave before
      switching to non-GTID mode.
      Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
      553a4d62
    • MDEV-33974 Enable GNU libstdc++ debugging · a1c1f502
      Marko Mäkelä authored
      Starting with GCC 10, let us enable _GLIBCXX_DEBUG as well as
      _GLIBCXX_ASSERTIONS which have an impact on the GNU libstdc++.
      On GCC 8, we observed a compilation failure related to some
      missing type conversion.
      
      Even though clang on GNU/Linux would default to using libstdc++
      and enabling the debugging seems to work with clang-18, we will
      not enable this on clang, in case it would lead to compilation
      errors.
      
      For the clang libc++ before clang-15 there was _LIBCPP_DEBUG,
      but according to
      llvm/llvm-project@f3966eaf869b7bdd9113ab9d5b78469eb0f5f028 and
      llvm/llvm-project@13ea1343231fa4ae12fe9fba4c789728465783d7 and
      llvm/llvm-project@ff573a42cd1f1d05508f165dc3e645a0ec17edb5 it
      looks like, for proper results, a specially built debug version
      of libc++ would have to be used in order to enable equivalent checks.
      
      This should help catch bugs like the one that
      commit 455a15fd fixed.
      
      Reviewed by: Sergei Golubchik
      a1c1f502
    • MDEV-33979 Disallow bulk insert operation during partition update statement · 8c8b7da0
      Thirunarayanan Balathandayuthapani authored
      Problem:
      ========
      - Partition update operation enables the bulk insert for the
      transaction while moving the row between partitions. This leads
      to a debug assertion failure while removing the row from one
      of the partitions.
      
      Solution:
      ========
      - Disallow the bulk insert operation for non-insert operations
      on partitioned tables.
      8c8b7da0
    • MDEV-23974 fixup: Cover all debug builds · 72293842
      Marko Mäkelä authored
      While commit 75b7cd68 was a significant
      improvement, we occasionally got test failures of debug builds. One of
      the affected tests is innodb.innodb-64k-crash.
      72293842
  11. 24 Apr, 2024 4 commits