1. 29 Apr, 2024 1 commit
    • mariadb-DebarunBanerjee's avatar
      MDEV-33669 mariabackup --backup hangs · 52f6df99
      mariadb-DebarunBanerjee authored
      This is a server hang and not an issue with backup. While concurrent
      DDLs in the server are hung, mariabackup waits for them to finish
      while trying to acquire MDL_BACKUP_BLOCK_DDL.
      
      The server hang is serious in nature and caused by thread pool state
      being incorrectly set to thread creation pending state while no creation
      is actually pending. Once a thread pool reaches such state no new thread
      gets created in the pool.
      
      While it could possibly affect all thread pools in the server, the InnoDB
      thread pool is the victim in the current bug: an IO job gets blocked when
      the pool is stuck with far fewer threads than intended.
      The available workers are blocked in purge, waiting for a page lock (SX lock)
      held by the IO write to be released, causing a complete deadlock.
      
      The issue is caused by the state variable m_thread_creation_pending
      introduced by MDEV-31095: 9e62ab7a. We check and set the variable
      early while attempting to create a new thread in the pool, but fail to
      reset it if we exit the flow for other reasons, such as reaching the
      maximum number of threads or entering the thread creation throttling path.
      
      Fix: The simple fix is to make sure that the state is reset back in case
      we don't actually attempt to create the thread.
      52f6df99
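A minimal single-threaded sketch of the failure mode described above. Only the name m_thread_creation_pending comes from the commit message; the toy_pool type, its other members, and add_thread() are hypothetical and do not reproduce the server's thread pool code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified pool. Real thread creation and locking
// are left out; only the flag handling is modelled.
struct toy_pool
{
  std::size_t m_threads= 0;
  std::size_t m_max_threads= 4;
  bool m_throttled= false;                // stands in for the throttling path
  bool m_thread_creation_pending= false;

  // Returns true if a thread creation was actually started.
  bool add_thread()
  {
    if (m_thread_creation_pending)
      return false;                       // a creation is already in flight
    m_thread_creation_pending= true;      // check and set early

    if (m_threads >= m_max_threads || m_throttled)
    {
      // The fix: no creation is attempted on this path, so reset the flag.
      // Without this reset, every later call would return early above.
      m_thread_creation_pending= false;
      return false;
    }

    m_threads++;                          // stands in for creating a thread
    m_thread_creation_pending= false;     // creation finished
    return true;
  }
};
```

With the reset in place, a call that is rejected by throttling leaves the pool able to grow again once throttling ends.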
  2. 26 Apr, 2024 1 commit
    • Daniele Sciascia's avatar
      Fixup 0ccdf54b · ef7a2344
      Daniele Sciascia authored
      0ccdf54b removed stack-allocated THD objects from the function
      Wsrep_schema::replay_transaction(). However, it inadvertently
      moved the destruction of the THD earlier, causing assertions and use
      of the THD after it was destroyed.
      The fix consists of extracting the original function body into a separate
      function, and leaving the allocation and destruction of the THD object
      in Wsrep_schema::replay_transaction(), making sure that using the
      heap-allocated THD has no side effects.
      Same for Wsrep_schema::recover_sr_transactions().
      Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
      ef7a2344
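The ownership pattern described in this fixup can be sketched as follows. This is an illustration under assumptions, not the server code: toy_thd, replay_transaction_impl() and the live counter are hypothetical, and std::unique_ptr is used here only to make the lifetime explicit.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for THD: it tracks how many instances are alive.
struct toy_thd
{
  int *live;
  explicit toy_thd(int *live_counter) : live(live_counter) { ++*live; }
  ~toy_thd() { --*live; }
  toy_thd(const toy_thd&)= delete;
  toy_thd &operator=(const toy_thd&)= delete;
};

// The extracted body: it only borrows the object and never destroys it.
static int replay_transaction_impl(toy_thd &thd)
{
  return *thd.live > 0 ? 0 : 1;           // 0 means success
}

// The outer function keeps both the allocation and the destruction, so the
// object outlives every use made of it by the extracted body.
int replay_transaction(int *live_counter)
{
  std::unique_ptr<toy_thd> thd(new toy_thd(live_counter));
  int err= replay_transaction_impl(*thd);
  return err;                             // thd is destroyed here, after its last use
}
```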
  3. 25 Apr, 2024 2 commits
  4. 23 Apr, 2024 2 commits
    • Monty's avatar
      Check and remove high stack usage · 0ccdf54b
      Monty authored
      I checked all potential stack overflow problems found with
      gcc -Wstack-usage=16384
      and
      clang -Wframe-larger-than=16384 -no-inline
      
      Fixes:
      - Added '#pragma clang diagnostic ignored "-Wframe-larger-than="'
        to a lot of functions where stack usage is large but reasonable.
      - Added stack check warnings to BUILD scripts when using clang and debug.
      
      Functions changed to use malloc() instead of allocating things on the stack:
      - read_bootstrap_query() now allocates line_buffer (20000 bytes) with
        malloc() instead of using the stack. This has a small performance impact
        but this is not relevant for bootstrap.
      - mroonga grn_select() used 65856 bytes on stack. Changed it to use
        malloc().
      - Wsrep_schema::replay_transaction() and
        Wsrep_schema::recover_sr_transactions().
      - Connect zipOpen3()
      
      Not fixed:
      - mroonga/vendor/groonga/lib/expr.c grn_proc_call() uses
        43712 bytes on stack.  However this is not easy to fix as the stack
        used is caused by a lot of code generated by defines.
      - Most changes in mroonga/groonga were only additions of pragmas to disable
        stack warnings.
      - rocksdb/options/options_helper.cc uses 20288 bytes of stack space.
        (no reason to fix except to get rid of the compiler warning)
      - Cases using alloca() where the allocation size is reasonable.
      - An issue in libmariadb (reported to connectors).
      0ccdf54b
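The kind of change listed under the functions switched to malloc() can be sketched like this. The function process_line() and its behaviour are hypothetical; only the 20000-byte line_buffer size is taken from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Size taken from the commit message; everything else here is illustrative.
static const std::size_t LINE_BUFFER_SIZE= 20000;

// Copies `input` through a scratch buffer and returns the copied length,
// or -1 if the allocation fails. Before such a change the buffer would be
// `char line_buffer[LINE_BUFFER_SIZE];`, adding 20000 bytes to the frame.
long process_line(const char *input)
{
  char *line_buffer= static_cast<char*>(std::malloc(LINE_BUFFER_SIZE));
  if (!line_buffer)
    return -1;

  std::strncpy(line_buffer, input, LINE_BUFFER_SIZE - 1);
  line_buffer[LINE_BUFFER_SIZE - 1]= '\0';   // strncpy may not terminate
  long len= static_cast<long>(std::strlen(line_buffer));

  std::free(line_buffer);                    // every exit path must free
  return len;
}
```

The trade-off is the one the message names: one allocation and one free per call, in exchange for a stack frame that stays under the warning threshold.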
    • Marko Mäkelä's avatar
      07faba08
  5. 22 Apr, 2024 2 commits
  6. 21 Apr, 2024 3 commits
  7. 20 Apr, 2024 5 commits
  8. 19 Apr, 2024 6 commits
    • Sergei Golubchik's avatar
      MDEV-33952 galera_create_table_as_select fails sporadically · 4a2e0345
      Sergei Golubchik authored
      disable until fixed
      4a2e0345
    • Zhibo Zhang's avatar
      Update tests to be compatible with OpenSSL 3.2.0 · 7432a487
      Zhibo Zhang authored
      As of version 3.2.0, OpenSSL changed an error message
      ("https://github.com/openssl/openssl/commit/81b741f68984"). Update the
      tests and result files such that they are compatible with both the original
      and the new error messages.
      
      All new code of the whole pull request, including one or several files that are
      either new files or modified ones, are contributed under the BSD-new
      license. I am contributing on behalf of my employer Amazon Web Services,
      Inc.
      7432a487
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · 15b607b5
      Marko Mäkelä authored
      15b607b5
    • Marko Mäkelä's avatar
      MDEV-33946: OPT_PAGE_CHECKSUM mismatch due to mtr_t::memmove() · 4c343394
      Marko Mäkelä authored
      mtr_t::memmove(): Revert to the parent of
      commit a032f14b
      where there was supposed to be an equivalent change
      that would avoid hitting a warning in some old version of GCC
      when this change was part of another 10.6 based development branch.
      
      For some reason, this change is not equivalent but will cause
      massive amounts of backup failures in the stress tests
      run by Matthias Leich, caught by
      commit 4179f93d in 10.6.
      4c343394
    • Marko Mäkelä's avatar
      MDEV-33325 fixup · ec7db2bd
      Marko Mäkelä authored
      ibuf_remove_free_page(): Correct the calculation of root_savepoint().
      The first entry acquired by ibuf_tree_root_get() will be ibuf.index.lock
      and not the change buffer root page.
      
      Thanks to Matthias Leich for finding this bug in RQG.
      Unfortunately, this code is very difficult to cover
      in our regression test suite.
      ec7db2bd
    • Marko Mäkelä's avatar
      MDEV-32791 MariaDB cannot be installed on Red Hat ubi9 · 8e663f5e
      Marko Mäkelä authored
      The libpmem dependency that had been added in
      commit 3daef523 (MDEV-17084)
      did not achieve any measurable performance improvement when
      comparing the same PMEM device with and without "mount -o dax"
      using the Linux ext4 file system.
      
      Because Red Hat has deprecated libpmem, let us remove the code
      altogether.
      
      Note: This is a 10.6 version of
      commit 3f9f5ca4
      which will retain PMEM support in MariaDB Server 10.11.
      8e663f5e
  9. 18 Apr, 2024 3 commits
    • Vladislav Vaintroub's avatar
      MDEV-16944 postfix. Fix a typo · 2e84560d
      Vladislav Vaintroub authored
      2e84560d
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · bb2e125d
      Marko Mäkelä authored
      This excludes commit 040069f4
      because it is specific to innodb_sync_debug, which had been removed
      in commit ff5d306e.
      bb2e125d
    • mariadb-DebarunBanerjee's avatar
      MDEV-32489 Change buffer index fails to delete the records · 5928e04d
      mariadb-DebarunBanerjee authored
      When the change buffer records for a page span across multiple change
      buffer leaf pages or the starting record is at the beginning of a page
      with a left sibling, ibuf_delete_recs deletes only the records in first
      page and fails to move to subsequent pages.
      
      Subsequently a slow shutdown hangs trying to delete those left over
      records.
      
      Fix-A: Position the cursor on a user record in the B-tree and exit only
      when all records are exhausted.

      Fix-B: Make sure we call ibuf_delete_recs during slow shutdown for
      pages with IBUF entries to clean up any previously left over records.
      5928e04d
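The idea behind Fix-A can be modelled on a toy leaf chain. This is a sketch under assumptions: leaf_page and delete_recs() below are hypothetical and only model sorted records spread over sibling pages, not the InnoDB cursor API used by ibuf_delete_recs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each leaf page holds sorted keys; a key is the number of the page that a
// buffered change belongs to. The outer vector is the left-to-right chain.
using leaf_page= std::vector<int>;

// Deletes every record for `target` and returns how many were deleted.
std::size_t delete_recs(std::vector<leaf_page> &leaves, int target)
{
  std::size_t deleted= 0;
  for (std::size_t p= 0; p < leaves.size(); p++)
  {
    leaf_page &leaf= leaves[p];
    std::size_t i= 0;
    while (i < leaf.size())
    {
      if (leaf[i] < target) { i++; continue; }  // still before the range
      if (leaf[i] > target)
        return deleted;                         // past the range: exhausted
      leaf.erase(leaf.begin() + static_cast<std::ptrdiff_t>(i));
      deleted++;
    }
    // Reaching the end of a page is not a reason to stop: the matching
    // records may continue on the right sibling.
  }
  return deleted;
}
```

Stopping at the end of the first page instead of moving to the sibling would leave the records on the later pages behind, which is the symptom the message describes.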
  10. 17 Apr, 2024 14 commits
    • Brandon Nesterenko's avatar
      MDEV-27512: Assertion !thd->transaction_rollback_request failed in rows_event_stmt_cleanup · 0ad52e4d
      Brandon Nesterenko authored
      If replicating an event in ROW format, and InnoDB detects a deadlock
      while searching for a row, the row event will error and rollback in
      InnoDB and indicate that the binlog cache also needs to be cleared,
      i.e. by marking thd->transaction_rollback_request. In the normal
      case, this will trigger an error in Rows_log_event::do_apply_event()
      and cause a rollback. During the Rows_log_event::do_apply_event()
      cleanup of a successful event application, there is a DBUG_ASSERT in
      log_event_server.cc::rows_event_stmt_cleanup(), which sets the
      expectation that thd->transaction_rollback_request cannot be set
      because the general rollback (i.e. not the InnoDB rollback) should
      have happened already. However, if the replica is configured to skip
      deadlock errors, the rows event logic will clear the error and
      continue on, as if no error happened. This results in
      thd->transaction_rollback_request being set while in
      rows_event_stmt_cleanup(), thereby triggering the assertion.
      
      This patch fixes this in the following ways:
       1) The assertion is invalid, and thereby removed.
       2) The rollback case is forced in rows_event_stmt_cleanup() if
      transaction_rollback_request is set.
      
      Note the differing behavior between transactions which are skipped
      due to deadlock errors and other errors. When a transaction is
      skipped due to an ignored deadlock error, the entire transaction is
      rolled back and skipped (though note MDEV-33930 which allows
      statements in the same transaction after the deadlock-inducing one
      to commit). When a transaction is skipped due to ignoring a
      different error, only the erroring statements are rolled-back and
      skipped - the rest of the transaction will execute as normal. The
      effect of this can be seen in the test results. The added test case
      to rpl_skip_error.test shows that only statements which are ignored
      due to non-deadlock errors are ignored in larger transactions. A
      diff between rpl_temporary_error2_skip_all.result and
      rpl_temporary_error2.result shows that all statements in the errored
      transaction are rolled back (diff pasted below):
      
      : diff rpl_temporary_error2.result rpl_temporary_error2_skip_all.result
      49c49
      < 2	1
      ---
      > 2	NULL
      51c51
      < 4	1
      ---
      > 4	NULL
      53c53
      < * There will be two rows in t2 due to the retry.
      ---
      > * There will be one row in t2 because the ignored deadlock does not retry.
      57d56
      < 1
      59c58
      < 1
      ---
      > 0
      
      Reviewed By:
      ============
      Andrei Elkin <andrei.elkin@mariadb.com>
      0ad52e4d
    • Vladislav Vaintroub's avatar
      MDEV-16944 Fix file sharing issues on Windows in mysqltest · 061adae9
      Vladislav Vaintroub authored
      On Windows systems, occurrences of ERROR_SHARING_VIOLATION due to
      conflicting share modes between processes accessing the same file can
      result in CreateFile failures.
      
      mysys' my_open() already incorporates a workaround by implementing
      wait/retry logic on Windows.
      
      But this does not help if files are opened using shell redirection, as
      mysqltest traditionally did, i.e. via

      --echo exec "some text" > output_file

      In such cases, it is cmd.exe that opens the output_file, and it
      won't do any sharing-violation retries.
      
      This commit addresses the issue by introducing a new built-in command,
      'write_line', in mysqltest. This new command serves as a brief alternative
      to 'write_file', with a single line output, that also resolves variables
      like "exec" would.
      
      Internally, this command will use my_open(), and therefore retry-on-error
      logic.
      
      Hopefully this will eliminate the very sporadic "can't open file because
      it is used by another process" error on CI.
      061adae9
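The wait/retry idea referred to above can be sketched in portable C++. This is not the my_open() implementation: open_with_retry(), its parameters and the callback are hypothetical, and the callback stands in for a real open attempt that can fail with a sharing violation.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Calls try_open until it succeeds or max_attempts is reached, sleeping
// for `delay` between attempts. Returns true on success.
bool open_with_retry(const std::function<bool()> &try_open,
                     int max_attempts,
                     std::chrono::milliseconds delay)
{
  for (int attempt= 1; attempt <= max_attempts; attempt++)
  {
    if (try_open())
      return true;
    if (attempt < max_attempts)
      std::this_thread::sleep_for(delay);   // give the other process time
  }
  return false;
}
```

A file opened by a shell redirection gets no such loop, which is why moving the write into the test tool itself helps.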
    • Vladislav Vaintroub's avatar
      Remove duplicate key "Language" from .clang-format · b48de973
      Vladislav Vaintroub authored
      The latest Visual Studio complains about an invalid format, which breaks
      formatting in the IDE.
      b48de973
    • Vladislav Vaintroub's avatar
      Do not run maria_recover_encrypted with embedded. · 173847b7
      Vladislav Vaintroub authored
      It uses shutdown/restart etc., features not compatible with the embedded server.

      Also add have_debug.inc, since it uses the debug_dbug variable.
      173847b7
    • Vladislav Vaintroub's avatar
      Fix LTO (aka interprocedural optimization) build with MSVC · e87a175b
      Vladislav Vaintroub authored
      Also, disable MSVC LTO for static client libraries - they won't be usable
      for end-users.
      e87a175b
    • Marko Mäkelä's avatar
      MDEV-33779 InnoDB row operations could be faster · e459ce83
      Marko Mäkelä authored
      We have quite a few assertions
      	ut_a(m_prebuilt->trx == thd_to_trx(ha_thd()));
      in low-level functions.
      These had better be debug assertions for performance reasons.
      It should suffice to check that condition in the less frequently invoked
      ha_innobase::change_active_index().
      
      convert_search_mode_to_innobase(): Return whether the mode is
      unsupported, and optionally update ha_innobase::m_last_match_mode.
      
      ha_innobase::index_read(): Only branch on find_flag once, and
      simplify the error handling after invoking row_search_mvcc().
      
      ha_innobase::rnd_pos(): Remove an assertion that is duplicating one
      in ha_innobase::index_read(), which we are calling unconditionally.
      
      ha_innobase::records_in_range(): Check only once whether
      min_key, max_key are null pointers.
      
      row_sel_convert_mysql_key_to_innobase(): Declare all parameters
      except the conversion buffer pointer (buf) to be nonnull.
      
      Reviewed by: Debarun Banerjee
      e459ce83
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · 829cb1a4
      Marko Mäkelä authored
      829cb1a4
    • Marko Mäkelä's avatar
      MDEV-33855 MSAN use-of-uninitialized-value in rtr_pcur_getnext_from_path() · 46e9e92e
      Marko Mäkelä authored
      rtr_pcur_getnext_from_path(): Remove a bogus assertion
      that may cause a data race with buf_LRU_block_free_non_file_page().
      
      If my_latch_mode == BTR_MODIFY_LEAF, we would have released all page
      latches and buffer-fixes by invoking mtr->rollback_to_savepoint(1).
      After this point, the btr_cur->page_cur.block is no longer valid and
      must not be accessed.
      
      Before 03ca6495 this assertion had
      been disabled, because the preprocessor symbol UNIV_RTR_DEBUG
      had never been enabled (except when explicitly specified in
      CMAKE_CXX_FLAGS).
      
      Reviewed by: Debarun Banerjee
      46e9e92e
    • mariadb-DebarunBanerjee's avatar
      MDEV-33431 Latching order violation reported fil_system.sys_space.latch and... · 040069f4
      mariadb-DebarunBanerjee authored
      MDEV-33431 Latching order violation reported fil_system.sys_space.latch and ibuf_pessimistic_insert_mutex
      
      Issue:
      ------
      The actual order of acquisition of the IBUF pessimistic insert mutex
      (SYNC_IBUF_PESS_INSERT_MUTEX) and IBUF header page latch
      (SYNC_IBUF_HEADER) w.r.t. the space latch (SYNC_FSP) differs from the order
      defined in sync0types.h. It was not discovered earlier as the path to
      ibuf_remove_free_page was not covered by the mtr tests. The ideal order,
      as defined in sync0types.h, is as follows.
      SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX -> SYNC_FSP
      
      In ibuf_remove_free_page, we acquire space latch earlier and we have
      the order as follows resulting in the assert with innodb_sync_debug=on.
      SYNC_FSP -> SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX
      
      Fix:
      ---
      We do maintain this order in other places and there doesn't seem to be
      any real issue here. To reduce impact in GA versions, we avoid doing
      extensive changes in mutex ordering to match the current
      SYNC_IBUF_PESS_INSERT_MUTEX order. Instead we relax the ordering check
      for IBUF pessimistic insert mutex using SYNC_NO_ORDER_CHECK.
      040069f4
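A toy model of a level-based latching order check and of an exemption from it. The numeric levels, the enum names and the order_checker type are hypothetical; the sketch only illustrates how such a check accepts the order HEADER -> PESS_INSERT_MUTEX -> FSP, rejects the reverse, and skips latches registered without an order check.

```cpp
#include <cassert>
#include <vector>

// Hypothetical levels: a latch may only be acquired while every latch
// already held has a strictly higher level.
enum latch_level
{
  NO_ORDER_CHECK= 0,          // exempt from the check
  FSP= 1,
  IBUF_PESS_INSERT_MUTEX= 2,
  IBUF_HEADER= 3
};

struct order_checker
{
  std::vector<int> held;

  // Returns true and records the latch if the acquisition respects the
  // order; returns false on a violation.
  bool acquire(int level)
  {
    if (level != NO_ORDER_CHECK)
      for (int h : held)
        if (h != NO_ORDER_CHECK && h <= level)
          return false;
    held.push_back(level);
    return true;
  }
};
```

Registering a latch with the exempt level trades away detection of real ordering bugs involving that latch, which is why the message frames it as a way to limit changes in GA versions.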
    • Vladislav Vaintroub's avatar
      MDEV-33840 tpool- switch to longer maintainence timer interval, if pool is idle · f6e9600f
      Vladislav Vaintroub authored
      The previous solution, which would switch the timer off entirely, turned out
      to be deadlock prone.

      This patch fixes the previous attempt to switch between long/short interval
      periods in MDEV-24295. Now, the initial state of the timer is fixed (it is ON).
      Also, avoid switching the timer to longer periods if there is any activity in
      the pool.
      f6e9600f
    • Vladislav Vaintroub's avatar
      2ba79aba
    • Marko Mäkelä's avatar
      Merge 10.4 into 10.5 · 3a3fe300
      Marko Mäkelä authored
      3a3fe300
    • Marko Mäkelä's avatar
      Tests: remove a duplicated check · 9164c2b8
      Marko Mäkelä authored
      This fixes up the merge commit 9b182756
      9164c2b8
    • Jan Lindström's avatar
      MDEV-33895 : Galera test failure on galera_sr.MDEV-25718 · 4aeba259
      Jan Lindström authored
      The test was waiting for the INSERT statement to roll back, but the
      wait_condition was too strict. The state could be
      "Freeing items" or "Rollback". Fixed the wait_condition
      to expect either of them.
      4aeba259
  11. 16 Apr, 2024 1 commit