1. 16 May, 2024 5 commits
  2. 24 Apr, 2024 3 commits
  3. 04 Apr, 2024 2 commits
    • Merge 10.11 into 11.0 · e3ac7b80
      Marko Mäkelä authored
      e3ac7b80
    • MDEV-33545: Improve innodb_doublewrite to cover NO_FSYNC · 1122ac97
      Marko Mäkelä authored
      In commit 24648768 (MDEV-30136)
      the parameter innodb_flush_method was deprecated, with no direct
      replacement for innodb_flush_method=O_DIRECT_NO_FSYNC.
      
      Let us change innodb_doublewrite from Boolean to ENUM that can
      be changed while the server is running:
      
      OFF: Assume that writes of innodb_page_size are atomic
      ON: Prevent torn writes (the default)
      fast: Like ON, but avoid synchronizing writes to data files
      
      The deprecated start-up parameter innodb_flush_method=O_DIRECT_NO_FSYNC will cause
      innodb_doublewrite=ON to be changed to innodb_doublewrite=fast,
      which will prevent InnoDB from making any durable writes to data files.
      This would normally be done right before the log checkpoint LSN is updated.
      Depending on the file systems being used and their configuration,
      this may or may not be safe.
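
The mapping described above can be sketched in C++ roughly as follows. This is only an illustration: the names `doublewrite_mode`, `adjust_for_flush_method` and `syncs_data_files` are hypothetical and are not the identifiers used in InnoDB.

```cpp
#include <cassert>

// Hypothetical model of the three innodb_doublewrite settings.
enum class doublewrite_mode { OFF, ON, FAST };

// Sketch of the start-up adjustment: the deprecated no-fsync flush method
// turns ON into FAST; OFF and FAST are left unchanged.
constexpr doublewrite_mode adjust_for_flush_method(doublewrite_mode mode,
                                                   bool no_fsync_requested)
{
  return (no_fsync_requested && mode == doublewrite_mode::ON)
    ? doublewrite_mode::FAST : mode;
}

// Whether data files are synchronized before the checkpoint LSN advances.
// Per the description above, only FAST skips that step.
constexpr bool syncs_data_files(doublewrite_mode mode)
{
  return mode != doublewrite_mode::FAST;
}
```

Under these assumptions, `adjust_for_flush_method(doublewrite_mode::ON, true)` yields `FAST`, while an explicit `OFF` is never changed.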
      
      The value innodb_doublewrite=fast differs from the previous combination of
      innodb_doublewrite=ON and innodb_flush_method=O_DIRECT_NO_FSYNC by always
      invoking os_file_flush() on the doublewrite buffer itself
      in buf_dblwr_t::flush_buffered_writes_completed(). This should be safer
      when there are multiple doublewrite batches between checkpoints.
      Typically, once per second, buf_flush_page_cleaner() would write out
      up to innodb_io_capacity pages and advance the log checkpoint.
      Also typically, innodb_io_capacity>128, which is the size of the
      doublewrite buffer in pages. Should os_file_flush_func() not be invoked
      between doublewrite batches, writes could be reordered in an unsafe way.
      
      The setting innodb_doublewrite=fast could be safe when the doublewrite
      buffer (the first file of the system tablespace) and the data files
      reside in the same file system.
      
      This was tested by running "./mtr --rr innodb.alter_kill". On the first
      server startup, with innodb_doublewrite=fast, os_file_flush_func()
      would only be invoked on the ibdata1 file and possibly ib_logfile0.
      On subsequent startups with innodb_doublewrite=OFF, os_file_flush_func()
      will be invoked on the individual data files during log_checkpoint().
      
      Note: The setting debug_no_sync (in the code, my_disable_sync) would
      disable all durable writes to InnoDB files, which would be much less safe.
      
      IORequest::Type: Introduce special values WRITE_DBL and PUNCH_DBL
      for asynchronous writes that are submitted via the doublewrite buffer.
      In this way, fil_space_t::use_doublewrite() or buf_dblwr.in_use()
      will only be consulted during buf_page_t::flush() and the doublewrite
      buffer can be enabled or disabled without any fear of inconsistency.
      
      buf_dblwr_t::block_size: Replaces block_size().
      
      buf_dblwr_t::flush_buffered_writes(): If !in_use() and the doublewrite
      buffer is empty, just invoke fil_flush_file_spaces() and return. The
      doublewrite buffer could have been disabled while a batch was in
      progress.
      
      innodb_init_params(): If innodb_flush_method=O_DIRECT_NO_FSYNC,
      set innodb_doublewrite=fast or innodb_doublewrite=fearless.
      
      Thanks to Mark Callaghan for reporting this, and Vladislav Vaintroub
      for feedback.
      1122ac97
  4. 03 Apr, 2024 2 commits
    • Columnstore empty submodule fix 2 · af4df93c
      Aleksey Midenkov authored
      The original problem was an error when configuring without an
      initialized Columnstore submodule:
      
        The source directory
      
          /home/midenok/src/mariadb/10.6c/src/storage/columnstore/columnstore
      
        does not contain a CMakeLists.txt file.
      
      The original fix disabled Columnstore build when PLUGIN_COLUMNSTORE
      was not defined, but this seems to be wrong and any plugin should be
      built if it is not explicitly disabled. This is expected by buildbots.
      
      Thanks to Vladislav Vaintroub <vvaintroub@gmail.com> for the fix
      af4df93c
    • Jan Lindström's avatar
      baec63e3
  5. 02 Apr, 2024 1 commit
  6. 01 Apr, 2024 3 commits
    • Columnstore empty submodule fix · 099ca49c
      Aleksey Midenkov authored
      CMake doesn't set ${PLUGIN_COLUMNSTORE} to anything.
      099ca49c
    • MDEV-29872 MSAN/Valgrind uninitialised value errors in TABLE::vers_switch_partition · c4776974
      Aleksey Midenkov authored
      Delayed_insert has its own THD (initialized at mysql_insert()) and
      hence its own LEX. Delayed_insert initializes very few parameters of
      the LEX, and 'duplicates' is not among them. Now we copy this missing
      parameter from the parser LEX (as well as sql_command).
      c4776974
    • MDEV-31903 Server crashes in _ma_reset_history upon UNLOCK table with auto-create history partitions · d966e55c
      Aleksey Midenkov authored
      When INSERT does auto-create for t1, all its handler instances are
      closed by alter_close_table(). At this time, down the stack,
      maria_close() clears share->state_history. Later, when we unlock the
      tables, the Aria transaction manager accesses the old share instance
      (the one before t1 was closed) and tries to reset its state_history.
      
      The problem is that maria_close() didn't remove the table from the
      transaction's list (used_tables). The fix calls
      _ma_remove_table_from_trnman(), which is triggered by
      HA_EXTRA_PREPARE_FOR_RENAME.
      d966e55c
  7. 28 Mar, 2024 4 commits
    • Merge 10.11 into 11.0 · fec2fd6a
      Marko Mäkelä authored
      fec2fd6a
    • MDEV-33515 fixup for POWER · a79fb66a
      Marko Mäkelä authored
      a79fb66a
    • Merge 10.6 into 10.11 · 78895346
      Marko Mäkelä authored
      Some fixes related to commit f838b2d7 and
      Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row()
      for system-versioned tables were provided by Nikita Malyavin.
      This was required by test versioning.rpl,trx_id,row.
      78895346
    • Enable mini-benchmark to run with perf · 6efa75a8
      Robin Newhouse authored
      The mini-benchmark.sh script failed to run in the latest Fedora
      distributions in GitLab CI. Executing the benchmark inside a Docker
      container had failed because the check for `perf` was done in a way that
      caused the benchmark to exit because of the `set -e` option. Test for
      `perf` and skip it when unavailable, allowing the remaining benchmark
      activities to proceed.
      
      This check was added in acb6684 but inadvertently reverted in 42a1f94.
      
      Logic was corrected to only run perf when the flag is enabled, and to
      prevent perf stat and perf record from being simultaneously enabled.
      
      `set -ex` is also added to make it easier to identify mini-benchmark
      issues in the future.
      
      All new code of the whole pull request, including one or several files
      that are either new files or modified ones, are contributed under the
      BSD-new license. I am contributing on behalf of my employer Amazon Web
      Services, Inc.
      6efa75a8
  8. 27 Mar, 2024 8 commits
    • Merge 10.5 into 10.6 · ccb7a1e9
      Marko Mäkelä authored
      ccb7a1e9
    • MDEV-33772 Bad SEPARATOR value in GROUP_CONCAT on character set conversion · 0fc123c5
      Alexander Barkov authored
      Item_func_group_concat::print() did not take into account
      that Item_func_group_concat::separator can be of a different character set
      than the "String *str" (the string that the output is printed to).
      Therefore, printing did not work correctly for:
      - non-ASCII separators when GROUP_CONCAT is done on 8bit data
        or multi-byte data with mbminlen==1.
      - all separators (even including simple ones like comma)
        when GROUP_CONCAT is done on ucs2/utf16/utf32 data (mbminlen>1).
      
      Because of this problem, VIEW definitions did not print correctly to
      their FRM files. This later led to a wrong SELECT and SHOW CREATE output.
      
      Fix:
      
      - Adding new String methods:
      
        bool append_for_single_quote_using_mb_wc(const char *str, size_t length,
                                                 CHARSET_INFO *cs);
      
        bool append_for_single_quote_opt_convert(const char *str,
                                                 size_t length,
                                                 CHARSET_INFO *cs);
      
        which perform both escaping and character set conversion at the same time.
      
      - Adding a new String method escaped_wc_for_single_quote(),
        to reuse the code between the old and the new methods.
      
      - Fixing Item_func_group_concat::print() to use the new
        method append_for_single_quote_opt_convert().
      0fc123c5
    • MDEV-33515 fixup: Clarify mtr_t::spin_wait_delay · 0c6cac0a
      Marko Mäkelä authored
      innodb_log_spin_wait_delay_update(): Always acquire log_sys.latch
      to protect the change of mtr_t::spin_wait_delay.
      
      log_t::lock_lsn(): In the general case, actually use
      mtr_t::spin_wait_delay as it was intended. In the x86 specific
      log_t::lock_lsn_bts() we used mtr_t::spin_wait_delay.
      0c6cac0a
    • MDEV-33460 select '123' 'x'; unexpected result · 58df2097
      Dave Gosselin authored
      Queries that select concatenated constant strings now have
      colname and value that match.  For example,
        SELECT '123' 'x';
      will return a result where the column name and value both
      are '123x'.
      
      Review: Daniel Black
      58df2097
    • MDEV-33301 memlock with systemd still not working · 76a27155
      Daniel Black authored
      .. even with MDEV-9095 fix
      
      CapabilityBounding sets require filesystem setcap attributes
      for the executable to gain privileges during execution.
      
      A side effect of this, however, is that getauxval(AT_SECURE) gets
      set, and the secure_getenv() call in OpenSSL internals then ignores
      the OPENSSL_CONF environment variable (openssl gh issue 21770).
      
      According to capabilities(7), ambient capabilities don't cause
      ld.so to enter the secure-execution mode.
      
      Include SELinux and AppArmor capabilities for ipc_lock.
      76a27155
    • Revert "MDEV-33636: RPM caps is on mariadbd exe" · ee2ed1a0
      Daniel Black authored
      This was the original implementation that reverted with a bunch of
      commits.
      
      This reverts commit a13e521b.
      
      Revert "cmake: append to the array correctly"
      This reverts commit 51e3f1da.
      
      Revert "build failure with cmake < 3.10"
      This reverts commit 49cf702e.
      
      Revert "MDEV-33301 memlock with systemd still not working"
      This reverts commit 8a1904d7.
      ee2ed1a0
    • MDEV-33039 Galera test failure on mysql-wsrep-features#165 · c5ac9836
      Jan Lindström authored
      We should not set debug sync point when holding a mutex
      to avoid mutex ordering failure.
      Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
      c5ac9836
    • MDEV-33136: Properly BF-abort user transactions with explicit locks · 7bf3c312
      Denis Protivensky authored
      User transactions may acquire explicit MDL locks from InnoDB level
      when persistent statistics are re-read for a table.
      If such a transaction would be subject to BF-abort, it was improperly
      detected as a system transaction and wouldn't get aborted.
      
      The fix: Check if a transaction holding explicit MDL locks is a user
      transaction in the MDL conflict handling code.
      Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
      7bf3c312
  9. 26 Mar, 2024 2 commits
    • MDEV-33506 Show original IP in the "aborted" message. · 318000cf
      Vladislav Vaintroub authored
      Add a "real ip:<ip_or_localhost>" part to the aborted message.
      This is done only for proxy-protocoled connections, so as not to
      cause confusion to normal users.
      318000cf
    • MDEV-33278 : Assertion failure in thd_get_thread_id at lock_wait_wsrep · b762541d
      Jan Lindström authored
      The problem is that not all conflicting transactions have a THD object.
      Therefore, we must check that the victim has a THD before its
      identification is added to the victim list, because the victim's
      thread identification is later requested using the
      thd_get_thread_id() function, which requires a valid pointer to
      a THD object in trx->mysql_thd.
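
A minimal C++ sketch of the kind of check described above. The types `THD_stub` and `trx_stub` and the function `collect_victims` are hypothetical stand-ins for illustration, not the server's actual THD, trx_t or wsrep code.

```cpp
#include <cassert>
#include <vector>

struct THD_stub { unsigned long thread_id; };  // hypothetical stand-in for THD
struct trx_stub { THD_stub *mysql_thd; };      // hypothetical stand-in for trx_t

// Collect the thread identifiers of conflicting transactions, skipping
// those that have no connection attached (mysql_thd is null).
std::vector<unsigned long>
collect_victims(const std::vector<trx_stub> &conflicting)
{
  std::vector<unsigned long> victims;
  for (const trx_stub &trx : conflicting)
    if (trx.mysql_thd)  // check before dereferencing
      victims.push_back(trx.mysql_thd->thread_id);
  return victims;
}
```

A transaction without a THD is simply left out of the victim list here, so nothing later asks for its thread identifier.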
      
      Victim might not have trx->mysql_thd in two cases:
      
      (1) An incomplete transaction that was recovered from undo logs
      on server startup (and not yet rolled back).
      
      (2) Transaction that is in XA PREPARE state and whose client
      connection was disconnected.
      
      Neither of these can complete before lock_wait_wsrep()
      releases lock_sys.latch.
      
      (1) trx_t::commit_in_memory() is clearing both
      trx_t::state and trx_t::is_recovered before it invokes
      lock_release(trx_t*) (which would be blocked by the exclusive
      lock_sys.latch that we are holding here). Hence, it is not
      possible to write a debug assertion to document this scenario.
      
      (2) If the transaction is in XA PREPARE state, it would eventually
      be rolled back or committed, and the lock conflict would be resolved
      when an XA COMMIT or XA ROLLBACK statement is executed in some other
      connection.
      Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
      b762541d
  10. 25 Mar, 2024 4 commits
  11. 22 Mar, 2024 5 commits
    • MDEV-32364 fixup: crash in ut_dontdump() · 70b90772
      Marko Mäkelä authored
      70b90772
    • MDEV-33591 MONITOR_INC_VALUE_CUMULATIVE is executed regardless of "if" condition · f0590db5
      Marko Mäkelä authored
      MONITOR_INC_VALUE_CUMULATIVE is a multiline macro, so the second statement
      will be executed always, regardless of "if" condition.
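
The general pitfall can be reproduced with a small C++ sketch. The macros below are hypothetical and much simpler than the real MONITOR_INC_VALUE_CUMULATIVE. They only show why a macro that expands to two statements escapes an unbraced "if", and how the usual do { } while (0) wrapper avoids that.

```cpp
#include <cassert>

#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
// GCC can diagnose this exact mistake; silenced so that the sketch builds quietly.
# pragma GCC diagnostic ignored "-Wmultistatement-macros"
#endif

// Hypothetical two-statement macro: only the first statement is guarded
// by an unbraced "if" at the call site.
#define BUMP_BOTH_UNSAFE(a, b) (a)++; (b)++
// The same macro wrapped so that it behaves like a single statement.
#define BUMP_BOTH_SAFE(a, b) do { (a)++; (b)++; } while (0)

struct counters { int first = 0; int second = 0; };

counters bump_unsafe(bool enabled)
{
  counters c;
  if (enabled)
    BUMP_BOTH_UNSAFE(c.first, c.second);  // c.second is bumped unconditionally
  return c;
}

counters bump_safe(bool enabled)
{
  counters c;
  if (enabled)
    BUMP_BOTH_SAFE(c.first, c.second);    // both bumps obey the condition
  return c;
}
```

With `enabled == false`, `bump_unsafe()` still increments `second`, while `bump_safe()` leaves both counters at zero.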
      
      These problems first started with
      commit b1ab211d (MDEV-15053).
      
      Thanks to Yury Chaikou from ServiceNow for the report.
      f0590db5
    • MDEV-33454 release row locks for non-modified rows at XA PREPARE · 17e59ed3
      Marko Mäkelä authored
      From the correctness point of view, it should be safe to release
      all locks on index records that were not modified by the transaction.
      Doing so should make the locks after XA PREPARE fully compatible
      with what would happen if the server were restarted: InnoDB table
      IX locks and exclusive record locks would be resurrected based on
      undo log records.
      
      Concurrently running transactions that are waiting for a lock may invoke
      lock_rec_convert_impl_to_expl() to create an explicit record lock object
      on behalf of the lock-owning transaction so that they can attach
      their waiting lock request to the explicit record lock object. Explicit
      locks would be released by trx_t::release_locks() during commit or
      rollback.
      
      Any clustered index record whose DB_TRX_ID belongs to a transaction that
      is in active or XA PREPARE state will be implicitly locked by that
      transaction. On XA PREPARE, we can release explicit exclusive locks on
      records whose DB_TRX_ID does not match the current transaction identifier.
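
The rule in the preceding paragraph can be illustrated with a small C++ sketch. The structure `rec_lock_stub` and the function `release_unmodified` are hypothetical. The real lock_rec_unlock_unmodified() operates on InnoDB lock bitmaps and index pages, not on a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical explicit record lock that remembers the DB_TRX_ID
// currently stored in the locked clustered index record.
struct rec_lock_stub { std::uint64_t heap_no; std::uint64_t rec_trx_id; };

// Keep only the locks on records that the preparing transaction modified
// (DB_TRX_ID equals its own identifier). Returns the number released.
std::size_t release_unmodified(std::vector<rec_lock_stub> &locks,
                               std::uint64_t my_trx_id)
{
  auto first_released = std::remove_if(
    locks.begin(), locks.end(),
    [my_trx_id](const rec_lock_stub &l) { return l.rec_trx_id != my_trx_id; });
  const std::size_t released =
    static_cast<std::size_t>(locks.end() - first_released);
  locks.erase(first_released, locks.end());
  return released;
}
```

Locks on records that carry the transaction's own DB_TRX_ID stay in place, matching what would be resurrected from undo log records after a restart.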
      
      lock_rec_unlock_unmodified(): Release record locks that are not implicitly
      held by the current transaction.
      
      lock_release_on_prepare_try(), lock_release_on_prepare():
      Invoke lock_rec_unlock_unmodified().
      
      row_trx_id_offset(): Declare non-static.
      
      lock_rec_unlock(): Replaces lock_rec_unlock_supremum().
      
      Reviewed by: Vladislav Lesin
      17e59ed3
    • MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool · fa8a46eb
      Marko Mäkelä authored
      By design, InnoDB has always hung when permanently running out of
      buffer pool, for example when several threads are waiting to allocate
      a block, and all of the buffer pool is buffer-fixed by the active threads.
      
      The hang that we are fixing here occurs when the buffer pool is only
      temporarily running out and the situation could be rescued by writing out
      some dirty pages or evicting some clean pages.
      
      buf_LRU_get_free_block(): Simplify how we wait for
      the buf_flush_page_cleaner thread. This fixes occasional hangs
      of the test encryption.innochecksum that were introduced by
      commit a55b951e (MDEV-26827).
      To play it safe, we use a timed wait when waiting for the
      buf_flush_page_cleaner() thread to perform its job. Should that
      thread get stuck, we will invoke buf_pool.LRU_warn() in order to
      display a message that pages could not be freed, and keep trying
      to wake up the buf_flush_page_cleaner() thread.
      
      The INFORMATION_SCHEMA.INNODB_METRICS counters
      buffer_LRU_single_flush_failure_count and
      buffer_LRU_get_free_waits will be removed.
      The latter is represented by buffer_pool_wait_free.
      
      Also removed will be the message
      "InnoDB: Difficult to find free blocks in the buffer pool"
      because in d34479dc we
      introduced a more precise message
      "InnoDB: Could not free any blocks in the buffer pool"
      in the buf_flush_page_cleaner thread.
      
      buf_pool_t::LRU_warn(): Issue the warning message that we could
      not free any blocks in the buffer pool. This may also be invoked
      by buf_LRU_get_free_block() if buf_flush_page_cleaner() appears
      to be stuck.
      
      buf_pool_t::n_flush_dec(): Remove.
      
      buf_pool_t::n_flush_dec_holding_mutex(): Rename to n_flush_dec().
      
      buf_flush_LRU_list_batch(): Increment the eviction counter for blocks
      of temporary, discarded or dropped tablespaces.
      
      buf_flush_LRU(): Make static, and remove the constant parameter
      evict=false. The only caller will be the buf_flush_page_cleaner()
      thread.
      
      IORequest::is_LRU(): Remove. The only case of evicting pages on
      write completion will be when we are writing out pages of the
      temporary tablespace. Those pages are not in buf_pool.flush_list,
      only in buf_pool.LRU.
      
      buf_page_t::flush(): Remove the parameter evict.
      
      buf_page_t::write_complete(): Change the parameter "bool temporary"
      to "bool persistent" and add a parameter for an already read state().
      
      Reviewed by: Debarun Banerjee
      fa8a46eb
    • MDEV-33515 log_sys.lsn_lock causes excessive context switching · bf0b82d2
      Marko Mäkelä authored
      The log_sys.lsn_lock is a very contended resource with a small
      critical section in log_sys.append_prepare(). On many processor
      microarchitectures, replacing the system call based log_sys.lsn_lock
      with a pure spin lock would fare worse during high concurrency workloads,
      wasting a significant amount of CPU cycles in the spin loop.
      
      On other microarchitectures, we would see a significant amount of time
      being spent in native_queued_spin_lock_slowpath() in the Linux kernel,
      plus context switching between user and kernel address space. This was
      pointed out by Steve Shaw from Intel Corporation.
      
      Depending on the workload and the hardware implementation, it may be
      useful to use a pure spin lock in log_sys.append_prepare().
      We will introduce a parameter. The statement
      
      	SET GLOBAL INNODB_LOG_SPIN_WAIT_DELAY=50;
      
      would enable a spin lock that will execute that many MY_RELAX_CPU()
      operations (such as the x86 PAUSE instruction) between successive
      attempts of acquiring the spin lock. The use of a system call based
      log_sys.lsn_lock (which is the default setting) can be enabled by
      
      	SET GLOBAL INNODB_LOG_SPIN_WAIT_DELAY=0;
      
      This patch will also introduce #ifdef LOG_LATCH_DEBUG
      (part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON) for more accurate
      tracking of log_sys.latch ownership and reorganize the fields of
      log_sys to improve the locality of reference and to reduce the
      chances of false sharing.
      
      When a spin lock is being used, it will be maintained in the
      most significant bit of log_sys.buf_free. This is useful, because that is
      one of the fields that is covered by the lock. For IA-32 or AMD64, we
      implement the spin lock specially via log_t::lsn_lock_bts(), employing the
      i386 LOCK BTS instruction. A straightforward std::atomic::fetch_or() would
      translate into an inefficient loop around LOCK CMPXCHG.
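
As a portable illustration of keeping a spin lock in the most significant bit of a field that the lock protects, consider the following C++ sketch. It uses the straightforward std::atomic::fetch_or() approach that the text contrasts with the x86-specific LOCK BTS implementation. The class `msb_locked_counter` is hypothetical, and a compiler barrier stands in for MY_RELAX_CPU().

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: a counter whose most significant bit is a spin lock.
class msb_locked_counter
{
  static constexpr std::size_t LOCK_BIT = ~(~std::size_t{0} >> 1);
  std::atomic<std::size_t> word{0};

public:
  // Spin until the lock bit was clear at the moment our fetch_or() set it.
  void lock(unsigned spin_wait_delay) noexcept
  {
    while (word.fetch_or(LOCK_BIT, std::memory_order_acquire) & LOCK_BIT)
      for (unsigned i = 0; i < spin_wait_delay; i++)
        std::atomic_signal_fence(std::memory_order_seq_cst);  // pause stand-in
  }

  // Advance the counter; the caller must hold the lock, so the lock bit
  // stays set in the stored value.
  void add(std::size_t n) noexcept
  {
    word.store(word.load(std::memory_order_relaxed) + n,
               std::memory_order_relaxed);
  }

  // Clear the lock bit and publish the protected update.
  void unlock() noexcept
  {
    word.fetch_and(~LOCK_BIT, std::memory_order_release);
  }

  bool is_locked() const noexcept
  {
    return (word.load(std::memory_order_relaxed) & LOCK_BIT) != 0;
  }

  std::size_t value() const noexcept
  {
    return word.load(std::memory_order_relaxed) & ~LOCK_BIT;
  }
};
```

Because the lock bit and the protected value share one word, acquiring the lock already brings the value's cache line to the acquiring thread.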
      
      mtr_t::spin_wait_delay: The value of innodb_log_spin_wait_delay.
      
      mtr_t::finisher: Pointer to the currently used mtr_t::finish_write()
      implementation. This allows us to avoid introducing conditional branches.
      We no longer invoke log_sys.is_pmem() at the mini-transaction level,
      but we would do that in log_write_up_to().
      
      mtr_t::finisher_update(): Update finisher when spin_wait_delay is
      changed from or to 0 (the spin lock is changed to log_sys.lsn_lock or
      vice versa).
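
The idea of selecting an implementation once, when the setting changes, rather than branching on every call, might look like the following C++ sketch. The names `mini_mtr`, `finish_path` and `set_spin_wait_delay` are hypothetical, and the two finish routines are placeholders that only report which path was taken.

```cpp
#include <cassert>

enum class finish_path { SYSTEM_LOCK, SPIN_LOCK };

// Hypothetical miniature of a writer that keeps a pointer to the
// currently selected finish routine.
struct mini_mtr
{
  using finisher_t = finish_path (*)();

  static unsigned spin_wait_delay;  // 0 selects the system call based lock
  static finisher_t finisher;       // currently selected implementation

  static finish_path finish_with_system_lock() { return finish_path::SYSTEM_LOCK; }
  static finish_path finish_with_spin_lock() { return finish_path::SPIN_LOCK; }

  // Re-select the implementation that matches the current setting.
  static void finisher_update()
  {
    finisher = spin_wait_delay ? finish_with_spin_lock : finish_with_system_lock;
  }

  // Only a change from or to 0 switches the implementation.
  static void set_spin_wait_delay(unsigned delay)
  {
    const bool was_spinning = spin_wait_delay != 0;
    spin_wait_delay = delay;
    if (was_spinning != (delay != 0))
      finisher_update();
  }
};

unsigned mini_mtr::spin_wait_delay = 0;
mini_mtr::finisher_t mini_mtr::finisher = mini_mtr::finish_with_system_lock;
```

Callers would invoke `mini_mtr::finisher()` directly, so the hot path has no test of the parameter.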
      bf0b82d2
  12. 21 Mar, 2024 1 commit
    • MDEV-33551: Semi-sync Wait Point AFTER_COMMIT Slow on Workloads with Heavy Concurrency · 75c7c6dc
      Brandon Nesterenko authored
      When using semi-sync replication with
      rpl_semi_sync_master_wait_point=AFTER_COMMIT, the performance of the
      primary can degrade significantly compared to AFTER_SYNC's
      performance for workloads with many concurrent users executing
      transactions. This is because all connections on the primary share
      the same cond_wait variable/mutex pair, so any time an ACK is
      received from a replica, all waiting connections are awoken to check
      whether the ACK was for them, which is done in mutual exclusion.
      
      This patch changes this such that the waiting THD will use its own
      local condition variable, and the ACK receiver thread only signals
      connections which have been ACKed for wakeup. That is, the
      THD::LOCK_wakeup_ready condition variable is re-used for this
      purpose, and the Active_tranx queue nodes are extended to hold the
      waiting thread, so it can be signalled once ACKed.
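
The difference between waking everyone and waking only the ACKed waiters can be sketched in C++ as follows. The types `ack_waiter` and `ack_queue` are hypothetical. The actual patch reuses THD::LOCK_wakeup_ready and the Active_tranx queue rather than introducing new classes.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical waiter record: one per connection awaiting an ACK.
struct ack_waiter
{
  explicit ack_waiter(std::uint64_t pos) : log_pos(pos) {}
  const std::uint64_t log_pos;   // position the connection waits for
  bool acked = false;            // protected by ack_queue's mutex
  std::condition_variable cond;  // private to this waiter
};

class ack_queue
{
  std::mutex mutex;
  std::deque<ack_waiter*> waiting;  // ordered by ascending log_pos

public:
  void enqueue(ack_waiter &w)
  {
    std::lock_guard<std::mutex> guard(mutex);
    waiting.push_back(&w);
  }

  // ACK receiver: signal only the waiters covered by this ACK and
  // return how many were signalled.
  std::size_t ack_up_to(std::uint64_t pos)
  {
    std::lock_guard<std::mutex> guard(mutex);
    std::size_t signalled = 0;
    while (!waiting.empty() && waiting.front()->log_pos <= pos)
    {
      ack_waiter *w = waiting.front();
      waiting.pop_front();
      w->acked = true;
      w->cond.notify_one();
      ++signalled;
    }
    return signalled;
  }

  // Connection thread: block on its own condition variable until ACKed.
  void wait(ack_waiter &w)
  {
    std::unique_lock<std::mutex> lock(mutex);
    w.cond.wait(lock, [&w] { return w.acked; });
  }
};
```

In this sketch, a waiter whose position is beyond the ACK is never woken, so it does not contend for the mutex just to find out that the ACK was for someone else.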
      
      Additionally:
      
       1)  Removed part of MDEV-11853 additions, which allowed suspended
      connection threads awaiting their semi-sync ACKs to live until their
      ACKs had been received. This part, however, wasn't needed.  That is,
      all that was needed was for the Ack_thread to survive.  So now the
      connection threads are killed during phase 1. Thereby
      THD::is_awaiting_semisync_ack, and all its related code was removed.
      
       2) COND_binlog_send is repurposed to signal on the condition when
      Active_tranx is emptied during clear_active_tranx_nodes.
      
       3) At master shutdown (when waiting for slaves), instead of the
      main loop individually waiting for each ACK, await_slave_reply()
      (renamed await_all_slave_replies()) just waits once for the
      repurposed COND_binlog_send to signal it is empty.
      
       4) Test rpl_semi_sync_shutdown_await_ack is updated as follows:
         4.1) Added test case (adapted from Kristian Nielsen) to ensure
      that if a thread awaiting its ACK is killed while SHUTDOWN WAIT FOR
      ALL SLAVES is issued, the primary will still wait for the ACK from
      the killed thread.
         4.2) As connections which by-passed phase 1 of thread killing no
      longer are delayed for kill until phase 2, we can no longer query
      yes/no tx after receiving an ACK/timeout. The check for these
      variables is removed.
         4.3) Comment descriptions that mention the connection being
      alive are updated to refer to the Ack_thread instead.
      
      Reviewed By:
      ============
      Kristian Nielsen <knielsen@knielsen-hq.org>
      75c7c6dc