1. 22 Mar, 2024 2 commits
    • Marko Mäkelä's avatar
      MDEV-33454 release row locks for non-modified rows at XA PREPARE · 17e59ed3
      Marko Mäkelä authored
      From the correctness point of view, it should be safe to release
      all locks on index records that were not modified by the transaction.
      Doing so should make the locks after XA PREPARE fully compatible
      with what would happen if the server were restarted: InnoDB table
      IX locks and exclusive record locks would be resurrected based on
      undo log records.
      
      Concurrently running transactions that are waiting for a lock may invoke
      lock_rec_convert_impl_to_expl() to create an explicit record lock object
      on behalf of the lock-owning transaction so that they can attaching
      their waiting lock request on the explicit record lock object. Explicit
      locks would be released by trx_t::release_locks() during commit or
      rollback.
      
      Any clustered index record whose DB_TRX_ID belongs to a transaction that
      is in active or XA PREPARE state will be implicitly locked by that
      transaction. On XA PREPARE, we can release explicit exclusive locks on
      records whose DB_TRX_ID does not match the current transaction identifier.
      
      lock_rec_unlock_unmodified(): Release record locks that are not implicitly
      held by the current transaction.
      
      lock_release_on_prepare_try(), lock_release_on_prepare():
      Invoke lock_rec_unlock_unmodified().
      
      row_trx_id_offset(): Declare non-static.
      
      lock_rec_unlock(): Replaces lock_rec_unlock_supremum().
      
      Reviewed by: Vladislav Lesin
      17e59ed3
    • Marko Mäkelä's avatar
      MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool · fa8a46eb
      Marko Mäkelä authored
      By design, InnoDB has always hung when permanently running out of
      buffer pool, for example when several threads are waiting to allocate
      a block, and all of the buffer pool is buffer-fixed by the active threads.
      
      The hang that we are fixing here occurs when the buffer pool is only
      temporarily running out and the situation could be rescued by writing out
      some dirty pages or evicting some clean pages.
      
      buf_LRU_get_free_block(): Simplify the way how we wait for
      the buf_flush_page_cleaner thread. This fixes occasional hangs
      of the test encryption.innochecksum that were introduced by
      commit a55b951e (MDEV-26827).
      To play it safe, we use a timed wait when waiting for the
      buf_flush_page_cleaner() thread to perform its job. Should that
      thread get stuck, we will invoke buf_pool.LRU_warn() in order to
      display a message that pages could not be freed, and keep trying
      to wake up the buf_flush_page_cleaner() thread.
      
      The INFORMATION_SCHEMA.INNODB_METRICS counters
      buffer_LRU_single_flush_failure_count and
      buffer_LRU_get_free_waits will be removed.
      The latter is represented by buffer_pool_wait_free.
      
      Also removed will be the message
      "InnoDB: Difficult to find free blocks in the buffer pool"
      because in d34479dc we
      introduced a more precise message
      "InnoDB: Could not free any blocks in the buffer pool"
      in the buf_flush_page_cleaner thread.
      
      buf_pool_t::LRU_warn(): Issue the warning message that we could
      not free any blocks in the buffer pool. This may also be invoked
      by buf_LRU_get_free_block() if buf_flush_page_cleaner() appears
      to be stuck.
      
      buf_pool_t::n_flush_dec(): Remove.
      
      buf_pool_t::n_flush_dec_holding_mutex(): Rename to n_flush_dec().
      
      buf_flush_LRU_list_batch(): Increment the eviction counter for blocks
      of temporary, discarded or dropped tablespaces.
      
      buf_flush_LRU(): Make static, and remove the constant parameter
      evict=false. The only caller will be the buf_flush_page_cleaner()
      thread.
      
      IORequest::is_LRU(): Remove. The only case of evicting pages on
      write completion will be when we are writing out pages of the
      temporary tablespace. Those pages are not in buf_pool.flush_list,
      only in buf_pool.LRU.
      
      buf_page_t::flush(): Remove the parameter evict.
      
      buf_page_t::write_complete(): Change the parameter "bool temporary"
      to "bool persistent" and add a parameter for an already read state().
      
      Reviewed by: Debarun Banerjee
      fa8a46eb
  2. 21 Mar, 2024 1 commit
    • Brandon Nesterenko's avatar
      MDEV-33551: Semi-sync Wait Point AFTER_COMMIT Slow on Workloads with Heavy Concurrency · 75c7c6dc
      Brandon Nesterenko authored
      When using semi-sync replication with
      rpl_semi_sync_master_wait_point=AFTER_COMMIT, the performance of the
      primary can significantly reduce compared to AFTER_SYNC's
      performance for workloads with many concurrent users executing
      transactions. This is because all connections on the primary share
      the same cond_wait variable/mutex pair, so any time an ACK is
      received from a replica, all waiting connections are awoken to check
      if the ACK was for itself, which is done in mutual exclusion.
      
      This patch changes this such that the waiting THD will use its own
      local condition variable, and the ACK receiver thread only signals
      connections which have been ACKed for wakeup. That is, the
      THD::LOCK_wakeup_ready condition variable is re-used for this
      purpose, and the Active_tranx queue nodes are extended to hold the
      waiting thread, so it can be signalled once ACKed.
      
      Additionally:
      
       1)  Removed part of MDEV-11853 additions, which allowed suspended
      connection threads awaiting their semi-sync ACKs to live until their
      ACKs had been received. This part, however, wasn't needed.  That is,
      all that was needed was for the Ack_thread to survive.  So now the
      connection threads are killed during phase 1. Thereby
      THD::is_awaiting_semisync_ack, and all its related code was removed.
      
       2) COND_binlog_send is repurposed to signal on the condition when
      Active_tranx is emptied during clear_active_tranx_nodes.
      
       3) At master shutdown (when waiting for slaves), instead of the
      main loop individually waiting for each ACK, await_slave_reply()
      (renamed await_all_slave_replies()) just waits once for the
      repurposed COND_binlog_send to signal it is empty.
      
       4) Test rpl_semi_sync_shutdown_await_ack is updates as following:
         4.1) Added test case (adapted from Kristian Nielsen) to ensure
      that if a thread awaiting its ACK is killed while SHUTDOWN WAIT FOR
      ALL SLAVES is issued, the primary will still wait for the ACK from
      the killed thread.
         4.2) As connections which by-passed phase 1 of thread killing no
      longer are delayed for kill until phase 2, we can no longer query
      yes/no tx after receiving an ACK/timeout. The check for these
      variables is removed.
         4.3) Comment descriptions are updated which mention that the
      connection is alive; and adjusted to be the Ack_thread.
      
      Reviewed By:
      ============
      Kristian Nielsen <knielsen@knielsen-hq.org>
      75c7c6dc
  3. 20 Mar, 2024 1 commit
    • Marko Mäkelä's avatar
      MDEV-26642/MDEV-26643/MDEV-32898 Implement innodb_snapshot_isolation · b8a67198
      Marko Mäkelä authored
      https://jepsen.io/analyses/mysql-8.0.34 highlights that the
      transaction isolation levels in the InnoDB storage engine do not
      correspond to any widely accepted definitions, such as
      "Generalized Isolation Level Definitions"
      https://pmg.csail.mit.edu/papers/icde00.pdf
      (PL-1 = READ UNCOMMITTED, PL-2 = READ COMMITTED, PL-2.99 = REPEATABLE READ,
      PL-3 = SERIALIZABLE).
      Only READ UNCOMMITTED in InnoDB seems to match the above definition.
      
      The issue is that InnoDB does not detect write/write conflicts
      (Section 4.4.3, Definition 6) in the above.
      
      It appears that as soon as we implement write/write conflict detection
      (SET SESSION innodb_snapshot_isolation=ON), the default isolation level
      (SET TRANSACTION ISOLATION LEVEL REPEATABLE READ) will become
      Snapshot Isolation (similar to Postgres), as defined in Section 4.2 of
      "A Critique of ANSI SQL Isolation Levels", MSR-TR-95-51, June 1995
      https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf
      
      Locking reads inside InnoDB used to read the latest committed version,
      ignoring what should actually be visible to the transaction.
      The added test innodb.lock_isolation illustrates this. The statement
      	UPDATE t SET a=3 WHERE b=2;
      is executed in a transaction that was started before a read view or
      a snapshot of the current transaction was created, and committed before
      the current transaction attempts to execute
      	UPDATE t SET b=3;
      If SET innodb_snapshot_isolation=ON is in effect when the second
      transaction was started, the second transaction will be aborted with
      the error ER_CHECKREAD. By default (innodb_snapshot_isolation=OFF),
      the second transaction would execute inconsistently, displaying an
      incorrect SELECT COUNT(*) FROM t in its read view.
      
      If innodb_snapshot_isolation=ON, if an attempt to acquire a lock on a
      record that does not exist in the current read view is made, an error
      DB_RECORD_CHANGED (HA_ERR_RECORD_CHANGED, ER_CHECKREAD) will
      be raised. This error will be treated in the same way as a deadlock:
      the transaction will be rolled back.
      
      lock_clust_rec_read_check_and_lock(): If the current transaction has
      a read view where the record is not visible and
      innodb_snapshot_isolation=ON, fail before trying to acquire the lock.
      
      row_sel_build_committed_vers_for_mysql(): If innodb_snapshot_isolation=ON,
      disable the "semi-consistent read" logic that had been implemented by
      myself on the directions of Heikki Tuuri in order to address
      https://bugs.mysql.com/bug.php?id=3300 that was motivated by a customer
      wanting UPDATE to skip locked rows that do not match the WHERE condition.
      It looks like my changes were included in the MySQL 5.1.5
      commit ad126d90; at that time, employees
      of Innobase Oy (a recent acquisition of Oracle) had lost write access to
      the repository.
      
      The only reason why we set innodb_snapshot_isolation=OFF by default is
      backward compatibility with applications, such as the one that motivated
      the implementation of "semi-consistent read" back in 2005. In a later
      major release, we can default to innodb_snapshot_isolation=ON.
      
      Thanks to Peter Alvaro, Kyle Kingsbury and Alexey Gotsman for their work
      on https://github.com/jepsen-io/ and to Kyle and Alexey for explanations
      and some testing of this fix.
      
      Thanks to Vladislav Lesin for the initial test for MDEV-26643,
      as well as reviewing these changes.
      b8a67198
  4. 19 Mar, 2024 3 commits
    • Brandon Nesterenko's avatar
      MDEV-33716: rpl.rpl_semi_sync_slave_enabled_consistent Fails with Error Condition Reached · ca07f629
      Brandon Nesterenko authored
      Though the test itself doesn't create any transactions
      directly, the added test suppressions are replicated,
      and when the SQL thread is stopped mid-execution,
      it is set into an error state because these are
      non-transactional events being aborted.
      
      This patch fixes the test by ensuring that the test
      suppressions are fully replicated before continuing
      ca07f629
    • Thirunarayanan Balathandayuthapani's avatar
      MDEV-33542 Inplace algorithm occupies more disk space compared to copy algorithm · c3a6248b
      Thirunarayanan Balathandayuthapani authored
      Problem:
      =======
      - In case of large file size, InnoDB eagerly adds the new extent
      even though there are many existing unused pages of the segment.
      Reason is that in case of larger file size, threshold
      (1/8 of reserved pages) for adding new extent has been
      reached frequently.
      
      Solution:
      =========
      - Try to utilise the unused pages in the segment before adding
      the new extent in the file segment.
      
      need_for_new_extent(): In case of larger file size, try to use
      the 4 * FSP_EXTENT_SIZE as threshold to allocate the new extent.
      
      fseg_alloc_free_page_low(): Rewrote the function to allocate
      the page in the following order.
      1) Try to get the page from existing segment extent.
      2) Check whether the segment needs new extent
      (need_for_new_extent()) and allocate the new extent,
      find the page.
      3) Take individual page from the unused page from
      segment or tablespace.
      4) Allocate a new extent and take first page from it.
      
      Removed FSEG_FILLFACTOR, FSEG_FRAG_LIMIT variable.
      c3a6248b
    • Vladislav Vaintroub's avatar
      MDEV-23224 Windows threadpool - use better threadpool_max_threads default. · 5b4e69c0
      Vladislav Vaintroub authored
      Use max_connections in calculation, top prevent possible deadlock, if
      max_connection is high.
      5b4e69c0
  5. 18 Mar, 2024 4 commits
    • Vladislav Vaintroub's avatar
      Post-fix 567c0973 · 01d994b3
      Vladislav Vaintroub authored
      Do *not* check if socket is closed by another thread. This is
      race-condition prone, unnecessary, and harmful. VIO state was introduced
      to debug the errors, not to change the behavior.
      
      Rather than checking if socket is closed, add a DBUG_ASSERT that it is
      *not* closed, because this is an actual logic error, and can potentially
      lead to all sorts of funny behavior like writing error packets to Innodb
      files.
      
      Unlike closesocket(), shutdown(2) is not actually race-condition prone,
      and it breaks poll() and read(), and it worked for longer than a decade,
      and it does not need any state check in the code.
      01d994b3
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · 50715bd2
      Marko Mäkelä authored
      50715bd2
    • Marko Mäkelä's avatar
      Work around missing MSAN instrumentation · 4592af2e
      Marko Mäkelä authored
      Let us skip the recently added test main.mysql-interactive if
      an instrumented ncurses library is not available.
      
      In InnoDB, let us work around an uninstrumented libnuma, by
      declaring that the objects returned by numa_get_mems_allowed()
      are initialized.
      4592af2e
    • Marko Mäkelä's avatar
      MDEV-33478: Tests massively fail with clang-18 -fsanitize=memory · 09d991d0
      Marko Mäkelä authored
      Starting with clang-16, MemorySanitizer appears to check that
      uninitialized values not be passed by value nor returned.
      Previously, it was allowed to copy uninitialized data in such cases.
      
      get_foreign_key_info(): Remove a local variable that was passed
      uninitialized to a function.
      
      DsMrr_impl: Initialize key_buffer, because DsMrr_impl::dsmrr_init()
      is reading it.
      
      test_bind_result_ext1(): MYSQL_TYPE_LONG is 32 bits, hence we must
      use a 32-bit type, such as int. sizeof(long) differs between
      LP64 and LLP64 targets.
      09d991d0
  6. 15 Mar, 2024 5 commits
  7. 14 Mar, 2024 8 commits
  8. 13 Mar, 2024 10 commits
  9. 12 Mar, 2024 4 commits
    • Monty's avatar
      MDEV-33622 Server crashes when the UPDATE statement (which has duplicate key)... · cfa8268e
      Monty authored
      MDEV-33622 Server crashes when the UPDATE statement (which has duplicate key) is run after setting a low thread_stack
      
      This was caused by wrong allocation of variable on stack.
      (Was allocating 4K of data instead of 512 bytes).
      
      No test case as the original MDEV test cases is not usable for mtr.
      cfa8268e
    • Dmitry Shulga's avatar
      MDEV-33549: Incorrect handling of UPDATE in PS mode in case a table's colum declared as NOT NULL · 428a6731
      Dmitry Shulga authored
      UPDATE statement that is run in PS mode and uses positional parameter
      handles columns declared with the clause DEFAULT NULL incorrectly in
      case the clause DEFAULT is passed as actual value for the positional
      parameter of the prepared statement. Similar issue happens in case
      an expression specified in the DEFAULT clause of table's column definition.
      
      The reason for incorrect processing of columns declared as DEFAULT NULL
      is that setting of null flag for a field being updated was missed
      in implementation of the method Item_param::assign_default().
      The reason for incorrect handling of an expression in DEFAULT clause is
      also missed saving of a field inside implementation of the method
      Item_param::assign_default().
      428a6731
    • Marko Mäkelä's avatar
      MDEV-24167 fixup: Stricter assertion · 4ac8c4c8
      Marko Mäkelä authored
      log_free_check(): Assert that the current thread is not holding
      lock_sys.latch in any mode.
      
      This fixes up commit 5f2dcd11
      4ac8c4c8
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · c3a00dfa
      Marko Mäkelä authored
      c3a00dfa
  10. 11 Mar, 2024 2 commits