  1. 25 Sep, 2024 3 commits
    • MDEV-33966: buf_page_make_young() is a contention point · c8f4726e
      Marko Mäkelä authored
      The buf_pool.LRU list needs to reasonably accurately reflect recently
      accessed blocks, so that they will not be evicted prematurely.
      Because the list is protected by buf_pool.mutex, it is not a good idea
      to maintain the position on every page access.
      
      Instead of maintaining the LRU position on each access, we will
      set a "recently accessed" flag in the block descriptor. The
      buf_flush_page_cleaner() thread as well as some traversal of the
      buf_pool.LRU list will test and reset this flag on each visited
      block. If the flag was set, the position in the buf_pool.LRU list
      may be adjusted.
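
      The idea in the paragraph above can be illustrated with a minimal
      sketch. The names Block, flag_accessed(), test_and_clear_accessed()
      and sweep_lru() are hypothetical stand-ins, not the InnoDB code: the
      hot path only sets an atomic flag, and a background traversal (assumed
      to hold the list mutex) tests and resets it, adjusting the list
      position only then.

      ```cpp
      #include <atomic>
      #include <cstddef>
      #include <cstdint>
      #include <list>

      // Hypothetical stand-in for a block descriptor; not InnoDB's buf_page_t.
      struct Block {
        static constexpr uint16_t ACCESSED = 1;
        int id;
        std::atomic<uint16_t> state{0};

        explicit Block(int id) : id(id) {}

        // Hot path: called on every page access, without taking any mutex.
        void flag_accessed() {
          // Skip the atomic write if the flag is already set.
          if (!(state.load(std::memory_order_relaxed) & ACCESSED))
            state.fetch_or(ACCESSED, std::memory_order_relaxed);
        }

        // Test and reset the flag; returns whether it was set.
        bool test_and_clear_accessed() {
          return state.fetch_and(uint16_t(~ACCESSED),
                                 std::memory_order_relaxed) & ACCESSED;
        }
      };

      // Background traversal, assumed to run while holding the list mutex.
      // The front of the list is treated as the most recently used end.
      // Returns the number of blocks whose flag was found set.
      inline std::size_t sweep_lru(std::list<Block*>& lru) {
        std::size_t moved = 0;
        const std::size_t n = lru.size();
        auto it = lru.begin();
        for (std::size_t i = 0; i < n; i++) {
          auto cur = it++;  // advance first; splice keeps both iterators valid
          if ((*cur)->test_and_clear_accessed()) {
            lru.splice(lru.begin(), lru, cur);
            moved++;
          }
        }
        return moved;
      }
      ```

      In this sketch the cost of relinking a block is paid once per sweep
      rather than once per access, which is the trade-off the commit
      message describes.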
      
      buf_pool_t::freed_page_clock, buf_page_t::freed_page_clock: Remove.
      This is no longer meaningful in the revised design.
      
      page_zip_des_t::state: An atomic 16-bit field that will include
      the "accessed" and "old" flags that would more logically belong
      to buf_page_t. We maintain them here (along with some
      ROW_FORMAT=COMPRESSED specific state that is protected by page
      latches) in order to avoid race conditions and unnecessary
      memory overhead.
      
      buf_block_t::init(): Replaces buf_block_init().
      
      buf_page_t::invalidate(): Replaces buf_block_modify_clock_inc().
      Instead of maintaining a 64-bit counter, we will maintain one
      comprising 32+16=48 bits, in modify_clock_low,modify_clock_high.
      In the worst case, a stale value would go undetected only if exactly
      n<<48 calls to buf_page_t::invalidate() (for some integer n>0) occur
      before some operation such as btr_pcur_t::restore_position() is
      executed. Such a count should be extremely unlikely but not completely
      impossible. It is worth
      noting that the DB_TRX_ID is only 48 bits, and each transaction
      start and commit/rollback will consume an identifier.
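
      As a rough illustration of the 32+16 bit split, the following sketch
      keeps a 48-bit counter in two fields. Only the field names
      modify_clock_low and modify_clock_high come from the commit message;
      the struct ModifyClock and the carry logic are assumptions, and the
      caller is assumed to hold a latch, so no atomics are used.

      ```cpp
      #include <cstdint>

      // Hypothetical sketch of a 48-bit counter stored as 32+16 bits.
      struct ModifyClock {
        uint32_t modify_clock_low = 0;
        uint16_t modify_clock_high = 0;

        // Increment the counter; the caller is assumed to hold a latch.
        void invalidate() {
          if (++modify_clock_low == 0)
            ++modify_clock_high;  // carry into the upper 16 bits
        }

        // Combine both halves into one 48-bit value.
        uint64_t modify_clock() const {
          return uint64_t{modify_clock_high} << 32 | modify_clock_low;
        }
      };
      ```

      The counter returns to a previously seen value only after a multiple
      of 1<<48 increments, which is the wrap-around case mentioned above.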
      
      buf_page_t::modify_clock(): Replaces the read access of
      buf_page_t::modify_clock. Assert that the caller is holding
      a page latch. Note: because invalidate() and modify_clock() are
      protected with buf_pool.mutex or the buf_page_t::lock, there can
      be no issue with regard to the atomicity of accessing the 48-bit field.
      
      buf_page_t::access_time: Store uint16_t(time(nullptr)). Yes, it will
      wrap around every 18.2 hours, and in the worst case, a rather recently
      accessed block may end up being evicted as "least recently used".
      But in this way we will avoid any alignment loss: the adjacent fields
      modify_clock_low, modify_clock_high, access_time of 32+16+16 bits
      will nicely add up to 64 bits.
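
      The arithmetic can be checked with a small sketch. PackedFields,
      truncate_time() and wrap_hours are hypothetical names, and the size
      check assumes a typical ABI in which uint32_t is aligned to 4 bytes.

      ```cpp
      #include <cstdint>
      #include <ctime>

      // Hypothetical layout of the three adjacent fields: 32+16+16 bits.
      struct PackedFields {
        uint32_t modify_clock_low;
        uint16_t modify_clock_high;
        uint16_t access_time;
      };
      static_assert(sizeof(PackedFields) == 8,
                    "the three fields are expected to fill 64 bits");

      // Keep only the low 16 bits of a timestamp in seconds.
      constexpr uint16_t truncate_time(std::time_t t) { return uint16_t(t); }

      // A 16-bit count of seconds wraps after 65536 seconds.
      constexpr double wrap_hours = 65536.0 / 3600.0;  // about 18.2
      ```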
      
      buf_page_t::make_young(): Clear the "recently accessed" flag of a block
      and move the block to the "most recently used" end of buf_pool.LRU
      if it qualifies for that. When we make a block young, we zero out
      access_time so that the comparison is more likely to hold on a
      subsequent invocation, or so that flag_accessed() will update the
      current access time, which due to wrap-around could look like less
      than the previously assigned access_time.
      
      buf_LRU_scan_and_free_block(): Declare static.
      
      buf_pool_invalidate(): Define in the same compilation unit with
      buf_LRU_scan_and_free_block().
      
      buf_pool.LRU_old_time_threshold: Replaces buf_LRU_old_threshold_ms.
      
      PageConverter::run(): Renamed from fil_iterate(). In debug builds,
      acquire a dummy exclusive latch on the block, so that the assertion
      in buf_block_t::modify() will be satisfied.
      
      ut_time_ms(): Remove. Invoking my_hrtime_coarse() is sufficient for the
      remaining purposes.
    • Cleanup: Remove duplicated code · 1590fa4c
      Marko Mäkelä authored
      buf_block_alloc(): Define as an alias in buf0lru.h, which defines
      the underlying buf_LRU_get_free_block().
      
      buf_block_free(): Define as an alias of the non-inline function
      buf_pool.free_block(block).
    • MDEV-33966 preparation: Clean up recv_sys.pages bookkeeping · 512b8113
      Marko Mäkelä authored
      Instead of repurposing buf_page_t::access_time for state()==MEMORY
      blocks that are part of recv_sys.pages, let us define an anonymous
      union around buf_page_t::hash.  In this way, we will be able to
      declare access_time private.
  2. 24 Sep, 2024 1 commit
    • MDEV-34836: TOI on parent table must BF abort SR in progress on a child · 231900e5
      Denis Protivensky authored
      Applied SR transaction on the child table was not BF aborted by TOI running
      on the parent table for several reasons:
      
      Although SR correctly collected FK-referenced keys to parent, TOI in Galera
      disregards common certification index and simply sets itself to depend on
      the latest certified write set seqno.
      
      Since this write set was the fragment of SR transaction, TOI was allowed to
      run in parallel with SR presuming it would BF abort the latter.
      
      At the same time, DML transactions in the server don't grab MDL locks on
      FK-referenced tables, thus parent table wasn't protected by an MDL lock from
      SR and it couldn't provoke MDL lock conflict for TOI to BF abort SR transaction.
      
      In InnoDB, DDL transactions grab shared MDL locks on child tables, which is not
      enough to trigger MDL conflict in Galera.
      
      InnoDB-level Wsrep patch didn't contain correct conflict resolution logic due to
      the fact that it was believed MDL locking should always produce conflicts correctly.
      
      The fix brings conflict resolution rules similar to MDL-level checks to InnoDB,
      thus accounting for the problematic case.
      
      Apart from that, wsrep_thd_is_SR() is patched to return true only for executing
      SR transactions. It should be safe as any other SR state is either the same as
      for any single write set (thus making the two logically equivalent), or it reflects
      an SR transaction that is aborting or prepared, which is handled separately in
      the BF-aborting logic; for the regular execution path it should not matter at all.
      Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
  3. 23 Sep, 2024 2 commits
    • MDEV-34983: Remove x86 asm from InnoDB · 638c62ac
      Marko Mäkelä authored
      Starting with GCC 7 and clang 15, single-bit operations such as
      fetch_or(1) & 1 are translated into 80386 instructions such as
      LOCK BTS, instead of using the generic translation pattern
      of emitting a loop around LOCK CMPXCHG.
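
      For illustration, the single-bit pattern mentioned above looks like
      this in portable C++. The helper name test_and_set_bit0() is
      hypothetical. Whether a given compiler actually emits LOCK BTS for it
      depends on the compiler version and optimization flags, so treat that
      as an expectation to verify in the generated assembly rather than a
      guarantee.

      ```cpp
      #include <atomic>
      #include <cstdint>

      // Atomically set bit 0 and report whether it was already set.
      // Only the old value of the same single bit is inspected, which is
      // what allows a compiler to pick a bit-test-and-set instruction.
      inline bool test_and_set_bit0(std::atomic<uint32_t>& word) {
        return word.fetch_or(1, std::memory_order_acquire) & 1;
      }
      ```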
      
      Given that the oldest currently supported GNU/Linux distributions
      ship GCC 7, and that older versions of GCC are out of support,
      let us remove some work-arounds that are not strictly necessary.
      If someone compiles the code using an older compiler, it will work
      but possibly less efficiently.
      
      srw_mutex_impl::HOLDER: Changed from 1U<<31 to 1 in order to
      work around https://github.com/llvm/llvm-project/issues/37322
      which is specific to setting the most significant bit.
      
      srw_mutex_impl::WAITER: A multiplier of waiting requests.
      This used to be 1, which would now collide with HOLDER.
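
      A minimal sketch of such a lock word encoding follows. It is not the
      srw_mutex_impl implementation: the struct LockWord, its member
      functions, and the value WAITER = 2 are assumptions chosen so that the
      waiter count occupies the bits above the HOLDER flag.

      ```cpp
      #include <atomic>
      #include <cstdint>

      // Hypothetical lock word: bit 0 is the holder flag, and the
      // remaining bits count waiters in units of WAITER.
      struct LockWord {
        static constexpr uint32_t HOLDER = 1;
        static constexpr uint32_t WAITER = 2;  // assumed value
        std::atomic<uint32_t> word{0};

        // Single-bit test-and-set on the least significant bit.
        bool try_acquire() {
          return !(word.fetch_or(HOLDER, std::memory_order_acquire) & HOLDER);
        }
        void release() {
          word.fetch_and(~HOLDER, std::memory_order_release);
        }
        // Each waiting request adds one multiple of WAITER.
        void add_waiter() {
          word.fetch_add(WAITER, std::memory_order_relaxed);
        }
        void remove_waiter() {
          word.fetch_sub(WAITER, std::memory_order_relaxed);
        }
        uint32_t waiters() const {
          return word.load(std::memory_order_relaxed) / WAITER;
        }
        bool is_locked() const {
          return word.load(std::memory_order_relaxed) & HOLDER;
        }
      };
      ```

      With HOLDER in the least significant bit, adding or removing waiters
      never disturbs the holder flag, and vice versa.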
      
      fil_space_t::set_stopping(): Remove this unused function.
      
      In MSVC we need _interlockedbittestandset() for LOCK BTS.
  4. 20 Sep, 2024 2 commits
  5. 16 Sep, 2024 2 commits
  6. 15 Sep, 2024 12 commits
  7. 14 Sep, 2024 1 commit
    • mtr_t::log_file_op(): Fix -Wnonnull · 4010dff0
      Marko Mäkelä authored
      GCC 12.2.0 could issue -Wnonnull for an unreachable call to
      strlen(new_path).  Let us prevent that by replacing the condition
      (type == FILE_RENAME) with the equivalent (new_path).
      This should also optimize the generated code, because the lifetime
      of the parameter "type" will be reduced.
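
      The shape of the change can be sketched as follows. The enum FileOp
      and the function payload_len() are hypothetical and only mimic the
      pattern. The sketch assumes the caller contract that new_path is
      non-null exactly when the operation is a rename, which is what makes
      the two conditions equivalent.

      ```cpp
      #include <cstddef>
      #include <cstring>

      // Hypothetical operation codes; not the InnoDB definitions.
      enum FileOp { OP_CREATE, OP_RENAME, OP_DELETE };

      // Length of a record payload holding one or two file names.
      // Assumed contract: new_path != nullptr exactly for OP_RENAME.
      inline std::size_t payload_len(FileOp type, const char* path,
                                     const char* new_path) {
        (void) type;  // no longer needed to decide about new_path
        std::size_t len = std::strlen(path);
        // Testing the pointer itself (instead of type == OP_RENAME) lets
        // the compiler see that strlen() never receives a null pointer.
        if (new_path)
          len += 1 + std::strlen(new_path);
        return len;
      }
      ```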
  8. 13 Sep, 2024 1 commit
    • MDEV-34921 MemorySanitizer reports errors for non-debug builds · b331cde2
      Marko Mäkelä authored
      my_b_encr_write(): Initialize also block_length, and at the same time
      last_block_length, so that all 128 bits can be initialized with fewer
      writes. This fixes an error that was caught in the test
      encryption.tempfiles_encrypted.
      
      test_my_safe_print_str(): Skip a test that would attempt to
      display uninitialized data in the test unit.stacktrace.
      Previously, our CI did not build unit tests with MemorySanitizer.
      
      handle_delayed_insert(): Remove a redundant call to pthread_exit(0),
      which would for some reason cause MemorySanitizer in clang-19 to
      report a stack overflow in a RelWithDebInfo build. This fixes a
      failure of several tests.
      
      Reviewed by: Vladislav Vaintroub
  9. 12 Sep, 2024 5 commits
  10. 11 Sep, 2024 5 commits
  11. 10 Sep, 2024 6 commits