1. 30 Nov, 2023 6 commits
  2. 29 Nov, 2023 2 commits
    • MDEV-28682 gcol.gcol_purge contaminates further execution of innodb.gap_locks · 968061fd
      Vlad Lesin authored
      ha_innobase::extra() invokes check_trx_exists() unconditionally, even for
      unsupported operations. check_trx_exists() creates and registers a trx_t
      object if the THD does not contain a pointer to one. If ha_innobase::extra()
      does not support an operation, it just invokes check_trx_exists() and quits.
      If check_trx_exists() creates and registers a new trx_t object for such an
      operation, it will never be freed or deregistered.
      
      For example, if ha_innobase::extra() is invoked from a purge thread with
      operation = HA_EXTRA_IS_ATTACHED_CHILDREN, as happens in the
      gcol.gcol_purge test, a trx_t object will be registered but not
      deregistered. This causes the innodb.gap_locks failure, because "SHOW ENGINE
      INNODB STATUS" shows information about an unexpected transaction at the end
      of trx_sys.trx_list.
      
      The fix is not to invoke check_trx_exists() for unsupported operations
      in ha_innobase::extra().
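
      The shape of this fix can be illustrated with a minimal, self-contained
      sketch. All names below (Session, Trx, ExtraOp, ensure_trx, handle_extra)
      are hypothetical stand-ins and not the actual InnoDB code; the point is
      only that the lazily created object is touched inside the supported
      branches rather than before the dispatch.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; none of these are the real MariaDB types.
struct Trx {};
struct Session {                 // plays the role of THD
  std::unique_ptr<Trx> trx;      // transaction object, created lazily
};

enum class ExtraOp { KeyRead, IsAttachedChildren };

// Creates the transaction object on first use, loosely mirroring what
// the commit message says check_trx_exists() does.
Trx *ensure_trx(Session &s) {
  if (!s.trx) s.trx = std::make_unique<Trx>();
  return s.trx.get();
}

// Only the supported operation touches the transaction. Any other
// operation returns without creating one as a side effect.
int handle_extra(Session &s, ExtraOp op) {
  switch (op) {
  case ExtraOp::KeyRead: {
    Trx *trx = ensure_trx(s);
    assert(trx != nullptr);
    (void) trx;                  // a real handler would use it here
    return 0;
  }
  default:
    return 0;                    // unsupported: no side effects
  }
}
```

      With this structure, a caller that passes an unsupported operation (the
      purge thread in the scenario above) leaves the session without a
      transaction object.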
      
      Reviewed by: Marko Mäkelä
    • MDEV-32899 instrumentation · ba6bf7ad
      Marko Mäkelä authored
      In debug builds, let us declare dict_sys.latch as index_lock instead of
      srw_lock, so that we will benefit from the full tracking of lock ownership.
      
      lock_table_for_trx(): Assert that the current thread is not holding
      dict_sys.latch. If the dict_sys.unfreeze() call were moved to the end of
      lock_table_children(), this assertion would fail in the test innodb.innodb
      and many other tests that use FOREIGN KEY.
  3. 28 Nov, 2023 3 commits
    • Remove deprecation from mariadbd --debug · 387b92df
      Monty authored
      --debug is supported by almost all our other binaries, and we should keep
      it in the server as well so that option names stay similar.
    • MDEV-32899 InnoDB is holding shared dict_sys.latch while waiting for FOREIGN KEY child table lock on DDL · 569da6a7
      Marko Mäkelä authored
      
      lock_table_children(): A new function to lock all child tables of a table.
      We will only hold dict_sys.latch while traversing
      dict_table_t::referenced_set. To prevent a race condition with
      std::set::erase() we will copy the pointers to the child tables to a
      local vector. Once we have acquired references to all child tables,
      we can safely release dict_sys.latch, wait for the locks, and finally
      release the references.
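
      The copy-under-latch pattern described above can be sketched as follows.
      This is an illustration under stated assumptions, not the InnoDB
      implementation: Table, Dictionary and lock_children are hypothetical
      names, std::shared_ptr copies stand in for acquiring table references,
      and a plain std::mutex stands in for the shared dict_sys.latch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <vector>

struct Table { int locks_taken = 0; };        // hypothetical table object

struct Dictionary {
  std::mutex latch;                           // stands in for dict_sys.latch
  std::set<std::shared_ptr<Table>> children;  // stands in for referenced_set
};

std::size_t lock_children(Dictionary &dict) {
  std::vector<std::shared_ptr<Table>> pinned;
  {
    // Hold the latch only while traversing the set. Copying each
    // shared_ptr takes a reference, so a concurrent erase() from the
    // set cannot free a table we are about to lock.
    std::lock_guard<std::mutex> guard(dict.latch);
    pinned.assign(dict.children.begin(), dict.children.end());
  }                                           // latch released here
  for (auto &table : pinned)
    ++table->locks_taken;                     // placeholder for a blocking lock wait
  return pinned.size();
}                                             // references released here
```

      The potentially long lock waits happen after the inner scope ends, so no
      other thread is blocked on the latch while this thread waits.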
      
      This fixes up commit 2ca11234 (MDEV-26217)
      and commit c3c53926 (MDEV-26554).
    • MDEV-32879 Server crash in my_decimal::operator= or unexpected ER_DUP_ENTRY upon comparison with INET6 and similar types · f436b4a5
      Alexander Barkov authored
      
      During the 10.5->10.6 merge please use the 10.6 code on conflicts.
      
      This is the 10.5 version of the patch (a backport of the 10.6 version).
      Unlike the 10.6 version, it makes changes in plugin/type_inet/sql_type_inet.*
      rather than in sql/sql_type_fixedbin.h.
      
      Item_bool_rowready_func2, Item_func_between, Item_func_in
      did not check if a not-NULL argument of an arbitrary data type
      can produce a NULL value on conversion to INET6.
      
      This caused a crash on DBUG_ASSERT() in conversion failures,
      because the function returned SQL NULL for something that
      has Item::maybe_null() equal to false.
      
      The fix sets NULL-ability in such cases.
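
      As a rough illustration of the rule, and not the server code, the
      nullability of a comparison can be derived from whether any argument
      needs a conversion that may fail. ArgInfo and both functions below are
      hypothetical names introduced for this sketch.

```cpp
#include <cassert>

// Hypothetical description of one comparison argument.
struct ArgInfo {
  bool is_inet6;    // already has the target data type
  bool maybe_null;  // declared nullable
};

// An argument that must be converted to INET6 can yield NULL when the
// conversion fails, even if the argument itself is NOT NULL.
bool can_be_null_after_conversion(const ArgInfo &a) {
  return a.maybe_null || !a.is_inet6;
}

// The comparison is nullable if either side can become NULL.
bool comparison_maybe_null(const ArgInfo &lhs, const ArgInfo &rhs) {
  return can_be_null_after_conversion(lhs) ||
         can_be_null_after_conversion(rhs);
}
```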
      
      Details:
      
      - Removing the code in Item_func::setup_args_and_comparator()
        performing character set aggregation with optional narrowing.
        This aggregation is done inside Arg_comparator::set_cmp_func_string().
        So this code was redundant.
      
      - Removing Item_func::setup_args_and_comparator() as it got simplified to
        just two lines:
          convert_const_compared_to_int_field(thd);
          return cmp->set_cmp_func(thd, this, &args[0], &args[1], true);
        Using these lines directly in:
          - Item_bool_rowready_func2::fix_length_and_dec()
          - Item_func_nullif::fix_length_and_dec()
      
      - Adding a new virtual method:
        - Type_handler::Item_bool_rowready_func2_fix_length_and_dec().
      
      - Adding tests detecting if the data type conversion can return SQL NULL into
        the following methods of Type_handler_inet6:
        - Item_bool_rowready_func2_fix_length_and_dec
        - Item_func_between_fix_length_and_dec
        - Item_func_in_fix_comparator_compatible_types
  4. 27 Nov, 2023 1 commit
    • MDEV-32879 Server crash in my_decimal::operator= or unexpected ER_DUP_ENTRY upon comparison with INET6 and similar types · 20b0ec9a
      Alexander Barkov authored
      
      This is the 10.6 version of the patch.
      
      Item_bool_rowready_func2, Item_func_between, Item_func_in
      did not check if a not-NULL argument of an arbitrary data type
      can produce a NULL value on conversion to INET6.
      
      This caused a crash on DBUG_ASSERT() in conversion failures,
      because the function returned SQL NULL for something that
      has Item::maybe_null() equal to false.
      
      The fix sets NULL-ability in such cases.
      
      Details:
      
      - Removing the code in Item_func::setup_args_and_comparator()
        performing character set aggregation with optional narrowing.
        This aggregation is done inside Arg_comparator::set_cmp_func_string().
        So this code was redundant.
      
      - Removing Item_func::setup_args_and_comparator() as it got simplified to
        just two lines:
          convert_const_compared_to_int_field(thd);
          return cmp->set_cmp_func(thd, this, &args[0], &args[1], true);
        Using these lines directly in:
          - Item_bool_rowready_func2::fix_length_and_dec()
          - Item_func_nullif::fix_length_and_dec()
      
      - Adding a new virtual method:
        - Type_handler::Item_bool_rowready_func2_fix_length_and_dec().
      
      - Adding tests detecting if the data type conversion can return SQL NULL into
        the following methods of Type_handler_fbt:
        - Item_bool_rowready_func2_fix_length_and_dec
        - Item_func_between_fix_length_and_dec
        - Item_func_in_fix_comparator_compatible_types
  5. 24 Nov, 2023 2 commits
    • Merge 10.6 into 10.11 · 3e90efe4
      Marko Mäkelä authored
    • MDEV-32873 Test innodb.innodb-index-online occasionally fails · 2f467de4
      Marko Mäkelä authored
      Let us wait for the completion of purge before testing the KILL of
      CREATE INDEX c2d ON t1(c2), so that there will be no table handle
      acquisition by a purge task before the operation is rolled back.
      
      Also, let us make the test compatible with ./mtr --repeat,
      and convert variable_value from string to integer so that any
      comparisons will be performed correctly.
  6. 23 Nov, 2023 1 commit
    • MDEV-24670 memory pressure - eventfd rather than pipe · a48c1b89
      Daniel Black authored
      Eventfds have a simpler interface and are one file
      descriptor rather than two.
      
      Reuse the pattern used for accepting socket connections
      by testing for abort after poll() returns. This way
      the same event descriptor can be used for both Quit
      and the debugging trigger.
      
      Also correct the registration of mem pressure file
      descriptors.
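
      A minimal Linux-only sketch of the eventfd approach is shown below. It is
      not the server code: make_event_fd, notify, wait_for_event and the Wake
      enum are hypothetical names, and the abort flag is modelled as a
      std::atomic<bool>. The eventfd(), poll(), read() and write() calls are
      the standard Linux/POSIX interfaces.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <poll.h>
#include <sys/eventfd.h>
#include <unistd.h>

// One descriptor serves both the "quit" and the "debug trigger" wake-up.
int make_event_fd() { return eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK); }

// Wakes up the waiter. An eventfd write adds a 64-bit value to its counter.
bool notify(int fd) {
  const std::uint64_t one = 1;
  return write(fd, &one, sizeof one) == static_cast<ssize_t>(sizeof one);
}

enum class Wake { Timeout, Trigger, Quit };

// Waits for the descriptor to become readable, then decides what the
// wake-up meant by testing the abort flag after poll() has returned.
// For brevity, a poll() error is reported the same way as a timeout.
Wake wait_for_event(int fd, const std::atomic<bool> &abort_requested,
                    int timeout_ms) {
  pollfd p{fd, POLLIN, 0};
  if (poll(&p, 1, timeout_ms) <= 0)
    return Wake::Timeout;
  std::uint64_t count = 0;
  const ssize_t n = read(fd, &count, sizeof count);  // resets the counter
  (void) n;
  return abort_requested.load() ? Wake::Quit : Wake::Trigger;
}
```

      A pipe would need two descriptors (read and write ends) for the same
      wake-up; here both roles are covered by a single descriptor.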
  7. 22 Nov, 2023 4 commits
  8. 21 Nov, 2023 8 commits
    • MDEV-32374 log_sys.lsn_lock is a performance hog · 7443ad1c
      Marko Mäkelä authored
      The log_sys.lsn_lock that was introduced in
      commit a635c406
      should be located in the same cache line as log_sys.latch
      so that log_t::append_prepare() needs to modify only the first two
      cache lines where log_sys is stored.
      
      log_t::lsn_lock: On Linux, change the type from pthread_mutex_t to
      something that may be as small as 32 bits, to pack more data members
      in the same cache line. On Microsoft Windows, CRITICAL_SECTION works
      better.
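
      The packing idea can be illustrated with a small sketch. It assumes a
      64-byte cache line and uses a hypothetical 32-bit spin lock (TinyLock)
      and a hypothetical LogHot structure; the real log_t layout and lock
      implementation differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// A 32-bit test-and-set spin lock: a hypothetical example of a lock
// that is small enough to share a cache line with other members.
class TinyLock {
  std::atomic<std::uint32_t> word{0};
public:
  void lock() { while (word.exchange(1, std::memory_order_acquire)) {} }
  void unlock() { word.store(0, std::memory_order_release); }
};

// Hypothetical "hot" members of a log system, aligned so that both
// locks and the LSN live in the same (assumed 64-byte) cache line.
struct alignas(64) LogHot {
  TinyLock latch;
  TinyLock lsn_lock;
  std::uint64_t lsn = 0;
};

static_assert(sizeof(TinyLock) == 4, "the lock occupies 32 bits");
static_assert(sizeof(LogHot) == 64, "hot members fit in one cache line");

// Reserves a byte range in the log and returns its starting LSN.
std::uint64_t reserve(LogHot &log, std::uint64_t bytes) {
  log.lsn_lock.lock();
  const std::uint64_t start = log.lsn;
  log.lsn += bytes;
  log.lsn_lock.unlock();
  return start;
}
```

      A pthread_mutex_t is typically several times larger than this lock,
      which leaves less room in the line for the data it protects.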
      
      log_t::check_flush_or_checkpoint_: Renamed to need_checkpoint.
      There is no need to pause all writer threads in log_free_check() when
      we only need to write log_sys.buf to ib_logfile0. That will be done in
      mtr_t::commit().
      
      log_t::append_prepare_wait(): Make the member function non-static
      to simplify the call interface, and add a parameter for the LSN.
      
      log_t::append_prepare(): Invoke append_prepare_wait() at most once.
      Only set_check_for_checkpoint() if a log checkpoint needs to
      be written. If the log buffer needs to be written, we will take care
      of it ourselves later in our caller. This will reduce interference
      with log_free_check() in other threads.
      
      mtr_t::commit(): Call log_write_up_to() if needed.
      
      log_t::get_write_target(): Return a log_write_up_to() target
      to mtr_t::commit().
      
      buf_flush_ahead(): If we are in furious flushing, call
      log_sys.set_check_for_checkpoint() so that all writers will wait
      in log_free_check() until the checkpoint is done. Otherwise,
      the test innodb.insert_into_empty could occasionally report
      an error "Crash recovery is broken".
      
      log_check_margins(): Replaced by log_free_check().
      
      log_flush_margin(): Removed. This is part of mtr_t::commit()
      and other operations that write log.
      
      log_t::create(), log_t::attach(): Guarantee that buf_free < max_buf_free
      will always hold on PMEM, to satisfy an assumption of
      log_t::get_write_target().
      
      log_write_up_to(): Assert lsn!=0. Such calls are not incorrect, but it
      is cheaper to test that single unlikely condition in mtr_t::commit()
      rather than test several conditions in log_write_up_to().
      
      innodb_drop_database(), unlock_and_close_files(): Check the LSN before
      calling log_write_up_to().
      
      ha_innobase::commit_inplace_alter_table(): Remove redundant calls to
      log_write_up_to() after calling unlock_and_close_files().
      
      Reviewed by: Vladislav Vaintroub
      Stress tested by: Matthias Leich
      Performance tested by: Steve Shaw
    • Merge 10.6 into 10.11 · f87c7d17
      Marko Mäkelä authored
    • MDEV-32050 fixup: Stabilize tests · 4c16ec3e
      Marko Mäkelä authored
      In any test that uses wait_all_purged.inc, ensure that InnoDB tables
      will be created without persistent statistics.
      
      This is a follow-up to commit cd04673a
      after a similar failure was observed in the innodb_zip.blob test.
    • MDEV-32050 Fixup · 804b5974
      Thirunarayanan Balathandayuthapani authored
      - Fixing mariabackup.full_backup test case
    • Merge 10.6 into 10.11 · 583a7452
      Marko Mäkelä authored
    • Merge 10.5 into 10.6 · 9c5600ad
      Marko Mäkelä authored
    • Merge 10.5 into 10.6 · 0ead2031
      Marko Mäkelä authored
    • MDEV-32820 Race condition between trx_purge_free_segment() and trx_undo_create() · de31ca6a
      Marko Mäkelä authored
      trx_purge_free_segment(): If fseg_free_step_not_header() needs to be
      called multiple times, acquire an exclusive latch on the
      rollback segment header page after restarting the mini-transaction
      so that the rest of this function cannot execute concurrently
      with trx_undo_create() on the same rollback segment.
      
      This fixes a regression that was introduced in
      commit c14a3943 (MDEV-30753).
      
      Note: The buffer-fixes that we are holding across the mini-transaction
      restart will prevent the pages from being evicted from the buffer pool.
      They may be accessed by other threads or written back to data files
      while we are not holding exclusive latches.
      
      Reviewed by: Vladislav Lesin
  9. 20 Nov, 2023 4 commits
  10. 19 Nov, 2023 3 commits
  11. 18 Nov, 2023 1 commit
    • MDEV-31953 madvise(..., MADV_FREE) is causing a performance regression · 23234835
      Marko Mäkelä authored
      buf_page_t::set_os_unused(): Remove the system call that had been added in
      commit 16c97187 and revised in
      commit c1fd082e for Microsoft Windows.
      
      buf_pool_t::garbage_collect(): A new function to collect any garbage
      from the InnoDB buffer pool that can be removed without writing any
      log or data files. This will also invoke madvise() for all of buf_pool.free.
      
      To trigger this the following MDEV is implemented:
      MDEV-24670 avoid OOM by linux kernel co-operative memory management
      
      To avoid frequent triggers that caused the MDEV-31953 regression, while
      still preserving the 10.11 functionality of non-greedy kernel memory
      usage, memory triggers are used.
      
      On the triggering of memory pressure, if supported in the Linux kernel,
      trigger the garbage collection of the innodb buffer pool.
      
      The hard coded triggers occur where there is:
      * some memory pressure in 5 of the last 10 seconds
      * a full stall on memory pressure for 10ms in the last 2 seconds
      
      The kernel will trigger only one in each of these time windows. To avoid
      mariadb being in a constant state of memory garbage collection, this has
      been limited to once per minute.
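
      The thresholds and the once-per-minute limit can be sketched as below.
      This is an illustration, not the server code: CollectLimiter is a
      hypothetical class, and the trigger strings assume the Linux pressure
      stall information (PSI) format "<some|full> <stall microseconds>
      <window microseconds>" applied to the two thresholds listed above.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Assumed PSI encodings of the two hard-coded thresholds.
constexpr const char *kTriggers[] = {
  "some 5000000 10000000",  // 5 s of partial stall within a 10 s window
  "full 10000 2000000",     // 10 ms of full stall within a 2 s window
};

// Allows at most one garbage collection per min_gap.
class CollectLimiter {
  Clock::time_point last_{};
  bool ran_ = false;
public:
  // Returns true if a collection may run at time `now`, and records it.
  bool allow(Clock::time_point now,
             std::chrono::seconds min_gap = std::chrono::seconds(60)) {
    if (ran_ && now - last_ < min_gap)
      return false;         // too soon after the previous collection
    last_ = now;
    ran_ = true;
    return true;
  }
};
```

      Passing the time in as a parameter keeps the limiter independent of the
      wall clock, which makes the one-minute rule easy to check in isolation.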
      
      For a small set of kernels in 2023 (6.5, 6.6), there was a limit requiring
      CAP_SYS_RESOURCE that was lifted[1] to support the use case of user
      memory pressure. It is not currently possible to set CAP_SYS_RESOURCE in
      a systemd service, as it is setting a capability inside a user namespace.
      
      Running under systemd v254+ requires the default MemoryPressureWatch=auto
      (or alternately "on").
      
      Functionality was tested in a 6.4 kernel Fedora successfully under a
      systemd service.
      
      Running in a container requires that (unmask=)/sys/fs/cgroup be writable
      by the mariadbd process.
      
      To aid testing, buf_pool_resize was a convenient trigger point on
      which to trigger garbage collection.
      
      ref [1]: https://lore.kernel.org/all/CAMw=ZnQ56cm4Txgy5EhGYvR+Jt4s-KVgoA9_65HKWVMOXp7a9A@mail.gmail.com/T/#m3bd2a73c5ee49965cb73a830b1ccaa37ccf4e427
      
      Co-Author: Daniel Black (on memory pressure trigger)
      
      Reviewed by: Marko Mäkelä, Vladislav Vaintroub, Vladislav Lesin,
         Thirunarayanan Balathandayuthapani
      
      Tested by: Matthias Leich
  12. 17 Nov, 2023 2 commits
    • MDEV-32027 Opening all .ibd files on InnoDB startup can be slow · eb1f8b29
      Marko Mäkelä authored
      dict_find_max_space_id(): Return SELECT MAX(SPACE) FROM SYS_TABLES.
      
      dict_check_tablespaces_and_store_max_id(): In the normal case
      (no encryption plugin has been loaded and the change buffer is empty),
      invoke dict_find_max_space_id() and do not open any .ibd files.
      If a std::set<uint32_t> has been specified, open the files whose
      tablespace ID is mentioned. Else, open all data files that are identified
      by SYS_TABLES records.
      
      fil_ibd_open(): Remove a call to os_file_get_last_error() that can
      report a misleading error, such as EINVAL inside my_realpath() that is
      not an actual error. This could be invoked when a data file is found
      but the FSP_SPACE_FLAGS are incorrect, such as is the case for
      table test.td in
      ./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 innodb.table_flags
      
      buf_load(): If any tablespaces could not be found, invoke
      dict_check_tablespaces_and_store_max_id() on the missing tablespaces.
      
      dict_load_tablespace(): Try to load the tablespace unless it was found
      to be futile. This fixes failures related to FTS_*.ibd files for
      FULLTEXT INDEX.
      
      btr_cur_t::search_leaf(): Prevent a crash when the tablespace
      does not exist. This was caught by the test innodb_fts.fts_concurrent_insert
      when the change to dict_load_tablespace() was not present.
      
      We modify a few tests to ensure that tables will not be loaded at startup.
      For some fault injection tests this means that the corrupted tables
      will not be loaded, because dict_load_tablespace() would perform stricter
      checks than dict_check_tablespaces_and_store_max_id().
      
      Tested by: Matthias Leich
      Reviewed by: Thirunarayanan Balathandayuthapani
    • Merge 10.5 into 10.6 · 44b9e416
      Marko Mäkelä authored
  13. 16 Nov, 2023 3 commits
    • MDEV-26055: Correct the formula for adaptive flushing · 9a545eb6
      Marko Mäkelä authored
      This is a 10.5 backport of 10.6
      commit d4265fbd.
      
      page_cleaner_flush_pages_recommendation(): If dirty_pct is
      between innodb_max_dirty_pages_pct_lwm
      and innodb_max_dirty_pages_pct,
      scale the effort relative to how close we are to
      innodb_max_dirty_pages_pct.
      
      The previous formula was missing a multiplication by 100.
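
      A hedged sketch of why the factor matters: the function below is a
      hypothetical reconstruction of the scaling described above, not the
      actual page_cleaner_flush_pages_recommendation() code. The names
      pct_for_dirty, lwm and max_pct are assumptions; lwm and max_pct stand
      for innodb_max_dirty_pages_pct_lwm and innodb_max_dirty_pages_pct.

```cpp
#include <cassert>

// Returns the flushing effort as a percentage, given the current share
// of dirty pages and the two thresholds (all in per cent).
unsigned pct_for_dirty(double dirty_pct, double lwm, double max_pct) {
  assert(max_pct > 0);
  if (dirty_pct < lwm)
    return 0;               // below the low-water mark: no extra effort
  // Scale by closeness to max_pct. Without the factor 100, the ratio
  // stays below 1 whenever dirty_pct < max_pct, and the conversion to
  // an integer would truncate it to 0.
  return static_cast<unsigned>(dirty_pct * 100.0 / max_pct);
}
```

      For example, with lwm=10 and max_pct=90, a dirty page share of 45 per
      cent yields an effort of 50, while the same ratio without the
      multiplication would truncate to 0.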
    • MDEV-26055: Improve adaptive flushing · a3d0d5fc
      Marko Mäkelä authored
      This is a 10.5 backport from 10.6
      commit 9593cccf.
      
      Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0
      (not default) and innodb_adaptive_flushing=ON (default).
      There is also the parameter innodb_adaptive_flushing_lwm
      (default: 10 per cent of the log capacity). It should enable some
      adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0.
      That is not being changed here.
      
      This idea was first presented by Inaam Rana several years ago,
      and I discussed it with Jean-François Gagné at FOSDEM 2023.
      
      buf_flush_page_cleaner(): When we are not near the log capacity limit
      (neither buf_flush_async_lsn nor buf_flush_sync_lsn are set),
      also try to move clean blocks from the buf_pool.LRU list to buf_pool.free
      or initiate writes (but not the eviction) of dirty blocks, until
      the remaining I/O capacity has been consumed.
      
      buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify
      whether dirty least recently used pages (from buf_pool.LRU) should
      be evicted immediately after they have been written out. Callers outside
      buf_flush_page_cleaner() will pass evict=true, to retain the existing
      behaviour.
      
      buf_do_LRU_batch(): Add the parameter bool evict.
      Return counts of evicted and flushed pages.
      
      buf_flush_LRU(): Add the parameter bool evict.
      Assume that the caller holds buf_pool.mutex and
      will invoke buf_dblwr.flush_buffered_writes() afterwards.
      
      buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list()
      whose caller must hold buf_pool.mutex and invoke
      buf_dblwr.flush_buffered_writes() afterwards.
      
      buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have
      buf_flush_wait_batch_end().
      
      page_cleaner_flush_pages_recommendation(): Avoid some floating-point
      arithmetic.
      
      buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(),
      buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict".
      
      buf_free_from_unzip_LRU_list_batch(): Remove the parameter.
      Only actual page writes will contribute towards the limit.
      
      buf_LRU_free_page(): Evict freed pages of temporary tables.
      
      buf_pool.done_free: Broadcast whenever a block is freed
      (and buf_pool.try_LRU_scan is set).
      
      buf_pool_t::io_buf_t::reserve(): Retry indefinitely.
      During the test encryption.innochecksum we easily run out of
      these buffers for PAGE_COMPRESSED or ENCRYPTED pages.
      
      Tested by Matthias Leich and Axel Schwenke
    • MDEV-31861 Empty INSERT crashes with innodb_force_recovery=6 or innodb_read_only=ON · 5a1f821b
      Marko Mäkelä authored
      ha_innobase::extra(): Do not invoke log_buffer_flush_to_disk()
      if high_level_read_only holds.
      
      log_buffer_flush_to_disk(): Remove an assertion that duplicates one
      at the start of log_write_up_to().