  1. 04 Dec, 2023 1 commit
      MDEV-32042 Simplify buf_page_get_gen() · 850d6173
      Marko Mäkelä authored
      buf_page_get_low(): Rename to buf_page_get_gen(), and assume that no
      crash recovery is needed.
      
      recv_sys_t::recover(): Replaces the old buf_page_get_gen(). Read a page
      while crash recovery is in progress.
      
      trx_rseg_get_n_undo_tablespaces(), ibuf_upgrade_needed():
      Invoke recv_sys.recover() instead of buf_page_get_gen().
      
      dict_boot(): Invoke recv_sys.recover() instead of buf_page_get_gen().
      Do not load the system tables.
      
      srv_start(): Load the system tables and the undo logs after all redo log
      has been applied in recv_sys.apply(true) and we can safely invoke the
      regular buf_page_get_gen().
  2. 30 Nov, 2023 8 commits
  3. 29 Nov, 2023 2 commits
      MDEV-28682 gcol.gcol_purge contaminates further execution of innodb.gap_locks · 968061fd
      Vlad Lesin authored
      ha_innobase::extra() invokes check_trx_exists() unconditionally, even for
      unsupported operations. check_trx_exists() creates and registers a trx_t
      object if THD does not contain a pointer to one. If ha_innobase::extra()
      does not support some operation, it just invokes check_trx_exists() and
      quits. If check_trx_exists() creates and registers a new trx_t object for
      such an operation, the object will never be freed and deregistered.
      
      For example, if ha_innobase::extra() is invoked from a purge thread with
      operation = HA_EXTRA_IS_ATTACHED_CHILDREN, as happens in the
      gcol.gcol_purge test, a trx_t object will be registered but not
      deregistered. This causes the innodb.gap_locks failure, because "SHOW
      ENGINE INNODB STATUS" shows information about an unexpected transaction
      at the end of trx_sys.trx_list.
      
      The fix is not to invoke check_trx_exists() for unsupported operations
      in ha_innobase::extra().
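
The shape of this fix can be illustrated with a minimal sketch. Everything below is hypothetical: `Op`, `Trx`, `Session`, `ensure_trx()` and this `extra()` are stand-ins invented for illustration, not the MariaDB definitions. The point is only that the lazily created object is touched solely on the supported branches.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; none of these are the real MariaDB definitions.
enum class Op { EnableFlag, DisableFlag, Unsupported };

struct Trx { bool flag = false; };

struct Session {
  std::unique_ptr<Trx> trx;  // created lazily, like the trx_t attached to THD
};

// Analogue of check_trx_exists(): create and register on first use.
Trx &ensure_trx(Session &s) {
  if (!s.trx)
    s.trx = std::make_unique<Trx>();
  return *s.trx;
}

// Analogue of the fixed extra(): only supported operations touch the
// transaction object; anything else returns without side effects.
int extra(Session &s, Op op) {
  switch (op) {
  case Op::EnableFlag:
    ensure_trx(s).flag = true;
    break;
  case Op::DisableFlag:
    ensure_trx(s).flag = false;
    break;
  default:
    break;  // unsupported: do not create a transaction object
  }
  return 0;
}
```

With this structure, a caller that only ever passes an unsupported operation (such as the purge thread in the scenario above) leaves `Session::trx` empty, so nothing is left behind to deregister.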
      
      Reviewed by: Marko Mäkelä
      MDEV-32899 instrumentation · ba6bf7ad
      Marko Mäkelä authored
      In debug builds, let us declare dict_sys.latch as index_lock instead of
      srw_lock, so that we will benefit from the full tracking of lock ownership.
      
      lock_table_for_trx(): Assert that the current thread is not holding
      dict_sys.latch. If the dict_sys.unfreeze() call were moved to the end of
      lock_table_children(), this assertion would fail in the test innodb.innodb
      and many other tests that use FOREIGN KEY.
  4. 28 Nov, 2023 3 commits
      Remove deprecation from mariadbd --debug · 387b92df
      Monty authored
      --debug is supported by almost all our other binaries, and we should keep
      it in the server as well so that option names stay consistent.
      MDEV-32899 InnoDB is holding shared dict_sys.latch while waiting for FOREIGN KEY child table lock on DDL · 569da6a7
      Marko Mäkelä authored
      
      lock_table_children(): A new function to lock all child tables of a table.
      We will only hold dict_sys.latch while traversing
      dict_table_t::referenced_set. To prevent a race condition with
      std::set::erase() we will copy the pointers to the child tables to a
      local vector. Once we have acquired references to all child tables,
      we can safely release dict_sys.latch, wait for the locks, and finally
      release the references.
      
      This fixes up commit 2ca11234 (MDEV-26217)
      and commit c3c53926 (MDEV-26554).
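
The locking pattern described above (pin the children under the latch, release the latch, then wait) can be sketched as follows. This is a simplified illustration, not the InnoDB code: `Table`, `dict_latch`, `lock_table()` and `lock_children()` are hypothetical stand-ins, and a plain `std::shared_mutex` plays the role of dict_sys.latch.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <set>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-ins for illustration; not the InnoDB types.
struct Table {
  std::atomic<int> refs{0};   // a nonzero count keeps the table from being freed
  std::set<Table*> children;  // analogue of dict_table_t::referenced_set
  int locks_taken = 0;
  void acquire() { ++refs; }
  void release() { --refs; }
};

std::shared_mutex dict_latch;  // analogue of dict_sys.latch

// Stand-in for a table lock request that may block for a long time.
void lock_table(Table &t) { ++t.locks_taken; }

void lock_children(Table &parent) {
  std::vector<Table*> pinned;
  {
    // Hold the latch only while traversing the set, and pin every child.
    // Copying the pointers protects us from a concurrent erase() on the set.
    std::shared_lock<std::shared_mutex> guard(dict_latch);
    pinned.reserve(parent.children.size());
    for (Table *child : parent.children) {
      child->acquire();
      pinned.push_back(child);
    }
  }
  // The latch is released here, so the potentially long lock waits
  // no longer block other users of the latch.
  for (Table *child : pinned)
    lock_table(*child);
  for (Table *child : pinned)
    child->release();
}
```

The local vector is what makes it safe to drop the latch: after the scope ends, the function never touches the shared set again, only its own pinned copies.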
      MDEV-32879 Server crash in my_decimal::operator= or unexpected ER_DUP_ENTRY upon comparison with INET6 and similar types · f436b4a5
      Alexander Barkov authored
      
      During the 10.5->10.6 merge please use the 10.6 code on conflicts.
      
      This is the 10.5 version of the patch (a backport of the 10.6 version).
      Unlike 10.6 version, it makes changes in plugin/type_inet/sql_type_inet.*
      rather than in sql/sql_type_fixedbin.h
      
      Item_bool_rowready_func2, Item_func_between, Item_func_in
      did not check if a not-NULL argument of an arbitrary data type
      can produce a NULL value on conversion to INET6.
      
      This caused a crash on DBUG_ASSERT() in conversion failures,
      because the function returned SQL NULL for something that
      has Item::maybe_null() equal to false.
      
      The fix sets NULL-ability in such cases.
      
      Details:
      
      - Removing the code in Item_func::setup_args_and_comparator()
        performing character set aggregation with optional narrowing.
        This aggregation is done inside Arg_comparator::set_cmp_func_string().
        So this code was redundant.

      - Removing Item_func::setup_args_and_comparator(), as it got simplified to
        just two lines:
          convert_const_compared_to_int_field(thd);
          return cmp->set_cmp_func(thd, this, &args[0], &args[1], true);
        Using these lines directly in:
          - Item_bool_rowready_func2::fix_length_and_dec()
          - Item_func_nullif::fix_length_and_dec()
      
      - Adding a new virtual method:
        - Type_handler::Item_bool_rowready_func2_fix_length_and_dec().
      
      - Adding tests detecting if the data type conversion can return SQL NULL into
        the following methods of Type_handler_inet6:
        - Item_bool_rowready_func2_fix_length_and_dec
        - Item_func_between_fix_length_and_dec
        - Item_func_in_fix_comparator_compatible_types
  5. 27 Nov, 2023 1 commit
      MDEV-32879 Server crash in my_decimal::operator= or unexpected ER_DUP_ENTRY upon comparison with INET6 and similar types · 20b0ec9a
      Alexander Barkov authored
      
      This is the 10.6 version of the patch.
      
      Item_bool_rowready_func2, Item_func_between, Item_func_in
      did not check if a not-NULL argument of an arbitrary data type
      can produce a NULL value on conversion to INET6.
      
      This caused a crash on DBUG_ASSERT() in conversion failures,
      because the function returned SQL NULL for something that
      has Item::maybe_null() equal to false.
      
      The fix sets NULL-ability in such cases.
      
      Details:
      
      - Removing the code in Item_func::setup_args_and_comparator()
        performing character set aggregation with optional narrowing.
        This aggregation is done inside Arg_comparator::set_cmp_func_string().
        So this code was redundant.

      - Removing Item_func::setup_args_and_comparator(), as it got simplified to
        just two lines:
          convert_const_compared_to_int_field(thd);
          return cmp->set_cmp_func(thd, this, &args[0], &args[1], true);
        Using these lines directly in:
          - Item_bool_rowready_func2::fix_length_and_dec()
          - Item_func_nullif::fix_length_and_dec()
      
      - Adding a new virtual method:
        - Type_handler::Item_bool_rowready_func2_fix_length_and_dec().
      
      - Adding tests detecting if the data type conversion can return SQL NULL into
        the following methods of Type_handler_fbt:
        - Item_bool_rowready_func2_fix_length_and_dec
        - Item_func_between_fix_length_and_dec
        - Item_func_in_fix_comparator_compatible_types
  6. 24 Nov, 2023 4 commits
  7. 23 Nov, 2023 1 commit
      MDEV-24670 memory pressure - eventfd rather than pipe · a48c1b89
      Daniel Black authored
      Eventfds have a simpler interface and use one file
      descriptor rather than two.
      
      Reuse the pattern used for accepting socket connections
      by testing for abort after a poll returns. This way
      the same event descriptor can be used for Quit
      and the debugging trigger.
      
      Also correct the registration of mem pressure file
      descriptors.
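
A minimal Linux-only sketch of the eventfd-plus-poll pattern described above. The names `aborting`, `Wake`, `signal_event()` and `wait_once()` are hypothetical and not taken from the patch; only `eventfd()`, `poll()`, `read()` and `write()` are real system interfaces.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <poll.h>
#include <sys/eventfd.h>
#include <unistd.h>

std::atomic<bool> aborting{false};  // hypothetical shutdown flag

enum class Wake { Timeout, Quit, Trigger };

// Add 1 to the eventfd counter, which makes the descriptor readable.
bool signal_event(int fd) {
  const uint64_t one = 1;
  return write(fd, &one, sizeof one) == static_cast<ssize_t>(sizeof one);
}

// Wait on a single descriptor. What the wakeup means is decided only
// after poll() has returned, by testing the abort flag.
Wake wait_once(int fd, int timeout_ms) {
  pollfd p{};
  p.fd = fd;
  p.events = POLLIN;
  if (poll(&p, 1, timeout_ms) != 1 || !(p.revents & POLLIN))
    return Wake::Timeout;
  uint64_t count;
  // Reading returns the counter and resets it to zero.
  if (read(fd, &count, sizeof count) != static_cast<ssize_t>(sizeof count))
    return Wake::Timeout;  // nothing was pending after all
  return aborting.load() ? Wake::Quit : Wake::Trigger;
}
```

A descriptor for this sketch would be created with `eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK)`. Compared with a pipe, there is one descriptor to register and close instead of a read end and a write end.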
  8. 22 Nov, 2023 4 commits
  9. 21 Nov, 2023 8 commits
      MDEV-32374 log_sys.lsn_lock is a performance hog · 7443ad1c
      Marko Mäkelä authored
      The log_sys.lsn_lock that was introduced in
      commit a635c406
      had better be located in the same cache line with log_sys.latch
      so that log_t::append_prepare() needs to modify only two first
      cache lines where log_sys is stored.
      
      log_t::lsn_lock: On Linux, change the type from pthread_mutex_t to
      something that may be as small as 32 bits, to pack more data members
      in the same cache line. On Microsoft Windows, CRITICAL_SECTION works
      better.
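
The layout idea can be sketched in isolation. This is an assumption-laden illustration, not the real log_t: the struct `log_like`, its members, the 64-byte cache line size, and the trivial spin lock are all hypothetical. The actual lock type chosen by the patch is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on most x86-64 CPUs.
constexpr std::size_t CACHE_LINE = 64;

// Hypothetical layout: the latch and a 32-bit lock word sit together at
// the start of the object, followed by the data the lock protects.
struct alignas(CACHE_LINE) log_like {
  std::atomic<uint32_t> latch{0};     // stand-in for log_sys.latch
  std::atomic<uint32_t> lsn_lock{0};  // small lock word, not pthread_mutex_t
  std::atomic<uint64_t> lsn{0};       // data protected by lsn_lock
};

// True if two byte offsets within an aligned object share a cache line.
constexpr bool same_cache_line(std::size_t a, std::size_t b) {
  return a / CACHE_LINE == b / CACHE_LINE;
}

static_assert(same_cache_line(offsetof(log_like, latch),
                              offsetof(log_like, lsn_lock)),
              "latch and lsn_lock must share a cache line");

// A deliberately simple spin lock on the 32-bit word.
inline void lsn_lock_acquire(log_like &l) {
  while (l.lsn_lock.exchange(1, std::memory_order_acquire)) {
  }
}

inline void lsn_lock_release(log_like &l) {
  assert(l.lsn_lock.load(std::memory_order_relaxed) == 1);
  l.lsn_lock.store(0, std::memory_order_release);
}
```

The `static_assert` turns the placement requirement into a compile-time check, so a later reordering of members that separates the two words would fail to build.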
      
      log_t::check_flush_or_checkpoint_: Renamed to need_checkpoint.
      There is no need to pause all writer threads in log_free_check() when
      we only need to write log_sys.buf to ib_logfile0. That will be done in
      mtr_t::commit().
      
      log_t::append_prepare_wait(): Make the member function non-static
      to simplify the call interface, and add a parameter for the LSN.
      
      log_t::append_prepare(): Invoke append_prepare_wait() at most once.
      Only set_check_for_checkpoint() if a log checkpoint needs to
      be written. If the log buffer needs to be written, we will take care
      of it ourselves later in our caller. This will reduce interference
      with log_free_check() in other threads.
      
      mtr_t::commit(): Call log_write_up_to() if needed.
      
      log_t::get_write_target(): Return a log_write_up_to() target
      to mtr_t::commit().
      
      buf_flush_ahead(): If we are in furious flushing, call
      log_sys.set_check_for_checkpoint() so that all writers will wait
      in log_free_check() until the checkpoint is done. Otherwise,
      the test innodb.insert_into_empty could occasionally report
      an error "Crash recovery is broken".
      
      log_check_margins(): Replaced by log_free_check().
      
      log_flush_margin(): Removed. This is part of mtr_t::commit()
      and other operations that write log.
      
      log_t::create(), log_t::attach(): Guarantee that buf_free < max_buf_free
      will always hold on PMEM, to satisfy an assumption of
      log_t::get_write_target().
      
      log_write_up_to(): Assert lsn!=0. Such calls are not incorrect, but it
      is cheaper to test that single unlikely condition in mtr_t::commit()
      rather than test several conditions in log_write_up_to().
      
      innodb_drop_database(), unlock_and_close_files(): Check the LSN before
      calling log_write_up_to().
      
      ha_innobase::commit_inplace_alter_table(): Remove redundant calls to
      log_write_up_to() after calling unlock_and_close_files().
      
      Reviewed by: Vladislav Vaintroub
      Stress tested by: Matthias Leich
      Performance tested by: Steve Shaw
      Merge 10.6 into 10.11 · f87c7d17
      Marko Mäkelä authored
      MDEV-32050 fixup: Stabilize tests · 4c16ec3e
      Marko Mäkelä authored
      In any test that uses wait_all_purged.inc, ensure that InnoDB tables
      will be created without persistent statistics.
      
      This is a follow-up to commit cd04673a
      after a similar failure was observed in the innodb_zip.blob test.
      MDEV-32050 Fixup · 804b5974
      Thirunarayanan Balathandayuthapani authored
      - Fixing mariabackup.full_backup test case
      Merge 10.6 into 10.11 · 583a7452
      Marko Mäkelä authored
      Merge 10.5 into 10.6 · 9c5600ad
      Marko Mäkelä authored
      Merge 10.5 into 10.6 · 0ead2031
      Marko Mäkelä authored
      MDEV-32820 Race condition between trx_purge_free_segment() and trx_undo_create() · de31ca6a
      Marko Mäkelä authored
      trx_purge_free_segment(): If fseg_free_step_not_header() needs to be
      called multiple times, acquire an exclusive latch on the
      rollback segment header page after restarting the mini-transaction
      so that the rest of this function cannot execute concurrently
      with trx_undo_create() on the same rollback segment.
      
      This fixes a regression that was introduced in
      commit c14a3943 (MDEV-30753).
      
      Note: The buffer-fixes that we are holding across the mini-transaction
      restart will prevent the pages from being evicted from the buffer pool.
      They may be accessed by other threads or written back to data files
      while we are not holding exclusive latches.
      
      Reviewed by: Vladislav Lesin
  10. 20 Nov, 2023 4 commits
  11. 19 Nov, 2023 3 commits
  12. 18 Nov, 2023 1 commit
      MDEV-31953 madvise(..., MADV_FREE) is causing a performance regression · 23234835
      Marko Mäkelä authored
      buf_page_t::set_os_unused(): Remove the system call that had been added in
      commit 16c97187 and revised in
      commit c1fd082e for Microsoft Windows.
      
      buf_pool_t::garbage_collect(): A new function to collect any garbage
      from the InnoDB buffer pool that can be removed without writing any
      log or data files. This will also invoke madvise() for all of buf_pool.free.
      
      To trigger this the following MDEV is implemented:
      MDEV-24670 avoid OOM by linux kernel co-operative memory management
      
      To avoid frequent triggers that caused the MDEV-31953 regression, while
      still preserving the 10.11 functionality of non-greedy kernel memory
      usage, memory triggers are used.
      
      When memory pressure is triggered, and if the Linux kernel supports it,
      trigger the garbage collection of the InnoDB buffer pool.
      
      The hard-coded triggers fire when there is:
      * some memory pressure in 5 of the last 10 seconds
      * a full stall on memory pressure for 10ms in the last 2 seconds
      
      The kernel will trigger only once in each of these time windows. To avoid
      mariadb being in a constant state of memory garbage collection, this has
      been limited to once per minute.
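
The two thresholds and the once-per-minute limit can be expressed as a small sketch. It assumes the Linux pressure stall information (PSI) trigger format `<some|full> <stall in µs> <window in µs>`. The helper `psi_trigger()` and the class `RateLimiter` are hypothetical names for illustration and are not taken from the patch.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Build a PSI trigger line: "<some|full> <stall µs> <window µs>".
std::string psi_trigger(const char *kind,
                        std::chrono::microseconds stall,
                        std::chrono::microseconds window) {
  return std::string(kind) + ' ' + std::to_string(stall.count()) + ' ' +
         std::to_string(window.count());
}

// Allow an action at most once per interval, measured on a steady clock.
class RateLimiter {
  using clock = std::chrono::steady_clock;
  clock::duration min_interval_;
  clock::time_point last_{};
  bool fired_ = false;

public:
  explicit RateLimiter(clock::duration interval) : min_interval_(interval) {}

  // Returns true (and records the time) if the action may run now.
  bool allow(clock::time_point now) {
    if (fired_ && now - last_ < min_interval_)
      return false;
    fired_ = true;
    last_ = now;
    return true;
  }
};
```

Under these assumptions, `psi_trigger("some", std::chrono::seconds(5), std::chrono::seconds(10))` describes the first threshold and `psi_trigger("full", std::chrono::milliseconds(10), std::chrono::seconds(2))` the second, while a `RateLimiter` constructed with `std::chrono::minutes(1)` would gate the garbage collection call.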
      
      For a small set of kernels in 2023 (6.5, 6.6), there was a limit requiring
      CAP_SYS_RESOURCE that was lifted[1] to support the use case of user
      memory pressure. It is not currently possible to set CAP_SYS_RESOURCE in
      a systemd service, as doing so sets a capability inside a user namespace.
      
      Running under systemd v254+ requires the default MemoryPressureWatch=auto
      (or alternately "on").
      
      Functionality was tested successfully on Fedora with a 6.4 kernel under a
      systemd service.
      
      Running in a container requires that (unmask=)/sys/fs/cgroup be writable
      by the mariadbd process.
      
      To aid testing, buf_pool_resize was a convenient point at
      which to trigger garbage collection.
      
      ref [1]: https://lore.kernel.org/all/CAMw=ZnQ56cm4Txgy5EhGYvR+Jt4s-KVgoA9_65HKWVMOXp7a9A@mail.gmail.com/T/#m3bd2a73c5ee49965cb73a830b1ccaa37ccf4e427
      
      Co-Author: Daniel Black (on memory pressure trigger)
      
      Reviewed by: Marko Mäkelä, Vladislav Vaintroub, Vladislav Lesin,
         Thirunarayanan Balathandayuthapani
      
      Tested by: Matthias Leich