1. 24 Nov, 2023 5 commits
  2. 23 Nov, 2023 1 commit
    • MDEV-24670 memory pressure - eventfd rather than pipe · a48c1b89
      Daniel Black authored
      Eventfds have a simpler interface and are one file
      descriptor rather than two.
      
      Reuse the pattern used for accepting socket connections
      by testing for abort after a poll returns. This way
      the same event descriptor can be used for both the quit
      and the debugging trigger.
      
      Also correct the registration of mem pressure file
      descriptors.
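The eventfd pattern described in this commit can be sketched roughly as follows. This is a minimal, Linux-only illustration and not the code from the commit: `wakeup_channel`, `wait_once()` and `wake_reason` are hypothetical names. It only shows how a single descriptor can serve both a quit request and a debug trigger when the abort flag is tested after `poll()` returns.

```cpp
// Linux-only sketch; all type and function names here are hypothetical.
#include <sys/eventfd.h>
#include <sys/types.h>
#include <poll.h>
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <cstdint>

struct wakeup_channel
{
  int fd= -1;                      // one descriptor, unlike a pipe's two
  std::atomic<bool> abort{false};  // set before notifying to request a quit

  bool open() { fd= eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK); return fd >= 0; }
  void close_fd() { if (fd >= 0) { ::close(fd); fd= -1; } }

  // Wake the poller by adding 1 to the eventfd counter.
  bool notify() const
  {
    assert(fd >= 0);
    const std::uint64_t one= 1;
    return write(fd, &one, sizeof one) == static_cast<ssize_t>(sizeof one);
  }

  void request_quit() { abort.store(true); notify(); }
};

enum class wake_reason { timeout, quit, debug_trigger, error };

// Wait for at most timeout_ms. The abort flag is examined only after
// poll() has returned, so no second descriptor is needed for quitting.
inline wake_reason wait_once(wakeup_channel &ch, int timeout_ms)
{
  pollfd pfd{ch.fd, POLLIN, 0};
  const int n= poll(&pfd, 1, timeout_ms);
  if (n < 0)
    return wake_reason::error;
  if (n == 0)
    return wake_reason::timeout;
  if (ch.abort.load())
    return wake_reason::quit;
  std::uint64_t count;             // reading resets the counter to 0
  if (read(ch.fd, &count, sizeof count) != static_cast<ssize_t>(sizeof count))
    return wake_reason::error;
  return wake_reason::debug_trigger;
}
```

A real server would poll the memory pressure descriptors in the same `poll()` call; they are left out here to keep the sketch short.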
  3. 22 Nov, 2023 3 commits
    • Merge 10.6 into 10.11 · f2bd662f
      Marko Mäkelä authored
    • Merge 10.5 into 10.6 · d963584d
      Marko Mäkelä authored
    • MDEV-32861 InnoDB hangs when running out of I/O slots · 78c9a12c
      Marko Mäkelä authored
      When the constant OS_AIO_N_PENDING_IOS_PER_THREAD is changed from 256 to 1
      and the server is run with the minimum parameters
      innodb_read_io_threads=1 and innodb_write_io_threads=2, two hangs
      were observed.
      
      tpool::cache<T>::put(T*): Ensure that get() in io_slots::acquire()
      will be woken up when the cache previously was empty.
      
      buf_pool_t::io_buf_t::reserve(): Schedule a possibly partial doublewrite
      batch so that os_aio_wait_until_no_pending_writes() has a chance of
      returning. Add a Boolean parameter and pass wait_for_reads=false inside
      buf_page_decrypt_after_read(), because those calls will be executed
      inside a read completion callback, and therefore
      os_aio_wait_until_no_pending_reads() would block indefinitely.
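The wake-up requirement for `put()` can be illustrated with a small blocking object cache. This is a sketch under assumptions, not the actual `tpool::cache<T>`: the class name `slot_cache` and its layout are hypothetical. The point it shows is that `put()` must notify waiters whenever the cache was empty before the element was returned, otherwise a `get()` that is already blocked may never be woken.

```cpp
// Hypothetical stand-in for a fixed-size object cache with a blocking get().
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

template <typename T> class slot_cache
{
  std::mutex m;
  std::condition_variable not_empty;
  std::vector<T> storage;     // the objects owned by the cache
  std::vector<T*> free_list;  // objects currently available to get()
public:
  explicit slot_cache(std::size_t n) : storage(n)
  {
    free_list.reserve(n);
    for (T &t : storage)
      free_list.push_back(&t);
  }

  // Block until an object is available, then hand it out.
  T *get()
  {
    std::unique_lock<std::mutex> lk(m);
    not_empty.wait(lk, [this] { return !free_list.empty(); });
    T *t= free_list.back();
    free_list.pop_back();
    return t;
  }

  // Return an object. If the cache was empty, some get() may be blocked,
  // so wake the waiters; each one rechecks the predicate under the mutex.
  void put(T *t)
  {
    std::lock_guard<std::mutex> lk(m);
    const bool was_empty= free_list.empty();
    free_list.push_back(t);
    if (was_empty)
      not_empty.notify_all();
  }

  std::size_t available()
  {
    std::lock_guard<std::mutex> lk(m);
    return free_list.size();
  }
};
```

With a cache of size 1, which corresponds to the OS_AIO_N_PENDING_IOS_PER_THREAD=1 experiment above, every `put()` finds the cache empty and therefore always notifies.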
  4. 21 Nov, 2023 8 commits
    • MDEV-32374 log_sys.lsn_lock is a performance hog · 7443ad1c
      Marko Mäkelä authored
      The log_sys.lsn_lock that was introduced in
      commit a635c406
      had better be located in the same cache line with log_sys.latch
      so that log_t::append_prepare() needs to modify only the first two
      cache lines where log_sys is stored.
      
      log_t::lsn_lock: On Linux, change the type from pthread_mutex_t to
      something that may be as small as 32 bits, to pack more data members
      in the same cache line. On Microsoft Windows, CRITICAL_SECTION works
      better.
      
      log_t::check_flush_or_checkpoint_: Renamed to need_checkpoint.
      There is no need to pause all writer threads in log_free_check() when
      we only need to write log_sys.buf to ib_logfile0. That will be done in
      mtr_t::commit().
      
      log_t::append_prepare_wait(): Make the member function non-static
      to simplify the call interface, and add a parameter for the LSN.
      
      log_t::append_prepare(): Invoke append_prepare_wait() at most once.
      Only set_check_for_checkpoint() if a log checkpoint needs to
      be written. If the log buffer needs to be written, we will take care
      of it ourselves later in our caller. This will reduce interference
      with log_free_check() in other threads.
      
      mtr_t::commit(): Call log_write_up_to() if needed.
      
      log_t::get_write_target(): Return a log_write_up_to() target
      to mtr_t::commit().
      
      buf_flush_ahead(): If we are in furious flushing, call
      log_sys.set_check_for_checkpoint() so that all writers will wait
      in log_free_check() until the checkpoint is done. Otherwise,
      the test innodb.insert_into_empty could occasionally report
      an error "Crash recovery is broken".
      
      log_check_margins(): Replaced by log_free_check().
      
      log_flush_margin(): Removed. This is part of mtr_t::commit()
      and other operations that write log.
      
      log_t::create(), log_t::attach(): Guarantee that buf_free < max_buf_free
      will always hold on PMEM, to satisfy an assumption of
      log_t::get_write_target().
      
      log_write_up_to(): Assert lsn!=0. Such calls are not incorrect, but it
      is cheaper to test that single unlikely condition in mtr_t::commit()
      rather than test several conditions in log_write_up_to().
      
      innodb_drop_database(), unlock_and_close_files(): Check the LSN before
      calling log_write_up_to().
      
      ha_innobase::commit_inplace_alter_table(): Remove redundant calls to
      log_write_up_to() after calling unlock_and_close_files().
      
      Reviewed by: Vladislav Vaintroub
      Stress tested by: Matthias Leich
      Performance tested by: Steve Shaw
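The cache-line argument can be made concrete with a layout check. The following is only a sketch: `tiny_lock` is a plain 32-bit spin lock standing in for whatever small lock type the commit uses, `log_like` is a hypothetical struct rather than `log_t`, and a 64-byte cache line is assumed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on most x86-64 and many ARM cores.
constexpr std::size_t CACHE_LINE= 64;

// Hypothetical 32-bit lock; a pthread_mutex_t is typically much larger.
class tiny_lock
{
  std::atomic<std::uint32_t> word{0};
public:
  bool try_lock() noexcept
  { return word.exchange(1, std::memory_order_acquire) == 0; }
  void lock() noexcept { while (!try_lock()) {} }
  void unlock() noexcept { word.store(0, std::memory_order_release); }
};
static_assert(sizeof(tiny_lock) == 4, "the lock is meant to be 32 bits");

// Hypothetical log structure: the latch word, the small lock and the
// fields that the lock protects all start in the same cache line.
struct alignas(CACHE_LINE) log_like
{
  std::atomic<std::uint32_t> latch{0};  // stands in for the main latch
  tiny_lock lsn_lock;
  std::uint64_t lsn= 0;
  std::uint64_t buf_free= 0;
};
static_assert(offsetof(log_like, lsn_lock) / CACHE_LINE ==
              offsetof(log_like, latch) / CACHE_LINE,
              "lsn_lock and latch must share a cache line");
static_assert(offsetof(log_like, buf_free) < CACHE_LINE,
              "the protected fields fit in the first cache line");

// Reserve `size` bytes of log and return the starting LSN.
inline std::uint64_t reserve(log_like &log, std::uint64_t size)
{
  log.lsn_lock.lock();
  const std::uint64_t start= log.lsn;
  log.lsn+= size;
  log.buf_free+= size;
  log.lsn_lock.unlock();
  return start;
}
```

The `static_assert` lines make the layout assumption fail at compile time if a later change pushes the lock into another cache line.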
    • Merge 10.6 into 10.11 · f87c7d17
      Marko Mäkelä authored
    • MDEV-32050 fixup: Stabilize tests · 4c16ec3e
      Marko Mäkelä authored
      In any test that uses wait_all_purged.inc, ensure that InnoDB tables
      will be created without persistent statistics.
      
      This is a follow-up to commit cd04673a
      after a similar failure was observed in the innodb_zip.blob test.
    • MDEV-32050 Fixup · 804b5974
      Thirunarayanan Balathandayuthapani authored
      - Fixing mariabackup.full_backup test case
    • Merge 10.6 into 10.11 · 583a7452
      Marko Mäkelä authored
    • Merge 10.5 into 10.6 · 9c5600ad
      Marko Mäkelä authored
    • Merge 10.5 into 10.6 · 0ead2031
      Marko Mäkelä authored
    • MDEV-32820 Race condition between trx_purge_free_segment() and trx_undo_create() · de31ca6a
      Marko Mäkelä authored
      trx_purge_free_segment(): If fseg_free_step_not_header() needs to be
      called multiple times, acquire an exclusive latch on the
      rollback segment header page after restarting the mini-transaction
      so that the rest of this function cannot execute concurrently
      with trx_undo_create() on the same rollback segment.
      
      This fixes a regression that was introduced in
      commit c14a3943 (MDEV-30753).
      
      Note: The buffer-fixes that we are holding across the mini-transaction
      restart will prevent the pages from being evicted from the buffer pool.
      They may be accessed by other threads or written back to data files
      while we are not holding exclusive latches.
      
      Reviewed by: Vladislav Lesin
  5. 20 Nov, 2023 4 commits
  6. 19 Nov, 2023 3 commits
  7. 18 Nov, 2023 1 commit
    • MDEV-31953 madvise(..., MADV_FREE) is causing a performance regression · 23234835
      Marko Mäkelä authored
      buf_page_t::set_os_unused(): Remove the system call that had been added in
      commit 16c97187 and revised in
      commit c1fd082e for Microsoft Windows.
      
      buf_pool_t::garbage_collect(): A new function to collect any garbage
      from the InnoDB buffer pool that can be removed without writing any
      log or data files. This will also invoke madvise() for all of buf_pool.free.
      
      To trigger this, the following MDEV is implemented:
      MDEV-24670 avoid OOM by linux kernel co-operative memory management
      
      To avoid frequent triggers that caused the MDEV-31953 regression, while
      still preserving the 10.11 functionality of non-greedy kernel memory
      usage, memory triggers are used.
      
      When memory pressure is signalled, if this is supported by the Linux
      kernel, trigger the garbage collection of the InnoDB buffer pool.
      
      The hard-coded triggers fire when there is:
      * some memory pressure in 5 of the last 10 seconds
      * a full stall on memory pressure for 10ms in the last 2 seconds
      
      The kernel will trigger only once in each of these time windows. To avoid
      MariaDB being in a constant state of memory garbage collection, this has
      been limited to once per minute.
      
      For a small set of kernels in 2023 (6.5, 6.6), there was a limit requiring
      CAP_SYS_RESOURCE that was lifted[1] to support the use case of user
      memory pressure. It is not currently possible to set CAP_SYS_RESOURCE in
      a systemd service, as it is setting a capability inside a user namespace.
      
      Running under systemd v254+ requires the default MemoryPressureWatch=auto
      (or alternatively "on").
      
      Functionality was tested successfully on Fedora with a 6.4 kernel under
      a systemd service.
      
      Running in a container requires that (unmask=)/sys/fs/cgroup be writable
      by the mariadbd process.
      
      To aid testing, buf_pool_resize was a convenient point at
      which to trigger garbage collection.
      
      ref [1]: https://lore.kernel.org/all/CAMw=ZnQ56cm4Txgy5EhGYvR+Jt4s-KVgoA9_65HKWVMOXp7a9A@mail.gmail.com/T/#m3bd2a73c5ee49965cb73a830b1ccaa37ccf4e427
      
      Co-Author: Daniel Black (on memory pressure trigger)
      
      Reviewed by: Marko Mäkelä, Vladislav Vaintroub, Vladislav Lesin,
         Thirunarayanan Balathandayuthapani
      
      Tested by: Matthias Leich
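The two hard-coded triggers and the once-per-minute limit can be sketched as below. This assumes the Linux pressure stall information (PSI) trigger syntax, in which a trigger is written as `<some|full> <stall in microseconds> <window in microseconds>` to a memory pressure file. The names `psi_trigger`, `psi_trigger_line()` and `rate_limiter` are hypothetical, and the sketch neither opens any pressure file nor calls into the buffer pool.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// One PSI trigger: kind of stall, stall time and window, in microseconds.
struct psi_trigger
{
  const char *kind;
  std::uint64_t stall_us;
  std::uint64_t window_us;
};

// The two triggers from the commit message, converted to microseconds
// (this conversion is my reading of the message, not copied from the code).
constexpr psi_trigger triggers[]=
{
  {"some", 5000000, 10000000},  // some pressure in 5 of the last 10 seconds
  {"full", 10000, 2000000}      // full stall for 10ms in the last 2 seconds
};

// Format the line that would be written to the pressure file.
inline std::string psi_trigger_line(const psi_trigger &t)
{
  return std::string(t.kind) + ' ' + std::to_string(t.stall_us) + ' ' +
    std::to_string(t.window_us);
}

// Allow an action at most once per min_interval.
class rate_limiter
{
  using clock= std::chrono::steady_clock;
  clock::duration min_interval;
  clock::time_point last{};
  bool fired= false;
public:
  explicit rate_limiter(clock::duration d) : min_interval(d) {}

  // Returns true if the action may run at time `now`.
  bool allow(clock::time_point now)
  {
    if (fired && now - last < min_interval)
      return false;
    fired= true;
    last= now;
    return true;
  }
};
```

In such a design, each event delivered on a pressure descriptor would first pass through `allow()` with a 60-second interval before any garbage collection is started.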
  8. 17 Nov, 2023 2 commits
    • MDEV-32027 Opening all .ibd files on InnoDB startup can be slow · eb1f8b29
      Marko Mäkelä authored
      dict_find_max_space_id(): Return SELECT MAX(SPACE) FROM SYS_TABLES.
      
      dict_check_tablespaces_and_store_max_id(): In the normal case
      (no encryption plugin has been loaded and the change buffer is empty),
      invoke dict_find_max_space_id() and do not open any .ibd files.
      If a std::set<uint32_t> has been specified, open the files whose
      tablespace ID is mentioned. Else, open all data files that are identified
      by SYS_TABLES records.
      
      fil_ibd_open(): Remove a call to os_file_get_last_error() that can
      report a misleading error, such as EINVAL inside my_realpath() that is
      not an actual error. This could be invoked when a data file is found
      but the FSP_SPACE_FLAGS are incorrect, such as is the case for
      table test.td in
      ./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 innodb.table_flags
      
      buf_load(): If any tablespaces could not be found, invoke
      dict_check_tablespaces_and_store_max_id() on the missing tablespaces.
      
      dict_load_tablespace(): Try to load the tablespace unless it was found
      to be futile. This fixes failures related to FTS_*.ibd files for
      FULLTEXT INDEX.
      
      btr_cur_t::search_leaf(): Prevent a crash when the tablespace
      does not exist. This was caught by the test innodb_fts.fts_concurrent_insert
      when the change to dict_load_tablespace() was not present.
      
      We modify a few tests to ensure that tables will not be loaded at startup.
      For some fault injection tests this means that the corrupted tables
      will not be loaded, because dict_load_tablespace() would perform stricter
      checks than dict_check_tablespaces_and_store_max_id().
      
      Tested by: Matthias Leich
      Reviewed by: Thirunarayanan Balathandayuthapani
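The startup decision described for dict_check_tablespaces_and_store_max_id() can be modelled in a few lines. This is one possible reading of the commit message, not the InnoDB code: the function names and the plain vector of tablespace IDs standing in for SYS_TABLES are hypothetical, and giving an explicit ID set precedence over the "normal case" is my assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Equivalent of SELECT MAX(SPACE) FROM SYS_TABLES over an in-memory list.
inline std::uint32_t
find_max_space_id(const std::vector<std::uint32_t> &spaces)
{
  std::uint32_t max_id= 0;
  for (std::uint32_t id : spaces)
    max_id= std::max(max_id, id);
  return max_id;
}

// Decide which tablespaces to open at startup.
//  - wanted != nullptr: open only the listed IDs that exist in `spaces`
//  - normal_case (no encryption plugin, empty change buffer): open none
//  - otherwise: open every tablespace identified by `spaces`
inline std::vector<std::uint32_t>
spaces_to_open(const std::vector<std::uint32_t> &spaces, bool normal_case,
               const std::set<std::uint32_t> *wanted)
{
  std::vector<std::uint32_t> result;
  if (wanted)
  {
    for (std::uint32_t id : spaces)
      if (wanted->count(id))
        result.push_back(id);
  }
  else if (!normal_case)
    result= spaces;
  return result;
}
```

In the normal case only the maximum ID is needed, which is why no data file has to be opened at all.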
    • Merge 10.5 into 10.6 · 44b9e416
      Marko Mäkelä authored
  9. 16 Nov, 2023 5 commits
    • MDEV-26055: Correct the formula for adaptive flushing · 9a545eb6
      Marko Mäkelä authored
      This is a 10.5 backport of 10.6
      commit d4265fbd.
      
      page_cleaner_flush_pages_recommendation(): If dirty_pct is
      between innodb_max_dirty_pages_pct_lwm
      and innodb_max_dirty_pages_pct,
      scale the effort relative to how close we are to
      innodb_max_dirty_pages_pct.
      
      The previous formula was missing a multiplication by 100.
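The effect of a missing multiplication by 100 can be shown with a simplified model. The function below is not the formula from page_cleaner_flush_pages_recommendation(); it is a hypothetical helper that assumes the effort is expressed as an integer percentage and scaled by the ratio of dirty_pct to innodb_max_dirty_pages_pct.

```cpp
#include <cassert>

// Hypothetical helper: percentage of the flushing effort to apply.
// Below the low-water mark nothing is requested; at or above the
// maximum the full effort is requested; in between it scales linearly.
inline unsigned flush_effort_pct(double dirty_pct, double lwm_pct,
                                 double max_pct)
{
  assert(max_pct > 0);
  if (dirty_pct < lwm_pct)
    return 0;
  if (dirty_pct >= max_pct)
    return 100;
  return static_cast<unsigned>(dirty_pct * 100 / max_pct);
}

// The same computation without the multiplication by 100: the ratio is
// a fraction below 1 and truncates to 0 when converted to an integer.
inline unsigned flush_effort_pct_without_scaling(double dirty_pct,
                                                 double lwm_pct,
                                                 double max_pct)
{
  assert(max_pct > 0);
  if (dirty_pct < lwm_pct)
    return 0;
  if (dirty_pct >= max_pct)
    return 100;
  return static_cast<unsigned>(dirty_pct / max_pct);
}
```

For example, with 45% dirty pages, a low-water mark of 10% and a maximum of 90%, the first function yields 50 while the unscaled variant yields 0, so no gradual flushing would take place until the maximum is reached.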
    • MDEV-26055: Improve adaptive flushing · a3d0d5fc
      Marko Mäkelä authored
      This is a 10.5 backport from 10.6
      commit 9593cccf.
      
      Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0
      (not default) and innodb_adaptive_flushing=ON (default).
      There is also the parameter innodb_adaptive_flushing_lwm
      (default: 10 per cent of the log capacity). It should enable some
      adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0.
      That is not being changed here.
      
      This idea was first presented by Inaam Rana several years ago,
      and I discussed it with Jean-François Gagné at FOSDEM 2023.
      
      buf_flush_page_cleaner(): When we are not near the log capacity limit
      (neither buf_flush_async_lsn nor buf_flush_sync_lsn is set),
      also try to move clean blocks from the buf_pool.LRU list to buf_pool.free
      or initiate writes (but not the eviction) of dirty blocks, until
      the remaining I/O capacity has been consumed.
      
      buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify
      whether dirty least recently used pages (from buf_pool.LRU) should
      be evicted immediately after they have been written out. Callers outside
      buf_flush_page_cleaner() will pass evict=true, to retain the existing
      behaviour.
      
      buf_do_LRU_batch(): Add the parameter bool evict.
      Return counts of evicted and flushed pages.
      
      buf_flush_LRU(): Add the parameter bool evict.
      Assume that the caller holds buf_pool.mutex and
      will invoke buf_dblwr.flush_buffered_writes() afterwards.
      
      buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list()
      whose caller must hold buf_pool.mutex and invoke
      buf_dblwr.flush_buffered_writes() afterwards.
      
      buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have
      buf_flush_wait_batch_end().
      
      page_cleaner_flush_pages_recommendation(): Avoid some floating-point
      arithmetic.
      
      buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(),
      buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict".
      
      buf_free_from_unzip_LRU_list_batch(): Remove the parameter.
      Only actual page writes will contribute towards the limit.
      
      buf_LRU_free_page(): Evict freed pages of temporary tables.
      
      buf_pool.done_free: Broadcast whenever a block is freed
      (and buf_pool.try_LRU_scan is set).
      
      buf_pool_t::io_buf_t::reserve(): Retry indefinitely.
      During the test encryption.innochecksum we easily run out of
      these buffers for PAGE_COMPRESSED or ENCRYPTED pages.
      
      Tested by Matthias Leich and Axel Schwenke
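The role of the new `evict` parameter can be illustrated with a toy LRU list. This is a simplified model, not buf_flush_LRU_list_batch(): `lru_page`, `batch_counts` and `lru_batch()` are hypothetical, writes are treated as completing instantly, and only writes are counted against the limit, which is a simplification chosen for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct lru_page { bool dirty; };

// Counts of evicted and flushed pages, as returned by one batch.
struct batch_counts
{
  std::size_t evicted= 0;
  std::size_t flushed= 0;
};

// Scan the list from its least recently used end (the back).
// Clean pages are always evicted. Dirty pages are written, up to
// max_writes, and evicted afterwards only if evict is true.
inline batch_counts lru_batch(std::deque<lru_page> &lru,
                              std::size_t max_writes, bool evict)
{
  batch_counts n;
  for (std::size_t i= lru.size(); i-- > 0; )
  {
    if (!lru[i].dirty)
    {
      lru.erase(lru.begin() + static_cast<std::ptrdiff_t>(i));
      n.evicted++;
      continue;
    }
    if (n.flushed == max_writes)
      break;
    lru[i].dirty= false;  // pretend that the write completed
    n.flushed++;
    if (evict)
    {
      lru.erase(lru.begin() + static_cast<std::ptrdiff_t>(i));
      n.evicted++;
    }
  }
  return n;
}
```

With evict=false the written pages stay in the list as clean pages, which corresponds to initiating writes without eviction; with evict=true the list shrinks by every page that was written.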
    • MDEV-31861 Empty INSERT crashes with innodb_force_recovery=6 or innodb_read_only=ON · 5a1f821b
      Marko Mäkelä authored
      ha_innobase::extra(): Do not invoke log_buffer_flush_to_disk()
      if high_level_read_only holds.
      
      log_buffer_flush_to_disk(): Remove an assertion that duplicates one
      at the start of log_write_up_to().
    • MDEV-32050 fixup: innodb.instant_alter_crash · 55a96c05
      Marko Mäkelä authored
      This test occasionally fails with a failure to purge history.
      Let us try to purge everything before starting the interesting part,
      to make that occasional failure go away.
    • MDEV-32811 Potentially broken crash recovery if a mini-transaction frees a... · 6c342459
      Thirunarayanan Balathandayuthapani authored
      MDEV-32811 Potentially broken crash recovery if a mini-transaction frees a page, not modifying previously clean pages
      
      - The 11.2 test innodb.sys_truncate_debug fails while executing an INSERT
      statement. The reason for the failure is that the same mini-transaction
      frees, allocates, and then frees the same page. Page initialization
      clears the FIL_PAGE_LSN on the page, and the FIL_PAGE_LSN is then not
      set after the page is freed again.
      This issue was caused by commit f46efb44.
      
      mtr_t::commit(): Set the FIL_PAGE_LSN even when the page has been freed.
  10. 15 Nov, 2023 6 commits
    • Merge 10.4 into 10.5 · 8b509a5d
      Rex authored
    • MDEV-32757: rollback crash on corruption · ea6ca013
      Marko Mäkelä authored
      trx_undo_free_page(): Detect a case of corrupted TRX_UNDO_PAGE_LIST.
      
      trx_undo_truncate_end(): Stop attempts to truncate a corrupted log.
      
      trx_t::commit_empty(): Add an error message about a corrupted log.
      
      Reviewed by: Thirunarayanan Balathandayuthapani
    • Merge 10.5 into 10.6 · 5dbe7a8c
      Marko Mäkelä authored
    • Merge 10.5 into 10.6 · 52ca2e65
      Marko Mäkelä authored
    • MDEV-32757 innodb_undo_log_truncate=ON is not crash safe · a0f02f74
      Marko Mäkelä authored
      trx_purge_truncate_history(): Do not prematurely mark dirty pages
      as clean. This will be done in mtr_t::commit_shrink() as part of
      Shrink::operator()(mtr_memo_slot_t*). Also, register each dirty page
      only once in the mini-transaction.
      
      fsp_page_create(): Adjust and simplify the page creation during
      undo tablespace truncation. We can directly reuse pages that are
      already in buf_pool.page_hash.
      
      This fixes a regression that was caused by
      commit f5794e1d (MDEV-26445).
      
      Tested by: Matthias Leich
      Reviewed by: Thirunarayanan Balathandayuthapani
    • MDEV-32689: Remove Ubuntu Bionic from 10.5 · 15bb8acf
      Tuukka Pasanen authored
      Remove Ubuntu Bionic from
      debian/autobake-debs.sh, as it is no longer
      used to build official MariaDB images.
      
      REMINDER TO MERGER: This commit should not
      be merged up to 10.6 or later.
  11. 14 Nov, 2023 2 commits