1. 11 Dec, 2023 1 commit
    • MDEV-32958 Unusable key notes do not get reported for some operations · 4ced4898
      Alexander Barkov authored
      Enable unusable key notes for non-equality predicates:
         <, <=, >=, >, BETWEEN, IN, LIKE
      
      Note, in some scenarios it displays duplicate notes, e.g.
      for queries with ORDER BY:
      
        SELECT * FROM t1
        WHERE    indexed_string_column >= 10
        ORDER BY indexed_string_column
        LIMIT 5;
      
      This should be tolerable. Getting rid of the duplicate note
      completely would need a much more complex patch, which is
      not desirable in 10.6.
      
      Details:
      
      - Changing RANGE_OPT_PARAM::note_unusable_keys from bool
        to a new data type Item_func::Bitmap, so the caller can
        choose with finer granularity which predicates
        should raise unusable key notes inside the range optimizer:
          a. all predicates (=, <=>, <, <=, >=, >, BETWEEN, IN, LIKE)
          b. all predicates except equality (=, <=>)
          c. none of the predicates
      
        "b." is needed because in some scenarios equality predicates (=, <=>)
        send unusable key notes at an earlier stage, before the range optimizer,
        during update_ref_and_keys(). Calling the range optimizer with
        "all predicates" would produce duplicate notes for = and <=> in such cases.
      
      - Fixing get_quick_record_count() to call the range optimizer
        with "all predicates except equality" instead of "none of the predicates".
        Before this change the range optimizer suppressed all notes for
        non-equality predicates: <, <=, >=, >, BETWEEN, IN, LIKE.
        This actually fixes the reported problem.
      
      - Fixing JOIN::make_range_rowid_filters() to call the range optimizer
        with "all predicates except equality" instead of "all predicates".
        Before this change the range optimizer produced duplicate notes
        for = and <=> during a rowid_filter optimization.
      
      - Cleanup:
        Adding the op_collation argument to Field::raise_note_cannot_use_key_part()
        and displaying the operation collation rather than the argument collation
        in the unusable key note. This is important for operations with more than
        two arguments: BETWEEN and IN, e.g.:
      
          SELECT * FROM t1
          WHERE column_utf8mb3_general_ci
                BETWEEN 'a' AND 'b' COLLATE utf8mb3_unicode_ci;
      
          SELECT * FROM t1
          WHERE column_utf8mb3_general_ci
                IN ('a', 'b' COLLATE utf8mb3_unicode_ci);
      
          The note for 'a' now prints utf8mb3_unicode_ci as the collation,
          which is the collation of the entire operation:
      
            Cannot use key key1 part[0] for lookup:
            "`column_utf8mb3_general_ci`" of collation `utf8mb3_general_ci` >=
            "'a'" of collation `utf8mb3_unicode_ci`
      
          Before this change it printed the collation of 'a',
          so the note was confusing:
      
            Cannot use key key1 part[0] for lookup:
            "`column_utf8mb3_general_ci`" of collation `utf8mb3_general_ci` >=
            "'a'" of collation `utf8mb3_general_ci`
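The three granularity levels listed in the Details of MDEV-32958 (all predicates, all predicates except equality, none) can be pictured as a bitmask with one bit per predicate kind. The following is a minimal sketch and not the actual Item_func::Bitmap from the server source: the names PredKind and PredBitmap, and the list of predicate kinds, are assumptions made for illustration only.

```cpp
#include <cstdint>

// Hypothetical predicate kinds; the server's real function-type list is longer.
enum class PredKind : unsigned {
  Eq, Equal /* <=> */, Lt, Le, Ge, Gt, Between, In, Like, Count
};

// Hypothetical bitmap holding one bit per predicate kind.
class PredBitmap {
  std::uint32_t bits_;

  static constexpr std::uint32_t bit(PredKind p) {
    return 1u << static_cast<unsigned>(p);
  }

public:
  constexpr explicit PredBitmap(std::uint32_t bits = 0) : bits_(bits) {}

  // True if notes should be raised for predicate kind p.
  constexpr bool test(PredKind p) const { return (bits_ & bit(p)) != 0; }

  // c. none of the predicates
  static constexpr PredBitmap none() { return PredBitmap(0); }

  // a. all predicates
  static constexpr PredBitmap all() {
    return PredBitmap(bit(PredKind::Count) - 1);
  }

  // b. all predicates except equality (= and <=>)
  static constexpr PredBitmap all_except_equality() {
    return PredBitmap(all().bits_ &
                      ~(bit(PredKind::Eq) | bit(PredKind::Equal)));
  }
};
```

With such a type, a caller that already reported notes for = and <=> at an earlier stage would pass PredBitmap::all_except_equality(), and the range optimizer would check test(kind) before raising a note.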
  2. 07 Dec, 2023 1 commit
    • MDEV-32884 Improve S3 options compatibility · bc5e9040
      Andrew Hutchings authored
      The previous commit for MDEV-32884 fixed the s3_protocol_version option,
      which was previously only using "Auto", no matter what it was set to. This
      patch does several things to keep the old behaviour whilst correcting
      for new behaviour and laying the groundwork for the future. This
      includes:
      
      * `Original` now means v2 protocol, which it would have been due to the
        option not working, so upgrades will still work.
      * A new `Legacy` option has been added to mean v1 protocol.
      * Options `Path` and `Domain` have been added; these will be the only
        two options apart from `Auto` in a future release, and are more
        aligned with what this variable means.
      * Fixed the s3.debug test so that it works with v2 protocol.
      * Fixed the s3.amazon test so that it works with region subdomains.
      * Added additional modes to the s3.amazon test.
      * Added s3.not_amazon test for the remaining modes.
      
      This replaces PR #2902.
  3. 06 Dec, 2023 1 commit
  4. 05 Dec, 2023 1 commit
    • MDEV-32068 Some calls to buf_read_ahead_linear() seem to be useless · f074223a
      Marko Mäkelä authored
      The linear read-ahead (enabled by nonzero innodb_read_ahead_threshold)
      works best if index leaf pages or undo log pages have been allocated
      on adjacent page numbers. The read-ahead is assumed not to be helpful
      in other types of page accesses, such as non-leaf index pages.
      
      buf_page_get_low(): Do not invoke buf_page_t::set_accessed(),
      buf_page_make_young_if_needed(), or buf_read_ahead_linear().
      We will invoke them in those callers of buf_page_get_gen() or
      buf_page_get() where it makes sense: the access is not
      one-time-on-startup and the page is not going to be freed soon.
      
      btr_copy_blob_prefix(), btr_pcur_move_to_next_page(),
      trx_undo_get_prev_rec_from_prev_page(),
      trx_undo_get_first_rec(), btr_cur_t::search_leaf(),
      btr_cur_t::open_leaf(): Invoke buf_read_ahead_linear().
      
      We will not invoke linear read-ahead in functions that would
      essentially allocate or free pages, because pages that are
      freshly allocated are expected to be initialized by buf_page_create()
      and not read from the data file. Likewise, freeing pages should
      not involve accessing any sibling pages, except for freeing
      singly-linked lists of BLOB pages.
      
      We will not invoke read-ahead in btr_cur_t::pessimistic_search_leaf()
      or in a pessimistic operation of btr_cur_t::open_leaf(), because
      it is assumed that pessimistic operations should be preceded by
      optimistic operations, which should already have invoked read-ahead.
      
      buf_page_make_young_if_needed(): Invoke also buf_page_t::set_accessed()
      and return the result.
      
      btr_cur_nonleaf_make_young(): Like buf_page_make_young_if_needed(),
      but do not invoke buf_page_t::set_accessed().
      
      Reviewed by: Vladislav Lesin
      Tested by: Matthias Leich
  5. 04 Dec, 2023 3 commits
  6. 30 Nov, 2023 7 commits
  7. 29 Nov, 2023 2 commits
    • MDEV-28682 gcol.gcol_purge contaminates further execution of innodb.gap_locks · 968061fd
      Vlad Lesin authored
      ha_innobase::extra() invokes check_trx_exists() unconditionally, even for
      unsupported operations. check_trx_exists() creates and registers a trx_t
      object if the THD does not contain a pointer to one. If ha_innobase::extra()
      does not support some operation, it just invokes check_trx_exists() and quits.
      If check_trx_exists() creates and registers a new trx_t object for such an
      operation, it will never be freed or deregistered.

      For example, if ha_innobase::extra() is invoked from the purge thread with
      operation = HA_EXTRA_IS_ATTACHED_CHILDREN, as happens in the
      gcol.gcol_purge test, a trx_t object will be registered but not
      deregistered, and this causes the innodb.gap_locks failure, because "SHOW ENGINE
      INNODB STATUS" shows information about an unexpected transaction at the end
      of trx_sys.trx_list.
      
      The fix is not to invoke check_trx_exists() for unsupported operations
      in ha_innobase::extra().
      
      Reviewed by: Marko Mäkelä
    • MDEV-32899 instrumentation · ba6bf7ad
      Marko Mäkelä authored
      In debug builds, let us declare dict_sys.latch as index_lock instead of
      srw_lock, so that we will benefit from the full tracking of lock ownership.
      
      lock_table_for_trx(): Assert that the current thread is not holding
      dict_sys.latch. If the dict_sys.unfreeze() call were moved to the end of
      lock_table_children(), this assertion would fail in the test innodb.innodb
      and many other tests that use FOREIGN KEY.
  8. 28 Nov, 2023 3 commits
    • Remove deprecation from mariadbd --debug · 387b92df
      Monty authored
      --debug is supported by almost all our other binaries and we should keep
      it also in the server to keep option names similar.
    • MDEV-32899 InnoDB is holding shared dict_sys.latch while waiting for FOREIGN... · 569da6a7
      Marko Mäkelä authored
      MDEV-32899 InnoDB is holding shared dict_sys.latch while waiting for FOREIGN KEY child table lock on DDL
      
      lock_table_children(): A new function to lock all child tables of a table.
      We will only hold dict_sys.latch while traversing
      dict_table_t::referenced_set. To prevent a race condition with
      std::set::erase() we will copy the pointers to the child tables to a
      local vector. Once we have acquired references to all child tables,
      we can safely release dict_sys.latch, wait for the locks, and finally
      release the references.
      
      This fixes up commit 2ca11234 (MDEV-26217)
      and commit c3c53926 (MDEV-26554).
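The approach described for lock_table_children() (copy the child pointers while holding the latch, release the latch, then wait for locks) is a general snapshot-then-work pattern. Below is a minimal sketch of that pattern under stated assumptions, not the InnoDB implementation: Table, Dictionary, snapshot_children() and lock_children() are hypothetical names, a std::mutex stands in for dict_sys.latch, and std::shared_ptr copies stand in for acquiring table references.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <vector>

// Hypothetical stand-in for a child table.
struct Table {
  int id = 0;
  bool locked = false;
};

// Hypothetical stand-in for the data dictionary.
struct Dictionary {
  std::mutex latch;                           // plays the role of dict_sys.latch
  std::set<std::shared_ptr<Table>> children;  // plays the role of referenced_set
};

// Copy the child pointers while holding the latch. Each shared_ptr copy keeps
// its table alive, which mimics acquiring a reference to the table.
std::vector<std::shared_ptr<Table>> snapshot_children(Dictionary &dict) {
  std::lock_guard<std::mutex> guard(dict.latch);
  return std::vector<std::shared_ptr<Table>>(dict.children.begin(),
                                             dict.children.end());
}

// Lock every child using only the local snapshot. The latch is no longer
// held here, so a lock wait cannot block other users of the dictionary,
// and a concurrent erase from the set cannot invalidate our iteration.
std::size_t lock_children(Dictionary &dict) {
  std::vector<std::shared_ptr<Table>> snapshot = snapshot_children(dict);
  for (const std::shared_ptr<Table> &table : snapshot) {
    table->locked = true;  // a real table lock request could wait here
  }
  return snapshot.size();  // references are released when snapshot goes away
}
```

The design choice is that the potentially long wait happens on private copies, so the shared structure is only latched for the duration of a short traversal.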
    • MDEV-32879 Server crash in my_decimal::operator= or unexpected ER_DUP_ENTRY... · f436b4a5
      Alexander Barkov authored
      MDEV-32879 Server crash in my_decimal::operator= or unexpected ER_DUP_ENTRY upon comparison with INET6 and similar types
      
      During the 10.5->10.6 merge please use the 10.6 code on conflicts.
      
      This is the 10.5 version of the patch (a backport of the 10.6 version).
      Unlike the 10.6 version, it makes changes in plugin/type_inet/sql_type_inet.*
      rather than in sql/sql_type_fixedbin.h.
      
      Item_bool_rowready_func2, Item_func_between, Item_func_in
      did not check if a not-NULL argument of an arbitrary data type
      can produce a NULL value on conversion to INET6.
      
      This caused a crash on DBUG_ASSERT() in conversion failures,
      because the function returned SQL NULL for something that
      has Item::maybe_null() equal to false.
      
      Adding the setting of NULL-ability in such cases.
      
      Details:
      
      - Removing the code in Item_func::setup_args_and_comparator()
        performing character set aggregation with optional narrowing.
        This aggregation is done inside Arg_comparator::set_cmp_func_string().
        So this code was redundant.

      - Removing Item_func::setup_args_and_comparator() as it got simplified to
        just two lines:
          convert_const_compared_to_int_field(thd);
          return cmp->set_cmp_func(thd, this, &args[0], &args[1], true);
        Using these lines directly in:
          - Item_bool_rowready_func2::fix_length_and_dec()
          - Item_func_nullif::fix_length_and_dec()
      
      - Adding a new virtual method:
        - Type_handler::Item_bool_rowready_func2_fix_length_and_dec().
      
      - Adding tests detecting if the data type conversion can return SQL NULL into
        the following methods of Type_handler_inet6:
        - Item_bool_rowready_func2_fix_length_and_dec
        - Item_func_between_fix_length_and_dec
        - Item_func_in_fix_comparator_compatible_types
  9. 27 Nov, 2023 1 commit
    • MDEV-32879 Server crash in my_decimal::operator= or unexpected ER_DUP_ENTRY... · 20b0ec9a
      Alexander Barkov authored
      MDEV-32879 Server crash in my_decimal::operator= or unexpected ER_DUP_ENTRY upon comparison with INET6 and similar types
      
      This is the 10.6 version of the patch.
      
      Item_bool_rowready_func2, Item_func_between, Item_func_in
      did not check if a not-NULL argument of an arbitrary data type
      can produce a NULL value on conversion to INET6.
      
      This caused a crash on DBUG_ASSERT() in conversion failures,
      because the function returned SQL NULL for something that
      has Item::maybe_null() equal to false.
      
      Adding the setting of NULL-ability in such cases.
      
      Details:
      
      - Removing the code in Item_func::setup_args_and_comparator()
        performing character set aggregation with optional narrowing.
        This aggregation is done inside Arg_comparator::set_cmp_func_string().
        So this code was redundant.

      - Removing Item_func::setup_args_and_comparator() as it got simplified to
        just two lines:
          convert_const_compared_to_int_field(thd);
          return cmp->set_cmp_func(thd, this, &args[0], &args[1], true);
        Using these lines directly in:
          - Item_bool_rowready_func2::fix_length_and_dec()
          - Item_func_nullif::fix_length_and_dec()
      
      - Adding a new virtual method:
        - Type_handler::Item_bool_rowready_func2_fix_length_and_dec().
      
      - Adding tests detecting if the data type conversion can return SQL NULL into
        the following methods of Type_handler_fbt:
        - Item_bool_rowready_func2_fix_length_and_dec
        - Item_func_between_fix_length_and_dec
        - Item_func_in_fix_comparator_compatible_types
  10. 24 Nov, 2023 1 commit
    • MDEV-32873 Test innodb.innodb-index-online occasionally fails · 2f467de4
      Marko Mäkelä authored
      Let us wait for the completion of purge before testing the KILL of
      CREATE INDEX c2d ON t1(c2), so that there will be no table handle
      acquisition by a purge task before the operation is rolled back.
      
      Also, let us make the test compatible with ./mtr --repeat,
      and convert variable_value from string to integer so that any
      comparisons will be performed correctly.
  11. 22 Nov, 2023 3 commits
    • Anel Husakovic's avatar
    • Merge 10.5 into 10.6 · d963584d
      Marko Mäkelä authored
    • MDEV-32861 InnoDB hangs when running out of I/O slots · 78c9a12c
      Marko Mäkelä authored
      When the constant OS_AIO_N_PENDING_IOS_PER_THREAD is changed from 256 to 1
      and the server is run with the minimum parameters
      innodb_read_io_threads=1 and innodb_write_io_threads=2, two hangs
      were observed.
      
      tpool::cache<T>::put(T*): Ensure that get() in io_slots::acquire()
      will be woken up when the cache previously was empty.
      
      buf_pool_t::io_buf_t::reserve(): Schedule a possibly partial doublewrite
      batch so that os_aio_wait_until_no_pending_writes() has a chance of
      returning. Add a Boolean parameter and pass wait_for_reads=false inside
      buf_page_decrypt_after_read(), because those calls will be executed
      inside a read completion callback, and therefore
      os_aio_wait_until_no_pending_reads() would block indefinitely.
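The wake-up rule described for tpool::cache<T>::put() (a blocked get() must be woken when the cache was previously empty) can be illustrated with a condition variable. This is a minimal sketch assuming a plain mutex-protected free list; SlotCache and its members are hypothetical names and do not reflect the actual tpool::cache<T> code.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical cache of reusable objects with a blocking get().
template <typename T>
class SlotCache {
  std::mutex mutex_;
  std::condition_variable not_empty_;
  std::vector<T *> free_;

public:
  // Return an item to the cache. If the cache was empty, threads may be
  // blocked in get(), so they must be notified; otherwise they could
  // sleep forever even though an item is available.
  void put(T *item) {
    std::lock_guard<std::mutex> guard(mutex_);
    const bool was_empty = free_.empty();
    free_.push_back(item);
    if (was_empty) {
      not_empty_.notify_all();
    }
  }

  // Take an item from the cache, blocking while the cache is empty.
  T *get() {
    std::unique_lock<std::mutex> guard(mutex_);
    not_empty_.wait(guard, [this] { return !free_.empty(); });
    T *item = free_.back();
    free_.pop_back();
    return item;
  }

  // Number of items currently available.
  std::size_t size() {
    std::lock_guard<std::mutex> guard(mutex_);
    return free_.size();
  }
};
```

The predicate form of wait() rechecks the condition after every wake-up, so a waiter that loses the race for an item simply goes back to sleep until the next put() on an empty cache.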
  12. 21 Nov, 2023 4 commits
    • MDEV-32050 fixup: Stabilize tests · 4c16ec3e
      Marko Mäkelä authored
      In any test that uses wait_all_purged.inc, ensure that InnoDB tables
      will be created without persistent statistics.
      
      This is a follow-up to commit cd04673a
      after a similar failure was observed in the innodb_zip.blob test.
    • Merge 10.5 into 10.6 · 9c5600ad
      Marko Mäkelä authored
    • Merge 10.5 into 10.6 · 0ead2031
      Marko Mäkelä authored
    • MDEV-32820 Race condition between trx_purge_free_segment() and trx_undo_create() · de31ca6a
      Marko Mäkelä authored
      trx_purge_free_segment(): If fseg_free_step_not_header() needs to be
      called multiple times, acquire an exclusive latch on the
      rollback segment header page after restarting the mini-transaction
      so that the rest of this function cannot execute concurrently
      with trx_undo_create() on the same rollback segment.
      
      This fixes a regression that was introduced in
      commit c14a3943 (MDEV-30753).
      
      Note: The buffer-fixes that we are holding across the mini-transaction
      restart will prevent the pages from being evicted from the buffer pool.
      They may be accessed by other threads or written back to data files
      while we are not holding exclusive latches.
      
      Reviewed by: Vladislav Lesin
  13. 20 Nov, 2023 1 commit
  14. 19 Nov, 2023 3 commits
  15. 17 Nov, 2023 2 commits
    • MDEV-32027 Opening all .ibd files on InnoDB startup can be slow · eb1f8b29
      Marko Mäkelä authored
      dict_find_max_space_id(): Return SELECT MAX(SPACE) FROM SYS_TABLES.
      
      dict_check_tablespaces_and_store_max_id(): In the normal case
      (no encryption plugin has been loaded and the change buffer is empty),
      invoke dict_find_max_space_id() and do not open any .ibd files.
      If a std::set<uint32_t> has been specified, open the files whose
      tablespace ID is mentioned. Else, open all data files that are identified
      by SYS_TABLES records.
      
      fil_ibd_open(): Remove a call to os_file_get_last_error() that can
      report a misleading error, such as EINVAL inside my_realpath() that is
      not an actual error. This could be invoked when a data file is found
      but the FSP_SPACE_FLAGS are incorrect, such as is the case for
      table test.td in
      ./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 innodb.table_flags
      
      buf_load(): If any tablespaces could not be found, invoke
      dict_check_tablespaces_and_store_max_id() on the missing tablespaces.
      
      dict_load_tablespace(): Try to load the tablespace unless it was found
      to be futile. This fixes failures related to FTS_*.ibd files for
      FULLTEXT INDEX.
      
      btr_cur_t::search_leaf(): Prevent a crash when the tablespace
      does not exist. This was caught by the test innodb_fts.fts_concurrent_insert
      when the change to dict_load_tablespace() was not present.
      
      We modify a few tests to ensure that tables will not be loaded at startup.
      For some fault injection tests this means that the corrupted tables
      will not be loaded, because dict_load_tablespace() would perform stricter
      checks than dict_check_tablespaces_and_store_max_id().
      
      Tested by: Matthias Leich
      Reviewed by: Thirunarayanan Balathandayuthapani
    • Merge 10.5 into 10.6 · 44b9e416
      Marko Mäkelä authored
  16. 16 Nov, 2023 4 commits
    • MDEV-26055: Correct the formula for adaptive flushing · 9a545eb6
      Marko Mäkelä authored
      This is a 10.5 backport of 10.6
      commit d4265fbd.
      
      page_cleaner_flush_pages_recommendation(): If dirty_pct is
      between innodb_max_dirty_pages_pct_lwm
      and innodb_max_dirty_pages_pct,
      scale the effort relative to how close we are to
      innodb_max_dirty_pages_pct.
      
      The previous formula was missing a multiplication by 100.
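To see why a missing multiplication by 100 matters when effort is expressed as a percentage, consider the sketch below. It is not the formula from page_cleaner_flush_pages_recommendation(); the function names, the parameters, and the simple ratio used here are assumptions chosen only to illustrate the arithmetic. The commit message does not state the exact shape of the earlier expression.

```cpp
// Hypothetical helper: percentage (0..100) of the available I/O capacity to
// spend on flushing, scaled by how close dirty_pct is to max_pct.
// Assumes 0 < lwm_pct <= max_pct.
unsigned flush_effort_pct(double dirty_pct, double lwm_pct, double max_pct) {
  if (dirty_pct < lwm_pct) {
    return 0;    // below the low-water mark: no flushing based on dirty pages
  }
  if (dirty_pct >= max_pct) {
    return 100;  // at or above the maximum: full effort
  }
  // The ratio dirty_pct / max_pct lies in [0, 1); multiply by 100 to get
  // a percentage before truncating to an integer.
  return static_cast<unsigned>(dirty_pct * 100.0 / max_pct);
}

// The same ratio without the factor of 100. Between the low-water mark and
// the maximum the ratio is below 1, so truncation always yields 0.
unsigned flush_effort_pct_unscaled(double dirty_pct, double lwm_pct,
                                   double max_pct) {
  if (dirty_pct < lwm_pct) {
    return 0;
  }
  if (dirty_pct >= max_pct) {
    return 100;
  }
  return static_cast<unsigned>(dirty_pct / max_pct);
}
```

For example, with a low-water mark of 10, a maximum of 90 and 45 per cent dirty pages, the scaled version yields 50 while the unscaled version yields 0, which would leave the intermediate range without any graduated flushing.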
    • MDEV-26055: Improve adaptive flushing · a3d0d5fc
      Marko Mäkelä authored
      This is a 10.5 backport from 10.6
      commit 9593cccf.
      
      Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0
      (not default) and innodb_adaptive_flushing=ON (default).
      There is also the parameter innodb_adaptive_flushing_lwm
      (default: 10 per cent of the log capacity). It should enable some
      adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0.
      That is not being changed here.
      
      This idea was first presented by Inaam Rana several years ago,
      and I discussed it with Jean-François Gagné at FOSDEM 2023.
      
      buf_flush_page_cleaner(): When we are not near the log capacity limit
      (neither buf_flush_async_lsn nor buf_flush_sync_lsn is set),
      also try to move clean blocks from the buf_pool.LRU list to buf_pool.free
      or initiate writes (but not the eviction) of dirty blocks, until
      the remaining I/O capacity has been consumed.
      
      buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify
      whether dirty least recently used pages (from buf_pool.LRU) should
      be evicted immediately after they have been written out. Callers outside
      buf_flush_page_cleaner() will pass evict=true, to retain the existing
      behaviour.
      
      buf_do_LRU_batch(): Add the parameter bool evict.
      Return counts of evicted and flushed pages.
      
      buf_flush_LRU(): Add the parameter bool evict.
      Assume that the caller holds buf_pool.mutex and
      will invoke buf_dblwr.flush_buffered_writes() afterwards.
      
      buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list()
      whose caller must hold buf_pool.mutex and invoke
      buf_dblwr.flush_buffered_writes() afterwards.
      
      buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have
      buf_flush_wait_batch_end().
      
      page_cleaner_flush_pages_recommendation(): Avoid some floating-point
      arithmetic.
      
      buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(),
      buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict".
      
      buf_free_from_unzip_LRU_list_batch(): Remove the parameter.
      Only actual page writes will contribute towards the limit.
      
      buf_LRU_free_page(): Evict freed pages of temporary tables.
      
      buf_pool.done_free: Broadcast whenever a block is freed
      (and buf_pool.try_LRU_scan is set).
      
      buf_pool_t::io_buf_t::reserve(): Retry indefinitely.
      During the test encryption.innochecksum we easily run out of
      these buffers for PAGE_COMPRESSED or ENCRYPTED pages.
      
      Tested by Matthias Leich and Axel Schwenke
    • MDEV-31861 Empty INSERT crashes with innodb_force_recovery=6 or innodb_read_only=ON · 5a1f821b
      Marko Mäkelä authored
      ha_innobase::extra(): Do not invoke log_buffer_flush_to_disk()
      if high_level_read_only holds.
      
      log_buffer_flush_to_disk(): Remove an assertion that duplicates one
      at the start of log_write_up_to().
    • MDEV-32050 fixup: innodb.instant_alter_crash · 55a96c05
      Marko Mäkelä authored
      This test occasionally fails with a failure to purge history.
      Let us try to purge everything before starting the interesting part,
      to make that occasional failure go away.
  17. 15 Nov, 2023 2 commits
    • Merge 10.4 into 10.5 · 8b509a5d
      Rex authored
    • MDEV-32757: rollback crash on corruption · ea6ca013
      Marko Mäkelä authored
      trx_undo_free_page(): Detect a case of corrupted TRX_UNDO_PAGE_LIST.
      
      trx_undo_truncate_end(): Stop attempts to truncate a corrupted log.
      
      trx_t::commit_empty(): Add an error message about a corrupted log.
      
      Reviewed by: Thirunarayanan Balathandayuthapani