1. 01 Aug, 2023 3 commits
    • MDEV-27593: Crashing on I/O error is unhelpful · 72928e64
      Marko Mäkelä authored
      buf_page_t::write_complete(), buf_page_write_complete(),
      IORequest::write_complete(): Add a parameter for passing
      an error code. If an error occurred, we will release the
      io-fix, buffer-fix and page latch but not reset the
      oldest_modification field. The block would remain in
      buf_pool.LRU and possibly buf_pool.flush_list, to be written
      again later, by buf_flush_page_cleaner(). If all page writes
      start consistently failing, all write threads should eventually
      hang in log_free_check() because the log checkpoint cannot
      be advanced to make room in the circular write-ahead-log ib_logfile0.
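      The keep-the-page-dirty behavior described above can be sketched in a
      standalone form. This is a minimal illustration only; `Page` and
      `on_write_complete` are hypothetical names, not the actual InnoDB types
      or functions:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified page descriptor: a nonzero
// oldest_modification means the page is dirty and still has to be
// written back at some point.
struct Page {
  uint64_t oldest_modification = 0;
  bool io_fixed = false;
};

// Sketch of the MDEV-27593 idea: on a write error, release the
// io-fix but leave oldest_modification intact, so the page stays in
// the flush list and the page cleaner will retry the write later.
inline void on_write_complete(Page &page, int err) {
  page.io_fixed = false;          // always release the io-fix
  if (err == 0)
    page.oldest_modification = 0; // clean: can leave the flush list
  // on error: oldest_modification is kept => the page remains dirty
}
```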
      
      IORequest::read_complete(): Add a parameter for passing
      an error code. If a read operation fails, we report the error
      and discard the page, just like we would do if the page checksum
      was not validated or the page could not be decrypted.
      This only affects asynchronous reads, due to linear or random read-ahead
      or crash recovery. When buf_page_get_low() invokes buf_read_page(),
      that will be a synchronous read, not involving this code.
      
      This was tested by randomly injecting errors in
      write_io_callback() and read_io_callback(), like this:
      
        if (!ut_rnd_interval(100))
          cb->m_err= 42;
      72928e64
    • MDEV-31816 fixup: Relax a debug assertion · 96cfdb87
      Marko Mäkelä authored
      buf_LRU_free_page(): The block may also be in the IBUF_EXIST state
      when executing the test innodb.innodb_bulk_create_index_debug.
      96cfdb87
    • MDEV-31816 buf_LRU_free_page() does not preserve ROW_FORMAT=COMPRESSED block state · d794d348
      Marko Mäkelä authored
      buf_LRU_free_page(): When we are discarding the uncompressed copy of a
      ROW_FORMAT=COMPRESSED page, buf_page_t::can_relocate() must have ensured
      that the block descriptor state is one of FREED, UNFIXED, REINIT.
      Do not overwrite the state with UNFIXED. We do not want to write back
      pages that were actually freed, and we want to avoid doublewrite for
      pages that were (re)initialized by log records written since the latest
      checkpoint. Last but not least, we do not want crashes like those that
      commit dc1bd180 (MDEV-31386)
      was supposed to fix.
      
      The test innodb_zip.wl5522_zip should typically cover all 3 states.
      
      This bug is a regression due to
      commit aaef2e1d (MDEV-27058).
      d794d348
  2. 28 Jul, 2023 1 commit
    • MDEV-31354 SIGSEGV in log_sort_flush_list() in InnoDB crash recovery · 0d175968
      Marko Mäkelä authored
      log_sort_flush_list(): Wait for any pending page writes to cease
      before sorting the buf_pool.flush_list. Starting with
      commit 22b62eda (MDEV-25113),
      it is possible that some buf_page_t::oldest_modification_
      that we will be comparing in std::sort() will be updated from
      some value >2 to 1 while we are holding buf_pool.flush_list_mutex.
      
      To catch this type of trouble better in the future, we will clean
      garbage (pages that have been written out) from buf_pool.flush_list
      while constructing the array for sorting, and check with debug
      assertions that all blocks that we are copying from the array to the
      list will be dirty (requiring a writeback) while we sort and copy
      the array back to buf_pool.flush_list.
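      The collect-while-cleaning step can be illustrated with a standalone
      sketch. The names and the descending sort order are for illustration
      only; the real code operates on buf_pool.flush_list while holding
      buf_pool.flush_list_mutex:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical simplification: an oldest_modification of 0 or 1
// marks a page that has been (or is being) written out, i.e.
// "garbage" for the purposes of sorting the flush list.
inline std::vector<uint64_t>
collect_and_sort(const std::vector<uint64_t> &flush_list) {
  std::vector<uint64_t> dirty;
  for (uint64_t lsn : flush_list)
    if (lsn > 1)        // skip already-written pages while collecting
      dirty.push_back(lsn);
  // sort newest-first (descending oldest_modification), matching the
  // order in which dirty blocks are appended to the flush list
  std::sort(dirty.begin(), dirty.end(),
            [](uint64_t a, uint64_t b) { return a > b; });
  return dirty;
}
```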
      
      This failure was observed by chance exactly once when running the test
      innodb.recovery_memory. It was never reproduced in the same form
      afterwards. Unrelated to this change, the test does occasionally
      reproduce a failure to start up InnoDB due to a corrupted page
      being read in recovery. The ticket MDEV-31791 was filed for that.
      
      Tested by: Matthias Leich
      0d175968
  3. 25 Jul, 2023 1 commit
    • MDEV-31767 InnoDB tables are being flagged as corrupted on an I/O bound server · b102872a
      Marko Mäkelä authored
      The main problem is that ever since
      commit aaef2e1d removed the
      function buf_wait_for_read(), it has not been safe to invoke
      buf_page_get_low() with RW_NO_LATCH, that is, only buffer-fixing
      the page. If a page read (or decryption or decompression) is in
      progress, there would be a race condition when executing consistency
      checks, and a page would wrongly be flagged as corrupted.
      
      Furthermore, if the page is actually corrupted and the initial
      access to it was with RW_NO_LATCH (only buffer-fixing), the
      page read handler would likely end up in an infinite loop in
      buf_pool_t::corrupted_evict(). It is not safe to invoke
      mtr_t::upgrade_buffer_fix() on a block on which a page latch
      was not initially acquired in buf_page_get_low().
      
      btr_block_reget(): Remove the constant parameter rw_latch=RW_X_LATCH.
      
      btr_block_get(): Assert that RW_NO_LATCH is not being used,
      and change the parameter type of rw_latch.
      
      btr_pcur_move_to_next_page(), innobase_table_is_empty(): Adjust for the
      parameter type change of btr_block_get().
      
      btr_root_block_get(): If mode==RW_NO_LATCH, do not check the integrity of
      the page, because it is not safe to do so.
      
      btr_page_alloc_low(), btr_page_free(): If the root page latch is not
      previously held by the mini-transaction, invoke btr_root_block_get()
      again with the proper latching mode.
      
      btr_latch_prev(): Helper function to safely acquire a latch on a
      preceding sibling page while holding a latch on a B-tree page.
      To avoid deadlocks, we must not wait for the latch while holding
      a latch on the current page, because another thread may be waiting
      for our page latch when moving to the next page from our preceding
      sibling page. If s_lock_try() or x_lock_try() on the preceding page fails,
      we must release the current page latch, and wait for the latch on the
      preceding page as well as the current page, in that order.
      Page splits or merges will be prevented by the parent page latch
      that we are holding.
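      The latching protocol described above is a classic deadlock-avoidance
      pattern: try-lock the second latch, and on failure release the first
      and re-acquire both in a fixed order. A generic sketch with a toy
      try-lockable latch (not the actual InnoDB latch types):

```cpp
#include <atomic>

// Toy try-lockable latch standing in for a page latch.
struct Latch {
  std::atomic<bool> locked{false};
  bool try_lock() {
    bool expected = false;
    return locked.compare_exchange_strong(expected, true);
  }
  void lock() {
    bool expected = false;
    while (!locked.compare_exchange_strong(expected, true))
      expected = false;   // spin until acquired
  }
  void unlock() { locked.store(false); }
};

// Sketch of the btr_latch_prev() idea.  `cur` is assumed to be held
// by the caller.  Never block on `prev` while holding `cur`; if the
// try-lock fails, release `cur`, then take prev and cur in that
// fixed order, so two threads cannot wait on each other's latch.
inline void acquire_prev(Latch &prev, Latch &cur) {
  if (prev.try_lock())
    return;        // fast path: no waiting while holding cur
  cur.unlock();    // avoid deadlock: drop the current page latch
  prev.lock();     // wait for the preceding page latch first
  cur.lock();      // then re-acquire the current page latch
}
```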
      
      btr_cur_t::search_leaf(): Make use of btr_latch_prev().
      
      btr_cur_t::open_leaf(): Make use of btr_latch_prev(). Do not invoke
      mtr_t::upgrade_buffer_fix() (when latch_mode == BTR_MODIFY_TREE),
      because we will already have acquired all page latches upfront.
      
      btr_cur_t::pessimistic_search_leaf(): Do acquire an exclusive index latch
      before accessing the page. Make use of btr_latch_prev().
      b102872a
  4. 24 Jul, 2023 1 commit
    • MDEV-31120 Duplicate entry allowed into a UNIQUE column · 9bb5b253
      Marko Mäkelä authored
      row_ins_sec_index_entry_low(): Correct a condition that was
      inadvertently inverted
      in commit 89ec4b53 (MDEV-29603).
      
      We are not supposed to buffer INSERT operations into unique indexes,
      because duplicate key values would not be checked for. It is only
      allowed when using unique_checks=0, and in that case the user is
      supposed to guarantee that there are no duplicates.
      9bb5b253
  5. 21 Jul, 2023 1 commit
    • MDEV-31577: Make ANALYZE FORMAT=JSON print innodb stats · 6e484c3b
      Sergei Petrunia authored
      ANALYZE FORMAT=JSON output now includes table.r_engine_stats which
      has the engine statistics. Only non-zero members are printed.
      
      Internally: the EXPLAIN data structures Explain_table_access and
      Explain_update now have a handler* handler_for_stats pointer.
      It is used to read statistics from handler_for_stats->handler_stats.
      
      The following applies only to 10.9+; the backport doesn't use it:
      
      Explain data structures exist after the tables are closed. We avoid
      walking invalid pointers using this:
      - SQL layer calls Explain_query::notify_tables_are_closed() before
        closing tables.
      - After that call, printing of JSON output is disabled. Non-JSON output
        can be printed but we don't access handler_for_stats when doing that.
      6e484c3b
  6. 13 Jul, 2023 2 commits
  7. 12 Jul, 2023 1 commit
    • MDEV-29152: Assertion failed ... upon TO_CHAR with wrong argument · feaeb27b
      Sergei Petrunia authored
      Item_func_tochar::check_arguments() didn't check if its arguments
      each had one column. Failing to make this check and proceeding would
      eventually cause either an assertion failure or the execution would
      reach "MY_ASSERT_UNREACHABLE();" which would produce a crash with
      a misleading stack trace.
      
      * Fixed Item_func_tochar::check_arguments() to do the required check.
      
      * Also fixed MY_ASSERT_UNREACHABLE() to terminate the program. Just
      "executing" __builtin_unreachable() used to cause "undefined results",
      which in my experience was a crash with corrupted stack trace.
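      The difference matters because __builtin_unreachable() is only an
      optimizer hint; actually reaching it is undefined behavior. A hedged
      sketch of the kind of change described (illustrative only, not the
      exact MariaDB macro definition):

```cpp
#include <cstdio>
#include <cstdlib>

// Before (sketch): only an optimizer hint; reaching it is undefined
// behavior, typically a crash with a corrupted stack trace.
//   #define MY_ASSERT_UNREACHABLE() __builtin_unreachable()
//
// After (illustrative): terminate deterministically, so the failure
// point is obvious in the output and in the core dump.
#define MY_ASSERT_UNREACHABLE() \
  do { std::fprintf(stderr, "unreachable code at %s:%d\n", \
                    __FILE__, __LINE__); std::abort(); } while (0)

// Example use: the fall-through after the switch should be impossible.
inline const char *sign_name(int s) {
  switch (s < 0 ? -1 : s > 0 ? 1 : 0) {
  case -1: return "negative";
  case 0:  return "zero";
  case 1:  return "positive";
  }
  MY_ASSERT_UNREACHABLE();
}
```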
      feaeb27b
  8. 10 Jul, 2023 1 commit
    • MDEV-29311 Server Status Innodb_row_lock_time% is reported in seconds · 090a8436
      Vlad Lesin authored
      Before MDEV-24671, the wait time was derived from my_interval_timer() /
      1000 (nanoseconds converted to microseconds, and not microseconds to
      milliseconds like I must have assumed). The lock_sys.wait_time and
      lock_sys.wait_time_max are already in milliseconds; we should not divide
      them by 1000.
      
      In MDEV-24738 the millisecond counts lock_sys.wait_time and
      lock_sys.wait_time_max were changed to a 32-bit type. That would
      overflow in 49.7 days. Keep using a 64-bit type for those millisecond
      counters.
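      The 49.7-day figure follows directly from the counter width: a 32-bit
      millisecond counter wraps after 2^32 ms. A quick check of the
      arithmetic (the computation only, not MariaDB code):

```cpp
// A 32-bit millisecond counter wraps after 2^32 ms.
// 2^32 ms / 1000 / 60 / 60 / 24 ~= 49.71 days.
inline double wrap_days_for_32bit_ms() {
  return 4294967296.0 / (1000.0 * 60 * 60 * 24);
}
```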
      
      Reviewed by: Marko Mäkelä
      090a8436
  9. 07 Jul, 2023 2 commits
    • Disable view protocol for opt_trace.test · 6ed14bcc
      Monty authored
      This is because some query plans change because of views.
      6ed14bcc
    • MDEV-31558 Add InnoDB engine information to the slow query log · 99bd2260
      Monty authored
      The new statistics are enabled by adding the "engine", "innodb" or "full"
      option to --log-slow-verbosity
      
      Example output:
      
       # Pages_accessed: 184  Pages_read: 95  Pages_updated: 0  Old_rows_read: 1
       # Pages_read_time: 17.0204  Engine_time: 248.1297
      
      Pages_read_time is the time spent doing physical reads inside a storage engine.
      (Writes cannot be tracked as these are usually done in the background).
      Engine_time is the time spent inside the storage engine for the full
      duration of the read/write/update calls. It uses the same code as
      'analyze statement' for calculating the time spent.
      
      The engine statistics are collected through a generic interface that should be
      easy for any engine to use. It can also easily be extended to provide
      even more statistics.
      
      Currently only InnoDB has counters for Pages_% and Undo_% status.
      Engine_time works for all engines.
      
      Implementation details:
      
      class ha_handler_stats holds all engine stats.  This class is included
      in handler and THD classes.
      While a query is running, all statistics are updated in the handler. In
      close_thread_tables() the statistics are added to the THD.
      
      handler::handler_stats is a pointer to where statistics should be
      collected. This is set to point to handler::active_handler_stats if
      stats are requested. If not, it is set to 0.
      handler_stats also has an element, 'active', that is 1 if stats are
      requested. This allows engines to avoid extra 'if' checks while
      updating the statistics.
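      The pointer indirection described above is a common trick for
      near-zero-cost optional instrumentation: hot-path code updates counters
      through a pointer that is only set when stats are requested. A minimal
      sketch (names are illustrative, not the actual handler API):

```cpp
#include <cstdint>

// Hypothetical, simplified version of the ha_handler_stats idea.
struct HandlerStats {
  uint64_t pages_accessed = 0;
  bool active = false;   // lets engines skip work without extra ifs
};

struct Handler {
  HandlerStats active_handler_stats;     // storage when stats are on
  HandlerStats *handler_stats = nullptr; // where engines bump counters

  void enable_stats() {
    active_handler_stats = HandlerStats{};
    active_handler_stats.active = true;
    handler_stats = &active_handler_stats;
  }
  void disable_stats() { handler_stats = nullptr; }

  // Hot path: a single null check guards all counter updates.
  void on_page_access() {
    if (handler_stats)
      ++handler_stats->pages_accessed;
  }
};
```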
      
      Cloned or partition tables have the pointer set to the base table if
      stats are requested.
      
      There is a small performance impact when using --log-slow-verbosity=engine:
      - All engine calls in 'select' will be timed.
      - IO calls for InnoDB reads will be timed.
      - Counters are incremented on local variables and accesses
        are inlined, so these should have very little impact.
      - Statistics have to be reset for each statement for the THD and each
        used handler. This is only 40 bytes, which should be negligible.
      - For partition tables we have to loop over all partitions to update
        the handler_status as part of table_init(). This can be optimized in
        the future to only do it when log-slow-verbosity changes. For this to
        work we have to update handler_status for all opened partitions and
        also for all partitions opened in the future.
      
      Other things:
      - Added options 'engine' and 'full' to log-slow-verbosity.
      - Some of the new files in the test suite come from Percona Server, which
        has similar status information.
      - buf_page_optimistic_get(): Do not increment any counter, since we are
        only validating a pointer, not performing any buf_pool.page_hash lookup.
      - Added THD argument to save_explain_data_intern().
      - Switched arguments for save_explain_.*_data() to have
        always THD first (generates better code as other functions also have THD
        first).
      99bd2260
  10. 05 Jul, 2023 3 commits
    • Merge 10.5 into 10.6 · 2855bc53
      Marko Mäkelä authored
      2855bc53
    • MDEV-31568 InnoDB protection against dual processes accessing data insufficient · bd7908e6
      Marko Mäkelä authored
      fil_node_open_file_low(): Always acquire an advisory lock on
      the system tablespace. Originally, we already did this in
      SysTablespace::open_file(), but SysTablespace::open_or_create()
      would release those locks when it is closing the file handles.
      
      This is a 10.5+ specific follow up to
      commit 0ee1082b (MDEV-28495).
      
      Thanks to Daniel Black for verifying this bug.
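      Advisory file locking of the kind referred to above can be sketched
      with POSIX fcntl(). This is a generic illustration, not the
      fil_node_open_file_low() code:

```cpp
#include <fcntl.h>
#include <unistd.h>

// Take an exclusive advisory write lock on an open file descriptor.
// A second process that asks for the same lock fails, which is how
// another server instance using the same data files can be detected.
// Returns 0 on success, -1 on failure.
static int advisory_lock(int fd) {
  struct flock fl = {};
  fl.l_type = F_WRLCK;    // exclusive lock
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;         // from the start of the file...
  fl.l_len = 0;           // ...to the end (0 = the whole file)
  return fcntl(fd, F_SETLK, &fl);
}
```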
      bd7908e6
    • MDEV-31621 Remove ibuf_read_merge_pages() call from ibuf_insert_low() · 5b62644e
      Marko Mäkelä authored
      When InnoDB attempts to buffer a change operation of a secondary index
      leaf page (to insert, delete-mark or remove a record) and the
      change buffer is too large, InnoDB used to trigger a change buffer merge
      that could affect any tables. This could lead to huge variance in
      system throughput and potentially unpredictable crashes, in case the
      change buffer was corrupted and a crash occurred while attempting to
      merge changes to a table that is not being accessed by the current
      SQL statement.
      
      ibuf_insert_low(): Simply return DB_STRONG_FAIL when the maximum size
      of the change buffer is exceeded.
      
      ibuf_contract_after_insert(): Remove.
      
      ibuf_get_merge_page_nos_func(): Remove a constant parameter.
      The function ibuf_contract() will be our only caller, during
      shutdown with innodb_fast_shutdown=0.
      5b62644e
  11. 04 Jul, 2023 10 commits
  12. 03 Jul, 2023 3 commits
  13. 30 Jun, 2023 2 commits
    • MDEV-31559 btr_search_hash_table_validate() does not check if CHECK TABLE is killed · 3d901438
      Marko Mäkelä authored
      btr_search_hash_table_validate(), btr_search_validate(): Add the
      parameter THD for checking if the statement has been killed.
      Any non-QUICK CHECK TABLE will validate the entire adaptive hash index
      for all InnoDB tables, which may be extremely slow when running
      multiple concurrent CHECK TABLE.
      3d901438
    • MDEV-30639 Upgrade to 10.8 and later does not work on Windows · 6d911219
      Oleg Smirnov authored
      During the upgrade procedure on Windows mysqld.exe is started with
      the named pipe connection protocol. mysqladmin.exe then pings the
      server to check if it is up and running. The command line looks like:
         mysqladmin.exe --protocol=pipe --socket=mysql_upgrade_service_xxx ping
      But the "socket" parameter resets the "protocol", which was previously
      initialized with the "pipe" value, setting it to "socket".
      As a result, the connection cannot be established and the upgrade
      procedure fails.
      On Windows, "socket" is used to pass the name of the pipe, so resetting
      the protocol is not valid in this case.
      
      This commit fixes the resetting of the "protocol" parameter by the
      "socket" parameter in the case when the protocol has previously been
      initialized to the "pipe" value.
      6d911219
  14. 29 Jun, 2023 2 commits
    • MDEV-31578 DECLARE CURSOR: "Memory not freed: 280 bytes lost" on syntax error · fdab2c4c
      Alexander Barkov authored
      When CURSOR parameters get parsed, their sp_assignment_lex instances
      (one instance per parameter) get collected to List<sp_assignment_lex>.
      
      These instances get linked to sphead only in the end of the list.
      If a syntax error happened in the middle of the parameter list,
      these instances were not deleted, which caused memory leaks.
      
      Fix:
      
      Use a Bison %destructor to free the semantic values of rules of the
      <sp_assignment_lex_list> type on syntax errors.
      
      After the fix, these sp_assignment_lex instances from CURSOR parameters
      are deleted as follows:
      
      - If the CURSOR statement was fully parsed, then these instances
        get properly linked to sp_head structures, so they are deleted
        during ~sp_head (this did not change)
      
      - If the CURSOR statement failed on a syntax error, then by Bison's
        %destructor (this is being added in the current patch).
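      The %destructor mechanism attaches cleanup code to a semantic value
      type, so that values Bison discards during error recovery are still
      freed. An illustrative grammar fragment (simplified names, not the
      exact sql_yacc.yy rules):

```yacc
/* Illustrative only: when a syntax error makes Bison discard a
   <sp_assignment_lex_list> value before it was linked to sp_head,
   this destructor runs and frees it. */
%destructor { delete $$; } <sp_assignment_lex_list>
```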
      fdab2c4c
    • MDEV-30680 Warning: Memory not freed: 280 on mangled query, LeakSanitizer: detected memory leaks · 0d3720c1
      Alexander Barkov authored
      The parser works as follows:
      
      The rule expr_lex returns a pointer to a newly created sp_expr_lex
      instance which is not linked to any MariaDB structures yet - it is
      pointed only from a Bison stack variable. The sp_expr_lex instance
      gets linked to other structures (such as sp_instr_jump_if_not) later,
      after scanning some following grammar.
      
      Problem before the fix:
      If a parse error happened immediately after expr_lex (before it got linked),
      the created sp_expr_lex value got lost causing a memory leak.
      
      Fix:
      
      - Using Bison's "destructor" directive to free the results of expr_lex
        on parse/oom errors.
      
      - Moving the call for LEX::cleanup_lex_after_parse_error() from
        MYSQL_YYABORT and yyerror inside parse_sql().
        This is needed because Bison calls destructors after yyerror(),
        while it's important to delete the sp_expr_lex instance before
        LEX::cleanup_lex_after_parse_error().
        The latter frees the memory root containing the sp_expr_lex instance.
      
        After this change the code blocks are executed in the following order:
      
        - yyerror() -- now only raises the error to DA (no cleanup done any more)
        - %destructor { delete $$; } <expr_lex>  -- destructs the sp_expr_lex instance
        - LEX::cleanup_lex_after_parse_error()   -- frees the memory root containing
                                                    the sp_expr_lex instance
      
      - Removing the "delete sublex" related code from restore_lex():
        - restore_lex() is called in most cases on success, when delete is not needed.
        - There is one place when restore_lex() is called on error:
          In sp_create_assignment_instr(). But in this case LEX::sp_lex_in_use
          is true anyway.
          The patch adds a new DBUG_ASSERT(lex->sp_lex_in_use) to guard this.
      0d3720c1
  15. 28 Jun, 2023 4 commits
    • Fix WITH_UBSAN GCC -Wconversion · 33877cfe
      Marko Mäkelä authored
      33877cfe
    • MDEV-31570 gap_lock_split.test hangs sporadically · 3e89b4fc
      Vlad Lesin authored
      The fix replaces waiting for the whole purge to finish with waiting
      only for the purge of delete-marked records to finish.
      
      Reviewed by: Marko Mäkelä
      3e89b4fc
    • MDEV-30648 btr_estimate_n_rows_in_range() accesses unfixed, unlatched page · 687fd6be
      Vlad Lesin authored
      The issue is caused by the MDEV-30400 fix.
      
      There are two cursors in btr_estimate_n_rows_in_range() - p1 and p2, but
      both share the same mtr. Each cursor contains an mtr savepoint for the
      previously fetched block, used to release it when the current block is
      fetched.
      
      Before MDEV-30400 the block was released with
      mtr_t::release_block_at_savepoint(); it just unfixed the block and
      released its page latch. In MDEV-30400 it was replaced with
      mtr_t::rollback_to_savepoint(), which does the same as the former
      mtr_t::release_block_at_savepoint(ulint begin, ulint end) but also
      erases the corresponding slots from the mtr memo, which invalidates any
      stored mtr memo savepoints greater than or equal to "begin".
      
      The idea of the fix is to get rid of savepoints at all in
      btr_estimate_n_rows_in_range() and
      btr_estimate_n_rows_in_range_on_level(). As
      mtr_t::rollback_to_savepoint() erases elements from mtr_t::m_memo, we
      know which element of mtr_t::m_memo can be deleted in each case, so
      there is no need to store savepoints.
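      The savepoint-invalidation hazard is easy to reproduce with any
      index-based bookkeeping over a shrinking container. A generic
      illustration (not mtr_t itself):

```cpp
#include <cstddef>
#include <vector>

// A "savepoint" here is just an index into a memo vector.  Erasing
// earlier elements shifts later ones down, so a stored index may now
// name a different element -- the bug class avoided by dropping
// savepoints in btr_estimate_n_rows_in_range().
inline int element_at_savepoint(const std::vector<int> &memo,
                                std::size_t savepoint) {
  return memo[savepoint];
}
```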
      
      See also the following slides for details:
      https://docs.google.com/presentation/d/1RFYBo7EUhM22ab3GOYctv3j_3yC0vHtBY9auObZec8U
      
      Reviewed by: Marko Mäkelä
      687fd6be
    • MDEV-30542 Fixing spider/bugfix.self_reference_multi · 24faa5de
      Yuchen Pei authored
      The server needs to have a unique name
      24faa5de
  16. 27 Jun, 2023 2 commits
  17. 26 Jun, 2023 1 commit