1. 03 Dec, 2020 12 commits
    • Marko Mäkelä's avatar
      MDEV-24142: Avoid block_lock alignment loss on 64-bit systems · e9f33b77
      Marko Mäkelä authored
      sux_lock::recursive: Move right after the 32-bit sux_lock::lock.
      This will reduce sizeof(block_lock) from 24 to 16 bytes on
      64-bit systems with CMAKE_BUILD_TYPE=RelWithDebInfo. This may be
      significant, because there will be one buf_block_t::lock for each
      buffer pool page descriptor.
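
The padding effect described above can be illustrated with a minimal sketch. The two structs below are hypothetical stand-ins, not the actual sux_lock or block_lock definitions; they only assume a 32-bit lock word, a 32-bit recursion counter, and a pointer-sized owner field (the names layout_before, layout_after, and bytes_saved are invented for this illustration).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not the InnoDB definitions.
struct layout_before
{
  std::uint32_t lock;      // 32-bit lock word
  void *writer;            // pointer-sized owner field; 8-byte aligned on LP64
  std::uint32_t recursive; // recursion counter, followed by tail padding
};

struct layout_after
{
  std::uint32_t lock;      // 32-bit lock word
  std::uint32_t recursive; // occupies what used to be padding after the lock
  void *writer;            // already on an 8-byte boundary, no padding needed
};

// Moving the counter next to the lock word can only remove padding.
static_assert(sizeof(layout_after) <= sizeof(layout_before),
              "reordering must not grow the struct");

// Bytes saved per object by the reordering on the current platform.
constexpr std::size_t bytes_saved()
{
  return sizeof(layout_before) - sizeof(layout_after);
}
```

On a typical 64-bit ABI the first layout occupies 24 bytes and the second 16, which matches the reduction reported for block_lock; on a 32-bit ABI both layouts are likely to have the same size.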
      
      We still have some potential for savings, with sizeof(buf_page_t)==112
      and sizeof(buf_block_t)==184 on a GNU/Linux AMD64 system.
      
      Note: On GNU/Linux AMD64, sizeof(index_lock) remains 32 bytes
      (16 with PLUGIN_PERFSCHEMA=NO) even though it would fit in 24 bytes.
      This is because sizeof(srw_lock) includes 4 bytes of padding
      (to 16 bytes) that index_lock_t::recursive cannot reuse. So,
      in total 4+4 bytes will be lost to padding. This is rather
      insignificant compared to sizeof(dict_index_t)==400.
      e9f33b77
    • Marko Mäkelä's avatar
      MDEV-24142: Remove INFORMATION_SCHEMA.INNODB_MUTEXES · ba2d45dc
      Marko Mäkelä authored
      Let us remove sux_lock::waits and the associated bookkeeping.
      Starting with commit 1669c889
      the PERFORMANCE_SCHEMA instrumentation interface is keeping
      track of lock waits.
      
      The view INFORMATION_SCHEMA.INNODB_MUTEXES only exported counts
      of rw-lock waits.
      
      Also, SHOW ENGINE INNODB MUTEX will no longer export any information
      about rw-locks.
      ba2d45dc
    • Marko Mäkelä's avatar
      MDEV-24142: Remove the LatchDebug interface to rw-locks · ac028ec5
      Marko Mäkelä authored
      The latching order checks for rw-locks have not caught many bugs
      in the past few years and they are greatly complicating the code.
      
      Last time the debug checks were useful was in
      commit 59caf2c3 (MDEV-13485).
      
      The B-tree hang MDEV-14637 was not caught by LatchDebug,
      because the granularity of the checks is not sufficient
      to distinguish the levels of non-leaf B-tree pages.
      
      The interface was already made dead code by the grandparent
      commit 03ca6495.
      ac028ec5
    • Marko Mäkelä's avatar
      MDEV-24308: Windows improvements · 06efef4b
      Marko Mäkelä authored
      This reverts commit e34e53b5
      and defines os_thread_sleep() as a macro on Windows.
      06efef4b
    • Marko Mäkelä's avatar
      MDEV-24142: Replace InnoDB rw_lock_t with sux_lock · 03ca6495
      Marko Mäkelä authored
      InnoDB buffer pool block and index tree latches depend on a
      special kind of read-update-write lock that allows reentrant
      (recursive) acquisition of the 'update' and 'write' locks
      as well as an upgrade from 'update' lock to 'write' lock.
      The 'update' lock allows any number of reader locks from
      other threads, but no concurrent 'update' or 'write' lock.
      
      If there were no requirement to support an upgrade from 'update'
      to 'write', we could compose the lock out of two srw_lock
      (implemented as any type of native rw-lock, such as SRWLOCK on
      Microsoft Windows). Removing this requirement is very difficult,
      so in commit f7e7f487d4b06695f91f6fbeb0396b9d87fc7bbf we
      implemented an 'update' mode to our srw_lock.
      
      Re-entrant or recursive locking is mostly needed when writing or
      freeing BLOB pages, but also in crash recovery or when merging
      buffered changes to an index page. The re-entrancy allows us to
      attach a previously acquired page to a sub-mini-transaction that
      will be committed before whatever else is holding the page latch.
      
      The SUX lock supports Shared ('read'), Update, and eXclusive ('write')
      locking modes. The S latches are not re-entrant, but a single S latch
      may be acquired even if the thread already holds an U latch.
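
The compatibility rules between the three modes can be summarized in a small sketch. This is not the sux_lock implementation; latch_mode and compatible() are hypothetical names, and the sketch assumes that X excludes every other mode, as "eXclusive" suggests.

```cpp
#include <cassert>

// Hypothetical names for illustration; not part of the InnoDB sources.
enum class latch_mode { S, U, X };

// Can `requested` be granted to another thread while `held` is held?
constexpr bool compatible(latch_mode held, latch_mode requested)
{
  // Assumption: X conflicts with every mode, including itself.
  return (held == latch_mode::X || requested == latch_mode::X)
    ? false
    // S coexists with S and U; two U latches conflict with each other.
    : (held == latch_mode::S || requested == latch_mode::S);
}
```

For example, compatible(latch_mode::U, latch_mode::S) is true, while compatible(latch_mode::U, latch_mode::U) is false.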
      
      The idea of the U latch is to allow a write of something that concurrent
      readers do not care about (such as the contents of BTR_SEG_LEAF,
      BTR_SEG_TOP and other page allocation metadata structures, or
      the MDEV-6076 PAGE_ROOT_AUTO_INC). (The PAGE_ROOT_AUTO_INC field
      is only updated when a dict_table_t for the table exists, and only
      read when a dict_table_t for the table is being added to dict_sys.)
      
      block_lock::u_lock_try(bool for_io=true) is used in buf_flush_page()
      to allow concurrent readers but no concurrent modifications while the
      page is being written to the data file. That latch will be released
      by buf_page_write_complete() in a different thread. Hence, we use
      the special lock owner value FOR_IO.
      
      The index_lock::u_lock() improves concurrency on operations that
      involve non-leaf index pages.
      
      The interface has been cleaned up a little. We will use
      x_lock_recursive() instead of x_lock() when we know that a
      lock is already held by the current thread. Similarly,
      a lock upgrade from U to X is only allowed via u_x_upgrade()
      or x_lock_upgraded() but not via x_lock().
      
      We will disable the LatchDebug and sync_array interfaces to
      InnoDB rw-locks.
      
      The SEMAPHORES section of SHOW ENGINE INNODB STATUS output
      will no longer include any information about InnoDB rw-locks,
      only TTASEventMutex (cmake -DMUTEXTYPE=event) waits.
      This will make a part of the 'innotop' script dead code.
      
      The block_lock buf_block_t::lock will not be covered by any
      PERFORMANCE_SCHEMA instrumentation.
      
      SHOW ENGINE INNODB MUTEX and INFORMATION_SCHEMA.INNODB_MUTEXES
      will no longer output source code file names or line numbers.
      The dict_index_t::lock will be identified by index and table names,
      which should be much more useful. PERFORMANCE_SCHEMA is lumping
      information about all dict_index_t::lock together as
      event_name='wait/synch/sxlock/innodb/index_tree_rw_lock'.
      
      buf_page_free(): Remove the file,line parameters. The sux_lock will
      not store such diagnostic information.
      
      buf_block_dbg_add_level(): Define as empty macro, to be removed
      in a subsequent commit.
      
      Unless the build was configured with cmake -DPLUGIN_PERFSCHEMA=NO
      the index_lock dict_index_t::lock will be instrumented via
      PERFORMANCE_SCHEMA. Similar to
      commit 1669c889
      we will distinguish lock waits by registering shared_lock,exclusive_lock
      events instead of try_shared_lock,try_exclusive_lock.
      Actual 'try' operations will not be instrumented at all.
      
      rw_lock_list: Remove. After MDEV-24167, this only covered
      buf_block_t::lock and dict_index_t::lock. We will output their
      information by traversing buf_pool or dict_sys.
      03ca6495
    • Marko Mäkelä's avatar
      MDEV-24142 preparation: Add srw_mutex and srw_lock::u_lock() · d46b4248
      Marko Mäkelä authored
      The PERFORMANCE_SCHEMA insists on distinguishing read-update-write
      locks from read-write locks, so we must add
      template<bool support_u_lock> in rd_lock() and wr_lock() operations.
      
      rd_lock::read_trylock(): Add template<bool prioritize_updater=false>
      which is used by the srw_lock_low::read_lock() loop. As long as
      an UPDATE lock has already been granted to some thread, we will grant
      subsequent READ lock requests even if a waiting WRITE lock request
      exists. This is necessary for compatibility with the existing usage
      pattern of InnoDB rw_lock_t where the holder of an SX-latch (which we
      will rename to UPDATE latch) may acquire an additional S-latch
      on the same object. For normal read-write locks without update operations
      this should make no difference at all, because the rw_lock::UPDATER
      flag would never be set.
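
The prioritization rule can be sketched on a single atomic lock word. The bit layout, the class name rw_word_sketch, and the readers() helper below are assumptions made for this illustration; the real rw_lock may differ in all of them.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch; bit values and names are assumptions.
class rw_word_sketch
{
public:
  enum : std::uint32_t
  {
    WRITER = 1U << 31,         // a write lock has been granted
    WRITER_WAITING = 1U << 30, // a write lock request is waiting
    UPDATER = 1U << 29         // an update lock has been granted
  };

  explicit rw_word_sketch(std::uint32_t initial = 0) : word(initial) {}

  // Try to register one more reader without blocking.
  template<bool prioritize_updater = false>
  bool read_trylock()
  {
    std::uint32_t l = word.load(std::memory_order_relaxed);
    for (;;)
    {
      if (l & WRITER)
        return false; // never share with a granted write lock
      // A waiting writer normally blocks new readers, unless we were asked
      // to favor the case where an update lock is already granted.
      if ((l & WRITER_WAITING) && !(prioritize_updater && (l & UPDATER)))
        return false;
      if (word.compare_exchange_weak(l, l + 1, std::memory_order_acquire,
                                     std::memory_order_relaxed))
        return true;
      // l now holds the current value; re-evaluate the conditions.
    }
  }

  // Number of readers currently registered in the low bits.
  std::uint32_t readers() const
  {
    return word.load(std::memory_order_relaxed) & (UPDATER - 1);
  }

private:
  std::atomic<std::uint32_t> word;
};
```

With the word set to UPDATER | WRITER_WAITING, read_trylock<true>() succeeds while read_trylock<false>() fails. When UPDATER is never set, both instantiations behave identically.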
      d46b4248
    • Marko Mäkelä's avatar
      MDEV-24167: Stabilize perfschema.sxlock_func · 3872e585
      Marko Mäkelä authored
      The extension of the test perfschema.sxlock_func in
      commit 1669c889
      turned out to be unstable.
      
      Let us filter out purge_sys.latch (trx_purge_latch) from the output,
      because it might happen that the purge tasks will not be executed
      during the test execution.
      3872e585
    • Marko Mäkelä's avatar
      MDEV-24167 fixup: Improve the PERFORMANCE_SCHEMA instrumentation · 1669c889
      Marko Mäkelä authored
      Let us try to avoid code bloat for the common case that
      performance_schema is disabled at runtime, and use
      ATTRIBUTE_NOINLINE member functions for instrumented latch acquisition.
      
      Also, let us distinguish lock waits from non-contended lock requests
      by using write_lock,read_lock for the requests that lead to waits,
      and try_write_lock,try_read_lock for the wait-free lock acquisitions.
      Actual 'try' operations are not being instrumented at all.
      1669c889
    • Marko Mäkelä's avatar
      MDEV-24167 fixup: Avoid hangs in SRW_LOCK_DUMMY · 260161fc
      Marko Mäkelä authored
      In commit 1fdc161d we introduced
      a mutex-and-condition-variable based fallback implementation
      for platforms that lack a futex system call. That implementation
      is prone to hangs.
      
      Let us use separate condition variables for shared and exclusive requests.
      260161fc
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · a13fac9e
      Marko Mäkelä authored
      a13fac9e
    • Marko Mäkelä's avatar
      MDEV-22929 fixup: root_name() clash with clang++ <fstream> · f146969f
      Marko Mäkelä authored
      The clang++ -stdlib=libc++ header file <fstream> depends on
      <filesystem> that defines a member function path::root_name(),
      which conflicts with the rather unused #define root_name()
      that had been introduced in
      commit 7c58e97b.
      
      Because an instrumented -stdlib=libc++ (rather than the default
      -stdlib=libstdc++) is easier to build for a working -fsanitize=memory
      (cmake -DWITH_MSAN=ON), let us remove the conflicting #define for now.
      f146969f
  2. 02 Dec, 2020 5 commits
  3. 01 Dec, 2020 8 commits
    • Marko Mäkelä's avatar
      Merge 10.3 into 10.4 · 589cf8db
      Marko Mäkelä authored
      589cf8db
    • Vlad Lesin's avatar
      MDEV-22929 MariaBackup option to report and/or continue when corruption is encountered · e30a05f4
      Vlad Lesin authored
      Post-push Windows compilation errors fix.
      e30a05f4
    • Marko Mäkelä's avatar
      e28d9c15
    • Monty's avatar
      After merge fixes · 7edfed63
      Monty authored
      Change thd->mdl_context.release_transactional_locks() to
      thd->mdl_release_transactional_locks()
      7edfed63
    • Marko Mäkelä's avatar
      MDEV-24323 Crash on recovery after kill during instant ADD COLUMN · 73f34336
      Marko Mäkelä authored
      row_undo_ins_parse_undo_rec(): Do not try to read non-existing
      virtual column information for the metadata record.
      73f34336
    • Marko Mäkelä's avatar
      Merge 10.2 into 10.3 · 81ab9ea6
      Marko Mäkelä authored
      81ab9ea6
    • Marko Mäkelä's avatar
      MDEV-21962 fixup: Remove buf_pool_contains_zip() · e76e1288
      Marko Mäkelä authored
      The replacement is buf_pool.contains_zip().
      e76e1288
    • Vlad Lesin's avatar
      MDEV-22929 MariaBackup option to report and/or continue when corruption is encountered · e6b3e38d
      Vlad Lesin authored
      The new option --log-innodb-page-corruption is introduced.
      
      When this option is set, the backup is not interrupted if a corrupted
      InnoDB page is detected. Instead, all corrupted pages found are logged in
      the innodb_corrupted_pages file in the backup directory, and the backup
      finishes with an error.
      
      For an incremental backup, corrupted pages are also copied to the .delta
      file, because we can't do an LSN check for such pages during backup.
      The innodb_corrupted_pages file will also be created in the incremental
      backup directory.
      
      During --prepare, the list of corrupted pages is read from the file just
      after the redo log is applied, and each page on the list is checked for
      whether it is allocated in its tablespace. If it is not allocated, it is
      zeroed out, flushed to the tablespace, and removed from the list. If all
      pages are removed from the list, --prepare finishes successfully and the
      innodb_corrupted_pages file is removed from the backup directory.
      Otherwise, --prepare finishes with an error message, and
      innodb_corrupted_pages contains the list of pages that were detected as
      corrupted during backup and are allocated in their tablespaces. This
      means that the backup directory contains corrupted InnoDB pages and the
      backup cannot be considered consistent.
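
The --prepare decision described above can be sketched as follows. This is a simplified illustration, not the MariaBackup code: page_id_sketch, resolve_corrupted_pages(), and the two callbacks are hypothetical names, and the file handling is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical page identifier; not the InnoDB page_id_t.
struct page_id_sketch
{
  std::uint32_t space;
  std::uint32_t page_no;
};

// Processes the corrupted pages list after the redo log has been applied.
// Returns true if the list became empty, that is, --prepare may succeed.
inline bool resolve_corrupted_pages(
  std::vector<page_id_sketch> &corrupted,
  const std::function<bool(const page_id_sketch&)> &is_allocated,
  const std::function<void(const page_id_sketch&)> &zero_and_flush)
{
  std::vector<page_id_sketch> remaining;
  for (const page_id_sketch &page : corrupted)
  {
    if (is_allocated(page))
      remaining.push_back(page); // in use and corrupted: stays on the list
    else
      zero_and_flush(page);      // unused page: overwrite it and drop it
  }
  corrupted.swap(remaining);
  return corrupted.empty();
}
```

A caller would remove the innodb_corrupted_pages file when the function returns true, and otherwise rewrite the file with the remaining entries and report an error.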
      
      For incremental --prepare, corrupted pages from .delta files are applied
      to the base backup, innodb_corrupted_pages is read from both the base and
      incremental directories, and the corrupted pages list is processed in
      the same way as for a full --prepare. The innodb_corrupted_pages file is
      modified or removed only in the base directory.
      
      If DDL happens during backup, it is also processed at the end of backup
      to have correct tablespace names in innodb_corrupted_pages.
      e6b3e38d
  4. 30 Nov, 2020 15 commits
    • Monty's avatar
      MDEV-15532 Assertion `!log->same_pk' failed in row_log_table_apply_delete · 828471cb
      Monty authored
      The reason for the failure is that
      thd->mdl_context.release_transactional_locks()
      was called after commit & rollback even in cases where the current
      transaction is still active.
      
      For 10.2, 10.3 and 10.4 the fix is simple:
      - Replace all calls to thd->mdl_context.release_transactional_locks() with
        thd->release_transactional_locks(). The thd function will only call
        the mdl_context function if there are no active transactional locks.
        In 10.6 we will have a better fix, where we will change the return
        value of some trans_xxx() functions to indicate whether the call
        closed the transaction or not. This will avoid the need for the
        indirect call.
      
      Other things:
      - trans_xa_commit() and trans_xa_rollback() will automatically
        call release_transactional_locks() if the transaction is closed.
      - We can't do that for the other functions, as the callers of many of
        these do additional work (like close_thread_tables) before calling
        release_transactional_locks().
      - Added missing abort_result_set() and missing DBUG_RETURN in
        select_create::send_eof()
      - Fixed wrong indentation in injector::transaction::commit()
      828471cb
    • Monty's avatar
      Fixed maria.create test · c5375764
      Monty authored
      c5375764
    • Monty's avatar
      MDEV-15532 Assertion `!log->same_pk' failed in row_log_table_apply_delete · a3531775
      Monty authored
      The real fix for MDEV-15532 will be pushed into 10.2 and 10.6.
      This is an additional fix for 10.4.
      
      In 10.4 trans_xa_detach() was introduced. However, THD::cleanup() assumes
      that after trans_xa_detach() is done, there are no registered transactions
      anymore. In the 10.2 patch there will be an assert to ensure this, which
      will cause 10.4 to fail.
      
      The fix used is to reset the transaction flags in trans_xa_detach().
      a3531775
    • Monty's avatar
      Fixed maria.create test · 6261b1f4
      Monty authored
      6261b1f4
    • Vladislav Vaintroub's avatar
      Clarify some comments. · 1435f35b
      Vladislav Vaintroub authored
      - The intention of the my_getevents syscall is now better explained:
      we use it to be able to interrupt the io_getevents syscall via
      io_destroy().
      
      - Fix the comment for MAX_EVENTS in getevent_thread_routine.
      MAX_EVENTS is a more or less arbitrary constant, chosen such that the
      events array is big enough to get multiple simultaneous I/O completions,
      but small enough that it does not blow the thread's stack.
      1435f35b
    • Vladislav Vaintroub's avatar
      MDEV-24295 Reduce wakeups by tpool maintenance timer, when server is idle · 5bb5d4ad
      Vladislav Vaintroub authored
      If the maintenance timer does not do much for a prolonged time, it will
      wake up less frequently: once every 4 seconds instead of once every
      0.4 seconds.
      
      It will wake up more often if thread creation is throttled, to avoid stalls.
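
The adaptive period can be sketched as a small state machine. Only the two periods (0.4 and 4 seconds) come from the description above; the idle threshold, the struct maintenance_state_sketch, and the function next_period() are assumptions made for this illustration and do not reflect the tpool code.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

constexpr milliseconds ACTIVE_PERIOD{400};    // 0.4 s, as stated above
constexpr milliseconds IDLE_PERIOD{4000};     // 4 s, as stated above
constexpr unsigned IDLE_TICKS_THRESHOLD = 10; // assumption: "prolonged time"

// Hypothetical bookkeeping for the timer; not the tpool data structure.
struct maintenance_state_sketch
{
  unsigned idle_ticks = 0; // consecutive ticks in which nothing was done
};

// Chooses the delay until the next maintenance tick.
inline milliseconds next_period(maintenance_state_sketch &st,
                                bool did_work, bool creation_throttled)
{
  if (did_work || creation_throttled)
    st.idle_ticks = 0; // stay responsive to avoid stalls
  else if (st.idle_ticks < IDLE_TICKS_THRESHOLD)
    ++st.idle_ticks;
  return st.idle_ticks >= IDLE_TICKS_THRESHOLD ? IDLE_PERIOD : ACTIVE_PERIOD;
}
```

In this sketch a single busy or throttled tick is enough to return to the short period, which favors responsiveness over the number of saved wakeups.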
      5bb5d4ad
    • Monty's avatar
    • Sergei Petrunia's avatar
      11196347
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · cde525f9
      Marko Mäkelä authored
      cde525f9
    • Marko Mäkelä's avatar
      MDEV-24308: Revert for Windows · e34e53b5
      Marko Mäkelä authored
      For some reason, InnoDB debug tests on Windows fail due to rw_lock_t
      if the function call overhead for some os_thread_ code is removed.
      
      This change worked fine on Windows in combination with MDEV-24142.
      e34e53b5
    • Varun Gupta's avatar
      MDEV-21265: IN predicate conversion to IN subquery should be allowed for a... · b4379df5
      Varun Gupta authored
      MDEV-21265: IN predicate conversion to IN subquery should be allowed for a broader set of datatype comparison
      
      Allow the materialization strategy when the collations on the
      inner and outer sides of an IN subquery are the same and the
      character set of the inner side is a proper subset of the character
      set on the outer side.
      This allows conversion from utf8mb3 to utf8mb4,
      as the former is a subset of the latter.
      This is only allowed when an IN predicate is converted to an IN subquery.
      
      Backported part of the patch (d6a00d9b) of MDEV-17905.
      b4379df5
    • Marko Mäkelä's avatar
      fc6a7e90
    • Marko Mäkelä's avatar
      MDEV-24167 fixup: Always derive srw_lock from rw_lock · 1fdc161d
      Marko Mäkelä authored
      Let us always base srw_lock on our own std::atomic<uint32_t>
      based rw_lock. In this way, we can extend the locks in a portable
      way across all platforms.
      
      We will use futex system calls where available:
      Linux, OpenBSD, and Microsoft Windows.
      
      Elsewhere, we will emulate futex with a mutex and a condition variable.
      
      Thanks to Daniel Black for testing this on OpenBSD.
      1fdc161d
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · 565b0dd1
      Marko Mäkelä authored
      565b0dd1
    • Marko Mäkelä's avatar
      MDEV-24308: Remove some os_thread_ functions · 8fa6e363
      Marko Mäkelä authored
      os_thread_pf(): Remove.
      
      os_thread_eq(), os_thread_yield(), os_thread_get_curr_id():
      Define as macros.
      
      ut_print_timestamp(), ut_sprintf_timestamp(): Simplify.
      8fa6e363