1. 03 Aug, 2024 1 commit
    • MDEV-34696: do_gco_wait() completes too early on InnoDB dict stats updates · d08e519f
      Kristian Nielsen authored
      Before doing mark_start_commit(), check that there is no pending deadlock
      kill. If there is a pending kill, we won't commit (we will abort, roll back,
      and retry). Then we should not mark the commit as started, since that could
      potentially make the following GCO start too early, before we completed the
      commit after the retry.
      
      This condition could trigger in some corner cases, where InnoDB would
      temporarily take table/row locks that are released again immediately, not held
      until the transaction commits. This happens with dict_stats updates and
      possibly auto-increment locks.
      
      Such locks can be passed to thd_rpl_deadlock_check() and cause a deadlock
      kill to be scheduled in the background. But since the blocking locks are
      held only temporarily, they can be released before the background kill
      happens. This way, the kill can be delayed until after mark_start_commit()
      has been called. Thus we need to check the synchronous indication
      rgi->killed_for_retry, not just the asynchronous thd->killed.
      Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
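The ordering described in this commit message can be illustrated with a minimal sketch. The types `thd_sketch` and `rgi_sketch` and the function `mark_start_commit_if_safe()` are hypothetical stand-ins and are not the server's actual code; only the field names `killed` and `killed_for_retry` are taken from the message. The point is that the synchronous flag is consulted in addition to the asynchronous one before the commit is marked as started.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, heavily simplified stand-ins for THD and rpl_group_info.
struct thd_sketch {
  // Set asynchronously, whenever the background kill actually runs.
  std::atomic<bool> killed{false};
};

struct rgi_sketch {
  thd_sketch *thd = nullptr;
  // Set synchronously, at the moment a deadlock kill is scheduled.
  bool killed_for_retry = false;
  // Stands in for the effect of mark_start_commit() (an assumption).
  bool commit_marked_started = false;
};

// Returns true only if the commit was marked as started.
inline bool mark_start_commit_if_safe(rgi_sketch &rgi)
{
  assert(rgi.thd != nullptr);
  // A kill may be scheduled but not yet delivered, so thd->killed alone
  // is not sufficient; the synchronous flag must be checked as well.
  if (rgi.killed_for_retry || rgi.thd->killed.load())
    return false;  // the transaction will roll back and retry
  rgi.commit_marked_started = true;
  return true;
}
```

In this sketch, a transaction with a pending retry never marks its commit as started, so a following group cannot be released on its behalf.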
  2. 01 Aug, 2024 1 commit
    • fixup! cbd77f86 · ed342c0f
      Marko Mäkelä authored
      Some cleanup, and try to diagnose ERROR_SHARING_VIOLATION
      in the innodb.log_file_size_online test
  3. 31 Jul, 2024 3 commits
    • fixup! ea874982 · cbd77f86
      Marko Mäkelä authored
      trx_flush_log_if_needed(): Do not pass a callback to non-durable
      memory-mapped write.
    • MDEV-34062: Implement innodb_log_file_mmap on 64-bit systems · ea874982
      Marko Mäkelä authored
      When using the default innodb_log_buffer_size=2m, mariadb-backup --backup
      would spend a lot of time re-reading and re-parsing the log. For reads,
      it would be beneficial to memory-map the entire ib_logfile0 to the
      address space (typically 48 bits or 256 TiB) and read it from there,
      both during --backup and --prepare.
      
      We will introduce the Boolean SET GLOBAL innodb_log_file_mmap
      that will be OFF by default, but ON by default in mariadb-backup
      when not running on Microsoft Windows. On Microsoft Windows,
      setting innodb_log_file_mmap=ON may cause aggressive read-ahead of
      the entire ib_logfile0 even when only a tiny portion would be accessed.
      On Linux, the read-ahead would have to be explicitly enabled by
      specifying MAP_POPULATE to mmap(2). Because we avoid MAP_POPULATE,
      the file will be read on demand.
      
      The setting innodb_log_file_mmap=ON is available also in the server.
      This could speed up I/O and allow the log data to be shared between
      mariadbd and mariadb-backup --backup in the RAM buffer.
      Note: It might not be advisable to enable memory-mapped log writes
      while backup is not running. It could make sense with a small
      innodb_log_file_size that fits in RAM.
      
      Memory-mapped regular files differ from log_sys.is_pmem() in the way
      that an open file handle to ib_logfile0 will be retained. That allows
      log_t::set_mmap() to enable or disable the interface with fewer
      operations.
      
      On log checkpoint we will invoke madvise() with MADV_DONTNEED in order
      to reduce the memory pressure. This could lead to reads of old
      garbage contents of the circular log file when a page fault occurs
      while writing a record. There does not seem to be any way around this;
      on Linux, invoking fallocate() with FALLOC_FL_ZERO_RANGE would make
      things even worse by triggering additional metadata writes.
      
      Most references to HAVE_PMEM or log_sys.is_pmem() are replaced with
      HAVE_INNODB_MMAP or log_sys.is_mmap(). The main difference is that
      PMEM skips the use of write_lock and flush_lock and uses pmem_persist(),
      while the memory-mapped interface will use a combination of msync()
      and fdatasync().
      
      Starting with Linux 2.6.19, msync(MS_ASYNC) is a no-op, so we will not
      invoke it on Linux. For durable writes, we will invoke msync(MS_SYNC).
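As an illustration of the memory-mapped interface described above, here is a minimal POSIX sketch. The names `mapped_log`, `map_log()`, `durable_write()` and `unmap_log()` are hypothetical and do not come from the InnoDB source. The sketch keeps the file descriptor open next to the mapping, omits MAP_POPULATE so that pages are read on demand, and uses msync(MS_SYNC) for a durable write. The real implementation also involves fdatasync(), madvise() and locking, none of which is shown here.

```cpp
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical holder: the file handle is retained alongside the mapping.
struct mapped_log {
  int fd = -1;
  void *base = MAP_FAILED;
  size_t size = 0;
};

// Map an existing, non-empty file for reading and writing.
inline bool map_log(mapped_log &m, const char *path)
{
  m.fd = open(path, O_RDWR);
  if (m.fd < 0)
    return false;
  struct stat st;
  if (fstat(m.fd, &st) != 0 || st.st_size <= 0) {
    close(m.fd);
    m.fd = -1;
    return false;
  }
  m.size = static_cast<size_t>(st.st_size);
  // No MAP_POPULATE: pages are faulted in on first access.
  m.base = mmap(nullptr, m.size, PROT_READ | PROT_WRITE, MAP_SHARED, m.fd, 0);
  if (m.base == MAP_FAILED) {
    close(m.fd);
    m.fd = -1;
    return false;
  }
  return true;
}

// Copy data into the mapping and make it durable with msync(MS_SYNC).
inline bool durable_write(mapped_log &m, size_t offset,
                          const void *data, size_t len)
{
  if (m.base == MAP_FAILED || offset > m.size || len > m.size - offset)
    return false;
  std::memcpy(static_cast<char*>(m.base) + offset, data, len);
  // msync() requires a page-aligned address, so for simplicity this
  // sketch synchronizes the whole mapping rather than the dirty range.
  return msync(m.base, m.size, MS_SYNC) == 0;
}

inline void unmap_log(mapped_log &m)
{
  if (m.base != MAP_FAILED)
    munmap(m.base, m.size);
  if (m.fd >= 0)
    close(m.fd);
  m = mapped_log();
}
```

Synchronizing the whole mapping keeps the example short; a production implementation would more likely restrict the call to the page-aligned range that was actually modified.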
  4. 30 Jul, 2024 2 commits
    • MDEV-34422 Corrupted ib_logfile0 due to uninitialized log_sys.lsn_lock · 1c8af2ae
      Marko Mäkelä authored
      In commit bf0b82d2 (MDEV-33515)
      the function log_t::init_lsn_lock() was removed. This was fine on
      those platforms where InnoDB uses futex-based mutexes (Linux, FreeBSD,
      OpenBSD, NetBSD, DragonflyBSD).
      
      Dave Gosselin debugged this on Apple macOS and submitted a fix where
      pthread_mutex_wrapper::pthread_mutex_wrapper() would invoke init().
      We do not really need that; we only need to invoke lsn_lock.init()
      like we used to do before commit bf0b82d2.
      This should be a no-op for the futex-based mutexes, which intentionally
      rely on zero initialization.
      
      The missing pthread_mutex_init() call would cause race conditions
      and corruption of log_sys.buf because multiple threads could
      apparently hold log_sys.lsn_lock concurrently in
      log_t::append_prepare().  The error would be caught by a debug
      assertion in log_t::write_buf(), or in non-debug builds by the
      fact that the server cannot be restarted due to an apparently
      missing FILE_CHECKPOINT record (because it had been written
      to the wrong offset in log_sys.buf).
      
      The failure in log_t::append_prepare() was caught on Microsoft Windows
      after enabling SUX_LOCK_GENERIC and therefore forcing the use of
      pthread_mutex_wrapper for the log_sys.lsn_lock.  It appears to be fine
      to omit the pthread_mutex_init() call on GNU/Linux.
      
      log_t::create(): Invoke lsn_lock.init().
      
      log_t::close(): Invoke lsn_lock.destroy().
      
      To better catch this kind of issue in the future by simply defining
      SUX_LOCK_GENERIC on any platform, a separate debug instrumentation patch
      will be applied to the 10.6 branch later.
      
      Reviewed by: Debarun Banerjee
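The init/destroy pairing described in this commit message can be sketched as follows. The types `mutex_sketch` and `log_sketch` are hypothetical and only loosely inspired by the names in the message (`lsn_lock`, `create()`, `close()`, `append_prepare()`); the `initialized` flag is an addition for illustration and is not claimed to match the debug instrumentation mentioned above.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical wrapper around a pthread mutex that must be initialized
// explicitly; it does not rely on zero initialization being sufficient.
struct mutex_sketch {
  pthread_mutex_t m;
  bool initialized = false;  // illustration only

  void init()
  {
    int r = pthread_mutex_init(&m, nullptr);
    assert(r == 0);
    (void) r;
    initialized = true;
  }
  void destroy()
  {
    assert(initialized);
    pthread_mutex_destroy(&m);
    initialized = false;
  }
  void lock() { assert(initialized); pthread_mutex_lock(&m); }
  void unlock() { assert(initialized); pthread_mutex_unlock(&m); }
};

// Hypothetical stand-in for the log subsystem object.
struct log_sketch {
  mutex_sketch lsn_lock;
  unsigned long long lsn = 0;

  void create() { lsn_lock.init(); }    // pairs with close()
  void close() { lsn_lock.destroy(); }

  // Reserve `size` bytes of log sequence space under lsn_lock and
  // return the start of the reserved range.
  unsigned long long append_prepare(unsigned size)
  {
    lsn_lock.lock();
    unsigned long long start = lsn;
    lsn += size;
    lsn_lock.unlock();
    return start;
  }
};
```

With the explicit `init()` in `create()`, the same code behaves identically whether the underlying mutex tolerates zero initialization or not.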
    • MDEV-33087 ALTER TABLE...ALGORITHM=COPY should build indexes more efficiently · cc8eefb0
      Thirunarayanan Balathandayuthapani authored
      - During the copy algorithm, InnoDB should use the bulk insert
      operation instead of row-by-row inserts. By doing this, the copy
      algorithm can build indexes more efficiently. This optimization is
      disabled for temporary tables, versioned tables and tables that have
      a foreign key relation.
      
      Introduced the variable innodb_alter_copy_bulk to allow
      the bulk insert operation for copy alter operation
      inside InnoDB. This is enabled by default.
      
      ha_innobase::extra(): HA_EXTRA_END_ALTER_COPY mode tries to apply
      the buffered bulk insert operation and updates the non-persistent
      table stats.
      
      row_merge_bulk_t::write_to_index(): Update stat_n_rows after
      applying the bulk insert operation
      
      row_ins_clust_index_entry_low(): In case of copy algorithm,
      switch to bulk insert operation.
      
      copy_data_error_ignore(): Handles the error while copying
      the data from source to target file.
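The general idea of replacing row-by-row inserts with a buffered bulk operation can be sketched as below. This is an illustration of the technique under stated assumptions and not InnoDB's implementation: `bulk_loader_sketch` is a hypothetical type, a `std::map` stands in for an index tree, and `n_rows` only loosely mirrors the statistics update mentioned for write_to_index().

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using row = std::pair<int, std::string>;

// Hypothetical bulk loader: rows are buffered, sorted once, and then
// applied to the index in key order.
struct bulk_loader_sketch {
  std::vector<row> buffer;            // rows collected during the copy
  std::map<int, std::string> index;   // stand-in for an index tree
  size_t n_rows = 0;                  // refreshed after the bulk apply

  // Called once per copied row; nothing touches the index yet.
  void insert(int key, std::string value)
  {
    buffer.emplace_back(key, std::move(value));
  }

  // Called at the end of the copy to apply all buffered rows.
  void apply()
  {
    std::stable_sort(buffer.begin(), buffer.end(),
                     [](const row &a, const row &b)
                     { return a.first < b.first; });
    // Rows arrive in ascending key order, so each one is appended at
    // the end. Rows with an already existing key are ignored here.
    for (row &r : buffer)
      index.emplace_hint(index.end(), r.first, std::move(r.second));
    n_rows = index.size();
    buffer.clear();
  }
};
```

Deferring the index work to a single sorted pass is what makes the bulk path cheaper than inserting each row at an arbitrary position as it arrives.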
  5. 24 Jul, 2024 1 commit
  6. 22 Jul, 2024 2 commits
  7. 20 Jul, 2024 1 commit
  8. 19 Jul, 2024 4 commits
    • MDEV-15393 gtid_slave_pos duplicate key errors after mysqldump restore · b8f92ade
      Andrei authored
      When mysqldump is run to dump the `mysql` system database, it generates
      INSERT statements into the table `mysql.gtid_slave_pos`.
      After running the backup script,
      those inserts did not produce the expected gtid state on the slave. In
      particular, the maximum of mysql.gtid_slave_pos.sub_id did not make it
      into
         rpl_global_gtid_slave_state.last_sub_id
      
      an in-memory object that is supposed to match the current state of the
      table. That was regardless of whether the --gtid option was specified
      or not. Later, when the backup recipient server starts as a slave
      in *non-gtid* mode, this desynchronization may lead to a duplicate key
      error.
      
      This effect is corrected for --gtid mode mysqldump/mariadb-dump only,
      as follows. The fix ensures that the insert block of the dump
      script is followed by a "summing-up" SET @@global.gtid_slave_pos
      assignment.
      
      For the implementation part, note that a deferred print-out of
      SET-gtid_slave_pos and associated comments is preferred over relocating
      the entire blocks if (opt_master,slave_data &&
      do_show_master,slave_status) ...  because of compatibility
      concerns. Namely, an error inside do_show_*() is handled in the new code
      the same way, and as early, as before.
      
      A regression test can be run in how-to-reproduce mode as well.
      One affected mtr test was observed:
      the rpl_mysqldump_slave.result "mismatch" now shows the new policy of
      deferring the print of SET-gtid_slave_pos in action.
    • new libfmt 11.0.1 · 0f6f1114
      Oleksandr Byelkin authored
    • New CC 3.3 · a94fd874
      Oleksandr Byelkin authored
    • Fix view protocol · b8b6cab2
      Oleksandr Byelkin authored
  9. 18 Jul, 2024 2 commits
  10. 17 Jul, 2024 23 commits