1. 27 Sep, 2024 3 commits
      Cleanup: Replace some is_predefined_tablespace() · 5d9f04d4
      Marko Mäkelä authored
      In some places, there were redundant comparisons against TRX_SYS_SPACE
      or SRV_TMP_SPACE_ID. The temporary tablespace is never the subject of
      log-based recovery.
      MDEV-34850: Busy work while parsing FILE_ records · 4683c2fe
      Marko Mäkelä authored
      In mariadb-backup --backup, we only have to invoke the undo_space_trunc
      and log_file_op callbacks as well as validate the mini-transaction
      checksums. There is absolutely no need to access recv_sys.pages or
      recv_spaces. This is what the new mode recv_sys_t::store::BACKUP will do.
      
      In the skip_the_rest: loop, the minimum that needs to be done is to
      process all FILE_ records until the end of the log is reached.
      Additionally, in case we invoked file_name_t::add_freed_page() for a
      FREE_PAGE record before switching to the skip_the_rest: loop, we must
      invoke file_name_t::remove_freed_page() for any INIT_PAGE record.
      Any other records that we encounter during this parsing can be ignored;
      they will eventually be processed on a subsequent call to recv_scan_log()
      with store=true.
      
      This was measured to reduce the CPU time between the messages
      "InnoDB: Multi-batch recovery needed at LSN" and
      "InnoDB: End of log at LSN"
      by some 20%.
      
      recv_sys_t::store: A ternary enumeration that specifies how records
      should be stored: NO, BACKUP, or YES.
      
      recv_sys_t::parse(), recv_sys_t::parse_mtr(), recv_sys_t::parse_pmem():
      Replace template<bool store> with template<store storing>.
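
A minimal sketch of what replacing a Boolean template parameter with a ternary enumeration can look like. The names record, parse_stats and this simplified parse() are hypothetical stand-ins for illustration and are not the InnoDB definitions; the sketch assumes C++17 for if constexpr.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the ternary mode described in the commit message.
enum class store { NO, BACKUP, YES };

// Hypothetical, heavily simplified log record.
struct record { bool is_file_op; int page; };

struct parse_stats {
  std::size_t file_ops = 0;  // FILE_ records handed to the file callback
  std::size_t buffered = 0;  // page records kept for a later apply step
};

// The mode is a compile-time constant, so each instantiation only contains
// the branches that it needs.
template <store storing>
parse_stats parse(const std::vector<record> &log)
{
  parse_stats stats;
  for (const record &r : log) {
    if (r.is_file_op) {
      // In this sketch, FILE_ records are processed in every mode.
      ++stats.file_ops;
      continue;
    }
    if constexpr (storing == store::YES) {
      // Only the full-recovery instantiation buffers page records.
      ++stats.buffered;
    }
    // store::NO and store::BACKUP: the record is parsed and then ignored.
  }
  return stats;
}
```

With this shape, parse<store::BACKUP>(log) never touches the buffering code path, which is the kind of work the commit message describes as unnecessary during --backup.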
      
      store_freed_or_init_rec(): Simplify some logic. We can look up also
      the system tablespace.
      MDEV-34907 Bogus assertion failure and busy work while parsing FILE_ records · 2d3ddaef
      Marko Mäkelä authored
      A server that was running with innodb_log_file_size=96M and
      innodb_buffer_pool_size=6M had inserted some data into a table
      that was subsequently dropped. When the server was killed and
      restarted, an assertion failed in recv_sys_t::parse() while
      a FSP_SIZE change was unnecessarily being processed during
      the skip_the_rest: loop in recv_scan_log().
      
      The contents of ib_logfile0 were as follows:
      
      1. The checkpoint start LSN points to the start of some mini-transaction.
      2. There may be log records for modifying files for which a FILE_MODIFY
      had been written before the checkpoint. These records were "purged"
      by advancing the checkpoint.
      3. At some point during the initial parsing with store=true the space
      reserved for recv_sys.pages will run out and recv_scan_log() would switch
      to the skip_the_rest: mode.
      4. We encounter a log record for extending a tablespace that will be
      deleted a bit later. This would trip the bogus debug assertion.
      5. Later on, there would be a FILE_DELETE record for this tablespace.
      6. The checkpoint end LSN points to a possibly empty sequence of
      FILE_MODIFY records and a FILE_CHECKPOINT record. Recovery had parsed these
      records first, before rewinding to the checkpoint start LSN.
      7. There could be further records following the FILE_CHECKPOINT record.
      Recovery will process all records until an inconsistency is found and
      it is assumed that the end of the circular ib_logfile0 was reached.
      
      recv_sys_t::parse(): For the template instantiation with store=false,
      remove a debug assertion that could fail in a multi-batch recovery,
      while recv_scan_log(false) would be in the skip_the_rest: loop.
      It is very well possible that we have not encountered all FILE_ records
      yet, and therefore we should not complain about unknown tablespaces.
      
      Reviewed by: Debarun Banerjee
  2. 26 Sep, 2024 1 commit
      MDEV-34062: Implement innodb_log_file_mmap on 64-bit systems · 6acada71
      Marko Mäkelä authored
      When using the default innodb_log_buffer_size=2m, mariadb-backup --backup
      would spend a lot of time re-reading and re-parsing the log. For reads,
      it would be beneficial to memory-map the entire ib_logfile0 to the
      address space (typically 48 bits or 256 TiB) and read it from there,
      both during --backup and --prepare.
      
      We will introduce the Boolean read-only parameter innodb_log_file_mmap
      that will be OFF by default on most platforms, to avoid aggressive
      read-ahead of the entire ib_logfile0 when only a tiny portion would be
      accessed. On Linux and FreeBSD the default is innodb_log_file_mmap=ON,
      because those platforms define a specific mmap(2) option for enabling
      such read-ahead and therefore it can be assumed that the default would
      be on-demand paging. This parameter will only have impact on the initial
      InnoDB startup and recovery. Any writes to the log will use regular I/O,
      except when the ib_logfile0 is stored in a specially configured file system
      that is backed by persistent memory (Linux "mount -o dax").
      
      We also experimented with allowing writes of the ib_logfile0 via a
      memory mapping and decided against it. A fundamental problem would be
      unnecessary read-before-write in case of a major page fault, that is,
      when a new, not yet cached, virtual memory page in the circular
      ib_logfile0 is being written to. There appears to be no way to tell
      the operating system that we do not care about the previous contents of
      the page, or that the page fault handler should just zero it out.
      
      Many references to HAVE_PMEM have been replaced with references to
      HAVE_INNODB_MMAP.
      
      The predicate log_sys.is_pmem() has been replaced with
      log_sys.is_mmap() && !log_sys.is_opened().
      
      Memory-mapped regular files differ from MAP_SYNC (PMEM) mappings in the
      way that an open file handle to ib_logfile0 will be retained. In both
      code paths, log_sys.is_mmap() will hold. Holding a file handle open will
      allow log_t::clear_mmap() to disable the interface with fewer operations.
      
      It should be noted that ever since
      commit 685d958e (MDEV-14425)
      most 64-bit Linux platforms in our CI
      (s390x a.k.a. IBM System Z being a notable exception) read and write
      /dev/shm/*/ib_logfile0 via a memory mapping, pretending that it is
      persistent memory (mount -o dax). So, the memory mapping based log
      parsing that this change is enabling by default on Linux and FreeBSD
      has already been extensively tested on Linux.
      
      ::log_mmap(): If a log cannot be opened as PMEM and the desired access
      is read-only, try to open a read-only memory mapping.
      
      xtrabackup_copy_mmap_snippet(), xtrabackup_copy_mmap_logfile():
      Copy the InnoDB log in mariadb-backup --backup from a memory
      mapped file.
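
The read-only mapping of a regular file described above can be sketched with plain POSIX calls. This is an illustration under stated assumptions, not the code of ::log_mmap(): the type mapped_log and the functions map_log_read_only() and unmap_log() are hypothetical names. The file descriptor is kept open alongside the mapping, mirroring the note that a handle to ib_logfile0 is retained for memory-mapped regular files.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical holder for a read-only mapping and its retained file handle.
struct mapped_log {
  const unsigned char *data;
  std::size_t size;
  int fd;
};

// Map the whole file read-only. Returns {nullptr, 0, -1} on any failure.
inline mapped_log map_log_read_only(const char *path)
{
  mapped_log m{nullptr, 0, -1};
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return m;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return m;
  }
  const std::size_t size = static_cast<std::size_t>(st.st_size);
  // PROT_READ only: writes to the log would still go through regular I/O.
  void *p = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) {
    close(fd);
    return m;
  }
  m.data = static_cast<const unsigned char *>(p);
  m.size = size;
  m.fd = fd;
  return m;
}

// Release the mapping and the retained file handle.
inline void unmap_log(mapped_log &m)
{
  if (m.data)
    munmap(const_cast<unsigned char *>(m.data), m.size);
  if (m.fd >= 0)
    close(m.fd);
  m = mapped_log{nullptr, 0, -1};
}
```

A caller would parse records directly from m.data instead of repeatedly reading the file into a small buffer, and fall back to regular reads when data is nullptr.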
  3. 24 Sep, 2024 1 commit
  4. 23 Sep, 2024 1 commit
  5. 20 Sep, 2024 3 commits
  6. 19 Sep, 2024 1 commit
  7. 18 Sep, 2024 1 commit
  8. 17 Sep, 2024 1 commit
  9. 16 Sep, 2024 3 commits
  10. 15 Sep, 2024 12 commits
  11. 14 Sep, 2024 2 commits
      mtr_t::log_file_op(): Fix -Wnonnull · 4010dff0
      Marko Mäkelä authored
      GCC 12.2.0 could issue -Wnonnull for an unreachable call to
      strlen(new_path).  Let us prevent that by replacing the condition
      (type == FILE_RENAME) with the equivalent (new_path).
      This should also optimize the generated code, because the lifetime
      of the parameter "type" will be reduced.
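
The pattern can be illustrated with a simplified, hypothetical function. The enumeration values and file_op_payload() below are assumptions made for the example, not the signature of mtr_t::log_file_op(). The sketch relies on the invariant that new_path is non-null exactly when the operation is a rename, so testing the pointer is equivalent to testing the type while also showing the compiler that strlen() never receives a null argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical operation codes for the example.
enum file_op { FILE_CREATE, FILE_RENAME, FILE_DELETE };

// Compute an illustrative payload length for a file operation record.
inline std::size_t file_op_payload(file_op type, const char *path,
                                   const char *new_path)
{
  // Assumed invariant: new_path is given exactly for renames.
  assert((type == FILE_RENAME) == (new_path != nullptr));
  std::size_t len = std::strlen(path);
  if (new_path)  // previously: if (type == FILE_RENAME)
    len += 1 + std::strlen(new_path);  // one separator byte plus the new name
  return len;
}
```

After the null check, type is no longer needed on the rename path, which is the shorter lifetime that the commit message refers to.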
      MDEV-34750 fixup: -Wconversion on 32-bit · e3f653ca
      Marko Mäkelä authored
      log_t::resize_write_buf(): If d<0 and d>-length, d will fit in ssize_t,
      which is a signed 32-bit or 64-bit integer. Cast from int64_t to ssize_t
      to make this clear and to silence a compiler warning.
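
A small sketch of such an explicit narrowing cast. The helper narrow_offset() is hypothetical and is not part of log_t::resize_write_buf(); it additionally assumes that length itself does not exceed the positive range of ssize_t, which is what makes the bound on d sufficient on a 32-bit target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/types.h>

// Narrow a negative 64-bit offset to ssize_t once its range is known.
inline ssize_t narrow_offset(int64_t d, std::size_t length)
{
  // Preconditions taken from the commit message: d < 0 and d > -length.
  assert(d < 0);
  assert(d > -static_cast<int64_t>(length));
  // The explicit cast documents the intent and silences -Wconversion
  // where ssize_t is only 32 bits wide.
  return static_cast<ssize_t>(d);
}
```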
  12. 13 Sep, 2024 1 commit
      MDEV-34921 MemorySanitizer reports errors for non-debug builds · b331cde2
      Marko Mäkelä authored
      my_b_encr_write(): Initialize also block_length, and at the same time
      last_block_length, so that all 128 bits can be initialized with fewer
      writes. This fixes an error that was caught in the test
      encryption.tempfiles_encrypted.
      
      test_my_safe_print_str(): Skip a test that would attempt to
      display uninitialized data in the test unit.stacktrace.
      Previously, our CI did not build unit tests with MemorySanitizer.
      
      handle_delayed_insert(): Remove a redundant call to pthread_exit(0),
      which would for some reason cause MemorySanitizer in clang-19 to
      report a stack overflow in a RelWithDebInfo build. This fixes a
      failure of several tests.
      
      Reviewed by: Vladislav Vaintroub
  13. 12 Sep, 2024 6 commits
  14. 11 Sep, 2024 4 commits