1. 26 Jan, 2022 18 commits
  2. 25 Jan, 2022 1 commit
  3. 24 Jan, 2022 8 commits
    • A clean-up for MDEV-10654 add support IN, OUT, INOUT parameter qualifiers for stored functions · 05050867
      Alexander Barkov authored
      Changes:
      
      1. Enabling IN/OUT/INOUT mode for sql_mode=DEFAULT,
         adding tests for sql_mode=DEFAULT, mostly by
         translating compat/oracle.sp-inout.test to SQL/PSM
         with minor changes (e.g. testing trigger OLD.column and
         NEW.column as IN/OUT parameters).
      
      2. Removing duplicate grammar:
         sp_pdparam and sp_fdparam implemented exactly the same syntax after
         - the first patch for MDEV-10654 (for sql_mode=ORACLE)
         - the change #1 from this patch (for sql_mode=DEFAULT)
         Removing separate rules and adding a single "sp_param" rule instead,
         which now covers both PROCEDURE and FUNCTION parameters
         (and CURSOR parameters as well!).
      
      3. Adding a helper rule sp_param_name_and_mode, which is a combination
         of the parameter name and the IN/OUT/INOUT mode. It simplifies
         the grammar a bit.
      
      4. The first patch unintentionally allowed IN/OUT/INOUT mode
         to be specified in CURSOR parameters.
         This is good for the IN keyword - it is allowed in PL/SQL CURSORs.
         This is not good for the OUT/INOUT keywords - they should not be allowed.
         Adding an additional semantic post-check.
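
      The post-check from item 4 can be pictured with a minimal Python sketch.
      It is not the server's implementation: the function name and the
      (name, mode) tuples are hypothetical, and the only rule it encodes is the
      one stated above, namely that IN is acceptable for CURSOR parameters
      while OUT and INOUT are not.

```python
def check_cursor_param_modes(params):
    """Reject OUT/INOUT cursor parameters.

    `params` is a list of (name, mode) tuples; this representation is an
    assumption made for illustration only.
    """
    for name, mode in params:
        if mode.upper() != "IN":
            raise ValueError(
                f"cursor parameter {name!r} cannot have mode {mode}"
            )
    return True


# IN is accepted; OUT or INOUT would raise ValueError.
check_cursor_param_modes([("p_id", "IN")])
```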
    • MDEV-10654 add support IN, OUT, INOUT parameter qualifiers for stored functions · 4572dc23
      ManoharKB authored
      Problem: Currently, stored functions do not support IN/OUT/INOUT parameter qualifiers.
      This is needed for Oracle compatibility (sql_mode = ORACLE).
      
      Solution: Implemented parameter qualifier support for CREATE FUNCTION (reference: CREATE PROCEDURE).
      Implemented return by reference for OUT/INOUT parameters in execute_function() (reference: execute_procedure()).
      
      Files changed:
      sql/sql_yacc.yy: Added IN, OUT, INOUT parameter qualifiers for CREATE FUNCTION.
      sql/sp_head.cc: Added input and output parameter binding for IN/OUT/INOUT parameters in execute_function() so that OUT/INOUT can return by reference.
      sql/share/errmsg-utf8.txt: Added an error message to restrict OUT/INOUT parameters when a function is called from an SQL query.
      mysql-test/suite/compat/oracle/t/sp-inout.test: Added test cases
      mysql-test/suite/compat/oracle/r/sp-inout.result: Added test results
      
      Reviewed-by: iqbal@hasprime.com
    • MDEV-27521 SIGSEGV in spider_parse_connect_info in MDEV-27106 branch · 5595ed9d
      Nayuta Yanagisawa authored
      Add NULL check to SPIDER_OPTION_STR_LIST.
    • MDEV-26858 Spider: Remove dead code related to HandlerSocket · 0599dd90
      Nayuta Yanagisawa authored
      Remove the dead code in Spider that is related to Spider's
      HandlerSocket support. The code has been disabled for a long time
      and is unlikely to be enabled again.
    • MDEV-27106 Spider: specify connection to data node by engine-defined attributes · 72f34df3
      Nayuta Yanagisawa authored
      We introduce engine-defined attributes to specify remote data nodes.
      
      The engine attributes do not cover all the existing DSN parameters
      because most of them need not be specified at the table level.
      We introduce the following three attributes: REMOTE_SERVER,
      REMOTE_DATABASE, REMOTE_TABLE.
      
      One cannot specify both a DSN parameter, in COMMENT or CONNECT, and
      an engine-defined attribute that map to the same SPIDER_SHARE attribute.
      For example, Spider returns an error if both COMMENT='table "t1"'
      and REMOTE_TABLE="t2" are specified for a single Spider table or
      a single partition in a Spider table.
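
      A rough Python sketch of such a conflict check follows. Only the pairing
      of the DSN parameter "table" with REMOTE_TABLE comes from the example
      above; the other two pairings, the dictionary name, and the function
      name are assumptions made for illustration, not Spider's actual code.

```python
# Hypothetical mapping from DSN parameter names to engine-defined attributes.
# Only "table" <-> REMOTE_TABLE is taken from the commit message.
DSN_TO_ATTRIBUTE = {
    "table": "REMOTE_TABLE",
    "database": "REMOTE_DATABASE",  # assumed pairing
    "server": "REMOTE_SERVER",      # assumed pairing
}


def find_conflicts(dsn_params, engine_attrs):
    """Return the attributes that are set both ways for one table/partition."""
    return sorted(
        attr
        for key, attr in DSN_TO_ATTRIBUTE.items()
        if key in dsn_params and attr in engine_attrs
    )


# COMMENT='table "t1"' together with REMOTE_TABLE="t2" is a conflict.
conflicts = find_conflicts({"table": "t1"}, {"REMOTE_TABLE": "t2"})
```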
    • MDEV-5271 Support engine-defined attributes per partition · c5d09f73
      Nayuta Yanagisawa authored
      Make it possible to specify engine-defined attributes on partitions
      as well as tables.
      
      If an engine-defined attribute is only specified at the table level,
      it applies to all the partitions in the table.
      This is a backward-compatible behavior.
      
      If the same attribute is specified both at the table level and the
      partition level, the per-partition one takes precedence.
      So, we can consider per-table attributes as default values.
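
      The precedence rule can be sketched in a few lines of Python. The
      attribute names ATTR_A and ATTR_B and the function name are
      hypothetical; the sketch only shows table-level values acting as
      defaults that a partition may override.

```python
def effective_attributes(table_attrs, partition_attrs):
    """Merge attributes so that per-partition values take precedence."""
    merged = dict(table_attrs)      # table-level values are the defaults
    merged.update(partition_attrs)  # partition-level values override them
    return merged


table_level = {"ATTR_A": "x", "ATTR_B": "y"}
partition_level = {"ATTR_B": "z"}
result = effective_attributes(table_level, partition_level)
```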
      
      One cannot specify engine-defined attributes on subpartitions.
      
      Implementation details:
      
      * We store per-partition attributes in the partition_element class
        because we already have the part_comment field, which is for
        per-partition comments.
      
      * In the case of ALTER TABLE statements, the partition_elements in
        table->part_info are set up by mysql_unpack_partition().
        So, we parse per-partition attributes after that function is called.
    • MDEV-27314 InnoDB Buffer Pool Resize output cleanup (mtr postfix) · 83dd7db6
      Daniel Black authored
      More tests depending on 'Completed resizing buffer pool.' output
    • MDEV-27314 InnoDB Buffer Pool Resize output cleanup · d0ca235d
      Haidong Ji authored
      Cleaned up the log messages as suggested, with a minor code
      formatting change.
      
      On bullet point 13, I decided not to include a timestamp in the output
      message. In most (all?) cases, the output goes to the log file,
      which already has a timestamp.
  4. 22 Jan, 2022 2 commits
    • Extend the Gitlab-CI pipeline to run mini benchmark · c5c61b51
      Otto Kekäläinen authored
      Implement a new mini-benchmark script that runs a simple CPU-bound
      benchmark for 5 minutes. The script can be run stand-alone or as part of a
      CI pipeline.

      Extend Gitlab-CI to run the mini-benchmark on every commit to catch
      severe performance regressions.
      
      Also bump MARIADB_MAJOR_VERSION to 10.8 which is needed on the 10.8 branch.
    • MDEV-27208: mtr --ps-protocol test fixup · 1f5fc7b7
      Marko Mäkelä authored
      The test ./mtr --ps-protocol main.func_math
      was broken in commit 5b3ad94c
      because in that mode, one of several truncation warnings for
      a single integer literal would be omitted. Those warnings are
      issued by the parser somewhere outside CRC32() or CRC32C().
  5. 21 Jan, 2022 5 commits
    • MDEV-27208: Extend CRC32() and implement CRC32C() · 5b3ad94c
      Marko Mäkelä authored
      We used to define a native unary function CRC32() that computes the CRC-32
      of a string using the ISO 3309 polynomial that is being used by zlib
      and many others.
      
      Often, a CRC is computed in pieces. To facilitate this, we introduce a
      2-ary variant of the function that inputs a previous CRC as the first
      argument: CRC32('MariaDB')=CRC32(CRC32('Maria'),'DB').
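
      The same chaining property can be checked outside the server with
      Python's zlib.crc32(), which accepts a running checksum as its second
      argument. This sketch only illustrates the property for the zlib
      polynomial; the helper name crc32_chained is made up here, and the
      sketch does not exercise the SQL function itself.

```python
import zlib


def crc32_chained(prev_crc, data):
    """Continue a CRC-32 from prev_crc, mirroring the 2-ary CRC32(prev, str)."""
    return zlib.crc32(data, prev_crc)


whole = zlib.crc32(b"MariaDB")
in_pieces = crc32_chained(zlib.crc32(b"Maria"), b"DB")
print(whole == in_pieces)
```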
      
      InnoDB and MyRocks use a different polynomial, which was implemented
      in SSE4.2 instructions that were introduced in the
      Intel Nehalem microarchitecture. This is commonly called CRC-32C
      (Castagnoli).
      
      We introduce a native function that uses the Castagnoli polynomial:
      CRC32C('MariaDB')=CRC32C(CRC32C('Maria'),'DB'). This allows
      SELECT...INTO DUMPFILE to be used for the creation of files with
      valid checksums, such as a logically empty InnoDB redo log file
      ib_logfile0 corresponding to a particular log sequence number.
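
      For readers without SSE4.2 at hand, CRC-32C can be sketched in plain
      Python with a lookup table. This is a slow, table-driven illustration
      and not the server's hardware-accelerated code; the function names are
      chosen here. 0x82F63B78 is the bit-reflected form of the Castagnoli
      polynomial, and the optional second argument continues a previous CRC in
      the same way as the 2-ary SQL variant described above.

```python
CASTAGNOLI_REFLECTED = 0x82F63B78  # bit-reflected CRC-32C polynomial


def _make_table(poly):
    """Build the 256-entry table for a reflected, byte-at-a-time CRC."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table(CASTAGNOLI_REFLECTED)


def crc32c(data, prev=0):
    """Compute CRC-32C of data, optionally continuing from prev."""
    crc = prev ^ 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# Chaining two pieces gives the same value as one pass over the whole input.
print(crc32c(b"DB", crc32c(b"Maria")) == crc32c(b"MariaDB"))
```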
    • MDEV-27199: Remove FIL_PAGE_FILE_FLUSH_LSN · b07920b6
      Marko Mäkelä authored
      The only purpose of the field FIL_PAGE_FILE_FLUSH_LSN was to
      store the log sequence number for a new ib_logfile0 when the
      InnoDB redo log was missing at startup.
      
      Because FIL_PAGE_FILE_FLUSH_LSN no longer serves any purpose,
      we will stop updating it. The writes of that field were inherently
      risky, because they were covered by neither the redo log nor
      the doublewrite buffer.
      
      Warning: After MDEV-14425 and before this change, users could perform
      a clean shutdown of the server, replace the ib_logfile0 with a
      0-length file, and expect a valid log file to be created on the
      next server startup. After this change, if the FIL_PAGE_FILE_FLUSH_LSN
      had ever been updated in the past, the server would still create a
      log file in such a scenario, but possibly with an incorrect (too small)
      LSN. Users should not manipulate log files directly!
    • Disable adaptive spinning on buf_pool.mutex · 88d9fbb4
      Marko Mäkelä authored
      During the testing of MDEV-14425, buf_pool.mutex and log_sys.mutex
      were identified as the main bottlenecks for write workloads.
      Let us disable spinning also for buf_pool.mutex, except on ARMv8
      where spinning was enabled for log_sys.mutex
      in commit f7684f0c (MDEV-26855).
      This was tested on AMD64 and recommended by Axel Schwenke.
      
      According to Krunal Bauskar, removing the spinloops did not improve
      performance in his tests on ARMv8.
    • Marko Mäkelä's avatar
      5d54fd61
    • MDEV-14425 Improve the redo log for concurrency · 685d958e
      Marko Mäkelä authored
      The InnoDB redo log used to be formatted in blocks of 512 bytes.
      The log blocks were encrypted and the checksum was calculated while
      holding log_sys.mutex, creating a serious scalability bottleneck.
      
      We remove the fixed-size redo log block structure altogether and
      essentially turn every mini-transaction into a log block of its own.
      This allows encryption and checksum calculations to be performed
      on local mtr_t::m_log buffers, before acquiring log_sys.mutex.
      The mutex only protects a memcpy() of the data to the shared
      log_sys.buf, as well as the padding of the log, in case the
      to-be-written part of the log would not end in a block boundary of
      the underlying storage. For now, the "padding" consists of writing
      a single NUL byte, to allow recovery and mariadb-backup to detect
      the end of the circular log faster.
      
      Like the previous implementation, we will overwrite the last log block
      over and over again, until it has been completely filled. It would be
      possible to write only up to the last completed block (if no more
      recent write was requested), or to write dummy FILE_CHECKPOINT records
      to fill the incomplete block, by invoking the currently disabled
      function log_pad(). This would require adjustments to some logic around
      log checkpoints, page flushing, and shutdown.
      
      An upgrade after a crash of any previous version is not supported.
      Logically empty log files from a previous version will be upgraded.
      
      An attempt to start up InnoDB without a valid ib_logfile0 will be
      refused. Previously, the redo log used to be created automatically
      if it was missing. Only with innodb_force_recovery=6 is it
      possible to start InnoDB in read-only mode even if the log file
      does not exist. This allows the contents of a possibly corrupted
      database to be dumped.
      
      Because a prepared backup from an earlier version of mariadb-backup
      will create a 0-sized log file, we will allow an upgrade from such
      log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system
      tablespace looks valid.
      
      The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced
      with 64-byte log checkpoint blocks at 0x1000 and 0x2000.
      
      The start of log records will move from 0x800 to 0x3000. This allows us
      to use 4096-byte aligned blocks for all I/O in a future revision.
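
      The alignment argument is simple arithmetic, shown here as a small
      Python check. The offsets are the ones quoted above; the constant and
      function names are chosen for this sketch only.

```python
OLD_CHECKPOINT_OFFSETS = (0x200, 0x600)
NEW_CHECKPOINT_OFFSETS = (0x1000, 0x2000)
OLD_LOG_START = 0x800
NEW_LOG_START = 0x3000
CHECKPOINT_BLOCK_SIZE = 64  # bytes, per the new format


def is_aligned(offset, block_size=4096):
    """True if offset is a multiple of block_size."""
    return offset % block_size == 0


# The new start of log records (12288) is a multiple of 4096;
# the old one (2048) is not.
print(is_aligned(NEW_LOG_START), is_aligned(OLD_LOG_START))
```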
      
      We extend the MDEV-12353 redo log record format as follows.
      
      (1) Empty mini-transactions or extra NUL bytes will not be allowed.
      (2) The end-of-minitransaction marker (a NUL byte) will be replaced
      with a 1-bit sequence number, which will be toggled each time when the
      circular log file wraps back to the beginning.
      (3) After the sequence bit, a CRC-32C checksum of all data
      (excluding the sequence bit) will be written.
      (4) If the log is encrypted, 8 bytes will be written before
      the checksum and included in it. This is part of the
      initialization vector (IV) of encrypted log data.
      (5) File names, page numbers, and checkpoint information will not be
      encrypted. Only the payload bytes of page-level log will be encrypted.
      The tablespace ID and page number will form part of the IV.
      (6) For padding, arbitrary-length FILE_CHECKPOINT records may be written,
      with all-zero payload, and with the normal end marker and checksum.
      The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON.
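
      The toggling of the sequence bit in item (2) can be modelled with a toy
      Python class. Everything about it is an assumption made for
      illustration: the class name, the initial bit value, and the tiny file
      size in the usage lines. It only shows a 1-bit value flipping each time
      a write position wraps from the end of a circular file back to the start
      of the log records.

```python
class CircularLogPosition:
    """Track a write offset in a circular log plus a 1-bit sequence number."""

    def __init__(self, file_size, start, sequence_bit=1):
        if not 0 <= start < file_size:
            raise ValueError("start must lie inside the file")
        self.file_size = file_size
        self.start = start              # first byte usable for log records
        self.offset = start
        self.sequence_bit = sequence_bit & 1  # initial value is an assumption

    def advance(self, nbytes):
        """Move forward by nbytes, toggling the bit on every wrap-around."""
        capacity = self.file_size - self.start
        self.offset += nbytes
        while self.offset >= self.file_size:
            self.offset -= capacity
            self.sequence_bit ^= 1


pos = CircularLogPosition(file_size=100, start=20)
pos.advance(70)   # offset 90, no wrap yet
pos.advance(15)   # wraps past 100 back to offset 25, bit toggles
```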
      
      In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will
      no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup
      will require a valid log file. When resizing the log, we will create
      a logically empty ib_logfile101 at the current LSN and use an atomic rename
      to replace ib_logfile0 with it. See the test innodb.log_file_size.
      
      Because there is no mandatory padding in the log file, we are able
      to create a dummy log file as of an arbitrary log sequence number.
      See the test mariabackup.huge_lsn.
      
      The parameter innodb_log_write_ahead_size and the
      INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed.
      
      The minimum value of innodb_log_buffer_size will be increased to 2MiB
      (because log_sys.buf will replace recv_sys.buf) and the increment
      adjusted to 4096 bytes (the maximum log block size).
      
      The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed:
      
      os_log_fsyncs
      os_log_pending_fsyncs
      log_pending_log_flushes
      log_pending_checkpoint_writes
      
      The following status variables will be removed:
      
      Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs)
      Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design)
      
      log_sys.get_block_size(): Return the physical block size of the log file.
      This is only implemented on Linux and Microsoft Windows for now, and for
      the power-of-2 block sizes between 64 and 4096 bytes (the minimum and
      maximum size of a checkpoint block). If the block size is anything else,
      the traditional 512-byte size will be used via normal file system
      buffering.
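
      The selection rule described for log_sys.get_block_size() can be
      paraphrased in Python as follows. This is a reading of the paragraph
      above, not the C++ implementation, and the function name is
      hypothetical.

```python
def effective_block_size(physical_block_size):
    """Return the block size to use for the log, falling back to 512 bytes."""
    n = physical_block_size
    is_power_of_two = n > 0 and (n & (n - 1)) == 0
    if is_power_of_two and 64 <= n <= 4096:
        return n
    return 512  # traditional size, via normal file system buffering


sizes = {n: effective_block_size(n) for n in (32, 64, 512, 520, 4096, 8192)}
```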
      
      If the file system buffers can be bypassed, a message like the following
      will be issued:
      
      InnoDB: File system buffers for log disabled (block size=512 bytes)
      InnoDB: File system buffers for log disabled (block size=4096 bytes)
      
      This has been tested on Linux and Microsoft Windows with both sizes.
      
      On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC.
      Tests in 3 different environments where the log is stored in a device
      with a physical block size of 512 bytes are yielding better throughput
      without O_DIRECT. This could be due to the fact that in the event the
      last log block is being overwritten (if multiple transactions would
      become durable at the same time, and each of them will write a small
      number of bytes to the last log block), it should be faster to re-copy
      data from log_sys.buf or log_sys.flush_buf to the kernel buffer,
      to be finally written at fdatasync() time.
      
      The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for
      data files. This option will enable O_DIRECT on the log file on Linux.
      It may be unsafe to use when the storage device does not support
      FUA (Force Unit Access) mode.
      
      When the server is compiled WITH_PMEM=ON, we will use memory-mapped
      I/O for the log file if the log resides on a "mount -o dax" device.
      We will identify PMEM in a start-up message:
      
      InnoDB: log sequence number 0 (memory-mapped); transaction id 3
      
      On Linux, we will also invoke mmap() on any ib_logfile0 that resides
      in /dev/shm, effectively treating the log file as persistent memory.
      This should speed up "./mtr --mem" and increase the test coverage of
      PMEM on non-PMEM hardware. It also allows users to estimate how much
      the performance would be improved by installing persistent memory.
      On other tmpfs file systems such as /run, we will not use mmap().
      
      mariadb-backup: Eliminated several variables. We will refer
      directly to recv_sys and log_sys.
      
      backup_wait_for_lsn(): Detect non-progress of
      xtrabackup_copy_logfile(). In this new log format with
      arbitrary-sized blocks, we can only detect log file overrun
      indirectly, by observing that the scanned log sequence number
      is not advancing.
      
      xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit,
      because we are not allowed to modify the server's log file, and our
      memory mapping is read-only.
      
      trx_flush_log_if_needed_low(): Do not use the callback on PMEM.
      Using neither flush_lock nor write_lock around PMEM writes seems
      to yield the best performance. The pmem_persist() calls may
      still be somewhat slower than the pwrite() and fdatasync() based
      interface (PMEM mounted without -o dax).
      
      recv_sys_t::buf: Remove. We will use log_sys.buf for parsing.
      
      recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE.
      
      recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn.
      
      recv_sys_t, log_sys_t: Removed many data members.
      
      recv_sys.lsn: Renamed from recv_sys.recovered_lsn.
      recv_sys.offset: Renamed from recv_sys.recovered_offset.
      log_sys.buf_size: Replaces srv_log_buffer_size.
      
      recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset]
      when the buffer is being allocated from the memory heap.
      
      recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is
      backed by ib_logfile0. The pointer will wrap from recv_sys.len
      (log_sys.file_size) to log_sys.START_OFFSET. For the record that
      wraps around, we may copy file name or record payload data to
      the auxiliary buffer decrypt_buf in order to have a contiguous
      block of memory. The maximum size of a record is less than
      innodb_page_size bytes.
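
      The wrap-around read that recv_ring performs can be illustrated with a
      short Python function that copies a record straddling the end of the
      file into one contiguous buffer. The function name, its parameters, and
      the toy sizes in the usage lines are hypothetical; in the real log the
      start offset would correspond to log_sys.START_OFFSET.

```python
def read_wrapping(log, offset, length, start):
    """Read length bytes at offset, wrapping from len(log) back to start."""
    file_size = len(log)
    if not start <= offset < file_size:
        raise ValueError("offset outside the circular area")
    if length > file_size - start:
        raise ValueError("record longer than the circular area")
    first = min(length, file_size - offset)  # bytes before the end of file
    rest = length - first                    # bytes after wrapping
    return log[offset:offset + first] + log[start:start + rest]


toy_log = bytes(range(10))  # 10-byte "file"; records start at offset 4
record = read_wrapping(toy_log, offset=8, length=4, start=4)
```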
      
      recv_sys_t::parse(): Take the smart pointer as a template parameter.
      Do not temporarily add a trailing NUL byte to FILE_ records, because
      we are not supposed to modify the memory-mapped log file. (It is
      attached in read-write mode already during recovery.)
      
      recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse().
      
      recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be
      returned on PMEM, use recv_ring to wrap around the buffer to the start.
      
      mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free
      on PMEM, because it has no meaning on the mmap-based log.
      
      log_sys.write_to_buf: Count writes to log_sys.buf. Replaces
      srv_stats.log_write_requests and export_vars.innodb_log_write_requests.
      Protected by log_sys.mutex. Updated consistently in log_close().
      Previously, mtr_t::commit() conditionally updated the count,
      which was inconsistent.
      
      log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf,
      for writing to log_sys.log (the ib_logfile0). Replaces
      srv_stats.log_writes and export_vars.innodb_log_writes.
      Protected by log_sys.mutex.
      
      log_sys.waits: Count waits in append_prepare(). Replaces
      srv_stats.log_waits and export_vars.innodb_log_waits.
      
      recv_recover_page(): Do not unnecessarily acquire
      log_sys.flush_order_mutex. We are inserting the blocks in arbitrary
      order anyway, to be adjusted in recv_sys.apply(true).
      
      We will change the definition of flush_lock and write_lock to
      avoid potential false sharing. Depending on sizeof(log_sys) and
      CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could
      share a cache line with each other or with the last data members
      of log_sys.
      
      Thanks to Matthias Leich for providing https://rr-project.org traces
      for various failures during the development, and to
      Thirunarayanan Balathandayuthapani for his help in debugging
      some of the recovery code. And thanks to the developers of the
      rr debugger for a tool without which extensive changes to InnoDB
      would be very challenging to get right.
      
      Thanks to Vladislav Vaintroub for useful feedback and
      to him, Axel Schwenke and Krunal Bauskar for testing the performance.
  6. 20 Jan, 2022 6 commits