1. 16 Sep, 2024 23 commits
    • Brandon Nesterenko's avatar
      MDEV-32014: Reduce min val of large_commit_threshold for debug builds · f924cbda
      Brandon Nesterenko authored
      To help in the testing of MDEV-32014, allow debug builds to
      set a lower value for binlog_large_commit_threshold.
    • Libing Song's avatar
      MDEV-32014 Rename binlog cache temporary file to binlog file · 12bdd58c
      Libing Song authored
                 for large transaction
      
      Description
      ===========
      When a transaction commits, it copies the binlog events from the
      binlog cache to the binlog file. Very large transactions
      (e.g. gigabytes) can stall other transactions for a long time
      because the data is copied while holding LOCK_log, which blocks
      other commits from binlogging.

      The solution in this patch is to rename the binlog cache file to
      a binlog file instead of copying it, if the committing transaction
      has a large binlog cache. Rename is a very fast operation, so it
      doesn't block other transactions for a long time.
      
      Design
      ======
      * binlog_large_commit_threshold
        type: ulonglong
        scope: global
        dynamic: yes
        default: 128MB
      
        Only binlog cache temporary files larger than this threshold
        (128MB by default) are renamed to a binlog file.
      
      * #binlog_cache_files directory
        To support rename, all binlog cache temporary files are now managed
        as normal files. The `#binlog_cache_files` directory is in the same
        directory as the binlog files. It is created at server startup if it
        doesn't exist. Otherwise, all files in the directory are deleted at
        startup.

        The temporary files are named with an ML_ prefix and the memory address
        of the binlog_cache_data object, which guarantees the name is unique.
      
      * Reserve space
        To support the rename feature, enough space must be reserved at the
        beginning of the binlog cache file. The space is required for the
        Format description, Gtid list, checkpoint and Gtid events when
        renaming it to a binlog file.

        binlog_cache_data's cache_log is directly accessed by binlog log,
        online alter and wsrep, so it is not easy to update all the code. Thus
        the binlog cache will not reserve space if it is not a session binlog
        cache or if a wsrep session is enabled.
      
        - m_file_reserved_bytes
          Stores the number of bytes reserved at the beginning of the cache file.
          It is initialized in write_prepare() and cleared by reset().

          The reserved file header is hidden from callers. Thus there is no
          change for callers. E.g.
          - get_byte_position() still returns the length of the binlog data
            written to the cache, not the file length.
          - truncate(0) will truncate the file to m_file_reserved_bytes, not to 0.

        - write_prepare()
          write_prepare() is called every time anything is written
          into the cache. It calls init_file_reserved_bytes() to create
          the cache file (if it doesn't exist) and reserve suitable space if
          the data written exceeds the buffer's size.
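The hidden reserved header described under "Reserve space" can be pictured with a small sketch. The class below is hypothetical and is not the server's binlog_cache_data; only the names m_file_reserved_bytes, get_byte_position() and truncate() come from the commit message, and the cache file is modelled as an in-memory byte vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a cache file whose first bytes are reserved.
class ReservedHeaderCache {
public:
  explicit ReservedHeaderCache(std::size_t reserved_bytes)
      : m_file_reserved_bytes(reserved_bytes), m_file(reserved_bytes, 0) {}

  // Append binlog data after the reserved header.
  void write(const std::vector<unsigned char> &data) {
    m_file.insert(m_file.end(), data.begin(), data.end());
  }

  // Length of the binlog data only; the reserved header is not counted.
  std::size_t get_byte_position() const {
    return m_file.size() - m_file_reserved_bytes;
  }

  // Truncate to a logical position; the reserved header always survives.
  void truncate(std::size_t pos) {
    m_file.resize(m_file_reserved_bytes + pos);
  }

  // Physical file length, header included (not visible to normal callers).
  std::size_t file_length() const { return m_file.size(); }

private:
  std::size_t m_file_reserved_bytes;
  std::vector<unsigned char> m_file;
};
```

In this model a caller that writes 100 bytes sees position 100 even though the file is longer, and truncate(0) leaves exactly the reserved bytes in place.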
      
      * Binlog_commit_by_rotate
        It encapsulates the code for renaming a binlog cache
        temporary file to a binlog file.
        - should_commit_by_rotate()
          It is called by write_transaction_to_binlog_events() to check whether
          a binlog cache should be renamed to a binlog file.
        - commit()
          This is the entry point for renaming a binlog cache and committing the
          transaction. Both rename and commit are protected by LOCK_log,
          thus no other transaction can write anything into the renamed
          binlog before it.

          Rename happens in a rotation. After the new binlog file is generated,
          replace_binlog_file() is called to:
          - copy data from the new binlog file to its binlog cache file.
          - write the Gtid event.
          - rename the binlog cache file to the binlog file.

          After that the rotation continues to completion. Then the transaction
          is committed in a separate group by itself. Its cache file is
          detached and the cache log is reset before calling
          trx_group_commit_with_engines(). Thus only the Xid event is written.
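As a rough illustration of the decision made before a commit, the sketch below checks the cache size against binlog_large_commit_threshold. The struct, its fields and the function name are hypothetical; the assumption that a cache without reserved header space is never renamed is inferred from the "Reserve space" notes, not taken from the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Default from the commit message: 128MB.
constexpr std::uint64_t kDefaultLargeCommitThreshold = 128ULL * 1024 * 1024;

// Hypothetical summary of the state the real check would inspect.
struct CacheState {
  std::uint64_t cache_bytes;  // binlog data written to the cache file
  bool has_reserved_space;    // assumed false for non-session caches or wsrep
};

// Rename only when the cache is larger than the threshold and the header
// space was reserved; otherwise the normal copy path is used.
inline bool should_commit_by_rotate_sketch(const CacheState &cache,
                                           std::uint64_t threshold) {
  return cache.has_reserved_space && cache.cache_bytes > threshold;
}
```

A cache of exactly the threshold size is still copied in this sketch, matching the "larger than" wording of the variable description.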
    • Rex's avatar
      MDEV-31466 Add optional correlation column list for derived tables · 34139685
      Rex authored
      Extend derived table syntax to support column name assignment.
      (subquery expression) [as|=] ident [comma separated column name list].
      Prior to this patch, the optional comma separated column name list was
      not supported.
      
      Processing within the unit of the subquery expression will use
      original column names, outside the unit will use the new names.
      
      For example, in the query
      
      select a1, a2 from
        (select c1, c2, c3 from t1 where c2 > 0) as dt (a1, a2, a3)
      where a2 > 10;
      
      we see the second column of the derived table dt being used both within,
      (where c2 > 0), and outside, (where a2 > 10), the specification.
      Both conditions apply to t1.c2.
      
      When multiple unit preparations are required, such as when being used within
      a prepared statement or procedure, original column names are needed for
      correct resolution. Original names are reset within mysql_derived_reinit().
      
      Item_holder items, used for result tables in both TVC and union
      preparations, are renamed before use within st_select_lex_unit::prepare().

      During wildcard expansion, if column names are present, item names are
      set directly after creation.
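The two name scopes can be modelled with a toy structure. This is only a sketch of the visibility rule stated above (original names inside the unit, new names outside); DerivedColumns and its methods are hypothetical and unrelated to the server's Item classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a derived table's select list with a column name list.
struct DerivedColumns {
  std::vector<std::string> original;  // names visible inside the subquery unit
  std::vector<std::string> renamed;   // names visible outside; may be empty

  // Position of `name` in `names`, or -1 if it is not present.
  static int find_in(const std::vector<std::string> &names,
                     const std::string &name) {
    for (std::size_t i = 0; i < names.size(); ++i)
      if (names[i] == name) return static_cast<int>(i);
    return -1;
  }

  // Resolution inside the unit always uses the original names.
  int resolve_inner(const std::string &name) const {
    return find_in(original, name);
  }

  // Resolution outside the unit uses the new names when a list was given.
  int resolve_outer(const std::string &name) const {
    return find_in(renamed.empty() ? original : renamed, name);
  }
};
```

For the example query, a2 resolves outside the unit to the same position that c2 resolves to inside it, which is why both conditions apply to t1.c2.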
      
      Reviewed by Igor Babaev (igor@mariadb.com)
    • Yuchen Pei's avatar
      MDEV-15696 Implement SHOW CREATE SERVER · 41c6b844
      Yuchen Pei authored
      One change is that if the port is not supplied or is out of bounds, the
      old behaviour is to print 3306. The new behaviour is to not print
      the port (if not supplied) or to print the out-of-bounds value as given.
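One way to read the old and new printing rules is sketched below. The helper name, its parameters and the assumed valid range of 0 to 65535 are hypothetical; the commit message does not state the bounds or the exact output format.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: text printed for the port, or "" when nothing is printed.
inline std::string port_text(bool supplied, long port, bool new_behaviour) {
  const bool in_bounds = port >= 0 && port <= 65535;  // assumed bounds
  if (!new_behaviour) {
    // Old behaviour: fall back to 3306 when not supplied or out of bounds.
    return (supplied && in_bounds) ? std::to_string(port) : std::string("3306");
  }
  // New behaviour: print nothing when not supplied, else the value as given.
  return supplied ? std::to_string(port) : std::string();
}
```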
    • Yuchen Pei's avatar
      MDEV-34716 Allow arbitrary options in CREATE SERVER · 698e8fa3
      Yuchen Pei authored
      The existing syntax for CREATE SERVER is:
      
      CREATE [OR REPLACE] SERVER [IF NOT EXISTS] server_name
          FOREIGN DATA WRAPPER wrapper_name
          OPTIONS (option [, option] ...)
      
      option:
        { HOST character-literal
        | DATABASE character-literal
        | USER character-literal
        | PASSWORD character-literal
        | SOCKET character-literal
        | OWNER character-literal
        | PORT numeric-literal }
      
      With this change we have:
      
      option:
        { HOST character-literal
        | DATABASE character-literal
        | USER character-literal
        | PASSWORD character-literal
        | SOCKET character-literal
        | OWNER character-literal
        | PORT numeric-literal
        | PORT quoted-numerical-literal
        | identifier character-literal }
      
      We store these options as a JSON field in the mysql.servers system
      table. We retain the restriction that PORT needs to be a number, but
      also allow it to be a quoted number, so that SHOW CREATE SERVER can be
      used for dumping. Without an accompanying implementation of SHOW CREATE
      SERVER, some mysqldump tests will fail. Therefore this commit should
      be immediately followed by the one implementing SHOW CREATE SERVER,
      with testing covering both.
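The PORT rule (a number, or a quoted number) can be illustrated with a small validator. This is a sketch working on the literal's text; is_valid_port_literal is a hypothetical name, and the real check lives in the SQL parser rather than in a string function like this.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true if `value` is an unsigned decimal number, optionally wrapped
// in single quotes, e.g. 3306 or '3306'.
inline bool is_valid_port_literal(const std::string &value) {
  std::string digits = value;
  if (digits.size() >= 2 && digits.front() == '\'' && digits.back() == '\'')
    digits = digits.substr(1, digits.size() - 2);  // strip the quotes
  if (digits.empty()) return false;
  for (unsigned char ch : digits)
    if (!std::isdigit(ch)) return false;
  return true;
}
```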
    • Yuchen Pei's avatar
      MDEV-34716 Fix mysql.servers socket max length too short · 4feb58e8
      Yuchen Pei authored
      The limit of socket length on Unix according to libc is 108, see
      sockaddr_un::sun_path, but in the table it is a string of max length
      64, which results in truncation of the socket path and failure to
      connect by plugins that use servers, such as spider.
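The mismatch between the two limits can be shown directly. The sketch below reads the platform's sun_path capacity with sizeof and mimics the truncation a 64-character column would apply; the helper names are hypothetical, and the capacity is 108 on Linux with glibc but may differ on other systems.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/un.h>

// Old column length, from the commit message.
constexpr std::size_t kOldColumnLength = 64;
// Capacity of sun_path on this platform (108 on Linux/glibc).
constexpr std::size_t kSunPathSize = sizeof(sockaddr_un::sun_path);

// True if the path fits in sun_path together with its terminating NUL.
inline bool fits_in_sun_path(const std::string &path) {
  return path.size() < kSunPathSize;
}

// Truncation as a 64-character column would apply it.
inline std::string store_in_old_column(const std::string &path) {
  return path.substr(0, kOldColumnLength);
}
```

A path of 80 characters is a valid socket path for the operating system, yet it no longer round-trips through the old column.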
    • Yuchen Pei's avatar
      MDEV-34716 spider: some trivial cleanups and documentation · b200cf78
      Yuchen Pei authored
      - document tmp_share, which are temporary spider shares with only one
      link (no ha)
      - simplify spider_get_sys_tables_connect_info() where link_idx is
      always 0
    • Alexey Botchkov's avatar
      4af516c4
    • Christian Gonzalez's avatar
      Make SESSION_USER() comparable with CURRENT_USER() · c88cb0b7
      Christian Gonzalez authored
      Update `SESSION_USER()` behaviour to be comparable with `CURRENT_USER()`.
      `SESSION_USER()` will return the user and host columns from `mysql.user`
      used to authenticate the user when the session was created.
      
      Historically `SESSION_USER()` was an alias of the `USER()` function. The
      main difference from `USER()` behaviour after this change is that
      `SESSION_USER()` now returns the host column from `mysql.user` instead of
      the client host or IP.

      NOTE: The `SESSION_USER_IS_USER` old mode is added to make the change
      backward compatible.
      
      All new code of the whole pull request, including one or several files
      that are either new files or modified ones, are contributed under the
      BSD-new license. I am contributing on behalf of my employer
      Amazon Web Services, Inc.
    • Aleksey Midenkov's avatar
      MDEV-27293 Allow converting a versioned table from implicit · 4870d740
      Aleksey Midenkov authored
                 to explicit row_start/row_end columns
      
      When both added system fields have the same type (length, unsigned
      flag) as the old implicit system fields, rename the implicit system
      fields to the ones specified in ALTER and remove the SYSTEM_INVISIBLE
      flag. The correct PERIOD clause must be specified in ALTER as well.
      
      MDEV-34904 Inplace alter for implicit to explicit versioning is broken
      
      Whether ALTER goes inplace and how it goes inplace depends on
      handler_flags, which is derived from alter_info->flags by this logic:
      
        ha_alter_info->handler_flags|= (alter_info->flags & ~flags_to_remove);
      
      ALTER_VERS_EXPLICIT was not in flags_to_remove and its value (1ULL <<
      35) clashed with ALTER_ADD_NON_UNIQUE_NON_PRIM_INDEX.
      
      ALTER_VERS_EXPLICIT must not affect inplace; it is SQL-only, so we
      remove it from handler_flags.
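The clash is easy to reproduce in isolation. In this sketch both constants are given the value quoted in the commit message, SOME_OTHER_FLAG is a hypothetical unrelated bit, and to_handler_flags() is a stand-in for the masking expression shown above rather than server code.

```cpp
#include <cassert>
#include <cstdint>

using alter_flags_t = std::uint64_t;

// Value quoted in the commit message; per the message the two flags clashed.
constexpr alter_flags_t ALTER_VERS_EXPLICIT = 1ULL << 35;
constexpr alter_flags_t ALTER_ADD_NON_UNIQUE_NON_PRIM_INDEX = 1ULL << 35;
// Hypothetical unrelated flag, present only for the demonstration.
constexpr alter_flags_t SOME_OTHER_FLAG = 1ULL << 3;

// Stand-in for: handler_flags |= (alter_info->flags & ~flags_to_remove)
inline alter_flags_t to_handler_flags(alter_flags_t sql_flags,
                                      alter_flags_t flags_to_remove) {
  return sql_flags & ~flags_to_remove;
}
```

Without ALTER_VERS_EXPLICIT in flags_to_remove, the handler would see what looks like an index-add request; with it, only the genuine handler-level flags pass through.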
    • Oleg Smirnov's avatar
      MDEV-27277 Add a warning when max_sort_length is reached · 68ad4090
      Oleg Smirnov authored
      During query execution, some sorting and grouping operations
      on strings may be involved. The system variable max_sort_length defines
      the maximum number of bytes to use when comparing strings during
      sorting/grouping. Thus, the compared parts of strings may be shorter
      than their actual size, so the results of the query may not be
      sorted/grouped properly.
      To indicate that some comparisons were done on truncated lengths,
      a new warning has been introduced with this commit.
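The effect of the limit can be seen with a byte-wise toy comparison. The sketch below ignores collations entirely, and compare_for_sort and TruncatedCompare are hypothetical names; it only shows how two different strings can compare as equal once the limit cuts them, which is the situation the new warning reports.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct TruncatedCompare {
  int result;      // -1, 0 or 1
  bool truncated;  // true if either input was longer than the limit
};

// Compare only the first max_sort_length bytes of each string.
inline TruncatedCompare compare_for_sort(const std::string &a,
                                         const std::string &b,
                                         std::size_t max_sort_length) {
  const bool truncated =
      a.size() > max_sort_length || b.size() > max_sort_length;
  const int cmp = a.compare(0, max_sort_length, b, 0, max_sort_length);
  return {cmp < 0 ? -1 : (cmp > 0 ? 1 : 0), truncated};
}
```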
    • Alexander Barkov's avatar
      MDEV-27277 Add a warning when max_sort_length is reached · 882aa0a2
      Alexander Barkov authored
      Step#1: fixing the return type of strnxfrm() from size_t to this structure:
      
      typedef struct
      {
        size_t m_output_length;
        size_t m_source_length_used;
        uint m_warnings;
      } my_strnxfrm_ret_t;
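A function returning such a structure might look like the toy below. It is not a real collation transform: it just upper-cases bytes into a bounded buffer. The struct mirrors the fields listed above under a different name, and SKETCH_WARN_SOURCE_TRUNCATED is a hypothetical flag value, since the commit message does not list the actual warning bits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Mirrors the three fields of the structure in the commit message.
struct strnxfrm_ret_sketch {
  std::size_t m_output_length;       // bytes written to dst
  std::size_t m_source_length_used;  // bytes consumed from src
  unsigned m_warnings;               // bit mask of warnings
};

// Hypothetical warning bit: not all source bytes fit into dst.
constexpr unsigned SKETCH_WARN_SOURCE_TRUNCATED = 1u;

inline strnxfrm_ret_sketch toy_strnxfrm(unsigned char *dst, std::size_t dst_len,
                                        const unsigned char *src,
                                        std::size_t src_len) {
  const std::size_t n = src_len < dst_len ? src_len : dst_len;
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = static_cast<unsigned char>(std::toupper(src[i]));
  strnxfrm_ret_sketch ret;
  ret.m_output_length = n;
  ret.m_source_length_used = n;
  ret.m_warnings = (n < src_len) ? SKETCH_WARN_SOURCE_TRUNCATED : 0u;
  return ret;
}
```

Returning the source length used and a warning mask, rather than a bare size_t, is what lets the caller detect that a comparison key was built from a truncated value.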
    • Yuchen Pei's avatar
      MDEV-25008: UPDATE/DELETE: Cost-based choice IN->EXISTS vs Materialization · dee4a4a6
      Yuchen Pei authored
      Single-table UPDATE/DELETE didn't provide an outer_lookup_keys value for
      subqueries. This made it impossible to make a meaningful choice between
      the IN->EXISTS and Materialization strategies for subqueries.
      
      Fix this:
      * Make UPDATE/DELETE save Sql_cmd_dml::scanned_rows,
      * Then, subquery's JOIN::choose_subquery_plan() can fetch it from
      there for outer_lookup_keys
      
      Details:
      UPDATE/DELETE now calls select_lex->optimize_unflattened_subqueries()
      twice, like SELECT does (first call optimize_constant_subqueries() in
      JOIN::optimize_inner(), then call optimize_unflattened_subqueries() in
      JOIN::optimize_stage2()):
      1. Call it with const_only=true before any optimizations. This allows
      the range optimizer and others to use the values of cheap const
      subqueries.
      2. Call it with const_only=false after the range optimizer, partition
      pruning, etc. The outer_lookup_keys value is provided, so it's possible
      to pick a good subquery strategy.

      Note: PROTECT_STATEMENT_MEMROOT requires that the first SP execution
      performs subquery optimization for all subqueries, even for degenerate
      query plans like "Impossible WHERE". Because of that, we ensure that the
      call to optimize_unflattened_subqueries() (with const_only=false) still
      happens even for degenerate query plans, as was the case before this
      change.
    • Yuchen Pei's avatar
    • Alexander Barkov's avatar
      MDEV-15751 CURRENT_TIMESTAMP should return a TIMESTAMP [WITH TIME ZONE?] · dad24a55
      Alexander Barkov authored
      Changing the return type of the following functions:
        - CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(), NOW()
        - SYSDATE()
        - FROM_UNIXTIME()
      from DATETIME to TIMESTAMP.
      
      Note, the old function NOW() returning DATETIME is still available
      as LOCALTIMESTAMP or LOCALTIMESTAMP(), e.g.:
      
        SELECT
          LOCALTIMESTAMP,     -- DATETIME
          CURRENT_TIMESTAMP;  -- TIMESTAMP
      
      The change in the functions' return data type fixes some problems
      that occurred near a DST change:
      
      - Problem #1
      
      INSERT INTO t1 (timestamp_field) VALUES (CURRENT_TIMESTAMP);
      INSERT INTO t1 (timestamp_field) VALUES (COALESCE(CURRENT_TIMESTAMP));
      
      could result in two different values being inserted.
      
      - Problem #2
      
      INSERT INTO t1 (timestamp_field) VALUES (FROM_UNIXTIME(1288477526));
      INSERT INTO t1 (timestamp_field) VALUES (FROM_UNIXTIME(1288477526+3600));
      
      could result in two equal TIMESTAMP values near a DST change.
      
      Additional changes:
      
      - FROM_UNIXTIME(0) now returns SQL NULL instead of '1970-01-01 00:00:00'
        (assuming time_zone='+00:00')
      
      - UNIX_TIMESTAMP('1970-01-01 00:00:00') now returns SQL NULL instead of 0
        (assuming time_zone='+00:00')
      
      These additional changes are needed for consistency with TIMESTAMP fields,
      which cannot store '1970-01-01 00:00:00 +00:00'.
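Problem #2 comes down to arithmetic on UTC offsets. The sketch below uses a hypothetical zone whose offset drops from +4h to +3h at an assumed transition instant (epoch 1288479600) and shows that the two epoch values from the example, one hour apart, map to the same wall-clock value once the zone information is dropped, as happens with a DATETIME intermediate.

```cpp
#include <cassert>
#include <cstdint>

// Assumed fall-back instant of the hypothetical zone (seconds since epoch).
constexpr std::int64_t kTransitionEpoch = 1288479600;

// UTC+4 before the transition, UTC+3 from the transition on.
inline std::int64_t utc_offset_seconds(std::int64_t epoch) {
  return epoch < kTransitionEpoch ? 4 * 3600 : 3 * 3600;
}

// DATETIME-like value: wall-clock seconds with no zone information attached.
inline std::int64_t to_wall_clock(std::int64_t epoch) {
  return epoch + utc_offset_seconds(epoch);
}
```

Because the wall-clock value alone cannot tell the two instants apart, converting it back to a TIMESTAMP has to pick one of them; returning TIMESTAMP directly avoids the lossy step.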
    • Alexander Barkov's avatar
      MDEV-12252 ROW data type for stored function return values · 3624fb78
      Alexander Barkov authored
      Adding support for the ROW data type in the stored function RETURNS clause:
      
      - explicit ROW(..members...) for both sql_mode=DEFAULT and sql_mode=ORACLE
      
        CREATE FUNCTION f1() RETURNS ROW(a INT, b VARCHAR(32)) ...
      
      - anchored "ROW TYPE OF [db1.]table1" declarations for sql_mode=DEFAULT
      
        CREATE FUNCTION f1() RETURNS ROW TYPE OF test.t1 ...
      
      - anchored "[db1.]table1%ROWTYPE" declarations for sql_mode=ORACLE
      
        CREATE FUNCTION f1() RETURN test.t1%ROWTYPE ...
      
      Adding support for anchored scalar data types in RETURNS clause:
      
      - "TYPE OF [db1.]table1.column1" for sql_mode=DEFAULT
      
        CREATE FUNCTION f1() RETURNS TYPE OF test.t1.column1;
      
      - "[db1.]table1.column1%TYPE" for sql_mode=ORACLE
      
        CREATE FUNCTION f1() RETURN test.t1.column1%TYPE;
      
      Details:
      
      - Adding a new sql_mode_t parameter to
          sp_head::create()
          sp_head::sp_head()
          sp_package::create()
          sp_package::sp_package()
        to guarantee early initialization of sp_head::m_sql_mode.
        Before this change, this member was not initialized at all during
        CREATE FUNCTION/PROCEDURE/PACKAGE statements, and was not used.
        Now it needs to be initialized to write properly the
        mysql.proc.returns column, according to the create time sql_mode.
      
      - Code refactoring to make things simpler and functions smaller:
      
        * Adding a new method
          Field_row::row_create_fields(THD *thd, List<Spvar_definition> *list)
          to make a Virtual_tmp_table with Fields for ROW members
          from an explicit definition.
      
        * Adding a new method
          Field_row::row_create_fields(THD *thd, const Spvar_definition &def)
          to make a Virtual_tmp_table with Fields for ROW members
          from an explicit or a table anchored definition.
      
        * Adding a new method
          Item_args::add_array_of_item_field(THD *thd, const Virtual_tmp_table &vtable)
          to create an array of Item_field corresponding to all Field instances
          in a Virtual_tmp_table
      
        * Removing Item_field_row::row_create_items(). It was decomposed
          into the new methods described above.
      
        * Moving the code from the loop body in sp_rcontext::init_var_items()
          into a separate method Spvar_definition::make_item_field_row(),
          to make the code clearer (smaller functions).
          make_item_field_row() itself uses the new methods described above.
      
      - Changing the data type of sp_head::m_return_field_def
        from Column_definition to Spvar_definition.
        So now it supports not only SQL column field types,
        but also explicit ROW and anchored ROW data types,
        as well as anchored column types.
      
      - Adding a new Column_definition parameter to sp_head::create_result_field().
        Before this patch, create_result_field() took the definition only
        from m_return_field_def. Now it's also called with a local Column_definition
        variable which contains the explicit definition resolved from an
        anchored definition.
      
      - Modifying sql_yacc.yy to support the new grammar.
        Adding new helper methods:
          * sf_return_fill_definition_row()
          * sf_return_fill_definition_rowtype_of()
          * sf_return_fill_definition_type_of()
      
      - Fixing tests in:
        * Virtual_tmp_table::setup_field_pointers() in sql_select.cc
        * Send_field::normalize() in field.h
        * store_column_type()
        to prevent calling Type_handler_row::field_type(),
        which is implemented as a DBUG_ASSERT(0).
        Before this patch the affected methods and functions were called only
        for scalar data types. Now ROW is also possible.
      
      - Adding a new virtual method Field::cols()
      
      - Overriding methods:
         Item_func_sp::cols()
         Item_func_sp::element_index()
         Item_func_sp::check_cols()
         Item_func_sp::bring_value()
        to support the ROW data type.
      
      - Extending the rule sp_return_type to support
        * explicit ROW and anchored ROW data types
        * anchored scalar data types
      
      - Overriding Field_row::sql_type() to print
        the data type of an explicit ROW.
    • Sergei Golubchik's avatar
      post-merge changes · f5e4c461
      Sergei Golubchik authored
      * remove duplicate test file
      * move all uuidv7 tests into plugin/type_uuid/mysql-test/type_uuid/
      * remove mysys/ changes
      * auto my_random_bytes() fallback - removes duplicate code from uuid,
        and fixes all other users of my_random_bytes() that don't check
        the return value (because, perhaps, they don't need crypto-strong
        random bytes)
      * End of 11.6 -> 11.7 in tests
      * clarify the warning text
      * UUID_VERSION_MASK()/UUID_VARIANT_MASK() must not depend on the version
      * allow 4x more monotonic uuidv7 per millisecond - instead of stretching
        1000 microseconds over 12 bits, let's use extra 2 bits as a counter
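One possible reading of the last item is sketched below: the old scheme scales the microsecond-within-millisecond value to fill 12 bits, while the new one keeps the microsecond value in 10 bits and uses the remaining 2 bits as a counter, giving four distinct ordered values per microsecond. Both function names and the exact bit layout are assumptions; the commit message does not spell out the encoding.

```cpp
#include <cassert>

// Old idea (as read from the message): stretch 0..999 microseconds over 12 bits.
inline unsigned stretched_12bits(unsigned usec) {
  return usec * 4096u / 1000u;
}

// New idea (assumed layout): 10 bits of microseconds, 2 low bits of counter.
inline unsigned usec_with_counter_12bits(unsigned usec, unsigned counter) {
  return (usec << 2) | (counter & 3u);
}
```

Under this layout the largest value, for 999 microseconds and counter 3, still fits in 12 bits, and values stay ordered first by microsecond and then by counter.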
    • StefanoPetrilli's avatar
      89e0944d
    • Daniel Black's avatar
      MDEV-32583 UUID() should be treated as stochastic for the purposes of forcing query materialization · 8ea4590a
      Daniel Black authored
      Port 9e800eda changing lex->safe_to_cache_query
      to lex->uncacheable(UNCACHEABLE_RAND).
    • Alexander Barkov's avatar
      cleanup: MDEV-11339 Implement native UUID4 function · c5cc46f3
      Alexander Barkov authored
      - Moving the class UUIDv1 into a separate file sql_type_uuid_v1.h
      
      - Adding a new class UUIDv4, similar to UUIDv1
      
      - Changing how my_random_bytes() failures are handled.
        Instead of raising an error it now raises a note.
        Reasoning: if we're in the middle of a multi-million row
        transaction and one UUIDv4 generation fails, it's not a good
        idea to throw away the entire transaction. Instead, let's
        generate bytes using a my_rnd() loop.
      
      - Adding a new test func_uuid_v4.test to demonstrate that the UUIDv4()
        returned type is "UUID NOT NULL".
      
      - Adding a new test func_uuidv4_debug.test to emulate my_random_bytes()
        failures
      
      - Adding a template Item_func_uuid_vx to share the code
        between the implementations of UUID() and UUIDv4().
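The fallback policy can be sketched as follows. The callback stands in for my_random_bytes() and std::mt19937 stands in for the my_rnd() loop; both substitutions, the struct and the function name are assumptions for illustration. The version and variant bit positions follow the standard UUID layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>

using Uuid = std::array<std::uint8_t, 16>;

struct UuidV4Result {
  Uuid bytes;
  bool note_raised;  // true if the strong generator failed
};

// `strong_random` stands in for my_random_bytes(): it returns false on failure.
inline UuidV4Result make_uuid_v4(
    const std::function<bool(std::uint8_t *, std::size_t)> &strong_random,
    std::mt19937 &fallback_rng) {
  UuidV4Result r{};
  r.note_raised = false;
  if (!strong_random(r.bytes.data(), r.bytes.size())) {
    // Do not fail the statement: raise a note and use a weaker generator.
    r.note_raised = true;
    for (auto &b : r.bytes)
      b = static_cast<std::uint8_t>(fallback_rng() & 0xFF);
  }
  r.bytes[6] = static_cast<std::uint8_t>((r.bytes[6] & 0x0F) | 0x40);  // version 4
  r.bytes[8] = static_cast<std::uint8_t>((r.bytes[8] & 0x3F) | 0x80);  // variant 10xx
  return r;
}
```

Either way the caller gets a well-formed version 4 value; the note only records that the bytes did not come from the crypto-strong source.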
    • StefanoPetrilli's avatar
      2f288273
    • Sergei Golubchik's avatar
    • Monty's avatar
      MDEV-33144 Implement the Percona variable slow_query_log_always_write_time · 295c0ebf
      Monty authored
      This task is inspired by the Percona implementation of
      slow_query_log_always_write_time.
      
      This task implements the variable log_slow_always_query_time (name
      matching other MariaDB variables using the slow query log). The
      default value for the variable is 31536000, which makes MariaDB
      compatible with older installations.
      
      For queries with execution time longer than log_slow_always_query_time
      the variables log_slow_rate_limit and log_slow_min_examined_row_limit
      will be ignored and the query will be written to the slow query log
      if there are no other limitations (like log_slow_filter etc).
      
      Other things:
      - long_query_time internal variable renamed to log_slow_query_time.
      - More descriptive information for "log_slow_query_time".
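The interaction between the thresholds can be summarised in a small decision function. This is a simplified sketch: the structs are hypothetical, log_slow_filter is left out, and the rate limit is modelled as "log every Nth query", which is an assumption made for the example rather than a description of the server's logic.

```cpp
#include <cassert>
#include <cstdint>

struct SlowLogSettings {
  double log_slow_query_time;         // seconds
  double log_slow_always_query_time;  // seconds; default 31536000
  std::uint64_t log_slow_rate_limit;  // assumed meaning: log every Nth query
  std::uint64_t log_slow_min_examined_row_limit;
};

struct QueryStats {
  double seconds;
  std::uint64_t examined_rows;
  std::uint64_t query_number;  // hypothetical running counter
};

inline bool should_log_slow_query(const SlowLogSettings &s,
                                  const QueryStats &q) {
  if (q.seconds < s.log_slow_query_time) return false;  // not slow at all
  // Very slow queries bypass the rate and examined-row limits.
  if (q.seconds > s.log_slow_always_query_time) return true;
  if (q.examined_rows < s.log_slow_min_examined_row_limit) return false;
  return s.log_slow_rate_limit <= 1 ||
         q.query_number % s.log_slow_rate_limit == 0;
}
```

With the default of 31536000 seconds the bypass never triggers in practice, so the existing limits keep behaving as before unless the variable is lowered.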
  2. 14 Sep, 2024 17 commits