1. 02 Aug, 2024 1 commit
  2. 01 Aug, 2024 3 commits
  3. 31 Jul, 2024 11 commits
  4. 30 Jul, 2024 6 commits
    • MDEV-34625 Fix undefined behavior of using uninitialized member variables · 811614d4
      Hugo Wen authored
      Commit a8a75ba2 causes the MariaDB server to crash, usually with signal
      11, at random code locations due to invalid pointer values during any
      table operation. This issue occurs when the server is built with -O3 and
      other customized compiler flags.
      
      For example, the command `use db1;` causes the server to crash in the
      `check_table_access` function at sql_parse.cc:7080 because
      `tables->correspondent_table` holds the invalid pointer value 0x1.
      
      The crashes are due to undefined behavior from using uninitialized
      variables. The problematic commit a8a75ba2 introduces code that
      allocates memory and sets it to 0 using thd->calloc before initializing
      it with a placement new operation.
      This relies on the zeroed memory to initialize member variables that
      the constructor does not set explicitly. However, the compiler can
      optimize out the memset/bfill, leaving those members uninitialized and
      causing unpredictable behavior.
      
      Once a constructor has initialized an object, any member it left
      uninitialized has an indeterminate value, and reading it is undefined
      behavior. The state of the memory before the constructor runs, whether
      it was set by memset or used for other purposes, is irrelevant after
      the placement new operation.
      
      This behavior can be demonstrated with this
      [test](https://gcc.godbolt.org/z/5n87z1raG) I wrote to examine the
      assembly code. The code in MariaDB can be abstracted to the following,
      though it has many layers wrapped around it and more complex logic,
      causing slight differences in optimization in the MariaDB build.
      To summarize, on x86, the memset in the following code is optimized out
      with both -O2 and -O3 in GCC 13, and is only preserved in the much older
      GCC 4.9.
      
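          #include <cstdlib>
          #include <cstring>
          #include <new>

          struct S {
            int i;     // not initialized in the constructor
            S() {}
          };
          int bar() {
            void *buf = std::malloc(sizeof(S));
            std::memset(buf, 0, sizeof(S));  // optimized out
            S* s = new(buf) S;
            return s->i;                     // reads an indeterminate value: UB
          }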
      
      With GCC 13 -O3:
      
          bar():
                sub     rsp, 8
                mov     edi, 4
                call    malloc
                mov     eax, DWORD PTR [rax]
                add     rsp, 8
                ret
      
      With GCC 4.9 -O3:
      
          bar():
                sub     rsp, 8
                mov     edi, 4
                call    malloc
                mov     DWORD PTR [rax], 0
                xor     eax, eax
                add     rsp, 8
                ret
      
      The fix makes the constructor itself run the reset() function, which
      performs the memset/bfill(0) operation, so the members are zeroed as
      part of construction. After applying the fix, the crash is gone.
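
      A minimal C++ sketch of the pattern the fix adopts, assuming a
      simplified hypothetical struct `S` in place of the real MariaDB class
      (its members and the `make_s()` helper are illustrative only): the
      constructor calls reset(), so the zeroing is part of construction and
      the result no longer depends on what the buffer held before placement new.

      ```cpp
      #include <cassert>
      #include <cstring>
      #include <new>

      // Hypothetical stand-in for a class whose constructor does not name
      // every member; the constructor zeroes the whole object via reset().
      struct S {
        int i;       // not set by name in the constructor
        void *ptr;   // e.g. a pointer member such as correspondent_table
        void reset() { std::memset(static_cast<void *>(this), 0, sizeof(*this)); }
        S() { reset(); }   // zeroing happens during construction
      };

      // Constructs an S over a buffer that first holds garbage (0xAB bytes),
      // showing that the previous contents of the memory no longer matter.
      S *make_s(void *buf) {
        std::memset(buf, 0xAB, sizeof(S));   // may be optimized out; harmless now
        return new (buf) S;
      }
      ```

      In new code, default member initializers (for example `int i = 0;`)
      would achieve the same without a memset.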
      
      All new code of the whole pull request, including one or several files
      that are either new files or modified ones, are contributed under the
      BSD-new license. I am contributing on behalf of my employer Amazon Web
      Services.
    • MDEV-34580: Assertion `(key_part->key_part_flag & 4) == 0' failed key_hashnr · fdda8171
      Sergei Petrunia authored
      Remove an assert added by the fix for MDEV-34417. A BNL-H join can be
      used with prefix keys. This happens when there are real prefix indexes
      on the equi-join columns (although that probably doesn't make much sense).
      
      Anyway, remove the assert. The code receives properly truncated key values
      for hashing/comparison, so it can handle them just fine.
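
      Why truncated keys are harmless for hashing can be sketched as
      follows. This is a hypothetical model, not the BNL-H join code:
      `truncate_key()` and `count_prefix_matches()` are illustrative names,
      and keys are plain strings truncated to a prefix length on both sides.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <unordered_map>
      #include <vector>

      // Truncate a value to the prefix length, as a prefix index stores it.
      std::string truncate_key(const std::string &value, std::size_t prefix_len) {
        return value.substr(0, prefix_len);   // whole value if shorter
      }

      // Hash the truncated build-side keys, then probe with truncated keys.
      // Matches found this way are only candidates; the full equality of the
      // join condition still has to be checked on the complete values.
      std::size_t count_prefix_matches(const std::vector<std::string> &build,
                                       const std::vector<std::string> &probe,
                                       std::size_t prefix_len) {
        std::unordered_multimap<std::string, std::size_t> table;
        for (std::size_t i = 0; i < build.size(); i++)
          table.emplace(truncate_key(build[i], prefix_len), i);
        std::size_t matches = 0;
        for (const auto &p : probe)
          matches += table.count(truncate_key(p, prefix_len));
        return matches;
      }
      ```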
    • MDEV-34357 InnoDB: Assertion failure in file ./storage/innobase/page/page0zip.cc line 4211 · ee5f7692
      Thirunarayanan Balathandayuthapani authored
      During an InnoDB root page split, InnoDB does the following:
      1) Move the root records to a new page (p1).
      2) Empty the root and insert the node pointer into the root page.
      3) Split the new page and make the resulting pages child nodes.
      4) Find the split record and allocate another new page (p2)
      to the index.
      5) Record the predecessor of the supremum record of page p2
      as ret.
      6) In page_copy_rec_list_start(), move the records from p1 to p2
      up to the split record.
      7) Because the table uses the compressed row format, InnoDB attempts
      to compress page p2, which fails (due to innodb_compression_level = 0).
      8) Since the compression fails, InnoDB gets the number of preceding
      records (ret_pos) of the record ret on page p2.
      9) Page p2 is a new page, so ret points to the infimum record and
      ret_pos can be 0. InnoDB has a wrong check that ret_pos must not
      be 0, and reports corruption. InnoDB has a similar wrong check in
      page_copy_rec_list_end().
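
      Why ret_pos can legitimately be 0 can be modeled with a toy page.
      Everything below (the `Page` type, the record markers and both check
      functions) is hypothetical and only mirrors the situation described
      above: on a freshly allocated page, the predecessor of the supremum is
      the infimum, which has no records before it.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <optional>
      #include <vector>

      // Toy page: recs.front() is the infimum, recs.back() the supremum.
      struct Page {
        std::vector<char> recs;   // 'I' = infimum, 'S' = supremum, 'u' = user
      };

      // Position of the record just before the supremum ("ret").
      std::size_t pred_of_supremum(const Page &p) { return p.recs.size() - 2; }

      // Number of records preceding position pos; the infimum has 0.
      // std::nullopt models a position that is not on the page (corruption).
      std::optional<std::size_t> n_recs_before(const Page &p, std::size_t pos) {
        if (pos >= p.recs.size()) return std::nullopt;
        return pos;
      }

      // The faulty check also rejects ret_pos == 0; the corrected check only
      // rejects a position that could not be determined.
      bool buggy_check(std::optional<std::size_t> ret_pos) {
        return ret_pos.has_value() && *ret_pos != 0;
      }
      bool fixed_check(std::optional<std::size_t> ret_pos) {
        return ret_pos.has_value();
      }
      ```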
    • MDEV-34422 Corrupted ib_logfile0 due to uninitialized log_sys.lsn_lock · 1c8af2ae
      Marko Mäkelä authored
      In commit bf0b82d2 (MDEV-33515)
      the function log_t::init_lsn_lock() was removed. This was fine on
      those platforms where InnoDB uses futex-based mutexes (Linux, FreeBSD,
      OpenBSD, NetBSD, DragonflyBSD).
      
      Dave Gosselin debugged this on Apple macOS and submitted a fix where
      pthread_mutex_wrapper::pthread_mutex_wrapper() would invoke init().
      We do not really need that; we only need to invoke lsn_lock.init()
      like we used to do before commit bf0b82d2.
      This should be a no-op for the futex-based mutexes, which intentionally
      rely on zero initialization.
      
      The missing pthread_mutex_init() call would cause race conditions
      and corruption of log_sys.buf because multiple threads could
      apparently hold log_sys.lsn_lock concurrently in
      log_t::append_prepare().  The error would be caught by a debug
      assertion in log_t::write_buf(), or in non-debug builds by the
      fact that the server cannot be restarted due to an apparently
      missing FILE_CHECKPOINT record (because it had been written
      to the wrong offset in log_sys.buf).
      
      The failure in log_t::append_prepare() was caught on Microsoft Windows
      after enabling SUX_LOCK_GENERIC and therefore forcing the use of
      pthread_mutex_wrapper for the log_sys.lsn_lock.  It appears to be fine
      to omit the pthread_mutex_init() call on GNU/Linux.
      
      log_t::create(): Invoke lsn_lock.init().
      
      log_t::close(): Invoke lsn_lock.destroy().
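
      The create()/close() pairing can be sketched with a hypothetical
      wrapper around a POSIX mutex. The class and member names below are
      illustrative and are not InnoDB's pthread_mutex_wrapper or log_t; the
      point is that a zero-filled pthread_mutex_t is not guaranteed to be a
      valid mutex, so initialization must happen before first use.

      ```cpp
      #include <cassert>
      #include <pthread.h>

      // Hypothetical mutex wrapper: init() must run before the first lock().
      class mutex_wrapper {
        pthread_mutex_t m_;
      public:
        bool init() { return pthread_mutex_init(&m_, nullptr) == 0; }
        void destroy() { pthread_mutex_destroy(&m_); }
        void lock() { pthread_mutex_lock(&m_); }
        void unlock() { pthread_mutex_unlock(&m_); }
      };

      // Hypothetical log object following the fix: create() initializes
      // lsn_lock and close() destroys it.
      struct log_like {
        mutex_wrapper lsn_lock;
        bool create() { return lsn_lock.init(); }
        void close() { lsn_lock.destroy(); }
      };
      ```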
      
      To better catch this kind of issue in the future by simply defining
      SUX_LOCK_GENERIC on any platform, a separate debug instrumentation patch
      will be applied to the 10.6 branch later.
      
      Reviewed by: Debarun Banerjee
    • MDEV-34181 Instant table aborts after discard tablespace · c038b3c0
      Thirunarayanan Balathandayuthapani authored
      - Commit 85db5347 (MDEV-33400)
      retains the instantness in the table definition after DISCARD
      TABLESPACE, so there is no need to assign n_core_null_bytes
      during instant table preparation unless it has not been
      initialized.
    • MDEV-33087 ALTER TABLE...ALGORITHM=COPY should build indexes more efficiently · cc8eefb0
      Thirunarayanan Balathandayuthapani authored
      - During the copy algorithm, InnoDB should use the bulk insert operation
      instead of row-by-row inserts. By doing this, the copy algorithm
      can build indexes efficiently. This optimization is disabled
      for temporary tables, system-versioned tables and tables that have
      foreign key relations.
      
      Introduced the variable innodb_alter_copy_bulk to allow
      the bulk insert operation for copying ALTER TABLE operations
      inside InnoDB. It is enabled by default.
      
      ha_innobase::extra(): HA_EXTRA_END_ALTER_COPY mode tries to apply
      the buffered bulk insert operation and updates the non-persistent
      table stats.
      
      row_merge_bulk_t::write_to_index(): Update stat_n_rows after
      applying the bulk insert operation.
      
      row_ins_clust_index_entry_low(): In case of copy algorithm,
      switch to bulk insert operation.
      
      copy_data_error_ignore(): Handles the error while copying
      the data from source to target file.
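
      The idea of buffering rows during the copy and building the index
      once at the end can be sketched as follows. This is a simplified,
      hypothetical model: a sorted vector stands in for an index, and
      `bulk_index_builder` and `end_copy()` are not InnoDB names; end_copy()
      plays the role that applying the buffered bulk insert plays at the end
      of the copy, including refreshing a row-count statistic.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <utility>
      #include <vector>

      class bulk_index_builder {
        std::vector<int> buffer_;   // keys collected row by row during the copy
        std::vector<int> index_;    // the built "index", kept sorted
        std::size_t stat_n_rows_ = 0;
      public:
        // Called once per copied row: just buffer the key, no ordered insert.
        void add_row(int key) { buffer_.push_back(key); }

        // At the end of the copy: sort once, build the index, update stats.
        void end_copy() {
          std::sort(buffer_.begin(), buffer_.end());
          index_ = std::move(buffer_);
          buffer_.clear();
          stat_n_rows_ = index_.size();
        }
        const std::vector<int> &index() const { return index_; }
        std::size_t stat_n_rows() const { return stat_n_rows_; }
      };
      ```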
  5. 29 Jul, 2024 5 commits
    • MDEV-34506 2nd execution name resolution problem with pushdown into unions · 48b256a7
      Rex authored
      Statements affected by this bug need all of the following to be true:
      1) a derived table or view whose specification contains a set
           operation at the top level;
      2) a grouping operator (group by/having) operating on a column alias
           other than in the first select of the union/intersect;
      3) an outer condition that will be pushed into all selects in this
           union/intersect, either into the where or having clause.
      
      When pushing a condition into all selects of a unit with more than one
      select, pushdown_cond_for_derived() renames items so we can re-use the
      condition being pushed.
      These names need to be saved and restored for correct name resolution
      on the second execution of prepared statements.
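
      Saving and restoring names around a pushdown can be sketched like
      this. `Item` and `name_saver` are hypothetical illustrative types, not
      the server's Item classes or the actual fix; the sketch only shows that
      names changed for one execution are put back before the next one.

      ```cpp
      #include <cassert>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical item whose name (e.g. a column alias) may be renamed.
      struct Item { std::string name; };

      // Remembers original names and restores them after execution, so that
      // a re-executed prepared statement resolves names against the originals.
      class name_saver {
        std::vector<std::pair<Item *, std::string>> saved_;
      public:
        void save(Item *item) { saved_.emplace_back(item, item->name); }
        void restore() {
          for (auto &[item, name] : saved_) item->name = name;
          saved_.clear();
        }
      };
      ```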
      
      Reviewed by Igor Babaev (igor@mariadb.com)
    • MDEV-34664: Add an option to fix InnoDB's doubling of secondary index cardinalities · 4bf7c966
      Monty authored
      (With trivial fixes by sergey@mariadb.com)
      Added option fix_innodb_cardinality to optimizer_adjust_secondary_key_costs
      
      Using fix_innodb_cardinality disables the 'divide by 2' of rec_per_key_int
      in InnoDB that in effect doubles the Cardinality for secondary keys.
      This has the biggest effect for indexes where a few rows have the same key
      value. Using this may also cause table scans for very small tables (which
      in some cases may be better than an index scan).
      
      The user-visible effect is that 'SHOW INDEX FROM table_name' will show
      the true Cardinality for InnoDB (and not 2x the real value). It will
      also allow the optimizer to choose a better index in some cases, as the
      division by 2 could have a bad effect for tables with 2-5 identical values
      per key.
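
      The arithmetic behind the doubling can be sketched with a simplified
      model. This is hypothetical, not InnoDB's code: it assumes
      Cardinality = n_rows / rec_per_key, models the 'divide by 2' as an
      integer halving of rec_per_key, and clamps the result to at least 1.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Average rows per distinct key value; halve = true models the extra
      // 'divide by 2' that fix_innodb_cardinality disables.
      std::uint64_t rec_per_key(std::uint64_t n_rows, std::uint64_t n_distinct,
                                bool halve) {
        std::uint64_t rpk = n_distinct ? n_rows / n_distinct : 1;
        if (halve) rpk /= 2;
        return rpk ? rpk : 1;   // keep at least one row per key
      }

      // Cardinality as shown by SHOW INDEX in this model.
      std::uint64_t cardinality(std::uint64_t n_rows, std::uint64_t rpk) {
        return n_rows / rpk;
      }
      ```

      With 4 rows per key value, 1000 rows give a Cardinality of 250 without
      the halving and 500 with it. With 3 rows per key value, the integer
      halving in this model turns rec_per_key into 1, so the index looks
      unique, which illustrates why a few identical values per key are
      affected most.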
      
      A few notes about using fix_innodb_cardinality:
      - It has a direct effect on SHOW INDEX FROM table_name. SHOW INDEX
        will also update the statistics in the table share.
      - The effect of fix_innodb_cardinality on query plans or EXPLAIN
        is only visible after the first open of the table. This is why one
        must run FLUSH TABLES or use SHOW INDEX for the option to take effect.
      - Using fix_innodb_cardinality can thus affect the query plans of all
        users who use the same tables.
      
      Because of this, it is strongly recommended to set
      optimizer_adjust_secondary_key_costs=fix_innodb_cardinality mainly
      in configuration files, to avoid causing issues for other users.
    • MDEV-34502 fixup: Do not cripple MSAN · 7e5c9ccd
      Marko Mäkelä authored
      We need to work around deficiencies of Valgrind, and apparently
      the previous work-around attempts
      (such as d247d649) do not work
      anymore, definitely not on recent clang-based compilers.
      
      MemorySanitizer should be fine; unfortunately we set HAVE_valgrind for it
      as well.
    • MDEV-34458: Remove more traces of BTR_MODIFY_PREV · 7ead48a7
      Marko Mäkelä authored
      In commit 2f6df937
      we fixed an observed case of the bug by removing
      some code related to the no longer needed
      BTR_MODIFY_PREV mode.
      
      In commit 73ad436e
      an alternative fix was applied that also fixes the
      BTR_SEARCH_PREV case.
      
      Let us clean up some implicit references to BTR_MODIFY_PREV
      that were missed in 2f6df937.
      
      btr_pcur_move_backward_from_page(): Assume that the latch mode was
      BTR_SEARCH_LEAF.
      
      btr_pcur_move_to_prev(): Assert that the latch mode is BTR_SEARCH_LEAF.
      This function is mostly invoked in row0sel.cc for read operations,
      as well as in row0merge.cc for reading from the clustered index.
      All callers indeed use a cursor in the BTR_SEARCH_LEAF mode.
    • MDEV-34565: SIGILL due to OS not supporting AVX512 · 232d7a5e
      Marko Mäkelä authored
      It is not sufficient to check that the CPU supports the necessary
      instructions. Also the operating system (or virtual machine hypervisor)
      must enable all the AVX registers to be saved and restored on a
      context switch.
      
      Because clang 8 does not support the compiler intrinsic _xgetbv(),
      we will require clang 9 or later for enabling the use of VPCLMULQDQ
      and the related AVX512 features.
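
      A sketch of the two-part check (CPU support plus OS-enabled register
      state) for GCC or clang on x86, assuming <cpuid.h> is available. It
      reads XCR0 with inline assembly instead of the _xgetbv() intrinsic, and
      the function names are hypothetical; this is not the MariaDB
      implementation. It checks only AVX512F; a VPCLMULQDQ check would also
      test bit 10 of ECX from CPUID leaf 7.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #if defined(__x86_64__) || defined(__i386__)
      #include <cpuid.h>
      #endif

      // XCR0 state components the OS must enable before AVX-512 registers
      // are saved/restored on a context switch: SSE (bit 1), AVX (bit 2),
      // opmask (bit 5), ZMM_Hi256 (bit 6) and Hi16_ZMM (bit 7).
      constexpr std::uint64_t XCR0_AVX512_MASK = 0xE6;

      bool xcr0_allows_avx512(std::uint64_t xcr0) {
        return (xcr0 & XCR0_AVX512_MASK) == XCR0_AVX512_MASK;
      }

      bool avx512_usable() {
      #if defined(__x86_64__) || defined(__i386__)
        unsigned eax, ebx, ecx, edx;
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
        if (!(ecx & (1u << 27))) return false;     // OSXSAVE: XGETBV is usable
        if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
        if (!(ebx & (1u << 16))) return false;     // AVX512F supported by CPU
        unsigned lo, hi;
        __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));  // read XCR0
        return xcr0_allows_avx512((std::uint64_t(hi) << 32) | lo);
      #else
        return false;                               // not x86: no AVX-512
      #endif
      }
      ```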
  6. 27 Jul, 2024 1 commit
  7. 25 Jul, 2024 1 commit
  8. 24 Jul, 2024 2 commits
  9. 23 Jul, 2024 3 commits
    • MDEV-34066 Output of SHOW ENGINE INNODB STATUS uses the nanoseconds suffix for microseconds · 3359ac09
      Thirunarayanan Balathandayuthapani authored
      - This issue is caused by commit e71e6133
      (MDEV-24671). Change the output of the transaction lock wait
      time to use the microseconds suffix.
    • MDEV-34634 Types mismatch when cloning items causes debug assertion · c91aeb37
      Oleg Smirnov authored
      A new runtime diagnostic introduced with MDEV-34490 has detected
      that `Item_int_with_ref` incorrectly returns an instance of its ancestor
      class `Item_int`. This commit fixes that.
      
      In addition, this commit reverts a part of the diagnostic related
      to `clone_item()` checks. As it turned out, `clone_item()` is not required
      to return an object of the same class as the cloned one. For example,
      look at `Item_param::clone_item()`: it can return objects of `Item_null`,
      `Item_int`, `Item_string`, etc., depending on the object state.
      So the runtime type diagnostic is not applicable to `clone_item()` and
      is disabled with this commit.
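
      The kind of mismatch the diagnostic detects can be sketched with a
      hypothetical two-class hierarchy standing in for `Item_int` and
      `Item_int_with_ref`. The class names and `make_copy()` are
      illustrative, not the server's API; a check like `same_dynamic_type()`
      would not apply to clone_item(), which may legitimately return a
      different class.

      ```cpp
      #include <cassert>
      #include <memory>
      #include <typeinfo>

      struct IntItem {
        long long value;
        explicit IntItem(long long v) : value(v) {}
        virtual ~IntItem() = default;
        virtual std::unique_ptr<IntItem> make_copy() const {
          return std::make_unique<IntItem>(*this);
        }
      };

      struct IntWithRefItem : IntItem {
        using IntItem::IntItem;
        // Without this override the inherited make_copy() would return a
        // plain IntItem: the mismatch the runtime diagnostic reports.
        std::unique_ptr<IntItem> make_copy() const override {
          return std::make_unique<IntWithRefItem>(*this);
        }
      };

      // The diagnostic in miniature: a copy must have the same dynamic type.
      bool same_dynamic_type(const IntItem &a, const IntItem &b) {
        return typeid(a) == typeid(b);
      }
      ```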
      
      As similar diagnostic failures are expected to appear again
      in the future, this commit introduces a new test file in the main suite,
      item_types.test, to which new test cases may be added.
      
      Reviewer: Oleksandr Byelkin <sanja@mariadb.com>
    • Merge branch '10.11' into 11.1 · 1de570d7
      Oleksandr Byelkin authored
  10. 22 Jul, 2024 3 commits
  11. 20 Jul, 2024 1 commit
  12. 19 Jul, 2024 3 commits
    • MDEV-15393 gtid_slave_pos duplicate key errors after mysqldump restore · b8f92ade
      Andrei authored
      When mysqldump is run to dump the `mysql` system database, it generates
      INSERT statements into the table `mysql.gtid_slave_pos`.
      After running the backup script,
      those inserts did not produce the expected gtid state on the slave. In
      particular, the maximum of mysql.gtid_slave_pos.sub_id did not make it
      into rpl_global_gtid_slave_state.last_sub_id,
      an in-memory object that is supposed to match the current state of the
      table. This happened regardless of whether the --gtid option was specified
      or not. Later, when the backup recipient server starts as a slave
      in *non-gtid* mode, this desynchronization may lead to a duplicate key
      error.
      
      This effect is corrected only for mysqldump/mariadb-dump in --gtid
      mode, as follows. The fix ensures that the INSERT block of the dump
      script is followed by a "summing-up" SET @@global.gtid_slave_pos
      assignment.
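
      The deferred print-out can be sketched as a small hypothetical dump
      writer: the gtid_slave_pos value is remembered when it becomes known
      but written only after the INSERT block. `dump_writer` and its methods
      are illustrative names, not mysqldump's code, and the sample values
      are made up.

      ```cpp
      #include <cassert>
      #include <sstream>
      #include <string>

      class dump_writer {
        std::ostringstream out_;
        std::string deferred_gtid_pos_;
      public:
        // Remember the position instead of printing it immediately.
        void remember_gtid_slave_pos(const std::string &pos) {
          deferred_gtid_pos_ = pos;
        }
        void write_insert(const std::string &values) {
          out_ << "INSERT INTO mysql.gtid_slave_pos VALUES " << values << ";\n";
        }
        // Emit the deferred "summing-up" assignment after the inserts.
        std::string finish() {
          if (!deferred_gtid_pos_.empty())
            out_ << "SET @@global.gtid_slave_pos='" << deferred_gtid_pos_ << "';\n";
          return out_.str();
        }
      };
      ```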
      
      As for the implementation, note that a deferred print-out of
      SET gtid_slave_pos and the associated comments is preferred over
      relocating the entire blocks if (opt_master,slave_data &&
      do_show_master,slave_status) ... because of compatibility
      concerns. Namely, an error inside do_show_*() is handled in the new
      code the same way, and as early, as before.
      
      A regression test can be run in how-to-reproduce mode as well.
      One affected mtr test was observed:
      the rpl_mysqldump_slave.result "mismatch" now shows the new policy of
      deferring the SET gtid_slave_pos print-out in action.
    • new libfmt 11.0.1 · 0f6f1114
      Oleksandr Byelkin authored
    • New columnstore 23.10.2 · 88711ee5
      Oleksandr Byelkin authored