1. 24 Aug, 2023 1 commit
    • Marko Mäkelä's avatar
      MDEV-31813 SET GLOBAL innodb_max_purge_lag_wait hangs if innodb_read_only · 02878f12
      Marko Mäkelä authored
      innodb_max_purge_lag_wait_update(): Return immediately if we are
      in high_level_read_only mode.
      
      srv_wake_purge_thread_if_not_active(): Relax a debug assertion.
      If srv_read_only_mode holds, purge_sys.enabled() will not hold
      and this function will do nothing.
      
      trx_t::commit_in_memory(): Remove a redundant condition before
      invoking srv_wake_purge_thread_if_not_active().
      02878f12
  2. 23 Aug, 2023 1 commit
    • Yuchen Pei's avatar
      MDEV-31117 Fix spider connection info parsing · e9f3ca61
      Yuchen Pei authored
      Spider connection string is a comma-separated parameter definitions,
      where each definition is of the form "<param_title> <param_value>",
      where <param_value> is quote delimited on both ends, with backslashes
      acting as an escaping prefix.
      
      Despite the simple syntax, the existing spider connection string
      parser was poorly-written, complex, hard to reason and error-prone,
      causing issues like the one described in MDEV-31117. For example it
      treated param title the same way as param value when assigning, and
      have nonsensical fields like delim_title_len and delim_title.
      
      Thus as part of the bugfix, we clean up the spider comment connection
      string parsing, including:
      
      - Factoring out some code from the parsing function
      - Simplify the struct `st_spider_param_string_parse`
      - And any necessary changes caused by the above changes
      e9f3ca61
  3. 22 Aug, 2023 1 commit
    • Marko Mäkelä's avatar
      MDEV-20194 test adjustment for s390x · ff682ead
      Marko Mäkelä authored
      The test innodb.row_size_error_log_warnings_3 that was added in
      commit 372b0e63 (MDEV-20194)
      failed to take into account the earlier adjustment in
      commit cf574cf5 (MDEV-27634)
      that is specific to many GNU/Linux distributions for the s390x.
      ff682ead
  4. 21 Aug, 2023 1 commit
  5. 17 Aug, 2023 3 commits
    • Marko Mäkelä's avatar
      MDEV-31928 Assertion xid ... < 128 failed in trx_undo_write_xid() · 5a8a8fc9
      Marko Mäkelä authored
      trx_undo_write_xid(): Correct an off-by-one error in a debug assertion.
      5a8a8fc9
    • Marko Mäkelä's avatar
      MDEV-31254 InnoDB: Trying to read doublewrite buffer page · 518fe519
      Marko Mäkelä authored
      buf_read_page_low(): Remove an error message that could be triggered
      by buf_read_ahead_linear() or buf_read_ahead_random().
      
      This is a backport of commit c9eff1a1
      from MariaDB Server 10.5.
      518fe519
    • Marko Mäkelä's avatar
      MDEV-31875 ROW_FORMAT=COMPRESSED table: InnoDB: ... Only 0 bytes read · 44df6f35
      Marko Mäkelä authored
      buf_read_ahead_random(), buf_read_ahead_linear(): Avoid read-ahead
      of the last page(s) of ROW_FORMAT=COMPRESSED tablespaces that use
      a page size of 1024 or 2048 bytes. We invoke os_file_set_size() on
      integer multiples of 4096 bytes in order to be compatible with
      the requirements of innodb_flush_method=O_DIRECT regardless of the
      physical block size of the underlying storage.
      
      This change must be null-merged to MariaDB Server 10.5 and later.
      There, out-of-bounds read-ahead should be handled gracefully
      by simply discarding the buffer page that had been allocated.
      
      Tested by: Matthias Leich
      44df6f35
  6. 16 Aug, 2023 1 commit
    • Kristian Nielsen's avatar
      MDEV-29974: Missed kill waiting for worker queues to drain · 34e85854
      Kristian Nielsen authored
      When the SQL driver thread goes to wait for room in the parallel slave
      worker queue, there was a race where a kill at the right moment could
      be ignored and the wait proceed uninterrupted by the kill.
      
      Fix by moving the THD::check_killed() to occur _after_ doing ENTER_COND().
      
      This bug was seen as sporadic failure of the testcase rpl.rpl_parallel
      (rpl.rpl_parallel_gco_wait_kill since 10.5), with "Slave stopped with
      wrong error code".
      Signed-off-by: default avatarKristian Nielsen <knielsen@knielsen-hq.org>
      34e85854
  7. 15 Aug, 2023 6 commits
    • Kristian Nielsen's avatar
      MDEV-31655: Parallel replication deadlock victim preference code errorneously removed · 900c4d69
      Kristian Nielsen authored
      Restore code to make InnoDB choose the second transaction as a deadlock
      victim if two transactions deadlock that need to commit in-order for
      parallel replication. This code was erroneously removed when VATS was
      implemented in InnoDB.
      
      Also add a test case for InnoDB choosing the right deadlock victim.
      Also fixes this bug, with testcase that reliably reproduces:
      
      MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master
      
      Note: This should be null-merged to 10.6, as a different fix is needed
      there due to InnoDB locking code changes.
      Signed-off-by: default avatarKristian Nielsen <knielsen@knielsen-hq.org>
      900c4d69
    • Kristian Nielsen's avatar
      MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication · 920789e9
      Kristian Nielsen authored
      Remove the exception that InnoDB does not report auto-increment locks waits
      to the parallel replication.
      
      There was an assumption that these waits could not cause conflicts with
      in-order parallel replication and thus need not be reported. However, this
      assumption is wrong and it is possible to get conflicts that lead to hangs
      for the duration of --innodb-lock-wait-timeout. This can be seen with three
      transactions:
      
      1. T1 is waiting for T3 on an autoinc lock
      2. T2 is waiting for T1 to commit
      3. T3 is waiting on a normal row lock held by T2
      
      Here, T3 needs to be deadlock killed on the wait by T1.
      
      Note: This should be null-merged to 10.6, as a different fix is needed
      there due to InnoDB lock code changes.
      Signed-off-by: default avatarKristian Nielsen <knielsen@knielsen-hq.org>
      920789e9
    • Marko Mäkelä's avatar
      Remove the often-hanging test innodb.alter_rename_files · b4ace139
      Marko Mäkelä authored
      The test innodb.alter_rename_files rather frequently hangs in
      checkpoint_set_now. The test was removed in MariaDB Server 10.5
      commit 37e7bde1 when the code that
      it aimed to cover was simplified. Starting with MariaDB Server 10.5
      the page flushing and log checkpointing is much simpler, handled
      by the single buf_flush_page_cleaner() thread.
      
      Let us remove the test to avoid occasional failures. We are not going
      to fix the cause of the failure in MariaDB Server 10.4.
      b4ace139
    • Marko Mäkelä's avatar
      Merge mariadb-10.4.31 into 10.4 · 6fdc6846
      Marko Mäkelä authored
      6fdc6846
    • Alexander Barkov's avatar
      MDEV-24797 Column Compression - ERROR 1265 (01000): Data truncated for column · 9c8ae6dc
      Alexander Barkov authored
      Fix issue was earlier fixed by MDEV-31724. Only adding MTR tests.
      9c8ae6dc
    • Alexander Barkov's avatar
      MDEV-31724 Compressed varchar values lost on joins when sorting on columns from joined table(s) · 1fa7c9a3
      Alexander Barkov authored
      Field_varstring::get_copy_func() did not take into account
      that functions do_varstring1[_mb], do_varstring2[_mb] do not support
      compressed data.
      
      Changing the return value of Field_varstring::get_copy_func()
      to `do_field_string` if there is a compresion and truncation
      at the same time. This fixes the problem, so now it works as follows:
      - val_str() uncompresses the data
      - The prefix is then calculated on the uncompressed data
      
      Additionally, introducing two new copying functions
      - do_varstring1_no_truncation()
      - do_varstring2_no_truncation()
      
      Using new copying functions in cases when:
      - a Field_varstring with length_bytes==1 is changing to a longer
          Field_varstring with length_bytes==1
      - a Field_varstring with length_bytes==2 is changing to a longer
          Field_varstring with length_bytes==2
      
      In these cases we don't care neither of compression nor
      of multi-byte prefixes: the entire data gets fully copied
      from the source column to the target column as is.
      
      This is a kind of new optimization, but this also was needed
      to preserve existing MTR test results.
      1fa7c9a3
  8. 14 Aug, 2023 1 commit
  9. 11 Aug, 2023 1 commit
  10. 10 Aug, 2023 4 commits
    • Monty's avatar
      MDEV-31893 Valgrind reports issues in main.join_cache_notasan · 2aea9387
      Monty authored
      This is also related to
      MDEV-31348 Assertion `last_key_entry >= end_pos' failed in virtual bool
                 JOIN_CACHE_HASHED::put_record()
      
      Valgrind exposed a problem with the join_cache for hash joins:
      =25636== Conditional jump or move depends on uninitialised value(s)
      ==25636== at 0xA8FF4E: JOIN_CACHE_HASHED::init_hash_table()
                (sql_join_cache.cc:2901)
      
      The reason for this was that avg_record_length contained a random value
      if one had used SET optimizer_switch='optimize_join_buffer_size=off'.
      
      This causes either 'random size' memory to be allocated (up to
      join_buffer_size) which can increase memory usage or, if avg_record_length
      is less than the row size, memory overwrites in thd->mem_root, which is
      bad.
      
      Fixed by setting avg_record_length in JOIN_CACHE_HASHED::init()
      before it's used.
      
      There is no test case for MDEV-31893 as valgrind of join_cache_notasan
      checks that.
      I added a test case for MDEV-31348.
      2aea9387
    • Kristian Nielsen's avatar
      MDEV-23021: rpl.rpl_parallel_optimistic_until fails in Buildbot · b2e312b0
      Kristian Nielsen authored
      The test case accessed slave-relay-bin.000003 without waiting for the IO
      thread to write it first. If the IO thread was slow, this could fail.
      Signed-off-by: default avatarKristian Nielsen <knielsen@knielsen-hq.org>
      b2e312b0
    • Kristian Nielsen's avatar
      MDEV-381: fdatasync() does not correctly flush growing binlog file · 5055490c
      Kristian Nielsen authored
      Revert the old work-around for buggy fdatasync() on Linux ext3. This bug was
      fixed in Linux > 10 years ago back to kernel version at least 3.0.
      Reviewed-by: default avatarMarko Mäkelä <marko.makela@mariadb.com>
      Signed-off-by: default avatarKristian Nielsen <knielsen@knielsen-hq.org>
      5055490c
    • Monty's avatar
      MDEV-31893 Valgrind reports issues in main.join_cache_notasan · e9333ff0
      Monty authored
      This is also related to
      MDEV-31348 Assertion `last_key_entry >= end_pos' failed in virtual bool
                 JOIN_CACHE_HASHED::put_record()
      
      Valgrind exposed a problem with the join_cache for hash joins:
      =25636== Conditional jump or move depends on uninitialised value(s)
      ==25636== at 0xA8FF4E: JOIN_CACHE_HASHED::init_hash_table()
                (sql_join_cache.cc:2901)
      
      The reason for this was that avg_record_length contained a random value
      if one had used SET optimizer_switch='optimize_join_buffer_size=off'.
      
      This causes either 'random size' memory to be allocated (up to
      join_buffer_size) which can increase memory usage or, if avg_record_length
      is less than the row size, memory overwrites in thd->mem_root, which is
      bad.
      
      Fixed by setting avg_record_length in JOIN_CACHE_HASHED::init()
      before it's used.
      
      There is no test case for MDEV-31893 as valgrind of join_cache_notasan
      checks that.
      I added a test case for MDEV-31348.
      e9333ff0
  11. 08 Aug, 2023 4 commits
  12. 02 Aug, 2023 2 commits
    • Christian Hesse's avatar
      update galera_new_cluster to use environment file · b54e4bf0
      Christian Hesse authored
      Now that the systemd unit files use an environment file to pass
      _WSREP_START_POSITION we have to update galera_new_cluster as well.
      b54e4bf0
    • Christian Hesse's avatar
      use environment file in systemd units for _WSREP_START_POSITION · 6c405904
      Christian Hesse authored
      We used to run `systemctl set-environment` to pass
      _WSREP_START_POSITION. This is bad because:
      
      * it clutter systemd's environment (yes, pid 1)
      * it requires root privileges
      * options (like LimitNOFILE=) are not applied
      
      Let's just create an environment file in ExecStartPre=, that is read
      before ExecStart= kicks in. We have _WSREP_START_POSITION around for the
      main process without any downsides.
      6c405904
  13. 31 Jul, 2023 5 commits
  14. 30 Jul, 2023 2 commits
  15. 26 Jul, 2023 3 commits
  16. 25 Jul, 2023 3 commits
    • Oleksandr Byelkin's avatar
      new WolfSSL v5.6.3-stable · 2a46b358
      Oleksandr Byelkin authored
      2a46b358
    • Brandon Nesterenko's avatar
      MDEV-30619: Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers · 063f4ac2
      Brandon Nesterenko authored
      MDEV-31749 sporadic assert in MDEV-30619 new test
      
      If the workers of a parallel replica are busy (potentially with long
      queues), but the SQL thread has no events left to distribute (so it
      goes idle), then the next event that comes from the primary will
      update mi->last_master_timestamp with its timestamp, even if the
      workers have not yet finished.
      
      This patch changes the parallel replica logic which updates
      last_master_timestamp after idling from using solely sql_thread_caught_up
      (added in MDEV-29639) to using the latter with rli queued/dequeued
      event counters.
      That is, if  the queued count is equal to the dequeued count, it
      means all events have been processed and the replica is considered
      idle when the driver thread has also distributed all events.
      
      Low level details of the commit include
      - to make a more generalized test for Seconds_Behind_Master on
        the parallel replica, rpl_delayed_parallel_slave_sbm.test
        is renamed to rpl_parallel_sbm.test for this purpose.
      - pause_sql_thread_on_next_event usage was removed
        with the MDEV-30619 fixes. Rather than remove it, we adapt it
        to the needs of this test case
      - added test case to cover SBM spike of relay log read and LMT
        update that was fixed by MDEV-29639
      - rpl_seconds_behind_master_spike.test is made to use
        the negate_clock_diff_with_master debug eval.
      
      Reviewed By:
      ============
      Andrei Elkin <andrei.elkin@mariadb.com>
      063f4ac2
    • Yuchen Pei's avatar
      MDEV-31400 Simple plugin dependency resolution · 734583b0
      Yuchen Pei authored
      We introduce simple plugin dependency. A plugin init function may
      return HA_ERR_RETRY_INIT. If this happens during server startup when
      the server is trying to initialise all plugins, the failed plugins
      will be retried, until no more plugins succeed in initialisation or
      want to be retried.
      
      This will fix spider init bugs which is caused in part by its
      dependency on Aria for initialisation.
      
      The reason we need a new return code, instead of treating every
      failure as a request for retry, is that it may be impossible to clean
      up after a failed plugin initialisation. Take InnoDB for example, it
      has a global variable `buf_page_cleaner_is_active`, which may not
      satisfy an assertion during a second initialisation try, probably
      because InnoDB does not expect the initialisation to be called
      twice.
      734583b0
  17. 24 Jul, 2023 1 commit