1. 20 Sep, 2020 3 commits
  2. 16 Sep, 2020 1 commit
    • MDEV-21839: Handle crazy offset to SHOW BINLOG EVENTS · 873cc1e7
      Sujatha authored
      Problem:
      =======
      SHOW BINLOG EVENTS FROM <"random"-pos> caused a variety of failures, as
      reported in MDEV-18046. Those were fixed, but that approach is not
      future-proof, and adding extra checks on the parameters of every event
      being constructed is not optimal.
      
      Analysis:
      =========
      The "SHOW BINLOG EVENTS FROM <pos>" code treats the user-given position as
      a valid event start position. It reads data from this position onwards and
      tries to map it to a sequence of known events. Each event has a specific
      structure, and asserts ensure that the event data read satisfies the
      event-specific requirements. When a random position is supplied to SHOW
      BINLOG EVENTS, these structure-specific checks fail and trigger an
      assertion.
      
      For example: https://jira.mariadb.org/browse/MDEV-18046
      In the bug description, the user executes CREATE TABLE, INSERT and ALTER
      statements.
      
      When a crazy offset such as "SHOW BINLOG EVENTS FROM 365" is provided, the
      code assumes that offset 365 is a valid event start, reads some random
      length at EVENT_LEN_OFFSET, and constructs a bogus event that does not
      exist in the binary log. In the quoted scenario, the event read at offset
      365 is taken to be an "Update_rows_log_event", which is not present in the
      binary log. Since this is a random event, its validation fails and the
      server hits an assertion or a segmentation fault, as shown below.
      
      mysqld: /data/src/10.4/sql/log_event.cc:10863: Rows_log_event::Rows_log_event(
          const char*, uint, const Format_description_log_event*):
          Assertion `var_header_len >= 2' failed.
          181220 15:27:02 [ERROR] mysqld got signal 6 ;
      #7  0x00007fa0d96abee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
      #8  0x000055e744ef82de in Rows_log_event::Rows_log_event (this=0x7fa05800d390,
          buf=0x7fa05800d080 "", event_len=254, description_event=0x7fa058006d60) at
      /data/src/10.4/sql/log_event.cc:10863
      #9  0x000055e744f00cf8 in Update_rows_log_event::Update_rows_log_event
      
      Since random data is being read, repeating the same SHOW BINLOG EVENTS FROM
      365 command produces different types of crashes with different events.
      MDEV-18046 reported 10 such crashes.
      
      To avoid such scenarios, the user-provided starting offset must be
      validated for correctness. The best way to do this is to use checksums
      when they are available. The MDEV-18046 fix introduced checksum-based
      validation.
      
      The issue still remains when binlog checksums are disabled; see the
      following bug reports:
      
      MDEV-22473: binlog.binlog_show_binlog_event_random_pos failed in buildbot,
                  server crashed in read_log_event
      MDEV-22455: Server crashes in Table_map_log_event,
                  binlog.binlog_invalid_read_in_rotate failed in buildbot
      
      Fix:
      ====
      When binlog checksums are disabled, scan the binary log event by event to
      validate the requested FROM <pos> offset. Starting from offset 4, read the
      event_length of the next event in the binary log and use it to advance the
      current offset to the start of the following event. Repeat this while the
      current offset is less than the requested offset. If the current offset
      ends up higher than the requested offset, report an appropriate invalid
      input offset error.
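
      The validation scan described above can be sketched in C++ as follows.
      This is a simplified illustration, not the server's code: it assumes the
      binlog v4 layout (a 4-byte magic header, then events whose 19-byte common
      header stores a little-endian event length at EVENT_LEN_OFFSET = 9), and
      the helper name is_valid_event_offset() is hypothetical.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // Assumed binlog v4 layout (illustrative constants):
      constexpr std::size_t kBinlogHeaderSize = 4;  // magic bytes at file start
      constexpr std::size_t kEventLenOffset = 9;    // event_length in the header
      constexpr std::size_t kCommonHeaderLen = 19;  // common event header size

      // Decode a 4-byte little-endian integer.
      static std::uint32_t read_le32(const unsigned char *p) {
        return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
               (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
      }

      // Hypothetical helper: walk the log event by event from offset 4 and
      // accept `requested` only if some event starts exactly there.
      bool is_valid_event_offset(const std::vector<unsigned char> &binlog,
                                 std::size_t requested) {
        std::size_t pos = kBinlogHeaderSize;
        while (pos < requested) {
          if (pos + kCommonHeaderLen > binlog.size())
            return false;             // header would run past the end of the log
          std::uint32_t len = read_le32(&binlog[pos + kEventLenOffset]);
          if (len < kCommonHeaderLen)
            return false;             // corrupt length: cannot advance safely
          pos += len;                 // jump to the start of the next event
        }
        return pos == requested;      // overshooting means a mid-event offset
      }
      ```

      An offset is accepted only if the walk lands on it exactly; an offset in
      the middle of an event makes the walk overshoot, so it is rejected instead
      of being decoded as a bogus event.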
  3. 14 Sep, 2020 1 commit
  4. 11 Sep, 2020 2 commits
  5. 07 Sep, 2020 2 commits
    • MDEV-9501: rpl.rpl_binlog_index, rpl.rpl_gtid_crash, rpl.rpl_stm_multi_query fail sporadically in buildbot with Master command COM_REGISTER_SLAVE failed · a8f6bbb7
      Sujatha authored
      
      Analysis:
      ========
      The slave server sends the COM_REGISTER_SLAVE command when it establishes
      a connection to the master. If the master is down, the command fails and
      a "COM_REGISTER_SLAVE failed" warning is reported.
      
      'rpl_binlog_index.test' shuts down the master, relocates the binary logs
      to a new location, and attempts to restart the master with 'log-bin'
      pointing to the new location. During this process the slave threads are
      active. The IO thread actively checks for the presence of the master; when
      it finds that the connection is lost, it attempts to reconnect, and since
      the master is down, the COM_REGISTER_SLAVE command fails.
      
      As part of the fix, stop the slave threads before shutting down the master
      and relocating the binlogs. Once the master is restarted, start the slave
      threads and sync them with the master. In the test, the binary logs and
      index files on the master are relocated to /tmpdir, but during the master
      restart only the --log-bin option was provided, which is incorrect.
      --log-bin-index must also point to /tmpdir; otherwise, on master restart,
      two index files are created: one master-bin.index in /tmpdir and a new
      master-bin.index in the datadir, as per log_basename. Because of this, the
      slave fails to connect to the master.
      
      'rpl_gtid_crash.test' tests the scenario "crashing master, causing slave
      IO thread to reconnect while SQL thread is running". On slow platforms,
      when the IO thread tries to connect to the crashed master, the
      COM_REGISTER_SLAVE command fails. This is expected, hence the warning is
      added to the suppression list.
    • MDEV-7098 spider/bg.spider_fixes failed in buildbot with safe_mutex: Trying to unlock mutex conn->mta_conn_mutex that wasn't locked at storage/spider/spd_db_conn.cc, line 671 · 420c4dcc
      Kentoku SHIBA authored
  6. 04 Sep, 2020 1 commit
  7. 03 Sep, 2020 3 commits
    • MDEV-23535 SIGSEGV, SIGABRT and SIGILL in typeinfo for Item_func_set_collation (on optimized builds) · f0a57acb
      Alexander Barkov authored
      
      This piece of code in Item_func_or_sum::agg_item_set_converter:
      
       if (!conv && ((*arg)->collation.repertoire == MY_REPERTOIRE_ASCII))
         conv= new (thd->mem_root) Item_func_conv_charset(thd, *arg, coll.collation, 1);
      
      was wrong because:
      
      1. It could change Item_cache to Item_func_conv_charset
        (with the old Item_cache in args[0]).
        Such an Item type change is not always supported, e.g.
        the code in Item_singlerow_subselect::reset() expects only Item_cache,
        to be able to call Item_cache::set_null(). So it erroneously
        reinterpreted Item_func_conv_charset as Item_cache and called
        a non-existing method set_null(), which crashed the server.
      
      2. The 1 in the last parameter to Item_func_conv_charset() was also
        a problem. In MariaDB versions where the reported query did not crash,
        it erroneously returned "empty set" instead of one row, because
        the 1 made subselects execute too early and return NULL.
      
      Fix:
      
      1. Removing the above two lines from Item_func_or_sum::agg_item_set_converter()
      
      2. Adding the repertoire test inside the constructor of Item_func_conv_charset,
         so it now detects itself as "safe" in more cases than before.
         This is needed so that removing the wrong code does not cause new
         "Illegal mix of collations" errors in various scenarios where character
         set conversion from pure ASCII happens, including the reported scenario.
      
      So now this sequence:
      
         Item_cache -> Item_func_concat
      
      is replaced with this compatible sequence (the top Item is still Item_cache):
      
         new Item_cache -> Item_func_conv_charset -> Item_func_concat
      
      Before the fix it was replaced with this incompatible sequence:
      
         Item_func_conv_charset -> old Item_cache -> Item_func_concat
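
      To make the difference concrete, here is a heavily simplified C++ model of
      the compatible sequence. The class names mirror the server's, but these
      definitions are hypothetical stand-ins, and convert_under_cache() is an
      invented helper rather than server code.

      ```cpp
      #include <memory>
      #include <utility>

      // Hypothetical, heavily simplified stand-ins for the server's Item classes.
      struct Item {
        virtual ~Item() = default;
      };
      struct Item_func_concat : Item {};
      struct Item_func_conv_charset : Item {
        std::unique_ptr<Item> arg;  // the expression being converted
        explicit Item_func_conv_charset(std::unique_ptr<Item> a)
            : arg(std::move(a)) {}
      };
      struct Item_cache : Item {
        std::unique_ptr<Item> example;  // the expression whose value is cached
        bool null_value = false;
        explicit Item_cache(std::unique_ptr<Item> e) : example(std::move(e)) {}
        void set_null() { null_value = true; }
      };

      // Compatible rewrite: insert the conversion *below* a new Item_cache, so
      // code that expects the top item to be an Item_cache still gets one.
      std::unique_ptr<Item> convert_under_cache(std::unique_ptr<Item_cache> cache) {
        auto conv =
            std::make_unique<Item_func_conv_charset>(std::move(cache->example));
        return std::make_unique<Item_cache>(std::move(conv));
      }
      ```

      Because the top item is still an Item_cache, calling set_null() on it is
      valid. With the old, incompatible order the top item would be an
      Item_func_conv_charset, and a dynamic_cast to Item_cache would return
      nullptr instead of silently reinterpreting the object.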
    • MDEV-22387: Do not pass null pointer to some memcpy() · 94a520dd
      Marko Mäkelä authored
      Passing a null pointer to a nonnull argument is not only undefined
      behaviour, but it also grants the compiler permission to optimize
      away further checks of whether the pointer is null. GCC -O2, at least
      starting with version 8, may do that, potentially causing SIGSEGV.
      
      These problems were caught in a WITH_UBSAN=ON build with the
      Bug#7024 test in main.view.
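
      One common way to follow this rule is to skip the call entirely for
      zero-length copies, which is where null source pointers typically occur.
      The wrapper below is a hypothetical sketch of that pattern, not the code
      changed by this commit.

      ```cpp
      #include <cstddef>
      #include <cstring>

      // Hypothetical helper: never hands a null pointer to memcpy(). A
      // zero-length copy (e.g. from an empty, unallocated buffer) is skipped,
      // so the compiler gets no licence to assume `src` is non-null afterwards.
      inline void *safe_memcpy(void *dst, const void *src, std::size_t n) {
        if (n == 0)
          return dst;  // nothing to copy; do not call memcpy() at all
        return std::memcpy(dst, src, n);
      }
      ```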
    • MDEV-7110 follow-up fix: Do not pass NULL as nonnull parameter · a256070e
      Marko Mäkelä authored
      Passing a null pointer to the "%s" argument of a printf-like
      function is undefined behaviour. In the GNU libc implementation
      of the printf() family of functions, it happens to work.
      
      GCC 10.2.0 would diagnose this with -Wformat-overflow -Og.
      In -fsanitize=undefined (WITH_UBSAN=ON) builds, a runtime error
      would be generated. In some other builds, GCC 8 or later might infer
      that the parameter is nonnull and optimize away further checks whether
      the parameter is null, leading to SIGSEGV.
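
      A minimal sketch of the defensive pattern, assuming the caller wants
      visible output rather than an empty string: substitute a placeholder
      before the pointer reaches "%s". The helper names and the "(null)" text
      are illustrative choices, not taken from the commit.

      ```cpp
      #include <cstddef>
      #include <cstdio>

      // Hypothetical helper: never pass a null pointer to "%s". The "(null)"
      // text mimics what GNU libc happens to print, but here it is explicit.
      inline const char *str_or_null(const char *s) { return s ? s : "(null)"; }

      // Returns the number of characters snprintf() wanted to write.
      int format_name(char *buf, std::size_t size, const char *name) {
        return std::snprintf(buf, size, "name=%s", str_or_null(name));
      }
      ```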
  8. 02 Sep, 2020 1 commit
  9. 01 Sep, 2020 1 commit
  10. 31 Aug, 2020 1 commit
    • MDEV-16372 ER_BASE64_DECODE_ERROR upon replaying binary log via mysqlbinlog --verbose · feac078f
      Andrei Elkin authored
      (This commit is exclusively for 10.1 branch, do not merge it to upper ones)
      
      When a non-STMT_END-marked Rows_log_event (A) is followed by a
      STMT_END-marked one (B), mysqlbinlog interleaves the base64-encoded row
      events with their pseudo-SQL representation produced by the --verbose option:
            BINLOG '
              base64 encoded data for A
              ### verbose section for A
              base64 encoded data for B
              ### verbose section for B
            '/*!*/;
      As a result, the generated BINLOG '...' statement is invalid and is rejected
      with an error (ER_BASE64_DECODE_ERROR). Examples of BINLOG statements
      malformed in this way could be found in binlog_row_annotate.result, which
      this patch corrects.
      
      The issue is fixed by introducing an auxiliary IO_CACHE that holds the
      verbose comments until the terminating STMT_END event is found. The new
      cache is flushed after the two pre-existing ones have been written out at
      that point. The output now produced for the above case is:
            BINLOG '
              base64 encoded data for A
              base64 encoded data for B
            '/*!*/;
              ### verbose section for A
              ### verbose section for B
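
      The buffering idea can be modelled in a few lines of C++. This is a sketch
      under simplifying assumptions: std::string buffers stand in for the
      IO_CACHE objects, and RowEvent and print_row_events() are hypothetical.

      ```cpp
      #include <string>
      #include <vector>

      // Hypothetical, simplified view of a row event as mysqlbinlog prints it.
      struct RowEvent {
        std::string base64;   // encoded event body
        std::string verbose;  // "### ..." pseudo-SQL from --verbose
        bool stmt_end;        // STMT_END flag
      };

      // Base64 bodies and verbose comments go to separate buffers; only when the
      // STMT_END event arrives is the BINLOG statement closed, and the buffered
      // comments are written after it.
      std::string print_row_events(const std::vector<RowEvent> &events) {
        std::string out, body, comments;
        for (const RowEvent &ev : events) {
          body += "  " + ev.base64 + "\n";
          comments += "  " + ev.verbose + "\n";
          if (ev.stmt_end) {
            out += "BINLOG '\n" + body + "'/*!*/;\n" + comments;
            body.clear();
            comments.clear();
          }
        }
        return out;
      }
      ```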
      
      Thanks to Alexey Midenkov for recognizing the problem and attempting to
      tackle it, to Venkatesh Duggirala, who produced a patch for upstream whose
      idea is reused here, and to LukeXwang, the MDEV-23077 reporter, who also
      contributed part of a patch aimed at this issue.
      
      Extra: mysqlbinlog_row_minimal was refined so that it no longer writes varying numeric values into the result file.
  11. 27 Aug, 2020 1 commit
    • MDEV-23596: Assertion `tab->ref.use_count' failed in join_read_key_unlock_row · f69cc267
      Varun Gupta authored
      The issue was that the query used the ORDER BY ... LIMIT optimization,
      which changed the access method from EQ_REF access to an index scan (on an
      index that resolves the ORDER BY clause). However, READ_RECORD::unlock_row
      was not reset to rr_unlock_row, the function used when the access method
      is not EQ_REF access.
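
      A simplified C++ model of the bug and fix: the unlock callback has to be
      switched together with the access method. The struct layout, the Access
      enum and set_access_method() are hypothetical; only the names
      READ_RECORD::unlock_row, rr_unlock_row and join_read_key_unlock_row come
      from the server.

      ```cpp
      // Hypothetical, simplified READ_RECORD with an unlock callback.
      struct READ_RECORD;
      using unlock_row_fn = void (*)(READ_RECORD *);

      struct READ_RECORD {
        unlock_row_fn unlock_row = nullptr;
        int eq_ref_unlocks = 0;   // counts calls of the EQ_REF-specific unlock
        int generic_unlocks = 0;  // counts calls of the generic unlock
      };

      void join_read_key_unlock_row(READ_RECORD *rr) { ++rr->eq_ref_unlocks; }
      void rr_unlock_row(READ_RECORD *rr) { ++rr->generic_unlocks; }

      enum class Access { EQ_REF, INDEX_SCAN };

      // When ORDER BY ... LIMIT switches from EQ_REF to an index scan, the
      // callback must be reset to the generic one as well.
      void set_access_method(READ_RECORD *rr, Access a) {
        rr->unlock_row =
            (a == Access::EQ_REF) ? join_read_key_unlock_row : rr_unlock_row;
      }
      ```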
  12. 25 Aug, 2020 1 commit
    • MDEV-23569 temporary tables can overwrite existing files · 62d1e3bf
      Sergei Golubchik authored
      For internal temporary tables: don't use realpath(), and let them
      overwrite any orphaned temporary files that might have been left in the
      tmpdir (see the main.error_simulation test).

      For user-created temporary tables: we have to use realpath() (see
      3a726ab6, and remember DATA/INDEX DIRECTORY); don't allow them to
      overwrite existing files.
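
      On POSIX systems, the "don't overwrite" rule for user-created temporary
      tables can be expressed with O_CREAT | O_EXCL, which makes open() fail
      with EEXIST if the path already exists (even as a symbolic link). The
      sketch below is a hypothetical illustration; create_temp_table_file() is
      not a server function, and the realpath() handling is omitted.

      ```cpp
      #include <fcntl.h>
      #include <unistd.h>

      // Hypothetical helper: internal temporary tables may truncate an orphaned
      // file left in tmpdir, while user-created ones must fail instead of
      // overwriting an existing file.
      int create_temp_table_file(const char *path, bool internal) {
        int flags = O_RDWR | O_CREAT | (internal ? O_TRUNC : O_EXCL);
        return open(path, flags, 0600);  // -1 on failure (e.g. EEXIST)
      }
      ```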
      
      This bug was reported by RACK911 LABS
  13. 21 Aug, 2020 1 commit
  14. 18 Aug, 2020 2 commits
    • MDEV-23491: __bss_start breaks compilation of various platforms · ece0b062
      Oleksandr Byelkin authored
      Remove __bss_start and related symbols, because the "write" system call checks the buffer address and returns EFAULT if it is invalid.
    • MDEV-21039: Server fails to start with unknown mysqld_safe options · 57960211
      Julius Goryavsky authored
      Adding any unknown option to the "[mysqld_safe]" section makes it
      impossible to start mysqld with mysqld_multi. For example, after
      adding the unknown option "numa_interleave" to the "[mysqld_safe]"
      section, mysqld_multi exits with the following diagnostics:
      
      [ERROR] /usr/local/mysql/bin/mysqld: unknown option '--numa_interleave'
      
      To get rid of this behavior, this patch by default adds the "--loose-"
      prefix to all options unknown to mysqld_safe. This behavior can be
      enabled explicitly with the --ignore-unknown option and disabled with
      the --no-ignore-unknown option.
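
      The default behavior can be modelled in C++ (the real change is in the
      startup scripts, so this is only an illustration). The forward_option()
      helper and the option names used with it are hypothetical, and
      option-name normalisation (dashes vs. underscores) is ignored.

      ```cpp
      #include <set>
      #include <string>

      // Hypothetical model: an option mysqld_safe does not recognise is passed
      // on with the "--loose-" prefix, so mysqld does not abort on it. With
      // ignore_unknown == false (--no-ignore-unknown) it is passed on unchanged.
      std::string forward_option(const std::string &opt,
                                 const std::set<std::string> &known,
                                 bool ignore_unknown = true) {
        if (!ignore_unknown || opt.compare(0, 2, "--") != 0 ||
            opt.compare(0, 8, "--loose-") == 0)
          return opt;  // not a candidate for prefixing
        std::string::size_type eq = opt.find('=');
        std::string name =
            opt.substr(2, eq == std::string::npos ? std::string::npos : eq - 2);
        if (known.count(name) != 0)
          return opt;  // an option mysqld_safe understands
        return "--loose-" + opt.substr(2);
      }
      ```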
  15. 15 Aug, 2020 1 commit
    • MDEV-23440: mysql_tzinfo_to_sql to use transactions · b970363a
      Daniel Black authored
      Since MDEV-18778, the time zone tables are converted to InnoDB
      so that they can be replicated to other Galera nodes.

      Even without Galera, the time zone tables could be declared InnoDB.
      With standalone InnoDB tables, loading the output of mysql_tzinfo_to_sql
      takes approximately 27 seconds.
      
      With the transactions enabled in this patch, 1.2 seconds is
      the approximate load time.
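
      The change amounts to wrapping the generated statements in a single
      transaction, so InnoDB commits once instead of once per INSERT under
      autocommit. The C++ sketch below illustrates that idea; the function name
      and the exact statements emitted are hypothetical, not copied from
      mysql_tzinfo_to_sql.

      ```cpp
      #include <sstream>
      #include <string>
      #include <vector>

      // Hypothetical sketch: emit the generated INSERT statements inside one
      // transaction. Non-transactional engines ignore START TRANSACTION/COMMIT.
      std::string wrap_in_transaction(const std::vector<std::string> &inserts) {
        std::ostringstream out;
        out << "START TRANSACTION;\n";
        for (const std::string &stmt : inserts)
          out << stmt << '\n';
        out << "COMMIT;\n";
        return out.str();
      }
      ```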
      
      While explicit checks for the engine of the time zone tables could be
      done, or checks against !opt_skip_write_binlog, non-transactional
      storage engines just ignore the transactional state without
      even a warning, so it's safe to enable this globally.

      Leap seconds are mostly ignored here, as they are a single INSERT
      statement, and they have fallen out of favour because they have caused
      MariaDB stalls in the past.
  16. 13 Aug, 2020 1 commit
  17. 12 Aug, 2020 2 commits
    • MDEV-20672 Inconsistent usage message for innodb_compression_algorithm · 101ce10d
      Marko Mäkelä authored
      The usage message for the innodb_compression_algorithm system variable
      did not list snappy, which was added as an optional compression algorithm
      in MariaDB 10.1.3 and might actually work since
      commit 90c52e52 (MDEV-12615)
      in MariaDB 10.1.24.
      
      Unfortunately, we must also include unavailable compression algorithms
      in the list, because ENUM parameters accept numeric values, and we do
      not want innodb_compression_algorithm=3 to change meaning depending on
      how the source code was compiled.
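
      A sketch of why the list must be fixed: the ENUM value is just an index
      into a name table, so that table has to be identical in every build, and
      availability is checked separately. The name order shown is assumed for
      illustration, and the helper functions are hypothetical.

      ```cpp
      #include <array>
      #include <cstddef>
      #include <cstring>

      // Fixed name table (order assumed): every algorithm is listed, whether or
      // not it was compiled in, so each number has one meaning in all builds.
      const std::array<const char *, 7> kCompressionAlgorithms = {
          {"none", "zlib", "lz4", "lzo", "lzma", "bzip2", "snappy"}};

      // Name for a numeric setting, or nullptr if out of range.
      const char *algorithm_name(std::size_t idx) {
        return idx < kCompressionAlgorithms.size() ? kCompressionAlgorithms[idx]
                                                   : nullptr;
      }

      // Number for a name, or -1 if the name is unknown.
      int algorithm_index(const char *name) {
        for (std::size_t i = 0; i < kCompressionAlgorithms.size(); ++i)
          if (std::strcmp(kCompressionAlgorithms[i], name) == 0)
            return static_cast<int>(i);
        return -1;
      }

      // Hypothetical availability check: bit i of available_mask is set when
      // algorithm i was compiled in. Unavailable values can be rejected, but
      // their numbers keep the same meaning.
      bool can_set_algorithm(std::size_t idx, unsigned available_mask) {
        return idx < kCompressionAlgorithms.size() &&
               ((available_mask >> idx) & 1u) != 0;
      }
      ```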
    • MDEV-19526 heap number overflow on innodb_page_size=64k · efd8af53
      Marko Mäkelä authored
      InnoDB reserves only 13 bits for the heap number in the record header,
      limiting the heap number to at most 8191. But with
      innodb_page_size=64k and secondary index records of 7 bytes each,
      it is possible to exceed the maximum heap number.
      
      btr_cur_optimistic_insert(): Let the operation fail if the
      maximum number of records would be exceeded.
      
      page_mem_alloc_heap(): Move to the same compilation unit with the
      only caller, and let the operation fail if the maximum heap number
      has been allocated already.
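
      The 13-bit limit and the new failure path can be sketched as follows.
      Page and page_alloc_heap_no() are hypothetical simplifications of
      page_mem_alloc_heap(); the real function also reserves space in the page.

      ```cpp
      #include <cstdint>

      // 13 bits for the heap number: the largest value is 2^13 - 1 = 8191.
      constexpr unsigned kHeapNoBits = 13;
      constexpr std::uint32_t kMaxHeapNo = (1u << kHeapNoBits) - 1;

      // Hypothetical simplified page state.
      struct Page {
        std::uint32_t n_heap;  // next heap number to hand out
      };

      // Returns the allocated heap number, or -1 if no further record can be
      // addressed on this page, so the caller's insert must fail.
      std::int32_t page_alloc_heap_no(Page &page) {
        if (page.n_heap > kMaxHeapNo)
          return -1;
        return static_cast<std::int32_t>(page.n_heap++);
      }
      ```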
  18. 11 Aug, 2020 2 commits
    • MDEV-21526: mysqld_multi no longer works with different server binaries · 7ad4709a
      Julius Goryavsky authored
      The problem is that adding the --defaults-group-suffix option
      to fix MDEV-18863 causes mysqld to read all options from the
      corresponding sections of the config file, including options
      specific to mysqld_multi. Reading unknown options (which are not
      supported by mysqld) causes mysqld to terminate with an error.
      
      However, MDEV-18863 has been completely fixed by passing options
      on the command line, so there is no longer any need to specify the
      --defaults-group-suffix option (options passed on the command line
      just need to take priority, so as not to break MDEV-18863).
    • Fixing sporadic buildbot test failures happening at '00:00:00' sharp · caf10590
      Alexander Barkov authored
      Some tests relied on the fact that a DATETIME->DATE conversion
      always produces a truncation (with a warning). This is not the case
      when the SQL statement is executed at exactly '00:00:00'.

      Adding new SET TIMESTAMP statements to make sure the time is not '00:00:00'.
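
      Why '00:00:00' matters can be shown with a tiny model: the truncation
      warning depends on whether the dropped time part is non-zero. DateTime,
      DateResult and datetime_to_date() are hypothetical, and fractional
      seconds are ignored.

      ```cpp
      // Hypothetical model of a DATETIME -> DATE conversion.
      struct DateTime {
        int year, month, day;
        int hour, minute, second;
      };

      struct DateResult {
        int year, month, day;
        bool truncation_warning;  // set only if a non-zero time part was lost
      };

      DateResult datetime_to_date(const DateTime &dt) {
        bool lost_time = dt.hour != 0 || dt.minute != 0 || dt.second != 0;
        return {dt.year, dt.month, dt.day, lost_time};
      }
      ```

      A test that converts the current time therefore gets no warning when it
      happens to run at midnight; pinning the clock with SET TIMESTAMP removes
      that dependency.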
  19. 10 Aug, 2020 4 commits
    • MDEV-16115 Hang after reducing innodb_encryption_threads · 7f67ef14
      Marko Mäkelä authored
      The test encryption.create_or_replace would occasionally fail,
      because some fil_space_t::n_pending_ops would never be decremented.
      
      fil_crypt_find_space_to_rotate(): If rotate_thread_t::should_shutdown()
      holds because innodb_encryption_threads has been reduced,
      release the reference.
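
      The leak and its fix follow a common pattern: every exit path taken after
      acquiring a reference must release it. The sketch below models
      fil_space_t::n_pending_ops with a std::atomic counter; Space and
      find_space_to_rotate() are hypothetical simplifications.

      ```cpp
      #include <atomic>

      // Hypothetical simplified tablespace with a pending-operations counter.
      struct Space {
        std::atomic<int> n_pending_ops{0};
      };

      // Returns true with the reference still held (the caller releases it
      // later), or false after releasing it on the shutdown path.
      bool find_space_to_rotate(Space &space, bool should_shutdown) {
        space.n_pending_ops.fetch_add(1);    // acquire a reference
        if (should_shutdown) {
          space.n_pending_ops.fetch_sub(1);  // release before bailing out
          return false;
        }
        return true;
      }
      ```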
      
      fil_space_remove_from_keyrotation(), fil_space_next(): Declare the
      functions static, simplify a little, and define in the same compilation
      unit with the only caller, fil_crypt_find_space_to_rotate().
      
      fil_crypt_key_mutex: Remove (unused).
    • bump the VERSION · 3e3da164
      Daniel Bartholomew authored
    • Oleksandr Byelkin
    • MDEV-23386: mtr: main.mysqld--help autosized table{-open,}-cache and max-connections · deb36558
      Daniel Black authored
      Example of the failure:
      http://buildbot.askmonty.org/buildbot/builders/bld-p9-rhel7/builds/4417/steps/mtr/logs/stdio
      ```
      main.mysqld--help 'unix'                 w17 [ fail ]
              Test ended at 2020-06-20 18:51:45
      
      CURRENT_TEST: main.mysqld--help
      --- /opt/buildbot-slave/bld-p9-rhel7/build/mysql-test/main/mysqld--help.result	2020-06-20 16:06:49.903604179 +0300
      +++ /opt/buildbot-slave/bld-p9-rhel7/build/mysql-test/main/mysqld--help.reject	2020-06-20 18:51:44.886766820 +0300
      @@ -1797,10 +1797,10 @@
       sync-relay-log-info 10000
       sysdate-is-now FALSE
       system-versioning-alter-history ERROR
      -table-cache 421
      +table-cache 2000
       table-definition-cache 400
      -table-open-cache 421
      -table-open-cache-instances 1
      +table-open-cache 2000
      +table-open-cache-instances 8
       tc-heuristic-recover OFF
       tcp-keepalive-interval 0
       tcp-keepalive-probes 0
      
      mysqltest: Result length mismatch
      ```
      mtr: table_open_cache_basic autosized:

      Let's assume that more than 400 are available and that
      we can set the result back to the start value.
      
      All of these system variables are autosized and can
      generate MTR output differences.
      
      Closes #1527
  20. 06 Aug, 2020 3 commits
  21. 05 Aug, 2020 1 commit
  22. 04 Aug, 2020 3 commits
    • Sergei Golubchik
      a09a06d5
    • 5.6.49-89.0 · 2adaaeba
      Sergei Golubchik authored
    • MDEV-23089 rpl_parallel2 fails in 10.5 · e3c18b8e
      Sachin authored
      Problem:- rpl_parallel2 was failing non-deterministically.
      Analysis:-
      When FLUSH TABLES WITH READ LOCK is executed, it allows all worker
      threads to complete their ongoing transactions and then pauses them.
      At this point FTWRL proceeds to acquire the global read lock. FTWRL first
      blocks threads from starting new commits, then upgrades the lock to block
      the commit of existing transactions.
        Step1:
          FLUSH TABLES WITH READ LOCK - Blocks new commits
        Step2:
          * STOP SLAVE command enables 'force_abort=1' which unblocks workers,
            they continue to execute events.
          * T1: Waits in 'record_gtid' call to update 'gtid_slave_pos' table with
            its current GTID, but it is blocked because of Step1.
          * T2: Holds COMMIT lock and waits for T1 to commit.
        Step3:
          FLUSH TABLES WITH READ LOCK - Waiting to get BLOCK_COMMIT.
      This results in a deadlock. When the STOP SLAVE command allows paused
      workers to proceed, they should skip the execution of all further events,
      similar to the 'conservative' parallel mode.
      Solution:-
      Set skip_event_group to 1 when the worker is aborted in do_ftwrl_wait.
      rpl_parallel_entry->pause_sub_id is reset in rpl_pause_after_ftwrl only
      when force_abort is off.
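
      The effect of the fix can be modelled as follows: once a paused worker is
      released by force_abort, it marks the event group as skipped and applies
      nothing further, so it never reaches record_gtid() and cannot join the
      deadlock. WorkerGroup, do_ftwrl_wait_sketch() and apply_event() are
      hypothetical simplifications.

      ```cpp
      // Hypothetical, simplified per-group worker state.
      struct WorkerGroup {
        bool skip_event_group = false;
        int executed = 0;  // events actually applied
      };

      // Called when a worker paused for FTWRL wakes up. If it was released by
      // STOP SLAVE (force_abort), the rest of the group is skipped.
      void do_ftwrl_wait_sketch(WorkerGroup &g, bool force_abort) {
        if (force_abort)
          g.skip_event_group = true;
      }

      void apply_event(WorkerGroup &g) {
        if (g.skip_event_group)
          return;  // skipped: no record_gtid(), no commit
        ++g.executed;
      }
      ```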
  23. 03 Aug, 2020 2 commits