1. 22 Sep, 2020 8 commits
    • Merge 10.1 into 10.2 · 9d0ee2dc
      Marko Mäkelä authored
    • MDEV-22939: Restore an AUTO_INCREMENT check · 78efa109
      Marko Mäkelä authored
      It turns out that we must check for DISCARD TABLESPACE both
      when the table is being rebuilt and when the AUTO_INCREMENT
      value of the table is being added.
      
      This was caught by the test innodb.alter_missing_tablespace.
      Somehow I failed to run all tests. Sorry!
    • MDEV-22939 Server crashes in row_make_new_pathname() · 3eb81136
      Marko Mäkelä authored
      The statement ALTER TABLE...DISCARD TABLESPACE is problematic,
      because its designed purpose is to break the referential integrity
      of the data dictionary and make a table point to nowhere.
      
      ha_innobase::commit_inplace_alter_table(): Check whether the
      table has been discarded. (This is a bit late to check it, right
      before committing the change.) Previously, we performed this check
      only in a specific branch of the function commit_set_autoinc().
      
      Note: We intentionally allow non-rebuilding ALTER TABLE even if
      the tablespace has been discarded, to remain compatible with MySQL.
      (See the various tests with "wl5522" in the name, such as
      innodb.innodb-wl5522.)
      
      The test case would crash starting with 10.3 only, but it does not hurt
      to minimize the code and test difference between 10.2 and 10.3.
    • Make DISCARD TABLESPACE more robust · e5e83daf
      Marko Mäkelä authored
      dict_load_table_low(): Copy the 'discarded' flag to file_unreadable.
      This allows us to avoid a potentially harmful call to dict_stats_init()
      in ha_innobase::open().
    • MDEV-23776: Re-apply the fix and make the test more robust · 2af8f712
      Marko Mäkelä authored
      The test that was added in commit e05650e6
      would break a subsequent run of a test encryption.innodb-bad-key-change
      because some pages in the system tablespace would be encrypted with
      a different key.
      
      The failure was repeatable with the following invocation:
      
      ./mtr --no-reorder \
      encryption.create_or_replace,cbc \
      encryption.innodb-bad-key-change,cbc
      
      Because the crash was unrelated to the code changes that we reverted
      in commit eb38b1f7
      we can safely re-apply those fixes.
    • MDEV-23705 Assertion 'table->data_dir_path || !space' · 732cd7fd
      Marko Mäkelä authored
      After DISCARD TABLESPACE, the tablespace of a table will no longer
      exist, and dict_get_and_save_data_dir_path() would invoke
      dict_get_first_path() to read an entry from SYS_DATAFILES.
      For some reason, DISCARD TABLESPACE would not remove the entry
      from there.
      
      dict_get_and_save_data_dir_path(): If the tablespace has been
      discarded, do not bother trying to read the name.
      
      Side note: The tables SYS_TABLESPACES and SYS_DATAFILES are
      redundant and subject to removal in MDEV-22343.
    • Revert "MDEV-23776 Test encryption.create_or_replace fails with a warning" · eb38b1f7
      Marko Mäkelä authored
      This reverts commit e33f7b6f.
      The change seems to have introduced intermittent failures of the test
      encryption.innodb-bad-key-change on many platforms.
      
      The failure that we were trying to address was not reproduced on 10.2.
      It could be related to commit a7dd7c89
      (MDEV-23651) or de942c9f (MDEV-15983)
      or other changes that reduced contention on fil_system.mutex in 10.3.
      
      The fix that we are hereby reverting from 10.2 seems to work fine
      on 10.3 and 10.4.
    • systemd: mariadb@bootstrap - clear ExecStartPre and ExecStartPost · 4c192279
      Daniel Black authored
      This is just to make sure no ExecStartPre/Post actions from the
      multi-instance MariaDB service definition are executed
      when a user attempts to start mariadb@bootstrap.
      
      Fixes: 3723c70a
  2. 21 Sep, 2020 8 commits
  3. 18 Sep, 2020 1 commit
  4. 17 Sep, 2020 2 commits
  5. 16 Sep, 2020 2 commits
    • MDEV-21655 : galera.galera_wan_restart_ist MTR fails sporadically: WSREP did not transition to state READY · 96426dac
      Jan Lindström authored
      Replace sleeps with proper wait_conditions to wait for the correct
      cluster configuration.
    • MDEV-21839: Handle crazy offset to SHOW BINLOG EVENTS · 873cc1e7
      Sujatha authored
      Problem:
      =======
      SHOW BINLOG EVENTS FROM <"random"-pos> caused a variety of failures as
      reported in MDEV-18046. They are fixed, but that approach is not
      future-proof, and it is not optimal to add an extra check for each
      parameter of an event being constructed.
      
      Analysis:
      =========
      "show binlog events from <pos>" code considers the user given position as a
      valid event start position. The code starts reading data from this event start
      position onwards and tries to map it to a set of known events. Each event has
      a specific event structure, and assertions have been added to ensure that
      the event data that is read satisfies the event-specific requirements. When
      a random position is supplied to the "show binlog events" command, the
      event-structure-specific checks will fail and result in an assertion failure.
      
      For example: https://jira.mariadb.org/browse/MDEV-18046
      In the bug description user executes CREATE TABLE/INSERT and ALTER SQL
      commands.
      
      When a crazy offset like "SHOW BINLOG EVENTS FROM 365" is provided, the code
      assumes that offset 365 is a valid event start, proceeds to EVENT_LEN_OFFSET,
      reads some random length, and comes up with a crazy event that did not exist
      in the binary log. In this quoted example scenario, the event read at offset
      365 is considered an "Update_rows_log_event", which is not present in the
      binary log. Since this is a random event, its validation fails and the code
      results in an assert/segmentation fault, as shown below.
      
      mysqld: /data/src/10.4/sql/log_event.cc:10863: Rows_log_event::Rows_log_event(
          const char*, uint, const Format_description_log_event*):
          Assertion `var_header_len >= 2' failed.
          181220 15:27:02 [ERROR] mysqld got signal 6 ;
      #7  0x00007fa0d96abee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
      #8  0x000055e744ef82de in Rows_log_event::Rows_log_event (this=0x7fa05800d390,
          buf=0x7fa05800d080 "", event_len=254, description_event=0x7fa058006d60) at
      /data/src/10.4/sql/log_event.cc:10863
      #9  0x000055e744f00cf8 in Update_rows_log_event::Update_rows_log_event
      
      Since we are reading random data, repeating the same command SHOW BINLOG
      EVENTS FROM 365 produces different types of crashes with different events.
      MDEV-18046 reported 10 such crashes.
      
      In order to avoid such scenarios, the user-provided starting offset needs to
      be validated for correctness. The best way of doing this is to make use of
      checksums if they are available. The MDEV-18046 fix introduced the
      checksum-based validation.
      
      The issue still remains in cases where binlog checksums are disabled. See
      the following bug reports.
      
      MDEV-22473: binlog.binlog_show_binlog_event_random_pos failed in buildbot,
                  server crashed in read_log_event
      MDEV-22455: Server crashes in Table_map_log_event,
                  binlog.binlog_invalid_read_in_rotate failed in buildbot
      
      Fix:
      ====
      When binlog checksum is disabled, perform a scan (reading event by event) to
      validate the requested FROM <pos> offset. Starting from offset 4, read the
      event_length of the next event in the binary log. Using that event length,
      advance the current offset to point to the next event. Repeat this process
      as long as the current offset is less than the crazy offset. If the current
      offset ends up higher than the crazy offset, report an appropriate invalid
      input offset error.
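
The scan described in the fix can be sketched roughly as follows. This is a minimal illustration in Python, not the server's C++ code: the names `is_valid_event_start` and `make_event` are hypothetical, and the sketch assumes the usual 19-byte common event header with the 4-byte little-endian event length at offset 9 (EVENT_LEN_OFFSET) and the 4-byte magic at the start of the file.

```python
import struct

BINLOG_MAGIC = b"\xfebin"  # 4-byte magic; the first event starts at offset 4
EVENT_HEADER_LEN = 19      # assumed size of the common event header
EVENT_LEN_OFFSET = 9       # event_length: 4 bytes, little-endian


def make_event(payload_len: int) -> bytes:
    """Build a dummy event (hypothetical helper, for illustration only)."""
    total = EVENT_HEADER_LEN + payload_len
    # timestamp, type, server_id, event_length, next_pos, flags
    header = struct.pack("<IBIIIH", 0, 0, 1, total, 0, 0)
    return header + b"\x00" * payload_len


def is_valid_event_start(binlog: bytes, requested_pos: int) -> bool:
    """Walk the events from offset 4 and check that requested_pos is a boundary."""
    if binlog[:4] != BINLOG_MAGIC:
        raise ValueError("not a binlog file")
    pos = len(BINLOG_MAGIC)
    while pos < requested_pos:
        if pos + EVENT_HEADER_LEN > len(binlog):
            return False  # ran off the end before reaching requested_pos
        (event_len,) = struct.unpack_from("<I", binlog, pos + EVENT_LEN_OFFSET)
        if event_len < EVENT_HEADER_LEN:
            return False  # implausible length; stop instead of looping forever
        pos += event_len
    # Overshooting means requested_pos points into the middle of an event.
    return pos == requested_pos and pos <= len(binlog)


log = BINLOG_MAGIC + make_event(10) + make_event(5)
print(is_valid_event_start(log, 33), is_valid_event_start(log, 20))
```

In this toy log the events start at offsets 4 and 33, so offset 33 is accepted while offset 20 (inside the first event) is rejected.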
  6. 14 Sep, 2020 3 commits
  7. 11 Sep, 2020 4 commits
  8. 10 Sep, 2020 2 commits
    • MDEV-23101 : SIGSEGV in lock_rec_unlock() when Galera is enabled · 224c9504
      Jan Lindström authored
      Remove incorrect BF (brute force) handling from lock_rec_has_to_wait_in_queue
      and move condition to correct callers. Add a function to report
      BF lock waits and assert if incorrect BF-BF lock wait happens.
      
      wsrep_report_bf_lock_wait
      	Add a new function to report BF lock wait.
      
      wsrep_assert_no_bf_bf_wait
      	Add a new function to check whether we have a
      	BF-BF wait and, if so, report this case
      	and assert, as it is a bug.
      
      lock_rec_has_to_wait
      	Use new wsrep_assert_no_bf_bf_wait to check BF-BF wait.
      
      lock_rec_create_low
      lock_table_create
      	Use new function to report BF lock waits.
      
      lock_rec_insert_by_trx_age
      lock_grant_and_move_on_page
      lock_grant_and_move_on_rec
      	Assert that trx is not Galera as VATS is not compatible
      	with Galera.
      
      lock_rec_add_to_queue
      	If there is a conflicting lock in the queue, make sure that
      	the transaction is BF.
      
      lock_rec_has_to_wait_in_queue
      	Remove incorrect BF handling. If there are conflicting
      	locks in the queue, all transactions must wait.
      
      lock_rec_dequeue_from_page
      lock_rec_unlock
      	If there is a conflicting lock, make sure it is not
      	a BF-BF case.
      
      lock_rec_queue_validate
      	Add Galera record locking rules comment and use
      	new function to report BF lock waits.
      
      All attempts to reproduce the original assertion have
      failed. Therefore, there is no test case for this commit.
    • MDEV-18867 Long Time to Stop and Start · 75e82f71
      Thirunarayanan Balathandayuthapani authored
      fts_drop_orphaned_tables() takes a long time to remove the orphaned
      FTS tables. In order to reduce the time, do the following:
      
      - Traverse fil_system.space_list and construct a set of
      (table_id,index_id) of all FTS_*.ibd tablespaces.
      - Traverse the SYS_INDEXES table and remove the entry
      from the above collection if it exists.
      - The elements remaining in the collection can be considered
      orphaned FTS tables. Construct the table name from
      (table_id,index_id) and invoke fts_drop_tables().
      - Remove DICT_TF2_FTS_AUX_HEX_NAME flag usage from upgrade.
      - Add is_aux_table() in dict_table_t to check whether the given name
      is an FTS auxiliary table.
      - Add fts_space_set_t, a structure to store a set of parent table id
      and index id.
      - Remove unused FTS functions in fts0fts.cc.
      - Remove the fulltext index in the row_format_redundant test case,
      because it deals with the condition that SYS_TABLES has a
      corrupted entry and a valid entry exists in SYS_INDEXES.
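
The first three steps amount to a set difference, which might be sketched as below. This is only an illustration in Python: `find_orphans` is a hypothetical name, and the tablespace name pattern (two 16-digit hexadecimal ids followed by `_INDEX_<n>`) is an assumption made for the example. Auxiliary tables that carry only a table id are not handled here.

```python
import re

# Assumed name pattern: FTS_<table_id>_<index_id>_INDEX_<n>, ids as 16 hex digits.
AUX_INDEX_NAME = re.compile(r"FTS_([0-9a-f]{16})_([0-9a-f]{16})_INDEX_\d+$")


def find_orphans(tablespace_names, sys_indexes):
    """Return the (table_id, index_id) pairs that have no SYS_INDEXES entry."""
    # Step 1: collect the ids of all FTS auxiliary index tablespaces.
    candidates = set()
    for name in tablespace_names:
        match = AUX_INDEX_NAME.search(name)
        if match:
            candidates.add((int(match.group(1), 16), int(match.group(2), 16)))
    # Step 2: drop every pair that still has a live entry in SYS_INDEXES.
    for key in sys_indexes:
        candidates.discard(key)
    # Step 3: whatever remains is considered orphaned.
    return candidates


spaces = [
    "test/FTS_000000000000000a_0000000000000014_INDEX_1",
    "test/FTS_000000000000000a_0000000000000015_INDEX_1",
    "test/t1",
]
print(find_orphans(spaces, [(10, 20)]))
```

One pass over the tablespace list and one pass over SYS_INDEXES replace a per-table lookup, which is where the time saving would come from.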
  9. 09 Sep, 2020 7 commits
    • MDEV-23706 : Galera test failure on galera_autoinc_sst_mariabackup · 5c07ce40
      Jan Lindström authored
      Remove infinite procedure and use direct INSERTs.
    • MDEV-23456 fixup: Simplify a comparison · 0eb38243
      Marko Mäkelä authored
    • MDEV-22924 fixup: Replace C++11 auto · 040ae4c5
      Marko Mäkelä authored
    • MDEV-22924 fixup: Replace C++11 nullptr · d44c0f46
      Marko Mäkelä authored
      Only starting with MariaDB Server 10.4 we may depend on C++11.
    • MDEV-23685 SIGSEGV on ADD FOREIGN KEY after failed ADD KEY · 64c8fa58
      Marko Mäkelä authored
      dict_foreign_qualify_index(): Reject corrupted or garbage indexes.
      For index stubs that are created on virtual columns, no
      dict_field_t::col would be assigned. Instead, the entire table
      definition would be reloaded on a successful operation.
    • MDEV-23456 fixup: Fix mtr_t::get_fix_count() · c26eae0c
      Marko Mäkelä authored
      Before commit 05fa4558 (MDEV-22110),
      we have slot->type == MTR_MEMO_MODIFY entries that are unrelated to
      incrementing the buffer-fix count.
      
      FindBlock::operator(): In debug builds, skip MTR_MEMO_MODIFY entries.
      
      Also, simplify the code a little.
      
      This fixes an infinite loop in the tests
      innodb.innodb_defragment and innodb.innodb_wl6326_big.
    • MDEV-23456 fil_space_crypt_t::write_page0() is accessing an uninitialized page · b1009ae5
      Thirunarayanan Balathandayuthapani authored
      buf_page_create() is invoked when a page is initialized, so the
      previous contents of the page are ignored. In a few cases,
      buf_page_get_gen() is called to fetch the page from the buffer pool.
      It should take an x-latch on the page. If another thread uses the block,
      or the block I/O state is different from BUF_IO_NONE, then release the
      mutex and check the state and buffer-fix count again. For a compressed
      page, use the existing free block from the LRU list to create the new page.
      Retry fetching the compressed page if it is in the flush list.
      
      fseg_create(), fseg_create_general(): Introduce block as a parameter
      where segment header is placed. It is used to avoid repetitive
      x-latch on the same page
      
      Change the assert to check whether the page has an SX latch and
      X latch in all callee functions of buf_page_create()
      
      mtr_t::get_fix_count(): Get the buffer fix count of the given
      block added by the mtr
      
      FindBlock is added to find the buffer fix count of the given
      block acquired by the mini-transaction
  10. 07 Sep, 2020 3 commits
    • MDEV-22924 Corruption in MVCC read via secondary index · f99cace7
      Marko Mäkelä authored
      An unsafe optimization was introduced by
      commit 2347ffd8 (MDEV-20301)
      which is based on
      mysql/mysql-server@3f3136188f1bd383f77f97823cf6ebd72d5e4d7e or
      mysql/mysql-server@647a3814a91c3d3bffc70ddff5513398e3f37bd4
      in MySQL 8.0.12 or MySQL 8.0.13
      (which in turn is based on the contribution in MySQL Bug #84958).
      
      Row_sel_get_clust_rec_for_mysql::operator(): In addition to checking
      that the pointer to the record matches, also check the latest
      modification of the page (FIL_PAGE_LSN) as well as the page identifier.
      Only if all three match, it is safe to reuse cached_old_vers.
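
The three-part check can be pictured with a small cache keyed on all three values. This is a hedged sketch in Python rather than the actual InnoDB code: `CacheKey` and `OldVersionCache` are hypothetical names, and the integer "pointer" merely stands in for a record address in the buffer pool.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CacheKey:
    rec_ptr: int              # stand-in for the record pointer
    page_lsn: int             # FIL_PAGE_LSN when the version was cached
    page_id: Tuple[int, int]  # (space id, page number)


class OldVersionCache:
    """Single-entry cache of a previously built old record version."""

    def __init__(self) -> None:
        self._key: Optional[CacheKey] = None
        self._old_vers: Optional[str] = None

    def store(self, key: CacheKey, old_vers: str) -> None:
        self._key, self._old_vers = key, old_vers

    def lookup(self, key: CacheKey) -> Optional[str]:
        # Reuse only if pointer, page LSN and page id all match; a matching
        # pointer alone may refer to a different record after the page changed.
        return self._old_vers if self._key == key else None


cache = OldVersionCache()
cache.store(CacheKey(0x1000, 100, (5, 3)), "old version of pk=1")
print(cache.lookup(CacheKey(0x1000, 101, (5, 3))))
```

Here the lookup misses because the page LSN changed, even though the pointer is the same, so the caller would rebuild the old version instead of returning a stale one.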
      
      Row_sel_get_clust_rec_for_mysql::check_eq(): Assert that the PRIMARY KEY
      of the cached old version of the record corresponds to the latest version.
      
      We got a test case where CHECK TABLE, UPDATE and purge would be
      hammering on the same table (with only 6 rows) and a pointer that
      was originally pointing to a record pk=2 would match a cached_clust_rec
      that was pointing to a record pk=1. In the diagnosed `rr replay` trace,
      we would wrongly return an old cached version of the pk=1 record,
      instead of retrieving the correct version of the pk=2 record. Because
      of this, CHECK TABLE would fail to count one of the records in a
      secondary index, and report failure.
      
      This bug appears to affect MVCC reads via secondary indexes only.
      The purge of history in secondary indexes uses a different code path,
      and so do checks for implicit record locks.
    • MDEV-9501: rpl.rpl_binlog_index, rpl.rpl_gtid_crash, rpl.rpl_stm_multi_query fail sporadically in buildbot with Master command COM_REGISTER_SLAVE failed · a8f6bbb7
      Sujatha authored
      Analysis:
      ========
      The slave server sends the COM_REGISTER_SLAVE command when establishing
      a connection to the master. If the master is down, the command fails and
      a "COM_REGISTER_SLAVE failed" warning is reported.
      
      'rpl_binlog_index.test' shuts down the master, relocates the binary logs to
      a new location, and attempts to start the master by pointing 'log-bin' to the
      new location. During this process the slave threads are active. The IO thread
      actively checks for the presence of the master. When it finds that the
      connection is lost, it attempts a reconnect; as the master is down, the
      COM_REGISTER_SLAVE command fails.
      
      As part of the fix, stop the slave threads, then shut down the master and do
      the binlog relocation. Once the master is restarted, start the slave threads
      and sync them with the master. In the test, the binary logs and index files on
      the master are relocated to /tmpdir, but during master restart only the
      --log-bin option is provided. This is incorrect: --log-bin-index should also
      point to /tmpdir; otherwise, upon master restart, two index files will be
      created: one master-bin.index in /tmpdir and a new master-bin.index as per
      log_basename in datadir. Due to this, the slave will fail to connect to the
      master.
      
      'rpl_gtid_crash.test' tests the following scenario: "crashing master, causing
      slave IO thread to reconnect while SQL thread is running". When the IO thread
      tries to connect to the crashed master on slow platforms, the
      COM_REGISTER_SLAVE command fails. This is expected, hence the warning should
      be added to the suppression list.
    • MDEV-7098 spider/bg.spider_fixes failed in buildbot with safe_mutex: Trying to unlock mutex conn->mta_conn_mutex that wasn't locked at storage/spider/spd_db_conn.cc, line 671 · 420c4dcc
      Kentoku SHIBA authored