1. 19 Oct, 2018 16 commits
    • Merge 10.4 into 10.4-mdev-15562 · fae9b711
      Marko Mäkelä authored
      This branch exists just for the record, for preserving the
      development history, should it ever be needed.
    • MDEV-15562 Instant DROP COLUMN or changing the order of columns · 0e5a4ac2
      Marko Mäkelä authored
      Allow ADD COLUMN anywhere in a table, not only adding as the
      last column.
      
      Allow instant DROP COLUMN and instant changing the order of columns.
      
      The added columns will always be added last in clustered index records.
      In new records, instantly dropped columns will be stored as NULL or
      empty when possible.
      
      Information about dropped and reordered columns will be written in
      a metadata BLOB (mblob), which is stored before the first 'user' field
      in the hidden metadata record at the start of the clustered index.
      The presence of mblob is indicated by setting the delete-mark flag in
      the metadata record.
      
      The metadata BLOB stores the number of clustered index fields,
      followed by an array of column information for each field.
      For dropped columns, we store the NOT NULL flag, the fixed length,
      and for variable-length columns, whether the maximum length exceeded
      255 bytes. For non-dropped columns, we store the column position.
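
      The layout described above can be sketched as follows. This is only an
      illustration, not the actual on-disk format: the struct FieldInfo, the
      bit assignments, the 4-byte count and the 2-byte big-endian entries are
      all assumptions made for the example.

```cpp
#include <cassert>   // only needed for the usage checks
#include <cstdint>
#include <vector>

// Hypothetical description of one clustered index field (not an InnoDB type).
struct FieldInfo {
    bool dropped;        // was the column instantly dropped?
    bool not_null;       // dropped columns only: NOT NULL flag
    uint16_t fixed_len;  // dropped columns only: fixed length, 0 if variable-length
    bool long_var;       // dropped variable-length columns only: max length > 255 bytes
    uint16_t position;   // non-dropped columns only: column position
};

// Assumed bit assignments inside a 16-bit entry.
constexpr uint16_t DROPPED = 1U << 15;
constexpr uint16_t NOT_NULL = 1U << 14;
constexpr uint16_t LONG_VAR = 1U << 13;
constexpr uint16_t LOW_MASK = (1U << 13) - 1;

// Encode one field: a position for surviving columns, flags and length for dropped ones.
inline uint16_t encode_field(const FieldInfo& f) {
    if (!f.dropped) {
        return static_cast<uint16_t>(f.position & LOW_MASK);
    }
    uint16_t v = DROPPED;
    if (f.not_null) v = static_cast<uint16_t>(v | NOT_NULL);
    if (f.fixed_len != 0) {
        v = static_cast<uint16_t>(v | (f.fixed_len & LOW_MASK));
    } else if (f.long_var) {
        v = static_cast<uint16_t>(v | LONG_VAR);
    }
    return v;
}

// Serialise the field count (4 bytes) followed by one 2-byte entry per field.
inline std::vector<uint8_t> serialise(const std::vector<FieldInfo>& fields) {
    std::vector<uint8_t> out;
    const uint32_t n = static_cast<uint32_t>(fields.size());
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<uint8_t>(n >> shift));
    }
    for (const FieldInfo& f : fields) {
        const uint16_t v = encode_field(f);
        out.push_back(static_cast<uint8_t>(v >> 8));
        out.push_back(static_cast<uint8_t>(v & 0xff));
    }
    return out;
}
```

      With this assumed encoding, a table with one surviving column at
      position 2 and one dropped NOT NULL column of fixed length 4 would
      serialise to 8 bytes.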
      
      Unlike with MDEV-11369, when a table becomes empty, it cannot
      be converted back to the canonical format. The reason for this is
      that other threads may hold cached objects such as
      row_prebuilt_t::ins_node that could refer to dropped or reordered
      index fields.
      
      For instant DROP COLUMN and ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC,
      we must store the n_core_null_bytes in the root page, so that the
      chain of node pointer records can be followed in order to reach the
      leftmost leaf page where the metadata record is located.
      If the mblob is present, we will zero-initialize the strings
      "infimum" and "supremum" in the root page, and use the last byte of
      "supremum" for storing the number of null bytes (which are allocated
      but useless on node pointer pages). This is necessary for
      btr_cur_instant_init_metadata() to be able to navigate to the mblob.
      
      If the PRIMARY KEY contains any variable-length column and some
      nullable columns were instantly dropped, the dict_index_t::n_nullable
      in the data dictionary could be smaller than it actually is in the
      non-leaf pages. Because of this, the non-leaf pages could use more
      bytes for the null flags than the data dictionary expects, and we
      could be reading the lengths of the variable-length columns from the
      wrong offset, and thus reading the child page number from the wrong place.
      This is the result of two design mistakes that involve unnecessary
      storage of data: First, it is nonsense to store any data fields for
      the leftmost node pointer records, because the comparisons would be
      resolved by the MIN_REC_FLAG alone. Second, there cannot be any null
      fields in the clustered index node pointer fields, but we nevertheless
      reserve space for all the null flags.
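
      The offset problem above comes down to simple arithmetic, sketched here.
      The function names are hypothetical; the only fact relied upon is that
      one null flag bit is reserved per nullable field, rounded up to whole
      bytes.

```cpp
#include <cassert>   // only needed for the usage checks
#include <cstddef>

// Bytes needed to hold one null flag bit per nullable field.
constexpr std::size_t null_flag_bytes(std::size_t n_nullable) {
    return (n_nullable + 7) / 8;
}

// How many bytes off a reader would be if it used the dictionary's
// n_nullable while the page was written with a larger count.
// Returns 0 when the page does not use more null flag bytes.
constexpr std::size_t null_flag_skew(std::size_t n_nullable_on_page,
                                     std::size_t n_nullable_in_dict) {
    return null_flag_bytes(n_nullable_on_page) > null_flag_bytes(n_nullable_in_dict)
        ? null_flag_bytes(n_nullable_on_page) - null_flag_bytes(n_nullable_in_dict)
        : 0;
}
```

      For example, dropping one of nine nullable columns shrinks the expected
      null flags from two bytes to one, so a reader relying on the dictionary
      alone would look for the variable-length column lengths one byte off.
      Storing n_core_null_bytes in the root page avoids that.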
      
      Limitations (future work):
      
      MDEV-17459 Allow instant ALTER TABLE even if FULLTEXT INDEX exists
      MDEV-17468 Avoid table rebuild on operations on generated columns
      MDEV-17494 Refuse ALGORITHM=INSTANT when the row size is too large
      
      btr_page_reorganize_low(): Preserve any metadata in the root page.
      Call lock_move_reorganize_page() only after restoring the "infimum"
      and "supremum" records, to avoid a memcmp() assertion failure.
      
      dict_col_t::DROPPED: Magic value for dict_col_t::ind.
      
      dict_col_t::clear_instant(): Renamed from dict_col_t::remove_instant().
      Do not assert that the column was instantly added, because we
      sometimes call this unconditionally for all columns.
      Convert an instantly added column to a "core column". The old name
      remove_instant() could be mistaken to refer to "instant DROP COLUMN".
      
      dict_col_t::is_added(): Renamed from dict_col_t::is_instant().
      
      dtype_t::metadata_blob_init(): Initialize the mblob data type.
      
      dtuple_t::is_metadata(), dtuple_t::is_alter_metadata(),
      upd_t::is_metadata(), upd_t::is_alter_metadata(): Check if info_bits
      refer to a metadata record.
      
      dict_table_t::instant: Metadata about dropped or reordered columns.
      
      dict_table_t::prepare_instant(): Prepare
      ha_innobase_inplace_ctx::instant_table for instant ALTER TABLE.
      innobase_instant_try() will pass this to dict_table_t::instant_column().
      On rollback, dict_table_t::rollback_instant() will be called.
      
      dict_table_t::instant_column(): Renamed from instant_add_column().
      Add the parameter col_map so that columns can be reordered.
      Copy and adjust v_cols[] as well.
      
      dict_table_t::find(): Find an old column based on a new column number.
      
      dict_table_t::serialise_columns(), dict_table_t::deserialise_columns():
      Convert the mblob.
      
      dict_index_t::instant_metadata(): Create the metadata record
      for instant ALTER TABLE. Invoke dict_table_t::serialise_columns().
      
      dict_index_t::reconstruct_fields(): Invoked by
      dict_table_t::deserialise_columns().
      
      dict_index_t::clear_instant_alter(): Move the fields for the
      dropped columns to the end, and sort the surviving index fields
      in ascending order of column position.
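
      The reordering described for clear_instant_alter() can be sketched with
      standard algorithms. The struct Field and the function name are
      hypothetical, and the sketch reorders the whole range it is given; the
      real function presumably leaves the key and system column prefix alone.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the usage checks
#include <vector>

// Hypothetical stand-in for an index field (not dict_field_t).
struct Field {
    unsigned col_pos;  // position of the column in the table
    bool dropped;      // was the column instantly dropped?
};

// Move fields of dropped columns to the end, then sort the surviving
// fields in ascending order of column position.
inline void move_dropped_last(std::vector<Field>& fields) {
    auto first_dropped = std::stable_partition(
        fields.begin(), fields.end(),
        [](const Field& f) { return !f.dropped; });
    std::sort(fields.begin(), first_dropped,
              [](const Field& a, const Field& b) { return a.col_pos < b.col_pos; });
}
```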
      
      ha_innobase::check_if_supported_inplace_alter(): Do not allow
      adding a FTS_DOC_ID column if a hidden FTS_DOC_ID column exists
      due to FULLTEXT INDEX. (This always required ALGORITHM=COPY.)
      
      instant_alter_column_possible(): Add a parameter for InnoDB table,
      to check for additional conditions, such as the maximum number of
      index fields.
      
      ha_innobase_inplace_ctx::first_alter_pos: The first column whose position
      is affected by instant ADD, DROP, or changing the order of columns.
      
      innobase_build_col_map(): Skip added virtual columns.
      
      prepare_inplace_add_virtual(): Correctly compute num_to_add_vcol.
      Remove some unnecessary code. Note that the call to
      innodb_base_col_setup() should be executed later.
      
      commit_try_norebuild(): If ctx->is_instant(), let the virtual
      columns be added or dropped by innobase_instant_try().
      
      innobase_instant_try(): Fill in a zero default value for the
      hidden column FTS_DOC_ID (to reduce the work needed in MDEV-17459).
      If any columns were dropped or reordered (or added not last),
      delete any SYS_COLUMNS records for the following columns, and
      insert SYS_COLUMNS records for all subsequent stored columns as well
      as for all virtual columns. If any virtual column is dropped, rewrite
      all virtual column metadata. Use a shortcut only for adding
      virtual columns. This is because innobase_drop_virtual_try()
      assumes that the dropped virtual columns still exist in ctx->old_table.
      
      innodb_update_cols(): Renamed from innodb_update_n_cols().
      
      innobase_add_one_virtual(), innobase_insert_sys_virtual(): Change
      the return type to bool, and invoke my_error() when detecting an error.
      
      innodb_insert_sys_columns(): Insert a record into SYS_COLUMNS.
      Refactored from innobase_add_one_virtual() and innobase_instant_add_col().
      
      innobase_instant_add_col(): Replace the parameter dfield with type.
      
      innobase_instant_drop_cols(): Drop matching columns from SYS_COLUMNS
      and all columns from SYS_VIRTUAL.
      
      innobase_add_virtual_try(), innobase_drop_virtual_try(): Let
      the caller invoke innodb_update_cols().
      
      innobase_rename_column_try(): Skip dropped columns.
      
      commit_cache_norebuild(): Update table->fts->doc_col.
      
      dict_mem_table_col_rename_low(): Skip dropped columns.
      
      trx_undo_rec_get_partial_row(): Skip dropped columns.
      
      trx_undo_update_rec_get_update(): Handle the metadata BLOB correctly.
      
      trx_undo_page_report_modify(): Avoid out-of-bounds access to record fields.
      Log metadata records consistently.
      Apparently, the first fields of a clustered index may be updated
      in an update_undo vector when the index is ID_IND of SYS_FOREIGN,
      as part of renaming the table during ALTER TABLE. Normally, updates of
      the PRIMARY KEY should be logged as delete-mark and an insert.
      
      row_undo_mod_parse_undo_rec(), row_purge_parse_undo_rec():
      Use trx_undo_metadata.
      
      row_undo_mod_clust_low(): On metadata rollback, roll back the root page too.
      
      row_undo_mod_clust(): Relax an assertion. The delete-mark flag was
      repurposed for ALTER TABLE metadata records.
      
      row_rec_to_index_entry_impl(): Add the template parameter mblob
      and the optional parameter info_bits for specifying the desired new
      info bits. For the metadata tuple, allow conversion between the original
      format (ADD COLUMN only) and the generic format (with hidden BLOB).
      Add the optional parameter "pad" to determine whether the tuple should
      be padded to the index fields (on ALTER TABLE it should), or whether
      it should remain at its original size (on rollback).
      
      row_build_index_entry_low(): Clean up the code, removing
      redundant variables and conditions. For instantly dropped columns,
      generate a dummy value that is NULL, the empty string, or a
      fixed length of NUL bytes, depending on the type of the dropped column.
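
      A minimal sketch of the dummy value choice follows. The struct
      DroppedCol and the function dummy_value() are hypothetical, and the
      precedence (NULL whenever the column is nullable) is an assumption
      based on "stored as NULL or empty when possible".

```cpp
#include <cassert>   // only needed for the usage checks
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical summary of what is remembered about a dropped column.
struct DroppedCol {
    bool not_null;          // was the column declared NOT NULL?
    std::size_t fixed_len;  // fixed length in bytes, 0 if variable-length
};

// Returns std::nullopt for SQL NULL, otherwise the bytes to store.
inline std::optional<std::string> dummy_value(const DroppedCol& c) {
    if (!c.not_null) {
        return std::nullopt;  // nullable: store NULL
    }
    // NOT NULL: empty string if variable-length, else fixed_len NUL bytes.
    return std::string(c.fixed_len, '\0');
}
```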
      
      row_upd_clust_rec_by_insert_inherit_func(): On the update of PRIMARY KEY
      of a record that contained a dropped column whose value was stored
      externally, we will be inserting a dummy NULL or empty string value
      to the field of the dropped column. The externally stored column would
      eventually be dropped when purge removes the delete-marked record for
      the old PRIMARY KEY value.
      
      btr_index_rec_validate(): Recognize the metadata record.
      
      btr_discard_only_page_on_level(): Preserve the generic instant
      ALTER TABLE metadata.
      
      btr_set_instant(): Replaces page_set_instant(). This sets a clustered
      index root page to the appropriate format, or upgrades from
      the MDEV-11369 instant ADD COLUMN to generic ALTER TABLE format.
      
      btr_cur_instant_init_low(): Read and validate the metadata BLOB page
      before reconstructing the dictionary information based on it.
      
      btr_cur_instant_init_metadata(): Do not read any lengths from the
      metadata record header before reading the BLOB. At this point, we
      would not actually know how many nullable fields the metadata record
      contains.
      
      btr_cur_instant_root_init(): Initialize n_core_null_bytes in one
      of two possible ways.
      
      btr_cur_trim(): Handle the mblob record.
      
      row_metadata_to_tuple(): Convert a metadata record to a data tuple,
      based on the new info_bits of the metadata record.
      
      btr_cur_pessimistic_update(): Invoke row_metadata_to_tuple() if needed.
      Invoke dtuple_convert_big_rec() for metadata records if the record is
      too large, or if the mblob is not yet marked as externally stored.
      
      btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete():
      When the last user record is deleted, do not delete the
      generic instant ALTER TABLE metadata record. Only delete
      MDEV-11369 instant ADD COLUMN metadata records.
      
      btr_cur_optimistic_insert(): Avoid unnecessary computation of rec_size.
      
      btr_pcur_store_position(): Allow a logically empty page to contain
      a metadata record for generic ALTER TABLE.
      
      REC_INFO_DEFAULT_ROW_ADD: Renamed from REC_INFO_DEFAULT_ROW.
      This is for the old instant ADD COLUMN (MDEV-11369) only.
      
      REC_INFO_DEFAULT_ROW_ALTER: The more generic metadata record,
      with additional information for dropped or reordered columns.
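
      The relationship between the two kinds of metadata records can be
      sketched as bit tests. The constant names and numeric values below are
      illustrative assumptions, not copied from the InnoDB headers; the only
      point taken from this commit is that the delete-mark flag distinguishes
      the generic ALTER metadata record from the ADD COLUMN one.

```cpp
#include <cassert>   // only needed for the usage checks

// Illustrative values; the real definitions live in the InnoDB record headers.
constexpr unsigned MIN_REC_FLAG = 0x10;   // assumed "minimum record" info bit
constexpr unsigned DELETED_FLAG = 0x20;   // assumed delete-mark info bit
constexpr unsigned STATUS_INSTANT = 0x04; // assumed status of a metadata record

constexpr unsigned DEFAULT_ROW_ADD = MIN_REC_FLAG | STATUS_INSTANT;
constexpr unsigned DEFAULT_ROW_ALTER = DEFAULT_ROW_ADD | DELETED_FLAG;

// True for either kind of metadata record.
constexpr bool is_metadata(unsigned info_bits) {
    return (info_bits & ~DELETED_FLAG) == DEFAULT_ROW_ADD;
}

// True only for the generic record that carries the metadata BLOB.
constexpr bool is_alter_metadata(unsigned info_bits) {
    return info_bits == DEFAULT_ROW_ALTER;
}
```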
      
      rec_info_bits_valid(): Remove. The only case when this would fail
      is when the record is the generic ALTER TABLE metadata record.
      
      rec_is_alter_metadata(): Check if a record is the metadata record
      for instant ALTER TABLE (other than ADD COLUMN). NOTE: This function
      must not be invoked on node pointer records, because the delete-mark
      flag in those records may be set (it is garbage), and then a debug
      assertion could fail because index->is_instant() does not necessarily
      hold.
      
      rec_is_add_metadata(): Check if a record is the MDEV-11369 ADD COLUMN
      metadata record (not the more generic instant ALTER TABLE one).
      
      rec_get_converted_size_comp_prefix_low(): Assume that the metadata
      field will be stored externally. In dtuple_convert_big_rec() during
      the rec_get_converted_size() call, it would not be there yet.
      
      rec_get_converted_size_comp(): Replace status, fields, n_fields with tuple.
      
      rec_init_offsets_comp_ordinary(), rec_get_converted_size_comp_prefix_low(),
      rec_convert_dtuple_to_rec_comp(): Add template<bool mblob = false>.
      With mblob=true, process a record with a metadata BLOB.
      
      rec_copy_prefix_to_buf(): Assert that no fields beyond the key and
      system columns are being copied. Exclude the metadata BLOB field.
      
      rec_convert_dtuple_to_metadata_comp(): Convert an alter metadata tuple
      into a record.
      
      row_upd_index_replace_metadata(): Apply an update vector to an
      alter_metadata tuple.
      
      row_log_allocate(): Replace dict_index_t::is_instant()
      with a more appropriate condition that ignores dict_table_t::instant.
      Only a table on which the MDEV-11369 ADD COLUMN was performed
      can "lose its instantness" when it becomes empty. After
      instant DROP COLUMN or reordering columns, we cannot simply
      convert the table to the canonical format, because the data
      dictionary cache and all possibly existing references to it
      from other client connection threads would have to be adjusted.
      
      row_quiesce_write_index_fields(): Do not crash when the table contains
      an instantly dropped column.
      
      Thanks to Thirunarayanan Balathandayuthapani for discussing the design
      and implementing an initial prototype of this.
      Thanks to Matthias Leich for testing.
    • Add one more test · d86bdc97
      Marko Mäkelä authored
    • Merge 10.4 into 10.4-mdev-15562 · 6b5b6225
      Marko Mäkelä authored
    • Fix a harmless debug assertion failure · 4c800788
      Marko Mäkelä authored
      btr_page_reorganize_low(): Call lock_move_reorganize_page()
      only after restoring the "infimum" and "supremum" records,
      to avoid an assertion failure that these records differ.
    • Marko Mäkelä's avatar
      927ceb14
    • Introduce dict_table_t::prepare_instant() · 724f4f9d
      Marko Mäkelä authored
    • Marko Mäkelä's avatar
      5bb0259c
    • Marko Mäkelä's avatar
    • MDEV-17502 MDEV-17474 Change Unicode xxx_general_ci and xxx_bin collation... · a8efe7ab
      Alexander Barkov authored
      MDEV-17502 MDEV-17474 Change Unicode xxx_general_ci and xxx_bin collation implementation to "inline" style
    • Fix a bug in metadata record creation · d5fa3dda
      Marko Mäkelä authored
      dict_index_t::instant_metadata(): For instantly dropped NOT NULL
      variable-length columns, store an empty string for the default value.
    • Merge 10.4 into 10.4-mdev-15562 · 8fe0211a
      Marko Mäkelä authored
    • Merge 10.3 into 10.4 · 1bb90411
      Marko Mäkelä authored
    • Merge 10.2 into 10.3 · 1595ff8a
      Marko Mäkelä authored
    • MDEV-17466: Remove the debug assertion · ab1ce220
      Marko Mäkelä authored
      This reverts commit 2d4075e1
      where the debug assertion was added. There seems to be a potential
      problem in the purge of indexes that depend on virtual columns.
      
      Ultimately, we should change the InnoDB undo log format so that
      all actual secondary index keys are stored there, also for
      virtual or spatial indexes. In that way, purge and rollback would
      be more straightforward.
    • Remove unused TIMETPF · abbf169f
      Marko Mäkelä authored
  2. 18 Oct, 2018 18 commits
  3. 17 Oct, 2018 6 commits
    • Replace FIXME comments · b29e729d
      Marko Mäkelä authored
    • Merge 10.4 into 10.4-mdev-15562 · 68dc0535
      Marko Mäkelä authored
    • Merge 10.2 into 10.3 · f454189c
      Marko Mäkelä authored
    • Merge 10.3 into 10.4 · d88c136b
      Marko Mäkelä authored
    • MDEV-17483 Insert on delete-marked record can wrongly inherit old values for instantly added column · 2fa4ed03
      Marko Mäkelä authored
      row_ins_clust_index_entry_low(): Do not call dtuple_t::trim()
      before row_ins_clust_index_entry_by_modify(), so that the values
      of all columns will be available in row_upd_build_difference_binary().
      If applicable, the tuple can be trimmed in btr_cur_optimistic_update()
      or btr_cur_pessimistic_update(), which will be called by
      row_ins_clust_index_entry_by_modify().
    • MDEV-13564: Set innodb_safe_truncate=ON by default · 853a0a43
      Marko Mäkelä authored
      The setting innodb_safe_truncate=ON reduces compatibility with older
      versions of MariaDB and backup tools in two ways.
      
      First, we will be writing TRX_UNDO_RENAME_TABLE records, which older
      versions do not know about. These records could be misinterpreted if
      a DDL transaction was recovered and would be rolled back.
      Such rollback is only possible if the server was killed while
      an incomplete DDL transaction was persisted. On transaction completion,
      the insert_undo log pages would only be repurposed for new undo log
      allocations, and their contents would not matter. So, older versions
      will not have a problem with innodb_safe_truncate=ON if the server was
      shut down cleanly.
      
      Second, to prevent such recovery failure, innodb_safe_truncate=ON will
      cause a modification of the redo log format identifier, which will
      prevent older versions from starting up after a crash. MariaDB Server
      versions older than 10.2.13 will refuse to start up altogether, even
      after clean shutdown.
      
      A server restart with innodb_safe_truncate=OFF will restore compatibility
      with older server and backup versions.