- 19 May, 2021 40 commits
-
Monty authored
MDEV-25604 Atomic DDL: Binlog event written upon recovery does not have default database

The purpose of this task is to ensure that ALTER TABLE is atomic even if the MariaDB server is killed at any point of the alter table. This means that either the ALTER TABLE succeeds (including that triggers, the status tables and the binary log are updated) or things are reverted to their original state (see the short SQL illustration after this entry).

If the server crashes before the new version is fully up to date and committed, it will revert to the original table and remove all temporary files and tables. If the new version is committed, crash recovery will use the new version, and update triggers, the status tables and the binary log.

The one exception is ALTER TABLE .. RENAME .., where no changes are done to the table definition. This one will work as RENAME and roll back unless the whole statement completed, including updating the binary log (if enabled).

Other changes:
- Added a handlerton->check_version() function to allow the ddl recovery code to check, in case of inplace alter table, if the table in the storage engine is of the new or old version.
- Added handler->table_version() so that an engine can report the current version of the table. This should be changed each time the table definition changes.
- Added ha_signal_ddl_recovery_done() and handlerton::signal_ddl_recovery_done() to inform all handlers when ddl recovery has been done. (Needed by InnoDB.)
- Added the handlerton call inplace_alter_table_committed, to signal the engine that the ddl_log has been closed for the alter table query.
- Added the new handlerton flag HTON_REQUIRES_NOTIFY_TABLEDEF_CHANGED_AFTER_COMMIT to signal when we should call hton->notify_tabledef_changed() during mysql_inplace_alter_table. This was required as MyRocks and InnoDB needed the call at different times.
- Added the function server_uuid_value() to be able to generate a temporary xid when ddl recovery writes the query to the binary log. This is needed to be able to handle crashes during ddl log recovery.
- Moved freeing of the frm definition to the end of mysql_alter_table() to remove duplicate code and have a common exit strategy.

InnoDB part of atomic ALTER TABLE (implemented by Marko Mäkelä):
- innodb_check_version(): Compare the saved dict_table_t::def_trx_id to determine whether an ALTER TABLE operation was committed. We must correctly recover dict_table_t::def_trx_id for this to work. Before purge removes any trace of DB_TRX_ID from the system tables, it will make an effort to load the user table into the cache, so that dict_table_t::def_trx_id can be recovered.
- ha_innobase::table_version(): return garbage, or the trx_id that would be used for committing an ALTER TABLE operation.
- In InnoDB, table names starting with #sql-ib will remain special: they will be dropped on startup. This may be revisited later in MDEV-18518 when we implement proper undo logging and rollback for creating or dropping multiple tables in a transaction.
- Table names starting with #sql will retain some special meaning: dict_table_t::parse_name() will not consider such names for MDL acquisition, and dict_table_rename_in_cache() will treat such names specially when handling FOREIGN KEY constraints.
- Simplify InnoDB DROP INDEX.
- Prevent purge wakeup: To ensure that dict_table_t::def_trx_id will be recovered correctly in case the server is killed before ddl_log_complete(), we block the purge of any history in SYS_TABLES, SYS_INDEXES and SYS_COLUMNS between ha_innobase::commit_inplace_alter_table(commit=true) (purge_sys.stop_SYS()) and purge_sys.resume_SYS(). The completion callback purge_sys.resume_SYS() must be between ddl_log_complete() and MDL release.

MyRocks support for atomic ALTER TABLE (implemented by Sergei Petrunia). Implement these SE API functions:
- ha_rocksdb::table_version()
- hton->check_version = rocksdb_check_version
The MyRocks data dictionary now stores a table version for each table. (Absence of a table version record is interpreted as table_version=0, which means no upgrade changes are needed.)
- For inplace alter table of a partitioned table, call the underlying handlerton when checking if the table is ok. This assumes that the partition engine commits all changes at once.
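A hedged SQL illustration of the behaviour described above (the table and column names are made up for this sketch):

    -- With this patch, if the server is killed at any point while the ALTER
    -- TABLE below runs, recovery leaves either the original table or the
    -- fully committed new definition (triggers, status tables and binary log
    -- updated), not a half-finished intermediate state.
    CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
    ALTER TABLE t1 ADD COLUMN b INT, ALGORITHM=INPLACE;
    -- The documented exception: a pure rename behaves like RENAME TABLE and
    -- rolls back unless the whole statement, including the binary log write,
    -- completed.
    ALTER TABLE t1 RENAME TO t2;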
-
Monty authored
ALTER TABLE .. RENAME, when used with the inplace algorithm, does:
- an inplace or online alter to the new definition
- a rename to the new name
- an update of the triggers

If updating the triggers failed, we would rename the table back. The problem with this approach is that the table would already have the new definition while the rename would fail. The binary log would also not be updated.

The solution is to check very early whether the triggers can be renamed and give an error if this would fail. Both ALTER TABLE ... RENAME and RENAME TABLE are fixed. This was implemented by moving the pre-check of the table rename in triggers from Table_triggers_list::change_table_name() to Table_triggers_list::prepare_for_rename().
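A hedged sketch of the two statement forms affected (names are made up):

    -- Both statements below go through the same early check
    -- (Table_triggers_list::prepare_for_rename()), so a trigger that cannot
    -- be renamed produces an error before the table definition is touched.
    CREATE TABLE t1 (a INT);
    CREATE TRIGGER trg1 BEFORE INSERT ON t1 FOR EACH ROW SET @x := NEW.a;
    ALTER TABLE t1 RENAME TO t2;
    RENAME TABLE t2 TO t1;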
-
Monty authored
Before this fix, one would get a 'Trigger ... already exists' error when trying to create a trigger matching the original name and a 'Trigger ... does not exist' error when trying to drop it. Fixes a bug reported in MDEV-25180 Atomic ALTER TABLE.

MDEV-25517 Atomic DDL: Assertion `query_arg' in THD::binlog_query upon DROP TRIGGER
The bug was that the stmt_query variable was not populated with the query in the case of DROP TRIGGER of an orphan trigger (the .TRN file exists and the table exists, but the trigger was not in table->triggers).
-
Monty authored
The purpose of this task is to ensure that CREATE TRIGGER is atomic.

When a trigger is created, we first create a trigger_name.TRN file and then create or update the table_name.TRG file. This is done by creating .TRN~ and .TRG~ files and replacing (or creating) the result files.

The new logic is:
- Log CREATE TRIGGER to the DDL log, with a marker if an old trigger existed
- If old .TRN or .TRG files exist, make backup copies of these
- Create the new .TRN and .TRG files as before
- Remove the backups

Crash recovery:
- If the query has been logged to the binary log:
  - delete any left over backup files
- else:
  - Delete any old .TRN~ or .TRG~ files
  - If there were originally some triggers (an old .TRG file existed):
    - If we crashed before creating all backup files:
      - Delete the existing backup files
    - else:
      - Restore the backup files
  - else:
    - Delete the .TRN and .TRG files (as there were no triggers before)

One benefit of the new code is that CREATE OR REPLACE TRIGGER is now totally atomic even if an old trigger existed: either the old trigger will be replaced or it will be left untouched.

Other things:
- If sql_create_definition_file() failed, there could be memory leaks in CREATE TRIGGER, DROP TRIGGER or CREATE OR REPLACE TRIGGER. This is now fixed.
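A minimal sketch of the atomic replace case described above (names are made up):

    -- With the new logic, the CREATE OR REPLACE below either installs the new
    -- definition of trg1 or, after a crash, recovery restores the .TRN/.TRG
    -- backups and the original trigger is left untouched.
    CREATE TABLE t1 (a INT);
    CREATE TRIGGER trg1 BEFORE INSERT ON t1 FOR EACH ROW SET NEW.a = 1;
    CREATE OR REPLACE TRIGGER trg1 BEFORE INSERT ON t1 FOR EACH ROW SET NEW.a = 2;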
-
Monty authored
The logic of the new code is:
- Log CREATE VIEW to the DDL log, with a marker if an old view existed
- If an old view exists (in the case of CREATE OR REPLACE VIEW), make a copy of the old view as view_name.frm-
- Create the new view definition file
- Delete the copy of the view if it was created

Crash recovery:
- Delete the view_name.frm~ file (temporary file for the view definition)
- If the query was logged to the binary log:
  - Delete the copy of the view if it exists
- else:
  - Rename the copy of the view over the .frm file (restoring the old definition)

One benefit of the new code is that CREATE OR REPLACE VIEW for an existing view is now fully atomic: either the view will be replaced or the old one will be left unchanged.
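A small hedged example of the now-atomic view replacement (names are made up):

    -- CREATE OR REPLACE VIEW over an existing view either installs the new
    -- definition file or, on crash recovery, the saved copy is renamed back
    -- over the .frm file.
    CREATE TABLE t1 (a INT);
    CREATE VIEW v1 AS SELECT a FROM t1;
    CREATE OR REPLACE VIEW v1 AS SELECT a + 1 AS a FROM t1;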
-
Monty authored
There are a few different cases to consider.

Logging of CREATE TABLE and CREATE TABLE ... LIKE:
- If REPLACE is used and there was an existing table, DDL log the drop of the table.
- If discovery of the table is to be done:
  - DDL log create table
- else:
  - DDL log create table (with engine type)
- Create the table
- If the table was created:
  - Log an entry to the binary log with xid
  - Mark the DDL log entry completed

Crash recovery:
- If the query is in the binary log, do nothing and exit
- If it was a discovered table:
  - Delete the .frm file
- else:
  - Drop the created table and the .frm file
- If the table was dropped, write a DROP TABLE statement to the binary log

CREATE TABLE ... SELECT required a little more work, as with statement logging the query is written to the binary log before the commit is done. This was fixed by adding a DROP TABLE to the binary log during crash recovery if the ddl log entry was not closed. In this case the binary log will contain (see the sketch below):
CREATE TABLE xxx ... SELECT ....
DROP TABLE xxx;

Other things:
- Added debug_crash_here() functionality to Aria to be able to test a crash in create table between the creation of the .MAI and the .MAD files.
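A hedged sketch of the CREATE TABLE ... SELECT recovery case under statement logging (table names are made up):

    -- If the statement was already written to the binary log but the server
    -- crashed before the ddl log entry was closed, recovery appends a DROP
    -- TABLE, so the binary log ends up describing a table that no longer
    -- exists and then removes it:
    CREATE TABLE t2 ENGINE=InnoDB SELECT a FROM t1;
    -- binary log after crash recovery (schematic):
    --   CREATE TABLE t2 ENGINE=InnoDB SELECT a FROM t1
    --   DROP TABLE t2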
-
Monty authored
Description of how DROP DATABASE works after this patch:
- Collect the list of tables
- DDL log tables as they are dropped
- DDL log drop database
- Delete db.opt
- Delete the data directory
- Log either DROP TABLE or DROP DATABASE to the binary log
- Deactivate the ddl log entry

This is in line with how things were before (minus ddl logging), except that we delete the db.opt file last to not lose it if DROP DATABASE fails.

On recovery we have to ensure that all dropped tables are logged in the binary log and that they are properly dropped (as with atomic drop table). No new tables will be dropped as part of recovery.

Recovery of an active drop database ddl log entry (see the sketch below):
- If drop database was logged to the ddl log but was not found in the binary log:
  - drop the db.opt file and the database directory
  - Log DROP DATABASE to the binary log
- If drop database was not logged to the ddl log:
  - Update the binary log with DROP TABLE of the dropped tables. If the table list is longer than max_allowed_packet, the query will be split into multiple DROP TABLE/VIEW queries.

Other things:
- Added DDL_LOG_STATE and 'current database' as arguments to mysql_rm_table_no_locks(). This was needed to be able to combine ddl logging of DROP DATABASE and DROP TABLE and make the generated DROP TABLE statements shorter.
- To make the DROP TABLE statement created by the ddl log shorter, I changed the binlogged query to use the current directory and omit the directory part for all tables in the current directory.
- Merged some DROP TABLE and DROP VIEW code in the ddl logger. This was done to be able to get separate DROP VIEW and DROP TABLE statements in the binary log.
- Added a 'recovery_state' variable to remember the state of dropped tables and views.
- Moved the code that drops database objects (stored procedures) out of mysql_rm_db_internal() into drop_database_objects() for better code reuse.
- Made mysql_rm_db_internal() global so that it can be used by the ddl recovery code.
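A hedged illustration of the two recovery outcomes described above (database and table names are made up, and the generated statements are shown schematically, not as exact server output):

    -- If DROP DATABASE reached the ddl log but not the binary log, recovery
    -- removes db.opt and the database directory and binlogs the statement:
    DROP DATABASE db1;
    -- If the crash happened before the drop database was ddl logged, only
    -- the tables that were actually dropped are binlogged, roughly as
    --   DROP TABLE IF EXISTS `t1`,`t2`
    -- split into several DROP TABLE/VIEW queries when the list would exceed
    -- max_allowed_packet.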
-
Monty authored
The purpose of this task is to ensure that DROP TRIGGER is atomic.

Description of how atomic drop trigger works:

Logging of DROP TRIGGER. Log the following information:
- db
- table name
- trigger name
- xid /* Used to check if the query was already logged to the binary log */
- initial length of the .TRG file
- the query, if there is space for it; if not, log a zero length query

Recovery operations:
- Delete 'database/trigger_name.TRN~' if it exists
  - If this file existed, it means that we crashed before the trigger was deleted and there is nothing else to do.
- Get the length of the .TRG file
  - If the file length is unchanged, the trigger was not dropped. Nothing else to do.
- Log the original query to the binary log, if it was stored in the ddl log. If it was not stored (long query string), log the following query to the binary log:
  use `database` ; DROP TRIGGER IF EXISTS `trigger_name` /* generated by ddl log */;

Other things:
- Added trigger name and DDL_LOG_STATE to drop_trigger(). The trigger name was added to make the interface more consistent and more general.
-
Monty authored
Logging logic:
- Log tables, just before they are dropped, to the ddl log
- After the last table for the statement is dropped, log an xid for the whole ddl log event

In case of crash:
- First remove any active DROP TABLE events from the ddl log that match xids found in the binary log (this means the drop was successful and was properly logged).
- Loop over all active DROP TABLE events:
  - Ensure that the table is completely dropped
- Write a DROP TABLE entry to the binary log with the dropped tables.

Other things:
- Added code to ha_drop_table() to be able to tell the difference between a get_new_handler() that failed because of out-of-memory and one where the handler refused or was not able to create a handler. This was needed to get sequences to work, as sequences need a share object to be passed to get_new_handler().
- TC_LOG_BINLOG::recover() was changed to always collect Xids from the binary log and always call ddl_log_close_binlogged_events(). This was needed to be able to collect DROP TABLE events with embedded Xids (used by the ddl log).
- Added a new variable "$grep_script" to the binlog filter to be able to find only rows that match a regexp.
- Had to adjust some tests that changed because drop statements are a bit larger in the binary log than before (as we have to store the xid).
- MDEV-25588 Atomic DDL: Binlog query event written upon recovery is corrupt: fixed (in the original commit).
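A hedged example of the multi-table case (table names are made up):

    -- Each table is ddl logged just before it is removed, and one xid covers
    -- the whole statement. If the server dies partway through, recovery makes
    -- sure every table that was already ddl logged is completely dropped and
    -- writes a DROP TABLE for exactly those tables to the binary log; tables
    -- the statement had not reached yet are not touched.
    DROP TABLE t1, t2, t3;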
-
Monty authored
- Major rewrite of ddl_log.cc and ddl_log.h
- ddl_log.cc describes at its beginning how the recovery works.
- ddl_log.log has a unique signature and is dynamic. It's easy to add more information to the header and other ddl blocks while still being able to execute old ddl entries.
- IO_SIZE for ddl blocks is now dynamic. It can be changed without affecting recovery of old logs.
- The code is more modular and is now usable outside of partition handling.
- Renamed the log file to ddl_recovery.log and added the option --log-ddl-recovery to allow one to specify the path & filename.
- Added ddl_log_entry_phase[], the number of phases for each DDL action, which allowed me to greatly simplify set_global_from_ddl_log_entry().
- Changed how strings are stored in log entries, which allows us to store much more information in a log entry.
- The ddl log is now always created at start and deleted on normal shutdown. This simplifies things notably.
- Added the probes debug_crash_here() and debug_simulate_error() to simplify crash testing and allow a crash after a given number of times a probe is executed. See the comments in debug_sync.cc and rename_table.test for how this can be used (a hedged sketch follows this entry).
- Reverting failed table and view renames is done through the ddl log. This ensures that the ddl log is tested also outside of recovery.
- Added the helper function handler::needs_lower_case_filenames().
- Extended the binary log with Q_XID events. The ddl log handling uses this to check if a ddl log entry was logged to the binary log (if yes, it will be deleted from the log during ddl_log_close_binlogged_events()).
- If a ddl log entry fails 3 times, disable it. This is to ensure that if we have a crash in the ddl recovery code, the server will not get stuck in a forever crash-restart-crash loop.

mysqltest.cc changes:
- --die will now replace $variables with their values
- $error will contain the error of the last failed statement

Storage engine changes:
- maria_rename() was changed to be more robust against crashes during rename.
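A hedged sketch of how such a crash probe could be driven from a test. The probe name below is hypothetical; the real probe names and the counter mechanism are documented in debug_sync.cc and rename_table.test.

    -- Hypothetical probe name, shown only to illustrate the mechanism.
    SET @@debug_dbug="+d,crash_rename_before_rename_table";
    RENAME TABLE t1 TO t2;
    -- The server is expected to crash at the probe; on restart, ddl recovery
    -- either completes or reverts the rename.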
-
Monty authored
This is required to make atomic RENAME TABLE work for these engines. The requirement is that if we have a server crash in the middle of a storage engine rename call, the upcoming ddl log recovery should be able to finalize it by re-executing the rename.
-
Monty authored
This happened because in ma_open() we did not take into account that tran_man (the Aria transaction manager) would not be initialized. Fixed by using the same check for the minimum transaction id as we use during repair.

Other things:
- aria_read_log now displays a readable timestamp
- Removed printing of the data page for the header. This removes some wrong warnings from the aria_read_log output.
-
Monty authored
-
Monty authored
This did not serve any real purpose and also made it too difficult to add asserts for string memory overwrites. Moved all functionality from Static_binary_string to Binary_string.

Other things:
- Added asserts to the q_xxx and qs_xxx functions to check for memory overruns
- Fixed a wrong test in String_buffer::set_buffer_if_not_allocated(). The idea is to reuse allocated buffers (to avoid extra allocs), which the code did not do.
-
Monty authored
This change is to get rid of randomly failing tests, especially those that read a random position of the binary log. From looking at the logs it's clear that some failures happen because a char that was read (with a value >= 128) is converted to a big long value. Using uchar everywhere makes this much less likely to happen. Another benefit is that a lot of casts from char to uchar could be removed.

Other things:
- Removed some extra spaces before '=' and '+=' in assignments
- Fixed indentation and lines > 80 characters
- Replaced '16' with 'element_size' (from the class definition) in Gtid_list_log_event()
-
Monty authored
-
Monty authored
The reasons for the removal are:
- Generates more code
- Storing and retrieving THD
- Causes extra code and data to be generated to handle possible throw exceptions (which never happen in MariaDB code)
- Uses more stack space

Other things:
- Changed convert_const_to_int() to use item->save_in_field_no_warnings(), which made the code shorter and simpler.
- Removed not needed code in Sp_handler::sp_create_routine()
- Added thd as an argument to store_key.copy() to make the function simpler
- Added thd as an argument to some subselect* constructors that inherit from Item_subselect.
-
Monty authored
- Updated error messages for recovery
- Changed the printing of debug sync point information to make it fit within 80 characters
-
Monty authored
Other things:
- Updated code comments & fixed indentation
- Removed an old QQ (temporary) comment that does not apply anymore
-
Monty authored
Author: woqutech
-
Monty authored
This is needed as we are using uint32 for allocated and current length.
-
Monty authored
TO_CHAR(expr, fmt)
- expr: required parameter, a date/time/timestamp type expression
- fmt: optional parameter, a format string. It supports YYYY/YYY/YY/RRRR/RR/MM/MON/MONTH/MI/DD/DY/HH/HH12/HH24/SS and special characters. The default value is "YYYY-MM-DD HH24:MI:SS". (A few usage examples follow this entry.)

In Oracle, TO_CHAR() can also be used to convert numbers to strings, but this is not supported and will give an error with this patch.

Other things:
- If the format string is a constant, it's evaluated only once, and if there are any errors in it they are reported at once and the statement aborts.

Original author: woqutech
Lots of optimizations and cleanups were done as part of the review.
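A few hedged usage examples of the specifiers listed above (the exact output casing and locale handling of the month/day names are not asserted here):

    SELECT TO_CHAR(TIMESTAMP'2021-05-19 14:30:05');
    -- default format "YYYY-MM-DD HH24:MI:SS" -> '2021-05-19 14:30:05'
    SELECT TO_CHAR(TIMESTAMP'2021-05-19 14:30:05', 'HH12:MI:SS');
    -- 12-hour clock -> '02:30:05'
    SELECT TO_CHAR(DATE'2021-05-19', 'DD-MON-YYYY');
    -- abbreviated month name form
    SELECT TO_CHAR(123);
    -- numbers are not supported (unlike Oracle): this gives an error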
-
Monty authored
MINUS is mapped to EXCEPT. One consequence of the patch is that MINUS becomes a reserved word in Oracle mode.

Author: woqutech
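A brief example in Oracle mode (table names are made up):

    SET sql_mode=ORACLE;
    -- Equivalent queries: MINUS is parsed as EXCEPT
    SELECT a FROM t1 MINUS SELECT a FROM t2;
    SELECT a FROM t1 EXCEPT SELECT a FROM t2;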
-
Monty authored
SYS_GUID() returns the same as UUID(), but without any '-'.

Author: woqutech
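Illustrative comparison (the values shown are placeholders, not real output):

    SELECT UUID();       -- 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
    SELECT SYS_GUID();   -- 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'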
-
Monty authored
The ROWNUM() function is for SELECT mapped to JOIN->accepted_rows, which is incremented for each accepted row. For filesort, UPDATE, INSERT, DELETE and LOAD DATA, we map ROWNUM() to internal variables incremented when the table is changed.

The connection between the row counter and Item_func_rownum is done in sql_select.cc::fix_items_after_optimize() and sql_insert.cc::fix_rownum_pointers().

When ROWNUM() is used anywhere in a query, the optimization to ignore ORDER BY in subqueries is disabled. This was done to get the following common Oracle query to work:
select * from (select * from t1 order by a desc) as t where rownum() <= 2;
MDEV-3926 "Wrong result with GROUP BY ... WITH ROLLUP" contains a discussion about this topic.

LIMIT optimization is enabled when, in a top level WHERE clause, ROWNUM() is compared with a numerical constant using any of the following expressions (examples after this entry):
- ROWNUM() < #
- ROWNUM() <= #
- ROWNUM() = 1
ROWNUM() can also be the right argument to the comparison function.

LIMIT optimization is done in two cases:
- For the current subquery, when the ROWNUM comparison is done on the top level:
  SELECT * from t1 WHERE rownum() <= 2 AND t1.a > 0
- For an inner subquery, when the upper level has only a ROWNUM comparison in the WHERE clause:
  SELECT * from (select * from t1) as t WHERE rownum() <= 2

In Oracle mode, one can also use ROWNUM without parentheses.

Other things:
- Fixed a bug where the optimizer tried to optimize away subqueries with RAND_TABLE_BIT set (non-deterministic queries). Now these subqueries will not be converted to joins. This bug fix was also needed to get rownum() working inside subqueries.
- In remove_const(), removed the setting of simple_order to FALSE if ROLLUP is used. This code was disabled a long time ago because of a wrong assignment in the following code. Instead we set simple_order to false if RAND_TABLE_BIT was used in the SELECT list. This ensures that we don't delete ORDER BY if the result set is not deterministic, as in 'SELECT RAND() AS 'r' FROM t1 ORDER BY r'.
- Updated the parameters of Sort_param::init_for_filesort() to be able to provide filesort with information on where the number of accepted rows should be stored.
- Reordered fields in class Filesort to optimize the storage layout.
- Added a new error message to tell that a function can't be used in HAVING.
- Added a field 'with_rownum' to THD to mark that ROWNUM() is used in the query.

Co-author: Oleksandr Byelkin <sanja@mariadb.com> (LIMIT optimization for subquery)
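Hedged examples of the comparison forms listed above that enable the LIMIT optimization, plus the Oracle-mode spelling (table names are made up):

    SELECT * FROM t1 WHERE ROWNUM() <= 10;                      -- optimized to a LIMIT
    SELECT * FROM t1 WHERE ROWNUM() < 10 AND t1.a > 0;          -- top-level AND is fine
    SELECT * FROM t1 WHERE ROWNUM() = 1;
    SELECT * FROM (SELECT * FROM t1) AS t WHERE ROWNUM() <= 2;  -- applied to the inner select
    SET sql_mode=ORACLE;
    SELECT * FROM t1 WHERE ROWNUM <= 2;                         -- no parentheses needed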
-
Alexander Barkov authored
-
Monty authored
-
Monty authored
-
Monty authored
-
Monty authored
Rename deactivate_ddl_log_entry to ddl_log_increment_phase
-
Monty authored
Part of the preparation for: MDEV-17567 Atomic DDL
No notable code changes except moving code around.
-
Monty authored
-
Monty authored
DROP TABLE opens all temporary tables at start, but then uses find_temporary_table() to check if a table is temporary instead of is_temporary_table(), which is much faster. This patch fixes this issue.
-
Monty authored
- Moved the creation of StringBuffers out of loops: create them outside and just reset the buffer if it was not allocated (to avoid a possible malloc/free for every entry).

Other things related to set_buffer_if_not_allocated():
- Changed Valuebuffer to not call set_buffer_if_not_allocated() when it is created.
- Fixed geometry functions to reset the string length before calling String::reserve(). This is because one should not access length() of an undefined string.
- Added Item_func_conv_charset::save_in_field() as the item is using str_value to store cached values, which conflicts with Item::save_str_in_field().
- Changed Item_proc_string to not store the string value in sql_string as this clashes with Item::save_str_in_field().
- Locally store the value of full_name_cstring() in analyse::end_of_records() as Item::save_str_in_field() may overwrite it.
- Marked some strings as set_thread_specific()
- Added String::free_buffer() to be used internally in String functions to just free the buffer but not reset other String values.
- Fixed uses_buffer_owned_by() to check the allocated length instead of the string length, which could be marked MEM_UNDEFINED().
-
Monty authored
This change removed 68 explicit strlen() calls from the code.

The following renames were done to ensure we don't use the old names when merging code from earlier releases, as using the new variables for a print function could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name

Almost all changes were mechanical, except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible

Other things:
- There were compiler issues with ensuring that all character set names point to the same string: gcc doesn't allow one to use integer constants when defining global structures (constant char * pointers work fine). To get around this, I declared defines for each character set name length.
-
Monty authored
-
Sergei Golubchik authored
a helper method to check whether an item can be evaluated in the query optimization phase (in and below JOIN::optimize()).
-
Monty authored
Other things:
- Removed inline and virtual for methods that are overrides
- Added 'final' to some Item classes
-
Monty authored
This returns a LEX_CSTRING and allows one to avoid strlen() calls.
-
Monty authored
Changes:
- To detect automatic strlen() calls, I removed the methods in String that use 'const char *' without a length:
  - String::append(const char*)
  - Binary_string(const char *str)
  - String(const char *str, CHARSET_INFO *cs)
  - append_for_single_quote(const char *)
  All usage of append(const char*) was changed to use either String::append(char), String::append(const char*, size_t length) or String::append(LEX_CSTRING).
- Added STRING_WITH_LEN() around constant string arguments to String::append().
- Added an overflow argument to escape_string_for_mysql() and escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow. This was needed as most usage of the above functions never tested the result for -1 and would have given wrong results or crashes in case of overflows.
- Added Item_func_or_sum::func_name_cstring(), which returns a LEX_CSTRING. Changed all Item_func::func_name()'s to func_name_cstring()'s. The old Item_func_or_sum::func_name() is now an inline function that returns func_name_cstring().str.
- Changed Item::mode_name() and Item::func_name_ext() to return LEX_CSTRING.
- For some functions, changed the name argument from const char * to const LEX_CSTRING &:
  - Item::Item_func_fix_attributes()
  - Item::check_type_...()
  - Type_std_attributes::agg_item_collations()
  - Type_std_attributes::agg_item_set_converter()
  - Type_std_attributes::agg_arg_charsets...()
  - Type_handler_hybrid_field_type::aggregate_for_result()
  - Type_handler_geometry::check_type_geom_or_binary()
  - Type_handler::Item_func_or_sum_illegal_param()
  - Predicant_to_list_comparator::add_value_skip_null()
  - Predicant_to_list_comparator::add_value()
  - cmp_item_row::prepare_comparators()
  - cmp_item_row::aggregate_row_elements_for_comparison()
  - Cursor_ref::print_func()
- Removed String_space() as it was only used in one case, and that could be simplified to not use String_space(), thanks to the fixed my_vsnprintf().
- Added some const LEX_CSTRINGs for common strings:
  - NULL_clex_str, DATA_clex_str, INDEX_clex_str
- Changed primary_key_name to a LEX_CSTRING.
- Renamed String::set_quick() to String::set_buffer_if_not_allocated() to clarify what the function really does.
- Renamed the protocol function bool store(const char *from, CHARSET_INFO *cs) to bool store_string_or_null(const char *from, CHARSET_INFO *cs). This was done both to clarify the difference between this 'store' function and the others and to make it easier to find unoptimal usage of store() calls.
- Added Protocol::store(const LEX_CSTRING*, CHARSET_INFO*).
- Changed some 'const char*' arrays to instead be of type LEX_CSTRING.
- class Item_func_units now uses LEX_CSTRING for its name.

Other things:
- Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character in the prompt would cause some part of the prompt to be duplicated.
- Fixed a lot of instances where the length of the argument to append is known or easily obtained but was not used.
- Removed some not needed 'virtual' definitions for functions that were inherited from the parent. I added override to these.
- Fixed Ordered_key::print() to preallocate the needed buffer. The old code could cause memory overruns.
- Simplified some loops when adding char * to a String with delimiters.
-