MDEV-27653 long uniques don't work with unicode collations
There are no source code changes in this commit!

This is an empty follow-up commit for 284ac6f2 to describe
what was done, as the patch itself did not have change comments.

Problems solved in this patch:

1. The function calc_hash_for_unique() erroneously took the string
   length into account, so strings that are equal in terms of the
   collation but have different lengths got different hash values.

   For example:
   - LATIN LETTER A            - 1 byte
   - LATIN LETTER A WITH ACUTE - 2 bytes
   are equal in utf8_general_ci, but as their lengths are different,
   calc_hash_for_unique() returned different hash values.

2. calc_hash_for_unique() also erroneously used the val_str() result
   to calculate hashes. This may not be correct for some data types,
   e.g. TIMESTAMP, as its string value depends on the session
   environment (e.g. @@time_zone).

Change summary:

Instead of doing Item::val_str(), we should always call
Field::hash() of the underlying Field. It properly handles
both cases (equal strings with different lengths, as well as
tricky data types like TIMESTAMP).

Detailed change description:

Non-functional changes (make the code cleaner):

- Adding a helper class Hasher, to make it easier to pass
  the hash parts nr1 and nr2 through function arguments.

- Splitting virtual Field::hash() into a non-virtual wrapper
  Field::hash() and a virtual Field::hash_not_null().
  This helps to get rid of duplicate code handling SQL NULL,
  as it was equal in all Field_xxx implementations.

- Adding a new method THD::my_ok_with_recreate_info().

Actual fix changes (make new tables work properly):

- Adding a virtual method Item::hash_not_null().
  This helps to handle hashes on full fields (Item_field) and
  hashes on prefix fields (Item_func_left(Item_field))
  in a polymorphic way.
  Implementing overrides for Item_field and Item_func_left.

- Rewriting Item_func_hash::val_int() to use Item::hash_not_null(),
  instead of the combination of val_str() and calc_hash_for_unique().
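
A minimal, self-contained sketch of problem 1 and of the
hash()/hash_not_null() split, for illustration only. Nothing below
is taken from the server sources: Hasher here is a toy stand-in,
ToyField/ToyStringField, toy_weights() and naive_hash() are
hypothetical names, and the "collation" only knows that a, A and
A WITH ACUTE sort equal. It shows how mixing raw bytes and the byte
length into the hash separates collation-equal strings, while hashing
collation weights does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a Hasher helper: carries the two running hash parts.
struct Hasher {
  uint64_t nr1 = 1, nr2 = 4;
  void add(uint8_t w) {
    nr1 ^= (((nr1 & 63) + nr2) * w) + (nr1 << 8);
    nr2 += 3;
  }
  void add_null() { nr1 ^= (nr1 << 1) | 1; }  // toy way to mix in SQL NULL
};

// Toy collation (NOT utf8_general_ci): maps a-z to A-Z and the UTF-8
// sequences C3 81 / C3 A1 (A/a WITH ACUTE) to the weight of 'A'.
std::vector<uint8_t> toy_weights(const std::string &s) {
  std::vector<uint8_t> out;
  for (std::size_t i = 0; i < s.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    if (c == 0xC3 && i + 1 < s.size()) {
      unsigned char d = static_cast<unsigned char>(s[i + 1]);
      if (d == 0x81 || d == 0xA1) { out.push_back('A'); ++i; continue; }
    }
    if (c >= 'a' && c <= 'z') c = static_cast<unsigned char>(c - 'a' + 'A');
    out.push_back(c);
  }
  return out;
}

// Imitation of the described bug: hash the raw bytes AND the byte length.
uint64_t naive_hash(const std::string &s) {
  Hasher h;
  for (char ch : s) h.add(static_cast<unsigned char>(ch));
  h.add(static_cast<uint8_t>(s.size()));
  return h.nr1;
}

class ToyField {
public:
  virtual ~ToyField() = default;
  // Non-virtual wrapper: SQL NULL is handled once, here.
  void hash(Hasher &h) const {
    if (is_null()) { h.add_null(); return; }
    hash_not_null(h);
  }
protected:
  virtual bool is_null() const = 0;
  virtual void hash_not_null(Hasher &h) const = 0;
};

class ToyStringField : public ToyField {
  std::string value_;
  bool null_;
public:
  explicit ToyStringField(std::string v, bool is_null = false)
      : value_(std::move(v)), null_(is_null) {}
protected:
  bool is_null() const override { return null_; }
  // Hash collation weights only; the byte length is never mixed in.
  void hash_not_null(Hasher &h) const override {
    for (uint8_t w : toy_weights(value_)) h.add(w);
  }
};

// Usage: "A" (1 byte) and A WITH ACUTE (2 bytes) hash differently with
// the naive function, but equally through ToyField::hash().
void check_example() {
  assert(naive_hash("A") != naive_hash("\xC3\x81"));
  ToyStringField a("A"), a_acute("\xC3\x81");
  Hasher h1, h2;
  a.hash(h1);
  a_acute.hash(h2);
  assert(h1.nr1 == h2.nr1 && h1.nr2 == h2.nr2);
}
```

The wrapper/virtual split keeps the NULL branch in one place, so a
concrete field type only has to describe how a non-NULL value is
hashed.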

Backward compatibility changes (make old tables work in the new server):

- Adding a new class Item_func_hash_mariadb_100403.
  Moving the old version of Item_func_hash::val_int()
  into Item_func_hash_mariadb_100403::val_int().
  The old class Item_func_hash_mariadb_100403 is still needed
  to open old tables before the upgrade is done.

- Adding TABLE_SHARE::old_long_hash_function() and
  handler::check_long_hash_compatibility() to test
  if a table is using an old hash function.

- Adding a helper method TABLE_SHARE::make_long_hash_func()
  to instantiate either Item_func_hash_mariadb_100403
  (for old not upgraded tables) or Item_func_hash (for new tables).

Upgrade changes (make old tables upgrade in the new server properly):

Upgrading an old table to a new hash can be done using
either of these two statements:

    ALTER IGNORE TABLE t1 FORCE;
    REPAIR TABLE t1;

!!! These statements find and filter out erroneous duplicates !!!

The table after these statements will have fewer records
if there were erroneous duplicates (such as A and A WITH ACUTE).
The information about filtered out records is reported by both
statements.

- Adding a new class Recreate_info to return information
  about copied and duplicate rows from these functions:
  - mysql_alter_table()
  - mysql_recreate_table()
  - admin_recreate_table()

  This helps to print a warning during REPAIR:

MariaDB [test]> repair table mdev27653_100422_text;
+----------------------------+--------+----------+------------------------------------+
| Table                      | Op     | Msg_type | Msg_text                           |
+----------------------------+--------+----------+------------------------------------+
| test.mdev27653_100422_text | repair | Warning  | Number of rows changed from 2 to 1 |
| test.mdev27653_100422_text | repair | status   | OK                                 |
+----------------------------+--------+----------+------------------------------------+
2 rows in set (0.018 sec)
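
The backward compatibility idea above (pick the old or the new hash
function per table when it is opened) can be sketched as follows.
This is a hypothetical illustration, not the server code: the Toy*
classes and the uses_old_long_hash flag are assumptions, and how the
server actually detects an old table is not described in this
message.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical common interface for the two hash implementations.
struct ToyHashFunc {
  virtual ~ToyHashFunc() = default;
  virtual std::string name() const = 0;
};

// Stand-in for the new, collation-aware hash function.
struct ToyHashNew : ToyHashFunc {
  std::string name() const override { return "toy_hash"; }
};

// Stand-in for the frozen old hash function, kept only so that
// tables created before the fix can still be opened.
struct ToyHashOld100403 : ToyHashFunc {
  std::string name() const override { return "toy_hash_100403"; }
};

struct ToyTableShare {
  // Assumption: a plain flag; the real detection logic is not shown here.
  bool uses_old_long_hash;

  bool old_long_hash_function() const { return uses_old_long_hash; }

  // Factory: old tables keep the old hash until they are rebuilt.
  std::unique_ptr<ToyHashFunc> make_long_hash_func() const {
    if (old_long_hash_function())
      return std::unique_ptr<ToyHashFunc>(new ToyHashOld100403());
    return std::unique_ptr<ToyHashFunc>(new ToyHashNew());
  }
};
```

Keeping the choice inside one factory method means callers never
need to know which generation of table they are working with.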