Commit eea9f2a1 authored by Alexander Barkov, committed by Jan Lindström

MDEV-27653 long uniques don't work with unicode collations

There are no source code changes in this commit!
This is an empty follow-up commit for
  284ac6f2
to comment what was done, as the patch itself did not have
change comments.

Problems solved in this patch:

1. The function calc_hash_for_unique() erroneously took the string
length into account, so strings that are equal in terms of the
collation but have different lengths got different hash values.

For example:
- LATIN LETTER A             - 1 byte
- LATIN LETTER A WITH ACUTE  - 2 bytes

are equal in utf8_general_ci, but as their lengths
are different, calc_hash_for_unique() returned
different hash values.

2. calc_hash_for_unique() also erroneously used val_str()
result to calculate hashes. This may not be correct for
some data types, e.g. TIMESTAMP, as its string
value depends on the session environment (e.g. @@time_zone).
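
Problem 1 can be illustrated with a small self-contained sketch.
Everything in it is an assumption made for illustration: toy_weights()
is a toy stand-in for a case- and accent-insensitive collation, and the
mixing step is a generic FNV-1a style function, not the hash used by
the server. It only shows why mixing the byte length into the hash
breaks equality for strings that the collation treats as equal.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy collation (hypothetical): maps a tiny subset of UTF-8 to weights.
// ASCII letters fold to upper case; U+00C1 and U+00E1 (A/a WITH ACUTE,
// encoded as C3 81 and C3 A1) get the same weight as 'A'.
std::vector<uint16_t> toy_weights(const std::string &s)
{
  std::vector<uint16_t> w;
  for (size_t i= 0; i < s.size(); i++)
  {
    unsigned char c= static_cast<unsigned char>(s[i]);
    if (c == 0xC3 && i + 1 < s.size())
    {
      unsigned char c2= static_cast<unsigned char>(s[++i]);
      bool a_acute= (c2 == 0x81 || c2 == 0xA1);
      w.push_back(static_cast<uint16_t>(a_acute ? 'A' : 0xFFFF));
    }
    else if (c >= 'a' && c <= 'z')
      w.push_back(static_cast<uint16_t>(c - 'a' + 'A'));
    else
      w.push_back(static_cast<uint16_t>(c));
  }
  return w;
}

// Generic FNV-1a style mixing step (illustrative only).
inline uint64_t mix(uint64_t h, uint64_t v)
{
  return (h ^ v) * 1099511628211ULL;
}

// Hashes the collation weights only.
uint64_t hash_weights_only(const std::string &s)
{
  uint64_t h= 14695981039346656037ULL;
  for (uint16_t x : toy_weights(s))
    h= mix(h, x);
  return h;
}

// Same as above, but also mixes in the byte length of the string.
// This reproduces the kind of mistake described in problem 1.
uint64_t hash_with_length(const std::string &s)
{
  return mix(hash_weights_only(s), s.size());
}
```

With these definitions, "A" (1 byte) and "\xC3\x81" (2 bytes) get the
same value from hash_weights_only() but different values from
hash_with_length().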

Change summary:

Instead of doing Item::val_str(), we should always call
Field::hash() of the underlying Field. It properly
handles both cases (equal strings with different
lengths, as well as tricky data types like TIMESTAMP).

Detailed change description:

Non-functional changes (make the code cleaner):

- Adding a helper class Hasher, to pass hash parts
  nr1 and nr2 through function arguments easier.
- Splitting virtual Field::hash() into non-virtual
  wrapper Field::hash() and virtual Field::hash_not_null().
  This helps to get rid of duplicate code handling SQL NULL,
  as it was equal in all Field_xxx implementations.
- Adding a new method THD::my_ok_with_recreate_info().
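
The first two items above (the Hasher helper and the split into a
non-virtual hash() and a virtual hash_not_null()) might look roughly
like the sketch below. It is not the server code: the member names
other than hash() and hash_not_null(), the mixing formula, and the
class Field_int_sketch are assumptions chosen only to show the shape
of the refactoring.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical accumulator carrying both hash parts, so that callers
// pass one object instead of separate nr1 and nr2 arguments.
struct Hasher
{
  uint64_t nr1= 1, nr2= 4;

  // Mix in a marker for SQL NULL (illustrative formula).
  void add_null() { nr1^= (nr1 << 1) | 1; }

  // Mix in raw bytes (illustrative formula).
  void add(const unsigned char *p, size_t n)
  {
    for (size_t i= 0; i < n; i++)
    {
      nr1^= ((nr1 & 63) + nr2) * p[i] + (nr1 << 8);
      nr2+= 3;
    }
  }
};

class Field
{
public:
  virtual ~Field() {}
  virtual bool is_null() const= 0;

  // Non-virtual wrapper: SQL NULL is handled once, here.
  void hash(Hasher *hasher) const
  {
    if (is_null())
      hasher->add_null();
    else
      hash_not_null(hasher);
  }

protected:
  // Each concrete field type only implements the NOT NULL case.
  virtual void hash_not_null(Hasher *hasher) const= 0;
};

// Hypothetical concrete field, for illustration only.
class Field_int_sketch : public Field
{
  bool m_null;
  int32_t m_value;
public:
  Field_int_sketch(bool null_arg, int32_t value)
    : m_null(null_arg), m_value(value) {}
  bool is_null() const override { return m_null; }
protected:
  void hash_not_null(Hasher *hasher) const override
  {
    hasher->add(reinterpret_cast<const unsigned char *>(&m_value),
                sizeof(m_value));
  }
};
```

Because the NULL check lives only in the wrapper, a derived class
cannot forget it or handle it differently.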

Actual fix changes (make new tables work properly):

- Adding a virtual method Item::hash_not_null()
  This helps to handle hashes on full fields (Item_field)
  and hashes on prefix fields (Item_func_left(Item_field))
  in a polymorphic way.
  Implementing overrides for Item_field and Item_func_left.

- Rewriting Item_func_hash::val_int() to use Item::hash_not_null(),
  instead of the combination of val_str() and calc_hash_for_unique().

Backward compatibility changes (make old tables work in the new server):

- Adding a new class Item_func_hash_mariadb_100403.
  Moving the old version of Item_func_hash::val_int()
  into Item_func_hash_mariadb_100403::val_int().
  The old class Item_func_hash_mariadb_100403 is still needed
  to open old tables before they are upgraded.

- Adding TABLE_SHARE::old_long_hash_function() and
  handler::check_long_hash_compatibility() to test
  if a table is using an old hash function.

- Adding a helper method TABLE_SHARE::make_long_hash_func()
  to instantiate either Item_func_hash_mariadb_100403 (for old
  tables that are not upgraded yet) or Item_func_hash (for new tables).
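
The selection described in the last item can be pictured as a small
factory. The sketch below is an assumption-laden illustration, not the
server code: all type and function names ending in _sketch are
hypothetical, and the way a table is recognized as old is not covered
by this message, so it is passed in as a plain flag.

```cpp
#include <memory>

// Hypothetical common interface for both hash function versions.
struct Long_hash_func_sketch
{
  virtual ~Long_hash_func_sketch() {}
  virtual bool is_legacy() const= 0;
};

// Stand-in for the old, compatibility-only implementation.
struct Hash_func_legacy_sketch : Long_hash_func_sketch
{
  bool is_legacy() const override { return true; }
};

// Stand-in for the new implementation.
struct Hash_func_current_sketch : Long_hash_func_sketch
{
  bool is_legacy() const override { return false; }
};

// Picks the implementation that matches how the table was created.
// The flag is an assumption: the real check is not described here.
std::unique_ptr<Long_hash_func_sketch>
make_long_hash_func_sketch(bool table_uses_old_hash)
{
  if (table_uses_old_hash)
    return std::unique_ptr<Long_hash_func_sketch>(
             new Hash_func_legacy_sketch());
  return std::unique_ptr<Long_hash_func_sketch>(
           new Hash_func_current_sketch());
}
```

Keeping the choice in one place means the rest of the code can work
with the common interface and does not need to know which version a
table uses.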

Upgrade changes (make old tables upgrade in the new server properly):

Upgrading an old table to a new hash can be done using either
of these two statements:

  ALTER IGNORE TABLE t1 FORCE;
  REPAIR TABLE t1;

!!! These statements find and filter out erroneous duplicates !!!
After these statements the table will have fewer records
if there were erroneous duplicates (such as A and A WITH ACUTE).

Both statements report information about the filtered out records.

- Adding a new class Recreate_info to return information
  about copied and duplicate rows from these functions:
  - mysql_alter_table()
  - mysql_recreate_table()
  - admin_recreate_table()
  This helps to print a warning during REPAIR:

MariaDB [test]> repair table mdev27653_100422_text;
+----------------------------+--------+----------+------------------------------------+
| Table                      | Op     | Msg_type | Msg_text                           |
+----------------------------+--------+----------+------------------------------------+
| test.mdev27653_100422_text | repair | Warning  | Number of rows changed from 2 to 1 |
| test.mdev27653_100422_text | repair | status   | OK                                 |
+----------------------------+--------+----------+------------------------------------+
2 rows in set (0.018 sec)
parent ae96e21c