mysys/charset.c · 95d51369c9b1d5b759be630003ab12e9615ea0cc · nexedi / MariaDB

MDEV-30661 UPPER() returns an empty string for U+0251 in uca1400 collations for utf8 · 7f6b648d

Alexander Barkov authored Feb 17, 2023

String length growth during upper/lower conversion
in Unicode collations depends only on the underlying MY_UNICASE_INFO
used in the collation.

Maintaining separate members CHARSET_INFO::caseup_multiply and
CHARSET_INFO::casedn_multiply duplicated this information
and caused bugs like this one (when MY_UNICASE_INFO and case??_multiply
went out of sync because of incomplete CHARSET_INFO initialization).

Fix:

Changing CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply
from members to virtual functions.
The virtual functions in Unicode collations calculate case conversion
growth factors from the MY_UNICASE_INFO. This guarantees that the growth
factors are always in sync with the MY_UNICASE_INFO.
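
The idea can be illustrated with a minimal sketch. This is not the actual MariaDB code: all DEMO_* and demo_* names below are hypothetical, simplified stand-ins for CHARSET_INFO, MY_UNICASE_INFO and the handler table, and the growth factors are assumed to be precomputed in the case-folding data. It only shows how deriving the factor through a function pointer removes the duplicated member that could go stale.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MY_UNICASE_INFO: the case-folding data is
   the single source of truth for worst-case growth factors. */
typedef struct demo_unicase_info
{
  unsigned max_upper_growth; /* assumed: max output bytes per input byte, upper */
  unsigned max_lower_growth; /* assumed: max output bytes per input byte, lower */
} DEMO_UNICASE_INFO;

struct demo_charset_info;

/* Hypothetical handler table: the "virtual functions" of the collation. */
typedef struct demo_charset_handler
{
  unsigned (*caseup_multiply)(const struct demo_charset_info *cs);
  unsigned (*casedn_multiply)(const struct demo_charset_info *cs);
} DEMO_CHARSET_HANDLER;

/* Hypothetical stand-in for CHARSET_INFO: note that it has no
   caseup_multiply/casedn_multiply data members to initialize. */
typedef struct demo_charset_info
{
  const DEMO_UNICASE_INFO *unicase;
  const DEMO_CHARSET_HANDLER *handler;
} DEMO_CHARSET_INFO;

/* Unicode implementations: always computed from the unicase data. */
static unsigned demo_caseup_multiply_unicode(const DEMO_CHARSET_INFO *cs)
{
  return cs->unicase ? cs->unicase->max_upper_growth : 1;
}

static unsigned demo_casedn_multiply_unicode(const DEMO_CHARSET_INFO *cs)
{
  return cs->unicase ? cs->unicase->max_lower_growth : 1;
}

static const DEMO_CHARSET_HANDLER demo_unicode_handler=
{
  demo_caseup_multiply_unicode,
  demo_casedn_multiply_unicode
};

/* A caller sizing the destination buffer for an UPPER()-like conversion. */
static size_t demo_upper_buffer_size(const DEMO_CHARSET_INFO *cs,
                                     size_t src_length)
{
  assert(cs != NULL && cs->handler != NULL);
  return src_length * cs->handler->caseup_multiply(cs);
}
```

In this sketch, pointing a collation at different case-folding data changes the reported growth factor automatically, so there is no second field that incomplete initialization could leave at a wrong value.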

