Commit 209711c0 authored by unknown

Bug#28600 Yen sign and overline ujis conversion change

Problem: Unicode->UJIS conversion followed incorrect
rules for U+00A5 YEN SIGN and U+203E OVERLINE,
so these characters were converted to ujis 0x8E5C
and 0x8E7E respectively.

This behaviour would be correct for a JIS-X-0201 based character set,
but this is wrong for UJIS, which is documented as x-eucjp-unicode-0.9,
and which is based on ASCII for the range U+0000..U+007F.

Fix:
Remove the JIS-X-0201 conversion rules, making UJIS ASCII compatible.
YEN SIGN and OVERLINE no longer have corresponding UJIS characters
and are converted to 0x3F QUESTION MARK, with a warning raised where appropriate.

This patch also includes a test covering full UJIS->Unicode->UJIS mapping.
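
The behaviour described under "Fix" can be illustrated with a minimal sketch.
It is not the code from strings/ctype-ujis.c: the function names and the
SKETCH_NO_MAPPING constant are hypothetical, and only the ASCII range is
modelled (real UJIS conversion also handles multi-byte characters).

```c
#include <stddef.h>

/* Hypothetical return code meaning "no UJIS mapping"; not the MySQL constant. */
enum { SKETCH_NO_MAPPING = 0 };

/* Simplified single-byte step: only U+0000..U+007F map directly.
   Multi-byte UJIS characters are deliberately left out of this sketch. */
static int sketch_unicode_to_ujis_ascii(unsigned long wc, unsigned char *out)
{
  if (wc <= 0x7F)
  {
    *out = (unsigned char) wc;
    return 1;
  }
  return SKETCH_NO_MAPPING;
}

/* Converts n code points, replacing unmappable ones with 0x3F QUESTION MARK.
   Returns the number of replacements, i.e. how many warnings would be due. */
static size_t sketch_convert(const unsigned long *src, size_t n,
                             unsigned char *dst)
{
  size_t i, replaced = 0;
  for (i = 0; i < n; i++)
  {
    if (sketch_unicode_to_ujis_ascii(src[i], &dst[i]) == SKETCH_NO_MAPPING)
    {
      dst[i] = 0x3F;
      replaced++;
    }
  }
  return replaced;
}
```

Under these assumptions, U+00A5 and U+203E both come out as 0x3F, while
U+005C and U+007E keep their ASCII byte values.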


sql/field.cc:
  Nicer error message format:
  always use HEX notation when printing warnings about UCS2 values -
  this is more readable.
strings/ctype-ujis.c:
  Removing incorrect Unicode->UJIS mapping.
  MySQL "UJIS" is x-eucjp-unicode-0.9.
mysql-test/r/ctype_ujis_ucs2.result:
  New BitKeeper file ``mysql-test/r/ctype_ujis_ucs2.result''
mysql-test/t/ctype_ujis_ucs2.test:
  New BitKeeper file ``mysql-test/t/ctype_ujis_ucs2.test''
parent 63fe55e7
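
The sql/field.cc note above (always use HEX notation for UCS2 values) can be
sketched as follows. The mbminlen parameter mirrors the cs->mbminlen check in
the diff; the function name, the buffer handling, and the "\xHH" escape format
are assumptions made for this sketch, since the diff as shown does not include
the branch that prints the HEX form.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats src into dst for use in a warning message.
   Bytes are copied as is only when the character set is ASCII compatible
   (mbminlen == 1) and the byte is in 0x20..0x7F; otherwise the byte is
   written as an assumed "\xHH" escape. Output is truncated to fit dst. */
static void sketch_format_for_warning(const char *src, size_t len,
                                      unsigned int mbminlen,
                                      char *dst, size_t dst_size)
{
  size_t used = 0;
  size_t i;
  if (dst_size == 0)
    return;
  for (i = 0; i < len; i++)
  {
    unsigned char c = (unsigned char) src[i];
    if (c >= 0x20 && c <= 0x7F && mbminlen == 1)
    {
      if (used + 1 >= dst_size)   /* need room for the byte and the NUL */
        break;
      dst[used++] = (char) c;
    }
    else
    {
      if (used + 4 >= dst_size)   /* need room for "\xHH" and the NUL */
        break;
      snprintf(dst + used, dst_size - used, "\\x%02X", (unsigned int) c);
      used += 4;
    }
  }
  dst[used] = '\0';
}
```

With mbminlen set to 2 (as for UCS2), even a byte such as 0x41 is escaped,
which avoids printing half of a two-byte character as if it were ASCII.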
@@ -5911,6 +5911,7 @@ void Field_datetime::sql_type(String &res) const
well_formed_error_pos - where not well formed data was first met
cannot_convert_error_pos - where a not-convertable character was first met
end - end of the string
+cs - character set of the string
NOTES
As of version 5.0 both cases return the same error:
@@ -5930,7 +5931,8 @@ static bool
check_string_copy_error(Field_str *field,
const char *well_formed_error_pos,
const char *cannot_convert_error_pos,
-const char *end)
+const char *end,
+CHARSET_INFO *cs)
{
const char *pos, *end_orig;
char tmp[64], *t;
@@ -5944,8 +5946,18 @@ check_string_copy_error(Field_str *field,
for (t= tmp; pos < end; pos++)
{
+/*
+If the source string is ASCII compatible (mbminlen==1)
+and the source character is in ASCII printable range (0x20..0x7F),
+then display the character as is.
+Otherwise, if the source string is not ASCII compatible (e.g. UCS2),
+or the source character is not in the printable range,
+then print the character using HEX notation.
+*/
if (((unsigned char) *pos) >= 0x20 &&
-((unsigned char) *pos) <= 0x7F)
+((unsigned char) *pos) <= 0x7F &&
+cs->mbminlen == 1)
{
*t++= *pos;
}
@@ -6027,7 +6039,7 @@ int Field_string::store(const char *from,uint length,CHARSET_INFO *cs)
field_charset->pad_char);
if (check_string_copy_error(this, well_formed_error_pos,
-cannot_convert_error_pos, from + length))
+cannot_convert_error_pos, from + length, cs))
return 2;
/*
@@ -6462,7 +6474,7 @@ int Field_varstring::store(const char *from,uint length,CHARSET_INFO *cs)
int2store(ptr, copy_length);
if (check_string_copy_error(this, well_formed_error_pos,
-cannot_convert_error_pos, from + length))
+cannot_convert_error_pos, from + length, cs))
return 2;
// Check if we lost something other than just trailing spaces
@@ -7144,7 +7156,7 @@ int Field_blob::store(const char *from,uint length,CHARSET_INFO *cs)
bmove(ptr+packlength,(char*) &tmp,sizeof(char*));
if (check_string_copy_error(this, well_formed_error_pos,
-cannot_convert_error_pos, from + length))
+cannot_convert_error_pos, from + length, cs))
return 2;
if (from_end_pos < from + length)
......
@@ -264,18 +264,6 @@ my_wc_mb_jisx0201(CHARSET_INFO *cs __attribute__((unused)),
return 1;
}
-if (wc == 0x00A5)
-{
-*s = 0x5C;
-return 1;
-}
-if (wc == 0x203E)
-{
-*s = 0x7E;
-return 1;
-}
return MY_CS_ILUNI;
}
......