tty/vt: UTF-8 parsing update according to RFC 3629, modern Unicode
vc_translate_unicode() and vc_sanitize_unicode() parse input to the
UTF-8-enabled console, marking invalid byte sequences and producing
Unicode codepoints. The current algorithm follows ancient Unicode and
may accept invalid byte sequences, pass on non-existent codepoints and
reject valid sequences.

The patch restores the functions' compliance with modern Unicode
(v15.1 [1] + many previous versions) as well as RFC 3629 [2].

1. Codepoint space is limited to 0x10FFFF.
2. "Noncharacters", such as U+FFFE, U+FFFF, are no longer invalid in
   Unicode and will be accepted.

Another option was to complete the set of noncharacters (it used to be
just those two, now there are more) and preserve the rejection step.
This is indeed what Unicode suggests ([1] chap. 23.7), but does not
require. Most codepoints are !iswprint() anyway, so singling out just
the noncharacters seemed arbitrary and futile.

This is not a security patch. I'm not aware of any present security
implications of the old code.

[1] https://www.unicode.org/versions/Unicode15.1.0
[2] https://datatracker.ietf.org/doc/html/rfc3629

Signed-off-by: Roman Žilka <roman.zilka@gmail.com>
Link: https://lore.kernel.org/r/598ab459-6ba9-4a17-b4a1-08f26a356fc0@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
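
For illustration only, the validation rules named in the message can be sketched as a standalone decoder. This is not the code from the patch: utf8_decode() and UTF8_INVALID are hypothetical names, and the surrogate and overlong checks are taken from RFC 3629 rather than from the commit text. The sketch accepts noncharacters such as U+FFFF and rejects anything above 0x10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel; the kernel functions use their own conventions. */
#define UTF8_INVALID ((int32_t)-1)

/*
 * Decode one UTF-8 sequence from buf[0..len).
 * On success, return the codepoint and store the sequence length in *used.
 * On failure, return UTF8_INVALID and set *used to 1 (or 0 if len is 0),
 * so that a caller can skip the offending byte and resynchronise.
 */
static int32_t utf8_decode(const unsigned char *buf, size_t len, size_t *used)
{
	/* Smallest codepoint that legitimately needs a sequence of N bytes. */
	static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
	uint32_t cp;
	size_t need, i;

	*used = len ? 1 : 0;
	if (len == 0)
		return UTF8_INVALID;

	if (buf[0] < 0x80)
		return buf[0];			/* plain ASCII */

	if ((buf[0] & 0xE0) == 0xC0) {
		need = 2;
		cp = buf[0] & 0x1F;
	} else if ((buf[0] & 0xF0) == 0xE0) {
		need = 3;
		cp = buf[0] & 0x0F;
	} else if ((buf[0] & 0xF8) == 0xF0) {
		need = 4;
		cp = buf[0] & 0x07;
	} else {
		return UTF8_INVALID;		/* stray continuation byte or 0xF8..0xFF */
	}

	if (len < need)
		return UTF8_INVALID;		/* truncated sequence */

	for (i = 1; i < need; i++) {
		if ((buf[i] & 0xC0) != 0x80)
			return UTF8_INVALID;	/* not a continuation byte */
		cp = (cp << 6) | (buf[i] & 0x3F);
	}

	if (cp < min_cp[need])
		return UTF8_INVALID;		/* overlong encoding */
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return UTF8_INVALID;		/* UTF-16 surrogates are not encodable */
	if (cp > 0x10FFFF)
		return UTF8_INVALID;		/* beyond the Unicode codepoint space */

	/* Noncharacters (U+FFFE, U+FFFF, ...) are deliberately accepted. */
	*used = need;
	return (int32_t)cp;
}
```

With this sketch, the bytes EF BF BF decode to U+FFFF and F4 8F BF BF to U+10FFFF, while F4 90 80 80 (which would be 0x110000) is reported as invalid.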