riscv-gnu-toolchain/newlib.git

author	Corinna Vinschen <corinna@vinschen.de>	2025-07-23 22:42:01 +0200
committer	Corinna Vinschen <corinna@vinschen.de>	2025-07-24 11:23:27 +0200
commit	1463b41d403e861e4033387cdc71006e1664203a (patch)
tree	5079a888e6f349b44a059464de1fb3df2bdb675d /newlib/libc/stdio/vasniprintf.c
parent	ba962ee04543855cfc6e2dc79a7369a78218815a (diff)

Cygwin: _sys_mbstowcs: fix handling invalid 4-byte UTF-8 sequences

When a 4 byte utf-8 sequence has an invalid 4th byte, it's actually an invalid 3 byte sequence. In this case we already generated the high surrogate and only realize the problem when byte 4 doesn't match. At this point _sys_mbstowcs transposes the invalid 4th byte into the private use area. This is wrong. The invalid byte sequence here is the 3 byte sequence already converted to a high surrogate, not the trailing 4th byte.

Fix this by backtracking to the start of the broken sequence and overwrite the already written high surrogate with a sequence of the original three bytes transposed to the private use area. Reset the mbstate and restart normal conversion at the non-matching 4th byte, which might start a new multibyte sequence.

The resulting wide-char string can be converted back to multibyte and back again to wide-char, and the result will be identical, even if the multibyte sequence differs from the original sequence.

Fixes: e44b9069cd227 ("* strfuncs.cc (sys_cp_mbstowcs): Treat src as unsigned char *. Convert failure of f_mbtowc into a single malformed utf-16 value.")
Signed-off-by: Corinna Vinschen <corinna@vinschen.de>
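
The backtracking described in the commit message can be illustrated with a small standalone sketch. This is not the Cygwin implementation: `sketch_utf8_to_utf16`, `sketch_is_cont` and `SKETCH_PUA_BASE` are hypothetical names, the mapping of an invalid byte to `0xF000 | byte` is an assumption about which part of the private use area is used, and the sketch works on a complete buffer instead of carrying an mbstate between calls. A 4-byte sequence that is cut off by the end of the input is treated like one with a bad 4th byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: an invalid byte b is transposed to U+F000 + b. */
#define SKETCH_PUA_BASE 0xF000u

static int sketch_is_cont(unsigned char c)
{
  return (c & 0xC0) == 0x80;   /* UTF-8 continuation byte 10xxxxxx */
}

/* Hypothetical helper, not a Cygwin API. dst needs room for n units.
   Returns the number of UTF-16 units written. */
size_t sketch_utf8_to_utf16(uint16_t *dst, const unsigned char *src, size_t n)
{
  size_t i = 0, o = 0;

  while (i < n) {
    unsigned char c = src[i];

    if (c < 0x80) {                              /* ASCII */
      dst[o++] = c;
      i++;
      continue;
    }
    if (c >= 0xC2 && c <= 0xDF && n - i >= 2 && sketch_is_cont(src[i + 1])) {
      dst[o++] = (uint16_t)(((c & 0x1F) << 6) | (src[i + 1] & 0x3F));
      i += 2;
      continue;
    }
    if ((c & 0xF0) == 0xE0 && n - i >= 3
        && sketch_is_cont(src[i + 1]) && sketch_is_cont(src[i + 2])) {
      uint32_t cp = ((uint32_t)(c & 0x0F) << 12)
                    | ((uint32_t)(src[i + 1] & 0x3F) << 6)
                    | (uint32_t)(src[i + 2] & 0x3F);
      /* Reject overlong forms and encoded surrogates. */
      if (cp >= 0x800 && (cp < 0xD800 || cp > 0xDFFF)) {
        dst[o++] = (uint16_t)cp;
        i += 3;
        continue;
      }
    }
    if (c >= 0xF0 && c <= 0xF4 && n - i >= 3
        && sketch_is_cont(src[i + 1]) && sketch_is_cont(src[i + 2])) {
      /* Code point bits known after three bytes; the low 6 bits are missing. */
      uint32_t cp = ((uint32_t)(c & 0x07) << 18)
                    | ((uint32_t)(src[i + 1] & 0x3F) << 12)
                    | ((uint32_t)(src[i + 2] & 0x3F) << 6);
      if (cp >= 0x10000 && cp <= 0x10FFFF) {
        size_t start = o;                        /* remember for backtracking */
        /* The high surrogate is already determined by the first 3 bytes. */
        dst[o++] = (uint16_t)(0xD800 | ((cp - 0x10000) >> 10));
        if (n - i >= 4 && sketch_is_cont(src[i + 3])) {
          dst[o++] = (uint16_t)(0xDC00 | (cp & 0x3C0) | (src[i + 3] & 0x3F));
          i += 4;
          continue;
        }
        /* Bad 4th byte: overwrite the high surrogate with the three
           original bytes, then resume at the 4th byte. */
        o = start;
        for (int k = 0; k < 3; k++)
          dst[o++] = (uint16_t)(SKETCH_PUA_BASE | src[i + k]);
        i += 3;
        continue;
      }
    }
    /* Any other invalid byte is transposed on its own. */
    dst[o++] = (uint16_t)(SKETCH_PUA_BASE | c);
    i++;
  }
  return o;
}
```

Under these assumptions, the input bytes `'A' F0 9F 98 'B'` yield the five units `0041 F0F0 F09F F098 0042`: the three bytes of the broken sequence are transposed individually and `'B'` is converted normally, rather than a lone high surrogate being followed by a transposed `'B'`.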

Diffstat (limited to 'newlib/libc/stdio/vasniprintf.c')

0 files changed, 0 insertions, 0 deletions

