From e59f39d40397645477b959255aedfa17a7c9c779 Mon Sep 17 00:00:00 2001 From: Markus Armbruster Date: Thu, 23 Aug 2018 18:39:49 +0200 Subject: json: Reject invalid UTF-8 sequences We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1, \xF5..\xFF in the lexer. That's insufficient; there's plenty of invalid UTF-8 not containing these bytes, as demonstrated by check-qjson: * Malformed sequences - Unexpected continuation bytes - Missing continuation bytes after start bytes other than \xC0..\xC1, \xF5..\xFD. * Overlong sequences with start bytes other than \xC0..\xC1, \xF5..\xFD. * Invalid code points Fixing this in the lexer would be bothersome. Fixing it in the parser is straightforward, so do that. Signed-off-by: Markus Armbruster Reviewed-by: Eric Blake Message-Id: <20180823164025.12553-23-armbru@redhat.com> --- include/qemu/unicode.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/qemu/unicode.h b/include/qemu/unicode.h index 71c72db..7fa10b8 100644 --- a/include/qemu/unicode.h +++ b/include/qemu/unicode.h @@ -2,5 +2,6 @@ #define QEMU_UNICODE_H int mod_utf8_codepoint(const char *s, size_t n, char **end); +ssize_t mod_utf8_encode(char buf[], size_t bufsz, int codepoint); #endif -- cgit v1.1