From e59f39d40397645477b959255aedfa17a7c9c779 Mon Sep 17 00:00:00 2001
From: Markus Armbruster <armbru@redhat.com>
Date: Thu, 23 Aug 2018 18:39:49 +0200
Subject: json: Reject invalid UTF-8 sequences

We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
\xF5..\xFF in the lexer.  That's insufficient; there's plenty of
invalid UTF-8 not containing these bytes, as demonstrated by
check-qjson:

* Malformed sequences

  - Unexpected continuation bytes

  - Missing continuation bytes after start bytes other than
    \xC0..\xC1, \xF5..\xFD.

* Overlong sequences with start bytes other than \xC0..\xC1,
  \xF5..\xFD.

* Invalid code points

Fixing this in the lexer would be bothersome.  Fixing it in the parser
is straightforward, so do that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-23-armbru@redhat.com>
---
 include/qemu/unicode.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include')

diff --git a/include/qemu/unicode.h b/include/qemu/unicode.h
index 71c72db..7fa10b8 100644
--- a/include/qemu/unicode.h
+++ b/include/qemu/unicode.h
@@ -2,5 +2,6 @@
 #define QEMU_UNICODE_H
 
 int mod_utf8_codepoint(const char *s, size_t n, char **end);
+ssize_t mod_utf8_encode(char buf[], size_t bufsz, int codepoint);
 
 #endif
-- 
cgit v1.1