aboutsummaryrefslogtreecommitdiff
path: root/libcpp/internal.h
diff options
context:
space:
mode:
authorMarek Polacek <polacek@redhat.com>2021-10-06 14:33:59 -0400
committerMarek Polacek <polacek@redhat.com>2021-11-16 21:56:16 -0500
commit51c500269bf53749b107807d84271385fad35628 (patch)
treeb7fd2eeed15448294f60989bed252e3488b251e0 /libcpp/internal.h
parent111fd515f2894d7cddf62f80c69765c43ae18577 (diff)
downloadgcc-51c500269bf53749b107807d84271385fad35628.zip
gcc-51c500269bf53749b107807d84271385fad35628.tar.gz
gcc-51c500269bf53749b107807d84271385fad35628.tar.bz2
libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026]
From a link below: "An issue was discovered in the Bidirectional Algorithm in the Unicode Specification through 14.0. It permits the visual reordering of characters via control sequences, which can be used to craft source code that renders different logic than the logical ordering of tokens ingested by compilers and interpreters. Adversaries can leverage this to encode source code for compilers accepting Unicode such that targeted vulnerabilities are introduced invisibly to human reviewers." More info: https://nvd.nist.gov/vuln/detail/CVE-2021-42574 https://trojansource.codes/ This is not a compiler bug. However, to mitigate the problem, this patch implements -Wbidi-chars=[none|unpaired|any] to warn about possibly misleading Unicode bidirectional control characters the preprocessor may encounter. The default is =unpaired, which warns about improperly terminated bidirectional control characters; e.g. a LRE without its corresponding PDF. The level =any warns about any use of bidirectional control characters. This patch handles both UCNs and UTF-8 characters. UCNs designating bidi characters in identifiers are accepted since r204886. Then r217144 enabled -fextended-identifiers by default. Extended characters in C/C++ identifiers have been accepted since r275979. However, this patch still warns about mixing UTF-8 and UCN bidi characters; there seems to be no good reason to allow mixing them. We warn in different contexts: comments (both C and C++-style), string literals, character constants, and identifiers. Expectedly, UCNs are ignored in comments and raw string literals. The bidirectional control characters can nest so this patch handles that as well. I have not included nor tested this at all with Fortran (which also has string literals and line comments). Dave M. posted patches improving diagnostic involving Unicode characters. This patch does not make use of this new infrastructure yet. PR preprocessor/103026 gcc/c-family/ChangeLog: * c.opt (Wbidi-chars, Wbidi-chars=): New option. gcc/ChangeLog: * doc/invoke.texi: Document -Wbidi-chars. libcpp/ChangeLog: * include/cpplib.h (enum cpp_bidirectional_level): New. (struct cpp_options): Add cpp_warn_bidirectional. (enum cpp_warning_reason): Add CPP_W_BIDIRECTIONAL. * internal.h (struct cpp_reader): Add warn_bidi_p member function. * init.c (cpp_create_reader): Set cpp_warn_bidirectional. * lex.c (bidi): New namespace. (get_bidi_utf8): New function. (get_bidi_ucn): Likewise. (maybe_warn_bidi_on_close): Likewise. (maybe_warn_bidi_on_char): Likewise. (_cpp_skip_block_comment): Implement warning about bidirectional control characters. (skip_line_comment): Likewise. (forms_identifier_p): Likewise. (lex_identifier): Likewise. (lex_string): Likewise. (lex_raw_string): Likewise. gcc/testsuite/ChangeLog: * c-c++-common/Wbidi-chars-1.c: New test. * c-c++-common/Wbidi-chars-2.c: New test. * c-c++-common/Wbidi-chars-3.c: New test. * c-c++-common/Wbidi-chars-4.c: New test. * c-c++-common/Wbidi-chars-5.c: New test. * c-c++-common/Wbidi-chars-6.c: New test. * c-c++-common/Wbidi-chars-7.c: New test. * c-c++-common/Wbidi-chars-8.c: New test. * c-c++-common/Wbidi-chars-9.c: New test. * c-c++-common/Wbidi-chars-10.c: New test. * c-c++-common/Wbidi-chars-11.c: New test. * c-c++-common/Wbidi-chars-12.c: New test. * c-c++-common/Wbidi-chars-13.c: New test. * c-c++-common/Wbidi-chars-14.c: New test. * c-c++-common/Wbidi-chars-15.c: New test. * c-c++-common/Wbidi-chars-16.c: New test. * c-c++-common/Wbidi-chars-17.c: New test.
Diffstat (limited to 'libcpp/internal.h')
-rw-r--r--libcpp/internal.h7
1 files changed, 7 insertions, 0 deletions
diff --git a/libcpp/internal.h b/libcpp/internal.h
index 8577cab..0ce0246 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -597,6 +597,13 @@ struct cpp_reader
/* Location identifying the main source file -- intended to be line
zero of said file. */
location_t main_loc;
+
+ /* Returns true iff we should warn about UTF-8 bidirectional control
+ characters. */
+ bool warn_bidi_p () const
+ {
+ return CPP_OPTION (this, cpp_warn_bidirectional) != bidirectional_none;
+ }
};
/* Character classes. Based on the more primitive macros in safe-ctype.h.