author | Corentin Jabot <corentinjabot@gmail.com> | 2022-04-04 12:41:12 +0200 |
---|---|---|
committer | Corentin Jabot <corentinjabot@gmail.com> | 2022-06-25 19:03:33 +0200 |
commit | c92056d038812c23800131892bee48abb2de7ca0 (patch) | |
tree | 9d6b03771d9072131513830402ee2312819948fb /llvm/unittests/Support/SourceMgrTest.cpp | |
parent | f8c1c9afd3e2286a8fac99fb9978f1566b89fa70 (diff) | |
[Clang][C++23] P2071 Named universal character escapes
Implements [[ https://wg21.link/p2071r1 | P2071 Named Universal Character Escapes ]] as an extension in all language modes. A follow-up patch to stop warning in C++23 mode will be done later, once the paper is plenary approved (in July).
We add:
* A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` into a space-efficient data structure that can be queried in `O(NameLength)`
* A set of functions in `Unicode.h` to query that data, including:
  * A function to find an exact match of a given Unicode character name
  * A function to perform loose matching (ignoring case, space, underscore, medial hyphen)
  * A function returning the best matching code point for a given string per edit distance
* Support for `\N{}` escape sequences in string and character literals, with diagnostics/fixits for loose matches and typos
* Support for `\N{}` as a UCN, with diagnostics/fixits for loose matches.
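To illustrate the exact-match and edit-distance queries listed above, here is a minimal sketch. It is not the code from this patch: the table `SampleNames` and the functions `editDistance`, `exactMatch` and `nearestMatch` are hypothetical names chosen for the example, and a linear scan over a four-entry array stands in for the generated data structure.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature name table. The real data is generated from
// UnicodeData.txt and NameAliases.txt and stored far more compactly.
struct NamedChar {
  const char *Name;
  char32_t CodePoint;
};

static const NamedChar SampleNames[] = {
    {"GREEK SMALL LETTER ALPHA", 0x03B1},
    {"GREEK SMALL LETTER BETA", 0x03B2},
    {"LATIN SMALL LETTER A", 0x0061},
    {"SNOWMAN", 0x2603},
};

// Levenshtein distance computed with a single rolling row.
std::size_t editDistance(const std::string &A, const std::string &B) {
  std::vector<std::size_t> Row(B.size() + 1);
  for (std::size_t J = 0; J <= B.size(); ++J)
    Row[J] = J;
  for (std::size_t I = 1; I <= A.size(); ++I) {
    std::size_t Diag = Row[0]; // distance for prefixes (I-1, J-1)
    Row[0] = I;
    for (std::size_t J = 1; J <= B.size(); ++J) {
      std::size_t Up = Row[J]; // distance for prefixes (I-1, J)
      std::size_t Subst = Diag + (A[I - 1] == B[J - 1] ? 0 : 1);
      Row[J] = std::min({Up + 1, Row[J - 1] + 1, Subst});
      Diag = Up;
    }
  }
  return Row[B.size()];
}

// Exact match: the name must be spelled exactly as in the table.
const NamedChar *exactMatch(const std::string &Name) {
  for (const NamedChar &Entry : SampleNames)
    if (Name == Entry.Name)
      return &Entry;
  return nullptr;
}

// Best match per edit distance; on a tie the first table entry wins.
const NamedChar *nearestMatch(const std::string &Name) {
  const NamedChar *Best = nullptr;
  std::size_t BestDistance = 0;
  for (const NamedChar &Entry : SampleNames) {
    std::size_t Distance = editDistance(Name, Entry.Name);
    if (!Best || Distance < BestDistance) {
      Best = &Entry;
      BestDistance = Distance;
    }
  }
  return Best;
}
```

With this table, `exactMatch("SNOWMAN")` finds U+2603, while a misspelling such as `"SNOWMAM"` fails the exact lookup and `nearestMatch` returns the `SNOWMAN` entry, which is the kind of candidate a typo fixit could suggest.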
A name that matches only loosely is diagnosed as an error, to closely match the semantics of P2071.
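As a rough illustration of what loose matching means here, the sketch below reduces a name to a comparison key by dropping spaces, underscores and medial hyphens and by folding case. The helpers `looseKey` and `looseEquals` are hypothetical and not part of `Unicode.h`. The sketch assumes ASCII input, treats a hyphen as medial only when it sits directly between two alphanumeric characters, and omits the special case the Unicode loose matching rule makes for U+1180 HANGUL JUNGSEONG O-E.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Build a comparison key: whitespace, underscores and medial hyphens are
// dropped, and everything else is upper-cased. Simplified sketch only.
std::string looseKey(const std::string &Name) {
  std::string Key;
  for (std::size_t I = 0; I < Name.size(); ++I) {
    unsigned char C = static_cast<unsigned char>(Name[I]);
    if (std::isspace(C) || C == '_')
      continue;
    if (C == '-') {
      // Assumption: "medial" means directly between two alphanumerics.
      bool Medial =
          I > 0 && I + 1 < Name.size() &&
          std::isalnum(static_cast<unsigned char>(Name[I - 1])) &&
          std::isalnum(static_cast<unsigned char>(Name[I + 1]));
      if (Medial)
        continue;
    }
    Key.push_back(static_cast<char>(std::toupper(C)));
  }
  return Key;
}

// Two names match loosely when their keys are identical.
bool looseEquals(const std::string &A, const std::string &B) {
  return looseKey(A) == looseKey(B);
}
```

Under these assumptions `Zero_Width-Joiner` matches `ZERO WIDTH JOINER` loosely, whereas `TIBETAN LETTER -A` and `TIBETAN LETTER A` stay distinct because the hyphen in the former is not medial. A loose but inexact match like the first one is what the patch reports as an error with a fixit.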
The generated data adds 280kB to the binaries.
`UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D123064
Diffstat (limited to 'llvm/unittests/Support/SourceMgrTest.cpp')
0 files changed, 0 insertions, 0 deletions