aboutsummaryrefslogtreecommitdiff
path: root/libcpp/ucnid.h
AgeCommit message (Collapse)AuthorFilesLines
2022-01-03Update copyright years.Jakub Jelinek1-1/+1
2021-09-01libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode ↵Jakub Jelinek1-1752/+2650
Standard Annex 31 The following patch implements the P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31 paper. We already allow UTF-8 characters in the source, so that part is already implemented, so IMHO all we need to do is pedwarn instead of just warn for the (default) -Wnormalize=nfc (or for -Wnormalize={id,nkfc}) if the character is not in NFC and to use the unicode XID_Start and XID_Continue derived code properties to find out what characters are allowed (the standard actually adds U+005F to XID_Start, but we are handling the ASCII compatible characters differently already and they aren't allowed in UCNs in identifiers). Instead of hardcoding the large tables in ucnid.tab, this patch makes makeucnid.c read them from the Unicode tables (13.0.0 version at this point). For non-pedantic mode, we accept as 2nd+ char in identifiers a union of valid characters in all supported modes, but for the 1st char it was actually pedantically requiring that it is not any of the characters that may not appear in the currently chosen standard as the first character. This patch changes it such that also what is allowed at the start of an identifier is a union of characters valid at the start of an identifier in any of the pedantic modes. 2021-09-01 Jakub Jelinek <jakub@redhat.com> PR c++/100977 libcpp/ * include/cpplib.h (struct cpp_options): Add cxx23_identifiers. * charset.c (CXX23, NXX23): New enumerators. (CID, NFC, NKC, CTX): Renumber. (ucn_valid_in_identifier): Implement P1949R7 - use CXX23 and NXX23 flags for cxx23_identifiers. For start character in non-pedantic mode, allow characters that are allowed as start characters in any of the supported language modes, rather than disallowing characters allowed only as non-start characters in current mode but for characters from other language modes allowing them even if they are never allowed at start. * init.c (struct lang_flags): Add cxx23_identifiers. (lang_defaults): Add cxx23_identifiers column. (cpp_set_lang): Initialize CPP_OPTION (pfile, cxx23_identifiers). * lex.c (warn_about_normalization): If cxx23_identifiers, use cpp_pedwarning_with_line instead of cpp_warning_with_line for "is not in NFC" diagnostics. * makeucnid.c: Adjust usage comment. (CXX23, NXX23): New enumerators. (all_languages): Add CXX23. (not_NFC, not_NFKC, maybe_not_NFC): Renumber. (read_derivedcore): New function. (write_table): Print also CXX23 and NXX23 columns. (main): Require 5 arguments instead of 4, call read_derivedcore. * ucnid.h: Regenerated using Unicode 13.0.0 files. gcc/testsuite/ * g++.dg/cpp23/normalize1.C: New test. * g++.dg/cpp23/normalize2.C: New test. * g++.dg/cpp23/normalize3.C: New test. * g++.dg/cpp23/normalize4.C: New test. * g++.dg/cpp23/normalize5.C: New test. * g++.dg/cpp23/normalize6.C: New test. * g++.dg/cpp23/normalize7.C: New test. * g++.dg/cpp23/ucnid-1-utf8.C: New test. * g++.dg/cpp23/ucnid-2-utf8.C: New test. * gcc.dg/cpp/ucnid-4.c: Don't expect "not valid at the start of an identifier" errors. * gcc.dg/cpp/ucnid-4-utf8.c: Likewise. * gcc.dg/cpp/ucnid-5-utf8.c: New test.
2021-08-05libcpp: Regenerate ucnid.h using Unicode 13.0.0 files [PR100977]Jakub Jelinek1-11/+229
The following patch (incremental to the makeucnid.c fix) regenerates ucnid.h with https://www.unicode.org/Public/13.0.0/ucd/ files. 2021-08-05 Jakub Jelinek <jakub@redhat.com> PR c++/100977 * ucnid.h: Regenerated using Unicode 13.0.0 files.
2021-08-05libcpp: Fix makeucnid bug with combining values [PR100977]Jakub Jelinek1-345/+90
I've noticed in ucnid.h two adjacent lines that had all flags and combine values identical and as such were supposed to be merged. This is due to a bug in makeucnid.c, which records last_flag, last_combine and really_safe of what has just been printed, but because of a typo mishandles it for last_combine, always compares against the combining_value[0] which is 0. This has two effects on the table, one is that often the table is unnecessarily large, as for non-zero .combine every character has its own record instead of adjacent characters with the same flags and combine being merged. This means larger tables. The other is that sometimes the last char that has combine set doesn't actually have it in the tables, because the code is printing entries only upon seeing the next character and if that character does have combining_value of 0 and flags are otherwise the same as previously printed, it will not print anything. The following patch fixes that, for clarity what exactly it affects I've regenerated with the same Unicode files as last time it has been regenerated. 2021-08-05 Jakub Jelinek <jakub@redhat.com> PR c++/100977 * makeucnid.c (write_table): Fix computation of last_combine. * ucnid.h: Regenerated using Unicode 6.3.0 files.
2021-01-04Update copyright years.Jakub Jelinek1-1/+1
2020-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r279813
2019-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r267494
2018-01-03Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r256169
2017-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r243994
2016-01-04Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r232055
2015-01-05Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r219188
2014-01-02Update copyright years in libcpp/Richard Sandiford1-1/+1
From-SVN: r206293
2013-11-16ucnid-2011-1.c: New test.Joseph Myers1-746/+4470
gcc/testsuite: * c-c++-common/cpp/ucnid-2011-1.c: New test. libcpp: * ucnid.tab: Add C11 and C11NOSTART data. * makeucnid.c (digit): Rename enum value to N99. (C11, N11, all_languages): New enum values. (NUM_CODE_POINTS, MAX_CODE_POINT): New macros. (flags, decomp, combining_value): Use NUM_CODE_POINTS as array size. (decomp): Use unsigned int as element type. (all_decomp): New array. (read_ucnid): Handle C11 and C11NOSTART. Use MAX_CODE_POINT. (read_table): Use MAX_CODE_POINT. Store all decompositions in all_decomp. (read_derived): Use MAX_CODE_POINT. (write_table): Use NUM_CODE_POINTS. Print N99, C11 and N11 flags. Print whole array variable declaration rather than just array contents. (char_id_valid, write_context_switch): New functions. (main): Call write_context_switch. * ucnid.h: Regenerate. * include/cpplib.h (struct cpp_options): Add c11_identifiers. * init.c (struct lang_flags): Add c11_identifiers. (cpp_set_lang): Set c11_identifiers option from selected language. * internal.h (struct normalize_state): Document "previous" as previous starter character. (NORMALIZE_STATE_UPDATE_IDNUM): Take character as argument. * charset.c (DIG): Rename enum value to N99. (C11, N11): New enum values. (struct ucnrange): Give name to struct. Use short for flags and unsigned int for end of range. Include ucnid.h for whole variable declaration. (ucn_valid_in_identifier): Allow for characters up to 0x10FFFF. Allow for C11 in determining valid characters and valid start characters. Use check_nfc for non-Hangul context-dependent checks. Only store starter characters in nst->previous. (_cpp_valid_ucn): Pass new argument to NORMALIZE_STATE_UPDATE_IDNUM. * lex.c (lex_identifier): Pass new argument to NORMALIZE_STATE_UPDATE_IDNUM. Call NORMALIZE_STATE_UPDATE_IDNUM after initial non-UCN part of identifier. (lex_number): Pass new argument to NORMALIZE_STATE_UPDATE_IDNUM. From-SVN: r204886
2013-11-15ucnid-9.c: New test.Joseph Myers1-5/+4
gcc/testsuite: * gcc.dg/cpp/ucnid-9.c: New test. libcpp: * ucnid.tab: Mark C99 digits as [C99DIG]. * makeucnid.c (read_ucnid): Handle [C99DIG]. (read_table): Don't check for digit characters. * ucnid.h: Regenerate. From-SVN: r204835
2013-01-14Update copyright years in libcpp.Richard Sandiford1-1/+1
From-SVN: r195162
2009-04-09Licensing changes to GPLv3 resp. GPLv3 with GCC Runtime Exception.Jakub Jelinek1-4/+4
From-SVN: r145841
2005-06-29all files: Update FSF address in copyright headers.Kelley Cook1-1/+1
2005-06-29 Kelley Cook <kcook@gcc.gnu.org> * all files: Update FSF address in copyright headers. * makeucnid.c (write_copyright): Update outputted FSF address. From-SVN: r101413
2005-03-15Index: gcc/ChangeLogGeoffrey Keating1-327/+792
2005-03-14 Geoffrey Keating <geoffk@apple.com> * doc/cppopts.texi (-fexec-charset): Add concept index entry. (-fwide-exec-charset): Likewise. (-finput-charset): Likewise. * doc/invoke.texi (Warning Options): Document -Wnormalized=. * c-opts.c (c_common_handle_option): Handle -Wnormalized=. * c.opt (Wnormalized): New. Index: libcpp/ChangeLog 2005-03-14 Geoffrey Keating <geoffk@apple.com> * init.c (cpp_create_reader): Default warn_normalize to normalized_C. * charset.c: Update for new format of ucnid.h. (ucn_valid_in_identifier): Update for new format of ucnid.h. Add NST parameter, and update it; update callers. (cpp_valid_ucn): Add NST parameter, update callers. Replace abort with cpp_error. (convert_ucn): Pass normalize_state to cpp_valid_ucn. * internal.h (struct normalize_state): New. (INITIAL_NORMALIZE_STATE): New. (NORMALIZE_STATE_RESULT): New. (NORMALIZE_STATE_UPDATE_IDNUM): New. (_cpp_valid_ucn): New. * lex.c (warn_about_normalization): New. (forms_identifier_p): Add normalize_state parameter, update callers. (lex_identifier): Add normalize_state parameter, update callers. Keep the state current. (lex_number): Likewise. (_cpp_lex_direct): Pass normalize_state to subroutines. Check it with warn_about_normalization. * makeucnid.c: New. * ucnid.h: Replace. * ucnid.pl: Remove. * ucnid.tab: Make appropriate for input to makeucnid.c. Remove comments about obsolete version of C++. * include/cpplib.h (enum cpp_normalize_level): New. (struct cpp_options): Add warn_normalize field. Index: gcc/testsuite/ChangeLog 2005-03-14 Geoffrey Keating <geoffk@apple.com> * gcc.dg/cpp/normalize-1.c: New. * gcc.dg/cpp/normalize-2.c: New. * gcc.dg/cpp/normalize-3.c: New. * gcc.dg/cpp/normalize-4.c: New. * gcc.dg/cpp/ucnid-4.c: New. * gcc.dg/cpp/ucnid-5.c: New. * g++.dg/cpp/normalize-1.C: New. * g++.dg/cpp/ucnid-1.C: New. From-SVN: r96459
2004-05-24Makefile.def (host_modules): add libcpp.Paolo Bonzini1-0/+336
ChangeLog: 2004-05-23 Paolo Bonzini <bonzini@gnu.org> * Makefile.def (host_modules): add libcpp. * Makefile.tpl: Add dependencies on and for libcpp. * Makefile.in: Regenerate. * configure.in: Add libcpp host module. * configure: Regenerate. config/ChangeLog: 2004-05-23 Paolo Bonzini <bonzini@gnu.org> * acx.m4 (ACX_HEADER_STDBOOL, ACX_HEADER_STRING): From gcc. gcc/ChangeLog: 2004-05-23 Paolo Bonzini <bonzini@gnu.org> Move libcpp to the toplevel. * Makefile.in: Remove references to libcpp files, use CPPLIBS instead of libcpp.a. Define SYMTAB_H and change hashtable.h to that. * aclocal.m4 (gcc_AC_HEADER_STDBOOL, gcc_AC_HEADER_STRING, gcc_AC_C__BOOL): Remove. * configure.ac (gcc_AC_C__BOOL, HAVE_UCHAR): Remove tests. * configure: Regenerate. * config.in: Regenerate. * c-ppoutput.c: Include ../libcpp/internal.h instead of cpphash.h. * cppcharset.c: Removed. * cpperror.c: Removed. * cppexp.c: Removed. * cppfiles.c: Removed. * cpphash.c: Removed. * cpphash.h: Removed. * cppinit.c: Removed. * cpplex.c: Removed. * cpplib.c: Removed. * cpplib.h: Removed. * cppmacro.c: Removed. * cpppch.c: Removed. * cpptrad.c: Removed. * cppucnid.h: Removed. * cppucnid.pl: Removed. * cppucnid.tab: Removed. * hashtable.c: Removed. * hashtable.h: Removed. * line-map.c: Removed. * line-map.h: Removed. * mkdeps.c: Removed. * mkdeps.h: Removed. * stringpool.h: Include symtab.h instead of hashtable.h. * tree.h: Include symtab.h instead of hashtable.h. * system.h (O_NONBLOCK, O_NOCTTY): Do not define. gcc/cp/ChangeLog: 2004-05-23 Paolo Bonzini <bonzini@gnu.org> * Make-lang.in: No need to specify $(LIBCPP). gcc/java/ChangeLog: 2004-05-23 Paolo Bonzini <bonzini@gnu.org> * Make-lang.in: Link in $(LIBCPP) instead of mkdeps.o. libcpp/ChangeLog: 2004-05-23 Paolo Bonzini <bonzini@gnu.org> Moved libcpp from the gcc subdirectory to the toplevel. * Makefile.am: New file. * Makefile.in: Regenerate. * configure.ac: New file. * configure: Regenerate. * config.in: Regenerate. * charset.c: Moved from gcc/cppcharset.c. Add note about brokenness of input charset detection. Adjust for change in name of cppucnid.h. * errors.c: Moved from gcc/cpperror.c. Do not include intl.h. * expr.c: Moved from gcc/cppexp.c. * files.c: Moved from gcc/cppfiles.c. Do not include intl.h. Remove #define of O_BINARY, it is in system.h. * identifiers.c: Moved from gcc/cpphash.c. * internal.h: Moved from gcc/cpphash.h. Change header guard name. All other files adjusted to match name change. * init.c: Moved from gcc/cppinit.c. (init_library) [ENABLE_NLS]: Call bindtextdomain. * lex.c: Moved from gcc/cpplex.c. * directives.c: Moved from gcc/cpplib.c. * macro.c: Moved from gcc/cppmacro.c. * pch.c: Moved from gcc/cpppch.c. Do not include intl.h. * traditional.c: Moved from gcc/cpptrad.c. * ucnid.h: Moved from gcc/cppucnid.h. Change header guard name. * ucnid.pl: Moved from gcc/cppucnid.pl. * ucnid.tab: Moved from gcc/cppucnid.tab. Change header guard name. * symtab.c: Moved from gcc/hashtable.c. * line-map.c: Moved from gcc. Do not include intl.h. * mkdeps.c: Moved from gcc. * system.h: New file. libcpp/include/ChangeLog: 2004-05-23 Paolo Bonzini <bonzini@gnu.org> * cpplib.h: Moved from gcc. Change header guard name. * line-map.h: Moved from gcc. Change header guard name. * mkdeps.h: Moved from gcc. Change header guard name. * symtab.h: Moved from gcc/hashtable.h. Change header guard name. libcpp/po/ChangeLog: 2004-05-23 Paolo Bonzini <bonzini@gnu.org> * be.po: Extracted from gcc/po/be.po. * ca.po: Extracted from gcc/po/ca.po. * da.po: Extracted from gcc/po/da.po. * de.po: Extracted from gcc/po/de.po. * el.po: Extracted from gcc/po/el.po. * es.po: Extracted from gcc/po/es.po. * fr.po: Extracted from gcc/po/fr.po. * ja.po: Extracted from gcc/po/ja.po. * nl.po: Extracted from gcc/po/nl.po. * sv.po: Extracted from gcc/po/sv.po. * tr.po: Extracted from gcc/po/tr.po. From-SVN: r82199