author | Jakub Jelinek <jakub@redhat.com> | 2024-07-25 21:36:31 +0200 |
---|---|---|
committer | Thomas Koenig <tkoenig@gcc.gnu.org> | 2024-07-28 19:05:58 +0200 |
commit | dd46a960f3bfb4a4d147bb349deee150c2cd0547 (patch) | |
tree | fa4c9cef1b0bbaf677a9584d6b331529bb59a689 /gcc | |
parent | 6f1ab8976f4b169922645c802fc17277f01dc67d (diff) | |
c++: Implement C++26 P2558R2 - Add @, $, and ` to the basic character set [PR110343]
The following patch implements the easy parts of the paper.
When @$` are added to the basic character set, it means that
R"@$`()@$`" should now be valid (here I've noticed most of the
raw string tests were tested solely with -std=c++11 or -std=gnu++11
and I've tried to change that), and on the other side even if
by extension $ is allowed in identifiers, \u0024 or \U00000024
or \u{24} should not be, similarly to how \u0041 is not allowed.
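As a rough illustration of the raw string half of this, the delimiter rule can be sketched as below. This is not the libcpp code from lex_raw_string; the helper names (is_delimiter_char, is_valid_raw_delimiter) and the cxx26 flag are hypothetical and exist only for this sketch. It assumes the usual d-char rule: a basic character set member other than space, parentheses, backslash and the whitespace control characters, with at most 16 characters in the delimiter.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not libcpp code: may `c` appear in a raw string
// delimiter?  `cxx26` selects the P2558R2 rules.
bool is_delimiter_char (char c, bool cxx26)
{
  // Excluded in every mode: space, parentheses and backslash.
  if (c == ' ' || c == '(' || c == ')' || c == '\\')
    return false;
  // Control characters (tab, newline, ...) and non-ASCII bytes are out too.
  if (c < 0x21 || c > 0x7e)
    return false;
  // '$', '@' and '`' join the basic character set only with C++26.
  if (c == '$' || c == '@' || c == '`')
    return cxx26;
  return true;
}

// Hypothetical helper: check a whole delimiter, including the 16 character
// limit that raw-string-5.c exercises.
bool is_valid_raw_delimiter (const std::string &delim, bool cxx26)
{
  if (delim.size () > 16)
    return false;
  for (char c : delim)
    if (!is_delimiter_char (c, cxx26))
      return false;
  return true;
}

// Usage sketch: the delimiter from the commit message is accepted only
// under the C++26 rules.
void self_check ()
{
  assert (is_valid_raw_delimiter ("@$`", true));
  assert (!is_valid_raw_delimiter ("@$`", false));
}
```

The identifier half (rejecting \u0024 as a UCN even though a literal $ is accepted by extension) is a separate check and is not modelled here.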
The paper in 3.1 claims though that
  #include <stdio.h>
  #define STR(x) #x
  int main()
  {
    printf("%s", STR(\u0060)); // U+0060 is ` GRAVE ACCENT
  }
should have been accepted before this paper (and rejected after it),
but g++ rejects it.
I've tried to understand it, but am confused on what is the right
behavior and why.
Consider
  #define STR(x) #x
  const char *a = "\u00b7";
  const char *b = STR(\u00b7);
  const char *c = "\u0041";
  const char *d = STR(\u0041);
  const char *e = STR(a\u00b7);
  const char *f = STR(a\u0041);
  const char *g = STR(a \u00b7);
  const char *h = STR(a \u0041);
  const char *i = "\u066d";
  const char *j = STR(\u066d);
  const char *k = "\u0040";
  const char *l = STR(\u0040);
  const char *m = STR(a\u066d);
  const char *n = STR(a\u0040);
  const char *o = STR(a \u066d);
  const char *p = STR(a \u0040);
Neither clang nor gcc emits any diagnostics on the a, c, i and k
initializers; those are certainly valid (c is invalid in C23 though). With
-pedantic-errors, g++ emits errors on all the others, while clang++ does so
only on the ones with STR involving \u0041, \u0040 and a\u066d. The chosen
values are \u0040 '@' as something being changed by this paper, \u0041 'A'
as a basic character set char valid in identifiers before/after, \u00b7 as
an example of a character which is pedantically valid in identifiers if not
at the start, and \u066d as something pedantically not valid in identifiers.
Now, https://eel.is/c++draft/lex.charset#6 says that a UCN used outside of a
string/character literal which corresponds to a basic character set character
(or control character) is ill-formed; that would make the d, f, h cases
invalid for C++ and the l, n, p cases invalid for C++26.
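The lex.charset#6 rule on its own can be sketched as follows, to make the before/after difference for the four sample values concrete. The function names and the cxx26 flag are hypothetical, and the sketch deliberately covers only this one rule: whether the character may then form or continue an identifier (the \u00b7 and \u066d questions) is a separate check. It assumes the control characters are U+0000 to U+001F and U+007F to U+009F.

```cpp
#include <cassert>

// Hypothetical helper: control characters as assumed for this sketch.
bool is_control_character (char32_t cp)
{
  return cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f);
}

// Hypothetical helper: membership in the basic character set.  Before
// C++26 every printable ASCII character except '$', '@' and '`' is a
// member; P2558R2 adds those three.
bool is_basic_character (char32_t cp, bool cxx26)
{
  // Tab, newline, vertical tab and form feed are members in all modes.
  if (cp == 0x09 || cp == 0x0a || cp == 0x0b || cp == 0x0c)
    return true;
  if (cp < 0x20 || cp > 0x7e)
    return false;
  if (cp == U'$' || cp == U'@' || cp == U'`')
    return cxx26;
  return true;
}

// Hypothetical helper: may a UCN naming `cp` appear outside of a string
// or character literal, as far as lex.charset#6 is concerned?
bool ucn_ok_outside_literal (char32_t cp, bool cxx26)
{
  return !is_control_character (cp) && !is_basic_character (cp, cxx26);
}

// Usage sketch: \u0040 passes this rule before C++26 but not after,
// while \u0041 fails it in both modes.
void self_check ()
{
  assert (ucn_ok_outside_literal (0x40, false));
  assert (!ucn_ok_outside_literal (0x40, true));
  assert (!ucn_ok_outside_literal (0x41, false));
}
```

Under this sketch \u00b7 and \u066d pass in both modes, so any diagnostics on them have to come from the identifier and preprocessing-token rules discussed next.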
https://eel.is/c++draft/lex.name states which characters can appear at the
start of the identifier and which can appear after the start. And
https://eel.is/c++draft/lex.pptoken states that a preprocessing-token is
either an identifier, or tons of other things, or "each non-whitespace
character that cannot be one of the above".
Then https://eel.is/c++draft/lex.pptoken#1 says that this last category is
invalid if the preprocessing token is being converted into token.
And https://eel.is/c++draft/lex.pptoken#2 includes "If any character not in
the basic character set matches the last category, the program is
ill-formed."
Now, e.g. for the C++23 STR(\u0040) case, \u0040 is not in the basic
character set there, so it is valid outside of literals (no longer the case
in C++26), but it isn't a nondigit and doesn't have the XID_Start property,
so IMHO it isn't an identifier and must be the "each non-whitespace character
that cannot be one of the above" case. Why doesn't the above mentioned
https://eel.is/c++draft/lex.pptoken#2 sentence make that invalid? Ignoring
that, I'd say it would then be stringized, and that feels like what
clang++ is doing. Now, e.g. for the STR(a\u066d) case, I wonder why that
isn't lexed as an identifier followed by a \u066d "each non-whitespace
character that cannot be one of the above" token and stringized similarly;
clang++ rejects that.
What GCC libcpp seems to be doing is that forms_identifier_p calls
_cpp_valid_utf8 or _cpp_valid_ucn with an argument which tells whether the
character is the first or a later one in the identifier, and e.g.
_cpp_valid_ucn then, for UCNs valid in string literals, does
  else if (identifier_pos)
    {
      int validity = ucn_valid_in_identifier (pfile, result, nst);
      if (validity == 0)
        cpp_error (pfile, CPP_DL_ERROR,
                   "universal character %.*s is not valid in an identifier",
                   (int) (str - base), base);
      else if (validity == 2 && identifier_pos == 1)
        cpp_error (pfile, CPP_DL_ERROR,
                   "universal character %.*s is not valid at the start of an identifier",
                   (int) (str - base), base);
    }
so basically all the invalid-in-identifier cases emit an error and are then
treated as if they were valid in identifiers. Compare that with what e.g.
_cpp_valid_utf8 does for C but not for C++, and only for chars completely
invalid in identifiers (not for those valid in identifiers but not at the
start):
  /* In C++, this is an error for invalid character in an identifier
     because logically, the UTF-8 was converted to a UCN during
     translation phase 1 (even though we don't physically do it that
     way). In C, this byte rather becomes grammatically a separate
     token. */
  if (CPP_OPTION (pfile, cplusplus))
    cpp_error (pfile, CPP_DL_ERROR,
               "extended character %.*s is not valid in an identifier",
               (int) (*pstr - base), base);
  else
    {
      *pstr = base;
      return false;
    }
The comment doesn't really match what is done in recent C++ versions because
there UCNs are translated to characters and not the other way around.
2024-07-25 Jakub Jelinek <jakub@redhat.com>
PR c++/110343
libcpp/
* lex.cc: C++26 P2558R2 - Add @, $, and ` to the basic character set.
(lex_raw_string): For C++26 allow $@` characters in prefix.
* charset.cc (_cpp_valid_ucn): For C++26 reject \u0024 in identifiers.
gcc/testsuite/
* c-c++-common/raw-string-1.c: Use { c || c++11 } effective target,
remove c++ specific dg-options.
* c-c++-common/raw-string-2.c: Likewise.
* c-c++-common/raw-string-4.c: Likewise.
* c-c++-common/raw-string-5.c: Likewise. Expect some diagnostics
only for non-c++26, for c++26 expect different.
* c-c++-common/raw-string-6.c: Use { c || c++11 } effective target,
remove c++ specific dg-options.
* c-c++-common/raw-string-11.c: Likewise.
* c-c++-common/raw-string-13.c: Likewise.
* c-c++-common/raw-string-14.c: Likewise.
* c-c++-common/raw-string-15.c: Use { c || c++11 } effective target,
change c++ specific dg-options to just -Wtrigraphs.
* c-c++-common/raw-string-16.c: Likewise.
* c-c++-common/raw-string-17.c: Use { c || c++11 } effective target,
remove c++ specific dg-options.
* c-c++-common/raw-string-18.c: Use { c || c++11 } effective target,
remove -std=c++11 from c++ specific dg-options.
* c-c++-common/raw-string-19.c: Likewise.
* g++.dg/cpp26/raw-string1.C: New test.
* g++.dg/cpp26/raw-string2.C: New test.
Diffstat (limited to 'gcc')
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-1.c | 3 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-11.c | 5 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-13.c | 3 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-14.c | 3 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-15.c | 4 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-16.c | 4 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-17.c | 3 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-18.c | 4 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-19.c | 4 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-2.c | 3 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-4.c | 3 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-5.c | 19 |
-rw-r--r-- | gcc/testsuite/c-c++-common/raw-string-6.c | 3 |
-rw-r--r-- | gcc/testsuite/g++.dg/cpp26/raw-string1.C | 4 |
-rw-r--r-- | gcc/testsuite/g++.dg/cpp26/raw-string2.C | 7 |
15 files changed, 40 insertions, 32 deletions
diff --git a/gcc/testsuite/c-c++-common/raw-string-1.c b/gcc/testsuite/c-c++-common/raw-string-1.c
index 199a3c6..321b5af 100644
--- a/gcc/testsuite/c-c++-common/raw-string-1.c
+++ b/gcc/testsuite/c-c++-common/raw-string-1.c
@@ -1,7 +1,6 @@
-// { dg-do run }
+// { dg-do run { target { c || c++11 } } }
 // { dg-require-effective-target wchar }
 // { dg-options "-std=gnu99 -Wno-c++-compat -trigraphs" { target c } }
-// { dg-options "-std=c++0x" { target c++ } }
 
 #ifndef __cplusplus
 #include <wchar.h>
diff --git a/gcc/testsuite/c-c++-common/raw-string-11.c b/gcc/testsuite/c-c++-common/raw-string-11.c
index 19210c5..daa75f3 100644
--- a/gcc/testsuite/c-c++-common/raw-string-11.c
+++ b/gcc/testsuite/c-c++-common/raw-string-11.c
@@ -1,7 +1,7 @@
 // PR preprocessor/48740
+// { dg-do run { target { c || c++11 } } }
 // { dg-options "-std=gnu99 -trigraphs -save-temps" { target c } }
-// { dg-options "-std=c++0x -save-temps" { target c++ } }
-// { dg-do run }
+// { dg-options "-save-temps" { target c++ } }
 
 int main ()
 {
@@ -9,4 +9,3 @@ int main ()
 "foo%sbar%sfred%sbob?""?""?""?""?",
 sizeof ("foo%sbar%sfred%sbob?""?""?""?""?"));
 }
-
diff --git a/gcc/testsuite/c-c++-common/raw-string-13.c b/gcc/testsuite/c-c++-common/raw-string-13.c
index fa11eda..5ab9a45 100644
--- a/gcc/testsuite/c-c++-common/raw-string-13.c
+++ b/gcc/testsuite/c-c++-common/raw-string-13.c
@@ -1,8 +1,7 @@
 // PR preprocessor/57620
-// { dg-do run }
+// { dg-do run { target { c || c++11 } } }
 // { dg-require-effective-target wchar }
 // { dg-options "-std=gnu99 -Wno-c++-compat -trigraphs" { target c } }
-// { dg-options "-std=c++11" { target c++ } }
 
 #ifndef __cplusplus
 #include <wchar.h>
diff --git a/gcc/testsuite/c-c++-common/raw-string-14.c b/gcc/testsuite/c-c++-common/raw-string-14.c
index fba826c..81f0fe9 100644
--- a/gcc/testsuite/c-c++-common/raw-string-14.c
+++ b/gcc/testsuite/c-c++-common/raw-string-14.c
@@ -1,7 +1,6 @@
 // PR preprocessor/57620
-// { dg-do compile }
+// { dg-do compile { target { c || c++11 } } }
 // { dg-options "-std=gnu99 -trigraphs" { target c } }
-// { dg-options "-std=c++11" { target c++ } }
 
 const void *s0 = R"abc\
 def()abcdef" 0;
diff --git a/gcc/testsuite/c-c++-common/raw-string-15.c b/gcc/testsuite/c-c++-common/raw-string-15.c
index 1d101dc..cc9d393 100644
--- a/gcc/testsuite/c-c++-common/raw-string-15.c
+++ b/gcc/testsuite/c-c++-common/raw-string-15.c
@@ -1,8 +1,8 @@
 // PR preprocessor/57620
-// { dg-do run }
+// { dg-do run { target { c || c++11 } } }
 // { dg-require-effective-target wchar }
 // { dg-options "-std=gnu99 -Wno-c++-compat -Wtrigraphs" { target c } }
-// { dg-options "-std=gnu++11 -Wtrigraphs" { target c++ } }
+// { dg-options "-Wtrigraphs" { target c++ } }
 
 #ifndef __cplusplus
 #include <wchar.h>
diff --git a/gcc/testsuite/c-c++-common/raw-string-16.c b/gcc/testsuite/c-c++-common/raw-string-16.c
index 1bf16dd..3ddbd8f 100644
--- a/gcc/testsuite/c-c++-common/raw-string-16.c
+++ b/gcc/testsuite/c-c++-common/raw-string-16.c
@@ -1,7 +1,7 @@
 // PR preprocessor/57620
-// { dg-do compile }
+// { dg-do compile { target { c || c++11 } } }
 // { dg-options "-std=gnu99 -Wtrigraphs" { target c } }
-// { dg-options "-std=gnu++11 -Wtrigraphs" { target c++ } }
+// { dg-options "-Wtrigraphs" { target c++ } }
 
 const void *s0 = R"abc\
 def()abcdef" 0;
diff --git a/gcc/testsuite/c-c++-common/raw-string-17.c b/gcc/testsuite/c-c++-common/raw-string-17.c
index 30df020..48db8ca 100644
--- a/gcc/testsuite/c-c++-common/raw-string-17.c
+++ b/gcc/testsuite/c-c++-common/raw-string-17.c
@@ -1,7 +1,6 @@
 /* PR preprocessor/57824 */
-/* { dg-do run } */
+/* { dg-do run { target { c || c++11 } } } */
 /* { dg-options "-std=gnu99" { target c } } */
-/* { dg-options "-std=c++11" { target c++ } } */
 
 #define S(s) s
 #define T(s) s "\n"
diff --git a/gcc/testsuite/c-c++-common/raw-string-18.c b/gcc/testsuite/c-c++-common/raw-string-18.c
index 6709946..d96639b 100644
--- a/gcc/testsuite/c-c++-common/raw-string-18.c
+++ b/gcc/testsuite/c-c++-common/raw-string-18.c
@@ -1,7 +1,7 @@
 /* PR preprocessor/57824 */
-/* { dg-do compile } */
+/* { dg-do compile { target { c || c++11 } } } */
 /* { dg-options "-std=gnu99 -fdump-tree-optimized-lineno" { target c } } */
-/* { dg-options "-std=c++11 -fdump-tree-optimized-lineno" { target c++ } } */
+/* { dg-options "-fdump-tree-optimized-lineno" { target c++ } } */
 
 const char x[] = R"(
 abc
diff --git a/gcc/testsuite/c-c++-common/raw-string-19.c b/gcc/testsuite/c-c++-common/raw-string-19.c
index 7ab9e6c..88c5420 100644
--- a/gcc/testsuite/c-c++-common/raw-string-19.c
+++ b/gcc/testsuite/c-c++-common/raw-string-19.c
@@ -1,7 +1,7 @@
 /* PR preprocessor/57824 */
-/* { dg-do compile } */
+// { dg-do compile { target { c || c++11 } } }
 /* { dg-options "-std=gnu99 -fdump-tree-optimized-lineno -save-temps" { target c } } */
-/* { dg-options "-std=c++11 -fdump-tree-optimized-lineno -save-temps" { target c++ } } */
+/* { dg-options "-fdump-tree-optimized-lineno -save-temps" { target c++ } } */
 
 const char x[] = R"(
 abc
diff --git a/gcc/testsuite/c-c++-common/raw-string-2.c b/gcc/testsuite/c-c++-common/raw-string-2.c
index 6f2e37d..9601c1d 100644
--- a/gcc/testsuite/c-c++-common/raw-string-2.c
+++ b/gcc/testsuite/c-c++-common/raw-string-2.c
@@ -1,7 +1,6 @@
-// { dg-do run }
+// { dg-do run { target { c || c++11 } } }
 // { dg-require-effective-target wchar }
 // { dg-options "-std=gnu99 -Wno-c++-compat -trigraphs" { target c } }
-// { dg-options "-std=c++0x" { target c++ } }
 
 #ifndef __cplusplus
 #include <wchar.h>
diff --git a/gcc/testsuite/c-c++-common/raw-string-4.c b/gcc/testsuite/c-c++-common/raw-string-4.c
index 303233b..4870ac4 100644
--- a/gcc/testsuite/c-c++-common/raw-string-4.c
+++ b/gcc/testsuite/c-c++-common/raw-string-4.c
@@ -1,7 +1,6 @@
 // R is not applicable for character literals.
-// { dg-do compile }
+// { dg-do compile { target { c || c++11 } } }
 // { dg-options "-std=gnu99" { target c } }
-// { dg-options "-std=c++0x" { target c++ } }
 
 const int i0 = R'a'; // { dg-error "was not declared|undeclared" "undeclared" }
 // { dg-error "expected ',' or ';'" "expected" { target c } .-1 }
diff --git a/gcc/testsuite/c-c++-common/raw-string-5.c b/gcc/testsuite/c-c++-common/raw-string-5.c
index dbf3133..1bb4a30 100644
--- a/gcc/testsuite/c-c++-common/raw-string-5.c
+++ b/gcc/testsuite/c-c++-common/raw-string-5.c
@@ -1,6 +1,5 @@
-// { dg-do compile }
+// { dg-do compile { target { c || c++11 } } }
 // { dg-options "-std=gnu99" { target c } }
-// { dg-options "-std=c++0x" { target c++ } }
 
 const void *s0 = R"0123456789abcdefg()0123456789abcdefg" 0;
 // { dg-error "raw string delimiter longer" "longer" { target *-*-* } .-1 }
@@ -15,12 +14,18 @@ const void *s3 = R")())" 0;
 // { dg-error "invalid character" "invalid" { target *-*-* } .-1 }
 // { dg-error "stray" "stray" { target *-*-* } .-2 }
 const void *s4 = R"@()@" 0;
-// { dg-error "invalid character" "invalid" { target *-*-* } .-1 }
-// { dg-error "stray" "stray" { target *-*-* } .-2 }
+// { dg-error "invalid character" "invalid" { target { c || c++23_down } } .-1 }
+// { dg-error "stray" "stray" { target { c || c++23_down } } .-2 }
+// { dg-error "before numeric constant" "numeric" { target c++26 } .-3 }
 const void *s5 = R"$()$" 0;
-// { dg-error "invalid character" "invalid" { target *-*-* } .-1 }
-// { dg-error "stray" "stray" { target *-*-* } .-2 }
-const void *s6 = R"\u0040()\u0040" 0;
+// { dg-error "invalid character" "invalid" { target { c || c++23_down } } .-1 }
+// { dg-error "stray" "stray" { target { c || c++23_down } } .-2 }
+// { dg-error "before numeric constant" "numeric" { target c++26 } .-3 }
+const void *s6 = R"`()`" 0;
+// { dg-error "invalid character" "invalid" { target { c || c++23_down } } .-1 }
+// { dg-error "stray" "stray" { target { c || c++23_down } } .-2 }
+// { dg-error "before numeric constant" "numeric" { target c++26 } .-3 }
+const void *s7 = R"\u0040()\u0040" 0;
 // { dg-error "invalid character" "invalid" { target *-*-* } .-1 }
 // { dg-error "stray" "stray" { target *-*-* } .-2 }
 
diff --git a/gcc/testsuite/c-c++-common/raw-string-6.c b/gcc/testsuite/c-c++-common/raw-string-6.c
index 819dd44..d8a5ac0 100644
--- a/gcc/testsuite/c-c++-common/raw-string-6.c
+++ b/gcc/testsuite/c-c++-common/raw-string-6.c
@@ -1,6 +1,5 @@
-// { dg-do compile }
+// { dg-do compile { target { c || c++11 } } }
 // { dg-options "-std=gnu99" { target c } }
-// { dg-options "-std=c++0x" { target c++ } }
 
 const void *s0 = R"ouch()ouCh"; // { dg-error "unterminated raw string" "unterminated" }
 // { dg-error "at end of input" "end" { target *-*-* } .-1 }
diff --git a/gcc/testsuite/g++.dg/cpp26/raw-string1.C b/gcc/testsuite/g++.dg/cpp26/raw-string1.C
new file mode 100644
index 0000000..1040c70
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp26/raw-string1.C
@@ -0,0 +1,4 @@
+// C++26 P2558R2 - Add @, $, and ` to the basic character set
+// { dg-do compile { target c++26 } }
+
+const char *s0 = R"`@$$@`@`$()`@$$@`@`$";
diff --git a/gcc/testsuite/g++.dg/cpp26/raw-string2.C b/gcc/testsuite/g++.dg/cpp26/raw-string2.C
new file mode 100644
index 0000000..a756290
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp26/raw-string2.C
@@ -0,0 +1,7 @@
+// C++26 P2558R2 - Add @, $, and ` to the basic character set
+// { dg-do compile { target { ! { avr*-*-* mmix*-*-* *-*-aix* } } } }
+// { dg-options "" }
+
+int a$b;
+int a\u0024c; // { dg-error "universal character \\\\u0024 is not valid in an identifier" "" { target c++26 } }
+int a\U00000024d; // { dg-error "universal character \\\\U00000024 is not valid in an identifier" "" { target c++26 } }