Age | Commit message (Collapse) | Author | Files | Lines |
|
The warning for -Wunknown-pragmas is issued at the location provided by
libcpp to the def_pragma() callback. This location is
cpp_reader::directive_line, which is a location for the start of the line
only; it is also not a valid location in case the unknown pragma was lexed
from a _Pragma string. These factors make it impossible to suppress
-Wunknown-pragmas via _Pragma("GCC diagnostic...") directives on the same
source line, as in the PR and the test case. Address that by issuing the
warning at a better location returned by cpp_get_diagnostic_override_loc().
libcpp already maintains this location to handle _Pragma-related diagnostics
internally; it was needed also to make a publicly accessible version of it.
gcc/c-family/ChangeLog:
PR c/118838
* c-lex.cc (cb_def_pragma): Call cpp_get_diagnostic_override_loc()
to get a valid location at which to issue -Wunknown-pragmas, in case
it was triggered from a _Pragma.
libcpp/ChangeLog:
PR c/118838
* errors.cc (cpp_get_diagnostic_override_loc): New function.
* include/cpplib.h (cpp_get_diagnostic_override_loc): Declare.
gcc/testsuite/ChangeLog:
PR c/118838
* c-c++-common/cpp/pragma-diagnostic-loc-2.c: New test.
* g++.dg/gomp/macro-4.C: Adjust expected output.
* gcc.dg/gomp/macro-4.c: Likewise.
* gcc.dg/cpp/Wunknown-pragmas-1.c: Likewise.
|
|
At the end of a sequence like:
#pragma GCC push_options
...
#pragma GCC pop_options
the handler for pop_options calls cl_optimization_compare() (as generated by
optc-save-gen.awk) to make sure that all global state has been restored to
the value it had prior to the push_options call. The verification is
performed for almost all entries in the global_options struct. This leads to
unexpected checking asserts, as discussed in the PR, in case the state of
warnings-related options has been intentionally modified in between
push_options and pop_options via a call to #pragma GCC diagnostic. Address
that by skipping the verification for CL_WARNING-flagged options.
gcc/ChangeLog:
PR middle-end/115913
* optc-save-gen.awk (cl_optimization_compare): Skip options with
CL_WARNING flag.
gcc/testsuite/ChangeLog:
PR middle-end/115913
* c-c++-common/cpp/pr115913.c: New test.
|
|
__has_builtin(__builtin_c23_va_start)
The Linux kernel uses its own copy of stdarg.h.
Now, before GCC 15, our stdarg.h had
#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
#define va_start(v, ...) __builtin_va_start(v, 0)
#else
#define va_start(v,l) __builtin_va_start(v,l)
#endif
va_start definition but GCC 15 has:
#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
#define va_start(...) __builtin_c23_va_start(__VA_ARGS__)
#else
#define va_start(v,l) __builtin_va_start(v,l)
#endif
I wanted to suggest to the kernel people during their porting to C23
that they'd better use C23 compatible va_start macro definition,
but to make it portable, I think they really want something like
#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
#define va_start(v, ...) __builtin_va_start(v, 0)
#ifdef __has_builtin
#if __has_builtin(__builtin_c23_va_start)
#undef va_start
#define va_start(...) __builtin_c23_va_start(__VA_ARGS__)
#endif
#else
#define va_start(v,l) __builtin_va_start(v,l)
#endif
or so (or with >= 202311L), as GCC 13-14 and clang don't support
__builtin_c23_va_start (yet?) and one gets better user experience with
that.
Except it seems __has_builtin(__builtin_c23_va_start) doesn't actually work,
it works for most of the stdarg.h __builtin_va_*, doesn't work for
__builtin_va_arg (neither C nor C++) and didn't work for
__builtin_c23_va_start if it was available.
The following patch wires __has_builtin for those.
2025-01-21 Jakub Jelinek <jakub@redhat.com>
gcc/c/
* c-decl.cc (names_builtin_p): Return 1 for RID_C23_VA_START and
RID_VA_ARG.
gcc/cp/
* cp-objcp-common.cc (names_builtin_p): Return 1 for RID_VA_ARG.
gcc/testsuite/
* c-c++-common/cpp/has-builtin-4.c: New test.
|
|
With the https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673945.html
hack we get slightly different error wording in one of the errors, given that
the test actually does use #embed, I think both wordings are just fine and
we should accept them.
2025-01-17 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/cpp/embed-10.c: Allow a different error wording for
C++.
|
|
The following testcases ICE because fold_array_ctor_reference in the
RAW_DATA_CST handling just return build_int_cst without actually checking
that if type is non-NULL, TREE_TYPE (val) is uselessly convertible to it.
By falling through the code after it without *suboff += we get everything
we need, the two if conditionals will never be true (we've already
checked that size == BITS_PER_UNIT and so can't be 0, and val will be
INTEGER_CST), but it will do the important fold_ctor_reference call
which will deal with type incompatibilities.
2024-12-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118207
* gimple-fold.cc (fold_array_ctor_reference): For RAW_DATA_CST,
just set val to build_int_cst and fall through to the normal
element handling code instead of returning build_int_cst right away.
* gcc.dg/pr118207.c: New test.
|
|
Some directives in the test were #errror rather than #error.
2024-12-13 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/cpp/embed-1.c: Use #error rather than #errror.
|
|
This patch adds similar optimizations to the C++ FE as have been
implemented earlier in the C FE.
The libcpp hunk enables use of CPP_EMBED token even for C++, not just
C; the preprocessor guarantees there is always a CPP_NUMBER CPP_COMMA
before CPP_EMBED and CPP_COMMA CPP_NUMBER after it which simplifies
parsing (unless #embed is more than 2GB, in that case it could be
CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED
CPP_COMMA CPP_NUMBER etc. with each CPP_EMBED covering at most INT_MAX
bytes).
Similarly to the C patch, this patch parses it into RAW_DATA_CST tree
in the braced initializers (and from there peels into INTEGER_CSTs unless
it is an initializer of an std::byte array or integral array with CHAR_BIT
element precision), parses CPP_EMBED in cp_parser_expression into just
the last INTEGER_CST in it because I think users don't need millions of
-Wunused-value warnings because they did useless
int a = (
#embed "megabyte.dat"
);
and so most of the inner INTEGER_CSTs would be there just for the warning,
and in the rest of contexts like template argument list, function argument
list, attribute argument list, ...) parse it into a sequence of INTEGER_CSTs
(I wrote a range/iterator classes to simplify that).
My dumb
cat embed-11.c
constexpr unsigned char a[] = {
#embed "cc1plus"
};
const unsigned char *b = a;
testcase where cc1plus is 492329008 bytes long when configured
--enable-checking=yes,rtl,extra against recent binutils with .base64 gas
support results in:
time ./xg++ -B ./ -S -O2 embed-11.c
real 0m4.350s
user 0m2.427s
sys 0m0.830s
time ./xg++ -B ./ -c -O2 embed-11.c
real 0m6.932s
user 0m6.034s
sys 0m0.888s
(compared to running out of memory or very long compilation).
On a shorter inclusion,
cat embed-12.c
constexpr unsigned char a[] = {
#embed "xg++"
};
const unsigned char *b = a;
where xg++ is 15225904 bytes long, this takes using GCC with the #embed
patchset except for this patch:
time ~/src/gcc/obj36/gcc/xg++ -B ~/src/gcc/obj36/gcc/ -S -O2 embed-12.c
real 0m33.190s
user 0m32.327s
sys 0m0.790s
and with this patch:
time ./xg++ -B ./ -S -O2 embed-12.c
real 0m0.118s
user 0m0.090s
sys 0m0.028s
The patch doesn't change anything on what the first patch in the series
introduces even for C++, namely that #embed is expanded (actually or as if)
into a sequence of literals like
127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253
and so each element has int type.
That is how I believe it is in C23, and the different versions of the
C++ P1967 paper specified there some casts, P1967R12 in particular
"Otherwise, the integral constant expression is the value of std::fgetc’s return is cast
to unsigned char."
but please see
https://github.com/llvm/llvm-project/pull/97274#issuecomment-2230929277
comment and whether we really want the preprocessor to preprocess it for
C++ as (or as-if)
static_cast<unsigned char>(127),static_cast<unsigned char>(69),static_cast<unsigned char>(76),static_cast<unsigned char>(70),static_cast<unsigned char>(2),...
i.e. 9 tokens per byte rather than 2, or
(unsigned char)127,(unsigned char)69,...
or
((unsigned char)127),((unsigned char)69),...
etc.
Without a literal suffix for unsigned char constant literals it is horrible,
plus the incompatibility between C and C++. Sure, we could use the magic
form more often for C++ to save the size and do the 9 or how many tokens
form only for the boundary constants and use #embed "." __gnu__::__base64__("...")
for what is in between if there are at least 2 tokens inside of it.
E.g. (unsigned char)127 vs. static_cast<unsigned char>(127) behaves
differently if there is constexpr long long p[] = { ... };
...
#embed __FILE__
[p]
2024-12-06 Jakub Jelinek <jakub@redhat.com>
libcpp/
* files.cc (finish_embed): Use CPP_EMBED even for C++.
gcc/
* tree.h (RAW_DATA_UCHAR_ELT, RAW_DATA_SCHAR_ELT): Define.
gcc/cp/ChangeLog:
* cp-tree.h (class raw_data_iterator): New type.
(class raw_data_range): New type.
* parser.cc (cp_parser_postfix_open_square_expression): Handle
parsing of CPP_EMBED.
(cp_parser_parenthesized_expression_list): Likewise. Use
cp_lexer_next_token_is.
(cp_parser_expression): Handle parsing of CPP_EMBED.
(cp_parser_template_argument_list): Likewise.
(cp_parser_initializer_list): Likewise.
(cp_parser_oacc_clause_tile): Likewise.
(cp_parser_omp_tile_sizes): Likewise.
* pt.cc (tsubst_expr): Handle RAW_DATA_CST.
* constexpr.cc (reduced_constant_expression_p): Likewise.
(raw_data_cst_elt): New function.
(find_array_ctor_elt): Handle RAW_DATA_CST.
(cxx_eval_array_reference): Likewise.
* typeck2.cc (digest_init_r): Emit -Wnarrowing and/or -Wconversion
diagnostics.
(process_init_constructor_array): Handle RAW_DATA_CST.
* decl.cc (maybe_deduce_size_from_array_init): Likewise.
(is_direct_enum_init): Fail for RAW_DATA_CST.
(cp_maybe_split_raw_data): New function.
(consume_init): New function.
(reshape_init_array_1): Add VECTOR_P argument. Handle RAW_DATA_CST.
(reshape_init_array): Adjust reshape_init_array_1 caller.
(reshape_init_vector): Likewise.
(reshape_init_class): Handle RAW_DATA_CST.
(reshape_init_r): Likewise.
gcc/testsuite/
* c-c++-common/cpp/embed-22.c: New test.
* c-c++-common/cpp/embed-23.c: New test.
* g++.dg/cpp/embed-4.C: New test.
* g++.dg/cpp/embed-5.C: New test.
* g++.dg/cpp/embed-6.C: New test.
* g++.dg/cpp/embed-7.C: New test.
* g++.dg/cpp/embed-8.C: New test.
* g++.dg/cpp/embed-9.C: New test.
* g++.dg/cpp/embed-10.C: New test.
* g++.dg/cpp/embed-11.C: New test.
* g++.dg/cpp/embed-12.C: New test.
* g++.dg/cpp/embed-13.C: New test.
* g++.dg/cpp/embed-14.C: New test.
|
|
As noted in bug 117162, C23 changed some rules on UCNs to match C++
(this was a late change agreed in the resolution to CD2 comment
US-032, implementing changes from N3124), which we need to implement.
Allow UCNs below 0xa0 outside identifiers for C, with a
pedwarn-if-pedantic before C23 (and a warning with -Wc11-c23-compat)
except for the always-allowed cases of UCNs for $ @ `. Also as part
of that change, do not allow \u0024 in identifiers as equivalent to $
for C23.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/117162
libcpp/
* include/cpplib.h (struct cpp_options): Add low_ucns.
* init.cc (struct lang_flags, lang_defaults): Add low_ucns.
(cpp_set_lang): Set low_ucns
* charset.cc (_cpp_valid_ucn): For C, allow UCNs below 0xa0
outside identifiers, with a pedwarn if pedantic before C23 or a
warning with -Wc11-c23-compat. Do not allow \u0024 in identifiers
for C23.
gcc/testsuite/
* gcc.dg/cpp/c17-ucn-1.c, gcc.dg/cpp/c17-ucn-2.c,
gcc.dg/cpp/c17-ucn-3.c, gcc.dg/cpp/c17-ucn-4.c,
gcc.dg/cpp/c23-ucn-2.c, gcc.dg/cpp/c23-ucnid-2.c: New tests.
* c-c++-common/cpp/delimited-escape-seq-3.c,
c-c++-common/cpp/named-universal-char-escape-3.c,
gcc.dg/cpp/c23-ucn-1.c, gcc.dg/cpp/c2y-delimited-escape-seq-3.c:
Update expected messages
* gcc.dg/cpp/ucs.c: Use -pedantic-errors. Update expected
messages.
|
|
On Thu, Oct 24, 2024 at 03:33:25PM -0400, Eric Gallager wrote:
> On Thu, Oct 24, 2024 at 4:17 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > I've tried to build stage3 with
> > -Wleading-whitespace=blanks -Wtrailing-whitespace=blank -Wno-error=leading-whitespace=blanks -Wno-error=trailing-whitespace=blank
>
> So wait, it's "blanks" (plural) when it's leading, but "blank"
> (singular) when it's trailing? That inconsistency bothers me...
I've mentioned it already in
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664664.html
Citing that here:
Not sure about the kinds for the option, given -Wleading-whitespace=
uses plural and this option singular and -Wleading-whitespace= spaces
means literally just ' ' characters, while space in
-Wtrailing-whitespace= was ' ', '\t', '\v' and '\f'; so category;
perhaps just use any and blanks?
Other preferences?
Here is a patch to do the blank->blanks and space->any changes.
2024-10-27 Jakub Jelinek <jakub@redhat.com>
gcc/
* doc/invoke.texi (Wtrailing-whitespace=): Change
blank argument to blanks and space argument to any.
gcc/c-family/
* c.opt (warn_trailing_whitespace_kind): Change blank
to blanks and space to any.
gcc/testsuite/
* c-c++-common/cpp/Wtrailing-whitespace-2.c: Use
-Wtrailing-whitespace=blanks rather than -Wtrailing-whitespace=blank.
* c-c++-common/cpp/Wtrailing-whitespace-3.c: Use
-Wtrailing-whitespace=any rather than -Wtrailing-whitespace=space.
* c-c++-common/cpp/Wtrailing-whitespace-7.c: Use
-Wtrailing-whitespace=blanks rather than -Wtrailing-whitespace=blank.
* c-c++-common/cpp/Wtrailing-whitespace-8.c: Use
-Wtrailing-whitespace=any rather than -Wtrailing-whitespace=space.
|
|
The following patch on top of the r15-4346 patch adds
-Wleading-whitespace= warning option.
This warning doesn't care how much one actually indents which line
in the source (that is something that can't be easily done in the
preprocessor without doing syntactic analysis), but just simple checks
on what kind of whitespace is used in the indentation.
I think it is still useful to get warnings about such issues early,
while git diagnoses some of it in patches (e.g. the tab after space
case), getting the warnings earlier might help avoiding such issues
sooner.
There are projects which ban use of tabs and require just spaces,
others which require indentation just with horizontal tabs, and finally
projects which want indentation with tabs for multiples of tabstop size
followed by spaces (fewer than tabstop size), like GCC.
For all 3 kinds the warning diagnoses indentation with '\v' or '\f'
characters (unless line contains just whitespace), and for the last one
also cases where a space in the indentation is followed by horizontal
tab or where there are N or more consecutive spaces in the indentation
(for -ftabstop=N).
BTW, for additional testing I've enabled the warnings (without -Werror
for them) in stage3. There are many warnings (both trailing and leading
whitespace), some of them something that can be easily fixed in the headers
or source files, but others with whitespace issues in generated sources,
so if we enable the warnings, either we'd need to adjust the generators
or disable the warnings in (some of the) generated files.
2024-10-23 Jakub Jelinek <jakub@redhat.com>
libcpp/
* include/cpplib.h (struct cpp_options): Add
cpp_warn_leading_whitespace and cpp_tabstop members.
(enum cpp_warning_reason): Add CPP_W_LEADING_WHITESPACE.
* internal.h (struct _cpp_line_note): Document new
line note kinds.
* init.cc (cpp_create_reader): Set cpp_tabstop to 8.
* lex.cc (find_leading_whitespace_issues): New function.
(_cpp_clean_line): Use it.
(_cpp_process_line_notes): Handle 'L', 'S' and 'T' line notes.
(lex_raw_string): Clear type on 'L', 'S' and 'T' line notes
inside of raw string literals.
gcc/
* doc/invoke.texi (Wleading-whitespace=): Document.
gcc/c-family/
* c.opt (Wleading-whitespace=): New option.
* c-opts.cc (c_common_post_options): Set cpp_opts->cpp_tabstop
to global_dc->m_tabstop.
gcc/testsuite/
* c-c++-common/cpp/Wleading-whitespace-1.c: New test.
* c-c++-common/cpp/Wleading-whitespace-2.c: New test.
* c-c++-common/cpp/Wleading-whitespace-3.c: New test.
* c-c++-common/cpp/Wleading-whitespace-4.c: New test.
|
|
libcpp is not currently set up to be able to generate valid
locations for tokens lexed from a _Pragma string. Instead, after obtaining
the tokens, it sets their locations all to the location of the _Pragma
operator itself. This makes things like _Pragma("GCC diagnostic") work well
enough, but if any diagnostics are issued during lexing, prior to resetting
the token locations, those diagnostics get issued at the invalid
locations. Fix that up by adding a new field pfile->diagnostic_override_loc
that instructs libcpp to issue diagnostics at the alternate location.
libcpp/ChangeLog:
PR preprocessor/114423
* internal.h (struct cpp_reader): Add DIAGNOSTIC_OVERRIDE_LOC
field.
* directives.cc (destringize_and_run): Set the new field to the
location of the _Pragma operator.
* errors.cc (cpp_diagnostic_at): Support DIAGNOSTIC_OVERRIDE_LOC to
temporarily issue diagnostics at a different location.
(cpp_diagnostic_with_line): Likewise.
gcc/testsuite/ChangeLog:
PR preprocessor/114423
* c-c++-common/cpp/pragma-diagnostic-loc.c: New test.
* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust expected output.
* g++.dg/pch/operator-1.C: Likewise.
|
|
I've noticed the following testcase hangs during gimplification.
While it is gimplifying an assignment from a VAR_DECL .LCNNN to MEM_REF,
because the VAR_DECL is TREE_READONLY, it will happily pick its initializer
and try to gimplify that, which means recursing to the exact same code.
The following patch fixes that by just gimplifying the lhs and building
assignment, because the code decided that it should use copying from
a static var.
2024-10-16 Jakub Jelinek <jakub@redhat.com>
* gimplify.cc (gimplify_init_ctor_eval): For larger RAW_DATA_CST,
just gimplify cref as lvalue and add gimple assignment of rctor
to cref instead of going through gimplification of INIT_EXPR, as
the latter can suffer from infinite recursion.
* c-c++-common/cpp/embed-24.c: New test.
|
|
This patch actually optimizes #embed, so far in C.
For a simple testcase (for 494447200 bytes long cc1plus):
cat embed-11.c
unsigned char a[] = {
#embed "cc1plus"
};
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c
real 0m13.647s
user 0m7.157s
sys 0m2.597s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c
real 0m28.649s
user 0m26.653s
sys 0m1.958s
and when configured against binutils with .base64 support
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c
real 0m4.283s
user 0m2.288s
sys 0m0.859s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c
real 0m6.888s
user 0m5.876s
sys 0m1.002s
(all times with --enable-checking=yes,rtl,extra compiler).
Even just
./cc1plus -E -o embed-11.i embed-11.c
(which doesn't have this optimization yet and so preprocesses it as
1.3GB preprocessed file) needed almost 25GB of compile time RAM (but
preprocessed fine).
And compiling that embed-11.i with -std=c23 -O0 by unpatched gcc
I gave up after 400 seconds when it already ate 45GB of RAM and didn't
produce a single byte into embed-11.s yet.
The patch introduces a new CPP_EMBED token which contains raw memory image
virtually representing a sequence of int literals.
To simplify the parsing complexities, the preprocessor guarantees CPP_EMBED
is only emitted if there are 4+ (it actually does that for 64+ right now)
literals in the sequence and emits CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA
CPP_NUMBER tokens (with more CPP_EMBED separated by CPP_COMMA if it is
longer than 2GB, as STRING_CSTs in GCC and also the new RAW_DATA_CST etc.
are limited to INT_MAX elements). The main reason is that the preprocessor
doesn't really know in which context #embed directive appears, there could
be e.g.
{ 25 *
#embed "whatever"
* 2 - 15 }
or similar and dealing with this special case deep in the expression parsing
is undesirable.
With the CPP_NUMBERs around it, I believe in the C FE the only places which
need handling of the CPP_EMBED token are initializer parsing (that is the
only one which adds actual optimizations for it), comma expressions (I
believe nothing really cares whether it is 25,13,95 or
25,13,0,1,2,3,4,5,6,7,8,9,10,13,95 etc., so besides the 2 outer CPP_NUMBER
the parsing just adds one INTEGER_CST to the comma expression, I doubt users
want to be spammed with millions of -Wunused warnings per #embed),
whatever uses c_parser_expr_list (function calls, attribute arguments,
OpenMP sizes clause argument, OpenACC tile clause argument and whatever uses
c_parser_get_builtin_args (mainly for __builtin_shufflevector). Please correct
me if I'm wrong.
The patch introduces a RAW_DATA_CST tree code, which can then be used inside
of array CONSTRUCTOR elt values. In some sense RAW_DATA_CST is similar to
STRING_CST, but right now STRING_CST is used only if the whole array
initializer is that constant, while RAW_DATA_CST at index idx (should be
always INTEGER_CST index, another advantage of the CPP_NUMBER around is that
[30 ... 250] =
#embed "whatever"
really does what it would do with a integer sequence there) stands for
[idx] = RAW_DATA_POINTER (val)[0],
[idx+1] = RAW_DATA_POINTER (val)[1],
...
[idx+RAW_DATA_LENGTH (val)-1] = RAW_DATA_POINTER (val)[RAW_DATA_LENGTH (val)-1].
Another important thing is that unlike STRING_CST which has the data
embedded in it RAW_DATA_CST doesn't own the data, it has RAW_DATA_OWNER
which owns the data (that can be a STRING_CST, e.g. used for PCH or LTO
after reading LTO in) or another RAW_DATA_CST (with NULL RAW_DATA_OWNER,
standing for data owned by libcpp buffers). The advantage is that it can be
cheaply peeled off, or split into multiple smaller pieces, e.g. if one uses
designated initializer to store something into the middle of a 10GB #embed
array, in no case we need to actually copy data around for that.
Right now RAW_DATA_CST is only used in initializers of integral arrays where
the integer type has (host) CHAR_BIT precision, so usually char/signed
char/unsigned char (for C++ later maybe std::byte); in theory we could say
allocate 4 times as big buffer for conversions to int array and depending
on endianity and storage order reversal etc., but I'm not sure if that is
something that will be actually needed in the wild.
And an optimization inside of c-common.cc attempts to undo that CPP_NUMBER
CPP_EMBED CPP_NUMBER division in case one uses #embed the usual way and
doesn't use the boundary literals in weird ways and the values there match
the surrounding bytes in the owner buffer.
For LTO, in order to avoid copying perhaps gigabytes long data around,
the hacks in the streamer out/in cause the data owned by libcpp to be
streamed right into the stream and streamed back as a STRING_CST which
owns the data.
2024-10-16 Jakub Jelinek <jakub@redhat.com>
libcpp/
* include/cpplib.h (TTYPE_TABLE): Add CPP_EMBED token type.
* files.cc (finish_embed): For limit >= 64 and C preprocessing
instead of emitting CPP_NUMBER CPP_COMMA separated sequence for the
whole embed emit it just for the first and last byte and in between
emit a CPP_EMBED token or tokens if too large.
gcc/
* treestruct.def (TS_RAW_DATA_CST): New.
* tree.def (RAW_DATA_CST): New tree code.
* tree-core.h (struct tree_raw_data): New type.
(union tree_node): Add raw_data_cst member.
* tree.h (RAW_DATA_LENGTH, RAW_DATA_POINTER, RAW_DATA_OWNER): Define.
(gt_ggc_mx, gt_pch_nx): Declare overloads for tree_raw_data *.
* tree.cc (tree_node_structure_for_code): Handle RAW_DATA_CST.
(initialize_tree_contains_struct): Handle TS_RAW_DATA_CST.
(tree_code_size): Handle RAW_DATA_CST.
(initializer_zerop): Likewise.
(gt_ggc_mx, gt_pch_nx): Define overloads for tree_raw_data *.
* gimplify.cc (gimplify_init_ctor_eval): Handle RAW_DATA_CST.
* fold-const.cc (operand_compare::operand_equal_p): Handle
RAW_DATA_CST. Formatting fix.
(operand_compare::hash_operand): Handle RAW_DATA_CST.
(native_encode_initializer): Likewise.
(get_array_ctor_element_at_index): Likewise.
(fold): Likewise.
* gimple-fold.cc (fold_array_ctor_reference): Likewise. Formatting
fix.
* varasm.cc (const_hash_1): Handle RAW_DATA_CST.
(initializer_constant_valid_p_1): Likewise.
(array_size_for_constructor): Likewise.
(output_constructor_regular_field): Likewise.
* expr.cc (categorize_ctor_elements_1): Likewise.
(expand_expr_real_1) <case ARRAY_REF>: Punt for RAW_DATA_CST.
* tree-streamer.cc (streamer_check_handled_ts_structures): Mark
TS_RAW_DATA_CST as handled.
* tree-streamer-in.cc (streamer_alloc_tree): Handle RAW_DATA_CST.
(lto_input_ts_raw_data_cst_tree_pointers): New function.
(streamer_read_tree_body): Call it for RAW_DATA_CST.
* tree-streamer-out.cc (write_ts_raw_data_cst_tree_pointers): New
function.
(streamer_write_tree_body): Call it for RAW_DATA_CST.
(streamer_write_tree_header): Handle RAW_DATA_CST.
* lto-streamer-out.cc (DFS::DFS_write_tree_body): Handle RAW_DATA_CST.
* tree-pretty-print.cc (dump_generic_node): Likewise.
gcc/c-family/
* c-ppoutput.cc (token_streamer::stream): Add special code to spell
CPP_EMBED token.
* c-lex.cc (c_lex_with_flags): Handle CPP_EMBED. Formatting fix.
* c-common.cc (c_parse_error): Handle CPP_EMBED.
(braced_list_to_string): Optimize RAW_DATA_CST surrounded by
INTEGER_CSTs which match some bytes before or after RAW_DATA_CST in
its owner.
gcc/c/
* c-parser.cc (c_parser_braced_init): Handle CPP_EMBED.
(c_parser_get_builtin_args): Likewise.
(c_parser_expression): Likewise.
(c_parser_expr_list): Likewise.
* c-typeck.cc (digest_init): Handle RAW_DATA_CST. Formatting fix.
(init_node_successor): New function.
(add_pending_init): Handle RAW_DATA_CST.
(set_nonincremental_init): Formatting fix.
(output_init_element): Handle RAW_DATA_CST. Formatting fixes.
(maybe_split_raw_data): New function.
(process_init_element): Use maybe_split_raw_data. Handle
RAW_DATA_CST.
gcc/testsuite/
* c-c++-common/cpp/embed-20.c: New test.
* c-c++-common/cpp/embed-21.c: New test.
* c-c++-common/cpp/embed-28.c: New test.
* gcc.dg/cpp/embed-8.c: New test.
* gcc.dg/cpp/embed-9.c: New test.
* gcc.dg/cpp/embed-10.c: New test.
* gcc.dg/cpp/embed-11.c: New test.
* gcc.dg/cpp/embed-12.c: New test.
* gcc.dg/cpp/embed-13.c: New test.
* gcc.dg/cpp/embed-14.c: New test.
* gcc.dg/cpp/embed-15.c: New test.
* gcc.dg/cpp/embed-16.c: New test.
* gcc.dg/pch/embed-1.c: New test.
* gcc.dg/pch/embed-1.hs: New test.
* gcc.dg/lto/embed-1_0.c: New test.
* gcc.dg/lto/embed-1_1.c: New test.
|
|
Trailing blanks is something even git diff diagnoses; while it is a coding
style issue, if it is so common that git diff diagnoses it, I think it could
be useful to various projects to check that at compile time.
Dunno if it should be included in -Wextra, currently it isn't, and due to
tons of trailing whitespace in our sources, haven't enabled it for when
building gcc itself either.
Note, git diff also diagnoses indentation with tab following space, wonder
if we couldn't have trivial warning options where one would simply ask for
checking of indentation with no tabs, just spaces vs. indentation with
tabs followed by spaces (but never tab width or more spaces in the
indentation). I think that would be easy to do also on the libcpp side.
Checking how much something should be exactly indented requires syntax
analysis (at least some limited one) and can consider columns of first token
on line, but what the exact indentation blanks were is something only libcpp
knows.
On Thu, Sep 19, 2024 at 08:17:24AM +0200, Richard Biener wrote:
> Generally I like diagnosing this early. For the above I'd say -Wtrailing-whitespace=
> with a set of things to diagnose (and a sane default - just spaces and tabs - for
> -Wtrailiing-whitespace) would be nice. As for naming possibly follow the
> is{space,blank,cntrl} character classifications? If those are a good
> fit, that is.
The patch currently allows blank (' ' '\t') and space (' ' '\t' '\f' '\v'),
cntrl not yet added, not anything non-ASCII, but in theory could
be added later (though, non-ASCII would be just for inside of comments,
say non-breaking space etc. in the source is otherwise an error).
2024-10-15 Jakub Jelinek <jakub@redhat.com>
libcpp/
* include/cpplib.h (struct cpp_options): Add
cpp_warn_trailing_whitespace member.
(enum cpp_warning_reason): Add CPP_W_TRAILING_WHITESPACE.
* internal.h (struct _cpp_line_note): Document 'W' line note.
* lex.cc (_cpp_clean_line): Add 'W' line note for trailing whitespace
except for trailing whitespace after backslash. Formatting fix.
(_cpp_process_line_notes): Emit -Wtrailing-whitespace diagnostics.
Formatting fixes.
(lex_raw_string): Clear type on 'W' notes.
gcc/
* doc/invoke.texi (Wtrailing-whitespace): Document.
gcc/c-family/
* c.opt (Wtrailing-whitespace=): New option.
(Wtrailing-whitespace): New alias.
* c.opt.urls: Regenerate.
gcc/testsuite/
* c-c++-common/cpp/Wtrailing-whitespace-1.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-2.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-3.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-4.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-5.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-6.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-7.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-8.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-9.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-10.c: New test.
|
|
_Pragma("GCC system_header") currently takes effect only partially. It does
succeed in updating the line_map, so that checks like in_system_header_at()
return correctly, but it does not update pfile->buffer->sysp. One result is
that a subsequent #include does not set up the system header state properly
for the newly included file, as pointed out in the PR. Fix by propagating
the new system header state back to the buffer after processing the pragma.
libcpp/ChangeLog:
PR preprocessor/114436
* directives.cc (destringize_and_run): If the _Pragma changed the
buffer system header state (e.g. because it was "GCC
system_header"), propagate that change back to the actual buffer
too.
gcc/testsuite/ChangeLog:
PR preprocessor/114436
* c-c++-common/cpp/pragma-system-header-1.h: New test.
* c-c++-common/cpp/pragma-system-header-2.h: New test.
* c-c++-common/cpp/pragma-system-header.c: New test.
|
|
The implementation of #pragma push_macro and #pragma pop_macro has to date
made use of an ad-hoc function, _cpp_lex_identifier(), which lexes an
identifier out of a string. When support was added for extended characters
in identifiers ($, UCNs, or UTF-8), that support was added only for the
"normal" way of lexing identifiers out of a cpp_buffer (_cpp_lex_direct) and
not for the ad-hoc way. Consequently, extended identifiers are not usable
with these pragmas.
The logic for lexing identifiers has become more complicated than it was
when _cpp_lex_identifier() was written -- it now handles things like \N{}
escapes in C++, for instance -- and it no longer seems practical to maintain
a redundant code path for lexing identifiers. Address the issue by changing
the implementation of #pragma {push,pop}_macro to lex identifiers in the
expected way, i.e. by pushing a cpp_buffer and lexing the identifier from
there.
The existing implementation has some quirks because of the ad-hoc parsing
logic. For example:
#pragma push_macro("X ")
...
#pragma pop_macro("X")
will not restore macro X (note the extra space in the first string). However:
#pragma push_macro("X ")
...
#pragma pop_macro("X ")
actually does sucessfully restore "X". This is because the key for looking
up the saved macro on the push stack is the original string passed, so the
string passed to pop_macro needs to match it exactly. It is not that easy to
reproduce this logic in the world of extended characters, given that for
example it should be valid to pass a UCN to push_macro, and the
corresponding UTF-8 to pop_macro. Given that this aspect of the existing
behavior seems unintentional and has no tests (and does not match other
implementations), I opted to make the new logic more straightforward. The
string passed needs to lex to one token, which must be a valid identifier,
or else no action is taken and no error is generated. Any diagnostics
encountered during lexing (e.g., due to a UTF-8 character not permitted to
appear in an identifier) are also suppressed.
It could be nice (for GCC 15) to also add a warning if a pop_macro does not
match a previous push_macro.
libcpp/ChangeLog:
PR preprocessor/109704
* include/cpplib.h (class cpp_auto_suppress_diagnostics): New class.
* errors.cc
(cpp_auto_suppress_diagnostics::cpp_auto_suppress_diagnostics): New
function.
(cpp_auto_suppress_diagnostics::~cpp_auto_suppress_diagnostics): New
function.
* charset.cc (noop_diagnostic_cb): Remove.
(cpp_interpret_string_ranges): Refactor diagnostic suppression logic
into new class cpp_auto_suppress_diagnostics.
(count_source_chars): Likewise.
* directives.cc (cpp_pop_definition): Add cpp_hashnode argument.
(lex_identifier_from_string): New static helper function.
(push_pop_macro_common): Refactor common logic from
do_pragma_push_macro and do_pragma_pop_macro; use
lex_identifier_from_string instead of _cpp_lex_identifier.
(do_pragma_push_macro): Reimplement using push_pop_macro_common.
(do_pragma_pop_macro): Likewise.
* internal.h (_cpp_lex_identifier): Remove.
* lex.cc (lex_identifier_intern): Remove.
(_cpp_lex_identifier): Remove.
gcc/testsuite/ChangeLog:
PR preprocessor/109704
* c-c++-common/cpp/pragma-push-pop-utf8.c: New test.
* g++.dg/pch/pushpop-2.C: New test.
* g++.dg/pch/pushpop-2.Hs: New test.
* gcc.dg/pch/pushpop-2.c: New test.
* gcc.dg/pch/pushpop-2.hs: New test.
|
|
When working on #embed support, or -Wheader-guard or other recent libcpp
changes, I've been annoyed by the libcpp diagnostics being visually
different from normal gcc diagnostics, especially in the area of quoting
stuff in the diagnostic messages.
Normall GCC diagnostics is gcc_diag/gcc_tdiag, one can use
%</%>, %qs etc. in there, while libcpp diagnostics was marked as printf
and in libcpp we've been very creative with quoting stuff, either
no quotes at all, or "something" quoting, or 'something' quoting, or
`something' quoting (but in none of the cases it used colors consistently
with the rest of the compiler).
Now, libcpp diagnostics is always emitted using a callback,
pfile->cb.diagnostic. On the gcc/ side, this callback is initialized with
genmatch.cc: cb->diagnostic = diagnostic_cb;
c-family/c-opts.cc: cb->diagnostic = c_cpp_diagnostic;
fortran/cpp.cc: cb->diagnostic = cb_cpp_diagnostic;
where the latter two just use diagnostic_report_diagnostic, so actually
support all the gcc_diag stuff, only the genmatch.cc case didn't.
So, the following patch changes genmatch.cc to use pp_format* instead
of vfprintf so that it supports the gcc_diag formatting (pretty-print.o
unfortunately has various dependencies, so had to link genmatch with
libcommon.a libbacktrace.a and tweak Makefile.in so that there are no
circular dependencies) and marks the libcpp diagnostic routines as
gcc_diag rather than printf. That change resulted in hundreds of
-Wformat-diag new warnings (most of them useful and resulting IMHO in
better diagnostics), so the rest of the patch is changing the format
strings to make -Wformat-diag happy and adjusting the testsuite for
the differences in how is the diagnostic reformatted.
Dunno if some out of GCC tree projects use libcpp, that case would
make it harder because one couldn't use vfprintf in the diagnostic
callback anymore, but there is always David's libdiagnostic which could
be used for that purpose IMHO.
2024-10-12 Jakub Jelinek <jakub@redhat.com>
libcpp/
* include/cpplib.h (ATTRIBUTE_CPP_PPDIAG): Define.
(struct cpp_callbacks): Use ATTRIBUTE_CPP_PPDIAG instead of
ATTRIBUTE_FPTR_PRINTF on diagnostic callback.
(cpp_error, cpp_warning, cpp_pedwarning, cpp_warning_syshdr): Use
ATTRIBUTE_CPP_PPDIAG (3, 4) instead of ATTRIBUTE_PRINTF_3.
(cpp_warning_at, cpp_pedwarning_at): Use ATTRIBUTE_CPP_PPDIAG (4, 5)
instead of ATTRIBUTE_PRINTF_4.
(cpp_error_with_line, cpp_warning_with_line, cpp_pedwarning_with_line,
cpp_warning_with_line_syshdr): Use ATTRIBUTE_CPP_PPDIAG (5, 6)
instead of ATTRIBUTE_PRINTF_5.
(cpp_error_at): Use ATTRIBUTE_CPP_PPDIAG (4, 5) instead of
ATTRIBUTE_PRINTF_4.
* Makefile.in (po/$(PACKAGE).pot): Use --language=GCC-source rather
than --language=c.
* errors.cc (cpp_diagnostic_at, cpp_diagnostic,
cpp_diagnostic_with_line): Use ATTRIBUTE_CPP_PPDIAG instead of
-ATTRIBUTE_FPTR_PRINTF.
* charset.cc (cpp_host_to_exec_charset, _cpp_valid_ucn, convert_hex,
convert_oct, convert_escape): Fix up -Wformat-diag warnings.
(cpp_interpret_string_ranges, count_source_chars): Use
ATTRIBUTE_CPP_PPDIAG instead of ATTRIBUTE_FPTR_PRINTF.
(narrow_str_to_charconst): Fix up -Wformat-diag warnings.
* directives.cc (check_eol_1, directive_diagnostics, lex_macro_node,
do_undef, glue_header_name, parse_include, do_include_common,
do_include_next, _cpp_parse_embed_params, do_embed, read_flag,
do_line, do_linemarker, register_pragma_1, do_pragma_once,
do_pragma_push_macro, do_pragma_pop_macro, do_pragma_poison,
do_pragma_system_header, do_pragma_warning_or_error, _cpp_do__Pragma,
do_else, do_elif, do_endif, parse_answer, do_assert,
cpp_define_unused): Likewise.
* expr.cc (cpp_classify_number, parse_defined, eval_token,
_cpp_parse_expr, reduce, check_promotion): Likewise.
* files.cc (_cpp_find_file, finish_base64_embed,
_cpp_pop_file_buffer): Likewise.
* init.cc (sanity_checks): Likewise.
* lex.cc (_cpp_process_line_notes, maybe_warn_bidi_on_char,
_cpp_warn_invalid_utf8, _cpp_skip_block_comment,
warn_about_normalization, forms_identifier_p, maybe_va_opt_error,
identifier_diagnostics_on_lex, cpp_maybe_module_directive): Likewise.
* macro.cc (class vaopt_state, builtin_has_include_1,
builtin_has_include, builtin_has_embed, _cpp_warn_if_unused_macro,
_cpp_builtin_macro_text, builtin_macro, stringify_arg,
_cpp_arguments_ok, collect_args, enter_macro_context,
_cpp_save_parameter, parse_params, create_iso_definition,
_cpp_create_definition, check_trad_stringification): Likewise.
* pch.cc (cpp_valid_state): Likewise.
* traditional.cc (_cpp_scan_out_logical_line, recursive_macro):
Likewise.
gcc/
* Makefile.in (generated_files): Remove {gimple,generic}-match*.
(generated_match_files): New variable. Add a dependency of
$(filter-out $(OBJS-libcommon),$(ALL_HOST_OBJS)) files on those.
(build/genmatch$(build_exeext)): Depend on and link against
libcommon.a and $(LIBBACKTRACE).
* genmatch.cc: Include pretty-print.h and input.h.
(ggc_internal_cleared_alloc, ggc_free): Remove.
(fatal): New function.
(line_table): Remove.
(linemap_client_expand_location_to_spelling_point): Remove.
(diagnostic_cb): Use gcc_diag rather than printf format. Use
pp_format_verbatim on a temporary pretty_printer instead of
vfprintf.
(fatal_at, warning_at): Use gcc_diag rather than printf format.
(output_line_directive): Rename location_hash to loc_hash.
(parser::eat_ident, parser::parse_operation, parser::parse_expr,
parser::parse_pattern, parser::finish_match_operand): Fix up
-Wformat-diag warnings.
gcc/c-family/
* c-lex.cc (c_common_has_attribute,
c_common_lex_availability_macro): Fix up -Wformat-diag warnings.
gcc/testsuite/
* c-c++-common/cpp/counter-2.c: Adjust expected diagnostics for
libcpp diagnostic formatting changes.
* c-c++-common/cpp/embed-3.c: Likewise.
* c-c++-common/cpp/embed-4.c: Likewise.
* c-c++-common/cpp/embed-16.c: Likewise.
* c-c++-common/cpp/embed-18.c: Likewise.
* c-c++-common/cpp/eof-2.c: Likewise.
* c-c++-common/cpp/eof-3.c: Likewise.
* c-c++-common/cpp/fmax-include-depth.c: Likewise.
* c-c++-common/cpp/has-builtin.c: Likewise.
* c-c++-common/cpp/line-2.c: Likewise.
* c-c++-common/cpp/line-3.c: Likewise.
* c-c++-common/cpp/macro-arg-count-1.c: Likewise.
* c-c++-common/cpp/macro-arg-count-2.c: Likewise.
* c-c++-common/cpp/macro-ranges.c: Likewise.
* c-c++-common/cpp/named-universal-char-escape-4.c: Likewise.
* c-c++-common/cpp/named-universal-char-escape-5.c: Likewise.
* c-c++-common/cpp/pr88974.c: Likewise.
* c-c++-common/cpp/va-opt-error.c: Likewise.
* c-c++-common/cpp/va-opt-pedantic.c: Likewise.
* c-c++-common/cpp/Wheader-guard-2.c: Likewise.
* c-c++-common/cpp/Wheader-guard-3.c: Likewise.
* c-c++-common/cpp/Winvalid-utf8-1.c: Likewise.
* c-c++-common/cpp/Winvalid-utf8-2.c: Likewise.
* c-c++-common/cpp/Winvalid-utf8-3.c: Likewise.
* c-c++-common/diagnostic-format-sarif-file-bad-utf8-pr109098-1.c:
Likewise.
* c-c++-common/diagnostic-format-sarif-file-bad-utf8-pr109098-3.c:
Likewise.
* c-c++-common/pr68833-3.c: Likewise.
* c-c++-common/raw-string-directive-1.c: Likewise.
* gcc.dg/analyzer/named-constants-Wunused-macros.c: Likewise.
* gcc.dg/binary-constants-4.c: Likewise.
* gcc.dg/builtin-redefine.c: Likewise.
* gcc.dg/cpp/19951025-1.c: Likewise.
* gcc.dg/cpp/c11-warning-1.c: Likewise.
* gcc.dg/cpp/c11-warning-2.c: Likewise.
* gcc.dg/cpp/c11-warning-3.c: Likewise.
* gcc.dg/cpp/c23-elifdef-2.c: Likewise.
* gcc.dg/cpp/c23-warning-2.c: Likewise.
* gcc.dg/cpp/embed-2.c: Likewise.
* gcc.dg/cpp/embed-3.c: Likewise.
* gcc.dg/cpp/embed-4.c: Likewise.
* gcc.dg/cpp/expr.c: Likewise.
* gcc.dg/cpp/gnu11-elifdef-2.c: Likewise.
* gcc.dg/cpp/gnu11-elifdef-3.c: Likewise.
* gcc.dg/cpp/gnu11-elifdef-4.c: Likewise.
* gcc.dg/cpp/gnu11-warning-1.c: Likewise.
* gcc.dg/cpp/gnu11-warning-2.c: Likewise.
* gcc.dg/cpp/gnu11-warning-3.c: Likewise.
* gcc.dg/cpp/gnu23-warning-2.c: Likewise.
* gcc.dg/cpp/include6.c: Likewise.
* gcc.dg/cpp/pr35322.c: Likewise.
* gcc.dg/cpp/tr-warn6.c: Likewise.
* gcc.dg/cpp/undef2.c: Likewise.
* gcc.dg/cpp/warn-comments.c: Likewise.
* gcc.dg/cpp/warn-comments-2.c: Likewise.
* gcc.dg/cpp/warn-comments-3.c: Likewise.
* gcc.dg/cpp/warn-cxx-compat.c: Likewise.
* gcc.dg/cpp/warn-cxx-compat-2.c: Likewise.
* gcc.dg/cpp/warn-deprecated.c: Likewise.
* gcc.dg/cpp/warn-deprecated-2.c: Likewise.
* gcc.dg/cpp/warn-long-long.c: Likewise.
* gcc.dg/cpp/warn-long-long-2.c: Likewise.
* gcc.dg/cpp/warn-normalized-1.c: Likewise.
* gcc.dg/cpp/warn-normalized-2.c: Likewise.
* gcc.dg/cpp/warn-normalized-3.c: Likewise.
* gcc.dg/cpp/warn-normalized-4-bytes.c: Likewise.
* gcc.dg/cpp/warn-normalized-4-unicode.c: Likewise.
* gcc.dg/cpp/warn-redefined.c: Likewise.
* gcc.dg/cpp/warn-redefined-2.c: Likewise.
* gcc.dg/cpp/warn-traditional.c: Likewise.
* gcc.dg/cpp/warn-traditional-2.c: Likewise.
* gcc.dg/cpp/warn-trigraphs-1.c: Likewise.
* gcc.dg/cpp/warn-trigraphs-2.c: Likewise.
* gcc.dg/cpp/warn-trigraphs-3.c: Likewise.
* gcc.dg/cpp/warn-trigraphs-4.c: Likewise.
* gcc.dg/cpp/warn-undef.c: Likewise.
* gcc.dg/cpp/warn-undef-2.c: Likewise.
* gcc.dg/cpp/warn-unused-macros.c: Likewise.
* gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
* gcc.dg/pch/counter-2.c: Likewise.
* g++.dg/cpp0x/udlit-error1.C: Likewise.
* g++.dg/cpp23/named-universal-char-escape1.C: Likewise.
* g++.dg/cpp23/named-universal-char-escape2.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-1.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-2.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-3.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-4.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-5.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-6.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-7.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-8.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-9.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-10.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-11.C: Likewise.
* g++.dg/cpp23/Winvalid-utf8-12.C: Likewise.
* g++.dg/cpp/elifdef-3.C: Likewise.
* g++.dg/cpp/elifdef-5.C: Likewise.
* g++.dg/cpp/elifdef-6.C: Likewise.
* g++.dg/cpp/elifdef-7.C: Likewise.
* g++.dg/cpp/embed-1.C: Likewise.
* g++.dg/cpp/embed-2.C: Likewise.
* g++.dg/cpp/pedantic-errors.C: Likewise.
* g++.dg/cpp/warning-1.C: Likewise.
* g++.dg/cpp/warning-2.C: Likewise.
* g++.dg/ext/bitint1.C: Likewise.
* g++.dg/ext/bitint2.C: Likewise.
|
|
It is autumn again and there is a new Unicode version 16.0.
The following patch updates our Unicode stuff in contrib, libcpp and
libstdc++ from that Unicode version.
2024-10-08 Jakub Jelinek <jakub@redhat.com>
contrib/
* unicode/README: Update glibc git commit hash, replace
Unicode 15 or 15.1 versions with 16.
* unicode/gen_libstdcxx_unicode_data.py: Use 160000 instead of
150100 in _GLIBCXX_GET_UNICODE_DATA test.
* unicode/from_glibc/utf8_gen.py: Updated from glibc
064c708c78cc2a6b5802dce73108fc0c1c6bfc80 commit.
* unicode/DerivedCoreProperties.txt: Updated from Unicode 16.0.
* unicode/emoji-data.txt: Likewise.
* unicode/PropList.txt: Likewise.
* unicode/GraphemeBreakProperty.txt: Likewise.
* unicode/DerivedNormalizationProps.txt: Likewise.
* unicode/NameAliases.txt: Likewise.
* unicode/UnicodeData.txt: Likewise.
* unicode/EastAsianWidth.txt: Likewise.
gcc/testsuite/
* c-c++-common/cpp/named-universal-char-escape-1.c: Add tests
for some Unicode 16.0 characters, both normal and generated.
libcpp/
* makeucnid.cc (write_copyright): Update Unicode Copyright years.
* makeuname2c.cc (generated_ranges): Adjust Unicode version from 15.1
to 16.0. Add EGYPTIAN HIEROGLYPH- generated range, adjust indexes in
following entries.
(write_copyright): Update Unicode Copyright years.
* generated_cpp_wcwidth.h: Regenerated.
* ucnid.h: Regenerated.
* uname2c.h: Regenerated.
libstdc++-v3/
* include/bits/unicode.h (std::__unicode::__v15_1_0): Rename inline
namespace to ...
(std::__unicode::__v16_0_0): ... this.
(_GLIBCXX_GET_UNICODE_DATA): Change from 150100 to 160000.
* include/bits/unicode-data.h: Regenerated.
* testsuite/ext/unicode/properties.cc: Check for _Gcb_SpacingMark
on U+11F03 rather than U+1D16D as the latter lost SpacingMark property
in Unicode 16.0.
|
|
The following patch implements the clang -Wheader-guard warning, which warns
if a valid multiple inclusion header guard's #ifndef/#if !defined directive
is immediately (no other non-line directives nor other (non-comment)
tokens in between) followed by #define directive for some different macro,
which in get_suggestion rules is close enough to the actual header guard
macro (i.e. likely misspelling), the #define is object-like with empty
definition (I've followed what clang implements) and the macro isn't defined
later on (at least not on the final #endif at the end of a header).
In this case it emits a warning, so that
#ifndef STDIO_H
#define STDOI_H
...
#endif
or similar misspellings can be caught.
clang enables this warning by default, but I've put it into -Wall instead
as it still seems to be a style warning, nothing more severe; if a header
doesn't survive multiple inclusion because of the misspelling, users will
get different diagnostics.
2024-10-02 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/96842
libcpp/
* include/cpplib.h (struct cpp_options): Add warn_header_guard member.
(enum cpp_warning_reason): Add CPP_W_HEADER_GUARD enumerator.
* internal.h (struct cpp_reader): Add mi_def_cmacro, mi_loc and
mi_def_loc members.
(_cpp_defined_macro_p): Constify type pointed by argument type.
Formatting fix.
* init.cc (cpp_create_reader): Clear
CPP_OPTION (pfile, warn_header_guard).
* directives.cc (struct if_stack): Add def_loc and mi_def_cmacro
members.
(DIRECTIVE_TABLE): Add IF_COND flag to define.
(do_define): Set ifs->mi_def_cmacro on a define immediately following
#ifndef directive for the guard. Clear pfile->mi_valid. Formatting
fix.
(do_endif): Copy over pfile->mi_def_cmacro and pfile->mi_def_loc
if ifs->mi_def_cmacro is set and pfile->mi_cmacro isn't a defined
macro.
(push_conditional): Clear mi_def_cmacro and mi_def_loc members.
* files.cc (_cpp_pop_file_buffer): Emit -Wheader-guard diagnostics.
gcc/
* doc/invoke.texi (Wheader-guard): Document.
gcc/c-family/
* c.opt (Wheader-guard): New option.
* c.opt.urls: Regenerated.
* c-ppoutput.cc (init_pp_output): Initialize also cb->get_suggestion.
gcc/testsuite/
* c-c++-common/cpp/Wheader-guard-1.c: New test.
* c-c++-common/cpp/Wheader-guard-1-1.h: New test.
* c-c++-common/cpp/Wheader-guard-1-2.h: New test.
* c-c++-common/cpp/Wheader-guard-1-3.h: New test.
* c-c++-common/cpp/Wheader-guard-1-4.h: New test.
* c-c++-common/cpp/Wheader-guard-1-5.h: New test.
* c-c++-common/cpp/Wheader-guard-1-6.h: New test.
* c-c++-common/cpp/Wheader-guard-1-7.h: New test.
* c-c++-common/cpp/Wheader-guard-1-8.h: New test.
* c-c++-common/cpp/Wheader-guard-1-9.h: New test.
* c-c++-common/cpp/Wheader-guard-1-10.h: New test.
* c-c++-common/cpp/Wheader-guard-1-11.h: New test.
* c-c++-common/cpp/Wheader-guard-1-12.h: New test.
* c-c++-common/cpp/Wheader-guard-2.c: New test.
* c-c++-common/cpp/Wheader-guard-2.h: New test.
* c-c++-common/cpp/Wheader-guard-3.c: New test.
* c-c++-common/cpp/Wheader-guard-3.h: New test.
|
|
This patch which adds another #embed extension, gnu::base64.
As mentioned in the documentation, this extension is primarily
intended for use by the preprocessor, so that for the larger (say 32+ or
64+ bytes long embeds it doesn't have to emit tens of thousands or
millions of comma separated string literals which would be very expensive
to parse again, but can emit
#embed "." __gnu__::__base64__( \
"Tm9uIGVyYW0gbsOpc2NpdXMsIEJydXRlLCBjdW0sIHF1w6Ygc3VtbWlzIGluZ8OpbmlpcyBleHF1" \
"aXNpdMOhcXVlIGRvY3Ryw61uYSBwaGlsw7Nzb3BoaSBHcsOmY28gc2VybcOzbmUgdHJhY3RhdsOt" \
"c3NlbnQsIGVhIExhdMOtbmlzIGzDrXR0ZXJpcyBtYW5kYXLDqW11cywgZm9yZSB1dCBoaWMgbm9z" \
"dGVyIGxhYm9yIGluIHbDoXJpYXMgcmVwcmVoZW5zacOzbmVzIGluY8O6cnJlcmV0LiBuYW0gcXVp" \
"YsO6c2RhbSwgZXQgaWlzIHF1aWRlbSBub24gw6FkbW9kdW0gaW5kw7NjdGlzLCB0b3R1bSBob2Mg" \
"ZMOtc3BsaWNldCBwaGlsb3NvcGjDoXJpLiBxdWlkYW0gYXV0ZW0gbm9uIHRhbSBpZCByZXByZWjD" \
"qW5kdW50LCBzaSByZW3DrXNzaXVzIGFnw6F0dXIsIHNlZCB0YW50dW0gc3TDumRpdW0gdGFtcXVl" \
"IG11bHRhbSDDs3BlcmFtIHBvbsOpbmRhbSBpbiBlbyBub24gYXJiaXRyw6FudHVyLiBlcnVudCDD" \
"qXRpYW0sIGV0IGlpIHF1aWRlbSBlcnVkw610aSBHcsOmY2lzIGzDrXR0ZXJpcywgY29udGVtbsOp" \
"bnRlcyBMYXTDrW5hcywgcXVpIHNlIGRpY2FudCBpbiBHcsOmY2lzIGxlZ8OpbmRpcyDDs3BlcmFt" \
"IG1hbGxlIGNvbnPDum1lcmUuIHBvc3Ryw6ltbyDDoWxpcXVvcyBmdXTDunJvcyBzw7pzcGljb3Is" \
"IHF1aSBtZSBhZCDDoWxpYXMgbMOtdHRlcmFzIHZvY2VudCwgZ2VudXMgaG9jIHNjcmliw6luZGks" \
"IGV0c2kgc2l0IGVsw6lnYW5zLCBwZXJzw7Nuw6YgdGFtZW4gZXQgZGlnbml0w6F0aXMgZXNzZSBu" \
"ZWdlbnQu")
with the meaning don't actually load some file, instead base64 decode
(RFC4648 with A-Za-z0-9+/ chars and = padding, no newlines in between)
the string and use that as data. This is chosen because it should be
-pedantic-errors clean, fairly cheap to decode and then in optimizing
compiler could be handled as similar binary blob to normal #embed,
while the data isn't left somewhere on the disk, so distcc/ccache etc.
can move the preprocessed source without issues.
It makes no sense to support limit and gnu::offset parameters together
with it IMHO, why would somebody waste providing full data and then
threw some away? prefix/suffix/if_empty are normally supported though,
but not intended to be used by the preprocessor.
This patch adds just the extension side, not the actual emitting of this
during -E or -E -fdirectives-only for now, that will be included in the
upcoming patch.
Compared to the earlier posted version of this extension, this patch
allows the string concatenation in the parameter argument (but still
doesn't allow escapes in the string, why would anyone use them when
only A-Za-z0-9+/= are valid). The patch also adds support for parsing
this even in -fpreprocessed compilation.
2024-09-12 Jakub Jelinek <jakub@redhat.com>
libcpp/
* internal.h (struct cpp_embed_params): Add base64 member.
(_cpp_free_embed_params_tokens): Declare.
* directives.cc (DIRECTIVE_TABLE): Add IN_I flag to T_EMBED.
(save_token_for_embed, _cpp_free_embed_params_tokens): New functions.
(EMBED_PARAMS): Add gnu::base64 entry.
(_cpp_parse_embed_params): Parse gnu::base64 parameter. If
-fpreprocessed without -fdirectives-only, require #embed to have
gnu::base64 parameter. Diagnose conflict between gnu::base64 and
limit or gnu::offset parameters.
(do_embed): Use _cpp_free_embed_params_tokens.
* files.cc (finish_embed, base64_dec_fn): New functions.
(base64_dec): New array.
(B64D0, B64D1, B64D2, B64D3): Define.
(finish_base64_embed): New function.
(_cpp_stack_embed): Use finish_embed. Handle params->base64
using finish_base64_embed.
* macro.cc (builtin_has_embed): Call _cpp_free_embed_params_tokens.
gcc/
* doc/cpp.texi (Binary Resource Inclusion): Document gnu::base64
parameter.
gcc/testsuite/
* c-c++-common/cpp/embed-17.c: New test.
* c-c++-common/cpp/embed-18.c: New test.
* c-c++-common/cpp/embed-19.c: New test.
* c-c++-common/cpp/embed-27.c: New test.
* gcc.dg/cpp/embed-6.c: New test.
* gcc.dg/cpp/embed-7.c: New test.
|
|
The following patch adds on top of the just posted #embed patch
a first extension, gnu::offset which allows to seek in the data
file (for seekable files, otherwise read and throw away).
I think this is useful e.g. when some binary data start with
some well known header which shouldn't be included in the data etc.
2024-09-12 Jakub Jelinek <jakub@redhat.com>
libcpp/
* internal.h (struct cpp_embed_params): Add offset member.
* directives.cc (EMBED_PARAMS): Add gnu::offset entry.
(enum embed_param_kind): Add NUM_EMBED_STD_PARAMS.
(_cpp_parse_embed_params): Use NUM_EMBED_STD_PARAMS rather than
NUM_EMBED_PARAMS when parsing standard parameters. Parse gnu::offset
parameter.
* files.cc (struct _cpp_file): Add offset member.
(_cpp_stack_embed): Handle params->offset.
gcc/
* doc/cpp.texi (Binary Resource Inclusion): Document gnu::offset
#embed parameter.
gcc/testsuite/
* c-c++-common/cpp/embed-15.c: New test.
* c-c++-common/cpp/embed-16.c: New test.
* gcc.dg/cpp/embed-5.c: New test.
|
|
The following patch implements the C23 N3017 "#embed - a scannable,
tooling-friendly binary resource inclusion mechanism" paper.
The implementation is intentionally dumb, in that it doesn't significantly
speed up compilation of larger initializers and doesn't make it possible
to use huge #embeds (like several gigabytes large, that is compile time
and memory still infeasible).
There are 2 reasons for this. One is that I think like it is implemented
now in the patch is how we should use it for the smaller #embed sizes,
dunno with which boundary, whether 32 bytes or 64 or something like that,
certainly handling the single byte cases which is something that can appear
anywhere in the source where constant integer literal can appear is
desirable and I think for a few bytes it isn't worth it to come up with
something smarter and users would like to e.g. see it in -E readably as
well (perhaps the slow vs. fast boundary should be determined by command
line option). And the other one is to be able to more easily find
regressions in behavior caused by the optimizations, so we have something
to get back in git to compare against.
I'm definitely willing to work on the optimizations (likely introduce a new
CPP_* token type to refer to a range of libcpp owned memory (start + size)
and similarly some tree which can do the same, and can be at any time e.g.
split into 2 subparts + say INTEGER_CST in between if needed say for
const unsigned char d[] = {
#embed "2GB.dat" prefix (0, 0, ) suffix (, [0x40000000] = 42)
}; still without having to copy around huge amounts of data; STRING_CST
owns the memory it points to and can be only 2GB in size), but would
like to do that incrementally.
And would like to first include some extensions also not included in
this patch, like gnu::offset (off) parameter to allow to skip certain
constant amount of bytes at the start of the files, plus
gnu::base64 ("base64_encoded_data") parameter to add something which can
store more efficiently large amounts of the #embed data in preprocessed
source.
I've been cross-checking all the tests also against the LLVM implementation
https://github.com/llvm/llvm-project/pull/68620
which has been for a few hours even committed to LLVM trunk but reverted
afterwards. LLVM now has the support committed and I admit I haven't
rechecked whether the behavior on the below mentioned spots have been fixed
in it already or not yet.
The patch uses --embed-dir= option that clang plans to add above and doesn't
use other variants on the search directories yet, plus there are no
default directories at least for the time being where to search for embed
files. So, #embed "..." works if it is found in the same directory (or
relative to the current file's directory) and #embed "/..." or #embed </...>
work always, but relative #embed <...> doesn't unless at least one
--embed-dir= is specified. There is no reason to differentiate between
system and non-system directories, so we don't need -isystem like
counterpart, perhaps -iquote like counterpart could be useful in the future,
dunno what else. It has --embed-directory=dir and --embed-directory dir
as aliases.
There are some differences beyond clang ICEs, so I'd like to point them out
to make sure there is agreement on the choices in the patch. They are also
mentioned in the comments of the llvm pull request.
The most important is that the GCC patch (as well as the original thephd.dev
LLVM branch on godbolt) expands #embed (or acts as if it is expanded) into
a mere sequence of numbers like 123,2,35,26 rather then what clang
effectively treats as (unsigned char)123,(unsigned char)2,(unsigned
char)35,(unsigned char)26 but only does that when using integrated
preprocessor, not when using -save-temps where it acts as GCC.
JeanHeyd as the original author agrees that is how it is currently worded in
C23.
Another difference (not tested in the testsuite, not sure how to check for
effective target /dev/urandom nor am sure it is desirable to check that
during testsuite) is how to treat character devices, named pipes etc.
(block devices are errored on). The original paper uses /dev/urandom
in various examples and seems to assume that unlike regular files the
devices aren't really cached, so
#embed </dev/urandom> limit(1) prefix(int a = ) suffix(;)
#embed </dev/urandom> limit(1) prefix(int b = ) suffix(;)
usually results in a != b. That is what the godbolt thephd.dev branch
implements too and what this patch does as well, but clang actually seems
to just go from st.st_size == 0, ergo it must be zero-sized resource and
so just copies over if_empty if present. It is really questionable
what to do about the character devices/named pipes with __has_embed, for
regular files the patch doesn't read anything from them, relies on
st.st_size + limit for whether it is empty or non-empty. But I don't know
of a way to check if read on say a character device would read anything
or not (the </dev/null> limit (1) vs. </dev/zero> limit (1) cases), and
if we read something, that would be better cached for later because
#embed later if it reads again could read no further data even when it
first read something. So, the patch currently for __has_embed just
always returns 2 on the non-regular files, like the thephd.dev
branch does as well and like the clang pull request as well.
A question is also what to do for gnu::offset on the non-regular files
even for #embed, those aren't seekable and do we want to just read and throw
away the offset bytes each time we see it used?
clang also chokes on the
#if __has_embed (__FILE__ __limit__ (1) __prefix__ () suffix (1 / 0) \
__if_empty__ ((({{[0[0{0{0(0(0)1)1}1}]]}})))) != __STDC_EMBED_FOUND__
#error "__has_embed fail"
#endif
in embed-1.c, but thephd.dev branch accepts it and I don't see why
it shouldn't, (({{[0[0{0{0(0(0)1)1}1}]]}}))) is a balanced token
sequence and the file isn't empty, so it should just be parsed and
discarded.
clang also IMHO mishandles
const unsigned char w[] = {
#embed __FILE__ prefix([0] = 42, [15] =) limit(32)
};
but again only without -save-temps, seems like it
treats it as
[0] = 42, [15] = (99,111,110,115,116,32,117,110,115,105,103,110,101,100,
32,99,104,97,114,32,119,91,93,32,61,32,123,10,35,101,109,98)
rather than
[0] = 42, [15] = 99,111,110,115,116,32,117,110,115,105,103,110,101,100,
32,99,104,97,114,32,119,91,93,32,61,32,123,10,35,101,109,98
and warns on it for -Wunused-value and just compiles it as
[0] = 42, [15] = 98
And also
void foo (int, int, int, int);
void bar (void) { foo (
#embed __FILE__ limit (4) prefix (172 + ) suffix (+ 2)
); }
is treated as
172 + (118, 111, 105, 100) + 2
rather than
172 + 118, 111, 105, 100 + 2
which clang -save-temps or GCC treats it like, so results
in just one argument passed rather than 4.
if (!strstr ((const char *) magna_carta, "imprisonétur")) abort ();
in the testcase fails as well, but in that case calling it in gdb succeeds:
p ((char *(*)(char *, char *))__strstr_sse2) (magna_carta, "imprisonétur")
$2 = 0x555555558d3c <magna_carta+11564> "imprisonétur aut disseisiátur"...
so I guess they are just trying to constant evaluate strstr and do it
incorrectly.
They started with making the optimizations together in the initial patch
set, so they don't have the luxury to compare if it is just because of
the optimization they are trying to do or because that is how the
feature works for them. At least unless they use -save-temps for now.
There is also different behavior between clang and gcc on -M or other
dependency generating options. Seems clang includes the __has_embed
searched files in dependencies, while my patch doesn't. But so does
clang for __has_include and GCC doesn't. Emitting a hard dependency
on some header just because there was __has_include/__has_embed for it
seems wrong to me, because (at least when properly written) the source
likely doesn't mind if the file is missing, it will do something else,
so a hard error from make because of it doesn't seem right. Does
make have some weaker dependencies, such that if some file can be remade
it is but if it doesn't exist, it isn't fatal?
I wonder whether #embed <non-existent-file> really needs to be fatal
or whether we could simply after diagnosing it pretend the file exists
and is empty. For #include I think fatal errors make tons of sense,
but perhaps for #embed which is more localized we'd get better error
reporting if we didn't bail out immediately. Note, both GCC and clang
currently treat those as fatal errors.
clang also added -dE option which with -E instead of preprocessing
the #embed directives keeps them as is, but the preprocessed source
then isn't self-contained. That option looks more harmful than useful to
me.
Also, it isn't clear to me from C23 whether it is possible to have
__has_include/__has_c_attribute/__has_embed expressions inside of
the limit #embed/__has_embed argument.
6.10.3.2/2 says that defined should not appear there (and the patch
diagnoses it and testsuite tests), but for __has_include/__has_embed
etc. 6.10.1/11 says:
"The identifiers __has_include, __has_embed, and __has_c_attribute
shall not appear in any context not mentioned in this subclause."
If that subclause in that case means 6.10.1, then it presumably shouldn't
appear in #embed in 6.10.3, but __has_embed is in 6.10.1...
But 6.10.3.2/3 says that it should be parsed according to the 6.10.1
rules. Haven't included tests like
#if __has_embed (__FILE__ limit (__has_embed (__FILE__ limit (1))))
or
#embed __FILE__ limit (__has_include (__FILE__))
into the testsuite because of the doubts but I think the patch should
handle those right now.
The reason I've used Magna Carta text in some of the testcases is that
I hope it shouldn't be copyrighted after the centuries and I'd strongly
prefer not to have binary blobs in git after the xz backdoor lesson
and wanted something larger which doesn't change all the time.
Oh, BTW, I see in C23 draft 6.10.3.2 in Example 4
if (f_source == NULL);
return 1;
(note the spurious semicolon after closing paren), has that been fixed
already?
Like the thephd.dev and clang implementations, the patch always macro
expands the whole #embed and __has_embed directives except for the
embed keyword. That is most likely not what C23 says, my limited
understanding right now is that in #embed one needs to parse the whole
directive line with macro expansion disabled and check if it satisfies the
grammar, if not, the whole directive is macro expanded, if yes, only
the limit parameter argument is macro expanded and the prefix/suffix/if_empty
arguments are maybe macro expanded when actually used (and not at all if
unused). And I think __has_embed macro expansion has conflicting rules.
2024-09-12 Jakub Jelinek <jakub@redhat.com>
PR c/105863
libcpp/
* include/cpplib.h: Implement C23 N3017 #embed - a scannable,
tooling-friendly binary resource inclusion mechanism paper.
(struct cpp_options): Add embed member.
(enum cpp_builtin_type): Add BT_HAS_EMBED.
(cpp_set_include_chains): Add another cpp_dir * argument to
the declaration.
* internal.h (enum include_type): Add IT_EMBED.
(struct cpp_reader): Add embed_include member.
(struct cpp_embed_params_tokens): New type.
(struct cpp_embed_params): New type.
(_cpp_get_token_no_padding): Declare.
(enum _cpp_find_file_kind): Add _cpp_FFK_EMBED and _cpp_FFK_HAS_EMBED.
(_cpp_stack_embed): Declare.
(_cpp_parse_expr): Change return type to cpp_num_part instead of
bool, change second argument from bool to const char * and add third
argument.
(_cpp_parse_embed_params): Declare.
* directives.cc (DIRECTIVE_TABLE): Add embed entry.
(end_directive): Don't call skip_rest_of_line for T_EMBED directive.
(_cpp_handle_directive): Return 2 rather than 1 for T_EMBED in
directives-only mode.
(parse_include): Don't Call check_eol for T_EMBED directive.
(skip_balanced_token_seq): New function.
(EMBED_PARAMS): Define.
(enum embed_param_kind): New type.
(embed_params): New variable.
(_cpp_parse_embed_params): New function.
(do_embed): New function.
(do_if): Adjust _cpp_parse_expr caller.
(do_elif): Likewise.
* expr.cc (parse_defined): Diagnose defined in #embed or __has_embed
parameters.
(_cpp_parse_expr): Change return type to cpp_num_part instead of
bool, change second argument from bool to const char * and add third
argument. Adjust function comment. For #embed/__has_embed parameters
add an artificial CPP_OPEN_PAREN. Use the second argument DIR
directly instead of string literals conditional on IS_IF.
For #embed/__has_embed parameter, stop on reaching CPP_CLOSE_PAREN
matching the artificial one. Diagnose negative or too large embed
parameter operands.
(num_binary_op): Use #embed instead of #if for diagnostics if inside
#embed/__has_embed parameter.
(num_div_op): Likewise.
* files.cc (struct _cpp_file): Add limit member and embed bitfield.
(search_cache): Add IS_EMBED argument, formatting fix. Skip over
files with different file->embed from the argument.
(find_file_in_dir): Don't call pch_open_file if file->embed.
(_cpp_find_file): Handle _cpp_FFK_EMBED and _cpp_FFK_HAS_EMBED.
(read_file_guts): Formatting fix.
(has_unique_contents): Ignore file->embed files.
(search_path_head): Handle IT_EMBED type.
(_cpp_stack_embed): New function.
(_cpp_get_file_stat): Formatting fix.
(cpp_set_include_chains): Add embed argument, save it to
pfile->embed_include and compute lens for the chain.
* init.cc (struct lang_flags): Add embed member.
(lang_defaults): Add embed initializers.
(cpp_set_lang): Initialize CPP_OPTION (pfile, embed).
(builtin_array): Add __has_embed entry.
(cpp_init_builtins): Predefine __STDC_EMBED_NOT_FOUND__,
__STDC_EMBED_FOUND__ and __STDC_EMBED_EMPTY__.
* lex.cc (cpp_directive_only_process): Handle #embed.
* macro.cc (cpp_get_token_no_padding): Rename to ...
(_cpp_get_token_no_padding): ... this. No longer static.
(builtin_has_include_1): New function.
(builtin_has_include): Use it. Use _cpp_get_token_no_padding
instead of cpp_get_token_no_padding.
(builtin_has_embed): New function.
(_cpp_builtin_macro_text): Handle BT_HAS_EMBED.
gcc/
* doc/cppdiropts.texi (--embed-dir=): Document.
* doc/cpp.texi (Binary Resource Inclusion): New chapter.
(__has_embed): Document.
* doc/invoke.texi (Directory Options): Mention --embed-dir=.
* gcc.cc (cpp_unique_options): Add %{-embed*}.
* genmatch.cc (main): Adjust cpp_set_include_chains caller.
* incpath.h (enum incpath_kind): Add INC_EMBED.
* incpath.cc (merge_include_chains): Handle INC_EMBED.
(register_include_chains): Adjust cpp_set_include_chains caller.
gcc/c-family/
* c.opt (-embed-dir=): New option.
(-embed-directory): New alias.
(-embed-directory=): New alias.
* c-opts.cc (c_common_handle_option): Handle OPT__embed_dir_.
gcc/testsuite/
* c-c++-common/cpp/embed-1.c: New test.
* c-c++-common/cpp/embed-2.c: New test.
* c-c++-common/cpp/embed-3.c: New test.
* c-c++-common/cpp/embed-4.c: New test.
* c-c++-common/cpp/embed-5.c: New test.
* c-c++-common/cpp/embed-6.c: New test.
* c-c++-common/cpp/embed-7.c: New test.
* c-c++-common/cpp/embed-8.c: New test.
* c-c++-common/cpp/embed-9.c: New test.
* c-c++-common/cpp/embed-10.c: New test.
* c-c++-common/cpp/embed-11.c: New test.
* c-c++-common/cpp/embed-12.c: New test.
* c-c++-common/cpp/embed-13.c: New test.
* c-c++-common/cpp/embed-14.c: New test.
* c-c++-common/cpp/embed-25.c: New test.
* c-c++-common/cpp/embed-26.c: New test.
* c-c++-common/cpp/embed-dir/embed-1.inc: New test.
* c-c++-common/cpp/embed-dir/embed-3.c: New test.
* c-c++-common/cpp/embed-dir/embed-4.c: New test.
* c-c++-common/cpp/embed-dir/magna-carta.txt: New test.
* gcc.dg/cpp/embed-1.c: New test.
* gcc.dg/cpp/embed-2.c: New test.
* gcc.dg/cpp/embed-3.c: New test.
* gcc.dg/cpp/embed-4.c: New test.
* g++.dg/cpp/embed-1.C: New test.
* g++.dg/cpp/embed-2.C: New test.
* g++.dg/cpp/embed-3.C: New test.
|
|
PR preprocessor/90581
* c-c++-common/cpp/fmax-include-depth.c: Fix whitespace in dg directive.
|
|
When the file name for a #include directive is the result of stringifying a
macro argument, libcpp needs to take some care to get the whitespace
correct; in particular stringify_arg() needs to see a CPP_PADDING token
between macro tokens so that it can figure out when to output space between
tokens. The CPP_PADDING tokens are not normally generated when handling a
preprocessor directive, but for #include-like directives, libcpp sets the
state variable pfile->state.directive_wants_padding to TRUE so that the
CPP_PADDING tokens will be output, and then everything works fine for
computed includes.
As the PR points out, things do not work fine for __has_include. Fix that by
setting the state variable the same as is done for #include.
libcpp/ChangeLog:
PR preprocessor/110558
* macro.cc (builtin_has_include): Set
pfile->state.directive_wants_padding prior to lexing the
file name, in case it comes from macro expansion.
gcc/testsuite/ChangeLog:
PR preprocessor/110558
* c-c++-common/cpp/has-include-2.c: New test.
* c-c++-common/cpp/has-include-2.h: New test.
|
|
In libcpp/files.cc, the function _cpp_has_header(), which implements
__has_include and __has_include_next, does not check for a NULL return value
from search_path_head(), leading to an ICE tripping an assert when
_cpp_find_file() tries to use it. Fix it by checking for that case and
silently returning false instead.
As suggested by the PR author, it is easiest to make a testcase by using
the -idirafter option. To enable that, also modify the dg-additional-options
testsuite procedure to make the global $srcdir available, since -idirafter
requires the full path.
libcpp/ChangeLog:
PR preprocessor/80755
* files.cc (search_path_head): Add SUPPRESS_DIAGNOSTIC argument
defaulting to false.
(_cpp_has_header): Silently return false if the search path has been
exhausted, rather than issuing a diagnostic and then hitting an
assert.
gcc/testsuite/ChangeLog:
* lib/gcc-defs.exp (dg-additional-options): Make $srcdir usable in a
dg-additional-options directive.
* c-c++-common/cpp/has-include-next-2-dir/has-include-next-2.h: New test.
* c-c++-common/cpp/has-include-next-2.c: New test.
|
|
The PR requests an enhancement to the diagnostic issued for the use of a
poisoned identifier. Currently, we show the location of the usage, but not
the location which requested the poisoning, which would be helpful for the
user if the decision to poison an identifier was made externally, such as
in a library header.
In order to output this information, we need to remember a location_t for
each identifier that has been poisoned, and that data needs to be preserved
as well in a PCH. One option would be to add a field to struct cpp_hashnode,
but there is no convenient place to add it without increasing the size of
the struct for all identifiers. Given this facility will be needed rarely,
it seemed better to add a second hash map, which is handled PCH-wise the
same as the current one in gcc/stringpool.cc. This hash map associates a new
struct cpp_hashnode_extra with each identifier that needs one. Currently
that struct only contains the new location_t, but it could be extended in
the future if there is other ancillary data that may be convenient to put
there for other purposes.
libcpp/ChangeLog:
PR preprocessor/36887
* directives.cc (do_pragma_poison): Store in the extra hash map the
location from which an identifier has been poisoned.
* lex.cc (identifier_diagnostics_on_lex): When issuing a diagnostic
for the use of a poisoned identifier, also add a note indicating the
location from which it was poisoned.
* identifiers.cc (alloc_node): Convert to template function.
(_cpp_init_hashtable): Handle the new extra hash map.
(_cpp_destroy_hashtable): Likewise.
* include/cpplib.h (struct cpp_hashnode_extra): New struct.
(cpp_create_reader): Update prototype to...
* init.cc (cpp_create_reader): ...accept an argument for the extra
hash table and pass it to _cpp_init_hashtable.
* include/symtab.h (ht_lookup): New overload for convenience.
* internal.h (struct cpp_reader): Add EXTRA_HASH_TABLE member.
(_cpp_init_hashtable): Adjust prototype.
gcc/c-family/ChangeLog:
PR preprocessor/36887
* c-opts.cc (c_common_init_options): Pass new extra hash map
argument to cpp_create_reader().
gcc/ChangeLog:
PR preprocessor/36887
* toplev.h (ident_hash_extra): Declare...
* stringpool.cc (ident_hash_extra): ...this new global variable.
(init_stringpool): Handle ident_hash_extra as well as ident_hash.
(ggc_mark_stringpool): Likewise.
(ggc_purge_stringpool): Likewise.
(struct string_pool_data_extra): New struct.
(spd2): New GC root variable.
(gt_pch_save_stringpool): Use spd2 to handle ident_hash_extra,
analogous to how spd is used to handle ident_hash.
(gt_pch_restore_stringpool): Likewise.
gcc/testsuite/ChangeLog:
PR preprocessor/36887
* c-c++-common/cpp/diagnostic-poison.c: New test.
* g++.dg/pch/pr36887.C: New test.
* g++.dg/pch/pr36887.Hs: New test.
|
|
As noted on the PR, commit r13-1544, the fix for PR53431, did not handle
the specific case of -Wunknown-pragmas, because that warning is issued
during preprocessing, but not by libcpp directly (it comes from the
cb_def_pragma callback). Address that by handling this pragma in
addition to libcpp pragmas during the early pragma handler.
gcc/c-family/ChangeLog:
PR c++/89038
* c-pragma.cc (handle_pragma_diagnostic_impl): Handle
-Wunknown-pragmas during early processing.
gcc/testsuite/ChangeLog:
PR c++/89038
* c-c++-common/cpp/Wunknown-pragmas-1.c: New test.
|
|
This PR was fixed by r12-4797 and r12-5454. Add test coverage from the PR
that is not represented elsewhere.
gcc/testsuite/ChangeLog:
PR preprocessor/82335
* c-c++-common/cpp/diagnostic-pragma-3.c: New test.
|
|
The PR was fixed by r12-5454. Since the fix was somewhat incidental,
although related, add a testcase from PR90400 too before closing it out.
gcc/testsuite/ChangeLog:
PR preprocessor/90400
* c-c++-common/cpp/pr90400.c: New test.
|
|
As noted in the PR, GCC will segfault if a file name is first seen in a
linemarker directive, and then later seen in a normal #include. This is
because the fake include process adds the file to the cache with a null PATH
member. The normal #include finds this file in the cache and then attempts
to use the null PATH. Resolve by adding the file to the cache with a unique
starting directory, so that the fake entry will only be found by a
subsequent fake include, not by a real one.
libcpp/ChangeLog:
PR preprocessor/61474
* files.cc (_cpp_find_file): Set DONT_READ to TRUE for fake
include files.
(_cpp_fake_include): Pass a unique cpp_dir* address so
the fake file will not be found when looked up for real.
gcc/testsuite/ChangeLog:
PR preprocessor/61474
* c-c++-common/cpp/pr61474-2.h: New test.
* c-c++-common/cpp/pr61474.c: New test.
* c-c++-common/cpp/pr61474.h: New test.
|
|
When libcpp reports diagnostics whose locus is a macro name (such as for
-Wunused-macros), it uses the location in the cpp_macro object that was
stored by _cpp_new_macro. This is currently set to pfile->directive_line,
which contains the line number only and no column information. This patch
changes the stored location to the src_loc for the token defining the macro
name, which includes the location and range information.
libcpp/ChangeLog:
PR c++/66290
* macro.cc (_cpp_create_definition): Add location argument.
* internal.h (_cpp_create_definition): Adjust prototype.
* directives.cc (do_define): Pass new location argument to
_cpp_create_definition.
(do_undef): Stop passing inferior location to cpp_warning_with_line;
the default from cpp_warning is better.
(cpp_pop_definition): Pass new location argument to
_cpp_create_definition.
* pch.cc (cpp_read_state): Likewise.
gcc/testsuite/ChangeLog:
PR c++/66290
* c-c++-common/cpp/macro-ranges.c: New test.
* c-c++-common/cpp/line-2.c: Adapt to check for column information
on macro-related libcpp warnings.
* c-c++-common/cpp/line-3.c: Likewise.
* c-c++-common/cpp/macro-arg-count-1.c: Likewise.
* c-c++-common/cpp/pr58844-1.c: Likewise.
* c-c++-common/cpp/pr58844-2.c: Likewise.
* c-c++-common/cpp/warning-zero-location.c: Likewise.
* c-c++-common/pragma-diag-14.c: Likewise.
* c-c++-common/pragma-diag-15.c: Likewise.
* g++.dg/modules/macro-2_d.C: Likewise.
* g++.dg/modules/macro-4_d.C: Likewise.
* g++.dg/modules/macro-4_e.C: Likewise.
* g++.dg/spellcheck-macro-ordering.C: Likewise.
* gcc.dg/builtin-redefine.c: Likewise.
* gcc.dg/cpp/Wunused.c: Likewise.
* gcc.dg/cpp/redef2.c: Likewise.
* gcc.dg/cpp/redef3.c: Likewise.
* gcc.dg/cpp/redef4.c: Likewise.
* gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
* gcc.dg/cpp/ucnid-11.c: Likewise.
* gcc.dg/cpp/undef2.c: Likewise.
* gcc.dg/cpp/warn-redefined-2.c: Likewise.
* gcc.dg/cpp/warn-redefined.c: Likewise.
* gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
* gcc.dg/cpp/warn-unused-macros.c: Likewise.
|
|
get__Pragma_string() in directives.cc is responsible for lexing the parens
and the string argument from a _Pragma("...") operator. This function does
not handle the case when the closing paren is not on the same line as the
string; in that case, libcpp will by default reuse the token buffer it
previously used for the string, so that the string token returned by
get__Pragma_string() may be corrupted, as shown in the testcase. Fix using
the existing keep_tokens mechanism that temporarily disables the reuse of
token buffers.
libcpp/ChangeLog:
PR preprocessor/67046
* directives.cc (_cpp_do__Pragma): Increment pfile->keep_tokens to
ensure the returned string token is valid.
gcc/testsuite/ChangeLog:
PR preprocessor/67046
* c-c++-common/cpp/pr67046.c: New test.
|
|
The following patch adds testcases for 5 DRs. In the DR2475, DR2530 and
CWG2691 my understanding is we already implement the desired behavior,
in DR2478 partially (I've added 2 dg-bogus there, I think we inherit
rather than overwrite DECL_DECLARED_CONSTINIT_P for explicit specialization
somewhere, still far better than clang++) and DR2673 on the other side the
DR was to codify the clang++ behavior rather than GCC.
Not 100% sure if it is better to commit the 2 with dg-bogus or just wait
until the actual fixes are implemented. BTW, I've noticed
register_specialization does:
FOR_EACH_CLONE (clone, fn)
{
DECL_DECLARED_INLINE_P (clone)
= DECL_DECLARED_INLINE_P (fn);
DECL_SOURCE_LOCATION (clone)
= DECL_SOURCE_LOCATION (fn);
DECL_DELETED_FN (clone)
= DECL_DELETED_FN (fn);
}
but not e.g. constexpr/consteval, have tried to cover that in a testcase
but haven't managed to do so.
2023-02-15 Jakub Jelinek <jakub@redhat.com>
* g++.dg/DRs/dr2475.C: New test.
* g++.dg/DRs/dr2478.C: New test.
* g++.dg/DRs/dr2530.C: New test.
* g++.dg/DRs/dr2673.C: New test.
* c-c++-common/cpp/delimited-escape-seq-8.c: New test.
|
|
libcpp's directives-only mode does not expect deferred pragmas to be
registered, but to date the c-family registration process has not checked for
this case. That issue became more visible since r13-1544, which added the
commonly used GCC diagnostic pragmas to the set of those registered in
preprocessing modes. Fix it by checking for directives-only mode in
c-family/c-pragma.cc.
gcc/c-family/ChangeLog:
PR preprocessor/108244
* c-pragma.cc (c_register_pragma_1): Don't attempt to register any
deferred pragmas if -fdirectives-only.
(init_pragma): Likewise.
gcc/testsuite/ChangeLog:
* c-c++-common/cpp/pr108244-1.c: New test.
* c-c++-common/cpp/pr108244-2.c: New test.
* c-c++-common/gomp/pr108244-3.c: New test.
|
|
The result of linemap_resolve_location() can be an ad-hoc location, if that is
what was stored in a relevant macro map. maybe_unwind_expanded_macro_loc()
did not previously handle this case, causing it to print the wrong tracking
information for an example such as the new testcase macro-trace-1.c. Fix that
by checking for ad-hoc locations where needed.
gcc/ChangeLog:
* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc): Handle ad-hoc
location in return value of linemap_resolve_location().
gcc/testsuite/ChangeLog:
* c-c++-common/cpp/macro-trace-1.c: New test.
|
|
The following pseudo-patch regenerates the libcpp tables with Unicode 15.0.0
which added 4489 new characters.
As mentioned previously, this isn't just a matter of running the
two libcpp/make*.cc programs on the new Unicode files, but one needs
to manually update a table inside of makeuname2c.cc according to
a table in Unicode text (which is partially reflected in the text
files, but e.g. in Unicode 14.0.0 not 100% accurately, in 15.0.0
actually accurately).
I've also added some randomly chosen subset of those 4489 new
characters to a testcase.
2022-11-04 Jakub Jelinek <jakub@redhat.com>
gcc/testsuite/
* c-c++-common/cpp/named-universal-char-escape-1.c: Add tests for some
characters newly added in Unicode 15.0.0.
libcpp/
* makeuname2c.cc (struct generated): Update from Unicode 15.0.0
table 4-8.
* ucnid.h: Regenerated for Unicode 15.0.0.
* uname2c.h: Likewise.
|
|
On Tue, Aug 30, 2022 at 09:10:37PM +0000, Joseph Myers wrote:
> I'm seeing build failures of glibc for powerpc64, as illustrated by the
> following C code:
>
> #if 0
> \NARG
> #endif
>
> (the actual sysdeps/powerpc/powerpc64/sysdep.h code is inside #ifdef
> __ASSEMBLER__).
>
> This shows some problems with this feature - and with delimited escape
> sequences - as it affects C. It's fine to accept it as an extension
> inside string and character literals, because \N or \u{...} would be
> invalid in the absence of the feature (i.e. the syntax for such literals
> fails to match, meaning that the rule about undefined behavior for a
> single ' or " as a pp-token applies). But outside string and character
> literals, the usual lexing rules apply, the \ is a pp-token on its own and
> the code is valid at the preprocessing level, and with expansion of macros
> appearing before or after the \ (e.g. u defined as a macro in the \u{...}
> case) it may be valid code at the language level as well. I don't know
> what older C++ versions say about this, but for C this means e.g.
>
> #define z(x) 0
> #define a z(
> int x = a\NARG);
>
> needs to be accepted as expanding to "int x = 0;", not interpreted as
> using the \N feature in an identifier and produce an error.
The following patch changes this, so that:
1) outside of string/character literals, \N without following { is never
treated as an error nor warning, it is silently treated as \ separate
token followed by whatever is after it
2) \u{123} and \N{LATIN SMALL LETTER A WITH ACUTE} are not handled as
extension at all outside of string/character literals in the strict
standard modes (-std=c*) except for -std=c++{23,2b}, only in the
-std=gnu* modes, because it changes behavior on valid sources, e.g.
#define z(x) 0
#define a z(
int x = a\u{123});
int y = a\N{LATIN SMALL LETTER A WITH ACUTE});
3) introduces -Wunicode warning (on by default) and warns for cases
of what looks like invalid delimited escape sequence or named
universal character escape outside of string/character literals
and is treated as separate tokens
2022-09-07 Jakub Jelinek <jakub@redhat.com>
libcpp/
* include/cpplib.h (struct cpp_options): Add cpp_warn_unicode member.
(enum cpp_warning_reason): Add CPP_W_UNICODE.
* init.cc (cpp_create_reader): Initialize cpp_warn_unicode.
* charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't
handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}.
In possible identifier contexts, don't emit an error and punt
if \N isn't followed by {, or if \N{} surrounds some lower case
letters or _. In possible identifier contexts when not C++23, don't
emit an error but warning about unknown character names and treat as
separate tokens. When treating as separate tokens \u{ or \N{, emit
warnings.
gcc/
* doc/invoke.texi (-Wno-unicode): Document.
gcc/c-family/
* c.opt (Winvalid-utf8): Use ObjC instead of objC. Remove
" in comments" from description.
(Wunicode): New option.
gcc/testsuite/
* c-c++-common/cpp/delimited-escape-seq-4.c: New test.
* c-c++-common/cpp/delimited-escape-seq-5.c: New test.
* c-c++-common/cpp/delimited-escape-seq-6.c: New test.
* c-c++-common/cpp/delimited-escape-seq-7.c: New test.
* c-c++-common/cpp/named-universal-char-escape-5.c: New test.
* c-c++-common/cpp/named-universal-char-escape-6.c: New test.
* c-c++-common/cpp/named-universal-char-escape-7.c: New test.
* g++.dg/cpp23/named-universal-char-escape1.C: New test.
* g++.dg/cpp23/named-universal-char-escape2.C: New test.
|
|
The following patch introduces a new warning - -Winvalid-utf8 similarly
to what clang now has - to diagnose invalid UTF-8 byte sequences in
comments, but not just in those, but also in string/character literals
and outside of them.
The warning is on by default when explicit -finput-charset=UTF-8 is
used and C++23 compilation is requested and if -{,W}pedantic or
-pedantic-errors it is actually a pedwarn.
The reason it is on by default only for -finput-charset=UTF-8 is
that the sources often are UTF-8, but sometimes could be some ASCII
compatible single byte encoding where non-ASCII characters only
appear in comments. So having the warning off by default
is IMO desirable. The C++23 pedantic mode for when the source code
is UTF-8 is -std=c++23 -pedantic-errors -finput-charset=UTF-8.
2022-09-01 Jakub Jelinek <jakub@redhat.com>
PR c++/106655
libcpp/
* include/cpplib.h (struct cpp_options): Implement C++23
P2295R6 - Support for UTF-8 as a portable source file encoding.
Add cpp_warn_invalid_utf8 and cpp_input_charset_explicit fields.
(enum cpp_warning_reason): Add CPP_W_INVALID_UTF8 enumerator.
* init.cc (cpp_create_reader): Initialize cpp_warn_invalid_utf8
and cpp_input_charset_explicit.
* charset.cc (_cpp_valid_utf8): Adjust function comment.
* lex.cc (UCS_LIMIT): Define.
(utf8_continuation): New const variable.
(utf8_signifier): Move earlier in the file.
(_cpp_warn_invalid_utf8, _cpp_handle_multibyte_utf8): New functions.
(_cpp_skip_block_comment): Handle -Winvalid-utf8 warning.
(skip_line_comment): Likewise.
(lex_raw_string, lex_string): Likewise.
(_cpp_lex_direct): Likewise.
gcc/
* doc/invoke.texi (-Winvalid-utf8): Document it.
gcc/c-family/
* c.opt (-Winvalid-utf8): New warning.
* c-opts.cc (c_common_handle_option) <case OPT_finput_charset_>:
Set cpp_opts->cpp_input_charset_explicit.
(c_common_post_options): If -finput-charset=UTF-8 is explicit
in C++23, enable -Winvalid-utf8 by default and if -pedantic
or -pedantic-errors, make it a pedwarn.
gcc/testsuite/
* c-c++-common/cpp/Winvalid-utf8-1.c: New test.
* c-c++-common/cpp/Winvalid-utf8-2.c: New test.
* c-c++-common/cpp/Winvalid-utf8-3.c: New test.
* g++.dg/cpp23/Winvalid-utf8-1.C: New test.
* g++.dg/cpp23/Winvalid-utf8-2.C: New test.
* g++.dg/cpp23/Winvalid-utf8-3.C: New test.
* g++.dg/cpp23/Winvalid-utf8-4.C: New test.
* g++.dg/cpp23/Winvalid-utf8-5.C: New test.
* g++.dg/cpp23/Winvalid-utf8-6.C: New test.
* g++.dg/cpp23/Winvalid-utf8-7.C: New test.
* g++.dg/cpp23/Winvalid-utf8-8.C: New test.
* g++.dg/cpp23/Winvalid-utf8-9.C: New test.
* g++.dg/cpp23/Winvalid-utf8-10.C: New test.
* g++.dg/cpp23/Winvalid-utf8-11.C: New test.
* g++.dg/cpp23/Winvalid-utf8-12.C: New test.
|
|
The following patch implements the
C++23 P2071R2 - Named universal character escapes
paper to support \N{LATIN SMALL LETTER E} etc.
I've used Unicode 14.0, there are 144803 character name properties
(including the ones generated by Unicode NR1 and NR2 rules)
and correction/control/alternate aliases, together with zero terminators
that would be 3884745 bytes, which is clearly unacceptable for libcpp.
This patch instead contains a generator which from the UnicodeData.txt
and NameAliases.txt files emits a space optimized radix tree (208765
bytes long for 14.0), a single string literal dictionary (59418 bytes),
maximum name length (currently 88 chars) and two small helper arrays
for the NR1/NR2 name generation.
The radix tree needs 2 to 9 bytes per node, the exact format is
described in the generator program. There could be ways to shrink
the dictionary size somewhat at the expense of slightly slower lookups.
Currently the patch implements strict matching (that is what is needed
to actually implement it on valid code) and Unicode UAX44-LM2 algorithm
loose matching to provide hints (that algorithm essentially ignores
hyphens in between two alphanumeric characters, spaces and underscores
(with one exception for hyphen) and does case insensitive matching).
In the attachment is a WIP patch that shows how to implement also
spellcheck.{h,cc} style discovery of misspellings, but I'll need to talk
to David Malcolm about it, as spellcheck.{h,cc} is in gcc/ subdir
(so the WIP incremental patch instead prints all the names to stderr).
2022-08-26 Jakub Jelinek <jakub@redhat.com>
PR c++/106648
libcpp/
* charset.cc: Implement C++23 P2071R2 - Named universal character
escapes. Include uname2c.h.
(hangul_syllables, hangul_count): New variables.
(struct uname2c_data): New type.
(_cpp_uname2c, _cpp_uname2c_uax44_lm2): New functions.
(_cpp_valid_ucn): Use them. Handle named universal character escapes.
(convert_ucn): Adjust comment.
(convert_escape): Call convert_ucn even for \N.
(_cpp_interpret_identifier): Handle named universal character escapes.
* lex.cc (get_bidi_ucn): Fix up function comment formatting.
(get_bidi_named): New function.
(forms_identifier_p, lex_string): Handle named universal character
escapes.
* makeuname2c.cc: New file. Small parts copied from makeucnid.cc.
* uname2c.h: New generated file.
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_named_character_escapes to 202207L.
gcc/testsuite/
* c-c++-common/cpp/named-universal-char-escape-1.c: New test.
* c-c++-common/cpp/named-universal-char-escape-2.c: New test.
* c-c++-common/cpp/named-universal-char-escape-3.c: New test.
* c-c++-common/cpp/named-universal-char-escape-4.c: New test.
* c-c++-common/Wbidi-chars-25.c: New test.
* gcc.dg/cpp/named-universal-char-escape-1.c: New test.
* gcc.dg/cpp/named-universal-char-escape-2.c: New test.
* g++.dg/cpp/named-universal-char-escape-1.C: New test.
* g++.dg/cpp/named-universal-char-escape-2.C: New test.
* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_named_character_escapes.
|
|
The following patch implements the C++23 P2290R3 paper.
2022-08-20 Jakub Jelinek <jakub@redhat.com>
PR c++/106645
libcpp/
* include/cpplib.h (struct cpp_options): Implement
P2290R3 - Delimited escape sequences. Add delimite_escape_seqs
member.
* init.cc (struct lang_flags): Likewise.
(lang_defaults): Add delim column.
(cpp_set_lang): Copy over delimite_escape_seqs.
* charset.cc (extend_char_range): New function.
(_cpp_valid_ucn): Use it. Handle delimited escape sequences.
(convert_hex): Likewise.
(convert_oct): Likewise.
(convert_ucn): Use extend_char_range.
(convert_escape): Call convert_oct even for \o.
(_cpp_interpret_identifier): Handle delimited escape sequences.
* lex.cc (get_bidi_ucn_1): Likewise. Add end argument, fill it in.
(get_bidi_ucn): Adjust get_bidi_ucn_1 caller. Use end argument to
compute num_bytes.
gcc/testsuite/
* c-c++-common/cpp/delimited-escape-seq-1.c: New test.
* c-c++-common/cpp/delimited-escape-seq-2.c: New test.
* c-c++-common/cpp/delimited-escape-seq-3.c: New test.
* c-c++-common/Wbidi-chars-24.c: New test.
* gcc.dg/cpp/delimited-escape-seq-1.c: New test.
* gcc.dg/cpp/delimited-escape-seq-2.c: New test.
* g++.dg/cpp/delimited-escape-seq-1.C: New test.
* g++.dg/cpp/delimited-escape-seq-2.C: New test.
|
|
The first part of the following testcase (m1-m3 macros and its use)
regressed with my PR89971 fix, but as the m1,m4-m5 and its use part shows,
the problem isn't new, we can emit a CPP_PADDING token to avoid it from
being adjacent to whatever comes after the __VA_OPT__ (in this case there
is nothing afterwards, true).
In most cases these CPP_PADDING tokens don't matter, all other
callers of cpp_get_token_with_location either ignore CPP_PADDING tokens
completely (e.g. c_lex_with_flags) or they just remember them and
take them into account when printing stuff whether there should be
added whitespace or not (scan_translation_unit + token_streamer::stream).
So, I think we should just ignore CPP_PADDING tokens the same way in
_cpp_parse_expr.
2022-05-27 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/105732
* expr.cc (_cpp_parse_expr): Handle CPP_PADDING by just another
token.
* c-c++-common/cpp/va-opt-10.c: New test.
|
|
As mentioned in the PR, in some cases we preprocess incorrectly when we
encounter an identifier which is defined as function-like macro, followed
by at least 2 CPP_PADDING tokens and then some other identifier.
On the following testcase, the problem is in the 3rd funlike_invocation_p,
the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared token),
CPP_PADDING (one created with padding_token, val.source is non-NULL and
val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME.
funlike_invocation_p remembers there was a padding token, but remembers the
first one because of its condition, then the next token is the CPP_NAME,
which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we
can't easily backup more tokens, it pushes into a new context the padding
token (the pfile->avoid_paste one). The net effect is that when Y is not
defined as fun-like macro, we read Y, avoid_paste, padding_token, Y,
while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y
(the second avoid_paste is because that is how we handle end of a context).
Now, for stringify_arg that is unfortunately a significant difference,
which handles CPP_PADDING tokens with:
if (token->type == CPP_PADDING)
{
if (source == NULL
|| (!(source->flags & PREV_WHITE)
&& token->val.source == NULL))
source = token->val.source;
continue;
}
and later on
/* Leading white space? */
if (dest - 1 != BUFF_FRONT (pfile->u_buff))
{
if (source == NULL)
source = token;
if (source->flags & PREV_WHITE)
*dest++ = ' ';
}
source = NULL;
(and c-ppoutput.cc has similar code).
So, when Y is not fun-like macro, ' ' is added because padding_token's
val.source->flags & PREV_WHITE is non-zero, while when it is fun-like
macro, we don't add ' ' in between, because source is NULL and so
used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.
Now, the funlike_invocation_p condition
if (padding == NULL
|| (!(padding->flags & PREV_WHITE) && token->val.source == NULL))
padding = token;
looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume
the intent was to prefer do the same thing and pick the right padding.
But there are significant differences. Both stringify_arg and c-ppoutput.cc
don't remember the CPP_PADDING token, but its val.source instead, while
in funlike_invocation_p we want to remember the padding token that has the
significant information for stringify_arg/c-ppoutput.cc.
So, IMHO we want to overwrite padding if:
1) padding == NULL (remember that there was any padding at all)
2) padding->val.source == NULL (this matches the source == NULL
case in stringify_arg)
3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL
(this matches the !(source->flags & PREV_WHITE) && token->val.source == NULL
case in stringify_arg)
2022-02-01 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/104147
* macro.cc (funlike_invocation_p): For padding prefer a token
with val.source non-NULL especially if it has PREV_WHITE set
on val.source->flags. Add gcc_assert that CPP_PADDING tokens
don't have PREV_WHITE set in flags.
* c-c++-common/cpp/pr104147.c: New test.
|
|
When a sequence of diagnostic messages bounces back and forth repeatedly
between two includes, as with
#include <map>
std::map<const char*, const char*> m ("123", "456");
The output is quite a bit longer than necessary because we dump the include
path each time it changes. I'd think we could print the include path once
for each header file, and then expect that the user can look earlier in the
output if they're wondering.
gcc/ChangeLog:
* diagnostic.h (struct diagnostic_context): Add includes_seen.
* diagnostic.c (diagnostic_initialize): Initialize it.
(diagnostic_finish): Clean it up.
(includes_seen): New function.
(diagnostic_report_current_module): Use it.
gcc/testsuite/ChangeLog:
* c-c++-common/cpp/line-2.c: Only expect includes once.
* c-c++-common/cpp/line-3.c: Likewise.
|
|
In the following testcase we incorrectly error about pasting / token
with padding token (which is a result of __VA_OPT__); instead we should
like e.g. for ##arg where arg is empty macro argument clear PASTE_LEFT
flag of the previous token if __VA_OPT__ doesn't add any real tokens
(which can happen either because the macro doesn't have any tokens
passed to ... (i.e. __VA_ARGS__ expands to empty) or when __VA_OPT__
doesn't have any tokens in between ()s).
2021-12-30 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/89971
libcpp/
* macro.c (replace_args): For ##__VA_OPT__, if __VA_OPT__ expands
to no tokens at all, drop PASTE_LEFT flag from the previous token.
gcc/testsuite/
* c-c++-common/cpp/va-opt-9.c: New test.
|
|
stringify_arg uses pfile->u_buff to create the string literal.
Unfortunately, paste_tokens -> _cpp_lex_direct -> lex_number -> _cpp_unaligned_alloc
can in some cases use pfile->u_buff too, which results in losing everything
prepared for the string literal until the token pasting.
The following patch fixes that by not calling paste_token during the
construction of the string literal, but doing that before. All the tokens
we are processing have been pushed into a token buffer using
tokens_buff_add_token so it is fine if we paste some of them in that buffer
(successful pasting creates a new token in that buffer), move following
tokens if any to make it contiguous, pop (throw away) the extra tokens at
the end and then do stringify_arg.
Also, paste_tokens now copies over PREV_WHITE and PREV_FALLTHROUGH flags
from the original lhs token to the replacement token. Copying that way
the PREV_WHITE flag is needed for the #__VA_OPT__ handling and copying
over PREV_FALLTHROUGH fixes the new Wimplicit-fallthrough-38.c test.
2021-12-01 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/103415
libcpp/
* macro.c (stringify_arg): Remove va_opt argument and va_opt handling.
(paste_tokens): On successful paste or in PREV_WHITE and
PREV_FALLTHROUGH flags from the *plhs token to the new token.
(replace_args): Adjust stringify_arg callers. For #__VA_OPT__,
perform token pasting in a separate loop before stringify_arg call.
gcc/testsuite/
* c-c++-common/cpp/va-opt-8.c: New test.
* c-c++-common/Wimplicit-fallthrough-38.c: New test.
|
|
Jonathan mentioned on IRC that:
"Accept P1949R7 (C++ Identifier Syntax using Unicode Standard Annex 31) as
a Defect Report and apply the changes therein to the C++ working paper."
while I've actually implemented it only for -std={gnu,c}++{23,2b}.
As the C++98 rules were significantly different, I'm not trying to change
anything for C++98.
2021-11-30 Jakub Jelinek <jakub@redhat.com>
PR c++/100977
* init.c (lang_defaults): Enable cxx23_identifiers for
-std={gnu,c}++{11,14,17,20} too.
* c-c++-common/cpp/ucnid-2011-1-utf8.c: Expect errors in C++.
* c-c++-common/cpp/ucnid-2011-1.c: Likewise.
* g++.dg/cpp/ucnid-4-utf8.C: Add missing space to dg-options.
* g++.dg/cpp23/normalize3.C: Enable for c++11 rather than just c++23.
* g++.dg/cpp23/normalize4.C: Likewise.
* g++.dg/cpp23/normalize5.C: Likewise.
* g++.dg/cpp23/normalize7.C: Expect errors rather than just warnings
for c++11 and up rather than just c++23.
* g++.dg/cpp23/ucnid-2-utf8.C: Expect errors even for c++11 .. c++20.
|
|
Normal preprocessing, -fdirectives-only preprocessing before the Nathan's
rewrite, and all other compilers I've tried on godbolt treat even \*/
as end of a block comment, but the new -fdirectives-only handling doesn't.
2021-11-17 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/103130
* lex.c (cpp_directive_only_process): Treat even \*/ as end of block
comment.
* c-c++-common/cpp/dir-only-9.c: New test.
|
|
So, besides missing #__VA_OPT__ patch for which I've posted patch last week,
P1042R1 introduced some placemarker changes for __VA_OPT__, most notably
the addition of before "removal of placemarker tokens," rescanning ...
and the
#define H4(X, ...) __VA_OPT__(a X ## X) ## b
H4(, 1) // replaced by a b
example mentioned there where we replace it currently with ab
The following patch are the minimum changes (except for the
__builtin_expect) that achieve the same preprocessing between current
clang++ and patched gcc on all the testcases I've tried (i.e. gcc __VA_OPT__
testsuite in c-c++-common/cpp/va-opt* including the new test and the clang
clang/test/Preprocessor/macro_va_opt* testcases).
At one point I was trying to implement the __VA_OPT__(args) case as if
for non-empty __VA_ARGS__ it expanded as if __VA_OPT__( and ) were missing,
but from the tests it seems that is not how it should work, in particular
if after (or before) we have some macro argument and it is not followed
(or preceded) by ##, then it should be macro expanded even when __VA_OPT__
is after ## or ) is followed by ##. And it seems that not removing any
padding tokens isn't possible either, because the expansion of the arguments
typically has a padding token at the start and end and those at least
according to the testsuite need to go. It is unclear if it would be enough
to remove just one or if all padding tokens should be removed.
Anyway, e.g. the previous removal of all padding tokens at the end of
__VA_OPT__ is undesirable, as it e.g. eats also the padding tokens needed
for the H4 example from the paper.
2021-09-01 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/101488
* macro.c (replace_args): Fix up handling of CPP_PADDING tokens at the
start or end of __VA_OPT__ arguments when preceeded or followed by ##.
* c-c++-common/cpp/va-opt-3.c: Adjust expected output.
* c-c++-common/cpp/va-opt-7.c: New test.
|
|
The following patch implements C++20 # __VA_OPT__ (...) support.
Testcases cover what I came up with myself and what LLVM has for #__VA_OPT__
in its testsuite and the string literals are identical between the two
compilers on the va-opt-5.c testcase.
2021-08-17 Jakub Jelinek <jakub@redhat.com>
libcpp/
* macro.c (vaopt_state): Add m_stringify member.
(vaopt_state::vaopt_state): Initialize it.
(vaopt_state::update): Overwrite it.
(vaopt_state::stringify): New method.
(stringify_arg): Replace arg argument with first, count arguments
and add va_opt argument. Use first instead of arg->first and
count instead of arg->count, for va_opt add paste_tokens handling.
(paste_tokens): Fix up len calculation. Don't spell rhs twice,
instead use %.*s to supply lhs and rhs spelling lengths. Don't call
_cpp_backup_tokens here.
(paste_all_tokens): Call it here instead.
(replace_args): Adjust stringify_arg caller. For vaopt_state::END
if stringify is true handle __VA_OPT__ stringification.
(create_iso_definition): Handle # __VA_OPT__ similarly to # macro_arg.
gcc/testsuite/
* c-c++-common/cpp/va-opt-5.c: New test.
* c-c++-common/cpp/va-opt-6.c: New test.
|
|
The toolchain provided by ST for stm32 has had support for
__FILENAME__ for a while, but clang/llvm has recently implemented
support for __FILE_NAME__, so it seems better to use the same macro
name in GCC.
It happens that the ST patch is similar to the one proposed in PR
c/42579.
Given these input files:
::::::::::::::
mydir/myinc.h
::::::::::::::
char* mystringh_file = __FILE__;
char* mystringh_filename = __FILE_NAME__;
char* mystringh_base_file = __BASE_FILE__;
::::::::::::::
mydir/mysrc.c
::::::::::::::
char* mystring_file = __FILE__;
char* mystring_filename = __FILE_NAME__;
char* mystring_base_file = __BASE_FILE__;
we produce:
$ gcc mydir/mysrc.c -I . -E
char* mystringh_file = "./mydir/myinc.h";
char* mystringh_filename = "myinc.h";
char* mystringh_base_file = "mydir/mysrc.c";
char* mystring_file = "mydir/mysrc.c";
char* mystring_filename = "mysrc.c";
char* mystring_base_file = "mydir/mysrc.c";
2021-05-20 Christophe Lyon <christophe.lyon@linaro.org>
Torbjörn Svensson <torbjorn.svensson@st.com>
PR c/42579
libcpp/
* include/cpplib.h (cpp_builtin_type): Add BT_FILE_NAME entry.
* init.c (builtin_array): Likewise.
* macro.c (_cpp_builtin_macro_text): Add support for BT_FILE_NAME.
gcc/
* doc/cpp.texi (Common Predefined Macros): Document __FILE_NAME__.
gcc/testsuite/
* c-c++-common/spellcheck-reserved.c: Add tests for __FILE_NAME__.
* c-c++-common/cpp/file-name-1.c: New test.
|