Age | Commit message (Collapse) | Author | Files | Lines |
|
The following patch adds support for the begin declare target construct,
which is another spelling for declare target construct without clauses
(where it needs paired end declare target), but unlike that one accepts
clauses.
This is an OpenMP 5.1 feature, implemented with 5.2 clarification because
in 5.1 we had a restriction in the declare target chapter shared by
declare target and begin declare target that if there are any clauses
specified at least one of them needs to be to or link. But that
was of course meant just for declare target and not begin declare target,
because begin declare target doesn't even allow to/link/enter clauses.
In addition to that, the patch also makes device_type clause duplication
an error (as stated in 5.1) and similarly makes declare target with
just device_type clause an error rather than warning.
What this patch doesn't do is:
1) OpenMP 5.1 also added an indirect clause, we don't support that
neither on declare target nor begin declare target
and I couldn't find it in our features pages (neither libgomp.texi
nor web)
2) I think device_type(nohost)/device_type(host) support can't work for
variables (in 5.0 it only talked about procedures so this could be
also thought as 5.1 feature that we should just add to the list
and implement)
3) I don't see any use of the "omp declare target nohost" attribute, so
I'm not sure if device_type(nohost) works at all
2022-10-04 Jakub Jelinek <jakub@redhat.com>
gcc/c-family/
* c-omp.cc (c_omp_directives): Uncomment begin declare target
entry.
gcc/c/
* c-lang.h (struct c_omp_declare_target_attr): New type.
(current_omp_declare_target_attribute): Change type from
int to vec<c_omp_declare_target_attr, va_gc> *.
* c-parser.cc (c_parser_translation_unit): Adjust for that change.
If last pushed directive was begin declare target, use different
wording and simplify format strings for easier translations.
(c_parser_omp_clause_device_type): Uncomment
check_no_duplicate_clause call.
(c_parser_omp_declare_target): Adjust for the
current_omp_declare_target_attribute type change, push { -1 }.
Use error_at rather than warning_at for declare target with
only device_type clauses.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Define.
(c_parser_omp_begin): Add begin declare target support.
(c_parser_omp_end): Adjust for the
current_omp_declare_target_attribute type change, adjust
diagnostics wording and simplify format strings for easier
translations.
* c-decl.cc (current_omp_declare_target_attribute): Change type from
int to vec<c_omp_declare_target_attr, va_gc> *.
(c_decl_attributes): Adjust for the
current_omp_declare_target_attribute type change. If device_type
was present on begin declare target, add "omp declare target host"
and/or "omp declare target nohost" attributes.
gcc/cp/
* cp-tree.h (struct omp_declare_target_attr): Rename to ...
(cp_omp_declare_target_attr): ... this. Add device_type member.
(omp_begin_assumes_data): Rename to ...
(cp_omp_begin_assumes_data): ... this.
(struct saved_scope): Change types of omp_declare_target_attribute
and omp_begin_assumes.
* parser.cc (cp_parser_omp_clause_device_type): Uncomment
check_no_duplicate_clause call.
(cp_parser_omp_all_clauses): Fix up pasto, c_name for OMP_CLAUSE_LINK
should be "link" rather than "to".
(cp_parser_omp_declare_target): Adjust for omp_declare_target_attr
to cp_omp_declare_target_attr changes, push -1 as device_type. Use
error_at rather than warning_at for declare target with only
device_type clauses.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Define.
(cp_parser_omp_begin): Add begin declare target support. Adjust
for omp_begin_assumes_data to cp_omp_begin_assumes_data change.
(cp_parser_omp_end): Adjust for the
omp_declare_target_attr to cp_omp_declare_target_attr and
omp_begin_assumes_data to cp_omp_begin_assumes_data type changes,
adjust diagnostics wording and simplify format strings for easier
translations.
* semantics.cc (finish_translation_unit): Likewise.
* decl2.cc (cplus_decl_attributes): If device_type was present on
begin declare target, add "omp declare target host" and/or
"omp declare target nohost" attributes.
gcc/testsuite/
* c-c++-common/gomp/declare-target-4.c: Move tests that are now
rejected into declare-target-7.c.
* c-c++-common/gomp/declare-target-6.c: Adjust expected diagnostics.
* c-c++-common/gomp/declare-target-7.c: New test.
* c-c++-common/gomp/begin-declare-target-1.c: New test.
* c-c++-common/gomp/begin-declare-target-2.c: New test.
* c-c++-common/gomp/begin-declare-target-3.c: New test.
* c-c++-common/gomp/begin-declare-target-4.c: New test.
* g++.dg/gomp/attrs-9.C: Add begin declare target tests.
* g++.dg/gomp/attrs-18.C: New test.
libgomp/
* libgomp.texi (Support begin/end declare target syntax in C/C++):
Mark as implemented.
|
|
The reason the nonzero mask was kept in a tree was basically inertia,
as everything in irange is a tree. However, there's no need to keep
it in a tree, as the conversions to and from wide ints are very
annoying. That, plus special casing NULL masks to be -1 is prone
to error.
I have not only rewritten all the uses to assume a wide int, but
have corrected a few places where we weren't propagating the masks, or
rather pessimizing them to -1. This will become more important in
upcoming patches where we make better use of the masks.
Performance testing shows a trivial improvement in VRP, as things like
irange::contains_p() are tied to a tree. Ughh, can't wait for trees in
iranges to go away.
gcc/ChangeLog:
* value-range-storage.cc (irange_storage_slot::set_irange): Remove
special case.
* value-range.cc (irange::irange_set): Adjust for nonzero mask
being a wide int.
(irange::irange_set_anti_range): Same.
(irange::set): Same.
(irange::verify_range): Same.
(irange::legacy_equal_p): Same.
(irange::operator==): Same.
(irange::contains_p): Same.
(irange::legacy_intersect): Same.
(irange::legacy_union): Same.
(irange::irange_single_pair_union): Call union_nonzero_bits.
(irange::irange_union): Same.
(irange::irange_intersect): Call intersect_nonzero_bits.
(irange::intersect): Adjust for nonzero mask being a wide int.
(irange::invert): Same.
(irange::set_nonzero_bits): Same.
(irange::get_nonzero_bits_from_range): New.
(irange::set_range_from_nonzero_bits): New.
(irange::get_nonzero_bits): Adjust for nonzero mask being a wide
int.
(irange::intersect_nonzero_bits): Same.
(irange::union_nonzero_bits): Same.
(range_tests_nonzero_bits): Remove test.
* value-range.h (irange::varying_compatible_p): Adjust for nonzero
mask being a wide int.
(gt_ggc_mx): Same.
(gt_pch_nx): Same.
(irange::set_undefined): Same.
(irange::set_varying): Same.
(irange::normalize_kind): Same.
|
|
__builtin_popcount and __builtin_ffs were sharing the same range-ops
entry, but the nonzero mask optimization is not valid for ffs.
Separate them out into two entries.
PR tree-optimization/107130
gcc/ChangeLog:
* gimple-range-op.cc (class cfn_popcount): Call op_cfn_ffs.
(class cfn_ffs): New.
(gimple_range_op_handler::maybe_builtin_call): Separate out
CASE_CFN_FFS into its own case.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr107130.c: New test.
|
|
This PR related to _Pragma locations and diagnostic pragmas was fixed by a
combination of r10-325 and r13-1596. Add missing test coverage.
gcc/testsuite/ChangeLog:
PR c/91669
* c-c++-common/pr91669.c: New test.
|
|
|
|
i386-builtin-types.inc is included indirectly via i386-builtins.h
into 4 files: i386.cc i386-builtins.cc i386-expand.cc i386-features.cc
Only i386.cc dependency was present in gcc/config/t-i386 makefile.
As a result parallel builds occasionally fail as:
g++ ... -o i386-builtins.o ... ../../gcc-13-20220911/gcc/config/i386/i386-builtins.cc
In file included from ../../gcc-13-20220911/gcc/config/i386/i386-builtins.cc:92:
../../gcc-13-20220911/gcc/config/i386/i386-builtins.h:25:10:
fatal error: i386-builtin-types.inc: No such file or directory
25 | #include "i386-builtin-types.inc"
| ^~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[3]: *** [../../gcc-13-20220911/gcc/config/i386/t-i386:54: i386-builtins.o]
Error 1 shuffle=1663349189
gcc/
* config/i386/t-i386: Add build-time dependencies against
i386-builtin-types.inc to i386-builtins.o, i386-expand.o,
i386-features.o.
|
|
The cmse-15.c testcase fails at -Os because ICF means that we
generate
secure3:
b secure1
which is OK, but does not match the currently expected
secure3:
...
bx r[0-3]
gcc/testsuite/ChangeLog:
* gcc.target/arm/cmse/cmse-15.c: Align with -Os improvements.
Co-Authored-By: Yvan ROUX <yvan.roux@foss.st.com>
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
On Fri, Sep 30, 2022 at 04:39:25PM -0400, Jason Merrill wrote:
> > --- gcc/cp/decl.cc.jj 2022-09-22 00:14:55.478599363 +0200
> > +++ gcc/cp/decl.cc 2022-09-22 00:24:01.121178256 +0200
> > @@ -223,6 +223,7 @@ struct GTY((for_user)) named_label_entry
> > bool in_transaction_scope;
> > bool in_constexpr_if;
> > bool in_consteval_if;
> > + bool in_assume;
>
> I think it would be better to reject jumps into statement-expressions like
> the C front-end.
Ok, here is a self-contained patch that does that.
2022-10-03 Jakub Jelinek <jakub@redhat.com>
* cp-tree.h (BCS_STMT_EXPR): New enumerator.
* name-lookup.h (enum scope_kind): Add sk_stmt_expr.
* name-lookup.cc (begin_scope): Handle sk_stmt_expr like sk_block.
* semantics.cc (begin_compound_stmt): For BCS_STMT_EXPR use
sk_stmt_expr.
* parser.cc (cp_parser_statement_expr): Use BCS_STMT_EXPR instead of
BCS_NORMAL.
* decl.cc (struct named_label_entry): Add in_stmt_expr.
(poplevel_named_label_1): Handle sk_stmt_expr.
(check_previous_goto_1): Diagnose entering of statement expression.
(check_goto): Likewise.
* g++.dg/ext/stmtexpr24.C: New test.
|
|
* sv.po: Update.
|
|
... to match the trait's canonical spelling __is_same instead
of its alternative spelling __is_same_as.
gcc/c-family/ChangeLog:
* c-common.cc (c_common_reswords): Use RID_IS_SAME instead of
RID_IS_SAME_AS.
gcc/cp/ChangeLog:
* constraint.cc (diagnose_trait_expr): Use CPTK_IS_SAME instead
of CPTK_IS_SAME_AS.
* cp-trait.def (IS_SAME_AS): Rename to ...
(IS_SAME): ... this.
* pt.cc (alias_ctad_tweaks): Use CPTK_IS_SAME instead of
CPTK_IS_SAME_AS.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.
|
|
Add a vector length parameter needed by amdgcn without breaking aarch64.
All amdgcn vector masks are DImode, regardless of vector length, so we can't
tell what length is implied simply from the operator mode. (Even if we used
different integer modes there's no mode small enough to differenciate a 2 or
4 lane mask). Without knowing the intended length we end up using a mask with
too many lanes enabled, which leads to undefined behaviour..
The extra operand is not added for vector mask types so AArch64 does not need
to be adjusted.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (while_ultsidi): Limit mask length using
operand 3.
* doc/md.texi (while_ult): Document new operand 3 usage.
* internal-fn.cc (expand_while_optab_fn): Set operand 3 when lhs_type
maps to a non-vector mode.
|
|
No need to continue processing an undefined range.
gcc/
PR tree-optimization/107109
* range-op.cc (adjust_op1_for_overflow): Don't process undefined.
gcc/testsuite/
* gcc.dg/pr107109.c: New.
|
|
Like the non-predicated vrev64q patterns, mve_vrev64q_m_<supf><mode>
and mve_vrev64q_m_f<mode> need an early clobber constraint, otherwise
we can generate an unpredictable instruction:
Warning: 64-bit element size and same destination and source operands makes instruction UNPREDICTABLE
when calling vrevq64_m* with the same first and second arguments.
OK for trunk?
Thanks,
Christophe
gcc/ChangeLog:
* config/arm/mve.md (mve_vrev64q_m_<supf><mode>): Add early
clobber.
(mve_vrev64q_m_f<mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/intrinsics/vrev64q_m_s16-clobber.c: New test.
|
|
C2x changes the <float.h> definition of *_EPSILON to apply only to
normalized numbers. The effect is that LDBL_EPSILON for IBM long
double becomes 0x1p-105L instead of 0x1p-1074L.
There is a reasonable case for considering this a defect fix - it
originated from the issue reporting process (DR#467), though it ended
up being resolved by a paper (N2326) for C2x rather than through the
issue process, and code using *_EPSILON often needs to override the
pre-C2x value of LDBL_EPSILON and use something on the order of
magnitude of the C2x value instead. However, I've followed the
conservative approach of only making the change for C2x and not for
previous standard versions (and not for C++, which doesn't have the
C2x changes in this area).
The testcases added are intended to be valid for all long double
formats. The C11 one is based on
gcc.target/powerpc/rs6000-ldouble-2.c (and when we move to a C2x
default, gcc.target/powerpc/rs6000-ldouble-2.c will need an
appropriate option added to keep using an older language version).
Tested with no regressions for cross to powerpc-linux-gnu.
gcc/c-family/
* c-cppbuiltin.cc (builtin_define_float_constants): Do not
special-case __*_EPSILON__ setting for IBM long double for C2x.
gcc/testsuite/
* gcc.dg/c11-float-7.c, gcc.dg/c2x-float-12.c: New tests.
|
|
Currently if we have a range of [0,0] and we set the nonzero bits to
1, the current code pessimizes the range to [0,1] because it assumes
the range is [1,1] plus the possibility of 0. This fixes the
oversight.
gcc/ChangeLog:
* value-range.cc (irange::set_nonzero_bits): Do not pessimize range.
(range_tests_nonzero_bits): New test.
|
|
There is nothing else to compare when the number of sub-ranges is 0.
gcc/ChangeLog:
* value-range.cc (irange::operator==): Early bail on m_num_ranges
equal to 0.
|
|
There is no need to compare nonzero masks when comparing two VARYING
ranges, as they are always the same when range types are the same.
gcc/ChangeLog:
* value-range.cc (irange::legacy_equal_p): Remove nonozero mask
check when comparing VR_VARYING ranges.
|
|
gcc/ChangeLog:
* ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Do not compare
incompatible ranges in ipa-prop.
|
|
Remove unreliable test for IEEE_FMA(), which fails on powerpc.
Adjust stop codes for modes_1.f90.
2022-10-03 Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
gcc/testsuite/
PR fortran/107062
* gfortran.dg/ieee/fma_1.f90: Fix test.
* gfortran.dg/ieee/modes_1.f90: Fix test.
|
|
|
|
Obvious typo in diagnostics.
2022-10-02 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/107121
* tree-cfg.cc (verify_gimple_call): Fix a typo in diagnostics,
DEFFERED_INIT -> DEFERRED_INIT.
|
|
We need to perform static links by default on VxWorks, where the use
of shared libraries involves unusual steps compared to standard native
systems.
This has to be conveyed before the lang_specific_driver code gets
invoked (in particular for g++), so specs aren't available.
This change defines the GCC_DRIVER_HOST_INITIALIZATION macro for
VxWorks, to insert a -static option in case the user hasn't provided any
explicit indication on the command line of the kind of link desired.
While a HOST macro doesn't seem appropriate to control a target OS
driven behavior, this matches other uses and won't conflict as VxWorks
is not supported on any of the other configurations using this macro.
gcc/
* config/vxworks-driver.cc: New.
* config.gcc (*vxworks*): Add vxworks-driver.o in extra_gcc_objs.
* config/t-vxworks: Add vxworks-driver.o.
* config/vxworks.h (GCC_DRIVER_HOST_INITIALIZATION): New.
|
|
Working on the reintroduction of shared libraries support
(and of modules depending on shared libraries) exposed a few
test failures of simple c++ constructor tests on arm-vxworks7r2.
Investigation revealed that we were not linking the
crtstuff objects as needed from a compiler configured not to
have shared libs support, because of the ENABLE_SHARED_LIBGCC
guard in this piece of vxworks.h:
/* Setup the crtstuff begin/end we might need for dwarf EH registration
and/or INITFINI_ARRAY support for shared libs. */
#if (HAVE_INITFINI_ARRAY_SUPPORT && defined(ENABLE_SHARED_LIBGCC)) \
|| (DWARF2_UNWIND_INFO && !defined(CONFIG_SJLJ_EXCEPTIONS))
#define VX_CRTBEGIN_SPEC "%{!shared:vx_crtbegin.o%s;:vx_crtbeginS.o%s}"
crtstuff initfini array support is meant to be leveraged for
constructors regardless of whether the compiler also happens to be
configured with shared library support, so the guard on ENABLE_SHARED_LIBGCC
here is inappropriate.
This change just removes it,
2022-09-30 Olivier Hainque <hainque@adacore.com>
gcc/
* config/vxworks.h (VX_CRTBEGIN_SPEC, VX_CRTEND_SPEC): If
HAVE_INITFINI_ARRAY_SUPPORT, pick crtstuff objects regardless
of ENABLE_SHARED_LIBGCC.
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/100040
PR fortran/100029
* trans-expr.cc (gfc_conv_class_to_class): Add code to have
assumed-rank arrays recognized as full arrays and fix the type
of the array assignment.
(gfc_conv_procedure_call): Change order of code blocks such that
the free of ALLOCATABLE dummy arguments with INTENT(OUT) occurs
first.
gcc/testsuite/ChangeLog:
PR fortran/100029
* gfortran.dg/PR100029.f90: New test.
PR fortran/100040
* gfortran.dg/PR100040.f90: New test.
|
|
This replaces the unreachable default case in some cp_trait_kind
switches with an exhaustive listing of the trait codes that we don't
expect to see, so that when adding a new trait we'll get a helpful
-Wswitch warning if we forget to handle the new trait in a relevant
switch.
gcc/cp/ChangeLog:
* semantics.cc (trait_expr_value): Make cp_trait_kind switch
statement exhaustive.
(finish_trait_expr): Likewise.
(finish_trait_type): Likewise.
|
|
This was found when testing buildroot with linuxthreads enabled. In
this case, the build passes --disable-tls to the toolchain during
configuration. After building the OpenRISC toolchain it was still
generating TLS code sequences and causing linker failures such as:
..../or1k-buildroot-linux-uclibc-gcc -o gpsd-3.24/gpsctl .... -lusb-1.0 -lm -lrt -lnsl
..../ld: ..../sysroot/usr/lib/libusb-1.0.so: undefined reference to `__tls_get_addr'
This patch fixes this by disabling tls for the OpenRISC target when requested
via --disable-tls.
gcc/ChangeLog:
* config/or1k/or1k.cc (TARGET_HAVE_TLS): Only define if
HAVE_AS_TLS is defined.
Tested-by: Yann E. MORIN <yann.morin@orange.com>
|
|
This patch is a minimal fix for the recently-added
struct-component-kind-1.c test (which is currently failing to emit one
of the errors it expects in scan output). This fragment was erroneously
omitted from the second version of the patch posted previously:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602504.html
2022-10-01 Julian Brown <julian@codesourcery.com>
gcc/
* gimplify.cc (omp_group_base): Fix IF_PRESENT (no_create)
handling.
|
|
This patch improves handling of the Z bit in the status register in a
variety of ways to improve either the code size or code speed on various
H8 subtargets.
For example, we can test the zero/nonzero status of the upper byte of a
16 bit register using mov.b, we can move the Z or an inverted Z into a
QImode register profitably on some subtargets. We can move Z or an
inverted Z into the sign bit on the H8/SX profitably, etc.
gcc/
* config/h8300/h8300.md (HSI2): New iterator.
(eqne_invert): Similarly.
* config/h8300/testcompare.md (testhi_upper_z): New pattern.
(cmpqi_z, cmphi_z, cmpsi_z): Likewise.
(store_z_qi, store_z_i_qi, store_z_hi, store_z_hi_sb): New
define_insn_and_splits and/or define_insns.
(store_z_hi_neg, store_z_hi_and, store_z_<mode>): Likewise.
(store_z_<mode>_neg, store_z_<mode>_and, store_z): Likewise.
|
|
I noticed that we were ignoring all the special rules for when to use a
simple INIT_EXPR for array initialization from a CONSTRUCTOR, because
split_nonconstant_init_1 was also passing 1 to the from_array parameter.
Arguably that's the real bug, but I think we can be flexible.
The test that I noticed this with no longer fails without it.
gcc/cp/ChangeLog:
* init.cc (build_vec_init): Clear from_array for CONSTRUCTOR
initializer.
|
|
We were already converting the result of expand_vec_init_expr to void; we
need to do the same for split_nonconstant_init.
The test that I noticed this with no longer fails without it.
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_genericize_init): Also convert the result of
split_nonconstant_init to void.
|
|
gcc/
* tree-ssa-dom.cc (record_edge_info): Install correct version of
patch.
|
|
This change is based on commit 9fa26998a63d4b22b637ed8702520819e408a694
by Dehao Chen in vendors/google/heads/gcc-4_8.
Tested on x86_64-pc-linux-gnu.
gcc/ChangeLog:
* dwarf2out.cc (add_call_src_coords_attributes): Emit discriminators for inlined call sites.
|
|
|
|
While investigating a benchmark for optimization opportunities I came across single block loop which either iterates precisely once or forever. This is an interesting scenario as we can ignore the infinite looping path and treat any PHI nodes as degenerates. So more concretely let's consider this trivial testcase:
volatile void abort (void);
void
foo(int a)
{
int b = 0;
while (1)
{
if (!a)
break;
b = 1;
}
if (b != 0)
abort ();
}
Quick analysis shows that b's initial value is 0 and its value only changes if we enter an infinite loop. So if we get to the test b != 0, the only possible value b could have would be 0 and the test and its true arm can be eliminated.
The DOM3 dump looks something like this:
;; basic block 2, loop depth 0, count 118111600 (estimated locally), maybe hot
;; prev block 0, next block 3, flags: (NEW, VISITED)
;; pred: ENTRY [always] count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
;; succ: 3 [always] count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
;; basic block 3, loop depth 1, count 1073741824 (estimated locally), maybe hot
;; prev block 2, next block 4, flags: (NEW, VISITED)
;; pred: 2 [always] count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
;; 3 [89.0% (guessed)] count:955630224 (estimated locally) (FALSE_VALUE,EXECUTABLE)
# b_1 = PHI <0(2), 1(3)>
if (a_3(D) == 0)
goto <bb 4>; [11.00%]
else
goto <bb 3>; [89.00%]
;; succ: 4 [11.0% (guessed)] count:118111600 (estimated locally) (TRUE_VALUE,EXECUTABLE)
;; 3 [89.0% (guessed)] count:955630224 (estimated locally) (FALSE_VALUE,EXECUTABLE)
;; basic block 4, loop depth 0, count 118111600 (estimated locally), maybe hot
;; prev block 3, next block 5, flags: (NEW, VISITED)
;; pred: 3 [11.0% (guessed)] count:118111600 (estimated locally) (TRUE_VALUE,EXECUTABLE)
if (b_1 != 0)
goto <bb 5>; [0.00%]
else
goto <bb 6>; [100.00%]
;; succ: 5 [never] count:0 (precise) (TRUE_VALUE,EXECUTABLE)
;; 6 [always] count:118111600 (estimated locally) (FALSE_VALUE,EXECUTABLE)
This is a good representative of what the benchmark code looks like.
The primary effect we want to capture is to realize that the test if (b_1 != 0) is always false and optimize it accordingly.
In the benchmark, this opportunity is well hidden until after the loop optimizers have completed, so the first chance to capture this case is in DOM3. Furthermore, DOM wants loops normalized with latch blocks/edges. So instead of bb3 looping back to itself, there's an intermediate empty block during DOM.
I originally thought this was likely to only affect the benchmark. But when I instrumented the optimization and bootstrapped GCC, much to my surprise there were several hundred similar cases identified in GCC itself. So it's not as benchmark specific as I'd initially feared.
Anyway, detecting this in DOM is pretty simple. We detect the infinite loop, including the latch block. Once we've done that, we walk the PHI nodes and attach equivalences to the appropriate outgoing edge. That's all we need to do as the rest of DOM is already prepared to handle equivalences on edges.
gcc/
* tree-ssa-dom.cc (single_block_loop_p): New function.
(record_edge_info): Also record equivalences for the outgoing
edge of a single block loop where the condition is an invariant.
gcc/testsuite/
* gcc.dg/infinite-loop.c: New test.
|
|
It's a bit weird that free_dom_edge_info leaves a dangling pointer in e->aux.
Not sure what I was thinking.
There's two callers. One wipes e->aux immediately after the call, the other
attaches a newly created object immediately after the call. So we can wipe
e->aux within the call and simplify one of the two call sites.
This is preparatory work for a minor optimization where we want to detect
another class of edge equivalences in DOM (until something better is available)
and either attach them an existing edge_info structure or create a new one if
one doesn't currently exist for a given edge.
gcc/
* tree-ssa-dom.cc (free_dom_edge_info): Clear e->aux too.
(free_all_edge_infos): Do not clear e->aux here.
|
|
* target.def (TARGET_C_EXCESS_PRECISION): Document
-fexcess-precision=16.
|
|
I just happened to stuble on this one while trying to sort out the
RISC-V bits.
gcc/ChangeLog
* doc/tm.texi (TARGET_C_EXCESS_PRECISION): Add 16.
|
|
This fixes f19a327077e ("Support -fexcess-precision=16 which will enable
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.") on
RISC-V targets.
gcc/ChangeLog
PR target/106815
* config/riscv/riscv.cc (riscv_excess_precision): Add support
for EXCESS_PRECISION_TYPE_FLOAT16.
|
|
On Fri, Sep 30, 2022 at 09:54:49AM -0400, Jason Merrill wrote:
> > Note, there is one further problem on aarch64/arm, types with HFmode
> > (_Float16 and __fp16) are there mangled as Dh (which is standard
> > Itanium mangling:
> > ::= Dh # IEEE 754r half-precision floating point (16 bits)
> > ::= DF <number> _ # ISO/IEC TS 18661 binary floating point type _FloatN (N bits)
> > so in theory is also ok, but DF16_ is more specific. Should we just
> > change Dh to DF16_ in those backends, or should __fp16 there be distinct
> > type from _Float16 where __fp16 would mangle Dh and _Float16 DF16_ ?
>
> You argued for keeping __float128 separate from _Float128, does the same
> argument not apply to this case?
Actually, they already were distinct types that just mangled the same.
So the same issue that had to be solved on i?86, ia64 and rs6000 for
_Float64x vs. long double is a problem on arm and aarch64 with _Float16
vs. __fp16.
The following patch fixes it for arm after aarch64 has been changed
already before.
> > And there is csky, which mangles __fp16 (but only if type's name is __fp16,
> > not _Float16) as __fp16, that looks clearly invalid to me as it isn't
> > valid in the mangling grammar. So perhaps just nuke csky's mangle_type
> > and have it mangled as DF16_ by the generic code?
And seems even on csky __fp16 is distinct type from _Float16 (which is a
good thing for consistency, these 3 targets are the only ones that have
__fp16 type), so instead the patch handles it the same as on arm/aarch64,
Dh mangling for __fp16 and DF16_ for _Float16.
2022-09-30 Jakub Jelinek <jakub@redhat.com>
PR c++/107080
* config/arm/arm.cc (arm_mangle_type): Mangle just __fp16 as Dh
and _Float16 as DF16_.
* config/csky/csky.cc (csky_init_builtins): Fix a comment typo.
(csky_mangle_type): Mangle __fp16 as Dh and _Float16 as DF16_
rather than mangling __fp16 as __fp16.
* g++.target/arm/pr107080.C: New test.
|
|
Warnings issued for -Wuninitialized have been using the spelling location of
the problematic usage, discarding any information on the location of the macro
expansion point if such usage was in a macro. This makes the warnings
impossible to control reliably with #pragma GCC diagnostic, and also discards
useful context in the diagnostic output. There seems to be no need to discard
the virtual location information, so this patch fixes that.
PR69543 was mostly about _Pragma issues which have been fixed for many years
now. The PR remains open because two of the testcases added in response to it
still have xfails, but those xfails have nothing to do with _Pragma and rather
just with the issue fixed by this patch, so the PR can be closed now as well.
The other testcase modified here, pragma-diagnostic-2.c, was explicitly
testing for the undesirable behavior that was xfailed in pr69543-3.c. I have
adjusted that and also added a new testcase verifying all 3 types of warning
that come from tree-ssa-uninit.cc get the proper location information now.
gcc/ChangeLog:
PR preprocessor/69543
* tree-ssa-uninit.cc (warn_uninit): Stop stripping macro tracking
information away from the diagnostic location.
(maybe_warn_read_write_only): Likewise.
(maybe_warn_operand): Likewise.
gcc/testsuite/ChangeLog:
PR preprocessor/69543
* c-c++-common/pr69543-3.c: Remove xfail.
* c-c++-common/pr69543-4.c: Likewise.
* gcc.dg/cpp/pragma-diagnostic-2.c: Adjust test for new behavior.
* c-c++-common/pragma-diag-16.c: New test.
|
|
On Fri, Sep 30, 2022 at 09:54:49AM -0400, Jason Merrill wrote:
> > Note, there is one further problem on aarch64/arm, types with HFmode
> > (_Float16 and __fp16) are there mangled as Dh (which is standard
> > Itanium mangling:
> > ::= Dh # IEEE 754r half-precision floating point (16 bits)
> > ::= DF <number> _ # ISO/IEC TS 18661 binary floating point type _FloatN (N bits)
> > so in theory is also ok, but DF16_ is more specific. Should we just
> > change Dh to DF16_ in those backends, or should __fp16 there be distinct
> > type from _Float16 where __fp16 would mangle Dh and _Float16 DF16_ ?
>
> You argued for keeping __float128 separate from _Float128, does the same
> argument not apply to this case?
Actually, they already were distinct types that just mangled the same.
So the same issue that had to be solved on i?86, ia64 and rs6000 for
_Float64x vs. long double is a problem on arm and aarch64 with _Float16
vs. __fp16.
The following patch fixes it so far for aarch64.
2022-09-30 Jakub Jelinek <jakub@redhat.com>
PR c++/107080
* config/aarch64/aarch64.cc (aarch64_mangle_type): Mangle just __fp16
as Dh and _Float16 as DF16_.
* g++.target/aarch64/pr107080.C: New test.
|
|
The following testcase ICEs on x86 as well as ppc64le (the latter
with -mabi=ieeelongdouble), because _Float64x there isn't mangled as
DF64x but e or u9__ieee128 instead.
Those are the mangling that should be used for the non-standard
types with the same mode or for long double, but not for _Float64x.
All the 4 mangle_type targhook implementations start with
type = TYPE_MAIN_VARIANT (type);
so I think it is cleanest to handle it the same in all and return NULL
before the switches on mode or whatever other tests.
s390 doesn't actually have a bug, but while I was there, having
type = TYPE_MAIN_VARIANT (type);
if (TYPE_MAIN_VARIANT (type) == long_double_type_node)
looked useless to me.
Note, there is one further problem on aarch64/arm, types with HFmode
(_Float16 and __fp16) are there mangled as Dh (which is standard
Itanium mangling:
::= Dh # IEEE 754r half-precision floating point (16 bits)
::= DF <number> _ # ISO/IEC TS 18661 binary floating point type _FloatN (N bits)
so in theory is also ok, but DF16_ is more specific. Should we just
change Dh to DF16_ in those backends, or should __fp16 there be distinct
type from _Float16 where __fp16 would mangle Dh and _Float16 DF16_ ?
And there is csky, which mangles __fp16 (but only if type's name is __fp16,
not _Float16) as __fp16, that looks clearly invalid to me as it isn't
valid in the mangling grammar. So perhaps just nuke csky's mangle_type
and have it mangled as DF16_ by the generic code?
2022-09-30 Jakub Jelinek <jakub@redhat.com>
PR c++/107080
* config/i386/i386.cc (ix86_mangle_type): Always return NULL
for float128_type_node or float64x_type_node, don't check
float128t_type_node later on.
* config/ia64/ia64.cc (ia64_mangle_type): Always return NULL
for float128_type_node or float64x_type_node.
* config/rs6000/rs6000.cc (rs6000_mangle_type): Likewise.
Don't check float128_type_node later on.
* config/s390/s390.cc (s390_mangle_type): Don't use
TYPE_MAIN_VARIANT on type which was set to TYPE_MAIN_VARIANT
a few lines earlier.
* g++.dg/cpp23/ext-floating11.C: New test.
|
|
Checking that the triplet matches arm*-*-eabi (or msp430-*-*) is not
enough to know if the execution will enter an endless loop, or if it
will give a meaningful result. As the execution test only work when
VMA and LMA are equal, make sure that this condition is met.
gcc/ChangeLog:
* doc/sourcebuild.texi: Document new vma_equals_lma effective
target check.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_vma_equals_lma): New.
* c-c++-common/torture/attr-noinit-1.c: Requre VMA == LMA to run.
* c-c++-common/torture/attr-noinit-2.c: Likewise.
* c-c++-common/torture/attr-noinit-3.c: Likewise.
* c-c++-common/torture/attr-persistent-1.c: Likewise.
* c-c++-common/torture/attr-persistent-3.c: Likewise.
Co-Authored-By: Yvan ROUX <yvan.roux@foss.st.com>
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
The linker script should not be prefixed with "-Wl," - it's not an
input file and does not interfere with the new dump output filename
strategy.
gcc/testsuite/ChangeLog:
* lib/gcc-defs.exp: Do not prefix linker script with "-Wl,".
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
Add -m[no]-csr-check option in gcc part, when enable -mcsr-check option,
it will add csr-check in .option section and pass this to assembler.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_file_start): New .option.
* config/riscv/riscv.opt: New options.
* doc/invoke.texi: New definations.
|
|
Adding a new built-in trait currently involves manual boilerplate
consisting of defining an rid enumerator for the identifier as well as a
corresponding cp_trait_kind enumerator and handling them in various switch
statements, the exact set of which depends on whether the proposed trait
yields (and thus is recognized as) a type or an expression.
To streamline the process, this patch adds a central cp-trait.def file
that tabulates the essential details about each built-in trait (whether
it yields a type or an expression, its code, its spelling and its arity)
and uses this file to automate away the manual boilerplate. It also
migrates all the existing C++-specific built-in traits to use this
approach.
After this change, adding a new built-in trait just entails declaring
it in cp-trait.def and defining its behavior in finish_trait_expr/type
(and handling it in diagnose_trait_expr, if it's an expression-yielding
trait).
gcc/c-family/ChangeLog:
* c-common.cc (c_common_reswords): Use cp/cp-trait.def to handle
C++ traits.
* c-common.h (enum rid): Likewise.
gcc/cp/ChangeLog:
* constraint.cc (diagnose_trait_expr): Likewise.
* cp-objcp-common.cc (names_builtin_p): Likewise.
* cp-tree.h (enum cp_trait_kind): Likewise.
* cxx-pretty-print.cc (pp_cxx_trait): Likewise.
* parser.cc (cp_keyword_starts_decl_specifier_p): Likewise.
(cp_parser_primary_expression): Likewise.
(cp_parser_trait): Likewise.
(cp_parser_simple_type_specifier): Likewise.
* cp-trait.def: New file.
|
|
The ':' is reserved in filenames on Windows.
Without this patch, the test case failes with:
.../ben-1_a.C:4:8: error: failed to write compiled module: Invalid argument
.../ben-1_a.C:4:8: note: compiled module file is 'partitions/module:import.mod'
gcc/testsuite:
* g++.dg/modules/ben-1.map: Replace the colon with dash.
* g++.dg/modules/ben-1_a.C: Likewise
Co-Authored-By: Yvan ROUX <yvan.roux@foss.st.com>
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
As PR99888 and its related show, the current support for
-fpatchable-function-entry on powerpc ELFv2 doesn't work
well with global entry existence. For example, with one
command line option -fpatchable-function-entry=3,2, it got
below w/o this patch:
.LPFE1:
nop
nop
.type foo, @function
foo:
nop
.LFB0:
.cfi_startproc
.LCF0:
0: addis 2,12,.TOC.-.LCF0@ha
addi 2,2,.TOC.-.LCF0@l
.localentry foo,.-foo
, the assembly is unexpected since the patched nops have
no effects when being entered from local entry.
This patch is to update the nops patched before and after
local entry, it looks like:
.type foo, @function
foo:
.LFB0:
.cfi_startproc
.LCF0:
0: addis 2,12,.TOC.-.LCF0@ha
addi 2,2,.TOC.-.LCF0@l
nop
nop
.localentry foo,.-foo
nop
PR target/99888
PR target/105649
gcc/ChangeLog:
* doc/invoke.texi (option -fpatchable-function-entry): Adjust the
documentation for PowerPC ELFv2 ABI dual entry points.
* config/rs6000/rs6000-internal.h
(rs6000_print_patchable_function_entry): New function declaration.
* config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
Support patchable-function-entry by emitting nops before and after
local entry for the function that needs global entry.
* config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry): Skip
the function that needs global entry till global entry has been
emitted.
* config/rs6000/rs6000.h (struct machine_function): New bool member
global_entry_emitted.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr99888-1.c: New test.
* gcc.target/powerpc/pr99888-2.c: New test.
* gcc.target/powerpc/pr99888-3.c: New test.
* gcc.target/powerpc/pr99888-4.c: New test.
* gcc.target/powerpc/pr99888-5.c: New test.
* gcc.target/powerpc/pr99888-6.c: New test.
* c-c++-common/patchable_function_entry-default.c: Adjust for
powerpc_elfv2 to avoid compilation error.
|
|
As PR106516 shows, we can get unexpected gimple outputs for
function thud on some target which supports modulus operation
for vector int. This patch introduces one effective target
vect_int_mod for it, then adjusts the test case with it.
PR testsuite/106516
gcc/testsuite/ChangeLog:
* gcc.dg/pr104992.c: Adjust with vect_int_mod.
* lib/target-supports.exp (check_effective_target_vect_int_mod): New
effective target.
|