logical_stmt_cache::cacheable_p() returns true for relops, but
logical_combine (which does the caching) doesn't handle them and ICEs.
This patch fixes the inconsistency by returning false for relops.
This was working before because even though logical_combine doesn't
handle relops, statements with only one SSA are handled in cache_stmt,
which seems like the only statement we've ever encountered (even through
a full Fedora build).
lhs = s_5 > 999;
However, with two SSA operands we ICE because logical_combine doesn't
handle them:
lhs = s_5 > y_8;
We can either return false for relops in cacheable_p, or fix
logical_combine to handle them. The original idea was to only cache
ANDs and ORs, so I've done the former to unbreak trunk.
We can decide later if there was ever any benefit in caching relops.
gcc/ChangeLog:
PR tree-optimization/97359
* gimple-range-gori.cc (logical_stmt_cache::cacheable_p): Only
handle ANDs and ORs.
(gori_compute_cache::cache_stmt): Adjust comment.
gcc/testsuite/ChangeLog:
* gcc.dg/pr97359.c: New test.
|
|
|
|
gcc/ChangeLog:
2020-10-09 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/97313
* lra-constraints.c (match_reload): Don't keep strict_low_part in
reloads for non-registers.
gcc/testsuite/ChangeLog:
2020-10-09 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/97313
* gcc.target/i386/pr97313.c: New.
|
|
For sources which can't use any vector instructions, <x86intrin.h> and
<immintrin.h> cannot be included for compiler intrinsics:
$ echo "#include <x86intrin.h>" | gcc -S -O2 -mno-sse -mno-mmx -x c -
In file included from /usr/include/stdlib.h:1013,
from /usr/lib/gcc/x86_64-redhat-linux/10/include/mm_malloc.h:27,
from /usr/lib/gcc/x86_64-redhat-linux/10/include/xmmintrin.h:34,
from /usr/lib/gcc/x86_64-redhat-linux/10/include/immintrin.h:29,
from /usr/lib/gcc/x86_64-redhat-linux/10/include/x86intrin.h:32,
from <stdin>:1:
/usr/include/bits/stdlib-float.h: In function ‘atof’:
/usr/include/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
26 | {
| ^
$
libgcc/config/i386/shadow-stack-unwind.h has a workaround:
/* NB: We need _get_ssp and _inc_ssp from <cetintrin.h>. But we can't
include <x86intrin.h> which ends up including <mm_malloc.h>, which
includes <stdlib.h> and <errno.h> unconditionally. But we can't
include any libc system headers unconditionally from libgcc. Avoid
including <mm_malloc.h> here by defining _IMMINTRIN_H_INCLUDED. */
#define _IMMINTRIN_H_INCLUDED
#include <cetintrin.h>
#undef _IMMINTRIN_H_INCLUDED
Add a standalone intrinsic header file, <x86gprintrin.h>, to provide
integer only intrinsics. All integer only intrinsics are placed in
<x86gprintrin.h>. <x86intrin.h> and <immintrin.h> simply include
<x86gprintrin.h>.
gcc/
PR target/97148
* config.gcc (extra_headers): Add x86gprintrin.h.
* config/i386/adxintrin.h: Check _X86GPRINTRIN_H_INCLUDED for
<x86gprintrin.h>.
* config/i386/bmi2intrin.h: Likewise.
* config/i386/bmiintrin.h: Likewise.
* config/i386/cetintrin.h: Likewise.
* config/i386/cldemoteintrin.h: Likewise.
* config/i386/clflushoptintrin.h: Likewise.
* config/i386/clwbintrin.h: Likewise.
* config/i386/enqcmdintrin.h: Likewise.
* config/i386/fxsrintrin.h: Likewise.
* config/i386/ia32intrin.h: Likewise.
* config/i386/lwpintrin.h: Likewise.
* config/i386/lzcntintrin.h: Likewise.
* config/i386/movdirintrin.h: Likewise.
* config/i386/pconfigintrin.h: Likewise.
* config/i386/pkuintrin.h: Likewise.
* config/i386/rdseedintrin.h: Likewise.
* config/i386/rtmintrin.h: Likewise.
* config/i386/serializeintrin.h: Likewise.
* config/i386/tbmintrin.h: Likewise.
* config/i386/tsxldtrkintrin.h: Likewise.
* config/i386/waitpkgintrin.h: Likewise.
* config/i386/wbnoinvdintrin.h: Likewise.
* config/i386/xsavecintrin.h: Likewise.
* config/i386/xsaveintrin.h: Likewise.
* config/i386/xsaveoptintrin.h: Likewise.
* config/i386/xsavesintrin.h: Likewise.
* config/i386/xtestintrin.h: Likewise.
* config/i386/immintrin.h: Include <x86gprintrin.h> instead of
<fxsrintrin.h>, <xsaveintrin.h>, <xsaveoptintrin.h>,
<xsavesintrin.h>, <xsavecintrin.h>, <lzcntintrin.h>,
<bmiintrin.h>, <bmi2intrin.h>, <xtestintrin.h>, <cetintrin.h>,
<movdirintrin.h>, <sgxintrin.h>, <pconfigintrin.h>,
<waitpkgintrin.h>, <cldemoteintrin.h>, <enqcmdintrin.h>,
<serializeintrin.h>, <tsxldtrkintrin.h>, <adxintrin.h>,
<clwbintrin.h>, <clflushoptintrin.h>, <wbnoinvdintrin.h> and
<pkuintrin.h>.
(_wbinvd): Moved to config/i386/x86gprintrin.h.
(_rdrand16_step): Likewise.
(_rdrand32_step): Likewise.
(_rdpid_u32): Likewise.
(_readfsbase_u32): Likewise.
(_readfsbase_u64): Likewise.
(_readgsbase_u32): Likewise.
(_readgsbase_u64): Likewise.
(_writefsbase_u32): Likewise.
(_writefsbase_u64): Likewise.
(_writegsbase_u32): Likewise.
(_writegsbase_u64): Likewise.
(_rdrand64_step): Likewise.
(_ptwrite64): Likewise.
(_ptwrite32): Likewise.
* config/i386/x86gprintrin.h: New file.
* config/i386/x86intrin.h: Include <x86gprintrin.h>. Don't
include <ia32intrin.h>, <lwpintrin.h>, <tbmintrin.h>,
<popcntintrin.h>, <mwaitxintrin.h> and <clzerointrin.h>.
gcc/testsuite/
* gcc.target/i386/avx-1.c (__builtin_ia32_lwpval32): New to
support <lwpintrin.h> included in <x86gprintrin.h>.
(__builtin_ia32_lwpval64): Likewise.
(__builtin_ia32_lwpins32): Likewise.
(__builtin_ia32_lwpins64): Likewise.
(__builtin_ia32_bextri_u32): New to support <tbmintrin.h>
included in <x86gprintrin.h>.
(__builtin_ia32_bextri_u64): Likewise.
* gcc.target/i386/x86gprintrin-1.c: New test.
* gcc.target/i386/x86gprintrin-2.c: Likewise.
* gcc.target/i386/x86gprintrin-3.c: Likewise.
* gcc.target/i386/x86gprintrin-4.c: Likewise.
* gcc.target/i386/x86gprintrin-4a.c: Likewise.
* gcc.target/i386/x86gprintrin-5.c: Likewise.
* gcc.target/i386/x86gprintrin-5a.c: Likewise.
* gcc.target/i386/x86gprintrin-5b.c: Likewise.
* gcc.target/i386/x86gprintrin-6.c: Likewise.
libgcc/
PR target/97148
* config/i386/shadow-stack-unwind.h: Include <x86gprintrin.h>
instead of <cetintrin.h>.
|
|
The nvptx-as assembler verifies the ptx code using ptxas, if there's any
in the PATH.
The default in the nvptx port for -misa=sm_xx is sm_30, but the ptxas of the
latest cuda release (11.1) no longer supports sm_30.
Consequently we cannot build gcc against that release (although we should
still be able to build without any cuda release).
Fix this by setting -misa=sm_35 by default.
Tested check-gcc on nvptx.
Tested libgomp on x86_64-linux with nvptx accelerator.
Both built against cuda 9.1.
gcc/ChangeLog:
2020-10-09 Tom de Vries <tdevries@suse.de>
PR target/97348
* config/nvptx/nvptx.h (ASM_SPEC): Also pass -m to nvptx-as if
default is used.
* config/nvptx/nvptx.opt (misa): Init with PTX_ISA_SM35.
|
|
The following adds an effective target to properly allow
the gcc.dg/vect/pr65947-3.c expected vectorization to be adjusted
when run with, say, -march=cascadelake.
2020-10-09 Richard Biener <rguenther@suse.de>
gcc/
* doc/sourcebuild.texi (vect_masked_load): Document.
gcc/testsuite
* lib/target-supports.exp (check_effective_target_vect_masked_load):
New effective target.
* gcc.dg/vect/pr65947-3.c: Update.
|
|
We're running into a multiplication with one unvectorizable
operand we expect to build from scalars but SLP discovery
fatally fails the build of both since one stmt is commutated:
_60 = _58 * _59;
_63 = _59 * _62;
_66 = _59 * _65;
...
where _59 is the "bad" operand. The following patch makes the
case work where the first stmt has a good operand by not fatally
failing the SLP build for the operand but communicating upwards
how to commutate.
2020-10-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/97334
* tree-vect-slp.c (vect_build_slp_tree_1): Do not fatally
fail lanes other than zero when BB vectorizing.
* gcc.dg/vect/bb-slp-pr65935.c: Amend.
|
|
gcc/ChangeLog:
PR ipa/97292
PR ipa/97335
* ipa-modref-tree.h (copy_from): Drop summary in a
clone.
|
|
Just use edge insertion which will appropriately handle the situation
from botan.
2020-10-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/97347
* tree-vect-slp.c (vect_create_constant_vectors): Use
edge insertion when inserting on the fallthru edge,
appropriately insert at the start of BBs when inserting
after PHIs.
* g++.dg/vect/pr97347.cc: New testcase.
|
|
gcc/ChangeLog:
PR tree-optimization/97317
* range-op.cc (operator_cast::op1_range): Handle casts where the precision
of the RHS is only 1 greater than the precision of the LHS.
gcc/testsuite/ChangeLog:
* gcc.dg/pr97317.c: New test.
|
|
This fixes leaks discovered checking whether I introduced new ones
with the last vectorizer changes.
2020-10-09 Richard Biener <rguenther@suse.de>
* cgraphunit.c (expand_all_functions): Free tp_first_run_order.
* ipa-modref.c (pass_ipa_modref::execute): Free order.
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Free
loop body.
* tree-vect-data-refs.c (vect_find_stmt_data_reference): Free
data references upon failure.
* tree-vect-loop.c (update_epilogue_loop_vinfo): Free BBs
array of the original loop.
* tree-vect-slp.c (vect_slp_bbs): Use an auto_vec for
dataref_groups to release its memory.
|
|
> Perhaps another way out of this would be document and enforce that
> __builtin_c[lt]z{,l,ll} etc calls are undefined at zero, but C[TL]Z ifn
> calls are defined there based on *_DEFINED_VALUE_AT_ZERO (*) == 2
The following patch implements that, i.e. __builtin_c?z* now take full
advantage of them being UB at zero, while the ifns are well defined at zero
if *_DEFINED_VALUE_AT_ZERO (*) == 2. That is what fixes PR94801.
Furthermore, to fix PR97312, if it is well defined at zero and the value at
zero is prec, we don't lower the maximum unless the argument is known to be
non-zero.
For gimple-range.cc I guess we could improve it if needed e.g. by returning
a [0,7][32,32] range for .CTZ of e.g. [0,137], but for now it (roughly)
matches what vr-values.c does.
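For user code, the practical consequence is that a zero argument to the builtins has to be guarded explicitly. A minimal sketch, assuming GCC or a compatible compiler; ctz_or_width and clz_or_width are hypothetical helper names used only for illustration, not part of GCC:

```c
#include <limits.h>

/* Hypothetical helpers: __builtin_ctz and __builtin_clz are undefined
   for a zero argument, so test for zero first and return the bit width
   of the type in that case.  */
static inline int
ctz_or_width (unsigned int x)
{
  return x ? __builtin_ctz (x) : (int) (sizeof (x) * CHAR_BIT);
}

static inline int
clz_or_width (unsigned int x)
{
  return x ? __builtin_clz (x) : (int) (sizeof (x) * CHAR_BIT);
}
```

With the guard in place the compiler is free to assume a [0, prec-1] result for the builtin call itself, and the zero case is handled by ordinary code.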
2020-10-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94801
PR target/97312
* vr-values.c (vr_values::extract_range_basic) <CASE_CFN_CLZ,
CASE_CFN_CTZ>: When stmt is not an internal-fn call or
C?Z_DEFINED_VALUE_AT_ZERO is not 2, assume argument is not zero
and thus use [0, prec-1] range unless it can be further improved.
For CTZ, don't update maxi from upper bound if it was previously prec.
* gimple-range.cc (gimple_ranger::range_of_builtin_call) <CASE_CFN_CLZ,
CASE_CFN_CTZ>: Likewise.
* gcc.dg/tree-ssa/pr94801.c: New test.
|
|
And no testcase was included, I'm including one below.
Anyway, this PR and the other CTZ related discussions led me to discover a
bug I've made earlier, CLZ/CTZ builtins have unsigned arguments and e.g.
both the vr-values.c and now gimple-range.cc code heavily relies on that,
but __builtin_ffs has a signed operand and this optimization was incorrectly
making the operand signed too, so I guess it would greatly confuse VRP in
some cases.
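The identity behind the match.pd rule can be sketched in plain C. This is only an illustration of the FFS(nonzero) -> CTZ(nonzero) + 1 relationship with the operand viewed as unsigned; ffs_via_ctz is a hypothetical name, and the zero check is added here because the sketch, unlike the optimization, cannot assume a nonzero argument:

```c
/* Hypothetical illustration: for a nonzero argument, ffs equals
   ctz + 1 once the signed operand is converted to unsigned.  */
static inline int
ffs_via_ctz (int x)
{
  if (x == 0)
    return 0;  /* ffs (0) is 0; ctz (0) would be undefined.  */
  return __builtin_ctz ((unsigned int) x) + 1;
}
```

For a negative input such as -8 the result agrees with __builtin_ffs (-8), since the conversion to unsigned leaves the position of the lowest set bit unchanged.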
2020-10-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/97325
* match.pd (FFS(nonzero) -> CTZ(nonzero) + 1): Cast argument to
corresponding unsigned type.
* gcc.c-torture/execute/pr97325.c: New test.
|
|
gcc/testsuite/ChangeLog:
PR testsuite/97337
* gcc.dg/pr97315-1.c: Moved to...
* g++.dg/opt/pr97315-1.C: ...here.
|
|
This fixes a vector CTOR insertion issue when we try to insert after
a PHI node.
2020-10-09 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_create_constant_vectors): Properly insert
after PHIs.
* gcc.dg/vect/bb-slp-phis-1.c: New testcase.
|
|
|
|
Here we're trying to push into a<T>::c<N> in order to instantiate t<N>, but
were building a TYPENAME_TYPE for it because a<T> isn't open yet. Don't
do that when we know we're trying to enter the scope.
gcc/cp/ChangeLog:
PR c++/96805
PR c++/96199
* pt.c (tsubst_aggr_type): Don't build a TYPENAME_TYPE when
entering_scope.
(tsubst_template_decl): Use tsubst_aggr_type.
gcc/testsuite/ChangeLog:
PR c++/96805
* g++.dg/cpp0x/alias-decl-pr96805.C: New test.
|
|
This is a first step towards enabling the sincos optimization in Ada.
The issue this patch solves is that sincos takes the type to be looked
up with mathfn_built_in from variables or temporaries passed as
arguments to SIN and COS intrinsics. In Ada, different float types
may be used but, despite their representation equivalence, their
distinctness causes the optimization to be skipped, because they are
not the types that mathfn_built_in expects.
This patch introduces a function that maps intrinsics to the type
they're associated with, and uses that type, obtained from the
intrinsics used in calls to be optimized, to look up the corresponding
CEXPI intrinsic.
For the sake of defensive programming, when using the type obtained
from the intrinsic, it now checks that, if different types are found
for the used argument, or for other calls that use it, that the types
are interchangeable.
for gcc/ChangeLog
* builtins.c (mathfn_built_in_type): New.
* builtins.h (mathfn_built_in_type): Declare.
* tree-ssa-math-opts.c (execute_cse_sincos_1): Use it to
obtain the type expected by the intrinsic.
|
|
Rename our BU_P10_MISC_2 built-in define macro to be
BU_P10_POWERPC64_MISC_2. This more accurately reflects
that the macro includes the RS6000_BTM_POWERPC64 entry,
and matches the style we used for the P7 equivalent.
gcc/ChangeLog:
* config/rs6000/rs6000-builtin.def (BU_P10_MISC_2): Rename
to BU_P10_POWERPC64_MISC_2.
(CFUGED, CNTLZDM, CNTTZDM, PDEPD, PEXTD): Call renamed macro.
|
|
* tree-nrv.c (dest_safe_for_nrv_p): Disable tbaa in
call_may_clobber_ref_p and ref_maybe_used_by_stmt_p.
* tree-tailcall.c (find_tail_calls): Likewise.
* tree-ssa-alias.c (call_may_clobber_ref_p): Add tbaa_p parameter.
* tree-ssa-alias.h (call_may_clobber_ref_p): Update prototype.
* tree-ssa-sccvn.c (vn_reference_lookup_3): Pass data->tbaa_p
to call_may_clobber_ref_p_1.
|
|
When gas outputs DWARF5 .debug_line[_str] then we have to tell it the
comp_dir and main file name for the zero entry line table. Otherwise
gas has to guess at the CU compilation directory and file.
Before a gcc -gdwarf-5 ../src/hello.c line table looked like:
Directory table:
0 ../src (24)
1 ../src (24)
2 /usr/include (31)
File name table:
0 hello.c (16), 0
1 hello.c (16), 1
2 stdio.h (44), 2
With this patch it looks like:
Directory table:
0 /tmp/obj (0)
1 ../src (24)
2 /usr/include (31)
File name table:
0 ../src/hello.c (9), 0
1 hello.c (16), 1
2 stdio.h (44), 2
gcc/ChangeLog:
* dwarf2out.c (dwarf2out_finish): Emit .file 0 entry when
generating DWARF5 .debug_line table through gas.
|
|
2020-10-08 John Henning <john.henning@oracle.com>
gcc/
PR other/97309
* doc/invoke.texi: Improve documentation of
-fallow-store-data-races.
|
|
__arm_vcvtnq_u32_f32 was missing from arm_mve.h, although the s32_f32 and
[su]16_f16 versions were present.
This patch adds the missing version and testcase, which are
cut-and-paste from the other versions.
2020-10-08 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
PR target/96914
* config/arm/arm_mve.h (__arm_vcvtnq_u32_f32): New.
gcc/testsuite/
PR target/96914
* gcc.target/arm/mve/intrinsics/vcvtnq_u32_f32.c: New test.
|
|
This work from Martin Liska was motivated by gcc.dg/vect/bb-slp-22.c
which shows how poorly we currently BB vectorize code like
a0 = in[0] + 23;
a1 = in[1] + 142;
a2 = in[2] + 2;
a3 = in[3] + 31;
if (x > y)
{
b[0] = a0;
b[1] = a1;
b[2] = a2;
b[3] = a3;
}
else
{
out[0] = a0 * (x + 1);
out[1] = a1 * (y + 1);
out[2] = a2 * (x + 1);
out[3] = a3 * (y + 1);
}
namely by vectorizing the stores but not the common load (and add)
they are fed with.
Thus with the following patch we change the BB vectorizer from
operating on a single basic-block at a time to consider somewhat
larger regions (but not the whole function yet because of issues
with vector size iteration).
I took the opportunity to remove the fancy region iterations again
now that we operate on BB granularity and in the end need to visit
PHI nodes as well.
2020-10-08 Martin Liska <mliska@suse.cz>
Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_bb_vec_info::const_iterator): Remove.
(_bb_vec_info::const_reverse_iterator): Likewise.
(_bb_vec_info::region_stmts): Likewise.
(_bb_vec_info::reverse_region_stmts): Likewise.
(_bb_vec_info::_bb_vec_info): Adjust.
(_bb_vec_info::bb): Remove.
(_bb_vec_info::region_begin): Remove.
(_bb_vec_info::region_end): Remove.
(_bb_vec_info::bbs): New vector of BBs.
(vect_slp_function): Declare.
* tree-vect-patterns.c (vect_determine_precisions): Use
regular stmt iteration.
(vect_pattern_recog): Likewise.
* tree-vect-slp.c: Include cfganal.h, tree-eh.h and tree-cfg.h.
(vect_build_slp_tree_1): Properly refuse to vectorize
volatile and throwing stmts.
(vect_build_slp_tree_2): Pass group-size down to
get_vectype_for_scalar_type.
(_bb_vec_info::_bb_vec_info): Use regular stmt iteration,
adjust for changed region specification.
(_bb_vec_info::~_bb_vec_info): Likewise.
(vect_slp_check_for_constructors): Likewise.
(vect_slp_region): Likewise.
(vect_slp_bbs): New worker operating on a vector of BBs.
(vect_slp_bb): Wrap it.
(vect_slp_function): New function splitting the function
into multi-BB regions.
(vect_create_constant_vectors): Handle the case of inserting
after a throwing def.
(vect_schedule_slp_instance): Adjust.
* tree-vectorizer.c (vec_info::remove_stmt): Simplify again.
(vec_info::insert_seq_on_entry): Adjust.
(pass_slp_vectorize::execute): Also init PHIs. Call
vect_slp_function.
* gcc.dg/vect/bb-slp-22.c: Adjust.
* gfortran.dg/pr68627.f: Likewise.
|
|
This fixes bad placement of sunk loads.
2020-10-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/97330
* tree-ssa-sink.c (statement_sink_location): Avoid skipping
PHIs when they dominate the insert location.
* gcc.dg/torture/pr97330-1.c: New testcase.
* gcc.dg/torture/pr97330-2.c: Likewise.
|
|
* ipa-modref.c (get_access): Fix handling of offsets.
* tree-ssa-alias.c (modref_may_conflict): Watch for overflows.
|
|
gcc/ChangeLog:
* dbgcnt.def (DEBUG_COUNTER): Add ipa_mod_ref debug counter.
* tree-ssa-alias.c (modref_may_conflict): Handle the counter.
|
|
This adjusts BB vectorization testcases to look for the number of
SLP subgraphs vectorized rather than for the number of basic blocks
we've found opportunities in because followup patches will play
with the granularity we work on, vectorizing multiple basic blocks
at a time.
Together with this, because I noticed when looking at non-obvious
mismatches, I avoid analyzing group-size 1 SLP instances which
result in pointless V1mode vectorizations.
It might be interesting to work on adding something like
dg-warning to look for -fopt-info-{optimized,missing} so
we could directly annotate (not) vectorized loops instead of
relying on fragile counts.
2020-10-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.c (try_vectorize_loop_1): Do not dump
"basic block vectorized".
(pass_slp_vectorize::execute): Likewise.
* tree-vect-slp.c (vect_analyze_slp_instance): Avoid
re-analyzing split single stmts.
* g++.dg/vect/slp-pr50819.cc: Adjust.
* gcc.dg/vect/bb-slp-1.c: Adjust.
* gcc.dg/vect/bb-slp-10.c: Adjust.
* gcc.dg/vect/bb-slp-11.c: Adjust.
* gcc.dg/vect/bb-slp-13.c: Adjust.
* gcc.dg/vect/bb-slp-14.c: Adjust.
* gcc.dg/vect/bb-slp-15.c: Adjust.
* gcc.dg/vect/bb-slp-16.c: Adjust.
* gcc.dg/vect/bb-slp-17.c: Adjust.
* gcc.dg/vect/bb-slp-18.c: Adjust.
* gcc.dg/vect/bb-slp-19.c: Adjust.
* gcc.dg/vect/bb-slp-2.c: Adjust.
* gcc.dg/vect/bb-slp-20.c: Adjust.
* gcc.dg/vect/bb-slp-21.c: Adjust.
* gcc.dg/vect/bb-slp-22.c: Adjust.
* gcc.dg/vect/bb-slp-23.c: Adjust.
* gcc.dg/vect/bb-slp-24.c: Adjust.
* gcc.dg/vect/bb-slp-25.c: Adjust.
* gcc.dg/vect/bb-slp-26.c: Adjust.
* gcc.dg/vect/bb-slp-27.c: Adjust.
* gcc.dg/vect/bb-slp-28.c: Adjust.
* gcc.dg/vect/bb-slp-29.c: Adjust.
* gcc.dg/vect/bb-slp-3.c: Adjust.
* gcc.dg/vect/bb-slp-30.c: Adjust.
* gcc.dg/vect/bb-slp-31.c: Adjust.
* gcc.dg/vect/bb-slp-34.c: Adjust.
* gcc.dg/vect/bb-slp-35.c: Adjust.
* gcc.dg/vect/bb-slp-36.c: Adjust.
* gcc.dg/vect/bb-slp-38.c: Adjust.
* gcc.dg/vect/bb-slp-4.c: Adjust.
* gcc.dg/vect/bb-slp-45.c: Adjust.
* gcc.dg/vect/bb-slp-46.c: Adjust.
* gcc.dg/vect/bb-slp-48.c: Adjust.
* gcc.dg/vect/bb-slp-5.c: Adjust.
* gcc.dg/vect/bb-slp-6.c: Adjust.
* gcc.dg/vect/bb-slp-7.c: Adjust.
* gcc.dg/vect/bb-slp-8.c: Adjust.
* gcc.dg/vect/bb-slp-8a.c: Adjust.
* gcc.dg/vect/bb-slp-8b.c: Adjust.
* gcc.dg/vect/bb-slp-9.c: Adjust.
* gcc.dg/vect/bb-slp-div-2.c: Adjust.
* gcc.dg/vect/bb-slp-over-widen-1.c: Adjust.
* gcc.dg/vect/bb-slp-over-widen-2.c: Adjust.
* gcc.dg/vect/bb-slp-pattern-2.c: Adjust.
* gcc.dg/vect/bb-slp-pow-1.c: Adjust.
* gcc.dg/vect/bb-slp-pr58135.c: Adjust.
* gcc.dg/vect/bb-slp-pr65935.c: Adjust.
* gcc.dg/vect/bb-slp-pr78205.c: Adjust.
* gcc.dg/vect/bb-slp-pr81635-1.c: Adjust.
* gcc.dg/vect/bb-slp-pr81635-3.c: Adjust.
* gcc.dg/vect/bb-slp-pr95839-2.c: Adjust.
* gcc.dg/vect/bb-slp-pr95839.c: Adjust.
* gcc.dg/vect/bb-slp-pr95866.c: Adjust.
* gcc.dg/vect/bb-slp-subgroups-1.c: Adjust.
* gcc.dg/vect/bb-slp-subgroups-2.c: Adjust.
* gcc.dg/vect/bb-slp-subgroups-3.c: Adjust.
* gcc.dg/vect/fast-math-bb-slp-call-1.c: Adjust.
* gcc.dg/vect/no-tree-reassoc-bb-slp-12.c: Adjust.
* gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c: Adjust.
* gfortran.dg/vect/pr62283-2.f: Adjust.
* gcc.target/i386/pr68961.c: Adjust.
* gcc.target/i386/pr84101.c: Adjust.
* gcc.dg/vect/bb-slp-pr81635-2.c: Adjust.
* gcc.dg/vect/bb-slp-pr81635-4.c: Adjust.
* gcc.dg/vect/fast-math-bb-slp-call-2.c: Adjust.
* gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c: Adjust.
* gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c: Adjust.
* gcc.dg/vect/bb-slp-div-1.c: Adjust.
* gcc.dg/vect/bb-slp-pr90006.c: Adjust.
* g++.dg/vect/slp-pr50413.cc: Adjust.
|
|
A few MVE intrinsics had an unsigned variant implemented although they are
not supported by the hardware. This patch removes them:
__arm_vqrdmlashq_n_u8
__arm_vqrdmlahq_n_u8
__arm_vqdmlahq_n_u8
__arm_vqrdmlashq_n_u16
__arm_vqrdmlahq_n_u16
__arm_vqdmlahq_n_u16
__arm_vqrdmlashq_n_u32
__arm_vqrdmlahq_n_u32
__arm_vqdmlahq_n_u32
__arm_vmlaldavaxq_p_u32
__arm_vmlaldavaxq_p_u16
2020-10-08 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
PR target/96914
* config/arm/arm_mve.h (vqrdmlashq_n_u8, vqrdmlashq_n_u16)
(vqrdmlashq_n_u32, vqrdmlahq_n_u8, vqrdmlahq_n_u16)
(vqrdmlahq_n_u32, vqdmlahq_n_u8, vqdmlahq_n_u16, vqdmlahq_n_u32)
(vmlaldavaxq_p_u16, vmlaldavaxq_p_u32): Remove.
* config/arm/arm_mve_builtins.def (vqrdmlashq_n_u, vqrdmlahq_n_u)
(vqdmlahq_n_u, vmlaldavaxq_p_u): Remove.
* config/arm/unspecs.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
(VQRDMLASHQ_N_U)
(VMLALDAVAXQ_P_U): Remove unspecs.
* config/arm/iterators.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
(VQRDMLASHQ_N_U, VMLALDAVAXQ_P_U): Remove attributes.
(VQDMLAHQ_N, VQRDMLAHQ_N, VQRDMLASHQ_N, VMLALDAVAXQ_P): Remove
unsigned variants from iterators.
* config/arm/mve.md (mve_vqdmlahq_n_<supf><mode>)
(mve_vqrdmlahq_n_<supf><mode>)
(mve_vqrdmlashq_n_<supf><mode>, mve_vmlaldavaxq_p_<supf><mode>):
Update comment.
gcc/testsuite/
PR target/96914
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c: Remove.
|
|
This patch adds:
vqdmlashq_m_n_s16
vqdmlashq_m_n_s32
vqdmlashq_m_n_s8
vqdmlashq_n_s16
vqdmlashq_n_s32
vqdmlashq_n_s8
2020-10-08 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
PR target/96914
* config/arm/arm_mve.h (vqdmlashq, vqdmlashq_m): Define.
* config/arm/arm_mve_builtins.def (vqdmlashq_n_s)
(vqdmlashq_m_n_s): New.
* config/arm/unspecs.md (VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New
unspecs.
* config/arm/iterators.md (VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New
attributes.
(VQDMLASHQ_N): New iterator.
* config/arm/mve.md (mve_vqdmlashq_n_, mve_vqdmlashq_m_n_s): New
patterns.
gcc/testsuite/
PR target/96914
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c: New test.
|
|
The arm target hook for divmod wasn't prepared to handle constants passed to
the function.
2020-10-08 Jakub Jelinek <jakub@redhat.com>
PR target/97322
* config/arm/arm.c (arm_expand_divmod_libfunc): Pass mode instead of
GET_MODE (op0) or GET_MODE (op1) to emit_library_call_value.
* gcc.dg/pr97322.c: New test.
|
|
gcc/ChangeLog:
PR tree-optimization/97325
* gimple-range.cc (gimple_ranger::range_of_builtin_call): Handle
negative numbers in __builtin_ffs and __builtin_popcount.
|
|
gcc/ChangeLog:
PR tree-optimization/97315
* range-op.cc (value_range_with_overflow): Change any
non-overflow calculation in which both bounds are
overflow/underflow to be undefined.
gcc/testsuite/ChangeLog:
* gcc.dg/pr97315-2.c: New test.
|
|
gcc/ChangeLog:
PR tree-optimization/97315
* gimple-ssa-evrp.c (hybrid_folder::choose_value): Removes the
trap and instead annotates the listing.
gcc/testsuite/ChangeLog:
* gcc.dg/pr97315-1.c: New test.
|
|
The following testcase FAILs, because we don't mark the child OpenMP function
as cfun->calls_alloca when it does call alloca. When optimizing, during DCE we
reset those flags and recompute them again, but with -O0 DCE is not performed.
Fixed by calling notice_special_calls when moving insns to the child function.
cfun->calls_alloca is normally set during gimplification and most of the
alloca calls omp-low.c does go through the gimplifier, but one spot didn't
and built the gcall directly, so that one needs to set calls_alloca too.
2020-10-08 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/97294
* tree-cfg.c (move_block_to_fn): Call notice_special_calls on
call stmts being moved into dest_cfun.
* omp-low.c (lower_rec_input_clauses): Set cfun->calls_alloca when
adding __builtin_alloca_with_align call without gimplification.
* gcc.dg/asan/pr97294.c: New test.
|
|
This patch fixes an "unguarded" call to coerce_template_parms in
build_standard_check: processing_template_decl could be zero if we
get here during processing of the first 'auto' parameter of an
abbreviated function template, or if we're processing the type
constraint of a non-templated variable. In the testcase below, this
leads to an ICE when coerce_template_parms instantiates C's dependent
default template argument.
gcc/cp/ChangeLog:
PR c++/97052
* constraint.cc (build_type_constraint): Temporarily increment
processing_template_decl before calling build_concept_check.
* pt.c (make_constrained_placeholder_type): Likewise.
gcc/testsuite/ChangeLog:
PR c++/97052
* g++.dg/cpp2a/concepts-defarg2.C: New test.
|
|
In the testcase below, during processing (at parse time) of Y's base
class X<Y>, convert_template_argument calls is_compatible_template_arg
to check if the template argument Y is no more constrained than the
parameter P. But at this point we haven't yet set Y's constraints, so
get_normalized_constraints_from_decl yields NULL_TREE as the normal form
and caches this result into the normalized_map.
We set Y's constraints later in cp_parser_class_specifier_1 but the
stale normal form in the normalized_map remains. This ultimately causes
us to miss the constraint failure for Y<Z> because according to the
cached normal form, Y is not constrained.
This patch fixes this issue by moving up the call to
associate_classtype_constraints so that we set constraints before we
start processing a class's bases.
gcc/cp/ChangeLog:
PR c++/96229
* parser.c (cp_parser_class_specifier_1): Move call to
associate_classtype_constraints from here to ...
(cp_parser_class_head): ... here.
* pt.c (is_compatible_template_arg): Correct documentation to
say "argument is _no_ more constrained than the parameter".
gcc/testsuite/ChangeLog:
PR c++/96229
* g++.dg/cpp2a/concepts-class2.C: New test.
|
|
|
|
To quickly recap, P0846 says that a name is also considered to refer to
a template if it is an unqualified-id followed by a < and name lookup
finds either one or more functions or finds nothing.
In a template, when parsing a function call that has type-dependent
arguments, we can't perform ADL right away so we set KOENIG_LOOKUP_P in
the call to remember to do it when instantiating the call
(tsubst_copy_and_build/CALL_EXPR). When the called function is a
function template, we represent the call with a TEMPLATE_ID_EXPR;
usually the operand is an OVERLOAD.
In the P0846 case though, the operand can be an IDENTIFIER_NODE, when
name lookup found nothing when parsing the template name. But we
weren't handling this correctly in tsubst_copy_and_build. First
we need to pass the FUNCTION_P argument from <case TEMPLATE_ID_EXPR> to
<case IDENTIFIER_NODE>, otherwise we give a bogus error. And then in
<case CALL_EXPR> we need to perform ADL. The rest of the changes are to
give better errors when ADL didn't find anything.
gcc/cp/ChangeLog:
PR c++/97010
* pt.c (tsubst_copy_and_build) <case TEMPLATE_ID_EXPR>: Call
tsubst_copy_and_build explicitly instead of using the RECUR macro.
Handle a TEMPLATE_ID_EXPR with an IDENTIFIER_NODE as its operand.
<case CALL_EXPR>: Perform ADL for a TEMPLATE_ID_EXPR with an
IDENTIFIER_NODE as its operand.
gcc/testsuite/ChangeLog:
PR c++/97010
* g++.dg/cpp2a/fn-template21.C: New test.
* g++.dg/cpp2a/fn-template22.C: New test.
|
|
This option is no longer needed. There is no crash without it since
at least gcc-9.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/260157
|
|
match.sh was not correctly handling build constraints for Go versions
that have a two-digit suffix, like "go1.10".
The same issue will arise with Go 1.100, but that is a long ways off.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/260077
|
|
* common.opt (-fevrp-mode): Rename and move...
* params.opt (--param=evrp-mode): ...here.
* gimple-range.h (DEBUG_RANGE_CACHE): Use param_evrp_mode instead
of flag_evrp_mode.
* gimple-ssa-evrp.c (rvrp_folder): Same.
(hybrid_folder): Same.
(execute_early_vrp): Same.
|
|
This improves the heuristic for finding a load sink location that does
not cross any store.
2020-10-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/97307
* tree-ssa-sink.c (statement_sink_location): Change heuristic
for not skipping stores to look for virtual definitions
rather than uses.
* gcc.dg/tree-ssa/ssa-sink-17.c: New testcase.
* gcc.dg/vect/pr65947-3.c: XFAIL.
|
|
cp_tree_equal currently considers alignof the same as __alignof__, but
these operators are semantically different ever since r8-7957. In the
testcase below, this causes the second static_assert to fail on targets
where alignof(double) != __alignof__(double) because the specialization
table (which uses cp_tree_equal as its equality predicate) conflates the
two dependent specializations integral_constant<__alignof__(T)> and
integral_constant<alignof(T)>.
This patch makes cp_tree_equal distinguish between these two operators
by inspecting the ALIGNOF_EXPR_STD_P flag.
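The same distinction between the standard and the GNU operator can be observed from C, where C11 _Alignof plays the role of C++ alignof. The sketch below assumes GCC or a compatible compiler in C11 mode or later; the function names are hypothetical. On many 64-bit targets both values are the same, while 32-bit x86 is the usual example of a target where the ABI-required alignment of double is smaller than the preferred one:

```c
#include <stddef.h>

/* C11 _Alignof reports the minimum alignment the ABI requires for the
   type, like C++ alignof.  */
static inline size_t
std_align_of_double (void)
{
  return _Alignof (double);
}

/* The GNU __alignof__ extension reports the preferred alignment, which
   may be larger than the required minimum.  */
static inline size_t
gnu_align_of_double (void)
{
  return __alignof__ (double);
}
```

Because the two operators can legitimately produce different values, treating expressions built from them as interchangeable is unsafe whenever the values differ.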
gcc/cp/ChangeLog:
PR c++/88115
PR libstdc++/97273
* tree.c (cp_tree_equal) <case ALIGNOF_EXPR>: Return false if
ALIGNOF_EXPR_STD_P differ.
gcc/testsuite/ChangeLog:
PR c++/88115
PR libstdc++/97273
* g++.dg/template/alignof3.C: New test.
|
|
Allocate the memory in an approved portable way.
gcc/ChangeLog:
2020-10-06 Andrew MacLeod <amacleod@redhat.com>
* value-range.h (irange_allocator::allocate): Allocate in two hunks
instead of using the variably-sized trailing array approach.
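The general idea of allocating in two hunks can be sketched in plain C as follows. This is not the actual irange_allocator code: struct range_buf, range_buf_alloc and range_buf_free are hypothetical names, and malloc/calloc stand in for whatever allocator the real code uses:

```c
#include <stdlib.h>

/* Hypothetical stand-in for an object with out-of-line storage.  */
struct range_buf
{
  unsigned num_pairs;  /* Capacity, in pairs.  */
  long *bounds;        /* 2 * num_pairs entries, allocated separately.  */
};

/* Allocate the header and the bounds array as two separate hunks rather
   than over-allocating a struct that ends in a trailing array.
   NUM_PAIRS is expected to be nonzero.  */
static struct range_buf *
range_buf_alloc (unsigned num_pairs)
{
  struct range_buf *r = malloc (sizeof *r);
  if (!r)
    return NULL;
  r->bounds = calloc (2 * (size_t) num_pairs, sizeof *r->bounds);
  if (!r->bounds)
    {
      free (r);
      return NULL;
    }
  r->num_pairs = num_pairs;
  return r;
}

/* Release both hunks.  */
static void
range_buf_free (struct range_buf *r)
{
  if (r)
    {
      free (r->bounds);
      free (r);
    }
}
```

The cost is one extra pointer and one extra allocation per object; in exchange the layout no longer depends on indexing past the declared end of a member array.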
|
|
2020-07-10 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/47469
* trans-expr.c (arrayfunc_assign_needs_temporary): Tidy detection
of pointer and allocatable functions.
|
|
gcc/analyzer/ChangeLog:
PR analyzer/97116
* sm-malloc.cc (method_p): New.
(describe_argument_index): New.
(inform_nonnull_attribute): Use describe_argument_index.
(possible_null_arg::describe_final_event): Likewise.
(null_arg::describe_final_event): Likewise.
gcc/testsuite/ChangeLog:
PR analyzer/97116
* g++.dg/analyzer/pr97116.C: New test.
|
|
The path-printing default of -fdiagnostics-path-format=inline-events
interacted poorly with -fdiagnostics-plain-output, so it makes most
sense to add -fdiagnostics-path-format=separate-events to
-fdiagnostics-plain-output.
Seen when adding an experimental analyzer plugin to gcc.dg/plugin.exp.
gcc/ChangeLog:
* doc/invoke.texi (-fdiagnostics-plain-output): Add
-fdiagnostics-path-format=separate-events to list of
options injected by -fdiagnostics-plain-output.
* opts-common.c (decode_cmdline_options_to_array): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/analyzer/analyzer.exp (DEFAULT_CXXFLAGS): Remove
-fdiagnostics-path-format=separate-events.
* gcc.dg/analyzer/analyzer.exp (DEFAULT_CFLAGS): Likewise.
* gcc.dg/plugin/diagnostic-path-format-default.c: Rename to...
* gcc.dg/plugin/diagnostic-path-format-plain.c: ...this. Remove
dg-options directive. Copy remainder of test from
diagnostic-path-format-separate-events.c.
* gcc.dg/plugin/diagnostic-test-paths-2.c: Add
-fdiagnostics-path-format=inline-events to options.
Fix expected output for location of conditional within "for" loop.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Update for
renaming.
* gfortran.dg/analyzer/analyzer.exp (DEFAULT_FFLAGS): Remove
-fdiagnostics-path-format=separate-events.
|
|
This patch improves block-scope extern handling by always injecting a
hidden copy into the enclosing namespace (or using a match already
there). This hidden copy will be revealed if the user explicitly
declares it later. We can get from the DECL_LOCAL_DECL_P local extern
to the alias via DECL_LOCAL_DECL_ALIAS. This fixes several bugs and
removes the kludgy per-function extern_decl_map. We only do this
pushing for non-dependent local externs -- dependent ones will be
pushed during instantiation.
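A minimal sketch of the user-visible situation, assuming nothing beyond standard C++: a block-scope extern names an entity of the enclosing namespace, and a later explicit namespace-scope declaration refers to that same entity. The names value, read_value, read_directly and check_same_entity are hypothetical and do not come from the patch or its testcases.

```cpp
#include <cassert>

int read_value() {
  // Block-scope extern: declares ::value with external linkage, but the
  // name is only visible inside this block.
  extern int value;
  return value;
}

// Later explicit declaration (here also the definition) of the same
// entity at namespace scope.
int value = 7;

int read_directly() { return value; }

// Both functions must observe the same object.
void check_same_entity() {
  assert(read_value() == read_directly());
}
```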
User code that expected to be able to handle incompatible local
externs in different block-scopes will no longer work. That code is
ill-formed (and always was, despite what 31775 claimed). I had to
adjust a number of testcases that fell into this.
I tried using DECL_VALUE_EXPR, but that didn't work out. Due to
constexpr requirements we have to do the replacement very late (it
happens in the gimplifier). Consider:
extern int l[]; // #1
constexpr bool foo ()
{
extern int l[3]; // this does not complete the type of decl #1
constexpr int *p = &l[2]; // ok
return !p;
}
This requirement, coupled with our use of the common folding machinery
makes pr97306 hard to fix, as we end up with an expression containing
the two different decls for 'l', and only the c++ FE knows how to
reconcile those. I punted on this.
gcc/cp/
* cp-tree.h (struct language_function): Delete extern_decl_map.
(DECL_LOCAL_DECL_ALIAS): New.
* name-lookup.h (is_local_extern): Delete.
* name-lookup.c (set_local_extern_decl_linkage): Replace with ...
(push_local_extern_decl): ... this new function.
(do_pushdecl): Call new function after pushing new decl. Unhide
hidden non-functions.
(is_local_extern): Delete.
* decl.c (layout_var_decl): Do not allow VLA local externs.
* decl2.c (mark_used): Also mark DECL_LOCAL_DECL_ALIAS. Drop old
local-extern treatment.
* parser.c (cp_parser_oacc_declare): Deal with local extern aliases.
* pt.c (tsubst_expr): Adjust local extern instantiation.
* cp-gimplify.c (cp_genericize_r): Remap DECL_LOCAL_DECLs.
gcc/testsuite/
* g++.dg/cpp0x/lambda/lambda-sfinae1.C: Avoid ill-formed local extern
* g++.dg/init/pr42844.C: Add expected error.
* g++.dg/lookup/extern-redecl1.C: Likewise.
* g++.dg/lookup/koenig15.C: Avoid ill-formed.
* g++.dg/lto/pr95677.C: New.
* g++.dg/other/nested-extern-1.C: Correct expected behaviour.
* g++.dg/other/nested-extern-2.C: Likewise.
* g++.dg/other/nested-extern.cc: Split ...
* g++.dg/other/nested-extern-1.cc: ... here ...
* g++.dg/other/nested-extern-2.cc: ... here.
* g++.dg/template/scope5.C: Avoid ill-formed
* g++.old-deja/g++.law/missed-error2.C: Allow extension.
* g++.old-deja/g++.pt/crash3.C: Add expected error.
|
|
As the FIXME which this patch removes states, the current code does
not work when a call with multiple speculative targets gets resolved
through parameter tracking during inlining - it feeds the inliner an
edge it has already dealt with. The patch makes the code which should
prevent this aware that a speculation can now have more than one
target.
gcc/ChangeLog:
2020-09-30 Martin Jambor <mjambor@suse.cz>
PR ipa/96394
* ipa-prop.c (update_indirect_edges_after_inlining): Do not add
resolved speculation edges to vector of new direct edges even in
presence of multiple speculative direct edges for a single call.
gcc/testsuite/ChangeLog:
2020-09-30 Martin Jambor <mjambor@suse.cz>
PR ipa/96394
* gcc.dg/tree-prof/pr96394.c: New test.
|