|
When vector comparisons were forced to use vec_cond_expr, we lost a number of optimizations (my fault for not adding enough testcases to
prevent that). This patch tries to unwrap vec_cond_expr a bit so some optimizations can still happen.
I wasn't planning to add all those transformations together, but adding one caused a regression, whose fix introduced a second regression,
etc.
Restricting to constant folding would not be sufficient; we also need at
least things like X|0 or X&X. The transformations are quite
conservative with :s and folding only if everything simplifies; we may want
to relax this later. And of course we are going to miss things
like a?b:c + a?c:b -> b+c.
In terms of the number of operations, some transformations turning 2
VEC_COND_EXPRs into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not
look like a gain... I expect the BIT_NOT_EXPR disappears in most cases, and
a VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.
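As an illustration, here is a hedged C sketch (not one of the committed
testcases; the names are made up) of the kind of folding this re-enables:

  typedef int v4si __attribute__ ((vector_size (16)));

  /* Both uses of c go through VEC_COND_EXPR internally; the new
     patterns let X|0 and X&X simplify back to the comparison.  */
  v4si
  f (v4si a, v4si b)
  {
    v4si c = a < b;                       /* 0 / -1 per element */
    v4si x = c | (v4si) { 0, 0, 0, 0 };   /* X|0 -> X */
    return x & x;                         /* X&X -> X, leaving just c */
  }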
2020-08-05 Marc Glisse <marc.glisse@inria.fr>
PR tree-optimization/95906
PR target/70314
* match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
(v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
(op (c ? a : b)): Update to match the new transformations.
* gcc.dg/tree-ssa/andnot-2.c: New file.
* gcc.dg/tree-ssa/pr95906.c: Likewise.
* gcc.target/i386/pr70314.c: Likewise.
|
|
The stack_protect_test patterns were leaving the canary value in the
temporary register, meaning that it was often still in registers on
return from the function. An attacker might therefore have been
able to use it to defeat stack-smash protection for a later function.
gcc/
PR target/96191
* config/aarch64/aarch64.md (stack_protect_test_<mode>): Set the
CC register directly, instead of a GPR. Replace the original GPR
destination with an extra scratch register. Zero out operand 3
after use.
(stack_protect_test): Update accordingly.
gcc/testsuite/
PR target/96191
* gcc.target/aarch64/stack-protector-1.c: New test.
* gcc.target/aarch64/stack-protector-2.c: Likewise.
|
|
For LDP/STP Q, the memory operand might not be valid for "m",
so we need to use %z<N> instead of %<N> in the asm template.
This patch does that for all Ump LDP/STP patterns, regardless
of whether it's strictly needed.
This is needed to unbreak bootstrap.
2020-08-05 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64.md (load_pair_sw_<SX:mode><SX2:mode>)
(load_pair_dw_<DX:mode><DX2:mode>, load_pair_dw_tftf)
(store_pair_sw_<SX:mode><SX2:mode>)
(store_pair_dw_<DX:mode><DX2:mode>, store_pair_dw_tftf)
(*load_pair_extendsidi2_aarch64)
(*load_pair_zero_extendsidi2_aarch64): Use %z for the memory operand.
* config/aarch64/aarch64-simd.md (load_pair<DREG:mode><DREG2:mode>)
(vec_store_pair<DREG:mode><DREG2:mode>, load_pair<VQ:mode><VQ2:mode>)
(vec_store_pair<VQ:mode><VQ2:mode>): Likewise.
|
|
This refactors LIM to eschew alloc_aux_for_edges and re-uses the RPO
order of the move_computations walk for invariantness computation as well.
It also removes one unnecessary sort (retaining it as checking
code because we bsearch the vector) and moves the edge insert commit code
to the place where it doesn't have to scan all of the function's edges.
This was all done when investigating whether LIM can be refactored
to work on a specific loop for on-demand processing (but we're not
there yet).
2020-08-05 Richard Biener <rguenther@suse.de>
* tree-ssa-loop-im.c (invariantness_dom_walker): Remove.
(invariantness_dom_walker::before_dom_children): Move to ...
(compute_invariantness): ... this function.
(move_computations): Inline ...
(tree_ssa_lim): ... here, share RPO order and avoid some
cfun references.
(analyze_memory_references): Remove sorting of location
lists, instead assert they are sorted already when checking.
(prev_flag_edges): Remove.
(execute_sm_if_changed): Pass down and adjust prev edge state.
(execute_sm_exit): Likewise.
(hoist_memory_references): Likewise. Commit edge insertions
of each processed exit.
(store_motion_loop): Do not commit edge insertions on all
edges in the function.
(tree_ssa_lim_initialize): Do not call alloc_aux_for_edges.
(tree_ssa_lim_finalize): Do not call free_aux_for_edges.
|
|
Currently, whether a failure during the transform stage is fatal, or
whether the following patterns are still considered, is a bit random,
depending for example on whether the pattern is wrapped in a for.
The following makes it consistent by replacing early returns with
gotos to the end of the pattern processing.
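A minimal C sketch (not actual genmatch output; the pattern bodies are
invented) of the control-flow change in the generated code:

  #include <stdbool.h>

  static bool
  simplify (int op)
  {
    /* A failed transform step now branches to the end of the current
       pattern's code instead of returning from the whole function,
       so the next pattern is still considered.  */
    {
      if (op != 1)
        goto fail1;     /* was: return false; */
      return true;      /* pattern 1 applied */
    }
  fail1:;
    {
      if (op != 2)
        goto fail2;
      return true;      /* pattern 2 still gets its chance */
    }
  fail2:;
    return false;       /* no pattern applied */
  }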
2020-08-05 Richard Biener <rguenther@suse.de>
* genmatch.c (fail_label): New global.
(expr::gen_transform): Branch to fail_label instead of
returning. Fix indent of call argument checking.
(dt_simplify::gen_1): Compute and emit fail_label, branch
to it instead of returning early.
|
|
The number of iterations computation and the logical iteration -> actual
iterator values computation can now be done separately even on composite
constructs (though for triangular loops it would still be more efficient
to propagate a few values through; will handle that incrementally).
simd and taskloop are still unhandled.
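For reference, a hedged sketch of a combined construct with a
non-rectangular loop nest that expand_omp_for no longer rejects:

  extern void work (int, int);

  void
  f (int n)
  {
    /* Combined construct plus a non-rectangular nest: the inner
       bound depends on the outer iterator.  */
  #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; i++)
      for (int j = 0; j < i; j++)
        work (i, j);
  }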
2020-08-05 Jakub Jelinek <jakub@redhat.com>
* omp-expand.c (expand_omp_for): Don't disallow combined non-rectangular
loops.
* testsuite/libgomp.c/loop-22.c: New test.
* testsuite/libgomp.c/loop-23.c: New test.
|
|
As the new testcase shows, we weren't actually performing reductions on the
host teams construct. And fixing that revealed a flaw in the for-14.c testcase.
The problem is that the tests also perform initialization and checking around
the calls to the functions with the OpenMP constructs. In that testcase, all
the tests were spawned from a teams construct but only the tested loops were
distribute, which means the initialization and checking were performed
redundantly and racily in each team. Fixed by performing the initialization
and checking outside of host teams and only making the calls to the functions
with the tested constructs inside of host teams.
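A hedged, simplified sketch (hypothetical function names) of the test
restructuring described above:

  extern void init_data (void);
  extern int test_distribute (void);
  extern void check_results (void);

  int err;

  void
  run (void)
  {
    init_data ();               /* once, outside host teams */
  #pragma omp teams reduction(|:err)
    err |= test_distribute ();  /* only the tested construct per team */
    check_results ();           /* once, after the teams region */
  }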
2020-08-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/96459
* omp-low.c (lower_omp_taskreg): Call lower_reduction_clauses even
for host teams.
* testsuite/libgomp.c/teams-3.c: New test.
* testsuite/libgomp.c-c++-common/for-2.h (OMPTEAMS): Define to nothing
if not defined yet.
(N(test)): Use it before all N(f*) calls.
* testsuite/libgomp.c-c++-common/for-14.c (DO_PRAGMA, OMPTEAMS): Define.
(main): Don't call all test_* functions from within
#pragma omp teams reduction(|:err), call them directly.
|
|
For triangular loops, use the more efficient logical iteration number to
actual iterator values computation even for loops where the number of
iterations could not be computed at compile time and is only computed at
runtime.
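A minimal sketch (an assumed closed form, not GCC's generated code) of
such a logical-to-actual mapping for the triangular nest
for (i = 0; i < n; i++) for (j = 0; j < i; j++):

  #include <math.h>

  /* Rows 0..i-1 contribute i*(i-1)/2 iterations, so invert that to
     recover i from the logical iteration number L; j then follows.  */
  static void
  logical_to_actual (long L, long *i, long *j)
  {
    long t = (long) ((1.0 + sqrt (1.0 + 8.0 * (double) L)) / 2.0);
    while (t * (t - 1) / 2 > L)       /* guard against rounding */
      t--;
    while ((t + 1) * t / 2 <= L)
      t++;
    *i = t;
    *j = L - t * (t - 1) / 2;
  }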
2020-08-05 Jakub Jelinek <jakub@redhat.com>
* omp-expand.c (expand_omp_for_init_counts): Remember
first_inner_iterations, factor and n1o from the number of iterations
computation in *fd.
(expand_omp_for_init_vars): Use more efficient logical iteration number
to actual iterator values computation even for non-rectangular loops
where number of loop iterations could not be computed at compile time.
|
|
GCC maintainers:
The following patch adds support for the vec_blendv and vec_permx
builtins.
The patch has been compiled and tested on
powerpc64le-unknown-linux-gnu (Power 8 LE)
powerpc64le-unknown-linux-gnu (Power 9 LE)
with no regression errors.
The test cases were compiled on a Power 9 system and then tested on
Mambo.
Carl Love
rs6000 RFC2609 vector blend, permute instructions
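A hedged usage sketch (prototype paraphrased from the documentation the
patch adds to extend.texi; treat the exact signature as an assumption;
requires -mcpu=power10):

  #include <altivec.h>

  /* vec_blendv: pick each element from a or b according to the high
     bit of the corresponding mask element.  vec_permx additionally
     takes a 3-bit immediate.  */
  vector signed int
  blend (vector signed int a, vector signed int b,
         vector unsigned int mask)
  {
    return vec_blendv (a, b, mask);
  }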
gcc/ChangeLog
2020-08-04 Carl Love <cel@us.ibm.com>
* config/rs6000/altivec.h (vec_blendv, vec_permx): Add define.
* config/rs6000/altivec.md (UNSPEC_XXBLEND, UNSPEC_XXPERMX): New
unspecs.
(VM3): New define_mode_iterator.
(VM3_char): New define_attr.
(xxblend_<mode> mode VM3): New define_insn.
(xxpermx): New define_expand.
(xxpermx_inst): New define_insn.
* config/rs6000/rs6000-builtin.def (VXXBLEND_V16QI, VXXBLEND_V8HI,
VXXBLEND_V4SI, VXXBLEND_V2DI, VXXBLEND_V4SF, VXXBLEND_V2DF): New
BU_P10V_3 definitions.
(XXBLEND): New BU_P10_OVERLOAD_3 definition.
(XXPERMX): New BU_P10_OVERLOAD_4 definition.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin)
[P10_BUILTIN_VXXPERMX]: Add if statement.
* config/rs6000/rs6000-call.c (P10_BUILTIN_VXXBLEND_V16QI,
P10_BUILTIN_VXXBLEND_V8HI, P10_BUILTIN_VXXBLEND_V4SI,
P10_BUILTIN_VXXBLEND_V2DI, P10_BUILTIN_VXXBLEND_V4SF,
P10_BUILTIN_VXXBLEND_V2DF, P10_BUILTIN_VXXPERMX): Define
overloaded arguments.
(rs6000_expand_quaternop_builtin): Add if case for CODE_FOR_xxpermx.
(builtin_quaternary_function_type): Add v16uqi_type and xxpermx_type
variables, add case statement for P10_BUILTIN_VXXPERMX.
(builtin_function_type): Add case statements for
P10_BUILTIN_VXXBLEND_V16QI, P10_BUILTIN_VXXBLEND_V8HI,
P10_BUILTIN_VXXBLEND_V4SI, P10_BUILTIN_VXXBLEND_V2DI.
* doc/extend.texi: Add documentation for vec_blendv and vec_permx.
gcc/testsuite/ChangeLog
2020-08-04 Carl Love <cel@us.ibm.com>
* gcc.target/powerpc/vec-blend-runnable.c: New test.
* gcc.target/powerpc/vec-permute-ext-runnable.c: New test.
|
|
GCC maintainers:
The following patch adds support for the vec_splati, vec_splatid and
vec_splati_ins builtins.
This patch adds support for instructions that take a 32-bit immediate
value that represents a floating point value. This support adds new
predicates and a support function to properly handle the immediate value.
The patch has been compiled and tested on
powerpc64le-unknown-linux-gnu (Power 8 LE)
powerpc64le-unknown-linux-gnu (Power 9 LE)
with no regression errors.
The test case was compiled on a Power 9 system and then tested on
Mambo.
Please let me know if this patch is acceptable for the mainline
branch. Thanks.
Carl Love
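A hedged usage sketch (signatures are an assumption based on the
documentation added by the patch; requires -mcpu=power10):

  #include <altivec.h>

  /* vec_splati: splat a 32-bit immediate across the vector.  */
  vector signed int
  splat4 (void)
  {
    return vec_splati (4);
  }

  /* vec_splatid: splat a float immediate converted to double.  */
  vector double
  splat_d (void)
  {
    return vec_splatid (1.5f);
  }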
--------------------------------------------------------
gcc/ChangeLog
2020-08-04 Carl Love <cel@us.ibm.com>
* config/rs6000/altivec.h (vec_splati, vec_splatid, vec_splati_ins):
Add defines.
* config/rs6000/altivec.md (UNSPEC_XXSPLTIW, UNSPEC_XXSPLTID,
UNSPEC_XXSPLTI32DX): New.
(vxxspltiw_v4si, vxxspltiw_v4sf_inst, vxxspltidp_v2df_inst,
vxxsplti32dx_v4si_inst, vxxsplti32dx_v4sf_inst): New define_insn.
(vxxspltiw_v4sf, vxxspltidp_v2df, vxxsplti32dx_v4si,
vxxsplti32dx_v4sf): New define_expands.
* config/rs6000/predicates.md (u1bit_cint_operand,
s32bit_cint_operand, c32bit_cint_operand): New predicates.
* config/rs6000/rs6000-builtin.def (VXXSPLTIW_V4SI, VXXSPLTIW_V4SF,
VXXSPLTID): New definitions.
(VXXSPLTI32DX_V4SI, VXXSPLTI32DX_V4SF): New BU_P10V_3
definitions.
(XXSPLTIW, XXSPLTID): New definitions.
(XXSPLTI32DX): Add definitions.
* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_XXSPLTIW,
P10_BUILTIN_VEC_XXSPLTID, P10_BUILTIN_VEC_XXSPLTI32DX):
New definitions.
* config/rs6000/rs6000-protos.h (rs6000_constF32toI32): New extern
declaration.
* config/rs6000/rs6000.c (rs6000_constF32toI32): New function.
* doc/extend.texi: Add documentation for vec_splati,
vec_splatid, and vec_splati_ins.
gcc/testsuite/ChangeLog
2020-08-04 Carl Love <cel@us.ibm.com>
* gcc.target/powerpc/vec-splati-runnable.c: New test.
|
|
GCC maintainers:
The following patch adds support for the vector shift double builtins.
The patch has been compiled and tested on
powerpc64le-unknown-linux-gnu (Power 8 LE)
powerpc64le-unknown-linux-gnu (Power 9 LE)
and Mambo with no regression errors.
Please let me know if this patch is acceptable for the mainline branch.
Thanks.
Carl Love
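A hedged usage sketch (signature paraphrased from the new extend.texi
text; requires -mcpu=power10):

  #include <altivec.h>

  /* vec_sldb: shift the double vector (the concatenation of a and b)
     left by a constant 0-7 bits, returning the high half;
     vec_srdb is the right-shift counterpart.  */
  vector unsigned char
  shift_pair_left (vector unsigned char a, vector unsigned char b)
  {
    return vec_sldb (a, b, 3);
  }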
-------------------------------------------------------
gcc/ChangeLog
2020-08-04 Carl Love <cel@us.ibm.com>
* config/rs6000/altivec.h (vec_sldb, vec_srdb): New defines.
* config/rs6000/altivec.md (UNSPEC_SLDB, UNSPEC_SRDB): New.
(SLDB_lr): New attribute.
(VSHIFT_DBL_LR): New iterator.
(vs<SLDB_lr>db_<mode>): New define_insn.
* config/rs6000/rs6000-builtin.def (VSLDB_V16QI, VSLDB_V8HI,
VSLDB_V4SI, VSLDB_V2DI, VSRDB_V16QI, VSRDB_V8HI, VSRDB_V4SI,
VSRDB_V2DI): New BU_P10V_3 definitions.
(SLDB, SRDB): New BU_P10_OVERLOAD_3 definitions.
* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_SLDB,
P10_BUILTIN_VEC_SRDB): New definitions.
(rs6000_expand_ternop_builtin) [CODE_FOR_vsldb_v16qi,
CODE_FOR_vsldb_v8hi, CODE_FOR_vsldb_v4si, CODE_FOR_vsldb_v2di,
CODE_FOR_vsrdb_v16qi, CODE_FOR_vsrdb_v8hi, CODE_FOR_vsrdb_v4si,
CODE_FOR_vsrdb_v2di]: Add clauses.
* doc/extend.texi: Add description for vec_sldb and vec_srdb.
gcc/testsuite/ChangeLog
2020-08-04 Carl Love <cel@us.ibm.com>
* gcc.target/powerpc/vec-shift-double-runnable.c: New test file.
|
|
The following patch adds support for builtins vec_replace_elt and
vec_replace_unaligned.
The patch has been compiled and tested on
powerpc64le-unknown-linux-gnu (Power 8 LE)
powerpc64le-unknown-linux-gnu (Power 9 LE)
and mambo with no regression errors.
Please let me know if this patch is acceptable for the mainline
branch. Thanks.
Carl Love
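A hedged usage sketch (signature is an assumption based on the
documentation added by the patch; requires -mcpu=power10):

  #include <altivec.h>

  /* vec_replace_elt: replace the element at a constant index;
     vec_replace_unaligned instead replaces bytes at an arbitrary
     byte offset.  */
  vector unsigned int
  set_word2 (vector unsigned int v, unsigned int x)
  {
    return vec_replace_elt (v, x, 2);
  }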
-------------------------------------------------------
gcc/ChangeLog
2020-08-04 Carl Love <cel@us.ibm.com>
* config/rs6000/altivec.h: Add define for vec_replace_elt and
vec_replace_unaligned.
* config/rs6000/vsx.md (UNSPEC_REPLACE_ELT, UNSPEC_REPLACE_UN): New
unspecs.
(REPLACE_ELT): New mode iterator.
(REPLACE_ELT_char, REPLACE_ELT_sh, REPLACE_ELT_max): New mode attributes.
(vreplace_un_<mode>, vreplace_elt_<mode>_inst): New.
* config/rs6000/rs6000-builtin.def (VREPLACE_ELT_V4SI,
VREPLACE_ELT_UV4SI, VREPLACE_ELT_V4SF, VREPLACE_ELT_UV2DI,
VREPLACE_ELT_V2DF, VREPLACE_UN_V4SI, VREPLACE_UN_UV4SI,
VREPLACE_UN_V4SF, VREPLACE_UN_V2DI, VREPLACE_UN_UV2DI,
VREPLACE_UN_V2DF, REPLACE_ELT, REPLACE_UN, VREPLACE_ELT_V2DI): New builtin
entries.
* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_REPLACE_ELT,
P10_BUILTIN_VEC_REPLACE_UN): New builtin argument definitions.
(rs6000_expand_quaternop_builtin): Add 3rd argument checks for
CODE_FOR_vreplace_elt_v4si, CODE_FOR_vreplace_elt_v4sf,
CODE_FOR_vreplace_un_v4si, CODE_FOR_vreplace_un_v4sf.
(builtin_function_type) [P10_BUILTIN_VREPLACE_ELT_UV4SI,
P10_BUILTIN_VREPLACE_ELT_UV2DI, P10_BUILTIN_VREPLACE_UN_UV4SI,
P10_BUILTIN_VREPLACE_UN_UV2DI]: New cases.
* doc/extend.texi: Add description for vec_replace_elt and
vec_replace_unaligned builtins.
gcc/testsuite/ChangeLog
2020-08-04 Carl Love <cel@us.ibm.com>
* gcc.target/powerpc/vec-replace-word-runnable.c: New test.
|
|
GCC maintainers:
This patch adds support for vec_insertl and vec_inserth builtins.
The patch has been compiled and tested on
powerpc64le-unknown-linux-gnu (Power 8 LE)
powerpc64le-unknown-linux-gnu (Power 9 LE)
and mambo with no regression errors.
Please let me know if this patch is acceptable for the mainline branch.
Thanks.
Carl Love
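A hedged usage sketch (signature is an assumption based on the new
extend.texi text; requires -mcpu=power10):

  #include <altivec.h>

  /* vec_insertl: insert an element at a variable position counted
     from the left; vec_inserth counts from the right.  */
  vector unsigned char
  insert_byte (unsigned char x, vector unsigned char v, unsigned int pos)
  {
    return vec_insertl (x, v, pos);
  }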
--------------------------------------------------------------
gcc/ChangeLog
2020-08-04 Carl Love <cel@us.ibm.com>
* config/rs6000/altivec.h (vec_insertl, vec_inserth): New defines.
* config/rs6000/rs6000-builtin.def (VINSERTGPRBL, VINSERTGPRHL,
VINSERTGPRWL, VINSERTGPRDL, VINSERTVPRBL, VINSERTVPRHL, VINSERTVPRWL,
VINSERTGPRBR, VINSERTGPRHR, VINSERTGPRWR, VINSERTGPRDR, VINSERTVPRBR,
VINSERTVPRHR, VINSERTVPRWR): New builtins.
(INSERTL, INSERTH): New builtins.
* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_INSERTL,
P10_BUILTIN_VEC_INSERTH): New overloaded definitions.
(P10_BUILTIN_VINSERTGPRBL, P10_BUILTIN_VINSERTGPRHL,
P10_BUILTIN_VINSERTGPRWL, P10_BUILTIN_VINSERTGPRDL,
P10_BUILTIN_VINSERTVPRBL, P10_BUILTIN_VINSERTVPRHL,
P10_BUILTIN_VINSERTVPRWL): Add case entries.
* config/rs6000/vsx.md (define_c_enum): Add UNSPEC_INSERTL,
UNSPEC_INSERTR.
(define_expand): Add vinsertvl_<mode>, vinsertvr_<mode>,
vinsertgl_<mode>, vinsertgr_<mode>, mode is VI2.
(define_insn): Add vinsertvl_internal_<mode>, vinsertvr_internal_<mode>,
vinsertgl_internal_<mode>, vinsertgr_internal_<mode>, mode VEC_I.
* doc/extend.texi: Add documentation for vec_insertl, vec_inserth.
gcc/testsuite/ChangeLog
2020-08-04 Carl Love <cel@us.ibm.com>
* gcc.target/powerpc/vec-insert-word-runnable.c: New test case.
|
|
GCC maintainers:
Move the existing vector extract support in altivec.md to vsx.md
so all of the vector insert and extract support is in the same file.
The patch also updates the name of the builtins and descriptions for the
builtins in the documentation file so they match the approved builtin
names and descriptions.
The patch does not make any functional changes.
Please let me know if the changes are acceptable for mainline. Thanks.
Carl Love
------------------------------------------------------
gcc/ChangeLog
2020-08-04 Carl Love <cel@us.ibm.com>
* config/rs6000/altivec.md (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
(vextractl<mode>, vextractr<mode>)
(vextractl<mode>_internal, vextractr<mode>_internal for mode VI2)
(VI2): Move to ...
* config/rs6000/vsx.md (UNSPEC_EXTRACTL, UNSPEC_EXTRACTR)
(vextractl<mode>, vextractr<mode>)
(vextractl<mode>_internal, vextractr<mode>_internal for mode VI2)
(VI2): ... here.
* doc/extend.texi: Update documentation for vec_extractl.
Replace builtin name vec_extractr with vec_extracth. Update
description of vec_extracth.
|
|
Noticed while reviewing the RISC-V -mstack-protector-guard docs. The
AArch64 section has two identical copies of the docs for this option.
gcc/
* doc/invoke.texi (AArch64 Options): Delete duplicate
-mstack-protector-guard docs.
|
|
This patch adds support for signed and unsigned, HImode and SImode highpart
multiplications to the nvptx backend.
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures.
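For reference, a minimal C sketch of the operation the new instructions
implement (the high half of the widened product):

  /* smulsi3_highpart: high 32 bits of the signed 64-bit product.  */
  int
  smul_highpart (int x, int y)
  {
    return (int) (((long long) x * y) >> 32);
  }

  /* umulsi3_highpart: unsigned counterpart.  */
  unsigned int
  umul_highpart (unsigned int x, unsigned int y)
  {
    return (unsigned int) (((unsigned long long) x * y) >> 32);
  }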
2020-08-04 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog:
* config/nvptx/nvptx.md (smulhi3_highpart, smulsi3_highpart)
(umulhi3_highpart, umulsi3_highpart): New instructions.
gcc/testsuite/ChangeLog:
* gcc.target/nvptx/mul-hi.c: New test.
* gcc.target/nvptx/umul-hi.c: New test.
|
|
In r9-4235 I tried to make sure that the template keyword follows
a nested-name-specifier. Since :: is a valid nested-name-specifier,
I also have to check 'globalscope' before giving the error.
gcc/cp/ChangeLog:
PR c++/96082
* parser.c (cp_parser_elaborated_type_specifier): Allow
'template' following ::.
gcc/testsuite/ChangeLog:
PR c++/96082
* g++.dg/template/template-keyword3.C: New test.
|
|
If we lower a constant string operation in a Binary_expression,
delete the strings. This is safe because constant strings are always
newly allocated.
This is a hack to use much less memory when compiling the new
time/tzdata package, which has a file that contains the sum of over
13,000 constant strings. We don't do this for numeric expressions
because that could cause us to delete an Iota_expression.
We should have a cleaner approach to memory usage some day.
Fixes PR go/96450
|
|
Nothing uses these since the switch to HSACOv3.
gcc/ChangeLog:
* config/gcn/gcn-run.c (R_AMDGPU_NONE): Delete.
(R_AMDGPU_ABS32_LO): Delete.
(R_AMDGPU_ABS32_HI): Delete.
(R_AMDGPU_ABS64): Delete.
(R_AMDGPU_REL32): Delete.
(R_AMDGPU_REL64): Delete.
(R_AMDGPU_ABS32): Delete.
(R_AMDGPU_GOTPCREL): Delete.
(R_AMDGPU_GOTPCREL32_LO): Delete.
(R_AMDGPU_GOTPCREL32_HI): Delete.
(R_AMDGPU_REL32_LO): Delete.
(R_AMDGPU_REL32_HI): Delete.
(reserved): Delete.
(R_AMDGPU_RELATIVE64): Delete.
|
|
Previously, compiling with -march=armv8.1-m.main would tune for
Cortex-M7.
However, the Cortex-M7 only supports up to Armv7e-M. The Cortex-M55 is
the earliest CPU that supports Armv8.1-M Mainline, so it is more appropriate.
This also has the effect of changing the branch cost function used, which
will be necessary to correctly prioritise conditional instructions over branches
in the rest of this patch series.
Regression tested on arm-none-eabi.
gcc/ChangeLog
2020-08-04 Omar Tahir <omar.tahir@arm.com>
* config/arm/arm-cpus.in (armv8.1-m.main): Tune for Cortex-M55.
|
|
gcc/
* config/aarch64/aarch64.c (aarch64_if_then_else_costs): Delete
redundant extra_cost variable.
|
|
I noticed that we could leak parser->num_template_parameter_lists with
erroneous specializations. We'd increment it, notice a problem and then
bail out. This refactors cp_parser_explicit_specialization to avoid
that code path. A couple of tests get different diagnostics because
of the fix. pr39425 then goes into unbounded template instantiation
and exceeds the implementation limit.
gcc/cp/
* parser.c (cp_parser_explicit_specialization): Refactor
to avoid leak of num_template_parameter_lists value.
gcc/testsuite/
* g++.dg/template/pr39425.C: Adjust errors (unbounded
template recursion).
* g++.old-deja/g++.pt/spec20.C: Remove fallout diagnostics.
|
|
All FP intrinsics have FLAG_FP set by default, but not all FP intrinsics
raise FP exceptions or read the FPCR register, so we add a global flag
FLAG_AUTO_FP to suppress the flag FLAG_FP.
2020-08-04 Zhiheng Xie <xiezhiheng@huawei.com>
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.c (aarch64_call_properties):
Use FLOAT_MODE_P macro instead of enumerating all floating-point
modes and add global flag FLAG_AUTO_FP.
|
|
gcc/fortran/ChangeLog:
* openmp.c (resolve_omp_do): Detect not perfectly
nested loop with innermost collapse.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/collapse1.f90: Add dg-error.
* gfortran.dg/gomp/collapse2.f90: New test.
|
|
When looking at the symver attribute documentation in HTML, I found
there is no index entry to refer to it by.
2020-08-04 Jakub Jelinek <jakub@redhat.com>
* doc/extend.texi (symver): Add @cindex for symver function attribute.
|
|
PR rtl-optimization/60473 is a code quality regression that has
been cured by improvements to register allocation. For the function
in the test case, GCC 4.4, 4.5 and 4.6 generated very poor code
requiring two mov instructions, and GCC 4.7 and 4.8 (when the PR was
filed) produced better but still poor code with one mov instruction.
Since GCC 4.9 (including current mainline), it generates optimal
code with no mov instructions, matching what used to be generated
in GCC 4.1.
2020-08-04 Roger Sayle <roger@nextmovesoftware.com>
gcc/testsuite/ChangeLog
PR rtl-optimization/60473
* gcc.target/i386/pr60473.c: New test.
|
|
This transformation is quite straightforward: without overflow, 3*X==15 is
the same as X==5, and 3*X==5 cannot happen. Adding a single_use restriction
for the first case didn't seem necessary, although of course it can
slightly increase register pressure in some cases.
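A hedged sketch of the two cases (in the spirit of the new pr95433.c
testcase, not its exact contents):

  /* With signed overflow undefined, both fold at GIMPLE time.  */
  int f (int x) { return 3 * x == 15; }  /* becomes x == 5 */
  int g (int x) { return 3 * x == 5; }   /* becomes 0: 5 isn't a multiple of 3 */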
2020-08-04 Marc Glisse <marc.glisse@inria.fr>
PR tree-optimization/95433
* match.pd (X * C1 == C2): New transformation.
* gcc.c-torture/execute/pr23135.c: Add -fwrapv to avoid
undefined behavior.
* gcc.dg/tree-ssa/pr95433.c: New file.
|
|
gcc/ChangeLog:
* gimple-ssa-sprintf.c (get_int_range): Adjust for irange API.
(format_integer): Same.
(handle_printf_call): Same.
|
|
Adds code generation for creating a temporary for struct and array
literals and pre-filling it with zeroes before assigning, so that
alignment holes don't cause objects to produce a non-deterministic
hash value. A new field has been added to the expression visitor to
track whether the result is being generated for another literal, so
that memset() is only called once on the top-level literal expression,
and not for nested structs or arrays.
gcc/d/ChangeLog:
PR d/96153
* d-tree.h (build_expr): Add literalp argument.
* expr.cc (ExprVisitor): Add literalp_ field.
(ExprVisitor::ExprVisitor): Initialize literalp_.
(ExprVisitor::visit (AssignExp *)): Call memset() on blits where RHS
is a struct literal. Elide assignment if initializer is all zeroes.
(ExprVisitor::visit (CastExp *)): Forward literalp_ to generation of
subexpression.
(ExprVisitor::visit (AddrExp *)): Likewise.
(ExprVisitor::visit (ArrayLiteralExp *)): Use memset() to pre-fill
object with zeroes. Set literalp in subexpressions.
(ExprVisitor::visit (StructLiteralExp *)): Likewise.
(ExprVisitor::visit (TupleExp *)): Set literalp in subexpressions.
(ExprVisitor::visit (VectorExp *)): Likewise.
(ExprVisitor::visit (VectorArrayExp *)): Likewise.
(build_expr): Forward literal_p to ExprVisitor.
gcc/testsuite/ChangeLog:
PR d/96153
* gdc.dg/pr96153.d: New test.
|
|
Implement TImode shifts in the backend.
The middle-end support that does it for other architectures doesn't work for
GCN because BITS_PER_WORD==32, meaning that TImode is quad-word, not
double-word.
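A minimal C sketch of an operation that now maps to the new expander
(assuming __int128 corresponds to TImode on the target):

  /* A 128-bit (TImode) shift; on GCN this is a quad-word value
     because BITS_PER_WORD is 32.  */
  __int128
  shift_left (__int128 x, int n)
  {
    return x << n;
  }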
gcc/ChangeLog:
* config/gcn/gcn.md ("<expander>ti3"): New.
|
|
This patch preserves the source locations of each node in a member
initializer list so that during processing of the list we can set
input_location appropriately for generally more accurate diagnostic
locations. Since TREE_LIST nodes are tcc_exceptional, they can't have
source locations, so we instead store the location in a dummy
tcc_expression node within the TREE_TYPE of the list node.
gcc/cp/ChangeLog:
PR c++/94024
* init.c (sort_mem_initializers): Preserve TREE_TYPE of the
member initializer list node.
(emit_mem_initializers): Set input_location when performing each
member initialization.
* parser.c (cp_parser_mem_initializer): Attach the source
location of this initializer to a dummy EMPTY_CLASS_EXPR
within the TREE_TYPE of the list node.
* pt.c (tsubst_initializer_list): Preserve TREE_TYPE of the
member initializer list node.
gcc/testsuite/ChangeLog:
PR c++/94024
* g++.dg/diagnostic/mem-init1.C: New test.
|
|
This adds a stopgap measure to avoid performing code-hoisting
on mixed type loads when the load we'd insert in the hoisting
position would be a floating point one. This is because certain
targets (hello x87) cannot perform floating point loads without
possibly altering the bit representation, so such loads cannot be
used in place of integral loads.
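A hedged sketch (hypothetical code, not the pr88240.c testcase) of the
kind of mixed-type loads involved:

  /* The same bytes are loaded as int on one path and as float on the
     other.  Hoisting a single x87 float load to cover both paths
     could alter the bit pattern (e.g. a signaling NaN).  */
  union U { int i; float f; };

  int
  read_it (union U *p, int as_float)
  {
    if (as_float)
      return (int) p->f;   /* floating point load */
    return p->i;           /* integer load of the same memory */
  }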
2020-08-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/88240
* tree-ssa-sccvn.h (vn_reference_s::punned): New flag.
* tree-ssa-sccvn.c (vn_reference_insert): Initialize punned.
(vn_reference_insert_pieces): Likewise.
(visit_reference_op_call): Likewise.
(visit_reference_op_load): Track whether a ref was punned.
* tree-ssa-pre.c (do_hoist_insertion): Refuse to perform hoist
insertion on punned floating point loads.
* gcc.target/i386/pr88240.c: New testcase.
|
|
gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_trans_omp_do): Fix 'lastprivate(conditional:'.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/lastprivate-conditional-3.f90: Enable some
previously disabled 'lastprivate(conditional:' dg-warnings.
|
|
This is my attempt at reviving the old patch
https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html
I have followed up on Kyrill's comment upstream at the link above and
am using the recommended option iii) that he mentioned:
"1) Adjust the copy_limit to 256 bits after checking
AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning.
2) Adjust aarch64_copy_one_block_and_progress_pointers to handle
256-bit moves, using option iii):
iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs of
ldp/stps. This wouldn't need any adjustments to MD patterns,
but would make aarch64_copy_one_block_and_progress_pointers
more complex as it would now have two paths, where one
handles two adjacent memory addresses in one call."
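A hedged example of a copy that can now be expanded as a single
Q-register LDP/STP pair (256 bits in total):

  /* A 32-byte block copy: with the patch this can use one ldp/stp of
     two 128-bit vector registers instead of two pairs of X-register
     loads and stores.  */
  struct block { char bytes[32]; };

  void
  copy_block (struct block *dst, const struct block *src)
  {
    *dst = *src;
  }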
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_gen_store_pair): Add case
for E_V4SImode.
(aarch64_gen_load_pair): Likewise.
(aarch64_copy_one_block_and_progress_pointers): Handle 256 bit copy.
(aarch64_expand_cpymem): Expand copy_limit to 256bits where
appropriate.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpymem-q-reg_1.c: New test.
* gcc.target/aarch64/large_struct_copy_2.c: Update for ldp q regs.
|
|
gcc/ChangeLog
2020-07-30 Andrea Corallo <andrea.corallo@arm.com>
* config/aarch64/aarch64.md (aarch64_fjcvtzs): Add missing
clobber.
* doc/sourcebuild.texi (aarch64_fjcvtzs_hw): Document new
target-supports option.
gcc/testsuite/ChangeLog
2020-07-30 Andrea Corallo <andrea.corallo@arm.com>
* gcc.target/aarch64/acle/jcvt_2.c: New testcase.
* lib/target-supports.exp
(check_effective_target_aarch64_fjcvtzs_hw): Add new check for
FJCVTZS hw.
|
|
With the pr96628-part1.f90 source and -ftree-slp-vectorize, we run into an
ICE due to the fact that V2DI mode is not handled in nvptx_gen_shuffle.
Fix this by adding handling of V2DI as well as V2SI mode in
nvptx_gen_shuffle.
Build and reg-tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
PR target/96428
* config/nvptx/nvptx.c (nvptx_gen_shuffle): Handle V2SI/V2DI.
libgomp/ChangeLog:
PR target/96428
* testsuite/libgomp.oacc-fortran/pr96628-part1.f90: New test.
* testsuite/libgomp.oacc-fortran/pr96628-part2.f90: New test.
|
|
.VEC_CONVERT is a const internal call, so normally, if the lhs is not used,
we'd DCE it far before getting to veclower, but with -O0 (or perhaps
-fno-tree-dce and some other -fno-* options) it can happen.
But as the internal fn needs the lhs to know the type to which the
conversion is done (and I think that is a reasonable representation; having
some magic extra argument and having to create constants with that type
looks like overkill to me), we should just DCE those calls ourselves.
During veclower, we can't really remove insns, as the callers would be
upset, so this just replaces it with a GIMPLE_NOP.
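A hedged C sketch of how such a dead .VEC_CONVERT call can arise at -O0
(__builtin_convertvector is lowered to this internal function):

  typedef int   v4si __attribute__ ((vector_size (16)));
  typedef float v4sf __attribute__ ((vector_size (16)));

  void
  f (v4si x)
  {
    /* Result unused: at -O0 the .VEC_CONVERT call survives until
       veclower, which now replaces it with a GIMPLE_NOP.  */
    (void) __builtin_convertvector (x, v4sf);
  }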
2020-08-04 Jakub Jelinek <jakub@redhat.com>
PR middle-end/96426
* tree-vect-generic.c (expand_vector_conversion): Replace .VEC_CONVERT
call with GIMPLE_NOP if there is no lhs.
* gcc.c-torture/compile/pr96426.c: New test.
|
|
In debug stmts, we are less strict about what is and what is not accepted,
so this patch just punts on optimization of a debug stmt rather than
ICEing.
2020-08-04 Jakub Jelinek <jakub@redhat.com>
PR debug/96354
* gimple-fold.c (maybe_canonicalize_mem_ref_addr): Add IS_DEBUG
argument. Return false instead of gcc_unreachable if it is true and
get_addr_base_and_unit_offset returns NULL.
(fold_stmt_1) <case GIMPLE_DEBUG>: Adjust caller.
* g++.dg/opt/pr96354.C: New test.
|
|
gcc/ChangeLog:
* vr-values.c (simplify_using_ranges::vrp_evaluate_conditional):
Call is_gimple_min_invariant, which was dropped from the previous
patch.
|
|
For some non-rectangular (triangular) loops, compute the number of
iterations at runtime more efficiently.
2020-08-04 Jakub Jelinek <jakub@redhat.com>
* omp-expand.c (expand_omp_for_init_counts): For triangular loops
compute number of iterations at runtime more efficiently.
(expand_omp_for_init_vars): Adjust immediate dominators.
(extract_omp_for_update_vars): Likewise.
|
|
gcc/d/ChangeLog:
PR d/96429
* expr.cc (ExprVisitor::visit (BinExp*)): Use EXACT_DIV_EXPR for
pointer diff expressions.
gcc/testsuite/ChangeLog:
PR d/96429
* gdc.dg/pr96429.d: New test.
|
|
2020-08-04 Paul Thomas <pault@gcc.gnu.org>
gcc/testsuite/
PR fortran/96325
* gfortran.dg/pr96325.f90: Change from run to compile.
|
|
gcc/ChangeLog:
* vr-values.c (simplify_using_ranges::two_valued_val_range_p):
Use irange API.
|
|
gcc/ChangeLog:
* vr-values.c (simplify_conversion_using_ranges): Convert to irange API.
|
|
gcc/ChangeLog:
* vr-values.c (test_for_singularity): Use irange API.
(simplify_using_ranges::simplify_cond_using_ranges_1): Do not
special case VR_RANGE.
|
|
gcc/ChangeLog:
* vr-values.c (simplify_using_ranges::vrp_evaluate_conditional): Adjust
for irange API.
|
|
gcc/ChangeLog:
* vr-values.c (simplify_using_ranges::op_with_boolean_value_range_p): Adjust
for irange API.
|
|
gcc/ChangeLog:
* tree-ssanames.c (get_range_info): Use irange instead of value_range.
* tree-ssanames.h (get_range_info): Same.
|
|
gcc/ChangeLog:
* fold-const.c (expr_not_equal_to): Adjust for irange API.
|