Age | Commit message (Collapse) | Author | Files | Lines |
|
OpenACC 'kernels'
gcc/
* omp-low.c (scan_omp_for) <OpenACC>: Move earlier inconsistent
nested 'reduction' clauses checking.
gcc/testsuite/
* c-c++-common/goacc/nested-reductions-1-kernels.c: Extend.
* c-c++-common/goacc/nested-reductions-2-kernels.c: Likewise.
* gfortran.dg/goacc/nested-reductions-1-kernels.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-2-kernels.f90: Likewise.
|
|
checking
gcc/testsuite/
* c-c++-common/goacc/nested-reductions.c: Split file into...
* c-c++-common/goacc/nested-reductions-1-kernels.c: ... this...
* c-c++-common/goacc/nested-reductions-1-parallel.c: ..., this...
* c-c++-common/goacc/nested-reductions-1-routine.c: ..., and this.
* c-c++-common/goacc/nested-reductions-warn.c: Split file into...
* c-c++-common/goacc/nested-reductions-2-kernels.c: ... this...
* c-c++-common/goacc/nested-reductions-2-parallel.c: ..., this...
* c-c++-common/goacc/nested-reductions-2-routine.c: ..., and this.
* gfortran.dg/goacc/nested-reductions.f90: Split file into...
* gfortran.dg/goacc/nested-reductions-1-kernels.f90: ... this...
* gfortran.dg/goacc/nested-reductions-1-parallel.f90: ..., this...
* gfortran.dg/goacc/nested-reductions-1-routine.f90: ..., and
this.
* gfortran.dg/goacc/nested-reductions-warn.f90: Split file into...
* gfortran.dg/goacc/nested-reductions-2-kernels.f90: ... this...
* gfortran.dg/goacc/nested-reductions-2-parallel.f90: ..., this...
* gfortran.dg/goacc/nested-reductions-2-routine.f90: ..., and
this.
|
|
'vector' clauses with argument [PR92793]
gcc/fortran/
PR fortran/92793
* trans-openmp.c (gfc_trans_omp_clauses): More precise location
information for OpenACC 'gang', 'worker', 'vector' clauses with
argument.
gcc/testsuite/
PR fortran/92793
* gfortran.dg/goacc/pr92793-1.f90: Adjust.
|
|
with arguments on 'loop' only allowed in 'kernels' regions
Instead of at the location of the 'loop' directive, 'error_at' the location of
the improper clause, and 'inform' at the location of the enclosing parent
compute construct/routine.
The Fortran testcases come with some XFAILing, to be resolved later.
gcc/
* omp-low.c (scan_omp_for) <OpenACC>: More precise diagnostics for
'gang', 'worker', 'vector' clauses with arguments only allowed in
'kernels' regions.
gcc/testsuite/
* c-c++-common/goacc/pr92793-1.c: Extend.
* gfortran.dg/goacc/pr92793-1.f90: Likewise.
|
|
As the discussion in PR96789, we found that some scalar stmts
which can be eliminated by some passes after SLP, but we still
modeled their costs when trying to SLP, it could impact
vectorizer's decision. One typical case is the case in PR96789
on target Power.
As Richard suggested there, this patch is to introduce one pass
called pre_slp_scalar_cleanup which has some secondary clean up
passes, for now they are FRE and DSE. It introduces one new
TODO flags group called pending TODO flags, unlike normal TODO
flags, the pending TODO flags are passed down in the pipeline
until one of its consumers can perform the requested action.
Consumers should then clear the flags for the actions that they
have taken.
Soem compilation time statistics on all SPEC2017 INT bmks were
collected on one Power9 machine for several option sets below:
A1: -Ofast -funroll-loops
A2: -O1
A3: -O1 -funroll-loops
A4: -O2
A5: -O2 -funroll-loops
the corresponding increment rate is trivial:
A1 A2 A3 A4 A5
0.08% 0.00% -0.38% -0.10% -0.05%
Bootstrapped/regtested on powerpc64le-linux-gnu P8.
gcc/ChangeLog:
PR tree-optimization/96789
* function.h (struct function): New member unsigned pending_TODOs.
* passes.c (class pass_pre_slp_scalar_cleanup): New class.
(make_pass_pre_slp_scalar_cleanup): New function.
(pass_data_pre_slp_scalar_cleanup): New pass data.
* passes.def: (pass_pre_slp_scalar_cleanup): New pass, add
pass_fre and pass_dse as its children.
* timevar.def (TV_SCALAR_CLEANUP): New timevar.
* tree-pass.h (PENDING_TODO_force_next_scalar_cleanup): New
pending TODO flag.
(make_pass_pre_slp_scalar_cleanup): New declare.
* tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1):
Once any outermost loop gets unrolled, flag cfun pending_TODOs
PENDING_TODO_force_next_scalar_cleanup on.
gcc/testsuite/ChangeLog:
PR tree-optimization/96789
* gcc.dg/tree-ssa/ssa-dse-28.c: Adjust.
* gcc.dg/tree-ssa/ssa-dse-29.c: Likewise.
* gcc.dg/vect/bb-slp-41.c: Likewise.
* gcc.dg/tree-ssa/pr96789.c: New test.
|
|
|
|
This moves an #ifdef block of code from calls.c to
targetm.function_ok_for_sibcall. Only two targets, x86 and rs6000,
define REG_PARM_STACK_SPACE or OUTGOING_REG_PARM_STACK_SPACE macros
that might vary depending on the called function. Macros like
UNITS_PER_WORD don't change over a function boundary, nor does the
MIPS ABI, nor does TARGET_64BIT on PA-RISC. Other targets are even
more trivially proven to not need the calls.c code.
Besides cleaning up a small piece of #ifdef code, the motivation for
this patch is to allow tail calls on PowerPC for functions that
require less reg_parm_stack_space than their caller. The original
code in calls.c only permitted tail calls when exactly equal, but on
PowerPC we can tail call if the callee has less or equal
REG_PARM_STACK_SPACE than the caller, as demonstrated by the
testcase. So we should use
/* If reg parm stack space increases, we cannot sibcall. */
if (REG_PARM_STACK_SPACE (decl ? decl : fntype)
> INCOMING_REG_PARM_STACK_SPACE (current_function_decl))
and note the change to use INCOMING_REG_PARM_STACK_SPACE.
REG_PARM_STACK_SPACE has always been wrong there for PowerPC. See
https://gcc.gnu.org/pipermail/gcc-patches/2014-May/389867.html for why
if you're curious. Not that it matters, because PowerPC can do
without this check entirely, relying on a stack slot test in generic
code.
a) The generic code checks that arg passing stack in the callee is not
greater than that in the caller, and,
b) ELFv2 only allocates reg_parm_stack_space when some parameter is
passed on the stack.
Point (b) means that zero reg_parm_stack_space implies zero stack
space, and non-zero reg_parm_stack_space implies non-zero stack
space. So the case of 0 reg_parm_stack_space in the caller and 64 in
the callee will be caught by (a).
gcc/
PR middle-end/97267
* calls.h (maybe_complain_about_tail_call): Declare.
* calls.c (maybe_complain_about_tail_call): Make global.
(can_implement_as_sibling_call_p): Delete reg_parm_stack_space
param. Adjust caller. Move REG_PARM_STACK_SPACE check to..
* config/i386/i386.c (ix86_function_ok_for_sibcall): ..here.
gcc/testsuite/
PR middle-end/97267
* gcc.target/powerpc/pr97267.c: New test.
|
|
gcc/ChangeLog:
* ira.c (ira_remove_scratches): Rename to remove_scratches. Make
it static and returning flag of any change.
(ira.c): Call ira_expand_reg_equiv in case of removing scratches.
|
|
MMX emulation with SEE is implemented at MMX intrinsic level, not at MMX
instruction level. _mm_maskmove_si64 intrinsic for "MASKMOVQ mm1, mm2"
is emulated with __builtin_ia32_maskmovdqu. Since SSE "MASKMOVQ mm1, mm2"
builtin function, __builtin_ia32_maskmovq, can't be emulated with XMM
registers, make __builtin_ia32_maskmovq also require MMX instead of SSE
only.
gcc/
PR target/97140
* config/i386/i386-expand.c (ix86_expand_builtin): Require MMX
for __builtin_ia32_maskmovq.
gcc/testsuite/
PR target/97140
* gcc.target/i386/pr97140.c: New test.
|
|
|
|
gcc/ChangeLog:
* doc/invoke.texi (-Wstringop-overflow): Correct default setting.
(-Wstringop-overread): Move past -Wstringop-overflow.
|
|
gcc/ChangeLog:
PR bootstrap/57076
* Makefile.in (gcc-vers.texi): Quote @, { and }.
|
|
Move some var decls to their initializers. Correct some whitespace.
gcc/cp/
* decl.c (start_decl_1): Refactor declarations. Fixup some
whitespace.
(lookup_and_check_tag): Fixup some whitespace.
|
|
A couple of paths in duplicate decls dealing with templates and
builtins were overly complicated. Fixing thusly.
gcc/cp/
* decl.c (duplicate_decls): Refactor some template & builtin
handling.
|
|
Since I redid block-scope extern decls, the need for a uid->decl
hasher has gone away. Deleting thusly.
gcc/cp/
* cp-tree.h (struct cxx_int_tree_map): Delete.
(struct cxx_int_tree_map_hasher): Delete.
* cp-gimplify.c (cxx_int_tree_map_hasher::equal): Delete.
(cxx_int_tree_map_hasher::hash): Delete.
|
|
The adoption of P2104 ("Disallow changing concept values") means we can
memoize the result of satisfaction indefinitely and no longer have to
clear the satisfaction caches on various events that would affect
satisfaction. To that end, this patch removes the invalidation routine
clear_satisfaction_cache and adjusts its callers appropriately.
This provides a large reduction in compile time and memory use in some
cases. For example, on the libstdc++ test std/ranges/adaptor/join.cc,
compile time and memory usage drops nearly 75%, from 7.5s/770MB to
2s/230MB, with a --enable-checking=release compiler.
gcc/cp/ChangeLog:
* class.c (finish_struct_1): Don't call clear_satisfaction_cache.
* constexpr.c (clear_cv_and_fold_caches): Likewise. Remove bool
parameter.
* constraint.cc (clear_satisfaction_cache): Remove definition.
* cp-tree.h (clear_satisfaction_cache): Remove declaration.
(clear_cv_and_fold_caches): Remove bool parameter.
* typeck2.c (store_init_value): Remove argument to
clear_cv_and_fold_caches.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-complete1.C: Delete test that became
ill-formed after P2104.
|
|
2020-10-29 Carl Love <cel@us.ibm.com>
gcc/
PR target/93449
* config/rs6000/altivec.h (__builtin_bcdadd, __builtin_bcdadd_lt,
__builtin_bcdadd_eq, __builtin_bcdadd_gt, __builtin_bcdadd_ofl,
__builtin_bcdadd_ov, __builtin_bcdsub, __builtin_bcdsub_lt,
__builtin_bcdsub_eq, __builtin_bcdsub_gt, __builtin_bcdsub_ofl,
__builtin_bcdsub_ov, __builtin_bcdinvalid, __builtin_bcdmul10,
__builtin_bcddiv10, __builtin_bcd2dfp, __builtin_bcdcmpeq,
__builtin_bcdcmpgt, __builtin_bcdcmplt, __builtin_bcdcmpge,
__builtin_bcdcmple): Add defines.
* config/rs6000/altivec.md: Add UNSPEC_BCDSHIFT.
(BCD_TEST): Add le, ge to code iterator.
Add VBCD mode iterator.
(bcd<bcd_add_sub>_test, *bcd<bcd_add_sub>_test2,
bcd<bcd_add_sub>_<code>, bcd<bcd_add_sub>_<code>): Add mode to name.
Change iterator from V1TI to VBCD.
(*bcdinvalid_<mode>, bcdshift_v16qi): New define_insn.
(bcdinvalid_<mode>, bcdmul10_v16qi, bcddiv10_v16qi): New define.
* config/rs6000/dfp.md (dfp_denbcd_v16qi_inst): New define_insn.
(dfp_denbcd_v16qi): New define_expand.
* config/rs6000/rs6000-builtin.def (BU_P8V_MISC_1): New define.
(BCDADD): Replaced with BCDADD_V1TI and BCDADD_V16QI.
(BCDADD_LT): Replaced with BCDADD_LT_V1TI and BCDADD_LT_V16QI.
(BCDADD_EQ): Replaced with BCDADD_EQ_V1TI and BCDADD_EQ_V16QI.
(BCDADD_GT): Replaced with BCDADD_GT_V1TI and BCDADD_GT_V16QI.
(BCDADD_OV): Replaced with BCDADD_OV_V1TI and BCDADD_OV_V16QI.
(BCDSUB_V1TI, BCDSUB_V16QI, BCDSUB_LT_V1TI, BCDSUB_LT_V16QI,
BCDSUB_LE_V1TI, BCDSUB_LE_V16QI, BCDSUB_EQ_V1TI, BCDSUB_EQ_V16QI,
BCDSUB_GT_V1TI, BCDSUB_GT_V16QI, BCDSUB_GE_V1TI, BCDSUB_GE_V16QI,
BCDSUB_OV_V1TI, BCDSUB_OV_V16QI, BCDINVALID_V1TI, BCDINVALID_V16QI,
BCDMUL10_V16QI, BCDDIV10_V16QI, DENBCD_V16QI): New builtin definitions.
(BCDADD, BCDADD_LT, BCDADD_EQ, BCDADD_GT, BCDADD_OV, BCDSUB, BCDSUB_LT,
BCDSUB_LE, BCDSUB_EQ, BCDSUB_GT, BCDSUB_GE, BCDSUB_OV, BCDINVALID,
BCDMUL10, BCDDIV10, DENBCD): New overload definitions.
* config/rs6000/rs6000-call.c (P8V_BUILTIN_VEC_BCDADD, P8V_BUILTIN_VEC_BCDADD_LT,
P8V_BUILTIN_VEC_BCDADD_EQ, P8V_BUILTIN_VEC_BCDADD_GT, P8V_BUILTIN_VEC_BCDADD_OV,
P8V_BUILTIN_VEC_BCDINVALID, P9V_BUILTIN_VEC_BCDMUL10, P8V_BUILTIN_VEC_DENBCD.
P8V_BUILTIN_VEC_BCDSUB, P8V_BUILTIN_VEC_BCDSUB_LT, P8V_BUILTIN_VEC_BCDSUB_LE,
P8V_BUILTIN_VEC_BCDSUB_EQ, P8V_BUILTIN_VEC_BCDSUB_GT, P8V_BUILTIN_VEC_BCDSUB_GE,
P8V_BUILTIN_VEC_BCDSUB_OV): New overloaded specifications.
(CODE_FOR_bcdadd): Replaced with CODE_FOR_bcdadd_v16qi and CODE_FOR_bcdadd_v1ti.
(CODE_FOR_bcdadd_lt): Replaced with CODE_FOR_bcdadd_lt_v16qi and CODE_FOR_bcdadd_lt_v1ti.
(CODE_FOR_bcdadd_eq): Replaced with CODE_FOR_bcdadd_eq_v16qi and CODE_FOR_bcdadd_eq_v1ti.
(CODE_FOR_bcdadd_gt): Replaced with CODE_FOR_bcdadd_gt_v16qi and CODE_FOR_bcdadd_gt_v1ti.
(CODE_FOR_bcdsub): Replaced with CODE_FOR_bcdsub_v16qi and CODE_FOR_bcdsub_v1ti.
(CODE_FOR_bcdsub_lt): Replaced with CODE_FOR_bcdsub_lt_v16qi and CODE_FOR_bcdsub_lt_v1ti.
(CODE_FOR_bcdsub_eq): Replaced with CODE_FOR_bcdsub_eq_v16qi and CODE_FOR_bcdsub_eq_v1ti.
(CODE_FOR_bcdsub_gt): Replaced with CODE_FOR_bcdsub_gt_v16qi and CODE_FOR_bcdsub_gt_v1ti.
(rs6000_expand_ternop_builtin): Add CODE_FOR_dfp_denbcd_v16qi to else if.
* doc/extend.texi: Add documentation for new builtins.
gcc/testsuite/
* gcc.target/powerpc/bcd-2.c: Add include altivec.h.
* gcc.target/powerpc/bcd-3.c: Add include altivec.h.
* gcc.target/powerpc/bcd-4.c: New test.
|
|
I created a few tests on the modules branch that are not actually
module-related. Here they are.
gcc/testsuite/
* g++.dg/concepts/pack-1.C: New.
* g++.dg/lookup/using53.C: Add an enum.
* g++.dg/template/error25.C: Relax 'export' error check.
|
|
This changes more on the modules branch, but let's move the
declaration to the initializer now.
gcc/c-family/
* c-opts.c (c_common_post_options): Move var decl to its
initialization point.
|
|
I fell over an ICE where wide_int_to_type_1's expectations of pointer
value caching didn't match that of cache_integer_cst's behaviour. I
don't know why it only exhibited on the modules branch, but it seems
pretty wrong. This patch matches up the behaviours and adds a comment
about that.
gcc/
* tree.c (cache_integer_cst): Fixup pointer caching to match
wide_int_to_type_1's expectations. Add comment.
|
|
I noticed the two id_equal functions directly called strcmp. This
changes one of them to call the other with args swapped.
gcc/
* tree.h (id_equal): Call the symetric predicate with swapped
arguments.
|
|
In debugging some call-expr handling, I got confused because the debug
printer elided NULL call operands. This changes the printer to display
them as NULL.
gcc/
* print-tree.c (print_node): Display all the operands of a call
expr.
|
|
*vsx_extract_<mode>_store_p9.
gcc/ChangeLog:
* config/rs6000/vsx.md (*vsx_extract_<mode>_store_p9): Add hint *
to 2nd alternative of the 1st scratch.
|
|
Currently the testcase in the patch was failing to produce
a 'bti c' at the beginning of the function. This was because
in aarch64_pac_insn_p, we were wrongly returning at the first
check!
2020-10-30 Sudakshina Das <sudi.das@arm.com>
gcc/ChangeLog:
PR target/97638
* config/aarch64/aarch64-bti-insert.c (aarch64_pac_insn_p): Update
return value on INSN_P check.
gcc/testsuite/ChangeLog:
PR target/97638
* gcc.target/aarch64/pr97638.c: New test.a
|
|
This rewrites SLP induction vectorization to handle different
inductions in the different SLP lanes. It also changes SLP
build to represent the initial value (but not the cycle) so
it can be enhanced to handle outer loop vectorization later.
Note this FAILs gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c
because it removes one CSE optimization that no longer works
with non-uniform initial value and step. I'll see to recover
from this after outer loop vectorization of inductions works.
It might be a bit friendlier to variable-size vectors now
but then we're now building the step vector from scalars ...
2020-11-02 Richard Biener <rguenther@suse.de>
* tree.h (build_real_from_wide): Declare.
* tree.c (build_real_from_wide): New function.
* tree-vect-slp.c (vect_build_slp_tree_2): Remove
restriction on induction vectorization, represent
the initial value.
* tree-vect-loop.c (vect_model_induction_cost): Inline ...
(vectorizable_induction): ... here. Rewrite SLP
code generation.
* gcc.dg/vect/slp-49.c: New testcase.
|
|
Martin Liška has been asking me to add debug counters to the IPA-CP pass so
that testcase reductions are easier. The pass already has one for the bit
value propagation, so this patch adds one for value_range propagation
and one for the actual constant propagation.
gcc/ChangeLog:
2020-10-30 Martin Jambor <mjambor@suse.cz>
* dbgcnt.def (ipa_cp_values): New counter.
(ipa_cp_vr): Likewise.
* ipa-cp.c (decide_about_value): Check and bump ipa_cp_values debug
counter.
(decide_whether_version_node): Likewise.
(ipcp_store_vr_results):Check and bump ipa_cp_vr debug counter.
|
|
When -mpure-code is used, we cannot load delta from code memory (like
we do without -mpure-code).
This patch builds the value of mi_delta into r3 with a series of
movs/adds/lsls.
We also do some cleanup by not emitting the function address and delta
via .word directives at the end of the thunk since we don't use them
with -mpure-code.
No need for new testcases, this bug was already identified by:
g++.dg/ipa/pr46287-3.C
g++.dg/ipa/pr46984.C
g++.dg/opt/thunk1.C
g++.dg/torture/pr46287.C
g++.dg/torture/pr45699.C
2020-11-02 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm.c (arm_thumb1_mi_thunk): Build mi_delta in r3 and
do not emit function address and delta when -mpure-code is used.
|
|
thumb1_movsi_insn used the same algorithm to build a constant in asm
than thumb1_gen_const_int_1 does in RTL. Since the previous patch added
support for asm generation in thumb1_gen_const_int_1, this patch calls
it from thumb1_movsi_insn to avoid duplication.
We need to introduce a new proxy function, thumb1_gen_const_int_print
to select the right template.
This patch also adds a new testcase as the updated alternative is only
used by thumb-1 processors that also support movt/movw.
2020-11-02 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/thumb1.md (thumb1_movsi_insn): Call
thumb1_gen_const_int_print.
* config/arm/arm-protos.h (thumb1_gen_const_int_print): Add
prototype.
* config/arm/arm.c (thumb1_gen_const_int_print): New.
gcc/testsuite/
* gcc.target/arm/pure-code/no-literal-pool-m23.c: New.
|
|
Enable thumb1_gen_const_int to generate RTL or asm depending on the
context, so that we avoid duplicating code to handle constants in
Thumb-1 with -mpure-code.
Use a template so that the algorithm is effectively shared, and
rely on two classes to handle the actual emission as RTL or asm.
The generated sequence is improved to handle right-shiftable and small
values with less instructions. We now generate:
128:
movs r0, r0, #128
264:
movs r3, #33
lsls r3, #3
510:
movs r3, #255
lsls r3, #1
512:
movs r3, #1
lsls r3, #9
764:
movs r3, #191
lsls r3, #2
65536:
movs r3, #1
lsls r3, #16
0x123456:
movs r3, #18 ;0x12
lsls r3, #8
adds r3, #52 ;0x34
lsls r3, #8
adds r3, #86 ;0x56
0x1123456:
movs r3, #137 ;0x89
lsls r3, #8
adds r3, #26 ;0x1a
lsls r3, #8
adds r3, #43 ;0x2b
lsls r3, #1
0x1000010:
movs r3, #16
lsls r3, #16
adds r3, #1
lsls r3, #4
0x1000011:
movs r3, #1
lsls r3, #24
adds r3, #17
-8192:
movs r3, #1
lsls r3, #13
rsbs r3, #0
The patch adds a testcase which does not fully exercise
thumb1_gen_const_int, as other existing patterns already catch small
constants. These parts of thumb1_gen_const_int are used by
arm_thumb1_mi_thunk.
2020-11-02 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm.c (thumb1_const_rtl, thumb1_const_print): New
classes.
(thumb1_gen_const_int): Rename to ...
(thumb1_gen_const_int_1): ... New helper function. Add capability
to emit either RTL or asm, improve generated code.
(thumb1_gen_const_int_rtl): New function.
* config/arm/arm-protos.h (thumb1_gen_const_int): Rename to
thumb1_gen_const_int_rtl.
* config/arm/thumb1.md: Call thumb1_gen_const_int_rtl instead
of thumb1_gen_const_int.
gcc/testsuite/
* gcc.target/arm/pure-code/no-literal-pool-m0.c: New.
|
|
Building on top of commit 9c81750c5bedd7883182ee2684a012c6210ebe1d "Fortran] PR
92793 - fix column used for error diagnostic", there is another place where we
have to use 'gfc_get_location' returning column-corrected locations.
For example, this improves column location information for OMP constructs.
gcc/fortran/
PR fortran/92793
* trans.c (gfc_set_backend_locus): Use 'gfc_get_location'.
(gfc_restore_backend_locus): Adjust.
gcc/testsuite/
PR fortran/92793
* gfortran.dg/goacc/pr92793-1.f90: Adjust.
|
|
gcc/fortran/ChangeLog:
PR fortran/97655
* openmp.c (gfc_match_omp_atomic): Fix mem-order handling;
reject specifying update + capture together.
gcc/testsuite/ChangeLog:
PR fortran/97655
* gfortran.dg/gomp/atomic.f90: Update tree-dump counts; move
invalid OMP 5.0 code to ...
* gfortran.dg/gomp/atomic-2.f90: ... here; update dg-error.
* gfortran.dg/gomp/requires-9.f90: Update tree dump scan.
|
|
This makes sure to compute the vector type for invariant SLP children
of nested cycles.
2020-11-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/97558
* tree-vect-loop.c (vectorizable_reduction): For nested SLP
cycles compute invariant operands vector type.
* gcc.dg/vect/pr97558-2.c: New testcase.
|
|
gcc/testsuite/ChangeLog:
PR tree-optimization/97505
* gcc.dg/pr97505.c: New test.
|
|
This avoids analyzing reductions that are not relevant (thus dead)
which eventually will lead into crashes because the participating
stmts meta is not analyzed. For this to work the patch also
properly removes reduction groups that are not uniformly recognized
as patterns.
2020-11-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/97558
* tree-vect-loop.c (vect_fixup_scalar_cycles_with_patterns):
Check for any mismatch in pattern vs. non-pattern and dissolve
the group if there is one.
* tree-vect-slp.c (vect_analyze_slp_instance): Avoid
analyzing not relevant reductions.
(vect_analyze_slp): Avoid analyzing not relevant reduction
groups.
* gcc.dg/vect/pr97558.c: New testcase.
|
|
I was mistaken to treat vect_external_def as only applying to
SSA_NAME defs, so check for that.
2020-11-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/97650
* tree-vect-slp.c (vect_get_and_check_slp_defs): Check
for SSA_NAME before checking SSA_NAME_IS_DEFAULT_DEF.
* gcc.dg/vect/bb-slp-pr97650.c: New testcase.
|
|
gcc/ChangeLog:
* common/config/riscv/riscv-common.c
(riscv_subset_list::parse_multiletter_ext): Checking multiletter
extension has more than 1 letter.
gcc/testsuite/ChangeLog
* gcc.target/riscv/arch-7.c: New.
* gcc.target/riscv/attribute-10.c: Update test arch string.
|
|
multi-lib settings.
- Able to configure complex multi-lib rule in configure time, without modify
any in-tree source.
- I was consider to implmenet this into `--with-multilib-list` option,
but I am not sure who will using that with riscv*-*-elf*, so I decide to
using another option name for that.
- --with-multilib-generator will pass arguments to multilib-generator, and
then using the generated multi-lib config file to build the toolchain.
e.g. Build riscv gcc, default arch/abi is rv64gc/lp64, and build multilib
for rv32imafd/ilp32 and rv32i/ilp32; rv32ic/ilp32 will reuse
rv32i/ilp32.
$ <GCC-SRC>/configure \
--target=riscv64-elf \
--with-arch=rv64gc --with-abi=lp64 \
--with-multilib-generator=rv32i-ilp32--c;rv32imafd-ilp32--
V3 Changes:
- Rename --with-multilib-config to --with-multilib-generator
- Check --with-multilib-generator and --with-multilib-list can't be used at
same time.
V2 Changes:
- Fix --with-multilib-config hanling on non riscv*-*-elf* triple.
gcc/ChangeLog:
* config.gcc (riscv*-*-*): Handle --with-multilib-generator.
* configure: Regen.
* configure.ac: Add --with-multilib-generator.
* config/riscv/multilib-generator: Exit when parsing arch string error.
* config/riscv/t-withmultilib-generator: New.
* doc/install.texi: Document --with-multilib-generator.
|
|
v6m (PR96770)
With -mpure-code on v6m (thumb-1), we can use small offsets with
upper/lower relocations to avoid the extra addition of the
offset.
This patch accepts expressions symbol+offset as legitimate constants
when the literal pool is disabled, making sure that the offset is
within the range supported by thumb-1 [0..255] as described in the
AAELF32 documentation.
It also makes sure that thumb1_movsi_insn emits an error in case we
try to use it with an unsupported RTL construct.
2020-09-28 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
PR target/96770
* config/arm/arm.c (thumb_legitimate_constant_p): Accept
(symbol_ref + addend) when literal pool is disabled.
(arm_valid_symbolic_address_p): Add support for thumb-1 without
MOVT/MOVW.
* config/arm/thumb1.md (*thumb1_movsi_insn): Accept (symbol_ref +
addend) in the pure-code alternative.
gcc/testsuite/
PR target/96770
* gcc.target/arm/pure-code/pr96770.c: New test.
|
|
With -mpure-code on v6m (thumb-1), to avoid a useless indirection when
building the address of a symbol, we want to consider SYMBOL_REF as a
legitimate constant. This way, we build the address using a series of
upper/lower relocations instead of loading the address from memory.
This patch also fixes a missing "clob" conds attribute for
thumb1_movsi_insn, needed because that alternative clobbers the flags.
2020-11-02 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
PR target/96967
* config/arm/arm.c (thumb_legitimate_constant_p): Add support for
disabled literal pool in thumb-1.
* config/arm/thumb1.md (thumb1_movsi_symbol_ref): Remove.
(*thumb1_movsi_insn): Add support for SYMBOL_REF with -mpure-code.
gcc/testsuite
PR target/96967
* gcc.target/arm/pure-code/pr96767.c: New test.
|
|
Newer versions of Darwin report pagesize 20 which means that we
need to adjust the aligment of the PCH area.
gcc/ChangeLog:
* config/host-darwin.c: Align pch_address_space to 16384.
|
|
The reference implementation for Objective-C provides the SEL
typedef (although it is also available from <objc/objc.h>).
gcc/objc/ChangeLog:
* objc-act.c (synth_module_prologue): Get the SEL identifier.
* objc-act.h (enum objc_tree_index): Add OCTI_SEL_NAME.
(objc_selector_name): New.
(SEL_TYPEDEF_NAME): New.
* objc-gnu-runtime-abi-01.c
(gnu_runtime_01_initialize): Initialize SEL typedef.
* objc-next-runtime-abi-01.c
(next_runtime_01_initialize): Likewise.
* objc-next-runtime-abi-02.c
gcc/testsuite/ChangeLog:
* obj-c++.dg/SEL-typedef.mm: New test.
* objc.dg/SEL-typedef.m: New test.
|
|
When we are lexing tokens for Objective-C, we combine '@' tokens
with a following keyword (when that keyword is a valid Objective-C
one or, for Objective-C, one of the C++ keywords that can appear in
this position). The responsibility is passed on to the parser to
validate the resulting combination.
The combination of tokens was being done without applying the rule
to their locations - so that we get:
@property
^
instead of what the user might expect:
@property
^~~~~~~~~
This patch combines the source range of the keyword with that of the
'@' sign - which improves diagnostics.
gcc/c-family/ChangeLog:
* c-lex.c (c_lex_with_flags): When combining '@' with a
keyword for Objective-C, combine the location ranges too.
|
|
We can avoid the spurious additional complaint about a closing
')' by short-circuiting the test in the case we know there's a
syntax error already reported.
gcc/cp/ChangeLog:
* parser.c (cp_parser_objc_at_property_declaration): Use any
exisiting syntax error to suppress complaints about a missing
closing parenthesis in parsing property attributes.
gcc/testsuite/ChangeLog:
* obj-c++.dg/property/at-property-1.mm: Adjust test after
fixing spurious error output.
|
|
gcc/ChangeLog
* config/i386/i386.c (ix86_expand_prologue): Set the stack usage to 0
for naked functions.
|
|
PR 97660 occurs when cgraph_node::get returns NULL, and this NULL
cgraph_node is then passed to clone_info::get. As the original assert
prior to the regressing change in r11-4587 allowed for the cgraph_node
to be NULL, clone_info::get is now only called when cgraph_node::get
returns a nonnull value.
gcc/ChangeLog:
PR ipa/97660
* cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Don't call
clone_info::get when cgraph_node::get returns NULL.
|
|
These tests currently fail on targets without Intel assembler support.
gcc/testsuite/ChangeLog:
* gcc.target/i386/amxbf16-asmintel-1.c: Require masm_intel.
* gcc.target/i386/amxint8-asmintel-1.c: Likewise.
* gcc.target/i386/amxtile-asmintel-1.c: Likewise.
|
|
* Makefile.in: (OBJS): Add symtab-clones.o
(GTFILES): Add symtab-clones.h
* cgraph.c: Include symtab-clones.h.
(cgraph_edge::resolve_speculation): Fix formating
(cgraph_edge::redirect_call_stmt_to_callee): Update.
(cgraph_update_edges_for_call_stmt): Update
(release_function_body): Fix formating.
(cgraph_node::remove): Fix formating.
(cgraph_node::dump): Fix formating.
(cgraph_node::get_availability): Fix formating.
(cgraph_node::call_for_symbol_thunks_and_aliases): Fix formating.
(set_const_flag_1): Fix formating.
(set_pure_flag_1): Fix formating.
(cgraph_node::can_remove_if_no_direct_calls_p): Fix formating.
(collect_callers_of_node_1): Fix formating.
(clone_of_p): Update.
(cgraph_node::verify_node): Update.
(cgraph_c_finalize): Call clone_info::release ().
* cgraph.h (struct cgraph_clone_info): Move to symtab-clones.h.
(cgraph_node): Remove clone_info.
(symbol_table): Add m_clones.
* cgraphclones.c: Include symtab-clone.h.
(duplicate_thunk_for_node): Update.
(cgraph_node::create_clone): Update.
(cgraph_node::create_virtual_clone): Update.
(cgraph_node::find_replacement): Update.
(cgraph_node::materialize_clone): Update.
* gengtype.c (open_base_files): Include symtab-clones.h.
* ipa-cp.c: Include symtab-clones.h.
(initialize_node_lattices): Update.
(want_remove_some_param_p): Update.
(create_specialized_node): Update.
* ipa-fnsummary.c: Include symtab-clones.h.
(ipa_fn_summary_t::duplicate): Update.
* ipa-modref.c: Include symtab-clones.h.
(update_signature): Update.
* ipa-param-manipulation.c: Include symtab-clones.h.
(ipa_param_body_adjustments::common_initialization): Update.
* ipa-prop.c: Include symtab-clones.h.
(adjust_agg_replacement_values): Update.
(ipcp_get_parm_bits): Update.
(ipcp_update_bits): Update.
(ipcp_update_vr): Update.
* ipa-sra.c: Include symtab-clones.h.
(process_isra_node_results): Update.
(disable_unavailable_parameters): Update.
* lto-cgraph.c: Include symtab-clone.h.
(output_cgraph_opt_summary_p): Update.
(output_node_opt_summary): Update.
(input_node_opt_summary): Update.
* symtab-clones.cc: New file.
* symtab-clones.h: New file.
* tree-inline.c (expand_call_inline): Update.
(update_clone_info): Update.
(tree_function_versioning): Update.
|
|
* ipa-modref.c (modref_summary::dump): Dump writes_errno.
(parm_map_for_arg): Break out from ...
(merge_call_side_effects): ... here.
(get_access_for_fnspec): New function.
(process_fnspec): New function.
(analyze_call): Use it.
(analyze_stmt): Update.
(analyze_function): Initialize writes_errno.
(modref_summaries::duplicate): Duplicate writes_errno.
* ipa-modref.h (struct modref_summary): Add writes_errno.
* tree-ssa-alias.c (call_may_clobber_ref_p_1): Check errno.
|
|
gcc/
2020-10-30 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/rs6000.c (glibc_supports_ieee_128bit): New helper
function.
(rs6000_option_override_internal): Call it.
|
|
This new feature causes the compiler to zero a subset of all call-used
registers at function return. This is used to increase program security
by either mitigating Return-Oriented Programming (ROP) attacks or
preventing information leakage through registers.
gcc/ChangeLog:
2020-10-30 Qing Zhao <qing.zhao@oracle.com>
H.J.Lu <hjl.tools@gmail.com>
* common.opt: Add new option -fzero-call-used-regs
* config/i386/i386.c (zero_call_used_regno_p): New function.
(zero_call_used_regno_mode): Likewise.
(zero_all_vector_registers): Likewise.
(zero_all_st_registers): Likewise.
(zero_all_mm_registers): Likewise.
(ix86_zero_call_used_regs): Likewise.
(TARGET_ZERO_CALL_USED_REGS): Define.
* df-scan.c (df_epilogue_uses_p): New function.
(df_get_exit_block_use_set): Replace EPILOGUE_USES with
df_epilogue_uses_p.
* df.h (df_epilogue_uses_p): Declare.
* doc/extend.texi: Document the new zero_call_used_regs attribute.
* doc/invoke.texi: Document the new -fzero-call-used-regs option.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGS): New hook.
* emit-rtl.h (struct rtl_data): New field must_be_zero_on_return.
* flag-types.h (namespace zero_regs_flags): New namespace.
* function.c (gen_call_used_regs_seq): New function.
(class pass_zero_call_used_regs): New class.
(pass_zero_call_used_regs::execute): New function.
(make_pass_zero_call_used_regs): New function.
* optabs.c (expand_asm_reg_clobber_mem_blockage): New function.
* optabs.h (expand_asm_reg_clobber_mem_blockage): Declare.
* opts.c (zero_call_used_regs_opts): New structure array
initialization.
(parse_zero_call_used_regs_options): New function.
(common_handle_option): Handle -fzero-call-used-regs.
* opts.h (zero_call_used_regs_opts): New structure array.
* passes.def: Add new pass pass_zero_call_used_regs.
* recog.c (valid_insn_p): New function.
* recog.h (valid_insn_p): Declare.
* resource.c (init_resource_info): Replace EPILOGUE_USES with
df_epilogue_uses_p.
* target.def (zero_call_used_regs): New hook.
* targhooks.c (default_zero_call_used_regs): New function.
* targhooks.h (default_zero_call_used_regs): Declare.
* tree-pass.h (make_pass_zero_call_used_regs): Declare.
gcc/c-family/ChangeLog:
2020-10-30 Qing Zhao <qing.zhao@oracle.com>
H.J.Lu <hjl.tools@gmail.com>
* c-attribs.c (c_common_attribute_table): Add new attribute
zero_call_used_regs.
(handle_zero_call_used_regs_attribute): New function.
gcc/testsuite/ChangeLog:
2020-10-30 Qing Zhao <qing.zhao@oracle.com>
H.J.Lu <hjl.tools@gmail.com>
* c-c++-common/zero-scratch-regs-1.c: New test.
* c-c++-common/zero-scratch-regs-10.c: New test.
* c-c++-common/zero-scratch-regs-11.c: New test.
* c-c++-common/zero-scratch-regs-2.c: New test.
* c-c++-common/zero-scratch-regs-3.c: New test.
* c-c++-common/zero-scratch-regs-4.c: New test.
* c-c++-common/zero-scratch-regs-5.c: New test.
* c-c++-common/zero-scratch-regs-6.c: New test.
* c-c++-common/zero-scratch-regs-7.c: New test.
* c-c++-common/zero-scratch-regs-8.c: New test.
* c-c++-common/zero-scratch-regs-9.c: New test.
* c-c++-common/zero-scratch-regs-attr-usages.c: New test.
* gcc.target/i386/zero-scratch-regs-1.c: New test.
* gcc.target/i386/zero-scratch-regs-10.c: New test.
* gcc.target/i386/zero-scratch-regs-11.c: New test.
* gcc.target/i386/zero-scratch-regs-12.c: New test.
* gcc.target/i386/zero-scratch-regs-13.c: New test.
* gcc.target/i386/zero-scratch-regs-14.c: New test.
* gcc.target/i386/zero-scratch-regs-15.c: New test.
* gcc.target/i386/zero-scratch-regs-16.c: New test.
* gcc.target/i386/zero-scratch-regs-17.c: New test.
* gcc.target/i386/zero-scratch-regs-18.c: New test.
* gcc.target/i386/zero-scratch-regs-19.c: New test.
* gcc.target/i386/zero-scratch-regs-2.c: New test.
* gcc.target/i386/zero-scratch-regs-20.c: New test.
* gcc.target/i386/zero-scratch-regs-21.c: New test.
* gcc.target/i386/zero-scratch-regs-22.c: New test.
* gcc.target/i386/zero-scratch-regs-23.c: New test.
* gcc.target/i386/zero-scratch-regs-24.c: New test.
* gcc.target/i386/zero-scratch-regs-25.c: New test.
* gcc.target/i386/zero-scratch-regs-26.c: New test.
* gcc.target/i386/zero-scratch-regs-27.c: New test.
* gcc.target/i386/zero-scratch-regs-28.c: New test.
* gcc.target/i386/zero-scratch-regs-29.c: New test.
* gcc.target/i386/zero-scratch-regs-30.c: New test.
* gcc.target/i386/zero-scratch-regs-31.c: New test.
* gcc.target/i386/zero-scratch-regs-3.c: New test.
* gcc.target/i386/zero-scratch-regs-4.c: New test.
* gcc.target/i386/zero-scratch-regs-5.c: New test.
* gcc.target/i386/zero-scratch-regs-6.c: New test.
* gcc.target/i386/zero-scratch-regs-7.c: New test.
* gcc.target/i386/zero-scratch-regs-8.c: New test.
* gcc.target/i386/zero-scratch-regs-9.c: New test.
|