Division is slow on hppa and mode sizes are powers of 2. So, we
can use the '&' operator to check displacement alignment.
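As a hedged illustration (not the predicate itself; 'disp' and 'size' are
made-up names, with 'size' assumed to be a power of two), the two checks
below are equivalent, and the mask form avoids the division:
int aligned_mod (unsigned disp, unsigned size) { return disp % size == 0; }
int aligned_and (unsigned disp, unsigned size) { return (disp & (size - 1)) == 0; }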
2024-11-08 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/predicates.md (base14_operand): Use '&' operator
instead of '%' to check displacement alignment.
|
|
LRA has problems handling spills for OI and TI modes. There are
issues with SUBREG support as well.
This change fixes gcc.c-torture/compile/pr92618.c with LRA.
2024-11-08 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR target/117238
* config/pa/pa32-regs.h (PA_HARD_REGNO_MODE_OK): Don't allow
mode size 32. Limit mode size 16 in general registers to
complex modes.
|
|
This is fairly subtle.
When handling spills for SUBREG arguments in pa_emit_move_sequence,
alter_subreg may be called. It in turn calls adjust_address_1 and
change_address_1. change_address_1 calls pa_legitimate_address_p
to validate the new spill address. change_address_1 generates an
internal compiler error if the address is not valid. We need to
allow 14-bit displacements for all modes when reload_in_progress
is true and strict is false to prevent the internal compiler error.
SUBREGs are only used with the general registers, so the spill
should result in an integer access. 14-bit displacements are okay
for integer loads and stores but not for floating-point loads and
stores.
Potentially, the change could break the handling of spills for the
floating-point registers, but I believe these are handled separately
in pa_emit_move_sequence.
This change fixes the build of symmetrica-3.0.1+ds.
2024-11-08 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR target/117443
* config/pa/pa.cc (pa_legitimate_address_p): Allow any
14-bit displacement when reload is in progress and strict
is false.
|
|
x86 doesn't have .REDUC_PLUS for V2SImode, and there's no effective
target for that, so add x86 to the list of targets not expecting the
BB vectorization.
* gcc.dg/vect/bb-slp-77.c: Add x86_64-*-* and i?86-*-* to
the list of expected failing targets.
|
|
When not dealing with the special armv8.1-m.main conditional instructions case,
make sure it uses the default_noce_conversion_profitable_p call to determine
whether the sequence is cost-effective.
Also make sure arm_noce_conversion_profitable_p accepts vsel<cond> patterns for
Armv8.1-M Mainline targets.
gcc/ChangeLog:
PR target/116444
* config/arm/arm.cc (arm_noce_conversion_profitable_p): Call
default_noce_conversion_profitable_p when not dealing with the
armv8.1-m.main special case.
(arm_is_vsel_fp_insn): New function.
|
|
Since C++20, virtual methods can be constexpr, and if they are
constexpr evaluated, we choose tentative_decl_linkage for them,
defer their output and decide again at at_eof time.
On the following testcases we ICE though, because if
expand_or_defer_fn_1 decides to use tentative_decl_linkage, it
returns true and the caller in that case calls emit_associated_thunks,
where use_thunk, which it calls, asserts DECL_INTERFACE_KNOWN on the
thunk destination, which isn't the case for tentative_decl_linkage.
The following patch fixes the ICE by not emitting the thunks
for the DECL_DEFER_OUTPUT fns just yet but waiting until at_eof
time when we return to those.
Note, the second testcase has ICEd since r0-110035 already with -std=c++0x,
before it gets a chance to diagnose the constexpr virtual method.
2024-11-08 Jakub Jelinek <jakub@redhat.com>
PR c++/117317
* semantics.cc (emit_associated_thunks): Do nothing for
!DECL_INTERFACE_KNOWN && DECL_DEFER_OUTPUT fns.
* g++.dg/cpp2a/pr117317-1.C: New test.
* g++.dg/cpp2a/pr117317-2.C: New test.
|
|
Update test case for armv8.1-m.main that supports conditional
arithmetic.
armv7-m:
push {r4, lr}
ldr r4, .L6
ldr r4, [r4]
lsls r4, r4, #29
it mi
addmi r2, r2, #1
bl bar
movs r0, #0
pop {r4, pc}
armv8.1-m.main:
push {r3, r4, r5, lr}
ldr r4, .L5
ldr r5, [r4]
tst r5, #4
csinc r2, r2, r2, eq
bl bar
movs r0, #0
pop {r3, r4, r5, pc}
gcc/testsuite/ChangeLog:
* gcc.target/arm/epilog-1.c: Use check-function-bodies.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
This fixes reported regression at
https://linaro.atlassian.net/browse/GNU-1407.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr68620.c: Use effective-target
arm_libc_fp_abi.
* lib/target-supports.exp: Define effective-target
arm_libc_fp_abi.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Co-authored-by: Richard Earnshaw <rearnsha@arm.com>
|
|
When building the test case with neon, the 'vst1.32' instruction is used
instead of 'strd'. Allow both variants to make the test pass.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr40457-2.c: Add vst1.32 as an allowed
instruction.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
Using "dg-do run" with a selector overrides the default selector set by
vect.exp that picks between "dg-do run" and "dg-do compile" based on the
target's support for simd operations for Arm targets.
The actual selection of default operation is performed in
check_vect_support_and_set_flags.
gcc/testsuite/ChangeLog:
* g++.dg/vect/pr84556.cc: Change from "dg-do run" with selector
to instead use dg-require-effective-target with the same
selector.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
The following also enables the testcase on x86 as it now has the
required cbranch.
* gcc.dg/vect/vect-early-break_21.c: Remove disabling of
x86_64 and i?86.
|
|
Implement -mcpu options for:
- Cortex-A520AE
- Cortex-A720AE
- Cortex-R82AE
These all implement the same feature sets as their non-AE
counterparts, using the same scheduler and costs and differing only in
their respective part numbers.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (cortex-a520ae,
cortex-a720ae, cortex-r82ae): Define new entries.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document A520AE, A720AE and R82AE CPUs.
|
|
Test uses MVE, so add effective-target arm_fp requirement.
gcc/testsuite/ChangeLog:
* g++.target/arm/mve/general-c++/nomve_fp_1.c: Use
effective-target arm_fp.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
form1:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_1 (T *out, T *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
out[i] = (T)IMM >= in[i] ? (T)IMM - in[i] : 0; \
}
Passed the rv64gcv full regression test.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: add data for vec sat_sub.
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: add unsigned imm vec sat_sub form1.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-4.c: New test.
|
|
I missed a search-and-replace on this test, meaning that it was
duplicating bfmlalb_f32.c.
gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/bfmlslb_f32.c: Replace bfmla*
with bfmls*.
|
|
The svpsel_lane intrinsics were wrongly classified as SME2+ only,
rather than as base SME intrinsics. They should always be available
in streaming mode.
gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_sve_psel<BHSD_BITS>)
(*aarch64_sve_psel<BHSD_BITS>_plus): Require TARGET_STREAMING
rather than TARGET_STREAMING_SME2.
gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b16.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b16.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b32.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b32.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b64.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b64.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b8.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b8.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c16.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c16.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c32.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c32.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c64.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c64.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c8.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c8.c: ...here.
|
|
There are two sets of patterns for FCLAMP: one set for single registers
and one set for multiple registers. The multiple-register set was
correctly gated on SME2, but the single-register set only required SME.
This doesn't matter for ACLE usage, since the intrinsic definitions
are correctly gated. But it does matter for automatic generation of
FCLAMP from separate minimum and maximum operations (either ACLE
intrinsics or autovectorised code).
gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_sve_fclamp<mode>)
(*aarch64_sve_fclamp<mode>_x): Require TARGET_STREAMING_SME2
rather than TARGET_STREAMING_SME.
gcc/testsuite/
* gcc.target/aarch64/sme/clamp_3.c: Force sme2
* gcc.target/aarch64/sme/clamp_4.c: Likewise.
* gcc.target/aarch64/sme/clamp_5.c: New test.
|
|
The BPF-specific .BTF.ext section is always generated for BPF programs
if -gbtf is specified, and generating it requires BTF information and
assumes that the BTF info has already been generated.
Compiling non-C languages to BPF is not supported, nor is generating
CTF/BTF for non-C. But, compiling another language like C++ to BPF
with -gbtf specified meant that we would try to generate the .BTF.ext
section anyway, and then ICE because no BTF information was available.
Add a check to bail out of btf_ext_output if the TU CTFC does not exist,
meaning no BTF info is available.
gcc/
PR target/117447
* config/bpf/btfext-out.cc (btf_ext_output): Bail if TU CTFC is null.
|
|
These maps will always be non-null in btf_finalize under normal
circumstances, but be safe and verify that before trying to empty them.
gcc/
* btfout.cc (btf_finalize): Check that hash maps are non-null before
emptying them.
|
|
r0-126134-g5d2a9da9a7f7c1 added support for short-circuiting and combining the ifs
into using either AND or OR. But it only allowed the inner condition
basic block to contain the conditional only. This changes that to allow up to 2 defining
statements, as long as they are just integer-to-integer conversions for
either the lhs or rhs of the conditional.
This should allow ccmp to be used on aarch64 and x86_64 (APX) slightly more than before.
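A hypothetical example (not a testcase from the patch) of the shape this
now handles: the inner condition block contains an integer-to-integer
conversion feeding the comparison.
int f (int a, short b, int c)
{
  if (a > 0)
    if ((int) b > c)  /* the conversion of b no longer blocks combining */
      return 1;
  return 0;
}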
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/85605
gcc/ChangeLog:
* tree-ssa-ifcombine.cc (can_combine_bbs_with_short_circuit): New function.
(ifcombine_ifandif): Use can_combine_bbs_with_short_circuit
instead of checking if iterator is one before the last statement.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/ifcombine-ccmp-1.C: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-9.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
GIMPLE_COND [PR117414]
Sometimes we get back a full ssa name when looking up the comparison of the GIMPLE_COND
rather than a predicate. We then want to look up `val != 0` for the predicate.
Note this might happen with other boolean assignments and COND_EXPR, but I am not sure
if it is as important; I have not found a testcase yet.
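A hedged sketch of such a situation (hypothetical, not the missing testcase):
the comparison in the GIMPLE_COND value-numbers to the SSA name already
holding t, so the predicate has to be looked up as `t != 0`.
void g (int);
int f (int a, int b)
{
  _Bool t = a > b;
  g (t);            /* the comparison is computed into t first */
  if (a > b)        /* looking up this comparison returns t's SSA name */
    return 1;
  return 0;
}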
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (process_bb): Look up
`val != 0` if we got back an ssa name when looking up the comparison.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-4.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
After the last patch, we also want to record `(A CMP B) != 0`
as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the
true/false edges swapped.
This shows up more due to the new handling of
`(A | B) ==/!= 0` in insert_predicates_for_cond
as now we can notice these comparisons which were not seen before.
This is enough to fix the original issue in `gcc.dg/tree-ssa/pr111456-1.c`
and make sure we don't regress it when enhancing ifcombine.
This adds that predicate and allows us to optimize f
in fre-predicated-3.c.
Changes since v1:
* v2: Use vn_valueize.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): Handle `(A CMP B) !=/== 0`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
For `(a | b) == 0`, we can "assert" on the true edge that
both `a == 0` and `b == 0` but nothing on the false edge.
For `(a | b) != 0`, we can "assert" on the false edge that
both `a == 0` and `b == 0` but nothing on the true edge.
This adds that predicate and allows us to optimize f0, f1,
and f2 in fre-predicated-[12].c.
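A hypothetical illustration (not one of the committed tests) of what the
new predicates enable:
int f0 (int a, int b)
{
  if ((a | b) == 0)
    return a;       /* a == 0 is recorded on this edge, so this is 0 */
  return 1;
}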
Changes since v1:
* v2: Use vn_valueize. Also canonicalize the comparison
at the beginning of insert_predicates_for_cond for
constants to be on the rhs. Return early for
non-ssa names on the lhs (after canonicalization).
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): Canonicalize the comparison.
Don't insert anything if lhs is not a SSA_NAME. Handle `(a | b) !=/== 0`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-1.c: New test.
* gcc.dg/tree-ssa/fre-predicated-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
To make it easier to add more predicates in some cases,
factor out the code. Plus it makes the code slightly more
readable since it is not indented as much.
Bootstrapped and tested on x86_64.
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): New function, factored out from ...
(process_bb): Here.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Until now, the structures that keep pragma information were different
in preprocessing-only mode and in normal mode. This change unifies
both so that the space and name of a pragma are always registered and
can be queried easily at a later time.
gcc/c-family/ChangeLog:
* c-pragma.cc (struct pragma_pp_data): Use (struct internal_pragma_handler);
(c_register_pragma_1): Always register name and space for all pragmas.
(c_invoke_pragma_handler): Adapt.
(c_invoke_early_pragma_handler): Likewise.
(c_pp_invoke_early_pragma_handler): Likewise.
|
|
We currently make vect_check_gather_scatter happy by replacing SSA
name references in DR_REF for gather/scatter DRs, but the replacement
process only works once: for the second epilogue we have SSA
names from the first epilogue in DR_REF, but as we copied from the
original loop the SSA mapping doesn't work.
The following simply punts for non-first epilogues; gather/scatter
recognized by patterns to IFNs are already analyzed and should work
fine.
* tree-vect-data-refs.cc (vect_check_gather_scatter): Refuse
to analyze DR_REF if from an epilogue that's not first.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Add comment
how the substitution in DR_REF is broken.
|
|
The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside
LOOP_VINFO_ORIG_LOOP_INFO so one can access both the main
vectorized loop info and the preceding vectorized epilogue.
This is critical for correctness as we need to disallow never
executed epilogues by costing in vect_analyze_loop_costing as
we assume those do not exist when deciding to add a skip-vector
edge during peeling. The patch also changes how multiple vector
epilogues are handled - instead of the epilogue_vinfos array in
the main loop info we now record the single epilogue_vinfo there
and further epilogues in the epilogue_vinfo member of the
epilogue info. This simplifies code.
* tree-vectorizer.h (_loop_vec_info::main_loop_info): New.
(LOOP_VINFO_MAIN_LOOP_INFO): Likewise.
(_loop_vec_info::epilogue_vinfo): Change from epilogue_vinfos
from array to single element.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
main_loop_info and epilogue_vinfo. Remove epilogue_vinfos
allocation.
(_loop_vec_info::~_loop_vec_info): Do not release epilogue_vinfos.
(vect_create_loop_vinfo): Rename parameter, set
LOOP_VINFO_MAIN_LOOP_INFO.
(vect_analyze_loop_1): Rename parameter.
(vect_analyze_loop_costing): Properly distinguish between
the main vector loop and the preceding epilogue.
(vect_analyze_loop): Change for epilogue_vinfos no longer
being a vector.
* tree-vect-loop-manip.cc (vect_do_peeling): Simplify and
thereby handle a vector epilogue of a vector epilogue.
|
|
The following remembers how we advanced DRs when vectorizing an
epilogue. When we want to vectorize the epilogue of such an epilogue
we have to retain that advancement and add the advancement for this
vectorized epilogue. Due to the way we copy and re-associate
stmt_vec_infos and DRs, recording this advancement and re-applying
it for the next epilogue is simplest.
* tree-vectorizer.h (_loop_vec_info::drs_advanced_by): New.
(LOOP_VINFO_DRS_ADVANCED_BY): Likewise.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
drs_advanced_by.
(update_epilogue_loop_vinfo): Remember the DR advancement made.
(vect_transform_loop): Accumulate past advancements.
|
|
We need to check that an epilogue doesn't require LOOP_VINFO_PEELING_FOR_GAPS
in case the main loop didn't (the other way around is OK); the
computation of whether the epilogue is executed or not gets out of sync
otherwise.
* tree-vect-loop.cc (vect_analyze_loop_2): Move
vect_analyze_loop_costing after check whether we can do
peeling. Add check on LOOP_VINFO_PEELING_FOR_GAPS for
epilogues.
|
|
On Fri, Oct 18, 2024 at 02:05:59PM -0400, Antoni Boucher wrote:
> PR target/116725
> * gcc.target/i386/pr116725.c: Add test using those AVX builtins.
This test FAILs for me, as I don't have the latest gas around and the test
is dg-do assemble, so it doesn't just need a fixed compiler, but also an assembler
which supports those instructions.
The following patch adds effective target directives to ensure assembler
supports those too.
2024-11-07 Jakub Jelinek <jakub@redhat.com>
PR target/116725
* gcc.target/i386/pr116725.c: Add dg-require-effective-target
avx512{dq,fp16,vl}.
|
|
Apparently we need to explicitly disable AVX, not just enable SSE, to
guarantee the 16-lane vectors we need for the pattern match.
libgomp/ChangeLog:
* testsuite/libgomp.c/max_vf-1.c: Add -mno-avx.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/max_vf-1.c: Add -mno-avx.
|
|
This patch would like to add doc for the below 2 standard names.
1. strided load: v = mask_len_strided_load (ptr, stride, mask, len, bias)
2. strided store: mask_len_strided_store (ptr, stride, v, mask, len, bias)
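Conceptually (a hedged scalar sketch using the argument names above, with
the stride counted in elements; this is not the exact optab wording):
for (i = 0; i < len + bias; i++)
  if (mask[i])
    v[i] = ptr[i * stride];   /* the store form writes v[i] back instead */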
gcc/ChangeLog:
* doc/md.texi: Add doc for mask_len_strided_load{store}.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
ext-dce uses TV_NONE; that's not OK for a pass taking 33% compile-time.
The following adds a timevar to it for proper blaming.
PR rtl-optimization/117467
* timevar.def (TV_EXT_DCE): New.
* ext-dce.cc (pass_data_ext_dce): Use TV_EXT_DCE.
|
|
We recently added support for cbranchbf4 with AVX10_2 native bf16 comi
instructions, so do similarly for cstorebf4.
gcc/ChangeLog:
* config/i386/i386.md (cstorebf4): Use vcomsbf16 under
TARGET_AVX10_2_256 and -fno-trapping-math.
(cbranchbf4): Adjust formatting.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-comibf-3.c: New test.
* gcc.target/i386/avx10_2-comibf-4.c: Likewise.
|
|
Since the test doesn't care if the hint is correct,
modify the regexp of the hint part to avoid future
changes to the hint that would cause the test to fail.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr117304-1.c: Modify regexp.
|
|
It became apparent that conditions could be combined that had deep SSA
dependency trees, which might thus require moving lots of statements.
Set a hard upper bound for now, hopefully to be replaced by a
dynamically computed bound, based on probabilities and costs.
Also reset flow sensitive info and avoid introducing undefined
behavior when moving stmts from under guarding conditions.
Finally, rework the preexisting reset of flow sensitive info and
avoidance of undefined behavior to be done when needed on all affected
inner blocks: reset flow info whenever enclosing conditions change,
and avoid undefined behavior whenever enclosing conditions become
laxer.
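A hedged sketch of the undefined-behavior avoidance: when a statement such
as a signed addition is moved out from under its guarding condition, it can
be rewritten to use unsigned arithmetic, whose overflow is well defined
(illustrative C with made-up operands, not the actual implementation):
  /* before, only evaluated when the guard holds:  tem = a + b;  */
  /* after being moved above the guard:  */
  tem = (int) ((unsigned int) a + (unsigned int) b);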
for gcc/ChangeLog
* tree-ssa-ifcombine.cc
(ifcombine_rewrite_to_defined_overflow): New.
(ifcombine_replace_cond): Reject conds that would require
moving too many stmts. Reset flow sensitive info and avoid
undefined behavior in moved stmts. Reset flow sensitive info
in all inner blocks when the outer condition changes, and
avoid undefined behavior whenever the outer condition becomes
laxer, adapted and moved from...
(pass_tree_ifcombine::execute): ... here.
|
|
The upcoming move of fold_truth_andor to ifcombine brings with it the
possibility of TRUTH_ANDIF cond exprs. Handle them by splitting the
cond so as to best use both BB insertion points, but only if they're
contiguous.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Support
TRUTH_ANDIF cond exprs.
|
|
Rework ifcombine to support merging conditions from noncontiguous
blocks. This depends on earlier preparation changes.
The function that attempted to ifcombine a block with its immediate
predecessor, tree_ssa_ifcombine_bb, now loops over dominating blocks
eligible for ifcombine, attempting to combine with them.
The function that actually drives the combination of a pair of blocks,
tree_ssa_ifcombine_bb_1, now takes an additional parameter: the
successor of outer that leads to inner.
The function that recognizes if_then_else patterns is modified to
enable testing without distinguishing between then and else, or to
require nondegenerate conditions, since degenerate ones aren't worth combining with.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (recognize_if_then_else): Support
relaxed then/else testing; require nondegenerate condition
otherwise.
(tree_ssa_ifcombine_bb_1): Add outer_succ_bb parm, use it
instead of inner_cond_bb. Adjust callers.
(tree_ssa_ifcombine_bb): Loop over dominating outer blocks
eligible for ifcombine.
(pass_tree_ifcombine::execute): Noted potential need for
changes to the post-combine logic.
|
|
Prepare to handle noncontiguous ifcombine, introducing logic to modify
the outer condition when needed. There are two cases worth
mentioning:
- when blocks are noncontiguous, we have to place the combined
condition in the outer block to avoid pessimizing carefully crafted
short-circuited tests;
- even when blocks are contiguous, we prepare for situations in which
the combined condition has two tests, one to be placed in outer and
the other in inner. This circumstance will not come up when
noncontiguous ifcombine is first enabled, but it will when
an improved fold_truth_andor is integrated with ifcombine.
Combining the condition from inner into outer may require moving SSA
DEFs used in the inner condition, and the changes implement this as
well.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc: Include bitmap.h.
(ifcombine_mark_ssa_name): New.
(struct ifcombine_mark_ssa_name_t): New.
(ifcombine_mark_ssa_name_walk): New.
(ifcombine_replace_cond): Prepare to handle noncontiguous and
split-condition ifcombine.
|
|
Prepare for ifcombining noncontiguous blocks, adding (still unused)
logic to the ifcombine profile updater to handle such cases.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (known_succ_p): New.
(update_profile_after_ifcombine): Handle noncontiguous blocks.
|
|
Refactor ifcombine_ifandif, moving the common code from the various
paths that apply the combined condition to a new function.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out
of...
(ifcombine_ifandif): ... this. Leave it for the above to
gimplify and invert the condition.
|
|
In preparation to changes that may modify both inner and outer
conditions in ifcombine, drop the redundant parameter result_inv, that
is always identical to inner_inv.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant
result_inv parm. Adjust all callers.
|
|
Disallowing vuses in blocks for ifcombine is too strict, and it
prevents usefully moving fold_truth_andor into ifcombine. That
tree-level folder has long ifcombined loads, absent other relevant
side effects.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (bb_no_side_effects_p): Allow vuses,
but not vdefs.
|
|
Multiple tests fail on ia32 with -fPIE enabled by default because of
different call sequences required by the call-saved PIC register
(no-callee-saved-*.c), uses of the constant pool instead of computing
constants (pr100865-*.c), and unexpected matches of esp in get_pc_thunk
(sse2-stv-1.c). Disable PIE on them, to match the expectations.
for gcc/testsuite/ChangeLog
* gcc.target/i386/no-callee-saved-13.c: Disable PIE on ia32.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/sse2-stv-1.c: Likewise.
|
|
When we select a non-bx get_pc_thunk, we get an extra mov to set up
the PIC register before the abort call. Expect that mov or a
get_pc_thunk.bx call.
for gcc/testsuite/ChangeLog
* gcc.target/i386/pr70321.c: Cope with non-bx get_pc_thunk.
|
|
This patch adds testcases for form1, as shown below:
T __attribute__((noinline)) \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{ \
T sum = (UT)x + (UT)IMM; \
return (x ^ IMM) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
}
Passed the rv64gcv regression test.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Support signed
imm SAT_ADD form1.
* gcc.target/riscv/sat_s_add_imm-1-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-3-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-4.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-4.c: New test.
|
|
This patch would like to support .SAT_ADD when one of the ops
is a signed IMM.
Form1:
T __attribute__((noinline)) \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{ \
T sum = (UT)x + (UT)IMM; \
return (x ^ IMM) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
}
Take the below form1 as an example:
DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -10, INT8_MIN, INT8_MAX)
Before this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
int8_t sum;
unsigned char x.0_1;
unsigned char _2;
signed char _4;
int8_t _5;
_Bool _9;
signed char _10;
signed char _11;
signed char _12;
signed char _14;
signed char _16;
<bb 2> [local count: 1073741824]:
x.0_1 = (unsigned char) x_6(D);
_2 = x.0_1 + 246;
sum_7 = (int8_t) _2;
_4 = x_6(D) ^ sum_7;
_16 = x_6(D) ^ 9;
_14 = _4 & _16;
if (_14 < 0)
goto <bb 3>; [41.00%]
else
goto <bb 4>; [59.00%]
<bb 3> [local count: 259738147]:
_9 = x_6(D) < 0;
_10 = (signed char) _9;
_11 = -_10;
_12 = _11 ^ 127;
<bb 4> [local count: 1073741824]:
# _5 = PHI <sum_7(2), _12(3)>
return _5;
}
After this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
int8_t _5;
<bb 2> [local count: 1073741824]:
_5 = .SAT_ADD (x_6(D), -10); [tail call]
return _5;
}
The below test suites passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:
* match.pd: Add the form1 of signed imm .SAT_ADD matching.
* tree-ssa-math-opts.cc (match_saturation_add): Add fold
convert for const_int to the type of operand 0.
|
|
Since avx10_2-comibf-2.c is a run test, require AVX10.2 support.
* gcc.target/i386/avx10_2-comibf-2.c: Require avx10_2 target.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
This patch adds optimization of the following patterns:
(zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
(xor:M (zero_extend:M (subreg:N (X:M)), mask))
... where the mask is GET_MODE_MASK (N).
For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M)
and whole optimization will be:
(zero_extend:M (subreg:N (not:M (X:M)))) ->
(xor:M (X:M, mask))
The patch targets code patterns like:
not a0,a0
andi a0,a0,0xff
to be optimized to:
xori a0,a0,255
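A hypothetical C function that produces such a sequence (a sketch, not
necessarily the committed testcase):
unsigned char
f (unsigned int x)
{
  return ~x;   /* not + mask of the low byte, now folded to a single xori */
}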
The change was locally tested for x86_64 and AArch64 (as most common)
and for RV-64 and MIPS-32 targets (as having an effect from this optimization):
no regressions in any of these cases.
PR rtl-optimization/112398
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG))
when X doesn't have any non-zero bits outside of SUBREG mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr112398.c: New test.
Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>
|