Age | Commit message (Collapse) | Author | Files | Lines |
|
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h (_Hashtable): Fix comment grammar.
|
|
I missed a search-and-replace on this test, meaning that it was
duplicating bfmlalb_f32.c.
gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/bfmlslb_f32.c: Replace bfmla*
with bfmls*
|
|
The svpsel_lane intrinsics were wrongly classified as SME2+ only,
rather than as base SME intrinsics. They should always be available
in streaming mode.
gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_sve_psel<BHSD_BITS>)
(*aarch64_sve_psel<BHSD_BITS>_plus): Require TARGET_STREAMING
rather than TARGET_STREAMING_SME2.
gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b16.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b16.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b32.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b32.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b64.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b64.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b8.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b8.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c16.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c16.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c32.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c32.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c64.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c64.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c8.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c8.c: ...here.
|
|
There are two sets of patterns for FCLAMP: one set for single registers
and one set for multiple registers. The multiple-register set was
correctly gated on SME2, but the single-register set only required SME.
This doesn't matter for ACLE usage, since the intrinsic definitions
are correctly gated. But it does matter for automatic generation of
FCLAMP from separate minimum and maximum operations (either ACLE
intrinsics or autovectorised code).
gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_sve_fclamp<mode>)
(*aarch64_sve_fclamp<mode>_x): Require TARGET_STREAMING_SME2
rather than TARGET_STREAMING_SME.
gcc/testsuite/
* gcc.target/aarch64/sme/clamp_3.c: Force sme2
* gcc.target/aarch64/sme/clamp_4.c: Likewise.
* gcc.target/aarch64/sme/clamp_5.c: New test.
|
|
The BPF-specific .BTF.ext section is always generated for BPF programs
if -gbtf is specified, and generating it requires BTF information and
assumes that the BTF info has already been generated.
Compiling non-C languages to BPF is not supported, nor is generating
CTF/BTF for non-C. But, compiling another language like C++ to BPF
with -gbtf specified meant that we would try to generate the .BTF.ext
section anyway, and then ICE because no BTF information was available.
Add a check to bail out of btf_ext_output if the TU CTFC does not exist,
meaning no BTF info is available.
gcc/
PR target/117447
* config/bpf/btfext-out.cc (btf_ext_output): Bail if TU CTFC is null.
|
|
These maps will always be non-null in btf_finalize under normal
circumstances, but be safe and verify that before trying to empty them.
gcc/
* btfout.cc (btf_finalize): Check that hash maps are non-null before
emptying them.
|
|
r0-126134-g5d2a9da9a7f7c1 added support for circuiting and combing the ifs
into using either AND or OR. But it only allowed the inner condition
basic block having the conditional only. This changes to allow up to 2 defining
statements as long as they are just integer to integer conversions for
either the lhs or rhs of the conditional.
This should allow to use ccmp on aarch64 and x86_64 (APX) slightly more than before.
Boootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/85605
gcc/ChangeLog:
* tree-ssa-ifcombine.cc (can_combine_bbs_with_short_circuit): New function.
(ifcombine_ifandif): Use can_combine_bbs_with_short_circuit
instead of checking if iterator is one before the last statement.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/ifcombine-ccmp-1.C: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-9.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
GIMPLE_COND [PR117414]
Sometimes we get back a full ssa name when looking up the comparison of the GIMPLE_COND
rather than a predicate. We then want to lookup the `val != 0` for the predicate.
Note this might happen with other boolean assignments and COND_EXPR but I am not sure
if it is as important; I have not found a testcase yet.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (process_bb): Lookup
`val != 0` if got back a ssa name when looking the comparison.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-4.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
After the last patch, we also want to record `(A CMP B) != 0`
as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the
true/false edges swapped.
This shows up more due to the new handling of
`(A | B) ==/!= 0` in insert_predicates_for_cond
as now we can notice these comparisons which were not seen before.
This is enough to fix the original issue in `gcc.dg/tree-ssa/pr111456-1.c`
and make sure we don't regress it when enhancing ifcombine.
This adds that predicate and allows us to optimize f
in fre-predicated-3.c.
Changes since v1:
* v2: Use vn_valueize.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): Handle `(A CMP B) !=/== 0`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
For `(a | b) == 0`, we can "assert" on the true edge that
both `a == 0` and `b == 0` but nothing on the false edge.
For `(a | b) != 0`, we can "assert" on the false edge that
both `a == 0` and `b == 0` but nothing on the true edge.
This adds that predicate and allows us to optimize f0, f1,
and f2 in fre-predicated-[12].c.
Changes since v1:
* v2: Use vn_valueize. Also canonicalize the comparison
at the begining of insert_predicates_for_cond for
constants to be on the rhs. Return early for
non-ssa names on the lhs (after canonicalization).
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): Canonicalize the comparison.
Don't insert anything if lhs is not a SSA_NAME. Handle `(a | b) !=/== 0`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-1.c: New test.
* gcc.dg/tree-ssa/fre-predicated-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
To make it easier to add more predicates in some cases,
factor out the code. Plus it makes the code slightly more
readable since it is not indented as much.
Bootstrapped and tested on x86_64.
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): New function, factored out from ...
(process_bb): Here.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
std::is_permutation is only used in <bits/hashtable.h> not in
<bits/hashtable_policy.h>, so move the comment referring to it.
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h: Add is_permutation to comment.
* include/bits/hashtable_policy.h: Remove it from comment.
|
|
And tweak grammar in a couple of comments.
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h: Fix spelling in comment.
|
|
libgomp/ChangeLog:
* libgomp.texi (OpenMP Technical Report 13): Remove 'iterator'
in 'map' clause of 'declare mapper' as it is already the list above.
(Interoperability Routines): Add.
(omp_target_memcpy_async, omp_target_memcpy_rect_async):
Document that depobj_list may be omitted in C++ and Fortran.
|
|
Until now, the structures that keep pragma information were different
when in preprocessing only mode and in normal mode. This change unifies
both so that the space and name of a pragma are always registered and
can be queried easily at a later time.
gcc/c-family/ChangeLog:
* c-pragma.cc (struct pragma_pp_data): Use (struct internal_pragma_handler);
(c_register_pragma_1): Always register name and space for all pragmas.
(c_invoke_pragma_handler): Adapt.
(c_invoke_early_pragma_handler): Likewise.
(c_pp_invoke_early_pragma_handler): Likewise.
|
|
We currently make vect_check_gather_scatter happy by replacing SSA
name references in DR_REF for gather/scatter DRs but the replacement
process only works once since for the second epilogue we have SSA
names from the first epilogue in DR_REF but as we copied from the
original loop the SSA mapping doesn't work.
The following simply punts for non-first epilogues, those gather/scatter
recognized by patterns to IFNs are already analyzed and should work
fine.
* tree-vect-data-refs.cc (vect_check_gather_scatter): Refuse
to analyze DR_REF if from an epilogue that's not first.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Add comment
how the substitution in DR_REF is broken.
|
|
The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside
LOOP_VINFO_ORIG_LOOP_INFO so one can have both access to the main
vectorized loop info and the preceeding vectorized epilogue.
This is critical for correctness as we need to disallow never
executed epilogues by costing in vect_analyze_loop_costing as
we assume those do not exist when deciding to add a skip-vector
edge during peeling. The patch also changes how multiple vector
epilogues are handled - instead of the epilogue_vinfos array in
the main loop info we now record the single epilogue_vinfo there
and further epilogues in the epilogue_vinfo member of the
epilogue info. This simplifies code.
* tree-vectorizer.h (_loop_vec_info::main_loop_info): New.
(LOOP_VINFO_MAIN_LOOP_INFO): Likewise.
(_loop_vec_info::epilogue_vinfo): Change from epilogue_vinfos
from array to single element.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
main_loop_info and epilogue_vinfo. Remove epilogue_vinfos
allocation.
(_loop_vec_info::~_loop_vec_info): Do not release epilogue_vinfos.
(vect_create_loop_vinfo): Rename parameter, set
LOOP_VINFO_MAIN_LOOP_INFO.
(vect_analyze_loop_1): Rename parameter.
(vect_analyze_loop_costing): Properly distinguish between
the main vector loop and the preceeding epilogue.
(vect_analyze_loop): Change for epilogue_vinfos no longer
being a vector.
* tree-vect-loop-manip.cc (vect_do_peeling): Simplify and
thereby handle a vector epilogue of a vector epilogue.
|
|
The following remembers how we advanced DRs when vectorizing an
epilogue. When we want to vectorize the epilogue of such epilogue
we have to retain that advancement and add the advancement for this
vectorized epilogue. Due to the way we copy and re-associate
stmt_vec_infos and DRs recording this advancement and re-applying
it for the next epilogue is simplest.
* tree-vectorizer.h (_loop_vec_info::drs_advanced_by): New.
(LOOP_VINFO_DRS_ADVANCED_BY): Likewise.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
drs_advanced_by.
(update_epilogue_loop_vinfo): Remember the DR advancement made.
(vect_transform_loop): Accumulate past advancements.
|
|
We need to check that an epilogue doesn't require LOOP_VINFO_PEELING_FOR_GAPS
in case the main loop didn't (the other way around is OK), the
computation whether the epilog is executed or not gets our of sync
otherwise.
* tree-vect-loop.cc (vect_analyze_loop_2): Move
vect_analyze_loop_costing after check whether we can do
peeling. Add check on LOOP_VINFO_PEELING_FOR_GAPS for
epilogues.
|
|
On Fri, Oct 18, 2024 at 02:05:59PM -0400, Antoni Boucher wrote:
> PR target/116725
> * gcc.target/i386/pr116725.c: Add test using those AVX builtins.
This test FAILs for me, as I don't have the latest gas around and the test
is dg-do assemble, so doesn't need just fixed compiler, but also assembler
which supports those instructions.
The following patch adds effective target directives to ensure assembler
supports those too.
2024-11-07 Jakub Jelinek <jakub@redhat.com>
PR target/116725
* gcc.target/i386/pr116725.c: Add dg-require-effective-target
avx512{dq,fp16,vl}.
|
|
Apparently we need to explicitly disable AVX, not just enabled SSE, to
guarentee the 16-lane vectors we need for the pattern match.
libgomp/ChangeLog:
* testsuite/libgomp.c/max_vf-1.c: Add -mno-avx.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/max_vf-1.c: Add -mno-avx.
|
|
This patch would like to add doc for the below 2 standard names.
1. strided load: v = mask_len_strided_load (ptr, stried, mask, len, bias)
2. strided store: mask_len_stried_store (ptr, stride, v, mask, len, bias)
gcc/ChangeLog:
* doc/md.texi: Add doc for mask_len_stried_load{store}.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
ext-dce uses TV_NONE, that's not OK for a pass taking 33% compile-time.
The following adds a timevar to it for proper blaming.
PR rtl-optimization/117467
* timevar.def (TV_EXT_DCE): New.
* ext-dce.cc (pass_data_ext_dce): Use TV_EXT_DCE.
|
|
We recently supports cbranchbf4 with AVX10_2 native bf16 comi
instructions, so do similar to cstorebf4.
gcc/ChangeLog:
* config/i386/i386.md (cstorebf4): Use vcomsbf16 under
TARGET_AVX10_2_256 and -fno-trapping-math.
(cbranchbf4): Adjust formatting.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-comibf-3.c: New test.
* gcc.target/i386/avx10_2-comibf-4.c: Likewise.
|
|
Since the test doesn't care if the hint is correct,
modify the regexp of the hint part to avoid future
changes to the hint that would cause the test to fail.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr117304-1.c: Modify regexp.
|
|
It became apparent that conditions could be combined that had deep SSA
dependency trees, that might thus require moving lots of statements.
Set a hard upper bound for now, hopefully to be replaced by a
dynamically computed bound, based on probabilities and costs.
Also reset flow sensitive info and avoid introducing undefined
behavior when moving stmts from under guarding conditions.
Finally, rework the preexisting reset of flow sensitive info and
avoidance of undefined behavior to be done when needed on all affected
inner blocks: reset flow info whenever enclosing conditions change,
and avoid undefined behavior whenever enclosing conditions become
laxer.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc
(ifcombine_rewrite_to_defined_overflow): New.
(ifcombine_replace_cond): Reject conds that would require
moving too many stmts. Reset flow sensitive info and avoid
undefined behavior in moved stmts. Reset flow sensitive info
in all inner blocks when the outer condition changes, and
avoid undefined behavior whenever the outer condition becomes
laxer, adapted and moved from...
(pass_tree_ifcombine::execute): ... here.
|
|
The upcoming move of fold_truth_andor to ifcombine brings with it the
possibility of TRUTH_ANDIF cond exprs. Handle them by splitting the
cond so as to best use both BB insertion points, but only if they're
contiguous.
for gcc/ChangeLog
* tree-ssa-ifcombine.c (ifcombine_replace_cond): Support
TRUTH_ANDIF cond exprs.
|
|
Rework ifcombine to support merging conditions from noncontiguous
blocks. This depends on earlier preparation changes.
The function that attempted to ifcombine a block with its immediate
predecessor, tree_ssa_ifcombine_bb, now loops over dominating blocks
eligible for ifcombine, attempting to combine with them.
The function that actually drives the combination of a pair of blocks,
tree_ssa_ifcombine_bb_1, now takes an additional parameter: the
successor of outer that leads to inner.
The function that recognizes if_then_else patterns is modified to
enable testing without distinguishing between then and else, or to
require nondegenerate conditions, that aren't worth combining with.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (recognize_if_then_else): Support
relaxed then/else testing; require nondegenerate condition
otherwise.
(tree_ssa_ifcombine_bb_1): Add outer_succ_bb parm, use it
instead of inner_cond_bb. Adjust callers.
(tree_ssa_ifcombine_bb): Loop over dominating outer blocks
eligible for ifcombine.
(pass_tree_ifcombine::execute): Noted potential need for
changes to the post-combine logic.
|
|
Prepare to handle noncontiguous ifcombine, introducing logic to modify
the outer condition when needed. There are two cases worth
mentioning:
- when blocks are noncontiguous, we have to place the combined
condition in the outer block to avoid pessimizing carefully crafted
short-circuited tests;
- even when blocks are contiguous, we prepare for situations in which
the combined condition has two tests, one to be placed in outer and
the other in inner. This circumstance will not come up when
noncontiguous ifcombine is first enabled, but it will when
an improved fold_truth_andor is integrated with ifcombine.
Combining the condition from inner into outer may require moving SSA
DEFs used in the inner condition, and the changes implement this as
well.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc: Include bitmap.h.
(ifcombine_mark_ssa_name): New.
(struct ifcombine_mark_ssa_name_t): New.
(ifcombine_mark_ssa_name_walk): New.
(ifcombine_replace_cond): Prepare to handle noncontiguous and
split-condition ifcombine.
|
|
Prepare for ifcombining noncontiguous blocks, adding (still unused)
logic to the ifcombine profile updater to handle such cases.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (known_succ_p): New.
(update_profile_after_ifcombine): Handle noncontiguous blocks.
|
|
Refactor ifcombine_ifandif, moving the common code from the various
paths that apply the combined condition to a new function.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out
of...
(ifcombine_ifandif): ... this. Leave it for the above to
gimplify and invert the condition.
|
|
In preparation to changes that may modify both inner and outer
conditions in ifcombine, drop the redundant parameter result_inv, that
is always identical to inner_inv.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant
result_inv parm. Adjust all callers.
|
|
Disallowing vuses in blocks for ifcombine is too strict, and it
prevents usefully moving fold_truth_andor into ifcombine. That
tree-level folder has long ifcombined loads, absent other relevant
side effects.
for gcc/ChangeLog
* tree-ssa-ifcombine.c (bb_no_side_effects_p): Allow vuses,
but not vdefs.
|
|
Multiple tests fail on ia32 with -fPIE enabled by default because of
different call sequences required by the call-saved PIC register
(no-callee-saved-*.c), uses of the constant pool instead of computing
constants (pr100865-*.c), and unexpected matches of esp in get_pc_thunk
(sse2-stv-1.c). Disable PIE on them, to match the expectations.
for gcc/testsuite/ChangeLog
* gcc.target/i386/no-callee-saved-13.c: Disable PIE on ia32.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/sse2-stv-1.c: Likewise.
|
|
When we select a non-bx get_pc_thunk, we get an extra mov to set up
the PIC register before the abort call. Expect that mov or a
get_pc_thunk.bx call.
for gcc/testsuite/ChangeLog
* gcc.target/i386/pr70321.c: Cope with non-bx get_pc_thunk.
|
|
This patch adds testcase for form1, as shown below:
T __attribute__((noinline)) \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{ \
T sum = (UT)x + (UT)IMM; \
return (x ^ IMM) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
}
Passed the rv64gcv regression test.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Support signed
imm SAT_ADD form1.
* gcc.target/riscv/sat_s_add_imm-1-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-3-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-4.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-4.c: New test.
|
|
This patch would like to support .SAT_ADD when one of the op
is singed IMM.
Form1:
T __attribute__((noinline)) \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{ \
T sum = (UT)x + (UT)IMM; \
return (x ^ IMM) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
}
Take below form1 as example:
DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -10, INT8_MIN, INT8_MAX)
Before this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
int8_t sum;
unsigned char x.0_1;
unsigned char _2;
signed char _4;
int8_t _5;
_Bool _9;
signed char _10;
signed char _11;
signed char _12;
signed char _14;
signed char _16;
<bb 2> [local count: 1073741824]:
x.0_1 = (unsigned char) x_6(D);
_2 = x.0_1 + 246;
sum_7 = (int8_t) _2;
_4 = x_6(D) ^ sum_7;
_16 = x_6(D) ^ 9;
_14 = _4 & _16;
if (_14 < 0)
goto <bb 3>; [41.00%]
else
goto <bb 4>; [59.00%]
<bb 3> [local count: 259738147]:
_9 = x_6(D) < 0;
_10 = (signed char) _9;
_11 = -_10;
_12 = _11 ^ 127;
<bb 4> [local count: 1073741824]:
# _5 = PHI <sum_7(2), _12(3)>
return _5;
}
After this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
int8_t _5;
<bb 2> [local count: 1073741824]:
_5 = .SAT_ADD (x_6(D), -10); [tail call]
return _5;
}
The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:
* match.pd: Add the form1 of signed imm .SAT_ADD matching.
* tree-ssa-math-opts.cc (match_saturation_add): Add fold
convert for const_int to the type of operand 0.
|
|
|
|
Since avx10_2-comibf-2.c is a run test, require AVX10.2 support.
* gcc.target/i386/avx10_2-comibf-2.c: Require avx10_2 target.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
This patch adds optimization of the following patterns:
(zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
(xor:M (zero_extend:M (subreg:N (X:M)), mask))
... where the mask is GET_MODE_MASK (N).
For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M)
and whole optimization will be:
(zero_extend:M (subreg:N (not:M (X:M)))) ->
(xor:M (X:M, mask))
Patch targets to handle code patterns like:
not a0,a0
andi a0,a0,0xff
to be optimized to:
xori a0,a0,255
Change was locally tested for x86_64 and AArch64 (as most common)
and for RV-64 and MIPS-32 targets (as having an effect from this optimization):
no regressions for all cases.
PR rtl-optimization/112398
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG))
when X doesn't have any non-zero bits outside of SUBREG mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr112398.c: New test.
Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>
|
|
cdtor_record needs to have an unsigned entry for the position in order to
match with vec_safe_length.
gcc/ChangeLog:
* config/darwin.cc (cdtor_record): Make position unsigned.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
My previous patch broke things when building with Werror.
gcc/ChangeLog:
* omp-general.cc (omp_max_vf): Cast the constant to poly_uint64.
|
|
Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when
offloading is enabled ("target" directives are present), and is inactive
otherwise.
libgomp/ChangeLog:
* testsuite/libgomp.c/max_vf-1.c: New test.
* testsuite/libgomp.c/max_vf-2.c: New test.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/max_vf-1.c: New test.
|
|
Delay omp_max_vf call until after the host and device compilers have diverged
so that the max_vf value can be tuned exactly right on both variants.
This change means that the ompdevlow pass must be enabled for functions that
use OpenMP directives with both "simd" and "schedule" enabled.
gcc/ChangeLog:
* internal-fn.cc (expand_GOMP_MAX_VF): New function.
* internal-fn.def (GOMP_MAX_VF): New internal function.
* omp-expand.cc (omp_adjust_chunk_size): Emit IFN_GOMP_MAX_VF when
called in offload context, otherwise assume host context.
* omp-offload.cc (execute_omp_device_lower): Expand IFN_GOMP_MAX_VF.
|
|
The chunk size for SIMD loops should be right for the current device; too big
allocates too much memory, too small is inefficient. Getting it wrong doesn't
actually break anything though.
This patch attempts to choose the optimal setting based on the context. Both
host-fallback and device will get the same chunk size, but device performance
is the most important in this case.
gcc/ChangeLog:
* omp-expand.cc (is_in_offload_region): New function.
(omp_adjust_chunk_size): Add pass-through "offload" parameter.
(get_ws_args_for): Likewise.
(determine_parallel_type): Use is_in_offload_region to adjust call to
get_ws_args_for.
(expand_omp_for_generic): Likewise.
(expand_omp_for_static_chunk): Likewise.
|
|
If requested, return the vectorization factor appropriate for the offload
device, if any.
This change gives a significant speedup in the BabelStream "dot" benchmark on
amdgcn.
The omp_adjust_chunk_size usecase is set "false", for now, but I intend to
change that in a follow-up patch.
Note that NVPTX SIMT offload does not use this code-path.
gcc/ChangeLog:
* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Set
omp_max_vf to offload == false.
* omp-expand.cc (omp_adjust_chunk_size): Likewise.
* omp-general.cc (omp_max_vf): Add "offload" parameter, and detect
amdgcn offload devices.
* omp-general.h (omp_max_vf): Likewise.
* omp-low.cc (lower_rec_simd_input_clauses): Pass offload state to
omp_max_vf.
|
|
The Assume pass simply produces results, with no indication of how it
arrived as the results it gets. Add some output to the details listing.
The only functional change is when gori is used to calculate a range
more than once (ie, multiple uses), we now load the merged range rather
than just using the last calculated one.
* tree-assume.cc (assume_query::assume_query): Add debug output.
(assume_query::update_parms): Likewise.
(assume_query::calculate_phi): Likewise.
(assume_query::calculate_op): Likewise. Also pick up any
merged path values.
(assume_query::calculate_stmt): Likewise.
|
|
gcc/testsuite/ChangeLog:
PR c++/63388
* g++.dg/analyzer/infinite-recursion-pr63388.C: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
gcc/ChangeLog:
* diagnostic.h (class diagnostic_context): Fix typo in leading
comment.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
These headers make no sense for C++ programs, because they either define
different content to the corresponding <xxx.h> C header, or define
nothing at all in namespace std. They were all deprecated in C++17, so
add deprecation warnings to them, which can be disabled with
-Wno-deprecated. For C++20 and later these headers are no longer in the
standard at all, so compiling with _GLIBCXX_USE_DEPRECATED defined to 0
will give an error when they are included.
Because #warning is non-standard before C++23 we need to use pragmas to
ignore -Wc++23-extensions for the -Wsystem-headers -pedantic case.
One g++ test needs adjustment because it includes <ciso646>, but that
can be made conditional on the __cplusplus value without any reduction
in test coverage.
For the library tests, consolidate the std_c++0x_neg.cc XFAIL tests into
the macros.cc test, using dg-error with a { target c++98_only }
selector. This avoids having two separate test files, one for C++98 and
one for everything later. Also add tests for the <xxx.h> headers to
ensure that they behave as expected and don't give deprecated warnings.
libstdc++-v3/ChangeLog:
* doc/xml/manual/evolution.xml: Document deprecations.
* doc/html/*: Regenerate.
* include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move
include guard to start of file. Include <complex> directly
instead of <ccomplex>.
* include/c_compatibility/tgmath.h: Include <cmath> and
<complex> directly, instead of <ctgmath>.
* include/c_global/ccomplex: Add deprecated #warning for C++17
and #error for C++20 if _GLIBCXX_USE_DEPRECATED == 0.
* include/c_global/ciso646: Likewise.
* include/c_global/cstdalign: Likewise.
* include/c_global/cstdbool: Likewise.
* include/c_global/ctgmath: Likewise.
* include/c_std/ciso646: Likewise.
* include/precompiled/stdc++.h: Do not include ccomplex,
ciso646, cstdalign, cstdbool, or ctgmath in C++17 and later.
* testsuite/18_support/headers/cstdalign/macros.cc: Check for
warnings and errors for unsupported dialects.
* testsuite/18_support/headers/cstdbool/macros.cc: Likewise.
* testsuite/26_numerics/headers/ctgmath/complex.cc: Likewise.
* testsuite/27_io/objects/char/1.cc: Do not include <ciso646>.
* testsuite/27_io/objects/wchar_t/1.cc: Likewise.
* testsuite/18_support/headers/cstdbool/std_c++0x_neg.cc: Removed.
* testsuite/18_support/headers/cstdalign/std_c++0x_neg.cc: Removed.
* testsuite/26_numerics/headers/ccomplex/std_c++0x_neg.cc: Removed.
* testsuite/26_numerics/headers/ctgmath/std_c++0x_neg.cc: Removed.
* testsuite/18_support/headers/ciso646/macros.cc: New test.
* testsuite/18_support/headers/ciso646/macros.h.cc: New test.
* testsuite/18_support/headers/cstdbool/macros.h.cc: New test.
* testsuite/26_numerics/headers/ccomplex/complex.cc: New test.
* testsuite/26_numerics/headers/ccomplex/complex.h.cc: New test.
* testsuite/26_numerics/headers/ctgmath/complex.h.cc: New test.
gcc/testsuite/ChangeLog:
* g++.old-deja/g++.other/headers1.C: Do not include ciso646 for
C++17 and later.
|