Age  Commit message  Author  Files  Lines
2024-11-07libstdc++: Fix grammar in comment, againJonathan Wakely1-1/+1
libstdc++-v3/ChangeLog: * include/bits/hashtable.h (_Hashtable): Fix comment grammar.
2024-11-07aarch64: Fix gcc.target/aarch64/sme2/acle-asm/bfmlslb_f32.cRichard Sandiford1-30/+30
I missed a search-and-replace on this test, meaning that it was duplicating bfmlalb_f32.c. gcc/testsuite/ * gcc.target/aarch64/sme2/acle-asm/bfmlslb_f32.c: Replace bfmla* with bfmls*
2024-11-07aarch64: Make PSEL dependent on SME rather than SME2Richard Sandiford9-10/+10
The svpsel_lane intrinsics were wrongly classified as SME2+ only, rather than as base SME intrinsics. They should always be available in streaming mode. gcc/ * config/aarch64/aarch64-sve2.md (@aarch64_sve_psel<BHSD_BITS>) (*aarch64_sve_psel<BHSD_BITS>_plus): Require TARGET_STREAMING rather than TARGET_STREAMING_SME2. gcc/testsuite/ * gcc.target/aarch64/sme2/acle-asm/psel_lane_b16.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_b16.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_b32.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_b32.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_b64.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_b64.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_b8.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_b8.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_c16.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_c16.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_c32.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_c32.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_c64.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_c64.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_c8.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_c8.c: ...here.
2024-11-07aarch64: Restrict FCLAMP to SME2Richard Sandiford4-2/+30
There are two sets of patterns for FCLAMP: one set for single registers and one set for multiple registers. The multiple-register set was correctly gated on SME2, but the single-register set only required SME. This doesn't matter for ACLE usage, since the intrinsic definitions are correctly gated. But it does matter for automatic generation of FCLAMP from separate minimum and maximum operations (either ACLE intrinsics or autovectorised code). gcc/ * config/aarch64/aarch64-sve2.md (@aarch64_sve_fclamp<mode>) (*aarch64_sve_fclamp<mode>_x): Require TARGET_STREAMING_SME2 rather than TARGET_STREAMING_SME. gcc/testsuite/ * gcc.target/aarch64/sme/clamp_3.c: Force sme2 * gcc.target/aarch64/sme/clamp_4.c: Likewise. * gcc.target/aarch64/sme/clamp_5.c: New test.
2024-11-07bpf: avoid possible null deref in btf_ext_output [PR target/117447]David Faust1-0/+3
The BPF-specific .BTF.ext section is always generated for BPF programs if -gbtf is specified, and generating it requires BTF information and assumes that the BTF info has already been generated. Compiling non-C languages to BPF is not supported, nor is generating CTF/BTF for non-C. But, compiling another language like C++ to BPF with -gbtf specified meant that we would try to generate the .BTF.ext section anyway, and then ICE because no BTF information was available. Add a check to bail out of btf_ext_output if the TU CTFC does not exist, meaning no BTF info is available. gcc/ PR target/117447 * config/bpf/btfext-out.cc (btf_ext_output): Bail if TU CTFC is null.
2024-11-07btf: check hash maps are non-null before emptyingDavid Faust1-4/+10
These maps will always be non-null in btf_finalize under normal circumstances, but be safe and verify that before trying to empty them. gcc/ * btfout.cc (btf_finalize): Check that hash maps are non-null before emptying them.
2024-11-07ifcombine: For short circuit case, allow 2 convert defining statements [PR85605]Andrew Pinski5-2/+116
r0-126134-g5d2a9da9a7f7c1 added support for short-circuiting and combining the ifs using either AND or OR. But it only allowed the inner condition basic block to contain the conditional alone. This change allows up to 2 defining statements as long as they are just integer to integer conversions for either the lhs or rhs of the conditional. This should allow ccmp to be used on aarch64 and x86_64 (APX) slightly more often than before. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/85605 gcc/ChangeLog: * tree-ssa-ifcombine.cc (can_combine_bbs_with_short_circuit): New function. (ifcombine_ifandif): Use can_combine_bbs_with_short_circuit instead of checking if iterator is one before the last statement. gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/ifcombine-ccmp-1.C: New test. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c: New test. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c: New test. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-9.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-07VN: Lookup `val != 0` if we got back val when looking up the predicate for GIMPLE_COND [PR117414]Andrew Pinski2-0/+48
Sometimes we get back a full SSA name when looking up the comparison of the GIMPLE_COND rather than a predicate. We then want to look up `val != 0` for the predicate. Note this might happen with other boolean assignments and COND_EXPR but I am not sure if it is as important; I have not found a testcase yet. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/117414 gcc/ChangeLog: * tree-ssa-sccvn.cc (process_bb): Look up `val != 0` if we got back an SSA name when looking up the comparison. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/fre-predicated-4.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-07VN: Handle `(A CMP B) !=/== 0` for predicates [PR117414]Andrew Pinski2-0/+60
After the last patch, we also want to record `(A CMP B) != 0` as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the true/false edges swapped. This shows up more due to the new handling of `(A | B) ==/!= 0` in insert_predicates_for_cond as now we can notice these comparisons which were not seen before. This is enough to fix the original issue in `gcc.dg/tree-ssa/pr111456-1.c` and make sure we don't regress it when enhancing ifcombine. This adds that predicate and allows us to optimize f in fre-predicated-3.c. Changes since v1: * v2: Use vn_valueize. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/117414 gcc/ChangeLog: * tree-ssa-sccvn.cc (insert_predicates_for_cond): Handle `(A CMP B) !=/== 0`. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/fre-predicated-3.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
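The equivalence this commit records can be illustrated with a small C sketch. It is not the code from fre-predicated-3.c; the function names `cmp_ne_zero` and `cmp_eq_zero` are hypothetical. The point is that testing a stored comparison result against zero carries the same information as the comparison itself, with the edges swapped for `== 0`.

```c
/* Hypothetical illustration, not taken from the GCC testsuite. */

/* (a < b) != 0 behaves like (a < b): on the true edge a < b is known
   to hold, on the false edge it is known not to hold. */
int cmp_ne_zero(int a, int b)
{
    int t = a < b;
    if (t != 0)
        return (a < b) ? 1 : -1;   /* -1 is unreachable */
    return (a < b) ? -2 : 0;       /* -2 is unreachable */
}

/* (a < b) == 0 behaves like (a < b) with the true/false edges swapped. */
int cmp_eq_zero(int a, int b)
{
    int t = a < b;
    if (t == 0)
        return (a < b) ? -1 : 0;   /* -1 is unreachable */
    return (a < b) ? 1 : -2;       /* -2 is unreachable */
}
```

Both functions return 1 when `a < b` and 0 otherwise; a pass that records these predicates could in principle remove the inner comparisons, though whether a given GCC version does so is not claimed here.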
2024-11-07VN: Handle `(a | b) !=/== 0` for predicates [PR117414]Andrew Pinski3-0/+116
For `(a | b) == 0`, we can "assert" on the true edge that both `a == 0` and `b == 0` but nothing on the false edge. For `(a | b) != 0`, we can "assert" on the false edge that both `a == 0` and `b == 0` but nothing on the true edge. This adds that predicate and allows us to optimize f0, f1, and f2 in fre-predicated-[12].c. Changes since v1: * v2: Use vn_valueize. Also canonicalize the comparison at the beginning of insert_predicates_for_cond for constants to be on the rhs. Return early for non-ssa names on the lhs (after canonicalization). Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/117414 gcc/ChangeLog: * tree-ssa-sccvn.cc (insert_predicates_for_cond): Canonicalize the comparison. Don't insert anything if lhs is not a SSA_NAME. Handle `(a | b) !=/== 0`. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/fre-predicated-1.c: New test. * gcc.dg/tree-ssa/fre-predicated-2.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
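The implication described in this commit can be sketched in plain C. This is a minimal illustration and not the contents of fre-predicated-[12].c; the function name `both_zero_on_true_edge` is hypothetical.

```c
/* Hypothetical illustration, not taken from the GCC testsuite. */
int both_zero_on_true_edge(int a, int b)
{
    if ((a | b) == 0) {
        /* True edge: a bitwise OR is zero only if every operand is
           zero, so both tests below are known to be true here. */
        if (a == 0 && b == 0)
            return 1;
        return -1;   /* unreachable */
    }
    /* False edge: at least one of a and b is nonzero, but nothing is
       known about either one individually. */
    return 0;
}
```

The asymmetry is the reason the commit records predicates on only one edge: `(a | b) != 0` does not say which operand is nonzero.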
2024-11-07VN: Factor out inserting predicates for conditionalAndrew Pinski1-33/+37
To make it easier to add more predicates in some cases, factor out the code. Plus it makes the code slightly more readable since it is not indented as much. Bootstrapped and tested on x86_64. gcc/ChangeLog: * tree-ssa-sccvn.cc (insert_predicates_for_cond): New function, factored out from ... (process_bb): Here. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-07libstdc++: Tweak comments on includes in hashtable headersJonathan Wakely2-2/+2
std::is_permutation is only used in <bits/hashtable.h> not in <bits/hashtable_policy.h>, so move the comment referring to it. libstdc++-v3/ChangeLog: * include/bits/hashtable.h: Add is_permutation to comment. * include/bits/hashtable_policy.h: Remove it from comment.
2024-11-07libstdc++: Fix typo in comment in hashtable.hJonathan Wakely1-2/+2
And tweak grammar in a couple of comments. libstdc++-v3/ChangeLog: * include/bits/hashtable.h: Fix spelling in comment.
2024-11-07libgomp.texi: Document OpenMP's Interoperability RoutinesTobias Burnus1-21/+312
libgomp/ChangeLog: * libgomp.texi (OpenMP Technical Report 13): Remove 'iterator' in 'map' clause of 'declare mapper' as it is already in the list above. (Interoperability Routines): Add. (omp_target_memcpy_async, omp_target_memcpy_rect_async): Document that depobj_list may be omitted in C++ and Fortran.
2024-11-07Unify registered_pp_pragmas and registered_pragmasPaul Iannetta1-37/+29
Until now, the structures that keep pragma information were different when in preprocessing only mode and in normal mode. This change unifies both so that the space and name of a pragma are always registered and can be queried easily at a later time. gcc/c-family/ChangeLog: * c-pragma.cc (struct pragma_pp_data): Use (struct internal_pragma_handler); (c_register_pragma_1): Always register name and space for all pragmas. (c_invoke_pragma_handler): Adapt. (c_invoke_early_pragma_handler): Likewise. (c_pp_invoke_early_pragma_handler): Likewise.
2024-11-07Disable gather/scatter for non-first vectorized epilogueRichard Biener2-0/+10
We currently make vect_check_gather_scatter happy by replacing SSA name references in DR_REF for gather/scatter DRs but the replacement process only works once since for the second epilogue we have SSA names from the first epilogue in DR_REF but as we copied from the original loop the SSA mapping doesn't work. The following simply punts for non-first epilogues, those gather/scatter recognized by patterns to IFNs are already analyzed and should work fine. * tree-vect-data-refs.cc (vect_check_gather_scatter): Refuse to analyze DR_REF if from an epilogue that's not first. * tree-vect-loop.cc (update_epilogue_loop_vinfo): Add comment how the substitution in DR_REF is broken.
2024-11-07Add LOOP_VINFO_MAIN_LOOP_INFORichard Biener3-48/+53
The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside LOOP_VINFO_ORIG_LOOP_INFO so one can have both access to the main vectorized loop info and the preceding vectorized epilogue. This is critical for correctness as we need to disallow never executed epilogues by costing in vect_analyze_loop_costing as we assume those do not exist when deciding to add a skip-vector edge during peeling. The patch also changes how multiple vector epilogues are handled - instead of the epilogue_vinfos array in the main loop info we now record the single epilogue_vinfo there and further epilogues in the epilogue_vinfo member of the epilogue info. This simplifies code. * tree-vectorizer.h (_loop_vec_info::main_loop_info): New. (LOOP_VINFO_MAIN_LOOP_INFO): Likewise. (_loop_vec_info::epilogue_vinfo): Change from epilogue_vinfos from array to single element. * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize main_loop_info and epilogue_vinfo. Remove epilogue_vinfos allocation. (_loop_vec_info::~_loop_vec_info): Do not release epilogue_vinfos. (vect_create_loop_vinfo): Rename parameter, set LOOP_VINFO_MAIN_LOOP_INFO. (vect_analyze_loop_1): Rename parameter. (vect_analyze_loop_costing): Properly distinguish between the main vector loop and the preceding epilogue. (vect_analyze_loop): Change for epilogue_vinfos no longer being a vector. * tree-vect-loop-manip.cc (vect_do_peeling): Simplify and thereby handle a vector epilogue of a vector epilogue.
2024-11-07Add LOOP_VINFO_DRS_ADVANCED_BYRichard Biener2-0/+13
The following remembers how we advanced DRs when vectorizing an epilogue. When we want to vectorize the epilogue of such epilogue we have to retain that advancement and add the advancement for this vectorized epilogue. Due to the way we copy and re-associate stmt_vec_infos and DRs recording this advancement and re-applying it for the next epilogue is simplest. * tree-vectorizer.h (_loop_vec_info::drs_advanced_by): New. (LOOP_VINFO_DRS_ADVANCED_BY): Likewise. * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize drs_advanced_by. (update_epilogue_loop_vinfo): Remember the DR advancement made. (vect_transform_loop): Accumulate past advancements.
2024-11-07Check LOOP_VINFO_PEELING_FOR_GAPS on epilog is supportedRichard Biener1-10/+20
We need to check that an epilogue doesn't require LOOP_VINFO_PEELING_FOR_GAPS in case the main loop didn't (the other way around is OK); otherwise the computation of whether the epilog is executed or not gets out of sync. * tree-vect-loop.cc (vect_analyze_loop_2): Move vect_analyze_loop_costing after check whether we can do peeling. Add check on LOOP_VINFO_PEELING_FOR_GAPS for epilogues.
2024-11-07testsuite: Fix up pr116725.c test [PR116725]Jakub Jelinek1-0/+3
On Fri, Oct 18, 2024 at 02:05:59PM -0400, Antoni Boucher wrote: > PR target/116725 > * gcc.target/i386/pr116725.c: Add test using those AVX builtins. This test FAILs for me, as I don't have the latest gas around and the test is dg-do assemble, so doesn't need just fixed compiler, but also assembler which supports those instructions. The following patch adds effective target directives to ensure assembler supports those too. 2024-11-07 Jakub Jelinek <jakub@redhat.com> PR target/116725 * gcc.target/i386/pr116725.c: Add dg-require-effective-target avx512{dq,fp16,vl}.
2024-11-07openmp: Fix max_vf testcases with -march=cascadelakeAndrew Stubbs2-2/+2
Apparently we need to explicitly disable AVX, not just enable SSE, to guarantee the 16-lane vectors we need for the pattern match. libgomp/ChangeLog: * testsuite/libgomp.c/max_vf-1.c: Add -mno-avx. gcc/testsuite/ChangeLog: * gcc.dg/gomp/max_vf-1.c: Add -mno-avx.
2024-11-07Doc: Add doc for standard name mask_len_strided_load{store}mPan Li1-0/+27
This patch adds documentation for the 2 standard names below. 1. strided load: v = mask_len_strided_load (ptr, stride, mask, len, bias) 2. strided store: mask_len_strided_store (ptr, stride, v, mask, len, bias) gcc/ChangeLog: * doc/md.texi: Add doc for mask_len_strided_load{store}. Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
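A scalar reference model may help in reading the operand list of the strided load. The sketch below is an illustration only, not the md.texi text: the function name `model_strided_load` is hypothetical, and it assumes that the stride is a byte offset between consecutive elements, that `len + bias` elements are processed, and that inactive elements are left untouched. The commit message itself does not state any of these details.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of
     v = mask_len_strided_load (ptr, stride, mask, len, bias)
   for 32-bit elements.  Assumptions (not from the commit message):
   stride is in bytes, len + bias elements are processed, and
   elements whose mask bit is clear are left unchanged. */
void model_strided_load(int32_t *v, const void *ptr, ptrdiff_t stride,
                        const uint8_t *mask, size_t len, ptrdiff_t bias)
{
    size_t n = (size_t)((ptrdiff_t)len + bias);
    for (size_t i = 0; i < n; i++) {
        if (mask[i]) {
            const char *addr = (const char *)ptr + (ptrdiff_t)i * stride;
            memcpy(&v[i], addr, sizeof v[i]);   /* element i comes from ptr + i * stride */
        }
    }
}
```

Under the same assumptions, the strided store would be the mirror image: element `i` of `v` is written to `ptr + i * stride` when its mask bit is set.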
2024-11-07rtl-optimization/117467 - 33% compile-time in rest of compilationRichard Biener2-1/+2
ext-dce uses TV_NONE, that's not OK for a pass taking 33% compile-time. The following adds a timevar to it for proper blaming. PR rtl-optimization/117467 * timevar.def (TV_EXT_DCE): New. * ext-dce.cc (pass_data_ext_dce): Use TV_EXT_DCE.
2024-11-07i386: Support cstorebf4 with native bf16 comiHongyu Wang3-10/+82
We recently added support for cbranchbf4 with AVX10_2 native bf16 comi instructions, so do the same for cstorebf4. gcc/ChangeLog: * config/i386/i386.md (cstorebf4): Use vcomsbf16 under TARGET_AVX10_2_256 and -fno-trapping-math. (cbranchbf4): Adjust formatting. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-comibf-3.c: New test. * gcc.target/i386/avx10_2-comibf-4.c: Likewise.
2024-11-07i386: Modify regexp of pr117304-1.cHu, Lin11-5/+5
Since the test doesn't care if the hint is correct, modify the regexp of the hint part to avoid future changes to the hint that would cause the test to fail. gcc/testsuite/ChangeLog: * gcc.target/i386/pr117304-1.c: Modify regexp.
2024-11-07limit ifcombine stmt moving and adjust flow infoAlexandre Oliva1-25/+89
It became apparent that conditions could be combined that had deep SSA dependency trees, that might thus require moving lots of statements. Set a hard upper bound for now, hopefully to be replaced by a dynamically computed bound, based on probabilities and costs. Also reset flow sensitive info and avoid introducing undefined behavior when moving stmts from under guarding conditions. Finally, rework the preexisting reset of flow sensitive info and avoidance of undefined behavior to be done when needed on all affected inner blocks: reset flow info whenever enclosing conditions change, and avoid undefined behavior whenever enclosing conditions become laxer. for gcc/ChangeLog * tree-ssa-ifcombine.cc (ifcombine_rewrite_to_defined_overflow): New. (ifcombine_replace_cond): Reject conds that would require moving too many stmts. Reset flow sensitive info and avoid undefined behavior in moved stmts. Reset flow sensitive info in all inner blocks when the outer condition changes, and avoid undefined behavior whenever the outer condition becomes laxer, adapted and moved from... (pass_tree_ifcombine::execute): ... here.
2024-11-07handle TRUTH_ANDIF cond exprs in ifcombine_replace_condAlexandre Oliva1-0/+11
The upcoming move of fold_truth_andor to ifcombine brings with it the possibility of TRUTH_ANDIF cond exprs. Handle them by splitting the cond so as to best use both BB insertion points, but only if they're contiguous. for gcc/ChangeLog * tree-ssa-ifcombine.c (ifcombine_replace_cond): Support TRUTH_ANDIF cond exprs.
2024-11-07ifcombine across noncontiguous blocksAlexandre Oliva1-29/+123
Rework ifcombine to support merging conditions from noncontiguous blocks. This depends on earlier preparation changes. The function that attempted to ifcombine a block with its immediate predecessor, tree_ssa_ifcombine_bb, now loops over dominating blocks eligible for ifcombine, attempting to combine with them. The function that actually drives the combination of a pair of blocks, tree_ssa_ifcombine_bb_1, now takes an additional parameter: the successor of outer that leads to inner. The function that recognizes if_then_else patterns is modified to enable testing without distinguishing between then and else, or to require nondegenerate conditions, that aren't worth combining with. for gcc/ChangeLog * tree-ssa-ifcombine.cc (recognize_if_then_else): Support relaxed then/else testing; require nondegenerate condition otherwise. (tree_ssa_ifcombine_bb_1): Add outer_succ_bb parm, use it instead of inner_cond_bb. Adjust callers. (tree_ssa_ifcombine_bb): Loop over dominating outer blocks eligible for ifcombine. (pass_tree_ifcombine::execute): Noted potential need for changes to the post-combine logic.
2024-11-07extend ifcombine_replace_cond to handle noncontiguous ifcombineAlexandre Oliva1-5/+170
Prepare to handle noncontiguous ifcombine, introducing logic to modify the outer condition when needed. There are two cases worth mentioning: - when blocks are noncontiguous, we have to place the combined condition in the outer block to avoid pessimizing carefully crafted short-circuited tests; - even when blocks are contiguous, we prepare for situations in which the combined condition has two tests, one to be placed in outer and the other in inner. This circumstance will not come up when noncontiguous ifcombine is first enabled, but it will when an improved fold_truth_andor is integrated with ifcombine. Combining the condition from inner into outer may require moving SSA DEFs used in the inner condition, and the changes implement this as well. for gcc/ChangeLog * tree-ssa-ifcombine.cc: Include bitmap.h. (ifcombine_mark_ssa_name): New. (struct ifcombine_mark_ssa_name_t): New. (ifcombine_mark_ssa_name_walk): New. (ifcombine_replace_cond): Prepare to handle noncontiguous and split-condition ifcombine.
2024-11-07adjust update_profile_after_ifcombine for noncontiguous ifcombineAlexandre Oliva1-24/+85
Prepare for ifcombining noncontiguous blocks, adding (still unused) logic to the ifcombine profile updater to handle such cases. for gcc/ChangeLog * tree-ssa-ifcombine.cc (known_succ_p): New. (update_profile_after_ifcombine): Handle noncontiguous blocks.
2024-11-07introduce ifcombine_replace_condAlexandre Oliva1-72/+65
Refactor ifcombine_ifandif, moving the common code from the various paths that apply the combined condition to a new function. for gcc/ChangeLog * tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out of... (ifcombine_ifandif): ... this. Leave it for the above to gimplify and invert the condition.
2024-11-07drop redundant ifcombine_ifandif parmAlexandre Oliva1-11/+7
In preparation to changes that may modify both inner and outer conditions in ifcombine, drop the redundant parameter result_inv, that is always identical to inner_inv. for gcc/ChangeLog * tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant result_inv parm. Adjust all callers.
2024-11-07allow vuses in ifcombine blocksAlexandre Oliva1-1/+1
Disallowing vuses in blocks for ifcombine is too strict, and it prevents usefully moving fold_truth_andor into ifcombine. That tree-level folder has long ifcombined loads, absent other relevant side effects. for gcc/ChangeLog * tree-ssa-ifcombine.c (bb_no_side_effects_p): Allow vuses, but not vdefs.
2024-11-07[testsuite] disable PIE on ia32 on more testsAlexandre Oliva8-0/+8
Multiple tests fail on ia32 with -fPIE enabled by default because of different call sequences required by the call-saved PIC register (no-callee-saved-*.c), uses of the constant pool instead of computing constants (pr100865-*.c), and unexpected matches of esp in get_pc_thunk (sse2-stv-1.c). Disable PIE on them, to match the expectations. for gcc/testsuite/ChangeLog * gcc.target/i386/no-callee-saved-13.c: Disable PIE on ia32. * gcc.target/i386/no-callee-saved-14.c: Likewise. * gcc.target/i386/no-callee-saved-15.c: Likewise. * gcc.target/i386/no-callee-saved-17.c: Likewise. * gcc.target/i386/pr100865-1.c: Likewise. * gcc.target/i386/pr100865-7a.c: Likewise. * gcc.target/i386/pr100865-7c.c: Likewise. * gcc.target/i386/sse2-stv-1.c: Likewise.
2024-11-07[testsuite] fix pr70321.c PIC expectationsAlexandre Oliva1-1/+5
When we select a non-bx get_pc_thunk, we get an extra mov to set up the PIC register before the abort call. Expect that mov or a get_pc_thunk.bx call. for gcc/testsuite/ChangeLog * gcc.target/i386/pr70321.c: Cope with non-bx get_pc_thunk.
2024-11-07RISC-V: Add testcases for signed imm SAT_ADD form1xuli12-0/+336
This patch adds testcases for form1, as shown below: T __attribute__((noinline)) \ sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \ { \ T sum = (UT)x + (UT)IMM; \ return (x ^ IMM) < 0 \ ? sum \ : (sum ^ x) >= 0 \ ? sum \ : x < 0 ? MIN : MAX; \ } Passed the rv64gcv regression test. Signed-off-by: Li Xu <xuli1@eswincomputing.com> gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Support signed imm SAT_ADD form1. * gcc.target/riscv/sat_s_add_imm-1-1.c: New test. * gcc.target/riscv/sat_s_add_imm-1.c: New test. * gcc.target/riscv/sat_s_add_imm-2-1.c: New test. * gcc.target/riscv/sat_s_add_imm-2.c: New test. * gcc.target/riscv/sat_s_add_imm-3-1.c: New test. * gcc.target/riscv/sat_s_add_imm-3.c: New test. * gcc.target/riscv/sat_s_add_imm-4.c: New test. * gcc.target/riscv/sat_s_add_imm-run-1.c: New test. * gcc.target/riscv/sat_s_add_imm-run-2.c: New test. * gcc.target/riscv/sat_s_add_imm-run-3.c: New test. * gcc.target/riscv/sat_s_add_imm-run-4.c: New test.
2024-11-07Match:Support signed imm SAT_ADD form1xuli2-0/+16
This patch adds support for .SAT_ADD when one of the operands is a signed IMM. Form1: T __attribute__((noinline)) \ sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \ { \ T sum = (UT)x + (UT)IMM; \ return (x ^ IMM) < 0 \ ? sum \ : (sum ^ x) >= 0 \ ? sum \ : x < 0 ? MIN : MAX; \ } Take the form1 instance below as an example: DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -10, INT8_MIN, INT8_MAX) Before this patch: __attribute__((noinline)) int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x) { int8_t sum; unsigned char x.0_1; unsigned char _2; signed char _4; int8_t _5; _Bool _9; signed char _10; signed char _11; signed char _12; signed char _14; signed char _16; <bb 2> [local count: 1073741824]: x.0_1 = (unsigned char) x_6(D); _2 = x.0_1 + 246; sum_7 = (int8_t) _2; _4 = x_6(D) ^ sum_7; _16 = x_6(D) ^ 9; _14 = _4 & _16; if (_14 < 0) goto <bb 3>; [41.00%] else goto <bb 4>; [59.00%] <bb 3> [local count: 259738147]: _9 = x_6(D) < 0; _10 = (signed char) _9; _11 = -_10; _12 = _11 ^ 127; <bb 4> [local count: 1073741824]: # _5 = PHI <sum_7(2), _12(3)> return _5; } After this patch: __attribute__((noinline)) int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x) { int8_t _5; <bb 2> [local count: 1073741824]: _5 = .SAT_ADD (x_6(D), -10); [tail call] return _5; } The following test suites pass with this patch: 1. The rv64gcv full regression tests. 2. The x86 bootstrap tests. 3. The x86 full regression tests. Signed-off-by: Li Xu <xuli1@eswincomputing.com> gcc/ChangeLog: * match.pd: Add the form1 of signed imm .SAT_ADD matching. * tree-ssa-math-opts.cc (match_saturation_add): Add fold convert for const_int to the type of operand 0.
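For readers who want to try form1 outside the testsuite macros, here is a hand-expanded sketch for `int8_t` with the immediate -10 used in the example. The function name `sat_s_add_imm_i8` is hypothetical, and an explicit cast has been added on the sum. The narrowing conversion to `int8_t` is implementation-defined in C; the sketch assumes the modulo behavior that GCC documents.

```c
#include <stdint.h>

#define IMM (-10)   /* immediate from the example in the commit message */

/* Hypothetical hand expansion of form1 for T = int8_t, UT = uint8_t.
   Assumes out-of-range conversion to int8_t wraps modulo 256. */
int8_t sat_s_add_imm_i8(int8_t x)
{
    /* Wrapping add performed in the unsigned type, then narrowed. */
    int8_t sum = (int8_t)((uint8_t)x + (uint8_t)IMM);

    return (x ^ IMM) < 0          /* operands differ in sign: cannot overflow */
        ? sum
        : (sum ^ x) >= 0          /* result kept the sign of x: no overflow */
            ? sum
            : x < 0 ? INT8_MIN : INT8_MAX;   /* saturate toward the sign of x */
}
```

For instance, `sat_s_add_imm_i8(-120)` saturates to `INT8_MIN`, while `sat_s_add_imm_i8(127)` yields 117 because the operands have opposite signs.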
2024-11-07Daily bump.GCC Administrator6-1/+215
2024-11-07avx10_2-comibf-2.c: Require AVX10.2 supportH.J. Lu1-1/+2
Since avx10_2-comibf-2.c is a run test, require AVX10.2 support. * gcc.target/i386/avx10_2-comibf-2.c: Require avx10_2 target. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-06[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]Alexey Merzlyakov2-0/+36
This patch adds optimization of the following patterns: (zero_extend:M (subreg:N (not:O==M (X:Q==M)))) -> (xor:M (zero_extend:M (subreg:N (X:M)), mask)) ... where the mask is GET_MODE_MASK (N). For the cases when X:M doesn't have any non-zero bits outside of mode N, (zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M) and whole optimization will be: (zero_extend:M (subreg:N (not:M (X:M)))) -> (xor:M (X:M, mask)) Patch targets to handle code patterns like: not a0,a0 andi a0,a0,0xff to be optimized to: xori a0,a0,255 Change was locally tested for x86_64 and AArch64 (as most common) and for RV-64 and MIPS-32 targets (as having an effect from this optimization): no regressions for all cases. PR rtl-optimization/112398 gcc/ChangeLog: * simplify-rtx.cc (simplify_context::simplify_unary_operation_1): Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG)) when X doesn't have any non-zero bits outside of SUBREG mode. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr112398.c: New test. Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>
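The identity behind this simplification can be checked at the source level. The sketch below models mode M as `uint64_t` and mode N as `uint8_t`, so the mask is 0xff; the function names are hypothetical and the code is an illustration of the arithmetic, not of the RTL implementation.

```c
#include <stdint.h>

/* Hypothetical source-level model: M = 64 bits, N = 8 bits. */

/* (zero_extend:M (subreg:N (not:M X))): invert, keep the low 8 bits. */
uint64_t not_then_zext(uint64_t x)
{
    return (uint8_t)~x;
}

/* General rewrite: (xor:M (zero_extend:M (subreg:N X)) mask). */
uint64_t zext_then_xor(uint64_t x)
{
    return (uint64_t)(uint8_t)x ^ 0xffu;
}

/* Simplified rewrite: (xor:M X mask).  Matches the functions above only
   when x has no nonzero bits outside the low 8 bits. */
uint64_t xor_mask(uint64_t x)
{
    return x ^ 0xffu;
}
```

The first two functions agree for every input, whereas `xor_mask` agrees only under the stated precondition: for `x = 0x100` it returns 0x1ff while the others return 0xff.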
2024-11-06Darwin: Fix a narrowing warning.Iain Sandoe1-1/+1
cdtor_record needs to have an unsigned entry for the position in order to match with vec_safe_length. gcc/ChangeLog: * config/darwin.cc (cdtor_record): Make position unsigned. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-11-06openmp: Fix signed/unsigned warningAndrew Stubbs1-1/+1
My previous patch broke things when building with Werror. gcc/ChangeLog: * omp-general.cc (omp_max_vf): Cast the constant to poly_uint64.
2024-11-06openmp: Add testcases for omp_max_vfAndrew Stubbs3-0/+105
Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when offloading is enabled ("target" directives are present), and is inactive otherwise. libgomp/ChangeLog: * testsuite/libgomp.c/max_vf-1.c: New test. * testsuite/libgomp.c/max_vf-2.c: New test. gcc/testsuite/ChangeLog: * gcc.dg/gomp/max_vf-1.c: New test.
2024-11-06openmp: Add IFN_GOMP_MAX_VFAndrew Stubbs4-8/+34
Delay omp_max_vf call until after the host and device compilers have diverged so that the max_vf value can be tuned exactly right on both variants. This change means that the ompdevlow pass must be enabled for functions that use OpenMP directives with both "simd" and "schedule" enabled. gcc/ChangeLog: * internal-fn.cc (expand_GOMP_MAX_VF): New function. * internal-fn.def (GOMP_MAX_VF): New internal function. * omp-expand.cc (omp_adjust_chunk_size): Emit IFN_GOMP_MAX_VF when called in offload context, otherwise assume host context. * omp-offload.cc (execute_omp_device_lower): Expand IFN_GOMP_MAX_VF.
2024-11-06openmp: use offload max_vf for chunk_sizeAndrew Stubbs1-8/+28
The chunk size for SIMD loops should be right for the current device; too big allocates too much memory, too small is inefficient. Getting it wrong doesn't actually break anything though. This patch attempts to choose the optimal setting based on the context. Both host-fallback and device will get the same chunk size, but device performance is the most important in this case. gcc/ChangeLog: * omp-expand.cc (is_in_offload_region): New function. (omp_adjust_chunk_size): Add pass-through "offload" parameter. (get_ws_args_for): Likewise. (determine_parallel_type): Use is_in_offload_region to adjust call to get_ws_args_for. (expand_omp_for_generic): Likewise. (expand_omp_for_static_chunk): Likewise.
2024-11-06openmp: Tune omp_max_vf for offload targetsAndrew Stubbs5-6/+20
If requested, return the vectorization factor appropriate for the offload device, if any. This change gives a significant speedup in the BabelStream "dot" benchmark on amdgcn. The omp_adjust_chunk_size usecase is set "false", for now, but I intend to change that in a follow-up patch. Note that NVPTX SIMT offload does not use this code-path. gcc/ChangeLog: * gimple-loop-versioning.cc (loop_versioning::loop_versioning): Set omp_max_vf to offload == false. * omp-expand.cc (omp_adjust_chunk_size): Likewise. * omp-general.cc (omp_max_vf): Add "offload" parameter, and detect amdgcn offload devices. * omp-general.h (omp_max_vf): Likewise. * omp-low.cc (lower_rec_simd_input_clauses): Pass offload state to omp_max_vf.
2024-11-06Add details output for assume processing.Andrew MacLeod1-19/+115
The Assume pass simply produces results, with no indication of how it arrived at the results it gets. Add some output to the details listing. The only functional change is when gori is used to calculate a range more than once (ie, multiple uses), we now load the merged range rather than just using the last calculated one. * tree-assume.cc (assume_query::assume_query): Add debug output. (assume_query::update_parms): Likewise. (assume_query::calculate_phi): Likewise. (assume_query::calculate_op): Likewise. Also pick up any merged path values. (assume_query::calculate_stmt): Likewise.
2024-11-06testsuite: add infinite recursion test case [PR63388]David Malcolm1-0/+21
gcc/testsuite/ChangeLog: PR c++/63388 * g++.dg/analyzer/infinite-recursion-pr63388.C: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-06diagnostics: fix typo in commentDavid Malcolm1-1/+1
gcc/ChangeLog: * diagnostic.h (class diagnostic_context): Fix typo in leading comment. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-06libstdc++: Deprecate useless <cxxx> compatibility headers for C++17Jonathan Wakely28-120/+288
These headers make no sense for C++ programs, because they either define different content to the corresponding <xxx.h> C header, or define nothing at all in namespace std. They were all deprecated in C++17, so add deprecation warnings to them, which can be disabled with -Wno-deprecated. For C++20 and later these headers are no longer in the standard at all, so compiling with _GLIBCXX_USE_DEPRECATED defined to 0 will give an error when they are included. Because #warning is non-standard before C++23 we need to use pragmas to ignore -Wc++23-extensions for the -Wsystem-headers -pedantic case. One g++ test needs adjustment because it includes <ciso646>, but that can be made conditional on the __cplusplus value without any reduction in test coverage. For the library tests, consolidate the std_c++0x_neg.cc XFAIL tests into the macros.cc test, using dg-error with a { target c++98_only } selector. This avoids having two separate test files, one for C++98 and one for everything later. Also add tests for the <xxx.h> headers to ensure that they behave as expected and don't give deprecated warnings. libstdc++-v3/ChangeLog: * doc/xml/manual/evolution.xml: Document deprecations. * doc/html/*: Regenerate. * include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move include guard to start of file. Include <complex> directly instead of <ccomplex>. * include/c_compatibility/tgmath.h: Include <cmath> and <complex> directly, instead of <ctgmath>. * include/c_global/ccomplex: Add deprecated #warning for C++17 and #error for C++20 if _GLIBCXX_USE_DEPRECATED == 0. * include/c_global/ciso646: Likewise. * include/c_global/cstdalign: Likewise. * include/c_global/cstdbool: Likewise. * include/c_global/ctgmath: Likewise. * include/c_std/ciso646: Likewise. * include/precompiled/stdc++.h: Do not include ccomplex, ciso646, cstdalign, cstdbool, or ctgmath in C++17 and later. * testsuite/18_support/headers/cstdalign/macros.cc: Check for warnings and errors for unsupported dialects. 
* testsuite/18_support/headers/cstdbool/macros.cc: Likewise. * testsuite/26_numerics/headers/ctgmath/complex.cc: Likewise. * testsuite/27_io/objects/char/1.cc: Do not include <ciso646>. * testsuite/27_io/objects/wchar_t/1.cc: Likewise. * testsuite/18_support/headers/cstdbool/std_c++0x_neg.cc: Removed. * testsuite/18_support/headers/cstdalign/std_c++0x_neg.cc: Removed. * testsuite/26_numerics/headers/ccomplex/std_c++0x_neg.cc: Removed. * testsuite/26_numerics/headers/ctgmath/std_c++0x_neg.cc: Removed. * testsuite/18_support/headers/ciso646/macros.cc: New test. * testsuite/18_support/headers/ciso646/macros.h.cc: New test. * testsuite/18_support/headers/cstdbool/macros.h.cc: New test. * testsuite/26_numerics/headers/ccomplex/complex.cc: New test. * testsuite/26_numerics/headers/ccomplex/complex.h.cc: New test. * testsuite/26_numerics/headers/ctgmath/complex.h.cc: New test. gcc/testsuite/ChangeLog: * g++.old-deja/g++.other/headers1.C: Do not include ciso646 for C++17 and later.