riscv-gnu-toolchain/gcc.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2024-11-07	libstdc++: Fix grammar in comment, again	Jonathan Wakely	1	-1/+1
	libstdc++-v3/ChangeLog: * include/bits/hashtable.h (_Hashtable): Fix comment grammar.
2024-11-07	aarch64: Fix gcc.target/aarch64/sme2/acle-asm/bfmlslb_f32.c	Richard Sandiford	1	-30/+30
	I missed a search-and-replace on this test, meaning that it was duplicating bfmlalb_f32.c. gcc/testsuite/ * gcc.target/aarch64/sme2/acle-asm/bfmlslb_f32.c: Replace bfmla* with bfmls*
2024-11-07	aarch64: Make PSEL dependent on SME rather than SME2	Richard Sandiford	9	-10/+10
	The svpsel_lane intrinsics were wrongly classified as SME2+ only, rather than as base SME intrinsics. They should always be available in streaming mode. gcc/ * config/aarch64/aarch64-sve2.md (@aarch64_sve_psel<BHSD_BITS>) (aarch64_sve_psel<BHSD_BITS>_plus): Require TARGET_STREAMING rather than TARGET_STREAMING_SME2. gcc/testsuite/ gcc.target/aarch64/sme2/acle-asm/psel_lane_b16.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_b16.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_b32.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_b32.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_b64.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_b64.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_b8.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_b8.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_c16.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_c16.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_c32.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_c32.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_c64.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_c64.c: ...here. * gcc.target/aarch64/sme2/acle-asm/psel_lane_c8.c: Move to... * gcc.target/aarch64/sme/acle-asm/psel_lane_c8.c: ...here.
2024-11-07	aarch64: Restrict FCLAMP to SME2	Richard Sandiford	4	-2/+30
	There are two sets of patterns for FCLAMP: one set for single registers and one set for multiple registers. The multiple-register set was correctly gated on SME2, but the single-register set only required SME. This doesn't matter for ACLE usage, since the intrinsic definitions are correctly gated. But it does matter for automatic generation of FCLAMP from separate minimum and maximum operations (either ACLE intrinsics or autovectorised code). gcc/ * config/aarch64/aarch64-sve2.md (@aarch64_sve_fclamp<mode>) (aarch64_sve_fclamp<mode>_x): Require TARGET_STREAMING_SME2 rather than TARGET_STREAMING_SME. gcc/testsuite/ gcc.target/aarch64/sme/clamp_3.c: Force sme2 * gcc.target/aarch64/sme/clamp_4.c: Likewise. * gcc.target/aarch64/sme/clamp_5.c: New test.
2024-11-07	bpf: avoid possible null deref in btf_ext_output [PR target/117447]	David Faust	1	-0/+3
	The BPF-specific .BTF.ext section is always generated for BPF programs if -gbtf is specified, and generating it requires BTF information and assumes that the BTF info has already been generated. Compiling non-C languages to BPF is not supported, nor is generating CTF/BTF for non-C. But, compiling another language like C++ to BPF with -gbtf specified meant that we would try to generate the .BTF.ext section anyway, and then ICE because no BTF information was available. Add a check to bail out of btf_ext_output if the TU CTFC does not exist, meaning no BTF info is available. gcc/ PR target/117447 * config/bpf/btfext-out.cc (btf_ext_output): Bail if TU CTFC is null.
2024-11-07	btf: check hash maps are non-null before emptying	David Faust	1	-4/+10
	These maps will always be non-null in btf_finalize under normal circumstances, but be safe and verify that before trying to empty them. gcc/ * btfout.cc (btf_finalize): Check that hash maps are non-null before emptying them.
2024-11-07	ifcombine: For short circuit case, allow 2 convert defining statements [PR85605]	Andrew Pinski	5	-2/+116
	r0-126134-g5d2a9da9a7f7c1 added support for circuiting and combing the ifs into using either AND or OR. But it only allowed the inner condition basic block having the conditional only. This changes to allow up to 2 defining statements as long as they are just integer to integer conversions for either the lhs or rhs of the conditional. This should allow to use ccmp on aarch64 and x86_64 (APX) slightly more than before. Boootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/85605 gcc/ChangeLog: * tree-ssa-ifcombine.cc (can_combine_bbs_with_short_circuit): New function. (ifcombine_ifandif): Use can_combine_bbs_with_short_circuit instead of checking if iterator is one before the last statement. gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/ifcombine-ccmp-1.C: New test. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c: New test. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c: New test. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-9.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-07	VN: Lookup `val != 0` if we got back val when looking up the predicate for ↵	Andrew Pinski	2	-0/+48
	GIMPLE_COND [PR117414] Sometimes we get back a full ssa name when looking up the comparison of the GIMPLE_COND rather than a predicate. We then want to lookup the `val != 0` for the predicate. Note this might happen with other boolean assignments and COND_EXPR but I am not sure if it is as important; I have not found a testcase yet. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/117414 gcc/ChangeLog: * tree-ssa-sccvn.cc (process_bb): Lookup `val != 0` if got back a ssa name when looking the comparison. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/fre-predicated-4.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-07	VN: Handle `(A CMP B) !=/== 0` for predicates [PR117414]	Andrew Pinski	2	-0/+60
	After the last patch, we also want to record `(A CMP B) != 0` as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the true/false edges swapped. This shows up more due to the new handling of `(A \| B) ==/!= 0` in insert_predicates_for_cond as now we can notice these comparisons which were not seen before. This is enough to fix the original issue in `gcc.dg/tree-ssa/pr111456-1.c` and make sure we don't regress it when enhancing ifcombine. This adds that predicate and allows us to optimize f in fre-predicated-3.c. Changes since v1: * v2: Use vn_valueize. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/117414 gcc/ChangeLog: * tree-ssa-sccvn.cc (insert_predicates_for_cond): Handle `(A CMP B) !=/== 0`. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/fre-predicated-3.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-07	VN: Handle `(a \| b) !=/== 0` for predicates [PR117414]	Andrew Pinski	3	-0/+116
	For `(a \| b) == 0`, we can "assert" on the true edge that both `a == 0` and `b == 0` but nothing on the false edge. For `(a \| b) != 0`, we can "assert" on the false edge that both `a == 0` and `b == 0` but nothing on the true edge. This adds that predicate and allows us to optimize f0, f1, and f2 in fre-predicated-[12].c. Changes since v1: * v2: Use vn_valueize. Also canonicalize the comparison at the begining of insert_predicates_for_cond for constants to be on the rhs. Return early for non-ssa names on the lhs (after canonicalization). Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/117414 gcc/ChangeLog: * tree-ssa-sccvn.cc (insert_predicates_for_cond): Canonicalize the comparison. Don't insert anything if lhs is not a SSA_NAME. Handle `(a \| b) !=/== 0`. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/fre-predicated-1.c: New test. * gcc.dg/tree-ssa/fre-predicated-2.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-07	VN: Factor out inserting predicates for conditional	Andrew Pinski	1	-33/+37
	To make it easier to add more predicates in some cases, factor out the code. Plus it makes the code slightly more readable since it is not indented as much. Bootstrapped and tested on x86_64. gcc/ChangeLog: * tree-ssa-sccvn.cc (insert_predicates_for_cond): New function, factored out from ... (process_bb): Here. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-07	libstdc++: Tweak comments on includes in hashtable headers	Jonathan Wakely	2	-2/+2
	std::is_permutation is only used in <bits/hashtable.h> not in <bits/hashtable_policy.h>, so move the comment referring to it. libstdc++-v3/ChangeLog: * include/bits/hashtable.h: Add is_permutation to comment. * include/bits/hashtable_policy.h: Remove it from comment.
2024-11-07	libstdc++: Fix typo in comment in hashtable.h	Jonathan Wakely	1	-2/+2
	And tweak grammar in a couple of comments. libstdc++-v3/ChangeLog: * include/bits/hashtable.h: Fix spelling in comment.
2024-11-07	libgomp.texi: Document OpenMP's Interoperability Routines	Tobias Burnus	1	-21/+312
	libgomp/ChangeLog: * libgomp.texi (OpenMP Technical Report 13): Remove 'iterator' in 'map' clause of 'declare mapper' as it is already the list above. (Interoperability Routines): Add. (omp_target_memcpy_async, omp_target_memcpy_rect_async): Document that depobj_list may be omitted in C++ and Fortran.
2024-11-07	Unify registered_pp_pragmas and registered_pragmas	Paul Iannetta	1	-37/+29
	Until now, the structures that keep pragma information were different when in preprocessing only mode and in normal mode. This change unifies both so that the space and name of a pragma are always registered and can be queried easily at a later time. gcc/c-family/ChangeLog: * c-pragma.cc (struct pragma_pp_data): Use (struct internal_pragma_handler); (c_register_pragma_1): Always register name and space for all pragmas. (c_invoke_pragma_handler): Adapt. (c_invoke_early_pragma_handler): Likewise. (c_pp_invoke_early_pragma_handler): Likewise.
2024-11-07	Disable gather/scatter for non-first vectorized epilogue	Richard Biener	2	-0/+10
	We currently make vect_check_gather_scatter happy by replacing SSA name references in DR_REF for gather/scatter DRs but the replacement process only works once since for the second epilogue we have SSA names from the first epilogue in DR_REF but as we copied from the original loop the SSA mapping doesn't work. The following simply punts for non-first epilogues, those gather/scatter recognized by patterns to IFNs are already analyzed and should work fine. * tree-vect-data-refs.cc (vect_check_gather_scatter): Refuse to analyze DR_REF if from an epilogue that's not first. * tree-vect-loop.cc (update_epilogue_loop_vinfo): Add comment how the substitution in DR_REF is broken.
2024-11-07	Add LOOP_VINFO_MAIN_LOOP_INFO	Richard Biener	3	-48/+53
	The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside LOOP_VINFO_ORIG_LOOP_INFO so one can have both access to the main vectorized loop info and the preceeding vectorized epilogue. This is critical for correctness as we need to disallow never executed epilogues by costing in vect_analyze_loop_costing as we assume those do not exist when deciding to add a skip-vector edge during peeling. The patch also changes how multiple vector epilogues are handled - instead of the epilogue_vinfos array in the main loop info we now record the single epilogue_vinfo there and further epilogues in the epilogue_vinfo member of the epilogue info. This simplifies code. * tree-vectorizer.h (_loop_vec_info::main_loop_info): New. (LOOP_VINFO_MAIN_LOOP_INFO): Likewise. (_loop_vec_info::epilogue_vinfo): Change from epilogue_vinfos from array to single element. * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize main_loop_info and epilogue_vinfo. Remove epilogue_vinfos allocation. (_loop_vec_info::~_loop_vec_info): Do not release epilogue_vinfos. (vect_create_loop_vinfo): Rename parameter, set LOOP_VINFO_MAIN_LOOP_INFO. (vect_analyze_loop_1): Rename parameter. (vect_analyze_loop_costing): Properly distinguish between the main vector loop and the preceeding epilogue. (vect_analyze_loop): Change for epilogue_vinfos no longer being a vector. * tree-vect-loop-manip.cc (vect_do_peeling): Simplify and thereby handle a vector epilogue of a vector epilogue.
2024-11-07	Add LOOP_VINFO_DRS_ADVANCED_BY	Richard Biener	2	-0/+13
	The following remembers how we advanced DRs when vectorizing an epilogue. When we want to vectorize the epilogue of such epilogue we have to retain that advancement and add the advancement for this vectorized epilogue. Due to the way we copy and re-associate stmt_vec_infos and DRs recording this advancement and re-applying it for the next epilogue is simplest. * tree-vectorizer.h (_loop_vec_info::drs_advanced_by): New. (LOOP_VINFO_DRS_ADVANCED_BY): Likewise. * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize drs_advanced_by. (update_epilogue_loop_vinfo): Remember the DR advancement made. (vect_transform_loop): Accumulate past advancements.
2024-11-07	Check LOOP_VINFO_PEELING_FOR_GAPS on epilog is supported	Richard Biener	1	-10/+20
	We need to check that an epilogue doesn't require LOOP_VINFO_PEELING_FOR_GAPS in case the main loop didn't (the other way around is OK), the computation whether the epilog is executed or not gets our of sync otherwise. * tree-vect-loop.cc (vect_analyze_loop_2): Move vect_analyze_loop_costing after check whether we can do peeling. Add check on LOOP_VINFO_PEELING_FOR_GAPS for epilogues.
2024-11-07	testsuite: Fix up pr116725.c test [PR116725]	Jakub Jelinek	1	-0/+3
	On Fri, Oct 18, 2024 at 02:05:59PM -0400, Antoni Boucher wrote: > PR target/116725 > * gcc.target/i386/pr116725.c: Add test using those AVX builtins. This test FAILs for me, as I don't have the latest gas around and the test is dg-do assemble, so doesn't need just fixed compiler, but also assembler which supports those instructions. The following patch adds effective target directives to ensure assembler supports those too. 2024-11-07 Jakub Jelinek <jakub@redhat.com> PR target/116725 * gcc.target/i386/pr116725.c: Add dg-require-effective-target avx512{dq,fp16,vl}.
2024-11-07	openmp: Fix max_vf testcases with -march=cascadelake	Andrew Stubbs	2	-2/+2
	Apparently we need to explicitly disable AVX, not just enabled SSE, to guarentee the 16-lane vectors we need for the pattern match. libgomp/ChangeLog: * testsuite/libgomp.c/max_vf-1.c: Add -mno-avx. gcc/testsuite/ChangeLog: * gcc.dg/gomp/max_vf-1.c: Add -mno-avx.
2024-11-07	Doc: Add doc for standard name mask_len_strided_load{store}m	Pan Li	1	-0/+27
	This patch would like to add doc for the below 2 standard names. 1. strided load: v = mask_len_strided_load (ptr, stried, mask, len, bias) 2. strided store: mask_len_stried_store (ptr, stride, v, mask, len, bias) gcc/ChangeLog: * doc/md.texi: Add doc for mask_len_stried_load{store}. Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
2024-11-07	rtl-optimization/117467 - 33% compile-time in rest of compilation	Richard Biener	2	-1/+2
	ext-dce uses TV_NONE, that's not OK for a pass taking 33% compile-time. The following adds a timevar to it for proper blaming. PR rtl-optimization/117467 * timevar.def (TV_EXT_DCE): New. * ext-dce.cc (pass_data_ext_dce): Use TV_EXT_DCE.
2024-11-07	i386: Support cstorebf4 with native bf16 comi	Hongyu Wang	3	-10/+82
	We recently supports cbranchbf4 with AVX10_2 native bf16 comi instructions, so do similar to cstorebf4. gcc/ChangeLog: * config/i386/i386.md (cstorebf4): Use vcomsbf16 under TARGET_AVX10_2_256 and -fno-trapping-math. (cbranchbf4): Adjust formatting. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-comibf-3.c: New test. * gcc.target/i386/avx10_2-comibf-4.c: Likewise.
2024-11-07	i386: Modify regexp of pr117304-1.c	Hu, Lin1	1	-5/+5
	Since the test doesn't care if the hint is correct, modify the regexp of the hint part to avoid future changes to the hint that would cause the test to fail. gcc/testsuite/ChangeLog: * gcc.target/i386/pr117304-1.c: Modify regexp.
2024-11-07	limit ifcombine stmt moving and adjust flow info	Alexandre Oliva	1	-25/+89
	It became apparent that conditions could be combined that had deep SSA dependency trees, that might thus require moving lots of statements. Set a hard upper bound for now, hopefully to be replaced by a dynamically computed bound, based on probabilities and costs. Also reset flow sensitive info and avoid introducing undefined behavior when moving stmts from under guarding conditions. Finally, rework the preexisting reset of flow sensitive info and avoidance of undefined behavior to be done when needed on all affected inner blocks: reset flow info whenever enclosing conditions change, and avoid undefined behavior whenever enclosing conditions become laxer. for gcc/ChangeLog * tree-ssa-ifcombine.cc (ifcombine_rewrite_to_defined_overflow): New. (ifcombine_replace_cond): Reject conds that would require moving too many stmts. Reset flow sensitive info and avoid undefined behavior in moved stmts. Reset flow sensitive info in all inner blocks when the outer condition changes, and avoid undefined behavior whenever the outer condition becomes laxer, adapted and moved from... (pass_tree_ifcombine::execute): ... here.
2024-11-07	handle TRUTH_ANDIF cond exprs in ifcombine_replace_cond	Alexandre Oliva	1	-0/+11
	The upcoming move of fold_truth_andor to ifcombine brings with it the possibility of TRUTH_ANDIF cond exprs. Handle them by splitting the cond so as to best use both BB insertion points, but only if they're contiguous. for gcc/ChangeLog * tree-ssa-ifcombine.c (ifcombine_replace_cond): Support TRUTH_ANDIF cond exprs.
2024-11-07	ifcombine across noncontiguous blocks	Alexandre Oliva	1	-29/+123
	Rework ifcombine to support merging conditions from noncontiguous blocks. This depends on earlier preparation changes. The function that attempted to ifcombine a block with its immediate predecessor, tree_ssa_ifcombine_bb, now loops over dominating blocks eligible for ifcombine, attempting to combine with them. The function that actually drives the combination of a pair of blocks, tree_ssa_ifcombine_bb_1, now takes an additional parameter: the successor of outer that leads to inner. The function that recognizes if_then_else patterns is modified to enable testing without distinguishing between then and else, or to require nondegenerate conditions, that aren't worth combining with. for gcc/ChangeLog * tree-ssa-ifcombine.cc (recognize_if_then_else): Support relaxed then/else testing; require nondegenerate condition otherwise. (tree_ssa_ifcombine_bb_1): Add outer_succ_bb parm, use it instead of inner_cond_bb. Adjust callers. (tree_ssa_ifcombine_bb): Loop over dominating outer blocks eligible for ifcombine. (pass_tree_ifcombine::execute): Noted potential need for changes to the post-combine logic.
2024-11-07	extend ifcombine_replace_cond to handle noncontiguous ifcombine	Alexandre Oliva	1	-5/+170
	Prepare to handle noncontiguous ifcombine, introducing logic to modify the outer condition when needed. There are two cases worth mentioning: - when blocks are noncontiguous, we have to place the combined condition in the outer block to avoid pessimizing carefully crafted short-circuited tests; - even when blocks are contiguous, we prepare for situations in which the combined condition has two tests, one to be placed in outer and the other in inner. This circumstance will not come up when noncontiguous ifcombine is first enabled, but it will when an improved fold_truth_andor is integrated with ifcombine. Combining the condition from inner into outer may require moving SSA DEFs used in the inner condition, and the changes implement this as well. for gcc/ChangeLog * tree-ssa-ifcombine.cc: Include bitmap.h. (ifcombine_mark_ssa_name): New. (struct ifcombine_mark_ssa_name_t): New. (ifcombine_mark_ssa_name_walk): New. (ifcombine_replace_cond): Prepare to handle noncontiguous and split-condition ifcombine.
2024-11-07	adjust update_profile_after_ifcombine for noncontiguous ifcombine	Alexandre Oliva	1	-24/+85
	Prepare for ifcombining noncontiguous blocks, adding (still unused) logic to the ifcombine profile updater to handle such cases. for gcc/ChangeLog * tree-ssa-ifcombine.cc (known_succ_p): New. (update_profile_after_ifcombine): Handle noncontiguous blocks.
2024-11-07	introduce ifcombine_replace_cond	Alexandre Oliva	1	-72/+65
	Refactor ifcombine_ifandif, moving the common code from the various paths that apply the combined condition to a new function. for gcc/ChangeLog * tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out of... (ifcombine_ifandif): ... this. Leave it for the above to gimplify and invert the condition.
2024-11-07	drop redundant ifcombine_ifandif parm	Alexandre Oliva	1	-11/+7
	In preparation to changes that may modify both inner and outer conditions in ifcombine, drop the redundant parameter result_inv, that is always identical to inner_inv. for gcc/ChangeLog * tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant result_inv parm. Adjust all callers.
2024-11-07	allow vuses in ifcombine blocks	Alexandre Oliva	1	-1/+1
	Disallowing vuses in blocks for ifcombine is too strict, and it prevents usefully moving fold_truth_andor into ifcombine. That tree-level folder has long ifcombined loads, absent other relevant side effects. for gcc/ChangeLog * tree-ssa-ifcombine.c (bb_no_side_effects_p): Allow vuses, but not vdefs.
2024-11-07	[testsuite] disable PIE on ia32 on more tests	Alexandre Oliva	8	-0/+8
	Multiple tests fail on ia32 with -fPIE enabled by default because of different call sequences required by the call-saved PIC register (no-callee-saved-.c), uses of the constant pool instead of computing constants (pr100865-.c), and unexpected matches of esp in get_pc_thunk (sse2-stv-1.c). Disable PIE on them, to match the expectations. for gcc/testsuite/ChangeLog * gcc.target/i386/no-callee-saved-13.c: Disable PIE on ia32. * gcc.target/i386/no-callee-saved-14.c: Likewise. * gcc.target/i386/no-callee-saved-15.c: Likewise. * gcc.target/i386/no-callee-saved-17.c: Likewise. * gcc.target/i386/pr100865-1.c: Likewise. * gcc.target/i386/pr100865-7a.c: Likewise. * gcc.target/i386/pr100865-7c.c: Likewise. * gcc.target/i386/sse2-stv-1.c: Likewise.
2024-11-07	[testsuite] fix pr70321.c PIC expectations	Alexandre Oliva	1	-1/+5
	When we select a non-bx get_pc_thunk, we get an extra mov to set up the PIC register before the abort call. Expect that mov or a get_pc_thunk.bx call. for gcc/testsuite/ChangeLog * gcc.target/i386/pr70321.c: Cope with non-bx get_pc_thunk.
2024-11-07	RISC-V: Add testcases for signed imm SAT_ADD form1	xuli	12	-0/+336
	This patch adds testcase for form1, as shown below: T __attribute__((noinline)) \ sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \ { \ T sum = (UT)x + (UT)IMM; \ return (x ^ IMM) < 0 \ ? sum \ : (sum ^ x) >= 0 \ ? sum \ : x < 0 ? MIN : MAX; \ } Passed the rv64gcv regression test. Signed-off-by: Li Xu <xuli1@eswincomputing.com> gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Support signed imm SAT_ADD form1. * gcc.target/riscv/sat_s_add_imm-1-1.c: New test. * gcc.target/riscv/sat_s_add_imm-1.c: New test. * gcc.target/riscv/sat_s_add_imm-2-1.c: New test. * gcc.target/riscv/sat_s_add_imm-2.c: New test. * gcc.target/riscv/sat_s_add_imm-3-1.c: New test. * gcc.target/riscv/sat_s_add_imm-3.c: New test. * gcc.target/riscv/sat_s_add_imm-4.c: New test. * gcc.target/riscv/sat_s_add_imm-run-1.c: New test. * gcc.target/riscv/sat_s_add_imm-run-2.c: New test. * gcc.target/riscv/sat_s_add_imm-run-3.c: New test. * gcc.target/riscv/sat_s_add_imm-run-4.c: New test.
2024-11-07	Match:Support signed imm SAT_ADD form1	xuli	2	-0/+16
	This patch would like to support .SAT_ADD when one of the op is singed IMM. Form1: T __attribute__((noinline)) \ sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \ { \ T sum = (UT)x + (UT)IMM; \ return (x ^ IMM) < 0 \ ? sum \ : (sum ^ x) >= 0 \ ? sum \ : x < 0 ? MIN : MAX; \ } Take below form1 as example: DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -10, INT8_MIN, INT8_MAX) Before this patch: __attribute__((noinline)) int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x) { int8_t sum; unsigned char x.0_1; unsigned char _2; signed char _4; int8_t _5; _Bool _9; signed char _10; signed char _11; signed char _12; signed char _14; signed char _16; <bb 2> [local count: 1073741824]: x.0_1 = (unsigned char) x_6(D); _2 = x.0_1 + 246; sum_7 = (int8_t) _2; _4 = x_6(D) ^ sum_7; _16 = x_6(D) ^ 9; _14 = _4 & _16; if (_14 < 0) goto <bb 3>; [41.00%] else goto <bb 4>; [59.00%] <bb 3> [local count: 259738147]: _9 = x_6(D) < 0; _10 = (signed char) _9; _11 = -_10; _12 = _11 ^ 127; <bb 4> [local count: 1073741824]: # _5 = PHI <sum_7(2), _12(3)> return _5; } After this patch: __attribute__((noinline)) int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x) { int8_t _5; <bb 2> [local count: 1073741824]: _5 = .SAT_ADD (x_6(D), -10); [tail call] return _5; } The below test suites are passed for this patch: 1. The rv64gcv fully regression tests. 2. The x86 bootstrap tests. 3. The x86 fully regression tests. Signed-off-by: Li Xu <xuli1@eswincomputing.com> gcc/ChangeLog: * match.pd: Add the form1 of signed imm .SAT_ADD matching. * tree-ssa-math-opts.cc (match_saturation_add): Add fold convert for const_int to the type of operand 0.
2024-11-07	Daily bump.	GCC Administrator	6	-1/+215

2024-11-07	avx10_2-comibf-2.c: Require AVX10.2 support	H.J. Lu	1	-1/+2
	Since avx10_2-comibf-2.c is a run test, require AVX10.2 support. * gcc.target/i386/avx10_2-comibf-2.c: Require avx10_2 target. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-06	[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]	Alexey Merzlyakov	2	-0/+36
	This patch adds optimization of the following patterns: (zero_extend:M (subreg:N (not:O==M (X:Q==M)))) -> (xor:M (zero_extend:M (subreg:N (X:M)), mask)) ... where the mask is GET_MODE_MASK (N). For the cases when X:M doesn't have any non-zero bits outside of mode N, (zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M) and whole optimization will be: (zero_extend:M (subreg:N (not:M (X:M)))) -> (xor:M (X:M, mask)) Patch targets to handle code patterns like: not a0,a0 andi a0,a0,0xff to be optimized to: xori a0,a0,255 Change was locally tested for x86_64 and AArch64 (as most common) and for RV-64 and MIPS-32 targets (as having an effect from this optimization): no regressions for all cases. PR rtl-optimization/112398 gcc/ChangeLog: * simplify-rtx.cc (simplify_context::simplify_unary_operation_1): Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG)) when X doesn't have any non-zero bits outside of SUBREG mode. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr112398.c: New test. Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>
2024-11-06	Darwin: Fix a narrowing warning.	Iain Sandoe	1	-1/+1
	cdtor_record needs to have an unsigned entry for the position in order to match with vec_safe_length. gcc/ChangeLog: * config/darwin.cc (cdtor_record): Make position unsigned. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-11-06	openmp: Fix signed/unsigned warning	Andrew Stubbs	1	-1/+1
	My previous patch broke things when building with Werror. gcc/ChangeLog: * omp-general.cc (omp_max_vf): Cast the constant to poly_uint64.
2024-11-06	openmp: Add testcases for omp_max_vf	Andrew Stubbs	3	-0/+105
	Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when offloading is enabled ("target" directives are present), and is inactive otherwise. libgomp/ChangeLog: * testsuite/libgomp.c/max_vf-1.c: New test. * testsuite/libgomp.c/max_vf-2.c: New test. gcc/testsuite/ChangeLog: * gcc.dg/gomp/max_vf-1.c: New test.
2024-11-06	openmp: Add IFN_GOMP_MAX_VF	Andrew Stubbs	4	-8/+34
	Delay omp_max_vf call until after the host and device compilers have diverged so that the max_vf value can be tuned exactly right on both variants. This change means that the ompdevlow pass must be enabled for functions that use OpenMP directives with both "simd" and "schedule" enabled. gcc/ChangeLog: * internal-fn.cc (expand_GOMP_MAX_VF): New function. * internal-fn.def (GOMP_MAX_VF): New internal function. * omp-expand.cc (omp_adjust_chunk_size): Emit IFN_GOMP_MAX_VF when called in offload context, otherwise assume host context. * omp-offload.cc (execute_omp_device_lower): Expand IFN_GOMP_MAX_VF.
2024-11-06	openmp: use offload max_vf for chunk_size	Andrew Stubbs	1	-8/+28
	The chunk size for SIMD loops should be right for the current device; too big allocates too much memory, too small is inefficient. Getting it wrong doesn't actually break anything though. This patch attempts to choose the optimal setting based on the context. Both host-fallback and device will get the same chunk size, but device performance is the most important in this case. gcc/ChangeLog: * omp-expand.cc (is_in_offload_region): New function. (omp_adjust_chunk_size): Add pass-through "offload" parameter. (get_ws_args_for): Likewise. (determine_parallel_type): Use is_in_offload_region to adjust call to get_ws_args_for. (expand_omp_for_generic): Likewise. (expand_omp_for_static_chunk): Likewise.
2024-11-06	openmp: Tune omp_max_vf for offload targets	Andrew Stubbs	5	-6/+20
	If requested, return the vectorization factor appropriate for the offload device, if any. This change gives a significant speedup in the BabelStream "dot" benchmark on amdgcn. The omp_adjust_chunk_size usecase is set "false", for now, but I intend to change that in a follow-up patch. Note that NVPTX SIMT offload does not use this code-path. gcc/ChangeLog: * gimple-loop-versioning.cc (loop_versioning::loop_versioning): Set omp_max_vf to offload == false. * omp-expand.cc (omp_adjust_chunk_size): Likewise. * omp-general.cc (omp_max_vf): Add "offload" parameter, and detect amdgcn offload devices. * omp-general.h (omp_max_vf): Likewise. * omp-low.cc (lower_rec_simd_input_clauses): Pass offload state to omp_max_vf.
2024-11-06	Add details output for assume processing.	Andrew MacLeod	1	-19/+115
	The Assume pass simply produces results, with no indication of how it arrived as the results it gets. Add some output to the details listing. The only functional change is when gori is used to calculate a range more than once (ie, multiple uses), we now load the merged range rather than just using the last calculated one. * tree-assume.cc (assume_query::assume_query): Add debug output. (assume_query::update_parms): Likewise. (assume_query::calculate_phi): Likewise. (assume_query::calculate_op): Likewise. Also pick up any merged path values. (assume_query::calculate_stmt): Likewise.
2024-11-06	testsuite: add infinite recursion test case [PR63388]	David Malcolm	1	-0/+21
	gcc/testsuite/ChangeLog: PR c++/63388 * g++.dg/analyzer/infinite-recursion-pr63388.C: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-06	diagnostics: fix typo in comment	David Malcolm	1	-1/+1
	gcc/ChangeLog: * diagnostic.h (class diagnostic_context): Fix typo in leading comment. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-06	libstdc++: Deprecate useless <cxxx> compatibility headers for C++17	Jonathan Wakely	28	-120/+288
	These headers make no sense for C++ programs, because they either define different content to the corresponding <xxx.h> C header, or define nothing at all in namespace std. They were all deprecated in C++17, so add deprecation warnings to them, which can be disabled with -Wno-deprecated. For C++20 and later these headers are no longer in the standard at all, so compiling with _GLIBCXX_USE_DEPRECATED defined to 0 will give an error when they are included. Because #warning is non-standard before C++23 we need to use pragmas to ignore -Wc++23-extensions for the -Wsystem-headers -pedantic case. One g++ test needs adjustment because it includes <ciso646>, but that can be made conditional on the __cplusplus value without any reduction in test coverage. For the library tests, consolidate the std_c++0x_neg.cc XFAIL tests into the macros.cc test, using dg-error with a { target c++98_only } selector. This avoids having two separate test files, one for C++98 and one for everything later. Also add tests for the <xxx.h> headers to ensure that they behave as expected and don't give deprecated warnings. libstdc++-v3/ChangeLog: * doc/xml/manual/evolution.xml: Document deprecations. * doc/html/: Regenerate. include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move include guard to start of file. Include <complex> directly instead of <ccomplex>. * include/c_compatibility/tgmath.h: Include <cmath> and <complex> directly, instead of <ctgmath>. * include/c_global/ccomplex: Add deprecated #warning for C++17 and #error for C++20 if _GLIBCXX_USE_DEPRECATED == 0. * include/c_global/ciso646: Likewise. * include/c_global/cstdalign: Likewise. * include/c_global/cstdbool: Likewise. * include/c_global/ctgmath: Likewise. * include/c_std/ciso646: Likewise. * include/precompiled/stdc++.h: Do not include ccomplex, ciso646, cstdalign, cstdbool, or ctgmath in C++17 and later. * testsuite/18_support/headers/cstdalign/macros.cc: Check for warnings and errors for unsupported dialects. * testsuite/18_support/headers/cstdbool/macros.cc: Likewise. * testsuite/26_numerics/headers/ctgmath/complex.cc: Likewise. * testsuite/27_io/objects/char/1.cc: Do not include <ciso646>. * testsuite/27_io/objects/wchar_t/1.cc: Likewise. * testsuite/18_support/headers/cstdbool/std_c++0x_neg.cc: Removed. * testsuite/18_support/headers/cstdalign/std_c++0x_neg.cc: Removed. * testsuite/26_numerics/headers/ccomplex/std_c++0x_neg.cc: Removed. * testsuite/26_numerics/headers/ctgmath/std_c++0x_neg.cc: Removed. * testsuite/18_support/headers/ciso646/macros.cc: New test. * testsuite/18_support/headers/ciso646/macros.h.cc: New test. * testsuite/18_support/headers/cstdbool/macros.h.cc: New test. * testsuite/26_numerics/headers/ccomplex/complex.cc: New test. * testsuite/26_numerics/headers/ccomplex/complex.h.cc: New test. * testsuite/26_numerics/headers/ctgmath/complex.h.cc: New test. gcc/testsuite/ChangeLog: * g++.old-deja/g++.other/headers1.C: Do not include ciso646 for C++17 and later.