Division is slow on hppa and mode sizes are powers of 2. So, we
can use the '&' operator to check displacement alignment.
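As a hedged illustration (not the predicate itself; 'disp' and 'size' are
made-up names, with 'size' assumed to be a power of two), the two checks
below are equivalent, and the mask form avoids the division:
int aligned_mod (unsigned disp, unsigned size) { return disp % size == 0; }
int aligned_and (unsigned disp, unsigned size) { return (disp & (size - 1)) == 0; }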
2024-11-08 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/predicates.md (base14_operand): Use '&' operator
instead of '%' to check displacement alignment.
|
|
LRA has problems handling spills for OI and TI modes. There are
issues with SUBREG support as well.
This change fixes gcc.c-torture/compile/pr92618.c with LRA.
2024-11-08 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR target/117238
* config/pa/pa32-regs.h (PA_HARD_REGNO_MODE_OK): Don't allow
mode size 32. Limit mode size 16 in general registers to
complex modes.
|
|
This is fairly subtle.
When handling spills for SUBREG arguments in pa_emit_move_sequence,
alter_subreg may be called. It in turn calls adjust_address_1 and
change_address_1. change_address_1 calls pa_legitimate_address_p
to validate the new spill address. change_address_1 generates an
internal compiler error if the address is not valid. We need to
allow 14-bit displacements for all modes when reload_in_progress
is true and strict is false to prevent the internal compiler error.
SUBREGs are only used with the general registers, so the spill
should result in an integer access. 14-bit displacements are okay
for integer loads and stores but not for floating-point loads and
stores.
Potentially, the change could break the handling of spills for the
floating-point registers, but I believe these are handled separately
in pa_emit_move_sequence.
This change fixes the build of symmetrica-3.0.1+ds.
2024-11-08 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR target/117443
* config/pa/pa.cc (pa_legitimate_address_p): Allow any
14-bit displacement when reload is in progress and strict
is false.
|
|
x86 doesn't have .REDUC_PLUS for V2SImode, and there's no effective
target for that, so add x86 to the list of targets not expecting the
BB vectorization.
* gcc.dg/vect/bb-slp-77.c: Add x86_64-*-* and i?86-*-* to
the list of expected failing targets.
|
|
When not dealing with the special armv8.1-m.main conditional instructions case,
make sure it uses the default_noce_conversion_profitable_p call to determine
whether the sequence is cost-effective.
Also make sure arm_noce_conversion_profitable_p accepts vsel<cond> patterns for
Armv8.1-M Mainline targets.
gcc/ChangeLog:
PR target/116444
* config/arm/arm.cc (arm_noce_conversion_profitable_p): Call
default_noce_conversion_profitable_p when not dealing with the
armv8.1-m.main special case.
(arm_is_vsel_fp_insn): New function.
|
|
Since C++20, virtual methods can be constexpr, and if they are
constexpr evaluated, we choose tentative_decl_linkage for them,
defer their output and decide again at at_eof time.
On the following testcases we ICE though, because if
expand_or_defer_fn_1 decides to use tentative_decl_linkage, it
returns true and the caller in that case calls emit_associated_thunks,
where use_thunk, which it calls, asserts DECL_INTERFACE_KNOWN on the
thunk destination, which isn't the case for tentative_decl_linkage.
The following patch fixes the ICE by not emitting the thunks
for the DECL_DEFER_OUTPUT fns just yet but waiting until at_eof
time when we return to those.
Note, the second testcase has ICEd since r0-110035 already with -std=c++0x,
before it gets a chance to diagnose the constexpr virtual method.
2024-11-08 Jakub Jelinek <jakub@redhat.com>
PR c++/117317
* semantics.cc (emit_associated_thunks): Do nothing for
!DECL_INTERFACE_KNOWN && DECL_DEFER_OUTPUT fns.
* g++.dg/cpp2a/pr117317-1.C: New test.
* g++.dg/cpp2a/pr117317-2.C: New test.
|
|
Update test case for armv8.1-m.main that supports conditional
arithmetic.
armv7-m:
push {r4, lr}
ldr r4, .L6
ldr r4, [r4]
lsls r4, r4, #29
it mi
addmi r2, r2, #1
bl bar
movs r0, #0
pop {r4, pc}
armv8.1-m.main:
push {r3, r4, r5, lr}
ldr r4, .L5
ldr r5, [r4]
tst r5, #4
csinc r2, r2, r2, eq
bl bar
movs r0, #0
pop {r3, r4, r5, pc}
gcc/testsuite/ChangeLog:
* gcc.target/arm/epilog-1.c: Use check-function-bodies.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
This fixes reported regression at
https://linaro.atlassian.net/browse/GNU-1407.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr68620.c: Use effective-target
arm_libc_fp_abi.
* lib/target-supports.exp: Define effective-target
arm_libc_fp_abi.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Co-authored-by: Richard Earnshaw <rearnsha@arm.com>
|
|
When building the test case with neon, the 'vst1.32' instruction is used
instead of 'strd'. Allow both variants to make the test pass.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr40457-2.c: Add vst1.32 as an allowed
instruction.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
Using "dg-do run" with a selector overrides the default selector set by
vect.exp that picks between "dg-do run" and "dg-do compile" based on the
target's support for simd operations for Arm targets.
The actual selection of default operation is performed in
check_vect_support_and_set_flags.
gcc/testsuite/ChangeLog:
* g++.dg/vect/pr84556.cc: Change from "dg-do run" with selector
to instead use dg-require-effective-target with the same
selector.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
The following also enables the testcase on x86 as it now has the
required cbranch.
* gcc.dg/vect/vect-early-break_21.c: Remove disabling of
x86_64 and i?86.
|
|
Implement -mcpu options for:
- Cortex-A520AE
- Cortex-A720AE
- Cortex-R82AE
These all implement the same feature sets as their non-AE
counterparts, using the same scheduler and costs and differing only in
their respective part numbers.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (cortex-a520ae,
cortex-a720ae, cortex-r82ae): Define new entries.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document A520AE, A720AE and R82AE CPUs.
|
|
Test uses MVE, so add effective-target arm_fp requirement.
gcc/testsuite/ChangeLog:
* g++.target/arm/mve/general-c++/nomve_fp_1.c: Use
effective-target arm_fp.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
form1:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_1 (T *out, T *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
out[i] = (T)IMM >= in[i] ? (T)IMM - in[i] : 0; \
}
Passed the rv64gcv full regression test.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: add data for vec sat_sub.
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: add unsigned imm vec sat_sub form1.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-4.c: New test.
|
|
I missed a search-and-replace on this test, meaning that it was
duplicating bfmlalb_f32.c.
gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/bfmlslb_f32.c: Replace bfmla*
with bfmls*.
|
|
The svpsel_lane intrinsics were wrongly classified as SME2+ only,
rather than as base SME intrinsics. They should always be available
in streaming mode.
gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_sve_psel<BHSD_BITS>)
(*aarch64_sve_psel<BHSD_BITS>_plus): Require TARGET_STREAMING
rather than TARGET_STREAMING_SME2.
gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b16.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b16.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b32.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b32.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b64.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b64.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b8.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b8.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c16.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c16.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c32.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c32.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c64.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c64.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c8.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c8.c: ...here.
|
|
There are two sets of patterns for FCLAMP: one set for single registers
and one set for multiple registers. The multiple-register set was
correctly gated on SME2, but the single-register set only required SME.
This doesn't matter for ACLE usage, since the intrinsic definitions
are correctly gated. But it does matter for automatic generation of
FCLAMP from separate minimum and maximum operations (either ACLE
intrinsics or autovectorised code).
gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_sve_fclamp<mode>)
(*aarch64_sve_fclamp<mode>_x): Require TARGET_STREAMING_SME2
rather than TARGET_STREAMING_SME.
gcc/testsuite/
* gcc.target/aarch64/sme/clamp_3.c: Force sme2
* gcc.target/aarch64/sme/clamp_4.c: Likewise.
* gcc.target/aarch64/sme/clamp_5.c: New test.
|
|
The BPF-specific .BTF.ext section is always generated for BPF programs
if -gbtf is specified, and generating it requires BTF information and
assumes that the BTF info has already been generated.
Compiling non-C languages to BPF is not supported, nor is generating
CTF/BTF for non-C. But, compiling another language like C++ to BPF
with -gbtf specified meant that we would try to generate the .BTF.ext
section anyway, and then ICE because no BTF information was available.
Add a check to bail out of btf_ext_output if the TU CTFC does not exist,
meaning no BTF info is available.
gcc/
PR target/117447
* config/bpf/btfext-out.cc (btf_ext_output): Bail if TU CTFC is null.
|
|
These maps will always be non-null in btf_finalize under normal
circumstances, but be safe and verify that before trying to empty them.
gcc/
* btfout.cc (btf_finalize): Check that hash maps are non-null before
emptying them.
|
|
r0-126134-g5d2a9da9a7f7c1 added support for short-circuiting and combining the ifs
into using either AND or OR. But it only allowed the inner condition
basic block to contain the conditional only. This changes that to allow up to 2 defining
statements, as long as they are just integer-to-integer conversions for
either the lhs or rhs of the conditional.
This should allow ccmp to be used on aarch64 and x86_64 (APX) slightly more than before.
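A hypothetical example (not a testcase from the patch) of the shape this
now handles: the inner condition block contains an integer-to-integer
conversion feeding the comparison.
int f (int a, short b, int c)
{
  if (a > 0)
    if ((int) b > c)  /* the conversion of b no longer blocks combining */
      return 1;
  return 0;
}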
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/85605
gcc/ChangeLog:
* tree-ssa-ifcombine.cc (can_combine_bbs_with_short_circuit): New function.
(ifcombine_ifandif): Use can_combine_bbs_with_short_circuit
instead of checking if iterator is one before the last statement.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/ifcombine-ccmp-1.C: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-9.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
GIMPLE_COND [PR117414]
Sometimes we get back a full ssa name when looking up the comparison of the GIMPLE_COND
rather than a predicate. We then want to look up `val != 0` for the predicate.
Note this might happen with other boolean assignments and COND_EXPR, but I am not sure
if it is as important; I have not found a testcase yet.
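A hedged sketch of such a situation (hypothetical, not the missing testcase):
the comparison in the GIMPLE_COND value-numbers to the SSA name already
holding t, so the predicate has to be looked up as `t != 0`.
void g (int);
int f (int a, int b)
{
  _Bool t = a > b;
  g (t);            /* the comparison is computed into t first */
  if (a > b)        /* looking up this comparison returns t's SSA name */
    return 1;
  return 0;
}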
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (process_bb): Look up
`val != 0` if we got back an ssa name when looking up the comparison.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-4.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
After the last patch, we also want to record `(A CMP B) != 0`
as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the
true/false edges swapped.
This shows up more due to the new handling of
`(A | B) ==/!= 0` in insert_predicates_for_cond
as now we can notice these comparisons which were not seen before.
This is enough to fix the original issue in `gcc.dg/tree-ssa/pr111456-1.c`
and make sure we don't regress it when enhancing ifcombine.
This adds that predicate and allows us to optimize f
in fre-predicated-3.c.
Changes since v1:
* v2: Use vn_valueize.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): Handle `(A CMP B) !=/== 0`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
For `(a | b) == 0`, we can "assert" on the true edge that
both `a == 0` and `b == 0` but nothing on the false edge.
For `(a | b) != 0`, we can "assert" on the false edge that
both `a == 0` and `b == 0` but nothing on the true edge.
This adds that predicate and allows us to optimize f0, f1,
and f2 in fre-predicated-[12].c.
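A hypothetical illustration (not one of the committed tests) of what the
new predicates enable:
int f0 (int a, int b)
{
  if ((a | b) == 0)
    return a;       /* a == 0 is recorded on this edge, so this is 0 */
  return 1;
}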
Changes since v1:
* v2: Use vn_valueize. Also canonicalize the comparison
at the beginning of insert_predicates_for_cond for
constants to be on the rhs. Return early for
non-ssa names on the lhs (after canonicalization).
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117414
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): Canonicalize the comparison.
Don't insert anything if lhs is not a SSA_NAME. Handle `(a | b) !=/== 0`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/fre-predicated-1.c: New test.
* gcc.dg/tree-ssa/fre-predicated-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
To make it easier to add more predicates in some cases,
factor out the code. Plus it makes the code slightly more
readable since it is not indented as much.
Bootstrapped and tested on x86_64.
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): New function, factored out from ...
(process_bb): Here.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Until now, the structures that keep pragma information were different
in preprocessing-only mode and in normal mode. This change unifies
both so that the space and name of a pragma are always registered and
can be queried easily at a later time.
gcc/c-family/ChangeLog:
* c-pragma.cc (struct pragma_pp_data): Use (struct internal_pragma_handler);
(c_register_pragma_1): Always register name and space for all pragmas.
(c_invoke_pragma_handler): Adapt.
(c_invoke_early_pragma_handler): Likewise.
(c_pp_invoke_early_pragma_handler): Likewise.
|
|
We currently make vect_check_gather_scatter happy by replacing SSA
name references in DR_REF for gather/scatter DRs, but the replacement
process only works once: for the second epilogue we have SSA
names from the first epilogue in DR_REF, but as we copied from the
original loop the SSA mapping doesn't work.
The following simply punts for non-first epilogues; gather/scatter
recognized by patterns to IFNs are already analyzed and should work
fine.
* tree-vect-data-refs.cc (vect_check_gather_scatter): Refuse
to analyze DR_REF if from an epilogue that's not first.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Add comment
how the substitution in DR_REF is broken.
|
|
The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside
LOOP_VINFO_ORIG_LOOP_INFO so one can access both the main
vectorized loop info and the preceding vectorized epilogue.
This is critical for correctness as we need to disallow never
executed epilogues by costing in vect_analyze_loop_costing as
we assume those do not exist when deciding to add a skip-vector
edge during peeling. The patch also changes how multiple vector
epilogues are handled - instead of the epilogue_vinfos array in
the main loop info we now record the single epilogue_vinfo there
and further epilogues in the epilogue_vinfo member of the
epilogue info. This simplifies code.
* tree-vectorizer.h (_loop_vec_info::main_loop_info): New.
(LOOP_VINFO_MAIN_LOOP_INFO): Likewise.
(_loop_vec_info::epilogue_vinfo): Change from epilogue_vinfos
from array to single element.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
main_loop_info and epilogue_vinfo. Remove epilogue_vinfos
allocation.
(_loop_vec_info::~_loop_vec_info): Do not release epilogue_vinfos.
(vect_create_loop_vinfo): Rename parameter, set
LOOP_VINFO_MAIN_LOOP_INFO.
(vect_analyze_loop_1): Rename parameter.
(vect_analyze_loop_costing): Properly distinguish between
the main vector loop and the preceding epilogue.
(vect_analyze_loop): Change for epilogue_vinfos no longer
being a vector.
* tree-vect-loop-manip.cc (vect_do_peeling): Simplify and
thereby handle a vector epilogue of a vector epilogue.
|
|
The following remembers how we advanced DRs when vectorizing an
epilogue. When we want to vectorize the epilogue of such an epilogue
we have to retain that advancement and add the advancement for this
vectorized epilogue. Due to the way we copy and re-associate
stmt_vec_infos and DRs, recording this advancement and re-applying
it for the next epilogue is simplest.
* tree-vectorizer.h (_loop_vec_info::drs_advanced_by): New.
(LOOP_VINFO_DRS_ADVANCED_BY): Likewise.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
drs_advanced_by.
(update_epilogue_loop_vinfo): Remember the DR advancement made.
(vect_transform_loop): Accumulate past advancements.
|
|
We need to check that an epilogue doesn't require LOOP_VINFO_PEELING_FOR_GAPS
in case the main loop didn't (the other way around is OK); the
computation of whether the epilogue is executed or not gets out of sync
otherwise.
* tree-vect-loop.cc (vect_analyze_loop_2): Move
vect_analyze_loop_costing after check whether we can do
peeling. Add check on LOOP_VINFO_PEELING_FOR_GAPS for
epilogues.
|
|
On Fri, Oct 18, 2024 at 02:05:59PM -0400, Antoni Boucher wrote:
> PR target/116725
> * gcc.target/i386/pr116725.c: Add test using those AVX builtins.
This test FAILs for me, as I don't have the latest gas around and the test
is dg-do assemble, so it doesn't just need a fixed compiler, but also an assembler
which supports those instructions.
The following patch adds effective target directives to ensure assembler
supports those too.
2024-11-07 Jakub Jelinek <jakub@redhat.com>
PR target/116725
* gcc.target/i386/pr116725.c: Add dg-require-effective-target
avx512{dq,fp16,vl}.
|
|
Apparently we need to explicitly disable AVX, not just enable SSE, to
guarantee the 16-lane vectors we need for the pattern match.
libgomp/ChangeLog:
* testsuite/libgomp.c/max_vf-1.c: Add -mno-avx.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/max_vf-1.c: Add -mno-avx.
|
|
This patch would like to add doc for the below 2 standard names.
1. strided load: v = mask_len_strided_load (ptr, stride, mask, len, bias)
2. strided store: mask_len_strided_store (ptr, stride, v, mask, len, bias)
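Conceptually (a hedged scalar sketch using the argument names above, with
the stride counted in elements; this is not the exact optab wording):
for (i = 0; i < len + bias; i++)
  if (mask[i])
    v[i] = ptr[i * stride];   /* the store form writes v[i] back instead */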
gcc/ChangeLog:
* doc/md.texi: Add doc for mask_len_strided_load{store}.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
ext-dce uses TV_NONE; that's not OK for a pass taking 33% compile-time.
The following adds a timevar to it for proper blaming.
PR rtl-optimization/117467
* timevar.def (TV_EXT_DCE): New.
* ext-dce.cc (pass_data_ext_dce): Use TV_EXT_DCE.
|
|
We recently added support for cbranchbf4 with AVX10_2 native bf16 comi
instructions, so do similarly for cstorebf4.
gcc/ChangeLog:
* config/i386/i386.md (cstorebf4): Use vcomsbf16 under
TARGET_AVX10_2_256 and -fno-trapping-math.
(cbranchbf4): Adjust formatting.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-comibf-3.c: New test.
* gcc.target/i386/avx10_2-comibf-4.c: Likewise.
|
|
Since the test doesn't care if the hint is correct,
modify the regexp of the hint part to avoid future
changes to the hint that would cause the test to fail.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr117304-1.c: Modify regexp.
|
|
It became apparent that conditions could be combined that had deep SSA
dependency trees, which might thus require moving lots of statements.
Set a hard upper bound for now, hopefully to be replaced by a
dynamically computed bound, based on probabilities and costs.
Also reset flow sensitive info and avoid introducing undefined
behavior when moving stmts from under guarding conditions.
Finally, rework the preexisting reset of flow sensitive info and
avoidance of undefined behavior to be done when needed on all affected
inner blocks: reset flow info whenever enclosing conditions change,
and avoid undefined behavior whenever enclosing conditions become
laxer.
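A hedged sketch of the undefined-behavior avoidance: when a statement such
as a signed addition is moved out from under its guarding condition, it can
be rewritten to use unsigned arithmetic, whose overflow is well defined
(illustrative C with made-up operands, not the actual implementation):
  /* before, only evaluated when the guard holds:  tem = a + b;  */
  /* after being moved above the guard:  */
  tem = (int) ((unsigned int) a + (unsigned int) b);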
for gcc/ChangeLog
* tree-ssa-ifcombine.cc
(ifcombine_rewrite_to_defined_overflow): New.
(ifcombine_replace_cond): Reject conds that would require
moving too many stmts. Reset flow sensitive info and avoid
undefined behavior in moved stmts. Reset flow sensitive info
in all inner blocks when the outer condition changes, and
avoid undefined behavior whenever the outer condition becomes
laxer, adapted and moved from...
(pass_tree_ifcombine::execute): ... here.
|
|
The upcoming move of fold_truth_andor to ifcombine brings with it the
possibility of TRUTH_ANDIF cond exprs. Handle them by splitting the
cond so as to best use both BB insertion points, but only if they're
contiguous.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Support
TRUTH_ANDIF cond exprs.
|
|
Rework ifcombine to support merging conditions from noncontiguous
blocks. This depends on earlier preparation changes.
The function that attempted to ifcombine a block with its immediate
predecessor, tree_ssa_ifcombine_bb, now loops over dominating blocks
eligible for ifcombine, attempting to combine with them.
The function that actually drives the combination of a pair of blocks,
tree_ssa_ifcombine_bb_1, now takes an additional parameter: the
successor of outer that leads to inner.
The function that recognizes if_then_else patterns is modified to
enable testing without distinguishing between then and else, or to
require nondegenerate conditions, since degenerate ones aren't worth combining with.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (recognize_if_then_else): Support
relaxed then/else testing; require nondegenerate condition
otherwise.
(tree_ssa_ifcombine_bb_1): Add outer_succ_bb parm, use it
instead of inner_cond_bb. Adjust callers.
(tree_ssa_ifcombine_bb): Loop over dominating outer blocks
eligible for ifcombine.
(pass_tree_ifcombine::execute): Noted potential need for
changes to the post-combine logic.
|
|
Prepare to handle noncontiguous ifcombine, introducing logic to modify
the outer condition when needed. There are two cases worth
mentioning:
- when blocks are noncontiguous, we have to place the combined
condition in the outer block to avoid pessimizing carefully crafted
short-circuited tests;
- even when blocks are contiguous, we prepare for situations in which
the combined condition has two tests, one to be placed in outer and
the other in inner. This circumstance will not come up when
noncontiguous ifcombine is first enabled, but it will when
an improved fold_truth_andor is integrated with ifcombine.
Combining the condition from inner into outer may require moving SSA
DEFs used in the inner condition, and the changes implement this as
well.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc: Include bitmap.h.
(ifcombine_mark_ssa_name): New.
(struct ifcombine_mark_ssa_name_t): New.
(ifcombine_mark_ssa_name_walk): New.
(ifcombine_replace_cond): Prepare to handle noncontiguous and
split-condition ifcombine.
|
|
Prepare for ifcombining noncontiguous blocks, adding (still unused)
logic to the ifcombine profile updater to handle such cases.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (known_succ_p): New.
(update_profile_after_ifcombine): Handle noncontiguous blocks.
|
|
Refactor ifcombine_ifandif, moving the common code from the various
paths that apply the combined condition to a new function.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out
of...
(ifcombine_ifandif): ... this. Leave it for the above to
gimplify and invert the condition.
|
|
In preparation to changes that may modify both inner and outer
conditions in ifcombine, drop the redundant parameter result_inv, that
is always identical to inner_inv.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant
result_inv parm. Adjust all callers.
|
|
Disallowing vuses in blocks for ifcombine is too strict, and it
prevents usefully moving fold_truth_andor into ifcombine. That
tree-level folder has long ifcombined loads, absent other relevant
side effects.
for gcc/ChangeLog
* tree-ssa-ifcombine.cc (bb_no_side_effects_p): Allow vuses,
but not vdefs.
|
|
Multiple tests fail on ia32 with -fPIE enabled by default because of
different call sequences required by the call-saved PIC register
(no-callee-saved-*.c), uses of the constant pool instead of computing
constants (pr100865-*.c), and unexpected matches of esp in get_pc_thunk
(sse2-stv-1.c). Disable PIE on them, to match the expectations.
for gcc/testsuite/ChangeLog
* gcc.target/i386/no-callee-saved-13.c: Disable PIE on ia32.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/sse2-stv-1.c: Likewise.
|
|
When we select a non-bx get_pc_thunk, we get an extra mov to set up
the PIC register before the abort call. Expect that mov or a
get_pc_thunk.bx call.
for gcc/testsuite/ChangeLog
* gcc.target/i386/pr70321.c: Cope with non-bx get_pc_thunk.
|
|
This patch adds testcases for form1, as shown below:
T __attribute__((noinline)) \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{ \
T sum = (UT)x + (UT)IMM; \
return (x ^ IMM) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
}
Passed the rv64gcv regression test.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Support signed
imm SAT_ADD form1.
* gcc.target/riscv/sat_s_add_imm-1-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-3-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-4.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-4.c: New test.
|
|
This patch would like to support .SAT_ADD when one of the ops
is a signed IMM.
Form1:
T __attribute__((noinline)) \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{ \
T sum = (UT)x + (UT)IMM; \
return (x ^ IMM) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
}
Take the below form1 as an example:
DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -10, INT8_MIN, INT8_MAX)
Before this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
int8_t sum;
unsigned char x.0_1;
unsigned char _2;
signed char _4;
int8_t _5;
_Bool _9;
signed char _10;
signed char _11;
signed char _12;
signed char _14;
signed char _16;
<bb 2> [local count: 1073741824]:
x.0_1 = (unsigned char) x_6(D);
_2 = x.0_1 + 246;
sum_7 = (int8_t) _2;
_4 = x_6(D) ^ sum_7;
_16 = x_6(D) ^ 9;
_14 = _4 & _16;
if (_14 < 0)
goto <bb 3>; [41.00%]
else
goto <bb 4>; [59.00%]
<bb 3> [local count: 259738147]:
_9 = x_6(D) < 0;
_10 = (signed char) _9;
_11 = -_10;
_12 = _11 ^ 127;
<bb 4> [local count: 1073741824]:
# _5 = PHI <sum_7(2), _12(3)>
return _5;
}
After this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
int8_t _5;
<bb 2> [local count: 1073741824]:
_5 = .SAT_ADD (x_6(D), -10); [tail call]
return _5;
}
The below test suites passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:
* match.pd: Add the form1 of signed imm .SAT_ADD matching.
* tree-ssa-math-opts.cc (match_saturation_add): Add fold
convert for const_int to the type of operand 0.
|
|
Since avx10_2-comibf-2.c is a run test, require AVX10.2 support.
* gcc.target/i386/avx10_2-comibf-2.c: Require avx10_2 target.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
This patch adds optimization of the following patterns:
(zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
(xor:M (zero_extend:M (subreg:N (X:M)), mask))
... where the mask is GET_MODE_MASK (N).
For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M)
and whole optimization will be:
(zero_extend:M (subreg:N (not:M (X:M)))) ->
(xor:M (X:M, mask))
The patch targets code patterns like:
not a0,a0
andi a0,a0,0xff
to be optimized to:
xori a0,a0,255
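A hypothetical C function that produces such a sequence (a sketch, not
necessarily the committed testcase):
unsigned char
f (unsigned int x)
{
  return ~x;   /* not + mask of the low byte, now folded to a single xori */
}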
The change was locally tested for x86_64 and AArch64 (as most common)
and for RV-64 and MIPS-32 targets (as having an effect from this optimization):
no regressions in any of these cases.
PR rtl-optimization/112398
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG))
when X doesn't have any non-zero bits outside of SUBREG mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr112398.c: New test.
Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>
|