|
Hi, Richard and Richi.
Currently, GCC supports COND_LEN_FMA for floating point even **without** -ffast-math.
It is generated in tree-ssa-math-opts.cc. However, GCC fails to support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
Consider the following case:
__attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
TYPE *__restrict a, \
TYPE *__restrict b, int n) \
{ \
for (int i = 0; i < n; i++) \
dst[i] -= a[i] * b[i]; \
}
TEST_TYPE (float) \
TEST_ALL ()
Gimple IR for RVV:
...
_39 = -vect__8.14_26;
vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, vect__4.8_34, vect__4.8_34, _46, 0);
...
This is because of the following piece of code in tree-ssa-math-opts.cc:
  if (len)
    fma_stmt
      = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
                                    addop, else_value, len, bias);
  else if (cond)
    fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
                                           op2, addop, else_value);
  else
    fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
  gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
  gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
                                                               use_stmt));
  gsi_replace (&gsi, fma_stmt, true);

  /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
     regardless of where the negation occurs.  */
  gimple *orig_stmt = gsi_stmt (gsi);
  if (fold_stmt (&gsi, follow_all_ssa_edges))
    {
      if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
        gcc_unreachable ();
      update_stmt (gsi_stmt (gsi));
    }
'fold_stmt' failed to fold NEGATE_EXPR + COND_LEN_FMA ====> COND_LEN_FNMA.
This patch supports folding the statement into:
vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, vect__4.8_34, { 0.0, ... }, _46, 0);
Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments.
Extend the maximum number of ops:
- static const unsigned int MAX_NUM_OPS = 5;
+ static const unsigned int MAX_NUM_OPS = 7;
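For reference, a minimal scalar sketch of the identities behind these folds (using the C library fma; the conditional length internal functions apply the same identities lane-wise under the mask and length):

#include <math.h>

/* FNMA (a, b, c) = -(a * b) + c, i.e. FMA with a negated multiplicand.  */
double fnma (double a, double b, double c) { return fma (-a, b, c); }
/* FMS (a, b, c) = (a * b) - c, i.e. FMA with a negated addend.  */
double fms (double a, double b, double c) { return fma (a, b, -c); }
/* FNMS (a, b, c) = -(a * b) - c, i.e. both negated.  */
double fnms (double a, double b, double c) { return fma (-a, b, -c); }

So a NEGATE_EXPR feeding a COND_LEN_FMA operand can be absorbed by switching to the corresponding COND_LEN_FNMA/FMS/FNMS internal function.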
Bootstrap and regtest on x86 passed.
Tested on aarch64 under QEMU.
Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on the RISC-V backend.
gcc/ChangeLog:
* genmatch.cc (decision_tree::gen): Support
COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
* gimple-match-exports.cc (gimple_simplify): Ditto.
(gimple_resimplify6): New function.
(gimple_resimplify7): New function.
(gimple_match_op::resimplify): Support
COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
(convert_conditional_op): Ditto.
(build_call_internal): Ditto.
(try_conditional_simplification): Ditto.
(gimple_extract): Ditto.
* gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto.
* internal-fn.cc (CASE): Ditto.
|
|
The following adds the capability to do SLP on .MASK_STORE; I do not
plan to add interleaving support.
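A sketch of the kind of loop this enables (body illustrative only; the committed testcase is slp-mask-store-1.c below): two adjacent conditional stores under the same condition are if-converted into .MASK_STOREs with the same mask and can now form an SLP group.

void
foo (int *__restrict x, int *__restrict flag)
{
  for (int i = 0; i < 1024; i++)
    if (flag[i])
      {
        x[2*i+0] += 1;  /* becomes .MASK_STORE */
        x[2*i+1] += 2;  /* same mask, adjacent: SLP-groupable */
      }
}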
PR tree-optimization/111115
gcc/
* tree-vectorizer.h (vect_slp_child_index_for_operand): New.
* tree-vect-data-refs.cc (can_group_stmts_p): Also group
.MASK_STORE.
* tree-vect-slp.cc (arg3_arg2_map): New.
(vect_get_operand_map): Handle IFN_MASK_STORE.
(vect_slp_child_index_for_operand): New function.
(vect_build_slp_tree_1): Handle statements with no LHS,
masked store ifns.
(vect_remove_slp_scalar_calls): Likewise.
* tree-vect-stmts.cc (vect_check_store_rhs): Lookup the
SLP child corresponding to the ifn value index.
(vectorizable_store): Likewise for the mask index. Support
masked stores.
(vectorizable_load): Lookup the SLP child corresponding to the
ifn mask index.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_masked_store):
Supported with check_avx_available.
* gcc.dg/vect/slp-mask-store-1.c: New testcase.
|
|
We assume that all root stmts which compose the total reduction chain
are vectorized but fail to account for the cost of adding back the
scalar defs we are not vectorizing. The following rectifies this,
fixing the gcc.dg/tree-ssa/slsr-11.c FAIL on aarch64.
PR tree-optimization/111125
* tree-vect-slp.cc (vectorizable_bb_reduc_epilogue): Account
for the remain_defs processing.
|
|
The scalar FNMADD/FNMSUB and SVE FNMLA/FNMLS instructions mean
that either side of a subtraction can start an accumulator chain.
However, Advanced SIMD doesn't have an equivalent instruction.
This means that, for Advanced SIMD, a subtraction can only be
fused if the second operand is a multiplication.
Also, if both sides of a subtraction are multiplications,
and if the second operand is used multiple times, such as:
c * d - a * b
e * f - a * b
then the first rather than second multiplication operand will tend
to be fused. On Advanced SIMD, this leads to:
tmp1 = a * b
tmp2 = -tmp1
... = tmp2 + c * d // FMLA
... = tmp2 + e * f // FMLA
where one of the FMLAs also requires a MOV.
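As a hypothetical C kernel showing the shape (not one of the committed tests):

void
f (float *__restrict r1, float *__restrict r2, float *__restrict a,
   float *__restrict b, float *__restrict c, float *__restrict d,
   float *__restrict e, float *__restrict g, int n)
{
  for (int i = 0; i < n; i++)
    {
      r1[i] = c[i] * d[i] - a[i] * b[i];  /* a*b is negated once...  */
      r2[i] = e[i] * g[i] - a[i] * b[i];  /* ...and reused here */
    }
}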
This patch tries to account for this in the vector cost model.
It improves roms performance by 2-3% on Neoverse V1. It's also
needed to avoid a regression in fotonik for Neoverse N2 and
Neoverse V2 with the patch for PR110625.
gcc/
* config/aarch64/aarch64.cc: Include ssa.h.
(aarch64_multiply_add_p): Require the second operand of an
Advanced SIMD subtraction to be a multiplication. Assume that
such an operation won't be fused if the second operand is used
multiple times and if the first operand is also a multiplication.
gcc/testsuite/
* gcc.target/aarch64/neoverse_v1_2.c: New test.
* gcc.target/aarch64/neoverse_v1_3.c: Likewise.
|
|
Hi.
This patch applies LEN_FOLD_EXTRACT_LAST in the loop vectorizer.
Consider the following case:
/* Simple condition reduction. */
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
int last = 66; /* High start value. */
for (int i = 0; i < N; i++)
if (a[i] < min_v)
last = i;
return last;
}
With this patch, we can generate the following IR:
_44 = .SELECT_VL (ivtmp_42, POLY_INT_CST [4, 4]);
_34 = vect_vec_iv_.5_33 + { POLY_INT_CST [4, 4], ... };
ivtmp_36 = _44 * 4;
vect__4.8_39 = .MASK_LEN_LOAD (vectp_a.6_37, 32B, { -1, ... }, _44, 0);
mask__11.9_41 = vect__4.8_39 < vect_cst__40;
last_5 = .LEN_FOLD_EXTRACT_LAST (last_14, mask__11.9_41, vect_vec_iv_.5_33, _44, 0);
...
gcc/ChangeLog:
* tree-vect-loop.cc (vectorizable_reduction): Apply
LEN_FOLD_EXTRACT_LAST.
* tree-vect-stmts.cc (vectorizable_condition): Ditto.
|
|
The following fixes placement of shift operand sanitization with
MIN when the original shift operand was external but the actual
one is not.
PR tree-optimization/111128
* tree-vect-patterns.cc (vect_recog_over_widening_pattern):
Emit external shift operand inline if we promoted it with
another pattern stmt.
* gcc.dg/torture/pr111128.c: New testcase.
|
|
The test is for loop vectorization producing non-canonical
multiplications. We can now BB vectorize the whole function
when the target supports .REDUC_PLUS for V2SImode but we don't
have a dejagnu selector for that. Disable BB vectorization
like we disabled epilogue vectorization.
PR testsuite/111125
* gcc.dg/vect/pr53773.c: Disable BB vectorization.
|
|
vfmsac => vfnmacc
vfmsub => vfnmadd
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/autovec.md: Fix typo.
|
|
As suggested by Kito, we will add a new frm_op_type template argument
to the op classes, to avoid duplicated expand functions.
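A minimal self-contained sketch of the idea (only frm_op_type is taken from the patch; the other names here are placeholders):

enum frm_op_type { FRM_OP_NONE, FRM_OP_HAS };

/* One class now covers both the plain and the _frm builtin: the
   template argument records whether a rounding-mode operand is
   expected, so the expand logic is written only once.  */
template <frm_op_type FRM_OP = FRM_OP_NONE>
class vfmacc_op
{
public:
  bool has_frm_operand_p () const { return FRM_OP == FRM_OP_HAS; }
  /* expand () would emit the same insn either way, appending the
     rounding-mode operand only when FRM_OP == FRM_OP_HAS.  */
};

using vfmacc_plain = vfmacc_op<FRM_OP_NONE>;  /* replaces class vfmacc */
using vfmacc_frm_t = vfmacc_op<FRM_OP_HAS>;   /* replaces class vfmacc_frm */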
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(class binop_frm): Removed.
(class reverse_binop_frm): Ditto.
(class widen_binop_frm): Ditto.
(class vfmacc_frm): Ditto.
(class vfnmacc_frm): Ditto.
(class vfmsac_frm): Ditto.
(class vfnmsac_frm): Ditto.
(class vfmadd_frm): Ditto.
(class vfnmadd_frm): Ditto.
(class vfmsub_frm): Ditto.
(class vfnmsub_frm): Ditto.
(class vfwmacc_frm): Ditto.
(class vfwnmacc_frm): Ditto.
(class vfwmsac_frm): Ditto.
(class vfwnmsac_frm): Ditto.
(class unop_frm): Ditto.
(class vfrec7_frm): Ditto.
(class binop): Add frm_op_type template arg.
(class unop): Ditto.
(class widen_binop): Ditto.
(class widen_binop_fp): Ditto.
(class reverse_binop): Ditto.
(class vfmacc): Ditto.
(class vfnmsac): Ditto.
(class vfmadd): Ditto.
(class vfnmsub): Ditto.
(class vfnmacc): Ditto.
(class vfmsac): Ditto.
(class vfnmadd): Ditto.
(class vfmsub): Ditto.
(class vfwmacc): Ditto.
(class vfwnmacc): Ditto.
(class vfwmsac): Ditto.
(class vfwnmsac): Ditto.
(class float_misc): Ditto.
|
|
The patterns added in r13-4620-g4d9db4bdd458 missed that (a > b) and
(a <= b) are not inverses of each other for floating-point comparisons
(if NaNs are supported). Even though there was a check for integral
types, it was only on the result of the cond rather than on the type of
what is being compared. The fix is to check whether cmp and icmp are
inverses of each other by using the invert_tree_comparison function.
OK for trunk and GCC 13 branch? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
I added the testcase to execute/ieee as it requires support for NaN.
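For illustration (not the committed testcase), the non-inverse behaviour:

#include <math.h>

/* With NaNs, (a > b) and (a <= b) are both false when either operand
   is a NaN, so they are not inverses and the IOR must not fold to 1.  */
int
f (double a, double b)
{
  return (a > b) | (a <= b);
}
/* f (NAN, 1.0) must return 0.  */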
PR tree-optimization/111109
gcc/ChangeLog:
* match.pd (ior(cond,cond), ior(vec_cond,vec_cond)):
Add check to make sure cmp and icmp are inverse.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/ieee/fp-cmp-cond-1.c: New test.
|
|
For 1-bit types, negation is either undefined or does not change the value.
In either case we want to remove it.
This patch adds a match pattern to do that.
Also, when converting to a 1-bit type we can remove the negation just like
we already do for `&1`, so this patch adds that too.
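A minimal illustration (not one of the testcases below):

struct S { unsigned int b : 1; };

/* Modulo 2, negation is the identity: -0 == 0 and -1 == 1.  So the
   negation of a value converted to a 1-bit field is removable.  */
void
f (struct S *s, int x)
{
  s->b = -(x & 1);  /* same as s->b = x & 1; */
}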
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Notes on the testcases:
This patch is the last part needed to fix PR 95929; see the cond-bool-2.c testcase.
bit1neg-1.c is a 1-bit-field testcase where in one case we could remove the
assignment entirely (which previously happened at the RTL level for some targets but not all).
cond-bool-2.c is the reduced testcase of PR 95929.
PR tree-optimization/95929
gcc/ChangeLog:
* match.pd (convert?(-a)): New pattern
for 1bit integer types.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bit1neg-1.c: New test.
* gcc.dg/tree-ssa/cond-bool-1.c: New test.
* gcc.dg/tree-ssa/cond-bool-2.c: New test.
|
|
This reverts commit 11ad44da01dd1c91c96e45802fd8b1c50e88703f.
|
|
AVX10 with AVX512 enabled"
This reverts commit 0288ab14732a16b3787546cdd159941eb7306cf3.
|
|
This reverts commit 26a820dc136b00b4dc37609429576b6a914cb572.
|
|
This reverts commit 2485dd9b4e219307f00d683077bbaf5a2add6604.
|
|
This reverts commit 1c3c405ecf23aeb3a2976350887bf2238719c71f.
|
|
This reverts commit d14ab07ee91de0ebf80b73a22c4a23ecf2a2572e.
|
|
This reverts commit aba10895052fcb2ab3c6d53ad98c855509877555.
|
|
This reverts commit 0b20e0f17b47a86cddba68a2e016be0132ae9b0a.
|
|
This reverts commit 5ccdfd0870be168031f8902e1039e77be93b131a.
|
|
This reverts commit 68f7cb6cf9e8b9f2254855507f3b479552adda5f.
|
|
The following applies some maintenance with respect to type qualifiers
and kinds added by later DWARF standards to prune_unused_types_walk.
The particular case in the bug is not handling (thus marking required)
all restrict qualified type DIEs. I've found more DW_TAG_*_type that
are unhandled, looked up the DWARF docs and added them as well based
on common sense.
PR debug/111080
* dwarf2out.cc (prune_unused_types_walk): Handle
DW_TAG_restrict_type, DW_TAG_shared_type, DW_TAG_atomic_type,
DW_TAG_immutable_type, DW_TAG_coarray_type, DW_TAG_unspecified_type
and DW_TAG_dynamic_type as to only output them when referenced.
* gcc.dg/debug/dwarf2/pr111080.c: New testcase.
|
|
gcc/ChangeLog:
* config/i386/i386.cc (ix86_invalid_conversion): Adjust GCC
V13 to GCC 13.1.
|
|
Both "graniterapid-d" and "graniterapids" are attached with
PROCESSOR_GRANITERAPID in processor_alias_table but mapped to
different __cpu_subtype in get_intel_cpu.
And get_builtin_code_for_version will try to match the first
PROCESSOR_GRANITERAPIDS in processor_alias_table which maps to
"granitepraids" here.
else if (new_target->arch_specified && new_target->arch > 0)
  for (i = 0; i < pta_size; i++)
    if (processor_alias_table[i].processor == new_target->arch)
      {
        const pta *arch_info = &processor_alias_table[i];
        switch (arch_info->priority)
          {
          default:
            arg_str = arch_info->name;
This mismatch makes dispatch_function_versions check the predicate
of __builtin_cpu_is ("graniterapids") for "graniterapids-d" and causes
the issue.
The patch explicitly adds PROCESSOR_ARROWLAKE_S and
PROCESSOR_GRANITERAPIDS_D to make a distinction.
For "alderlake","raptorlake", "meteorlake" they share same isa, cost,
tuning, and mapped to the same __cpu_type/__cpu_subtype in
get_intel_cpu, so no need to add PROCESSOR_RAPTORLAKE and others.
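An illustrative (hypothetical) reproducer via function multiversioning:

/* With the shared PROCESSOR_* entry, the dispatcher generated for
   this ended up testing __builtin_cpu_is ("graniterapids") for the
   "graniterapids-d" clone as well, so the wrong version could be
   selected at run time.  */
__attribute__ ((target_clones ("default", "arch=graniterapids",
                               "arch=graniterapids-d")))
int foo (void) { return 1; }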
gcc/ChangeLog:
* common/config/i386/i386-common.cc (processor_names): Add new
members graniterapids-d and arrowlake-s.
* config/i386/i386-options.cc (processor_alias_table): Update
table with PROCESSOR_ARROWLAKE_S and
PROCESSOR_GRANITERAPIDS_D.
(m_GRANITERAPIDS_D): New macro.
(m_ARROWLAKE_S): Ditto.
(m_CORE_AVX512): Add m_GRANITERAPIDS_D.
(processor_cost_table): Add icelake_cost for
PROCESSOR_GRANITERAPIDS_D and alderlake_cost for
PROCESSOR_ARROWLAKE_S.
* config/i386/x86-tune.def: Handle m_ARROWLAKE_S the same as
m_ARROWLAKE.
* config/i386/i386.h (enum processor_type): Add new member
PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S.
* config/i386/i386-c.cc (ix86_target_macros_internal): Handle
PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S.
|
|
* gcc.dg/tree-ssa/update-threading.c: Xfail for cris-*-*.
|
|
This is primarily Jivan's work, I'm mostly responsible for the write-up and
coordinating with Vlad on a few questions.
On targets with limitations on immediates usable in arithmetic instructions,
LRA's register elimination phase can construct fairly poor code.
This example (from the GCC testsuite) illustrates the problem well.
int consume (void *);
int foo (void) {
  int x[1000000];
  return consume (x + 1000);
}
If you compile on riscv64-linux-gnu with "-O2 -march=rv64gc -mabi=lp64d", then
you'll get this code (up to the call to consume()).
.cfi_startproc
li t0,-4001792
li a0,-3997696
li a5,4001792
addi sp,sp,-16
.cfi_def_cfa_offset 16
addi t0,t0,1792
addi a0,a0,1696
addi a5,a5,-1792
sd ra,8(sp)
add a5,a5,a0
add sp,sp,t0
.cfi_def_cfa_offset 4000016
.cfi_offset 1, -8
add a0,a5,sp
call consume
Of particular interest is the value in a0 when we call consume. We compute that
horribly inefficiently. If we back-substitute from the final assignment to a0
we get...
a0 = a5 + sp
a0 = a5 + (sp + t0)
a0 = (a5 + a0) + (sp + t0)
a0 = ((a5 - 1792) + a0) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + t0)
a0 = ((a5 - 1792) + (a0 + 1696)) + (sp + (t0 + 1792))
a0 = (a5 + (a0 + 1696)) + (sp + t0) // removed offsetting terms
a0 = (a5 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (a0 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + t0)
a0 = (4001792 + (-3997696 + 1696)) + ((sp - 16) + -4001792)
a0 = (-3997696 + 1696) + (sp -16) // removed offsetting terms
a0 = sp - 3990616
That's a pretty convoluted way to compute sp - 3990616.
Something like this would be notably better (not great, but we need both the
stack adjustment and the address of the object to pass to consume):
addi sp,sp,-16
sd ra,8(sp)
li t0,-4001792
addi t0,t0,1792
add sp,sp,t0
li a0,4096
addi a0,a0,-96
add a0,sp,a0
call consume
The problem is LRA's elimination code is not handling the case where we have
(plus (reg1) (reg2)) where reg1 is an eliminable register and reg2 has a known
equivalence, particularly a constant.
If we can determine that reg2 is equivalent to a constant and treat (plus
(reg1) (reg2)) in the same way we'd treat (plus (reg1) (const_int)), then we
can get the desired code.
This eliminates about 19 billion instructions, or roughly 1%, for deepsjeng on rv64.
There are improvements elsewhere, but they're relatively small. This may
ultimately lessen the value of Manolis's fold-mem-offsets patch. So we'll have
to evaluate that again once he posts a new version.
Bootstrapped and regression tested on x86_64 as well as bootstrapped on rv64.
Earlier versions have been tested against spec2017. Pre-approved by Vlad in a
private email conversation (thanks Vlad!).
Committed to the trunk,
gcc/
* lra-eliminations.cc (eliminate_regs_in_insn): Use equivalences
to help simplify code further.
|
|
gcc/fortran/ChangeLog:
PR fortran/32986
* resolve.cc (is_non_constant_shape_array): Add forward declaration.
(resolve_common_vars): Diagnose automatic array object in COMMON.
(resolve_symbol): Prevent confusing follow-on error.
gcc/testsuite/ChangeLog:
PR fortran/32986
* gfortran.dg/common_28.f90: New test.
|
|
Ranger's PHI analyzer currently only allows a single initializer to a group.
This patch changes that to use an initialization range, which is
cumulative of all integer constants, plus a single symbolic value.
There is no other change to group functionality.
This patch also changes the way PHI groups are printed so they show up in the
listing as they are encountered, rather than as a list at the end. Previously
it was more difficult to see what was going on.
PR tree-optimization/110918 - Initialize with range instead of a tree.
gcc/
* gimple-range-fold.cc (fold_using_range::range_of_phi): Tweak output.
* gimple-range-phi.cc (phi_group::phi_group): Remove unused members.
Initialize using a range instead of value and edge.
(phi_group::calculate_using_modifier): Use initializer value and
process for relations after trying for iteration convergence.
(phi_group::refine_using_relation): Use initializer range.
(phi_group::dump): Rework the dump output.
(phi_analyzer::process_phi): Allow multiple constant initializers.
Dump groups immediately as created.
(phi_analyzer::dump): Tweak output.
* gimple-range-phi.h (phi_group::phi_group): Adjust prototype.
(phi_group::initial_value): Delete.
(phi_group::refine_using_relation): Adjust prototype.
(phi_group::m_initial_value): Delete.
(phi_group::m_initial_edge): Delete.
(phi_group::m_vr): Use int_range_max.
* tree-vrp.cc (execute_ranger_vrp): Don't dump phi groups.
gcc/testsuite/
* gcc.dg/pr102983.c: Adjust output expectations.
* gcc.dg/pr110918.c: New.
|
|
The phi analyzer should not create a phi group containing a single phi.
* gimple-range-phi.cc (phi_analyzer::operator[]): Return NULL if
no group was created.
(phi_analyzer::process_phi): Do not create groups of one phi node.
|
|
Now that we have a forward declaration of rtx_code in coretypes.h, we
can adjust these hooks to take rtx_code arguments rather than an int.
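A sketch of the adjusted hook signatures (parameter lists assumed from the description above):

/* target.def -- CODE, CMP_CODE and BIT_CODE were previously int.  */
rtx (*gen_ccmp_first) (rtx_insn **prep_seq, rtx_insn **gen_seq,
                       rtx_code code, tree op0, tree op1);
rtx (*gen_ccmp_next) (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev,
                      rtx_code cmp_code, tree op0, tree op1,
                      rtx_code bit_code);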
gcc/ChangeLog:
* target.def (gen_ccmp_first, gen_ccmp_next): Use rtx_code for
CODE, CMP_CODE and BIT_CODE arguments.
* config/aarch64/aarch64.cc (aarch64_gen_ccmp_first): Likewise.
(aarch64_gen_ccmp_next): Likewise.
* doc/tm.texi: Regenerated.
|
|
Now that we require C++11, we can safely forward declare rtx_code
so that we can use it in target hooks.
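In a nutshell (a sketch; the exact underlying type is assumed): C++11 allows an enum to be forward declared once it has a fixed underlying type, and the definition must then repeat that type.

/* coretypes.h: opaque forward declaration.  */
enum rtx_code : unsigned;

/* rtl.h: the definition repeats the fixed underlying type.  */
enum rtx_code : unsigned
{
  /* ... one enumerator per entry in rtl.def ... */
};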
gcc/ChangeLog
* coretypes.h (rtx_code): Add forward declaration.
* rtl.h (rtx_code): Make compatible with forward declaration.
|
|
Disable the (=&r,m,m) alternative for 32-bit targets. The combination of two
memory operands (possibly with complex addressing modes), an early-clobbered
output, and the frame pointer and PIC registers uses too many registers on
a register-constrained 32-bit target.
Also merge two similar patterns using DWIH mode iterator.
PR target/111010
gcc/ChangeLog:
* config/i386/i386.md (*concat<any_or_plus:mode><dwi>3_3):
Merge pattern from *concatditi3_3 and *concatsidi3_3 using
DWIH mode iterator. Disable (=&r,m,m) alternative for
32-bit targets.
(*concat<any_or_plus:mode><dwi>3_3): Disable (=&r,m,m)
alternative for 32-bit targets.
|
|
Due to the more accurate type attribute added to the clz, ctz and pcnt
operations in https://github.com/gcc-mirror/gcc/commit/07e2576d6f3, the
same type attribute should be used here.
gcc/ChangeLog:
* config/riscv/bitmanip.md (*<bitmanip_optab>disi2_sext): Add a more
appropriate type attribute.
|
|
Hi,
This patch adds conditional unary neg/abs/not autovec patterns to the RISC-V backend.
For this C code:
void
test_3 (float *__restrict a, float *__restrict b, int *__restrict pred, int n)
{
for (int i = 0; i < n; i += 1)
{
a[i] = pred[i] ? __builtin_fabsf (b[i]) : a[i];
}
}
Before this patch:
...
vsetvli a7,zero,e32,m1,ta,ma
vfabs.v v2,v2
vmerge.vvm v1,v1,v2,v0
...
After this patch:
...
vsetvli a7,zero,e32,m1,ta,mu
vfabs.v v1,v2,v0.t
...
For the integer neg/not and FP neg patterns, defining the corresponding
cond_xxx patterns is enough.
For the FP abs pattern, we need to change the definition of the `abs<mode>2` and
`@vcond_mask_<mode><vm>` patterns from define_expand to define_insn_and_split
in order to fuse them into a new pattern `*cond_abs<mode>` in the combine pass.
The fusion process is similar to the one below:
(insn 30 29 31 4 (set (reg:RVVM1SF 152 [ vect_iftmp.15 ])
(abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ]))) "float.c":15:56 discrim 1 12799 {absrvvm1sf2}
(expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ])
(nil)))
(insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ])
(if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ])
(reg:RVVM1SF 152 [ vect_iftmp.15 ])
(reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 12707 {vcond_mask_rvvm1sfrvvmf32bi}
(expr_list:REG_DEAD (reg:RVVM1SF 152 [ vect_iftmp.15 ])
(expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ])
(expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ])
(nil)))))
==>
(insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ])
(if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ])
(abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ]))
(reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 13444 {*cond_absrvvm1sf}
(expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ])
(expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ])
(expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ])
(nil)))))
Best,
Lehua
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*cond_abs<mode>): New combine pattern.
(*copysign<mode>_neg): Ditto.
* config/riscv/autovec.md (@vcond_mask_<mode><vm>): Adjust.
(<optab><mode>2): Ditto.
(cond_<optab><mode>): New.
(cond_len_<optab><mode>): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New.
(expand_cond_len_unop): New helper func.
* config/riscv/riscv-v.cc (shuffle_merge_patterns): Adjust.
(expand_cond_len_unop): New helper func.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-8.c: New test.
|
|
This patch fixes a wrong return value in should_duplicate_loop_header_p.
Doing so uncovered suboptimal decisions in some jump threading testcases,
where we chose to stop duplicating just before a basic block that has zero
cost, even though duplicating it would always be a win.
This is because the heuristics that try to choose the right point, so as
to duplicate all winning blocks and turn the loop into a do-while, did not
account for zero-cost blocks in all cases. The patch simplifies the logic
by simply remembering zero-cost blocks and handling them last, after
the right stopping point is chosen.
gcc/ChangeLog:
* tree-ssa-loop-ch.cc (enum ch_decision): Fix comment.
(should_duplicate_loop_header_p): Fix return value for static exits.
(ch_base::copy_headers): Improve handling of ch_possible_zero_cost.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/copy-headers-9.c: Update template.
|
|
gcc/testsuite/ChangeLog:
PR middle-end/110940
* gcc.c-torture/compile/pr110940.c: New test.
|
|
Like r14-3317, which moved the handling of memory access
type VMAT_GATHER_SCATTER in the final loop nest of
vectorizable_load, this one deals with the vectorizable_store side.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Move the handlings on
VMAT_GATHER_SCATTER in the final loop nest to its own loop,
and update the final nest accordingly.
|
|
Like commit r14-3214, which moved the handling of memory
access type VMAT_LOAD_STORE_LANES in the final loop nest of
vectorizable_load, this one deals with the function
vectorizable_store.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Move the handlings on
VMAT_LOAD_STORE_LANES in the final loop nest to its own loop,
and update the final nest accordingly.
|
|
To avoid duplication in some follow-up patches on the
function vectorizable_store, this patch adjusts some
existing vec uses to auto_vec and removes some manual release
invocations. It also refactors a bit and removes some useless
code.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Remove vec oprnds,
adjust vec result_chain, vec_oprnd with auto_vec, and adjust
gvec_oprnds with auto_delete_vec.
|
|
Committed ahead of the following VSETVL refactor patch to make the V2 patch easier to review.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc
(pass_vsetvl::global_eliminate_vsetvl_insn): Fix potential ICE.
|
|
This bug was exposed by the refactor patch.
Separated it out and committed.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (ge_sew_ratio_unavailable_p):
Fix fuse rule bug.
* config/riscv/riscv-vsetvl.def (DEF_SEW_LMUL_FUSE_RULE): Ditto.
|
|
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c:
Add vsetvli asm.
|
|
This patch is a preparatory patch for the VSETVL pass.
Committed.
gcc/ChangeLog:
* config/riscv/vector.md: Add attribute.
|
|
Committed.
Fix failure:
FAIL: gcc.target/riscv/rvv/autovec/partial/live-1.c scan-tree-dump-times optimized ".VEC_EXTRACT" 10
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/live-1.c: Adapt test.
|
|
Committed.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (change_insn): Clang format.
(vector_infos_manager::all_same_ratio_p): Ditto.
(vector_infos_manager::all_same_avl_p): Ditto.
(pass_vsetvl::refine_vsetvls): Ditto.
(pass_vsetvl::cleanup_vsetvls): Ditto.
(pass_vsetvl::commit_vsetvls): Ditto.
(pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::compute_probabilities): Ditto.
|
|
This patch will be backported to GCC 13 and committed to trunk.
gcc/ChangeLog:
* config/riscv/t-riscv: Add riscv-vsetvl.def.
|
|
Reimplement kf_strlen in terms of the new string scanning
implementation, sharing strlen's implementation with
__analyzer_get_strlen.
gcc/analyzer/ChangeLog:
PR analyzer/105899
* kf-analyzer.cc (class kf_analyzer_get_strlen): Move to kf.cc.
(register_known_analyzer_functions): Use make_kf_strlen.
* kf.cc (class kf_strlen::impl_call_pre): Replace with
implementation of kf_analyzer_get_strlen from kf-analyzer.cc.
Handle "UNKNOWN" return from check_for_null_terminated_string_arg
by falling back to a conjured svalue.
(make_kf_strlen): New.
(register_known_functions): Use make_kf_strlen.
* known-function-manager.h (make_kf_strlen): New decl.
gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/null-terminated-strings-1.c: Update expected
results on symbolic values.
* gcc.dg/analyzer/strlen-1.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
While working on PR109751 I found that maybe_substitute_reqs_for was doing
the wrong thing for a non-template friend, substituting in the template args
of the scope's original template rather than those of the instantiation.
This didn't end up being necessary to fix the PR, but it's still an
improvement.
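An illustration (hypothetical code) of the situation:

#include <concepts>

// A constrained non-template friend defined inside a class template:
// checking f's requirements for A<int> must substitute A<int>'s
// template arguments, not those of the original template A<T>.
template <class T>
struct A
{
  friend void f (A) requires std::integral<T> {}
};

A<int> a;  // f (A<int>) is constrained with T = int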
gcc/cp/ChangeLog:
* pt.cc (outer_template_args): Handle non-template argument.
* constraint.cc (maybe_substitute_reqs_for): Pass decl to it.
* cp-tree.h (outer_template_args): Adjust.
|