Age | Commit message | Author | Files | Lines
2025-07-21RISC-V: Add test for vec_duplicate + vaaddu.vv combine for DImodePan Li7-0/+26
Add asm dump checks and run tests for the vec_duplicate + vaaddu.vv combine to vaaddu.vx, with GR2VR costs of 0, 1, 2 and 15 for case 0 and case 1. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-1-u64.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-21RISC-V: Allow VLS DImode for sat_op vx DImode patternPan Li1-15/+15
When trying to introduce the vaaddu.vx combine for DImode, we hit an ICE like the one below: 0x4889763 internal_error(char const*, ...) .../riscv-gnu-toolchain/gcc/__build__/../gcc/diagnostic-global-context.cc:517 0x4842f98 fancy_abort(char const*, int, char const*) .../riscv-gnu-toolchain/gcc/__build__/../gcc/diagnostic.cc:1818 0x2953461 code_for_pred_scalar(int, machine_mode) ./insn-opinit.h:1911 0x295f300 riscv_vector::sat_op<110>::expand(riscv_vector::function_expander&) const .../riscv-gnu-toolchain/gcc/__build__/../gcc/config/riscv/riscv-vector-builtins-bases.cc:667 0x294bce1 riscv_vector::function_expander::expand() We get code_for_nothing when emitting the vaadd.vx insn for the V2DI VLS mode. So allow the VLS mode for the sat_op vx pattern to unblock it. gcc/ChangeLog: * config/riscv/vector.md: Allow VLS DImode for sat_op vx pattern. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-21RISC-V: Add test for vec_duplicate + vaaddu.vv combine case 1 with GR2VR cost 0, 1 and 2 for QI, HI and SI modePan Li9-0/+18
Add asm dump check tests for the vec_duplicate + vaaddu.vv combine to vaaddu.vx, with GR2VR costs of 0, 1 and 2. Please note DImode is not included. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-21RISC-V: Add test for vec_duplicate + vaaddu.vv combine case 0 with GR2VR cost 0, 2 and 15 for QI, HI and SI modePan Li14-15/+297
Add asm dump checks and run tests for the vec_duplicate + vaaddu.vv combine to vaaddu.vx, with GR2VR costs of 0, 2 and 15. Please note DImode is not included here. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test data for run test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-1-u16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-1-u32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-1-u8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-21RISC-V: Combine vec_duplicate + vaaddu.vv to vaaddu.vx on GR2VR cost for HI, QI and SI modePan Li2-2/+89
This patch combines vec_duplicate + vaaddu.vv into vaaddu.vx, as in the example code below. The related pattern depends on the cost of the vec_duplicate from GR2VR: the late-combine pass performs the combination if the GR2VR cost is zero, and rejects it if the GR2VR cost is greater than zero. Assume we have example code like below, with a GR2VR cost of 0. #define DEF_AVG_FLOOR(NT, WT) \ NT \ test_##NT##_avg_floor(NT x, NT y) \ { \ return (NT)(((WT)x + (WT)y) >> 1); \ } #define AVG_FLOOR_FUNC(T) test_##T##_avg_floor DEF_AVG_FLOOR(uint32_t, uint64_t) DEF_VX_BINARY_CASE_2_WRAP(T, AVG_FLOOR_FUNC(T), sat_add) Before this patch: 11 │ beq a3,zero,.L8 12 │ vsetvli a5,zero,e32,m1,ta,ma 13 │ vmv.v.x v2,a2 14 │ slli a3,a3,32 15 │ srli a3,a3,32 16 │ .L3: 17 │ vsetvli a5,a3,e32,m1,ta,ma 18 │ vle32.v v1,0(a1) 19 │ slli a4,a5,2 20 │ sub a3,a3,a5 21 │ add a1,a1,a4 22 │ vaaddu.vv v1,v1,v2 23 │ vse32.v v1,0(a0) 24 │ add a0,a0,a4 25 │ bne a3,zero,.L3 After this patch: 11 │ beq a3,zero,.L8 12 │ slli a3,a3,32 13 │ srli a3,a3,32 14 │ .L3: 15 │ vsetvli a5,a3,e32,m1,ta,ma 16 │ vle32.v v1,0(a1) 17 │ slli a4,a5,2 18 │ sub a3,a3,a5 19 │ add a1,a1,a4 20 │ vaaddu.vx v1,v1,a2 21 │ vse32.v v1,0(a0) 22 │ add a0,a0,a4 23 │ bne a3,zero,.L3 gcc/ChangeLog: * config/riscv/autovec-opt.md (*uavg_floor_vx_<mode>): Add pattern for vaaddu.vx combine. * config/riscv/riscv.cc (get_vector_binary_rtx_cost): Add UNSPEC handling for UNSPEC_VAADDU. (riscv_rtx_costs): Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-21aarch64: Avoid INS-(W|X)ZR instructions when optimising for speedKyrylo Tkachov4-2/+33
For inserting zero into a vector lane we usually use an instruction like: ins v0.h[2], wzr This, however, has not-so-great performance on some CPUs. On Grace, for example, it has a latency of 5 and a throughput of 1. The alternative sequence: movi v31.8b, #0 ins v0.h[2], v31.h[0] is preferable because the MOVI-0 is often a zero-latency operation that is eliminated by the CPU frontend and the lane-to-lane INS has a latency of 2 and throughput of 4. We can avoid the merging of the two instructions into the aarch64_simd_vec_set_zero<mode> by disabling that pattern when optimizing for speed. Thanks to wider benchmarking from Tamar, it makes sense to make this change for all tunings, so no RTX costs or tuning flags are introduced to control this in a more fine-grained manner. They can be easily added in the future if needed for a particular CPU. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>): Enable only when optimizing for size. gcc/testsuite/ * gcc.target/aarch64/simd/mf8_data_1.c (test_set_lane4, test_setq_lane4): Relax allowed assembly. * gcc.target/aarch64/vec-set-zero.c: Use -Os in flags. * gcc.target/aarch64/inszero_split_1.c: New test.
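As a rough illustration (not taken from the patch; function name and lane value are made up), source along these lines is one way to exercise the aarch64_simd_vec_set_zero<mode> pattern that the change now restricts to size optimization:

    #include <arm_neon.h>

    /* Set lane 2 of a 16-bit vector to zero.  With this change, when not
       optimizing for size this should no longer be merged into a single
       "ins v0.h[2], wzr" but kept as movi + lane-to-lane ins.  */
    int16x8_t
    zero_lane2 (int16x8_t v)
    {
      return vsetq_lane_s16 (0, v, 2);
    }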
2025-07-21aarch64: NFC - Make vec_* rtx costing logic consistentKyrylo Tkachov3-58/+65
The rtx costs logic for CONST_VECTOR, VEC_DUPLICATE and VEC_SELECT sets the cost unconditionally to the movi, dup or extract fields of extra_cost, when the normal practice in that function is to use extra_cost only when speed is set. When speed is false the function should estimate the size cost only. This patch makes the logic consistent by using the extra_cost fields to increment the cost when speed is set. This requires reducing the extra_cost values of the movi, dup and extract fields by COSTS_N_INSNS (1), as every insn being costed has a cost of COSTS_N_INSNS (1) at the start of the function. The cost tables for the CPUs are updated in line with this. With these changes the testsuite is unaffected so no different costing decisions are made and this patch is just a cleanup. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/aarch64.cc (aarch64_rtx_costs): Add extra_cost values only when speed is true for CONST_VECTOR, VEC_DUPLICATE, VEC_SELECT cases. * config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs, thunderx_extra_costs, thunderx2t99_extra_costs, thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs, ampere1_extra_costs, ampere1a_extra_costs, ampere1b_extra_costs): Reduce cost of movi, dup, extract fields by COSTS_N_INSNS (1). * config/arm/aarch-cost-tables.h (generic_extra_costs, cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs, exynosm1_extra_costs, xgene1_extra_costs): Likewise.
2025-07-21tree-optimization/121194 - check LC PHIs can be vectorizedRichard Biener2-0/+29
With bools we can have the usual mismatch between mask and data use. Catch that, like we do elsewhere. PR tree-optimization/121194 * tree-vect-loop.cc (vectorizable_lc_phi): Verify vector types are compatible. * gcc.dg/torture/pr121194.c: New testcase.
2025-07-21fortran: Fix indentationMikael Morin1-8/+8
gcc/fortran/ChangeLog: * trans-decl.cc (gfc_trans_deferred_vars): Fix indentation.
2025-07-21amdgcn: add DImode offsets for gather/scatterAndrew Stubbs2-12/+103
Add new variants of the gather_load and scatter_store instructions that take the offsets in DImode. This is not the natural width for offsets in the instruction set, but we can use them to compute a vector of absolute addresses, which does work. This enables the autovectorizer to use gather/scatter in a number of additional scenarios (one of which shows up in the SPEC HPC lbm benchmark). gcc/ChangeLog: * config/gcn/gcn-valu.md (gather_load<mode><vndi>): New. (scatter_store<mode><vndi>): New. (mask_gather_load<mode><vndi>): New. (mask_scatter_store<mode><vndi>): New. * config/gcn/gcn.cc (gcn_expand_scaled_offsets): Support DImode.
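A sketch (purely illustrative, not from the patch) of the kind of loop this helps: a 64-bit index type gives DImode offsets, which can now be turned into a vector of absolute addresses for a gather:

    /* Indirect load with 64-bit offsets; the autovectorizer can now use a
       gather for the in[idx[i]] access on amdgcn.  */
    void
    gather_like (double *out, const double *in, const long *idx, int n)
    {
      for (int i = 0; i < n; ++i)
        out[i] = in[idx[i]];
    }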
2025-07-21amdgcn: Add ashlvNm, mulvNm macrosAndrew Stubbs1-27/+41
I need some extra shift varieties in the mode-independent code, but the macros don't permit insns that don't have QI/HI variants. This fixes the problem, and adds the new functions for the follow-up patch to use. gcc/ChangeLog: * config/gcn/gcn.cc (GEN_VNM_NOEXEC): Use USE_QHF. (GEN_VNM): Likewise, and call for new ashl and mul variants.
2025-07-21amdgcn: add more insn patterns using vec_duplicateAndrew Stubbs2-6/+179
These new insns allow more efficient use of scalar inputs to 64-bit vector add and mul. Also, the patch adjusts the existing mul.._dup because it was actually a dup2 (the vec_duplicate is on the second input), and that was inconveniently inconsistent. The patterns are generally useful, but will be used directly by a follow-up patch. gcc/ChangeLog: * config/gcn/gcn-valu.md (add<mode>3_dup): New. (add<mode>3_dup_exec): New. (<su>mul<mode>3_highpart_dup<exec>): New. (mul<mode>3_dup): Move the vec_duplicate to operand 1. (mul<mode>3_dup_exec): New. (vec_series<mode>): Adjust call to gen_mul<mode>3_dup. * config/gcn/gcn.cc (gcn_expand_vector_init): Likewise.
2025-07-21genoutput: Verify hard register constraintsStefan Schulze Frielinghaus3-0/+52
Since genoutput has no information about hard register names we cannot statically verify those names in constraints of the machine description. Therefore, we have to do it at runtime. Although verification shouldn't be too expensive, restrict it to checking builds. This should be sufficient since hard register constraints in machine descriptions probably change rarely, and each commit should be tested with checking anyway, or at the very least before a release is taken. gcc/ChangeLog: * genoutput.cc (main): Emit function verify_reg_names_in_constraints() for run-time validation. (mdep_constraint_len): Deal with hard register constraints. * output.h (verify_reg_names_in_constraints): New function declaration. * toplev.cc (backend_init): If checking is enabled, call into verify_reg_names_in_constraints().
2025-07-21Error handling for hard register constraintsStefan Schulze Frielinghaus26-118/+843
This implements error handling for hard register constraints including potential conflicts with register asm operands. In contrast to register asm operands, hard register constraints allow more than just one register per operand. Even more than just one register per alternative. For example, a valid constraint for an operand is "{r0}{r1}m,{r2}". However, this also means that we have to make sure that each register is used at most once in each alternative over all outputs and likewise over all inputs. For asm statements this is done by this patch during gimplification. For hard register constraints used in machine description, error handling is still a todo and I haven't investigated this so far and consider this rather a low priority. gcc/ada/ChangeLog: * gcc-interface/trans.cc (gnat_to_gnu): Pass null pointer to parse_{input,output}_constraint(). gcc/analyzer/ChangeLog: * region-model-asm.cc (region_model::on_asm_stmt): Pass null pointer to parse_{input,output}_constraint(). gcc/c/ChangeLog: * c-typeck.cc (build_asm_expr): Pass null pointer to parse_{input,output}_constraint(). gcc/ChangeLog: * cfgexpand.cc (n_occurrences): Move this ... (check_operand_nalternatives): and this ... (expand_asm_stmt): and the call to gimplify.cc. * config/s390/s390.cc (s390_md_asm_adjust): Pass null pointer to parse_{input,output}_constraint(). * gimple-walk.cc (walk_gimple_asm): Pass null pointer to parse_{input,output}_constraint(). (walk_stmt_load_store_addr_ops): Ditto. * gimplify-me.cc (gimple_regimplify_operands): Ditto. * gimplify.cc (num_occurrences): Moved from cfgexpand.cc. (num_alternatives): Ditto. (gimplify_asm_expr): Deal with hard register constraints. * stmt.cc (eliminable_regno_p): New helper. (hardreg_ok_p): Perform a similar check as done in make_decl_rtl(). (parse_output_constraint): Add parameter for gimplify_reg_info and validate hard register constrained operands. (parse_input_constraint): Ditto. * stmt.h (class gimplify_reg_info): Forward declaration. (parse_output_constraint): Add parameter. (parse_input_constraint): Ditto. * tree-ssa-operands.cc (operands_scanner::get_asm_stmt_operands): Pass null pointer to parse_{input,output}_constraint(). * tree-ssa-structalias.cc (find_func_aliases): Pass null pointer to parse_{input,output}_constraint(). * varasm.cc (assemble_asm): Pass null pointer to parse_{input,output}_constraint(). * gimplify_reg_info.h: New file. gcc/cp/ChangeLog: * semantics.cc (finish_asm_stmt): Pass null pointer to parse_{input,output}_constraint(). gcc/d/ChangeLog: * toir.cc: Pass null pointer to parse_{input,output}_constraint(). gcc/testsuite/ChangeLog: * gcc.dg/pr87600-2.c: Split test into two files since errors for functions test{0,1} are thrown during expand, and for test{2,3} during gimplification. * lib/scanasm.exp: On s390, skip lines beginning with #. * gcc.dg/asm-hard-reg-error-1.c: New test. * gcc.dg/asm-hard-reg-error-2.c: New test. * gcc.dg/asm-hard-reg-error-3.c: New test. * gcc.dg/asm-hard-reg-error-4.c: New test. * gcc.dg/asm-hard-reg-error-5.c: New test. * gcc.dg/pr87600-3.c: New test. * gcc.target/aarch64/asm-hard-reg-2.c: New test. * gcc.target/s390/asm-hard-reg-7.c: New test.
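A minimal sketch of the kind of conflict the new gimplification-time checks reject (register names are illustrative and target-dependent, following the {r0}/{r1}/{r2} examples above):

    int
    conflicting_outputs (void)
    {
      int a, b;
      /* Both outputs demand hard register r0 in the same alternative,
         so r0 would be used more than once over all outputs.  */
      __asm__ ("" : "={r0}" (a), "={r0}" (b));
      return a + b;
    }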
2025-07-21Hard register constraintsStefan Schulze Frielinghaus31-7/+1388
Implement hard register constraints of the form {regname} where regname must be a valid register name for the target. Such constraints may be used in asm statements as a replacement for register asm and in machine descriptions. A more verbose description is given in extend.texi. It is expected and desired that optimizations coalesce multiple pseudos into one whenever possible. However, in case of hard register constraints we may have to undo this and introduce copies since otherwise we would constrain a single pseudo to multiple hard registers. This is done prior to RA during asmcons in match_asm_constraints_2(). While IRA tries to reduce live ranges, it also replaces some register-register moves. That in turn might undo those copies of a pseudo which we just introduced during asmcons. Thus, check in decrease_live_ranges_number() via valid_replacement_for_asm_input_p() whether it is valid to perform a replacement. The remainder of the patch mostly deals with parsing and decoding hard register constraints. The actual work is done by LRA in process_alt_operands() where a register filter, according to the constraint, is installed. For the sake of "reviewability" and in order to show the beauty of LRA, error handling (which gets pretty involved) is spread out into a subsequent patch. Limitation ---------- Currently, a fixed register cannot be used as a hard register constraint. For example, loading the stack pointer on x86_64 via void * foo (void) { void *y; __asm__ ("" : "={rsp}" (y)); return y; } leads to an error. Asm Adjust Hook --------------- The following targets implement TARGET_MD_ASM_ADJUST: - aarch64 - arm - avr - cris - i386 - mn10300 - nds32 - pdp11 - rs6000 - s390 - vax Most of them only add the CC register to the list of clobbered registers. However, cris, i386, and s390 need some minor adjustment. gcc/ChangeLog: * config/cris/cris.cc (cris_md_asm_adjust): Deal with hard register constraint. * config/i386/i386.cc (map_egpr_constraints): Ditto. * config/s390/s390.cc (f_constraint_p): Ditto. * doc/extend.texi: Document hard register constraints. * doc/md.texi: Ditto. * function.cc (match_asm_constraints_2): Have a unique pseudo for each operand with a hard register constraint. (pass_match_asm_constraints::execute): Calling into new helper match_asm_constraints_2(). * genoutput.cc (mdep_constraint_len): Return the length of a hard register constraint. * genpreds.cc (write_insn_constraint_len): Support hard register constraints for insn_constraint_len(). * ira.cc (valid_replacement_for_asm_input_p_1): New helper. (valid_replacement_for_asm_input_p): New helper. (decrease_live_ranges_number): Similar to match_asm_constraints_2() ensure that each operand has a unique pseudo if constrained by a hard register. * lra-constraints.cc (process_alt_operands): Install hard register filter according to constraint. * recog.cc (asm_operand_ok): Accept register type for hard register constrained asm operands. (constrain_operands): Validate hard register constraints. * stmt.cc (decode_hard_reg_constraint): Parse a hard register constraint into the corresponding register number or bail out. (parse_output_constraint): Parse hard register constraint and set *ALLOWS_REG. (parse_input_constraint): Ditto. * stmt.h (decode_hard_reg_constraint): Declaration of new function. gcc/testsuite/ChangeLog: * gcc.dg/asm-hard-reg-1.c: New test. * gcc.dg/asm-hard-reg-2.c: New test. * gcc.dg/asm-hard-reg-3.c: New test. * gcc.dg/asm-hard-reg-4.c: New test. * gcc.dg/asm-hard-reg-5.c: New test.
* gcc.dg/asm-hard-reg-6.c: New test. * gcc.dg/asm-hard-reg-7.c: New test. * gcc.dg/asm-hard-reg-8.c: New test. * gcc.target/aarch64/asm-hard-reg-1.c: New test. * gcc.target/i386/asm-hard-reg-1.c: New test. * gcc.target/i386/asm-hard-reg-2.c: New test. * gcc.target/s390/asm-hard-reg-1.c: New test. * gcc.target/s390/asm-hard-reg-2.c: New test. * gcc.target/s390/asm-hard-reg-3.c: New test. * gcc.target/s390/asm-hard-reg-4.c: New test. * gcc.target/s390/asm-hard-reg-5.c: New test. * gcc.target/s390/asm-hard-reg-6.c: New test. * gcc.target/s390/asm-hard-reg-longdouble.h: New test.
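For comparison, a minimal sketch of how the new constraint syntax can replace local register asm (the register names "r4" and "r5" are assumed to be valid on the target; they are illustrative only):

    /* Old style: tie operands to hard registers via register asm variables.  */
    int
    old_style (int x)
    {
      register int in  __asm__ ("r4") = x;
      register int out __asm__ ("r5");
      __asm__ ("" : "=r" (out) : "r" (in));
      return out;
    }

    /* New style: name the hard registers directly in the constraints.  */
    int
    new_style (int x)
    {
      int out;
      __asm__ ("" : "={r5}" (out) : "{r4}" (x));
      return out;
    }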
2025-07-21Remove bogus minimum VF computeRichard Biener5-15/+19
The following removes the minimum VF compute from dataref analysis which does not take into account SLP at all, leaving the testcase vectorized with V2SImode instead of V4SImode on x86. With SLP the only minimum VF we can compute this early is 1. * tree-vectorizer.h (vect_analyze_data_refs): Remove min_vf output. * tree-vect-data-refs.cc (vect_analyze_data_refs): Likewise. * tree-vect-loop.cc (vect_analyze_loop_2): Remove early out based on bogus min_vf. * tree-vect-slp.cc (vect_slp_analyze_bb_1): Adjust. * gcc.dg/vect/vect-127.c: New testcase.
2025-07-21Fortran: Allow for iterator substitution in array constructors [PR119106]Andre Vehreschild2-2/+20
PR fortran/119106 gcc/fortran/ChangeLog: * expr.cc (simplify_constructor): Do not simplify constants. (gfc_simplify_expr): Continue to simplify expression when an iterator is present. gcc/testsuite/ChangeLog: * gfortran.dg/array_constructor_58.f90: New test.
2025-07-21fortran: Factor array descriptor referencesMikael Morin1-1/+143
Save subexpressions of array descriptor references to variables, so that all the expressions using the descriptor as base object benefit from a simplified reference using the variables. This limits the size of the expressions generated in the original tree dump, easing analysis of the code involving those expressions. This is especially helpful with chains of array references where each array in the chain uses a descriptor. After optimizations, the effect of the change shouldn't be visible in the vast majority of cases. In rare cases it seems to permit a couple more jump threadings. gcc/fortran/ChangeLog: * trans-array.cc (gfc_conv_ss_descriptor): Move the descriptor expression initialisation... (set_factored_descriptor_value): ... to this new function. Before initialisation, walk the reference expression passed as argument and save some of its subexpressions to a variable. (substitute_t): New struct. (maybe_substitute_expr): New function. (substitute_subexpr_in_expr): New function.
2025-07-21RISC-V: Add testcase for unsigned scalar SAT_ADD form 8 and form 9panciyan17-0/+212
This patch adds testcase for form8 and form9, as shown below: T __attribute__((noinline)) \ sat_u_add_##T##_fmt_8(T x, T y) \ { \ return x <= (T)(x + y) ? (x + y) : -1; \ } T __attribute__((noinline)) \ sat_u_add_##T##_fmt_9(T x, T y) \ { \ return x > (T)(x + y) ? -1 : (x + y); \ } Passed the rv64gc regression test. Signed-off-by: Ciyan Pan <panciyan@eswincomputing.com> gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_arith.h: Unsigned testcase form8 form9. * gcc.target/riscv/sat/sat_u_add-8-u16.c: New test. * gcc.target/riscv/sat/sat_u_add-8-u32.c: New test. * gcc.target/riscv/sat/sat_u_add-8-u64.c: New test. * gcc.target/riscv/sat/sat_u_add-8-u8.c: New test. * gcc.target/riscv/sat/sat_u_add-9-u16.c: New test. * gcc.target/riscv/sat/sat_u_add-9-u32.c: New test. * gcc.target/riscv/sat/sat_u_add-9-u64.c: New test. * gcc.target/riscv/sat/sat_u_add-9-u8.c: New test. * gcc.target/riscv/sat/sat_u_add-run-8-u16.c: New test. * gcc.target/riscv/sat/sat_u_add-run-8-u32.c: New test. * gcc.target/riscv/sat/sat_u_add-run-8-u64.c: New test. * gcc.target/riscv/sat/sat_u_add-run-8-u8.c: New test. * gcc.target/riscv/sat/sat_u_add-run-9-u16.c: New test. * gcc.target/riscv/sat/sat_u_add-run-9-u32.c: New test. * gcc.target/riscv/sat/sat_u_add-run-9-u64.c: New test. * gcc.target/riscv/sat/sat_u_add-run-9-u8.c: New test.
2025-07-21Adjust 'libgomp.c++/target-cdtor-{1,2}.C' for 'targetm.cxx.use_aeabi_atexit' [PR119853, PR119854]Thomas Schwinge2-12/+22
Fix-up for commit aafe942227baf8c2bcd4cac2cb150e49a4b895a9 "GCN, nvptx offloading: Host/device compatibility: Itanium C++ ABI, DSO Object Destruction API [PR119853, PR119854]": we need to adjust for 'targetm.cxx.use_aeabi_atexit': gcc/config/arm/arm.cc:#define TARGET_CXX_USE_AEABI_ATEXIT arm_cxx_use_aeabi_atexit gcc/config/arm/arm.cc:/* The EABI says __aeabi_atexit should be used to register static gcc/config/arm/arm.cc- destructors. */ gcc/config/arm/arm.cc- gcc/config/arm/arm.cc-static bool gcc/config/arm/arm.cc:arm_cxx_use_aeabi_atexit (void) gcc/config/arm/arm.cc-{ gcc/config/arm/arm.cc- return TARGET_AAPCS_BASED; gcc/config/arm/arm.cc-} ..., which 'gcc/cp/decl.cc:get_atexit_node' then acts on: call '__aeabi_atexit' instead of '__cxa_atexit', and swap two arguments. PR target/119853 PR target/119854 libgomp/ * testsuite/libgomp.c++/target-cdtor-1.C: Adjust for 'targetm.cxx.use_aeabi_atexit'. * testsuite/libgomp.c++/target-cdtor-2.C: Likewise.
2025-07-21Daily bump.GCC Administrator4-1/+36
2025-07-20libstdc++: Export std::dextents from std.cc.in [PR121174]Jakub Jelinek1-1/+4
r16-442 implemented both std::extents and std::dextents (and perhaps other stuff), but exported only std::extents. I went through https://eel.is/c++draft/mdspan.syn and I think std::dextents is the only one implemented but not exported. The following patch exports it, and additionally appends some further entities to the FIXME list, all of which seem to be unimplemented yet. 2025-07-20 Jakub Jelinek <jakub@redhat.com> PR libstdc++/121174 * src/c++23/std.cc.in (std::dextents): Export. Add to FIXME comments other not yet implemented nor exported <mdspan> entities.
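For reference, a small usage sketch of the now-exported alias (std::dextents<IndexType, Rank> is std::extents with every extent dynamic); the example uses the header rather than import std:

    #include <mdspan>
    #include <cstddef>
    #include <cassert>

    int main()
    {
      std::dextents<std::size_t, 2> ext(3, 4);   // two run-time extents
      assert(ext.extent(0) == 3 && ext.extent(1) == 4);
    }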
2025-07-19testsuite: Fix afdo-crossmodule-1b.c [PR120859]Andrew Pinski1-0/+5
The problem here is that the testcase is part of another testcase but dg-final does not work across source files, so it needs its own dg-* headers that match up with afdo-crossmodule-1.c. Pushed as preapproved in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859#c4 . PR testsuite/120859 gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/afdo-crossmodule-1b.c: Add some dg-* commands like what is in afdo-crossmodule-1.c Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-07-20RISC-V: Refine the test case for vector avg_floor and avg_ceil [NFC]Pan Li14-14/+14
The previous test case doesn't leverage the right test helper macro; it should be DEF_AVG_0_WRAP instead of DEF_AVG_0. We prefer the test function name to be test_avg_floor_int64_t_int32_t_0 instead of test_avg_floor_WT_NT_0 for DEF_AVG_0(WT, NT). The below test suites are passed for this patch. * The rv64gcv fully regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i32.c: Leverage DEF_AVG_0_WRAP to generate the correct func name. * gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_floor-1-i32-from-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_floor-1-i64-from-i128.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_ceil-1-i32-from-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/avg_ceil-1-i64-from-i128.c: Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-20RISC-V: Add ashiftrt operand 2 for vector avg_floor and avg_ceilPan Li1-2/+4
According to the semantics of avg_floor and avg_ceil as below: floor: op0 = (narrow) (((wide) op1 + (wide) op2) >> 1); ceil: op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1); That is, we should have (const_int 1) as the op2 of the ashiftrt, but it seems to be missing. Thus, add it back to align with the definition. The below test suites are passed for this patch. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/autovec.md: Add (const_int 1) as the op2 of ashiftrt. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-20Daily bump.GCC Administrator3-1/+57
2025-07-19pru: Use signed HOST_WIDE_INT for handling ctable addressesDimitar Dimitrov4-13/+38
The ctable base address for SBCO/LBCO load/store patterns was incorrectly stored as an unsigned integer. That prevented matching addresses with bit 31 set, because a const_int RTL expression is expected to be sign-extended. Fix by using sign-extended 32-bit values for ctable base addresses. PR target/121124 gcc/ChangeLog: * config/pru/pru-pragma.cc (pru_pragma_ctable_entry): Handle the ctable base address as signed 32-bit value, and sign-extend to HOST_WIDE_INT. * config/pru/pru-protos.h (struct pru_ctable_entry): Store the ctable base address as signed. (pru_get_ctable_exact_base_index): Pass base address as signed. (pru_get_ctable_base_index): Ditto. (pru_get_ctable_base_offset): Ditto. * config/pru/pru.cc (pru_get_ctable_exact_base_index): Ditto. (pru_get_ctable_base_index): Ditto. (pru_get_ctable_base_offset): Ditto. (pru_print_operand_address): Ditto. gcc/testsuite/ChangeLog: * gcc.target/pru/pragma-ctable_entry-2.c: New test.
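The core idea of the fix, as a standalone sketch (names are illustrative, not the actual pru.cc code): treat the 32-bit base as signed before widening, so it matches the sign-extended const_int representation:

    #include <cstdint>

    /* 0x80001000 stored unsigned widens to 0x0000000080001000 and never
       matches the sign-extended const_int; going through int32_t widens
       it to 0xffffffff80001000 instead (like HOST_WIDE_INT in GCC).  */
    int64_t
    widen_ctable_base (uint32_t raw)
    {
      int32_t base32 = (int32_t) raw;   /* reinterpret as signed 32-bit */
      return (int64_t) base32;          /* sign-extend to 64 bits */
    }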
2025-07-19[PATCH] RISC-V: Vector-scalar widening negate-multiply-(subtract-)accumulate [PR119100]Paul-Antoine Arras14-5/+151
This pattern enables the combine pass (or late-combine, depending on the case) to merge a float_extend'ed vec_duplicate into a (possibly negated) minus-mult RTL instruction. Before this patch, we have six instructions, e.g.: vsetivli zero,4,e32,m1,ta,ma fcvt.s.h fa5,fa5 vfmv.v.f v4,fa5 vfwcvt.f.f.v v1,v3 vsetvli zero,zero,e32,m1,ta,ma vfnmadd.vv v1,v4,v2 After, we get only one: vfwnmacc.vf v1,fa5,v2 PR target/119100 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfwnmacc_vf_<mode>): New pattern. (*vfwnmsac_vf_<mode>): New pattern. * config/riscv/riscv.cc (get_vector_binary_rtx_cost): Add support for a vec_duplicate in a neg. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwnmacc and vfwnmsac. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f32.c: New test.
2025-07-19[PATCH] RISC-V: prevent NULL_RTX dereference in riscv_macro_fusion_pair_p ()Artemiy Volkov1-2/+2
> A number of folks have had their fingers in this code and it's going to take > a few submissions to do everything we want to do. > > This patch is primarily concerned with avoiding signaling that fusion can > occur in cases where it obviously should not be signaling fusion. Hi Jeff, With this change, we're liable to ICE whenever prev_set or curr_set are NULL_RTX. For a fix, how about something like the below? Thanks, Artemiy Introduced in r16-1984-g83d19b5d842dad, initializers for {prev,curr}_dest_regno can cause an ICE if the respective insn isn't a single set. Rectify this by inserting a NULL_RTX check before using {prev,curr}_set. Regtested on riscv32. gcc/ * config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Protect from a NULL PREV_SET or CURR_SET.
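The shape of the guard being proposed is roughly the following sketch (not the exact hunk; the variable names are taken from the description above):

    /* Bail out early if either insn is not a single set, so the
       single_set results are not dereferenced when they are NULL_RTX.  */
    if (!prev_set || !curr_set)
      return false;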
2025-07-19AVR: Fuse get_insns with end_sequence.Georg-Johann Lay1-4/+2
gcc/ * config/avr/avr-passes.cc (avr_optimize_casesi): Fuse get_insns() with end_sequence().
2025-07-19Daily bump.GCC Administrator8-1/+258
2025-07-19libstdc++: Only define __any_input_iterator for C++20Jonathan Wakely1-0/+2
Currently this new concept will get defined for -std=c++17 -fconcepts but as it uses std::input_iterator, which is new in C++20, that won't work. Guard it with __cpp_lib_concepts as well as __cpp_concepts. libstdc++-v3/ChangeLog: * include/bits/stl_iterator_base_types.h (__any_input_iterator): Only define when __cpp_lib_concepts is defined.
2025-07-18PR modula2/121164 Modula 2 build failureGaius Mulley1-3/+3
This patch fixes the 2nd parameter name mismatch in ARRAYOFCHAR.mod. gcc/m2/ChangeLog: PR modula2/121164 * gm2-libs/ARRAYOFCHAR.mod (Write): Rename 2nd parameter name a to str. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2025-07-18Fortran: fix bogus runtime error with optional procedure argument [PR121145]Harald Anlauf2-1/+48
PR fortran/121145 gcc/fortran/ChangeLog: * trans-expr.cc (gfc_conv_procedure_call): Do not create pointer check for proc-pointer actual passed to optional dummy. gcc/testsuite/ChangeLog: * gfortran.dg/pointer_check_15.f90: New test.
2025-07-18testsuite/vec: Fix vect-reduc-cond-[12].c for non vect_condition targets [PR121153]Andrew Pinski2-0/+2
I missed this when I added the two testcases vect-reduc-cond-[12].c. These testcases require support for vectorization of `a ? b : c`, which some targets (e.g. sparc) do not support. Pushed as obvious after a quick test. PR testsuite/121153 gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-reduc-cond-1.c: Require vect_condition. * gcc.dg/vect/vect-reduc-cond-2.c: Likewise. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-07-18libstdc++: Remove Paolo from list of people to contact about contributingJonathan Wakely2-6/+2
Paolo has not been active for some time. libstdc++-v3/ChangeLog: * doc/xml/manual/appendix_contributing.xml: Remove Paolo from list of maintainers to contact about contributing. * doc/html/manual/appendix_contributing.html: Regenerate.
2025-07-18libstdc++: Document new generated headerJonathan Wakely2-0/+18
libstdc++-v3/ChangeLog: * doc/xml/manual/build_hacking.xml: Document that windows_zones-map.h is a generated file. * doc/html/manual/appendix_porting.html: Regenerate.
2025-07-18RISC-V: Support RVVDImode for avg3_ceil auto vectPan Li4-1/+42
Like the avg3_floor pattern, avg3_ceil has a similar issue: it lacks RVV DImode support. Thus, this patch supports DImode via the standard name, with the iterator V_VLSI_D. The below test suites are passed for this patch series. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/autovec.md (avg<mode>3_ceil): Add new pattern of avg3_ceil for RVV DImode. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/avg_data.h: Adjust the test data. * gcc.target/riscv/rvv/autovec/avg_ceil-1-i64-from-i128.c: New test. * gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i64-from-i128.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-18libstdc++: Qualify addressof calls in inplace_vector [PR119137]Tomasz Kamiński1-2/+2
PR libstdc++/119137 libstdc++-v3/ChangeLog: * include/std/inplace_vector (inplace_vector::operator=): Qualify call to std::addressof.
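Why the qualification matters, as a rough illustration (the namespace and type below are made up): an unqualified addressof call in generic code can also pick up user-namespace overloads via ADL, making the call ambiguous or selecting an unintended overload:

    #include <memory>

    namespace user
    {
      struct Elem { };
      /* An unqualified addressof(e) in generic code would also find this
         overload via ADL.  */
      template<typename T> void addressof(T&) = delete;
    }

    user::Elem *
    get_address (user::Elem &e)
    {
      return std::addressof (e);   // qualified: not subject to ADL
    }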
2025-07-18libstdc++: Fixed localized empty-spec formatting for months/weekdays [PR121154]Tomasz Kamiński3-52/+96
Previously, for localized output, if the _M_debug option was set, _M_check_ok completed successfully and _M_locale_fmt was called for months/weekdays that are !ok(). This patch lifts the debug checks from each conversion function into _M_check_ok, which in the case of !ok() values returns a string_view containing the kind of calendar data, to be included after the "is not a valid" string. The localized output (_M_locale_fmt) is not used if that string is non-empty. Emitting this message is now handled in _M_format_to, further reducing each specifier function. To handle weekday (%a,%A) and month (%b,%B), _M_check_ok now accepts a mutable reference to the conversion specifier, and updates it to the corresponding numeric value (%w, %m). Extra care needs to be taken to handle a month(0) that needs to be printed as a single digit in the debug format. Finally, _M_time_point is replaced with the _M_needs_ok_check member, which indicates whether the input contains any user-supplied values that are checked for being ok() and are referenced in the chrono-specs. PR libstdc++/121154 libstdc++-v3/ChangeLog: * include/bits/chrono_io.h (_ChronoSpec::_M_time_point): Remove. (_ChronoSpec::_M_needs_ok_check): Define. (__formatter_chrono::_M_parse): Set _M_needs_ok_check. (__formatter_chrono::_M_check_ok): Check values also for debug mode, and return __string_view. (__formatter_chrono::_M_format_to): Handle results of _M_check_ok. (__formatter_chrono::_M_wi, __formatter_chrono::_M_a_A) (__formatter_chrono::_M_b_B, __formatter_chrono::_M_C_y_Y) (__formatter_chrono::_M_d_e, __formatter_chrono::_M_F): Removed handling of _M_debug. (__formatter_chrono::__M_m): Print zero unpadded in _M_debug mode. (__formatter_duration::_S_spec_for): Remove _M_time_point reference. (__formatter_duration::_M_parse): Override _M_needs_ok_check. * testsuite/std/time/month/io.cc: Test for localized !ok() values. * testsuite/std/time/weekday/io.cc: Test for localized !ok() values.
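For context, a small illustration of the output form involved (the values are illustrative): formatting calendar types that are !ok() yields the "is not a valid" form, which with this fix is also used consistently for localized output:

    #include <chrono>
    #include <format>
    #include <iostream>

    int main()
    {
      using namespace std::chrono;
      // month{0} and weekday{8} are !ok(); formatting them prints the raw
      // value followed by "is not a valid month" / "is not a valid weekday".
      std::cout << std::format("{}", month{0}) << '\n';
      std::cout << std::format("{}", weekday{8}) << '\n';
    }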
2025-07-18amdgcn, libgomp: Remove unused variable (PR121156)Andrew Stubbs1-6/+2
There's a new compiler warning breaking the build. This fixes it. The variable appears to be genuinely vestigial. libgomp/ChangeLog: PR target/121156 * config/gcn/bar.c (gomp_team_barrier_wait_end): Remove unused "generation" variable. (gomp_team_barrier_wait_cancel_end): Likewise.
2025-07-18tree-sra: Fix grp_covered flag computation when totally scalarizing (PR117423)Martin Jambor2-2/+56
The testcase of PR 117423 shows a flaw in the fancy way we do "total scalarization" in SRA now. We use the types encountered in the function body and not in the type declaration (allowing us to totally scalarize when only one union field is ever used, since we effectively "skip" the union then) and can accommodate pre-existing accesses that happen to fall into padding. In this case, we skipped the union (bypassing the totally_scalarizable_type_p check) and the access falling into the "padding" is an aggregate and so not a candidate for SRA, but it actually contains data. Arguably total scalarization should just bail out when it encounters this situation (but I decided not to depend on this, mainly because we'd need to detect all cases where we eventually cannot scalarize, such as when a scalar access has child accesses). The actual bug, however, is that the check whether all data in an aggregate is indeed covered by replacements just assumes that this is always the case when total scalarization triggers, which may not hold in cases like this one - and perhaps more. This patch fixes the bug by just assuming that all padding is taken care of when total scalarization triggered, not that every access was actually scalarized. gcc/ChangeLog: 2025-07-17 Martin Jambor <mjambor@suse.cz> PR tree-optimization/117423 * tree-sra.cc (analyze_access_subtree): Fix computation of grp_covered flag. gcc/testsuite/ChangeLog: 2025-07-17 Martin Jambor <mjambor@suse.cz> PR tree-optimization/117423 * gcc.dg/tree-ssa/pr117423.c: New test.
2025-07-18tree-optimization/121126 - properly verify live LC PHIsRichard Biener2-1/+32
The following makes sure we analyze live LC PHIs not part of a double reduction. PR tree-optimization/121126 * tree-vect-stmts.cc (vect_analyze_stmt): Analyze the live lane extract for LC PHIs that are vect_internal_def. * gcc.dg/vect/pr121126.c: New testcase.
2025-07-18libstdc++: Fix hash<__int128> test for x32 [PR121150]Jonathan Wakely1-2/+2
I incorrectly assumed that all targets that support __int128 use the LP64 ABI, so size_t is a 64-bit type. But x32 uses ILP32 and still supports __int128 (because it's an ILP32 target on 64-bit hardware). Add casts to the tests so that we get the correct expected values using size_t type. libstdc++-v3/ChangeLog: PR libstdc++/121150 * testsuite/20_util/hash/int128.cc: Cast expected values to size_t.
2025-07-18libstdc++: Implement reverse iteration for _Utf_viewJonathan Wakely2-30/+183
This implements the missing functions in _Utf_iterator to support reverse iteration. All existing tests pass when the view is reversed, so that the same code units are seen when iterating forwards or backwards. libstdc++-v3/ChangeLog: * include/bits/unicode.h (_Utf_iterator::operator--): Reorder conditions and update position after reading a code unit. (_Utf_iterator::_M_read_reverse): Define. (_Utf_iterator::_M_read_utf8): Return extracted code point. (_Utf_iterator::_M_read_reverse_utf8): Define. (_Utf_iterator::_M_read_reverse_utf16): Define. (_Utf_iterator::_M_read_reverse_utf32): Define. * testsuite/ext/unicode/view.cc: Add checks for reversed views and reverse iteration. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-07-18libstdc++: Optimize _Utf_iterator for sizeJonathan Wakely1-6/+6
This reorders the data members of _Utf_iterator to avoid padding bytes between members due to alignment requirements. For x86_64 the previous layout had padding after _M_buf and after _M_to_increment for the common case where the iterators and sentinel types are pointers, so the size shrinks from 40 bytes to 32 bytes. (For i686 there's no change, it's still 20 bytes). We could compress the three uint8_t members into one byte by using bit-fields: uint8_t _M_buf_index : 2; // [0,3] uint8_t _M_buf_last : 3; // [0,4] uint8_t _M_to_increment : 3; // [0,4] But there doesn't seem to be any point, because it will just be slower to access them and there will be tail padding so the size isn't any smaller. We could also reduce _M_buf_last and _M_to_increment to 2 bits because the 0 value is only used for a default constructed iterator, and we don't actually care about the values in that case. Again, this doesn't seem worth doing. libstdc++-v3/ChangeLog: * include/bits/unicode.h (_Utf_iterator): Reorder data members to be more compact. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-07-18Remove non-SLP path from vectorizable_live_operationRichard Biener1-75/+40
This removes paths gated by !slp_node and propagates out ncopies == 1, thereby reducing the vectorizable_live_operation_1 API. * tree-vect-loop.cc (vectorizable_live_operation_1): Remove stmt_info and ncopies parameters. Remove !slp_node paths. (vectorizable_live_operation): Remove !slp_node paths.
2025-07-18tree-optimization/120924 - up --param uninit-max-chain-lenRichard Biener2-1/+35
The PR shows that the uninit analysis limits are set too low in cases where we lower switches to ifs, as happens on s390x for a Linux kernel TU. This causes false positive uninit diagnostics as we abort the attempt to prove that a value is initialized on all paths. The new testcase alone would only require upping the limit to 9. PR tree-optimization/120924 * params.opt (uninit-max-chain-len): Up from 8 to 12. * gcc.dg/uninit-pr120924.c: New testcase.
2025-07-18ada: Spurious actual/formal matching check failure for formal derived type.Steve Baird1-2/+15
In some cases involving a generic with two formal parameters, a formal package and a formal derived type that is derived from an interface type declared in the formal package, a legal instantiation of that generic is rejected with a message incorrectly stating that the second actual parameter does not implement the required interface. gcc/ada/ChangeLog: * sem_ch12.adb (Validate_Derived_Type_Instance): Cope with the case where the ancestor type for a formal derived type is declared in an earlier formal package but Get_Instance_Of does not return the corresponding type from the corresponding actual package.
2025-07-18ada: Back out change to Tbuild.Unchecked_Convert_ToBob Duff1-3/+3
...because it breaks one test that uses --RTS=light. "Is_Composite_Type" is needed; "not Is_Scalar_Type" was wrong. gcc/ada/ChangeLog: * tbuild.adb (Unchecked_Convert_To): Back out change.