path: root/gcc/testsuite
2025-08-26  [committed] RISC-V Testsuite hygiene  (Jeff Law, 4 files, -18/+9)
Shreya and I were working through some testsuite failures and noticed that many of the current failures on the pioneer were just silly. We have tests that expect to see full architecture strings in their expected output when the bulk (some might say all) of the architecture string is irrelevant. Worse yet, we'd have different matching lines, i.e. we'd have one that would match rv64gc_blah_blah and another for rv64imfa_blah_blah. Judicious wildcard usage cleans this up considerably. This fixes ~80 failures in the riscv.exp testsuite. Pushing to the trunk as it's happy on the pioneer native, riscv32-elf and riscv64-elf. gcc/testsuite/ * gcc.target/riscv/arch-25.c: Use wildcards to simplify/eliminate dg-error directives. * gcc.target/riscv/arch-ss-2.c: Similarly. * gcc.target/riscv/arch-zilsd-2.c: Similarly. * gcc.target/riscv/arch-zilsd-3.c: Similarly.
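As a hypothetical sketch of the wildcard approach (the quoted message fragment below is made up; the real directives and diagnostics in arch-25.c and friends differ), a dg-error only needs to pin down the extension under test rather than the full architecture string, so one pattern covers every base ISA configuration:

    /* { dg-error "extension 'zilsd'" "" { target *-*-* } 0 } */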
2025-08-26  testsuite: restrict ctf-array-7 test to 64-bit targets [PR121411]  (David Faust, 1 file, -2/+3)
The test fails to compile on 32-bit targets because the arrays are too large. Restrict to targets where the array index type is 64-bits. Also note the relevant PR in the test comment. PR debug/121411 gcc/testsuite/ * gcc.dg/debug/ctf/ctf-array-7.c: Restrict to lp64,llp64 targets.
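One conventional way to express that kind of restriction in a test's DejaGnu header is a target selector on the dg-do line (a sketch; the exact directives and options used in ctf-array-7.c may differ):

    /* { dg-do compile { target { lp64 || llp64 } } } */
    /* { dg-options "-gctf" } */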
2025-08-26  testsuite: arm: Disable sched2 and sched3 in unsigned-extend-2.c  (Torbjörn SVENSSON, 1 file, -9/+4)
Disable sched2 and sched3 to only have one order of instructions to consider. gcc/testsuite/ChangeLog: * gcc.target/arm/unsigned-extend-2.c: Disable sched2 and sched3 and update function body to match. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2025-08-26  Enable unroll in the vectorizer when there's reduction for FMA/DOT_PROD_EXPR/SAD_EXPR  (liuhongt, 6 files, -0/+73)
The patch is trying to unroll the vectorized loop when there are FMA/DOT_PROD_EXPR/SAD_EXPR reductions; it will break the cross-iteration dependence and enable more parallelism (since the vectorizer will also enable partial sums). When there's gather/scatter or scalarization in the loop, don't do the unroll since the performance bottleneck is not at the reduction. The unroll factor for FMA/DOT_PROD_EXPR/SAD_EXPR is set according to CEIL ((latency * throughput), num_of_reduction), i.e. for FMA, latency is 4 and throughput is 2, so if there's 1 FMA for the reduction then the unroll factor is 2 * 4 / 1 = 8. There's also a vect_unroll_limit; the final suggested_unroll_factor is set as MIN (vect_unroll_limit, 8). The vect_unroll_limit is mainly for register pressure, to avoid too many spills. Ideally, all instructions in the vectorized loop should be used to determine the unroll factor with their (latency * throughput) / number, but that would be too much for this patch, and may just GIGO, so the patch only considers 3 kinds of instructions: FMA, DOT_PROD_EXPR, SAD_EXPR. Note when DOT_PROD_EXPR is not natively supported, m_num_reduction += 3 * count, which almost prevents unrolling. There's a performance boost for a simple benchmark with a DOT_PROD_EXPR/FMA chain, and a slight improvement in SPEC2017 performance. gcc/ChangeLog: * config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs): Add new members m_num_reduc, m_prefer_unroll. (ix86_vector_costs::add_stmt_cost): Set m_prefer_unroll and m_num_reduc. (ix86_vector_costs::finish_cost): Determine m_suggested_unroll_vector with consideration of reduc_lat_mult_thr, m_num_reduction and ix86_vect_unroll_limit. * config/i386/i386.h (enum ix86_reduc_unroll_factor): New enum. (processor_costs): Add reduc_lat_mult_thr and vect_unroll_limit. * config/i386/x86-tune-costs.h: Initialize reduc_lat_mult_thr and vect_unroll_limit. * config/i386/i386.opt: Add -param=ix86-vect-unroll-limit. gcc/testsuite/ChangeLog: * gcc.target/i386/vect_unroll-1.c: New test. * gcc.target/i386/vect_unroll-2.c: New test. * gcc.target/i386/vect_unroll-3.c: New test. * gcc.target/i386/vect_unroll-4.c: New test. * gcc.target/i386/vect_unroll-5.c: New test.
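For illustration (this is not one of the new vect_unroll-*.c tests), the classic widening dot product below is recognized as a DOT_PROD_EXPR reduction; unrolling the vectorized loop lets it accumulate into several partial sums instead of one serial dependence chain on sum:

    int dot (const short *a, const short *b, int n)
    {
      int sum = 0;
      for (int i = 0; i < n; i++)
        sum += a[i] * b[i];   /* widening multiply-accumulate reduction */
      return sum;
    }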
2025-08-26  [PATCH] RISC-V: Add pattern for reverse floating-point divide  (Paul-Antoine Arras, 17 files, -6/+288)
This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a div RTL instruction. The vec_duplicate is the dividend operand. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfdiv.vv v1,v2,v1 After, we get only one: vfrdiv.vf v1,v1,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfrdiv_vf_<mode>): Add new pattern to combine vec_duplicate + vfdiv.vv into vfrdiv.vf. * config/riscv/vector.md (@pred_<optab><mode>_reverse_scalar): Allow VLS modes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfrdiv. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add support for reverse variants. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for reverse variants. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f64.c: New test.
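The source-level shape that produces this combination is roughly the following (an illustrative sketch, not the literal testsuite input): the scalar is the dividend, so the vectorizer materializes it with a vec_duplicate feeding the division, which can now collapse into vfrdiv.vf:

    void rdiv (double *restrict out, const double *restrict in, double x, int n)
    {
      for (int i = 0; i < n; i++)
        out[i] = x / in[i];   /* scalar dividend, vector divisor */
    }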
2025-08-26  AArch64: extend cost model to cost outer loop vect where the inner loop is invariant [PR121290]  (Tamar Christina, 1 file, -0/+18)
invariant [PR121290] Consider the example: void f (int *restrict x, int *restrict y, int *restrict z, int n) { for (int i = 0; i < 4; ++i) { int res = 0; for (int j = 0; j < 100; ++j) res += y[j] * z[i]; x[i] = res; } } we currently vectorize as f: movi v30.4s, 0 ldr q31, [x2] add x2, x1, 400 .L2: ld1r {v29.4s}, [x1], 4 mla v30.4s, v29.4s, v31.4s cmp x2, x1 bne .L2 str q30, [x0] ret which is not useful because by doing outer-loop vectorization we're performing less work per iteration than we would had we done inner-loop vectorization and simply unrolled the inner loop. This patch teaches the cost model that if all your leafs are invariant, then adjust the loop cost by * VF, since every vector iteration has at least one lane really just doing 1 scalar. There are a couple of ways we could have solved this, one is to increase the unroll factor to process more iterations of the inner loop. This removes the need for the broadcast, however we don't support unrolling the inner loop within the outer loop. We only support unrolling by increasing the VF, which would affect the outer loop as well as the inner loop. We also don't directly support costing inner-loop vs outer-loop vectorization, and as such we're left trying to predict/steer the cost model ahead of time to what we think should be profitable. This patch attempts to do so using a heuristic which penalizes the outer-loop vectorization. We now cost the loop as note: Cost model analysis: Vector inside of loop cost: 2000 Vector prologue cost: 4 Vector epilogue cost: 0 Scalar iteration cost: 300 Scalar outside cost: 0 Vector outside cost: 4 prologue iterations: 0 epilogue iterations: 0 missed: cost model: the vector iteration cost = 2000 divided by the scalar iteration cost = 300 is greater or equal to the vectorization factor = 4. missed: not vectorized: vectorization not profitable. missed: not vectorized: vector version will never be profitable. missed: Loop costings may not be worthwhile. And subsequently generate: .L5: add w4, w4, w7 ld1w z24.s, p6/z, [x0, #1, mul vl] ld1w z23.s, p6/z, [x0, #2, mul vl] ld1w z22.s, p6/z, [x0, #3, mul vl] ld1w z29.s, p6/z, [x0] mla z26.s, p6/m, z24.s, z30.s add x0, x0, x8 mla z27.s, p6/m, z23.s, z30.s mla z28.s, p6/m, z22.s, z30.s mla z25.s, p6/m, z29.s, z30.s cmp w4, w6 bls .L5 and avoids the load and replicate if it knows it has enough vector pipes to do so. gcc/ChangeLog: PR target/121290 * config/aarch64/aarch64.cc (class aarch64_vector_costs ): Add m_loop_fully_scalar_dup. (aarch64_vector_costs::add_stmt_cost): Detect invariant inner loops. (adjust_body_cost): Adjust final costing if m_loop_fully_scalar_dup. gcc/testsuite/ChangeLog: PR target/121290 * gcc.target/aarch64/pr121290.c: New test.
2025-08-26  [PATCH] RISC-V: Add pattern for vector-scalar single-width floating-point multiply  (Paul-Antoine Arras, 20 files, -4/+340)
multiply This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a mult RTL instruction. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfmul.vv v1,v1,v2 After, we get only one: vfmul.vf v2,v2,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfmul_vf_<mode>): Add new pattern to combine vec_duplicate + vfmul.vv into vfmul.vf. * config/riscv/vector.md (@pred_<optab><mode>_scalar): Allow VLS modes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfmul. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_run.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f64.c: New test. * gcc.target/riscv/rvv/autovec/vls/floating-point-mul-2.c: Adjust scan dump. * gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c: Likewise.
2025-08-26  arm: testsuite: make gcc.target/arm/bics_3.c generate bics again  (Richard Earnshaw, 1 file, -1/+30)
The compiler is getting too smart! But this test is really intended to test that we generate BICS instead of BIC+CMP, so make the test use something that we can't subsequently fold away into a bit manipulation of a store-flag value. I've also added a couple of extra tests, so we now cover both the cases where we fold the result away and where that cannot be done. Also add a test that we don't generate a compare against 0, since that's really part of what this test is covering. gcc/testsuite: * gcc.target/arm/bics_3.c: Add some additional tests that cannot be folded to a bit manipulation.
2025-08-26  tree-optimization/121659 - bogus swap of reduction operands  (Richard Biener, 1 file, -0/+11)
The following addresses a bogus swapping of SLP operands of a reduction operation which gets STMT_VINFO_REDUC_IDX out of sync with the SLP operand order. In fact the most obvious mistake is that we simply swap operands even on the first stmt even when there's no difference in the comparison operators (for == and != at least). But there are more latent issues that I noticed and fixed up in the process. PR tree-optimization/121659 * tree-vect-slp.cc (vect_build_slp_tree_1): Do not allow matching up comparison operators by swapping if that would disturb STMT_VINFO_REDUC_IDX. Make sure to only actually mark operands for swapping when there was a mismatch and we're not processing the first stmt. * gcc.dg/vect/pr121659.c: New testcase.
2025-08-26  i386: Fix up recent changes to use GFNI for rotates/shifts [PR121658]  (Jakub Jelinek, 1 file, -0/+11)
The vgf2p8affineqb_<mode><mask_name> pattern uses "register_operand" predicate for the first input operand, so using "general_operand" for the rotate operand passed to it leads to ICEs, and so does the "nonimmediate_operand" in the <insn>v16qi3 define_expand. The following patch fixes it by using "register_operand" in the former case (that pattern is TARGET_GFNI only) and using force_reg in the latter case (the pattern is TARGET_XOP || TARGET_GFNI and for XOP we can handle MEM operand). The rest of the changes are small formatting tweaks or use of const0_rtx instead of GEN_INT (0). 2025-08-26 Jakub Jelinek <jakub@redhat.com> PR target/121658 * config/i386/sse.md (<insn><mode>3 any_shift): Use const0_rtx instead of GEN_INT (0). (cond_<insn><mode> any_shift): Likewise. Formatting fix. (<insn><mode>3 any_rotate): Use register_operand predicate instead of general_operand for match_operand 1. Use const0_rtx instead of GEN_INT (0). (<insn>v16qi3 any_rotate): Use force_reg on operands[1]. Formatting fix. * config/i386/i386.cc (ix86_shift_rotate_cost): Comment formatting fixes. * gcc.target/i386/pr121658.c: New test.
2025-08-26  Daily bump.  (GCC Administrator, 1 file, -0/+99)
2025-08-26  RISC-V: Add test for vec_duplicate + vmacc.vv unsigned combine with GR2VR cost 0, 1 and 15  (Pan Li, 16 files, -0/+100)
cost 0, 1 and 15 Add asm dump check and run test for vec_duplicate + vmacc.vvm combine to vmacc.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-08-26  RISC-V: Add test for vec_duplicate + vmacc.vv signed combine with GR2VR cost 0, 1 and 15  (Pan Li, 19 files, -0/+538)
0, 1 and 15 Add asm dump check and run test for vec_duplicate + vmacc.vvm combine to vmacc.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check for vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_run.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
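The loops these vx tests exercise look roughly like the multiply-accumulate below (an illustrative sketch; the real kernels come from the vx_ternary.h helpers): one operand is a loop-invariant scalar, and combine can fold its vec_duplicate into vmacc.vx when the GR2VR cost permits.

    #include <stdint.h>

    void macc (int32_t *restrict acc, const int32_t *restrict b, int32_t x, int n)
    {
      for (int i = 0; i < n; i++)
        acc[i] += b[i] * x;   /* scalar multiplicand, otherwise broadcast per iteration */
    }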
2025-08-26  omp-expand: Initialize fd->loop.n2 if needed for the zero iter case [PR121453]  (Jakub Jelinek, 1 file, -0/+18)
When expand_omp_for_init_counts is called from expand_omp_for_generic, zero_iter1_bb is NULL and the code always creates a new bb in which it clears fd->loop.n2 var (if it is a var), because it can dominate code with lastprivate guards that use the var. When called from other places, zero_iter1_bb is non-NULL and so we don't insert the clearing (and can't, because the same bb is used also for the non-zero iterations exit and in that case we need to preserve the iteration count). Clearing is also not necessary when e.g. outermost collapsed loop has constant non-zero number of iterations, in that case we initialize the var to something already earlier. The following patch makes sure to clear it if it hasn't been initialized yet before the first check for zero iterations. 2025-08-26 Jakub Jelinek <jakub@redhat.com> PR middle-end/121453 * omp-expand.cc (expand_omp_for_init_counts): Clear fd->loop.n2 before first zero count check if zero_iter1_bb is non-NULL upon entry and fd->loop.n2 has not been written yet. * gcc.dg/gomp/pr121453.c: New test.
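A minimal sketch of the affected shape (hypothetical, not the contents of pr121453.c): a collapsed loop whose outer count can be zero at run time, combined with lastprivate, which is where the guards that read fd->loop.n2 come from.

    void f (int *out, int n, int m)
    {
      int last = 0;
    #pragma omp parallel for collapse(2) lastprivate(last)
      for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
          last = i * m + j;
      *out = last;   /* with n == 0 or m == 0 the loop body never runs */
    }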
2025-08-25  Add a test for PR tree-optimization/121656  (H.J. Lu, 1 file, -0/+21)
PR tree-optimization/121656 * gcc.dg/pr121656.c: New file. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-25  ctf: avoid overflow for array num elements [PR121411]  (David Faust, 1 file, -0/+23)
CTF array encoding uses uint32 for number of elements. This means there is a hard upper limit on array types which the format can represent. GCC internally was also using a uint32_t for this, which would overflow when translating from DWARF for arrays with more than UINT32_MAX elements. Use an unsigned HOST_WIDE_INT instead to fetch the array bound, and fall back to CTF_K_UNKNOWN if the array cannot be represented in CTF. PR debug/121411 gcc/ * dwarf2ctf.cc (gen_ctf_subrange_type): Use unsigned HWI for array_num_elements. Fallback to CTF_K_UNKNOWN if the array type has too many elements for CTF to represent. gcc/testsuite/ * gcc.dg/debug/ctf/ctf-array-7.c: New test.
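A minimal illustration of the overflow (assuming a 64-bit target and -gctf; the actual ctf-array-7.c is more elaborate): an array whose element count does not fit in 32 bits now falls back to CTF_K_UNKNOWN instead of silently wrapping.

    /* 2^32 + 1 elements: the count overflows CTF's uint32 representation.  */
    char huge[0x100000001UL];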
2025-08-25  Rewrite bool loads for undefined case [PR121279]  (Andrew Pinski, 1 file, -0/+49)
Just like r16-465-gf2bb7ffe84840d8 but this time instead of a VCE there is a full on load from a boolean. This showed up when trying to remove the extra copy in the testcase from the revision mentioned above (pr120122-1.c). So when moving loads from a boolean type from being conditional to non-conditional, the load needs to become a full load and then casted into a bool so that the upper bits are correct. Bitfields loads will always do the truncation so they don't need to be rewritten. Non boolean types always do the truncation too. What we do is wrap the original reference with a VCE which causes the full load and then do a casting to do the truncation. Using fold_build1 with VCE will do the correct thing if there is a secondary VCE and will also fold if this was just a plain MEM_REF so there is no need to handle those 2 cases special either. Changes since v1: * v2: Use VIEW_CONVERT_EXPR instead of doing a manual load. Accept all non mode precision loads rather than just boolean ones. * v3: Move back to checking boolean type. Don't handle BIT_FIELD_REF. Add asserts for IMAG/REAL_PART_EXPR. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/121279 gcc/ChangeLog: * gimple-fold.cc (gimple_needing_rewrite_undefined): Return true for non mode precision boolean loads. (rewrite_to_defined_unconditional): Handle non mode precision loads. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr121279-1.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
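A sketch of the kind of code involved (hypothetical; pr121279-1.c is reduced from something more involved): a load from a _Bool that is only reached conditionally. If such a load is made unconditional, it has to be performed as a full load of the underlying byte followed by a truncation back to 0/1 so that stray upper bits cannot leak into the value.

    _Bool pick (const _Bool *p, int cond)
    {
      _Bool r = 0;
      if (cond)
        r = *p;   /* conditional load from a boolean object */
      return r;
    }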
2025-08-25  c++: Implement C++ CWG3048 - Empty destructuring expansion statements  (Jakub Jelinek, 3 files, -2/+42)
The following patch implements the proposed resolution of https://cplusplus.github.io/CWG/issues/3048.html Instead of rejecting structured binding size it just builds a normal decl rather than structured binding declaration. 2025-08-25 Jakub Jelinek <jakub@redhat.com> * pt.cc (finish_expansion_stmt): Implement C++ CWG3048 - Empty destructuring expansion statements. Don't error for destructuring expansion stmts if sz is 0, don't call fit_decomposition_lang_decl if n is 0 and pass NULL rather than this_decomp to cp_finish_decl. * g++.dg/cpp26/expansion-stmt15.C: Don't expect error on destructuring expansion stmts with structured binding size 0. * g++.dg/cpp26/expansion-stmt21.C: New test. * g++.dg/cpp26/expansion-stmt22.C: New test.
2025-08-25  c++: Check for *jump_target earlier in cxx_bind_parameters_in_call [PR121601]  (Jakub Jelinek, 1 file, -0/+19)
The following testcase ICEs, because the /* Check we aren't dereferencing a null pointer when calling a non-static member function, which is undefined behaviour. */ if (i == 0 && DECL_OBJECT_MEMBER_FUNCTION_P (fun) && integer_zerop (arg) /* But ignore calls from within compiler-generated code, to handle cases like lambda function pointer conversion operator thunks which pass NULL as the 'this' pointer. */ && !(TREE_CODE (t) == CALL_EXPR && CALL_FROM_THUNK_P (t))) { if (!ctx->quiet) error_at (cp_expr_loc_or_input_loc (x), "dereferencing a null pointer"); *non_constant_p = true; } checking is done before testing if (*jump_target). Especially when throws (jump_target), arg can be (and is on this testcase) NULL_TREE, so calling integer_zerop on it ICEs. Fixed by moving the if (*jump_target) test earlier. 2025-08-25 Jakub Jelinek <jakub@redhat.com> PR c++/121601 * constexpr.cc (cxx_bind_parameters_in_call): Move break if *jump_target before the check for null this object pointer. * g++.dg/cpp26/constexpr-eh16.C: New test.
2025-08-25  tree-optimization/121638 - missed SLP discovery of live induction  (Richard Biener, 1 file, -0/+74)
The following fixes a missed SLP discovery of a live induction. Our pattern matching of those fails because of the PR81529 fix which I think was misguided and should now no longer be relevant. So this essentially reverts that fix. I have added a GIMPLE testcase to increase the chance the particular IL is preserved through the future. This shows that how we make some IVs live because of early-break isn't quite correct, so I had to preserve a hack here. Hopefully to be investigated at some point. PR tree-optimization/121638 * tree-vect-stmts.cc (process_use): Do not make induction PHI backedge values relevant. * gcc.dg/vect/pr121638.c: New testcase.
2025-08-25  Use x86 GFNI for vectorized constant byte shifts/rotates  (Andi Kleen, 7 files, -0/+371)
The GFNI AVX gf2p8affineqb instruction can be used to implement vectorized byte shifts or rotates. This patch uses them to implement shift and rotate patterns to allow the vectorizer to use them. Previously AVX couldn't do rotates (except with XOP) and had to handle 8 bit shifts with a half throughput 16 bit shift. This is only implemented for constant shifts. In theory it could be used with a lookup table for variable shifts, but it's unclear if it's worth it. The vectorizer cost model could be improved, but seems to work for now. It doesn't model the true latencies of the instructions. Also it doesn't account for the memory loading of the mask, assuming that for a loop it will be loaded outside the loop. The instructions would also support more complex patterns (e.g. arbitary bit movement or inversions), so some of the tricks applied to ternlog could be applied here too to collapse more code. It's trickier because the input patterns can be much longer since they can apply to every bit individually. I didn't attempt any of this. There's currently no test case for the masked/cond_ variants, they seem to be difficult to trigger with the vectorizer. Suggestions for a test case for them welcome. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_vgf2p8affine_shift_matrix): New function to lookup shift/rotate matrixes for gf2p8affine. * config/i386/i386-protos.h (ix86_vgf2p8affine_shift_matrix): Declare new function. * config/i386/i386.cc (ix86_shift_rotate_cost): Add cost model for shift/rotate implemented using gf2p8affine. * config/i386/sse.md (VI1_AVX512_3264): New mode iterator. (<insn><mode>3): Add GFNI case for shift patterns. (cond_<insn><mode>3): New pattern. (<insn><mode>3<mask_name>): Dito. (<insn>v16qi): New rotate pattern to handle XOP V16QI case and GFNI. (rotl<mode>3, rotr<mode>3): Exclude V16QI case. gcc/testsuite/ChangeLog: * gcc.target/i386/shift-gf2p8affine-1.c: New test * gcc.target/i386/shift-gf2p8affine-2.c: New test * gcc.target/i386/shift-gf2p8affine-3.c: New test * gcc.target/i386/shift-v16qi-4.c: New test * gcc.target/i386/shift-gf2p8affine-5.c: New test * gcc.target/i386/shift-gf2p8affine-6.c: New test * gcc.target/i386/shift-gf2p8affine-7.c: New test
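The kind of loop that benefits looks like the byte shift below (illustrative; the new shift-gf2p8affine-*.c tests cover more shift and rotate variants): with GFNI the constant shift on 8-bit elements can map to gf2p8affineqb instead of widening to 16-bit shifts plus masking.

    void shl3 (unsigned char *restrict dst, const unsigned char *restrict src, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = (unsigned char) (src[i] << 3);   /* constant byte shift */
    }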
2025-08-25  LoongArch: Fix ICE in highway-1.3.0 testsuite [PR121634]  (Xi Ruoyao, 1 file, -0/+15)
I can't believe I made such a stupid pasto and the regression test didn't detect anything wrong. PR target/121634 gcc/ * config/loongarch/simd.md (simd_maddw_evod_<mode>_<su>): Use WVEC_HALF instead of WVEC for the mode of the sign_extend for the rhs of multiplication. gcc/testsuite/ * gcc.target/loongarch/pr121634.c: New test.
2025-08-24  Fix invalid right shift count with recent ifcvt changes  (Jeff Law, 1 file, -2/+2)
I got too clever trying to simplify the right shift computation in my recent ifcvt patch. Interestingly enough, I haven't seen anything but the Linaro CI configuration actually trip the problem, though the code is clearly wrong. The problem I was trying to avoid was the leading zeros when calling clz on a HWI when the real object is just, say, 32 bits. The net is we get a right shift count of "2" when we really wanted a right shift count of 30. That causes the execution aspect of bics_3 to fail. The scan failures are due to creating slightly more efficient code. The new code sequences don't need to use conditional execution for selection and thus we can use bic rather than bics, which requires a twiddle in the scan. I reviewed recent bug reports and haven't seen one for this issue. So no new testcase as this is covered by the armv7 testsuite in the right configuration. Bootstrapped and regression tested on x86_64, also verified it fixes the Linaro reported CI failure and verified the crosses are still happy. Pushing to the trunk. gcc/ * ifcvt.cc (noce_try_sign_bit_splat): Fix right shift computation. gcc/testsuite/ * gcc.target/arm/bics_3.c: Adjust expected output.
2025-08-24  Daily bump.  (GCC Administrator, 1 file, -0/+19)
2025-08-23  c++: Fix greater-than operator in braced-init-lists [PR116928]  (Eczbek, 1 file, -0/+4)
PR c++/116928 gcc/cp/ChangeLog: * parser.cc (cp_parser_braced_list): Set greater_than_is_operator_p. gcc/testsuite/ChangeLog: * g++.dg/parse/template33.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
2025-08-23  x86: Compile noplt-(g|l)d-1.c with -mtls-dialect=gnu  (H.J. Lu, 2 files, -2/+2)
Compile noplt-gd-1.c and noplt-ld-1.c with -mtls-dialect=gnu to support the --with-tls=gnu2 configure option since they scan the assembly output for the __tls_get_addr call which is generated by -mtls-dialect=gnu. PR target/120933 * gcc.target/i386/noplt-gd-1.c (dg-options): Add -mtls-dialect=gnu. * gcc.target/i386/noplt-ld-1.c (dg-options): Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-23  c++/modules: Provide definitions of synthesized methods outside their defining module [PR120499]  (Nathaniel Shead, 3 files, -0/+56)
defining module [PR120499] In the PR, we're getting a linker error from _Vector_impl's destructor never getting emitted. This is because of a combination of factors: 1. in imp-member-4_a, the destructor is not used and so there is no definition generated. 2. in imp-member-4_b, the destructor gets synthesized (as part of the synthesis for Coll's destructor) but is not ODR-used and so does not get emitted. Despite there being a definition provided in this TU, the destructor is still considered imported and so isn't streamed into the module body. 3. in imp-member-4_c, we need to ODR-use the destructor but we only got a forward declaration from imp-member-4_b, so we cannot emit a body. The point of failure here is step 2; this function has effectively been declared in the imp-member-4_b module, and so we shouldn't treat it as imported. This way we'll properly stream the body so that importers can emit it. PR c++/120499 gcc/cp/ChangeLog: * method.cc (synthesize_method): Set the instantiating module. gcc/testsuite/ChangeLog: * g++.dg/modules/imp-member-4_a.C: New test. * g++.dg/modules/imp-member-4_b.C: New test. * g++.dg/modules/imp-member-4_c.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2025-08-23  Daily bump.  (GCC Administrator, 1 file, -0/+60)
2025-08-22  [PR rtl-optimization/120553] Improve selecting between constants based on sign bit test  (Jeff Law, 8 files, -0/+580)
sign bit test While working to remove mvconst_internal I stumbled over a regression in the code to handle signed division by a power of two. In that sequence we want to select between 0, 2^n-1 by pairing a sign bit splat with a subsequent logical right shift. This can be done without branches or conditional moves. Playing with it a bit made me realize there's a handful of selections we can do based on a sign bit test. Essentially there's two broad cases. Clearing bits after the sign bit splat. So we have 0, -1, if we clear bits the 0 stays as-is, but the -1 could easily turn into 2^n-1, ~2^n-1, or some small constants. Setting bits after the sign bit splat. If we have 0, -1, setting bits the -1 stays as-is, but the 0 can turn into 2^n, a small constant, etc. Shreya and I originally started looking at target patterns to do this, essentially discovering conditional move forms of the selects and rewriting them into something more efficient. That got out of control pretty quickly and it relied on if-conversion to initially create the conditional move. The better solution is to actually discover the cases during if-conversion itself. That catches cases that were previously being missed, checks cost models, and is actually simpler since we don't have to distinguish between things like ori and bseti, instead we just emit the natural RTL and let the target figure it out. In the ifcvt implementation we put these cases just before trying the traditional conditional move sequences. Essentially these are a last attempt before trying the generalized conditional move sequence. This as been bootstrapped and regression tested on aarch64, riscv, ppc64le, s390x, alpha, m68k, sh4eb, x86_64 and probably a couple others I've forgotten. It's also been tested on the other embedded targets. Obviously the new tests are risc-v specific, so that testing was primarily to make sure we didn't ICE, generate incorrect code or regress target existing specific tests. Raphael has some changes to attack this from the gimple direction as well. I think the latest version of those is on me to push through internal review. PR rtl-optimization/120553 gcc/ * ifcvt.cc (noce_try_sign_bit_splat): New function. (noce_process_if_block): Use it. gcc/testsuite/ * gcc.target/riscv/pr120553-1.c: New test. * gcc.target/riscv/pr120553-2.c: New test. * gcc.target/riscv/pr120553-3.c: New test. * gcc.target/riscv/pr120553-4.c: New test. * gcc.target/riscv/pr120553-5.c: New test. * gcc.target/riscv/pr120553-6.c: New test. * gcc.target/riscv/pr120553-7.c: New test. * gcc.target/riscv/pr120553-8.c: New test.
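The underlying branchless idiom, written out as a worked example (source-level illustration only; ifcvt does the equivalent on RTL, and the snippet assumes a 32-bit int and GCC's arithmetic right shift of signed values): selecting between 0 and 2^n - 1 from the sign of x.

    unsigned select_0_or_7 (int x)
    {
      unsigned mask = (unsigned) (x >> 31);   /* splat the sign bit: 0 or 0xffffffff */
      return mask >> 29;                      /* 0 stays 0, all-ones becomes 2^3 - 1 = 7 */
    }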
2025-08-22  RISC-V: Add testcase for scalar unsigned SAT_MUL form 3  (Pan Li, 27 files, -0/+382)
Add run and asm check test cases for scalar unsigned SAT_MUL form 3. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_arith.h: Add test helper macros. * gcc.target/riscv/sat/sat_u_mul-4-u16-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u16-from-u32.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u16-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u16-from-u64.rv32.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u32-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u32-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u32-from-u64.rv32.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u64-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u8-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u8-from-u16.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u8-from-u32.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u8-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u8-from-u64.rv32.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u32.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u64.rv32.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u64.rv32.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u64-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u16.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u32.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u64.rv32.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
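One common source form of unsigned saturating multiply looks like the sketch below (illustrative; the exact "form 3" shape is defined by the macros in sat_arith.h): widen, multiply, then clamp to the narrow type's maximum.

    #include <stdint.h>

    uint8_t sat_u_mul_u8 (uint8_t a, uint8_t b)
    {
      uint16_t w = (uint16_t) a * b;                    /* widening multiply */
      return w > UINT8_MAX ? UINT8_MAX : (uint8_t) w;   /* saturate on overflow */
    }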
2025-08-22  Emit the TLS call after NOTE_INSN_FUNCTION_BEG  (H.J. Lu, 2 files, -0/+28)
For the beginning basic block: (note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 2 4 26 2 NOTE_INSN_FUNCTION_BEG) emit the TLS call after NOTE_INSN_FUNCTION_BEG. gcc/ PR target/121635 * config/i386/i386-features.cc (ix86_emit_tls_call): Emit the TLS call after NOTE_INSN_FUNCTION_BEG. gcc/testsuite/ PR target/121635 * gcc.target/i386/pr121635-1a.c: New test. * gcc.target/i386/pr121635-1b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-22  testsuite: Fix g++.dg/abi/mangle83.C for -fshort-enums  (Nathaniel Shead, 1 file, -2/+2)
Linaro CI informed me that this test fails on ARM thumb-m7-hard-eabi. This appears to be because the target defaults to -fshort-enums, and so the mangled names are inaccurate. This patch just disables the implicit type enum test for this case. gcc/testsuite/ChangeLog: * g++.dg/abi/mangle83.C: Disable implicit enum test for -fshort-enums. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2025-08-21  [arm] require armv7 support for [PR120424]  (Alexandre Oliva, 2 files, -1/+4)
Without stating the architecture version required by the test, test runs with options that are incompatible with the required architecture version fail, e.g. -mfloat-abi=hard. armv7 was not covered by the long list of arm variants in target-supports.exp, so add it, and use it for the effective target requirement and for the option. for gcc/testsuite/ChangeLog PR rtl-optimization/120424 * lib/target-supports.exp (arm arches): Add arm_arch_v7. * g++.target/arm/pr120424.C: Require armv7 support. Use dg-add-options arm_arch_v7 instead of explicit -march=armv7.
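With the new entry in the arches list, a test can state the requirement in the usual way (a sketch of the directive pair; pr120424.C combines it with its own dg-do line and other options):

    /* { dg-require-effective-target arm_arch_v7_ok } */
    /* { dg-add-options arm_arch_v7 } */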
2025-08-22  Daily bump.  (GCC Administrator, 1 file, -0/+59)
2025-08-21  Fortran: Fix NULL pointer issue.  (Steven G. Kargl, 1 file, -0/+5)
PR fortran/121627 gcc/fortran/ChangeLog: * module.cc (create_int_parameter_array): Avoid NULL pointer dereference and enhance error message. gcc/testsuite/ChangeLog: * gfortran.dg/pr121627.f90: New test.
2025-08-21  c: Add folding of nullptr_t in some cases [PR121478]  (Andrew Pinski, 1 file, -0/+32)
The middle-end does not fully understand NULLPTR_TYPE. So it gets confused a lot of the time when dealing with it. This adds the folding that is similarly done in the C++ front-end already. In some cases it should produce slightly better code as there is no reason to load from a nullptr_t variable as it is always NULL. The following is handled: nullptr_v ==/!= nullptr_v -> true/false (ptr)nullptr_v -> (ptr)0, nullptr_v f(nullptr_v) -> f ((nullptr, nullptr_v)) The last one is for conversion inside ... . Bootstrapped and tested on x86_64-linux-gnu. PR c/121478 gcc/c/ChangeLog: * c-fold.cc (c_fully_fold_internal): Fold nullptr_t ==/!= nullptr_t. * c-typeck.cc (convert_arguments): Handle conversion from nullptr_t for varargs. (convert_for_assignment): Handle conversions from nullptr_t to pointer type specially. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr121478-1.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
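A small C23 example of the folds listed above (hypothetical, not the contents of pr121478-1.c; compile with -std=c23):

    #include <stddef.h>

    int g (nullptr_t p, nullptr_t q)
    {
      int *ip = p;               /* (ptr)nullptr_v folds to a plain null pointer */
      return p == q && ip == 0;  /* nullptr_v == nullptr_v folds to true */
    }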
2025-08-21  c++: constexpr clobber of const [PR121068]  (Jason Merrill, 1 file, -0/+26)
Since r16-3022, 20_util/variant/102912.cc was failing in C++20 and above due to wrong errors about destruction modifying a const object; destruction is OK. PR c++/121068 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_store_expression): Allow clobber of a const object. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/constexpr-dtor18.C: New test.
2025-08-21  RISC-V: testsuite: Fix DejaGnu support for riscv_zvfh  (Paul-Antoine Arras, 13 files, -18/+14)
Call check_effective_target_riscv_zvfh_ok rather than check_effective_target_riscv_zvfh in vx_vf_*run-1-f16.c run tests and ensure that they are actually run. Also fix remove_options_for_riscv_zvfh. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f16.c: Call check_effective_target_riscv_zvfh_ok rather than check_effective_target_riscv_zvfh. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmacc-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmsac-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f16.c: Likewise. * lib/target-supports.exp (check_effective_target_riscv_zvfh_ok): Append zvfh instead of v to march. (remove_options_for_riscv_zvfh): Remove duplicate and call remove_ rather than add_options_for_riscv_z_ext.
2025-08-21  rtl-ssa: Add missing live-out uses [PR121619]  (Richard Sandiford, 1 file, -0/+33)
This PR is another bug in the rtl-ssa code to manage live-out uses. It seems that this didn't get much coverage until recently. In the testcase, late-combine first removed a register-to-register move by substituting into all uses, some of which were in other EBBs. This was done after checking make_uses_available, which (as expected) says that single dominating definitions are available everywhere that the definition dominates. But the update failed to add appropriate live-out uses, so a later parallelisation attempt tried to move the new destination into a later block. gcc/ PR rtl-optimization/121619 * rtl-ssa/functions.h (function_info::commit_make_use_available): Declare. * rtl-ssa/blocks.cc (function_info::commit_make_use_available): New function. * rtl-ssa/changes.cc (function_info::apply_changes_to_insn): Use it. gcc/testsuite/ PR rtl-optimization/121619 * gcc.dg/pr121619.c: New test.
2025-08-21  x86-64: Emit the TLS call after NOTE_INSN_BASIC_BLOCK  (H.J. Lu, 2 files, -0/+65)
For a basic block with only a label: (code_label 78 11 77 3 14 (nil) [1 uses]) (note 77 78 54 3 [bb 3] NOTE_INSN_BASIC_BLOCK) emit the TLS call after NOTE_INSN_BASIC_BLOCK, instead of before NOTE_INSN_BASIC_BLOCK, to avoid x.c: In function ‘aout_16_write_syms’: x.c:54:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 3 54 | } | ^ x.c:54:1: error: NOTE_INSN_BASIC_BLOCK 77 in middle of basic block 3 during RTL pass: x86_cse x.c:54:1: internal compiler error: verify_flow_info failed gcc/ PR target/121607 * config/i386/i386-features.cc (ix86_emit_tls_call): Emit the TLS call after NOTE_INSN_BASIC_BLOCK in a basic block with only a label. gcc/testsuite/ PR target/121607 * gcc.target/i386/pr121607-1a.c: New test. * gcc.target/i386/pr121607-1b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-21  Fortran: gfortran PDT component access [PR84122, PR85942]  (Paul Thomas, 2 files, -0/+144)
2025-08-21 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/84122 * parse.cc (parse_derived): PDT type parameters are not allowed an explicit access specification and must appear before a PRIVATE statement. If a PRIVATE statement is seen, mark all the other components as PRIVATE. PR fortran/85942 * simplify.cc (get_kind): Convert a PDT KIND component into a specification expression using the default initializer. gcc/testsuite/ PR fortran/84122 * gfortran.dg/pdt_38.f03: New test. PR fortran/85942 * gfortran.dg/pdt_39.f03: New test.
2025-08-20  c++: pointer to auto member function [PR120757]  (Jason Merrill, 1 file, -0/+20)
Here r13-1210 correctly changed &A<int>::foo to not be considered type-dependent, but tsubst_expr of the OFFSET_REF got confused trying to tsubst a type that involved auto. Fixed by getting the type from the member rather than tsubst. PR c++/120757 gcc/cp/ChangeLog: * pt.cc (tsubst_expr) [OFFSET_REF]: Don't tsubst the type. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/auto-fn66.C: New test.
2025-08-21  Daily bump.  (GCC Administrator, 1 file, -0/+40)
2025-08-20  c++: lambda capture and shadowing [PR121553]  (Marek Polacek, 4 files, -6/+21)
P2036 says that this: [x=1]{ int x; } should be rejected, but with my P2036 we started giving an error for the attached testcase as well, breaking Dolphin. So let's keep the error only for init-captures. PR c++/121553 gcc/cp/ChangeLog: * name-lookup.cc (check_local_shadow): Check !is_normal_capture_proxy. gcc/testsuite/ChangeLog: * g++.dg/warn/Wshadow-19.C: Revert P2036 changes. * g++.dg/warn/Wshadow-6.C: Likewise. * g++.dg/warn/Wshadow-20.C: New test. * g++.dg/warn/Wshadow-21.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
2025-08-20  Provide new option -fdiagnostics-show-context=N for -Warray-bounds, -Wstringop-* warnings [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]  (Qing Zhao, 14 files, -0/+681)
-Wstringop-* warnings [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179] '-fdiagnostics-show-context[=DEPTH]' '-fno-diagnostics-show-context' With this option, the compiler might print the interesting control flow chain that guards the basic block of the statement which has the warning. DEPTH is the maximum depth of the control flow chain. Currently, The list of the impacted warning options includes: '-Warray-bounds', '-Wstringop-overflow', '-Wstringop-overread', '-Wstringop-truncation'. and '-Wrestrict'. More warning options might be added to this list in future releases. The forms '-fdiagnostics-show-context' and '-fno-diagnostics-show-context' are aliases for '-fdiagnostics-show-context=1' and '-fdiagnostics-show-context=0', respectively. For example: $ cat t.c extern void warn(void); static inline void assign(int val, int *regs, int *index) { if (*index >= 4) warn(); *regs = val; } struct nums {int vals[4];}; void sparx5_set (int *ptr, struct nums *sg, int index) { int *val = &sg->vals[index]; assign(0, ptr, &index); assign(*val, ptr, &index); } $ gcc -Wall -O2 -c -o t.o t.c t.c: In function ‘sparx5_set’: t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=] 12 | int *val = &sg->vals[index]; | ~~~~~~~~^~~~~~~ t.c:8:18: note: while referencing ‘vals’ 8 | struct nums {int vals[4];}; | ^~~~ In the above, Although the warning is correct in theory, the warning message itself is confusing to the end-user since there is information that cannot be connected to the source code directly. It will be a nice improvement to add more information in the warning message to report where such index value come from. With the new option -fdiagnostics-show-context=1, the warning message for the above testing case is now: $ gcc -Wall -O2 -fdiagnostics-show-context=1 -c -o t.o t.c t.c: In function ‘sparx5_set’: t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=] 12 | int *val = &sg->vals[index]; | ~~~~~~~~^~~~~~~ ‘sparx5_set’: events 1-2 4 | if (*index >= 4) | ^ | | | (1) when the condition is evaluated to true ...... 12 | int *val = &sg->vals[index]; | ~~~~~~~~~~~~~~~ | | | (2) warning happens here t.c:8:18: note: while referencing ‘vals’ 8 | struct nums {int vals[4];}; | ^~~~ PR tree-optimization/109071 PR tree-optimization/85788 PR tree-optimization/88771 PR tree-optimization/106762 PR tree-optimization/108770 PR tree-optimization/115274 PR tree-optimization/117179 gcc/ChangeLog: * Makefile.in (OBJS): Add diagnostic-context-rich-location.o. * common.opt (fdiagnostics-show-context): New option. (fdiagnostics-show-context=): New option. * diagnostic-context-rich-location.cc: New file. * diagnostic-context-rich-location.h: New file. * doc/invoke.texi (fdiagnostics-details): Add documentation for the new options. * gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add one new parameter. Use rich location with details for warning_at. (array_bounds_checker::check_array_ref): Use rich location with ditails for warning_at. (array_bounds_checker::check_mem_ref): Add one new parameter. Use rich location with details for warning_at. (array_bounds_checker::check_addr_expr): Use rich location with move_history_diagnostic_path for warning_at. (array_bounds_checker::check_array_bounds): Call check_mem_ref with one more parameter. * gimple-array-bounds.h: Update prototype for check_mem_ref. * gimple-ssa-warn-access.cc (warn_string_no_nul): Use rich location with details for warning_at. (maybe_warn_nonstring_arg): Likewise. 
(maybe_warn_for_bound): Likewise. (warn_for_access): Likewise. (check_access): Likewise. (pass_waccess::check_strncat): Likewise. (pass_waccess::maybe_check_access_sizes): Likewise. * gimple-ssa-warn-restrict.cc (pass_wrestrict::execute): Calculate dominance info for diagnostics show context. (maybe_diag_overlap): Use rich location with details for warning_at. (maybe_diag_access_bounds): Use rich location with details for warning_at. gcc/testsuite/ChangeLog: * gcc.dg/pr109071.c: New test. * gcc.dg/pr109071_1.c: New test. * gcc.dg/pr109071_10.c: New test. * gcc.dg/pr109071_11.c: New test. * gcc.dg/pr109071_12.c: New test. * gcc.dg/pr109071_2.c: New test. * gcc.dg/pr109071_3.c: New test. * gcc.dg/pr109071_4.c: New test. * gcc.dg/pr109071_5.c: New test. * gcc.dg/pr109071_6.c: New test. * gcc.dg/pr109071_7.c: New test. * gcc.dg/pr109071_8.c: New test. * gcc.dg/pr109071_9.c: New test. * gcc.dg/pr117375.c: New test.
2025-08-19  x86: Place the TLS call before all register setting BBs  (H.J. Lu, 4 files, -0/+104)
We can't place a TLS call before a conditional jump in a basic block like (code_label 13 11 14 4 2 (nil) [1 uses]) (note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (jump_insn 16 14 17 4 (set (pc) (if_then_else (le (reg:CCNO 17 flags) (const_int 0 [0])) (label_ref 27) (pc))) "x.c":10:21 discrim 1 1462 {*jcc} (expr_list:REG_DEAD (reg:CCNO 17 flags) (int_list:REG_BR_PROB 628353713 (nil))) -> 27) since the TLS call will clobber flags register nor place a TLS call in a basic block if any live caller-saved registers aren't dead at the end of the basic block: ;; live in 6 [bp] 7 [sp] 16 [argp] 17 [flags] 19 [frame] 104 ;; live gen 0 [ax] 102 106 108 116 117 118 120 ;; live kill 5 [di] Instead, we should place such call before all register setting basic blocks which dominate the current basic block. Keep track the replaced GNU and GNU2 TLS instructions. Use these info to place the __tls_get_addr call and mark FLAGS register as dead. gcc/ PR target/121572 * config/i386/i386-features.cc (replace_tls_call): Add a bitmap argument and put the updated TLS instruction in the bitmap. (ix86_get_dominator_for_reg): New. (ix86_check_flags_reg): Likewise. (ix86_emit_tls_call): Likewise. (ix86_place_single_tls_call): Add 2 bitmap arguments for updated GNU and GNU2 TLS instructions. Call ix86_emit_tls_call to emit TLS instruction. Correct debug dump for before instruction. gcc/testsuite/ PR target/121572 * gcc.target/i386/pr121572-1a.c: New test. * gcc.target/i386/pr121572-1b.c: Likewise. * gcc.target/i386/pr121572-2a.c: Likewise. * gcc.target/i386/pr121572-2b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-20  Daily bump.  (GCC Administrator, 1 file, -0/+44)
2025-08-19  c++: testcase tweak for -fimplicit-constexpr  (Jason Merrill, 1 file, -1/+1)
This testcase is testing the difference between functions that are or are not declared constexpr. gcc/testsuite/ChangeLog: * g++.dg/cpp26/expansion-stmt16.C: Add -fno-implicit-constexpr.
2025-08-19  c++: Fix ICE on mangling invalid compound requirement [PR120618]  (Ben Wu, 2 files, -1/+14)
This testcase caused an ICE when mangling the invalid type-constraint in write_requirement since write_type_constraint expects a TEMPLATE_TYPE_PARM. Setting the trailing return type to NULL_TREE when a return-type-requirement is found in place of a type-constraint prevents the failed assertion in write_requirement. It also allows the invalid constraint to be satisfied in some contexts to prevent redundant errors, e.g. in concepts-requires5.C. Bootstrapped and tested on x86_64-linux-gnu. PR c++/120618 gcc/cp/ChangeLog: * parser.cc (cp_parser_compound_requirement): Set type to NULL_TREE for invalid type-constraint. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-requires5.C: Don't require redundant diagnostic in static assertion. * g++.dg/concepts/pr120618.C: New test. Suggested-by: Jason Merrill <jason@redhat.com>
2025-08-19  middle-end: Fix malloc like functions when calling with void "return" [PR120024]  (Andrew Pinski, 2 files, -0/+22)
When expanding malloc like functions, we copy the return register into a temporary and then mark that temporary register with a noalias regnote and the alignment. This works fine unless you are calling the function with a return type of void. At this point then the valreg will be null and a crash will happen. A few cleanups are included in this patch because it was easier to do the fix with the cleanups added. The start_sequence/end_sequence for ECF_MALLOC is no longer needed; I can't tell if it was ever needed. The emit_move_insn function returns the last emitted instruction anyways so there is no reason to call get_last_insn as we can just use the return value of emit_move_insn. This has been true since this code was originally added so I don't understand why it was done that way beforehand. Bootstrapped and tested on x86_64-linux-gnu. PR middle-end/120024 gcc/ChangeLog: * calls.cc (expand_call): Remove start_sequence/end_sequence for ECF_MALLOC. Check valreg before deferencing it when it comes to malloc like functions. Use the return value of emit_move_insn instead of calling get_last_insn. gcc/testsuite/ChangeLog: * gcc.dg/torture/malloc-1.c: New test. * gcc.dg/torture/malloc-2.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>