riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2025-08-26	OpenMP: give error when variant is the same as the base function [PR118839]	Sandra Loosemore	5	-0/+43
	As noted in the issue, the C++ front end has deeper problems: it's supposed to do the name lookup of the variant at the call site but is instead doing it when parsing the "declare variant" construct, before registering the decl for the base function. The C++ part of the patch is a band-aid to catch the case where there is a previous declaration of the function and it doesn't give an undefined symbol error instead. Some real solution ought to be included as part of fixing PR118791. gcc/c/ PR middle-end/118839 * c-parser.cc (c_finish_omp_declare_variant): Error if variant is the same as base. gcc/cp/ PR middle-end/118839 * decl.cc (omp_declare_variant_finalize_one): Error if variant is the same as base. gcc/fortran/ PR middle-end/118839 * trans-openmp.cc (gfc_trans_omp_declare_variant): Error if variant is the same as base. gcc/testsuite/ PR middle-end/118839 * gcc.dg/gomp/declare-variant-3.c: New. * gfortran.dg/gomp/declare-variant-22.f90: New.
2025-08-26	OpenMP: Improve front-end error-checking for "declare variant"	Sandra Loosemore	11	-196/+248
	This patch fixes a number of problems with parser error checking of "declare variant", especially in the C front end. The new C testcase unprototyped-variant.c added by this patch used to ICE when gimplifying the call site, at least in part because the variant was being recorded even after it was diagnosed as invalid. There was also a large block of dead code in the C front end that was supposed to fix up an unprototyped declaration of a variant function to match the base function declaration, that was never executed because it was nested in a conditional that could never be true. I've fixed those problems by rearranging the code and only recording the variant if it passes the correctness checks. I also tried to add some comments and re-work some particularly confusing bits of code, so that it's easier to understand. The OpenMP specification doesn't say what the behavior of "declare variant" with the "append_args" clause should be when the base function is unprototyped. The additional arguments are supposed to be inserted between the last fixed argument of the base function and any varargs, but without a prototype, for any given call we have no idea which arguments are fixed and which are varargs, and therefore no idea where to insert the additional arguments. This used to trigger some other diagnostics (which one depending on whether the variant was also unprototyped), but I thought it was better to just reject this with an explicit "sorry". Finally, I also observed that a missing "match" clause was only rejected if "append_args" or "adjust_args" was present. Per the spec, "match" has the "required" property, so if it's missing it should be diagnosed unconditionally. The C++ and Fortran front ends had the same issue so I fixed this one there too. gcc/c/ChangeLog * c-parser.cc (c_finish_omp_declare_variant): Rework diagnostic code. Do not record variant if there are errors. Make check for a missing "match" clause unconditional. 
gcc/cp/ChangeLog * parser.cc (cp_finish_omp_declare_variant): Structure diagnostic code similarly to C front end. Make check for a missing "match" clause unconditional. gcc/fortran/ChangeLog * openmp.cc (gfc_match_omp_declare_variant): Make check for a missing "match" clause unconditional. gcc/testsuite/ChangeLog * c-c++-common/gomp/append-args-1.c: Adjust expected output. * g++.dg/gomp/adjust-args-1.C: Likewise. * g++.dg/gomp/adjust-args-3.C: Likewise. * gcc.dg/gomp/adjust-args-1.c: Likewise. * gcc.dg/gomp/append-args-1.c: Likewise. * gcc.dg/gomp/unprototyped-variant.c: New. * gfortran.dg/gomp/adjust-args-1.f90: Adjust expected output. * gfortran.dg/gomp/append_args-1.f90: Likewise.
2025-08-26	[committed] RISC-V Testsuite hygiene	Jeff Law	4	-18/+9
	Shreya and I were working through some testsuite failures and noticed that many of the current failures on the pioneer were just silly. We have tests that expect to see full architecture strings in their expected output when the bulk (some might say all) of the architecture string is irrelevant. Worse yet, we'd have different matching lines, i.e. we'd have one that would match rv64gc_blah_blah and another for rv64imfa_blah_blah. Judicious wildcard usage cleans this up considerably. This fixes ~80 failures in the riscv.exp testsuite. Pushing to the trunk as it's happy on the pioneer native, riscv32-elf and riscv64-elf. gcc/testsuite/ * gcc.target/riscv/arch-25.c: Use wildcards to simplify/eliminate dg-error directives. * gcc.target/riscv/arch-ss-2.c: Similarly. * gcc.target/riscv/arch-zilsd-2.c: Similarly. * gcc.target/riscv/arch-zilsd-3.c: Similarly.
2025-08-26	libstdc++/ranges: Prefer using offset-based _CachedPosition	Patrick Palka	1	-2/+0
	The offset-based partial specialization of _CachedPosition for random-access iterators is currently only selected if the offset type is smaller than the iterator type. Before r12-1018-g46ed811bcb4b86 this made sense since the main partial specialization only stored the iterator (incorrectly). After that bugfix, the main partial specialization now effectively stores a std::optional<iter> so the size constraint is inaccurate. And this main partial specialization must invalidate itself upon copy/move unlike the offset-based partial specialization. So I think we should just always prefer the offset-based _CachedPosition for a random-access iterator, even if the offset type happens to be larger than the iterator type. libstdc++-v3/ChangeLog: * include/std/ranges (__detail::_CachedPosition): Remove additional size constraint on the offset-based partial specialization. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
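The difference between caching an iterator and caching an offset can be sketched as follows. This is only an illustration under stated assumptions: `OffsetCache` and `first_even_via_copy` are hypothetical names, not the libstdc++ `_CachedPosition` implementation. It shows that a cached offset from `begin()` remains meaningful for a copy of the range, so nothing has to be invalidated on copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration; not the libstdc++ _CachedPosition code.
// The position is cached as an offset from begin(), so the cache stays
// meaningful for a copy of the range (a cached iterator would still
// point into the original container).
class OffsetCache {
    std::ptrdiff_t offset_ = -1;  // -1 means "nothing cached"

public:
    bool has_value() const { return offset_ >= 0; }

    template <typename Range>
    void set(const Range& r, typename Range::const_iterator it) {
        offset_ = it - r.begin();
    }

    template <typename Range>
    typename Range::const_iterator get(const Range& r) const {
        assert(has_value());
        return r.begin() + offset_;
    }
};

// Caches the position of the first even element, then reads it back
// through a copy of the container.
inline int first_even_via_copy() {
    std::vector<int> v{1, 3, 4, 5};
    OffsetCache cache;
    cache.set(v, v.begin() + 2);
    std::vector<int> copy = v;  // the cached offset applies to the copy too
    return *cache.get(copy);
}
```

The offset form only works for random-access iterators, which is why the commit talks about preferring it for that iterator category.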
2025-08-26	testsuite: restrict ctf-array-7 test to 64-bit targets [PR121411]	David Faust	1	-2/+3
	The test fails to compile on 32-bit targets because the arrays are too large. Restrict to targets where the array index type is 64-bits. Also note the relevant PR in the test comment. PR debug/121411 gcc/testsuite/ * gcc.dg/debug/ctf/ctf-array-7.c: Restrict to lp64,llp64 targets.
2025-08-26	testsuite: arm: Disable sched2 and sched3 in unsigned-extend-2.c	Torbjörn SVENSSON	1	-9/+4
	Disable sched2 and sched3 to only have one order of instructions to consider. gcc/testsuite/ChangeLog: * gcc.target/arm/unsigned-extend-2.c: Disable sched2 and sched3 and update function body to match. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2025-08-26	libstdc++: Do not require assignment for vector::resize(n, v) [PR90192]	Tomasz Kamiński	5	-21/+148
	This patch introduces a new function, _M_fill_append, which is invoked when copies of the same value are appended to the end of a vector. Unlike _M_fill_insert(end(), n, v), _M_fill_append never permutes elements in place, so it does not require: * vector element type to be assignable; * a copy of the inserted value, in the case where it points to an element of the vector. vector::resize(n, v) now uses _M_fill_append, fixing the non-conformance where element types were required to be assignable. In addition, _M_fill_insert(end(), n, v) now delegates to _M_fill_append, which eliminates an unnecessary copy of v when the existing capacity is used. PR libstdc++/90192 libstdc++-v3/ChangeLog: * include/bits/stl_vector.h (vector<T>::_M_fill_append): Declare. (vector<T>::fill): Use _M_fill_append instead of _M_fill_insert. * include/bits/vector.tcc (vector<T>::_M_fill_append): Define (vector<T>::_M_fill_insert): Delegate to _M_fill_append when elements are appended. * testsuite/23_containers/vector/modifiers/moveable.cc: Updated copycount for inserting at the end (appending). * testsuite/23_containers/vector/modifiers/resize.cc: New test. * testsuite/backward/hash_set/check_construct_destroy.cc: Updated copycount, the hash_set constructor uses insert to fill buckets with nullptrs. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
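Why appending needs construction but not assignment can be sketched with a toy helper. `fill_append` and `NoAssign` below are hypothetical names and this is not the libstdc++ `_M_fill_append`. Unlike the real function, the sketch assumes the value does not refer to an element of the vector. It deliberately avoids calling `resize(n, v)` itself, since that call only accepts a non-assignable type on a library that includes this fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy-constructible but not assignable.
struct NoAssign {
    int value;
    explicit NoAssign(int v) : value(v) {}
    NoAssign(const NoAssign&) = default;
    NoAssign& operator=(const NoAssign&) = delete;
};

// Hypothetical helper, not libstdc++'s _M_fill_append.
// Appends n copies of x using construction only; no element is ever
// assigned to. Assumption: x does not refer to an element of vec.
template <typename T>
void fill_append(std::vector<T>& vec, std::size_t n, const T& x) {
    vec.reserve(vec.size() + n);
    for (std::size_t i = 0; i < n; ++i)
        vec.emplace_back(x);
}

// Appends three copies of NoAssign(7) and sums the stored values.
inline int sum_after_append() {
    std::vector<NoAssign> v;
    fill_append(v, 3, NoAssign(7));
    int sum = 0;
    for (const NoAssign& e : v)
        sum += e.value;
    return sum;
}
```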
2025-08-26	libstdc++: Refactor bound arguments storage for bind_front/back	Tomasz Kamiński	6	-49/+418
	This patch refactors the implementation of bind_front and bind_back to avoid using std::tuple for argument storage. Instead, bound arguments are now: * stored directly if there is only one, * within a dedicated _Bound_arg_storage otherwise. _Bound_arg_storage is less expensive to instantiate and access than std::tuple. It can also be trivially copyable, as it doesn't require a non-trivial assignment operator for reference types. Storing a single argument directly provides similar benefits compared to both one element tuple or _Bound_arg_storage. _Bound_arg_storage holds each argument in an _Indexed_bound_arg base object. The base class is parameterized by both type and index to allow storing multiple arguments of the same type. Invocations are handled by _S_apply_front and _S_apply_back static functions, which simulate explicit object parameters. To facilitate this, the __like_t alias template is now unconditionally available since C++11 in bits/move.h. libstdc++-v3/ChangeLog: * include/bits/move.h (std::__like_impl, std::__like_t): Make available in c++11. * include/std/functional (std::_Indexed_bound_arg) (std::_Bound_arg_storage, std::__make_bound_args): Define. (std::_Bind_front, std::_Bind_back): Use _Bound_arg_storage. * testsuite/20_util/function_objects/bind_back/1.cc: Expand test to cover cases of 0, 1, many bound args. * testsuite/20_util/function_objects/bind_back/111327.cc: Likewise. * testsuite/20_util/function_objects/bind_front/1.cc: Likewise. * testsuite/20_util/function_objects/bind_front/111327.cc: Likewise. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Reviewed-by: Patrick Palka <ppalka@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
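The idea of a base class parameterized by both index and type can be sketched as below. All names (`IndexedArg`, `ArgStorageImpl`, `ArgStorage`, `get_arg`) are hypothetical and chosen for illustration; the actual `_Indexed_bound_arg` and `_Bound_arg_storage` in libstdc++ may be structured differently. The index makes each base distinct, so two arguments of the same type can coexist.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical names; not the libstdc++ internals.
// One base object per bound argument, tagged with its index.
template <std::size_t I, typename T>
struct IndexedArg {
    T value;
};

template <typename Seq, typename... Ts>
struct ArgStorageImpl;

template <std::size_t... Is, typename... Ts>
struct ArgStorageImpl<std::index_sequence<Is...>, Ts...>
    : IndexedArg<Is, Ts>... {
    explicit ArgStorageImpl(Ts... args)
        : IndexedArg<Is, Ts>{std::forward<Ts>(args)}... {}
};

template <typename... Ts>
using ArgStorage = ArgStorageImpl<std::index_sequence_for<Ts...>, Ts...>;

// Access by index: I is given explicitly, T is deduced from the
// unique base IndexedArg<I, T>.
template <std::size_t I, typename T>
T& get_arg(IndexedArg<I, T>& a) { return a.value; }

// No user-provided copy/move operations, so the storage stays
// trivially copyable when its members are.
static_assert(std::is_trivially_copyable<ArgStorage<int, int>>::value,
              "storage of trivially copyable args is trivially copyable");

// Stores two arguments of the same type and reads both back.
inline int sum_two_bound_ints() {
    ArgStorage<int, int> s(1, 2);
    return get_arg<0>(s) + get_arg<1>(s);
}
```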
2025-08-26	libstdc++: Specialize _Never_valueless_alt for jthread, stop_token and stop_source	Tomasz Kamiński	2	-0/+33
	The move constructors for stop_source and stop_token are equivalent to copying and clearing the raw pointer, as they are wrappers for a counted-shared state. For jthread, the move constructor performs a member-wise move of stop_source and thread. While std::thread could also have a _Never_valueless_alt specialization due to its inexpensive move (only moving a handle), doing so now would change the ABI. This patch takes the opportunity to correct this behavior for jthread, before the C++20 API is marked stable. libstdc++-v3/ChangeLog: * include/std/stop_token (__variant::_Never_valueless_alt): Declare. (__variant::_Never_valueless_alt<std::stop_token>) (__variant::_Never_valueless_alt<std::stop_source>): Define. * include/std/thread (__variant::_Never_valueless_alt): Declare. (__variant::_Never_valueless_alt<std::jthread>): Define.
2025-08-26	Enable unroll in the vectorizer when there's reduction for FMA/DOT_PROD_EXPR/SAD_EXPR	liuhongt	10	-3/+447
	The patch unrolls the vectorized loop when there are FMA/DOT_PROD_EXPR/SAD_EXPR reductions; this breaks the cross-iteration dependence and enables more parallelism (since vectorization will also enable partial sums). When there's gather/scatter or scalarization in the loop, don't do the unroll since the performance bottleneck is not at the reduction. The unroll factor is set according to FMA/DOT_PROD_EXPR/SAD_EXPR CEIL ((latency * throughput), num_of_reduction), i.e. for FMA, latency is 4 and throughput is 2; if there's 1 FMA for reduction then the unroll factor is 2 * 4 / 1 = 8. There's also a vect_unroll_limit; the final suggested_unroll_factor is set as MIN (vect_unroll_limit, 8). The vect_unroll_limit is mainly for register pressure, to avoid too many spills. Ideally, all instructions in the vectorized loop should be used to determine unroll_factor with their (latency * throughput) / number, but that would be too much for this patch, and may just be GIGO, so the patch only considers 3 kinds of instructions: FMA, DOT_PROD_EXPR, SAD_EXPR. Note when DOT_PROD_EXPR is not natively supported, m_num_reduction += 3 * count, which almost prevents unrolling. There's a performance boost for simple benchmarks with a DOT_PROD_EXPR/FMA chain, and a slight improvement in SPEC2017 performance. gcc/ChangeLog: * config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs): Add new members m_num_reduc, m_prefer_unroll. (ix86_vector_costs::add_stmt_cost): Set m_prefer_unroll and m_num_reduc (ix86_vector_costs::finish_cost): Determine m_suggested_unroll_vector with consideration of reduc_lat_mult_thr, m_num_reduction and ix86_vect_unroll_limit. * config/i386/i386.h (enum ix86_reduc_unroll_factor): New enum. (processor_costs): Add reduc_lat_mult_thr and vect_unroll_limit. * config/i386/x86-tune-costs.h: Initialize reduc_lat_mult_thr and vect_unroll_limit. * config/i386/i386.opt: Add -param=ix86-vect-unroll-limit. gcc/testsuite/ChangeLog: * gcc.target/i386/vect_unroll-1.c: New test. 
* gcc.target/i386/vect_unroll-2.c: New test. * gcc.target/i386/vect_unroll-3.c: New test. * gcc.target/i386/vect_unroll-4.c: New test. * gcc.target/i386/vect_unroll-5.c: New test.
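The unroll-factor arithmetic described in the commit above can be written out as a small function. This is a sketch of the stated formula only: `ceil_div` and `suggested_unroll_factor` are hypothetical names, and the real logic in `ix86_vector_costs::finish_cost` has additional conditions (for example, the gather/scatter check) that are not modelled here.

```cpp
#include <algorithm>
#include <cassert>

// CEIL (a, b) for positive integers.
constexpr unsigned ceil_div(unsigned a, unsigned b) {
    return (a + b - 1) / b;
}

// Hypothetical sketch of the heuristic in the commit message:
//   factor = MIN (unroll_limit, CEIL (latency * throughput, num_reductions))
// lat_mult_thr is the precomputed latency * throughput product.
inline unsigned suggested_unroll_factor(unsigned lat_mult_thr,
                                        unsigned num_reductions,
                                        unsigned unroll_limit) {
    assert(num_reductions > 0);
    return std::min(unroll_limit, ceil_div(lat_mult_thr, num_reductions));
}
```

With the FMA numbers from the message (latency 4, throughput 2, one reduction) the uncapped factor is 8; a limit of 4 would cap it at 4.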
2025-08-26	[PATCH] RISC-V: Add pattern for reverse floating-point divide	Paul-Antoine Arras	19	-12/+313
	This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a div RTL instruction. The vec_duplicate is the dividend operand. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfdiv.vv v1,v2,v1 After, we get only one: vfrdiv.vf v1,v1,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (vfrdiv_vf_<mode>): Add new pattern to combine vec_duplicate + vfdiv.vv into vfrdiv.vf. config/riscv/vector.md (@pred_<optab><mode>_reverse_scalar): Allow VLS modes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfrdiv. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add support for reverse variants. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for reverse variants. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f64.c: New test.
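A source loop with this shape, where the scalar is the dividend and the vector elements are the divisors, is sketched below. `reverse_divide` is a hypothetical name, and whether a given compiler, target and option set actually emits vfrdiv.vf for it is not something this sketch asserts.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example of a "reverse" divide: the broadcast scalar s is
// the dividend and each vector element is the divisor.
inline void reverse_divide(float* out, const float* in, float s,
                           std::size_t n) {
    assert(n == 0 || (out != nullptr && in != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        out[i] = s / in[i];
}
```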
2025-08-26	AArch64: extend cost model to cost outer loop vect where the inner loop is invariant [PR121290]	Tamar Christina	2	-2/+61
	Consider the example: void f (int *restrict x, int *restrict y, int *restrict z, int n) { for (int i = 0; i < 4; ++i) { int res = 0; for (int j = 0; j < 100; ++j) res += y[j] * z[i]; x[i] = res; } } we currently vectorize as f: movi v30.4s, 0 ldr q31, [x2] add x2, x1, 400 .L2: ld1r {v29.4s}, [x1], 4 mla v30.4s, v29.4s, v31.4s cmp x2, x1 bne .L2 str q30, [x0] ret which is not useful because by doing outer-loop vectorization we're performing less work per iteration than we would had we done inner-loop vectorization and simply unrolled the inner loop. This patch teaches the cost model that if all the leaves are invariant, it should multiply the loop cost by VF, since every vector iteration has at least one lane really just doing 1 scalar. There are a couple of ways we could have solved this, one is to increase the unroll factor to process more iterations of the inner loop. This removes the need for the broadcast, however we don't support unrolling the inner loop within the outer loop. We only support unrolling by increasing the VF, which would affect the outer loop as well as the inner loop. We also don't directly support costing inner-loop vs outer-loop vectorization, and as such we're left trying to predict/steer the cost model ahead of time to what we think should be profitable. This patch attempts to do so using a heuristic which penalizes the outer-loop vectorization. We now cost the loop as note: Cost model analysis: Vector inside of loop cost: 2000 Vector prologue cost: 4 Vector epilogue cost: 0 Scalar iteration cost: 300 Scalar outside cost: 0 Vector outside cost: 4 prologue iterations: 0 epilogue iterations: 0 missed: cost model: the vector iteration cost = 2000 divided by the scalar iteration cost = 300 is greater or equal to the vectorization factor = 4. missed: not vectorized: vectorization not profitable. missed: not vectorized: vector version will never be profitable. missed: Loop costings may not be worthwhile. 
And subsequently generate: .L5: add w4, w4, w7 ld1w z24.s, p6/z, [x0, #1, mul vl] ld1w z23.s, p6/z, [x0, #2, mul vl] ld1w z22.s, p6/z, [x0, #3, mul vl] ld1w z29.s, p6/z, [x0] mla z26.s, p6/m, z24.s, z30.s add x0, x0, x8 mla z27.s, p6/m, z23.s, z30.s mla z28.s, p6/m, z22.s, z30.s mla z25.s, p6/m, z29.s, z30.s cmp w4, w6 bls .L5 and avoids the load and replicate if it knows it has enough vector pipes to do so. gcc/ChangeLog: PR target/121290 * config/aarch64/aarch64.cc (class aarch64_vector_costs ): Add m_loop_fully_scalar_dup. (aarch64_vector_costs::add_stmt_cost): Detect invariant inner loops. (adjust_body_cost): Adjust final costing if m_loop_fully_scalar_dup. gcc/testsuite/ChangeLog: PR target/121290 * gcc.target/aarch64/pr121290.c: New test.
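The "missed" diagnostic quoted in the AArch64 commit above boils down to one comparison, which can be sketched as follows. `never_profitable` is a hypothetical name, and the cross-multiplied form is a choice made here to avoid division; the vectorizer's own check may be formulated differently.

```cpp
#include <cassert>

// Hypothetical sketch of the quoted check: the vector loop can never win
// when (vector iteration cost / scalar iteration cost) >= VF.
// Written as a multiplication to stay in integer arithmetic.
inline bool never_profitable(unsigned vec_inside_cost,
                             unsigned scalar_iter_cost,
                             unsigned vf) {
    assert(scalar_iter_cost > 0 && vf > 0);
    return vec_inside_cost >=
           static_cast<unsigned long long>(scalar_iter_cost) * vf;
}
```

With the numbers from the dump (2000, 300, VF 4), 2000 >= 1200 holds, so the outer-loop vectorization is rejected.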
2025-08-26	[PATCH] RISC-V: Add pattern for vector-scalar single-width floating-point multiply	Paul-Antoine Arras	22	-10/+365
	This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a mult RTL instruction. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfmul.vv v1,v1,v2 After, we get only one: vfmul.vf v2,v2,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (vfmul_vf_<mode>): Add new pattern to combine vec_duplicate + vfmul.vv into vfmul.vf. * config/riscv/vector.md (@pred_<optab><mode>_scalar): Allow VLS modes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfmul. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_run.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f64.c: New test. * gcc.target/riscv/rvv/autovec/vls/floating-point-mul-2.c: Adjust scan dump. * gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c: Likewise.
2025-08-26	Fix RISC-V bootstrap	Jeff Law	1	-1/+1
	Recent changes from Kito have an unused parameter. On the assumption that he's going to likely want it as part of the API, I've simply removed the parameter's name until such time as Kito needs it. This should restore bootstrapping to the RISC-V port. Committing now rather than waiting for the CI system given bootstrap builds currently fail. * config/riscv/riscv.cc (riscv_arg_partial_bytes): Remove name from unused parameter.
2025-08-26	arm: testsuite: make gcc.target/arm/bics_3.c generate bics again	Richard Earnshaw	1	-1/+30
	The compiler is getting too smart! But this test is really intended to test that we generate BICS instead of BIC+CMP, so make the test use something that we can't subsequently fold away into a bit manipulation of a store-flag value. I've also added a couple of extra tests, so we now cover both the cases where we fold the result away and where that cannot be done. Also add a test that we don't generate a compare against 0, since that's really part of what this test is covering. gcc/testsuite: * gcc.target/arm/bics_3.c: Add some additional tests that cannot be folded to a bit manipulation.
2025-08-26	Compute vect_reduc_type off SLP node instead of stmt-info	Richard Biener	2	-13/+24
	The following changes the vect_reduc_type API to work on the SLP node. The API is only used from the aarch64 backend, so all changes are there. In particular I noticed aarch64_force_single_cycle is invoked even for scalar costing (where the flag tested isn't computed yet), I figured in scalar costing all reductions are a single cycle. * tree-vectorizer.h (vect_reduc_type): Get SLP node as argument. * config/aarch64/aarch64.cc (aarch64_sve_in_loop_reduction_latency): Take SLP node as argument and adjust. (aarch64_in_loop_reduction_latency): Likewise. (aarch64_detect_vector_stmt_subtype): Adjust. (aarch64_vector_costs::count_ops): Likewise. Treat reductions during scalar costing as single-cycle.
2025-08-26	tree-optimization/121659 - bogus swap of reduction operands	Richard Biener	2	-3/+19
	The following addresses a bogus swapping of SLP operands of a reduction operation which gets STMT_VINFO_REDUC_IDX out of sync with the SLP operand order. In fact the most obvious mistake is that we simply swap operands even on the first stmt even when there's no difference in the comparison operators (for == and != at least). But there are more latent issues that I noticed and fixed up in the process. PR tree-optimization/121659 * tree-vect-slp.cc (vect_build_slp_tree_1): Do not allow matching up comparison operators by swapping if that would disturb STMT_VINFO_REDUC_IDX. Make sure to only actually mark operands for swapping when there was a mismatch and we're not processing the first stmt. * gcc.dg/vect/pr121659.c: New testcase.
2025-08-26	Fix UBSAN issue with load-store data refactoring	Richard Biener	1	-2/+4
	The following makes sure to read from the lanes_ifn member only when necessary (and thus it was set). * tree-vect-stmts.cc (vectorizable_store): Access lanes_ifn only when VMAT_LOAD_STORE_LANES. (vectorizable_load): Likewise.
2025-08-26	Remove STMT_VINFO_REDUC_VECTYPE_IN	Richard Biener	2	-17/+5
	This was added when invariants/externals outside of SLP didn't have an easily accessible vector type. Now it's redundant so the following removes it. * tree-vectorizer.h (stmt_vec_info_::reduc_vectype_in): Remove. (STMT_VINFO_REDUC_VECTYPE_IN): Likewise. * tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Get at the input vectype via the SLP node child. (vectorizable_lane_reducing): Likewise. (vect_transform_reduction): Likewise. (vectorizable_reduction): Do not set STMT_VINFO_REDUC_VECTYPE_IN.
2025-08-26	i386: Fix up recent changes to use GFNI for rotates/shifts [PR121658]	Jakub Jelinek	3	-9/+22
	The vgf2p8affineqb_<mode><mask_name> pattern uses "register_operand" predicate for the first input operand, so using "general_operand" for the rotate operand passed to it leads to ICEs, and so does the "nonimmediate_operand" in the <insn>v16qi3 define_expand. The following patch fixes it by using "register_operand" in the former case (that pattern is TARGET_GFNI only) and using force_reg in the latter case (the pattern is TARGET_XOP || TARGET_GFNI and for XOP we can handle MEM operand). The rest of the changes are small formatting tweaks or use of const0_rtx instead of GEN_INT (0). 2025-08-26 Jakub Jelinek <jakub@redhat.com> PR target/121658 * config/i386/sse.md (<insn><mode>3 any_shift): Use const0_rtx instead of GEN_INT (0). (cond_<insn><mode> any_shift): Likewise. Formatting fix. (<insn><mode>3 any_rotate): Use register_operand predicate instead of general_operand for match_operand 1. Use const0_rtx instead of GEN_INT (0). (<insn>v16qi3 any_rotate): Use force_reg on operands[1]. Formatting fix. * config/i386/i386.cc (ix86_shift_rotate_cost): Comment formatting fixes. * gcc.target/i386/pr121658.c: New test.
2025-08-26	Daily bump.	GCC Administrator	4	-1/+259

2025-08-26	RISC-V: Add test for vec_duplicate + vmacc.vv unsigned combine with GR2VR cost 0, 1 and 15	Pan Li	16	-0/+100
	Add asm dump check and run test for vec_duplicate + vmacc.vv combine to vmacc.vx, with GR2VR costs of 0, 1 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-u8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-08-26	RISC-V: Add test for vec_duplicate + vmacc.vv signed combine with GR2VR cost 0, 1 and 15	Pan Li	19	-0/+538
	Add asm dump check and run test for vec_duplicate + vmacc.vv combine to vmacc.vx, with GR2VR costs of 0, 1 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check for vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_run.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmacc-run-1-i8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-08-26	RISC-V: Combine vec_duplicate + vmacc.vv to vmacc.vx on GR2VR cost	Pan Li	2	-0/+119
	This patch combines vec_duplicate + vmacc.vv into vmacc.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR: late-combine will take action if the cost of GR2VR is zero, and reject the combination if the GR2VR cost is greater than zero. Assume we have example code like below, GR2VR cost is 0. #define DEF_VX_TERNARY_CASE_0(T, OP_1, OP_2, NAME) \ void \ test_vx_ternary_##NAME##_##T##_case_0 (T * restrict vd, T * restrict vs2, \ T rs1, unsigned n) \ { \ for (unsigned i = 0; i < n; i++) \ vd[i] = vd[i] OP_2 vs2[i] OP_1 rs1; \ } DEF_VX_TERNARY_CASE_0(int32_t, *, +, macc) Before this patch: 11 │ beq a3,zero,.L8 12 │ vsetvli a5,zero,e32,m1,ta,ma 13 │ vmv.v.x v2,a2 ... 16 │ .L3: 17 │ vsetvli a5,a3,e32,m1,ta,ma ... 22 │ vmacc.vv v1,v2,v3 ... 25 │ bne a3,zero,.L3 After this patch: 11 │ beq a3,zero,.L8 ... 14 │ .L3: 15 │ vsetvli a5,a3,e32,m1,ta,ma ... 20 │ vmacc.vx v1,a2,v3 ... 23 │ bne a3,zero,.L3 gcc/ChangeLog: * config/riscv/vector.md (@pred_mul_plus_vx_<mode>): Add new pattern to generate vmacc rtl. (pred_macc_<mode>_scalar_undef): Ditto. * config/riscv/autovec-opt.md (*vmacc_vx_<mode>): Add new pattern to match the vmacc vx combine. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-08-26	omp-expand: Initialize fd->loop.n2 if needed for the zero iter case [PR121453]	Jakub Jelinek	2	-0/+34
	When expand_omp_for_init_counts is called from expand_omp_for_generic, zero_iter1_bb is NULL and the code always creates a new bb in which it clears fd->loop.n2 var (if it is a var), because it can dominate code with lastprivate guards that use the var. When called from other places, zero_iter1_bb is non-NULL and so we don't insert the clearing (and can't, because the same bb is used also for the non-zero iterations exit and in that case we need to preserve the iteration count). Clearing is also not necessary when e.g. outermost collapsed loop has constant non-zero number of iterations, in that case we initialize the var to something already earlier. The following patch makes sure to clear it if it hasn't been initialized yet before the first check for zero iterations. 2025-08-26 Jakub Jelinek <jakub@redhat.com> PR middle-end/121453 * omp-expand.cc (expand_omp_for_init_counts): Clear fd->loop.n2 before first zero count check if zero_iter1_bb is non-NULL upon entry and fd->loop.n2 has not been written yet. * gcc.dg/gomp/pr121453.c: New test.
2025-08-25	Add a test for PR tree-optimization/121656	H.J. Lu	1	-0/+21
	PR tree-optimization/121656 * gcc.dg/pr121656.c: New file. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-25	ctf: avoid overflow for array num elements [PR121411]	David Faust	2	-3/+32
	CTF array encoding uses uint32 for number of elements. This means there is a hard upper limit on array types which the format can represent. GCC internally was also using a uint32_t for this, which would overflow when translating from DWARF for arrays with more than UINT32_MAX elements. Use an unsigned HOST_WIDE_INT instead to fetch the array bound, and fall back to CTF_K_UNKNOWN if the array cannot be represented in CTF. PR debug/121411 gcc/ * dwarf2ctf.cc (gen_ctf_subrange_type): Use unsigned HWI for array_num_elements. Fallback to CTF_K_UNKNOWN if the array type has too many elements for CTF to represent. gcc/testsuite/ * gcc.dg/debug/ctf/ctf-array-7.c: New test.
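The overflow and the fallback can be illustrated with a small range check. `ctf_array_nelems` is a hypothetical helper, not the code in dwarf2ctf.cc; the sketch only shows the pattern of fetching the bound in a 64-bit type and refusing to narrow when it does not fit, so the caller can fall back to an unknown type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not the dwarf2ctf.cc implementation.
// Returns true and stores the element count if it fits in the 32-bit
// field; returns false otherwise so the caller can emit an "unknown"
// type instead of a silently truncated count.
inline bool ctf_array_nelems(std::uint64_t bound, std::uint32_t& out) {
    if (bound > std::numeric_limits<std::uint32_t>::max())
        return false;
    out = static_cast<std::uint32_t>(bound);
    return true;
}
```

Without the check, narrowing a count such as 2^32 + 5 to 32 bits would quietly yield 5.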
2025-08-25	forwprop: Boolify simplify_permutation	Andrew Pinski	1	-39/+34
	After the return type of remove_prop_source_from_use was changed to void, simplify_permutation only returns 1 or 0 so it can be boolified. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-forwprop.cc (simplify_permutation): Boolify. (pass_forwprop::execute): No longer handle 2 as the return from simplify_permutation. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-08-25	Forwprop: boolify forward_propagate_into_comparison	Andrew Pinski	1	-12/+5
	After changing the return type of remove_prop_source_from_use, forward_propagate_into_comparison will never return 2. So boolify forward_propagate_into_comparison. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-forwprop.cc (forward_propagate_into_comparison): Boolify. (pass_forwprop::execute): Don't handle return of 2 from forward_propagate_into_comparison. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-08-25	forwprop: Remove return type of remove_prop_source_from_use	Andrew Pinski	1	-20/+15
	Since r5-4705-ga499aac5dfa5d9, remove_prop_source_from_use has always returned false. This removes the return type of remove_prop_source_from_use and cleans up the usage of remove_prop_source_from_use. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-forwprop.cc (remove_prop_source_from_use): Remove return type. (forward_propagate_into_comparison): Update dealing with no return type of remove_prop_source_from_use. (forward_propagate_into_gimple_cond): Likewise. (simplify_permutation): Likewise. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-08-25	forwprop: Mark the old switch index for (maybe) dceing	Andrew Pinski	1	-2/+6
	While looking at this code I noticed that we don't remove the old switch index assignment if it is only used in the switch after it is modified in simplify_gimple_switch. This fixes that by marking the old switch index for the dce worklist. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-forwprop.cc (simplify_gimple_switch): Add simple_dce_worklist argument. Mark the old index when doing the replacement. (pass_forwprop::execute): Update call to simplify_gimple_switch. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-08-25	Rewrite bool loads for undefined case [PR121279]	Andrew Pinski	2	-1/+114
	Just like r16-465-gf2bb7ffe84840d8 but this time instead of a VCE there is a full-on load from a boolean. This showed up when trying to remove the extra copy in the testcase from the revision mentioned above (pr120122-1.c). So when moving loads from a boolean type from being conditional to non-conditional, the load needs to become a full load and then be cast to a bool so that the upper bits are correct. Bit-field loads will always do the truncation so they don't need to be rewritten. Non-boolean types always do the truncation too. What we do is wrap the original reference with a VCE, which causes the full load, and then do a cast to do the truncation. Using fold_build1 with VCE will do the correct thing if there is a secondary VCE and will also fold if this was just a plain MEM_REF, so there is no need to handle those 2 cases specially either. Changes since v1: * v2: Use VIEW_CONVERT_EXPR instead of doing a manual load. Accept all non mode precision loads rather than just boolean ones. * v3: Move back to checking boolean type. Don't handle BIT_FIELD_REF. Add asserts for IMAG/REAL_PART_EXPR. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/121279 gcc/ChangeLog: * gimple-fold.cc (gimple_needing_rewrite_undefined): Return true for non mode precision boolean loads. (rewrite_to_defined_unconditional): Handle non mode precision loads. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr121279-1.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-08-25	LIM: Manually put uninit decl into ssa	Andrew Pinski	1	-0/+1
	When working on PR121279, I noticed that lim would create an uninitialized decl and mark it with suppression for the uninitialization warning. This is fine, but then into ssa would just call get_or_create_ssa_default_def on that new decl, which could in theory take some extra compile time to figure that out. Plus when doing the rewriting for undefinedness, there would now be a VCE around the decl. This means the ssa name is kept around and not propagated in some cases. So instead this patch manually calls get_or_create_ssa_default_def to get the "uninitialized" ssa name for this decl, so the decl no longer needs the rewrite into ssa or the rewrite for undefinedness. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-loop-im.cc (execute_sm): Call get_or_create_ssa_default_def for the new uninitialized decl. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-08-25	xtensa: Make use of compact insn definition syntax for insns that have multiple alternatives	Takayuki 'January June' Suwa	1	-174/+151
	The use of compact syntax makes the relationship between asm output, operand constraints, and insn attributes easier to understand and modify, especially for "mov<mode>_internal". gcc/ChangeLog: * config/xtensa/xtensa.md (addsi3, <u>mulhisi3, andsi3, zero_extend<mode>si2, extendhisi2_internal, movsi_internal, movhi_internal, movqi_internal, movsf_internal, ashlsi3_internal, ashrsi3, lshrsi3, rotlsi3, rotrsi3): Rewrite in compact syntax.
2025-08-25	xtensa: Simplify "*masktrue_const_bitcmpl" insn pattern	Takayuki 'January June' Suwa	1	-1/+1
	gcc/ChangeLog: * config/xtensa/xtensa.md (The auxiliary define_split for *masktrue_const_bitcmpl): Use a more concise function call, i.e., (1 << GET_MODE_BITSIZE (mode)) - 1 is equivalent to GET_MODE_MASK (mode).
2025-08-25	xtensa: Simplify "zero_extend[hq]isi2" insn patterns	Takayuki 'January June' Suwa	1	-15/+5
	gcc/ChangeLog: * config/xtensa/xtensa.md (mode_bits): New mode attribute. (zero_extend<mode>si2): Use the appropriate mode iterator and attribute to unify "zero_extend[hq]isi2" to this description.
2025-08-25	c++: Implement C++ CWG3048 - Empty destructuring expansion statements	Jakub Jelinek	4	-9/+45
	The following patch implements the proposed resolution of https://cplusplus.github.io/CWG/issues/3048.html Instead of rejecting structured binding size it just builds a normal decl rather than structured binding declaration. 2025-08-25 Jakub Jelinek <jakub@redhat.com> * pt.cc (finish_expansion_stmt): Implement C++ CWG3048 - Empty destructuring expansion statements. Don't error for destructuring expansion stmts if sz is 0, don't call fit_decomposition_lang_decl if n is 0 and pass NULL rather than this_decomp to cp_finish_decl. * g++.dg/cpp26/expansion-stmt15.C: Don't expect error on destructuring expansion stmts with structured binding size 0. * g++.dg/cpp26/expansion-stmt21.C: New test. * g++.dg/cpp26/expansion-stmt22.C: New test.
2025-08-25	c++: Check for *jump_target earlier in cxx_bind_parameters_in_call [PR121601]	Jakub Jelinek	2	-2/+21
	The following testcase ICEs, because the /* Check we aren't dereferencing a null pointer when calling a non-static member function, which is undefined behaviour. */ if (i == 0 && DECL_OBJECT_MEMBER_FUNCTION_P (fun) && integer_zerop (arg) /* But ignore calls from within compiler-generated code, to handle cases like lambda function pointer conversion operator thunks which pass NULL as the 'this' pointer. */ && !(TREE_CODE (t) == CALL_EXPR && CALL_FROM_THUNK_P (t))) { if (!ctx->quiet) error_at (cp_expr_loc_or_input_loc (x), "dereferencing a null pointer"); non_constant_p = true; } checking is done before testing if (jump_target). Especially when throws (jump_target), arg can be (and is on this testcase) NULL_TREE, so calling integer_zerop on it ICEs. Fixed by moving the if (jump_target) test earlier. 2025-08-25 Jakub Jelinek <jakub@redhat.com> PR c++/121601 * constexpr.cc (cxx_bind_parameters_in_call): Move break if jump_target before the check for null this object pointer. * g++.dg/cpp26/constexpr-eh16.C: New test.
2025-08-25	tree-optimization/121638 - missed SLP discovery of live induction	Richard Biener	2	-6/+82
	The following fixes a missed SLP discovery of a live induction. Our pattern matching of those fails because of the PR81529 fix which I think was misguided and should now no longer be relevant. So this essentially reverts that fix. I have added a GIMPLE testcase to increase the chance the particular IL is preserved through the future. This shows that how we make some IVs live because of early-break isn't quite correct, so I had to preserve a hack here. Hopefully to be investigated at some point. PR tree-optimization/121638 * tree-vect-stmts.cc (process_use): Do not make induction PHI backedge values relevant. * gcc.dg/vect/pr121638.c: New testcase.
2025-08-25	targhooks: i386: rename TAG_SIZE to TAG_BITSIZE	Indu Bhagat	7	-11/+11
	gcc/Changelog: * asan.h (HWASAN_TAG_SIZE): Use targetm.memtag.tag_bitsize. * config/i386/i386.cc (ix86_memtag_tag_size): Rename to ix86_memtag_tag_bitsize. (TARGET_MEMTAG_TAG_SIZE): Renamed to TARGET_MEMTAG_TAG_BITSIZE. * doc/tm.texi (TARGET_MEMTAG_TAG_SIZE): Likewise. * doc/tm.texi.in (TARGET_MEMTAG_TAG_SIZE): Likewise. * target.def (tag_size): Rename to tag_bitsize. * targhooks.cc (default_memtag_tag_size): Rename to default_memtag_tag_bitsize. * targhooks.h (default_memtag_tag_size): Likewise. Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com> Co-authored-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
2025-08-25	RISC-V: Replace deprecated FUNCTION_VALUE/LIBCALL_VALUE macros with target hooks	Kito Cheng	3	-21/+36
	The FUNCTION_VALUE and LIBCALL_VALUE macros are deprecated in favor of the TARGET_FUNCTION_VALUE and TARGET_LIBCALL_VALUE target hooks. This patch replaces the macro definitions with proper target hook implementations. This change is also a preparatory step for VLS calling convention support, which will require additional information that is more easily handled through the target hook interface. gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_init_cumulative_args): Change fntype parameter from tree to const_tree. * config/riscv/riscv.cc (riscv_init_cumulative_args): Likewise. (riscv_function_value): Replace with new implementation that conforms to TARGET_FUNCTION_VALUE hook signature. (riscv_libcall_value): New function implementing TARGET_LIBCALL_VALUE. (TARGET_FUNCTION_VALUE): Define. (TARGET_LIBCALL_VALUE): Define. * config/riscv/riscv.h (FUNCTION_VALUE): Remove. (LIBCALL_VALUE): Remove.
2025-08-25	Use x86 GFNI for vectorized constant byte shifts/rotates	Andi Kleen	11	-14/+595
	The GFNI AVX gf2p8affineqb instruction can be used to implement vectorized byte shifts or rotates. This patch uses them to implement shift and rotate patterns to allow the vectorizer to use them. Previously AVX couldn't do rotates (except with XOP) and had to handle 8 bit shifts with a half throughput 16 bit shift. This is only implemented for constant shifts. In theory it could be used with a lookup table for variable shifts, but it's unclear if it's worth it. The vectorizer cost model could be improved, but seems to work for now. It doesn't model the true latencies of the instructions. Also it doesn't account for the memory loading of the mask, assuming that for a loop it will be loaded outside the loop. The instructions would also support more complex patterns (e.g. arbitrary bit movement or inversions), so some of the tricks applied to ternlog could be applied here too to collapse more code. It's trickier because the input patterns can be much longer since they can apply to every bit individually. I didn't attempt any of this. There's currently no test case for the masked/cond_ variants, they seem to be difficult to trigger with the vectorizer. Suggestions for a test case for them welcome. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_vgf2p8affine_shift_matrix): New function to lookup shift/rotate matrixes for gf2p8affine. * config/i386/i386-protos.h (ix86_vgf2p8affine_shift_matrix): Declare new function. * config/i386/i386.cc (ix86_shift_rotate_cost): Add cost model for shift/rotate implemented using gf2p8affine. * config/i386/sse.md (VI1_AVX512_3264): New mode iterator. (<insn><mode>3): Add GFNI case for shift patterns. (cond_<insn><mode>3): New pattern. (<insn><mode>3<mask_name>): Ditto. (<insn>v16qi): New rotate pattern to handle XOP V16QI case and GFNI. (rotl<mode>3, rotr<mode>3): Exclude V16QI case. 
gcc/testsuite/ChangeLog: * gcc.target/i386/shift-gf2p8affine-1.c: New test * gcc.target/i386/shift-gf2p8affine-2.c: New test * gcc.target/i386/shift-gf2p8affine-3.c: New test * gcc.target/i386/shift-v16qi-4.c: New test * gcc.target/i386/shift-gf2p8affine-5.c: New test * gcc.target/i386/shift-gf2p8affine-6.c: New test * gcc.target/i386/shift-gf2p8affine-7.c: New test
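Why an affine transform over GF(2) can express a constant byte shift or rotate can be sketched with a scalar model in C. This is an illustration under stated assumptions: the row layout below (row i is a bitmask selecting the input bits that are XORed into output bit i) is chosen for readability and is not the operand encoding of gf2p8affineqb, and none of these functions come from the patch.

```c
#include <stdint.h>

/* Scalar model of an 8x8 bit-matrix applied to one byte over GF(2).
   Output bit i is the parity (XOR) of the input bits selected by
   rows[i].  The layout is an assumption made for this sketch.  */
uint8_t
gf2_affine_byte (const uint8_t rows[8], uint8_t x)
{
  uint8_t out = 0;
  for (int i = 0; i < 8; i++)
    {
      uint8_t t = rows[i] & x;
      t ^= t >> 4;
      t ^= t >> 2;
      t ^= t >> 1;			/* Parity of t is now in bit 0.  */
      out |= (uint8_t) ((t & 1) << i);
    }
  return out;
}

/* Matrix for a left shift by K (0 <= K < 8): output bit i copies input
   bit i - K, and the low K output bits select nothing, so they are 0.  */
void
make_shl_matrix (uint8_t rows[8], unsigned k)
{
  for (unsigned i = 0; i < 8; i++)
    rows[i] = i >= k ? (uint8_t) (1u << (i - k)) : 0;
}

/* Matrix for a left rotate by K: output bit i copies input bit
   (i - K) mod 8, so every row selects exactly one input bit.  */
void
make_rotl_matrix (uint8_t rows[8], unsigned k)
{
  for (unsigned i = 0; i < 8; i++)
    rows[i] = (uint8_t) (1u << ((i + 8 - k % 8) % 8));
}
```

Because the matrix depends only on the shift count, a constant shift needs a single constant matrix operand, which fits the note above that only constant shifts are handled.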
2025-08-25	LoongArch: Fix ICE in highway-1.3.0 testsuite [PR121634]	Xi Ruoyao	2	-1/+16
	I can't believe I made such a stupid pasto and the regression test didn't detect anything wrong. PR target/121634 gcc/ * config/loongarch/simd.md (simd_maddw_evod_<mode>_<su>): Use WVEC_HALF instead of WVEC for the mode of the sign_extend for the rhs of multiplication. gcc/testsuite/ * gcc.target/loongarch/pr121634.c: New test.
2025-08-24	Fix invalid right shift count with recent ifcvt changes	Jeff Law	2	-6/+10
	I got too clever trying to simplify the right shift computation in my recent ifcvt patch. Interestingly enough, I haven't seen anything but the Linaro CI configuration actually trip the problem, though the code is clearly wrong. The problem I was trying to avoid was the leading zeros when calling clz on a HWI when the real object is just say 32 bits. The net is we get a right shift count of "2" when we really wanted a right shift count of 30. That causes the execution aspect of bics_3 to fail. The scan failures are due to creating slightly more efficient code. The new code sequences don't need to use conditional execution for selection and thus we can use bic rather than bics which requires a twiddle in the scan. I reviewed recent bug reports and haven't seen one for this issue. So no new testcase as this is covered by the armv7 testsuite in the right configuration. Bootstrapped and regression tested on x86_64, also verified it fixes the Linaro reported CI failure and verified the crosses are still happy. Pushing to the trunk. gcc/ * ifcvt.cc (noce_try_sign_bit_splat): Fix right shift computation. gcc/testsuite/ * gcc.target/arm/bics_3.c: Adjust expected output
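The leading-zeros pitfall mentioned above can be shown in isolation. The sketch below is not the code from ifcvt.cc: `clz64` and `clz_in_object` are hypothetical helpers, and `uint64_t` stands in for a HOST_WIDE_INT that holds a narrower value.

```c
#include <stdint.h>

/* Count leading zero bits of X viewed as a 64-bit container.
   Returns 64 for X == 0.  */
unsigned
clz64 (uint64_t x)
{
  unsigned n = 0;
  for (uint64_t bit = UINT64_C (1) << 63; bit != 0 && (x & bit) == 0;
       bit >>= 1)
    n++;
  return n;
}

/* Leading zeros of X when only the low OBJECT_BITS bits of the container
   belong to the real object.  Assumes 1 <= OBJECT_BITS <= 64 and that X
   has no bits set above OBJECT_BITS.  The container's extra high zero
   bits are subtracted out.  */
unsigned
clz_in_object (uint64_t x, unsigned object_bits)
{
  return clz64 (x) - (64 - object_bits);
}
```

For the value 3, `clz64` reports 62 leading zeros, while the same value viewed as a 32-bit object has 30. Any shift count derived from a leading-zero count has to account for that difference between container width and object width.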
2025-08-25	Daily bump.	GCC Administrator	2	-1/+5

2025-08-24	Update gcc de.po	Joseph Myers	1	-331/+332
	* de.po: Update.
2025-08-24	i386: fix ChangeLog entry	Sam James	1	-1/+1
	Fix a typo in the ChangeLog entry from r16-3355-g96a291c4bb0b8a.
2025-08-24	Daily bump.	GCC Administrator	4	-1/+55

2025-08-23	c++: Fix greater-than operator in braced-init-lists [PR116928]	Eczbek	2	-0/+8
	PR c++/116928 gcc/cp/ChangeLog: * parser.cc (cp_parser_braced_list): Set greater_than_is_operator_p. gcc/testsuite/ChangeLog: * g++.dg/parse/template33.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
2025-08-23	x86: Compile noplt-(g\|l)d-1.c with -mtls-dialect=gnu	H.J. Lu	2	-2/+2
	Compile noplt-gd-1.c and noplt-ld-1.c with -mtls-dialect=gnu to support the --with-tls=gnu2 configure option since they scan the assembly output for the __tls_get_addr call which is generated by -mtls-dialect=gnu. PR target/120933 * gcc.target/i386/noplt-gd-1.c (dg-options): Add -mtls-dialect=gnu. * gcc.target/i386/noplt-ld-1.c (dg-options): Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>