2025-08-28  libstdc++: Provide helpers to interoperate between __cmp_cat::_Ord and ordering types.  (Tomasz Kamiński, 1 file, -44/+41)

This patch adds two new internal helpers for ordering types:
* __cmp_cat::__ord to retrieve an internal _Ord value,
* __cmp_cat::__make<Ordering> to create an ordering from an _Ord value.

Conversions between ordering types are now handled by __cmp_cat::__make. As a result, ordering types no longer need to befriend each other, only the new helpers.

The __fp_weak_ordering implementation has also been simplified by:
* using the new helpers to convert partial_ordering to weak_ordering,
* using the strong_ordering to weak_ordering conversion operator for the __isnan_sign comparison,
* removing the unused __cat local variable.

Finally, the _Ncmp enum is removed, and the unordered enumerator is added to the existing _Ord enum.

libstdc++-v3/ChangeLog:

	* libsupc++/compare (__cmp_cat::_Ord): Add unordered enumerator.
	(__cmp_cat::_Ncmp): Remove.
	(__cmp_cat::__ord, __cmp_cat::__make): Define.
	(partial_ordering::partial_ordering(__cmp_cat::_Ncmp)): Remove.
	(operator<=>(__cmp_cat::__unspec, partial_ordering))
	(partial_ordering::unordered): Replace _Ncmp with _Ord.
	(std::partial_ordering, std::weak_ordering, std::strong_ordering):
	Befriend __ord and __make helpers, remove friend declarations for
	other orderings.
	(__compare::__fp_weak_ordering): Remove unused __cat variable.
	Simplify ordering conversions.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-08-28  c++/modules: Add explanatory note for incomplete types with definition in different module [PR119844]  (Nathaniel Shead, 4 files, -4/+186)

The confusion in the PR arose because the definition of 'User' in a separate named module did not provide an implementation for the forward-declaration in the global module. This seems likely to be a common mistake while people are transitioning to modules, so this patch adds an explanatory note.

While I was looking at this I also noticed that the existing handling of partial specialisations for this note was wrong (we pointed at the primary template declaration rather than the relevant partial spec), so this patch fixes that up, and also gives a more precise error message for using a template other than by self-reference while it's being defined.

	PR c++/119844

gcc/cp/ChangeLog:

	* typeck2.cc (cxx_incomplete_type_inform): Add explanation when a
	similar type is complete but attached to a different module.  Also
	fix handling of partial specs and templates.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/pr119844_a.C: New test.
	* g++.dg/modules/pr119844_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2025-08-28  PR modula2/121629: adding third party modules  (Gaius Mulley, 9 files, -94/+384)

This patch makes it easier to add third party modules. cc1gm2 now appends the search directory prefix/include/m2 to the search path for non-dialect-specific modules. Prior to this it appended the dialect-specific subdirectories {m2pim,m2iso,m2log,m2min} with the appropriate dialect pathname. The patch also includes a new option -fm2-pathname-root=prefix which allows additional prefix/m2 directories to be searched before the default.

gcc/ChangeLog:

	PR modula2/121629
	* doc/gm2.texi (Module Search Path): New section.
	(Compiler options): New option -fm2-pathname-root=.
	New option -fm2-pathname-rootI.

gcc/m2/ChangeLog:

	PR modula2/121629
	* gm2-compiler/PathName.mod: Add copyright notice.
	* gm2-lang.cc (named_path): Add field lib_root.
	(push_back_Ipath): Set lib_root false.
	(push_back_lib_root): New function.
	(get_dir_sep_size): Ditto.
	(add_path_component): Ditto.
	(add_one_import_path): Ditto.
	(add_non_dialect_specific_path): Ditto.
	(foreach_lib_gen_import_path): Ditto.
	(get_module_source_dir): Ditto.
	(add_default_include_paths): Ditto.
	(assign_flibs): Ditto.
	(m2_pathname_root): Ditto.
	(add_m2_import_paths): Remove function.
	(gm2_langhook_post_options): Call assign_flibs.
	Check np.lib_root and call foreach_lib_gen_import_path.
	Replace call to add_m2_import_paths with a call to
	add_default_include_paths.
	(gm2_langhook_handle_option): Add case OPT_fm2_pathname_rootI_.
	* gm2spec.cc (named_path): Add field lib_root.
	(push_back_Ipath): Set lib_root false.
	(push_back_lib_root): New function.
	(add_m2_I_path): Add OPT_fm2_pathname_rootI_ option if np.lib_root.
	(lang_specific_driver): Add case OPT_fm2_pathname_root_.
	* lang.opt (fm2-pathname-root=): New option.
	(fm2-pathname-rootI=): Ditto.

gcc/testsuite/ChangeLog:

	PR modula2/121629
	* gm2/switches/pathnameroot/pass/switches-pathnameroot-pass.exp: New test.
	* gm2/switches/pathnameroot/pass/test.mod: New test.
	* gm2/switches/pathnameroot/pass/testlib/m2/foo.def: New test.
	* gm2/switches/pathnameroot/pass/testlib/m2/foo.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2025-08-28  [gcn] gcc/configure.ac + install.texi - changes to detect HAVE_AS_LEB128 [PR119367]  (Tobias Burnus, 3 files, -0/+9)

The llvm-mc assembler by default assembles to another assembly file and not to an ELF binary; that usually does not matter – but for the LEB128 check, additionally, the resulting binary is checked. Hence, when using llvm-mc as target assembler for amdgcn-*-*, we better add the "--filetype=obj -triple=amdgcn--amdhsa" flags. The current patch does so unconditionally, assuming that llvm-mc is always used.

Additionally, the resulting ELF file is checked, which requires an ELF reader such as objdump. This commit adds llvm-objdump to the build documentation for amdgcn, although, e.g., Binutils' 'objdump' would also do - as long as either amdgcn-amdhsa-objdump or amdgcn-amdhsa/bin/objdump is found during the amdgcn cross build.

gcc/ChangeLog:

	PR debug/119367
	* acinclude.m4 (gcc_GAS_FLAGS): For gcn, use
	"--filetype=obj -triple=amdgcn--amdhsa", if supported.
	* configure: Regenerate.
	* doc/install.texi (amdgcn-*-*): Also add llvm-objdump to the
	list of to-be-copied files.
2025-08-28  c++: Fix auto return type deduction with expansion statements [PR121583]  (Jakub Jelinek, 3 files, -1/+28)

The following testcase ICEs during expansion, because cfun->returns_struct wasn't cleared, despite auto being deduced to int. The problem is that check_return_type -> apply_deduced_return_type is called when parsing the expansion stmt body; at that time processing_template_decl is non-zero and apply_deduced_return_type in that case doesn't do the

  if (function *fun = DECL_STRUCT_FUNCTION (fco))
    {
      bool aggr = aggregate_value_p (result, fco);
#ifdef PCC_STATIC_STRUCT_RETURN
      fun->returns_pcc_struct = aggr;
#endif
      fun->returns_struct = aggr;
    }

part. My assumption is that !processing_template_decl in that case is used in the sense "the fco function is not a function template"; for function templates there is no reason to bother with fun->returns*struct, nothing will care about that. When returning a type dependent expression in the expansion stmt body, apply_deduced_return_type just won't be called during parsing, but when instantiating the body, and all will be fine. But when returning a non-type-dependent expression, while check_return_type will be called again during instantiation of the body, as the return type is no longer auto in that case apply_deduced_return_type will not be called again and so nothing will fix up fun->returns*struct.

The following patch fixes that by using a !uses_template_parms (fco) check instead of !processing_template_decl.

2025-08-28  Jakub Jelinek  <jakub@redhat.com>

	PR c++/121583
	* semantics.cc (apply_deduced_return_type): Adjust
	fun->returns*_struct when !uses_template_parms (fco) instead of
	when !processing_template_decl.
	* g++.dg/cpp26/expansion-stmt23.C: New test.
	* g++.dg/cpp26/expansion-stmt24.C: New test.
2025-08-28  c++: Fix ICE with parameter uses in expansion stmts [PR121575]  (Jakub Jelinek, 2 files, -0/+64)

The following testcase shows an ICE when a parameter of a non-template function is referenced in an expansion stmt body. tsubst_expr in that case assumes that either the PARM_DECL has a registered local specialization, or is the 'this' argument, or it is in an unevaluated context. Parameters are always defined outside of the expansion statement for-range-declaration or body, so for the instantiation of the body outside of templates they should always map to themselves. It could be fixed by registering local self-specializations for all the function parameters, but just handling it in tsubst_expr seems to be easier and less costly. Some PARM_DECLs, e.g. from concepts, have NULL DECL_CONTEXT; those are handled like before (and we assert it is an unevaluated operand), for others this checks if the PARM_DECL is from a non-template and in that case it will just return t.

2025-08-28  Jakub Jelinek  <jakub@redhat.com>
	    Jason Merrill  <jason@redhat.com>

	PR c++/121575
	* pt.cc (tsubst_expr) <case PARM_DECL>: If DECL_CONTEXT (t) isn't
	a template return t for PARM_DECLs without local specialization.
	* g++.dg/cpp26/expansion-stmt20.C: New test.
2025-08-28  Avoid mult pattern if that will break reduction constraints  (Richard Biener, 1 file, -0/+29)

synth-mult introduces multiple uses of a reduction variable in some cases, which will ultimately fail vectorization (or ICE with a pending change). So avoid applying the pattern in such cases.

	* tree-vect-patterns.cc (vect_synth_mult_by_constant): Avoid
	in cases that introduce multiple uses of reduction operands.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>
2025-08-28  The divmod pattern will break reduction constraints  (Richard Biener, 1 file, -1/+3)

When we apply a divmod pattern this will break reductions by introducing multiple uses of the reduction var, so avoid this pattern in reductions.

	* tree-vect-patterns.cc (vect_recog_divmod_pattern): Avoid
	for stmts participating in a reduction.
2025-08-28  configure: Add readelf fallback for HAVE_AS_ULEB128 test [PR119367]  (Jakub Jelinek, 2 files, -4/+14)

The following patch adds a readelf fallback if neither objdump nor otool exists. All of GNU binutils readelf, eu-readelf and llvm-readelf can handle it with those options.

2025-08-28  Jakub Jelinek  <jakub@redhat.com>

	PR debug/119367
	* configure.ac (gcc_cv_as_leb128): Add fallback using readelf.
	Grammar fix in comment.
	* configure: Regenerate.
2025-08-28  dwarf2out: Use DW_LNS_advance_pc instead of DW_LNS_fixed_advance_pc if possible [PR119367]  (Jakub Jelinek, 1 file, -3/+19)

In the usual case we use .loc directives and don't emit the line table manually, and the assembler usually uses DW_LNS_advance_pc, which has a uleb128 argument and in most cases will have just a single byte operand. But if we do emit it for whatever reason (old or buggy assembler or the -gno-as-loc{,view}-support option), we use DW_LNS_fixed_advance_pc instead, which has a fixed 2 byte operand. That is both wasteful in the usual case of very small advances, and more importantly will just result in assembler errors if we need to advance over more than 65535 bytes.

The following patch uses DW_LNS_advance_pc instead if the assembler supports the .uleb128 directive with a difference of two labels in the same section. This is only possible if Minimum Instruction Length in the .debug_line header is 1 (otherwise the DW_LNS_advance_pc operand is multiplied by that value and the DW_LNS_fixed_advance_pc operand is not), but we emit 1 for that on all targets.

Looking at dwarf2out.o (from dwarf2out.cc with this patch) compiled with compilers before/after this change with additional -fpic -gno-as-loc{,view}-support options, I see the .debug_line section shrunk from 878067 bytes to 773381 bytes, a shrink by 12%. Admittedly gas generated .debug_line is even smaller, 501374 bytes (with -fpic and without -gno-as-loc{,view}-support options).

2025-08-28  Jakub Jelinek  <jakub@redhat.com>

	PR debug/119367
	* dwarf2out.cc (output_one_line_info_table) <case LI_adv_address>:
	If HAVE_AS_LEB128, use DW_LNS_advance_pc with
	dw2_asm_output_delta_uleb128 instead of DW_LNS_fixed_advance_pc
	with dw2_asm_output_delta.
2025-08-28  Fortran: Constructors with PDT components did not work [PR82843]  (Paul Thomas, 3 files, -0/+66)

2025-08-28  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
	PR fortran/82843
	* intrinsic.cc (gfc_convert_type_warn): If the 'from_ts' is a PDT
	instance, copy the derived type to the target ts.
	* resolve.cc (gfc_resolve_ref): A PDT component in a component
	reference can be that of the pdt_template.  Unconditionally use
	the component of the PDT instance to ensure that the backend_decl
	is set during translation.  Likewise, if a component is encountered
	that is a PDT template type, use the component parameters to
	convert to the correct PDT instance.

gcc/testsuite/
	PR fortran/82843
	* gfortran.dg/pdt_40.f03: New test.
2025-08-28  Fortran: Implement correct form of PDT constructors [PR82205]  (Paul Thomas, 7 files, -17/+111)

2025-08-28  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
	PR fortran/82205
	* decl.cc (gfc_get_pdt_instance): Copy the default initializer
	for components that are not PDT parameters or parameterized.  If
	any component is a pointer or allocatable set the attributes
	'pointer_comp' or 'alloc_comp' of the new PDT instance.
	* primary.cc (gfc_match_rvalue): Implement the correct form of
	PDT constructors with 'name (type parms)(component values)'.
	* trans-array.cc (structure_alloc_comps): Apply scalar default
	initializers.  Array initializers await the coming change in PDT
	representation.
	* trans-io.cc (transfer_expr): Do not output the type parms of
	a PDT in list directed output.

gcc/testsuite/
	PR fortran/82205
	* gfortran.dg/pdt_22.f03: Use the correct form for PDT constructors.
	* gfortran.dg/pdt_23.f03: Likewise.
	* gfortran.dg/pdt_3.f03: Likewise.
2025-08-28  Daily bump.  (GCC Administrator, 6 files, -1/+147)
2025-08-27  Remove xfail marker on RISC-V test  (Jeff Law, 1 file, -3/+3)

So yet another testsuite hygiene patch, this time turning XPASS -> PASS. My tester treats those cases the same, so I didn't get notified that nozicond-2.c was passing after some recent changes. This removes the xfail marker on that test and thus the test is expected to pass now.

Pushing to the trunk momentarily.

gcc/testsuite/
	* gcc.target/riscv/nozicond-2.c: Remove xfails.
2025-08-27  Fortran: H edit descriptor error with -std=f95  (Jerry DeLisle, 11 files, -27/+32)

	PR fortran/114611

gcc/fortran/ChangeLog:

	* io.cc: Issue an error on use of the H descriptor in a format
	with -std=f95 or higher.  Otherwise, issue a warning.

gcc/testsuite/ChangeLog:

	* gfortran.dg/aliasing_dummy_1.f90: Accommodate errors and
	warnings as needed.
	* gfortran.dg/eoshift_8.f90: Likewise.
	* gfortran.dg/g77/f77-edit-h-out.f: Likewise.
	* gfortran.dg/hollerith_1.f90: Likewise.
	* gfortran.dg/io_constraints_1.f90: Likewise.
	* gfortran.dg/io_constraints_2.f90: Likewise.
	* gfortran.dg/longline.f: Likewise.
	* gfortran.dg/pr20086.f90: Likewise.
	* gfortran.dg/unused_artificial_dummies_1.f90: Likewise.
	* gfortran.dg/x_slash_1.f: Likewise.
2025-08-27  ifcvt: fix factor_out_operators (again) [PR121695]  (Andrew Pinski, 2 files, -1/+26)

r16-2648-gaebbc90d8c7c70 had a copy and pasto where the second statement was supposed to be setting operand 1 of the phi but it was setting operand 0 instead. This fixes the typo.

Pushed as obvious after a quick build test for x86_64-linux-gnu.

	PR tree-optimization/121695

gcc/ChangeLog:

	* tree-if-conv.cc (factor_out_operators): Fix typo in assignment
	of the phi.

gcc/testsuite/ChangeLog:

	* gcc.dg/torture/pr121695-1.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-08-27  RISC-V: testsuite: Fix vf_vfmul and vf_vfrdiv  (Paul-Antoine Arras, 3 files, -9/+1)

Fix the type and remove useless DejaGnu directives.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f64.c: Fix type.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f32.c: Remove
	useless dg directives.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f64.c: Likewise.
2025-08-27  libstdc++: Use _M_reverse to reverse partial_ordering using operator<=>  (Tomasz Kamiński, 1 file, -6/+1)

The patch r16-3414-gfcb3009a32dc33 changed the representation of unordered to optimize reversing of the order, but it did not update the implementation of the reversing operator<=>(0, partial_order).

libstdc++-v3/ChangeLog:

	* libsupc++/compare (operator<=>(__cmp_cat::__unspec,
	partial_ordering)): Implement using _M_reverse.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-08-27  libstdc++: Move tai_- and gps_clock::now impls out of ABI  (Nathan Myers, 5 files, -20/+74)

This patch moves std::tai_clock::now() and std::gps_clock::now() definitions from header inlines to static members invoked via a normal function call, in service of stabilizing the C++20 ABI. It also changes #if guards to mention the actual __cpp_lib_* feature gated, not just the language version, for clarity. New global function symbols std::chrono::tai_clock::now and std::chrono::gps_clock::now are exported.

libstdc++-v3/ChangeLog:

	* include/std/chrono (gps_clock::now, tai_clock::now): Remove
	inline definitions.
	* src/c++20/clock.cc (gps_clock::now, tai_clock::now): New file
	for out-of-line now() impls.
	* src/c++20/Makefile.am: Mention clock.cc.
	* src/c++20/Makefile.in: Regenerate.
	* config/abi/pre/gnu.ver: Add mangled now() symbols.
2025-08-27  Remove dead code  (Richard Biener, 1 file, -2/+0)

The following removes trivially dead code.

	* tree-vect-loop.cc (vect_transform_cycle_phi): Remove unused
	reduc_stmt_info.
2025-08-27  libsupc++: Change _Unordered comparison value to minimum value of signed char.  (Tomasz Kamiński, 2 files, -8/+20)

For any minimum value of a signed type, its negation (with wraparound) results in the same value, behaving like zero. Representing the unordered result with this minimum value, along with 0 for equal, 1 for greater, and -1 for less in partial_ordering, allows its value to be reversed using unary negation. The operator<=(partial_ordering, 0) now checks if the reversed value is non-negative. This works correctly because the unordered value remains unchanged by the reversal and thus negative.

libstdc++-v3/ChangeLog:

	* libsupc++/compare (_Ncmp::_Unordered): Rename and change the
	value to the minimum value of signed char.
	(_Ncmp::unordered): Renamed from _Unordered, as the name is
	reserved by partial_ordering::unordered.
	(partial_ordering::_M_reverse()): Define.
	(operator<=(partial_ordering, __cmp_cat::__unspec))
	(operator>=(__cmp_cat::__unspec, partial_ordering)): Implemented
	in terms of negated _M_value.
	(operator>=(partial_ordering, __cmp_cat::__unspec))
	(operator<=(__cmp_cat::__unspec, partial_ordering)): Directly
	compare _M_value, as the unordered value is negative.
	(partial_ordering::unordered): Handle _Ncmp::unordered rename.
	* python/libstdcxx/v6/printers.py: Add -128 as integer value for
	unordered, keeping 2 to preserve backward compatibility.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-08-27  c++: Fix up cpp_warn on __STDCPP_FLOAT*_T__ [PR121520]  (Jakub Jelinek, 1 file, -8/+8)

I got the cpp_warn on __STDCPP_FLOAT*_T__ if we aren't predefining those wrong, so e.g. on powerpc64le we don't diagnose #undef __STDCPP_FLOAT16_T__. I've added it as an else if on the

  if (c_dialect_cxx () && cxx_dialect > cxx20 && !floatn_nx_types[i].extended)

condition, which means cpp_warn is called in case a target supports some extended type like _Float32x; cpp_warn is called on __STDCPP_FLOAT32_T__ (where, when it supported _Float32 as well, it did cpp_define_warn (pfile, "__STDCPP_FLOAT32_T__=1") earlier). On targets where the types aren't supported, the earlier

  if (FLOATN_NX_TYPE_NODE (i) == NULL_TREE)
    continue;

path is taken. This patch fixes it to cpp_warn on the non-extended types for C++23 if the target doesn't support them, and cpp_define_warn as before if it does.

2025-08-27  Jakub Jelinek  <jakub@redhat.com>

	PR target/121520
	* c-cppbuiltin.cc (c_cpp_builtins): Properly call cpp_warn for
	__STDCPP_FLOAT<NN>_T__ if FLOATN_NX_TYPE_NODE (i) is NULL for
	C++23 for non-extended types and don't call cpp_warn for extended
	types.
2025-08-27  tree-optimization/121686 - failed SLP discovery for live recurrence  (Richard Biener, 2 files, -3/+34)

The following adjusts the SLP build for only-live stmts to not only consider vect_induction_def and vect_internal_def, but instead consider all defs that are not part of a reduction, specifically in this case a recurrence def. This is also a missed optimization on the gcc-15 branch (but IMO a very minor one).

	PR tree-optimization/121686
	* tree-vect-slp.cc (vect_analyze_slp): Consider all only-live
	non-reduction defs for discovery.
	* gcc.dg/vect/pr121686.c: New testcase.
2025-08-26  testsuite: Fix unprotected-allocas-1.c at -O3 [PR121684]  (Andrew Pinski, 1 file, -2/+2)

The problem here is that after r16-101, the 2 functions containing alloca/VLA start to be cloned, and then the un-VLA transformation happens in using_vararray, so this is no longer testing what it should be testing. The obvious fix is to mark using_vararray and using_alloca as noclone too.

Pushed as obvious after a quick test to make sure it is now working.

gcc/testsuite/ChangeLog:

	PR testsuite/121684
	* c-c++-common/hwasan/unprotected-allocas-0.c: Mark using_vararray
	and using_alloca as noclone too.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-08-27  libstdc++: Reduce chances of object aliasing for function wrapper.  (Tomasz Kamiński, 3 files, -10/+50)

Previously, an empty functor (EmptyIdFunc) stored inside a std::move_only_function being the first member of a Composite class could have the same address as a base of the EmptyIdFunc type (see included test cases), resulting in two objects of the same type at the same address. This commit addresses the issue by moving the internal buffer from the start of the wrapper object to a position after the manager function pointer. This minimizes aliasing with the stored buffer but doesn't completely eliminate it, especially when multiple empty base objects are involved (PR121180).

To facilitate this member reordering, the private section of _Mo_base was eliminated, and the corresponding _M_manager and _M_destroy members were made protected. They remain inaccessible to users, as user-facing wrappers derive from _Mo_base privately.

libstdc++-v3/ChangeLog:

	* include/bits/funcwrap.h (__polyfunc::_Mo_base): Reorder
	_M_manage and _M_storage members.  Make _M_destroy protected and
	remove friend declaration.
	* testsuite/20_util/copyable_function/call.cc: Add test for
	aliasing base class.
	* testsuite/20_util/move_only_function/call.cc: Likewise.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-08-26  x86-64: Emit the TLS call after debug marker  (H.J. Lu, 2 files, -5/+34)

For a basic block with only a debug marker:

  (note 3 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
  (note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)
  (debug_insn 5 2 16 2 (debug_marker) "x.c":6:3 -1 (nil))

emit the TLS call after the debug marker.

gcc/
	PR target/121668
	* config/i386/i386-features.cc (ix86_emit_tls_call): Emit the
	TLS call after debug marker.

gcc/testsuite/
	PR target/121668
	* gcc.target/i386/pr121668-1a.c: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-26  Move pr121656.c to gcc.dg/torture  (H.J. Lu, 2 files, -21/+30)

Move pr121656.c to gcc.dg/torture and replace the weak attribute with the noipa attribute. Verified by reverting

  56ca14c4c4f Fix invalid right shift count

with recent ifcvt changes to trigger

  FAIL: gcc.dg/torture/pr121656.c   -O1  execution test
  FAIL: gcc.dg/torture/pr121656.c   -O2  execution test
  FAIL: gcc.dg/torture/pr121656.c   -O3 -g  execution test
  FAIL: gcc.dg/torture/pr121656.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test

on Linux/x86-64.

	PR tree-optimization/121656
	* gcc.dg/pr121656.c: Moved to ...
	* gcc.dg/torture/pr121656.c: Here.
	(dg-options): Removed.
	(foo): Replace weak attribute with noipa attribute.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-26  More RISC-V testsuite hygiene  (Jeff Law, 3 files, -10/+8)

More testsuite hygiene. Some of the thead tests are expecting to find xtheadvdot in the extension set, but it's not defined as a valid extension anywhere. I'm just removing xtheadvdot. Someone more familiar with these cores can add it back properly if they're so inclined.

Second, there's a space after the zifencei in a couple of the thead arch strings. Naturally that causes failures as well. That's a trivial fix, just remove the bogus whitespace.

That gets us clean on riscv.exp on the pioneer system. The pioneer is happy, as is riscv32-elf and riscv64-elf. Pushing to the trunk.

gcc/
	* config/riscv/riscv-cores.def (xt-c908v): Drop xtheadvdot.
	(xt-c910v2): Remove extraneous whitespace.
	(xt-c920v2): Drop xtheadvdot and remove extraneous whitespace.

gcc/testsuite/
	* gcc.target/riscv/mcpu-xt-c908v.c: Drop xtheadvdot.
	* gcc.target/riscv/mcpu-xt-c920v2.c: Drop xtheadvdot.
2025-08-27  Daily bump.  (GCC Administrator, 7 files, -1/+282)
2025-08-26  OpenMP: give error when variant is the same as the base function [PR118839]  (Sandra Loosemore, 5 files, -0/+43)

As noted in the issue, the C++ front end has deeper problems: it's supposed to do the name lookup of the variant at the call site but is instead doing it when parsing the "declare variant" construct, before registering the decl for the base function. The C++ part of the patch is a band-aid to catch the case where there is a previous declaration of the function, so that it doesn't give an undefined-symbol error instead. Some real solution ought to be included as part of fixing PR118791.

gcc/c/
	PR middle-end/118839
	* c-parser.cc (c_finish_omp_declare_variant): Error if variant
	is the same as base.

gcc/cp/
	PR middle-end/118839
	* decl.cc (omp_declare_variant_finalize_one): Error if variant
	is the same as base.

gcc/fortran/
	PR middle-end/118839
	* trans-openmp.cc (gfc_trans_omp_declare_variant): Error if
	variant is the same as base.

gcc/testsuite/
	PR middle-end/118839
	* gcc.dg/gomp/declare-variant-3.c: New.
	* gfortran.dg/gomp/declare-variant-22.f90: New.
2025-08-26  OpenMP: Improve front-end error-checking for "declare variant"  (Sandra Loosemore, 11 files, -196/+248)

This patch fixes a number of problems with parser error checking of "declare variant", especially in the C front end. The new C testcase unprototyped-variant.c added by this patch used to ICE when gimplifying the call site, at least in part because the variant was being recorded even after it was diagnosed as invalid. There was also a large block of dead code in the C front end that was supposed to fix up an unprototyped declaration of a variant function to match the base function declaration, that was never executed because it was nested in a conditional that could never be true. I've fixed those problems by rearranging the code and only recording the variant if it passes the correctness checks. I also tried to add some comments and re-work some particularly confusing bits of code, so that it's easier to understand.

The OpenMP specification doesn't say what the behavior of "declare variant" with the "append_args" clause should be when the base function is unprototyped. The additional arguments are supposed to be inserted between the last fixed argument of the base function and any varargs, but without a prototype, for any given call we have no idea which arguments are fixed and which are varargs, and therefore no idea where to insert the additional arguments. This used to trigger some other diagnostics (which one depending on whether the variant was also unprototyped), but I thought it was better to just reject this with an explicit "sorry".

Finally, I also observed that a missing "match" clause was only rejected if "append_args" or "adjust_args" was present. Per the spec, "match" has the "required" property, so if it's missing it should be diagnosed unconditionally. The C++ and Fortran front ends had the same issue, so I fixed this one there too.

gcc/c/ChangeLog
	* c-parser.cc (c_finish_omp_declare_variant): Rework diagnostic
	code.  Do not record variant if there are errors.  Make check for
	a missing "match" clause unconditional.

gcc/cp/ChangeLog
	* parser.cc (cp_finish_omp_declare_variant): Structure diagnostic
	code similarly to C front end.  Make check for a missing "match"
	clause unconditional.

gcc/fortran/ChangeLog
	* openmp.cc (gfc_match_omp_declare_variant): Make check for a
	missing "match" clause unconditional.

gcc/testsuite/ChangeLog
	* c-c++-common/gomp/append-args-1.c: Adjust expected output.
	* g++.dg/gomp/adjust-args-1.C: Likewise.
	* g++.dg/gomp/adjust-args-3.C: Likewise.
	* gcc.dg/gomp/adjust-args-1.c: Likewise.
	* gcc.dg/gomp/append-args-1.c: Likewise.
	* gcc.dg/gomp/unprototyped-variant.c: New.
	* gfortran.dg/gomp/adjust-args-1.f90: Adjust expected output.
	* gfortran.dg/gomp/append_args-1.f90: Likewise.
2025-08-26  [committed] RISC-V Testsuite hygiene  (Jeff Law, 4 files, -18/+9)

Shreya and I were working through some testsuite failures and noticed that many of the current failures on the pioneer were just silly. We have tests that expect to see full architecture strings in their expected output when the bulk (some might say all) of the architecture string is irrelevant. Worse yet, we'd have different matching lines, i.e. we'd have one that would match rv64gc_blah_blah and another for rv64imfa_blah_blah. Judicious wildcard usage cleans this up considerably.

This fixes ~80 failures in the riscv.exp testsuite. Pushing to the trunk as it's happy on the pioneer native, riscv32-elf and riscv64-elf.

gcc/testsuite/
	* gcc.target/riscv/arch-25.c: Use wildcards to simplify/eliminate
	dg-error directives.
	* gcc.target/riscv/arch-ss-2.c: Similarly.
	* gcc.target/riscv/arch-zilsd-2.c: Similarly.
	* gcc.target/riscv/arch-zilsd-3.c: Similarly.
2025-08-26  libstdc++/ranges: Prefer using offset-based _CachedPosition  (Patrick Palka, 1 file, -2/+0)

The offset-based partial specialization of _CachedPosition for random-access iterators is currently only selected if the offset type is smaller than the iterator type. Before r12-1018-g46ed811bcb4b86 this made sense since the main partial specialization only stored the iterator (incorrectly). After that bugfix, the main partial specialization now effectively stores a std::optional<iter> so the size constraint is inaccurate. And this main partial specialization must invalidate itself upon copy/move unlike the offset-based partial specialization.

So I think we should just always prefer the offset-based _CachedPosition for a random-access iterator, even if the offset type happens to be larger than the iterator type.

libstdc++-v3/ChangeLog:

	* include/std/ranges (__detail::_CachedPosition): Remove
	additional size constraint on the offset-based partial
	specialization.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
2025-08-26  testsuite: restrict ctf-array-7 test to 64-bit targets [PR121411]  (David Faust, 1 file, -2/+3)

The test fails to compile on 32-bit targets because the arrays are too large. Restrict to targets where the array index type is 64 bits. Also note the relevant PR in the test comment.

	PR debug/121411

gcc/testsuite/
	* gcc.dg/debug/ctf/ctf-array-7.c: Restrict to lp64,llp64 targets.
2025-08-26  testsuite: arm: Disable sched2 and sched3 in unsigned-extend-2.c  (Torbjörn SVENSSON, 1 file, -9/+4)

Disable sched2 and sched3 to only have one order of instructions to consider.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/unsigned-extend-2.c: Disable sched2 and sched3
	and update function body to match.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2025-08-26  libstdc++: Do not require assignment for vector::resize(n, v) [PR90192]  (Tomasz Kamiński, 5 files, -21/+148)

This patch introduces a new function, _M_fill_append, which is invoked when copies of the same value are appended to the end of a vector. Unlike _M_fill_insert(end(), n, v), _M_fill_append never permutes elements in place, so it does not require:
* the vector element type to be assignable;
* a copy of the inserted value, in the case where it points to an element of the vector.

vector::resize(n, v) now uses _M_fill_append, fixing the non-conformance where element types were required to be assignable. In addition, _M_fill_insert(end(), n, v) now delegates to _M_fill_append, which eliminates an unnecessary copy of v when the existing capacity is used.

	PR libstdc++/90192

libstdc++-v3/ChangeLog:

	* include/bits/stl_vector.h (vector<T>::_M_fill_append): Declare.
	(vector<T>::fill): Use _M_fill_append instead of _M_fill_insert.
	* include/bits/vector.tcc (vector<T>::_M_fill_append): Define.
	(vector<T>::_M_fill_insert): Delegate to _M_fill_append when
	elements are appended.
	* testsuite/23_containers/vector/modifiers/moveable.cc: Updated
	copycount for inserting at the end (appending).
	* testsuite/23_containers/vector/modifiers/resize.cc: New test.
	* testsuite/backward/hash_set/check_construct_destroy.cc: Updated
	copycount, the hash_set constructor uses insert to fill buckets
	with nullptrs.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-08-26libstdc++: Refactor bound arguments storage for bind_front/backTomasz Kamiński6-49/+418
This patch refactors the implementation of bind_front and bind_back to avoid using std::tuple for argument storage. Instead, bound arguments are now: * stored directly if there is only one, * within a dedicated _Bound_arg_storage otherwise. _Bound_arg_storage is less expensive to instantiate and access than std::tuple. It can also be trivially copyable, as it doesn't require a non-trivial assignment operator for reference types. Storing a single argument directly provides similar benefits compared to either a one-element tuple or _Bound_arg_storage. _Bound_arg_storage holds each argument in an _Indexed_bound_arg base object. The base class is parameterized by both type and index to allow storing multiple arguments of the same type. Invocations are handled by _S_apply_front and _S_apply_back static functions, which simulate explicit object parameters. To facilitate this, the __like_t alias template is now unconditionally available since C++11 in bits/move.h. libstdc++-v3/ChangeLog: * include/bits/move.h (std::__like_impl, std::__like_t): Make available in C++11. * include/std/functional (std::_Indexed_bound_arg) (std::_Bound_arg_storage, std::__make_bound_args): Define. (std::_Bind_front, std::_Bind_back): Use _Bound_arg_storage. * testsuite/20_util/function_objects/bind_back/1.cc: Expand test to cover cases of 0, 1, many bound args. * testsuite/20_util/function_objects/bind_back/111327.cc: Likewise. * testsuite/20_util/function_objects/bind_front/1.cc: Likewise. * testsuite/20_util/function_objects/bind_front/111327.cc: Likewise. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Reviewed-by: Patrick Palka <ppalka@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-08-26libstdc++: Specialize _Never_valueless_alt for jthread, stop_token and ↵Tomasz Kamiński2-0/+33
stop_source The move constructors for stop_source and stop_token are equivalent to copying and clearing the raw pointer, as they are wrappers for a counted-shared state. For jthread, the move constructor performs a member-wise move of stop_source and thread. While std::thread could also have a _Never_valueless_alt specialization due to its inexpensive move (only moving a handle), doing so now would change the ABI. This patch takes the opportunity to correct this behavior for jthread, before the C++20 API is marked stable. libstdc++-v3/ChangeLog: * include/std/stop_token (__variant::_Never_valueless_alt): Declare. (__variant::_Never_valueless_alt<std::stop_token>) (__variant::_Never_valueless_alt<std::stop_source>): Define. * include/std/thread (__variant::_Never_valueless_alt): Declare. (__variant::_Never_valueless_alt<std::jthread>): Define.
2025-08-26Enable unroll in the vectorizer when there's reduction for ↵liuhongt10-3/+447
FMA/DOT_PROD_EXPR/SAD_EXPR The patch unrolls the vectorized loop when there are FMA/DOT_PROD_EXPR/SAD_EXPR reductions; this breaks the cross-iteration dependence and enables more parallelism (since vectorization also enables partial sums). When there's gather/scatter or scalarization in the loop, don't unroll, since the performance bottleneck is not at the reduction. The unroll factor for FMA/DOT_PROD_EXPR/SAD_EXPR is set according to CEIL ((latency * throughput), num_of_reduction), i.e. for FMA, latency is 4 and throughput is 2, so if there's 1 FMA for the reduction then the unroll factor is 2 * 4 / 1 = 8. There's also a vect_unroll_limit; the final suggested_unroll_factor is set as MIN (vect_unroll_limit, 8). The vect_unroll_limit is mainly for register pressure, to avoid too many spills. Ideally, all instructions in the vectorized loop should be used to determine the unroll factor with their (latency * throughput) / number, but that would be too much for this patch, and may just GIGO, so the patch only considers 3 kinds of instructions: FMA, DOT_PROD_EXPR, SAD_EXPR. Note when DOT_PROD_EXPR is not natively supported, m_num_reduction += 3 * count, which almost prevents unroll. There's a performance boost for a simple benchmark with a DOT_PROD_EXPR/FMA chain, and a slight improvement in SPEC2017 performance. gcc/ChangeLog: * config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs): Add new members m_num_reduc, m_prefer_unroll. (ix86_vector_costs::add_stmt_cost): Set m_prefer_unroll and m_num_reduc. (ix86_vector_costs::finish_cost): Determine m_suggested_unroll_factor with consideration of reduc_lat_mult_thr, m_num_reduction and ix86_vect_unroll_limit. * config/i386/i386.h (enum ix86_reduc_unroll_factor): New enum. (processor_costs): Add reduc_lat_mult_thr and vect_unroll_limit. * config/i386/x86-tune-costs.h: Initialize reduc_lat_mult_thr and vect_unroll_limit. * config/i386/i386.opt: Add -param=ix86-vect-unroll-limit. gcc/testsuite/ChangeLog: * gcc.target/i386/vect_unroll-1.c: New test.
* gcc.target/i386/vect_unroll-2.c: New test. * gcc.target/i386/vect_unroll-3.c: New test. * gcc.target/i386/vect_unroll-4.c: New test. * gcc.target/i386/vect_unroll-5.c: New test.
2025-08-26[PATCH] RISC-V: Add pattern for reverse floating-point dividePaul-Antoine Arras19-12/+313
This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a div RTL instruction. The vec_duplicate is the dividend operand. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfdiv.vv v1,v2,v1 After, we get only one: vfrdiv.vf v1,v1,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfrdiv_vf_<mode>): Add new pattern to combine vec_duplicate + vfdiv.vv into vfrdiv.vf. * config/riscv/vector.md (@pred_<optab><mode>_reverse_scalar): Allow VLS modes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfrdiv. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add support for reverse variants. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for reverse variants. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrdiv-run-1-f64.c: New test.
2025-08-26AArch64: extend cost model to cost outer loop vect where the inner loop is ↵Tamar Christina2-2/+61
invariant [PR121290] Consider the example: void f (int *restrict x, int *restrict y, int *restrict z, int n) { for (int i = 0; i < 4; ++i) { int res = 0; for (int j = 0; j < 100; ++j) res += y[j] * z[i]; x[i] = res; } } we currently vectorize as f: movi v30.4s, 0 ldr q31, [x2] add x2, x1, 400 .L2: ld1r {v29.4s}, [x1], 4 mla v30.4s, v29.4s, v31.4s cmp x2, x1 bne .L2 str q30, [x0] ret which is not useful because by doing outer-loop vectorization we're performing less work per iteration than we would had we done inner-loop vectorization and simply unrolled the inner loop. This patch teaches the cost model that if all your leaves are invariant, then adjust the loop cost by * VF, since every vector iteration has at least one lane really just doing 1 scalar. There are a couple of ways we could have solved this; one is to increase the unroll factor to process more iterations of the inner loop. This removes the need for the broadcast; however, we don't support unrolling the inner loop within the outer loop. We only support unrolling by increasing the VF, which would affect the outer loop as well as the inner loop. We also don't directly support costing inner-loop vs outer-loop vectorization, and as such we're left trying to predict/steer the cost model ahead of time to what we think should be profitable. This patch attempts to do so using a heuristic which penalizes the outer-loop vectorization. We now cost the loop as note: Cost model analysis: Vector inside of loop cost: 2000 Vector prologue cost: 4 Vector epilogue cost: 0 Scalar iteration cost: 300 Scalar outside cost: 0 Vector outside cost: 4 prologue iterations: 0 epilogue iterations: 0 missed: cost model: the vector iteration cost = 2000 divided by the scalar iteration cost = 300 is greater or equal to the vectorization factor = 4. missed: not vectorized: vectorization not profitable. missed: not vectorized: vector version will never be profitable. missed: Loop costings may not be worthwhile. 
And subsequently generate: .L5: add w4, w4, w7 ld1w z24.s, p6/z, [x0, #1, mul vl] ld1w z23.s, p6/z, [x0, #2, mul vl] ld1w z22.s, p6/z, [x0, #3, mul vl] ld1w z29.s, p6/z, [x0] mla z26.s, p6/m, z24.s, z30.s add x0, x0, x8 mla z27.s, p6/m, z23.s, z30.s mla z28.s, p6/m, z22.s, z30.s mla z25.s, p6/m, z29.s, z30.s cmp w4, w6 bls .L5 and avoids the load and replicate if it knows it has enough vector pipes to do so. gcc/ChangeLog: PR target/121290 * config/aarch64/aarch64.cc (class aarch64_vector_costs ): Add m_loop_fully_scalar_dup. (aarch64_vector_costs::add_stmt_cost): Detect invariant inner loops. (adjust_body_cost): Adjust final costing if m_loop_fully_scalar_dup. gcc/testsuite/ChangeLog: PR target/121290 * gcc.target/aarch64/pr121290.c: New test.
2025-08-26[PATCH] RISC-V: Add pattern for vector-scalar single-width floating-point ↵Paul-Antoine Arras22-10/+365
multiply This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a mult RTL instruction. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfmul.vv v1,v1,v2 After, we get only one: vfmul.vf v2,v2,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfmul_vf_<mode>): Add new pattern to combine vec_duplicate + vfmul.vv into vfmul.vf. * config/riscv/vector.md (@pred_<optab><mode>_scalar): Allow VLS modes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfmul. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_run.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmul-run-1-f64.c: New test. * gcc.target/riscv/rvv/autovec/vls/floating-point-mul-2.c: Adjust scan dump. * gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c: Likewise.
2025-08-26Fix RISC-V bootstrapJeff Law1-1/+1
Recent changes from Kito have an unused parameter. On the assumption that he's likely going to want it as part of the API, I've simply removed the parameter's name until such time as Kito needs it. This should restore bootstrapping to the RISC-V port. Committing now rather than waiting for the CI system given bootstrap builds currently fail. * config/riscv/riscv.cc (riscv_arg_partial_bytes): Remove name from unused parameter.
2025-08-26arm: testsuite: make gcc.target/arm/bics_3.c generate bics againRichard Earnshaw1-1/+30
The compiler is getting too smart! But this test is really intended to test that we generate BICS instead of BIC+CMP, so make the test use something that we can't subsequently fold away into a bit manipulation of a store-flag value. I've also added a couple of extra tests, so we now cover both the cases where we fold the result away and where that cannot be done. Also add a test that we don't generate a compare against 0, since that's really part of what this test is covering. gcc/testsuite: * gcc.target/arm/bics_3.c: Add some additional tests that cannot be folded to a bit manipulation.
2025-08-26Compute vect_reduc_type off SLP node instead of stmt-infoRichard Biener2-13/+24
The following changes the vect_reduc_type API to work on the SLP node. The API is only used from the aarch64 backend, so all changes are there. In particular I noticed aarch64_force_single_cycle is invoked even for scalar costing (where the flag tested isn't computed yet); I figured in scalar costing all reductions are a single cycle. * tree-vectorizer.h (vect_reduc_type): Get SLP node as argument. * config/aarch64/aarch64.cc (aarch64_sve_in_loop_reduction_latency): Take SLP node as argument and adjust. (aarch64_in_loop_reduction_latency): Likewise. (aarch64_detect_vector_stmt_subtype): Adjust. (aarch64_vector_costs::count_ops): Likewise. Treat reductions during scalar costing as single-cycle.
2025-08-26tree-optimization/121659 - bogus swap of reduction operandsRichard Biener2-3/+19
The following addresses a bogus swapping of SLP operands of a reduction operation which gets STMT_VINFO_REDUC_IDX out of sync with the SLP operand order. In fact the most obvious mistake is that we simply swap operands even on the first stmt even when there's no difference in the comparison operators (for == and != at least). But there are more latent issues that I noticed and fixed up in the process. PR tree-optimization/121659 * tree-vect-slp.cc (vect_build_slp_tree_1): Do not allow matching up comparison operators by swapping if that would disturb STMT_VINFO_REDUC_IDX. Make sure to only actually mark operands for swapping when there was a mismatch and we're not processing the first stmt. * gcc.dg/vect/pr121659.c: New testcase.
2025-08-26Fix UBSAN issue with load-store data refactoringRichard Biener1-2/+4
The following makes sure to read from the lanes_ifn member only when necessary (and thus it was set). * tree-vect-stmts.cc (vectorizable_store): Access lanes_ifn only when VMAT_LOAD_STORE_LANES. (vectorizable_load): Likewise.
2025-08-26Remove STMT_VINFO_REDUC_VECTYPE_INRichard Biener2-17/+5
This was added when invariants/externals outside of SLP didn't have an easily accessible vector type. Now it's redundant so the following removes it. * tree-vectorizer.h (stmt_vec_info_::reduc_vectype_in): Remove. (STMT_VINFO_REDUC_VECTYPE_IN): Likewise. * tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Get at the input vectype via the SLP node child. (vectorizable_lane_reducing): Likewise. (vect_transform_reduction): Likewise. (vectorizable_reduction): Do not set STMT_VINFO_REDUC_VECTYPE_IN.
2025-08-26i386: Fix up recent changes to use GFNI for rotates/shifts [PR121658]Jakub Jelinek3-9/+22
The vgf2p8affineqb_<mode><mask_name> pattern uses "register_operand" predicate for the first input operand, so using "general_operand" for the rotate operand passed to it leads to ICEs, and so does the "nonimmediate_operand" in the <insn>v16qi3 define_expand. The following patch fixes it by using "register_operand" in the former case (that pattern is TARGET_GFNI only) and using force_reg in the latter case (the pattern is TARGET_XOP || TARGET_GFNI and for XOP we can handle MEM operand). The rest of the changes are small formatting tweaks or use of const0_rtx instead of GEN_INT (0). 2025-08-26 Jakub Jelinek <jakub@redhat.com> PR target/121658 * config/i386/sse.md (<insn><mode>3 any_shift): Use const0_rtx instead of GEN_INT (0). (cond_<insn><mode> any_shift): Likewise. Formatting fix. (<insn><mode>3 any_rotate): Use register_operand predicate instead of general_operand for match_operand 1. Use const0_rtx instead of GEN_INT (0). (<insn>v16qi3 any_rotate): Use force_reg on operands[1]. Formatting fix. * config/i386/i386.cc (ix86_shift_rotate_cost): Comment formatting fixes. * gcc.target/i386/pr121658.c: New test.
2025-08-26Daily bump.GCC Administrator4-1/+259