path: root/gcc/testsuite
2025-08-04  aarch64: Drop unnecessary GPs in svcmp_wide PTEST patterns
            (Richard Sandiford, 10 files, +667/-0)

Patterns that fuse a predicate operation P with a PTEST use
aarch64_sve_same_pred_for_ptest_p to test whether the governing
predicates of P and the PTEST are compatible.  Most patterns were also
written as define_insn_and_rewrites, with the rewrite replacing P's
original governing predicate with PTEST's.  This ensures that we don't,
for example, have both a .H PTRUE for the PTEST and a .B PTRUE for a
comparison that feeds the PTEST.

The svcmp_wide* patterns were missing this rewrite, meaning that we did
have redundant PTRUEs.

gcc/
        * config/aarch64/aarch64-sve.md
        (*aarch64_pred_cmp<cmp_op><mode>_wide_cc): Turn into a
        define_insn_and_rewrite and rewrite the governing predicate of
        the comparison so that it is identical to the PTEST's.
        (*aarch64_pred_cmp<cmp_op><mode>_wide_ptest): Likewise.

gcc/testsuite/
        * gcc.target/aarch64/sve/acle/general/cmpeq_1.c: Check the
        number of PTRUEs.
        * gcc.target/aarch64/sve/acle/general/cmpge_5.c: New test.
        * gcc.target/aarch64/sve/acle/general/cmpge_6.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpgt_5.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpgt_6.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmple_5.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmple_6.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmplt_5.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmplt_6.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpne_3.c: Likewise.
2025-08-04  aarch64: Use the correct GP mode in the svcmp_wide patterns
            (Richard Sandiford, 1 file, +52/-1)

The patterns for the svcmp_wide intrinsics used a VNx16BI input
predicate for all modes, instead of the usual <VPRED>.  That
unnecessarily made some input bits significant, but more importantly,
it triggered an ICE in aarch64_sve_same_pred_for_ptest_p when testing
whether a comparison pattern could be fused with a PTEST.  A later
patch will add tests for other comparisons.

gcc/
        * config/aarch64/aarch64-sve.md
        (@aarch64_pred_cmp<cmp_op><mode>_wide)
        (*aarch64_pred_cmp<cmp_op><mode>_wide_cc): Use <VPRED> instead
        of VNx16BI for the governing predicate.
        (*aarch64_pred_cmp<cmp_op><mode>_wide_ptest): Likewise.

gcc/testsuite/
        * gcc.target/aarch64/sve/acle/general/cmpeq_1.c: Add more tests.
2025-08-04  aarch64: Use VNx16BI for non-widening integer svcmp*
            (Richard Sandiford, 23 files, +2985/-5)

This patch continues the work of making ACLE intrinsics use VNx16BI for
svbool_t results.  It deals with the non-widening integer forms of
svcmp*.  The handling of the PTEST patterns is similar to that for the
earlier svwhile* patch.

Unfortunately, on its own, this triggers a failure in the
pred_clobber_*.c tests.  The problem is that, after the patch, we have
a comparison instruction followed by a move into p0.  Combine combines
the instructions together, so that the destination of the comparison is
the hard register p0 rather than a pseudo.  This defeats IRA's
make_early_clobber_and_input_conflicts, which requires the source and
destination to be pseudo registers.  Before the patch, there was a
subreg move between the comparison and the move into p0, so it was that
subreg move that ended up with a hard register destination.

Arguably the fix for PR87600 should be extended to destination
registers as well as source registers, but in the meantime, the patch
just disables combine for these tests.  The tests are really testing
the constraints and register allocation.

gcc/
        * config/aarch64/aarch64-sve.md
        (@aarch64_pred_cmp<cmp_op><mode>_acle)
        (*aarch64_pred_cmp<cmp_op><mode>_acle,
        *cmp<cmp_op><mode>_acle_cc)
        (*cmp<cmp_op><mode>_acle_and): New patterns that yield VNx16BI
        results for all element types.
        * config/aarch64/aarch64-sve-builtins-base.cc
        (svcmp_impl::expand): Use them.
        (svcmp_wide_impl::expand): Likewise when implementing an
        svcmp_wide against an in-range constant.

gcc/testsuite/
        * gcc.target/aarch64/sve/pred_clobber_1.c: Disable combine.
        * gcc.target/aarch64/sve/pred_clobber_2.c: Likewise.
        * gcc.target/aarch64/sve/pred_clobber_3.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpeq_2.c: Add more cases.
        * gcc.target/aarch64/sve/acle/general/cmpeq_4.c: New test.
        * gcc.target/aarch64/sve/acle/general/cmpge_1.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpge_2.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpge_3.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpge_4.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpgt_1.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpgt_2.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpgt_3.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpgt_4.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmple_1.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmple_2.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmple_3.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmple_4.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmplt_1.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmplt_2.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmplt_3.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmplt_4.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpne_1.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/cmpne_2.c: Likewise.
2025-08-04  aarch64: Use VNx16BI for svunpklo/hi_b
            (Richard Sandiford, 2 files, +48/-0)

This patch continues the work of making ACLE intrinsics use VNx16BI for
svbool_t results.  It deals with the svunpk* intrinsics.

gcc/
        * config/aarch64/aarch64-sve.md
        (@aarch64_sve_punpk<perm_hilo>_acle)
        (*aarch64_sve_punpk<perm_hilo>_acle): New patterns.
        * config/aarch64/aarch64-sve-builtins-base.cc
        (svunpk_impl::expand): Use them for boolean svunpk*.

gcc/testsuite/
        * gcc.target/aarch64/sve/acle/general/unpkhi_1.c: New test.
        * gcc.target/aarch64/sve/acle/general/unpklo_1.c: Likewise.
2025-08-04  aarch64: Use VNx16BI for svrev_b* [PR121294]
            (Richard Sandiford, 1 file, +27/-0)

The previous patch for PR121294 handled svtrn1/2, svuzp1/2, and
svzip1/2.  This one extends it to handle svrev intrinsics, where the
same kind of wrong code can be generated.

gcc/
        PR target/121294
        * config/aarch64/aarch64.md (UNSPEC_REV_PRED): New unspec.
        * config/aarch64/aarch64-sve.md (@aarch64_sve_rev<mode>_acle)
        (*aarch64_sve_rev<mode>_acle): New patterns.
        * config/aarch64/aarch64-sve-builtins-base.cc
        (svrev_impl::expand): Use the new patterns for boolean svrev.

gcc/testsuite/
        PR target/121294
        * gcc.target/aarch64/sve/acle/general/rev_2.c: New test.
2025-08-04  aarch64: Use VNx16BI for more permutations [PR121294]
            (Richard Sandiford, 6 files, +576/-0)

The patterns for the predicate forms of svtrn1/2, svuzp1/2, and
svzip1/2 are shared with aarch64_vectorize_vec_perm_const.  The .H, .S,
and .D forms operate on VNx8BI, VNx4BI, and VNx2BI respectively.  Thus,
for all four element widths, there is one significant bit per element,
for both the inputs and the output.

That's appropriate for aarch64_vectorize_vec_perm_const but not for the
ACLE intrinsics, where every bit of the output is significant, and
where every bit of the selected input elements is therefore also
significant.  The current expansion can lead the optimisers to simplify
inputs by changing the upper bits of the input elements (since the
current patterns claim that those bits don't matter), which in turn
leads to wrong code.

The ACLE expansion should operate on VNx16BI instead, for all element
widths.  There was already a pattern for a VNx16BI-only form of TRN1,
for constructing certain predicate constants.  The patch generalises it
to handle the other five permutations as well.  For the reasons given
in the comments, this is done by making the permutation unspec an
operand to a new UNSPEC_PERMUTE_PRED, rather than overloading the
existing unspecs, and rather than adding a new unspec for each
permutation.

gcc/
        PR target/121294
        * config/aarch64/iterators.md (UNSPEC_TRN1_CONV): Delete.
        (UNSPEC_PERMUTE_PRED): New unspec.
        * config/aarch64/aarch64-sve.md (@aarch64_sve_trn1_conv<mode>):
        Replace with...
        (@aarch64_sve_<perm_insn><mode>_acle)
        (*aarch64_sve_<perm_insn><mode>_acle): ...these new patterns.
        * config/aarch64/aarch64.cc (aarch64_expand_sve_const_pred_trn):
        Update accordingly.
        * config/aarch64/aarch64-sve-builtins-functions.h
        (binary_permute::expand): Use the new _acle patterns for
        predicate operations.

gcc/testsuite/
        PR target/121294
        * gcc.target/aarch64/sve/acle/general/perm_2.c: New test.
        * gcc.target/aarch64/sve/acle/general/perm_3.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/perm_4.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/perm_5.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/perm_6.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/perm_7.c: Likewise.
2025-08-04  aarch64: Use VNx16BI for more SVE WHILE* results [PR121118]
            (Richard Sandiford, 7 files, +796/-0)

PR121118 is about a case where we try to construct a predicate constant
using a permutation of a PFALSE and a WHILELO.  The WHILELO is a .H
operation and its result has mode VNx8BI.  However, the permute
instruction expects both inputs to be VNx16BI, leading to an
unrecognisable insn ICE.

VNx8BI is effectively a form of VNx16BI in which every odd-indexed bit
is insignificant.  In the PR's testcase that's OK, since those bits
will be dropped by the permutation.  But if the WHILELO had been a
VNx4BI, so that only every fourth bit is significant, the input to the
permutation would have had undefined bits.  The testcase in the patch
has an example of this.

This feeds into a related ACLE problem that I'd been meaning to fix for
a long time: every bit of an svbool_t result is significant, and so
every ACLE intrinsic that returns an svbool_t should return a VNx16BI.
That doesn't currently happen for ACLE svwhile* intrinsics.

This patch fixes both issues together.  We still need to keep the
current WHILE* patterns for autovectorisation, where the result mode
should match the element width.  The patch therefore adds a new set of
patterns that are defined to return VNx16BI instead.  For want of a
better scheme, it uses an "_acle" suffix to distinguish these new
patterns from the "normal" ones.

The formulation used is:

    (and:VNx16BI (subreg:VNx16BI normal-pattern 0) C)

where C has mode VNx16BI and is a canonical ptrue for normal-pattern's
element width (so that the low bit of each element is set and the upper
bits are clear).

This is a bit clunky, and leads to some repetition.  But it has two
advantages:

* After g:965564eafb721f8000013a3112f1bba8d8fae32b, converting the
  above expression back to normal-pattern's mode will reduce to
  normal-pattern, so that the pattern for testing the result using a
  PTEST doesn't change.

* It gives RTL optimisers a bit more information, as the new tests
  demonstrate.

In the expression above, C is matched using a new "special" predicate
aarch64_ptrue_all_operand, where "special" means that the mode on the
predicate is not necessarily the mode of the expression.  In this case,
C always has mode VNx16BI, but the mode on the predicate indicates
which kind of canonical PTRUE is needed.

gcc/
        PR testsuite/121118
        * config/aarch64/iterators.md (VNx16BI_ONLY): New mode iterator.
        * config/aarch64/predicates.md (aarch64_ptrue_all_operand): New
        predicate.
        * config/aarch64/aarch64-sve.md
        (@aarch64_sve_while_<while_optab_cmp><GPI:mode><VNx16BI_ONLY:mode>_acle)
        (@aarch64_sve_while_<while_optab_cmp><GPI:mode><PRED_HSD:mode>_acle)
        (*aarch64_sve_while_<while_optab_cmp><GPI:mode><PRED_HSD:mode>_acle)
        (*while_<while_optab_cmp><GPI:mode><PRED_HSD:mode>_acle_cc): New
        patterns.
        * config/aarch64/aarch64-sve-builtins-functions.h
        (while_comparison::expand): Use the new _acle patterns that
        always return a VNx16BI.
        * config/aarch64/aarch64-sve-builtins-sve2.cc
        (svwhilerw_svwhilewr_impl::expand): Likewise.
        * config/aarch64/aarch64.cc (aarch64_sve_move_pred_via_while):
        Likewise.

gcc/testsuite/
        PR testsuite/121118
        * gcc.target/aarch64/sve/acle/general/pr121118_1.c: New test.
        * gcc.target/aarch64/sve/acle/general/whilele_13.c: Likewise.
        * gcc.target/aarch64/sve/acle/general/whilelt_6.c: Likewise.
        * gcc.target/aarch64/sve2/acle/general/whilege_1.c: Likewise.
        * gcc.target/aarch64/sve2/acle/general/whilegt_1.c: Likewise.
        * gcc.target/aarch64/sve2/acle/general/whilerw_5.c: Likewise.
        * gcc.target/aarch64/sve2/acle/general/whilewr_5.c: Likewise.
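The masking formulation above can be modelled in scalar code.  The
following sketch (my own illustration, not from the patch) treats a
128-bit vector's predicate as 16 bits in a `uint16_t`: a .H predicate
defines only every second bit, and ANDing with the canonical ptrue.h
constant (low bit of each 2-bit element set) turns it into a
well-defined VNx16BI-style value, exactly like the `(and ... C)` step.

```c
#include <stdint.h>

/* Canonical ptrue.h for a 16-bit predicate: 0b0101...01, i.e. the low
   bit of each 2-bit (.H) element set and the upper bits clear.  */
#define PTRUE_H 0x5555u

/* Model WHILELO .H: element i is active while base + i < bound.  The
   high bit of each 2-bit element is undefined in the raw result (we
   simulate that by setting it to garbage); the AND with PTRUE_H is the
   step that makes every bit of the result well defined.  */
static uint16_t whilelo_h_as_vnx16bi (unsigned base, unsigned bound)
{
  uint16_t raw = 0;
  for (unsigned i = 0; i < 8; i++)
    if (base + i < bound)
      raw |= 0x3u << (2 * i);   /* garbage in the high bit */
  return raw & PTRUE_H;         /* the (and:VNx16BI ... C) step */
}
```

The same scheme works for .S and .B by changing the constant (0x1111
and 0xFFFF respectively), which is what the mode on
aarch64_ptrue_all_operand selects.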
2025-08-04  aarch64: Improve svdupq_lane expansion for big-endian [PR121293]
            (Richard Sandiford, 1 file, +8/-0)

If the index to svdupq_lane is variable, or is outside the range of the
.Q form of DUP, the fallback expansion is to convert to VNx2DI and use
TBL.  The problem in this PR was that the conversion used subregs, and
on big-endian targets, a bitcast from VNx2DI to another element size
requires a REV[BHW] in the best case or a spill and reload in the worst
case.  (See the comment at the head of aarch64-sve.md for details.)
Here we want the conversion to act like svreinterpret, so it should use
aarch64_sve_reinterpret instead of subregs.

gcc/
        PR target/121293
        * config/aarch64/aarch64-sve-builtins-base.cc
        (svdupq_lane::expand): Use aarch64_sve_reinterpret instead of
        subregs.  Explicitly reinterpret the result back to the
        required mode, rather than leaving the caller to take a subreg.

gcc/testsuite/
        PR target/121293
        * gcc.target/aarch64/sve/acle/general/dupq_lane_9.c: New test.
2025-08-04  tree-optimization/121362 - missed FRE through aggregate copy
            (Richard Biener, 2 files, +66/-0)

The following streamlines and generalizes how we find the common base
of the lookup ref and a kill ref when looking through aggregate copies.
In particular this tries to deal with all variants of punning that
happens on the inner MEM_REF after forwarding of address taken
components of the common base.

        PR tree-optimization/121362
gcc/
        * tree-ssa-sccvn.cc (vn_reference_lookup_3): Generalize
        aggregate copy handling.

gcc/testsuite/
        * gcc.dg/tree-ssa/ssa-fre-105.c: New testcase.
        * gcc.dg/tree-ssa/ssa-fre-106.c: Likewise.
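For readers unfamiliar with this FRE transform, here is a minimal
sketch of the redundancy class involved (a hypothetical example of my
own, not the PR's testcase): a load from the destination of an
aggregate copy can be value-numbered as a load from the copy's source,
exposing the earlier store.

```c
/* FRE (full redundancy elimination, in tree-ssa-sccvn.cc) can see that
   dst.b after the aggregate copy `dst = src` has the same value as
   src.b, and hence the same value as the constant stored earlier.  */

struct pair { int a; int b; };

static int f (void)
{
  struct pair src = { 1, 2 };
  struct pair dst;
  dst = src;        /* aggregate copy */
  return dst.b;     /* can be looked through the copy: src.b == 2 */
}
```

The patch generalizes the search for the common base so that this still
works when the copied components have been forwarded through address
punning on the inner MEM_REF.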
2025-08-03  x86: Don't hoist non all 0s/1s vector set outside of loop
            (H.J. Lu, 1 file, +49/-0)

Don't hoist a non all 0s/1s vector set outside of the loop, to avoid
extra spills.

gcc/
        PR target/120941
        * config/i386/i386-features.cc (x86_cse_kind): Moved before
        ix86_place_single_vector_set.
        (redundant_load): Likewise.
        (ix86_place_single_vector_set): Replace the last argument to
        the pointer to redundant_load.  For X86_CSE_VEC_DUP, don't
        place the vector set outside of the loop to avoid extra spills.
        (remove_redundant_vector_load): Pass load to
        ix86_place_single_vector_set.

gcc/testsuite/
        PR target/120941
        * gcc.target/i386/pr120941-1.c: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
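As a hypothetical example of the loop shape this concerns (not the
PR's testcase): when the loop body uses a constant other than all-0s or
all-1s, the vectorizer broadcasts it into a vector register.  Hoisting
that broadcast before the loop keeps the value live across the whole
body, which can force extra spills under register pressure; the patch
keeps such sets inside the loop.

```c
/* A loop whose vectorized form needs a broadcast of the non-0/-1
   constant 3 ({3,3,3,...}); the patch affects where that single
   vector set is placed relative to the loop.  */
static void add3 (int *dst, const int *src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i] + 3;
}
```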
2025-08-04  Daily bump.  (GCC Administrator, 1 file, +10/-0)
2025-08-03  c++: Add stringification testcase for CWG1709 [PR120778]
            (Jakub Jelinek, 1 file, +18/-0)

CWG1709 just codifies existing GCC (and clang) behavior, so this just
adds a testcase for that.

2025-08-03  Jakub Jelinek  <jakub@redhat.com>

        PR preprocessor/120778
        * g++.dg/DRs/dr1709.C: New test.
2025-08-03  libcpp: Fix up cpp_maybe_module_directive [PR120845]
            (Jakub Jelinek, 1 file, +8/-0)

My changes for the "Module Declarations Shouldn't be Macros" paper
broke the following testcase.  The backup handling intentionally tries
to drop the CPP_PRAGMA_EOL token if things go wrong, which is desirable
for the case where we haven't committed to the module preprocessing
directive (i.e. changed the first token to the magic one).  In that
case there is no preprocessing directive start and so CPP_PRAGMA_EOL
would be wrong.  If there is a premature new-line after we've changed
the first token though, we shouldn't drop CPP_PRAGMA_EOL, because
otherwise we ICE in the FE.

While clang++ and MSVC accept the testcase, in my reading it is
incorrect at least in the C++23 and newer wordings, and I think the
changes have been a DR: https://eel.is/c++draft/cpp.module has no
exception for new-lines, and https://eel.is/c++draft/cpp.pre#1.sentence-2
says that a new-line (unless deleted during phase 2 when after a
backslash) ends the preprocessing directive.

The patch arranges for eol to be set only in the not_module case.

2025-08-03  Jakub Jelinek  <jakub@redhat.com>

        PR c++/120845
libcpp/
        * lex.cc (cpp_maybe_module_directive): Move eol variable
        declaration to the start of the function, initialize to false
        and only set it to peek->type == CPP_PRAGMA_EOL in the
        not_module case.  Formatting fix.

gcc/testsuite/
        * g++.dg/modules/cpp-21.C: New test.
2025-08-03  Daily bump.  (GCC Administrator, 1 file, +6/-0)
2025-08-02  c: rewrite implementation of `arg spec' attribute
            (Martin Uecker, 2 files, +3/-4)

Rewrite the implementation of the `arg spec' attribute to pass the
original type of an array parameter instead of passing a string
description and a list of bounds.  The string and list are then created
from this type during the integration of the information of `arg spec'
into the access attribute, because it is still needed later for various
warnings.  This change makes the implementation simpler and more robust,
as the declarator is not processed and information that is already
encoded in the type is not duplicated.  A similar change to the access
attribute could then completely remove the string processing.  With
this change, the original type is now available and can be used for
other warnings or for _Countof.

The new implementation tries to be faithful to the original, but this
is not entirely true, as it fixes a bug in the old implementation.

gcc/c-family/ChangeLog:
        * c-attribs.cc (handle_argspec_attribute): Update.
        (build_arg_spec): New function.
        (build_attr_access_from_parms): Rewrite `arg spec' handling.

gcc/c/ChangeLog:
        * c-decl.cc (get_parm_array_spec): Remove.
        (push_parm_decl): Do not add `arg spec` attribute.
        (build_arg_spec_attribute): New function.
        (grokdeclarator): Add `arg spec` attribute.

gcc/testsuite/ChangeLog:
        * gcc.dg/Warray-parameter-11.c: Change Warray-parameter to
        -Wvla-parameter as these are VLAs.
        * gcc.dg/Warray-parameter.c: Remove xfail.
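The kind of parameter the `arg spec' attribute describes can be shown
with a short sketch (a hypothetical example of mine, not from the
patch): when an array parameter's bound names an earlier parameter, the
parameter's type is a variably modified type, and it is this type that
the rewritten implementation now passes along for diagnostics such as
-Wvla-parameter and for _Countof.

```c
/* The bound of `a` is the earlier parameter `n`; that bound is part of
   a's (VLA) type, which is exactly the information `arg spec' records
   for later warnings about mismatched redeclarations.  */
static int sum (int n, int a[n])
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}
```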
2025-08-02  Daily bump.  (GCC Administrator, 1 file, +22/-0)
2025-08-01  i386: Fix incorrect attributes-error.c test
            (Artemiy Granat, 1 file, +2/-1)

gcc/testsuite/ChangeLog:
        * gcc.target/i386/attributes-error.c: Change incorrect
        sseregparm,fastcall combination to cdecl,fastcall.
2025-08-01  bswap: Fix up ubsan detected UB in find_bswap_or_nop [PR121322]
            (Jakub Jelinek, 1 file, +14/-0)

The following testcase results in compiler UB as detected by ubsan.
find_bswap_or_nop first checks is_bswap_or_nop_p and, if that fails on
the tmp_n value, tries some rotation of that if possible.  The
discovery of what rotate count to use ignores zero bytes from the least
significant end (those mean zero bytes and so can be masked away), and
on the first byte that is neither zero nor 0xff (0xff means "don't
know"; 1-8 identify some particular byte of the original) computes
count (the rotation count) from that byte plus the byte index.

Now, on the following testcase we have tmp_n 0x403020105060700, i.e.
the least significant byte is zero, then the msb from the original
value, the byte below it, another one below it, then the low 32 bits of
the original value.  So, we stop at count 7 with i 1, it wraps around
and we get count 0.  Then we invoke UB on

    tmp_n = tmp_n >> count | tmp_n << (range - count);

because count is 0 and range is 64.

Now, of course I could fix it up by doing
tmp_n << ((range - count) % range) or something similar, but that is
just wasted compile time: if count is 0, we already know that
is_bswap_or_nop_p failed on that tmp_n value and so it will fail again
if the value is the same.  So I think it is better to just return NULL
(i.e. punt).

2025-08-01  Jakub Jelinek  <jakub@redhat.com>

        PR middle-end/121322
        * gimple-ssa-store-merging.cc (find_bswap_or_nop): Return NULL
        if count is 0.

        * gcc.dg/pr121322.c: New test.
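The undefined behaviour here is the classic rotate-by-zero pitfall: for
a 64-bit value, `x >> 0 | x << (64 - 0)` shifts by the type width.  A
small sketch of a rotate helper with the guard (the patch itself punts
instead of rotating, since count == 0 means is_bswap_or_nop_p already
failed on the unrotated value):

```c
#include <stdint.h>

/* Rotate right by count bits.  Without the count == 0 check, the
   expression below evaluates x << 64 for count == 0, which is
   undefined behaviour in C (shift count equal to the type width).  */
static uint64_t rotr64 (uint64_t x, unsigned count)
{
  if (count == 0)
    return x;
  return x >> count | x << (64 - count);
}
```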
2025-08-01  c++/modules: Warn for optimize attributes instead of ICEing [PR108080]
            (Nathaniel Shead, 1 file, +5/-0)

This PR is the most frequently reported modules bug for 15, as the ICE
message does not indicate the issue at all and reducing to find the
underlying cause can be tricky.

I have a WIP patch to fix this issue by just reconstructing these nodes
on stream-in from any attributes applied to the functions, but since at
this stage it may still take a while to be ready, it seems useful to me
to at least make the error here more friendly and guide users to what
they could do to work around this issue.

In fact, as noted on the PR, a lot of the time it should be harmless to
just ignore the optimize etc. attribute and continue translation, at
the user's own risk; this patch as such turns the ICE into a warning
with no option to silence.

        PR c++/108080

gcc/cp/ChangeLog:
        * module.cc (trees_out::core_vals): Warn when streaming
        target/optimize node; adjust comments.
        (trees_in::core_vals): Don't stream a target/optimize node.

gcc/testsuite/ChangeLog:
        * g++.dg/modules/pr108080.H: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
2025-08-01  c++/modules: Merge PARM_DECL properties from function definitions [PR121238]
            (Nathaniel Shead, 3 files, +42/-0)

When we merge a function definition, if there already exists a forward
declaration in the importing TU, we use the PARM_DECLs belonging to
that decl.  This usually works fine, except, as noted in the linked PR,
there are some flags (such as TREE_ADDRESSABLE) that only get set on a
PARM_DECL once a definition is provided.  This patch fixes the
wrong-code issues by propagating any properties on PARM_DECLs I could
find that may affect codegen.

        PR c++/121238

gcc/cp/ChangeLog:
        * module.cc (trees_in::fn_parms_fini): Merge properties for
        definitions.

gcc/testsuite/ChangeLog:
        * g++.dg/modules/merge-19.h: New test.
        * g++.dg/modules/merge-19_a.H: New test.
        * g++.dg/modules/merge-19_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
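To illustrate why flags like TREE_ADDRESSABLE are definition-only (a
hypothetical example of mine, not the PR's testcase): a forward
declaration gives no hint that the definition will take a parameter's
address, so a PARM_DECL built from the declaration alone cannot carry
the flag, and it must be merged in once the definition arrives.

```c
/* The declaration below says nothing about x's address being taken;
   only the definition reveals it, which is when the compiler marks
   the parameter addressable (forcing it to live in memory).  */
int bump (int x);

int bump (int x)
{
  int *p = &x;   /* the definition makes x addressable */
  *p += 1;
  return x;
}
```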
2025-08-01  Daily bump.  (GCC Administrator, 1 file, +128/-0)
2025-07-31  PR modula2/121314: quotes appearing in concatenated error strings
            (Gaius Mulley, 2 files, +32/-0)

This patch fixes the addition of strings so that no extraneous quotes
appear in the result string.  The fix is made to the bootstrap tool mc
and it has been rebuilt.

gcc/m2/ChangeLog:
        PR modula2/121314
        * mc-boot/GFormatStrings.cc (PerformFormatString): Rebuilt.
        * mc-boot/GM2EXCEPTION.cc (M2EXCEPTION_M2Exception): Rebuilt.
        * mc-boot/GSFIO.cc (SFIO_GetFileName): Rebuilt.
        * mc-boot/GSFIO.h (SFIO_GetFileName): Rebuilt.
        * mc-boot/Gdecl.cc: Rebuilt.
        * mc-boot/GmcFileName.h: Rebuilt.
        * mc/decl.mod (getStringChar): New procedure function.
        (getStringContents): Call getStringChar.
        (addQuotes): New procedure function.
        (foldBinary): Call addQuotes to add delimiting quotes to the
        new string.

gcc/testsuite/ChangeLog:
        PR modula2/121314
        * gm2/errors/fail/badindrtype.mod: New test.
        * gm2/errors/fail/badindrtype2.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2025-07-31  fortran: Evaluate class function bounds in the scalarizer [PR121342]
            (Mikael Morin, 1 file, +35/-0)

There is code in gfc_conv_procedure_call that, for polymorphic
functions, initializes the scalarization array descriptor information
and forcefully sets loop bounds.  This code is changing the decisions
made by the scalarizer behind its back, and the test shows an example
where the consequences are (badly) visible.  In the test, for one of
the actual arguments to an elemental subroutine, an offset to the loop
variable is missing to access the array, as it was the one originally
chosen to set the loop bounds from.

This could theoretically be fixed by just clearing the array of choice
for the loop bounds.  This change takes instead the harder path of
adding the missing information to the scalarizer's knowledge so that
its decision doesn't need to be forced to something else after the
fact.  The array descriptor information initialisation for polymorphic
functions is moved to gfc_add_loop_ss_code (after the function call
generation), and the loop bounds initialization to a new function
called after that.

As the array chosen to set the loop bounds from is no longer forced to
be the polymorphic function result, we have to let the scalarizer set a
delta for polymorphic function results.  For regular non-polymorphic
function result arrays, they are zero-based and the temporary creation
makes the loop zero-based as well, so we can continue to skip the delta
calculation.

In the cases where a temporary is created to store the result of the
array function, the creation of the temporary shifts the loop bounds to
be zero-based.  As there was no delta for polymorphic result arrays,
the function result descriptor offset was set to zero in that case for
a zero-based array reference to be correct.  Now that the scalarizer
sets a delta, those forced offset updates have to go, because they can
make the descriptor invalid and cause erroneous array references.

        PR fortran/121342

gcc/fortran/ChangeLog:
        * trans-expr.cc (gfc_conv_subref_array_arg): Remove offset
        update.
        (gfc_conv_procedure_call): For polymorphic functions, move the
        scalarizer descriptor information...
        * trans-array.cc (gfc_add_loop_ss_code): ... here, and evaluate
        the bounds to fresh variables.
        (get_class_info_from_ss): Remove offset update.
        (gfc_conv_ss_startstride): Don't set a zero value for function
        result upper bounds.
        (late_set_loop_bounds): New.
        (gfc_conv_loop_setup): If the bounds of a function result have
        been set, and no other array provided loop bounds for a
        dimension, use the function result bounds as loop bounds for
        that dimension.
        (gfc_set_delta): Don't skip delta setting for polymorphic
        function results.

gcc/testsuite/ChangeLog:
        * gfortran.dg/class_elemental_1.f90: New test.
2025-07-31  c++: constexpr, array, private ctor [PR120800]
            (Jason Merrill, 1 file, +22/-0)

Here cxx_eval_vec_init_1 wants to recreate the default constructor call
that we previously built and threw away in build_vec_init_elt, but we
aren't in the same access context at this point.  Since we already
checked access, let's just suppress access control here.  Redoing
overload resolution at constant evaluation time is sketchy, but should
usually be fine for a default/copy constructor.

        PR c++/120800

gcc/cp/ChangeLog:
        * constexpr.cc (cxx_eval_vec_init_1): Suppress access control.

gcc/testsuite/ChangeLog:
        * g++.dg/cpp0x/constexpr-array30.C: New test.
2025-07-31  c++: consteval blocks
            (Marek Polacek, 8 files, +441/-0)

This patch implements consteval blocks, as specified by P2996.  They
aren't very useful without define_aggregate, but having a reviewed
implementation on trunk would be great.

consteval {} can appear anywhere a member-declaration or
block-declaration can be.  The expression corresponding to it is:

    [] -> void static consteval compound-statement ()

and it must be a constant expression.  I've used
cp_parser_lambda_expression to take care of most of the parsing.

Since a consteval block can find itself in a template, we need a
vehicle to carry the block for instantiation.  Rather than inventing a
new tree, I'm using STATIC_ASSERT.

A consteval block can't return a value, but that is checked by virtue
of the lambda having a void return type.

        PR c++/120775

gcc/cp/ChangeLog:
        * constexpr.cc (cxx_eval_outermost_constant_expr): Use
        extract_call_expr.
        * cp-tree.h (CONSTEVAL_BLOCK_P, LAMBDA_EXPR_CONSTEVAL_BLOCK_P):
        Define.
        (finish_static_assert): Adjust declaration.
        (current_nonlambda_function): Likewise.
        * lambda.cc (current_nonlambda_function): New parameter.  Only
        keep iterating if the function represents a consteval block.
        * parser.cc (cp_parser_lambda_expression): New parameter for
        consteval blocks.  Use it.  Set LAMBDA_EXPR_CONSTEVAL_BLOCK_P.
        (cp_parser_lambda_declarator_opt): Likewise.
        (build_empty_string): New.
        (cp_parser_next_tokens_are_consteval_block_p): New.
        (cp_parser_consteval_block): New.
        (cp_parser_block_declaration): Handle consteval blocks.
        (cp_parser_static_assert): Use build_empty_string.
        (cp_parser_member_declaration): Handle consteval blocks.
        * pt.cc (tsubst_stmt): Adjust a call to finish_static_assert.
        * semantics.cc (finish_fname): Warn for consteval blocks.
        (finish_static_assert): New parameter for consteval blocks.
        Set CONSTEVAL_BLOCK_P.  Evaluate consteval blocks specially.

gcc/testsuite/ChangeLog:
        * g++.dg/cpp26/consteval-block1.C: New test.
        * g++.dg/cpp26/consteval-block2.C: New test.
        * g++.dg/cpp26/consteval-block3.C: New test.
        * g++.dg/cpp26/consteval-block4.C: New test.
        * g++.dg/cpp26/consteval-block5.C: New test.
        * g++.dg/cpp26/consteval-block6.C: New test.
        * g++.dg/cpp26/consteval-block7.C: New test.
        * g++.dg/cpp26/consteval-block8.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2025-07-31  RISC-V: Add testcases for signed avg ceil vx combine
            (Pan Li, 22 files, +293/-5)

The signed avg ceil shares vaadd.vx with the unsigned form for the vx
combine, so add test cases to make sure it works as expected.

The below test suites are passed for this patch series.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
        for signed avg ceil.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
        helper macros.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
        data for run test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-i16.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-i32.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-i64.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vx_vaadd-run-2-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-31  i386: Fix incorrect handling of simultaneous regparm and thiscall use
            (Artemiy Granat, 1 file, +34/-7)

gcc/ChangeLog:
        * config/i386/i386-options.cc (ix86_handle_cconv_attribute):
        Handle simultaneous use of regparm and thiscall attributes in
        the case when regparm is set before thiscall.

gcc/testsuite/ChangeLog:
        * gcc.target/i386/attributes-error.c: Add more attribute
        combinations.
2025-07-31  i386: Ignore regparm attribute and warn for it in 64-bit mode
            (Artemiy Granat, 9 files, +57/-13)

The regparm attribute does not affect code generation on the x86-64
target.  Despite this, regparm was accepted silently, unlike the other
calling convention attributes handled in the
ix86_handle_cconv_attribute function.

Due to the lack of diagnostics, the Linux kernel attempted to specify
regparm(0) on the vmread_error_trampoline declaration, which is
supposed to be invoked with all arguments on the stack:
https://lore.kernel.org/all/20220928232015.745948-1-seanjc@google.com/

To produce a warning for regparm in 64-bit mode, simply move the block
that produces diagnostics above the block that handles the regparm
attribute.

gcc/ChangeLog:
        * config/i386/i386-options.cc (ix86_handle_cconv_attribute):
        Move 64-bit mode check before regparm handling.

gcc/testsuite/ChangeLog:
        * g++.dg/abi/regparm1.C: Require ia32 target.
        * gcc.target/i386/20020224-1.c: Likewise.
        * gcc.target/i386/pr103785.c: Use regparm attribute only if not
        in 64-bit mode.
        * gcc.target/i386/pr36533.c: Likewise.
        * gcc.target/i386/pr59099.c: Likewise.
        * gcc.target/i386/sibcall-8.c: Likewise.
        * gcc.target/i386/sw-1.c: Likewise.
        * gcc.target/i386/pr15184-2.c: Fix invalid comment.
        * gcc.target/i386/attributes-ignore.c: New test.
2025-07-31  testsuite: Add runtime test for FMV resolvers
            (Yury Khrustalev, 1 file, +82/-0)

gcc/testsuite/ChangeLog:
        * g++.target/aarch64/mv-cpu-features.C: New test.
2025-07-31testsuite: Add tests for __init_cpu_features_constructorYury Khrustalev6-0/+118
Add tests that would call __init_cpu_features_resolver() directly from an ifunc resolver that would in turn call the function under test __init_cpu_features_constructor() using synthetic parameters for different sizes of the 2nd argument. gcc/testsuite/ChangeLog: * gcc.target/aarch64/ifunc-resolver.in: Add core test functions. * gcc.target/aarch64/ifunc-resolver-0.c: New test. * gcc.target/aarch64/ifunc-resolver-1.c: Ditto. * gcc.target/aarch64/ifunc-resolver-2.c: Ditto. * gcc.target/aarch64/ifunc-resolver-3.c: Ditto. * gcc.target/aarch64/ifunc-resolver-4.c: Likewise.
2025-07-31aarch64: Prevent streaming-compatible code from assembler rejection [PR121028]Spencer Abson3-4/+51
Streaming-compatible functions can be compiled without SME enabled, but need to use "SMSTART SM" and "SMSTOP SM" to temporarily switch into the streaming state of a callee. These switches are conditional on the current mode being opposite to the target mode, so no SME instructions are executed if SME is not available. However, in GAS, "SMSTART SM" and "SMSTOP SM" always require +sme. A call from a streaming-compatible function, compiled without SME enabled, to a non-streaming function will be rejected as: Error: selected processor does not support `smstop sm'. To work around this, we make use of the .inst directive to insert the literal encodings of "SMSTART SM" and "SMSTOP SM". gcc/ChangeLog: PR target/121028 * config/aarch64/aarch64-sme.md (aarch64_smstart_sm): Use the .inst directive if !TARGET_SME. (aarch64_smstop_sm): Likewise. gcc/testsuite/ChangeLog: PR target/121028 * gcc.target/aarch64/sme/call_sm_switch_1.c: Tell check-function-bodies not to ignore .inst directives, and replace the test for "smstart sm" with one for its encoding. * gcc.target/aarch64/sme/call_sm_switch_11.c: Likewise. * gcc.target/aarch64/sme/pr121028.c: New test.
2025-07-31change get_best_mode args int -> HOST_WIDE_INT [PR121264]Jakub Jelinek1-0/+12
The following testcase is miscompiled, because byte 0x20000000 is bit 0x100000000 and ifcombine incorrectly combines the two loads into a BIT_FIELD_REF even when they are very far away. The problem is that gimple-fold.cc ifcombine uses get_best_mode heavily, and that function has just int bitsize and int bitpos arguments, so when called e.g. with if (get_best_mode (end_bit - first_bit, first_bit, 0, ll_end_region, ll_align, BITS_PER_WORD, volatilep, &lnmode)) where end_bit - first_bit doesn't fit into int, it is silently truncated. If there was just a single problematic get_best_mode call, I would probably just check for overflows in the caller, but there are many. And the two arguments are used solely as arguments to bit_field_mode_iterator constructor which has HOST_WIDE_INT arguments, so I think the easiest fix is just make the get_best_mode arguments also HOST_WIDE_INT. 2025-07-31 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/121264 * machmode.h (get_best_mode): Change type of first 2 arguments from int to HOST_WIDE_INT. * stor-layout.cc (get_best_mode): Likewise. * gcc.dg/tree-ssa/pr121264.c: New test.
2025-07-31aarch64: testsuite: Fix do-assemble tests for SMESpencer Abson13-4/+48
GCC doesn't support SME without SVE2, so the -march=armv8-a+<ext> argument to check_no_compiler_messages causes aarch64_asm_<ext>_ok to return zero for SME and any <ext> that implies it. This patch changes the baseline architecure to armv9-a for these extensions. The tests for ACLE SME2 intrinsics that require FEAT_FAMINMAX were configured to do-assemble if aarch64_asm_sme2_ok returned 1 (by default), but they really need to check if +faminmax is supported too. The fix above exposed this, so we also fix the do-assemble/do-compile choice for those tests here. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sme2/acle-asm/amax_f16_x2.c: Gate do-assemble on assembler support for +faminmax and +sme2. * gcc.target/aarch64/sme2/acle-asm/amax_f16_x4.c: Likewise. * gcc.target/aarch64/sme2/acle-asm/amax_f32_x2.c: Likewise. * gcc.target/aarch64/sme2/acle-asm/amax_f32_x4.c: Likewise. * gcc.target/aarch64/sme2/acle-asm/amax_f64_x2.c: Likewise. * gcc.target/aarch64/sme2/acle-asm/amax_f64_x4.c: Likewise. * gcc.target/aarch64/sme2/acle-asm/amin_f16_x2.c: Likewise. * gcc.target/aarch64/sme2/acle-asm/amin_f16_x4.c: Likewise. * gcc.target/aarch64/sme2/acle-asm/amin_f32_x2.c: Likewise. * gcc.target/aarch64/sme2/acle-asm/amin_f32_x4.c: Likewise. * gcc.target/aarch64/sme2/acle-asm/amin_f64_x2.c: Likewise. * gcc.target/aarch64/sme2/acle-asm/amin_f64_x4.c: Likewise. * lib/target-supports.exp: Split the extensions that require SME into a separate set, and use armv9-a as their baseline.
2025-07-31Fix comment typos - hanlde -> handleJakub Jelinek3-3/+3
2025-07-31 Jakub Jelinek <jakub@redhat.com> * gimple-ssa-store-merging.cc (find_bswap_or_nop): Fix comment typos, hanlde -> handle. * config/i386/i386.cc (ix86_gimple_fold_builtin, ix86_rtx_costs): Likewise. * config/i386/i386-features.cc (remove_partial_avx_dependency): Likewise. * gcc.target/i386/apx-1.c (apx_hanlder): Rename to ... (apx_handler): ... this. * gcc.target/i386/uintr-2.c (UINTR_hanlder): Rename to ... (UINTR_handler): ... this. * gcc.target/i386/uintr-5.c (UINTR_hanlder): Rename to ... (UINTR_handler): ... this.
2025-07-31Daily bump.GCC Administrator1-0/+151
2025-07-31c++: Don't assume trait funcs return error_mark_node when tf_error is passed ↵Nathaniel Shead2-0/+36
[PR121291] For the sake of determining if there are other errors in user code to report early, many trait functions don't always return error_mark_node if not called in a SFINAE context (i.e., tf_error is set). This patch removes some assumptions on this behaviour I'd made when improving diagnostics of builtin traits. PR c++/121291 gcc/cp/ChangeLog: * constraint.cc (diagnose_trait_expr): Remove assumption about failures returning error_mark_node. * except.cc (explain_not_noexcept): Allow expr not being noexcept. * method.cc (build_invoke): Adjust comment. (is_trivially_xible): Always note non-trivial components if expr is not null or error_mark_node. (is_nothrow_xible): Likewise for non-noexcept components. (is_nothrow_convertible): Likewise. gcc/testsuite/ChangeLog: * g++.dg/ext/is_invocable7.C: New test. * g++.dg/ext/is_nothrow_convertible5.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Patrick Palka <ppalka@redhat.com>
2025-07-30c++: improve non-constant template arg diagnosticJason Merrill3-4/+15
A conversation today pointed out that the current diagnostic for this case doesn't mention the constant evaluation failure, it just says e.g. "'p' is not a valid template argument for 'int*' because it is not the address of a variable" This patch improves the diagnostic in C++17 and above (post-N4268) to diagnose failed constant-evaluation. gcc/cp/ChangeLog: * pt.cc (convert_nontype_argument_function): Check cxx_constant_value on failure. (invalid_tparm_referent_p): Likewise. gcc/testsuite/ChangeLog: * g++.dg/tc1/dr49.C: Adjust diagnostic. * g++.dg/template/func2.C: Likewise. * g++.dg/cpp1z/nontype8.C: New test.
2025-07-30IFCVT: Fix factor_out_operators correctly for more than 1 phi [PR121295]Andrew Pinski2-0/+33
r16-2590-ga51bf9e10182cf was not the correct fix for this in the end. Instead a much simpler and more localized fix is needed: just change the phi that is being worked on with the new result and arguments that come from the factored-out operator. This solves the issue of not having the result in the IR and causing issues that way. Bootstrapped and tested on x86_64-linux-gnu. Note this depends on reverting r16-2590-ga51bf9e10182cf. PR tree-optimization/121236 PR tree-optimization/121295 gcc/ChangeLog: * tree-if-conv.cc (factor_out_operators): Change the phi node to the new result and args. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr121236-1.c: New test. * gcc.dg/torture/pr121295-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-07-30Revert "ifcvt: Fix ifcvt for multiple phi nodes after factoring operator ↵Andrew Pinski1-20/+0
[PR121236]" This reverts commit a51bf9e10182cf7ac858db0ea6c5cb11b4f12377.
2025-07-30s390: Implement spaceship optab [PR117015]Stefan Schulze Frielinghaus7-0/+197
gcc/ChangeLog: PR target/117015 * config/s390/s390-protos.h (s390_expand_int_spaceship): New function. (s390_expand_fp_spaceship): New function. * config/s390/s390.cc (s390_expand_int_spaceship): New function. (s390_expand_fp_spaceship): New function. * config/s390/s390.md (spaceship<mode>4): New expander. gcc/testsuite/ChangeLog: * gcc.target/s390/spaceship-fp-1.c: New test. * gcc.target/s390/spaceship-fp-2.c: New test. * gcc.target/s390/spaceship-fp-3.c: New test. * gcc.target/s390/spaceship-fp-4.c: New test. * gcc.target/s390/spaceship-int-1.c: New test. * gcc.target/s390/spaceship-int-2.c: New test. * gcc.target/s390/spaceship-int-3.c: New test.
2025-07-30x86: Transform to "pushq $-1; popq reg" for -OzH.J. Lu1-0/+10
commit 4c80062d7b8c272e2e193b8074a8440dbb4fe588 Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun May 25 07:40:29 2025 +0800 x86: Enable *mov<mode>_(and|or) only for -Oz disabled transformation from "movq $-1,reg" to "pushq $-1; popq reg" for -Oz. But for legacy integer registers, the former is 4 bytes and the latter is 3 bytes. Enable such transformation for -Oz. gcc/ PR target/120427 * config/i386/i386.md (peephole2): Transform "movq $-1,reg" to "pushq $-1; popq reg" for -Oz if reg is a legacy integer register. gcc/testsuite/ PR target/120427 * gcc.target/i386/pr120427-5.c: New test. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-07-30Fix false profile inconsistency errorJan Hubicka1-0/+34
This patch fixes a false inconsistent-profile error message seen when building SPEC with -fprofile-use -fdump-ipa-profile. The problem is that with dumping, tree_estimate_probability is run in dry-run mode to report success rates of heuristics. It however runs determine_unlikely_bbs, which overwrites some counts to profile_count::zero, and later value profiling sees the mismatch. With a sane profile, determine_unlikely_bbs should almost always be a no-op, since it should only drop to 0 things that are known to be unlikely executed. What happens here is that there is a comdat where the profile is lost and we see a call with non-zero count calling a function with zero count, and "fix" the profile by making the call have zero count, too. I also extended the unlikely predicates to avoid tampering with predictions when the prediction is believed to be reliable. This also prevents us from dropping all EH regions to 0 count, as tested by the testcase. gcc/ChangeLog: * predict.cc (unlikely_executed_edge_p): Ignore EDGE_EH if profile is reliable. (unlikely_executed_stmt_p): Special-case builtin_trap/unreachable and ignore other heuristics for reliable profiles. (tree_estimate_probability): Disable unlikely bb detection when doing a dry run. gcc/testsuite/ChangeLog: * g++.dg/tree-prof/eh1.C: New test.
2025-07-30tree-optimization/121130 - vectorizable_call cannot handle .MASK_CALLRichard Biener1-0/+11
The following makes it correctly reject them, vectorizable_simd_clone_call is solely responsible for them. PR tree-optimization/121130 * tree-vect-stmts.cc (vectorizable_call): Bail out for .MASK_CALL. * gcc.dg/vect/vect-simd-pr121130.c: New testcase.
2025-07-30c++: Make __extension__ silence -Wlong-long pedwarns/warnings [PR121133]Jakub Jelinek4-0/+31
The PR13358 r0-92909 change changed the diagnostics on long long in C++ (either with -std=c++98 or -Wlong-long), but unlike the C FE we unfortunately warn even in the __extension__ long long a; etc. cases. The C FE in that case in disable_extension_diagnostics saves and clears not just pedantic flag but also warn_long_long (and several others), while C++ FE only temporarily disables pedantic. The following patch makes it behave like the C FE in this regard, though (__extension__ 1LL) still doesn't work because of the separate lexing (and I must say I have no idea how to fix that). Or do you prefer a solution closer to the C FE, cp_parser_extension_opt saving the values into a bitfield and have another function to restore the state (or use RAII)? 2025-07-30 Jakub Jelinek <jakub@redhat.com> PR c++/121133 * parser.cc (cp_parser_unary_expression): Adjust cp_parser_extension_opt caller and restore warn_long_long. (cp_parser_declaration): Likewise. (cp_parser_block_declaration): Likewise. (cp_parser_member_declaration): Likewise. (cp_parser_extension_opt): Add SAVED_LONG_LONG argument, save previous warn_long_long state into it and clear it for __extension__. * g++.dg/warn/pr121133-1.C: New test. * g++.dg/warn/pr121133-2.C: New test. * g++.dg/warn/pr121133-3.C: New test. * g++.dg/warn/pr121133-4.C: New test.
2025-07-30libcpp: Fix up comma diagnostics in preprocessor for C++ [PR120778]Jakub Jelinek1-0/+42
The P2843R3 Preprocessing is never undefined paper contains comments that various compilers handle comma operators in preprocessor expressions incorrectly and I think they are right. In both C and C++ the grammar uses constant-expression non-terminal for #if/#elif and in both C and C++ that NT is conditional-expression, so #if 1, 2 is IMHO clearly wrong in both languages. C89 then says for constant-expression "Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within the operand of a sizeof operator." Because all the remaining identifiers in the #if/#elif expression are replaced with 0 I think assignments, increment, decrement and function-call aren't that big deal because (0 = 1) or ++4 etc. are all invalid, but for comma expressions I think it matters. In r0-56429 PR456 Joseph has added !CPP_OPTION (pfile, c99) to handle that correctly. Then C99 changed that to: "Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated." That made for C99+ #if 1 || (1, 2) etc. valid but #if (1, 2) is still invalid, ditto #if 1 ? 1, 2 : 3 In C++ I can't find anything like that though, and as can be seen on say int a[(1, 2)]; int b[1 ? 1, 2 : 3]; being accepted by C++ and rejected by C while int c[1, 2]; int d[1 ? 2 : 3, 4]; being rejected in both C and C++, so I think for C++ it is indeed just the grammar that prevents #if 1, 2. When it is the second operand of ?: or inside of () the grammar just uses expression and that allows comma operator. So, the following patch uses different decisions for C++ when to diagnose comma operator in preprocessor expressions, for C++ tracks if it is inside of () (obviously () around #embed clauses don't count unless one uses limit ((1, 2)) etc.) or inside of the second ?: operand and allows comma operator there and disallows elsewhere. 
BTW, I wonder if anything in the standard disallows <=> in the preprocessor expressions. Say #if (0 <=> 1) < 0 etc. #include <compare> constexpr int a = (0 <=> 1) < 0; is valid (but not valid without #include <compare>) and the expressions don't use any identifiers. 2025-07-30 Jakub Jelinek <jakub@redhat.com> PR c++/120778 * internal.h (struct lexer_state): Add comma_ok member. * expr.cc (_cpp_parse_expr): Initialize it to 0, increment on CPP_OPEN_PAREN and CPP_QUERY, decrement on CPP_CLOSE_PAREN and CPP_COLON. (num_binary_op): For C++ pedwarn on comma operator if pfile->state.comma_ok is 0 instead of !c99 or skip_eval. * g++.dg/cpp/if-comma-1.C: New test.
2025-07-30vect: Add missing skip-vector check for peeling with versioning [PR121020]Pengfei Li1-0/+54
This fixes a miscompilation issue introduced by the enablement of combined loop peeling and versioning. A test case that reproduces the issue is included in the patch. When performing loop peeling, GCC usually inserts a skip-vector check. This ensures that after peeling, there are enough remaining iterations to enter the main vectorized loop. Previously, the check was omitted if loop versioning for alignment was applied. It was safe before because versioning and peeling for alignment were mutually exclusive. However, with combined peeling and versioning enabled, this is not safe any more. A loop may be peeled and versioned at the same time. Without the skip-vector check, the main vectorized loop can be entered even if its iteration count is zero. This can cause the loop to run many more iterations than needed, resulting in incorrect results. To fix this, the patch updates the condition so that the skip-vector check is omitted only when versioning is performed alone, without peeling. gcc/ChangeLog: PR tree-optimization/121020 * tree-vect-loop-manip.cc (vect_do_peeling): Update the condition of omitting the skip-vector check. * tree-vectorizer.h (LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING): Add a helper macro. gcc/testsuite/ChangeLog: PR tree-optimization/121020 * gcc.dg/vect/vect-early-break_138-pr121020.c: New test.
2025-07-30vect: Fix insufficient alignment requirement for speculative loads [PR121190]Pengfei Li2-1/+63
This patch fixes a segmentation fault issue that can occur in vectorized loops with an early break. When GCC vectorizes such loops, it may insert a versioning check to ensure that data references (DRs) with speculative loads are aligned. The check normally requires DRs to be aligned to the vector mode size, which prevents generated vector load instructions from crossing page boundaries. However, this is not sufficient when a single scalar load is vectorized into multiple loads within the same iteration. In such cases, even if none of the vector loads crosses page boundaries, subsequent loads after the first one may still access memory beyond current valid page. Consider the following loop as an example: while (i < MAX_COMPARE) { if (*(p + i) != *(q + i)) return i; i++; } When compiled with "-O3 -march=znver2" on x86, the vectorized loop may include instructions like: vmovdqa (%rcx,%rax), %ymm0 vmovdqa 32(%rcx,%rax), %ymm1 vpcmpeqq (%rdx,%rax), %ymm0, %ymm0 vpcmpeqq 32(%rdx,%rax), %ymm1, %ymm1 Note two speculative vector loads are generated for each DR (p and q). The first vmovdqa and vpcmpeqq are safe due to the vector size (32-byte) alignment, but the following ones (at offset 32) may not be safe because they could read from the beginning of the next memory page, potentially leading to segmentation faults. To avoid the issue, this patch increases the alignment requirement for speculative loads to DR_TARGET_ALIGNMENT. It ensures all vector loads in the same vector iteration access memory within the same page. gcc/ChangeLog: PR tree-optimization/121190 * tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Increase alignment requirement for speculative loads. gcc/testsuite/ChangeLog: PR tree-optimization/121190 * gcc.dg/vect/vect-early-break_52.c: Update an unsafe test. * gcc.dg/vect/vect-early-break_137-pr121190.c: New test.
2025-07-30aarch64: Fix sme2+faminmax intrinsic gating (PR 121300)Alfie Richards1-0/+9
Fixes the feature gating for the SME2+FAMINMAX intrinsics. PR target/121300 gcc/ChangeLog: * config/aarch64/aarch64-sve-builtins-sme.def (svamin/svamax): Fix arch gating. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pr121300.c: New test.
2025-07-30aarch64: Add support for unpacked SVE FP conditional ternary arithmeticSpencer Abson10-12/+165
This patch extends the expander for fma, fnma, fms, and fnms to support partial SVE FP modes. We add the missing BF16 tests, which we can now trigger, having implemented the conditional expander. We also add tests for the 'merging with multiplicand' case, which this expander canonicalizes (albeit under SVE_STRICT_GP). gcc/ChangeLog: * config/aarch64/aarch64-sve.md (@cond_<optab><mode>): Extend to support partial FP modes. (*cond_<optab><mode>_2_strict): Extend from SVE_FULL_F to SVE_F, use aarch64_predicate_operand. (*cond_<optab><mode>_4_strict): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16, use aarch64_predicate_operand. (*cond_<optab><mode>_any_strict): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/unpacked_cond_fmla_1.c: Add test cases for merging with multiplicand. * gcc.target/aarch64/sve/unpacked_cond_fmls_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fmla_2.c: New test. * gcc.target/aarch64/sve/unpacked_cond_fmls_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmla_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmls_2.c: Likewise. * g++.target/aarch64/sve/unpacked_cond_ternary_bf16_1.C: Likewise. * g++.target/aarch64/sve/unpacked_cond_ternary_bf16_2.C: Likewise.
2025-07-30aarch64: Relaxed SEL combiner patterns for unpacked SVE FP ternary arithmeticSpencer Abson4-0/+188
Extend the ternary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/ SVE_FULL_F_BF to SVE_F/SVE_F_BF, where the strictness value is SVE_RELAXED_GP. We can only reliably test the 'merging with the third input' (addend) and 'independent value' patterns at this stage as the canocalisation that reorders the multiplicands based on the second SEL input would be performed by the conditional expander. Another difficulty is that we can't test these fused multiply/SEL combines without using __builtin_fma and friends. The reason for this is as follows: We support COND_ADD, COND_SUB, and COND_MUL optabs, so match.pd will canonicalize patterns like ADD/SUB/MUL combined with a VEC_COND_EXPR into these conditional forms. Later, when widening_mul tries to fold these into conditional fused multiply operations, the transformation fails - simply because we haven’t implemented those conditional fused multiply optabs yet. Hence why this patch lacks tests for BFloat16... gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_relaxed): Extend from SVE_FULL_F to SVE_F. (*cond_<optab><mode>_4_relaxed): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16. (*cond_<optab><mode>_any_relaxed): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/unpacked_cond_fmla_1.c: New test. * gcc.target/aarch64/sve/unpacked_cond_fmls_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c: Likewise.