path: root/gcc/config
Age | Commit message | Author | Files | Lines
2025-08-04 | aarch64: Use VNx16BI for sv(n)match* | Richard Sandiford | 1 | -2/+86
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the svmatch* and svnmatch* intrinsics. gcc/ * config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>): Split SVE2_MATCH pattern into a VNx16QI_ONLY define_insn and a VNx8HI_ONLY define_expand. Use a VNx16BI destination for the latter. (*aarch64_pred_<sve_int_op><mode>): New SVE2_MATCH pattern for VNx8HI_ONLY. (*aarch64_pred_<sve_int_op><mode>_cc): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve2/acle/general/match_4.c: New test. * gcc.target/aarch64/sve2/acle/general/nmatch_1.c: Likewise.
2025-08-04 | aarch64: Use VNx16BI for svac* | Richard Sandiford | 2 | -17/+49
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the svac* intrinsics (floating-point compare absolute). gcc/ * config/aarch64/aarch64-sve.md (@aarch64_pred_fac<cmp_op><mode>): Replace with... (@aarch64_pred_fac<cmp_op><mode>_acle): ...this new expander. (*aarch64_pred_fac<cmp_op><mode>_strict_acle): New pattern. * config/aarch64/aarch64-sve-builtins-base.cc (svac_impl::expand): Update accordingly. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/acge_1.c: New test. * gcc.target/aarch64/sve/acle/general/acgt_1.c: Likewise. * gcc.target/aarch64/sve/acle/general/acle_1.c: Likewise. * gcc.target/aarch64/sve/acle/general/aclt_1.c: Likewise.
2025-08-04 | aarch64: Use VNx16BI for floating-point svcmp* | Richard Sandiford | 2 | -2/+74
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the floating-point forms of svcmp*. gcc/ * config/aarch64/aarch64-sve.md (@aarch64_pred_fcm<cmp_op><mode>_acle) (*aarch64_pred_fcm<cmp_op><mode>_acle, @aarch64_pred_fcmuo<mode>_acle) (*aarch64_pred_fcmuo<mode>_acle): New patterns. * config/aarch64/aarch64-sve-builtins-base.cc (svcmp_impl::expand, svcmpuo_impl::expand): Use them. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/cmpeq_6.c: New test. * gcc.target/aarch64/sve/acle/general/cmpge_9.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpgt_9.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmple_9.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmplt_9.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpne_5.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpuo_1.c: Likewise.
2025-08-04 | aarch64: Use VNx16BI for svcmp*_wide | Richard Sandiford | 1 | -1/+88
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the svcmp*_wide intrinsics. Since the only uses of these patterns are for ACLE intrinsics, there didn't seem much point adding an "_acle" suffix. gcc/ * config/aarch64/aarch64-sve.md (@aarch64_pred_cmp<cmp_op><mode>_wide): Split into VNx16QI_ONLY and SVE_FULL_HSI patterns. Use VNx16BI results for both. (*aarch64_pred_cmp<cmp_op><mode>_wide): New pattern. (*aarch64_pred_cmp<cmp_op><mode>_wide_cc): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/cmpeq_5.c: New test. * gcc.target/aarch64/sve/acle/general/cmpge_7.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpge_8.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpgt_7.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpgt_8.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmple_7.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmple_8.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmplt_7.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmplt_8.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpne_4.c: Likewise.
2025-08-04 | aarch64: Drop unnecessary GPs in svcmp_wide PTEST patterns | Richard Sandiford | 1 | -2/+12
Patterns that fuse a predicate operation P with a PTEST use aarch64_sve_same_pred_for_ptest_p to test whether the governing predicates of P and the PTEST are compatible. Most patterns were also written as define_insn_and_rewrites, with the rewrite replacing P's original governing predicate with PTEST's. This ensures that we don't, for example, have both a .H PTRUE for the PTEST and a .B PTRUE for a comparison that feeds the PTEST. The svcmp_wide* patterns were missing this rewrite, meaning that we did have redundant PTRUEs. gcc/ * config/aarch64/aarch64-sve.md (*aarch64_pred_cmp<cmp_op><mode>_wide_cc): Turn into a define_insn_and_rewrite and rewrite the governing predicate of the comparison so that it is identical to the PTEST's. (*aarch64_pred_cmp<cmp_op><mode>_wide_ptest): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/cmpeq_1.c: Check the number of PTRUEs. * gcc.target/aarch64/sve/acle/general/cmpge_5.c: New test. * gcc.target/aarch64/sve/acle/general/cmpge_6.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpgt_5.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpgt_6.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmple_5.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmple_6.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmplt_5.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmplt_6.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpne_3.c: Likewise.
2025-08-04 | aarch64: Use the correct GP mode in the svcmp_wide patterns | Richard Sandiford | 1 | -3/+3
The patterns for the svcmp_wide intrinsics used a VNx16BI input predicate for all modes, instead of the usual <VPRED>. That unnecessarily made some input bits significant, but more importantly, it triggered an ICE in aarch64_sve_same_pred_for_ptest_p when testing whether a comparison pattern could be fused with a PTEST. A later patch will add tests for other comparisons. gcc/ * config/aarch64/aarch64-sve.md (@aarch64_pred_cmp<cmp_op><mode>_wide) (*aarch64_pred_cmp<cmp_op><mode>_wide_cc): Use <VPRED> instead of VNx16BI for the governing predicate. (*aarch64_pred_cmp<cmp_op><mode>_wide_ptest): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/cmpeq_1.c: Add more tests.
2025-08-04 | aarch64: Use VNx16BI for non-widening integer svcmp* | Richard Sandiford | 2 | -2/+148
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the non-widening integer forms of svcmp*. The handling of the PTEST patterns is similar to that for the earlier svwhile* patch. Unfortunately, on its own, this triggers a failure in the pred_clobber_*.c tests. The problem is that, after the patch, we have a comparison instruction followed by a move into p0. Combine combines the instructions together, so that the destination of the comparison is the hard register p0 rather than a pseudo. This defeats IRA's make_early_clobber_and_input_conflicts, which requires the source and destination to be pseudo registers. Before the patch, there was a subreg move between the comparison and the move into p0, so it was that subreg move that ended up with a hard register destination. Arguably the fix for PR87600 should be extended to destination registers as well as source registers, but in the meantime, the patch just disables combine for these tests. The tests are really testing the constraints and register allocation. gcc/ * config/aarch64/aarch64-sve.md (@aarch64_pred_cmp<cmp_op><mode>_acle) (*aarch64_pred_cmp<cmp_op><mode>_acle, *cmp<cmp_op><mode>_acle_cc) (*cmp<cmp_op><mode>_acle_and): New patterns that yield VNx16BI results for all element types. * config/aarch64/aarch64-sve-builtins-base.cc (svcmp_impl::expand): Use them. (svcmp_wide_impl::expand): Likewise when implementing an svcmp_wide against an in-range constant. gcc/testsuite/ * gcc.target/aarch64/sve/pred_clobber_1.c: Disable combine. * gcc.target/aarch64/sve/pred_clobber_2.c: Likewise. * gcc.target/aarch64/sve/pred_clobber_3.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpeq_2.c: Add more cases. * gcc.target/aarch64/sve/acle/general/cmpeq_4.c: New test. * gcc.target/aarch64/sve/acle/general/cmpge_1.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpge_2.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpge_3.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpge_4.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpgt_1.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpgt_2.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpgt_3.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpgt_4.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmple_1.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmple_2.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmple_3.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmple_4.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmplt_1.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmplt_2.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmplt_3.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmplt_4.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpne_1.c: Likewise. * gcc.target/aarch64/sve/acle/general/cmpne_2.c: Likewise.
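One way to disable combine for an individual test, shown here only as a sketch (the actual testsuite change may use a different mechanism), is GCC's per-pass disabling option:

    /* Sketch only: -fdisable-rtl-<pass> switches off a single RTL pass, so
       adding it to the test options keeps combine out of the picture while the
       constraints and register allocation are still exercised.  */
    /* { dg-additional-options "-fdisable-rtl-combine" } */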
2025-08-04 | aarch64: Use VNx16BI for svunpklo/hi_b | Richard Sandiford | 2 | -1/+29
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the svunpk* intrinsics. gcc/ * config/aarch64/aarch64-sve.md (@aarch64_sve_punpk<perm_hilo>_acle) (*aarch64_sve_punpk<perm_hilo>_acle): New patterns. * config/aarch64/aarch64-sve-builtins-base.cc (svunpk_impl::expand): Use them for boolean svunpk*. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/unpkhi_1.c: New test. * gcc.target/aarch64/sve/acle/general/unpklo_1.c: Likewise.
2025-08-04 | aarch64: Use VNx16BI for svrev_b* [PR121294] | Richard Sandiford | 3 | -2/+29
The previous patch for PR121294 handled svtrn1/2, svuzp1/2, and svzip1/2. This one extends it to handle svrev intrinsics, where the same kind of wrong code can be generated. gcc/ PR target/121294 * config/aarch64/aarch64.md (UNSPEC_REV_PRED): New unspec. * config/aarch64/aarch64-sve.md (@aarch64_sve_rev<mode>_acle) (*aarch64_sve_rev<mode>_acle): New patterns. * config/aarch64/aarch64-sve-builtins-base.cc (svrev_impl::expand): Use the new patterns for boolean svrev. gcc/testsuite/ PR target/121294 * gcc.target/aarch64/sve/acle/general/rev_2.c: New test.
2025-08-04 | aarch64: Use VNx16BI for more permutations [PR121294] | Richard Sandiford | 4 | -12/+37
The patterns for the predicate forms of svtrn1/2, svuzp1/2, and svzip1/2 are shared with aarch64_vectorize_vec_perm_const. The .H, .S, and .D forms operate on VNx8BI, VNx4BI, and VNx2BI respectively. Thus, for all four element widths, there is one significant bit per element, for both the inputs and the output. That's appropriate for aarch64_vectorize_vec_perm_const but not for the ACLE intrinsics, where every bit of the output is significant, and where every bit of the selected input elements is therefore also significant. The current expansion can lead the optimisers to simplify inputs by changing the upper bits of the input elements (since the current patterns claim that those bits don't matter), which in turn leads to wrong code. The ACLE expansion should operate on VNx16BI instead, for all element widths. There was already a pattern for a VNx16BI-only form of TRN1, for constructing certain predicate constants. The patch generalises it to handle the other five permutations as well. For the reasons given in the comments, this is done by making the permutation unspec an operand to a new UNSPEC_PERMUTE_PRED, rather than overloading the existing unspecs, and rather than adding a new unspec for each permutation. gcc/ PR target/121294 * config/aarch64/iterators.md (UNSPEC_TRN1_CONV): Delete. (UNSPEC_PERMUTE_PRED): New unspec. * config/aarch64/aarch64-sve.md (@aarch64_sve_trn1_conv<mode>): Replace with... (@aarch64_sve_<perm_insn><mode>_acle) (*aarch64_sve_<perm_insn><mode>_acle): ...these new patterns. * config/aarch64/aarch64.cc (aarch64_expand_sve_const_pred_trn): Update accordingly. * config/aarch64/aarch64-sve-builtins-functions.h (binary_permute::expand): Use the new _acle patterns for predicate operations. gcc/testsuite/ PR target/121294 * gcc.target/aarch64/sve/acle/general/perm_2.c: New test. * gcc.target/aarch64/sve/acle/general/perm_3.c: Likewise. * gcc.target/aarch64/sve/acle/general/perm_4.c: Likewise. * gcc.target/aarch64/sve/acle/general/perm_5.c: Likewise. * gcc.target/aarch64/sve/acle/general/perm_6.c: Likewise. * gcc.target/aarch64/sve/acle/general/perm_7.c: Likewise.
2025-08-04 | aarch64: Use VNx16BI for more SVE WHILE* results [PR121118] | Richard Sandiford | 6 | -5/+99
PR121118 is about a case where we try to construct a predicate constant using a permutation of a PFALSE and a WHILELO. The WHILELO is a .H operation and its result has mode VNx8BI. However, the permute instruction expects both inputs to be VNx16BI, leading to an unrecognisable insn ICE. VNx8BI is effectively a form of VNx16BI in which every odd-indexed bit is insignificant. In the PR's testcase that's OK, since those bits will be dropped by the permutation. But if the WHILELO had been a VNx4BI, so that only every fourth bit is significant, the input to the permutation would have had undefined bits. The testcase in the patch has an example of this. This feeds into a related ACLE problem that I'd been meaning to fix for a long time: every bit of an svbool_t result is significant, and so every ACLE intrinsic that returns an svbool_t should return a VNx16BI. That doesn't currently happen for ACLE svwhile* intrinsics. This patch fixes both issues together. We still need to keep the current WHILE* patterns for autovectorisation, where the result mode should match the element width. The patch therefore adds a new set of patterns that are defined to return VNx16BI instead. For want of a better scheme, it uses an "_acle" suffix to distinguish these new patterns from the "normal" ones. The formulation used is: (and:VNx16BI (subreg:VNx16BI normal-pattern 0) C) where C has mode VNx16BI and is a canonical ptrue for normal-pattern's element width (so that the low bit of each element is set and the upper bits are clear). This is a bit clunky, and leads to some repetition. But it has two advantages: * After g:965564eafb721f8000013a3112f1bba8d8fae32b, converting the above expression back to normal-pattern's mode will reduce to normal-pattern, so that the pattern for testing the result using a PTEST doesn't change. * It gives RTL optimisers a bit more information, as the new tests demonstrate. In the expression above, C is matched using a new "special" predicate aarch64_ptrue_all_operand, where "special" means that the mode on the predicate is not necessarily the mode of the expression. In this case, C always has mode VNx16BI, but the mode on the predicate indicates which kind of canonical PTRUE is needed. gcc/ PR testsuite/121118 * config/aarch64/iterators.md (VNx16BI_ONLY): New mode iterator. * config/aarch64/predicates.md (aarch64_ptrue_all_operand): New predicate. * config/aarch64/aarch64-sve.md (@aarch64_sve_while_<while_optab_cmp><GPI:mode><VNx16BI_ONLY:mode>_acle) (@aarch64_sve_while_<while_optab_cmp><GPI:mode><PRED_HSD:mode>_acle) (*aarch64_sve_while_<while_optab_cmp><GPI:mode><PRED_HSD:mode>_acle) (*while_<while_optab_cmp><GPI:mode><PRED_HSD:mode>_acle_cc): New patterns. * config/aarch64/aarch64-sve-builtins-functions.h (while_comparison::expand): Use the new _acle patterns that always return a VNx16BI. * config/aarch64/aarch64-sve-builtins-sve2.cc (svwhilerw_svwhilewr_impl::expand): Likewise. * config/aarch64/aarch64.cc (aarch64_sve_move_pred_via_while): Likewise. gcc/testsuite/ PR testsuite/121118 * gcc.target/aarch64/sve/acle/general/pr121118_1.c: New test. * gcc.target/aarch64/sve/acle/general/whilele_13.c: Likewise. * gcc.target/aarch64/sve/acle/general/whilelt_6.c: Likewise. * gcc.target/aarch64/sve2/acle/general/whilege_1.c: Likewise. * gcc.target/aarch64/sve2/acle/general/whilegt_1.c: Likewise. * gcc.target/aarch64/sve2/acle/general/whilerw_5.c: Likewise. * gcc.target/aarch64/sve2/acle/general/whilewr_5.c: Likewise.
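As a rough sketch of the formulation above (pseudo-RTL for illustration only, not the actual pattern text in aarch64-sve.md), the VNx16BI-returning form of a .S (VNx4BI) WHILELO looks along these lines:

    (set (match_operand:VNx16BI 0 "register_operand")
         (and:VNx16BI
           (subreg:VNx16BI
             (unspec:VNx4BI [(match_operand:DI 1) (match_operand:DI 2)]
                            UNSPEC_WHILELO) 0)
           (match_operand:VNx16BI 3 "aarch64_ptrue_all_operand")))

Here operands 1 and 2 are the scalar loop bounds, and operand 3 is the canonical .S PTRUE { 1, 0, 0, 0, 1, 0, 0, 0, ... }, so that the bits VNx4BI treats as insignificant are known to be zero in the VNx16BI result.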
2025-08-04 | aarch64: Improve svdupq_lane expansion for big-endian [PR121293] | Richard Sandiford | 1 | -2/+3
If the index to svdupq_lane is variable, or is outside the range of the .Q form of DUP, the fallback expansion is to convert to VNx2DI and use TBL. The problem in this PR was that the conversion used subregs, and on big-endian targets, a bitcast from VNx2DI to another element size requires a REV[BHW] in the best case or a spill and reload in the worst case. (See the comment at the head of aarch64-sve.md for details.) Here we want the conversion to act like svreinterpret, so it should use aarch64_sve_reinterpret instead of subregs. gcc/ PR target/121293 * config/aarch64/aarch64-sve-builtins-base.cc (svdupq_lane::expand): Use aarch64_sve_reinterpret instead of subregs. Explicitly reinterpret the result back to the required mode, rather than leaving the caller to take a subreg. gcc/testsuite/ PR target/121293 * gcc.target/aarch64/sve/acle/general/dupq_lane_9.c: New test.
2025-08-03 | x86: Don't hoist non all 0s/1s vector set outside of loop | H.J. Lu | 1 | -50/+57
Don't hoist non all 0s/1s vector set outside of the loop to avoid extra spills. gcc/ PR target/120941 * config/i386/i386-features.cc (x86_cse_kind): Moved before ix86_place_single_vector_set. (redundant_load): Likewise. (ix86_place_single_vector_set): Replace the last argument with a pointer to redundant_load. For X86_CSE_VEC_DUP, don't place the vector set outside of the loop to avoid extra spills. (remove_redundant_vector_load): Pass load to ix86_place_single_vector_set. gcc/testsuite/ PR target/120941 * gcc.target/i386/pr120941-1.c: New test. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-03 | AVR: Use avr_add_ccclobber / DONE_ADD_CCC in md instead of repeats. | Georg-Johann Lay | 3 | -961/+436
There are many post-reload define_insn_and_split's that just append a (clobber (reg:CC REG_CC)) to the pattern. Instead of repeating the original patterns, avr_add_ccclobber (curr_insn) is used to do that job. This avoids repeating patterns all over the place, and splits that do something different (like using a canonical form) stand out clearly. gcc/ * config/avr/avr.md (define_insn_and_split) [reload_completed]: For splits that just append a (clobber (reg:CC REG_CC)) to the pattern, use avr_add_ccclobber (curr_insn) instead of repeating the original pattern. * config/avr/avr-dimode.md: Same. * config/avr/avr-fixed.md: Same.
2025-08-03 | AVR: Add avr.cc::avr_add_ccclobber(). | Georg-Johann Lay | 2 | -0/+25
gcc/ * config/avr/avr.cc (avr_add_ccclobber): New function. * config/avr/avr-protos.h (avr_add_ccclobber): New proto. (DONE_ADD_CCC): New define.
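As a rough sketch of the helper's job (an assumed implementation for illustration only; the real code in avr.cc may differ in detail), the idea is to re-emit the insn's existing pattern wrapped in a PARALLEL together with a clobber of REG_CC, so that a post-reload split does not have to restate its whole pattern:

    /* Assumed sketch, not the committed implementation.  */
    void
    avr_add_ccclobber (rtx_insn *insn)
    {
      rtx ccc = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, REG_CC));
      rtx par = gen_rtx_PARALLEL (VOIDmode,
                                  gen_rtvec (2, copy_rtx (PATTERN (insn)), ccc));
      emit_insn (par);
    }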
2025-07-31 | AVR: avr.opt.urls: Add -mfuse-move2 | Georg-Johann Lay | 1 | -0/+3
PR rtl-optimization/121340 gcc/ * config/avr/avr.opt.urls (-mfuse-move2): Add url.
2025-07-31 | AVR: Set .type of jump table label. | Georg-Johann Lay | 1 | -0/+7
gcc/ * config/avr/avr.cc (avr_output_addr_vec) <labl>: Asm out its .type.
2025-07-31 | AVR: rtl-optimization/121340 - New mini-pass to undo superfluous moves from insn combine | Georg-Johann Lay | 4 | -0/+152
Insn combine may come up with superfluous reg-reg moves, where the combine people say that these are no problem since reg-alloc is supposed to optimize them. The issue is that the lower-subreg pass sitting between combine and reg-alloc may split such moves, coming up with a zoo of subregs which are only handled poorly by the register allocator. This patch adds a new avr mini-pass that handles such cases. As an example, take

    int f_ffssi (long x)
    {
        return __builtin_ffsl (x);
    }

where the two functions have the same interface, i.e. there are no extra moves required for the argument or for the return value. However,

    $ avr-gcc -S -Os -dp -mno-fuse-move ...

    f_ffssi:
        mov r20,r22      ;  29  [c=4 l=1]  movqi_insn/0
        mov r21,r23      ;  30  [c=4 l=1]  movqi_insn/0
        mov r22,r24      ;  31  [c=4 l=1]  movqi_insn/0
        mov r23,r25      ;  32  [c=4 l=1]  movqi_insn/0
        mov r25,r23      ;  33  [c=4 l=4]  *movsi/0
        mov r24,r22
        mov r23,r21
        mov r22,r20
        rcall __ffssi2   ;  34  [c=16 l=1]  *ffssihi2.libgcc
        ret              ;  37  [c=0 l=1]  return

where all the moves add up to a no-op. The -mno-fuse-move option stops any attempts by the avr backend to clean up that mess.

PR rtl-optimization/121340
gcc/
    * config/avr/avr.opt (-mfuse-move2): New option.
    * config/avr/avr-passes.def (avr_pass_2moves): Insert after combine.
    * config/avr/avr-passes.cc (make_avr_pass_2moves): New function. (pass_data avr_pass_data_2moves): New static variable. (avr_pass_2moves): New rtl_opt_pass.
    * config/avr/avr-protos.h (make_avr_pass_2moves): New proto.
    * common/config/avr/avr-common.cc (default_options avr_option_optimization_table) <-mfuse-move2>: Set for -O1 and higher.
    * doc/invoke.texi (AVR Options) <-mfuse-move2>: Document.
2025-07-31 | libgcc: Update FMV features to latest ACLE spec 2024Q4 | Wilco Dijkstra | 1 | -6/+6
Update FMV features to latest ACLE spec of 2024Q4 - several features have been removed or merged. Add FMV support for CSSC and MOPS. Preserve the ordering in enum CPUFeatures. gcc: * common/config/aarch64/cpuinfo.h: Remove unused features, add FEAT_CSSC and FEAT_MOPS. * config/aarch64/aarch64-option-extensions.def: Remove FMV support for RPRES, use PMULL rather than AES, add FMV support for CSSC and MOPS. libgcc: * config/aarch64/cpuinfo.c (__init_cpu_features_constructor): Remove unused features, add support for CSSC and MOPS.
2025-07-31 | AArch64: Use correct cost for shifted halfword load/stores | Wilco Dijkstra | 1 | -1/+1
Since all Armv9 cores support shifted LDRH/STRH, use the correct cost of zero for these. gcc: * config/aarch64/tuning_models/generic_armv9_a.h (generic_armv9_a_addrcost_table): Use zero cost for himode.
2025-07-31 | i386: Fix typo in diagnostic about simultaneous regparm and thiscall use | Artemiy Granat | 1 | -1/+1
gcc/ChangeLog: * config/i386/i386-options.cc (ix86_handle_cconv_attribute): Fix typo.
2025-07-31 | i386: Fix incorrect handling of simultaneous regparm and thiscall use | Artemiy Granat | 1 | -0/+4
gcc/ChangeLog: * config/i386/i386-options.cc (ix86_handle_cconv_attribute): Handle simultaneous use of regparm and thiscall attributes in case when regparm is set before thiscall. gcc/testsuite/ChangeLog: * gcc.target/i386/attributes-error.c: Add more attributes combinations.
2025-07-31 | i386: Fix incorrect comment about stdcall and fastcall compatibility | Artemiy Granat | 1 | -3/+2
gcc/ChangeLog: * config/i386/i386-options.cc (ix86_handle_cconv_attribute): Fix comments which state that combination of stdcall and fastcall attributes is valid but redundant.
2025-07-31 | i386: Ignore regparm attribute and warn for it in 64-bit mode | Artemiy Granat | 1 | -12/+12
The regparm attribute does not affect code generation on the x86-64 target. Despite this, regparm was accepted silently, unlike other calling convention attributes handled in the ix86_handle_cconv_attribute function. Due to the lack of diagnostics, the Linux kernel attempted to specify regparm(0) on the vmread_error_trampoline declaration, which is supposed to be invoked with all arguments on the stack: https://lore.kernel.org/all/20220928232015.745948-1-seanjc@google.com/ To produce a warning for regparm in 64-bit mode, simply move the block that produces diagnostics above the block that handles the regparm attribute. gcc/ChangeLog: * config/i386/i386-options.cc (ix86_handle_cconv_attribute): Move 64-bit mode check before regparm handling. gcc/testsuite/ChangeLog: * g++.dg/abi/regparm1.C: Require ia32 target. * gcc.target/i386/20020224-1.c: Likewise. * gcc.target/i386/pr103785.c: Use regparm attribute only if not in 64-bit mode. * gcc.target/i386/pr36533.c: Likewise. * gcc.target/i386/pr59099.c: Likewise. * gcc.target/i386/sibcall-8.c: Likewise. * gcc.target/i386/sw-1.c: Likewise. * gcc.target/i386/pr15184-2.c: Fix invalid comment. * gcc.target/i386/attributes-ignore.c: New test.
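As an illustration (a hypothetical snippet, not taken from the patch or from the kernel), this is the kind of declaration that used to be accepted silently and now draws a warning when compiling for x86-64:

    /* regparm has no effect on x86-64; with this change the attribute is
       warned about and ignored instead of being silently accepted.  */
    __attribute__((regparm(0)))
    void trampoline (void);     /* warning in 64-bit mode, accepted for ia32 */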
2025-07-31 | aarch64: Stop using sys/ifunc.h header in libatomic and libgcc | Yury Khrustalev | 1 | -0/+12
This optional header is used to bring in the definition of the struct __ifunc_arg_t type. Since it has been added to glibc only recently, the previous implementation had to check whether this header is present and, if not, provide its own definition. This created dead code, because one of the two code paths would never be tested. The ABI specification for ifunc resolvers allows creating one's own ABI-compatible definition of this type, which is the right way of doing it. In addition to improving consistency, the new approach also helps with adding new fields to the struct __ifunc_arg_t type without having to work around situations where the definition imported from the header lacks these new fields. The ABI allows defining as many hwcap fields in this struct as needed, provided that at runtime we only access the fields permitted by the _size value. gcc/ * config/aarch64/aarch64.cc (build_ifunc_arg_type): Add new fields _hwcap3 and _hwcap4. libatomic/ * config/linux/aarch64/host-config.h (__ifunc_arg_t): Remove sys/ifunc.h and add new fields _hwcap3 and _hwcap4. libgcc/ * config/aarch64/cpuinfo.c (__ifunc_arg_t): Likewise. (__init_cpu_features): Obtain and assign values for the fields _hwcap3 and _hwcap4. (__init_cpu_features_constructor): Check _size in the arg argument.
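A minimal sketch of such a local, ABI-compatible definition (field layout assumed from the description above; the actual libgcc/libatomic code may differ):

    /* Assumed sketch of an ABI-compatible local definition.  */
    typedef struct __ifunc_arg_t
    {
      unsigned long _size;    /* Size of the struct, checked at runtime.  */
      unsigned long _hwcap;
      unsigned long _hwcap2;
      unsigned long _hwcap3;  /* New fields covered by this change; only    */
      unsigned long _hwcap4;  /* accessed when _size says they are present. */
    } __ifunc_arg_t;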
2025-07-31 | rs6000: Avoid undefined behavior caused by overflow and invalid shifts | Kishan Parmar | 2 | -11/+16
While building GCC with --with-build-config=bootstrap-ubsan on powerpc64le-unknown-linux-gnu, multiple UBSAN runtime errors were encountered in rs6000.cc and rs6000.md due to undefined behavior involving left shifts on negative values and shift exponents equal to or exceeding the type width. The issue was in bit pattern recognition code (in can_be_rotated_to_negative_lis and can_be_built_by_li_and_rldic), where signed values were shifted without handling negative inputs or guarding against shift counts equal to the type width, causing UB. The fix ensures shifts and rotations are done on unsigned HOST_WIDE_INT, casting back only where needed (like for arithmetic right shifts), with proper guards to prevent shift-by-64. 2025-07-31 Kishan Parmar <kishan@linux.ibm.com> gcc: PR target/118890 * config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): Avoid left shift of negative value and guard shift count. (can_be_built_by_li_and_rldic): Likewise. (rs6000_emit_set_long_const): Likewise. * config/rs6000/rs6000.md (splitter for plus into two 16-bit parts): Fix UB from overflow in addition.
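The idiom being applied, as a self-contained sketch (not the exact rs6000.cc code):

    #include <stdint.h>

    /* Illustrative only: do the shift on an unsigned 64-bit value and guard
       against a shift count equal to the type width, so neither a negative
       left-shift operand nor a shift by 64 triggers undefined behavior.  */
    static uint64_t
    safe_shl (int64_t value, unsigned int count)
    {
      if (count >= 64)
        return 0;
      return (uint64_t) value << count;
    }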
2025-07-31 | Add checks for node in aarch64 vector cost modeling | Richard Biener | 1 | -1/+3
After removing STMT_VINFO_MEMORY_ACCESS_TYPE we now ICE when costing for scalar stmts required in the epilog since the cost model tries to pattern-match gathers (an earlier patch tried to improve this by introducing stmt groups, but that was on hold due to negative feedback). The following short-cuts those attempts when node is NULL, as that then cannot be a vector stmt. Another possibility would be to gate on vect_body, or restructure everything. Note we now ensure that, when m_costing_for_scalar, node is NULL. * config/aarch64/aarch64.cc (aarch64_detect_vector_stmt_subtype): Check for node before dereferencing. (aarch64_vector_costs::add_stmt_cost): Likewise.
2025-07-31 | aarch64: Prevent streaming-compatible code from assembler rejection [PR121028] | Spencer Abson | 1 | -2/+10
Streaming-compatible functions can be compiled without SME enabled, but need to use "SMSTART SM" and "SMSTOP SM" to temporarily switch into the streaming state of a callee. These switches are conditional on the current mode being opposite to the target mode, so no SME instructions are executed if SME is not available. However, in GAS, "SMSTART SM" and "SMSTOP SM" always require +sme. A call from a streaming-compatible function, compiled without SME enabled, to a non-streaming function will be rejected as: Error: selected processor does not support `smstop sm'. To work around this, we make use of the .inst directive to insert the literal encodings of "SMSTART SM" and "SMSTOP SM". gcc/ChangeLog: PR target/121028 * config/aarch64/aarch64-sme.md (aarch64_smstart_sm): Use the .inst directive if !TARGET_SME. (aarch64_smstop_sm): Likewise. gcc/testsuite/ChangeLog: PR target/121028 * gcc.target/aarch64/sme/call_sm_switch_1.c: Tell check-function-bodies not to ignore .inst directives, and replace the test for "smstart sm" with one for its encoding. * gcc.target/aarch64/sme/call_sm_switch_11.c: Likewise. * gcc.target/aarch64/sme/pr121028.c: New test.
2025-07-31 | Remove STMT_VINFO_MEMORY_ACCESS_TYPE | Richard Biener | 2 | -15/+16
This should be present only on SLP nodes now. The RISC-V changes are mechanical, along the lines of the SLP_TREE_TYPE changes. * tree-vectorizer.h (_stmt_vec_info::memory_access_type): Remove. (STMT_VINFO_MEMORY_ACCESS_TYPE): Likewise. (vect_mem_access_type): Likewise. * tree-vect-stmts.cc (vectorizable_store): Do not set STMT_VINFO_MEMORY_ACCESS_TYPE. Fix SLP_TREE_MEMORY_ACCESS_TYPE usage. * tree-vect-loop.cc (update_epilogue_loop_vinfo): Remove checking of memory access type. * config/riscv/riscv-vector-costs.cc (costs::compute_local_live_ranges): Use SLP_TREE_MEMORY_ACCESS_TYPE. (costs::need_additional_vector_vars_p): Likewise. (segment_loadstore_group_size): Get SLP node as argument, use SLP_TREE_MEMORY_ACCESS_TYPE. (costs::adjust_stmt_cost): Pass down SLP node. * config/aarch64/aarch64.cc (aarch64_ld234_st234_vectors): Use SLP_TREE_MEMORY_ACCESS_TYPE instead of vect_mem_access_type. (aarch64_detect_vector_stmt_subtype): Likewise. (aarch64_vector_costs::count_ops): Likewise. (aarch64_vector_costs::add_stmt_cost): Likewise.
2025-07-31 | Fix comment typos - hanlde -> handle | Jakub Jelinek | 2 | -3/+3
2025-07-31 Jakub Jelinek <jakub@redhat.com> * gimple-ssa-store-merging.cc (find_bswap_or_nop): Fix comment typos, hanlde -> handle. * config/i386/i386.cc (ix86_gimple_fold_builtin, ix86_rtx_costs): Likewise. * config/i386/i386-features.cc (remove_partial_avx_dependency): Likewise. * gcc.target/i386/apx-1.c (apx_hanlder): Rename to ... (apx_handler): ... this. * gcc.target/i386/uintr-2.c (UINTR_hanlder): Rename to ... (UINTR_handler): ... this. * gcc.target/i386/uintr-5.c (UINTR_hanlder): Rename to ... (UINTR_handler): ... this.
2025-07-31 | RISC-V: Adding H to the canonical order [PR121312] | Kito Cheng | 1 | -1/+1
We added H to the canonical order before, but forgot to add it to arch-canonicalize as well... gcc/ChangeLog: PR target/121312 * config/riscv/arch-canonicalize: Add H extension to the canonical order.
2025-07-30 | [x86] factor out worker from ix86_builtin_vectorization_cost | Richard Biener | 1 | -18/+23
The following factors out a worker that gets a mode argument rather than a vectype argument. That makes a difference when we hit the fallback in add_stmt_cost for scalar stmts where vectype might be NULL and thus mode is derived from the scalar stmt there. But ix86_builtin_vectorization_cost does not have access to the stmt. So the patch instead dispatches to the new ix86_default_vector_cost there, passing down the mode we derived from the stmt. This is to avoid regressions with a patch that makes even more scalar stmt costings have a vectype passed. * config/i386/i386.cc (ix86_default_vector_cost): Split out from ... (ix86_builtin_vectorization_cost): ... this and use mode instead of vectype as argument. (ix86_vector_costs::add_stmt_cost): Call ix86_default_vector_cost instead of ix86_builtin_vectorization_cost.
2025-07-30 | s390: Implement spaceship optab [PR117015] | Stefan Schulze Frielinghaus | 3 | -0/+184
gcc/ChangeLog: PR target/117015 * config/s390/s390-protos.h (s390_expand_int_spaceship): New function. (s390_expand_fp_spaceship): New function. * config/s390/s390.cc (s390_expand_int_spaceship): New function. (s390_expand_fp_spaceship): New function. * config/s390/s390.md (spaceship<mode>4): New expander. gcc/testsuite/ChangeLog: * gcc.target/s390/spaceship-fp-1.c: New test. * gcc.target/s390/spaceship-fp-2.c: New test. * gcc.target/s390/spaceship-fp-3.c: New test. * gcc.target/s390/spaceship-fp-4.c: New test. * gcc.target/s390/spaceship-int-1.c: New test. * gcc.target/s390/spaceship-int-2.c: New test. * gcc.target/s390/spaceship-int-3.c: New test.
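For context, a generic example of the source construct the new spaceship<mode>4 expander targets (illustrative C++, not taken from the new tests):

    #include <compare>

    // The three-way comparison below maps onto the spaceship optab,
    // which s390 can now expand directly instead of via branches.
    std::partial_ordering cmp (double a, double b) { return a <=> b; }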
2025-07-30x86: Transform to "pushq $-1; popq reg" for -OzH.J. Lu1-1/+2
    commit 4c80062d7b8c272e2e193b8074a8440dbb4fe588
    Author: H.J. Lu <hjl.tools@gmail.com>
    Date:   Sun May 25 07:40:29 2025 +0800

        x86: Enable *mov<mode>_(and|or) only for -Oz

disabled transformation from "movq $-1,reg" to "pushq $-1; popq reg" for -Oz. But for legacy integer registers, the former is 4 bytes and the latter is 3 bytes. Enable such transformation for -Oz. gcc/ PR target/120427 * config/i386/i386.md (peephole2): Transform "movq $-1,reg" to "pushq $-1; popq reg" for -Oz if reg is a legacy integer register. gcc/testsuite/ PR target/120427 * gcc.target/i386/pr120427-5.c: New test. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-07-30 | vect: Add target hook to prefer gather/scatter instructions | Andrew Stubbs | 1 | -0/+12
For AMD GCN, the instructions available for loading/storing vectors are always scatter/gather operations (i.e. there are separate addresses for each vector lane), so the current heuristic to avoid gather/scatter operations with too many elements in get_group_load_store_type is counterproductive. Avoiding such operations in that function can subsequently lead to a missed vectorization opportunity whereby later analyses in the vectorizer try to use a very wide array type which is not available on this target, and thus it bails out. This patch adds a target hook to override the "single_element_p" heuristic in that function, and activates it for GCN. This allows much better code to be generated for affected loops. Co-authored-by: Julian Brown <julian@codesourcery.com> gcc/ * doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add documentation hook. * doc/tm.texi: Regenerate. * target.def (prefer_gather_scatter): Add target hook under vectorizer. * hooks.cc (hook_bool_mode_int_unsigned_false): New function. * hooks.h (hook_bool_mode_int_unsigned_false): New prototype. * tree-vect-stmts.cc (vect_use_strided_gather_scatters_p): Add parameters group_size and single_element_p, and rework to use targetm.vectorize.prefer_gather_scatter. (get_group_load_store_type): Move some of the condition into vect_use_strided_gather_scatters_p. * config/gcn/gcn.cc (gcn_prefer_gather_scatter): New function. (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define hook.
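Given the hook_bool_mode_int_unsigned_false default named above, the GCN override presumably just returns true. A sketch (parameter meanings assumed; not the committed code):

    /* Sketch only: GCN's vector loads/stores are per-lane addressed anyway,
       so always prefer gather/scatter regardless of mode or group size.  */
    static bool
    gcn_prefer_gather_scatter (machine_mode /*mode*/, int /*scale*/,
                               unsigned int /*group_size*/)
    {
      return true;
    }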
2025-07-30 | Don't pass vector params through to offload targets | Andrew Stubbs | 1 | -0/+6
The optimization options are deliberately passed through to the LTO compiler, but when the same mechanism is reused for offloading it ends up forcing the host compiler settings onto the device compiler. Maybe this should be removed completely, but this patch just fixes a few of them. In particular, param_vect_partial_vector_usage is disabled by x86 and this really hurts amdgcn. I also fixed an ambiguous else warning in the generated file by adding braces. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_option_override): Add note to set default for param_vect_partial_vector_usage to "1". * optc-save-gen.awk: Don't pass through options marked "NoOffload". * params.opt (-param=vect-epilogues-nomask): Add NoOffload. (-param=vect-partial-vector-usage): Likewise. (-param=vect-inner-loop-cost-factor): Likewise.
2025-07-30 | aarch64: Fix sme2+faminmax intrinsic gating (PR 121300) | Alfie Richards | 1 | -1/+2
Fixes the feature gating for the SME2+FAMINMAX intrinsics. PR target/121300 gcc/ChangeLog: * config/aarch64/aarch64-sve-builtins-sme.def (svamin/svamax): Fix arch gating. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pr121300.c: New test.
2025-07-30 | aarch64: Add support for unpacked SVE FP conditional ternary arithmetic | Spencer Abson | 1 | -29/+31
This patch extends the expander for fma, fnma, fms, and fnms to support partial SVE FP modes. We add the missing BF16 tests, which we can now trigger, having implemented the conditional expander. We also add tests for the 'merging with multiplicand' case, which this expander canonicalizes (albeit under SVE_STRICT_GP). gcc/ChangeLog: * config/aarch64/aarch64-sve.md (@cond_<optab><mode>): Extend to support partial FP modes. (*cond_<optab><mode>_2_strict): Extend from SVE_FULL_F to SVE_F, use aarch64_predicate_operand. (*cond_<optab><mode>_4_strict): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16, use aarch64_predicate_operand. (*cond_<optab><mode>_any_strict): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/unpacked_cond_fmla_1.c: Add test cases for merging with multiplicand. * gcc.target/aarch64/sve/unpacked_cond_fmls_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fmla_2.c: New test. * gcc.target/aarch64/sve/unpacked_cond_fmls_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmla_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmls_2.c: Likewise. * g++.target/aarch64/sve/unpacked_cond_ternary_bf16_1.C: Likewise. * g++.target/aarch64/sve/unpacked_cond_ternary_bf16_2.C: Likewise.
2025-07-30 | aarch64: Relaxed SEL combiner patterns for unpacked SVE FP ternary arithmetic | Spencer Abson | 1 | -19/+19
Extend the ternary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/SVE_FULL_F_BF to SVE_F/SVE_F_BF, where the strictness value is SVE_RELAXED_GP. We can only reliably test the 'merging with the third input' (addend) and 'independent value' patterns at this stage, as the canonicalisation that reorders the multiplicands based on the second SEL input would be performed by the conditional expander. Another difficulty is that we can't test these fused multiply/SEL combines without using __builtin_fma and friends. The reason for this is as follows: We support COND_ADD, COND_SUB, and COND_MUL optabs, so match.pd will canonicalize patterns like ADD/SUB/MUL combined with a VEC_COND_EXPR into these conditional forms. Later, when widening_mul tries to fold these into conditional fused multiply operations, the transformation fails - simply because we haven’t implemented those conditional fused multiply optabs yet. Hence this patch lacks tests for BFloat16... gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_relaxed): Extend from SVE_FULL_F to SVE_F. (*cond_<optab><mode>_4_relaxed): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16. (*cond_<optab><mode>_any_relaxed): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/unpacked_cond_fmla_1.c: New test. * gcc.target/aarch64/sve/unpacked_cond_fmls_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c: Likewise.
2025-07-30 | aarch64: Add support for unpacked SVE FP ternary arithmetic | Spencer Abson | 1 | -13/+13
This patch extends the expander for unconditional fma, fnma, fms, and fnms, so that it supports partial SVE FP modes. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (<optab><mode>4): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16. Use aarch64_sve_fp_pred instead of aarch64_ptrue_reg. (@aarch64_pred_<optab><mode>): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16. Use aarch64_predicate_operand. gcc/testsuite/ChangeLog: * g++.target/aarch64/sve/unpacked_ternary_bf16_1.C: New test. * g++.target/aarch64/sve/unpacked_ternary_bf16_2.C: Likewise. * gcc.target/aarch64/sve/unpacked_fmla_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_fmla_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_fmls_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_fmls_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_fnmla_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_fnmla_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_fnmls_1.c: Likewise. * gcc.target/aarch64/sve/unpacked_fnmls_2.c: Likewise.
2025-07-30 | Remove V64SFmode and V64SImode. | liuhongt | 2 | -4/+1
These modes were only needed by avx5124vnniw/avx5124fmaps, which were removed by r15-656-ge1a7e2c54d52d0. gcc/ChangeLog: * config/i386/i386-modes.def: Remove VECTOR_MODES(FLOAT, 256) and VECTOR_MODE (INT, SI, 64). * config/i386/i386.cc (ix86_hard_regno_nregs): Remove related code for V64SF/V64SImode.
2025-07-30 | Eliminate redundant vpextrq/vpinsrq when moving TI to V4SI. | liuhongt | 1 | -0/+13
r14-1902-g96c3539f2a3813 split a TImode move into 2 DImode moves; it's supposed to optimize TImode in parameter/return since, according to the psABI, it's stored in 2 general registers. But when TImode is not in parameter/return, it could create redundancy, as in the PR. The patch adds a splitter to handle that, i.e.

    (insn 10 9 14 2 (set (subreg:V2DI (reg:V4SI 98 [ <retval> ]) 0)
            (vec_concat:V2DI (subreg:DI (reg:TI 101) 0)
                (subreg:DI (reg:TI 101) 8))) 8442 {vec_concatv2di}
        (expr_list:REG_DEAD (reg:TI 101)

gcc/ChangeLog: PR target/121274 * config/i386/sse.md (*vec_concatv2di_0): Add a splitter before it. gcc/testsuite/ChangeLog: * gcc.target/i386/pr121274.c: New test.
2025-07-29 | aarch64: Add support for unpacked SVE FP conditional binary arithmetic | Spencer Abson | 3 | -73/+107
This patch extends the expander for conditional smax, smin, add, sub, mul, min, max, and div to support partial SVE FP modes. If exceptions from undefined vector elements must be suppressed, this expansion converts the container-level predicate to an element-level one, and ensures that these elements are inactive for the operation. In practice, this is a predicate AND with the existing mask and a container-size PTRUE. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_sve_emit_masked_fp_pred): Declare. * config/aarch64/aarch64-sve.md (and<mode>3): Change this to... (@and<mode>3): ...this, so that we can use gen_and3. (@cond_<optab><mode>): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16, use aarch64_predicate_operand. (*cond_<optab><mode>_2_strict): Likewise. (*cond_<optab><mode>_3_strict): Likewise. (*cond_<optab><mode>_any_strict): Likewise. (*cond_<optab><mode>_2_const_strict): Extend from SVE_FULL_F to SVE_F, use aarch64_predicate_operand. (*cond_<optab><mode>_any_const_strict): Likewise. (*cond_sub<mode>_3_const_strict): Likewise. (*cond_sub<mode>_const_strict): Likewise. (*vcond_mask_<mode><vpred>): Use aarch64_predicate_operand, and update the comment here. * config/aarch64/aarch64.cc (aarch64_sve_emit_masked_fp_pred): New function. Helper to mask the predicate in conditional expanders. gcc/testsuite/ChangeLog: * g++.target/aarch64/sve/unpacked_cond_binary_bf16_2.C: New test. * gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fadd_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fdiv_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fmaxnm_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fminnm_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fmul_2.c: Likewise. * gcc.target/aarch64/sve/unpacked_cond_fsubr_2.c: Likewise.
2025-07-29 | RISC-V: Generate -mcpu and -mtune options from riscv-cores.def. | Dongyan Chen | 3 | -2/+119
Automatically generate -mcpu and -mtune options in invoke.texi from the unified riscv-cores.def metadata, ensuring documentation stays in sync with definitions and reducing manual maintenance. gcc/ChangeLog: * Makefile.in: Add riscv-mcpu.texi and riscv-mtune.texi to the list of files to be processed by the Texinfo generator. * config/riscv/t-riscv: Add rule for generating riscv-mcpu.texi and riscv-mtune.texi. * doc/invoke.texi: Replace hand‑written extension table with `@include riscv-mcpu.texi` and `@include riscv-mtune.texi` to pull in auto‑generated entries. * config/riscv/gen-riscv-mcpu-texi.cc: New file. * config/riscv/gen-riscv-mtune-texi.cc: New file. * doc/riscv-mcpu.texi: New file. * doc/riscv-mtune.texi: New file.
2025-07-29 | simplify-rtx: Simplify subregs of logic ops | Richard Sandiford | 1 | -0/+34
This patch adds a new rule for distributing lowpart subregs through ANDs, IORs, and XORs with a constant, in cases where one of the terms then disappears. For example: (lowpart-subreg:QI (and:HI x 0x100)) simplifies to zero and (lowpart-subreg:QI (and:HI x 0xff)) simplifies to (lowpart-subreg:QI x). This would often be handled at some point using nonzero bits. However, the specific case I want the optimisation for is SVE predicates, where nonzero bit tracking isn't currently an option. Specifically: the predicate modes VNx8BI, VNx4BI and VNx2BI have the same size as VNx16BI, but treat only every second, fourth, or eighth bit as significant. Thus if we have: (subreg:VNx8BI (and:VNx16BI x C)) where C is the repeating constant { 1, 0, 1, 0, ... }, then the AND only clears bits that are made insignificant by the subreg, and so the result is equal to (subreg:VNx8BI x). Later patches rely on this. gcc/ * simplify-rtx.cc (simplify_context::simplify_subreg): Distribute lowpart subregs through AND/IOR/XOR, if doing so eliminates one of the terms. (test_scalar_int_ext_ops): Add some tests of the above for integers. * config/aarch64/aarch64.cc (aarch64_test_sve_folding): Likewise add tests for predicate modes.
2025-07-29 | aarch64: Fix function_expander::get_reg_target | Richard Sandiford | 1 | -1/+2
function_expander::get_reg_target didn't actually check for a register, meaning that it could return a memory target instead. That doesn't really matter for the current direct and indirect uses (svundef*, svcreate*, and svset*) but it will for later patches. gcc/ * config/aarch64/aarch64-sve-builtins.cc (function_expander::get_reg_target): Check whether the target is a valid register_operand.
2025-07-28 | AVR: target/121277 - Don't load 0x800000 with const __flashx *x = NULL. | Georg-Johann Lay | 1 | -6/+13
Converting from generic AS to __flashx used the same rule as for __memx, which tags RAM (generic AS) locations by setting bit 23. The justification was that generic isn't a subset of __flashx, though that led to surprises with code like const __flashx *x = NULL. The natural thing to do is to just load 0x000000 in that case, so that the null pointer works in __flashx as expected. Apart from that, converting NULL to __flashx (or __flash) no longer raises a -Waddr-space-convert diagnostic. gcc/ PR target/121277 * config/avr/avr.cc (avr_addr_space_convert): When converting from generic AS to __flashx, don't set bit 23. (avr_convert_to_type): Don't -Waddr-space-convert when NULL is converted to __flashx or to __flash.
2025-07-28 | x86: Disallow -mtls-dialect=gnu with no_caller_saved_registers | H.J. Lu | 1 | -0/+22
__tls_get_addr doesn't preserve vector registers. When a function with no_caller_saved_registers attribute calls __tls_get_addr, YMM and ZMM registers will be clobbered. Issue an error and suggest -mtls-dialect=gnu2 in this case. gcc/ PR target/121208 * config/i386/i386.cc (ix86_tls_get_addr): Issue an error for -mtls-dialect=gnu with no_caller_saved_registers attribute and suggest -mtls-dialect=gnu2. gcc/testsuite/ PR target/121208 * gcc.target/i386/pr121208-1a.c: New test. * gcc.target/i386/pr121208-1b.c: Likewise. * gcc.target/i386/pr121208-2a.c: Likewise. * gcc.target/i386/pr121208-2b.c: Likewise. * gcc.target/i386/pr121208-3a.c: Likewise. * gcc.target/i386/pr121208-3b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
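A hypothetical reproducer sketch (not one of the new tests; assumes -fPIC so the TLS access goes through __tls_get_addr under -mtls-dialect=gnu):

    /* Compile with: -fPIC -mtls-dialect=gnu (assumed reproducer).  */
    extern __thread int counter;

    __attribute__((no_caller_saved_registers))
    void
    handler (void)
    {
      counter++;   /* Calls __tls_get_addr, which may clobber YMM/ZMM
                      registers: now an error suggesting -mtls-dialect=gnu2.  */
    }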
2025-07-28 | nvptx/nvptx.opt: Update -march-map= for newer sm_xxx | Tobias Burnus | 1 | -0/+45
The documented usage of -march-map= is: "Select the closest available '-march=' value that is not more capable." As PTX ISA 8.6/8.7 (= unreleased CUDA 12.7 + CUDA 12.8) added the Nvidia Blackwell GPUs SM_100, SM_101, and SM_120, it makes sense to add them as well. Note that all three come as sm_XXX and sm_XXXa. PTX ISA 8.8 (CUDA 12.9) added SM_103 and SM_121 and the new 'f' suffix for all SM_1xx. Internally, GCC currently generates the same code for >= sm_80 (Ampere); however, as GCC's -march= also supports sm_89 (Ada), the sm_1xx values added here (Blackwell) will map to sm_89. [Naming note: while ptx code generated for sm_X can also run with sm_Y if Y > X, code generated for sm_XXXa can (generally) only run on the specific hardware; and sm_XXXf implies compatibility with only subsequent targets in the same family.] gcc/ChangeLog: * config/nvptx/nvptx.opt (march-map=): Add sm_100{,f,a}, sm_101{,f,a}, sm_103{,a,f}, sm_120{,a,f} and sm_121{,f,a}.
2025-07-28 | gcn: Fix CDNA3 atomics' buffer invalidation | Tobias Burnus | 1 | -10/+12
For device (agent) scope atomics - as needed when there is more than one team, a buffer_wbl2 followed by s_waitcnt is required. When doing the initial porting, the pre-atomic instruction got accidentally replaced by buffer_inv sc1, which is not quite the right instruction. gcc/ChangeLog: * config/gcn/gcn.md (atomic_load, atomic_store, atomic_exchange): Fix CDNA3 L2 cache write-back before atomic instructions.