path: root/gcc/config
Age | Commit message | Author | Files | Lines
2025-08-13[RISC-V][PR target/121160] Avoid bogus force_reg callJeff Law1-2/+2
When we canonicalize the comparison for a czero sequence we need to handle both integer and fp comparisons. Furthermore, within the integer space we want to make sure we promote any sub-word objects to a full word. All that is working fine. After promotion we then force the value into a register if it is not a register or constant already. The idea is not to have to special case subregs in subsequent code. This works fine except when we're presented with a floating point object that would be a subword, (subreg:SF (reg:SI)) on rv64 for example. So this tightens up that force_reg step. Bootstrapped and regression tested on riscv64-linux-gnu and tested on riscv32-elf and riscv64-elf. Pushing to the trunk after pre-commit verifies no regressions. Jeff PR target/121160 gcc/ * config/riscv/riscv.cc (canonicalize_comparands): Tighten check for forcing value into a GPR. gcc/testsuite/ * gcc.target/riscv/pr121160.c: New test.
2025-08-13Fold GATHER_SCATTER_*_P into vect_memory_access_typeRichard Biener3-8/+7
The following splits up VMAT_GATHER_SCATTER into VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and VMAT_GATHER_SCATTER_EMULATED. The main motivation is to reduce the uses of (full) gs_info, but it also makes the kind representable by a single entry rather than the ifn and decl tristate. The strided load with gather case gets to use VMAT_GATHER_SCATTER_IFN, since that's what we end up checking. * tree-vectorizer.h (vect_memory_access_type): Replace VMAT_GATHER_SCATTER with three separate access types, VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and VMAT_GATHER_SCATTER_EMULATED. (mat_gather_scatter_p): New predicate. (GATHER_SCATTER_LEGACY_P): Remove. (GATHER_SCATTER_IFN_P): Likewise. (GATHER_SCATTER_EMULATED_P): Likewise. * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Adjust. (get_load_store_type): Likewise. (vect_get_loop_variant_data_ptr_increment): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Likewise. * config/riscv/riscv-vector-costs.cc (costs::need_additional_vector_vars_p): Likewise. * config/aarch64/aarch64.cc (aarch64_detect_vector_stmt_subtype): Likewise. (aarch64_vector_costs::count_ops): Likewise. (aarch64_vector_costs::add_stmt_cost): Likewise.
2025-08-13LoongArch: Define hook TARGET_COMPUTE_PRESSURE_CLASSES[PR120476].Lulu Cheng1-0/+15
The rtx cost value defined by the target backend affects the calculation of register pressure classes in the IRA, thus affecting scheduling. This may cause program performance degradation. For example, OpenSSL 3.5.1 SHA512 and SPEC CPU 2017 exchange_r. This problem can be avoided by defining a set of register pressure classes in the target backend instead of using the default IRA to automatically calculate them. gcc/ChangeLog: PR target/120476 * config/loongarch/loongarch.cc (loongarch_compute_pressure_classes): New function. (TARGET_COMPUTE_PRESSURE_CLASSES): Define.
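For context, a TARGET_COMPUTE_PRESSURE_CLASSES hook simply fills an array with the register classes IRA should track and returns their count. The sketch below is an assumption about what such a hook could look like; the class names and body are illustrative and are not copied from the patch.

/* Hypothetical sketch of a pressure-classes hook; the real
   loongarch_compute_pressure_classes in the patch may differ.  */
static int
loongarch_compute_pressure_classes (reg_class *classes)
{
  int n = 0;
  classes[n++] = GENERAL_REGS;  /* integer registers */
  classes[n++] = FP_REGS;       /* floating-point registers */
  return n;                     /* number of pressure classes */
}

#undef TARGET_COMPUTE_PRESSURE_CLASSES
#define TARGET_COMPUTE_PRESSURE_CLASSES loongarch_compute_pressure_classes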
2025-08-13LoongArch: Add support for _BitInt [PR117599]Yang Yujie2-3/+36
This patch adds support for C23's _BitInt for LoongArch. From the LoongArch psABI[1]: > _BitInt(N) objects are stored in little-endian order in memory > and are signed by default. > > For N ≤ 64, a _BitInt(N) object have the same size and alignment > of the smallest fundamental integral type that can contain it. > The unused high-order bits within this containing type are filled > with sign or zero extension of the N-bit value, depending on whether > the _BitInt(N) object is signed or unsigned. The _BitInt(N) object > propagates its signedness to the containing type and is laid out > in a register or memory as an object of this type. > > For N > 64, _BitInt(N) objects are implemented as structs of 64-bit > integer chunks. The number of chunks is the smallest even integer M > so that M * 64 ≥ N. These objects are of the same size of the struct > containing the chunks, but always have 16-byte alignment. If there > are unused bits in the highest-ordered chunk that contains used > bits, they are defined as the sign- or zero- extension of the used > bits depending on whether the _BitInt(N) object is signed or > unsigned. If an entire chunk is unused, its bits are undefined. [1] https://github.com/loongson/la-abi-specs PR target/117599 gcc/ChangeLog: * config/loongarch/loongarch.h: Define a PROMOTE_MODE case for small _BitInts. * config/loongarch/loongarch.cc (loongarch_promote_function_mode): Same. (loongarch_bitint_type_info): New function. (TARGET_C_BITINT_TYPE_INFO): Declare. libgcc/ChangeLog: * config/loongarch/t-softfp-tf: Enable _BitInt helper functions. * config/loongarch/t-loongarch: Same. * config/loongarch/libgcc-loongarch.ver: New file. gcc/testsuite/ChangeLog: * gcc.target/loongarch/bitint-alignments.c: New test. * gcc.target/loongarch/bitint-args.c: New test. * gcc.target/loongarch/bitint-sizes.c: New test.
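To make the quoted layout rules concrete, here is a small illustrative C23 snippet of what they imply for sizes and alignments on LoongArch; the values follow directly from the psABI text above, not from the patch's own test cases.

/* Illustrative consequences of the quoted psABI rules.  */
#include <assert.h>

int
main (void)
{
  /* N <= 64: size/alignment of the smallest fundamental type that fits.  */
  assert (sizeof (_BitInt(13)) == 2 && _Alignof (_BitInt(13)) == 2);
  assert (sizeof (_BitInt(64)) == 8 && _Alignof (_BitInt(64)) == 8);
  /* N > 64: an even number M of 64-bit chunks, always 16-byte aligned.  */
  assert (sizeof (_BitInt(65)) == 16 && _Alignof (_BitInt(65)) == 16);
  assert (sizeof (_BitInt(100)) == 16 && _Alignof (_BitInt(100)) == 16);
  return 0;
}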
2025-08-12[RISC-V][PR target/121113] Handle HFmode in various insn reservationsJeff Law3-4/+10
So this is a minor bug in a few DFA descriptions such as the Xiangshan and a couple of the SiFive descriptions. While Xiangshan covers every insn type, some of the reservations check the mode of the operation. Concretely the fdiv/fsqrt unit reservations vary based on the mode. They handled DF/SF, but not HF (the relevant iterators don't include BF). This patch just adds HF support with the same characteristics as SF. Those who know these designs better could perhaps improve the reservation, but this at least keeps us from aborting. I did check the other published DFAs for mode dependent reservations. That's how I found the p400/p600 issue. Tested in my tester, waiting for CI to render its verdict before pushing. PR target/121113 gcc/ * config/riscv/sifive-p400.md: Handle HFmode for fdiv/fsqrt. * config/riscv/sifive-p600.md: Likewise. * config/riscv/xiangshan.md: Likewise. gcc/testsuite/ * gcc.target/riscv/pr121113.c: New test.
2025-08-12x86: Convert integer constant to mode of moveH.J. Lu1-0/+2
For (set (reg/v:DI 106 [ k ]) (const_int 3000000000 [0xb2d05e00])) ... (set (reg:V4SI 115 [ _13 ]) (vec_duplicate:V4SI (subreg:SI (reg/v:DI 106 [ k ]) 0))) ... (set (reg:V2SI 118 [ _9 ]) (vec_duplicate:V2SI (subreg:SI (reg/v:DI 106 [ k ]) 0))) we should generate (set (reg:SI 125) (const_int -1294967296 [0xffffffffb2d05e00])) (set (reg:V4SI 124) (vec_duplicate:V4SI (reg:SI 125)) ... (set (reg:V4SI 115 [ _13 ]) (reg:V4SI 124) ... (set (reg:V2SI 118 [ _9 ]) (subreg:V2SI (reg:V4SI 124)) by converting the integer constant to the mode of the move. gcc/ PR target/121497 * config/i386/i386-features.cc (ix86_broadcast_inner): Convert integer constant to mode of move. gcc/testsuite/ PR target/121497 * gcc.target/i386/pr121497.c: New test. Co-authored-by: Liu, Hongtao <hongtao.liu@intel.com> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-13RISC-V: Combine vec_duplicate + vmerge.vv to vmerge.vx on GR2VR costPan Li1-0/+18
This patch would like to combine the vec_duplicate + vmerge.vv into the vmerge.vx, as shown in the example code below. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if the GR2VR cost is greater than zero. Assume we have example code like below, GR2VR cost is 0. #define DEF_VX_MERGE_0(T) \ void \ test_vx_merge_##T##_case_0 (T * restrict out, T * restrict in, \ T x, unsigned n) \ { \ for (unsigned i = 0; i < n; i++) \ { \ if (i % 2 == 0) \ out[i] = x; \ else \ out[i] = in[i]; \ } \ } DEF_VX_MERGE_0(int32_t) Before this patch: 11 │ beq a3,zero,.L8 12 │ vsetvli a5,zero,e32,m1,ta,ma 13 │ vmv.v.x v2,a2 ... 16 │ .L3: 17 │ vsetvli a5,a3,e32,m1,ta,ma ... 22 │ vmerge.vvm v1,v1,v2,v0 ... 25 │ bne a3,zero,.L3 After this patch: 11 │ beq a3,zero,.L8 ... 14 │ .L3: 15 │ vsetvli a5,a3,e32,m1,ta,ma ... 20 │ vmerge.vxm v1,v1,a2,v0 ... 23 │ bne a3,zero,.L3 gcc/ChangeLog: * config/riscv/autovec-opt.md (*merge_vx_<mode>): Add new pattern to combine the vmerge.vxm. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-08-12RISC-V: Expand const_vector with 2 elts per pattern.Robin Dapp1-17/+106
Hi, In PR121334 we are asked to expand a const_vector of size 4 with poly_int elements. It has 2 elts per pattern so is neither a const_vector_duplicate nor a const_vector_stepped. We don't allow this kind of constant in legitimate_constant_p but expr apparently still wants us to expand it under certain conditions. This patch implements a basic expander for such kinds of patterns. As slide1up is used to build the individual vectors it also adds a helper function expand_slide1up. I regtested on rv64gcv_zvl512b but unfortunately the newly created pattern is not even executed. I tried some variations of the original code but didn't manage to trigger it. Regards Robin PR target/121334 gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_slide1up): New function. (expand_vector_init_trailing_same_elem): Use new function. (expand_const_vector_onestep): New function. (expand_const_vector): Use expand_slide1up. (expand_vector_init_merge_repeating_sequence): Ditto. (shuffle_off_by_one_patterns): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr121334.c: New test.
2025-08-12LoongArch: macro instead enum for base abi typemengqinggang1-6/+4
enum values can't be used in #if. In a #if expression, identifiers that are not macros are all treated as the number zero. This patch may fix https://sourceware.org/bugzilla/show_bug.cgi?id=32776. gcc/ChangeLog: * config/loongarch/loongarch-def.h (ABI_BASE_LP64D): New macro. (ABI_BASE_LP64F): New macro. (ABI_BASE_LP64S): New macro. (N_ABI_BASE_TYPES): New macro.
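A minimal illustration (not taken from the patch, names invented) of why the enum values had to become macros:

/* In a #if expression, identifiers that are not macros evaluate to 0,
   so enum-based ABI constants compare equal to anything non-macro.  */
enum { ENUM_ABI_BASE_LP64D, ENUM_ABI_BASE_LP64F };   /* 0 and 1 at run time */

#if ENUM_ABI_BASE_LP64D == ENUM_ABI_BASE_LP64F       /* both read as 0: true! */
#  define PREPROCESSOR_THINKS_EQUAL 1                /* the surprising branch */
#endif

#define ABI_BASE_LP64D 0                             /* macro form works */
#define ABI_BASE_LP64F 1
#if ABI_BASE_LP64D == ABI_BASE_LP64F                 /* false, as intended */
#  error "not reached"
#endif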
2025-08-11Improve initial code generation for addsi/adddiShreya Munnangi3-4/+119
This is a patch primarily from Shreya, though I think she cribbed some code from Philipp that we had internally within Ventana and I made some minor adjustments as well. So the basic idea here is similar to her work on logical ops -- specifically when we can generate more efficient code at expansion time, then do so. In some cases the net is better code; in other cases we lessen reliance on mvconst_internal and finally it provides infrastructure that I think will help address an issue Paul Antoine reported a little while back. The most obvious case is using paired addis from initial code generation for some constants. It will also use a shNadd insn when the cost to synthesize the original value is higher than the right-shifted value. Finally it will negate the constant and use "sub" if the negated constant is cheaper than the original constant. There's more work to do in here, particularly WRT 32 bit objects for rv64. Shreya is looking at that right now. There may also be cases where another shNadd or addi would be profitable. We haven't really explored those cases in any detail, while there may be cases to handle, it's unclear how often they occur in practice. I don't want to remove the define_insn_and_split for the paired addi cases yet. I think that likely happens as a side effect of fixing Paul Antoine's issue. Bootstrapped and regression tested on a BPI & Pioneer box. Will obviously wait for the pre-commit tester before moving forward. Jeff PR target/120603 gcc/ * config/riscv/riscv-protos.h (synthesize_add): Add prototype. * config/riscv/riscv.cc (synthesize_add): New function. * config/riscv/riscv.md (addsi3): Allow any constant as operands[2] in the expander. Force the constant into a register as needed for TARGET_64BIT. Use synthesize_add for !TARGET_64BIT. (*adddi3): Renamed from adddi3. (adddi3): New expander. Use synthesize_add. gcc/testsuite * gcc.target/riscv/add-synthesis-1.c: New test. Co-authored-by: Jeff Law <jlaw@ventanamicro.com> Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
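As a rough illustration of the paired-addi case mentioned above (the constant is chosen arbitrarily and the generated code has not been verified against the patch):

/* 4000 does not fit a single signed 12-bit addi immediate (max 2047), but
   it can be reached with two addi instructions, e.g. 2000 + 2000, avoiding
   a separate constant load.  Illustrative only.  */
long
add_const (long x)
{
  return x + 4000;
}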
2025-08-11aarch64: Fix condition accepted by mov<ALLI>ccRichard Henderson1-5/+11
Reject QI/HImode conditions, which would require extension in order to compare. Fixes z.c:10:1: error: unrecognizable insn: 10 | } | ^ (insn 23 22 24 2 (set (reg:CC 66 cc) (compare:CC (reg:HI 128) (reg:HI 127))) "z.c":6:6 -1 (nil)) during RTL pass: vregs gcc: * config/aarch64/aarch64.md (mov<ALLI>cc): Accept MODE_CC conditions directly; reject QI/HImode conditions. gcc/testsuite: * gcc.target/aarch64/cmpbr-3.c: New. * gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: Simplify test for csel by ignoring the actual registers used.
2025-08-11aarch64: CMPBR branches must be invertibleRichard Henderson5-43/+48
Restrict the immediate range to the intersection of LT/GE and GT/LE so that cfglayout can invert the condition to redirect any branch. gcc: PR target/121388 * config/aarch64/aarch64.cc (aarch64_cb_rhs): Restrict the range of LT/GE and GT/LE to their intersections. * config/aarch64/aarch64.md (*aarch64_cb<INT_CMP><GPI>): Unexport. Use cmpbr_imm_predicate instead of aarch64_cb_rhs. * config/aarch64/constraints.md (Uc1): Accept 0..62. (Uc2): Remove. * config/aarch64/iterators.md (cmpbr_imm_predicate): New. (cmpbr_imm_constraint): Update to match aarch64_cb_rhs. * config/aarch64/predicates.md (aarch64_cb_reg_i63_operand): New. (aarch64_cb_reg_i62_operand): New. gcc/testsuite: PR target/121388 * gcc.target/aarch64/cmpbr.c (u32_x0_ult_64): XFAIL. (i32_x0_slt_64, u64_x0_ult_64, i64_x0_slt_64): XFAIL. * gcc.target/aarch64/cmpbr-2.c: New.
2025-08-11aarch64: Consider TARGET_CMPBR in rtx costsRichard Henderson1-0/+9
gcc: * config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Use aarch64_cb_rhs to match CB insns.
2025-08-11aarch64: Remove cc clobber from *aarch64_tbz<LTGE><ALLI>1Richard Henderson1-20/+6
There is a conflict between aarch64_tbzltdi1 and aarch64_cbltdi with respect to pnum_clobbers, resulting in a recog failure: 0xa1fffe fancy_abort(char const*, int, char const*) ../../gcc/diagnostics/context.cc:1640 0x81340e patch_jump_insn ../../gcc/cfgrtl.cc:1303 0xc0eafe redirect_branch_edge ../../gcc/cfgrtl.cc:1330 0xc0f372 cfg_layout_redirect_edge_and_branch ../../gcc/cfgrtl.cc:4736 0xbfb6b9 redirect_edge_and_branch(edge_def*, basic_block_def*) ../../gcc/cfghooks.cc:391 0x1fa9310 try_forward_edges ../../gcc/cfgcleanup.cc:561 0x1fa9310 try_optimize_cfg ../../gcc/cfgcleanup.cc:2931 0x1fa9310 cleanup_cfg(int) ../../gcc/cfgcleanup.cc:3143 0x1fe11e8 rest_of_handle_cse ../../gcc/cse.cc:7591 0x1fe11e8 execute ../../gcc/cse.cc:7622 The simplest solution is to remove the clobber from aarch64_tbz. This removes the possibility of expansion via TST+B.cond, which will merely fall back to TBNZ+B on shorter branches. gcc: PR target/121385 * config/aarch64/aarch64.md (*aarch64_tbz<LTGE><ALLI>1): Remove cc clobber and expansion via TST+Bcond. gcc/testsuite: PR target/121385 * gcc.target/aarch64/cmpbr-1.c: New.
2025-08-11aarch64: Disable TARGET_CMPBR with aarch64_track_speculationRichard Henderson1-2/+3
With -mtrack-speculation, CC_REGNUM must be used at every conditional branch. gcc: * config/aarch64/aarch64.h (TARGET_CMPBR): False when aarch64_track_speculation is true.
2025-08-11aarch64: Fix aarch64_split_imm24 patternsRichard Henderson3-49/+63
Both patterns used !reload_completed as a condition, which is questionable at best. The branch pattern failed to include a clobber of CC_REGNUM. Both problems were unlikely to trigger in practice, due to how the optimization pipeline is organized, but let's fix them anyway. gcc: * config/aarch64/aarch64.cc (aarch64_gen_compare_split_imm24): New. * config/aarch64/aarch64-protos.h: Update. * config/aarch64/aarch64.md (*aarch64_bcond_wide_imm<GPI>): Use it. Add match_scratch and cc clobbers. Use match_operator instead of iterator expansion. (*compare_cstore<GPI>_insn): Likewise.
2025-08-11aarch64: Rename and improve aarch64_split_imm24Richard Henderson3-13/+14
Two of the three uses of aarch64_imm24 included the important follow-up tests vs aarch64_move_imm and aarch64_plus_operand. Lack of the exclusion within aarch64_if_then_else_costs produced incorrect costing. Since aarch64_split_imm24 has already matched a non-negative CONST_INT, drill down from aarch64_plus_operand to aarch64_uimm12_shift. gcc: * config/aarch64/predicates.md (aarch64_split_imm24): Rename from aarch64_imm24; exclude aarch64_move_imm and aarch64_uimm12_shift. * config/aarch64/aarch64.md (*aarch64_bcond_wide_imm<GPI>): Update for aarch64_split_imm24. (*compare_cstore<GPI>_insn): Likewise. * config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Likewise.
2025-08-11aarch64: Fix gcs save/restore_stack_nonlocalRichard Henderson1-7/+7
The save/restore_stack_nonlocal patterns passed a DImode rtx to gen_tbranch_neqi3 for a QImode compare. But since we're seeding r16 with 1, GCSEnabled will clear the only set bit in r16, so we can use CBNZ instead of TBNZ. gcc: * config/aarch64/aarch64.md (tbranch_<EQL><SHORT>3): Remove. (save_stack_nonlocal): Use aarch64_gen_compare_zero_and_branch. (restore_stack_nonlocal): Likewise. gcc/testsuite: * gcc.target/aarch64/gcs-nonlocal-3.c: Match cbnz.
2025-08-11aarch64: Use aarch64_gen_compare_zero_and_branch in aarch64_restore_zaRichard Henderson4-3/+7
With -mtrack-speculation, the pattern that was directly expanded by aarch64_restore_za is disabled. Use the helper function instead. gcc: * config/aarch64/aarch64.cc (aarch64_gen_compare_zero_and_branch): Export. * config/aarch64/aarch64-protos.h (aarch64_gen_compare_zero_and_branch): Declare it. * config/aarch64/aarch64-sme.md (aarch64_restore_za): Use it. * config/aarch64/aarch64.md (*aarch64_cbz<EQL><GPI>): Unexport.
2025-08-11aarch64: Reorg aarch64_if_then_else_costs, conditional branchRichard Henderson1-23/+32
gcc: * config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Reorg to include the cost of inner within TBZ sign-bit test, only match CBZ/CBNZ with valid modes, and both for the aarch64_imm24 test.
2025-08-11aarch64: Remove an indentation level from aarch64_if_then_else_costsRichard Henderson1-27/+25
gcc: * config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Remove else after return and re-indent.
2025-08-11aarch64: Fix spelling of BRANCH_LEN_N_1KiBRichard Henderson1-10/+10
One kilobyte not one kilobit. gcc: * config/aarch64/aarch64.md (BRANCH_LEN_N_1KiB): Rename from BRANCH_LEN_N_1Kib.
2025-08-11RISC-V: Refactor the vec_duplicate cost on gpr/fpr2vr-cost paramPan Li1-99/+11
The previous cost value for vec_duplicate was mostly based on operators like add/minus. The rtx_cost function tries to match them case by case, and if it finds a vec_duplicate it updates the cost values. That was OK when we initially added it, but it looks confusing/redundant as more and more operators are involved. As per Robin's suggestion, we only care whether the sub-rtx has a vec_duplicate or not, instead of handling it per operator. Thus, this patch refactors that and gets rid of the operators when computing the vec_duplicate cost. The below test suites passed for this patch series. * The rv64gcv full regression test. gcc/ChangeLog: * config/riscv/riscv.cc (get_vector_binary_rtx_cost): Remove. (riscv_rtx_costs): Refactor to search vec_duplicate on the sub rtx. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Update asm check due to above change. * gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-08-11arm: Fix operand check for __arm_{mrrc{2},mcrr{2}} intrinsics [PR 121464]Andre Vieira1-2/+2
Fix the bounds checking for the opc1 operand of the following intrinsics: __arm_mcrr __arm_mcrr2 __arm_mrrc __arm_mrrc2 gcc/ChangeLog: PR target/121464 * config/arm/arm.md (arm_<mrrc>, arm_<mcrr>): Fix operand check. gcc/testsuite/ChangeLog: PR target/121464 * gcc.target/arm/acle/mcrr.c: Update testcase. * gcc.target/arm/acle/mcrr2.c: Likewise. * gcc.target/arm/acle/mrrc.c: Likewise. * gcc.target/arm/acle/mrrc2.c: Likewise.
2025-08-11Fix comment typosJakub Jelinek1-3/+3
This patch fixes some comment typos, singe -> single and unsinged -> unsigned. 2025-08-11 Jakub Jelinek <jakub@redhat.com> gcc/ * tree-cfg.cc (find_case_label_for_value): Fix comment typo, singe-valued -> single-valued. * config/arc/arc.md: Fix comment typos, unsinged -> unsigned. gcc/fortran/ * gfortran.h (gfc_case): Fix comment typo, singe -> single. gcc/testsuite/ * g++.dg/warn/template-1.C: Fix comment typo, unsinged -> unsigned. * gcc.target/powerpc/builtins-2-p9-runnable.c (main): Likewise. * gcc.dg/graphite/id-30.c: Likewise.
2025-08-10Add -mgrow-frame-downwardsMatthew Fortune2-2/+12
Grow the local frame down instead of up for mips16 code size. By growing the frame downwards we get spill slots created at the lowest address rather than the highest address in a local frame. The benefit is that when the frame is large the spill slots can still be accessed using a 16-bit instruction, whereas it is less important for large local variables to be accessed using short instructions as they are (probably) accessed less frequently. This is on by default for MIPS16. gcc/ * config/mips/mips.h (FRAME_GROWS_DOWNWARD): Allow the frame to grow downwards for mips16 when -mgrow-frame-downwards is set. * config/mips/mips.opt: Add -mgrow-frame-downwards option.
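A minimal sketch of the idea, assuming an option flag named along the lines of the new -mgrow-frame-downwards switch; the macro body and flag name are guesses, not copied from the patch.

/* Hypothetical shape of the change: grow the frame downwards only for
   MIPS16 and only when the new option is enabled, so spill slots land at
   the lowest frame offsets and stay within 16-bit reach.  */
#define FRAME_GROWS_DOWNWARD \
  (TARGET_MIPS16 && TARGET_GROW_FRAME_DOWNWARDS)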
2025-08-09Darwin: Anchor block internal symbols must not be linker-visible.Iain Sandoe1-5/+33
When we are using section anchors, there's a requirement that the sequence of the content is an unbroken block. If we allow linker-visible symbols in that block, ld(64) would be able to break it into sub-sections on those symbol boundaries. Do not allow symbols that should be visible to be anchored. Do not make anchor block internal symbols linker-visible. gcc/ChangeLog: * config/darwin.cc (darwin_encode_section_info): Do not make anchored symbols linker-visible. (darwin_use_anchors_for_symbol_p): Disallow anchoring on symbols that must be linker-visible (or external), even if the definitions are in this TU. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2025-08-09Darwin: Section anchors must be linker-visible.Iain Sandoe1-0/+2
In principle, these begin (or at least delineate) a region that could be split by the static linker. If the symbols are hidden from newer linkers, they produce diagnostics about the temporary symbol generated. gcc/ChangeLog: * config/darwin.h (ASM_GENERATE_INTERNAL_LABEL): New entry for LANCHOR. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2025-08-08xtensa: Refine constraint "T" to define_special_memory_constraintTakayuki 'January June' Suwa1-1/+1
References to literal pool entries do not need to be reloaded or converted to "(mem (reg X))" to load via base register. gcc/ChangeLog: * config/xtensa/constraints.md (T): Change define_memory_constraint to define_special_memory_constraint.
2025-08-08arm: Fix CMSE nonsecure calls [PR 120977]Christophe Lyon4-24/+24
As discussed in https://gcc.gnu.org/pipermail/gcc-patches/2025-June/685733.html the operand of the call should be a mem rather than an unspec. This patch moves the unspec to an additional argument of the parallel and adjusts cmse_nonsecure_call_inline_register_clear accordingly. The scan-rtl-dump in cmse-18.c needs a fix since we no longer emit the 'unspec' part. In addition, I noticed that since arm_v8_1m_mve_ok is always true in the context of the test (we know we support CMSE as per cmse.exp, and arm_v8_1m_mve_ok finds the adequate options), we actually only use the more permissive regex. To improve that, the patch duplicates the test, such that cmse-18.c forces -march=armv8-m.main+fp (so FPCXP is disabled), and cmse-19.c forces -march=armv8.1-m.main+mve (so FPCXP is enabled). Each test uses the appropriate scan-rtl-dump, and also checks we are using UNSPEC_NONSECURE_MEM (we need to remove -slim for that). The tests enable an FPU via -march so that the test passes whether the testing harness forces -mfloat-abi or not. 2025-07-08 Christophe Lyon <christophe.lyon@linaro.org> PR target/120977 gcc/ * config/arm/arm.md (call): Move unspec parameter to parallel. (nonsecure_call_internal): Likewise. (call_value): Likewise. (nonsecure_call_value_internal): Likewise. * config/arm/thumb1.md (nonsecure_call_reg_thumb1_v5): Likewise. (nonsecure_call_value_reg_thumb1_v5): Likewise. * config/arm/thumb2.md (nonsecure_call_reg_thumb2_fpcxt): Likewise. (nonsecure_call_reg_thumb2): Likewise. (nonsecure_call_value_reg_thumb2_fpcxt): Likewise. (nonsecure_call_value_reg_thumb2): Likewise. * config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear): Likewise. gcc/testsuite * gcc.target/arm/cmse/cmse-18.c: Check only the case when FPCXT is not enabled. * gcc.target/arm/cmse/cmse-19.c: New test.
2025-08-08AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]Pengfei Li1-32/+32
This patch fixes incorrect constraints in RTL patterns for AArch64 SVE gather/scatter with type widening/narrowing and vector-plus-immediate addressing. The bug leads to the below "immediate offset out of range" errors during assembly, eventually causing compilation failures. /tmp/ccsVqBp1.s: Assembler messages: /tmp/ccsVqBp1.s:54: Error: immediate offset out of range 0 to 31 at operand 3 -- `ld1b z1.d,p0/z,[z1.d,#64]' Current RTL patterns for such instructions incorrectly use vgw or vgd constraints for the immediate operand, based on the vector element type in Z registers (zN.s or zN.d). However, for gather/scatter with type conversions, the immediate range for vector-plus-immediate addressing is determined by the element type in memory, which differs from that in vector registers. Using the wrong constraint can produce out-of-range offset values that cannot be encoded in the instruction. This patch corrects the constraints used in these patterns. A test case that reproduces the issue is also included. Bootstrapped and regression-tested on aarch64-linux-gnu. gcc/ChangeLog: PR target/121449 * config/aarch64/aarch64-sve.md (mask_gather_load<mode><v_int_container>): Use vg<Vesize> constraints for alternatives with immediate offset. (mask_scatter_store<mode><v_int_container>): Likewise. gcc/testsuite/ChangeLog: PR target/121449 * g++.target/aarch64/sve/pr121449.C: New test.
2025-08-08aarch64: Relax fpm_t assert to allow const_ints [PR120986]Alex Coplan1-2/+3
This relaxes an overzealous assert that required the fpm_t argument to be in DImode when expanding FP8 intrinsics. Of course this fails to account for modeless const_ints. gcc/ChangeLog: PR target/120986 * config/aarch64/aarch64-sve-builtins.cc (function_expander::expand): Relax fpm_t assert to allow modeless const_ints. gcc/testsuite/ChangeLog: PR target/120986 * gcc.target/aarch64/torture/pr120986-2.c: New test.
2025-08-08aarch64: Fix predication of FP8 FDOT insns [PR120986]Alex Coplan2-8/+14
The predication of the SVE2 FP8 dot product insns was relying on the architectural dependency: FEAT_FP8DOT2 => FEAT_FP8DOT4 which was relaxed in GCC as of r15-7480-g299a8e2dc667e795991bc439d2cad5ea5bd379e2, thus leading to unrecognisable insn ICEs when compiling a two-way FDOT with just +fp8dot2. This patch introduces a new mode iterator which selectively enables the appropriate mode(s) depending on which of the FP8DOT{2,4} features are available, and uses it to fix the predication of the patterns. gcc/ChangeLog: PR target/120986 * config/aarch64/aarch64-sve2.md (@aarch64_sve_dot<mode>): Switch mode iterator from SVE_FULL_HSF to new iterator; remove insn predicate as this is now taken care of by conditions in the mode iterator. (@aarch64_sve_dot_lane<mode>): Likewise. * config/aarch64/iterators.md (SVE_FULL_HSF_FP8_FDOT): New. gcc/testsuite/ChangeLog: PR target/120986 * gcc.target/aarch64/pr120986-1.c: New test.
2025-08-07aarch64: Mark SME functions as .variant_pcs [PR121414]Richard Sandiford1-8/+29
Unlike base PCS functions, __arm_streaming and __arm_streaming_compatible functions allow/require PSTATE.SM to be 1 on entry, so they need to be treated as STO_AARCH64_VARIANT_PCS. Similarly, functions that share ZA or ZT0 with their callers require ZA to be active on entry, whereas the base PCS requires ZA to be dormant or off. These functions too need to be marked as having a variant PCS. gcc/ PR target/121414 * config/aarch64/aarch64.cc (aarch64_is_variant_pcs): New function, split out from... (aarch64_asm_output_variant_pcs): ...here. Handle various types of SME function type. gcc/testsuite/ PR target/121414 * gcc.target/aarch64/sme/pr121414_1.c: New test.
2025-08-07s390: Add _BitInt supportStefan Schulze Frielinghaus1-8/+29
gcc/ChangeLog: * config/s390/s390.cc (print_operand): Allow arbitrary wide_int constants for _BitInt. (s390_bitint_type_info): Implement target hook TARGET_C_BITINT_TYPE_INFO. libgcc/ChangeLog: * config/s390/libgcc-glibc.ver: Export _BitInt support functions. * config/s390/t-softfp (softfp_extras): Add fixtfbitint floatbitinttf. gcc/testsuite/ChangeLog: * gcc.target/s390/bitint-1.c: New test. * gcc.target/s390/bitint-2.c: New test. * gcc.target/s390/bitint-3.c: New test. * gcc.target/s390/bitint-4.c: New test.
2025-08-07i386: Fix invalid RTX mode in the unnamed rotate splitter.Uros Bizjak2-8/+12
The following splitter from the commit r11-5747: (define_split [(set (match_operand:SWI 0 "register_operand") (any_rotate:SWI (match_operand:SWI 1 "const_int_operand") (subreg:QI (and (match_operand 2 "int248_register_operand") (match_operand 3 "const_int_operand")) 0)))] "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode) - 1)) == GET_MODE_BITSIZE (<MODE>mode) - 1" [(set (match_dup 4) (match_dup 1)) (set (match_dup 0) (any_rotate:SWI (match_dup 4) (subreg:QI (and:SI (match_dup 2) (match_dup 3)) 0)))] "operands[4] = gen_reg_rtx (<MODE>mode);") matches any mode of (and ...) on input, but hard-codes (and:SI ...) in the output. This causes an ICE if the incoming (and ...) is DImode rather than SImode. Co-developed-by: Richard Sandiford <richard.sandiford@arm.com> PR target/96226 gcc/ChangeLog: * config/i386/predicates.md (and_operator): New operator. * config/i386/i386.md (splitter after *<rotate_insn><mode>3_mask): Use and_operator to match AND RTX and use its mode in the split pattern.
2025-08-06i386: Add missing PTA_POPCNT and PTA_LZCNT with PTA_ABMYangyu Chen1-17/+17
This patch adds the missing PTA_POPCNT and PTA_LZCNT with the PTA_ABM bitmask definition for the bdver1, btver1, and lujiazui architectures in the i386 architecture configuration file. Although these two features were not present in the original definition, their absence does not affect the functionality of these architectures because the POPCNT and LZCNT bits are set when ABM is enabled in the ix86_option_override_internal function. However, including them in these definitions improves consistency and clarity. This issue was discovered while writing a script to extract these bitmasks from the i386.h file referenced in [1]. Additionally, the PTA_YONGFENG bitmask appears incorrect as it includes PTA_LZCNT while already inheriting PTA_ABM from PTA_LUJIAZUI. This seems to be a typo and should be corrected. [1] https://github.com/cyyself/x86-pta gcc/ChangeLog: * config/i386/i386.h (PTA_BDVER1): Add missing PTA_POPCNT and PTA_LZCNT with PTA_ABM. (PTA_ZNVER1): Ditto. (PTA_BTVER1): Ditto. (PTA_LUJIAZUI): Ditto. (PTA_YONGFENG): Do not include extra PTA_LZCNT. Signed-off-by: Yangyu Chen <cyy@cyyself.name>
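The implication described above can be illustrated with a tiny standalone program; the bit values and helper below are made up for the demo and are not GCC's real PTA_* constants or ix86_option_override_internal code.

#include <stdio.h>

/* Demo-only flag bits.  */
#define DEMO_ABM     (1u << 0)
#define DEMO_POPCNT  (1u << 1)
#define DEMO_LZCNT   (1u << 2)

/* Mirrors the behaviour described above: ABM turns on POPCNT and LZCNT,
   so a CPU definition lacking the two bits still ends up with them.  */
static unsigned
apply_abm_implication (unsigned isa)
{
  if (isa & DEMO_ABM)
    isa |= DEMO_POPCNT | DEMO_LZCNT;
  return isa;
}

int
main (void)
{
  unsigned bdver1_like = DEMO_ABM;   /* POPCNT/LZCNT bits "missing".  */
  unsigned effective = apply_abm_implication (bdver1_like);
  printf ("popcnt %s, lzcnt %s\n",
          (effective & DEMO_POPCNT) ? "on" : "off",
          (effective & DEMO_LZCNT) ? "on" : "off");
  return 0;
}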
2025-08-06RISC-V: Read extension data from riscv-ext*.def for arch-canonicalizeKito Cheng1-81/+482
Previously, arch-canonicalize used hardcoded data to handle IMPLIED_EXT. But this data often got out of sync with the actual C++ implementation. Earlier, we introduced riscv-ext.def to keep track of all extension info and generate docs. Now, arch-canonicalize also uses this same data to handle extension implication rules directly. One limitation is that conditional implication rules still need to be written manually. Luckily, there aren't many of them for now, so it's still manageable. I really wanted to avoid writing a C++ + Python binding or trying to parse C++ logic in Python... This version also adds a `--selftest` option to run some unit tests. gcc/ChangeLog: * config/riscv/arch-canonicalize: Read extension data from riscv-ext*.def and add unit tests.
2025-08-06RISC-V: Support -march=unsetKito Cheng1-2/+8
This patch introduces a new `-march=unset` option for RISC-V GCC that allows users to explicitly ignore previous `-march` options and derive the architecture string from the `-mcpu` option instead. This feature is particularly useful for build systems and toolchain configurations where you want to ensure the architecture is always derived from the CPU specification rather than relying on potentially conflicting `-march` options. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_expand_arch): Ignore `unset`. * config/riscv/riscv.h (OPTION_DEFAULT_SPECS): Handle `-march=unset`. (ARCH_UNSET_CLEANUP_SPECS): New. (DRIVER_SELF_SPECS): Handle -march=unset. * doc/invoke.texi (RISC-V Options): Update documentation for `-march=unset`. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-unset-1.c: New test. * gcc.target/riscv/arch-unset-2.c: New test. * gcc.target/riscv/arch-unset-3.c: New test. * gcc.target/riscv/arch-unset-4.c: New test. * gcc.target/riscv/arch-unset-5.c: New test.
2025-08-05x86: Get the widest vector mode from STORE_MAX_PIECES for memsetH.J. Lu1-2/+3
commit 050b1708ea532ea4840e97d85fad4ca63d4cd631 Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Jun 19 05:03:48 2025 +0800 x86: Get the widest vector mode from MOVE_MAX gets the widest vector mode from MOVE_MAX. But for memset, it should use STORE_MAX_PIECES. gcc/ PR target/121410 * config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): Use STORE_MAX_PIECES to get the widest vector mode in vector loop for memset. gcc/testsuite/ PR target/121410 * gcc.target/i386/pr121410.c: New test. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-05AVR: Allow combination of sign_extend with ashift.Georg-Johann Lay2-1/+46
gcc/ * config/avr/avr.cc (avr_rtx_costs_1) [SIGN_EXTEND]: Adjust cost. * config/avr/avr.md (*sext.ashift<QIPSI:mode><HISI:mode>2): New insn and a cc split.
2025-08-05i386: Extend recognition of high-reg rvalues [PR121306]Richard Sandiford2-245/+161
The i386 high-register patterns used things like: (match_operator:SWI248 2 "extract_operator" [(match_operand 0 "int248_register_operand" "Q") (const_int 8) (const_int 8)]) to match an extraction of a high register such as AH from AX/EAX/RAX. This construct is used in contexts where only the low 8 bits of the value matter. This is either done explicitly using a (subreg:QI ... 0) or implicitly by assigning to an 8-bit zero_extract destination. extract_operator therefore matches both sign_extract and zero_extract, since the signedness of the extension beyond 8 bits is irrelevant. But the fact that only the low 8 bits of the value are significant means that a shift right by 8 is as good as an extraction. Shifts right would already be used for things like: struct s { long a:8; long b:8; long c:48; }; struct s f(struct s x, long y, long z) { x.b = (y & z) >> 8; return x; } but are used more after g:965564eafb721f8000013a3112f1bba8d8fae32b. This patch therefore replaces extract_operator with a new predicate called extract_high_operator that matches both extractions and shifts. The predicate checks the extraction field and shift amount itself, so that patterns only need to match the first operand. Splitters used match_op_dup to preserve the choice of extraction. But the fact that the extractions (and now shifts) are equivalent means that we can just as easily canonicalise on one of them. (In theory, canonicalisation would also promote CSE, although that's unlikely in practice.) The patch goes for zero_extract, for consistency with destinations. gcc/ PR target/121306 * config/i386/predicates.md (extract_operator): Replace with... (extract_high_operator): ...this new predicate. * config/i386/i386.md (*cmpqi_ext<mode>_1, *cmpqi_ext<mode>_2) (*cmpqi_ext<mode>_3, *cmpqi_ext<mode>_4, *movstrictqi_ext<mode>_1) (*extzv<mode>, *insvqi_2, *extendqi<SWI24:mode>_ext_1) (*addqi_ext<mode>_1_slp, *addqi_ext<mode>_1_slp, *addqi_ext<mode>_0) (*addqi_ext2<mode>_0, *addqi_ext<mode>_1, *<insn>qi_ext<mode>_2) (*subqi_ext<mode>_1_slp, *subqi_ext<mode>_2_slp, *subqi_ext<mode>_0) (*subqi_ext2<mode>_0, *subqi_ext<mode>_1, *testqi_ext<mode>_1) (*testqi_ext<mode>_2, *<code>qi_ext<mode>_1_slp) (*<code>qi_ext<mode>_2_slp. *<code>qi_ext<mode>_0) (*<code>qi_ext2<mode>_0, *<code>qi_ext<mode>_1) (*<code>qi_ext<mode>_1_cc, *<code>qi_ext<mode>_1_cc) (*<code>qi_ext<mode>_2, *<code>qi_ext<mode>_3, *negqi_ext<mode>_1) (*one_cmplqi_ext<mode>_1, *ashlqi_ext<mode>_1, *<insn>qi_ext<mode>_1) (define_peephole2): Replace uses of extract_operator with extract_high_operator, matching only the first operand. Use zero_extract rather than match_op_dup when splitting.
2025-08-05AVR: target/121359: Remove -mlra and remains of reload.Georg-Johann Lay7-180/+6
PR target/121359 gcc/ * config/avr/avr.h: Remove -mlra and remains of reload. * config/avr/avr.cc: Same. * config/avr/avr.md: Same. * config/avr/avr-log.cc: Same. * config/avr/avr-protos.h: Same. * config/avr/avr.opt: Same. * config/avr/avr.opt.urls: Same. gcc/testsuite/ * gcc.target/avr/torture/pr118591-1.c: Remove -mlra. * gcc.target/avr/torture/pr118591-2.c: Same.
2025-08-05x86: Update *one_cmplqi_ext<mode>_1H.J. Lu1-12/+7
After commit 965564eafb721f8000013a3112f1bba8d8fae32b Author: Richard Sandiford <richard.sandiford@arm.com> Date: Tue Jul 29 15:58:34 2025 +0100 simplify-rtx: Simplify subregs of logic ops combine generates (set (zero_extract:SI (reg/v:SI 101 [ a ]) (const_int 8 [0x8]) (const_int 8 [0x8])) (not:SI (sign_extract:SI (reg:SI 107 [ b ]) (const_int 8 [0x8]) (const_int 8 [0x8])))) instead of (set (zero_extract:SI (reg/v:SI 101 [ a ]) (const_int 8 [0x8]) (const_int 8 [0x8])) (subreg:SI (not:QI (subreg:QI (sign_extract:SI (reg:SI 107 [ b ]) (const_int 8 [0x8]) (const_int 8 [0x8])) 0)) 0)) Update *one_cmplqi_ext<mode>_1 to support the new pattern. PR target/121306 * config/i386/i386.md (*one_cmplqi_ext<mode>_1): Updated to support the new pattern. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-05RISC-V: Fix scalar code-gen of unsigned SAT_MULPan Li1-2/+2
The previous code-gen for scalar unsigned SAT_MUL (aka usmul) leveraged mulhs by mistake; it should be mulhu for the high-bits result of the multiply. Thus, this patch makes it correct. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_xmode_usmul): Use mulhu for the high-bits result of the multiply. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_u_mul-1-u32-from-u64.c: Add mulhu asm check. * gcc.target/riscv/sat/sat_u_mul-1-u64-from-u128.c: Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
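For reference, a portable C sketch (not the patch itself) of the saturating semantics being fixed: it is the unsigned high half of the product, i.e. what mulhu computes, that decides whether to saturate.

#include <stdint.h>

/* Unsigned saturating multiply of two u32 values via the u64 product;
   the high 32 bits (mulhu's result) being nonzero means overflow.  */
static uint32_t
sat_u_mul_u32 (uint32_t a, uint32_t b)
{
  uint64_t prod = (uint64_t) a * b;
  uint32_t hi = (uint32_t) (prod >> 32);
  return hi ? UINT32_MAX : (uint32_t) prod;
}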
2025-08-04aarch64: Check the mode of SVE ACLE function resultsRichard Sandiford1-1/+21
After previous patches, we should always get a VNx16BI result for ACLE intrinsics that return svbool_t. This patch adds an assert that checks a more general condition than that. gcc/ * config/aarch64/aarch64-sve-builtins.cc (function_expander::expand): Assert that the return value has an appropriate mode.
2025-08-04aarch64: Use VNx16BI for svdupq_b*Richard Sandiford3-13/+18
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the predicate forms of svdupq. The general predicate expansion builds an equivalent integer vector and then compares it with zero. This patch therefore relies on the earlier patches to the comparison patterns. gcc/ * config/aarch64/aarch64-protos.h (aarch64_convert_sve_data_to_pred): Remove the mode argument. * config/aarch64/aarch64.cc (aarch64_sve_emit_int_cmp): Allow PRED_MODE to be VNx16BI or the natural predicate mode for the data mode. (aarch64_convert_sve_data_to_pred): Remove the mode argument and instead always create a VNx16BI result. (aarch64_expand_sve_const_pred): Update call accordingly. * config/aarch64/aarch64-sve-builtins-base.cc (svdupq_impl::expand): Likewise, ensuring that the result has mode VNx16BI. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/dupq_13.c: New test.
2025-08-04aarch64: Use VNx16BI for svdup_b*Richard Sandiford4-5/+40
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the predicate forms of svdup. gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_sve_pred_vec_duplicate): Declare. * config/aarch64/aarch64.cc (aarch64_emit_sve_pred_vec_duplicate): New function. * config/aarch64/aarch64-sve.md (vec_duplicate<PRED_ALL:mode>): Use it. * config/aarch64/aarch64-sve-builtins-base.cc (svdup_impl::expand): Handle boolean values specially. Check for constants and fall back on aarch64_emit_sve_pred_vec_duplicate for the variable case, ensuring that the result has mode VNx16BI. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/dup_1.c: New test.
2025-08-04aarch64: Use VNx16BI for svpnext*Richard Sandiford2-5/+74
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the svpnext* intrinsics. gcc/ * config/aarch64/iterators.md (PNEXT_ONLY): New int iterator. * config/aarch64/aarch64-sve.md (@aarch64_sve_<sve_pred_op><mode>): Restrict SVE_PITER pattern to VNx16BI_ONLY. (@aarch64_sve_<sve_pred_op><mode>): New PNEXT_ONLY pattern for PRED_HSD. (*aarch64_sve_<sve_pred_op><mode>): Likewise. (*aarch64_sve_<sve_pred_op><mode>_cc): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/pnext_3.c: New test.
2025-08-04aarch64: Use VNx16BI for sv(n)match*Richard Sandiford1-2/+86
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the svmatch* and svnmatch* intrinsics. gcc/ * config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>): Split SVE2_MATCH pattern into a VNx16QI_ONLY define_insn and a VNx8HI_ONLY define_expand. Use a VNx16BI destination for the latter. (*aarch64_pred_<sve_int_op><mode>): New SVE2_MATCH pattern for VNx8HI_ONLY. (*aarch64_pred_<sve_int_op><mode>_cc): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve2/acle/general/match_4.c: New test. * gcc.target/aarch64/sve2/acle/general/nmatch_1.c: Likewise.