path: root/gcc/config
Age | Commit message | Author | Files | Lines
2024-12-17[PATCH] RISC-V: optimization on checking certain bits set ((x & mask) == val)Oliver Kozul1-0/+28
The patch optimizes code generation for comparisons of the form (X & C1) == C2 by converting them to (X | ~C1) == (C2 | ~C1). C1 is a constant that requires li and addi to be loaded, while ~C1 requires a single lui instruction. As the values of C1 and C2 are not visible within the equality expression, a plus pattern is matched instead. PR target/114087 gcc/ChangeLog: * config/riscv/riscv.md (*lui_constraint<ANYI:mode>_and_to_or): New pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr114087-1.c: New test.
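A minimal self-check of the identity the new pattern relies on (the constants below are illustrative, not taken from the patch): when C2 is a subset of C1, (x & C1) == C2 holds exactly when (x | ~C1) == (C2 | ~C1). Here ~C1 is 0x0000e000, which fits a single lui, while C1 itself would need li (lui + addi):

#include <assert.h>
#include <stdint.h>

int main (void)
{
  const uint32_t C1 = 0xffff1fffu;   /* needs lui + addi to materialize */
  const uint32_t C2 = 0x12341234u;   /* C2 & ~C1 == 0, i.e. C2 is a subset of C1 */
  const uint32_t samples[] = { 0u, C2, C2 | 0x800u, C1, 0xffffffffu };

  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      uint32_t x = samples[i];
      /* Both forms of the comparison must agree for every x.  */
      assert (((x & C1) == C2) == ((x | ~C1) == (C2 | ~C1)));
    }
  return 0;
}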
2024-12-17[PATCH v2 2/2] RISC-V: Add Tenstorrent Ascalon 8 wide architectureAnton Blanchard2-0/+30
This adds the Tenstorrent Ascalon 8 wide architecture (tt-ascalon-d8) to the list of known cores. gcc/ChangeLog: * config/riscv/riscv-cores.def: Add tt-ascalon-d8. * config/riscv/riscv.cc (tt_ascalon_d8_tune_info): New. * doc/invoke.texi (RISC-V): Add tt-ascalon-d8 to -mcpu. gcc/testsuite/ChangeLog: * gcc.target/riscv/mcpu-tt-ascalon-d8.c: New test.
2024-12-17RISC-V: Add new constraint R for register even-odd pairsKito Cheng1-0/+4
Although this constraint is not currently used for any instructions, it is very useful for custom instructions. Additionally, some new standard extensions (not yet upstream), such as `Zilsd` and `Zclsd`, are potential users of this constraint. Therefore, I believe there is sufficient justification to add it now. gcc/ChangeLog: * config/riscv/constraints.md (R): New constraint. * doc/md.texi: Document new constraint `R`. gcc/testsuite/ChangeLog: * gcc.target/riscv/constraint-R.c: New.
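A hedged sketch (not the actual constraint-R.c test) of how inline asm on rv64 might exercise the new constraint: an unsigned __int128 occupies two GPRs, and "R" forces it into an even/odd pair, as a paired load/store style custom instruction would require. The .insn encoding below is invented purely for illustration.

unsigned __int128 pair_op (unsigned __int128 v)
{
  /* "%0" prints the first (even) register of the pair holding v;
     opcode 0x0b is the custom-0 encoding space.  */
  asm volatile (".insn r 0x0b, 0x0, 0x0, x0, %0, x0" : : "R" (v));
  return v;
}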
2024-12-17RISC-V: Implement N modifier for printing the register number rather than the register nameKito Cheng1-0/+23
The modifier `N` prints the raw encoding of a register. This is used with `.insn <length>, <encoding>`, where the user wants to pass a value to the instruction in a known register, but the instruction doesn't follow the existing instruction formats, so the assembly parser is not expecting a register name, just a raw integer. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_print_operand): Add N. * doc/extend.texi: Document N. gcc/testsuite/ChangeLog: * gcc.target/riscv/modifier-N-fpr.c: New. * gcc.target/riscv/modifier-N-vr.c: New. * gcc.target/riscv/modifier-N.c: New.
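A hedged sketch of the modifier in use (the encoding layout is invented for illustration): %N0 and %N1 expand to the raw register numbers, which are spliced into a hand-built .insn word instead of register names.

long custom_insn (long src)
{
  long dst;
  /* Place the destination register number in bits 7..11 and the source
     register number in bits 15..19 of a made-up instruction word.  */
  asm (".insn 4, 0x0000200b | (%N0 << 7) | (%N1 << 15)"
       : "=r" (dst)
       : "r" (src));
  return dst;
}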
2024-12-17RISC-V: Rename internal operand modifier N to nKito Cheng3-5/+5
Since `N` is now proposed as the modifier for printing a register's encoding number, rename the existing internal operand modifier `N` to `n`. gcc/ChangeLog: * config/riscv/corev.md (*cv_branch<mode>): Update modifier. (*branch<mode>): Ditto. * config/riscv/riscv.cc (riscv_print_operand): Update modifier. * config/riscv/riscv.md (*branch<mode>): Update modifier.
2024-12-17RISC-V: Add cr and cf constraintKito Cheng3-11/+29
gcc/ChangeLog: * config/riscv/constraints.md (cr): New. (cf): New. * config/riscv/riscv.h (reg_class): Add RVC_GR_REGS and RVC_FP_REGS. (REG_CLASS_NAMES): Ditto. (REG_CLASS_CONTENTS): Ditto. * doc/md.texi: Document the cr and cf constraints. * config/riscv/riscv.cc (riscv_regno_to_class): Update FP_REGS to RVC_FP_REGS since it is the smaller set. (riscv_secondary_memory_needed): Handle RVC_FP_REGS. (riscv_register_move_cost): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/constraint-cf-zfinx.c: New. * gcc.target/riscv/constraint-cf.c: New. * gcc.target/riscv/constraint-cr.c: New.
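A hedged inline-asm sketch of the new constraints (not the actual test cases): "cr" limits an operand to the GPRs addressable by compressed instructions (x8-x15) and "cf" to the corresponding FP subset (f8-f15).

void demo (int i, float f)
{
  asm volatile ("# %0 is constrained to x8..x15" : : "cr" (i));
  asm volatile ("# %0 is constrained to f8..f15" : : "cf" (f));
}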
2024-12-17RISC-V: Rename constraint c0* to k0*Kito Cheng4-233/+233
Rename these constraints since we want to define other constraints starting with `c`; they are internal and undocumented, so renaming them is fine. gcc/ChangeLog: * config/riscv/constraints.md (c01): Rename to... (k01): ...this. (c02): Rename to... (k02): ...this. (c03): Rename to... (k03): ...this. (c04): Rename to... (k04): ...this. (c08): Rename to... (k08): ...this. * config/riscv/corev.md (riscv_cv_simd_add_h_si): Update constraints. (riscv_cv_simd_sub_h_si): Ditto. (riscv_cv_simd_cplxmul_i_si): Ditto. (riscv_cv_simd_subrotmj_si): Ditto. * config/riscv/riscv-v.cc (splat_to_scalar_move_p): Update constraints. * config/riscv/vector-iterators.md (stride_load_constraint): Update constraints. (stride_store_constraint): Ditto.
2024-12-16i386: Fix tabs vs. spaces in mmx.mdUros Bizjak1-140/+140
gcc/ChangeLog: * config/i386/mmx.md: Fix tabs vs. spaces.
2024-12-16i386: Add HImode to VALID_SSE2_REG_MODEUros Bizjak2-5/+1
Move explicit HImode handling for SSE2 XMM regnos from ix86_hard_regno_mode_ok to VALID_SSE2_REG_MODE. No functional change. gcc/ChangeLog: * config/i386/i386.cc (ix86_hard_regno_mode_ok): Remove explicit HImode handling for SSE2 XMM regnos. * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HImode.
2024-12-16vect: Do not try to duplicate_and_interleave one-element mode.Robin Dapp1-9/+0
PR112694 shows that we try to create sub-vectors of single-element vectors because can_duplicate_and_interleave_p returns true. The problem resurfaced in PR116611. This patch makes can_duplicate_and_interleave_p return false if count / nvectors > 0 and removes the corresponding check in the riscv backend. This partially gets rid of the FAIL in slp-19a.c. At least when built with the cost model we don't have LOAD_LANES anymore. Without the cost model, as in the test suite, we choose a different path and still end up with LOAD_LANES. Bootstrapped and regtested on x86 and power10, regtested on rv64gcv_zvfh_zvbb. Still waiting for the aarch64 results. gcc/ChangeLog: PR target/112694 PR target/116611 * config/riscv/riscv-v.cc (expand_vec_perm_const): Remove early return. * tree-vect-slp.cc (can_duplicate_and_interleave_p): Return false when we cannot create sub-elements.
2024-12-16RISC-V: Fix compress shuffle pattern [PR117383].Robin Dapp2-3/+4
This patch makes vcompress use the tail-undisturbed policy by default and also uses the proper VL. PR target/117383 gcc/ChangeLog: * config/riscv/riscv-protos.h (enum insn_type): Use TU policy. * config/riscv/riscv-v.cc (shuffle_compress_patterns): Set VL. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c: Expect tu. * gcc.target/riscv/rvv/autovec/pr117383.c: New test.
2024-12-16RISC-V: Increase cost for vec_construct [PR118019].Robin Dapp1-1/+7
For a generic vec_construct from scalar elements we need to load each scalar element and move it over to a vector register. Right now we only use a cost of 1 per element. This patch instead uses the register-move cost plus scalar_to_vec, multiplied by the number of elements in the vector. PR target/118019 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_builtin_vectorization_cost): Increase vec_construct cost. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr118019.c: New test.
2024-12-15hppa: Implement TARGET_FRAME_POINTER_REQUIREDJohn David Anglin1-0/+16
If a function receives nonlocal gotos, it needs to save the frame pointer in the argument save area. This ensures that LRA sets frame_pointer_needed when it saves arguments in the save area. 2024-12-15 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: PR target/118018 * config/pa/pa.cc (pa_frame_pointer_required): Declare and implement. (TARGET_FRAME_POINTER_REQUIRED): Define.
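A hedged GNU C illustration of the situation the hook handles: a nested function performing a nonlocal goto back into its containing function, so the parent's frame pointer must remain recoverable.

int parent (int n)
{
  __label__ bail;
  int acc = 0;

  void child (int v)
  {
    if (v < 0)
      goto bail;        /* nonlocal goto into the containing function */
    acc += v;
  }

  child (n);
  child (n - 10);
  return acc;

 bail:
  return -1;
}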
2024-12-15arm: fix bootstrap after MVE changesTamar Christina1-2/+2
The recent commits for MVE on Saturday have broken armhf bootstrap due to a -Werror false positive: inlined from 'virtual rtx_def* {anonymous}::vstrq_scatter_base_impl::expand(arm_mve::function_expander&) const' at /gcc/config/arm/arm-mve-builtins-base.cc:352:17: ./genrtl.h:38:16: error: 'new_base' may be used uninitialized [-Werror=maybe-uninitialized] 38 | XEXP (rt, 1) = arg1; /gcc/config/arm/arm-mve-builtins-base.cc: In member function 'virtual rtx_def* {anonymous}::vstrq_scatter_base_impl::expand(arm_mve::function_expander&) const': /gcc/config/arm/arm-mve-builtins-base.cc:311:26: note: 'new_base' was declared here 311 | rtx insns, base_ptr, new_base; | ^~~~~~~~ In function 'rtx_def* init_rtx_fmt_ee(rtx, machine_mode, rtx, rtx)', inlined from 'rtx_def* gen_rtx_fmt_ee_stat(rtx_code, machine_mode, rtx, rtx)' at ./genrtl.h:50:26, inlined from 'virtual rtx_def* {anonymous}::vldrq_gather_base_impl::expand(arm_mve::function_expander&) const' at /gcc/config/arm/arm-mve-builtins-base.cc:527:17: ./genrtl.h:38:16: error: 'new_base' may be used uninitialized [-Werror=maybe-uninitialized] 38 | XEXP (rt, 1) = arg1; /gcc/config/arm/arm-mve-builtins-base.cc: In member function 'virtual rtx_def* {anonymous}::vldrq_gather_base_impl::expand(arm_mve::function_expander&) const': /gcc/config/arm/arm-mve-builtins-base.cc:486:26: note: 'new_base' was declared here 486 | rtx insns, base_ptr, new_base; To fix it I just initialize the value. gcc/ChangeLog: * config/arm/arm-mve-builtins-base.cc (expand): Initialize new_base.
2024-12-14bpf: fix build adding new required arg to RESOLVE_OVERLOADED_BUILTINJose E. Marchesi1-1/+2
gcc/ChangeLog: * config/bpf/bpf.cc (bpf_resolve_overloaded_builtin): Add argument `complain'.
2024-12-13arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]Christophe Lyon1-1/+31
In this PR, we have to handle a case where MVE predicates are supplied as a const_int, where individual predicates have illegal boolean values (such as 0xc for a 4-bit boolean predicate). To avoid the ICE, fix the constant (any non-zero value is converted to all 1s) and emit a warning. On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at instruction level, but end-users should describe lanes rather than bytes (so all bytes of a true-predicated lane should be '1'), see the section on MVE intrinsics in the Arm ACLE specification. Since force_lowpart_subreg cannot handle const_int (because they have VOID mode), use gen_lowpart on them, force_lowpart_subreg otherwise. 2024-11-20 Christophe Lyon <christophe.lyon@linaro.org> Jakub Jelinek <jakub@redhat.com> PR target/114801 gcc/ * config/arm/arm-mve-builtins.cc (function_expander::add_input_operand): Handle CONST_INT predicates. gcc/testsuite/ * gcc.target/arm/mve/pr108443.c: Update predicate constant. * gcc.target/arm/mve/pr108443-run.c: Likewise. * gcc.target/arm/mve/pr114801.c: New test.
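A hedged illustration of the predicate constants involved (requires an MVE target): 0x00ff marks lanes 0 and 1 of a 4-lane vector as true using all four bits of each lane, whereas a value such as 0x000c, which sets only two of lane 0's four bits, is the kind of constant the patch now canonicalizes to all-ones and warns about.

#include <arm_mve.h>

uint32x4_t add_lanes_0_and_1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b)
{
  /* Predicated add: only lanes 0 and 1 are computed, the remaining lanes
     come from 'inactive'.  */
  return vaddq_m_u32 (inactive, a, b, 0x00ff);
}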
2024-12-13arm: [MVE intrinsics] rework vst2q vst4q vld2q vld4qChristophe Lyon11-683/+253
Implement vst2q, vst4q, vld2q and vld4q using the new MVE builtins framework. Since MVE uses different tuple modes than Neon, we need to use VALID_MVE_STRUCT_MODE because VALID_NEON_STRUCT_MODE is no longer a super-set of it, for instance in output_move_neon and arm_print_operand_address. In arm_hard_regno_mode_ok, the change is similar but a bit more intrusive. Expand the VSTRUCT iterator, so that mov<mode> and neon_mov<mode> patterns from neon.md still work for MVE. Besides the small updates to the patterns in mve.md, we have to update vec_load_lanes and vec_store_lanes in vec-common.md so that the vectorizer can handle the new modes. These patterns are now different from Neon's, so maybe we should move them back to neon.md and mve.md The patch adds arm_array_mode, which is used by build_array_type_nelts and makes it possible to support the new assert in register_builtin_tuple_types. gcc/ChangeLog: * config/arm/arm-mve-builtins-base.cc (class vst24_impl): New. (class vld24_impl): New. (vld2q, vld4q, vst2q, vst4q): New. * config/arm/arm-mve-builtins-base.def (vld2q, vld4q, vst2q) (vst4q): New. * config/arm/arm-mve-builtins-base.h (vld2q, vld4q, vst2q, vst4q): New. * config/arm/arm-mve-builtins.cc (register_builtin_tuple_types): Add more asserts. * config/arm/arm.cc (TARGET_ARRAY_MODE): New. (output_move_neon): Handle MVE struct modes. (arm_print_operand_address): Likewise. (arm_hard_regno_mode_ok): Likewise. (arm_array_mode): New. * config/arm/arm.h (VALID_MVE_STRUCT_MODE): Likewise. * config/arm/arm_mve.h (vst4q): Delete. (vst2q): Delete. (vld2q): Delete. (vld4q): Delete. (vst4q_s8): Delete. (vst4q_s16): Delete. (vst4q_s32): Delete. (vst4q_u8): Delete. (vst4q_u16): Delete. (vst4q_u32): Delete. (vst4q_f16): Delete. (vst4q_f32): Delete. (vst2q_s8): Delete. (vst2q_u8): Delete. (vld2q_s8): Delete. (vld2q_u8): Delete. (vld4q_s8): Delete. (vld4q_u8): Delete. (vst2q_s16): Delete. (vst2q_u16): Delete. (vld2q_s16): Delete. (vld2q_u16): Delete. (vld4q_s16): Delete. (vld4q_u16): Delete. (vst2q_s32): Delete. (vst2q_u32): Delete. (vld2q_s32): Delete. (vld2q_u32): Delete. (vld4q_s32): Delete. (vld4q_u32): Delete. (vld4q_f16): Delete. (vld2q_f16): Delete. (vst2q_f16): Delete. (vld4q_f32): Delete. (vld2q_f32): Delete. (vst2q_f32): Delete. (__arm_vst4q_s8): Delete. (__arm_vst4q_s16): Delete. (__arm_vst4q_s32): Delete. (__arm_vst4q_u8): Delete. (__arm_vst4q_u16): Delete. (__arm_vst4q_u32): Delete. (__arm_vst2q_s8): Delete. (__arm_vst2q_u8): Delete. (__arm_vld2q_s8): Delete. (__arm_vld2q_u8): Delete. (__arm_vld4q_s8): Delete. (__arm_vld4q_u8): Delete. (__arm_vst2q_s16): Delete. (__arm_vst2q_u16): Delete. (__arm_vld2q_s16): Delete. (__arm_vld2q_u16): Delete. (__arm_vld4q_s16): Delete. (__arm_vld4q_u16): Delete. (__arm_vst2q_s32): Delete. (__arm_vst2q_u32): Delete. (__arm_vld2q_s32): Delete. (__arm_vld2q_u32): Delete. (__arm_vld4q_s32): Delete. (__arm_vld4q_u32): Delete. (__arm_vst4q_f16): Delete. (__arm_vst4q_f32): Delete. (__arm_vld4q_f16): Delete. (__arm_vld2q_f16): Delete. (__arm_vst2q_f16): Delete. (__arm_vld4q_f32): Delete. (__arm_vld2q_f32): Delete. (__arm_vst2q_f32): Delete. (__arm_vst4q): Delete. (__arm_vst2q): Delete. (__arm_vld2q): Delete. (__arm_vld4q): Delete. * config/arm/arm_mve_builtins.def (vst4q, vst2q, vld4q, vld2q): Delete. * config/arm/iterators.md (VSTRUCT): Add V2x16QI, V2x8HI, V2x4SI, V2x8HF, V2x4SF, V4x16QI, V4x8HI, V4x4SI, V4x8HF, V4x4SF. (MVE_VLD2_VST2, MVE_vld2_vst2, MVE_VLD4_VST4, MVE_vld4_vst4): New. * config/arm/mve.md (mve_vst4q<mode>): Update into ... (@mve_vst4q<mode>): ... 
this. (mve_vst2q<mode>): Update into ... (@mve_vst2q<mode>): ... this. (mve_vld2q<mode>): Update into ... (@mve_vld2q<mode>): ... this. (mve_vld4q<mode>): Update into ... (@mve_vld4q<mode>): ... this. * config/arm/vec-common.md (vec_load_lanesoi<mode>) Remove MVE support. (vec_load_lanesxi<mode>): Likewise. (vec_store_lanesoi<mode>): Likewise. (vec_store_lanesxi<mode>): Likewise. (vec_load_lanes<MVE_vld2_vst2><mode>): New. (vec_store_lanes<MVE_vld2_vst2><mode>): New. (vec_load_lanes<MVE_vld4_vst4><mode>): New. (vec_store_lanes<MVE_vld4_vst4><mode>): New.
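A hedged usage sketch of the reworked intrinsics (compile for an MVE target): load two interleaved int16 vectors, double one of the de-interleaved streams, and store the pair back interleaved.

#include <arm_mve.h>

void double_even_stream (int16_t *buf)
{
  int16x8x2_t pair = vld2q_s16 (buf);                  /* de-interleaving load */
  pair.val[0] = vaddq_s16 (pair.val[0], pair.val[0]);
  vst2q_s16 (buf, pair);                               /* interleaving store */
}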
2024-12-13arm: [MVE intrinsics] fix store shape to support tuplesChristophe Lyon1-2/+2
Now that tuples are properly supported, we can update the store shape to expect "t0" instead of "v0" as the last argument. gcc/ChangeLog: * config/arm/arm-mve-builtins-shapes.cc (struct store_def): Add support for tuples.
2024-12-13arm: [MVE intrinsics] add support for tuplesChristophe Lyon3-13/+68
This patch is largely a copy/paste from the aarch64 SVE counterpart, and adds support for tuples to the MVE intrinsics framework. Introduce function_resolver::infer_tuple_type which will be used to resolve overloaded vst2q and vst4q function names in a later patch. Fix access to acle_vector_types in a few places, as well as in infer_vector_or_tuple_type because we should shift the tuple size to the right by one bit when computing the array index. The new wrap_type_in_struct, register_type_decl and infer_tuple_type are largely copies of the aarch64 versions, and register_builtin_tuple_types is very similar. gcc/ChangeLog: * config/arm/arm-mve-builtins-shapes.cc (parse_type): Fix access to acle_vector_types. * config/arm/arm-mve-builtins.cc (wrap_type_in_struct): New. (register_type_decl): New. (register_builtin_tuple_types): Fix support for tuples. (function_resolver::infer_tuple_type): New. * config/arm/arm-mve-builtins.h (function_resolver::infer_tuple_type): Declare. (function_instance::tuple_type): Fix access to acle_vector_types.
2024-12-13arm: [MVE intrinsics] add modes for tuplesChristophe Lyon1-0/+22
Add V2x and V4x modes, like we do on aarch64 for Advanced SIMD q-registers. gcc/ChangeLog: * config/arm/arm-modes.def (MVE_STRUCT_MODES): New.
2024-12-13arm: [MVE intrinsics] remove V2DF from MVE_vecs iteratorChristophe Lyon1-1/+1
V2DF is not supported by MVE, so remove it from the only iterator which contains it. gcc/ChangeLog: * config/arm/iterators.md (MVE_vecs): Remove V2DF.
2024-12-13arm: [MVE intrinsics] Fix condition for vec_extract patternsChristophe Lyon1-4/+2
Remove floating-point condition from mve_vec_extract_sext_internal and mve_vec_extract_zext_internal, since the MVE_2 iterator does not include any FP mode. gcc/ChangeLog: * config/arm/mve.md (mve_vec_extract_sext_internal): Fix condition. (mve_vec_extract_zext_internal): Likewise.
2024-12-13arm: [MVE intrinsics] remove useless call_properties implementations.Christophe Lyon1-10/+0
vstrq_impl derives from store_truncating and vldrq_impl derives from load_extending which both implement call_properties. No need to re-implement them in the derived classes. gcc/ChangeLog: * config/arm/arm-mve-builtins-base.cc (vstrq_impl): Remove call_properties. (vldrq_impl): Likewise.
2024-12-13arm: [MVE intrinsics] rework vldr gather_base_wbChristophe Lyon8-496/+74
Implement vldr?q_gather_base_wb using the new MVE builtins framework. gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_ldrgbwbxu_qualifiers) (arm_ldrgbwbxu_z_qualifiers, arm_ldrgbwbs_qualifiers) (arm_ldrgbwbu_qualifiers, arm_ldrgbwbs_z_qualifiers) (arm_ldrgbwbu_z_qualifiers): Delete. * config/arm/arm-mve-builtins-base.cc (vldrq_gather_base_impl): Add support for MODE_wb. * config/arm/arm-mve-builtins-shapes.cc (struct load_gather_base_def): Likewise. * config/arm/arm_mve.h (vldrdq_gather_base_wb_s64): Delete. (vldrdq_gather_base_wb_u64): Delete. (vldrdq_gather_base_wb_z_s64): Delete. (vldrdq_gather_base_wb_z_u64): Delete. (vldrwq_gather_base_wb_f32): Delete. (vldrwq_gather_base_wb_s32): Delete. (vldrwq_gather_base_wb_u32): Delete. (vldrwq_gather_base_wb_z_f32): Delete. (vldrwq_gather_base_wb_z_s32): Delete. (vldrwq_gather_base_wb_z_u32): Delete. (__arm_vldrdq_gather_base_wb_s64): Delete. (__arm_vldrdq_gather_base_wb_u64): Delete. (__arm_vldrdq_gather_base_wb_z_s64): Delete. (__arm_vldrdq_gather_base_wb_z_u64): Delete. (__arm_vldrwq_gather_base_wb_s32): Delete. (__arm_vldrwq_gather_base_wb_u32): Delete. (__arm_vldrwq_gather_base_wb_z_s32): Delete. (__arm_vldrwq_gather_base_wb_z_u32): Delete. (__arm_vldrwq_gather_base_wb_f32): Delete. (__arm_vldrwq_gather_base_wb_z_f32): Delete. * config/arm/arm_mve_builtins.def (vldrwq_gather_base_nowb_z_u) (vldrdq_gather_base_nowb_z_u, vldrwq_gather_base_nowb_u) (vldrdq_gather_base_nowb_u, vldrwq_gather_base_nowb_z_s) (vldrwq_gather_base_nowb_z_f, vldrdq_gather_base_nowb_z_s) (vldrwq_gather_base_nowb_s, vldrwq_gather_base_nowb_f) (vldrdq_gather_base_nowb_s, vldrdq_gather_base_wb_z_s) (vldrdq_gather_base_wb_z_u, vldrdq_gather_base_wb_s) (vldrdq_gather_base_wb_u, vldrwq_gather_base_wb_z_s) (vldrwq_gather_base_wb_z_f, vldrwq_gather_base_wb_z_u) (vldrwq_gather_base_wb_s, vldrwq_gather_base_wb_f) (vldrwq_gather_base_wb_u): Delete * config/arm/iterators.md (supf): Remove VLDRWQGBWB_S, VLDRWQGBWB_U, VLDRDQGBWB_S, VLDRDQGBWB_U. (VLDRWGBWBQ, VLDRDGBWBQ): Delete. * config/arm/mve.md (mve_vldrwq_gather_base_wb_<supf>v4si): Delete. (mve_vldrwq_gather_base_nowb_<supf>v4si): Delete. (mve_vldrwq_gather_base_wb_<supf>v4si_insn): Delete. (mve_vldrwq_gather_base_wb_z_<supf>v4si): Delete. (mve_vldrwq_gather_base_nowb_z_<supf>v4si): Delete. (mve_vldrwq_gather_base_wb_z_<supf>v4si_insn): Delete. (mve_vldrwq_gather_base_wb_fv4sf): Delete. (mve_vldrwq_gather_base_nowb_fv4sf): Delete. (mve_vldrwq_gather_base_wb_fv4sf_insn): Delete. (mve_vldrwq_gather_base_wb_z_fv4sf): Delete. (mve_vldrwq_gather_base_nowb_z_fv4sf): Delete. (mve_vldrwq_gather_base_wb_z_fv4sf_insn): Delete. (mve_vldrdq_gather_base_wb_<supf>v2di): Delete. (mve_vldrdq_gather_base_nowb_<supf>v2di): Delete. (mve_vldrdq_gather_base_wb_<supf>v2di_insn): Delete. (mve_vldrdq_gather_base_wb_z_<supf>v2di): Delete. (mve_vldrdq_gather_base_nowb_z_<supf>v2di): Delete. (mve_vldrdq_gather_base_wb_z_<supf>v2di_insn): Delete. (@mve_vldrq_gather_base_wb_<mode>): New. (@mve_vldrq_gather_base_wb_z_<mode>): New. * config/arm/unspecs.md (VLDRWQGBWB_S, VLDRWQGBWB_U, VLDRWQGBWB_F) (VLDRDQGBWB_S, VLDRDQGBWB_U): Delete (VLDRGBWBQ, VLDRGBWBQ_Z): New. gcc/testsuite/ChangeLog: * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: Update expected output. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: Likewise.
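A hedged usage sketch of one of the reworked intrinsics: gather four 32-bit values from the addresses held in *base plus an immediate offset of 8, and write the post-incremented base vector back.

#include <arm_mve.h>

int32x4_t gather_and_advance (uint32x4_t *base)
{
  return vldrwq_gather_base_wb_s32 (base, 8);
}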
2024-12-13arm: [MVE intrinsics] rework vldr gather_baseChristophe Lyon9-245/+71
Implement vldr?q_gather_base using the new MVE builtins framework. The patch updates two testcases rather than using different iterators for predicated and non-predicated versions. According to ACLE: vldrdq_gather_base_s64 is expected to generate VLDRD.64 vldrdq_gather_base_z_s64 is expected to generate VLDRDT.U64 Both are equally valid, however. gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_ldrgbs_qualifiers) (arm_ldrgbu_qualifiers, arm_ldrgbs_z_qualifiers) (arm_ldrgbu_z_qualifiers): Delete. * config/arm/arm-mve-builtins-base.cc (class vldrq_gather_base_impl): New. (vldrdq_gather_base, vldrwq_gather_base): New. * config/arm/arm-mve-builtins-base.def (vldrdq_gather_base) (vldrwq_gather_base): New. * config/arm/arm-mve-builtins-base.h: (vldrdq_gather_base) (vldrwq_gather_base): New. * config/arm/arm_mve.h (vldrwq_gather_base_s32): Delete. (vldrwq_gather_base_u32): Delete. (vldrwq_gather_base_z_u32): Delete. (vldrwq_gather_base_z_s32): Delete. (vldrdq_gather_base_s64): Delete. (vldrdq_gather_base_u64): Delete. (vldrdq_gather_base_z_s64): Delete. (vldrdq_gather_base_z_u64): Delete. (vldrwq_gather_base_f32): Delete. (vldrwq_gather_base_z_f32): Delete. (__arm_vldrwq_gather_base_s32): Delete. (__arm_vldrwq_gather_base_u32): Delete. (__arm_vldrwq_gather_base_z_s32): Delete. (__arm_vldrwq_gather_base_z_u32): Delete. (__arm_vldrdq_gather_base_s64): Delete. (__arm_vldrdq_gather_base_u64): Delete. (__arm_vldrdq_gather_base_z_s64): Delete. (__arm_vldrdq_gather_base_z_u64): Delete. (__arm_vldrwq_gather_base_f32): Delete. (__arm_vldrwq_gather_base_z_f32): Delete. * config/arm/arm_mve_builtins.def (vldrwq_gather_base_s) (vldrwq_gather_base_u, vldrwq_gather_base_z_s) (vldrwq_gather_base_z_u, vldrdq_gather_base_s) (vldrwq_gather_base_f, vldrdq_gather_base_z_s) (vldrwq_gather_base_z_f, vldrdq_gather_base_u) (vldrdq_gather_base_z_u): Delete. * config/arm/iterators.md (supf): Remove VLDRWQGB_S, VLDRWQGB_U, VLDRDQGB_S, VLDRDQGB_U. (VLDRWGBQ, VLDRDGBQ): Delete. * config/arm/mve.md (mve_vldrwq_gather_base_<supf>v4si): Delete. (mve_vldrwq_gather_base_z_<supf>v4si): Delete. (mve_vldrdq_gather_base_<supf>v2di): Delete. (mve_vldrdq_gather_base_z_<supf>v2di): Delete. (mve_vldrwq_gather_base_fv4sf): Delete. (mve_vldrwq_gather_base_z_fv4sf): Delete. (@mve_vldrq_gather_base_<mode>): New. (@mve_vldrq_gather_base_z_<mode>): New. * config/arm/unspecs.md (VLDRWQGB_S, VLDRWQGB_U, VLDRDQGB_S) (VLDRDQGB_U, VLDRWQGB_F): Delete. (VLDRGBQ, VLDRGBQ_Z): New. gcc/testsuite/ChangeLog: * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_s64.c: Update expected output. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_u64.c: Likewise.
2024-12-13arm: [MVE intrinsics] add load_gather_base shapeChristophe Lyon2-0/+39
This patch adds the load_gather_base shape description. Unlike other load_gather shapes, this one does not support overloaded forms. gcc/ChangeLog: * config/arm/arm-mve-builtins-shapes.cc (struct load_gather_base_def): New. * config/arm/arm-mve-builtins-shapes.h: (load_gather_base): New.
2024-12-13arm: [MVE intrinsics] rework vldr gather_shifted_offsetChristophe Lyon9-653/+129
Implement vldr?q_gather_shifted_offset using the new MVE builtins framework. gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_ldrgu_qualifiers) (arm_ldrgs_qualifiers, arm_ldrgs_z_qualifiers) (arm_ldrgu_z_qualifiers): Delete. * config/arm/arm-mve-builtins-base.cc (vldrq_gather_impl): Add support for shifted version. (vldrdq_gather_shifted, vldrhq_gather_shifted) (vldrwq_gather_shifted): New. * config/arm/arm-mve-builtins-base.def (vldrdq_gather_shifted) (vldrhq_gather_shifted, vldrwq_gather_shifted): New. * config/arm/arm-mve-builtins-base.h (vldrdq_gather_shifted) (vldrhq_gather_shifted, vldrwq_gather_shifted): New. * config/arm/arm_mve.h (vldrhq_gather_shifted_offset): Delete. (vldrhq_gather_shifted_offset_z): Delete. (vldrdq_gather_shifted_offset): Delete. (vldrdq_gather_shifted_offset_z): Delete. (vldrwq_gather_shifted_offset): Delete. (vldrwq_gather_shifted_offset_z): Delete. (vldrhq_gather_shifted_offset_s32): Delete. (vldrhq_gather_shifted_offset_s16): Delete. (vldrhq_gather_shifted_offset_u32): Delete. (vldrhq_gather_shifted_offset_u16): Delete. (vldrhq_gather_shifted_offset_z_s32): Delete. (vldrhq_gather_shifted_offset_z_s16): Delete. (vldrhq_gather_shifted_offset_z_u32): Delete. (vldrhq_gather_shifted_offset_z_u16): Delete. (vldrdq_gather_shifted_offset_s64): Delete. (vldrdq_gather_shifted_offset_u64): Delete. (vldrdq_gather_shifted_offset_z_s64): Delete. (vldrdq_gather_shifted_offset_z_u64): Delete. (vldrhq_gather_shifted_offset_f16): Delete. (vldrhq_gather_shifted_offset_z_f16): Delete. (vldrwq_gather_shifted_offset_f32): Delete. (vldrwq_gather_shifted_offset_s32): Delete. (vldrwq_gather_shifted_offset_u32): Delete. (vldrwq_gather_shifted_offset_z_f32): Delete. (vldrwq_gather_shifted_offset_z_s32): Delete. (vldrwq_gather_shifted_offset_z_u32): Delete. (__arm_vldrhq_gather_shifted_offset_s32): Delete. (__arm_vldrhq_gather_shifted_offset_s16): Delete. (__arm_vldrhq_gather_shifted_offset_u32): Delete. (__arm_vldrhq_gather_shifted_offset_u16): Delete. (__arm_vldrhq_gather_shifted_offset_z_s32): Delete. (__arm_vldrhq_gather_shifted_offset_z_s16): Delete. (__arm_vldrhq_gather_shifted_offset_z_u32): Delete. (__arm_vldrhq_gather_shifted_offset_z_u16): Delete. (__arm_vldrdq_gather_shifted_offset_s64): Delete. (__arm_vldrdq_gather_shifted_offset_u64): Delete. (__arm_vldrdq_gather_shifted_offset_z_s64): Delete. (__arm_vldrdq_gather_shifted_offset_z_u64): Delete. (__arm_vldrwq_gather_shifted_offset_s32): Delete. (__arm_vldrwq_gather_shifted_offset_u32): Delete. (__arm_vldrwq_gather_shifted_offset_z_s32): Delete. (__arm_vldrwq_gather_shifted_offset_z_u32): Delete. (__arm_vldrhq_gather_shifted_offset_f16): Delete. (__arm_vldrhq_gather_shifted_offset_z_f16): Delete. (__arm_vldrwq_gather_shifted_offset_f32): Delete. (__arm_vldrwq_gather_shifted_offset_z_f32): Delete. (__arm_vldrhq_gather_shifted_offset): Delete. (__arm_vldrhq_gather_shifted_offset_z): Delete. (__arm_vldrdq_gather_shifted_offset): Delete. (__arm_vldrdq_gather_shifted_offset_z): Delete. (__arm_vldrwq_gather_shifted_offset): Delete. (__arm_vldrwq_gather_shifted_offset_z): Delete. 
* config/arm/arm_mve_builtins.def (vldrhq_gather_shifted_offset_z_u, vldrhq_gather_shifted_offset_u) (vldrhq_gather_shifted_offset_z_s, vldrhq_gather_shifted_offset_s) (vldrdq_gather_shifted_offset_s, vldrhq_gather_shifted_offset_f) (vldrwq_gather_shifted_offset_f, vldrwq_gather_shifted_offset_s) (vldrdq_gather_shifted_offset_z_s) (vldrhq_gather_shifted_offset_z_f) (vldrwq_gather_shifted_offset_z_f) (vldrwq_gather_shifted_offset_z_s, vldrdq_gather_shifted_offset_u) (vldrwq_gather_shifted_offset_u, vldrdq_gather_shifted_offset_z_u) (vldrwq_gather_shifted_offset_z_u): Delete. * config/arm/iterators.md (supf): Remove VLDRHQGSO_S, VLDRHQGSO_U, VLDRDQGSO_S, VLDRDQGSO_U, VLDRWQGSO_S, VLDRWQGSO_U. (VLDRHGSOQ, VLDRDGSOQ, VLDRWGSOQ): Delete. * config/arm/mve.md (mve_vldrhq_gather_shifted_offset_<supf><mode>): Delete. (mve_vldrhq_gather_shifted_offset_z_<supf><mode>): Delete. (mve_vldrdq_gather_shifted_offset_<supf>v2di): Delete. (mve_vldrdq_gather_shifted_offset_z_<supf>v2di): Delete. (mve_vldrhq_gather_shifted_offset_fv8hf): Delete. (mve_vldrhq_gather_shifted_offset_z_fv8hf): Delete. (mve_vldrwq_gather_shifted_offset_fv4sf): Delete. (mve_vldrwq_gather_shifted_offset_<supf>v4si): Delete. (mve_vldrwq_gather_shifted_offset_z_fv4sf): Delete. (mve_vldrwq_gather_shifted_offset_z_<supf>v4si): Delete. (@mve_vldrq_gather_shifted_offset_<mode>): New. (@mve_vldrq_gather_shifted_offset_extend_v4si<US>): New. (@mve_vldrq_gather_shifted_offset_z_<mode>): New. (@mve_vldrq_gather_shifted_offset_z_extend_v4si<US>): New. * config/arm/unspecs.md (VLDRHQGSO_S, VLDRHQGSO_U, VLDRDQGSO_S) (VLDRDQGSO_U, VLDRHQGSO_F, VLDRWQGSO_F, VLDRWQGSO_S, VLDRWQGSO_U): Delete. (VLDRGSOQ, VLDRGSOQ_Z, VLDRGSOQ_EXT, VLDRGSOQ_EXT_Z): New.
2024-12-13arm: [MVE intrinsics] rework vldr gather_offsetChristophe Lyon8-884/+156
Implement vldr?q_gather_offset using the new MVE builtins framework. The patch introduces a new attribute iterator (MVE_u_elem) to accomodate the fact that ACLE's expected output description uses "uNN" for all modes, except V8HF where it expects ".f16". Using "V_sz_elem" would work, but would require to update several testcases. gcc/ChangeLog: * config/arm/arm-mve-builtins-base.cc (class vldrq_gather_impl): New. (vldrbq_gather, vldrdq_gather, vldrhq_gather, vldrwq_gather): New. * config/arm/arm-mve-builtins-base.def (vldrbq_gather) (vldrdq_gather, vldrhq_gather, vldrwq_gather): New. * config/arm/arm-mve-builtins-base.h (vldrbq_gather) (vldrdq_gather, vldrhq_gather, vldrwq_gather): New. * config/arm/arm_mve.h (vldrbq_gather_offset): Delete. (vldrbq_gather_offset_z): Delete. (vldrhq_gather_offset): Delete. (vldrhq_gather_offset_z): Delete. (vldrdq_gather_offset): Delete. (vldrdq_gather_offset_z): Delete. (vldrwq_gather_offset): Delete. (vldrwq_gather_offset_z): Delete. (vldrbq_gather_offset_u8): Delete. (vldrbq_gather_offset_s8): Delete. (vldrbq_gather_offset_u16): Delete. (vldrbq_gather_offset_s16): Delete. (vldrbq_gather_offset_u32): Delete. (vldrbq_gather_offset_s32): Delete. (vldrbq_gather_offset_z_s16): Delete. (vldrbq_gather_offset_z_u8): Delete. (vldrbq_gather_offset_z_s32): Delete. (vldrbq_gather_offset_z_u16): Delete. (vldrbq_gather_offset_z_u32): Delete. (vldrbq_gather_offset_z_s8): Delete. (vldrhq_gather_offset_s32): Delete. (vldrhq_gather_offset_s16): Delete. (vldrhq_gather_offset_u32): Delete. (vldrhq_gather_offset_u16): Delete. (vldrhq_gather_offset_z_s32): Delete. (vldrhq_gather_offset_z_s16): Delete. (vldrhq_gather_offset_z_u32): Delete. (vldrhq_gather_offset_z_u16): Delete. (vldrdq_gather_offset_s64): Delete. (vldrdq_gather_offset_u64): Delete. (vldrdq_gather_offset_z_s64): Delete. (vldrdq_gather_offset_z_u64): Delete. (vldrhq_gather_offset_f16): Delete. (vldrhq_gather_offset_z_f16): Delete. (vldrwq_gather_offset_f32): Delete. (vldrwq_gather_offset_s32): Delete. (vldrwq_gather_offset_u32): Delete. (vldrwq_gather_offset_z_f32): Delete. (vldrwq_gather_offset_z_s32): Delete. (vldrwq_gather_offset_z_u32): Delete. (__arm_vldrbq_gather_offset_u8): Delete. (__arm_vldrbq_gather_offset_s8): Delete. (__arm_vldrbq_gather_offset_u16): Delete. (__arm_vldrbq_gather_offset_s16): Delete. (__arm_vldrbq_gather_offset_u32): Delete. (__arm_vldrbq_gather_offset_s32): Delete. (__arm_vldrbq_gather_offset_z_s8): Delete. (__arm_vldrbq_gather_offset_z_s32): Delete. (__arm_vldrbq_gather_offset_z_s16): Delete. (__arm_vldrbq_gather_offset_z_u8): Delete. (__arm_vldrbq_gather_offset_z_u32): Delete. (__arm_vldrbq_gather_offset_z_u16): Delete. (__arm_vldrhq_gather_offset_s32): Delete. (__arm_vldrhq_gather_offset_s16): Delete. (__arm_vldrhq_gather_offset_u32): Delete. (__arm_vldrhq_gather_offset_u16): Delete. (__arm_vldrhq_gather_offset_z_s32): Delete. (__arm_vldrhq_gather_offset_z_s16): Delete. (__arm_vldrhq_gather_offset_z_u32): Delete. (__arm_vldrhq_gather_offset_z_u16): Delete. (__arm_vldrdq_gather_offset_s64): Delete. (__arm_vldrdq_gather_offset_u64): Delete. (__arm_vldrdq_gather_offset_z_s64): Delete. (__arm_vldrdq_gather_offset_z_u64): Delete. (__arm_vldrwq_gather_offset_s32): Delete. (__arm_vldrwq_gather_offset_u32): Delete. (__arm_vldrwq_gather_offset_z_s32): Delete. (__arm_vldrwq_gather_offset_z_u32): Delete. (__arm_vldrhq_gather_offset_f16): Delete. (__arm_vldrhq_gather_offset_z_f16): Delete. (__arm_vldrwq_gather_offset_f32): Delete. (__arm_vldrwq_gather_offset_z_f32): Delete. 
(__arm_vldrbq_gather_offset): Delete. (__arm_vldrbq_gather_offset_z): Delete. (__arm_vldrhq_gather_offset): Delete. (__arm_vldrhq_gather_offset_z): Delete. (__arm_vldrdq_gather_offset): Delete. (__arm_vldrdq_gather_offset_z): Delete. (__arm_vldrwq_gather_offset): Delete. (__arm_vldrwq_gather_offset_z): Delete. * config/arm/arm_mve_builtins.def (vldrbq_gather_offset_u) (vldrbq_gather_offset_s, vldrbq_gather_offset_z_s) (vldrbq_gather_offset_z_u, vldrhq_gather_offset_z_u) (vldrhq_gather_offset_u, vldrhq_gather_offset_z_s) (vldrhq_gather_offset_s, vldrdq_gather_offset_s) (vldrhq_gather_offset_f, vldrwq_gather_offset_f) (vldrwq_gather_offset_s, vldrdq_gather_offset_z_s) (vldrhq_gather_offset_z_f, vldrwq_gather_offset_z_f) (vldrwq_gather_offset_z_s, vldrdq_gather_offset_u) (vldrwq_gather_offset_u, vldrdq_gather_offset_z_u) (vldrwq_gather_offset_z_u): Delete. * config/arm/iterators.md (MVE_u_elem): New. (supf): Remove VLDRBQGO_S, VLDRBQGO_U, VLDRHQGO_S, VLDRHQGO_U, VLDRDQGO_S, VLDRDQGO_U, VLDRWQGO_S, VLDRWQGO_U. (VLDRBGOQ, VLDRHGOQ, VLDRDGOQ, VLDRWGOQ): Delete. * config/arm/mve.md (mve_vldrbq_gather_offset_<supf><mode>): Delete. (mve_vldrbq_gather_offset_z_<supf><mode>): Delete. (mve_vldrhq_gather_offset_<supf><mode>): Delete. (mve_vldrhq_gather_offset_z_<supf><mode>): Delete. (mve_vldrdq_gather_offset_<supf>v2di): Delete. (mve_vldrdq_gather_offset_z_<supf>v2di): Delete. (mve_vldrhq_gather_offset_fv8hf): Delete. (mve_vldrhq_gather_offset_z_fv8hf): Delete. (mve_vldrwq_gather_offset_fv4sf): Delete. (mve_vldrwq_gather_offset_<supf>v4si): Delete. (mve_vldrwq_gather_offset_z_fv4sf): Delete. (mve_vldrwq_gather_offset_z_<supf>v4si): Delete. (@mve_vldrq_gather_offset_<mode>): New. (@mve_vldrq_gather_offset_extend_<mode><US>): New. (@mve_vldrq_gather_offset_z_<mode>): New. (@mve_vldrq_gather_offset_z_extend_<mode><US>): New. * config/arm/unspecs.md (VLDRBQGO_S, VLDRBQGO_U, VLDRHQGO_S) (VLDRHQGO_U, VLDRDQGO_S, VLDRDQGO_U, VLDRHQGO_F, VLDRWQGO_F) (VLDRWQGO_S, VLDRWQGO_U): Delete. (VLDRGOQ, VLDRGOQ_Z, VLDRGOQ_EXT, VLDRGOQ_EXT_Z): New.
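A hedged usage sketch of one of the reworked gather loads: fetch base[offsets[i]] for each lane (vldrbq offsets are byte offsets, not scaled).

#include <arm_mve.h>

uint8x16_t gather_bytes (const uint8_t *base, uint8x16_t offsets)
{
  return vldrbq_gather_offset_u8 (base, offsets);
}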
2024-12-13arm: [MVE intrinsics] add load_ext_gather_offset shapeChristophe Lyon2-0/+59
This patch adds the load_ext_gather_offset shape description. gcc/ChangeLog: * config/arm/arm-mve-builtins-shapes.cc (struct load_ext_gather): New. (struct load_ext_gather_offset_def): New. * config/arm/arm-mve-builtins-shapes.h (load_ext_gather_offset): New.
2024-12-13arm: [MVE intrinsics] rework vstr scatter_base_wbChristophe Lyon10-378/+128
Implement vstr?q_scatter_base_wb using the new MVE builtins framework. The patch introduces a new 'b' type for signatures, which represents the type of the 'base' argument of vstr?q_scatter_base_wb. gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_strsbwbs_qualifiers) (arm_strsbwbu_qualifiers, arm_strsbwbs_p_qualifiers) (arm_strsbwbu_p_qualifiers): Delete. * config/arm/arm-mve-builtins-base.cc (vstrq_scatter_base_impl): Add support for MODE_wb. * config/arm/arm-mve-builtins-shapes.cc (parse_type): Add support for 'b' type. (store_scatter_base): Add support for MODE_wb. * config/arm/arm-mve-builtins.cc (function_resolver::require_pointer_to_type): New. * config/arm/arm-mve-builtins.h (function_resolver::require_pointer_to_type): New. * config/arm/arm_mve.h (vstrdq_scatter_base_wb): Delete. (vstrdq_scatter_base_wb_p): Delete. (vstrwq_scatter_base_wb_p): Delete. (vstrwq_scatter_base_wb): Delete. (vstrdq_scatter_base_wb_p_s64): Delete. (vstrdq_scatter_base_wb_p_u64): Delete. (vstrdq_scatter_base_wb_s64): Delete. (vstrdq_scatter_base_wb_u64): Delete. (vstrwq_scatter_base_wb_p_s32): Delete. (vstrwq_scatter_base_wb_p_f32): Delete. (vstrwq_scatter_base_wb_p_u32): Delete. (vstrwq_scatter_base_wb_s32): Delete. (vstrwq_scatter_base_wb_u32): Delete. (vstrwq_scatter_base_wb_f32): Delete. (__arm_vstrdq_scatter_base_wb_s64): Delete. (__arm_vstrdq_scatter_base_wb_u64): Delete. (__arm_vstrdq_scatter_base_wb_p_s64): Delete. (__arm_vstrdq_scatter_base_wb_p_u64): Delete. (__arm_vstrwq_scatter_base_wb_p_s32): Delete. (__arm_vstrwq_scatter_base_wb_p_u32): Delete. (__arm_vstrwq_scatter_base_wb_s32): Delete. (__arm_vstrwq_scatter_base_wb_u32): Delete. (__arm_vstrwq_scatter_base_wb_f32): Delete. (__arm_vstrwq_scatter_base_wb_p_f32): Delete. (__arm_vstrdq_scatter_base_wb): Delete. (__arm_vstrdq_scatter_base_wb_p): Delete. (__arm_vstrwq_scatter_base_wb_p): Delete. (__arm_vstrwq_scatter_base_wb): Delete. * config/arm/arm_mve_builtins.def (vstrwq_scatter_base_wb_u) (vstrdq_scatter_base_wb_u, vstrwq_scatter_base_wb_p_u) (vstrdq_scatter_base_wb_p_u, vstrwq_scatter_base_wb_s) (vstrwq_scatter_base_wb_f, vstrdq_scatter_base_wb_s) (vstrwq_scatter_base_wb_p_s, vstrwq_scatter_base_wb_p_f) (vstrdq_scatter_base_wb_p_s): Delete. * config/arm/iterators.md (supf): Remove VSTRWQSBWB_S, VSTRWQSBWB_U, VSTRDQSBWB_S, VSTRDQSBWB_U. (VSTRDSBQ, VSTRWSBWBQ, VSTRDSBWBQ): Delete. * config/arm/mve.md (mve_vstrwq_scatter_base_wb_<supf>v4si): Delete. (mve_vstrwq_scatter_base_wb_p_<supf>v4si): Delete. (mve_vstrwq_scatter_base_wb_fv4sf): Delete. (mve_vstrwq_scatter_base_wb_p_fv4sf): Delete. (mve_vstrdq_scatter_base_wb_<supf>v2di): Delete. (mve_vstrdq_scatter_base_wb_p_<supf>v2di): Delete. (@mve_vstrq_scatter_base_wb_<mode>): New. (@mve_vstrq_scatter_base_wb_p_<mode>): New. * config/arm/unspecs.md (VSTRWQSBWB_S, VSTRWQSBWB_U, VSTRWQSBWB_F) (VSTRDQSBWB_S, VSTRDQSBWB_U): Delete. (VSTRSBWBQ, VSTRSBWBQ_P): New.
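A hedged usage sketch of a scatter store with writeback covered by this rework: store each lane of v at the corresponding base address plus 8, then update the base vector.

#include <arm_mve.h>

void scatter_and_advance (uint32x4_t *base, int32x4_t v)
{
  vstrwq_scatter_base_wb_s32 (base, 8, v);
}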
2024-12-13arm: [MVE intrinsics] rework vstr scatter_baseChristophe Lyon9-360/+72
Implement vstr?q_scatter_base using the new MVE builtins framework. We need to introduce a new iterator (MVE_4) to support the set needed by vstr?q_scatter_base (V4SI V4SF V2DI). gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_strsbs_qualifiers) (arm_strsbu_qualifiers, arm_strsbs_p_qualifiers) (arm_strsbu_p_qualifiers): Delete. * config/arm/arm-mve-builtins-base.cc (class vstrq_scatter_base_impl): New. (vstrwq_scatter_base, vstrdq_scatter_base): New. * config/arm/arm-mve-builtins-base.def (vstrwq_scatter_base) (vstrdq_scatter_base): New. * config/arm/arm-mve-builtins-base.h (vstrwq_scatter_base) (vstrdq_scatter_base): New. * config/arm/arm_mve.h (vstrwq_scatter_base): Delete. (vstrwq_scatter_base_p): Delete. (vstrdq_scatter_base_p): Delete. (vstrdq_scatter_base): Delete. (vstrwq_scatter_base_s32): Delete. (vstrwq_scatter_base_u32): Delete. (vstrwq_scatter_base_p_s32): Delete. (vstrwq_scatter_base_p_u32): Delete. (vstrdq_scatter_base_p_s64): Delete. (vstrdq_scatter_base_p_u64): Delete. (vstrdq_scatter_base_s64): Delete. (vstrdq_scatter_base_u64): Delete. (vstrwq_scatter_base_f32): Delete. (vstrwq_scatter_base_p_f32): Delete. (__arm_vstrwq_scatter_base_s32): Delete. (__arm_vstrwq_scatter_base_u32): Delete. (__arm_vstrwq_scatter_base_p_s32): Delete. (__arm_vstrwq_scatter_base_p_u32): Delete. (__arm_vstrdq_scatter_base_p_s64): Delete. (__arm_vstrdq_scatter_base_p_u64): Delete. (__arm_vstrdq_scatter_base_s64): Delete. (__arm_vstrdq_scatter_base_u64): Delete. (__arm_vstrwq_scatter_base_f32): Delete. (__arm_vstrwq_scatter_base_p_f32): Delete. (__arm_vstrwq_scatter_base): Delete. (__arm_vstrwq_scatter_base_p): Delete. (__arm_vstrdq_scatter_base_p): Delete. (__arm_vstrdq_scatter_base): Delete. * config/arm/arm_mve_builtins.def (vstrwq_scatter_base_s) (vstrwq_scatter_base_u, vstrwq_scatter_base_p_s) (vstrwq_scatter_base_p_u, vstrdq_scatter_base_s) (vstrwq_scatter_base_f, vstrdq_scatter_base_p_s) (vstrwq_scatter_base_p_f, vstrdq_scatter_base_u) (vstrdq_scatter_base_p_u): Delete. * config/arm/iterators.md (MVE_4): New. (supf): Remove VSTRWQSB_S, VSTRWQSB_U. (VSTRWSBQ): Delete. * config/arm/mve.md (mve_vstrwq_scatter_base_<supf>v4si): Delete. (mve_vstrwq_scatter_base_p_<supf>v4si): Delete. (mve_vstrdq_scatter_base_p_<supf>v2di): Delete. (mve_vstrdq_scatter_base_<supf>v2di): Delete. (mve_vstrwq_scatter_base_fv4sf): Delete. (mve_vstrwq_scatter_base_p_fv4sf): Delete. (@mve_vstrq_scatter_base_<mode>): New. (@mve_vstrq_scatter_base_p_<mode>): New. * config/arm/unspecs.md (VSTRWQSB_S, VSTRWQSB_U, VSTRWQSB_F): Delete. (VSTRSBQ, VSTRSBQ_P): New.
2024-12-13arm: [MVE intrinsics] Add store_scatter_base shapeChristophe Lyon2-0/+50
This patch adds the store_scatter_base shape description. gcc/ChangeLog: * config/arm/arm-mve-builtins-shapes.cc (store_scatter_base): New. * config/arm/arm-mve-builtins-shapes.h (store_scatter_base): New.
2024-12-13arm: [MVE intrinsics] Check immediate is a multiple in a rangeChristophe Lyon2-0/+63
This patch adds support for checking that an immediate is a multiple of a given value in a given range. This will be used for instance by scatter_base to check that the offset is in +/-4*[0..127], i.e. a multiple of 4 in the range [-508, 508]. Unlike require_immediate_range, require_immediate_range_multiple accepts signed range bounds to handle the above case. gcc/ChangeLog: * config/arm/arm-mve-builtins.cc (report_out_of_range_multiple): New. (function_checker::require_signed_immediate): New. (function_checker::require_immediate_range_multiple): New. * config/arm/arm-mve-builtins.h (function_checker::require_immediate_range_multiple): New. (function_checker::require_signed_immediate): New.
2024-12-13arm: [MVE intrinsics] rework vstr_scatter_shifted_offsetChristophe Lyon9-786/+103
Implement vstr?q_scatter_shifted_offset intrinsics using the MVE builtins framework. We use the same approach as the previous patch, and we now have four sets of patterns: - vector scatter stores with shifted offset (non-truncating) - predicated vector scatter stores with shifted offset (non-truncating) - truncating vector scatter stores with shifted offset - predicated truncating vector scatter stores with shifted offset Note that the truncating patterns do not use an iterator since there is only one such variant: V4SI to V4HI. We need to introduce new iterators: - MVE_VLD_ST_scatter_shifted, same as MVE_VLD_ST_scatter without V16QI - MVE_scatter_shift to map the mode to the shift amount gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_strss_qualifiers) (arm_strsu_qualifiers, arm_strsu_p_qualifiers) (arm_strss_p_qualifiers): Delete. * config/arm/arm-mve-builtins-base.cc (class vstrq_scatter_impl): Add support for shifted version. (vstrdq_scatter_shifted, vstrhq_scatter_shifted) (vstrwq_scatter_shifted): New. * config/arm/arm-mve-builtins-base.def (vstrhq_scatter_shifted) (vstrwq_scatter_shifted, vstrdq_scatter_shifted): New. * config/arm/arm-mve-builtins-base.h (vstrhq_scatter_shifted) (vstrwq_scatter_shifted, vstrdq_scatter_shifted): New. * config/arm/arm_mve.h (vstrhq_scatter_shifted_offset): Delete. (vstrhq_scatter_shifted_offset_p): Delete. (vstrdq_scatter_shifted_offset_p): Delete. (vstrdq_scatter_shifted_offset): Delete. (vstrwq_scatter_shifted_offset_p): Delete. (vstrwq_scatter_shifted_offset): Delete. (vstrhq_scatter_shifted_offset_s32): Delete. (vstrhq_scatter_shifted_offset_s16): Delete. (vstrhq_scatter_shifted_offset_u32): Delete. (vstrhq_scatter_shifted_offset_u16): Delete. (vstrhq_scatter_shifted_offset_p_s32): Delete. (vstrhq_scatter_shifted_offset_p_s16): Delete. (vstrhq_scatter_shifted_offset_p_u32): Delete. (vstrhq_scatter_shifted_offset_p_u16): Delete. (vstrdq_scatter_shifted_offset_p_s64): Delete. (vstrdq_scatter_shifted_offset_p_u64): Delete. (vstrdq_scatter_shifted_offset_s64): Delete. (vstrdq_scatter_shifted_offset_u64): Delete. (vstrhq_scatter_shifted_offset_f16): Delete. (vstrhq_scatter_shifted_offset_p_f16): Delete. (vstrwq_scatter_shifted_offset_f32): Delete. (vstrwq_scatter_shifted_offset_p_f32): Delete. (vstrwq_scatter_shifted_offset_p_s32): Delete. (vstrwq_scatter_shifted_offset_p_u32): Delete. (vstrwq_scatter_shifted_offset_s32): Delete. (vstrwq_scatter_shifted_offset_u32): Delete. (__arm_vstrhq_scatter_shifted_offset_s32): Delete. (__arm_vstrhq_scatter_shifted_offset_s16): Delete. (__arm_vstrhq_scatter_shifted_offset_u32): Delete. (__arm_vstrhq_scatter_shifted_offset_u16): Delete. (__arm_vstrhq_scatter_shifted_offset_p_s32): Delete. (__arm_vstrhq_scatter_shifted_offset_p_s16): Delete. (__arm_vstrhq_scatter_shifted_offset_p_u32): Delete. (__arm_vstrhq_scatter_shifted_offset_p_u16): Delete. (__arm_vstrdq_scatter_shifted_offset_p_s64): Delete. (__arm_vstrdq_scatter_shifted_offset_p_u64): Delete. (__arm_vstrdq_scatter_shifted_offset_s64): Delete. (__arm_vstrdq_scatter_shifted_offset_u64): Delete. (__arm_vstrwq_scatter_shifted_offset_p_s32): Delete. (__arm_vstrwq_scatter_shifted_offset_p_u32): Delete. (__arm_vstrwq_scatter_shifted_offset_s32): Delete. (__arm_vstrwq_scatter_shifted_offset_u32): Delete. (__arm_vstrhq_scatter_shifted_offset_f16): Delete. (__arm_vstrhq_scatter_shifted_offset_p_f16): Delete. (__arm_vstrwq_scatter_shifted_offset_f32): Delete. (__arm_vstrwq_scatter_shifted_offset_p_f32): Delete. (__arm_vstrhq_scatter_shifted_offset): Delete. 
(__arm_vstrhq_scatter_shifted_offset_p): Delete. (__arm_vstrdq_scatter_shifted_offset_p): Delete. (__arm_vstrdq_scatter_shifted_offset): Delete. (__arm_vstrwq_scatter_shifted_offset_p): Delete. (__arm_vstrwq_scatter_shifted_offset): Delete. * config/arm/arm_mve_builtins.def (vstrhq_scatter_shifted_offset_p_u) (vstrhq_scatter_shifted_offset_u) (vstrhq_scatter_shifted_offset_p_s) (vstrhq_scatter_shifted_offset_s, vstrdq_scatter_shifted_offset_s) (vstrhq_scatter_shifted_offset_f, vstrwq_scatter_shifted_offset_f) (vstrwq_scatter_shifted_offset_s) (vstrdq_scatter_shifted_offset_p_s) (vstrhq_scatter_shifted_offset_p_f) (vstrwq_scatter_shifted_offset_p_f) (vstrwq_scatter_shifted_offset_p_s) (vstrdq_scatter_shifted_offset_u, vstrwq_scatter_shifted_offset_u) (vstrdq_scatter_shifted_offset_p_u) (vstrwq_scatter_shifted_offset_p_u): Delete. * config/arm/iterators.md (MVE_VLD_ST_scatter_shifted): New. (MVE_scatter_shift): New. (supf): Remove VSTRHQSSO_S, VSTRHQSSO_U, VSTRDQSSO_S, VSTRDQSSO_U, VSTRWQSSO_U, VSTRWQSSO_S. (VSTRHSSOQ, VSTRDSSOQ, VSTRWSSOQ): Delete. * config/arm/mve.md (mve_vstrhq_scatter_shifted_offset_p_<supf><mode>): Delete. (mve_vstrhq_scatter_shifted_offset_p_<supf><mode>_insn): Delete. (mve_vstrhq_scatter_shifted_offset_<supf><mode>): Delete. (mve_vstrhq_scatter_shifted_offset_<supf><mode>_insn): Delete. (mve_vstrdq_scatter_shifted_offset_p_<supf>v2di): Delete. (mve_vstrdq_scatter_shifted_offset_p_<supf>v2di_insn): Delete. (mve_vstrdq_scatter_shifted_offset_<supf>v2di): Delete. (mve_vstrdq_scatter_shifted_offset_<supf>v2di_insn): Delete. (mve_vstrhq_scatter_shifted_offset_fv8hf): Delete. (mve_vstrhq_scatter_shifted_offset_fv8hf_insn): Delete. (mve_vstrhq_scatter_shifted_offset_p_fv8hf): Delete. (mve_vstrhq_scatter_shifted_offset_p_fv8hf_insn): Delete. (mve_vstrwq_scatter_shifted_offset_fv4sf): Delete. (mve_vstrwq_scatter_shifted_offset_fv4sf_insn): Delete. (mve_vstrwq_scatter_shifted_offset_p_fv4sf): Delete. (mve_vstrwq_scatter_shifted_offset_p_fv4sf_insn): Delete. (mve_vstrwq_scatter_shifted_offset_p_<supf>v4si): Delete. (mve_vstrwq_scatter_shifted_offset_p_<supf>v4si_insn): Delete. (mve_vstrwq_scatter_shifted_offset_<supf>v4si): Delete. (mve_vstrwq_scatter_shifted_offset_<supf>v4si_insn): Delete. (@mve_vstrq_scatter_shifted_offset_<mode>): New. (@mve_vstrq_scatter_shifted_offset_p_<mode>): New. (mve_vstrq_truncate_scatter_shifted_offset_v4si): New. (mve_vstrq_truncate_scatter_shifted_offset_p_v4si): New. * config/arm/unspecs.md (VSTRDQSSO_S, VSTRDQSSO_U, VSTRWQSSO_S) (VSTRWQSSO_U, VSTRHQSSO_F, VSTRWQSSO_F, VSTRHQSSO_S, VSTRHQSSO_U): Delete. (VSTRSSOQ, VSTRSSOQ_P, VSTRSSOQ_TRUNC, VSTRSSOQ_TRUNC_P): New.
2024-12-13arm: [MVE intrinsics] rework vstr?q_scatter_offsetChristophe Lyon9-1017/+143
This patch implements vstr?q_scatter_offset using the new MVE builtins framework. It uses a similar approach to a previous patch which grouped truncating and non-truncating stores in two sets of patterns, rather than having groups of patterns depending on the destination size. We need to add the 'integer_64' types of suffixes in order to support vstrdq_scatter_offset. The patch introduces the MVE_VLD_ST_scatter iterator, similar to MVE_VLD_ST but which also includes V2DI (again, for vstrdq_scatter_offset). The new MVE_scatter_offset mode attribute is used to map the destination type to the offset type (both are usually equal, except when the destination is floating-point). We end up with four sets of patterns: - vector scatter stores with offset (non-truncating) - predicated vector scatter stores with offset (non-truncating) - truncating vector scatter stores with offset - predicated truncating vector scatter stores with offset gcc/ChangeLog: * config/arm/arm-mve-builtins-base.cc (class vstrq_scatter_impl): New. (vstrbq_scatter, vstrhq_scatter, vstrwq_scatter, vstrdq_scatter): New. * config/arm/arm-mve-builtins-base.def (vstrbq_scatter) (vstrhq_scatter, vstrwq_scatter, vstrdq_scatter): New. * config/arm/arm-mve-builtins-base.h (vstrbq_scatter) (vstrhq_scatter, vstrwq_scatter, vstrdq_scatter): New. * config/arm/arm-mve-builtins.cc (integer_64): New. * config/arm/arm_mve.h (vstrbq_scatter_offset): Delete. (vstrbq_scatter_offset_p): Delete. (vstrhq_scatter_offset): Delete. (vstrhq_scatter_offset_p): Delete. (vstrdq_scatter_offset_p): Delete. (vstrdq_scatter_offset): Delete. (vstrwq_scatter_offset_p): Delete. (vstrwq_scatter_offset): Delete. (vstrbq_scatter_offset_s8): Delete. (vstrbq_scatter_offset_u8): Delete. (vstrbq_scatter_offset_u16): Delete. (vstrbq_scatter_offset_s16): Delete. (vstrbq_scatter_offset_u32): Delete. (vstrbq_scatter_offset_s32): Delete. (vstrbq_scatter_offset_p_s8): Delete. (vstrbq_scatter_offset_p_s32): Delete. (vstrbq_scatter_offset_p_s16): Delete. (vstrbq_scatter_offset_p_u8): Delete. (vstrbq_scatter_offset_p_u32): Delete. (vstrbq_scatter_offset_p_u16): Delete. (vstrhq_scatter_offset_s32): Delete. (vstrhq_scatter_offset_s16): Delete. (vstrhq_scatter_offset_u32): Delete. (vstrhq_scatter_offset_u16): Delete. (vstrhq_scatter_offset_p_s32): Delete. (vstrhq_scatter_offset_p_s16): Delete. (vstrhq_scatter_offset_p_u32): Delete. (vstrhq_scatter_offset_p_u16): Delete. (vstrdq_scatter_offset_p_s64): Delete. (vstrdq_scatter_offset_p_u64): Delete. (vstrdq_scatter_offset_s64): Delete. (vstrdq_scatter_offset_u64): Delete. (vstrhq_scatter_offset_f16): Delete. (vstrhq_scatter_offset_p_f16): Delete. (vstrwq_scatter_offset_f32): Delete. (vstrwq_scatter_offset_p_f32): Delete. (vstrwq_scatter_offset_p_s32): Delete. (vstrwq_scatter_offset_p_u32): Delete. (vstrwq_scatter_offset_s32): Delete. (vstrwq_scatter_offset_u32): Delete. (__arm_vstrbq_scatter_offset_s8): Delete. (__arm_vstrbq_scatter_offset_s32): Delete. (__arm_vstrbq_scatter_offset_s16): Delete. (__arm_vstrbq_scatter_offset_u8): Delete. (__arm_vstrbq_scatter_offset_u32): Delete. (__arm_vstrbq_scatter_offset_u16): Delete. (__arm_vstrbq_scatter_offset_p_s8): Delete. (__arm_vstrbq_scatter_offset_p_s32): Delete. (__arm_vstrbq_scatter_offset_p_s16): Delete. (__arm_vstrbq_scatter_offset_p_u8): Delete. (__arm_vstrbq_scatter_offset_p_u32): Delete. (__arm_vstrbq_scatter_offset_p_u16): Delete. (__arm_vstrhq_scatter_offset_s32): Delete. (__arm_vstrhq_scatter_offset_s16): Delete. (__arm_vstrhq_scatter_offset_u32): Delete. 
(__arm_vstrhq_scatter_offset_u16): Delete. (__arm_vstrhq_scatter_offset_p_s32): Delete. (__arm_vstrhq_scatter_offset_p_s16): Delete. (__arm_vstrhq_scatter_offset_p_u32): Delete. (__arm_vstrhq_scatter_offset_p_u16): Delete. (__arm_vstrdq_scatter_offset_p_s64): Delete. (__arm_vstrdq_scatter_offset_p_u64): Delete. (__arm_vstrdq_scatter_offset_s64): Delete. (__arm_vstrdq_scatter_offset_u64): Delete. (__arm_vstrwq_scatter_offset_p_s32): Delete. (__arm_vstrwq_scatter_offset_p_u32): Delete. (__arm_vstrwq_scatter_offset_s32): Delete. (__arm_vstrwq_scatter_offset_u32): Delete. (__arm_vstrhq_scatter_offset_f16): Delete. (__arm_vstrhq_scatter_offset_p_f16): Delete. (__arm_vstrwq_scatter_offset_f32): Delete. (__arm_vstrwq_scatter_offset_p_f32): Delete. (__arm_vstrbq_scatter_offset): Delete. (__arm_vstrbq_scatter_offset_p): Delete. (__arm_vstrhq_scatter_offset): Delete. (__arm_vstrhq_scatter_offset_p): Delete. (__arm_vstrdq_scatter_offset_p): Delete. (__arm_vstrdq_scatter_offset): Delete. (__arm_vstrwq_scatter_offset_p): Delete. (__arm_vstrwq_scatter_offset): Delete. * config/arm/arm_mve_builtins.def (vstrbq_scatter_offset_s) (vstrbq_scatter_offset_u, vstrbq_scatter_offset_p_s) (vstrbq_scatter_offset_p_u, vstrhq_scatter_offset_p_u) (vstrhq_scatter_offset_u, vstrhq_scatter_offset_p_s) (vstrhq_scatter_offset_s, vstrdq_scatter_offset_s) (vstrhq_scatter_offset_f, vstrwq_scatter_offset_f) (vstrwq_scatter_offset_s, vstrdq_scatter_offset_p_s) (vstrhq_scatter_offset_p_f, vstrwq_scatter_offset_p_f) (vstrwq_scatter_offset_p_s, vstrdq_scatter_offset_u) (vstrwq_scatter_offset_u, vstrdq_scatter_offset_p_u) (vstrwq_scatter_offset_p_u) Delete. * config/arm/iterators.md (MVE_VLD_ST_scatter): New. (MVE_scatter_offset): New. (MVE_elem_ch): Add entry for V2DI. (supf): Remove VSTRBQSO_S, VSTRBQSO_U, VSTRHQSO_S, VSTRHQSO_U, VSTRDQSO_S, VSTRDQSO_U, VSTRWQSO_U, VSTRWQSO_S. (VSTRBSOQ, VSTRHSOQ, VSTRDSOQ, VSTRWSOQ): Delete. * config/arm/mve.md (mve_vstrbq_scatter_offset_<supf><mode>): Delete. (mve_vstrbq_scatter_offset_<supf><mode>_insn): Delete. (mve_vstrbq_scatter_offset_p_<supf><mode>): Delete. (mve_vstrbq_scatter_offset_p_<supf><mode>_insn): Delete. (mve_vstrhq_scatter_offset_p_<supf><mode>): Delete. (mve_vstrhq_scatter_offset_p_<supf><mode>_insn): Delete. (mve_vstrhq_scatter_offset_<supf><mode>): Delete. (mve_vstrhq_scatter_offset_<supf><mode>_insn): Delete. (mve_vstrdq_scatter_offset_p_<supf>v2di): Delete. (mve_vstrdq_scatter_offset_p_<supf>v2di_insn): Delete. (mve_vstrdq_scatter_offset_<supf>v2di): Delete. (mve_vstrdq_scatter_offset_<supf>v2di_insn): Delete. (mve_vstrhq_scatter_offset_fv8hf): Delete. (mve_vstrhq_scatter_offset_fv8hf_insn): Delete. (mve_vstrhq_scatter_offset_p_fv8hf): Delete. (mve_vstrhq_scatter_offset_p_fv8hf_insn): Delete. (mve_vstrwq_scatter_offset_fv4sf): Delete. (mve_vstrwq_scatter_offset_fv4sf_insn): Delete. (mve_vstrwq_scatter_offset_p_fv4sf): Delete. (mve_vstrwq_scatter_offset_p_fv4sf_insn): Delete. (mve_vstrwq_scatter_offset_p_<supf>v4si): Delete. (mve_vstrwq_scatter_offset_p_<supf>v4si_insn): Delete. (mve_vstrwq_scatter_offset_<supf>v4si): Delete. (mve_vstrwq_scatter_offset_<supf>v4si_insn): Delete. (@mve_vstrq_scatter_offset_<mode>): New. (@mve_vstrq_scatter_offset_p_<mode>): New. (@mve_vstrq_truncate_scatter_offset_<mode>): New. (@mve_vstrq_truncate_scatter_offset_p_<mode>): New. * config/arm/unspecs.md (VSTRBQSO_S, VSTRBQSO_U, VSTRHQSO_S) (VSTRDQSO_S, VSTRDQSO_U, VSTRWQSO_S, VSTRWQSO_U, VSTRHQSO_F) (VSTRWQSO_F, VSTRHQSO_U): Delete. (VSTRQSO, VSTRQSO_P, VSTRQSO_TRUNC, VSTRQSO_TRUNC_P): New.
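A hedged usage sketch of one of the reworked scatter stores: write each byte of v to base[offsets[i]].

#include <arm_mve.h>

void scatter_bytes (uint8_t *base, uint8x16_t offsets, uint8x16_t v)
{
  vstrbq_scatter_offset_u8 (base, offsets, v);
}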
2024-12-13arm: [MVE intrinsics] add store_scatter_offset shapeChristophe Lyon2-0/+65
This patch adds the store_scatter_offset shape and uses a new helper class (store_scatter), which will also be used by later patches. gcc/ChangeLog: * config/arm/arm-mve-builtins-shapes.cc (struct store_scatter): New. (struct store_scatter_offset_def): New. * config/arm/arm-mve-builtins-shapes.h (store_scatter_offset): New.
2024-12-13arm: [MVE intrinsics] add mode_after_pred helper in function_shapeChristophe Lyon3-1/+21
This new helper returns true if the mode suffix goes after the predicate suffix. This is true in most cases, so the base implementations in nonoverloaded_base and overloaded_base return true. For instance: vaddq_m_n_s32. This will be useful in later patches to implement vstr?q_scatter_offset_p (_p appears after _offset). gcc/ChangeLog: * config/arm/arm-mve-builtins-shapes.cc (struct nonoverloaded_base): Implement mode_after_pred. (struct overloaded_base): Likewise. * config/arm/arm-mve-builtins.cc (function_builder::get_name): Call mode_after_pred as needed. * config/arm/arm-mve-builtins.h (function_shape): Add mode_after_pred.
2024-12-13AArch64: Set L1 data cache size according to size on CPUsTamar Christina9-22/+9
This sets the L1 data cache size for some cores based on their size in their Technical Reference Manuals. Today the port minimum is 256 bytes, as explained in commit g:9a99559a478111f7fbeec29bd78344df7651c707; however, like the Neoverse V2, most cores actually define the L1 cache size as 64 bytes. The generic Armv9-A model was already changed in g:f000cb8cbc58b23a91c84d47d69481904981a1d9 and this change follows suit for a few other cores based on their TRMs. This results in less memory pressure when running on large core count machines. gcc/ChangeLog: * config/aarch64/tuning_models/cortexx925.h: Set L1 cache size to 64b. * config/aarch64/tuning_models/neoverse512tvb.h: Likewise. * config/aarch64/tuning_models/neoversen1.h: Likewise. * config/aarch64/tuning_models/neoversen2.h: Likewise. * config/aarch64/tuning_models/neoversen3.h: Likewise. * config/aarch64/tuning_models/neoversev1.h: Likewise. * config/aarch64/tuning_models/neoversev2.h: Likewise. (neoversev2_prefetch_tune): Removed. * config/aarch64/tuning_models/neoversev3.h: Likewise. * config/aarch64/tuning_models/neoversev3ae.h: Likewise.
2024-12-13AArch64: Add CMP+CSEL and CMP+CSET for cores that support itTamar Christina10-9/+17
GCC 15 added two new fusions, CMP+CSEL and CMP+CSET. This patch enables them for cores that support them, based on their Software Optimization Guides, and generically on Armv9-A. Even if a core does not support them, there's no negative performance impact. gcc/ChangeLog: * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_NEOVERSE_BASE): New. * config/aarch64/tuning_models/neoverse512tvb.h: Use it. * config/aarch64/tuning_models/neoversen2.h: Use it. * config/aarch64/tuning_models/neoversen3.h: Use it. * config/aarch64/tuning_models/neoversev1.h: Use it. * config/aarch64/tuning_models/neoversev2.h: Use it. * config/aarch64/tuning_models/neoversev3.h: Use it. * config/aarch64/tuning_models/neoversev3ae.h: Use it. * config/aarch64/tuning_models/cortexx925.h: Add fusions. * config/aarch64/tuning_models/generic_armv9_a.h: Add fusions.
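As a rough illustration (not taken from the patch), source like the following typically lowers to the fusible compare/select and compare/set pairs on AArch64; exact code generation depends on compiler version and options:

  /* Usually a CMP followed by a CSEL.  */
  int select_nonzero (int a, int b, int c)
  {
    return a != 0 ? b : c;
  }

  /* Usually a CMP followed by a CSET.  */
  int equal (int a, int b)
  {
    return a == b;
  }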
2024-12-13i386: Add vec_fm{addsub,subadd}v2sf4 patterns [PR116979]Jakub Jelinek1-0/+48
As mentioned in the PR, the addition of the vec_addsubv2sf3 expander caused the testcase to be vectorized and no longer use fma. The following patch adds new expanders so that it can be vectorized again with the alternating add/sub fma instructions. There is a bug on the SLP cost computation side which causes it not to count some scalar multiplication costs, but I think the patch is desirable anyway before that is fixed; the testcase for now just uses -fvect-cost-model=unlimited. 2024-12-13 Jakub Jelinek <jakub@redhat.com> PR target/116979 * config/i386/mmx.md (vec_fmaddsubv2sf4, vec_fmsubaddv2sf4): New define_expand patterns. * gcc.target/i386/pr116979.c: New test.
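A hypothetical sketch (not the actual pr116979.c testcase) of the scalar shape the new expanders are meant to catch: a pairwise multiply with alternating subtract/add that the vectorizer can now map onto the fused multiply-add/sub instructions for V2SFmode:

  /* Lane 0 subtracts, lane 1 adds.  */
  void fmaddsub2 (float *restrict r, const float *a, const float *b,
                  const float *c)
  {
    r[0] = a[0] * b[0] - c[0];
    r[1] = a[1] * b[1] + c[1];
  }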
2024-12-13RISC-V: Improve slide1up pattern.Robin Dapp3-15/+56
This patch adds a second variant to implement the extract/slide1up pattern. In order to do a permutation like <3, 4, 5, 6> from vectors <0, 1, 2, 3> and <4, 5, 6, 7>, we currently extract <3> from the first vector and re-insert it into the second vector. Unless register-file crossing latency is essentially zero, it should be preferable to first slide the second vector up by one, then slide down the first vector by (nunits - 1). gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_register_move_cost): Export. * config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns): Rename... (shuffle_off_by_one_patterns): ... to this and add slideup/slidedown variant. (expand_vec_perm_const_1): Call renamed function. * config/riscv/riscv.cc (riscv_secondary_memory_needed): Remove static. (riscv_register_move_cost): Add VR<->GR/FR handling. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr112599-2.c: Adjust test expectation.
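The permutation in question, written as a sketch with GCC's generic vector extensions (illustrative only, not the testcase from the ChangeLog):

  typedef int v4si __attribute__ ((vector_size (16)));

  /* Select lanes <3, 4, 5, 6> of the concatenation of A and B, i.e. the
     last lane of A followed by the first three lanes of B.  */
  v4si off_by_one (v4si a, v4si b)
  {
    return __builtin_shufflevector (a, b, 3, 4, 5, 6);
  }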
2024-12-13RISC-V: Add even/odd vec_perm_const pattern.Robin Dapp1-0/+66
This adds handling for even/odd patterns. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_even_odd_patterns): New function. (expand_vec_perm_const_1): Use new function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-evenodd-run.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-evenodd.c: New test.
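For illustration, the even/odd selections expressed with the generic vector extensions (a sketch; four lanes chosen arbitrarily):

  typedef int v4si __attribute__ ((vector_size (16)));

  /* Even lanes of the concatenation of A and B...  */
  v4si even_lanes (v4si a, v4si b)
  {
    return __builtin_shufflevector (a, b, 0, 2, 4, 6);
  }

  /* ...and the odd lanes.  */
  v4si odd_lanes (v4si a, v4si b)
  {
    return __builtin_shufflevector (a, b, 1, 3, 5, 7);
  }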
2024-12-13RISC-V: Add interleave pattern.Robin Dapp1-0/+80
This patch adds efficient handling of interleaving patterns like [0 4 1 5] to vec_perm_const. It is implemented by a slideup and a gather. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_interleave_patterns): New function. (expand_vec_perm_const_1): Use new function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-interleave-run.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-interleave.c: New test.
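The [0 4 1 5] pattern from the description, as a sketch with the generic vector extensions:

  typedef int v4si __attribute__ ((vector_size (16)));

  /* Interleave the low halves of A and B: {a0, b0, a1, b1}.  */
  v4si interleave_lo (v4si a, v4si b)
  {
    return __builtin_shufflevector (a, b, 0, 4, 1, 5);
  }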
2024-12-13RISC-V: Add slide to perm_const strategies.Robin Dapp1-0/+99
This patch adds a shuffle_slide_patterns to expand_vec_perm_const. It recognizes permutations like {0, 1, 4, 5} or {2, 3, 6, 7} which can be constructed by a slideup or slidedown of one of the vectors into the other one. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_slide_patterns): New. (expand_vec_perm_const_1): Call new function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-slide-run.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-slide.c: New test.
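The recognized index patterns, shown as a sketch with the generic vector extensions:

  typedef int v4si __attribute__ ((vector_size (16)));

  /* {0, 1, 4, 5}: the low half of A followed by the low half of B.  */
  v4si slide_lo (v4si a, v4si b)
  {
    return __builtin_shufflevector (a, b, 0, 1, 4, 5);
  }

  /* {2, 3, 6, 7}: the high half of A followed by the high half of B.  */
  v4si slide_hi (v4si a, v4si b)
  {
    return __builtin_shufflevector (a, b, 2, 3, 6, 7);
  }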
2024-12-13RISC-V: Emit vector shift pattern for const_vector [PR117353].Robin Dapp1-3/+5
In PR117353 and PR117878 we expand a const vector during reload. For this we use an unpredicated left shift. Normally an insn like this is split, but as we introduce it late and cannot create pseudos anymore, it remains unpredicated and is not recognized by the vsetvl pass (where we expect all insns to be in predicated RVV format). This patch directly emits a predicated shift instead. We could distinguish between !lra_in_progress and lra_in_progress and emit an unpredicated shift in the former case, but we're not very likely to optimize it anyway, so it doesn't seem worth it. PR target/117353 PR target/117878 gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Use predicated instead of simple shift. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr117353.c: New test.
2024-12-13RISC-V: Make vector strided load alias all other memoriesPan Li1-0/+1
The vector strided load doesn't include the (mem:BLK (scratch)) to alias all other memories. This makes the alias analysis consider only the base address of the strided load, so stores that should stay before the strided load can be reordered after it. For example:

  #define STEP 10

  char d[225];
  int e[STEP];

  int main() {
    // store 0, 10, 20, 30, 40, 50, 60, 70, 80, 90
    for (long h = 0; h < STEP; ++h)
      d[h * STEP] = 9;

    // load 30, 40, 50, 60, 70, 80, 90
    // store 3, 4, 5, 6, 7, 8, 9
    for (int h = 3; h < STEP; h += 1)
      e[h] = d[h * STEP];

    if (e[5] != 9) {
      __builtin_abort ();
    }
    return 0;
  }

The asm dump will be:

  main:
    lui       a5,%hi(.LANCHOR0)
    addi      a5,a5,%lo(.LANCHOR0)
    li        a4,9
    sb        a4,30(a5)
    addi      a3,a5,30
    vsetivli  zero,7,e32,m1,ta,ma
    li        a2,10
    vlse8.v   v2,0(a3),a2   // depends on 30(a5), 40(a5), ... 90(a5) but
                            // only 30(a5) has been promoted before vlse.
                            // It is a store-after-load mistake.
    addi      a3,a5,252
    sb        a4,0(a5)
    sb        a4,10(a5)
    sb        a4,20(a5)
    sb        a4,40(a5)
    vzext.vf4 v1,v2
    sb        a4,50(a5)
    sb        a4,60(a5)
    vse32.v   v1,0(a3)
    li        a0,0
    sb        a4,70(a5)
    sb        a4,80(a5)
    sb        a4,90(a5)
    lw        a5,260(a5)
    beq       a5,a4,.L4
    li        a0,123

After this patch:

  main:
    vsetivli  zero,4,e32,m1,ta,ma
    vmv.v.i   v1,9
    lui       a5,%hi(.LANCHOR0)
    addi      a5,a5,%lo(.LANCHOR0)
    addi      a4,a5,244
    vse32.v   v1,0(a4)
    li        a4,9
    sb        a4,0(a5)
    sb        a4,10(a5)
    sb        a4,20(a5)
    sb        a4,30(a5)
    sb        a4,40(a5)
    sb        a4,50(a5)
    sb        a4,60(a5)
    sb        a4,70(a5)
    sb        a4,80(a5)
    sb        a4,90(a5)
    vsetivli  zero,3,e32,m1,ta,ma
    addi      a4,a5,70
    li        a3,10
    vlse8.v   v2,0(a4),a3
    addi      a5,a5,260
    li        a0,0
    vzext.vf4 v1,v2
    vse32.v   v1,0(a5)
    ret

The rv64gcv full regression test passes with this patch. PR target/117990 gcc/ChangeLog: * config/riscv/vector.md: Add the (mem:BLK (scratch)) to the vector strided load. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr117990-run-1.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-12-12hppa: Remove extra clobber from divsi3, udivsi3, modsi3 and umodsi3 patternsJohn David Anglin2-70/+16
The $$divI, $$divU, $$remI and $$remU millicode calls clobber r1, r26, r25 and the return link register (r31 or r2). We don't need to clobber any other registers. 2024-12-12 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: * config/pa/pa.cc (pa_emit_hpdiv_const): Clobber r1, r26, r25 and return register. * config/pa/pa.md (divsi3): Revise clobbers and operands. Remove second clobber from div:SI insns. (udivsi3, modsi3, umodsi3): Likewise.
2024-12-12AVR: target/118000 - Fix copymem from address-spaces.Georg-Johann Lay1-2/+15
* rampz_rtx et al. were missing MEM_VOLATILE_P. This is needed because avr_emit_cpymemhi is setting RAMPZ explicitly with its own insn. * avr_out_cpymem was missing a final RAMPZ = 0 on EBI devices. This only affects the __flash1 ... __flash5 spaces since the other ASes use different routines. gcc/ PR target/118000 * config/avr/avr.cc (avr_init_expanders) <sreg_rtx> <rampd_rtx, rampx_rtx, rampy_rtx, rampz_rtx>: Set MEM_VOLATILE_P. (avr_out_cpymem) [ELPM && EBI]: Restore RAMPZ to 0 after.
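A hypothetical sketch of the kind of code affected (assuming an EBI device and that a block copy out of __flash1 is expanded through the cpymem pattern; all names are illustrative):

  typedef struct { char bytes[32]; } blob_t;

  const __flash1 blob_t table = { { 1, 2, 3 } };
  blob_t ram_copy;

  void copy_table (void)
  {
    ram_copy = table;   /* block copy from __flash1 into RAM; RAMPZ must
                           end up restored to 0 afterwards */
  }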
2024-12-12AVR: Assert minimal required bit width of section_common::flags.Georg-Johann Lay1-0/+29
gcc/ * config/avr/avr.cc (avr_ctz): New constexpr function. (section_common::flags): Assert minimal bit width.
2024-12-12AVR: target/118001 - Add __flashx as 24-bit named address space.Georg-Johann Lay6-114/+361
This patch adds __flashx as a new named address space that allocates objects in .progmemx.data. The handling is mostly the same or similar to that of the 24-bit space __memx, except that the asm routines are simpler and more efficient. Loads are emitted inline when ELPMX or LPMX is available. The address space uses 24-bit addresses even on devices with a program memory size of 64 KiB or less. PR target/118001 gcc/ * doc/extend.texi (AVR Named Address Spaces): Document __flashx. * config/avr/avr.h (ADDR_SPACE_FLASHX): New enum value. * config/avr/avr-protos.h (avr_out_fload, avr_mem_flashx_p) (avr_fload_libgcc_p, avr_load_libgcc_mem_p) (avr_load_libgcc_insn_p): New. * config/avr/avr.cc (avr_addrspace): Add ADDR_SPACE_FLASHX. (avr_decl_flashx_p, avr_mem_flashx_p, avr_fload_libgcc_p) (avr_load_libgcc_mem_p, avr_load_libgcc_insn_p, avr_out_fload): New functions. (avr_adjust_insn_length) [ADJUST_LEN_FLOAD]: Handle case. (avr_progmem_p) [avr_decl_flashx_p]: Return 2. (avr_addr_space_legitimate_address_p) [ADDR_SPACE_FLASHX]: Has the same behavior as ADDR_SPACE_MEMX. (avr_addr_space_convert): Use pointer sizes rather than ASes. (avr_addr_space_contains): New function. (avr_convert_to_type): Use it. (avr_emit_cpymemhi): Handle ADDR_SPACE_FLASHX. * config/avr/avr.md (adjust_len) <fload>: New attr value. (gen_load<mode>_libgcc): Renamed from load<mode>_libgcc. (xload8<mode>_A): Iterate over MOVMODE rather than over ALL1. (fxmov<mode>_A): New from xloadv<mode>_A. (xmov<mode>_8): New from xload<mode>_A. (fmov<mode>): New insns. (fxload<mode>_A): New from xload<mode>_A. (fxload_<mode>_libgcc): New from xload_<mode>_libgcc. (*fxload_<mode>_libgcc): New from *xload_<mode>_libgcc. (mov<mode>) [avr_mem_flashx_p]: Handle ADDR_SPACE_FLASHX. (cpymemx_<mode>): Make sure the address space is not lost when splitting. (*cpymemx_<mode>) [ADDR_SPACE_FLASHX]: Use __movmemf_<mode> for asm. (*ashlqi.1.zextpsi_split): New combine pattern. * config/avr/predicates.md (nox_general_operand): Don't match when avr_mem_flashx_p is true. * config/avr/avr-passes.cc (AVR_LdSt_Props): ADDR_SPACE_FLASHX has no post_inc. gcc/testsuite/ * gcc.target/avr/torture/addr-space-1.h [AVR_HAVE_ELPM]: Use a function to bump .progmemx.data to a high address. * gcc.target/avr/torture/addr-space-2.h: Same. * gcc.target/avr/torture/addr-space-1-fx.c: New test. * gcc.target/avr/torture/addr-space-2-fx.c: New test. libgcc/ * config/avr/t-avr (LIB1ASMFUNCS): Add _fload_1, _fload_2, _fload_3, _fload_4, _movmemf. * config/avr/lib1funcs.S (.branch_plus): New .macro. (__xload_1, __xload_2, __xload_3, __xload_4): When the address is located in flash, then forward to... (__fload_1, __fload_2, __fload_3, __fload_4): ...these new functions, respectively. (__movmemx_hi): When the address is located in flash, forward to... (__movmemf_hi): ...this new function.
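A short usage sketch of the new space (assuming a compiler with this patch; otherwise the usage conventions follow those documented for __memx):

  /* The object is placed in .progmemx.data and read with 24-bit
     accesses, even if it ends up above 64 KiB.  */
  const __flashx char big_table[] =
    "kept in program memory, not copied to RAM";

  char read_byte (unsigned int i)
  {
    return big_table[i];
  }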