path: root/gcc/config/aarch64/aarch64-sve.md
Age | Commit message | Author | Files/Lines changed
2024-10-08 | aarch64: Expand CTZ to RBIT + CLZ for SVE [PR109498] | Soumya AR | 1 file changed, -0/+17

Currently, we vectorize CTZ for SVE by using the following operation:

	.CTZ (X) = (PREC - 1) - .CLZ (X & -X)

Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in PR109498.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:

	PR target/109498
	* config/aarch64/aarch64-sve.md (ctz<mode>2): Added pattern to expand
	CTZ to RBIT + CLZ for SVE.

gcc/testsuite/ChangeLog:

	PR target/109498
	* gcc.target/aarch64/sve/ctz.c: New test.
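As an illustration (not taken from the patch), a scalar loop like the one below is the kind of input the vectorizer turns into .CTZ calls; the new expansion relies on the identity ctz(x) = clz(bitreverse(x)):

/* Hypothetical example: with -O3 and SVE enabled, this loop is expected to
   vectorize, and each .CTZ is now expanded as RBIT + CLZ rather than as
   (PREC - 1) - .CLZ (X & -X).  */
void
ctz_loop (unsigned int *restrict dst, const unsigned int *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = __builtin_ctz (src[i]);
}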
2024-10-07 | aarch64: Fix general permutes of svbfloat16_ts | Richard Sandiford | 1 file changed, -4/+4

Testing gcc.target/aarch64/sve/permute_2.c without the associated GCC patch
triggered an unrecognisable insn ICE for the svbfloat16_t tests.  This was
because the implementation of general two-vector permutes requires two TBLs
and an ORR, with the ORR being represented as an unspec for floating-point
modes.  The associated pattern did not cover VNx8BF.

gcc/
	* config/aarch64/iterators.md (SVE_I): Move further up file.
	(SVE_F): New mode iterator.
	(SVE_ALL): Redefine in terms of SVE_I and SVE_F.
	* config/aarch64/aarch64-sve.md (*<LOGICALF:optab><mode>3): Extend to
	all SVE_F.

gcc/testsuite/
	* gcc.target/aarch64/sve/permute_5.c: New test.
2024-10-01 | aarch64: Introduce new unspecs for smax/smin | Saurabh Jha | 1 file changed, -33/+0

Introduce two new unspecs, UNSPEC_COND_SMAX and UNSPEC_COND_SMIN, corresponding
to the rtl operators smax and smin.  UNSPEC_COND_SMAX is used to generate the
fmaxnm instruction and UNSPEC_COND_SMIN is used to generate the fminnm
instruction.

With these new unspecs, we can generate SVE2 max/min instructions using the
existing generic unpredicated and predicated instruction patterns that use the
optab attribute.  Thus, we have removed the specialised instruction patterns
for max/min instructions that were using the SVE_COND_FP_MAXMIN_PUBLIC
iterator.

No new test cases, as the existing test cases should be enough to test this
refactoring.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (<fmaxmin><mode>3): Remove this
	instruction pattern.
	(cond_<fmaxmin><mode>): Remove this instruction pattern.
	* config/aarch64/iterators.md: New unspecs and changes to iterators
	and attrs to use the new unspecs.
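As a sketch of the kind of source these patterns serve (example is mine, not from the patch), a floating-point maximum reduction of this shape is expected to reach the new UNSPEC_COND_SMAX path and be emitted as fmaxnm when vectorized for SVE:

/* Illustrative only: assuming -Ofast (so the compiler may treat the
   conditional as a plain smax) and SVE enabled, the vectorized maximum is
   expected to come out as an fmaxnm instruction.  */
void
max_loop (float *restrict dst, const float *restrict a,
          const float *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = a[i] > b[i] ? a[i] : b[i];
}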
2024-09-30 | aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns | Victor Do Nascimento | 1 file changed, -3/+4

Given recent changes to the dot_prod standard pattern name, this patch fixes
the aarch64 back-end by implementing the following changes:

1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
2. Rewrite initialization and function expansion mechanism for simd builtins.
3. Fix all direct calls to back-end `dot_prod' patterns in SVE builtins.

Finally, given that it is now possible for the compiler to differentiate
between the two- and four-way dot product, we add a test to ensure that
autovectorization picks up on dot-product patterns where the result is twice
the width of the operands.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md
	(<sur>dot_prod<vsi2qi><vczle><vczbe>): Renamed to...
	(<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
	(usdot_prod<vsi2qi><vczle><vczbe>): Renamed to...
	(usdot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
	(<su>sadv16qi): Adjust call to gen_udot_prod to take second mode.
	(popcount<mode2>): Fix use of `udot_prod_optab'.
	* config/aarch64/aarch64-sve.md (<sur>dot_prod<vsi2qi>): Renamed to...
	(<sur>dot_prod<mode><vsi2qi>): ...this.
	(@<sur>dot_prod<vsi2qi>): Renamed to...
	(@<sur>dot_prod<mode><vsi2qi>): ...this.
	(<su>sad<vsi2qi>): Adjust call to gen_udot_prod to take second mode.
	* config/aarch64/aarch64-sve2.md (@aarch64_sve_<sur>dotvnx4sivnx8hi):
	Renamed to...
	(<sur>dot_prodvnx4sivnx8hi): ...this.
	* config/aarch64/aarch64-simd-builtins.def: Modify macro
	expansion-based initialization and expansion of (u|s|us)dot_prod
	builtins.
	* config/aarch64/aarch64-builtins.cc (CODE_FOR_aarch64_sdot_prodv8qi):
	Define as alias to new CODE_FOR_sdot_prodv2siv8qi.
	(CODE_FOR_aarch64_udot_prodv8qi): Define as alias to new
	CODE_FOR_udot_prodv2siv8qi.
	(CODE_FOR_aarch64_usdot_prodv8qi): Define as alias to new
	CODE_FOR_usdot_prodv2siv8qi.
	(CODE_FOR_aarch64_sdot_prodv16qi): Define as alias to new
	CODE_FOR_sdot_prodv4siv16qi.
	(CODE_FOR_aarch64_udot_prodv16qi): Define as alias to new
	CODE_FOR_udot_prodv4siv16qi.
	(CODE_FOR_aarch64_usdot_prodv16qi): Define as alias to new
	CODE_FOR_usdot_prodv4siv16qi.
	* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl::expand):
	s/direct/convert/ in `convert_optab_handler_for_sign' function call.
	(svusdot_impl::expand): Add second mode argument in call to
	`code_for_dot_prod'.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_expander::convert_optab_handler_for_sign): New class method.
	* config/aarch64/aarch64-sve-builtins.h (class function_expander):
	Add prototype for new `convert_optab_handler_for_sign' method.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
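For context (examples are mine, not from the patch), these are the two reduction shapes the renamed patterns now distinguish: the classic four-way dot product, and the two-way form where the result is only twice the width of the operands; the latter is exercised by the new SME test on targets that provide it.

/* Hypothetical examples, assuming -O3 and a target with dot-product support:
   dot4 maps to the four-way [us]dot (16 bytes feeding 4 words), while dot2 is
   the two-way form (8 halfwords feeding 4 words) that can now be selected
   thanks to the second mode in the pattern name.  */
int
dot4 (const signed char *a, const signed char *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}

int
dot2 (const short *a, const short *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}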
2024-09-16 | aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for SVE instructions. | Soumya AR | 1 file changed, -3/+15

On Neoverse V2, SVE ADD instructions have a throughput of 4, while shift
instructions like SHL have a throughput of 2.  We can lean on that to emit code
like:

	add	z31.b, z31.b, z31.b

instead of:

	lsl	z31.b, z31.b, #1

The implementation of this change for SVE vectors is similar to a prior patch
<https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659958.html> that adds
the above functionality for Neon vectors.

Here, the machine description pattern is split up to separately accommodate
left and right shifts, so we can specifically emit an add for all left shifts
by 1.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (*post_ra_v<optab><mode>3): Split
	pattern to accommodate left and right shifts separately.
	(*post_ra_v_ashl<mode>3): Matches left shifts with additional
	constraint to check for shifts by 1.
	(*post_ra_v_<optab><mode>3): Matches right shifts.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Updated instances of
	lsl-1 with corresponding add.
	* gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
	* gcc.target/aarch64/sve/adr_1.c: Likewise.
	* gcc.target/aarch64/sve/adr_6.c: Likewise.
	* gcc.target/aarch64/sve/cond_mla_7.c: Likewise.
	* gcc.target/aarch64/sve/cond_mla_8.c: Likewise.
	* gcc.target/aarch64/sve/shift_2.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rshl_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve/sve_shl_add.c: New test.
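As a quick illustration (example is mine, not the new test), a vectorized shift left by one like the following is the case the split pattern now rewrites to an add:

/* Illustrative only: with SVE enabled, the vectorized "<< 1" is now expected
   to be emitted as "add z, z, z" rather than "lsl z, z, #1" on cores where
   the add has higher throughput.  */
void
shl1 (unsigned char *restrict dst, const unsigned char *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i] << 1;
}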
2024-08-01 | aarch64: Improve Advanced SIMD popcount expansion by using SVE [PR113860] | Pengxuan Zheng | 1 file changed, -6/+7

This patch improves the Advanced SIMD popcount expansion by using SVE if
available.

For example, GCC currently generates the following code sequence for V2DI:

	cnt	v31.16b, v31.16b
	uaddlp	v31.8h, v31.16b
	uaddlp	v31.4s, v31.8h
	uaddlp	v31.2d, v31.4s

However, by using SVE, we can generate the following sequence instead:

	ptrue	p7.b, all
	cnt	z31.d, p7/m, z31.d

Similar improvements can be made for V4HI, V8HI, V2SI and V4SI too.

The scalar popcount expansion can also be improved similarly by using SVE and
those changes will be included in a separate patch.

	PR target/113860

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (popcount<mode>2): Add TARGET_SVE
	support.
	* config/aarch64/aarch64-sve.md (@aarch64_pred_<optab><mode>): Use new
	iterator SVE_VDQ_I.
	* config/aarch64/iterators.md (SVE_VDQ_I): New mode iterator.
	(VPRED): Add V8QI, V16QI, V4HI, V8HI and V2SI.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/popcnt-sve.c: New test.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
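For reference (example is mine, not the new test), source of this shape is where the V2DI sequences above come from once the loop is vectorized with Advanced SIMD:

/* Illustrative only: with -O3 and both Advanced SIMD and SVE available, the
   per-element popcount can now use the single predicated SVE cnt shown above
   instead of cnt followed by a chain of uaddlp widenings.  */
void
popcount_loop (unsigned long long *restrict dst,
               const unsigned long long *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = __builtin_popcountll (src[i]);
}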
2024-07-26 | aarch64: sve: Rename aarch64_bic to standard pattern, andn | Andrew Pinski | 1 file changed, -2/+2

There is now an optab for bic, andn, as of r15-1890-gf379596e0ba99d.  This
moves aarch64_bic for SVE over to use it instead.  Note that unlike the simd
bic patterns, the operands were already in the order expected by the optab, so
no swapping was needed.

Built and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins-base.cc (svbic_impl::expand):
	Update to use andn optab instead of using code_for_aarch64_bic.
	* config/aarch64/aarch64-sve.md (@aarch64_bic<mode>): Rename to ...
	(andn<mode>3): This.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
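As a sketch (not from the commit), both the svbic intrinsic and plain "a & ~b" code end up going through the renamed andn<mode>3 pattern; the exact routing is an assumption on my part:

/* Illustrative only; requires SVE.  */
#include <arm_sve.h>

svuint32_t
bic_intrinsic (svuint32_t a, svuint32_t b)
{
  return svbic_u32_x (svptrue_b32 (), a, b);   /* a & ~b */
}

void
bic_loop (unsigned *restrict dst, const unsigned *restrict a,
          const unsigned *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = a[i] & ~b[i];
}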
2024-06-12 | aarch64: Use bitreverse rtl code instead of unspec [PR115176] | Andrew Pinski | 1 file changed, -1/+1

The bitreverse rtl code was added with r14-1586-g6160572f8d243c, so let's use
it instead of an unspec.  This is mostly a small cleanup, but it also includes
one small fix: the rtx cost code did not handle vector modes correctly for the
UNSPEC, and now it does for BITREVERSE.

This is part of the first step in adding the __builtin_bitreverse builtins,
but it is independent of them.

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

	PR target/115176
	* config/aarch64/aarch64-simd.md (aarch64_rbit<mode><vczle><vczbe>):
	Use bitreverse instead of unspec.
	* config/aarch64/aarch64-sve-builtins-base.cc (svrbit): Convert over
	to using rtx_code_function instead of unspec_based_function.
	* config/aarch64/aarch64-sve.md: Update comment where RBIT is
	included.
	* config/aarch64/aarch64.cc (aarch64_rtx_costs): Handle BITREVERSE
	like BSWAP.  Remove UNSPEC_RBIT support.
	* config/aarch64/aarch64.md (unspec): Remove UNSPEC_RBIT.
	(aarch64_rbit<mode>): Use bitreverse instead of unspec.
	* config/aarch64/iterators.md (SVE_INT_UNARY): Add bitreverse.
	(optab): Likewise.
	(sve_int_op): Likewise.
	(SVE_INT_UNARY): Remove UNSPEC_RBIT.
	(optab): Likewise.
	(sve_int_op): Likewise.
	(min_elem_bits): Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
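For reference (example is mine, not from the commit), the svrbit ACLE intrinsic is one existing user of the RBIT patterns that now carry the bitreverse rtx code instead of UNSPEC_RBIT:

/* Illustrative only; requires SVE.  */
#include <arm_sve.h>

svuint32_t
rbit_example (svbool_t pg, svuint32_t x)
{
  return svrbit_u32_x (pg, x);   /* reverse the bits within each element */
}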
2024-06-06 | AArch64: correct constraint on Upl early clobber alternatives | Tamar Christina | 1 file changed, -32/+32

I made an oversight in the previous patch, where I added a ?Upa alternative to
the Upl cases.  This causes it to create the tie with the larger register file
rather than with the constrained one.

This fixes the affected patterns.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (@aarch64_pred_cmp<cmp_op><mode>,
	*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
	@aarch64_pred_cmp<cmp_op><mode>_wide,
	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest): Fix Upl tie alternative.
	* config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>):
	Fix Upl tie alternative.
2024-06-05 | AArch64: add new alternative with early clobber to patterns | Tamar Christina | 1 file changed, -58/+120

This patch adds new alternatives to the patterns which are affected.  The new
alternatives with the conditional early clobbers are added before the normal
ones in order for LRA to prefer them in the event that we have enough free
registers to accommodate them.

In case register pressure is too high, the normal alternatives will be
preferred before a reload is considered, as we would rather have the tie than
a spill.

Tests are in the next patch.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (and<mode>3,
	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
	*<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>,
	*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
	@aarch64_pred_cmp<cmp_op><mode>_wide,
	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, @aarch64_brk<brk_op>,
	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
	@aarch64_brk<brk_op>, *aarch64_brk<brk_op>_cc,
	*aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest,
	*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add
	new early clobber alternative.
	* config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>):
	Likewise.
2024-06-05 | AArch64: convert several predicate patterns to new compact syntax | Tamar Christina | 1 file changed, -108/+154

This converts the single alternative patterns to the new compact syntax such
that when I add the new alternatives it's clearer what's being changed.

Note that this will spew out a bunch of warnings from geninsn as it'll warn
that @ is useless for a single alternative pattern.  These are not fatal so
won't break the build and are only temporary.

No change in functionality is expected with this patch.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (and<mode>3,
	@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
	*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
	*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
	aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
	*<logical_nn><mode>3_ptest, *cmp<cmp_op><mode>_ptest,
	@aarch64_pred_cmp<cmp_op><mode>_wide,
	*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
	*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, *aarch64_brk<brk_op>_cc,
	*aarch64_brk<brk_op>_ptest, @aarch64_brk<brk_op>,
	*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, aarch64_rdffr_z,
	*aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, *aarch64_rdffr_z_cc,
	*aarch64_rdffr_cc): Convert to compact syntax.
	* config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>):
	Likewise.
2024-04-05 | aarch64: Fix bogus cnot optimisation [PR114603] | Richard Sandiford | 1 file changed, -11/+11

aarch64-sve.md had a pattern that combined:

	cmpeq	pb.T, pa/z, zc.T, #0
	mov	zd.T, pb/z, #1

into:

	cnot	zd.T, pa/m, zc.T

But this is only valid if pa.T is a ptrue.  In other cases, the original would
set inactive elements of zd.T to 0, whereas the combined form would copy
elements from zc.T.

gcc/
	PR target/114603
	* config/aarch64/aarch64-sve.md (@aarch64_pred_cnot<mode>): Replace
	with...
	(@aarch64_ptrue_cnot<mode>): ...this, requiring operand 1 to be a
	ptrue.
	(*cnot<mode>): Require operand 1 to be a ptrue.
	* config/aarch64/aarch64-sve-builtins-base.cc (svcnot_impl::expand):
	Use aarch64_ptrue_cnot<mode> for _x operations that are predicated
	with a ptrue.  Represent other _x operations as fully-defined _m
	operations.

gcc/testsuite/
	PR target/114603
	* gcc.target/aarch64/sve/acle/general/cnot_1.c: New test.
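To sketch the distinction being enforced (intrinsic names are from the ACLE, the example itself is hypothetical): an _x call predicated with a ptrue can keep using the cnot form, while an _x call with an arbitrary predicate must now behave like a fully-defined _m operation so that inactive lanes are not taken from the wrong source.

/* Illustrative only; requires SVE.  */
#include <arm_sve.h>

svint32_t
cnot_ptrue (svint32_t x)
{
  return svcnot_s32_x (svptrue_b32 (), x);   /* may use aarch64_ptrue_cnot */
}

svint32_t
cnot_pred (svbool_t pg, svint32_t x)
{
  return svcnot_s32_x (pg, x);               /* expanded as a _m-style op */
}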
2024-01-24 | AArch64: Fix expansion of Advanced SIMD div and mul using SVE [PR109636] | Tamar Christina | 1 file changed, -29/+51

As suggested in the ticket this replaces the expansion by converting the
Advanced SIMD types to SVE types by simply printing out an SVE register for
these instructions.

This fixes the subreg issues since there are no subregs involved anymore.

gcc/ChangeLog:

	PR target/109636
	* config/aarch64/aarch64-simd.md (<su_optab>div<mode>3, mulv2di3):
	Remove.
	* config/aarch64/iterators.md (VQDIV): Remove.
	(SVE_FULL_SDI_SIMD, SVE_FULL_HSDI_SIMD_DI, SVE_I_SIMD_DI): New.
	(VPRED, sve_lane_con): Add V4SI and V2DI.
	* config/aarch64/aarch64-sve.md (<optab><mode>3,
	@aarch64_pred_<optab><mode>): Support Advanced SIMD types.
	(mul<mode>3): New, split from <optab><mode>3.
	(@aarch64_pred_<optab><mode>, *post_ra_<optab><mode>3): New.
	* config/aarch64/aarch64-sve2.md (@aarch64_mul_lane_<mode>,
	*aarch64_mul_unpredicated_<mode>): Change SVE_FULL_HSDI to
	SVE_FULL_HSDI_SIMD_DI.

gcc/testsuite/ChangeLog:

	PR target/109636
	* gcc.target/aarch64/sve/pr109636_1.c: New test.
	* gcc.target/aarch64/sve/pr109636_2.c: New test.
	* gcc.target/aarch64/sve2/pr109636_1.c: New test.
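For illustration (this uses the GNU vector extension rather than the testcases added by the patch), a 128-bit integer division is the kind of operation that has no Advanced SIMD instruction and can now be printed directly as an SVE sdiv on the z-register view of the same operands:

/* Illustrative sketch, assuming both Advanced SIMD and SVE are enabled.  */
typedef int v4si __attribute__ ((vector_size (16)));

v4si
div4 (v4si a, v4si b)
{
  return a / b;   /* expected to become an SVE sdiv on z registers */
}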
2024-01-03 | Update copyright years. | Jakub Jelinek | 1 file changed, -1/+1
2023-12-15 | aarch64: Handle autoinc addresses in ld1rq splitter [PR112906] | Alex Coplan | 1 file changed, -4/+1

This patch uses the new force_reload_address routine added by the previous
patch to fix PR112906.

gcc/ChangeLog:

	PR target/112906
	* config/aarch64/aarch64-sve.md (@aarch64_vec_duplicate_vq<mode>_le):
	Use force_reload_address to reload addresses that aren't suitable for
	ld1rq in the pre-RA splitter.

gcc/testsuite/ChangeLog:

	PR target/112906
	* gcc.target/aarch64/sve/acle/general/pr112906.c: New test.
2023-12-13 | aarch64: SVE/NEON Bridging intrinsics | Richard Ball | 1 file changed, -0/+33
ACLE has added intrinsics to bridge between SVE and Neon. The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and SVE vectors. This patch adds support to GCC for the following 3 intrinsics: svset_neonq, svget_neonq and svdup_neonq gcc/ChangeLog: * config.gcc: Adds new header to config. * config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers): Moved to header file. (ENTRY): Likewise. (enum aarch64_simd_type): Likewise. (struct aarch64_simd_type_info): Remove static. (GTY): Likewise. * config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64): Defines pragma for arm_neon_sve_bridge.h. * config/aarch64/aarch64-protos.h: Add handle_arm_neon_sve_bridge_h * config/aarch64/aarch64-sve-builtins-base.h: New intrinsics. * config/aarch64/aarch64-sve-builtins-base.cc (class svget_neonq_impl): New intrinsic implementation. (class svset_neonq_impl): Likewise. (class svdup_neonq_impl): Likewise. (NEON_SVE_BRIDGE_FUNCTION): New intrinsics. * config/aarch64/aarch64-sve-builtins-functions.h (NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE functions. * config/aarch64/aarch64-sve-builtins-shapes.h: New shapes. * config/aarch64/aarch64-sve-builtins-shapes.cc (parse_element_type): Add NEON element types. (parse_type): Likewise. (struct get_neonq_def): Defines function shape for get_neonq. (struct set_neonq_def): Defines function shape for set_neonq. (struct dup_neonq_def): Defines function shape for dup_neonq. * config/aarch64/aarch64-sve-builtins.cc (DEF_SVE_TYPE_SUFFIX): Changed to be called through SVE_NEON macro. (DEF_SVE_NEON_TYPE_SUFFIX): Defines macro for NEON_SVE_BRIDGE type suffixes. (DEF_NEON_SVE_FUNCTION): Defines macro for NEON_SVE_BRIDGE functions. (function_resolver::infer_neon128_vector_type): Infers type suffix for overloaded functions. (handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h. * config/aarch64/aarch64-sve-builtins.def (DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes. (bf16): Replace entry with neon-sve entry. (f16): Likewise. (f32): Likewise. (f64): Likewise. (s8): Likewise. (s16): Likewise. (s32): Likewise. (s64): Likewise. (u8): Likewise. (u16): Likewise. (u32): Likewise. (u64): Likewise. * config/aarch64/aarch64-sve-builtins.h (GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h. (ENTRY): Add aarch64_simd_type definiton. (enum aarch64_simd_type): Add neon information to type_suffix_info. (struct type_suffix_info): New function. * config/aarch64/aarch64-sve.md (@aarch64_sve_get_neonq_<mode>): New intrinsic insn for big endian. (@aarch64_sve_set_neonq_<mode>): Likewise. * config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ. * config/aarch64/aarch64-builtins.h: New file. * config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file. * config/aarch64/arm_neon_sve_bridge.h: New file. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include arm_neon_sve_bridge header file * gcc.dg/torture/neon-sve-bridge.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test. 
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test. * gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_u16.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_u32.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_u64.c: New test. * gcc.target/aarch64/sve/acle/asm/get_neonq_u8.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_f16.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_f32.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_f64.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_s16.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_s32.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_s64.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_s8.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_u16.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_u32.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_u64.c: New test. * gcc.target/aarch64/sve/acle/asm/set_neonq_u8.c: New test. * gcc.target/aarch64/sve/acle/general-c/dup_neonq_1.c: New test. * gcc.target/aarch64/sve/acle/general-c/get_neonq_1.c: New test. * gcc.target/aarch64/sve/acle/general-c/set_neonq_1.c: New test.
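As a minimal sketch of the three bridge intrinsics this commit adds (the intrinsic names come from the commit; the wrapper functions and the exact set of required headers are my assumptions):

/* Illustrative only; requires both Advanced SIMD and SVE.  */
#include <arm_neon.h>
#include <arm_sve.h>
#include <arm_neon_sve_bridge.h>

svfloat32_t
set_low (svfloat32_t sv, float32x4_t v)
{
  return svset_neonq_f32 (sv, v);   /* write v into the low 128 bits of sv */
}

float32x4_t
get_low (svfloat32_t sv)
{
  return svget_neonq_f32 (sv);      /* read the low 128 bits as a NEON vector */
}

svfloat32_t
dup_quad (float32x4_t v)
{
  return svdup_neonq_f32 (v);       /* repeat v across the whole SVE vector */
}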
2023-12-07 | aarch64: Add an early RA for strided registers | Richard Sandiford | 1 file changed, -44/+0
This pass adds a simple register allocator for FP & SIMD registers. Its main purpose is to make use of SME2's strided LD1, ST1 and LUTI2/4 instructions, which require a very specific grouping structure, and so would be difficult to exploit with general allocation. The allocator is very simple. It gives up on anything that would require spilling, or that it might not handle well for other reasons. The allocator needs to track liveness at the level of individual FPRs. Doing that fixes a lot of the PRs relating to redundant moves caused by structure loads and stores. That particular problem is going to be fixed more generally for GCC 15 by Lehua's RA patches. However, the early-RA pass runs before scheduling, so it has a chance to bag a spill-free allocation of vector code before the scheduler moves things around. It could therefore still be useful for non-SME code (e.g. for hand-scheduled ACLE code) even after Lehua's patches are in. The pass is controlled by a tristate switch: - -mearly-ra=all: run on all functions - -mearly-ra=strided: run on functions that have access to strided registers - -mearly-ra=none: don't run on any function The patch makes -mearly-ra=all the default at -O2 and above for now. We can revisit this for GCC 15 once Lehua's patches are in; -mearly-ra=strided might then be more appropriate. As said previously, the pass is very naive. There's much more that we could do, such as handling invariants better. The main focus is on not committing to a bad allocation, rather than on handling as much as possible. gcc/ PR rtl-optimization/106694 PR rtl-optimization/109078 PR rtl-optimization/109391 * config.gcc: Add aarch64-early-ra.o for AArch64 targets. * config/aarch64/t-aarch64 (aarch64-early-ra.o): New rule. * config/aarch64/aarch64-opts.h (aarch64_early_ra_scope): New enum. * config/aarch64/aarch64.opt (mearly_ra): New option. * doc/invoke.texi: Document it. * common/config/aarch64/aarch64-common.cc (aarch_option_optimization_table): Use -mearly-ra=strided by default for -O2 and above. * config/aarch64/aarch64-passes.def (pass_aarch64_early_ra): New pass. * config/aarch64/aarch64-protos.h (aarch64_strided_registers_p) (make_pass_aarch64_early_ra): Declare. * config/aarch64/aarch64-sme.md (@aarch64_sme_lut<LUTI_BITS><mode>): Add a stride_type attribute. (@aarch64_sme_lut<LUTI_BITS><mode>_strided2): New pattern. (@aarch64_sme_lut<LUTI_BITS><mode>_strided4): Likewise. * config/aarch64/aarch64-sve-builtins-base.cc (svld1_impl::expand) (svldnt1_impl::expand, svst1_impl::expand, svstn1_impl::expand): Handle new way of defining multi-register loads and stores. * config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>) (@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>) (@aarch64_stnt1<SVE_FULLx24:mode>): Delete. * config/aarch64/aarch64-sve2.md (@aarch64_<LD1_COUNT:optab><mode>) (@aarch64_<LD1_COUNT:optab><mode>_strided2): New patterns. (@aarch64_<LD1_COUNT:optab><mode>_strided4): Likewise. (@aarch64_<ST1_COUNT:optab><mode>): Likewise. (@aarch64_<ST1_COUNT:optab><mode>_strided2): Likewise. (@aarch64_<ST1_COUNT:optab><mode>_strided4): Likewise. * config/aarch64/aarch64.cc (aarch64_strided_registers_p): New function. * config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): Delete. (UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise. (UNSPEC_STNT1_SVE_COUNT): Likewise. (stride_type): New attribute. * config/aarch64/constraints.md (Uwd, Uwt): New constraints. 
* config/aarch64/iterators.md (UNSPEC_LD1_COUNT, UNSPEC_LDNT1_COUNT) (UNSPEC_ST1_COUNT, UNSPEC_STNT1_COUNT): New unspecs. (optab): Handle them. (LD1_COUNT, ST1_COUNT): New iterators. * config/aarch64/aarch64-early-ra.cc: New file. gcc/testsuite/ PR rtl-optimization/106694 PR rtl-optimization/109078 PR rtl-optimization/109391 * gcc.target/aarch64/ldp_stp_16.c (cons4_4_float): Tighten expected output test. * gcc.target/aarch64/sve/shift_1.c: Allow reversed shifts for .s as well as .d. * gcc.target/aarch64/sme/strided_1.c: New test. * gcc.target/aarch64/pr109078.c: Likewise. * gcc.target/aarch64/pr109391.c: Likewise. * gcc.target/aarch64/sve/pr106694.c: Likewise.
2023-12-05 | aarch64: Add support for SME2 intrinsics | Richard Sandiford | 1 file changed, -19/+79
This patch adds support for the SME2 <arm_sme.h> intrinsics. The convention I've used is to put stuff in aarch64-sve-builtins-sme.* if it relates to ZA, ZT0, the streaming vector length, or other such SME state. Things that operate purely on predicates and vectors go in aarch64-sve-builtins-sve2.* instead. Some of these will later be picked up for SVE2p1. We previously used Uph internally as a constraint for 16-bit immediates to atomic instructions. However, we need a user-facing constraint for the upper predicate registers (already available as PR_HI_REGS), and Uph makes a natural pair with the existing Upl. gcc/ * config/aarch64/aarch64.h (TARGET_STREAMING_SME2): New macro. (P_ALIASES): Likewise. (REGISTER_NAMES): Add pn aliases of the predicate registers. (W8_W11_REGNUM_P): New macro. (W8_W11_REGS): New register class. (REG_CLASS_NAMES, REG_CLASS_CONTENTS): Update accordingly. * config/aarch64/aarch64.cc (aarch64_print_operand): Add support for %K, which prints a predicate as a counter. Handle tuples of predicates. (aarch64_regno_regclass): Handle W8_W11_REGS. (aarch64_class_max_nregs): Likewise. * config/aarch64/constraints.md (Uci, Uw2, Uw4): New constraints. (x, y): Move further up file. (Uph): Redefine as the high predicate registers, renaming the old constraint to... (Uih): ...this. * config/aarch64/predicates.md (const_0_to_7_operand): New predicate. (const_0_to_4_step_4_operand, const_0_to_6_step_2_operand): Likewise. (const_0_to_12_step_4_operand, const_0_to_14_step_2_operand): Likewise. (aarch64_simd_shift_imm_qi): Use const_0_to_7_operand. * config/aarch64/iterators.md (VNx16SI_ONLY, VNx8SI_ONLY) (VNx8DI_ONLY, SVE_FULL_BHSIx2, SVE_FULL_HF, SVE_FULL_SIx2_SDIx4) (SVE_FULL_BHS, SVE_FULLx24, SVE_DIx24, SVE_BHSx24, SVE_Ix24) (SVE_Fx24, SVE_SFx24, SME_ZA_BIx24, SME_ZA_BHIx124, SME_ZA_BHIx24) (SME_ZA_HFx124, SME_ZA_HFx24, SME_ZA_HIx124, SME_ZA_HIx24) (SME_ZA_SDIx24, SME_ZA_SDFx24): New mode iterators. (UNSPEC_REVD, UNSPEC_CNTP_C, UNSPEC_PEXT, UNSPEC_PEXTx2): New unspecs. (UNSPEC_PSEL, UNSPEC_PTRUE_C, UNSPEC_SQRSHR, UNSPEC_SQRSHRN) (UNSPEC_SQRSHRU, UNSPEC_SQRSHRUN, UNSPEC_UQRSHR, UNSPEC_UQRSHRN) (UNSPEC_UZP, UNSPEC_UZPQ, UNSPEC_ZIP, UNSPEC_ZIPQ, UNSPEC_BFMLSLB) (UNSPEC_BFMLSLT, UNSPEC_FCVTN, UNSPEC_FDOT, UNSPEC_SQCVT): Likewise. (UNSPEC_SQCVTN, UNSPEC_SQCVTU, UNSPEC_SQCVTUN, UNSPEC_UQCVT): Likewise. (UNSPEC_SME_ADD, UNSPEC_SME_ADD_WRITE, UNSPEC_SME_BMOPA): Likewise. (UNSPEC_SME_BMOPS, UNSPEC_SME_FADD, UNSPEC_SME_FDOT, UNSPEC_SME_FVDOT) (UNSPEC_SME_FMLA, UNSPEC_SME_FMLS, UNSPEC_SME_FSUB, UNSPEC_SME_READ) (UNSPEC_SME_SDOT, UNSPEC_SME_SVDOT, UNSPEC_SME_SMLA, UNSPEC_SME_SMLS) (UNSPEC_SME_SUB, UNSPEC_SME_SUB_WRITE, UNSPEC_SME_SUDOT): Likewise. (UNSPEC_SME_SUVDOT, UNSPEC_SME_UDOT, UNSPEC_SME_UVDOT): Likewise. (UNSPEC_SME_UMLA, UNSPEC_SME_UMLS, UNSPEC_SME_USDOT): Likewise. (UNSPEC_SME_USVDOT, UNSPEC_SME_WRITE): Likewise. (Vetype, VNARROW, V2XWIDE, Ventype, V_INT_EQUIV, v_int_equiv) (VSINGLE, vsingle, b): Add tuple modes. (v2xwide, za32_offset_range, za64_offset_range, za32_long) (za32_last_offset, vg_modifier, z_suffix, aligned_operand) (aligned_fpr): New mode attributes. (SVE_INT_BINARY_MULTI, SVE_INT_BINARY_SINGLE, SVE_INT_BINARY_MULTI) (SVE_FP_BINARY_MULTI): New int iterators. (SVE_BFLOAT_TERNARY_LONG): Add UNSPEC_BFMLSLB and UNSPEC_BFMLSLT. (SVE_BFLOAT_TERNARY_LONG_LANE): Likewise. 
(SVE_WHILE_ORDER, SVE2_INT_SHIFT_IMM_NARROWxN, SVE_QCVTxN) (SVE2_SFx24_UNARY, SVE2_x24_PERMUTE, SVE2_x24_PERMUTEQ) (UNSPEC_REVD_ONLY, SME2_INT_MOP, SME2_BMOP, SME_BINARY_SLICE_SDI) (SME_BINARY_SLICE_SDF, SME_BINARY_WRITE_SLICE_SDI, SME_INT_DOTPROD) (SME_INT_DOTPROD_LANE, SME_FP_DOTPROD, SME_FP_DOTPROD_LANE) (SME_INT_TERNARY_SLICE, SME_FP_TERNARY_SLICE, BHSD_BITS) (LUTI_BITS): New int iterators. (optab, sve_int_op): Handle the new unspecs. (sme_int_op, has_16bit_form): New int attributes. (bits_etype): Handle 64. * config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): New unspec. (UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise. (UNSPEC_STNT1_SVE_COUNT): Likewise. * config/aarch64/atomics.md (cas_short_expected_imm): Use Uhi rather than Uph for HImode immediates. * config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>) (@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>) (@aarch64_stnt1<SVE_FULLx24:mode>): New patterns. (@aarch64_<sur>dot_prod_lane<vsi2qi>): Extend to... (@aarch64_<sur>dot_prod_lane<SVE_FULL_SDI:mode><SVE_FULL_BHI:mode>) (@aarch64_<sur>dot_prod_lane<VNx4SI_ONLY:mode><VNx16QI_ONLY:mode>): ...these new patterns. (SVE_WHILE_B, SVE_WHILE_B_X2, SVE_WHILE_C): New constants. Add SVE_WHILE_B to existing while patterns. * config/aarch64/aarch64-sve2.md (@aarch64_sve_ptrue_c<BHSD_BITS>) (@aarch64_sve_pext<BHSD_BITS>, @aarch64_sve_pext<BHSD_BITS>x2) (@aarch64_sve_psel<BHSD_BITS>, *aarch64_sve_psel<BHSD_BITS>_plus) (@aarch64_sve_cntp_c<BHSD_BITS>, <frint_pattern><mode>2) (<optab><mode>3, *<optab><mode>3, @aarch64_sve_single_<optab><mode>) (@aarch64_sve_<sve_int_op><mode>): New patterns. (@aarch64_sve_single_<sve_int_op><mode>, @aarch64_sve_<su>clamp<mode>) (*aarch64_sve_<su>clamp<mode>_x, @aarch64_sve_<su>clamp_single<mode>) (@aarch64_sve_fclamp<mode>, *aarch64_sve_fclamp<mode>_x) (@aarch64_sve_fclamp_single<mode>, <optab><mode><v2xwide>2) (@aarch64_sve_<sur>dotvnx4sivnx8hi): New patterns. (@aarch64_sve_<maxmin_uns_op><mode>): Likewise. (*aarch64_sve_<maxmin_uns_op><mode>): Likewise. (@aarch64_sve_single_<maxmin_uns_op><mode>): Likewise. (aarch64_sve_fdotvnx4sfvnx8hf): Likewise. (aarch64_fdot_prod_lanevnx4sfvnx8hf): Likewise. (@aarch64_sve_<optab><VNx16QI_ONLY:mode><VNx16SI_ONLY:mode>): Likewise. (@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8SI_ONLY:mode>): Likewise. (@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8DI_ONLY:mode>): Likewise. (truncvnx8sf<mode>2, @aarch64_sve_cvtn<mode>): Likewise. (<optab><v_int_equiv><mode>2, <optab><mode><v_int_equiv>2): Likewise. (@aarch64_sve_sel<mode>): Likewise. (@aarch64_sve_while<while_optab_cmp>_b<BHSD_BITS>_x2): Likewise. (@aarch64_sve_while<while_optab_cmp>_c<BHSD_BITS>): Likewise. (@aarch64_pred_<optab><mode>, @cond_<optab><mode>): Likewise. (@aarch64_sve_<optab><mode>): Likewise. * config/aarch64/aarch64-sme.md (@aarch64_sme_<optab><mode><mode>) (*aarch64_sme_<optab><mode><mode>_plus, @aarch64_sme_read<mode>) (*aarch64_sme_read<mode>_plus, @aarch64_sme_write<mode>): New patterns. (*aarch64_sme_write<mode>_plus aarch64_sme_zero_zt0): Likewise. (@aarch64_sme_<optab><mode>, *aarch64_sme_<optab><mode>_plus) (@aarch64_sme_single_<optab><mode>): Likewise. (*aarch64_sme_single_<optab><mode>_plus): Likewise. 
(@aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>) (*aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus) (@aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>) (*aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus) (@aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>) (*aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>_plus) (@aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>) (*aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>) (*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>_plus) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>) (*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus) (@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>) (*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus) (@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>) (*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>) (@aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>) (*aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>_plus) (@aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>) (*aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus) (@aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>) (*aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus) (@aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>) (*aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx8HI_ONLY:mode>) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx4SI_ONLY:mode>) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>) (*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus) (@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>) (*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus) (@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>) (*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus) (@aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>) (*aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus) (@aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>) (*aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus) (@aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>) (*aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>) (@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>) (*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>_plus) (@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>) (*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>) (@aarch64_sme_lut<LUTI_BITS><mode>): Likewise. (UNSPEC_SME_LUTI): New unspec. * config/aarch64/aarch64-sve-builtins.def (single): New mode suffix. (c8, c16, c32, c64): New type suffixes. (vg1x2, vg1x4, vg2, vg2x1, vg2x2, vg2x4, vg4, vg4x1, vg4x2) (vg4x4): New group suffixes. * config/aarch64/aarch64-sve-builtins.h (CP_READ_ZT0) (CP_WRITE_ZT0): New constants. (get_svbool_t): Delete. (function_resolver::report_mismatched_num_vectors): New member function. (function_resolver::resolve_conversion): Likewise. (function_resolver::infer_predicate_type): Likewise. (function_resolver::infer_64bit_scalar_integer_pair): Likewise. (function_resolver::require_matching_predicate_type): Likewise. (function_resolver::require_nonscalar_type): Likewise. 
(function_resolver::finish_opt_single_resolution): Likewise. (function_resolver::require_derived_vector_type): Add an expected_num_vectors parameter. (function_expander::map_to_rtx_codes): Add an extra parameter for unconditional FP unspecs. (function_instance::gp_type_index): New member function. (function_instance::gp_type): Likewise. (function_instance::gp_mode): Handle multi-vector operations. * config/aarch64/aarch64-sve-builtins.cc (TYPES_all_count) (TYPES_all_pred_count, TYPES_c, TYPES_bhs_data, TYPES_bhs_widen) (TYPES_hs_data, TYPES_cvt_h_s_float, TYPES_cvt_s_s, TYPES_qcvt_x2) (TYPES_qcvt_x4, TYPES_qrshr_x2, TYPES_qrshru_x2, TYPES_qrshr_x4) (TYPES_qrshru_x4, TYPES_while_x, TYPES_while_x_c, TYPES_s_narrow_fsu) (TYPES_za_s_b_signed, TYPES_za_s_b_unsigned, TYPES_za_s_b_integer) (TYPES_za_s_h_integer, TYPES_za_s_h_data, TYPES_za_s_unsigned) (TYPES_za_s_float, TYPES_za_s_data, TYPES_za_d_h_integer): New type macros. (groups_x2, groups_x12, groups_x4, groups_x24, groups_x124) (groups_vg1x2, groups_vg1x4, groups_vg1x24, groups_vg2, groups_vg4) (groups_vg24): New group arrays. (function_instance::reads_global_state_p): Handle CP_READ_ZT0. (function_instance::modifies_global_state_p): Handle CP_WRITE_ZT0. (add_shared_state_attribute): Handle zt0 state. (function_builder::add_overloaded_functions): Skip MODE_single for non-tuple groups. (function_resolver::report_mismatched_num_vectors): New function. (function_resolver::resolve_to): Add a fallback error message for the general two-type case. (function_resolver::resolve_conversion): New function. (function_resolver::infer_predicate_type): Likewise. (function_resolver::infer_64bit_scalar_integer_pair): Likewise. (function_resolver::require_matching_predicate_type): Likewise. (function_resolver::require_matching_vector_type): Specifically diagnose mismatched vector counts. (function_resolver::require_derived_vector_type): Add an expected_num_vectors parameter. Extend to handle cases where tuples are expected. (function_resolver::require_nonscalar_type): New function. (function_resolver::check_gp_argument): Use gp_type_index rather than hard-coding VECTOR_TYPE_svbool_t. (function_resolver::finish_opt_single_resolution): New function. (function_checker::require_immediate_either_or): Remove hard-coded constants. (function_expander::direct_optab_handler): New function. (function_expander::use_pred_x_insn): Only add a strictness flag is the insn has an operand for it. (function_expander::map_to_rtx_codes): Take an unconditional FP unspec as an extra parameter. Handle tuples and MODE_single. (function_expander::map_to_unspecs): Handle tuples and MODE_single. * config/aarch64/aarch64-sve-builtins-functions.h (read_zt0) (write_zt0): New typedefs. (full_width_access::memory_vector): Use the function's vectors_per_tuple. (rtx_code_function_base): Add an optional unconditional FP unspec. (rtx_code_function::expand): Update accordingly. (rtx_code_function_rotated::expand): Likewise. (unspec_based_function_exact_insn::expand): Use tuple_mode instead of vector_mode. (unspec_based_uncond_function): New typedef. (cond_or_uncond_unspec_function): New class. (sme_1mode_function::expand): Handle single forms. (sme_2mode_function_t): Likewise, adding a template parameter for them. (sme_2mode_function): Update accordingly. (sme_2mode_lane_function): New typedef. (multireg_permute): New class. (class integer_conversion): Likewise. (while_comparison::expand): Handle svcount_t and svboolx2_t results. 
* config/aarch64/aarch64-sve-builtins-shapes.h (binary_int_opt_single_n, binary_opt_single_n, binary_single) (binary_za_slice_lane, binary_za_slice_int_opt_single) (binary_za_slice_opt_single, binary_za_slice_uint_opt_single) (binaryx, clamp, compare_scalar_count, count_pred_c) (dot_za_slice_int_lane, dot_za_slice_lane, dot_za_slice_uint_lane) (extract_pred, inherent_zt, ldr_zt, read_za, read_za_slice) (select_pred, shift_right_imm_narrowxn, storexn, str_zt) (unary_convertxn, unary_za_slice, unaryxn, write_za) (write_za_slice): Declare. * config/aarch64/aarch64-sve-builtins-shapes.cc (za_group_is_pure_overload): New function. (apply_predication): Use the function's gp_type for the predicate, instead of hard-coding the use of svbool_t. (parse_element_type): Add support for "c" (svcount_t). (parse_type): Add support for "c0" and "c1" (conversion destination and source types). (binary_za_slice_lane_base): New class. (binary_za_slice_opt_single_base): Likewise. (load_contiguous_base::resolve): Pass the group suffix to r.resolve. (luti_lane_zt_base): New class. (binary_int_opt_single_n, binary_opt_single_n, binary_single) (binary_za_slice_lane, binary_za_slice_int_opt_single) (binary_za_slice_opt_single, binary_za_slice_uint_opt_single) (binaryx, clamp): New shapes. (compare_scalar_def::build): Allow the return type to be a tuple. (compare_scalar_def::expand): Pass the group suffix to r.resolve. (compare_scalar_count, count_pred_c, dot_za_slice_int_lane) (dot_za_slice_lane, dot_za_slice_uint_lane, extract_pred, inherent_zt) (ldr_zt, read_za, read_za_slice, select_pred, shift_right_imm_narrowxn) (storexn, str_zt): New shapes. (ternary_qq_lane_def, ternary_qq_opt_n_def): Replace with... (ternary_qq_or_011_lane_def, ternary_qq_opt_n_or_011_def): ...these new classes. Allow a second suffix that specifies the type of the second vector argument, and that is used to derive the third. (unary_def::build): Extend to handle tuple types. (unary_convert_def::build): Use the new c0 and c1 format specifiers. (unary_convertxn, unary_za_slice, unaryxn, write_za): New shapes. (write_za_slice): Likewise. * config/aarch64/aarch64-sve-builtins-base.cc (svbic_impl::expand) (svext_bhw_impl::expand): Update call to map_to_rtx_costs. (svcntp_impl::expand): Handle svcount_t variants. (svcvt_impl::expand): Handle unpredicated conversions separately, dealing with tuples. (svdot_impl::expand): Handle 2-way dot products. (svdotprod_lane_impl::expand): Likewise. (svld1_impl::fold): Punt on tuple loads. (svld1_impl::expand): Handle tuple loads. (svldnt1_impl::expand): Likewise. (svpfalse_impl::fold): Punt on svcount_t forms. (svptrue_impl::fold): Likewise. (svptrue_impl::expand): Handle svcount_t forms. (svrint_impl): New class. (svsel_impl::fold): Punt on tuple forms. (svsel_impl::expand): Handle tuple forms. (svst1_impl::fold): Punt on tuple loads. (svst1_impl::expand): Handle tuple loads. (svstnt1_impl::expand): Likewise. (svwhilelx_impl::fold): Punt on tuple forms. (svdot_lane): Use UNSPEC_FDOT. (svmax, svmaxnm, svmin, svminmm): Add unconditional FP unspecs. (rinta, rinti, rintm, rintn, rintp, rintx, rintz): Use svrint_impl. * config/aarch64/aarch64-sve-builtins-base.def (svcreate2, svget2) (svset2, svundef2): Add _b variants. (svcvt): Use unary_convertxn. (svdot): Use ternary_qq_opt_n_or_011. (svdot_lane): Use ternary_qq_or_011_lane. (svmax, svmaxnm, svmin, svminnm): Use binary_opt_single_n. (svpfalse): Add a form that returns svcount_t results. (svrinta, svrintm, svrintn, svrintp): Use unaryxn. (svsel): Use binaryxn. 
(svst1, svstnt1): Use storexn. * config/aarch64/aarch64-sve-builtins-sme.h (svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za) (svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt) (svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za) (svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za) (svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za) (svvdot_lane_za, svwrite_za, svzero_zt): Declare. * config/aarch64/aarch64-sve-builtins-sme.cc (load_store_za_base): Rename to... (load_store_za_zt0_base): ...this and extend to tuples. (load_za_base, store_za_base): Update accordingly. (expand_ldr_str_zt0): New function. (svldr_zt_impl, svluti_lane_zt_impl, svread_za_impl, svstr_zt_impl) (svsudot_za_impl, svwrite_za_impl, svzero_zt_impl): New classes. (svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za) (svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt) (svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za) (svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za) (svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za) (svvdot_lane_za, svwrite_za, svzero_zt): New functions. * config/aarch64/aarch64-sve-builtins-sme.def: Add SME2 intrinsics. * config/aarch64/aarch64-sve-builtins-sve2.h (svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp) (svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn) (svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip) (svzipq): Declare. * config/aarch64/aarch64-sve-builtins-sve2.cc (svclamp_impl) (svcvtn_impl, svpext_impl, svpsel_impl): New classes. (svqrshl_impl::fold): Update for change to svrshl shape. (svrshl_impl::fold): Punt on tuple forms. (svsqadd_impl::expand): Update call to map_to_rtx_codes. (svunpk_impl): New class. (svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp) (svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn) (svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip) (svzipq): New functions. * config/aarch64/aarch64-sve-builtins-sve2.def: Add SME2 intrinsics. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define or undefine __ARM_FEATURE_SME2. gcc/testsuite/ * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Provide a way for test functions to share ZT0. (ATTR): Update accordingly. (TEST_LOAD_COUNT, TEST_STORE_COUNT, TEST_PN, TEST_COUNT_PN) (TEST_EXTRACT_PN, TEST_SELECT_P, TEST_COMPARE_S_X2, TEST_COMPARE_S_C) (TEST_CREATE_B, TEST_GET_B, TEST_SET_B, TEST_XN, TEST_XN_SINGLE) (TEST_XN_SINGLE_Z15, TEST_XN_SINGLE_AWKWARD, TEST_X2_NARROW) (TEST_X4_NARROW): New macros. * gcc.target/aarch64/sve/acle/asm/create2_1.c: Add _b tests. * gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Remove test for svmopa that becomes valid with SME2. * gcc.target/aarch64/sve/acle/general-c/create_1.c: Adjust for existence of svboolx2_t version of svcreate2. * gcc.target/aarch64/sve/acle/general-c/store_1.c: Adjust error messages to account for svcount_t predication. * gcc.target/aarch64/sve/acle/general-c/store_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_qq_lane_1.c: Adjust error messages to account for new SME2 variants. * gcc.target/aarch64/sve/acle/general-c/ternary_qq_opt_n_2.c: Likewise.
2023-12-05 | aarch64: Add svboolx2_t | Richard Sandiford | 1 file changed, -0/+22

SME2 has some instructions that operate on pairs of predicates.  The SME2 ACLE
defines an svboolx2_t type for the associated intrinsics.

The patch uses a double-width predicate mode, VNx32BI, to represent the
contents, similarly to how data vector tuples work.  At present there doesn't
seem to be any need to define pairs for VNx2BI, VNx4BI and VNx8BI.

We already supported pairs of svbool_ts at the PCS level, as part of a more
general framework.  All that changes on the PCS side is that we now have an
associated mode.

gcc/
	* config/aarch64/aarch64-modes.def (VNx32BI): New mode.
	* config/aarch64/aarch64-protos.h (aarch64_split_double_move): Declare.
	* config/aarch64/aarch64-sve-builtins.cc (register_tuple_type): Handle
	tuples of predicates.
	(handle_arm_sve_h): Define svboolx2_t as a pair of two svbool_ts.
	* config/aarch64/aarch64-sve.md (movvnx32bi): New insn.
	* config/aarch64/aarch64.cc
	(pure_scalable_type_info::piece::get_rtx): Use VNx32BI for pairs of
	predicates.
	(pure_scalable_type_info::add_piece): Don't try to form pairs of
	predicates.
	(VEC_STRUCT): Generalize comment.
	(aarch64_classify_vector_mode): Handle VNx32BI.
	(aarch64_array_mode): Likewise.  Return BLKmode for arrays of
	predicates that have no associated mode, rather than allowing an
	integer mode to be chosen.
	(aarch64_hard_regno_nregs): Handle VNx32BI.
	(aarch64_hard_regno_mode_ok): Likewise.
	(aarch64_split_double_move): New function, split out from...
	(aarch64_split_128bit_move): ...here.
	(aarch64_ptrue_reg): Tighten assert to aarch64_sve_pred_mode_p.
	(aarch64_pfalse_reg): Likewise.
	(aarch64_sve_same_pred_for_ptest_p): Likewise.
	(aarch64_sme_mode_switch_regs::add_reg): Handle VNx32BI.
	(aarch64_expand_mov_immediate): Restrict handling of boolean vector
	constants to single-predicate modes.
	(aarch64_classify_address): Handle VNx32BI, ensuring that both halves
	can be addressed.
	(aarch64_class_max_nregs): Handle VNx32BI.
	(aarch64_member_type_forces_blk): Don't force BLKmode for svboolx2_t.
	(aarch64_simd_valid_immediate): Allow all-zeros and all-ones for
	VNx32BI.
	(aarch64_mov_operand_p): Restrict predicate constant canonicalization
	to single-predicate modes.
	(aarch64_evpc_ext): Generalize exclusion to all predicate modes.
	(aarch64_evpc_rev_local, aarch64_evpc_dup): Likewise.
	* config/aarch64/constraints.md (PR_REGS): New predicate.

gcc/testsuite/
	* gcc.target/aarch64/sve/pcs/struct_3_128.c (test_nonpst3): Adjust
	stack offsets.
	(ret_nonpst3): Remove XFAIL.
	* gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c: New test.
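As a hypothetical usage sketch for the new type (the _b tuple intrinsics themselves arrive with the SME2 intrinsic commit above, and the exact signatures shown here are my assumption): svboolx2_t carries two predicates in one value, backed by the new VNx32BI mode.

/* Illustrative only; assumes svcreate2_b/svget2_b as declared by arm_sve.h
   once the SME2 intrinsics are available.  */
#include <arm_sve.h>

svboolx2_t
make_pair (svbool_t lo, svbool_t hi)
{
  return svcreate2_b (lo, hi);
}

svbool_t
second_half (svboolx2_t pair)
{
  return svget2_b (pair, 1);
}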
2023-12-05 | aarch64: Mark relevant SVE instructions as non-streaming | Richard Sandiford | 1 file changed, -56/+68
Following on from the previous Advanced SIMD patch, this one divides SVE instructions into non-streaming and streaming- compatible groups. gcc/ * config/aarch64/aarch64.h (TARGET_NON_STREAMING): New macro. (TARGET_SVE2_AES, TARGET_SVE2_BITPERM): Use it. (TARGET_SVE2_SHA3, TARGET_SVE2_SM4): Likewise. * config/aarch64/aarch64-sve-builtins-base.def: Separate out the functions that require PSTATE.SM to be 0 and guard them with AARCH64_FL_SM_OFF. * config/aarch64/aarch64-sve-builtins-sve2.def: Likewise. * config/aarch64/aarch64-sve-builtins.cc (check_required_extensions): Enforce AARCH64_FL_SM_OFF requirements. * config/aarch64/aarch64-sve.md (aarch64_wrffr): Require TARGET_NON_STREAMING (aarch64_rdffr, aarch64_rdffr_z, *aarch64_rdffr_z_ptest): Likewise. (*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc) (@aarch64_ld<fn>f1<mode>): Likewise. (@aarch64_ld<fn>f1_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>) (gather_load<mode><v_int_container>): Likewise (mask_gather_load<mode><v_int_container>): Likewise. (mask_gather_load<mode><v_int_container>): Likewise. (*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise. (*mask_gather_load<mode><v_int_container>_sxtw): Likewise. (*mask_gather_load<mode><v_int_container>_uxtw): Likewise. (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>) (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>): Likewise. (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked) (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_sxtw): Likewise. (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_uxtw): Likewise. (@aarch64_ldff1_gather<mode>, @aarch64_ldff1_gather<mode>): Likewise. (*aarch64_ldff1_gather<mode>_sxtw): Likewise. (*aarch64_ldff1_gather<mode>_uxtw): Likewise. (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode> <VNx4_NARROW:mode>): Likewise. (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>): Likewise. (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>_sxtw): Likewise. (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>_uxtw): Likewise. (@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx4SI_ONLY:mode>) (@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>) (*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_sxtw) (*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_uxtw) (scatter_store<mode><v_int_container>): Likewise. (mask_scatter_store<mode><v_int_container>): Likewise. (*mask_scatter_store<mode><v_int_container>_<su>xtw_unpacked) (*mask_scatter_store<mode><v_int_container>_sxtw): Likewise. (*mask_scatter_store<mode><v_int_container>_uxtw): Likewise. (@aarch64_scatter_store_trunc<VNx4_NARROW:mode><VNx4_WIDE:mode>) (@aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>) (*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_sxtw) (*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_uxtw) (@aarch64_sve_ld1ro<mode>, @aarch64_adr<mode>): Likewise. (*aarch64_adr_sxtw, *aarch64_adr_uxtw_unspec): Likewise. (*aarch64_adr_uxtw_and, @aarch64_adr<mode>_shift): Likewise. (*aarch64_adr<mode>_shift, *aarch64_adr_shift_sxtw): Likewise. (*aarch64_adr_shift_uxtw, @aarch64_sve_add_<optab><vsi2qi>): Likewise. (@aarch64_sve_<sve_fp_op><mode>, fold_left_plus_<mode>): Likewise. (mask_fold_left_plus_<mode>, @aarch64_sve_compact<mode>): Likewise. 
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>) (@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode> <SVE_PARTIAL_I:mode>): Likewise. (@aarch64_sve2_histcnt<mode>, @aarch64_sve2_histseg<mode>): Likewise. (@aarch64_pred_<SVE2_MATCH:sve_int_op><mode>): Likewise. (*aarch64_pred_<SVE2_MATCH:sve_int_op><mode>_cc): Likewise. (*aarch64_pred_<SVE2_MATCH:sve_int_op><mode>_ptest): Likewise. * config/aarch64/iterators.md (SVE_FP_UNARY_INT): Make FEXPA depend on TARGET_NON_STREAMING. (SVE_BFLOAT_TERNARY_LONG): Likewise BFMMLA. gcc/testsuite/ * g++.target/aarch64/sve/aarch64-ssve.exp: New harness. * g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Add -DSTREAMING_COMPATIBLE to the list of options. * g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise. * gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise. * gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise. Fix pasto in variable name. * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Mark functions as streaming-compatible if STREAMING_COMPATIBLE is defined. * gcc.target/aarch64/sve/acle/asm/adda_f16.c: Disable for streaming-compatible code. * gcc.target/aarch64/sve/acle/asm/adda_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adda_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adrb.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adrd.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adrh.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adrw.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/expa_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/expa_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/expa_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mmla_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mmla_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mmla_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mmla_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/prfb_gather.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/prfd_gather.c: Likewise. * gcc.target/aarch64/sve/acle/asm/prfh_gather.c: Likewise. * gcc.target/aarch64/sve/acle/asm/prfw_gather.c: Likewise. * gcc.target/aarch64/sve/acle/asm/rdffr_1.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tmad_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tmad_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tmad_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tsmul_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tsmul_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tsmul_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tssel_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tssel_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tssel_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/usmmla_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bdep_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bdep_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bdep_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bdep_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bext_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bext_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bext_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bext_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histseg_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histseg_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c: Likewise. 
* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/match_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/match_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/match_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/match_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rax1_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rax1_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c: Likewise.
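As a rough illustration of what the AARCH64_FL_SM_OFF guards above enforce, consider calling a non-streaming-only intrinsic such as svadda (FADDA) from a streaming-compatible function. The sketch below is not taken from the patch; it assumes the ACLE __arm_streaming_compatible keyword and the usual arm_sve.h spellings, and the diagnostic behaviour is described only in general terms.

  #include <arm_sve.h>

  /* FADDA is a non-streaming instruction, so svadda_f32 requires
     PSTATE.SM to be 0.  Calling it from a streaming-compatible function
     should now be rejected rather than silently emitting the
     instruction.  */
  float32_t
  sc_adda (svbool_t pg, float32_t init, svfloat32_t x) __arm_streaming_compatible
  {
    return svadda_f32 (pg, init, x);  /* expected: compile-time error */
  }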
2023-12-05aarch64: Add tuple forms of svreinterpretRichard Sandiford1-4/+4
SME2 adds a number of intrinsics that operate on tuples of 2 and 4 vectors. The ACLE therefore extends the existing svreinterpret intrinsics to handle tuples as well. gcc/ * config/aarch64/aarch64-sve-builtins-base.cc (svreinterpret_impl::fold): Punt on tuple forms. (svreinterpret_impl::expand): Use tuple_mode instead of vector_mode. * config/aarch64/aarch64-sve-builtins-base.def (svreinterpret): Extend to x1234 groups. * config/aarch64/aarch64-sve-builtins-functions.h (multi_vector_function::vectors_per_tuple): If the function has a group suffix, get the number of vectors from there. * config/aarch64/aarch64-sve-builtins-shapes.h (reinterpret): Declare. * config/aarch64/aarch64-sve-builtins-shapes.cc (reinterpret_def) (reinterpret): New function shape. * config/aarch64/aarch64-sve-builtins.cc (function_groups): Handle DEF_SVE_FUNCTION_GS. * config/aarch64/aarch64-sve-builtins.def (DEF_SVE_FUNCTION_GS): New macro. (DEF_SVE_FUNCTION): Forward to DEF_SVE_FUNCTION_GS by default. * config/aarch64/aarch64-sve-builtins.h (function_instance::tuple_mode): New member function. (function_base::vectors_per_tuple): Take the function instance as argument and get the number from the group suffix. (function_instance::vectors_per_tuple): Update accordingly. * config/aarch64/iterators.md (SVE_FULLx2, SVE_FULLx3, SVE_FULLx4) (SVE_ALL_STRUCT): New mode iterators. (SVE_STRUCT): Redefine in terms of SVE_FULL*. * config/aarch64/aarch64-sve.md (@aarch64_sve_reinterpret<mode>) (*aarch64_sve_reinterpret<mode>): Extend to SVE structure modes. gcc/testsuite/ * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_XN): New macro. * gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c: Add tests for tuple forms. * gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c: Likewise.
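A minimal usage sketch for the tuple forms, assuming the _x2-suffixed ACLE spellings for two-vector tuples (the names come from the ACLE specification rather than from this patch):

  #include <arm_sve.h>

  /* Reinterpret a pair of float vectors as a pair of unsigned 32-bit
     integer vectors; no data movement should be required.  */
  svuint32x2_t
  reinterpret_pair (svfloat32x2_t x)
  {
    return svreinterpret_u32_f32_x2 (x);
  }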
2023-11-09AArch64: Add SVE implementation for cond_copysign.Tamar Christina1-0/+51
This adds an implementation for masked copysign along with an optimized pattern for masked copysign (x, -1). gcc/ChangeLog: PR tree-optimization/109154 * config/aarch64/aarch64-sve.md (cond_copysign<mode>): New. gcc/testsuite/ChangeLog: PR tree-optimization/109154 * gcc.target/aarch64/sve/fneg-abs_5.c: New test.
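A scalar loop of the following shape is the kind of input that can use the new cond_copysign<mode> pattern once it is if-converted and vectorized for SVE; this is a sketch rather than one of the new tests, and the exact code generated depends on flags and tuning.

  #include <math.h>

  void
  masked_copysign (double *restrict r, const double *a,
                   const double *b, int n)
  {
    for (int i = 0; i < n; i++)
      /* Conditional copysign: only lanes with a[i] > 0.0 take the
         copysign result, the rest keep a[i].  */
      r[i] = a[i] > 0.0 ? copysign (a[i], b[i]) : a[i];
  }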
2023-11-09AArch64: Handle copysign (x, -1) expansion efficientlyTamar Christina1-6/+21
copysign (x, -1) is effectively fneg (abs (x)), which on AArch64 can be done most efficiently by doing an OR of the signbit. The middle-end now optimizes fneg (abs (x)) to copysign as the canonical form, and so this optimizes the expansion. If the target has an inclusive-OR that takes an immediate, then the transformed instruction is both shorter and faster. For those that don't, the immediate has to be separately constructed, but this still ends up being faster as the immediate construction is not on the critical path. Note that this is part of another patch series; the additional testcases are mutually dependent on the match.pd patch. As such the tests are added there instead of here. gcc/ChangeLog: PR tree-optimization/109154 * config/aarch64/aarch64.md (copysign<GPF:mode>3): Handle copysign (x, -1). * config/aarch64/aarch64-simd.md (copysign<mode>3): Likewise. * config/aarch64/aarch64-sve.md (copysign<mode>3): Likewise.
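For reference, the source form the expansion targets looks like this; as described above, the sign can be forced negative by OR-ing in the sign bit rather than materialising a -1.0 constant. Sketch only; the exact instruction sequence depends on the mode and ISA flags.

  #include <math.h>

  double
  force_negative (double x)
  {
    /* Equivalent to -fabs (x); the expander can now emit an inclusive-OR
       of the sign bit instead of constructing -1.0 and doing a full
       copysign.  */
    return copysign (x, -1.0);
  }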
2023-10-03aarch64: Convert aarch64 multi choice patterns to new syntaxAndrea Corallo1-1459/+1514
Hi all, this patch converts a number of multi multi choice patterns within the aarch64 backend to the new syntax. The list of the converted patterns is in the Changelog. For completeness here follows the list of multi choice patterns that were rejected for conversion by my parser, they typically have some C as asm output and require some manual intervention: aarch64_simd_vec_set<mode>, aarch64_get_lane<mode>, aarch64_cm<optab>di, aarch64_cm<optab>di, aarch64_cmtstdi, *aarch64_movv8di, *aarch64_be_mov<mode>, *aarch64_be_movci, *aarch64_be_mov<mode>, *aarch64_be_movxi, *aarch64_sve_mov<mode>_le, *aarch64_sve_mov<mode>_be, @aarch64_pred_mov<mode>, @aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx4SI_ONLY:mode>, @aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>, *aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_sxtw, *aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_uxtw, @aarch64_vec_duplicate_vq<mode>_le, *vec_extract<mode><Vel>_0, *vec_extract<mode><Vel>_v128, *cmp<cmp_op><mode>_and, *fcm<cmp_op><mode>_and_combine, @aarch64_sve_ext<mode>, @aarch64_sve2_<su>aba<mode>, *sibcall_insn, *sibcall_value_insn, *xor_one_cmpl<mode>3, *insv_reg<mode>_<SUBDI_BITS>, *aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>, *aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS>, *aarch64_bfxil<mode>, *aarch64_bfxilsi_uxtw, *aarch64_<su_optab>cvtf<fcvt_target><GPF:mode>2_mult, atomic_store<mode>. Bootstraped and reg tested on aarch64-unknown-linux-gnu, also I analysed tmp-mddump.md (from 'make mddump') and could not find effective differences, okay for trunk? Bests Andrea gcc/ChangeLog: * config/aarch64/aarch64.md (@ccmp<CC_ONLY:mode><GPI:mode>) (@ccmp<CC_ONLY:mode><GPI:mode>_rev, *call_insn, *call_value_insn) (*mov<mode>_aarch64, load_pair_sw_<SX:mode><SX2:mode>) (load_pair_dw_<DX:mode><DX2:mode>) (store_pair_sw_<SX:mode><SX2:mode>) (store_pair_dw_<DX:mode><DX2:mode>, *extendsidi2_aarch64) (*zero_extendsidi2_aarch64, *load_pair_zero_extendsidi2_aarch64) (*extend<SHORT:mode><GPI:mode>2_aarch64) (*zero_extend<SHORT:mode><GPI:mode>2_aarch64) (*extendqihi2_aarch64, *zero_extendqihi2_aarch64) (*add<mode>3_aarch64, *addsi3_aarch64_uxtw, *add<mode>3_poly_1) (add<mode>3_compare0, *addsi3_compare0_uxtw) (*add<mode>3_compareC_cconly, add<mode>3_compareC) (*add<mode>3_compareV_cconly_imm, add<mode>3_compareV_imm) (*add<mode>3nr_compare0, subdi3, subv<GPI:mode>_imm) (*cmpv<GPI:mode>_insn, sub<mode>3_compare1_imm, neg<mode>2) (cmp<mode>, fcmp<mode>, fcmpe<mode>, *cmov<mode>_insn) (*cmovsi_insn_uxtw, <optab><mode>3, *<optab>si3_uxtw) (*and<mode>3_compare0, *andsi3_compare0_uxtw, one_cmpl<mode>2) (*<NLOGICAL:optab>_one_cmpl<mode>3, *and<mode>3nr_compare0) (*aarch64_ashl_sisd_or_int_<mode>3) (*aarch64_lshr_sisd_or_int_<mode>3) (*aarch64_ashr_sisd_or_int_<mode>3, *ror<mode>3_insn) (*<optab>si3_insn_uxtw, <optab>_trunc<fcvt_target><GPI:mode>2) (<optab><fcvt_target><GPF:mode>2) (<FCVT_F2FIXED:fcvt_fixed_insn><GPF:mode>3) (<FCVT_FIXED2F:fcvt_fixed_insn><GPI:mode>3) (*aarch64_<optab><mode>3_cssc, copysign<GPF:mode>3_insn): Update to new syntax. 
* config/aarch64/aarch64-sve2.md (@aarch64_scatter_stnt<mode>) (@aarch64_scatter_stnt_<SVE_FULL_SDI:mode><SVE_PARTIAL_I:mode>) (*aarch64_mul_unpredicated_<mode>) (@aarch64_pred_<sve_int_op><mode>, *cond_<sve_int_op><mode>_2) (*cond_<sve_int_op><mode>_3, *cond_<sve_int_op><mode>_any) (*cond_<sve_int_op><mode>_z, @aarch64_pred_<sve_int_op><mode>) (*cond_<sve_int_op><mode>_2, *cond_<sve_int_op><mode>_3) (*cond_<sve_int_op><mode>_any, @aarch64_sve_<sve_int_op><mode>) (@aarch64_sve_<sve_int_op>_lane_<mode>) (@aarch64_sve_add_mul_lane_<mode>) (@aarch64_sve_sub_mul_lane_<mode>, @aarch64_sve2_xar<mode>) (*aarch64_sve2_bcax<mode>, @aarch64_sve2_eor3<mode>) (*aarch64_sve2_nor<mode>, *aarch64_sve2_nand<mode>) (*aarch64_sve2_bsl<mode>, *aarch64_sve2_nbsl<mode>) (*aarch64_sve2_bsl1n<mode>, *aarch64_sve2_bsl2n<mode>) (*aarch64_sve2_sra<mode>, @aarch64_sve_add_<sve_int_op><mode>) (*aarch64_sve2_<su>aba<mode>, @aarch64_sve_add_<sve_int_op><mode>) (@aarch64_sve_add_<sve_int_op>_lane_<mode>) (@aarch64_sve_qadd_<sve_int_op><mode>) (@aarch64_sve_qadd_<sve_int_op>_lane_<mode>) (@aarch64_sve_sub_<sve_int_op><mode>) (@aarch64_sve_sub_<sve_int_op>_lane_<mode>) (@aarch64_sve_qsub_<sve_int_op><mode>) (@aarch64_sve_qsub_<sve_int_op>_lane_<mode>) (@aarch64_sve_<sve_fp_op><mode>, @aarch64_<sve_fp_op>_lane_<mode>) (@aarch64_pred_<sve_int_op><mode>) (@aarch64_pred_<sve_fp_op><mode>, *cond_<sve_int_op><mode>_2) (*cond_<sve_int_op><mode>_z, @aarch64_sve_<optab><mode>) (@aarch64_<optab>_lane_<mode>, @aarch64_sve_<optab><mode>) (@aarch64_<optab>_lane_<mode>, @aarch64_pred_<sve_fp_op><mode>) (*cond_<sve_fp_op><mode>_any_relaxed) (*cond_<sve_fp_op><mode>_any_strict) (@aarch64_pred_<sve_int_op><mode>, *cond_<sve_int_op><mode>) (@aarch64_pred_<sve_fp_op><mode>, *cond_<sve_fp_op><mode>) (*cond_<sve_fp_op><mode>_strict): Update to new syntax. 
* config/aarch64/aarch64-sve.md (*aarch64_sve_mov<mode>_ldr_str) (*aarch64_sve_mov<mode>_no_ldr_str, @aarch64_pred_mov<mode>) (*aarch64_sve_mov<mode>, aarch64_wrffr) (mask_scatter_store<mode><v_int_container>) (*mask_scatter_store<mode><v_int_container>_<su>xtw_unpacked) (*mask_scatter_store<mode><v_int_container>_sxtw) (*mask_scatter_store<mode><v_int_container>_uxtw) (@aarch64_scatter_store_trunc<VNx4_NARROW:mode><VNx4_WIDE:mode>) (@aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>) (*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_sxtw) (*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_uxtw) (*vec_duplicate<mode>_reg, vec_shl_insert_<mode>) (vec_series<mode>, @extract_<last_op>_<mode>) (@aarch64_pred_<optab><mode>, *cond_<optab><mode>_2) (*cond_<optab><mode>_any, @aarch64_pred_<optab><mode>) (@aarch64_sve_revbhw_<SVE_ALL:mode><PRED_HSD:mode>) (@cond_<optab><mode>) (*<optab><SVE_PARTIAL_I:mode><SVE_HSDI:mode>2) (@aarch64_pred_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>) (@aarch64_cond_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>) (*cond_uxt<mode>_2, *cond_uxt<mode>_any, *cnot<mode>) (*cond_cnot<mode>_2, *cond_cnot<mode>_any) (@aarch64_pred_<optab><mode>, *cond_<optab><mode>_2_relaxed) (*cond_<optab><mode>_2_strict, *cond_<optab><mode>_any_relaxed) (*cond_<optab><mode>_any_strict, @aarch64_pred_<optab><mode>) (*cond_<optab><mode>_2, *cond_<optab><mode>_3) (*cond_<optab><mode>_any, add<mode>3, sub<mode>3) (@aarch64_pred_<su>abd<mode>, *aarch64_cond_<su>abd<mode>_2) (*aarch64_cond_<su>abd<mode>_3, *aarch64_cond_<su>abd<mode>_any) (@aarch64_sve_<optab><mode>, @aarch64_pred_<optab><mode>) (*cond_<optab><mode>_2, *cond_<optab><mode>_z) (@aarch64_pred_<optab><mode>, *cond_<optab><mode>_2) (*cond_<optab><mode>_3, *cond_<optab><mode>_any, <optab><mode>3) (*cond_bic<mode>_2, *cond_bic<mode>_any) (@aarch64_pred_<optab><mode>, *cond_<optab><mode>_2_const) (*cond_<optab><mode>_any_const, *cond_<sve_int_op><mode>_m) (*cond_<sve_int_op><mode>_z, *sdiv_pow2<mode>3) (*cond_<sve_int_op><mode>_2, *cond_<sve_int_op><mode>_any) (@aarch64_pred_<optab><mode>, *cond_<optab><mode>_2_relaxed) (*cond_<optab><mode>_2_strict, *cond_<optab><mode>_any_relaxed) (*cond_<optab><mode>_any_strict, @aarch64_pred_<optab><mode>) (*cond_<optab><mode>_2_relaxed, *cond_<optab><mode>_2_strict) (*cond_<optab><mode>_2_const_relaxed) (*cond_<optab><mode>_2_const_strict) (*cond_<optab><mode>_3_relaxed, *cond_<optab><mode>_3_strict) (*cond_<optab><mode>_any_relaxed, *cond_<optab><mode>_any_strict) (*cond_<optab><mode>_any_const_relaxed) (*cond_<optab><mode>_any_const_strict) (@aarch64_pred_<optab><mode>, *cond_add<mode>_2_const_relaxed) (*cond_add<mode>_2_const_strict) (*cond_add<mode>_any_const_relaxed) (*cond_add<mode>_any_const_strict, @aarch64_pred_<optab><mode>) (*cond_<optab><mode>_2_relaxed, *cond_<optab><mode>_2_strict) (*cond_<optab><mode>_any_relaxed, *cond_<optab><mode>_any_strict) (@aarch64_pred_<optab><mode>, *cond_sub<mode>_3_const_relaxed) (*cond_sub<mode>_3_const_strict, *cond_sub<mode>_const_relaxed) (*cond_sub<mode>_const_strict, *aarch64_pred_abd<mode>_relaxed) (*aarch64_pred_abd<mode>_strict) (*aarch64_cond_abd<mode>_2_relaxed) (*aarch64_cond_abd<mode>_2_strict) (*aarch64_cond_abd<mode>_3_relaxed) (*aarch64_cond_abd<mode>_3_strict) (*aarch64_cond_abd<mode>_any_relaxed) (*aarch64_cond_abd<mode>_any_strict, @aarch64_pred_<optab><mode>) (@aarch64_pred_fma<mode>, *cond_fma<mode>_2, *cond_fma<mode>_4) (*cond_fma<mode>_any, @aarch64_pred_fnma<mode>) (*cond_fnma<mode>_2, 
*cond_fnma<mode>_4, *cond_fnma<mode>_any) (<sur>dot_prod<vsi2qi>, @aarch64_<sur>dot_prod_lane<vsi2qi>) (@<sur>dot_prod<vsi2qi>, @aarch64_<sur>dot_prod_lane<vsi2qi>) (@aarch64_sve_add_<optab><vsi2qi>, @aarch64_pred_<optab><mode>) (*cond_<optab><mode>_2_relaxed, *cond_<optab><mode>_2_strict) (*cond_<optab><mode>_4_relaxed, *cond_<optab><mode>_4_strict) (*cond_<optab><mode>_any_relaxed, *cond_<optab><mode>_any_strict) (@aarch64_<optab>_lane_<mode>, @aarch64_pred_<optab><mode>) (*cond_<optab><mode>_4_relaxed, *cond_<optab><mode>_4_strict) (*cond_<optab><mode>_any_relaxed, *cond_<optab><mode>_any_strict) (@aarch64_<optab>_lane_<mode>, @aarch64_sve_tmad<mode>) (@aarch64_sve_<sve_fp_op>vnx4sf) (@aarch64_sve_<sve_fp_op>_lanevnx4sf) (@aarch64_sve_<sve_fp_op><mode>, *vcond_mask_<mode><vpred>) (@aarch64_sel_dup<mode>, @aarch64_pred_cmp<cmp_op><mode>) (*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest) (@aarch64_pred_fcm<cmp_op><mode>, @fold_extract_<last_op>_<mode>) (@aarch64_fold_extract_vector_<last_op>_<mode>) (@aarch64_sve_splice<mode>) (@aarch64_sve_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>) (@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>) (*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_relaxed) (*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_strict) (*cond_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>) (@aarch64_sve_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>) (@aarch64_sve_<optab>_extend<VNx4SI_ONLY:mode><VNx2DF_ONLY:mode>) (*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_relaxed) (*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_strict) (*cond_<optab>_extend<VNx4SI_ONLY:mode><VNx2DF_ONLY:mode>) (@aarch64_sve_<optab>_trunc<SVE_FULL_SDF:mode><SVE_FULL_HSF:mode>) (*cond_<optab>_trunc<SVE_FULL_SDF:mode><SVE_FULL_HSF:mode>) (@aarch64_sve_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>) (*cond_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>) (@aarch64_sve_<optab>_nontrunc<SVE_FULL_HSF:mode><SVE_FULL_SDF:mode>) (*cond_<optab>_nontrunc<SVE_FULL_HSF:mode><SVE_FULL_SDF:mode>) (@aarch64_brk<brk_op>, *aarch64_sve_<inc_dec><mode>_cntp): Update to new syntax. * config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>) (load_pair<DREG:mode><DREG2:mode>) (vec_store_pair<DREG:mode><DREG2:mode>, aarch64_simd_stp<mode>) (aarch64_simd_mov_from_<mode>low) (aarch64_simd_mov_from_<mode>high, and<mode>3<vczle><vczbe>) (ior<mode>3<vczle><vczbe>, aarch64_simd_ashr<mode><vczle><vczbe>) (aarch64_simd_bsl<mode>_internal<vczle><vczbe>) (*aarch64_simd_bsl<mode>_alt<vczle><vczbe>) (aarch64_simd_bsldi_internal, aarch64_simd_bsldi_alt) (store_pair_lanes<mode>, *aarch64_combine_internal<mode>) (*aarch64_combine_internal_be<mode>, *aarch64_combinez<mode>) (*aarch64_combinez_be<mode>) (aarch64_cm<optab><mode><vczle><vczbe>, *aarch64_cm<optab>di) (aarch64_cm<optab><mode><vczle><vczbe>, *aarch64_mov<mode>) (*aarch64_be_mov<mode>, *aarch64_be_movoi): Update to new syntax.
2023-09-14aarch64: Coerce addresses to be suitable for LD1RQRichard Sandiford1-1/+14
In the following test: svuint8_t ld(uint8_t *ptr) { return svld1rq(svptrue_b8(), ptr + 2); } ptr + 2 is a valid address for an Advanced SIMD load, but not for an SVE load. We therefore ended up generating: ldr q0, [x0, 2] dup z0.q, z0.q[0] This patch makes us generate LD1RQ for that case too. It takes the slightly old-school approach of making the predicate broader than the constraint. That is: any valid memory address is accepted as an operand before RA. If the instruction remains during RA, LRA will coerce the address to match the constraint. If the instruction gets split before RA, the splitter will load invalid addresses into a scratch register. gcc/ * config/aarch64/aarch64-sve.md (@aarch64_vec_duplicate_vq<mode>_le): Accept all nonimmediate_operands, but keep the existing constraints. If the instruction is split before RA, load invalid addresses into a temporary register. * config/aarch64/predicates.md (aarch64_sve_dup_ld1rq_operand): Delete. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/ld1rq_1.c: New test.
2023-06-21aarch64: Avoid same input and output Z register for gather loadsKyrylo Tkachov1-67/+127
The architecture recommends that load-gather instructions avoid using the same Z register for the load address and the destination, and the Software Optimization Guides for Arm cores recommend that as well. This means that for code like: svuint64_t food (svbool_t p, uint64_t *in, svint64_t offsets, svuint64_t a) { return svadd_u64_x (p, a, svld1_gather_offset(p, in, offsets)); } we'll want to avoid generating the current: food: ld1d z0.d, p0/z, [x0, z0.d] // Z0 reused as input and output. add z0.d, z1.d, z0.d ret However, we still want to avoid generating extra moves where there were none before, so the tight aarch64-sve-acle.exp tests for load gathers should still pass as they are. This patch implements that recommendation for the load gather patterns by: * duplicating the alternatives * marking the output operand as early clobber * Tying the input Z register operand in the original alternatives to 0 * Penalising the original alternatives with '?' This results in a large-ish patch in terms of diff lines but the new compact syntax (thanks Tamar) makes it quite a readable an regular change. The benchmark numbers on a Neoverse V1 on fprate look okay: diff 503.bwaves_r 0.00% 507.cactuBSSN_r 0.00% 508.namd_r 0.00% 510.parest_r 0.55% 511.povray_r 0.22% 519.lbm_r 0.00% 521.wrf_r 0.00% 526.blender_r 0.00% 527.cam4_r 0.56% 538.imagick_r 0.00% 544.nab_r 0.00% 549.fotonik3d_r 0.00% 554.roms_r 0.00% fprate 0.10% Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (mask_gather_load<mode><v_int_container>): Add alternatives to prefer to avoid same input and output Z register. (mask_gather_load<mode><v_int_container>): Likewise. (*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise. (*mask_gather_load<mode><v_int_container>_sxtw): Likewise. (*mask_gather_load<mode><v_int_container>_uxtw): Likewise. (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>): Likewise. (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>): Likewise. (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise. (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_sxtw): Likewise. (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_uxtw): Likewise. (@aarch64_ldff1_gather<mode>): Likewise. (@aarch64_ldff1_gather<mode>): Likewise. (*aarch64_ldff1_gather<mode>_sxtw): Likewise. (*aarch64_ldff1_gather<mode>_uxtw): Likewise. (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode> <VNx4_NARROW:mode>): Likewise. (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>): Likewise. (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>_sxtw): Likewise. (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>_uxtw): Likewise. * config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>): Likewise. (@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode> <SVE_PARTIAL_I:mode>): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/gather_earlyclobber.c: New test. * gcc.target/aarch64/sve2/gather_earlyclobber.c: New test.
2023-06-21aarch64: Convert SVE gather patterns to compact syntaxKyrylo Tkachov1-176/+194
This patch converts the SVE load gather patterns to the new compact syntax that Tamar introduced. This allows for a future patch I want to contribute to add more alternatives that are better viewed in the more compact form. The lines in some patterns are >80 long now, but I think that's unavoidable and those patterns already had overly long constraint strings. No functional change intended. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (mask_gather_load<mode><v_int_container>): Convert to compact alternatives syntax. (mask_gather_load<mode><v_int_container>): Likewise. (*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise. (*mask_gather_load<mode><v_int_container>_sxtw): Likewise. (*mask_gather_load<mode><v_int_container>_uxtw): Likewise. (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>): Likewise. (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>): Likewise. (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise. (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_sxtw): Likewise. (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_uxtw): Likewise. (@aarch64_ldff1_gather<mode>): Likewise. (@aarch64_ldff1_gather<mode>): Likewise. (*aarch64_ldff1_gather<mode>_sxtw): Likewise. (*aarch64_ldff1_gather<mode>_uxtw): Likewise. (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode> <VNx4_NARROW:mode>): Likewise. (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>): Likewise. (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>_sxtw): Likewise. (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>_uxtw): Likewise. * config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>): Likewise. (@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode> <SVE_PARTIAL_I:mode>): Likewise.
2023-06-21Revert "aarch64: Convert SVE gather patterns to compact syntax"Kyrylo Tkachov1-254/+176
This reverts commit bb3c69058a5fb874ea3c5c26bfb331d33d0497c3.
2023-06-21aarch64: Convert SVE gather patterns to compact syntaxKyrylo Tkachov1-176/+254
This patch converts the SVE load gather patterns to the new compact syntax that Tamar introduced. This allows for a future patch I want to contribute to add more alternatives that are better viewed in the more compact form. The lines in some patterns are >80 long now, but I think that's unavoidable and those patterns already had overly long constraint strings. No functional change intended. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (mask_gather_load<mode><v_int_container>): Convert to compact alternatives syntax. (mask_gather_load<mode><v_int_container>): Likewise. (*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise. (*mask_gather_load<mode><v_int_container>_sxtw): Likewise. (*mask_gather_load<mode><v_int_container>_uxtw): Likewise. (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>): Likewise. (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>): Likewise. (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise. (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_sxtw): Likewise. (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode> <SVE_2BHSI:mode>_uxtw): Likewise. (@aarch64_ldff1_gather<mode>): Likewise. (@aarch64_ldff1_gather<mode>): Likewise. (*aarch64_ldff1_gather<mode>_sxtw): Likewise. (*aarch64_ldff1_gather<mode>_uxtw): Likewise. (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode> <VNx4_NARROW:mode>): Likewise. (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>): Likewise. (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>_sxtw): Likewise. (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode> <VNx2_NARROW:mode>_uxtw): Likewise. * config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>): Likewise. (@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode> <SVE_PARTIAL_I:mode>): Likewise.
2023-06-15AArch64: New RTL for ABDOluwatamilore Adebayo1-2/+2
This patch adds new RTL and tests for sabd and uabd. PR tree-optimization/109156 gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_<su>abd<mode>): Rename to <su>abd<mode>3. * config/aarch64/aarch64-sve.md (<su>abd<mode>_3): Rename to <su>abd<mode>3. gcc/testsuite/ChangeLog: * gcc.target/aarch64/abd.h: New file. * gcc.target/aarch64/abd_2.c: New test. * gcc.target/aarch64/abd_3.c: New test. * gcc.target/aarch64/abd_4.c: New test. * gcc.target/aarch64/abd_none_2.c: New test. * gcc.target/aarch64/abd_none_3.c: New test. * gcc.target/aarch64/abd_none_4.c: New test. * gcc.target/aarch64/abd_run_1.c: New test. * gcc.target/aarch64/sve/abd_1.c: New test. * gcc.target/aarch64/sve/abd_none_1.c: New test. * gcc.target/aarch64/sve/abd_2.c: New test. * gcc.target/aarch64/sve/abd_none_2.c: New test.
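The kind of scalar code that is expected to match the renamed <su>abd<mode>3 patterns when vectorized (a sketch, not one of the new tests):

  void
  uabd_loop (unsigned char *restrict out, const unsigned char *a,
             const unsigned char *b, int n)
  {
    for (int i = 0; i < n; i++)
      /* Absolute difference of unsigned bytes; with SVE enabled this
         should use UABD rather than a compare-and-select sequence.  */
      out[i] = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
  }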
2023-05-09aarch64: Improve register allocation for lane instructionsRichard Sandiford1-1/+1
REG_ALLOC_ORDER is much less important than it used to be, but it is still used as a tie-breaker when multiple registers in a class are equally good. Previously aarch64 used the default approach of allocating in order of increasing register number. But as the comment in the patch says, it's better to allocate FP and predicate registers in the opposite order, so that we don't eat into smaller register classes unnecessarily. This fixes some existing FIXMEs and improves the register allocation for some Arm ACLE code. Doing this also showed that *vcond_mask_<mode><vpred> (predicated MOV/SEL) unnecessarily required p0-p7 rather than p0-p15 for the unpredicated movprfx alternatives. Only the predicated movprfx alternative requires p0-p7 (due to the movprfx itself, rather than due to the main instruction). gcc/ * config/aarch64/aarch64-protos.h (aarch64_adjust_reg_alloc_order): Declare. * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define. (ADJUST_REG_ALLOC_ORDER): Likewise. * config/aarch64/aarch64.cc (aarch64_adjust_reg_alloc_order): New function. * config/aarch64/aarch64-sve.md (*vcond_mask_<mode><vpred>): Use Upa rather than Upl for unpredicated movprfx alternatives. gcc/testsuite/ * gcc.target/aarch64/sve/acle/asm/abd_f16.c: Remove XFAILs. * gcc.target/aarch64/sve/acle/asm/abd_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/abd_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/abd_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/abd_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/abd_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/abd_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/abd_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/abd_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/abd_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/abd_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/add_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/add_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/add_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/add_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/add_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/add_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/add_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/add_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/and_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/and_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/and_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/and_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/and_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/and_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/and_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/and_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/asr_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/asr_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/div_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/div_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/div_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/div_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/divr_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/divr_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/divr_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/divr_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/divr_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/divr_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/divr_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/dot_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/dot_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/dot_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/dot_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/eor_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/eor_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/eor_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/eor_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/eor_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/eor_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/eor_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/eor_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsr_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsr_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mad_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/max_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/max_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/max_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/max_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/max_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/max_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/max_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/max_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/min_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/min_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/min_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/min_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/min_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/min_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/min_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/min_u8.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/mla_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mla_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mls_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/msb_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_f16_notrap.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_f32_notrap.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_f64_notrap.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulh_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulh_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulh_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulh_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulh_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulh_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulh_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulh_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulx_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulx_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mulx_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmad_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmad_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmad_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmla_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmla_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmla_f64.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/nmls_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmls_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmls_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmsb_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmsb_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/nmsb_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/orr_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/orr_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/orr_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/orr_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/orr_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/orr_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/orr_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/orr_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/scale_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/scale_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/scale_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/sub_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/sub_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/sub_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/sub_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/sub_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/sub_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/sub_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/sub_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_f16_notrap.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_f32_notrap.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_f64_notrap.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/subr_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bcax_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bcax_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bcax_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bcax_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bcax_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bcax_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bcax_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bcax_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qadd_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qadd_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qadd_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qadd_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qadd_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qadd_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qadd_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qadd_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qdmlalb_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsub_s16.c: Likewise. 
* gcc.target/aarch64/sve2/acle/asm/qsub_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsub_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsub_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsub_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsub_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsub_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsub_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsubr_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsubr_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsubr_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsubr_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c: Likewise.
2023-04-24aarch64: PR target/109406 Add support for SVE2 unpredicated MULKyrylo Tkachov1-0/+9
SVE2 supports an unpredicated vector integer MUL form that we can emit from our SVE expanders without using up a predicate register. This patch does so. As the SVE MUL expansion is currently templated away through a code iterator, I did not split it off just for this case but instead special-cased it in the define_expand. It seemed somewhat less invasive than the alternatives, but I could split it off more explicitly if others want to. The div-by-bitmask_1.c testcase is adjusted to expect this new MUL form. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: PR target/109406 * config/aarch64/aarch64-sve.md (<optab><mode>3): Handle TARGET_SVE2 MUL case. * config/aarch64/aarch64-sve2.md (*aarch64_mul_unpredicated_<mode>): New pattern. gcc/testsuite/ChangeLog: PR target/109406 * gcc.target/aarch64/sve2/div-by-bitmask_1.c: Adjust for unpredicated SVE2 MUL. * gcc.target/aarch64/sve2/unpred_mul_1.c: New test.
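A plain multiply loop shows the difference: under SVE alone the vectorizer has to use a predicated MUL (typically with a MOVPRFX), whereas with SVE2 the expander can now emit the unpredicated form directly. Sketch only; actual output depends on the architecture flags and tuning.

  void
  mul_loop (int *restrict r, const int *a, const int *b)
  {
    for (int i = 0; i < 1024; i++)
      /* With SVE2 this can use the unpredicated "mul z0.s, z1.s, z2.s"
         form, leaving the predicate registers free.  */
      r[i] = a[i] * b[i];
  }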
2023-04-24[3/4] aarch64: Convert UABAL and SABAL patterns to standard RTL codesKyrylo Tkachov1-4/+4
With the SABDL and UABDL patterns converted, their accumulating forms UABAL and SABAL are not much more complicated. There's an accumulator argument that we accumulate into with a PLUS once all the widening is done. Some necessary renaming of patterns relating to the removal of UNSPEC_SABAL and UNSPEC_UABAL is included. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_<sur>abal<mode>): Rename to... (aarch64_<su>abal<mode>): ... This. Use RTL codes instead of unspec. (<sur>sadv16qi): Rename to... (<su>sadv16qi): ... This. Adjust for the above. * config/aarch64/aarch64-sve.md (<sur>sad<vsi2qi>): Rename to... (<su>sad<vsi2qi>): ... This. Adjust for the above. * config/aarch64/aarch64.md (UNSPEC_SABAL, UNSPEC_UABAL): Delete. * config/aarch64/iterators.md (ABAL): Delete. (sur): Remove handling of UNSPEC_SABAL and UNSPEC_UABAL.
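These accumulating forms are what a sum-of-absolute-differences reduction lowers to; a minimal sketch of such a kernel:

  unsigned int
  sad16 (const unsigned char *a, const unsigned char *b)
  {
    unsigned int sum = 0;
    /* Each iteration adds the absolute byte difference into a 32-bit
       accumulator, which is the widening accumulation that the
       UABDL/UABAL (and SABDL/SABAL) chains implement.  */
    for (int i = 0; i < 16; i++)
      sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    return sum;
  }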
2023-01-16Update copyright years.Jakub Jelinek1-1/+1
2023-01-14[aarch64] Fold ldr+dup to ld1rq for little endian targets.Prathamesh Kulkarni1-5/+25
gcc/ChangeLog: * config/aarch64/aarch64-sve.md (aarch64_vec_duplicate_vq<mode>_le): Change to define_insn_and_split to fold ldr+dup to ld1rq. * config/aarch64/predicates.md (aarch64_sve_dup_ld1rq_operand): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/acle/general/pr96463-2.c: Adjust.
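On little-endian targets the fold applies to ACLE code like the following, which previously needed a separate LDR and DUP; this is a sketch of the intended outcome, not a guarantee for every address form:

  #include <arm_sve.h>

  svuint32_t
  dup_quad (const uint32_t *ptr)
  {
    /* Load one 128-bit quadword and replicate it across the vector;
       after this change the ldr+dup pair can be folded into a single
       LD1RQ.  */
    return svld1rq_u32 (svptrue_b32 (), ptr);
  }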
2022-10-20aarch64: Prevent generation of /M BRKAS and BRKBSRichard Sandiford1-14/+10
Bit of a brown-paper-bag bug, but: GCC was generating non-existent merging forms of BRKAS and BRKBS. Those instructions only support zero predication (although BRKA and BRKB support both). gcc/ * config/aarch64/aarch64-sve.md (*aarch64_brk<brk_op>_cc): Remove merging alternative. (*aarch64_brk<brk_op>_ptest): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/brka_1.c: Expect a separate PTEST instruction. * gcc.target/aarch64/sve/acle/general/brkb_1.c: Likewise.
2022-10-20aarch64: Fix matching of BRKNSRichard Sandiford1-8/+62
Unlike other flag-setting SVE instructions, BRKNS sets the flags based on an all-true governing predicate, rather than the GP operand. gcc/ * config/aarch64/iterators.md (SVE_BRKP): New iterator. * config/aarch64/aarch64-sve.md (*aarch64_brkn_cc): New pattern. (*aarch64_brkn_ptest): Likewise. (*aarch64_brk<brk_op>_cc): Restrict to SVE_BRKP. (*aarch64_brk<brk_op>_ptest): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/brkn_1.c: Expect separate PTEST instructions. * gcc.target/aarch64/sve/acle/general/brkn_2.c: New test.
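A sketch of the kind of sequence this affects (not the committed brkn_2.c test; the function name is made up): a BRKN whose result is then tested under PG. Since BRKNS sets the flags as if the governing predicate were all-true, the flag-setting form can only be used when that matches the test being performed; otherwise a separate PTEST is emitted.
#include <arm_sve.h>

int
brkn_then_test (svbool_t pg, svbool_t op1, svbool_t op2)
{
  /* The BRKN result is tested against PG; the combined BRKNS form is
     only valid when PG can be treated as all-true for this test.  */
  svbool_t res = svbrkn_b_z (pg, op1, op2);
  return svptest_any (pg, res);
}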
2022-08-12sve: Fix fcmuo combine patterns [PR106524]Tamar Christina1-2/+2
There's no encoding for FCMUO with zero, so this patch stops the combine patterns from accepting zero registers. gcc/ChangeLog: PR target/106524 * config/aarch64/aarch64-sve.md (*fcmuo<mode>_nor_combine, *fcmuo<mode>_bic_combine): Don't accept comparisons against zero. gcc/testsuite/ChangeLog: PR target/106524 * gcc.target/aarch64/sve/pr106524.c: New test.
2022-02-02AArch64: use canonical ordering for complex mul, fma and fmsTamar Christina1-3/+3
After the first patch in the series, this updates the optabs to expect the canonical sequence. gcc/ChangeLog: PR tree-optimization/102819 PR tree-optimization/103169 * config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4): Use canonical order. * config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4): Likewise.
2022-01-03Update copyright years.Jakub Jelinek1-1/+1
2021-11-30vect: Add support for fmax and fmin reductionsRichard Sandiford1-0/+11
This patch adds support for reductions involving calls to fmax*() and fmin*(), without the -ffast-math flags that allow them to be converted to MAX_EXPR and MIN_EXPR. gcc/ * doc/md.texi (reduc_fmin_scal_@var{m}): Document. (reduc_fmax_scal_@var{m}): Likewise. * optabs.def (reduc_fmax_scal_optab): New optab. (reduc_fmin_scal_optab): Likewise * internal-fn.def (REDUC_FMAX, REDUC_FMIN): New functions. * tree-vect-loop.c (reduction_fn_for_scalar_code): Handle CASE_CFN_FMAX and CASE_CFN_FMIN. (neutral_op_for_reduction): Likewise. (needs_fold_left_reduction_p): Likewise. * config/aarch64/iterators.md (FMAXMINV): New iterator. (fmaxmin): Handle UNSPEC_FMAXNMV and UNSPEC_FMINNMV. * config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>): Fix unspec mode. (reduc_<fmaxmin>_scal_<mode>): New pattern. * config/aarch64/aarch64-sve.md (reduc_<fmaxmin>_scal_<mode>): Likewise. gcc/testsuite/ * gcc.dg/vect/vect-fmax-1.c: New test. * gcc.dg/vect/vect-fmax-2.c: Likewise. * gcc.dg/vect/vect-fmax-3.c: Likewise. * gcc.dg/vect/vect-fmin-1.c: New test. * gcc.dg/vect/vect-fmin-2.c: Likewise. * gcc.dg/vect/vect-fmin-3.c: Likewise. * gcc.target/aarch64/fmaxnm_1.c: Likewise. * gcc.target/aarch64/fmaxnm_2.c: Likewise. * gcc.target/aarch64/fminnm_1.c: Likewise. * gcc.target/aarch64/fminnm_2.c: Likewise. * gcc.target/aarch64/sve/fmaxnm_2.c: Likewise. * gcc.target/aarch64/sve/fmaxnm_3.c: Likewise. * gcc.target/aarch64/sve/fminnm_2.c: Likewise. * gcc.target/aarch64/sve/fminnm_3.c: Likewise.
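A sketch of the kind of reduction this enables without -ffast-math (illustrative only, not one of the committed tests):
#include <math.h>

double
fmax_reduction (double *restrict x, int n)
{
  double res = -__builtin_inf ();
  for (int i = 0; i < n; i++)
    /* fmax has well-defined NaN semantics, so the loop can now be
       vectorized as a REDUC_FMAX reduction (FMAXNMV on SVE) without
       relying on fast-math flags.  */
    res = fmax (res, x[i]);
  return res;
}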
2021-11-17Add IFN_COND_FMIN/FMAX functionsRichard Sandiford1-1/+18
This patch adds conditional forms of FMAX and FMIN, following the pattern for existing conditional binary functions. gcc/ * doc/md.texi (cond_fmin@var{mode}, cond_fmax@var{mode}): Document. * optabs.def (cond_fmin_optab, cond_fmax_optab): New optabs. * internal-fn.def (COND_FMIN, COND_FMAX): New functions. * internal-fn.c (first_commutative_argument): Handle them. (FOR_EACH_COND_FN_PAIR): Likewise. * match.pd (UNCOND_BINARY, COND_BINARY): Likewise. * config/aarch64/aarch64-sve.md (cond_<fmaxmin><mode>): New pattern. gcc/testsuite/ * gcc.target/aarch64/sve/cond_fmaxnm_5.c: New test. * gcc.target/aarch64/sve/cond_fmaxnm_5_run.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_6.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_6_run.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_7.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_7_run.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_8.c: Likewise. * gcc.target/aarch64/sve/cond_fmaxnm_8_run.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_5.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_5_run.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_6.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_6_run.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_7.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_7_run.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_8.c: Likewise. * gcc.target/aarch64/sve/cond_fminnm_8_run.c: Likewise.
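A sketch of a conditional fmax loop of the kind the new cond_<fmaxmin><mode> pattern targets (illustrative only; the committed tests are the cond_fmaxnm_*.c and cond_fminnm_*.c files):
#include <math.h>

void
cond_fmax_loop (double *restrict r, double *restrict a,
                double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    /* After if-conversion this becomes a COND_FMAX with a[i] as the
       fallback value, which maps to a single predicated FMAXNM.  */
    r[i] = a[i] > 0.0 ? fmax (a[i], b[i]) : a[i];
}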
2021-11-10aarch64: Tweak FMAX/FMIN iteratorsRichard Sandiford1-1/+1
There was some duplication between the maxmin_uns (uns for unspec rather than unsigned) int attribute and the optab int attribute. The difficulty for FMAXNM and FMINNM is that the instructions really correspond to two things: the smax/smin optabs for floats (used only for fast-math-like flags) and the fmax/fmin optabs (used for built-in functions). The optab attribute was consistently for the former but maxmin_uns had a mixture of both. This patch renames maxmin_uns to fmaxmin and only uses it for the fmax and fmin optabs. The reductions that previously used the maxmin_uns attribute now use the optab attribute instead. FMAX and FMIN are awkward in that they don't correspond to any optab. It's nevertheless useful to define them alongside the “real” optabs. Previously they were known as “smax_nan” and “smin_nan”, but the problem with those names is that smax and smin are only used for floats if NaNs don't matter. This patch therefore uses fmax_nan and fmin_nan instead. There is still some inconsistency, in that the optab attribute handles UNSPEC_COND_FMAX but the fmaxmin attribute handles UNSPEC_FMAX. This is because the SVE FP instructions, being predicated, have to use unspecs in cases where the Advanced SIMD ones could use rtl codes. At least there are no duplicate entries though, so this seemed like the best compromise for now. gcc/ * config/aarch64/iterators.md (optab): Use fmax_nan instead of smax_nan and fmin_nan instead of smin_nan. (maxmin_uns): Rename to... (fmaxmin): ...this and make the same changes. Remove entries unrelated to fmax* and fmin*. * config/aarch64/aarch64.md (<maxmin_uns><mode>3): Rename to... (<fmaxmin><mode>3): ...this. * config/aarch64/aarch64-simd.md (aarch64_<maxmin_uns>p<mode>): Rename to... (aarch64_<optab>p<mode>): ...this. (<maxmin_uns><mode>3): Rename to... (<fmaxmin><mode>3): ...this. (reduc_<maxmin_uns>_scal_<mode>): Rename to... (reduc_<optab>_scal_<mode>): ...this and update gen* call. (aarch64_reduc_<maxmin_uns>_internal<mode>): Rename to... (aarch64_reduc_<optab>_internal<mode>): ...this. (aarch64_reduc_<maxmin_uns>_internalv2si): Rename to... (aarch64_reduc_<optab>_internalv2si): ...this. * config/aarch64/aarch64-sve.md (<maxmin_uns><mode>3): Rename to... (<fmaxmin><mode>3): ...this. * config/aarch64/aarch64-simd-builtins.def (smax_nan, smin_nan): Rename to... (fmax_nan, fmin_nan): ...this. * config/aarch64/arm_neon.h (vmax_f32, vmax_f64, vmaxq_f32, vmaxq_f64) (vmin_f32, vmin_f64, vminq_f32, vminq_f64, vmax_f16, vmaxq_f16) (vmin_f16, vminq_f16): Update accordingly.
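To illustrate the distinction (a sketch, not taken from the patch; function names are made up): both functions below can end up as FMAXNM, but through different optabs — the first via smax, which is only usable for floats under fast-math-style flags, the second via fmax, which carries the built-in's NaN semantics.
double
max_via_smax (double a, double b)
{
  /* Comparison-based maximum: goes through the smax optab, and is only
     turned into FMAXNM when NaNs (and signed zeros) can be ignored.  */
  return a > b ? a : b;
}

double
max_via_fmax (double a, double b)
{
  /* The fmax built-in: goes through the fmax optab and maps to FMAXNM,
     whose NaN handling matches fmax.  */
  return __builtin_fmax (a, b);
}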
2021-10-12sve: combine inverted masks into NOTsTamar Christina1-0/+154
The following example void f10(double * restrict z, double * restrict w, double * restrict x, double * restrict y, int n) { for (int i = 0; i < n; i++) { z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i]; } } currently generates: ld1d z1.d, p1/z, [x1, x5, lsl 3] fcmgt p2.d, p1/z, z1.d, #0.0 fcmgt p0.d, p3/z, z1.d, #0.0 ld1d z2.d, p2/z, [x2, x5, lsl 3] bic p0.b, p3/z, p1.b, p0.b ld1d z0.d, p0/z, [x3, x5, lsl 3] where a BIC is generated between p1 and p0. A NOT would be better here, since it would not require the use of p3 and would open the pattern up to being CSEd. After this patch, using a 2 -> 2 split, we generate: ld1d z1.d, p0/z, [x1, x5, lsl 3] fcmgt p2.d, p0/z, z1.d, #0.0 not p1.b, p0/z, p2.b The additional scratch is needed so that we can CSE the two operations. If both statements wrote to the same register then CSE won't be able to CSE the values if there are other statements in between that use the register. A second pattern is needed to capture the NOR case, as combine will match the longest sequence first. Without this pattern we would end up de-optimizing NOR and instead emit two NOTs. I did not find a better way to do this. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*fcm<cmp_op><mode>_bic_combine, *fcm<cmp_op><mode>_nor_combine, *fcmuo<mode>_bic_combine, *fcmuo<mode>_nor_combine): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/pred-not-gen-1.c: New test. * gcc.target/aarch64/sve/pred-not-gen-2.c: New test. * gcc.target/aarch64/sve/pred-not-gen-3.c: New test. * gcc.target/aarch64/sve/pred-not-gen-4.c: New test.
2021-07-14AArch64: Add support for sign differing dot-product usdot for NEON and SVE.Tamar Christina1-1/+1
Hi All, This adds optabs implementing usdot_prod. The following testcase: #define N 480 #define SIGNEDNESS_1 unsigned #define SIGNEDNESS_2 signed #define SIGNEDNESS_3 signed #define SIGNEDNESS_4 unsigned SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a, SIGNEDNESS_4 char *restrict b) { for (__INTPTR_TYPE__ i = 0; i < N; ++i) { int av = a[i]; int bv = b[i]; SIGNEDNESS_2 short mult = av * bv; res += mult; } return res; } Generates for NEON f: movi v0.4s, 0 mov x3, 0 .p2align 3,,7 .L2: ldr q1, [x2, x3] ldr q2, [x1, x3] usdot v0.4s, v1.16b, v2.16b add x3, x3, 16 cmp x3, 480 bne .L2 addv s0, v0.4s fmov w1, s0 add w0, w0, w1 ret and for SVE f: mov x3, 0 cntb x5 mov w4, 480 mov z1.b, #0 whilelo p0.b, wzr, w4 mov z3.b, #0 ptrue p1.b, all .p2align 3,,7 .L2: ld1b z2.b, p0/z, [x1, x3] ld1b z0.b, p0/z, [x2, x3] add x3, x3, x5 sel z0.b, p0, z0.b, z3.b whilelo p0.b, w3, w4 usdot z1.s, z0.b, z2.b b.any .L2 uaddv d0, p1, z1.s fmov x1, d0 add w0, w0, w1 ret instead of f: movi v0.4s, 0 mov x3, 0 .p2align 3,,7 .L2: ldr q2, [x1, x3] ldr q1, [x2, x3] add x3, x3, 16 sxtl v4.8h, v2.8b sxtl2 v3.8h, v2.16b uxtl v2.8h, v1.8b uxtl2 v1.8h, v1.16b mul v2.8h, v2.8h, v4.8h mul v1.8h, v1.8h, v3.8h saddw v0.4s, v0.4s, v2.4h saddw2 v0.4s, v0.4s, v2.8h saddw v0.4s, v0.4s, v1.4h saddw2 v0.4s, v0.4s, v1.8h cmp x3, 480 bne .L2 addv s0, v0.4s fmov w1, s0 add w0, w0, w1 ret and f: mov x3, 0 cnth x5 mov w4, 480 mov z1.b, #0 whilelo p0.h, wzr, w4 ptrue p2.b, all .p2align 3,,7 .L2: ld1sb z2.h, p0/z, [x1, x3] punpklo p1.h, p0.b ld1b z0.h, p0/z, [x2, x3] add x3, x3, x5 mul z0.h, p2/m, z0.h, z2.h sunpklo z2.s, z0.h sunpkhi z0.s, z0.h add z1.s, p1/m, z1.s, z2.s punpkhi p1.h, p0.b whilelo p0.h, w3, w4 add z1.s, p1/m, z1.s, z0.s b.any .L2 uaddv d0, p2, z1.s fmov x1, d0 add w0, w0, w1 ret gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_usdot<vsi2qi>): Rename to... (usdot_prod<vsi2qi>): ... This. * config/aarch64/aarch64-simd-builtins.def (usdot): Rename to... (usdot_prod): ...This. * config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Likewise. * config/aarch64/aarch64-sve.md (@aarch64_<sur>dot_prod<vsi2qi>): Rename to... (@<sur>dot_prod<vsi2qi>): ...This. * config/aarch64/aarch64-sve-builtins-base.cc (svusdot_impl::expand): Use it. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/vusdot-autovec.c: New test. * gcc.target/aarch64/sve/vusdot-autovec.c: New test.
2021-05-19aarch64: Enable aarch64_load to use UNSPEC_PRED_X loadsAndre Simoes Dias Vieira1-2/+2
This patch enables the use of loads with the UNSPEC_PRED_X enum in the aarch64_load pattern, thus allowing combine to fuse such loads with extends. gcc/ChangeLog: 2021-05-19 Andre Vieira <andre.simoesdiasvieira@arm.com> * config/aarch64/iterators.md (SVE_PRED_LOAD): New iterator. (pred_load): New int attribute. * config/aarch64/aarch64-sve.md (aarch64_load_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>): Use SVE_PRED_LOAD enum iterator and corresponding pred_load attribute. * config/aarch64/aarch64-sve-builtins-base.cc (expand): Update call to code_for_aarch64_load. gcc/testsuite/ChangeLog: 2021-05-19 Andre Vieira <andre.simoesdiasvieira@arm.com> * gcc.target/aarch64/sve/logical_unpacked_and_2.c: Change scan-assembler-times to scan-assembler-not for the superfluous uxtb. * gcc.target/aarch64/sve/logical_unpacked_and_3.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_and_4.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_and_6.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_and_7.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_eor_2.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_eor_3.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_eor_4.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_eor_6.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_eor_7.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_orr_2.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_orr_3.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_orr_4.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_orr_6.c: Likewise. * gcc.target/aarch64/sve/logical_unpacked_orr_7.c: Likewise. * gcc.target/aarch64/sve/ld1_extend.c: New test.
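A sketch of the kind of load-plus-extend that can now be combined (illustrative only, not the committed ld1_extend.c test):
void
widen_load (unsigned int *restrict dst, unsigned char *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    /* The zero-extending load can now be a single LD1B into 32-bit
       containers rather than a predicated load followed by a separate
       UXTB, because combine can fold the extend into the load.  */
    dst[i] = src[i];
}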
2021-04-16SVE: Fix wrong sve predicate split (PR100048)Tamar Christina1-0/+14
The attached testcase generates the following paradoxical subregs when creating the predicates. (insn 22 21 23 2 (set (reg:VNx8BI 100) (subreg:VNx8BI (reg:VNx2BI 103) 0)) (expr_list:REG_EQUAL (const_vector:VNx8BI [ (const_int 1 [0x1]) (const_int 0 [0]) (const_int 1 [0x1]) (const_int 0 [0]) repeated x5 ]) (nil))) and (insn 15 14 16 2 (set (reg:VNx8BI 96) (subreg:VNx8BI (reg:VNx2BI 99) 0)) (expr_list:REG_EQUAL (const_vector:VNx8BI [ (const_int 1 [0x1]) (const_int 0 [0]) repeated x7 ]) (nil))) This causes CSE to incorrectly think that the two predicates are equal because some of the significant bits get ignored due to the subreg. The attached patch instead makes it so that all 16 bits of the predicate are always looked at, which in turn means we need to generate a TRN that matches the expected result mode. In effect, in RTL we keep the mode as VNx16BI but during codegen re-interpret it as the mode the predicate instruction wanted: (insn 10 9 11 2 (set (reg:VNx8BI 96) (subreg:VNx8BI (reg:VNx16BI 99) 0)) (expr_list:REG_EQUAL (const_vector:VNx8BI [ (const_int 1 [0x1]) (const_int 0 [0]) repeated x7 ]) (nil))) This required a correction to the TRN pattern. A new TRN1_CONV unspec is introduced which allows one to keep the arguments as VNx16BI but encode the instruction with the type of the last operand. (insn 9 8 10 2 (set (reg:VNx16BI 99) (unspec:VNx16BI [ (reg:VNx16BI 97) (reg:VNx16BI 98) (reg:VNx2BI 100) ] UNSPEC_TRN1_CONV)) (nil)) This allows us to remove all the paradoxical subregs and end up with (insn 16 15 17 2 (set (reg:VNx8BI 101) (subreg:VNx8BI (reg:VNx16BI 104) 0)) (expr_list:REG_EQUAL (const_vector:VNx8BI [ (const_int 1 [0x1]) (const_int 0 [0]) (const_int 1 [0x1]) (const_int 0 [0]) repeated x5 ]) (nil))) gcc/ChangeLog: PR target/100048 * config/aarch64/aarch64-sve.md (@aarch64_sve_trn1_conv<mode>): New. * config/aarch64/aarch64.c (aarch64_expand_sve_const_pred_trn): Use new TRN optab. * config/aarch64/iterators.md (UNSPEC_TRN1_CONV): New. gcc/testsuite/ChangeLog: PR target/100048 * gcc.target/aarch64/sve/pr100048.c: New test.
2021-02-19aarch64: Check predicate when using gen_vec_duplicate [PR98657]Andre Vieira1-4/+2
This patch prevents generation of a vec_duplicate with an illegal predicate in <ASHIFT:optab><mode>3. gcc/ChangeLog: 2021-02-19 Andre Vieira <andre.simoesdiasvieira@arm.com> PR target/98657 * config/aarch64/aarch64-sve.md (<ASHIFT:optab><mode>3): Use `expand_vector_broadcast' to emit the vec_duplicate operand. gcc/testsuite/ChangeLog: 2021-02-19 Andre Vieira <andre.simoesdiasvieira@arm.com> PR target/98657 * gcc.target/aarch64/sve/pr98657.c: New test.
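A sketch of the general shape the <ASHIFT:optab><mode>3 expander handles (illustrative only; the committed test is pr98657.c and the function name is made up): a vector shift by a variable scalar amount, where the amount has to be broadcast to a vector.
void
shift_by_scalar (unsigned int *restrict x, int n, int amount)
{
  for (int i = 0; i < n; i++)
    /* The scalar shift amount is broadcast to a vector; the fix routes
       this broadcast through expand_vector_broadcast so that an operand
       which fails the pattern's predicate never reaches
       gen_vec_duplicate directly.  */
    x[i] >>= amount;
}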
2021-01-15AArch64: Add NEON, SVE and SVE2 RTL patterns for Multiply, FMS and FMA.Tamar Christina1-0/+56
This adds implementation for the optabs for complex operations. With this the following C code: void g (float complex a[restrict N], float complex b[restrict N], float complex c[restrict N]) { for (int i=0; i < N; i++) c[i] = a[i] * b[i]; } generates NEON: g: movi v3.4s, 0 mov x3, 0 .p2align 3,,7 .L2: mov v0.16b, v3.16b ldr q2, [x1, x3] ldr q1, [x0, x3] fcmla v0.4s, v1.4s, v2.4s, #0 fcmla v0.4s, v1.4s, v2.4s, #90 str q0, [x2, x3] add x3, x3, 16 cmp x3, 1600 bne .L2 ret SVE: g: mov x3, 0 mov x4, 400 ptrue p1.b, all whilelo p0.s, xzr, x4 mov z3.s, #0 .p2align 3,,7 .L2: ld1w z1.s, p0/z, [x0, x3, lsl 2] ld1w z2.s, p0/z, [x1, x3, lsl 2] movprfx z0, z3 fcmla z0.s, p1/m, z1.s, z2.s, #0 fcmla z0.s, p1/m, z1.s, z2.s, #90 st1w z0.s, p0, [x2, x3, lsl 2] incw x3 whilelo p0.s, x3, x4 b.any .L2 ret SVE2 (with int instead of float) g: mov x3, 0 mov x4, 400 mov z3.b, #0 whilelo p0.s, xzr, x4 .p2align 3,,7 .L2: ld1w z1.s, p0/z, [x0, x3, lsl 2] ld1w z2.s, p0/z, [x1, x3, lsl 2] movprfx z0, z3 cmla z0.s, z1.s, z2.s, #0 cmla z0.s, z1.s, z2.s, #90 st1w z0.s, p0, [x2, x3, lsl 2] incw x3 whilelo p0.s, x3, x4 b.any .L2 ret gcc/ChangeLog: * config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4, cmul<conj_op><mode>3): New. * config/aarch64/iterators.md (UNSPEC_FCMUL, UNSPEC_FCMUL180, UNSPEC_FCMLA_CONJ, UNSPEC_FCMLA180_CONJ, UNSPEC_CMLA_CONJ, UNSPEC_CMLA180_CONJ, UNSPEC_CMUL, UNSPEC_CMUL180, FCMLA_OP, FCMUL_OP, conj_op, rotsplit1, rotsplit2, fcmac1, sve_rot1, sve_rot2, SVE2_INT_CMLA_OP, SVE2_INT_CMUL_OP, SVE2_INT_CADD_OP): New. (rot): Add UNSPEC_FCMUL, UNSPEC_FCMUL180. (rot_op): Renamed to conj_op. * config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4, cmul<conj_op><mode>3): New. * config/aarch64/aarch64-sve2.md (cml<fcmac1><conj_op><mode>4, cmul<conj_op><mode>3): New.
2021-01-13aarch64: Add support for unpacked SVE MLS and MSBRichard Sandiford1-44/+44
This patch extends the MLS/MSB patterns to support unpacked integer vectors. The type suffix could be either the element size or the container size, but using the element size should be more efficient. gcc/ * config/aarch64/aarch64-sve.md (fnma<mode>4): Extend from SVE_FULL_I to SVE_I. (@aarch64_pred_fnma<mode>, cond_fnma<mode>, *cond_fnma<mode>_2) (*cond_fnma<mode>_4, *cond_fnma<mode>_any): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/mls_2.c: New test. * g++.target/aarch64/sve/cond_mls_1.C: Likewise. * g++.target/aarch64/sve/cond_mls_2.C: Likewise. * g++.target/aarch64/sve/cond_mls_3.C: Likewise. * g++.target/aarch64/sve/cond_mls_4.C: Likewise. * g++.target/aarch64/sve/cond_mls_5.C: Likewise.
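A sketch of an integer multiply-subtract loop that maps to MLS/MSB (not the committed mls_2.c test; names are made up). With this patch the same patterns also apply when the vectorizer chooses an unpacked mode for the 16-bit elements, for example because the loop mixes element widths:
void
mls_unpacked (unsigned short *restrict a, unsigned short *restrict b,
              unsigned short *restrict c, unsigned int *restrict d, int n)
{
  for (int i = 0; i < n; i++)
    {
      /* fnma form: a[i] = a[i] - b[i] * c[i], an MLS/MSB candidate.  */
      a[i] -= b[i] * c[i];
      /* The 32-bit access can lead the vectorizer to keep the 16-bit
         values in 32-bit containers, i.e. an unpacked vector mode.  */
      d[i] += 1;
    }
}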