Currently, we vectorize CTZ for SVE by using the following operation:
.CTZ (X) = (PREC - 1) - .CLZ (X & -X)
Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in PR109498.
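For reference, here is a minimal scalar sketch of why the two expansions
agree; the rbit helper is purely illustrative, standing in for the RBIT
instruction:

  /* Models RBIT: reverse the bits of a 32-bit value.  */
  static unsigned int
  rbit (unsigned int x)
  {
    unsigned int r = 0;
    for (int i = 0; i < 32; i++)
      r |= ((x >> i) & 1u) << (31 - i);
    return r;
  }

  /* Both assume x != 0, since __builtin_clz (0) is undefined in C.  */
  int ctz_old (unsigned int x) { return 31 - __builtin_clz (x & -x); }
  int ctz_new (unsigned int x) { return __builtin_clz (rbit (x)); }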
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
PR target/109498
* config/aarch64/aarch64-sve.md (ctz<mode>2): New pattern to expand
CTZ to RBIT + CLZ for SVE.
gcc/testsuite/ChangeLog:
PR target/109498
* gcc.target/aarch64/sve/ctz.c: New test.

Testing gcc.target/aarch64/sve/permute_2.c without the associated GCC
patch triggered an unrecognisable insn ICE for the svbfloat16_t tests.
This was because the implementation of general two-vector permutes
requires two TBLs and an ORR, with the ORR being represented as an
unspec for floating-point modes. The associated pattern did not
cover VNx8BF.
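For reference, a scalar sketch of the lowering (names are hypothetical;
SVE TBL returns zero for out-of-range indices, which is what lets the
two lookups be combined with an ORR):

  #include <stdint.h>
  #define N 8   /* elements per vector, illustrative */

  /* Models SVE TBL: v[idx] if idx is in range, else 0.  */
  static uint16_t
  tbl (const uint16_t *v, uint16_t idx)
  {
    return idx < N ? v[idx] : 0;
  }

  void
  permute2 (uint16_t *out, const uint16_t *a, const uint16_t *b,
            const uint16_t *sel)
  {
    for (int i = 0; i < N; i++)
      out[i] = tbl (a, sel[i]) | tbl (b, (uint16_t) (sel[i] - N));
  }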
gcc/
* config/aarch64/iterators.md (SVE_I): Move further up file.
(SVE_F): New mode iterator.
(SVE_ALL): Redefine in terms of SVE_I and SVE_F.
* config/aarch64/aarch64-sve.md (*<LOGICALF:optab><mode>3): Extend
to all SVE_F.
gcc/testsuite/
* gcc.target/aarch64/sve/permute_5.c: New test.

Introduce two new unspecs, UNSPEC_COND_SMAX and UNSPEC_COND_SMIN,
corresponding to the rtl operators smax and smin. UNSPEC_COND_SMAX is
used to generate the fmaxnm instruction and UNSPEC_COND_SMIN to generate
the fminnm instruction.
With these new unspecs, we can generate SVE2 max/min instructions using
the existing generic unpredicated and predicated instruction patterns
that use the optab attribute. We have therefore removed the specialised
max/min instruction patterns that used the SVE_COND_FP_MAXMIN_PUBLIC
iterator.
No new test cases are added, as the existing tests should be enough to
cover this refactoring.
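As an illustration (not from the original submission), a loop like the
following is the kind of code that maps to fmaxnm, since C fmax has the
same NaN semantics as FMAXNM:

  #include <math.h>

  /* fmax returns the non-NaN operand when one operand is a NaN,
     matching FMAXNM.  Compile with SVE enabled, e.g. -O2 -march=...+sve;
     the exact flags here are illustrative.  */
  void
  fmax_loop (double *restrict a, const double *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] = fmax (a[i], b[i]);
  }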
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md
(<fmaxmin><mode>3): Remove this instruction pattern.
(cond_<fmaxmin><mode>): Remove this instruction pattern.
* config/aarch64/iterators.md (UNSPEC_COND_SMAX, UNSPEC_COND_SMIN): New
unspecs; update iterators and attributes to use them.

Given recent changes to the dot_prod standard pattern name, this patch
fixes the aarch64 back-end by implementing the following changes:
1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
2. Rewrite initialization and function expansion mechanism for simd
builtins.
3. Fix all direct calls to back-end `dot_prod' patterns in SVE
builtins.
Finally, given that it is now possible for the compiler to
differentiate between the two- and four-way dot product, we add a test
to ensure that autovectorization picks up on dot-product patterns
where the result is twice the width of the operands.
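For example, the two-way case accumulates 16-bit products into 32-bit
lanes; a sketch along the lines of the new udot2 test:

  #include <stdint.h>

  /* Two-way dot product: the 32-bit result is twice the width of the
     16-bit operands (the four-way form uses 8-bit operands).  */
  uint32_t
  udot2 (const uint16_t *a, const uint16_t *b, int n)
  {
    uint32_t sum = 0;
    for (int i = 0; i < n; i++)
      sum += (uint32_t) a[i] * b[i];
    return sum;
  }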
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md
(<sur>dot_prod<vsi2qi><vczle><vczbe>): Renamed to...
(<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
(usdot_prod<vsi2qi><vczle><vczbe>): Renamed to...
(usdot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
(<su>sadv16qi): Adjust call to gen_udot_prod to take the second mode.
(popcount<mode2>): Fix use of `udot_prod_optab'.
* config/aarch64/aarch64-sve.md
(<sur>dot_prod<vsi2qi>): Renamed to...
(<sur>dot_prod<mode><vsi2qi>): ...this.
(@<sur>dot_prod<vsi2qi>): Renamed to...
(@<sur>dot_prod<mode><vsi2qi>): ...this.
(<su>sad<vsi2qi>): Adjust call to gen_udot_prod to take the second mode.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve_<sur>dotvnx4sivnx8hi): Renamed to...
(<sur>dot_prodvnx4sivnx8hi): ...this.
* config/aarch64/aarch64-simd-builtins.def: Modify macro
expansion-based initialization and expansion
of (u|s|us)dot_prod builtins.
* config/aarch64/aarch64-builtins.cc
(CODE_FOR_aarch64_sdot_prodv8qi): Define as alias to
new CODE_FOR_sdot_prodv2siv8qi.
(CODE_FOR_aarch64_udot_prodv8qi): Define as alias to
new CODE_FOR_udot_prodv2siv8qi.
(CODE_FOR_aarch64_usdot_prodv8qi): Define as alias to
new CODE_FOR_usdot_prodv2siv8qi.
(CODE_FOR_aarch64_sdot_prodv16qi): Define as alias to
new CODE_FOR_sdot_prodv4siv16qi.
(CODE_FOR_aarch64_udot_prodv16qi): Define as alias to
new CODE_FOR_udot_prodv4siv16qi.
(CODE_FOR_aarch64_usdot_prodv16qi): Define as alias to
new CODE_FOR_usdot_prodv4siv16qi.
* config/aarch64/aarch64-sve-builtins-base.cc
(svdot_impl::expand): Use `convert_optab_handler_for_sign' in place
of the direct optab handler.
(svusdot_impl::expand): Add second mode argument in call to
`code_for_dot_prod'.
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::convert_optab_handler_for_sign): New class
method.
* config/aarch64/aarch64-sve-builtins.h
(class function_expander): Add prototype for new
`convert_optab_handler_for_sign' method.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.

On Neoverse V2, SVE ADD instructions have a throughput of 4, while shift
instructions like SHL have a throughput of 2. We can lean on that to emit code
like:
add z31.b, z31.b, z31.b
instead of:
lsl z31.b, z31.b, #1
The implementation of this change for SVE vectors is similar to a prior patch
<https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659958.html> that adds
the above functionality for Neon vectors.
Here, the machine description pattern is split up to accommodate left
and right shifts separately, so we can specifically emit an add for all
left shifts by 1.
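For example (an illustrative loop, not from the patch), a vectorized
shift-by-1 like this can now be emitted as the add shown above:

  #include <stdint.h>

  void
  lshift1 (int8_t *x, int n)
  {
    for (int i = 0; i < n; i++)
      x[i] <<= 1;   /* x[i] + x[i], so ADD can replace LSL #1 */
  }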
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (*post_ra_v<optab><mode>3): Split pattern
to accommodate left and right shifts separately.
(*post_ra_v_ashl<mode>3): New pattern that matches left shifts, with an
additional constraint to check for shifts by 1.
(*post_ra_v_<optab><mode>3): New pattern that matches right shifts.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Updated instances of lsl
by 1 with the corresponding add.
* gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/adr_1.c: Likewise.
* gcc.target/aarch64/sve/adr_6.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_7.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_8.c: Likewise.
* gcc.target/aarch64/sve/shift_2.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/sve_shl_add.c: New test.

This patch improves the Advanced SIMD popcount expansion by using SVE if
available.
For example, GCC currently generates the following code sequence for V2DI:
cnt v31.16b, v31.16b
uaddlp v31.8h, v31.16b
uaddlp v31.4s, v31.8h
uaddlp v31.2d, v31.4s
However, by using SVE, we can generate the following sequence instead:
ptrue p7.b, all
cnt z31.d, p7/m, z31.d
Similar improvements can be made for V4HI, V8HI, V2SI and V4SI too.
The scalar popcount expansion can be improved similarly by using SVE;
those changes will be included in a separate patch.
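For example, the V2DI sequences above correspond to vectorizing a loop
such as this (illustrative only):

  #include <stdint.h>

  void
  popcnt64 (uint64_t *restrict x, int n)
  {
    for (int i = 0; i < n; i++)
      x[i] = __builtin_popcountll (x[i]);   /* CNT z.d, p/m, z.d with SVE */
  }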
PR target/113860
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (popcount<mode>2): Add TARGET_SVE
support.
* config/aarch64/aarch64-sve.md (@aarch64_pred_<optab><mode>): Use new
iterator SVE_VDQ_I.
* config/aarch64/iterators.md (SVE_VDQ_I): New mode iterator.
(VPRED): Add V8QI, V16QI, V4HI, V8HI and V2SI.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/popcnt-sve.c: New test.
Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>

Since r15-1890-gf379596e0ba99d there has been an optab for bic, named
andn. This moves the SVE aarch64_bic pattern over to use it instead.
Note that unlike the simd bic patterns, the operands were already in the
order expected by the optab, so no swapping was needed.
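For reference, the operation named by the andn optab is dest = op1 & ~op2,
which SVE implements as BIC; an illustrative loop:

  #include <stdint.h>

  void
  bic_loop (uint32_t *restrict a, const uint32_t *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] &= ~b[i];   /* a & ~b maps to BIC */
  }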
Built and tested on aarch64-linux-gnu with no regressions.
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-base.cc (svbic_impl::expand):
Update to use the andn optab instead of code_for_aarch64_bic.
* config/aarch64/aarch64-sve.md (@aarch64_bic<mode>): Rename to...
(andn<mode>3): ...this.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

The bitreverse rtl code was added with r14-1586-g6160572f8d243c, so let's
use it instead of an unspec. This is mostly a small cleanup, but it does
include one small fix: the rtx costs didn't handle vector modes correctly
for the UNSPEC, and now they do.
This is a first step towards adding the __builtin_bitreverse builtins,
but it is independent of that work.
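As a usage sketch (not part of the patch), the svrbit intrinsic that now
goes through the bitreverse rtx code:

  #include <arm_sve.h>

  svuint32_t
  rev_bits (svbool_t pg, svuint32_t x)
  {
    return svrbit_u32_x (pg, x);   /* expands to RBIT */
  }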
Bootstrapped and tested on aarch64-linux-gnu with no regressions.
gcc/ChangeLog:
PR target/115176
* config/aarch64/aarch64-simd.md (aarch64_rbit<mode><vczle><vczbe>): Use
bitreverse instead of unspec.
* config/aarch64/aarch64-sve-builtins-base.cc (svrbit): Convert over to using
rtx_code_function instead of unspec_based_function.
* config/aarch64/aarch64-sve.md: Update comment where RBIT is included.
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Handle BITREVERSE like BSWAP.
Remove UNSPEC_RBIT support.
* config/aarch64/aarch64.md (unspec): Remove UNSPEC_RBIT.
(aarch64_rbit<mode>): Use bitreverse instead of unspec.
* config/aarch64/iterators.md (SVE_INT_UNARY): Add bitreverse.
(optab): Likewise.
(sve_int_op): Likewise.
(SVE_INT_UNARY): Remove UNSPEC_RBIT.
(optab): Likewise.
(sve_int_op): Likewise.
(min_elem_bits): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

I made an oversight in the previous patch, where I added a ?Upa
alternative to the Upl cases. This caused the tie to be created with
the larger register file rather than the constrained one. This patch
fixes the affected patterns.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (@aarch64_pred_cmp<cmp_op><mode>,
*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
@aarch64_pred_cmp<cmp_op><mode>_wide,
*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
*aarch64_pred_cmp<cmp_op><mode>_wide_ptest): Fix Upl tie alternative.
* config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>): Fix
Upl tie alternative.

This patch adds new alternatives to the affected patterns. The new
alternatives with the conditional early clobbers are added before the
normal ones, so that LRA prefers them when there are enough free
registers to accommodate them. If register pressure is too high, the
normal alternatives are preferred before a reload is considered, since
we would rather have the tie than a spill.
Tests are in the next patch.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (and<mode>3,
@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
*<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>,
*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
@aarch64_pred_cmp<cmp_op><mode>_wide,
*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, @aarch64_brk<brk_op>,
*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest,
@aarch64_brk<brk_op>, *aarch64_brk<brk_op>_cc,
*aarch64_brk<brk_op>_ptest, aarch64_rdffr_z, *aarch64_rdffr_z_ptest,
*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add
new early clobber alternative.
* config/aarch64/aarch64-sve2.md
(@aarch64_pred_<sve_int_op><mode>): Likewise.

This converts the single-alternative patterns to the new compact syntax,
so that when I add the new alternatives it is clearer what is being
changed.
Note that this will spew out a bunch of warnings from geninsn, as it
warns that @ is useless for a single-alternative pattern. These warnings
are not fatal, so they won't break the build, and they are only
temporary.
No change in functionality is expected with this patch.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (and<mode>3,
@aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc,
*<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z,
*<nlogical><mode>3_cc, *<nlogical><mode>3_ptest,
aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc,
*<logical_nn><mode>3_ptest, *cmp<cmp_op><mode>_ptest,
@aarch64_pred_cmp<cmp_op><mode>_wide,
*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
*aarch64_pred_cmp<cmp_op><mode>_wide_ptest, *aarch64_brk<brk_op>_cc,
*aarch64_brk<brk_op>_ptest, @aarch64_brk<brk_op>,
*aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, aarch64_rdffr_z,
*aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, *aarch64_rdffr_z_cc,
*aarch64_rdffr_cc): Convert to compact syntax.
* config/aarch64/aarch64-sve2.md
(@aarch64_pred_<sve_int_op><mode>): Likewise.

aarch64-sve.md had a pattern that combined:
cmpeq pb.T, pa/z, zc.T, #0
mov zd.T, pb/z, #1
into:
cnot zd.T, pa/m, zc.T
But this is only valid if pa.T is a ptrue. In other cases, the
original would set inactive elements of zd.T to 0, whereas the
combined form would copy elements from zc.T.
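For reference, a scalar sketch of the element-wise operation cnot
performs on active elements:

  #include <stdint.h>

  void
  cnot_loop (uint32_t *x, int n)
  {
    for (int i = 0; i < n; i++)
      x[i] = (x[i] == 0);   /* 1 if the element is zero, else 0 */
  }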
gcc/
PR target/114603
* config/aarch64/aarch64-sve.md (@aarch64_pred_cnot<mode>): Replace
with...
(@aarch64_ptrue_cnot<mode>): ...this, requiring operand 1 to be
a ptrue.
(*cnot<mode>): Require operand 1 to be a ptrue.
* config/aarch64/aarch64-sve-builtins-base.cc (svcnot_impl::expand):
Use aarch64_ptrue_cnot<mode> for _x operations that are predicated
with a ptrue. Represent other _x operations as fully-defined _m
operations.
gcc/testsuite/
PR target/114603
* gcc.target/aarch64/sve/acle/general/cnot_1.c: New test.

As suggested in the ticket, this replaces the expansion by converting
the Advanced SIMD types to SVE types, simply printing out an SVE
register for these instructions.
This fixes the subreg issues since there are no subregs involved anymore.
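As an illustration (hypothetical code, not from the patch), the mulv2di3
case, which has no single Advanced SIMD instruction:

  #include <stdint.h>

  typedef uint64_t v2di __attribute__ ((vector_size (16)));

  v2di
  mul_v2di (v2di a, v2di b)
  {
    return a * b;   /* now emitted as an SVE MUL on the shared registers */
  }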
gcc/ChangeLog:
PR target/109636
* config/aarch64/aarch64-simd.md (<su_optab>div<mode>3,
mulv2di3): Remove.
* config/aarch64/iterators.md (VQDIV): Remove.
(SVE_FULL_SDI_SIMD, SVE_FULL_HSDI_SIMD_DI,
SVE_I_SIMD_DI): New.
(VPRED, sve_lane_con): Add V4SI and V2DI.
* config/aarch64/aarch64-sve.md (<optab><mode>3,
@aarch64_pred_<optab><mode>): Support Advanced SIMD types.
(mul<mode>3): New, split from <optab><mode>3.
(@aarch64_pred_<optab><mode>, *post_ra_<optab><mode>3): New.
* config/aarch64/aarch64-sve2.md (@aarch64_mul_lane_<mode>,
*aarch64_mul_unpredicated_<mode>): Change SVE_FULL_HSDI to
SVE_FULL_HSDI_SIMD_DI.
gcc/testsuite/ChangeLog:
PR target/109636
* gcc.target/aarch64/sve/pr109636_1.c: New test.
* gcc.target/aarch64/sve/pr109636_2.c: New test.
* gcc.target/aarch64/sve2/pr109636_1.c: New test.

This patch uses the new force_reload_address routine added by the
previous patch to fix PR112906.
gcc/ChangeLog:
PR target/112906
* config/aarch64/aarch64-sve.md (@aarch64_vec_duplicate_vq<mode>_le):
Use force_reload_address to reload addresses that aren't suitable for
ld1rq in the pre-RA splitter.
gcc/testsuite/ChangeLog:
PR target/112906
* gcc.target/aarch64/sve/acle/general/pr112906.c: New test.

ACLE has added intrinsics to bridge between SVE and Neon, allowing
conversions between Neon and SVE vectors. This patch adds GCC support
for the following three intrinsics: svset_neonq, svget_neonq and
svdup_neonq.
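A usage sketch of the three intrinsics (illustrative, using the f32
variants):

  #include <arm_neon_sve_bridge.h>

  svfloat32_t
  set_low (svfloat32_t s, float32x4_t n)
  {
    return svset_neonq_f32 (s, n);   /* insert n as the low 128 bits of s */
  }

  float32x4_t
  get_low (svfloat32_t s)
  {
    return svget_neonq_f32 (s);      /* extract the low 128 bits */
  }

  svfloat32_t
  dup_quad (float32x4_t n)
  {
    return svdup_neonq_f32 (n);      /* repeat n across the whole SVE vector */
  }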
gcc/ChangeLog:
* config.gcc: Add new header to config.
* config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
Moved to header file.
(ENTRY): Likewise.
(enum aarch64_simd_type): Likewise.
(struct aarch64_simd_type_info): Remove static.
(GTY): Likewise.
* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64): Define pragma
for arm_neon_sve_bridge.h.
* config/aarch64/aarch64-protos.h: Add handle_arm_neon_sve_bridge_h.
* config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
* config/aarch64/aarch64-sve-builtins-base.cc
(class svget_neonq_impl): New intrinsic implementation.
(class svset_neonq_impl): Likewise.
(class svdup_neonq_impl): Likewise.
(NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
* config/aarch64/aarch64-sve-builtins-functions.h
(NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
functions.
* config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_element_type): Add NEON element types.
(parse_type): Likewise.
(struct get_neonq_def): Defines function shape for get_neonq.
(struct set_neonq_def): Defines function shape for set_neonq.
(struct dup_neonq_def): Defines function shape for dup_neonq.
* config/aarch64/aarch64-sve-builtins.cc
(DEF_SVE_TYPE_SUFFIX): Changed to be called through
SVE_NEON macro.
(DEF_SVE_NEON_TYPE_SUFFIX): Defines
macro for NEON_SVE_BRIDGE type suffixes.
(DEF_NEON_SVE_FUNCTION): Defines
macro for NEON_SVE_BRIDGE functions.
(function_resolver::infer_neon128_vector_type): Infers type suffix
for overloaded functions.
(handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes.
(bf16): Replace entry with neon-sve entry.
(f16): Likewise.
(f32): Likewise.
(f64): Likewise.
(s8): Likewise.
(s16): Likewise.
(s32): Likewise.
(s64): Likewise.
(u8): Likewise.
(u16): Likewise.
(u32): Likewise.
(u64): Likewise.
* config/aarch64/aarch64-sve-builtins.h
(GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
(ENTRY): Add aarch64_simd_type definition.
(enum aarch64_simd_type): Add neon information to type_suffix_info.
(struct type_suffix_info): New function.
* config/aarch64/aarch64-sve.md
(@aarch64_sve_get_neonq_<mode>): New intrinsic insn for big endian.
(@aarch64_sve_set_neonq_<mode>): Likewise.
* config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
* config/aarch64/aarch64-builtins.h: New file.
* config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
* config/aarch64/arm_neon_sve_bridge.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Include the
arm_neon_sve_bridge header file.
* gcc.dg/torture/neon-sve-bridge.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/general-c/dup_neonq_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/get_neonq_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/set_neonq_1.c: New test.

This patch adds a simple register allocator for FP & SIMD registers.
Its main purpose is to make use of SME2's strided LD1, ST1 and LUTI2/4
instructions, which require a very specific grouping structure,
and so would be difficult to exploit with general allocation.
The allocator is very simple. It gives up on anything that would
require spilling, or that it might not handle well for other reasons.
The allocator needs to track liveness at the level of individual FPRs.
Doing that fixes a lot of the PRs relating to redundant moves caused by
structure loads and stores. That particular problem is going to be
fixed more generally for GCC 15 by Lehua's RA patches.
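For instance, tuple loads like the following (an illustrative example)
used to be prone to such moves because the allocator could not see the
individual FPRs inside the pair:

  #include <arm_sve.h>

  svfloat32_t
  sum_pair (svbool_t pg, const float *ptr)
  {
    svfloat32x2_t pair = svld2 (pg, ptr);   /* LD2W into a z-register pair */
    return svadd_x (pg, svget2 (pair, 0), svget2 (pair, 1));
  }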
However, the early-RA pass runs before scheduling, so it has a chance
to bag a spill-free allocation of vector code before the scheduler moves
things around. It could therefore still be useful for non-SME code
(e.g. for hand-scheduled ACLE code) even after Lehua's patches are in.
The pass is controlled by a tristate switch:
- -mearly-ra=all: run on all functions
- -mearly-ra=strided: run on functions that have access to strided registers
- -mearly-ra=none: don't run on any function
The patch makes -mearly-ra=all the default at -O2 and above for now.
We can revisit this for GCC 15 once Lehua's patches are in;
-mearly-ra=strided might then be more appropriate.
As said previously, the pass is very naive. There's much more that we
could do, such as handling invariants better. The main focus is on not
committing to a bad allocation, rather than on handling as much as
possible.
gcc/
PR rtl-optimization/106694
PR rtl-optimization/109078
PR rtl-optimization/109391
* config.gcc: Add aarch64-early-ra.o for AArch64 targets.
* config/aarch64/t-aarch64 (aarch64-early-ra.o): New rule.
* config/aarch64/aarch64-opts.h (aarch64_early_ra_scope): New enum.
* config/aarch64/aarch64.opt (mearly_ra): New option.
* doc/invoke.texi: Document it.
* common/config/aarch64/aarch64-common.cc
(aarch_option_optimization_table): Use -mearly-ra=all by
default for -O2 and above.
* config/aarch64/aarch64-passes.def (pass_aarch64_early_ra): New pass.
* config/aarch64/aarch64-protos.h (aarch64_strided_registers_p)
(make_pass_aarch64_early_ra): Declare.
* config/aarch64/aarch64-sme.md (@aarch64_sme_lut<LUTI_BITS><mode>):
Add a stride_type attribute.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided2): New pattern.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided4): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svld1_impl::expand)
(svldnt1_impl::expand, svst1_impl::expand, svstnt1_impl::expand): Handle
new way of defining multi-register loads and stores.
* config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
(@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
(@aarch64_stnt1<SVE_FULLx24:mode>): Delete.
* config/aarch64/aarch64-sve2.md (@aarch64_<LD1_COUNT:optab><mode>)
(@aarch64_<LD1_COUNT:optab><mode>_strided2): New patterns.
(@aarch64_<LD1_COUNT:optab><mode>_strided4): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>_strided2): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>_strided4): Likewise.
* config/aarch64/aarch64.cc (aarch64_strided_registers_p): New
function.
* config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): Delete.
(UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
(UNSPEC_STNT1_SVE_COUNT): Likewise.
(stride_type): New attribute.
* config/aarch64/constraints.md (Uwd, Uwt): New constraints.
* config/aarch64/iterators.md (UNSPEC_LD1_COUNT, UNSPEC_LDNT1_COUNT)
(UNSPEC_ST1_COUNT, UNSPEC_STNT1_COUNT): New unspecs.
(optab): Handle them.
(LD1_COUNT, ST1_COUNT): New iterators.
* config/aarch64/aarch64-early-ra.cc: New file.
gcc/testsuite/
PR rtl-optimization/106694
PR rtl-optimization/109078
PR rtl-optimization/109391
* gcc.target/aarch64/ldp_stp_16.c (cons4_4_float): Tighten expected
output test.
* gcc.target/aarch64/sve/shift_1.c: Allow reversed shifts for .s
as well as .d.
* gcc.target/aarch64/sme/strided_1.c: New test.
* gcc.target/aarch64/pr109078.c: Likewise.
* gcc.target/aarch64/pr109391.c: Likewise.
* gcc.target/aarch64/sve/pr106694.c: Likewise.

This patch adds support for the SME2 <arm_sme.h> intrinsics. The
convention I've used is to put stuff in aarch64-sve-builtins-sme.*
if it relates to ZA, ZT0, the streaming vector length, or other
such SME state. Things that operate purely on predicates and
vectors go in aarch64-sve-builtins-sve2.* instead. Some of these
will later be picked up for SVE2p1.
We previously used Uph internally as a constraint for 16-bit
immediates to atomic instructions. However, we need a user-facing
constraint for the upper predicate registers (already available as
PR_HI_REGS), and Uph makes a natural pair with the existing Upl.
gcc/
* config/aarch64/aarch64.h (TARGET_STREAMING_SME2): New macro.
(P_ALIASES): Likewise.
(REGISTER_NAMES): Add pn aliases of the predicate registers.
(W8_W11_REGNUM_P): New macro.
(W8_W11_REGS): New register class.
(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Update accordingly.
* config/aarch64/aarch64.cc (aarch64_print_operand): Add support
for %K, which prints a predicate as a counter. Handle tuples of
predicates.
(aarch64_regno_regclass): Handle W8_W11_REGS.
(aarch64_class_max_nregs): Likewise.
* config/aarch64/constraints.md (Uci, Uw2, Uw4): New constraints.
(x, y): Move further up file.
(Uph): Redefine as the high predicate registers, renaming the old
constraint to...
(Uih): ...this.
* config/aarch64/predicates.md (const_0_to_7_operand): New predicate.
(const_0_to_4_step_4_operand, const_0_to_6_step_2_operand): Likewise.
(const_0_to_12_step_4_operand, const_0_to_14_step_2_operand): Likewise.
(aarch64_simd_shift_imm_qi): Use const_0_to_7_operand.
* config/aarch64/iterators.md (VNx16SI_ONLY, VNx8SI_ONLY)
(VNx8DI_ONLY, SVE_FULL_BHSIx2, SVE_FULL_HF, SVE_FULL_SIx2_SDIx4)
(SVE_FULL_BHS, SVE_FULLx24, SVE_DIx24, SVE_BHSx24, SVE_Ix24)
(SVE_Fx24, SVE_SFx24, SME_ZA_BIx24, SME_ZA_BHIx124, SME_ZA_BHIx24)
(SME_ZA_HFx124, SME_ZA_HFx24, SME_ZA_HIx124, SME_ZA_HIx24)
(SME_ZA_SDIx24, SME_ZA_SDFx24): New mode iterators.
(UNSPEC_REVD, UNSPEC_CNTP_C, UNSPEC_PEXT, UNSPEC_PEXTx2): New unspecs.
(UNSPEC_PSEL, UNSPEC_PTRUE_C, UNSPEC_SQRSHR, UNSPEC_SQRSHRN)
(UNSPEC_SQRSHRU, UNSPEC_SQRSHRUN, UNSPEC_UQRSHR, UNSPEC_UQRSHRN)
(UNSPEC_UZP, UNSPEC_UZPQ, UNSPEC_ZIP, UNSPEC_ZIPQ, UNSPEC_BFMLSLB)
(UNSPEC_BFMLSLT, UNSPEC_FCVTN, UNSPEC_FDOT, UNSPEC_SQCVT): Likewise.
(UNSPEC_SQCVTN, UNSPEC_SQCVTU, UNSPEC_SQCVTUN, UNSPEC_UQCVT): Likewise.
(UNSPEC_SME_ADD, UNSPEC_SME_ADD_WRITE, UNSPEC_SME_BMOPA): Likewise.
(UNSPEC_SME_BMOPS, UNSPEC_SME_FADD, UNSPEC_SME_FDOT, UNSPEC_SME_FVDOT)
(UNSPEC_SME_FMLA, UNSPEC_SME_FMLS, UNSPEC_SME_FSUB, UNSPEC_SME_READ)
(UNSPEC_SME_SDOT, UNSPEC_SME_SVDOT, UNSPEC_SME_SMLA, UNSPEC_SME_SMLS)
(UNSPEC_SME_SUB, UNSPEC_SME_SUB_WRITE, UNSPEC_SME_SUDOT): Likewise.
(UNSPEC_SME_SUVDOT, UNSPEC_SME_UDOT, UNSPEC_SME_UVDOT): Likewise.
(UNSPEC_SME_UMLA, UNSPEC_SME_UMLS, UNSPEC_SME_USDOT): Likewise.
(UNSPEC_SME_USVDOT, UNSPEC_SME_WRITE): Likewise.
(Vetype, VNARROW, V2XWIDE, Ventype, V_INT_EQUIV, v_int_equiv)
(VSINGLE, vsingle, b): Add tuple modes.
(v2xwide, za32_offset_range, za64_offset_range, za32_long)
(za32_last_offset, vg_modifier, z_suffix, aligned_operand)
(aligned_fpr): New mode attributes.
(SVE_INT_BINARY_MULTI, SVE_INT_BINARY_SINGLE, SVE_INT_BINARY_MULTI)
(SVE_FP_BINARY_MULTI): New int iterators.
(SVE_BFLOAT_TERNARY_LONG): Add UNSPEC_BFMLSLB and UNSPEC_BFMLSLT.
(SVE_BFLOAT_TERNARY_LONG_LANE): Likewise.
(SVE_WHILE_ORDER, SVE2_INT_SHIFT_IMM_NARROWxN, SVE_QCVTxN)
(SVE2_SFx24_UNARY, SVE2_x24_PERMUTE, SVE2_x24_PERMUTEQ)
(UNSPEC_REVD_ONLY, SME2_INT_MOP, SME2_BMOP, SME_BINARY_SLICE_SDI)
(SME_BINARY_SLICE_SDF, SME_BINARY_WRITE_SLICE_SDI, SME_INT_DOTPROD)
(SME_INT_DOTPROD_LANE, SME_FP_DOTPROD, SME_FP_DOTPROD_LANE)
(SME_INT_TERNARY_SLICE, SME_FP_TERNARY_SLICE, BHSD_BITS)
(LUTI_BITS): New int iterators.
(optab, sve_int_op): Handle the new unspecs.
(sme_int_op, has_16bit_form): New int attributes.
(bits_etype): Handle 64.
* config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): New unspec.
(UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
(UNSPEC_STNT1_SVE_COUNT): Likewise.
* config/aarch64/atomics.md (cas_short_expected_imm): Use Uih
rather than Uph for HImode immediates.
* config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
(@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
(@aarch64_stnt1<SVE_FULLx24:mode>): New patterns.
(@aarch64_<sur>dot_prod_lane<vsi2qi>): Extend to...
(@aarch64_<sur>dot_prod_lane<SVE_FULL_SDI:mode><SVE_FULL_BHI:mode>)
(@aarch64_<sur>dot_prod_lane<VNx4SI_ONLY:mode><VNx16QI_ONLY:mode>):
...these new patterns.
(SVE_WHILE_B, SVE_WHILE_B_X2, SVE_WHILE_C): New constants. Add
SVE_WHILE_B to existing while patterns.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_ptrue_c<BHSD_BITS>)
(@aarch64_sve_pext<BHSD_BITS>, @aarch64_sve_pext<BHSD_BITS>x2)
(@aarch64_sve_psel<BHSD_BITS>, *aarch64_sve_psel<BHSD_BITS>_plus)
(@aarch64_sve_cntp_c<BHSD_BITS>, <frint_pattern><mode>2)
(<optab><mode>3, *<optab><mode>3, @aarch64_sve_single_<optab><mode>)
(@aarch64_sve_<sve_int_op><mode>): New patterns.
(@aarch64_sve_single_<sve_int_op><mode>, @aarch64_sve_<su>clamp<mode>)
(*aarch64_sve_<su>clamp<mode>_x, @aarch64_sve_<su>clamp_single<mode>)
(@aarch64_sve_fclamp<mode>, *aarch64_sve_fclamp<mode>_x)
(@aarch64_sve_fclamp_single<mode>, <optab><mode><v2xwide>2)
(@aarch64_sve_<sur>dotvnx4sivnx8hi): New patterns.
(@aarch64_sve_<maxmin_uns_op><mode>): Likewise.
(*aarch64_sve_<maxmin_uns_op><mode>): Likewise.
(@aarch64_sve_single_<maxmin_uns_op><mode>): Likewise.
(aarch64_sve_fdotvnx4sfvnx8hf): Likewise.
(aarch64_fdot_prod_lanevnx4sfvnx8hf): Likewise.
(@aarch64_sve_<optab><VNx16QI_ONLY:mode><VNx16SI_ONLY:mode>): Likewise.
(@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8SI_ONLY:mode>): Likewise.
(@aarch64_sve_<optab><VNx8HI_ONLY:mode><VNx8DI_ONLY:mode>): Likewise.
(truncvnx8sf<mode>2, @aarch64_sve_cvtn<mode>): Likewise.
(<optab><v_int_equiv><mode>2, <optab><mode><v_int_equiv>2): Likewise.
(@aarch64_sve_sel<mode>): Likewise.
(@aarch64_sve_while<while_optab_cmp>_b<BHSD_BITS>_x2): Likewise.
(@aarch64_sve_while<while_optab_cmp>_c<BHSD_BITS>): Likewise.
(@aarch64_pred_<optab><mode>, @cond_<optab><mode>): Likewise.
(@aarch64_sve_<optab><mode>): Likewise.
* config/aarch64/aarch64-sme.md (@aarch64_sme_<optab><mode><mode>)
(*aarch64_sme_<optab><mode><mode>_plus, @aarch64_sme_read<mode>)
(*aarch64_sme_read<mode>_plus, @aarch64_sme_write<mode>): New patterns.
(*aarch64_sme_write<mode>_plus aarch64_sme_zero_zt0): Likewise.
(@aarch64_sme_<optab><mode>, *aarch64_sme_<optab><mode>_plus)
(@aarch64_sme_single_<optab><mode>): Likewise.
(*aarch64_sme_single_<optab><mode>_plus): Likewise.
(@aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_single_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>)
(*aarch64_sme_single_sudot<VNx4SI_ONLY:mode><SME_ZA_BIx24:mode>_plus)
(@aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_lane_<optab><SME_ZA_SDI:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_BHI:mode>_plus)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>)
(*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx24:mode>_plus)
(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>)
(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_BHIx124:mode>)
(@aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>)
(*aarch64_sme_<optab><VNx2DI_ONLY:mode><VNx8HI_ONLY:mode>_plus)
(@aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>)
(*aarch64_sme_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus)
(@aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>)
(*aarch64_sme_single_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx24:mode>_plus)
(@aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>)
(*aarch64_sme_lane_<optab><VNx2DI_ONLY:mode><SME_ZA_HIx124:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx8HI_ONLY:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx4SI_ONLY:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
(@aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
(*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>)
(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx24:mode>_plus)
(@aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(*aarch64_sme_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus)
(@aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(*aarch64_sme_single_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>_plus)
(@aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(*aarch64_sme_lane_<optab><SME_ZA_SDF_I:mode><SME_ZA_SDFx24:mode>)
(@aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>)
(*aarch64_sme_<optab><VNx4SI_ONLY:mode><SVE_FULL_HF:mode>_plus)
(@aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>)
(*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_HFx124:mode>)
(@aarch64_sme_lut<LUTI_BITS><mode>): Likewise.
(UNSPEC_SME_LUTI): New unspec.
* config/aarch64/aarch64-sve-builtins.def (single): New mode suffix.
(c8, c16, c32, c64): New type suffixes.
(vg1x2, vg1x4, vg2, vg2x1, vg2x2, vg2x4, vg4, vg4x1, vg4x2)
(vg4x4): New group suffixes.
* config/aarch64/aarch64-sve-builtins.h (CP_READ_ZT0)
(CP_WRITE_ZT0): New constants.
(get_svbool_t): Delete.
(function_resolver::report_mismatched_num_vectors): New member
function.
(function_resolver::resolve_conversion): Likewise.
(function_resolver::infer_predicate_type): Likewise.
(function_resolver::infer_64bit_scalar_integer_pair): Likewise.
(function_resolver::require_matching_predicate_type): Likewise.
(function_resolver::require_nonscalar_type): Likewise.
(function_resolver::finish_opt_single_resolution): Likewise.
(function_resolver::require_derived_vector_type): Add an
expected_num_vectors parameter.
(function_expander::map_to_rtx_codes): Add an extra parameter
for unconditional FP unspecs.
(function_instance::gp_type_index): New member function.
(function_instance::gp_type): Likewise.
(function_instance::gp_mode): Handle multi-vector operations.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_count)
(TYPES_all_pred_count, TYPES_c, TYPES_bhs_data, TYPES_bhs_widen)
(TYPES_hs_data, TYPES_cvt_h_s_float, TYPES_cvt_s_s, TYPES_qcvt_x2)
(TYPES_qcvt_x4, TYPES_qrshr_x2, TYPES_qrshru_x2, TYPES_qrshr_x4)
(TYPES_qrshru_x4, TYPES_while_x, TYPES_while_x_c, TYPES_s_narrow_fsu)
(TYPES_za_s_b_signed, TYPES_za_s_b_unsigned, TYPES_za_s_b_integer)
(TYPES_za_s_h_integer, TYPES_za_s_h_data, TYPES_za_s_unsigned)
(TYPES_za_s_float, TYPES_za_s_data, TYPES_za_d_h_integer): New type
macros.
(groups_x2, groups_x12, groups_x4, groups_x24, groups_x124)
(groups_vg1x2, groups_vg1x4, groups_vg1x24, groups_vg2, groups_vg4)
(groups_vg24): New group arrays.
(function_instance::reads_global_state_p): Handle CP_READ_ZT0.
(function_instance::modifies_global_state_p): Handle CP_WRITE_ZT0.
(add_shared_state_attribute): Handle zt0 state.
(function_builder::add_overloaded_functions): Skip MODE_single
for non-tuple groups.
(function_resolver::report_mismatched_num_vectors): New function.
(function_resolver::resolve_to): Add a fallback error message for
the general two-type case.
(function_resolver::resolve_conversion): New function.
(function_resolver::infer_predicate_type): Likewise.
(function_resolver::infer_64bit_scalar_integer_pair): Likewise.
(function_resolver::require_matching_predicate_type): Likewise.
(function_resolver::require_matching_vector_type): Specifically
diagnose mismatched vector counts.
(function_resolver::require_derived_vector_type): Add an
expected_num_vectors parameter. Extend to handle cases where
tuples are expected.
(function_resolver::require_nonscalar_type): New function.
(function_resolver::check_gp_argument): Use gp_type_index rather
than hard-coding VECTOR_TYPE_svbool_t.
(function_resolver::finish_opt_single_resolution): New function.
(function_checker::require_immediate_either_or): Remove hard-coded
constants.
(function_expander::direct_optab_handler): New function.
(function_expander::use_pred_x_insn): Only add a strictness flag
if the insn has an operand for it.
(function_expander::map_to_rtx_codes): Take an unconditional
FP unspec as an extra parameter. Handle tuples and MODE_single.
(function_expander::map_to_unspecs): Handle tuples and MODE_single.
* config/aarch64/aarch64-sve-builtins-functions.h (read_zt0)
(write_zt0): New typedefs.
(full_width_access::memory_vector): Use the function's
vectors_per_tuple.
(rtx_code_function_base): Add an optional unconditional FP unspec.
(rtx_code_function::expand): Update accordingly.
(rtx_code_function_rotated::expand): Likewise.
(unspec_based_function_exact_insn::expand): Use tuple_mode instead
of vector_mode.
(unspec_based_uncond_function): New typedef.
(cond_or_uncond_unspec_function): New class.
(sme_1mode_function::expand): Handle single forms.
(sme_2mode_function_t): Likewise, adding a template parameter for them.
(sme_2mode_function): Update accordingly.
(sme_2mode_lane_function): New typedef.
(multireg_permute): New class.
(class integer_conversion): Likewise.
(while_comparison::expand): Handle svcount_t and svboolx2_t results.
* config/aarch64/aarch64-sve-builtins-shapes.h
(binary_int_opt_single_n, binary_opt_single_n, binary_single)
(binary_za_slice_lane, binary_za_slice_int_opt_single)
(binary_za_slice_opt_single, binary_za_slice_uint_opt_single)
(binaryx, clamp, compare_scalar_count, count_pred_c)
(dot_za_slice_int_lane, dot_za_slice_lane, dot_za_slice_uint_lane)
(extract_pred, inherent_zt, ldr_zt, read_za, read_za_slice)
(select_pred, shift_right_imm_narrowxn, storexn, str_zt)
(unary_convertxn, unary_za_slice, unaryxn, write_za)
(write_za_slice): Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(za_group_is_pure_overload): New function.
(apply_predication): Use the function's gp_type for the predicate,
instead of hard-coding the use of svbool_t.
(parse_element_type): Add support for "c" (svcount_t).
(parse_type): Add support for "c0" and "c1" (conversion destination
and source types).
(binary_za_slice_lane_base): New class.
(binary_za_slice_opt_single_base): Likewise.
(load_contiguous_base::resolve): Pass the group suffix to r.resolve.
(luti_lane_zt_base): New class.
(binary_int_opt_single_n, binary_opt_single_n, binary_single)
(binary_za_slice_lane, binary_za_slice_int_opt_single)
(binary_za_slice_opt_single, binary_za_slice_uint_opt_single)
(binaryx, clamp): New shapes.
(compare_scalar_def::build): Allow the return type to be a tuple.
(compare_scalar_def::expand): Pass the group suffix to r.resolve.
(compare_scalar_count, count_pred_c, dot_za_slice_int_lane)
(dot_za_slice_lane, dot_za_slice_uint_lane, extract_pred, inherent_zt)
(ldr_zt, read_za, read_za_slice, select_pred, shift_right_imm_narrowxn)
(storexn, str_zt): New shapes.
(ternary_qq_lane_def, ternary_qq_opt_n_def): Replace with...
(ternary_qq_or_011_lane_def, ternary_qq_opt_n_or_011_def): ...these
new classes. Allow a second suffix that specifies the type of the
second vector argument, and that is used to derive the third.
(unary_def::build): Extend to handle tuple types.
(unary_convert_def::build): Use the new c0 and c1 format specifiers.
(unary_convertxn, unary_za_slice, unaryxn, write_za): New shapes.
(write_za_slice): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svbic_impl::expand)
(svext_bhw_impl::expand): Update call to map_to_rtx_codes.
(svcntp_impl::expand): Handle svcount_t variants.
(svcvt_impl::expand): Handle unpredicated conversions separately,
dealing with tuples.
(svdot_impl::expand): Handle 2-way dot products.
(svdotprod_lane_impl::expand): Likewise.
(svld1_impl::fold): Punt on tuple loads.
(svld1_impl::expand): Handle tuple loads.
(svldnt1_impl::expand): Likewise.
(svpfalse_impl::fold): Punt on svcount_t forms.
(svptrue_impl::fold): Likewise.
(svptrue_impl::expand): Handle svcount_t forms.
(svrint_impl): New class.
(svsel_impl::fold): Punt on tuple forms.
(svsel_impl::expand): Handle tuple forms.
(svst1_impl::fold): Punt on tuple loads.
(svst1_impl::expand): Handle tuple loads.
(svstnt1_impl::expand): Likewise.
(svwhilelx_impl::fold): Punt on tuple forms.
(svdot_lane): Use UNSPEC_FDOT.
(svmax, svmaxnm, svmin, svminnm): Add unconditional FP unspecs.
(rinta, rinti, rintm, rintn, rintp, rintx, rintz): Use svrint_impl.
* config/aarch64/aarch64-sve-builtins-base.def (svcreate2, svget2)
(svset2, svundef2): Add _b variants.
(svcvt): Use unary_convertxn.
(svdot): Use ternary_qq_opt_n_or_011.
(svdot_lane): Use ternary_qq_or_011_lane.
(svmax, svmaxnm, svmin, svminnm): Use binary_opt_single_n.
(svpfalse): Add a form that returns svcount_t results.
(svrinta, svrintm, svrintn, svrintp): Use unaryxn.
(svsel): Use binaryxn.
(svst1, svstnt1): Use storexn.
* config/aarch64/aarch64-sve-builtins-sme.h
(svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za)
(svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt)
(svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za)
(svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za)
(svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za)
(svvdot_lane_za, svwrite_za, svzero_zt): Declare.
* config/aarch64/aarch64-sve-builtins-sme.cc (load_store_za_base):
Rename to...
(load_store_za_zt0_base): ...this and extend to tuples.
(load_za_base, store_za_base): Update accordingly.
(expand_ldr_str_zt0): New function.
(svldr_zt_impl, svluti_lane_zt_impl, svread_za_impl, svstr_zt_impl)
(svsudot_za_impl, svwrite_za_impl, svzero_zt_impl): New classes.
(svadd_za, svadd_write_za, svbmopa_za, svbmops_za, svdot_za)
(svdot_lane_za, svldr_zt, svluti2_lane_zt, svluti4_lane_zt)
(svmla_za, svmla_lane_za, svmls_za, svmls_lane_za, svread_za)
(svstr_zt, svsub_za, svsub_write_za, svsudot_za, svsudot_lane_za)
(svsuvdot_lane_za, svusdot_za, svusdot_lane_za, svusvdot_lane_za)
(svvdot_lane_za, svwrite_za, svzero_zt): New functions.
* config/aarch64/aarch64-sve-builtins-sme.def: Add SME2 intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.h
(svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp)
(svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn)
(svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip)
(svzipq): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svclamp_impl)
(svcvtn_impl, svpext_impl, svpsel_impl): New classes.
(svqrshl_impl::fold): Update for change to svrshl shape.
(svrshl_impl::fold): Punt on tuple forms.
(svsqadd_impl::expand): Update call to map_to_rtx_codes.
(svunpk_impl): New class.
(svbfmlslb, svbfmlslb_lane, svbfmlslt, svbfmlslt_lane, svclamp)
(svcvtn, svpext, svpsel, svqcvt, svqcvtn, svqrshr, svqrshrn)
(svqrshru, svqrshrun, svrevd, svunpk, svuzp, svuzpq, svzip)
(svzipq): New functions.
* config/aarch64/aarch64-sve-builtins-sve2.def: Add SME2 intrinsics.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
or undefine __ARM_FEATURE_SME2.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Provide a way
for test functions to share ZT0.
(ATTR): Update accordingly.
(TEST_LOAD_COUNT, TEST_STORE_COUNT, TEST_PN, TEST_COUNT_PN)
(TEST_EXTRACT_PN, TEST_SELECT_P, TEST_COMPARE_S_X2, TEST_COMPARE_S_C)
(TEST_CREATE_B, TEST_GET_B, TEST_SET_B, TEST_XN, TEST_XN_SINGLE)
(TEST_XN_SINGLE_Z15, TEST_XN_SINGLE_AWKWARD, TEST_X2_NARROW)
(TEST_X4_NARROW): New macros.
* gcc.target/aarch64/sve/acle/asm/create2_1.c: Add _b tests.
* gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Remove
test for svmopa that becomes valid with SME2.
* gcc.target/aarch64/sve/acle/general-c/create_1.c: Adjust for
existence of svboolx2_t version of svcreate2.
* gcc.target/aarch64/sve/acle/general-c/store_1.c: Adjust error
messages to account for svcount_t predication.
* gcc.target/aarch64/sve/acle/general-c/store_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_qq_lane_1.c: Adjust
error messages to account for new SME2 variants.
* gcc.target/aarch64/sve/acle/general-c/ternary_qq_opt_n_2.c: Likewise.

SME2 has some instructions that operate on pairs of predicates.
The SME2 ACLE defines an svboolx2_t type for the associated
intrinsics.
The patch uses a double-width predicate mode, VNx32BI, to represent
the contents, similarly to how data vector tuples work. At present
there doesn't seem to be any need to define pairs for VNx2BI,
VNx4BI and VNx8BI.
We already supported pairs of svbool_ts at the PCS level, as part
of a more general framework. All that changes on the PCS side is
that we now have an associated mode.
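As a usage sketch (using the _b tuple forms that the SME2 intrinsics
patch above adds):

  #include <arm_sve.h>

  svboolx2_t
  make_pair (svbool_t a, svbool_t b)
  {
    return svcreate2_b (a, b);   /* the pair is held in VNx32BI */
  }

  svbool_t
  first_half (svboolx2_t p)
  {
    return svget2_b (p, 0);
  }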
gcc/
* config/aarch64/aarch64-modes.def (VNx32BI): New mode.
* config/aarch64/aarch64-protos.h (aarch64_split_double_move): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(register_tuple_type): Handle tuples of predicates.
(handle_arm_sve_h): Define svboolx2_t as a pair of two svbool_ts.
* config/aarch64/aarch64-sve.md (movvnx32bi): New insn.
* config/aarch64/aarch64.cc
(pure_scalable_type_info::piece::get_rtx): Use VNx32BI for pairs
of predicates.
(pure_scalable_type_info::add_piece): Don't try to form pairs of
predicates.
(VEC_STRUCT): Generalize comment.
(aarch64_classify_vector_mode): Handle VNx32BI.
(aarch64_array_mode): Likewise. Return BLKmode for arrays of
predicates that have no associated mode, rather than allowing
an integer mode to be chosen.
(aarch64_hard_regno_nregs): Handle VNx32BI.
(aarch64_hard_regno_mode_ok): Likewise.
(aarch64_split_double_move): New function, split out from...
(aarch64_split_128bit_move): ...here.
(aarch64_ptrue_reg): Tighten assert to aarch64_sve_pred_mode_p.
(aarch64_pfalse_reg): Likewise.
(aarch64_sve_same_pred_for_ptest_p): Likewise.
(aarch64_sme_mode_switch_regs::add_reg): Handle VNx32BI.
(aarch64_expand_mov_immediate): Restrict handling of boolean vector
constants to single-predicate modes.
(aarch64_classify_address): Handle VNx32BI, ensuring that both halves
can be addressed.
(aarch64_class_max_nregs): Handle VNx32BI.
(aarch64_member_type_forces_blk): Don't force BLKmode for svboolx2_t.
(aarch64_simd_valid_immediate): Allow all-zeros and all-ones for
VNx32BI.
(aarch64_mov_operand_p): Restrict predicate constant canonicalization
to single-predicate modes.
(aarch64_evpc_ext): Generalize exclusion to all predicate modes.
(aarch64_evpc_rev_local, aarch64_evpc_dup): Likewise.
* config/aarch64/constraints.md (PR_REGS): New predicate.
gcc/testsuite/
* gcc.target/aarch64/sve/pcs/struct_3_128.c (test_nonpst3): Adjust
stack offsets.
(ret_nonpst3): Remove XFAIL.
* gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c: New test.

Following on from the previous Advanced SIMD patch, this one divides
SVE instructions into non-streaming and streaming-compatible groups.
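As an illustration (not from the patch), a streaming-compatible function
may use contiguous loads, but not non-streaming-only operations such as
gather loads or the FFR:

  #include <arm_sve.h>

  svfloat32_t
  load_sc (svbool_t pg, const float *ptr) __arm_streaming_compatible
  {
    return svld1 (pg, ptr);   /* plain LD1W is valid in streaming mode */
  }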
gcc/
* config/aarch64/aarch64.h (TARGET_NON_STREAMING): New macro.
(TARGET_SVE2_AES, TARGET_SVE2_BITPERM): Use it.
(TARGET_SVE2_SHA3, TARGET_SVE2_SM4): Likewise.
* config/aarch64/aarch64-sve-builtins-base.def: Separate out
the functions that require PSTATE.SM to be 0 and guard them
with AARCH64_FL_SM_OFF.
* config/aarch64/aarch64-sve-builtins-sve2.def: Likewise.
* config/aarch64/aarch64-sve-builtins.cc (check_required_extensions):
Enforce AARCH64_FL_SM_OFF requirements.
* config/aarch64/aarch64-sve.md (aarch64_wrffr): Require
TARGET_NON_STREAMING.
(aarch64_rdffr, aarch64_rdffr_z, *aarch64_rdffr_z_ptest): Likewise.
(*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc)
(@aarch64_ld<fn>f1<mode>): Likewise.
(@aarch64_ld<fn>f1_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>)
(gather_load<mode><v_int_container>): Likewise.
(mask_gather_load<mode><v_int_container>): Likewise.
(mask_gather_load<mode><v_int_container>): Likewise.
(*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
(*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
(*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>)
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked)
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_sxtw): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_uxtw): Likewise.
(@aarch64_ldff1_gather<mode>, @aarch64_ldff1_gather<mode>): Likewise.
(*aarch64_ldff1_gather<mode>_sxtw): Likewise.
(*aarch64_ldff1_gather<mode>_uxtw): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
<VNx4_NARROW:mode>): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_sxtw): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_uxtw): Likewise.
(@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx4SI_ONLY:mode>)
(@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>)
(*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_sxtw)
(*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_uxtw)
(scatter_store<mode><v_int_container>): Likewise.
(mask_scatter_store<mode><v_int_container>): Likewise.
(*mask_scatter_store<mode><v_int_container>_<su>xtw_unpacked)
(*mask_scatter_store<mode><v_int_container>_sxtw): Likewise.
(*mask_scatter_store<mode><v_int_container>_uxtw): Likewise.
(@aarch64_scatter_store_trunc<VNx4_NARROW:mode><VNx4_WIDE:mode>)
(@aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>)
(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_sxtw)
(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_uxtw)
(@aarch64_sve_ld1ro<mode>, @aarch64_adr<mode>): Likewise.
(*aarch64_adr_sxtw, *aarch64_adr_uxtw_unspec): Likewise.
(*aarch64_adr_uxtw_and, @aarch64_adr<mode>_shift): Likewise.
(*aarch64_adr<mode>_shift, *aarch64_adr_shift_sxtw): Likewise.
(*aarch64_adr_shift_uxtw, @aarch64_sve_add_<optab><vsi2qi>): Likewise.
(@aarch64_sve_<sve_fp_op><mode>, fold_left_plus_<mode>): Likewise.
(mask_fold_left_plus_<mode>, @aarch64_sve_compact<mode>): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>)
(@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
<SVE_PARTIAL_I:mode>): Likewise.
(@aarch64_sve2_histcnt<mode>, @aarch64_sve2_histseg<mode>): Likewise.
(@aarch64_pred_<SVE2_MATCH:sve_int_op><mode>): Likewise.
(*aarch64_pred_<SVE2_MATCH:sve_int_op><mode>_cc): Likewise.
(*aarch64_pred_<SVE2_MATCH:sve_int_op><mode>_ptest): Likewise.
* config/aarch64/iterators.md (SVE_FP_UNARY_INT): Make FEXPA
depend on TARGET_NON_STREAMING.
(SVE_BFLOAT_TERNARY_LONG): Likewise for BFMMLA.
gcc/testsuite/
* g++.target/aarch64/sve/aarch64-ssve.exp: New harness.
* g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Add
-DSTREAMING_COMPATIBLE to the list of options.
* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
* gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
Fix pasto in variable name.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Mark functions
as streaming-compatible if STREAMING_COMPATIBLE is defined.
* gcc.target/aarch64/sve/acle/asm/adda_f16.c: Disable for
streaming-compatible code.
* gcc.target/aarch64/sve/acle/asm/adda_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adda_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adrb.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adrd.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adrh.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adrw.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/compact_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/expa_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/expa_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/expa_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mmla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mmla_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mmla_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mmla_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfb_gather.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfd_gather.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfh_gather.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfw_gather.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/rdffr_1.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tmad_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tmad_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tmad_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tsmul_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tsmul_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tsmul_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tssel_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tssel_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tssel_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/usmmla_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bdep_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bdep_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bdep_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bdep_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bext_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bext_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bext_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bext_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histseg_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/histseg_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/match_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/match_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/match_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/match_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rax1_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rax1_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c: Likewise.
|
|
SME2 adds a number of intrinsics that operate on tuples of 2 and 4
vectors. The ACLE therefore extends the existing svreinterpret
intrinsics to handle tuples as well.
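As a minimal sketch (not taken from the patch itself), the extended
intrinsics allow reinterpreting a two-vector tuple via the overloaded
form; this assumes a compiler with the extended ACLE support described
above:
#include <arm_sve.h>
/* Reinterpret an svint32x2_t tuple as svuint16x2_t.  */
svuint16x2_t
reinterpret_x2 (svint32x2_t x)
{
  return svreinterpret_u16 (x);
}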
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svreinterpret_impl::fold): Punt on tuple forms.
(svreinterpret_impl::expand): Use tuple_mode instead of vector_mode.
* config/aarch64/aarch64-sve-builtins-base.def (svreinterpret):
Extend to x1234 groups.
* config/aarch64/aarch64-sve-builtins-functions.h
(multi_vector_function::vectors_per_tuple): If the function has
a group suffix, get the number of vectors from there.
* config/aarch64/aarch64-sve-builtins-shapes.h (reinterpret): Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc (reinterpret_def)
(reinterpret): New function shape.
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Handle
DEF_SVE_FUNCTION_GS.
* config/aarch64/aarch64-sve-builtins.def (DEF_SVE_FUNCTION_GS): New
macro.
(DEF_SVE_FUNCTION): Forward to DEF_SVE_FUNCTION_GS by default.
* config/aarch64/aarch64-sve-builtins.h
(function_instance::tuple_mode): New member function.
(function_base::vectors_per_tuple): Take the function instance
as argument and get the number from the group suffix.
(function_instance::vectors_per_tuple): Update accordingly.
* config/aarch64/iterators.md (SVE_FULLx2, SVE_FULLx3, SVE_FULLx4)
(SVE_ALL_STRUCT): New mode iterators.
(SVE_STRUCT): Redefine in terms of SVE_FULL*.
* config/aarch64/aarch64-sve.md (@aarch64_sve_reinterpret<mode>)
(*aarch64_sve_reinterpret<mode>): Extend to SVE structure modes.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_XN):
New macro.
* gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c: Add tests for
tuple forms.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c: Likewise.
|
|
This adds an implementation for masked copysign along with an optimized
pattern for masked copysign (x, -1).
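As an illustrative sketch (not from the patch), a loop of this shape
could exercise the masked copysign pattern: the -fabs form is
canonicalized to copysign (x, -1) and the condition supplies the mask.
#include <math.h>
void
f (double *restrict r, double *restrict a, double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = b[i] > 0.0 ? -fabs (a[i]) : a[i];
}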
gcc/ChangeLog:
PR tree-optimization/109154
* config/aarch64/aarch64-sve.md (cond_copysign<mode>): New.
gcc/testsuite/ChangeLog:
PR tree-optimization/109154
* gcc.target/aarch64/sve/fneg-abs_5.c: New test.
|
|
copysign (x, -1) is effectively fneg (abs (x)), which on AArch64 can be
done most efficiently with an OR of the sign bit.
The middle-end now canonicalizes fneg (abs (x)) to copysign, so this
patch optimizes the expansion accordingly.
If the target has an inclusive-OR that takes an immediate, then the transformed
instruction is both shorter and faster. For those that don't, the immediate
has to be separately constructed, but this still ends up being faster as the
immediate construction is not on the critical path.
Note that this is part of another patch series; the additional testcases
are mutually dependent on the match.pd patch, so the tests are added
there instead of here.
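As a small illustration of the identity being expanded (assumptions
hedged: exact instruction selection depends on the target flags),
copysign (x, -1.0) is -fabs (x), i.e. the value with its sign bit
forced on, which on AArch64 is expected to become a single ORR with
the sign-bit immediate:
#include <math.h>
double
negabs (double x)
{
  return copysign (x, -1.0);
}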
gcc/ChangeLog:
PR tree-optimization/109154
* config/aarch64/aarch64.md (copysign<GPF:mode>3): Handle
copysign (x, -1).
* config/aarch64/aarch64-simd.md (copysign<mode>3): Likewise.
* config/aarch64/aarch64-sve.md (copysign<mode>3): Likewise.
|
|
Hi all,
this patch converts a number of multi-choice patterns within the
aarch64 backend to the new syntax.
The list of the converted patterns is in the Changelog.
For completeness, here follows the list of multi-choice patterns that
were rejected for conversion by my parser; they typically have some C
as asm output and require some manual intervention:
aarch64_simd_vec_set<mode>, aarch64_get_lane<mode>,
aarch64_cm<optab>di, aarch64_cm<optab>di, aarch64_cmtstdi,
*aarch64_movv8di, *aarch64_be_mov<mode>, *aarch64_be_movci,
*aarch64_be_mov<mode>, *aarch64_be_movxi, *aarch64_sve_mov<mode>_le,
*aarch64_sve_mov<mode>_be, @aarch64_pred_mov<mode>,
@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx4SI_ONLY:mode>,
@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>,
*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_sxtw,
*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_uxtw,
@aarch64_vec_duplicate_vq<mode>_le, *vec_extract<mode><Vel>_0,
*vec_extract<mode><Vel>_v128, *cmp<cmp_op><mode>_and,
*fcm<cmp_op><mode>_and_combine, @aarch64_sve_ext<mode>,
@aarch64_sve2_<su>aba<mode>, *sibcall_insn, *sibcall_value_insn,
*xor_one_cmpl<mode>3, *insv_reg<mode>_<SUBDI_BITS>,
*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>,
*aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS>, *aarch64_bfxil<mode>,
*aarch64_bfxilsi_uxtw,
*aarch64_<su_optab>cvtf<fcvt_target><GPF:mode>2_mult,
atomic_store<mode>.
Bootstrapped and regtested on aarch64-unknown-linux-gnu. I also
analysed tmp-mddump.md (from 'make mddump') and could not find
effective differences. Okay for trunk?
Bests
Andrea
gcc/ChangeLog:
* config/aarch64/aarch64.md (@ccmp<CC_ONLY:mode><GPI:mode>)
(@ccmp<CC_ONLY:mode><GPI:mode>_rev, *call_insn, *call_value_insn)
(*mov<mode>_aarch64, load_pair_sw_<SX:mode><SX2:mode>)
(load_pair_dw_<DX:mode><DX2:mode>)
(store_pair_sw_<SX:mode><SX2:mode>)
(store_pair_dw_<DX:mode><DX2:mode>, *extendsidi2_aarch64)
(*zero_extendsidi2_aarch64, *load_pair_zero_extendsidi2_aarch64)
(*extend<SHORT:mode><GPI:mode>2_aarch64)
(*zero_extend<SHORT:mode><GPI:mode>2_aarch64)
(*extendqihi2_aarch64, *zero_extendqihi2_aarch64)
(*add<mode>3_aarch64, *addsi3_aarch64_uxtw, *add<mode>3_poly_1)
(add<mode>3_compare0, *addsi3_compare0_uxtw)
(*add<mode>3_compareC_cconly, add<mode>3_compareC)
(*add<mode>3_compareV_cconly_imm, add<mode>3_compareV_imm)
(*add<mode>3nr_compare0, subdi3, subv<GPI:mode>_imm)
(*cmpv<GPI:mode>_insn, sub<mode>3_compare1_imm, neg<mode>2)
(cmp<mode>, fcmp<mode>, fcmpe<mode>, *cmov<mode>_insn)
(*cmovsi_insn_uxtw, <optab><mode>3, *<optab>si3_uxtw)
(*and<mode>3_compare0, *andsi3_compare0_uxtw, one_cmpl<mode>2)
(*<NLOGICAL:optab>_one_cmpl<mode>3, *and<mode>3nr_compare0)
(*aarch64_ashl_sisd_or_int_<mode>3)
(*aarch64_lshr_sisd_or_int_<mode>3)
(*aarch64_ashr_sisd_or_int_<mode>3, *ror<mode>3_insn)
(*<optab>si3_insn_uxtw, <optab>_trunc<fcvt_target><GPI:mode>2)
(<optab><fcvt_target><GPF:mode>2)
(<FCVT_F2FIXED:fcvt_fixed_insn><GPF:mode>3)
(<FCVT_FIXED2F:fcvt_fixed_insn><GPI:mode>3)
(*aarch64_<optab><mode>3_cssc, copysign<GPF:mode>3_insn): Update
to new syntax.
* config/aarch64/aarch64-sve2.md (@aarch64_scatter_stnt<mode>)
(@aarch64_scatter_stnt_<SVE_FULL_SDI:mode><SVE_PARTIAL_I:mode>)
(*aarch64_mul_unpredicated_<mode>)
(@aarch64_pred_<sve_int_op><mode>, *cond_<sve_int_op><mode>_2)
(*cond_<sve_int_op><mode>_3, *cond_<sve_int_op><mode>_any)
(*cond_<sve_int_op><mode>_z, @aarch64_pred_<sve_int_op><mode>)
(*cond_<sve_int_op><mode>_2, *cond_<sve_int_op><mode>_3)
(*cond_<sve_int_op><mode>_any, @aarch64_sve_<sve_int_op><mode>)
(@aarch64_sve_<sve_int_op>_lane_<mode>)
(@aarch64_sve_add_mul_lane_<mode>)
(@aarch64_sve_sub_mul_lane_<mode>, @aarch64_sve2_xar<mode>)
(*aarch64_sve2_bcax<mode>, @aarch64_sve2_eor3<mode>)
(*aarch64_sve2_nor<mode>, *aarch64_sve2_nand<mode>)
(*aarch64_sve2_bsl<mode>, *aarch64_sve2_nbsl<mode>)
(*aarch64_sve2_bsl1n<mode>, *aarch64_sve2_bsl2n<mode>)
(*aarch64_sve2_sra<mode>, @aarch64_sve_add_<sve_int_op><mode>)
(*aarch64_sve2_<su>aba<mode>, @aarch64_sve_add_<sve_int_op><mode>)
(@aarch64_sve_add_<sve_int_op>_lane_<mode>)
(@aarch64_sve_qadd_<sve_int_op><mode>)
(@aarch64_sve_qadd_<sve_int_op>_lane_<mode>)
(@aarch64_sve_sub_<sve_int_op><mode>)
(@aarch64_sve_sub_<sve_int_op>_lane_<mode>)
(@aarch64_sve_qsub_<sve_int_op><mode>)
(@aarch64_sve_qsub_<sve_int_op>_lane_<mode>)
(@aarch64_sve_<sve_fp_op><mode>, @aarch64_<sve_fp_op>_lane_<mode>)
(@aarch64_pred_<sve_int_op><mode>)
(@aarch64_pred_<sve_fp_op><mode>, *cond_<sve_int_op><mode>_2)
(*cond_<sve_int_op><mode>_z, @aarch64_sve_<optab><mode>)
(@aarch64_<optab>_lane_<mode>, @aarch64_sve_<optab><mode>)
(@aarch64_<optab>_lane_<mode>, @aarch64_pred_<sve_fp_op><mode>)
(*cond_<sve_fp_op><mode>_any_relaxed)
(*cond_<sve_fp_op><mode>_any_strict)
(@aarch64_pred_<sve_int_op><mode>, *cond_<sve_int_op><mode>)
(@aarch64_pred_<sve_fp_op><mode>, *cond_<sve_fp_op><mode>)
(*cond_<sve_fp_op><mode>_strict): Update to new syntax.
* config/aarch64/aarch64-sve.md (*aarch64_sve_mov<mode>_ldr_str)
(*aarch64_sve_mov<mode>_no_ldr_str, @aarch64_pred_mov<mode>)
(*aarch64_sve_mov<mode>, aarch64_wrffr)
(mask_scatter_store<mode><v_int_container>)
(*mask_scatter_store<mode><v_int_container>_<su>xtw_unpacked)
(*mask_scatter_store<mode><v_int_container>_sxtw)
(*mask_scatter_store<mode><v_int_container>_uxtw)
(@aarch64_scatter_store_trunc<VNx4_NARROW:mode><VNx4_WIDE:mode>)
(@aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>)
(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_sxtw)
(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_uxtw)
(*vec_duplicate<mode>_reg, vec_shl_insert_<mode>)
(vec_series<mode>, @extract_<last_op>_<mode>)
(@aarch64_pred_<optab><mode>, *cond_<optab><mode>_2)
(*cond_<optab><mode>_any, @aarch64_pred_<optab><mode>)
(@aarch64_sve_revbhw_<SVE_ALL:mode><PRED_HSD:mode>)
(@cond_<optab><mode>)
(*<optab><SVE_PARTIAL_I:mode><SVE_HSDI:mode>2)
(@aarch64_pred_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>)
(@aarch64_cond_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>)
(*cond_uxt<mode>_2, *cond_uxt<mode>_any, *cnot<mode>)
(*cond_cnot<mode>_2, *cond_cnot<mode>_any)
(@aarch64_pred_<optab><mode>, *cond_<optab><mode>_2_relaxed)
(*cond_<optab><mode>_2_strict, *cond_<optab><mode>_any_relaxed)
(*cond_<optab><mode>_any_strict, @aarch64_pred_<optab><mode>)
(*cond_<optab><mode>_2, *cond_<optab><mode>_3)
(*cond_<optab><mode>_any, add<mode>3, sub<mode>3)
(@aarch64_pred_<su>abd<mode>, *aarch64_cond_<su>abd<mode>_2)
(*aarch64_cond_<su>abd<mode>_3, *aarch64_cond_<su>abd<mode>_any)
(@aarch64_sve_<optab><mode>, @aarch64_pred_<optab><mode>)
(*cond_<optab><mode>_2, *cond_<optab><mode>_z)
(@aarch64_pred_<optab><mode>, *cond_<optab><mode>_2)
(*cond_<optab><mode>_3, *cond_<optab><mode>_any, <optab><mode>3)
(*cond_bic<mode>_2, *cond_bic<mode>_any)
(@aarch64_pred_<optab><mode>, *cond_<optab><mode>_2_const)
(*cond_<optab><mode>_any_const, *cond_<sve_int_op><mode>_m)
(*cond_<sve_int_op><mode>_z, *sdiv_pow2<mode>3)
(*cond_<sve_int_op><mode>_2, *cond_<sve_int_op><mode>_any)
(@aarch64_pred_<optab><mode>, *cond_<optab><mode>_2_relaxed)
(*cond_<optab><mode>_2_strict, *cond_<optab><mode>_any_relaxed)
(*cond_<optab><mode>_any_strict, @aarch64_pred_<optab><mode>)
(*cond_<optab><mode>_2_relaxed, *cond_<optab><mode>_2_strict)
(*cond_<optab><mode>_2_const_relaxed)
(*cond_<optab><mode>_2_const_strict)
(*cond_<optab><mode>_3_relaxed, *cond_<optab><mode>_3_strict)
(*cond_<optab><mode>_any_relaxed, *cond_<optab><mode>_any_strict)
(*cond_<optab><mode>_any_const_relaxed)
(*cond_<optab><mode>_any_const_strict)
(@aarch64_pred_<optab><mode>, *cond_add<mode>_2_const_relaxed)
(*cond_add<mode>_2_const_strict)
(*cond_add<mode>_any_const_relaxed)
(*cond_add<mode>_any_const_strict, @aarch64_pred_<optab><mode>)
(*cond_<optab><mode>_2_relaxed, *cond_<optab><mode>_2_strict)
(*cond_<optab><mode>_any_relaxed, *cond_<optab><mode>_any_strict)
(@aarch64_pred_<optab><mode>, *cond_sub<mode>_3_const_relaxed)
(*cond_sub<mode>_3_const_strict, *cond_sub<mode>_const_relaxed)
(*cond_sub<mode>_const_strict, *aarch64_pred_abd<mode>_relaxed)
(*aarch64_pred_abd<mode>_strict)
(*aarch64_cond_abd<mode>_2_relaxed)
(*aarch64_cond_abd<mode>_2_strict)
(*aarch64_cond_abd<mode>_3_relaxed)
(*aarch64_cond_abd<mode>_3_strict)
(*aarch64_cond_abd<mode>_any_relaxed)
(*aarch64_cond_abd<mode>_any_strict, @aarch64_pred_<optab><mode>)
(@aarch64_pred_fma<mode>, *cond_fma<mode>_2, *cond_fma<mode>_4)
(*cond_fma<mode>_any, @aarch64_pred_fnma<mode>)
(*cond_fnma<mode>_2, *cond_fnma<mode>_4, *cond_fnma<mode>_any)
(<sur>dot_prod<vsi2qi>, @aarch64_<sur>dot_prod_lane<vsi2qi>)
(@<sur>dot_prod<vsi2qi>, @aarch64_<sur>dot_prod_lane<vsi2qi>)
(@aarch64_sve_add_<optab><vsi2qi>, @aarch64_pred_<optab><mode>)
(*cond_<optab><mode>_2_relaxed, *cond_<optab><mode>_2_strict)
(*cond_<optab><mode>_4_relaxed, *cond_<optab><mode>_4_strict)
(*cond_<optab><mode>_any_relaxed, *cond_<optab><mode>_any_strict)
(@aarch64_<optab>_lane_<mode>, @aarch64_pred_<optab><mode>)
(*cond_<optab><mode>_4_relaxed, *cond_<optab><mode>_4_strict)
(*cond_<optab><mode>_any_relaxed, *cond_<optab><mode>_any_strict)
(@aarch64_<optab>_lane_<mode>, @aarch64_sve_tmad<mode>)
(@aarch64_sve_<sve_fp_op>vnx4sf)
(@aarch64_sve_<sve_fp_op>_lanevnx4sf)
(@aarch64_sve_<sve_fp_op><mode>, *vcond_mask_<mode><vpred>)
(@aarch64_sel_dup<mode>, @aarch64_pred_cmp<cmp_op><mode>)
(*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest)
(@aarch64_pred_fcm<cmp_op><mode>, @fold_extract_<last_op>_<mode>)
(@aarch64_fold_extract_vector_<last_op>_<mode>)
(@aarch64_sve_splice<mode>)
(@aarch64_sve_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>)
(@aarch64_sve_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>)
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_relaxed)
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_strict)
(*cond_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>)
(@aarch64_sve_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>)
(@aarch64_sve_<optab>_extend<VNx4SI_ONLY:mode><VNx2DF_ONLY:mode>)
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_relaxed)
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_strict)
(*cond_<optab>_extend<VNx4SI_ONLY:mode><VNx2DF_ONLY:mode>)
(@aarch64_sve_<optab>_trunc<SVE_FULL_SDF:mode><SVE_FULL_HSF:mode>)
(*cond_<optab>_trunc<SVE_FULL_SDF:mode><SVE_FULL_HSF:mode>)
(@aarch64_sve_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>)
(*cond_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>)
(@aarch64_sve_<optab>_nontrunc<SVE_FULL_HSF:mode><SVE_FULL_SDF:mode>)
(*cond_<optab>_nontrunc<SVE_FULL_HSF:mode><SVE_FULL_SDF:mode>)
(@aarch64_brk<brk_op>, *aarch64_sve_<inc_dec><mode>_cntp): Update
to new syntax.
* config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>)
(load_pair<DREG:mode><DREG2:mode>)
(vec_store_pair<DREG:mode><DREG2:mode>, aarch64_simd_stp<mode>)
(aarch64_simd_mov_from_<mode>low)
(aarch64_simd_mov_from_<mode>high, and<mode>3<vczle><vczbe>)
(ior<mode>3<vczle><vczbe>, aarch64_simd_ashr<mode><vczle><vczbe>)
(aarch64_simd_bsl<mode>_internal<vczle><vczbe>)
(*aarch64_simd_bsl<mode>_alt<vczle><vczbe>)
(aarch64_simd_bsldi_internal, aarch64_simd_bsldi_alt)
(store_pair_lanes<mode>, *aarch64_combine_internal<mode>)
(*aarch64_combine_internal_be<mode>, *aarch64_combinez<mode>)
(*aarch64_combinez_be<mode>)
(aarch64_cm<optab><mode><vczle><vczbe>, *aarch64_cm<optab>di)
(aarch64_cm<optab><mode><vczle><vczbe>, *aarch64_mov<mode>)
(*aarch64_be_mov<mode>, *aarch64_be_movoi): Update to new syntax.
|
|
In the following test:
svuint8_t ld(uint8_t *ptr) { return svld1rq(svptrue_b8(), ptr + 2); }
ptr + 2 is a valid address for an Advanced SIMD load, but not for
an SVE load. We therefore ended up generating:
ldr q0, [x0, 2]
dup z0.q, z0.q[0]
This patch makes us generate LD1RQ for that case too. It takes the
slightly old-school approach of making the predicate broader than
the constraint. That is: any valid memory address is accepted as
an operand before RA. If the instruction remains during RA, LRA will
coerce the address to match the constraint. If the instruction gets
split before RA, the splitter will load invalid addresses into a
scratch register.
gcc/
* config/aarch64/aarch64-sve.md (@aarch64_vec_duplicate_vq<mode>_le):
Accept all nonimmediate_operands, but keep the existing constraints.
If the instruction is split before RA, load invalid addresses into
a temporary register.
* config/aarch64/predicates.md (aarch64_sve_dup_ld1rq_operand): Delete.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/ld1rq_1.c: New test.
|
|
The architecture recommends that load-gather instructions avoid using the same
Z register for the load address and the destination, and the Software Optimization
Guides for Arm cores recommend that as well.
This means that for code like:
svuint64_t
food (svbool_t p, uint64_t *in, svint64_t offsets, svuint64_t a)
{
return svadd_u64_x (p, a, svld1_gather_offset(p, in, offsets));
}
we'll want to avoid generating the current:
food:
ld1d z0.d, p0/z, [x0, z0.d] // Z0 reused as input and output.
add z0.d, z1.d, z0.d
ret
However, we still want to avoid generating extra moves where there were
none before, so the tight aarch64-sve-acle.exp tests for load gathers
should still pass as they are.
This patch implements that recommendation for the load gather patterns by:
* duplicating the alternatives
* marking the output operand as early clobber
* tying the input Z register operand in the original alternatives to 0
* penalising the original alternatives with '?'
This results in a large-ish patch in terms of diff lines but the new
compact syntax (thanks Tamar) makes it quite a readable and regular change.
The benchmark numbers on a Neoverse V1 on fprate look okay:
diff
503.bwaves_r 0.00%
507.cactuBSSN_r 0.00%
508.namd_r 0.00%
510.parest_r 0.55%
511.povray_r 0.22%
519.lbm_r 0.00%
521.wrf_r 0.00%
526.blender_r 0.00%
527.cam4_r 0.56%
538.imagick_r 0.00%
544.nab_r 0.00%
549.fotonik3d_r 0.00%
554.roms_r 0.00%
fprate 0.10%
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (mask_gather_load<mode><v_int_container>):
Add alternatives to prefer to avoid same input and output Z register.
(mask_gather_load<mode><v_int_container>): Likewise.
(*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
(*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
(*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_sxtw): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_uxtw): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(*aarch64_ldff1_gather<mode>_sxtw): Likewise.
(*aarch64_ldff1_gather<mode>_uxtw): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
<VNx4_NARROW:mode>): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_sxtw): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_uxtw): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>): Likewise.
(@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
<SVE_PARTIAL_I:mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/gather_earlyclobber.c: New test.
* gcc.target/aarch64/sve2/gather_earlyclobber.c: New test.
|
|
This patch converts the SVE load gather patterns to the new compact syntax
that Tamar introduced. This allows for a future patch I want to contribute
to add more alternatives that are better viewed in the more compact form.
The lines in some patterns are >80 characters long now, but I think that's unavoidable
and those patterns already had overly long constraint strings.
No functional change intended.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (mask_gather_load<mode><v_int_container>):
Convert to compact alternatives syntax.
(mask_gather_load<mode><v_int_container>): Likewise.
(*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
(*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
(*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_sxtw): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_uxtw): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(*aarch64_ldff1_gather<mode>_sxtw): Likewise.
(*aarch64_ldff1_gather<mode>_uxtw): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
<VNx4_NARROW:mode>): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_sxtw): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_uxtw): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>): Likewise.
(@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
<SVE_PARTIAL_I:mode>): Likewise.
|
|
This reverts commit bb3c69058a5fb874ea3c5c26bfb331d33d0497c3.
|
|
This patch converts the SVE load gather patterns to the new compact syntax
that Tamar introduced. This allows for a future patch I want to contribute
to add more alternatives that are better viewed in the more compact form.
The lines in some patterns are >80 characters long now, but I think that's unavoidable
and those patterns already had overly long constraint strings.
No functional change intended.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (mask_gather_load<mode><v_int_container>):
Convert to compact alternatives syntax.
(mask_gather_load<mode><v_int_container>): Likewise.
(*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
(*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
(*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_sxtw): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_uxtw): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(*aarch64_ldff1_gather<mode>_sxtw): Likewise.
(*aarch64_ldff1_gather<mode>_uxtw): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
<VNx4_NARROW:mode>): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_sxtw): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_uxtw): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>): Likewise.
(@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
<SVE_PARTIAL_I:mode>): Likewise.
|
|
This patch adds new RTL and tests for sabd and uabd
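As a sketch, in the spirit of the new abd_2.c-style tests, loops of the
following shape are the kind the renamed <su>abd<mode>3 patterns can
match:
void
uabd (unsigned char *restrict r, unsigned char *restrict a,
      unsigned char *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
}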
PR tree-optimization/109156
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<su>abd<mode>):
Rename to <su>abd<mode>3.
* config/aarch64/aarch64-sve.md (<su>abd<mode>_3): Rename
to <su>abd<mode>3.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/abd.h: New file.
* gcc.target/aarch64/abd_2.c: New test.
* gcc.target/aarch64/abd_3.c: New test.
* gcc.target/aarch64/abd_4.c: New test.
* gcc.target/aarch64/abd_none_2.c: New test.
* gcc.target/aarch64/abd_none_3.c: New test.
* gcc.target/aarch64/abd_none_4.c: New test.
* gcc.target/aarch64/abd_run_1.c: New test.
* gcc.target/aarch64/sve/abd_1.c: New test.
* gcc.target/aarch64/sve/abd_none_1.c: New test.
* gcc.target/aarch64/sve/abd_2.c: New test.
* gcc.target/aarch64/sve/abd_none_2.c: New test.
|
|
REG_ALLOC_ORDER is much less important than it used to be, but it
is still used as a tie-breaker when multiple registers in a class
are equally good.
Previously aarch64 used the default approach of allocating in order
of increasing register number. But as the comment in the patch says,
it's better to allocate FP and predicate registers in the opposite
order, so that we don't eat into smaller register classes unnecessarily.
This fixes some existing FIXMEs and improves the register allocation
for some Arm ACLE code.
Doing this also showed that *vcond_mask_<mode><vpred> (predicated MOV/SEL)
unnecessarily required p0-p7 rather than p0-p15 for the unpredicated
movprfx alternatives. Only the predicated movprfx alternative requires
p0-p7 (due to the movprfx itself, rather than due to the main instruction).
gcc/
* config/aarch64/aarch64-protos.h (aarch64_adjust_reg_alloc_order):
Declare.
* config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
(ADJUST_REG_ALLOC_ORDER): Likewise.
* config/aarch64/aarch64.cc (aarch64_adjust_reg_alloc_order): New
function.
* config/aarch64/aarch64-sve.md (*vcond_mask_<mode><vpred>): Use
Upa rather than Upl for unpredicated movprfx alternatives.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/abd_f16.c: Remove XFAILs.
* gcc.target/aarch64/sve/acle/asm/abd_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/asr_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/asr_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dot_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dot_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dot_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dot_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f16_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f32_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f64_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f16_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f32_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f64_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalb_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c: Likewise.
|
|
SVE2 supports an unpredicated vector integer MUL form that we can emit from our SVE expanders
without using up a predicate register. This patch does so.
As the SVE MUL expansion currently is templated away through a code iterator I did not split it
off just for this case but instead special-cased it in the define_expand. It seemed somewhat less
invasive than the alternatives but I could split it off more explicitly if others want to.
The div-by-bitmask_1.c testcase is adjusted to expect this new MUL form.
Bootstrapped and tested on aarch64-none-linux-gnu.
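As a sketch (flags hedged: -O3 -march=armv8-a+sve2 or similar), a plain
multiplication loop like the following should now use the unpredicated
MUL form rather than tying up a predicate register:
void
mul_loop (int *restrict r, int *restrict a, int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] * b[i];
}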
gcc/ChangeLog:
PR target/109406
* config/aarch64/aarch64-sve.md (<optab><mode>3): Handle TARGET_SVE2 MUL
case.
* config/aarch64/aarch64-sve2.md (*aarch64_mul_unpredicated_<mode>): New
pattern.
gcc/testsuite/ChangeLog:
PR target/109406
* gcc.target/aarch64/sve2/div-by-bitmask_1.c: Adjust for unpredicated SVE2
MUL.
* gcc.target/aarch64/sve2/unpred_mul_1.c: New test.
|
|
With the SABDL and UABDL patterns converted, their accumulating forms UABAL and SABAL are not much more complicated.
There's an accumulator argument that we, err, accumulate into with a PLUS once all the widening is done.
Some necessary renaming of patterns relating to the removal of UNSPEC_SABAL and UNSPEC_UABAL is included.
Bootstrapped and tested on aarch64-none-linux-gnu.
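For illustration (not from the patch), a sum-of-absolute-differences
loop of the kind the <su>sad patterns target looks like:
int
sad (unsigned char *restrict a, unsigned char *restrict b)
{
  int sum = 0;
  for (int i = 0; i < 256; i++)
    sum += __builtin_abs (a[i] - b[i]);
  return sum;
}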
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>abal<mode>): Rename to...
(aarch64_<su>abal<mode>): ... This. Use RTL codes instead of unspec.
(<sur>sadv16qi): Rename to...
(<su>sadv16qi): ... This. Adjust for the above.
* config/aarch64/aarch64-sve.md (<sur>sad<vsi2qi>): Rename to...
(<su>sad<vsi2qi>): ... This. Adjust for the above.
* config/aarch64/aarch64.md (UNSPEC_SABAL, UNSPEC_UABAL): Delete.
* config/aarch64/iterators.md (ABAL): Delete.
(sur): Remove handling of UNSPEC_SABAL and UNSPEC_UABAL.
|
|
|
|
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (aarch64_vec_duplicate_vq<mode>_le):
Change to define_insn_and_split to fold ldr+dup to ld1rq.
* config/aarch64/predicates.md (aarch64_sve_dup_ld1rq_operand): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general/pr96463-2.c: Adjust.
|
|
Bit of a brown-paper-bag bug, but: GCC was generating
non-existent merging forms of BRKAS and BRKBS. Those
instructions only support zero predication (although
BRKA and BRKB support both).
gcc/
* config/aarch64/aarch64-sve.md (*aarch64_brk<brk_op>_cc): Remove
merging alternative.
(*aarch64_brk<brk_op>_ptest): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/brka_1.c: Expect a separate
PTEST instruction.
* gcc.target/aarch64/sve/acle/general/brkb_1.c: Likewise.
|
|
Unlike other flag-setting SVE instructions, BRKNS sets the flags
based on an all-true governing predicate, rather than the GP operand.
gcc/
* config/aarch64/iterators.md (SVE_BRKP): New iterator.
* config/aarch64/aarch64-sve.md (*aarch64_brkn_cc): New pattern.
(*aarch64_brkn_ptest): Likewise.
(*aarch64_brk<brk_op>_cc): Restrict to SVE_BRKP.
(*aarch64_brk<brk_op>_ptest): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/brkn_1.c: Expect separate
PTEST instructions.
* gcc.target/aarch64/sve/acle/general/brkn_2.c: New test.
|
|
There's no encoding for fcmuo with zero. This patch therefore stops the
combine patterns from accepting zero registers.
gcc/ChangeLog:
PR target/106524
* config/aarch64/aarch64-sve.md (*fcmuo<mode>_nor_combine,
*fcmuo<mode>_bic_combine): Don't accept comparisons against zero.
gcc/testsuite/ChangeLog:
PR target/106524
* gcc.target/aarch64/sve/pr106524.c: New test.
|
|
After the first patch in the series, this updates the optabs to expect
the canonical sequence.
gcc/ChangeLog:
PR tree-optimization/102819
PR tree-optimization/103169
* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4): Use
canonical order.
* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4): Likewise.
|
|
|
|
This patch adds support for reductions involving calls to fmax*()
and fmin*(), without the -ffast-math flags that allow them to be
converted to MAX_EXPR and MIN_EXPR.
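As a sketch, a reduction of the following shape can now vectorize via
REDUC_FMAX even without -ffast-math, since fmax has well-defined NaN
semantics:
#include <math.h>
double
reduc_fmax (double *restrict a, int n)
{
  double r = a[0];
  for (int i = 1; i < n; i++)
    r = fmax (r, a[i]);
  return r;
}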
gcc/
* doc/md.texi (reduc_fmin_scal_@var{m}): Document.
(reduc_fmax_scal_@var{m}): Likewise.
* optabs.def (reduc_fmax_scal_optab): New optab.
(reduc_fmin_scal_optab): Likewise
* internal-fn.def (REDUC_FMAX, REDUC_FMIN): New functions.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Handle
CASE_CFN_FMAX and CASE_CFN_FMIN.
(neutral_op_for_reduction): Likewise.
(needs_fold_left_reduction_p): Likewise.
* config/aarch64/iterators.md (FMAXMINV): New iterator.
(fmaxmin): Handle UNSPEC_FMAXNMV and UNSPEC_FMINNMV.
* config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>): Fix
unspec mode.
(reduc_<fmaxmin>_scal_<mode>): New pattern.
* config/aarch64/aarch64-sve.md (reduc_<fmaxmin>_scal_<mode>):
Likewise.
gcc/testsuite/
* gcc.dg/vect/vect-fmax-1.c: New test.
* gcc.dg/vect/vect-fmax-2.c: Likewise.
* gcc.dg/vect/vect-fmax-3.c: Likewise.
* gcc.dg/vect/vect-fmin-1.c: New test.
* gcc.dg/vect/vect-fmin-2.c: Likewise.
* gcc.dg/vect/vect-fmin-3.c: Likewise.
* gcc.target/aarch64/fmaxnm_1.c: Likewise.
* gcc.target/aarch64/fmaxnm_2.c: Likewise.
* gcc.target/aarch64/fminnm_1.c: Likewise.
* gcc.target/aarch64/fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/fmaxnm_2.c: Likewise.
* gcc.target/aarch64/sve/fmaxnm_3.c: Likewise.
* gcc.target/aarch64/sve/fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/fminnm_3.c: Likewise.
|
|
This patch adds conditional forms of FMAX and FMIN, following
the pattern for existing conditional binary functions.
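As an illustrative sketch (not from the patch), conditional code of
this shape could map to the new COND_FMAX internal function and hence
a predicated FMAXNM:
#include <math.h>
void
cond_fmax_loop (double *restrict r, double *restrict a,
                double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = b[i] > 0.0 ? fmax (a[i], b[i]) : a[i];
}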
gcc/
* doc/md.texi (cond_fmin@var{mode}, cond_fmax@var{mode}): Document.
* optabs.def (cond_fmin_optab, cond_fmax_optab): New optabs.
* internal-fn.def (COND_FMIN, COND_FMAX): New functions.
* internal-fn.c (first_commutative_argument): Handle them.
(FOR_EACH_COND_FN_PAIR): Likewise.
* match.pd (UNCOND_BINARY, COND_BINARY): Likewise.
* config/aarch64/aarch64-sve.md (cond_<fmaxmin><mode>): New
pattern.
gcc/testsuite/
* gcc.target/aarch64/sve/cond_fmaxnm_5.c: New test.
* gcc.target/aarch64/sve/cond_fmaxnm_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_6.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_7.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_8.c: Likewise.
* gcc.target/aarch64/sve/cond_fmaxnm_8_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_5.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_6.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_7.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_8.c: Likewise.
* gcc.target/aarch64/sve/cond_fminnm_8_run.c: Likewise.
|
|
There was some duplication between the maxmin_uns (uns for unspec
rather than unsigned) int attribute and the optab int attribute.
The difficulty for FMAXNM and FMINNM is that the instructions
really correspond to two things: the smax/smin optabs for floats
(used only for fast-math-like flags) and the fmax/fmin optabs
(used for built-in functions). The optab attribute was used
consistently for the former, but maxmin_uns had a mixture of both.
This patch renames maxmin_uns to fmaxmin and only uses it
for the fmax and fmin optabs. The reductions that previously
used the maxmin_uns attribute now use the optab attribute instead.
FMAX and FMIN are awkward in that they don't correspond to any
optab. It's nevertheless useful to define them alongside the
“real” optabs. Previously they were known as “smax_nan” and
“smin_nan”, but the problem with those names is that smax and
smin are only used for floats if NaNs don't matter. This patch
therefore uses fmax_nan and fmin_nan instead.
There is still some inconsistency, in that the optab attribute
handles UNSPEC_COND_FMAX but the fmaxmin attribute handles
UNSPEC_FMAX. This is because the SVE FP instructions, being
predicated, have to use unspecs in cases where the Advanced
SIMD ones could use rtl codes.
At least there are no duplicate entries though, so this seemed
like the best compromise for now.
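A hedged pair of examples may make the distinction concrete (illustrative
code, not from the patch):
/* Under fast-math-like flags, a plain maximum becomes MAX_EXPR and uses the
   smax optab (FMAXNM, without caring about NaN semantics)...  */
double g (double a, double b) { return a > b ? a : b; }
/* ...whereas the fmax() built-in always has IEEE NaN semantics and goes
   through the fmax optab (also FMAXNM, whose NaN handling matches fmax).  */
double h (double a, double b) { return __builtin_fmax (a, b); }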
gcc/
* config/aarch64/iterators.md (optab): Use fmax_nan instead of
smax_nan and fmin_nan instead of smin_nan.
(maxmin_uns): Rename to...
(fmaxmin): ...this and make the same changes. Remove entries
unrelated to fmax* and fmin*.
* config/aarch64/aarch64.md (<maxmin_uns><mode>3): Rename to...
(<fmaxmin><mode>3): ...this.
* config/aarch64/aarch64-simd.md (aarch64_<maxmin_uns>p<mode>):
Rename to...
(aarch64_<optab>p<mode>): ...this.
(<maxmin_uns><mode>3): Rename to...
(<fmaxmin><mode>3): ...this.
(reduc_<maxmin_uns>_scal_<mode>): Rename to...
(reduc_<optab>_scal_<mode>): ...this and update gen* call.
(aarch64_reduc_<maxmin_uns>_internal<mode>): Rename to...
(aarch64_reduc_<optab>_internal<mode>): ...this.
(aarch64_reduc_<maxmin_uns>_internalv2si): Rename to...
(aarch64_reduc_<optab>_internalv2si): ...this.
* config/aarch64/aarch64-sve.md (<maxmin_uns><mode>3): Rename to...
(<fmaxmin><mode>3): ...this.
* config/aarch64/aarch64-simd-builtins.def (smax_nan, smin_nan):
Rename to...
(fmax_nan, fmin_nan): ...this.
* config/aarch64/arm_neon.h (vmax_f32, vmax_f64, vmaxq_f32, vmaxq_f64)
(vmin_f32, vmin_f64, vminq_f32, vminq_f64, vmax_f16, vmaxq_f16)
(vmin_f16, vminq_f16): Update accordingly.
|
|
The following example
void f10(double * restrict z, double * restrict w, double * restrict x,
double * restrict y, int n)
{
for (int i = 0; i < n; i++) {
z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
}
}
generates currently:
ld1d z1.d, p1/z, [x1, x5, lsl 3]
fcmgt p2.d, p1/z, z1.d, #0.0
fcmgt p0.d, p3/z, z1.d, #0.0
ld1d z2.d, p2/z, [x2, x5, lsl 3]
bic p0.b, p3/z, p1.b, p0.b
ld1d z0.d, p0/z, [x3, x5, lsl 3]
where a BIC is generated between p1 and p0. A NOT would be better here, since
it would not require the use of p3 and would open the pattern up to being CSEd.
After this patch using a 2 -> 2 split we generate:
ld1d z1.d, p0/z, [x1, x5, lsl 3]
fcmgt p2.d, p0/z, z1.d, #0.0
not p1.b, p0/z, p2.b
The additional scratch register is needed so that the two operations can be
CSEd: if both statements wrote to the same register, CSE would be unable to
merge the values when other statements in between use that register.
A second pattern is needed to capture the NOR case, as combine matches the
longest sequence first. Without this pattern we would de-optimize the NOR and
emit two NOTs instead. I did not find a better way to do this.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (*fcm<cmp_op><mode>_bic_combine,
*fcm<cmp_op><mode>_nor_combine, *fcmuo<mode>_bic_combine,
*fcmuo<mode>_nor_combine): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pred-not-gen-1.c: New test.
* gcc.target/aarch64/sve/pred-not-gen-2.c: New test.
* gcc.target/aarch64/sve/pred-not-gen-3.c: New test.
* gcc.target/aarch64/sve/pred-not-gen-4.c: New test.
|
|
This adds optabs implementing usdot_prod.
The following testcase:
#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned
SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
SIGNEDNESS_4 char *restrict b)
{
for (__INTPTR_TYPE__ i = 0; i < N; ++i)
{
int av = a[i];
int bv = b[i];
SIGNEDNESS_2 short mult = av * bv;
res += mult;
}
return res;
}
Generates for NEON
f:
movi v0.4s, 0
mov x3, 0
.p2align 3,,7
.L2:
ldr q1, [x2, x3]
ldr q2, [x1, x3]
usdot v0.4s, v1.16b, v2.16b
add x3, x3, 16
cmp x3, 480
bne .L2
addv s0, v0.4s
fmov w1, s0
add w0, w0, w1
ret
and for SVE
f:
mov x3, 0
cntb x5
mov w4, 480
mov z1.b, #0
whilelo p0.b, wzr, w4
mov z3.b, #0
ptrue p1.b, all
.p2align 3,,7
.L2:
ld1b z2.b, p0/z, [x1, x3]
ld1b z0.b, p0/z, [x2, x3]
add x3, x3, x5
sel z0.b, p0, z0.b, z3.b
whilelo p0.b, w3, w4
usdot z1.s, z0.b, z2.b
b.any .L2
uaddv d0, p1, z1.s
fmov x1, d0
add w0, w0, w1
ret
instead of
f:
movi v0.4s, 0
mov x3, 0
.p2align 3,,7
.L2:
ldr q2, [x1, x3]
ldr q1, [x2, x3]
add x3, x3, 16
sxtl v4.8h, v2.8b
sxtl2 v3.8h, v2.16b
uxtl v2.8h, v1.8b
uxtl2 v1.8h, v1.16b
mul v2.8h, v2.8h, v4.8h
mul v1.8h, v1.8h, v3.8h
saddw v0.4s, v0.4s, v2.4h
saddw2 v0.4s, v0.4s, v2.8h
saddw v0.4s, v0.4s, v1.4h
saddw2 v0.4s, v0.4s, v1.8h
cmp x3, 480
bne .L2
addv s0, v0.4s
fmov w1, s0
add w0, w0, w1
ret
and
f:
mov x3, 0
cnth x5
mov w4, 480
mov z1.b, #0
whilelo p0.h, wzr, w4
ptrue p2.b, all
.p2align 3,,7
.L2:
ld1sb z2.h, p0/z, [x1, x3]
punpklo p1.h, p0.b
ld1b z0.h, p0/z, [x2, x3]
add x3, x3, x5
mul z0.h, p2/m, z0.h, z2.h
sunpklo z2.s, z0.h
sunpkhi z0.s, z0.h
add z1.s, p1/m, z1.s, z2.s
punpkhi p1.h, p0.b
whilelo p0.h, w3, w4
add z1.s, p1/m, z1.s, z0.s
b.any .L2
uaddv d0, p2, z1.s
fmov x1, d0
add w0, w0, w1
ret
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_usdot<vsi2qi>): Rename to...
(usdot_prod<vsi2qi>): ... This.
* config/aarch64/aarch64-simd-builtins.def (usdot): Rename to...
(usdot_prod): ...This.
* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Likewise.
* config/aarch64/aarch64-sve.md (@aarch64_<sur>dot_prod<vsi2qi>):
Rename to...
(@<sur>dot_prod<vsi2qi>): ...This.
* config/aarch64/aarch64-sve-builtins-base.cc
(svusdot_impl::expand): Use it.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/vusdot-autovec.c: New test.
* gcc.target/aarch64/sve/vusdot-autovec.c: New test.
|
|
This patch enables the use of loads with the UNSPEC_PRED_X enum in the
aarch64_load pattern, allowing combine to merge such loads with extends.
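A hedged sketch of the kind of code that benefits (the committed test is
ld1_extend.c; this version is only illustrative):
#include <stdint.h>
/* Illustrative sketch: a widening load that combine can now fold into a
   single extending LD1SB instead of LD1B plus a separate sign extension.  */
void
f (int64_t *restrict out, int8_t *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = in[i];
}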
gcc/ChangeLog:
2021-05-19 Andre Vieira <andre.simoesdiasvieira@arm.com>
* config/aarch64/iterators.md (SVE_PRED_LOAD): New iterator.
(pred_load): New int attribute.
* config/aarch64/aarch64-sve.md
(aarch64_load_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>): Use
SVE_PRED_LOAD enum iterator and corresponding pred_load attribute.
* config/aarch64/aarch64-sve-builtins-base.cc (expand): Update call to
code_for_aarch64_load.
gcc/testsuite/ChangeLog:
2021-05-19 Andre Vieira <andre.simoesdiasvieira@arm.com>
* gcc.target/aarch64/sve/logical_unpacked_and_2.c: Change
scan-assembler-times to scan-assembler-not for superfluous uxtb.
* gcc.target/aarch64/sve/logical_unpacked_and_3.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_and_4.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_and_6.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_and_7.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_eor_2.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_eor_3.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_eor_4.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_eor_6.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_eor_7.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_orr_2.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_orr_3.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_orr_4.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_orr_6.c: Likewise.
* gcc.target/aarch64/sve/logical_unpacked_orr_7.c: Likewise.
* gcc.target/aarch64/sve/ld1_extend.c: New test.
|
|
The attached testcase generates the following paradoxical subregs when creating
the predicates.
(insn 22 21 23 2 (set (reg:VNx8BI 100)
(subreg:VNx8BI (reg:VNx2BI 103) 0))
(expr_list:REG_EQUAL (const_vector:VNx8BI [
(const_int 1 [0x1])
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 0 [0]) repeated x5
])
(nil)))
and
(insn 15 14 16 2 (set (reg:VNx8BI 96)
(subreg:VNx8BI (reg:VNx2BI 99) 0))
(expr_list:REG_EQUAL (const_vector:VNx8BI [
(const_int 1 [0x1])
(const_int 0 [0]) repeated x7
])
(nil)))
This causes CSE to incorrectly think that the two predicates are equal because
some of the significant bits get ignored due to the subreg.
The attached patch instead makes the code always look at all 16 bits of the
predicate, which in turn means we need to generate a TRN that matches the
expected result mode. In effect, in RTL we keep the mode as VNx16BI but during
codegen re-interpret it as the mode the predicate instruction wanted:
(insn 10 9 11 2 (set (reg:VNx8BI 96)
(subreg:VNx8BI (reg:VNx16BI 99) 0))
(expr_list:REG_EQUAL (const_vector:VNx8BI [
(const_int 1 [0x1])
(const_int 0 [0]) repeated x7
])
(nil)))
This required a correction to the TRN pattern. A new TRN1_CONV unspec is
introduced, which allows the arguments to be kept as VNx16BI while encoding
the instruction using the type of the last operand.
(insn 9 8 10 2 (set (reg:VNx16BI 99)
(unspec:VNx16BI [
(reg:VNx16BI 97)
(reg:VNx16BI 98)
(reg:VNx2BI 100)
] UNSPEC_TRN1_CONV))
(nil))
This allows us to remove all the paradoxical subregs and end up with:
(insn 16 15 17 2 (set (reg:VNx8BI 101)
(subreg:VNx8BI (reg:VNx16BI 104) 0))
(expr_list:REG_EQUAL (const_vector:VNx8BI [
(const_int 1 [0x1])
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 0 [0]) repeated x5
])
(nil)))
gcc/ChangeLog:
PR target/100048
* config/aarch64/aarch64-sve.md (@aarch64_sve_trn1_conv<mode>): New.
* config/aarch64/aarch64.c (aarch64_expand_sve_const_pred_trn): Use new
TRN optab.
* config/aarch64/iterators.md (UNSPEC_TRN1_CONV): New.
gcc/testsuite/ChangeLog:
PR target/100048
* gcc.target/aarch64/sve/pr100048.c: New test.
|
|
Prevents generation of a vec_duplicate with an illegal predicate in
<ASHIFT:optab><mode>3.
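A hedged, hypothetical sketch of the scenario (the committed reproducer is
pr98657.c; this version is only illustrative):
#include <stdint.h>
/* Illustrative sketch: a vector shift by a variable scalar amount; the
   amount is broadcast via expand_vector_broadcast so that the resulting
   vec_duplicate is always legal.  */
void
f (int32_t *x, int32_t amount, int n)
{
  for (int i = 0; i < n; i++)
    x[i] <<= amount;
}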
gcc/ChangeLog:
2021-02-19 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR target/98657
* config/aarch64/aarch64-sve.md (<ASHIFT:optab><mode>3): Use
`expand_vector_broadcast' to emit the vec_duplicate operand.
gcc/testsuite/ChangeLog:
2021-02-19 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR target/98657
* gcc.target/aarch64/sve/pr98657.c: New test.
|
|
This adds implementations of the optabs for complex operations. With this, the
following C code:
void g (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
{
for (int i=0; i < N; i++)
c[i] = a[i] * b[i];
}
generates
NEON:
g:
movi v3.4s, 0
mov x3, 0
.p2align 3,,7
.L2:
mov v0.16b, v3.16b
ldr q2, [x1, x3]
ldr q1, [x0, x3]
fcmla v0.4s, v1.4s, v2.4s, #0
fcmla v0.4s, v1.4s, v2.4s, #90
str q0, [x2, x3]
add x3, x3, 16
cmp x3, 1600
bne .L2
ret
SVE:
g:
mov x3, 0
mov x4, 400
ptrue p1.b, all
whilelo p0.s, xzr, x4
mov z3.s, #0
.p2align 3,,7
.L2:
ld1w z1.s, p0/z, [x0, x3, lsl 2]
ld1w z2.s, p0/z, [x1, x3, lsl 2]
movprfx z0, z3
fcmla z0.s, p1/m, z1.s, z2.s, #0
fcmla z0.s, p1/m, z1.s, z2.s, #90
st1w z0.s, p0, [x2, x3, lsl 2]
incw x3
whilelo p0.s, x3, x4
b.any .L2
ret
SVE2 (with int instead of float)
g:
mov x3, 0
mov x4, 400
mov z3.b, #0
whilelo p0.s, xzr, x4
.p2align 3,,7
.L2:
ld1w z1.s, p0/z, [x0, x3, lsl 2]
ld1w z2.s, p0/z, [x1, x3, lsl 2]
movprfx z0, z3
cmla z0.s, z1.s, z2.s, #0
cmla z0.s, z1.s, z2.s, #90
st1w z0.s, p0, [x2, x3, lsl 2]
incw x3
whilelo p0.s, x3, x4
b.any .L2
ret
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4,
cmul<conj_op><mode>3): New.
* config/aarch64/iterators.md (UNSPEC_FCMUL,
UNSPEC_FCMUL180, UNSPEC_FCMLA_CONJ, UNSPEC_FCMLA180_CONJ,
UNSPEC_CMLA_CONJ, UNSPEC_CMLA180_CONJ, UNSPEC_CMUL, UNSPEC_CMUL180,
FCMLA_OP, FCMUL_OP, conj_op, rotsplit1, rotsplit2, fcmac1, sve_rot1,
sve_rot2, SVE2_INT_CMLA_OP, SVE2_INT_CMUL_OP, SVE2_INT_CADD_OP): New.
(rot): Add UNSPEC_FCMUL, UNSPEC_FCMUL180.
(rot_op): Renamed to conj_op.
* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4,
cmul<conj_op><mode>3): New.
* config/aarch64/aarch64-sve2.md (cml<fcmac1><conj_op><mode>4,
cmul<conj_op><mode>3): New.
|
|
This patch extends the MLS/MSB patterns to support unpacked
integer vectors. The type suffix could be either the element
size or the container size, but using the element size should
be more efficient.
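A hedged sketch (hypothetical code, in the spirit of the new mls_2.c test) of
one plausible loop shape where 32-bit elements end up in 64-bit containers,
so the unpacked MLS form is needed:
#include <stdint.h>
/* Illustrative sketch: mixing 64-bit and 32-bit data can force the 32-bit
   elements into an unpacked layout; c - a * b maps to the fnma pattern,
   i.e. MLS.  */
void
f (uint32_t *restrict r, uint32_t *restrict a, uint32_t *restrict b,
   uint64_t *restrict c, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = c[i] - a[i] * b[i];
}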
gcc/
* config/aarch64/aarch64-sve.md (fnma<mode>4): Extend from SVE_FULL_I
to SVE_I.
(@aarch64_pred_fnma<mode>, cond_fnma<mode>, *cond_fnma<mode>_2)
(*cond_fnma<mode>_4, *cond_fnma<mode>_any): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/mls_2.c: New test.
* g++.target/aarch64/sve/cond_mls_1.C: Likewise.
* g++.target/aarch64/sve/cond_mls_2.C: Likewise.
* g++.target/aarch64/sve/cond_mls_3.C: Likewise.
* g++.target/aarch64/sve/cond_mls_4.C: Likewise.
* g++.target/aarch64/sve/cond_mls_5.C: Likewise.
|