author     Jennifer Schmitz <jschmitz@nvidia.com>   2025-03-12 00:37:42 -0700
committer  Jennifer Schmitz <jschmitz@nvidia.com>   2025-05-02 11:33:21 +0200
commit     cdfa963cfc6849ff3ceb911f293201882aeef22e
tree       4b7b02d5787a3d86d25c14dd5f5468e1e6fe0575
parent     c6efdffa7d5c68a14aa5de3a426a44ee05aaa1b9
aarch64: Optimize SVE extract last for VLS.
For the test case
#include <arm_sve.h>
#include <stdint.h>

int32_t foo (svint32_t x)
{
  svbool_t pg = svpfalse ();
  return svlastb_s32 (pg, x);
}
compiled with -O3 -mcpu=grace -msve-vector-bits=128, GCC produced:
foo:
pfalse p3.b
lastb w0, p3, z0.s
ret
when it could use a Neon lane extract instead, since for 128-bit VLS the
low 128 bits of an SVE Z register overlap the corresponding Advanced SIMD
V register and the last .s element is simply lane 3:
foo:
umov w0, v0.s[3]
ret
Similar optimizations can be made for VLS with other vector widths.
We implemented this optimization by guarding the emission of
pfalse+lastb in the pattern vec_extract<mode><Vel> with
!val.is_constant ().
Thus, for extractions of the last element with VLS, one of the patterns
*vec_extract<mode><Vel>_v128, *vec_extract<mode><Vel>_dup, or
*vec_extract<mode><Vel>_ext is used instead.
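For illustration, a rough sketch of how such a guard can sit in the
vec_extract<mode><Vel> expander body (the surrounding logic and helper
names such as aarch64_pfalse_reg are paraphrased assumptions, not the
committed hunk; only the !val.is_constant () check is taken from this
patch):

  poly_int64 val;
  if (poly_int_rtx_p (operands[2], &val)
      && known_eq (val, GET_MODE_NUNITS (<MODE>mode) - 1)
      /* Only use PFALSE+LASTB when the index of the last element is not
	 a compile-time constant, i.e. for VLA.  For VLS the index is
	 constant and the *vec_extract<mode><Vel>_v128/_dup/_ext patterns
	 handle the extract more cheaply.  */
      && !val.is_constant ())
    {
      rtx sel = aarch64_pfalse_reg (<VPRED>mode);  /* all-false predicate */
      emit_insn (gen_extract_last_<mode> (operands[0], sel, operands[1]));
      DONE;
    }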
We added tests for 128-bit VLS and adjusted the tests for the other vector
widths.
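A minimal sketch of what the new 128-bit test could look like (the file
name is taken from the ChangeLog below; the dg directives and the exact
scan-assembler expectations are illustrative assumptions, not the
committed test):

  /* { dg-do compile } */
  /* { dg-options "-O3 -msve-vector-bits=128" } */

  #include <arm_sve.h>

  int32_t
  foo (svint32_t x)
  {
    svbool_t pg = svpfalse ();
    return svlastb_s32 (pg, x);
  }

  /* The last element should come from a Neon lane extract rather than
     PFALSE+LASTB.  */
  /* { dg-final { scan-assembler-not {\tlastb\t} } } */
  /* { dg-final { scan-assembler {\tumov\tw0, v0\.s\[3\]} } } */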
The patch was bootstrapped and tested on aarch64-linux-gnu with no regressions.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve.md (vec_extract<mode><Vel>):
Prevent the emission of pfalse+lastb for VLS.
gcc/testsuite/
* gcc.target/aarch64/sve/extract_last_128.c: New test.
* gcc.target/aarch64/sve/extract_1.c: Adjust expected outcome.
* gcc.target/aarch64/sve/extract_2.c: Likewise.
* gcc.target/aarch64/sve/extract_3.c: Likewise.
* gcc.target/aarch64/sve/extract_4.c: Likewise.