author     Jennifer Schmitz <jschmitz@nvidia.com>  2025-03-12 00:37:42 -0700
committer  Jennifer Schmitz <jschmitz@nvidia.com>  2025-05-02 11:33:21 +0200
commit     cdfa963cfc6849ff3ceb911f293201882aeef22e
tree       4b7b02d5787a3d86d25c14dd5f5468e1e6fe0575
parent     c6efdffa7d5c68a14aa5de3a426a44ee05aaa1b9
aarch64: Optimize SVE extract last for VLS.
For the test case

    int32_t foo (svint32_t x)
    {
      svbool_t pg = svpfalse ();
      return svlastb_s32 (pg, x);
    }

compiled with -O3 -mcpu=grace -msve-vector-bits=128, GCC produced:

    foo:
            pfalse  p3.b
            lastb   w0, p3, z0.s
            ret

when it could use a Neon lane extract instead:

    foo:
            umov    w0, v0.s[3]
            ret

Similar optimizations can be made for VLS (vector-length-specific code) with
other vector widths.

We implemented this optimization by guarding the emission of pfalse+lastb in
the pattern vec_extract<mode><Vel> by !val.is_constant ().  Thus, for
last-element extracts in VLS modes, the patterns *vec_extract<mode><Vel>_v128,
*vec_extract<mode><Vel>_dup, or *vec_extract<mode><Vel>_ext are used instead
(a sketch of the guard follows the ChangeLog below).

We added tests for 128-bit VLS and adjusted the tests for the other vector
widths (an illustrative test sketch also follows below).

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
    * config/aarch64/aarch64-sve.md (vec_extract<mode><Vel>): Prevent
    the emission of pfalse+lastb for VLS.

gcc/testsuite/
    * gcc.target/aarch64/sve/extract_last_128.c: New test.
    * gcc.target/aarch64/sve/extract_1.c: Adjust expected outcome.
    * gcc.target/aarch64/sve/extract_2.c: Likewise.
    * gcc.target/aarch64/sve/extract_3.c: Likewise.
    * gcc.target/aarch64/sve/extract_4.c: Likewise.
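For reference, here is a minimal sketch of how the guard might sit inside the
vec_extract<mode><Vel> expander body in config/aarch64/aarch64-sve.md.  It
paraphrases the commit message; the surrounding code is an assumption, not a
quote from the patch:

    poly_int64 val;
    /* Only emit PFALSE+LASTB when the last-element index is not a
       compile-time constant, i.e. for VLA code.  The !val.is_constant ()
       check is the guard this commit describes; the rest of this body is
       paraphrased and may differ from the actual pattern.  */
    if (poly_int_rtx_p (operands[2], &val)
        && known_eq (val, GET_MODE_NUNITS (<MODE>mode) - 1)
        && !val.is_constant ())
      {
        rtx sel = aarch64_pfalse_reg (<VPRED>mode);
        emit_insn (gen_extract_last_<mode> (operands[0], sel, operands[1]));
        DONE;
      }
    /* Otherwise fall through: for VLS the lane number is known, so the
       *vec_extract<mode><Vel>_v128/_dup/_ext patterns (e.g. a Neon umov
       lane move) handle the extract.  */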
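The new test file itself is not shown in this view.  As an illustration only,
a 128-bit VLS test in the spirit of
gcc.target/aarch64/sve/extract_last_128.c could look like the following; the
dg- directives and scan patterns are assumptions, not the committed test:

    /* Hypothetical sketch, not the committed extract_last_128.c.  */
    /* { dg-do compile } */
    /* { dg-options "-O3 -msve-vector-bits=128" } */

    #include <stdint.h>
    #include <arm_sve.h>

    int32_t
    foo (svint32_t x)
    {
      svbool_t pg = svpfalse ();
      return svlastb_s32 (pg, x);
    }

    /* With 128-bit VLS, the last .s lane is lane 3, so a Neon lane move
       should replace PFALSE+LASTB.  */
    /* { dg-final { scan-assembler-not {\tpfalse\t} } } */
    /* { dg-final { scan-assembler {\tumov\tw0, v0\.s\[3\]} } } */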