author     Jennifer Schmitz <jschmitz@nvidia.com>   2025-03-12 00:37:42 -0700
committer  Jennifer Schmitz <jschmitz@nvidia.com>   2025-05-02 11:33:21 +0200
commit     cdfa963cfc6849ff3ceb911f293201882aeef22e
tree       4b7b02d5787a3d86d25c14dd5f5468e1e6fe0575
parent     c6efdffa7d5c68a14aa5de3a426a44ee05aaa1b9
aarch64: Optimize SVE extract last for VLS.
For the test case
#include <arm_sve.h>
#include <stdint.h>

int32_t foo (svint32_t x)
{
  svbool_t pg = svpfalse ();
  return svlastb_s32 (pg, x);
}
compiled with -O3 -mcpu=grace -msve-vector-bits=128, GCC produced:
foo:
pfalse p3.b
lastb w0, p3, z0.s
ret
when it could use a Neon lane extract instead, since for 128-bit VLS the
low 128 bits of an SVE Z register overlap the corresponding Advanced SIMD
V register and the last .s element is simply lane 3:
foo:
umov w0, v0.s[3]
ret
Similar optimizations can be made for VLS with other vector widths.
We implemented this optimization by guarding the emission of
pfalse+lastb in the pattern vec_extract<mode><Vel> with
!val.is_constant ().
Thus, for extractions of the last element with VLS, one of the patterns
*vec_extract<mode><Vel>_v128, *vec_extract<mode><Vel>_dup, or
*vec_extract<mode><Vel>_ext is used instead.
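For illustration, a rough sketch of how such a guard can sit in the
vec_extract<mode><Vel> expander body (the surrounding logic and helper
names such as aarch64_pfalse_reg are paraphrased assumptions, not the
committed hunk; only the !val.is_constant () check is taken from this
patch):

  poly_int64 val;
  if (poly_int_rtx_p (operands[2], &val)
      && known_eq (val, GET_MODE_NUNITS (<MODE>mode) - 1)
      /* Only use PFALSE+LASTB when the index of the last element is not
	 a compile-time constant, i.e. for VLA.  For VLS the index is
	 constant and the *vec_extract<mode><Vel>_v128/_dup/_ext patterns
	 handle the extract more cheaply.  */
      && !val.is_constant ())
    {
      rtx sel = aarch64_pfalse_reg (<VPRED>mode);  /* all-false predicate */
      emit_insn (gen_extract_last_<mode> (operands[0], sel, operands[1]));
      DONE;
    }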
We added tests for 128-bit VLS and adjusted the tests for the other vector
widths.
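A minimal sketch of what the new 128-bit test could look like (the file
name is taken from the ChangeLog below; the dg directives and the exact
scan-assembler expectations are illustrative assumptions, not the
committed test):

  /* { dg-do compile } */
  /* { dg-options "-O3 -msve-vector-bits=128" } */

  #include <arm_sve.h>

  int32_t
  foo (svint32_t x)
  {
    svbool_t pg = svpfalse ();
    return svlastb_s32 (pg, x);
  }

  /* The last element should come from a Neon lane extract rather than
     PFALSE+LASTB.  */
  /* { dg-final { scan-assembler-not {\tlastb\t} } } */
  /* { dg-final { scan-assembler {\tumov\tw0, v0\.s\[3\]} } } */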
The patch was bootstrapped and tested on aarch64-linux-gnu with no regressions.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve.md (vec_extract<mode><Vel>):
Prevent the emission of pfalse+lastb for VLS.
gcc/testsuite/
* gcc.target/aarch64/sve/extract_last_128.c: New test.
* gcc.target/aarch64/sve/extract_1.c: Adjust expected outcome.
* gcc.target/aarch64/sve/extract_2.c: Likewise.
* gcc.target/aarch64/sve/extract_3.c: Likewise.
* gcc.target/aarch64/sve/extract_4.c: Likewise.