author: Soumya AR <soumyaa@nvidia.com> 2024-11-13 10:20:14 +0530
committer: Soumya AR <soumyaa@nvidia.com> 2024-11-13 10:20:14 +0530
commit: 9b2915d95d855333d4d8f66b71a75f653ee0d076
tree: a31c5d16c0f9792665a7ed761776d206c35c5dbe /libcpp
parent: 445d8bb6a89eb2275c4930ec87a98d5123e5abdd
aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]

This patch uses the FSCALE instruction provided by SVE to implement the
standard ldexp family of functions.

Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
following code:

    float
    test_ldexpf (float x, int i)
    {
      return __builtin_ldexpf (x, i);
    }

    double
    test_ldexp (double x, int i)
    {
      return __builtin_ldexp(x, i);
    }

GCC Output:

    test_ldexpf:
        b       ldexpf

    test_ldexp:
        b       ldexp

Since SVE has support for an FSCALE instruction, we can use this to process
scalar floats by moving them to a vector register and performing an fscale
call, similar to how LLVM tackles an ldexp builtin as well.

New Output:

    test_ldexpf:
        fmov    s31, w0
        ptrue   p7.b, vl4
        fscale  z0.s, p7/m, z0.s, z31.s
        ret

    test_ldexp:
        sxtw    x0, w0
        ptrue   p7.b, vl8
        fmov    d31, x0
        fscale  z0.d, p7/m, z0.d, z31.d
        ret

This is a revision of an earlier patch, and now uses the extended definition
of aarch64_ptrue_reg to generate predicate registers with the appropriate set
bits.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:

	PR target/111733
	* config/aarch64/aarch64-sve.md
	(ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
	floating modes and expand to the existing pattern for FSCALE.
	* config/aarch64/iterators.md:
	(SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well
	as their scalar equivalents.
	(VPRED): Extended the attribute to handle GPF_HF modes.
	* internal-fn.def (LDEXP): Changed macro to incorporate ldexpf16.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/fscale.c: New test.
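
For readers less familiar with ldexp, the sketch below illustrates the scalar semantics that any replacement for the libcall has to preserve: ldexp (x, i) returns x multiplied by 2 raised to the power i. The two test_* functions are the ones quoted in the commit message; scale_by_pow2_reference and self_check are hypothetical helpers added purely for illustration and are not part of the patch or of the new fscale.c test. It assumes GCC or a compatible compiler that provides the __builtin_ldexp builtins.

```c
#include <assert.h>

/* The two functions from the commit message.  */
float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}

/* Hypothetical reference helper (not part of the patch): computes
   x * 2^i by repeated doubling or halving.  Each step is exact as
   long as the intermediate values stay within the normal range.  */
double
scale_by_pow2_reference (double x, int i)
{
  while (i > 0)
    {
      x *= 2.0;
      i--;
    }
  while (i < 0)
    {
      x *= 0.5;
      i++;
    }
  return x;
}

/* Hypothetical self-check: compares the builtin against the reference
   helper for a few small exponents.  */
void
self_check (void)
{
  for (int i = -8; i <= 8; i++)
    assert (test_ldexp (3.0, i) == scale_by_pow2_reference (3.0, i));
}
```

Calling self_check () from a small driver exercises the same scalar entry points whether the compiler emits a libcall or an inline sequence, so a comparison of this kind is one plausible way to sanity-check the result values independently of the generated assembly.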
Diffstat (limited to 'libcpp')
0 files changed, 0 insertions, 0 deletions