author     Soumya AR <soumyaa@nvidia.com>  2024-11-13 10:20:14 +0530
committer  Soumya AR <soumyaa@nvidia.com>  2024-11-13 10:20:14 +0530
commit     9b2915d95d855333d4d8f66b71a75f653ee0d076
tree       a31c5d16c0f9792665a7ed761776d206c35c5dbe /libcpp
parent     445d8bb6a89eb2275c4930ec87a98d5123e5abdd

aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]

This patch uses the FSCALE instruction provided by SVE to implement the
standard ldexp family of functions.

Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
following code:

  float
  test_ldexpf (float x, int i)
  {
    return __builtin_ldexpf (x, i);
  }

  double
  test_ldexp (double x, int i)
  {
    return __builtin_ldexp (x, i);
  }

GCC Output:

  test_ldexpf:
          b       ldexpf

  test_ldexp:
          b       ldexp
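
SVE provides an FSCALE instruction, which multiplies each active
floating-point element by 2 raised to the signed integer in the
corresponding element of the second operand. We can therefore handle
scalar floats by moving them into vector registers and performing a
predicated FSCALE, similar to how LLVM handles the ldexp builtin. The
argument x already arrives in the low element of z0 (s0 or d0), so only
the exponent has to be moved, and a PTRUE that activates just the first
element keeps FSCALE from operating on the other lanes. For double, the
int exponent is first sign-extended with sxtw, because FSCALE on .d
elements reads a 64-bit signed exponent.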
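
As an illustration of the approach above, here is a portable C sketch that
models the single-precision sequence lane by lane. It is not GCC code: the
128-bit vector length (four lanes), the helper names scale_pow2,
fscale_s_merging and model_ldexpf, and the arbitrary values placed in the
unused lanes of z0 are all assumptions made for the example. scale_pow2
stands in for the hardware's power-of-two scaling and, as noted in its
comment, may round differently from FSCALE for subnormal results.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a hypothetical 128-bit vector length, i.e. four 32-bit
   lanes. Real SVE vector lengths vary; only lane 0 matters here. */
#define LANES 4

/* Scale x by 2^n without calling libm. Repeated multiplication by 2 or 0.5
   is exact while results stay normal; unlike FSCALE it may round more
   than once when the result is subnormal. Clamping n to +/-300 does not
   change the result for float, whose finite values span fewer than 300
   binary orders of magnitude. */
static float scale_pow2(float x, int32_t n)
{
    if (n > 300)
        n = 300;
    if (n < -300)
        n = -300;
    for (; n > 0; n--)
        x *= 2.0f;
    for (; n < 0; n++)
        x *= 0.5f;
    return x;
}

/* Model of "FSCALE Zdn.S, Pg/M, Zdn.S, Zm.S": each active lane becomes
   zdn[i] * 2^zm[i]; inactive lanes keep their old value (merging, /M). */
static void fscale_s_merging(const bool pg[LANES], float zdn[LANES],
                             const int32_t zm[LANES])
{
    for (int i = 0; i < LANES; i++)
        if (pg[i])
            zdn[i] = scale_pow2(zdn[i], zm[i]);
}

/* Hypothetical model of the new test_ldexpf sequence. */
static float model_ldexpf(float x, int i)
{
    /* x arrives in s0, the low lane of z0; the other lanes are
       unspecified, modelled here as arbitrary values. */
    float z0[LANES] = { x, 7.0f, -3.0f, 100.0f };
    /* fmov s31, w0: the exponent bits go into lane 0 and the rest of
       z31 is zeroed. */
    int32_t z31[LANES] = { i, 0, 0, 0 };
    /* ptrue p7.b, vl4: only the first 32-bit element is active. */
    const bool p7[LANES] = { true, false, false, false };

    fscale_s_merging(p7, z0, z31);
    return z0[0]; /* the result is returned in s0 */
}
```

For example, model_ldexpf (1.5f, 3) yields 12.0f, matching
__builtin_ldexpf (1.5f, 3).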

New Output:

  test_ldexpf:
          fmov    s31, w0
          ptrue   p7.b, vl4
          fscale  z0.s, p7/m, z0.s, z31.s
          ret

  test_ldexp:
          sxtw    x0, w0
          ptrue   p7.b, vl8
          fmov    d31, x0
          fscale  z0.d, p7/m, z0.d, z31.d
          ret
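
The sxtw in test_ldexp exists because FSCALE on .d elements interprets
each 64-bit element of the exponent operand as a signed integer, while the
int argument occupies only w0. The small C sketch below (hypothetical
helper names, not part of the patch) contrasts the sign-extended value
that sxtw produces with what FSCALE would see if the upper 32 bits were
simply zero: a negative exponent would become a huge positive one.

```c
#include <stdint.h>

/* What "sxtw x0, w0" produces: the int exponent sign-extended to 64 bits,
   which is the signed 64-bit exponent FSCALE Zdn.D expects. */
static int64_t exponent_sxtw(int32_t i)
{
    return (int64_t)i;
}

/* For contrast: the value FSCALE would read if x0 held the exponent
   zero-extended (upper 32 bits clear) instead of sign-extended. */
static int64_t exponent_zero_extended(int32_t i)
{
    return (int64_t)(uint32_t)i;
}
```

For non-negative exponents the two agree; for -2 the zero-extended value
is 4294967294, which would scale x towards infinity instead of dividing
it by 4.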

This is a revision of an earlier patch; it now uses the extended
definition of aarch64_ptrue_reg to generate predicate registers with the
appropriate bits set.

The patch was bootstrapped and regtested on aarch64-linux-gnu with no
regressions.

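
To see why "ptrue p7.b, vl4" and "ptrue p7.b, vl8" each select exactly
one element, it helps to recall that an SVE predicate has one bit per
vector byte and that, for wider elements, the bit of an element's lowest
byte governs it. The C sketch below models that rule; element_active and
count_active are hypothetical helpers and do not reproduce GCC's
aarch64_ptrue_reg, and the 16-byte vector length used in the examples is
an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* PTRUE Pd.B, VL<n> sets predicate bits 0 .. n-1 (one bit per byte).
   An element of esize bytes is governed by the bit of its lowest byte,
   so element e is active iff bit e * esize is set. */
static bool element_active(size_t active_bytes, size_t esize, size_t e)
{
    return e * esize < active_bytes;
}

/* Count the active elements in a hypothetical vector of vl_bytes bytes. */
static size_t count_active(size_t active_bytes, size_t esize, size_t vl_bytes)
{
    size_t n = 0;
    for (size_t e = 0; e < vl_bytes / esize; e++)
        n += element_active(active_bytes, esize, e);
    return n;
}
```

With 4-byte (.s) elements, a vl4 byte predicate activates one element,
but the same predicate read as .b activates four, and vl8 read as .s
would activate two; this is why the pattern has to match the element size.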
OK for mainline?
Signed-off-by: Soumya AR <soumyaa@nvidia.com>

gcc/ChangeLog:

	PR target/111733
	* config/aarch64/aarch64-sve.md
	(ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
	floating-point modes and expand to the existing pattern for FSCALE.
	* config/aarch64/iterators.md
	(SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well
	as their scalar equivalents.
	(VPRED): Extended the attribute to handle GPF_HF modes.
	* internal-fn.def (LDEXP): Changed macro to incorporate ldexpf16.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/fscale.c: New test.