author | Tamar Christina <tamar.christina@arm.com> | 2022-11-14 15:42:42 +0000 |
---|---|---|
committer | Tamar Christina <tamar.christina@arm.com> | 2022-11-14 17:40:56 +0000 |
commit | b2bb611d90d01f64a2456c29de2a2ca1211ac134 | |
tree | beaed686bf35b867edc42d73a51ef5f0044ccb7f /gcc/expr.cc | |
parent | 2b85d759dae79c930abe8118e1102ecb673b74aa | |
middle-end: Add optimized float addsub without needing VEC_PERM_EXPR.
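For IEEE 754 floating point formats we can replace a sequence of alternating
+/- with an fneg of a wider type followed by an fadd.  Negating the wider type
flips only its sign bit, which on a little-endian target is the sign bit of
every odd narrow element, so this eliminates the need for a permutation.  This
patch adds a match.pd rule to recognize and perform this rewrite.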
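As a rough illustration of why this works, the scalar C sketch below (a model of the identity, not the match.pd rule itself) reinterprets a pair of floats as one 64-bit word and flips bit 63, which is what an IEEE 754 binary64 negation does. It assumes a little-endian target and 32-bit IEEE 754 floats; the helper names `fneg_as_double` and `addsub_pair` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined (__BYTE_ORDER__) && __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian target"
#endif

/* Two binary32 floats must exactly fill one 64-bit container.  */
static_assert (sizeof (float) == 4, "expects 32-bit float");

/* Hypothetical helper: negate a pair of floats as if it were one double.
   IEEE 754 negation only flips the sign bit, which for a double is bit 63.
   On a little-endian target bit 63 is the sign bit of pair[1], so only
   the odd element changes sign.  */
static void fneg_as_double (float pair[2])
{
  uint64_t bits;
  memcpy (&bits, pair, sizeof bits);   /* reinterpret the two floats */
  bits ^= UINT64_C (1) << 63;          /* flip the wide sign bit */
  memcpy (pair, &bits, sizeof bits);
}

/* Hypothetical helper: res[0] = a[0] + b[0], res[1] = a[1] - b[1],
   computed as one wide negation followed by plain float additions.  */
static void addsub_pair (const float a[2], const float b[2], float res[2])
{
  float nb[2] = { b[0], b[1] };
  fneg_as_double (nb);                 /* nb = { b[0], -b[1] } */
  res[0] = a[0] + nb[0];
  res[1] = a[1] + nb[1];               /* a[1] + (-b[1]) == a[1] - b[1] */
}
```

The last step relies on IEEE 754 defining x - y as x + (-y), so the result matches the original add/subtract pair bit for bit for non-NaN inputs.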
For
void f (float *restrict a, float *restrict b, float *res, int n)
{
  for (int i = 0; i < (n & -4); i+=2)
    {
      res[i+0] = a[i+0] + b[i+0];
      res[i+1] = a[i+1] - b[i+1];
    }
}
we generate:
.L3:
        ldr     q1, [x1, x3]
        ldr     q0, [x0, x3]
        fneg    v1.2d, v1.2d
        fadd    v0.4s, v0.4s, v1.4s
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, x4
        bne     .L3
now instead of:
.L3:
        ldr     q1, [x0, x3]
        ldr     q2, [x1, x3]
        fadd    v0.4s, v1.4s, v2.4s
        fsub    v1.4s, v1.4s, v2.4s
        tbl     v0.16b, {v0.16b - v1.16b}, v3.16b
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, x4
        bne     .L3
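To make the new loop easier to follow, here is a hypothetical portable C model of it next to the reference function `f` from above. `f_fneg` is an illustrative name, not GCC output: each 16-byte block is viewed as two 64-bit lanes for `fneg v1.2d` and as four 32-bit lanes for `fadd v0.4s`, again assuming a little-endian target with IEEE 754 floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined (__BYTE_ORDER__) && __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian target"
#endif

static_assert (sizeof (float) == 4, "expects 32-bit float");

/* Reference: the loop from the commit message.  */
void f (float *restrict a, float *restrict b, float *res, int n)
{
  for (int i = 0; i < (n & -4); i += 2)
    {
      res[i + 0] = a[i + 0] + b[i + 0];
      res[i + 1] = a[i + 1] - b[i + 1];
    }
}

/* Hypothetical scalar model of the new AArch64 loop body.  */
void f_fneg (const float *restrict a, const float *restrict b,
             float *res, int n)
{
  for (int i = 0; i < (n & -4); i += 4)
    {
      uint64_t v1[2];
      float nb[4];
      memcpy (v1, b + i, sizeof v1);      /* ldr q1: four floats of b */
      v1[0] ^= UINT64_C (1) << 63;        /* fneg v1.2d: flip the sign */
      v1[1] ^= UINT64_C (1) << 63;        /* bit of each 64-bit lane */
      memcpy (nb, v1, sizeof nb);         /* view v1 as four floats again */
      for (int k = 0; k < 4; k++)         /* fadd v0.4s, then str q0 */
        res[i + k] = a[i + k] + nb[k];
    }
}
```

Because `n & -4` is a multiple of 4, stepping by 4 in `f_fneg` covers exactly the same elements as stepping by 2 in `f`, and no permutation is needed to interleave the sums and differences.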
Thanks to George Steed for the idea.
gcc/ChangeLog:
* generic-match-head.cc: Include langhooks.
* gimple-match-head.cc: Likewise.
* match.pd: Add fneg/fadd rule.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/addsub_1.c: New test.
* gcc.target/aarch64/sve/addsub_1.c: New test.