path: root/gcc/expr.cc
author: Tamar Christina <tamar.christina@arm.com> 2022-11-14 15:42:42 +0000
committer: Tamar Christina <tamar.christina@arm.com> 2022-11-14 17:40:56 +0000
commit: b2bb611d90d01f64a2456c29de2a2ca1211ac134
tree: beaed686bf35b867edc42d73a51ef5f0044ccb7f /gcc/expr.cc
parent: 2b85d759dae79c930abe8118e1102ecb673b74aa
middle-end: Add optimized float addsub without needing VEC_PERM_EXPR.

For IEEE 754 floating point formats we can replace a sequence of alternating
+/- with an fneg of a wider type followed by an fadd.  This eliminates the need
for a permutation.  This patch adds a match.pd rule to recognize and perform
this rewriting.

For

    void f (float *restrict a, float *restrict b, float *res, int n)
    {
       for (int i = 0; i < (n & -4); i+=2)
        {
          res[i+0] = a[i+0] + b[i+0];
          res[i+1] = a[i+1] - b[i+1];
        }
    }

we now generate:

    .L3:
            ldr     q1, [x1, x3]
            ldr     q0, [x0, x3]
            fneg    v1.2d, v1.2d
            fadd    v0.4s, v0.4s, v1.4s
            str     q0, [x2, x3]
            add     x3, x3, 16
            cmp     x3, x4
            bne     .L3

instead of:

    .L3:
            ldr     q1, [x0, x3]
            ldr     q2, [x1, x3]
            fadd    v0.4s, v1.4s, v2.4s
            fsub    v1.4s, v1.4s, v2.4s
            tbl     v0.16b, {v0.16b - v1.16b}, v3.16b
            str     q0, [x2, x3]
            add     x3, x3, 16
            cmp     x3, x4
            bne     .L3

Thanks to George Steed for the idea.

gcc/ChangeLog:

    * generic-match-head.cc: Include langhooks.
    * gimple-match-head.cc: Likewise.
    * match.pd: Add fneg/fadd rule.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/simd/addsub_1.c: New test.
    * gcc.target/aarch64/sve/addsub_1.c: New test.
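
The trick behind the fneg/fadd sequence can be modelled in scalar C. The sketch below is illustrative only and is not code from the patch. The helper names negate_odd_lane and addsub_via_fneg are hypothetical, and it assumes a little-endian target with 32-bit IEEE 754 float, where the top bit of each 64-bit lane is the sign bit of the odd-indexed float. Flipping that one bit negates b[i+1] and leaves b[i] untouched, so a plain add then yields a[i] + b[i] and a[i+1] - b[i+1].

```c
#include <stdint.h>
#include <string.h>

/* Flip bit 63 of the 64-bit lane that holds pair[0] and pair[1].
   Assumption: little-endian layout, so bit 63 is the sign bit of pair[1].
   This mimics what "fneg v1.2d" does to each pair of floats.  */
static void negate_odd_lane(float pair[2])
{
    uint64_t bits;
    memcpy(&bits, pair, sizeof bits);   /* well-defined type punning */
    bits ^= UINT64_C(1) << 63;          /* toggle the lane's top bit */
    memcpy(pair, &bits, sizeof bits);
}

/* Scalar model of the rewritten loop: negate the odd elements of b via
   the wide sign flip, then use a single add for every element.
   Only complete pairs are processed, so a trailing odd element is skipped.  */
static void addsub_via_fneg(const float *a, const float *b, float *res, int n)
{
    for (int i = 0; i + 1 < n; i += 2) {
        float t[2] = { b[i], b[i + 1] };
        negate_odd_lane(t);
        res[i]     = a[i]     + t[0];   /* a[i]   + b[i]   */
        res[i + 1] = a[i + 1] + t[1];   /* a[i+1] - b[i+1] */
    }
}
```

Because IEEE 754 subtraction x - y gives the same result as x + (-y), the model matches the original add/sub loop for ordinary values; the sign of a NaN result is a separate matter that this sketch does not address.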
Diffstat (limited to 'gcc/expr.cc')
0 files changed, 0 insertions, 0 deletions