author | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2023-05-15 12:05:35 +0100 |
---|---|---|
committer | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2023-05-15 12:05:35 +0100 |
commit | c4733ea2b46278974f8d78a8afb379447cc38201 (patch) | |
tree | 3724fdd0cd98c704a6dbb6a3aa124590a2cd334f /gcc | |
parent | 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba (diff) | |
aarch64: Cost vector comparisons more accurately
We are missing cases where combine should form FACGE/FACGT instructions. For the testcase in this patch we currently generate:
foo:
fabs v3.4s, v0.4s
fabs v0.4s, v1.4s
fabs v1.4s, v2.4s
fcmgt v0.4s, v3.4s, v0.4s
fcmgt v1.4s, v3.4s, v1.4s
b g
This is because combine is rejecting the pattern due to costs:
Successfully matched this instruction:
(set (reg:V4SI 106)
(neg:V4SI (lt:V4SI (abs:V4SF (reg:V4SF 113))
(abs:V4SF (reg:V4SF 111)))))
rejecting combination of insns 8, 9 and 10
original costs 8 + 8 + 12 = 28
replacement costs 8 + 28 = 36
The costing is evidently recursing into each arm of the RTX and costing the sub-expressions separately.
This patch teaches the aarch64 rtx costs routine that our vector comparisons are represented as a NEG of
a comparison operator, with the FACGE/FACGT operations in particular having an ABS on each arm. With this patch we get
the much more reasonable dump:
original costs 8 + 8 + 8 = 24
replacement costs 8 + 8 = 16
and generate the optimal assembly:
foo:
mov v31.16b, v0.16b
facgt v0.4s, v0.4s, v1.4s
facgt v1.4s, v31.4s, v2.4s
b g
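The effect of the costing change can be modelled with a small standalone sketch. This is not GCC code: `toy_rtx`, `toy_cost_naive`, `toy_cost_fused` and the unit cost `TOY_INSN_COST` are hypothetical names and values chosen for illustration, and the numbers it produces are not meant to match the combine dump above. It only shows why charging every node of a NEG/LT/ABS tree separately overstates the cost compared with treating the whole shape as one instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature expression tree; NOT GCC's real rtx type.  */
enum toy_code { TOY_REG, TOY_ABS, TOY_LT, TOY_NEG };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;	/* First operand, NULL for registers.  */
  const struct toy_rtx *op1;	/* Second operand, NULL if unary.  */
};

/* Assumed cost of one vector ALU instruction (made-up unit).  */
enum { TOY_INSN_COST = 4 };

/* Naive costing: every non-register node is charged as one instruction.  */
int
toy_cost_naive (const struct toy_rtx *x)
{
  assert (x != NULL);
  if (x->code == TOY_REG)
    return 0;
  int cost = TOY_INSN_COST + toy_cost_naive (x->op0);
  if (x->op1 != NULL)
    cost += toy_cost_naive (x->op1);
  return cost;
}

/* Costing that treats NEG of a comparison as a single instruction and
   looks through ABS when it appears on both arms (the FACGE/FACGT shape).  */
int
toy_cost_fused (const struct toy_rtx *x)
{
  assert (x != NULL);
  if (x->code == TOY_REG)
    return 0;
  if (x->code == TOY_NEG && x->op0->code == TOY_LT)
    {
      const struct toy_rtx *a = x->op0->op0;
      const struct toy_rtx *b = x->op0->op1;
      if (a->code == TOY_ABS && b->code == TOY_ABS)
	{
	  a = a->op0;
	  b = b->op0;
	}
      /* One instruction for the whole shape, plus the operand costs.  */
      return TOY_INSN_COST + toy_cost_fused (a) + toy_cost_fused (b);
    }
  int cost = TOY_INSN_COST + toy_cost_fused (x->op0);
  if (x->op1 != NULL)
    cost += toy_cost_fused (x->op1);
  return cost;
}
```

For a tree of the form NEG (LT (ABS (reg), ABS (reg))), the naive routine charges four instructions while the fused routine charges one. That is the direction of the change in the dump above, under the toy model's assumptions.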
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_rtx_costs, NEG case): Add costing
logic for vector modes.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/facg_1.c: New test.
Diffstat (limited to 'gcc')
-rw-r--r-- | gcc/config/aarch64/aarch64.cc | 21 |
-rw-r--r-- | gcc/testsuite/gcc.target/aarch64/facg_1.c | 15 |
2 files changed, 36 insertions, 0 deletions
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 18dab2a..29dbacf 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14081,6 +14081,27 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED,
 
       if (VECTOR_MODE_P (mode))
         {
+          /* Many vector comparison operations are represented as NEG
+             of a comparison. */
+          if (COMPARISON_P (op0))
+            {
+              rtx op00 = XEXP (op0, 0);
+              rtx op01 = XEXP (op0, 1);
+              machine_mode inner_mode = GET_MODE (op00);
+              /* FACGE/FACGT. */
+              if (GET_MODE_CLASS (inner_mode) == MODE_VECTOR_FLOAT
+                  && GET_CODE (op00) == ABS
+                  && GET_CODE (op01) == ABS)
+                {
+                  op00 = XEXP (op00, 0);
+                  op01 = XEXP (op01, 0);
+                }
+              *cost += rtx_cost (op00, inner_mode, GET_CODE (op0), 0, speed);
+              *cost += rtx_cost (op01, inner_mode, GET_CODE (op0), 1, speed);
+              if (speed)
+                *cost += extra_cost->vect.alu;
+              return true;
+            }
           if (speed)
             {
               /* FNEG. */
diff --git a/gcc/testsuite/gcc.target/aarch64/facg_1.c b/gcc/testsuite/gcc.target/aarch64/facg_1.c
new file mode 100644
index 0000000..6c17fb6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/facg_1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include <arm_neon.h>
+
+int g(uint32x4_t, uint32x4_t);
+
+int foo (float32x4_t x, float32x4_t a, float32x4_t b)
+{
+  return g(vcagtq_f32 (x, a), vcagtq_f32 (x, b));
+}
+
+/* { dg-final { scan-assembler-times {facgt\tv[0-9]+\.4s, v[0-9]+\.4s, v[0-9]+\.4s} 2 } } */
+/* { dg-final { scan-assembler-not {\tfabs\t} } } */
+
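For readers unfamiliar with the intrinsic used in the test, the following portable C sketch models what an absolute compare greater-than computes per lane: an all-ones mask when |a| > |b|, otherwise zero. The helper names `toy_fabsf`, `facgt_lane` and `facgt_4s` are hypothetical and not part of arm_neon.h. The negation of a 0/1 comparison result is offered only as one way to picture the "NEG of a comparison" representation mentioned in the commit message, not as a description of GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute value without libm; NaN inputs stay NaN.  */
static float
toy_fabsf (float v)
{
  return v < 0.0f ? -v : v;
}

/* One lane: the C comparison yields 0 or 1, and negating it as an
   unsigned value turns 1 into 0xFFFFFFFF (all bits set).  */
uint32_t
facgt_lane (float a, float b)
{
  return -(uint32_t) (toy_fabsf (a) > toy_fabsf (b));
}

/* Four lanes, mirroring the shape of a .4s vector operation.  */
void
facgt_4s (const float a[4], const float b[4], uint32_t out[4])
{
  assert (a != 0 && b != 0 && out != 0);
  for (int i = 0; i < 4; i++)
    out[i] = facgt_lane (a[i], b[i]);
}
```

In this model the absolute values are folded into the comparison itself, which is why the test expects no separate fabs instructions in the output.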