aarch64: Cost vector comparisons more accurately

We are missing opportunities to combine into FACGE/FACGT instructions.
For the testcase in this patch we generate:

foo:
	fabs	v3.4s, v0.4s
	fabs	v0.4s, v1.4s
	fabs	v1.4s, v2.4s
	fcmgt	v0.4s, v3.4s, v0.4s
	fcmgt	v1.4s, v3.4s, v1.4s
	b	g

This is because combine rejects the pattern on cost grounds:

Successfully matched this instruction:
(set (reg:V4SI 106)
    (neg:V4SI (lt:V4SI (abs:V4SF (reg:V4SF 113))
            (abs:V4SF (reg:V4SF 111)))))
rejecting combination of insns 8, 9 and 10
original costs 8 + 8 + 12 = 28
replacement costs 8 + 28 = 36

The cost routine is evidently recursing into the various arms of the RTX
and adding up their costs.

This patch teaches the aarch64 rtx costs routine that our vector comparisons
are represented as a NEG of a comparison operator, with the FACGE/FACGT
operations in particular having an ABS on each arm.
With this patch we get the much more reasonable dump:

original costs 8 + 8 + 8 = 24
replacement costs 8 + 8 = 16

and generate the optimal assembly:

foo:
	mov	v31.16b, v0.16b
	facgt	v0.4s, v0.4s, v1.4s
	facgt	v1.4s, v31.4s, v2.4s
	b	g

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_rtx_costs, NEG case): Add costing
	logic for vector modes.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/facg_1.c: New test.
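The accept/reject decision in the combine dumps comes down to comparing two sums. The sketch below reproduces that arithmetic. It assumes a simplified rule (reject only when the replacement is strictly more expensive than the original insns); the helper names `total_cost` and `combine_accepts` are hypothetical and are not GCC's combine code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Add up the per-insn costs listed in a combine dump line.  */
static int total_cost (const int *costs, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += costs[i];
  return total;
}

/* Simplified acceptance rule (assumed for this sketch): keep the
   combination unless the replacement is strictly more expensive than
   the insns it replaces.  */
static bool combine_accepts (const int *orig, size_t n_orig,
			     const int *repl, size_t n_repl)
{
  return total_cost (repl, n_repl) <= total_cost (orig, n_orig);
}
```

With the figures from the dumps, {8, 8, 12} against {8, 28} gives 28 versus 36 and is rejected. After the patch, {8, 8, 8} against {8, 8} gives 24 versus 16 and is accepted.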
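The "NEG of a comparison" representation becomes clearer when you look at a single lane. The RTL comparison produces 1 or 0, and the NEG turns a 1 into the all-ones mask that the instruction writes. The following is a scalar sketch of that per-lane behaviour for the FACGT form, (neg (gt (abs a) (abs b))). The dump above shows the same predicate written as lt with the arms swapped. The names `facgt_lane` and `facgt_4s` are hypothetical helpers, not NEON intrinsics.

```c
#include <math.h>
#include <stdint.h>

/* One lane of (neg (gt (abs a) (abs b))): the comparison yields 1 or 0,
   and negating it as an unsigned 32-bit value yields 0xFFFFFFFF or 0.  */
static uint32_t facgt_lane (float a, float b)
{
  uint32_t cmp = fabsf (a) > fabsf (b);	/* 1 if |a| > |b|, else 0; false for NaN.  */
  return -cmp;				/* 1 -> 0xFFFFFFFF, 0 -> 0.  */
}

/* Apply the lane model across a 4 x float vector (the .4s arrangement).  */
static void facgt_4s (const float a[4], const float b[4], uint32_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = facgt_lane (a[i], b[i]);
}
```

Because the ABS is part of the comparison, a separate FABS before FCMGT is redundant. That is the saving the patch lets combine see.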
author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> 2023-05-15 12:05:35 +0100
committer: Kyrylo Tkachov <kyrylo.tkachov@arm.com> 2023-05-15 12:05:35 +0100
commit: c4733ea2b46278974f8d78a8afb379447cc38201
tree: 3724fdd0cd98c704a6dbb6a3aa124590a2cd334f /gcc
parent: 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba
2 files changed, 36 insertions, 0 deletions
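The NEG hunk in aarch64_rtx_costs below can be illustrated with a toy cost walker. The sketch is a hypothetical model, not GCC code: `toy_rtx`, `toy_cost_naive`, `toy_cost_patched` and the unit cost `TOY_VECT_ALU` are invented, and the model ignores the speed flag and the MODE_VECTOR_FLOAT check. The naive walk charges every node. The patched walk treats a NEG of a comparison as one instruction and looks through ABS on both arms.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature RTX: just enough codes for
   (neg (lt (abs reg) (abs reg))).  */
enum toy_code { TOY_REG, TOY_ABS, TOY_LT, TOY_GT, TOY_NEG };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;
  const struct toy_rtx *op1;
};

/* Made-up cost of one vector ALU instruction, standing in for
   extra_cost->vect.alu.  */
#define TOY_VECT_ALU 1

static bool toy_comparison_p (const struct toy_rtx *x)
{
  return x != NULL && (x->code == TOY_LT || x->code == TOY_GT);
}

/* Naive walk: charge every non-REG node as a separate instruction and
   recurse into all operands.  */
static int toy_cost_naive (const struct toy_rtx *x)
{
  if (x == NULL || x->code == TOY_REG)
    return 0;
  return TOY_VECT_ALU + toy_cost_naive (x->op0) + toy_cost_naive (x->op1);
}

/* Patched walk, mirroring the shape of the new NEG case: a NEG of a
   comparison is one instruction, and when both arms are ABS
   (FACGE/FACGT) the ABS nodes are looked through rather than costed.  */
static int toy_cost_patched (const struct toy_rtx *x)
{
  if (x == NULL || x->code == TOY_REG)
    return 0;
  if (x->code == TOY_NEG && toy_comparison_p (x->op0))
    {
      const struct toy_rtx *a = x->op0->op0;
      const struct toy_rtx *b = x->op0->op1;
      if (a->code == TOY_ABS && b->code == TOY_ABS)
	{
	  a = a->op0;
	  b = b->op0;
	}
      return TOY_VECT_ALU + toy_cost_patched (a) + toy_cost_patched (b);
    }
  return TOY_VECT_ALU + toy_cost_patched (x->op0) + toy_cost_patched (x->op1);
}
```

For (neg (lt (abs r0) (abs r1))), the naive walk charges four operations. The patched walk charges one, which matches the single FACGT the pattern becomes.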
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 18dab2a..29dbacf 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14081,6 +14081,27 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED,
 
       if (VECTOR_MODE_P (mode))
 	{
+	  /* Many vector comparison operations are represented as NEG
+	     of a comparison.  */
+	  if (COMPARISON_P (op0))
+	    {
+	      rtx op00 = XEXP (op0, 0);
+	      rtx op01 = XEXP (op0, 1);
+	      machine_mode inner_mode = GET_MODE (op00);
+	      /* FACGE/FACGT.  */
+	      if (GET_MODE_CLASS (inner_mode) == MODE_VECTOR_FLOAT
+		  && GET_CODE (op00) == ABS
+		  && GET_CODE (op01) == ABS)
+		{
+		  op00 = XEXP (op00, 0);
+		  op01 = XEXP (op01, 0);
+		}
+	      *cost += rtx_cost (op00, inner_mode, GET_CODE (op0), 0, speed);
+	      *cost += rtx_cost (op01, inner_mode, GET_CODE (op0), 1, speed);
+	      if (speed)
+		*cost += extra_cost->vect.alu;
+	      return true;
+	    }
 	  if (speed)
 	    {
 	      /* FNEG.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/facg_1.c b/gcc/testsuite/gcc.target/aarch64/facg_1.c
new file mode 100644
index 0000000..6c17fb6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/facg_1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include <arm_neon.h>
+
+int g(uint32x4_t, uint32x4_t);
+
+int foo (float32x4_t x, float32x4_t a, float32x4_t b)
+{
+  return g(vcagtq_f32 (x, a), vcagtq_f32 (x, b));
+}
+
+/* { dg-final { scan-assembler-times {facgt\tv[0-9]+\.4s, v[0-9]+\.4s, v[0-9]+\.4s} 2 } } */
+/* { dg-final { scan-assembler-not {\tfabs\t} } } */
+