author    Tamar Christina <tamar.christina@arm.com>  2023-12-29 15:58:29 +0000
committer Tamar Christina <tamar.christina@arm.com>  2023-12-29 15:58:29 +0000
commit    984bdeaa39b6417b11736b2c167ef82119e272dc (patch)
tree      47e3b217a185bf2b03545201b30bc57099e5cd7a
parent    748a4e9069e41bec1f27aa369c19f9756d8c1494 (diff)
AArch64: Update costing for vector conversions [PR110625]
In gimple the operation
short _8;
double _9;
_9 = (double) _8;
denotes two operations on AArch64. First we have to widen from short to
long and then convert this integer to a double.
Currently however we only count the widen/truncate operations:
(double) _5 6 times vec_promote_demote costs 12 in body
(double) _5 12 times vec_promote_demote costs 24 in body
but not the actual conversion operation, which needs an additional 12
instructions in the attached testcase. Without this the attached testcase ends
up incorrectly thinking that it's beneficial to vectorize the loop at a very
high VF = 8 (4x unrolled).
Because we can't change the mid-end to account for this, the costing code in the
backend now keeps track of whether the previous operation was a
promotion/demotion and adjusts the expected number of instructions as follows:
1. If it's the first FLOAT_EXPR and the precision of the lhs and rhs are
different, double it, since we need to convert and promote.
2. If the previous operation was a demotion/promotion then reduce the
cost of the current operation by the amount we added extra in the last one.
With the patch we get:
(double) _5 6 times vec_promote_demote costs 24 in body
(double) _5 12 times vec_promote_demote costs 36 in body
which correctly accounts for 30 operations.
This fixes the 16% regression in imagick in SPECCPU 2017 reported on Neoverse N2
and using the new generic Armv9-a cost model.
gcc/ChangeLog:
PR target/110625
* config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost):
Adjust throughput and latency calculations for vector conversions.
(class aarch64_vector_costs): Add m_num_last_promote_demote.
gcc/testsuite/ChangeLog:
PR target/110625
* gcc.target/aarch64/pr110625_4.c: New test.
* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Add
--param aarch64-sve-compare-costs=0.
* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
-rw-r--r--  gcc/config/aarch64/aarch64.cc                                 | 32
-rw-r--r--  gcc/testsuite/gcc.target/aarch64/pr110625_4.c                 | 18
-rw-r--r--  gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c   |  2
-rw-r--r--  gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c |  2
4 files changed, 52 insertions, 2 deletions
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f985032..9858de6 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16077,6 +16077,15 @@ private:
      leaving a vectorization of { elts }.  */
   bool m_stores_to_vector_load_decl = false;
 
+  /* Non-zero if the last operation we costed is a vector promotion or demotion.
+     In this case the value is the number of insns in the last operation.
+
+     On AArch64 vector promotion and demotions require us to first widen or
+     narrow the input and only after that emit conversion instructions.  For
+     costing this means we need to emit the cost of the final conversions as
+     well.  */
+  unsigned int m_num_last_promote_demote = 0;
+
   /* - If M_VEC_FLAGS is zero then we're costing the original scalar code.
      - If M_VEC_FLAGS & VEC_ADVSIMD is nonzero then we're costing Advanced
        SIMD code.
@@ -17132,6 +17141,29 @@ aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
     stmt_cost = aarch64_sve_adjust_stmt_cost (m_vinfo, kind, stmt_info,
 					      vectype, stmt_cost);
 
+  /* Vector promotion and demotion requires us to widen the operation first
+     and only after that perform the conversion.  Unfortunately the mid-end
+     expects this to be doable as a single operation and doesn't pass on
+     enough context here for us to tell which operation is happening.  To
+     account for this we count every promote-demote operation twice and if
+     the previously costed operation was also a promote-demote we reduce
+     the cost of the currently being costed operation to simulate the final
+     conversion cost.  Note that for SVE we can do better here if the converted
+     value comes from a load since the widening load would consume the widening
+     operations.  However since we're in stage 3 we can't change the helper
+     vect_is_extending_load and duplicating the code seems not useful.  */
+  gassign *assign = NULL;
+  if (kind == vec_promote_demote
+      && (assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info)))
+      && gimple_assign_rhs_code (assign) == FLOAT_EXPR)
+    {
+      auto new_count = count * 2 - m_num_last_promote_demote;
+      m_num_last_promote_demote = count;
+      count = new_count;
+    }
+  else
+    m_num_last_promote_demote = 0;
+
   if (stmt_info && aarch64_use_new_vector_costs_p ())
     {
       /* Account for any extra "embedded" costs that apply additively
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625_4.c b/gcc/testsuite/gcc.target/aarch64/pr110625_4.c
new file mode 100644
index 0000000..34dac19
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625_4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mcpu=neoverse-n2 -fdump-tree-vect-details" } */
+
+typedef struct {
+  short blue, green, red, opacity;
+} Pixel;
+
+double foo (long n, double *k, Pixel *k_pixels) {
+  double result_2, result_1, result_0;
+  for (; n; n++, k--) {
+    result_0 += *k * k_pixels[n].red;
+    result_1 += *k * k_pixels[n].green;
+    result_2 += *k * k_pixels[n].blue;
+  }
+  return result_0 + result_1 + result_2;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c b/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c
index 0f96dc2..4c5e886 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fno-inline" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline --param aarch64-sve-compare-costs=0" } */
 
 #include <stdint.h>
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c b/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c
index 70465f9..3ff2bd1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize --param aarch64-sve-compare-costs=0" } */
 
 #include <stdint.h>