[AArch64] Enable VECT_COMPARE_COSTS by default for SVE

This patch enables VECT_COMPARE_COSTS by default for SVE, both so that
we can compare SVE against Advanced SIMD and so that (with future patches)
we can compare multiple SVE vectorisation approaches against each other.
It also adds a target-specific --param to control this.

2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/aarch64.opt (--param=aarch64-sve-compare-costs):
	New option.
	* doc/invoke.texi: Document it.
	* config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
	By default, return VECT_COMPARE_COSTS for SVE.

gcc/testsuite/
	* gcc.target/aarch64/sve/reduc_3.c: Split multi-vector cases out
	into...
	* gcc.target/aarch64/sve/reduc_3_costly.c: ...this new test,
	passing -fno-vect-cost-model for them.
	* gcc.target/aarch64/sve/slp_6.c: Add -fno-vect-cost-model.
	* gcc.target/aarch64/sve/slp_7.c,
	* gcc.target/aarch64/sve/slp_7_run.c: Split multi-vector cases out
	into...
	* gcc.target/aarch64/sve/slp_7_costly.c,
	* gcc.target/aarch64/sve/slp_7_costly_run.c: ...these new tests,
	passing -fno-vect-cost-model for them.
	* gcc.target/aarch64/sve/while_7.c: Add -fno-vect-cost-model.
	* gcc.target/aarch64/sve/while_9.c: Likewise.

From-SVN: r278337
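The hook change itself is small. As a hedged, standalone model (not GCC code), the flag logic added to aarch64_autovectorize_vector_modes can be sketched as below. The SKETCH_VECT_COMPARE_COSTS name and its bit value are assumptions, and the two parameters stand in for TARGET_SVE and for the variable behind --param=aarch64-sve-compare-costs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's VECT_COMPARE_COSTS (target.h); the
   bit value chosen here is an assumption for this sketch.  */
#define SKETCH_VECT_COMPARE_COSTS (1U << 0)

/* Models the new tail of aarch64_autovectorize_vector_modes.
   TARGET_SVE_ENABLED stands in for TARGET_SVE, and SVE_COMPARE_COSTS for
   the value of --param=aarch64-sve-compare-costs (default 1).  */
static unsigned int
sketch_vector_mode_flags (bool target_sve_enabled, int sve_compare_costs)
{
  /* The .opt entry restricts the --param to IntegerRange(0, 1).  */
  assert (sve_compare_costs == 0 || sve_compare_costs == 1);

  unsigned int flags = 0;
  /* Only request cost comparison when SVE is available; for Advanced SIMD
     alone the first mode that works is taken to be the best.  */
  if (target_sve_enabled && sve_compare_costs)
    flags |= SKETCH_VECT_COMPARE_COSTS;
  return flags;
}
```

For example, sketch_vector_mode_flags (true, 1) yields SKETCH_VECT_COMPARE_COSTS, while sketch_vector_mode_flags (true, 0), the analogue of passing --param=aarch64-sve-compare-costs=0, yields 0.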
author: Richard Sandiford <richard.sandiford@arm.com> 2019-11-16 10:43:52 +0000
committer: Richard Sandiford <rsandifo@gcc.gnu.org> 2019-11-16 10:43:52 +0000
commit: eb23241ba81aace0c881ccee4643632809741953 (patch)
tree: 7eec28a84e7f30c72a362aff3439e15219dcea7e /gcc
parent: bcc7e346bf9b5dc77797ea949d6adc740deb30ca (diff)
download: gcc-eb23241ba81aace0c881ccee4643632809741953.zip
gcc-eb23241ba81aace0c881ccee4643632809741953.tar.gz
gcc-eb23241ba81aace0c881ccee4643632809741953.tar.bz2
14 files changed, 154 insertions, 36 deletions
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f809def..2eee46b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,13 @@
 2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>
 
+	* config/aarch64/aarch64.opt (--param=aarch64-sve-compare-costs):
+	New option.
+	* doc/invoke.texi: Document it.
+	* config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
+	By default, return VECT_COMPARE_COSTS for SVE.
+
+2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>
+
 	* target.h (VECT_COMPARE_COSTS): New constant.
 	* target.def (autovectorize_vector_modes): Return a bitmask of flags.
 	* doc/tm.texi: Regenerate.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e2251a2..9ffe213 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15962,7 +15962,15 @@ aarch64_autovectorize_vector_modes (vector_modes *modes, bool)
      for this case.  */
   modes->safe_push (V2SImode);
 
-  return 0;
+  unsigned int flags = 0;
+  /* Consider enabling VECT_COMPARE_COSTS for SVE, both so that we
+     can compare SVE against Advanced SIMD and so that we can compare
+     multiple SVE vectorization approaches against each other.  There's
+     not really any point doing this for Advanced SIMD only, since the
+     first mode that works should always be the best.  */
+  if (TARGET_SVE && aarch64_sve_compare_costs)
+    flags |= VECT_COMPARE_COSTS;
+  return flags;
 }
 
 /* Implement TARGET_MANGLE_TYPE.  */
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index fc43428..3b675e1 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -258,3 +258,7 @@ long aarch64_stack_protector_guard_offset = 0
 moutline-atomics
 Target Report Mask(OUTLINE_ATOMICS) Save
 Generate local calls to out-of-line atomic operations.
+
+-param=aarch64-sve-compare-costs=
+Target Joined UInteger Var(aarch64_sve_compare_costs) Init(1) IntegerRange(0, 1) Param
+When vectorizing for SVE, consider using unpacked vectors for smaller elements and use the cost model to pick the cheapest approach.  Also use the cost model to choose between SVE and Advanced SIMD vectorization.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fe79ca2..7f19d67 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11179,8 +11179,8 @@ without notice in future releases.
 In order to get minimal, maximal and default value of a parameter,
 one can use @option{--help=param -Q} options.
 
-In each case, the @var{value} is an integer.  The allowable choices for
-@var{name} are:
+In each case, the @var{value} is an integer.  The following choices
+of @var{name} are recognized for all targets:
 
 @table @gcctabopt
 @item predictable-branch-outcome
@@ -12396,6 +12396,20 @@ statements or when determining their validity prior to issuing
 diagnostics.
 
 @end table
+
+The following choices of @var{name} are available on AArch64 targets:
+
+@table @gcctabopt
+@item aarch64-sve-compare-costs
+When vectorizing for SVE, consider using ``unpacked'' vectors for
+smaller elements and use the cost model to pick the cheapest approach.
+Also use the cost model to choose between SVE and Advanced SIMD vectorization.
+
+Using unpacked vectors includes storing smaller elements in larger
+containers and accessing elements with extending loads and truncating
+stores.
+@end table
+
 @end table
 
 @node Instrumentation Options
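The documentation says the cost model is used "to pick the cheapest approach", and the aarch64.c comment notes that without cost comparison the first mode that works should win. The difference between those two selection policies can be sketched in C as follows. The struct candidate type, the choose_candidate function and the integer costs are hypothetical; GCC's loop vectorizer performs a more involved comparison internally, so this only illustrates the policy, not the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one vectorization attempt.  MODE_NAME is only a
   label; COST is an illustrative scalar estimate, not a GCC cost.  */
struct candidate
{
  const char *mode_name;
  bool vectorizes;   /* did this mode/approach vectorize the loop?  */
  unsigned int cost; /* estimated cost if it did */
};

/* Return the index of the chosen candidate, or -1 if none vectorizes.
   COMPARE_COSTS false: take the first candidate that works.
   COMPARE_COSTS true:  try every candidate and keep the cheapest
   (ties keep the earlier candidate).  */
static int
choose_candidate (const struct candidate *cands, size_t n,
                  bool compare_costs)
{
  assert (cands != NULL || n == 0);
  int best = -1;
  for (size_t i = 0; i < n; ++i)
    {
      if (!cands[i].vectorizes)
        continue;
      if (!compare_costs)
        return (int) i;
      if (best < 0 || cands[i].cost < cands[best].cost)
        best = (int) i;
    }
  return best;
}
```

With made-up candidates { VNx4SI at cost 10, V4SI at cost 6 }, the first-that-works policy picks VNx4SI, whereas the cost-comparing policy picks V4SI.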
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index daac270..4274edd 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,21 @@
 2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>
 
+	* gcc.target/aarch64/sve/reduc_3.c: Split multi-vector cases out
+	into...
+	* gcc.target/aarch64/sve/reduc_3_costly.c: ...this new test,
+	passing -fno-vect-cost-model for them.
+	* gcc.target/aarch64/sve/slp_6.c: Add -fno-vect-cost-model.
+	* gcc.target/aarch64/sve/slp_7.c,
+	* gcc.target/aarch64/sve/slp_7_run.c: Split multi-vector cases out
+	into...
+	* gcc.target/aarch64/sve/slp_7_costly.c,
+	* gcc.target/aarch64/sve/slp_7_costly_run.c: ...these new tests,
+	passing -fno-vect-cost-model for them.
+	* gcc.target/aarch64/sve/while_7.c: Add -fno-vect-cost-model.
+	* gcc.target/aarch64/sve/while_9.c: Likewise.
+
+2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>
+
 	* gcc.dg/vect/bb-slp-4.c: Expect the block to be vectorized
 	with -fno-vect-cost-model.
 	* gcc.dg/vect/bb-slp-bool-1.c: New test.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/reduc_3.c b/gcc/testsuite/gcc.target/aarch64/sve/reduc_3.c
index 4561199..0fc193b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/reduc_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/reduc_3.c
@@ -17,7 +17,6 @@ void reduc_ptr_##DSTTYPE##_##SRCTYPE (DSTTYPE *restrict sum,	\
 
 REDUC_PTR (int8_t, int8_t)
 REDUC_PTR (int16_t, int16_t)
-
 REDUC_PTR (int32_t, int32_t)
 REDUC_PTR (int64_t, int64_t)
 
@@ -25,17 +24,6 @@ REDUC_PTR (_Float16, _Float16)
 REDUC_PTR (float, float)
 REDUC_PTR (double, double)
 
-/* Widening reductions.  */
-REDUC_PTR (int32_t, int8_t)
-REDUC_PTR (int32_t, int16_t)
-
-REDUC_PTR (int64_t, int8_t)
-REDUC_PTR (int64_t, int16_t)
-REDUC_PTR (int64_t, int32_t)
-
-REDUC_PTR (float, _Float16)
-REDUC_PTR (double, float)
-
 /* Float<>Int conversions */
 REDUC_PTR (_Float16, int16_t)
 REDUC_PTR (float, int32_t)
@@ -45,8 +33,14 @@ REDUC_PTR (int16_t, _Float16)
 REDUC_PTR (int32_t, float)
 REDUC_PTR (int64_t, double)
 
-/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 3 } } */
-/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 { xfail *-*-* } } } */
+/* We don't yet vectorize the int<-float cases.  */
+/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfaddv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 3 } } */
-/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/reduc_3_costly.c b/gcc/testsuite/gcc.target/aarch64/sve/reduc_3_costly.c
new file mode 100644
index 0000000..988459d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/reduc_3_costly.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint.h>
+
+#define NUM_ELEMS(TYPE) (32 / sizeof (TYPE))
+
+#define REDUC_PTR(DSTTYPE, SRCTYPE)				\
+void reduc_ptr_##DSTTYPE##_##SRCTYPE (DSTTYPE *restrict sum,	\
+				      SRCTYPE *restrict array,	\
+				      int count)		\
+{								\
+  *sum = 0;							\
+  for (int i = 0; i < count; ++i)				\
+    *sum += array[i];						\
+}
+
+/* Widening reductions.  */
+REDUC_PTR (int32_t, int8_t)
+REDUC_PTR (int32_t, int16_t)
+
+REDUC_PTR (int64_t, int8_t)
+REDUC_PTR (int64_t, int16_t)
+REDUC_PTR (int64_t, int32_t)
+
+REDUC_PTR (float, _Float16)
+REDUC_PTR (double, float)
+
+/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
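reduc_3_costly.c collects the widening reductions, such as summing int8_t elements into an int32_t. One way to vectorize such a loop is with the "unpacked" vectors described in the invoke.texi entry: each narrow element sits in a wider container and is read with an extending load. The scalar C emulation below is only a sketch of that idea, not of GCC's generated code; LANES = 4 (a 128-bit vector of 32-bit containers) and the function name are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed lane count: a 128-bit vector holds four 32-bit containers.
   128 bits is the SVE minimum; real vectors may be longer.  */
#define LANES 4

/* Scalar emulation of a widening int8_t -> int32_t sum reduction using
   unpacked vectors: each 8-bit element lives in a 32-bit container.  */
static int32_t
widening_sum_unpacked (const int8_t *array, size_t count)
{
  assert (array != NULL || count == 0);
  int32_t acc[LANES] = { 0 };   /* one 32-bit container per lane */
  for (size_t i = 0; i < count; i += LANES)
    for (size_t lane = 0; lane < LANES; ++lane)
      /* Predicated, sign-extending "load": lanes past the end of the
         array stay inactive, much as a WHILELO predicate would make them.  */
      if (i + lane < count)
        acc[lane] += (int32_t) array[i + lane];
  /* Final horizontal add across the lanes, analogous to UADDV.  */
  int32_t sum = 0;
  for (size_t lane = 0; lane < LANES; ++lane)
    sum += acc[lane];
  return sum;
}
```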
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_6.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_6.c
index 80fa350..44d1284 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/slp_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable -ffast-math" } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable -ffast-math -fno-vect-cost-model" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_7.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_7.c
index dbc32a4..1920720 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/slp_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_7.c
@@ -31,37 +31,27 @@ vec_slp_##TYPE (TYPE *restrict a, TYPE *restrict b, int n)	\
   T (uint16_t)					\
   T (int32_t)					\
   T (uint32_t)					\
-  T (int64_t)					\
-  T (uint64_t)					\
   T (_Float16)					\
-  T (float)					\
-  T (double)
+  T (float)
 
 TEST_ALL (VEC_PERM)
 
-/* We can't use SLP for the 64-bit loops, since the number of reduction
-   results might be greater than the number of elements in the vector.
-   Otherwise we have two loads per loop, one for the initial vector
-   and one for the loop body.  */
+/* We have two loads per loop, one for the initial vector and one for
+   the loop body.  */
 /* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */
 /* { dg-final { scan-assembler-times {\tld1h\t} 3 } } */
 /* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */
-/* { dg-final { scan-assembler-times {\tld4d\t} 3 } } */
 /* { dg-final { scan-assembler-not {\tld4b\t} } } */
 /* { dg-final { scan-assembler-not {\tld4h\t} } } */
 /* { dg-final { scan-assembler-not {\tld4w\t} } } */
-/* { dg-final { scan-assembler-not {\tld1d\t} } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b} 8 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h} 8 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s} 8 } } */
-/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 8 } } */
 /* { dg-final { scan-assembler-times {\tfaddv\th[0-9]+, p[0-7], z[0-9]+\.h} 4 } } */
 /* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s} 4 } } */
-/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 4 } } */
 
 /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b} 4 } } */
 /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h} 6 } } */
 /* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
 
 /* { dg-final { scan-assembler-not {\tuqdec} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_7_costly.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_7_costly.c
new file mode 100644
index 0000000..69c3319
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_7_costly.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)						\
+void __attribute__ ((noinline, noclone))			\
+vec_slp_##TYPE (TYPE *restrict a, TYPE *restrict b, int n)	\
+{								\
+  TYPE x0 = b[0];						\
+  TYPE x1 = b[1];						\
+  TYPE x2 = b[2];						\
+  TYPE x3 = b[3];						\
+  for (int i = 0; i < n; ++i)					\
+    {								\
+      x0 += a[i * 4];						\
+      x1 += a[i * 4 + 1];					\
+      x2 += a[i * 4 + 2];					\
+      x3 += a[i * 4 + 3];					\
+    }								\
+  b[0] = x0;							\
+  b[1] = x1;							\
+  b[2] = x2;							\
+  b[3] = x3;							\
+}
+
+#define TEST_ALL(T)				\
+  T (int64_t)					\
+  T (uint64_t)					\
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* We can't use SLP for the 64-bit loops, since the number of reduction
+   results might be greater than the number of elements in the vector.  */
+/* { dg-final { scan-assembler-times {\tld4d\t} 3 } } */
+/* { dg-final { scan-assembler-not {\tld1d\t} } } */
+/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 8 } } */
+/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 4 } } */
+
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
+
+/* { dg-final { scan-assembler-not {\tuqdec} } } */
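The comment in slp_7_costly.c says the 64-bit loops cannot use SLP because the number of reduction results might exceed the number of vector elements. The arithmetic behind "might" is sketched below: each loop has four results (x0 to x3), and SVE's minimum vector length of 128 bits holds only two 64-bit elements, whereas 8-, 16- and 32-bit elements always provide at least four lanes. The helper names are hypothetical.

```c
#include <assert.h>

/* SVE vector lengths are multiples of 128 bits, with 128 as the minimum.  */
#define SVE_MIN_VECTOR_BITS 128

/* Number of reduction results in each vec_slp_* loop (x0..x3).  */
#define NUM_RESULTS 4

/* Hypothetical helper: lanes available for ELEM_BITS-wide elements at the
   minimum SVE vector length.  */
static unsigned int
min_lanes (unsigned int elem_bits)
{
  assert (elem_bits > 0 && SVE_MIN_VECTOR_BITS % elem_bits == 0);
  return SVE_MIN_VECTOR_BITS / elem_bits;
}

/* Hypothetical helper: nonzero if the four reduction results fit in one
   vector for every legal SVE vector length.  */
static int
slp_results_always_fit (unsigned int elem_bits)
{
  return NUM_RESULTS <= min_lanes (elem_bits);
}
```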
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_7_costly_run.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_7_costly_run.c
new file mode 100644
index 0000000..fbf9432
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_7_costly_run.c
@@ -0,0 +1,5 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -fno-vect-cost-model" } */
+
+#define FILENAME "slp_7_costly.c"
+#include "slp_7_run.c"
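slp_7_costly_run.c reuses the run-time driver by defining FILENAME before including slp_7_run.c, which in its updated form includes "slp_7.c" only when FILENAME is not already defined. The default-and-override pattern is sketched below in a single file; to stay self-contained, the sketch uses FILENAME as a string instead of in `#include FILENAME`, and the helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Default used when no wrapper has defined FILENAME beforehand; a wrapper
   such as slp_7_costly_run.c would define it before this point.  */
#ifndef FILENAME
#define FILENAME "slp_7.c"
#endif

/* Compile-time check that the selected name is a non-empty string.  */
static_assert (sizeof (FILENAME) > 1, "FILENAME must not be empty");

/* Hypothetical helper: reports which test body the driver would use.  */
static const char *
selected_test_file (void)
{
  return FILENAME;
}

/* Hypothetical helper: nonzero if the costly variant was selected.  */
static int
uses_costly_variant (void)
{
  return strcmp (selected_test_file (), "slp_7_costly.c") == 0;
}
```

Compiled as-is, the default applies; compiling with -DFILENAME='"slp_7_costly.c"' would select the other name, which mirrors what the wrapper's #define achieves.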
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_7_run.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_7_run.c
index 3cc090d..7c0aa62 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/slp_7_run.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_7_run.c
@@ -1,7 +1,11 @@
 /* { dg-do run { target aarch64_sve_hw } } */
 /* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
 
-#include "slp_7.c"
+#ifndef FILENAME
+#define FILENAME "slp_7.c"
+#endif
+
+#include FILENAME
 
 #define N (54 * 4)
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/while_7.c b/gcc/testsuite/gcc.target/aarch64/sve/while_7.c
index d5ffb66..a66a20d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/while_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/while_7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable" } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable -fno-vect-cost-model" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/while_9.c b/gcc/testsuite/gcc.target/aarch64/sve/while_9.c
index 9a8e5fe..dd3f404 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/while_9.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/while_9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable" } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable -fno-vect-cost-model" } */
 
 #include <stdint.h>