vect+aarch64: Fix ldp_stp_* regressions

ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since vectorisation was enabled at -O2. In all three cases SLP is generating vector code when scalar code would be better. The problem is that the target costs do not model whether STP could be used for the scalar or vector code, so the normal latency-based costs for store-heavy code can be way off. It would be good to fix that “properly” at some point, but it isn't easy; see the existing discussion in aarch64_sve_adjust_stmt_cost for more details. This patch therefore adds an on-the-side check for whether the code is doing nothing more than set-up+stores. It then applies STP-based costs to those cases only, in addition to the normal latency-based costs. (That is, the vector code has to win on both counts rather than on one count individually.) However, at the moment, SLP costs one vector set-up instruction for every vector in an SLP node, even if the contents are the same as a previous vector in the same node. Fixing the STP costs without fixing that would regress other cases, tested in the patch. The patch therefore makes the SLP costing code check for duplicates within a node. Ideally we'd check for duplicates more globally, but that would require a more global approach to costs: the cost of an initialisation should be amoritised across all trees that use the initialisation, rather than fully counted against one arbitrarily-chosen subtree. Back on aarch64: an earlier version of the patch tried to apply the new heuristic to constant stores. However, that didn't work too well in practice; see the comments for details. The patch therefore just tests the status quo for constant cases, leaving out a match if the current choice is dubious. ldp_stp_5.c was affected by the same thing. The test would be worth vectorising if we generated better vector code, but: (1) We do a bad job of moving the { -1, 1 } constant, given that we have { -1, -1 } and { 1, 1 } to hand. (2) The vector code has 6 pairable stores to misaligned offsets. We have peephole patterns to handle such misalignment for 4 pairable stores, but not 6. So the SLP decision isn't wrong as such. It's just being let down by later codegen. The patch therefore adds -mstrict-align to preserve the original intention of the test while adding ldp_stp_19.c to check for the preferred vector code (XFAILed for now). gcc/ * tree-vectorizer.h (vect_scalar_ops_slice): New struct. (vect_scalar_ops_slice_hash): Likewise. (vect_scalar_ops_slice::op): New function. * tree-vect-slp.cc (vect_scalar_ops_slice::all_same_p): New function. (vect_scalar_ops_slice_hash::hash): Likewise. (vect_scalar_ops_slice_hash::equal): Likewise. (vect_prologue_cost_for_slp): Check for duplicate vectors. * config/aarch64/aarch64.cc (aarch64_vector_costs::m_stp_sequence_cost): New member variable. (aarch64_aligned_constant_offset_p): New function. (aarch64_stp_sequence_cost): Likewise. (aarch64_vector_costs::add_stmt_cost): Handle new STP heuristic. (aarch64_vector_costs::finish_cost): Likewise. gcc/testsuite/ * gcc.target/aarch64/ldp_stp_5.c: Require -mstrict-align. * gcc.target/aarch64/ldp_stp_14.h, * gcc.target/aarch64/ldp_stp_14.c: New test. * gcc.target/aarch64/ldp_stp_15.c: Likewise. * gcc.target/aarch64/ldp_stp_16.c: Likewise. * gcc.target/aarch64/ldp_stp_17.c: Likewise. * gcc.target/aarch64/ldp_stp_18.c: Likewise. * gcc.target/aarch64/ldp_stp_19.c: Likewise.
author: Richard Sandiford <richard.sandiford@arm.com> 2022-02-15 18:09:33 +0000
committer: Richard Sandiford <richard.sandiford@arm.com> 2022-02-15 18:09:33 +0000
commit: 4963079769c99c4073adfd799885410ad484cbbe (patch)
tree: ec53951399724d809c6296a30e7a06e81d3e72a8 /gcc/tree-vectorizer.h
parent: 63a9328cb8c601377fe73e214b708c4ae0441847 (diff)
download: gcc-4963079769c99c4073adfd799885410ad484cbbe.zip
gcc-4963079769c99c4073adfd799885410ad484cbbe.tar.gz
gcc-4963079769c99c4073adfd799885410ad484cbbe.tar.bz2
1 files changed, 35 insertions, 0 deletions
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index ec479d3..ddd0637 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -113,6 +113,41 @@ typedef hash_map<tree_operand_hash,
 		 std::pair<stmt_vec_info, innermost_loop_behavior *> >
 	  vec_base_alignments;
 
+/* Represents elements [START, START + LENGTH) of cyclical array OPS*
+   (i.e. OPS repeated to give at least START + LENGTH elements)  */
+struct vect_scalar_ops_slice
+{
+  tree op (unsigned int i) const;
+  bool all_same_p () const;
+
+  vec<tree> *ops;
+  unsigned int start;
+  unsigned int length;
+};
+
+/* Return element I of the slice.  */
+inline tree
+vect_scalar_ops_slice::op (unsigned int i) const
+{
+  return (*ops)[(i + start) % ops->length ()];
+}
+
+/* Hash traits for vect_scalar_ops_slice.  */
+struct vect_scalar_ops_slice_hash : typed_noop_remove<vect_scalar_ops_slice>
+{
+  typedef vect_scalar_ops_slice value_type;
+  typedef vect_scalar_ops_slice compare_type;
+
+  static const bool empty_zero_p = true;
+
+  static void mark_deleted (value_type &s) { s.length = ~0U; }
+  static void mark_empty (value_type &s) { s.length = 0; }
+  static bool is_deleted (const value_type &s) { return s.length == ~0U; }
+  static bool is_empty (const value_type &s) { return s.length == 0; }
+  static hashval_t hash (const value_type &);
+  static bool equal (const value_type &, const compare_type &);
+};
+
 /************************************************************************
   SLP
  ************************************************************************/
author	Richard Sandiford <richard.sandiford@arm.com>	2022-02-15 18:09:33 +0000
committer	Richard Sandiford <richard.sandiford@arm.com>	2022-02-15 18:09:33 +0000
commit	4963079769c99c4073adfd799885410ad484cbbe (patch)
tree	ec53951399724d809c6296a30e7a06e81d3e72a8 /gcc/tree-vectorizer.h
parent	63a9328cb8c601377fe73e214b708c4ae0441847 (diff)
download	gcc-4963079769c99c4073adfd799885410ad484cbbe.zip gcc-4963079769c99c4073adfd799885410ad484cbbe.tar.gz gcc-4963079769c99c4073adfd799885410ad484cbbe.tar.bz2