author | Richard Sandiford <richard.sandiford@arm.com> | 2022-02-15 18:09:33 +0000 |
---|---|---|
committer | Richard Sandiford <richard.sandiford@arm.com> | 2022-02-15 18:09:33 +0000 |
commit | 4963079769c99c4073adfd799885410ad484cbbe (patch) | |
tree | ec53951399724d809c6296a30e7a06e81d3e72a8 /gcc/tree-vectorizer.h | |
parent | 63a9328cb8c601377fe73e214b708c4ae0441847 (diff) | |
vect+aarch64: Fix ldp_stp_* regressions
ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
vectorisation was enabled at -O2. In all three cases SLP is
generating vector code when scalar code would be better.
The problem is that the target costs do not model whether STP could
be used for the scalar or vector code, so the normal latency-based
costs for store-heavy code can be way off. It would be good to fix
that “properly” at some point, but it isn't easy; see the existing
discussion in aarch64_sve_adjust_stmt_cost for more details.
This patch therefore adds an on-the-side check for whether the
code is doing nothing more than set-up+stores. It then applies
STP-based costs to those cases only, in addition to the normal
latency-based costs. (That is, the vector code has to win on
both counts rather than on one count individually.)
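
The "win on both counts" rule can be sketched as follows. This is only an illustration: the `costs` struct, the `prefer_vector` function and the use of strict comparisons are assumptions made for the example, not the code in aarch64.cc.

```cpp
// Hypothetical pair of costs for one candidate (scalar or vector).
struct costs
{
  unsigned int latency; // normal latency-based cost
  unsigned int stp;     // STP-based cost for set-up+store sequences
};

// Return true if the vector code should be preferred.  The latency-based
// comparison always applies.  When the code is nothing more than
// set-up+stores (STP_APPLIES), the vector code must also be cheaper on
// the STP-based cost.  Strict "<" is an assumption of this sketch.
constexpr bool
prefer_vector (const costs &scalar, const costs &vec, bool stp_applies)
{
  return (vec.latency < scalar.latency
          && (!stp_applies || vec.stp < scalar.stp));
}
```

With this shape, a vector sequence that looks cheaper by latency but needs more store instructions than the paired scalar stores is rejected, which is the situation described for the ldp_stp_* tests.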
However, at the moment, SLP costs one vector set-up instruction
for every vector in an SLP node, even if the contents are the
same as a previous vector in the same node. Fixing the STP costs
without fixing that would regress other cases, tested in the patch.
The patch therefore makes the SLP costing code check for duplicates
within a node. Ideally we'd check for duplicates more globally,
but that would require a more global approach to costs: the cost
of an initialisation should be amortised across all trees that
use the initialisation, rather than fully counted against one
arbitrarily-chosen subtree.
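
A minimal sketch of the per-node duplicate check, assuming scalar operands can be modelled as strings and using std::set in place of GCC's hash_set. The names `slice_contents` and `count_distinct_vectors` are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Return elements [START, START + LENGTH) of OPS, treating OPS as a
// cyclic array (i.e. repeated as often as needed).
inline std::vector<std::string>
slice_contents (const std::vector<std::string> &ops,
                std::size_t start, std::size_t length)
{
  std::vector<std::string> result;
  for (std::size_t i = 0; i < length; ++i)
    result.push_back (ops[(start + i) % ops.size ()]);
  return result;
}

// Count how many of the NVECTORS vectors in a node need their own
// set-up instruction, charging each distinct vector of NUNITS
// elements only once.  OPS must be non-empty.
inline std::size_t
count_distinct_vectors (const std::vector<std::string> &ops,
                        std::size_t nvectors, std::size_t nunits)
{
  assert (!ops.empty ());
  std::set<std::vector<std::string>> seen;
  for (std::size_t v = 0; v < nvectors; ++v)
    seen.insert (slice_contents (ops, v * nunits, nunits));
  return seen.size ();
}
```

For example, a node with operands { a, b } spread over three 4-element vectors yields { a, b, a, b } three times, so only one set-up would be costed in this model.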
Back on aarch64: an earlier version of the patch tried to apply
the new heuristic to constant stores. However, that didn't work
too well in practice; see the comments for details. The patch
therefore just tests the status quo for constant cases, leaving out
a match if the current choice is dubious.
ldp_stp_5.c was affected by the same thing. The test would be
worth vectorising if we generated better vector code, but:
(1) We do a bad job of moving the { -1, 1 } constant, given that
we have { -1, -1 } and { 1, 1 } to hand.
(2) The vector code has 6 pairable stores to misaligned offsets.
We have peephole patterns to handle such misalignment for
4 pairable stores, but not 6.
So the SLP decision isn't wrong as such. It's just being let
down by later codegen.
The patch therefore adds -mstrict-align to preserve the original
intention of the test while adding ldp_stp_19.c to check for the
preferred vector code (XFAILed for now).
gcc/
* tree-vectorizer.h (vect_scalar_ops_slice): New struct.
(vect_scalar_ops_slice_hash): Likewise.
(vect_scalar_ops_slice::op): New function.
* tree-vect-slp.cc (vect_scalar_ops_slice::all_same_p): New function.
(vect_scalar_ops_slice_hash::hash): Likewise.
(vect_scalar_ops_slice_hash::equal): Likewise.
(vect_prologue_cost_for_slp): Check for duplicate vectors.
* config/aarch64/aarch64.cc
(aarch64_vector_costs::m_stp_sequence_cost): New member variable.
(aarch64_aligned_constant_offset_p): New function.
(aarch64_stp_sequence_cost): Likewise.
(aarch64_vector_costs::add_stmt_cost): Handle new STP heuristic.
(aarch64_vector_costs::finish_cost): Likewise.
gcc/testsuite/
* gcc.target/aarch64/ldp_stp_5.c: Require -mstrict-align.
* gcc.target/aarch64/ldp_stp_14.h,
* gcc.target/aarch64/ldp_stp_14.c: New test.
* gcc.target/aarch64/ldp_stp_15.c: Likewise.
* gcc.target/aarch64/ldp_stp_16.c: Likewise.
* gcc.target/aarch64/ldp_stp_17.c: Likewise.
* gcc.target/aarch64/ldp_stp_18.c: Likewise.
* gcc.target/aarch64/ldp_stp_19.c: Likewise.
Diffstat (limited to 'gcc/tree-vectorizer.h')
-rw-r--r-- | gcc/tree-vectorizer.h | 35 |
1 files changed, 35 insertions, 0 deletions
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index ec479d3..ddd0637 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -113,6 +113,41 @@ typedef hash_map<tree_operand_hash,
 		 std::pair<stmt_vec_info, innermost_loop_behavior *> >
 	  vec_base_alignments;
 
+/* Represents elements [START, START + LENGTH) of cyclical array OPS*
+   (i.e. OPS repeated to give at least START + LENGTH elements)  */
+struct vect_scalar_ops_slice
+{
+  tree op (unsigned int i) const;
+  bool all_same_p () const;
+
+  vec<tree> *ops;
+  unsigned int start;
+  unsigned int length;
+};
+
+/* Return element I of the slice.  */
+inline tree
+vect_scalar_ops_slice::op (unsigned int i) const
+{
+  return (*ops)[(i + start) % ops->length ()];
+}
+
+/* Hash traits for vect_scalar_ops_slice.  */
+struct vect_scalar_ops_slice_hash : typed_noop_remove<vect_scalar_ops_slice>
+{
+  typedef vect_scalar_ops_slice value_type;
+  typedef vect_scalar_ops_slice compare_type;
+
+  static const bool empty_zero_p = true;
+
+  static void mark_deleted (value_type &s) { s.length = ~0U; }
+  static void mark_empty (value_type &s) { s.length = 0; }
+  static bool is_deleted (const value_type &s) { return s.length == ~0U; }
+  static bool is_empty (const value_type &s) { return s.length == 0; }
+  static hashval_t hash (const value_type &);
+  static bool equal (const value_type &, const compare_type &);
+};
+
 /************************************************************************
   SLP
  ************************************************************************/
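
The header only declares all_same_p; its definition lives in tree-vect-slp.cc and is not part of this diff. The standalone sketch below shows one plausible reading of it, together with the sentinel encoding that the hash traits rely on. It uses std::vector<int> in place of vec<tree>, and the `slice` type and free functions are stand-ins for illustration, not GCC code.

```cpp
#include <cassert>
#include <vector>

// Stand-in for vect_scalar_ops_slice, using ints instead of trees.
struct slice
{
  const std::vector<int> *ops;
  unsigned int start;
  unsigned int length;

  // Element I of the slice, wrapping around the end of OPS.
  int op (unsigned int i) const
  {
    assert (!ops->empty ());
    return (*ops)[(i + start) % ops->size ()];
  }

  // Assumed meaning of all_same_p: every element equals the first.
  bool all_same_p () const
  {
    for (unsigned int i = 1; i < length; ++i)
      if (op (i) != op (0))
        return false;
    return true;
  }
};

// Sentinel encoding mirroring the traits in the diff: a length of 0
// marks an empty hash slot and a length of ~0U marks a deleted one,
// so a real slice is expected to have a length between the two.
inline void mark_empty (slice &s) { s.length = 0; }
inline void mark_deleted (slice &s) { s.length = ~0U; }
inline bool is_empty (const slice &s) { return s.length == 0; }
inline bool is_deleted (const slice &s) { return s.length == ~0U; }
```

Reusing the length field for both sentinels means the table needs no extra per-slot flag, at the cost of reserving those two length values.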