author | Tamar Christina <tamar.christina@arm.com> | 2021-02-24 09:43:22 +0000 |
---|---|---|
committer | Tamar Christina <tamar.christina@arm.com> | 2021-02-24 09:43:22 +0000 |
commit | 5296bd57d0605d1fec900d85e3ab3875197e609a (patch) | |
tree | 73048a53e576417be37cf2b5c633efd5c2a27c1d /gcc/tree-vectorizer.c | |
parent | 66e070b00ff327df3d7515222a4d85105c883d4a (diff) | |
slp: fix sharing of SLP only patterns.
The attached testcase ICEs due to a couple of issues.
In the testcase you have two SLP instances that share the majority of their
definition with each other. One tree defines a COMPLEX_MUL sequence and the
other tree a COMPLEX_FMA.
The ICE happens because:
1. the refcounts are wrong, in particular the FMA case doesn't correctly count
the references for the COMPLEX_MUL that it consumes.
2. when the FMA is created it incorrectly assumes it can just tear apart the MUL
node that it's consuming. This is wrong and should only be done when there are
no more uses of the node, in which case the vector only pattern is no longer
relevant.
To fix the last part the SLP only pattern reset code was moved into
vect_free_slp_tree which results in cleaner code. I also think it belongs
there since that function knows when there are no more uses of the node and so
when the pattern should be unmarked. That way, when the vectorizer is inspecting
the BB it doesn't find the now invalid vector only patterns.
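
The refcounting rule described above can be illustrated with a minimal, self-contained sketch. This is not GCC code: `StmtInfo`, `Node`, `share`, `release` and `make_consumer` are hypothetical stand-ins for stmt_vec_info, slp_tree and the pattern build/free paths, and the flag only loosely mirrors STMT_VINFO_SLP_VECT_ONLY_PATTERN. It shows a consumer (the FMA) taking its own reference on a shared operand (the MUL), and the pattern marking being undone only when the last reference is dropped.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a statement's vectorizer info.
struct StmtInfo {
  // Loosely mirrors STMT_VINFO_SLP_VECT_ONLY_PATTERN (assumption).
  bool slp_vect_only_pattern = false;
};

// Hypothetical stand-in for an SLP tree node.
struct Node {
  unsigned refcnt = 1;                 // the creator holds one reference
  StmtInfo *representative = nullptr;  // not owned by the node
  std::vector<Node *> children;
};

// Take an extra reference on a node that another tree also uses.
inline Node *share(Node *n) {
  ++n->refcnt;
  return n;
}

// Drop one reference. Only when the last reference goes away is the
// pattern marking undone and the node freed.
inline void release(Node *n) {
  if (!n)
    return;
  assert(n->refcnt > 0);
  if (--n->refcnt != 0)
    return;  // still used elsewhere: leave the pattern alone
  for (Node *child : n->children)
    release(child);
  if (n->representative)
    n->representative->slp_vect_only_pattern = false;
  delete n;
}

// Build a consumer (e.g. an FMA) on top of an operand (e.g. a MUL).
// The consumer counts its own use of the operand instead of tearing
// the operand apart.
inline Node *make_consumer(Node *operand, StmtInfo *rep) {
  Node *n = new Node;
  n->representative = rep;
  n->children.push_back(share(operand));
  return n;
}
```

In this sketch, releasing the consumer while the operand is still referenced by another tree leaves the operand's pattern flag set. The flag is cleared only by the final `release` of the operand.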
The patch also clears the SLP_TREE_REPRESENTATIVE when stores are removed such
that we don't hit an error later trying to free the stmt_vec_info again.
Lastly it also tweaks the results of whether a pattern has been detected or not
to return true when another SLP instance has created a pattern that is only used
by a different instance (due to the trees being unshared).
Instead of ICEing, this code now produces:
adrp x1, .LANCHOR0
add x2, x1, :lo12:.LANCHOR0
movi v1.2s, 0
mov w0, 0
ldr x4, [x1, #:lo12:.LANCHOR0]
ldrsw x3, [x2, 16]
ldr x1, [x2, 8]
ldrsw x2, [x2, 20]
ldr d0, [x4]
ldr d2, [x1, x3, lsl 3]
fcmla v2.2s, v0.2s, v0.2s, #0
fcmla v2.2s, v0.2s, v0.2s, #90
str d2, [x1, x3, lsl 3]
fcmla v1.2s, v0.2s, v0.2s, #0
fcmla v1.2s, v0.2s, v0.2s, #90
str d1, [x1, x2, lsl 3]
ret
PS. This testcase actually shows that the codegen we get in these cases is not
optimal. It should generate a MUL + ADD instead of MUL + FMA.
But that's for GCC 12.
gcc/ChangeLog:
PR tree-optimization/99149
* tree-vect-slp-patterns.c (vect_detect_pair_op): Don't recreate the
buffer.
(vect_slp_reset_pattern): Remove.
(complex_fma_pattern::matches): Remove call to vect_slp_reset_pattern.
(complex_mul_pattern::build, complex_fma_pattern::build,
complex_fms_pattern::build): Fix ref counts.
* tree-vect-slp.c (vect_free_slp_tree): Undo SLP only pattern relevancy
when node is being deleted.
(vect_match_slp_patterns_2): Correct result of cache hit on patterns.
(vect_schedule_slp): Invalidate SLP_TREE_REPRESENTATIVE of removed
stores.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize value.
gcc/testsuite/ChangeLog:
PR tree-optimization/99149
* g++.dg/vect/pr99149.cc: New test.
Diffstat (limited to 'gcc/tree-vectorizer.c')
-rw-r--r-- | gcc/tree-vectorizer.c | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 5b45df3..63ba594 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -695,6 +695,7 @@ vec_info::new_stmt_vec_info (gimple *stmt)
   STMT_VINFO_REDUC_FN (res) = IFN_LAST;
   STMT_VINFO_REDUC_IDX (res) = -1;
   STMT_VINFO_SLP_VECT_ONLY (res) = false;
+  STMT_VINFO_SLP_VECT_ONLY_PATTERN (res) = false;
   STMT_VINFO_VEC_STMTS (res) = vNULL;
 
   if (is_a <loop_vec_info> (this)