path: root/gcc/tree-vect-loop.cc
author Richard Biener <rguenther@suse.de> 2025-02-12 14:18:06 +0100
committer Richard Biener <rguenth@gcc.gnu.org> 2025-02-14 08:28:50 +0100
commit 27653070db35216d5115cc25672fcc6a51203d26
tree 902b096ac125b500960271a4536e1705f91955f5 /gcc/tree-vect-loop.cc
parent 8caf67eea7e1b29a4437f07d13c300d9fdb04827
tree-optimization/90579 - avoid STLF fail by better optimizing

For the testcase in question, which uses a fold-left vectorized
reduction of a reverse iterating loop, we'd need two forwprop
invocations: the first to bypass the permute emitted for the reverse
iterating loop, and the second to decompose the vector load that only
feeds element extracts. The following moves the first transform to a
match.pd pattern and makes sure we fold the element extracts when the
vectorizer emits them, so the single forwprop pass can then pick up
the vector load decomposition, avoiding the store-to-load forwarding
(STLF) failure that the vector load causes.

Moving simplify_bitfield_ref also makes forwprop remove the dead
VEC_PERM_EXPR via the simple-dce it uses; this was also previously
missing.

	PR tree-optimization/90579
	* tree-ssa-forwprop.cc (simplify_bitfield_ref): Move to match.pd.
	(pass_forwprop::execute): Adjust.
	* match.pd (bit_field_ref (vec_perm ...)): New pattern modeled
	after simplify_bitfield_ref.
	* tree-vect-loop.cc (vect_expand_fold_left): Fold the element
	extract stmt, combining it with the vector def.

	* gcc.target/i386/pr90579.c: New testcase.
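
For orientation, the loop shape the message describes can be sketched roughly as follows. This is a hypothetical illustration, not the contents of gcc.target/i386/pr90579.c; the function name `reverse_sum` and its parameters are made up for the example. The point is only that a floating-point sum without reassociation must add in source order, and here that order runs from the last element to the first.

```cpp
#include <cstddef>

// Hypothetical example, not the PR90579 testcase: a strict in-order
// floating-point sum over an array walked from the last element to the
// first. Without reassociation flags the additions must happen in exactly
// this order, which is the situation a fold-left reduction handles.
double reverse_sum(const double *r, std::size_t n)
{
    double t = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        t += r[n - 1 - i];
    return t;
}
```

When such a loop is vectorized, each vector of elements is loaded in memory order, reversed with a permute, and then consumed lane by lane, which is where the element extracts mentioned above come from.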
Diffstat (limited to 'gcc/tree-vect-loop.cc')
-rw-r--r-- gcc/tree-vect-loop.cc 5
1 files changed, 5 insertions, 0 deletions
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index eea0b89..07b19a2 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7086,6 +7086,11 @@ vect_expand_fold_left (gimple_stmt_iterator *gsi, tree scalar_dest,
       rhs = make_ssa_name (scalar_dest, stmt);
       gimple_assign_set_lhs (stmt, rhs);
       gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+      /* Fold the vector extract, combining it with a previous reversal
+         like seen in PR90579.  */
+      auto gsi2 = gsi_for_stmt (stmt);
+      if (fold_stmt (&gsi2, follow_all_ssa_edges))
+        update_stmt (gsi_stmt (gsi2));

       stmt = gimple_build_assign (scalar_dest, code, lhs, rhs);
       tree new_name = make_ssa_name (scalar_dest, stmt);
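
The effect of folding the extract through a preceding reversal can be modeled outside GCC as below. This is a simplified sketch under stated assumptions, not GCC code: `Vec`, `Sel`, `permute`, `extract_after_permute`, `extract_folded` and `fold_left_reversed` are hypothetical names, the vector width is fixed at 4 lanes, and the permute takes a single input, whereas VEC_PERM_EXPR takes two inputs and a selector. It shows that extracting lane i of a permuted vector gives the same value as reading lane sel[i] of the original, so the permuted vector never needs to be built.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 4;            // assumed lane count for the model
using Vec = std::array<double, N>;
using Sel = std::array<std::size_t, N>;

// Model of a single-input permute: result[i] = v[sel[i]].
Vec permute(const Vec &v, const Sel &sel)
{
    Vec out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = v[sel[i]];
    return out;
}

// Unfolded form: materialize the permuted vector, then extract lane i.
double extract_after_permute(const Vec &v, const Sel &sel, std::size_t i)
{
    return permute(v, sel)[i];
}

// Folded form: read lane sel[i] of the original vector directly.
double extract_folded(const Vec &v, const Sel &sel, std::size_t i)
{
    return v[sel[i]];
}

// In-order (fold-left) accumulation over the lanes of the reversed vector,
// using the folded extracts so no reversed vector is ever built.
double fold_left_reversed(double init, const Vec &v)
{
    const Sel reverse = {3, 2, 1, 0};
    double acc = init;
    for (std::size_t i = 0; i < N; ++i)
        acc += extract_folded(v, reverse, i);
    return acc;
}
```

In the folded form every extract reads straight from the loaded vector, which is what lets a later pass replace that vector load with scalar loads.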