author Richard Sandiford <richard.sandiford@arm.com> 2025-02-06 10:30:53 +0000
committer Richard Sandiford <richard.sandiford@arm.com> 2025-02-06 10:30:53 +0000
commit 7eb260c8a472568912c1e0b83fb402d22977281e
tree 5e0c08abb5b10fcd26f70d8bcd00d4d21c631914 /gcc/tree-vect-loop.cc
parent 677122c9df1b55a791a54426269f7a8ce794f947
vect: Move induction IV increments [PR110449]
In this PR, we used to generate:

.L6:
        mov     v30.16b, v31.16b
        fadd    v31.4s, v31.4s, v27.4s
        fadd    v29.4s, v30.4s, v28.4s
        stp     q30, q29, [x0]
        add     x0, x0, 32
        cmp     x1, x0
        bne     .L6

for an unrolled induction in:

  for (int i = 0; i < 1024; i++)
    {
      arr[i] = freq;
      freq += step;
    }

with the problem being the unnecessary MOV.

The main induction IV was incremented by VF * step == 2 * nunits * step,
and then nunits * step was added for the second store to arr.

The original patch for the PR (r14-2367-g224fd59b2dc8) avoided the MOV
by incrementing the IV by nunits * step twice.  The problem with that
approach is that it doubles the loop-carried latency.  This change was
deliberately not preserved when moving from loop-vect to SLP and so
the test started failing again after r15-3509-gd34cda720988.

I think the main problem is that we put the IV increment in the wrong
place.  Normal IVs created by create_iv are placed before the exit
condition where possible, but vectorizable_induction instead always
inserted them at the start of the loop body.  The only use of the
incremented IV is by the phi node, so the effect is to make both the
old and new IV values live for the whole loop body, which is why
we need the MOV.

The simplest fix therefore seems to be to reuse the create_iv logic.

gcc/
	PR tree-optimization/110449
	* tree-ssa-loop-manip.h (insert_iv_increment): Declare.
	* tree-ssa-loop-manip.cc (insert_iv_increment): New function,
	split out from...
	(create_iv): ...here and generalized to gimple_seqs.
	* tree-vect-loop.cc (vectorizable_induction): Use
	standard_iv_increment_position and insert_iv_increment
	to insert the IV increment.

gcc/testsuite/
	PR tree-optimization/110449
	* gcc.target/aarch64/pr110449.c: Expect an increment by 8.0,
	but test that there is no MOV.
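To make the unrolled induction scheme concrete, here is a scalar C model of the vector loop. It is a minimal sketch, not GCC code: it assumes 4 float lanes per vector (nunits = 4) and an unroll factor of 2, so VF = 8, and the names NUNITS, VF, scalar_induction and vector_induction_model are illustrative only. Each "vector" is a plain array. The main IV holds {freq, freq+step, freq+2*step, freq+3*step}, the second store uses IV + nunits * step, and the IV advances by VF * step once per vector iteration, after its last use in the body.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative assumptions: 4 float lanes per vector and an unroll
   factor of 2, giving VF = 2 * nunits = 8 scalar iterations per
   vector iteration.  */
#define NUNITS 4
#define VF (2 * NUNITS)

/* The scalar loop from the commit message.  */
void
scalar_induction (float *arr, size_t n, float freq, float step)
{
  for (size_t i = 0; i < n; i++)
    {
      arr[i] = freq;
      freq += step;
    }
}

/* Hand-written model of the unrolled vector induction.  */
void
vector_induction_model (float *arr, size_t n, float freq, float step)
{
  assert (n % VF == 0);  /* This sketch has no epilogue loop.  */

  /* Main induction IV: {freq, freq+step, freq+2*step, freq+3*step}.  */
  float iv[NUNITS];
  for (int l = 0; l < NUNITS; l++)
    iv[l] = freq + (float) l * step;

  for (size_t i = 0; i < n; i += VF)
    {
      /* Second copy for the second store: IV + nunits * step.  */
      float iv2[NUNITS];
      for (int l = 0; l < NUNITS; l++)
        iv2[l] = iv[l] + NUNITS * step;

      /* The two vector stores to arr.  */
      for (int l = 0; l < NUNITS; l++)
        {
          arr[i + l] = iv[l];
          arr[i + NUNITS + l] = iv2[l];
        }

      /* Increment by VF * step after the last use of iv, i.e. just
         before the exit test rather than at the top of the body.  */
      for (int l = 0; l < NUNITS; l++)
        iv[l] += VF * step;
    }
}
```

The two functions agree exactly when every intermediate value is representable, for example freq = 0.0f and step = 1.0f, which makes arr[i] == i. In general the vector form reorders the floating-point additions, so the results can differ in rounding.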
Diffstat (limited to 'gcc/tree-vect-loop.cc')
-rw-r--r-- gcc/tree-vect-loop.cc 6
1 files changed, 5 insertions, 1 deletions
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 0342620..eea0b89 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10580,6 +10580,10 @@ vectorizable_induction (loop_vec_info loop_vinfo,
[i2 + 2*S2, i0 + 3*S0, i1 + 3*S1, i2 + 3*S2]. */
if (slp_node)
{
+ gimple_stmt_iterator incr_si;
+ bool insert_after;
+ standard_iv_increment_position (iv_loop, &incr_si, &insert_after);
+
/* The initial values are vectorized, but any lanes > group_size
need adjustment. */
slp_tree init_node
@@ -10810,7 +10814,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
vec_def = gimple_build (&stmts,
PLUS_EXPR, step_vectype, vec_def, up);
vec_def = gimple_convert (&stmts, vectype, vec_def);
- gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT);
+ insert_iv_increment (&incr_si, insert_after, stmts);
add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop),
UNKNOWN_LOCATION);
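The liveness argument behind this change can be sketched with a toy model of the loop body. This is a hypothetical illustration, not GCC's register allocator or its gimple API: each statement is reduced to one definition and up to two uses, and the PHI result (IV) and its latch argument (IV_NEXT) can share one register, with no copy on the back edge, only if IV is dead once IV_NEXT has been defined. The two statement orders mirror the old placement at the start of the body and the placement before the exit condition that standard_iv_increment_position selects where possible. All names in the code are made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value ids (not GCC names).  IV is the vector PHI result,
   IV2 is IV + nunits * step and IV_NEXT is IV + VF * step, which feeds
   the PHI on the latch edge.  Loop-invariant step vectors are omitted.  */
enum { NONE = -1, IV = 0, IV2 = 1, IV_NEXT = 2 };

struct toy_stmt
{
  int def;     /* Value defined by the statement, or NONE.  */
  int use[2];  /* Values used by the statement, NONE if unused.  */
};

/* Position of the last statement in BODY that uses VAL, or -1.  */
static int
last_use (const struct toy_stmt *body, int n, int val)
{
  int pos = -1;
  for (int i = 0; i < n; i++)
    if (body[i].use[0] == val || body[i].use[1] == val)
      pos = i;
  return pos;
}

/* Position of the statement in BODY that defines VAL, or N if none.  */
static int
def_pos (const struct toy_stmt *body, int n, int val)
{
  for (int i = 0; i < n; i++)
    if (body[i].def == val)
      return i;
  return n;
}

/* True if IV is still live after IV_NEXT is defined, so the two values
   overlap and a copy (like the MOV above) is needed.  A statement may
   read IV and write IV_NEXT, as in "fadd v31.4s, v31.4s, v27.4s".  */
bool
needs_phi_copy (const struct toy_stmt *body, int n)
{
  assert (n >= 0);
  return last_use (body, n, IV) > def_pos (body, n, IV_NEXT);
}

/* Old placement: increment at the start of the loop body.  */
const struct toy_stmt incr_at_start[] = {
  { IV_NEXT, { IV, NONE } },  /* iv_next = iv + VF * step */
  { IV2, { IV, NONE } },      /* iv2 = iv + nunits * step */
  { NONE, { IV, IV2 } },      /* store iv, iv2 */
};

/* New placement: increment just before the exit condition.  */
const struct toy_stmt incr_before_exit[] = {
  { IV2, { IV, NONE } },      /* iv2 = iv + nunits * step */
  { NONE, { IV, IV2 } },      /* store iv, iv2 */
  { IV_NEXT, { IV, NONE } },  /* iv_next = iv + VF * step */
};
```

In this model needs_phi_copy returns true for incr_at_start, which corresponds to the MOV in the old .L6 loop, and false for incr_before_exit. A real allocator also has to consider the exit test and every other value in the loop, which the toy ignores.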