author | Richard Biener <rguenther@suse.de> | 2024-02-23 11:45:50 +0100 |
---|---|---|
committer | Richard Biener <rguenther@suse.de> | 2024-06-04 10:13:30 +0200 |
commit | d93353e6423ecaaae9fa47d0935caafd9abfe4de | |
tree | 8e85a2b6432e24b2a5cb84e4566ffa31ffd90bed /gcc/tree-vect-loop.cc | |
parent | 0592000aeed84d47040946a125154b3c46d7c84f | |
Do single-lane SLP discovery for reductions
The following performs single-lane SLP discovery for reductions.
It requires a fixup for outer loop vectorization, where a check
for multiple types needs adjusting; otherwise bogus pointer IV
increments are generated when there are multiple copies of vector
stmts in the inner loop.
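The "multiple copies" condition can be modeled in a few lines. This is a sketch under stated assumptions: fixed-length vectors of `nunits` lanes, and VF times the SLP group size being an exact multiple of `nunits`. GCC itself works with poly_uint64 values and its own helpers, and the names `num_vector_stmts` and `outer_loop_slp_ok` are hypothetical, not GCC functions.

```cpp
#include <cassert>

// Hypothetical model (not GCC's API): number of vector statements an SLP
// node needs to cover `vf` scalar iterations of `group_size` lanes each,
// assuming that total is an exact multiple of the vector length `nunits`.
unsigned num_vector_stmts(unsigned vf, unsigned group_size, unsigned nunits) {
  assert(nunits > 0 && (vf * group_size) % nunits == 0);
  return vf * group_size / nunits;
}

// Hypothetical predicate mirroring the described fixup: when the access
// sits in the inner loop of an outer-loop vectorization, refuse it if the
// SLP node would need more than one copy of its vector statement.
bool outer_loop_slp_ok(bool nested_in_outer_loop, unsigned vf,
                       unsigned group_size, unsigned nunits) {
  return !nested_in_outer_loop
         || num_vector_stmts(vf, group_size, nunits) == 1;
}
```

For example, with VF 8 and 4-lane vectors a single-lane node needs two vector stmts, so the nested case is rejected, while VF 4 needs only one.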
For the reduction epilog handling this extends the optimized path
to cover the trivial single-lane SLP reduction case.
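Why a single lane is the safe case: in an SLP reduction with a group of G lanes, the vector accumulator interleaves partial results of G independent scalar reductions, so folding all lanes into one scalar is only correct when G is 1. The sketch below models the accumulator as a plain array, assuming lane k belongs to reduction k % G; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of an SLP reduction accumulator: lane k holds a partial result
// for scalar reduction k % group_size (assumed interleaved layout).
// Returns one final scalar per reduction in the group.
std::vector<long> finish_slp_reduction(const std::vector<long> &acc,
                                       std::size_t group_size) {
  assert(group_size > 0 && acc.size() % group_size == 0);
  std::vector<long> result(group_size, 0);
  for (std::size_t k = 0; k < acc.size(); ++k)
    result[k % group_size] += acc[k];
  return result;
}

// A full horizontal reduction folds every lane into a single scalar,
// which is what the optimized (direct or shift) epilog computes.
long full_horizontal_sum(const std::vector<long> &acc) {
  long sum = 0;
  for (long v : acc)
    sum += v;
  return sum;
}
```

With the lanes {1, 2, 3, 4} and G = 1 both functions agree on 10; with G = 2 the per-reduction results are {4, 6}, and a full horizontal sum would wrongly merge them.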
The fix for PR65518 implemented in vect_grouped_load_supported for
non-SLP needs an SLP counterpart, which I put in get_group_load_store_type.
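A rough model of the single-element interleaving case: a group reads one element out of every `group_size` consecutive elements (think `a[8*i]`). If that distance exceeds the vector length, some contiguous vector loads contain no used element at all, which is the situation the PR65518 check punts on. This sketch assumes fixed-length vectors and groups starting on a vector boundary; GCC compares poly_int values, and both function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical model: with contiguous vector loads of `nunits` lanes, a
// single-element group of size `group_size` spans ceil(group_size / nunits)
// vectors, of which only the one holding the used element is needed
// (assuming the group starts on a vector boundary).
unsigned wasted_vector_loads_per_group(unsigned group_size, unsigned nunits) {
  assert(group_size > 0 && nunits > 0);
  unsigned loads = (group_size + nunits - 1) / nunits;
  return loads - 1;
}

// Accept single-element interleaving only when no vector load is wholly
// unused, i.e. when the element distance does not exceed the vector length.
bool single_element_interleaving_ok(unsigned group_size, unsigned nunits) {
  return wasted_vector_loads_per_group(group_size, nunits) == 0;
}
```

For instance, a stride of 8 with 4-lane vectors wastes one load per group and is rejected, while a stride of 4 is accepted.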
I've decided to adjust three testcases for the newly appearing
single-lane SLP instances instead of suppressing the "vectorizing stmts
using SLP" dump for single-lane instances, as that would also require
testsuite adjustments.
	* tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
	discoveries are reduction chains and need special backedge
	treatment.
	(vect_analyze_slp): Fall back to single-lane SLP discovery
	for reductions.  Make sure to try single-lane SLP reduction
	for all reductions as fallback.
	(vectorizable_load): Avoid outer loop SLP vectorization with
	multi-copy vector stmts in the inner loop.
	(vectorizable_store): Likewise.
	* tree-vect-loop.cc (vect_create_epilog_for_reduction): Allow
	direct opcode and shift reduction also for SLP reductions
	with a single lane.
	* tree-vect-stmts.cc (get_group_load_store_type): For SLP also
	check for the PR65518 single-element interleaving case as done in
	vect_grouped_load_supported.

	* gcc.dg/vect/slp-24.c: Expect another SLP instance for the
	reduction.
	* gcc.dg/vect/slp-24-big-array.c: Likewise.
	* gcc.dg/vect/slp-reduc-6.c: Remove scan for zero SLP instances.
Diffstat (limited to 'gcc/tree-vect-loop.cc')
-rw-r--r-- | gcc/tree-vect-loop.cc | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a08357a..06292ed 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6504,7 +6504,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
   /* 2.3 Create the reduction code, using one of the three schemes described
          above. In SLP we simply need to extract all the elements from the
          vector (without reducing them), so we use scalar shifts.  */
-  else if (reduc_fn != IFN_LAST && !slp_reduc)
+  else if (reduc_fn != IFN_LAST && (!slp_reduc || group_size == 1))
     {
       tree tmp;
       tree vec_elem_type;
@@ -6674,7 +6674,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
       reduc_inputs[0] = new_temp;
 
-      if (reduce_with_shift && !slp_reduc)
+      if (reduce_with_shift && (!slp_reduc || group_size == 1))
         {
           int element_bitsize = tree_to_uhwi (bitsize);
           /* Enforced by vectorizable_reduction, which disallows SLP reductions
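The two changed conditions share one guard: the whole-vector epilog paths (the direct `reduc_fn` call and the `reduce_with_shift` scheme) now also apply when `slp_reduc` is set but `group_size` is 1. The sketch below restates that guard and models the shift scheme on a plain array: it repeatedly adds the upper half of the vector into the lower half. It is an illustration only; the function names are hypothetical, and the real code emits GIMPLE vector shifts or permutes rather than scalar loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical restatement of the patched guard: use a whole-vector
// epilog unless this is a multi-lane SLP reduction.
bool use_whole_vector_epilog(bool slp_reduc, unsigned group_size) {
  return !slp_reduc || group_size == 1;
}

// Model of the shift scheme for a power-of-two lane count: each step
// shifts the upper half down and adds it to the lower half, so after
// log2(lanes) steps lane 0 holds the sum of all lanes.
long reduce_with_shifts(std::vector<long> v) {
  assert(!v.empty() && (v.size() & (v.size() - 1)) == 0);
  for (std::size_t half = v.size() / 2; half > 0; half /= 2)
    for (std::size_t k = 0; k < half; ++k)
      v[k] += v[k + half];
  return v[0];
}
```

For eight lanes holding 1 through 8, three shift-and-add steps leave 36 in lane 0. That result is only meaningful when every lane belongs to the same scalar reduction, which is why the guard still excludes multi-lane SLP groups.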