Age | Commit message (Collapse) | Author | Files | Lines |
|
This fixes SLP cycles leaking memory by maintaining a double-linked
list of allocatd SLP nodes we can zap when we free the alloc pool.
2020-12-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/97630
* tree-vectorizer.h (_slp_tree::next_node,
_slp_tree::prev_node): New.
(vect_slp_init): Declare.
(vect_slp_fini): Likewise.
* tree-vectorizer.c (vectorize_loops): Call vect_slp_init/fini.
(pass_slp_vectorize::execute): Likewise.
* tree-vect-slp.c (vect_slp_init): New.
(vect_slp_fini): Likewise.
(slp_first_node): New global.
(_slp_tree::_slp_tree): Link node into the SLP tree list.
(_slp_tree::~_slp_tree): Delink node from the SLP tree list.
|
|
This properly skips debug USE_STMTs when looking for non-SLP sinks.
2020-11-23 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (maybe_push_to_hybrid_worklist): Skip
debug stmts.
* g++.dg/vect/simd-12.cc: New testcase.
|
|
This modifies vectorizable_slp_permutation to update the type of the children
of a perm node before trying to permute them. This allows us to be able to
permute invariant nodes.
This will be covered by test from the SLP pattern matcher.
gcc/ChangeLog:
* tree-vect-slp.c (vectorizable_slp_permutation): Update types on nodes
when needed.
|
|
This makes hybrid SLP discovery deal with stmts indirectly consumed
by SLP, for example via patterns. This means that all uses of a
stmt end up in SLP vectorized stmts.
This helps my prototype patches for PR97832 where I make SLP discovery
re-associate chains to make operands match. This ends up building
SLP computation nodes without 1:1 representatives in the scalar IL
and thus no scalar lane defs in SLP_TREE_SCALAR_STMTS. Nevertheless
all of the original scalar stmts are consumed so this represents
another kind of SLP pattern for the computation chain result.
2020-11-20 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (maybe_push_to_hybrid_worklist): New function.
(vect_detect_hybrid_slp): Use it. Perform a backward walk
over the IL.
|
|
It always annoyed me to see those empty SLP nodes in dumpfiles:
t.c:16:3: note: node 0x3a2a280 (max_nunits=1, refcnt=1)
t.c:16:3: note: { }
t.c:16:3: note: children 0x3a29db0 0x3a29e90
resulting from two-operator handling. The following makes
sure to also dump the operation template or VEC_PERM_EXPR.
2020-11-20 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_print_slp_tree): Also dump
SLP_TREE_REPRESENTATIVE.
|
|
This delays filling SLP_INSTANCE_LOADS.
2020-11-16 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_gather_slp_loads): Declare.
* tree-vect-loop.c (vect_analyze_loop_2): Call
vect_gather_slp_loads.
* tree-vect-slp.c (vect_build_slp_instance): Do not gather
SLP loads here.
(vect_gather_slp_loads): Remove wrapper, new function.
(vect_slp_analyze_bb_1): Call it.
|
|
This properly handles reduction PHI nodes with unrepresented
initial value as leaf in the SLP graph.
2020-11-16 Richard Biener <rguenther@suse.de>
PR tree-optimization/97838
* tree-vect-slp.c (vect_slp_build_vertices): Properly handle
not backwards reachable cycles.
(vect_optimize_slp): Check a node is leaf before marking it
visited.
* gcc.dg/vect/pr97838.c: New testcase.
|
|
This removes a premature end of the DFS walk.
2020-11-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/97761
* tree-vect-slp.c (vect_bb_slp_mark_live_stmts): Remove
premature end of DFS walk.
* gfortran.dg/vect/pr97761.f90: New testcase.
|
|
This passes down the graph entry kind down to vect_analyze_slp_instance
which simplifies it and makes it a shallow wrapper around
vect_build_slp_instance.
2020-11-06 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp): Pass down the
SLP graph entry kind.
(vect_analyze_slp_instance): Simplify.
(vect_build_slp_instance): Adjust.
(vect_slp_check_for_constructors): Perform more
eligibility checks here.
|
|
This adds a missing check.
2020-11-06 Richard Biener <rguenther@suse.de>
PR tree-optimization/97733
* tree-vect-slp.c (vect_analyze_slp_instance): If less
than two reductions were relevant or live do nothing.
|
|
The following fixes SLP vectorization of stores that were
pattern recognized. Since in SLP vectorization pattern analysis
happens after dataref group analysis we have to adjust the groups
with the pattern stmts. This has some effects down the pipeline
and exposes cases where we looked at the wrong pattern/non-pattern
stmts.
2020-11-05 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_slp_analyze_node_dependences):
Use the original stmts.
(vect_slp_analyze_node_alignment): Use the pattern stmt.
* tree-vect-slp.c (vect_fixup_store_groups_with_patterns):
New function.
(vect_slp_analyze_bb_1): Call it.
* gcc.dg/vect/bb-slp-69.c: New testcase.
|
|
This optimizes sequential permutes. i.e. if there are two permutes back to back
this function applies the permute of the parent to the child and removed the
parent.
This relies on the materialization point calculation in optimize SLP.
This allows us to remove useless permutes such as
ldr q0, [x0, x3]
ldr q2, [x1, x3]
trn1 v1.4s, v0.4s, v0.4s
trn2 v0.4s, v0.4s, v0.4s
trn1 v0.4s, v1.4s, v0.4s
mov v1.16b, v3.16b
fcmla v1.4s, v0.4s, v2.4s, #0
fcmla v1.4s, v0.4s, v2.4s, #90
str q1, [x2, x3]
from the sequence the vectorizer puts out and give
ldr q0, [x0, x3]
ldr q2, [x1, x3]
mov v1.16b, v3.16b
fcmla v1.4s, v0.4s, v2.4s, #0
fcmla v1.4s, v0.4s, v2.4s, #90
str q1, [x2, x3]
instead.
gcc/ChangeLog:
* tree-vect-slp.c (vect_slp_tree_permute_noop_p): New.
(vect_optimize_slp): Optimize permutes.
(vectorizable_slp_permutation): Fix typo.
|
|
This patch stores the SLP instance kind in the SLP instance so that we can use
it later when detecting load/store lanes support.
This also changes the load/store lane support check to only check if the SLP
kind is a store. This means that in order for the load/lanes to work all
instances must be of kind store.
gcc/ChangeLog:
* tree-vect-loop.c (vect_analyze_loop_2): Check kind.
* tree-vect-slp.c (vect_build_slp_instance): New.
(enum slp_instance_kind): Move to...
* tree-vectorizer.h (enum slp_instance_kind): .. Here
(SLP_INSTANCE_KIND): New.
|
|
This moves the code that checks for load/store lanes further in the pipeline and
places it after slp_optimize. This would allow us to perform optimizations on
the SLP tree and only bail out if we really have a permute.
With this change it allows us to handle permutes such as {1,1,1,1} which should
be handled by a load and replicate.
This change however makes it all or nothing. Either all instances can be handled
or none at all. This is why some of the test cases have been adjusted.
gcc/ChangeLog:
* tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store lanes
check to ...
* tree-vect-loop.c (vect_analyze_loop_2): ..Here
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-11b.c: Update output scan.
* gcc.dg/vect/slp-perm-6.c: Likewise.
|
|
Pastoed the previous fix too quickly, the following fixes the
correct spot - the memset, not the allocation.
2020-11-04 Richard Biener <rguenther@suse.de>
PR bootstrap/97666
* tree-vect-slp.c (vect_build_slp_tree_2): Revert previous
fix and instead adjust the memset.
|
|
This fixes the bad assumption that sizeof (bool) == 1
2020-11-03 Richard Biener <rguenther@suse.de>
PR bootstrap/97666
* tree-vect-slp.c (vect_build_slp_tree_2): Scale
allocation of skip_args by sizeof (bool).
|
|
This restores not tracking SLP nodes for induction initial values
in not nested context because this interferes with peeling and
epilogue vectorization.
2020-11-03 Richard Biener <rguenther@suse.de>
PR tree-optimization/97678
* tree-vect-slp.c (vect_build_slp_tree_2): Do not track
the initial values of inductions when not nested.
* tree-vect-loop.c (vectorizable_induction): Look at
PHI node initial values again for SLP and not nested
inductions. Handle LOOP_VINFO_MASK_SKIP_NITERS and cost
invariants.
* gcc.dg/vect/pr97678.c: New testcase.
|
|
This rewrites SLP induction vectorization to handle different
inductions in the different SLP lanes. It also changes SLP
build to represent the initial value (but not the cycle) so
it can be enhanced to handle outer loop vectorization later.
Note this FAILs gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c
because it removes one CSE optimization that no longer works
with non-uniform initial value and step. I'll see to recover
from this after outer loop vectorization of inductions works.
It might be a bit friendlier to variable-size vectors now
but then we're now building the step vector from scalars ...
2020-11-02 Richard Biener <rguenther@suse.de>
* tree.h (build_real_from_wide): Declare.
* tree.c (build_real_from_wide): New function.
* tree-vect-slp.c (vect_build_slp_tree_2): Remove
restriction on induction vectorization, represent
the initial value.
* tree-vect-loop.c (vect_model_induction_cost): Inline ...
(vectorizable_induction): ... here. Rewrite SLP
code generation.
* gcc.dg/vect/slp-49.c: New testcase.
|
|
This avoids analyzing reductions that are not relevant (thus dead)
which eventually will lead into crashes because the participating
stmts meta is not analyzed. For this to work the patch also
properly removes reduction groups that are not uniformly recognized
as patterns.
2020-11-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/97558
* tree-vect-loop.c (vect_fixup_scalar_cycles_with_patterns):
Check for any mismatch in pattern vs. non-pattern and dissolve
the group if there is one.
* tree-vect-slp.c (vect_analyze_slp_instance): Avoid
analyzing not relevant reductions.
(vect_analyze_slp): Avoid analyzing not relevant reduction
groups.
* gcc.dg/vect/pr97558.c: New testcase.
|
|
I was mistaken to treat vect_external_def as only applying to
SSA_NAME defs, so check for that.
2020-11-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/97650
* tree-vect-slp.c (vect_get_and_check_slp_defs): Check
for SSA_NAME before checking SSA_NAME_IS_DEFAULT_DEF.
* gcc.dg/vect/bb-slp-pr97650.c: New testcase.
|
|
This makes sure to roll-back the whole SCC when we fail stmt
analysis, otherwise the optimistic visited treatment breaks down
with different entries. Rollback is easy when tracking additions
to visited in a vector which also makes the whole thing cheaper
than the two hash-sets used before.
2020-10-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/97626
* tree-vect-slp.c (vect_slp_analyze_node_operations):
Exchange the lvisited hash-set for a vector, roll back
recursive adds to visited when analysis failed.
(vect_slp_analyze_operations): Likewise.
* gcc.dg/vect/bb-slp-pr97626.c: New testcase.
|
|
This makes sure to update backedges in single-node cycles.
2020-10-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/97633
* tree-vect-slp.c (): Update backedges in single-node cycles.
Optimize processing of externals.
* g++.dg/vect/slp-pr97636.cc: New testcase.
* gcc.dg/vect/bb-slp-pr97633.c: Likewise.
|
|
For the following test case (compiled with load/store lanes
disabled locally):
void
f (uint32_t *restrict x, uint8_t *restrict y, int n)
{
for (int i = 0; i < n; ++i)
{
x[i * 2] = x[i * 2] + y[i * 2];
x[i * 2 + 1] = x[i * 2 + 1] + y[i * 2];
}
}
we have a redundant no-op permute on the x[] load node:
node 0x4472350 (max_nunits=8, refcnt=2)
stmt 0 _5 = *_4;
stmt 1 _13 = *_12;
load permutation { 0 1 }
Then, when costing it, we pick a cost of 1, even though we need 4 copies
of the x[] load to match a single y[] load:
==> examining statement: _5 = *_4;
Vectorizing an unaligned access.
vect_model_load_cost: unaligned supported by hardware.
vect_model_load_cost: inside_cost = 1, prologue_cost = 0 .
The problem is that the code only considers the permutation for
the first scalar iteration, rather than for all VF iterations.
This patch tries to fix that by making vect_transform_slp_perm_load
calculate the value instead.
gcc/
* tree-vectorizer.h (vect_transform_slp_perm_load): Take an
optional extra parameter.
* tree-vect-slp.c (vect_transform_slp_perm_load): Calculate
the number of loads as well as the number of permutes, taking
the counting loop from...
* tree-vect-stmts.c (vect_model_load_cost): ...here. Use the
value computed by vect_transform_slp_perm_load for ncopies.
|
|
This avoids randomly (based on whether the stmt is
SLP_TREE_REPRESENTATIVE and not a pattern stmt) passing a vector
type or NULL to the add_stmt_cost hook for scalar code cost
compute. For example the x86 backend uses only the vector type to
decide on the scalar computation mode which makes costing off.
So the following explicitely passes the vector type and uses
SLP_TREE_VECTYPE for this purpose.
2020-10-29 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_bb_slp_scalar_cost): Pass
SLP_TREE_VECTYPE to record_stmt_cost.
|
|
This tweaks the op build from splats to allow loads marked as not
vectorizable. It also amends some dump prints with the address of
the SLP node or the instance to better be able to debug things.
2020-10-29 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_build_slp_tree_2): Allow splatting
not vectorizable loads.
(vect_build_slp_instance): Amend dumping with address.
(vect_slp_convert_to_external): Likewise.
* gcc.dg/vect/bb-slp-pr65935.c: Adjust.
|
|
This adds another one.
2020-10-28 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_slp_analyze_node_operations_1): Dump
when shared vectype update fails.
|
|
This passes down skip_args to vect_get_and_check_slp_defs to skip
ignored ops there, too and not fail SLP discovery. This fixes
gcc.target/aarch64/sve/reduc_strict_5.c
2020-10-28 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_get_and_check_slp_defs): For skipped
args just push NULLs and vect_uninitialized_def.
(vect_build_slp_tree_2): Allocate skip_args for all ops
and pass it down to vect_get_and_check_slp_defs.
|
|
The previous change missed to check for patterns again, the following
corrects that.
2020-10-28 Richard Biener <rguenther@suse.de>
PR tree-optimization/97615
* tree-vect-slp.c (vect_build_slp_tree_2): Do not build
an external from pattern defs.
* gcc.dg/vect/bb-slp-pr97615.c: New testcase.
|
|
I've made a typo when refactoring the iteration over all loads in
the SLP graph. Fixed.
2020-10-28 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_optimize_slp): Fix iteration over
all loads.
|
|
The following fixes missed optimizations due to the strange way we
split stores in BB vectorization. The solution is to split at
the failure boundary and not re-align that to the initial piece
chosen vector size. Also re-analyze any larger matching rest.
2020-10-28 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_build_slp_instance): Split the store
group at the failure boundary and also re-analyze a large enough
matching rest.
* gcc.dg/vect/bb-slp-68.c: New testcase.
|
|
This fixes a mistake in the previous change in this area to what
was desired - figure the largest power-of-two group size fitting
in the matching area.
2020-10-27 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_build_slp_instance): Use ceil_log2
to compute maximum group-size.
* gcc.dg/vect/bb-slp-67.c: New testcase.
|
|
This adjusts the condition when to split at control altering stmts,
only when there's a definition. It also removes the only use
of --param slp-max-insns-in-bb which a previous change left doing
nothing (but repeatedly print a message for each successive
instruction...).
2020-10-27 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_slp_bbs): Remove no-op
slp-max-insns-in-bb check.
(vect_slp_function): Dump when splitting the function.
Adjust the split condition for control altering stmts.
* params.opt (-param=slp-max-insns-in-bb): Remove.
* doc/invoke.texi (-param=slp-max-insns-in-bb): Likewise.
|
|
This makes SLP discovery detect backedges by seeding the bst_map with
the node to be analyzed so it can be picked up from recursive calls.
This removes the need to discover backedges in a separate walk.
This enables SLP build to handle PHI nodes in full, continuing
the SLP build to non-backedges. For loop vectorization this
enables outer loop vectorization of nested SLP cycles and for
BB vectorization this enables vectorization of PHIs at CFG merges.
It also turns code generation into a SCC discovery walk to handle
irreducible regions and nodes only reachable via backedges where
we now also fill in vectorized backedge defs.
This requires sanitizing the SLP tree for SLP reduction chains even
more, manually filling the backedge SLP def.
This also exposes the fact that CFG copying (and edge splitting
until I fixed that) ends up with different edge order in the
copy which doesn't play well with the desired 1:1 mapping of
SLP PHI node children and edges for epilogue vectorization.
I've tried to fixup CFG copying here but this really looks
like a dead (or expensive) end there so I've done fixup in
slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
we can run into.
There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
not sure it's possible to eliminate them all this stage1 so the
patch has quite some checks for this case all over the place.
Bootstrapped and tested on x86_64-unknown-linux-gnu. SPEC CPU 2017
and SPEC CPU 2006 successfully built and tested.
2020-10-27 Richard Biener <rguenther@suse.de>
* gimple.h (gimple_expr_type): For PHIs return the type
of the result.
* tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
Make sure edge order into copied loop headers line up with the
originals.
* tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
loops with SLP.
(vectorizable_phi): New function.
(vectorizable_live_operation): For BB vectorization compute insert
location here.
* tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
SLP_TREE_CHILDREN entries.
(vect_create_new_slp_node): Add overloads with pre-existing node
argument.
(vect_print_slp_graph): Likewise.
(vect_mark_slp_stmts): Likewise.
(vect_mark_slp_stmts_relevant): Likewise.
(vect_gather_slp_loads): Likewise.
(vect_optimize_slp): Likewise.
(vect_slp_analyze_node_operations): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_remove_slp_scalar_calls): Likewise.
(vect_get_and_check_slp_defs): Handle PHIs.
(vect_build_slp_tree_1): Handle PHIs.
(vect_build_slp_tree_2): Continue SLP build, following PHI
arguments. Fix memory leak.
(vect_build_slp_tree): Put stub node into the hash-map so
we can discover cycles directly.
(vect_build_slp_instance): Set the backedge SLP def for
reduction chains.
(vect_analyze_slp_backedges): Remove.
(vect_analyze_slp): Do not call it.
(vect_slp_convert_to_external): Release SLP_TREE_LOAD_PERMUTATION.
(vect_slp_analyze_node_operations): Handle stray failed
backedge defs by failing.
(vect_slp_build_vertices): Adjust leaf condition.
(vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
hash-set to handle cycles.
(vect_slp_analyze_operations): Adjust.
(vect_bb_partition_graph_r): Likewise.
(vect_slp_function): Adjust split condition to allow CFG
merges.
(vect_schedule_slp_instance): Rename to ...
(vect_schedule_slp_node): ... this. Move DFS walk to ...
(vect_schedule_scc): ... this new function.
(vect_schedule_slp): Call it. Remove ad-hoc vectorized
backedge fill code.
* tree-vect-stmts.c (vect_analyze_stmt): Call
vectorizable_phi.
(vect_transform_stmt): Likewise.
(vect_is_simple_use): Handle vect_backedge_def.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Only
set loop header PHIs to vect_unknown_def_type for loop
vectorization.
* tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def.
(enum stmt_vec_info_type): Add phi_info_type.
(vectorizable_phi): Declare.
* gcc.dg/vect/bb-slp-54.c: New test.
* gcc.dg/vect/bb-slp-55.c: Likewise.
* gcc.dg/vect/bb-slp-56.c: Likewise.
* gcc.dg/vect/bb-slp-57.c: Likewise.
* gcc.dg/vect/bb-slp-58.c: Likewise.
* gcc.dg/vect/bb-slp-59.c: Likewise.
* gcc.dg/vect/bb-slp-60.c: Likewise.
* gcc.dg/vect/bb-slp-61.c: Likewise.
* gcc.dg/vect/bb-slp-62.c: Likewise.
* gcc.dg/vect/bb-slp-63.c: Likewise.
* gcc.dg/vect/bb-slp-64.c: Likewise.
* gcc.dg/vect/bb-slp-65.c: Likewise.
* gcc.dg/vect/bb-slp-66.c: Likewise.
* gcc.dg/vect/vect-outer-slp-1.c: Likewise.
* gfortran.dg/vect/O3-bb-slp-1.f: Likewise.
* gfortran.dg/vect/O3-bb-slp-2.f: Likewise.
* g++.dg/vect/simd-11.cc: Likewise.
|
|
This makes sure to use splats early when facing uniform internal
operands in BB SLP discovery rather than relying on the late
heuristincs re-building nodes from scratch.
2020-10-27 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_build_slp_tree_2): When vectorizing
BBs splat uniform operands and stop SLP discovery.
* gcc.target/i386/pr95866-1.c: Adjust.
|
|
This introduces a global alloc-pool for SLP nodes to reduce overhead
on SLP allocation churn which will get worse and to eventually release
SLP cycles which will retain a refcount of one and thus are never
freed at the moment.
2020-10-26 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (slp_tree_pool): Declare.
(_slp_tree::operator new): Likewise.
(_slp_tree::operator delete): Likewise.
* tree-vectorizer.c (vectorize_loops): Allocate and free the
slp_tree_pool.
(pass_slp_vectorize::execute): Likewise.
* tree-vect-slp.c (slp_tree_pool): Define.
(_slp_tree::operator new): Likewise.
(_slp_tree::operator delete): Likewise.
|
|
This refactors the toplevel entry to analyze an SLP instance to
expose a worker analyzing from a vector of stmts and an SLP entry
kind.
2020-10-26 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (enum slp_instance_kind): New.
(vect_build_slp_instance): Split out from...
(vect_analyze_slp_instance): ... this.
|
|
In preparation for a larger change this refactors vect_analyze_slp_instance
so it doesn't need to know a vector type early.
2020-10-22 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp_instance): Refactor so
computing a vector type early is not needed, for store group
splitting compute a new vector type based on the desired
group size.
|
|
Inductions are not vectorized as cycle but materialized from SCEV data.
Filling in backedge SLP nodes confuses this process.
2020-10-21 Richard Biener <rguenther@suse.de>
PR tree-optimization/97500
* tree-vect-slp.c (vect_analyze_slp_backedges): Do not
fill backedges for inductions.
* gfortran.dg/pr97500.f90: New testcase.
|
|
I forgot to guard the promotion to external for the case where the
def is in a pattern.
2020-10-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/97496
* tree-vect-slp.c (vect_get_and_check_slp_defs): Guard extern
promotion with not in pattern.
* gcc.dg/vect/bb-slp-pr97496.c: New testcase.
|
|
This avoids edge inserting and eventual splitting during BB SLP
vectorization for now.
2020-10-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/97486
* tree-vect-slp.c (vect_slp_function): Split after stmts
ending a BB.
* gcc.dg/vect/bb-slp-pr97486.c: New testcase.
|
|
This removes an assertion that was supposed to be only for temporary
debugging. I've also re-indented the code which I missed as well.
2020-10-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/97466
* tree-vect-slp.c (vect_get_and_check_slp_defs): Remove
spurious assert, re-indent.
|
|
This changes SLP def gathering to not fail due to mismatched
def type but instead demote the def to external. This allows the
new testcase to be vectorized in full (with GCC 10 it is not
vectorized at all and with current trunk we vectorize only the
store). This is important since with BB vectorization being
applied to bigger pieces of code the chance that we mix
internal and external defs for an operand that should end up
treated as external (built from scalars) increases.
2020-10-16 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_get_and_check_slp_defs): For BB
vectorization swap operands only if it helps, demote mismatches to
external.
* gcc.dg/vect/bb-slp-53.c: New testcase.
|
|
This refactors vect_get_and_check_slp_defs so that the ops and def_stmts
arrays are filled for all stmts and operands even when we signal failure.
This allows later changes for BB vectorization SLP discovery heuristics.
2020-10-16 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_get_and_check_slp_defs): First analyze
all operands and fill in the def_stmts and ops entries.
(vect_def_types_match): New helper.
|
|
This enables SLP store group splitting also for loop vectorization.
For the existing testcase gcc.dg/vect/vect-complex-5.c this then
generates much better code, likewise for the PR97428 testcase.
Both of those have a splitting opportunity splitting the group
into two equal (vector-sized) halves, still the patch enables
quite arbitrary splitting since generally the interleaving scheme
results in quite awkward code for even small groups. If any
problems surface with this it's easy to restrict the splitting
to known-good cases.
2020-10-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/97428
* tree-vect-slp.c (vect_analyze_slp_instance): Split store
groups also for loop vectorization.
* gcc.dg/vect/vect-complex-5.c: Expect to SLP.
* gcc.dg/vect/pr97428.c: Likewise.
|
|
This is another tiny piece in some bigger refactoring of
vect_get_and_check_slp_defs. Split out a test that has nothing
to do with def types or commutation.
2020-10-14 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_get_and_check_slp_defs): Split out
test for compatible operand types.
|
|
We can end up with { _1, 1.0 } * { 3.0, _2 } which isn't really
profitable. The following adjusts things so we reject more than
one possibly expensive (non-constant and not uniform) vector CTOR
and instead build a CTOR for the scalar operation results.
This also moves a check in vect_get_and_check_slp_defs to a better
place.
2020-10-14 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_get_and_check_slp_defs): Move
check for duplicate/interleave of variable size constants
to a place done once and early.
(vect_build_slp_tree_2): Adjust heuristics when to build
a BB SLP node from scalars.
|
|
This introduces a permute optimization phase for SLP which is
intended to cover the existing permute eliding for SLP reductions
plus handling commonizing the easy cases.
It currently uses graphds to compute a postorder on the reverse
SLP graph and it handles all cases vect_attempt_slp_rearrange_stmts
did (hopefully - I've adjusted most testcases that triggered it
a few days ago). It restricts itself to move around bijective
permutations to simplify things for now, mainly around constant nodes.
As a prerequesite it makes the SLP graph cyclic (ugh). It looks
like it would pay off to compute a PRE/POST order visit array
once and elide all the recursive SLP graph walks and their
visited hash-set. At least for the time where we do not change
the SLP graph during such walk.
I do not like using graphds too much but at least I don't have to
re-implement yet another RPO walk, so maybe it isn't too bad.
It now computes permute placement during iteration and thus should
get cycles more obviously correct.
Richard.
2020-10-06 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_slp_analyze_instance_dependence):
Use SLP_TREE_REPRESENTATIVE.
* tree-vectorizer.h (_slp_tree::vertex): New member used
for graphds interfacing.
* tree-vect-slp.c (vect_build_slp_tree_2): Allocate space
for PHI SLP children.
(vect_analyze_slp_backedges): New function filling in SLP
node children for PHIs that correspond to backedge values.
(vect_analyze_slp): Call vect_analyze_slp_backedges for the
graph.
(vect_slp_analyze_node_operations): Deal with a cyclic graph.
(vect_schedule_slp_instance): Likewise.
(vect_schedule_slp): Likewise.
(slp_copy_subtree): Remove.
(vect_slp_rearrange_stmts): Likewise.
(vect_attempt_slp_rearrange_stmts): Likewise.
(vect_slp_build_vertices): New functions.
(vect_slp_permute): Likewise.
(vect_slp_perms_eq): Likewise.
(vect_optimize_slp): Remove special code to elide
permutations with SLP reductions. Implement generic
permute optimization.
* gcc.dg/vect/bb-slp-50.c: New testcase.
* gcc.dg/vect/bb-slp-51.c: Likewise.
|
|
When a VEC_PERM SLP node just permutes existing lanes this confuses
the SLP subgraph detection where I tried to elide a node-based
visited hash-map in a way that doesn't work. Fixed by adding such.
2020-10-12 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_bb_partition_graph_r): Use visited
hash-map.
(vect_bb_partition_graph): Likewise.
|
|
This appropriately makes matches all true after successful SLP discovery
to reliably succeed splitting. We were picking up an eventual all
false built-up from scalars state in some cases.
2020-10-12 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp_instance): Set matches to true
after successful discovery but forced split.
|
|
We're running into a multiplication with one unvectorizable
operand we expect to build from scalars but SLP discovery
fatally fails the build of both since one stmt is commutated:
_60 = _58 * _59;
_63 = _59 * _62;
_66 = _59 * _65;
...
where _59 is the "bad" operand. The following patch makes the
case work where the first stmt has a good operand by not fatally
failing the SLP build for the operand but communicating upwards
how to commutate.
2020-10-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/97334
* tree-vect-slp.c (vect_build_slp_tree_1): Do not fatally
fail lanes other than zero when BB vectorizing.
* gcc.dg/vect/bb-slp-pr65935.c: Amend.
|