path: root/gcc/tree-vect-slp.c
2020-12-02  tree-optimization/97630 - fix SLP cycle memory leak  (Richard Biener, 1 file, -1/+29)

This fixes SLP cycles leaking memory by maintaining a doubly-linked list of allocated SLP nodes that we can zap when we free the alloc pool.

2020-12-02  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97630
    * tree-vectorizer.h (_slp_tree::next_node, _slp_tree::prev_node): New. (vect_slp_init): Declare. (vect_slp_fini): Likewise.
    * tree-vectorizer.c (vectorize_loops): Call vect_slp_init/fini. (pass_slp_vectorize::execute): Likewise.
    * tree-vect-slp.c (vect_slp_init): New. (vect_slp_fini): Likewise. (slp_first_node): New global. (_slp_tree::_slp_tree): Link node into the SLP tree list. (_slp_tree::~_slp_tree): Delink node from the SLP tree list.

2020-11-23  fix hybrid SLP discovery debug stmt issue  (Richard Biener, 1 file, -0/+2)

This properly skips debug USE_STMTs when looking for non-SLP sinks.

2020-11-23  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (maybe_push_to_hybrid_worklist): Skip debug stmts.
    * g++.dg/vect/simd-12.cc: New testcase.

2020-11-20  SLP: Have vectorizable_slp_permutation set type on invariants  (Tamar Christina, 1 file, -1/+2)

This modifies vectorizable_slp_permutation to update the type of the children of a perm node before trying to permute them. This allows us to permute invariant nodes. This will be covered by tests from the SLP pattern matcher.

gcc/ChangeLog:
    * tree-vect-slp.c (vectorizable_slp_permutation): Update types on nodes when needed.

2020-11-20  Deal with (pattern) SLP consumed stmts in hybrid discovery  (Richard Biener, 1 file, -7/+72)

This makes hybrid SLP discovery deal with stmts indirectly consumed by SLP, for example via patterns. This means that all uses of a stmt end up in SLP vectorized stmts.

This helps my prototype patches for PR97832 where I make SLP discovery re-associate chains to make operands match. That ends up building SLP computation nodes without 1:1 representatives in the scalar IL and thus no scalar lane defs in SLP_TREE_SCALAR_STMTS. Nevertheless all of the original scalar stmts are consumed, so this represents another kind of SLP pattern for the computation chain result.

2020-11-20  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (maybe_push_to_hybrid_worklist): New function. (vect_detect_hybrid_slp): Use it. Perform a backward walk over the IL.

2020-11-20  dump SLP_TREE_REPRESENTATIVE  (Richard Biener, 1 file, -0/+8)

It always annoyed me to see those empty SLP nodes in dumpfiles:

    t.c:16:3: note: node 0x3a2a280 (max_nunits=1, refcnt=1)
    t.c:16:3: note: { }
    t.c:16:3: note: children 0x3a29db0 0x3a29e90

resulting from two-operator handling. The following makes sure to also dump the operation template or VEC_PERM_EXPR.

2020-11-20  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_print_slp_tree): Also dump SLP_TREE_REPRESENTATIVE.

2020-11-16  Delay SLP instance loads gathering  (Richard Biener, 1 file, -8/+18)

This delays filling SLP_INSTANCE_LOADS.

2020-11-16  Richard Biener  <rguenther@suse.de>
    * tree-vectorizer.h (vect_gather_slp_loads): Declare.
    * tree-vect-loop.c (vect_analyze_loop_2): Call vect_gather_slp_loads.
    * tree-vect-slp.c (vect_build_slp_instance): Do not gather SLP loads here. (vect_gather_slp_loads): Remove wrapper, new function. (vect_slp_analyze_bb_1): Call it.

2020-11-16  tree-optimization/97838 - fix SLP leaf detection  (Richard Biener, 1 file, -5/+17)

This properly handles reduction PHI nodes with unrepresented initial value as leaf in the SLP graph.

2020-11-16  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97838
    * tree-vect-slp.c (vect_slp_build_vertices): Properly handle not backwards reachable cycles. (vect_optimize_slp): Check a node is leaf before marking it visited.
    * gcc.dg/vect/pr97838.c: New testcase.

2020-11-09  tree-optimization/97761 - fix SLP live calculation  (Richard Biener, 1 file, -4/+0)

This removes a premature end of the DFS walk.

2020-11-09  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97761
    * tree-vect-slp.c (vect_bb_slp_mark_live_stmts): Remove premature end of DFS walk.
    * gfortran.dg/vect/pr97761.f90: New testcase.

2020-11-06  refactor SLP analysis  (Richard Biener, 1 file, -60/+45)

This passes the graph entry kind down to vect_analyze_slp_instance, which simplifies it and makes it a shallow wrapper around vect_build_slp_instance.

2020-11-06  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_analyze_slp): Pass down the SLP graph entry kind. (vect_analyze_slp_instance): Simplify. (vect_build_slp_instance): Adjust. (vect_slp_check_for_constructors): Perform more eligibility checks here.

2020-11-06  tree-optimization/97733 - fix SLP of reductions with zero relevant  (Richard Biener, 1 file, -0/+3)

This adds a missing check.

2020-11-06  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97733
    * tree-vect-slp.c (vect_analyze_slp_instance): If less than two reductions were relevant or live do nothing.

2020-11-05  Fix SLP vectorization of stores from boolean vectors  (Richard Biener, 1 file, -0/+45)

The following fixes SLP vectorization of stores that were pattern recognized. Since in SLP vectorization pattern analysis happens after dataref group analysis, we have to adjust the groups with the pattern stmts. This has some effects down the pipeline and exposes cases where we looked at the wrong pattern/non-pattern stmts.

2020-11-05  Richard Biener  <rguenther@suse.de>
    * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): Use the original stmts. (vect_slp_analyze_node_alignment): Use the pattern stmt.
    * tree-vect-slp.c (vect_fixup_store_groups_with_patterns): New function. (vect_slp_analyze_bb_1): Call it.
    * gcc.dg/vect/bb-slp-69.c: New testcase.

2020-11-05  middle-end: optimize slp simplify back to back permutes.  (Tamar Christina, 1 file, -1/+13)

This optimizes sequential permutes: if there are two permutes back to back, this function applies the permute of the parent to the child and removes the parent. This relies on the materialization point calculation in optimize SLP.

This allows us to remove useless permutes such as

    ldr q0, [x0, x3]
    ldr q2, [x1, x3]
    trn1 v1.4s, v0.4s, v0.4s
    trn2 v0.4s, v0.4s, v0.4s
    trn1 v0.4s, v1.4s, v0.4s
    mov v1.16b, v3.16b
    fcmla v1.4s, v0.4s, v2.4s, #0
    fcmla v1.4s, v0.4s, v2.4s, #90
    str q1, [x2, x3]

from the sequence the vectorizer puts out and give

    ldr q0, [x0, x3]
    ldr q2, [x1, x3]
    mov v1.16b, v3.16b
    fcmla v1.4s, v0.4s, v2.4s, #0
    fcmla v1.4s, v0.4s, v2.4s, #90
    str q1, [x2, x3]

instead.

gcc/ChangeLog:
    * tree-vect-slp.c (vect_slp_tree_permute_noop_p): New. (vect_optimize_slp): Optimize permutes. (vectorizable_slp_permutation): Fix typo.

2020-11-05  middle-end: Store and use the SLP instance kind when aborting load/store lanes  (Tamar Christina, 1 file, -7/+1)

This patch stores the SLP instance kind in the SLP instance so that we can use it later when detecting load/store lanes support. This also changes the load/store lane support check to only check if the SLP kind is a store. This means that in order for load/store lanes to be used, all instances must be of kind store.

gcc/ChangeLog:
    * tree-vect-loop.c (vect_analyze_loop_2): Check kind.
    * tree-vect-slp.c (vect_build_slp_instance): New. (enum slp_instance_kind): Move to...
    * tree-vectorizer.h (enum slp_instance_kind): ... here. (SLP_INSTANCE_KIND): New.

2020-11-04  middle-end: Move load/store-lanes check till late.  (Tamar Christina, 1 file, -48/+0)

This moves the code that checks for load/store lanes further in the pipeline and places it after slp_optimize. This would allow us to perform optimizations on the SLP tree and only bail out if we really have a permute. With this change it allows us to handle permutes such as {1,1,1,1} which should be handled by a load and replicate.

This change however makes it all or nothing. Either all instances can be handled or none at all. This is why some of the test cases have been adjusted.

gcc/ChangeLog:
    * tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store lanes check to ...
    * tree-vect-loop.c (vect_analyze_loop_2): ... here.

gcc/testsuite/ChangeLog:
    * gcc.dg/vect/slp-11b.c: Update output scan.
    * gcc.dg/vect/slp-perm-6.c: Likewise.

2020-11-04  bootstrap/97666 - really fix sizeof (bool) issue  (Richard Biener, 1 file, -2/+2)

Pasted the previous fix too quickly; the following fixes the correct spot - the memset, not the allocation.

2020-11-04  Richard Biener  <rguenther@suse.de>
    PR bootstrap/97666
    * tree-vect-slp.c (vect_build_slp_tree_2): Revert previous fix and instead adjust the memset.

2020-11-03  bootstrap/97666 - fix array of bool allocation  (Richard Biener, 1 file, -1/+1)

This fixes the bad assumption that sizeof (bool) == 1.

2020-11-03  Richard Biener  <rguenther@suse.de>
    PR bootstrap/97666
    * tree-vect-slp.c (vect_build_slp_tree_2): Scale allocation of skip_args by sizeof (bool).

2020-11-03  tree-optimization/97678 - fix SLP induction epilogue vectorization  (Richard Biener, 1 file, -2/+6)

This restores not tracking SLP nodes for induction initial values in not nested context because this interferes with peeling and epilogue vectorization.

2020-11-03  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97678
    * tree-vect-slp.c (vect_build_slp_tree_2): Do not track the initial values of inductions when not nested.
    * tree-vect-loop.c (vectorizable_induction): Look at PHI node initial values again for SLP and not nested inductions. Handle LOOP_VINFO_MASK_SKIP_NITERS and cost invariants.
    * gcc.dg/vect/pr97678.c: New testcase.

2020-11-02  Rewrite SLP induction vectorization  (Richard Biener, 1 file, -12/+6)

This rewrites SLP induction vectorization to handle different inductions in the different SLP lanes. It also changes SLP build to represent the initial value (but not the cycle) so it can be enhanced to handle outer loop vectorization later.

Note this FAILs gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c because it removes one CSE optimization that no longer works with non-uniform initial value and step. I'll see to recover from this after outer loop vectorization of inductions works. It might be a bit friendlier to variable-size vectors now, but then we're now building the step vector from scalars ...

2020-11-02  Richard Biener  <rguenther@suse.de>
    * tree.h (build_real_from_wide): Declare.
    * tree.c (build_real_from_wide): New function.
    * tree-vect-slp.c (vect_build_slp_tree_2): Remove restriction on induction vectorization, represent the initial value.
    * tree-vect-loop.c (vect_model_induction_cost): Inline ... (vectorizable_induction): ... here. Rewrite SLP code generation.
    * gcc.dg/vect/slp-49.c: New testcase.

2020-11-02  tree-optimization/97558 - avoid SLP analyzing irrelevant stmts  (Richard Biener, 1 file, -22/+24)

This avoids analyzing reductions that are not relevant (thus dead), which eventually will lead into crashes because the participating stmts meta is not analyzed. For this to work the patch also properly removes reduction groups that are not uniformly recognized as patterns.

2020-11-02  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97558
    * tree-vect-loop.c (vect_fixup_scalar_cycles_with_patterns): Check for any mismatch in pattern vs. non-pattern and dissolve the group if there is one.
    * tree-vect-slp.c (vect_analyze_slp_instance): Avoid analyzing not relevant reductions. (vect_analyze_slp): Avoid analyzing not relevant reduction groups.
    * gcc.dg/vect/pr97558.c: New testcase.

2020-11-02  tree-optimization/97650 - fix ICE in vect_get_and_check_slp_defs  (Richard Biener, 1 file, -0/+1)

I was mistaken to treat vect_external_def as only applying to SSA_NAME defs, so check for that.

2020-11-02  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97650
    * tree-vect-slp.c (vect_get_and_check_slp_defs): Check for SSA_NAME before checking SSA_NAME_IS_DEFAULT_DEF.
    * gcc.dg/vect/bb-slp-pr97650.c: New testcase.

2020-10-30  tree-optimization/97626 - handle SCCs properly in SLP stmt analysis  (Richard Biener, 1 file, -13/+21)

This makes sure to roll back the whole SCC when we fail stmt analysis, otherwise the optimistic visited treatment breaks down with different entries. Rollback is easy when tracking additions to visited in a vector, which also makes the whole thing cheaper than the two hash-sets used before.

2020-10-30  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97626
    * tree-vect-slp.c (vect_slp_analyze_node_operations): Exchange the lvisited hash-set for a vector, roll back recursive adds to visited when analysis failed. (vect_slp_analyze_operations): Likewise.
    * gcc.dg/vect/bb-slp-pr97626.c: New testcase.

2020-10-30  tree-optimization/97633 - fix SLP scheduling of single-node cycles  (Richard Biener, 1 file, -74/+88)

This makes sure to update backedges in single-node cycles.

2020-10-30  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97633
    * tree-vect-slp.c (): Update backedges in single-node cycles. Optimize processing of externals.
    * g++.dg/vect/slp-pr97636.cc: New testcase.
    * gcc.dg/vect/bb-slp-pr97633.c: Likewise.

2020-10-29  vect: Fix load costs for SLP permutes  (Richard Sandiford, 1 file, -2/+37)

For the following test case (compiled with load/store lanes disabled locally):

    void f (uint32_t *restrict x, uint8_t *restrict y, int n)
    {
      for (int i = 0; i < n; ++i)
        {
          x[i * 2] = x[i * 2] + y[i * 2];
          x[i * 2 + 1] = x[i * 2 + 1] + y[i * 2];
        }
    }

we have a redundant no-op permute on the x[] load node:

    node 0x4472350 (max_nunits=8, refcnt=2)
    stmt 0 _5 = *_4;
    stmt 1 _13 = *_12;
    load permutation { 0 1 }

Then, when costing it, we pick a cost of 1, even though we need 4 copies of the x[] load to match a single y[] load:

    ==> examining statement: _5 = *_4;
    Vectorizing an unaligned access.
    vect_model_load_cost: unaligned supported by hardware.
    vect_model_load_cost: inside_cost = 1, prologue_cost = 0 .

The problem is that the code only considers the permutation for the first scalar iteration, rather than for all VF iterations. This patch tries to fix that by making vect_transform_slp_perm_load calculate the value instead.

gcc/
    * tree-vectorizer.h (vect_transform_slp_perm_load): Take an optional extra parameter.
    * tree-vect-slp.c (vect_transform_slp_perm_load): Calculate the number of loads as well as the number of permutes, taking the counting loop from...
    * tree-vect-stmts.c (vect_model_load_cost): ...here. Use the value computed by vect_transform_slp_perm_load for ncopies.

2020-10-29  Consistently pass the vector type for scalar SLP cost compute  (Richard Biener, 1 file, -1/+2)

This avoids randomly (based on whether the stmt is SLP_TREE_REPRESENTATIVE and not a pattern stmt) passing a vector type or NULL to the add_stmt_cost hook for scalar code cost compute. For example the x86 backend uses only the vector type to decide on the scalar computation mode, which throws costing off. So the following explicitly passes the vector type and uses SLP_TREE_VECTYPE for this purpose.

2020-10-29  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_bb_slp_scalar_cost): Pass SLP_TREE_VECTYPE to record_stmt_cost.

2020-10-29  More BB vectorization tweaks  (Richard Biener, 1 file, -4/+6)

This tweaks the op build from splats to allow loads marked as not vectorizable. It also amends some dump prints with the address of the SLP node or the instance to better be able to debug things.

2020-10-29  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_build_slp_tree_2): Allow splatting not vectorizable loads. (vect_build_slp_instance): Amend dumping with address. (vect_slp_convert_to_external): Likewise.
    * gcc.dg/vect/bb-slp-pr65935.c: Adjust.

2020-10-28  dump when SLP analysis fails due to shared vectype mismatch  (Richard Biener, 1 file, -1/+7)

This adds another one.

2020-10-28  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_slp_analyze_node_operations_1): Dump when shared vectype update fails.

2020-10-28  Ignore ignored operands in vect_get_and_check_slp_defs  (Richard Biener, 1 file, -10/+26)

This passes skip_args down to vect_get_and_check_slp_defs so that ignored ops are skipped there, too, and do not fail SLP discovery. This fixes gcc.target/aarch64/sve/reduc_strict_5.c.

2020-10-28  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_get_and_check_slp_defs): For skipped args just push NULLs and vect_uninitialized_def. (vect_build_slp_tree_2): Allocate skip_args for all ops and pass it down to vect_get_and_check_slp_defs.

2020-10-28  tree-optimization/97615 - avoid creating externals from patterns  (Richard Biener, 1 file, -1/+2)

The previous change failed to check for patterns again; the following corrects that.

2020-10-28  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97615
    * tree-vect-slp.c (vect_build_slp_tree_2): Do not build an external from pattern defs.
    * gcc.dg/vect/bb-slp-pr97615.c: New testcase.

2020-10-28  Fix iteration over loads in SLP optimize  (Richard Biener, 1 file, -1/+1)

I've made a typo when refactoring the iteration over all loads in the SLP graph. Fixed.

2020-10-28  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_optimize_slp): Fix iteration over all loads.

2020-10-28  Change the way we split stores in BB vectorization  (Richard Biener, 1 file, -7/+13)

The following fixes missed optimizations due to the strange way we split stores in BB vectorization. The solution is to split at the failure boundary rather than re-aligning it to the vector size chosen for the initial piece, and to also re-analyze any larger matching rest.

2020-10-28  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_build_slp_instance): Split the store group at the failure boundary and also re-analyze a large enough matching rest.
    * gcc.dg/vect/bb-slp-68.c: New testcase.

2020-10-27  Fix BB store group splitting group size compute  (Richard Biener, 1 file, -1/+1)

This fixes a mistake in the previous change in this area so it does what was desired: figure the largest power-of-two group size fitting in the matching area.

2020-10-27  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_build_slp_instance): Use ceil_log2 to compute maximum group-size.
    * gcc.dg/vect/bb-slp-67.c: New testcase.

2020-10-27  Adjust BB vectorization function splitting  (Richard Biener, 1 file, -13/+23)

This adjusts the condition when to split at control altering stmts, only when there's a definition. It also removes the only use of --param slp-max-insns-in-bb which a previous change left doing nothing (but repeatedly print a message for each successive instruction...).

2020-10-27  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_slp_bbs): Remove no-op slp-max-insns-in-bb check. (vect_slp_function): Dump when splitting the function. Adjust the split condition for control altering stmts.
    * params.opt (-param=slp-max-insns-in-bb): Remove.
    * doc/invoke.texi (-param=slp-max-insns-in-bb): Likewise.

2020-10-27  SLP vectorize across PHI nodes  (Richard Biener, 1 file, -222/+442)

This makes SLP discovery detect backedges by seeding the bst_map with the node to be analyzed so it can be picked up from recursive calls. This removes the need to discover backedges in a separate walk. This enables SLP build to handle PHI nodes in full, continuing the SLP build to non-backedges. For loop vectorization this enables outer loop vectorization of nested SLP cycles and for BB vectorization this enables vectorization of PHIs at CFG merges.

It also turns code generation into a SCC discovery walk to handle irreducible regions and nodes only reachable via backedges, where we now also fill in vectorized backedge defs. This requires sanitizing the SLP tree for SLP reduction chains even more, manually filling the backedge SLP def.

This also exposes the fact that CFG copying (and edge splitting until I fixed that) ends up with different edge order in the copy, which doesn't play well with the desired 1:1 mapping of SLP PHI node children and edges for epilogue vectorization. I've tried to fixup CFG copying here but this really looks like a dead (or expensive) end there, so I've done fixup in slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases we can run into.

There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm not sure it's possible to eliminate them all this stage1, so the patch has quite some checks for this case all over the place.

Bootstrapped and tested on x86_64-unknown-linux-gnu. SPEC CPU 2017 and SPEC CPU 2006 successfully built and tested.

2020-10-27  Richard Biener  <rguenther@suse.de>
    * gimple.h (gimple_expr_type): For PHIs return the type of the result.
    * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Make sure edge order into copied loop headers line up with the originals.
    * tree-vect-loop.c (vect_transform_cycle_phi): Handle nested loops with SLP. (vectorizable_phi): New function. (vectorizable_live_operation): For BB vectorization compute insert location here.
    * tree-vect-slp.c (vect_free_slp_tree): Deal with NULL SLP_TREE_CHILDREN entries. (vect_create_new_slp_node): Add overloads with pre-existing node argument. (vect_print_slp_graph): Likewise. (vect_mark_slp_stmts): Likewise. (vect_mark_slp_stmts_relevant): Likewise. (vect_gather_slp_loads): Likewise. (vect_optimize_slp): Likewise. (vect_slp_analyze_node_operations): Likewise. (vect_bb_slp_scalar_cost): Likewise. (vect_remove_slp_scalar_calls): Likewise. (vect_get_and_check_slp_defs): Handle PHIs. (vect_build_slp_tree_1): Handle PHIs. (vect_build_slp_tree_2): Continue SLP build, following PHI arguments. Fix memory leak. (vect_build_slp_tree): Put stub node into the hash-map so we can discover cycles directly. (vect_build_slp_instance): Set the backedge SLP def for reduction chains. (vect_analyze_slp_backedges): Remove. (vect_analyze_slp): Do not call it. (vect_slp_convert_to_external): Release SLP_TREE_LOAD_PERMUTATION. (vect_slp_analyze_node_operations): Handle stray failed backedge defs by failing. (vect_slp_build_vertices): Adjust leaf condition. (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited hash-set to handle cycles. (vect_slp_analyze_operations): Adjust. (vect_bb_partition_graph_r): Likewise. (vect_slp_function): Adjust split condition to allow CFG merges. (vect_schedule_slp_instance): Rename to ... (vect_schedule_slp_node): ... this. Move DFS walk to ... (vect_schedule_scc): ... this new function. (vect_schedule_slp): Call it. Remove ad-hoc vectorized backedge fill code.
    * tree-vect-stmts.c (vect_analyze_stmt): Call vectorizable_phi. (vect_transform_stmt): Likewise. (vect_is_simple_use): Handle vect_backedge_def.
    * tree-vectorizer.c (vec_info::new_stmt_vec_info): Only set loop header PHIs to vect_unknown_def_type for loop vectorization.
    * tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def. (enum stmt_vec_info_type): Add phi_info_type. (vectorizable_phi): Declare.
    * gcc.dg/vect/bb-slp-54.c: New test.
    * gcc.dg/vect/bb-slp-55.c: Likewise.
    * gcc.dg/vect/bb-slp-56.c: Likewise.
    * gcc.dg/vect/bb-slp-57.c: Likewise.
    * gcc.dg/vect/bb-slp-58.c: Likewise.
    * gcc.dg/vect/bb-slp-59.c: Likewise.
    * gcc.dg/vect/bb-slp-60.c: Likewise.
    * gcc.dg/vect/bb-slp-61.c: Likewise.
    * gcc.dg/vect/bb-slp-62.c: Likewise.
    * gcc.dg/vect/bb-slp-63.c: Likewise.
    * gcc.dg/vect/bb-slp-64.c: Likewise.
    * gcc.dg/vect/bb-slp-65.c: Likewise.
    * gcc.dg/vect/bb-slp-66.c: Likewise.
    * gcc.dg/vect/vect-outer-slp-1.c: Likewise.
    * gfortran.dg/vect/O3-bb-slp-1.f: Likewise.
    * gfortran.dg/vect/O3-bb-slp-2.f: Likewise.
    * g++.dg/vect/simd-11.cc: Likewise.

2020-10-27  Avoid uniform lane BB vectorization  (Richard Biener, 1 file, -0/+22)

This makes sure to use splats early when facing uniform internal operands in BB SLP discovery rather than relying on the late heuristics re-building nodes from scratch.

2020-10-27  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_build_slp_tree_2): When vectorizing BBs splat uniform operands and stop SLP discovery.
    * gcc.target/i386/pr95866-1.c: Adjust.

2020-10-27  Move SLP nodes to an alloc-pool  (Richard Biener, 1 file, -0/+17)

This introduces a global alloc-pool for SLP nodes to reduce overhead on SLP allocation churn (which will get worse) and to eventually release SLP cycles, which will retain a refcount of one and thus are never freed at the moment.

2020-10-26  Richard Biener  <rguenther@suse.de>
    * tree-vectorizer.h (slp_tree_pool): Declare. (_slp_tree::operator new): Likewise. (_slp_tree::operator delete): Likewise.
    * tree-vectorizer.c (vectorize_loops): Allocate and free the slp_tree_pool. (pass_slp_vectorize::execute): Likewise.
    * tree-vect-slp.c (slp_tree_pool): Define. (_slp_tree::operator new): Likewise. (_slp_tree::operator delete): Likewise.

2020-10-26  Refactor SLP instance analysis  (Richard Biener, 1 file, -108/+152)

This refactors the toplevel entry to analyze an SLP instance to expose a worker analyzing from a vector of stmts and an SLP entry kind.

2020-10-26  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (enum slp_instance_kind): New. (vect_build_slp_instance): Split out from... (vect_analyze_slp_instance): ... this.

2020-10-22  Refactor vect_analyze_slp_instance a bit  (Richard Biener, 1 file, -47/+38)

In preparation for a larger change this refactors vect_analyze_slp_instance so it doesn't need to know a vector type early.

2020-10-22  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_analyze_slp_instance): Refactor so computing a vector type early is not needed, for store group splitting compute a new vector type based on the desired group size.

2020-10-21  tree-optimization/97500 - avoid SLP backedges for inductions  (Richard Biener, 1 file, -0/+6)

Inductions are not vectorized as cycle but materialized from SCEV data. Filling in backedge SLP nodes confuses this process.

2020-10-21  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97500
    * tree-vect-slp.c (vect_analyze_slp_backedges): Do not fill backedges for inductions.
    * gfortran.dg/pr97500.f90: New testcase.

2020-10-20  tree-optimization/97496 - avoid SLP externs in patterns  (Richard Biener, 1 file, -1/+2)

I forgot to guard the promotion to external for the case where the def is in a pattern.

2020-10-20  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97496
    * tree-vect-slp.c (vect_get_and_check_slp_defs): Guard extern promotion with not in pattern.
    * gcc.dg/vect/bb-slp-pr97496.c: New testcase.

2020-10-19  tree-optimization/97486 - avoid edge insertion in SLP vectorizing  (Richard Biener, 1 file, -0/+9)

This avoids edge inserting and eventual splitting during BB SLP vectorization for now.

2020-10-19  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97486
    * tree-vect-slp.c (vect_slp_function): Split after stmts ending a BB.
    * gcc.dg/vect/bb-slp-pr97486.c: New testcase.

2020-10-19  tree-optimization/97466 - remove spurious assert  (Richard Biener, 1 file, -66/+62)

This removes an assertion that was supposed to be only for temporary debugging. I've also re-indented the code which I missed as well.

2020-10-19  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97466
    * tree-vect-slp.c (vect_get_and_check_slp_defs): Remove spurious assert, re-indent.

2020-10-16  Adjust BB vectorization SLP build heuristics  (Richard Biener, 1 file, -6/+25)

This changes SLP def gathering to not fail due to mismatched def type but instead demote the def to external. This allows the new testcase to be vectorized in full (with GCC 10 it is not vectorized at all and with current trunk we vectorize only the store). This is important since with BB vectorization being applied to bigger pieces of code, the chance increases that we mix internal and external defs for an operand that should end up treated as external (built from scalars).

2020-10-16  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_get_and_check_slp_defs): For BB vectorization swap operands only if it helps, demote mismatches to external.
    * gcc.dg/vect/bb-slp-53.c: New testcase.

2020-10-16  Refactor vect_get_and_check_slp_defs some more  (Richard Biener, 1 file, -59/+82)

This refactors vect_get_and_check_slp_defs so that the ops and def_stmts arrays are filled for all stmts and operands even when we signal failure. This allows later changes for BB vectorization SLP discovery heuristics.

2020-10-16  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_get_and_check_slp_defs): First analyze all operands and fill in the def_stmts and ops entries. (vect_def_types_match): New helper.

2020-10-16  tree-optimization/97428 - split SLP groups for loop vectorization  (Richard Biener, 1 file, -8/+38)

This enables SLP store group splitting also for loop vectorization. For the existing testcase gcc.dg/vect/vect-complex-5.c this then generates much better code, likewise for the PR97428 testcase. Both of those have a splitting opportunity splitting the group into two equal (vector-sized) halves; still, the patch enables quite arbitrary splitting since generally the interleaving scheme results in quite awkward code for even small groups. If any problems surface with this it's easy to restrict the splitting to known-good cases.

2020-10-15  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97428
    * tree-vect-slp.c (vect_analyze_slp_instance): Split store groups also for loop vectorization.
    * gcc.dg/vect/vect-complex-5.c: Expect to SLP.
    * gcc.dg/vect/pr97428.c: Likewise.

2020-10-14  More vect_get_and_check_slp_defs refactoring  (Richard Biener, 1 file, -1/+8)

This is another tiny piece in some bigger refactoring of vect_get_and_check_slp_defs. Split out a test that has nothing to do with def types or commutation.

2020-10-14  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_get_and_check_slp_defs): Split out test for compatible operand types.

2020-10-14  adjust BB SLP build from scalars heuristics  (Richard Biener, 1 file, -20/+31)

We can end up with { _1, 1.0 } * { 3.0, _2 }, which isn't really profitable. The following adjusts things so we reject more than one possibly expensive (non-constant and not uniform) vector CTOR and instead build a CTOR for the scalar operation results. This also moves a check in vect_get_and_check_slp_defs to a better place.

2020-10-14  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_get_and_check_slp_defs): Move check for duplicate/interleave of variable size constants to a place done once and early. (vect_build_slp_tree_2): Adjust heuristics when to build a BB SLP node from scalars.

2020-10-12  optimize permutes in SLP, remove vect_attempt_slp_rearrange_stmts  (Richard Biener, 1 file, -225/+460)

This introduces a permute optimization phase for SLP which is intended to cover the existing permute eliding for SLP reductions plus handling commonizing the easy cases. It currently uses graphds to compute a postorder on the reverse SLP graph and it handles all cases vect_attempt_slp_rearrange_stmts did (hopefully - I've adjusted most testcases that triggered it a few days ago). It restricts itself to move around bijective permutations to simplify things for now, mainly around constant nodes.

As a prerequisite it makes the SLP graph cyclic (ugh). It looks like it would pay off to compute a PRE/POST order visit array once and elide all the recursive SLP graph walks and their visited hash-set, at least for the time where we do not change the SLP graph during such walk. I do not like using graphds too much, but at least I don't have to re-implement yet another RPO walk, so maybe it isn't too bad. It now computes permute placement during iteration and thus should get cycles more obviously correct.

Richard.

2020-10-06  Richard Biener  <rguenther@suse.de>
    * tree-vect-data-refs.c (vect_slp_analyze_instance_dependence): Use SLP_TREE_REPRESENTATIVE.
    * tree-vectorizer.h (_slp_tree::vertex): New member used for graphds interfacing.
    * tree-vect-slp.c (vect_build_slp_tree_2): Allocate space for PHI SLP children. (vect_analyze_slp_backedges): New function filling in SLP node children for PHIs that correspond to backedge values. (vect_analyze_slp): Call vect_analyze_slp_backedges for the graph. (vect_slp_analyze_node_operations): Deal with a cyclic graph. (vect_schedule_slp_instance): Likewise. (vect_schedule_slp): Likewise. (slp_copy_subtree): Remove. (vect_slp_rearrange_stmts): Likewise. (vect_attempt_slp_rearrange_stmts): Likewise. (vect_slp_build_vertices): New functions. (vect_slp_permute): Likewise. (vect_slp_perms_eq): Likewise. (vect_optimize_slp): Remove special code to elide permutations with SLP reductions. Implement generic permute optimization.
    * gcc.dg/vect/bb-slp-50.c: New testcase.
    * gcc.dg/vect/bb-slp-51.c: Likewise.

2020-10-12  fix SLP subgraph detection wrt fully shared lanes  (Richard Biener, 1 file, -7/+10)

When a VEC_PERM SLP node just permutes existing lanes, this confuses the SLP subgraph detection, where I tried to elide a node-based visited hash-map in a way that doesn't work. Fixed by adding one.

2020-10-12  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_bb_partition_graph_r): Use visited hash-map. (vect_bb_partition_graph): Likewise.

2020-10-12  More consistently split SLP groups  (Richard Biener, 1 file, -1/+1)

This makes 'matches' all true after successful SLP discovery so that splitting reliably succeeds. We were picking up an eventual all-false built-from-scalars state in some cases.

2020-10-12  Richard Biener  <rguenther@suse.de>
    * tree-vect-slp.c (vect_analyze_slp_instance): Set matches to true after successful discovery but forced split.

2020-10-09  tree-optimization/97334 - improve BB SLP discovery  (Richard Biener, 1 file, -0/+22)

We're running into a multiplication with one unvectorizable operand we expect to build from scalars, but SLP discovery fatally fails the build of both since one stmt is commutated:

    _60 = _58 * _59;
    _63 = _59 * _62;
    _66 = _59 * _65;
    ...

where _59 is the "bad" operand. The following patch makes the case work where the first stmt has a good operand by not fatally failing the SLP build for the operand but communicating upwards how to commutate.

2020-10-09  Richard Biener  <rguenther@suse.de>
    PR tree-optimization/97334
    * tree-vect-slp.c (vect_build_slp_tree_1): Do not fatally fail lanes other than zero when BB vectorizing.
    * gcc.dg/vect/bb-slp-pr65935.c: Amend.
