aboutsummaryrefslogtreecommitdiff
path: root/gcc/tree-vect-loop.c
AgeCommit message (Collapse)AuthorFilesLines
2020-11-30tree-optimization/98064 - fix BB SLP live lane extract wrt LC SSARichard Biener1-0/+18
This avoids breaking LC SSA when SLP codegen pulled an out-of-loop def into a loop when merging with in-loop defs for an external def. 2020-11-30 Richard Biener <rguenther@suse.de> PR tree-optimization/98064 * tree-vect-loop.c (vectorizable_live_operation): Avoid breaking LC SSA for BB vectorization. * g++.dg/vect/pr98064.cc: New testcase.
2020-11-19vect: Add a “very cheap” cost modelRichard Sandiford1-0/+27
Currently we have three vector cost models: cheap, dynamic and unlimited. -O2 -ftree-vectorize uses “cheap” by default, but that's still relatively aggressive about peeling and aliasing checks, and can lead to significant code size growth. This patch adds an even more conservative choice, which for lack of imagination I've called “very cheap”. It only allows vectorisation if the vector code entirely replaces the scalar code. It also requires one iteration of the vector loop to pay for itself, regardless of how often the loop iterates. (If the vector loop needs multiple iterations to be beneficial then things are probably too close to call, and the conservative thing would be to stick with the scalar code.) The idea is that this should be suitable for -O2, although the patch doesn't change any defaults itself. I tested this by building and running a bunch of workloads for SVE, with three options: (1) -O2 (2) -O2 -ftree-vectorize -fvect-cost-model=very-cheap (3) -O2 -ftree-vectorize [-fvect-cost-model=cheap] All three builds used the default -msve-vector-bits=scalable and ran with the minimum vector length of 128 bits, which should give a worst-case bound for the performance impact. The workloads included a mixture of microbenchmarks and full applications. Because it's quite an eclectic mix, there's not much point giving exact figures. The aim was more to get a general impression. Code size growth with (2) was much lower than with (3). Only a handful of tests increased by more than 5%, and all of them were microbenchmarks. In terms of performance, (2) was significantly faster than (1) on microbenchmarks (as expected) but also on some full apps. Again, performance only regressed on a handful of tests. As expected, the performance of (3) vs. (1) and (3) vs. (2) is more of a mixed bag. There are several significant improvements with (3) over (2), but also some (smaller) regressions. That seems to be in line with -O2 -ftree-vectorize being a kind of -O2.5. The patch reorders vect_cost_model so that values are in order of increasing aggressiveness, which makes it possible to use range checks. The value 0 still represents “unlimited”, so “if (flag_vect_cost_model)” is still a meaningful check. gcc/ * doc/invoke.texi (-fvect-cost-model): Add a very-cheap model. * common.opt (fvect-cost-model=): Add very-cheap as a possible option. (fsimd-cost-model=): Likewise. (vect_cost_model): Add very-cheap. * flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP. Put the values in order of increasing aggressiveness. * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use range checks when comparing against VECT_COST_MODEL_CHEAP. (vect_prune_runtime_alias_test_list): Do not allow any alias checks for the very-cheap cost model. * tree-vect-loop.c (vect_analyze_loop_costing): Do not allow any peeling for the very-cheap cost model. Also require one iteration of the vector loop to pay for itself. gcc/testsuite/ * gcc.dg/vect/vect-cost-model-1.c: New test. * gcc.dg/vect/vect-cost-model-2.c: Likewise. * gcc.dg/vect/vect-cost-model-3.c: Likewise. * gcc.dg/vect/vect-cost-model-4.c: Likewise. * gcc.dg/vect/vect-cost-model-5.c: Likewise. * gcc.dg/vect/vect-cost-model-6.c: Likewise.
2020-11-18tree-optimization/97886 - deal with strange LC PHI nodesRichard Biener1-0/+11
This makes vectorization properly assign vector types to PHI nodes that copy from externals on loop exit edges. 2020-11-18 Richard Biener <rguenther@suse.de> PR tree-optimization/97886 * tree-vect-loop.c (vectorizable_lc_phi): Properly assign vector types to invariants for SLP.
2020-11-16Delay SLP instance loads gatheringRichard Biener1-0/+3
This delays filling SLP_INSTANCE_LOADS. 2020-11-16 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vect_gather_slp_loads): Declare. * tree-vect-loop.c (vect_analyze_loop_2): Call vect_gather_slp_loads. * tree-vect-slp.c (vect_build_slp_instance): Do not gather SLP loads here. (vect_gather_slp_loads): Remove wrapper, new function. (vect_slp_analyze_bb_1): Call it.
2020-11-16tree-optimization/97835 - fix step vector construction for SLP inductionRichard Biener1-1/+1
We're stripping conversions off access functions of inductions and thus the step can be of different sign. Fix bogus step CTORs by converting the elements rather than the whole vector. 2020-11-16 Richard Biener <rguenther@suse.de> PR tree-optimization/97835 * tree-vect-loop.c (vectorizable_induction): Convert step scalars rather than step vector. * gcc.dg/vect/pr97835.c: New testcase.
2020-11-10tree-optimization/97760 - reduction paths with unhandled live stmtRichard Biener1-3/+6
This makes sure we reject reduction paths with a live stmt that is not the last one altering the value. This is because we do not handle this in the epilogue unless there's a scalar epilogue loop. 2020-11-09 Richard Biener <rguenther@suse.de> PR tree-optimization/97760 * tree-vect-loop.c (check_reduction_path): Reject reduction paths we do not handle in epilogue generation. * gcc.dg/vect/pr97760.c: New testcase.
2020-11-09tree-optimization/97753 - fix SLP induction vectRichard Biener1-2/+5
This fixes updating of the step vectors when filling up to group_size. 2020-11-09 Richard Biener <rguenther@suse.de> PR tree-optimization/97753 * tree-vect-loop.c (vectorizable_induction): Fill vec_steps when CSEing inside the group. * gcc.dg/vect/pr97753.c: New testcase.
2020-11-06tree-optimization/97732 - fix init of SLP induction vectorizationRichard Biener1-0/+4
This PR exposes two issues - one that the vector builder treats &x as eligible for VECTOR_CST elements and one that SLP induction vectorization forgets to convert init elements to the vector component type which makes a difference for pointer vs. integer. 2020-11-06 Richard Biener <rguenther@suse.de> PR tree-optimization/97732 * tree-vect-loop.c (vectorizable_induction): Convert the init elements to the vector component type. * gimple-fold.c (gimple_build_vector): Use CONSTANT_CLASS_P rather than TREE_CONSTANT to determine if elements are eligible for VECTOR_CSTs. * gcc.dg/vect/bb-slp-pr97732.c: New testcase.
2020-11-05middle-end: Store and use the SLP instance kind when aborting load/store lanesTamar Christina1-0/+1
This patch stores the SLP instance kind in the SLP instance so that we can use it later when detecting load/store lanes support. This also changes the load/store lane support check to only check if the SLP kind is a store. This means that in order for the load/lanes to work all instances must be of kind store. gcc/ChangeLog: * tree-vect-loop.c (vect_analyze_loop_2): Check kind. * tree-vect-slp.c (vect_build_slp_instance): New. (enum slp_instance_kind): Move to... * tree-vectorizer.h (enum slp_instance_kind): .. Here (SLP_INSTANCE_KIND): New.
2020-11-04middle-end: Move load/store-lanes check till late.Tamar Christina1-0/+72
This moves the code that checks for load/store lanes further in the pipeline and places it after slp_optimize. This would allow us to perform optimizations on the SLP tree and only bail out if we really have a permute. With this change it allows us to handle permutes such as {1,1,1,1} which should be handled by a load and replicate. This change however makes it all or nothing. Either all instances can be handled or none at all. This is why some of the test cases have been adjusted. gcc/ChangeLog: * tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store lanes check to ... * tree-vect-loop.c (vect_analyze_loop_2): ..Here gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-11b.c: Update output scan. * gcc.dg/vect/slp-perm-6.c: Likewise.
2020-11-04add costing to SLP vectorized PHIsRichard Biener1-1/+3
I forgot to cost vectorized PHIs. Scalar PHIs are just costed as scalar_stmt so the following costs vector PHIs as vector_stmt. 2020-11-04 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vectorizable_phi): Adjust prototype. * tree-vect-stmts.c (vect_transform_stmt): Adjust. (vect_analyze_stmt): Pass cost_vec to vectorizable_phi. * tree-vect-loop.c (vectorizable_phi): Do costing.
2020-11-04tree-optimization/97709 - set abnormal flag when vectorizing live lanesRichard Biener1-0/+3
This properly sets the abnormal flag when vectorizing live lanes when the original scalar was live across an abnormal edge. 2020-11-04 Richard Biener <rguenther@suse.de> PR tree-optimization/97709 * tree-vect-loop.c (vectorizable_live_operation): Set SSA_NAME_OCCURS_IN_ABNORMAL_PHI when necessary. * gcc.dg/vect/bb-slp-pr97709.c: New testcase.
2020-11-04Re-instantiate SLP induction IV CSERichard Biener1-2/+19
This re-instantiates the previously removed CSE, fixing the FAIL of gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c It turns out the previous approach still works. 2020-11-04 Richard Biener <rguenther@suse.de> * tree-vect-loop.c (vectorizable_induction): Re-instantiate previously removed CSE of SLP IVs.
2020-11-03tree-optimization/80928 - SLP vectorize nested loop inductionRichard Biener1-65/+51
This adds SLP vectorization of nested inductions. 2020-11-03 Richard Biener <rguenther@suse.de> PR tree-optimization/80928 * tree-vect-loop.c (vectorizable_induction): SLP vectorize nested inductions. * gcc.dg/vect/vect-outer-slp-2.c: New testcase. * gcc.dg/vect/vect-outer-slp-3.c: Likewise.
2020-11-03tree-optimization/97678 - fix SLP induction epilogue vectorizationRichard Biener1-5/+44
This restores not tracking SLP nodes for induction initial values in not nested context because this interferes with peeling and epilogue vectorization. 2020-11-03 Richard Biener <rguenther@suse.de> PR tree-optimization/97678 * tree-vect-slp.c (vect_build_slp_tree_2): Do not track the initial values of inductions when not nested. * tree-vect-loop.c (vectorizable_induction): Look at PHI node initial values again for SLP and not nested inductions. Handle LOOP_VINFO_MASK_SKIP_NITERS and cost invariants. * gcc.dg/vect/pr97678.c: New testcase.
2020-11-02Rewrite SLP induction vectorizationRichard Biener1-135/+161
This rewrites SLP induction vectorization to handle different inductions in the different SLP lanes. It also changes SLP build to represent the initial value (but not the cycle) so it can be enhanced to handle outer loop vectorization later. Note this FAILs gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c because it removes one CSE optimization that no longer works with non-uniform initial value and step. I'll see to recover from this after outer loop vectorization of inductions works. It might be a bit friendlier to variable-size vectors now but then we're now building the step vector from scalars ... 2020-11-02 Richard Biener <rguenther@suse.de> * tree.h (build_real_from_wide): Declare. * tree.c (build_real_from_wide): New function. * tree-vect-slp.c (vect_build_slp_tree_2): Remove restriction on induction vectorization, represent the initial value. * tree-vect-loop.c (vect_model_induction_cost): Inline ... (vectorizable_induction): ... here. Rewrite SLP code generation. * gcc.dg/vect/slp-49.c: New testcase.
2020-11-02tree-optimization/97558 - compute vectype for SLP nested cyclesRichard Biener1-3/+22
This makes sure to compute the vector type for invariant SLP children of nested cycles. 2020-11-02 Richard Biener <rguenther@suse.de> PR tree-optimization/97558 * tree-vect-loop.c (vectorizable_reduction): For nested SLP cycles compute invariant operands vector type. * gcc.dg/vect/pr97558-2.c: New testcase.
2020-11-02tree-optimization/97558 - avoid SLP analyzing irrelevant stmtsRichard Biener1-21/+44
This avoids analyzing reductions that are not relevant (thus dead) which eventually will lead into crashes because the participating stmts meta is not analyzed. For this to work the patch also properly removes reduction groups that are not uniformly recognized as patterns. 2020-11-02 Richard Biener <rguenther@suse.de> PR tree-optimization/97558 * tree-vect-loop.c (vect_fixup_scalar_cycles_with_patterns): Check for any mismatch in pattern vs. non-pattern and dissolve the group if there is one. * tree-vect-slp.c (vect_analyze_slp_instance): Avoid analyzing not relevant reductions. (vect_analyze_slp): Avoid analyzing not relevant reduction groups. * gcc.dg/vect/pr97558.c: New testcase.
2020-10-29Fix some memleaksRichard Biener1-1/+1
This fixes some memleaks, one older, one recently introduced. 2020-10-29 Richard Biener <rguenther@suse.de> * tree-ssa-pre.c (compute_avail): Free operands consistently. * tree-vect-loop.c (vectorizable_phi): Make sure all operand defs vectors are released.
2020-10-27SLP vectorize across PHI nodesRichard Biener1-11/+101
This makes SLP discovery detect backedges by seeding the bst_map with the node to be analyzed so it can be picked up from recursive calls. This removes the need to discover backedges in a separate walk. This enables SLP build to handle PHI nodes in full, continuing the SLP build to non-backedges. For loop vectorization this enables outer loop vectorization of nested SLP cycles and for BB vectorization this enables vectorization of PHIs at CFG merges. It also turns code generation into a SCC discovery walk to handle irreducible regions and nodes only reachable via backedges where we now also fill in vectorized backedge defs. This requires sanitizing the SLP tree for SLP reduction chains even more, manually filling the backedge SLP def. This also exposes the fact that CFG copying (and edge splitting until I fixed that) ends up with different edge order in the copy which doesn't play well with the desired 1:1 mapping of SLP PHI node children and edges for epilogue vectorization. I've tried to fixup CFG copying here but this really looks like a dead (or expensive) end there so I've done fixup in slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases we can run into. There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm not sure it's possible to eliminate them all this stage1 so the patch has quite some checks for this case all over the place. Bootstrapped and tested on x86_64-unknown-linux-gnu. SPEC CPU 2017 and SPEC CPU 2006 successfully built and tested. 2020-10-27 Richard Biener <rguenther@suse.de> * gimple.h (gimple_expr_type): For PHIs return the type of the result. * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Make sure edge order into copied loop headers line up with the originals. * tree-vect-loop.c (vect_transform_cycle_phi): Handle nested loops with SLP. (vectorizable_phi): New function. (vectorizable_live_operation): For BB vectorization compute insert location here. * tree-vect-slp.c (vect_free_slp_tree): Deal with NULL SLP_TREE_CHILDREN entries. (vect_create_new_slp_node): Add overloads with pre-existing node argument. (vect_print_slp_graph): Likewise. (vect_mark_slp_stmts): Likewise. (vect_mark_slp_stmts_relevant): Likewise. (vect_gather_slp_loads): Likewise. (vect_optimize_slp): Likewise. (vect_slp_analyze_node_operations): Likewise. (vect_bb_slp_scalar_cost): Likewise. (vect_remove_slp_scalar_calls): Likewise. (vect_get_and_check_slp_defs): Handle PHIs. (vect_build_slp_tree_1): Handle PHIs. (vect_build_slp_tree_2): Continue SLP build, following PHI arguments. Fix memory leak. (vect_build_slp_tree): Put stub node into the hash-map so we can discover cycles directly. (vect_build_slp_instance): Set the backedge SLP def for reduction chains. (vect_analyze_slp_backedges): Remove. (vect_analyze_slp): Do not call it. (vect_slp_convert_to_external): Release SLP_TREE_LOAD_PERMUTATION. (vect_slp_analyze_node_operations): Handle stray failed backedge defs by failing. (vect_slp_build_vertices): Adjust leaf condition. (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited hash-set to handle cycles. (vect_slp_analyze_operations): Adjust. (vect_bb_partition_graph_r): Likewise. (vect_slp_function): Adjust split condition to allow CFG merges. (vect_schedule_slp_instance): Rename to ... (vect_schedule_slp_node): ... this. Move DFS walk to ... (vect_schedule_scc): ... this new function. (vect_schedule_slp): Call it. Remove ad-hoc vectorized backedge fill code. * tree-vect-stmts.c (vect_analyze_stmt): Call vectorizable_phi. (vect_transform_stmt): Likewise. (vect_is_simple_use): Handle vect_backedge_def. * tree-vectorizer.c (vec_info::new_stmt_vec_info): Only set loop header PHIs to vect_unknown_def_type for loop vectorization. * tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def. (enum stmt_vec_info_type): Add phi_info_type. (vectorizable_phi): Declare. * gcc.dg/vect/bb-slp-54.c: New test. * gcc.dg/vect/bb-slp-55.c: Likewise. * gcc.dg/vect/bb-slp-56.c: Likewise. * gcc.dg/vect/bb-slp-57.c: Likewise. * gcc.dg/vect/bb-slp-58.c: Likewise. * gcc.dg/vect/bb-slp-59.c: Likewise. * gcc.dg/vect/bb-slp-60.c: Likewise. * gcc.dg/vect/bb-slp-61.c: Likewise. * gcc.dg/vect/bb-slp-62.c: Likewise. * gcc.dg/vect/bb-slp-63.c: Likewise. * gcc.dg/vect/bb-slp-64.c: Likewise. * gcc.dg/vect/bb-slp-65.c: Likewise. * gcc.dg/vect/bb-slp-66.c: Likewise. * gcc.dg/vect/vect-outer-slp-1.c: Likewise. * gfortran.dg/vect/O3-bb-slp-1.f: Likewise. * gfortran.dg/vect/O3-bb-slp-2.f: Likewise. * g++.dg/vect/simd-11.cc: Likewise.
2020-10-22vect: Remove redundant LOOP_VINFO_FULLY_MASKED_PKewen Lin1-2/+1
Remove one redundant LOOP_VINFO_FULLY_MASKED_P condition check which will be checked in vect_use_loop_mask_for_alignment_p. gcc/ChangeLog: * tree-vect-loop.c (vect_transform_loop): Remove the redundant LOOP_VINFO_FULLY_MASKED_P check.
2020-10-20Fix latch PHI arg lookup in vectorizable_reduction for double-reductionRichard Biener1-2/+4
We were using the wrong loop to figure the latch arg of a double-reduction PHI. Which isn't a problem in case ->dest_idx match up with the outer loop edges - but that's of course not guaranteed. 2020-10-20 Richard Biener <rguenther@suse.de> * tree-vect-loop.c (vectorizable_reduction): Use the correct loops latch edge for the PHI arg lookup.
2020-10-15Fix ICE in vectorizable_live_operationRichard Biener1-2/+5
This fixes the case where the insertion iterator for the live stmt is the end of a BB by adjusting the dominance query to the definition of the def we're substituting. 2020-10-15 Richard Biener <rguenther@suse.de> * tree-vect-loop.c (vectorizable_live_operation): Adjust dominance query. * gcc.dg/vect/bb-slp-52.c: New testcase.
2020-10-09random memory leak fixesRichard Biener1-0/+1
This fixes leaks discovered checking whether I introduced new ones with the last vectorizer changes. 2020-10-09 Richard Biener <rguenther@suse.de> * cgraphunit.c (expand_all_functions): Free tp_first_run_order. * ipa-modref.c (pass_ipa_modref::execute): Free order. * tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Free loop body. * tree-vect-data-refs.c (vect_find_stmt_data_reference): Free data references upon failure. * tree-vect-loop.c (update_epilogue_loop_vinfo): Free BBs array of the original loop. * tree-vect-slp.c (vect_slp_bbs): Use an auto_vec for dataref_groups to release its memory.
2020-09-29tree-optimization/97241 - fix ICE in reduction vectorizationRichard Biener1-12/+5
The following moves an ad-hoc attempt at discovering the SLP node for a stmt to the place where we can find it in lock-step when we find the stmt itself. 2020-09-29 Richard Biener <rguenther@suse.de> PR tree-optimization/97241 * tree-vect-loop.c (vectorizable_reduction): Move finding the SLP node for the reduction stmt to a better place. * gcc.dg/vect/pr97241.c: New testcase.
2020-09-23vect: Fix epilogue loop handling of partial vectorsRichard Sandiford1-68/+128
This patch fixes the fallout that Kewen reported on Power after the recent change to avoid unnecessary use of partial vectors. As Kewen said, the problem is that vect_analyze_loop_2 doesn't know how many epilogue iterations there will be, and so it cannot make a final decision about whether the number of iterations forces an epilogue loop to use partial vectors. This is similar to the current situation for peeling: we don't know during initial analysis whether an epilogue loop will itself require peeling. Instead we decide that during vect_do_peeling, where the final number of epilogue loop iterations is known. The patch takes a similar approach for the decision about whether to use partial vectors. As the comments in the patch say, the idea is that vect_analyze_loop_2 should make peeling and partial- vector decisions based on the assumption that the loop_vinfo will be used as the main loop, while vect_do_peeling should make them in the knowledge that the loop_vinfo will be used as an epilogue loop. This allows the same analysis to be used for both cases, which we rely on for implementing VECT_COMPARE_COSTS; see the big comment in vect_analyze_loop for details. I hope the patch makes the (mostly preexisting) structure a bit more obvious. It isn't what anyone would design from scratch, but that's the nature of working with a mature vector framework. Arranging things this way means that vect_verify_full_masking and vect_verify_loop_lens now become part of the “can” rather than “will” test for partial vectors. Also, while splitting out the logic that handles epilogues with constant iterations, I added a check to make sure that we don't try to use partial vectors to vectorise a single-scalar loop. This required some changes to the Power tests. gcc/ * tree-vectorizer.h (determine_peel_for_niter): Delete in favor of... (vect_determine_partial_vectors_and_peeling): ...this new function. * tree-vect-loop-manip.c (vect_update_epilogue_niters): New function. Reject using vector epilogue loops for single iterations. Install the constant number of epilogue loop iterations in the associated loop_vinfo. Rely on vect_determine_partial_vectors_and_peeling to do the main part of the test. (vect_do_peeling): Use vect_update_epilogue_niters to handle epilogue loops with a known number of iterations. Skip recomputing the number of iterations later in that case. Otherwise, use vect_determine_partial_vectors_and_peeling to decide whether the epilogue loop needs to use partial vectors or peeling. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the default can_use_partial_vectors_p to false if partial-vector-usage=0. (determine_peel_for_niter): Remove in favor of... (vect_determine_partial_vectors_and_peeling): ...this new function, split out from... (vect_analyze_loop_2): ...here. Reflect the vect_verify_full_masking and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P rather than USING_PARTIAL_VECTORS_P. gcc/testsuite/ * gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the single-iteration epilogues of the 64-bit loops to be vectorized. * gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
2020-09-23tree-optimization/97173 - extend assert in vectorizable_live_operationRichard Biener1-2/+4
The condition we're expecting to eventually run into isn't fully captured by checking for CTORs, instead we can also run into the CTOR element conversion. 2020-09-23 Richard Biener <rguenther@suse.de> PR tree-optimization/97173 * tree-vect-loop.c (vectorizable_live_operation): Extend assert to also conver element conversions. * gcc.dg/vect/pr97173.c: New testcase.
2020-09-18tree-optimization/97095 - fix typo in vectorizable_live_operationRichard Biener1-1/+1
This fixes a typo introduced with the last change and not noticed because those vectorizer access macros are not type safe ... 2020-09-18 Richard Biener <rguenther@suse.de> PR tree-optimization/97095 * tree-vect-loop.c (vectorizable_live_operation): Get the SLP vector type from the correct object. * gfortran.dg/pr97095.f: New testcase.
2020-09-16vec: don't select partial vectors when unnecessaryAndrea Corallo1-36/+49
gcc/ChangeLog 2020-09-09 Andrea Corallo <andrea.corallo@arm.com> * tree-vect-loop.c (vect_need_peeling_or_partial_vectors_p): New function. (vect_analyze_loop_2): Make use of it not to select partial vectors if no peel is required. (determine_peel_for_niter): Move out some logic into 'vect_need_peeling_or_partial_vectors_p'. gcc/testsuite/ChangeLog 2020-09-09 Andrea Corallo <andrea.corallo@arm.com> * gcc.target/aarch64/sve/cost_model_10.c: New test. * gcc.target/aarch64/sve/clastb_8.c: Update test for new vectorization strategy. * gcc.target/aarch64/sve/cost_model_5.c: Likewise. * gcc.target/aarch64/sve/struct_vect_14.c: Likewise. * gcc.target/aarch64/sve/struct_vect_15.c: Likewise. * gcc.target/aarch64/sve/struct_vect_16.c: Likewise. * gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
2020-09-16remove STMT_VINFO_NUM_SLP_USESRichard Biener1-3/+5
This removes STMT_VINFO_NUM_SLP_USES by pushing the setting of the shared stmt_vec_info vector type to where we actually need it which is alignment analysis and vectorizable_* analysis (where we could eventually elide it for non-load/store operations). In particular "uses" in the cache and in disqualified SLP subgraphs should no longer provide conflicting vector types this way. 2020-09-16 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (_stmt_vec_info::num_slp_uses): Remove. (STMT_VINFO_NUM_SLP_USES): Likewise. (vect_free_slp_instance): Adjust. (vect_update_shared_vectype): Declare. * tree-vectorizer.c (vec_info::~vec_info): Adjust. * tree-vect-loop.c (vect_analyze_loop_2): Likewise. (vectorizable_live_operation): Use vector type from SLP_TREE_REPRESENTATIVE. (vect_transform_loop): Adjust. * tree-vect-data-refs.c (vect_slp_analyze_node_alignment): Set the shared vector type. * tree-vect-slp.c (vect_free_slp_tree): Remove final_p parameter, remove STMT_VINFO_NUM_SLP_USES updating. (vect_free_slp_instance): Adjust. (vect_create_new_slp_node): Remove STMT_VINFO_NUM_SLP_USES updating. (vect_update_shared_vectype): Always compare with the present vector type, update if NULL. (vect_build_slp_tree_1): Do not update the shared vector type here. (vect_build_slp_tree_2): Adjust. (slp_copy_subtree): Likewise. (vect_attempt_slp_rearrange_stmts): Likewise. (vect_analyze_slp_instance): Likewise. (vect_analyze_slp): Likewise. (vect_slp_analyze_node_operations_1): Update the shared vector type. (vect_slp_analyze_operations): Adjust. (vect_slp_analyze_bb_1): Likewise.
2020-09-11improve BB vectorization dump locationsRichard Biener1-1/+1
This tries to improve BB vectorization dumps by providing more precise locations. Currently the vect_location is simply the very last stmt in a basic-block that has a location. So for double a[4], b[4]; int x[4], y[4]; void foo() { a[0] = b[0]; // line 5 a[1] = b[1]; a[2] = b[2]; a[3] = b[3]; x[0] = y[0]; // line 9 x[1] = y[1]; x[2] = y[2]; x[3] = y[3]; } // line 13 we show the user with -O3 -fopt-info-vec t.c:13:1: optimized: basic block part vectorized using 16 byte vectors while with the patch we point to both independently vectorized opportunities: t.c:5:8: optimized: basic block part vectorized using 16 byte vectors t.c:9:8: optimized: basic block part vectorized using 16 byte vectors there's the possibility that the location regresses in case the root stmt in the SLP instance has no location. For a SLP subgraph with multiple entries the location also chooses one entry at random, not sure in which case we want to dump both. Still as the plan is to extend the basic-block vectorization scope from single basic-block to multiple ones this is a first step to preserve something sensible. Implementation-wise this makes both costing and code-generation happen on the subgraphs as analyzed. 2020-09-11 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (_slp_instance::location): New method. (vect_schedule_slp): Adjust prototype. * tree-vectorizer.c (vec_info::remove_stmt): Adjust the BB region begin if we removed the stmt it points to. * tree-vect-loop.c (vect_transform_loop): Adjust. * tree-vect-slp.c (_slp_instance::location): Implement. (vect_analyze_slp_instance): For BB vectorization set vect_location to that of the instance. (vect_slp_analyze_operations): Likewise. (vect_bb_vectorization_profitable_p): Remove wrapper. (vect_slp_analyze_bb_1): Remove cost check here. (vect_slp_region): Cost check and code generate subgraphs separately, report optimized locations and missed optimizations due to profitability for each of them. (vect_schedule_slp): Get the vector of SLP graph entries to vectorize as argument.
2020-09-07vec: Revert "dead code removal in tree-vect-loop.c" and add a comment.Andrea Corallo1-4/+13
gcc/ChangeLog 2020-09-07 Andrea Corallo <andrea.corallo@arm.com> * tree-vect-loop.c (vect_estimate_min_profitable_iters): Revert dead-code removal introduced by 09fa6acd8d9 + add a comment to clarify.
2020-09-07code generate live lanes in basic-block vectorizationRichard Biener1-94/+149
The following adds the capability to code-generate live lanes in basic-block vectorization using lane extracts from vector stmts rather than keeping the original scalar code around for those. This eventually makes previously not profitable vectorizations profitable (the live scalar code was appropriately costed so are the lane extracts now), without considering the cost model this patch doesn't add or remove any basic-block vectorization capabilities. The patch re/ab-uses STMT_VINFO_LIVE_P in basic-block vectorization mode to tell whether a live lane is vectorized or whether it is provided by means of keeping the scalar code live. The patch is a first step towards vectorizing sequences of stmts that do not end up in stores or vector constructors though. Bootstrapped and tested on x86_64-unknown-linux-gnu. 2020-09-04 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vectorizable_live_operation): Adjust. * tree-vect-loop.c (vectorizable_live_operation): Vectorize live lanes out of basic-block vectorization nodes. * tree-vect-slp.c (vect_bb_slp_mark_live_stmts): New function. (vect_slp_analyze_operations): Analyze live lanes and their vectorization possibility after the whole SLP graph is final. (vect_bb_slp_scalar_cost): Adjust for vectorized live lanes. * tree-vect-stmts.c (can_vectorize_live_stmts): Adjust. (vect_transform_stmt): Call can_vectorize_live_stmts also for basic-block vectorization. * gcc.dg/vect/bb-slp-46.c: New testcase. * gcc.dg/vect/bb-slp-47.c: Likewise. * gcc.dg/vect/bb-slp-32.c: Adjust.
2020-09-04tree-optimization/96920 - another ICE when vectorizing nested cyclesRichard Biener1-35/+67
This refines the previous fix for PR96698 by re-doing how and where we arrange for setting vectorized cycle PHI backedge values. 2020-09-04 Richard Biener <rguenther@suse.de> PR tree-optimization/96698 PR tree-optimization/96920 * tree-vectorizer.h (loop_vec_info::reduc_latch_defs): Remove. (loop_vec_info::reduc_latch_slp_defs): Likewise. * tree-vect-stmts.c (vect_transform_stmt): Remove vectorized cycle PHI latch code. * tree-vect-loop.c (maybe_set_vectorized_backedge_value): New helper to set vectorized cycle PHI latch values. (vect_transform_loop): Walk over all PHIs again after vectorizing them, calling maybe_set_vectorized_backedge_value. Call maybe_set_vectorized_backedge_value for each vectorized stmt. Remove delayed update code. * tree-vect-slp.c (vect_analyze_slp_instance): Initialize SLP instance reduc_phis member. (vect_schedule_slp): Set vectorized cycle PHI latch values. * gfortran.dg/vect/pr96920.f90: New testcase. * gcc.dg/vect/pr96920.c: Likewise.
2020-09-04vec: dead code removal in tree-vect-loop.cAndrea Corallo1-11/+4
gcc/ChangeLog 2020-09-04 Andrea Corallo <andrea.corallo@arm.com> * tree-vect-loop.c (vect_estimate_min_profitable_iters): Remove dead code as LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) is always verified.
2020-08-27vec: add exact argument for various grow functions.Martin Liska1-4/+4
gcc/ada/ChangeLog: * gcc-interface/trans.c (gigi): Set exact argument of a vector growth function to true. (Attribute_to_gnu): Likewise. gcc/ChangeLog: * alias.c (init_alias_analysis): Set exact argument of a vector growth function to true. * calls.c (internal_arg_pointer_based_exp_scan): Likewise. * cfgbuild.c (find_many_sub_basic_blocks): Likewise. * cfgexpand.c (expand_asm_stmt): Likewise. * cfgrtl.c (rtl_create_basic_block): Likewise. * combine.c (combine_split_insns): Likewise. (combine_instructions): Likewise. * config/aarch64/aarch64-sve-builtins.cc (function_expander::add_output_operand): Likewise. (function_expander::add_input_operand): Likewise. (function_expander::add_integer_operand): Likewise. (function_expander::add_address_operand): Likewise. (function_expander::add_fixed_operand): Likewise. * df-core.c (df_worklist_dataflow_doublequeue): Likewise. * dwarf2cfi.c (update_row_reg_save): Likewise. * early-remat.c (early_remat::init_block_info): Likewise. (early_remat::finalize_candidate_indices): Likewise. * except.c (sjlj_build_landing_pads): Likewise. * final.c (compute_alignments): Likewise. (grow_label_align): Likewise. * function.c (temp_slots_at_level): Likewise. * fwprop.c (build_single_def_use_links): Likewise. (update_uses): Likewise. * gcc.c (insert_wrapper): Likewise. * genautomata.c (create_state_ainsn_table): Likewise. (add_vect): Likewise. (output_dead_lock_vect): Likewise. * genmatch.c (capture_info::capture_info): Likewise. (parser::finish_match_operand): Likewise. * genrecog.c (optimize_subroutine_group): Likewise. (merge_pattern_info::merge_pattern_info): Likewise. (merge_into_decision): Likewise. (print_subroutine_start): Likewise. (main): Likewise. * gimple-loop-versioning.cc (loop_versioning::loop_versioning): Likewise. * gimple.c (gimple_set_bb): Likewise. * graphite-isl-ast-to-gimple.c (translate_isl_ast_node_user): Likewise. * haifa-sched.c (sched_extend_luids): Likewise. (extend_h_i_d): Likewise. * insn-addr.h (insn_addresses_new): Likewise. * ipa-cp.c (gather_context_independent_values): Likewise. (find_more_contexts_for_caller_subset): Likewise. * ipa-devirt.c (final_warning_record::grow_type_warnings): Likewise. (ipa_odr_read_section): Likewise. * ipa-fnsummary.c (evaluate_properties_for_edge): Likewise. (ipa_fn_summary_t::duplicate): Likewise. (analyze_function_body): Likewise. (ipa_merge_fn_summary_after_inlining): Likewise. (read_ipa_call_summary): Likewise. * ipa-icf.c (sem_function::bb_dict_test): Likewise. * ipa-prop.c (ipa_alloc_node_params): Likewise. (parm_bb_aa_status_for_bb): Likewise. (ipa_compute_jump_functions_for_edge): Likewise. (ipa_analyze_node): Likewise. (update_jump_functions_after_inlining): Likewise. (ipa_read_edge_info): Likewise. (read_ipcp_transformation_info): Likewise. (ipcp_transform_function): Likewise. * ipa-reference.c (ipa_reference_write_optimization_summary): Likewise. * ipa-split.c (execute_split_functions): Likewise. * ira.c (find_moveable_pseudos): Likewise. * lower-subreg.c (decompose_multiword_subregs): Likewise. * lto-streamer-in.c (input_eh_regions): Likewise. (input_cfg): Likewise. (input_struct_function_base): Likewise. (input_function): Likewise. * modulo-sched.c (set_node_sched_params): Likewise. (extend_node_sched_params): Likewise. (schedule_reg_moves): Likewise. * omp-general.c (omp_construct_simd_compare): Likewise. * passes.c (pass_manager::create_pass_tab): Likewise. (enable_disable_pass): Likewise. * predict.c (determine_unlikely_bbs): Likewise. * profile.c (compute_branch_probabilities): Likewise. * read-rtl-function.c (function_reader::parse_block): Likewise. * read-rtl.c (rtx_reader::read_rtx_code): Likewise. * reg-stack.c (stack_regs_mentioned): Likewise. * regrename.c (regrename_init): Likewise. * rtlanal.c (T>::add_single_to_queue): Likewise. * sched-deps.c (init_deps_data_vector): Likewise. * sel-sched-ir.c (sel_extend_global_bb_info): Likewise. (extend_region_bb_info): Likewise. (extend_insn_data): Likewise. * symtab.c (symtab_node::create_reference): Likewise. * tracer.c (tail_duplicate): Likewise. * trans-mem.c (tm_region_init): Likewise. (get_bb_regions_instrumented): Likewise. * tree-cfg.c (init_empty_tree_cfg_for_function): Likewise. (build_gimple_cfg): Likewise. (create_bb): Likewise. (move_block_to_fn): Likewise. * tree-complex.c (tree_lower_complex): Likewise. * tree-if-conv.c (predicate_rhs_code): Likewise. * tree-inline.c (copy_bb): Likewise. * tree-into-ssa.c (get_ssa_name_ann): Likewise. (mark_phi_for_rewrite): Likewise. * tree-object-size.c (compute_builtin_object_size): Likewise. (init_object_sizes): Likewise. * tree-predcom.c (initialize_root_vars_store_elim_1): Likewise. (initialize_root_vars_store_elim_2): Likewise. (prepare_initializers_chain_store_elim): Likewise. * tree-ssa-address.c (addr_for_mem_ref): Likewise. (multiplier_allowed_in_address_p): Likewise. * tree-ssa-coalesce.c (ssa_conflicts_new): Likewise. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-ssa-loop-ivopts.c (addr_offset_valid_p): Likewise. (get_address_cost_ainc): Likewise. * tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk): Likewise. * tree-ssa-pre.c (add_to_value): Likewise. (phi_translate_1): Likewise. (do_pre_regular_insertion): Likewise. (do_pre_partial_partial_insertion): Likewise. (init_pre): Likewise. * tree-ssa-propagate.c (ssa_prop_init): Likewise. (update_call_from_tree): Likewise. * tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Likewise. * tree-ssa-sccvn.c (vn_reference_lookup_3): Likewise. (vn_reference_lookup_pieces): Likewise. (eliminate_dom_walker::eliminate_push_avail): Likewise. * tree-ssa-strlen.c (set_strinfo): Likewise. (get_stridx_plus_constant): Likewise. (zero_length_string): Likewise. (find_equal_ptrs): Likewise. (printf_strlen_execute): Likewise. * tree-ssa-threadedge.c (set_ssa_name_value): Likewise. * tree-ssanames.c (make_ssa_name_fn): Likewise. * tree-streamer-in.c (streamer_read_tree_bitfields): Likewise. * tree-vect-loop.c (vect_record_loop_mask): Likewise. (vect_get_loop_mask): Likewise. (vect_record_loop_len): Likewise. (vect_get_loop_len): Likewise. * tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Likewise. * tree-vect-slp.c (vect_slp_convert_to_external): Likewise. (vect_bb_slp_scalar_cost): Likewise. (vect_bb_vectorization_profitable_p): Likewise. (vectorizable_slp_permutation): Likewise. * tree-vect-stmts.c (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (scan_store_can_perm_p): Likewise. (vectorizable_store): Likewise. * expr.c: Likewise. * vec.c (test_safe_grow_cleared): Likewise. * vec.h (vec_safe_grow): Likewise. (vec_safe_grow_cleared): Likewise. (vl_ptr>::safe_grow): Likewise. (vl_ptr>::safe_grow_cleared): Likewise. * config/c6x/c6x.c (insn_set_clock): Likewise. gcc/c/ChangeLog: * gimple-parser.c (c_parser_gimple_compound_statement): Set exact argument of a vector growth function to true. gcc/cp/ChangeLog: * class.c (build_vtbl_initializer): Set exact argument of a vector growth function to true. * constraint.cc (get_mapped_args): Likewise. * decl.c (cp_maybe_mangle_decomp): Likewise. (cp_finish_decomp): Likewise. * parser.c (cp_parser_omp_for_loop): Likewise. * pt.c (canonical_type_parameter): Likewise. * rtti.c (get_pseudo_ti_init): Likewise. gcc/fortran/ChangeLog: * trans-openmp.c (gfc_trans_omp_do): Set exact argument of a vector growth function to true. gcc/lto/ChangeLog: * lto-common.c (lto_file_finalize): Set exact argument of a vector growth function to true.
2020-08-26tree-optimization/96698 - fix ICE when vectorizing nested cyclesRichard Biener1-1/+34
This fixes vectorized PHI latch edge updating and delay it until all of the loop is code generated to deal with the case that the latch def is a PHI in the same block. 2020-08-26 Richard Biener <rguenther@suse.de> PR tree-optimization/96698 * tree-vectorizer.h (loop_vec_info::reduc_latch_defs): New. (loop_vec_info::reduc_latch_slp_defs): Likewise. * tree-vect-stmts.c (vect_transform_stmt): Only record stmts to update PHI latches from, perform the update ... * tree-vect-loop.c (vect_transform_loop): ... here after vectorizing those PHIs. (info_for_reduction): Properly handle non-reduction PHIs. * gcc.dg/vect/pr96698.c: New testcase.
2020-08-24SLP: support entire BB.Martin Liska1-2/+3
gcc/ChangeLog: * tree-vect-data-refs.c (dr_group_sort_cmp): Work on data_ref_pair. (vect_analyze_data_ref_accesses): Work on groups. (vect_find_stmt_data_reference): Add group_id argument and fill up dataref_groups vector. * tree-vect-loop.c (vect_get_datarefs_in_loop): Pass new arguments. (vect_analyze_loop_2): Likewise. * tree-vect-slp.c (vect_slp_analyze_bb_1): Pass argument. (vect_slp_bb_region): Likewise. (vect_slp_region): Likewise. (vect_slp_bb):Work on the entire BB. * tree-vectorizer.h (vect_analyze_data_ref_accesses): Add new argument. (vect_find_stmt_data_reference): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-38.c: Adjust pattern as now we only process a single vectorization and now 2 partial. * gcc.dg/vect/bb-slp-45.c: New test.
2020-08-06vect/rs6000: Support vector with length cost modelingKewen Lin1-5/+83
This patch is to add the cost modeling for vector with length, it mainly follows what we generate for vector with length in functions vect_set_loop_controls_directly and vect_gen_len at the worst case. For Power, the length is expected to be in bits 0-7 (high bits), we have to model the cost of shifting bits, which is implemented in adjust_vect_cost_per_loop. Bootstrapped/regtested on powerpc64le-linux-gnu (P9) with explicit param vect-partial-vector-usage=1. gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_adjust_vect_cost_per_loop): New function. (rs6000_finish_cost): Call rs6000_adjust_vect_cost_per_loop. * tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost modeling for vector with length. (vect_rgroup_iv_might_wrap_p): New function, factored out from... * tree-vect-loop-manip.c (vect_set_loop_controls_directly): ...this. Update function comment. * tree-vect-stmts.c (vect_gen_len): Update function comment. * tree-vectorizer.h (vect_rgroup_iv_might_wrap_p): New declare.
2020-07-31vect: Don't consider branch costs if no peeled iterationsKewen Lin1-7/+9
Currently vectorizer cost modeling counts branch taken costs for prologue and epilogue if the number of iterations is unknown. But it isn't sensible if there are no peeled iterations. This patch is to guard them under peel_iters_prologue > 0 or peel_iters_epilogue > 0. Bootstrapped/regtested on powerpc64le-linux-gnu and aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-loop.c (vect_get_known_peeling_cost): Don't consider branch taken costs for prologue and epilogue if they don't exist. (vect_estimate_min_profitable_iters): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/cost_model_2.c: Adjust due to cost model change.
2020-07-27vect: Refactor peel_iters_{pro,epi}logue cost modelingKewen Lin1-125/+142
This patch is to refactor the existing peel_iters_prologue and peel_iters_epilogue cost model handlings, by following the structure below suggested by Richard Sandiford: - calculate peel_iters_prologue - calculate peel_iters_epilogue - add costs associated with peel_iters_prologue - add costs associated with peel_iters_epilogue - add costs related to branch taken/not_taken. Bootstrapped/regtested on aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-loop.c (vect_get_known_peeling_cost): Factor out some code to determine peel_iters_epilogue to... (vect_get_peel_iters_epilogue): ...this new function. (vect_estimate_min_profitable_iters): Refactor cost calculation on peel_iters_prologue and peel_iters_epilogue.
2020-07-19vect: Support length-based partial vectors approachKewen Lin1-6/+211
Power9 supports vector load/store instruction lxvl/stxvl which allow us to operate partial vectors with one specific length. This patch extends some of current mask-based partial vectors support code for length-based approach, also adds some length specific support code. So far it assumes that we can only have one partial vectors approach at the same time, it will disable to use partial vectors if both approaches co-exist. Like the description of optab len_load/len_store, the length-based approach can have two flavors, one is length in bytes, the other is length in lanes. This patch is mainly implemented and tested for length in bytes, but as Richard S. suggested, most of code has considered both flavors. This also introduces one parameter vect-partial-vector-usage allow users to control when the loop vectorizer considers using partial vectors as an alternative to falling back to scalar code. gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_option_override_internal): Set param_vect_partial_vector_usage to 0 explicitly. * doc/invoke.texi (vect-partial-vector-usage): Document new option. * optabs-query.c (get_len_load_store_mode): New function. * optabs-query.h (get_len_load_store_mode): New declare. * params.opt (vect-partial-vector-usage): New. * tree-vect-loop-manip.c (vect_set_loop_controls_directly): Add the handlings for vectorization using length-based partial vectors, call vect_gen_len for length generation, and rename some variables with items instead of scalars. (vect_set_loop_condition_partial_vectors): Add the handlings for vectorization using length-based partial vectors. (vect_do_peeling): Allow remaining eiters less than epilogue vf for LOOP_VINFO_USING_PARTIAL_VECTORS_P. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Init epil_using_partial_vectors_p. (_loop_vec_info::~_loop_vec_info): Call release_vec_loop_controls for lengths destruction. (vect_verify_loop_lens): New function. (vect_analyze_loop): Add handlings for epilogue of loop when it's marked to use vectorization using partial vectors. (vect_analyze_loop_2): Add the check to allow only one vectorization approach using partial vectorization at the same time. Check param vect-partial-vector-usage for partial vectors decision. Mark LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P if the epilogue is considerable to use partial vectors. Call release_vec_loop_controls for lengths destruction. (vect_estimate_min_profitable_iters): Adjust for loop vectorization using length-based partial vectors. (vect_record_loop_mask): Init factor to 1 for vectorization using mask-based partial vectors. (vect_record_loop_len): New function. (vect_get_loop_len): Likewise. * tree-vect-stmts.c (check_load_store_for_partial_vectors): Add checks for vectorization using length-based partial vectors. Factor some code to lambda function get_valid_nvectors. (vectorizable_store): Add handlings when using length-based partial vectors. (vectorizable_load): Likewise. (vect_gen_len): New function. * tree-vectorizer.h (struct rgroup_controls): Add field factor mainly for length-based partial vectors. (vec_loop_lens): New typedef. (_loop_vec_info): Add lens and epil_using_partial_vectors_p. (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P): New macro. (LOOP_VINFO_LENS): Likewise. (LOOP_VINFO_FULLY_WITH_LENGTH_P): Likewise. (vect_record_loop_len): New declare. (vect_get_loop_len): Likewise. (vect_gen_len): Likewise.
2020-07-09vect: Use adjusted niters by considering peeling prologueKewen Lin1-1/+7
This patch is derived from the review of vector with length patch series. I relaxed the guard on LOOP_VINFO_PEELING_FOR_ALIGNMENT for vector with length as Richard S.'s suggestion, then encountered one failure from case gcc.dg/vect/vect-ifcvt-11.c with param vect-partial-vector-usage=2 enablement run. The root cause is that we still use the original niters for the loop body vectorization, it leads the access to go out of bound, instead we should use LOOP_VINFO_NITERS which has been adjusted in vect_do_peeling by considering the peeling number for prologue. Bootstrapped/regtested on aarch64-linux-gnu and powerpc64le-linux-gnu. gcc/ChangeLog: * tree-vect-loop.c (vect_transform_loop): Use LOOP_VINFO_NITERS which is adjusted by considering peeled prologue for non vect_use_loop_mask_for_alignment_p cases.
2020-07-09remove premature vect_verify_datarefs_alignmentRichard Biener1-2/+0
This followup removes vect_verify_datarefs_alignment and its premature cancellation of vectorization leaving the actual decision whether alignment is supported to the functions deciding whether we can vectorize a load or store. 2020-07-08 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vect_verify_datarefs_alignment): Remove. (vect_slp_analyze_and_verify_instance_alignment): Rename to ... (vect_slp_analyze_instance_alignment): ... this. * tree-vect-data-refs.c (verify_data_ref_alignment): Remove. (vect_verify_datarefs_alignment): Likewise. (vect_enhance_data_refs_alignment): Do not call vect_verify_datarefs_alignment. (vect_slp_analyze_node_alignment): Rename from vect_slp_analyze_and_verify_node_alignment and do not call verify_data_ref_alignment. (vect_slp_analyze_instance_alignment): Rename from vect_slp_analyze_and_verify_instance_alignment. * tree-vect-stmts.c (vectorizable_store): Dump when we vectorize an unaligned access. (vectorizable_load): Likewise. * tree-vect-loop.c (vect_analyze_loop_2): Do not call vect_verify_datarefs_alignment. * tree-vect-slp.c (vect_slp_analyze_bb_1): Adjust. * gcc.dg/vect/bb-slp-10.c: Adjust. * gcc.dg/vect/slp-45.c: Likewise. * gcc.dg/vect/vect-109.c: Likewise.
2020-07-09vect/testsuite: Adjust dumping for fully masking decisionKewen Lin1-3/+3
As Richard S. suggested in the review of vector with length patch series, we can use one message on "partial vectors" instead of "fully with masking". This patch is to update the dumping string and related test cases. Bootstrapped/regtested on aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-loop.c (vect_analyze_loop_2): Update dumping string for fully masking to be more common. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/clastb_1.c: Update dumping string. * gcc.target/aarch64/sve/clastb_2.c: Likewise. * gcc.target/aarch64/sve/clastb_3.c: Likewise. * gcc.target/aarch64/sve/clastb_4.c: Likewise. * gcc.target/aarch64/sve/clastb_5.c: Likewise. * gcc.target/aarch64/sve/clastb_6.c: Likewise. * gcc.target/aarch64/sve/clastb_7.c: Likewise.
2020-06-26tree-optimization/95897 - fix fold-left SLP reduction insert placeRichard Biener1-2/+1
This fixes computation of the insertion place for fold-left SLP reductions where the PHIs do not have vectorized stmts. The SLP representation isn't perfect here thus the following. 2020-06-26 Richard Biener <rguenther@suse.de> PR tree-optimization/95897 * tree-vectorizer.h (vectorizable_induction): Remove unused gimple_stmt_iterator * parameter. * tree-vect-loop.c (vectorizable_induction): Likewise. (vect_analyze_loop_operations): Adjust. * tree-vect-stmts.c (vect_analyze_stmt): Likewise. (vect_transform_stmt): Likewise. * tree-vect-slp.c (vect_schedule_slp_instance): Adjust for fold-left reductions, clarify existing reduction case. * gcc.dg/vect/pr95897.c: New testcase.
2020-06-15vect: Use LOOP_VINFO_DATAREFS and LOOP_VINFO_DDRS consistentlyFei Yang1-2/+2
Minor code refactorings in tree-vect-data-refs.c and tree-vect-loop.c. Use LOOP_VINFO_DATAREFS and LOOP_VINFO_DDRS when possible and rename several parameters to make code more consistent. 2020-06-13 Felix Yang <felix.yang@huawei.com> gcc/ * tree-vect-data-refs.c (vect_verify_datarefs_alignment): Rename parameter to loop_vinfo and update uses. Use LOOP_VINFO_DATAREFS when possible. (vect_analyze_data_refs_alignment): Likewise, and use LOOP_VINFO_DDRS when possible. * tree-vect-loop.c (vect_dissolve_slp_only_groups): Use LOOP_VINFO_DATAREFS when possible. (update_epilogue_loop_vinfo): Likewise.
2020-06-12vect: Factor out and rename some functions/macrosKewen Lin1-44/+72
Power supports vector memory access with length (in bytes) instructions. Like existing fully masking for SVE, it is another approach to vectorize the loop using partially-populated vectors. As Richard Sandiford suggested, we should share the codes in approaches with partial vectors if possible. This patch is to: 1) factor out two functions: - vect_min_prec_for_max_niters - vect_known_niters_smaller_than_vf. 2) rename four functions: - vect_iv_limit_for_full_masking - check_load_store_masking - vect_set_loop_condition_masked - vect_set_loop_condition_unmasked 3) rename macros LOOP_VINFO_MASK_COMPARE_TYPE and LOOP_VINFO_MASK_IV_TYPE. Bootstrapped/regtested on aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-loop-manip.c (vect_set_loop_controls_directly): Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE. (vect_set_loop_condition_masked): Renamed to ... (vect_set_loop_condition_partial_vectors): ... this. Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors. (vect_set_loop_condition_unmasked): Renamed to ... (vect_set_loop_condition_normal): ... this. (vect_set_loop_condition): Rename vect_set_loop_condition_unmasked to vect_set_loop_condition_normal. Rename vect_set_loop_condition_masked to vect_set_loop_condition_partial_vectors. (vect_prepare_for_masked_peels): Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. * tree-vect-loop.c (vect_known_niters_smaller_than_vf): New, factored out from ... (vect_analyze_loop_costing): ... this. (_loop_vec_info::_loop_vec_info): Rename mask_compare_type to compare_type. (vect_min_prec_for_max_niters): New, factored out from ... (vect_verify_full_masking): ... this. Rename vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors. Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE. (vectorizable_reduction): Update some dumpings with partial vectors instead of fully-masked. (vectorizable_live_operation): Likewise. (vect_iv_limit_for_full_masking): Renamed to ... (vect_iv_limit_for_partial_vectors): ... this. * tree-vect-stmts.c (check_load_store_masking): Renamed to ... (check_load_store_for_partial_vectors): ... this. Update some dumpings with partial vectors instead of fully-masked. (vectorizable_store): Rename check_load_store_masking to check_load_store_for_partial_vectors. (vectorizable_load): Likewise. * tree-vectorizer.h (LOOP_VINFO_MASK_COMPARE_TYPE): Renamed to ... (LOOP_VINFO_RGROUP_COMPARE_TYPE): ... this. (LOOP_VINFO_MASK_IV_TYPE): Renamed to ... (LOOP_VINFO_RGROUP_IV_TYPE): ... this. (vect_iv_limit_for_full_masking): Renamed to ... (vect_iv_limit_for_partial_vectors): this. (_loop_vec_info): Rename mask_compare_type to rgroup_compare_type. Rename iv_type to rgroup_iv_type.
2020-06-11vect: Rename things related to rgroup_masksKewen Lin1-22/+22
Power supports vector memory access with length (in bytes) instructions. Like existing fully masking for SVE, it is another approach to vectorize the loop using partially-populated vectors. As Richard Sandiford pointed out, we can rename the rgroup struct rgroup_masks to rgroup_controls, rename its members mask_type to type, masks to controls to be more generic. Besides, this patch also renames some functions like vect_set_loop_mask to vect_set_loop_control, release_vec_loop_masks to release_vec_loop_controls, vect_set_loop_masks_directly to vect_set_loop_controls_directly. Bootstrapped/regtested on aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-loop-manip.c (vect_set_loop_mask): Renamed to ... (vect_set_loop_control): ... this. (vect_maybe_permute_loop_masks): Rename rgroup_masks related things. (vect_set_loop_masks_directly): Renamed to ... (vect_set_loop_controls_directly): ... this. Also rename some variables with ctrl instead of mask. Rename vect_set_loop_mask to vect_set_loop_control. (vect_set_loop_condition_masked): Rename rgroup_masks related things. Also rename some variables with ctrl instead of mask. * tree-vect-loop.c (release_vec_loop_masks): Renamed to ... (release_vec_loop_controls): ... this. Rename rgroup_masks related things. (_loop_vec_info::~_loop_vec_info): Rename release_vec_loop_masks to release_vec_loop_controls. (can_produce_all_loop_masks_p): Rename rgroup_masks related things. (vect_get_max_nscalars_per_iter): Likewise. (vect_estimate_min_profitable_iters): Likewise. (vect_record_loop_mask): Likewise. (vect_get_loop_mask): Likewise. * tree-vectorizer.h (struct rgroup_masks): Renamed to ... (struct rgroup_controls): ... this. Also rename mask_type to type and rename masks to controls.
2020-06-11vect: Rename fully_masked_p to using_partial_vectors_pKewen Lin1-15/+17
Power supports vector memory access with length (in bytes) instructions. Like existing fully masking for SVE, it is another approach to vectorize the loop using partially-populated vectors. As Richard Sandiford suggested, this patch is to update the existing fully_masked_p field to using_partial_vectors_p. Introduce one macro LOOP_VINFO_USING_PARTIAL_VECTORS_P for partial vectorization checking usage, update the LOOP_VINFO_FULLY_MASKED_P with LOOP_VINFO_USING_PARTIAL_VECTORS_P && !masks.is_empty() and still use it for mask-based partial vectors approach specific checks. Bootstrapped/regtested on aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-loop-manip.c (vect_set_loop_condition): Rename LOOP_VINFO_FULLY_MASKED_P to LOOP_VINFO_USING_PARTIAL_VECTORS_P. (vect_gen_vector_loop_niters): Likewise. (vect_do_peeling): Likewise. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Rename fully_masked_p to using_partial_vectors_p. (vect_analyze_loop_costing): Rename LOOP_VINFO_FULLY_MASKED_P to LOOP_VINFO_USING_PARTIAL_VECTORS_P. (determine_peel_for_niter): Likewise. (vect_estimate_min_profitable_iters): Likewise. (vect_transform_loop): Likewise. * tree-vectorizer.h (LOOP_VINFO_FULLY_MASKED_P): Updated. (LOOP_VINFO_USING_PARTIAL_VECTORS_P): New macro.