Age | Commit message (Collapse) | Author | Files | Lines |
|
This avoids breaking LC SSA when SLP codegen pulled an out-of-loop
def into a loop when merging with in-loop defs for an external def.
2020-11-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/98064
* tree-vect-loop.c (vectorizable_live_operation): Avoid
breaking LC SSA for BB vectorization.
* g++.dg/vect/pr98064.cc: New testcase.
|
|
Currently we have three vector cost models: cheap, dynamic and
unlimited. -O2 -ftree-vectorize uses “cheap” by default, but that's
still relatively aggressive about peeling and aliasing checks,
and can lead to significant code size growth.
This patch adds an even more conservative choice, which for lack of
imagination I've called “very cheap”. It only allows vectorisation
if the vector code entirely replaces the scalar code. It also
requires one iteration of the vector loop to pay for itself,
regardless of how often the loop iterates. (If the vector loop
needs multiple iterations to be beneficial then things are
probably too close to call, and the conservative thing would
be to stick with the scalar code.)
The idea is that this should be suitable for -O2, although the patch
doesn't change any defaults itself.
I tested this by building and running a bunch of workloads for SVE,
with three options:
(1) -O2
(2) -O2 -ftree-vectorize -fvect-cost-model=very-cheap
(3) -O2 -ftree-vectorize [-fvect-cost-model=cheap]
All three builds used the default -msve-vector-bits=scalable and
ran with the minimum vector length of 128 bits, which should give
a worst-case bound for the performance impact.
The workloads included a mixture of microbenchmarks and full
applications. Because it's quite an eclectic mix, there's not
much point giving exact figures. The aim was more to get a general
impression.
Code size growth with (2) was much lower than with (3). Only a
handful of tests increased by more than 5%, and all of them were
microbenchmarks.
In terms of performance, (2) was significantly faster than (1)
on microbenchmarks (as expected) but also on some full apps.
Again, performance only regressed on a handful of tests.
As expected, the performance of (3) vs. (1) and (3) vs. (2) is more
of a mixed bag. There are several significant improvements with (3)
over (2), but also some (smaller) regressions. That seems to be in
line with -O2 -ftree-vectorize being a kind of -O2.5.
The patch reorders vect_cost_model so that values are in order
of increasing aggressiveness, which makes it possible to use
range checks. The value 0 still represents “unlimited”,
so “if (flag_vect_cost_model)” is still a meaningful check.
gcc/
* doc/invoke.texi (-fvect-cost-model): Add a very-cheap model.
* common.opt (fvect-cost-model=): Add very-cheap as a possible option.
(fsimd-cost-model=): Likewise.
(vect_cost_model): Add very-cheap.
* flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP.
Put the values in order of increasing aggressiveness.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use
range checks when comparing against VECT_COST_MODEL_CHEAP.
(vect_prune_runtime_alias_test_list): Do not allow any alias
checks for the very-cheap cost model.
* tree-vect-loop.c (vect_analyze_loop_costing): Do not allow
any peeling for the very-cheap cost model. Also require one
iteration of the vector loop to pay for itself.
gcc/testsuite/
* gcc.dg/vect/vect-cost-model-1.c: New test.
* gcc.dg/vect/vect-cost-model-2.c: Likewise.
* gcc.dg/vect/vect-cost-model-3.c: Likewise.
* gcc.dg/vect/vect-cost-model-4.c: Likewise.
* gcc.dg/vect/vect-cost-model-5.c: Likewise.
* gcc.dg/vect/vect-cost-model-6.c: Likewise.
|
|
This makes vectorization properly assign vector types to PHI
nodes that copy from externals on loop exit edges.
2020-11-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/97886
* tree-vect-loop.c (vectorizable_lc_phi): Properly assign
vector types to invariants for SLP.
|
|
This delays filling SLP_INSTANCE_LOADS.
2020-11-16 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_gather_slp_loads): Declare.
* tree-vect-loop.c (vect_analyze_loop_2): Call
vect_gather_slp_loads.
* tree-vect-slp.c (vect_build_slp_instance): Do not gather
SLP loads here.
(vect_gather_slp_loads): Remove wrapper, new function.
(vect_slp_analyze_bb_1): Call it.
|
|
We're stripping conversions off access functions of inductions and
thus the step can be of different sign. Fix bogus step CTORs by
converting the elements rather than the whole vector.
2020-11-16 Richard Biener <rguenther@suse.de>
PR tree-optimization/97835
* tree-vect-loop.c (vectorizable_induction): Convert step
scalars rather than step vector.
* gcc.dg/vect/pr97835.c: New testcase.
|
|
This makes sure we reject reduction paths with a live stmt that
is not the last one altering the value. This is because we do not
handle this in the epilogue unless there's a scalar epilogue loop.
2020-11-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/97760
* tree-vect-loop.c (check_reduction_path): Reject
reduction paths we do not handle in epilogue generation.
* gcc.dg/vect/pr97760.c: New testcase.
|
|
This fixes updating of the step vectors when filling up to group_size.
2020-11-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/97753
* tree-vect-loop.c (vectorizable_induction): Fill vec_steps
when CSEing inside the group.
* gcc.dg/vect/pr97753.c: New testcase.
|
|
This PR exposes two issues - one that the vector builder treats
&x as eligible for VECTOR_CST elements and one that SLP induction
vectorization forgets to convert init elements to the vector
component type which makes a difference for pointer vs. integer.
2020-11-06 Richard Biener <rguenther@suse.de>
PR tree-optimization/97732
* tree-vect-loop.c (vectorizable_induction): Convert the
init elements to the vector component type.
* gimple-fold.c (gimple_build_vector): Use CONSTANT_CLASS_P
rather than TREE_CONSTANT to determine if elements are
eligible for VECTOR_CSTs.
* gcc.dg/vect/bb-slp-pr97732.c: New testcase.
|
|
This patch stores the SLP instance kind in the SLP instance so that we can use
it later when detecting load/store lanes support.
This also changes the load/store lane support check to only check if the SLP
kind is a store. This means that in order for the load/lanes to work all
instances must be of kind store.
gcc/ChangeLog:
* tree-vect-loop.c (vect_analyze_loop_2): Check kind.
* tree-vect-slp.c (vect_build_slp_instance): New.
(enum slp_instance_kind): Move to...
* tree-vectorizer.h (enum slp_instance_kind): .. Here
(SLP_INSTANCE_KIND): New.
|
|
This moves the code that checks for load/store lanes further in the pipeline and
places it after slp_optimize. This would allow us to perform optimizations on
the SLP tree and only bail out if we really have a permute.
With this change it allows us to handle permutes such as {1,1,1,1} which should
be handled by a load and replicate.
This change however makes it all or nothing. Either all instances can be handled
or none at all. This is why some of the test cases have been adjusted.
gcc/ChangeLog:
* tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store lanes
check to ...
* tree-vect-loop.c (vect_analyze_loop_2): ..Here
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-11b.c: Update output scan.
* gcc.dg/vect/slp-perm-6.c: Likewise.
|
|
I forgot to cost vectorized PHIs. Scalar PHIs are just costed
as scalar_stmt so the following costs vector PHIs as vector_stmt.
2020-11-04 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vectorizable_phi): Adjust prototype.
* tree-vect-stmts.c (vect_transform_stmt): Adjust.
(vect_analyze_stmt): Pass cost_vec to vectorizable_phi.
* tree-vect-loop.c (vectorizable_phi): Do costing.
|
|
This properly sets the abnormal flag when vectorizing live lanes
when the original scalar was live across an abnormal edge.
2020-11-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/97709
* tree-vect-loop.c (vectorizable_live_operation): Set
SSA_NAME_OCCURS_IN_ABNORMAL_PHI when necessary.
* gcc.dg/vect/bb-slp-pr97709.c: New testcase.
|
|
This re-instantiates the previously removed CSE, fixing the
FAIL of gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c
It turns out the previous approach still works.
2020-11-04 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vectorizable_induction): Re-instantiate
previously removed CSE of SLP IVs.
|
|
This adds SLP vectorization of nested inductions.
2020-11-03 Richard Biener <rguenther@suse.de>
PR tree-optimization/80928
* tree-vect-loop.c (vectorizable_induction): SLP vectorize
nested inductions.
* gcc.dg/vect/vect-outer-slp-2.c: New testcase.
* gcc.dg/vect/vect-outer-slp-3.c: Likewise.
|
|
This restores not tracking SLP nodes for induction initial values
in not nested context because this interferes with peeling and
epilogue vectorization.
2020-11-03 Richard Biener <rguenther@suse.de>
PR tree-optimization/97678
* tree-vect-slp.c (vect_build_slp_tree_2): Do not track
the initial values of inductions when not nested.
* tree-vect-loop.c (vectorizable_induction): Look at
PHI node initial values again for SLP and not nested
inductions. Handle LOOP_VINFO_MASK_SKIP_NITERS and cost
invariants.
* gcc.dg/vect/pr97678.c: New testcase.
|
|
This rewrites SLP induction vectorization to handle different
inductions in the different SLP lanes. It also changes SLP
build to represent the initial value (but not the cycle) so
it can be enhanced to handle outer loop vectorization later.
Note this FAILs gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c
because it removes one CSE optimization that no longer works
with non-uniform initial value and step. I'll see to recover
from this after outer loop vectorization of inductions works.
It might be a bit friendlier to variable-size vectors now
but then we're now building the step vector from scalars ...
2020-11-02 Richard Biener <rguenther@suse.de>
* tree.h (build_real_from_wide): Declare.
* tree.c (build_real_from_wide): New function.
* tree-vect-slp.c (vect_build_slp_tree_2): Remove
restriction on induction vectorization, represent
the initial value.
* tree-vect-loop.c (vect_model_induction_cost): Inline ...
(vectorizable_induction): ... here. Rewrite SLP
code generation.
* gcc.dg/vect/slp-49.c: New testcase.
|
|
This makes sure to compute the vector type for invariant SLP children
of nested cycles.
2020-11-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/97558
* tree-vect-loop.c (vectorizable_reduction): For nested SLP
cycles compute invariant operands vector type.
* gcc.dg/vect/pr97558-2.c: New testcase.
|
|
This avoids analyzing reductions that are not relevant (thus dead)
which eventually will lead into crashes because the participating
stmts meta is not analyzed. For this to work the patch also
properly removes reduction groups that are not uniformly recognized
as patterns.
2020-11-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/97558
* tree-vect-loop.c (vect_fixup_scalar_cycles_with_patterns):
Check for any mismatch in pattern vs. non-pattern and dissolve
the group if there is one.
* tree-vect-slp.c (vect_analyze_slp_instance): Avoid
analyzing not relevant reductions.
(vect_analyze_slp): Avoid analyzing not relevant reduction
groups.
* gcc.dg/vect/pr97558.c: New testcase.
|
|
This fixes some memleaks, one older, one recently introduced.
2020-10-29 Richard Biener <rguenther@suse.de>
* tree-ssa-pre.c (compute_avail): Free operands consistently.
* tree-vect-loop.c (vectorizable_phi): Make sure all operand
defs vectors are released.
|
|
This makes SLP discovery detect backedges by seeding the bst_map with
the node to be analyzed so it can be picked up from recursive calls.
This removes the need to discover backedges in a separate walk.
This enables SLP build to handle PHI nodes in full, continuing
the SLP build to non-backedges. For loop vectorization this
enables outer loop vectorization of nested SLP cycles and for
BB vectorization this enables vectorization of PHIs at CFG merges.
It also turns code generation into a SCC discovery walk to handle
irreducible regions and nodes only reachable via backedges where
we now also fill in vectorized backedge defs.
This requires sanitizing the SLP tree for SLP reduction chains even
more, manually filling the backedge SLP def.
This also exposes the fact that CFG copying (and edge splitting
until I fixed that) ends up with different edge order in the
copy which doesn't play well with the desired 1:1 mapping of
SLP PHI node children and edges for epilogue vectorization.
I've tried to fixup CFG copying here but this really looks
like a dead (or expensive) end there so I've done fixup in
slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
we can run into.
There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
not sure it's possible to eliminate them all this stage1 so the
patch has quite some checks for this case all over the place.
Bootstrapped and tested on x86_64-unknown-linux-gnu. SPEC CPU 2017
and SPEC CPU 2006 successfully built and tested.
2020-10-27 Richard Biener <rguenther@suse.de>
* gimple.h (gimple_expr_type): For PHIs return the type
of the result.
* tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
Make sure edge order into copied loop headers line up with the
originals.
* tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
loops with SLP.
(vectorizable_phi): New function.
(vectorizable_live_operation): For BB vectorization compute insert
location here.
* tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
SLP_TREE_CHILDREN entries.
(vect_create_new_slp_node): Add overloads with pre-existing node
argument.
(vect_print_slp_graph): Likewise.
(vect_mark_slp_stmts): Likewise.
(vect_mark_slp_stmts_relevant): Likewise.
(vect_gather_slp_loads): Likewise.
(vect_optimize_slp): Likewise.
(vect_slp_analyze_node_operations): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_remove_slp_scalar_calls): Likewise.
(vect_get_and_check_slp_defs): Handle PHIs.
(vect_build_slp_tree_1): Handle PHIs.
(vect_build_slp_tree_2): Continue SLP build, following PHI
arguments. Fix memory leak.
(vect_build_slp_tree): Put stub node into the hash-map so
we can discover cycles directly.
(vect_build_slp_instance): Set the backedge SLP def for
reduction chains.
(vect_analyze_slp_backedges): Remove.
(vect_analyze_slp): Do not call it.
(vect_slp_convert_to_external): Release SLP_TREE_LOAD_PERMUTATION.
(vect_slp_analyze_node_operations): Handle stray failed
backedge defs by failing.
(vect_slp_build_vertices): Adjust leaf condition.
(vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
hash-set to handle cycles.
(vect_slp_analyze_operations): Adjust.
(vect_bb_partition_graph_r): Likewise.
(vect_slp_function): Adjust split condition to allow CFG
merges.
(vect_schedule_slp_instance): Rename to ...
(vect_schedule_slp_node): ... this. Move DFS walk to ...
(vect_schedule_scc): ... this new function.
(vect_schedule_slp): Call it. Remove ad-hoc vectorized
backedge fill code.
* tree-vect-stmts.c (vect_analyze_stmt): Call
vectorizable_phi.
(vect_transform_stmt): Likewise.
(vect_is_simple_use): Handle vect_backedge_def.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Only
set loop header PHIs to vect_unknown_def_type for loop
vectorization.
* tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def.
(enum stmt_vec_info_type): Add phi_info_type.
(vectorizable_phi): Declare.
* gcc.dg/vect/bb-slp-54.c: New test.
* gcc.dg/vect/bb-slp-55.c: Likewise.
* gcc.dg/vect/bb-slp-56.c: Likewise.
* gcc.dg/vect/bb-slp-57.c: Likewise.
* gcc.dg/vect/bb-slp-58.c: Likewise.
* gcc.dg/vect/bb-slp-59.c: Likewise.
* gcc.dg/vect/bb-slp-60.c: Likewise.
* gcc.dg/vect/bb-slp-61.c: Likewise.
* gcc.dg/vect/bb-slp-62.c: Likewise.
* gcc.dg/vect/bb-slp-63.c: Likewise.
* gcc.dg/vect/bb-slp-64.c: Likewise.
* gcc.dg/vect/bb-slp-65.c: Likewise.
* gcc.dg/vect/bb-slp-66.c: Likewise.
* gcc.dg/vect/vect-outer-slp-1.c: Likewise.
* gfortran.dg/vect/O3-bb-slp-1.f: Likewise.
* gfortran.dg/vect/O3-bb-slp-2.f: Likewise.
* g++.dg/vect/simd-11.cc: Likewise.
|
|
Remove one redundant LOOP_VINFO_FULLY_MASKED_P condition check
which will be checked in vect_use_loop_mask_for_alignment_p.
gcc/ChangeLog:
* tree-vect-loop.c (vect_transform_loop): Remove the redundant
LOOP_VINFO_FULLY_MASKED_P check.
|
|
We were using the wrong loop to figure the latch arg of a
double-reduction PHI. Which isn't a problem in case ->dest_idx
match up with the outer loop edges - but that's of course not guaranteed.
2020-10-20 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vectorizable_reduction): Use the correct
loops latch edge for the PHI arg lookup.
|
|
This fixes the case where the insertion iterator for the live stmt
is the end of a BB by adjusting the dominance query to the definition
of the def we're substituting.
2020-10-15 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vectorizable_live_operation): Adjust
dominance query.
* gcc.dg/vect/bb-slp-52.c: New testcase.
|
|
This fixes leaks discovered checking whether I introduced new ones
with the last vectorizer changes.
2020-10-09 Richard Biener <rguenther@suse.de>
* cgraphunit.c (expand_all_functions): Free tp_first_run_order.
* ipa-modref.c (pass_ipa_modref::execute): Free order.
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Free
loop body.
* tree-vect-data-refs.c (vect_find_stmt_data_reference): Free
data references upon failure.
* tree-vect-loop.c (update_epilogue_loop_vinfo): Free BBs
array of the original loop.
* tree-vect-slp.c (vect_slp_bbs): Use an auto_vec for
dataref_groups to release its memory.
|
|
The following moves an ad-hoc attempt at discovering the SLP node
for a stmt to the place where we can find it in lock-step when
we find the stmt itself.
2020-09-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/97241
* tree-vect-loop.c (vectorizable_reduction): Move finding
the SLP node for the reduction stmt to a better place.
* gcc.dg/vect/pr97241.c: New testcase.
|
|
This patch fixes the fallout that Kewen reported on Power after
the recent change to avoid unnecessary use of partial vectors.
As Kewen said, the problem is that vect_analyze_loop_2 doesn't
know how many epilogue iterations there will be, and so it
cannot make a final decision about whether the number of
iterations forces an epilogue loop to use partial vectors.
This is similar to the current situation for peeling: we don't know
during initial analysis whether an epilogue loop will itself require
peeling. Instead we decide that during vect_do_peeling, where the
final number of epilogue loop iterations is known.
The patch takes a similar approach for the decision about whether
to use partial vectors. As the comments in the patch say, the
idea is that vect_analyze_loop_2 should make peeling and partial-
vector decisions based on the assumption that the loop_vinfo will
be used as the main loop, while vect_do_peeling should make them
in the knowledge that the loop_vinfo will be used as an epilogue loop.
This allows the same analysis to be used for both cases, which we
rely on for implementing VECT_COMPARE_COSTS; see the big comment
in vect_analyze_loop for details.
I hope the patch makes the (mostly preexisting) structure a bit
more obvious. It isn't what anyone would design from scratch,
but that's the nature of working with a mature vector framework.
Arranging things this way means that vect_verify_full_masking
and vect_verify_loop_lens now become part of the “can” rather
than “will” test for partial vectors.
Also, while splitting out the logic that handles epilogues with
constant iterations, I added a check to make sure that we don't
try to use partial vectors to vectorise a single-scalar loop.
This required some changes to the Power tests.
gcc/
* tree-vectorizer.h (determine_peel_for_niter): Delete in favor of...
(vect_determine_partial_vectors_and_peeling): ...this new function.
* tree-vect-loop-manip.c (vect_update_epilogue_niters): New function.
Reject using vector epilogue loops for single iterations. Install
the constant number of epilogue loop iterations in the associated
loop_vinfo. Rely on vect_determine_partial_vectors_and_peeling
to do the main part of the test.
(vect_do_peeling): Use vect_update_epilogue_niters to handle
epilogue loops with a known number of iterations. Skip recomputing
the number of iterations later in that case. Otherwise, use
vect_determine_partial_vectors_and_peeling to decide whether the
epilogue loop needs to use partial vectors or peeling.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the
default can_use_partial_vectors_p to false if partial-vector-usage=0.
(determine_peel_for_niter): Remove in favor of...
(vect_determine_partial_vectors_and_peeling): ...this new function,
split out from...
(vect_analyze_loop_2): ...here. Reflect the vect_verify_full_masking
and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P
rather than USING_PARTIAL_VECTORS_P.
gcc/testsuite/
* gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the
single-iteration epilogues of the 64-bit loops to be vectorized.
* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
|
|
The condition we're expecting to eventually run into isn't fully
captured by checking for CTORs, instead we can also run into the
CTOR element conversion.
2020-09-23 Richard Biener <rguenther@suse.de>
PR tree-optimization/97173
* tree-vect-loop.c (vectorizable_live_operation): Extend
assert to also conver element conversions.
* gcc.dg/vect/pr97173.c: New testcase.
|
|
This fixes a typo introduced with the last change and not noticed
because those vectorizer access macros are not type safe ...
2020-09-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/97095
* tree-vect-loop.c (vectorizable_live_operation): Get
the SLP vector type from the correct object.
* gfortran.dg/pr97095.f: New testcase.
|
|
gcc/ChangeLog
2020-09-09 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_need_peeling_or_partial_vectors_p): New
function.
(vect_analyze_loop_2): Make use of it not to select partial
vectors if no peel is required.
(determine_peel_for_niter): Move out some logic into
'vect_need_peeling_or_partial_vectors_p'.
gcc/testsuite/ChangeLog
2020-09-09 Andrea Corallo <andrea.corallo@arm.com>
* gcc.target/aarch64/sve/cost_model_10.c: New test.
* gcc.target/aarch64/sve/clastb_8.c: Update test for new
vectorization strategy.
* gcc.target/aarch64/sve/cost_model_5.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
|
|
This removes STMT_VINFO_NUM_SLP_USES by pushing the setting of
the shared stmt_vec_info vector type to where we actually need it
which is alignment analysis and vectorizable_* analysis (where
we could eventually elide it for non-load/store operations).
In particular "uses" in the cache and in disqualified SLP
subgraphs should no longer provide conflicting vector types
this way.
2020-09-16 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_stmt_vec_info::num_slp_uses): Remove.
(STMT_VINFO_NUM_SLP_USES): Likewise.
(vect_free_slp_instance): Adjust.
(vect_update_shared_vectype): Declare.
* tree-vectorizer.c (vec_info::~vec_info): Adjust.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
(vectorizable_live_operation): Use vector type from
SLP_TREE_REPRESENTATIVE.
(vect_transform_loop): Adjust.
* tree-vect-data-refs.c (vect_slp_analyze_node_alignment):
Set the shared vector type.
* tree-vect-slp.c (vect_free_slp_tree): Remove final_p
parameter, remove STMT_VINFO_NUM_SLP_USES updating.
(vect_free_slp_instance): Adjust.
(vect_create_new_slp_node): Remove STMT_VINFO_NUM_SLP_USES
updating.
(vect_update_shared_vectype): Always compare with the
present vector type, update if NULL.
(vect_build_slp_tree_1): Do not update the shared vector
type here.
(vect_build_slp_tree_2): Adjust.
(slp_copy_subtree): Likewise.
(vect_attempt_slp_rearrange_stmts): Likewise.
(vect_analyze_slp_instance): Likewise.
(vect_analyze_slp): Likewise.
(vect_slp_analyze_node_operations_1): Update the shared
vector type.
(vect_slp_analyze_operations): Adjust.
(vect_slp_analyze_bb_1): Likewise.
|
|
This tries to improve BB vectorization dumps by providing more
precise locations. Currently the vect_location is simply the
very last stmt in a basic-block that has a location. So for
double a[4], b[4];
int x[4], y[4];
void foo()
{
a[0] = b[0]; // line 5
a[1] = b[1];
a[2] = b[2];
a[3] = b[3];
x[0] = y[0]; // line 9
x[1] = y[1];
x[2] = y[2];
x[3] = y[3];
} // line 13
we show the user with -O3 -fopt-info-vec
t.c:13:1: optimized: basic block part vectorized using 16 byte vectors
while with the patch we point to both independently vectorized
opportunities:
t.c:5:8: optimized: basic block part vectorized using 16 byte vectors
t.c:9:8: optimized: basic block part vectorized using 16 byte vectors
there's the possibility that the location regresses in case the
root stmt in the SLP instance has no location. For a SLP subgraph
with multiple entries the location also chooses one entry at random,
not sure in which case we want to dump both.
Still as the plan is to extend the basic-block vectorization
scope from single basic-block to multiple ones this is a first
step to preserve something sensible.
Implementation-wise this makes both costing and code-generation
happen on the subgraphs as analyzed.
2020-09-11 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_instance::location): New method.
(vect_schedule_slp): Adjust prototype.
* tree-vectorizer.c (vec_info::remove_stmt): Adjust
the BB region begin if we removed the stmt it points to.
* tree-vect-loop.c (vect_transform_loop): Adjust.
* tree-vect-slp.c (_slp_instance::location): Implement.
(vect_analyze_slp_instance): For BB vectorization set
vect_location to that of the instance.
(vect_slp_analyze_operations): Likewise.
(vect_bb_vectorization_profitable_p): Remove wrapper.
(vect_slp_analyze_bb_1): Remove cost check here.
(vect_slp_region): Cost check and code generate subgraphs separately,
report optimized locations and missed optimizations due to
profitability for each of them.
(vect_schedule_slp): Get the vector of SLP graph entries to
vectorize as argument.
|
|
gcc/ChangeLog
2020-09-07 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Revert
dead-code removal introduced by 09fa6acd8d9 + add a comment to
clarify.
|
|
The following adds the capability to code-generate live lanes in
basic-block vectorization using lane extracts from vector stmts
rather than keeping the original scalar code around for those.
This eventually makes previously not profitable vectorizations
profitable (the live scalar code was appropriately costed so
are the lane extracts now), without considering the cost model
this patch doesn't add or remove any basic-block vectorization
capabilities.
The patch re/ab-uses STMT_VINFO_LIVE_P in basic-block vectorization
mode to tell whether a live lane is vectorized or whether it is
provided by means of keeping the scalar code live.
The patch is a first step towards vectorizing sequences of
stmts that do not end up in stores or vector constructors though.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
2020-09-04 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vectorizable_live_operation): Adjust.
* tree-vect-loop.c (vectorizable_live_operation): Vectorize
live lanes out of basic-block vectorization nodes.
* tree-vect-slp.c (vect_bb_slp_mark_live_stmts): New function.
(vect_slp_analyze_operations): Analyze live lanes and their
vectorization possibility after the whole SLP graph is final.
(vect_bb_slp_scalar_cost): Adjust for vectorized live lanes.
* tree-vect-stmts.c (can_vectorize_live_stmts): Adjust.
(vect_transform_stmt): Call can_vectorize_live_stmts also for
basic-block vectorization.
* gcc.dg/vect/bb-slp-46.c: New testcase.
* gcc.dg/vect/bb-slp-47.c: Likewise.
* gcc.dg/vect/bb-slp-32.c: Adjust.
|
|
This refines the previous fix for PR96698 by re-doing how and where
we arrange for setting vectorized cycle PHI backedge values.
2020-09-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/96698
PR tree-optimization/96920
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): Remove.
(loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Remove vectorized
cycle PHI latch code.
* tree-vect-loop.c (maybe_set_vectorized_backedge_value): New
helper to set vectorized cycle PHI latch values.
(vect_transform_loop): Walk over all PHIs again after
vectorizing them, calling maybe_set_vectorized_backedge_value.
Call maybe_set_vectorized_backedge_value for each vectorized
stmt. Remove delayed update code.
* tree-vect-slp.c (vect_analyze_slp_instance): Initialize
SLP instance reduc_phis member.
(vect_schedule_slp): Set vectorized cycle PHI latch values.
* gfortran.dg/vect/pr96920.f90: New testcase.
* gcc.dg/vect/pr96920.c: Likewise.
|
|
gcc/ChangeLog
2020-09-04 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Remove
dead code as LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) is
always verified.
|
|
gcc/ada/ChangeLog:
* gcc-interface/trans.c (gigi): Set exact argument of a vector
growth function to true.
(Attribute_to_gnu): Likewise.
gcc/ChangeLog:
* alias.c (init_alias_analysis): Set exact argument of a vector
growth function to true.
* calls.c (internal_arg_pointer_based_exp_scan): Likewise.
* cfgbuild.c (find_many_sub_basic_blocks): Likewise.
* cfgexpand.c (expand_asm_stmt): Likewise.
* cfgrtl.c (rtl_create_basic_block): Likewise.
* combine.c (combine_split_insns): Likewise.
(combine_instructions): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (function_expander::add_output_operand): Likewise.
(function_expander::add_input_operand): Likewise.
(function_expander::add_integer_operand): Likewise.
(function_expander::add_address_operand): Likewise.
(function_expander::add_fixed_operand): Likewise.
* df-core.c (df_worklist_dataflow_doublequeue): Likewise.
* dwarf2cfi.c (update_row_reg_save): Likewise.
* early-remat.c (early_remat::init_block_info): Likewise.
(early_remat::finalize_candidate_indices): Likewise.
* except.c (sjlj_build_landing_pads): Likewise.
* final.c (compute_alignments): Likewise.
(grow_label_align): Likewise.
* function.c (temp_slots_at_level): Likewise.
* fwprop.c (build_single_def_use_links): Likewise.
(update_uses): Likewise.
* gcc.c (insert_wrapper): Likewise.
* genautomata.c (create_state_ainsn_table): Likewise.
(add_vect): Likewise.
(output_dead_lock_vect): Likewise.
* genmatch.c (capture_info::capture_info): Likewise.
(parser::finish_match_operand): Likewise.
* genrecog.c (optimize_subroutine_group): Likewise.
(merge_pattern_info::merge_pattern_info): Likewise.
(merge_into_decision): Likewise.
(print_subroutine_start): Likewise.
(main): Likewise.
* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Likewise.
* gimple.c (gimple_set_bb): Likewise.
* graphite-isl-ast-to-gimple.c (translate_isl_ast_node_user): Likewise.
* haifa-sched.c (sched_extend_luids): Likewise.
(extend_h_i_d): Likewise.
* insn-addr.h (insn_addresses_new): Likewise.
* ipa-cp.c (gather_context_independent_values): Likewise.
(find_more_contexts_for_caller_subset): Likewise.
* ipa-devirt.c (final_warning_record::grow_type_warnings): Likewise.
(ipa_odr_read_section): Likewise.
* ipa-fnsummary.c (evaluate_properties_for_edge): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(analyze_function_body): Likewise.
(ipa_merge_fn_summary_after_inlining): Likewise.
(read_ipa_call_summary): Likewise.
* ipa-icf.c (sem_function::bb_dict_test): Likewise.
* ipa-prop.c (ipa_alloc_node_params): Likewise.
(parm_bb_aa_status_for_bb): Likewise.
(ipa_compute_jump_functions_for_edge): Likewise.
(ipa_analyze_node): Likewise.
(update_jump_functions_after_inlining): Likewise.
(ipa_read_edge_info): Likewise.
(read_ipcp_transformation_info): Likewise.
(ipcp_transform_function): Likewise.
* ipa-reference.c (ipa_reference_write_optimization_summary): Likewise.
* ipa-split.c (execute_split_functions): Likewise.
* ira.c (find_moveable_pseudos): Likewise.
* lower-subreg.c (decompose_multiword_subregs): Likewise.
* lto-streamer-in.c (input_eh_regions): Likewise.
(input_cfg): Likewise.
(input_struct_function_base): Likewise.
(input_function): Likewise.
* modulo-sched.c (set_node_sched_params): Likewise.
(extend_node_sched_params): Likewise.
(schedule_reg_moves): Likewise.
* omp-general.c (omp_construct_simd_compare): Likewise.
* passes.c (pass_manager::create_pass_tab): Likewise.
(enable_disable_pass): Likewise.
* predict.c (determine_unlikely_bbs): Likewise.
* profile.c (compute_branch_probabilities): Likewise.
* read-rtl-function.c (function_reader::parse_block): Likewise.
* read-rtl.c (rtx_reader::read_rtx_code): Likewise.
* reg-stack.c (stack_regs_mentioned): Likewise.
* regrename.c (regrename_init): Likewise.
* rtlanal.c (T>::add_single_to_queue): Likewise.
* sched-deps.c (init_deps_data_vector): Likewise.
* sel-sched-ir.c (sel_extend_global_bb_info): Likewise.
(extend_region_bb_info): Likewise.
(extend_insn_data): Likewise.
* symtab.c (symtab_node::create_reference): Likewise.
* tracer.c (tail_duplicate): Likewise.
* trans-mem.c (tm_region_init): Likewise.
(get_bb_regions_instrumented): Likewise.
* tree-cfg.c (init_empty_tree_cfg_for_function): Likewise.
(build_gimple_cfg): Likewise.
(create_bb): Likewise.
(move_block_to_fn): Likewise.
* tree-complex.c (tree_lower_complex): Likewise.
* tree-if-conv.c (predicate_rhs_code): Likewise.
* tree-inline.c (copy_bb): Likewise.
* tree-into-ssa.c (get_ssa_name_ann): Likewise.
(mark_phi_for_rewrite): Likewise.
* tree-object-size.c (compute_builtin_object_size): Likewise.
(init_object_sizes): Likewise.
* tree-predcom.c (initialize_root_vars_store_elim_1): Likewise.
(initialize_root_vars_store_elim_2): Likewise.
(prepare_initializers_chain_store_elim): Likewise.
* tree-ssa-address.c (addr_for_mem_ref): Likewise.
(multiplier_allowed_in_address_p): Likewise.
* tree-ssa-coalesce.c (ssa_conflicts_new): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-ssa-loop-ivopts.c (addr_offset_valid_p): Likewise.
(get_address_cost_ainc): Likewise.
* tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk): Likewise.
* tree-ssa-pre.c (add_to_value): Likewise.
(phi_translate_1): Likewise.
(do_pre_regular_insertion): Likewise.
(do_pre_partial_partial_insertion): Likewise.
(init_pre): Likewise.
* tree-ssa-propagate.c (ssa_prop_init): Likewise.
(update_call_from_tree): Likewise.
* tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Likewise.
* tree-ssa-sccvn.c (vn_reference_lookup_3): Likewise.
(vn_reference_lookup_pieces): Likewise.
(eliminate_dom_walker::eliminate_push_avail): Likewise.
* tree-ssa-strlen.c (set_strinfo): Likewise.
(get_stridx_plus_constant): Likewise.
(zero_length_string): Likewise.
(find_equal_ptrs): Likewise.
(printf_strlen_execute): Likewise.
* tree-ssa-threadedge.c (set_ssa_name_value): Likewise.
* tree-ssanames.c (make_ssa_name_fn): Likewise.
* tree-streamer-in.c (streamer_read_tree_bitfields): Likewise.
* tree-vect-loop.c (vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
(vect_record_loop_len): Likewise.
(vect_get_loop_len): Likewise.
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_slp_convert_to_external): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.
(vectorizable_slp_permutation): Likewise.
* tree-vect-stmts.c (vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(scan_store_can_perm_p): Likewise.
(vectorizable_store): Likewise.
* expr.c: Likewise.
* vec.c (test_safe_grow_cleared): Likewise.
* vec.h (vec_safe_grow): Likewise.
(vec_safe_grow_cleared): Likewise.
(vl_ptr>::safe_grow): Likewise.
(vl_ptr>::safe_grow_cleared): Likewise.
* config/c6x/c6x.c (insn_set_clock): Likewise.
gcc/c/ChangeLog:
* gimple-parser.c (c_parser_gimple_compound_statement): Set exact argument of a vector
growth function to true.
gcc/cp/ChangeLog:
* class.c (build_vtbl_initializer): Set exact argument of a vector
growth function to true.
* constraint.cc (get_mapped_args): Likewise.
* decl.c (cp_maybe_mangle_decomp): Likewise.
(cp_finish_decomp): Likewise.
* parser.c (cp_parser_omp_for_loop): Likewise.
* pt.c (canonical_type_parameter): Likewise.
* rtti.c (get_pseudo_ti_init): Likewise.
gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_trans_omp_do): Set exact argument of a vector
growth function to true.
gcc/lto/ChangeLog:
* lto-common.c (lto_file_finalize): Set exact argument of a vector
growth function to true.
|
|
This fixes vectorized PHI latch edge updating and delay it until
all of the loop is code generated to deal with the case that the
latch def is a PHI in the same block.
2020-08-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/96698
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): New.
(loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Only record
stmts to update PHI latches from, perform the update ...
* tree-vect-loop.c (vect_transform_loop): ... here after
vectorizing those PHIs.
(info_for_reduction): Properly handle non-reduction PHIs.
* gcc.dg/vect/pr96698.c: New testcase.
|
|
gcc/ChangeLog:
* tree-vect-data-refs.c (dr_group_sort_cmp): Work on
data_ref_pair.
(vect_analyze_data_ref_accesses): Work on groups.
(vect_find_stmt_data_reference): Add group_id argument and fill
up dataref_groups vector.
* tree-vect-loop.c (vect_get_datarefs_in_loop): Pass new
arguments.
(vect_analyze_loop_2): Likewise.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Pass argument.
(vect_slp_bb_region): Likewise.
(vect_slp_region): Likewise.
(vect_slp_bb):Work on the entire BB.
* tree-vectorizer.h (vect_analyze_data_ref_accesses): Add new
argument.
(vect_find_stmt_data_reference): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-38.c: Adjust pattern as now we only process
a single vectorization and now 2 partial.
* gcc.dg/vect/bb-slp-45.c: New test.
|
|
This patch is to add the cost modeling for vector with length,
it mainly follows what we generate for vector with length in
functions vect_set_loop_controls_directly and vect_gen_len
at the worst case.
For Power, the length is expected to be in bits 0-7 (high bits),
we have to model the cost of shifting bits, which is implemented
in adjust_vect_cost_per_loop.
Bootstrapped/regtested on powerpc64le-linux-gnu (P9) with explicit
param vect-partial-vector-usage=1.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_adjust_vect_cost_per_loop): New
function.
(rs6000_finish_cost): Call rs6000_adjust_vect_cost_per_loop.
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost
modeling for vector with length.
(vect_rgroup_iv_might_wrap_p): New function, factored out from...
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): ...this.
Update function comment.
* tree-vect-stmts.c (vect_gen_len): Update function comment.
* tree-vectorizer.h (vect_rgroup_iv_might_wrap_p): New declare.
|
|
Currently vectorizer cost modeling counts branch taken costs for
prologue and epilogue if the number of iterations is unknown.
But it isn't sensible if there are no peeled iterations.
This patch is to guard them under peel_iters_prologue > 0 or
peel_iters_epilogue > 0.
Bootstrapped/regtested on powerpc64le-linux-gnu and aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop.c (vect_get_known_peeling_cost): Don't consider branch
taken costs for prologue and epilogue if they don't exist.
(vect_estimate_min_profitable_iters): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/cost_model_2.c: Adjust due to cost model
change.
|
|
This patch is to refactor the existing peel_iters_prologue and
peel_iters_epilogue cost model handlings, by following the structure
below suggested by Richard Sandiford:
- calculate peel_iters_prologue
- calculate peel_iters_epilogue
- add costs associated with peel_iters_prologue
- add costs associated with peel_iters_epilogue
- add costs related to branch taken/not_taken.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop.c (vect_get_known_peeling_cost): Factor out some code
to determine peel_iters_epilogue to...
(vect_get_peel_iters_epilogue): ...this new function.
(vect_estimate_min_profitable_iters): Refactor cost calculation on
peel_iters_prologue and peel_iters_epilogue.
|
|
Power9 supports vector load/store instruction lxvl/stxvl which allow
us to operate partial vectors with one specific length. This patch
extends some of current mask-based partial vectors support code for
length-based approach, also adds some length specific support code.
So far it assumes that we can only have one partial vectors approach
at the same time, it will disable to use partial vectors if both
approaches co-exist.
Like the description of optab len_load/len_store, the length-based
approach can have two flavors, one is length in bytes, the other is
length in lanes. This patch is mainly implemented and tested for
length in bytes, but as Richard S. suggested, most of code has
considered both flavors.
This also introduces one parameter vect-partial-vector-usage allow
users to control when the loop vectorizer considers using partial
vectors as an alternative to falling back to scalar code.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Set param_vect_partial_vector_usage to 0 explicitly.
* doc/invoke.texi (vect-partial-vector-usage): Document new option.
* optabs-query.c (get_len_load_store_mode): New function.
* optabs-query.h (get_len_load_store_mode): New declare.
* params.opt (vect-partial-vector-usage): New.
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Add the
handlings for vectorization using length-based partial vectors, call
vect_gen_len for length generation, and rename some variables with
items instead of scalars.
(vect_set_loop_condition_partial_vectors): Add the handlings for
vectorization using length-based partial vectors.
(vect_do_peeling): Allow remaining eiters less than epilogue vf for
LOOP_VINFO_USING_PARTIAL_VECTORS_P.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Init
epil_using_partial_vectors_p.
(_loop_vec_info::~_loop_vec_info): Call release_vec_loop_controls
for lengths destruction.
(vect_verify_loop_lens): New function.
(vect_analyze_loop): Add handlings for epilogue of loop when it's
marked to use vectorization using partial vectors.
(vect_analyze_loop_2): Add the check to allow only one vectorization
approach using partial vectorization at the same time. Check param
vect-partial-vector-usage for partial vectors decision. Mark
LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P if the epilogue is
considerable to use partial vectors. Call release_vec_loop_controls
for lengths destruction.
(vect_estimate_min_profitable_iters): Adjust for loop vectorization
using length-based partial vectors.
(vect_record_loop_mask): Init factor to 1 for vectorization using
mask-based partial vectors.
(vect_record_loop_len): New function.
(vect_get_loop_len): Likewise.
* tree-vect-stmts.c (check_load_store_for_partial_vectors): Add
checks for vectorization using length-based partial vectors. Factor
some code to lambda function get_valid_nvectors.
(vectorizable_store): Add handlings when using length-based partial
vectors.
(vectorizable_load): Likewise.
(vect_gen_len): New function.
* tree-vectorizer.h (struct rgroup_controls): Add field factor
mainly for length-based partial vectors.
(vec_loop_lens): New typedef.
(_loop_vec_info): Add lens and epil_using_partial_vectors_p.
(LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P): New macro.
(LOOP_VINFO_LENS): Likewise.
(LOOP_VINFO_FULLY_WITH_LENGTH_P): Likewise.
(vect_record_loop_len): New declare.
(vect_get_loop_len): Likewise.
(vect_gen_len): Likewise.
|
|
This patch is derived from the review of vector with length patch
series. I relaxed the guard on LOOP_VINFO_PEELING_FOR_ALIGNMENT for
vector with length as Richard S.'s suggestion, then encountered one
failure from case gcc.dg/vect/vect-ifcvt-11.c with param
vect-partial-vector-usage=2 enablement run. The root cause is that
we still use the original niters for the loop body vectorization,
it leads the access to go out of bound, instead we should use
LOOP_VINFO_NITERS which has been adjusted in vect_do_peeling by
considering the peeling number for prologue.
Bootstrapped/regtested on aarch64-linux-gnu and powerpc64le-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop.c (vect_transform_loop): Use LOOP_VINFO_NITERS which
is adjusted by considering peeled prologue for non
vect_use_loop_mask_for_alignment_p cases.
|
|
This followup removes vect_verify_datarefs_alignment and its
premature cancellation of vectorization leaving the actual
decision whether alignment is supported to the functions
deciding whether we can vectorize a load or store.
2020-07-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_verify_datarefs_alignment): Remove.
(vect_slp_analyze_and_verify_instance_alignment): Rename to ...
(vect_slp_analyze_instance_alignment): ... this.
* tree-vect-data-refs.c (verify_data_ref_alignment): Remove.
(vect_verify_datarefs_alignment): Likewise.
(vect_enhance_data_refs_alignment): Do not call
vect_verify_datarefs_alignment.
(vect_slp_analyze_node_alignment): Rename from
vect_slp_analyze_and_verify_node_alignment and do not
call verify_data_ref_alignment.
(vect_slp_analyze_instance_alignment): Rename from
vect_slp_analyze_and_verify_instance_alignment.
* tree-vect-stmts.c (vectorizable_store): Dump when
we vectorize an unaligned access.
(vectorizable_load): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Do not call
vect_verify_datarefs_alignment.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Adjust.
* gcc.dg/vect/bb-slp-10.c: Adjust.
* gcc.dg/vect/slp-45.c: Likewise.
* gcc.dg/vect/vect-109.c: Likewise.
|
|
As Richard S. suggested in the review of vector with length patch
series, we can use one message on "partial vectors" instead of
"fully with masking". This patch is to update the dumping string
and related test cases.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop.c (vect_analyze_loop_2): Update dumping string
for fully masking to be more common.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/clastb_1.c: Update dumping string.
* gcc.target/aarch64/sve/clastb_2.c: Likewise.
* gcc.target/aarch64/sve/clastb_3.c: Likewise.
* gcc.target/aarch64/sve/clastb_4.c: Likewise.
* gcc.target/aarch64/sve/clastb_5.c: Likewise.
* gcc.target/aarch64/sve/clastb_6.c: Likewise.
* gcc.target/aarch64/sve/clastb_7.c: Likewise.
|
|
This fixes computation of the insertion place for fold-left SLP
reductions where the PHIs do not have vectorized stmts. The
SLP representation isn't perfect here thus the following.
2020-06-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/95897
* tree-vectorizer.h (vectorizable_induction): Remove
unused gimple_stmt_iterator * parameter.
* tree-vect-loop.c (vectorizable_induction): Likewise.
(vect_analyze_loop_operations): Adjust.
* tree-vect-stmts.c (vect_analyze_stmt): Likewise.
(vect_transform_stmt): Likewise.
* tree-vect-slp.c (vect_schedule_slp_instance): Adjust
for fold-left reductions, clarify existing reduction case.
* gcc.dg/vect/pr95897.c: New testcase.
|
|
Minor code refactorings in tree-vect-data-refs.c and tree-vect-loop.c.
Use LOOP_VINFO_DATAREFS and LOOP_VINFO_DDRS when possible and rename
several parameters to make code more consistent.
2020-06-13 Felix Yang <felix.yang@huawei.com>
gcc/
* tree-vect-data-refs.c (vect_verify_datarefs_alignment): Rename
parameter to loop_vinfo and update uses. Use LOOP_VINFO_DATAREFS
when possible.
(vect_analyze_data_refs_alignment): Likewise, and use LOOP_VINFO_DDRS
when possible.
* tree-vect-loop.c (vect_dissolve_slp_only_groups): Use
LOOP_VINFO_DATAREFS when possible.
(update_epilogue_loop_vinfo): Likewise.
|
|
Power supports vector memory access with length (in bytes) instructions.
Like existing fully masking for SVE, it is another approach to vectorize
the loop using partially-populated vectors.
As Richard Sandiford suggested, we should share the codes in approaches
with partial vectors if possible. This patch is to:
1) factor out two functions:
- vect_min_prec_for_max_niters
- vect_known_niters_smaller_than_vf.
2) rename four functions:
- vect_iv_limit_for_full_masking
- check_load_store_masking
- vect_set_loop_condition_masked
- vect_set_loop_condition_unmasked
3) rename macros LOOP_VINFO_MASK_COMPARE_TYPE and LOOP_VINFO_MASK_IV_TYPE.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Rename
LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename
LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE.
(vect_set_loop_condition_masked): Renamed to ...
(vect_set_loop_condition_partial_vectors): ... this. Rename
LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename
vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors.
(vect_set_loop_condition_unmasked): Renamed to ...
(vect_set_loop_condition_normal): ... this.
(vect_set_loop_condition): Rename vect_set_loop_condition_unmasked to
vect_set_loop_condition_normal. Rename vect_set_loop_condition_masked
to vect_set_loop_condition_partial_vectors.
(vect_prepare_for_masked_peels): Rename LOOP_VINFO_MASK_COMPARE_TYPE
to LOOP_VINFO_RGROUP_COMPARE_TYPE.
* tree-vect-loop.c (vect_known_niters_smaller_than_vf): New, factored
out from ...
(vect_analyze_loop_costing): ... this.
(_loop_vec_info::_loop_vec_info): Rename mask_compare_type to
compare_type.
(vect_min_prec_for_max_niters): New, factored out from ...
(vect_verify_full_masking): ... this. Rename
vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors.
Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE.
Rename LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE.
(vectorizable_reduction): Update some dumpings with partial
vectors instead of fully-masked.
(vectorizable_live_operation): Likewise.
(vect_iv_limit_for_full_masking): Renamed to ...
(vect_iv_limit_for_partial_vectors): ... this.
* tree-vect-stmts.c (check_load_store_masking): Renamed to ...
(check_load_store_for_partial_vectors): ... this. Update some
dumpings with partial vectors instead of fully-masked.
(vectorizable_store): Rename check_load_store_masking to
check_load_store_for_partial_vectors.
(vectorizable_load): Likewise.
* tree-vectorizer.h (LOOP_VINFO_MASK_COMPARE_TYPE): Renamed to ...
(LOOP_VINFO_RGROUP_COMPARE_TYPE): ... this.
(LOOP_VINFO_MASK_IV_TYPE): Renamed to ...
(LOOP_VINFO_RGROUP_IV_TYPE): ... this.
(vect_iv_limit_for_full_masking): Renamed to ...
(vect_iv_limit_for_partial_vectors): this.
(_loop_vec_info): Rename mask_compare_type to rgroup_compare_type.
Rename iv_type to rgroup_iv_type.
|
|
Power supports vector memory access with length (in bytes) instructions.
Like existing fully masking for SVE, it is another approach to vectorize
the loop using partially-populated vectors.
As Richard Sandiford pointed out, we can rename the rgroup struct
rgroup_masks to rgroup_controls, rename its members mask_type to type,
masks to controls to be more generic.
Besides, this patch also renames some functions like vect_set_loop_mask
to vect_set_loop_control, release_vec_loop_masks to
release_vec_loop_controls, vect_set_loop_masks_directly to
vect_set_loop_controls_directly.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_mask): Renamed to ...
(vect_set_loop_control): ... this.
(vect_maybe_permute_loop_masks): Rename rgroup_masks related things.
(vect_set_loop_masks_directly): Renamed to ...
(vect_set_loop_controls_directly): ... this. Also rename some
variables with ctrl instead of mask. Rename vect_set_loop_mask to
vect_set_loop_control.
(vect_set_loop_condition_masked): Rename rgroup_masks related things.
Also rename some variables with ctrl instead of mask.
* tree-vect-loop.c (release_vec_loop_masks): Renamed to ...
(release_vec_loop_controls): ... this. Rename rgroup_masks related
things.
(_loop_vec_info::~_loop_vec_info): Rename release_vec_loop_masks to
release_vec_loop_controls.
(can_produce_all_loop_masks_p): Rename rgroup_masks related things.
(vect_get_max_nscalars_per_iter): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
* tree-vectorizer.h (struct rgroup_masks): Renamed to ...
(struct rgroup_controls): ... this. Also rename mask_type
to type and rename masks to controls.
|
|
Power supports vector memory access with length (in bytes) instructions.
Like existing fully masking for SVE, it is another approach to vectorize
the loop using partially-populated vectors.
As Richard Sandiford suggested, this patch is to update the existing
fully_masked_p field to using_partial_vectors_p. Introduce one macro
LOOP_VINFO_USING_PARTIAL_VECTORS_P for partial vectorization checking
usage, update the LOOP_VINFO_FULLY_MASKED_P with
LOOP_VINFO_USING_PARTIAL_VECTORS_P && !masks.is_empty() and still use
it for mask-based partial vectors approach specific checks.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_condition): Rename
LOOP_VINFO_FULLY_MASKED_P to LOOP_VINFO_USING_PARTIAL_VECTORS_P.
(vect_gen_vector_loop_niters): Likewise.
(vect_do_peeling): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Rename
fully_masked_p to using_partial_vectors_p.
(vect_analyze_loop_costing): Rename LOOP_VINFO_FULLY_MASKED_P to
LOOP_VINFO_USING_PARTIAL_VECTORS_P.
(determine_peel_for_niter): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_transform_loop): Likewise.
* tree-vectorizer.h (LOOP_VINFO_FULLY_MASKED_P): Updated.
(LOOP_VINFO_USING_PARTIAL_VECTORS_P): New macro.
|