This patch adds the cost modeling for vector with length;
it mainly follows what we generate for vector with length in
the functions vect_set_loop_controls_directly and vect_gen_len
in the worst case.
For Power, the length is expected to be in bits 0-7 (the high bits),
so we have to model the cost of shifting bits, which is implemented
in adjust_vect_cost_per_loop.
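A minimal sketch (an assumed encoding, not the patch's generated
gimple) of why that extra cost exists: lxvl/stxvl take the byte count
in bits 0-7, i.e. the most significant byte of the 64-bit register,
so every computed length needs one extra shift before use:
  /* Hypothetical helper; the shift by 56 positions the length in the
     high byte of a 64-bit register, matching what lxvl/stxvl expect.  */
  static inline unsigned long long
  power_encode_len (unsigned long long nbytes)
  {
    return nbytes << 56;
  }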
Bootstrapped/regtested on powerpc64le-linux-gnu (P9) with explicit
param vect-partial-vector-usage=1.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_adjust_vect_cost_per_loop): New
function.
(rs6000_finish_cost): Call rs6000_adjust_vect_cost_per_loop.
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost
modeling for vector with length.
(vect_rgroup_iv_might_wrap_p): New function, factored out from...
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): ...this.
Update function comment.
* tree-vect-stmts.c (vect_gen_len): Update function comment.
* tree-vectorizer.h (vect_rgroup_iv_might_wrap_p): Declare.
|
|
Power9 supports the vector load/store instructions lxvl/stxvl, which
allow us to operate on partial vectors with one specific length. This
patch extends some of the current mask-based partial vector support
code for the length-based approach and adds some length-specific
support code. So far it assumes that we can only have one partial
vector approach at a time and disables the use of partial vectors if
both approaches co-exist.
As described for the optabs len_load/len_store, the length-based
approach can have two flavors: one is length in bytes, the other is
length in lanes. This patch is mainly implemented and tested for
length in bytes, but as Richard S. suggested, most of the code
considers both flavors.
This also introduces one parameter, vect-partial-vector-usage, which
allows users to control when the loop vectorizer considers using
partial vectors as an alternative to falling back to scalar code.
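As an illustration (an assumed example, not taken from the patch),
this is the shape of loop the length-based approach targets; with
partial vectors the tail iterations are covered by one partial
access instead of a scalar epilogue:
  /* Hypothetical example; on Power9 compile with something like
     -O2 -mcpu=power9 --param vect-partial-vector-usage=2, where 2
     allows partial vectors for the whole loop.  */
  void
  saxpy (int n, float a, float *restrict x, float *restrict y)
  {
    for (int i = 0; i < n; i++)
      y[i] += a * x[i];
  }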
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Set param_vect_partial_vector_usage to 0 explicitly.
* doc/invoke.texi (vect-partial-vector-usage): Document new option.
* optabs-query.c (get_len_load_store_mode): New function.
* optabs-query.h (get_len_load_store_mode): Declare.
* params.opt (vect-partial-vector-usage): New.
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Add
handling for vectorization using length-based partial vectors, call
vect_gen_len for length generation, and rename some variables to use
items instead of scalars.
(vect_set_loop_condition_partial_vectors): Add handling for
vectorization using length-based partial vectors.
(vect_do_peeling): Allow remaining eiters less than epilogue vf for
LOOP_VINFO_USING_PARTIAL_VECTORS_P.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Init
epil_using_partial_vectors_p.
(_loop_vec_info::~_loop_vec_info): Call release_vec_loop_controls
for lengths destruction.
(vect_verify_loop_lens): New function.
(vect_analyze_loop): Add handling for the epilogue of the loop when
it's marked to use vectorization using partial vectors.
(vect_analyze_loop_2): Add a check to allow only one vectorization
approach using partial vectors at a time. Check param
vect-partial-vector-usage for the partial vectors decision. Mark
LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P if the epilogue can use
partial vectors. Call release_vec_loop_controls
for lengths destruction.
(vect_estimate_min_profitable_iters): Adjust for loop vectorization
using length-based partial vectors.
(vect_record_loop_mask): Init factor to 1 for vectorization using
mask-based partial vectors.
(vect_record_loop_len): New function.
(vect_get_loop_len): Likewise.
* tree-vect-stmts.c (check_load_store_for_partial_vectors): Add
checks for vectorization using length-based partial vectors. Factor
some code to lambda function get_valid_nvectors.
(vectorizable_store): Add handlings when using length-based partial
vectors.
(vectorizable_load): Likewise.
(vect_gen_len): New function.
* tree-vectorizer.h (struct rgroup_controls): Add field factor
mainly for length-based partial vectors.
(vec_loop_lens): New typedef.
(_loop_vec_info): Add lens and epil_using_partial_vectors_p.
(LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P): New macro.
(LOOP_VINFO_LENS): Likewise.
(LOOP_VINFO_FULLY_WITH_LENGTH_P): Likewise.
(vect_record_loop_len): Declare.
(vect_get_loop_len): Likewise.
(vect_gen_len): Likewise.
|
|
This followup removes vect_verify_datarefs_alignment and its
premature cancellation of vectorization, leaving the actual
decision of whether alignment is supported to the functions
that decide whether we can vectorize a load or store.
2020-07-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_verify_datarefs_alignment): Remove.
(vect_slp_analyze_and_verify_instance_alignment): Rename to ...
(vect_slp_analyze_instance_alignment): ... this.
* tree-vect-data-refs.c (verify_data_ref_alignment): Remove.
(vect_verify_datarefs_alignment): Likewise.
(vect_enhance_data_refs_alignment): Do not call
vect_verify_datarefs_alignment.
(vect_slp_analyze_node_alignment): Rename from
vect_slp_analyze_and_verify_node_alignment and do not
call verify_data_ref_alignment.
(vect_slp_analyze_instance_alignment): Rename from
vect_slp_analyze_and_verify_instance_alignment.
* tree-vect-stmts.c (vectorizable_store): Dump when
we vectorize an unaligned access.
(vectorizable_load): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Do not call
vect_verify_datarefs_alignment.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Adjust.
* gcc.dg/vect/bb-slp-10.c: Adjust.
* gcc.dg/vect/slp-45.c: Likewise.
* gcc.dg/vect/vect-109.c: Likewise.
|
|
This provides helpers to insert stmts on region entry, abstracted
from loop/basic-block, split out from vect_init_vector and used
from the SLP constant code generation path. The SLP constant
code generation path is also changed to avoid needless SSA
copying, since we can store VECTOR_CSTs directly in the vectorized
defs array, improving the IL from the vectorizer.
2020-07-03 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vec_info::insert_on_entry): New.
(vec_info::insert_seq_on_entry): Likewise.
* tree-vectorizer.c (vec_info::insert_on_entry): Implement.
(vec_info::insert_seq_on_entry): Likewise.
* tree-vect-stmts.c (vect_init_vector_1): Use
vec_info::insert_on_entry.
(vect_finish_stmt_generation): Set modified bit after
adjusting VUSE.
* tree-vect-slp.c (vect_create_constant_vectors): Simplify
by using vec_info::insert_seq_on_entry and bypassing
vect_init_vector.
(vect_schedule_slp_instance): Deal with all-constant
children later.
|
|
This removes the duplicate <utility> include from tree-vectorizer.h.
2020-06-29 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h: Do not include <utility>.
|
|
gcc/ChangeLog:
* tree-ssa-ccp.c (gsi_prev_dom_bb_nondebug): Use gsi_bb
instead of gimple_stmt_iterator::bb.
* tree-ssa-math-opts.c (insert_reciprocals): Likewise.
* tree-vectorizer.h: Likewise.
|
|
This fixes computation of the insertion place for fold-left SLP
reductions where the PHIs do not have vectorized stmts. The
SLP representation isn't perfect here, thus the following.
2020-06-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/95897
* tree-vectorizer.h (vectorizable_induction): Remove
unused gimple_stmt_iterator * parameter.
* tree-vect-loop.c (vectorizable_induction): Likewise.
(vect_analyze_loop_operations): Adjust.
* tree-vect-stmts.c (vect_analyze_stmt): Likewise.
(vect_transform_stmt): Likewise.
* tree-vect-slp.c (vect_schedule_slp_instance): Adjust
for fold-left reductions, clarify existing reduction case.
* gcc.dg/vect/pr95897.c: New testcase.
|
|
This makes sure to emit SLP vectorized loads where the first scalar
load is. This makes SLP dependence checking more powerful because
hoisting loads can use TBAA and it increases the freedom for
vector placement when there are constraints from live lanes.
Vectorized shifts prevent us from always inserting vectorized stmts
after vectorized defs, because a shift ends up using the original
scalar operand even when the SLP graph indicates the shift operand
is vectorized (and we actually emit and cost those stmts).
vect_slp_analyze_and_verify_node_alignment shows we need alignment
for too many places; this is a temporary solution, and my plan
is to have a single meta-info for a dataref group instead
(also getting rid of DR_GROUP_FIRST/NEXT_ELEMENT).
2020-06-24 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_find_first_scalar_stmt_in_slp):
Declare.
* tree-vect-data-refs.c (vect_preserves_scalar_order_p):
Simplify for new position of vectorized SLP loads.
(vect_slp_analyze_node_dependences): Adjust for it.
(vect_slp_analyze_and_verify_node_alignment): Compute alignment
for the first stmts dataref.
* tree-vect-slp.c (vect_find_first_scalar_stmt_in_slp): New.
(vect_schedule_slp_instance): Emit loads before the
first scalar stmt.
* tree-vect-stmts.c (vectorizable_load): Do what the comment
says and use vect_find_first_scalar_stmt_in_slp.
|
|
gcc/ChangeLog:
* coretypes.h (struct iterator_range): New type.
* tree-vect-patterns.c (vect_determine_precisions): Use
range-based iterator.
(vect_pattern_recog): Likewise.
* tree-vect-slp.c (_bb_vec_info): Likewise.
(_bb_vec_info::~_bb_vec_info): Likewise.
(vect_slp_check_for_constructors): Likewise.
* tree-vectorizer.h: Add new iterators
and functions that use them.
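A simplified sketch of the iterator_range idea (the actual coretypes.h
type may differ in detail): it packages a begin/end iterator pair so a
container slice can be walked with a range-based for loop:
  /* Illustrative range adaptor.  */
  template <typename T>
  struct iterator_range
  {
    iterator_range (const T &begin, const T &end)
      : m_begin (begin), m_end (end) {}
    T begin () const { return m_begin; }
    T end () const { return m_end; }
  private:
    T m_begin, m_end;
  };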
|
|
This removes the SLP_TREE_TWO_OPERATORS hack in favor of having
explicit SLP nodes for both computations and the blend operation.
For this introduce a generic merge + select + permute SLP node
(with implementation limits).
Building upon earlier patches it adds vect_stmt_dominates_stmt_p
and the ability to compute a vector insertion place from
vectorized stmts (which now have UID zero) as needed for
the permute node.
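For reference, the classic source pattern that SLP_TREE_TWO_OPERATORS
modelled (an assumed example, not from the commit): alternating lanes
use different operators, which now becomes two computation SLP nodes
plus an explicit VEC_PERM_EXPR blend node:
  void
  addsub (double *restrict r, double *restrict a, double *restrict b)
  {
    r[0] = a[0] + b[0];  /* even lanes feed the addition node */
    r[1] = a[1] - b[1];  /* odd lanes feed the subtraction node */
    r[2] = a[2] + b[2];
    r[3] = a[3] - b[3];  /* the blend selects even lanes from the
                            add node, odd lanes from the sub node.  */
  }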
2020-06-17 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::two_operators): Remove.
(_slp_tree::lane_permutation): New member.
(_slp_tree::code): Likewise.
(SLP_TREE_TWO_OPERATORS): Remove.
(SLP_TREE_LANE_PERMUTATION): New.
(SLP_TREE_CODE): Likewise.
(vect_stmt_dominates_stmt_p): Declare.
* tree-vectorizer.c (vect_stmt_dominates_stmt_p): New function.
* tree-vect-stmts.c (vect_model_simple_cost): Remove
SLP_TREE_TWO_OPERATORS handling.
* tree-vect-slp.c (_slp_tree::_slp_tree): Amend.
(_slp_tree::~_slp_tree): Likewise.
(vect_two_operations_perm_ok_p): Remove.
(vect_build_slp_tree_1): Remove verification of two-operator
permutation here.
(vect_build_slp_tree_2): When we have two different operators
build two computation SLP nodes and a blend.
(vect_print_slp_tree): Print the lane permutation if it exists.
(slp_copy_subtree): Copy it.
(vect_slp_rearrange_stmts): Re-arrange it.
(vect_slp_analyze_node_operations_1): Handle SLP_TREE_CODE
VEC_PERM_EXPR explicitly.
(vect_schedule_slp_instance): Likewise. Remove old
SLP_TREE_TWO_OPERATORS code.
(vectorizable_slp_permutation): New function.
|
|
Power supports vector memory access instructions with length (in bytes).
Like the existing full masking for SVE, it is another approach to
vectorizing a loop using partially-populated vectors.
As Richard Sandiford suggested, we should share code between the
partial vector approaches where possible. This patch is to:
1) factor out two functions:
- vect_min_prec_for_max_niters
- vect_known_niters_smaller_than_vf.
2) rename four functions:
- vect_iv_limit_for_full_masking
- check_load_store_masking
- vect_set_loop_condition_masked
- vect_set_loop_condition_unmasked
3) rename macros LOOP_VINFO_MASK_COMPARE_TYPE and LOOP_VINFO_MASK_IV_TYPE.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Rename
LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename
LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE.
(vect_set_loop_condition_masked): Renamed to ...
(vect_set_loop_condition_partial_vectors): ... this. Rename
LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename
vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors.
(vect_set_loop_condition_unmasked): Renamed to ...
(vect_set_loop_condition_normal): ... this.
(vect_set_loop_condition): Rename vect_set_loop_condition_unmasked to
vect_set_loop_condition_normal. Rename vect_set_loop_condition_masked
to vect_set_loop_condition_partial_vectors.
(vect_prepare_for_masked_peels): Rename LOOP_VINFO_MASK_COMPARE_TYPE
to LOOP_VINFO_RGROUP_COMPARE_TYPE.
* tree-vect-loop.c (vect_known_niters_smaller_than_vf): New, factored
out from ...
(vect_analyze_loop_costing): ... this.
(_loop_vec_info::_loop_vec_info): Rename mask_compare_type to
compare_type.
(vect_min_prec_for_max_niters): New, factored out from ...
(vect_verify_full_masking): ... this. Rename
vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors.
Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE.
Rename LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE.
(vectorizable_reduction): Update some dump messages to say partial
vectors instead of fully-masked.
(vectorizable_live_operation): Likewise.
(vect_iv_limit_for_full_masking): Renamed to ...
(vect_iv_limit_for_partial_vectors): ... this.
* tree-vect-stmts.c (check_load_store_masking): Renamed to ...
(check_load_store_for_partial_vectors): ... this. Update some
dump messages to say partial vectors instead of fully-masked.
(vectorizable_store): Rename check_load_store_masking to
check_load_store_for_partial_vectors.
(vectorizable_load): Likewise.
* tree-vectorizer.h (LOOP_VINFO_MASK_COMPARE_TYPE): Renamed to ...
(LOOP_VINFO_RGROUP_COMPARE_TYPE): ... this.
(LOOP_VINFO_MASK_IV_TYPE): Renamed to ...
(LOOP_VINFO_RGROUP_IV_TYPE): ... this.
(vect_iv_limit_for_full_masking): Renamed to ...
(vect_iv_limit_for_partial_vectors): ... this.
(_loop_vec_info): Rename mask_compare_type to rgroup_compare_type.
Rename iv_type to rgroup_iv_type.
|
|
Power supports vector memory access instructions with length (in bytes).
Like the existing full masking for SVE, it is another approach to
vectorizing a loop using partially-populated vectors.
As Richard Sandiford pointed out, we can rename the rgroup struct
rgroup_masks to rgroup_controls and rename its members mask_type to
type and masks to controls, to be more generic.
Besides, this patch also renames some functions, like vect_set_loop_mask
to vect_set_loop_control, release_vec_loop_masks to
release_vec_loop_controls, and vect_set_loop_masks_directly to
vect_set_loop_controls_directly.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_mask): Renamed to ...
(vect_set_loop_control): ... this.
(vect_maybe_permute_loop_masks): Rename rgroup_masks related things.
(vect_set_loop_masks_directly): Renamed to ...
(vect_set_loop_controls_directly): ... this. Also rename some
variables with ctrl instead of mask. Rename vect_set_loop_mask to
vect_set_loop_control.
(vect_set_loop_condition_masked): Rename rgroup_masks related things.
Also rename some variables with ctrl instead of mask.
* tree-vect-loop.c (release_vec_loop_masks): Renamed to ...
(release_vec_loop_controls): ... this. Rename rgroup_masks related
things.
(_loop_vec_info::~_loop_vec_info): Rename release_vec_loop_masks to
release_vec_loop_controls.
(can_produce_all_loop_masks_p): Rename rgroup_masks related things.
(vect_get_max_nscalars_per_iter): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
* tree-vectorizer.h (struct rgroup_masks): Renamed to ...
(struct rgroup_controls): ... this. Also rename mask_type
to type and rename masks to controls.
|
|
Power supports vector memory access instructions with length (in bytes).
Like the existing full masking for SVE, it is another approach to
vectorizing a loop using partially-populated vectors.
As Richard Sandiford suggested, this patch updates the existing
fully_masked_p field to using_partial_vectors_p. It introduces the
macro LOOP_VINFO_USING_PARTIAL_VECTORS_P for partial vectorization
checks, redefines LOOP_VINFO_FULLY_MASKED_P as
LOOP_VINFO_USING_PARTIAL_VECTORS_P && !masks.is_empty (), and still
uses it for checks specific to the mask-based partial vector approach.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_condition): Rename
LOOP_VINFO_FULLY_MASKED_P to LOOP_VINFO_USING_PARTIAL_VECTORS_P.
(vect_gen_vector_loop_niters): Likewise.
(vect_do_peeling): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Rename
fully_masked_p to using_partial_vectors_p.
(vect_analyze_loop_costing): Rename LOOP_VINFO_FULLY_MASKED_P to
LOOP_VINFO_USING_PARTIAL_VECTORS_P.
(determine_peel_for_niter): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_transform_loop): Likewise.
* tree-vectorizer.h (LOOP_VINFO_FULLY_MASKED_P): Updated.
(LOOP_VINFO_USING_PARTIAL_VECTORS_P): New macro.
|
|
Power supports vector memory access instructions with length (in bytes).
Like the existing full masking for SVE, it is another approach to
vectorizing a loop using partially-populated vectors.
As Richard Sandiford pointed out, we should extend the existing flag
can_fully_mask_p to be more generic, to indicate whether we have
any chance of using partial vectors for this loop. So this patch
renames the flag to can_use_partial_vectors_p to be more
meaningful, and also renames the macro LOOP_VINFO_CAN_FULLY_MASK_P
to LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Rename
can_fully_mask_p to can_use_partial_vectors_p.
(vect_analyze_loop_2): Rename LOOP_VINFO_CAN_FULLY_MASK_P to
LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. Rename saved_can_fully_mask_p
to saved_can_use_partial_vectors_p.
(vectorizable_reduction): Rename LOOP_VINFO_CAN_FULLY_MASK_P to
LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
(vectorizable_live_operation): Likewise.
* tree-vect-stmts.c (permute_vec_elements): Likewise.
(check_load_store_masking): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
* tree-vectorizer.h (LOOP_VINFO_CAN_FULLY_MASK_P): Renamed to ...
(LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P): ... this.
(_loop_vec_info): Rename can_fully_mask_p to can_use_partial_vectors_p.
|
|
This makes {SLP_TREE,STMT_VINFO}_VEC_STMTS a vector of gimple * and
no longer allocates a stmt_vec_info for vectorizer-generated stmts,
since this is now possible after removing the only use, which was the
chaining of vector stmts via STMT_VINFO_RELATED_STMT.
This also removes all stmt_vec_info allocations done for vector
stmts; the remaining ones are for stmts in the scalar IL and for
patterns which are not part of the IL. Thus after this the stmt
UIDs inside a basic-block are suitable for dominance checking
if you ignore (or lazily fill) the zero UIDs of the vector stmts
inserted during transform. This property is ensured by a new
flag set when pattern analysis is complete.
2020-06-10 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::vec_stmts): Make it a vector
of gimple * stmts.
(_stmt_vec_info::vec_stmts): Likewise.
(vec_info::stmt_vec_info_ro): New flag.
(vect_finish_replace_stmt): Adjust declaration.
(vect_finish_stmt_generation): Likewise.
(vectorizable_induction): Likewise.
(vect_transform_reduction): Likewise.
(vectorizable_lc_phi): Likewise.
* tree-vect-data-refs.c (vect_create_data_ref_ptr): Do not
allocate stmt infos for increments.
(vect_record_grouped_load_vectors): Adjust.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise.
(vectorize_fold_left_reduction): Likewise.
(vect_transform_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_lc_phi): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
(vect_transform_loop): Likewise.
* tree-vect-patterns.c (vect_pattern_recog): Set stmt_vec_info_ro.
* tree-vect-slp.c (vect_get_slp_vect_def): Adjust.
(vect_get_slp_defs): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Likewise.
(vectorize_slp_instance_root_stmt): Likewise.
* tree-vect-stmts.c (vect_get_vec_defs_for_operand): Likewise.
(vect_finish_stmt_generation_1): Do not allocate a stmt info.
(vect_finish_replace_stmt): Do not return anything.
(vect_finish_stmt_generation): Likewise.
(vect_build_gather_load_calls): Adjust.
(vectorizable_bswap): Likewise.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vect_create_vectorized_demotion_stmts): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_scan_store): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(vect_transform_stmt): Likewise.
* tree-vectorizer.c (vec_info::vec_info): Initialize
stmt_vec_info_ro.
(vec_info::replace_stmt): Copy over stmt UID rather than
unsetting/setting a stmt info allocating a new UID.
(vec_info::set_vinfo_for_stmt): Assert !stmt_vec_info_ro.
|
|
This gets rid of the linked list of STMT_VINFO_VEC_STMT and
STMT_VINFO_RELATED_STMT in preparation for vectorized stmts no
longer needing a stmt_vec_info (just for this chaining). This
has ripple-down effects in all places we gather vectorized
defs. For this, new interfaces are introduced and used
throughout vectorization, simplifying code in a lot of places
and merging it with the SLP way of gathering vectorized
operands. vect_get_vec_defs is the new recommended
unified interface and vect_get_vec_defs_for_operand the one
for non-SLP operation. I've resorted to keeping the structure
of the code the same where using vect_get_vec_defs would have
been too disruptive for this already large patch.
2020-06-10 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_vfa_access_size): Adjust.
(vect_record_grouped_load_vectors): Likewise.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise.
(vectorize_fold_left_reduction): Likewise.
(vect_transform_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_lc_phi): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
(vect_transform_loop): Likewise.
* tree-vect-slp.c (vect_get_slp_defs): New function, split out
from overload.
* tree-vect-stmts.c (vect_get_vec_def_for_operand_1): Remove.
(vect_get_vec_def_for_operand): Likewise.
(vect_get_vec_def_for_stmt_copy): Likewise.
(vect_get_vec_defs_for_stmt_copy): Likewise.
(vect_get_vec_defs_for_operand): New function.
(vect_get_vec_defs): Likewise.
(vect_build_gather_load_calls): Adjust.
(vect_get_gather_scatter_ops): Likewise.
(vectorizable_bswap): Likewise.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vect_get_loop_based_defs): Remove.
(vect_create_vectorized_demotion_stmts): Adjust.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_scan_store): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(vect_transform_stmt): Adjust and remove no longer applicable
sanity checks.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
STMT_VINFO_VEC_STMTS.
(vec_info::free_stmt_vec_info): Release it.
* tree-vectorizer.h (_stmt_vec_info::vectorized_stmt): Remove.
(_stmt_vec_info::vec_stmts): Add.
(STMT_VINFO_VEC_STMT): Remove.
(STMT_VINFO_VEC_STMTS): New.
(vect_get_vec_def_for_operand_1): Remove.
(vect_get_vec_def_for_operand): Likewise.
(vect_get_vec_defs_for_stmt_copy): Likewise.
(vect_get_vec_def_for_stmt_copy): Likewise.
(vect_get_vec_defs): New overloads.
(vect_get_vec_defs_for_operand): New.
(vect_get_slp_defs): Declare.
|
|
This adds vect_get_slp_vect_def to get at an SLP node's vectorized def,
abstracting away the details. It also fixes one stray failure to
use SLP_TREE_REPRESENTATIVE.
2020-05-04 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_get_slp_vect_def): Declare.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use it.
* tree-vect-stmts.c (vect_transform_stmt): Likewise.
(vect_is_simple_use): Use SLP_TREE_REPRESENTATIVE.
* tree-vect-slp.c (vect_get_slp_vect_defs): Fold into single
use ...
(vect_get_slp_defs): ... here.
(vect_get_slp_vect_def): New function.
|
|
This adds an explicit number of scalar lanes to the SLP node,
avoiding the need to dispatch between stmts/ops and eventually
not requiring those vectors at all.
2020-05-27 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::lanes): New.
(SLP_TREE_LANES): Likewise.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use it.
(vectorizable_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-slp.c (_slp_tree::_slp_tree): Initialize lanes.
(vect_create_new_slp_node): Likewise.
(slp_copy_subtree): Copy it.
(vect_optimize_slp): Use it.
(vect_slp_analyze_node_operations_1): Likewise.
(vect_slp_convert_to_external): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.
* tree-vect-stmts.c (vectorizable_load): Likewise.
(get_vectype_for_scalar_type): Likewise.
|
|
This adds SLP_TREE_REPRESENTATIVE - a representative stmt-info that
is used by SLP analysis and code generation. This avoids the need
for the hack in vect_slp_rearrange_stmts, which previously avoided
re-arranging stmts that might not have been isomorphic because
of operand swapping. It also plays nice with future directions of SLP,
and for the foreseeable future it is easier than replicating more and
more info in the SLP node as long as non-SLP is in-tree.
2020-05-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/95272
* tree-vectorizer.h (_slp_tree::representative): Add.
(SLP_TREE_REPRESENTATIVE): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Adjust SLP
node gathering.
(vectorizable_live_operation): Use the representative to
attach the reduction info to.
* tree-vect-slp.c (_slp_tree::_slp_tree): Initialize
SLP_TREE_REPRESENTATIVE.
(vect_create_new_slp_node): Likewise.
(slp_copy_subtree): Copy it.
(vect_slp_rearrange_stmts): Re-arrange even COND_EXPR stmts.
(vect_slp_analyze_node_operations_1): Pass the representative
to vect_analyze_stmt.
(vect_schedule_slp_instance): Pass the representative to
vect_transform_stmt.
* gcc.dg/vect/pr95272.c: New testcase.
|
|
This generates vector defs for externals and invariants during the SLP
walk rather than as part of getting vectorized defs when vectorizing
the users. This is a requirement for the sharing of external/invariant
nodes to be reflected in actual code generation.
This temporarily adds a SLP_TREE_VEC_DEFS vector alongside the
SLP_TREE_VEC_STMTS one. Eventually the latter can go away.
2020-05-27 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::vec_defs): Add.
(SLP_TREE_VEC_DEFS): Likewise.
* tree-vect-slp.c (_slp_tree::_slp_tree): Adjust.
(_slp_tree::~_slp_tree): Likewise.
(vect_mask_constant_operand_p): Remove unused function.
(vect_get_constant_vectors): Rename to...
(vect_create_constant_vectors): ... this. Take the
invariant node as argument and code generate it. Remove
dead code, remove temporary asserts. Pass a NULL stmt_info
to vect_init_vector.
(vect_get_slp_defs): Simplify.
(vect_schedule_slp_instance): Code-generate externals and
invariants using vect_create_constant_vectors.
|
|
This tries to enforce a set SLP_TREE_VECTYPE in vect_get_constant_vectors
and provides some infrastructure for setting it in the vectorizable_*
functions, amending those.
2020-05-22 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_is_simple_use): New overload.
(vect_maybe_update_slp_op_vectype): New.
* tree-vect-stmts.c (vect_is_simple_use): New overload
accessing operands of SLP vs. non-SLP operation transparently.
(vect_maybe_update_slp_op_vectype): New function updating
the possibly shared SLP operands vector type.
(vectorizable_operation): Be a bit more SLP vs non-SLP agnostic
using the new vect_is_simple_use overload; update SLP invariant
operand nodes vector type.
(vectorizable_comparison): Likewise.
(vectorizable_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_store): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_assignment): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
* tree-vect-slp.c (vect_get_constant_vectors): Enforce
present SLP_TREE_VECTYPE and check it matches previous
behavior.
|
|
This adds a constructor and destructor to slp_tree, factoring out
common code. I've not changed the wrappers to overloaded CTORs since
I hope to use object_allocator<> and am not sure whether that can
be done in any fancy way yet.
2020-05-22 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::_slp_tree): New.
(_slp_tree::~_slp_tree): Likewise.
* tree-vect-slp.c (_slp_tree::_slp_tree): Factor out code
from allocators.
(_slp_tree::~_slp_tree): Implement.
(vect_free_slp_tree): Simplify.
(vect_create_new_slp_node): Likewise. Add nops parameter.
(vect_build_slp_tree_2): Adjust.
(vect_analyze_slp_instance): Likewise.
|
|
2020-05-19 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::vectype): Add field.
(SLP_TREE_VECTYPE): New.
* tree-vect-slp.c (vect_create_new_slp_node): Initialize
SLP_TREE_VECTYPE.
(vect_create_new_slp_node): Likewise.
(vect_prologue_cost_for_slp): Move here from tree-vect-stmts.c
and simplify.
(vect_slp_analyze_node_operations): Walk nodes children for
invariant costing.
(vect_get_constant_vectors): Use local scope op variable.
* tree-vect-stmts.c (vect_prologue_cost_for_slp_op): Remove here.
(vect_model_simple_cost): Adjust.
(vect_model_store_cost): Likewise.
(vectorizable_store): Likewise.
|
|
This adds a vectype parameter to add_stmt_cost, which avoids the need
to pass down a (wrong) stmt_info just to carry this information.
It is useful for invariants, which do not have an associated stmt_info.
2020-05-13 Richard Biener <rguenther@suse.de>
* target.def (add_stmt_cost): Add new vectype parameter.
* targhooks.c (default_add_stmt_cost): Adjust.
* targhooks.h (default_add_stmt_cost): Likewise.
* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Take new
vectype parameter.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
* tree-vectorizer.h (stmt_info_for_cost::vectype): Add.
(dump_stmt_cost): Add new vectype parameter.
(add_stmt_cost): Likewise.
(record_stmt_cost): Likewise.
(record_stmt_cost): Add overload with old signature.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
Adjust.
(vect_get_known_peeling_cost): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
* tree-vectorizer.c (dump_stmt_cost): Add new vectype parameter.
* tree-vect-stmts.c (record_stmt_cost): Likewise.
(vect_prologue_cost_for_slp_op): Remove stmt_vec_info parameter
and pass down correct vectype and NULL stmt_info.
(vect_model_simple_cost): Adjust.
(vect_model_store_cost): Likewise.
|
|
This removes the SLP_INSTANCE_GROUP_SIZE member since the number of
lanes throughout an SLP subgraph is not necessarily constant.
2020-05-13 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (SLP_INSTANCE_GROUP_SIZE): Remove.
(_slp_instance::group_size): Likewise.
* tree-vect-loop.c (vectorizable_reduction): The group size
is the number of lanes in the node.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Likewise.
(vect_analyze_slp_instance): Do not set SLP_INSTANCE_GROUP_SIZE,
verify it matches the instance trees number of lanes.
(vect_slp_analyze_node_operations_1): Use the number of lanes
in the node as group size.
(vect_bb_vectorization_profitable_p): Use the instance root
number of lanes for the size of life.
(vect_schedule_slp_instance): Use the number of lanes as
group_size.
* tree-vect-stmts.c (vectorizable_load): Remove SLP instance
parameter. Use the number of lanes of the load for the group
size in the gap adjustment code.
(vect_analyze_stmt): Adjust.
(vect_transform_stmt): Likewise.
|
|
This delays the SLP permutation check to vectorizable_load and optimizes
permutations only after all SLP instances have been generated and the
vectorization factor is determined.
2020-05-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vec_info::slp_loads): New.
(vect_optimize_slp): Declare.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Do
nothing when there are no loads.
(vect_gather_slp_loads): Gather loads into a vector.
(vect_supported_load_permutation_p): Remove.
(vect_analyze_slp_instance): Do not verify permutation
validity here.
(vect_analyze_slp): Optimize permutations of reductions
after all SLP instances have been gathered and gather
all loads.
(vect_optimize_slp): New function split out from
vect_supported_load_permutation_p. Elide some permutations.
(vect_slp_analyze_bb_1): Call vect_optimize_slp.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
* tree-vect-stmts.c (vectorizable_load): Check whether
the load can be permuted. When generating code assert we can.
* gcc.dg/vect/bb-slp-pr68892.c: Adjust for not supported
SLP permutations becoming builds from scalars.
* gcc.dg/vect/bb-slp-pr78205.c: Likewise.
* gcc.dg/vect/bb-slp-34.c: Likewise.
|
|
This removes trivial instances of SLP_INSTANCE_GROUP_SIZE and refrains
from using a "SLP instance" which nowadays is just one of the possibly
many entries into the SLP graph.
2020-05-06 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_transform_slp_perm_load): Adjust.
* tree-vect-data-refs.c (vect_slp_analyze_node_dependences):
Remove slp_instance parameter, just iterate over all scalar stmts.
(vect_slp_analyze_instance_dependence): Adjust and likewise.
* tree-vect-slp.c (vect_bb_slp_scalar_cost): Remove unused BB
parameter.
(vect_schedule_slp): Just iterate over all scalar stmts.
(vect_supported_load_permutation_p): Adjust.
(vect_transform_slp_perm_load): Remove slp_instance parameter,
instead use the number of lanes in the node as group size.
* tree-vect-stmts.c (vect_model_load_cost): Get vectorization
factor instead of slp_instance as parameter.
(vectorizable_load): Adjust.
|
|
Soonish we'll get SLP nodes which have no corresponding scalar
stmt and thus no stmt_vec_info, and thus no way to get back to
the associated vec_info. This patch makes the vec_info available
as part of the APIs instead of putting that back-pointer into
the leaf data structures.
2020-05-05 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_stmt_vec_info::vinfo): Remove.
(STMT_VINFO_LOOP_VINFO): Likewise.
(STMT_VINFO_BB_VINFO): Likewise.
* tree-vect-data-refs.c: Adjust for the above, adding vec_info *
parameters and adjusting calls.
* tree-vect-loop-manip.c: Likewise.
* tree-vect-loop.c: Likewise.
* tree-vect-patterns.c: Likewise.
* tree-vect-slp.c: Likewise.
* tree-vect-stmts.c: Likewise.
* tree-vectorizer.c: Likewise.
* target.def (add_stmt_cost): Add vec_info * parameter.
* target.h (stmt_in_inner_loop_p): Likewise.
* targhooks.c (default_add_stmt_cost): Adjust.
* doc/tm.texi: Re-generate.
* config/aarch64/aarch64.c (aarch64_extending_load_p): Add
vec_info * parameter and adjust.
(aarch64_sve_adjust_stmt_cost): Likewise.
(aarch64_add_stmt_cost): Likewise.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
|
|
When versioning is run the IL is already mangled and finding
a VECTORIZED_CALL IFN can fail.
2020-01-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/93094
* tree-vectorizer.h (vect_loop_versioning): Adjust.
(vect_transform_loop): Likewise.
* tree-vectorizer.c (try_vectorize_loop_1): Pass down
loop_vectorized_call to vect_transform_loop.
* tree-vect-loop.c (vect_transform_loop): Pass down
loop_vectorized_call to vect_loop_versioning.
* tree-vect-loop-manip.c (vect_loop_versioning): Use
the earlier discovered loop_vectorized_call.
* gcc.dg/vect/pr93094.c: New testcase.
|
|
gcc/cp/ChangeLog:
* cp-gimplify.c (source_location_table_entry_hash::empty_zero_p):
New static constant.
* cp-tree.h (named_decl_hash::empty_zero_p): Likewise.
(struct named_label_hash::empty_zero_p): Likewise.
* decl2.c (mangled_decl_hash::empty_zero_p): Likewise.
gcc/ChangeLog:
* attribs.c (excl_hash_traits::empty_zero_p): New static constant.
* gcov.c (function_start_pair_hash::empty_zero_p): Likewise.
* graphite.c (struct sese_scev_hash::empty_zero_p): Likewise.
* hash-map-tests.c (selftest::test_nonzero_empty_key): New selftest.
(selftest::hash_map_tests_c_tests): Call it.
* hash-map-traits.h (simple_hashmap_traits::empty_zero_p):
New static constant, using the value of H::empty_zero_p.
(unbounded_hashmap_traits::empty_zero_p): Likewise, using the value
from default_hash_traits <Value>.
* hash-map.h (hash_map::empty_zero_p): Likewise, using the value
from Traits.
* hash-set-tests.c (value_hash_traits::empty_zero_p): Likewise.
* hash-table.h (hash_table::alloc_entries): Guard the loop of
calls to mark_empty with !Descriptor::empty_zero_p.
(hash_table::empty_slow): Conditionalize the memset call with a
check that Descriptor::empty_zero_p; otherwise, loop through the
entries calling mark_empty on them.
* hash-traits.h (int_hash::empty_zero_p): New static constant.
(pointer_hash::empty_zero_p): Likewise.
(pair_hash::empty_zero_p): Likewise.
* ipa-devirt.c (default_hash_traits <type_pair>::empty_zero_p):
Likewise.
* ipa-prop.c (ipa_bit_ggc_hash_traits::empty_zero_p): Likewise.
(ipa_vr_ggc_hash_traits::empty_zero_p): Likewise.
* profile.c (location_triplet_hash::empty_zero_p): Likewise.
* sanopt.c (sanopt_tree_triplet_hash::empty_zero_p): Likewise.
(sanopt_tree_couple_hash::empty_zero_p): Likewise.
* tree-hasher.h (int_tree_hasher::empty_zero_p): Likewise.
* tree-ssa-sccvn.c (vn_ssa_aux_hasher::empty_zero_p): Likewise.
* tree-vect-slp.c (bst_traits::empty_zero_p): Likewise.
* tree-vectorizer.h
(default_hash_traits<scalar_cond_masked_key>::empty_zero_p):
Likewise.
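A hedged sketch of the trait being added throughout (illustrative
only, not one of the traits touched above): when a descriptor's empty
value is all-zero bytes, hash_table can clear storage with a single
memset instead of calling mark_empty on every slot:
  /* Hypothetical traits type built on GCC's int_hash.  */
  struct my_int_traits : int_hash <int, /* Empty = */ 0, /* Deleted = */ -1>
  {
    /* The empty key is literal zero, so zeroed memory already is
       a valid empty table.  */
    static const bool empty_zero_p = true;
  };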
|
|
gcc/ChangeLog:
2020-01-10 Andre Vieira <andre.simoesdiasvieira@arm.com>
* tree-vectorizer.h (get_dr_vinfo_offset): Add missing function
comment.
From-SVN: r280108
|
|
gcc/ChangeLog:
2020-01-10 Andre Vieira <andre.simoesdiasvieira@arm.com>
* tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Use
get_dr_vinfo_offset.
* tree-vect-loop.c (update_epilogue_loop_vinfo): Remove orig_drs_init
parameter and its use to reset DR_OFFSETs.
(vect_transform_loop): Remove orig_drs_init argument.
* tree-vect-loop-manip.c (vect_update_init_of_dr): Update the offset
member of dr_vec_info rather than the offset of the associated
data_reference's innermost_loop_behavior.
(vect_update_init_of_dr): Pass dr_vec_info instead of data_reference.
(vect_do_peeling): Remove orig_drs_init parameter and its construction.
* tree-vect-stmts.c (check_scan_store): Replace use of DR_OFFSET with
get_dr_vinfo_offset.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
From-SVN: r280107
|
|
search_type_for_mask uses a worklist to search a chain of boolean
operations for a natural vector mask type. This patch instead does
that in vect_determine_stmt_precisions, where we also look for
overpromoted integer operations. We then only need to compute
the precision once and can cache it in the stmt_vec_info.
The new function vect_determine_mask_precision is supposed
to handle exactly the same cases as search_type_for_mask_1,
and in the same way. There's a lot we could improve here,
but that's not stage 3 material.
I wondered about sharing mask_precision with other fields like
operation_precision, but in the end that seemed too dangerous.
We have patterns to convert between boolean and non-boolean
operations and it would be very easy to get mixed up about
which case the fields are describing.
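An assumed example (not from the commit) of the kind of boolean chain
the analysis walks: the two comparisons have different natural mask
precisions, and vect_determine_mask_precision has to settle on one
for the combined condition:
  void
  f (int *restrict a, int *restrict b, short *restrict c, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] = (b[i] < 16 && c[i] > 3) ? 5 : 7;
  }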
2019-11-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (stmt_vec_info::mask_precision): New field.
(vect_use_mask_type_p): New function.
* tree-vect-patterns.c (vect_init_pattern_stmt): Copy the
mask precision to the pattern statement.
(append_pattern_def_seq): Add a scalar_type_for_mask parameter
and use it to initialize the new stmt's mask precision.
(search_type_for_mask_1): Delete.
(search_type_for_mask): Replace with...
(integer_type_for_mask): ...this new function. Use the information
cached in the stmt_vec_info.
(vect_recog_bool_pattern): Update accordingly.
(build_mask_conversion): Pass the scalar type associated with the
mask type to append_pattern_def_seq.
(vect_recog_mask_conversion_pattern): Likewise. Call
integer_type_for_mask instead of search_type_for_mask.
(vect_convert_mask_for_vectype): Call integer_type_for_mask instead
of search_type_for_mask.
(possible_vector_mask_operation_p): New function.
(vect_determine_mask_precision): Likewise.
(vect_determine_stmt_precisions): Call it.
From-SVN: r278850
|
|
This patch makes vect_get_mask_type_for_stmt and
get_mask_type_for_scalar_type take a group size instead of
the SLP node, so that later patches can call it before an
SLP node has been built.
2019-11-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (get_mask_type_for_scalar_type): Replace
the slp_tree parameter with a group size parameter.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vect-stmts.c (get_mask_type_for_scalar_type): Likewise.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vect-slp.c (vect_slp_analyze_node_operations_1): Update
call accordingly.
From-SVN: r278849
|
|
This patch adds a mode in which the vectoriser tries each available
base vector mode and picks the one with the lowest cost. The new
behaviour is selected by autovectorize_vector_modes.
The patch keeps the current behaviour of preferring a VF of
loop->simdlen over any larger or smaller VF, regardless of costs
or target preferences.
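A hedged sketch of a target opting in, with the hook shape this patch
introduces (the pushed modes are illustrative, not a real port's
choice):
  static unsigned int
  example_autovectorize_vector_modes (vector_modes *modes, bool)
  {
    modes->safe_push (V16QImode);  /* try 128-bit vectors */
    modes->safe_push (V8QImode);   /* try 64-bit vectors */
    /* Ask the vectoriser to cost each mode and pick the cheapest,
       rather than taking the first that works.  */
    return VECT_COMPARE_COSTS;
  }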
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.h (VECT_COMPARE_COSTS): New constant.
* target.def (autovectorize_vector_modes): Return a bitmask of flags.
* doc/tm.texi: Regenerate.
* targhooks.h (default_autovectorize_vector_modes): Update accordingly.
* targhooks.c (default_autovectorize_vector_modes): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
Likewise.
* config/arc/arc.c (arc_autovectorize_vector_modes): Likewise.
* config/arm/arm.c (arm_autovectorize_vector_modes): Likewise.
* config/i386/i386.c (ix86_autovectorize_vector_modes): Likewise.
* config/mips/mips.c (mips_autovectorize_vector_modes): Likewise.
* tree-vectorizer.h (_loop_vec_info::vec_outside_cost)
(_loop_vec_info::vec_inside_cost): New member variables.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them.
(vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions.
(vect_analyze_loop): When autovectorize_vector_modes returns
VECT_COMPARE_COSTS, try vectorizing the loop with each available
vector mode and picking the one with the lowest cost.
(vect_estimate_min_profitable_iters): Record the computed costs
in the loop_vec_info.
From-SVN: r278336
|
|
This patch makes can_duplicate_and_interleave_p cope with mixtures of
vector sizes, by using queries based on get_vectype_for_scalar_type
instead of directly querying GET_MODE_SIZE (vinfo->vector_mode).
int_mode_for_size is now the first check we do for a candidate mode,
so it seemed better to restrict it to MAX_FIXED_MODE_SIZE. This avoids
unnecessary work and avoids trying to create scalar types that the
target might not support.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (can_duplicate_and_interleave_p): Take an
element type rather than an element mode.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
Use get_vectype_for_scalar_type to query the natural types
for a given element type rather than basing everything on
GET_MODE_SIZE (vinfo->vector_mode). Limit int_mode_for_size
query to MAX_FIXED_MODE_SIZE.
(duplicate_and_interleave): Update call accordingly.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
From-SVN: r278335
|
|
The BB vectoriser picked vector types in the same way as the loop
vectoriser: it picked a vector mode/size for the region and then
based all the vector types off that choice. This meant we could
end up trying to use vector types that had too many elements for
the group size.
The main part of this patch is therefore about passing the SLP
group size down to routines like get_vectype_for_scalar_type and
ensuring that each vector type in the SLP tree is chosen wrt the
group size. That part in itself is pretty easy and mechanical.
The main warts are:
(1) We normally pick a STMT_VINFO_VECTYPE for data references at an
early stage (vect_analyze_data_refs). However, nothing in the
BB vectoriser relied on this, or on the min_vf calculated from it.
I couldn't see anything other than vect_recog_bool_pattern that
tried to access the vector type before the SLP tree is built.
(2) It's possible for the same statement to be used in groups of
different sizes. Taking the group size into account meant that
we could try to pick different vector types for the same statement.
This problem should go away with the move to doing everything on
SLP trees, where presumably we would attach the vector type to the
SLP node rather than the stmt_vec_info. Until then, the patch just
uses a first-come, first-served approach.
(3) A similar problem exists for grouped data references, where
different statements in the same dataref group could be used
in SLP nodes that have different group sizes. The patch copes
with that by making sure that all vector types in a dataref
group remain consistent.
The patch means that:
void
f (int *x, short *y)
{
  x[0] += y[0];
  x[1] += y[1];
  x[2] += y[2];
  x[3] += y[3];
}
now produces:
        ldr     q0, [x0]
        ldr     d1, [x1]
        saddw   v0.4s, v0.4s, v1.4h
        str     q0, [x0]
        ret
instead of:
        ldrsh   w2, [x1]
        ldrsh   w3, [x1, 2]
        fmov    s0, w2
        ldrsh   w2, [x1, 4]
        ldrsh   w1, [x1, 6]
        ins     v0.s[1], w3
        ldr     q1, [x0]
        ins     v0.s[2], w2
        ins     v0.s[3], w1
        add     v0.4s, v0.4s, v1.4s
        str     q0, [x0]
        ret
Unfortunately it also means we start to vectorise
gcc.target/i386/pr84101.c for -m32. That seems like a target
cost issue though; see PR92265 for details.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vect_get_vector_types_for_stmt): Take an
optional maximum nunits.
(get_vectype_for_scalar_type): Likewise. Also declare a form that
takes an slp_tree.
(get_mask_type_for_scalar_type): Take an optional slp_tree.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vect-data-refs.c (vect_analyze_data_refs): Don't store
the vector type in STMT_VINFO_VECTYPE for BB vectorization.
* tree-vect-patterns.c (vect_recog_bool_pattern): Use
vect_get_vector_types_for_stmt instead of STMT_VINFO_VECTYPE
to get an assumed vector type for data references.
* tree-vect-slp.c (vect_update_shared_vectype): New function.
(vect_update_all_shared_vectypes): Likewise.
(vect_build_slp_tree_1): Pass the group size to
vect_get_vector_types_for_stmt. Use vect_update_shared_vectype
for BB vectorization.
(vect_build_slp_tree_2): Call vect_update_all_shared_vectypes
before building the vector from scalars.
(vect_analyze_slp_instance): Pass the group size to
get_vectype_for_scalar_type.
(vect_slp_analyze_node_operations_1): Don't recompute the vector
types for BB vectorization here; just handle the case in which
we deferred the choice for booleans.
(vect_get_constant_vectors): Pass the slp_tree to
get_vectype_for_scalar_type.
* tree-vect-stmts.c (vect_prologue_cost_for_slp_op): Likewise.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_comparison): Likewise.
(vect_is_simple_cond): Take the slp_tree as argument and
pass it to get_vectype_for_scalar_type.
(vectorizable_condition): Update call accordingly.
(get_vectype_for_scalar_type): Take a group_size argument.
For BB vectorization, limit the vector to that number
of elements. Also define an overload that takes an slp_tree.
(get_mask_type_for_scalar_type): Add an slp_tree argument and
pass it to get_vectype_for_scalar_type.
(vect_get_vector_types_for_stmt): Add a group_size argument
and pass it to get_vectype_for_scalar_type. Don't use the
cached vector type for BB vectorization if a group size is given.
Handle data references in that case.
(vect_get_mask_type_for_stmt): Take an slp_tree argument and
pass it to get_mask_type_for_scalar_type.
gcc/testsuite/
* gcc.dg/vect/bb-slp-4.c: Expect the block to be vectorized
with -fno-vect-cost-model.
* gcc.dg/vect/bb-slp-bool-1.c: New test.
* gcc.target/aarch64/vect_mixed_sizes_14.c: Likewise.
* gcc.target/i386/pr84101.c: XFAIL for -m32.
From-SVN: r278334
|
|
A later patch makes the AArch64 port add four entries to
autovectorize_vector_modes. Each entry describes a different
vector mode assignment for vector code that mixes 8-bit, 16-bit,
32-bit and 64-bit elements. But if (as usual) the vector code has
fewer element sizes than that, we could end up trying the same
combination of vector modes multiple times. This patch adds a
check to prevent that.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vec_info::mode_set): New typedef.
(vec_info::used_vector_modes): New member variable.
(vect_chooses_same_modes_p): Declare.
* tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
chosen vector mode in vec_info::used_vector_modes.
(vect_chooses_same_modes_p): New function.
* tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
the same vector statements multiple times.
* tree-vect-slp.c (vect_slp_bb_region): Likewise.
From-SVN: r278242
|
|
After previous patches, it's now possible to make the vectoriser
support multiple vector sizes in the same vector region, using
related_vector_mode to pick the right vector mode for a given
element mode. No port yet takes advantage of this, but I have
a follow-on patch for AArch64.
This patch also seemed like a good opportunity to add some more dump
messages: one to make it clear which vector size/mode was being used
when analysis passed or failed, and another to say when we've decided
to skip a redundant vector size/mode.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* machmode.h (opt_machine_mode::operator==): New function.
(opt_machine_mode::operator!=): Likewise.
* tree-vectorizer.h (vec_info::vector_mode): Update comment.
(get_related_vectype_for_scalar_type): Declare.
(get_vectype_for_scalar_type_and_size): Delete.
* tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
whether analysis passed or failed, and with what vector modes.
Use related_vector_mode to check whether trying a particular
vector mode would be redundant with the autodetected mode,
and print a dump message if we decide to skip it.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
(vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type instead of
get_vectype_for_scalar_type_and_size.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
with...
(get_related_vectype_for_scalar_type): ...this new function.
Take a starting/"prevailing" vector mode rather than a vector size.
Take an optional nunits argument, with the same meaning as for
related_vector_mode. Use related_vector_mode when not
auto-detecting a mode, falling back to mode_for_vector if no
target mode exists.
(get_vectype_for_scalar_type): Update accordingly.
(get_same_sized_vectype): Likewise.
* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
From-SVN: r278240
|
|
This patch replaces vec_info::vector_size with vec_info::vector_mode,
but for now continues to use it as a way of specifying a single
vector size. This makes it easier for later patches to use
related_vector_mode instead.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vec_info::vector_size): Replace with...
(vec_info::vector_mode): ...this new field.
* tree-vect-loop.c (vect_update_vf_for_slp): Update accordingly.
(vect_analyze_loop, vect_transform_loop): Likewise.
* tree-vect-loop-manip.c (vect_do_peeling): Likewise.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
(vect_make_slp_decision, vect_slp_bb_region): Likewise.
* tree-vect-stmts.c (get_vectype_for_scalar_type): Likewise.
* tree-vectorizer.c (try_vectorize_loop_1): Likewise.
gcc/testsuite/
* gcc.dg/vect/vect-tail-nomask-1.c: Update expected epilogue
vectorization message.
From-SVN: r278237
|
|
Callers of vect_halve_mask_nunits and vect_double_mask_nunits
already know what mode the resulting vector type should have,
so we might as well create the vector type directly with that mode,
just like build_vector_type_for_mode lets us build normal vectors
with a known mode. This avoids the current awkwardness of having
to recompute the mode starting from vec_info::vector_size, which
hard-codes the assumption that all vectors have to be the same size.
A later patch gets rid of build_truth_vector_type and
build_same_sized_truth_vector_type, so the net effect of the
series is to reduce the number of type functions by one.
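A small usage sketch (the helper and its arguments are assumed, not
from the patch) of the new entry point, mirroring how
build_vector_type_for_mode works for normal vectors:
  /* Hypothetical helper; nunits and mask_mode come from the caller,
     e.g. from the unpack pattern's insn data.  */
  static tree
  make_mask_type (poly_uint64 nunits, machine_mode mask_mode)
  {
    return build_truth_vector_type_for_mode (nunits, mask_mode);
  }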
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree.h (build_truth_vector_type_for_mode): Declare.
* tree.c (build_truth_vector_type_for_mode): New function,
split out from...
(build_truth_vector_type): ...here.
(build_opaque_vector_type): Fix head comment.
* tree-vectorizer.h (supportable_narrowing_operation): Remove
vec_info parameter.
(vect_halve_mask_nunits): Replace vec_info parameter with the
mode of the new vector.
(vect_double_mask_nunits): Likewise.
* tree-vect-loop.c (vect_halve_mask_nunits): Likewise.
(vect_double_mask_nunits): Likewise.
* tree-vect-loop-manip.c: Include insn-config.h, rtl.h and recog.h.
(vect_maybe_permute_loop_masks): Remove vinfo parameter. Update call
to vect_halve_mask_nunits, getting the required mode from the unpack
patterns.
(vect_set_loop_condition_masked): Update call accordingly.
* tree-vect-stmts.c (supportable_narrowing_operation): Remove vec_info
parameter and update call to vect_double_mask_nunits.
(vectorizable_conversion): Update call accordingly.
(simple_integer_narrowing): Likewise. Remove vec_info parameter.
(vectorizable_call): Update call accordingly.
(supportable_widening_operation): Update call to
vect_halve_mask_nunits.
* config/aarch64/aarch64-sve-builtins.cc (register_builtin_types):
Use build_truth_vector_type_for_mode instead of build_truth_vector_type.
From-SVN: r278231
|
|
vect_analyze_loop_costing uses two profitability thresholds: a runtime
one and a static compile-time one. The runtime one is simply the point
at which the vector loop is cheaper than the scalar loop, while the
static one also takes into account the cost of choosing between the
scalar and vector loops at runtime. We compare this static cost against
the expected execution frequency to decide whether it's worth generating
any vector code at all.
However, we never reclaimed the cost of applying the runtime threshold
if it turned out that the vector code can always be used. And we only
know whether that's true once we've calculated what the runtime
threshold would be.
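A hedged sketch of the new predicate's shape (argument names assumed;
the real function takes a loop_vec_info): the runtime check only
matters when the iteration count is unknown at compile time or the
profitability threshold exceeds what the vectorization factor already
guarantees:
  static bool
  apply_runtime_check_p (bool niters_known_p, unsigned int threshold,
                         unsigned int vf_for_cost)
  {
    return !niters_known_p || threshold >= vf_for_cost;
  }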
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vect_apply_runtime_profitability_check_p):
New function.
* tree-vect-loop-manip.c (vect_loop_versioning): Use it.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
(vect_transform_loop): Likewise.
(vect_analyze_loop_costing): Don't take the cost of versioning
into account for the static profitability threshold if it turns
out that no versioning is needed.
From-SVN: r278124
|
|
vectorizable_assignment handles true SSA-to-SSA copies (which hopefully
we don't see in practice) and no-op conversions that are required
to maintain correct gimple, such as changes between signed and
unsigned types. These cases shouldn't generate any code and so
shouldn't count against either the scalar or vector costs.
Later patches test this, but it seemed worth splitting out.
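An assumed example, in the spirit of the description, of a conversion
that keeps gimple type-correct yet emits no machine code and so should
cost nothing:
  unsigned int
  as_unsigned (int x)
  {
    return x;  /* signed -> unsigned of the same width: no code.  */
  }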
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vect_nop_conversion_p): Declare.
* tree-vect-stmts.c (vect_nop_conversion_p): New function.
(vectorizable_assignment): Don't add a cost for nop conversions.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
Likewise.
* tree-vect-slp.c (vect_bb_slp_scalar_cost): Likewise.
From-SVN: r278122
|
|
2019-11-12 Martin Liska <mliska@suse.cz>
* asan.c (asan_sanitize_stack_p): Replace old parameter syntax
with the new one, include opts.h if needed. Use SET_OPTION_IF_UNSET
macro.
(asan_sanitize_allocas_p): Likewise.
(asan_emit_stack_protection): Likewise.
(asan_protect_global): Likewise.
(instrument_derefs): Likewise.
(instrument_builtin_call): Likewise.
(asan_expand_mark_ifn): Likewise.
* auto-profile.c (auto_profile): Likewise.
* bb-reorder.c (copy_bb_p): Likewise.
(duplicate_computed_gotos): Likewise.
* builtins.c (inline_expand_builtin_string_cmp): Likewise.
* cfgcleanup.c (try_crossjump_to_edge): Likewise.
(try_crossjump_bb): Likewise.
* cfgexpand.c (defer_stack_allocation): Likewise.
(stack_protect_classify_type): Likewise.
(pass_expand::execute): Likewise.
* cfgloopanal.c (expected_loop_iterations_unbounded): Likewise.
(estimate_reg_pressure_cost): Likewise.
* cgraph.c (cgraph_edge::maybe_hot_p): Likewise.
* combine.c (combine_instructions): Likewise.
(record_value_for_reg): Likewise.
* common/config/aarch64/aarch64-common.c (aarch64_option_validate_param): Likewise.
(aarch64_option_default_params): Likewise.
* common/config/ia64/ia64-common.c (ia64_option_default_params): Likewise.
* common/config/powerpcspe/powerpcspe-common.c (rs6000_option_default_params): Likewise.
* common/config/rs6000/rs6000-common.c (rs6000_option_default_params): Likewise.
* common/config/sh/sh-common.c (sh_option_default_params): Likewise.
* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Likewise.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_expand_epilogue): Likewise.
(aarch64_override_options_internal): Likewise.
* config/alpha/alpha.c (alpha_option_override): Likewise.
* config/arm/arm.c (arm_option_override): Likewise.
(arm_valid_target_attribute_p): Likewise.
* config/i386/i386-options.c (ix86_option_override_internal): Likewise.
* config/i386/i386.c (get_probe_interval): Likewise.
(ix86_adjust_stack_and_probe_stack_clash): Likewise.
(ix86_max_noce_ifcvt_seq_cost): Likewise.
* config/ia64/ia64.c (ia64_adjust_cost): Likewise.
* config/rs6000/rs6000-logue.c (get_stack_clash_protection_probe_interval): Likewise.
(get_stack_clash_protection_guard_size): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise.
* config/s390/s390.c (allocate_stack_space): Likewise.
(s390_emit_prologue): Likewise.
(s390_option_override_internal): Likewise.
* config/sparc/sparc.c (sparc_option_override): Likewise.
* config/visium/visium.c (visium_option_override): Likewise.
* coverage.c (get_coverage_counts): Likewise.
(coverage_compute_profile_id): Likewise.
(coverage_begin_function): Likewise.
(coverage_end_function): Likewise.
* cse.c (cse_find_path): Likewise.
(cse_extended_basic_block): Likewise.
(cse_main): Likewise.
* cselib.c (cselib_invalidate_mem): Likewise.
* dse.c (dse_step1): Likewise.
* emit-rtl.c (set_new_first_and_last_insn): Likewise.
(get_max_insn_count): Likewise.
(make_debug_insn_raw): Likewise.
(init_emit): Likewise.
* explow.c (compute_stack_clash_protection_loop_data): Likewise.
* final.c (compute_alignments): Likewise.
* fold-const.c (fold_range_test): Likewise.
(fold_truth_andor): Likewise.
(tree_single_nonnegative_warnv_p): Likewise.
(integer_valued_real_single_p): Likewise.
* gcse.c (want_to_gcse_p): Likewise.
(prune_insertions_deletions): Likewise.
(hoist_code): Likewise.
(gcse_or_cprop_is_too_expensive): Likewise.
* ggc-common.c: Likewise.
* ggc-page.c (ggc_collect): Likewise.
* gimple-loop-interchange.cc (MAX_NUM_STMT): Likewise.
(MAX_DATAREFS): Likewise.
(OUTER_STRIDE_RATIO): Likewise.
* gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise.
* gimple-loop-versioning.cc (loop_versioning::max_insns_for_loop): Likewise.
* gimple-ssa-split-paths.c (is_feasible_trace): Likewise.
* gimple-ssa-store-merging.c (imm_store_chain_info::try_coalesce_bswap): Likewise.
(imm_store_chain_info::coalesce_immediate_stores): Likewise.
(imm_store_chain_info::output_merged_store): Likewise.
(pass_store_merging::process_store): Likewise.
* gimple-ssa-strength-reduction.c (find_basis_for_base_expr): Likewise.
* graphite-isl-ast-to-gimple.c (class translate_isl_ast_to_gimple): Likewise.
(scop_to_isl_ast): Likewise.
* graphite-optimize-isl.c (get_schedule_for_node_st): Likewise.
(optimize_isl): Likewise.
* graphite-scop-detection.c (build_scops): Likewise.
* haifa-sched.c (set_modulo_params): Likewise.
(rank_for_schedule): Likewise.
(model_add_to_worklist): Likewise.
(model_promote_insn): Likewise.
(model_choose_insn): Likewise.
(queue_to_ready): Likewise.
(autopref_multipass_dfa_lookahead_guard): Likewise.
(schedule_block): Likewise.
(sched_init): Likewise.
* hsa-gen.c (init_prologue): Likewise.
* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Likewise.
(cond_move_process_if_block): Likewise.
* ipa-cp.c (ipcp_lattice::add_value): Likewise.
(merge_agg_lats_step): Likewise.
(devirtualization_time_bonus): Likewise.
(hint_time_bonus): Likewise.
(incorporate_penalties): Likewise.
(good_cloning_opportunity_p): Likewise.
(ipcp_propagate_stage): Likewise.
* ipa-fnsummary.c (decompose_param_expr): Likewise.
(set_switch_stmt_execution_predicate): Likewise.
(analyze_function_body): Likewise.
(compute_fn_summary): Likewise.
* ipa-inline-analysis.c (estimate_growth): Likewise.
* ipa-inline.c (caller_growth_limits): Likewise.
(inline_insns_single): Likewise.
(inline_insns_auto): Likewise.
(can_inline_edge_by_limits_p): Likewise.
(want_early_inline_function_p): Likewise.
(big_speedup_p): Likewise.
(want_inline_small_function_p): Likewise.
(want_inline_self_recursive_call_p): Likewise.
(edge_badness): Likewise.
(recursive_inlining): Likewise.
(compute_max_insns): Likewise.
(early_inliner): Likewise.
* ipa-polymorphic-call.c (csftc_abort_walking_p): Likewise.
* ipa-profile.c (ipa_profile): Likewise.
* ipa-prop.c (determine_known_aggregate_parts): Likewise.
(ipa_analyze_node): Likewise.
(ipcp_transform_function): Likewise.
* ipa-split.c (consider_split): Likewise.
* ipa-sra.c (allocate_access): Likewise.
(process_scan_results): Likewise.
(ipa_sra_summarize_function): Likewise.
(pull_accesses_from_callee): Likewise.
* ira-build.c (loop_compare_func): Likewise.
(mark_loops_for_removal): Likewise.
* ira-conflicts.c (build_conflict_bit_table): Likewise.
* loop-doloop.c (doloop_optimize): Likewise.
* loop-invariant.c (gain_for_invariant): Likewise.
(move_loop_invariants): Likewise.
* loop-unroll.c (decide_unroll_constant_iterations): Likewise.
(decide_unroll_runtime_iterations): Likewise.
(decide_unroll_stupid): Likewise.
(expand_var_during_unrolling): Likewise.
* lra-assigns.c (spill_for): Likewise.
* lra-constraints.c (EBB_PROBABILITY_CUTOFF): Likewise.
* modulo-sched.c (sms_schedule): Likewise.
(DFA_HISTORY): Likewise.
* opts.c (default_options_optimization): Likewise.
(finish_options): Likewise.
(common_handle_option): Likewise.
* postreload-gcse.c (eliminate_partially_redundant_load): Likewise.
(if): Likewise.
* predict.c (get_hot_bb_threshold): Likewise.
(maybe_hot_count_p): Likewise.
(probably_never_executed): Likewise.
(predictable_edge_p): Likewise.
(predict_loops): Likewise.
(expr_expected_value_1): Likewise.
(tree_predict_by_opcode): Likewise.
(handle_missing_profiles): Likewise.
* reload.c (find_equiv_reg): Likewise.
* reorg.c (redundant_insn): Likewise.
* resource.c (mark_target_live_regs): Likewise.
(incr_ticks_for_insn): Likewise.
* sanopt.c (pass_sanopt::execute): Likewise.
* sched-deps.c (sched_analyze_1): Likewise.
(sched_analyze_2): Likewise.
(sched_analyze_insn): Likewise.
(deps_analyze_insn): Likewise.
* sched-ebb.c (schedule_ebbs): Likewise.
* sched-rgn.c (find_single_block_region): Likewise.
(too_large): Likewise.
(haifa_find_rgns): Likewise.
(extend_rgns): Likewise.
(new_ready): Likewise.
(schedule_region): Likewise.
(sched_rgn_init): Likewise.
* sel-sched-ir.c (make_region_from_loop): Likewise.
* sel-sched-ir.h (MAX_WS): Likewise.
* sel-sched.c (process_pipelined_exprs): Likewise.
(sel_setup_region_sched_flags): Likewise.
* shrink-wrap.c (try_shrink_wrapping): Likewise.
* targhooks.c (default_max_noce_ifcvt_seq_cost): Likewise.
* toplev.c (print_version): Likewise.
(process_options): Likewise.
* tracer.c (tail_duplicate): Likewise.
* trans-mem.c (tm_log_add): Likewise.
* tree-chrec.c (chrec_fold_plus_1): Likewise.
* tree-data-ref.c (split_constant_offset): Likewise.
(compute_all_dependences): Likewise.
* tree-if-conv.c (MAX_PHI_ARG_NUM): Likewise.
* tree-inline.c (remap_gimple_stmt): Likewise.
* tree-loop-distribution.c (MAX_DATAREFS_NUM): Likewise.
* tree-parloops.c (MIN_PER_THREAD): Likewise.
(create_parallel_loop): Likewise.
* tree-predcom.c (determine_unroll_factor): Likewise.
* tree-scalar-evolution.c (instantiate_scev_r): Likewise.
* tree-sra.c (analyze_all_variable_accesses): Likewise.
* tree-ssa-ccp.c (fold_builtin_alloca_with_align): Likewise.
* tree-ssa-dse.c (setup_live_bytes_from_ref): Likewise.
(dse_optimize_redundant_stores): Likewise.
(dse_classify_store): Likewise.
* tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise.
* tree-ssa-loop-ch.c (ch_base::copy_headers): Likewise.
* tree-ssa-loop-im.c (LIM_EXPENSIVE): Likewise.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Likewise.
(try_peel_loop): Likewise.
(tree_unroll_loops_completely): Likewise.
* tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
(CONSIDER_ALL_CANDIDATES_BOUND): Likewise.
(MAX_CONSIDERED_GROUPS): Likewise.
(ALWAYS_PRUNE_CAND_SET_BOUND): Likewise.
* tree-ssa-loop-manip.c (can_unroll_loop_p): Likewise.
* tree-ssa-loop-niter.c (MAX_ITERATIONS_TO_TRACK): Likewise.
* tree-ssa-loop-prefetch.c (PREFETCH_BLOCK): Likewise.
(L1_CACHE_SIZE_BYTES): Likewise.
(L2_CACHE_SIZE_BYTES): Likewise.
(should_issue_prefetch_p): Likewise.
(schedule_prefetches): Likewise.
(determine_unroll_factor): Likewise.
(volume_of_references): Likewise.
(add_subscript_strides): Likewise.
(self_reuse_distance): Likewise.
(mem_ref_count_reasonable_p): Likewise.
(insn_to_prefetch_ratio_too_small_p): Likewise.
(loop_prefetch_arrays): Likewise.
(tree_ssa_prefetch_arrays): Likewise.
* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Likewise.
* tree-ssa-math-opts.c (gimple_expand_builtin_pow): Likewise.
(convert_mult_to_fma): Likewise.
(math_opts_dom_walker::after_dom_children): Likewise.
* tree-ssa-phiopt.c (cond_if_else_store_replacement): Likewise.
(hoist_adjacent_loads): Likewise.
(gate_hoist_loads): Likewise.
* tree-ssa-pre.c (translate_vuse_through_block): Likewise.
(compute_partial_antic_aux): Likewise.
* tree-ssa-reassoc.c (get_reassociation_width): Likewise.
* tree-ssa-sccvn.c (vn_reference_lookup_pieces): Likewise.
(vn_reference_lookup): Likewise.
(do_rpo_vn): Likewise.
* tree-ssa-scopedtables.c (avail_exprs_stack::lookup_avail_expr): Likewise.
* tree-ssa-sink.c (select_best_block): Likewise.
* tree-ssa-strlen.c (new_stridx): Likewise.
(new_addr_stridx): Likewise.
(get_range_strlen_dynamic): Likewise.
(class ssa_name_limit_t): Likewise.
* tree-ssa-structalias.c (push_fields_onto_fieldstack): Likewise.
(create_variable_info_for_1): Likewise.
(init_alias_vars): Likewise.
* tree-ssa-tail-merge.c (find_clusters_1): Likewise.
(tail_merge_optimize): Likewise.
* tree-ssa-threadbackward.c (thread_jumps::profitable_jump_thread_path): Likewise.
(thread_jumps::fsm_find_control_statement_thread_paths): Likewise.
(thread_jumps::find_jump_threads_backwards): Likewise.
* tree-ssa-threadedge.c (record_temporary_equivalences_from_stmts_at_dest): Likewise.
* tree-ssa-uninit.c (compute_control_dep_chain): Likewise.
* tree-switch-conversion.c (switch_conversion::check_range): Likewise.
(jump_table_cluster::can_be_handled): Likewise.
* tree-switch-conversion.h (jump_table_cluster::case_values_threshold): Likewise.
(SWITCH_CONVERSION_BRANCH_RATIO): Likewise.
(param_switch_conversion_branch_ratio): Likewise.
* tree-vect-data-refs.c (vect_mark_for_runtime_alias_test): Likewise.
(vect_enhance_data_refs_alignment): Likewise.
(vect_prune_runtime_alias_test_list): Likewise.
* tree-vect-loop.c (vect_analyze_loop_costing): Likewise.
(vect_get_datarefs_in_loop): Likewise.
(vect_analyze_loop): Likewise.
* tree-vect-slp.c (vect_slp_bb): Likewise.
* tree-vectorizer.h: Likewise.
* tree-vrp.c (find_switch_asserts): Likewise.
(vrp_prop::check_mem_ref): Likewise.
* tree.c (wide_int_to_tree_1): Likewise.
(cache_integer_cst): Likewise.
* var-tracking.c (EXPR_USE_DEPTH): Likewise.
(reverse_op): Likewise.
(vt_find_locations): Likewise.
2019-11-12 Martin Liska <mliska@suse.cz>
* gimple-parser.c (c_parser_parse_gimple_body): Replace the old
parameter syntax with the new one, including opts.h where needed,
and use the SET_OPTION_IF_UNSET macro.
2019-11-12 Martin Liska <mliska@suse.cz>
* name-lookup.c (namespace_hints::namespace_hints): Replace the old
parameter syntax with the new one, including opts.h where needed,
and use the SET_OPTION_IF_UNSET macro.
* typeck.c (comptypes): Likewise.
2019-11-12 Martin Liska <mliska@suse.cz>
* lto-partition.c (lto_balanced_map): Replace the old parameter
syntax with the new one, including opts.h where needed, and use
the SET_OPTION_IF_UNSET macro.
* lto.c (do_whole_program_analysis): Likewise.
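The mechanical change repeated across all of the files above follows one idiom, modeled below with a simplified option pair. In GCC the macro operates on the global_options/global_options_set structures; the struct layout and the exact macro body here are approximations of mine.

  #include <stdbool.h>
  #include <stdio.h>

  struct options     { int param_max_unroll_times; };
  struct options_set { bool param_max_unroll_times; };  /* "was it set?" flags */

  /* Approximation of the opts.h macro: give an option a default value
     only when the user has not set it explicitly.  */
  #define SET_OPTION_IF_UNSET(opts, opts_set, option, value) \
    do {                                                     \
      if (!(opts_set)->option)                               \
        (opts)->option = (value);                            \
    } while (false)

  int
  main (void)
  {
    struct options opts = { 0 };
    struct options_set opts_set = { false };  /* no --param on the command line */

    /* A target override in the style of rs6000_option_default_params.  */
    SET_OPTION_IF_UNSET (&opts, &opts_set, param_max_unroll_times, 4);
    printf ("%d\n", opts.param_max_unroll_times);  /* prints 4 */
    return 0;
  }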
From-SVN: r278085
|
|
internal-fn.c:2890)
2019-11-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/92324
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use
STMT_VINFO_REDUC_VECTYPE for all computations, inserting
sign-conversions as necessary.
(vectorizable_reduction): Reject conversions in the chain
that are not sign-conversions, base analysis on a non-converting
stmt and its operation sign. Set STMT_VINFO_REDUC_VECTYPE.
* tree-vect-stmts.c (vect_stmt_relevant_p): Don't dump anything
for debug stmts.
* tree-vectorizer.h (_stmt_vec_info::reduc_vectype): New.
(STMT_VINFO_REDUC_VECTYPE): Likewise.
* gcc.dg/vect/pr92205.c: XFAIL.
* gcc.dg/vect/pr92324-1.c: New testcase.
* gcc.dg/vect/pr92324-2.c: Likewise.
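A hypothetical loop of the affected shape (the committed testcases above are the authoritative reproducers): the reduction is performed in unsigned arithmetic but accumulated into a signed variable, so the reduction chain contains sign conversions that the reduction analysis must now look through.

  int
  reduce (const int *a, int n)
  {
    int sum = 0;
    for (int i = 0; i < n; i++)
      /* Unsigned addition to avoid signed overflow: the chain is
         sum -> (unsigned) sum -> + -> (int) ... -> sum.  */
      sum = (int) ((unsigned) sum + (unsigned) a[i]);
    return sum;
  }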
From-SVN: r277955
|
|
The gather and scatter optabs required the vector offset to be
the integer equivalent of the vector mode being loaded or stored.
This patch generalises them so that the two vectors can have different
element sizes, although they still need to have the same number of
elements.
One consequence of this is that it's possible (if unlikely)
for two IFN_GATHER_LOADs to have the same arguments but different
return types. E.g. the same scalar base and vector of 32-bit offsets
could be used to load 8-bit elements and to load 16-bit elements.
From just looking at the arguments, we could wrongly deduce that
they're equivalent.
I know we saw this happen at one point with IFN_WHILE_ULT,
and we dealt with it there by passing a zero of the return type
as an extra argument. Doing the same here also makes the load
and store functions have the same argument assignment.
For now this patch should be a no-op, but later SVE patches take
advantage of the new flexibility.
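An illustrative source-level example of the ambiguity described above (hypothetical, not one of the patch's testcases): both gathers use the same base, the same vector of 32-bit offsets and the same scale, and differ only in the loaded element size, i.e. only in the call's return type.

  #include <stdint.h>
  #include <string.h>

  void
  gather_both (uint8_t *dst8, uint16_t *dst16, const char *base,
               const uint32_t *idx, int n)
  {
    for (int i = 0; i < n; i++)
      {
        /* 8-bit gather: load one byte at base + idx[i].  */
        dst8[i] = (uint8_t) base[idx[i]];
        /* 16-bit gather from the same base and offsets; memcpy avoids
           an aliasing violation in this illustration.  */
        memcpy (&dst16[i], base + idx[i], sizeof (uint16_t));
      }
  }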
2019-11-08 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* optabs.def (gather_load_optab, mask_gather_load_optab)
(scatter_store_optab, mask_scatter_store_optab): Turn into
conversion optabs, with the offset mode given explicitly.
* doc/md.texi: Update accordingly.
* config/aarch64/aarch64-sve-builtins-base.cc
(svld1_gather_impl::expand): Likewise.
(svst1_scatter_impl::expand): Likewise.
* internal-fn.c (gather_load_direct, scatter_store_direct): Likewise.
(expand_scatter_store_optab_fn): Likewise.
(direct_gather_load_optab_supported_p): Likewise.
(direct_scatter_store_optab_supported_p): Likewise.
(expand_gather_load_optab_fn): Likewise. Expect the mask argument
to be argument 4.
(internal_fn_mask_index): Return 4 for IFN_MASK_GATHER_LOAD.
(internal_gather_scatter_fn_supported_p): Replace the offset sign
argument with the offset vector type. Require the two vector
types to have the same number of elements but allow their element
sizes to be different. Treat the optabs as conversion optabs.
* internal-fn.h (internal_gather_scatter_fn_supported_p): Update
prototype accordingly.
* optabs-query.c (supports_at_least_one_mode_p): Replace with...
(supports_vec_convert_optab_p): ...this new function.
(supports_vec_gather_load_p): Update accordingly.
(supports_vec_scatter_store_p): Likewise.
* tree-vectorizer.h (vect_gather_scatter_fn_p): Take a vec_info.
Replace the offset sign and bits parameters with a scalar type tree.
* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise.
Pass back the offset vector type instead of the scalar element type.
Allow the offset to be wider than the memory elements. Search for
an offset type that the target supports, stopping once we've
reached the maximum of the element size and pointer size.
Update call to internal_gather_scatter_fn_supported_p.
(vect_check_gather_scatter): Update calls accordingly.
When testing a new scale before knowing the final offset type,
check whether the scale is supported for any signed or unsigned
offset type. Check whether the target supports the source and
target types of a conversion before deciding whether to look
through the conversion. Record the chosen offset_vectype.
* tree-vect-patterns.c (vect_get_gather_scatter_offset_type): Delete.
(vect_recog_gather_scatter_pattern): Get the scalar offset type
directly from the gs_info's offset_vectype instead. Pass a zero
of the result type to IFN_GATHER_LOAD and IFN_MASK_GATHER_LOAD.
* tree-vect-stmts.c (check_load_store_masking): Update call to
internal_gather_scatter_fn_supported_p, passing the offset vector
type recorded in the gs_info.
(vect_truncate_gather_scatter_offset): Update call to
vect_check_gather_scatter, leaving it to search for a valid
offset vector type.
(vect_use_strided_gather_scatters_p): Convert the offset to the
element type of the gs_info's offset_vectype.
(vect_get_gather_scatter_ops): Get the offset vector type directly
from the gs_info.
(vect_get_strided_load_store_ops): Likewise.
(vectorizable_load): Pass a zero of the result type to IFN_GATHER_LOAD
and IFN_MASK_GATHER_LOAD.
* config/aarch64/aarch64-sve.md (gather_load<mode>): Rename to...
(gather_load<mode><v_int_equiv>): ...this.
(mask_gather_load<mode>): Rename to...
(mask_gather_load<mode><v_int_equiv>): ...this.
(scatter_store<mode>): Rename to...
(scatter_store<mode><v_int_equiv>): ...this.
(mask_scatter_store<mode>): Rename to...
(mask_scatter_store<mode><v_int_equiv>): ...this.
From-SVN: r277949
|
|
gcc/ChangeLog:
2019-11-04 Andre Vieira <andre.simoesdiasvieira@arm.com>
* tree-vect-loop.c (vect_analyze_loop): Remove orig_loop_vinfo
parameter.
* tree-vectorizer.h (vect_analyze_loop): Update declaration.
* tree-vectorizer.c (try_vectorize_loop_1): Update calls to
vect_analyze_loop.
From-SVN: r277785
|
|
gcc/ChangeLog:
2019-11-04 Joel Hutton <Joel.Hutton@arm.com>
* expr.c (store_constructor): Modify to handle single-element vectors.
* tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector
constructors.
(vect_slp_check_for_constructors): New function.
(vect_slp_analyze_bb_1): Call new function to check for vector
constructors.
(vectorize_slp_instance_root_stmt): New function.
(vect_schedule_slp): Call new function to vectorize root stmt of vector
constructors.
* tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.
gcc/testsuite/ChangeLog:
2019-11-04 Joel Hutton <Joel.Hutton@arm.com>
* gcc.dg/vect/bb-slp-40.c: New test.
* gcc.dg/vect/bb-slp-41.c: New test.
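In the spirit of the new tests (this snippet is illustrative, not copied from them), the change lets the SLP vectorizer recognize a CONSTRUCTOR whose elements form a vectorizable group, replacing the element-wise computation plus constructor with a single vector operation.

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  build (const int *a, const int *b)
  {
    /* The four additions form an SLP instance whose root statement is
       the vector constructor itself.  */
    return (v4si) { a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3] };
  }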
From-SVN: r277784
|
|
gcc/ChangeLog:
2019-10-29 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR 88915
* tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
* tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
and make the valueize function pointer also take a void pointer.
* tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
around vn_valueize, to call it without a context.
(process_bb): Use vn_valueize_wrapper instead of vn_valueize.
* tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
(~_loop_vec_info): Release epilogue_vinfos.
(vect_analyze_loop_costing): Use knowledge of main VF to estimate
number of iterations of epilogue.
(vect_analyze_loop_2): Adapt to analyse main loop for all supported
vector sizes when vect-epilogues-nomask=1. Also keep track of lowest
versioning threshold needed for main loop.
(vect_analyze_loop): Likewise.
(find_in_mapping): New helper function.
(update_epilogue_loop_vinfo): New function.
(vect_transform_loop): When vectorizing epilogues re-use analysis done
on main loop and call update_epilogue_loop_vinfo to update it.
* tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
stmts on loop preheader edge.
(vect_do_peeling): Enable skip-vectors when doing loop versioning if
we decided to vectorize epilogues. Update the epilogue's NITERS and
construct ADVANCE to update the epilogue's data references where needed.
* tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
(vect_do_peeling, vect_update_inits_of_drs,
determine_peel_for_niter, vect_analyze_loop): Add or update
declarations.
* tree-vectorizer.c (try_vectorize_loop_1): Make sure to use
already-created loop_vec_infos for epilogues when available;
otherwise analyse the epilogue separately.
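A minimal model of the simplify_replace_tree interface change described above (all types and the valueizer are stand-ins of mine): the callback gains a void * context so callers can thread their own state through, and a wrapper in the style of vn_valueize_wrapper adapts the existing context-free valueizer.

  typedef struct tree_node *tree;  /* opaque stand-in for GCC's tree */

  /* New-style valueize hook: value plus caller-supplied context.  */
  typedef tree (*valueize_fn) (tree, void *);

  /* Stand-in for the existing context-free valueizer.  */
  static tree
  vn_valueize_stub (tree t)
  {
    return t;
  }

  /* Adapter: ignores the context and defers to the old interface.  */
  static tree
  vn_valueize_wrapper (tree t, void *context)
  {
    (void) context;
    return vn_valueize_stub (t);
  }

  /* A caller would now pass both the hook and its context, e.g.
     simplify_replace_tree (expr, old, new, vn_valueize_wrapper, NULL);  */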
From-SVN: r277569
|