aboutsummaryrefslogtreecommitdiff
path: root/gcc/tree-vectorizer.h
AgeCommit message (Collapse)AuthorFilesLines
2020-08-17vect/rs6000: Support vector with length cost modelingKewen Lin1-0/+1
This patch is to add the cost modeling for vector with length, it mainly follows what we generate for vector with length in functions vect_set_loop_controls_directly and vect_gen_len at the worst case. For Power, the length is expected to be in bits 0-7 (high bits), we have to model the cost of shifting bits, which is implemented in adjust_vect_cost_per_loop. Bootstrapped/regtested on powerpc64le-linux-gnu (P9) with explicit param vect-partial-vector-usage=1. gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_adjust_vect_cost_per_loop): New function. (rs6000_finish_cost): Call rs6000_adjust_vect_cost_per_loop. * tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost modeling for vector with length. (vect_rgroup_iv_might_wrap_p): New function, factored out from... * tree-vect-loop-manip.c (vect_set_loop_controls_directly): ...this. Update function comment. * tree-vect-stmts.c (vect_gen_len): Update function comment. * tree-vectorizer.h (vect_rgroup_iv_might_wrap_p): New declare.
2020-08-17vect: Support length-based partial vectors approachKewen Lin1-3/+32
Power9 supports vector load/store instruction lxvl/stxvl which allow us to operate partial vectors with one specific length. This patch extends some of current mask-based partial vectors support code for length-based approach, also adds some length specific support code. So far it assumes that we can only have one partial vectors approach at the same time, it will disable to use partial vectors if both approaches co-exist. Like the description of optab len_load/len_store, the length-based approach can have two flavors, one is length in bytes, the other is length in lanes. This patch is mainly implemented and tested for length in bytes, but as Richard S. suggested, most of code has considered both flavors. This also introduces one parameter vect-partial-vector-usage allow users to control when the loop vectorizer considers using partial vectors as an alternative to falling back to scalar code. gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_option_override_internal): Set param_vect_partial_vector_usage to 0 explicitly. * doc/invoke.texi (vect-partial-vector-usage): Document new option. * optabs-query.c (get_len_load_store_mode): New function. * optabs-query.h (get_len_load_store_mode): New declare. * params.opt (vect-partial-vector-usage): New. * tree-vect-loop-manip.c (vect_set_loop_controls_directly): Add the handlings for vectorization using length-based partial vectors, call vect_gen_len for length generation, and rename some variables with items instead of scalars. (vect_set_loop_condition_partial_vectors): Add the handlings for vectorization using length-based partial vectors. (vect_do_peeling): Allow remaining eiters less than epilogue vf for LOOP_VINFO_USING_PARTIAL_VECTORS_P. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Init epil_using_partial_vectors_p. (_loop_vec_info::~_loop_vec_info): Call release_vec_loop_controls for lengths destruction. (vect_verify_loop_lens): New function. (vect_analyze_loop): Add handlings for epilogue of loop when it's marked to use vectorization using partial vectors. (vect_analyze_loop_2): Add the check to allow only one vectorization approach using partial vectorization at the same time. Check param vect-partial-vector-usage for partial vectors decision. Mark LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P if the epilogue is considerable to use partial vectors. Call release_vec_loop_controls for lengths destruction. (vect_estimate_min_profitable_iters): Adjust for loop vectorization using length-based partial vectors. (vect_record_loop_mask): Init factor to 1 for vectorization using mask-based partial vectors. (vect_record_loop_len): New function. (vect_get_loop_len): Likewise. * tree-vect-stmts.c (check_load_store_for_partial_vectors): Add checks for vectorization using length-based partial vectors. Factor some code to lambda function get_valid_nvectors. (vectorizable_store): Add handlings when using length-based partial vectors. (vectorizable_load): Likewise. (vect_gen_len): New function. * tree-vectorizer.h (struct rgroup_controls): Add field factor mainly for length-based partial vectors. (vec_loop_lens): New typedef. (_loop_vec_info): Add lens and epil_using_partial_vectors_p. (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P): New macro. (LOOP_VINFO_LENS): Likewise. (LOOP_VINFO_FULLY_WITH_LENGTH_P): Likewise. (vect_record_loop_len): New declare. (vect_get_loop_len): Likewise. (vect_gen_len): Likewise.
2020-08-17remove premature vect_verify_datarefs_alignmentRichard Biener1-3/+1
This followup removes vect_verify_datarefs_alignment and its premature cancellation of vectorization leaving the actual decision whether alignment is supported to the functions deciding whether we can vectorize a load or store. 2020-07-08 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vect_verify_datarefs_alignment): Remove. (vect_slp_analyze_and_verify_instance_alignment): Rename to ... (vect_slp_analyze_instance_alignment): ... this. * tree-vect-data-refs.c (verify_data_ref_alignment): Remove. (vect_verify_datarefs_alignment): Likewise. (vect_enhance_data_refs_alignment): Do not call vect_verify_datarefs_alignment. (vect_slp_analyze_node_alignment): Rename from vect_slp_analyze_and_verify_node_alignment and do not call verify_data_ref_alignment. (vect_slp_analyze_instance_alignment): Rename from vect_slp_analyze_and_verify_instance_alignment. * tree-vect-stmts.c (vectorizable_store): Dump when we vectorize an unaligned access. (vectorizable_load): Likewise. * tree-vect-loop.c (vect_analyze_loop_2): Do not call vect_verify_datarefs_alignment. * tree-vect-slp.c (vect_slp_analyze_bb_1): Adjust. * gcc.dg/vect/bb-slp-10.c: Adjust. * gcc.dg/vect/slp-45.c: Likewise. * gcc.dg/vect/vect-109.c: Likewise.
2020-08-17refactor SLP constant insertion and provde entry insert helperRichard Biener1-0/+2
This provides helpers to insert stmts on region entry abstracted from loop/basic-block split out from vec_init_vector and used from the SLP constant code generation path. The SLP constant code generation path is also changed to avoid needless SSA copying since we can store VECTOR_CSTs directly in the vectorized defs array, improving the IL from the vectorizer. 2020-07-03 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vec_info::insert_on_entry): New. (vec_info::insert_seq_on_entry): Likewise. * tree-vectorizer.c (vec_info::insert_on_entry): Implement. (vec_info::insert_seq_on_entry): Likewise. * tree-vect-stmts.c (vect_init_vector_1): Use vec_info::insert_on_entry. (vect_finish_stmt_generation): Set modified bit after adjusting VUSE. * tree-vect-slp.c (vect_create_constant_vectors): Simplify by using vec_info::insert_seq_on_entry and bypassing vec_init_vector. (vect_schedule_slp_instance): Deal with all-constant children later.
2020-08-17do not include <utility> from tree-vectorizer.hRichard Biener1-1/+1
This removes the duplicate <utility> include from tree-vectorizer.h. 2020-06-29 Richard Biener <rguenther@suse.de> * tree-vectorizer.h: Do not include <utility>.
2020-08-17Use gsi_bb instead of iterator->bb.Martin Liska1-1/+1
gcc/ChangeLog: * tree-ssa-ccp.c (gsi_prev_dom_bb_nondebug): Use gsi_bb instead of gimple_stmt_iterator::bb. * tree-ssa-math-opts.c (insert_reciprocals): Likewise. * tree-vectorizer.h: Likewise.
2020-08-17tree-optimization/95897 - fix fold-left SLP reduction insert placeRichard Biener1-1/+0
This fixes computation of the insertion place for fold-left SLP reductions where the PHIs do not have vectorized stmts. The SLP representation isn't perfect here thus the following. 2020-06-26 Richard Biener <rguenther@suse.de> PR tree-optimization/95897 * tree-vectorizer.h (vectorizable_induction): Remove unused gimple_stmt_iterator * parameter. * tree-vect-loop.c (vectorizable_induction): Likewise. (vect_analyze_loop_operations): Adjust. * tree-vect-stmts.c (vect_analyze_stmt): Likewise. (vect_transform_stmt): Likewise. * tree-vect-slp.c (vect_schedule_slp_instance): Adjust for fold-left reductions, clarify existing reduction case. * gcc.dg/vect/pr95897.c: New testcase.
2020-08-17emit SLP vectorized loads earlierRichard Biener1-0/+1
This makes sure to emit SLP vectorized loads where the first scalar load is. This makes SLP dependence checking more powerful because hoisting loads can use TBAA and it increases the freedom for vector placement when there are constraints from live lanes. Vectorized shifts block inserting vectorized stmts always after vectorized defs because it ends up using the original scalar operand even when the SLP graph indicates the shift operand is vectorized (and we actually emit and cost those stmts). vect_slp_analyze_and_verify_node_alignment shows we need alignment for too many places, this is a temporary solution and my plan is to have a single meta-info for a dataref group instead (also getting rid of DR_GROUP_FIRST/NEXT_ELEMENT). 2020-06-24 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vect_find_first_scalar_stmt_in_slp): Declare. * tree-vect-data-refs.c (vect_preserves_scalar_order_p): Simplify for new position of vectorized SLP loads. (vect_slp_analyze_node_dependences): Adjust for it. (vect_slp_analyze_and_verify_node_alignment): Compute alignment for the first stmts dataref. * tree-vect-slp.c (vect_find_first_scalar_stmt_in_slp): New. (vect_schedule_slp_instance): Emit loads before the first scalar stmt. * tree-vect-stmts.c (vectorizable_load): Do what the comment says and use vect_find_first_scalar_stmt_in_slp.
2020-08-17vectorizer: add _bb_vec_info::region_stmts and reverse_region_stmtsMartin Liska1-0/+82
gcc/ChangeLog: * coretypes.h (struct iterator_range): New type. * tree-vect-patterns.c (vect_determine_precisions): Use range-based iterator. (vect_pattern_recog): Likewise. * tree-vect-slp.c (_bb_vec_info): Likewise. (_bb_vec_info::~_bb_vec_info): Likewise. (vect_slp_check_for_constructors): Likewise. * tree-vectorizer.h:Add new iterators and functions that use it.
2020-08-17remove SLP_TREE_TWO_OPERATORS, add SLP permutation nodeRichard Biener1-4/+9
This removes the SLP_TREE_TWO_OPERATORS hack in favor of having explicit SLP nodes for both computations and the blend operation. For this introduce a generic merge + select + permute SLP node (with implementation limits). Building upon earlier patches it adds vect_stmt_dominates_stmt_p and the ability to compute a vector insertion place from vectorized stmts (which now have UID zero) as needed for the permute node. 2020-06-17 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (_slp_tree::two_operators): Remove. (_slp_tree::lane_permutation): New member. (_slp_tree::code): Likewise. (SLP_TREE_TWO_OPERATORS): Remove. (SLP_TREE_LANE_PERMUTATION): New. (SLP_TREE_CODE): Likewise. (vect_stmt_dominates_stmt_p): Declare. * tree-vectorizer.c (vect_stmt_dominates_stmt_p): New function. * tree-vect-stmts.c (vect_model_simple_cost): Remove SLP_TREE_TWO_OPERATORS handling. * tree-vect-slp.c (_slp_tree::_slp_tree): Amend. (_slp_tree::~_slp_tree): Likewise. (vect_two_operations_perm_ok_p): Remove. (vect_build_slp_tree_1): Remove verification of two-operator permutation here. (vect_build_slp_tree_2): When we have two different operators build two computation SLP nodes and a blend. (vect_print_slp_tree): Print the lane permutation if it exists. (slp_copy_subtree): Copy it. (vect_slp_rearrange_stmts): Re-arrange it. (vect_slp_analyze_node_operations_1): Handle SLP_TREE_CODE VEC_PERM_EXPR explicitely. (vect_schedule_slp_instance): Likewise. Remove old SLP_TREE_TWO_OPERATORS code. (vectorizable_slp_permutation): New function.
2020-08-17vect: Factor out and rename some functions/macrosKewen Lin1-9/+10
Power supports vector memory access with length (in bytes) instructions. Like existing fully masking for SVE, it is another approach to vectorize the loop using partially-populated vectors. As Richard Sandiford suggested, we should share the codes in approaches with partial vectors if possible. This patch is to: 1) factor out two functions: - vect_min_prec_for_max_niters - vect_known_niters_smaller_than_vf. 2) rename four functions: - vect_iv_limit_for_full_masking - check_load_store_masking - vect_set_loop_condition_masked - vect_set_loop_condition_unmasked 3) rename macros LOOP_VINFO_MASK_COMPARE_TYPE and LOOP_VINFO_MASK_IV_TYPE. Bootstrapped/regtested on aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-loop-manip.c (vect_set_loop_controls_directly): Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE. (vect_set_loop_condition_masked): Renamed to ... (vect_set_loop_condition_partial_vectors): ... this. Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors. (vect_set_loop_condition_unmasked): Renamed to ... (vect_set_loop_condition_normal): ... this. (vect_set_loop_condition): Rename vect_set_loop_condition_unmasked to vect_set_loop_condition_normal. Rename vect_set_loop_condition_masked to vect_set_loop_condition_partial_vectors. (vect_prepare_for_masked_peels): Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. * tree-vect-loop.c (vect_known_niters_smaller_than_vf): New, factored out from ... (vect_analyze_loop_costing): ... this. (_loop_vec_info::_loop_vec_info): Rename mask_compare_type to compare_type. (vect_min_prec_for_max_niters): New, factored out from ... (vect_verify_full_masking): ... this. Rename vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors. Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE. (vectorizable_reduction): Update some dumpings with partial vectors instead of fully-masked. (vectorizable_live_operation): Likewise. (vect_iv_limit_for_full_masking): Renamed to ... (vect_iv_limit_for_partial_vectors): ... this. * tree-vect-stmts.c (check_load_store_masking): Renamed to ... (check_load_store_for_partial_vectors): ... this. Update some dumpings with partial vectors instead of fully-masked. (vectorizable_store): Rename check_load_store_masking to check_load_store_for_partial_vectors. (vectorizable_load): Likewise. * tree-vectorizer.h (LOOP_VINFO_MASK_COMPARE_TYPE): Renamed to ... (LOOP_VINFO_RGROUP_COMPARE_TYPE): ... this. (LOOP_VINFO_MASK_IV_TYPE): Renamed to ... (LOOP_VINFO_RGROUP_IV_TYPE): ... this. (vect_iv_limit_for_full_masking): Renamed to ... (vect_iv_limit_for_partial_vectors): this. (_loop_vec_info): Rename mask_compare_type to rgroup_compare_type. Rename iv_type to rgroup_iv_type.
2020-08-17vect: Rename things related to rgroup_masksKewen Lin1-23/+25
Power supports vector memory access with length (in bytes) instructions. Like existing fully masking for SVE, it is another approach to vectorize the loop using partially-populated vectors. As Richard Sandiford pointed out, we can rename the rgroup struct rgroup_masks to rgroup_controls, rename its members mask_type to type, masks to controls to be more generic. Besides, this patch also renames some functions like vect_set_loop_mask to vect_set_loop_control, release_vec_loop_masks to release_vec_loop_controls, vect_set_loop_masks_directly to vect_set_loop_controls_directly. Bootstrapped/regtested on aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-loop-manip.c (vect_set_loop_mask): Renamed to ... (vect_set_loop_control): ... this. (vect_maybe_permute_loop_masks): Rename rgroup_masks related things. (vect_set_loop_masks_directly): Renamed to ... (vect_set_loop_controls_directly): ... this. Also rename some variables with ctrl instead of mask. Rename vect_set_loop_mask to vect_set_loop_control. (vect_set_loop_condition_masked): Rename rgroup_masks related things. Also rename some variables with ctrl instead of mask. * tree-vect-loop.c (release_vec_loop_masks): Renamed to ... (release_vec_loop_controls): ... this. Rename rgroup_masks related things. (_loop_vec_info::~_loop_vec_info): Rename release_vec_loop_masks to release_vec_loop_controls. (can_produce_all_loop_masks_p): Rename rgroup_masks related things. (vect_get_max_nscalars_per_iter): Likewise. (vect_estimate_min_profitable_iters): Likewise. (vect_record_loop_mask): Likewise. (vect_get_loop_mask): Likewise. * tree-vectorizer.h (struct rgroup_masks): Renamed to ... (struct rgroup_controls): ... this. Also rename mask_type to type and rename masks to controls.
2020-08-17vect: Rename fully_masked_p to using_partial_vectors_pKewen Lin1-3/+8
Power supports vector memory access with length (in bytes) instructions. Like existing fully masking for SVE, it is another approach to vectorize the loop using partially-populated vectors. As Richard Sandiford suggested, this patch is to update the existing fully_masked_p field to using_partial_vectors_p. Introduce one macro LOOP_VINFO_USING_PARTIAL_VECTORS_P for partial vectorization checking usage, update the LOOP_VINFO_FULLY_MASKED_P with LOOP_VINFO_USING_PARTIAL_VECTORS_P && !masks.is_empty() and still use it for mask-based partial vectors approach specific checks. Bootstrapped/regtested on aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-loop-manip.c (vect_set_loop_condition): Rename LOOP_VINFO_FULLY_MASKED_P to LOOP_VINFO_USING_PARTIAL_VECTORS_P. (vect_gen_vector_loop_niters): Likewise. (vect_do_peeling): Likewise. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Rename fully_masked_p to using_partial_vectors_p. (vect_analyze_loop_costing): Rename LOOP_VINFO_FULLY_MASKED_P to LOOP_VINFO_USING_PARTIAL_VECTORS_P. (determine_peel_for_niter): Likewise. (vect_estimate_min_profitable_iters): Likewise. (vect_transform_loop): Likewise. * tree-vectorizer.h (LOOP_VINFO_FULLY_MASKED_P): Updated. (LOOP_VINFO_USING_PARTIAL_VECTORS_P): New macro.
2020-08-17vect: Rename can_fully_mask_p to can_use_partial_vectors_pKewen Lin1-3/+6
Power supports vector memory access with length (in bytes) instructions. Like existing fully masking for SVE, it is another approach to vectorize the loop using partially-populated vectors. As Richard Sandiford pointed out, we should extend the existing flag can_fully_mask_p to be more generic, to indicate whether we have any chances with partial vectors for this loop. So this patch is to rename this flag to can_use_partial_vectors_p to be more meaningful, also rename the macro LOOP_VINFO_CAN_FULLY_MASK_P to LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. Bootstrapped/regtested on aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Rename can_fully_mask_p to can_use_partial_vectors_p. (vect_analyze_loop_2): Rename LOOP_VINFO_CAN_FULLY_MASK_P to LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. Rename saved_can_fully_mask_p to saved_can_use_partial_vectors_p. (vectorizable_reduction): Rename LOOP_VINFO_CAN_FULLY_MASK_P to LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. (vectorizable_live_operation): Likewise. * tree-vect-stmts.c (permute_vec_elements): Likewise. (check_load_store_masking): Likewise. (vectorizable_operation): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise. * tree-vectorizer.h (LOOP_VINFO_CAN_FULLY_MASK_P): Renamed to ... (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P): ... this. (_loop_vec_info): Rename can_fully_mask_p to can_use_partial_vectors_p.
2020-08-17Make {SLP_TREE,STMT_VINFO}_VEC_STMTS a vector of gimple *Richard Biener1-14/+12
This makes {SLP_TREE,STMT_VINFO}_VEC_STMTS a vector of gimple * and not allocate a stmt_vec_info for vectorizer generated stmts since this is now possible after removing the only use which was chaining of vector stmts via STMT_VINFO_RELATED_STMT. This also removes all stmt_vec_info allocations done for vector stmts, the remaining ones are for stmts in the scalar IL and for patterns which are not part of the IL. Thus after this the stmt UIDs inside a basic-block are suitable for dominance checking if you ignore (or lazy-fill) UIDs of zero of the vector stmts inserted during transform. This property is ensured by a new flag set when pattern analysis is complete. 2020-06-10 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (_slp_tree::vec_stmts): Make it a vector of gimple * stmts. (_stmt_vec_info::vec_stmts): Likewise. (vec_info::stmt_vec_info_ro): New flag. (vect_finish_replace_stmt): Adjust declaration. (vect_finish_stmt_generation): Likewise. (vectorizable_induction): Likewise. (vect_transform_reduction): Likewise. (vectorizable_lc_phi): Likewise. * tree-vect-data-refs.c (vect_create_data_ref_ptr): Do not allocate stmt infos for increments. (vect_record_grouped_load_vectors): Adjust. * tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise. (vectorize_fold_left_reduction): Likewise. (vect_transform_reduction): Likewise. (vect_transform_cycle_phi): Likewise. (vectorizable_lc_phi): Likewise. (vectorizable_induction): Likewise. (vectorizable_live_operation): Likewise. (vect_transform_loop): Likewise. * tree-vect-patterns.c (vect_pattern_recog): Set stmt_vec_info_ro. * tree-vect-slp.c (vect_get_slp_vect_def): Adjust. (vect_get_slp_defs): Likewise. (vect_transform_slp_perm_load): Likewise. (vect_schedule_slp_instance): Likewise. (vectorize_slp_instance_root_stmt): Likewise. * tree-vect-stmts.c (vect_get_vec_defs_for_operand): Likewise. (vect_finish_stmt_generation_1): Do not allocate a stmt info. (vect_finish_replace_stmt): Do not return anything. (vect_finish_stmt_generation): Likewise. (vect_build_gather_load_calls): Adjust. (vectorizable_bswap): Likewise. (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (vect_create_vectorized_demotion_stmts): Likewise. (vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_scan_store): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise. (vectorizable_comparison): Likewise. (vect_transform_stmt): Likewise. * tree-vectorizer.c (vec_info::vec_info): Initialize stmt_vec_info_ro. (vec_info::replace_stmt): Copy over stmt UID rather than unsetting/setting a stmt info allocating a new UID. (vec_info::set_vinfo_for_stmt): Assert !stmt_vec_info_ro.
2020-08-17Introduce STMT_VINFO_VEC_STMTSRichard Biener1-12/+18
This gets rid of the linked list of STMT_VINFO_VECT_STMT and STMT_VINFO_RELATED_STMT in preparation for vectorized stmts no longer needing a stmt_vec_info (just for this chaining). This has ripple-down effects in all places we gather vectorized defs. For this new interfaces are introduced and used throughout vectorization, simplifying code in a lot of places and merging it with the SLP way of gathering vectorized operands. There is vect_get_vec_defs as the new recommended unified interface and vect_get_vec_defs_for_operand as one for non-SLP operation. I've resorted to keep the structure of the code the same where using vect_get_vec_defs would have been too disruptive for this already large patch. 2020-06-10 Richard Biener <rguenther@suse.de> * tree-vect-data-refs.c (vect_vfa_access_size): Adjust. (vect_record_grouped_load_vectors): Likewise. * tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise. (vectorize_fold_left_reduction): Likewise. (vect_transform_reduction): Likewise. (vect_transform_cycle_phi): Likewise. (vectorizable_lc_phi): Likewise. (vectorizable_induction): Likewise. (vectorizable_live_operation): Likewise. (vect_transform_loop): Likewise. * tree-vect-slp.c (vect_get_slp_defs): New function, split out from overload. * tree-vect-stmts.c (vect_get_vec_def_for_operand_1): Remove. (vect_get_vec_def_for_operand): Likewise. (vect_get_vec_def_for_stmt_copy): Likewise. (vect_get_vec_defs_for_stmt_copy): Likewise. (vect_get_vec_defs_for_operand): New function. (vect_get_vec_defs): Likewise. (vect_build_gather_load_calls): Adjust. (vect_get_gather_scatter_ops): Likewise. (vectorizable_bswap): Likewise. (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (vect_get_loop_based_defs): Remove. (vect_create_vectorized_demotion_stmts): Adjust. (vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_scan_store): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise. (vectorizable_comparison): Likewise. (vect_transform_stmt): Adjust and remove no longer applicable sanity checks. * tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize STMT_VINFO_VEC_STMTS. (vec_info::free_stmt_vec_info): Relase it. * tree-vectorizer.h (_stmt_vec_info::vectorized_stmt): Remove. (_stmt_vec_info::vec_stmts): Add. (STMT_VINFO_VEC_STMT): Remove. (STMT_VINFO_VEC_STMTS): New. (vect_get_vec_def_for_operand_1): Remove. (vect_get_vec_def_for_operand): Likewise. (vect_get_vec_defs_for_stmt_copy): Likewise. (vect_get_vec_def_for_stmt_copy): Likewise. (vect_get_vec_defs): New overloads. (vect_get_vec_defs_for_operand): New. (vect_get_slp_defs): Declare.
2020-08-17add vect_get_slp_vect_defRichard Biener1-0/+1
This adds vect_get_slp_vect_def to get at a SLP nodes vectorized def, abstracting away the details. It also fixes one stray failure to use SLP_TREE_REPRESENTATIVE. 2020-05-04 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vect_get_slp_vect_def): Declare. * tree-vect-loop.c (vect_create_epilog_for_reduction): Use it. * tree-vect-stmts.c (vect_transform_stmt): Likewise. (vect_is_simple_use): Use SLP_TREE_REPRESENTATIVE. * tree-vect-slp.c (vect_get_slp_vect_defs): Fold into single use ... (vect_get_slp_defs): ... here. (vect_get_slp_vect_def): New function.
2020-08-17Add explicit SLP_TREE_LANESRichard Biener1-0/+3
This adds an explicit number of scalar lanes to the SLP node avoiding to dispatch between stmts/ops and eventually not require those vectors at all. 2020-05-27 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (_slp_tree::lanes): New. (SLP_TREE_LANES): Likewise. * tree-vect-loop.c (vect_create_epilog_for_reduction): Use it. (vectorizable_reduction): Likewise. (vect_transform_cycle_phi): Likewise. (vectorizable_induction): Likewise. (vectorizable_live_operation): Likewise. * tree-vect-slp.c (_slp_tree::_slp_tree): Initialize lanes. (vect_create_new_slp_node): Likewise. (slp_copy_subtree): Copy it. (vect_optimize_slp): Use it. (vect_slp_analyze_node_operations_1): Likewise. (vect_slp_convert_to_external): Likewise. (vect_bb_vectorization_profitable_p): Likewise. * tree-vect-stmts.c (vectorizable_load): Likewise. (get_vectype_for_scalar_type): Likewise.
2020-08-17tree-optimization/95272 - add SLP_TREE_REPRESENTATIVERichard Biener1-0/+4
This adds SLP_TREE_REPRESENTATIVE - a representative stmt-info that is used by SLP analysis and code generation. This avoids the need for the hack in vect_slp_rearrange_stmts which previously avoided to re-arrange stmts that might not have been isomorphic because of operand swapping. It also plays nice with future directions of SLP and for the forseeable future is easier than replicating more and more info in the SLP node as long as non-SLP is in-tree. 2020-05-29 Richard Biener <rguenther@suse.de> PR tree-optimization/95272 * tree-vectorizer.h (_slp_tree::representative): Add. (SLP_TREE_REPRESENTATIVE): Likewise. * tree-vect-loop.c (vectorizable_reduction): Adjust SLP node gathering. (vectorizable_live_operation): Use the representative to attach the reduction info to. * tree-vect-slp.c (_slp_tree::_slp_tree): Initialize SLP_TREE_REPRESENTATIVE. (vect_create_new_slp_node): Likewise. (slp_copy_subtree): Copy it. (vect_slp_rearrange_stmts): Re-arrange even COND_EXPR stmts. (vect_slp_analyze_node_operations_1): Pass the representative to vect_analyze_stmt. (vect_schedule_slp_instance): Pass the representative to vect_transform_stmt. * gcc.dg/vect/pr95272.c: New testcase.
2020-08-17Code generate externals/invariants during the SLP graph walkRichard Biener1-0/+2
This generates vector defs for externals and invariants during the SLP walk rather than as part of getting vectorized defs when vectorizing the users. This is a requirement to make sharing of external/invariant nodes be reflected in actual code generation. This temporarily adds a SLP_TREE_VEC_DEFS vector alongside the SLP_TREE_VEC_STMTS one. Eventually the latter can go away. 2020-05-27 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (_slp_tree::vec_defs): Add. (SLP_TREE_VEC_DEFS): Likewise. * tree-vect-slp.c (_slp_tree::_slp_tree): Adjust. (_slp_tree::~_slp_tree): Likewise. (vect_mask_constant_operand_p): Remove unused function. (vect_get_constant_vectors): Rename to... (vect_create_constant_vectors): ... this. Take the invariant node as argument and code generate it. Remove dead code, remove temporary asserts. Pass a NULL stmt_info to vect_init_vector. (vect_get_slp_defs): Simplify. (vect_schedule_slp_instance): Code-generate externals and invariants using vect_create_constant_vectors.
2020-08-17enfoce SLP_TREE_VECTYPE for invariantsRichard Biener1-0/+5
This tries to enforce a set SLP_TREE_VECTYPE in vect_get_constant_vectors and provides some infrastructure for setting it in the vectorizable_* functions, amending those. 2020-05-22 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vect_is_simple_use): New overload. (vect_maybe_update_slp_op_vectype): New. * tree-vect-stmts.c (vect_is_simple_use): New overload accessing operands of SLP vs. non-SLP operation transparently. (vect_maybe_update_slp_op_vectype): New function updating the possibly shared SLP operands vector type. (vectorizable_operation): Be a bit more SLP vs non-SLP agnostic using the new vect_is_simple_use overload; update SLP invariant operand nodes vector type. (vectorizable_comparison): Likewise. (vectorizable_call): Likewise. (vectorizable_conversion): Likewise. (vectorizable_shift): Likewise. (vectorizable_store): Likewise. (vectorizable_condition): Likewise. (vectorizable_assignment): Likewise. * tree-vect-loop.c (vectorizable_reduction): Likewise. * tree-vect-slp.c (vect_get_constant_vectors): Enforce present SLP_TREE_VECTYPE and check it matches previous behavior.
2020-08-17add ctor/dtor to slp_treeRichard Biener1-0/+3
This adds constructor and destructor to slp_tree factoring common code. I've not changed the wrappers to overloaded CTORs since I hope to use object_allocator<> and am not sure whether that can be done in any fancy way yet. 2020-05-22 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (_slp_tree::_slp_tree): New. (_slp_tree::~_slp_tree): Likewise. * tree-vect-slp.c (_slp_tree::_slp_tree): Factor out code from allocators. (_slp_tree::~_slp_tree): Implement. (vect_free_slp_tree): Simplify. (vect_create_new_slp_node): Likewise. Add nops parameter. (vect_build_slp_tree_2): Adjust. (vect_analyze_slp_instance): Likewise.
2020-08-17cost invariant nodes from vect_slp_analyze_node_operations SLP walkRichard Biener1-0/+2
2020-05-19 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (_slp_tree::vectype): Add field. (SLP_TREE_VECTYPE): New. * tree-vect-slp.c (vect_create_new_slp_node): Initialize SLP_TREE_VECTYPE. (vect_create_new_slp_node): Likewise. (vect_prologue_cost_for_slp): Move here from tree-vect-stmts.c and simplify. (vect_slp_analyze_node_operations): Walk nodes children for invariant costing. (vect_get_constant_vectors): Use local scope op variable. * tree-vect-stmts.c (vect_prologue_cost_for_slp_op): Remove here. (vect_model_simple_cost): Adjust. (vect_model_store_cost): Likewise. (vectorizable_store): Likewise.
2020-08-17add vectype parameter to add_stmt_cost hookRichard Biener1-6/+21
This adds a vectype parameter to add_stmt_cost which avoids the need to pass down a (wrong) stmt_info just to carry this information. Useful for invariants which do not have a stmt_info associated. 2020-05-13 Richard Biener <rguenther@suse.de> * target.def (add_stmt_cost): Add new vectype parameter. * targhooks.c (default_add_stmt_cost): Adjust. * targhooks.h (default_add_stmt_cost): Likewise. * config/aarch64/aarch64.c (aarch64_add_stmt_cost): Take new vectype parameter. * config/arm/arm.c (arm_add_stmt_cost): Likewise. * config/i386/i386.c (ix86_add_stmt_cost): Likewise. * config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise. * tree-vectorizer.h (stmt_info_for_cost::vectype): Add. (dump_stmt_cost): Add new vectype parameter. (add_stmt_cost): Likewise. (record_stmt_cost): Likewise. (record_stmt_cost): Add overload with old signature. * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Adjust. (vect_get_known_peeling_cost): Likewise. (vect_estimate_min_profitable_iters): Likewise. * tree-vectorizer.c (dump_stmt_cost): Add new vectype parameter. * tree-vect-stmts.c (record_stmt_cost): Likewise. (vect_prologue_cost_for_slp_op): Remove stmt_vec_info parameter and pass down correct vectype and NULL stmt_info. (vect_model_simple_cost): Adjust. (vect_model_store_cost): Likewise.
2020-08-17Remove SLP_INSTANCE_GROUP_SIZERichard Biener1-4/+4
This removes the SLP_INSTANCE_GROUP_SIZE member since the number of lanes throughout a SLP subgraph is not necessarily constant. 2020-05-13 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (SLP_INSTANCE_GROUP_SIZE): Remove. (_slp_instance::group_size): Likewise. * tree-vect-loop.c (vectorizable_reduction): The group size is the number of lanes in the node. * tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Likewise. (vect_analyze_slp_instance): Do not set SLP_INSTANCE_GROUP_SIZE, verify it matches the instance trees number of lanes. (vect_slp_analyze_node_operations_1): Use the numer of lanes in the node as group size. (vect_bb_vectorization_profitable_p): Use the instance root number of lanes for the size of life. (vect_schedule_slp_instance): Use the number of lanes as group_size. * tree-vect-stmts.c (vectorizable_load): Remove SLP instance parameter. Use the number of lanes of the load for the group size in the gap adjustment code. (vect_analyze_stmt): Adjust. (vect_transform_stmt): Likewise.
2020-08-17move permutation validity checkRichard Biener1-1/+3
This delays the SLP permutation check to vectorizable_load and optimizes permutations only after all SLP instances have been generated and the vectorization factor is determined. 2020-05-08 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vec_info::slp_loads): New. (vect_optimize_slp): Declare. * tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Do nothing when there are no loads. (vect_gather_slp_loads): Gather loads into a vector. (vect_supported_load_permutation_p): Remove. (vect_analyze_slp_instance): Do not verify permutation validity here. (vect_analyze_slp): Optimize permutations of reductions after all SLP instances have been gathered and gather all loads. (vect_optimize_slp): New function split out from vect_supported_load_permutation_p. Elide some permutations. (vect_slp_analyze_bb_1): Call vect_optimize_slp. * tree-vect-loop.c (vect_analyze_loop_2): Likewise. * tree-vect-stmts.c (vectorizable_load): Check whether the load can be permuted. When generating code assert we can. * gcc.dg/vect/bb-slp-pr68892.c: Adjust for not supported SLP permutations becoming builds from scalars. * gcc.dg/vect/bb-slp-pr78205.c: Likewise. * gcc.dg/vect/bb-slp-34.c: Likewise.
2020-08-17Prepare removal of SLP_INSTANCE_GROUP_SIZERichard Biener1-1/+1
This removes trivial instances of SLP_INSTANCE_GROUP_SIZE and refrains from using a "SLP instance" which nowadays is just one of the possibly many entries into the SLP graph. 2020-05-06 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vect_transform_slp_perm_load): Adjust. * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): Remove slp_instance parameter, just iterate over all scalar stmts. (vect_slp_analyze_instance_dependence): Adjust and likewise. * tree-vect-slp.c (vect_bb_slp_scalar_cost): Remove unused BB parameter. (vect_schedule_slp): Just iterate over all scalar stmts. (vect_supported_load_permutation_p): Adjust. (vect_transform_slp_perm_load): Remove slp_instance parameter, instead use the number of lanes in the node as group size. * tree-vect-stmts.c (vect_model_load_cost): Get vectorization factor instead of slp_instance as parameter. (vectorizable_load): Adjust.
2020-05-05add vec_info * parameters where neededRichard Biener1-59/+64
Soonish we'll get SLP nodes which have no corresponding scalar stmt and thus not stmt_vec_info and thus no way to get back to the associated vec_info. This patch makes the vec_info available as part of the APIs instead of putting in that back-pointer into the leaf data structures. 2020-05-05 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (_stmt_vec_info::vinfo): Remove. (STMT_VINFO_LOOP_VINFO): Likewise. (STMT_VINFO_BB_VINFO): Likewise. * tree-vect-data-refs.c: Adjust for the above, adding vec_info * parameters and adjusting calls. * tree-vect-loop-manip.c: Likewise. * tree-vect-loop.c: Likewise. * tree-vect-patterns.c: Likewise. * tree-vect-slp.c: Likewise. * tree-vect-stmts.c: Likewise. * tree-vectorizer.c: Likewise. * target.def (add_stmt_cost): Add vec_info * parameter. * target.h (stmt_in_inner_loop_p): Likewise. * targhooks.c (default_add_stmt_cost): Adjust. * doc/tm.texi: Re-generate. * config/aarch64/aarch64.c (aarch64_extending_load_p): Add vec_info * parameter and adjust. (aarch64_sve_adjust_stmt_cost): Likewise. (aarch64_add_stmt_cost): Likewise. * config/arm/arm.c (arm_add_stmt_cost): Likewise. * config/i386/i386.c (ix86_add_stmt_cost): Likewise. * config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
2020-01-20tree-optimization/93094 pass down VECTORIZED_CALL to versioningRichard Biener1-2/+2
When versioning is run the IL is already mangled and finding a VECTORIZED_CALL IFN can fail. 2020-01-20 Richard Biener <rguenther@suse.de> PR tree-optimization/93094 * tree-vectorizer.h (vect_loop_versioning): Adjust. (vect_transform_loop): Likewise. * tree-vectorizer.c (try_vectorize_loop_1): Pass down loop_vectorized_call to vect_transform_loop. * tree-vect-loop.c (vect_transform_loop): Pass down loop_vectorized_call to vect_loop_versioning. * tree-vect-loop-manip.c (vect_loop_versioning): Use the earlier discovered loop_vectorized_call. * gcc.dg/vect/pr93094.c: New testcase.
2020-01-14hash-table.h: support non-zero empty values in empty_slow (v2)David Malcolm1-0/+2
gcc/cp/ChangeLog: * cp-gimplify.c (source_location_table_entry_hash::empty_zero_p): New static constant. * cp-tree.h (named_decl_hash::empty_zero_p): Likewise. (struct named_label_hash::empty_zero_p): Likewise. * decl2.c (mangled_decl_hash::empty_zero_p): Likewise. gcc/ChangeLog: * attribs.c (excl_hash_traits::empty_zero_p): New static constant. * gcov.c (function_start_pair_hash::empty_zero_p): Likewise. * graphite.c (struct sese_scev_hash::empty_zero_p): Likewise. * hash-map-tests.c (selftest::test_nonzero_empty_key): New selftest. (selftest::hash_map_tests_c_tests): Call it. * hash-map-traits.h (simple_hashmap_traits::empty_zero_p): New static constant, using the value of = H::empty_zero_p. (unbounded_hashmap_traits::empty_zero_p): Likewise, using the value from default_hash_traits <Value>. * hash-map.h (hash_map::empty_zero_p): Likewise, using the value from Traits. * hash-set-tests.c (value_hash_traits::empty_zero_p): Likewise. * hash-table.h (hash_table::alloc_entries): Guard the loop of calls to mark_empty with !Descriptor::empty_zero_p. (hash_table::empty_slow): Conditionalize the memset call with a check that Descriptor::empty_zero_p; otherwise, loop through the entries calling mark_empty on them. * hash-traits.h (int_hash::empty_zero_p): New static constant. (pointer_hash::empty_zero_p): Likewise. (pair_hash::empty_zero_p): Likewise. * ipa-devirt.c (default_hash_traits <type_pair>::empty_zero_p): Likewise. * ipa-prop.c (ipa_bit_ggc_hash_traits::empty_zero_p): Likewise. (ipa_vr_ggc_hash_traits::empty_zero_p): Likewise. * profile.c (location_triplet_hash::empty_zero_p): Likewise. * sanopt.c (sanopt_tree_triplet_hash::empty_zero_p): Likewise. (sanopt_tree_couple_hash::empty_zero_p): Likewise. * tree-hasher.h (int_tree_hasher::empty_zero_p): Likewise. * tree-ssa-sccvn.c (vn_ssa_aux_hasher::empty_zero_p): Likewise. * tree-vect-slp.c (bst_traits::empty_zero_p): Likewise. * tree-vectorizer.h (default_hash_traits<scalar_cond_masked_key>::empty_zero_p): Likewise.
2020-01-10[vect] Add missing commentAndre Vieira1-0/+4
gcc/ChangeLog: 2020-01-10 Andre Vieira <andre.simoesdiasvieira@arm.com> * tree-vectorizer.h (get_dr_vinfo_offset): Add missing function comment. From-SVN: r280108
2020-01-10[vect] Keep track of DR_OFFSET advance in dr_vec_info rather than data_referenceAndre Vieira1-1/+25
gcc/ChangeLog: 2020-01-10 Andre Vieira <andre.simoesdiasvieira@arm.com> * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Use get_dr_vinfo_offset * tree-vect-loop.c (update_epilogue_loop_vinfo): Remove orig_drs_init parameter and its use to reset DR_OFFSET's. (vect_transform_loop): Remove orig_drs_init argument. * tree-vect-loop-manip.c (vect_update_init_of_dr): Update the offset member of dr_vec_info rather than the offset of the associated data_reference's innermost_loop_behavior. (vect_update_init_of_dr): Pass dr_vec_info instead of data_reference. (vect_do_peeling): Remove orig_drs_init parameter and its construction. * tree-vect-stmts.c (check_scan_store): Replace use of DR_OFFSET with get_dr_vinfo_offset. (vectorizable_store): Likewise. (vectorizable_load): Likewise. From-SVN: r280107
2020-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r279813
2019-11-29Record the vector mask precision in stmt_vec_infoRichard Sandiford1-0/+26
search_type_for_mask uses a worklist to search a chain of boolean operations for a natural vector mask type. This patch instead does that in vect_determine_stmt_precisions, where we also look for overpromoted integer operations. We then only need to compute the precision once and can cache it in the stmt_vec_info. The new function vect_determine_mask_precision is supposed to handle exactly the same cases as search_type_for_mask_1, and in the same way. There's a lot we could improve here, but that's not stage 3 material. I wondered about sharing mask_precision with other fields like operation_precision, but in the end that seemed too dangerous. We have patterns to convert between boolean and non-boolean operations and it would be very easy to get mixed up about which case the fields are describing. 2019-11-29 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (stmt_vec_info::mask_precision): New field. (vect_use_mask_type_p): New function. * tree-vect-patterns.c (vect_init_pattern_stmt): Copy the mask precision to the pattern statement. (append_pattern_def_seq): Add a scalar_type_for_mask parameter and use it to initialize the new stmt's mask precision. (search_type_for_mask_1): Delete. (search_type_for_mask): Replace with... (integer_type_for_mask): ...this new function. Use the information cached in the stmt_vec_info. (vect_recog_bool_pattern): Update accordingly. (build_mask_conversion): Pass the scalar type associated with the mask type to append_pattern_def_seq. (vect_recog_mask_conversion_pattern): Likewise. Call integer_type_for_mask instead of search_type_for_mask. (vect_convert_mask_for_vectype): Call integer_type_for_mask instead of search_type_for_mask. (possible_vector_mask_operation_p): New function. (vect_determine_mask_precision): Likewise. (vect_determine_stmt_precisions): Call it. From-SVN: r278850
2019-11-29Make vect_get_mask_type_for_stmt take a group sizeRichard Sandiford1-2/+2
This patch makes vect_get_mask_type_for_stmt and get_mask_type_for_scalar_type take a group size instead of the SLP node, so that later patches can call it before an SLP node has been built. 2019-11-29 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (get_mask_type_for_scalar_type): Replace the slp_tree parameter with a group size parameter. (vect_get_mask_type_for_stmt): Likewise. * tree-vect-stmts.c (get_mask_type_for_scalar_type): Likewise. (vect_get_mask_type_for_stmt): Likewise. * tree-vect-slp.c (vect_slp_analyze_node_operations_1): Update call accordingly. From-SVN: r278849
2019-11-16Optionally pick the cheapest loop_vec_infoRichard Sandiford1-0/+7
This patch adds a mode in which the vectoriser tries each available base vector mode and picks the one with the lowest cost. The new behaviour is selected by autovectorize_vector_modes. The patch keeps the current behaviour of preferring a VF of loop->simdlen over any larger or smaller VF, regardless of costs or target preferences. 2019-11-16 Richard Sandiford <richard.sandiford@arm.com> gcc/ * target.h (VECT_COMPARE_COSTS): New constant. * target.def (autovectorize_vector_modes): Return a bitmask of flags. * doc/tm.texi: Regenerate. * targhooks.h (default_autovectorize_vector_modes): Update accordingly. * targhooks.c (default_autovectorize_vector_modes): Likewise. * config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes): Likewise. * config/arc/arc.c (arc_autovectorize_vector_modes): Likewise. * config/arm/arm.c (arm_autovectorize_vector_modes): Likewise. * config/i386/i386.c (ix86_autovectorize_vector_modes): Likewise. * config/mips/mips.c (mips_autovectorize_vector_modes): Likewise. * tree-vectorizer.h (_loop_vec_info::vec_outside_cost) (_loop_vec_info::vec_inside_cost): New member variables. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them. (vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions. (vect_analyze_loop): When autovectorize_vector_modes returns VECT_COMPARE_COSTS, try vectorizing the loop with each available vector mode and picking the one with the lowest cost. (vect_estimate_min_profitable_iters): Record the computed costs in the loop_vec_info. From-SVN: r278336
2019-11-16Extend can_duplicate_and_interleave_p to mixed-size vectorsRichard Sandiford1-2/+1
This patch makes can_duplicate_and_interleave_p cope with mixtures of vector sizes, by using queries based on get_vectype_for_scalar_type instead of directly querying GET_MODE_SIZE (vinfo->vector_mode). int_mode_for_size is now the first check we do for a candidate mode, so it seemed better to restrict it to MAX_FIXED_MODE_SIZE. This avoids unnecessary work and avoids trying to create scalar types that the target might not support. 2019-11-16 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (can_duplicate_and_interleave_p): Take an element type rather than an element mode. * tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise. Use get_vectype_for_scalar_type to query the natural types for a given element type rather than basing everything on GET_MODE_SIZE (vinfo->vector_mode). Limit int_mode_for_size query to MAX_FIXED_MODE_SIZE. (duplicate_and_interleave): Update call accordingly. * tree-vect-loop.c (vectorizable_reduction): Likewise. From-SVN: r278335
2019-11-16Apply maximum nunits for BB SLPRichard Sandiford1-4/+5
The BB vectoriser picked vector types in the same way as the loop vectoriser: it picked a vector mode/size for the region and then based all the vector types off that choice. This meant we could end up trying to use vector types that had too many elements for the group size. The main part of this patch is therefore about passing the SLP group size down to routines like get_vectype_for_scalar_type and ensuring that each vector type in the SLP tree is chosen wrt the group size. That part in itself is pretty easy and mechanical. The main warts are: (1) We normally pick a STMT_VINFO_VECTYPE for data references at an early stage (vect_analyze_data_refs). However, nothing in the BB vectoriser relied on this, or on the min_vf calculated from it. I couldn't see anything other than vect_recog_bool_pattern that tried to access the vector type before the SLP tree is built. (2) It's possible for the same statement to be used in groups of different sizes. Taking the group size into account meant that we could try to pick different vector types for the same statement. This problem should go away with the move to doing everything on SLP trees, where presumably we would attach the vector type to the SLP node rather than the stmt_vec_info. Until then, the patch just uses a first-come, first-served approach. (3) A similar problem exists for grouped data references, where different statements in the same dataref group could be used in SLP nodes that have different group sizes. The patch copes with that by making sure that all vector types in a dataref group remain consistent. The patch means that: void f (int *x, short *y) { x[0] += y[0]; x[1] += y[1]; x[2] += y[2]; x[3] += y[3]; } now produces: ldr q0, [x0] ldr d1, [x1] saddw v0.4s, v0.4s, v1.4h str q0, [x0] ret instead of: ldrsh w2, [x1] ldrsh w3, [x1, 2] fmov s0, w2 ldrsh w2, [x1, 4] ldrsh w1, [x1, 6] ins v0.s[1], w3 ldr q1, [x0] ins v0.s[2], w2 ins v0.s[3], w1 add v0.4s, v0.4s, v1.4s str q0, [x0] ret Unfortunately it also means we start to vectorise gcc.target/i386/pr84101.c for -m32. That seems like a target cost issue though; see PR92265 for details. 2019-11-16 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (vect_get_vector_types_for_stmt): Take an optional maximum nunits. (get_vectype_for_scalar_type): Likewise. Also declare a form that takes an slp_tree. (get_mask_type_for_scalar_type): Take an optional slp_tree. (vect_get_mask_type_for_stmt): Likewise. * tree-vect-data-refs.c (vect_analyze_data_refs): Don't store the vector type in STMT_VINFO_VECTYPE for BB vectorization. * tree-vect-patterns.c (vect_recog_bool_pattern): Use vect_get_vector_types_for_stmt instead of STMT_VINFO_VECTYPE to get an assumed vector type for data references. * tree-vect-slp.c (vect_update_shared_vectype): New function. (vect_update_all_shared_vectypes): Likewise. (vect_build_slp_tree_1): Pass the group size to vect_get_vector_types_for_stmt. Use vect_update_shared_vectype for BB vectorization. (vect_build_slp_tree_2): Call vect_update_all_shared_vectypes before building the vectof from scalars. (vect_analyze_slp_instance): Pass the group size to get_vectype_for_scalar_type. (vect_slp_analyze_node_operations_1): Don't recompute the vector types for BB vectorization here; just handle the case in which we deferred the choice for booleans. (vect_get_constant_vectors): Pass the slp_tree to get_vectype_for_scalar_type. * tree-vect-stmts.c (vect_prologue_cost_for_slp_op): Likewise. (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (vectorizable_conversion): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_comparison): Likewise. (vect_is_simple_cond): Take the slp_tree as argument and pass it to get_vectype_for_scalar_type. (vectorizable_condition): Update call accordingly. (get_vectype_for_scalar_type): Take a group_size argument. For BB vectorization, limit the the vector to that number of elements. Also define an overload that takes an slp_tree. (get_mask_type_for_scalar_type): Add an slp_tree argument and pass it to get_vectype_for_scalar_type. (vect_get_vector_types_for_stmt): Add a group_size argument and pass it to get_vectype_for_scalar_type. Don't use the cached vector type for BB vectorization if a group size is given. Handle data references in that case. (vect_get_mask_type_for_stmt): Take an slp_tree argument and pass it to get_mask_type_for_scalar_type. gcc/testsuite/ * gcc.dg/vect/bb-slp-4.c: Expect the block to be vectorized with -fno-vect-cost-model. * gcc.dg/vect/bb-slp-bool-1.c: New test. * gcc.target/aarch64/vect_mixed_sizes_14.c: Likewise. * gcc.target/i386/pr84101.c: XFAIL for -m32. From-SVN: r278334
2019-11-14Avoid retrying with the same vector modesRichard Sandiford1-0/+5
A later patch makes the AArch64 port add four entries to autovectorize_vector_modes. Each entry describes a different vector mode assignment for vector code that mixes 8-bit, 16-bit, 32-bit and 64-bit elements. But if (as usual) the vector code has fewer element sizes than that, we could end up trying the same combination of vector modes multiple times. This patch adds a check to prevent that. 2019-11-14 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (vec_info::mode_set): New typedef. (vec_info::used_vector_mode): New member variable. (vect_chooses_same_modes_p): Declare. * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each chosen vector mode in vec_info::used_vector_mode. (vect_chooses_same_modes_p): New function. * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying the same vector statements multiple times. * tree-vect-slp.c (vect_slp_bb_region): Likewise. From-SVN: r278242
2019-11-14Support vectorisation with mixed vector sizesRichard Sandiford1-3/+5
After previous patches, it's now possible to make the vectoriser support multiple vector sizes in the same vector region, using related_vector_mode to pick the right vector mode for a given element mode. No port yet takes advantage of this, but I have a follow-on patch for AArch64. This patch also seemed like a good opportunity to add some more dump messages: one to make it clear which vector size/mode was being used when analysis passed or failed, and another to say when we've decided to skip a redundant vector size/mode. 2019-11-14 Richard Sandiford <richard.sandiford@arm.com> gcc/ * machmode.h (opt_machine_mode::operator==): New function. (opt_machine_mode::operator!=): Likewise. * tree-vectorizer.h (vec_info::vector_mode): Update comment. (get_related_vectype_for_scalar_type): Delete. (get_vectype_for_scalar_type_and_size): Declare. * tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say whether analysis passed or failed, and with what vector modes. Use related_vector_mode to check whether trying a particular vector mode would be redundant with the autodetected mode, and print a dump message if we decide to skip it. * tree-vect-loop.c (vect_analyze_loop): Likewise. (vect_create_epilog_for_reduction): Use get_related_vectype_for_scalar_type instead of get_vectype_for_scalar_type_and_size. * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace with... (get_related_vectype_for_scalar_type): ...this new function. Take a starting/"prevailing" vector mode rather than a vector size. Take an optional nunits argument, with the same meaning as for related_vector_mode. Use related_vector_mode when not auto-detecting a mode, falling back to mode_for_vector if no target mode exists. (get_vectype_for_scalar_type): Update accordingly. (get_same_sized_vectype): Likewise. * tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise. From-SVN: r278240
2019-11-14Replace vec_info::vector_size with vec_info::vector_modeRichard Sandiford1-3/+3
This patch replaces vec_info::vector_size with vec_info::vector_mode, but for now continues to use it as a way of specifying a single vector size. This makes it easier for later patches to use related_vector_mode instead. 2019-11-14 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (vec_info::vector_size): Replace with... (vec_info::vector_mode): ...this new field. * tree-vect-loop.c (vect_update_vf_for_slp): Update accordingly. (vect_analyze_loop, vect_transform_loop): Likewise. * tree-vect-loop-manip.c (vect_do_peeling): Likewise. * tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise. (vect_make_slp_decision, vect_slp_bb_region): Likewise. * tree-vect-stmts.c (get_vectype_for_scalar_type): Likewise. * tree-vectorizer.c (try_vectorize_loop_1): Likewise. gcc/testsuite/ * gcc.dg/vect/vect-tail-nomask-1.c: Update expected epilogue vectorization message. From-SVN: r278237
2019-11-14Add build_truth_vector_type_for_modeRichard Sandiford1-5/+5
Callers of vect_halve_mask_nunits and vect_double_mask_nunits already know what mode the resulting vector type should have, so we might as well create the vector type directly with that mode, just like build_vector_type_for_mode lets us build normal vectors with a known mode. This avoids the current awkwardness of having to recompute the mode starting from vec_info::vector_size, which hard-codes the assumption that all vectors have to be the same size. A later patch gets rid of build_truth_vector_type and build_same_sized_truth_vector_type, so the net effect of the series is to reduce the number of type functions by one. 2019-11-14 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree.h (build_truth_vector_type_for_mode): Declare. * tree.c (build_truth_vector_type_for_mode): New function, split out from... (build_truth_vector_type): ...here. (build_opaque_vector_type): Fix head comment. * tree-vectorizer.h (supportable_narrowing_operation): Remove vec_info parameter. (vect_halve_mask_nunits): Replace vec_info parameter with the mode of the new vector. (vect_double_mask_nunits): Likewise. * tree-vect-loop.c (vect_halve_mask_nunits): Likewise. (vect_double_mask_nunits): Likewise. * tree-vect-loop-manip.c: Include insn-config.h, rtl.h and recog.h. (vect_maybe_permute_loop_masks): Remove vinfo parameter. Update call to vect_halve_mask_nunits, getting the required mode from the unpack patterns. (vect_set_loop_condition_masked): Update call accordingly. * tree-vect-stmts.c (supportable_narrowing_operation): Remove vec_info parameter and update call to vect_double_mask_nunits. (vectorizable_conversion): Update call accordingly. (simple_integer_narrowing): Likewise. Remove vec_info parameter. (vectorizable_call): Update call accordingly. (supportable_widening_operation): Update call to vect_halve_mask_nunits. * config/aarch64/aarch64-sve-builtins.cc (register_builtin_types): Use build_truth_vector_type_mode instead of build_truth_vector_type. From-SVN: r278231
2019-11-13Avoid accounting for non-existent vector loop versioningRichard Sandiford1-0/+11
vect_analyze_loop_costing uses two profitability thresholds: a runtime one and a static compile-time one. The runtime one is simply the point at which the vector loop is cheaper than the scalar loop, while the static one also takes into account the cost of choosing between the scalar and vector loops at runtime. We compare this static cost against the expected execution frequency to decide whether it's worth generating any vector code at all. However, we never reclaimed the cost of applying the runtime threshold if it turned out that the vector code can always be used. And we only know whether that's true once we've calculated what the runtime threshold would be. 2019-11-13 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (vect_apply_runtime_profitability_check_p): New function. * tree-vect-loop-manip.c (vect_loop_versioning): Use it. * tree-vect-loop.c (vect_analyze_loop_2): Likewise. (vect_transform_loop): Likewise. (vect_analyze_loop_costing): Don't take the cost of versioning into account for the static profitability threshold if it turns out that no versioning is needed. From-SVN: r278124
2019-11-13Don't assign a cost to vectorizable_assignmentRichard Sandiford1-0/+1
vectorizable_assignment handles true SSA-to-SSA copies (which hopefully we don't see in practice) and no-op conversions that are required to maintain correct gimple, such as changes between signed and unsigned types. These cases shouldn't generate any code and so shouldn't count against either the scalar or vector costs. Later patches test this, but it seemed worth splitting out. 2019-11-13 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (vect_nop_conversion_p): Declare. * tree-vect-stmts.c (vect_nop_conversion_p): New function. (vectorizable_assignment): Don't add a cost for nop conversions. * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Likewise. * tree-vect-slp.c (vect_bb_slp_scalar_cost): Likewise. From-SVN: r278122
2019-11-12Apply mechanical replacement (generated patch).Martin Liska1-1/+1
2019-11-12 Martin Liska <mliska@suse.cz> * asan.c (asan_sanitize_stack_p): Replace old parameter syntax with the new one, include opts.h if needed. Use SET_OPTION_IF_UNSET macro. (asan_sanitize_allocas_p): Likewise. (asan_emit_stack_protection): Likewise. (asan_protect_global): Likewise. (instrument_derefs): Likewise. (instrument_builtin_call): Likewise. (asan_expand_mark_ifn): Likewise. * auto-profile.c (auto_profile): Likewise. * bb-reorder.c (copy_bb_p): Likewise. (duplicate_computed_gotos): Likewise. * builtins.c (inline_expand_builtin_string_cmp): Likewise. * cfgcleanup.c (try_crossjump_to_edge): Likewise. (try_crossjump_bb): Likewise. * cfgexpand.c (defer_stack_allocation): Likewise. (stack_protect_classify_type): Likewise. (pass_expand::execute): Likewise. * cfgloopanal.c (expected_loop_iterations_unbounded): Likewise. (estimate_reg_pressure_cost): Likewise. * cgraph.c (cgraph_edge::maybe_hot_p): Likewise. * combine.c (combine_instructions): Likewise. (record_value_for_reg): Likewise. * common/config/aarch64/aarch64-common.c (aarch64_option_validate_param): Likewise. (aarch64_option_default_params): Likewise. * common/config/ia64/ia64-common.c (ia64_option_default_params): Likewise. * common/config/powerpcspe/powerpcspe-common.c (rs6000_option_default_params): Likewise. * common/config/rs6000/rs6000-common.c (rs6000_option_default_params): Likewise. * common/config/sh/sh-common.c (sh_option_default_params): Likewise. * config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Likewise. (aarch64_allocate_and_probe_stack_space): Likewise. (aarch64_expand_epilogue): Likewise. (aarch64_override_options_internal): Likewise. * config/alpha/alpha.c (alpha_option_override): Likewise. * config/arm/arm.c (arm_option_override): Likewise. (arm_valid_target_attribute_p): Likewise. * config/i386/i386-options.c (ix86_option_override_internal): Likewise. * config/i386/i386.c (get_probe_interval): Likewise. (ix86_adjust_stack_and_probe_stack_clash): Likewise. (ix86_max_noce_ifcvt_seq_cost): Likewise. * config/ia64/ia64.c (ia64_adjust_cost): Likewise. * config/rs6000/rs6000-logue.c (get_stack_clash_protection_probe_interval): Likewise. (get_stack_clash_protection_guard_size): Likewise. * config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise. * config/s390/s390.c (allocate_stack_space): Likewise. (s390_emit_prologue): Likewise. (s390_option_override_internal): Likewise. * config/sparc/sparc.c (sparc_option_override): Likewise. * config/visium/visium.c (visium_option_override): Likewise. * coverage.c (get_coverage_counts): Likewise. (coverage_compute_profile_id): Likewise. (coverage_begin_function): Likewise. (coverage_end_function): Likewise. * cse.c (cse_find_path): Likewise. (cse_extended_basic_block): Likewise. (cse_main): Likewise. * cselib.c (cselib_invalidate_mem): Likewise. * dse.c (dse_step1): Likewise. * emit-rtl.c (set_new_first_and_last_insn): Likewise. (get_max_insn_count): Likewise. (make_debug_insn_raw): Likewise. (init_emit): Likewise. * explow.c (compute_stack_clash_protection_loop_data): Likewise. * final.c (compute_alignments): Likewise. * fold-const.c (fold_range_test): Likewise. (fold_truth_andor): Likewise. (tree_single_nonnegative_warnv_p): Likewise. (integer_valued_real_single_p): Likewise. * gcse.c (want_to_gcse_p): Likewise. (prune_insertions_deletions): Likewise. (hoist_code): Likewise. (gcse_or_cprop_is_too_expensive): Likewise. * ggc-common.c: Likewise. * ggc-page.c (ggc_collect): Likewise. * gimple-loop-interchange.cc (MAX_NUM_STMT): Likewise. (MAX_DATAREFS): Likewise. (OUTER_STRIDE_RATIO): Likewise. * gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise. * gimple-loop-versioning.cc (loop_versioning::max_insns_for_loop): Likewise. * gimple-ssa-split-paths.c (is_feasible_trace): Likewise. * gimple-ssa-store-merging.c (imm_store_chain_info::try_coalesce_bswap): Likewise. (imm_store_chain_info::coalesce_immediate_stores): Likewise. (imm_store_chain_info::output_merged_store): Likewise. (pass_store_merging::process_store): Likewise. * gimple-ssa-strength-reduction.c (find_basis_for_base_expr): Likewise. * graphite-isl-ast-to-gimple.c (class translate_isl_ast_to_gimple): Likewise. (scop_to_isl_ast): Likewise. * graphite-optimize-isl.c (get_schedule_for_node_st): Likewise. (optimize_isl): Likewise. * graphite-scop-detection.c (build_scops): Likewise. * haifa-sched.c (set_modulo_params): Likewise. (rank_for_schedule): Likewise. (model_add_to_worklist): Likewise. (model_promote_insn): Likewise. (model_choose_insn): Likewise. (queue_to_ready): Likewise. (autopref_multipass_dfa_lookahead_guard): Likewise. (schedule_block): Likewise. (sched_init): Likewise. * hsa-gen.c (init_prologue): Likewise. * ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Likewise. (cond_move_process_if_block): Likewise. * ipa-cp.c (ipcp_lattice::add_value): Likewise. (merge_agg_lats_step): Likewise. (devirtualization_time_bonus): Likewise. (hint_time_bonus): Likewise. (incorporate_penalties): Likewise. (good_cloning_opportunity_p): Likewise. (ipcp_propagate_stage): Likewise. * ipa-fnsummary.c (decompose_param_expr): Likewise. (set_switch_stmt_execution_predicate): Likewise. (analyze_function_body): Likewise. (compute_fn_summary): Likewise. * ipa-inline-analysis.c (estimate_growth): Likewise. * ipa-inline.c (caller_growth_limits): Likewise. (inline_insns_single): Likewise. (inline_insns_auto): Likewise. (can_inline_edge_by_limits_p): Likewise. (want_early_inline_function_p): Likewise. (big_speedup_p): Likewise. (want_inline_small_function_p): Likewise. (want_inline_self_recursive_call_p): Likewise. (edge_badness): Likewise. (recursive_inlining): Likewise. (compute_max_insns): Likewise. (early_inliner): Likewise. * ipa-polymorphic-call.c (csftc_abort_walking_p): Likewise. * ipa-profile.c (ipa_profile): Likewise. * ipa-prop.c (determine_known_aggregate_parts): Likewise. (ipa_analyze_node): Likewise. (ipcp_transform_function): Likewise. * ipa-split.c (consider_split): Likewise. * ipa-sra.c (allocate_access): Likewise. (process_scan_results): Likewise. (ipa_sra_summarize_function): Likewise. (pull_accesses_from_callee): Likewise. * ira-build.c (loop_compare_func): Likewise. (mark_loops_for_removal): Likewise. * ira-conflicts.c (build_conflict_bit_table): Likewise. * loop-doloop.c (doloop_optimize): Likewise. * loop-invariant.c (gain_for_invariant): Likewise. (move_loop_invariants): Likewise. * loop-unroll.c (decide_unroll_constant_iterations): Likewise. (decide_unroll_runtime_iterations): Likewise. (decide_unroll_stupid): Likewise. (expand_var_during_unrolling): Likewise. * lra-assigns.c (spill_for): Likewise. * lra-constraints.c (EBB_PROBABILITY_CUTOFF): Likewise. * modulo-sched.c (sms_schedule): Likewise. (DFA_HISTORY): Likewise. * opts.c (default_options_optimization): Likewise. (finish_options): Likewise. (common_handle_option): Likewise. * postreload-gcse.c (eliminate_partially_redundant_load): Likewise. (if): Likewise. * predict.c (get_hot_bb_threshold): Likewise. (maybe_hot_count_p): Likewise. (probably_never_executed): Likewise. (predictable_edge_p): Likewise. (predict_loops): Likewise. (expr_expected_value_1): Likewise. (tree_predict_by_opcode): Likewise. (handle_missing_profiles): Likewise. * reload.c (find_equiv_reg): Likewise. * reorg.c (redundant_insn): Likewise. * resource.c (mark_target_live_regs): Likewise. (incr_ticks_for_insn): Likewise. * sanopt.c (pass_sanopt::execute): Likewise. * sched-deps.c (sched_analyze_1): Likewise. (sched_analyze_2): Likewise. (sched_analyze_insn): Likewise. (deps_analyze_insn): Likewise. * sched-ebb.c (schedule_ebbs): Likewise. * sched-rgn.c (find_single_block_region): Likewise. (too_large): Likewise. (haifa_find_rgns): Likewise. (extend_rgns): Likewise. (new_ready): Likewise. (schedule_region): Likewise. (sched_rgn_init): Likewise. * sel-sched-ir.c (make_region_from_loop): Likewise. * sel-sched-ir.h (MAX_WS): Likewise. * sel-sched.c (process_pipelined_exprs): Likewise. (sel_setup_region_sched_flags): Likewise. * shrink-wrap.c (try_shrink_wrapping): Likewise. * targhooks.c (default_max_noce_ifcvt_seq_cost): Likewise. * toplev.c (print_version): Likewise. (process_options): Likewise. * tracer.c (tail_duplicate): Likewise. * trans-mem.c (tm_log_add): Likewise. * tree-chrec.c (chrec_fold_plus_1): Likewise. * tree-data-ref.c (split_constant_offset): Likewise. (compute_all_dependences): Likewise. * tree-if-conv.c (MAX_PHI_ARG_NUM): Likewise. * tree-inline.c (remap_gimple_stmt): Likewise. * tree-loop-distribution.c (MAX_DATAREFS_NUM): Likewise. * tree-parloops.c (MIN_PER_THREAD): Likewise. (create_parallel_loop): Likewise. * tree-predcom.c (determine_unroll_factor): Likewise. * tree-scalar-evolution.c (instantiate_scev_r): Likewise. * tree-sra.c (analyze_all_variable_accesses): Likewise. * tree-ssa-ccp.c (fold_builtin_alloca_with_align): Likewise. * tree-ssa-dse.c (setup_live_bytes_from_ref): Likewise. (dse_optimize_redundant_stores): Likewise. (dse_classify_store): Likewise. * tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise. * tree-ssa-loop-ch.c (ch_base::copy_headers): Likewise. * tree-ssa-loop-im.c (LIM_EXPENSIVE): Likewise. * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Likewise. (try_peel_loop): Likewise. (tree_unroll_loops_completely): Likewise. * tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise. (CONSIDER_ALL_CANDIDATES_BOUND): Likewise. (MAX_CONSIDERED_GROUPS): Likewise. (ALWAYS_PRUNE_CAND_SET_BOUND): Likewise. * tree-ssa-loop-manip.c (can_unroll_loop_p): Likewise. * tree-ssa-loop-niter.c (MAX_ITERATIONS_TO_TRACK): Likewise. * tree-ssa-loop-prefetch.c (PREFETCH_BLOCK): Likewise. (L1_CACHE_SIZE_BYTES): Likewise. (L2_CACHE_SIZE_BYTES): Likewise. (should_issue_prefetch_p): Likewise. (schedule_prefetches): Likewise. (determine_unroll_factor): Likewise. (volume_of_references): Likewise. (add_subscript_strides): Likewise. (self_reuse_distance): Likewise. (mem_ref_count_reasonable_p): Likewise. (insn_to_prefetch_ratio_too_small_p): Likewise. (loop_prefetch_arrays): Likewise. (tree_ssa_prefetch_arrays): Likewise. * tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Likewise. * tree-ssa-math-opts.c (gimple_expand_builtin_pow): Likewise. (convert_mult_to_fma): Likewise. (math_opts_dom_walker::after_dom_children): Likewise. * tree-ssa-phiopt.c (cond_if_else_store_replacement): Likewise. (hoist_adjacent_loads): Likewise. (gate_hoist_loads): Likewise. * tree-ssa-pre.c (translate_vuse_through_block): Likewise. (compute_partial_antic_aux): Likewise. * tree-ssa-reassoc.c (get_reassociation_width): Likewise. * tree-ssa-sccvn.c (vn_reference_lookup_pieces): Likewise. (vn_reference_lookup): Likewise. (do_rpo_vn): Likewise. * tree-ssa-scopedtables.c (avail_exprs_stack::lookup_avail_expr): Likewise. * tree-ssa-sink.c (select_best_block): Likewise. * tree-ssa-strlen.c (new_stridx): Likewise. (new_addr_stridx): Likewise. (get_range_strlen_dynamic): Likewise. (class ssa_name_limit_t): Likewise. * tree-ssa-structalias.c (push_fields_onto_fieldstack): Likewise. (create_variable_info_for_1): Likewise. (init_alias_vars): Likewise. * tree-ssa-tail-merge.c (find_clusters_1): Likewise. (tail_merge_optimize): Likewise. * tree-ssa-threadbackward.c (thread_jumps::profitable_jump_thread_path): Likewise. (thread_jumps::fsm_find_control_statement_thread_paths): Likewise. (thread_jumps::find_jump_threads_backwards): Likewise. * tree-ssa-threadedge.c (record_temporary_equivalences_from_stmts_at_dest): Likewise. * tree-ssa-uninit.c (compute_control_dep_chain): Likewise. * tree-switch-conversion.c (switch_conversion::check_range): Likewise. (jump_table_cluster::can_be_handled): Likewise. * tree-switch-conversion.h (jump_table_cluster::case_values_threshold): Likewise. (SWITCH_CONVERSION_BRANCH_RATIO): Likewise. (param_switch_conversion_branch_ratio): Likewise. * tree-vect-data-refs.c (vect_mark_for_runtime_alias_test): Likewise. (vect_enhance_data_refs_alignment): Likewise. (vect_prune_runtime_alias_test_list): Likewise. * tree-vect-loop.c (vect_analyze_loop_costing): Likewise. (vect_get_datarefs_in_loop): Likewise. (vect_analyze_loop): Likewise. * tree-vect-slp.c (vect_slp_bb): Likewise. * tree-vectorizer.h: Likewise. * tree-vrp.c (find_switch_asserts): Likewise. (vrp_prop::check_mem_ref): Likewise. * tree.c (wide_int_to_tree_1): Likewise. (cache_integer_cst): Likewise. * var-tracking.c (EXPR_USE_DEPTH): Likewise. (reverse_op): Likewise. (vt_find_locations): Likewise. 2019-11-12 Martin Liska <mliska@suse.cz> * gimple-parser.c (c_parser_parse_gimple_body): Replace old parameter syntax with the new one, include opts.h if needed. Use SET_OPTION_IF_UNSET macro. 2019-11-12 Martin Liska <mliska@suse.cz> * name-lookup.c (namespace_hints::namespace_hints): Replace old parameter syntax with the new one, include opts.h if needed. Use SET_OPTION_IF_UNSET macro. * typeck.c (comptypes): Likewise. 2019-11-12 Martin Liska <mliska@suse.cz> * lto-partition.c (lto_balanced_map): Replace old parameter syntax with the new one, include opts.h if needed. Use SET_OPTION_IF_UNSET macro. * lto.c (do_whole_program_analysis): Likewise. From-SVN: r278085
2019-11-08re PR tree-optimization/92324 (ICE in expand_direct_optab_fn, at ↵Richard Biener1-0/+4
internal-fn.c:2890) 2019-11-08 Richard Biener <rguenther@suse.de> PR tree-optimization/92324 * tree-vect-loop.c (vect_create_epilog_for_reduction): Use STMT_VINFO_REDUC_VECTYPE for all computations, inserting sign-conversions as necessary. (vectorizable_reduction): Reject conversions in the chain that are not sign-conversions, base analysis on a non-converting stmt and its operation sign. Set STMT_VINFO_REDUC_VECTYPE. * tree-vect-stmts.c (vect_stmt_relevant_p): Don't dump anything for debug stmts. * tree-vectorizer.h (_stmt_vec_info::reduc_vectype): New. (STMT_VINFO_REDUC_VECTYPE): Likewise. * gcc.dg/vect/pr92205.c: XFAIL. * gcc.dg/vect/pr92324-1.c: New testcase. * gcc.dg/vect/pr92324-2.c: Likewise. From-SVN: r277955
2019-11-08Generalise gather and scatter optabsRichard Sandiford1-2/+2
The gather and scatter optabs required the vector offset to be the integer equivalent of the vector mode being loaded or stored. This patch generalises them so that the two vectors can have different element sizes, although they still need to have the same number of elements. One consequence of this is that it's possible (if unlikely) for two IFN_GATHER_LOADs to have the same arguments but different return types. E.g. the same scalar base and vector of 32-bit offsets could be used to load 8-bit elements and to load 16-bit elements. From just looking at the arguments, we could wrongly deduce that they're equivalent. I know we saw this happen at one point with IFN_WHILE_ULT, and we dealt with it there by passing a zero of the return type as an extra argument. Doing the same here also makes the load and store functions have the same argument assignment. For now this patch should be a no-op, but later SVE patches take advantage of the new flexibility. 2019-11-08 Richard Sandiford <richard.sandiford@arm.com> gcc/ * optabs.def (gather_load_optab, mask_gather_load_optab) (scatter_store_optab, mask_scatter_store_optab): Turn into conversion optabs, with the offset mode given explicitly. * doc/md.texi: Update accordingly. * config/aarch64/aarch64-sve-builtins-base.cc (svld1_gather_impl::expand): Likewise. (svst1_scatter_impl::expand): Likewise. * internal-fn.c (gather_load_direct, scatter_store_direct): Likewise. (expand_scatter_store_optab_fn): Likewise. (direct_gather_load_optab_supported_p): Likewise. (direct_scatter_store_optab_supported_p): Likewise. (expand_gather_load_optab_fn): Likewise. Expect the mask argument to be argument 4. (internal_fn_mask_index): Return 4 for IFN_MASK_GATHER_LOAD. (internal_gather_scatter_fn_supported_p): Replace the offset sign argument with the offset vector type. Require the two vector types to have the same number of elements but allow their element sizes to be different. Treat the optabs as conversion optabs. * internal-fn.h (internal_gather_scatter_fn_supported_p): Update prototype accordingly. * optabs-query.c (supports_at_least_one_mode_p): Replace with... (supports_vec_convert_optab_p): ...this new function. (supports_vec_gather_load_p): Update accordingly. (supports_vec_scatter_store_p): Likewise. * tree-vectorizer.h (vect_gather_scatter_fn_p): Take a vec_info. Replace the offset sign and bits parameters with a scalar type tree. * tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise. Pass back the offset vector type instead of the scalar element type. Allow the offset to be wider than the memory elements. Search for an offset type that the target supports, stopping once we've reached the maximum of the element size and pointer size. Update call to internal_gather_scatter_fn_supported_p. (vect_check_gather_scatter): Update calls accordingly. When testing a new scale before knowing the final offset type, check whether the scale is supported for any signed or unsigned offset type. Check whether the target supports the source and target types of a conversion before deciding whether to look through the conversion. Record the chosen offset_vectype. * tree-vect-patterns.c (vect_get_gather_scatter_offset_type): Delete. (vect_recog_gather_scatter_pattern): Get the scalar offset type directly from the gs_info's offset_vectype instead. Pass a zero of the result type to IFN_GATHER_LOAD and IFN_MASK_GATHER_LOAD. * tree-vect-stmts.c (check_load_store_masking): Update call to internal_gather_scatter_fn_supported_p, passing the offset vector type recorded in the gs_info. (vect_truncate_gather_scatter_offset): Update call to vect_check_gather_scatter, leaving it to search for a valid offset vector type. (vect_use_strided_gather_scatters_p): Convert the offset to the element type of the gs_info's offset_vectype. (vect_get_gather_scatter_ops): Get the offset vector type directly from the gs_info. (vect_get_strided_load_store_ops): Likewise. (vectorizable_load): Pass a zero of the result type to IFN_GATHER_LOAD and IFN_MASK_GATHER_LOAD. * config/aarch64/aarch64-sve.md (gather_load<mode>): Rename to... (gather_load<mode><v_int_equiv>): ...this. (mask_gather_load<mode>): Rename to... (mask_gather_load<mode><v_int_equiv>): ...this. (scatter_store<mode>): Rename to... (scatter_store<mode><v_int_equiv>): ...this. (mask_scatter_store<mode>): Rename to... (mask_scatter_store<mode><v_int_equiv>): ...this. From-SVN: r277949
2019-11-04[vect] Clean up orig_loop_vinfo from vect_analyze_loopAndre Vieira1-3/+1
gcc/ChangeLog: 2019-11-04 Andre Vieira <andre.simoesdiasvieira@arm.com> * tree-vect-loop.c (vect_analyze_loop): Remove orig_loop_vinfo parameter. * tree-vectorizer.h (vect_analyze_loop): Update declaration. * tree-vectorizer.c (try_vectorize_loop_1): Update calls to vect_analyze_loop. From-SVN: r277785
2019-11-04[SLP] SLP vectorization: vectorize vector constructorsJoel Hutton1-0/+5
gcc/ChangeLog: 2019-11-04 Joel Hutton <Joel.Hutton@arm.com> * expr.c (store_constructor): Modify to handle single element vectors. * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors. (vect_slp_check_for_constructors): New function. (vect_slp_analyze_bb_1): Call new function to check for vector constructors. (vectorize_slp_instance_root_stmt): New function. (vect_schedule_slp): Call new function to vectorize root stmt of vector constructors. * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field. gcc/testsuite/ChangeLog: 2019-11-04 Joel Hutton <Joel.Hutton@arm.com> * gcc.dg/vect/bb-slp-40.c: New test. * gcc.dg/vect/bb-slp-41.c: New test. From-SVN: r277784
2019-10-29[vect]PR 88915: Vectorize epilogues when versioning loopsAndre Vieira1-1/+12
gcc/ChangeLog: 2019-10-29 Andre Vieira <andre.simoesdiasvieira@arm.com> PR 88915 * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration. * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter and make the valueize function pointer also take a void pointer. * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap around vn_valueize, to call it without a context. (process_bb): Use vn_valueize_wrapper instead of vn_valueize. * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. (~_loop_vec_info): Release epilogue_vinfos. (vect_analyze_loop_costing): Use knowledge of main VF to estimate number of iterations of epilogue. (vect_analyze_loop_2): Adapt to analyse main loop for all supported vector sizes when vect-epilogues-nomask=1. Also keep track of lowest versioning threshold needed for main loop. (vect_analyze_loop): Likewise. (find_in_mapping): New helper function. (update_epilogue_loop_vinfo): New function. (vect_transform_loop): When vectorizing epilogues re-use analysis done on main loop and call update_epilogue_loop_vinfo to update it. * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert stmts on loop preheader edge. (vect_do_peeling): Enable skip-vectors when doing loop versioning if we decided to vectorize epilogues. Update epilogues NITERS and construct ADVANCE to update epilogues data references where needed. * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. (vect_do_peeling, vect_update_inits_of_drs, determine_peel_for_niter, vect_analyze_loop): Add or update declarations. * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already created loop_vec_info's for epilogues when available. Otherwise analyse epilogue separately. From-SVN: r277569