This patch fixes the fallout that Kewen reported on Power after
the recent change to avoid unnecessary use of partial vectors.
As Kewen said, the problem is that vect_analyze_loop_2 doesn't
know how many epilogue iterations there will be, and so it
cannot make a final decision about whether the number of
iterations forces an epilogue loop to use partial vectors.
This is similar to the current situation for peeling: we don't know
during initial analysis whether an epilogue loop will itself require
peeling. Instead we decide that during vect_do_peeling, where the
final number of epilogue loop iterations is known.
The patch takes a similar approach for the decision about whether
to use partial vectors. As the comments in the patch say, the
idea is that vect_analyze_loop_2 should make peeling and partial-
vector decisions based on the assumption that the loop_vinfo will
be used as the main loop, while vect_do_peeling should make them
in the knowledge that the loop_vinfo will be used as an epilogue loop.
This allows the same analysis to be used for both cases, which we
rely on for implementing VECT_COMPARE_COSTS; see the big comment
in vect_analyze_loop for details.
I hope the patch makes the (mostly preexisting) structure a bit
more obvious. It isn't what anyone would design from scratch,
but that's the nature of working with a mature vector framework.
Arranging things this way means that vect_verify_full_masking
and vect_verify_loop_lens now become part of the “can” rather
than “will” test for partial vectors.
Also, while splitting out the logic that handles epilogues with
constant iterations, I added a check to make sure that we don't
try to use partial vectors to vectorise a single-scalar loop.
This required some changes to the Power tests.
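As a worked illustration of that single-scalar case, here is a minimal sketch in plain C with made-up numbers (the function and values are hypothetical, not from the patch): with a vectorization factor of 4 and 9 iterations, the epilogue gets exactly one iteration, and a partial vector with one active lane cannot beat the scalar statement it would replace.
/* Hypothetical sketch: VF = 4, niters = 9.  The main loop covers
   i = 0..7; the epilogue is a single scalar iteration, so using
   partial vectors for it gains nothing.  */
void
copy9 (int *restrict a, int *restrict b)
{
  int i;
  for (i = 0; i + 4 <= 9; i += 4)   /* vectorized main loop */
    for (int j = 0; j < 4; j++)
      a[i + j] = b[i + j];
  for (; i < 9; i++)                /* one-iteration epilogue: keep it scalar */
    a[i] = b[i];
}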
gcc/
* tree-vectorizer.h (determine_peel_for_niter): Delete in favor of...
(vect_determine_partial_vectors_and_peeling): ...this new function.
* tree-vect-loop-manip.c (vect_update_epilogue_niters): New function.
Reject using vector epilogue loops for single iterations. Install
the constant number of epilogue loop iterations in the associated
loop_vinfo. Rely on vect_determine_partial_vectors_and_peeling
to do the main part of the test.
(vect_do_peeling): Use vect_update_epilogue_niters to handle
epilogue loops with a known number of iterations. Skip recomputing
the number of iterations later in that case. Otherwise, use
vect_determine_partial_vectors_and_peeling to decide whether the
epilogue loop needs to use partial vectors or peeling.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the
default can_use_partial_vectors_p to false if partial-vector-usage=0.
(determine_peel_for_niter): Remove in favor of...
(vect_determine_partial_vectors_and_peeling): ...this new function,
split out from...
(vect_analyze_loop_2): ...here. Reflect the vect_verify_full_masking
and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P
rather than USING_PARTIAL_VECTORS_P.
gcc/testsuite/
* gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the
single-iteration epilogues of the 64-bit loops to be vectorized.
* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
|
|
The condition we're expecting to eventually run into isn't fully
captured by checking for CTORs; instead we can also run into the
CTOR element conversion.
2020-09-23 Richard Biener <rguenther@suse.de>
PR tree-optimization/97173
* tree-vect-loop.c (vectorizable_live_operation): Extend
assert to also cover element conversions.
* gcc.dg/vect/pr97173.c: New testcase.
|
|
This fixes a typo introduced with the last change and not noticed
because those vectorizer access macros are not type safe ...
2020-09-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/97095
* tree-vect-loop.c (vectorizable_live_operation): Get
the SLP vector type from the correct object.
* gfortran.dg/pr97095.f: New testcase.
|
|
gcc/ChangeLog
2020-09-09 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_need_peeling_or_partial_vectors_p): New
function.
(vect_analyze_loop_2): Make use of it to avoid selecting partial
vectors if no peeling is required.
(determine_peel_for_niter): Move out some logic into
'vect_need_peeling_or_partial_vectors_p'.
gcc/testsuite/ChangeLog
2020-09-09 Andrea Corallo <andrea.corallo@arm.com>
* gcc.target/aarch64/sve/cost_model_10.c: New test.
* gcc.target/aarch64/sve/clastb_8.c: Update test for new
vectorization strategy.
* gcc.target/aarch64/sve/cost_model_5.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
|
|
This removes STMT_VINFO_NUM_SLP_USES by pushing the setting of
the shared stmt_vec_info vector type to where we actually need it
which is alignment analysis and vectorizable_* analysis (where
we could eventually elide it for non-load/store operations).
In particular "uses" in the cache and in disqualified SLP
subgraphs should no longer provide conflicting vector types
this way.
2020-09-16 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_stmt_vec_info::num_slp_uses): Remove.
(STMT_VINFO_NUM_SLP_USES): Likewise.
(vect_free_slp_instance): Adjust.
(vect_update_shared_vectype): Declare.
* tree-vectorizer.c (vec_info::~vec_info): Adjust.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
(vectorizable_live_operation): Use vector type from
SLP_TREE_REPRESENTATIVE.
(vect_transform_loop): Adjust.
* tree-vect-data-refs.c (vect_slp_analyze_node_alignment):
Set the shared vector type.
* tree-vect-slp.c (vect_free_slp_tree): Remove final_p
parameter, remove STMT_VINFO_NUM_SLP_USES updating.
(vect_free_slp_instance): Adjust.
(vect_create_new_slp_node): Remove STMT_VINFO_NUM_SLP_USES
updating.
(vect_update_shared_vectype): Always compare with the
present vector type, update if NULL.
(vect_build_slp_tree_1): Do not update the shared vector
type here.
(vect_build_slp_tree_2): Adjust.
(slp_copy_subtree): Likewise.
(vect_attempt_slp_rearrange_stmts): Likewise.
(vect_analyze_slp_instance): Likewise.
(vect_analyze_slp): Likewise.
(vect_slp_analyze_node_operations_1): Update the shared
vector type.
(vect_slp_analyze_operations): Adjust.
(vect_slp_analyze_bb_1): Likewise.
|
|
This tries to improve BB vectorization dumps by providing more
precise locations. Currently the vect_location is simply the
very last stmt in a basic-block that has a location. So for
double a[4], b[4];
int x[4], y[4];
void foo()
{
  a[0] = b[0]; // line 5
  a[1] = b[1];
  a[2] = b[2];
  a[3] = b[3];
  x[0] = y[0]; // line 9
  x[1] = y[1];
  x[2] = y[2];
  x[3] = y[3];
} // line 13
we show the user with -O3 -fopt-info-vec
t.c:13:1: optimized: basic block part vectorized using 16 byte vectors
while with the patch we point to both independently vectorized
opportunities:
t.c:5:8: optimized: basic block part vectorized using 16 byte vectors
t.c:9:8: optimized: basic block part vectorized using 16 byte vectors
There's the possibility that the location regresses in case the
root stmt in the SLP instance has no location. For a SLP subgraph
with multiple entries the location also chooses one entry at random;
I'm not sure in which case we want to dump both.
Still as the plan is to extend the basic-block vectorization
scope from single basic-block to multiple ones this is a first
step to preserve something sensible.
Implementation-wise this makes both costing and code-generation
happen on the subgraphs as analyzed.
2020-09-11 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_instance::location): New method.
(vect_schedule_slp): Adjust prototype.
* tree-vectorizer.c (vec_info::remove_stmt): Adjust
the BB region begin if we removed the stmt it points to.
* tree-vect-loop.c (vect_transform_loop): Adjust.
* tree-vect-slp.c (_slp_instance::location): Implement.
(vect_analyze_slp_instance): For BB vectorization set
vect_location to that of the instance.
(vect_slp_analyze_operations): Likewise.
(vect_bb_vectorization_profitable_p): Remove wrapper.
(vect_slp_analyze_bb_1): Remove cost check here.
(vect_slp_region): Cost check and code generate subgraphs separately,
report optimized locations and missed optimizations due to
profitability for each of them.
(vect_schedule_slp): Get the vector of SLP graph entries to
vectorize as argument.
|
|
gcc/ChangeLog
2020-09-07 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Revert
dead-code removal introduced by 09fa6acd8d9 and add a comment to
clarify.
|
|
The following adds the capability to code-generate live lanes in
basic-block vectorization using lane extracts from vector stmts
rather than keeping the original scalar code around for those.
This eventually makes previously unprofitable vectorizations
profitable (the live scalar code was appropriately costed, as
the lane extracts are now); cost model aside, this
patch doesn't add or remove any basic-block vectorization
capabilities.
The patch re/ab-uses STMT_VINFO_LIVE_P in basic-block vectorization
mode to tell whether a live lane is vectorized or whether it is
provided by means of keeping the scalar code live.
The patch is a first step towards vectorizing sequences of
stmts that do not end up in stores or vector constructors though.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
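A minimal sketch of the idea, using GCC's generic vector extensions (the function and names are hypothetical, not the patch's code): the live scalar use is served by a lane extract from the vectorized group instead of keeping a scalar statement alive.
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical sketch: the live use of b[2] is extracted as a lane
   of the vectorized load instead of keeping a scalar load alive
   next to the vector code.  */
int
store_and_use (int *restrict a, int *restrict b)
{
  v4si vb = *(v4si *) b;   /* vectorized load of b[0..3] */
  *(v4si *) a = vb;        /* vectorized store group a[0..3] */
  return vb[2];            /* live lane extracted from the vector */
}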
2020-09-04 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vectorizable_live_operation): Adjust.
* tree-vect-loop.c (vectorizable_live_operation): Vectorize
live lanes out of basic-block vectorization nodes.
* tree-vect-slp.c (vect_bb_slp_mark_live_stmts): New function.
(vect_slp_analyze_operations): Analyze live lanes and their
vectorization possibility after the whole SLP graph is final.
(vect_bb_slp_scalar_cost): Adjust for vectorized live lanes.
* tree-vect-stmts.c (can_vectorize_live_stmts): Adjust.
(vect_transform_stmt): Call can_vectorize_live_stmts also for
basic-block vectorization.
* gcc.dg/vect/bb-slp-46.c: New testcase.
* gcc.dg/vect/bb-slp-47.c: Likewise.
* gcc.dg/vect/bb-slp-32.c: Adjust.
|
|
This refines the previous fix for PR96698 by re-doing how and where
we arrange for setting vectorized cycle PHI backedge values.
2020-09-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/96698
PR tree-optimization/96920
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): Remove.
(loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Remove vectorized
cycle PHI latch code.
* tree-vect-loop.c (maybe_set_vectorized_backedge_value): New
helper to set vectorized cycle PHI latch values.
(vect_transform_loop): Walk over all PHIs again after
vectorizing them, calling maybe_set_vectorized_backedge_value.
Call maybe_set_vectorized_backedge_value for each vectorized
stmt. Remove delayed update code.
* tree-vect-slp.c (vect_analyze_slp_instance): Initialize
SLP instance reduc_phis member.
(vect_schedule_slp): Set vectorized cycle PHI latch values.
* gfortran.dg/vect/pr96920.f90: New testcase.
* gcc.dg/vect/pr96920.c: Likewise.
|
|
gcc/ChangeLog
2020-09-04 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Remove
dead code as LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) is
always verified.
|
|
gcc/ada/ChangeLog:
* gcc-interface/trans.c (gigi): Set exact argument of a vector
growth function to true.
(Attribute_to_gnu): Likewise.
gcc/ChangeLog:
* alias.c (init_alias_analysis): Set exact argument of a vector
growth function to true.
* calls.c (internal_arg_pointer_based_exp_scan): Likewise.
* cfgbuild.c (find_many_sub_basic_blocks): Likewise.
* cfgexpand.c (expand_asm_stmt): Likewise.
* cfgrtl.c (rtl_create_basic_block): Likewise.
* combine.c (combine_split_insns): Likewise.
(combine_instructions): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (function_expander::add_output_operand): Likewise.
(function_expander::add_input_operand): Likewise.
(function_expander::add_integer_operand): Likewise.
(function_expander::add_address_operand): Likewise.
(function_expander::add_fixed_operand): Likewise.
* df-core.c (df_worklist_dataflow_doublequeue): Likewise.
* dwarf2cfi.c (update_row_reg_save): Likewise.
* early-remat.c (early_remat::init_block_info): Likewise.
(early_remat::finalize_candidate_indices): Likewise.
* except.c (sjlj_build_landing_pads): Likewise.
* final.c (compute_alignments): Likewise.
(grow_label_align): Likewise.
* function.c (temp_slots_at_level): Likewise.
* fwprop.c (build_single_def_use_links): Likewise.
(update_uses): Likewise.
* gcc.c (insert_wrapper): Likewise.
* genautomata.c (create_state_ainsn_table): Likewise.
(add_vect): Likewise.
(output_dead_lock_vect): Likewise.
* genmatch.c (capture_info::capture_info): Likewise.
(parser::finish_match_operand): Likewise.
* genrecog.c (optimize_subroutine_group): Likewise.
(merge_pattern_info::merge_pattern_info): Likewise.
(merge_into_decision): Likewise.
(print_subroutine_start): Likewise.
(main): Likewise.
* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Likewise.
* gimple.c (gimple_set_bb): Likewise.
* graphite-isl-ast-to-gimple.c (translate_isl_ast_node_user): Likewise.
* haifa-sched.c (sched_extend_luids): Likewise.
(extend_h_i_d): Likewise.
* insn-addr.h (insn_addresses_new): Likewise.
* ipa-cp.c (gather_context_independent_values): Likewise.
(find_more_contexts_for_caller_subset): Likewise.
* ipa-devirt.c (final_warning_record::grow_type_warnings): Likewise.
(ipa_odr_read_section): Likewise.
* ipa-fnsummary.c (evaluate_properties_for_edge): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(analyze_function_body): Likewise.
(ipa_merge_fn_summary_after_inlining): Likewise.
(read_ipa_call_summary): Likewise.
* ipa-icf.c (sem_function::bb_dict_test): Likewise.
* ipa-prop.c (ipa_alloc_node_params): Likewise.
(parm_bb_aa_status_for_bb): Likewise.
(ipa_compute_jump_functions_for_edge): Likewise.
(ipa_analyze_node): Likewise.
(update_jump_functions_after_inlining): Likewise.
(ipa_read_edge_info): Likewise.
(read_ipcp_transformation_info): Likewise.
(ipcp_transform_function): Likewise.
* ipa-reference.c (ipa_reference_write_optimization_summary): Likewise.
* ipa-split.c (execute_split_functions): Likewise.
* ira.c (find_moveable_pseudos): Likewise.
* lower-subreg.c (decompose_multiword_subregs): Likewise.
* lto-streamer-in.c (input_eh_regions): Likewise.
(input_cfg): Likewise.
(input_struct_function_base): Likewise.
(input_function): Likewise.
* modulo-sched.c (set_node_sched_params): Likewise.
(extend_node_sched_params): Likewise.
(schedule_reg_moves): Likewise.
* omp-general.c (omp_construct_simd_compare): Likewise.
* passes.c (pass_manager::create_pass_tab): Likewise.
(enable_disable_pass): Likewise.
* predict.c (determine_unlikely_bbs): Likewise.
* profile.c (compute_branch_probabilities): Likewise.
* read-rtl-function.c (function_reader::parse_block): Likewise.
* read-rtl.c (rtx_reader::read_rtx_code): Likewise.
* reg-stack.c (stack_regs_mentioned): Likewise.
* regrename.c (regrename_init): Likewise.
* rtlanal.c (T>::add_single_to_queue): Likewise.
* sched-deps.c (init_deps_data_vector): Likewise.
* sel-sched-ir.c (sel_extend_global_bb_info): Likewise.
(extend_region_bb_info): Likewise.
(extend_insn_data): Likewise.
* symtab.c (symtab_node::create_reference): Likewise.
* tracer.c (tail_duplicate): Likewise.
* trans-mem.c (tm_region_init): Likewise.
(get_bb_regions_instrumented): Likewise.
* tree-cfg.c (init_empty_tree_cfg_for_function): Likewise.
(build_gimple_cfg): Likewise.
(create_bb): Likewise.
(move_block_to_fn): Likewise.
* tree-complex.c (tree_lower_complex): Likewise.
* tree-if-conv.c (predicate_rhs_code): Likewise.
* tree-inline.c (copy_bb): Likewise.
* tree-into-ssa.c (get_ssa_name_ann): Likewise.
(mark_phi_for_rewrite): Likewise.
* tree-object-size.c (compute_builtin_object_size): Likewise.
(init_object_sizes): Likewise.
* tree-predcom.c (initialize_root_vars_store_elim_1): Likewise.
(initialize_root_vars_store_elim_2): Likewise.
(prepare_initializers_chain_store_elim): Likewise.
* tree-ssa-address.c (addr_for_mem_ref): Likewise.
(multiplier_allowed_in_address_p): Likewise.
* tree-ssa-coalesce.c (ssa_conflicts_new): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-ssa-loop-ivopts.c (addr_offset_valid_p): Likewise.
(get_address_cost_ainc): Likewise.
* tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk): Likewise.
* tree-ssa-pre.c (add_to_value): Likewise.
(phi_translate_1): Likewise.
(do_pre_regular_insertion): Likewise.
(do_pre_partial_partial_insertion): Likewise.
(init_pre): Likewise.
* tree-ssa-propagate.c (ssa_prop_init): Likewise.
(update_call_from_tree): Likewise.
* tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Likewise.
* tree-ssa-sccvn.c (vn_reference_lookup_3): Likewise.
(vn_reference_lookup_pieces): Likewise.
(eliminate_dom_walker::eliminate_push_avail): Likewise.
* tree-ssa-strlen.c (set_strinfo): Likewise.
(get_stridx_plus_constant): Likewise.
(zero_length_string): Likewise.
(find_equal_ptrs): Likewise.
(printf_strlen_execute): Likewise.
* tree-ssa-threadedge.c (set_ssa_name_value): Likewise.
* tree-ssanames.c (make_ssa_name_fn): Likewise.
* tree-streamer-in.c (streamer_read_tree_bitfields): Likewise.
* tree-vect-loop.c (vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
(vect_record_loop_len): Likewise.
(vect_get_loop_len): Likewise.
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_slp_convert_to_external): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.
(vectorizable_slp_permutation): Likewise.
* tree-vect-stmts.c (vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(scan_store_can_perm_p): Likewise.
(vectorizable_store): Likewise.
* expr.c: Likewise.
* vec.c (test_safe_grow_cleared): Likewise.
* vec.h (vec_safe_grow): Likewise.
(vec_safe_grow_cleared): Likewise.
(vl_ptr>::safe_grow): Likewise.
(vl_ptr>::safe_grow_cleared): Likewise.
* config/c6x/c6x.c (insn_set_clock): Likewise.
gcc/c/ChangeLog:
* gimple-parser.c (c_parser_gimple_compound_statement): Set exact argument of a vector
growth function to true.
gcc/cp/ChangeLog:
* class.c (build_vtbl_initializer): Set exact argument of a vector
growth function to true.
* constraint.cc (get_mapped_args): Likewise.
* decl.c (cp_maybe_mangle_decomp): Likewise.
(cp_finish_decomp): Likewise.
* parser.c (cp_parser_omp_for_loop): Likewise.
* pt.c (canonical_type_parameter): Likewise.
* rtti.c (get_pseudo_ti_init): Likewise.
gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_trans_omp_do): Set exact argument of a vector
growth function to true.
gcc/lto/ChangeLog:
* lto-common.c (lto_file_finalize): Set exact argument of a vector
growth function to true.
|
|
This fixes vectorized PHI latch edge updating and delays it until
all of the loop is code generated, to deal with the case where the
latch def is a PHI in the same block.
2020-08-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/96698
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): New.
(loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Only record
stmts to update PHI latches from, perform the update ...
* tree-vect-loop.c (vect_transform_loop): ... here after
vectorizing those PHIs.
(info_for_reduction): Properly handle non-reduction PHIs.
* gcc.dg/vect/pr96698.c: New testcase.
|
|
gcc/ChangeLog:
* tree-vect-data-refs.c (dr_group_sort_cmp): Work on
data_ref_pair.
(vect_analyze_data_ref_accesses): Work on groups.
(vect_find_stmt_data_reference): Add group_id argument and fill
up dataref_groups vector.
* tree-vect-loop.c (vect_get_datarefs_in_loop): Pass new
arguments.
(vect_analyze_loop_2): Likewise.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Pass argument.
(vect_slp_bb_region): Likewise.
(vect_slp_region): Likewise.
(vect_slp_bb): Work on the entire BB.
* tree-vectorizer.h (vect_analyze_data_ref_accesses): Add new
argument.
(vect_find_stmt_data_reference): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-38.c: Adjust pattern as we now process not a
single vectorization but 2 partial ones.
* gcc.dg/vect/bb-slp-45.c: New test.
|
|
This patch adds the cost modeling for vector with length;
it mainly follows what we generate for vector with length in
the functions vect_set_loop_controls_directly and vect_gen_len
in the worst case.
For Power, the length is expected to be in bits 0-7 (the high bits),
so we have to model the cost of shifting the length into place,
which is implemented in adjust_vect_cost_per_loop.
Bootstrapped/regtested on powerpc64le-linux-gnu (P9) with explicit
param vect-partial-vector-usage=1.
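As a hedged aside on where the shift comes from (this reflects the ISA definition of lxvl/stxvl, not code from the patch): the instructions read the byte count from bits 0-7, i.e. the most significant byte, of the length register, so a plain byte count has to be shifted left by 56 first; that extra per-loop work is what the cost adjustment models.
/* Hypothetical sketch: place a byte count in the high byte of the
   length operand, as lxvl/stxvl expect on 64-bit Power.  */
static inline unsigned long long
len_for_lxvl (unsigned long long nbytes)
{
  return nbytes << 56;   /* length lives in bits 0-7 (high byte) */
}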
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_adjust_vect_cost_per_loop): New
function.
(rs6000_finish_cost): Call rs6000_adjust_vect_cost_per_loop.
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost
modeling for vector with length.
(vect_rgroup_iv_might_wrap_p): New function, factored out from...
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): ...this.
Update function comment.
* tree-vect-stmts.c (vect_gen_len): Update function comment.
* tree-vectorizer.h (vect_rgroup_iv_might_wrap_p): New declare.
|
|
Currently the vectorizer cost modeling counts branch-taken costs for
the prologue and epilogue if the number of iterations is unknown.
But that isn't sensible if there are no peeled iterations.
This patch guards those costs under peel_iters_prologue > 0 or
peel_iters_epilogue > 0.
Bootstrapped/regtested on powerpc64le-linux-gnu and aarch64-linux-gnu.
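A minimal sketch of the guard in plain C (names and unit costs are illustrative, not the actual GCC code):
/* Hypothetical sketch: only charge a branch-taken cost when the
   corresponding peel loop actually runs.  */
static int
peel_branch_cost (int peel_iters_prologue, int peel_iters_epilogue,
                  int branch_cost)
{
  int cost = 0;
  if (peel_iters_prologue > 0)
    cost += branch_cost;   /* branch around/into the prologue */
  if (peel_iters_epilogue > 0)
    cost += branch_cost;   /* branch around/into the epilogue */
  return cost;
}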
gcc/ChangeLog:
* tree-vect-loop.c (vect_get_known_peeling_cost): Don't consider branch
taken costs for prologue and epilogue if they don't exist.
(vect_estimate_min_profitable_iters): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/cost_model_2.c: Adjust due to cost model
change.
|
|
This patch refactors the existing peel_iters_prologue and
peel_iters_epilogue cost model handling, following the structure
below suggested by Richard Sandiford:
- calculate peel_iters_prologue
- calculate peel_iters_epilogue
- add costs associated with peel_iters_prologue
- add costs associated with peel_iters_epilogue
- add costs related to branch taken/not_taken (see the sketch below).
Bootstrapped/regtested on aarch64-linux-gnu.
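A compressed sketch of that ordering with made-up numbers (hypothetical names, not the actual GCC code):
#include <stdio.h>

/* Hypothetical sketch of the restructured flow: peel counts first,
   then their costs, then the dependent branch costs.  */
int
main (void)
{
  int vf = 4, niters = 10;
  int peel_iters_prologue = 1;                                   /* step 1 */
  int peel_iters_epilogue = (niters - peel_iters_prologue) % vf; /* step 2 */
  int cost = peel_iters_prologue * 2                             /* step 3 */
             + peel_iters_epilogue * 2;                          /* step 4 */
  if (peel_iters_prologue > 0 || peel_iters_epilogue > 0)        /* step 5 */
    cost += 1;                       /* branch taken/not_taken costs */
  printf ("peeling cost: %d\n", cost);
  return 0;
}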
gcc/ChangeLog:
* tree-vect-loop.c (vect_get_known_peeling_cost): Factor out some code
to determine peel_iters_epilogue to...
(vect_get_peel_iters_epilogue): ...this new function.
(vect_estimate_min_profitable_iters): Refactor cost calculation on
peel_iters_prologue and peel_iters_epilogue.
|
|
Power9 supports the vector load/store instructions lxvl/stxvl, which
allow us to operate on partial vectors with one specific length. This
patch extends some of the current mask-based partial vectors support
code for the length-based approach and also adds some length-specific
support code.
So far it assumes that we can only have one partial vectors approach
at a time; it disables the use of partial vectors if both approaches
co-exist.
Like the description of the optabs len_load/len_store says, the
length-based approach can have two flavors: one is length in bytes,
the other is length in lanes. This patch is mainly implemented and
tested for length in bytes, but as Richard S. suggested, most of the
code considers both flavors.
This also introduces one parameter, vect-partial-vector-usage, to allow
users to control when the loop vectorizer considers using partial
vectors as an alternative to falling back to scalar code.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Set param_vect_partial_vector_usage to 0 explicitly.
* doc/invoke.texi (vect-partial-vector-usage): Document new option.
* optabs-query.c (get_len_load_store_mode): New function.
* optabs-query.h (get_len_load_store_mode): New declare.
* params.opt (vect-partial-vector-usage): New.
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Add the
handlings for vectorization using length-based partial vectors, call
vect_gen_len for length generation, and rename some variables with
items instead of scalars.
(vect_set_loop_condition_partial_vectors): Add the handlings for
vectorization using length-based partial vectors.
(vect_do_peeling): Allow remaining eiters less than epilogue vf for
LOOP_VINFO_USING_PARTIAL_VECTORS_P.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Init
epil_using_partial_vectors_p.
(_loop_vec_info::~_loop_vec_info): Call release_vec_loop_controls
for lengths destruction.
(vect_verify_loop_lens): New function.
(vect_analyze_loop): Add handlings for epilogue of loop when it's
marked to use vectorization using partial vectors.
(vect_analyze_loop_2): Add the check to allow only one vectorization
approach using partial vectorization at the same time. Check param
vect-partial-vector-usage for partial vectors decision. Mark
LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P if the epilogue is
considerable to use partial vectors. Call release_vec_loop_controls
for lengths destruction.
(vect_estimate_min_profitable_iters): Adjust for loop vectorization
using length-based partial vectors.
(vect_record_loop_mask): Init factor to 1 for vectorization using
mask-based partial vectors.
(vect_record_loop_len): New function.
(vect_get_loop_len): Likewise.
* tree-vect-stmts.c (check_load_store_for_partial_vectors): Add
checks for vectorization using length-based partial vectors. Factor
some code to lambda function get_valid_nvectors.
(vectorizable_store): Add handlings when using length-based partial
vectors.
(vectorizable_load): Likewise.
(vect_gen_len): New function.
* tree-vectorizer.h (struct rgroup_controls): Add field factor
mainly for length-based partial vectors.
(vec_loop_lens): New typedef.
(_loop_vec_info): Add lens and epil_using_partial_vectors_p.
(LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P): New macro.
(LOOP_VINFO_LENS): Likewise.
(LOOP_VINFO_FULLY_WITH_LENGTH_P): Likewise.
(vect_record_loop_len): New declare.
(vect_get_loop_len): Likewise.
(vect_gen_len): Likewise.
|
|
This patch is derived from the review of the vector with length patch
series. I relaxed the guard on LOOP_VINFO_PEELING_FOR_ALIGNMENT for
vector with length per Richard S.'s suggestion, then encountered one
failure in gcc.dg/vect/vect-ifcvt-11.c when running with param
vect-partial-vector-usage=2 enabled. The root cause is that we still
used the original niters for the loop body vectorization, which lets
the accesses go out of bounds; instead we should use
LOOP_VINFO_NITERS, which has been adjusted in vect_do_peeling by
considering the peeling count for the prologue.
Bootstrapped/regtested on aarch64-linux-gnu and powerpc64le-linux-gnu.
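A worked example with made-up numbers (a sketch, not the testcase): if the prologue peels 3 of 100 iterations for alignment, the vectorized body must run on the adjusted 97 iterations; iterating it over the original 100 walks 3 elements past the end.
/* Hypothetical sketch: the body must use the adjusted count.  */
static int
body_niters (int niters_orig, int peel_prologue)
{
  return niters_orig - peel_prologue;   /* e.g. 100 - 3 = 97 */
}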
gcc/ChangeLog:
* tree-vect-loop.c (vect_transform_loop): Use LOOP_VINFO_NITERS which
is adjusted by considering peeled prologue for non
vect_use_loop_mask_for_alignment_p cases.
|
|
This followup removes vect_verify_datarefs_alignment and its
premature cancellation of vectorization, leaving the actual
decision whether alignment is supported to the functions
deciding whether we can vectorize a load or store.
2020-07-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_verify_datarefs_alignment): Remove.
(vect_slp_analyze_and_verify_instance_alignment): Rename to ...
(vect_slp_analyze_instance_alignment): ... this.
* tree-vect-data-refs.c (verify_data_ref_alignment): Remove.
(vect_verify_datarefs_alignment): Likewise.
(vect_enhance_data_refs_alignment): Do not call
vect_verify_datarefs_alignment.
(vect_slp_analyze_node_alignment): Rename from
vect_slp_analyze_and_verify_node_alignment and do not
call verify_data_ref_alignment.
(vect_slp_analyze_instance_alignment): Rename from
vect_slp_analyze_and_verify_instance_alignment.
* tree-vect-stmts.c (vectorizable_store): Dump when
we vectorize an unaligned access.
(vectorizable_load): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Do not call
vect_verify_datarefs_alignment.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Adjust.
* gcc.dg/vect/bb-slp-10.c: Adjust.
* gcc.dg/vect/slp-45.c: Likewise.
* gcc.dg/vect/vect-109.c: Likewise.
|
|
As Richard S. suggested in the review of the vector with length patch
series, we can use one message about "partial vectors" instead of
"fully with masking". This patch updates the dumping string
and related test cases.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop.c (vect_analyze_loop_2): Update dumping string
for fully masking to be more common.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/clastb_1.c: Update dumping string.
* gcc.target/aarch64/sve/clastb_2.c: Likewise.
* gcc.target/aarch64/sve/clastb_3.c: Likewise.
* gcc.target/aarch64/sve/clastb_4.c: Likewise.
* gcc.target/aarch64/sve/clastb_5.c: Likewise.
* gcc.target/aarch64/sve/clastb_6.c: Likewise.
* gcc.target/aarch64/sve/clastb_7.c: Likewise.
|
|
This fixes computation of the insertion place for fold-left SLP
reductions where the PHIs do not have vectorized stmts. The
SLP representation isn't perfect here thus the following.
2020-06-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/95897
* tree-vectorizer.h (vectorizable_induction): Remove
unused gimple_stmt_iterator * parameter.
* tree-vect-loop.c (vectorizable_induction): Likewise.
(vect_analyze_loop_operations): Adjust.
* tree-vect-stmts.c (vect_analyze_stmt): Likewise.
(vect_transform_stmt): Likewise.
* tree-vect-slp.c (vect_schedule_slp_instance): Adjust
for fold-left reductions, clarify existing reduction case.
* gcc.dg/vect/pr95897.c: New testcase.
|
|
Minor code refactorings in tree-vect-data-refs.c and tree-vect-loop.c.
Use LOOP_VINFO_DATAREFS and LOOP_VINFO_DDRS when possible and rename
several parameters to make code more consistent.
2020-06-13 Felix Yang <felix.yang@huawei.com>
gcc/
* tree-vect-data-refs.c (vect_verify_datarefs_alignment): Rename
parameter to loop_vinfo and update uses. Use LOOP_VINFO_DATAREFS
when possible.
(vect_analyze_data_refs_alignment): Likewise, and use LOOP_VINFO_DDRS
when possible.
* tree-vect-loop.c (vect_dissolve_slp_only_groups): Use
LOOP_VINFO_DATAREFS when possible.
(update_epilogue_loop_vinfo): Likewise.
|
|
Power supports vector memory access with length (in bytes) instructions.
Like the existing full masking for SVE, it is another approach to
vectorizing the loop using partially-populated vectors.
As Richard Sandiford suggested, we should share the codes in approaches
with partial vectors if possible. This patch is to:
1) factor out two functions:
- vect_min_prec_for_max_niters
- vect_known_niters_smaller_than_vf.
2) rename four functions:
- vect_iv_limit_for_full_masking
- check_load_store_masking
- vect_set_loop_condition_masked
- vect_set_loop_condition_unmasked
3) rename macros LOOP_VINFO_MASK_COMPARE_TYPE and LOOP_VINFO_MASK_IV_TYPE.
Bootstrapped/regtested on aarch64-linux-gnu.
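A small self-contained sketch of the factored-out idea (a hypothetical implementation, not GCC's wide-int version): the compare/IV type needs just enough precision to represent the maximum number of iterations.
/* Hypothetical sketch of what vect_min_prec_for_max_niters computes:
   the smallest number of bits that can hold max_niters.  */
static unsigned
min_prec_for_max_niters (unsigned long long max_niters)
{
  unsigned prec = 1;
  while (max_niters >> prec)
    prec++;
  return prec;   /* e.g. 1000 iterations need 10 bits */
}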
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Rename
LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename
LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE.
(vect_set_loop_condition_masked): Renamed to ...
(vect_set_loop_condition_partial_vectors): ... this. Rename
LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename
vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors.
(vect_set_loop_condition_unmasked): Renamed to ...
(vect_set_loop_condition_normal): ... this.
(vect_set_loop_condition): Rename vect_set_loop_condition_unmasked to
vect_set_loop_condition_normal. Rename vect_set_loop_condition_masked
to vect_set_loop_condition_partial_vectors.
(vect_prepare_for_masked_peels): Rename LOOP_VINFO_MASK_COMPARE_TYPE
to LOOP_VINFO_RGROUP_COMPARE_TYPE.
* tree-vect-loop.c (vect_known_niters_smaller_than_vf): New, factored
out from ...
(vect_analyze_loop_costing): ... this.
(_loop_vec_info::_loop_vec_info): Rename mask_compare_type to
compare_type.
(vect_min_prec_for_max_niters): New, factored out from ...
(vect_verify_full_masking): ... this. Rename
vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors.
Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE.
Rename LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE.
(vectorizable_reduction): Update some dumpings with partial
vectors instead of fully-masked.
(vectorizable_live_operation): Likewise.
(vect_iv_limit_for_full_masking): Renamed to ...
(vect_iv_limit_for_partial_vectors): ... this.
* tree-vect-stmts.c (check_load_store_masking): Renamed to ...
(check_load_store_for_partial_vectors): ... this. Update some
dumpings with partial vectors instead of fully-masked.
(vectorizable_store): Rename check_load_store_masking to
check_load_store_for_partial_vectors.
(vectorizable_load): Likewise.
* tree-vectorizer.h (LOOP_VINFO_MASK_COMPARE_TYPE): Renamed to ...
(LOOP_VINFO_RGROUP_COMPARE_TYPE): ... this.
(LOOP_VINFO_MASK_IV_TYPE): Renamed to ...
(LOOP_VINFO_RGROUP_IV_TYPE): ... this.
(vect_iv_limit_for_full_masking): Renamed to ...
(vect_iv_limit_for_partial_vectors): ... this.
(_loop_vec_info): Rename mask_compare_type to rgroup_compare_type.
Rename iv_type to rgroup_iv_type.
|
|
Power supports vector memory access with length (in bytes) instructions.
Like the existing full masking for SVE, it is another approach to
vectorizing the loop using partially-populated vectors.
As Richard Sandiford pointed out, we can rename the rgroup struct
rgroup_masks to rgroup_controls and rename its members mask_type to
type and masks to controls to be more generic.
Besides, this patch also renames some functions like vect_set_loop_mask
to vect_set_loop_control, release_vec_loop_masks to
release_vec_loop_controls, vect_set_loop_masks_directly to
vect_set_loop_controls_directly.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_mask): Renamed to ...
(vect_set_loop_control): ... this.
(vect_maybe_permute_loop_masks): Rename rgroup_masks related things.
(vect_set_loop_masks_directly): Renamed to ...
(vect_set_loop_controls_directly): ... this. Also rename some
variables with ctrl instead of mask. Rename vect_set_loop_mask to
vect_set_loop_control.
(vect_set_loop_condition_masked): Rename rgroup_masks related things.
Also rename some variables with ctrl instead of mask.
* tree-vect-loop.c (release_vec_loop_masks): Renamed to ...
(release_vec_loop_controls): ... this. Rename rgroup_masks related
things.
(_loop_vec_info::~_loop_vec_info): Rename release_vec_loop_masks to
release_vec_loop_controls.
(can_produce_all_loop_masks_p): Rename rgroup_masks related things.
(vect_get_max_nscalars_per_iter): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
* tree-vectorizer.h (struct rgroup_masks): Renamed to ...
(struct rgroup_controls): ... this. Also rename mask_type
to type and rename masks to controls.
|
|
Power supports vector memory access with length (in bytes) instructions.
Like the existing full masking for SVE, it is another approach to
vectorizing the loop using partially-populated vectors.
As Richard Sandiford suggested, this patch updates the existing
fully_masked_p field to using_partial_vectors_p. It introduces the
macro LOOP_VINFO_USING_PARTIAL_VECTORS_P for partial-vectorization
checks, updates LOOP_VINFO_FULLY_MASKED_P to
LOOP_VINFO_USING_PARTIAL_VECTORS_P && !masks.is_empty () and still
uses it for checks specific to the mask-based partial vectors
approach.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_condition): Rename
LOOP_VINFO_FULLY_MASKED_P to LOOP_VINFO_USING_PARTIAL_VECTORS_P.
(vect_gen_vector_loop_niters): Likewise.
(vect_do_peeling): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Rename
fully_masked_p to using_partial_vectors_p.
(vect_analyze_loop_costing): Rename LOOP_VINFO_FULLY_MASKED_P to
LOOP_VINFO_USING_PARTIAL_VECTORS_P.
(determine_peel_for_niter): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_transform_loop): Likewise.
* tree-vectorizer.h (LOOP_VINFO_FULLY_MASKED_P): Updated.
(LOOP_VINFO_USING_PARTIAL_VECTORS_P): New macro.
|
|
Power supports vector memory access with length (in bytes) instructions.
Like the existing full masking for SVE, it is another approach to
vectorizing the loop using partially-populated vectors.
As Richard Sandiford pointed out, we should extend the existing flag
can_fully_mask_p to be more generic, to indicate whether we have
any chance of using partial vectors for this loop. So this patch
renames the flag to can_use_partial_vectors_p to be more
meaningful and also renames the macro LOOP_VINFO_CAN_FULLY_MASK_P
to LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Rename
can_fully_mask_p to can_use_partial_vectors_p.
(vect_analyze_loop_2): Rename LOOP_VINFO_CAN_FULLY_MASK_P to
LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. Rename saved_can_fully_mask_p
to saved_can_use_partial_vectors_p.
(vectorizable_reduction): Rename LOOP_VINFO_CAN_FULLY_MASK_P to
LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
(vectorizable_live_operation): Likewise.
* tree-vect-stmts.c (permute_vec_elements): Likewise.
(check_load_store_masking): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
* tree-vectorizer.h (LOOP_VINFO_CAN_FULLY_MASK_P): Renamed to ...
(LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P): ... this.
(_loop_vec_info): Rename can_fully_mask_p to can_use_partial_vectors_p.
|
|
The following avoids allocating stmt info structs for debug stmts.
2020-06-10 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vect_determine_vectorization_factor):
Skip debug stmts.
(_loop_vec_info::_loop_vec_info): Likewise.
(vect_update_vf_for_slp): Likewise.
(vect_analyze_loop_operations): Likewise.
(update_epilogue_loop_vinfo): Likewise.
* tree-vect-patterns.c (vect_determine_precisions): Likewise.
(vect_pattern_recog): Likewise.
* tree-vect-slp.c (vect_detect_hybrid_slp): Likewise.
(_bb_vec_info::_bb_vec_info): Likewise.
* tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized):
Likewise.
|
|
This makes {SLP_TREE,STMT_VINFO}_VEC_STMTS a vector of gimple * and
no longer allocates a stmt_vec_info for vectorizer-generated stmts,
since this is now possible after removing the only use, which was the
chaining of vector stmts via STMT_VINFO_RELATED_STMT.
This also removes all stmt_vec_info allocations done for vector
stmts; the remaining ones are for stmts in the scalar IL and for
patterns which are not part of the IL. Thus after this the stmt
UIDs inside a basic-block are suitable for dominance checking
if you ignore (or lazily fill) the zero UIDs of the vector stmts
inserted during transform. This property is ensured by a new
flag set when pattern analysis is complete.
2020-06-10 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::vec_stmts): Make it a vector
of gimple * stmts.
(_stmt_vec_info::vec_stmts): Likewise.
(vec_info::stmt_vec_info_ro): New flag.
(vect_finish_replace_stmt): Adjust declaration.
(vect_finish_stmt_generation): Likewise.
(vectorizable_induction): Likewise.
(vect_transform_reduction): Likewise.
(vectorizable_lc_phi): Likewise.
* tree-vect-data-refs.c (vect_create_data_ref_ptr): Do not
allocate stmt infos for increments.
(vect_record_grouped_load_vectors): Adjust.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise.
(vectorize_fold_left_reduction): Likewise.
(vect_transform_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_lc_phi): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
(vect_transform_loop): Likewise.
* tree-vect-patterns.c (vect_pattern_recog): Set stmt_vec_info_ro.
* tree-vect-slp.c (vect_get_slp_vect_def): Adjust.
(vect_get_slp_defs): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Likewise.
(vectorize_slp_instance_root_stmt): Likewise.
* tree-vect-stmts.c (vect_get_vec_defs_for_operand): Likewise.
(vect_finish_stmt_generation_1): Do not allocate a stmt info.
(vect_finish_replace_stmt): Do not return anything.
(vect_finish_stmt_generation): Likewise.
(vect_build_gather_load_calls): Adjust.
(vectorizable_bswap): Likewise.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vect_create_vectorized_demotion_stmts): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_scan_store): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(vect_transform_stmt): Likewise.
* tree-vectorizer.c (vec_info::vec_info): Initialize
stmt_vec_info_ro.
(vec_info::replace_stmt): Copy over stmt UID rather than
unsetting/setting a stmt info allocating a new UID.
(vec_info::set_vinfo_for_stmt): Assert !stmt_vec_info_ro.
|
|
This gets rid of the linked list of STMT_VINFO_VEC_STMT and
STMT_VINFO_RELATED_STMT in preparation for vectorized stmts no
longer needing a stmt_vec_info (just for this chaining). This
has ripple-down effects in all places we gather vectorized
defs. For this, new interfaces are introduced and used
throughout vectorization, simplifying code in a lot of places
and merging it with the SLP way of gathering vectorized
operands. There is vect_get_vec_defs as the new recommended
unified interface and vect_get_vec_defs_for_operand as one
for non-SLP operation. I've resorted to keeping the structure
of the code the same where using vect_get_vec_defs would have
been too disruptive for this already large patch.
2020-06-10 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_vfa_access_size): Adjust.
(vect_record_grouped_load_vectors): Likewise.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise.
(vectorize_fold_left_reduction): Likewise.
(vect_transform_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_lc_phi): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
(vect_transform_loop): Likewise.
* tree-vect-slp.c (vect_get_slp_defs): New function, split out
from overload.
* tree-vect-stmts.c (vect_get_vec_def_for_operand_1): Remove.
(vect_get_vec_def_for_operand): Likewise.
(vect_get_vec_def_for_stmt_copy): Likewise.
(vect_get_vec_defs_for_stmt_copy): Likewise.
(vect_get_vec_defs_for_operand): New function.
(vect_get_vec_defs): Likewise.
(vect_build_gather_load_calls): Adjust.
(vect_get_gather_scatter_ops): Likewise.
(vectorizable_bswap): Likewise.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vect_get_loop_based_defs): Remove.
(vect_create_vectorized_demotion_stmts): Adjust.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_scan_store): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(vect_transform_stmt): Adjust and remove no longer applicable
sanity checks.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
STMT_VINFO_VEC_STMTS.
(vec_info::free_stmt_vec_info): Release it.
* tree-vectorizer.h (_stmt_vec_info::vectorized_stmt): Remove.
(_stmt_vec_info::vec_stmts): Add.
(STMT_VINFO_VEC_STMT): Remove.
(STMT_VINFO_VEC_STMTS): New.
(vect_get_vec_def_for_operand_1): Remove.
(vect_get_vec_def_for_operand): Likewise.
(vect_get_vec_defs_for_stmt_copy): Likewise.
(vect_get_vec_def_for_stmt_copy): Likewise.
(vect_get_vec_defs): New overloads.
(vect_get_vec_defs_for_operand): New.
(vect_get_slp_defs): Declare.
|
|
This removes dead code left over from the reduction vectorization
refactoring last year.
2020-06-09 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vectorizable_induction): Remove dead code.
|
|
This adds vect_get_slp_vect_def to get at a SLP node's vectorized def,
abstracting away the details. It also fixes one stray failure to
use SLP_TREE_REPRESENTATIVE.
2020-05-04 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_get_slp_vect_def): Declare.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use it.
* tree-vect-stmts.c (vect_transform_stmt): Likewise.
(vect_is_simple_use): Use SLP_TREE_REPRESENTATIVE.
* tree-vect-slp.c (vect_get_slp_vect_defs): Fold into single
use ...
(vect_get_slp_defs): ... here.
(vect_get_slp_vect_def): New function.
|
|
This adds an explicit number of scalar lanes to the SLP node,
avoiding the need to dispatch between stmts/ops and eventually not
requiring those vectors at all.
2020-05-27 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::lanes): New.
(SLP_TREE_LANES): Likewise.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use it.
(vectorizable_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-slp.c (_slp_tree::_slp_tree): Initialize lanes.
(vect_create_new_slp_node): Likewise.
(slp_copy_subtree): Copy it.
(vect_optimize_slp): Use it.
(vect_slp_analyze_node_operations_1): Likewise.
(vect_slp_convert_to_external): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.
* tree-vect-stmts.c (vectorizable_load): Likewise.
(get_vectype_for_scalar_type): Likewise.
|
|
The following removes the ugly pushing of SLP_TREE_DEF_TYPE to
stmt_infos and instead makes sure to handle invariants fully
in vect_is_simple_use, plus adjusts a few places I refrained
from touching when enforcing vector types for them.
It also simplifies building SLP nodes with all external operands
from scalars by not doing that in the parent but instead not
building those from the start. That also gets rid of
vect_update_all_shared_vectypes.
2020-06-04 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_update_all_shared_vectypes): Remove.
(vect_build_slp_tree_2): Simplify building all external op
nodes from scalars.
(vect_slp_analyze_node_operations): Remove push/pop of
STMT_VINFO_DEF_TYPE.
(vect_schedule_slp_instance): Likewise.
* tree-vect-stmts.c (vect_check_store_rhs): Pass in the
stmt_info, use the vect_is_simple_use overload combining
SLP and stmt_info analysis.
(vect_is_simple_cond): Likewise.
(vectorizable_store): Adjust.
(vectorizable_condition): Likewise.
(vect_is_simple_use): Fully handle invariant SLP nodes
here. Amend stmt_info operand extraction with COND_EXPR
and masked stores.
* tree-vect-loop.c (vectorizable_reduction): Deal with
COND_EXPR representation ugliness.
|
|
This adds SLP_TREE_REPRESENTATIVE - a representative stmt-info that
is used by SLP analysis and code generation. This avoids the need
for the hack in vect_slp_rearrange_stmts which previously avoided
re-arranging stmts that might not have been isomorphic because
of operand swapping. It also plays nice with future directions of SLP
and for the foreseeable future is easier than replicating more and
more info in the SLP node as long as non-SLP is in-tree.
2020-05-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/95272
* tree-vectorizer.h (_slp_tree::representative): Add.
(SLP_TREE_REPRESENTATIVE): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Adjust SLP
node gathering.
(vectorizable_live_operation): Use the representative to
attach the reduction info to.
* tree-vect-slp.c (_slp_tree::_slp_tree): Initialize
SLP_TREE_REPRESENTATIVE.
(vect_create_new_slp_node): Likewise.
(slp_copy_subtree): Copy it.
(vect_slp_rearrange_stmts): Re-arrange even COND_EXPR stmts.
(vect_slp_analyze_node_operations_1): Pass the representative
to vect_analyze_stmt.
(vect_schedule_slp_instance): Pass the representative to
vect_transform_stmt.
* gcc.dg/vect/pr95272.c: New testcase.
|
|
This tries to enforce a set SLP_TREE_VECTYPE in vect_get_constant_vectors
and provides some infrastructure for setting it in the vectorizable_*
functions, amending those.
2020-05-22 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_is_simple_use): New overload.
(vect_maybe_update_slp_op_vectype): New.
* tree-vect-stmts.c (vect_is_simple_use): New overload
accessing operands of SLP vs. non-SLP operation transparently.
(vect_maybe_update_slp_op_vectype): New function updating
the possibly shared SLP operands vector type.
(vectorizable_operation): Be a bit more SLP vs non-SLP agnostic
using the new vect_is_simple_use overload; update SLP invariant
operand nodes vector type.
(vectorizable_comparison): Likewise.
(vectorizable_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_store): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_assignment): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
* tree-vect-slp.c (vect_get_constant_vectors): Enforce
present SLP_TREE_VECTYPE and check it matches previous
behavior.
|
|
This improves code generation with SSE2 for the testcase by
making sure to only generate a single IV when the group size
is a multiple of the vector size. It also adjusts the testcase
which was passing before.
2020-05-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/95219
* tree-vect-loop.c (vectorizable_induction): Reduce
group_size before computing the number of required IVs.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c: Adjust.
|
|
This adds a vectype parameter to add_stmt_cost which avoids the need
to pass down a (wrong) stmt_info just to carry this information.
Useful for invariants which do not have a stmt_info associated.
2020-05-13 Richard Biener <rguenther@suse.de>
* target.def (add_stmt_cost): Add new vectype parameter.
* targhooks.c (default_add_stmt_cost): Adjust.
* targhooks.h (default_add_stmt_cost): Likewise.
* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Take new
vectype parameter.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
* tree-vectorizer.h (stmt_info_for_cost::vectype): Add.
(dump_stmt_cost): Add new vectype parameter.
(add_stmt_cost): Likewise.
(record_stmt_cost): Likewise.
(record_stmt_cost): Add overload with old signature.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
Adjust.
(vect_get_known_peeling_cost): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
* tree-vectorizer.c (dump_stmt_cost): Add new vectype parameter.
* tree-vect-stmts.c (record_stmt_cost): Likewise.
(vect_prologue_cost_for_slp_op): Remove stmt_vec_info parameter
and pass down correct vectype and NULL stmt_info.
(vect_model_simple_cost): Adjust.
(vect_model_store_cost): Likewise.
|
|
This removes the SLP_INSTANCE_GROUP_SIZE member since the number of
lanes throughout a SLP subgraph is not necessarily constant.
2020-05-13 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (SLP_INSTANCE_GROUP_SIZE): Remove.
(_slp_instance::group_size): Likewise.
* tree-vect-loop.c (vectorizable_reduction): The group size
is the number of lanes in the node.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Likewise.
(vect_analyze_slp_instance): Do not set SLP_INSTANCE_GROUP_SIZE,
verify it matches the instance trees number of lanes.
(vect_slp_analyze_node_operations_1): Use the number of lanes
in the node as group size.
(vect_bb_vectorization_profitable_p): Use the instance root
number of lanes for the size of life.
(vect_schedule_slp_instance): Use the number of lanes as
group_size.
* tree-vect-stmts.c (vectorizable_load): Remove SLP instance
parameter. Use the number of lanes of the load for the group
size in the gap adjustment code.
(vect_analyze_stmt): Adjust.
(vect_transform_stmt): Likewise.
|
|
A lot of code that wants to know the number of bits in a vector
element gets that information from the element's TYPE_SIZE,
which is always equal to TYPE_SIZE_UNIT * BITS_PER_UNIT.
This doesn't work for SVE and AVX512-style packed boolean vectors,
where several elements can occupy a single byte.
This patch introduces a new pair of helpers for getting the true
(possibly sub-byte) size. I made a token attempt to convert obvious
element size calculations, but I'm sure I missed some.
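A worked example of the discrepancy (the vector layout is illustrative): a packed boolean vector with 16 elements in 2 bytes has a total TYPE_SIZE of 16 bits, so the true element size is 16 / 16 = 1 bit, while reasoning via TYPE_SIZE_UNIT * BITS_PER_UNIT would claim 8 bits per element.
/* Hypothetical sketch: true (possibly sub-byte) element size.  */
static unsigned
element_bits (unsigned vector_bits, unsigned nelts)
{
  return vector_bits / nelts;   /* 16 bits / 16 elts = 1 bit, not 8 */
}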
2020-05-12 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/94980
* tree.h (vector_element_bits, vector_element_bits_tree): Declare.
* tree.c (vector_element_bits, vector_element_bits_tree): New.
* match.pd: Use the new functions instead of determining the
vector element size directly from TYPE_SIZE(_UNIT).
* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise.
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-stmts.c (vect_is_simple_cond): Likewise.
* tree-vect-generic.c (expand_vector_piecewise): Likewise.
(expand_vector_conversion): Likewise.
(expand_vector_addition): Likewise for a TYPE_SIZE_UNIT used as
a divisor. Convert the dividend to bits to compensate.
* tree-vect-loop.c (vectorizable_live_operation): Call
vector_element_bits instead of open-coding it.
|
|
This delays the SLP permutation check to vectorizable_load and optimizes
permutations only after all SLP instances have been generated and the
vectorization factor is determined.
2020-05-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vec_info::slp_loads): New.
(vect_optimize_slp): Declare.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Do
nothing when there are no loads.
(vect_gather_slp_loads): Gather loads into a vector.
(vect_supported_load_permutation_p): Remove.
(vect_analyze_slp_instance): Do not verify permutation
validity here.
(vect_analyze_slp): Optimize permutations of reductions
after all SLP instances have been gathered and gather
all loads.
(vect_optimize_slp): New function split out from
vect_supported_load_permutation_p. Elide some permutations.
(vect_slp_analyze_bb_1): Call vect_optimize_slp.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
* tree-vect-stmts.c (vectorizable_load): Check whether
the load can be permuted. When generating code assert we can.
* gcc.dg/vect/bb-slp-pr68892.c: Adjust for not supported
SLP permutations becoming builds from scalars.
* gcc.dg/vect/bb-slp-pr78205.c: Likewise.
* gcc.dg/vect/bb-slp-34.c: Likewise.
|
|
Soonish we'll get SLP nodes which have no corresponding scalar
stmt and thus no stmt_vec_info and thus no way to get back to
the associated vec_info. This patch makes the vec_info available
as part of the APIs instead of putting that back-pointer into
the leaf data structures.
2020-05-05 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_stmt_vec_info::vinfo): Remove.
(STMT_VINFO_LOOP_VINFO): Likewise.
(STMT_VINFO_BB_VINFO): Likewise.
* tree-vect-data-refs.c: Adjust for the above, adding vec_info *
parameters and adjusting calls.
* tree-vect-loop-manip.c: Likewise.
* tree-vect-loop.c: Likewise.
* tree-vect-patterns.c: Likewise.
* tree-vect-slp.c: Likewise.
* tree-vect-stmts.c: Likewise.
* tree-vectorizer.c: Likewise.
* target.def (add_stmt_cost): Add vec_info * parameter.
* target.h (stmt_in_inner_loop_p): Likewise.
* targhooks.c (default_add_stmt_cost): Adjust.
* doc/tm.texi: Re-generate.
* config/aarch64/aarch64.c (aarch64_extending_load_p): Add
vec_info * parameter and adjust.
(aarch64_sve_adjust_stmt_cost): Likewise.
(aarch64_add_stmt_cost): Likewise.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
|
|
This patch fixes a large lmbench performance regression with
128-bit SVE, compiled in length-agnostic mode.
vect_better_loop_vinfo_p (new in GCC 10) tries to estimate whether
a new loop_vinfo is cheaper than a previous one, with an in-built
preference for the old one. For variable VF it prefers the old
loop_vinfo if it is cheaper for at least one VF. However, we have
no idea how likely that VF is in practice.
Another extreme would be to do what most of the rest of the
vectoriser does, and rely solely on the constant estimated VF.
But as noted in the comment, this means that a one-unit cost
difference would be enough to pick the new loop_vinfo,
despite the target generally preferring the old loop_vinfo
where possible. The cost model just isn't accurate enough
for that to produce good results as things stand: there might
not be any practical benefit to the new loop_vinfo at the
estimated VF, and it would be significantly worse for higher VFs.
The patch instead goes for a hacky compromise: make sure that the new
loop_vinfo is also no worse than the old loop_vinfo at double the
estimated VF. For all but trivial loops, this ensures that the
new loop_vinfo is only chosen if it is better than the old one
by a non-trivial amount at the estimated VF. It also avoids
putting too much faith in the VF estimate.
I realise this isn't great, but it's supposed to be a conservative fix
suitable for stage 4. The only affected testcases are the ones for
pr89007-*.c, where Advanced SIMD is indeed preferred for 128-bit SVE
and is no worse for 256-bit SVE.
Part of the problem here is that if the new loop_vinfo is better,
we discard the old one and never consider using it even as an
epilogue loop. This means that if we choose Advanced SIMD over SVE,
we're much more likely to have left-over scalar elements.
Another is that the estimate provided by estimated_poly_value might have
different probabilities attached. E.g. when tuning for a particular core,
the estimate is probably accurate, but when tuning for generic code,
the estimate is more of a guess. Relying solely on the estimate is
probably correct for the former but not for the latter.
Hopefully those are things that we could tackle in GCC 11.
2020-04-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vect_better_loop_vinfo_p): If old_loop_vinfo
has a variable VF, prefer new_loop_vinfo if it is cheaper for the
estimated VF and is no worse at double the estimated VF.
gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_8.c: New test.
* gcc.target/aarch64/sve/cost_model_9.c: Likewise.
* gcc.target/aarch64/sve/pr89007-1.c: Add -msve-vector-bits=512.
* gcc.target/aarch64/sve/pr89007-2.c: Likewise.
|
|
This patch fixes the stupid mistake of using gsi_insert_before
where gsi_insert_seq_before was needed.
BTW, regression testing on one x86_64 machine from the CFarm was
unable to reveal it (I guess due to the native arch being Sandy
Bridge?), so I specified the additional option -march=znver2 and
verified the coverage.
Bootstrapped/regtested on powerpc64le-linux-gnu (P9) and
x86_64-pc-linux-gnu; also verified the failing cases in the
related PRs.
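For reference, a minimal sketch of the distinction (gsi, expr and
stmts are illustrative placeholders for the insertion point):

  /* gsi_insert_before links in a single gimple statement, so a
     multi-statement gimple_seq must go through the _seq variant,
     or part of the sequence never makes it into the IL.  */
  gimple_seq stmts = NULL;
  tree val = force_gimple_operand (expr, &stmts, true, NULL_TREE);
  gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);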
2020-04-03 Kewen Lin <linkw@gcc.gnu.org>
gcc/
PR tree-optimization/94443
* tree-vect-loop.c (vectorizable_live_operation): Use
gsi_insert_seq_before to replace gsi_insert_before.
gcc/testsuite/
PR tree-optimization/94443
* gcc.dg/vect/pr94443.c: New test.
|
|
As PR94043 shows, my commit r10-4524 exposed one issue in
vectorizable_live_operation, which inserts one extra BB
before the single exit, leading to unexpected operand expansion
and an unexpected loop-depth assertion. As Richi suggested,
this patch teaches vectorizable_live_operation to generate a
loop-closed PHI for vec_lhs; the transformation looks like:
loop;
# lhs' = PHI <lhs>
=>
loop;
# vec_lhs' = PHI <vec_lhs>
new_tree = BIT_FIELD_REF <vec_lhs', ...>;
lhs' = new_tree;
I noticed that there are some SLP cases that have the same lhs
and vec_lhs but different offsets, which can leave us with
multiple PHIs for the same vec_lhs. But I think that is fine:
only one of them is actually live, and the others should be
eliminated by the following DCE. So the patch doesn't check
whether a PHI for vec_lhs already exists; it just creates one
directly.
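A rough sketch of the new lowering, assuming exit_e is the single
exit edge (variable names are illustrative, not the exact ones in
the patch):

  /* Create the loop-closed PHI for the vector def on the exit.  */
  tree phi_res = copy_ssa_name (vec_lhs);
  gphi *phi = create_phi_node (phi_res, exit_e->dest);
  add_phi_arg (phi, vec_lhs, exit_e, UNKNOWN_LOCATION);

  /* Extract the live lane from the PHI result rather than from
     vec_lhs itself.  */
  tree new_tree = build3 (BIT_FIELD_REF, TREE_TYPE (vectype),
                          phi_res, bitsize, bitstart);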
Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8.
2020-04-01 Kewen Lin <linkw@gcc.gnu.org>
gcc/ChangeLog
PR tree-optimization/94043
* tree-vect-loop.c (vectorizable_live_operation): Generate loop-closed
phi for vec_lhs and use it for lane extraction.
gcc/testsuite/ChangeLog
PR tree-optimization/94043
* gfortran.dg/graphite/vect-pr94043.f90: New test.
|
|
In the r10-7197-gbae7b38cf8a21e068ad5c0bab089dedb78af3346 commit I
noticed a duplicated word in a message, which led me to grep for
those, and we have tons of them.
I've used:
grep -v 'long long\|optab optab\|template template\|double double' *.[chS] */*.[chS] *.def config/*/* 2>/dev/null | grep ' \([a-zA-Z]\+\) \1 '
Note that the command will not detect doubled words at the start or
end of a line, or when one word ends a line and the next word starts
the following line.
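One way to also catch pairs split across a line boundary would be
GNU grep's -z option, which treats the whole file as a single line
so that \s can match a newline (an untested sketch only):
grep -Pzo '\b([a-zA-Z]+)\s+\1\b' *.[chS] */*.[chS]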
Some of the hits are fairly obvious, e.g. all the "the the" cases,
which is something I've posted and committed patches for before,
e.g. in 2016; other cases are often valid, e.g. "that that" seems
to look mostly OK to me.
Some cases are quite hard to figure out, and I've left some of them
out of the patch (e.g. "and and" in some cases isn't talking about
bitwise/logical AND and so looks incorrect, but in other cases it
is talking about those operations).
In most cases the right solution seems to be to remove one of the duplicated
words, but not always.
I think the most important ones are those with user-visible messages
(3 of the first 4 hunks in the patch); the rest are just comments
(and internal documentation; for that see the doc/tm.texi changes).
2020-03-17 Jakub Jelinek <jakub@redhat.com>
* lra-spills.c (remove_pseudos): Fix up duplicated word issue in
a dump message.
* tree-sra.c (create_access_replacement): Fix up duplicated word issue
in a comment.
* read-rtl-function.c (find_param_by_name,
function_reader::parse_enum_value, function_reader::get_insn_by_uid):
Likewise.
* spellcheck.c (get_edit_distance_cutoff): Likewise.
* tree-data-ref.c (create_ifn_alias_checks): Likewise.
* tree.def (SWITCH_EXPR): Likewise.
* selftest.c (assert_str_contains): Likewise.
* ipa-param-manipulation.h (class ipa_param_body_adjustments):
Likewise.
* tree-ssa-math-opts.c (convert_expand_mult_copysign): Likewise.
* tree-ssa-loop-split.c (find_vdef_in_loop): Likewise.
* langhooks.h (struct lang_hooks_for_decls): Likewise.
* ipa-prop.h (struct ipa_param_descriptor): Likewise.
* tree-ssa-strlen.c (handle_builtin_string_cmp, handle_store):
Likewise.
* tree-ssa-dom.c (simplify_stmt_for_jump_threading): Likewise.
* tree-ssa-reassoc.c (reassociate_bb): Likewise.
* tree.c (component_ref_size): Likewise.
* hsa-common.c (hsa_init_compilation_unit_data): Likewise.
* gimple-ssa-sprintf.c (get_string_length, format_string,
format_directive): Likewise.
* omp-grid.c (grid_process_kernel_body_copy): Likewise.
* input.c (string_concat_db::get_string_concatenation,
test_lexer_string_locations_ucn4): Likewise.
* cfgexpand.c (pass_expand::execute): Likewise.
* gimple-ssa-warn-restrict.c (builtin_memref::offset_out_of_bounds,
maybe_diag_overlap): Likewise.
* rtl.c (RTX_CODE_HWINT_P_1): Likewise.
* shrink-wrap.c (spread_components): Likewise.
* tree-ssa-dse.c (initialize_ao_ref_for_dse, valid_ao_ref_for_dse):
Likewise.
* tree-call-cdce.c (shrink_wrap_one_built_in_call_with_conds):
Likewise.
* dwarf2out.c (dwarf2out_early_finish): Likewise.
* gimple-ssa-store-merging.c: Likewise.
* ira-costs.c (record_operand_costs): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
* target.def (dispatch): Likewise.
(validate_dims, gen_ccmp_first): Fix up duplicated word issue
in documentation text.
* doc/tm.texi: Regenerated.
* config/i386/x86-tune.def (X86_TUNE_PARTIAL_FLAG_REG_STALL): Fix up
duplicated word issue in a comment.
* config/i386/i386.c (ix86_test_loading_unspec): Likewise.
* config/i386/i386-features.c (remove_partial_avx_dependency):
Likewise.
* config/msp430/msp430.c (msp430_select_section): Likewise.
* config/gcn/gcn-run.c (load_image): Likewise.
* config/aarch64/aarch64-sve.md (sve_ld1r<mode>): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_adjusted_ldpstp): Likewise.
* config/aarch64/falkor-tag-collision-avoidance.c
(single_dest_per_chain): Likewise.
* config/nvptx/nvptx.c (nvptx_record_fndecl): Likewise.
* config/fr30/fr30.c (fr30_arg_partial_bytes): Likewise.
* config/rs6000/rs6000-string.c (expand_cmp_vec_sequence): Likewise.
* config/rs6000/rs6000-p8swap.c (replace_swapped_load_constant):
Likewise.
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise.
* config/rs6000/rs6000-logue.c
(rs6000_emit_probe_stack_range_stack_clash): Likewise.
* config/nds32/nds32-md-auxiliary.c (nds32_split_ashiftdi3): Likewise.
Fix various other issues in the comment.
c-family/
* c-common.c (resolve_overloaded_builtin): Fix up duplicated word
issue in a diagnostic message.
cp/
* pt.c (tsubst): Fix up duplicated word issue in a diagnostic message.
(lookup_template_class_1, tsubst_expr): Fix up duplicated word issue
in a comment.
* parser.c (cp_parser_statement, cp_parser_linkage_specification,
cp_parser_placeholder_type_specifier,
cp_parser_constraint_requires_parens): Likewise.
* name-lookup.c (suggest_alternative_in_explicit_scope): Likewise.
fortran/
* array.c (gfc_check_iter_variable): Fix up duplicated word issue
in a comment.
* arith.c (gfc_arith_concat): Likewise.
* resolve.c (gfc_resolve_ref): Likewise.
* frontend-passes.c (matmul_lhs_realloc): Likewise.
* module.c (gfc_match_submodule, load_needed): Likewise.
* trans-expr.c (gfc_init_se): Likewise.
|
|
gcc.dg/pr56350.c started ICEing for SVE in GCC 10 because we
pattern-matched a division reduction:
a /= 8;
into a signed shift with division semantics:
... = IFN_SDIV_POW2 (..., 3);
whereas the reduction code expected it still to be a gassign.
One fix would be to check for a reduction in the pattern matcher
(but current patterns don't generally do that). Another would be
to fail gracefully for reductions involving calls. Since we can't
vectorise the reduction either way, and probably have a better shot
with the shift form, this patch goes for the "fail gracefully" approach.
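A sketch of the graceful failure, assuming the check boils down to
the reduction stmt no longer being a gassign (the dump text is
illustrative):

  /* The pattern may have turned the division into an internal-fn
     call such as IFN_SDIV_POW2, which the reduction-chain handling
     cannot vectorize; bail out instead of ICEing.  */
  gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
  if (!assign)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "reduction chain includes calls.\n");
      return false;
    }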
2020-01-28 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vectorizable_reduction): Fail gracefully
for reduction chains that (now) include a call.
|
|
When versioning is run, the IL is already mangled and finding
the IFN_LOOP_VECTORIZED call can fail.
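A sketch of the adjusted interfaces implied by the ChangeLog
(signatures paraphrased): the call discovered in try_vectorize_loop_1
is threaded through instead of being re-found in the mangled IL:

  /* Both now receive the earlier-discovered IFN_LOOP_VECTORIZED
     call, or NULL when there is none.  */
  class loop *vect_transform_loop (loop_vec_info, gimple *);
  class loop *vect_loop_versioning (loop_vec_info, gimple *);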
2020-01-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/93094
* tree-vectorizer.h (vect_loop_versioning): Adjust.
(vect_transform_loop): Likewise.
* tree-vectorizer.c (try_vectorize_loop_1): Pass down
loop_vectorized_call to vect_transform_loop.
* tree-vect-loop.c (vect_transform_loop): Pass down
loop_vectorized_call to vect_loop_versioning.
* tree-vect-loop-manip.c (vect_loop_versioning): Use
the earlier discovered loop_vectorized_call.
* gcc.dg/vect/pr93094.c: New testcase.
|
|
This patch addresses the problem reported in PR92429. When creating
an epilogue for vectorization we have to replace the SSA_NAMEs in the
PATTERN_DEF_SEQs and RELATED_STMTs of the epilogue's loop_vec_infos.
When doing this we were using simplify_replace_tree, which always
folds the replacement. This may lead to a different tree node than
the one that was analyzed in vect_analyze_loop. In turn the new tree
node may require a different vectorization than the one we had
prepared for, which caused the ICE in question.
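A sketch of the adjusted call (treating do_fold as an illustrative
name for the new folding-control parameter):

  /* Request a purely textual replacement: do not refold, so the
     epilogue keeps exactly the tree shape that was analyzed.  */
  op = simplify_replace_tree (op, old_ssa, new_ssa, NULL, NULL,
                              /*do_fold=*/false);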
gcc/ChangeLog:
2020-01-16 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR tree-optimization/92429
* tree-ssa-loop-niter.h (simplify_replace_tree): Add parameter.
* tree-ssa-loop-niter.c (simplify_replace_tree): Add parameter to
control folding.
* tree-vect-loop.c (update_epilogue_vinfo): Do not fold when replacing
tree.
gcc/testsuite/ChangeLog:
2020-01-16 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR tree-optimization/92429
* gcc.dg/vect/pr92429.c: New test.
|
|
My earlier update_epilogue_loop_vinfo patch introduced an ICE on these
tests for AVX512. If we use pattern stmts, STMT_VINFO_GATHER_SCATTER_P
is valid for both the original stmt and the pattern stmt, but
STMT_VINFO_MEMORY_ACCESS_TYPE is valid only for the latter.
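A sketch of the check (vect_stmt_to_vectorize resolves to the
pattern stmt when one exists, which is where the access type is
valid; surrounding details elided):

  stmt_vec_info vstmt = vect_stmt_to_vectorize (stmt_vinfo);
  if (STMT_VINFO_MEMORY_ACCESS_TYPE (vstmt) == VMAT_GATHER_SCATTER)
    /* ... rebuild the gather/scatter info for the epilogue ...  */;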
2020-01-15 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/93247
* tree-vect-loop.c (update_epilogue_loop_vinfo): Check the access
type of the stmt that we're going to vectorize.
gcc/testsuite/
PR tree-optimization/93247
* gcc.dg/vect/pr93247-1.c: New test.
* gcc.dg/vect/pr93247-2.c: Likewise.
|
|
The related_vector_mode series missed this case in
vect_create_epilog_for_reduction, where we want to create the
unsigned integer equivalent of another vector. Without it we
could mix SVE and Advanced SIMD vectors in the same operation.
This showed up on existing tests when testing with fixed-length
-msve-vector-bits=128.
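A sketch of the fix as described (names follow the surrounding
reduction code; treat them as illustrative):

  /* Build the index type in the same mode family as VECTYPE instead
     of whatever mode build_vector_type would pick, so SVE and
     Advanced SIMD vectors cannot end up mixed in one operation.  */
  tree cr_index_vector_type
    = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
                                           cr_index_scalar_type,
                                           TYPE_VECTOR_SUBPARTS (vectype));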
2020-01-10 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type rather than build_vector_type
to create the index type for a conditional reduction.
From-SVN: r280112
|