path: root/gcc/tree-vect-stmts.c

2020-12-10  remove obsolete conversion handling from vectorizable_assignment  (Richard Biener, 1 file, -6/+1)

This removes an odd special-case of VECTOR_BOOLEAN_TYPE_P typed conversions from vectorizable_assignment that was obsoleted by making all integer mode VECTOR_BOOLEAN_TYPE_P types have 1-bit precision bool components with 605c2a393d3a2db8.

2020-12-10  Richard Biener  <rguenther@suse.de>

* tree-vect-stmts.c (vectorizable_assignment): Remove special allowance of VECTOR_BOOLEAN_TYPE_P conversions.

2020-12-10  tree-optimization/98211 - fix bogus vectorization of conversion  (Richard Biener, 1 file, -0/+11)

Pattern recog incompletely handles some bool cases, but we shouldn't miscompile as a result; we should simply not vectorize. Unfortunately vectorizable_assignment lets invalid conversions (that vectorizable_conversion rejects) slip through. The following rectifies that.

2020-12-10  Richard Biener  <rguenther@suse.de>

PR tree-optimization/98211
* tree-vect-stmts.c (vectorizable_assignment): Disallow invalid conversions to bool vector types.
* gcc.dg/pr98211.c: New testcase.

2020-11-19  [3/3] [AArch64][vect] vec_widen_lshift pattern  (Joel Hutton, 1 file, -2/+3)

Add aarch64 vec_widen_lshift_lo/hi patterns and fix a bug they trigger in the mid-end. This pattern takes one vector with N elements of size S, shifts each element left by the element width and stores the results as N elements of size 2*S (in 2 result vectors). The aarch64 backend implements this with the shll/shll2 instruction pair.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md: Add vec_widen_lshift_hi/lo<mode> patterns.
* tree-vect-stmts.c (vectorizable_conversion): Fix for widen_lshift case.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-widen-lshift.c: New test.

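For illustration, a scalar loop of roughly this shape is what the pattern lets the vectorizer handle (a hypothetical example, not taken from the patch; the shift amount equals the input element width, matching shll's semantics):

    #include <stdint.h>

    /* Each 16-bit element is widened to 32 bits and shifted left by
       the element width (16); the lo/hi halves of the result
       correspond to the aarch64 shll/shll2 pair.  */
    void
    widen_shift (uint32_t *restrict dst, const uint16_t *restrict src, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = (uint32_t) src[i] << 16;
    }
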
2020-11-19  [2/3] [vect] Add widening add, subtract patterns  (Joel Hutton, 1 file, -1/+14)

Add widening add, subtract patterns to tree-vect-patterns. Update the widened code of patterns that detect PLUS_EXPR to also detect WIDEN_PLUS_EXPR. These patterns take 2 vectors with N elements of size S and perform an add/subtract on the elements, storing the results as N elements of size 2*S (in 2 result vectors). This is implemented in the aarch64 backend as addl,addl2 and subl,subl2 respectively. Add aarch64 tests for patterns.

gcc/ChangeLog:

* doc/generic.texi: Document new widen_plus/minus_lo/hi tree codes.
* doc/md.texi: Document new widening add/subtract hi/lo optabs.
* expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases.
* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
* optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, subtracts.
* tree-inline.c (estimate_operator_cost): Add case for widening adds, subtracts.
* tree-vect-generic.c (expand_vector_operations_1): Add case for widening adds, subtracts.
* tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog pattern. (vect_recog_widen_sub_pattern): New recog pattern. (vect_recog_average_pattern): Update widened add code.
* tree-vect-stmts.c (vectorizable_conversion): Add case for widened add, subtract. (supportable_widening_operation): Add case for widened add, subtract.
* tree.def (WIDEN_PLUS_EXPR): New tree code. (WIDEN_MINUS_EXPR): New tree code. (VEC_WIDEN_PLUS_HI_EXPR): New tree code. (VEC_WIDEN_PLUS_LO_EXPR): New tree code. (VEC_WIDEN_MINUS_HI_EXPR): New tree code. (VEC_WIDEN_MINUS_LO_EXPR): New tree code.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-widen-add.c: New test.
* gcc.target/aarch64/vect-widen-sub.c: New test.

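For illustration, a hypothetical scalar loop of the shape these patterns detect (16-bit inputs combined into 32-bit results):

    #include <stdint.h>

    /* a[i] and b[i] are widened to 32 bits before the add; the lo/hi
       result halves map to addl/addl2 (subl/subl2 for subtraction).  */
    void
    widen_add (uint32_t *restrict dst, const uint16_t *restrict a,
               const uint16_t *restrict b, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = (uint32_t) a[i] + b[i];
    }
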
2020-11-17  PR97693: Specify required vectype in vectorizable_call  (Richard Sandiford, 1 file, -1/+2)

The vectorizable_call part of r11-1143 dropped the required vectype when moving from vect_get_vec_def_for_operand to vect_get_vec_defs_for_operand. This caused an ICE on the testcase for SVE, because we ended up with a non-predicate value being passed to a predicate input. AFAICT this was the only instance of that happening. The types seemed to get carried forward for all the other converted calls.

gcc/
PR tree-optimization/97693
* tree-vect-stmts.c (vectorizable_call): Pass the required vectype to vect_get_vec_defs_for_operand.

gcc/testsuite/
PR tree-optimization/97693
* gcc.dg/vect/pr97693.c: New test.

2020-11-04  add costing to SLP vectorized PHIs  (Richard Biener, 1 file, -2/+2)

I forgot to cost vectorized PHIs. Scalar PHIs are just costed as scalar_stmt so the following costs vector PHIs as vector_stmt.

2020-11-04  Richard Biener  <rguenther@suse.de>

* tree-vectorizer.h (vectorizable_phi): Adjust prototype.
* tree-vect-stmts.c (vect_transform_stmt): Adjust. (vect_analyze_stmt): Pass cost_vec to vectorizable_phi.
* tree-vect-loop.c (vectorizable_phi): Do costing.

2020-11-03  PR target/96342 Change field "simdlen" into poly_uint64  (Yang Yang, 1 file, -19/+24)

This is the first patch of PR96342. In order to add support for "omp declare simd", change the type of the field "simdlen" of struct cgraph_simd_clone from unsigned int to poly_uint64, with related adaptation, since the length might be variable for the SVE cases.

2020-11-03  Yang Yang  <yangyang305@huawei.com>

gcc/ChangeLog:

* cgraph.h (struct cgraph_simd_clone): Change field "simdlen" of struct cgraph_simd_clone from unsigned int to poly_uint64.
* config/aarch64/aarch64.c (aarch64_simd_clone_compute_vecsize_and_simdlen): Adapt operations on "simdlen".
* config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen): Printf formats update.
* gengtype.c (main): Handle poly_uint64.
* omp-simd-clone.c (simd_clone_mangle): Likewise. (simd_clone_adjust_return_type): Likewise. (create_tmp_simd_array): Likewise. (simd_clone_adjust_argument_types): Likewise. (simd_clone_init_simd_arrays): Likewise. (ipa_simd_modify_function_body): Likewise. (simd_clone_adjust): Likewise. (expand_simd_clones): Likewise.
* poly-int-types.h (vector_unroll_factor): New macro.
* poly-int.h (constant_multiple_p): Add two-argument versions.
* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.

2020-10-29  vect: Fix load costs for SLP permutes  (Richard Sandiford, 1 file, -28/+4)

For the following test case (compiled with load/store lanes disabled locally):

    void
    f (uint32_t *restrict x, uint8_t *restrict y, int n)
    {
      for (int i = 0; i < n; ++i)
        {
          x[i * 2] = x[i * 2] + y[i * 2];
          x[i * 2 + 1] = x[i * 2 + 1] + y[i * 2];
        }
    }

we have a redundant no-op permute on the x[] load node:

    node 0x4472350 (max_nunits=8, refcnt=2)
    stmt 0 _5 = *_4;
    stmt 1 _13 = *_12;
    load permutation { 0 1 }

Then, when costing it, we pick a cost of 1, even though we need 4 copies of the x[] load to match a single y[] load:

    ==> examining statement: _5 = *_4;
    Vectorizing an unaligned access.
    vect_model_load_cost: unaligned supported by hardware.
    vect_model_load_cost: inside_cost = 1, prologue_cost = 0 .

The problem is that the code only considers the permutation for the first scalar iteration, rather than for all VF iterations. This patch tries to fix that by making vect_transform_slp_perm_load calculate the value instead.

gcc/
* tree-vectorizer.h (vect_transform_slp_perm_load): Take an optional extra parameter.
* tree-vect-slp.c (vect_transform_slp_perm_load): Calculate the number of loads as well as the number of permutes, taking the counting loop from...
* tree-vect-stmts.c (vect_model_load_cost): ...here. Use the value computed by vect_transform_slp_perm_load for ncopies.

2020-10-27  SLP vectorize across PHI nodes  (Richard Biener, 1 file, -1/+7)

This makes SLP discovery detect backedges by seeding the bst_map with the node to be analyzed so it can be picked up from recursive calls. This removes the need to discover backedges in a separate walk.

This enables SLP build to handle PHI nodes in full, continuing the SLP build to non-backedges. For loop vectorization this enables outer loop vectorization of nested SLP cycles, and for BB vectorization this enables vectorization of PHIs at CFG merges.

It also turns code generation into a SCC discovery walk to handle irreducible regions and nodes only reachable via backedges, where we now also fill in vectorized backedge defs. This requires sanitizing the SLP tree for SLP reduction chains even more, manually filling the backedge SLP def.

This also exposes the fact that CFG copying (and edge splitting, until I fixed that) ends up with a different edge order in the copy, which doesn't play well with the desired 1:1 mapping of SLP PHI node children and edges for epilogue vectorization. I've tried to fix up CFG copying here, but this really looks like a dead (or expensive) end there, so I've done the fixup in slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases we can run into.

There are still NULLs in the SLP_TREE_CHILDREN vectors and I'm not sure it's possible to eliminate them all this stage1, so the patch has quite some checks for this case all over the place.

Bootstrapped and tested on x86_64-unknown-linux-gnu. SPEC CPU 2017 and SPEC CPU 2006 successfully built and tested.

2020-10-27  Richard Biener  <rguenther@suse.de>

* gimple.h (gimple_expr_type): For PHIs return the type of the result.
* tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Make sure edge order into copied loop headers line up with the originals.
* tree-vect-loop.c (vect_transform_cycle_phi): Handle nested loops with SLP. (vectorizable_phi): New function. (vectorizable_live_operation): For BB vectorization compute insert location here.
* tree-vect-slp.c (vect_free_slp_tree): Deal with NULL SLP_TREE_CHILDREN entries. (vect_create_new_slp_node): Add overloads with pre-existing node argument. (vect_print_slp_graph): Likewise. (vect_mark_slp_stmts): Likewise. (vect_mark_slp_stmts_relevant): Likewise. (vect_gather_slp_loads): Likewise. (vect_optimize_slp): Likewise. (vect_slp_analyze_node_operations): Likewise. (vect_bb_slp_scalar_cost): Likewise. (vect_remove_slp_scalar_calls): Likewise. (vect_get_and_check_slp_defs): Handle PHIs. (vect_build_slp_tree_1): Handle PHIs. (vect_build_slp_tree_2): Continue SLP build, following PHI arguments. Fix memory leak. (vect_build_slp_tree): Put stub node into the hash-map so we can discover cycles directly. (vect_build_slp_instance): Set the backedge SLP def for reduction chains. (vect_analyze_slp_backedges): Remove. (vect_analyze_slp): Do not call it. (vect_slp_convert_to_external): Release SLP_TREE_LOAD_PERMUTATION. (vect_slp_analyze_node_operations): Handle stray failed backedge defs by failing. (vect_slp_build_vertices): Adjust leaf condition. (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited hash-set to handle cycles. (vect_slp_analyze_operations): Adjust. (vect_bb_partition_graph_r): Likewise. (vect_slp_function): Adjust split condition to allow CFG merges. (vect_schedule_slp_instance): Rename to ... (vect_schedule_slp_node): ... this. Move DFS walk to ... (vect_schedule_scc): ... this new function. (vect_schedule_slp): Call it. Remove ad-hoc vectorized backedge fill code.
* tree-vect-stmts.c (vect_analyze_stmt): Call vectorizable_phi. (vect_transform_stmt): Likewise. (vect_is_simple_use): Handle vect_backedge_def.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Only set loop header PHIs to vect_unknown_def_type for loop vectorization.
* tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def. (enum stmt_vec_info_type): Add phi_info_type. (vectorizable_phi): Declare.
* gcc.dg/vect/bb-slp-54.c: New test.
* gcc.dg/vect/bb-slp-55.c: Likewise.
* gcc.dg/vect/bb-slp-56.c: Likewise.
* gcc.dg/vect/bb-slp-57.c: Likewise.
* gcc.dg/vect/bb-slp-58.c: Likewise.
* gcc.dg/vect/bb-slp-59.c: Likewise.
* gcc.dg/vect/bb-slp-60.c: Likewise.
* gcc.dg/vect/bb-slp-61.c: Likewise.
* gcc.dg/vect/bb-slp-62.c: Likewise.
* gcc.dg/vect/bb-slp-63.c: Likewise.
* gcc.dg/vect/bb-slp-64.c: Likewise.
* gcc.dg/vect/bb-slp-65.c: Likewise.
* gcc.dg/vect/bb-slp-66.c: Likewise.
* gcc.dg/vect/vect-outer-slp-1.c: Likewise.
* gfortran.dg/vect/O3-bb-slp-1.f: Likewise.
* gfortran.dg/vect/O3-bb-slp-2.f: Likewise.
* g++.dg/vect/simd-11.cc: Likewise.

2020-10-01  tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS  (Richard Biener, 1 file, -11/+9)

This avoids using VMAT_CONTIGUOUS with single-element interleaving when using V1mode vectors. Instead keep VMAT_ELEMENTWISE but continue to avoid load-lanes and gathers.

2020-10-01  Richard Biener  <rguenther@suse.de>

PR tree-optimization/97236
* tree-vect-stmts.c (get_group_load_store_type): Keep VMAT_ELEMENTWISE for single-element vectors.
* gcc.dg/vect/pr97236.c: New testcase.

2020-09-09  enable live comparison vectorization  (Richard Biener, 1 file, -8/+0)

This removes a check preventing vectorization of live results of vectorized comparisons. I tested it with AVX512 mask registers (inspecting assembly) and traditional vector masks.

2020-09-09  Richard Biener  <rguenther@suse.de>

* tree-vect-stmts.c (vectorizable_comparison): Allow STMT_VINFO_LIVE_P stmts.
* gcc.dg/vect/vect-live-6.c: New testcase.

2020-09-09  enable live condition vectorization  (Richard Biener, 1 file, -9/+0)

This removes a check preventing vectorization of live results of vectorized conditions.

2020-09-09  Richard Biener  <rguenther@suse.de>

* tree-vect-stmts.c (vectorizable_condition): Allow STMT_VINFO_LIVE_P stmts.
* gcc.dg/vect/vect-cond-13.c: New testcase.
* gcc.target/i386/pr87007-4.c: Adjust.
* gcc.target/i386/pr87007-5.c: Likewise.

2020-09-09  tree-optimization/96978 - fix fallout of BB vectorization of live stmts  (Richard Biener, 1 file, -2/+2)

This avoids looking at STMT_VINFO_LIVE_P when vectorizing BBs.

2020-09-09  Richard Biener  <rguenther@suse.de>

PR tree-optimization/96978
* tree-vect-stmts.c (vectorizable_condition): Do not look at STMT_VINFO_LIVE_P for BB vectorization. (vectorizable_comparison): Likewise.

2020-09-07  code generate live lanes in basic-block vectorization  (Richard Biener, 1 file, -7/+5)

The following adds the capability to code-generate live lanes in basic-block vectorization using lane extracts from vector stmts rather than keeping the original scalar code around for those. This eventually makes previously not profitable vectorizations profitable (the live scalar code was appropriately costed, as are the lane extracts now); without considering the cost model this patch doesn't add or remove any basic-block vectorization capabilities.

The patch re-/ab-uses STMT_VINFO_LIVE_P in basic-block vectorization mode to tell whether a live lane is vectorized or whether it is provided by means of keeping the scalar code live. The patch is a first step towards vectorizing sequences of stmts that do not end up in stores or vector constructors though.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

2020-09-04  Richard Biener  <rguenther@suse.de>

* tree-vectorizer.h (vectorizable_live_operation): Adjust.
* tree-vect-loop.c (vectorizable_live_operation): Vectorize live lanes out of basic-block vectorization nodes.
* tree-vect-slp.c (vect_bb_slp_mark_live_stmts): New function. (vect_slp_analyze_operations): Analyze live lanes and their vectorization possibility after the whole SLP graph is final. (vect_bb_slp_scalar_cost): Adjust for vectorized live lanes.
* tree-vect-stmts.c (can_vectorize_live_stmts): Adjust. (vect_transform_stmt): Call can_vectorize_live_stmts also for basic-block vectorization.
* gcc.dg/vect/bb-slp-46.c: New testcase.
* gcc.dg/vect/bb-slp-47.c: Likewise.
* gcc.dg/vect/bb-slp-32.c: Adjust.

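A hypothetical example of a live lane (an illustration, not the added testcase): the stores form an SLP group and one lane's value is also used outside of it.

    /* t escapes the vectorized region; with this change it can be
       code generated as a lane extract from the vector statement
       instead of keeping the scalar addition alive.  */
    int
    f (int *restrict a, const int *restrict b)
    {
      int t = b[1] + 1;
      a[0] = b[0] + 1;
      a[1] = t;
      return t;  /* scalar use outside the SLP graph */
    }
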
2020-09-04  tree-optimization/96920 - another ICE when vectorizing nested cycles  (Richard Biener, 1 file, -27/+0)

This refines the previous fix for PR96698 by re-doing how and where we arrange for setting vectorized cycle PHI backedge values.

2020-09-04  Richard Biener  <rguenther@suse.de>

PR tree-optimization/96698
PR tree-optimization/96920
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): Remove. (loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Remove vectorized cycle PHI latch code.
* tree-vect-loop.c (maybe_set_vectorized_backedge_value): New helper to set vectorized cycle PHI latch values. (vect_transform_loop): Walk over all PHIs again after vectorizing them, calling maybe_set_vectorized_backedge_value. Call maybe_set_vectorized_backedge_value for each vectorized stmt. Remove delayed update code.
* tree-vect-slp.c (vect_analyze_slp_instance): Initialize SLP instance reduc_phis member. (vect_schedule_slp): Set vectorized cycle PHI latch values.
* gfortran.dg/vect/pr96920.f90: New testcase.
* gcc.dg/vect/pr96920.c: Likewise.

2020-08-27  vec: add exact argument for various grow functions.  (Martin Liska, 1 file, -7/+8)

gcc/ada/ChangeLog:

* gcc-interface/trans.c (gigi): Set exact argument of a vector growth function to true. (Attribute_to_gnu): Likewise.

gcc/ChangeLog:

* alias.c (init_alias_analysis): Set exact argument of a vector growth function to true.
* calls.c (internal_arg_pointer_based_exp_scan): Likewise.
* cfgbuild.c (find_many_sub_basic_blocks): Likewise.
* cfgexpand.c (expand_asm_stmt): Likewise.
* cfgrtl.c (rtl_create_basic_block): Likewise.
* combine.c (combine_split_insns): Likewise. (combine_instructions): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (function_expander::add_output_operand): Likewise. (function_expander::add_input_operand): Likewise. (function_expander::add_integer_operand): Likewise. (function_expander::add_address_operand): Likewise. (function_expander::add_fixed_operand): Likewise.
* df-core.c (df_worklist_dataflow_doublequeue): Likewise.
* dwarf2cfi.c (update_row_reg_save): Likewise.
* early-remat.c (early_remat::init_block_info): Likewise. (early_remat::finalize_candidate_indices): Likewise.
* except.c (sjlj_build_landing_pads): Likewise.
* final.c (compute_alignments): Likewise. (grow_label_align): Likewise.
* function.c (temp_slots_at_level): Likewise.
* fwprop.c (build_single_def_use_links): Likewise. (update_uses): Likewise.
* gcc.c (insert_wrapper): Likewise.
* genautomata.c (create_state_ainsn_table): Likewise. (add_vect): Likewise. (output_dead_lock_vect): Likewise.
* genmatch.c (capture_info::capture_info): Likewise. (parser::finish_match_operand): Likewise.
* genrecog.c (optimize_subroutine_group): Likewise. (merge_pattern_info::merge_pattern_info): Likewise. (merge_into_decision): Likewise. (print_subroutine_start): Likewise. (main): Likewise.
* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Likewise.
* gimple.c (gimple_set_bb): Likewise.
* graphite-isl-ast-to-gimple.c (translate_isl_ast_node_user): Likewise.
* haifa-sched.c (sched_extend_luids): Likewise. (extend_h_i_d): Likewise.
* insn-addr.h (insn_addresses_new): Likewise.
* ipa-cp.c (gather_context_independent_values): Likewise. (find_more_contexts_for_caller_subset): Likewise.
* ipa-devirt.c (final_warning_record::grow_type_warnings): Likewise. (ipa_odr_read_section): Likewise.
* ipa-fnsummary.c (evaluate_properties_for_edge): Likewise. (ipa_fn_summary_t::duplicate): Likewise. (analyze_function_body): Likewise. (ipa_merge_fn_summary_after_inlining): Likewise. (read_ipa_call_summary): Likewise.
* ipa-icf.c (sem_function::bb_dict_test): Likewise.
* ipa-prop.c (ipa_alloc_node_params): Likewise. (parm_bb_aa_status_for_bb): Likewise. (ipa_compute_jump_functions_for_edge): Likewise. (ipa_analyze_node): Likewise. (update_jump_functions_after_inlining): Likewise. (ipa_read_edge_info): Likewise. (read_ipcp_transformation_info): Likewise. (ipcp_transform_function): Likewise.
* ipa-reference.c (ipa_reference_write_optimization_summary): Likewise.
* ipa-split.c (execute_split_functions): Likewise.
* ira.c (find_moveable_pseudos): Likewise.
* lower-subreg.c (decompose_multiword_subregs): Likewise.
* lto-streamer-in.c (input_eh_regions): Likewise. (input_cfg): Likewise. (input_struct_function_base): Likewise. (input_function): Likewise.
* modulo-sched.c (set_node_sched_params): Likewise. (extend_node_sched_params): Likewise. (schedule_reg_moves): Likewise.
* omp-general.c (omp_construct_simd_compare): Likewise.
* passes.c (pass_manager::create_pass_tab): Likewise. (enable_disable_pass): Likewise.
* predict.c (determine_unlikely_bbs): Likewise.
* profile.c (compute_branch_probabilities): Likewise.
* read-rtl-function.c (function_reader::parse_block): Likewise.
* read-rtl.c (rtx_reader::read_rtx_code): Likewise.
* reg-stack.c (stack_regs_mentioned): Likewise.
* regrename.c (regrename_init): Likewise.
* rtlanal.c (generic_subrtx_iterator <T>::add_single_to_queue): Likewise.
* sched-deps.c (init_deps_data_vector): Likewise.
* sel-sched-ir.c (sel_extend_global_bb_info): Likewise. (extend_region_bb_info): Likewise. (extend_insn_data): Likewise.
* symtab.c (symtab_node::create_reference): Likewise.
* tracer.c (tail_duplicate): Likewise.
* trans-mem.c (tm_region_init): Likewise. (get_bb_regions_instrumented): Likewise.
* tree-cfg.c (init_empty_tree_cfg_for_function): Likewise. (build_gimple_cfg): Likewise. (create_bb): Likewise. (move_block_to_fn): Likewise.
* tree-complex.c (tree_lower_complex): Likewise.
* tree-if-conv.c (predicate_rhs_code): Likewise.
* tree-inline.c (copy_bb): Likewise.
* tree-into-ssa.c (get_ssa_name_ann): Likewise. (mark_phi_for_rewrite): Likewise.
* tree-object-size.c (compute_builtin_object_size): Likewise. (init_object_sizes): Likewise.
* tree-predcom.c (initialize_root_vars_store_elim_1): Likewise. (initialize_root_vars_store_elim_2): Likewise. (prepare_initializers_chain_store_elim): Likewise.
* tree-ssa-address.c (addr_for_mem_ref): Likewise. (multiplier_allowed_in_address_p): Likewise.
* tree-ssa-coalesce.c (ssa_conflicts_new): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-ssa-loop-ivopts.c (addr_offset_valid_p): Likewise. (get_address_cost_ainc): Likewise.
* tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk): Likewise.
* tree-ssa-pre.c (add_to_value): Likewise. (phi_translate_1): Likewise. (do_pre_regular_insertion): Likewise. (do_pre_partial_partial_insertion): Likewise. (init_pre): Likewise.
* tree-ssa-propagate.c (ssa_prop_init): Likewise. (update_call_from_tree): Likewise.
* tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Likewise.
* tree-ssa-sccvn.c (vn_reference_lookup_3): Likewise. (vn_reference_lookup_pieces): Likewise. (eliminate_dom_walker::eliminate_push_avail): Likewise.
* tree-ssa-strlen.c (set_strinfo): Likewise. (get_stridx_plus_constant): Likewise. (zero_length_string): Likewise. (find_equal_ptrs): Likewise. (printf_strlen_execute): Likewise.
* tree-ssa-threadedge.c (set_ssa_name_value): Likewise.
* tree-ssanames.c (make_ssa_name_fn): Likewise.
* tree-streamer-in.c (streamer_read_tree_bitfields): Likewise.
* tree-vect-loop.c (vect_record_loop_mask): Likewise. (vect_get_loop_mask): Likewise. (vect_record_loop_len): Likewise. (vect_get_loop_len): Likewise.
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_slp_convert_to_external): Likewise. (vect_bb_slp_scalar_cost): Likewise. (vect_bb_vectorization_profitable_p): Likewise. (vectorizable_slp_permutation): Likewise.
* tree-vect-stmts.c (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (scan_store_can_perm_p): Likewise. (vectorizable_store): Likewise.
* expr.c: Likewise.
* vec.c (test_safe_grow_cleared): Likewise.
* vec.h (vec_safe_grow): Likewise. (vec_safe_grow_cleared): Likewise. (vec<T, A, vl_ptr>::safe_grow): Likewise. (vec<T, A, vl_ptr>::safe_grow_cleared): Likewise.
* config/c6x/c6x.c (insn_set_clock): Likewise.

gcc/c/ChangeLog:

* gimple-parser.c (c_parser_gimple_compound_statement): Set exact argument of a vector growth function to true.

gcc/cp/ChangeLog:

* class.c (build_vtbl_initializer): Set exact argument of a vector growth function to true.
* constraint.cc (get_mapped_args): Likewise.
* decl.c (cp_maybe_mangle_decomp): Likewise. (cp_finish_decomp): Likewise.
* parser.c (cp_parser_omp_for_loop): Likewise.
* pt.c (canonical_type_parameter): Likewise.
* rtti.c (get_pseudo_ti_init): Likewise.

gcc/fortran/ChangeLog:

* trans-openmp.c (gfc_trans_omp_do): Set exact argument of a vector growth function to true.

gcc/lto/ChangeLog:

* lto-common.c (lto_file_finalize): Set exact argument of a vector growth function to true.

2020-08-26  tree-optimization/96698 - fix ICE when vectorizing nested cycles  (Richard Biener, 1 file, -24/+5)

This fixes vectorized PHI latch edge updating and delays it until all of the loop is code generated, to deal with the case that the latch def is a PHI in the same block.

2020-08-26  Richard Biener  <rguenther@suse.de>

PR tree-optimization/96698
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): New. (loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Only record stmts to update PHI latches from, perform the update ...
* tree-vect-loop.c (vect_transform_loop): ... here after vectorizing those PHIs. (info_for_reduction): Properly handle non-reduction PHIs.
* gcc.dg/vect/pr96698.c: New testcase.

2020-08-26  tree-optimization/96783 - fix vectorization of negative step SLP  (Richard Biener, 1 file, -1/+9)

This appropriately uses VMAT_ELEMENTWISE when the vectors cannot be filled from a single SLP group until we get more explicit support for negative stride SLP.

2020-08-26  Richard Biener  <rguenther@suse.de>

PR tree-optimization/96783
* tree-vect-stmts.c (get_group_load_store_type): Use VMAT_ELEMENTWISE for negative strides when we cannot use VMAT_STRIDED_SLP.
* gcc.dg/vect/pr96783-1.c: New testcase.
* gcc.dg/vect/pr96783-2.c: Likewise.

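For illustration, a hypothetical negative-step grouped access of the kind this covers (not the PR testcase): the group steps backwards through memory, so a vector cannot be filled from a single SLP group.

    /* The two-element group at a[2*i], a[2*i+1] is walked with a
       negative step; such accesses now fall back to VMAT_ELEMENTWISE.  */
    void
    reverse_pairs (double *restrict a, const double *restrict b, int n)
    {
      for (int i = n - 1; i >= 0; i--)
        {
          a[2 * i] = b[2 * i] * 2.0;
          a[2 * i + 1] = b[2 * i + 1] * 2.0;
        }
    }
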
2020-08-06  vect/rs6000: Support vector with length cost modeling  (Kewen Lin, 1 file, -1/+5)

This patch is to add the cost modeling for vector with length; it mainly follows what we generate for vector with length in functions vect_set_loop_controls_directly and vect_gen_len at the worst case. For Power, the length is expected to be in bits 0-7 (the high bits), so we have to model the cost of shifting the length into place, which is implemented in rs6000_adjust_vect_cost_per_loop.

Bootstrapped/regtested on powerpc64le-linux-gnu (P9) with explicit param vect-partial-vector-usage=1.

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_adjust_vect_cost_per_loop): New function. (rs6000_finish_cost): Call rs6000_adjust_vect_cost_per_loop.
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost modeling for vector with length. (vect_rgroup_iv_might_wrap_p): New function, factored out from...
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): ...this. Update function comment.
* tree-vect-stmts.c (vect_gen_len): Update function comment.
* tree-vectorizer.h (vect_rgroup_iv_might_wrap_p): New declare.

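A minimal sketch (illustrative only, not code from the patch) of why the extra shift is needed on Power:

    #include <stdint.h>

    /* lxvl/stxvl take the byte count in bits 0-7 (the high-order
       byte) of the length register, so materializing a length costs
       an extra shift, which the adjusted cost model accounts for.  */
    static inline uint64_t
    pack_len_for_lxvl (uint64_t nbytes)
    {
      return nbytes << 56;  /* move the length into the high byte */
    }
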
2020-07-20  vect: Fix an ICE in vectorizable_simd_clone_call  (y00520163, 1 file, -3/+22)

In vectorizable_simd_clone_call, type compatibility is handled based on the number of elements and the type compatibility of elements, which is not enough. This patch adds VIEW_CONVERT_EXPRs if the argument types and return type of the simd clone function are distinct from the vectype of the stmt.

2020-07-20  Yang Yang  <yangyang305@huawei.com>

gcc/ChangeLog:

* tree-vect-stmts.c (vectorizable_simd_clone_call): Add VIEW_CONVERT_EXPRs if the argument types and return type of the simd clone function are distinct from the vectype of the stmt.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pr96195.c: New test.

2020-07-19  vect: Support length-based partial vectors approach  (Kewen Lin, 1 file, -13/+154)

Power9 supports vector load/store instructions lxvl/stxvl which allow us to operate on partial vectors with one specific length. This patch extends some of the current mask-based partial vectors support code for the length-based approach, and also adds some length-specific support code. So far it assumes that we can only have one partial vectors approach at the same time; it disables the use of partial vectors if both approaches co-exist.

Like the description of optab len_load/len_store, the length-based approach can have two flavors: one is length in bytes, the other is length in lanes. This patch is mainly implemented and tested for length in bytes, but as Richard S. suggested, most of the code has considered both flavors.

This also introduces one parameter, vect-partial-vector-usage, to allow users to control when the loop vectorizer considers using partial vectors as an alternative to falling back to scalar code.

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_option_override_internal): Set param_vect_partial_vector_usage to 0 explicitly.
* doc/invoke.texi (vect-partial-vector-usage): Document new option.
* optabs-query.c (get_len_load_store_mode): New function.
* optabs-query.h (get_len_load_store_mode): New declare.
* params.opt (vect-partial-vector-usage): New.
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Add the handlings for vectorization using length-based partial vectors, call vect_gen_len for length generation, and rename some variables with items instead of scalars. (vect_set_loop_condition_partial_vectors): Add the handlings for vectorization using length-based partial vectors. (vect_do_peeling): Allow remaining eiters less than epilogue vf for LOOP_VINFO_USING_PARTIAL_VECTORS_P.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Init epil_using_partial_vectors_p. (_loop_vec_info::~_loop_vec_info): Call release_vec_loop_controls for lengths destruction. (vect_verify_loop_lens): New function. (vect_analyze_loop): Add handlings for epilogue of loop when it's marked to use vectorization using partial vectors. (vect_analyze_loop_2): Add the check to allow only one vectorization approach using partial vectorization at the same time. Check param vect-partial-vector-usage for partial vectors decision. Mark LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P if the epilogue is considerable to use partial vectors. Call release_vec_loop_controls for lengths destruction. (vect_estimate_min_profitable_iters): Adjust for loop vectorization using length-based partial vectors. (vect_record_loop_mask): Init factor to 1 for vectorization using mask-based partial vectors. (vect_record_loop_len): New function. (vect_get_loop_len): Likewise.
* tree-vect-stmts.c (check_load_store_for_partial_vectors): Add checks for vectorization using length-based partial vectors. Factor some code to lambda function get_valid_nvectors. (vectorizable_store): Add handlings when using length-based partial vectors. (vectorizable_load): Likewise. (vect_gen_len): New function.
* tree-vectorizer.h (struct rgroup_controls): Add field factor mainly for length-based partial vectors. (vec_loop_lens): New typedef. (_loop_vec_info): Add lens and epil_using_partial_vectors_p. (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P): New macro. (LOOP_VINFO_LENS): Likewise. (LOOP_VINFO_FULLY_WITH_LENGTH_P): Likewise. (vect_record_loop_len): New declare. (vect_get_loop_len): Likewise. (vect_gen_len): Likewise.

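A scalar sketch (illustrative only; VF and the helper structure are assumptions, not code from the patch) of what a loop vectorized with length-based partial vectors computes: full-length iterations except for the last one, whose generated length caps the active lanes, so no scalar epilogue is needed.

    enum { VF = 4 };  /* hypothetical vectorization factor, in lanes */

    void
    add_arrays (float *restrict y, const float *restrict x, int n)
    {
      for (int i = 0; i < n; i += VF)
        {
          int len = n - i < VF ? n - i : VF;  /* analogue of vect_gen_len */
          for (int j = 0; j < len; j++)       /* lanes a len_load/len_store
                                                 would touch */
            y[i + j] += x[i + j];
        }
    }
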
2020-07-09  remove premature vect_verify_datarefs_alignment  (Richard Biener, 1 file, -0/+14)

This followup removes vect_verify_datarefs_alignment and its premature cancellation of vectorization, leaving the actual decision whether alignment is supported to the functions deciding whether we can vectorize a load or store.

2020-07-08  Richard Biener  <rguenther@suse.de>

* tree-vectorizer.h (vect_verify_datarefs_alignment): Remove. (vect_slp_analyze_and_verify_instance_alignment): Rename to ... (vect_slp_analyze_instance_alignment): ... this.
* tree-vect-data-refs.c (verify_data_ref_alignment): Remove. (vect_verify_datarefs_alignment): Likewise. (vect_enhance_data_refs_alignment): Do not call vect_verify_datarefs_alignment. (vect_slp_analyze_node_alignment): Rename from vect_slp_analyze_and_verify_node_alignment and do not call verify_data_ref_alignment. (vect_slp_analyze_instance_alignment): Rename from vect_slp_analyze_and_verify_instance_alignment.
* tree-vect-stmts.c (vectorizable_store): Dump when we vectorize an unaligned access. (vectorizable_load): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Do not call vect_verify_datarefs_alignment.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Adjust.
* gcc.dg/vect/bb-slp-10.c: Adjust.
* gcc.dg/vect/slp-45.c: Likewise.
* gcc.dg/vect/vect-109.c: Likewise.

2020-07-09  vect: Enhance condition check to use partial vectors  (Kewen Lin, 1 file, -5/+16)

This patch is derived from the review of the vector with length patch series. The length-based partial vector approach doesn't support reduction so far, so we would like to disable vectorization with partial vectors explicitly for it in vectorizable_condition. Otherwise, it will cause some unexpected failures for a few cases like gcc.dg/vect/pr65947-2.c.

But if we disable it for all cases except for reduction_type equal to EXTRACT_LAST_REDUCTION, it causes one regression failure on aarch64: gcc.target/aarch64/sve/reduc_8.c -march=armv8.2-a+sve. The disabling means the outer loop can't work with partial vectors, so the check fails, but the case is actually safe. As Richard S. pointed out in the review comments, the extra inactive lanes only matter for double reductions, so this patch permits vectorization with partial vectors for the EXTRACT_LAST_REDUCTION and nested-cycle reduction cases.

Bootstrapped/regtested on aarch64-linux-gnu.

gcc/ChangeLog:

* tree-vect-stmts.c (vectorizable_condition): Prohibit vectorization with partial vectors explicitly, except for EXTRACT_LAST_REDUCTION or nested-cycle reduction.

2020-07-08  compute and check alignment info during analysis  (Richard Biener, 1 file, -22/+50)

This moves querying the alignment support scheme from load/store transform time to get_load_store_type, where we should know best what alignment constraints we actually need. This should make verify_data_ref_alignment, which IMHO prematurely disqualifies all vectorization, obsolete.

2020-07-08  Richard Biener  <rguenther@suse.de>

* tree-vect-stmts.c (get_group_load_store_type): Pass in the SLP node and the alignment support scheme output. Set that. (get_load_store_type): Likewise. (vectorizable_store): Adjust. (vectorizable_load): Likewise.

2020-07-07  fix detection of negative step DR groups  (Richard Biener, 1 file, -2/+9)

This fixes a condition that caused all negative step DR groups to be detected as single element interleaving. Such groups are rejected by interleaving vectorization but miscompiled by SLP, which is fixed by forcing VMAT_STRIDED_SLP for now.

2020-07-07  Richard Biener  <rguenther@suse.de>

* tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Fix group overlap condition to allow negative step DR groups.
* tree-vect-stmts.c (get_group_load_store_type): For multi element SLP groups force VMAT_STRIDED_SLP when the step is negative.
* gcc.dg/vect/slp-47.c: New testcase.
* gcc.dg/vect/slp-48.c: Likewise.

2020-07-03  tree-optimization/96037 - fix uninitialized use of slp_op  (Richard Biener, 1 file, -0/+1)

The following avoids leaving slp_def, which is passed to vect_is_simple_use by reference, uninitialized.

2020-07-03  Richard Biener  <rguenther@suse.de>

PR tree-optimization/96037
* tree-vect-stmts.c (vect_is_simple_use): Initialize *slp_def.

2020-07-03  refactor SLP constant insertion and provide entry insert helper  (Richard Biener, 1 file, -23/+2)

This provides helpers to insert stmts on region entry, abstracted from loop/basic-block, split out from vect_init_vector and used from the SLP constant code generation path. The SLP constant code generation path is also changed to avoid needless SSA copying since we can store VECTOR_CSTs directly in the vectorized defs array, improving the IL from the vectorizer.

2020-07-03  Richard Biener  <rguenther@suse.de>

* tree-vectorizer.h (vec_info::insert_on_entry): New. (vec_info::insert_seq_on_entry): Likewise.
* tree-vectorizer.c (vec_info::insert_on_entry): Implement. (vec_info::insert_seq_on_entry): Likewise.
* tree-vect-stmts.c (vect_init_vector_1): Use vec_info::insert_on_entry. (vect_finish_stmt_generation): Set modified bit after adjusting VUSE.
* tree-vect-slp.c (vect_create_constant_vectors): Simplify by using vec_info::insert_seq_on_entry and bypassing vect_init_vector. (vect_schedule_slp_instance): Deal with all-constant children later.

2020-07-02  tree-optimization/96022 - fix ICE with vectorized shift  (Richard Biener, 1 file, -2/+4)

This fixes lane extraction for internal def vectorized shifts with an effective scalar shift operand by always using lane zero of the first vector stmt. It also fixes an SLP build issue noticed on the testcase where we end up building unary vector ops with the only operand built from scalars, which isn't profitable by itself. The exception is for stores.

2020-07-02  Richard Biener  <rguenther@suse.de>

PR tree-optimization/96022
* tree-vect-stmts.c (vectorizable_shift): Only use the first vector stmt when extracting the scalar shift amount.
* tree-vect-slp.c (vect_build_slp_tree_2): Also build unary nodes with all-scalar children from scalars but not stores. (vect_analyze_slp_instance): Mark the node not failed.
* g++.dg/vect/pr96022.cc: New testcase.

2020-06-26  tree-optimization/95897 - fix fold-left SLP reduction insert place  (Richard Biener, 1 file, -2/+2)

This fixes the computation of the insertion place for fold-left SLP reductions where the PHIs do not have vectorized stmts. The SLP representation isn't perfect here, thus the following.

2020-06-26  Richard Biener  <rguenther@suse.de>

PR tree-optimization/95897
* tree-vectorizer.h (vectorizable_induction): Remove unused gimple_stmt_iterator * parameter.
* tree-vect-loop.c (vectorizable_induction): Likewise. (vect_analyze_loop_operations): Adjust.
* tree-vect-stmts.c (vect_analyze_stmt): Likewise. (vect_transform_stmt): Likewise.
* tree-vect-slp.c (vect_schedule_slp_instance): Adjust for fold-left reductions, clarify existing reduction case.
* gcc.dg/vect/pr95897.c: New testcase.

2020-06-25  tree-optimization/95866 - avoid using scalar ops for vectorized shift  (Richard Biener, 1 file, -3/+27)

This avoids using the original scalar SSA operand when vectorizing a shift with a vectorized shift operand where we know all vector components have the same value and thus we can use a vector by scalar shift. Using the scalar SSA operand causes a possibly long chain of scalar computation to be retained, so it's better to simply extract lane zero from the available vectorized shift operand.

2020-06-25  Richard Biener  <rguenther@suse.de>

PR tree-optimization/95866
* tree-vect-stmts.c (vectorizable_shift): Reject incompatible vectorized shift operands. For scalar shifts use lane zero of a vectorized shift operand.
* gcc.dg/vect/bb-slp-pr95866.c: New testcase.

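A hypothetical BB-SLP example in the spirit of the fix (not the added testcase): every lane shifts by the same amount, and that amount has a vectorized definition available, so lane zero can be extracted from it instead of keeping the scalar chain computing "amount" alive.

    /* All four shifts use the same operand; with the fix the scalar
       shift amount is taken as lane zero of the vectorized operand.  */
    void
    shift4 (int *restrict x, const int *restrict s)
    {
      int amount = s[0] & 31;
      x[0] <<= amount;
      x[1] <<= amount;
      x[2] <<= amount;
      x[3] <<= amount;
    }
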
2020-06-24  emit SLP vectorized loads earlier  (Richard Biener, 1 file, -1/+2)

This makes sure to emit SLP vectorized loads where the first scalar load is. This makes SLP dependence checking more powerful because hoisting loads can use TBAA, and it increases the freedom for vector placement when there are constraints from live lanes.

Vectorized shifts block always inserting vectorized stmts after vectorized defs because they end up using the original scalar operand even when the SLP graph indicates the shift operand is vectorized (and we actually emit and cost those stmts).

vect_slp_analyze_and_verify_node_alignment shows we need alignment for too many places; this is a temporary solution and my plan is to have a single meta-info for a dataref group instead (also getting rid of DR_GROUP_FIRST/NEXT_ELEMENT).

2020-06-24  Richard Biener  <rguenther@suse.de>

* tree-vectorizer.h (vect_find_first_scalar_stmt_in_slp): Declare.
* tree-vect-data-refs.c (vect_preserves_scalar_order_p): Simplify for new position of vectorized SLP loads. (vect_slp_analyze_node_dependences): Adjust for it. (vect_slp_analyze_and_verify_node_alignment): Compute alignment for the first stmts dataref.
* tree-vect-slp.c (vect_find_first_scalar_stmt_in_slp): New. (vect_schedule_slp_instance): Emit loads before the first scalar stmt.
* tree-vect-stmts.c (vectorizable_load): Do what the comment says and use vect_find_first_scalar_stmt_in_slp.

2020-06-18  remove SLP_TREE_TWO_OPERATORS, add SLP permutation node  (Richard Biener, 1 file, -8/+0)

This removes the SLP_TREE_TWO_OPERATORS hack in favor of having explicit SLP nodes for both computations and the blend operation. For this it introduces a generic merge + select + permute SLP node (with implementation limits). Building upon earlier patches it adds vect_stmt_dominates_stmt_p and the ability to compute a vector insertion place from vectorized stmts (which now have UID zero) as needed for the permute node.

2020-06-17  Richard Biener  <rguenther@suse.de>

* tree-vectorizer.h (_slp_tree::two_operators): Remove. (_slp_tree::lane_permutation): New member. (_slp_tree::code): Likewise. (SLP_TREE_TWO_OPERATORS): Remove. (SLP_TREE_LANE_PERMUTATION): New. (SLP_TREE_CODE): Likewise. (vect_stmt_dominates_stmt_p): Declare.
* tree-vectorizer.c (vect_stmt_dominates_stmt_p): New function.
* tree-vect-stmts.c (vect_model_simple_cost): Remove SLP_TREE_TWO_OPERATORS handling.
* tree-vect-slp.c (_slp_tree::_slp_tree): Amend. (_slp_tree::~_slp_tree): Likewise. (vect_two_operations_perm_ok_p): Remove. (vect_build_slp_tree_1): Remove verification of two-operator permutation here. (vect_build_slp_tree_2): When we have two different operators build two computation SLP nodes and a blend. (vect_print_slp_tree): Print the lane permutation if it exists. (slp_copy_subtree): Copy it. (vect_slp_rearrange_stmts): Re-arrange it. (vect_slp_analyze_node_operations_1): Handle SLP_TREE_CODE VEC_PERM_EXPR explicitly. (vect_schedule_slp_instance): Likewise. Remove old SLP_TREE_TWO_OPERATORS code. (vectorizable_slp_permutation): New function.

2020-06-17  vect: CSE for bump and offset in strided load/store operations.  (Kaipeng Zhou, 1 file, -12/+5)

Every time "vect_get_strided_load_store_ops" is called, new bump and offset variables and a series of stmts are created, and IVOPTs is not able to eliminate them. The patch uses "cse_and_gimplify_to_preheader" to CSE them.

2020-06-17  Bin Cheng  <bin.cheng@linux.alibaba.com>
            Kaipeng Zhou  <zhoukaipeng3@huawei.com>

PR tree-optimization/95199
* tree-vect-stmts.c: Eliminate common stmts for bump and offset in strided load/store operations and remove redundant code.

2020-06-17  Bin Cheng  <bin.cheng@linux.alibaba.com>
            Kaipeng Zhou  <zhoukaipeng3@huawei.com>

PR tree-optimization/95199
* gcc.target/aarch64/sve/pr95199.c: New test.

2020-06-17  Use SLP_TREE_VECTYPE consistently  (Richard Biener, 1 file, -6/+6)

This assigns SLP_TREE_VECTYPE to all SLP nodes and uses it when analyzing def operands. It does not deal with mismatches between SLP_TREE_VECTYPE and STMT_VINFO_VECTYPE yet - those cases are still rejected until I get to clean up the interaction with data reference groups.

2020-06-17  Richard Biener  <rguenther@suse.de>

* tree-vect-slp.c (vect_build_slp_tree_1): Set the passed in *vectype parameter. (vect_build_slp_tree_2): Set SLP_TREE_VECTYPE from what vect_build_slp_tree_1 computed. (vect_analyze_slp_instance): Set SLP_TREE_VECTYPE. (vect_slp_analyze_node_operations_1): Use the SLP node vector type. (vect_schedule_slp_instance): Likewise.
* tree-vect-stmts.c (vect_is_simple_use): Take the vector type from SLP_TREE_VECTYPE.

2020-06-17  Lower VEC_COND_EXPR into internal functions.  (Martin Liska, 1 file, -2/+6)

gcc/ChangeLog:

* Makefile.in: Add new file.
* expr.c (expand_expr_real_2): Add gcc_unreachable as we should not meet this condition. (do_store_flag): Likewise.
* gimplify.c (gimplify_expr): Gimplify first argument of VEC_COND_EXPR to be a SSA name.
* internal-fn.c (vec_cond_mask_direct): New. (vec_cond_direct): Likewise. (vec_condu_direct): Likewise. (vec_condeq_direct): Likewise. (expand_vect_cond_optab_fn): New. (expand_vec_cond_optab_fn): Likewise. (expand_vec_condu_optab_fn): Likewise. (expand_vec_condeq_optab_fn): Likewise. (expand_vect_cond_mask_optab_fn): Likewise. (expand_vec_cond_mask_optab_fn): Likewise. (direct_vec_cond_mask_optab_supported_p): Likewise. (direct_vec_cond_optab_supported_p): Likewise. (direct_vec_condu_optab_supported_p): Likewise. (direct_vec_condeq_optab_supported_p): Likewise.
* internal-fn.def (VCOND): New OPTAB. (VCONDU): Likewise. (VCONDEQ): Likewise. (VCOND_MASK): Likewise.
* optabs.c (get_rtx_code): Make it global. (expand_vec_cond_mask_expr): Removed. (expand_vec_cond_expr): Removed.
* optabs.h (expand_vec_cond_expr): Likewise. (vector_compare_rtx): Make it global.
* passes.def: Add new pass_gimple_isel pass.
* tree-cfg.c (verify_gimple_assign_ternary): Add check for VEC_COND_EXPR about first argument.
* tree-pass.h (make_pass_gimple_isel): New.
* tree-ssa-forwprop.c (pass_forwprop::execute): Prevent propagation of the first argument of a VEC_COND_EXPR.
* tree-ssa-reassoc.c (ovce_extract_ops): Support SSA_NAME as first argument of a VEC_COND_EXPR. (optimize_vec_cond_expr): Likewise.
* tree-vect-generic.c (expand_vector_divmod): Make SSA_NAME for a first argument of created VEC_COND_EXPR. (expand_vector_condition): Fix coding style.
* tree-vect-stmts.c (vectorizable_condition): Gimplify first argument.
* gimple-isel.cc: New file.

gcc/testsuite/ChangeLog:

* g++.dg/vect/vec-cond-expr-eh.C: New test.

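For illustration, a source construct that gimplifies to a VEC_COND_EXPR (a hypothetical example using the GNU vector ternary, valid in GNU C++; the new test is g++.dg/vect/vec-cond-expr-eh.C): after this change the comparison feeding the condition is forced into a separate SSA name, and the new gimple-isel pass expands the VEC_COND_EXPR via the VCOND, VCONDU, VCONDEQ or VCOND_MASK internal functions.

    typedef int v4si __attribute__ ((vector_size (16)));

    /* Lane-wise select: where a[i] < b[i], take c[i], else d[i].  */
    v4si
    blend (v4si a, v4si b, v4si c, v4si d)
    {
      return a < b ? c : d;
    }
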
2020-06-12  fix vectorizable_condition ICE with EXTRACT_LAST_REDUCTION  (Richard Biener, 1 file, -1/+2)

The previous reorg missed a guard around the else clause access.

2020-06-12  Richard Biener  <rguenther@suse.de>

PR tree-optimization/95633
* tree-vect-stmts.c (vectorizable_condition): Properly guard the vec_else_clause access with EXTRACT_LAST_REDUCTION.

2020-06-12  vect: Factor out and rename some functions/macros  (Kewen Lin, 1 file, -19/+25)

Power supports vector memory access with length (in bytes) instructions. Like the existing full masking for SVE, it is another approach to vectorize the loop using partially-populated vectors. As Richard Sandiford suggested, we should share the code between approaches with partial vectors if possible. This patch is to:

1) factor out two functions:
   - vect_min_prec_for_max_niters
   - vect_known_niters_smaller_than_vf
2) rename four functions:
   - vect_iv_limit_for_full_masking
   - check_load_store_masking
   - vect_set_loop_condition_masked
   - vect_set_loop_condition_unmasked
3) rename macros LOOP_VINFO_MASK_COMPARE_TYPE and LOOP_VINFO_MASK_IV_TYPE.

Bootstrapped/regtested on aarch64-linux-gnu.

gcc/ChangeLog:

* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE. (vect_set_loop_condition_masked): Renamed to ... (vect_set_loop_condition_partial_vectors): ... this. Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors. (vect_set_loop_condition_unmasked): Renamed to ... (vect_set_loop_condition_normal): ... this. (vect_set_loop_condition): Rename vect_set_loop_condition_unmasked to vect_set_loop_condition_normal. Rename vect_set_loop_condition_masked to vect_set_loop_condition_partial_vectors. (vect_prepare_for_masked_peels): Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE.
* tree-vect-loop.c (vect_known_niters_smaller_than_vf): New, factored out from ... (vect_analyze_loop_costing): ... this. (_loop_vec_info::_loop_vec_info): Rename mask_compare_type to compare_type. (vect_min_prec_for_max_niters): New, factored out from ... (vect_verify_full_masking): ... this. Rename vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors. Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE. (vectorizable_reduction): Update some dumpings with partial vectors instead of fully-masked. (vectorizable_live_operation): Likewise. (vect_iv_limit_for_full_masking): Renamed to ... (vect_iv_limit_for_partial_vectors): ... this.
* tree-vect-stmts.c (check_load_store_masking): Renamed to ... (check_load_store_for_partial_vectors): ... this. Update some dumpings with partial vectors instead of fully-masked. (vectorizable_store): Rename check_load_store_masking to check_load_store_for_partial_vectors. (vectorizable_load): Likewise.
* tree-vectorizer.h (LOOP_VINFO_MASK_COMPARE_TYPE): Renamed to ... (LOOP_VINFO_RGROUP_COMPARE_TYPE): ... this. (LOOP_VINFO_MASK_IV_TYPE): Renamed to ... (LOOP_VINFO_RGROUP_IV_TYPE): ... this. (vect_iv_limit_for_full_masking): Renamed to ... (vect_iv_limit_for_partial_vectors): this. (_loop_vec_info): Rename mask_compare_type to rgroup_compare_type. Rename iv_type to rgroup_iv_type.

2020-06-11  vect: Rename can_fully_mask_p to can_use_partial_vectors_p  (Kewen Lin, 1 file, -10/+10)

Power supports vector memory access with length (in bytes) instructions. Like the existing full masking for SVE, it is another approach to vectorize the loop using partially-populated vectors. As Richard Sandiford pointed out, we should extend the existing flag can_fully_mask_p to be more generic, to indicate whether we have any chance with partial vectors for this loop. So this patch renames this flag to can_use_partial_vectors_p to be more meaningful, and also renames the macro LOOP_VINFO_CAN_FULLY_MASK_P to LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.

Bootstrapped/regtested on aarch64-linux-gnu.

gcc/ChangeLog:

* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Rename can_fully_mask_p to can_use_partial_vectors_p. (vect_analyze_loop_2): Rename LOOP_VINFO_CAN_FULLY_MASK_P to LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. Rename saved_can_fully_mask_p to saved_can_use_partial_vectors_p. (vectorizable_reduction): Rename LOOP_VINFO_CAN_FULLY_MASK_P to LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. (vectorizable_live_operation): Likewise.
* tree-vect-stmts.c (permute_vec_elements): Likewise. (check_load_store_masking): Likewise. (vectorizable_operation): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise.
* tree-vectorizer.h (LOOP_VINFO_CAN_FULLY_MASK_P): Renamed to ... (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P): ... this. (_loop_vec_info): Rename can_fully_mask_p to can_use_partial_vectors_p.

2020-06-10  avoid stmt-info allocation for debug stmts  (Richard Biener, 1 file, -0/+2)

The following avoids allocating stmt info structs for debug stmts.

2020-06-10  Richard Biener  <rguenther@suse.de>

* tree-vect-loop.c (vect_determine_vectorization_factor): Skip debug stmts. (_loop_vec_info::_loop_vec_info): Likewise. (vect_update_vf_for_slp): Likewise. (vect_analyze_loop_operations): Likewise. (update_epilogue_loop_vinfo): Likewise.
* tree-vect-patterns.c (vect_determine_precisions): Likewise. (vect_pattern_recog): Likewise.
* tree-vect-slp.c (vect_detect_hybrid_slp): Likewise. (_bb_vec_info::_bb_vec_info): Likewise.
* tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.

2020-06-10  Make {SLP_TREE,STMT_VINFO}_VEC_STMTS a vector of gimple *  (Richard Biener, 1 file, -268/+193)

This makes {SLP_TREE,STMT_VINFO}_VEC_STMTS a vector of gimple * and no longer allocates a stmt_vec_info for vectorizer generated stmts, since this is now possible after removing the only use, which was chaining of vector stmts via STMT_VINFO_RELATED_STMT. This also removes all stmt_vec_info allocations done for vector stmts; the remaining ones are for stmts in the scalar IL and for patterns which are not part of the IL. Thus after this the stmt UIDs inside a basic-block are suitable for dominance checking if you ignore (or lazily fill) the zero UIDs of the vector stmts inserted during transform. This property is ensured by a new flag set when pattern analysis is complete.

2020-06-10  Richard Biener  <rguenther@suse.de>

* tree-vectorizer.h (_slp_tree::vec_stmts): Make it a vector of gimple * stmts. (_stmt_vec_info::vec_stmts): Likewise. (vec_info::stmt_vec_info_ro): New flag. (vect_finish_replace_stmt): Adjust declaration. (vect_finish_stmt_generation): Likewise. (vectorizable_induction): Likewise. (vect_transform_reduction): Likewise. (vectorizable_lc_phi): Likewise.
* tree-vect-data-refs.c (vect_create_data_ref_ptr): Do not allocate stmt infos for increments. (vect_record_grouped_load_vectors): Adjust.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise. (vectorize_fold_left_reduction): Likewise. (vect_transform_reduction): Likewise. (vect_transform_cycle_phi): Likewise. (vectorizable_lc_phi): Likewise. (vectorizable_induction): Likewise. (vectorizable_live_operation): Likewise. (vect_transform_loop): Likewise.
* tree-vect-patterns.c (vect_pattern_recog): Set stmt_vec_info_ro.
* tree-vect-slp.c (vect_get_slp_vect_def): Adjust. (vect_get_slp_defs): Likewise. (vect_transform_slp_perm_load): Likewise. (vect_schedule_slp_instance): Likewise. (vectorize_slp_instance_root_stmt): Likewise.
* tree-vect-stmts.c (vect_get_vec_defs_for_operand): Likewise. (vect_finish_stmt_generation_1): Do not allocate a stmt info. (vect_finish_replace_stmt): Do not return anything. (vect_finish_stmt_generation): Likewise. (vect_build_gather_load_calls): Adjust. (vectorizable_bswap): Likewise. (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (vect_create_vectorized_demotion_stmts): Likewise. (vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_scan_store): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise. (vectorizable_comparison): Likewise. (vect_transform_stmt): Likewise.
* tree-vectorizer.c (vec_info::vec_info): Initialize stmt_vec_info_ro. (vec_info::replace_stmt): Copy over stmt UID rather than unsetting/setting a stmt info allocating a new UID. (vec_info::set_vinfo_for_stmt): Assert !stmt_vec_info_ro.

2020-06-10  Introduce STMT_VINFO_VEC_STMTS  (Richard Biener, 1 file, -1246/+721)

This gets rid of the linked list of STMT_VINFO_VEC_STMT and STMT_VINFO_RELATED_STMT in preparation for vectorized stmts no longer needing a stmt_vec_info (which was needed just for this chaining). This has ripple-down effects in all places we gather vectorized defs. For this, new interfaces are introduced and used throughout vectorization, simplifying code in a lot of places and merging it with the SLP way of gathering vectorized operands. There is vect_get_vec_defs as the new recommended unified interface and vect_get_vec_defs_for_operand as one for non-SLP operation. I've resorted to keeping the structure of the code the same where using vect_get_vec_defs would have been too disruptive for this already large patch.

2020-06-10  Richard Biener  <rguenther@suse.de>

* tree-vect-data-refs.c (vect_vfa_access_size): Adjust. (vect_record_grouped_load_vectors): Likewise.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise. (vectorize_fold_left_reduction): Likewise. (vect_transform_reduction): Likewise. (vect_transform_cycle_phi): Likewise. (vectorizable_lc_phi): Likewise. (vectorizable_induction): Likewise. (vectorizable_live_operation): Likewise. (vect_transform_loop): Likewise.
* tree-vect-slp.c (vect_get_slp_defs): New function, split out from overload.
* tree-vect-stmts.c (vect_get_vec_def_for_operand_1): Remove. (vect_get_vec_def_for_operand): Likewise. (vect_get_vec_def_for_stmt_copy): Likewise. (vect_get_vec_defs_for_stmt_copy): Likewise. (vect_get_vec_defs_for_operand): New function. (vect_get_vec_defs): Likewise. (vect_build_gather_load_calls): Adjust. (vect_get_gather_scatter_ops): Likewise. (vectorizable_bswap): Likewise. (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (vect_get_loop_based_defs): Remove. (vect_create_vectorized_demotion_stmts): Adjust. (vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_scan_store): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise. (vectorizable_comparison): Likewise. (vect_transform_stmt): Adjust and remove no longer applicable sanity checks.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize STMT_VINFO_VEC_STMTS. (vec_info::free_stmt_vec_info): Release it.
* tree-vectorizer.h (_stmt_vec_info::vectorized_stmt): Remove. (_stmt_vec_info::vec_stmts): Add. (STMT_VINFO_VEC_STMT): Remove. (STMT_VINFO_VEC_STMTS): New. (vect_get_vec_def_for_operand_1): Remove. (vect_get_vec_def_for_operand): Likewise. (vect_get_vec_defs_for_stmt_copy): Likewise. (vect_get_vec_def_for_stmt_copy): Likewise. (vect_get_vec_defs): New overloads. (vect_get_vec_defs_for_operand): New. (vect_get_slp_defs): Declare.

2020-06-09  Remove dead code  (Richard Biener, 1 file, -43/+0)

This removes dead code that was left over from the reduction vectorization refactoring last year.

2020-06-09  Richard Biener  <rguenther@suse.de>

* tree-vect-stmts.c (vect_transform_stmt): Remove dead code.

2020-06-05  tree-optimization/95539 - fix SLP_TREE_REPRESENTATIVE vs. dr_info  (Richard Biener, 1 file, -4/+15)

This fixes a disconnect between the stmt_info used for dr_info analysis and the one in SLP_TREE_REPRESENTATIVE with a temporary workaround.

2020-06-05  Richard Biener  <rguenther@suse.de>

PR tree-optimization/95539
* tree-vect-data-refs.c (vect_slp_analyze_and_verify_instance_alignment): Use SLP_TREE_REPRESENTATIVE for the data-ref check.
* tree-vect-stmts.c (vectorizable_load): Reset stmt_info back to the first scalar stmt rather than the SLP_TREE_REPRESENTATIVE to match previous behavior.
* gcc.dg/vect/pr95539.c: New testcase.

2020-06-04  add vect_get_slp_vect_def  (Richard Biener, 1 file, -2/+2)

This adds vect_get_slp_vect_def to get at an SLP node's vectorized def, abstracting away the details. It also fixes one stray failure to use SLP_TREE_REPRESENTATIVE.

2020-05-04  Richard Biener  <rguenther@suse.de>

* tree-vectorizer.h (vect_get_slp_vect_def): Declare.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use it.
* tree-vect-stmts.c (vect_transform_stmt): Likewise. (vect_is_simple_use): Use SLP_TREE_REPRESENTATIVE.
* tree-vect-slp.c (vect_get_slp_vect_defs): Fold into single use ... (vect_get_slp_defs): ... here. (vect_get_slp_vect_def): New function.

2020-06-04  Add explicit SLP_TREE_LANES  (Richard Biener, 1 file, -6/+2)

This adds an explicit number of scalar lanes to the SLP node, avoiding the need to dispatch between stmts/ops and eventually not requiring those vectors at all.

2020-05-27  Richard Biener  <rguenther@suse.de>

* tree-vectorizer.h (_slp_tree::lanes): New. (SLP_TREE_LANES): Likewise.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use it. (vectorizable_reduction): Likewise. (vect_transform_cycle_phi): Likewise. (vectorizable_induction): Likewise. (vectorizable_live_operation): Likewise.
* tree-vect-slp.c (_slp_tree::_slp_tree): Initialize lanes. (vect_create_new_slp_node): Likewise. (slp_copy_subtree): Copy it. (vect_optimize_slp): Use it. (vect_slp_analyze_node_operations_1): Likewise. (vect_slp_convert_to_external): Likewise. (vect_bb_vectorization_profitable_p): Likewise.
* tree-vect-stmts.c (vectorizable_load): Likewise. (get_vectype_for_scalar_type): Likewise.

2020-06-04  Simplify SLP code wrt SLP_TREE_DEF_TYPE  (Richard Biener, 1 file, -24/+48)

The following removes the ugly pushing of SLP_TREE_DEF_TYPE to stmt_infos and instead makes sure to handle invariants fully in vect_is_simple_use, plus adjusts a few places I refrained from touching when enforcing vector types for them. It also simplifies building SLP nodes with all external operands from scalars by not doing that in the parent but instead not building those from the start. That also gets rid of vect_update_all_shared_vectypes.

2020-06-04  Richard Biener  <rguenther@suse.de>

* tree-vect-slp.c (vect_update_all_shared_vectypes): Remove. (vect_build_slp_tree_2): Simplify building all external op nodes from scalars. (vect_slp_analyze_node_operations): Remove push/pop of STMT_VINFO_DEF_TYPE. (vect_schedule_slp_instance): Likewise.
* tree-vect-stmts.c (vect_check_store_rhs): Pass in the stmt_info, use the vect_is_simple_use overload combining SLP and stmt_info analysis. (vect_is_simple_cond): Likewise. (vectorizable_store): Adjust. (vectorizable_condition): Likewise. (vect_is_simple_use): Fully handle invariant SLP nodes here. Amend stmt_info operand extraction with COND_EXPR and masked stores.
* tree-vect-loop.c (vectorizable_reduction): Deal with COND_EXPR representation ugliness.

2020-06-03  tree-optimization/95487 - use a truth type for scatter masks  (Richard Biener, 1 file, -2/+6)

This makes sure to get a truth type for scatter masks even when they are invariant.

2020-06-03  Richard Biener  <rguenther@suse.de>

PR tree-optimization/95487
* tree-vect-stmts.c (vectorizable_store): Use a truth type for the scatter mask.
* g++.dg/vect/pr95487.cc: New testcase.

2020-05-29  tree-optimization/95356 - more vectorizable_shift massaging  (Richard Biener, 1 file, -17/+20)

The previous fix clashed with the rewrite to emit SLP invariants during the SLP walk. Thus the following adjusts the SLP tree hacking vectorizable_shift does appropriately. Still resisting the attempt of a rewrite of vectorizable_shift ...

2020-05-29  Richard Biener  <rguenther@suse.de>

PR tree-optimization/95356
* tree-vect-stmts.c (vectorizable_shift): Do in-place SLP node hacking during analysis.

2020-05-29  tree-optimization/95403 - guard vect_init_vector_1 against NULL stmt_info  (Richard Biener, 1 file, -1/+1)

2020-05-29  Richard Biener  <rguenther@suse.de>

PR tree-optimization/95403
* tree-vect-stmts.c (vect_init_vector_1): Guard against NULL stmt_vinfo.
* gfortran.dg/vect/pr95403.f: New testcase.

2020-05-28  make vect_finish_stmt_generation work w/o stmt_vec_info  (Richard Biener, 1 file, -8/+13)

This makes the call chain below vect_init_vector happy with a NULL stmt_vec_info which is used as "context".

2020-05-27  Richard Biener  <rguenther@suse.de>

* tree-vect-stmts.c (vect_finish_stmt_generation_1): Conditionalize stmt_info use, assert the new stmt cannot throw when not specified. (vect_finish_stmt_generation): Adjust assert.