aboutsummaryrefslogtreecommitdiff
path: root/gcc/tree-vect-loop.c
AgeCommit message (Collapse)AuthorFilesLines
2020-06-04add vect_get_slp_vect_defRichard Biener1-1/+1
This adds vect_get_slp_vect_def to get at a SLP nodes vectorized def, abstracting away the details. It also fixes one stray failure to use SLP_TREE_REPRESENTATIVE. 2020-05-04 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vect_get_slp_vect_def): Declare. * tree-vect-loop.c (vect_create_epilog_for_reduction): Use it. * tree-vect-stmts.c (vect_transform_stmt): Likewise. (vect_is_simple_use): Use SLP_TREE_REPRESENTATIVE. * tree-vect-slp.c (vect_get_slp_vect_defs): Fold into single use ... (vect_get_slp_defs): ... here. (vect_get_slp_vect_def): New function.
2020-06-04Add explicit SLP_TREE_LANESRichard Biener1-7/+6
This adds an explicit number of scalar lanes to the SLP node avoiding to dispatch between stmts/ops and eventually not require those vectors at all. 2020-05-27 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (_slp_tree::lanes): New. (SLP_TREE_LANES): Likewise. * tree-vect-loop.c (vect_create_epilog_for_reduction): Use it. (vectorizable_reduction): Likewise. (vect_transform_cycle_phi): Likewise. (vectorizable_induction): Likewise. (vectorizable_live_operation): Likewise. * tree-vect-slp.c (_slp_tree::_slp_tree): Initialize lanes. (vect_create_new_slp_node): Likewise. (slp_copy_subtree): Copy it. (vect_optimize_slp): Use it. (vect_slp_analyze_node_operations_1): Likewise. (vect_slp_convert_to_external): Likewise. (vect_bb_vectorization_profitable_p): Likewise. * tree-vect-stmts.c (vectorizable_load): Likewise. (get_vectype_for_scalar_type): Likewise.
2020-06-04Simplify SLP code wrt SLP_TREE_DEF_TYPERichard Biener1-1/+7
The following removes the ugly pushing of SLP_TREE_DEF_TYPE to stmt_infos and instead makes sure to handle invariants fully in vect_is_simple_use plus adjusting a few places I refrained from touching when enforcing vector types for them. It also simplifies building SLP nodes with all external operands from scalars by not doing that in the parent but instead not building those from the start. That also gets rid of vect_update_all_shared_vectypes. 2020-06-04 Richard Biener <rguenther@suse.de> * tree-vect-slp.c (vect_update_all_shared_vectypes): Remove. (vect_build_slp_tree_2): Simplify building all external op nodes from scalars. (vect_slp_analyze_node_operations): Remove push/pop of STMT_VINFO_DEF_TYPE. (vect_schedule_slp_instance): Likewise. * tree-vect-stmts.c (ect_check_store_rhs): Pass in the stmt_info, use the vect_is_simple_use overload combining SLP and stmt_info analysis. (vect_is_simple_cond): Likewise. (vectorizable_store): Adjust. (vectorizable_condition): Likewise. (vect_is_simple_use): Fully handle invariant SLP nodes here. Amend stmt_info operand extraction with COND_EXPR and masked stores. * tree-vect-loop.c (vectorizable_reduction): Deal with COND_EXPR representation ugliness.
2020-05-29tree-optimization/95272 - add SLP_TREE_REPRESENTATIVERichard Biener1-2/+6
This adds SLP_TREE_REPRESENTATIVE - a representative stmt-info that is used by SLP analysis and code generation. This avoids the need for the hack in vect_slp_rearrange_stmts which previously avoided to re-arrange stmts that might not have been isomorphic because of operand swapping. It also plays nice with future directions of SLP and for the forseeable future is easier than replicating more and more info in the SLP node as long as non-SLP is in-tree. 2020-05-29 Richard Biener <rguenther@suse.de> PR tree-optimization/95272 * tree-vectorizer.h (_slp_tree::representative): Add. (SLP_TREE_REPRESENTATIVE): Likewise. * tree-vect-loop.c (vectorizable_reduction): Adjust SLP node gathering. (vectorizable_live_operation): Use the representative to attach the reduction info to. * tree-vect-slp.c (_slp_tree::_slp_tree): Initialize SLP_TREE_REPRESENTATIVE. (vect_create_new_slp_node): Likewise. (slp_copy_subtree): Copy it. (vect_slp_rearrange_stmts): Re-arrange even COND_EXPR stmts. (vect_slp_analyze_node_operations_1): Pass the representative to vect_analyze_stmt. (vect_schedule_slp_instance): Pass the representative to vect_transform_stmt. * gcc.dg/vect/pr95272.c: New testcase.
2020-05-22enfoce SLP_TREE_VECTYPE for invariantsRichard Biener1-3/+30
This tries to enforce a set SLP_TREE_VECTYPE in vect_get_constant_vectors and provides some infrastructure for setting it in the vectorizable_* functions, amending those. 2020-05-22 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vect_is_simple_use): New overload. (vect_maybe_update_slp_op_vectype): New. * tree-vect-stmts.c (vect_is_simple_use): New overload accessing operands of SLP vs. non-SLP operation transparently. (vect_maybe_update_slp_op_vectype): New function updating the possibly shared SLP operands vector type. (vectorizable_operation): Be a bit more SLP vs non-SLP agnostic using the new vect_is_simple_use overload; update SLP invariant operand nodes vector type. (vectorizable_comparison): Likewise. (vectorizable_call): Likewise. (vectorizable_conversion): Likewise. (vectorizable_shift): Likewise. (vectorizable_store): Likewise. (vectorizable_condition): Likewise. (vectorizable_assignment): Likewise. * tree-vect-loop.c (vectorizable_reduction): Likewise. * tree-vect-slp.c (vect_get_constant_vectors): Enforce present SLP_TREE_VECTYPE and check it matches previous behavior.
2020-05-20tree-optimization/95219 - improve IV selection for inductionRichard Biener1-1/+13
This improves code generation with SSE2 for the testcase by making sure to only generate a single IV when the group size is a multiple of the vector size. It also adjusts the testcase which was passing before. 2020-05-20 Richard Biener <rguenther@suse.de> PR tree-optimization/95219 * tree-vect-loop.c (vectorizable_induction): Reduce group_size before computing the number of required IVs. * gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c: Adjust.
2020-05-13add vectype parameter to add_stmt_cost hookRichard Biener1-22/+24
This adds a vectype parameter to add_stmt_cost which avoids the need to pass down a (wrong) stmt_info just to carry this information. Useful for invariants which do not have a stmt_info associated. 2020-05-13 Richard Biener <rguenther@suse.de> * target.def (add_stmt_cost): Add new vectype parameter. * targhooks.c (default_add_stmt_cost): Adjust. * targhooks.h (default_add_stmt_cost): Likewise. * config/aarch64/aarch64.c (aarch64_add_stmt_cost): Take new vectype parameter. * config/arm/arm.c (arm_add_stmt_cost): Likewise. * config/i386/i386.c (ix86_add_stmt_cost): Likewise. * config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise. * tree-vectorizer.h (stmt_info_for_cost::vectype): Add. (dump_stmt_cost): Add new vectype parameter. (add_stmt_cost): Likewise. (record_stmt_cost): Likewise. (record_stmt_cost): Add overload with old signature. * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Adjust. (vect_get_known_peeling_cost): Likewise. (vect_estimate_min_profitable_iters): Likewise. * tree-vectorizer.c (dump_stmt_cost): Add new vectype parameter. * tree-vect-stmts.c (record_stmt_cost): Likewise. (vect_prologue_cost_for_slp_op): Remove stmt_vec_info parameter and pass down correct vectype and NULL stmt_info. (vect_model_simple_cost): Adjust. (vect_model_store_cost): Likewise.
2020-05-13Remove SLP_INSTANCE_GROUP_SIZERichard Biener1-1/+1
This removes the SLP_INSTANCE_GROUP_SIZE member since the number of lanes throughout a SLP subgraph is not necessarily constant. 2020-05-13 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (SLP_INSTANCE_GROUP_SIZE): Remove. (_slp_instance::group_size): Likewise. * tree-vect-loop.c (vectorizable_reduction): The group size is the number of lanes in the node. * tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Likewise. (vect_analyze_slp_instance): Do not set SLP_INSTANCE_GROUP_SIZE, verify it matches the instance trees number of lanes. (vect_slp_analyze_node_operations_1): Use the numer of lanes in the node as group size. (vect_bb_vectorization_profitable_p): Use the instance root number of lanes for the size of life. (vect_schedule_slp_instance): Use the number of lanes as group_size. * tree-vect-stmts.c (vectorizable_load): Remove SLP instance parameter. Use the number of lanes of the load for the group size in the gap adjustment code. (vect_analyze_stmt): Adjust. (vect_transform_stmt): Likewise.
2020-05-12tree: Add vector_element_bits(_tree) [PR94980 1/3]Richard Sandiford1-3/+1
A lot of code that wants to know the number of bits in a vector element gets that information from the element's TYPE_SIZE, which is always equal to TYPE_SIZE_UNIT * BITS_PER_UNIT. This doesn't work for SVE and AVX512-style packed boolean vectors, where several elements can occupy a single byte. This patch introduces a new pair of helpers for getting the true (possibly sub-byte) size. I made a token attempt to convert obvious element size calculations, but I'm sure I missed some. 2020-05-12 Richard Sandiford <richard.sandiford@arm.com> gcc/ PR tree-optimization/94980 * tree.h (vector_element_bits, vector_element_bits_tree): Declare. * tree.c (vector_element_bits, vector_element_bits_tree): New. * match.pd: Use the new functions instead of determining the vector element size directly from TYPE_SIZE(_UNIT). * tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise. * tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Likewise. * tree-vect-stmts.c (vect_is_simple_cond): Likewise. * tree-vect-generic.c (expand_vector_piecewise): Likewise. (expand_vector_conversion): Likewise. (expand_vector_addition): Likewise for a TYPE_SIZE_UNIT used as a divisor. Convert the dividend to bits to compensate. * tree-vect-loop.c (vectorizable_live_operation): Call vector_element_bits instead of open-coding it.
2020-05-08move permutation validity checkRichard Biener1-0/+3
This delays the SLP permutation check to vectorizable_load and optimizes permutations only after all SLP instances have been generated and the vectorization factor is determined. 2020-05-08 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vec_info::slp_loads): New. (vect_optimize_slp): Declare. * tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Do nothing when there are no loads. (vect_gather_slp_loads): Gather loads into a vector. (vect_supported_load_permutation_p): Remove. (vect_analyze_slp_instance): Do not verify permutation validity here. (vect_analyze_slp): Optimize permutations of reductions after all SLP instances have been gathered and gather all loads. (vect_optimize_slp): New function split out from vect_supported_load_permutation_p. Elide some permutations. (vect_slp_analyze_bb_1): Call vect_optimize_slp. * tree-vect-loop.c (vect_analyze_loop_2): Likewise. * tree-vect-stmts.c (vectorizable_load): Check whether the load can be permuted. When generating code assert we can. * gcc.dg/vect/bb-slp-pr68892.c: Adjust for not supported SLP permutations becoming builds from scalars. * gcc.dg/vect/bb-slp-pr78205.c: Likewise. * gcc.dg/vect/bb-slp-34.c: Likewise.
2020-05-05add vec_info * parameters where neededRichard Biener1-98/+123
Soonish we'll get SLP nodes which have no corresponding scalar stmt and thus not stmt_vec_info and thus no way to get back to the associated vec_info. This patch makes the vec_info available as part of the APIs instead of putting in that back-pointer into the leaf data structures. 2020-05-05 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (_stmt_vec_info::vinfo): Remove. (STMT_VINFO_LOOP_VINFO): Likewise. (STMT_VINFO_BB_VINFO): Likewise. * tree-vect-data-refs.c: Adjust for the above, adding vec_info * parameters and adjusting calls. * tree-vect-loop-manip.c: Likewise. * tree-vect-loop.c: Likewise. * tree-vect-patterns.c: Likewise. * tree-vect-slp.c: Likewise. * tree-vect-stmts.c: Likewise. * tree-vectorizer.c: Likewise. * target.def (add_stmt_cost): Add vec_info * parameter. * target.h (stmt_in_inner_loop_p): Likewise. * targhooks.c (default_add_stmt_cost): Adjust. * doc/tm.texi: Re-generate. * config/aarch64/aarch64.c (aarch64_extending_load_p): Add vec_info * parameter and adjust. (aarch64_sve_adjust_stmt_cost): Likewise. (aarch64_add_stmt_cost): Likewise. * config/arm/arm.c (arm_add_stmt_cost): Likewise. * config/i386/i386.c (ix86_add_stmt_cost): Likewise. * config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
2020-04-20vect: Tweak vect_better_loop_vinfo_p handling of variable VFsRichard Sandiford1-1/+30
This patch fixes a large lmbench performance regression with 128-bit SVE, compiled in length-agnostic mode. vect_better_loop_vinfo_p (new in GCC 10) tries to estimate whether a new loop_vinfo is cheaper than a previous one, with an in-built preference for the old one. For variable VF it prefers the old loop_vinfo if it is cheaper for at least one VF. However, we have no idea how likely that VF is in practice. Another extreme would be to do what most of the rest of the vectoriser does, and rely solely on the constant estimated VF. But as noted in the comment, this means that a one-unit cost difference would be enough to pick the new loop_vinfo, despite the target generally preferring the old loop_vinfo where possible. The cost model just isn't accurate enough for that to produce good results as things stand: there might not be any practical benefit to the new loop_vinfo at the estimated VF, and it would be significantly worse for higher VFs. The patch instead goes for a hacky compromise: make sure that the new loop_vinfo is also no worse than the old loop_vinfo at double the estimated VF. For all but trivial loops, this ensures that the new loop_vinfo is only chosen if it is better than the old one by a non-trivial amount at the estimated VF. It also avoids putting too much faith in the VF estimate. I realise this isn't great, but it's supposed to be a conservative fix suitable for stage 4. The only affected testcases are the ones for pr89007-*.c, where Advanced SIMD is indeed preferred for 128-bit SVE and is no worse for 256-bit SVE. Part of the problem here is that if the new loop_vinfo is better, we discard the old one and never consider using it even as an epilogue loop. This means that if we choose Advanced SIMD over SVE, we're much more likely to have left-over scalar elements. Another is that the estimate provided by estimated_poly_value might have different probabilities attached. E.g. when tuning for a particular core, the estimate is probably accurate, but when tuning for generic code, the estimate is more of a guess. Relying solely on the estimate is probably correct for the former but not for the latter. Hopefully those are things that we could tackle in GCC 11. 2020-04-20 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vect-loop.c (vect_better_loop_vinfo_p): If old_loop_vinfo has a variable VF, prefer new_loop_vinfo if it is cheaper for the estimated VF and is no worse at double the estimated VF. gcc/testsuite/ * gcc.target/aarch64/sve/cost_model_8.c: New test. * gcc.target/aarch64/sve/cost_model_9.c: Likewise. * gcc.target/aarch64/sve/pr89007-1.c: Add -msve-vector-bits=512. * gcc.target/aarch64/sve/pr89007-2.c: Likewise.
2020-04-03Fix PR94443 with gsi_insert_seq_before [PR94443]Kewen Lin1-2/+2
This patch is to fix the stupid mistake by using gsi_insert_seq_before instead of gsi_insert_before. BTW, the regression testing on one x86_64 machine from CFarm is unable to reveal it (I guess due to native arch sandybridge?), so I specified additional option -march=znver2 and verified the coverage. Bootstrapped/regtested on powerpc64le-linux-gnu (P9) and x86_64-pc-linux-gnu, also verified the fail cases in related PRs. 2020-04-03 Kewen Lin <linkw@gcc.gnu.org> gcc/ PR tree-optimization/94443 * tree-vect-loop.c (vectorizable_live_operation): Use gsi_insert_seq_before to replace gsi_insert_before. gcc/testsuite/ PR tree-optimization/94443 * gcc.dg/vect/pr94443.c: New test.
2020-04-01Fix PR94043 by making vect_live_op generate lc-phiKewen Lin1-6/+44
As PR94043 shows, my commit r10-4524 exposed one issue in vectorizable_live_operation, which inserts one extra BB before the single exit, leading unexpected operand expansion and unexpected loop depth assertion. As Richi suggested, this patch is to teach vectorizable_live_operation to generate loop closed phi for vec_lhs, it looks like: loop; # lhs' = PHI <lhs> => loop; # vec_lhs' = PHI <vec_lhs> new_tree = BIT_FIELD_REF <vec_lhs', ...>; lhs' = new_tree; I noticed that there are some SLP cases that have same lhs and vec_lhs but different offsets, which can make us have more PHIs for the same vec_lhs there. But I think it would be fine since only one of them is actually live, the others should be eliminated by the following dce. So the patch doesn't check whether there is one phi for vec_lhs, just create one directly instead. Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8. 2020-04-01 Kewen Lin <linkw@gcc.gnu.org> gcc/ChangeLog PR tree-optimization/94043 * tree-vect-loop.c (vectorizable_live_operation): Generate loop-closed phi for vec_lhs and use it for lane extraction. gcc/testsuite/ChangeLog PR tree-optimization/94043 * gfortran.dg/graphite/vect-pr94043.f90: New test.
2020-03-17Fix up duplicated duplicated words mostly in commentsJakub Jelinek1-1/+1
In the r10-7197-gbae7b38cf8a21e068ad5c0bab089dedb78af3346 commit I've noticed duplicated word in a message, which lead me to grep for those and we have a tons of them. I've used grep -v 'long long\|optab optab\|template template\|double double' *.[chS] */*.[chS] *.def config/*/* 2>/dev/null | grep ' \([a-zA-Z]\+\) \1 ' Note, the command will not detect the doubled words at the start or end of line or when one of the words is at the end of line and the next one at the start of another one. Some of it is fairly obvious, e.g. all the "the the" cases which is something I've posted and committed patch for already e.g. in 2016, other cases are often valid, e.g. "that that" seems to look mostly ok to me. Some cases are quite hard to figure out, I've left out some of them from the patch (e.g. "and and" in some cases isn't talking about bitwise/logical and and so looks incorrect, but in other cases it is talking about those operations). In most cases the right solution seems to be to remove one of the duplicated words, but not always. I think most important are the ones with user visible messages (in the patch 3 of the first 4 hunks), the rest is just comments (and internal documentation; for that see the doc/tm.texi changes). 2020-03-17 Jakub Jelinek <jakub@redhat.com> * lra-spills.c (remove_pseudos): Fix up duplicated word issue in a dump message. * tree-sra.c (create_access_replacement): Fix up duplicated word issue in a comment. * read-rtl-function.c (find_param_by_name, function_reader::parse_enum_value, function_reader::get_insn_by_uid): Likewise. * spellcheck.c (get_edit_distance_cutoff): Likewise. * tree-data-ref.c (create_ifn_alias_checks): Likewise. * tree.def (SWITCH_EXPR): Likewise. * selftest.c (assert_str_contains): Likewise. * ipa-param-manipulation.h (class ipa_param_body_adjustments): Likewise. * tree-ssa-math-opts.c (convert_expand_mult_copysign): Likewise. * tree-ssa-loop-split.c (find_vdef_in_loop): Likewise. * langhooks.h (struct lang_hooks_for_decls): Likewise. * ipa-prop.h (struct ipa_param_descriptor): Likewise. * tree-ssa-strlen.c (handle_builtin_string_cmp, handle_store): Likewise. * tree-ssa-dom.c (simplify_stmt_for_jump_threading): Likewise. * tree-ssa-reassoc.c (reassociate_bb): Likewise. * tree.c (component_ref_size): Likewise. * hsa-common.c (hsa_init_compilation_unit_data): Likewise. * gimple-ssa-sprintf.c (get_string_length, format_string, format_directive): Likewise. * omp-grid.c (grid_process_kernel_body_copy): Likewise. * input.c (string_concat_db::get_string_concatenation, test_lexer_string_locations_ucn4): Likewise. * cfgexpand.c (pass_expand::execute): Likewise. * gimple-ssa-warn-restrict.c (builtin_memref::offset_out_of_bounds, maybe_diag_overlap): Likewise. * rtl.c (RTX_CODE_HWINT_P_1): Likewise. * shrink-wrap.c (spread_components): Likewise. * tree-ssa-dse.c (initialize_ao_ref_for_dse, valid_ao_ref_for_dse): Likewise. * tree-call-cdce.c (shrink_wrap_one_built_in_call_with_conds): Likewise. * dwarf2out.c (dwarf2out_early_finish): Likewise. * gimple-ssa-store-merging.c: Likewise. * ira-costs.c (record_operand_costs): Likewise. * tree-vect-loop.c (vectorizable_reduction): Likewise. * target.def (dispatch): Likewise. (validate_dims, gen_ccmp_first): Fix up duplicated word issue in documentation text. * doc/tm.texi: Regenerated. * config/i386/x86-tune.def (X86_TUNE_PARTIAL_FLAG_REG_STALL): Fix up duplicated word issue in a comment. * config/i386/i386.c (ix86_test_loading_unspec): Likewise. * config/i386/i386-features.c (remove_partial_avx_dependency): Likewise. * config/msp430/msp430.c (msp430_select_section): Likewise. * config/gcn/gcn-run.c (load_image): Likewise. * config/aarch64/aarch64-sve.md (sve_ld1r<mode>): Likewise. * config/aarch64/aarch64.c (aarch64_gen_adjusted_ldpstp): Likewise. * config/aarch64/falkor-tag-collision-avoidance.c (single_dest_per_chain): Likewise. * config/nvptx/nvptx.c (nvptx_record_fndecl): Likewise. * config/fr30/fr30.c (fr30_arg_partial_bytes): Likewise. * config/rs6000/rs6000-string.c (expand_cmp_vec_sequence): Likewise. * config/rs6000/rs6000-p8swap.c (replace_swapped_load_constant): Likewise. * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Likewise. * config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise. * config/rs6000/rs6000-logue.c (rs6000_emit_probe_stack_range_stack_clash): Likewise. * config/nds32/nds32-md-auxiliary.c (nds32_split_ashiftdi3): Likewise. Fix various other issues in the comment. c-family/ * c-common.c (resolve_overloaded_builtin): Fix up duplicated word issue in a diagnostic message. cp/ * pt.c (tsubst): Fix up duplicated word issue in a diagnostic message. (lookup_template_class_1, tsubst_expr): Fix up duplicated word issue in a comment. * parser.c (cp_parser_statement, cp_parser_linkage_specification, cp_parser_placeholder_type_specifier, cp_parser_constraint_requires_parens): Likewise. * name-lookup.c (suggest_alternative_in_explicit_scope): Likewise. fortran/ * array.c (gfc_check_iter_variable): Fix up duplicated word issue in a comment. * arith.c (gfc_arith_concat): Likewise. * resolve.c (gfc_resolve_ref): Likewise. * frontend-passes.c (matmul_lhs_realloc): Likewise. * module.c (gfc_match_submodule, load_needed): Likewise. * trans-expr.c (gfc_init_se): Likewise.
2020-01-28vect: Pattern-matched calls in reduction chainsRichard Sandiford1-3/+11
gcc.dg/pr56350.c started ICEing for SVE in GCC 10 because we pattern-matched a division reduction: a /= 8; into a signed shift with division semantics: ... = IFN_SDIV_POW2 (..., 3); whereas the reduction code expected it still to be a gassign. One fix would be to check for a reduction in the pattern matcher (but current patterns don't generally do that). Another would be to fail gracefully for reductions involving calls. Since we can't vectorise the reduction either way, and probably have a better shot with the shift form, this patch goes for the "fail gracefully" approach. 2020-01-28 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vect-loop.c (vectorizable_reduction): Fail gracefully for reduction chains that (now) include a call.
2020-01-20tree-optimization/93094 pass down VECTORIZED_CALL to versioningRichard Biener1-2/+2
When versioning is run the IL is already mangled and finding a VECTORIZED_CALL IFN can fail. 2020-01-20 Richard Biener <rguenther@suse.de> PR tree-optimization/93094 * tree-vectorizer.h (vect_loop_versioning): Adjust. (vect_transform_loop): Likewise. * tree-vectorizer.c (try_vectorize_loop_1): Pass down loop_vectorized_call to vect_transform_loop. * tree-vect-loop.c (vect_transform_loop): Pass down loop_vectorized_call to vect_loop_versioning. * tree-vect-loop-manip.c (vect_loop_versioning): Use the earlier discovered loop_vectorized_call. * gcc.dg/vect/pr93094.c: New testcase.
2020-01-16PR tree-optimization/92429 do not fold when updating epilogue statementsAndre Vieira1-1/+6
This patch addresses the problem reported in PR92429. When creating an epilogue for vectorization we have to replace the SSA_NAMEs in the PATTERN_DEF_SEQs and RELATED_STMTs of the epilogue's loop_vec_infos. When doing this we were using simplify_replace_tree which always folds the replacement. This may lead to a different tree-node than the one which was analyzed in vect_loop_analyze. In turn the new tree-node may require a different vectorization than the one we had prepared for which caused the ICE in question. gcc/ChangeLog: 2020-01-16 Andre Vieira <andre.simoesdiasvieira@arm.com> PR tree-optimization/92429 * tree-ssa-loop-niter.h (simplify_replace_tree): Add parameter. * tree-ssa-loop-niter.c (simplify_replace_tree): Add parameter to control folding. * tree-vect-loop.c (update_epilogue_vinfo): Do not fold when replacing tree. gcc/testsuite/ChangeLog: 2020-01-16 Andre Vieira <andre.simoesdiasvieira@arm.com> PR tree-optimization/92429 * gcc.dg/vect/pr92429.c: New test.
2020-01-15PR tree-optimization/93247 - ICE in get_load_store_typeRichard Sandiford1-1/+2
My earlier update_epilogue_loop_vinfo patch introduced an ICE on these tests for AVX512. If we use pattern stmts, STMT_VINFO_GATHER_SCATTER_P is valid for both the original stmt and the pattern stmt, but STMT_VINFO_MEMORY_ACCESS_TYPE is valid only for the latter. 2020-01-15 Richard Sandiford <richard.sandiford@arm.com> gcc/ PR tree-optimization/93247 * tree-vect-loop.c (update_epilogue_loop_vinfo): Check the access type of the stmt that we're going to vectorize. gcc/testsuite/ PR tree-optimization/93247 * gcc.dg/vect/pr93247-1.c: New test. * gcc.dg/vect/pr93247-2.c: Likewise.
2020-01-10Use get_related_vectype_for_scalar_type for reduction indicesRichard Sandiford1-2/+3
The related_vector_mode series missed this case in vect_create_epilog_for_reduction, where we want to create the unsigned integer equivalent of another vector. Without it we could mix SVE and Advanced SIMD vectors in the same operation. This showed up on existing tests when testing with fixed-length -msve-vector-bits=128. 2020-01-10 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vect-loop.c (vect_create_epilog_for_reduction): Use get_related_vectype_for_scalar_type rather than build_vector_type to create the index type for a conditional reduction. From-SVN: r280112
2020-01-10Fix gather/scatter check when updating a vector epilogue loopRichard Sandiford1-1/+1
update_epilogue_loop_vinfo applies SSA renmaing to the DR_REF of a gather or scatter, so that vect_check_gather_scatter continues to work. However, we sometimes also rely on vect_check_gather_scatter when using gathers and scatters to implement strided accesses. This showed up on existing tests when testing with fixed-length -msve-vector-bits=128. 2020-01-10 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vect-loop.c (update_epilogue_loop_vinfo): Update DR_REF for any type of gather or scatter, including strided accesses. From-SVN: r280111
2020-01-10[vect] Keep track of DR_OFFSET advance in dr_vec_info rather than data_referenceAndre Vieira1-11/+4
gcc/ChangeLog: 2020-01-10 Andre Vieira <andre.simoesdiasvieira@arm.com> * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Use get_dr_vinfo_offset * tree-vect-loop.c (update_epilogue_loop_vinfo): Remove orig_drs_init parameter and its use to reset DR_OFFSET's. (vect_transform_loop): Remove orig_drs_init argument. * tree-vect-loop-manip.c (vect_update_init_of_dr): Update the offset member of dr_vec_info rather than the offset of the associated data_reference's innermost_loop_behavior. (vect_update_init_of_dr): Pass dr_vec_info instead of data_reference. (vect_do_peeling): Remove orig_drs_init parameter and its construction. * tree-vect-stmts.c (check_scan_store): Replace use of DR_OFFSET with get_dr_vinfo_offset. (vectorizable_store): Likewise. (vectorizable_load): Likewise. From-SVN: r280107
2020-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r279813
2019-12-27Add missing target check for fully-masked fold-left reductionsRichard Sandiford1-0/+12
The fold-left reduction code has a (rarely-used) fallback that handles cases in which the loop is fully-masked and the target has no native support for the reduction. The fallback includea a VEC_COND_EXPR between the reduction vector and a safe value, so we should check whether that VEC_COND_EXPR is supported. 2019-12-27 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vect-loop.c (vectorizable_reduction): Check whether the target supports the required VEC_COND_EXPR operation before allowing the fallback handling of masked fold-left reductions. gcc/testsuite/ * gcc.target/aarch64/sve/mixed_size_10.c: New test. From-SVN: r279742
2019-12-17Add pointer to PR92772Andrew Stubbs1-1/+4
2019-12-17 Andrew Stubbs <ams@codesourcery.com> * tree-vect-loop.c (vect_create_epilog_for_reduction): Mention pr92772 in the comments. From-SVN: r279460
2019-12-10Add missing conversion in vect_create_epilog_for_reductionRichard Sandiford1-0/+2
The direct_slp_reduc code in vect_create_epilog_for_reduction was still assuming that all types involved in a reduction are the same (up to types_compatible_p), whereas we now support differences in sign. This was causing an ICE in gcc.dg/vect/pr92324-4.c for SVE. 2019-12-10 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vect-loop.c (vect_create_epilog_for_reduction): When handling direct_slp_reduc, allow the PHI arguments to have a different type from the vector elements. From-SVN: r279164
2019-12-10Disallow EXTRACT_LAST_REDUCTION for reduction chainsRichard Sandiford1-2/+3
gcc.dg/vect/vect-cond-reduc-5.c was ICEing for SVE because we tried to use an extract-last reduction for a chain of COND_EXPRs. Adding support for the chained case would be too invasive for stage 3 so this patch explicitly forbids it instead. I've filed PR92884 for the possible future work. 2019-12-10 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vect-loop.c (vectorizable_reduction): Don't use EXTRACT_LAST_REDUCTION for chained reductions. From-SVN: r279161
2019-12-05[Patch, GCC] Fix a condition post r278611Sudakshina Das1-1/+1
gcc/ChangeLog 2019-12-05 Sudakshina Das <sudi.das@arm.com> * tree-vect-loop.c (vect_model_reduction_cost): Remove reduction_type check from if condition. From-SVN: r279012
2019-12-02re PR tree-optimization/92742 (ICE in info_for_reduction, at ↵Richard Biener1-1/+2
tree-vect-loop.c:4367) 2019-12-02 Richard Biener <rguenther@suse.de> PR tree-optimization/92742 * tree-vect-loop.c (vect_fixup_reduc_chain): Do not touch the def-type but verify it is consistent with the original stmts. * gcc.dg/torture/pr92742.c: New testcase. From-SVN: r278896
2019-11-29Fix DR_GROUP_GAP for strided accesses (PR 92677)Richard Sandiford1-1/+4
When dissolving an SLP-only group of accesses, we should only set the gap to group_size - 1 for normal non-strided groups. 2019-11-29 Richard Sandiford <richard.sandiford@arm.com> gcc/ PR tree-optimization/92677 * tree-vect-loop.c (vect_dissolve_slp_only_groups): Set the gap to zero when dissolving a group of strided accesses. gcc/testsuite/ PR tree-optimization/92677 * gcc.dg/vect/pr92677.c: New test. From-SVN: r278852
2019-11-29Don't defer choice of vector type for bools (PR 92596)Richard Sandiford1-30/+8
Now that stmt_vec_info records the choice between vector mask types and normal nonmask types, we can use that information in vect_get_vector_types_for_stmt instead of deferring the choice of vector type till later. vect_get_mask_type_for_stmt used to check whether the boolean inputs to an operation: (a) consistently used mask types or consistently used nonmask types; and (b) agreed on the number of elements. (b) shouldn't be a problem when (a) is met. If the operation consistently uses mask types, tree-vect-patterns.c will have corrected any mismatches in mask precision. (This is because we only use mask types for a small well-known set of operations and tree-vect-patterns.c knows how to handle any that could have different mask precisions.) And if the operation consistently uses normal nonmask types, there's no reason why booleans should need extra vector compatibility checks compared to ordinary integers. So the potential difficulties all seem to come from (a). Now that we've chosen the result type ahead of time, we also have to consider whether the outputs and inputs consistently use mask types. Taking each vectorizable_* routine in turn: - vectorizable_call vect_get_vector_types_for_stmt only handled booleans specially for gassigns, so vect_get_mask_type_for_stmt never had chance to handle calls. I'm not sure we support any calls that operate on booleans, but as things stand, a boolean result would always have a nonmask type. Presumably any vector argument would also need to use nonmask types, unless it corresponds to internal_fn_mask_index (which is already a special case). For safety, I've added a check for mask/nonmask combinations here even though we didn't check this previously. - vectorizable_simd_clone_call Again, vect_get_mask_type_for_stmt never had chance to handle calls. The result of the call will always be a nonmask type and the patch for PR 92710 rejects mask arguments. So all booleans should consistently use nonmask types here. - vectorizable_conversion The function already rejects any conversion between booleans in which one type isn't a mask type. - vectorizable_operation This function definitely needs a consistency check, e.g. to handle & and | in which one operand is loaded from memory and the other is a comparison result. Ideally we'd handle this via pattern stmts instead (like we do for the all-mask case), but that's future work. - vectorizable_assignment VECT_SCALAR_BOOLEAN_TYPE_P requires single-bit precision, so the current code already rejects problematic cases. - vectorizable_load Loads always produce nonmask types and there are no relevant inputs to check against. - vectorizable_store vect_check_store_rhs already rejects mask/nonmask combinations via useless_type_conversion_p. - vectorizable_reduction - vectorizable_lc_phi PHIs always have nonmask types. After the change above, attempts to combine the PHI result with a mask type would be rejected by vectorizable_operation. (Again, it would be better to handle this using pattern stmts.) - vectorizable_induction We don't generate inductions for booleans. - vectorizable_shift The function already rejects boolean shifts via type_has_mode_precision_p. - vectorizable_condition The function already rejects mismatches via useless_type_conversion_p. - vectorizable_comparison The function already rejects comparisons between mask and nonmask types. The result is always a mask type. 2019-11-29 Richard Sandiford <richard.sandiford@arm.com> gcc/ PR tree-optimization/92596 * tree-vect-stmts.c (vectorizable_call): Punt on hybrid mask/nonmask operations. (vectorizable_operation): Likewise, instead of relying on vect_get_mask_type_for_stmt to do this. (vect_get_vector_types_for_stmt): Always return a vector type immediately, rather than deferring the choice for boolean results. Use a vector mask type instead of a normal vector if vect_use_mask_type_p. (vect_get_mask_type_for_stmt): Delete. * tree-vect-loop.c (vect_determine_vf_for_stmt_1): Remove mask_producers argument and special boolean_type_node handling. (vect_determine_vf_for_stmt): Remove mask_producers argument and update calls to vect_determine_vf_for_stmt_1. Remove doubled call. (vect_determine_vectorization_factor): Update call accordingly. * tree-vect-slp.c (vect_build_slp_tree_1): Remove special boolean_type_node handling. (vect_slp_analyze_node_operations_1): Likewise. gcc/testsuite/ PR tree-optimization/92596 * gcc.dg/vect/bb-slp-pr92596.c: New test. * gcc.dg/vect/bb-slp-43.c: Likewise. From-SVN: r278851
2019-11-22Move EXTRACT_LAST_REDUCTION costing to vectorizable_conditionRichard Sandiford1-2/+5
gcc.target/aarch64/sve/clastb_[57].c started failing after the increase in the cost of vec_to_scalar (r278452). The problem is that we were double-counting the cost of the CLASTB: once in vect_model_reduction_cost as a vec_to_scalar and once in vectorizable_condition as a plain vector_stmt. Based on the TODO above vect_model_reduction_cost, I think the preferred long-term direction is for vectorizable_* to cost these things itself, so that's what the patch does (for this one case only). 2019-11-22 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vect-stmts.c (vect_model_simple_cost): Take an optional vect_cost_for_stmt. (vectorizable_condition): Calculate the cost of EXTRACT_LAST_REDUCTION here rather than... * tree-vect-loop.c (vect_model_reduction_cost): ...here. From-SVN: r278611
2019-11-19re PR tree-optimization/92581 (condition chains vectorized wrongly)Richard Biener1-26/+32
2019-11-19 Richard Biener <rguenther@suse.de> PR tree-optimization/92581 * tree-vect-loop.c (vect_create_epilog_for_reduction): For condition reduction chains gather all conditions involved for computing the index reduction vector. * gcc.dg/vect/vect-cond-reduc-5.c: New testcase. From-SVN: r278445
2019-11-19re PR tree-optimization/92554 (ICE in vect_create_epilog_for_reduction, at ↵Richard Biener1-11/+21
tree-vect-loop.c:4325) 2019-11-19 Richard Biener <rguenther@suse.de> PR tree-optimization/92554 * tree-vect-loop.c (vect_create_epilog_for_reduction): Look for the actual condition stmt and deal with sign-changes. * gcc.dg/vect/pr92554.c: New testcase. From-SVN: r278431
2019-11-19re PR tree-optimization/92555 (ICE in exact_div, at poly-int.h:2162)Richard Biener1-0/+12
2019-09-19 Richard Biener <rguenther@suse.de> PR tree-optimization/92555 * tree-vect-loop.c (vect_update_vf_for_slp): Also scan PHIs for non-SLP stmts. * gcc.dg/vect/pr92555.c: New testcase. From-SVN: r278430
2019-11-18re PR tree-optimization/92558 (Miscompare of 554.roms_r with -Ofast ↵Richard Biener1-0/+1
-march=znver2 -flto since r278289) 2019-11-18 Richard Biener <rguenther@suse.de> PR tree-optimization/92558 * tree-vect-loop.c (vect_create_epilog_for_reduction): When reducting the width of a reduction vector def update new_phis. * gcc.dg/vect/pr92558.c: New testcase. From-SVN: r278400
2019-11-16Optionally pick the cheapest loop_vec_infoRichard Sandiford1-7/+141
This patch adds a mode in which the vectoriser tries each available base vector mode and picks the one with the lowest cost. The new behaviour is selected by autovectorize_vector_modes. The patch keeps the current behaviour of preferring a VF of loop->simdlen over any larger or smaller VF, regardless of costs or target preferences. 2019-11-16 Richard Sandiford <richard.sandiford@arm.com> gcc/ * target.h (VECT_COMPARE_COSTS): New constant. * target.def (autovectorize_vector_modes): Return a bitmask of flags. * doc/tm.texi: Regenerate. * targhooks.h (default_autovectorize_vector_modes): Update accordingly. * targhooks.c (default_autovectorize_vector_modes): Likewise. * config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes): Likewise. * config/arc/arc.c (arc_autovectorize_vector_modes): Likewise. * config/arm/arm.c (arm_autovectorize_vector_modes): Likewise. * config/i386/i386.c (ix86_autovectorize_vector_modes): Likewise. * config/mips/mips.c (mips_autovectorize_vector_modes): Likewise. * tree-vectorizer.h (_loop_vec_info::vec_outside_cost) (_loop_vec_info::vec_inside_cost): New member variables. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them. (vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions. (vect_analyze_loop): When autovectorize_vector_modes returns VECT_COMPARE_COSTS, try vectorizing the loop with each available vector mode and picking the one with the lowest cost. (vect_estimate_min_profitable_iters): Record the computed costs in the loop_vec_info. From-SVN: r278336
2019-11-16Extend can_duplicate_and_interleave_p to mixed-size vectorsRichard Sandiford1-2/+1
This patch makes can_duplicate_and_interleave_p cope with mixtures of vector sizes, by using queries based on get_vectype_for_scalar_type instead of directly querying GET_MODE_SIZE (vinfo->vector_mode). int_mode_for_size is now the first check we do for a candidate mode, so it seemed better to restrict it to MAX_FIXED_MODE_SIZE. This avoids unnecessary work and avoids trying to create scalar types that the target might not support. 2019-11-16 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (can_duplicate_and_interleave_p): Take an element type rather than an element mode. * tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise. Use get_vectype_for_scalar_type to query the natural types for a given element type rather than basing everything on GET_MODE_SIZE (vinfo->vector_mode). Limit int_mode_for_size query to MAX_FIXED_MODE_SIZE. (duplicate_and_interleave): Update call accordingly. * tree-vect-loop.c (vectorizable_reduction): Likewise. From-SVN: r278335
2019-11-15re PR tree-optimization/92512 (ICE in gimple_op, at gimple.h:2436)Richard Biener1-4/+17
2019-11-15 Richard Biener <rguenther@suse.de> PR tree-optimization/92512 * tree-vect-loop.c (check_reduction_path): Fix operand index computability check. Add check for second use in COND_EXPRs. * gcc.dg/torture/pr92512.c: New testcase. From-SVN: r278293
2019-11-15re PR tree-optimization/92324 (ICE in expand_direct_optab_fn, at ↵Richard Biener1-30/+38
internal-fn.c:2890) 2019-11-15 Richard Biener <rguenther@suse.de> PR tree-optimization/92324 * tree-vect-loop.c (vect_create_epilog_for_reduction): Fix singedness of SLP reduction epilouge operations. Also reduce the vector width for SLP reductions before doing elementwise operations if possible. * gcc.dg/vect/pr92324-4.c: New testcase. From-SVN: r278289
2019-11-14Avoid retrying with the same vector modesRichard Sandiford1-0/+13
A later patch makes the AArch64 port add four entries to autovectorize_vector_modes. Each entry describes a different vector mode assignment for vector code that mixes 8-bit, 16-bit, 32-bit and 64-bit elements. But if (as usual) the vector code has fewer element sizes than that, we could end up trying the same combination of vector modes multiple times. This patch adds a check to prevent that. 2019-11-14 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (vec_info::mode_set): New typedef. (vec_info::used_vector_mode): New member variable. (vect_chooses_same_modes_p): Declare. * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each chosen vector mode in vec_info::used_vector_mode. (vect_chooses_same_modes_p): New function. * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying the same vector statements multiple times. * tree-vect-slp.c (vect_slp_bb_region): Likewise. From-SVN: r278242
2019-11-14Support vectorisation with mixed vector sizesRichard Sandiford1-14/+40
After previous patches, it's now possible to make the vectoriser support multiple vector sizes in the same vector region, using related_vector_mode to pick the right vector mode for a given element mode. No port yet takes advantage of this, but I have a follow-on patch for AArch64. This patch also seemed like a good opportunity to add some more dump messages: one to make it clear which vector size/mode was being used when analysis passed or failed, and another to say when we've decided to skip a redundant vector size/mode. 2019-11-14 Richard Sandiford <richard.sandiford@arm.com> gcc/ * machmode.h (opt_machine_mode::operator==): New function. (opt_machine_mode::operator!=): Likewise. * tree-vectorizer.h (vec_info::vector_mode): Update comment. (get_related_vectype_for_scalar_type): Delete. (get_vectype_for_scalar_type_and_size): Declare. * tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say whether analysis passed or failed, and with what vector modes. Use related_vector_mode to check whether trying a particular vector mode would be redundant with the autodetected mode, and print a dump message if we decide to skip it. * tree-vect-loop.c (vect_analyze_loop): Likewise. (vect_create_epilog_for_reduction): Use get_related_vectype_for_scalar_type instead of get_vectype_for_scalar_type_and_size. * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace with... (get_related_vectype_for_scalar_type): ...this new function. Take a starting/"prevailing" vector mode rather than a vector size. Take an optional nunits argument, with the same meaning as for related_vector_mode. Use related_vector_mode when not auto-detecting a mode, falling back to mode_for_vector if no target mode exists. (get_vectype_for_scalar_type): Update accordingly. (get_same_sized_vectype): Likewise. * tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise. From-SVN: r278240
2019-11-14Replace vec_info::vector_size with vec_info::vector_modeRichard Sandiford1-19/+13
This patch replaces vec_info::vector_size with vec_info::vector_mode, but for now continues to use it as a way of specifying a single vector size. This makes it easier for later patches to use related_vector_mode instead. 2019-11-14 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (vec_info::vector_size): Replace with... (vec_info::vector_mode): ...this new field. * tree-vect-loop.c (vect_update_vf_for_slp): Update accordingly. (vect_analyze_loop, vect_transform_loop): Likewise. * tree-vect-loop-manip.c (vect_do_peeling): Likewise. * tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise. (vect_make_slp_decision, vect_slp_bb_region): Likewise. * tree-vect-stmts.c (get_vectype_for_scalar_type): Likewise. * tree-vectorizer.c (try_vectorize_loop_1): Likewise. gcc/testsuite/ * gcc.dg/vect/vect-tail-nomask-1.c: Update expected epilogue vectorization message. From-SVN: r278237
2019-11-14Replace autovectorize_vector_sizes with autovectorize_vector_modesRichard Sandiford1-18/+15
This is another patch in the series to remove the assumption that all modes involved in vectorisation have to be the same size. Rather than have the target provide a list of vector sizes, it makes the target provide a list of vector "approaches", with each approach represented by a mode. A later patch will pass this mode to targetm.vectorize.related_mode to get the vector mode for a given element mode. Until then, the modes simply act as an alternative way of specifying the vector size. 2019-11-14 Richard Sandiford <richard.sandiford@arm.com> gcc/ * target.h (vector_sizes, auto_vector_sizes): Delete. (vector_modes, auto_vector_modes): New typedefs. * target.def (autovectorize_vector_sizes): Replace with... (autovectorize_vector_modes): ...this new hook. * doc/tm.texi.in (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Replace with... (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): ...this new hook. * doc/tm.texi: Regenerate. * targhooks.h (default_autovectorize_vector_sizes): Delete. (default_autovectorize_vector_modes): New function. * targhooks.c (default_autovectorize_vector_sizes): Delete. (default_autovectorize_vector_modes): New function. * omp-general.c (omp_max_vf): Use autovectorize_vector_modes instead of autovectorize_vector_sizes. Use the number of units in the mode to calculate the maximum VF. * omp-low.c (omp_clause_aligned_alignment): Use autovectorize_vector_modes instead of autovectorize_vector_sizes. Use a loop based on related_mode to iterate through all supported vector modes for a given scalar mode. * optabs-query.c (can_vec_mask_load_store_p): Use autovectorize_vector_modes instead of autovectorize_vector_sizes. * tree-vect-loop.c (vect_analyze_loop, vect_transform_loop): Likewise. * tree-vect-slp.c (vect_slp_bb_region): Likewise. * config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes): Replace with... (aarch64_autovectorize_vector_modes): ...this new function. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define. * config/arc/arc.c (arc_autovectorize_vector_sizes): Replace with... (arc_autovectorize_vector_modes): ...this new function. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define. * config/arm/arm.c (arm_autovectorize_vector_sizes): Replace with... (arm_autovectorize_vector_modes): ...this new function. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define. * config/i386/i386.c (ix86_autovectorize_vector_sizes): Replace with... (ix86_autovectorize_vector_modes): ...this new function. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define. * config/mips/mips.c (mips_autovectorize_vector_sizes): Replace with... (mips_autovectorize_vector_modes): ...this new function. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define. From-SVN: r278236
2019-11-14Remove build_{same_sized_,}truth_vector_typeRichard Sandiford1-5/+4
build_same_sized_truth_vector_type was confusingly named, since for SVE and AVX512 the returned vector isn't the same byte size (although it does have the same number of elements). What it really returns is the "truth" vector type for a given data vector type. The more general truth_type_for provides the same thing when passed a vector and IMO has a more descriptive name, so this patch replaces all uses of build_same_sized_truth_vector_type with that. It does the same for a call to build_truth_vector_type, leaving truth_type_for itself as the only remaining caller. It's then more natural to pass build_truth_vector_type the original vector type rather than its size and nunits, especially since the given size isn't the size of the returned vector. This in turn allows a future patch to simplify the interface of get_mask_mode. Doing this also fixes a bug in which truth_type_for would pass a size of zero for BLKmode vector types. 2019-11-14 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree.h (build_truth_vector_type): Delete. (build_same_sized_truth_vector_type): Likewise. * tree.c (build_truth_vector_type): Rename to... (build_truth_vector_type_for): ...this. Make static and take a vector type as argument. (truth_type_for): Update accordingly. (build_same_sized_truth_vector_type): Delete. * tree-vect-generic.c (expand_vector_divmod): Use truth_type_for instead of build_same_sized_truth_vector_type. * tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise. (vect_record_loop_mask, vect_get_loop_mask): Likewise. * tree-vect-patterns.c (build_mask_conversion): Likeise. * tree-vect-slp.c (vect_get_constant_vectors): Likewise. * tree-vect-stmts.c (vect_get_vec_def_for_operand): Likewise. (vect_build_gather_load_calls, vectorizable_call): Likewise. (scan_store_can_perm_p, vectorizable_scan_store): Likewise. (vectorizable_store, vectorizable_condition): Likewise. (get_mask_type_for_scalar_type, get_same_sized_vectype): Likewise. (vect_get_mask_type_for_stmt): Use truth_type_for instead of build_truth_vector_type. * config/aarch64/aarch64-sve-builtins.cc (gimple_folder::convert_pred): Use truth_type_for instead of build_same_sized_truth_vector_type. * config/rs6000/rs6000-call.c (fold_build_vec_cmp): Likewise. gcc/c/ * c-typeck.c (build_conditional_expr): Use truth_type_for instead of build_same_sized_truth_vector_type. (build_vec_cmp): Likewise. gcc/cp/ * call.c (build_conditional_expr_1): Use truth_type_for instead of build_same_sized_truth_vector_type. * typeck.c (build_vec_cmp): Likewise. gcc/d/ * d-codegen.cc (build_boolop): Use truth_type_for instead of build_same_sized_truth_vector_type. From-SVN: r278232
2019-11-14Add build_truth_vector_type_for_modeRichard Sandiford1-8/+10
Callers of vect_halve_mask_nunits and vect_double_mask_nunits already know what mode the resulting vector type should have, so we might as well create the vector type directly with that mode, just like build_vector_type_for_mode lets us build normal vectors with a known mode. This avoids the current awkwardness of having to recompute the mode starting from vec_info::vector_size, which hard-codes the assumption that all vectors have to be the same size. A later patch gets rid of build_truth_vector_type and build_same_sized_truth_vector_type, so the net effect of the series is to reduce the number of type functions by one. 2019-11-14 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree.h (build_truth_vector_type_for_mode): Declare. * tree.c (build_truth_vector_type_for_mode): New function, split out from... (build_truth_vector_type): ...here. (build_opaque_vector_type): Fix head comment. * tree-vectorizer.h (supportable_narrowing_operation): Remove vec_info parameter. (vect_halve_mask_nunits): Replace vec_info parameter with the mode of the new vector. (vect_double_mask_nunits): Likewise. * tree-vect-loop.c (vect_halve_mask_nunits): Likewise. (vect_double_mask_nunits): Likewise. * tree-vect-loop-manip.c: Include insn-config.h, rtl.h and recog.h. (vect_maybe_permute_loop_masks): Remove vinfo parameter. Update call to vect_halve_mask_nunits, getting the required mode from the unpack patterns. (vect_set_loop_condition_masked): Update call accordingly. * tree-vect-stmts.c (supportable_narrowing_operation): Remove vec_info parameter and update call to vect_double_mask_nunits. (vectorizable_conversion): Update call accordingly. (simple_integer_narrowing): Likewise. Remove vec_info parameter. (vectorizable_call): Update call accordingly. (supportable_widening_operation): Update call to vect_halve_mask_nunits. * config/aarch64/aarch64-sve-builtins.cc (register_builtin_types): Use build_truth_vector_type_mode instead of build_truth_vector_type. From-SVN: r278231
2019-11-13Account for the cost of generating loop masksRichard Sandiford1-0/+26
We didn't take the cost of generating loop masks into account, and so tended to underestimate the cost of loops that need multiple masks. 2019-11-13 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vect-loop.c (vect_estimate_min_profitable_iters): Include the cost of generating loop masks. gcc/testsuite/ * gcc.target/aarch64/sve/mask_struct_store_3.c: Add -fno-vect-cost-model. * gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise. * gcc.target/aarch64/sve/peel_ind_2.c: Likewise. * gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise. * gcc.target/aarch64/sve/peel_ind_3.c: Likewise. * gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise. From-SVN: r278125
2019-11-13Avoid accounting for non-existent vector loop versioningRichard Sandiford1-9/+25
vect_analyze_loop_costing uses two profitability thresholds: a runtime one and a static compile-time one. The runtime one is simply the point at which the vector loop is cheaper than the scalar loop, while the static one also takes into account the cost of choosing between the scalar and vector loops at runtime. We compare this static cost against the expected execution frequency to decide whether it's worth generating any vector code at all. However, we never reclaimed the cost of applying the runtime threshold if it turned out that the vector code can always be used. And we only know whether that's true once we've calculated what the runtime threshold would be. 2019-11-13 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (vect_apply_runtime_profitability_check_p): New function. * tree-vect-loop-manip.c (vect_loop_versioning): Use it. * tree-vect-loop.c (vect_analyze_loop_2): Likewise. (vect_transform_loop): Likewise. (vect_analyze_loop_costing): Don't take the cost of versioning into account for the static profitability threshold if it turns out that no versioning is needed. From-SVN: r278124
2019-11-13Don't assign a cost to vectorizable_assignmentRichard Sandiford1-1/+3
vectorizable_assignment handles true SSA-to-SSA copies (which hopefully we don't see in practice) and no-op conversions that are required to maintain correct gimple, such as changes between signed and unsigned types. These cases shouldn't generate any code and so shouldn't count against either the scalar or vector costs. Later patches test this, but it seemed worth splitting out. 2019-11-13 Richard Sandiford <richard.sandiford@arm.com> gcc/ * tree-vectorizer.h (vect_nop_conversion_p): Declare. * tree-vect-stmts.c (vect_nop_conversion_p): New function. (vectorizable_assignment): Don't add a cost for nop conversions. * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Likewise. * tree-vect-slp.c (vect_bb_slp_scalar_cost): Likewise. From-SVN: r278122
2019-11-13re PR target/92473 (test pr92324-2.c fails on arm and aarch64)Richard Biener1-24/+6
2019-11-13 Richard Biener <rguenther@suse.de> PR tree-optimization/92473 * tree-vect-loop.c (vect_create_epilog_for_reduction): Perform direct optab reduction in the correct type. From-SVN: r278113