path: root/gcc/tree-vect-stmts.c
Age | Commit message | Author | Files | Lines
2017-01-09 | re PR tree-optimization/78938 (ICE in expand_vec_cond_expr, at optabs.c:5636 ↵ | Jakub Jelinek | 1 | -19/+112
w/ -mavx512bw -ftree-loop-vectorize -O1) PR tree-optimization/78938 * tree-vect-stmts.c (vectorizable_condition): For non-masked COND_EXPR where comp_vectype is VECTOR_BOOLEAN_TYPE_P, use BIT_{NOT,XOR,AND,IOR}_EXPR on the comparison operands instead of {EQ,NE,GE,GT,LE,LT}_EXPR directly inside of VEC_COND_EXPR. Formatting fixes. * gcc.dg/vect/pr78938.c: New test. From-SVN: r244223
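The rewrite described in this entry rests on the fact that, when every lane holds only 0 or 1, each comparison has a bitwise equivalent. The following is a minimal standalone sketch of those identities on an 8-lane bitmask, not the vectorizer code itself. The `mask8` type and the `mask_*` function names are hypothetical, and lanes are assumed to be unsigned 0/1 values.

```c
#include <stdint.h>

/* Hypothetical 8-lane boolean mask: bit i holds lane i (0 = false, 1 = true).  */
typedef uint8_t mask8;

/* a == b per lane: the lanes agree exactly where XOR yields 0.  */
static mask8 mask_eq (mask8 a, mask8 b) { return (mask8) ~(a ^ b); }

/* a != b per lane.  */
static mask8 mask_ne (mask8 a, mask8 b) { return (mask8) (a ^ b); }

/* a > b per lane: true only for a = 1, b = 0.  */
static mask8 mask_gt (mask8 a, mask8 b) { return (mask8) (a & ~b); }

/* a >= b per lane: false only for a = 0, b = 1.  */
static mask8 mask_ge (mask8 a, mask8 b) { return (mask8) (a | ~b); }

/* a < b and a <= b are the same operations with the operands swapped.  */
static mask8 mask_lt (mask8 a, mask8 b) { return mask_gt (b, a); }
static mask8 mask_le (mask8 a, mask8 b) { return mask_ge (b, a); }
```

Each function uses only NOT, XOR, AND and IOR, which corresponds to the BIT_{NOT,XOR,AND,IOR}_EXPR codes named in the ChangeLog entry.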
2017-01-01 | Update copyright years. | Jakub Jelinek | 1 | -1/+1
From-SVN: r243994
2016-11-09 | re PR target/78007 (Important loop from 482.sphinx3 is not vectorized) | Richard Biener | 1 | -0/+116
2016-11-09 Richard Biener <rguenther@suse.de> PR tree-optimization/78007 * tree-vect-stmts.c (vectorizable_bswap): New function. (vectorizable_call): Call vectorizable_bswap for BUILT_IN_BSWAP{16,32,64} if arguments are not promoted. * gcc.dg/vect/vect-bswap32.c: Adjust. * gcc.dg/vect/vect-bswap64.c: Likewise. From-SVN: r241992
2016-11-08 | tree-vect-stmts.c (get_group_load_store_type): If the access is aligned do ↵ | Richard Biener | 1 | -0/+9
not trigger peeling for gaps. 2016-11-08 Richard Biener <rguenther@suse.de> * tree-vect-stmts.c (get_group_load_store_type): If the access is aligned do not trigger peeling for gaps. * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Do not force alignment of vars with DECL_USER_ALIGN. * gcc.dg/vect/vect-nb-iter-ub-2.c: Adjust. From-SVN: r241959
2016-11-08 | re PR tree-optimization/78205 (BB vectorization confused by too large load ↵ | Richard Biener | 1 | -12/+0
groups) 2016-11-08 Richard Biener <rguenther@suse.de> PR tree-optimization/78205 * tree-vect-stmts.c (vectorizable_load): Move check whether we may run into gaps when BB vectorizing SLP permutations ... * tree-vect-slp.c (vect_supported_load_permutation_p): ... here where we can do a more precise check. * gcc.dg/vect/bb-slp-pr78205.c: New testcase. From-SVN: r241956
2016-11-07 | re PR middle-end/37150 (basic-block vectorization misses some unrolled loops) | Richard Biener | 1 | -3/+8
2016-11-07 Richard Biener <rguenther@suse.de> PR tree-optimization/37150 * tree-vectorizer.h (vect_transform_slp_perm_load): Add n_perms parameter. * tree-vect-slp.c (vect_supported_load_permutation_p): Adjust. (vect_analyze_slp_cost_1): Account for the real number of permutations emitted and for dead loads. (vect_transform_slp_perm_load): Add n_perms parameter counting the number of emitted permutations. * tree-vect-stmts.c (vectorizable_load): Adjust. From-SVN: r241893
2016-10-25 | re PR target/78102 (GCC refuses to generate PCMPEQQ instruction for SSE4.1) | Jakub Jelinek | 1 | -2/+3
PR target/78102 * optabs.def (vcondeq_optab, vec_cmpeq_optab): New optabs. * optabs.c (expand_vec_cond_expr): For comparison codes EQ_EXPR and NE_EXPR, attempt vcondeq_optab as fallback. (expand_vec_cmp_expr): For comparison codes EQ_EXPR and NE_EXPR, attempt vec_cmpeq_optab as fallback. * optabs-tree.h (expand_vec_cmp_expr_p, expand_vec_cond_expr_p): Add enum tree_code argument. * optabs-query.h (get_vec_cmp_eq_icode, get_vcond_eq_icode): New inline functions. * optabs-tree.c (expand_vec_cmp_expr_p): Add CODE argument. For CODE EQ_EXPR or NE_EXPR, attempt to use vec_cmpeq_optab as fallback. (expand_vec_cond_expr_p): Add CODE argument. For CODE EQ_EXPR or NE_EXPR, attempt to use vcondeq_optab as fallback. * tree-vect-generic.c (expand_vector_comparison, expand_vector_divmod, expand_vector_condition): Adjust expand_vec_cmp_expr_p and expand_vec_cond_expr_p callers. * tree-vect-stmts.c (vectorizable_condition, vectorizable_comparison): Likewise. * tree-vect-patterns.c (vect_recog_mixed_size_cond_pattern, check_bool_pattern, search_type_for_mask_1): Likewise. * expr.c (do_store_flag): Likewise. * doc/md.texi (@code{vec_cmpeq@var{m}@var{n}}, @code{vcondeq@var{m}@var{n}}): Document. * config/i386/sse.md (vec_cmpeqv2div2di, vcondeq<VI8F_128:mode>v2di): New expanders. testsuite/ * gcc.target/i386/pr78102.c: New test. From-SVN: r241525
2016-09-29 | tree-vect-stmts.c (vectorizable_load): Avoid emitting vector constructors ↵ | Richard Biener | 1 | -5/+36
with vector elements. 2016-09-29 Richard Biener <rguenther@suse.de> * tree-vect-stmts.c (vectorizable_load): Avoid emitting vector constructors with vector elements. From-SVN: r240611
2016-09-26 | ipa-inline-analysis.c (find_foldable_builtin_expect): Use ↵ | Marek Polacek | 1 | -6/+2
gimple_call_internal_p. * ipa-inline-analysis.c (find_foldable_builtin_expect): Use gimple_call_internal_p. * ipa-split.c (find_return_bb): Likewise. (execute_split_functions): Likewise. * omp-low.c (dump_oacc_loop_part): Likewise. (oacc_loop_xform_head_tail): Likewise. * predict.c (predict_loops): Likewise. * sanopt.c (pass_sanopt::execute): Likewise. * tree-cfg.c (get_abnormal_succ_dispatcher): Likewise. * tree-parloops.c (oacc_entry_exit_ok_1): Likewise. * tree-stdarg.c (gimple_call_ifn_va_arg_p): Remove function. (expand_ifn_va_arg_1): Use gimple_call_internal_p. (expand_ifn_va_arg): Likewise. * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise. (optimize_mask_stores): Likewise. * tree-vect-stmts.c (vect_simd_lane_linear): Likewise. (vect_transform_stmt): Likewise. * tree-vectorizer.c (vect_loop_vectorized_call): Likewise. * tsan.c (instrument_memory_accesses): Likewise. From-SVN: r240498
2016-09-21 | re PR tree-optimization/77550 (std::deque with -O3 has infinite std::distance) | Bernd Edlinger | 1 | -29/+67
gcc: 2016-09-21 Bernd Edlinger <bernd.edlinger@hotmail.de> PR tree-optimization/77550 * tree-vect-stmts.c (create_array_ref): Change parameters. (get_group_alias_ptr_type): New function. (vectorizable_store, vectorizable_load): Use get_group_alias_ptr_type. testsuite: 2016-09-21 Bernd Edlinger <bernd.edlinger@hotmail.de> PR tree-optimization/77550 * g++.dg/pr77550.C: New test. From-SVN: r240313
2016-09-16 | Add inline functions for various bitwise operations. | Jason Merrill | 1 | -2/+2
* hwint.h (least_bit_hwi, pow2_or_zerop, pow2p_hwi, ctz_or_zero): New. * hwint.c (exact_log2): Use pow2p_hwi. (ctz_hwi, ffs_hwi): Use least_bit_hwi. * alias.c (memrefs_conflict_p): Use pow2_or_zerop. * builtins.c (get_object_alignment_2, get_object_alignment) (get_pointer_alignment, fold_builtin_atomic_always_lock_free): Use least_bit_hwi. * calls.c (compute_argument_addresses, store_one_arg): Use least_bit_hwi. * cfgexpand.c (expand_one_stack_var_at): Use least_bit_hwi. * combine.c (force_to_mode): Use least_bit_hwi. * emit-rtl.c (set_mem_attributes_minus_bitpos, adjust_address_1): Use least_bit_hwi. * expmed.c (synth_mult, expand_divmod): Use ctz_or_zero, ctz_hwi. (init_expmed_one_conv): Use pow2p_hwi. * fold-const.c (round_up_loc, round_down_loc): Use pow2_or_zerop. (fold_binary_loc): Use pow2p_hwi. * function.c (assign_parm_find_stack_rtl): Use least_bit_hwi. * gimple-fold.c (gimple_fold_builtin_memory_op): Use pow2p_hwi. * gimple-ssa-strength-reduction.c (replace_ref): Use least_bit_hwi. * hsa-gen.c (gen_hsa_addr_with_align, hsa_bitmemref_alignment): Use least_bit_hwi. * ipa-cp.c (ipcp_alignment_lattice::meet_with_1): Use least_bit_hwi. * ipa-prop.c (ipa_modify_call_arguments): Use least_bit_hwi. * omp-low.c (oacc_loop_fixed_partitions) (oacc_loop_auto_partitions): Use least_bit_hwi. * rtlanal.c (nonzero_bits1): Use ctz_or_zero. * stor-layout.c (place_field): Use least_bit_hwi. * tree-pretty-print.c (dump_generic_node): Use pow2p_hwi. * tree-sra.c (build_ref_for_offset): Use least_bit_hwi. * tree-ssa-ccp.c (ccp_finalize): Use least_bit_hwi. * tree-ssa-math-opts.c (bswap_replace): Use least_bit_hwi. * tree-ssa-strlen.c (handle_builtin_memcmp): Use pow2p_hwi. * tree-vect-data-refs.c (vect_analyze_group_access_1) (vect_grouped_store_supported, vect_grouped_load_supported) (vect_permute_load_chain, vect_shift_permute_load_chain) (vect_transform_grouped_load): Use pow2p_hwi. * tree-vect-generic.c (expand_vector_divmod): Use ctz_or_zero. 
* tree-vect-patterns.c (vect_recog_divmod_pattern): Use ctz_or_zero. * tree-vect-stmts.c (vectorizable_mask_load_store): Use least_bit_hwi. * tsan.c (instrument_expr): Use least_bit_hwi. * var-tracking.c (negative_power_of_two_p): Use pow2_or_zerop. From-SVN: r240194
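As an illustration of what helpers of this kind compute, here is a minimal standalone sketch. It is an assumption-laden reimplementation, not the contents of hwint.h: it uses `uint64_t` where GCC uses its own HOST_WIDE_INT type, the `sketch_` names are hypothetical stand-ins for `least_bit_hwi`, `pow2_or_zerop` and `pow2p_hwi`, and the real definitions may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Isolate the lowest set bit of X; yields 0 when X is 0.
   Unsigned negation is well defined, so X & -X is safe here.  */
static inline uint64_t sketch_least_bit (uint64_t x)
{
  return x & -x;
}

/* True if X is zero or a power of two: X then equals its own lowest bit.  */
static inline bool sketch_pow2_or_zerop (uint64_t x)
{
  return sketch_least_bit (x) == x;
}

/* True if X is a power of two, with zero excluded.  */
static inline bool sketch_pow2p (uint64_t x)
{
  return x != 0 && sketch_pow2_or_zerop (x);
}
```

Giving the idiom a name makes call sites such as alignment computations state their intent, instead of repeating the `x & -x` expression inline.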
2016-09-15 | re PR tree-optimization/77503 (ICE in vect_transform_stmt compiling postgresql) | Bin Cheng | 1 | -0/+1
PR tree-optimization/77503 * tree-vect-loop.c (vectorizable_reduction): Record reduction code for CONST_COND_REDUCTION at analysis stage and use it at transform stage. * tree-vectorizer.h (struct _stmt_vec_info): New field. (STMT_VINFO_VEC_CONST_COND_REDUC_CODE): New macro. * tree-vect-stmts.c (new_stmt_vec_info): Initialize above new field. gcc/testsuite * gcc.dg/vect/pr77503.c: New test. From-SVN: r240166
2016-07-13 | remove unnecessary calls to vec::release | Trevor Saunders | 1 | -3/+0
There's no point in calling release () on an auto_vec just before it goes out of scope. gcc/ChangeLog: 2016-07-12 Trevor Saunders <tbsaunde+gcc@tbsaunde.org> * tree-data-ref.c (find_data_references_in_stmt): Remove unnecessary call to vec::release. (graphite_find_data_references_in_stmt): Likewise. * tree-ssa-alias.c (nonoverlapping_component_refs_of_decl_p): Likewise. * tree-vect-stmts.c (vectorizable_condition): Likewise. From-SVN: r238286
2016-07-13 | use auto_vec for more local variables | Trevor Saunders | 1 | -6/+2
gcc/c/ChangeLog: 2016-07-12 Trevor Saunders <tbsaunde+gcc@tbsaunde.org> * c-parser.c (c_parser_generic_selection): Make type of variable auto_vec. (c_parser_omp_declare_simd): Likewise. gcc/ChangeLog: 2016-07-12 Trevor Saunders <tbsaunde+gcc@tbsaunde.org> * cfgexpand.c (expand_used_vars): Make the type of a local variable auto_vec. * genmatch.c (lower_for): Likewise. * haifa-sched.c (haifa_sched_init): Likewise. (add_to_speculative_block): Likewise. (create_check_block_twin): Likewise. * predict.c (handle_missing_profiles): Likewise. * tree-data-ref.c (loop_nest_has_data_refs): Likewise. * tree-diagnostic.c (maybe_unwind_expanded_macro_loc): Likewise. * tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk): Likewise. (maybe_lower_iteration_bound): Likewise. * tree-ssa-sccvn.c (DFS): Likewise. * tree-stdarg.c (reachable_at_most_once): Likewise. * tree-vect-stmts.c (vectorizable_conversion): Likewise. (vectorizable_store): Likewise. From-SVN: r238285
2016-07-11 | re PR tree-optimization/71823 (g++ segfaults with -mfma and ↵ | Jakub Jelinek | 1 | -5/+2
-ftree-slp-vectorize) PR tree-optimization/71823 * tree-vect-stmts.c (vectorizable_operation): Use vect_get_vec_defs to get vec_oprnds2 from op2. * gcc.dg/vect/pr71823.c: New test. From-SVN: r238229
2016-07-06 | [7/7] Add negative and zero strides to vect_memory_access_type | Richard Sandiford | 1 | -121/+113
This patch uses the vect_memory_access_type from patch 6 to represent the effect of a negative contiguous stride or a zero stride. The latter is valid only for loads. Tested on aarch64-linux-gnu and x86_64-linux-gnu. gcc/ * tree-vectorizer.h (vect_memory_access_type): Add VMAT_INVARIANT, VMAT_CONTIGUOUS_DOWN and VMAT_CONTIGUOUS_REVERSED. * tree-vect-stmts.c (compare_step_with_zero): New function. (perm_mask_for_reverse): Move further up file. (get_group_load_store_type): Stick to VMAT_ELEMENTWISE if the step is negative. (get_negative_load_store_type): New function. (get_load_store_type): Call it. Add an ncopies argument. (vectorizable_mask_load_store): Update call accordingly and remove tests for negative steps. (vectorizable_store, vectorizable_load): Likewise. Handle new memory_access_types. From-SVN: r238039
2016-07-06 | [6/7] Explicitly classify vector loads and stores | Richard Sandiford | 1 | -192/+318
This is the main patch in the series. It adds a new enum and routines for classifying a vector load or store implementation.

Originally there were three motivations:

(1) Reduce cut-&-paste

(2) Make the chosen vectorisation strategy more obvious. At the moment this is derived implicitly from various other bits of state (GROUPED, STRIDED, SLP, etc.)

(3) Decouple the vectorisation strategy from those other bits of state, so that there can be a choice of implementation for a given scalar statement. The specific problem here is that we class:

    for (...)
      {
        ... = a[i * x];
        ... = a[i * x + 1];
      }

as "strided and grouped" but:

    for (...)
      {
        ... = a[i * 7];
        ... = a[i * 7 + 1];
      }

as "non-strided and grouped". Before the patch, "strided and grouped" loads would always try to use separate scalar loads while "non-strided and grouped" loads would always try to use load-and-permute. But load-and-permute is never supported for a group size of 7, so the effect was that the first loop was vectorisable and the second wasn't. It seemed odd that not knowing x (but accepting it could be 7) would allow more optimisation opportunities than knowing x is 7.

Unfortunately, it looks like we underestimate the cost of separate scalar accesses on at least aarch64, so I've disabled (3) for now; see the "if" statement at the end of get_load_store_type. I think the patch still does (1) and (2), so that's the justification for it in its current form. It also means that (3) is now simply a case of removing the FIXME code, once the cost model problems have been sorted out. (I did wonder about adding a --param, but that seems overkill. I hope to get back to this during GCC 7 stage 1.)

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

gcc/
* tree-vectorizer.h (vect_memory_access_type): New enum.
(_stmt_vec_info): Add a memory_access_type field.
(STMT_VINFO_MEMORY_ACCESS_TYPE): New macro.
(vect_model_store_cost): Take an access type instead of a boolean.
(vect_model_load_cost): Likewise.
* tree-vect-slp.c (vect_analyze_slp_cost_1): Update calls to vect_model_store_cost and vect_model_load_cost.
* tree-vect-stmts.c (vec_load_store_type): New enum.
(vect_model_store_cost): Take an access type instead of a store_lanes_p boolean. Simplify tests.
(vect_model_load_cost): Likewise, but for load_lanes_p.
(get_group_load_store_type, get_load_store_type): New functions.
(vectorizable_store): Use get_load_store_type. Record the access type in STMT_VINFO_MEMORY_ACCESS_TYPE.
(vectorizable_load): Likewise.
(vectorizable_mask_load_store): Likewise. Replace is_store variable with vls_type.

From-SVN: r238038
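The two loop shapes contrasted in this message can be written out as compilable C. This is only a sketch of the source-level pattern: the function names, the summing of the two loaded elements and the `out` array are hypothetical additions for illustration, and whether either loop is vectorised depends on the compiler version, target and flags.

```c
#include <stddef.h>

/* Stride X is known only at run time: the message classes the two
   loads as "strided and grouped".  */
static void sum_pairs_runtime_stride (const int *a, int *out, size_t n, size_t x)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i * x] + a[i * x + 1];
}

/* Stride is the compile-time constant 7: the message classes the two
   loads as "non-strided and grouped", with a group size of 7.  */
static void sum_pairs_stride7 (const int *a, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i * 7] + a[i * 7 + 1];
}
```

Calling the first function with `x` equal to 7 computes the same values as the second, which is why the message finds it odd that knowing less about the stride allowed more optimisation.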
2016-07-06 | [5/7] Move the fix for PR65518 | Richard Sandiford | 1 | -16/+5
This patch moves the fix for PR65518 to the code that checks whether load-and-permute operations are supported. If the group size is greater than the vectorisation factor, it would still be possible to fall back to elementwise loads (as for strided groups) rather than fail vectorisation entirely. Tested on aarch64-linux-gnu and x86_64-linux-gnu. gcc/ * tree-vectorizer.h (vect_grouped_load_supported): Add a single_element_p parameter. * tree-vect-data-refs.c (vect_grouped_load_supported): Likewise. Check the PR65518 case here rather than in vectorizable_load. * tree-vect-loop.c (vect_analyze_loop_2): Update call accordingly. * tree-vect-stmts.c (vectorizable_load): Likewise. From-SVN: r238037
2016-07-06 | [4/7] Add a gather_scatter_info structure | Richard Sandiford | 1 | -60/+51
This patch just refactors the gather/scatter support so that all information is in a single structure, rather than separate variables. This reduces the number of arguments to a function added in patch 6. Tested on aarch64-linux-gnu and x86_64-linux-gnu. gcc/ * tree-vectorizer.h (gather_scatter_info): New structure. (vect_check_gather_scatter): Return a bool rather than a decl. Replace return-by-pointer arguments with a single gather_scatter_info *. * tree-vect-data-refs.c (vect_check_gather_scatter): Likewise. (vect_analyze_data_refs): Update call accordingly. * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise. (vectorizable_mask_load_store): Likewise. Also record the offset dt and vectype in the gather_scatter_info. (vectorizable_store): Likewise. (vectorizable_load): Likewise. From-SVN: r238036
2016-07-06 | [3/7] Fix load/store costs for strided groups | Richard Sandiford | 1 | -4/+2
vect_model_store_cost had: /* Costs of the stores. */ if (STMT_VINFO_STRIDED_P (stmt_info) && !STMT_VINFO_GROUPED_ACCESS (stmt_info)) { /* N scalar stores plus extracting the elements. */ inside_cost += record_stmt_cost (body_cost_vec, ncopies * TYPE_VECTOR_SUBPARTS (vectype), scalar_store, stmt_info, 0, vect_body); But non-SLP strided groups also use individual scalar stores rather than vector stores, so I think we should skip this only for SLP groups. The same applies to vect_model_load_cost. Tested on aarch64-linux-gnu and x86_64-linux-gnu. gcc/ * tree-vect-stmts.c (vect_model_store_cost): For non-SLP strided groups, use the cost of N scalar accesses instead of ncopies vector accesses. (vect_model_load_cost): Likewise. From-SVN: r238035
2016-07-06 | [2/7] Clean up vectorizer load/store costs | Richard Sandiford | 1 | -69/+50
Add a bit more commentary and try to make the structure more obvious. The horrendous: if (grouped_access_p && represents_group_p && !store_lanes_p && !STMT_VINFO_STRIDED_P (stmt_info) && !slp_node) checks go away in patch 6. Tested on aarch64-linux-gnu and x86_64-linux-gnu. gcc/ * tree-vect-stmts.c (vect_cost_group_size): Delete. (vect_model_store_cost): Avoid calling it. Use first_stmt_p variable to indicate when once-per-group costs are being used. (vect_model_load_cost): Likewise. Fix comment and misindented code. From-SVN: r238034
2016-07-06 | [1/7] Remove unnecessary peeling for gaps check | Richard Sandiford | 1 | -4/+2
I recently relaxed the peeling-for-gaps conditions for LD3 but kept them as-is for load-and-permute. I don't think the conditions are needed for load-and-permute either though. No current load-and-permute should load outside the group, so if there is no gap at the end, the final vector element loaded will correspond to an element loaded by the original scalar loop. The patch for PR68559 (a missed optimisation PR) increased the peeled cases from "exact_log2 (groupsize) == -1" to "vf % group_size == 0", so before that fix, we didn't peel for gaps if there was no gap at the end of the group and if the group size was a power of 2. The only current non-power-of-2 load-and-permute size is 3, which doesn't require loading more than 3 vectors. The testcase is based on gcc.dg/vect/pr49038.c. Tested on aarch64-linux-gnu and x86_64-linux-gnu. gcc/ * tree-vect-stmts.c (vectorizable_load): Remove unnecessary peeling-for-gaps condition. gcc/testsuite/ * gcc.dg/vect/group-no-gaps-1.c: New test. From-SVN: r238033
2016-06-29 | re PR tree-optimization/71655 (GCC trunk ICE on westmere target) | Ilya Enkovich | 1 | -0/+2
gcc/ PR tree-optimization/71655 * tree-vect-stmts.c (vectorizable_comparison): Swap definition types when swapping operands. gcc/testsuite/ PR tree-optimization/71655 * g++.dg/pr71655.C: New test. From-SVN: r237846
2016-06-22 | re PR tree-optimization/71488 (Wrong code for vector comparisons with ↵ | Ilya Enkovich | 1 | -5/+91
ivybridge and westmere targets) gcc/ PR middle-end/71488 * tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Support comparison of boolean vectors. * tree-vect-stmts.c (vectorizable_comparison): Vectorize comparison of boolean vectors using bitwise operations. gcc/testsuite/ PR middle-end/71488 * g++.dg/pr71488.C: New test. * gcc.dg/vect/vect-bool-cmp.c: New test. From-SVN: r237706
2016-06-15 | tree-vect-stmts.c (vectorizable_store): Remove strided grouped store ↵ | Richard Biener | 1 | -25/+42
restrictions. 2016-06-15 Richard Biener <rguenther@suse.de> * tree-vect-stmts.c (vectorizable_store): Remove strided grouped store restrictions. * gcc.dg/vect/slp-45.c: New testcase. From-SVN: r237474
2016-06-08 | tree-vect-stmts.c (vectorizable_load): Remove restrictions on strided SLP ↵ | Richard Biener | 1 | -48/+46
loads and fall back to scalar loads in case... 2016-06-08 Richard Biener <rguenther@suse.de> * tree-vect-stmts.c (vectorizable_load): Remove restrictions on strided SLP loads and fall back to scalar loads in case we can't chunk them. * gcc.dg/vect/slp-43.c: New testcase. From-SVN: r237215
2016-06-03 | [3/3] No need to vectorize simple only-live stmts | Alan Hayward | 1 | -1/+2
2016-06-03 Alan Hayward <alan.hayward@arm.com> [3/3] No need to vectorize simple only-live stmts gcc/ * tree-vect-stmts.c (vect_stmt_relevant_p): Do not vectorize non live relevant stmts which are simple and invariant. * tree-vect-loop.c (vectorizable_live_operation): Check relevance instead of simple and invariant testsuite/ * gcc.dg/vect/vect-live-slp-5.c: Remove dg check. From-SVN: r237065
2016-06-03 | [2/3] Vectorize inductions that are live after the loop | Alan Hayward | 1 | -55/+93
2016-06-03 Alan Hayward <alan.hayward@arm.com> [2/3] Vectorize inductions that are live after the loop gcc/ * tree-vect-loop.c (vect_analyze_loop_operations): Allow live stmts. (vectorizable_reduction): Check for new relevant state. (vectorizable_live_operation): Vectorize live stmts using BIT_FIELD_REF. Remove special case for gimple assigns stmts. * tree-vect-stmts.c (is_simple_and_all_uses_invariant): New function. (vect_stmt_relevant_p): Check for stmts which are only used live. (process_use): Use of a stmt does not inherit its live value. (vect_mark_stmts_to_be_vectorized): Simplify relevance inheritance. (vect_analyze_stmt): Check for new relevant state. * tree-vectorizer.h (vect_relevant): New entry for a stmt which is used outside the loop, but not inside it. testsuite/ * gcc.dg/tree-ssa/pr64183.c: Ensure test does not vectorize. * testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c: Remove xfail. * gcc.dg/vect/vect-live-1.c: New test. * gcc.dg/vect/vect-live-2.c: New test. * gcc.dg/vect/vect-live-3.c: New test. * gcc.dg/vect/vect-live-4.c: New test. * gcc.dg/vect/vect-live-5.c: New test. * gcc.dg/vect/vect-live-slp-1.c: New test. * gcc.dg/vect/vect-live-slp-2.c: New test. * gcc.dg/vect/vect-live-slp-3.c: New test. From-SVN: r237064
2016-06-03 | [1/3] Split vect_get_vec_def_for_operand into two | Alan Hayward | 1 | -54/+64
2016-06-03 Alan Hayward <alan.hayward@arm.com> [1/3] Split vect_get_vec_def_for_operand into two gcc/ * tree-vectorizer.h (vect_get_vec_def_for_operand_1): New * tree-vect-stmts.c (vect_get_vec_def_for_operand_1): New (vect_get_vec_def_for_operand): Split out code. From-SVN: r237063
2016-06-03 | 2016-06-03 Alan Hayward <alan.hayward@arm.com> | Alan Hayward | 1 | -23/+0
gcc/ * tree-vect-stmts.c (vectorizable_call): Remove GOMP_SIMD_LANE code. From-SVN: r237061
2016-05-25 | re PR tree-optimization/71264 (ICE in convert_move) | Richard Biener | 1 | -4/+11
2016-05-25 Richard Biener <rguenther@suse.de> PR tree-optimization/71264 * tree-vect-stmts.c (vect_init_vector): Properly deal with vector type val. * gcc.dg/vect/pr71264.c: New testcase. From-SVN: r236699
2016-05-24 | re PR c++/71257 (OpenMP declare simd linear with ref modifier doesn't accept ↵ | Jakub Jelinek | 1 | -2/+7
references to non-integer/non-pointer) PR c++/71257 * tree-vect-stmts.c (vectorizable_simd_clone_call): Handle SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP like SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP. Add SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP and SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_CONSTANT_STEP cases explicitly. * semantics.c (finish_omp_clauses) <case OMP_CLAUSE_LINEAR>: For OMP_CLAUSE_LINEAR_REF don't require type to be integral or pointer. * g++.dg/vect/simd-clone-6.cc: New test. * g++.dg/gomp/declare-simd-6.C: New test. From-SVN: r236648
2016-05-24 | Clean up PURE_SLP_STMT handling | Richard Sandiford | 1 | -20/+21
The vectorizable_* routines had many instances of: slp_node || PURE_SLP_STMT (stmt_info) which gives the misleading impression that we can have !slp_node && PURE_SLP_STMT (stmt_info). In this context it's really enough to test slp_node on its own. There are three cases: loop vectorisation only: vectorizable_foo called only with !slp_node pure SLP: vectorizable_foo called only with slp_node hybrid SLP: (e.g. a vector that's used in SLP statements and also in a reduction) - vectorizable_foo called once with slp_node for the SLP uses. - vectorizable_foo called once with !slp_node for the non-SLP uses. Hybrid SLP isn't possible for stores, so I added an explicit assert for that. I also made vectorizable_comparison static, to make it obvious that no other callers outside tree-vect-stmts.c could use it with the !slp && PURE_SLP_STMT combination. Tested on aarch64-linux-gnu and x86_64-linux-gnu. gcc/ * tree-vectorizer.h (vectorizable_comparison): Delete. * tree-vect-loop.c (vectorizable_reduction): Remove redundant PURE_SLP_STMT check. * tree-vect-stmts.c (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise. (vectorizable_store): Likewise. Assert that we don't have hybrid SLP. (vectorizable_comparison): Make static. Remove redundant PURE_SLP_STMT check. (vect_transform_stmt): Assert that we always have an slp_node if PURE_SLP_STMT. From-SVN: r236642
2016-05-24 | Avoid unnecessary peeling for gaps with LD3 | Richard Sandiford | 1 | -13/+12
vectorizable_load forces peeling for gaps if the vectorisation factor is not a multiple of the group size, since in that case we'd normally load beyond the original scalar accesses but drop the excess elements as part of a following permute: if (loop_vinfo && ! STMT_VINFO_STRIDED_P (stmt_info) && (GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0 || (!slp && vf % GROUP_SIZE (vinfo_for_stmt (first_stmt)) != 0))) This isn't necessary for LOAD_LANES though, since it loads only the data needed and does the permute itself. Tested on aarch64-linux-gnu and x86_64-linux-gnu. gcc/ * tree-vect-stmts.c (vectorizable_load): Reorder checks so that load_lanes/grouped_load classification comes first. Don't check whether the vectorization factor is a multiple of the group size for load_lanes. gcc/testsuite/ * gcc.dg/vect/vect-load-lanes-peeling-1.c: New test. From-SVN: r236632
2016-05-24 | Fix GROUP_GAP for single-element interleaving | Richard Sandiford | 1 | -4/+2
vectorizable_load had a curious "force_peeling" variable, with no comment explaining why we need it for single-element interleaving but not for other cases. I think it's simply because we weren't initialising the GROUP_GAP correctly for single loads. Tested on aarch64-linux-gnu and x86_64-linux-gnu. gcc/ * tree-vect-data-refs.c (vect_analyze_group_access_1): Set GROUP_GAP for single-element interleaving. * tree-vect-stmts.c (vectorizable_load): Remove force_peeling variable. From-SVN: r236631
2016-05-19 | Fix memory leak in tree-vect-stmts.c | Martin Liska | 1 | -16/+5
* tree-vect-stmts.c (vectorizable_simd_clone_call): Utilize auto_vec instead of vec. From-SVN: r236472
2016-04-20 | re PR tree-optimization/70726 (Internal compiler error (ICE) on valid code) | Richard Biener | 1 | -0/+10
2016-04-20 Richard Biener <rguenther@suse.de> PR tree-optimization/70726 * tree-vect-stmts.c (vectorizable_shift): Do not use scalar shift amounts from a pattern stmt operand. * g++.dg/vect/pr70726.cc: New testcase. From-SVN: r235236
2016-04-18 | tree.h (TYPE_ALIGN, DECL_ALIGN): Return shifted amount. | Michael Matz | 1 | -1/+1
* tree.h (TYPE_ALIGN, DECL_ALIGN): Return shifted amount. (SET_TYPE_ALIGN, SET_DECL_ALIGN): New. * tree-core.h (tree_type_common.align): Use bit-field. (tree_type_common.spare): New. (tree_decl_common.off_align): Make smaller. (tree_decl_common.align): Use bit-field. * expr.c (expand_expr_addr_expr_1): Use SET_TYPE_ALIGN. * omp-low.c (install_var_field): Use SET_DECL_ALIGN. (scan_sharing_clauses): Ditto. (finish_taskreg_scan): Use SET_DECL_ALIGN and SET_TYPE_ALIGN. (omp_finish_file): Ditto. * stor-layout.c (do_type_align): Use SET_DECL_ALIGN. (layout_decl): Ditto. (relayout_decl): Ditto. (finalize_record_size): Use SET_TYPE_ALIGN. (finalize_type_size): Ditto. (finish_builtin_struct): Ditto. (layout_type): Ditto. (initialize_sizetypes): Ditto. * targhooks.c (std_gimplify_va_arg_expr): Use SET_TYPE_ALIGN. * tree-nested.c (insert_field_into_struct): Use SET_TYPE_ALIGN. (lookup_field_for_decl): Use SET_DECL_ALIGN. (get_chain_field): Ditto. (get_trampoline_type): Ditto. (get_nl_goto_field): Ditto. * tree-streamer-in.c (unpack_ts_decl_common_value_fields): Use SET_DECL_ALIGN. (unpack_ts_type_common_value_fields): Use SET_TYPE_ALIGN. * gimple-expr.c (copy_var_decl): Use SET_DECL_ALIGN. * tree.c (make_node_stat): Use SET_DECL_ALIGN and SET_TYPE_ALIGN. (build_qualified_type): Use SET_TYPE_ALIGN. (build_aligned_type, build_range_type_1): Ditto. (build_atomic_base): Ditto. (build_common_tree_nodes): Ditto. * cfgexpand.c (align_local_variable): Use SET_DECL_ALIGN. (expand_one_stack_var_at): Ditto. * coverage.c (build_var): Use SET_DECL_ALIGN. * except.c (init_eh): Ditto. * function.c (assign_parm_setup_block): Ditto. * symtab.c (increase_alignment_1): Ditto. * tree-ssa-ccp.c (fold_builtin_alloca_with_align): Ditto. * tree-vect-stmts.c (ensure_base_align): Ditto. * varasm.c (align_variable): Ditto. (assemble_variable): Ditto. (build_constant_desc): Ditto. (output_constant_def_contents): Ditto. * config/arm/arm.c (arm_relayout_function): Use SET_DECL_ALIGN. 
* config/avr/avr.c (avr_adjust_type_node): Use SET_TYPE_ALIGN. * config/mips/mips.c (mips_std_gimplify_va_arg_expr): Ditto. * config/msp430/msp430.c (msp430_gimplify_va_arg_expr): Ditto. * config/spu/spu.c (spu_build_builtin_va_list): Use SET_DECL_ALIGN. ada/ * gcc-interface/decl.c (gnat_to_gnu_entity): Use SET_TYPE_ALIGN. (gnat_to_gnu_field): Ditto. (components_to_record): Ditto. (create_variant_part_from): Ditto. (copy_and_substitute_in_size): Ditto. (substitute_in_type): Ditto. * gcc-interface/utils.c (make_aligning_type): Use SET_TYPE_ALIGN. (make_packable_type): Ditto. (maybe_pad_type): Ditto. (finish_fat_pointer_type): Ditto. (finish_record_type): Ditto and use SET_DECL_ALIGN. (rest_of_record_type_compilation): Use SET_TYPE_ALIGN. (create_field_decl): Use SET_DECL_ALIGN. c-family/ * c-common.c (handle_aligned_attribute): Use SET_TYPE_ALIGN and SET_DECL_ALIGN. c/ * c-decl.c (merge_decls): Use SET_DECL_ALIGN and SET_TYPE_ALIGN. (grokdeclarator, parser_xref_tag, finish_enum): Use SET_TYPE_ALIGN. cp/ * class.c (build_vtable): Use SET_DECL_ALIGN and SET_TYPE_ALIGN. (layout_class_type): Ditto. (build_base_field): Use SET_DECL_ALIGN. (fixup_attribute_variants): Use SET_TYPE_ALIGN. * decl.c (duplicate_decls): Use SET_DECL_ALIGN. (record_unknown_type): Use SET_TYPE_ALIGN. (cxx_init_decl_processing): Ditto. (copy_type_enum): Ditto. (grokfndecl): Use SET_DECL_ALIGN. (copy_type_enum): Use SET_TYPE_ALIGN. * pt.c (instantiate_class_template_1): Use SET_TYPE_ALIGN. (tsubst): Ditto. * tree.c (cp_build_qualified_type_real): Use SET_TYPE_ALIGN. * lambda.c (maybe_add_lambda_conv_op): Use SET_DECL_ALIGN. * method.c (implicitly_declare_fn): Use SET_DECL_ALIGN. * rtti.c (emit_tinfo_decl): Ditto. fortran/ * trans-io.c (gfc_build_io_library_fndecls): Use SET_TYPE_ALIGN. * trans-common.c (build_common_decl): Use SET_DECL_ALIGN. * trans-types.c (gfc_add_field_to_struct): Use SET_DECL_ALIGN. go/ * go-gcc.cc (Gcc_backend::implicit_variable): Use SET_DECL_ALIGN. 
java/ * class.c (add_method_1): Use SET_DECL_ALIGN. (make_class_data): Ditto. (emit_register_classes_in_jcr_section): Ditto. * typeck.c (build_java_array_type): Ditto. objc/ * objc-act.c (objc_build_struct): Use SET_DECL_ALIGN. libcc1/ * plugin.cc (plugin_finish_record_or_union): Use SET_TYPE_ALIGN. From-SVN: r235172
2016-03-24 | re PR tree-optimization/70396 (ICE on valid code at -O3 in 32-bit and 64-bit ↵ | Richard Biener | 1 | -2/+2
modes on x86_64-linux-gnu (in immed_wide_int_const, at emit-rtl.c:606)) 2016-03-24 Richard Biener <rguenther@suse.de> PR tree-optimization/70396 * tree-vect-stmts.c (vectorizable_comparison): Use get_vectype_for_scalar_type. * gcc.dg/torture/pr70396.c: New testcase. From-SVN: r234455
2016-03-18 | re PR tree-optimization/70252 (ICE in vect_get_vec_def_for_stmt_copy with ↵ | Ilya Enkovich | 1 | -4/+18
-O3 -march=skylake-avx512.) gcc/ PR tree-optimization/70252 * tree-vect-stmts.c (supportable_widening_operation): Check resulting boolean vector has a proper number of elements. (supportable_narrowing_operation): Likewise. gcc/testsuite/ PR tree-optimization/70252 * gcc.dg/pr70252.c: New test. From-SVN: r234323
2016-03-10 tree-vect-stmts.c (vectorizable_mask_load_store): Check mask has a proper number of elements. (Author: Ilya Enkovich; Files: 1; Lines: -1/+2)
gcc/ * tree-vect-stmts.c (vectorizable_mask_load_store): Check mask has a proper number of elements. From-SVN: r234104
2016-03-03 re PR target/70021 (Test miscompiled with -O3 option for -march=core-avx2.) (Author: Jakub Jelinek; Files: 1; Lines: -60/+18)
PR target/70021 * tree-vect-stmts.c (vect_mark_relevant): Remove USED_IN_PATTERN argument, if STMT_VINFO_IN_PATTERN_P (stmt_info), always mark the pattern no matter if it is used just by non-pattern, pattern or mix thereof. (process_use, vect_mark_stmts_to_be_vectorized): Adjust callers. * tree-vect-patterns.c (vect_recog_vector_vector_shift_pattern): If oprnd1 def_stmt is in pattern, don't look through it. * gcc.dg/vect/pr70021.c: New test. * gcc.target/i386/pr70021.c: New test. From-SVN: r233940
2016-03-01 re PR tree-optimization/69956 (Wrong vector type @ fold-const) (Author: Ilya Enkovich; Files: 1; Lines: -6/+27)
gcc/ PR tree-optimization/69956 * tree-vect-stmts.c (supportable_widening_operation): Support multi-step conversion of boolean vectors. (supportable_narrowing_operation): Likewise. gcc/testsuite/ PR tree-optimization/69956 * gcc.dg/pr69956.c: New test. From-SVN: r233850
2016-02-24 re PR tree-optimization/69907 (wrong code at -O3 on x86_64-linux-gnu) (Author: Richard Biener; Files: 1; Lines: -0/+13)
2016-02-24 Richard Biener <rguenther@suse.de> PR tree-optimization/69907 * tree-vect-stmts.c (vectorizable_load): Check for gaps at the end of permutations for BB vectorization. * gcc.dg/vect/bb-slp-pr69907.c: New testcase. * gcc.dg/vect/bb-slp-34.c: XFAIL. * gcc.dg/vect/bb-slp-pr68892.c: Likewise. From-SVN: r233655
2016-02-02 re PR middle-end/68542 (10% 481.wrf performance regression) (Author: Yuri Rumyantsev; Files: 1; Lines: -0/+1)
gcc/ 2016-02-02 Yuri Rumyantsev <ysrumyan@gmail.com> PR middle-end/68542 * config/i386/i386.c (ix86_expand_branch): Add support for conditional branch with vector comparison. * config/i386/sse.md (VI48_AVX): New mode iterator. (define_expand "cbranch<mode>4"): Add support for conditional branch with vector comparison. * tree-vect-loop.c (optimize_mask_stores): New function. * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize has_mask_store field of vect_info. * tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for vectorized loops having masked stores after vec_info destroy. * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and corresponding macros. (optimize_mask_stores): Add prototype. gcc/testsuite 2016-02-02 Yuri Rumyantsev <ysrumyan@gmail.com> PR middle-end/68542 * gcc.dg/vect/vect-mask-store-move-1.c: New test. * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. From-SVN: r233068
2016-01-28 tree-vect-stmts.c (vectorizable_comparison): Add NULL check for vectype. (Author: Ilya Enkovich; Files: 1; Lines: -1/+1)
gcc/ * tree-vect-stmts.c (vectorizable_comparison): Add NULL check for vectype. gcc/testsuite/ * gcc.dg/declare-simd.c: New test. From-SVN: r232917
2016-01-25 re PR target/69421 (ICE in maybe_legitimize_operand, at optabs.c:6888 with -O3) (Author: Ilya Enkovich; Files: 1; Lines: -2/+11)
gcc/ PR target/69421 * tree-vect-stmts.c (vectorizable_condition): Check vectype of operands is compatible with a statement vectype. gcc/testsuite/ PR target/69421 * gcc.dg/pr69421.c: New test. From-SVN: r232792
2016-01-23 tree-vect-stmts.c (vectorizable_condition): Build a VEC_COND_EXPR directly instead of building a temporary tree. (Author: Jakub Jelinek; Files: 1; Lines: -7/+5)
* tree-vect-stmts.c (vectorizable_condition): Build a VEC_COND_EXPR directly instead of building a temporary tree. From-SVN: r232764
2016-01-20 re PR tree-optimization/69328 (ice in vect_get_vec_def_for_operand, at tree-vect-stmts.c:1379 with -O3) (Author: Ilya Enkovich; Files: 1; Lines: -7/+7)
gcc/ PR tree-optimization/69328 * tree-vect-stmts.c (vect_is_simple_cond): Check compared vectors have same number of elements. (vectorizable_condition): Fix masked version recognition. gcc/testsuite/ PR tree-optimization/69328 * gcc.dg/pr69328.c: New test. Co-Authored-By: Richard Biener <rguenther@suse.de> From-SVN: r232608
2016-01-19 Fix ICE in vectorizable_store (). (Author: Kirill Yukhin; Files: 1; Lines: -2/+6)
gcc/ * tree-vect-stmts.c (vectorizable_store): Check rhs vectype. From-SVN: r232568