aboutsummaryrefslogtreecommitdiff
path: root/gcc/tree-vect-data-refs.c
AgeCommit message (Collapse)AuthorFilesLines
2018-05-25tree-vect-data-refs.c (vect_find_stmt_data_reference): New function, ↵Richard Biener1-158/+127
combining stmt data ref gathering and fatal analysis parts. 2018-05-25 Richard Biener <rguenther@suse.de> * tree-vect-data-refs.c (vect_find_stmt_data_reference): New function, combining stmt data ref gathering and fatal analysis parts. (vect_analyze_data_refs): Remove now redudnant code and simplify. * tree-vect-loop.c (vect_get_datarefs_in_loop): Factor out from vect_analyze_loop_2 and use vect_find_stmt_data_reference. * tree-vect-slp.c (vect_slp_bb): Use vect_find_stmt_data_reference. * tree-vectorizer.h (vect_find_stmt_data_reference): Declare. From-SVN: r260754
2018-05-25tree-vectorizer.h (STMT_VINFO_GROUP_*, GROUP_*): Remove.Richard Biener1-63/+64
2018-05-25 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (STMT_VINFO_GROUP_*, GROUP_*): Remove. (DR_GROUP_*): New, assert we have non-NULL ->data_ref_info. (REDUC_GROUP_*): New, assert we have NULL ->data_ref_info. (STMT_VINFO_GROUPED_ACCESS): Adjust. * tree-vect-data-refs.c (everywhere): Adjust users. * tree-vect-loop.c (everywhere): Likewise. * tree-vect-slp.c (everywhere): Likewise. * tree-vect-stmts.c (everywhere): Likewise. * tree-vect-patterns.c (vect_reassociating_reduction_p): Likewise. From-SVN: r260709
2018-05-02Tighten early exit in vect_analyze_data_ref_dependence (PR85586)Richard Sandiford1-2/+4
The problem in this PR was that we didn't consider aliases between writes in the same strided group. After tightening the early exit we get the expected abs(step) >= 2 versioning check. 2018-05-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/85586 * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Only exit early for statements in the same group if the accesses are not strided. gcc/testsuite/ PR tree-optimization/85586 * gcc.dg/vect/pr85586.c: New test. From-SVN: r259822
2018-04-26tree-vect-data-refs.c (vect_get_data_access_cost): Get prologue cost vector ↵Richard Biener1-7/+12
and pass it to vect_get_load_cost. 2018-04-26 Richard Biener <rguenther@suse.de> * tree-vect-data-refs.c (vect_get_data_access_cost): Get prologue cost vector and pass it to vect_get_load_cost. (vect_get_peeling_costs_all_drs): Likewise. (vect_peeling_hash_get_lowest_cost): Likewise. (vect_enhance_data_refs_alignment): Likewise. From-SVN: r259668
2018-04-19re PR tree-optimization/84737 (20% degradation in CPU2000 172.mgrid starting ↵Richard Biener1-0/+22
with r256888) 2018-04-19 Richard Biener <rguenther@suse.de> PR tree-optimization/84737 * tree-vect-data-refs.c (vect_copy_ref_info): New function copying restrict info. (vect_setup_realignment): Use it. * tree-vectorizer.h (vect_copy_ref_info): Declare. * tree-vect-stmts.c (vectorizable_store): Copy ref info from the first DR to all generated stores. (vectorizable_load): Likewise for loads. From-SVN: r259493
2018-04-10Add missing cases to vect_get_smallest_scalar_type (PR 85286)Richard Sandiford1-0/+2
In this PR we used WIDEN_SUM_EXPR to vectorise: short i, y; int sum; [...] for (i = x; i > 0; i--) sum += y; with 4 ints and 8 shorts per vector. The problem was that we set the VF based only on the ints, then calculated the number of vector copies based on the shorts, giving 4/8. Previously that led to ncopies==0, but after r249897 we pick it up as an ICE. In this particular case we could vectorise the reduction by setting ncopies based on the output type rather than the input type, but it doesn't seem worth adding a special "optimisation" for such a pathological case. I think it's really an instance of the more general problem that we can't vectorise using combinations of (say) 64-bit and 128-bit vectors on targets that support both. 2018-04-10 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/85286 * tree-vect-data-refs.c (vect_get_smallest_scalar_type): gcc/testsuite/ PR tree-optimization/85286 * gcc.dg/vect/pr85286.c: New test. From-SVN: r259268
2018-03-24Use SCEV information when aligning for vectorisation (PR 84005)Richard Sandiford1-5/+5
This PR is another regression caused by the removal of the simple_iv check in dr_analyze_innermost for BB analysis. Without splitting out the step, we weren't able to find an underlying object whose alignment could be increased. As with PR81635, I think the simple_iv was only handling one special case of something that ought to be more general. The more general thing here is that if the address can be analysed as a scalar evolution, and if all updates preserve alignment N, it's possible to align the address to N by increasing the alignment of the base object to N. That applies also to outer loops, and to both loop and BB analysis. I wasn't sure where the new functions ought to live, but tree-data-ref.c seemed OK since (a) that already does scev analysis on addresses and (b) you'd want to use dr_analyze_innermost first if you were analysing a reference. 2018-03-24 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/84005 * tree-data-ref.h (get_base_for_alignment): Declare. * tree-data-ref.c (get_base_for_alignment_1): New function. (get_base_for_alignment): Likewise. * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use get_base_for_alignment to find a suitable base object, instead of always using drb->base_address. gcc/testsuite/ PR tree-optimization/84005 * gcc.dg/vect/bb-slp-1.c: Make sure there is no message about failing to force the alignment. From-SVN: r258833
2018-03-02Use loop->safelen rather than loop->force_vectorizeRichard Sandiford1-2/+2
...since the latter doesn't guarantee independence by itself. 2018-03-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-data-refs.c (vect_analyze_data_ref_dependence) (vect_analyze_data_ref_access): Use loop->safe_len rather than loop->force_vectorize to check whether there is no alias. gcc/testsuite/ * gcc.dg/vect/vect-alias-check-13.c: New test. From-SVN: r258130
2018-01-16Don't group gather loads (PR83847)Richard Sandiford1-2/+4
In the testcase we were trying to group two gather loads, even though that isn't supported. Fixed by explicitly disallowing grouping of gathers and scatters. This problem didn't show up on SVE because there we convert to IFN_GATHER_LOAD/IFN_SCATTER_STORE pattern statements, which fail the can_group_stmts_p check. 2018-01-16 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-data-refs.c (vect_analyze_data_ref_accesses): gcc/testsuite/ * gcc.dg/torture/pr83847.c: New test. From-SVN: r256730
2018-01-13Support for aliasing with variable stridesRichard Sandiford1-45/+320
This patch adds runtime alias checks for loops with variable strides, so that we can vectorise them even without a restrict qualifier. There are several parts to doing this: 1) For accesses like: x[i * n] += 1; we need to check whether n (and thus the DR_STEP) is nonzero. vect_analyze_data_ref_dependence records values that need to be checked in this way, then prune_runtime_alias_test_list records a bounds check on DR_STEP being outside the range [0, 0]. 2) For accesses like: x[i * n] = x[i * n + 1] + 1; we simply need to test whether abs (n) >= 2. prune_runtime_alias_test_list looks for cases like this and tries to guess whether it is better to use this kind of check or a check for non-overlapping ranges. (We could do an OR of the two conditions at runtime, but that isn't implemented yet.) 3) Checks for overlapping ranges need to cope with variable strides. At present the "length" of each segment in a range check is represented as an offset from the base that lies outside the touched range, in the same direction as DR_STEP. The length can therefore be negative and is sometimes conservative. With variable steps it's easier to reaon about if we split this into two: seg_len: distance travelled from the first iteration of interest to the last, e.g. DR_STEP * (VF - 1) access_size: the number of bytes accessed in each iteration with access_size always being a positive constant and seg_len possibly being variable. We can then combine alias checks for two accesses that are a constant number of bytes apart by adjusting the access size to account for the gap. This leaves the segment length unchanged, which allows the check to be combined with further accesses. When seg_len is positive, the runtime alias check has the form: base_a >= base_b + seg_len_b + access_size_b || base_b >= base_a + seg_len_a + access_size_a In many accesses the base will be aligned to the access size, which allows us to skip the addition: base_a > base_b + seg_len_b || base_b > base_a + seg_len_a A similar saving is possible with "negative" lengths. The patch therefore tracks the alignment in addition to seg_len and access_size. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vec_lower_bound): New structure. (_loop_vec_info): Add check_nonzero and lower_bounds. (LOOP_VINFO_CHECK_NONZERO): New macro. (LOOP_VINFO_LOWER_BOUNDS): Likewise. (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too. * tree-data-ref.h (dr_with_seg_len): Add access_size and align fields. Make seg_len the distance travelled, not including the access size. (dr_direction_indicator): Declare. (dr_zero_step_indicator): Likewise. (dr_known_forward_stride_p): Likewise. * tree-data-ref.c: Include stringpool.h, tree-vrp.h and tree-ssanames.h. (runtime_alias_check_p): Allow runtime alias checks with variable strides. (operator ==): Compare access_size and align. (prune_runtime_alias_test_list): Rework for new distinction between the access_size and seg_len. (create_intersect_range_checks_index): Likewise. Cope with polynomial segment lengths. (get_segment_min_max): New function. (create_intersect_range_checks): Use it. (dr_step_indicator): New function. (dr_direction_indicator): Likewise. (dr_zero_step_indicator): Likewise. (dr_known_forward_stride_p): Likewise. * tree-loop-distribution.c (data_ref_segment_size): Return DR_STEP * (niters - 1). (compute_alias_check_pairs): Update call to the dr_with_seg_len constructor. * tree-vect-data-refs.c (vect_check_nonzero_value): New function. (vect_preserves_scalar_order_p): New function, split out from... (vect_analyze_data_ref_dependence): ...here. Check for zero steps. (vect_vfa_segment_size): Return DR_STEP * (length_factor - 1). (vect_vfa_access_size): New function. (vect_vfa_align): Likewise. (vect_compile_time_alias): Take access_size_a and access_b arguments. (dump_lower_bound): New function. (vect_check_lower_bound): Likewise. (vect_small_gap_p): Likewise. (vectorizable_with_step_bound_p): Likewise. (vect_prune_runtime_alias_test_list): Ignore cross-iteration depencies if the vectorization factor is 1. Convert the checks for nonzero steps into checks on the bounds of DR_STEP. Try using a bunds check for variable steps if the minimum required step is relatively small. Update calls to the dr_with_seg_len constructor and to vect_compile_time_alias. * tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New function. (vect_loop_versioning): Call it. * tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS when retrying. (vect_estimate_min_profitable_iters): Account for any bounds checks. gcc/testsuite/ * gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather than SLP vectorization. * gcc.dg/vect/vect-alias-check-10.c: New test. * gcc.dg/vect/vect-alias-check-11.c: Likewise. * gcc.dg/vect/vect-alias-check-12.c: Likewise. * gcc.dg/vect/vect-alias-check-8.c: Likewise. * gcc.dg/vect/vect-alias-check-9.c: Likewise. * gcc.target/aarch64/sve/strided_load_8.c: Likewise. * gcc.target/aarch64/sve/var_stride_1.c: Likewise. * gcc.target/aarch64/sve/var_stride_1.h: Likewise. * gcc.target/aarch64/sve/var_stride_1_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_2.c: Likewise. * gcc.target/aarch64/sve/var_stride_2_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_3.c: Likewise. * gcc.target/aarch64/sve/var_stride_3_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_4.c: Likewise. * gcc.target/aarch64/sve/var_stride_4_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_5.c: Likewise. * gcc.target/aarch64/sve/var_stride_5_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_6.c: Likewise. * gcc.target/aarch64/sve/var_stride_6_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_7.c: Likewise. * gcc.target/aarch64/sve/var_stride_7_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_8.c: Likewise. * gcc.target/aarch64/sve/var_stride_8_run.c: Likewise. * gfortran.dg/vect/vect-alias-check-1.F90: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256644
2018-01-13Add support for SVE scatter storesRichard Sandiford1-3/+8
This is mostly a mechanical extension of the previous gather load support to scatter stores. The internal functions in this case are: IFN_SCATTER_STORE (base, offsets, scale, values) IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask) However, one nonobvious change is to vect_analyze_data_ref_access. If we're treating an access as a gather load or scatter store (i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code would create a dummy data_reference whose step is 0. There's not really much else it could do, since the whole point is that the step isn't predictable from iteration to iteration. We then went into this code in vect_analyze_data_ref_access: /* Allow loads with zero step in inner-loop vectorization. */ if (loop_vinfo && integer_zerop (step)) { GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; if (!nested_in_vect_loop_p (loop, stmt)) return DR_IS_READ (dr); I.e. we'd take the step literally and assume that this is a load or store to an invariant address. Loads from invariant addresses are supported but stores to them aren't. The code therefore had the effect of disabling all scatter stores. AFAICT this is true of AVX too: although tests like avx512f-scatter-1.c test for the correctness of a scatter-like loop, they don't seem to check whether a scatter instruction is actually used. The patch therefore makes vect_analyze_data_ref_access return true for scatters. We do seem to handle the aliasing correctly; that's tested by other functions, and is symmetrical to the already-working gather case. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/sourcebuild.texi (vect_scatter_store): Document. * optabs.def (scatter_store_optab, mask_scatter_store_optab): New optabs. * doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}): Document. * genopinit.c (main): Add supports_vec_scatter_store and supports_vec_scatter_store_cached to target_optabs. * gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE. * internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New internal functions. * internal-fn.h (internal_store_fn_p): Declare. (internal_fn_stored_value_index): Likewise. * internal-fn.c (scatter_store_direct): New macro. (expand_scatter_store_optab_fn): New function. (direct_scatter_store_optab_supported_p): New macro. (internal_store_fn_p): New function. (internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE. (internal_fn_mask_index): Likewise. (internal_fn_stored_value_index): New function. (internal_gather_scatter_fn_supported_p): Adjust operand numbers for scatter stores. * optabs-query.h (supports_vec_scatter_store_p): Declare. * optabs-query.c (supports_vec_scatter_store_p): New function. * tree-vectorizer.h (vect_get_store_rhs): Declare. * tree-vect-data-refs.c (vect_analyze_data_ref_access): Return true for scatter stores. (vect_gather_scatter_fn_p): Handle scatter stores too. (vect_check_gather_scatter): Consider using scatter stores if supports_vec_scatter_store_p. * tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle scatter stores too. * tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use internal_fn_stored_value_index. (check_load_store_masking): Handle scatter stores too. (vect_get_store_rhs): Make public. (vectorizable_call): Use internal_store_fn_p. (vectorizable_store): Handle scatter store internal functions. (vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE when deciding whether the end of the group has been reached. * config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec. * config/aarch64/aarch64-sve.md (scatter_store<mode>): New expander. (mask_scatter_store<mode>): New insns. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_scatter_store): New proc. * gcc.dg/vect/pr25413a.c: Expect both loops to be optimized on targets with scatter stores. * gcc.dg/vect/vect-71.c: Restrict XFAIL to targets without scatter stores. * gcc.target/aarch64/sve/mask_scatter_store_1.c: New test. * gcc.target/aarch64/sve/mask_scatter_store_2.c: Likewise. * gcc.target/aarch64/sve/scatter_store_1.c: Likewise. * gcc.target/aarch64/sve/scatter_store_2.c: Likewise. * gcc.target/aarch64/sve/scatter_store_3.c: Likewise. * gcc.target/aarch64/sve/scatter_store_4.c: Likewise. * gcc.target/aarch64/sve/scatter_store_5.c: Likewise. * gcc.target/aarch64/sve/scatter_store_6.c: Likewise. * gcc.target/aarch64/sve/scatter_store_7.c: Likewise. * gcc.target/aarch64/sve/strided_store_1.c: Likewise. * gcc.target/aarch64/sve/strided_store_2.c: Likewise. * gcc.target/aarch64/sve/strided_store_3.c: Likewise. * gcc.target/aarch64/sve/strided_store_4.c: Likewise. * gcc.target/aarch64/sve/strided_store_5.c: Likewise. * gcc.target/aarch64/sve/strided_store_6.c: Likewise. * gcc.target/aarch64/sve/strided_store_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256643
2018-01-13Allow gather loads to be used for grouped accessesRichard Sandiford1-1/+1
Following on from the previous patch for strided accesses, this patch allows gather loads to be used with grouped accesses, if we otherwise would need to fall back to VMAT_ELEMENTWISE. However, as the comment says, this is restricted to single-element groups for now: ??? Although the code can handle all group sizes correctly, it probably isn't a win to use separate strided accesses based on nearby locations. Or, even if it's a win over scalar code, it might not be a win over vectorizing at a lower VF, if that allows us to use contiguous accesses. Single-element groups are an important special case though, and this means that code is less sensitive to GCC's classification of single accesses with constant steps as "grouped" and ones with variable steps as "strided". 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vect_gather_scatter_fn_p): Declare. * tree-vect-data-refs.c (vect_gather_scatter_fn_p): Make public. * tree-vect-stmts.c (vect_truncate_gather_scatter_offset): New function. (vect_use_strided_gather_scatters_p): Take a masked_p argument. Use vect_truncate_gather_scatter_offset if we can't treat the operation as a normal gather load or scatter store. (get_group_load_store_type): Take the gather_scatter_info as argument. Try using a gather load or scatter store for single-element groups. (get_load_store_type): Update calls to get_group_load_store_type and vect_use_strided_gather_scatters_p. gcc/testsuite/ * gcc.target/aarch64/sve/reduc_strict_3.c: Expect FADDA to be used for double_reduc1. * gcc.target/aarch64/sve/strided_load_4.c: New test. * gcc.target/aarch64/sve/strided_load_5.c: Likewise. * gcc.target/aarch64/sve/strided_load_6.c: Likewise. * gcc.target/aarch64/sve/strided_load_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256642
2018-01-13Use gather loads for strided accessesRichard Sandiford1-13/+28
This patch tries to use gather loads for strided accesses, rather than falling back to VMAT_ELEMENTWISE. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vect_create_data_ref_ptr): Take an extra optional tree argument. * tree-vect-data-refs.c (vect_check_gather_scatter): Check for null target hooks. (vect_create_data_ref_ptr): Take the iv_step as an optional argument, but continue to use the current value as a fallback. (bump_vector_ptr): Use operand_equal_p rather than tree_int_cst_compare to compare the updates. * tree-vect-stmts.c (vect_use_strided_gather_scatters_p): New function. (get_load_store_type): Use it when handling a strided access. (vect_get_strided_load_store_ops): New function. (vect_get_data_ptr_increment): Likewise. (vectorizable_load): Handle strided gather loads. Always pass a step to vect_create_data_ref_ptr and bump_vector_ptr. gcc/testsuite/ * gcc.target/aarch64/sve/strided_load_1.c: New test. * gcc.target/aarch64/sve/strided_load_2.c: Likewise. * gcc.target/aarch64/sve/strided_load_3.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256641
2018-01-13Add support for SVE gather loadsRichard Sandiford1-14/+139
This patch adds support for SVE gather loads. It uses the basically the same analysis code as the AVX gather support, but after that there are two major differences: - It uses new internal functions rather than target built-ins. The interface is: IFN_GATHER_LOAD (base, offsets scale) IFN_MASK_GATHER_LOAD (base, offsets scale, mask) which should be reasonably generic. One of the advantages of using internal functions is that other passes can understand what the functions do, but a more immediate advantage is that we can query the underlying target pattern to see which scales it supports. - It uses pattern recognition to convert the offset to the right width, if it was originally narrower than that. This avoids having to do a widening operation as part of the gather expansion itself. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (gather_load@var{m}): Document. (mask_gather_load@var{m}): Likewise. * genopinit.c (main): Add supports_vec_gather_load and supports_vec_gather_load_cached to target_optabs. * optabs-tree.c (init_tree_optimization_optabs): Use ggc_cleared_alloc to allocate target_optabs. * optabs.def (gather_load_optab, mask_gather_laod_optab): New optabs. * internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal functions. * internal-fn.h (internal_load_fn_p): Declare. (internal_gather_scatter_fn_p): Likewise. (internal_fn_mask_index): Likewise. (internal_gather_scatter_fn_supported_p): Likewise. * internal-fn.c (gather_load_direct): New macro. (expand_gather_load_optab_fn): New function. (direct_gather_load_optab_supported_p): New macro. (direct_internal_fn_optab): New function. (internal_load_fn_p): Likewise. (internal_gather_scatter_fn_p): Likewise. (internal_fn_mask_index): Likewise. (internal_gather_scatter_fn_supported_p): Likewise. * optabs-query.c (supports_at_least_one_mode_p): New function. (supports_vec_gather_load_p): Likewise. * optabs-query.h (supports_vec_gather_load_p): Declare. * tree-vectorizer.h (gather_scatter_info): Add ifn, element_type and memory_type field. (NUM_PATTERNS): Bump to 15. * tree-vect-data-refs.c: Include internal-fn.h. (vect_gather_scatter_fn_p): New function. (vect_describe_gather_scatter_call): Likewise. (vect_check_gather_scatter): Try using internal functions for gather loads. Recognize existing calls to a gather load function. (vect_analyze_data_refs): Consider using gather loads if supports_vec_gather_load_p. * tree-vect-patterns.c (vect_get_load_store_mask): New function. (vect_get_gather_scatter_offset_type): Likewise. (vect_convert_mask_for_vectype): Likewise. (vect_add_conversion_to_patterm): Likewise. (vect_try_gather_scatter_pattern): Likewise. (vect_recog_gather_scatter_pattern): New pattern recognizer. (vect_vect_recog_func_ptrs): Add it. * tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use internal_fn_mask_index and internal_gather_scatter_fn_p. (check_load_store_masking): Take the gather_scatter_info as an argument and handle gather loads. (vect_get_gather_scatter_ops): New function. (vectorizable_call): Check internal_load_fn_p. (vectorizable_load): Likewise. Handle gather load internal functions. (vectorizable_store): Update call to check_load_store_masking. * config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec. * config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators. * config/aarch64/predicates.md (aarch64_gather_scale_operand_w) (aarch64_gather_scale_operand_d): New predicates. * config/aarch64/aarch64-sve.md (gather_load<mode>): New expander. (mask_gather_load<mode>): New insns. gcc/testsuite/ * gcc.target/aarch64/sve/gather_load_1.c: New test. * gcc.target/aarch64/sve/gather_load_2.c: Likewise. * gcc.target/aarch64/sve/gather_load_3.c: Likewise. * gcc.target/aarch64/sve/gather_load_4.c: Likewise. * gcc.target/aarch64/sve/gather_load_5.c: Likewise. * gcc.target/aarch64/sve/gather_load_6.c: Likewise. * gcc.target/aarch64/sve/gather_load_7.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256640
2018-01-13Allow single-element interleaving for non-power-of-2 stridesRichard Sandiford1-3/+2
This allows LD3 to be used for isolated a[i * 3] accesses, in a similar way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively. Given the problems with the cost model underestimating the cost of elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE cases that are currently rejected. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-data-refs.c (vect_analyze_group_access_1): Allow single-element interleaving even if the size is not a power of 2. * tree-vect-stmts.c (get_load_store_type): Disallow elementwise accesses for single-element interleaving if the group size is not a power of 2. gcc/testsuite/ * gcc.target/aarch64/sve/struct_vect_18.c: New test. * gcc.target/aarch64/sve/struct_vect_18_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_19.c: Likewise. * gcc.target/aarch64/sve/struct_vect_19_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256634
2018-01-13Add support for masked load/store_lanesRichard Sandiford1-14/+81
This patch adds support for vectorising groups of IFN_MASK_LOADs and IFN_MASK_STOREs using conditional load/store-lanes instructions. This requires new internal functions to represent the result (IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs. The normal IFN_{LOAD,STORE}_LANES functions are const operations that logically just perform the permute: the load or store is encoded as a MEM operand to the call statement. In contrast, the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of interface as IFN_MASK_{LOAD,STORE}, since the memory is only conditionally accessed. The AArch64 patterns were added as part of the main LD[234]/ST[234] patch. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document. (vec_mask_store_lanes@var{m}@var{n}): Likewise. * optabs.def (vec_mask_load_lanes_optab): New optab. (vec_mask_store_lanes_optab): Likewise. * internal-fn.def (MASK_LOAD_LANES): New internal function. (MASK_STORE_LANES): Likewise. * internal-fn.c (mask_load_lanes_direct): New macro. (mask_store_lanes_direct): Likewise. (expand_mask_load_optab_fn): Handle masked operations. (expand_mask_load_lanes_optab_fn): New macro. (expand_mask_store_optab_fn): Handle masked operations. (expand_mask_store_lanes_optab_fn): New macro. (direct_mask_load_lanes_optab_supported_p): Likewise. (direct_mask_store_lanes_optab_supported_p): Likewise. * tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p parameter. (vect_load_lanes_supported): Likewise. * tree-vect-data-refs.c (strip_conversion): New function. (can_group_stmts_p): Likewise. (vect_analyze_data_ref_accesses): Use it instead of checking for a pair of assignments. (vect_store_lanes_supported): Take a masked_p parameter. (vect_load_lanes_supported): Likewise. * tree-vect-loop.c (vect_analyze_loop_2): Update calls to vect_store_lanes_supported and vect_load_lanes_supported. * tree-vect-slp.c (vect_analyze_slp_instance): Likewise. * tree-vect-stmts.c (get_group_load_store_type): Take a masked_p parameter. Don't allow gaps for masked accesses. Use vect_get_store_rhs. Update calls to vect_store_lanes_supported and vect_load_lanes_supported. (get_load_store_type): Take a masked_p parameter and update call to get_group_load_store_type. (vectorizable_store): Update call to get_load_store_type. Handle IFN_MASK_STORE_LANES. (vectorizable_load): Update call to get_load_store_type. Handle IFN_MASK_LOAD_LANES. gcc/testsuite/ * gcc.dg/vect/vect-ooo-group-1.c: New test. * gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_2_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_3_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_6.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_7.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_8.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256620
2018-01-13Give the target more control over ARRAY_TYPE modesRichard Sandiford1-10/+13
So far we've used integer modes for LD[234] and ST[234] arrays. That doesn't scale well to SVE, since the sizes aren't fixed at compile time (and even if they were, we wouldn't want integers to be so wide). This patch lets the target use double-, triple- and quadruple-length vectors instead. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * target.def (array_mode): New target hook. * doc/tm.texi.in (TARGET_ARRAY_MODE): New hook. * doc/tm.texi: Regenerate. * hooks.h (hook_optmode_mode_uhwi_none): Declare. * hooks.c (hook_optmode_mode_uhwi_none): New function. * tree-vect-data-refs.c (vect_lanes_optab_supported_p): Use targetm.array_mode. * stor-layout.c (mode_for_array): Likewise. Support polynomial type sizes. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256617
2018-01-12Handle polynomial DR_INITRichard Sandiford1-4/+10
The idea with the main 107-patch poly_int series (latterly 109-patch) was to change the mode sizes and vector element counts to poly_int and then propagate those changes as far as they needed to go to fix build failures from incompatible types. This means that DR_INIT is now constructed as a poly_int64: poly_int64 pbytepos; if (!multiple_p (pbitpos, BITS_PER_UNIT, &pbytepos)) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "failed: bit offset alignment.\n"); return false; } [...] init = ssize_int (pbytepos); This patch adjusts other references to DR_INIT accordingly. Unlike the above, the adjustments weren't needed to avoid a build-time type incompatibility, but they are needed to make the producer and consumers of DR_INIT logically consistent. 2018-01-12 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-predcom.c (aff_combination_dr_offset): Use wi::to_poly_widest rather than wi::to_widest for DR_INITs. * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use wi::to_poly_offset rather than wi::to_offset for DR_INIT. (vect_analyze_data_ref_accesses): Require both DR_INITs to be INTEGER_CSTs. (vect_analyze_group_access_1): Note that here. From-SVN: r256587
2018-01-05Revert DECL_USER_ALIGN part of r241959Richard Sandiford1-13/+0
r241959 included code to stop the vectoriser increasing the alignment of a "user-aligned" variable. This wasn't the main purpose of the patch, but was done for consistency with pass_increase_alignment, and was needed to make the testcase work. The documentation for the aligned attribute says: This attribute specifies a minimum alignment for the variable or structure field, measured in bytes. so I think it's reasonable for the vectoriser to increase the alignment further, if that helps us to vectorise code. It's also useful if the "user" alignment actually came from an earlier pass rather than the source code. A possible counterexample came up when this was discussed on the lists. Users who are trying to collate things from several translation units into a single section can use: __attribute__((section ("whatever"), aligned(N))) and would not want extra padding. It turns out that the supported way of doing that is to add a "used" attribute, which works even when no "aligned" attribute is given. 2018-01-05 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Don't punt for user-aligned variables. gcc/testsuite/ * gcc.dg/vect/vect-align-4.c: New test. * gcc.dg/vect/vect-nb-iter-ub-2.c (cc): Remove alignment attribute and redefine as a structure with an unaligned member "b". (foo): Update accordingly. From-SVN: r256277
2018-01-03poly_int: GET_MODE_SIZERichard Sandiford1-3/+14
This patch changes GET_MODE_SIZE from unsigned short to poly_uint16. The non-mechanical parts were handled by previous patches. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * machmode.h (mode_size): Change from unsigned short to poly_uint16_pod. (mode_to_bytes): Return a poly_uint16 rather than an unsigned short. (GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES, or if measurement_type is not polynomial. (fixed_size_mode::includes_p): Check for constant-sized modes. * genmodes.c (emit_mode_size_inline): Make mode_size_inline return a poly_uint16 rather than an unsigned short. (emit_mode_size): Change the type of mode_size from unsigned short to poly_uint16_pod. Use ZERO_COEFFS for the initializer. (emit_mode_adjustments): Cope with polynomial vector sizes. * lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value for GET_MODE_SIZE. * lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value for GET_MODE_SIZE. * auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial. * builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise. * caller-save.c (setup_save_areas): Likewise. (replace_reg_with_saved_mem): Likewise. * calls.c (emit_library_call_value_1): Likewise. * combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise. * combine.c (simplify_set, make_extraction, simplify_shift_const_1) (gen_lowpart_for_combine): Likewise. * convert.c (convert_to_integer_1): Likewise. * cse.c (equiv_constant, cse_insn): Likewise. * cselib.c (autoinc_split, cselib_hash_rtx): Likewise. (cselib_subst_to_values): Likewise. * dce.c (word_dce_process_block): Likewise. * df-problems.c (df_word_lr_mark_ref): Likewise. * dwarf2cfi.c (init_one_dwarf_reg_size): Likewise. * dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor) (concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor) (rtl_for_decl_location): Likewise. * emit-rtl.c (gen_highpart, widen_memory_access): Likewise. * expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise. * expr.c (emit_group_load_1, clear_storage_hints): Likewise. (emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise. (expand_expr_real_1): Likewise. * function.c (assign_parm_setup_block_p, assign_parm_setup_block) (pad_below): Likewise. * gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise. * gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise. * ira.c (get_subreg_tracking_sizes): Likewise. * ira-build.c (ira_create_allocno_objects): Likewise. * ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise. (ira_sort_regnos_for_alter_reg): Likewise. * ira-costs.c (record_operand_costs): Likewise. * lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn) (resolve_simple_move): Likewise. * lra-constraints.c (get_reload_reg, operands_match_p): Likewise. (process_addr_reg, simplify_operand_subreg, curr_insn_transform) (lra_constraints): Likewise. (CONST_POOL_OK_P): Reject variable-sized modes. * lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare) (add_pseudo_to_slot, lra_spill): Likewise. * omp-low.c (omp_clause_aligned_alignment): Likewise. * optabs-query.c (get_best_extraction_insn): Likewise. * optabs-tree.c (expand_vec_cond_expr_p): Likewise. * optabs.c (expand_vec_perm_var, expand_vec_cond_expr): Likewise. (expand_mult_highpart, valid_multiword_target_p): Likewise. * recog.c (offsettable_address_addr_space_p): Likewise. * regcprop.c (maybe_mode_change): Likewise. * reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise. * regrename.c (build_def_use): Likewise. * regstat.c (dump_reg_info): Likewise. * reload.c (complex_word_subreg_p, push_reload, find_dummy_reload) (find_reloads, find_reloads_subreg_address): Likewise. * reload1.c (eliminate_regs_1): Likewise. * rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise. * simplify-rtx.c (avoid_constant_pool_reference): Likewise. (simplify_binary_operation_1, simplify_subreg): Likewise. * targhooks.c (default_function_arg_padding): Likewise. (default_hard_regno_nregs, default_class_max_nregs): Likewise. * tree-cfg.c (verify_gimple_assign_binary): Likewise. (verify_gimple_assign_ternary): Likewise. * tree-inline.c (estimate_move_cost): Likewise. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-ssa-loop-ivopts.c (add_autoinc_candidates): Likewise. (get_address_cost_ainc): Likewise. * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise. (vect_supportable_dr_alignment): Likewise. * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise. (vectorizable_reduction): Likewise. * tree-vect-stmts.c (vectorizable_assignment, vectorizable_shift) (vectorizable_operation, vectorizable_load): Likewise. * tree.c (build_same_sized_truth_vector_type): Likewise. * valtrack.c (cleanup_auto_inc_dec): Likewise. * var-tracking.c (emit_note_insn_var_location): Likewise. * config/arc/arc.h (ASM_OUTPUT_CASE_END): Use as_a <scalar_int_mode>. (ADDR_VEC_ALIGN): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256201
2018-01-03poly_int: TYPE_VECTOR_SUBPARTSRichard Sandiford1-9/+9
This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64. The value is encoded in the 10-bit precision field and was previously always stored as a simple log2 value. The challenge was to use this 10 bits to encode the number of elements in variable-length vectors, so that we didn't need to increase the size of the tree. In practice the number of vector elements should always have the form N + N * X (where X is the runtime value), and as for constant-length vectors, N must be a power of 2 (even though X itself might not be). The patch therefore uses the low 8 bits to encode log2(N) and bit 8 to select between constant-length and variable-length vectors. Targets without variable-length vectors continue to use the old scheme. A new valid_vector_subparts_p function tests whether a given number of elements can be encoded. This is false for the vector modes that represent an LD3 or ST3 vector triple (which we want to treat as arrays of vectors rather than single vectors). Most of the patch is mechanical; previous patches handled the changes that weren't entirely straightforward. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree.h (TYPE_VECTOR_SUBPARTS): Turn into a function and handle polynomial numbers of units. (SET_TYPE_VECTOR_SUBPARTS): Likewise. (valid_vector_subparts_p): New function. (build_vector_type): Remove temporary shim and take the number of units as a poly_uint64 rather than an int. (build_opaque_vector_type): Take the number of units as a poly_uint64 rather than an int. * tree.c (build_vector_from_ctor): Handle polynomial TYPE_VECTOR_SUBPARTS. (type_hash_canon_hash, type_cache_hasher::equal): Likewise. (uniform_vector_p, vector_type_mode, build_vector): Likewise. (build_vector_from_val): If the number of units is variable, use build_vec_duplicate_cst for constant operands and VEC_DUPLICATE_EXPR otherwise. (make_vector_type): Remove temporary is_constant (). (build_vector_type, build_opaque_vector_type): Take the number of units as a poly_uint64 rather than an int. (check_vector_cst): Handle polynomial TYPE_VECTOR_SUBPARTS and VECTOR_CST_NELTS. * cfgexpand.c (expand_debug_expr): Likewise. * expr.c (count_type_elements, categorize_ctor_elements_1): Likewise. (store_constructor, expand_expr_real_1): Likewise. (const_scalar_mask_from_tree): Likewise. * fold-const-call.c (fold_const_reduction): Likewise. * fold-const.c (const_binop, const_unop, fold_convert_const): Likewise. (operand_equal_p, fold_vec_perm, fold_ternary_loc): Likewise. (native_encode_vector, vec_cst_ctor_to_array): Likewise. (fold_relational_const): Likewise. (native_interpret_vector): Likewise. Change the size from an int to an unsigned int. * gimple-fold.c (gimple_fold_stmt_to_constant_1): Handle polynomial TYPE_VECTOR_SUBPARTS. (gimple_fold_indirect_ref, gimple_build_vector): Likewise. (gimple_build_vector_from_val): Use VEC_DUPLICATE_EXPR when duplicating a non-constant operand into a variable-length vector. * hsa-brig.c (hsa_op_immed::emit_to_buffer): Handle polynomial TYPE_VECTOR_SUBPARTS and VECTOR_CST_NELTS. * ipa-icf.c (sem_variable::equals): Likewise. * match.pd: Likewise. * omp-simd-clone.c (simd_clone_subparts): Likewise. * print-tree.c (print_node): Likewise. * stor-layout.c (layout_type): Likewise. * targhooks.c (default_builtin_vectorization_cost): Likewise. * tree-cfg.c (verify_gimple_comparison): Likewise. (verify_gimple_assign_binary): Likewise. (verify_gimple_assign_ternary): Likewise. (verify_gimple_assign_single): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. (simplify_bitfield_ref, is_combined_permutation_identity): Likewise. * tree-vect-data-refs.c (vect_permute_store_chain): Likewise. (vect_grouped_load_supported, vect_permute_load_chain): Likewise. (vect_shift_permute_load_chain): Likewise. * tree-vect-generic.c (nunits_for_known_piecewise_op): Likewise. (expand_vector_condition, optimize_vector_constructor): Likewise. (lower_vec_perm, get_compute_type): Likewise. * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise. (get_initial_defs_for_reduction, vect_transform_loop): Likewise. * tree-vect-patterns.c (vect_recog_bool_pattern): Likewise. (vect_recog_mask_conversion_pattern): Likewise. * tree-vect-slp.c (vect_supported_load_permutation_p): Likewise. (vect_get_constant_vectors, vect_transform_slp_perm_load): Likewise. * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. (get_group_load_store_type, vectorizable_mask_load_store): Likewise. (vectorizable_bswap, simd_clone_subparts, vectorizable_assignment) (vectorizable_shift, vectorizable_operation, vectorizable_store) (vectorizable_load, vect_is_simple_cond, vectorizable_comparison) (supportable_widening_operation): Likewise. (supportable_narrowing_operation): Likewise. * tree-vector-builder.c (tree_vector_builder::binary_encoded_nelts): Likewise. * varasm.c (output_constant): Likewise. gcc/ada/ * gcc-interface/utils.c (gnat_types_compatible_p): Handle polynomial TYPE_VECTOR_SUBPARTS. gcc/brig/ * brigfrontend/brig-to-generic.cc (get_unsigned_int_type): Handle polynomial TYPE_VECTOR_SUBPARTS. * brigfrontend/brig-util.h (gccbrig_type_vector_subparts): Likewise. gcc/c-family/ * c-common.c (vector_types_convertible_p, c_build_vec_perm_expr) (convert_vector_to_array_for_subscript): Handle polynomial TYPE_VECTOR_SUBPARTS. (c_common_type_for_mode): Check valid_vector_subparts_p. * c-pretty-print.c (pp_c_initializer_list): Handle polynomial VECTOR_CST_NELTS. gcc/c/ * c-typeck.c (comptypes_internal, build_binary_op): Handle polynomial TYPE_VECTOR_SUBPARTS. gcc/cp/ * constexpr.c (cxx_eval_array_reference): Handle polynomial VECTOR_CST_NELTS. (cxx_fold_indirect_ref): Handle polynomial TYPE_VECTOR_SUBPARTS. * call.c (build_conditional_expr_1): Likewise. * decl.c (cp_finish_decomp): Likewise. * mangle.c (write_type): Likewise. * typeck.c (structural_comptypes): Likewise. (cp_build_binary_op): Likewise. * typeck2.c (process_init_constructor_array): Likewise. gcc/fortran/ * trans-types.c (gfc_type_for_mode): Check valid_vector_subparts_p. gcc/lto/ * lto-lang.c (lto_type_for_mode): Check valid_vector_subparts_p. * lto.c (hash_canonical_type): Handle polynomial TYPE_VECTOR_SUBPARTS. gcc/go/ * go-lang.c (go_langhook_type_for_mode): Check valid_vector_subparts_p. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256197
2018-01-03poly_int: vect_permute_load/store_chainRichard Sandiford1-3/+7
The GET_MODE_NUNITS patch made vect_grouped_store_supported and vect_grouped_load_supported check for a constant number of elements, so vect_permute_store_chain and vect_permute_load_chain can assert for that. This patch adds commentary to that effect; the actual asserts will be added by a later, more mechanical, patch. The patch also reorganises the function so that the asserts are linked specifically to code that builds permute vectors element-by-element. This allows a later patch to add support for some variable-length permutes. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-data-refs.c (vect_permute_store_chain): Reorganize so that both the length == 3 and length != 3 cases set up their own permute vectors. Add comments explaining why we know the number of elements is constant. (vect_permute_load_chain): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256196
2018-01-03poly_int: GET_MODE_NUNITSRichard Sandiford1-4/+25
This patch changes GET_MODE_NUNITS from unsigned char to poly_uint16, although it remains a macro when compiling target code with NUM_POLY_INT_COEFFS == 1. We can handle permuted loads and stores for variable nunits if the number of statements is a power of 2, but not otherwise. The to_constant call in make_vector_type goes away in a later patch. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * machmode.h (mode_nunits): Change from unsigned char to poly_uint16_pod. (ONLY_FIXED_SIZE_MODES): New macro. (pod_mode::measurement_type, scalar_int_mode::measurement_type) (scalar_float_mode::measurement_type, scalar_mode::measurement_type) (complex_mode::measurement_type, fixed_size_mode::measurement_type): New typedefs. (mode_to_nunits): Return a poly_uint16 rather than an unsigned short. (GET_MODE_NUNITS): Return a constant if ONLY_FIXED_SIZE_MODES, or if measurement_type is not polynomial. * genmodes.c (ZERO_COEFFS): New macro. (emit_mode_nunits_inline): Make mode_nunits_inline return a poly_uint16. (emit_mode_nunits): Change the type of mode_nunits to poly_uint16_pod. Use ZERO_COEFFS when emitting initializers. * data-streamer.h (bp_pack_poly_value): New function. (bp_unpack_poly_value): Likewise. * lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value for GET_MODE_NUNITS. * lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value for GET_MODE_NUNITS. * tree.c (make_vector_type): Remove temporary shim and make the real function take the number of units as a poly_uint64 rather than an int. (build_vector_type_for_mode): Handle polynomial nunits. * dwarf2out.c (loc_descriptor, add_const_value_attribute): Likewise. * emit-rtl.c (const_vec_series_p_1): Likewise. (gen_rtx_CONST_VECTOR): Likewise. * fold-const.c (test_vec_duplicate_folding): Likewise. * genrecog.c (validate_pattern): Likewise. * optabs-query.c (can_vec_perm_var_p, can_mult_highpart_p): Likewise. * optabs-tree.c (expand_vec_cond_expr_p): Likewise. * optabs.c (expand_vector_broadcast, expand_binop_directly): Likewise. (shift_amt_for_vec_perm_mask, expand_vec_perm_var): Likewise. (expand_vec_cond_expr, expand_mult_highpart): Likewise. * rtlanal.c (subreg_get_info): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. (vect_grouped_load_supported): Likewise. * tree-vect-generic.c (type_for_widest_vector_mode): Likewise. * tree-vect-loop.c (have_whole_vector_shift): Likewise. * simplify-rtx.c (simplify_unary_operation_1): Likewise. (simplify_const_unary_operation, simplify_binary_operation_1) (simplify_const_binary_operation, simplify_ternary_operation) (test_vector_ops_duplicate, test_vector_ops): Likewise. (simplify_immed_subreg): Use GET_MODE_NUNITS on a fixed_size_mode instead of CONST_VECTOR_NUNITS. * varasm.c (output_constant_pool_2): Likewise. * rtx-vector-builder.c (rtx_vector_builder::build): Only include the explicit-encoded elements in the XVEC for variable-length vectors. gcc/ada/ * gcc-interface/misc.c (enumerate_modes): Handle polynomial GET_MODE_NUNITS. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256195
2018-01-03Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r256169
2018-01-03poly_int: vect_no_alias_pRichard Sandiford1-39/+41
This patch replaces the two-state vect_no_alias_p with a three-state vect_compile_time_alias that handles polynomial segment lengths. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-data-refs.c (vect_no_alias_p): Replace with... (vect_compile_time_alias): ...this new function. Do the calculation on poly_ints rather than trees. (vect_prune_runtime_alias_test_list): Update call accordingly. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256142
2018-01-03poly_int: vector_alignment_reachable_pRichard Sandiford1-3/+4
This patch makes vector_alignment_reachable_p cope with variable-length vectors. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-data-refs.c (vector_alignment_reachable_p): Treat the number of units as polynomial. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256132
2018-01-03poly_int: vectoriser vf and ufRichard Sandiford1-41/+48
This patch changes the type of the vectorisation factor and SLP unrolling factor to poly_uint64. This in turn required some knock-on changes in signedness elsewhere. Cost decisions are generally based on estimated_poly_value, which for VF is wrapped up as vect_vf_for_cost. The patch doesn't on its own enable variable-length vectorisation. It just makes the minimum changes necessary for the code to build with the new VF and UF types. Later patches also make the vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable GET_MODE_NUNITS, at which point the code really does handle variable-length vectors. The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX, to avoid hard-coding a particular architectural limit. The patch includes a new test because a development version of the patch accidentally used file print routines instead of dump_*, which would fail with -fopt-info. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (_slp_instance::unrolling_factor): Change from an unsigned int to a poly_uint64. (_loop_vec_info::slp_unrolling_factor): Likewise. (_loop_vec_info::vectorization_factor): Change from an int to a poly_uint64. (MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX. (vect_get_num_vectors): New function. (vect_update_max_nunits, vect_vf_for_cost): Likewise. (vect_get_num_copies): Use vect_get_num_vectors. (vect_analyze_data_ref_dependences): Change max_vf from an int * to an unsigned int *. (vect_analyze_data_refs): Change min_vf from an int * to a poly_uint64 *. (vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather than an unsigned HOST_WIDE_INT. * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr) (vect_analyze_data_ref_dependence): Change max_vf from an int * to an unsigned int *. (vect_analyze_data_ref_dependences): Likewise. (vect_compute_data_ref_alignment): Handle polynomial vf. (vect_enhance_data_refs_alignment): Likewise. (vect_prune_runtime_alias_test_list): Likewise. (vect_shift_permute_load_chain): Likewise. (vect_supportable_dr_alignment): Likewise. (dependence_distance_ge_vf): Take the vectorization factor as a poly_uint64 rather than an unsigned HOST_WIDE_INT. (vect_analyze_data_refs): Change min_vf from an int * to a poly_uint64 *. * tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take vfm1 as a poly_uint64 rather than an int. Make the same change for the returned bound_scalar. (vect_gen_vector_loop_niters): Handle polynomial vf. (vect_do_peeling): Likewise. Update call to vect_gen_scalar_loop_niters and handle polynomial bound_scalars. (vect_gen_vector_loop_niters_mult_vf): Assert that the vf must be constant. * tree-vect-loop.c (vect_determine_vectorization_factor) (vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf. (vect_get_known_peeling_cost): Likewise. (vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise. (vect_worthwhile_without_simd_p, vectorizable_induction): Likewise. (vect_transform_loop): Likewise. Use the lowest possible VF when updating the upper bounds of the loop. (vect_min_worthwhile_factor): Make static. Return an unsigned int rather than an int. * tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with polynomial unroll factors. (vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise. (vect_make_slp_decision): Likewise. (vect_supported_load_permutation_p): Likewise, and polynomial vf too. (vect_analyze_slp_cost): Handle polynomial vf. (vect_slp_analyze_node_operations): Likewise. (vect_slp_analyze_bb_1): Likewise. (vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather than an unsigned HOST_WIDE_INT. * tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store) (vectorizable_load): Handle polynomial vf. * tree-vectorizer.c (simduid_to_vf::vf): Change from an int to a poly_uint64. (adjust_simduid_builtins, shrink_simd_arrays): Update accordingly. gcc/testsuite/ * gcc.dg/vect-opt-info-1.c: New test. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256126
2018-01-02Use explicit encodings for simple permutesRichard Sandiford1-19/+28
This patch makes users of vec_perm_builders use the compressed encoding where possible. This means that they work with variable-length vectors. 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * optabs.c (expand_vec_perm_var): Use an explicit encoding for the broadcast of the low byte. (expand_mult_highpart): Use an explicit encoding for the permutes. * optabs-query.c (can_mult_highpart_p): Likewise. * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Likewise. * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. (vectorizable_bswap): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Use an explicit encoding for the power-of-2 permutes. (vect_permute_store_chain): Likewise. (vect_grouped_load_supported): Likewise. (vect_permute_load_chain): Likewise. From-SVN: r256097
2018-01-02Make vec_perm_indices use new vector encodingRichard Sandiford1-40/+69
This patch changes vec_perm_indices from a plain vec<> to a class that stores a canonicalized permutation, using the same encoding as for VECTOR_CSTs. This means that vec_perm_indices now carries information about the number of vectors being permuted (currently always 1 or 2) and the number of elements in each input vector. A new vec_perm_builder class is used to actually build up the vector, like tree_vector_builder does for trees. vec_perm_indices is the completed representation, a bit like VECTOR_CST is for trees. The patch just does a mechanical conversion of the code to vec_perm_builder: a later patch uses explicit encodings where possible. The point of all this is that it makes the representation suitable for variable-length vectors. It's no longer necessary for the underlying vec<>s to store every element explicitly. In int-vector-builder.h, "using the same encoding as tree and rtx constants" describes the endpoint -- adding the rtx encoding comes later. 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * int-vector-builder.h: New file. * vec-perm-indices.h: Include int-vector-builder.h. (vec_perm_indices): Redefine as an int_vector_builder. (auto_vec_perm_indices): Delete. (vec_perm_builder): Redefine as a stand-alone class. (vec_perm_indices::vec_perm_indices): New function. (vec_perm_indices::clamp): Likewise. * vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h. (vec_perm_indices::new_vector): New function. (vec_perm_indices::new_expanded_vector): Update for new vec_perm_indices class. (vec_perm_indices::rotate_inputs): New function. (vec_perm_indices::all_in_range_p): Operate directly on the encoded form, without computing elided elements. (tree_to_vec_perm_builder): Operate directly on the VECTOR_CST encoding. Update for new vec_perm_indices class. * optabs.c (expand_vec_perm_const): Create a vec_perm_indices for the given vec_perm_builder. (expand_vec_perm_var): Update vec_perm_builder constructor. (expand_mult_highpart): Use vec_perm_builder instead of auto_vec_perm_indices. * optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Use a single or double series encoding as appropriate. * fold-const.c (fold_ternary_loc): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. (vect_permute_store_chain): Likewise. (vect_grouped_load_supported): Likewise. (vect_permute_load_chain): Likewise. (vect_shift_permute_load_chain): Likewise. * tree-vect-slp.c (vect_build_slp_tree_1): Likewise. (vect_transform_slp_perm_load): Likewise. (vect_schedule_slp_instance): Likewise. * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. (vectorizable_mask_load_store): Likewise. (vectorizable_bswap): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. * tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Use tree_to_vec_perm_builder to read the vector from a tree. * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a vec_perm_builder instead of a vec_perm_indices. (have_whole_vector_shift): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Leave the truncation to calc_vec_perm_mask_for_shift. (vect_create_epilog_for_reduction): Likewise. * config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change from auto_vec_perm_indices to vec_perm_indices. (aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm instead of changing individual elements. (aarch64_vectorize_vec_perm_const): Use new_vector to install the vector in d.perm. * config/arm/arm.c (expand_vec_perm_d::perm): Change from auto_vec_perm_indices to vec_perm_indices. (arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm instead of changing individual elements. (arm_vectorize_vec_perm_const): Use new_vector to install the vector in d.perm. * config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even): Update vec_perm_builder constructor. (rs6000_expand_interleave): Likewise. * config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise. (rs6000_expand_interleave): Likewise. From-SVN: r256095
2018-01-02Remove vec_perm_const optabRichard Sandiford1-0/+1
One of the changes needed for variable-length VEC_PERM_EXPRs -- and for long fixed-length VEC_PERM_EXPRs -- is the ability to use constant selectors that wouldn't fit in the vectors being permuted. E.g. a permute on two V256QIs can't be done using a V256QI selector. At the moment constant permutes use two interfaces: targetm.vectorizer.vec_perm_const_ok for testing whether a permute is valid and the vec_perm_const optab for actually emitting the permute. The former gets passed a vec<> selector and the latter an rtx selector. Most ports share a lot of code between the hook and the optab, with a wrapper function for each interface. We could try to keep that interface and require ports to define wider vector modes that could be attached to the CONST_VECTOR (e.g. V256HI or V256SI in the example above). But building a CONST_VECTOR rtx seems a bit pointless here, since the expand code only creates the CONST_VECTOR in order to call the optab, and the first thing the target does is take the CONST_VECTOR apart again. The easiest approach therefore seemed to be to remove the optab and reuse the target hook to emit the code. One potential drawback is that it's no longer possible to use match_operand predicates to force operands into the required form, but in practice all targets want register operands anyway. The patch also changes vec_perm_indices into a class that provides some simple routines for handling permutations. A later patch will flesh this out and get rid of auto_vec_perm_indices, but I didn't want to do all that in this patch and make it more complicated than it already is. 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * Makefile.in (OBJS): Add vec-perm-indices.o. * vec-perm-indices.h: New file. * vec-perm-indices.c: Likewise. * target.h (vec_perm_indices): Replace with a forward class declaration. (auto_vec_perm_indices): Move to vec-perm-indices.h. * optabs.h: Include vec-perm-indices.h. (expand_vec_perm): Delete. (selector_fits_mode_p, expand_vec_perm_var): Declare. (expand_vec_perm_const): Declare. * target.def (vec_perm_const_ok): Replace with... (vec_perm_const): ...this new hook. * doc/tm.texi.in (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Replace with... (TARGET_VECTORIZE_VEC_PERM_CONST): ...this new hook. * doc/tm.texi: Regenerate. * optabs.def (vec_perm_const): Delete. * doc/md.texi (vec_perm_const): Likewise. (vec_perm): Refer to TARGET_VECTORIZE_VEC_PERM_CONST. * expr.c (expand_expr_real_2): Use expand_vec_perm_const rather than expand_vec_perm for constant permutation vectors. Assert that the mode of variable permutation vectors is the integer equivalent of the mode that is being permuted. * optabs-query.h (selector_fits_mode_p): Declare. * optabs-query.c: Include vec-perm-indices.h. (selector_fits_mode_p): New function. (can_vec_perm_const_p): Check whether targetm.vectorize.vec_perm_const is defined, instead of checking whether the vec_perm_const_optab exists. Use targetm.vectorize.vec_perm_const instead of targetm.vectorize.vec_perm_const_ok. Check whether the indices fit in the vector mode before using a variable permute. * optabs.c (shift_amt_for_vec_perm_mask): Take a mode and a vec_perm_indices instead of an rtx. (expand_vec_perm): Replace with... (expand_vec_perm_const): ...this new function. Take the selector as a vec_perm_indices rather than an rtx. Also take the mode of the selector. Update call to shift_amt_for_vec_perm_mask. Use targetm.vectorize.vec_perm_const instead of vec_perm_const_optab. Use vec_perm_indices::new_expanded_vector to expand the original selector into bytes. Check whether the indices fit in the vector mode before using a variable permute. (expand_vec_perm_var): Make global. (expand_mult_highpart): Use expand_vec_perm_const. * fold-const.c: Includes vec-perm-indices.h. * tree-ssa-forwprop.c: Likewise. * tree-vect-data-refs.c: Likewise. * tree-vect-generic.c: Likewise. * tree-vect-loop.c: Likewise. * tree-vect-slp.c: Likewise. * tree-vect-stmts.c: Likewise. * config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm_const): Delete. * config/aarch64/aarch64-simd.md (vec_perm_const<mode>): Delete. * config/aarch64/aarch64.c (aarch64_expand_vec_perm_const) (aarch64_vectorize_vec_perm_const_ok): Fuse into... (aarch64_vectorize_vec_perm_const): ...this new function. (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. * config/arm/arm-protos.h (arm_expand_vec_perm_const): Delete. * config/arm/vec-common.md (vec_perm_const<mode>): Delete. * config/arm/arm.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. (arm_expand_vec_perm_const, arm_vectorize_vec_perm_const_ok): Merge into... (arm_vectorize_vec_perm_const): ...this new function. Explicitly check for NEON modes. * config/i386/i386-protos.h (ix86_expand_vec_perm_const): Delete. * config/i386/sse.md (VEC_PERM_CONST, vec_perm_const<mode>): Delete. * config/i386/i386.c (ix86_expand_vec_perm_const_1): Update comment. (ix86_expand_vec_perm_const, ix86_vectorize_vec_perm_const_ok): Merge into... (ix86_vectorize_vec_perm_const): ...this new function. Incorporate the old VEC_PERM_CONST conditions. * config/ia64/ia64-protos.h (ia64_expand_vec_perm_const): Delete. * config/ia64/vect.md (vec_perm_const<mode>): Delete. * config/ia64/ia64.c (ia64_expand_vec_perm_const) (ia64_vectorize_vec_perm_const_ok): Merge into... (ia64_vectorize_vec_perm_const): ...this new function. * config/mips/loongson.md (vec_perm_const<mode>): Delete. * config/mips/mips-msa.md (vec_perm_const<mode>): Delete. * config/mips/mips-ps-3d.md (vec_perm_constv2sf): Delete. * config/mips/mips-protos.h (mips_expand_vec_perm_const): Delete. * config/mips/mips.c (mips_expand_vec_perm_const) (mips_vectorize_vec_perm_const_ok): Merge into... (mips_vectorize_vec_perm_const): ...this new function. * config/powerpcspe/altivec.md (vec_perm_constv16qi): Delete. * config/powerpcspe/paired.md (vec_perm_constv2sf): Delete. * config/powerpcspe/spe.md (vec_perm_constv2si): Delete. * config/powerpcspe/vsx.md (vec_perm_const<mode>): Delete. * config/powerpcspe/powerpcspe-protos.h (altivec_expand_vec_perm_const) (rs6000_expand_vec_perm_const): Delete. * config/powerpcspe/powerpcspe.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. (altivec_expand_vec_perm_const_le): Take each operand individually. Operate on constant selectors rather than rtxes. (altivec_expand_vec_perm_const): Likewise. Update call to altivec_expand_vec_perm_const_le. (rs6000_expand_vec_perm_const): Delete. (rs6000_vectorize_vec_perm_const_ok): Delete. (rs6000_vectorize_vec_perm_const): New function. (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of an element count and rtx array. (rs6000_expand_extract_even): Update call accordingly. (rs6000_expand_interleave): Likewise. * config/rs6000/altivec.md (vec_perm_constv16qi): Delete. * config/rs6000/paired.md (vec_perm_constv2sf): Delete. * config/rs6000/vsx.md (vec_perm_const<mode>): Delete. * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_const) (rs6000_expand_vec_perm_const): Delete. * config/rs6000/rs6000.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. (altivec_expand_vec_perm_const_le): Take each operand individually. Operate on constant selectors rather than rtxes. (altivec_expand_vec_perm_const): Likewise. Update call to altivec_expand_vec_perm_const_le. (rs6000_expand_vec_perm_const): Delete. (rs6000_vectorize_vec_perm_const_ok): Delete. (rs6000_vectorize_vec_perm_const): New function. Remove stray reference to the SPE evmerge intructions. (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of an element count and rtx array. (rs6000_expand_extract_even): Update call accordingly. (rs6000_expand_interleave): Likewise. * config/sparc/sparc.md (vec_perm_constv8qi): Delete in favor of... * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): ...this new function. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. From-SVN: r256093
2018-01-02Split can_vec_perm_p into can_vec_perm_{var,const}_pRichard Sandiford1-19/+19
This patch splits can_vec_perm_p into two functions: can_vec_perm_var_p for testing permute operations with variable selection vectors, and can_vec_perm_const_p for testing permute operations with specific constant selection vectors. This means that we can pass the constant selection vector by reference. Constant permutes can still use a variable permute as a fallback. A later patch adds a check to makre sure that we don't truncate the vector indices when doing this. However, have_whole_vector_shift checked: if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing) return false; which had the effect of disallowing the fallback to variable permutes. I'm not sure whether that was the intention or whether it was just supposed to short-cut the loop on targets that don't support permutes. (But then why bother? The first check in the loop would fail and we'd bail out straightaway.) The patch adds a parameter for disallowing the fallback. I think it makes sense to do this for the following code in the VEC_PERM_EXPR folder: /* Some targets are deficient and fail to expand a single argument permutation while still allowing an equivalent 2-argument version. */ if (need_mask_canon && arg2 == op2 && !can_vec_perm_p (TYPE_MODE (type), false, &sel) && can_vec_perm_p (TYPE_MODE (type), false, &sel2)) since it's really testing whether the expand_vec_perm_const code expects a particular form. 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * optabs-query.h (can_vec_perm_p): Delete. (can_vec_perm_var_p, can_vec_perm_const_p): Declare. * optabs-query.c (can_vec_perm_p): Split into... (can_vec_perm_var_p, can_vec_perm_const_p): ...these two functions. (can_mult_highpart_p): Use can_vec_perm_const_p to test whether a particular selector is valid. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. (vect_grouped_load_supported): Likewise. (vect_shift_permute_load_chain): Likewise. * tree-vect-slp.c (vect_build_slp_tree_1): Likewise. (vect_transform_slp_perm_load): Likewise. * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. (vectorizable_bswap): Likewise. (vect_gen_perm_mask_checked): Likewise. * fold-const.c (fold_ternary_loc): Likewise. Don't take implementations of variable permutation vectors into account when deciding which selector to use. * tree-vect-loop.c (have_whole_vector_shift): Don't check whether vec_perm_const_optab is supported; instead use can_vec_perm_const_p with a false third argument. * tree-vect-generic.c (lower_vec_perm): Use can_vec_perm_const_p to test whether the constant selector is valid and can_vec_perm_var_p to test whether a variable selector is valid. From-SVN: r256091
2017-12-21poly_int: compute_data_ref_alignmentRichard Sandiford1-3/+17
This patch makes vect_compute_data_ref_alignment treat DR_INIT as a poly_int and handles cases in which the calculated misalignment might not be constant. 2017-12-21 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Treat drb->init as a poly_int. Fail if its misalignment wrt vector_alignment isn't known. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255935
2017-12-21poly_int: MEM_REF offsetsRichard Sandiford1-4/+1
This patch allows MEM_REF offsets to be polynomial, with mem_ref_offset now returning a poly_offset_int instead of an offset_int. The non-mechanical changes to callers of mem_ref_offset were handled by previous patches. 2017-12-21 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * fold-const.h (mem_ref_offset): Return a poly_offset_int rather than an offset_int. * tree.c (mem_ref_offset): Likewise. (build_simple_mem_ref_loc): Treat MEM_REF offsets as poly_ints. * builtins.c (get_object_alignment_2): Likewise. * expr.c (get_inner_reference, expand_expr_real_1): Likewise. * gimple-fold.c (get_base_constructor): Likewise. * gimple-ssa-strength-reduction.c (restructure_reference): Likewise. * gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref): Likewise. * ipa-polymorphic-call.c (ipa_polymorphic_call_context::ipa_polymorphic_call_context): Likewise. * ipa-prop.c (compute_complex_assign_jump_func): Likewise. (get_ancestor_addr_info): Likewise. * ipa-param-manipulation.c (ipa_get_adjustment_candidate): Likewise. * match.pd: Likewise. * tree-data-ref.c (dr_analyze_innermost): Likewise. * tree-dfa.c (get_addr_base_and_unit_offset_1): Likewise. * tree-eh.c (tree_could_trap_p): Likewise. * tree-object-size.c (addr_object_size): Likewise. * tree-ssa-address.c (copy_ref_info): Likewise. * tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Likewise. (indirect_refs_may_alias_p): Likewise. * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise. * tree-ssa.c (maybe_rewrite_mem_ref_base): Likewise. (non_rewritable_mem_ref_base): Likewise. * tree-vect-data-refs.c (vect_check_gather_scatter): Likewise. * tree-vrp.c (vrp_prop::check_array_ref): Likewise. * varasm.c (decode_addr_const): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255930
2017-12-21poly_int: get_inner_reference & co.Richard Sandiford1-4/+6
This patch makes get_inner_reference and ptr_difference_const return the bit size and bit position as poly_int64s rather than HOST_WIDE_INTS. The non-mechanical changes were handled by previous patches. 2017-12-21 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree.h (get_inner_reference): Return the bitsize and bitpos as poly_int64_pods rather than HOST_WIDE_INT. * fold-const.h (ptr_difference_const): Return the pointer difference as a poly_int64_pod rather than a HOST_WIDE_INT. * expr.c (get_inner_reference): Return the bitsize and bitpos as poly_int64_pods rather than HOST_WIDE_INT. (expand_expr_addr_expr_1, expand_expr_real_1): Track polynomial offsets and sizes. * fold-const.c (make_bit_field_ref): Take the bitpos as a poly_int64 rather than a HOST_WIDE_INT. Update call to get_inner_reference. (optimize_bit_field_compare): Update call to get_inner_reference. (decode_field_reference): Likewise. (fold_unary_loc): Track polynomial offsets and sizes. (split_address_to_core_and_offset): Return the bitpos as a poly_int64_pod rather than a HOST_WIDE_INT. (ptr_difference_const): Likewise for the pointer difference. * asan.c (instrument_derefs): Track polynomial offsets and sizes. * config/mips/mips.c (r10k_safe_mem_expr_p): Likewise. * dbxout.c (dbxout_expand_expr): Likewise. * dwarf2out.c (loc_list_for_address_of_addr_expr_of_indirect_ref) (loc_list_from_tree_1, fortran_common): Likewise. * gimple-laddress.c (pass_laddress::execute): Likewise. * gimple-ssa-store-merging.c (find_bswap_or_nop_load): Likewise. * gimplify.c (gimplify_scan_omp_clauses): Likewise. * simplify-rtx.c (delegitimize_mem_from_attrs): Likewise. * tree-affine.c (tree_to_aff_combination): Likewise. (get_inner_reference_aff): Likewise. * tree-data-ref.c (split_constant_offset_1): Likewise. (dr_analyze_innermost): Likewise. * tree-scalar-evolution.c (interpret_rhs_expr): Likewise. * tree-sra.c (ipa_sra_check_caller): Likewise. * tree-vect-data-refs.c (vect_check_gather_scatter): Likewise. * ubsan.c (maybe_instrument_pointer_overflow): Likewise. (instrument_bool_enum_load, instrument_object_size): Likewise. * gimple-ssa-strength-reduction.c (slsr_process_ref): Update call to get_inner_reference. * hsa-gen.c (gen_hsa_addr): Likewise. * sanopt.c (maybe_optimize_ubsan_ptr_ifn): Likewise. * tsan.c (instrument_expr): Likewise. * match.pd: Update call to ptr_difference_const. gcc/ada/ * gcc-interface/trans.c (Attribute_to_gnu): Track polynomial offsets and sizes. * gcc-interface/utils2.c (build_unary_op): Likewise. gcc/cp/ * constexpr.c (check_automatic_or_tls): Track polynomial offsets and sizes. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255914
2017-12-20poly_int: tree constantsRichard Sandiford1-1/+1
This patch adds a tree representation for poly_ints. Unlike the rtx version, the coefficients are INTEGER_CSTs rather than plain integers, so that we can easily access them as poly_widest_ints and poly_offset_ints. The patch also adjusts some places that previously relied on "constant" meaning "INTEGER_CST". It also makes sure that the TYPE_SIZE agrees with the TYPE_SIZE_UNIT for vector booleans, given the existing: /* Several boolean vector elements may fit in a single unit. */ if (VECTOR_BOOLEAN_TYPE_P (type) && type->type_common.mode != BLKmode) TYPE_SIZE_UNIT (type) = size_int (GET_MODE_SIZE (type->type_common.mode)); else TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR, TYPE_SIZE_UNIT (innertype), size_int (nunits)); 2017-12-20 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/generic.texi (POLY_INT_CST): Document. * tree.def (POLY_INT_CST): New tree code. * treestruct.def (TS_POLY_INT_CST): New tree layout. * tree-core.h (tree_poly_int_cst): New struct. (tree_node): Add a poly_int_cst field. * tree.h (POLY_INT_CST_P, POLY_INT_CST_COEFF): New macros. (wide_int_to_tree, force_fit_type): Take a poly_wide_int_ref instead of a wide_int_ref. (build_int_cst, build_int_cst_type): Take a poly_int64 instead of a HOST_WIDE_INT. (build_int_cstu, build_array_type_nelts): Take a poly_uint64 instead of an unsigned HOST_WIDE_INT. (build_poly_int_cst, tree_fits_poly_int64_p, tree_fits_poly_uint64_p) (ptrdiff_tree_p): Declare. (tree_to_poly_int64, tree_to_poly_uint64): Likewise. Provide extern inline implementations if the target doesn't use POLY_INT_CST. (poly_int_tree_p): New function. (wi::unextended_tree): New class. (wi::int_traits <unextended_tree>): New override. (wi::extended_tree): Add a default constructor. (wi::extended_tree::get_tree): New function. (wi::widest_extended_tree, wi::offset_extended_tree): New typedefs. (wi::tree_to_widest_ref, wi::tree_to_offset_ref): Use them. (wi::tree_to_poly_widest_ref, wi::tree_to_poly_offset_ref) (wi::tree_to_poly_wide_ref): New typedefs. (wi::ints_for): Provide overloads for extended_tree and unextended_tree. (poly_int_cst_value, wi::to_poly_widest, wi::to_poly_offset) (wi::to_wide): New functions. (wi::fits_to_boolean_p, wi::fits_to_tree_p): Handle poly_ints. * tree.c (poly_int_cst_hasher): New struct. (poly_int_cst_hash_table): New variable. (tree_node_structure_for_code, tree_code_size, simple_cst_equal) (valid_constant_size_p, add_expr, drop_tree_overflow): Handle POLY_INT_CST. (initialize_tree_contains_struct): Handle TS_POLY_INT_CST. (init_ttree): Initialize poly_int_cst_hash_table. (build_int_cst, build_int_cst_type, build_invariant_address): Take a poly_int64 instead of a HOST_WIDE_INT. (build_int_cstu, build_array_type_nelts): Take a poly_uint64 instead of an unsigned HOST_WIDE_INT. (wide_int_to_tree): Rename to... (wide_int_to_tree_1): ...this. (build_new_poly_int_cst, build_poly_int_cst): New functions. (force_fit_type): Take a poly_wide_int_ref instead of a wide_int_ref. (wide_int_to_tree): New function that takes a poly_wide_int_ref. (ptrdiff_tree_p, tree_to_poly_int64, tree_to_poly_uint64) (tree_fits_poly_int64_p, tree_fits_poly_uint64_p): New functions. * lto-streamer-out.c (DFS::DFS_write_tree_body, hash_tree): Handle TS_POLY_INT_CST. * tree-streamer-in.c (lto_input_ts_poly_tree_pointers): Likewise. (streamer_read_tree_body): Likewise. * tree-streamer-out.c (write_ts_poly_tree_pointers): Likewise. (streamer_write_tree_body): Likewise. * tree-streamer.c (streamer_check_handled_ts_structures): Likewise. * asan.c (asan_protect_global): Require the size to be an INTEGER_CST. * cfgexpand.c (expand_debug_expr): Handle POLY_INT_CST. * expr.c (expand_expr_real_1, const_vector_from_tree): Likewise. * gimple-expr.h (is_gimple_constant): Likewise. * gimplify.c (maybe_with_size_expr): Likewise. * print-tree.c (print_node): Likewise. * tree-data-ref.c (data_ref_compare_tree): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. * tree-ssa-address.c (addr_for_mem_ref): Likewise. * tree-vect-data-refs.c (dr_group_sort_cmp): Likewise. * tree-vrp.c (compare_values_warnv): Likewise. * tree-ssa-loop-ivopts.c (determine_base_object, constant_multiple_of) (get_loop_invariant_expr, add_candidate_1, get_computation_aff_1) (force_expr_to_var_cost): Likewise. * tree-ssa-loop.c (for_each_index): Likewise. * fold-const.h (build_invariant_address, size_int_kind): Take a poly_int64 instead of a HOST_WIDE_INT. * fold-const.c (fold_negate_expr_1, const_binop, const_unop) (fold_convert_const, multiple_of_p, fold_negate_const): Handle POLY_INT_CST. (size_binop_loc): Likewise. Allow int_const_binop_1 to fail. (int_const_binop_2): New function, split out from... (int_const_binop_1): ...here. Handle POLY_INT_CST. (size_int_kind): Take a poly_int64 instead of a HOST_WIDE_INT. * expmed.c (make_tree): Handle CONST_POLY_INT_P. * gimple-ssa-strength-reduction.c (slsr_process_add) (slsr_process_mul): Check for INTEGER_CSTs before using them as candidates. * stor-layout.c (bits_from_bytes): New function. (bit_from_pos): Use it. (layout_type): Likewise. For vectors, multiply the TYPE_SIZE_UNIT by BITS_PER_UNIT to get the TYPE_SIZE. * tree-cfg.c (verify_expr, verify_types_in_gimple_reference): Allow MEM_REF and TARGET_MEM_REF offsets to be a POLY_INT_CST. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255863
2017-12-01re PR tree-optimization/83232 (fma3d spec2000 regression on zen with -Ofast ↵Richard Biener1-9/+20
(generic tuning) after r255268 by missed SLP oppurtunity) 2017-12-01 Richard Biener <rguenther@suse.de> PR tree-optimization/83232 * tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Fix detection of same access. Instead of breaking the group here do not consider the duplicate. Add comment explaining real fix. * gfortran.dg/vect/pr83232.f90: New testcase. From-SVN: r255307
2017-10-06re PR tree-optimization/82397 (qsort comparator non-negative on sorted ↵Richard Biener1-31/+18
output: 1 in vect_analyze_data_ref_accesses) 2017-10-06 Richard Biener <rguenther@suse.de> PR tree-optimization/82397 * tree-vect-data-refs.c (dr_group_sort_cmp): Do not use operand_equal_p but rely on data_ref_compare_tree for detecting equalities. (vect_analyze_data_ref_accesses): Use data_ref_compare_tree to match up with dr_group_sort_cmp. * gfortran.dg/pr82397.f: New testcase. From-SVN: r253482
2017-09-22PR82289: Computing peeling costs for irrelevant drsRichard Sandiford1-0/+3
This PR shows that we weren't filtering out irrelevant stmts in vect_get_peeling_costs_all_drs (unlike related loops in which we iterate over all datarefs). 2017-09-22 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/82289 * tree-vect-data-refs.c (vect_get_peeling_costs_all_drs): Check STMT_VINFO_RELEVANT_P. gcc/testsuite/ PR tree-optimization/82289 * gcc.dg/vect/pr82289.c: New test. From-SVN: r253103
2017-09-22Let the target choose a vectorisation alignmentRichard Sandiford1-40/+52
The vectoriser aligned vectors to TYPE_ALIGN unconditionally, although there was also a hard-coded assumption that this was equal to the type size. This was inconvenient for SVE for two reasons: - When compiling for a specific power-of-2 SVE vector length, we might want to align to a full vector. However, the TYPE_ALIGN is governed by the ABI alignment, which is 128 bits regardless of size. - For vector-length-agnostic code it doesn't usually make sense to align, since the runtime vector length might not be a power of two. Even for power of two sizes, there's no guarantee that aligning to the previous 16 bytes will be an improveent. This patch therefore adds a target hook to control the preferred vectoriser (as opposed to ABI) alignment. 2017-09-22 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * target.def (preferred_vector_alignment): New hook. * doc/tm.texi.in (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): New hook. * doc/tm.texi: Regenerate. * targhooks.h (default_preferred_vector_alignment): Declare. * targhooks.c (default_preferred_vector_alignment): New function. * tree-vectorizer.h (dataref_aux): Add a target_alignment field. Expand commentary. (DR_TARGET_ALIGNMENT): New macro. (aligned_access_p): Update commentary. (vect_known_alignment_in_bytes): New function. * tree-vect-data-refs.c (vect_calculate_required_alignment): New function. (vect_compute_data_ref_alignment): Set DR_TARGET_ALIGNMENT. Calculate the misalignment based on the target alignment rather than the vector size. (vect_update_misalignment_for_peel): Use DR_TARGET_ALIGMENT rather than TYPE_ALIGN / BITS_PER_UNIT to update the misalignment. (vect_enhance_data_refs_alignment): Mask the byte misalignment with the target alignment, rather than masking the element misalignment with the number of elements in a vector. Also use the target alignment when calculating the maximum number of peels. (vect_find_same_alignment_drs): Use vect_calculate_required_alignment instead of TYPE_ALIGN_UNIT. (vect_duplicate_ssa_name_ptr_info): Remove stmt_info parameter. Measure DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT. (vect_create_addr_base_for_vector_ref): Update call accordingly. (vect_create_data_ref_ptr): Likewise. (vect_setup_realignment): Realign by ANDing with -DR_TARGET_MISALIGNMENT. * tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Calculate the number of peels based on DR_TARGET_ALIGNMENT. * tree-vect-stmts.c (get_group_load_store_type): Compare the gap with the guaranteed alignment boundary when deciding whether overrun is OK. (vectorizable_mask_load_store): Interpret DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT instead of TYPE_ALIGN_UNIT. (ensure_base_align): Remove stmt_info parameter. Get the target base alignment from DR_TARGET_ALIGNMENT. (vectorizable_store): Update call accordingly. Interpret DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT instead of TYPE_ALIGN_UNIT. (vectorizable_load): Likewise. gcc/testsuite/ * gcc.dg/vect/vect-outer-3a.c: Adjust dump scan for new wording of alignment message. * gcc.dg/vect/vect-outer-3a-big-array.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r253101
2017-09-22Add a vect_get_scalar_dr_size helper functionRichard Sandiford1-7/+4
This patch adds a helper function for getting the number of bytes accessed by an unvectorised data reference, which helps when general modes have a variable size. 2017-09-22 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vect_get_scalar_dr_size): New function. * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use it. (vect_enhance_data_refs_alignment): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r253099
2017-09-18Fix an SVE failure in the Fortran matmul* testsRichard Sandiford1-0/+5
The vectoriser was calling vect_get_smallest_scalar_type without having proven that the type actually is a scalar. This seems to be the intended behaviour: the ultimate test of whether the type is interesting (and hence scalar) is whether an associated vector type exists, but this is only tested later. The patch simply makes the function cope gracefully with non-scalar inputs. 2017-09-18 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-data-refs.c (vect_get_smallest_scalar_type): Cope with types that aren't in fact scalar. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r252934
2017-09-14Add LOOP_VINFO_MAX_VECT_FACTORRichard Sandiford1-1/+1
Epilogue vectorisation uses the vectorisation factor of the main loop as the maximum vectorisation factor allowed for correctness. That makes sense as a conservatively correct value, since the chosen vectorisation factor will be strictly less than that anyway. However, once the VF itself becomes variable, it's easier to carry across the original maximum VF instead. 2017-09-14 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (_loop_vec_info): Add max_vectorization_factor. (LOOP_VINFO_MAX_VECT_FACTOR): New macro. (LOOP_VINFO_ORIG_VECT_FACTOR): Replace with... (LOOP_VINFO_ORIG_MAX_VECT_FACTOR): ...this new macro. * tree-vect-data-refs.c (vect_analyze_data_ref_dependences): Update accordingly. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize max_vectorization_factor. (vect_analyze_loop_2): Set LOOP_VINFO_MAX_VECT_FACTOR. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r252766
2017-09-14Add a vect_get_num_copies helper routineRichard Sandiford1-3/+6
This patch adds a vectoriser helper routine to calculate how many copies of a vector statement we need. At present this is always: LOOP_VINFO_VECT_FACTOR (loop_vinfo) / TYPE_VECTOR_SUBPARTS (vectype) but later patches add other cases. Another benefit of using a helper routine is that it can assert that the division is exact (which it must be). 2017-09-14 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vect_get_num_copies): New function. * tree-vect-data-refs.c (vect_get_data_access_cost): Use it. * tree-vect-loop.c (vectorizable_reduction): Likewise. (vectorizable_induction): Likewise. (vectorizable_live_operation): Likewise. * tree-vect-stmts.c (vectorizable_mask_load_store): Likewise. (vectorizable_bswap): Likewise. (vectorizable_call): Likewise. (vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise. (vectorizable_comparison): Likewise. (vect_analyze_stmt): Pass the slp node to vectorizable_live_operation. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r252764
2017-09-14Use vec<> for constant permute masksRichard Sandiford1-27/+35
This patch makes can_vec_perm_p & co. take a vec<>, wrapped in new typedefs vec_perm_indices and auto_vec_perm_indices. There are two reasons for doing this for SVE: (1) it means that the number of elements is bundled with the elements themselves, and is obviously constant. (2) it makes it easier to change the "unsigned char" element type to something wider. Changing the target hook is left as follow-on work. 2017-09-14 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * target.h (vec_perm_indices): New typedef. (auto_vec_perm_indices): Likewise. * optabs-query.h: Include target.h (can_vec_perm_p): Take a vec_perm_indices *. * optabs-query.c (can_vec_perm_p): Likewise. (can_mult_highpart_p): Update accordingly. Use auto_vec_perm_indices. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-vect-generic.c (lower_vec_perm): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. (vect_grouped_load_supported): Likewise. (vect_shift_permute_load_chain): Likewise. (vect_permute_store_chain): Use auto_vec_perm_indices. (vect_permute_load_chain): Likewise. * fold-const.c (fold_vec_perm): Take vec_perm_indices. (fold_ternary_loc): Update accordingly. Use auto_vec_perm_indices. Update uses of can_vec_perm_p. * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Replace the mode with a number of elements. Take a vec_perm_indices *. (vect_create_epilog_for_reduction): Update accordingly. Use auto_vec_perm_indices. (have_whole_vector_shift): Likewise. Update call to can_vec_perm_p. * tree-vect-slp.c (vect_build_slp_tree_1): Likewise. (vect_transform_slp_perm_load): Likewise. (vect_schedule_slp_instance): Use auto_vec_perm_indices. * tree-vectorizer.h (vect_gen_perm_mask_any): Take a vec_perm_indices. (vect_gen_perm_mask_checked): Likewise. * tree-vect-stmts.c (vect_gen_perm_mask_any): Take a vec_perm_indices. (vect_gen_perm_mask_checked): Likewise. (vectorizable_mask_load_store): Use auto_vec_perm_indices. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (perm_mask_for_reverse): Likewise. Update call to can_vec_perm_p. (vectorizable_bswap): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r252761
2017-08-30[17/77] Add an int_mode_for_size helper functionRichard Sandiford1-5/+4
This patch adds a wrapper around mode_for_size for cases in which the mode class is MODE_INT (the commonest case). The return type can then be an opt_scalar_int_mode instead of a machine_mode. 2017-08-30 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * machmode.h (int_mode_for_size): New function. * builtins.c (set_builtin_user_assembler_name): Use int_mode_for_size instead of mode_for_size. * calls.c (save_fixed_argument_area): Likewise. Make use of BLKmode explicit. * combine.c (expand_field_assignment): Use int_mode_for_size instead of mode_for_size. (make_extraction): Likewise. (simplify_shift_const_1): Likewise. (simplify_comparison): Likewise. * dojump.c (do_jump): Likewise. * dwarf2out.c (mem_loc_descriptor): Likewise. * emit-rtl.c (init_derived_machine_modes): Likewise. * expmed.c (flip_storage_order): Likewise. (convert_extracted_bit_field): Likewise. * expr.c (copy_blkmode_from_reg): Likewise. * graphite-isl-ast-to-gimple.c (max_mode_int_precision): Likewise. * internal-fn.c (expand_mul_overflow): Likewise. * lower-subreg.c (simple_move): Likewise. * optabs-libfuncs.c (init_optabs): Likewise. * simplify-rtx.c (simplify_unary_operation_1): Likewise. * tree.c (vector_type_mode): Likewise. * tree-ssa-strlen.c (handle_builtin_memcmp): Likewise. * tree-vect-data-refs.c (vect_lanes_optab_supported_p): Likewise. * tree-vect-generic.c (expand_vector_parallel): Likewise. * tree-vect-stmts.c (vectorizable_load): Likewise. (vectorizable_store): Likewise. gcc/ada/ * gcc-interface/decl.c (gnat_to_gnu_entity): Use int_mode_for_size instead of mode_for_size. (gnat_to_gnu_subprog_type): Likewise. * gcc-interface/utils.c (make_type_from_size): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r251469
2017-08-04Pool alignment information for common basesRichard Sandiford1-3/+77
This patch is a follow-on to the fix for PR81136. The testcase for that PR shows that we can (correctly) calculate different base alignments for two data_references but still tell that their misalignments wrt the vector size are equal. This is because we calculate the base alignments for each dr individually, without looking at the other drs, and in general the alignment we calculate is only guaranteed if the dr's DR_REF actually occurs. This is working as designed, but it does expose a missed opportunity. We know that if a vectorised loop is reached, all statements in that loop execute at least once, so it should be safe to pool the alignment information for all the statements we're vectorising. The only catch is that DR_REFs for masked loads and stores only occur if the mask value is nonzero. For example, in: struct s __attribute__((aligned(32))) { int misaligner; int array[N]; }; int *ptr; for (int i = 0; i < n; ++i) ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0; we can only guarantee that ptr points to a "struct s" if at least one c[i] is true. This patch adds a DR_IS_CONDITIONAL_IN_STMT flag to record whether the DR_REF is guaranteed to occur every time that the statement executes to completion. It then pools the alignment information for references that aren't conditional in this sense. 2017-08-04 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/81136 * tree-vectorizer.h: Include tree-hash-traits.h. (vec_base_alignments): New typedef. (vec_info): Add a base_alignments field. (vect_record_base_alignments): Declare. * tree-data-ref.h (data_reference): Add an is_conditional_in_stmt field. (DR_IS_CONDITIONAL_IN_STMT): New macro. (create_data_ref): Add an is_conditional_in_stmt argument. * tree-data-ref.c (create_data_ref): Likewise. Use it to initialize the is_conditional_in_stmt field. (data_ref_loc): Add an is_conditional_in_stmt field. (get_references_in_stmt): Set the is_conditional_in_stmt field. (find_data_references_in_stmt): Update call to create_data_ref. (graphite_find_data_references_in_stmt): Likewise. * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise. * tree-vect-data-refs.c (vect_analyze_data_refs): Likewise. (vect_record_base_alignment): New function. (vect_record_base_alignments): Likewise. (vect_compute_data_ref_alignment): Adjust base_addr and aligned_to for nested statements even if we fail to compute a misalignment. Use pooled base alignments for unconditional references. (vect_find_same_alignment_drs): Compare base addresses instead of base objects. (vect_analyze_data_refs_alignment): Call vect_record_base_alignments. * tree-vect-slp.c (vect_slp_analyze_bb_1): Likewise. gcc/testsuite/ PR tree-optimization/81136 * gcc.dg/vect/pr81136.c: Add scan test. From-SVN: r250870
2017-08-04Use base inequality for some vector alias checksRichard Sandiford1-7/+32
This patch checks whether two data references x and y cannot partially overlap and so are independent whenever &x != &y. We can then use this in the vectoriser to optimise alias checks. gcc/ 2016-08-04 Richard Sandiford <richard.sandiford@linaro.org> * hash-traits.h (pair_hash): New struct. * tree-data-ref.h (data_dependence_relation): Add object_a and object_b fields. (DDR_OBJECT_A, DDR_OBJECT_B): New macros. * tree-data-ref.c (initialize_data_dependence_relation): Initialize DDR_OBJECT_A and DDR_OBJECT_B. * tree-vectorizer.h (vec_object_pair): New type. (_loop_vec_info): Add a check_unequal_addrs field. (LOOP_VINFO_CHECK_UNEQUAL_ADDRS): New macro. (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Return true if there is an entry in check_unequal_addrs. Check comp_alias_ddrs instead of may_alias_ddrs. * tree-vect-loop.c (destroy_loop_vec_info): Release LOOP_VINFO_CHECK_UNEQUAL_ADDRS. (vect_analyze_loop_2): Likewise, when restarting. (vect_estimate_min_profitable_iters): Estimate the cost of LOOP_VINFO_CHECK_UNEQUAL_ADDRS. * tree-vect-data-refs.c: Include tree-hash-traits.h. (vect_prune_runtime_alias_test_list): Try to handle conflicts using LOOP_VINFO_CHECK_UNEQUAL_ADDRS, if the data dependence allows. Count such tests in the final summary. * tree-vect-loop-manip.c (chain_cond_expr): New function. (vect_create_cond_for_align_checks): Use it. (vect_create_cond_for_unequal_addrs): New function. (vect_loop_versioning): Call it. gcc/testsuite/ * gcc.dg/vect/vect-alias-check-6.c: New test. From-SVN: r250868
2017-08-04Handle data dependence relations with different basesRichard Sandiford1-4/+107
This patch tries to calculate conservatively-correct distance vectors for two references whose base addresses are not the same. It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence isn't guaranteed to occur. The motivating example is: struct s { int x[8]; }; void f (struct s *a, struct s *b) { for (int i = 0; i < 8; ++i) a->x[i] += b->x[i]; } in which the "a" and "b" accesses are either independent or have a dependence distance of 0 (assuming -fstrict-aliasing). Neither case prevents vectorisation, so we can vectorise without an alias check. I'd originally wanted to do the same thing for arrays as well, e.g.: void f (int a[][8], struct b[][8]) { for (int i = 0; i < 8; ++i) a[0][i] += b[0][i]; } I think this is valid because C11 6.7.6.2/6 says: For two array types to be compatible, both shall have compatible element types, and if both size specifiers are present, and are integer constant expressions, then both size specifiers shall have the same constant value. So if we access an array through an int (*)[8], it must have type X[8] or X[], where X is compatible with int. It doesn't seem possible in either case for "a[0]" and "b[0]" to overlap when "a != b". However, as the comment above "if (same_base_p)" explains, GCC is more forgiving: it supports arbitrary overlap of arrays and allows arrays to be accessed with different dimensionality. There are examples of this in PR50067. The patch therefore only handles references that end in a structure field access. There are two ways of handling these dependences in the vectoriser: use them to limit VF, or check at runtime as before. I've gone for the approach of checking at runtime if we can, to avoid limiting VF unnecessarily, but falling back to a VF cap when runtime checks aren't allowed. The patch tests whether we queued an alias check with a dependence distance of X and then picked a VF <= X, in which case it's safe to drop the alias check. Since vect_prune_runtime_alias_check_list can be called twice with different VF for the same loop, it's no longer safe to clear may_alias_ddrs on exit. Instead we should use comp_alias_ddrs to check whether versioning is necessary. 2017-08-04 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-data-ref.h (subscript): Add access_fn field. (data_dependence_relation): Add could_be_independent_p. (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. (same_access_functions): Move to tree-data-ref.c. * tree-data-ref.c (ref_contains_union_access_p): New function. (access_fn_component_p): Likewise. (access_fn_components_comparable_p): Likewise. (dr_analyze_indices): Add a reference to access_fn_component_p. (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of DR_ACCESS_FN. (constant_access_functions): Likewise. (add_other_self_distances): Likewise. (same_access_functions): Likewise. (Moved from tree-data-ref.h.) (initialize_data_dependence_relation): Use XCNEW and remove explicit zeroing of DDR_REVERSED_P. Look for a subsequence of access functions that have the same type. Allow the subsequence to end with different bases in some circumstances. Record the chosen access functions in SUB_ACCESS_FN. (build_classic_dist_vector_1): Replace ddr_a and ddr_b with a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. (subscript_dependence_tester_1): Likewise dra and drb. (build_classic_dist_vector): Update calls accordingly. (subscript_dependence_tester): Likewise. * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check DDR_COULD_BE_INDEPENDENT_P. * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test comp_alias_ddrs instead of may_alias_ddrs. * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr): New function. (vect_analyze_data_ref_dependence): Use it if DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded distance vectors if that fails. (dependence_distance_ge_vf): New function. (vect_prune_runtime_alias_test_list): Use it. Don't clear LOOP_VINFO_MAY_ALIAS_DDRS. gcc/testsuite/ * gcc.dg/vect/vect-alias-check-3.c: New test. * gcc.dg/vect/vect-alias-check-4.c: Likewise. * gcc.dg/vect/vect-alias-check-5.c: Likewise. From-SVN: r250867
2017-07-21re PR tree-optimization/81303 (410.bwaves regression caused by r249919)Richard Biener1-18/+14
2017-07-21 Richard Biener <rguenther@suse.de> PR tree-optimization/81303 * tree-vect-data-refs.c (vect_get_peeling_costs_all_drs): Pass in datarefs vector. Allow NULL dr0 for no peeling cost estimate. (vect_peeling_hash_get_lowest_cost): Adjust. (vect_enhance_data_refs_alignment): Likewise. Use vect_get_peeling_costs_all_drs to compute the penalty for no peeling to match up costs. From-SVN: r250424
2017-07-18Fix PR81362: Vector peelingRobin Dapp1-22/+8
npeel was erroneously overwritten by vect_peeling_hash_get_lowest_cost although the corresponding dataref is not used afterwards. It should be safe to get rid of the npeel parameter since we use the returned peeling_info's npeel anyway. Also removed the body_cost_vec parameter which is not used elsewhere. gcc/ChangeLog: 2017-07-18 Robin Dapp <rdapp@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Remove body_cost_vec from _vect_peel_extended_info. (vect_peeling_hash_get_lowest_cost): Do not set body_cost_vec. (vect_peeling_hash_choose_best_peeling): Remove body_cost_vec and npeel. From-SVN: r250300