path: root/gcc/tree-vect-loop.c
2018-01-19 | Check whether any statements need masking (PR 83922) | Richard Sandiford | 1 file changed, -2/+34
This PR is an odd case in which, due to the low optimisation level, we enter vectorisation with:

    outer1:
      x_1 = PHI <x_3(outer2), ...>;
      ...

    inner:
      x_2 = 0;
      ...

    outer2:
      x_3 = PHI <x_2(inner)>;

These statements are tentatively treated as a double reduction by vect_force_simple_reduction, but in the end only x_3 and x_2 are marked as relevant. vect_analyze_loop_operations skips over x_3, leaving the vectorizable_reduction check to a presumed future test of x_1, which in this case never happens. We therefore end up vectorising x_2 only (complete with peeling for niters!) and leave the scalar x_3 in place. This caused a segfault in the support for fully-masked loops, since there were no statements that needed masking. Fixed by checking for that. But I think this is also a flaw in vect_analyze_loop_operations. Outer loop vectorisation reduces the number of times that the inner loop is executed, so it wouldn't necessarily be valid to leave the scalar x_3 in place for all vectorisable x_2. There's already code to forbid that when x_1 isn't present:

    /* FORNOW: we currently don't support the case that these phis are not used in the
       outerloop (unless it is double reduction, i.e., this phi is vect_reduction_def),
       cause this case requires to actually do something here.  */

I think we need to do the same if x_1 is present but not relevant. 2018-01-19 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/83922 * tree-vect-loop.c (vect_verify_full_masking): Return false if there are no statements that need masking. (vect_active_double_reduction_p): New function. (vect_analyze_loop_operations): Use it when handling phis that are not in the loop header. gcc/testsuite/ PR tree-optimization/83922 * gcc.dg/pr83922.c: New test. From-SVN: r256885
2018-01-19 | Avoid ICE for nested inductions (PR 83914) | Richard Sandiford | 1 file changed, -27/+23
This testcase ICEd because we converted the initial value of an induction to the vector element type even for nested inductions. This isn't necessary because the initial expression is vectorised normally, and it meant that init_expr was no longer the original statement operand by the time we called vect_get_vec_def_for_operand. Also, adding the conversion code here made the existing SLP conversion redundant. 2018-01-19 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/83914 * tree-vect-loop.c (vectorizable_induction): Don't convert init_expr or apply the peeling adjustment for inductions that are nested within the vectorized loop. gcc/testsuite/ PR tree-optimization/83914 * gcc.dg/vect/pr83914.c: New test. From-SVN: r256884
2018-01-16 | Two fixes for live-out SLP inductions (PR 83857) | Richard Sandiford | 1 file changed, -2/+9
vect_analyze_loop_operations was calling vectorizable_live_operation for all live-out phis, which led to a bogus ncopies calculation in the pure SLP case. I think v_a_l_o should only be passing phis that are vectorised using normal loop vectorisation, since vect_slp_analyze_node_operations handles the SLP side (and knows the correct slp_index and slp_node arguments to pass in, via vect_analyze_stmt). With that fixed we hit an older bug that vectorizable_live_operation didn't handle live-out SLP inductions. Fixed by using gimple_phi_result rather than gimple_get_lhs for phis. 2018-01-16 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/83857 * tree-vect-loop.c (vect_analyze_loop_operations): Don't call vectorizable_live_operation for pure SLP statements. (vectorizable_live_operation): Handle PHIs. gcc/testsuite/ PR tree-optimization/83857 * gcc.dg/vect/pr83857.c: New test. From-SVN: r256747
2018-01-13 | Support for aliasing with variable strides | Richard Sandiford | 1 file changed, -0/+13
This patch adds runtime alias checks for loops with variable strides, so that we can vectorise them even without a restrict qualifier. There are several parts to doing this:

1) For accesses like:

     x[i * n] += 1;

   we need to check whether n (and thus the DR_STEP) is nonzero. vect_analyze_data_ref_dependence records values that need to be checked in this way, then prune_runtime_alias_test_list records a bounds check on DR_STEP being outside the range [0, 0].

2) For accesses like:

     x[i * n] = x[i * n + 1] + 1;

   we simply need to test whether abs (n) >= 2. prune_runtime_alias_test_list looks for cases like this and tries to guess whether it is better to use this kind of check or a check for non-overlapping ranges. (We could do an OR of the two conditions at runtime, but that isn't implemented yet.)

3) Checks for overlapping ranges need to cope with variable strides. At present the "length" of each segment in a range check is represented as an offset from the base that lies outside the touched range, in the same direction as DR_STEP. The length can therefore be negative and is sometimes conservative. With variable steps it's easier to reason about if we split this into two:

     seg_len: distance travelled from the first iteration of interest to the last, e.g. DR_STEP * (VF - 1)

     access_size: the number of bytes accessed in each iteration

   with access_size always being a positive constant and seg_len possibly being variable. We can then combine alias checks for two accesses that are a constant number of bytes apart by adjusting the access size to account for the gap. This leaves the segment length unchanged, which allows the check to be combined with further accesses.

When seg_len is positive, the runtime alias check has the form:

     base_a >= base_b + seg_len_b + access_size_b
     || base_b >= base_a + seg_len_a + access_size_a

In many accesses the base will be aligned to the access size, which allows us to skip the addition:

     base_a > base_b + seg_len_b
     || base_b > base_a + seg_len_a

A similar saving is possible with "negative" lengths. The patch therefore tracks the alignment in addition to seg_len and access_size.

2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vec_lower_bound): New structure. (_loop_vec_info): Add check_nonzero and lower_bounds. (LOOP_VINFO_CHECK_NONZERO): New macro. (LOOP_VINFO_LOWER_BOUNDS): Likewise. (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too. * tree-data-ref.h (dr_with_seg_len): Add access_size and align fields. Make seg_len the distance travelled, not including the access size. (dr_direction_indicator): Declare. (dr_zero_step_indicator): Likewise. (dr_known_forward_stride_p): Likewise. * tree-data-ref.c: Include stringpool.h, tree-vrp.h and tree-ssanames.h. (runtime_alias_check_p): Allow runtime alias checks with variable strides. (operator ==): Compare access_size and align. (prune_runtime_alias_test_list): Rework for new distinction between the access_size and seg_len. (create_intersect_range_checks_index): Likewise. Cope with polynomial segment lengths. (get_segment_min_max): New function. (create_intersect_range_checks): Use it. (dr_step_indicator): New function. (dr_direction_indicator): Likewise. (dr_zero_step_indicator): Likewise. (dr_known_forward_stride_p): Likewise. * tree-loop-distribution.c (data_ref_segment_size): Return DR_STEP * (niters - 1). (compute_alias_check_pairs): Update call to the dr_with_seg_len constructor.
* tree-vect-data-refs.c (vect_check_nonzero_value): New function. (vect_preserves_scalar_order_p): New function, split out from... (vect_analyze_data_ref_dependence): ...here. Check for zero steps. (vect_vfa_segment_size): Return DR_STEP * (length_factor - 1). (vect_vfa_access_size): New function. (vect_vfa_align): Likewise. (vect_compile_time_alias): Take access_size_a and access_size_b arguments. (dump_lower_bound): New function. (vect_check_lower_bound): Likewise. (vect_small_gap_p): Likewise. (vectorizable_with_step_bound_p): Likewise. (vect_prune_runtime_alias_test_list): Ignore cross-iteration dependencies if the vectorization factor is 1. Convert the checks for nonzero steps into checks on the bounds of DR_STEP. Try using a bounds check for variable steps if the minimum required step is relatively small. Update calls to the dr_with_seg_len constructor and to vect_compile_time_alias. * tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New function. (vect_loop_versioning): Call it. * tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS when retrying. (vect_estimate_min_profitable_iters): Account for any bounds checks. gcc/testsuite/ * gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather than SLP vectorization. * gcc.dg/vect/vect-alias-check-10.c: New test. * gcc.dg/vect/vect-alias-check-11.c: Likewise. * gcc.dg/vect/vect-alias-check-12.c: Likewise. * gcc.dg/vect/vect-alias-check-8.c: Likewise. * gcc.dg/vect/vect-alias-check-9.c: Likewise. * gcc.target/aarch64/sve/strided_load_8.c: Likewise. * gcc.target/aarch64/sve/var_stride_1.c: Likewise. * gcc.target/aarch64/sve/var_stride_1.h: Likewise. * gcc.target/aarch64/sve/var_stride_1_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_2.c: Likewise. * gcc.target/aarch64/sve/var_stride_2_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_3.c: Likewise. * gcc.target/aarch64/sve/var_stride_3_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_4.c: Likewise. * gcc.target/aarch64/sve/var_stride_4_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_5.c: Likewise. * gcc.target/aarch64/sve/var_stride_5_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_6.c: Likewise. * gcc.target/aarch64/sve/var_stride_6_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_7.c: Likewise. * gcc.target/aarch64/sve/var_stride_7_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_8.c: Likewise. * gcc.target/aarch64/sve/var_stride_8_run.c: Likewise. * gfortran.dg/vect/vect-alias-check-1.F90: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256644
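As an illustration of case 2) above, a loop of this shape (a made-up sketch, not one of the new testcases) becomes vectorisable under the runtime check:

    void
    f (int *x, int n, int m)
    {
      for (int i = 0; i < m; ++i)
        x[i * n] = x[i * n + 1] + 1;   /* stride n known only at runtime */
    }

The vectorised version is guarded by the abs (n) >= 2 test described above, or by a non-overlap check if prune_runtime_alias_test_list guesses that is cheaper.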
2018-01-13 | Add support for in-order addition reduction using SVE FADDA | Richard Sandiford | 1 file changed, -56/+338
This patch adds support for in-order floating-point addition reductions, which are suitable even in strict IEEE mode. Previously vect_is_simple_reduction would reject any cases that forbid reassociation. The idea is instead to tentatively accept them as "FOLD_LEFT_REDUCTIONs" and only fail later if there is no support for them. Although this patch only handles the particular case of plus and minus on floating-point types, there's no reason in principle why we couldn't handle other cases. The reductions use a new fold_left_plus_optab if available, otherwise they fall back to elementwise additions or subtractions. The vect_force_simple_reduction change makes it easier for parloops to read the type of reduction. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * optabs.def (fold_left_plus_optab): New optab. * doc/md.texi (fold_left_plus_@var{m}): Document. * internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function. * internal-fn.c (fold_left_direct): Define. (expand_fold_left_optab_fn): Likewise. (direct_fold_left_optab_supported_p): Likewise. * fold-const-call.c (fold_const_fold_left): New function. (fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS. * tree-parloops.c (valid_reduction_p): New function. (gather_scalar_reductions): Use it. * tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type. (vect_finish_replace_stmt): Declare. * tree-vect-loop.c (fold_left_reduction_fn): New function. (needs_fold_left_reduction_p): New function, split out from... (vect_is_simple_reduction): ...here. Accept reductions that forbid reassociation, but give them type FOLD_LEFT_REDUCTION. (vect_force_simple_reduction): Also store the reduction type in the assignment's STMT_VINFO_REDUC_TYPE. (vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION. (merge_with_identity): New function. (vect_expand_fold_left): Likewise. (vectorize_fold_left_reduction): Likewise. (vectorizable_reduction): Handle FOLD_LEFT_REDUCTION. Leave the scalar phi in place for it. Check for target support and reject cases that would reassociate the operation. Defer the transform phase to vectorize_fold_left_reduction. * config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec. * config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander. (*fold_left_plus_<mode>, *pred_fold_left_plus_<mode>): New insns. gcc/testsuite/ * gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and check for a message about using in-order reductions. * gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and check for a message about using in-order reductions. * gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be vectorized and check for a message about using in-order reductions. Expect targets with variable-length vectors to fall back to the fixed-length minimum. * gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and check for a message about using in-order reductions. * gcc.dg/vect/vect-reduc-in-order-1.c: New test. * gcc.dg/vect/vect-reduc-in-order-2.c: Likewise. * gcc.dg/vect/vect-reduc-in-order-3.c: Likewise. * gcc.dg/vect/vect-reduc-in-order-4.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_1.c: New test. * gcc.target/aarch64/sve/reduc_strict_1_run.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_2.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_2_run.c: Likewise. * gcc.target/aarch64/sve/reduc_strict_3.c: Likewise. * gcc.target/aarch64/sve/slp_13.c: Add floating-point types. 
* gfortran.dg/vect/vect-8.f90: Expect 22 loops to be vectorized if vect_fold_left_plus. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256639
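For example, a strict IEEE summation (hypothetical testcase, compiled without -ffast-math, so reassociation is forbidden):

    double
    f (double *a, int n)
    {
      double s = 0.0;
      for (int i = 0; i < n; ++i)
        s += a[i];   /* the additions must stay in source order */
      return s;
    }

Previously vect_is_simple_reduction rejected this outright; with the patch it is vectorised as a FOLD_LEFT_REDUCTION, which on SVE maps to FADDA.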
2018-01-13 | Use single-iteration epilogues when peeling for gaps | Richard Sandiford | 1 file changed, -10/+17
This patch adds support for fully-masking loops that require peeling for gaps. It peels exactly one scalar iteration and uses the masked loop to handle the rest. Previously we would fall back on using a standard unmasked loop instead. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Replace vfm1 with a bound_epilog parameter. (vect_do_peeling): Update calls accordingly, and move the prologue call earlier in the function. Treat the base bound_epilog as 0 for fully-masked loops and retain vf - 1 for other loops. Add 1 to this base when peeling for gaps. * tree-vect-loop.c (vect_analyze_loop_2): Allow peeling for gaps with fully-masked loops. (vect_estimate_min_profitable_iters): Handle the single peeled iteration in that case. gcc/testsuite/ * gcc.target/aarch64/sve/struct_vect_18.c: Check the number of branches. * gcc.target/aarch64/sve/struct_vect_19.c: Likewise. * gcc.target/aarch64/sve/struct_vect_20.c: New test. * gcc.target/aarch64/sve/struct_vect_20_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_21.c: Likewise. * gcc.target/aarch64/sve/struct_vect_21_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_22.c: Likewise. * gcc.target/aarch64/sve/struct_vect_22_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_23.c: Likewise. * gcc.target/aarch64/sve/struct_vect_23_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256635
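A hedged sketch of the kind of access group that needs peeling for gaps (invented testcase):

    void
    f (int *restrict out, int *restrict in, int n)
    {
      for (int i = 0; i < n; ++i)
        out[i] = in[i * 4] + in[i * 4 + 1];   /* group of 2 loads, gap of 2 */
    }

The vector loads cover the gap elements too, so the final group must not be loaded in vector form in case it reads past the end of "in"; with this patch that one final iteration runs as scalar code and the masked loop handles everything else.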
2018-01-13 | Add support for conditional reductions using SVE CLASTB | Richard Sandiford | 1 file changed, -43/+86
This patch uses SVE CLASTB to optimise conditional reductions. It means that we no longer need to maintain a separate index vector to record the most recent valid value, and no longer need to worry about overflow cases. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (fold_extract_last_@var{m}): Document. * doc/sourcebuild.texi (vect_fold_extract_last): Likewise. * optabs.def (fold_extract_last_optab): New optab. * internal-fn.def (FOLD_EXTRACT_LAST): New internal function. * internal-fn.c (fold_extract_direct): New macro. (expand_fold_extract_optab_fn): Likewise. (direct_fold_extract_optab_supported_p): Likewise. * tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type. * tree-vect-loop.c (vect_model_reduction_cost): Handle EXTRACT_LAST_REDUCTION. (get_initial_def_for_reduction): Do not create an initial vector for EXTRACT_LAST_REDUCTION reductions. (vectorizable_reduction): Leave the scalar phi in place for EXTRACT_LAST_REDUCTIONs. Try using EXTRACT_LAST_REDUCTION ahead of INTEGER_INDUC_COND_REDUCTION. Do not check for an epilogue code for EXTRACT_LAST_REDUCTION and defer the transform phase to vectorizable_condition. * tree-vect-stmts.c (vect_finish_stmt_generation_1): New function, split out from... (vect_finish_stmt_generation): ...here. (vect_finish_replace_stmt): New function. (vectorizable_condition): Handle EXTRACT_LAST_REDUCTION. * config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New pattern. * config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_fold_extract_last): New proc. * gcc.dg/vect/pr65947-1.c: Update dump messages. Add markup for fold_extract_last. * gcc.dg/vect/pr65947-2.c: Likewise. * gcc.dg/vect/pr65947-3.c: Likewise. * gcc.dg/vect/pr65947-4.c: Likewise. * gcc.dg/vect/pr65947-5.c: Likewise. * gcc.dg/vect/pr65947-6.c: Likewise. * gcc.dg/vect/pr65947-9.c: Likewise. * gcc.dg/vect/pr65947-10.c: Likewise. * gcc.dg/vect/pr65947-12.c: Likewise. * gcc.dg/vect/pr65947-14.c: Likewise. * gcc.dg/vect/pr80631-1.c: Likewise. * gcc.target/aarch64/sve/clastb_1.c: New test. * gcc.target/aarch64/sve/clastb_1_run.c: Likewise. * gcc.target/aarch64/sve/clastb_2.c: Likewise. * gcc.target/aarch64/sve/clastb_2_run.c: Likewise. * gcc.target/aarch64/sve/clastb_3.c: Likewise. * gcc.target/aarch64/sve/clastb_3_run.c: Likewise. * gcc.target/aarch64/sve/clastb_4.c: Likewise. * gcc.target/aarch64/sve/clastb_4_run.c: Likewise. * gcc.target/aarch64/sve/clastb_5.c: Likewise. * gcc.target/aarch64/sve/clastb_5_run.c: Likewise. * gcc.target/aarch64/sve/clastb_6.c: Likewise. * gcc.target/aarch64/sve/clastb_6_run.c: Likewise. * gcc.target/aarch64/sve/clastb_7.c: Likewise. * gcc.target/aarch64/sve/clastb_7_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256633
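A conditional reduction, for reference, has this shape (a sketch in the style of the pr65947 tests):

    int
    last_matching (int *a, int *b, int n)
    {
      int res = -1;
      for (int i = 0; i < n; ++i)
        if (b[i] < 16)
          res = a[i];   /* the most recent value whose condition held wins */
      return res;
    }

CLASTB extracts the last active lane directly, which is why the separate index vector and its overflow handling are no longer needed.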
2018-01-13 | Add support for vectorising live-out values using SVE LASTB | Richard Sandiford | 1 file changed, -17/+70
This patch uses the SVE LASTB instruction to optimise cases in which a value produced by the final scalar iteration of a vectorised loop is live outside the loop. Previously this situation would stop us from using a fully-masked loop. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (extract_last_@var{m}): Document. * optabs.def (extract_last_optab): New optab. * internal-fn.def (EXTRACT_LAST): New internal function. * internal-fn.c (cond_unary_direct): New macro. (expand_cond_unary_optab_fn): Likewise. (direct_cond_unary_optab_supported_p): Likewise. * tree-vect-loop.c (vectorizable_live_operation): Allow fully-masked loops using EXTRACT_LAST. * config/aarch64/aarch64-sve.md (aarch64_sve_lastb<mode>): Rename to... (extract_last_<mode>): ...this optab. (vec_extract<mode><Vel>): Update accordingly. gcc/testsuite/ * gcc.target/aarch64/sve/live_1.c: New test. * gcc.target/aarch64/sve/live_1_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256632
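For example (hypothetical testcase in the style of live_1.c):

    int
    f (int *a, int n)
    {
      int x = 0;
      for (int i = 0; i < n; ++i)
        {
          x = a[i] + 1;
          a[i] = x;
        }
      return x;   /* live out: the value from the final scalar iteration */
    }

With a fully-masked loop, LASTB picks the value out of the last active lane of the final vector iteration.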
2018-01-13 | Handle peeling for alignment with masking | Richard Sandiford | 1 file changed, -29/+58
This patch adds support for aligning vectors by using a partial first iteration. E.g. if the start pointer is 3 elements beyond an aligned address, the first iteration will have a mask in which the first three elements are false. On SVE, the optimisation is only useful for vector-length-specific code. Vector-length-agnostic code doesn't try to align vectors since the vector length might not be a power of 2. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field. (LOOP_VINFO_MASK_SKIP_NITERS): New macro. (vect_use_loop_mask_for_alignment_p): New function. (vect_prepare_for_masked_peels, vect_gen_while_not): Declare. * tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an niters_skip argument. Make sure that the first niters_skip elements of the first iteration are inactive. (vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS. Update call to vect_set_loop_masks_directly. (get_misalign_in_elems): New function, split out from... (vect_gen_prolog_loop_niters): ...here. (vect_update_init_of_dr): Take a code argument that specifies whether the adjustment should be added or subtracted. (vect_update_init_of_drs): Likewise. (vect_prepare_for_masked_peels): New function. (vect_do_peeling): Skip prologue peeling if we're using a mask instead. Update call to vect_update_inits_of_drs. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize mask_skip_niters. (vect_analyze_loop_2): Allow fully-masked loops with peeling for alignment. Do not include the number of peeled iterations in the minimum threshold in that case. (vectorizable_induction): Adjust the start value down by LOOP_VINFO_MASK_SKIP_NITERS iterations. (vect_transform_loop): Call vect_prepare_for_masked_peels. Take the number of skipped iterations into account when calculating the loop bounds. * tree-vect-stmts.c (vect_gen_while_not): New function. gcc/testsuite/ * gcc.target/aarch64/sve/nopeel_1.c: New test. * gcc.target/aarch64/sve/peel_ind_1.c: Likewise. * gcc.target/aarch64/sve/peel_ind_1_run.c: Likewise. * gcc.target/aarch64/sve/peel_ind_2.c: Likewise. * gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise. * gcc.target/aarch64/sve/peel_ind_3.c: Likewise. * gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise. * gcc.target/aarch64/sve/peel_ind_4.c: Likewise. * gcc.target/aarch64/sve/peel_ind_4_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256630
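Conceptually (a sketch with invented numbers):

    void
    f (int *dst, int n)
    {
      for (int i = 0; i < n; ++i)
        dst[i] += 1;
    }

If dst is 3 elements past an aligned boundary, the loop can start at the aligned address dst - 3 with a first-iteration mask of {0, 0, 0, 1, 1, ...}, after which every vector access is aligned.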
2018-01-13 | Allow the number of iterations to be smaller than VF | Richard Sandiford | 1 file changed, -71/+103
Fully-masked loops can be profitable even if the iteration count is smaller than the vectorisation factor. In this case we're effectively doing a complete unroll followed by SLP. The documentation for min-vect-loop-bound says that the default value was 0, but actually the default and minimum were 1. We need it to be 0 for this case since the parameter counts a whole number of vector iterations. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/sourcebuild.texi (vect_fully_masked): Document. * params.def (PARAM_MIN_VECT_LOOP_BOUND): Change minimum and default value to 0. * tree-vect-loop.c (vect_analyze_loop_costing): New function, split out from... (vect_analyze_loop_2): ...here. Don't check the vectorization factor against the number of loop iterations if the loop is fully-masked. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_fully_masked): New proc. * gcc.dg/vect/slp-3.c: Expect all loops to be vectorized if vect_fully_masked. * gcc.target/aarch64/sve/loop_add_4.c: New test. * gcc.target/aarch64/sve/loop_add_4_run.c: Likewise. * gcc.target/aarch64/sve/loop_add_5.c: Likewise. * gcc.target/aarch64/sve/loop_add_5_run.c: Likewise. * gcc.target/aarch64/sve/miniloop_1.c: Likewise. * gcc.target/aarch64/sve/miniloop_2.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256629
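For example (assuming a vectorisation factor greater than 3):

    void
    f (int *a)
    {
      for (int i = 0; i < 3; ++i)   /* fewer scalar iterations than VF */
        a[i] += 1;
    }

A fully-masked loop executes this as a single predicated vector iteration with only the first three lanes active.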
2018-01-13 | Add support for reductions in fully-masked loops | Richard Sandiford | 1 file changed, -23/+65
This patch removes the restriction that fully-masked loops cannot have reductions. The key thing here is to make sure that the reduction accumulator doesn't include any values associated with inactive lanes; the patch adds a bunch of conditional binary operations for doing that. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (cond_add@var{mode}, cond_sub@var{mode}) (cond_and@var{mode}, cond_ior@var{mode}, cond_xor@var{mode}) (cond_smin@var{mode}, cond_smax@var{mode}, cond_umin@var{mode}) (cond_umax@var{mode}): Document. * optabs.def (cond_add_optab, cond_sub_optab, cond_and_optab) (cond_ior_optab, cond_xor_optab, cond_smin_optab, cond_smax_optab) (cond_umin_optab, cond_umax_optab): New optabs. * internal-fn.def (COND_ADD, COND_SUB, COND_MIN, COND_MAX, COND_AND) (COND_IOR, COND_XOR): New internal functions. * internal-fn.h (get_conditional_internal_fn): Declare. * internal-fn.c (cond_binary_direct): New macro. (expand_cond_binary_optab_fn): Likewise. (direct_cond_binary_optab_supported_p): Likewise. (get_conditional_internal_fn): New function. * tree-vect-loop.c (vectorizable_reduction): Handle fully-masked loops. Cope with reduction statements that are vectorized as calls rather than assignments. * config/aarch64/aarch64-sve.md (cond_<optab><mode>): New insns. * config/aarch64/iterators.md (UNSPEC_COND_ADD, UNSPEC_COND_SUB) (UNSPEC_COND_SMAX, UNSPEC_COND_UMAX, UNSPEC_COND_SMIN) (UNSPEC_COND_UMIN, UNSPEC_COND_AND, UNSPEC_COND_ORR) (UNSPEC_COND_EOR): New unspecs. (optab): Add mappings for them. (SVE_COND_INT_OP, SVE_COND_FP_OP): New int iterators. (sve_int_op, sve_fp_op): New int attributes. gcc/testsuite/ * gcc.dg/vect/pr60482.c: Remove XFAIL for variable-length vectors. * gcc.target/aarch64/sve/reduc_1.c: Expect the loop operations to be predicated. * gcc.target/aarch64/sve/slp_5.c: Check for a fully-masked loop. * gcc.target/aarch64/sve/slp_7.c: Likewise. * gcc.target/aarch64/sve/reduc_5.c: New test. * gcc.target/aarch64/sve/slp_13.c: Likewise. * gcc.target/aarch64/sve/slp_13_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256626
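The lane-wise effect can be sketched in scalar pseudo-C (illustrative only, not the exact internal-function signature): inactive lanes must leave the accumulator untouched, so they never contribute to the reduction:

    /* Effect of a conditional addition on the vector accumulator:  */
    for (int lane = 0; lane < VF; ++lane)
      acc[lane] = mask[lane] ? acc[lane] + val[lane] : acc[lane];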
2018-01-13 | Add support for fully-predicated loops | Richard Sandiford | 1 file changed, -20/+305
This patch adds support for using a single fully-predicated loop instead of a vector loop and a scalar tail. An SVE WHILELO instruction generates the predicate for each iteration of the loop, given the current scalar iv value and the loop bound. This operation is wrapped up in a new internal function called WHILE_ULT. E.g.: WHILE_ULT (0, 3, { 0, 0, 0, 0}) -> { 1, 1, 1, 0 } WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 } The third WHILE_ULT argument is needed to make the operation unambiguous: without it, WHILE_ULT (0, 3) for one vector type would seem equivalent to WHILE_ULT (0, 3) for another, even if the types have different numbers of elements. Note that the patch uses "mask" and "fully-masked" instead of "predicate" and "fully-predicated", to follow existing GCC terminology. This patch just handles the simple cases, punting for things like reductions and live-out values. Later patches remove most of these restrictions. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * optabs.def (while_ult_optab): New optab. * doc/md.texi (while_ult@var{m}@var{n}): Document. * internal-fn.def (WHILE_ULT): New internal function. * internal-fn.h (direct_internal_fn_supported_p): New override that takes two types as argument. * internal-fn.c (while_direct): New macro. (expand_while_optab_fn): New function. (convert_optab_supported_p): Likewise. (direct_while_optab_supported_p): New macro. * wide-int.h (wi::udiv_ceil): New function. * tree-vectorizer.h (rgroup_masks): New structure. (vec_loop_masks): New typedef. (_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p and fully_masked_p. (LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P) (LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros. (vect_max_vf): New function. (slpeel_make_loop_iterate_ntimes): Delete. (vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while) (vect_halve_mask_nunits, vect_double_mask_nunits): Declare. (vect_record_loop_mask, vect_get_loop_mask): Likewise. * tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h, internal-fn.h, stor-layout.h and optabs-query.h. (vect_set_loop_mask): New function. (add_preheader_seq): Likewise. (add_header_seq): Likewise. (interleave_supported_p): Likewise. (vect_maybe_permute_loop_masks): Likewise. (vect_set_loop_masks_directly): Likewise. (vect_set_loop_condition_masked): Likewise. (vect_set_loop_condition_unmasked): New function, split out from slpeel_make_loop_iterate_ntimes. (slpeel_make_loop_iterate_ntimes): Rename to.. (vect_set_loop_condition): ...this. Use vect_set_loop_condition_masked for fully-masked loops and vect_set_loop_condition_unmasked otherwise. (vect_do_peeling): Update call accordingly. (vect_gen_vector_loop_niters): Use VF as the step for fully-masked loops. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize mask_compare_type, can_fully_mask_p and fully_masked_p. (release_vec_loop_masks): New function. (_loop_vec_info): Use it to free the loop masks. (can_produce_all_loop_masks_p): New function. (vect_get_max_nscalars_per_iter): Likewise. (vect_verify_full_masking): Likewise. (vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around retries, and free the mask rgroups before retrying. Check loop-wide reasons for disallowing fully-masked loops. Make the final decision about whether use a fully-masked loop or not. 
(vect_estimate_min_profitable_iters): Do not assume that peeling for the number of iterations will be needed for fully-masked loops. (vectorizable_reduction): Disable fully-masked loops. (vectorizable_live_operation): Likewise. (vect_halve_mask_nunits): New function. (vect_double_mask_nunits): Likewise. (vect_record_loop_mask): Likewise. (vect_get_loop_mask): Likewise. (vect_transform_loop): Handle the case in which the final loop iteration might handle a partial vector. Call vect_set_loop_condition instead of slpeel_make_loop_iterate_ntimes. * tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h. (check_load_store_masking): New function. (prepare_load_store_mask): Likewise. (vectorizable_store): Handle fully-masked loops. (vectorizable_load): Likewise. (supportable_widening_operation): Use vect_halve_mask_nunits for booleans. (supportable_narrowing_operation): Likewise vect_double_mask_nunits. (vect_gen_while): New function. * config/aarch64/aarch64.md (umax<mode>3): New expander. (aarch64_uqdec<mode>): New insn. gcc/testsuite/ * gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization. * gcc.dg/tree-ssa/peel1.c: Likewise. * gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for variable-length vectors. * gcc.target/aarch64/sve/vcond_6.c: XFAIL test for AND. * gcc.target/aarch64/sve/vec_bool_cmp_1.c: Expect BIC instead of NOT. * gcc.target/aarch64/sve/slp_1.c: Check for a fully-masked loop. * gcc.target/aarch64/sve/slp_2.c: Likewise. * gcc.target/aarch64/sve/slp_3.c: Likewise. * gcc.target/aarch64/sve/slp_4.c: Likewise. * gcc.target/aarch64/sve/slp_6.c: Likewise. * gcc.target/aarch64/sve/slp_8.c: New test. * gcc.target/aarch64/sve/slp_8_run.c: Likewise. * gcc.target/aarch64/sve/slp_9.c: Likewise. * gcc.target/aarch64/sve/slp_9_run.c: Likewise. * gcc.target/aarch64/sve/slp_10.c: Likewise. * gcc.target/aarch64/sve/slp_10_run.c: Likewise. * gcc.target/aarch64/sve/slp_11.c: Likewise. * gcc.target/aarch64/sve/slp_11_run.c: Likewise. * gcc.target/aarch64/sve/slp_12.c: Likewise. * gcc.target/aarch64/sve/slp_12_run.c: Likewise. * gcc.target/aarch64/sve/ld1r_2.c: Likewise. * gcc.target/aarch64/sve/ld1r_2_run.c: Likewise. * gcc.target/aarch64/sve/while_1.c: Likewise. * gcc.target/aarch64/sve/while_2.c: Likewise. * gcc.target/aarch64/sve/while_3.c: Likewise. * gcc.target/aarch64/sve/while_4.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256625
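Putting the pieces together, the overall transformation can be sketched in pseudo-C (illustrative only):

    /* Scalar loop:  */
    for (int i = 0; i < n; ++i)
      a[i] += 1;

    /* Fully-masked vector loop -- no scalar tail:  */
    for (int i = 0; i < n; i += VF)
      {
        mask = WHILE_ULT (i, n, zero_mask);   /* lane k active iff i + k < n */
        v = MASK_LOAD (a + i, mask);
        MASK_STORE (a + i, v + 1, mask);
      }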
2018-01-13 | Add support for bitwise reductions | Richard Sandiford | 1 file changed, -3/+12
This patch adds support for the SVE bitwise reduction instructions (ANDV, ORV and EORV). It's a fairly mechanical extension of existing REDUC_* operators. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab) (reduc_xor_scal_optab): New optabs. * doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m}) (reduc_xor_scal_@var{m}): Document. * doc/sourcebuild.texi (vect_logical_reduc): Likewise. * internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New internal functions. * fold-const-call.c (fold_const_call): Handle them. * tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR. * config/aarch64/aarch64-sve.md (reduc_<bit_reduc>_scal_<mode>): (*reduc_<bit_reduc>_scal_<mode>): New patterns. * config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_ORV) (UNSPEC_XORV): New unspecs. (optab): Add entries for them. (BITWISEV): New int iterator. (bit_reduc_op): New int attributes. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_logical_reduc): New proc. * gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc and add an associated scan-dump test. Prevent vectorization of the first two loops. * gcc.dg/vect/vect-reduc-or_2.c: Likewise. * gcc.target/aarch64/sve/reduc_1.c: Add AND, IOR and XOR reductions. * gcc.target/aarch64/sve/reduc_2.c: Likewise. * gcc.target/aarch64/sve/reduc_1_run.c: Likewise. (INIT_VECTOR): Tweak initial value so that some bits are always set. * gcc.target/aarch64/sve/reduc_2_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256624
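For instance (hypothetical testcase):

    unsigned
    f (unsigned *a, int n)
    {
      unsigned res = ~0U;
      for (int i = 0; i < n; ++i)
        res &= a[i];   /* AND reduction: IFN_REDUC_AND, ANDV on SVE */
      return res;
    }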
2018-01-13 | SLP reductions with variable-length vectors | Richard Sandiford | 1 file changed, -67/+255
Two things stopped us using SLP reductions with variable-length vectors: (1) We didn't have a way of constructing the initial vector. This patch does it by creating a vector full of the neutral identity value and then using a shift-and-insert function to insert any non-identity inputs into the low-numbered elements. (The non-identity values are needed for double reductions.) Alternatively, for unchained MIN/MAX reductions that have no neutral value, we instead use the same duplicate-and-interleave approach as for SLP constant and external definitions (added by a previous patch). (2) The epilogue for constant-length vectors would extract the vector elements associated with each SLP statement and do scalar arithmetic on these individual elements. For variable-length vectors, the patch instead creates a reduction vector for each SLP statement, replacing the elements for other SLP statements with the identity value. It then uses a hardware reduction instruction on each vector. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (vec_shl_insert_@var{m}): New optab. * internal-fn.def (VEC_SHL_INSERT): New internal function. * optabs.def (vec_shl_insert_optab): New optab. * tree-vectorizer.h (can_duplicate_and_interleave_p): Declare. (duplicate_and_interleave): Likewise. * tree-vect-loop.c: Include internal-fn.h. (neutral_op_for_slp_reduction): New function, split out from get_initial_defs_for_reduction. (get_initial_def_for_reduction): Handle option 2 for variable-length vectors by loading the neutral value into a vector and then shifting the initial value into element 0. (get_initial_defs_for_reduction): Replace the code argument with the neutral value calculated by neutral_op_for_slp_reduction. Use gimple_build_vector for constant-length vectors. Use IFN_VEC_SHL_INSERT for variable-length vectors if all but the first group_size elements have a neutral value. Use duplicate_and_interleave otherwise. (vect_create_epilog_for_reduction): Take a neutral_op parameter. Update call to get_initial_defs_for_reduction. Handle SLP reductions for variable-length vectors by creating one vector result for each scalar result, with the elements associated with other scalar results stubbed out with the neutral value. (vectorizable_reduction): Call neutral_op_for_slp_reduction. Require IFN_VEC_SHL_INSERT for double reductions on variable-length vectors, or SLP reductions that have a neutral value. Require can_duplicate_and_interleave_p support for variable-length unchained SLP reductions if there is no neutral value, such as for MIN/MAX reductions. Also require the number of vector elements to be a multiple of the number of SLP statements when doing variable-length unchained SLP reductions. Update call to vect_create_epilog_for_reduction. * tree-vect-slp.c (can_duplicate_and_interleave_p): Make public and remove initial values. (duplicate_and_interleave): Make public. * config/aarch64/aarch64.md (UNSPEC_INSR): New unspec. * config/aarch64/aarch64-sve.md (vec_shl_insert_<mode>): New insn. gcc/testsuite/ * gcc.dg/vect/pr37027.c: Remove XFAIL for variable-length vectors. * gcc.dg/vect/pr67790.c: Likewise. * gcc.dg/vect/slp-reduc-1.c: Likewise. * gcc.dg/vect/slp-reduc-2.c: Likewise. * gcc.dg/vect/slp-reduc-3.c: Likewise. * gcc.dg/vect/slp-reduc-5.c: Likewise. * gcc.target/aarch64/sve/slp_5.c: New test. * gcc.target/aarch64/sve/slp_5_run.c: Likewise. * gcc.target/aarch64/sve/slp_6.c: Likewise. 
* gcc.target/aarch64/sve/slp_6_run.c: Likewise. * gcc.target/aarch64/sve/slp_7.c: Likewise. * gcc.target/aarch64/sve/slp_7_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256623
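A two-statement SLP reduction of the kind this patch enables for variable-length vectors (sketch):

    void
    f (int *restrict a, int *restrict out, int n)
    {
      int s0 = 0, s1 = 0;
      for (int i = 0; i < n; ++i)
        {
          s0 += a[i * 2];       /* two interleaved accumulators, vectorised */
          s1 += a[i * 2 + 1];   /* together as a single SLP group */
        }
      out[0] = s0;
      out[1] = s1;
    }

Here the initial vector is built by duplicating the neutral value (0 for addition); non-identity initial values, as needed for double reductions, are shifted into the low elements via VEC_SHL_INSERT.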
2018-01-13 | Protect against min_profitable_iters going negative | Richard Sandiford | 1 file changed, -9/+10
We had:

    if (vec_outside_cost <= 0)
      min_profitable_iters = 0;
    else
      {
        min_profitable_iters = ((vec_outside_cost - scalar_outside_cost)
                                * assumed_vf
                                - vec_inside_cost * peel_iters_prologue
                                - vec_inside_cost * peel_iters_epilogue)
                               / ((scalar_single_iter_cost * assumed_vf)
                                  - vec_inside_cost);

which can lead to negative min_profitable_iters when the *_outside_costs are the same and peel_iters_epilogue is nonzero (e.g. if we're peeling for gaps). This is tested as part of the patch that adds support for fully-predicated loops. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-loop.c (vect_estimate_min_profitable_iters): Make sure min_profitable_iters doesn't go negative. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256621
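To see the problem with made-up numbers: if vec_outside_cost == scalar_outside_cost, peel_iters_prologue == 0, peel_iters_epilogue == 1 and vec_inside_cost == 10, the numerator is 0 * assumed_vf - 0 - 10 = -10, so the division yields a negative min_profitable_iters.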
2018-01-13 | Add support for masked load/store_lanes | Richard Sandiford | 1 file changed, -2/+2
This patch adds support for vectorising groups of IFN_MASK_LOADs and IFN_MASK_STOREs using conditional load/store-lanes instructions. This requires new internal functions to represent the result (IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs. The normal IFN_{LOAD,STORE}_LANES functions are const operations that logically just perform the permute: the load or store is encoded as a MEM operand to the call statement. In contrast, the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of interface as IFN_MASK_{LOAD,STORE}, since the memory is only conditionally accessed. The AArch64 patterns were added as part of the main LD[234]/ST[234] patch. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document. (vec_mask_store_lanes@var{m}@var{n}): Likewise. * optabs.def (vec_mask_load_lanes_optab): New optab. (vec_mask_store_lanes_optab): Likewise. * internal-fn.def (MASK_LOAD_LANES): New internal function. (MASK_STORE_LANES): Likewise. * internal-fn.c (mask_load_lanes_direct): New macro. (mask_store_lanes_direct): Likewise. (expand_mask_load_optab_fn): Handle masked operations. (expand_mask_load_lanes_optab_fn): New macro. (expand_mask_store_optab_fn): Handle masked operations. (expand_mask_store_lanes_optab_fn): New macro. (direct_mask_load_lanes_optab_supported_p): Likewise. (direct_mask_store_lanes_optab_supported_p): Likewise. * tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p parameter. (vect_load_lanes_supported): Likewise. * tree-vect-data-refs.c (strip_conversion): New function. (can_group_stmts_p): Likewise. (vect_analyze_data_ref_accesses): Use it instead of checking for a pair of assignments. (vect_store_lanes_supported): Take a masked_p parameter. (vect_load_lanes_supported): Likewise. * tree-vect-loop.c (vect_analyze_loop_2): Update calls to vect_store_lanes_supported and vect_load_lanes_supported. * tree-vect-slp.c (vect_analyze_slp_instance): Likewise. * tree-vect-stmts.c (get_group_load_store_type): Take a masked_p parameter. Don't allow gaps for masked accesses. Use vect_get_store_rhs. Update calls to vect_store_lanes_supported and vect_load_lanes_supported. (get_load_store_type): Take a masked_p parameter and update call to get_group_load_store_type. (vectorizable_store): Update call to get_load_store_type. Handle IFN_MASK_STORE_LANES. (vectorizable_load): Update call to get_load_store_type. Handle IFN_MASK_LOAD_LANES. gcc/testsuite/ * gcc.dg/vect/vect-ooo-group-1.c: New test. * gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_2_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_3_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_6.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_7.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_8.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise. 
* gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256620
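A sketch of a loop that can now use masked load-lanes (in the style of the mask_struct_load tests; names invented):

    void
    f (int *restrict out, int *restrict in, int *restrict c, int n)
    {
      for (int i = 0; i < n; ++i)
        if (c[i])
          out[i] = in[i * 2] + in[i * 2 + 1];   /* group of 2: predicated LD2 on SVE */
    }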
2018-01-12 | re PR target/80846 (auto-vectorized AVX2 horizontal sum should narrow to 128b right away, to be more efficient for Ryzen and Intel) | Richard Biener | 1 file changed, -26/+117
2018-01-12 Richard Biener <rguenther@suse.de> PR tree-optimization/80846 * target.def (split_reduction): New target hook. * targhooks.c (default_split_reduction): New function. * targhooks.h (default_split_reduction): Declare. * tree-vect-loop.c (vect_create_epilog_for_reduction): If the target requests, first reduce vectors by combining low and high parts. * tree-vect-stmts.c (vect_gen_perm_mask_any): Adjust. (get_vectype_for_scalar_type_and_size): Export. * tree-vectorizer.h (get_vectype_for_scalar_type_and_size): Declare. * doc/tm.texi.in (TARGET_VECTORIZE_SPLIT_REDUCTION): Document. * doc/tm.texi: Regenerate. i386/ * config/i386/i386.c (ix86_split_reduction): Implement TARGET_VECTORIZE_SPLIT_REDUCTION. * gcc.target/i386/pr80846-1.c: New testcase. * gcc.target/i386/pr80846-2.c: Likewise. From-SVN: r256576
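The effect the hook asks for, written out by hand with AVX2 intrinsics (an illustrative equivalent of the desired code, not code from the patch):

    #include <immintrin.h>

    int
    hsum (__m256i v)
    {
      /* Narrow from 256 to 128 bits first...  */
      __m128i s = _mm_add_epi32 (_mm256_castsi256_si128 (v),
                                 _mm256_extracti128_si256 (v, 1));
      /* ...then finish the horizontal sum at 128 bits.  */
      s = _mm_add_epi32 (s, _mm_srli_si128 (s, 8));
      s = _mm_add_epi32 (s, _mm_srli_si128 (s, 4));
      return _mm_cvtsi128_si32 (s);
    }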
2018-01-03 | Move code that stubs out IFN_MASK_LOADs | Richard Sandiford | 1 file changed, -0/+19
vectorizable_mask_load_store replaces scalar IFN_MASK_LOAD calls with dummy assignments, so that they never survive vectorisation. This patch moves the code to vect_transform_loop instead, so that we only change the scalar statements once all of them have been vectorised. This makes it easier to handle other types of functions that need stubbing out, and also makes it easier to handle groups and patterns. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-loop.c (vect_transform_loop): Stub out scalar IFN_MASK_LOAD calls here rather than... * tree-vect-stmts.c (vectorizable_mask_load_store): ...here. From-SVN: r256210
2018-01-03 | poly_int: GET_MODE_SIZE | Richard Sandiford | 1 file changed, -3/+3
This patch changes GET_MODE_SIZE from unsigned short to poly_uint16. The non-mechanical parts were handled by previous patches. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * machmode.h (mode_size): Change from unsigned short to poly_uint16_pod. (mode_to_bytes): Return a poly_uint16 rather than an unsigned short. (GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES, or if measurement_type is not polynomial. (fixed_size_mode::includes_p): Check for constant-sized modes. * genmodes.c (emit_mode_size_inline): Make mode_size_inline return a poly_uint16 rather than an unsigned short. (emit_mode_size): Change the type of mode_size from unsigned short to poly_uint16_pod. Use ZERO_COEFFS for the initializer. (emit_mode_adjustments): Cope with polynomial vector sizes. * lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value for GET_MODE_SIZE. * lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value for GET_MODE_SIZE. * auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial. * builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise. * caller-save.c (setup_save_areas): Likewise. (replace_reg_with_saved_mem): Likewise. * calls.c (emit_library_call_value_1): Likewise. * combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise. * combine.c (simplify_set, make_extraction, simplify_shift_const_1) (gen_lowpart_for_combine): Likewise. * convert.c (convert_to_integer_1): Likewise. * cse.c (equiv_constant, cse_insn): Likewise. * cselib.c (autoinc_split, cselib_hash_rtx): Likewise. (cselib_subst_to_values): Likewise. * dce.c (word_dce_process_block): Likewise. * df-problems.c (df_word_lr_mark_ref): Likewise. * dwarf2cfi.c (init_one_dwarf_reg_size): Likewise. * dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor) (concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor) (rtl_for_decl_location): Likewise. * emit-rtl.c (gen_highpart, widen_memory_access): Likewise. * expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise. * expr.c (emit_group_load_1, clear_storage_hints): Likewise. (emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise. (expand_expr_real_1): Likewise. * function.c (assign_parm_setup_block_p, assign_parm_setup_block) (pad_below): Likewise. * gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise. * gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise. * ira.c (get_subreg_tracking_sizes): Likewise. * ira-build.c (ira_create_allocno_objects): Likewise. * ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise. (ira_sort_regnos_for_alter_reg): Likewise. * ira-costs.c (record_operand_costs): Likewise. * lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn) (resolve_simple_move): Likewise. * lra-constraints.c (get_reload_reg, operands_match_p): Likewise. (process_addr_reg, simplify_operand_subreg, curr_insn_transform) (lra_constraints): Likewise. (CONST_POOL_OK_P): Reject variable-sized modes. * lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare) (add_pseudo_to_slot, lra_spill): Likewise. * omp-low.c (omp_clause_aligned_alignment): Likewise. * optabs-query.c (get_best_extraction_insn): Likewise. * optabs-tree.c (expand_vec_cond_expr_p): Likewise. * optabs.c (expand_vec_perm_var, expand_vec_cond_expr): Likewise. (expand_mult_highpart, valid_multiword_target_p): Likewise. * recog.c (offsettable_address_addr_space_p): Likewise. 
* regcprop.c (maybe_mode_change): Likewise. * reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise. * regrename.c (build_def_use): Likewise. * regstat.c (dump_reg_info): Likewise. * reload.c (complex_word_subreg_p, push_reload, find_dummy_reload) (find_reloads, find_reloads_subreg_address): Likewise. * reload1.c (eliminate_regs_1): Likewise. * rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise. * simplify-rtx.c (avoid_constant_pool_reference): Likewise. (simplify_binary_operation_1, simplify_subreg): Likewise. * targhooks.c (default_function_arg_padding): Likewise. (default_hard_regno_nregs, default_class_max_nregs): Likewise. * tree-cfg.c (verify_gimple_assign_binary): Likewise. (verify_gimple_assign_ternary): Likewise. * tree-inline.c (estimate_move_cost): Likewise. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-ssa-loop-ivopts.c (add_autoinc_candidates): Likewise. (get_address_cost_ainc): Likewise. * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise. (vect_supportable_dr_alignment): Likewise. * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise. (vectorizable_reduction): Likewise. * tree-vect-stmts.c (vectorizable_assignment, vectorizable_shift) (vectorizable_operation, vectorizable_load): Likewise. * tree.c (build_same_sized_truth_vector_type): Likewise. * valtrack.c (cleanup_auto_inc_dec): Likewise. * var-tracking.c (emit_note_insn_var_location): Likewise. * config/arc/arc.h (ASM_OUTPUT_CASE_END): Use as_a <scalar_int_mode>. (ADDR_VEC_ALIGN): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256201
2018-01-03 | poly_int: TYPE_VECTOR_SUBPARTS | Richard Sandiford | 1 file changed, -12/+15
This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64. The value is encoded in the 10-bit precision field and was previously always stored as a simple log2 value. The challenge was to use this 10 bits to encode the number of elements in variable-length vectors, so that we didn't need to increase the size of the tree. In practice the number of vector elements should always have the form N + N * X (where X is the runtime value), and as for constant-length vectors, N must be a power of 2 (even though X itself might not be). The patch therefore uses the low 8 bits to encode log2(N) and bit 8 to select between constant-length and variable-length vectors. Targets without variable-length vectors continue to use the old scheme. A new valid_vector_subparts_p function tests whether a given number of elements can be encoded. This is false for the vector modes that represent an LD3 or ST3 vector triple (which we want to treat as arrays of vectors rather than single vectors). Most of the patch is mechanical; previous patches handled the changes that weren't entirely straightforward. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree.h (TYPE_VECTOR_SUBPARTS): Turn into a function and handle polynomial numbers of units. (SET_TYPE_VECTOR_SUBPARTS): Likewise. (valid_vector_subparts_p): New function. (build_vector_type): Remove temporary shim and take the number of units as a poly_uint64 rather than an int. (build_opaque_vector_type): Take the number of units as a poly_uint64 rather than an int. * tree.c (build_vector_from_ctor): Handle polynomial TYPE_VECTOR_SUBPARTS. (type_hash_canon_hash, type_cache_hasher::equal): Likewise. (uniform_vector_p, vector_type_mode, build_vector): Likewise. (build_vector_from_val): If the number of units is variable, use build_vec_duplicate_cst for constant operands and VEC_DUPLICATE_EXPR otherwise. (make_vector_type): Remove temporary is_constant (). (build_vector_type, build_opaque_vector_type): Take the number of units as a poly_uint64 rather than an int. (check_vector_cst): Handle polynomial TYPE_VECTOR_SUBPARTS and VECTOR_CST_NELTS. * cfgexpand.c (expand_debug_expr): Likewise. * expr.c (count_type_elements, categorize_ctor_elements_1): Likewise. (store_constructor, expand_expr_real_1): Likewise. (const_scalar_mask_from_tree): Likewise. * fold-const-call.c (fold_const_reduction): Likewise. * fold-const.c (const_binop, const_unop, fold_convert_const): Likewise. (operand_equal_p, fold_vec_perm, fold_ternary_loc): Likewise. (native_encode_vector, vec_cst_ctor_to_array): Likewise. (fold_relational_const): Likewise. (native_interpret_vector): Likewise. Change the size from an int to an unsigned int. * gimple-fold.c (gimple_fold_stmt_to_constant_1): Handle polynomial TYPE_VECTOR_SUBPARTS. (gimple_fold_indirect_ref, gimple_build_vector): Likewise. (gimple_build_vector_from_val): Use VEC_DUPLICATE_EXPR when duplicating a non-constant operand into a variable-length vector. * hsa-brig.c (hsa_op_immed::emit_to_buffer): Handle polynomial TYPE_VECTOR_SUBPARTS and VECTOR_CST_NELTS. * ipa-icf.c (sem_variable::equals): Likewise. * match.pd: Likewise. * omp-simd-clone.c (simd_clone_subparts): Likewise. * print-tree.c (print_node): Likewise. * stor-layout.c (layout_type): Likewise. * targhooks.c (default_builtin_vectorization_cost): Likewise. * tree-cfg.c (verify_gimple_comparison): Likewise. (verify_gimple_assign_binary): Likewise. (verify_gimple_assign_ternary): Likewise. 
(verify_gimple_assign_single): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. (simplify_bitfield_ref, is_combined_permutation_identity): Likewise. * tree-vect-data-refs.c (vect_permute_store_chain): Likewise. (vect_grouped_load_supported, vect_permute_load_chain): Likewise. (vect_shift_permute_load_chain): Likewise. * tree-vect-generic.c (nunits_for_known_piecewise_op): Likewise. (expand_vector_condition, optimize_vector_constructor): Likewise. (lower_vec_perm, get_compute_type): Likewise. * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise. (get_initial_defs_for_reduction, vect_transform_loop): Likewise. * tree-vect-patterns.c (vect_recog_bool_pattern): Likewise. (vect_recog_mask_conversion_pattern): Likewise. * tree-vect-slp.c (vect_supported_load_permutation_p): Likewise. (vect_get_constant_vectors, vect_transform_slp_perm_load): Likewise. * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. (get_group_load_store_type, vectorizable_mask_load_store): Likewise. (vectorizable_bswap, simd_clone_subparts, vectorizable_assignment) (vectorizable_shift, vectorizable_operation, vectorizable_store) (vectorizable_load, vect_is_simple_cond, vectorizable_comparison) (supportable_widening_operation): Likewise. (supportable_narrowing_operation): Likewise. * tree-vector-builder.c (tree_vector_builder::binary_encoded_nelts): Likewise. * varasm.c (output_constant): Likewise. gcc/ada/ * gcc-interface/utils.c (gnat_types_compatible_p): Handle polynomial TYPE_VECTOR_SUBPARTS. gcc/brig/ * brigfrontend/brig-to-generic.cc (get_unsigned_int_type): Handle polynomial TYPE_VECTOR_SUBPARTS. * brigfrontend/brig-util.h (gccbrig_type_vector_subparts): Likewise. gcc/c-family/ * c-common.c (vector_types_convertible_p, c_build_vec_perm_expr) (convert_vector_to_array_for_subscript): Handle polynomial TYPE_VECTOR_SUBPARTS. (c_common_type_for_mode): Check valid_vector_subparts_p. * c-pretty-print.c (pp_c_initializer_list): Handle polynomial VECTOR_CST_NELTS. gcc/c/ * c-typeck.c (comptypes_internal, build_binary_op): Handle polynomial TYPE_VECTOR_SUBPARTS. gcc/cp/ * constexpr.c (cxx_eval_array_reference): Handle polynomial VECTOR_CST_NELTS. (cxx_fold_indirect_ref): Handle polynomial TYPE_VECTOR_SUBPARTS. * call.c (build_conditional_expr_1): Likewise. * decl.c (cp_finish_decomp): Likewise. * mangle.c (write_type): Likewise. * typeck.c (structural_comptypes): Likewise. (cp_build_binary_op): Likewise. * typeck2.c (process_init_constructor_array): Likewise. gcc/fortran/ * trans-types.c (gfc_type_for_mode): Check valid_vector_subparts_p. gcc/lto/ * lto-lang.c (lto_type_for_mode): Check valid_vector_subparts_p. * lto.c (hash_canonical_type): Handle polynomial TYPE_VECTOR_SUBPARTS. gcc/go/ * go-lang.c (go_langhook_type_for_mode): Check valid_vector_subparts_p. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256197
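As a worked reading of the encoding described above: a variable-length type with 4 + 4*X units stores log2(4) = 2 in the low 8 bits and sets bit 8, while a fixed-length 16-unit vector keeps the old scheme, storing log2(16) = 4 with bit 8 clear.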
2018-01-03 | poly_int: GET_MODE_NUNITS | Richard Sandiford | 1 file changed, -2/+6
This patch changes GET_MODE_NUNITS from unsigned char to poly_uint16, although it remains a macro when compiling target code with NUM_POLY_INT_COEFFS == 1. We can handle permuted loads and stores for variable nunits if the number of statements is a power of 2, but not otherwise. The to_constant call in make_vector_type goes away in a later patch. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * machmode.h (mode_nunits): Change from unsigned char to poly_uint16_pod. (ONLY_FIXED_SIZE_MODES): New macro. (pod_mode::measurement_type, scalar_int_mode::measurement_type) (scalar_float_mode::measurement_type, scalar_mode::measurement_type) (complex_mode::measurement_type, fixed_size_mode::measurement_type): New typedefs. (mode_to_nunits): Return a poly_uint16 rather than an unsigned short. (GET_MODE_NUNITS): Return a constant if ONLY_FIXED_SIZE_MODES, or if measurement_type is not polynomial. * genmodes.c (ZERO_COEFFS): New macro. (emit_mode_nunits_inline): Make mode_nunits_inline return a poly_uint16. (emit_mode_nunits): Change the type of mode_nunits to poly_uint16_pod. Use ZERO_COEFFS when emitting initializers. * data-streamer.h (bp_pack_poly_value): New function. (bp_unpack_poly_value): Likewise. * lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value for GET_MODE_NUNITS. * lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value for GET_MODE_NUNITS. * tree.c (make_vector_type): Remove temporary shim and make the real function take the number of units as a poly_uint64 rather than an int. (build_vector_type_for_mode): Handle polynomial nunits. * dwarf2out.c (loc_descriptor, add_const_value_attribute): Likewise. * emit-rtl.c (const_vec_series_p_1): Likewise. (gen_rtx_CONST_VECTOR): Likewise. * fold-const.c (test_vec_duplicate_folding): Likewise. * genrecog.c (validate_pattern): Likewise. * optabs-query.c (can_vec_perm_var_p, can_mult_highpart_p): Likewise. * optabs-tree.c (expand_vec_cond_expr_p): Likewise. * optabs.c (expand_vector_broadcast, expand_binop_directly): Likewise. (shift_amt_for_vec_perm_mask, expand_vec_perm_var): Likewise. (expand_vec_cond_expr, expand_mult_highpart): Likewise. * rtlanal.c (subreg_get_info): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. (vect_grouped_load_supported): Likewise. * tree-vect-generic.c (type_for_widest_vector_mode): Likewise. * tree-vect-loop.c (have_whole_vector_shift): Likewise. * simplify-rtx.c (simplify_unary_operation_1): Likewise. (simplify_const_unary_operation, simplify_binary_operation_1) (simplify_const_binary_operation, simplify_ternary_operation) (test_vector_ops_duplicate, test_vector_ops): Likewise. (simplify_immed_subreg): Use GET_MODE_NUNITS on a fixed_size_mode instead of CONST_VECTOR_NUNITS. * varasm.c (output_constant_pool_2): Likewise. * rtx-vector-builder.c (rtx_vector_builder::build): Only include the explicit-encoded elements in the XVEC for variable-length vectors. gcc/ada/ * gcc-interface/misc.c (enumerate_modes): Handle polynomial GET_MODE_NUNITS. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256195
2018-01-03Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r256169
2018-01-03poly_int: vectorizable_live_operationRichard Sandiford1-13/+27
This patch makes vectorizable_live_operation cope with variable-length vectors. For now we just handle cases in which we can tell at compile time which vector contains the final result. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-loop.c (vectorizable_live_operation): Treat the number of units as polynomial. Punt if we can't tell at compile time which vector contains the final result. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256135
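The compile-time question is simply which vector and which lane hold the live value; a loose sketch (LANE, VECTYPE and the surrounding code are assumed context, not the patch's exact logic):

  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
  unsigned HOST_WIDE_INT const_nunits;
  if (nunits.is_constant (&const_nunits))
    {
      unsigned int vec_index = lane / const_nunits;  /* Containing vector.  */
      unsigned int elt_index = lane % const_nunits;  /* Lane within it.  */
      /* ...extract with a BIT_FIELD_REF as before...  */
    }
  else
    return false;  /* Punt: can't tell which vector has the result.  */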
2018-01-03poly_int: vectorizable_inductionRichard Sandiford1-17/+58
This patch makes vectorizable_induction cope with variable-length vectors. For now we punt on SLP inductions, but patches after the main SVE submission add support for those too. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-loop.c (vectorizable_induction): Treat the number of units as polynomial. Punt on SLP inductions. Use an integer VEC_SERIES_EXPR for variable-length integer inductions. Use a cast of such a series for variable-length floating-point inductions. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256134
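For variable-length vectors the induction vector can no longer be built element by element; the series form is a single statement (a sketch, with NEW_VEC, BASE and STEP assumed to be existing SSA names of suitable types):

  /* new_vec = VEC_SERIES_EXPR <base, step>
     i.e. { base, base + step, base + 2 * step, ... }  */
  gimple *series = gimple_build_assign (new_vec, VEC_SERIES_EXPR, base, step);
  gsi_insert_before (&gsi, series, GSI_SAME_STMT);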
2018-01-03poly_int: vectorizable_reductionRichard Sandiford1-15/+51
This patch makes vectorizable_reduction cope with variable-length vectors. We can handle the simple case of an inner loop reduction for which the target has native support for the epilogue operation. For now we punt on other cases, but patches after the main SVE submission allow SLP and double reductions too. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree.h (build_index_vector): Declare. * tree.c (build_index_vector): New function. * tree-vect-loop.c (get_initial_defs_for_reduction): Treat the number of units as polynomial, forcibly converting it to a constant if vectorizable_reduction has already enforced the condition. (vect_create_epilog_for_reduction): Likewise. Use build_index_vector to create a {1,2,3,...} vector. (vectorizable_reduction): Treat the number of units as polynomial. Choose vectype_in based on the largest scalar element size rather than the smallest number of units. Enforce the restrictions relied on above. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256133
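The epilogue's index vector then no longer depends on a known element count; roughly as used by the patch (base and step arguments assumed from the tree.h declaration):

  /* Build { 1, 2, 3, ... } with as many elements as the index vector
     type has, even when that count is a runtime value.  */
  tree series_vect = build_index_vector (cr_index_vector_type, 1, 1);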
2018-01-03poly_int: current_vector_size and TARGET_AUTOVECTORIZE_VECTOR_SIZESRichard Sandiford1-33/+56
This patch changes the type of current_vector_size to poly_uint64. It also changes TARGET_AUTOVECTORIZE_VECTOR_SIZES so that it fills in a vector of possible sizes (as poly_uint64s) instead of returning a bitmask. The documentation claimed that the hook didn't need to include the default vector size (returned by preferred_simd_mode), but that wasn't consistent with the omp-low.c usage. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * target.h (vector_sizes, auto_vector_sizes): New typedefs. * target.def (autovectorize_vector_sizes): Return the vector sizes by pointer, using vector_sizes rather than a bitmask. * targhooks.h (default_autovectorize_vector_sizes): Update accordingly. * targhooks.c (default_autovectorize_vector_sizes): Likewise. * config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes): Likewise. * config/arc/arc.c (arc_autovectorize_vector_sizes): Likewise. * config/arm/arm.c (arm_autovectorize_vector_sizes): Likewise. * config/i386/i386.c (ix86_autovectorize_vector_sizes): Likewise. * config/mips/mips.c (mips_autovectorize_vector_sizes): Likewise. * omp-general.c (omp_max_vf): Likewise. * omp-low.c (omp_clause_aligned_alignment): Likewise. * optabs-query.c (can_vec_mask_load_store_p): Likewise. * tree-vect-loop.c (vect_analyze_loop): Likewise. * tree-vect-slp.c (vect_slp_bb): Likewise. * doc/tm.texi: Regenerate. * tree-vectorizer.h (current_vector_size): Change from an unsigned int to a poly_uint64. * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Take the vector size as a poly_uint64 rather than an unsigned int. (current_vector_size): Change from an unsigned int to a poly_uint64. (get_vectype_for_scalar_type): Update accordingly. * tree.h (build_truth_vector_type): Take the size and number of units as a poly_uint64 rather than an unsigned int. (build_vector_type): Add a temporary overload that takes the number of units as a poly_uint64 rather than an unsigned int. * tree.c (make_vector_type): Likewise. (build_truth_vector_type): Take the number of units as a poly_uint64 rather than an unsigned int. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256131
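An implementation of the reworked hook is then a matter of pushing each candidate size, preferred size first; modeled on the AArch64 version in this patch:

  static void
  aarch64_autovectorize_vector_sizes (vector_sizes *sizes)
  {
    sizes->safe_push (16);   /* 128-bit Advanced SIMD vectors.  */
    sizes->safe_push (8);    /* 64-bit Advanced SIMD vectors.  */
  }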
2018-01-03poly_int: vect_nunits_for_costRichard Sandiford1-3/+5
This patch adds a function for getting the number of elements in a vector for cost purposes, which is always constant. It makes it possible for a later patch to change GET_MODE_NUNITS and TYPE_VECTOR_SUBPARTS to a poly_int. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vect_nunits_for_cost): New function. * tree-vect-loop.c (vect_model_reduction_cost): Use it. * tree-vect-slp.c (vect_analyze_slp_cost_1): Likewise. (vect_analyze_slp_cost): Likewise. * tree-vect-stmts.c (vect_model_store_cost): Likewise. (vect_model_load_cost): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256128
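The helper is a thin wrapper; a sketch consistent with the ChangeLog entry above:

  /* Return an estimate of the number of elements in VEC_TYPE for
     costing purposes, which is constant even for variable-length
     vectors.  */
  static inline unsigned int
  vect_nunits_for_cost (tree vec_type)
  {
    return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vec_type));
  }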
2018-01-03poly_int: vectoriser vf and ufRichard Sandiford1-99/+124
This patch changes the type of the vectorisation factor and SLP unrolling factor to poly_uint64. This in turn required some knock-on changes in signedness elsewhere. Cost decisions are generally based on estimated_poly_value, which for VF is wrapped up as vect_vf_for_cost. The patch doesn't on its own enable variable-length vectorisation. It just makes the minimum changes necessary for the code to build with the new VF and UF types. Later patches also make the vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable GET_MODE_NUNITS, at which point the code really does handle variable-length vectors. The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX, to avoid hard-coding a particular architectural limit. The patch includes a new test because a development version of the patch accidentally used file print routines instead of dump_*, which would fail with -fopt-info. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (_slp_instance::unrolling_factor): Change from an unsigned int to a poly_uint64. (_loop_vec_info::slp_unrolling_factor): Likewise. (_loop_vec_info::vectorization_factor): Change from an int to a poly_uint64. (MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX. (vect_get_num_vectors): New function. (vect_update_max_nunits, vect_vf_for_cost): Likewise. (vect_get_num_copies): Use vect_get_num_vectors. (vect_analyze_data_ref_dependences): Change max_vf from an int * to an unsigned int *. (vect_analyze_data_refs): Change min_vf from an int * to a poly_uint64 *. (vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather than an unsigned HOST_WIDE_INT. * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr) (vect_analyze_data_ref_dependence): Change max_vf from an int * to an unsigned int *. (vect_analyze_data_ref_dependences): Likewise. (vect_compute_data_ref_alignment): Handle polynomial vf. (vect_enhance_data_refs_alignment): Likewise. (vect_prune_runtime_alias_test_list): Likewise. (vect_shift_permute_load_chain): Likewise. (vect_supportable_dr_alignment): Likewise. (dependence_distance_ge_vf): Take the vectorization factor as a poly_uint64 rather than an unsigned HOST_WIDE_INT. (vect_analyze_data_refs): Change min_vf from an int * to a poly_uint64 *. * tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take vfm1 as a poly_uint64 rather than an int. Make the same change for the returned bound_scalar. (vect_gen_vector_loop_niters): Handle polynomial vf. (vect_do_peeling): Likewise. Update call to vect_gen_scalar_loop_niters and handle polynomial bound_scalars. (vect_gen_vector_loop_niters_mult_vf): Assert that the vf must be constant. * tree-vect-loop.c (vect_determine_vectorization_factor) (vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf. (vect_get_known_peeling_cost): Likewise. (vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise. (vect_worthwhile_without_simd_p, vectorizable_induction): Likewise. (vect_transform_loop): Likewise. Use the lowest possible VF when updating the upper bounds of the loop. (vect_min_worthwhile_factor): Make static. Return an unsigned int rather than an int. * tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with polynomial unroll factors. (vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise. (vect_make_slp_decision): Likewise. (vect_supported_load_permutation_p): Likewise, and polynomial vf too. (vect_analyze_slp_cost): Handle polynomial vf. 
(vect_slp_analyze_node_operations): Likewise. (vect_slp_analyze_bb_1): Likewise. (vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather than an unsigned HOST_WIDE_INT. * tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store) (vectorizable_load): Handle polynomial vf. * tree-vectorizer.c (simduid_to_vf::vf): Change from an int to a poly_uint64. (adjust_simduid_builtins, shrink_simd_arrays): Update accordingly. gcc/testsuite/ * gcc.dg/vect-opt-info-1.c: New test. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256126
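The new helpers are small; sketches consistent with the ChangeLog entries above:

  /* How many vectors of VECTYPE it takes to hold NUNITS elements.
     The division must be exact, and the result is a compile-time
     constant even when both counts are polynomial.  */
  static inline unsigned int
  vect_get_num_vectors (poly_uint64 nunits, tree vectype)
  {
    return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
  }

  /* A constant estimate of the VF, for cost decisions only.  */
  static inline unsigned int
  vect_vf_for_cost (loop_vec_info loop_vinfo)
  {
    return estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo));
  }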
2018-01-03Add an alternative vector loop iv mechanismRichard Sandiford1-7/+19
Normally we adjust the vector loop so that it iterates: (original number of scalar iterations - number of peels) / VF times, enforcing this using an IV that starts at zero and increments by one each iteration. However, dividing by VF would be expensive for variable VF, so this patch adds an alternative in which the IV increments by VF each iteration instead. We then need to take care to handle possible overflow in the IV. The new mechanism isn't used yet; a later patch replaces the "if (1)" with a check for variable VF. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-loop-manip.c: Include gimple-fold.h. (slpeel_make_loop_iterate_ntimes): Add step, final_iv and niters_maybe_zero parameters. Handle other cases besides a step of 1. (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter. Add a path that uses a step of VF instead of 1, but disable it for now. (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var and niters_no_overflow parameters. Update calls to slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters. Create a new SSA name if the latter chooses to use a step other than 1, and return it via niters_vector_mult_vf_var. * tree-vect-loop.c (vect_transform_loop): Update calls to vect_do_peeling, vect_gen_vector_loop_niters and slpeel_make_loop_iterate_ntimes. * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, vect_do_peeling) (vect_gen_vector_loop_niters): Update declarations after above changes. From-SVN: r256124
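In C terms the two schemes are (pseudocode; the real code builds GIMPLE):

  /* Original: the IV counts iterations, which needs a division.  */
  for (iv = 0; iv < (niters - npeels) / vf; iv++)
    ... /* vector loop body */

  /* Alternative: the IV steps by VF; no division, but the IV can
     overflow when niters is close to the maximum of its type.  */
  for (iv = 0; iv < niters - npeels; iv += vf)
    ... /* vector loop body */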
2018-01-02Use explicit encodings for simple permutesRichard Sandiford1-2/+4
This patch makes users of vec_perm_builders use the compressed encoding where possible. This means that they work with variable-length vectors. 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * optabs.c (expand_vec_perm_var): Use an explicit encoding for the broadcast of the low byte. (expand_mult_highpart): Use an explicit encoding for the permutes. * optabs-query.c (can_mult_highpart_p): Likewise. * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Likewise. * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. (vectorizable_bswap): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Use an explicit encoding for the power-of-2 permutes. (vect_permute_store_chain): Likewise. (vect_grouped_load_supported): Likewise. (vect_permute_load_chain): Likewise. From-SVN: r256097
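For example, the reversal selector now needs only three explicit elements, from which the rest of the series follows, so it also works when the element count is variable (essentially the perm_mask_for_reverse change):

  /* { nunits - 1, nunits - 2, nunits - 3, ... }: one pattern with
     three explicitly-encoded elements; later elements are implied.  */
  vec_perm_builder sel (nunits, 1, 3);
  for (unsigned int i = 0; i < 3; ++i)
    sel.quick_push (nunits - 1 - i);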
2018-01-02Make vec_perm_indices use new vector encodingRichard Sandiford1-12/+12
This patch changes vec_perm_indices from a plain vec<> to a class that stores a canonicalized permutation, using the same encoding as for VECTOR_CSTs. This means that vec_perm_indices now carries information about the number of vectors being permuted (currently always 1 or 2) and the number of elements in each input vector. A new vec_perm_builder class is used to actually build up the vector, like tree_vector_builder does for trees. vec_perm_indices is the completed representation, a bit like VECTOR_CST is for trees. The patch just does a mechanical conversion of the code to vec_perm_builder: a later patch uses explicit encodings where possible. The point of all this is that it makes the representation suitable for variable-length vectors. It's no longer necessary for the underlying vec<>s to store every element explicitly. In int-vector-builder.h, "using the same encoding as tree and rtx constants" describes the endpoint -- adding the rtx encoding comes later. 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * int-vector-builder.h: New file. * vec-perm-indices.h: Include int-vector-builder.h. (vec_perm_indices): Redefine as an int_vector_builder. (auto_vec_perm_indices): Delete. (vec_perm_builder): Redefine as a stand-alone class. (vec_perm_indices::vec_perm_indices): New function. (vec_perm_indices::clamp): Likewise. * vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h. (vec_perm_indices::new_vector): New function. (vec_perm_indices::new_expanded_vector): Update for new vec_perm_indices class. (vec_perm_indices::rotate_inputs): New function. (vec_perm_indices::all_in_range_p): Operate directly on the encoded form, without computing elided elements. (tree_to_vec_perm_builder): Operate directly on the VECTOR_CST encoding. Update for new vec_perm_indices class. * optabs.c (expand_vec_perm_const): Create a vec_perm_indices for the given vec_perm_builder. (expand_vec_perm_var): Update vec_perm_builder constructor. (expand_mult_highpart): Use vec_perm_builder instead of auto_vec_perm_indices. * optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Use a single or double series encoding as appropriate. * fold-const.c (fold_ternary_loc): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. (vect_permute_store_chain): Likewise. (vect_grouped_load_supported): Likewise. (vect_permute_load_chain): Likewise. (vect_shift_permute_load_chain): Likewise. * tree-vect-slp.c (vect_build_slp_tree_1): Likewise. (vect_transform_slp_perm_load): Likewise. (vect_schedule_slp_instance): Likewise. * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. (vectorizable_mask_load_store): Likewise. (vectorizable_bswap): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. * tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Use tree_to_vec_perm_builder to read the vector from a tree. * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a vec_perm_builder instead of a vec_perm_indices. (have_whole_vector_shift): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Leave the truncation to calc_vec_perm_mask_for_shift. (vect_create_epilog_for_reduction): Likewise. * config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change from auto_vec_perm_indices to vec_perm_indices. 
(aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm instead of changing individual elements. (aarch64_vectorize_vec_perm_const): Use new_vector to install the vector in d.perm. * config/arm/arm.c (expand_vec_perm_d::perm): Change from auto_vec_perm_indices to vec_perm_indices. (arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm instead of changing individual elements. (arm_vectorize_vec_perm_const): Use new_vector to install the vector in d.perm. * config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even): Update vec_perm_builder constructor. (rs6000_expand_interleave): Likewise. * config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise. (rs6000_expand_interleave): Likewise. From-SVN: r256095
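A sketch of the new flow for a two-input interleave-low permute (NELT, the number of elements per input, written as a plain integer here):

  /* Selector { 0, nelt, 1, nelt + 1, ... }: two interleaved patterns,
     three explicit elements each.  */
  vec_perm_builder sel (nelt, 2, 3);
  for (unsigned int i = 0; i < 3; ++i)
    {
      sel.quick_push (i);          /* Element I of the first input.  */
      sel.quick_push (i + nelt);   /* Element I of the second input.  */
    }
  /* The completed representation records 2 inputs of NELT elements.  */
  vec_perm_indices indices (sel, 2, nelt);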
2018-01-02Remove vec_perm_const optabRichard Sandiford1-0/+1
One of the changes needed for variable-length VEC_PERM_EXPRs -- and for long fixed-length VEC_PERM_EXPRs -- is the ability to use constant selectors that wouldn't fit in the vectors being permuted. E.g. a permute on two V256QIs can't be done using a V256QI selector. At the moment constant permutes use two interfaces: targetm.vectorizer.vec_perm_const_ok for testing whether a permute is valid and the vec_perm_const optab for actually emitting the permute. The former gets passed a vec<> selector and the latter an rtx selector. Most ports share a lot of code between the hook and the optab, with a wrapper function for each interface. We could try to keep that interface and require ports to define wider vector modes that could be attached to the CONST_VECTOR (e.g. V256HI or V256SI in the example above). But building a CONST_VECTOR rtx seems a bit pointless here, since the expand code only creates the CONST_VECTOR in order to call the optab, and the first thing the target does is take the CONST_VECTOR apart again. The easiest approach therefore seemed to be to remove the optab and reuse the target hook to emit the code. One potential drawback is that it's no longer possible to use match_operand predicates to force operands into the required form, but in practice all targets want register operands anyway. The patch also changes vec_perm_indices into a class that provides some simple routines for handling permutations. A later patch will flesh this out and get rid of auto_vec_perm_indices, but I didn't want to do all that in this patch and make it more complicated than it already is. 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * Makefile.in (OBJS): Add vec-perm-indices.o. * vec-perm-indices.h: New file. * vec-perm-indices.c: Likewise. * target.h (vec_perm_indices): Replace with a forward class declaration. (auto_vec_perm_indices): Move to vec-perm-indices.h. * optabs.h: Include vec-perm-indices.h. (expand_vec_perm): Delete. (selector_fits_mode_p, expand_vec_perm_var): Declare. (expand_vec_perm_const): Declare. * target.def (vec_perm_const_ok): Replace with... (vec_perm_const): ...this new hook. * doc/tm.texi.in (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Replace with... (TARGET_VECTORIZE_VEC_PERM_CONST): ...this new hook. * doc/tm.texi: Regenerate. * optabs.def (vec_perm_const): Delete. * doc/md.texi (vec_perm_const): Likewise. (vec_perm): Refer to TARGET_VECTORIZE_VEC_PERM_CONST. * expr.c (expand_expr_real_2): Use expand_vec_perm_const rather than expand_vec_perm for constant permutation vectors. Assert that the mode of variable permutation vectors is the integer equivalent of the mode that is being permuted. * optabs-query.h (selector_fits_mode_p): Declare. * optabs-query.c: Include vec-perm-indices.h. (selector_fits_mode_p): New function. (can_vec_perm_const_p): Check whether targetm.vectorize.vec_perm_const is defined, instead of checking whether the vec_perm_const_optab exists. Use targetm.vectorize.vec_perm_const instead of targetm.vectorize.vec_perm_const_ok. Check whether the indices fit in the vector mode before using a variable permute. * optabs.c (shift_amt_for_vec_perm_mask): Take a mode and a vec_perm_indices instead of an rtx. (expand_vec_perm): Replace with... (expand_vec_perm_const): ...this new function. Take the selector as a vec_perm_indices rather than an rtx. Also take the mode of the selector. Update call to shift_amt_for_vec_perm_mask. Use targetm.vectorize.vec_perm_const instead of vec_perm_const_optab. 
Use vec_perm_indices::new_expanded_vector to expand the original selector into bytes. Check whether the indices fit in the vector mode before using a variable permute. (expand_vec_perm_var): Make global. (expand_mult_highpart): Use expand_vec_perm_const. * fold-const.c: Includes vec-perm-indices.h. * tree-ssa-forwprop.c: Likewise. * tree-vect-data-refs.c: Likewise. * tree-vect-generic.c: Likewise. * tree-vect-loop.c: Likewise. * tree-vect-slp.c: Likewise. * tree-vect-stmts.c: Likewise. * config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm_const): Delete. * config/aarch64/aarch64-simd.md (vec_perm_const<mode>): Delete. * config/aarch64/aarch64.c (aarch64_expand_vec_perm_const) (aarch64_vectorize_vec_perm_const_ok): Fuse into... (aarch64_vectorize_vec_perm_const): ...this new function. (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. * config/arm/arm-protos.h (arm_expand_vec_perm_const): Delete. * config/arm/vec-common.md (vec_perm_const<mode>): Delete. * config/arm/arm.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. (arm_expand_vec_perm_const, arm_vectorize_vec_perm_const_ok): Merge into... (arm_vectorize_vec_perm_const): ...this new function. Explicitly check for NEON modes. * config/i386/i386-protos.h (ix86_expand_vec_perm_const): Delete. * config/i386/sse.md (VEC_PERM_CONST, vec_perm_const<mode>): Delete. * config/i386/i386.c (ix86_expand_vec_perm_const_1): Update comment. (ix86_expand_vec_perm_const, ix86_vectorize_vec_perm_const_ok): Merge into... (ix86_vectorize_vec_perm_const): ...this new function. Incorporate the old VEC_PERM_CONST conditions. * config/ia64/ia64-protos.h (ia64_expand_vec_perm_const): Delete. * config/ia64/vect.md (vec_perm_const<mode>): Delete. * config/ia64/ia64.c (ia64_expand_vec_perm_const) (ia64_vectorize_vec_perm_const_ok): Merge into... (ia64_vectorize_vec_perm_const): ...this new function. * config/mips/loongson.md (vec_perm_const<mode>): Delete. * config/mips/mips-msa.md (vec_perm_const<mode>): Delete. * config/mips/mips-ps-3d.md (vec_perm_constv2sf): Delete. * config/mips/mips-protos.h (mips_expand_vec_perm_const): Delete. * config/mips/mips.c (mips_expand_vec_perm_const) (mips_vectorize_vec_perm_const_ok): Merge into... (mips_vectorize_vec_perm_const): ...this new function. * config/powerpcspe/altivec.md (vec_perm_constv16qi): Delete. * config/powerpcspe/paired.md (vec_perm_constv2sf): Delete. * config/powerpcspe/spe.md (vec_perm_constv2si): Delete. * config/powerpcspe/vsx.md (vec_perm_const<mode>): Delete. * config/powerpcspe/powerpcspe-protos.h (altivec_expand_vec_perm_const) (rs6000_expand_vec_perm_const): Delete. * config/powerpcspe/powerpcspe.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. (altivec_expand_vec_perm_const_le): Take each operand individually. Operate on constant selectors rather than rtxes. (altivec_expand_vec_perm_const): Likewise. Update call to altivec_expand_vec_perm_const_le. (rs6000_expand_vec_perm_const): Delete. (rs6000_vectorize_vec_perm_const_ok): Delete. (rs6000_vectorize_vec_perm_const): New function. (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of an element count and rtx array. (rs6000_expand_extract_even): Update call accordingly. (rs6000_expand_interleave): Likewise. * config/rs6000/altivec.md (vec_perm_constv16qi): Delete. * config/rs6000/paired.md (vec_perm_constv2sf): Delete. * config/rs6000/vsx.md (vec_perm_const<mode>): Delete. 
* config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_const) (rs6000_expand_vec_perm_const): Delete. * config/rs6000/rs6000.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. (altivec_expand_vec_perm_const_le): Take each operand individually. Operate on constant selectors rather than rtxes. (altivec_expand_vec_perm_const): Likewise. Update call to altivec_expand_vec_perm_const_le. (rs6000_expand_vec_perm_const): Delete. (rs6000_vectorize_vec_perm_const_ok): Delete. (rs6000_vectorize_vec_perm_const): New function. Remove stray reference to the SPE evmerge intructions. (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of an element count and rtx array. (rs6000_expand_extract_even): Update call accordingly. (rs6000_expand_interleave): Likewise. * config/sparc/sparc.md (vec_perm_constv8qi): Delete in favor of... * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): ...this new function. (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine. From-SVN: r256093
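The surviving interface is the single hook; its declaration is shaped roughly like this (parameter names assumed), with null rtxes meaning "just test whether the permute is supported":

  /* Emit a permute of OP0 and OP1 selected by SEL into TARGET and
     return true on success; with null rtxes, only report support.  */
  bool vec_perm_const (machine_mode mode, rtx target, rtx op0, rtx op1,
                       const vec_perm_indices &sel);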
2018-01-02Split can_vec_perm_p into can_vec_perm_{var,const}_pRichard Sandiford1-4/+1
This patch splits can_vec_perm_p into two functions: can_vec_perm_var_p for testing permute operations with variable selection vectors, and can_vec_perm_const_p for testing permute operations with specific constant selection vectors. This means that we can pass the constant selection vector by reference. Constant permutes can still use a variable permute as a fallback. A later patch adds a check to make sure that we don't truncate the vector indices when doing this. However, have_whole_vector_shift checked: if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing) return false; which had the effect of disallowing the fallback to variable permutes. I'm not sure whether that was the intention or whether it was just supposed to short-cut the loop on targets that don't support permutes. (But then why bother? The first check in the loop would fail and we'd bail out straightaway.) The patch adds a parameter for disallowing the fallback. I think it makes sense to do this for the following code in the VEC_PERM_EXPR folder: /* Some targets are deficient and fail to expand a single argument permutation while still allowing an equivalent 2-argument version. */ if (need_mask_canon && arg2 == op2 && !can_vec_perm_p (TYPE_MODE (type), false, &sel) && can_vec_perm_p (TYPE_MODE (type), false, &sel2)) since it's really testing whether the expand_vec_perm_const code expects a particular form. 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * optabs-query.h (can_vec_perm_p): Delete. (can_vec_perm_var_p, can_vec_perm_const_p): Declare. * optabs-query.c (can_vec_perm_p): Split into... (can_vec_perm_var_p, can_vec_perm_const_p): ...these two functions. (can_mult_highpart_p): Use can_vec_perm_const_p to test whether a particular selector is valid. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. (vect_grouped_load_supported): Likewise. (vect_shift_permute_load_chain): Likewise. * tree-vect-slp.c (vect_build_slp_tree_1): Likewise. (vect_transform_slp_perm_load): Likewise. * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. (vectorizable_bswap): Likewise. (vect_gen_perm_mask_checked): Likewise. * fold-const.c (fold_ternary_loc): Likewise. Don't take implementations of variable permutation vectors into account when deciding which selector to use. * tree-vect-loop.c (have_whole_vector_shift): Don't check whether vec_perm_const_optab is supported; instead use can_vec_perm_const_p with a false third argument. * tree-vect-generic.c (lower_vec_perm): Use can_vec_perm_const_p to test whether the constant selector is valid and can_vec_perm_var_p to test whether a variable selector is valid. From-SVN: r256091
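Typical use after the split, with the new third argument controlling the fallback:

  /* Is this specific selector supported?  Passing false rejects the
     fall-back to a variable permute, as have_whole_vector_shift and
     the VEC_PERM_EXPR folder now require.  */
  if (can_vec_perm_const_p (TYPE_MODE (vectype), sel, false))
    ... /* use the constant permute */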
2017-12-21poly_int: loop versioning thresholdRichard Sandiford1-8/+17
This patch splits the loop versioning threshold out from the cost model threshold so that the former can become a poly_uint64. We still use a single test to enforce both limits where possible. 2017-12-21 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (_loop_vec_info): Add a versioning_threshold field. (LOOP_VINFO_VERSIONING_THRESHOLD): New macro (vect_loop_versioning): Take the loop versioning threshold as a separate parameter. * tree-vect-loop-manip.c (vect_loop_versioning): Likewise. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize versioning_threshold. (vect_analyze_loop_2): Compute the loop versioning threshold whenever loop versioning is needed, and store it in the new field rather than combining it with the cost model threshold. (vect_transform_loop): Update call to vect_loop_versioning. Try to combine the loop versioning and cost thresholds here. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255934
2017-12-19re PR tree-optimization/80631 (Compiling with -O3 -mavx2 gives wrong code)Jakub Jelinek1-2/+2
PR tree-optimization/80631 * tree-vect-loop.c (vect_create_epilog_for_reduction): Compare induc_code against MAX_EXPR or MIN_EXPR instead of reduc_fn against IFN_REDUC_MAX or IFN_REDUC_MIN. From-SVN: r255804
2017-12-12re PR tree-optimization/80631 (Compiling with -O3 -mavx2 gives wrong code)Jakub Jelinek1-36/+103
PR tree-optimization/80631 * tree-vect-loop.c (get_initial_def_for_reduction): Fix comment typo. (vect_create_epilog_for_reduction): Add INDUC_VAL and INDUC_CODE arguments, for INTEGER_INDUC_COND_REDUCTION use INDUC_VAL instead of hardcoding zero as the value if COND_EXPR is never true. For INTEGER_INDUC_COND_REDUCTION don't emit the final COND_EXPR if INDUC_VAL is equal to INITIAL_DEF, and use INDUC_CODE instead of hardcoding MAX_EXPR as the reduction operation. (is_nonwrapping_integer_induction): Allow negative step. (vectorizable_reduction): Compute INDUC_VAL and INDUC_CODE for vect_create_epilog_for_reduction, if no value is suitable, don't use INTEGER_INDUC_COND_REDUCTION for now. Formatting fixes. * gcc.dg/vect/pr80631-1.c: New test. * gcc.dg/vect/pr80631-2.c: New test. * gcc.dg/vect/pr65947-13.c: Expect integer induc cond reduction vectorization. From-SVN: r255574
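The affected shape is an integer conditional reduction whose result is the induction variable itself, for example (illustrative, not the PR's exact testcase):

  int
  last_match (const int *a, int n)
  {
    int last = -1;           /* INITIAL_DEF; not necessarily zero.  */
    for (int i = 0; i < n; i++)
      if (a[i] == 42)
        last = i;            /* Reduced value is the induction I.  */
    return last;             /* Must be -1, not 0, when nothing matched.  */
  }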
2017-12-12[SFN] boilerplate changes in preparation to introduce nonbind markersAlexandre Oliva1-2/+2
This patch introduces a number of new macros and functions that will be used to distinguish between different kinds of debug stmts, insns and notes, namely, preexisting debug bind ones and to-be-introduced nonbind markers. In a seemingly mechanical way, it adjusts several uses of the macros and functions, so that they refer to narrower categories when appropriate. These changes, by themselves, should not have any visible effect in the compiler behavior, since the upcoming debug markers are never created with this patch alone. for gcc/ChangeLog * gimple.h (enum gimple_debug_subcode): Add GIMPLE_DEBUG_BEGIN_STMT. (gimple_debug_begin_stmt_p): New. (gimple_debug_nonbind_marker_p): New. * tree.h (MAY_HAVE_DEBUG_MARKER_STMTS): New. (MAY_HAVE_DEBUG_BIND_STMTS): Renamed from.... (MAY_HAVE_DEBUG_STMTS): ... this. Check both. * insn-notes.def (BEGIN_STMT): New. * rtl.h (MAY_HAVE_DEBUG_MARKER_INSNS): New. (MAY_HAVE_DEBUG_BIND_INSNS): Renamed from.... (MAY_HAVE_DEBUG_INSNS): ... this. Check both. (NOTE_MARKER_LOCATION, NOTE_MARKER_P): New. (DEBUG_BIND_INSN_P, DEBUG_MARKER_INSN_P): New. (INSN_DEBUG_MARKER_KIND): New. (GEN_RTX_DEBUG_MARKER_BEGIN_STMT_PAT): New. (INSN_VAR_LOCATION): Check for VAR_LOCATION. (INSN_VAR_LOCATION_PTR): New. * cfgexpand.c (expand_debug_locations): Handle debug bind insns only. (expand_gimple_basic_block): Likewise. Emit debug temps for TER deps only if debug bind insns are enabled. (pass_expand::execute): Avoid deep TER and expand debug locations for debug bind insns only. * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Narrow debug stmts special handling down to debug bind stmts. * combine.c (try_combine): Narrow debug insns special handling down to debug bind insns. * cse.c (delete_trivially_dead_insns): Handle debug bindings. Narrow debug insns preexisting special handling down to debug bind insns. * dce.c (rest_of_handle_ud_dce): Narrow debug insns special handling down to debug bind insns. * function.c (instantiate_virtual_regs): Skip debug markers, adjust handling of debug binds. * gimple-ssa-backprop.c (backprop::prepare_change): Try debug temp insertion iff MAY_HAVE_DEBUG_BIND_STMTS. * haifa-sched.c (schedule_insn): Narrow special handling of debug insns to debug bind insns. * ipa-param-manipulation.c (ipa_modify_call_arguments): Narrow special handling of debug stmts to debug bind stmts. * ipa-split.c (split_function): Likewise. * ira.c (combine_and_move_insns): Adjust debug bind insns only. * loop-unroll.c (apply_opt_in_copies): Adjust tests on bind debug insns. * reg-stack.c (convert_regs_1): Use DEBUG_BIND_INSN_P. * regrename.c (build_def_use): Likewise. * regcprop.c (copyprop_hardreg_forward_1): Likewise. (pass_cprop_hardreg): Narrow special casing of debug insns to debug bind insns. * regstat.c (regstat_init_n_sets_and_refs): Likewise. * reload1.c (reload): Likewise. * sese.c (sese_insert_phis_for_liveouts): Narrow special casing of debug stmts to debug bind stmts. * shrink-wrap.c (move_insn_for_shrink_wrap): Likewise. * ssa-iterators.h (num_imm_uses): Likewise. * tree-cfg.c (gimple_merge_blocks): Narrow special casing of debug stmts to debug bind stmts. * tree-inline.c (tree_function_versioning): Narrow special casing of debug stmts to debug bind stmts. * tree-loop-distribution.c (generate_loops_for_partition): Narrow special casing of debug stmts to debug bind stmts. * tree-sra.c (analyze_access_subtree): Narrow special casing of debug stmts to debug bind stmts. 
* tree-ssa-dce.c (remove_dead_stmt): Narrow special casing of debug stmts to debug bind stmts. * tree-ssa-loop-ivopts.c (remove_unused_ivs): Narrow special casing of debug stmts to debug bind stmts. * tree-ssa-reassoc.c (reassoc_remove_stmt): Likewise. * tree-ssa-tail-merge.c (tail_merge_optimize): Narrow special casing of debug stmts to debug bind stmts. * tree-ssa-threadedge.c (propagate_threaded_block_debug_info): Likewise. * tree-ssa.c (flush_pending_stmts): Narrow special casing of debug stmts to debug bind stmts. (gimple_replace_ssa_lhs): Likewise. (insert_debug_temp_for_var_def): Likewise. (insert_debug_temps_for_defs): Likewise. (reset_debug_uses): Likewise. * tree-ssanames.c (release_ssa_name_fn): Likewise. * tree-vect-loop-manip.c (adjust_debug_stmts_now): Likewise. (adjust_debug_stmts): Likewise. (adjust_phi_and_debug_stmts): Likewise. (vect_do_peeling): Likewise. * tree-vect-loop.c (vect_transform_loop): Likewise. * valtrack.c (propagate_for_debug): Use DEBUG_BIND_INSN_P. * var-tracking.c (adjust_mems): Narrow special casing of debug insns to debug bind insns. (dv_onepart_p, dataflow_set_clear_at_call, use_type): Likewise. (compute_bb_dataflow, vt_find_locations): Likewise. (vt_expand_loc, emit_notes_for_changes): Likewise. (vt_init_cfa_base): Likewise. (vt_emit_notes): Likewise. (vt_initialize): Likewise. (vt_finalize): Likewise. From-SVN: r255565
2017-12-07Make gimple_build_vector take a tree_vector_builderRichard Sandiford1-13/+11
This patch changes gimple_build_vector so that it takes a tree_vector_builder instead of a size and a vector of trees. 2017-12-07 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * vector-builder.h (vector_builder::derived): New const overload. (vector_builder::elt): New function. * tree-vector-builder.h (tree_vector_builder::type): New function. (tree_vector_builder::apply_step): Declare. * tree-vector-builder.c (tree_vector_builder::apply_step): New function. * gimple-fold.h (tree_vector_builder): Declare. (gimple_build_vector): Take a tree_vector_builder instead of a type and vector of elements. * gimple-fold.c (gimple_build_vector): Likewise. * tree-vect-loop.c (get_initial_def_for_reduction): Update call accordingly. (get_initial_defs_for_reduction): Likewise. (vectorizable_induction): Likewise. From-SVN: r255478
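Call sites now hand over the builder itself; a sketch of the new shape (element values hypothetical):

  tree_vector_builder elts (vectype, const_nunits, 1);
  for (unsigned int i = 0; i < const_nunits; ++i)
    elts.quick_push (elt);   /* Some element computed earlier.  */
  /* The builder carries the type and the encoding, so there is no
     separate type or element-count argument any more.  */
  tree vec_init = gimple_build_vector (&stmts, &elts);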
2017-12-07Use tree_vector_builder instead of build_vectorRichard Sandiford1-3/+4
This patch switches most build_vector calls over to tree_vector_builder, using explicit encodings where appropriate. Later patches handle the remaining uses of build_vector. 2017-12-07 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * config/sparc/sparc.c: Include tree-vector-builder.h. (sparc_fold_builtin): Use tree_vector_builder instead of build_vector. * expmed.c: Include tree-vector-builder.h. (make_tree): Use tree_vector_builder instead of build_vector. * fold-const.c: Include tree-vector-builder.h. (const_binop): Use tree_vector_builder instead of build_vector. (const_unop): Likewise. (native_interpret_vector): Likewise. (fold_vec_perm): Likewise. (fold_ternary_loc): Likewise. * gimple-fold.c: Include tree-vector-builder.h. (gimple_fold_stmt_to_constant_1): Use tree_vector_builder instead of build_vector. * tree-ssa-forwprop.c: Include tree-vector-builder.h. (simplify_vector_constructor): Use tree_vector_builder instead of build_vector. * tree-vect-generic.c: Include tree-vector-builder.h. (add_rshift): Use tree_vector_builder instead of build_vector. (expand_vector_divmod): Likewise. (optimize_vector_constructor): Likewise. * tree-vect-loop.c: Include tree-vector-builder.h. (vect_create_epilog_for_reduction): Use tree_vector_builder instead of build_vector. Explicitly use a stepped encoding for { 1, 2, 3, ... }. * tree-vect-slp.c: Include tree-vector-builder.h. (vect_get_constant_vectors): Use tree_vector_builder instead of build_vector. (vect_transform_slp_perm_load): Likewise. (vect_schedule_slp_instance): Likewise. * tree-vect-stmts.c: Include tree-vector-builder.h. (vectorizable_bswap): Use tree_vector_builder instead of build_vector. (vect_gen_perm_mask_any): Likewise. (vectorizable_call): Likewise. Explicitly use a stepped encoding. * tree.c: (build_vector_from_ctor): Use tree_vector_builder instead of build_vector. (build_vector_from_val): Likewise. Explicitly use a duplicate encoding. From-SVN: r255475
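The explicit stepped encoding for { 1, 2, 3, ... } looks like this (variable names assumed from vect_create_epilog_for_reduction):

  /* One pattern, three elements per pattern: elements beyond the
     third are implied by the step, so no full element list is built.  */
  tree_vector_builder vtemp (cr_index_vector_type, 1, 3);
  for (unsigned int i = 0; i < 3; ++i)
    vtemp.quick_push (build_int_cst (cr_index_scalar_type, i + 1));
  tree series_vect = vtemp.build ();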
2017-12-07re PR tree-optimization/81303 (410.bwaves regression caused by r249919)Bin Cheng1-100/+110
PR tree-optimization/81303 * Makefile.in (gimple-loop-interchange.o): New object file. * common.opt (floop-interchange): Reuse the option from graphite. * doc/invoke.texi (-floop-interchange): Ditto. New document for -floop-interchange and mention it for -O3. * opts.c (default_options_table): Enable -floop-interchange at -O3. * gimple-loop-interchange.cc: New file. * params.def (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS): New parameter. (PARAM_LOOP_INTERCHANGE_STRIDE_RATIO): New parameter. * passes.def (pass_linterchange): New pass. * timevar.def (TV_LINTERCHANGE): New time var. * tree-pass.h (make_pass_linterchange): New declaration. * tree-ssa-loop-ivcanon.c (create_canonical_iv): Change to external interchange. Record IV before/after increment in new parameters. * tree-ssa-loop-ivopts.h (create_canonical_iv): New declaration. * tree-vect-loop.c (vect_is_simple_reduction): Factor out reduction path check into... (check_reduction_path): ...New function here. * tree-vectorizer.h (check_reduction_path): New declaration. gcc/testsuite * gcc.dg/tree-ssa/loop-interchange-1.c: New test. * gcc.dg/tree-ssa/loop-interchange-1b.c: New test. * gcc.dg/tree-ssa/loop-interchange-2.c: New test. * gcc.dg/tree-ssa/loop-interchange-3.c: New test. * gcc.dg/tree-ssa/loop-interchange-4.c: New test. * gcc.dg/tree-ssa/loop-interchange-5.c: New test. * gcc.dg/tree-ssa/loop-interchange-6.c: New test. * gcc.dg/tree-ssa/loop-interchange-7.c: New test. * gcc.dg/tree-ssa/loop-interchange-8.c: New test. * gcc.dg/tree-ssa/loop-interchange-9.c: New test. * gcc.dg/tree-ssa/loop-interchange-10.c: New test. * gcc.dg/tree-ssa/loop-interchange-11.c: New test. * gcc.dg/tree-ssa/loop-interchange-12.c: New test. * gcc.dg/tree-ssa/loop-interchange-13.c: New test. Co-Authored-By: Richard Biener <rguenther@suse.de> From-SVN: r255472
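The canonical case the pass targets (an illustrative reduction of the bwaves-style kernels in the PR):

  /* The inner loop walks A and B with stride N; after interchange the
     inner loop is over J and every access has stride 1.  */
  for (int j = 0; j < n; j++)
    for (int i = 0; i < n; i++)
      a[i][j] += b[i][j];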
2017-11-22Replace REDUC_*_EXPRs with internal functions.Richard Sandiford1-63/+55
This patch replaces the REDUC_*_EXPR tree codes with internal functions. This is needed so that the upcoming in-order reductions can also use internal functions without too much complication. 2017-11-22 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Delete. * cfgexpand.c (expand_debug_expr): Remove handling for them. * expr.c (expand_expr_real_2): Likewise. * fold-const.c (const_unop): Likewise. * optabs-tree.c (optab_for_tree_code): Likewise. * tree-cfg.c (verify_gimple_assign_unary): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. (op_code_prio): Likewise. (op_symbol_code): Likewise. * internal-fn.def (DEF_INTERNAL_SIGNED_OPTAB_FN): Define. (IFN_REDUC_PLUS, IFN_REDUC_MAX, IFN_REDUC_MIN): New internal functions. * internal-fn.c (direct_internal_fn_optab): New function. (direct_internal_fn_array, direct_internal_fn_supported_p (internal_fn_expanders): Handle DEF_INTERNAL_SIGNED_OPTAB_FN. * fold-const-call.c (fold_const_reduction): New function. (fold_const_call): Handle CFN_REDUC_PLUS, CFN_REDUC_MAX and CFN_REDUC_MIN. * tree-vect-loop.c: Include internal-fn.h. (reduction_code_for_scalar_code): Rename to... (reduction_fn_for_scalar_code): ...this and return an internal function. (vect_model_reduction_cost): Take an internal_fn rather than a tree_code. (vect_create_epilog_for_reduction): Likewise. Build calls rather than assignments. (vectorizable_reduction): Use internal functions rather than tree codes for the reduction operation. Update calls to the functions above. * config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin): Use calls to internal functions rather than REDUC tree codes. * config/aarch64/aarch64-simd.md: Update comment accordingly. From-SVN: r255073
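In GIMPLE dumps the change shows up as internal-function calls replacing the tree codes (illustrative SSA names):

  before:  sum_5 = REDUC_PLUS_EXPR <vect_sum.9_8>;
  after:   sum_5 = .REDUC_PLUS (vect_sum.9_8);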
2017-11-03asan.c (create_cond_insert_point): Maintain profile.Jan Hubicka1-10/+4
* asan.c (create_cond_insert_point): Maintain profile. * ipa-utils.c (ipa_merge_profiles): Be sure only IPA profiles are merged. * basic-block.h (struct basic_block_def): Remove frequency. (EDGE_FREQUENCY): Use to_frequency * bb-reorder.c (push_to_next_round_p): Use only IPA counts for global heuristics. (find_traces): Update to use to_frequency. (find_traces_1_round): Likewise; use only IPA counts. (bb_to_key): Likewise. (connect_traces): Use IPA counts only. (copy_bb_p): Update to use to_frequency. (fix_up_crossing_landing_pad): Likewise. (sanitize_hot_paths): Likewise. * bt-load.c (basic_block_freq): Likewise. * cfg.c (init_flow): Set count_max to uninitialized. (check_bb_profile): Remove frequencies; check counts. (dump_bb_info): Do not dump frequencies. (update_bb_profile_for_threading): Update counts only. (scale_bbs_frequencies_int): Likewise. (MAX_SAFE_MULTIPLIER): Remove. (scale_bbs_frequencies_gcov_type): Update counts only. (scale_bbs_frequencies_profile_count): Update counts only. (scale_bbs_frequencies): Update counts only. * cfg.h (struct control_flow_graph): Add count-max. (update_bb_profile_for_threading): Update prototype. * cfgbuild.c (find_bb_boundaries): Do not update frequencies. (find_many_sub_basic_blocks): Likewise. * cfgcleanup.c (try_forward_edges): Likewise. (try_crossjump_to_edge): Likewise. * cfgexpand.c (expand_gimple_cond): Likewise. (expand_gimple_tailcall): Likewise. (construct_init_block): Likewise. (construct_exit_block): Likewise. * cfghooks.c (verify_flow_info): Check consistency of counts. (dump_bb_for_graph): Do not dump frequencies. (split_block_1): Do not update frequencies. (split_edge): Do not update frequencies. (make_forwarder_block): Do not update frequencies. (duplicate_block): Do not update frequencies. (account_profile_record): Do not update frequencies. * cfgloop.c (find_subloop_latch_edge_by_profile): Use IPA counts for global heuristics. * cfgloopanal.c (average_num_loop_insns): Update to use to_frequency. (expected_loop_iterations_unbounded): Use counts only. * cfgloopmanip.c (scale_loop_profile): Simplify. (create_empty_loop_on_edge): Simplify (loopify): Simplify (duplicate_loop_to_header_edge): Simplify * cfgrtl.c (force_nonfallthru_and_redirect): Update profile. (update_br_prob_note): Take care of removing note when profile becomes undefined. (relink_block_chain): Do not dump frequency. (rtl_account_profile_record): Use to_frequency. * cgraph.c (symbol_table::create_edge): Convert count to ipa count. (cgraph_edge::redirect_call_stmt_to_calle): Conver tcount to ipa count. (cgraph_update_edges_for_call_stmt_node): Likewise. (cgraph_edge::verify_count_and_frequency): Update. (cgraph_node::verify_node): Temporarily disable frequency verification. * cgraphbuild.c (compute_call_stmt_bb_frequency): Use to_cgraph_frequency. (cgraph_edge::rebuild_edges): Convert to ipa counts. * cgraphunit.c (init_lowered_empty_function): Do not initialize frequencies. (cgraph_node::expand_thunk): Update profile. * except.c (dw2_build_landing_pads): Do not update frequency. * final.c (compute_alignments): Use to_frequency. (dump_basic_block_info): Do not dump frequency. * gimple-pretty-print.c (dump_profile): Do not dump frequency. (dump_gimple_bb_header): Do not dump frequency. * gimple-ssa-isolate-paths.c (isolate_path): Do not update frequency; do update count. * gimple-streamer-in.c (input_bb): Do not stream frequency. * gimple-streamer-out.c (output_bb): Do not stream frequency. * haifa-sched.c (sched_pressure_start_bb): Use to_freuqency. 
(init_before_recovery): Do not update frequency. (sched_create_recovery_edges): Do not update frequency. * hsa-gen.c (convert_switch_statements): Do not update frequency. * ipa-cp.c (ipcp_propagate_stage): Update search for max_count. (ipa_cp_c_finalize): Set max_count to uninitialized. * ipa-fnsummary.c (get_minimal_bb): Use counts. (param_change_prob): Use counts. * ipa-profile.c (ipa_profile_generate_summary): Do not summarize local profiles. * ipa-split.c (consider_split): Use to_frequency. (split_function): Use to_frequency. * ira-build.c (loop_compare_func): Likewise. (mark_loops_for_removal): Likewise. (mark_all_loops_for_removal): Likewise. * loop-doloop.c (doloop_modify): Do not update frequency. * loop-unroll.c (unroll_loop_runtime_iterations): Do not update frequency. * lto-streamer-in.c (input_function): Update count_max. * omp-expand.c (expand_omp_taskreg): Update count_max. * omp-simd-clone.c (simd_clone_adjust): Update profile. * predict.c (maybe_hot_frequency_p): Use to_frequency. (maybe_hot_count_p): Use ipa counts only. (maybe_hot_bb_p): Simplify. (maybe_hot_edge_p): Simplify. (probably_never_executed): Do not take frequency argument. (probably_never_executed_bb_p): Do not pass frequency. (probably_never_executed_edge_p): Likewise. (combine_predictions_for_bb): Check that profile is nonzero. (propagate_freq): Do not set frequency. (drop_profile): Simplify. (counts_to_freqs): Simplify. (expensive_function_p): Use to_frequency. (propagate_unlikely_bbs_forward): Simplify. (determine_unlikely_bbs): Simplify. (estimate_bb_frequencies): Add hack to silence graphite issues. (compute_function_frequency): Use ipa counts. (pass_profile::execute): Update. (rebuild_frequencies): Use counts only. (force_edge_cold): Use counts only. * profile-count.c (profile_count::dump): Dump new count types. (profile_count::differs_from_p): Check compatiblity. (profile_count::to_frequency): New function. (profile_count::to_cgraph_frequency): New function. * profile-count.h (struct function): Declare. (enum profile_quality): Add profile_guessed_local and profile_guessed_global0. (class profile_proability): Decrease number of bits to 29; update from_reg_br_prob_note and to_reg_br_prob_note. (class profile_count: Update comment; decrease number of bits to 61. Check compatibility. (profile_count::compatible_p): New private member function. (profile_count::ipa_p): New member function. (profile_count::operator<): Handle global zero correctly. (profile_count::operator>): Handle global zero correctly. (profile_count::operator<=): Handle global zero correctly. (profile_count::operator>=): Handle global zero correctly. (profile_count::nonzero_p): New member function. (profile_count::force_nonzero): New member function. (profile_count::max): New member function. (profile_count::apply_scale): Handle IPA scalling. (profile_count::guessed_local): New member function. (profile_count::global0): New member function. (profile_count::ipa): New member function. (profile_count::to_frequency): Declare. (profile_count::to_cgraph_frequency): Declare. * profile.c (OVERLAP_BASE): Delete. (compute_frequency_overlap): Delete. (compute_branch_probabilities): Do not use compute_frequency_overlap. * regs.h (REG_FREQ_FROM_BB): Use to_frequency. * sched-ebb.c (rank): Use counts only. * shrink-wrap.c (handle_simple_exit): Use counts only. (try_shrink_wrapping): Use counts only. (place_prologue_for_one_component): Use counts only. * tracer.c (find_best_predecessor): Use to_frequency. (find_trace): Use to_frequency. 
(tail_duplicate): Use to_frequency. * trans-mem.c (expand_transaction): Do not update frequency. * tree-call-cdce.c: Do not update frequency. * tree-cfg.c (gimple_find_sub_bbs): Likewise. (gimple_merge_blocks): Likewise. (gimple_split_edge): Likewise. (gimple_duplicate_sese_region): Likewise. (gimple_duplicate_sese_tail): Likewise. (move_sese_region_to_fn): Likewise. (gimple_account_profile_record): Likewise. (insert_cond_bb): Likewise. * tree-complex.c (expand_complex_div_wide): Likewise. * tree-eh.c (lower_resx): Update profile. * tree-inline.c (copy_bb): Simplify count scaling; do not scale frequencies. (initialize_cfun): Do not initialize frequencies (freqs_to_counts): Delete. (copy_cfg_body): Ignore count parameter. (copy_body): Update. (expand_call_inline): Update count_max. (optimize_inline_calls): Update count_max. (tree_function_versioning): Update count_max. * tree-ssa-coalesce.c (coalesce_cost_bb): Use to_frequency. * tree-ssa-ifcombine.c (update_profile_after_ifcombine): Do not update frequency. * tree-ssa-loop-im.c (execute_sm_if_changed): Use counts only. * tree-ssa-loop-ivcanon.c (unloop_loops): Do not update freuqency. (try_peel_loop): Likewise. * tree-ssa-loop-ivopts.c (get_scaled_computation_cost_at): Use to_frequency. * tree-ssa-loop-manip.c (niter_for_unrolled_loop): Pass -1. (tree_transform_and_unroll_loop): Do not use frequencies * tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Use reliable prediction only. * tree-ssa-loop-unswitch.c (hoist_guard): Do not use frequencies. * tree-ssa-sink.c (select_best_block): Use to_frequency. * tree-ssa-tail-merge.c (replace_block_by): Temporarily disable probability scaling. * tree-ssa-threadupdate.c (create_block_for_threading): Do not update frequency (any_remaining_duplicated_blocks): Likewise. (update_profile): Likewise. (estimated_freqs_path): Delete. (freqs_to_counts_path): Delete. (clear_counts_path): Delete. (ssa_fix_duplicate_block_edges): Likewise. (duplicate_thread_path): Likewise. * tree-switch-conversion.c (gen_inbound_check): Use counts. * tree-tailcall.c (decrease_profile): Do not update frequency. (eliminate_tail_call): Likewise. * tree-vect-loop-manip.c (vect_do_peeling): Likewise. * tree-vect-loop.c (scale_profile_for_vect_loop): Likewise. (optimize_mask_stores): Likewise. * tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise. * ubsan.c (ubsan_expand_null_ifn): Update profile. (ubsan_expand_ptr_ifn): Update profile. * value-prof.c (gimple_ic): Simplify. * value-prof.h (gimple_ic): Update prototype. * ipa-inline-transform.c (inline_transform): Fix scaling conditoins. * ipa-inline.c (compute_uninlined_call_time): Be sure that counts are nonzero. (want_inline_self_recursive_call_p): Likewise. (resolve_noninline_speculation): Only cummulate defined counts. (inline_small_functions): Use nonzero_p. (ipa_inline): Do not access freed node. Unknown ChangeLog: 2017-11-02 Jan Hubicka <hubicka@ucw.cz> * testsuite/gcc.dg/no-strict-overflow-3.c (foo): Update magic value to not clash with frequency. * testsuite/gcc.dg/strict-overflow-3.c (foo): Likewise. * testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c: Update template. * testsuite/gcc.dg/tree-ssa/dump-2.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-10.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-11.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-12.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-20040816-1.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-20040816-2.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-5.c: Update template. 
* testsuite/gcc.dg/tree-ssa/ifc-8.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-9.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-cd.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-pr56541.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-pr68583.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-pr69489-1.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-pr69489-2.c: Update template. * testsuite/gcc.target/i386/pr61403.c: Update template. From-SVN: r254379
2017-10-23Use SCALAR_TYPE_MODE in vect_create_epilog_for_reductionRichard Sandiford1-1/+1
This follows on from similar changes a couple of months ago and is needed when general modes have variable size. 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-loop.c (vect_create_epilog_for_reduction): Use SCALAR_TYPE_MODE instead of TYPE_MODE. From-SVN: r254002
2017-10-20re PR tree-optimization/82473 (ICE in vect_get_vec_def_for_stmt_copy, at tree-vect-stmts.c:1524)Richard Biener1-2/+5
2017-10-20 Richard Biener <rguenther@suse.de> PR tree-optimization/82473 * tree-vect-loop.c (vectorizable_reduction): Properly get at the largest input type. * gcc.dg/torture/pr82473.c: New testcase. From-SVN: r253937
2017-10-19asan.c (create_cond_insert_point): Do not update edge count.Jan Hubicka1-4/+1
	* asan.c (create_cond_insert_point): Do not update edge count.
	* auto-profile.c (afdo_propagate_edge): Update for edge count removal.
	(afdo_propagate_circuit): Likewise.
	(afdo_calculate_branch_prob): Likewise.
	(afdo_annotate_cfg): Likewise.
	* basic-block.h (struct edge_def): Remove count.
	(edge_def::count): New accessor.
	* bb-reorder.c (rotate_loop): Update.
	(find_traces_1_round): Update.
	(connect_traces): Update.
	(sanitize_hot_paths): Update.
	* cfg.c (unchecked_make_edge): Update.
	(make_single_succ_edge): Update.
	(check_bb_profile): Update.
	(dump_edge_info): Update.
	(update_bb_profile_for_threading): Update.
	(scale_bbs_frequencies_int): Update.
	(scale_bbs_frequencies_gcov_type): Update.
	(scale_bbs_frequencies_profile_count): Update.
	(scale_bbs_frequencies): Update.
	* cfganal.c (connect_infinite_loops_to_exit): Update.
	* cfgbuild.c (compute_outgoing_frequencies): Update.
	(find_many_sub_basic_blocks): Update.
	* cfgcleanup.c (try_forward_edges): Update.
	(try_crossjump_to_edge): Update.
	* cfgexpand.c (expand_gimple_cond): Update.
	(expand_gimple_tailcall): Update.
	(construct_exit_block): Update.
	* cfghooks.c (verify_flow_info): Update.
	(redirect_edge_succ_nodup): Update.
	(split_edge): Update.
	(make_forwarder_block): Update.
	(duplicate_block): Update.
	(account_profile_record): Update.
	* cfgloop.c (find_subloop_latch_edge_by_profile): Update.
	* cfgloopanal.c (expected_loop_iterations_unbounded): Update.
	* cfgloopmanip.c (scale_loop_profile): Update.
	(loopify): Update.
	(lv_adjust_loop_entry_edge): Update.
	* cfgrtl.c (try_redirect_by_replacing_jump): Update.
	(force_nonfallthru_and_redirect): Update.
	(purge_dead_edges): Update.
	(rtl_flow_call_edges_add): Update.
	* cgraphunit.c (init_lowered_empty_function): Update.
	(cgraph_node::expand_thunk): Update.
	* gimple-pretty-print.c (dump_probability): Update.
	(dump_edge_probability): Update.
	* gimple-ssa-isolate-paths.c (isolate_path): Update.
	* haifa-sched.c (sched_create_recovery_edges): Update.
	* hsa-gen.c (convert_switch_statements): Update.
	* ifcvt.c (dead_or_predicable): Update.
	* ipa-inline-transform.c (inline_transform): Update.
	* ipa-split.c (split_function): Update.
	* ipa-utils.c (ipa_merge_profiles): Update.
	* loop-doloop.c (add_test): Update.
	* loop-unroll.c (unroll_loop_runtime_iterations): Update.
	* lto-streamer-in.c (input_cfg): Update.
	(input_function): Update.
	* lto-streamer-out.c (output_cfg): Update.
	* modulo-sched.c (sms_schedule): Update.
	* postreload-gcse.c (eliminate_partially_redundant_load): Update.
	* predict.c (maybe_hot_edge_p): Update.
	(unlikely_executed_edge_p): Update.
	(probably_never_executed_edge_p): Update.
	(dump_prediction): Update.
	(drop_profile): Update.
	(propagate_unlikely_bbs_forward): Update.
	(determine_unlikely_bbs): Update.
	(force_edge_cold): Update.
	* profile.c (compute_branch_probabilities): Update.
	* reg-stack.c (better_edge): Update.
	* shrink-wrap.c (handle_simple_exit): Update.
	* tracer.c (better_p): Update.
	* trans-mem.c (expand_transaction): Update.
	(split_bb_make_tm_edge): Update.
	* tree-call-cdce.c: Update.
	* tree-cfg.c (gimple_find_sub_bbs): Update.
	(gimple_split_edge): Update.
	(gimple_duplicate_sese_region): Update.
	(gimple_duplicate_sese_tail): Update.
	(gimple_flow_call_edges_add): Update.
	(insert_cond_bb): Update.
	(execute_fixup_cfg): Update.
	* tree-cfgcleanup.c (cleanup_control_expr_graph): Update.
	* tree-complex.c (expand_complex_div_wide): Update.
	* tree-eh.c (lower_resx): Update.
	(unsplit_eh): Update.
	(cleanup_empty_eh_move_lp): Update.
	* tree-inline.c (copy_edges_for_bb): Update.
	(freqs_to_counts): Update.
	(copy_cfg_body): Update.
	* tree-ssa-dce.c (remove_dead_stmt): Update.
	* tree-ssa-ifcombine.c (update_profile_after_ifcombine): Update.
	* tree-ssa-loop-im.c (execute_sm_if_changed): Update.
	* tree-ssa-loop-ivcanon.c (remove_exits_and_undefined_stmts): Update.
	(unloop_loops): Update.
	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Update.
	* tree-ssa-loop-split.c (connect_loops): Update.
	(split_loop): Update.
	* tree-ssa-loop-unswitch.c (hoist_guard): Update.
	* tree-ssa-phionlycprop.c (propagate_rhs_into_lhs): Update.
	* tree-ssa-phiopt.c (replace_phi_edge_with_variable): Update.
	* tree-ssa-reassoc.c (branch_fixup): Update.
	* tree-ssa-tail-merge.c (replace_block_by): Update.
	* tree-ssa-threadupdate.c (remove_ctrl_stmt_and_useless_edges): Update.
	(compute_path_counts): Update.
	(update_profile): Update.
	(recompute_probabilities): Update.
	(update_joiner_offpath_counts): Update.
	(estimated_freqs_path): Update.
	(freqs_to_counts_path): Update.
	(clear_counts_path): Update.
	(ssa_fix_duplicate_block_edges): Update.
	(duplicate_thread_path): Update.
	* tree-switch-conversion.c (hoist_edge_and_branch_if_true): Update.
	(case_bit_test_cmp): Update.
	(collect_switch_conv_info): Update.
	(gen_inbound_check): Update.
	(do_jump_if_equal): Update.
	(emit_cmp_and_jump_insns): Update.
	* tree-tailcall.c (decrease_profile): Update.
	(eliminate_tail_call): Update.
	* tree-vect-loop-manip.c (slpeel_add_loop_guard): Update.
	(vect_do_peeling): Update.
	* tree-vect-loop.c (scale_profile_for_vect_loop): Update.
	* ubsan.c (ubsan_expand_null_ifn): Update.
	(ubsan_expand_ptr_ifn): Update.
	* value-prof.c (gimple_divmod_fixed_value): Update.
	(gimple_mod_pow2): Update.
	(gimple_mod_subtract): Update.
	(gimple_ic): Update.
	(gimple_stringop_fixed_value): Update.

From-SVN: r253910
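As a minimal sketch of the accessor pattern this change introduces (assuming GCC's profile_count and profile_probability types; the body shown here is illustrative, not a quote of basic-block.h): the stored per-edge count is replaced by a value derived on demand from the source block, so the two quantities can no longer drift out of sync.

    /* Sketch only: edge counts are no longer stored; they are computed
       as the source block's count scaled by the edge probability.  */
    inline profile_count
    edge_def::count () const
    {
      return src->count.apply_probability (probability);
    }

Each "Update" entry above is then largely mechanical: reads and writes of the removed field become calls to the accessor or adjustments to block counts and edge probabilities.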
2017-09-18  Move computation of SLP_TREE_NUMBER_OF_VEC_STMTS  Richard Sandiford  (1 file, -2/+1)
Previously SLP_TREE_NUMBER_OF_VEC_STMTS was calculated while scheduling an SLP tree after analysis, but sometimes it can be useful to know the value during analysis too. This patch moves the calculation to vect_slp_analyze_node_operations instead; a simplified model of the quantity follows this entry.

2017-09-18  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (vect_slp_analyze_operations): Replace parameters with a vec_info *.
	* tree-vect-loop.c (vect_analyze_loop_operations): Update call accordingly.
	* tree-vect-slp.c (vect_slp_analyze_node_operations): Add vec_info * parameter.  Set SLP_TREE_NUMBER_OF_VEC_STMTS here rather than in vect_schedule_slp_instance.
	(vect_slp_analyze_operations): Replace parameters with a vec_info *.  Update call to vect_slp_analyze_node_operations.  Simplify return value.
	(vect_slp_analyze_bb_1): Update call accordingly.
	(vect_schedule_slp_instance): Remove vectorization_factor parameter.  Don't calculate SLP_TREE_NUMBER_OF_VEC_STMTS here.
	(vect_schedule_slp): Update call accordingly.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r252935
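As a rough sketch of the value being moved (names and the exact guard are simplifications, not the literal GCC code): for loop vectorization the number of vector statements for an SLP node follows directly from the group size, the vectorization factor and the vector element count, which is why it is already computable during analysis.

    /* Simplified model: VF * GROUP_SIZE scalar results must be
       produced, NUNITS at a time, so the node needs this many vector
       statements.  */
    static unsigned
    slp_tree_number_of_vec_stmts (unsigned vf, unsigned group_size,
				  unsigned nunits)
    {
      gcc_assert ((vf * group_size) % nunits == 0);
      return vf * group_size / nunits;
    }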
2017-09-18  Fix type of bitstart in vectorizable_live_operation  Richard Sandiford  (1 file, -1/+1)
This patch changes the type of the multiplier applied by vectorizable_live_operation from unsigned_type_node to bitsizetype, which matches the type of TYPE_SIZE and is the type expected of a BIT_FIELD_REF bit position. This is shown by existing tests when SVE is added.

2017-09-18  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-loop.c (vectorizable_live_operation): Fix type of bitstart.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r252931
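A hedged sketch of the construct in question, using standard GCC tree APIs (the variable names and lane handling here are illustrative): operand 2 of a BIT_FIELD_REF is a bit position, and bit positions share the type of TYPE_SIZE, i.e. bitsizetype.

    /* Element size in bits; TYPE_SIZE is already in bitsizetype.  */
    tree bitsize = TYPE_SIZE (TREE_TYPE (vectype));
    /* Lane number as a bitsizetype constant, not unsigned_type_node.  */
    tree bitstart = bitsize_int (lane);
    /* Bit position of the lane: lane * element-size, computed in
       bitsizetype so it is valid as BIT_FIELD_REF operand 2.  */
    bitstart = int_const_binop (MULT_EXPR, bitsize, bitstart);
    tree ref = build3 (BIT_FIELD_REF, TREE_TYPE (vectype),
		       vec_lhs, bitsize, bitstart);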
2017-09-18  Fix vectorizable_live_operation handling of vector booleans  Richard Sandiford  (1 file, -1/+3)
vectorizable_live_operation needs to use BIT_FIELD_REF to extract one element of a vector. For a packed vector boolean type, the number of bits to extract should be taken from TYPE_PRECISION rather than TYPE_SIZE. This is shown by existing tests once SVE is added.

2017-09-18  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-loop.c (vectorizable_live_operation): Fix element size calculation for vector booleans.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r252930
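The fix amounts to special-casing packed boolean vectors when choosing the BIT_FIELD_REF width; a sketch along the lines described above, assuming GCC's tree accessors:

    tree bitsize;
    if (VECTOR_BOOLEAN_TYPE_P (vectype))
      /* Packed booleans: each element occupies TYPE_PRECISION bits,
	 which can be narrower than the element type's TYPE_SIZE.  */
      bitsize = build_int_cst (bitsizetype,
			       TYPE_PRECISION (TREE_TYPE (vectype)));
    else
      bitsize = TYPE_SIZE (TREE_TYPE (vectype));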
2017-09-18  re PR tree-optimization/82220 (SPEC CPU2006 482.sphinx3 ~10% performance regression with trunk@250416)  Richard Biener  (1 file, -2/+2)

2017-09-18  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/82220
	* tree-vect-loop.c (vect_estimate_min_profitable_iters): Exclude epilogue niters from the min_profitable_iters compute.

From-SVN: r252917
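To see why charging the epilogue to the vector body skews the threshold, here is a deliberately toy cost model (all names and the formula are hypothetical, not the GCC computation): epilogue iterations run as scalar code, so they must be costed at the scalar rate when searching for the break-even iteration count.

    /* Toy model: smallest NITERS for which the vector loop plus its
       scalar epilogue beats running everything scalar.  Purely
       illustrative.  */
    static int
    toy_min_profitable_iters (int scalar_iter_cost, int vec_iter_cost,
			      int vec_overhead, int epilogue_iters, int vf)
    {
      for (int niters = vf + epilogue_iters; niters < 100000; niters += vf)
	{
	  int vec_cost = vec_overhead
			 + ((niters - epilogue_iters) / vf) * vec_iter_cost
			 + epilogue_iters * scalar_iter_cost;
	  if (vec_cost < niters * scalar_iter_cost)
	    return niters;
	}
      return -1;  /* never profitable under this model */
    }

Counting the epilogue iterations as if the vector body executed them inflates min_profitable_iters and can reject loops that are in fact worth vectorizing, which is the reported sphinx3 regression.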
2017-09-16  PR82228: Move ncopies calculation in vectorizable_live_operation  Richard Sandiford  (1 file, -5/+5)
This should have been after the early exit for non-vectorised statements.

2017-09-16  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	PR tree-optimization/82228
	* tree-vect-loop.c (vectorizable_live_operation): Move initialization of ncopies.

From-SVN: r252888
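A sketch of the ordering being fixed (simplified from the real function; the guard and the classic VF / nunits formula for ncopies are illustrative rather than quoted): ncopies depends on the statement's vector type, which is only meaningful for statements the vectorizer actually handles, so the computation must follow the early exit rather than precede it.

    /* Early exit first: statements that are not vectorized have no
       meaningful vectype, so ncopies cannot be computed for them.  */
    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !STMT_VINFO_LIVE_P (stmt_info))
      return false;

    /* Only now is it safe to derive ncopies from the vector type.  */
    int ncopies;
    if (slp_node)
      ncopies = 1;
    else
      ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo)
		/ TYPE_VECTOR_SUBPARTS (vectype);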