aboutsummaryrefslogtreecommitdiff
path: root/gcc/tree-vect-loop-manip.c
AgeCommit message (Collapse)AuthorFilesLines
2018-03-16re PR c/84909 (typo: conversion from %qT to to %qT)Jakub Jelinek1-1/+1
PR c/84909 * c-warn.c (conversion_warning): Replace "to to" with "to" in diagnostics. * hsa-gen.c (mem_type_for_type): Fix comment typo. * tree-vect-loop-manip.c (vect_create_cond_for_niters_checks): Likewise. * gimple-ssa-warn-restrict.c (builtin_memref::set_base_and_offset): Likewise. From-SVN: r258609
2018-03-13[SLP/AArch64] Fix unpack handling for big-endian SVERichard Sandiford1-1/+2
I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered lanes. This meant that both the SVE patterns and the handling of fully-masked loops were wrong. The patch deals with that by making sure that all vec_unpack* optabs are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate define_insn. This in turn meant that we can get rid of the duplication between the signed and unsigned patterns for predicates. (We provide implementations of both the signed and unsigned optabs because the sign doesn't matter for predicates: every element contains only one significant bit.) Also, the float unpacks need to unpack one half of the input vector, but the unpacked upper bits are "don't care". There are two obvious ways of handling that: use an unpack (filling with zeros) or use a ZIP (filling with a duplicate of the low bits). The code previously used unpacks, but the sequence involved a subreg that is semantically an element reverse on big-endian targets. Using the ZIP patterns avoids that, and at the moment there's no reason to prefer one over the other for performance reasons, so the patch switches to ZIP unconditionally. As the comment says, it would be easy to optimise this later if UUNPK turns out to be better for some implementations. 2018-03-13 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks): Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR for big-endian. * config/aarch64/iterators.md (hi_lanes_optab): New int attribute. * config/aarch64/aarch64-sve.md (*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to... (aarch64_sve_<perm_insn><perm_hilo><mode>): ...this. (*extend<mode><Vwide>2): Rename to... (aarch64_sve_extend<mode><Vwide>2): ...this. (vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand, renaming the old pattern to... (aarch64_sve_punpk<perm_hilo>_<mode>): ...this. Only define unsigned packs. (vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a define_expand, renaming the old pattern to... (aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this. (*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete. (vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into account when deciding which SVE instruction the optab should use. (vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather than unpacks. * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise. * gcc.target/aarch64/sve/unpack_float_1.c: Likewise. From-SVN: r258489
2018-02-07re PR tree-optimization/84037 (Speed regression of polyhedron benchmark ↵Richard Biener1-1/+2
since r256644) 2018-02-07 Richard Biener <rguenther@suse.de> PR tree-optimization/84037 * tree-vectorizer.h (struct _loop_vec_info): Add ivexpr_map member. (cse_and_gimplify_to_preheader): Declare. (vect_get_place_in_interleaving_chain): Likewise. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize ivexpr_map. (_loop_vec_info::~_loop_vec_info): Delete it. (cse_and_gimplify_to_preheader): New function. * tree-vect-slp.c (vect_get_place_in_interleaving_chain): Export. * tree-vect-stmts.c (vectorizable_store): CSE base and steps. (vectorizable_load): Likewise. For grouped stores always base the IV on the first element. * tree-vect-loop-manip.c (vect_loop_versioning): Unshare versioning condition before gimplifying. From-SVN: r257453
2018-01-13Support for aliasing with variable stridesRichard Sandiford1-0/+26
This patch adds runtime alias checks for loops with variable strides, so that we can vectorise them even without a restrict qualifier. There are several parts to doing this: 1) For accesses like: x[i * n] += 1; we need to check whether n (and thus the DR_STEP) is nonzero. vect_analyze_data_ref_dependence records values that need to be checked in this way, then prune_runtime_alias_test_list records a bounds check on DR_STEP being outside the range [0, 0]. 2) For accesses like: x[i * n] = x[i * n + 1] + 1; we simply need to test whether abs (n) >= 2. prune_runtime_alias_test_list looks for cases like this and tries to guess whether it is better to use this kind of check or a check for non-overlapping ranges. (We could do an OR of the two conditions at runtime, but that isn't implemented yet.) 3) Checks for overlapping ranges need to cope with variable strides. At present the "length" of each segment in a range check is represented as an offset from the base that lies outside the touched range, in the same direction as DR_STEP. The length can therefore be negative and is sometimes conservative. With variable steps it's easier to reaon about if we split this into two: seg_len: distance travelled from the first iteration of interest to the last, e.g. DR_STEP * (VF - 1) access_size: the number of bytes accessed in each iteration with access_size always being a positive constant and seg_len possibly being variable. We can then combine alias checks for two accesses that are a constant number of bytes apart by adjusting the access size to account for the gap. This leaves the segment length unchanged, which allows the check to be combined with further accesses. When seg_len is positive, the runtime alias check has the form: base_a >= base_b + seg_len_b + access_size_b || base_b >= base_a + seg_len_a + access_size_a In many accesses the base will be aligned to the access size, which allows us to skip the addition: base_a > base_b + seg_len_b || base_b > base_a + seg_len_a A similar saving is possible with "negative" lengths. The patch therefore tracks the alignment in addition to seg_len and access_size. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vec_lower_bound): New structure. (_loop_vec_info): Add check_nonzero and lower_bounds. (LOOP_VINFO_CHECK_NONZERO): New macro. (LOOP_VINFO_LOWER_BOUNDS): Likewise. (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too. * tree-data-ref.h (dr_with_seg_len): Add access_size and align fields. Make seg_len the distance travelled, not including the access size. (dr_direction_indicator): Declare. (dr_zero_step_indicator): Likewise. (dr_known_forward_stride_p): Likewise. * tree-data-ref.c: Include stringpool.h, tree-vrp.h and tree-ssanames.h. (runtime_alias_check_p): Allow runtime alias checks with variable strides. (operator ==): Compare access_size and align. (prune_runtime_alias_test_list): Rework for new distinction between the access_size and seg_len. (create_intersect_range_checks_index): Likewise. Cope with polynomial segment lengths. (get_segment_min_max): New function. (create_intersect_range_checks): Use it. (dr_step_indicator): New function. (dr_direction_indicator): Likewise. (dr_zero_step_indicator): Likewise. (dr_known_forward_stride_p): Likewise. * tree-loop-distribution.c (data_ref_segment_size): Return DR_STEP * (niters - 1). (compute_alias_check_pairs): Update call to the dr_with_seg_len constructor. * tree-vect-data-refs.c (vect_check_nonzero_value): New function. (vect_preserves_scalar_order_p): New function, split out from... (vect_analyze_data_ref_dependence): ...here. Check for zero steps. (vect_vfa_segment_size): Return DR_STEP * (length_factor - 1). (vect_vfa_access_size): New function. (vect_vfa_align): Likewise. (vect_compile_time_alias): Take access_size_a and access_b arguments. (dump_lower_bound): New function. (vect_check_lower_bound): Likewise. (vect_small_gap_p): Likewise. (vectorizable_with_step_bound_p): Likewise. (vect_prune_runtime_alias_test_list): Ignore cross-iteration depencies if the vectorization factor is 1. Convert the checks for nonzero steps into checks on the bounds of DR_STEP. Try using a bunds check for variable steps if the minimum required step is relatively small. Update calls to the dr_with_seg_len constructor and to vect_compile_time_alias. * tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New function. (vect_loop_versioning): Call it. * tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS when retrying. (vect_estimate_min_profitable_iters): Account for any bounds checks. gcc/testsuite/ * gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather than SLP vectorization. * gcc.dg/vect/vect-alias-check-10.c: New test. * gcc.dg/vect/vect-alias-check-11.c: Likewise. * gcc.dg/vect/vect-alias-check-12.c: Likewise. * gcc.dg/vect/vect-alias-check-8.c: Likewise. * gcc.dg/vect/vect-alias-check-9.c: Likewise. * gcc.target/aarch64/sve/strided_load_8.c: Likewise. * gcc.target/aarch64/sve/var_stride_1.c: Likewise. * gcc.target/aarch64/sve/var_stride_1.h: Likewise. * gcc.target/aarch64/sve/var_stride_1_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_2.c: Likewise. * gcc.target/aarch64/sve/var_stride_2_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_3.c: Likewise. * gcc.target/aarch64/sve/var_stride_3_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_4.c: Likewise. * gcc.target/aarch64/sve/var_stride_4_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_5.c: Likewise. * gcc.target/aarch64/sve/var_stride_5_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_6.c: Likewise. * gcc.target/aarch64/sve/var_stride_6_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_7.c: Likewise. * gcc.target/aarch64/sve/var_stride_7_run.c: Likewise. * gcc.target/aarch64/sve/var_stride_8.c: Likewise. * gcc.target/aarch64/sve/var_stride_8_run.c: Likewise. * gfortran.dg/vect/vect-alias-check-1.F90: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256644
2018-01-13Use single-iteration epilogues when peeling for gapsRichard Sandiford1-40/+52
This patch adds support for fully-masking loops that require peeling for gaps. It peels exactly one scalar iteration and uses the masked loop to handle the rest. Previously we would fall back on using a standard unmasked loop instead. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Replace vfm1 with a bound_epilog parameter. (vect_do_peeling): Update calls accordingly, and move the prologue call earlier in the function. Treat the base bound_epilog as 0 for fully-masked loops and retain vf - 1 for other loops. Add 1 to this base when peeling for gaps. * tree-vect-loop.c (vect_analyze_loop_2): Allow peeling for gaps with fully-masked loops. (vect_estimate_min_profitable_iters): Handle the single peeled iteration in that case. gcc/testsuite/ * gcc.target/aarch64/sve/struct_vect_18.c: Check the number of branches. * gcc.target/aarch64/sve/struct_vect_19.c: Likewise. * gcc.target/aarch64/sve/struct_vect_20.c: New test. * gcc.target/aarch64/sve/struct_vect_20_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_21.c: Likewise. * gcc.target/aarch64/sve/struct_vect_21_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_22.c: Likewise. * gcc.target/aarch64/sve/struct_vect_22_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_23.c: Likewise. * gcc.target/aarch64/sve/struct_vect_23_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256635
2018-01-13Handle peeling for alignment with maskingRichard Sandiford1-53/+213
This patch adds support for aligning vectors by using a partial first iteration. E.g. if the start pointer is 3 elements beyond an aligned address, the first iteration will have a mask in which the first three elements are false. On SVE, the optimisation is only useful for vector-length-specific code. Vector-length-agnostic code doesn't try to align vectors since the vector length might not be a power of 2. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field. (LOOP_VINFO_MASK_SKIP_NITERS): New macro. (vect_use_loop_mask_for_alignment_p): New function. (vect_prepare_for_masked_peels, vect_gen_while_not): Declare. * tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an niters_skip argument. Make sure that the first niters_skip elements of the first iteration are inactive. (vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS. Update call to vect_set_loop_masks_directly. (get_misalign_in_elems): New function, split out from... (vect_gen_prolog_loop_niters): ...here. (vect_update_init_of_dr): Take a code argument that specifies whether the adjustment should be added or subtracted. (vect_update_init_of_drs): Likewise. (vect_prepare_for_masked_peels): New function. (vect_do_peeling): Skip prologue peeling if we're using a mask instead. Update call to vect_update_inits_of_drs. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize mask_skip_niters. (vect_analyze_loop_2): Allow fully-masked loops with peeling for alignment. Do not include the number of peeled iterations in the minimum threshold in that case. (vectorizable_induction): Adjust the start value down by LOOP_VINFO_MASK_SKIP_NITERS iterations. (vect_transform_loop): Call vect_prepare_for_masked_peels. Take the number of skipped iterations into account when calculating the loop bounds. * tree-vect-stmts.c (vect_gen_while_not): New function. gcc/testsuite/ * gcc.target/aarch64/sve/nopeel_1.c: New test. * gcc.target/aarch64/sve/peel_ind_1.c: Likewise. * gcc.target/aarch64/sve/peel_ind_1_run.c: Likewise. * gcc.target/aarch64/sve/peel_ind_2.c: Likewise. * gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise. * gcc.target/aarch64/sve/peel_ind_3.c: Likewise. * gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise. * gcc.target/aarch64/sve/peel_ind_4.c: Likewise. * gcc.target/aarch64/sve/peel_ind_4_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256630
2018-01-13Add support for fully-predicated loopsRichard Sandiford1-34/+486
This patch adds support for using a single fully-predicated loop instead of a vector loop and a scalar tail. An SVE WHILELO instruction generates the predicate for each iteration of the loop, given the current scalar iv value and the loop bound. This operation is wrapped up in a new internal function called WHILE_ULT. E.g.: WHILE_ULT (0, 3, { 0, 0, 0, 0}) -> { 1, 1, 1, 0 } WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 } The third WHILE_ULT argument is needed to make the operation unambiguous: without it, WHILE_ULT (0, 3) for one vector type would seem equivalent to WHILE_ULT (0, 3) for another, even if the types have different numbers of elements. Note that the patch uses "mask" and "fully-masked" instead of "predicate" and "fully-predicated", to follow existing GCC terminology. This patch just handles the simple cases, punting for things like reductions and live-out values. Later patches remove most of these restrictions. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * optabs.def (while_ult_optab): New optab. * doc/md.texi (while_ult@var{m}@var{n}): Document. * internal-fn.def (WHILE_ULT): New internal function. * internal-fn.h (direct_internal_fn_supported_p): New override that takes two types as argument. * internal-fn.c (while_direct): New macro. (expand_while_optab_fn): New function. (convert_optab_supported_p): Likewise. (direct_while_optab_supported_p): New macro. * wide-int.h (wi::udiv_ceil): New function. * tree-vectorizer.h (rgroup_masks): New structure. (vec_loop_masks): New typedef. (_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p and fully_masked_p. (LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P) (LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros. (vect_max_vf): New function. (slpeel_make_loop_iterate_ntimes): Delete. (vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while) (vect_halve_mask_nunits, vect_double_mask_nunits): Declare. (vect_record_loop_mask, vect_get_loop_mask): Likewise. * tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h, internal-fn.h, stor-layout.h and optabs-query.h. (vect_set_loop_mask): New function. (add_preheader_seq): Likewise. (add_header_seq): Likewise. (interleave_supported_p): Likewise. (vect_maybe_permute_loop_masks): Likewise. (vect_set_loop_masks_directly): Likewise. (vect_set_loop_condition_masked): Likewise. (vect_set_loop_condition_unmasked): New function, split out from slpeel_make_loop_iterate_ntimes. (slpeel_make_loop_iterate_ntimes): Rename to.. (vect_set_loop_condition): ...this. Use vect_set_loop_condition_masked for fully-masked loops and vect_set_loop_condition_unmasked otherwise. (vect_do_peeling): Update call accordingly. (vect_gen_vector_loop_niters): Use VF as the step for fully-masked loops. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize mask_compare_type, can_fully_mask_p and fully_masked_p. (release_vec_loop_masks): New function. (_loop_vec_info): Use it to free the loop masks. (can_produce_all_loop_masks_p): New function. (vect_get_max_nscalars_per_iter): Likewise. (vect_verify_full_masking): Likewise. (vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around retries, and free the mask rgroups before retrying. Check loop-wide reasons for disallowing fully-masked loops. Make the final decision about whether use a fully-masked loop or not. (vect_estimate_min_profitable_iters): Do not assume that peeling for the number of iterations will be needed for fully-masked loops. (vectorizable_reduction): Disable fully-masked loops. (vectorizable_live_operation): Likewise. (vect_halve_mask_nunits): New function. (vect_double_mask_nunits): Likewise. (vect_record_loop_mask): Likewise. (vect_get_loop_mask): Likewise. (vect_transform_loop): Handle the case in which the final loop iteration might handle a partial vector. Call vect_set_loop_condition instead of slpeel_make_loop_iterate_ntimes. * tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h. (check_load_store_masking): New function. (prepare_load_store_mask): Likewise. (vectorizable_store): Handle fully-masked loops. (vectorizable_load): Likewise. (supportable_widening_operation): Use vect_halve_mask_nunits for booleans. (supportable_narrowing_operation): Likewise vect_double_mask_nunits. (vect_gen_while): New function. * config/aarch64/aarch64.md (umax<mode>3): New expander. (aarch64_uqdec<mode>): New insn. gcc/testsuite/ * gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization. * gcc.dg/tree-ssa/peel1.c: Likewise. * gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for variable-length vectors. * gcc.target/aarch64/sve/vcond_6.c: XFAIL test for AND. * gcc.target/aarch64/sve/vec_bool_cmp_1.c: Expect BIC instead of NOT. * gcc.target/aarch64/sve/slp_1.c: Check for a fully-masked loop. * gcc.target/aarch64/sve/slp_2.c: Likewise. * gcc.target/aarch64/sve/slp_3.c: Likewise. * gcc.target/aarch64/sve/slp_4.c: Likewise. * gcc.target/aarch64/sve/slp_6.c: Likewise. * gcc.target/aarch64/sve/slp_8.c: New test. * gcc.target/aarch64/sve/slp_8_run.c: Likewise. * gcc.target/aarch64/sve/slp_9.c: Likewise. * gcc.target/aarch64/sve/slp_9_run.c: Likewise. * gcc.target/aarch64/sve/slp_10.c: Likewise. * gcc.target/aarch64/sve/slp_10_run.c: Likewise. * gcc.target/aarch64/sve/slp_11.c: Likewise. * gcc.target/aarch64/sve/slp_11_run.c: Likewise. * gcc.target/aarch64/sve/slp_12.c: Likewise. * gcc.target/aarch64/sve/slp_12_run.c: Likewise. * gcc.target/aarch64/sve/ld1r_2.c: Likewise. * gcc.target/aarch64/sve/ld1r_2_run.c: Likewise. * gcc.target/aarch64/sve/while_1.c: Likewise. * gcc.target/aarch64/sve/while_2.c: Likewise. * gcc.target/aarch64/sve/while_3.c: Likewise. * gcc.target/aarch64/sve/while_4.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256625
2018-01-03Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r256169
2018-01-03poly_int: vectoriser vf and ufRichard Sandiford1-29/+45
This patch changes the type of the vectorisation factor and SLP unrolling factor to poly_uint64. This in turn required some knock-on changes in signedness elsewhere. Cost decisions are generally based on estimated_poly_value, which for VF is wrapped up as vect_vf_for_cost. The patch doesn't on its own enable variable-length vectorisation. It just makes the minimum changes necessary for the code to build with the new VF and UF types. Later patches also make the vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable GET_MODE_NUNITS, at which point the code really does handle variable-length vectors. The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX, to avoid hard-coding a particular architectural limit. The patch includes a new test because a development version of the patch accidentally used file print routines instead of dump_*, which would fail with -fopt-info. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (_slp_instance::unrolling_factor): Change from an unsigned int to a poly_uint64. (_loop_vec_info::slp_unrolling_factor): Likewise. (_loop_vec_info::vectorization_factor): Change from an int to a poly_uint64. (MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX. (vect_get_num_vectors): New function. (vect_update_max_nunits, vect_vf_for_cost): Likewise. (vect_get_num_copies): Use vect_get_num_vectors. (vect_analyze_data_ref_dependences): Change max_vf from an int * to an unsigned int *. (vect_analyze_data_refs): Change min_vf from an int * to a poly_uint64 *. (vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather than an unsigned HOST_WIDE_INT. * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr) (vect_analyze_data_ref_dependence): Change max_vf from an int * to an unsigned int *. (vect_analyze_data_ref_dependences): Likewise. (vect_compute_data_ref_alignment): Handle polynomial vf. (vect_enhance_data_refs_alignment): Likewise. (vect_prune_runtime_alias_test_list): Likewise. (vect_shift_permute_load_chain): Likewise. (vect_supportable_dr_alignment): Likewise. (dependence_distance_ge_vf): Take the vectorization factor as a poly_uint64 rather than an unsigned HOST_WIDE_INT. (vect_analyze_data_refs): Change min_vf from an int * to a poly_uint64 *. * tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take vfm1 as a poly_uint64 rather than an int. Make the same change for the returned bound_scalar. (vect_gen_vector_loop_niters): Handle polynomial vf. (vect_do_peeling): Likewise. Update call to vect_gen_scalar_loop_niters and handle polynomial bound_scalars. (vect_gen_vector_loop_niters_mult_vf): Assert that the vf must be constant. * tree-vect-loop.c (vect_determine_vectorization_factor) (vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf. (vect_get_known_peeling_cost): Likewise. (vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise. (vect_worthwhile_without_simd_p, vectorizable_induction): Likewise. (vect_transform_loop): Likewise. Use the lowest possible VF when updating the upper bounds of the loop. (vect_min_worthwhile_factor): Make static. Return an unsigned int rather than an int. * tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with polynomial unroll factors. (vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise. (vect_make_slp_decision): Likewise. (vect_supported_load_permutation_p): Likewise, and polynomial vf too. (vect_analyze_slp_cost): Handle polynomial vf. (vect_slp_analyze_node_operations): Likewise. (vect_slp_analyze_bb_1): Likewise. (vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather than an unsigned HOST_WIDE_INT. * tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store) (vectorizable_load): Handle polynomial vf. * tree-vectorizer.c (simduid_to_vf::vf): Change from an int to a poly_uint64. (adjust_simduid_builtins, shrink_simd_arrays): Update accordingly. gcc/testsuite/ * gcc.dg/vect-opt-info-1.c: New test. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256126
2018-01-03Add an alternative vector loop iv mechanismRichard Sandiford1-40/+180
Normally we adjust the vector loop so that it iterates: (original number of scalar iterations - number of peels) / VF times, enforcing this using an IV that starts at zero and increments by one each iteration. However, dividing by VF would be expensive for variable VF, so this patch adds an alternative in which the IV increments by VF each iteration instead. We then need to take care to handle possible overflow in the IV. The new mechanism isn't used yet; a later patch replaces the "if (1)" with a check for variable VF. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-loop-manip.c: Include gimple-fold.h. (slpeel_make_loop_iterate_ntimes): Add step, final_iv and niters_maybe_zero parameters. Handle other cases besides a step of 1. (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter. Add a path that uses a step of VF instead of 1, but disable it for now. (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var and niters_no_overflow parameters. Update calls to slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters. Create a new SSA name if the latter choses to use a ste other than zero, and return it via niters_vector_mult_vf_var. * tree-vect-loop.c (vect_transform_loop): Update calls to vect_do_peeling, vect_gen_vector_loop_niters and slpeel_make_loop_iterate_ntimes. * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, vect_do_peeling) (vect_gen_vector_loop_niters): Update declarations after above changes. From-SVN: r256124
2017-12-21poly_int: loop versioning thresholdRichard Sandiford1-1/+13
This patch splits the loop versioning threshold out from the cost model threshold so that the former can become a poly_uint64. We still use a single test to enforce both limits where possible. 2017-12-21 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (_loop_vec_info): Add a versioning_threshold field. (LOOP_VINFO_VERSIONING_THRESHOLD): New macro (vect_loop_versioning): Take the loop versioning threshold as a separate parameter. * tree-vect-loop-manip.c (vect_loop_versioning): Likewise. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize versioning_threshold. (vect_analyze_loop_2): Compute the loop versioning threshold whenever loop versioning is needed, and store it in the new field rather than combining it with the cost model threshold. (vect_transform_loop): Update call to vect_loop_versioning. Try to combine the loop versioning and cost thresholds here. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255934
2017-12-12[SFN] boilerplate changes in preparation to introduce nonbind markersAlexandre Oliva1-4/+4
This patch introduces a number of new macros and functions that will be used to distinguish between different kinds of debug stmts, insns and notes, namely, preexisting debug bind ones and to-be-introduced nonbind markers. In a seemingly mechanical way, it adjusts several uses of the macros and functions, so that they refer to narrower categories when appropriate. These changes, by themselves, should not have any visible effect in the compiler behavior, since the upcoming debug markers are never created with this patch alone. for gcc/ChangeLog * gimple.h (enum gimple_debug_subcode): Add GIMPLE_DEBUG_BEGIN_STMT. (gimple_debug_begin_stmt_p): New. (gimple_debug_nonbind_marker_p): New. * tree.h (MAY_HAVE_DEBUG_MARKER_STMTS): New. (MAY_HAVE_DEBUG_BIND_STMTS): Renamed from.... (MAY_HAVE_DEBUG_STMTS): ... this. Check both. * insn-notes.def (BEGIN_STMT): New. * rtl.h (MAY_HAVE_DEBUG_MARKER_INSNS): New. (MAY_HAVE_DEBUG_BIND_INSNS): Renamed from.... (MAY_HAVE_DEBUG_INSNS): ... this. Check both. (NOTE_MARKER_LOCATION, NOTE_MARKER_P): New. (DEBUG_BIND_INSN_P, DEBUG_MARKER_INSN_P): New. (INSN_DEBUG_MARKER_KIND): New. (GEN_RTX_DEBUG_MARKER_BEGIN_STMT_PAT): New. (INSN_VAR_LOCATION): Check for VAR_LOCATION. (INSN_VAR_LOCATION_PTR): New. * cfgexpand.c (expand_debug_locations): Handle debug bind insns only. (expand_gimple_basic_block): Likewise. Emit debug temps for TER deps only if debug bind insns are enabled. (pass_expand::execute): Avoid deep TER and expand debug locations for debug bind insns only. * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Narrow debug stmts special handling down to debug bind stmts. * combine.c (try_combine): Narrow debug insns special handling down to debug bind insns. * cse.c (delete_trivially_dead_insns): Handle debug bindings. Narrow debug insns preexisting special handling down to debug bind insns. * dce.c (rest_of_handle_ud_dce): Narrow debug insns special handling down to debug bind insns. * function.c (instantiate_virtual_regs): Skip debug markers, adjust handling of debug binds. * gimple-ssa-backprop.c (backprop::prepare_change): Try debug temp insertion iff MAY_HAVE_DEBUG_BIND_STMTS. * haifa-sched.c (schedule_insn): Narrow special handling of debug insns to debug bind insns. * ipa-param-manipulation.c (ipa_modify_call_arguments): Narrow special handling of debug stmts to debug bind stmts. * ipa-split.c (split_function): Likewise. * ira.c (combine_and_move_insns): Adjust debug bind insns only. * loop-unroll.c (apply_opt_in_copies): Adjust tests on bind debug insns. * reg-stack.c (convert_regs_1): Use DEBUG_BIND_INSN_P. * regrename.c (build_def_use): Likewise. * regcprop.c (copyprop_hardreg_forward_1): Likewise. (pass_cprop_hardreg): Narrow special casing of debug insns to debug bind insns. * regstat.c (regstat_init_n_sets_and_refs): Likewise. * reload1.c (reload): Likewise. * sese.c (sese_insert_phis_for_liveouts): Narrow special casing of debug stmts to debug bind stmts. * shrink-wrap.c (move_insn_for_shrink_wrap): Likewise. * ssa-iterators.h (num_imm_uses): Likewise. * tree-cfg.c (gimple_merge_blocks): Narrow special casing of debug stmts to debug bind stmts. * tree-inline.c (tree_function_versioning): Narrow special casing of debug stmts to debug bind stmts. * tree-loop-distribution.c (generate_loops_for_partition): Narrow special casing of debug stmts to debug bind stmts. * tree-sra.c (analyze_access_subtree): Narrow special casing of debug stmts to debug bind stmts. * tree-ssa-dce.c (remove_dead_stmt): Narrow special casing of debug stmts to debug bind stmts. * tree-ssa-loop-ivopt.c (remove_unused_ivs): Narrow special casing of debug stmts to debug bind stmts. * tree-ssa-reassoc.c (reassoc_remove_stmt): Likewise. * tree-ssa-tail-merge.c (tail_merge_optimize): Narrow special casing of debug stmts to debug bind stmts. * tree-ssa-threadedge.c (propagate_threaded_block_debug_info): Likewise. * tree-ssa.c (flush_pending_stmts): Narrow special casing of debug stmts to debug bind stmts. (gimple_replace_ssa_lhs): Likewise. (insert_debug_temp_for_var_def): Likewise. (insert_debug_temps_for_defs): Likewise. (reset_debug_uses): Likewise. * tree-ssanames.c (release_ssa_name_fn): Likewise. * tree-vect-loop-manip.c (adjust_debug_stmts_now): Likewise. (adjust_debug_stmts): Likewise. (adjust_phi_and_debug_stmts): Likewise. (vect_do_peeling): Likewise. * tree-vect-loop.c (vect_transform_loop): Likewise. * valtrack.c (propagate_for_debug): Use BIND_DEBUG_INSN_P. * var-tracking.c (adjust_mems): Narrow special casing of debug insns to debug bind insns. (dv_onepart_p, dataflow_set_clar_at_call, use_type): Likewise. (compute_bb_dataflow, vt_find_locations): Likewise. (vt_expand_loc, emit_notes_for_changes): Likewise. (vt_init_cfa_base): Likewise. (vt_emit_notes): Likewise. (vt_initialize): Likewise. (vt_finalize): Likewise. From-SVN: r255565
2017-11-24re PR tree-optimization/82402 (error: SSA_NAME_OCCURS_IN_ABNORMAL_PHI should ↵Richard Biener1-0/+2
be set) 2017-11-24 Richard Biener <rguenther@suse.de> PR tree-optimization/82402 * tree-vect-loop-manip.c (create_lcssa_for_virtual_phi): Properly set SSA_NAME_OCCURS_IN_ABNORMAL_PHI. * gcc.dg/torture/pr82402.c: New testcase. From-SVN: r255140
2017-11-16tree-vect-loop-manip.c (vect_do_peeling): Do not use scale_bbs_frequencies_int.Jan Hubicka1-4/+6
* tree-vect-loop-manip.c (vect_do_peeling): Do not use scale_bbs_frequencies_int. From-SVN: r254810
2017-11-03asan.c (create_cond_insert_point): Maintain profile.Jan Hubicka1-1/+0
* asan.c (create_cond_insert_point): Maintain profile. * ipa-utils.c (ipa_merge_profiles): Be sure only IPA profiles are merged. * basic-block.h (struct basic_block_def): Remove frequency. (EDGE_FREQUENCY): Use to_frequency * bb-reorder.c (push_to_next_round_p): Use only IPA counts for global heuristics. (find_traces): Update to use to_frequency. (find_traces_1_round): Likewise; use only IPA counts. (bb_to_key): Likewise. (connect_traces): Use IPA counts only. (copy_bb_p): Update to use to_frequency. (fix_up_crossing_landing_pad): Likewise. (sanitize_hot_paths): Likewise. * bt-load.c (basic_block_freq): Likewise. * cfg.c (init_flow): Set count_max to uninitialized. (check_bb_profile): Remove frequencies; check counts. (dump_bb_info): Do not dump frequencies. (update_bb_profile_for_threading): Update counts only. (scale_bbs_frequencies_int): Likewise. (MAX_SAFE_MULTIPLIER): Remove. (scale_bbs_frequencies_gcov_type): Update counts only. (scale_bbs_frequencies_profile_count): Update counts only. (scale_bbs_frequencies): Update counts only. * cfg.h (struct control_flow_graph): Add count-max. (update_bb_profile_for_threading): Update prototype. * cfgbuild.c (find_bb_boundaries): Do not update frequencies. (find_many_sub_basic_blocks): Likewise. * cfgcleanup.c (try_forward_edges): Likewise. (try_crossjump_to_edge): Likewise. * cfgexpand.c (expand_gimple_cond): Likewise. (expand_gimple_tailcall): Likewise. (construct_init_block): Likewise. (construct_exit_block): Likewise. * cfghooks.c (verify_flow_info): Check consistency of counts. (dump_bb_for_graph): Do not dump frequencies. (split_block_1): Do not update frequencies. (split_edge): Do not update frequencies. (make_forwarder_block): Do not update frequencies. (duplicate_block): Do not update frequencies. (account_profile_record): Do not update frequencies. * cfgloop.c (find_subloop_latch_edge_by_profile): Use IPA counts for global heuristics. * cfgloopanal.c (average_num_loop_insns): Update to use to_frequency. (expected_loop_iterations_unbounded): Use counts only. * cfgloopmanip.c (scale_loop_profile): Simplify. (create_empty_loop_on_edge): Simplify (loopify): Simplify (duplicate_loop_to_header_edge): Simplify * cfgrtl.c (force_nonfallthru_and_redirect): Update profile. (update_br_prob_note): Take care of removing note when profile becomes undefined. (relink_block_chain): Do not dump frequency. (rtl_account_profile_record): Use to_frequency. * cgraph.c (symbol_table::create_edge): Convert count to ipa count. (cgraph_edge::redirect_call_stmt_to_calle): Conver tcount to ipa count. (cgraph_update_edges_for_call_stmt_node): Likewise. (cgraph_edge::verify_count_and_frequency): Update. (cgraph_node::verify_node): Temporarily disable frequency verification. * cgraphbuild.c (compute_call_stmt_bb_frequency): Use to_cgraph_frequency. (cgraph_edge::rebuild_edges): Convert to ipa counts. * cgraphunit.c (init_lowered_empty_function): Do not initialize frequencies. (cgraph_node::expand_thunk): Update profile. * except.c (dw2_build_landing_pads): Do not update frequency. * final.c (compute_alignments): Use to_frequency. (dump_basic_block_info): Do not dump frequency. * gimple-pretty-print.c (dump_profile): Do not dump frequency. (dump_gimple_bb_header): Do not dump frequency. * gimple-ssa-isolate-paths.c (isolate_path): Do not update frequency; do update count. * gimple-streamer-in.c (input_bb): Do not stream frequency. * gimple-streamer-out.c (output_bb): Do not stream frequency. * haifa-sched.c (sched_pressure_start_bb): Use to_freuqency. (init_before_recovery): Do not update frequency. (sched_create_recovery_edges): Do not update frequency. * hsa-gen.c (convert_switch_statements): Do not update frequency. * ipa-cp.c (ipcp_propagate_stage): Update search for max_count. (ipa_cp_c_finalize): Set max_count to uninitialized. * ipa-fnsummary.c (get_minimal_bb): Use counts. (param_change_prob): Use counts. * ipa-profile.c (ipa_profile_generate_summary): Do not summarize local profiles. * ipa-split.c (consider_split): Use to_frequency. (split_function): Use to_frequency. * ira-build.c (loop_compare_func): Likewise. (mark_loops_for_removal): Likewise. (mark_all_loops_for_removal): Likewise. * loop-doloop.c (doloop_modify): Do not update frequency. * loop-unroll.c (unroll_loop_runtime_iterations): Do not update frequency. * lto-streamer-in.c (input_function): Update count_max. * omp-expand.c (expand_omp_taskreg): Update count_max. * omp-simd-clone.c (simd_clone_adjust): Update profile. * predict.c (maybe_hot_frequency_p): Use to_frequency. (maybe_hot_count_p): Use ipa counts only. (maybe_hot_bb_p): Simplify. (maybe_hot_edge_p): Simplify. (probably_never_executed): Do not take frequency argument. (probably_never_executed_bb_p): Do not pass frequency. (probably_never_executed_edge_p): Likewise. (combine_predictions_for_bb): Check that profile is nonzero. (propagate_freq): Do not set frequency. (drop_profile): Simplify. (counts_to_freqs): Simplify. (expensive_function_p): Use to_frequency. (propagate_unlikely_bbs_forward): Simplify. (determine_unlikely_bbs): Simplify. (estimate_bb_frequencies): Add hack to silence graphite issues. (compute_function_frequency): Use ipa counts. (pass_profile::execute): Update. (rebuild_frequencies): Use counts only. (force_edge_cold): Use counts only. * profile-count.c (profile_count::dump): Dump new count types. (profile_count::differs_from_p): Check compatiblity. (profile_count::to_frequency): New function. (profile_count::to_cgraph_frequency): New function. * profile-count.h (struct function): Declare. (enum profile_quality): Add profile_guessed_local and profile_guessed_global0. (class profile_proability): Decrease number of bits to 29; update from_reg_br_prob_note and to_reg_br_prob_note. (class profile_count: Update comment; decrease number of bits to 61. Check compatibility. (profile_count::compatible_p): New private member function. (profile_count::ipa_p): New member function. (profile_count::operator<): Handle global zero correctly. (profile_count::operator>): Handle global zero correctly. (profile_count::operator<=): Handle global zero correctly. (profile_count::operator>=): Handle global zero correctly. (profile_count::nonzero_p): New member function. (profile_count::force_nonzero): New member function. (profile_count::max): New member function. (profile_count::apply_scale): Handle IPA scalling. (profile_count::guessed_local): New member function. (profile_count::global0): New member function. (profile_count::ipa): New member function. (profile_count::to_frequency): Declare. (profile_count::to_cgraph_frequency): Declare. * profile.c (OVERLAP_BASE): Delete. (compute_frequency_overlap): Delete. (compute_branch_probabilities): Do not use compute_frequency_overlap. * regs.h (REG_FREQ_FROM_BB): Use to_frequency. * sched-ebb.c (rank): Use counts only. * shrink-wrap.c (handle_simple_exit): Use counts only. (try_shrink_wrapping): Use counts only. (place_prologue_for_one_component): Use counts only. * tracer.c (find_best_predecessor): Use to_frequency. (find_trace): Use to_frequency. (tail_duplicate): Use to_frequency. * trans-mem.c (expand_transaction): Do not update frequency. * tree-call-cdce.c: Do not update frequency. * tree-cfg.c (gimple_find_sub_bbs): Likewise. (gimple_merge_blocks): Likewise. (gimple_split_edge): Likewise. (gimple_duplicate_sese_region): Likewise. (gimple_duplicate_sese_tail): Likewise. (move_sese_region_to_fn): Likewise. (gimple_account_profile_record): Likewise. (insert_cond_bb): Likewise. * tree-complex.c (expand_complex_div_wide): Likewise. * tree-eh.c (lower_resx): Update profile. * tree-inline.c (copy_bb): Simplify count scaling; do not scale frequencies. (initialize_cfun): Do not initialize frequencies (freqs_to_counts): Delete. (copy_cfg_body): Ignore count parameter. (copy_body): Update. (expand_call_inline): Update count_max. (optimize_inline_calls): Update count_max. (tree_function_versioning): Update count_max. * tree-ssa-coalesce.c (coalesce_cost_bb): Use to_frequency. * tree-ssa-ifcombine.c (update_profile_after_ifcombine): Do not update frequency. * tree-ssa-loop-im.c (execute_sm_if_changed): Use counts only. * tree-ssa-loop-ivcanon.c (unloop_loops): Do not update freuqency. (try_peel_loop): Likewise. * tree-ssa-loop-ivopts.c (get_scaled_computation_cost_at): Use to_frequency. * tree-ssa-loop-manip.c (niter_for_unrolled_loop): Pass -1. (tree_transform_and_unroll_loop): Do not use frequencies * tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Use reliable prediction only. * tree-ssa-loop-unswitch.c (hoist_guard): Do not use frequencies. * tree-ssa-sink.c (select_best_block): Use to_frequency. * tree-ssa-tail-merge.c (replace_block_by): Temporarily disable probability scaling. * tree-ssa-threadupdate.c (create_block_for_threading): Do not update frequency (any_remaining_duplicated_blocks): Likewise. (update_profile): Likewise. (estimated_freqs_path): Delete. (freqs_to_counts_path): Delete. (clear_counts_path): Delete. (ssa_fix_duplicate_block_edges): Likewise. (duplicate_thread_path): Likewise. * tree-switch-conversion.c (gen_inbound_check): Use counts. * tree-tailcall.c (decrease_profile): Do not update frequency. (eliminate_tail_call): Likewise. * tree-vect-loop-manip.c (vect_do_peeling): Likewise. * tree-vect-loop.c (scale_profile_for_vect_loop): Likewise. (optimize_mask_stores): Likewise. * tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise. * ubsan.c (ubsan_expand_null_ifn): Update profile. (ubsan_expand_ptr_ifn): Update profile. * value-prof.c (gimple_ic): Simplify. * value-prof.h (gimple_ic): Update prototype. * ipa-inline-transform.c (inline_transform): Fix scaling conditoins. * ipa-inline.c (compute_uninlined_call_time): Be sure that counts are nonzero. (want_inline_self_recursive_call_p): Likewise. (resolve_noninline_speculation): Only cummulate defined counts. (inline_small_functions): Use nonzero_p. (ipa_inline): Do not access freed node. Unknown ChangeLog: 2017-11-02 Jan Hubicka <hubicka@ucw.cz> * testsuite/gcc.dg/no-strict-overflow-3.c (foo): Update magic value to not clash with frequency. * testsuite/gcc.dg/strict-overflow-3.c (foo): Likewise. * testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c: Update template. * testsuite/gcc.dg/tree-ssa/dump-2.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-10.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-11.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-12.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-20040816-1.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-20040816-2.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-5.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-8.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-9.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-cd.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-pr56541.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-pr68583.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-pr69489-1.c: Update template. * testsuite/gcc.dg/tree-ssa/ifc-pr69489-2.c: Update template. * testsuite/gcc.target/i386/pr61403.c: Update template. From-SVN: r254379
2017-10-19asan.c (create_cond_insert_point): Do not update edge count.Jan Hubicka1-4/+0
* asan.c (create_cond_insert_point): Do not update edge count. * auto-profile.c (afdo_propagate_edge): Update for edge count removal. (afdo_propagate_circuit): Likewise. (afdo_calculate_branch_prob): Likewise. (afdo_annotate_cfg): Likewise. * basic-block.h (struct edge_def): Remove count. (edge_def::count): New accessor. * bb-reorder.c (rotate_loop): Update. (find_traces_1_round): Update. (connect_traces): Update. (sanitize_hot_paths): Update. * cfg.c (unchecked_make_edge): Update. (make_single_succ_edge): Update. (check_bb_profile): Update. (dump_edge_info): Update. (update_bb_profile_for_threading): Update. (scale_bbs_frequencies_int): Update. (scale_bbs_frequencies_gcov_type): Update. (scale_bbs_frequencies_profile_count): Update. (scale_bbs_frequencies): Update. * cfganal.c (connect_infinite_loops_to_exit): Update. * cfgbuild.c (compute_outgoing_frequencies): Update. (find_many_sub_basic_blocks): Update. * cfgcleanup.c (try_forward_edges): Update. (try_crossjump_to_edge): Update * cfgexpand.c (expand_gimple_cond): Update (expand_gimple_tailcall): Update (construct_exit_block): Update * cfghooks.c (verify_flow_info): Update (redirect_edge_succ_nodup): Update (split_edge): Update (make_forwarder_block): Update (duplicate_block): Update (account_profile_record): Update * cfgloop.c (find_subloop_latch_edge_by_profile): Update. * cfgloopanal.c (expected_loop_iterations_unbounded): Update. * cfgloopmanip.c (scale_loop_profile): Update. (loopify): Update. (lv_adjust_loop_entry_edge): Update. * cfgrtl.c (try_redirect_by_replacing_jump): Update. (force_nonfallthru_and_redirect): Update. (purge_dead_edges): Update. (rtl_flow_call_edges_add): Update. * cgraphunit.c (init_lowered_empty_function): Update. (cgraph_node::expand_thunk): Update. * gimple-pretty-print.c (dump_probability): Update. (dump_edge_probability): Update. * gimple-ssa-isolate-paths.c (isolate_path): Update. * haifa-sched.c (sched_create_recovery_edges): Update. * hsa-gen.c (convert_switch_statements): Update. * ifcvt.c (dead_or_predicable): Update. * ipa-inline-transform.c (inline_transform): Update. * ipa-split.c (split_function): Update. * ipa-utils.c (ipa_merge_profiles): Update. * loop-doloop.c (add_test): Update. * loop-unroll.c (unroll_loop_runtime_iterations): Update. * lto-streamer-in.c (input_cfg): Update. (input_function): Update. * lto-streamer-out.c (output_cfg): Update. * modulo-sched.c (sms_schedule): Update. * postreload-gcse.c (eliminate_partially_redundant_load): Update. * predict.c (maybe_hot_edge_p): Update. (unlikely_executed_edge_p): Update. (probably_never_executed_edge_p): Update. (dump_prediction): Update. (drop_profile): Update. (propagate_unlikely_bbs_forward): Update. (determine_unlikely_bbs): Update. (force_edge_cold): Update. * profile.c (compute_branch_probabilities): Update. * reg-stack.c (better_edge): Update. * shrink-wrap.c (handle_simple_exit): Update. * tracer.c (better_p): Update. * trans-mem.c (expand_transaction): Update. (split_bb_make_tm_edge): Update. * tree-call-cdce.c: Update. * tree-cfg.c (gimple_find_sub_bbs): Update. (gimple_split_edge): Update. (gimple_duplicate_sese_region): Update. (gimple_duplicate_sese_tail): Update. (gimple_flow_call_edges_add): Update. (insert_cond_bb): Update. (execute_fixup_cfg): Update. * tree-cfgcleanup.c (cleanup_control_expr_graph): Update. * tree-complex.c (expand_complex_div_wide): Update. * tree-eh.c (lower_resx): Update. (unsplit_eh): Update. (cleanup_empty_eh_move_lp): Update. * tree-inline.c (copy_edges_for_bb): Update. (freqs_to_counts): Update. (copy_cfg_body): Update. * tree-ssa-dce.c (remove_dead_stmt): Update. * tree-ssa-ifcombine.c (update_profile_after_ifcombine): Update. * tree-ssa-loop-im.c (execute_sm_if_changed): Update. * tree-ssa-loop-ivcanon.c (remove_exits_and_undefined_stmts): Update. (unloop_loops): Update. * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Update. * tree-ssa-loop-split.c (connect_loops): Update. (split_loop): Update. * tree-ssa-loop-unswitch.c (hoist_guard): Update. * tree-ssa-phionlycprop.c (propagate_rhs_into_lhs): Update. * tree-ssa-phiopt.c (replace_phi_edge_with_variable): Update. * tree-ssa-reassoc.c (branch_fixup): Update. * tree-ssa-tail-merge.c (replace_block_by): Update. * tree-ssa-threadupdate.c (remove_ctrl_stmt_and_useless_edges): Update. (compute_path_counts): Update. (update_profile): Update. (recompute_probabilities): Update. (update_joiner_offpath_counts): Update. (estimated_freqs_path): Update. (freqs_to_counts_path): Update. (clear_counts_path): Update. (ssa_fix_duplicate_block_edges): Update. (duplicate_thread_path): Update. * tree-switch-conversion.c (hoist_edge_and_branch_if_true): Update. (case_bit_test_cmp): Update. (collect_switch_conv_info): Update. (gen_inbound_check): Update. (do_jump_if_equal): Update. (emit_cmp_and_jump_insns): Update. * tree-tailcall.c (decrease_profile): Update. (eliminate_tail_call): Update. * tree-vect-loop-manip.c (slpeel_add_loop_guard): Update. (vect_do_peeling): Update. * tree-vect-loop.c (scale_profile_for_vect_loop): Update. * ubsan.c (ubsan_expand_null_ifn): Update. (ubsan_expand_ptr_ifn): Update. * value-prof.c (gimple_divmod_fixed_value): Update. (gimple_mod_pow2): Update. (gimple_mod_subtract): Update. (gimple_ic): Update. (gimple_stringop_fixed_value): Update. From-SVN: r253910
2017-10-10Require wi::to_wide for treesRichard Sandiford1-4/+7
The wide_int routines allow things like: wi::add (t, 1) to add 1 to an INTEGER_CST T in its native precision. But we also have: wi::to_offset (t) // Treat T as an offset_int wi::to_widest (t) // Treat T as a widest_int Recently we also gained: wi::to_wide (t, prec) // Treat T as a wide_int in preccision PREC This patch therefore requires: wi::to_wide (t) when operating on INTEGER_CSTs in their native precision. This is just as efficient, and makes it clearer that a deliberate choice is being made to treat the tree as a wide_int in its native precision. This also removes the inconsistency that a) INTEGER_CSTs in their native precision can be used without an accessor but must use wi:: functions instead of C++ operators b) the other forms need an explicit accessor but the result can be used with C++ operators. It also helps with SVE, where there's the additional possibility that the tree could be a runtime value. 2017-10-10 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * wide-int.h (wide_int_ref_storage): Make host_dependent_precision a template parameter. (WIDE_INT_REF_FOR): Update accordingly. * tree.h (wi::int_traits <const_tree>): Delete. (wi::tree_to_widest_ref, wi::tree_to_offset_ref): New typedefs. (wi::to_widest, wi::to_offset): Use them. Expand commentary. (wi::tree_to_wide_ref): New typedef. (wi::to_wide): New function. * calls.c (get_size_range): Use wi::to_wide when operating on trees as wide_ints. * cgraph.c (cgraph_node::create_thunk): Likewise. * config/i386/i386.c (ix86_data_alignment): Likewise. (ix86_local_alignment): Likewise. * dbxout.c (stabstr_O): Likewise. * dwarf2out.c (add_scalar_info, gen_enumeration_type_die): Likewise. * expr.c (const_vector_from_tree): Likewise. * fold-const-call.c (host_size_t_cst_p, fold_const_call_1): Likewise. * fold-const.c (may_negate_without_overflow_p, negate_expr_p) (fold_negate_expr_1, int_const_binop_1, const_binop) (fold_convert_const_int_from_real, optimize_bit_field_compare) (all_ones_mask_p, sign_bit_p, unextend, extract_muldiv_1) (fold_div_compare, fold_single_bit_test, fold_plusminus_mult_expr) (pointer_may_wrap_p, expr_not_equal_to, fold_binary_loc) (fold_ternary_loc, multiple_of_p, fold_negate_const, fold_abs_const) (fold_not_const, round_up_loc): Likewise. * gimple-fold.c (gimple_fold_indirect_ref): Likewise. * gimple-ssa-warn-alloca.c (alloca_call_type_by_arg): Likewise. (alloca_call_type): Likewise. * gimple.c (preprocess_case_label_vec_for_gimple): Likewise. * godump.c (go_output_typedef): Likewise. * graphite-sese-to-poly.c (tree_int_to_gmp): Likewise. * internal-fn.c (get_min_precision): Likewise. * ipa-cp.c (ipcp_store_vr_results): Likewise. * ipa-polymorphic-call.c (ipa_polymorphic_call_context::ipa_polymorphic_call_context): Likewise. * ipa-prop.c (ipa_print_node_jump_functions_for_edge): Likewise. (ipa_modify_call_arguments): Likewise. * match.pd: Likewise. * omp-low.c (scan_omp_1_op, lower_omp_ordered_clauses): Likewise. * print-tree.c (print_node_brief, print_node): Likewise. * stmt.c (expand_case): Likewise. * stor-layout.c (layout_type): Likewise. * tree-affine.c (tree_to_aff_combination): Likewise. * tree-cfg.c (group_case_labels_stmt): Likewise. * tree-data-ref.c (dr_analyze_indices): Likewise. (prune_runtime_alias_test_list): Likewise. * tree-dump.c (dequeue_and_dump): Likewise. * tree-inline.c (remap_gimple_op_r, copy_tree_body_r): Likewise. * tree-predcom.c (is_inv_store_elimination_chain): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. * tree-scalar-evolution.c (iv_can_overflow_p): Likewise. (simple_iv_with_niters): Likewise. * tree-ssa-address.c (addr_for_mem_ref): Likewise. * tree-ssa-ccp.c (ccp_finalize, evaluate_stmt): Likewise. * tree-ssa-loop-ivopts.c (constant_multiple_of): Likewise. * tree-ssa-loop-niter.c (split_to_var_and_offset) (refine_value_range_using_guard, number_of_iterations_ne_max) (number_of_iterations_lt_to_ne, number_of_iterations_lt) (get_cst_init_from_scev, record_nonwrapping_iv) (scev_var_range_cant_overflow): Likewise. * tree-ssa-phiopt.c (minmax_replacement): Likewise. * tree-ssa-pre.c (compute_avail): Likewise. * tree-ssa-sccvn.c (vn_reference_fold_indirect): Likewise. (vn_reference_maybe_forwprop_address, valueized_wider_op): Likewise. * tree-ssa-structalias.c (get_constraint_for_ptr_offset): Likewise. * tree-ssa-uninit.c (is_pred_expr_subset_of): Likewise. * tree-ssanames.c (set_nonzero_bits, get_nonzero_bits): Likewise. * tree-switch-conversion.c (collect_switch_conv_info, array_value_type) (dump_case_nodes, try_switch_expansion): Likewise. * tree-vect-loop-manip.c (vect_gen_vector_loop_niters): Likewise. (vect_do_peeling): Likewise. * tree-vect-patterns.c (vect_recog_bool_pattern): Likewise. * tree-vect-stmts.c (vectorizable_load): Likewise. * tree-vrp.c (compare_values_warnv, vrp_int_const_binop): Likewise. (zero_nonzero_bits_from_vr, ranges_from_anti_range): Likewise. (extract_range_from_binary_expr_1, adjust_range_with_scev): Likewise. (overflow_comparison_p_1, register_edge_assert_for_2): Likewise. (is_masked_range_test, find_switch_asserts, maybe_set_nonzero_bits) (vrp_evaluate_conditional_warnv_with_ops, intersect_ranges): Likewise. (range_fits_type_p, two_valued_val_range_p, vrp_finalize): Likewise. (evrp_dom_walker::before_dom_children): Likewise. * tree.c (cache_integer_cst, real_value_from_int_cst, integer_zerop) (integer_all_onesp, integer_pow2p, integer_nonzerop, tree_log2) (tree_floor_log2, tree_ctz, mem_ref_offset, tree_int_cst_sign_bit) (tree_int_cst_sgn, get_unwidened, int_fits_type_p): Likewise. (get_type_static_bounds, num_ending_zeros, drop_tree_overflow) (get_range_pos_neg): Likewise. * ubsan.c (ubsan_expand_ptr_ifn): Likewise. * config/darwin.c (darwin_mergeable_constant_section): Likewise. * config/aarch64/aarch64.c (aapcs_vfp_sub_candidate): Likewise. * config/arm/arm.c (aapcs_vfp_sub_candidate): Likewise. * config/avr/avr.c (avr_fold_builtin): Likewise. * config/bfin/bfin.c (bfin_local_alignment): Likewise. * config/msp430/msp430.c (msp430_attr): Likewise. * config/nds32/nds32.c (nds32_insert_attributes): Likewise. * config/powerpcspe/powerpcspe-c.c (altivec_resolve_overloaded_builtin): Likewise. * config/powerpcspe/powerpcspe.c (rs6000_aggregate_candidate) (rs6000_expand_ternop_builtin): Likewise. * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Likewise. * config/rs6000/rs6000.c (rs6000_aggregate_candidate): Likewise. (rs6000_expand_ternop_builtin): Likewise. * config/s390/s390.c (s390_handle_hotpatch_attribute): Likewise. gcc/ada/ * gcc-interface/decl.c (annotate_value): Use wi::to_wide when operating on trees as wide_ints. gcc/c/ * c-parser.c (c_parser_cilk_clause_vectorlength): Use wi::to_wide when operating on trees as wide_ints. * c-typeck.c (build_c_cast, c_finish_omp_clauses): Likewise. (c_tree_equal): Likewise. gcc/c-family/ * c-ada-spec.c (dump_generic_ada_node): Use wi::to_wide when operating on trees as wide_ints. * c-common.c (pointer_int_sum): Likewise. * c-pretty-print.c (pp_c_integer_constant): Likewise. * c-warn.c (match_case_to_enum_1): Likewise. (c_do_switch_warnings): Likewise. (maybe_warn_shift_overflow): Likewise. gcc/cp/ * cvt.c (ignore_overflows): Use wi::to_wide when operating on trees as wide_ints. * decl.c (check_array_designated_initializer): Likewise. * mangle.c (write_integer_cst): Likewise. * semantics.c (cp_finish_omp_clause_depend_sink): Likewise. gcc/fortran/ * target-memory.c (gfc_interpret_logical): Use wi::to_wide when operating on trees as wide_ints. * trans-const.c (gfc_conv_tree_to_mpz): Likewise. * trans-expr.c (gfc_conv_cst_int_power): Likewise. * trans-intrinsic.c (trans_this_image): Likewise. (gfc_conv_intrinsic_bound): Likewise. (conv_intrinsic_cobound): Likewise. gcc/lto/ * lto.c (compare_tree_sccs_1): Use wi::to_wide when operating on trees as wide_ints. gcc/objc/ * objc-act.c (objc_decl_method_attributes): Use wi::to_wide when operating on trees as wide_ints. From-SVN: r253595
2017-10-10tree-vect-loop-manip.c (rename_variables_in_bb): Rename PHI nodes when ↵Bin Cheng1-2/+0
copying loop nest with only one inner loop. * tree-vect-loop-manip.c (rename_variables_in_bb): Rename PHI nodes when copying loop nest with only one inner loop. gcc/testsuite * gcc.dg/tree-ssa/ldist-34.c: New test. From-SVN: r253586
2017-10-10tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Skip ↵Bin Cheng1-1/+2
renaming variables in new preheader if it's deleted. * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Skip renaming variables in new preheader if it's deleted. From-SVN: r253579
2017-09-22Let the target choose a vectorisation alignmentRichard Sandiford1-24/+27
The vectoriser aligned vectors to TYPE_ALIGN unconditionally, although there was also a hard-coded assumption that this was equal to the type size. This was inconvenient for SVE for two reasons: - When compiling for a specific power-of-2 SVE vector length, we might want to align to a full vector. However, the TYPE_ALIGN is governed by the ABI alignment, which is 128 bits regardless of size. - For vector-length-agnostic code it doesn't usually make sense to align, since the runtime vector length might not be a power of two. Even for power of two sizes, there's no guarantee that aligning to the previous 16 bytes will be an improveent. This patch therefore adds a target hook to control the preferred vectoriser (as opposed to ABI) alignment. 2017-09-22 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * target.def (preferred_vector_alignment): New hook. * doc/tm.texi.in (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): New hook. * doc/tm.texi: Regenerate. * targhooks.h (default_preferred_vector_alignment): Declare. * targhooks.c (default_preferred_vector_alignment): New function. * tree-vectorizer.h (dataref_aux): Add a target_alignment field. Expand commentary. (DR_TARGET_ALIGNMENT): New macro. (aligned_access_p): Update commentary. (vect_known_alignment_in_bytes): New function. * tree-vect-data-refs.c (vect_calculate_required_alignment): New function. (vect_compute_data_ref_alignment): Set DR_TARGET_ALIGNMENT. Calculate the misalignment based on the target alignment rather than the vector size. (vect_update_misalignment_for_peel): Use DR_TARGET_ALIGMENT rather than TYPE_ALIGN / BITS_PER_UNIT to update the misalignment. (vect_enhance_data_refs_alignment): Mask the byte misalignment with the target alignment, rather than masking the element misalignment with the number of elements in a vector. Also use the target alignment when calculating the maximum number of peels. (vect_find_same_alignment_drs): Use vect_calculate_required_alignment instead of TYPE_ALIGN_UNIT. (vect_duplicate_ssa_name_ptr_info): Remove stmt_info parameter. Measure DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT. (vect_create_addr_base_for_vector_ref): Update call accordingly. (vect_create_data_ref_ptr): Likewise. (vect_setup_realignment): Realign by ANDing with -DR_TARGET_MISALIGNMENT. * tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Calculate the number of peels based on DR_TARGET_ALIGNMENT. * tree-vect-stmts.c (get_group_load_store_type): Compare the gap with the guaranteed alignment boundary when deciding whether overrun is OK. (vectorizable_mask_load_store): Interpret DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT instead of TYPE_ALIGN_UNIT. (ensure_base_align): Remove stmt_info parameter. Get the target base alignment from DR_TARGET_ALIGNMENT. (vectorizable_store): Update call accordingly. Interpret DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT instead of TYPE_ALIGN_UNIT. (vectorizable_load): Likewise. gcc/testsuite/ * gcc.dg/vect/vect-outer-3a.c: Adjust dump scan for new wording of alignment message. * gcc.dg/vect/vect-outer-3a-big-array.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r253101
2017-08-04Use base inequality for some vector alias checksRichard Sandiford1-6/+39
This patch checks whether two data references x and y cannot partially overlap and so are independent whenever &x != &y. We can then use this in the vectoriser to optimise alias checks. gcc/ 2016-08-04 Richard Sandiford <richard.sandiford@linaro.org> * hash-traits.h (pair_hash): New struct. * tree-data-ref.h (data_dependence_relation): Add object_a and object_b fields. (DDR_OBJECT_A, DDR_OBJECT_B): New macros. * tree-data-ref.c (initialize_data_dependence_relation): Initialize DDR_OBJECT_A and DDR_OBJECT_B. * tree-vectorizer.h (vec_object_pair): New type. (_loop_vec_info): Add a check_unequal_addrs field. (LOOP_VINFO_CHECK_UNEQUAL_ADDRS): New macro. (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Return true if there is an entry in check_unequal_addrs. Check comp_alias_ddrs instead of may_alias_ddrs. * tree-vect-loop.c (destroy_loop_vec_info): Release LOOP_VINFO_CHECK_UNEQUAL_ADDRS. (vect_analyze_loop_2): Likewise, when restarting. (vect_estimate_min_profitable_iters): Estimate the cost of LOOP_VINFO_CHECK_UNEQUAL_ADDRS. * tree-vect-data-refs.c: Include tree-hash-traits.h. (vect_prune_runtime_alias_test_list): Try to handle conflicts using LOOP_VINFO_CHECK_UNEQUAL_ADDRS, if the data dependence allows. Count such tests in the final summary. * tree-vect-loop-manip.c (chain_cond_expr): New function. (vect_create_cond_for_align_checks): Use it. (vect_create_cond_for_unequal_addrs): New function. (vect_loop_versioning): Call it. gcc/testsuite/ * gcc.dg/vect/vect-alias-check-6.c: New test. From-SVN: r250868
2017-07-25re PR tree-optimization/81303 (410.bwaves regression caused by r249919)Richard Biener1-2/+2
2017-07-25 Richard Biener <rguenther@suse.de> PR tree-optimization/81303 * tree-vect-loop-manip.c (vect_loop_versioning): Build profitability check against LOOP_VINFO_NITERSM1. From-SVN: r250503
2017-07-03* tree-vect-loop-manip.c (vect_do_peeling): Fix scaling up.Jan Hubicka1-2/+2
From-SVN: r249929
2017-07-03Avoid minimum - 1 confusion in vectoriserRichard Sandiford1-1/+1
The variables that claimed to be the "minimum number of iterations" for which vectorisation was profitable were actually the maximum number of iterations for which vectorisation wasn't profitable. The loop threshold was too. This patch makes the values be what the variable names suggest. 2017-07-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-loop.c (vect_analyze_loop_2): Treat min_scalar_loop_bound, min_profitable_iters, and th as inclusive lower bounds. Fix LOOP_VINFO_PEELING_FOR_GAPS condition. (vect_estimate_min_profitable_iters): Return inclusive lower bounds for min_profitable_iters and min_profitable_estimate. (vect_transform_loop): Treat th as an inclusive lower bound. * tree-vect-loop-manip.c (vect_loop_versioning): Likewise. From-SVN: r249927
2017-07-03Use innermost_loop_behavior for outer loop vectorisationRichard Sandiford1-4/+2
This patch replaces the individual stmt_vinfo dr_* fields with an innermost_loop_behavior, so that the changes in later patches get picked up automatically. It also adds a helper function for getting the behavior of a data reference wrt the vectorised loop. 2017-07-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vectorizer.h (_stmt_vec_info): Replace individual dr_* fields with dr_wrt_vec_loop. (STMT_VINFO_DR_BASE_ADDRESS, STMT_VINFO_DR_INIT, STMT_VINFO_DR_OFFSET) (STMT_VINFO_DR_STEP, STMT_VINFO_DR_ALIGNED_TO): Update accordingly. (STMT_VINFO_DR_WRT_VEC_LOOP): New macro. (vect_dr_behavior): New function. (vect_create_addr_base_for_vector_ref): Remove loop parameter. * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use vect_dr_behavior. Use a step_preserves_misalignment_p boolean to track whether the step preserves the misalignment. (vect_create_addr_base_for_vector_ref): Remove loop parameter. Use vect_dr_behavior. (vect_setup_realignment): Update call accordingly. (vect_create_data_ref_ptr): Likewise. Use vect_dr_behavior. * tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Update call to vect_create_addr_base_for_vector_ref. (vect_create_cond_for_align_checks): Likewise. * tree-vect-patterns.c (vect_recog_bool_pattern): Copy STMT_VINFO_DR_WRT_VEC_LOOP as a block. (vect_recog_mask_conversion_pattern): Likewise. * tree-vect-stmts.c (compare_step_with_zero): Use vect_dr_behavior. (new_stmt_vec_info): Remove redundant zeroing. From-SVN: r249911
2017-07-01cfg.c (scale_bbs_frequencies): New function.Jan Hubicka1-27/+17
* cfg.c (scale_bbs_frequencies): New function. * cfg.h (scale_bbs_frequencies): Declare it. * cfgloopanal.c (single_likely_exit): Cleanup. * cfgloopmanip.c (scale_loop_frequencies): Take profile_probability as parameter. (scale_loop_profile): Likewise. (loop_version): Likewise. (create_empty_loop_on_edge): Update. * cfgloopmanip.h (scale_loop_frequencies, scale_loop_profile, scale_loop_frequencies, scale_loop_profile, loopify, loop_version): Update prototypes. * modulo-sched.c (sms_schedule): Update. * predict.c (unlikely_executed_edge_p): Also check probability. (probably_never_executed_edge_p): Fix typo. * tree-if-conv.c (version_loop_for_if_conversion): Update. * tree-parloops.c (gen_parallel_loop): Update. * tree-ssa-loop-ivcanon.c (try_peel_loop): Update. * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Update. * tree-ssa-loop-split.c (split_loop): Update. * tree-ssa-loop-unswitch.c (tree_unswitch_loop): Update. * tree-vect-loop-manip.c (vect_do_peeling): Update. (vect_loop_versioning): Update. * tree-vect-loop.c (scale_profile_for_vect_loop): Update. From-SVN: r249872
2017-06-29asan.c (asan_emit_stack_protection): Update.Jan Hubicka1-19/+31
* asan.c (asan_emit_stack_protection): Update. (create_cond_insert_point): Update. * auto-profile.c (afdo_propagate_circuit): Update. * basic-block.h (struct edge_def): Turn probability to profile_probability. (EDGE_FREQUENCY): Update. * bb-reorder.c (find_traces_1_round): Update. (better_edge_p): Update. (sanitize_hot_paths): Update. * cfg.c (unchecked_make_edge): Initialize probability to uninitialized. (make_single_succ_edge): Update. (check_bb_profile): Update. (dump_edge_info): Update. (update_bb_profile_for_threading): Update. * cfganal.c (connect_infinite_loops_to_exit): Initialize new edge probabilitycount to 0. * cfgbuild.c (compute_outgoing_frequencies): Update. * cfgcleanup.c (try_forward_edges): Update. (outgoing_edges_match): Update. (try_crossjump_to_edge): Update. * cfgexpand.c (expand_gimple_cond): Update make_single_succ_edge. (expand_gimple_tailcall): Update. (construct_init_block): Use make_single_succ_edge. (construct_exit_block): Use make_single_succ_edge. * cfghooks.c (verify_flow_info): Update. (redirect_edge_succ_nodup): Update. (split_edge): Update. (account_profile_record): Update. * cfgloopanal.c (single_likely_exit): Update. * cfgloopmanip.c (scale_loop_profile): Update. (set_zero_probability): Remove. (duplicate_loop_to_header_edge): Update. * cfgloopmanip.h (loop_version): Update prototype. * cfgrtl.c (try_redirect_by_replacing_jump): Update. (force_nonfallthru_and_redirect): Update. (update_br_prob_note): Update. (rtl_verify_edges): Update. (purge_dead_edges): Update. (rtl_lv_add_condition_to_bb): Update. * cgraph.c: (cgraph_edge::redirect_call_stmt_to_calle): Update. * cgraphunit.c (init_lowered_empty_function): Update. (cgraph_node::expand_thunk): Update. * cilk-common.c: Include profile-count.h * dojump.c (inv): Remove. (jumpifnot): Update. (jumpifnot_1): Update. (do_jump_1): Update. (do_jump): Update. (do_jump_by_parts_greater_rtx): Update. (do_compare_rtx_and_jump): Update. * dojump.h (jumpifnot, jumpifnot_1, jumpif_1, jumpif, do_jump, do_jump_1. do_compare_rtx_and_jump): Update prototype. * dwarf2cfi.c: Include profile-count.h * except.c (dw2_build_landing_pads): Use make_single_succ_edge. (sjlj_emit_dispatch_table): Likewise. * explow.c: Include profile-count.h * expmed.c (emit_store_flag_force): Update. (do_cmp_and_jump): Update. * expr.c (compare_by_pieces_d::generate): Update. (compare_by_pieces_d::finish_mode): Update. (emit_block_move_via_loop): Update. (store_expr_with_bounds): Update. (store_constructor): Update. (expand_expr_real_2): Update. (expand_expr_real_1): Update. * expr.h (try_casesi, try_tablejump): Update prototypes. * gimple-pretty-print.c (dump_probability): Update. (dump_profile): New. (dump_gimple_label): Update. (dump_gimple_bb_header): Update. * graph.c (draw_cfg_node_succ_edges): Update. * hsa-gen.c (convert_switch_statements): Update. * ifcvt.c (cheap_bb_rtx_cost_p): Update. (find_if_case_1): Update. (find_if_case_2): Update. * internal-fn.c (expand_arith_overflow_result_store): Update. (expand_addsub_overflow): Update. (expand_neg_overflow): Update. (expand_mul_overflow): Update. (expand_vector_ubsan_overflow): Update. * ipa-cp.c (good_cloning_opportunity_p): Update. * ipa-split.c (split_function): Use make_single_succ_edge. * ipa-utils.c (ipa_merge_profiles): Update. * loop-doloop.c (add_test): Update. (doloop_modify): Update. * loop-unroll.c (compare_and_jump_seq): Update. (unroll_loop_runtime_iterations): Update. * lra-constraints.c (lra_inheritance): Update. * lto-streamer-in.c (input_cfg): Update. * lto-streamer-out.c (output_cfg): Update. * mcf.c (adjust_cfg_counts): Update. * modulo-sched.c (sms_schedule): Update. * omp-expand.c (expand_omp_for_init_counts): Update. (extract_omp_for_update_vars): Update. (expand_omp_ordered_sink): Update. (expand_omp_for_ordered_loops): Update. (expand_omp_for_generic): Update. (expand_omp_for_static_nochunk): Update. (expand_omp_for_static_chunk): Update. (expand_cilk_for): Update. (expand_omp_simd): Update. (expand_omp_taskloop_for_outer): Update. (expand_omp_taskloop_for_inner): Update. * omp-simd-clone.c (simd_clone_adjust): Update. * optabs.c (expand_doubleword_shift): Update. (expand_abs): Update. (emit_cmp_and_jump_insn_1): Update. (expand_compare_and_swap_loop): Update. * optabs.h (emit_cmp_and_jump_insns): Update prototype. * predict.c (predictable_edge_p): Update. (edge_probability_reliable_p): Update. (set_even_probabilities): Update. (combine_predictions_for_insn): Update. (combine_predictions_for_bb): Update. (propagate_freq): Update. (estimate_bb_frequencies): Update. (force_edge_cold): Update. * profile-count.c (profile_count::dump): Add missing space into dump. (profile_count::debug): Add newline. (profile_count::differs_from_p): Explicitly convert to unsigned. (profile_count::stream_in): Update. (profile_probability::dump): New member function. (profile_probability::debug): New member function. (profile_probability::differs_from_p): New member function. (profile_probability::differs_lot_from_p): New member function. (profile_probability::stream_in): New member function. (profile_probability::stream_out): New member function. * profile-count.h (profile_count_quality): Rename to ... (profile_quality): ... this one. (profile_probability): New. (profile_count): Update. * profile.c (compute_branch_probabilities): Update. * recog.c (peep2_attempt): Update. * sched-ebb.c (schedule_ebbs): Update. * sched-rgn.c (find_single_block_region): Update. (compute_dom_prob_ps): Update. (schedule_region): Update. * sel-sched-ir.c (compute_succs_info): Update. * stmt.c (struct case_node): Update. (do_jump_if_equal): Update. (get_outgoing_edge_probs): Update. (conditional_probability): Update. (emit_case_dispatch_table): Update. (expand_case): Update. (expand_sjlj_dispatch_table): Update. (emit_case_nodes): Update. * targhooks.c: Update. * tracer.c (better_p): Update. (find_best_successor): Update. * trans-mem.c (expand_transaction): Update. * tree-call-cdce.c: Update. * tree-cfg.c (gimple_split_edge): Upate. (move_sese_region_to_fn): Upate. * tree-cfgcleanup.c (cleanup_control_expr_graph): Upate. * tree-eh.c (lower_resx): Upate. (cleanup_empty_eh_move_lp): Upate. * tree-if-conv.c (version_loop_for_if_conversion): Update. * tree-inline.c (copy_edges_for_bb): Update. (copy_cfg_body): Update. * tree-parloops.c (gen_parallel_loop): Update. * tree-profile.c (gimple_gen_ic_func_profiler): Update. (gimple_gen_time_profiler): Update. * tree-ssa-dce.c (remove_dead_stmt): Update. * tree-ssa-ifcombine.c (update_profile_after_ifcombine): Update. * tree-ssa-loop-im.c (execute_sm_if_changed): Update. * tree-ssa-loop-ivcanon.c (remove_exits_and_undefined_stmts): Update. (unloop_loops): Update. (try_peel_loop): Update. * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Update. * tree-ssa-loop-split.c (connect_loops): Update. (split_loop): Update. * tree-ssa-loop-unswitch.c (tree_unswitch_loop): Update. (hoist_guard): Update. * tree-ssa-phionlycprop.c (propagate_rhs_into_lhs): Update. * tree-ssa-phiopt.c (replace_phi_edge_with_variable): Update. (value_replacement): Update. * tree-ssa-reassoc.c (branch_fixup): Update. * tree-ssa-tail-merge.c (replace_block_by): Update. * tree-ssa-threadupdate.c (remove_ctrl_stmt_and_useless_edges): Update. (create_edge_and_update_destination_phis): Update. (compute_path_counts): Update. (recompute_probabilities): Update. (update_joiner_offpath_counts): Update. (freqs_to_counts_path): Update. (duplicate_thread_path): Update. * tree-switch-conversion.c (hoist_edge_and_branch_if_true): Update. (struct switch_conv_info): Update. (gen_inbound_check): Update. * tree-vect-loop-manip.c (slpeel_add_loop_guard): Update. (vect_do_peeling): Update. (vect_loop_versioning): Update. * tree-vect-loop.c (scale_profile_for_vect_loop): Update. (optimize_mask_stores): Update. * ubsan.c (ubsan_expand_null_ifn): Update. * value-prof.c (gimple_divmod_fixed_value): Update. (gimple_divmod_fixed_value_transform): Update. (gimple_mod_pow2): Update. (gimple_mod_pow2_value_transform): Update. (gimple_mod_subtract): Update. (gimple_mod_subtract_transform): Update. (gimple_ic): Update. (gimple_stringop_fixed_value): Update. (gimple_stringops_transform): Update. * value-prof.h: Update. From-SVN: r249800
2017-06-07tree-vect-loop-manip.c (vect_do_peeling): Don't skip vector loop if ↵Bin Cheng1-3/+5
versioning is required. * tree-vect-loop-manip.c (vect_do_peeling): Don't skip vector loop if versioning is required. * tree-vect-loop.c (vect_analyze_loop_2): Merge niter check for loop peeling with the check for versioning. From-SVN: r248959
2017-06-07tree-vectorizer.h (vect_build_loop_niters): New parameter.Bin Cheng1-21/+34
* tree-vectorizer.h (vect_build_loop_niters): New parameter. * tree-vect-loop-manip.c (vect_build_loop_niters): New parameter. Set true to new parameter if new ssa variable is defined. (vect_gen_vector_loop_niters): Refactor. Set range information for the new vector loop bound variable. (vect_do_peeling): Ditto. From-SVN: r248958
2017-06-04i386.c (make_resolver_func): Update.Jan Hubicka1-1/+1
2017-05-23 Jan Hubicka <hubicka@ucw.cz> * config/i386/i386.c (make_resolver_func): Update. * Makefile.in: Add profile-count.h and profile-count.o * auto-profile.c (afdo_indirect_call): Update to new API. (afdo_set_bb_count): Update. (afdo_propagate_edge): Update. (afdo_propagate_circuit): Update. (afdo_calculate_branch_prob): Update. (afdo_annotate_cfg): Update. * basic-block.h: Include profile-count.h (struct edge_def): Turn count to profile_count. (struct basic_block_def): Likewie. (REG_BR_PROB_BASE): Move to profile-count.h (RDIV): Move to profile-count.h * bb-reorder.c (max_entry_count): Turn to profile_count. (find_traces): Update. (rotate_loop):Update. (connect_traces):Update. (sanitize_hot_paths):Update. * bt-load.c (migrate_btr_defs): Update. * cfg.c (RDIV): Remove. (init_flow): Use alloc_block. (alloc_block): Uninitialize count. (unchecked_make_edge): Uninitialize count. (check_bb_profile): Update. (dump_edge_info): Update. (dump_bb_info): Update. (update_bb_profile_for_threading): Update. (scale_bbs_frequencies_int): Update. (scale_bbs_frequencies_gcov_type): Update. (scale_bbs_frequencies_profile_count): New. * cfg.h (update_bb_profile_for_threading): Update. (scale_bbs_frequencies_profile_count): Declare. * cfgbuild.c (compute_outgoing_frequencies): Update. (find_many_sub_basic_blocks): Update. * cfgcleanup.c (try_forward_edges): Update. (try_crossjump_to_edge): Update. * cfgexpand.c (expand_gimple_tailcall): Update. (construct_exit_block): Update. * cfghooks.c (verify_flow_info): Update. (dump_bb_for_graph): Update. (split_edge): Update. (make_forwarder_block): Update. (duplicate_block): Update. (account_profile_record): Update. * cfgloop.c (find_subloop_latch_edge_by_profile): Update. (get_estimated_loop_iterations): Update. * cfgloopanal.c (expected_loop_iterations_unbounded): Update. (single_likely_exit): Update. * cfgloopmanip.c (scale_loop_profile): Update. (loopify): Update. (set_zero_probability): Update. (lv_adjust_loop_entry_edge): Update. * cfgrtl.c (force_nonfallthru_and_redirect): Update. (purge_dead_edges): Update. (rtl_account_profile_record): Update. * cgraph.c (cgraph_node::create): Uninitialize count. (symbol_table::create_edge): Uninitialize count. (cgraph_update_edges_for_call_stmt_node): Update. (cgraph_edge::dump_edge_flags): Update. (cgraph_node::dump): Update. (cgraph_edge::maybe_hot_p): Update. * cgraph.h: Include profile-count.h (create_clone), create_edge, create_indirect_edge): Update. (cgraph_node): Turn count to profile_count. (cgraph_edge0: Likewise. (make_speculative, clone): Update. (create_edge): Update. (init_lowered_empty_function): Update. * cgraphclones.c (cgraph_edge::clone): Update. (duplicate_thunk_for_node): Update. (cgraph_node::create_clone): Update. * cgraphunit.c (cgraph_node::analyze): Update. (cgraph_node::expand_thunk): Update. * final.c (dump_basic_block_info): Update. * gimple-streamer-in.c (input_bb): Update. * gimple-streamer-out.c (output_bb): Update. * graphite.c (print_global_statistics): Update. (print_graphite_scop_statistics): Update. * hsa-brig.c: Include basic-block.h. * hsa-dump.c: Include basic-block.h. * hsa-gen.c (T sum_slice): Update. (convert_switch_statements):Update. * hsa-regalloc.c: Include basic-block.h. * ipa-chkp.c (chkp_produce_thunks): Update. * ipa-cp.c (struct caller_statistics): Update. (init_caller_stats): Update. (gather_caller_stats): Update. (ipcp_cloning_candidate_p): Update. (good_cloning_opportunity_p): Update. (get_info_about_necessary_edges): Update. (dump_profile_updates): Update. (update_profiling_info): Update. (update_specialized_profile): Update. (perhaps_add_new_callers): Update. (decide_about_value): Update. (ipa_cp_c_finalize): Update. * ipa-devirt.c (struct odr_type_warn_count): Update. (struct decl_warn_count): Update. (struct final_warning_record): Update. (possible_polymorphic_call_targets): Update. (ipa_devirt): Update. * ipa-fnsummary.c (redirect_to_unreachable): Update. * ipa-icf.c (sem_function::merge): Update. * ipa-inline-analysis.c (do_estimate_edge_time): Update. * ipa-inline.c (compute_uninlined_call_time): Update. (compute_inlined_call_time): Update. (want_inline_small_function_p): Update. (want_inline_self_recursive_call_p): Update. (edge_badness): Update. (lookup_recursive_calls): Update. (recursive_inlining): Update. (inline_small_functions): Update. (dump_overall_stats): Update. (dump_inline_stats): Update. * ipa-profile.c (ipa_profile_generate_summary): Update. (ipa_propagate_frequency): Update. (ipa_profile): Update. * ipa-prop.c (ipa_make_edge_direct_to_target): Update. * ipa-utils.c (ipa_merge_profiles): Update. * loop-doloop.c (doloop_modify): Update. * loop-unroll.c (report_unroll): Update. (unroll_loop_runtime_iterations): Update. * lto-cgraph.c (lto_output_edge): Update. (lto_output_node): Update. (input_node): Update. (input_edge): Update. (merge_profile_summaries): Update. * lto-streamer-in.c (input_cfg): Update. * lto-streamer-out.c (output_cfg): Update. * mcf.c (create_fixup_graph): Update. (adjust_cfg_counts): Update. (sum_edge_counts): Update. * modulo-sched.c (sms_schedule): Update. * postreload-gcse.c (eliminate_partially_redundant_load): Update. * predict.c (maybe_hot_count_p): Update. (probably_never_executed): Update. (dump_prediction): Update. (combine_predictions_for_bb): Update. (propagate_freq): Update. (handle_missing_profiles): Update. (counts_to_freqs): Update. (rebuild_frequencies): Update. (force_edge_cold): Update. * predict.h: Include profile-count.h (maybe_hot_count_p, counts_to_freqs): UPdate. * print-rtl-function.c: Do not include cfg.h * print-rtl.c: Include basic-block.h * profile-count.c: New file. * profile-count.h: New file. * profile.c (is_edge_inconsistent): Update. (correct_negative_edge_counts): Update. (is_inconsistent): Update. (set_bb_counts): Update. (read_profile_edge_counts): Update. (compute_frequency_overlap): Update. (compute_branch_probabilities): Update; Initialize and deinitialize gcov_count tables. (branch_prob): Update. * profile.h (bb_gcov_counts, edge_gcov_counts): New. (edge_gcov_count): New. (bb_gcov_count): New. * shrink-wrap.c (try_shrink_wrapping): Update. * tracer.c (better_p): Update. * trans-mem.c (expand_transaction): Update. (ipa_tm_insert_irr_call): Update. (ipa_tm_insert_gettmclone_call): Update. * tree-call-cdce.c: Update. * tree-cfg.c (gimple_duplicate_sese_region): Update. (gimple_duplicate_sese_tail): Update. (gimple_account_profile_record): Update. (execute_fixup_cfg): Update. * tree-inline.c (copy_bb): Update. (copy_edges_for_bb): Update. (initialize_cfun): Update. (freqs_to_counts): Update. (copy_cfg_body): Update. (expand_call_inline): Update. * tree-ssa-ifcombine.c (update_profile_after_ifcombine): Update. * tree-ssa-loop-ivcanon.c (unloop_loops): Update. (try_unroll_loop_completely): Update. (try_peel_loop): Update. * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Update. * tree-ssa-loop-niter.c (estimate_numbers_of_iterations_loop): Update. * tree-ssa-loop-split.c (connect_loops): Update. * tree-ssa-loop-unswitch.c (hoist_guard): Update. * tree-ssa-reassoc.c (branch_fixup): Update. * tree-ssa-tail-merge.c (replace_block_by): Update. * tree-ssa-threadupdate.c (create_block_for_threading): Update. (compute_path_counts): Update. (update_profile): Update. (recompute_probabilities): Update. (update_joiner_offpath_counts): Update. (estimated_freqs_path): Update. (freqs_to_counts_path): Update. (clear_counts_path): Update. (ssa_fix_duplicate_block_edges): Update. (duplicate_thread_path): Update. * tree-switch-conversion.c (case_bit_test_cmp): Update. (struct switch_conv_info): Update. * tree-tailcall.c (decrease_profile): Update. * tree-vect-loop-manip.c (slpeel_add_loop_guard): Update. * tree-vect-loop.c (scale_profile_for_vect_loop): Update. * value-prof.c (check_counter): Update. (gimple_divmod_fixed_value): Update. (gimple_mod_pow2): Update. (gimple_mod_subtract): Update. (gimple_ic_transform): Update. (gimple_stringop_fixed_value): Update. * value-prof.h (gimple_ic): Update. * gcc.dg/tree-ssa/attr-hotcold-2.c: Update template. From-SVN: r248863
2017-05-31* tree-vect-loop-manip.c (create_intersect_range_checks_index)Bin Cheng1-214/+2
(create_intersect_range_checks): Move from ... * tree-data-ref.c (create_intersect_range_checks_index) (create_intersect_range_checks): ... to here. (create_runtime_alias_checks): New function factored from ... * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): ... here. Call above function. * tree-data-ref.h (create_runtime_alias_checks): New function. From-SVN: r248726
2017-05-26tree-vect-loop-manip.c (create_intersect_range_checks_index): Pass in ↵Bin Cheng1-10/+10
parameter loop, rather than loop_vinfo. * tree-vect-loop-manip.c (create_intersect_range_checks_index): Pass in parameter loop, rather than loop_vinfo. (create_intersect_range_checks): Ditto. (vect_create_cond_for_alias_checks): Update call to above functions. From-SVN: r248513
2017-03-28tree-vect-loop-manip.c (slpeel_add_loop_guard): New param and mark new ↵Bin Cheng1-5/+13
edge's irreducible flag accordign to it. * tree-vect-loop-manip.c (slpeel_add_loop_guard): New param and mark new edge's irreducible flag accordign to it. (vect_do_peeling): Check loop preheader edge's irreducible flag and pass it to function slpeel_add_loop_guard. gcc/testsuite * gcc.c-torture/compile/irreducible-loop.c: New. From-SVN: r246540
2017-03-07tree-vect-loop-manip.c (slpeel_add_loop_guard): Preserve preheaders.Richard Biener1-0/+5
2017-03-07 Richard Biener <rguenther@suse.de> * tree-vect-loop-manip.c (slpeel_add_loop_guard): Preserve preheaders. From-SVN: r245950
2017-02-15re PR tree-optimization/79347 (vect_do_peeling is messing up profile)Bin Cheng1-2/+37
PR tree-optimization/79347 * tree-vect-loop-manip.c (vect_do_peeling): Maintain profile counters during peeling. gcc/testsuite * gcc.dg/vect/pr79347.c: New test. From-SVN: r245490
2017-02-05re PR tree-optimization/79347 (vect_do_peeling is messing up profile)Jan Hubicka1-2/+4
PR tree-ssa/79347 * cfgloopmanip.c (lv_adjust_loop_entry_edge, loop_version): Add ELSE_PROB. * cfgloopmanip.h (loop_version): Update prototype. * modulo-sched.c (sms_schedule): Update call of loop_version. * tree-if-conv.c(version_loop_for_if_conversion): Likewise. * tree-parloops.c (gen_parallel_loop): Likewise. * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Likewise. * tree-ssa-loop-split.c (split_loop): Likewise. * tree-ssa-loop-unswitch.c (tree_unswitch_loop): Likewise. * tree-vect-loop-manip.c (vect_loop_versioning): Likewise. * gcc.dg/tree-ssa/ifc-10.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-11.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-12.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-20040816-1.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-20040816-2.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-5.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-8.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-9.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-cd.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-pr56541.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-pr68583.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-pr69489-1.c: Match for profile mismatches. * gcc.dg/tree-ssa/ifc-pr69489-2.c: Match for profile mismatches. From-SVN: r245196
2017-01-09re PR tree-optimization/78899 (Vestorized loop with optmized mask stores ↵Jakub Jelinek1-4/+20
motion is completely deleted after r242520.) PR tree-optimization/78899 * tree-if-conv.c (version_loop_for_if_conversion): Instead of returning bool return struct loop *, NULL for failure and the new loop on success. (versionable_outer_loop_p): Don't version outer loop if it has dont_vectorized bit set. (tree_if_conversion): When versioning outer loop, ensure tree_if_conversion is performed also on the inner loop of the non-vectorizable outer loop copy. * tree-vectorizer.c (set_uid_loop_bbs): Formatting fix. Fold LOOP_VECTORIZED in inner loop of the scalar outer loop and prevent vectorization of it. (vectorize_loops): For outer + inner LOOP_VECTORIZED, ensure the outer loop vectorization of the non-scalar version is attempted before vectorization of the inner loop in scalar version. If outer LOOP_VECTORIZED guarded loop is not vectorized, prevent vectorization of its inner loop. * tree-vect-loop-manip.c (rename_variables_in_bb): If outer_loop has 2 inner loops, rename also on edges from bb whose single pred is outer_loop->header. Fix typo in function comment. * gcc.target/i386/pr78899.c: New test. * gcc.dg/pr71077.c: New test. From-SVN: r244238
2017-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r243994
2016-12-08re PR middle-end/78684 (ICE in create_intersect_range_checks_index, at ↵Bin Cheng1-2/+1
tree-vect-loop-manip.c:2074) PR middle-end/78684 * tree-vect-loop-manip.c (create_intersect_range_checks_index): Check sign bit for index step of data reference. gcc/testsuite PR middle-end/78684 * g++.dg/torture/pr78684.C: New test. From-SVN: r243431
2016-11-16Support non-masked epilogue vectoriziationYuri Rumyantsev1-3/+7
gcc/ 2016-11-16 Yuri Rumyantsev <ysrumyan@gmail.com> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. * tree-if-conv.c (tree_if_conversion): Make public. * * tree-if-conv.h: New file. * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid dynamic alias checks for epilogues. * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. * tree-vect-loop.c: include tree-if-conv.h. (new_loop_vec_info): Add zeroing orig_loop_info field. (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL if epilogue is vectorized, set up orig_loop_info field of loop_vinfo using passed argument. (vect_transform_loop): Check if created epilogue should be returned for further vectorization with less vf. If-convert epilogue if required. Print vectorization success for epilogue. * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization if it is required, pass loop_vinfo produced during vectorization of loop body to vect_analyze_loop. * tree-vectorizer.h (struct _loop_vec_info): Add new field orig_loop_info. (LOOP_VINFO_ORIG_LOOP_INFO): New. (LOOP_VINFO_EPILOGUE_P): New. (LOOP_VINFO_ORIG_VECT_FACTOR): New. (vect_do_peeling): Change prototype to return epilogue. (vect_analyze_loop): Add argument of loop_vec_info type. (vect_transform_loop): Return created loop. gcc/testsuite/ 2016-11-16 Yuri Rumyantsev <ysrumyan@gmail.com> * lib/target-supports.exp (check_avx2_hw_available): New. (check_effective_target_avx2_runtime): New. * gcc.dg/vect/vect-tail-nomask-1.c: New test. From-SVN: r242501
2016-11-16Fix nb_iterations calculation in tree-vect-loop-manip.cRichard Sandiford1-1/+4
We previously stored the number of loop iterations rather than the number of latch iterations. gcc/ 2016-11-15 Richard Sandiford <richard.sandiford@arm.com> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> * tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): Set nb_iterations to the number of latch iterations rather than the number of loop iterations. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r242493
2016-10-19re PR tree-optimization/78005 (172.mgrid and 450.soplex miscompare)Bin Cheng1-40/+43
PR tree-optimization/78005 * tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Compute upper (included) bound for niters of prolog loop. (vect_gen_scalar_loop_niters): Change parameter VF to VFM1. Compute niters of scalar loop above which vectorized loop is preferred, as well as the upper (included) bound for the niters. (vect_do_peeling): Record niter bound for loops accordingly. gcc/testsuite PR tree-optimization/78005 * gcc.dg/vect/pr78005.c: New. * gcc.target/i386/l_fma_float_1.c: Revise test. * gcc.target/i386/l_fma_float_2.c: Ditto. * gcc.target/i386/l_fma_float_3.c: Ditto. * gcc.target/i386/l_fma_float_4.c: Ditto. * gcc.target/i386/l_fma_float_5.c: Ditto. * gcc.target/i386/l_fma_float_6.c: Ditto. * gcc.target/i386/l_fma_double_1.c: Ditto. * gcc.target/i386/l_fma_double_2.c: Ditto. * gcc.target/i386/l_fma_double_3.c: Ditto. * gcc.target/i386/l_fma_double_4.c: Ditto. * gcc.target/i386/l_fma_double_5.c: Ditto. * gcc.target/i386/l_fma_double_6.c: Ditto. From-SVN: r241339
2016-10-13tree-vect-loop-manip.c (adjust_vec_debug_stmts): Don't release adjust_vec ↵Bin Cheng1-1131/+843
automatically. * tree-vect-loop-manip.c (adjust_vec_debug_stmts): Don't release adjust_vec automatically. (slpeel_add_loop_guard): Remove param cond_expr_stmt_list. Rename param exit_bb to guard_to. (slpeel_checking_verify_cfg_after_peeling): (set_prologue_iterations): (create_lcssa_for_virtual_phi): New func which is factored out from slpeel_tree_peel_loop_to_edge. (slpeel_tree_peel_loop_to_edge): (iv_phi_p): New func. (vect_can_advance_ivs_p): Call iv_phi_p. (vect_update_ivs_after_vectorizer): Call iv_phi_p. Directly insert new gimple stmts in basic block. (vect_do_peeling_for_loop_bound): (vect_do_peeling_for_alignment): (vect_gen_niters_for_prolog_loop): Rename to... (vect_gen_prolog_loop_niters): ...Rename from. Change parameters and adjust implementation. (vect_update_inits_of_drs): Fix code style issue. Convert niters to sizetype if necessary. (vect_build_loop_niters): Move to here from tree-vect-loop.c. Change it to external function. (vect_gen_scalar_loop_niters, vect_gen_vector_loop_niters): New. (vect_gen_vector_loop_niters_mult_vf): New. (slpeel_update_phi_nodes_for_loops): New. (slpeel_update_phi_nodes_for_guard1): Reimplement. (find_guard_arg, slpeel_update_phi_nodes_for_guard2): Reimplement. (slpeel_update_phi_nodes_for_lcssa, vect_do_peeling): New. * tree-vect-loop.c (vect_build_loop_niters): Move to file tree-vect-loop-manip.c (vect_generate_tmps_on_preheader): Delete. (vect_transform_loop): Rename vectorization_factor to vf. Call vect_do_peeling instead of vect_do_peeling-* functions. * tree-vectorizer.h (vect_do_peeling): New decl. (vect_build_loop_niters, vect_gen_vector_loop_niters): New decls. (vect_do_peeling_for_loop_bound): Delete. (vect_do_peeling_for_alignment): Delete. From-SVN: r241099
2016-10-13tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Put ↵Bin Cheng1-6/+7
duplicated loop after its preheader and after the original loop. * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Put duplicated loop after its preheader and after the original loop. From-SVN: r241098
2016-10-13tree-vect-loop-manip.c (slpeel_can_duplicate_loop_p): Fix code style issue.Bin Cheng1-9/+3
* tree-vect-loop-manip.c (slpeel_can_duplicate_loop_p): Fix code style issue. (vect_do_peeling_for_loop_bound, vect_do_peeling_for_alignment): Remove useless code. From-SVN: r241092
2016-09-28re PR tree-optimization/77724 (bootstrap-O3 broken: ICE: in tree_to_uhwi, at ↵Robin Dapp1-5/+7
tree.c:7330) Fix PR tree-optimization/77724 2016-09-27 Robin Dapp <rdapp@linux.vnet.ibm.com> PR tree-optimization/77724 * tree-vect-loop-manip.c (create_intersect_range_checks_index): Add tree_fits_shwi_p check. From-SVN: r240565
2016-09-23tree-vect-loop-manip.c (create_intersect_range_checks_index): New.Bin Cheng1-47/+190
* tree-vect-loop-manip.c (create_intersect_range_checks_index): New. (create_intersect_range_checks): New. (vect_create_cond_for_alias_checks): Call above function. From-SVN: r240412
2016-08-19re PR tree-optimization/77286 (ICE in fold_convert_loc, at fold-const.c:2248 ↵Richard Biener1-7/+18
building 435.gromacs) 2016-08-19 Richard Biener <rguenther@suse.de> PR tree-optimization/77286 * tree-vect-loop-manip.c (slpeel_duplicate_current_defs_from_edges): Deal with virtual PHIs being out-of-order. * gcc.dg/torture/pr77286.c: New testcase. From-SVN: r239605
2016-08-17tree-ssa.c: Include tree-cfg.h and tree-dfa.h.Richard Biener1-0/+3
2016-08-17 Richard Biener <rguenther@suse.de> * tree-ssa.c: Include tree-cfg.h and tree-dfa.h. (verify_vssa): New function verifying virtual SSA form. (verify_ssa): Call it. * tree-ssa-loop-manip.c (slpeel_update_phi_nodes_for_guard2): Do not apply loop-closed SSA handling to virtuals. * ssa-iterators.h (op_iter_init): Handle GIMPLE_TRANSACTION. * tree-into-ssa.c (prepare_use_sites_for): Skip virtual SSA names when rewriting their symbol. (prepare_def_site_for): Likewise. * tree-chkp-opt.c (chkp_reduce_bounds_lifetime): Clear virtual operands of moved stmts. From-SVN: r239524
2016-08-01re PR tree-optimization/71818 (ICE in as_a, at is-a.h:192 w/ -O2 ↵Alan Hayward1-1/+18
-ftree-vectorize) 2016-08-01 Alan Hayward <alan.hayward@arm.com> gcc/ PR tree-optimization/71818 * tree-vect-loop-manip.c (vect_can_advance_ivs_p): Don't advance IVs with non invariant evolutions testsuite/ PR tree-optimization/71818 * gcc.dg/vect/pr71818.c: New From-SVN: r238955