This PR is an odd case in which, due to the low optimisation level,
we enter vectorisation with:
  outer1:
    x_1 = PHI <x_3(outer2), ...>;
    ...
  inner:
    x_2 = 0;
    ...
  outer2:
    x_3 = PHI <x_2(inner)>;
These statements are tentatively treated as a double reduction by
vect_force_simple_reduction, but in the end only x_3 and x_2 are marked
as relevant. vect_analyze_loop_operations skips over x_3, leaving the
vectorizable_reduction check to a presumed future test of x_1, which
in this case never happens. We therefore end up vectorising x_2 only
(complete with peeling for niters!) and leave the scalar x_3 in place.
This caused a segfault in the support for fully-masked loops,
since there were no statements that needed masking. Fixed by
checking for that.
But I think this is also a flaw in vect_analyze_loop_operations.
Outer loop vectorisation reduces the number of times that the
inner loop is executed, so it wouldn't necessarily be valid
to leave the scalar x_3 in place for all vectorisable x_2.
There's already code to forbid that when x_1 isn't present:
  /* FORNOW: we currently don't support the case that these phis
     are not used in the outerloop (unless it is double reduction,
     i.e., this phi is vect_reduction_def), cause this case
     requires to actually do something here. */
I think we need to do the same if x_1 is present but not relevant.
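A hypothetical source-level nest with this shape (illustrative only,
not the actual gcc.dg/pr83922.c testcase) would be:

  int
  f (int n)
  {
    int x = 1;
    for (int i = 0; i < n; ++i)    /* outer1/outer2 */
      for (int j = 0; j < n; ++j)  /* inner */
        x = 0;                     /* x_2 = 0 */
    return x;
  }

Here x_1 is the outer-loop phi for x, x_2 its redefinition in the
inner loop, and x_3 the phi that carries the value back around the
outer loop.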
2018-01-19 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/83922
* tree-vect-loop.c (vect_verify_full_masking): Return false if
there are no statements that need masking.
(vect_active_double_reduction_p): New function.
(vect_analyze_loop_operations): Use it when handling phis that
are not in the loop header.
gcc/testsuite/
PR tree-optimization/83922
* gcc.dg/pr83922.c: New test.
From-SVN: r256885
This testcase ICEd because we converted the initial value of an
induction to the vector element type even for nested inductions.
This isn't necessary because the initial expression is vectorised
normally, and it meant that init_expr was no longer the original
statement operand by the time we called vect_get_vec_def_for_operand.
Also, adding the conversion code here made the existing SLP conversion
redundant.
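As a concept sketch (hypothetical code, not the actual pr83914.c
testcase), a nested induction is one belonging to a loop nested
inside the loop being vectorised:

  void
  f (int *restrict out, int n)
  {
    for (int i = 0; i < n; ++i)      /* loop being vectorised */
      {
        int s = 0;
        for (int j = 0; j < 8; ++j)  /* j: an induction nested within it */
          s += j + i;
        out[i] = s;
      }
  }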
2018-01-19 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/83914
* tree-vect-loop.c (vectorizable_induction): Don't convert
init_expr or apply the peeling adjustment for inductions
that are nested within the vectorized loop.
gcc/testsuite/
PR tree-optimization/83914
* gcc.dg/vect/pr83914.c: New test.
From-SVN: r256884
vect_analyze_loop_operations was calling vectorizable_live_operation
for all live-out phis, which led to a bogus ncopies calculation in
the pure SLP case. I think v_a_l_o should only be passing phis
that are vectorised using normal loop vectorisation, since
vect_slp_analyze_node_operations handles the SLP side (and knows
the correct slp_index and slp_node arguments to pass in, via
vect_analyze_stmt).
With that fixed we hit an older bug that vectorizable_live_operation
didn't handle live-out SLP inductions. Fixed by using gimple_phi_result
rather than gimple_get_lhs for phis.
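As a concept sketch (hypothetical code, not the actual pr83857.c
testcase), a live-out SLP induction arises when grouped inductions
have final values that are used after the loop:

  void
  f (int n, int *restrict res)
  {
    int x = 0, y = 0;
    for (int i = 0; i < n; ++i)
      {
        x += 2;   /* two inductions that can form an SLP group */
        y += 3;
      }
    res[0] = x;   /* both final values are live outside the loop */
    res[1] = y;
  }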
2018-01-16 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/83857
* tree-vect-loop.c (vect_analyze_loop_operations): Don't call
vectorizable_live_operation for pure SLP statements.
(vectorizable_live_operation): Handle PHIs.
gcc/testsuite/
PR tree-optimization/83857
* gcc.dg/vect/pr83857.c: New test.
From-SVN: r256747
This patch adds runtime alias checks for loops with variable strides,
so that we can vectorise them even without a restrict qualifier.
There are several parts to doing this:
1) For accesses like:
x[i * n] += 1;
we need to check whether n (and thus the DR_STEP) is nonzero.
vect_analyze_data_ref_dependence records values that need to be
checked in this way, then prune_runtime_alias_test_list records a
bounds check on DR_STEP being outside the range [0, 0].
2) For accesses like:
x[i * n] = x[i * n + 1] + 1;
we simply need to test whether abs (n) >= 2.
prune_runtime_alias_test_list looks for cases like this and tries
to guess whether it is better to use this kind of check or a check
for non-overlapping ranges. (We could do an OR of the two conditions
at runtime, but that isn't implemented yet.)
3) Checks for overlapping ranges need to cope with variable strides.
At present the "length" of each segment in a range check is
represented as an offset from the base that lies outside the
touched range, in the same direction as DR_STEP. The length
can therefore be negative and is sometimes conservative.
With variable steps it's easier to reason about if we split
this into two:
seg_len:
  distance travelled from the first iteration of interest
  to the last, e.g. DR_STEP * (VF - 1)
access_size:
  the number of bytes accessed in each iteration
with access_size always being a positive constant and seg_len
possibly being variable. We can then combine alias checks
for two accesses that are a constant number of bytes apart by
adjusting the access size to account for the gap. This leaves
the segment length unchanged, which allows the check to be combined
with further accesses.
When seg_len is positive, the runtime alias check has the form:
  base_a >= base_b + seg_len_b + access_size_b
  || base_b >= base_a + seg_len_a + access_size_a
In many cases the base will be aligned to the access size, which
allows us to skip the addition:
  base_a > base_b + seg_len_b
  || base_b > base_a + seg_len_a
A similar saving is possible with "negative" lengths.
The patch therefore tracks the alignment in addition to seg_len
and access_size.
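Putting the pieces together, the versioning conditions have roughly
the following shape (illustrative variable names, not actual GCC
internals):

  /* Case (1): DR_STEP must be outside the range [0, 0], i.e. nonzero.  */
  step != 0

  /* Case (2): for x[i * n] = x[i * n + 1] + 1, there is no overlap as
     long as the stride covers at least two elements.  */
  abs_step >= 2 * elt_size

  /* Case (3): non-overlapping ranges, for positive seg_len.  */
  base_a >= base_b + seg_len_b + access_size_b
  || base_b >= base_a + seg_len_a + access_size_a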
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vec_lower_bound): New structure.
(_loop_vec_info): Add check_nonzero and lower_bounds.
(LOOP_VINFO_CHECK_NONZERO): New macro.
(LOOP_VINFO_LOWER_BOUNDS): Likewise.
(LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too.
* tree-data-ref.h (dr_with_seg_len): Add access_size and align
fields. Make seg_len the distance travelled, not including the
access size.
(dr_direction_indicator): Declare.
(dr_zero_step_indicator): Likewise.
(dr_known_forward_stride_p): Likewise.
* tree-data-ref.c: Include stringpool.h, tree-vrp.h and
tree-ssanames.h.
(runtime_alias_check_p): Allow runtime alias checks with
variable strides.
(operator ==): Compare access_size and align.
(prune_runtime_alias_test_list): Rework for new distinction between
the access_size and seg_len.
(create_intersect_range_checks_index): Likewise. Cope with polynomial
segment lengths.
(get_segment_min_max): New function.
(create_intersect_range_checks): Use it.
(dr_step_indicator): New function.
(dr_direction_indicator): Likewise.
(dr_zero_step_indicator): Likewise.
(dr_known_forward_stride_p): Likewise.
* tree-loop-distribution.c (data_ref_segment_size): Return
DR_STEP * (niters - 1).
(compute_alias_check_pairs): Update call to the dr_with_seg_len
constructor.
* tree-vect-data-refs.c (vect_check_nonzero_value): New function.
(vect_preserves_scalar_order_p): New function, split out from...
(vect_analyze_data_ref_dependence): ...here. Check for zero steps.
(vect_vfa_segment_size): Return DR_STEP * (length_factor - 1).
(vect_vfa_access_size): New function.
(vect_vfa_align): Likewise.
(vect_compile_time_alias): Take access_size_a and access_size_b arguments.
(dump_lower_bound): New function.
(vect_check_lower_bound): Likewise.
(vect_small_gap_p): Likewise.
(vectorizable_with_step_bound_p): Likewise.
(vect_prune_runtime_alias_test_list): Ignore cross-iteration
dependencies if the vectorization factor is 1. Convert the checks
for nonzero steps into checks on the bounds of DR_STEP. Try using
a bounds check for variable steps if the minimum required step is
relatively small. Update calls to the dr_with_seg_len
constructor and to vect_compile_time_alias.
* tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New
function.
(vect_loop_versioning): Call it.
* tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS
when retrying.
(vect_estimate_min_profitable_iters): Account for any bounds checks.
gcc/testsuite/
* gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather
than SLP vectorization.
* gcc.dg/vect/vect-alias-check-10.c: New test.
* gcc.dg/vect/vect-alias-check-11.c: Likewise.
* gcc.dg/vect/vect-alias-check-12.c: Likewise.
* gcc.dg/vect/vect-alias-check-8.c: Likewise.
* gcc.dg/vect/vect-alias-check-9.c: Likewise.
* gcc.target/aarch64/sve/strided_load_8.c: Likewise.
* gcc.target/aarch64/sve/var_stride_1.c: Likewise.
* gcc.target/aarch64/sve/var_stride_1.h: Likewise.
* gcc.target/aarch64/sve/var_stride_1_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
* gcc.target/aarch64/sve/var_stride_2_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_3.c: Likewise.
* gcc.target/aarch64/sve/var_stride_3_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_4.c: Likewise.
* gcc.target/aarch64/sve/var_stride_4_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_5.c: Likewise.
* gcc.target/aarch64/sve/var_stride_5_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_6.c: Likewise.
* gcc.target/aarch64/sve/var_stride_6_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_7.c: Likewise.
* gcc.target/aarch64/sve/var_stride_7_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_8.c: Likewise.
* gcc.target/aarch64/sve/var_stride_8_run.c: Likewise.
* gfortran.dg/vect/vect-alias-check-1.F90: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256644
This patch adds support for in-order floating-point addition reductions,
which are suitable even in strict IEEE mode.
Previously vect_is_simple_reduction would reject any cases that forbid
reassociation. The idea is instead to tentatively accept them as
"FOLD_LEFT_REDUCTIONs" and only fail later if there is no support
for them. Although this patch only handles the particular case of plus
and minus on floating-point types, there's no reason in principle why
we couldn't handle other cases.
The reductions use a new fold_left_plus_optab if available, otherwise
they fall back to elementwise additions or subtractions.
The vect_force_simple_reduction change makes it easier for parloops
to read the type of reduction.
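For example, a minimal sketch of the kind of strict-IEEE reduction
this enables:

  double
  f (double *x, int n)
  {
    double res = 0.0;
    for (int i = 0; i < n; ++i)
      res += x[i];   /* must be summed in order without -ffast-math */
    return res;
  }

On SVE this can now be vectorised using FADDA instead of being rejected.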
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* optabs.def (fold_left_plus_optab): New optab.
* doc/md.texi (fold_left_plus_@var{m}): Document.
* internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function.
* internal-fn.c (fold_left_direct): Define.
(expand_fold_left_optab_fn): Likewise.
(direct_fold_left_optab_supported_p): Likewise.
* fold-const-call.c (fold_const_fold_left): New function.
(fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS.
* tree-parloops.c (valid_reduction_p): New function.
(gather_scalar_reductions): Use it.
* tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type.
(vect_finish_replace_stmt): Declare.
* tree-vect-loop.c (fold_left_reduction_fn): New function.
(needs_fold_left_reduction_p): New function, split out from...
(vect_is_simple_reduction): ...here. Accept reductions that
forbid reassociation, but give them type FOLD_LEFT_REDUCTION.
(vect_force_simple_reduction): Also store the reduction type in
the assignment's STMT_VINFO_REDUC_TYPE.
(vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION.
(merge_with_identity): New function.
(vect_expand_fold_left): Likewise.
(vectorize_fold_left_reduction): Likewise.
(vectorizable_reduction): Handle FOLD_LEFT_REDUCTION. Leave the
scalar phi in place for it. Check for target support and reject
cases that would reassociate the operation. Defer the transform
phase to vectorize_fold_left_reduction.
* config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec.
* config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander.
(*fold_left_plus_<mode>, *pred_fold_left_plus_<mode>): New insns.
gcc/testsuite/
* gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and
check for a message about using in-order reductions.
* gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and
check for a message about using in-order reductions.
* gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be
vectorized and check for a message about using in-order reductions.
Expect targets with variable-length vectors to fall back to the
fixed-length minimum.
* gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and
check for a message about using in-order reductions.
* gcc.dg/vect/vect-reduc-in-order-1.c: New test.
* gcc.dg/vect/vect-reduc-in-order-2.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-3.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_1.c: New test.
* gcc.target/aarch64/sve/reduc_strict_1_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_2_run.c: Likewise.
* gcc.target/aarch64/sve/reduc_strict_3.c: Likewise.
* gcc.target/aarch64/sve/slp_13.c: Add floating-point types.
* gfortran.dg/vect/vect-8.f90: Expect 22 loops to be vectorized if
vect_fold_left_plus.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256639
This patch adds support for fully-masking loops that require peeling
for gaps. It peels exactly one scalar iteration and uses the masked
loop to handle the rest. Previously we would fall back on using a
standard unmasked loop instead.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Replace
vfm1 with a bound_epilog parameter.
(vect_do_peeling): Update calls accordingly, and move the prologue
call earlier in the function. Treat the base bound_epilog as 0 for
fully-masked loops and retain vf - 1 for other loops. Add 1 to
this base when peeling for gaps.
* tree-vect-loop.c (vect_analyze_loop_2): Allow peeling for gaps
with fully-masked loops.
(vect_estimate_min_profitable_iters): Handle the single peeled
iteration in that case.
gcc/testsuite/
* gcc.target/aarch64/sve/struct_vect_18.c: Check the number
of branches.
* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_20.c: New test.
* gcc.target/aarch64/sve/struct_vect_20_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_21.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_21_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_22.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_22_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_23.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_23_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256635
This patch uses SVE CLASTB to optimise conditional reductions. It means
that we no longer need to maintain a separate index vector to record
the most recent valid value, and no longer need to worry about overflow
cases.
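A sketch of a typical conditional reduction, in the style of the
pr65947 tests:

  int
  last_pos (int *x, int n)
  {
    int last = -1;
    for (int i = 0; i < n; ++i)
      if (x[i] > 0)
        last = i;   /* conditional reduction: index of the last match */
    return last;
  }

With CLASTB the vectorised loop can carry the latest matching value
directly, rather than tracking a separate index vector.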
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (fold_extract_last_@var{m}): Document.
* doc/sourcebuild.texi (vect_fold_extract_last): Likewise.
* optabs.def (fold_extract_last_optab): New optab.
* internal-fn.def (FOLD_EXTRACT_LAST): New internal function.
* internal-fn.c (fold_extract_direct): New macro.
(expand_fold_extract_optab_fn): Likewise.
(direct_fold_extract_optab_supported_p): Likewise.
* tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type.
* tree-vect-loop.c (vect_model_reduction_cost): Handle
EXTRACT_LAST_REDUCTION.
(get_initial_def_for_reduction): Do not create an initial vector
for EXTRACT_LAST_REDUCTION reductions.
(vectorizable_reduction): Leave the scalar phi in place for
EXTRACT_LAST_REDUCTIONs. Try using EXTRACT_LAST_REDUCTION
ahead of INTEGER_INDUC_COND_REDUCTION. Do not check for an
epilogue code for EXTRACT_LAST_REDUCTION and defer the
transform phase to vectorizable_condition.
* tree-vect-stmts.c (vect_finish_stmt_generation_1): New function,
split out from...
(vect_finish_stmt_generation): ...here.
(vect_finish_replace_stmt): New function.
(vectorizable_condition): Handle EXTRACT_LAST_REDUCTION.
* config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New
pattern.
* config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec.
gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_vect_fold_extract_last): New proc.
* gcc.dg/vect/pr65947-1.c: Update dump messages. Add markup
for fold_extract_last.
* gcc.dg/vect/pr65947-2.c: Likewise.
* gcc.dg/vect/pr65947-3.c: Likewise.
* gcc.dg/vect/pr65947-4.c: Likewise.
* gcc.dg/vect/pr65947-5.c: Likewise.
* gcc.dg/vect/pr65947-6.c: Likewise.
* gcc.dg/vect/pr65947-9.c: Likewise.
* gcc.dg/vect/pr65947-10.c: Likewise.
* gcc.dg/vect/pr65947-12.c: Likewise.
* gcc.dg/vect/pr65947-14.c: Likewise.
* gcc.dg/vect/pr80631-1.c: Likewise.
* gcc.target/aarch64/sve/clastb_1.c: New test.
* gcc.target/aarch64/sve/clastb_1_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_2.c: Likewise.
* gcc.target/aarch64/sve/clastb_2_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_3.c: Likewise.
* gcc.target/aarch64/sve/clastb_3_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_4.c: Likewise.
* gcc.target/aarch64/sve/clastb_4_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_5.c: Likewise.
* gcc.target/aarch64/sve/clastb_5_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_6.c: Likewise.
* gcc.target/aarch64/sve/clastb_6_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_7.c: Likewise.
* gcc.target/aarch64/sve/clastb_7_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256633
This patch uses the SVE LASTB instruction to optimise cases in which
a value produced by the final scalar iteration of a vectorised loop is
live outside the loop. Previously this situation would stop us from
using a fully-masked loop.
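A minimal sketch of such a live-out value:

  int
  f (int *x, int n)
  {
    int last = 0;
    for (int i = 0; i < n; ++i)
      {
        x[i] += 1;
        last = x[i];   /* final-iteration value, live after the loop */
      }
    return last;
  }

LASTB extracts the element for the last active lane of the final
(possibly partial) vector iteration.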
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (extract_last_@var{m}): Document.
* optabs.def (extract_last_optab): New optab.
* internal-fn.def (EXTRACT_LAST): New internal function.
* internal-fn.c (cond_unary_direct): New macro.
(expand_cond_unary_optab_fn): Likewise.
(direct_cond_unary_optab_supported_p): Likewise.
* tree-vect-loop.c (vectorizable_live_operation): Allow fully-masked
loops using EXTRACT_LAST.
* config/aarch64/aarch64-sve.md (aarch64_sve_lastb<mode>): Rename to...
(extract_last_<mode>): ...this optab.
(vec_extract<mode><Vel>): Update accordingly.
gcc/testsuite/
* gcc.target/aarch64/sve/live_1.c: New test.
* gcc.target/aarch64/sve/live_1_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256632
This patch adds support for aligning vectors by using a partial
first iteration. E.g. if the start pointer is 3 elements beyond
an aligned address, the first iteration will have a mask in which
the first three elements are false.
On SVE, the optimisation is only useful for vector-length-specific
code. Vector-length-agnostic code doesn't try to align vectors
since the vector length might not be a power of 2.
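For example, with 4-element vectors and a start pointer 3 elements past
an aligned address, the loop can start at the aligned base and run its
first iteration under the mask { 0, 0, 0, 1 }, leaving only the real
first element active; the 3 skipped elements are what
LOOP_VINFO_MASK_SKIP_NITERS records.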
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field.
(LOOP_VINFO_MASK_SKIP_NITERS): New macro.
(vect_use_loop_mask_for_alignment_p): New function.
(vect_prepare_for_masked_peels, vect_gen_while_not): Declare.
* tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an
niters_skip argument. Make sure that the first niters_skip elements
of the first iteration are inactive.
(vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS.
Update call to vect_set_loop_masks_directly.
(get_misalign_in_elems): New function, split out from...
(vect_gen_prolog_loop_niters): ...here.
(vect_update_init_of_dr): Take a code argument that specifies whether
the adjustment should be added or subtracted.
(vect_update_init_of_drs): Likewise.
(vect_prepare_for_masked_peels): New function.
(vect_do_peeling): Skip prologue peeling if we're using a mask
instead. Update call to vect_update_init_of_drs.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
mask_skip_niters.
(vect_analyze_loop_2): Allow fully-masked loops with peeling for
alignment. Do not include the number of peeled iterations in
the minimum threshold in that case.
(vectorizable_induction): Adjust the start value down by
LOOP_VINFO_MASK_SKIP_NITERS iterations.
(vect_transform_loop): Call vect_prepare_for_masked_peels.
Take the number of skipped iterations into account when calculating
the loop bounds.
* tree-vect-stmts.c (vect_gen_while_not): New function.
gcc/testsuite/
* gcc.target/aarch64/sve/nopeel_1.c: New test.
* gcc.target/aarch64/sve/peel_ind_1.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_1_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_4.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_4_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256630
Fully-masked loops can be profitable even if the iteration
count is smaller than the vectorisation factor. In this case
we're effectively doing a complete unroll followed by SLP.
The documentation for min-vect-loop-bound said that the
default value was 0, but actually the default and minimum
were 1. We need it to be 0 for this case since the parameter
counts a whole number of vector iterations.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/sourcebuild.texi (vect_fully_masked): Document.
* params.def (PARAM_MIN_VECT_LOOP_BOUND): Change minimum and
default value to 0.
* tree-vect-loop.c (vect_analyze_loop_costing): New function,
split out from...
(vect_analyze_loop_2): ...here. Don't check the vectorization
factor against the number of loop iterations if the loop is
fully-masked.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_fully_masked):
New proc.
* gcc.dg/vect/slp-3.c: Expect all loops to be vectorized if
vect_fully_masked.
* gcc.target/aarch64/sve/loop_add_4.c: New test.
* gcc.target/aarch64/sve/loop_add_4_run.c: Likewise.
* gcc.target/aarch64/sve/loop_add_5.c: Likewise.
* gcc.target/aarch64/sve/loop_add_5_run.c: Likewise.
* gcc.target/aarch64/sve/miniloop_1.c: Likewise.
* gcc.target/aarch64/sve/miniloop_2.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256629
This patch removes the restriction that fully-masked loops cannot
have reductions. The key thing here is to make sure that the
reduction accumulator doesn't include any values associated with
inactive lanes; the patch adds a bunch of conditional binary
operations for doing that.
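Schematically (GIMPLE-like pseudocode, not exact syntax), a
fully-masked sum reduction updates its accumulator as:

  vec_sum_1 = COND_ADD (loop_mask, vec_sum_0, vec_data);

where active lanes get vec_sum_0 + vec_data and inactive lanes keep
vec_sum_0, so they never contaminate the final result.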
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (cond_add@var{mode}, cond_sub@var{mode})
(cond_and@var{mode}, cond_ior@var{mode}, cond_xor@var{mode})
(cond_smin@var{mode}, cond_smax@var{mode}, cond_umin@var{mode})
(cond_umax@var{mode}): Document.
* optabs.def (cond_add_optab, cond_sub_optab, cond_and_optab)
(cond_ior_optab, cond_xor_optab, cond_smin_optab, cond_smax_optab)
(cond_umin_optab, cond_umax_optab): New optabs.
* internal-fn.def (COND_ADD, COND_SUB, COND_MIN, COND_MAX, COND_AND)
(COND_IOR, COND_XOR): New internal functions.
* internal-fn.h (get_conditional_internal_fn): Declare.
* internal-fn.c (cond_binary_direct): New macro.
(expand_cond_binary_optab_fn): Likewise.
(direct_cond_binary_optab_supported_p): Likewise.
(get_conditional_internal_fn): New function.
* tree-vect-loop.c (vectorizable_reduction): Handle fully-masked loops.
Cope with reduction statements that are vectorized as calls rather
than assignments.
* config/aarch64/aarch64-sve.md (cond_<optab><mode>): New insns.
* config/aarch64/iterators.md (UNSPEC_COND_ADD, UNSPEC_COND_SUB)
(UNSPEC_COND_SMAX, UNSPEC_COND_UMAX, UNSPEC_COND_SMIN)
(UNSPEC_COND_UMIN, UNSPEC_COND_AND, UNSPEC_COND_ORR)
(UNSPEC_COND_EOR): New unspecs.
(optab): Add mappings for them.
(SVE_COND_INT_OP, SVE_COND_FP_OP): New int iterators.
(sve_int_op, sve_fp_op): New int attributes.
gcc/testsuite/
* gcc.dg/vect/pr60482.c: Remove XFAIL for variable-length vectors.
* gcc.target/aarch64/sve/reduc_1.c: Expect the loop operations
to be predicated.
* gcc.target/aarch64/sve/slp_5.c: Check for a fully-masked loop.
* gcc.target/aarch64/sve/slp_7.c: Likewise.
* gcc.target/aarch64/sve/reduc_5.c: New test.
* gcc.target/aarch64/sve/slp_13.c: Likewise.
* gcc.target/aarch64/sve/slp_13_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256626
This patch adds support for using a single fully-predicated loop instead
of a vector loop and a scalar tail. An SVE WHILELO instruction generates
the predicate for each iteration of the loop, given the current scalar
iv value and the loop bound. This operation is wrapped up in a new internal
function called WHILE_ULT. E.g.:
WHILE_ULT (0, 3, { 0, 0, 0, 0 }) -> { 1, 1, 1, 0 }
WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }
The third WHILE_ULT argument is needed to make the operation
unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
different numbers of elements.
Note that the patch uses "mask" and "fully-masked" instead of
"predicate" and "fully-predicated", to follow existing GCC terminology.
This patch just handles the simple cases, punting for things like
reductions and live-out values. Later patches remove most of these
restrictions.
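Schematically, a fully-masked loop has this shape (pseudocode, not
the exact generated GIMPLE):

  i = 0;
  mask = WHILE_ULT (i, n, all_false);
  while (/* mask has an active lane */)
    {
      /* all loads, arithmetic and stores predicated on mask */
      i += VF;
      mask = WHILE_ULT (i, n, all_false);
    }

The final iteration simply runs with a partial mask, so no scalar
tail is needed.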
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* optabs.def (while_ult_optab): New optab.
* doc/md.texi (while_ult@var{m}@var{n}): Document.
* internal-fn.def (WHILE_ULT): New internal function.
* internal-fn.h (direct_internal_fn_supported_p): New override
that takes two types as argument.
* internal-fn.c (while_direct): New macro.
(expand_while_optab_fn): New function.
(convert_optab_supported_p): Likewise.
(direct_while_optab_supported_p): New macro.
* wide-int.h (wi::udiv_ceil): New function.
* tree-vectorizer.h (rgroup_masks): New structure.
(vec_loop_masks): New typedef.
(_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
and fully_masked_p.
(LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
(LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
(vect_max_vf): New function.
(slpeel_make_loop_iterate_ntimes): Delete.
(vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
(vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
(vect_record_loop_mask, vect_get_loop_mask): Likewise.
* tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
internal-fn.h, stor-layout.h and optabs-query.h.
(vect_set_loop_mask): New function.
(add_preheader_seq): Likewise.
(add_header_seq): Likewise.
(interleave_supported_p): Likewise.
(vect_maybe_permute_loop_masks): Likewise.
(vect_set_loop_masks_directly): Likewise.
(vect_set_loop_condition_masked): Likewise.
(vect_set_loop_condition_unmasked): New function, split out from
slpeel_make_loop_iterate_ntimes.
(slpeel_make_loop_iterate_ntimes): Rename to...
(vect_set_loop_condition): ...this. Use vect_set_loop_condition_masked
for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
(vect_do_peeling): Update call accordingly.
(vect_gen_vector_loop_niters): Use VF as the step for fully-masked
loops.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
mask_compare_type, can_fully_mask_p and fully_masked_p.
(release_vec_loop_masks): New function.
(_loop_vec_info::~_loop_vec_info): Use it to free the loop masks.
(can_produce_all_loop_masks_p): New function.
(vect_get_max_nscalars_per_iter): Likewise.
(vect_verify_full_masking): Likewise.
(vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
retries, and free the mask rgroups before retrying. Check loop-wide
reasons for disallowing fully-masked loops. Make the final decision
about whether to use a fully-masked loop or not.
(vect_estimate_min_profitable_iters): Do not assume that peeling
for the number of iterations will be needed for fully-masked loops.
(vectorizable_reduction): Disable fully-masked loops.
(vectorizable_live_operation): Likewise.
(vect_halve_mask_nunits): New function.
(vect_double_mask_nunits): Likewise.
(vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
(vect_transform_loop): Handle the case in which the final loop
iteration might handle a partial vector. Call vect_set_loop_condition
instead of slpeel_make_loop_iterate_ntimes.
* tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h.
(check_load_store_masking): New function.
(prepare_load_store_mask): Likewise.
(vectorizable_store): Handle fully-masked loops.
(vectorizable_load): Likewise.
(supportable_widening_operation): Use vect_halve_mask_nunits for
booleans.
(supportable_narrowing_operation): Likewise, using vect_double_mask_nunits.
(vect_gen_while): New function.
* config/aarch64/aarch64.md (umax<mode>3): New expander.
(aarch64_uqdec<mode>): New insn.
gcc/testsuite/
* gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization.
* gcc.dg/tree-ssa/peel1.c: Likewise.
* gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for
variable-length vectors.
* gcc.target/aarch64/sve/vcond_6.c: XFAIL test for AND.
* gcc.target/aarch64/sve/vec_bool_cmp_1.c: Expect BIC instead of NOT.
* gcc.target/aarch64/sve/slp_1.c: Check for a fully-masked loop.
* gcc.target/aarch64/sve/slp_2.c: Likewise.
* gcc.target/aarch64/sve/slp_3.c: Likewise.
* gcc.target/aarch64/sve/slp_4.c: Likewise.
* gcc.target/aarch64/sve/slp_6.c: Likewise.
* gcc.target/aarch64/sve/slp_8.c: New test.
* gcc.target/aarch64/sve/slp_8_run.c: Likewise.
* gcc.target/aarch64/sve/slp_9.c: Likewise.
* gcc.target/aarch64/sve/slp_9_run.c: Likewise.
* gcc.target/aarch64/sve/slp_10.c: Likewise.
* gcc.target/aarch64/sve/slp_10_run.c: Likewise.
* gcc.target/aarch64/sve/slp_11.c: Likewise.
* gcc.target/aarch64/sve/slp_11_run.c: Likewise.
* gcc.target/aarch64/sve/slp_12.c: Likewise.
* gcc.target/aarch64/sve/slp_12_run.c: Likewise.
* gcc.target/aarch64/sve/ld1r_2.c: Likewise.
* gcc.target/aarch64/sve/ld1r_2_run.c: Likewise.
* gcc.target/aarch64/sve/while_1.c: Likewise.
* gcc.target/aarch64/sve/while_2.c: Likewise.
* gcc.target/aarch64/sve/while_3.c: Likewise.
* gcc.target/aarch64/sve/while_4.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256625
This patch adds support for the SVE bitwise reduction instructions
(ANDV, ORV and EORV). It's a fairly mechanical extension of existing
REDUC_* operators.
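A minimal sketch of a loop that now maps to ANDV (OR and XOR are
analogous):

  unsigned int
  f (unsigned int *x, int n)
  {
    unsigned int res = ~0U;
    for (int i = 0; i < n; ++i)
      res &= x[i];   /* bitwise AND reduction */
    return res;
  }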
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab)
(reduc_xor_scal_optab): New optabs.
* doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m})
(reduc_xor_scal_@var{m}): Document.
* doc/sourcebuild.texi (vect_logical_reduc): Likewise.
* internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New
internal functions.
* fold-const-call.c (fold_const_call): Handle them.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new
internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR.
* config/aarch64/aarch64-sve.md (reduc_<bit_reduc>_scal_<mode>)
(*reduc_<bit_reduc>_scal_<mode>): New patterns.
* config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_ORV)
(UNSPEC_XORV): New unspecs.
(optab): Add entries for them.
(BITWISEV): New int iterator.
(bit_reduc_op): New int attributes.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_logical_reduc):
New proc.
* gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc
and add an associated scan-dump test. Prevent vectorization
of the first two loops.
* gcc.dg/vect/vect-reduc-or_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_1.c: Add AND, IOR and XOR reductions.
* gcc.target/aarch64/sve/reduc_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_1_run.c: Likewise.
(INIT_VECTOR): Tweak initial value so that some bits are always set.
* gcc.target/aarch64/sve/reduc_2_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256624
Two things stopped us using SLP reductions with variable-length vectors:
(1) We didn't have a way of constructing the initial vector.
This patch does it by creating a vector full of the neutral
identity value and then using a shift-and-insert function
to insert any non-identity inputs into the low-numbered elements.
(The non-identity values are needed for double reductions.)
Alternatively, for unchained MIN/MAX reductions that have no neutral
value, we instead use the same duplicate-and-interleave approach as
for SLP constant and external definitions (added by a previous
patch).
(2) The epilogue for constant-length vectors would extract the vector
elements associated with each SLP statement and do scalar arithmetic
on these individual elements. For variable-length vectors, the patch
instead creates a reduction vector for each SLP statement, replacing
the elements for other SLP statements with the identity value.
It then uses a hardware reduction instruction on each vector.
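A minimal sketch of an SLP reduction of this kind, with two scalar
results per iteration (illustrative only):

  void
  f (double *restrict x, double *restrict res, int n)
  {
    double r0 = res[0], r1 = res[1];   /* non-identity initial values */
    for (int i = 0; i < n; i += 2)
      {
        r0 += x[i];
        r1 += x[i + 1];
      }
    res[0] = r0;
    res[1] = r1;
  }

For (1), the initial vector starts as all zeros (the neutral value for
addition) and VEC_SHL_INSERT shifts in r1 and then r0, giving
{ r0, r1, 0, ... }; for (2), each of r0 and r1 gets its own reduction
vector with the other statement's lanes replaced by zeros.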
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (vec_shl_insert_@var{m}): Document.
* internal-fn.def (VEC_SHL_INSERT): New internal function.
* optabs.def (vec_shl_insert_optab): New optab.
* tree-vectorizer.h (can_duplicate_and_interleave_p): Declare.
(duplicate_and_interleave): Likewise.
* tree-vect-loop.c: Include internal-fn.h.
(neutral_op_for_slp_reduction): New function, split out from
get_initial_defs_for_reduction.
(get_initial_def_for_reduction): Handle option 2 for variable-length
vectors by loading the neutral value into a vector and then shifting
the initial value into element 0.
(get_initial_defs_for_reduction): Replace the code argument with
the neutral value calculated by neutral_op_for_slp_reduction.
Use gimple_build_vector for constant-length vectors.
Use IFN_VEC_SHL_INSERT for variable-length vectors if all
but the first group_size elements have a neutral value.
Use duplicate_and_interleave otherwise.
(vect_create_epilog_for_reduction): Take a neutral_op parameter.
Update call to get_initial_defs_for_reduction. Handle SLP
reductions for variable-length vectors by creating one vector
result for each scalar result, with the elements associated
with other scalar results stubbed out with the neutral value.
(vectorizable_reduction): Call neutral_op_for_slp_reduction.
Require IFN_VEC_SHL_INSERT for double reductions on
variable-length vectors, or SLP reductions that have
a neutral value. Require can_duplicate_and_interleave_p
support for variable-length unchained SLP reductions if there
is no neutral value, such as for MIN/MAX reductions. Also require
the number of vector elements to be a multiple of the number of
SLP statements when doing variable-length unchained SLP reductions.
Update call to vect_create_epilog_for_reduction.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Make public
and remove initial values.
(duplicate_and_interleave): Make public.
* config/aarch64/aarch64.md (UNSPEC_INSR): New unspec.
* config/aarch64/aarch64-sve.md (vec_shl_insert_<mode>): New insn.
gcc/testsuite/
* gcc.dg/vect/pr37027.c: Remove XFAIL for variable-length vectors.
* gcc.dg/vect/pr67790.c: Likewise.
* gcc.dg/vect/slp-reduc-1.c: Likewise.
* gcc.dg/vect/slp-reduc-2.c: Likewise.
* gcc.dg/vect/slp-reduc-3.c: Likewise.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.target/aarch64/sve/slp_5.c: New test.
* gcc.target/aarch64/sve/slp_5_run.c: Likewise.
* gcc.target/aarch64/sve/slp_6.c: Likewise.
* gcc.target/aarch64/sve/slp_6_run.c: Likewise.
* gcc.target/aarch64/sve/slp_7.c: Likewise.
* gcc.target/aarch64/sve/slp_7_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256623
We had:
  if (vec_outside_cost <= 0)
    min_profitable_iters = 0;
  else
    {
      min_profitable_iters = ((vec_outside_cost - scalar_outside_cost)
                              * assumed_vf
                              - vec_inside_cost * peel_iters_prologue
                              - vec_inside_cost * peel_iters_epilogue)
                             / ((scalar_single_iter_cost * assumed_vf)
                                - vec_inside_cost);
which can lead to negative min_profitable_iters when the *_outside_costs
are the same and peel_iters_epilogue is nonzero (e.g. if we're peeling
for gaps).
This is tested as part of the patch that adds support for fully-predicated
loops.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Make sure
min_profitable_iters doesn't go negative.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256621
This patch adds support for vectorising groups of IFN_MASK_LOADs
and IFN_MASK_STOREs using conditional load/store-lanes instructions.
This requires new internal functions to represent the result
(IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs.
The normal IFN_{LOAD,STORE}_LANES functions are const operations
that logically just perform the permute: the load or store is
encoded as a MEM operand to the call statement. In contrast,
the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of
interface as IFN_MASK_{LOAD,STORE}, since the memory is only
conditionally accessed.
The AArch64 patterns were added as part of the main LD[234]/ST[234] patch.
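A sketch of the kind of loop this enables, in the style of the
mask_struct_load tests (illustrative only):

  void
  f (int *restrict dest, int *restrict src, int *restrict cond, int n)
  {
    for (int i = 0; i < n; ++i)
      if (cond[i])
        dest[i] = src[i * 2] + src[i * 2 + 1];   /* masked 2-element group */
  }

On SVE the group can be loaded with a predicated LD2 via
IFN_MASK_LOAD_LANES.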
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document.
(vec_mask_store_lanes@var{m}@var{n}): Likewise.
* optabs.def (vec_mask_load_lanes_optab): New optab.
(vec_mask_store_lanes_optab): Likewise.
* internal-fn.def (MASK_LOAD_LANES): New internal function.
(MASK_STORE_LANES): Likewise.
* internal-fn.c (mask_load_lanes_direct): New macro.
(mask_store_lanes_direct): Likewise.
(expand_mask_load_optab_fn): Handle masked operations.
(expand_mask_load_lanes_optab_fn): New macro.
(expand_mask_store_optab_fn): Handle masked operations.
(expand_mask_store_lanes_optab_fn): New macro.
(direct_mask_load_lanes_optab_supported_p): Likewise.
(direct_mask_store_lanes_optab_supported_p): Likewise.
* tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p
parameter.
(vect_load_lanes_supported): Likewise.
* tree-vect-data-refs.c (strip_conversion): New function.
(can_group_stmts_p): Likewise.
(vect_analyze_data_ref_accesses): Use it instead of checking
for a pair of assignments.
(vect_store_lanes_supported): Take a masked_p parameter.
(vect_load_lanes_supported): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Update calls to
vect_store_lanes_supported and vect_load_lanes_supported.
* tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
* tree-vect-stmts.c (get_group_load_store_type): Take a masked_p
parameter. Don't allow gaps for masked accesses.
Use vect_get_store_rhs. Update calls to vect_store_lanes_supported
and vect_load_lanes_supported.
(get_load_store_type): Take a masked_p parameter and update
call to get_group_load_store_type.
(vectorizable_store): Update call to get_load_store_type.
Handle IFN_MASK_STORE_LANES.
(vectorizable_load): Update call to get_load_store_type.
Handle IFN_MASK_LOAD_LANES.
gcc/testsuite/
* gcc.dg/vect/vect-ooo-group-1.c: New test.
* gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_8.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256620
128b right away, to be more efficient for Ryzen and Intel)
2018-01-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/80846
* target.def (split_reduction): New target hook.
* targhooks.c (default_split_reduction): New function.
* targhooks.h (default_split_reduction): Declare.
* tree-vect-loop.c (vect_create_epilog_for_reduction): If the
target requests first reduce vectors by combining low and high
parts.
* tree-vect-stmts.c (vect_gen_perm_mask_any): Adjust.
(get_vectype_for_scalar_type_and_size): Export.
* tree-vectorizer.h (get_vectype_for_scalar_type_and_size): Declare.
* doc/tm.texi.in (TARGET_VECTORIZE_SPLIT_REDUCTION): Document.
* doc/tm.texi: Regenerate.
* config/i386/i386.c (ix86_split_reduction): Implement
TARGET_VECTORIZE_SPLIT_REDUCTION.
gcc/testsuite/
* gcc.target/i386/pr80846-1.c: New testcase.
* gcc.target/i386/pr80846-2.c: Likewise.
From-SVN: r256576
vectorizable_mask_load_store replaces scalar IFN_MASK_LOAD calls with
dummy assignments, so that they never survive vectorisation. This patch
moves the code to vect_transform_loop instead, so that we only change
the scalar statements once all of them have been vectorised.
This makes it easier to handle other types of functions that need
stubbing out, and also makes it easier to handle groups and patterns.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-loop.c (vect_transform_loop): Stub out scalar
IFN_MASK_LOAD calls here rather than...
* tree-vect-stmts.c (vectorizable_mask_load_store): ...here.
From-SVN: r256210
This patch changes GET_MODE_SIZE from unsigned short to poly_uint16.
The non-mechanical parts were handled by previous patches.
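A sketch of the typical mechanical change, using the standard
poly-int predicates:

  /* Before: mode sizes were plain integers.  */
  if (GET_MODE_SIZE (m1) == GET_MODE_SIZE (m2))
    ...

  /* After: polynomial sizes are compared with known_eq, maybe_ne
     etc., since two sizes can be neither known equal nor known
     unequal.  */
  if (known_eq (GET_MODE_SIZE (m1), GET_MODE_SIZE (m2)))
    ...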
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (mode_size): Change from unsigned short to
poly_uint16_pod.
(mode_to_bytes): Return a poly_uint16 rather than an unsigned short.
(GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
(fixed_size_mode::includes_p): Check for constant-sized modes.
* genmodes.c (emit_mode_size_inline): Make mode_size_inline
return a poly_uint16 rather than an unsigned short.
(emit_mode_size): Change the type of mode_size from unsigned short
to poly_uint16_pod. Use ZERO_COEFFS for the initializer.
(emit_mode_adjustments): Cope with polynomial vector sizes.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_SIZE.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_SIZE.
* auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial.
* builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise.
* caller-save.c (setup_save_areas): Likewise.
(replace_reg_with_saved_mem): Likewise.
* calls.c (emit_library_call_value_1): Likewise.
* combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise.
* combine.c (simplify_set, make_extraction, simplify_shift_const_1)
(gen_lowpart_for_combine): Likewise.
* convert.c (convert_to_integer_1): Likewise.
* cse.c (equiv_constant, cse_insn): Likewise.
* cselib.c (autoinc_split, cselib_hash_rtx): Likewise.
(cselib_subst_to_values): Likewise.
* dce.c (word_dce_process_block): Likewise.
* df-problems.c (df_word_lr_mark_ref): Likewise.
* dwarf2cfi.c (init_one_dwarf_reg_size): Likewise.
* dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor)
(concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor)
(rtl_for_decl_location): Likewise.
* emit-rtl.c (gen_highpart, widen_memory_access): Likewise.
* expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise.
* expr.c (emit_group_load_1, clear_storage_hints): Likewise.
(emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise.
(expand_expr_real_1): Likewise.
* function.c (assign_parm_setup_block_p, assign_parm_setup_block)
(pad_below): Likewise.
* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
* gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise.
* ira.c (get_subreg_tracking_sizes): Likewise.
* ira-build.c (ira_create_allocno_objects): Likewise.
* ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise.
(ira_sort_regnos_for_alter_reg): Likewise.
* ira-costs.c (record_operand_costs): Likewise.
* lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn)
(resolve_simple_move): Likewise.
* lra-constraints.c (get_reload_reg, operands_match_p): Likewise.
(process_addr_reg, simplify_operand_subreg, curr_insn_transform)
(lra_constraints): Likewise.
(CONST_POOL_OK_P): Reject variable-sized modes.
* lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare)
(add_pseudo_to_slot, lra_spill): Likewise.
* omp-low.c (omp_clause_aligned_alignment): Likewise.
* optabs-query.c (get_best_extraction_insn): Likewise.
* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
* optabs.c (expand_vec_perm_var, expand_vec_cond_expr): Likewise.
(expand_mult_highpart, valid_multiword_target_p): Likewise.
* recog.c (offsettable_address_addr_space_p): Likewise.
* regcprop.c (maybe_mode_change): Likewise.
* reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise.
* regrename.c (build_def_use): Likewise.
* regstat.c (dump_reg_info): Likewise.
* reload.c (complex_word_subreg_p, push_reload, find_dummy_reload)
(find_reloads, find_reloads_subreg_address): Likewise.
* reload1.c (eliminate_regs_1): Likewise.
* rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise.
* simplify-rtx.c (avoid_constant_pool_reference): Likewise.
(simplify_binary_operation_1, simplify_subreg): Likewise.
* targhooks.c (default_function_arg_padding): Likewise.
(default_hard_regno_nregs, default_class_max_nregs): Likewise.
* tree-cfg.c (verify_gimple_assign_binary): Likewise.
(verify_gimple_assign_ternary): Likewise.
* tree-inline.c (estimate_move_cost): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-ssa-loop-ivopts.c (add_autoinc_candidates): Likewise.
(get_address_cost_ainc): Likewise.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
(vectorizable_reduction): Likewise.
* tree-vect-stmts.c (vectorizable_assignment, vectorizable_shift)
(vectorizable_operation, vectorizable_load): Likewise.
* tree.c (build_same_sized_truth_vector_type): Likewise.
* valtrack.c (cleanup_auto_inc_dec): Likewise.
* var-tracking.c (emit_note_insn_var_location): Likewise.
* config/arc/arc.h (ASM_OUTPUT_CASE_END): Use as_a <scalar_int_mode>.
(ADDR_VEC_ALIGN): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256201
This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64. The value is
encoded in the 10-bit precision field and was previously always stored
as a simple log2 value. The challenge was to use this 10 bits to
encode the number of elements in variable-length vectors, so that
we didn't need to increase the size of the tree.
In practice the number of vector elements should always have the form
N + N * X (where X is the runtime value), and as for constant-length
vectors, N must be a power of 2 (even though X itself might not be).
The patch therefore uses the low 8 bits to encode log2(N) and bit
8 to select between constant-length and variable-length vectors.
Targets without variable-length vectors continue to use the old scheme.
A new valid_vector_subparts_p function tests whether a given number
of elements can be encoded. This is false for the vector modes that
represent an LD3 or ST3 vector triple (which we want to treat as arrays
of vectors rather than single vectors).
Most of the patch is mechanical; previous patches handled the changes
that weren't entirely straightforward.
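A sketch of the encoding just described (illustrative, not the exact
tree.h source):

  /* Number of units is N + N * X; N must be a power of 2.
     Bits 0-7 of the precision field hold log2(N); bit 8 is set
     for variable-length vectors.  */
  unsigned int prec = exact_log2 (n);   /* the constant coefficient N */
  if (variable_length_p)
    prec |= 0x100;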
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree.h (TYPE_VECTOR_SUBPARTS): Turn into a function and handle
polynomial numbers of units.
(SET_TYPE_VECTOR_SUBPARTS): Likewise.
(valid_vector_subparts_p): New function.
(build_vector_type): Remove temporary shim and take the number
of units as a poly_uint64 rather than an int.
(build_opaque_vector_type): Take the number of units as a
poly_uint64 rather than an int.
* tree.c (build_vector_from_ctor): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(type_hash_canon_hash, type_cache_hasher::equal): Likewise.
(uniform_vector_p, vector_type_mode, build_vector): Likewise.
(build_vector_from_val): If the number of units is variable,
use build_vec_duplicate_cst for constant operands and
VEC_DUPLICATE_EXPR otherwise.
(make_vector_type): Remove temporary is_constant ().
(build_vector_type, build_opaque_vector_type): Take the number of
units as a poly_uint64 rather than an int.
(check_vector_cst): Handle polynomial TYPE_VECTOR_SUBPARTS and
VECTOR_CST_NELTS.
* cfgexpand.c (expand_debug_expr): Likewise.
* expr.c (count_type_elements, categorize_ctor_elements_1): Likewise.
(store_constructor, expand_expr_real_1): Likewise.
(const_scalar_mask_from_tree): Likewise.
* fold-const-call.c (fold_const_reduction): Likewise.
* fold-const.c (const_binop, const_unop, fold_convert_const): Likewise.
(operand_equal_p, fold_vec_perm, fold_ternary_loc): Likewise.
(native_encode_vector, vec_cst_ctor_to_array): Likewise.
(fold_relational_const): Likewise.
(native_interpret_vector): Likewise. Change the size from an
int to an unsigned int.
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(gimple_fold_indirect_ref, gimple_build_vector): Likewise.
(gimple_build_vector_from_val): Use VEC_DUPLICATE_EXPR when
duplicating a non-constant operand into a variable-length vector.
* hsa-brig.c (hsa_op_immed::emit_to_buffer): Handle polynomial
TYPE_VECTOR_SUBPARTS and VECTOR_CST_NELTS.
* ipa-icf.c (sem_variable::equals): Likewise.
* match.pd: Likewise.
* omp-simd-clone.c (simd_clone_subparts): Likewise.
* print-tree.c (print_node): Likewise.
* stor-layout.c (layout_type): Likewise.
* targhooks.c (default_builtin_vectorization_cost): Likewise.
* tree-cfg.c (verify_gimple_comparison): Likewise.
(verify_gimple_assign_binary): Likewise.
(verify_gimple_assign_ternary): Likewise.
(verify_gimple_assign_single): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
(simplify_bitfield_ref, is_combined_permutation_identity): Likewise.
* tree-vect-data-refs.c (vect_permute_store_chain): Likewise.
(vect_grouped_load_supported, vect_permute_load_chain): Likewise.
(vect_shift_permute_load_chain): Likewise.
* tree-vect-generic.c (nunits_for_known_piecewise_op): Likewise.
(expand_vector_condition, optimize_vector_constructor): Likewise.
(lower_vec_perm, get_compute_type): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
(get_initial_defs_for_reduction, vect_transform_loop): Likewise.
* tree-vect-patterns.c (vect_recog_bool_pattern): Likewise.
(vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_supported_load_permutation_p): Likewise.
(vect_get_constant_vectors, vect_transform_slp_perm_load): Likewise.
* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
(get_group_load_store_type, vectorizable_mask_load_store): Likewise.
(vectorizable_bswap, simd_clone_subparts, vectorizable_assignment)
(vectorizable_shift, vectorizable_operation, vectorizable_store)
(vectorizable_load, vect_is_simple_cond, vectorizable_comparison)
(supportable_widening_operation): Likewise.
(supportable_narrowing_operation): Likewise.
* tree-vector-builder.c (tree_vector_builder::binary_encoded_nelts):
Likewise.
* varasm.c (output_constant): Likewise.
gcc/ada/
* gcc-interface/utils.c (gnat_types_compatible_p): Handle
polynomial TYPE_VECTOR_SUBPARTS.
gcc/brig/
* brigfrontend/brig-to-generic.cc (get_unsigned_int_type): Handle
polynomial TYPE_VECTOR_SUBPARTS.
* brigfrontend/brig-util.h (gccbrig_type_vector_subparts): Likewise.
gcc/c-family/
* c-common.c (vector_types_convertible_p, c_build_vec_perm_expr)
(convert_vector_to_array_for_subscript): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(c_common_type_for_mode): Check valid_vector_subparts_p.
* c-pretty-print.c (pp_c_initializer_list): Handle polynomial
VECTOR_CST_NELTS.
gcc/c/
* c-typeck.c (comptypes_internal, build_binary_op): Handle polynomial
TYPE_VECTOR_SUBPARTS.
gcc/cp/
* constexpr.c (cxx_eval_array_reference): Handle polynomial
VECTOR_CST_NELTS.
(cxx_fold_indirect_ref): Handle polynomial TYPE_VECTOR_SUBPARTS.
* call.c (build_conditional_expr_1): Likewise.
* decl.c (cp_finish_decomp): Likewise.
* mangle.c (write_type): Likewise.
* typeck.c (structural_comptypes): Likewise.
(cp_build_binary_op): Likewise.
* typeck2.c (process_init_constructor_array): Likewise.
gcc/fortran/
* trans-types.c (gfc_type_for_mode): Check valid_vector_subparts_p.
gcc/lto/
* lto-lang.c (lto_type_for_mode): Check valid_vector_subparts_p.
* lto.c (hash_canonical_type): Handle polynomial TYPE_VECTOR_SUBPARTS.
gcc/go/
* go-lang.c (go_langhook_type_for_mode): Check valid_vector_subparts_p.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256197
This patch changes GET_MODE_NUNITS from unsigned char
to poly_uint16, although it remains a macro when compiling
target code with NUM_POLY_INT_COEFFS == 1.
We can handle permuted loads and stores for variable nunits if
the number of statements is a power of 2, but not otherwise.
The to_constant call in make_vector_type goes away in a later patch.
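A sketch of the power-of-2 restriction (assumed shape, not the exact
source):

  /* A permute over COUNT statements works for variable-length vectors
     only when COUNT is a power of 2, since the permute mask then has
     a pattern that repeats regardless of the runtime length.  */
  if (!GET_MODE_NUNITS (mode).is_constant () && !pow2p_hwi (count))
    return false;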
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (mode_nunits): Change from unsigned char to
poly_uint16_pod.
(ONLY_FIXED_SIZE_MODES): New macro.
(pod_mode::measurement_type, scalar_int_mode::measurement_type)
(scalar_float_mode::measurement_type, scalar_mode::measurement_type)
(complex_mode::measurement_type, fixed_size_mode::measurement_type):
New typedefs.
(mode_to_nunits): Return a poly_uint16 rather than an unsigned short.
(GET_MODE_NUNITS): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
* genmodes.c (ZERO_COEFFS): New macro.
(emit_mode_nunits_inline): Make mode_nunits_inline return a
poly_uint16.
(emit_mode_nunits): Change the type of mode_nunits to poly_uint16_pod.
Use ZERO_COEFFS when emitting initializers.
* data-streamer.h (bp_pack_poly_value): New function.
(bp_unpack_poly_value): Likewise.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_NUNITS.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_NUNITS.
* tree.c (make_vector_type): Remove temporary shim and make
the real function take the number of units as a poly_uint64
rather than an int.
(build_vector_type_for_mode): Handle polynomial nunits.
* dwarf2out.c (loc_descriptor, add_const_value_attribute): Likewise.
* emit-rtl.c (const_vec_series_p_1): Likewise.
(gen_rtx_CONST_VECTOR): Likewise.
* fold-const.c (test_vec_duplicate_folding): Likewise.
* genrecog.c (validate_pattern): Likewise.
* optabs-query.c (can_vec_perm_var_p, can_mult_highpart_p): Likewise.
* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
* optabs.c (expand_vector_broadcast, expand_binop_directly): Likewise.
(shift_amt_for_vec_perm_mask, expand_vec_perm_var): Likewise.
(expand_vec_cond_expr, expand_mult_highpart): Likewise.
* rtlanal.c (subreg_get_info): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
(vect_grouped_load_supported): Likewise.
* tree-vect-generic.c (type_for_widest_vector_mode): Likewise.
* tree-vect-loop.c (have_whole_vector_shift): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
(simplify_const_unary_operation, simplify_binary_operation_1)
(simplify_const_binary_operation, simplify_ternary_operation)
(test_vector_ops_duplicate, test_vector_ops): Likewise.
(simplify_immed_subreg): Use GET_MODE_NUNITS on a fixed_size_mode
instead of CONST_VECTOR_NUNITS.
* varasm.c (output_constant_pool_2): Likewise.
* rtx-vector-builder.c (rtx_vector_builder::build): Only include the
explicitly-encoded elements in the XVEC for variable-length vectors.
gcc/ada/
* gcc-interface/misc.c (enumerate_modes): Handle polynomial
GET_MODE_NUNITS.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256195
|
|
From-SVN: r256169
|
|
This patch makes vectorizable_live_operation cope with variable-length
vectors. For now we just handle cases in which we can tell at compile
time which vector contains the final result.
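A minimal sketch of the punt (variable names assumed):
  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
  unsigned HOST_WIDE_INT const_nunits;
  if (!nunits.is_constant (&const_nunits))
    /* We cannot tell at compile time which vector holds the
       final result, so don't vectorize the live operation.  */
    return false;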
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-loop.c (vectorizable_live_operation): Treat the number
of units as polynomial. Punt if we can't tell at compile time
which vector contains the final result.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256135
|
|
This patch makes vectorizable_induction cope with variable-length
vectors. For now we punt on SLP inductions, but patches after
the main SVE submission add support for those too.
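A sketch of the variable-length integer case (variable names assumed),
which avoids listing the elements of the starting vector:
  gimple_seq stmts = NULL;
  /* { base, base + step, base + 2*step, ... }  */
  tree vec_init = gimple_build (&stmts, VEC_SERIES_EXPR, vectype,
                                init_expr, step_expr);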
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-loop.c (vectorizable_induction): Treat the number
of units as polynomial. Punt on SLP inductions. Use an integer
VEC_SERIES_EXPR for variable-length integer inductions. Use a
cast of such a series for variable-length floating-point
inductions.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256134
|
|
This patch makes vectorizable_reduction cope with variable-length vectors.
We can handle the simple case of an inner loop reduction for which
the target has native support for the epilogue operation. For now we
punt on other cases, but patches after the main SVE submission allow
SLP and double reductions too.
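For example, the { 1, 2, 3, ... } vector used by the epilogue code no
longer needs each element spelled out; a sketch, with the type name assumed:
  /* Element I has value 1 + I * 1, i.e. { 1, 2, 3, ... }.  */
  tree series = build_index_vector (cr_index_vector_type, 1, 1);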
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree.h (build_index_vector): Declare.
* tree.c (build_index_vector): New function.
* tree-vect-loop.c (get_initial_defs_for_reduction): Treat the number
of units as polynomial, forcibly converting it to a constant if
vectorizable_reduction has already enforced the condition.
(vect_create_epilog_for_reduction): Likewise. Use build_index_vector
to create a {1,2,3,...} vector.
(vectorizable_reduction): Treat the number of units as polynomial.
Choose vectype_in based on the largest scalar element size rather
than the smallest number of units. Enforce the restrictions
relied on above.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256133
|
|
This patch changes the type of current_vector_size to poly_uint64.
It also changes TARGET_AUTOVECTORIZE_VECTOR_SIZES so that it fills
in a vector of possible sizes (as poly_uint64s) instead of returning
a bitmask. The documentation claimed that the hook didn't need to
include the default vector size (returned by preferred_simd_mode),
but that wasn't consistent with the omp-low.c usage.
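A sketch of the reworked hook for one target (the exact aarch64 body
is assumed here):
  static void
  aarch64_autovectorize_vector_sizes (vector_sizes *sizes)
  {
    /* The default size must now be included as well.  */
    sizes->safe_push (16);
    sizes->safe_push (8);
  }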
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* target.h (vector_sizes, auto_vector_sizes): New typedefs.
* target.def (autovectorize_vector_sizes): Return the vector sizes
by pointer, using vector_sizes rather than a bitmask.
* targhooks.h (default_autovectorize_vector_sizes): Update accordingly.
* targhooks.c (default_autovectorize_vector_sizes): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
Likewise.
* config/arc/arc.c (arc_autovectorize_vector_sizes): Likewise.
* config/arm/arm.c (arm_autovectorize_vector_sizes): Likewise.
* config/i386/i386.c (ix86_autovectorize_vector_sizes): Likewise.
* config/mips/mips.c (mips_autovectorize_vector_sizes): Likewise.
* omp-general.c (omp_max_vf): Likewise.
* omp-low.c (omp_clause_aligned_alignment): Likewise.
* optabs-query.c (can_vec_mask_load_store_p): Likewise.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
* tree-vect-slp.c (vect_slp_bb): Likewise.
* doc/tm.texi: Regenerate.
* tree-vectorizer.h (current_vector_size): Change from an unsigned int
to a poly_uint64.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Take
the vector size as a poly_uint64 rather than an unsigned int.
(current_vector_size): Change from an unsigned int to a poly_uint64.
(get_vectype_for_scalar_type): Update accordingly.
* tree.h (build_truth_vector_type): Take the size and number of
units as a poly_uint64 rather than an unsigned int.
(build_vector_type): Add a temporary overload that takes
the number of units as a poly_uint64 rather than an unsigned int.
* tree.c (make_vector_type): Likewise.
(build_truth_vector_type): Take the number of units as a poly_uint64
rather than an unsigned int.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256131
|
|
This patch adds a function for getting the number of elements in
a vector for cost purposes, which is always constant. It makes
it possible for a later patch to change GET_MODE_NUNITS and
TYPE_VECTOR_SUBPARTS to a poly_int.
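A plausible shape for the new helper (the body is assumed, not quoted
from the patch):
  static inline unsigned int
  vect_nunits_for_cost (tree vec_type)
  {
    /* Costing always wants a constant, even once the subparts
       count becomes polynomial.  */
    return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vec_type));
  }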
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vect_nunits_for_cost): New function.
* tree-vect-loop.c (vect_model_reduction_cost): Use it.
* tree-vect-slp.c (vect_analyze_slp_cost_1): Likewise.
(vect_analyze_slp_cost): Likewise.
* tree-vect-stmts.c (vect_model_store_cost): Likewise.
(vect_model_load_cost): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256128
|
|
This patch changes the type of the vectorisation factor and SLP
unrolling factor to poly_uint64. This in turn required some knock-on
changes in signedness elsewhere.
Cost decisions are generally based on estimated_poly_value,
which for VF is wrapped up as vect_vf_for_cost.
The patch doesn't on its own enable variable-length vectorisation.
It just makes the minimum changes necessary for the code to build
with the new VF and UF types. Later patches also make the
vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable
GET_MODE_NUNITS, at which point the code really does handle
variable-length vectors.
The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX,
to avoid hard-coding a particular architectural limit.
The patch includes a new test because a development version of the patch
accidentally used file print routines instead of dump_*, which would
fail with -fopt-info.
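A sketch of the cost wrapper mentioned above (the body is assumed):
  static inline unsigned int
  vect_vf_for_cost (loop_vec_info loop_vinfo)
  {
    /* Estimate a likely runtime value of the polynomial VF.  */
    return estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo));
  }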
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (_slp_instance::unrolling_factor): Change
from an unsigned int to a poly_uint64.
(_loop_vec_info::slp_unrolling_factor): Likewise.
(_loop_vec_info::vectorization_factor): Change from an int
to a poly_uint64.
(MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX.
(vect_get_num_vectors): New function.
(vect_update_max_nunits, vect_vf_for_cost): Likewise.
(vect_get_num_copies): Use vect_get_num_vectors.
(vect_analyze_data_ref_dependences): Change max_vf from an int *
to an unsigned int *.
(vect_analyze_data_refs): Change min_vf from an int * to a
poly_uint64 *.
(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
than an unsigned HOST_WIDE_INT.
* tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr)
(vect_analyze_data_ref_dependence): Change max_vf from an int *
to an unsigned int *.
(vect_analyze_data_ref_dependences): Likewise.
(vect_compute_data_ref_alignment): Handle polynomial vf.
(vect_enhance_data_refs_alignment): Likewise.
(vect_prune_runtime_alias_test_list): Likewise.
(vect_shift_permute_load_chain): Likewise.
(vect_supportable_dr_alignment): Likewise.
(dependence_distance_ge_vf): Take the vectorization factor as a
poly_uint64 rather than an unsigned HOST_WIDE_INT.
(vect_analyze_data_refs): Change min_vf from an int * to a
poly_uint64 *.
* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take
vfm1 as a poly_uint64 rather than an int. Make the same change
for the returned bound_scalar.
(vect_gen_vector_loop_niters): Handle polynomial vf.
(vect_do_peeling): Likewise. Update call to
vect_gen_scalar_loop_niters and handle polynomial bound_scalars.
(vect_gen_vector_loop_niters_mult_vf): Assert that the vf must
be constant.
* tree-vect-loop.c (vect_determine_vectorization_factor)
(vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf.
(vect_get_known_peeling_cost): Likewise.
(vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise.
(vect_worthwhile_without_simd_p, vectorizable_induction): Likewise.
(vect_transform_loop): Likewise. Use the lowest possible VF when
updating the upper bounds of the loop.
(vect_min_worthwhile_factor): Make static. Return an unsigned int
rather than an int.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with
polynomial unroll factors.
(vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise.
(vect_make_slp_decision): Likewise.
(vect_supported_load_permutation_p): Likewise, and polynomial
vf too.
(vect_analyze_slp_cost): Handle polynomial vf.
(vect_slp_analyze_node_operations): Likewise.
(vect_slp_analyze_bb_1): Likewise.
(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
than an unsigned HOST_WIDE_INT.
* tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store)
(vectorizable_load): Handle polynomial vf.
* tree-vectorizer.c (simduid_to_vf::vf): Change from an int to
a poly_uint64.
(adjust_simduid_builtins, shrink_simd_arrays): Update accordingly.
gcc/testsuite/
* gcc.dg/vect-opt-info-1.c: New test.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256126
|
|
Normally we adjust the vector loop so that it iterates:
(original number of scalar iterations - number of peels) / VF
times, enforcing this using an IV that starts at zero and increments
by one each iteration. However, dividing by VF would be expensive
for variable VF, so this patch adds an alternative in which the IV
increments by VF each iteration instead. We then need to take care
to handle possible overflow in the IV.
The new mechanism isn't used yet; a later patch replaces the
"if (1)" with a check for variable VF.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-loop-manip.c: Include gimple-fold.h.
(slpeel_make_loop_iterate_ntimes): Add step, final_iv and
niters_maybe_zero parameters. Handle other cases besides a step of 1.
(vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
Add a path that uses a step of VF instead of 1, but disable it
for now.
(vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
and niters_no_overflow parameters. Update calls to
slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
Create a new SSA name if the latter chooses to use a step other
than one, and return it via niters_vector_mult_vf_var.
* tree-vect-loop.c (vect_transform_loop): Update calls to
vect_do_peeling, vect_gen_vector_loop_niters and
slpeel_make_loop_iterate_ntimes.
* tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, vect_do_peeling)
(vect_gen_vector_loop_niters): Update declarations after above changes.
From-SVN: r256124
|
|
This patch makes users of vec_perm_builders use the compressed encoding
where possible. This means that they work with variable-length vectors.
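For example (a sketch with assumed names), a selector that broadcasts
element 0 needs only one explicitly-encoded element however long the
vector is:
  vec_perm_builder sel (nelts, 1, 1);   /* 1 pattern, 1 element each.  */
  sel.quick_push (0);                   /* Every output takes lane 0.  */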
2018-01-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* optabs.c (expand_vec_perm_var): Use an explicit encoding for
the broadcast of the low byte.
(expand_mult_highpart): Use an explicit encoding for the permutes.
* optabs-query.c (can_mult_highpart_p): Likewise.
* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Likewise.
* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
(vectorizable_bswap): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Use an
explicit encoding for the power-of-2 permutes.
(vect_permute_store_chain): Likewise.
(vect_grouped_load_supported): Likewise.
(vect_permute_load_chain): Likewise.
From-SVN: r256097
|
|
This patch changes vec_perm_indices from a plain vec<> to a class
that stores a canonicalized permutation, using the same encoding
as for VECTOR_CSTs. This means that vec_perm_indices now carries
information about the number of vectors being permuted (currently
always 1 or 2) and the number of elements in each input vector.
A new vec_perm_builder class is used to actually build up the vector,
like tree_vector_builder does for trees. vec_perm_indices is the
completed representation, a bit like VECTOR_CST is for trees.
The patch just does a mechanical conversion of the code to
vec_perm_builder: a later patch uses explicit encodings where possible.
The point of all this is that it makes the representation suitable
for variable-length vectors. It's no longer necessary for the
underlying vec<>s to store every element explicitly.
In int-vector-builder.h, "using the same encoding as tree and rtx constants"
describes the endpoint -- adding the rtx encoding comes later.
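A sketch of how the two classes fit together, for a series selector
(names assumed):
  vec_perm_builder builder (nelts, 1, 3);  /* 1 stepped pattern of 3.  */
  for (unsigned int i = 0; i < 3; ++i)
    builder.quick_push (nelts - 1 - i);    /* { n-1, n-2, n-3, ... }  */
  /* One input vector of NELTS elements.  */
  vec_perm_indices indices (builder, 1, nelts);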
2018-01-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* int-vector-builder.h: New file.
* vec-perm-indices.h: Include int-vector-builder.h.
(vec_perm_indices): Redefine as an int_vector_builder.
(auto_vec_perm_indices): Delete.
(vec_perm_builder): Redefine as a stand-alone class.
(vec_perm_indices::vec_perm_indices): New function.
(vec_perm_indices::clamp): Likewise.
* vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h.
(vec_perm_indices::new_vector): New function.
(vec_perm_indices::new_expanded_vector): Update for new
vec_perm_indices class.
(vec_perm_indices::rotate_inputs): New function.
(vec_perm_indices::all_in_range_p): Operate directly on the
encoded form, without computing elided elements.
(tree_to_vec_perm_builder): Operate directly on the VECTOR_CST
encoding. Update for new vec_perm_indices class.
* optabs.c (expand_vec_perm_const): Create a vec_perm_indices for
the given vec_perm_builder.
(expand_vec_perm_var): Update vec_perm_builder constructor.
(expand_mult_highpart): Use vec_perm_builder instead of
auto_vec_perm_indices.
* optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices. Use a single
or double series encoding as appropriate.
* fold-const.c (fold_ternary_loc): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
(vect_permute_store_chain): Likewise.
(vect_grouped_load_supported): Likewise.
(vect_permute_load_chain): Likewise.
(vect_shift_permute_load_chain): Likewise.
* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Likewise.
* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
(vectorizable_mask_load_store): Likewise.
(vectorizable_bswap): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
* tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices. Use
tree_to_vec_perm_builder to read the vector from a tree.
* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a
vec_perm_builder instead of a vec_perm_indices.
(have_whole_vector_shift): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices. Leave the
truncation to calc_vec_perm_mask_for_shift.
(vect_create_epilog_for_reduction): Likewise.
* config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change
from auto_vec_perm_indices to vec_perm_indices.
(aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm
instead of changing individual elements.
(aarch64_vectorize_vec_perm_const): Use new_vector to install
the vector in d.perm.
* config/arm/arm.c (expand_vec_perm_d::perm): Change
from auto_vec_perm_indices to vec_perm_indices.
(arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm
instead of changing individual elements.
(arm_vectorize_vec_perm_const): Use new_vector to install
the vector in d.perm.
* config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even):
Update vec_perm_builder constructor.
(rs6000_expand_interleave): Likewise.
* config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise.
(rs6000_expand_interleave): Likewise.
From-SVN: r256095
|
|
One of the changes needed for variable-length VEC_PERM_EXPRs -- and for
long fixed-length VEC_PERM_EXPRs -- is the ability to use constant
selectors that wouldn't fit in the vectors being permuted. E.g. a
permute on two V256QIs can't be done using a V256QI selector.
At the moment constant permutes use two interfaces:
targetm.vectorizer.vec_perm_const_ok for testing whether a permute is
valid and the vec_perm_const optab for actually emitting the permute.
The former gets passed a vec<> selector and the latter an rtx selector.
Most ports share a lot of code between the hook and the optab, with a
wrapper function for each interface.
We could try to keep that interface and require ports to define wider
vector modes that could be attached to the CONST_VECTOR (e.g. V256HI or
V256SI in the example above). But building a CONST_VECTOR rtx seems a bit
pointless here, since the expand code only creates the CONST_VECTOR in
order to call the optab, and the first thing the target does is take
the CONST_VECTOR apart again.
The easiest approach therefore seemed to be to remove the optab and
reuse the target hook to emit the code. One potential drawback is that
it's no longer possible to use match_operand predicates to force
operands into the required form, but in practice all targets want
register operands anyway.
The patch also changes vec_perm_indices into a class that provides
some simple routines for handling permutations. A later patch will
flesh this out and get rid of auto_vec_perm_indices, but I didn't
want to do all that in this patch and make it more complicated than
it already is.
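The fused hook then has roughly this shape (parameter names assumed);
a null TARGET means "just test whether the permutation is supported",
which subsumes the old _ok hook:
  static bool
  aarch64_vectorize_vec_perm_const (machine_mode vmode, rtx target,
                                    rtx op0, rtx op1,
                                    const vec_perm_indices &sel);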
2018-01-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* Makefile.in (OBJS): Add vec-perm-indices.o.
* vec-perm-indices.h: New file.
* vec-perm-indices.c: Likewise.
* target.h (vec_perm_indices): Replace with a forward class
declaration.
(auto_vec_perm_indices): Move to vec-perm-indices.h.
* optabs.h: Include vec-perm-indices.h.
(expand_vec_perm): Delete.
(selector_fits_mode_p, expand_vec_perm_var): Declare.
(expand_vec_perm_const): Declare.
* target.def (vec_perm_const_ok): Replace with...
(vec_perm_const): ...this new hook.
* doc/tm.texi.in (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Replace with...
(TARGET_VECTORIZE_VEC_PERM_CONST): ...this new hook.
* doc/tm.texi: Regenerate.
* optabs.def (vec_perm_const): Delete.
* doc/md.texi (vec_perm_const): Likewise.
(vec_perm): Refer to TARGET_VECTORIZE_VEC_PERM_CONST.
* expr.c (expand_expr_real_2): Use expand_vec_perm_const rather than
expand_vec_perm for constant permutation vectors. Assert that
the mode of variable permutation vectors is the integer equivalent
of the mode that is being permuted.
* optabs-query.h (selector_fits_mode_p): Declare.
* optabs-query.c: Include vec-perm-indices.h.
(selector_fits_mode_p): New function.
(can_vec_perm_const_p): Check whether targetm.vectorize.vec_perm_const
is defined, instead of checking whether the vec_perm_const_optab
exists. Use targetm.vectorize.vec_perm_const instead of
targetm.vectorize.vec_perm_const_ok. Check whether the indices
fit in the vector mode before using a variable permute.
* optabs.c (shift_amt_for_vec_perm_mask): Take a mode and a
vec_perm_indices instead of an rtx.
(expand_vec_perm): Replace with...
(expand_vec_perm_const): ...this new function. Take the selector
as a vec_perm_indices rather than an rtx. Also take the mode of
the selector. Update call to shift_amt_for_vec_perm_mask.
Use targetm.vectorize.vec_perm_const instead of vec_perm_const_optab.
Use vec_perm_indices::new_expanded_vector to expand the original
selector into bytes. Check whether the indices fit in the vector
mode before using a variable permute.
(expand_vec_perm_var): Make global.
(expand_mult_highpart): Use expand_vec_perm_const.
* fold-const.c: Include vec-perm-indices.h.
* tree-ssa-forwprop.c: Likewise.
* tree-vect-data-refs.c: Likewise.
* tree-vect-generic.c: Likewise.
* tree-vect-loop.c: Likewise.
* tree-vect-slp.c: Likewise.
* tree-vect-stmts.c: Likewise.
* config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm_const):
Delete.
* config/aarch64/aarch64-simd.md (vec_perm_const<mode>): Delete.
* config/aarch64/aarch64.c (aarch64_expand_vec_perm_const)
(aarch64_vectorize_vec_perm_const_ok): Fuse into...
(aarch64_vectorize_vec_perm_const): ...this new function.
(TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
* config/arm/arm-protos.h (arm_expand_vec_perm_const): Delete.
* config/arm/vec-common.md (vec_perm_const<mode>): Delete.
* config/arm/arm.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
(arm_expand_vec_perm_const, arm_vectorize_vec_perm_const_ok): Merge
into...
(arm_vectorize_vec_perm_const): ...this new function. Explicitly
check for NEON modes.
* config/i386/i386-protos.h (ix86_expand_vec_perm_const): Delete.
* config/i386/sse.md (VEC_PERM_CONST, vec_perm_const<mode>): Delete.
* config/i386/i386.c (ix86_expand_vec_perm_const_1): Update comment.
(ix86_expand_vec_perm_const, ix86_vectorize_vec_perm_const_ok): Merge
into...
(ix86_vectorize_vec_perm_const): ...this new function. Incorporate
the old VEC_PERM_CONST conditions.
* config/ia64/ia64-protos.h (ia64_expand_vec_perm_const): Delete.
* config/ia64/vect.md (vec_perm_const<mode>): Delete.
* config/ia64/ia64.c (ia64_expand_vec_perm_const)
(ia64_vectorize_vec_perm_const_ok): Merge into...
(ia64_vectorize_vec_perm_const): ...this new function.
* config/mips/loongson.md (vec_perm_const<mode>): Delete.
* config/mips/mips-msa.md (vec_perm_const<mode>): Delete.
* config/mips/mips-ps-3d.md (vec_perm_constv2sf): Delete.
* config/mips/mips-protos.h (mips_expand_vec_perm_const): Delete.
* config/mips/mips.c (mips_expand_vec_perm_const)
(mips_vectorize_vec_perm_const_ok): Merge into...
(mips_vectorize_vec_perm_const): ...this new function.
* config/powerpcspe/altivec.md (vec_perm_constv16qi): Delete.
* config/powerpcspe/paired.md (vec_perm_constv2sf): Delete.
* config/powerpcspe/spe.md (vec_perm_constv2si): Delete.
* config/powerpcspe/vsx.md (vec_perm_const<mode>): Delete.
* config/powerpcspe/powerpcspe-protos.h (altivec_expand_vec_perm_const)
(rs6000_expand_vec_perm_const): Delete.
* config/powerpcspe/powerpcspe.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK):
Delete.
(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
(altivec_expand_vec_perm_const_le): Take each operand individually.
Operate on constant selectors rather than rtxes.
(altivec_expand_vec_perm_const): Likewise. Update call to
altivec_expand_vec_perm_const_le.
(rs6000_expand_vec_perm_const): Delete.
(rs6000_vectorize_vec_perm_const_ok): Delete.
(rs6000_vectorize_vec_perm_const): New function.
(rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
an element count and rtx array.
(rs6000_expand_extract_even): Update call accordingly.
(rs6000_expand_interleave): Likewise.
* config/rs6000/altivec.md (vec_perm_constv16qi): Delete.
* config/rs6000/paired.md (vec_perm_constv2sf): Delete.
* config/rs6000/vsx.md (vec_perm_const<mode>): Delete.
* config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_const)
(rs6000_expand_vec_perm_const): Delete.
* config/rs6000/rs6000.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
(altivec_expand_vec_perm_const_le): Take each operand individually.
Operate on constant selectors rather than rtxes.
(altivec_expand_vec_perm_const): Likewise. Update call to
altivec_expand_vec_perm_const_le.
(rs6000_expand_vec_perm_const): Delete.
(rs6000_vectorize_vec_perm_const_ok): Delete.
(rs6000_vectorize_vec_perm_const): New function. Remove stray
reference to the SPE evmerge instructions.
(rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
an element count and rtx array.
(rs6000_expand_extract_even): Update call accordingly.
(rs6000_expand_interleave): Likewise.
* config/sparc/sparc.md (vec_perm_constv8qi): Delete in favor of...
* config/sparc/sparc.c (sparc_vectorize_vec_perm_const): ...this
new function.
(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
From-SVN: r256093
|
|
This patch splits can_vec_perm_p into two functions: can_vec_perm_var_p
for testing permute operations with variable selection vectors, and
can_vec_perm_const_p for testing permute operations with specific
constant selection vectors. This means that we can pass the constant
selection vector by reference.
Constant permutes can still use a variable permute as a fallback.
A later patch adds a check to make sure that we don't truncate the
vector indices when doing this.
However, have_whole_vector_shift checked:
if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing)
return false;
which had the effect of disallowing the fallback to variable permutes.
I'm not sure whether that was the intention or whether it was just
supposed to short-cut the loop on targets that don't support permutes.
(But then why bother? The first check in the loop would fail and
we'd bail out straightaway.)
The patch adds a parameter for disallowing the fallback. I think it
makes sense to do this for the following code in the VEC_PERM_EXPR
folder:
/* Some targets are deficient and fail to expand a single
argument permutation while still allowing an equivalent
2-argument version. */
if (need_mask_canon && arg2 == op2
&& !can_vec_perm_p (TYPE_MODE (type), false, &sel)
&& can_vec_perm_p (TYPE_MODE (type), false, &sel2))
since it's really testing whether the expand_vec_perm_const code expects
a particular form.
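After the split, callers look roughly like this (a sketch; the
third-argument name is assumed):
  if (can_vec_perm_const_p (mode, sel, /*allow_variable_p=*/false))
    /* The target expands this exact constant selector directly.  */;
  else if (can_vec_perm_var_p (mode))
    /* Otherwise fall back to a variable permute.  */;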
2018-01-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* optabs-query.h (can_vec_perm_p): Delete.
(can_vec_perm_var_p, can_vec_perm_const_p): Declare.
* optabs-query.c (can_vec_perm_p): Split into...
(can_vec_perm_var_p, can_vec_perm_const_p): ...these two functions.
(can_mult_highpart_p): Use can_vec_perm_const_p to test whether a
particular selector is valid.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
(vect_grouped_load_supported): Likewise.
(vect_shift_permute_load_chain): Likewise.
* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
(vect_transform_slp_perm_load): Likewise.
* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
(vectorizable_bswap): Likewise.
(vect_gen_perm_mask_checked): Likewise.
* fold-const.c (fold_ternary_loc): Likewise. Don't take
implementations of variable permutation vectors into account
when deciding which selector to use.
* tree-vect-loop.c (have_whole_vector_shift): Don't check whether
vec_perm_const_optab is supported; instead use can_vec_perm_const_p
with a false third argument.
* tree-vect-generic.c (lower_vec_perm): Use can_vec_perm_const_p
to test whether the constant selector is valid and can_vec_perm_var_p
to test whether a variable selector is valid.
From-SVN: r256091
|
|
This patch splits the loop versioning threshold out from the
cost model threshold so that the former can become a poly_uint64.
We still use a single test to enforce both limits where possible.
2017-12-21 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (_loop_vec_info): Add a versioning_threshold
field.
(LOOP_VINFO_VERSIONING_THRESHOLD): New macro.
(vect_loop_versioning): Take the loop versioning threshold as a
separate parameter.
* tree-vect-loop-manip.c (vect_loop_versioning): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
versioning_threshold.
(vect_analyze_loop_2): Compute the loop versioning threshold
whenever loop versioning is needed, and store it in the new
field rather than combining it with the cost model threshold.
(vect_transform_loop): Update call to vect_loop_versioning.
Try to combine the loop versioning and cost thresholds here.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255934
|
|
PR tree-optimization/80631
* tree-vect-loop.c (vect_create_epilog_for_reduction): Compare
induc_code against MAX_EXPR or MIN_EXPR instead of reduc_fn against
IFN_REDUC_MAX or IFN_REDUC_MIN.
From-SVN: r255804
|
|
PR tree-optimization/80631
* tree-vect-loop.c (get_initial_def_for_reduction): Fix comment typo.
(vect_create_epilog_for_reduction): Add INDUC_VAL and INDUC_CODE
arguments, for INTEGER_INDUC_COND_REDUCTION use INDUC_VAL instead of
hardcoding zero as the value if COND_EXPR is never true. For
INTEGER_INDUC_COND_REDUCTION don't emit the final COND_EXPR if
INDUC_VAL is equal to INITIAL_DEF, and use INDUC_CODE instead of
hardcoding MAX_EXPR as the reduction operation.
(is_nonwrapping_integer_induction): Allow negative step.
(vectorizable_reduction): Compute INDUC_VAL and INDUC_CODE for
vect_create_epilog_for_reduction, if no value is suitable, don't
use INTEGER_INDUC_COND_REDUCTION for now. Formatting fixes.
* gcc.dg/vect/pr80631-1.c: New test.
* gcc.dg/vect/pr80631-2.c: New test.
* gcc.dg/vect/pr65947-13.c: Expect integer induc cond reduction
vectorization.
From-SVN: r255574
|
|
This patch introduces a number of new macros and functions that will
be used to distinguish between different kinds of debug stmts, insns
and notes, namely, preexisting debug bind ones and to-be-introduced
nonbind markers.
In a seemingly mechanical way, it adjusts several uses of the macros
and functions, so that they refer to narrower categories when
appropriate.
These changes, by themselves, should not have any visible effect in
the compiler behavior, since the upcoming debug markers are never
created with this patch alone.
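A sketch of the distinction on the statement side (the insn-side
macros are analogous):
  if (is_gimple_debug (stmt))
    {
      if (gimple_debug_bind_p (stmt))
        /* Preexisting: binds a user variable to a value.  */;
      else if (gimple_debug_nonbind_marker_p (stmt))
        /* New: marks a source point, e.g. GIMPLE_DEBUG_BEGIN_STMT.  */;
    }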
for gcc/ChangeLog
* gimple.h (enum gimple_debug_subcode): Add
GIMPLE_DEBUG_BEGIN_STMT.
(gimple_debug_begin_stmt_p): New.
(gimple_debug_nonbind_marker_p): New.
* tree.h (MAY_HAVE_DEBUG_MARKER_STMTS): New.
(MAY_HAVE_DEBUG_BIND_STMTS): Renamed from....
(MAY_HAVE_DEBUG_STMTS): ... this. Check both.
* insn-notes.def (BEGIN_STMT): New.
* rtl.h (MAY_HAVE_DEBUG_MARKER_INSNS): New.
(MAY_HAVE_DEBUG_BIND_INSNS): Renamed from....
(MAY_HAVE_DEBUG_INSNS): ... this. Check both.
(NOTE_MARKER_LOCATION, NOTE_MARKER_P): New.
(DEBUG_BIND_INSN_P, DEBUG_MARKER_INSN_P): New.
(INSN_DEBUG_MARKER_KIND): New.
(GEN_RTX_DEBUG_MARKER_BEGIN_STMT_PAT): New.
(INSN_VAR_LOCATION): Check for VAR_LOCATION.
(INSN_VAR_LOCATION_PTR): New.
* cfgexpand.c (expand_debug_locations): Handle debug bind insns
only.
(expand_gimple_basic_block): Likewise. Emit debug temps for TER
deps only if debug bind insns are enabled.
(pass_expand::execute): Avoid deep TER and expand
debug locations for debug bind insns only.
* cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Narrow
debug stmts special handling down to debug bind stmts.
* combine.c (try_combine): Narrow debug insns special handling
down to debug bind insns.
* cse.c (delete_trivially_dead_insns): Handle debug bindings.
Narrow the preexisting special handling of debug insns down to
debug bind insns.
* dce.c (rest_of_handle_ud_dce): Narrow debug insns special
handling down to debug bind insns.
* function.c (instantiate_virtual_regs): Skip debug markers,
adjust handling of debug binds.
* gimple-ssa-backprop.c (backprop::prepare_change): Try debug
temp insertion iff MAY_HAVE_DEBUG_BIND_STMTS.
* haifa-sched.c (schedule_insn): Narrow special handling of debug
insns to debug bind insns.
* ipa-param-manipulation.c (ipa_modify_call_arguments): Narrow
special handling of debug stmts to debug bind stmts.
* ipa-split.c (split_function): Likewise.
* ira.c (combine_and_move_insns): Adjust debug bind insns only.
* loop-unroll.c (apply_opt_in_copies): Adjust tests on bind
debug insns.
* reg-stack.c (convert_regs_1): Use DEBUG_BIND_INSN_P.
* regrename.c (build_def_use): Likewise.
* regcprop.c (copyprop_hardreg_forward_1): Likewise.
(pass_cprop_hardreg): Narrow special casing of debug insns to
debug bind insns.
* regstat.c (regstat_init_n_sets_and_refs): Likewise.
* reload1.c (reload): Likewise.
* sese.c (sese_insert_phis_for_liveouts): Narrow special
casing of debug stmts to debug bind stmts.
* shrink-wrap.c (move_insn_for_shrink_wrap): Likewise.
* ssa-iterators.h (num_imm_uses): Likewise.
* tree-cfg.c (gimple_merge_blocks): Narrow special casing of
debug stmts to debug bind stmts.
* tree-inline.c (tree_function_versioning): Narrow special casing
of debug stmts to debug bind stmts.
* tree-loop-distribution.c (generate_loops_for_partition):
Narrow special casing of debug stmts to debug bind stmts.
* tree-sra.c (analyze_access_subtree): Narrow special casing
of debug stmts to debug bind stmts.
* tree-ssa-dce.c (remove_dead_stmt): Narrow special casing of debug
stmts to debug bind stmts.
* tree-ssa-loop-ivopt.c (remove_unused_ivs): Narrow special
casing of debug stmts to debug bind stmts.
* tree-ssa-reassoc.c (reassoc_remove_stmt): Likewise.
* tree-ssa-tail-merge.c (tail_merge_optimize): Narrow special
casing of debug stmts to debug bind stmts.
* tree-ssa-threadedge.c (propagate_threaded_block_debug_info):
Likewise.
* tree-ssa.c (flush_pending_stmts): Narrow special casing of
debug stmts to debug bind stmts.
(gimple_replace_ssa_lhs): Likewise.
(insert_debug_temp_for_var_def): Likewise.
(insert_debug_temps_for_defs): Likewise.
(reset_debug_uses): Likewise.
* tree-ssanames.c (release_ssa_name_fn): Likewise.
* tree-vect-loop-manip.c (adjust_debug_stmts_now): Likewise.
(adjust_debug_stmts): Likewise.
(adjust_phi_and_debug_stmts): Likewise.
(vect_do_peeling): Likewise.
* tree-vect-loop.c (vect_transform_loop): Likewise.
* valtrack.c (propagate_for_debug): Use DEBUG_BIND_INSN_P.
* var-tracking.c (adjust_mems): Narrow special casing of debug
insns to debug bind insns.
(dv_onepart_p, dataflow_set_clear_at_call, use_type): Likewise.
(compute_bb_dataflow, vt_find_locations): Likewise.
(vt_expand_loc, emit_notes_for_changes): Likewise.
(vt_init_cfa_base): Likewise.
(vt_emit_notes): Likewise.
(vt_initialize): Likewise.
(vt_finalize): Likewise.
From-SVN: r255565
|
|
This patch changes gimple_build_vector so that it takes a
tree_vector_builder instead of a size and a vector of trees.
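A sketch of the new calling convention (variable names assumed):
  tree_vector_builder elts (vectype, nunits, 1);  /* All-explicit.  */
  for (unsigned int i = 0; i < nunits; ++i)
    elts.quick_push (build_int_cst (scalar_type, i));
  tree vec = gimple_build_vector (&stmts, &elts);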
2017-12-07 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* vector-builder.h (vector_builder::derived): New const overload.
(vector_builder::elt): New function.
* tree-vector-builder.h (tree_vector_builder::type): New function.
(tree_vector_builder::apply_step): Declare.
* tree-vector-builder.c (tree_vector_builder::apply_step): New
function.
* gimple-fold.h (tree_vector_builder): Declare.
(gimple_build_vector): Take a tree_vector_builder instead of a
type and vector of elements.
* gimple-fold.c (gimple_build_vector): Likewise.
* tree-vect-loop.c (get_initial_def_for_reduction): Update call
accordingly.
(get_initial_defs_for_reduction): Likewise.
(vectorizable_induction): Likewise.
From-SVN: r255478
|
|
This patch switches most build_vector calls over to tree_vector_builder,
using explicit encodings where appropriate. Later patches handle
the remaining uses of build_vector.
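For example, the { 1, 2, 3, ... } series in
vect_create_epilog_for_reduction needs only three explicitly-encoded
elements; a sketch with assumed names:
  tree_vector_builder vb (cr_index_vector_type, 1, 3);
  for (unsigned int i = 1; i <= 3; ++i)
    vb.quick_push (build_int_cst (cr_index_scalar_type, i));
  tree series = vb.build ();  /* Later elements continue the step.  */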
2017-12-07 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* config/sparc/sparc.c: Include tree-vector-builder.h.
(sparc_fold_builtin): Use tree_vector_builder instead of build_vector.
* expmed.c: Include tree-vector-builder.h.
(make_tree): Use tree_vector_builder instead of build_vector.
* fold-const.c: Include tree-vector-builder.h.
(const_binop): Use tree_vector_builder instead of build_vector.
(const_unop): Likewise.
(native_interpret_vector): Likewise.
(fold_vec_perm): Likewise.
(fold_ternary_loc): Likewise.
* gimple-fold.c: Include tree-vector-builder.h.
(gimple_fold_stmt_to_constant_1): Use tree_vector_builder instead
of build_vector.
* tree-ssa-forwprop.c: Include tree-vector-builder.h.
(simplify_vector_constructor): Use tree_vector_builder instead
of build_vector.
* tree-vect-generic.c: Include tree-vector-builder.h.
(add_rshift): Use tree_vector_builder instead of build_vector.
(expand_vector_divmod): Likewise.
(optimize_vector_constructor): Likewise.
* tree-vect-loop.c: Include tree-vector-builder.h.
(vect_create_epilog_for_reduction): Use tree_vector_builder instead
of build_vector. Explicitly use a stepped encoding for
{ 1, 2, 3, ... }.
* tree-vect-slp.c: Include tree-vector-builder.h.
(vect_get_constant_vectors): Use tree_vector_builder instead
of build_vector.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Likewise.
* tree-vect-stmts.c: Include tree-vector-builder.h.
(vectorizable_bswap): Use tree_vector_builder instead of build_vector.
(vect_gen_perm_mask_any): Likewise.
(vectorizable_call): Likewise. Explicitly use a stepped encoding.
* tree.c (build_vector_from_ctor): Use tree_vector_builder instead
of build_vector.
(build_vector_from_val): Likewise. Explicitly use a duplicate
encoding.
From-SVN: r255475
|
|
PR tree-optimization/81303
* Makefile.in (gimple-loop-interchange.o): New object file.
* common.opt (floop-interchange): Reuse the option from graphite.
* doc/invoke.texi (-floop-interchange): Ditto. Document the new
option and mention it for -O3.
* opts.c (default_options_table): Enable -floop-interchange at -O3.
* gimple-loop-interchange.cc: New file.
* params.def (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS): New parameter.
(PARAM_LOOP_INTERCHANGE_STRIDE_RATIO): New parameter.
* passes.def (pass_linterchange): New pass.
* timevar.def (TV_LINTERCHANGE): New time var.
* tree-pass.h (make_pass_linterchange): New declaration.
* tree-ssa-loop-ivcanon.c (create_canonical_iv): Change to external
interface. Record IV before/after increment in new parameters.
* tree-ssa-loop-ivopts.h (create_canonical_iv): New declaration.
* tree-vect-loop.c (vect_is_simple_reduction): Factor out reduction
path check into...
(check_reduction_path): ...New function here.
* tree-vectorizer.h (check_reduction_path): New declaration.
gcc/testsuite
* gcc.dg/tree-ssa/loop-interchange-1.c: New test.
* gcc.dg/tree-ssa/loop-interchange-1b.c: New test.
* gcc.dg/tree-ssa/loop-interchange-2.c: New test.
* gcc.dg/tree-ssa/loop-interchange-3.c: New test.
* gcc.dg/tree-ssa/loop-interchange-4.c: New test.
* gcc.dg/tree-ssa/loop-interchange-5.c: New test.
* gcc.dg/tree-ssa/loop-interchange-6.c: New test.
* gcc.dg/tree-ssa/loop-interchange-7.c: New test.
* gcc.dg/tree-ssa/loop-interchange-8.c: New test.
* gcc.dg/tree-ssa/loop-interchange-9.c: New test.
* gcc.dg/tree-ssa/loop-interchange-10.c: New test.
* gcc.dg/tree-ssa/loop-interchange-11.c: New test.
* gcc.dg/tree-ssa/loop-interchange-12.c: New test.
* gcc.dg/tree-ssa/loop-interchange-13.c: New test.
Co-Authored-By: Richard Biener <rguenther@suse.de>
From-SVN: r255472
|
|
This patch replaces the REDUC_*_EXPR tree codes with internal functions.
This is needed so that the upcoming in-order reductions can also use
internal functions without too much complication.
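The epilogue then emits an internal-function call instead of an
assignment; a sketch with assumed names:
  gcall *call = gimple_build_call_internal (IFN_REDUC_PLUS, 1, vec_def);
  gimple_call_set_lhs (call, scalar_result);
  gsi_insert_before (&gsi, call, GSI_SAME_STMT);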
2017-11-22 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Delete.
* cfgexpand.c (expand_debug_expr): Remove handling for them.
* expr.c (expand_expr_real_2): Likewise.
* fold-const.c (const_unop): Likewise.
* optabs-tree.c (optab_for_tree_code): Likewise.
* tree-cfg.c (verify_gimple_assign_unary): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
(op_code_prio): Likewise.
(op_symbol_code): Likewise.
* internal-fn.def (DEF_INTERNAL_SIGNED_OPTAB_FN): Define.
(IFN_REDUC_PLUS, IFN_REDUC_MAX, IFN_REDUC_MIN): New internal functions.
* internal-fn.c (direct_internal_fn_optab): New function.
(direct_internal_fn_array, direct_internal_fn_supported_p)
(internal_fn_expanders): Handle DEF_INTERNAL_SIGNED_OPTAB_FN.
* fold-const-call.c (fold_const_reduction): New function.
(fold_const_call): Handle CFN_REDUC_PLUS, CFN_REDUC_MAX and
CFN_REDUC_MIN.
* tree-vect-loop.c: Include internal-fn.h.
(reduction_code_for_scalar_code): Rename to...
(reduction_fn_for_scalar_code): ...this and return an internal
function.
(vect_model_reduction_cost): Take an internal_fn rather than
a tree_code.
(vect_create_epilog_for_reduction): Likewise. Build calls rather
than assignments.
(vectorizable_reduction): Use internal functions rather than tree
codes for the reduction operation. Update calls to the functions
above.
* config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin):
Use calls to internal functions rather than REDUC tree codes.
* config/aarch64/aarch64-simd.md: Update comment accordingly.
From-SVN: r255073
|
|
* asan.c (create_cond_insert_point): Maintain profile.
* ipa-utils.c (ipa_merge_profiles): Be sure only IPA profiles are
merged.
* basic-block.h (struct basic_block_def): Remove frequency.
(EDGE_FREQUENCY): Use to_frequency
* bb-reorder.c (push_to_next_round_p): Use only IPA counts for global
heuristics.
(find_traces): Update to use to_frequency.
(find_traces_1_round): Likewise; use only IPA counts.
(bb_to_key): Likewise.
(connect_traces): Use IPA counts only.
(copy_bb_p): Update to use to_frequency.
(fix_up_crossing_landing_pad): Likewise.
(sanitize_hot_paths): Likewise.
* bt-load.c (basic_block_freq): Likewise.
* cfg.c (init_flow): Set count_max to uninitialized.
(check_bb_profile): Remove frequencies; check counts.
(dump_bb_info): Do not dump frequencies.
(update_bb_profile_for_threading): Update counts only.
(scale_bbs_frequencies_int): Likewise.
(MAX_SAFE_MULTIPLIER): Remove.
(scale_bbs_frequencies_gcov_type): Update counts only.
(scale_bbs_frequencies_profile_count): Update counts only.
(scale_bbs_frequencies): Update counts only.
* cfg.h (struct control_flow_graph): Add count-max.
(update_bb_profile_for_threading): Update prototype.
* cfgbuild.c (find_bb_boundaries): Do not update frequencies.
(find_many_sub_basic_blocks): Likewise.
* cfgcleanup.c (try_forward_edges): Likewise.
(try_crossjump_to_edge): Likewise.
* cfgexpand.c (expand_gimple_cond): Likewise.
(expand_gimple_tailcall): Likewise.
(construct_init_block): Likewise.
(construct_exit_block): Likewise.
* cfghooks.c (verify_flow_info): Check consistency of counts.
(dump_bb_for_graph): Do not dump frequencies.
(split_block_1): Do not update frequencies.
(split_edge): Do not update frequencies.
(make_forwarder_block): Do not update frequencies.
(duplicate_block): Do not update frequencies.
(account_profile_record): Do not update frequencies.
* cfgloop.c (find_subloop_latch_edge_by_profile): Use IPA counts
for global heuristics.
* cfgloopanal.c (average_num_loop_insns): Update to use to_frequency.
(expected_loop_iterations_unbounded): Use counts only.
* cfgloopmanip.c (scale_loop_profile): Simplify.
(create_empty_loop_on_edge): Simplify.
(loopify): Simplify.
(duplicate_loop_to_header_edge): Simplify.
* cfgrtl.c (force_nonfallthru_and_redirect): Update profile.
(update_br_prob_note): Take care of removing note when profile
becomes undefined.
(relink_block_chain): Do not dump frequency.
(rtl_account_profile_record): Use to_frequency.
* cgraph.c (symbol_table::create_edge): Convert count to ipa count.
(cgraph_edge::redirect_call_stmt_to_callee): Convert count to ipa count.
(cgraph_update_edges_for_call_stmt_node): Likewise.
(cgraph_edge::verify_count_and_frequency): Update.
(cgraph_node::verify_node): Temporarily disable frequency verification.
* cgraphbuild.c (compute_call_stmt_bb_frequency): Use
to_cgraph_frequency.
(cgraph_edge::rebuild_edges): Convert to ipa counts.
* cgraphunit.c (init_lowered_empty_function): Do not initialize
frequencies.
(cgraph_node::expand_thunk): Update profile.
* except.c (dw2_build_landing_pads): Do not update frequency.
* final.c (compute_alignments): Use to_frequency.
(dump_basic_block_info): Do not dump frequency.
* gimple-pretty-print.c (dump_profile): Do not dump frequency.
(dump_gimple_bb_header): Do not dump frequency.
* gimple-ssa-isolate-paths.c (isolate_path): Do not update frequency;
do update count.
* gimple-streamer-in.c (input_bb): Do not stream frequency.
* gimple-streamer-out.c (output_bb): Do not stream frequency.
* haifa-sched.c (sched_pressure_start_bb): Use to_frequency.
(init_before_recovery): Do not update frequency.
(sched_create_recovery_edges): Do not update frequency.
* hsa-gen.c (convert_switch_statements): Do not update frequency.
* ipa-cp.c (ipcp_propagate_stage): Update search for max_count.
(ipa_cp_c_finalize): Set max_count to uninitialized.
* ipa-fnsummary.c (get_minimal_bb): Use counts.
(param_change_prob): Use counts.
* ipa-profile.c (ipa_profile_generate_summary): Do not summarize
local profiles.
* ipa-split.c (consider_split): Use to_frequency.
(split_function): Use to_frequency.
* ira-build.c (loop_compare_func): Likewise.
(mark_loops_for_removal): Likewise.
(mark_all_loops_for_removal): Likewise.
* loop-doloop.c (doloop_modify): Do not update frequency.
* loop-unroll.c (unroll_loop_runtime_iterations): Do not update
frequency.
* lto-streamer-in.c (input_function): Update count_max.
* omp-expand.c (expand_omp_taskreg): Update count_max.
* omp-simd-clone.c (simd_clone_adjust): Update profile.
* predict.c (maybe_hot_frequency_p): Use to_frequency.
(maybe_hot_count_p): Use ipa counts only.
(maybe_hot_bb_p): Simplify.
(maybe_hot_edge_p): Simplify.
(probably_never_executed): Do not take frequency argument.
(probably_never_executed_bb_p): Do not pass frequency.
(probably_never_executed_edge_p): Likewise.
(combine_predictions_for_bb): Check that profile is nonzero.
(propagate_freq): Do not set frequency.
(drop_profile): Simplify.
(counts_to_freqs): Simplify.
(expensive_function_p): Use to_frequency.
(propagate_unlikely_bbs_forward): Simplify.
(determine_unlikely_bbs): Simplify.
(estimate_bb_frequencies): Add hack to silence graphite issues.
(compute_function_frequency): Use ipa counts.
(pass_profile::execute): Update.
(rebuild_frequencies): Use counts only.
(force_edge_cold): Use counts only.
* profile-count.c (profile_count::dump): Dump new count types.
(profile_count::differs_from_p): Check compatibility.
(profile_count::to_frequency): New function.
(profile_count::to_cgraph_frequency): New function.
* profile-count.h (struct function): Declare.
(enum profile_quality): Add profile_guessed_local and
profile_guessed_global0.
(class profile_probability): Decrease number of bits to 29;
update from_reg_br_prob_note and to_reg_br_prob_note.
(class profile_count): Update comment; decrease number of bits
to 61. Check compatibility.
(profile_count::compatible_p): New private member function.
(profile_count::ipa_p): New member function.
(profile_count::operator<): Handle global zero correctly.
(profile_count::operator>): Handle global zero correctly.
(profile_count::operator<=): Handle global zero correctly.
(profile_count::operator>=): Handle global zero correctly.
(profile_count::nonzero_p): New member function.
(profile_count::force_nonzero): New member function.
(profile_count::max): New member function.
(profile_count::apply_scale): Handle IPA scaling.
(profile_count::guessed_local): New member function.
(profile_count::global0): New member function.
(profile_count::ipa): New member function.
(profile_count::to_frequency): Declare.
(profile_count::to_cgraph_frequency): Declare.
* profile.c (OVERLAP_BASE): Delete.
(compute_frequency_overlap): Delete.
(compute_branch_probabilities): Do not use compute_frequency_overlap.
* regs.h (REG_FREQ_FROM_BB): Use to_frequency.
* sched-ebb.c (rank): Use counts only.
* shrink-wrap.c (handle_simple_exit): Use counts only.
(try_shrink_wrapping): Use counts only.
(place_prologue_for_one_component): Use counts only.
* tracer.c (find_best_predecessor): Use to_frequency.
(find_trace): Use to_frequency.
(tail_duplicate): Use to_frequency.
* trans-mem.c (expand_transaction): Do not update frequency.
* tree-call-cdce.c: Do not update frequency.
* tree-cfg.c (gimple_find_sub_bbs): Likewise.
(gimple_merge_blocks): Likewise.
(gimple_split_edge): Likewise.
(gimple_duplicate_sese_region): Likewise.
(gimple_duplicate_sese_tail): Likewise.
(move_sese_region_to_fn): Likewise.
(gimple_account_profile_record): Likewise.
(insert_cond_bb): Likewise.
* tree-complex.c (expand_complex_div_wide): Likewise.
* tree-eh.c (lower_resx): Update profile.
* tree-inline.c (copy_bb): Simplify count scaling; do not scale
frequencies.
(initialize_cfun): Do not initialize frequencies.
(freqs_to_counts): Delete.
(copy_cfg_body): Ignore count parameter.
(copy_body): Update.
(expand_call_inline): Update count_max.
(optimize_inline_calls): Update count_max.
(tree_function_versioning): Update count_max.
* tree-ssa-coalesce.c (coalesce_cost_bb): Use to_frequency.
* tree-ssa-ifcombine.c (update_profile_after_ifcombine): Do not update
frequency.
* tree-ssa-loop-im.c (execute_sm_if_changed): Use counts only.
* tree-ssa-loop-ivcanon.c (unloop_loops): Do not update freuqency.
(try_peel_loop): Likewise.
* tree-ssa-loop-ivopts.c (get_scaled_computation_cost_at): Use
to_frequency.
* tree-ssa-loop-manip.c (niter_for_unrolled_loop): Pass -1.
(tree_transform_and_unroll_loop): Do not use frequencies.
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations):
Use reliable prediction only.
* tree-ssa-loop-unswitch.c (hoist_guard): Do not use frequencies.
* tree-ssa-sink.c (select_best_block): Use to_frequency.
* tree-ssa-tail-merge.c (replace_block_by): Temporarily disable
probability scaling.
* tree-ssa-threadupdate.c (create_block_for_threading): Do
not update frequency.
(any_remaining_duplicated_blocks): Likewise.
(update_profile): Likewise.
(estimated_freqs_path): Delete.
(freqs_to_counts_path): Delete.
(clear_counts_path): Delete.
(ssa_fix_duplicate_block_edges): Likewise.
(duplicate_thread_path): Likewise.
* tree-switch-conversion.c (gen_inbound_check): Use counts.
* tree-tailcall.c (decrease_profile): Do not update frequency.
(eliminate_tail_call): Likewise.
* tree-vect-loop-manip.c (vect_do_peeling): Likewise.
* tree-vect-loop.c (scale_profile_for_vect_loop): Likewise.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
* ubsan.c (ubsan_expand_null_ifn): Update profile.
(ubsan_expand_ptr_ifn): Update profile.
* value-prof.c (gimple_ic): Simplify.
* value-prof.h (gimple_ic): Update prototype.
* ipa-inline-transform.c (inline_transform): Fix scaling conditions.
* ipa-inline.c (compute_uninlined_call_time): Be sure that
counts are nonzero.
(want_inline_self_recursive_call_p): Likewise.
(resolve_noninline_speculation): Only accumulate defined counts.
(inline_small_functions): Use nonzero_p.
(ipa_inline): Do not access freed node.
Unknown ChangeLog:
2017-11-02 Jan Hubicka <hubicka@ucw.cz>
* testsuite/gcc.dg/no-strict-overflow-3.c (foo): Update magic
value to not clash with frequency.
* testsuite/gcc.dg/strict-overflow-3.c (foo): Likewise.
* testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c: Update template.
* testsuite/gcc.dg/tree-ssa/dump-2.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-10.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-11.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-12.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-20040816-1.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-20040816-2.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-5.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-8.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-9.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-cd.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-pr56541.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-pr68583.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-pr69489-1.c: Update template.
* testsuite/gcc.dg/tree-ssa/ifc-pr69489-2.c: Update template.
* testsuite/gcc.target/i386/pr61403.c: Update template.
From-SVN: r254379
|
|
This follows on from similar changes a couple of months ago and
is needed when general modes have variable size.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use
SCALAR_TYPE_MODE instead of TYPE_MODE.
From-SVN: r254002
|
|
tree-vect-stmts.c:1524)
2017-10-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/82473
* tree-vect-loop.c (vectorizable_reduction): Properly get at
the largest input type.
* gcc.dg/torture/pr82473.c: New testcase.
From-SVN: r253937
|
|
* asan.c (create_cond_insert_point): Do not update edge count.
* auto-profile.c (afdo_propagate_edge): Update for edge count removal.
(afdo_propagate_circuit): Likewise.
(afdo_calculate_branch_prob): Likewise.
(afdo_annotate_cfg): Likewise.
* basic-block.h (struct edge_def): Remove count.
(edge_def::count): New accessor.
* bb-reorder.c (rotate_loop): Update.
(find_traces_1_round): Update.
(connect_traces): Update.
(sanitize_hot_paths): Update.
* cfg.c (unchecked_make_edge): Update.
(make_single_succ_edge): Update.
(check_bb_profile): Update.
(dump_edge_info): Update.
(update_bb_profile_for_threading): Update.
(scale_bbs_frequencies_int): Update.
(scale_bbs_frequencies_gcov_type): Update.
(scale_bbs_frequencies_profile_count): Update.
(scale_bbs_frequencies): Update.
* cfganal.c (connect_infinite_loops_to_exit): Update.
* cfgbuild.c (compute_outgoing_frequencies): Update.
(find_many_sub_basic_blocks): Update.
* cfgcleanup.c (try_forward_edges): Update.
(try_crossjump_to_edge): Update
* cfgexpand.c (expand_gimple_cond): Update
(expand_gimple_tailcall): Update
(construct_exit_block): Update
* cfghooks.c (verify_flow_info): Update
(redirect_edge_succ_nodup): Update
(split_edge): Update
(make_forwarder_block): Update
(duplicate_block): Update
(account_profile_record): Update
* cfgloop.c (find_subloop_latch_edge_by_profile): Update.
* cfgloopanal.c (expected_loop_iterations_unbounded): Update.
* cfgloopmanip.c (scale_loop_profile): Update.
(loopify): Update.
(lv_adjust_loop_entry_edge): Update.
* cfgrtl.c (try_redirect_by_replacing_jump): Update.
(force_nonfallthru_and_redirect): Update.
(purge_dead_edges): Update.
(rtl_flow_call_edges_add): Update.
* cgraphunit.c (init_lowered_empty_function): Update.
(cgraph_node::expand_thunk): Update.
* gimple-pretty-print.c (dump_probability): Update.
(dump_edge_probability): Update.
* gimple-ssa-isolate-paths.c (isolate_path): Update.
* haifa-sched.c (sched_create_recovery_edges): Update.
* hsa-gen.c (convert_switch_statements): Update.
* ifcvt.c (dead_or_predicable): Update.
* ipa-inline-transform.c (inline_transform): Update.
* ipa-split.c (split_function): Update.
* ipa-utils.c (ipa_merge_profiles): Update.
* loop-doloop.c (add_test): Update.
* loop-unroll.c (unroll_loop_runtime_iterations): Update.
* lto-streamer-in.c (input_cfg): Update.
(input_function): Update.
* lto-streamer-out.c (output_cfg): Update.
* modulo-sched.c (sms_schedule): Update.
* postreload-gcse.c (eliminate_partially_redundant_load): Update.
* predict.c (maybe_hot_edge_p): Update.
(unlikely_executed_edge_p): Update.
(probably_never_executed_edge_p): Update.
(dump_prediction): Update.
(drop_profile): Update.
(propagate_unlikely_bbs_forward): Update.
(determine_unlikely_bbs): Update.
(force_edge_cold): Update.
* profile.c (compute_branch_probabilities): Update.
* reg-stack.c (better_edge): Update.
* shrink-wrap.c (handle_simple_exit): Update.
* tracer.c (better_p): Update.
* trans-mem.c (expand_transaction): Update.
(split_bb_make_tm_edge): Update.
* tree-call-cdce.c: Update.
* tree-cfg.c (gimple_find_sub_bbs): Update.
(gimple_split_edge): Update.
(gimple_duplicate_sese_region): Update.
(gimple_duplicate_sese_tail): Update.
(gimple_flow_call_edges_add): Update.
(insert_cond_bb): Update.
(execute_fixup_cfg): Update.
* tree-cfgcleanup.c (cleanup_control_expr_graph): Update.
* tree-complex.c (expand_complex_div_wide): Update.
* tree-eh.c (lower_resx): Update.
(unsplit_eh): Update.
(cleanup_empty_eh_move_lp): Update.
* tree-inline.c (copy_edges_for_bb): Update.
(freqs_to_counts): Update.
(copy_cfg_body): Update.
* tree-ssa-dce.c (remove_dead_stmt): Update.
* tree-ssa-ifcombine.c (update_profile_after_ifcombine): Update.
* tree-ssa-loop-im.c (execute_sm_if_changed): Update.
* tree-ssa-loop-ivcanon.c (remove_exits_and_undefined_stmts): Update.
(unloop_loops): Update.
* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Update.
* tree-ssa-loop-split.c (connect_loops): Update.
(split_loop): Update.
* tree-ssa-loop-unswitch.c (hoist_guard): Update.
* tree-ssa-phionlycprop.c (propagate_rhs_into_lhs): Update.
* tree-ssa-phiopt.c (replace_phi_edge_with_variable): Update.
* tree-ssa-reassoc.c (branch_fixup): Update.
* tree-ssa-tail-merge.c (replace_block_by): Update.
* tree-ssa-threadupdate.c (remove_ctrl_stmt_and_useless_edges): Update.
(compute_path_counts): Update.
(update_profile): Update.
(recompute_probabilities): Update.
(update_joiner_offpath_counts): Update.
(estimated_freqs_path): Update.
(freqs_to_counts_path): Update.
(clear_counts_path): Update.
(ssa_fix_duplicate_block_edges): Update.
(duplicate_thread_path): Update.
* tree-switch-conversion.c (hoist_edge_and_branch_if_true): Update.
(case_bit_test_cmp): Update.
(collect_switch_conv_info): Update.
(gen_inbound_check): Update.
(do_jump_if_equal): Update.
(emit_cmp_and_jump_insns): Update.
* tree-tailcall.c (decrease_profile): Update.
(eliminate_tail_call): Update.
* tree-vect-loop-manip.c (slpeel_add_loop_guard): Update.
(vect_do_peeling): Update.
* tree-vect-loop.c (scale_profile_for_vect_loop): Update.
* ubsan.c (ubsan_expand_null_ifn): Update.
(ubsan_expand_ptr_ifn): Update.
* value-prof.c (gimple_divmod_fixed_value): Update.
(gimple_mod_pow2): Update.
(gimple_mod_subtract): Update.
(gimple_ic): Update.
(gimple_stringop_fixed_value): Update.
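As a sketch of the replacement (close to, though not guaranteed
identical to, the new basic-block.h definition), the accessor derives
the count on demand instead of reading a cached field:
/* Sketch: with the cached edge count removed, an edge's count is
   computed from the source block's profile_count scaled by the
   edge probability.  */
inline profile_count
edge_def::count () const
{
  return src->count.apply_probability (probability);
}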
From-SVN: r253910
|
|
Previously SLP_TREE_NUMBER_OF_VEC_STMTS was calculated while scheduling
an SLP tree after analysis, but sometimes it can be useful to know the
value during analysis too. This patch moves the calculation to
vect_slp_analyze_node_operations instead.
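As a sketch (names assumed), the value only depends on quantities that
are already known during analysis:
/* Hedged sketch: the number of vector statements for an SLP node is
   a function of the vectorization factor, the SLP group size and the
   number of vector elements, so it can be set during analysis in
   vect_slp_analyze_node_operations.  */
unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
SLP_TREE_NUMBER_OF_VEC_STMTS (node) = vf * group_size / nunits;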
2017-09-18 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vect_slp_analyze_operations): Replace parameters
with a vec_info *.
* tree-vect-loop.c (vect_analyze_loop_operations): Update call
accordingly.
* tree-vect-slp.c (vect_slp_analyze_node_operations): Add vec_info *
parameter. Set SLP_TREE_NUMBER_OF_VEC_STMTS here rather than in
vect_schedule_slp_instance.
(vect_slp_analyze_operations): Replace parameters with a vec_info *.
Update call to vect_slp_analyze_node_operations. Simplify return
value.
(vect_slp_analyze_bb_1): Update call accordingly.
(vect_schedule_slp_instance): Remove vectorization_factor parameter.
Don't calculate SLP_TREE_NUMBER_OF_VEC_STMTS here.
(vect_schedule_slp): Update call accordingly.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r252935
|
|
This patch changes the type of the multiplier applied by
vectorizable_live_operation from unsigned_type_node to bitsizetype,
which matches the type of TYPE_SIZE and is the type expected of a
BIT_FIELD_REF bit position. This is shown by existing tests when
SVE is added.
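A sketch of the corrected computation (the lane index name is assumed):
/* Hedged sketch: build the multiplier in bitsizetype, the type of
   TYPE_SIZE, so the resulting product is a valid BIT_FIELD_REF bit
   position; previously it was built in unsigned_type_node.  */
tree bitsize = TYPE_SIZE (TREE_TYPE (vectype));  /* bits per element */
tree bitstart = int_const_binop (MULT_EXPR, bitsize, bitsize_int (lane));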
2017-09-18 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-loop.c (vectorizable_live_operation): Fix type of
bitstart.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r252931
|
|
vectorizable_live_operation needs to use BIT_FIELD_REF to extract one
element of a vector. For a packed vector boolean type, the number of
bits to extract should be taken from TYPE_PRECISION rather than TYPE_SIZE.
This is shown by existing tests once SVE is added.
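A sketch of the corrected element size (the exact condition in GCC may
differ):
/* Hedged sketch: elements of a packed vector boolean occupy
   TYPE_PRECISION bits, which can be narrower than the TYPE_SIZE of
   the element type, so the BIT_FIELD_REF width must come from the
   precision.  */
tree scalar_type = TREE_TYPE (vectype);
tree bitsize = VECTOR_BOOLEAN_TYPE_P (vectype)
               ? bitsize_int (TYPE_PRECISION (scalar_type))
               : TYPE_SIZE (scalar_type);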
2017-09-18 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-loop.c (vectorizable_live_operation): Fix element size
calculation for vector booleans.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r252930
|
|
(regression with trunk@250416)
2017-09-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/82220
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Exclude
epilogue niters from the min_profitable_iters compute.
From-SVN: r252917
|
|
The initialization of ncopies should have been placed after the early
exit for non-vectorised statements.
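A sketch of the reordering (the exit condition is simplified and
assumed):
/* Hedged sketch: return before computing ncopies for statements the
   vectorizer did not handle, so vectype is known to be valid when
   ncopies is derived from it.  */
if (!STMT_VINFO_LIVE_P (stmt_info))
  return false;
ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / TYPE_VECTOR_SUBPARTS (vectype);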
2017-09-16 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/82228
* tree-vect-loop.c (vectorizable_live_operation): Move initialization
of ncopies.
From-SVN: r252888
|