Age | Commit message (Collapse) | Author | Files | Lines |
|
combining stmt data ref gathering and fatal analysis parts.
2018-05-25 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_find_stmt_data_reference): New
function, combining stmt data ref gathering and fatal analysis
parts.
(vect_analyze_data_refs): Remove now redudnant code and simplify.
* tree-vect-loop.c (vect_get_datarefs_in_loop): Factor out from
vect_analyze_loop_2 and use vect_find_stmt_data_reference.
* tree-vect-slp.c (vect_slp_bb): Use vect_find_stmt_data_reference.
* tree-vectorizer.h (vect_find_stmt_data_reference): Declare.
From-SVN: r260754
|
|
2018-05-25 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (STMT_VINFO_GROUP_*, GROUP_*): Remove.
(DR_GROUP_*): New, assert we have non-NULL ->data_ref_info.
(REDUC_GROUP_*): New, assert we have NULL ->data_ref_info.
(STMT_VINFO_GROUPED_ACCESS): Adjust.
* tree-vect-data-refs.c (everywhere): Adjust users.
* tree-vect-loop.c (everywhere): Likewise.
* tree-vect-slp.c (everywhere): Likewise.
* tree-vect-stmts.c (everywhere): Likewise.
* tree-vect-patterns.c (vect_reassociating_reduction_p): Likewise.
From-SVN: r260709
|
|
The problem in this PR was that we didn't consider aliases between
writes in the same strided group. After tightening the early exit
we get the expected abs(step) >= 2 versioning check.
2018-05-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/85586
* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Only
exit early for statements in the same group if the accesses are
not strided.
gcc/testsuite/
PR tree-optimization/85586
* gcc.dg/vect/pr85586.c: New test.
From-SVN: r259822
|
|
and pass it to vect_get_load_cost.
2018-04-26 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_get_data_access_cost): Get
prologue cost vector and pass it to vect_get_load_cost.
(vect_get_peeling_costs_all_drs): Likewise.
(vect_peeling_hash_get_lowest_cost): Likewise.
(vect_enhance_data_refs_alignment): Likewise.
From-SVN: r259668
|
|
with r256888)
2018-04-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/84737
* tree-vect-data-refs.c (vect_copy_ref_info): New function
copying restrict info.
(vect_setup_realignment): Use it.
* tree-vectorizer.h (vect_copy_ref_info): Declare.
* tree-vect-stmts.c (vectorizable_store): Copy ref info from
the first DR to all generated stores.
(vectorizable_load): Likewise for loads.
From-SVN: r259493
|
|
In this PR we used WIDEN_SUM_EXPR to vectorise:
short i, y;
int sum;
[...]
for (i = x; i > 0; i--)
sum += y;
with 4 ints and 8 shorts per vector. The problem was that we set
the VF based only on the ints, then calculated the number of vector
copies based on the shorts, giving 4/8. Previously that led to
ncopies==0, but after r249897 we pick it up as an ICE.
In this particular case we could vectorise the reduction by setting
ncopies based on the output type rather than the input type, but it
doesn't seem worth adding a special "optimisation" for such a
pathological case. I think it's really an instance of the more general
problem that we can't vectorise using combinations of (say) 64-bit and
128-bit vectors on targets that support both.
2018-04-10 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/85286
* tree-vect-data-refs.c (vect_get_smallest_scalar_type):
gcc/testsuite/
PR tree-optimization/85286
* gcc.dg/vect/pr85286.c: New test.
From-SVN: r259268
|
|
This PR is another regression caused by the removal of the simple_iv
check in dr_analyze_innermost for BB analysis. Without splitting out
the step, we weren't able to find an underlying object whose alignment
could be increased.
As with PR81635, I think the simple_iv was only handling one special
case of something that ought to be more general. The more general
thing here is that if the address can be analysed as a scalar
evolution, and if all updates preserve alignment N, it's possible
to align the address to N by increasing the alignment of the base
object to N. That applies also to outer loops, and to both loop
and BB analysis.
I wasn't sure where the new functions ought to live, but tree-data-ref.c
seemed OK since (a) that already does scev analysis on addresses and
(b) you'd want to use dr_analyze_innermost first if you were analysing
a reference.
2018-03-24 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/84005
* tree-data-ref.h (get_base_for_alignment): Declare.
* tree-data-ref.c (get_base_for_alignment_1): New function.
(get_base_for_alignment): Likewise.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
get_base_for_alignment to find a suitable base object, instead
of always using drb->base_address.
gcc/testsuite/
PR tree-optimization/84005
* gcc.dg/vect/bb-slp-1.c: Make sure there is no message about
failing to force the alignment.
From-SVN: r258833
|
|
...since the latter doesn't guarantee independence by itself.
2018-03-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-data-refs.c (vect_analyze_data_ref_dependence)
(vect_analyze_data_ref_access): Use loop->safe_len rather than
loop->force_vectorize to check whether there is no alias.
gcc/testsuite/
* gcc.dg/vect/vect-alias-check-13.c: New test.
From-SVN: r258130
|
|
In the testcase we were trying to group two gather loads, even though
that isn't supported. Fixed by explicitly disallowing grouping of
gathers and scatters.
This problem didn't show up on SVE because there we convert to
IFN_GATHER_LOAD/IFN_SCATTER_STORE pattern statements, which fail
the can_group_stmts_p check.
2018-01-16 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-data-refs.c (vect_analyze_data_ref_accesses):
gcc/testsuite/
* gcc.dg/torture/pr83847.c: New test.
From-SVN: r256730
|
|
This patch adds runtime alias checks for loops with variable strides,
so that we can vectorise them even without a restrict qualifier.
There are several parts to doing this:
1) For accesses like:
x[i * n] += 1;
we need to check whether n (and thus the DR_STEP) is nonzero.
vect_analyze_data_ref_dependence records values that need to be
checked in this way, then prune_runtime_alias_test_list records a
bounds check on DR_STEP being outside the range [0, 0].
2) For accesses like:
x[i * n] = x[i * n + 1] + 1;
we simply need to test whether abs (n) >= 2.
prune_runtime_alias_test_list looks for cases like this and tries
to guess whether it is better to use this kind of check or a check
for non-overlapping ranges. (We could do an OR of the two conditions
at runtime, but that isn't implemented yet.)
3) Checks for overlapping ranges need to cope with variable strides.
At present the "length" of each segment in a range check is
represented as an offset from the base that lies outside the
touched range, in the same direction as DR_STEP. The length
can therefore be negative and is sometimes conservative.
With variable steps it's easier to reaon about if we split
this into two:
seg_len:
distance travelled from the first iteration of interest
to the last, e.g. DR_STEP * (VF - 1)
access_size:
the number of bytes accessed in each iteration
with access_size always being a positive constant and seg_len
possibly being variable. We can then combine alias checks
for two accesses that are a constant number of bytes apart by
adjusting the access size to account for the gap. This leaves
the segment length unchanged, which allows the check to be combined
with further accesses.
When seg_len is positive, the runtime alias check has the form:
base_a >= base_b + seg_len_b + access_size_b
|| base_b >= base_a + seg_len_a + access_size_a
In many accesses the base will be aligned to the access size, which
allows us to skip the addition:
base_a > base_b + seg_len_b
|| base_b > base_a + seg_len_a
A similar saving is possible with "negative" lengths.
The patch therefore tracks the alignment in addition to seg_len
and access_size.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vec_lower_bound): New structure.
(_loop_vec_info): Add check_nonzero and lower_bounds.
(LOOP_VINFO_CHECK_NONZERO): New macro.
(LOOP_VINFO_LOWER_BOUNDS): Likewise.
(LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too.
* tree-data-ref.h (dr_with_seg_len): Add access_size and align
fields. Make seg_len the distance travelled, not including the
access size.
(dr_direction_indicator): Declare.
(dr_zero_step_indicator): Likewise.
(dr_known_forward_stride_p): Likewise.
* tree-data-ref.c: Include stringpool.h, tree-vrp.h and
tree-ssanames.h.
(runtime_alias_check_p): Allow runtime alias checks with
variable strides.
(operator ==): Compare access_size and align.
(prune_runtime_alias_test_list): Rework for new distinction between
the access_size and seg_len.
(create_intersect_range_checks_index): Likewise. Cope with polynomial
segment lengths.
(get_segment_min_max): New function.
(create_intersect_range_checks): Use it.
(dr_step_indicator): New function.
(dr_direction_indicator): Likewise.
(dr_zero_step_indicator): Likewise.
(dr_known_forward_stride_p): Likewise.
* tree-loop-distribution.c (data_ref_segment_size): Return
DR_STEP * (niters - 1).
(compute_alias_check_pairs): Update call to the dr_with_seg_len
constructor.
* tree-vect-data-refs.c (vect_check_nonzero_value): New function.
(vect_preserves_scalar_order_p): New function, split out from...
(vect_analyze_data_ref_dependence): ...here. Check for zero steps.
(vect_vfa_segment_size): Return DR_STEP * (length_factor - 1).
(vect_vfa_access_size): New function.
(vect_vfa_align): Likewise.
(vect_compile_time_alias): Take access_size_a and access_b arguments.
(dump_lower_bound): New function.
(vect_check_lower_bound): Likewise.
(vect_small_gap_p): Likewise.
(vectorizable_with_step_bound_p): Likewise.
(vect_prune_runtime_alias_test_list): Ignore cross-iteration
depencies if the vectorization factor is 1. Convert the checks
for nonzero steps into checks on the bounds of DR_STEP. Try using
a bunds check for variable steps if the minimum required step is
relatively small. Update calls to the dr_with_seg_len
constructor and to vect_compile_time_alias.
* tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New
function.
(vect_loop_versioning): Call it.
* tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS
when retrying.
(vect_estimate_min_profitable_iters): Account for any bounds checks.
gcc/testsuite/
* gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather
than SLP vectorization.
* gcc.dg/vect/vect-alias-check-10.c: New test.
* gcc.dg/vect/vect-alias-check-11.c: Likewise.
* gcc.dg/vect/vect-alias-check-12.c: Likewise.
* gcc.dg/vect/vect-alias-check-8.c: Likewise.
* gcc.dg/vect/vect-alias-check-9.c: Likewise.
* gcc.target/aarch64/sve/strided_load_8.c: Likewise.
* gcc.target/aarch64/sve/var_stride_1.c: Likewise.
* gcc.target/aarch64/sve/var_stride_1.h: Likewise.
* gcc.target/aarch64/sve/var_stride_1_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
* gcc.target/aarch64/sve/var_stride_2_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_3.c: Likewise.
* gcc.target/aarch64/sve/var_stride_3_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_4.c: Likewise.
* gcc.target/aarch64/sve/var_stride_4_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_5.c: Likewise.
* gcc.target/aarch64/sve/var_stride_5_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_6.c: Likewise.
* gcc.target/aarch64/sve/var_stride_6_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_7.c: Likewise.
* gcc.target/aarch64/sve/var_stride_7_run.c: Likewise.
* gcc.target/aarch64/sve/var_stride_8.c: Likewise.
* gcc.target/aarch64/sve/var_stride_8_run.c: Likewise.
* gfortran.dg/vect/vect-alias-check-1.F90: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256644
|
|
This is mostly a mechanical extension of the previous gather load
support to scatter stores. The internal functions in this case are:
IFN_SCATTER_STORE (base, offsets, scale, values)
IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask)
However, one nonobvious change is to vect_analyze_data_ref_access.
If we're treating an access as a gather load or scatter store
(i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code
would create a dummy data_reference whose step is 0. There's not
really much else it could do, since the whole point is that the
step isn't predictable from iteration to iteration. We then
went into this code in vect_analyze_data_ref_access:
/* Allow loads with zero step in inner-loop vectorization. */
if (loop_vinfo && integer_zerop (step))
{
GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
if (!nested_in_vect_loop_p (loop, stmt))
return DR_IS_READ (dr);
I.e. we'd take the step literally and assume that this is a load
or store to an invariant address. Loads from invariant addresses
are supported but stores to them aren't.
The code therefore had the effect of disabling all scatter stores.
AFAICT this is true of AVX too: although tests like avx512f-scatter-1.c
test for the correctness of a scatter-like loop, they don't seem to
check whether a scatter instruction is actually used.
The patch therefore makes vect_analyze_data_ref_access return true
for scatters. We do seem to handle the aliasing correctly;
that's tested by other functions, and is symmetrical to the
already-working gather case.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/sourcebuild.texi (vect_scatter_store): Document.
* optabs.def (scatter_store_optab, mask_scatter_store_optab): New
optabs.
* doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}):
Document.
* genopinit.c (main): Add supports_vec_scatter_store and
supports_vec_scatter_store_cached to target_optabs.
* gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and
IFN_MASK_SCATTER_STORE.
* internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New internal
functions.
* internal-fn.h (internal_store_fn_p): Declare.
(internal_fn_stored_value_index): Likewise.
* internal-fn.c (scatter_store_direct): New macro.
(expand_scatter_store_optab_fn): New function.
(direct_scatter_store_optab_supported_p): New macro.
(internal_store_fn_p): New function.
(internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and
IFN_MASK_SCATTER_STORE.
(internal_fn_mask_index): Likewise.
(internal_fn_stored_value_index): New function.
(internal_gather_scatter_fn_supported_p): Adjust operand numbers
for scatter stores.
* optabs-query.h (supports_vec_scatter_store_p): Declare.
* optabs-query.c (supports_vec_scatter_store_p): New function.
* tree-vectorizer.h (vect_get_store_rhs): Declare.
* tree-vect-data-refs.c (vect_analyze_data_ref_access): Return
true for scatter stores.
(vect_gather_scatter_fn_p): Handle scatter stores too.
(vect_check_gather_scatter): Consider using scatter stores if
supports_vec_scatter_store_p.
* tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle
scatter stores too.
* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
internal_fn_stored_value_index.
(check_load_store_masking): Handle scatter stores too.
(vect_get_store_rhs): Make public.
(vectorizable_call): Use internal_store_fn_p.
(vectorizable_store): Handle scatter store internal functions.
(vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE
when deciding whether the end of the group has been reached.
* config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec.
* config/aarch64/aarch64-sve.md (scatter_store<mode>): New expander.
(mask_scatter_store<mode>): New insns.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_scatter_store):
New proc.
* gcc.dg/vect/pr25413a.c: Expect both loops to be optimized on
targets with scatter stores.
* gcc.dg/vect/vect-71.c: Restrict XFAIL to targets without scatter
stores.
* gcc.target/aarch64/sve/mask_scatter_store_1.c: New test.
* gcc.target/aarch64/sve/mask_scatter_store_2.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_1.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_2.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_3.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_4.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_5.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_6.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_7.c: Likewise.
* gcc.target/aarch64/sve/strided_store_1.c: Likewise.
* gcc.target/aarch64/sve/strided_store_2.c: Likewise.
* gcc.target/aarch64/sve/strided_store_3.c: Likewise.
* gcc.target/aarch64/sve/strided_store_4.c: Likewise.
* gcc.target/aarch64/sve/strided_store_5.c: Likewise.
* gcc.target/aarch64/sve/strided_store_6.c: Likewise.
* gcc.target/aarch64/sve/strided_store_7.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256643
|
|
Following on from the previous patch for strided accesses, this patch
allows gather loads to be used with grouped accesses, if we otherwise
would need to fall back to VMAT_ELEMENTWISE. However, as the comment
says, this is restricted to single-element groups for now:
??? Although the code can handle all group sizes correctly,
it probably isn't a win to use separate strided accesses based
on nearby locations. Or, even if it's a win over scalar code,
it might not be a win over vectorizing at a lower VF, if that
allows us to use contiguous accesses.
Single-element groups are an important special case though,
and this means that code is less sensitive to GCC's classification
of single accesses with constant steps as "grouped" and ones with
variable steps as "strided".
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vect_gather_scatter_fn_p): Declare.
* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Make public.
* tree-vect-stmts.c (vect_truncate_gather_scatter_offset): New
function.
(vect_use_strided_gather_scatters_p): Take a masked_p argument.
Use vect_truncate_gather_scatter_offset if we can't treat the
operation as a normal gather load or scatter store.
(get_group_load_store_type): Take the gather_scatter_info
as argument. Try using a gather load or scatter store for
single-element groups.
(get_load_store_type): Update calls to get_group_load_store_type
and vect_use_strided_gather_scatters_p.
gcc/testsuite/
* gcc.target/aarch64/sve/reduc_strict_3.c: Expect FADDA to be used
for double_reduc1.
* gcc.target/aarch64/sve/strided_load_4.c: New test.
* gcc.target/aarch64/sve/strided_load_5.c: Likewise.
* gcc.target/aarch64/sve/strided_load_6.c: Likewise.
* gcc.target/aarch64/sve/strided_load_7.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256642
|
|
This patch tries to use gather loads for strided accesses,
rather than falling back to VMAT_ELEMENTWISE.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vect_create_data_ref_ptr): Take an extra
optional tree argument.
* tree-vect-data-refs.c (vect_check_gather_scatter): Check for
null target hooks.
(vect_create_data_ref_ptr): Take the iv_step as an optional argument,
but continue to use the current value as a fallback.
(bump_vector_ptr): Use operand_equal_p rather than tree_int_cst_compare
to compare the updates.
* tree-vect-stmts.c (vect_use_strided_gather_scatters_p): New function.
(get_load_store_type): Use it when handling a strided access.
(vect_get_strided_load_store_ops): New function.
(vect_get_data_ptr_increment): Likewise.
(vectorizable_load): Handle strided gather loads. Always pass
a step to vect_create_data_ref_ptr and bump_vector_ptr.
gcc/testsuite/
* gcc.target/aarch64/sve/strided_load_1.c: New test.
* gcc.target/aarch64/sve/strided_load_2.c: Likewise.
* gcc.target/aarch64/sve/strided_load_3.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256641
|
|
This patch adds support for SVE gather loads. It uses the basically
the same analysis code as the AVX gather support, but after that
there are two major differences:
- It uses new internal functions rather than target built-ins.
The interface is:
IFN_GATHER_LOAD (base, offsets scale)
IFN_MASK_GATHER_LOAD (base, offsets scale, mask)
which should be reasonably generic. One of the advantages of
using internal functions is that other passes can understand what
the functions do, but a more immediate advantage is that we can
query the underlying target pattern to see which scales it supports.
- It uses pattern recognition to convert the offset to the right width,
if it was originally narrower than that. This avoids having to do
a widening operation as part of the gather expansion itself.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (gather_load@var{m}): Document.
(mask_gather_load@var{m}): Likewise.
* genopinit.c (main): Add supports_vec_gather_load and
supports_vec_gather_load_cached to target_optabs.
* optabs-tree.c (init_tree_optimization_optabs): Use
ggc_cleared_alloc to allocate target_optabs.
* optabs.def (gather_load_optab, mask_gather_laod_optab): New optabs.
* internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal
functions.
* internal-fn.h (internal_load_fn_p): Declare.
(internal_gather_scatter_fn_p): Likewise.
(internal_fn_mask_index): Likewise.
(internal_gather_scatter_fn_supported_p): Likewise.
* internal-fn.c (gather_load_direct): New macro.
(expand_gather_load_optab_fn): New function.
(direct_gather_load_optab_supported_p): New macro.
(direct_internal_fn_optab): New function.
(internal_load_fn_p): Likewise.
(internal_gather_scatter_fn_p): Likewise.
(internal_fn_mask_index): Likewise.
(internal_gather_scatter_fn_supported_p): Likewise.
* optabs-query.c (supports_at_least_one_mode_p): New function.
(supports_vec_gather_load_p): Likewise.
* optabs-query.h (supports_vec_gather_load_p): Declare.
* tree-vectorizer.h (gather_scatter_info): Add ifn, element_type
and memory_type field.
(NUM_PATTERNS): Bump to 15.
* tree-vect-data-refs.c: Include internal-fn.h.
(vect_gather_scatter_fn_p): New function.
(vect_describe_gather_scatter_call): Likewise.
(vect_check_gather_scatter): Try using internal functions for
gather loads. Recognize existing calls to a gather load function.
(vect_analyze_data_refs): Consider using gather loads if
supports_vec_gather_load_p.
* tree-vect-patterns.c (vect_get_load_store_mask): New function.
(vect_get_gather_scatter_offset_type): Likewise.
(vect_convert_mask_for_vectype): Likewise.
(vect_add_conversion_to_patterm): Likewise.
(vect_try_gather_scatter_pattern): Likewise.
(vect_recog_gather_scatter_pattern): New pattern recognizer.
(vect_vect_recog_func_ptrs): Add it.
* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
internal_fn_mask_index and internal_gather_scatter_fn_p.
(check_load_store_masking): Take the gather_scatter_info as an
argument and handle gather loads.
(vect_get_gather_scatter_ops): New function.
(vectorizable_call): Check internal_load_fn_p.
(vectorizable_load): Likewise. Handle gather load internal
functions.
(vectorizable_store): Update call to check_load_store_masking.
* config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec.
* config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators.
* config/aarch64/predicates.md (aarch64_gather_scale_operand_w)
(aarch64_gather_scale_operand_d): New predicates.
* config/aarch64/aarch64-sve.md (gather_load<mode>): New expander.
(mask_gather_load<mode>): New insns.
gcc/testsuite/
* gcc.target/aarch64/sve/gather_load_1.c: New test.
* gcc.target/aarch64/sve/gather_load_2.c: Likewise.
* gcc.target/aarch64/sve/gather_load_3.c: Likewise.
* gcc.target/aarch64/sve/gather_load_4.c: Likewise.
* gcc.target/aarch64/sve/gather_load_5.c: Likewise.
* gcc.target/aarch64/sve/gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/gather_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256640
|
|
This allows LD3 to be used for isolated a[i * 3] accesses, in a similar
way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively.
Given the problems with the cost model underestimating the cost of
elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE
cases that are currently rejected.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-data-refs.c (vect_analyze_group_access_1): Allow
single-element interleaving even if the size is not a power of 2.
* tree-vect-stmts.c (get_load_store_type): Disallow elementwise
accesses for single-element interleaving if the group size is
not a power of 2.
gcc/testsuite/
* gcc.target/aarch64/sve/struct_vect_18.c: New test.
* gcc.target/aarch64/sve/struct_vect_18_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_19_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256634
|
|
This patch adds support for vectorising groups of IFN_MASK_LOADs
and IFN_MASK_STOREs using conditional load/store-lanes instructions.
This requires new internal functions to represent the result
(IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs.
The normal IFN_{LOAD,STORE}_LANES functions are const operations
that logically just perform the permute: the load or store is
encoded as a MEM operand to the call statement. In contrast,
the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of
interface as IFN_MASK_{LOAD,STORE}, since the memory is only
conditionally accessed.
The AArch64 patterns were added as part of the main LD[234]/ST[234] patch.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document.
(vec_mask_store_lanes@var{m}@var{n}): Likewise.
* optabs.def (vec_mask_load_lanes_optab): New optab.
(vec_mask_store_lanes_optab): Likewise.
* internal-fn.def (MASK_LOAD_LANES): New internal function.
(MASK_STORE_LANES): Likewise.
* internal-fn.c (mask_load_lanes_direct): New macro.
(mask_store_lanes_direct): Likewise.
(expand_mask_load_optab_fn): Handle masked operations.
(expand_mask_load_lanes_optab_fn): New macro.
(expand_mask_store_optab_fn): Handle masked operations.
(expand_mask_store_lanes_optab_fn): New macro.
(direct_mask_load_lanes_optab_supported_p): Likewise.
(direct_mask_store_lanes_optab_supported_p): Likewise.
* tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p
parameter.
(vect_load_lanes_supported): Likewise.
* tree-vect-data-refs.c (strip_conversion): New function.
(can_group_stmts_p): Likewise.
(vect_analyze_data_ref_accesses): Use it instead of checking
for a pair of assignments.
(vect_store_lanes_supported): Take a masked_p parameter.
(vect_load_lanes_supported): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Update calls to
vect_store_lanes_supported and vect_load_lanes_supported.
* tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
* tree-vect-stmts.c (get_group_load_store_type): Take a masked_p
parameter. Don't allow gaps for masked accesses.
Use vect_get_store_rhs. Update calls to vect_store_lanes_supported
and vect_load_lanes_supported.
(get_load_store_type): Take a masked_p parameter and update
call to get_group_load_store_type.
(vectorizable_store): Update call to get_load_store_type.
Handle IFN_MASK_STORE_LANES.
(vectorizable_load): Update call to get_load_store_type.
Handle IFN_MASK_LOAD_LANES.
gcc/testsuite/
* gcc.dg/vect/vect-ooo-group-1.c: New test.
* gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_8.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256620
|
|
So far we've used integer modes for LD[234] and ST[234] arrays.
That doesn't scale well to SVE, since the sizes aren't fixed at
compile time (and even if they were, we wouldn't want integers
to be so wide).
This patch lets the target use double-, triple- and quadruple-length
vectors instead.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* target.def (array_mode): New target hook.
* doc/tm.texi.in (TARGET_ARRAY_MODE): New hook.
* doc/tm.texi: Regenerate.
* hooks.h (hook_optmode_mode_uhwi_none): Declare.
* hooks.c (hook_optmode_mode_uhwi_none): New function.
* tree-vect-data-refs.c (vect_lanes_optab_supported_p): Use
targetm.array_mode.
* stor-layout.c (mode_for_array): Likewise. Support polynomial
type sizes.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256617
|
|
The idea with the main 107-patch poly_int series (latterly 109-patch)
was to change the mode sizes and vector element counts to poly_int and
then propagate those changes as far as they needed to go to fix build
failures from incompatible types. This means that DR_INIT is now
constructed as a poly_int64:
poly_int64 pbytepos;
if (!multiple_p (pbitpos, BITS_PER_UNIT, &pbytepos))
{
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "failed: bit offset alignment.\n");
return false;
}
[...]
init = ssize_int (pbytepos);
This patch adjusts other references to DR_INIT accordingly. Unlike
the above, the adjustments weren't needed to avoid a build-time type
incompatibility, but they are needed to make the producer and consumers
of DR_INIT logically consistent.
2018-01-12 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-predcom.c (aff_combination_dr_offset): Use wi::to_poly_widest
rather than wi::to_widest for DR_INITs.
* tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
wi::to_poly_offset rather than wi::to_offset for DR_INIT.
(vect_analyze_data_ref_accesses): Require both DR_INITs to be
INTEGER_CSTs.
(vect_analyze_group_access_1): Note that here.
From-SVN: r256587
|
|
r241959 included code to stop the vectoriser increasing the alignment of
a "user-aligned" variable. This wasn't the main purpose of the patch,
but was done for consistency with pass_increase_alignment, and was
needed to make the testcase work.
The documentation for the aligned attribute says:
This attribute specifies a minimum alignment for the variable or
structure field, measured in bytes.
so I think it's reasonable for the vectoriser to increase the
alignment further, if that helps us to vectorise code. It's also
useful if the "user" alignment actually came from an earlier pass
rather than the source code.
A possible counterexample came up when this was discussed on the lists.
Users who are trying to collate things from several translation units
into a single section can use:
__attribute__((section ("whatever"), aligned(N)))
and would not want extra padding. It turns out that the supported way
of doing that is to add a "used" attribute, which works even when no
"aligned" attribute is given.
2018-01-05 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Don't
punt for user-aligned variables.
gcc/testsuite/
* gcc.dg/vect/vect-align-4.c: New test.
* gcc.dg/vect/vect-nb-iter-ub-2.c (cc): Remove alignment attribute
and redefine as a structure with an unaligned member "b".
(foo): Update accordingly.
From-SVN: r256277
|
|
This patch changes GET_MODE_SIZE from unsigned short to poly_uint16.
The non-mechanical parts were handled by previous patches.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (mode_size): Change from unsigned short to
poly_uint16_pod.
(mode_to_bytes): Return a poly_uint16 rather than an unsigned short.
(GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
(fixed_size_mode::includes_p): Check for constant-sized modes.
* genmodes.c (emit_mode_size_inline): Make mode_size_inline
return a poly_uint16 rather than an unsigned short.
(emit_mode_size): Change the type of mode_size from unsigned short
to poly_uint16_pod. Use ZERO_COEFFS for the initializer.
(emit_mode_adjustments): Cope with polynomial vector sizes.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_SIZE.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_SIZE.
* auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial.
* builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise.
* caller-save.c (setup_save_areas): Likewise.
(replace_reg_with_saved_mem): Likewise.
* calls.c (emit_library_call_value_1): Likewise.
* combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise.
* combine.c (simplify_set, make_extraction, simplify_shift_const_1)
(gen_lowpart_for_combine): Likewise.
* convert.c (convert_to_integer_1): Likewise.
* cse.c (equiv_constant, cse_insn): Likewise.
* cselib.c (autoinc_split, cselib_hash_rtx): Likewise.
(cselib_subst_to_values): Likewise.
* dce.c (word_dce_process_block): Likewise.
* df-problems.c (df_word_lr_mark_ref): Likewise.
* dwarf2cfi.c (init_one_dwarf_reg_size): Likewise.
* dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor)
(concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor)
(rtl_for_decl_location): Likewise.
* emit-rtl.c (gen_highpart, widen_memory_access): Likewise.
* expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise.
* expr.c (emit_group_load_1, clear_storage_hints): Likewise.
(emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise.
(expand_expr_real_1): Likewise.
* function.c (assign_parm_setup_block_p, assign_parm_setup_block)
(pad_below): Likewise.
* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
* gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise.
* ira.c (get_subreg_tracking_sizes): Likewise.
* ira-build.c (ira_create_allocno_objects): Likewise.
* ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise.
(ira_sort_regnos_for_alter_reg): Likewise.
* ira-costs.c (record_operand_costs): Likewise.
* lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn)
(resolve_simple_move): Likewise.
* lra-constraints.c (get_reload_reg, operands_match_p): Likewise.
(process_addr_reg, simplify_operand_subreg, curr_insn_transform)
(lra_constraints): Likewise.
(CONST_POOL_OK_P): Reject variable-sized modes.
* lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare)
(add_pseudo_to_slot, lra_spill): Likewise.
* omp-low.c (omp_clause_aligned_alignment): Likewise.
* optabs-query.c (get_best_extraction_insn): Likewise.
* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
* optabs.c (expand_vec_perm_var, expand_vec_cond_expr): Likewise.
(expand_mult_highpart, valid_multiword_target_p): Likewise.
* recog.c (offsettable_address_addr_space_p): Likewise.
* regcprop.c (maybe_mode_change): Likewise.
* reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise.
* regrename.c (build_def_use): Likewise.
* regstat.c (dump_reg_info): Likewise.
* reload.c (complex_word_subreg_p, push_reload, find_dummy_reload)
(find_reloads, find_reloads_subreg_address): Likewise.
* reload1.c (eliminate_regs_1): Likewise.
* rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise.
* simplify-rtx.c (avoid_constant_pool_reference): Likewise.
(simplify_binary_operation_1, simplify_subreg): Likewise.
* targhooks.c (default_function_arg_padding): Likewise.
(default_hard_regno_nregs, default_class_max_nregs): Likewise.
* tree-cfg.c (verify_gimple_assign_binary): Likewise.
(verify_gimple_assign_ternary): Likewise.
* tree-inline.c (estimate_move_cost): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-ssa-loop-ivopts.c (add_autoinc_candidates): Likewise.
(get_address_cost_ainc): Likewise.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
(vectorizable_reduction): Likewise.
* tree-vect-stmts.c (vectorizable_assignment, vectorizable_shift)
(vectorizable_operation, vectorizable_load): Likewise.
* tree.c (build_same_sized_truth_vector_type): Likewise.
* valtrack.c (cleanup_auto_inc_dec): Likewise.
* var-tracking.c (emit_note_insn_var_location): Likewise.
* config/arc/arc.h (ASM_OUTPUT_CASE_END): Use as_a <scalar_int_mode>.
(ADDR_VEC_ALIGN): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256201
|
|
This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64. The value is
encoded in the 10-bit precision field and was previously always stored
as a simple log2 value. The challenge was to use this 10 bits to
encode the number of elements in variable-length vectors, so that
we didn't need to increase the size of the tree.
In practice the number of vector elements should always have the form
N + N * X (where X is the runtime value), and as for constant-length
vectors, N must be a power of 2 (even though X itself might not be).
The patch therefore uses the low 8 bits to encode log2(N) and bit
8 to select between constant-length and variable-length vectors.
Targets without variable-length vectors continue to use the old scheme.
A new valid_vector_subparts_p function tests whether a given number
of elements can be encoded. This is false for the vector modes that
represent an LD3 or ST3 vector triple (which we want to treat as arrays
of vectors rather than single vectors).
Most of the patch is mechanical; previous patches handled the changes
that weren't entirely straightforward.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree.h (TYPE_VECTOR_SUBPARTS): Turn into a function and handle
polynomial numbers of units.
(SET_TYPE_VECTOR_SUBPARTS): Likewise.
(valid_vector_subparts_p): New function.
(build_vector_type): Remove temporary shim and take the number
of units as a poly_uint64 rather than an int.
(build_opaque_vector_type): Take the number of units as a
poly_uint64 rather than an int.
* tree.c (build_vector_from_ctor): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(type_hash_canon_hash, type_cache_hasher::equal): Likewise.
(uniform_vector_p, vector_type_mode, build_vector): Likewise.
(build_vector_from_val): If the number of units is variable,
use build_vec_duplicate_cst for constant operands and
VEC_DUPLICATE_EXPR otherwise.
(make_vector_type): Remove temporary is_constant ().
(build_vector_type, build_opaque_vector_type): Take the number of
units as a poly_uint64 rather than an int.
(check_vector_cst): Handle polynomial TYPE_VECTOR_SUBPARTS and
VECTOR_CST_NELTS.
* cfgexpand.c (expand_debug_expr): Likewise.
* expr.c (count_type_elements, categorize_ctor_elements_1): Likewise.
(store_constructor, expand_expr_real_1): Likewise.
(const_scalar_mask_from_tree): Likewise.
* fold-const-call.c (fold_const_reduction): Likewise.
* fold-const.c (const_binop, const_unop, fold_convert_const): Likewise.
(operand_equal_p, fold_vec_perm, fold_ternary_loc): Likewise.
(native_encode_vector, vec_cst_ctor_to_array): Likewise.
(fold_relational_const): Likewise.
(native_interpret_vector): Likewise. Change the size from an
int to an unsigned int.
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(gimple_fold_indirect_ref, gimple_build_vector): Likewise.
(gimple_build_vector_from_val): Use VEC_DUPLICATE_EXPR when
duplicating a non-constant operand into a variable-length vector.
* hsa-brig.c (hsa_op_immed::emit_to_buffer): Handle polynomial
TYPE_VECTOR_SUBPARTS and VECTOR_CST_NELTS.
* ipa-icf.c (sem_variable::equals): Likewise.
* match.pd: Likewise.
* omp-simd-clone.c (simd_clone_subparts): Likewise.
* print-tree.c (print_node): Likewise.
* stor-layout.c (layout_type): Likewise.
* targhooks.c (default_builtin_vectorization_cost): Likewise.
* tree-cfg.c (verify_gimple_comparison): Likewise.
(verify_gimple_assign_binary): Likewise.
(verify_gimple_assign_ternary): Likewise.
(verify_gimple_assign_single): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
(simplify_bitfield_ref, is_combined_permutation_identity): Likewise.
* tree-vect-data-refs.c (vect_permute_store_chain): Likewise.
(vect_grouped_load_supported, vect_permute_load_chain): Likewise.
(vect_shift_permute_load_chain): Likewise.
* tree-vect-generic.c (nunits_for_known_piecewise_op): Likewise.
(expand_vector_condition, optimize_vector_constructor): Likewise.
(lower_vec_perm, get_compute_type): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
(get_initial_defs_for_reduction, vect_transform_loop): Likewise.
* tree-vect-patterns.c (vect_recog_bool_pattern): Likewise.
(vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_supported_load_permutation_p): Likewise.
(vect_get_constant_vectors, vect_transform_slp_perm_load): Likewise.
* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
(get_group_load_store_type, vectorizable_mask_load_store): Likewise.
(vectorizable_bswap, simd_clone_subparts, vectorizable_assignment)
(vectorizable_shift, vectorizable_operation, vectorizable_store)
(vectorizable_load, vect_is_simple_cond, vectorizable_comparison)
(supportable_widening_operation): Likewise.
(supportable_narrowing_operation): Likewise.
* tree-vector-builder.c (tree_vector_builder::binary_encoded_nelts):
Likewise.
* varasm.c (output_constant): Likewise.
gcc/ada/
* gcc-interface/utils.c (gnat_types_compatible_p): Handle
polynomial TYPE_VECTOR_SUBPARTS.
gcc/brig/
* brigfrontend/brig-to-generic.cc (get_unsigned_int_type): Handle
polynomial TYPE_VECTOR_SUBPARTS.
* brigfrontend/brig-util.h (gccbrig_type_vector_subparts): Likewise.
gcc/c-family/
* c-common.c (vector_types_convertible_p, c_build_vec_perm_expr)
(convert_vector_to_array_for_subscript): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(c_common_type_for_mode): Check valid_vector_subparts_p.
* c-pretty-print.c (pp_c_initializer_list): Handle polynomial
VECTOR_CST_NELTS.
gcc/c/
* c-typeck.c (comptypes_internal, build_binary_op): Handle polynomial
TYPE_VECTOR_SUBPARTS.
gcc/cp/
* constexpr.c (cxx_eval_array_reference): Handle polynomial
VECTOR_CST_NELTS.
(cxx_fold_indirect_ref): Handle polynomial TYPE_VECTOR_SUBPARTS.
* call.c (build_conditional_expr_1): Likewise.
* decl.c (cp_finish_decomp): Likewise.
* mangle.c (write_type): Likewise.
* typeck.c (structural_comptypes): Likewise.
(cp_build_binary_op): Likewise.
* typeck2.c (process_init_constructor_array): Likewise.
gcc/fortran/
* trans-types.c (gfc_type_for_mode): Check valid_vector_subparts_p.
gcc/lto/
* lto-lang.c (lto_type_for_mode): Check valid_vector_subparts_p.
* lto.c (hash_canonical_type): Handle polynomial TYPE_VECTOR_SUBPARTS.
gcc/go/
* go-lang.c (go_langhook_type_for_mode): Check valid_vector_subparts_p.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256197
|
|
The GET_MODE_NUNITS patch made vect_grouped_store_supported and
vect_grouped_load_supported check for a constant number of elements,
so vect_permute_store_chain and vect_permute_load_chain can assert
for that. This patch adds commentary to that effect; the actual
asserts will be added by a later, more mechanical, patch.
The patch also reorganises the function so that the asserts
are linked specifically to code that builds permute vectors
element-by-element. This allows a later patch to add support
for some variable-length permutes.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-data-refs.c (vect_permute_store_chain): Reorganize
so that both the length == 3 and length != 3 cases set up their
own permute vectors. Add comments explaining why we know the
number of elements is constant.
(vect_permute_load_chain): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256196
|
|
This patch changes GET_MODE_NUNITS from unsigned char
to poly_uint16, although it remains a macro when compiling
target code with NUM_POLY_INT_COEFFS == 1.
We can handle permuted loads and stores for variable nunits if
the number of statements is a power of 2, but not otherwise.
The to_constant call in make_vector_type goes away in a later patch.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (mode_nunits): Change from unsigned char to
poly_uint16_pod.
(ONLY_FIXED_SIZE_MODES): New macro.
(pod_mode::measurement_type, scalar_int_mode::measurement_type)
(scalar_float_mode::measurement_type, scalar_mode::measurement_type)
(complex_mode::measurement_type, fixed_size_mode::measurement_type):
New typedefs.
(mode_to_nunits): Return a poly_uint16 rather than an unsigned short.
(GET_MODE_NUNITS): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
* genmodes.c (ZERO_COEFFS): New macro.
(emit_mode_nunits_inline): Make mode_nunits_inline return a
poly_uint16.
(emit_mode_nunits): Change the type of mode_nunits to poly_uint16_pod.
Use ZERO_COEFFS when emitting initializers.
* data-streamer.h (bp_pack_poly_value): New function.
(bp_unpack_poly_value): Likewise.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_NUNITS.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_NUNITS.
* tree.c (make_vector_type): Remove temporary shim and make
the real function take the number of units as a poly_uint64
rather than an int.
(build_vector_type_for_mode): Handle polynomial nunits.
* dwarf2out.c (loc_descriptor, add_const_value_attribute): Likewise.
* emit-rtl.c (const_vec_series_p_1): Likewise.
(gen_rtx_CONST_VECTOR): Likewise.
* fold-const.c (test_vec_duplicate_folding): Likewise.
* genrecog.c (validate_pattern): Likewise.
* optabs-query.c (can_vec_perm_var_p, can_mult_highpart_p): Likewise.
* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
* optabs.c (expand_vector_broadcast, expand_binop_directly): Likewise.
(shift_amt_for_vec_perm_mask, expand_vec_perm_var): Likewise.
(expand_vec_cond_expr, expand_mult_highpart): Likewise.
* rtlanal.c (subreg_get_info): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
(vect_grouped_load_supported): Likewise.
* tree-vect-generic.c (type_for_widest_vector_mode): Likewise.
* tree-vect-loop.c (have_whole_vector_shift): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
(simplify_const_unary_operation, simplify_binary_operation_1)
(simplify_const_binary_operation, simplify_ternary_operation)
(test_vector_ops_duplicate, test_vector_ops): Likewise.
(simplify_immed_subreg): Use GET_MODE_NUNITS on a fixed_size_mode
instead of CONST_VECTOR_NUNITS.
* varasm.c (output_constant_pool_2): Likewise.
* rtx-vector-builder.c (rtx_vector_builder::build): Only include the
explicit-encoded elements in the XVEC for variable-length vectors.
gcc/ada/
* gcc-interface/misc.c (enumerate_modes): Handle polynomial
GET_MODE_NUNITS.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256195
|
|
From-SVN: r256169
|
|
This patch replaces the two-state vect_no_alias_p with a three-state
vect_compile_time_alias that handles polynomial segment lengths.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-data-refs.c (vect_no_alias_p): Replace with...
(vect_compile_time_alias): ...this new function. Do the calculation
on poly_ints rather than trees.
(vect_prune_runtime_alias_test_list): Update call accordingly.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256142
|
|
This patch makes vector_alignment_reachable_p cope with variable-length
vectors.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-data-refs.c (vector_alignment_reachable_p): Treat the
number of units as polynomial.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256132
|
|
This patch changes the type of the vectorisation factor and SLP
unrolling factor to poly_uint64. This in turn required some knock-on
changes in signedness elsewhere.
Cost decisions are generally based on estimated_poly_value,
which for VF is wrapped up as vect_vf_for_cost.
The patch doesn't on its own enable variable-length vectorisation.
It just makes the minimum changes necessary for the code to build
with the new VF and UF types. Later patches also make the
vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable
GET_MODE_NUNITS, at which point the code really does handle
variable-length vectors.
The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX,
to avoid hard-coding a particular architectural limit.
The patch includes a new test because a development version of the patch
accidentally used file print routines instead of dump_*, which would
fail with -fopt-info.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (_slp_instance::unrolling_factor): Change
from an unsigned int to a poly_uint64.
(_loop_vec_info::slp_unrolling_factor): Likewise.
(_loop_vec_info::vectorization_factor): Change from an int
to a poly_uint64.
(MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX.
(vect_get_num_vectors): New function.
(vect_update_max_nunits, vect_vf_for_cost): Likewise.
(vect_get_num_copies): Use vect_get_num_vectors.
(vect_analyze_data_ref_dependences): Change max_vf from an int *
to an unsigned int *.
(vect_analyze_data_refs): Change min_vf from an int * to a
poly_uint64 *.
(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
than an unsigned HOST_WIDE_INT.
* tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr)
(vect_analyze_data_ref_dependence): Change max_vf from an int *
to an unsigned int *.
(vect_analyze_data_ref_dependences): Likewise.
(vect_compute_data_ref_alignment): Handle polynomial vf.
(vect_enhance_data_refs_alignment): Likewise.
(vect_prune_runtime_alias_test_list): Likewise.
(vect_shift_permute_load_chain): Likewise.
(vect_supportable_dr_alignment): Likewise.
(dependence_distance_ge_vf): Take the vectorization factor as a
poly_uint64 rather than an unsigned HOST_WIDE_INT.
(vect_analyze_data_refs): Change min_vf from an int * to a
poly_uint64 *.
* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take
vfm1 as a poly_uint64 rather than an int. Make the same change
for the returned bound_scalar.
(vect_gen_vector_loop_niters): Handle polynomial vf.
(vect_do_peeling): Likewise. Update call to
vect_gen_scalar_loop_niters and handle polynomial bound_scalars.
(vect_gen_vector_loop_niters_mult_vf): Assert that the vf must
be constant.
* tree-vect-loop.c (vect_determine_vectorization_factor)
(vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf.
(vect_get_known_peeling_cost): Likewise.
(vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise.
(vect_worthwhile_without_simd_p, vectorizable_induction): Likewise.
(vect_transform_loop): Likewise. Use the lowest possible VF when
updating the upper bounds of the loop.
(vect_min_worthwhile_factor): Make static. Return an unsigned int
rather than an int.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with
polynomial unroll factors.
(vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise.
(vect_make_slp_decision): Likewise.
(vect_supported_load_permutation_p): Likewise, and polynomial
vf too.
(vect_analyze_slp_cost): Handle polynomial vf.
(vect_slp_analyze_node_operations): Likewise.
(vect_slp_analyze_bb_1): Likewise.
(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
than an unsigned HOST_WIDE_INT.
* tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store)
(vectorizable_load): Handle polynomial vf.
* tree-vectorizer.c (simduid_to_vf::vf): Change from an int to
a poly_uint64.
(adjust_simduid_builtins, shrink_simd_arrays): Update accordingly.
gcc/testsuite/
* gcc.dg/vect-opt-info-1.c: New test.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256126
|
|
This patch makes users of vec_perm_builders use the compressed encoding
where possible. This means that they work with variable-length vectors.
2018-01-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* optabs.c (expand_vec_perm_var): Use an explicit encoding for
the broadcast of the low byte.
(expand_mult_highpart): Use an explicit encoding for the permutes.
* optabs-query.c (can_mult_highpart_p): Likewise.
* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Likewise.
* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
(vectorizable_bswap): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Use an
explicit encoding for the power-of-2 permutes.
(vect_permute_store_chain): Likewise.
(vect_grouped_load_supported): Likewise.
(vect_permute_load_chain): Likewise.
From-SVN: r256097
|
|
This patch changes vec_perm_indices from a plain vec<> to a class
that stores a canonicalized permutation, using the same encoding
as for VECTOR_CSTs. This means that vec_perm_indices now carries
information about the number of vectors being permuted (currently
always 1 or 2) and the number of elements in each input vector.
A new vec_perm_builder class is used to actually build up the vector,
like tree_vector_builder does for trees. vec_perm_indices is the
completed representation, a bit like VECTOR_CST is for trees.
The patch just does a mechanical conversion of the code to
vec_perm_builder: a later patch uses explicit encodings where possible.
The point of all this is that it makes the representation suitable
for variable-length vectors. It's no longer necessary for the
underlying vec<>s to store every element explicitly.
In int-vector-builder.h, "using the same encoding as tree and rtx constants"
describes the endpoint -- adding the rtx encoding comes later.
2018-01-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* int-vector-builder.h: New file.
* vec-perm-indices.h: Include int-vector-builder.h.
(vec_perm_indices): Redefine as an int_vector_builder.
(auto_vec_perm_indices): Delete.
(vec_perm_builder): Redefine as a stand-alone class.
(vec_perm_indices::vec_perm_indices): New function.
(vec_perm_indices::clamp): Likewise.
* vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h.
(vec_perm_indices::new_vector): New function.
(vec_perm_indices::new_expanded_vector): Update for new
vec_perm_indices class.
(vec_perm_indices::rotate_inputs): New function.
(vec_perm_indices::all_in_range_p): Operate directly on the
encoded form, without computing elided elements.
(tree_to_vec_perm_builder): Operate directly on the VECTOR_CST
encoding. Update for new vec_perm_indices class.
* optabs.c (expand_vec_perm_const): Create a vec_perm_indices for
the given vec_perm_builder.
(expand_vec_perm_var): Update vec_perm_builder constructor.
(expand_mult_highpart): Use vec_perm_builder instead of
auto_vec_perm_indices.
* optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices. Use a single
or double series encoding as appropriate.
* fold-const.c (fold_ternary_loc): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
(vect_permute_store_chain): Likewise.
(vect_grouped_load_supported): Likewise.
(vect_permute_load_chain): Likewise.
(vect_shift_permute_load_chain): Likewise.
* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Likewise.
* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
(vectorizable_mask_load_store): Likewise.
(vectorizable_bswap): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
* tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices. Use
tree_to_vec_perm_builder to read the vector from a tree.
* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a
vec_perm_builder instead of a vec_perm_indices.
(have_whole_vector_shift): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices. Leave the
truncation to calc_vec_perm_mask_for_shift.
(vect_create_epilog_for_reduction): Likewise.
* config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change
from auto_vec_perm_indices to vec_perm_indices.
(aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm
instead of changing individual elements.
(aarch64_vectorize_vec_perm_const): Use new_vector to install
the vector in d.perm.
* config/arm/arm.c (expand_vec_perm_d::perm): Change
from auto_vec_perm_indices to vec_perm_indices.
(arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm
instead of changing individual elements.
(arm_vectorize_vec_perm_const): Use new_vector to install
the vector in d.perm.
* config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even):
Update vec_perm_builder constructor.
(rs6000_expand_interleave): Likewise.
* config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise.
(rs6000_expand_interleave): Likewise.
From-SVN: r256095
|
|
One of the changes needed for variable-length VEC_PERM_EXPRs -- and for
long fixed-length VEC_PERM_EXPRs -- is the ability to use constant
selectors that wouldn't fit in the vectors being permuted. E.g. a
permute on two V256QIs can't be done using a V256QI selector.
At the moment constant permutes use two interfaces:
targetm.vectorizer.vec_perm_const_ok for testing whether a permute is
valid and the vec_perm_const optab for actually emitting the permute.
The former gets passed a vec<> selector and the latter an rtx selector.
Most ports share a lot of code between the hook and the optab, with a
wrapper function for each interface.
We could try to keep that interface and require ports to define wider
vector modes that could be attached to the CONST_VECTOR (e.g. V256HI or
V256SI in the example above). But building a CONST_VECTOR rtx seems a bit
pointless here, since the expand code only creates the CONST_VECTOR in
order to call the optab, and the first thing the target does is take
the CONST_VECTOR apart again.
The easiest approach therefore seemed to be to remove the optab and
reuse the target hook to emit the code. One potential drawback is that
it's no longer possible to use match_operand predicates to force
operands into the required form, but in practice all targets want
register operands anyway.
The patch also changes vec_perm_indices into a class that provides
some simple routines for handling permutations. A later patch will
flesh this out and get rid of auto_vec_perm_indices, but I didn't
want to do all that in this patch and make it more complicated than
it already is.
2018-01-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* Makefile.in (OBJS): Add vec-perm-indices.o.
* vec-perm-indices.h: New file.
* vec-perm-indices.c: Likewise.
* target.h (vec_perm_indices): Replace with a forward class
declaration.
(auto_vec_perm_indices): Move to vec-perm-indices.h.
* optabs.h: Include vec-perm-indices.h.
(expand_vec_perm): Delete.
(selector_fits_mode_p, expand_vec_perm_var): Declare.
(expand_vec_perm_const): Declare.
* target.def (vec_perm_const_ok): Replace with...
(vec_perm_const): ...this new hook.
* doc/tm.texi.in (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Replace with...
(TARGET_VECTORIZE_VEC_PERM_CONST): ...this new hook.
* doc/tm.texi: Regenerate.
* optabs.def (vec_perm_const): Delete.
* doc/md.texi (vec_perm_const): Likewise.
(vec_perm): Refer to TARGET_VECTORIZE_VEC_PERM_CONST.
* expr.c (expand_expr_real_2): Use expand_vec_perm_const rather than
expand_vec_perm for constant permutation vectors. Assert that
the mode of variable permutation vectors is the integer equivalent
of the mode that is being permuted.
* optabs-query.h (selector_fits_mode_p): Declare.
* optabs-query.c: Include vec-perm-indices.h.
(selector_fits_mode_p): New function.
(can_vec_perm_const_p): Check whether targetm.vectorize.vec_perm_const
is defined, instead of checking whether the vec_perm_const_optab
exists. Use targetm.vectorize.vec_perm_const instead of
targetm.vectorize.vec_perm_const_ok. Check whether the indices
fit in the vector mode before using a variable permute.
* optabs.c (shift_amt_for_vec_perm_mask): Take a mode and a
vec_perm_indices instead of an rtx.
(expand_vec_perm): Replace with...
(expand_vec_perm_const): ...this new function. Take the selector
as a vec_perm_indices rather than an rtx. Also take the mode of
the selector. Update call to shift_amt_for_vec_perm_mask.
Use targetm.vectorize.vec_perm_const instead of vec_perm_const_optab.
Use vec_perm_indices::new_expanded_vector to expand the original
selector into bytes. Check whether the indices fit in the vector
mode before using a variable permute.
(expand_vec_perm_var): Make global.
(expand_mult_highpart): Use expand_vec_perm_const.
* fold-const.c: Includes vec-perm-indices.h.
* tree-ssa-forwprop.c: Likewise.
* tree-vect-data-refs.c: Likewise.
* tree-vect-generic.c: Likewise.
* tree-vect-loop.c: Likewise.
* tree-vect-slp.c: Likewise.
* tree-vect-stmts.c: Likewise.
* config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm_const):
Delete.
* config/aarch64/aarch64-simd.md (vec_perm_const<mode>): Delete.
* config/aarch64/aarch64.c (aarch64_expand_vec_perm_const)
(aarch64_vectorize_vec_perm_const_ok): Fuse into...
(aarch64_vectorize_vec_perm_const): ...this new function.
(TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
* config/arm/arm-protos.h (arm_expand_vec_perm_const): Delete.
* config/arm/vec-common.md (vec_perm_const<mode>): Delete.
* config/arm/arm.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
(arm_expand_vec_perm_const, arm_vectorize_vec_perm_const_ok): Merge
into...
(arm_vectorize_vec_perm_const): ...this new function. Explicitly
check for NEON modes.
* config/i386/i386-protos.h (ix86_expand_vec_perm_const): Delete.
* config/i386/sse.md (VEC_PERM_CONST, vec_perm_const<mode>): Delete.
* config/i386/i386.c (ix86_expand_vec_perm_const_1): Update comment.
(ix86_expand_vec_perm_const, ix86_vectorize_vec_perm_const_ok): Merge
into...
(ix86_vectorize_vec_perm_const): ...this new function. Incorporate
the old VEC_PERM_CONST conditions.
* config/ia64/ia64-protos.h (ia64_expand_vec_perm_const): Delete.
* config/ia64/vect.md (vec_perm_const<mode>): Delete.
* config/ia64/ia64.c (ia64_expand_vec_perm_const)
(ia64_vectorize_vec_perm_const_ok): Merge into...
(ia64_vectorize_vec_perm_const): ...this new function.
* config/mips/loongson.md (vec_perm_const<mode>): Delete.
* config/mips/mips-msa.md (vec_perm_const<mode>): Delete.
* config/mips/mips-ps-3d.md (vec_perm_constv2sf): Delete.
* config/mips/mips-protos.h (mips_expand_vec_perm_const): Delete.
* config/mips/mips.c (mips_expand_vec_perm_const)
(mips_vectorize_vec_perm_const_ok): Merge into...
(mips_vectorize_vec_perm_const): ...this new function.
* config/powerpcspe/altivec.md (vec_perm_constv16qi): Delete.
* config/powerpcspe/paired.md (vec_perm_constv2sf): Delete.
* config/powerpcspe/spe.md (vec_perm_constv2si): Delete.
* config/powerpcspe/vsx.md (vec_perm_const<mode>): Delete.
* config/powerpcspe/powerpcspe-protos.h (altivec_expand_vec_perm_const)
(rs6000_expand_vec_perm_const): Delete.
* config/powerpcspe/powerpcspe.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK):
Delete.
(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
(altivec_expand_vec_perm_const_le): Take each operand individually.
Operate on constant selectors rather than rtxes.
(altivec_expand_vec_perm_const): Likewise. Update call to
altivec_expand_vec_perm_const_le.
(rs6000_expand_vec_perm_const): Delete.
(rs6000_vectorize_vec_perm_const_ok): Delete.
(rs6000_vectorize_vec_perm_const): New function.
(rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
an element count and rtx array.
(rs6000_expand_extract_even): Update call accordingly.
(rs6000_expand_interleave): Likewise.
* config/rs6000/altivec.md (vec_perm_constv16qi): Delete.
* config/rs6000/paired.md (vec_perm_constv2sf): Delete.
* config/rs6000/vsx.md (vec_perm_const<mode>): Delete.
* config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_const)
(rs6000_expand_vec_perm_const): Delete.
* config/rs6000/rs6000.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
(altivec_expand_vec_perm_const_le): Take each operand individually.
Operate on constant selectors rather than rtxes.
(altivec_expand_vec_perm_const): Likewise. Update call to
altivec_expand_vec_perm_const_le.
(rs6000_expand_vec_perm_const): Delete.
(rs6000_vectorize_vec_perm_const_ok): Delete.
(rs6000_vectorize_vec_perm_const): New function. Remove stray
reference to the SPE evmerge intructions.
(rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
an element count and rtx array.
(rs6000_expand_extract_even): Update call accordingly.
(rs6000_expand_interleave): Likewise.
* config/sparc/sparc.md (vec_perm_constv8qi): Delete in favor of...
* config/sparc/sparc.c (sparc_vectorize_vec_perm_const): ...this
new function.
(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
From-SVN: r256093
|
|
This patch splits can_vec_perm_p into two functions: can_vec_perm_var_p
for testing permute operations with variable selection vectors, and
can_vec_perm_const_p for testing permute operations with specific
constant selection vectors. This means that we can pass the constant
selection vector by reference.
Constant permutes can still use a variable permute as a fallback.
A later patch adds a check to makre sure that we don't truncate the
vector indices when doing this.
However, have_whole_vector_shift checked:
if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing)
return false;
which had the effect of disallowing the fallback to variable permutes.
I'm not sure whether that was the intention or whether it was just
supposed to short-cut the loop on targets that don't support permutes.
(But then why bother? The first check in the loop would fail and
we'd bail out straightaway.)
The patch adds a parameter for disallowing the fallback. I think it
makes sense to do this for the following code in the VEC_PERM_EXPR
folder:
/* Some targets are deficient and fail to expand a single
argument permutation while still allowing an equivalent
2-argument version. */
if (need_mask_canon && arg2 == op2
&& !can_vec_perm_p (TYPE_MODE (type), false, &sel)
&& can_vec_perm_p (TYPE_MODE (type), false, &sel2))
since it's really testing whether the expand_vec_perm_const code expects
a particular form.
2018-01-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* optabs-query.h (can_vec_perm_p): Delete.
(can_vec_perm_var_p, can_vec_perm_const_p): Declare.
* optabs-query.c (can_vec_perm_p): Split into...
(can_vec_perm_var_p, can_vec_perm_const_p): ...these two functions.
(can_mult_highpart_p): Use can_vec_perm_const_p to test whether a
particular selector is valid.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
(vect_grouped_load_supported): Likewise.
(vect_shift_permute_load_chain): Likewise.
* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
(vect_transform_slp_perm_load): Likewise.
* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
(vectorizable_bswap): Likewise.
(vect_gen_perm_mask_checked): Likewise.
* fold-const.c (fold_ternary_loc): Likewise. Don't take
implementations of variable permutation vectors into account
when deciding which selector to use.
* tree-vect-loop.c (have_whole_vector_shift): Don't check whether
vec_perm_const_optab is supported; instead use can_vec_perm_const_p
with a false third argument.
* tree-vect-generic.c (lower_vec_perm): Use can_vec_perm_const_p
to test whether the constant selector is valid and can_vec_perm_var_p
to test whether a variable selector is valid.
From-SVN: r256091
|
|
This patch makes vect_compute_data_ref_alignment treat DR_INIT as a
poly_int and handles cases in which the calculated misalignment might
not be constant.
2017-12-21 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
Treat drb->init as a poly_int. Fail if its misalignment wrt
vector_alignment isn't known.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255935
|
|
This patch allows MEM_REF offsets to be polynomial, with mem_ref_offset
now returning a poly_offset_int instead of an offset_int. The
non-mechanical changes to callers of mem_ref_offset were handled by
previous patches.
2017-12-21 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* fold-const.h (mem_ref_offset): Return a poly_offset_int rather
than an offset_int.
* tree.c (mem_ref_offset): Likewise.
(build_simple_mem_ref_loc): Treat MEM_REF offsets as poly_ints.
* builtins.c (get_object_alignment_2): Likewise.
* expr.c (get_inner_reference, expand_expr_real_1): Likewise.
* gimple-fold.c (get_base_constructor): Likewise.
* gimple-ssa-strength-reduction.c (restructure_reference): Likewise.
* gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref):
Likewise.
* ipa-polymorphic-call.c
(ipa_polymorphic_call_context::ipa_polymorphic_call_context): Likewise.
* ipa-prop.c (compute_complex_assign_jump_func): Likewise.
(get_ancestor_addr_info): Likewise.
* ipa-param-manipulation.c (ipa_get_adjustment_candidate): Likewise.
* match.pd: Likewise.
* tree-data-ref.c (dr_analyze_innermost): Likewise.
* tree-dfa.c (get_addr_base_and_unit_offset_1): Likewise.
* tree-eh.c (tree_could_trap_p): Likewise.
* tree-object-size.c (addr_object_size): Likewise.
* tree-ssa-address.c (copy_ref_info): Likewise.
* tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Likewise.
(indirect_refs_may_alias_p): Likewise.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
* tree-ssa.c (maybe_rewrite_mem_ref_base): Likewise.
(non_rewritable_mem_ref_base): Likewise.
* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
* tree-vrp.c (vrp_prop::check_array_ref): Likewise.
* varasm.c (decode_addr_const): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255930
|
|
This patch makes get_inner_reference and ptr_difference_const return the
bit size and bit position as poly_int64s rather than HOST_WIDE_INTS.
The non-mechanical changes were handled by previous patches.
2017-12-21 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree.h (get_inner_reference): Return the bitsize and bitpos
as poly_int64_pods rather than HOST_WIDE_INT.
* fold-const.h (ptr_difference_const): Return the pointer difference
as a poly_int64_pod rather than a HOST_WIDE_INT.
* expr.c (get_inner_reference): Return the bitsize and bitpos
as poly_int64_pods rather than HOST_WIDE_INT.
(expand_expr_addr_expr_1, expand_expr_real_1): Track polynomial
offsets and sizes.
* fold-const.c (make_bit_field_ref): Take the bitpos as a poly_int64
rather than a HOST_WIDE_INT. Update call to get_inner_reference.
(optimize_bit_field_compare): Update call to get_inner_reference.
(decode_field_reference): Likewise.
(fold_unary_loc): Track polynomial offsets and sizes.
(split_address_to_core_and_offset): Return the bitpos as a
poly_int64_pod rather than a HOST_WIDE_INT.
(ptr_difference_const): Likewise for the pointer difference.
* asan.c (instrument_derefs): Track polynomial offsets and sizes.
* config/mips/mips.c (r10k_safe_mem_expr_p): Likewise.
* dbxout.c (dbxout_expand_expr): Likewise.
* dwarf2out.c (loc_list_for_address_of_addr_expr_of_indirect_ref)
(loc_list_from_tree_1, fortran_common): Likewise.
* gimple-laddress.c (pass_laddress::execute): Likewise.
* gimple-ssa-store-merging.c (find_bswap_or_nop_load): Likewise.
* gimplify.c (gimplify_scan_omp_clauses): Likewise.
* simplify-rtx.c (delegitimize_mem_from_attrs): Likewise.
* tree-affine.c (tree_to_aff_combination): Likewise.
(get_inner_reference_aff): Likewise.
* tree-data-ref.c (split_constant_offset_1): Likewise.
(dr_analyze_innermost): Likewise.
* tree-scalar-evolution.c (interpret_rhs_expr): Likewise.
* tree-sra.c (ipa_sra_check_caller): Likewise.
* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
* ubsan.c (maybe_instrument_pointer_overflow): Likewise.
(instrument_bool_enum_load, instrument_object_size): Likewise.
* gimple-ssa-strength-reduction.c (slsr_process_ref): Update call
to get_inner_reference.
* hsa-gen.c (gen_hsa_addr): Likewise.
* sanopt.c (maybe_optimize_ubsan_ptr_ifn): Likewise.
* tsan.c (instrument_expr): Likewise.
* match.pd: Update call to ptr_difference_const.
gcc/ada/
* gcc-interface/trans.c (Attribute_to_gnu): Track polynomial
offsets and sizes.
* gcc-interface/utils2.c (build_unary_op): Likewise.
gcc/cp/
* constexpr.c (check_automatic_or_tls): Track polynomial
offsets and sizes.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255914
|
|
This patch adds a tree representation for poly_ints. Unlike the
rtx version, the coefficients are INTEGER_CSTs rather than plain
integers, so that we can easily access them as poly_widest_ints
and poly_offset_ints.
The patch also adjusts some places that previously
relied on "constant" meaning "INTEGER_CST". It also makes
sure that the TYPE_SIZE agrees with the TYPE_SIZE_UNIT for
vector booleans, given the existing:
/* Several boolean vector elements may fit in a single unit. */
if (VECTOR_BOOLEAN_TYPE_P (type)
&& type->type_common.mode != BLKmode)
TYPE_SIZE_UNIT (type)
= size_int (GET_MODE_SIZE (type->type_common.mode));
else
TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR,
TYPE_SIZE_UNIT (innertype),
size_int (nunits));
2017-12-20 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/generic.texi (POLY_INT_CST): Document.
* tree.def (POLY_INT_CST): New tree code.
* treestruct.def (TS_POLY_INT_CST): New tree layout.
* tree-core.h (tree_poly_int_cst): New struct.
(tree_node): Add a poly_int_cst field.
* tree.h (POLY_INT_CST_P, POLY_INT_CST_COEFF): New macros.
(wide_int_to_tree, force_fit_type): Take a poly_wide_int_ref
instead of a wide_int_ref.
(build_int_cst, build_int_cst_type): Take a poly_int64 instead
of a HOST_WIDE_INT.
(build_int_cstu, build_array_type_nelts): Take a poly_uint64
instead of an unsigned HOST_WIDE_INT.
(build_poly_int_cst, tree_fits_poly_int64_p, tree_fits_poly_uint64_p)
(ptrdiff_tree_p): Declare.
(tree_to_poly_int64, tree_to_poly_uint64): Likewise. Provide
extern inline implementations if the target doesn't use POLY_INT_CST.
(poly_int_tree_p): New function.
(wi::unextended_tree): New class.
(wi::int_traits <unextended_tree>): New override.
(wi::extended_tree): Add a default constructor.
(wi::extended_tree::get_tree): New function.
(wi::widest_extended_tree, wi::offset_extended_tree): New typedefs.
(wi::tree_to_widest_ref, wi::tree_to_offset_ref): Use them.
(wi::tree_to_poly_widest_ref, wi::tree_to_poly_offset_ref)
(wi::tree_to_poly_wide_ref): New typedefs.
(wi::ints_for): Provide overloads for extended_tree and
unextended_tree.
(poly_int_cst_value, wi::to_poly_widest, wi::to_poly_offset)
(wi::to_wide): New functions.
(wi::fits_to_boolean_p, wi::fits_to_tree_p): Handle poly_ints.
* tree.c (poly_int_cst_hasher): New struct.
(poly_int_cst_hash_table): New variable.
(tree_node_structure_for_code, tree_code_size, simple_cst_equal)
(valid_constant_size_p, add_expr, drop_tree_overflow): Handle
POLY_INT_CST.
(initialize_tree_contains_struct): Handle TS_POLY_INT_CST.
(init_ttree): Initialize poly_int_cst_hash_table.
(build_int_cst, build_int_cst_type, build_invariant_address): Take
a poly_int64 instead of a HOST_WIDE_INT.
(build_int_cstu, build_array_type_nelts): Take a poly_uint64
instead of an unsigned HOST_WIDE_INT.
(wide_int_to_tree): Rename to...
(wide_int_to_tree_1): ...this.
(build_new_poly_int_cst, build_poly_int_cst): New functions.
(force_fit_type): Take a poly_wide_int_ref instead of a wide_int_ref.
(wide_int_to_tree): New function that takes a poly_wide_int_ref.
(ptrdiff_tree_p, tree_to_poly_int64, tree_to_poly_uint64)
(tree_fits_poly_int64_p, tree_fits_poly_uint64_p): New functions.
* lto-streamer-out.c (DFS::DFS_write_tree_body, hash_tree): Handle
TS_POLY_INT_CST.
* tree-streamer-in.c (lto_input_ts_poly_tree_pointers): Likewise.
(streamer_read_tree_body): Likewise.
* tree-streamer-out.c (write_ts_poly_tree_pointers): Likewise.
(streamer_write_tree_body): Likewise.
* tree-streamer.c (streamer_check_handled_ts_structures): Likewise.
* asan.c (asan_protect_global): Require the size to be an INTEGER_CST.
* cfgexpand.c (expand_debug_expr): Handle POLY_INT_CST.
* expr.c (expand_expr_real_1, const_vector_from_tree): Likewise.
* gimple-expr.h (is_gimple_constant): Likewise.
* gimplify.c (maybe_with_size_expr): Likewise.
* print-tree.c (print_node): Likewise.
* tree-data-ref.c (data_ref_compare_tree): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-ssa-address.c (addr_for_mem_ref): Likewise.
* tree-vect-data-refs.c (dr_group_sort_cmp): Likewise.
* tree-vrp.c (compare_values_warnv): Likewise.
* tree-ssa-loop-ivopts.c (determine_base_object, constant_multiple_of)
(get_loop_invariant_expr, add_candidate_1, get_computation_aff_1)
(force_expr_to_var_cost): Likewise.
* tree-ssa-loop.c (for_each_index): Likewise.
* fold-const.h (build_invariant_address, size_int_kind): Take a
poly_int64 instead of a HOST_WIDE_INT.
* fold-const.c (fold_negate_expr_1, const_binop, const_unop)
(fold_convert_const, multiple_of_p, fold_negate_const): Handle
POLY_INT_CST.
(size_binop_loc): Likewise. Allow int_const_binop_1 to fail.
(int_const_binop_2): New function, split out from...
(int_const_binop_1): ...here. Handle POLY_INT_CST.
(size_int_kind): Take a poly_int64 instead of a HOST_WIDE_INT.
* expmed.c (make_tree): Handle CONST_POLY_INT_P.
* gimple-ssa-strength-reduction.c (slsr_process_add)
(slsr_process_mul): Check for INTEGER_CSTs before using them
as candidates.
* stor-layout.c (bits_from_bytes): New function.
(bit_from_pos): Use it.
(layout_type): Likewise. For vectors, multiply the TYPE_SIZE_UNIT
by BITS_PER_UNIT to get the TYPE_SIZE.
* tree-cfg.c (verify_expr, verify_types_in_gimple_reference): Allow
MEM_REF and TARGET_MEM_REF offsets to be a POLY_INT_CST.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255863
|
|
(generic tuning) after r255268 by missed SLP oppurtunity)
2017-12-01 Richard Biener <rguenther@suse.de>
PR tree-optimization/83232
* tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Fix
detection of same access. Instead of breaking the group here
do not consider the duplicate. Add comment explaining real fix.
* gfortran.dg/vect/pr83232.f90: New testcase.
From-SVN: r255307
|
|
output: 1 in vect_analyze_data_ref_accesses)
2017-10-06 Richard Biener <rguenther@suse.de>
PR tree-optimization/82397
* tree-vect-data-refs.c (dr_group_sort_cmp): Do not use
operand_equal_p but rely on data_ref_compare_tree for detecting
equalities.
(vect_analyze_data_ref_accesses): Use data_ref_compare_tree
to match up with dr_group_sort_cmp.
* gfortran.dg/pr82397.f: New testcase.
From-SVN: r253482
|
|
This PR shows that we weren't filtering out irrelevant stmts in
vect_get_peeling_costs_all_drs (unlike related loops in which
we iterate over all datarefs).
2017-09-22 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/82289
* tree-vect-data-refs.c (vect_get_peeling_costs_all_drs): Check
STMT_VINFO_RELEVANT_P.
gcc/testsuite/
PR tree-optimization/82289
* gcc.dg/vect/pr82289.c: New test.
From-SVN: r253103
|
|
The vectoriser aligned vectors to TYPE_ALIGN unconditionally, although
there was also a hard-coded assumption that this was equal to the type
size. This was inconvenient for SVE for two reasons:
- When compiling for a specific power-of-2 SVE vector length, we might
want to align to a full vector. However, the TYPE_ALIGN is governed
by the ABI alignment, which is 128 bits regardless of size.
- For vector-length-agnostic code it doesn't usually make sense to align,
since the runtime vector length might not be a power of two. Even for
power of two sizes, there's no guarantee that aligning to the previous
16 bytes will be an improveent.
This patch therefore adds a target hook to control the preferred
vectoriser (as opposed to ABI) alignment.
2017-09-22 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* target.def (preferred_vector_alignment): New hook.
* doc/tm.texi.in (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): New
hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_preferred_vector_alignment): Declare.
* targhooks.c (default_preferred_vector_alignment): New function.
* tree-vectorizer.h (dataref_aux): Add a target_alignment field.
Expand commentary.
(DR_TARGET_ALIGNMENT): New macro.
(aligned_access_p): Update commentary.
(vect_known_alignment_in_bytes): New function.
* tree-vect-data-refs.c (vect_calculate_required_alignment): New
function.
(vect_compute_data_ref_alignment): Set DR_TARGET_ALIGNMENT.
Calculate the misalignment based on the target alignment rather than
the vector size.
(vect_update_misalignment_for_peel): Use DR_TARGET_ALIGMENT
rather than TYPE_ALIGN / BITS_PER_UNIT to update the misalignment.
(vect_enhance_data_refs_alignment): Mask the byte misalignment with
the target alignment, rather than masking the element misalignment
with the number of elements in a vector. Also use the target
alignment when calculating the maximum number of peels.
(vect_find_same_alignment_drs): Use vect_calculate_required_alignment
instead of TYPE_ALIGN_UNIT.
(vect_duplicate_ssa_name_ptr_info): Remove stmt_info parameter.
Measure DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT.
(vect_create_addr_base_for_vector_ref): Update call accordingly.
(vect_create_data_ref_ptr): Likewise.
(vect_setup_realignment): Realign by ANDing with
-DR_TARGET_MISALIGNMENT.
* tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Calculate
the number of peels based on DR_TARGET_ALIGNMENT.
* tree-vect-stmts.c (get_group_load_store_type): Compare the gap
with the guaranteed alignment boundary when deciding whether
overrun is OK.
(vectorizable_mask_load_store): Interpret DR_MISALIGNMENT
relative to DR_TARGET_ALIGNMENT instead of TYPE_ALIGN_UNIT.
(ensure_base_align): Remove stmt_info parameter. Get the
target base alignment from DR_TARGET_ALIGNMENT.
(vectorizable_store): Update call accordingly. Interpret
DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT instead of
TYPE_ALIGN_UNIT.
(vectorizable_load): Likewise.
gcc/testsuite/
* gcc.dg/vect/vect-outer-3a.c: Adjust dump scan for new wording
of alignment message.
* gcc.dg/vect/vect-outer-3a-big-array.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r253101
|
|
This patch adds a helper function for getting the number of bytes
accessed by an unvectorised data reference, which helps when general
modes have a variable size.
2017-09-22 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vect_get_scalar_dr_size): New function.
* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use it.
(vect_enhance_data_refs_alignment): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r253099
|
|
The vectoriser was calling vect_get_smallest_scalar_type without
having proven that the type actually is a scalar. This seems to
be the intended behaviour: the ultimate test of whether the type
is interesting (and hence scalar) is whether an associated vector
type exists, but this is only tested later.
The patch simply makes the function cope gracefully with non-scalar
inputs.
2017-09-18 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-data-refs.c (vect_get_smallest_scalar_type): Cope
with types that aren't in fact scalar.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r252934
|
|
Epilogue vectorisation uses the vectorisation factor of the main loop
as the maximum vectorisation factor allowed for correctness. That makes
sense as a conservatively correct value, since the chosen vectorisation
factor will be strictly less than that anyway. However, once the VF
itself becomes variable, it's easier to carry across the original
maximum VF instead.
2017-09-14 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (_loop_vec_info): Add max_vectorization_factor.
(LOOP_VINFO_MAX_VECT_FACTOR): New macro.
(LOOP_VINFO_ORIG_VECT_FACTOR): Replace with...
(LOOP_VINFO_ORIG_MAX_VECT_FACTOR): ...this new macro.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependences): Update
accordingly.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
max_vectorization_factor.
(vect_analyze_loop_2): Set LOOP_VINFO_MAX_VECT_FACTOR.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r252766
|
|
This patch adds a vectoriser helper routine to calculate how
many copies of a vector statement we need. At present this
is always:
LOOP_VINFO_VECT_FACTOR (loop_vinfo) / TYPE_VECTOR_SUBPARTS (vectype)
but later patches add other cases. Another benefit of using
a helper routine is that it can assert that the division is
exact (which it must be).
2017-09-14 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vect_get_num_copies): New function.
* tree-vect-data-refs.c (vect_get_data_access_cost): Use it.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Likewise.
(vectorizable_bswap): Likewise.
(vectorizable_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(vect_analyze_stmt): Pass the slp node to vectorizable_live_operation.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r252764
|
|
This patch makes can_vec_perm_p & co. take a vec<>, wrapped in new
typedefs vec_perm_indices and auto_vec_perm_indices. There are two
reasons for doing this for SVE:
(1) it means that the number of elements is bundled with the elements
themselves, and is obviously constant.
(2) it makes it easier to change the "unsigned char" element type to
something wider.
Changing the target hook is left as follow-on work.
2017-09-14 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* target.h (vec_perm_indices): New typedef.
(auto_vec_perm_indices): Likewise.
* optabs-query.h: Include target.h
(can_vec_perm_p): Take a vec_perm_indices *.
* optabs-query.c (can_vec_perm_p): Likewise.
(can_mult_highpart_p): Update accordingly. Use auto_vec_perm_indices.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-vect-generic.c (lower_vec_perm): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
(vect_grouped_load_supported): Likewise.
(vect_shift_permute_load_chain): Likewise.
(vect_permute_store_chain): Use auto_vec_perm_indices.
(vect_permute_load_chain): Likewise.
* fold-const.c (fold_vec_perm): Take vec_perm_indices.
(fold_ternary_loc): Update accordingly. Use auto_vec_perm_indices.
Update uses of can_vec_perm_p.
* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Replace the
mode with a number of elements. Take a vec_perm_indices *.
(vect_create_epilog_for_reduction): Update accordingly.
Use auto_vec_perm_indices.
(have_whole_vector_shift): Likewise. Update call to can_vec_perm_p.
* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Use auto_vec_perm_indices.
* tree-vectorizer.h (vect_gen_perm_mask_any): Take a vec_perm_indices.
(vect_gen_perm_mask_checked): Likewise.
* tree-vect-stmts.c (vect_gen_perm_mask_any): Take a vec_perm_indices.
(vect_gen_perm_mask_checked): Likewise.
(vectorizable_mask_load_store): Use auto_vec_perm_indices.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(perm_mask_for_reverse): Likewise. Update call to can_vec_perm_p.
(vectorizable_bswap): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r252761
|
|
This patch adds a wrapper around mode_for_size for cases in which
the mode class is MODE_INT (the commonest case). The return type
can then be an opt_scalar_int_mode instead of a machine_mode.
2017-08-30 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (int_mode_for_size): New function.
* builtins.c (set_builtin_user_assembler_name): Use int_mode_for_size
instead of mode_for_size.
* calls.c (save_fixed_argument_area): Likewise. Make use of BLKmode
explicit.
* combine.c (expand_field_assignment): Use int_mode_for_size
instead of mode_for_size.
(make_extraction): Likewise.
(simplify_shift_const_1): Likewise.
(simplify_comparison): Likewise.
* dojump.c (do_jump): Likewise.
* dwarf2out.c (mem_loc_descriptor): Likewise.
* emit-rtl.c (init_derived_machine_modes): Likewise.
* expmed.c (flip_storage_order): Likewise.
(convert_extracted_bit_field): Likewise.
* expr.c (copy_blkmode_from_reg): Likewise.
* graphite-isl-ast-to-gimple.c (max_mode_int_precision): Likewise.
* internal-fn.c (expand_mul_overflow): Likewise.
* lower-subreg.c (simple_move): Likewise.
* optabs-libfuncs.c (init_optabs): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
* tree.c (vector_type_mode): Likewise.
* tree-ssa-strlen.c (handle_builtin_memcmp): Likewise.
* tree-vect-data-refs.c (vect_lanes_optab_supported_p): Likewise.
* tree-vect-generic.c (expand_vector_parallel): Likewise.
* tree-vect-stmts.c (vectorizable_load): Likewise.
(vectorizable_store): Likewise.
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity): Use int_mode_for_size
instead of mode_for_size.
(gnat_to_gnu_subprog_type): Likewise.
* gcc-interface/utils.c (make_type_from_size): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r251469
|
|
This patch is a follow-on to the fix for PR81136. The testcase for that
PR shows that we can (correctly) calculate different base alignments
for two data_references but still tell that their misalignments wrt the
vector size are equal. This is because we calculate the base alignments
for each dr individually, without looking at the other drs, and in
general the alignment we calculate is only guaranteed if the dr's DR_REF
actually occurs.
This is working as designed, but it does expose a missed opportunity.
We know that if a vectorised loop is reached, all statements in that
loop execute at least once, so it should be safe to pool the alignment
information for all the statements we're vectorising. The only catch is
that DR_REFs for masked loads and stores only occur if the mask value is
nonzero. For example, in:
struct s __attribute__((aligned(32))) {
int misaligner;
int array[N];
};
int *ptr;
for (int i = 0; i < n; ++i)
ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0;
we can only guarantee that ptr points to a "struct s" if at least
one c[i] is true.
This patch adds a DR_IS_CONDITIONAL_IN_STMT flag to record whether
the DR_REF is guaranteed to occur every time that the statement
executes to completion. It then pools the alignment information
for references that aren't conditional in this sense.
2017-08-04 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/81136
* tree-vectorizer.h: Include tree-hash-traits.h.
(vec_base_alignments): New typedef.
(vec_info): Add a base_alignments field.
(vect_record_base_alignments): Declare.
* tree-data-ref.h (data_reference): Add an is_conditional_in_stmt
field.
(DR_IS_CONDITIONAL_IN_STMT): New macro.
(create_data_ref): Add an is_conditional_in_stmt argument.
* tree-data-ref.c (create_data_ref): Likewise. Use it to initialize
the is_conditional_in_stmt field.
(data_ref_loc): Add an is_conditional_in_stmt field.
(get_references_in_stmt): Set the is_conditional_in_stmt field.
(find_data_references_in_stmt): Update call to create_data_ref.
(graphite_find_data_references_in_stmt): Likewise.
* tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise.
* tree-vect-data-refs.c (vect_analyze_data_refs): Likewise.
(vect_record_base_alignment): New function.
(vect_record_base_alignments): Likewise.
(vect_compute_data_ref_alignment): Adjust base_addr and aligned_to
for nested statements even if we fail to compute a misalignment.
Use pooled base alignments for unconditional references.
(vect_find_same_alignment_drs): Compare base addresses instead
of base objects.
(vect_analyze_data_refs_alignment): Call vect_record_base_alignments.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Likewise.
gcc/testsuite/
PR tree-optimization/81136
* gcc.dg/vect/pr81136.c: Add scan test.
From-SVN: r250870
|
|
This patch checks whether two data references x and y cannot
partially overlap and so are independent whenever &x != &y.
We can then use this in the vectoriser to optimise alias checks.
gcc/
2016-08-04 Richard Sandiford <richard.sandiford@linaro.org>
* hash-traits.h (pair_hash): New struct.
* tree-data-ref.h (data_dependence_relation): Add object_a and
object_b fields.
(DDR_OBJECT_A, DDR_OBJECT_B): New macros.
* tree-data-ref.c (initialize_data_dependence_relation): Initialize
DDR_OBJECT_A and DDR_OBJECT_B.
* tree-vectorizer.h (vec_object_pair): New type.
(_loop_vec_info): Add a check_unequal_addrs field.
(LOOP_VINFO_CHECK_UNEQUAL_ADDRS): New macro.
(LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Return true if there is an
entry in check_unequal_addrs. Check comp_alias_ddrs instead of
may_alias_ddrs.
* tree-vect-loop.c (destroy_loop_vec_info): Release
LOOP_VINFO_CHECK_UNEQUAL_ADDRS.
(vect_analyze_loop_2): Likewise, when restarting.
(vect_estimate_min_profitable_iters): Estimate the cost of
LOOP_VINFO_CHECK_UNEQUAL_ADDRS.
* tree-vect-data-refs.c: Include tree-hash-traits.h.
(vect_prune_runtime_alias_test_list): Try to handle conflicts
using LOOP_VINFO_CHECK_UNEQUAL_ADDRS, if the data dependence allows.
Count such tests in the final summary.
* tree-vect-loop-manip.c (chain_cond_expr): New function.
(vect_create_cond_for_align_checks): Use it.
(vect_create_cond_for_unequal_addrs): New function.
(vect_loop_versioning): Call it.
gcc/testsuite/
* gcc.dg/vect/vect-alias-check-6.c: New test.
From-SVN: r250868
|
|
This patch tries to calculate conservatively-correct distance
vectors for two references whose base addresses are not the same.
It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
isn't guaranteed to occur.
The motivating example is:
struct s { int x[8]; };
void
f (struct s *a, struct s *b)
{
for (int i = 0; i < 8; ++i)
a->x[i] += b->x[i];
}
in which the "a" and "b" accesses are either independent or have a
dependence distance of 0 (assuming -fstrict-aliasing). Neither case
prevents vectorisation, so we can vectorise without an alias check.
I'd originally wanted to do the same thing for arrays as well, e.g.:
void
f (int a[][8], struct b[][8])
{
for (int i = 0; i < 8; ++i)
a[0][i] += b[0][i];
}
I think this is valid because C11 6.7.6.2/6 says:
For two array types to be compatible, both shall have compatible
element types, and if both size specifiers are present, and are
integer constant expressions, then both size specifiers shall have
the same constant value.
So if we access an array through an int (*)[8], it must have type X[8]
or X[], where X is compatible with int. It doesn't seem possible in
either case for "a[0]" and "b[0]" to overlap when "a != b".
However, as the comment above "if (same_base_p)" explains, GCC is more
forgiving: it supports arbitrary overlap of arrays and allows arrays to
be accessed with different dimensionality. There are examples of this
in PR50067. The patch therefore only handles references that end in a
structure field access.
There are two ways of handling these dependences in the vectoriser:
use them to limit VF, or check at runtime as before. I've gone for
the approach of checking at runtime if we can, to avoid limiting VF
unnecessarily, but falling back to a VF cap when runtime checks aren't
allowed.
The patch tests whether we queued an alias check with a dependence
distance of X and then picked a VF <= X, in which case it's safe to
drop the alias check. Since vect_prune_runtime_alias_check_list
can be called twice with different VF for the same loop, it's no
longer safe to clear may_alias_ddrs on exit. Instead we should use
comp_alias_ddrs to check whether versioning is necessary.
2017-08-04 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-data-ref.h (subscript): Add access_fn field.
(data_dependence_relation): Add could_be_independent_p.
(SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros.
(same_access_functions): Move to tree-data-ref.c.
* tree-data-ref.c (ref_contains_union_access_p): New function.
(access_fn_component_p): Likewise.
(access_fn_components_comparable_p): Likewise.
(dr_analyze_indices): Add a reference to access_fn_component_p.
(dump_data_dependence_relation): Use SUB_ACCESS_FN instead of
DR_ACCESS_FN.
(constant_access_functions): Likewise.
(add_other_self_distances): Likewise.
(same_access_functions): Likewise. (Moved from tree-data-ref.h.)
(initialize_data_dependence_relation): Use XCNEW and remove
explicit zeroing of DDR_REVERSED_P. Look for a subsequence
of access functions that have the same type. Allow the
subsequence to end with different bases in some circumstances.
Record the chosen access functions in SUB_ACCESS_FN.
(build_classic_dist_vector_1): Replace ddr_a and ddr_b with
a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN.
(subscript_dependence_tester_1): Likewise dra and drb.
(build_classic_dist_vector): Update calls accordingly.
(subscript_dependence_tester): Likewise.
* tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check
DDR_COULD_BE_INDEPENDENT_P.
* tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test
comp_alias_ddrs instead of may_alias_ddrs.
* tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr):
New function.
(vect_analyze_data_ref_dependence): Use it if
DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded
distance vectors if that fails.
(dependence_distance_ge_vf): New function.
(vect_prune_runtime_alias_test_list): Use it. Don't clear
LOOP_VINFO_MAY_ALIAS_DDRS.
gcc/testsuite/
* gcc.dg/vect/vect-alias-check-3.c: New test.
* gcc.dg/vect/vect-alias-check-4.c: Likewise.
* gcc.dg/vect/vect-alias-check-5.c: Likewise.
From-SVN: r250867
|
|
2017-07-21 Richard Biener <rguenther@suse.de>
PR tree-optimization/81303
* tree-vect-data-refs.c (vect_get_peeling_costs_all_drs): Pass
in datarefs vector. Allow NULL dr0 for no peeling cost estimate.
(vect_peeling_hash_get_lowest_cost): Adjust.
(vect_enhance_data_refs_alignment): Likewise. Use
vect_get_peeling_costs_all_drs to compute the penalty for no
peeling to match up costs.
From-SVN: r250424
|
|
npeel was erroneously overwritten by vect_peeling_hash_get_lowest_cost
although the corresponding dataref is not used afterwards. It should
be safe to get rid of the npeel parameter since we use the returned
peeling_info's npeel anyway. Also removed the body_cost_vec parameter
which is not used elsewhere.
gcc/ChangeLog:
2017-07-18 Robin Dapp <rdapp@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Remove
body_cost_vec from _vect_peel_extended_info.
(vect_peeling_hash_get_lowest_cost): Do not set body_cost_vec.
(vect_peeling_hash_choose_best_peeling): Remove body_cost_vec and
npeel.
From-SVN: r250300
|