2018-06-01 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_dr_stmt): New function.
(vect_get_load_cost): Adjust.
(vect_get_store_cost): Likewise.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependence):
Use vect_dr_stmt instead of DR_STMT.
(vect_record_base_alignments): Likewise.
(vect_calculate_target_alignment): Likewise.
(vect_compute_data_ref_alignment): Likewise and make static.
(vect_update_misalignment_for_peel): Likewise.
(vect_verify_datarefs_alignment): Likewise.
(vector_alignment_reachable_p): Likewise.
(vect_get_data_access_cost): Likewise. Pass down
vinfo to vect_get_load_cost/vect_get_store_cost instead of DR.
(vect_get_peeling_costs_all_drs): Likewise.
(vect_peeling_hash_get_lowest_cost): Likewise.
(vect_enhance_data_refs_alignment): Likewise.
(vect_find_same_alignment_drs): Likewise.
(vect_analyze_data_refs_alignment): Likewise.
(vect_analyze_group_access_1): Likewise.
(vect_analyze_group_access): Likewise.
(vect_analyze_data_ref_access): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
(vect_vfa_segment_size): Likewise.
(vect_small_gap_p): Likewise.
(vectorizable_with_step_bound_p): Likewise.
(vect_prune_runtime_alias_test_list): Likewise.
(vect_analyze_data_refs): Likewise.
(vect_supportable_dr_alignment): Likewise.
* tree-vect-loop-manip.c (get_misalign_in_elems): Likewise.
(vect_gen_prolog_loop_niters): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
* tree-vect-patterns.c (vect_recog_bool_pattern): Do not
modify DR_STMT.
(vect_recog_mask_conversion_pattern): Likewise.
(vect_try_gather_scatter_pattern): Likewise.
* tree-vect-stmts.c (vect_model_store_cost): Pass stmt_info
to vect_get_store_cost.
(vect_get_store_cost): Get stmt_info instead of DR.
(vect_model_load_cost): Pass stmt_info to vect_get_load_cost.
(vect_get_load_cost): Get stmt_info instead of DR.
From-SVN: r261062
vectorized for AVX512DQ target)
PR target/85918
* tree.def (VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR,
VEC_PACK_FLOAT_EXPR): New tree codes.
* tree-pretty-print.c (op_code_prio): Handle
VEC_UNPACK_FIX_TRUNC_HI_EXPR and VEC_UNPACK_FIX_TRUNC_LO_EXPR.
(dump_generic_node): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR,
VEC_UNPACK_FIX_TRUNC_LO_EXPR and VEC_PACK_FLOAT_EXPR.
* tree-inline.c (estimate_operator_cost): Likewise.
* gimple-pretty-print.c (dump_binary_rhs): Handle VEC_PACK_FLOAT_EXPR.
* fold-const.c (const_binop): Likewise.
(const_unop): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR and
VEC_UNPACK_FIX_TRUNC_LO_EXPR.
* tree-cfg.c (verify_gimple_assign_unary): Likewise.
(verify_gimple_assign_binary): Handle VEC_PACK_FLOAT_EXPR.
* cfgexpand.c (expand_debug_expr): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR,
VEC_UNPACK_FIX_TRUNC_LO_EXPR and VEC_PACK_FLOAT_EXPR.
* expr.c (expand_expr_real_2): Likewise.
* optabs.def (vec_packs_float_optab, vec_packu_float_optab,
vec_unpack_sfix_trunc_hi_optab, vec_unpack_sfix_trunc_lo_optab,
vec_unpack_ufix_trunc_hi_optab, vec_unpack_ufix_trunc_lo_optab): New
optabs.
* optabs.c (expand_widen_pattern_expr): For
VEC_UNPACK_FIX_TRUNC_HI_EXPR and VEC_UNPACK_FIX_TRUNC_LO_EXPR use
sign from result type rather than operand's type.
(expand_binop_directly): For vec_packu_float_optab and
vec_packs_float_optab allow result type to be different from operand's
type.
* optabs-tree.c (optab_for_tree_code): Handle
VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR and
VEC_PACK_FLOAT_EXPR. Formatting fixes.
* tree-vect-generic.c (expand_vector_operations_1): Handle
VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR and
VEC_PACK_FLOAT_EXPR.
* tree-vect-stmts.c (supportable_widening_operation): Handle
FIX_TRUNC_EXPR.
(supportable_narrowing_operation): Handle FLOAT_EXPR.
* config/i386/i386.md (fixprefix, floatprefix): New code attributes.
* config/i386/sse.md (*float<floatunssuffix>v2div2sf2): Rename to ...
(float<floatunssuffix>v2div2sf2): ... this. Formatting fix.
(vpckfloat_concat_mode, vpckfloat_temp_mode, vpckfloat_op_mode): New
mode attributes.
(vec_pack<floatprefix>_float_<mode>): New expander.
(vunpckfixt_mode, vunpckfixt_model, vunpckfixt_extract_mode): New mode
attributes.
(vec_unpack_<fixprefix>fix_trunc_lo_<mode>,
vec_unpack_<fixprefix>fix_trunc_hi_<mode>): New expanders.
* doc/md.texi (vec_packs_float_@var{m}, vec_packu_float_@var{m},
vec_unpack_sfix_trunc_hi_@var{m}, vec_unpack_sfix_trunc_lo_@var{m},
vec_unpack_ufix_trunc_hi_@var{m}, vec_unpack_ufix_trunc_lo_@var{m}):
Document.
* doc/generic.texi (VEC_UNPACK_FLOAT_HI_EXPR,
VEC_UNPACK_FLOAT_LO_EXPR): Fix pasto in description.
(VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR,
VEC_PACK_FLOAT_EXPR): Document.
* gcc.target/i386/avx512dq-pr85918.c: Add -mprefer-vector-width=512
and -fno-vect-cost-model options. Add aligned(64) attribute to the
arrays. Add suffix 1 to all functions and use 4 iterations rather
than N. Add functions with conversions to and from float.
Add new set of functions with 8 iterations and another one
with 16 iterations, expect 24 vectorized loops instead of just 4.
* gcc.target/i386/avx512dq-pr85918-2.c: New test.
From-SVN: r260893
2018-05-29 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (struct vec_info): Add stmt_vec_infos
member.
(stmt_vec_info_vec): Make pointer.
(init_stmt_vec_info_vec): Remove.
(free_stmt_vec_info_vec): Likewise.
(set_stmt_vec_info_vec): New function.
(free_stmt_vec_infos): Likewise.
(vinfo_for_stmt): Adjust for stmt_vec_info_vec indirection.
(set_vinfo_for_stmt): Likewise.
(get_earlier_stmt): Likewise.
(get_later_stmt): Likewise.
* tree-vectorizer.c (stmt_vec_info_vec): Make pointer.
(vec_info::vec_info): Allocate stmt_vec_infos and set the global.
(vec_info::~vec_info): Free stmt_vec_infos.
(vectorize_loops): Set the global stmt_vec_info_vec to NULL.
Remove old init_stmt_vec_info_vec/free_stmt_vec_info_vec calls.
(pass_slp_vectorize::execute): Likewise.
* tree-vect-stmts.c (init_stmt_vec_info_vec): Remove.
(free_stmt_vec_info_vec): Likewise.
(set_stmt_vec_info_vec): New function.
(free_stmt_vec_infos): Likewise.
* tree-vect-loop.c (_loop_vec_info::~_loop_vec_info): Set
the global stmt_vec_info_vec.
* tree-parloops.c (gather_scalar_reductions): Use
set_stmt_vec_info_vec/free_stmt_vec_infos and maintain a local
vector.
From-SVN: r260892
2018-05-25 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (STMT_VINFO_GROUP_*, GROUP_*): Remove.
(DR_GROUP_*): New, assert we have non-NULL ->data_ref_info.
(REDUC_GROUP_*): New, assert we have NULL ->data_ref_info.
(STMT_VINFO_GROUPED_ACCESS): Adjust.
* tree-vect-data-refs.c (everywhere): Adjust users.
* tree-vect-loop.c (everywhere): Likewise.
* tree-vect-slp.c (everywhere): Likewise.
* tree-vect-stmts.c (everywhere): Likewise.
* tree-vect-patterns.c (vect_reassociating_reduction_p): Likewise.
From-SVN: r260709
solib.fppized.f starting with r260283)
2018-05-22 Richard Biener <rguenther@suse.de>
PR tree-optimization/85863
* tree-vect-stmts.c (vect_is_simple_cond): Only widen invariant
comparisons when vectype is specified.
(vectorizable_condition): Do not specify vectype for
vect_is_simple_cond when SLP vectorizing.
* gfortran.fortran-torture/compile/pr85863.f: New testcase.
From-SVN: r260501
pass vect.)
PR tree-optimization/85793
* tree-vect-stmts.c (vectorizable_load): Handle 1 element-wise load
for VMAT_ELEMENTWISE.
gcc/testsuite
* gcc.dg/vect/pr85793.c: New test.
Co-Authored-By: Richard Biener <rguenther@suse.de>
From-SVN: r260317
2018-05-16 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (struct stmt_info_for_cost): Add where member.
(dump_stmt_cost): Declare.
(add_stmt_cost): Dump cost we add.
(add_stmt_costs): New function.
(vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost):
No longer exported.
(vect_analyze_stmt): Adjust prototype.
(vectorizable_condition): Likewise.
(vectorizable_live_operation): Likewise.
(vectorizable_reduction): Likewise.
(vectorizable_induction): Likewise.
* tree-vect-loop.c (vect_analyze_loop_operations): Create local
cost vector to pass to vectorizable_ and record afterwards.
(vect_model_reduction_cost): Take cost vector argument and adjust.
(vect_model_induction_cost): Likewise.
(vectorizable_reduction): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-slp.c (vect_create_new_slp_node): Initialize
SLP_TREE_NUMBER_OF_VEC_STMTS.
(vect_analyze_slp_cost_1): Remove.
(vect_analyze_slp_cost): Likewise.
(vect_slp_analyze_node_operations): Take visited args and
a target cost vector. Avoid processing already visited stmt sets.
(vect_slp_analyze_operations): Use a local cost vector to gather
costs and register those of non-discarded instances.
(vect_bb_vectorization_profitable_p): Use add_stmt_costs.
(vect_schedule_slp_instance): Remove copying of
SLP_TREE_NUMBER_OF_VEC_STMTS. Instead assert that it is not
zero.
* tree-vect-stmts.c (record_stmt_cost): Remove path directly
adding cost. Record cost entry location.
(vect_prologue_cost_for_slp_op): Function to compute cost of
a constant or invariant generated for SLP vect in the prologue,
split out from vect_analyze_slp_cost_1.
(vect_model_simple_cost): Make static. Adjust for SLP costing.
(vect_model_promotion_demotion_cost): Likewise.
(vect_model_store_cost): Likewise, make static.
(vect_model_load_cost): Likewise.
(vectorizable_bswap): Add cost vector arg and adjust.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(can_vectorize_live_stmts): Likewise.
(vect_analyze_stmt): Likewise.
(vect_transform_stmt): Adjust calls to vectorizable_*.
* tree-vectorizer.c: Include gimple-pretty-print.h.
(dump_stmt_cost): New function.
From-SVN: r260289
The SLP unrolling factor is calculated by finding the smallest
scalar type for each SLP statement and taking the number of required
lanes from the vector versions of those scalar types. E.g. for an
int32->int64 conversion, it's the vector of int32s rather than the
vector of int64s that determines the unroll factor.
We rely on tree-vect-patterns.c to replace boolean operations like:
  bool a, b, c;
  a = b & c;
with integer operations of whatever the best size is in context.
E.g. if b and c are fed by comparisons of ints, a, b and c will become
the appropriate size for an int comparison. For most targets this means
that a, b and c will end up as int-sized themselves, but on targets like
SVE and AVX512 with packed vector booleans, they'll instead become a
small bitfield like :1, padded to a byte for memory purposes.
The SLP code would then take these scalar types and try to calculate
the vector type for them, causing the unroll factor to be much higher
than necessary.
This patch tries to make the SLP code use the same approach as the
loop vectorizer, by splitting out the code that calculates the
statement vector type and the vector type that should be used for
the number of units.
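For illustration, a hypothetical kernel of the kind this affects (not
one of the testcases below): an int32->int64 conversion in an SLP
group, where the unroll factor should come from the int32 vectors.
  void
  f (long long *restrict out, int *restrict in)
  {
    for (int i = 0; i < 1024; i += 2)
      {
        /* The smallest scalar type is int32, so the SLP unroll factor
           should be based on the vector of int32s, not of int64s.  */
        out[i] = in[i];
        out[i + 1] = in[i + 1];
      }
  }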
2018-05-16 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vectorizer.h (vect_get_vector_types_for_stmt): Declare.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vect-slp.c (vect_two_operations_perm_ok_p): New function,
split out from...
(vect_build_slp_tree_1): ...here. Use vect_get_vector_types_for_stmt
to determine the statement's vector type and the vector type that
should be used for calculating nunits. Deal with cases in which
the type has to be deferred.
(vect_slp_analyze_node_operations): Use vect_get_vector_types_for_stmt
and vect_get_mask_type_for_stmt to calculate STMT_VINFO_VECTYPE.
* tree-vect-loop.c (vect_determine_vf_for_stmt_1)
(vect_determine_vf_for_stmt): New functions, split out from...
(vect_determine_vectorization_factor): ...here.
* tree-vect-stmts.c (vect_get_vector_types_for_stmt)
(vect_get_mask_type_for_stmt): New functions, split out from
vect_determine_vectorization_factor.
gcc/testsuite/
* gcc.target/aarch64/sve/vcond_10.c: New test.
* gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_11.c: Likewise.
* gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
From-SVN: r260287
We build up the input to IFN_STORE_LANES one vector at a time.
In RTL, each of these vector assignments becomes a write to
subregs of the form (subreg:VEC (reg:AGGR R)), where R is the
eventual input to the store lanes instruction. The problem is
that RTL isn't very good at tracking liveness when things are
initialised piecemeal by subregs, so R tends to end up being
live on all paths from the entry block to the store. This in
turn leads to unnecessary spilling around calls, as well as to
excess register pressure in vector loops.
This patch adds gimple clobbers to indicate the liveness of the
IFN_STORE_LANES variable and makes sure that gimple clobbers are
expanded to rtl clobbers where useful. For consistency it also
uses clobbers to mark the point at which an IFN_LOAD_LANES
variable is no longer needed.
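For illustration, a hypothetical interleaved store of the kind that is
implemented with IFN_STORE_LANES (ST2 on AArch64); the aggregate input
register is built up one vector at a time, which is where the new
clobbers help:
  void
  f (int *restrict out, int *restrict a, int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      {
        /* Two interleaved stores form one ST2 group.  */
        out[i * 2] = a[i];
        out[i * 2 + 1] = b[i];
      }
  }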
2018-05-08 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* cfgexpand.c (expand_clobber): New function.
(expand_gimple_stmt_1): Use it.
* tree-vect-stmts.c (vect_clobber_variable): New function,
split out from...
(vectorizable_simd_clone_call): ...here.
(vectorizable_store): Emit a clobber either side of an
IFN_STORE_LANES sequence.
(vectorizable_load): Emit a clobber after an IFN_LOAD_LANES sequence.
gcc/testsuite/
* gcc.target/aarch64/store_lane_spill_1.c: New test.
* gcc.target/aarch64/sve/store_lane_spill_1.c: Likewise.
From-SVN: r260073
compute_live_loop_exits, at tree-ssa-loop-manip.c:229)
2018-05-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/85597
* tree-vect-stmts.c (vectorizable_operation): For ternary SLP
do not use split vect_get_vec_defs call but call vect_get_slp_defs
directly.
* gcc.dg/vect/pr85597.c: New testcase.
From-SVN: r259840
with r256888)
2018-04-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/84737
* tree-vect-data-refs.c (vect_copy_ref_info): New function
copying restrict info.
(vect_setup_realignment): Use it.
* tree-vectorizer.h (vect_copy_ref_info): Declare.
* tree-vect-stmts.c (vectorizable_store): Copy ref info from
the first DR to all generated stores.
(vectorizable_load): Likewise for loads.
From-SVN: r259493
We were computing &LOOP_VINFO_MASKS even for bb vectorisation,
which is UB.
2018-03-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/84634
* tree-vect-stmts.c (vectorizable_store, vectorizable_load): Replace
masks and masked_loop_p with a single loop_masks, making sure it's
null for bb vectorization.
From-SVN: r258131
since r256644)
2018-02-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/84037
* tree-vect-slp.c (vect_analyze_slp_cost): Add visited
parameter, move visited init to caller.
(vect_slp_analyze_operations): Separate cost from validity
check, initialize visited once for all instances.
(vect_schedule_slp): Analyze map to CSE vectorized nodes once
for all instances.
* tree-vect-stmts.c (vect_model_simple_cost): Make early
out an assert.
(vect_model_promotion_demotion_cost): Likewise.
(vectorizable_bswap): Guard cost modeling with !slp_node
instead of !PURE_SLP_STMT to avoid double-counting on hybrid
SLP stmts.
(vectorizable_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
From-SVN: r257588
PR83753 was about a case in which we ended up trying to "vectorise"
a group of loads or stores using single-element vectors. The problem
was that we were classifying the load or store as VMAT_CONTIGUOUS_PERMUTE
rather than VMAT_CONTIGUOUS, even though it doesn't make sense to permute
a single-element vector.
In that PR it was enough to change get_group_load_store_type,
because vectorisation ended up being unprofitable and so we didn't
take things further. But when vectorisation is profitable, the same
fix is needed in vectorizable_load and vectorizable_store.
2018-02-08 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/84265
* tree-vect-stmts.c (vectorizable_store): Don't treat
VMAT_CONTIGUOUS accesses as grouped.
(vectorizable_load): Likewise.
gcc/testsuite/
PR tree-optimization/84265
* gcc.dg/vect/pr84265.c: New test.
From-SVN: r257492
2018-02-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/84278
* tree-vect-stmts.c (vectorizable_store): When looking for
smaller vector types to perform grouped strided loads/stores
make sure the mode is supported by the target.
(vectorizable_load): Likewise.
* gcc.target/i386/pr84278.c: New testcase.
From-SVN: r257483
since r256644)
2018-02-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/84037
* tree-vectorizer.h (struct _loop_vec_info): Add ivexpr_map member.
(cse_and_gimplify_to_preheader): Declare.
(vect_get_place_in_interleaving_chain): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
ivexpr_map.
(_loop_vec_info::~_loop_vec_info): Delete it.
(cse_and_gimplify_to_preheader): New function.
* tree-vect-slp.c (vect_get_place_in_interleaving_chain): Export.
* tree-vect-stmts.c (vectorizable_store): CSE base and steps.
(vectorizable_load): Likewise. For grouped stores always base
the IV on the first element.
* tree-vect-loop-manip.c (vect_loop_versioning): Unshare versioning
condition before gimplifying.
From-SVN: r257453
gcc/ChangeLog:
2018-01-29 Richard Biener <rguenther@suse.de>
Kelvin Nilsen <kelvin@gcc.gnu.org>
PR bootstrap/80867
* tree-vect-stmts.c (vectorizable_call): Don't call
targetm.vectorize_builtin_md_vectorized_function if callee is
NULL.
Co-Authored-By: Kelvin Nilsen <kelvin@gcc.gnu.org>
From-SVN: r257158
As Jakub says in the PR, the problem here was that the x86/built-in
version of the scatter support was using a bogus scatter_src_dt
when calling vect_get_vec_def_for_stmt_copy (and had been since it
was added). The patch uses the vect_def_type from the original
call to vect_is_simple_use instead.
However, Jakub also pointed out that other parts of the load and store
code passed the vector operand rather than the scalar operand to
vect_is_simple_use. That probably works most of the time since
a constant scalar operand should give a constant vector operand,
and likewise for external and internal definitions. But it
definitely seems more robust to pass the scalar operand.
The patch avoids the issue for gather and scatter offsets by
using the cached gs_info.offset_dt. This is safe because gathers
and scatters are never grouped, so there's only one statement operand
to consider. The patch also caches the vect_def_type for mask operands,
which is safe because grouped masked operations share the same mask.
That just leaves the store rhs. We still need to recalculate the
vect_def_type there since different store values in the group can
have different definition types. But since we still have access
to the original scalar operand, it seems better to use that instead.
2018-01-20 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/83940
* tree-vect-stmts.c (vect_truncate_gather_scatter_offset): Set
offset_dt to vect_constant_def rather than vect_unknown_def_type.
(vect_check_load_store_mask): Add a mask_dt_out parameter and
use it to pass back the definition type.
(vect_check_store_rhs): Likewise rhs_dt_out.
(vect_build_gather_load_calls): Add a mask_dt argument and use
it instead of a call to vect_is_simple_use.
(vectorizable_store): Update calls to vect_check_load_store_mask
and vect_check_store_rhs. Use the dt returned by the latter instead
of scatter_src_dt. Use the cached mask_dt and gs_info.offset_dt
instead of calls to vect_is_simple_use. Pass the scalar rather
than the vector operand to vect_is_simple_use when handling
second and subsequent copies of an rhs value.
(vectorizable_load): Update calls to vect_check_load_store_mask
and vect_build_gather_load_calls. Use the cached mask_dt and
gs_info.offset_dt instead of calls to vect_is_simple_use.
gcc/testsuite/
PR tree-optimization/83940
* gcc.dg/torture/pr83940.c: New test.
From-SVN: r256918
2018-01-16 Richard Biener <rguenther@suse.de>
PR tree-optimization/83867
* tree-vect-stmts.c (vect_transform_stmt): Precompute
nested_in_vect_loop_p since the scalar stmt may get invalidated.
* gcc.dg/vect/pr83867.c: New testcase.
From-SVN: r256746
This is mostly a mechanical extension of the previous gather load
support to scatter stores. The internal functions in this case are:
IFN_SCATTER_STORE (base, offsets, scale, values)
IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask)
However, one nonobvious change is to vect_analyze_data_ref_access.
If we're treating an access as a gather load or scatter store
(i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code
would create a dummy data_reference whose step is 0. There's not
really much else it could do, since the whole point is that the
step isn't predictable from iteration to iteration. We then
went into this code in vect_analyze_data_ref_access:
  /* Allow loads with zero step in inner-loop vectorization.  */
  if (loop_vinfo && integer_zerop (step))
    {
      GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
      if (!nested_in_vect_loop_p (loop, stmt))
        return DR_IS_READ (dr);
I.e. we'd take the step literally and assume that this is a load
or store to an invariant address. Loads from invariant addresses
are supported but stores to them aren't.
The code therefore had the effect of disabling all scatter stores.
AFAICT this is true of AVX too: although tests like avx512f-scatter-1.c
test for the correctness of a scatter-like loop, they don't seem to
check whether a scatter instruction is actually used.
The patch therefore makes vect_analyze_data_ref_access return true
for scatters. We do seem to handle the aliasing correctly;
that's tested by other functions, and is symmetrical to the
already-working gather case.
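For illustration, a hypothetical loop that should now be vectorized
with a scatter store; the store address is data-dependent, so the step
really is unpredictable from iteration to iteration:
  void
  f (double *restrict dest, int *restrict idx, double *restrict src,
     int n)
  {
    for (int i = 0; i < n; i++)
      dest[idx[i]] = src[i];  /* IFN_SCATTER_STORE candidate.  */
  }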
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/sourcebuild.texi (vect_scatter_store): Document.
* optabs.def (scatter_store_optab, mask_scatter_store_optab): New
optabs.
* doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}):
Document.
* genopinit.c (main): Add supports_vec_scatter_store and
supports_vec_scatter_store_cached to target_optabs.
* gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and
IFN_MASK_SCATTER_STORE.
* internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New internal
functions.
* internal-fn.h (internal_store_fn_p): Declare.
(internal_fn_stored_value_index): Likewise.
* internal-fn.c (scatter_store_direct): New macro.
(expand_scatter_store_optab_fn): New function.
(direct_scatter_store_optab_supported_p): New macro.
(internal_store_fn_p): New function.
(internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and
IFN_MASK_SCATTER_STORE.
(internal_fn_mask_index): Likewise.
(internal_fn_stored_value_index): New function.
(internal_gather_scatter_fn_supported_p): Adjust operand numbers
for scatter stores.
* optabs-query.h (supports_vec_scatter_store_p): Declare.
* optabs-query.c (supports_vec_scatter_store_p): New function.
* tree-vectorizer.h (vect_get_store_rhs): Declare.
* tree-vect-data-refs.c (vect_analyze_data_ref_access): Return
true for scatter stores.
(vect_gather_scatter_fn_p): Handle scatter stores too.
(vect_check_gather_scatter): Consider using scatter stores if
supports_vec_scatter_store_p.
* tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle
scatter stores too.
* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
internal_fn_stored_value_index.
(check_load_store_masking): Handle scatter stores too.
(vect_get_store_rhs): Make public.
(vectorizable_call): Use internal_store_fn_p.
(vectorizable_store): Handle scatter store internal functions.
(vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE
when deciding whether the end of the group has been reached.
* config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec.
* config/aarch64/aarch64-sve.md (scatter_store<mode>): New expander.
(mask_scatter_store<mode>): New insns.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_scatter_store):
New proc.
* gcc.dg/vect/pr25413a.c: Expect both loops to be optimized on
targets with scatter stores.
* gcc.dg/vect/vect-71.c: Restrict XFAIL to targets without scatter
stores.
* gcc.target/aarch64/sve/mask_scatter_store_1.c: New test.
* gcc.target/aarch64/sve/mask_scatter_store_2.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_1.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_2.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_3.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_4.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_5.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_6.c: Likewise.
* gcc.target/aarch64/sve/scatter_store_7.c: Likewise.
* gcc.target/aarch64/sve/strided_store_1.c: Likewise.
* gcc.target/aarch64/sve/strided_store_2.c: Likewise.
* gcc.target/aarch64/sve/strided_store_3.c: Likewise.
* gcc.target/aarch64/sve/strided_store_4.c: Likewise.
* gcc.target/aarch64/sve/strided_store_5.c: Likewise.
* gcc.target/aarch64/sve/strided_store_6.c: Likewise.
* gcc.target/aarch64/sve/strided_store_7.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256643
Following on from the previous patch for strided accesses, this patch
allows gather loads to be used with grouped accesses, if we otherwise
would need to fall back to VMAT_ELEMENTWISE. However, as the comment
says, this is restricted to single-element groups for now:
??? Although the code can handle all group sizes correctly,
it probably isn't a win to use separate strided accesses based
on nearby locations. Or, even if it's a win over scalar code,
it might not be a win over vectorizing at a lower VF, if that
allows us to use contiguous accesses.
Single-element groups are an important special case though,
and this means that code is less sensitive to GCC's classification
of single accesses with constant steps as "grouped" and ones with
variable steps as "strided".
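For illustration, a hypothetical single-element group: the constant
step makes GCC classify the access as "grouped", but with a group size
of 1 a gather load can now be used instead of VMAT_ELEMENTWISE.
  void
  f (int *restrict out, int *restrict in, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = in[i * 5];  /* Group of one element, constant step.  */
  }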
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vect_gather_scatter_fn_p): Declare.
* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Make public.
* tree-vect-stmts.c (vect_truncate_gather_scatter_offset): New
function.
(vect_use_strided_gather_scatters_p): Take a masked_p argument.
Use vect_truncate_gather_scatter_offset if we can't treat the
operation as a normal gather load or scatter store.
(get_group_load_store_type): Take the gather_scatter_info
as argument. Try using a gather load or scatter store for
single-element groups.
(get_load_store_type): Update calls to get_group_load_store_type
and vect_use_strided_gather_scatters_p.
gcc/testsuite/
* gcc.target/aarch64/sve/reduc_strict_3.c: Expect FADDA to be used
for double_reduc1.
* gcc.target/aarch64/sve/strided_load_4.c: New test.
* gcc.target/aarch64/sve/strided_load_5.c: Likewise.
* gcc.target/aarch64/sve/strided_load_6.c: Likewise.
* gcc.target/aarch64/sve/strided_load_7.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256642
This patch tries to use gather loads for strided accesses,
rather than falling back to VMAT_ELEMENTWISE.
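For illustration, a hypothetical strided access of the kind this
targets; the variable step means the access is classified as "strided"
rather than "grouped":
  void
  f (int *restrict out, int *restrict in, int stride, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = in[i * stride];  /* Now a gather-load candidate.  */
  }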
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vect_create_data_ref_ptr): Take an extra
optional tree argument.
* tree-vect-data-refs.c (vect_check_gather_scatter): Check for
null target hooks.
(vect_create_data_ref_ptr): Take the iv_step as an optional argument,
but continue to use the current value as a fallback.
(bump_vector_ptr): Use operand_equal_p rather than tree_int_cst_compare
to compare the updates.
* tree-vect-stmts.c (vect_use_strided_gather_scatters_p): New function.
(get_load_store_type): Use it when handling a strided access.
(vect_get_strided_load_store_ops): New function.
(vect_get_data_ptr_increment): Likewise.
(vectorizable_load): Handle strided gather loads. Always pass
a step to vect_create_data_ref_ptr and bump_vector_ptr.
gcc/testsuite/
* gcc.target/aarch64/sve/strided_load_1.c: New test.
* gcc.target/aarch64/sve/strided_load_2.c: Likewise.
* gcc.target/aarch64/sve/strided_load_3.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256641
This patch adds support for SVE gather loads. It uses basically
the same analysis code as the AVX gather support, but after that
there are two major differences:
- It uses new internal functions rather than target built-ins.
The interface is:
IFN_GATHER_LOAD (base, offsets, scale)
IFN_MASK_GATHER_LOAD (base, offsets, scale, mask)
which should be reasonably generic. One of the advantages of
using internal functions is that other passes can understand what
the functions do, but a more immediate advantage is that we can
query the underlying target pattern to see which scales it supports.
- It uses pattern recognition to convert the offset to the right width,
if it was originally narrower than that. This avoids having to do
a widening operation as part of the gather expansion itself.
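For illustration, a hypothetical indexed load that maps onto
IFN_GATHER_LOAD; the 32-bit offsets are narrower than the 64-bit
elements, so the new pattern recognizer widens them first:
  double
  f (double *restrict base, int *restrict index, int n)
  {
    double sum = 0;
    for (int i = 0; i < n; i++)
      sum += base[index[i]];  /* Gather load from base + index.  */
    return sum;
  }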
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (gather_load@var{m}): Document.
(mask_gather_load@var{m}): Likewise.
* genopinit.c (main): Add supports_vec_gather_load and
supports_vec_gather_load_cached to target_optabs.
* optabs-tree.c (init_tree_optimization_optabs): Use
ggc_cleared_alloc to allocate target_optabs.
* optabs.def (gather_load_optab, mask_gather_load_optab): New optabs.
* internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal
functions.
* internal-fn.h (internal_load_fn_p): Declare.
(internal_gather_scatter_fn_p): Likewise.
(internal_fn_mask_index): Likewise.
(internal_gather_scatter_fn_supported_p): Likewise.
* internal-fn.c (gather_load_direct): New macro.
(expand_gather_load_optab_fn): New function.
(direct_gather_load_optab_supported_p): New macro.
(direct_internal_fn_optab): New function.
(internal_load_fn_p): Likewise.
(internal_gather_scatter_fn_p): Likewise.
(internal_fn_mask_index): Likewise.
(internal_gather_scatter_fn_supported_p): Likewise.
* optabs-query.c (supports_at_least_one_mode_p): New function.
(supports_vec_gather_load_p): Likewise.
* optabs-query.h (supports_vec_gather_load_p): Declare.
* tree-vectorizer.h (gather_scatter_info): Add ifn, element_type
and memory_type field.
(NUM_PATTERNS): Bump to 15.
* tree-vect-data-refs.c: Include internal-fn.h.
(vect_gather_scatter_fn_p): New function.
(vect_describe_gather_scatter_call): Likewise.
(vect_check_gather_scatter): Try using internal functions for
gather loads. Recognize existing calls to a gather load function.
(vect_analyze_data_refs): Consider using gather loads if
supports_vec_gather_load_p.
* tree-vect-patterns.c (vect_get_load_store_mask): New function.
(vect_get_gather_scatter_offset_type): Likewise.
(vect_convert_mask_for_vectype): Likewise.
(vect_add_conversion_to_patterm): Likewise.
(vect_try_gather_scatter_pattern): Likewise.
(vect_recog_gather_scatter_pattern): New pattern recognizer.
(vect_vect_recog_func_ptrs): Add it.
* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
internal_fn_mask_index and internal_gather_scatter_fn_p.
(check_load_store_masking): Take the gather_scatter_info as an
argument and handle gather loads.
(vect_get_gather_scatter_ops): New function.
(vectorizable_call): Check internal_load_fn_p.
(vectorizable_load): Likewise. Handle gather load internal
functions.
(vectorizable_store): Update call to check_load_store_masking.
* config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec.
* config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators.
* config/aarch64/predicates.md (aarch64_gather_scale_operand_w)
(aarch64_gather_scale_operand_d): New predicates.
* config/aarch64/aarch64-sve.md (gather_load<mode>): New expander.
(mask_gather_load<mode>): New insns.
gcc/testsuite/
* gcc.target/aarch64/sve/gather_load_1.c: New test.
* gcc.target/aarch64/sve/gather_load_2.c: Likewise.
* gcc.target/aarch64/sve/gather_load_3.c: Likewise.
* gcc.target/aarch64/sve/gather_load_4.c: Likewise.
* gcc.target/aarch64/sve/gather_load_5.c: Likewise.
* gcc.target/aarch64/sve/gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/gather_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256640
This allows LD3 to be used for isolated a[i * 3] accesses, in a similar
way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively.
Given the problems with the cost model underestimating the cost of
elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE
cases that are currently rejected.
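For illustration, a hypothetical isolated a[i * 3] access that can now
be vectorized with LD3, using one of the three loaded lanes:
  int
  f (int *restrict a, int n)
  {
    int s = 0;
    for (int i = 0; i < n; i++)
      s += a[i * 3];  /* Single-element interleaving, group size 3.  */
    return s;
  }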
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-data-refs.c (vect_analyze_group_access_1): Allow
single-element interleaving even if the size is not a power of 2.
* tree-vect-stmts.c (get_load_store_type): Disallow elementwise
accesses for single-element interleaving if the group size is
not a power of 2.
gcc/testsuite/
* gcc.target/aarch64/sve/struct_vect_18.c: New test.
* gcc.target/aarch64/sve/struct_vect_18_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_19_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256634
This patch uses SVE CLASTB to optimise conditional reductions. It means
that we no longer need to maintain a separate index vector to record
the most recent valid value, and no longer need to worry about overflow
cases.
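For illustration, a hypothetical conditional reduction of the kind
that CLASTB handles; the last active element is extracted directly,
with no separate index vector:
  int
  f (int *restrict a, int *restrict b, int n)
  {
    int last = -1;
    for (int i = 0; i < n; i++)
      if (b[i] < 16)
        last = a[i];  /* Extract-last (CLASTB) candidate.  */
    return last;
  }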
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (fold_extract_last_@var{m}): Document.
* doc/sourcebuild.texi (vect_fold_extract_last): Likewise.
* optabs.def (fold_extract_last_optab): New optab.
* internal-fn.def (FOLD_EXTRACT_LAST): New internal function.
* internal-fn.c (fold_extract_direct): New macro.
(expand_fold_extract_optab_fn): Likewise.
(direct_fold_extract_optab_supported_p): Likewise.
* tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type.
* tree-vect-loop.c (vect_model_reduction_cost): Handle
EXTRACT_LAST_REDUCTION.
(get_initial_def_for_reduction): Do not create an initial vector
for EXTRACT_LAST_REDUCTION reductions.
(vectorizable_reduction): Leave the scalar phi in place for
EXTRACT_LAST_REDUCTIONs. Try using EXTRACT_LAST_REDUCTION
ahead of INTEGER_INDUC_COND_REDUCTION. Do not check for an
epilogue code for EXTRACT_LAST_REDUCTION and defer the
transform phase to vectorizable_condition.
* tree-vect-stmts.c (vect_finish_stmt_generation_1): New function,
split out from...
(vect_finish_stmt_generation): ...here.
(vect_finish_replace_stmt): New function.
(vectorizable_condition): Handle EXTRACT_LAST_REDUCTION.
* config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New
pattern.
* config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec.
gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_vect_fold_extract_last): New proc.
* gcc.dg/vect/pr65947-1.c: Update dump messages. Add markup
for fold_extract_last.
* gcc.dg/vect/pr65947-2.c: Likewise.
* gcc.dg/vect/pr65947-3.c: Likewise.
* gcc.dg/vect/pr65947-4.c: Likewise.
* gcc.dg/vect/pr65947-5.c: Likewise.
* gcc.dg/vect/pr65947-6.c: Likewise.
* gcc.dg/vect/pr65947-9.c: Likewise.
* gcc.dg/vect/pr65947-10.c: Likewise.
* gcc.dg/vect/pr65947-12.c: Likewise.
* gcc.dg/vect/pr65947-14.c: Likewise.
* gcc.dg/vect/pr80631-1.c: Likewise.
* gcc.target/aarch64/sve/clastb_1.c: New test.
* gcc.target/aarch64/sve/clastb_1_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_2.c: Likewise.
* gcc.target/aarch64/sve/clastb_2_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_3.c: Likewise.
* gcc.target/aarch64/sve/clastb_3_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_4.c: Likewise.
* gcc.target/aarch64/sve/clastb_4_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_5.c: Likewise.
* gcc.target/aarch64/sve/clastb_5_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_6.c: Likewise.
* gcc.target/aarch64/sve/clastb_6_run.c: Likewise.
* gcc.target/aarch64/sve/clastb_7.c: Likewise.
* gcc.target/aarch64/sve/clastb_7_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256633
This patch adds support for aligning vectors by using a partial
first iteration. E.g. if the start pointer is 3 elements beyond
an aligned address, the first iteration will have a mask in which
the first three elements are false.
On SVE, the optimisation is only useful for vector-length-specific
code. Vector-length-agnostic code doesn't try to align vectors
since the vector length might not be a power of 2.
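For illustration, suppose the loop below is compiled for a fixed
256-bit vector length and p starts 3 elements beyond an aligned
address. The first iteration would then use a mask whose first three
elements are false, e.g. { 0, 0, 0, 1, 1, 1, 1, 1 } for eight int
lanes, so that subsequent iterations access aligned addresses:
  void
  f (int *p, int n)
  {
    for (int i = 0; i < n; i++)
      p[i] += 1;
  }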
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field.
(LOOP_VINFO_MASK_SKIP_NITERS): New macro.
(vect_use_loop_mask_for_alignment_p): New function.
(vect_prepare_for_masked_peels, vect_gen_while_not): Declare.
* tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an
niters_skip argument. Make sure that the first niters_skip elements
of the first iteration are inactive.
(vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS.
Update call to vect_set_loop_masks_directly.
(get_misalign_in_elems): New function, split out from...
(vect_gen_prolog_loop_niters): ...here.
(vect_update_init_of_dr): Take a code argument that specifies whether
the adjustment should be added or subtracted.
(vect_update_init_of_drs): Likewise.
(vect_prepare_for_masked_peels): New function.
(vect_do_peeling): Skip prologue peeling if we're using a mask
instead. Update call to vect_update_inits_of_drs.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
mask_skip_niters.
(vect_analyze_loop_2): Allow fully-masked loops with peeling for
alignment. Do not include the number of peeled iterations in
the minimum threshold in that case.
(vectorizable_induction): Adjust the start value down by
LOOP_VINFO_MASK_SKIP_NITERS iterations.
(vect_transform_loop): Call vect_prepare_for_masked_peels.
Take the number of skipped iterations into account when calculating
the loop bounds.
* tree-vect-stmts.c (vect_gen_while_not): New function.
gcc/testsuite/
* gcc.target/aarch64/sve/nopeel_1.c: New test.
* gcc.target/aarch64/sve/peel_ind_1.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_1_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_4.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_4_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256630
This patch adds support for using a single fully-predicated loop instead
of a vector loop and a scalar tail. An SVE WHILELO instruction generates
the predicate for each iteration of the loop, given the current scalar
iv value and the loop bound. This operation is wrapped up in a new internal
function called WHILE_ULT. E.g.:
WHILE_ULT (0, 3, { 0, 0, 0, 0 }) -> { 1, 1, 1, 0 }
WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }
The third WHILE_ULT argument is needed to make the operation
unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
different numbers of elements.
Note that the patch uses "mask" and "fully-masked" instead of
"predicate" and "fully-predicated", to follow existing GCC terminology.
This patch just handles the simple cases, punting for things like
reductions and live-out values. Later patches remove most of these
restrictions.
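For illustration, a hypothetical simple loop that a fully-masked SVE
loop handles with no scalar tail; each iteration's mask comes from
WHILE_ULT on the current induction value and n:
  void
  f (int *restrict a, int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] = b[i] + 1;  /* Final iteration masks off lanes >= n.  */
  }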
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* optabs.def (while_ult_optab): New optab.
* doc/md.texi (while_ult@var{m}@var{n}): Document.
* internal-fn.def (WHILE_ULT): New internal function.
* internal-fn.h (direct_internal_fn_supported_p): New override
that takes two types as argument.
* internal-fn.c (while_direct): New macro.
(expand_while_optab_fn): New function.
(convert_optab_supported_p): Likewise.
(direct_while_optab_supported_p): New macro.
* wide-int.h (wi::udiv_ceil): New function.
* tree-vectorizer.h (rgroup_masks): New structure.
(vec_loop_masks): New typedef.
(_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
and fully_masked_p.
(LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
(LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
(vect_max_vf): New function.
(slpeel_make_loop_iterate_ntimes): Delete.
(vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
(vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
(vect_record_loop_mask, vect_get_loop_mask): Likewise.
* tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
internal-fn.h, stor-layout.h and optabs-query.h.
(vect_set_loop_mask): New function.
(add_preheader_seq): Likewise.
(add_header_seq): Likewise.
(interleave_supported_p): Likewise.
(vect_maybe_permute_loop_masks): Likewise.
(vect_set_loop_masks_directly): Likewise.
(vect_set_loop_condition_masked): Likewise.
(vect_set_loop_condition_unmasked): New function, split out from
slpeel_make_loop_iterate_ntimes.
(slpeel_make_loop_iterate_ntimes): Rename to..
(vect_set_loop_condition): ...this. Use vect_set_loop_condition_masked
for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
(vect_do_peeling): Update call accordingly.
(vect_gen_vector_loop_niters): Use VF as the step for fully-masked
loops.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
mask_compare_type, can_fully_mask_p and fully_masked_p.
(release_vec_loop_masks): New function.
(_loop_vec_info): Use it to free the loop masks.
(can_produce_all_loop_masks_p): New function.
(vect_get_max_nscalars_per_iter): Likewise.
(vect_verify_full_masking): Likewise.
(vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
retries, and free the mask rgroups before retrying. Check loop-wide
reasons for disallowing fully-masked loops. Make the final decision
about whether use a fully-masked loop or not.
(vect_estimate_min_profitable_iters): Do not assume that peeling
for the number of iterations will be needed for fully-masked loops.
(vectorizable_reduction): Disable fully-masked loops.
(vectorizable_live_operation): Likewise.
(vect_halve_mask_nunits): New function.
(vect_double_mask_nunits): Likewise.
(vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
(vect_transform_loop): Handle the case in which the final loop
iteration might handle a partial vector. Call vect_set_loop_condition
instead of slpeel_make_loop_iterate_ntimes.
* tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h.
(check_load_store_masking): New function.
(prepare_load_store_mask): Likewise.
(vectorizable_store): Handle fully-masked loops.
(vectorizable_load): Likewise.
(supportable_widening_operation): Use vect_halve_mask_nunits for
booleans.
(supportable_narrowing_operation): Likewise vect_double_mask_nunits.
(vect_gen_while): New function.
* config/aarch64/aarch64.md (umax<mode>3): New expander.
(aarch64_uqdec<mode>): New insn.
gcc/testsuite/
* gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization.
* gcc.dg/tree-ssa/peel1.c: Likewise.
* gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for
variable-length vectors.
* gcc.target/aarch64/sve/vcond_6.c: XFAIL test for AND.
* gcc.target/aarch64/sve/vec_bool_cmp_1.c: Expect BIC instead of NOT.
* gcc.target/aarch64/sve/slp_1.c: Check for a fully-masked loop.
* gcc.target/aarch64/sve/slp_2.c: Likewise.
* gcc.target/aarch64/sve/slp_3.c: Likewise.
* gcc.target/aarch64/sve/slp_4.c: Likewise.
* gcc.target/aarch64/sve/slp_6.c: Likewise.
* gcc.target/aarch64/sve/slp_8.c: New test.
* gcc.target/aarch64/sve/slp_8_run.c: Likewise.
* gcc.target/aarch64/sve/slp_9.c: Likewise.
* gcc.target/aarch64/sve/slp_9_run.c: Likewise.
* gcc.target/aarch64/sve/slp_10.c: Likewise.
* gcc.target/aarch64/sve/slp_10_run.c: Likewise.
* gcc.target/aarch64/sve/slp_11.c: Likewise.
* gcc.target/aarch64/sve/slp_11_run.c: Likewise.
* gcc.target/aarch64/sve/slp_12.c: Likewise.
* gcc.target/aarch64/sve/slp_12_run.c: Likewise.
* gcc.target/aarch64/sve/ld1r_2.c: Likewise.
* gcc.target/aarch64/sve/ld1r_2_run.c: Likewise.
* gcc.target/aarch64/sve/while_1.c: Likewise.
* gcc.target/aarch64/sve/while_2.c: Likewise.
* gcc.target/aarch64/sve/while_3.c: Likewise.
* gcc.target/aarch64/sve/while_4.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256625
This patch adds support for vectorising groups of IFN_MASK_LOADs
and IFN_MASK_STOREs using conditional load/store-lanes instructions.
This requires new internal functions to represent the result
(IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs.
The normal IFN_{LOAD,STORE}_LANES functions are const operations
that logically just perform the permute: the load or store is
encoded as a MEM operand to the call statement. In contrast,
the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of
interface as IFN_MASK_{LOAD,STORE}, since the memory is only
conditionally accessed.
The AArch64 patterns were added as part of the main LD[234]/ST[234] patch.
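For illustration, a hypothetical conditional grouped access; after
if-conversion the two interleaved loads could become a single
IFN_MASK_LOAD_LANES call:
  void
  f (int *restrict out, int *restrict in, int n)
  {
    for (int i = 0; i < n; i++)
      if (out[i] >= 0)
        out[i] = in[i * 2] + in[i * 2 + 1];
  }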
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document.
(vec_mask_store_lanes@var{m}@var{n}): Likewise.
* optabs.def (vec_mask_load_lanes_optab): New optab.
(vec_mask_store_lanes_optab): Likewise.
* internal-fn.def (MASK_LOAD_LANES): New internal function.
(MASK_STORE_LANES): Likewise.
* internal-fn.c (mask_load_lanes_direct): New macro.
(mask_store_lanes_direct): Likewise.
(expand_mask_load_optab_fn): Handle masked operations.
(expand_mask_load_lanes_optab_fn): New macro.
(expand_mask_store_optab_fn): Handle masked operations.
(expand_mask_store_lanes_optab_fn): New macro.
(direct_mask_load_lanes_optab_supported_p): Likewise.
(direct_mask_store_lanes_optab_supported_p): Likewise.
* tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p
parameter.
(vect_load_lanes_supported): Likewise.
* tree-vect-data-refs.c (strip_conversion): New function.
(can_group_stmts_p): Likewise.
(vect_analyze_data_ref_accesses): Use it instead of checking
for a pair of assignments.
(vect_store_lanes_supported): Take a masked_p parameter.
(vect_load_lanes_supported): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Update calls to
vect_store_lanes_supported and vect_load_lanes_supported.
* tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
* tree-vect-stmts.c (get_group_load_store_type): Take a masked_p
parameter. Don't allow gaps for masked accesses.
Use vect_get_store_rhs. Update calls to vect_store_lanes_supported
and vect_load_lanes_supported.
(get_load_store_type): Take a masked_p parameter and update
call to get_group_load_store_type.
(vectorizable_store): Update call to get_load_store_type.
Handle IFN_MASK_STORE_LANES.
(vectorizable_load): Update call to get_load_store_type.
Handle IFN_MASK_LOAD_LANES.
gcc/testsuite/
* gcc.dg/vect/vect-ooo-group-1.c: New test.
* gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_8.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256620
128b right away, to be more efficient for Ryzen and Intel)
2018-01-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/80846
* target.def (split_reduction): New target hook.
* targhooks.c (default_split_reduction): New function.
* targhooks.h (default_split_reduction): Declare.
* tree-vect-loop.c (vect_create_epilog_for_reduction): If the
target requests first reduce vectors by combining low and high
parts.
* tree-vect-stmts.c (vect_gen_perm_mask_any): Adjust.
(get_vectype_for_scalar_type_and_size): Export.
* tree-vectorizer.h (get_vectype_for_scalar_type_and_size): Declare.
* doc/tm.texi.in (TARGET_VECTORIZE_SPLIT_REDUCTION): Document.
* doc/tm.texi: Regenerate.
i386/
* config/i386/i386.c (ix86_split_reduction): Implement
TARGET_VECTORIZE_SPLIT_REDUCTION.
* gcc.target/i386/pr80846-1.c: New testcase.
* gcc.target/i386/pr80846-2.c: Likewise.
From-SVN: r256576
After cunrolling the inner loop, the remaining loop in the testcase
has a single 32-bit access and a group of 64-bit accesses. We first
try to vectorise at 128 bits (VF 4), but decide not to for cost reasons.
We then try with 64 bits (VF 2) instead. This means that the group
of 64-bit accesses uses a single-element vector, which is deliberately
supported as of r251538. We then try to create "permutes" for these
single-element vectors and fall foul of:
  for (i = 0; i < 6; i++)
    sel[i] += exact_div (nelt, 2);
in vect_grouped_store_supported, since nelt==1.
Maybe we shouldn't even be trying to vectorise statements in the
single-element case, and instead just copy the scalar statement
for each member of the group. But until then, this patch treats
non-strided grouped accesses as VMAT_CONTIGUOUS if no permutation
is necessary.
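For illustration, a hypothetical reduced form of the situation: one
32-bit access alongside a group of 64-bit accesses, so that at VF 2
the 64-bit group uses single-element vectors and must be treated as
VMAT_CONTIGUOUS rather than as needing a permute:
  struct s { int a; long long b, c; };
  void
  f (struct s *restrict p, int n)
  {
    for (int i = 0; i < n; i++)
      {
        p[i].a += 1;  /* Single 32-bit access.  */
        p[i].b += 2;  /* Group of 64-bit accesses.  */
        p[i].c += 3;
      }
  }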
2018-01-10 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
PR tree-optimization/83753
* tree-vect-stmts.c (get_group_load_store_type): Use VMAT_CONTIGUOUS
for non-strided grouped accesses if the number of elements is 1.
gcc/testsuite/
PR tree-optimization/83753
* gcc.dg/torture/pr83753.c: New test.
From-SVN: r256427
As mentioned in https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01575.html ,
the scatter handling in vectorizable_store seems to be dead code at the
moment. Enabling it with the vect_analyze_data_ref_access part of
that patch triggered an ICE in the avx512f-scatter-*.c tests (which
previously didn't use scatters). The problem was that the NARROW
and WIDEN handling uses permute_vec_elements to marshal the inputs,
and permute_vec_elements expected the lhs of the stmt to be an SSA_NAME,
which of course it isn't for stores.
This patch makes permute_vec_elements create a fresh variable in this case.
2018-01-09 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-stmts.c (permute_vec_elements): Create a fresh variable
if the destination isn't an SSA_NAME.
From-SVN: r256383
After the previous patches, it's easier to see that the remaining
inlined transform code in vectorizable_mask_load_store is just a
cut-down version of the VMAT_CONTIGUOUS handling in vectorizable_load
and vectorizable_store. This patch therefore makes those functions
handle masked loads and stores instead.
This makes it easier to handle more forms of masked load and store
without duplicating logic from the unmasked forms. It also helps with
support for fully-masked loops.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-stmts.c (vect_get_store_rhs): New function.
(vectorizable_mask_load_store): Delete.
(vectorizable_call): Return false for masked loads and stores.
(vectorizable_store): Handle IFN_MASK_STORE. Use vect_get_store_rhs
instead of gimple_assign_rhs1.
(vectorizable_load): Handle IFN_MASK_LOAD.
(vect_transform_stmt): Don't set is_store for call_vec_info_type.
From-SVN: r256216
vectorizable_mask_load_store and vectorizable_load used the same
code to build a gather load call, except that the former also
vectorised a mask argument and used it for both the merge and mask
inputs. The latter instead used a merge input of zero and a mask
input of all-ones. This patch splits it out into a subroutine.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-stmts.c (vect_build_gather_load_calls): New function,
split out from...
(vectorizable_mask_load_store): ...here.
(vectorizable_load): ...and here.
From-SVN: r256215
This patch splits out the code to build an all-bits-one or all-bits-zero
input to a gather load. The catch is that both masks can have
floating-point type, in which case they are implicitly treated in
the same way as an integer bitmask.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-stmts.c (vect_build_all_ones_mask)
(vect_build_zero_merge_argument): New functions, split out from...
(vectorizable_load): ...here.
From-SVN: r256214
This patch splits out the rhs checking code that's common to both
vectorizable_mask_load_store and vectorizable_store.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-stmts.c (vect_check_store_rhs): New function,
split out from...
(vectorizable_mask_load_store): ...here.
(vectorizable_store): ...and here.
From-SVN: r256213
This patch splits the mask argument checking out of
vectorizable_mask_load_store, so that a later patch can use it in both
vectorizable_load and vectorizable_store. It also adds dump messages
for false returns. This is mostly useful for the TYPE_VECTOR_SUBPARTS
check, which can fail if pattern recognition didn't convert the mask
properly.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-stmts.c (vect_check_load_store_mask): New function,
split out from...
(vectorizable_mask_load_store): ...here.
From-SVN: r256212
This patch makes vect_model_store_cost take a vec_load_store_type
instead of a vect_def_type. It's a wash on its own, but it helps
with later patches.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vectorizer.h (vec_load_store_type): Moved from tree-vec-stmts.c
(vect_model_store_cost): Take a vec_load_store_type instead of a
vect_def_type.
* tree-vect-stmts.c (vec_load_store_type): Move to tree-vectorizer.h.
(vect_model_store_cost): Take a vec_load_store_type instead of a
vect_def_type.
(vectorizable_mask_load_store): Update accordingly.
(vectorizable_store): Likewise.
* tree-vect-slp.c (vect_analyze_slp_cost_1): Update accordingly.
From-SVN: r256211
vectorizable_mask_load_store replaces scalar IFN_MASK_LOAD calls with
dummy assignments, so that they never survive vectorisation. This patch
moves the code to vect_transform_loop instead, so that we only change
the scalar statements once all of them have been vectorised.
This makes it easier to handle other types of functions that need
stubbing out, and also makes it easier to handle groups and patterns.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-loop.c (vect_transform_loop): Stub out scalar
IFN_MASK_LOAD calls here rather than...
* tree-vect-stmts.c (vectorizable_mask_load_store): ...here.
From-SVN: r256210
|
|
This patch changes GET_MODE_SIZE from unsigned short to poly_uint16.
The non-mechanical parts were handled by previous patches.
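The mechanical changes mostly follow two idioms, sketched here with
placeholder modes: either punt when a compile-time size is required,
or keep the size polynomial and compare with known_eq/known_le.

  /* Idiom 1: require a compile-time constant size, else bail out.  */
  unsigned HOST_WIDE_INT const_size;
  if (!GET_MODE_SIZE (mode).is_constant (&const_size))
    return false;

  /* Idiom 2: stay polynomial and use known_eq rather than ==.  */
  if (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (other_mode)))
    /* ... the sizes are provably equal ...  */;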
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (mode_size): Change from unsigned short to
poly_uint16_pod.
(mode_to_bytes): Return a poly_uint16 rather than an unsigned short.
(GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
(fixed_size_mode::includes_p): Check for constant-sized modes.
* genmodes.c (emit_mode_size_inline): Make mode_size_inline
return a poly_uint16 rather than an unsigned short.
(emit_mode_size): Change the type of mode_size from unsigned short
to poly_uint16_pod. Use ZERO_COEFFS for the initializer.
(emit_mode_adjustments): Cope with polynomial vector sizes.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_SIZE.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_SIZE.
* auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial.
* builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise.
* caller-save.c (setup_save_areas): Likewise.
(replace_reg_with_saved_mem): Likewise.
* calls.c (emit_library_call_value_1): Likewise.
* combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise.
* combine.c (simplify_set, make_extraction, simplify_shift_const_1)
(gen_lowpart_for_combine): Likewise.
* convert.c (convert_to_integer_1): Likewise.
* cse.c (equiv_constant, cse_insn): Likewise.
* cselib.c (autoinc_split, cselib_hash_rtx): Likewise.
(cselib_subst_to_values): Likewise.
* dce.c (word_dce_process_block): Likewise.
* df-problems.c (df_word_lr_mark_ref): Likewise.
* dwarf2cfi.c (init_one_dwarf_reg_size): Likewise.
* dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor)
(concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor)
(rtl_for_decl_location): Likewise.
* emit-rtl.c (gen_highpart, widen_memory_access): Likewise.
* expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise.
* expr.c (emit_group_load_1, clear_storage_hints): Likewise.
(emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise.
(expand_expr_real_1): Likewise.
* function.c (assign_parm_setup_block_p, assign_parm_setup_block)
(pad_below): Likewise.
* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
* gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise.
* ira.c (get_subreg_tracking_sizes): Likewise.
* ira-build.c (ira_create_allocno_objects): Likewise.
* ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise.
(ira_sort_regnos_for_alter_reg): Likewise.
* ira-costs.c (record_operand_costs): Likewise.
* lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn)
(resolve_simple_move): Likewise.
* lra-constraints.c (get_reload_reg, operands_match_p): Likewise.
(process_addr_reg, simplify_operand_subreg, curr_insn_transform)
(lra_constraints): Likewise.
(CONST_POOL_OK_P): Reject variable-sized modes.
* lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare)
(add_pseudo_to_slot, lra_spill): Likewise.
* omp-low.c (omp_clause_aligned_alignment): Likewise.
* optabs-query.c (get_best_extraction_insn): Likewise.
* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
* optabs.c (expand_vec_perm_var, expand_vec_cond_expr): Likewise.
(expand_mult_highpart, valid_multiword_target_p): Likewise.
* recog.c (offsettable_address_addr_space_p): Likewise.
* regcprop.c (maybe_mode_change): Likewise.
* reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise.
* regrename.c (build_def_use): Likewise.
* regstat.c (dump_reg_info): Likewise.
* reload.c (complex_word_subreg_p, push_reload, find_dummy_reload)
(find_reloads, find_reloads_subreg_address): Likewise.
* reload1.c (eliminate_regs_1): Likewise.
* rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise.
* simplify-rtx.c (avoid_constant_pool_reference): Likewise.
(simplify_binary_operation_1, simplify_subreg): Likewise.
* targhooks.c (default_function_arg_padding): Likewise.
(default_hard_regno_nregs, default_class_max_nregs): Likewise.
* tree-cfg.c (verify_gimple_assign_binary): Likewise.
(verify_gimple_assign_ternary): Likewise.
* tree-inline.c (estimate_move_cost): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-ssa-loop-ivopts.c (add_autoinc_candidates): Likewise.
(get_address_cost_ainc): Likewise.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
(vectorizable_reduction): Likewise.
* tree-vect-stmts.c (vectorizable_assignment, vectorizable_shift)
(vectorizable_operation, vectorizable_load): Likewise.
* tree.c (build_same_sized_truth_vector_type): Likewise.
* valtrack.c (cleanup_auto_inc_dec): Likewise.
* var-tracking.c (emit_note_insn_var_location): Likewise.
* config/arc/arc.h (ASM_OUTPUT_CASE_END): Use as_a <scalar_int_mode>.
(ADDR_VEC_ALIGN): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256201
|
|
This patch changes GET_MODE_BITSIZE from an unsigned short
to a poly_uint16.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (mode_to_bits): Return a poly_uint16 rather than an
unsigned short.
(GET_MODE_BITSIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
* calls.c (shift_return_value): Treat GET_MODE_BITSIZE as polynomial.
* combine.c (make_extraction): Likewise.
* dse.c (find_shift_sequence): Likewise.
* dwarf2out.c (mem_loc_descriptor): Likewise.
* expmed.c (store_integral_bit_field, extract_bit_field_1): Likewise.
(extract_bit_field, extract_low_bits): Likewise.
* expr.c (convert_move, convert_modes, emit_move_insn_1): Likewise.
(optimize_bitfield_assignment_op, expand_assignment): Likewise.
(store_expr_with_bounds, store_field, expand_expr_real_1): Likewise.
* fold-const.c (optimize_bit_field_compare, merge_ranges): Likewise.
* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
* reload.c (find_reloads): Likewise.
* reload1.c (alter_reg): Likewise.
* stor-layout.c (bitwise_mode_for_mode, compute_record_mode): Likewise.
* targhooks.c (default_secondary_memory_needed_mode): Likewise.
* tree-if-conv.c (predicate_mem_writes): Likewise.
* tree-ssa-strlen.c (handle_builtin_memcmp): Likewise.
* tree-vect-patterns.c (adjust_bool_pattern): Likewise.
* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
* valtrack.c (dead_debug_insert_temp): Likewise.
* varasm.c (mergeable_constant_section): Likewise.
* config/sh/sh.h (LOCAL_ALIGNMENT): Use as_a <fixed_size_mode>.
gcc/ada/
* gcc-interface/misc.c (enumerate_modes): Treat GET_MODE_BITSIZE
as polynomial.
gcc/c-family/
* c-ubsan.c (ubsan_instrument_shift): Treat GET_MODE_BITSIZE
as polynomial.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256200
|
|
This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64. The value is
encoded in the 10-bit precision field and was previously always stored
as a simple log2 value. The challenge was to use these 10 bits to
encode the number of elements in variable-length vectors, so that
we didn't need to increase the size of the tree.
In practice the number of vector elements should always have the form
N + N * X (where X is the runtime value), and as for constant-length
vectors, N must be a power of 2 (even though X itself might not be).
The patch therefore uses the low 8 bits to encode log2(N) and bit
8 to select between constant-length and variable-length vectors.
Targets without variable-length vectors continue to use the old scheme.
A new valid_vector_subparts_p function tests whether a given number
of elements can be encoded. This is false for the vector modes that
represent an LD3 or ST3 vector triple (which we want to treat as arrays
of vectors rather than single vectors).
Most of the patch is mechanical; previous patches handled the changes
that weren't entirely straightforward.
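A standalone model of the encoding (an illustration of the scheme
described above, not the GCC source):

  #include <cstdint>
  #include <cassert>

  /* Element count in the form N + N*X; coeff1 is 0 for constant-length
     vectors and equal to coeff0 otherwise.  */
  struct subparts { uint64_t coeff0, coeff1; };

  static uint16_t
  encode_subparts (subparts p)
  {
    assert (p.coeff0 && (p.coeff0 & (p.coeff0 - 1)) == 0); /* N = 2^k */
    assert (p.coeff1 == 0 || p.coeff1 == p.coeff0);        /* N + N*X */
    uint16_t log2n = 0;
    while ((uint64_t (1) << log2n) < p.coeff0)
      ++log2n;
    return log2n | (p.coeff1 ? uint16_t (1 << 8) : 0);
  }

  static subparts
  decode_subparts (uint16_t field)
  {
    uint64_t n = uint64_t (1) << (field & 0xff);
    return { n, (field & (1 << 8)) ? n : 0 };
  }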
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree.h (TYPE_VECTOR_SUBPARTS): Turn into a function and handle
polynomial numbers of units.
(SET_TYPE_VECTOR_SUBPARTS): Likewise.
(valid_vector_subparts_p): New function.
(build_vector_type): Remove temporary shim and take the number
of units as a poly_uint64 rather than an int.
(build_opaque_vector_type): Take the number of units as a
poly_uint64 rather than an int.
* tree.c (build_vector_from_ctor): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(type_hash_canon_hash, type_cache_hasher::equal): Likewise.
(uniform_vector_p, vector_type_mode, build_vector): Likewise.
(build_vector_from_val): If the number of units is variable,
use build_vec_duplicate_cst for constant operands and
VEC_DUPLICATE_EXPR otherwise.
(make_vector_type): Remove temporary is_constant ().
(build_vector_type, build_opaque_vector_type): Take the number of
units as a poly_uint64 rather than an int.
(check_vector_cst): Handle polynomial TYPE_VECTOR_SUBPARTS and
VECTOR_CST_NELTS.
* cfgexpand.c (expand_debug_expr): Likewise.
* expr.c (count_type_elements, categorize_ctor_elements_1): Likewise.
(store_constructor, expand_expr_real_1): Likewise.
(const_scalar_mask_from_tree): Likewise.
* fold-const-call.c (fold_const_reduction): Likewise.
* fold-const.c (const_binop, const_unop, fold_convert_const): Likewise.
(operand_equal_p, fold_vec_perm, fold_ternary_loc): Likewise.
(native_encode_vector, vec_cst_ctor_to_array): Likewise.
(fold_relational_const): Likewise.
(native_interpret_vector): Likewise. Change the size from an
int to an unsigned int.
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(gimple_fold_indirect_ref, gimple_build_vector): Likewise.
(gimple_build_vector_from_val): Use VEC_DUPLICATE_EXPR when
duplicating a non-constant operand into a variable-length vector.
* hsa-brig.c (hsa_op_immed::emit_to_buffer): Handle polynomial
TYPE_VECTOR_SUBPARTS and VECTOR_CST_NELTS.
* ipa-icf.c (sem_variable::equals): Likewise.
* match.pd: Likewise.
* omp-simd-clone.c (simd_clone_subparts): Likewise.
* print-tree.c (print_node): Likewise.
* stor-layout.c (layout_type): Likewise.
* targhooks.c (default_builtin_vectorization_cost): Likewise.
* tree-cfg.c (verify_gimple_comparison): Likewise.
(verify_gimple_assign_binary): Likewise.
(verify_gimple_assign_ternary): Likewise.
(verify_gimple_assign_single): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
(simplify_bitfield_ref, is_combined_permutation_identity): Likewise.
* tree-vect-data-refs.c (vect_permute_store_chain): Likewise.
(vect_grouped_load_supported, vect_permute_load_chain): Likewise.
(vect_shift_permute_load_chain): Likewise.
* tree-vect-generic.c (nunits_for_known_piecewise_op): Likewise.
(expand_vector_condition, optimize_vector_constructor): Likewise.
(lower_vec_perm, get_compute_type): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
(get_initial_defs_for_reduction, vect_transform_loop): Likewise.
* tree-vect-patterns.c (vect_recog_bool_pattern): Likewise.
(vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_supported_load_permutation_p): Likewise.
(vect_get_constant_vectors, vect_transform_slp_perm_load): Likewise.
* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
(get_group_load_store_type, vectorizable_mask_load_store): Likewise.
(vectorizable_bswap, simd_clone_subparts, vectorizable_assignment)
(vectorizable_shift, vectorizable_operation, vectorizable_store)
(vectorizable_load, vect_is_simple_cond, vectorizable_comparison)
(supportable_widening_operation): Likewise.
(supportable_narrowing_operation): Likewise.
* tree-vector-builder.c (tree_vector_builder::binary_encoded_nelts):
Likewise.
* varasm.c (output_constant): Likewise.
gcc/ada/
* gcc-interface/utils.c (gnat_types_compatible_p): Handle
polynomial TYPE_VECTOR_SUBPARTS.
gcc/brig/
* brigfrontend/brig-to-generic.cc (get_unsigned_int_type): Handle
polynomial TYPE_VECTOR_SUBPARTS.
* brigfrontend/brig-util.h (gccbrig_type_vector_subparts): Likewise.
gcc/c-family/
* c-common.c (vector_types_convertible_p, c_build_vec_perm_expr)
(convert_vector_to_array_for_subscript): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(c_common_type_for_mode): Check valid_vector_subparts_p.
* c-pretty-print.c (pp_c_initializer_list): Handle polynomial
VECTOR_CST_NELTS.
gcc/c/
* c-typeck.c (comptypes_internal, build_binary_op): Handle polynomial
TYPE_VECTOR_SUBPARTS.
gcc/cp/
* constexpr.c (cxx_eval_array_reference): Handle polynomial
VECTOR_CST_NELTS.
(cxx_fold_indirect_ref): Handle polynomial TYPE_VECTOR_SUBPARTS.
* call.c (build_conditional_expr_1): Likewise.
* decl.c (cp_finish_decomp): Likewise.
* mangle.c (write_type): Likewise.
* typeck.c (structural_comptypes): Likewise.
(cp_build_binary_op): Likewise.
* typeck2.c (process_init_constructor_array): Likewise.
gcc/fortran/
* trans-types.c (gfc_type_for_mode): Check valid_vector_subparts_p.
gcc/lto/
* lto-lang.c (lto_type_for_mode): Check valid_vector_subparts_p.
* lto.c (hash_canonical_type): Handle polynomial TYPE_VECTOR_SUBPARTS.
gcc/go/
* go-lang.c (go_langhook_type_for_mode): Check valid_vector_subparts_p.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256197
|
|
From-SVN: r256169
|
|
This patch changes the number of elements in a vector being built
by a vector_builder from unsigned int to poly_uint64. The case
in which it isn't a constant is the one that motivated adding
the vector encoding in the first place.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* vector-builder.h (vector_builder::m_full_nelts): Change from
unsigned int to poly_uint64.
(vector_builder::full_nelts): Update prototype accordingly.
(vector_builder::new_vector): Likewise.
(vector_builder::encoded_full_vector_p): Handle polynomial full_nelts.
(vector_builder::operator ==): Likewise.
(vector_builder::finalize): Likewise.
* int-vector-builder.h (int_vector_builder::int_vector_builder):
Take the number of elements as a poly_uint64 rather than an
unsigned int.
* vec-perm-indices.h (vec_perm_indices::m_nelts_per_input): Change
from unsigned int to poly_uint64.
(vec_perm_indices::vec_perm_indices): Update prototype accordingly.
(vec_perm_indices::new_vector): Likewise.
(vec_perm_indices::length): Likewise.
(vec_perm_indices::nelts_per_input): Likewise.
(vec_perm_indices::input_nelts): Likewise.
* vec-perm-indices.c (vec_perm_indices::new_vector): Take the
number of elements per input as a poly_uint64 rather than an
unsigned int. Use the original encoding for variable-length
vectors, rather than clamping each individual element.
For the second and subsequent elements in each pattern,
clamp the step and base before clamping their sum.
(vec_perm_indices::series_p): Handle polynomial element counts.
(vec_perm_indices::all_in_range_p): Likewise.
(vec_perm_indices_to_tree): Likewise.
(vec_perm_indices_to_rtx): Likewise.
* tree-vect-stmts.c (vect_gen_perm_mask_any): Likewise.
* tree-vector-builder.c (tree_vector_builder::new_unary_operation)
(tree_vector_builder::new_binary_operation): Handle polynomial
element counts. Return false if we need to know the number
of elements at compile time.
* fold-const.c (fold_vec_perm): Punt if the number of elements
isn't known at compile time.
From-SVN: r256165
|
|
This patch makes vectorizable_conversion cope with variable-length
vectors. We already require the number of elements in one vector
to be a multiple of the number of elements in the other vector,
so the patch uses that to choose between widening and narrowing.
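A hedged sketch of the selection logic, close to what
vectorizable_conversion now does:

  if (known_eq (nunits_out, nunits_in))
    modifier = NONE;
  else if (multiple_p (nunits_out, nunits_in))
    modifier = NARROW;
  else
    {
      gcc_checking_assert (multiple_p (nunits_in, nunits_out));
      modifier = WIDEN;
    }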
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-stmts.c (vectorizable_conversion): Treat the number
of units as polynomial. Choose between WIDE and NARROW based
on multiple_p.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256139
|
|
This patch makes vectorizable_simd_clone_call cope with variable-length
vectors. For now we don't support SIMD clones for variable-length
vectors; this will be post GCC 8 material.
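The helper simply insists on a constant element count, which holds
whenever a SIMD clone is actually usable; a sketch of its eventual
form once TYPE_VECTOR_SUBPARTS becomes polynomial:

  static unsigned int
  simd_clone_subparts (tree vectype)
  {
    return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
  }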
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-stmts.c (simd_clone_subparts): New function.
(vectorizable_simd_clone_call): Use it instead of TYPE_VECTOR_SUBPARTS.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256138
|
|
This patch makes vectorizable_call handle variable-length vectors.
The only substantial change is to use build_index_vector for
IFN_GOMP_SIMD_LANE; this makes no functional difference for
fixed-length vectors.
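With build_index_vector the {0, 1, 2, ...} constant no longer needs
each element spelled out, so it also works when the element count is
not known at compile time. A hedged sketch of the IFN_GOMP_SIMD_LANE
case:

  /* Build the index vector {0, 1, 2, ...} with base 0 and step 1.  */
  tree cst = build_index_vector (vectype_out, 0, 1);
  tree new_var = vect_get_new_ssa_name (vectype_out, vect_simple_var, "cst_");
  gimple *init_stmt = gimple_build_assign (new_var, cst);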
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-stmts.c (vectorizable_call): Treat the number of
vectors as polynomial. Use build_index_vector for
IFN_GOMP_SIMD_LANE.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256137
|
|
This patch makes vectorizable_load and vectorizable_store cope with
variable-length vectors. The reverse and permute cases will be
excluded by the code that checks the permutation mask (although a
patch after the main SVE submission adds support for the reversed
case). Here we also need to exclude VMAT_ELEMENTWISE and
VMAT_STRIDED_SLP, which split the operation up into a constant
number of constant-sized operations. We also don't try to extend
the current widening gather/scatter support to variable-length
vectors, since SVE uses a different approach.
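A hedged sketch of the new rejection in get_load_store_type (dump text
approximate):

  if ((*memory_access_type == VMAT_ELEMENTWISE
       || *memory_access_type == VMAT_STRIDED_SLP)
      && !nunits.is_constant ())
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "Not using elementwise accesses due to variable "
                         "vectorization factor.\n");
      return false;
    }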
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vect-stmts.c (get_load_store_type): Treat the number of
units as polynomial. Reject VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
for variable-length vectors.
(vectorizable_mask_load_store): Treat the number of units as
polynomial, asserting that it is constant if the condition has
already been enforced.
(vectorizable_store, vectorizable_load): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256136
|
|
This patch changes the type of current_vector_size to poly_uint64.
It also changes TARGET_AUTOVECTORIZE_VECTOR_SIZES so that it fills
in a vector of possible sizes (as poly_uint64s) instead of returning
a bitmask. The documentation claimed that the hook didn't need to
include the default vector size (returned by preferred_simd_mode),
but that wasn't consistent with the omp-low.c usage.
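Under the new interface a hook pushes each supported size, including
the default; a hedged sketch with made-up sizes and no real target's
conditions:

  static void
  example_autovectorize_vector_sizes (vector_sizes *sizes)
  {
    /* The default size (the one preferred_simd_mode implies) must now
       be included explicitly.  */
    sizes->safe_push (32);
    sizes->safe_push (16);
  }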
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* target.h (vector_sizes, auto_vector_sizes): New typedefs.
* target.def (autovectorize_vector_sizes): Return the vector sizes
by pointer, using vector_sizes rather than a bitmask.
* targhooks.h (default_autovectorize_vector_sizes): Update accordingly.
* targhooks.c (default_autovectorize_vector_sizes): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
Likewise.
* config/arc/arc.c (arc_autovectorize_vector_sizes): Likewise.
* config/arm/arm.c (arm_autovectorize_vector_sizes): Likewise.
* config/i386/i386.c (ix86_autovectorize_vector_sizes): Likewise.
* config/mips/mips.c (mips_autovectorize_vector_sizes): Likewise.
* omp-general.c (omp_max_vf): Likewise.
* omp-low.c (omp_clause_aligned_alignment): Likewise.
* optabs-query.c (can_vec_mask_load_store_p): Likewise.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
* tree-vect-slp.c (vect_slp_bb): Likewise.
* doc/tm.texi: Regenerate.
* tree-vectorizer.h (current_vector_size): Change from an unsigned int
to a poly_uint64.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Take
the vector size as a poly_uint64 rather than an unsigned int.
(current_vector_size): Change from an unsigned int to a poly_uint64.
(get_vectype_for_scalar_type): Update accordingly.
* tree.h (build_truth_vector_type): Take the size and number of
units as a poly_uint64 rather than an unsigned int.
(build_vector_type): Add a temporary overload that takes
the number of units as a poly_uint64 rather than an unsigned int.
* tree.c (make_vector_type): Likewise.
(build_truth_vector_type): Take the number of units as a poly_uint64
rather than an unsigned int.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256131
|
|
This patch adds a function for getting the number of elements in
a vector for cost purposes, which is always constant. It makes
it possible for a later patch to change GET_MODE_NUNITS and
TYPE_VECTOR_SUBPARTS to a poly_int.
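A sketch of the helper's eventual form once TYPE_VECTOR_SUBPARTS is
polynomial: it applies a compile-time estimate so that fixed-length
and variable-length candidates cost out on a comparable scale.

  static inline unsigned int
  vect_nunits_for_cost (tree vec_type)
  {
    return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vec_type));
  }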
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (vect_nunits_for_cost): New function.
* tree-vect-loop.c (vect_model_reduction_cost): Use it.
* tree-vect-slp.c (vect_analyze_slp_cost_1): Likewise.
(vect_analyze_slp_cost): Likewise.
* tree-vect-stmts.c (vect_model_store_cost): Likewise.
(vect_model_load_cost): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256128
|
|
This patch changes the type of the vectorisation factor and SLP
unrolling factor to poly_uint64. This in turn required some knock-on
changes in signedness elsewhere.
Cost decisions are generally based on estimated_poly_value,
which for VF is wrapped up as vect_vf_for_cost.
The patch doesn't on its own enable variable-length vectorisation.
It just makes the minimum changes necessary for the code to build
with the new VF and UF types. Later patches also make the
vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable
GET_MODE_NUNITS, at which point the code really does handle
variable-length vectors.
The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX,
to avoid hard-coding a particular architectural limit.
The patch includes a new test because a development version of the patch
accidentally used file print routines instead of dump_*, which would
fail with -fopt-info.
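For reference, a sketch of the VF wrapper mentioned above, close to
its definition in tree-vectorizer.h:

  /* Estimate the number of scalar iterations per vector iteration,
     for use in cost calculations only.  */
  static inline unsigned int
  vect_vf_for_cost (loop_vec_info loop_vinfo)
  {
    return estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo));
  }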
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-vectorizer.h (_slp_instance::unrolling_factor): Change
from an unsigned int to a poly_uint64.
(_loop_vec_info::slp_unrolling_factor): Likewise.
(_loop_vec_info::vectorization_factor): Change from an int
to a poly_uint64.
(MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX.
(vect_get_num_vectors): New function.
(vect_update_max_nunits, vect_vf_for_cost): Likewise.
(vect_get_num_copies): Use vect_get_num_vectors.
(vect_analyze_data_ref_dependences): Change max_vf from an int *
to an unsigned int *.
(vect_analyze_data_refs): Change min_vf from an int * to a
poly_uint64 *.
(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
than an unsigned HOST_WIDE_INT.
* tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr)
(vect_analyze_data_ref_dependence): Change max_vf from an int *
to an unsigned int *.
(vect_analyze_data_ref_dependences): Likewise.
(vect_compute_data_ref_alignment): Handle polynomial vf.
(vect_enhance_data_refs_alignment): Likewise.
(vect_prune_runtime_alias_test_list): Likewise.
(vect_shift_permute_load_chain): Likewise.
(vect_supportable_dr_alignment): Likewise.
(dependence_distance_ge_vf): Take the vectorization factor as a
poly_uint64 rather than an unsigned HOST_WIDE_INT.
(vect_analyze_data_refs): Change min_vf from an int * to a
poly_uint64 *.
* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take
vfm1 as a poly_uint64 rather than an int. Make the same change
for the returned bound_scalar.
(vect_gen_vector_loop_niters): Handle polynomial vf.
(vect_do_peeling): Likewise. Update call to
vect_gen_scalar_loop_niters and handle polynomial bound_scalars.
(vect_gen_vector_loop_niters_mult_vf): Assert that the vf must
be constant.
* tree-vect-loop.c (vect_determine_vectorization_factor)
(vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf.
(vect_get_known_peeling_cost): Likewise.
(vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise.
(vect_worthwhile_without_simd_p, vectorizable_induction): Likewise.
(vect_transform_loop): Likewise. Use the lowest possible VF when
updating the upper bounds of the loop.
(vect_min_worthwhile_factor): Make static. Return an unsigned int
rather than an int.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with
polynomial unroll factors.
(vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise.
(vect_make_slp_decision): Likewise.
(vect_supported_load_permutation_p): Likewise, and polynomial
vf too.
(vect_analyze_slp_cost): Handle polynomial vf.
(vect_slp_analyze_node_operations): Likewise.
(vect_slp_analyze_bb_1): Likewise.
(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
than an unsigned HOST_WIDE_INT.
* tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store)
(vectorizable_load): Handle polynomial vf.
* tree-vectorizer.c (simduid_to_vf::vf): Change from an int to
a poly_uint64.
(adjust_simduid_builtins, shrink_simd_arrays): Update accordingly.
gcc/testsuite/
* gcc.dg/vect-opt-info-1.c: New test.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256126
|