path: root/gcc/tree-vect-stmts.c
Age  Author (files changed, lines -removed/+added)  Commit message
2018-06-01  Richard Biener (1 file changed, -21/+8): tree-vectorizer.h (vect_dr_stmt): New function.
2018-06-01 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vect_dr_stmt): New function. (vect_get_load_cost): Adjust. (vect_get_store_cost): Likewise. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Use vect_dr_stmt instead of DR_SMTT. (vect_record_base_alignments): Likewise. (vect_calculate_target_alignment): Likewise. (vect_compute_data_ref_alignment): Likewise and make static. (vect_update_misalignment_for_peel): Likewise. (vect_verify_datarefs_alignment): Likewise. (vector_alignment_reachable_p): Likewise. (vect_get_data_access_cost): Likewise. Pass down vinfo to vect_get_load_cost/vect_get_store_cost instead of DR. (vect_get_peeling_costs_all_drs): Likewise. (vect_peeling_hash_get_lowest_cost): Likewise. (vect_enhance_data_refs_alignment): Likewise. (vect_find_same_alignment_drs): Likewise. (vect_analyze_data_refs_alignment): Likewise. (vect_analyze_group_access_1): Likewise. (vect_analyze_group_access): Likewise. (vect_analyze_data_ref_access): Likewise. (vect_analyze_data_ref_accesses): Likewise. (vect_vfa_segment_size): Likewise. (vect_small_gap_p): Likewise. (vectorizable_with_step_bound_p): Likewise. (vect_prune_runtime_alias_test_list): Likewise. (vect_analyze_data_refs): Likewise. (vect_supportable_dr_alignment): Likewise. * tree-vect-loop-manip.c (get_misalign_in_elems): Likewise. (vect_gen_prolog_loop_niters): Likewise. * tree-vect-loop.c (vect_analyze_loop_2): Likewise. * tree-vect-patterns.c (vect_recog_bool_pattern): Do not modify DR_STMT. (vect_recog_mask_conversion_pattern): Likewise. (vect_try_gather_scatter_pattern): Likewise. * tree-vect-stmts.c (vect_model_store_cost): Pass stmt_info to vect_get_store_cost. (vect_get_store_cost): Get stmt_info instead of DR. (vect_model_load_cost): Pass stmt_info to vect_get_load_cost. (vect_get_load_cost): Get stmt_info instead of DR. From-SVN: r261062
2018-05-29  Jakub Jelinek (1 file changed, -13/+14): re PR target/85918 (Conversions to/from [unsigned] long long are not
vectorized for AVX512DQ target) PR target/85918 * tree.def (VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR, VEC_PACK_FLOAT_EXPR): New tree codes. * tree-pretty-print.c (op_code_prio): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR and VEC_UNPACK_FIX_TRUNC_LO_EXPR. (dump_generic_node): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR and VEC_PACK_FLOAT_EXPR. * tree-inline.c (estimate_operator_cost): Likewise. * gimple-pretty-print.c (dump_binary_rhs): Handle VEC_PACK_FLOAT_EXPR. * fold-const.c (const_binop): Likewise. (const_unop): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR and VEC_UNPACK_FIX_TRUNC_LO_EXPR. * tree-cfg.c (verify_gimple_assign_unary): Likewise. (verify_gimple_assign_binary): Handle VEC_PACK_FLOAT_EXPR. * cfgexpand.c (expand_debug_expr): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR and VEC_PACK_FLOAT_EXPR. * expr.c (expand_expr_real_2): Likewise. * optabs.def (vec_packs_float_optab, vec_packu_float_optab, vec_unpack_sfix_trunc_hi_optab, vec_unpack_sfix_trunc_lo_optab, vec_unpack_ufix_trunc_hi_optab, vec_unpack_ufix_trunc_lo_optab): New optabs. * optabs.c (expand_widen_pattern_expr): For VEC_UNPACK_FIX_TRUNC_HI_EXPR and VEC_UNPACK_FIX_TRUNC_LO_EXPR use sign from result type rather than operand's type. (expand_binop_directly): For vec_packu_float_optab and vec_packs_float_optab allow result type to be different from operand's type. * optabs-tree.c (optab_for_tree_code): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR and VEC_PACK_FLOAT_EXPR. Formatting fixes. * tree-vect-generic.c (expand_vector_operations_1): Handle VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR and VEC_PACK_FLOAT_EXPR. * tree-vect-stmts.c (supportable_widening_operation): Handle FIX_TRUNC_EXPR. (supportable_narrowing_operation): Handle FLOAT_EXPR. * config/i386/i386.md (fixprefix, floatprefix): New code attributes. * config/i386/sse.md (*float<floatunssuffix>v2div2sf2): Rename to ... (float<floatunssuffix>v2div2sf2): ... this. Formatting fix. (vpckfloat_concat_mode, vpckfloat_temp_mode, vpckfloat_op_mode): New mode attributes. (vec_pack<floatprefix>_float_<mode>): New expander. (vunpckfixt_mode, vunpckfixt_model, vunpckfixt_extract_mode): New mode attributes. (vec_unpack_<fixprefix>fix_trunc_lo_<mode>, vec_unpack_<fixprefix>fix_trunc_hi_<mode>): New expanders. * doc/md.texi (vec_packs_float_@var{m}, vec_packu_float_@var{m}, vec_unpack_sfix_trunc_hi_@var{m}, vec_unpack_sfix_trunc_lo_@var{m}, vec_unpack_ufix_trunc_hi_@var{m}, vec_unpack_ufix_trunc_lo_@var{m}): Document. * doc/generic.texi (VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR): Fix pasto in description. (VEC_UNPACK_FIX_TRUNC_HI_EXPR, VEC_UNPACK_FIX_TRUNC_LO_EXPR, VEC_PACK_FLOAT_EXPR): Document. * gcc.target/i386/avx512dq-pr85918.c: Add -mprefer-vector-width=512 and -fno-vect-cost-model options. Add aligned(64) attribute to the arrays. Add suffix 1 to all functions and use 4 iterations rather than N. Add functions with conversions to and from float. Add new set of functions with 8 iterations and another one with 16 iterations, expect 24 vectorized loops instead of just 4. * gcc.target/i386/avx512dq-pr85918-2.c: New test. From-SVN: r260893
2018-05-29  Richard Biener (1 file changed, -10/+9): tree-vectorizer.h (struct vec_info): Add stmt_vec_infos member.
2018-05-29 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (struct vec_info): Add stmt_vec_infos member. (stmt_vec_info_vec): Make pointer. (init_stmt_vec_info_vec): Remove. (free_stmt_vec_info_vec): Likewise. (set_stmt_vec_info_vec): New function. (free_stmt_vec_infos): Likewise. (vinfo_for_stmt): Adjust for stmt_vec_info_vec indirection. (set_vinfo_for_stmt): Likewise. (get_earlier_stmt): Likewise. (get_later_stmt): Likewise. * tree-vectorizer.c (stmt_vec_info_vec): Make pointer. (vec_info::vec_info): Allocate stmt_vec_infos and set the global. (vec_info::~vec_info): Free stmt_vec_infos. (vectorize_loops): Set the global stmt_vec_info_vec to NULL. Remove old init_stmt_vec_info_vec/free_stmt_vec_info_vec calls. (pass_slp_vectorize::execute): Likewise. * tree-vect-stmts.c (init_stmt_vec_info_vec): Remove. (free_stmt_vec_info_vec): Likewise. (set_stmt_vec_info_vec): New function. (free_stmt_vec_infos): Likewise. * tree-vect-loop.c (_loop_vec_info::~_loop_vec_info): Set the global stmt_vec_info_vec. * tree-parloops.c (gather_scalar_reductions): Use set_stmt_vec_info_vec/free_stmt_vec_infos and maintain a local vector. From-SVN: r260892
2018-05-25  Richard Biener (1 file changed, -54/+54): tree-vectorizer.h (STMT_VINFO_GROUP_*, GROUP_*): Remove.
2018-05-25 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (STMT_VINFO_GROUP_*, GROUP_*): Remove. (DR_GROUP_*): New, assert we have non-NULL ->data_ref_info. (REDUC_GROUP_*): New, assert we have NULL ->data_ref_info. (STMT_VINFO_GROUPED_ACCESS): Adjust. * tree-vect-data-refs.c (everywhere): Adjust users. * tree-vect-loop.c (everywhere): Likewise. * tree-vect-slp.c (everywhere): Likewise. * tree-vect-stmts.c (everywhere): Likewise. * tree-vect-patterns.c (vect_reassociating_reduction_p): Likewise. From-SVN: r260709
2018-05-22  Richard Biener (1 file changed, -2/+2): re PR tree-optimization/85863 (ICE in compiling spec2006 fortran test case
solib.fppized.f starting with r260283) 2018-05-22 Richard Biener <rguenther@suse.de> PR tree-optimization/85863 * tree-vect-stmts.c (vect_is_simple_cond): Only widen invariant comparisons when vectype is specified. (vectorizable_condition): Do not specify vectype for vect_is_simple_cond when SLP vectorizing. * gfortran.fortran-torture/compile/pr85863.f: New testcase. From-SVN: r260501
2018-05-17  Bin Cheng (1 file changed, -0/+4): re PR tree-optimization/85793 ([AARCH64] ICE in verify_gimple during GIMPLE
pass vect.) PR tree-optimization/85793 * tree-vect-stmts.c (vectorizable_load): Handle 1 element-wise load for VMAT_ELEMENTWISE. gcc/testsuite * gcc.dg/vect/pr85793.c: New test. Co-Authored-By: Richard Biener <rguenther@suse.de> From-SVN: r260317
2018-05-16  Richard Biener (1 file changed, -162/+309): tree-vectorizer.h (struct stmt_info_for_cost): Add where member.
2018-05-16 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (struct stmt_info_for_cost): Add where member. (dump_stmt_cost): Declare. (add_stmt_cost): Dump cost we add. (add_stmt_costs): New function. (vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost): No longer exported. (vect_analyze_stmt): Adjust prototype. (vectorizable_condition): Likewise. (vectorizable_live_operation): Likewise. (vectorizable_reduction): Likewise. (vectorizable_induction): Likewise. * tree-vect-loop.c (vect_analyze_loop_operations): Create local cost vector to pass to vectorizable_ and record afterwards. (vect_model_reduction_cost): Take cost vector argument and adjust. (vect_model_induction_cost): Likewise. (vectorizable_reduction): Likewise. (vectorizable_induction): Likewise. (vectorizable_live_operation): Likewise. * tree-vect-slp.c (vect_create_new_slp_node): Initialize SLP_TREE_NUMBER_OF_VEC_STMTS. (vect_analyze_slp_cost_1): Remove. (vect_analyze_slp_cost): Likewise. (vect_slp_analyze_node_operations): Take visited args and a target cost vector. Avoid processing already visited stmt sets. (vect_slp_analyze_operations): Use a local cost vector to gather costs and register those of non-discarded instances. (vect_bb_vectorization_profitable_p): Use add_stmt_costs. (vect_schedule_slp_instance): Remove copying of SLP_TREE_NUMBER_OF_VEC_STMTS. Instead assert that it is not zero. * tree-vect-stmts.c (record_stmt_cost): Remove path directly adding cost. Record cost entry location. (vect_prologue_cost_for_slp_op): Function to compute cost of a constant or invariant generated for SLP vect in the prologue, split out from vect_analyze_slp_cost_1. (vect_model_simple_cost): Make static. Adjust for SLP costing. (vect_model_promotion_demotion_cost): Likewise. (vect_model_store_cost): Likewise, make static. (vect_model_load_cost): Likewise. (vectorizable_bswap): Add cost vector arg and adjust. (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise. (vectorizable_comparison): Likewise. (can_vectorize_live_stmts): Likewise. (vect_analyze_stmt): Likewise. (vect_transform_stmt): Adjust calls to vectorizable_*. * tree-vectorizer.c: Include gimple-pretty-print.h. (dump_stmt_cost): New function. From-SVN: r260289
2018-05-16  Richard Sandiford (1 file changed, -0/+308): Handle vector boolean types when calculating the SLP unroll factor
The SLP unrolling factor is calculated by finding the smallest scalar type for each SLP statement and taking the number of required lanes from the vector versions of those scalar types. E.g. for an int32->int64 conversion, it's the vector of int32s rather than the vector of int64s that determines the unroll factor. We rely on tree-vect-patterns.c to replace boolean operations like: bool a, b, c; a = b & c; with integer operations of whatever the best size is in context. E.g. if b and c are fed by comparisons of ints, a, b and c will become the appropriate size for an int comparison. For most targets this means that a, b and c will end up as int-sized themselves, but on targets like SVE and AVX512 with packed vector booleans, they'll instead become a small bitfield like :1, padded to a byte for memory purposes. The SLP code would then take these scalar types and try to calculate the vector type for them, causing the unroll factor to be much higher than necessary. This patch tries to make the SLP code use the same approach as the loop vectorizer, by splitting out the code that calculates the statement vector type and the vector type that should be used for the number of units. 2018-05-16 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vectorizer.h (vect_get_vector_types_for_stmt): Declare. (vect_get_mask_type_for_stmt): Likewise. * tree-vect-slp.c (vect_two_operations_perm_ok_p): New function, split out from... (vect_build_slp_tree_1): ...here. Use vect_get_vector_types_for_stmt to determine the statement's vector type and the vector type that should be used for calculating nunits. Deal with cases in which the type has to be deferred. (vect_slp_analyze_node_operations): Use vect_get_vector_types_for_stmt and vect_get_mask_type_for_stmt to calculate STMT_VINFO_VECTYPE. * tree-vect-loop.c (vect_determine_vf_for_stmt_1) (vect_determine_vf_for_stmt): New functions, split out from... (vect_determine_vectorization_factor): ...here. * tree-vect-stmts.c (vect_get_vector_types_for_stmt) (vect_get_mask_type_for_stmt): New functions, split out from vect_determine_vectorization_factor. gcc/testsuite/ * gcc.target/aarch64/sve/vcond_10.c: New test. * gcc.target/aarch64/sve/vcond_10_run.c: Likewise. * gcc.target/aarch64/sve/vcond_11.c: Likewise. * gcc.target/aarch64/sve/vcond_11_run.c: Likewise. From-SVN: r260287
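As a rough illustration (an assumed example, not taken from the patch or its testcases), the problem shows up for loops whose stored results are booleans fed by integer comparisons; on SVE or AVX512 the "natural" scalar boolean type is a tiny mask bitfield, which previously skewed the unroll-factor calculation:

  /* Illustrative only: the comparison operands are 32-bit ints, but the
     boolean results have a much narrower scalar type on targets with
     packed vector booleans (SVE, AVX512).  */
  void
  f (unsigned char *restrict out, const int *restrict a,
     const int *restrict b, int n)
  {
    for (int i = 0; i < n; ++i)
      out[i] = (a[i] < b[i]) & (a[i] > 0);
  }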
2018-05-09  Richard Sandiford (1 file changed, -16/+28): Add clobbers around IFN_LOAD/STORE_LANES
We build up the input to IFN_STORE_LANES one vector at a time. In RTL, each of these vector assignments becomes a write to subregs of the form (subreg:VEC (reg:AGGR R)), where R is the eventual input to the store lanes instruction. The problem is that RTL isn't very good at tracking liveness when things are initialised piecemeal by subregs, so R tends to end up being live on all paths from the entry block to the store. This in turn leads to unnecessary spilling around calls, as well as to excess register pressure in vector loops. This patch adds gimple clobbers to indicate the liveness of the IFN_STORE_LANES variable and makes sure that gimple clobbers are expanded to rtl clobbers where useful. For consistency it also uses clobbers to mark the point at which an IFN_LOAD_LANES variable is no longer needed. 2018-05-08 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * cfgexpand.c (expand_clobber): New function. (expand_gimple_stmt_1): Use it. * tree-vect-stmts.c (vect_clobber_variable): New function, split out from... (vectorizable_simd_clone_call): ...here. (vectorizable_store): Emit a clobber either side of an IFN_STORE_LANES sequence. (vectorizable_load): Emit a clobber after an IFN_LOAD_LANES sequence. gcc/testsuite/ * gcc.target/aarch64/store_lane_spill_1.c: New test. * gcc.target/aarch64/sve/store_lane_spill_1.c: Likewise. From-SVN: r260073
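For context, a hedged sketch (not taken from the patch or its tests) of the kind of interleaved loop that is vectorised through IFN_LOAD_LANES/IFN_STORE_LANES (LD4/ST4 on AArch64), and therefore builds up the aggregate lanes variable whose liveness the new clobbers delimit:

  /* Illustrative only: a 4-way interleaved access that AArch64 can
     vectorise with LD4/ST4, i.e. IFN_LOAD_LANES and IFN_STORE_LANES.  */
  struct pixel { unsigned char r, g, b, a; };

  void
  scale (struct pixel *restrict dst, const struct pixel *restrict src,
         int n, int k)
  {
    for (int i = 0; i < n; ++i)
      {
        dst[i].r = src[i].r * k;
        dst[i].g = src[i].g * k;
        dst[i].b = src[i].b * k;
        dst[i].a = src[i].a;
      }
  }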
2018-05-02  Richard Biener (1 file changed, -4/+23): re PR tree-optimization/85597 (internal compiler error: in
compute_live_loop_exits, at tree-ssa-loop-manip.c:229) 2018-05-02 Richard Biener <rguenther@suse.de> PR tree-optimization/85597 * tree-vect-stmts.c (vectorizable_operation): For ternary SLP do not use split vect_get_vec_defs call but call vect_get_slp_defs directly. * gcc.dg/vect/pr85597.c: New testcase. From-SVN: r259840
2018-04-19  Richard Biener (1 file changed, -4/+10): re PR tree-optimization/84737 (20% degradation in CPU2000 172.mgrid starting
with r256888) 2018-04-19 Richard Biener <rguenther@suse.de> PR tree-optimization/84737 * tree-vect-data-refs.c (vect_copy_ref_info): New function copying restrict info. (vect_setup_realignment): Use it. * tree-vectorizer.h (vect_copy_ref_info): Declare. * tree-vect-stmts.c (vectorizable_store): Copy ref info from the first DR to all generated stores. (vectorizable_load): Likewise for loads. From-SVN: r259493
2018-03-02  Richard Sandiford (1 file changed, -16/+24): Avoid &LOOP_VINFO_MASKS for bb vectorisation (PR 84634)
We were computing &LOOP_VINFO_MASKS even for bb vectorisation, which is UB. 2018-03-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/84634 * tree-vect-stmts.c (vectorizable_store, vectorizable_load): Replace masks and masked_loop_p with a single loop_masks, making sure it's null for bb vectorization. From-SVN: r258131
2018-02-12  Richard Biener (1 file changed, -20/+29): re PR tree-optimization/84037 (Speed regression of polyhedron benchmark
since r256644) 2018-02-12 Richard Biener <rguenther@suse.de> PR tree-optimization/84037 * tree-vect-slp.c (vect_analyze_slp_cost): Add visited parameter, move visited init to caller. (vect_slp_analyze_operations): Separate cost from validity check, initialize visited once for all instances. (vect_schedule_slp): Analyze map to CSE vectorized nodes once for all instances. * tree-vect-stmts.c (vect_model_simple_cost): Make early out an assert. (vect_model_promotion_demotion_cost): Likewise. (vectorizable_bswap): Guard cost modeling with !slp_node instead of !PURE_SLP_STMT to avoid double-counting on hybrid SLP stmts. (vectorizable_call): Likewise. (vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise. (vectorizable_comparison): Likewise. From-SVN: r257588
2018-02-08  Richard Sandiford (1 file changed, -2/+4): Another fix for single-element permutes (PR 84265)
PR83753 was about a case in which we ended up trying to "vectorise" a group of loads or stores using single-element vectors. The problem was that we were classifying the load or store as VMAT_CONTIGUOUS_PERMUTE rather than VMAT_CONTIGUOUS, even though it doesn't make sense to permute a single-element vector. In that PR it was enough to change get_group_load_store_type, because vectorisation ended up being unprofitable and so we didn't take things further. But when vectorisation is profitable, the same fix is needed in vectorizable_load and vectorizable_store. 2018-02-08 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/84265 * tree-vect-stmts.c (vectorizable_store): Don't treat VMAT_CONTIGUOUS accesses as grouped. (vectorizable_load): Likewise. gcc/testsuite/ PR tree-optimization/84265 * gcc.dg/vect/pr84265.c: New test. From-SVN: r257492
2018-02-08  Richard Biener (1 file changed, -0/+4): re PR target/84278 (claims initv4sfv2sf is available but inits through stack)
2018-02-08 Richard Biener <rguenther@suse.de> PR tree-optimization/84278 * tree-vect-stmts.c (vectorizable_store): When looking for smaller vector types to perform grouped strided loads/stores make sure the mode is supported by the target. (vectorizable_load): Likewise. * gcc.target/i386/pr84278.c: New testcase. From-SVN: r257483
2018-02-07  Richard Biener (1 file changed, -18/+26): re PR tree-optimization/84037 (Speed regression of polyhedron benchmark
since r256644) 2018-02-07 Richard Biener <rguenther@suse.de> PR tree-optimization/84037 * tree-vectorizer.h (struct _loop_vec_info): Add ivexpr_map member. (cse_and_gimplify_to_preheader): Declare. (vect_get_place_in_interleaving_chain): Likewise. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize ivexpr_map. (_loop_vec_info::~_loop_vec_info): Delete it. (cse_and_gimplify_to_preheader): New function. * tree-vect-slp.c (vect_get_place_in_interleaving_chain): Export. * tree-vect-stmts.c (vectorizable_store): CSE base and steps. (vectorizable_load): Likewise. For grouped stores always base the IV on the first element. * tree-vect-loop-manip.c (vect_loop_versioning): Unshare versioning condition before gimplifying. From-SVN: r257453
2018-01-29  Richard Biener (1 file changed, -1/+1): re PR bootstrap/80867 (gnat bootstrap broken on powerpc64le-linux-gnu with -O3)
gcc/ChangeLog: 2018-01-29 Richard Biener <rguenther@suse.de> Kelvin Nilsen <kelvin@gcc.gnu.org> PR bootstrap/80867 * tree-vect-stmts.c (vectorizable_call): Don't call targetm.vectorize_builtin_md_vectorized_function if callee is NULL. Co-Authored-By: Kelvin Nilsen <kelvin@gcc.gnu.org> From-SVN: r257158
2018-01-20  Richard Sandiford (1 file changed, -55/+44): Fix vect_def_type handling in x86 scatter support (PR 83940)
As Jakub says in the PR, the problem here was that the x86/built-in version of the scatter support was using a bogus scatter_src_dt when calling vect_get_vec_def_for_stmt_copy (and had since it was added). The patch uses the vect_def_type from the original call to vect_is_simple_use instead. However, Jakub also pointed out that other parts of the load and store code passed the vector operand rather than the scalar operand to vect_is_simple_use. That probably works most of the time since a constant scalar operand should give a constant vector operand, and likewise for external and internal definitions. But it definitely seems more robust to pass the scalar operand. The patch avoids the issue for gather and scatter offsets by using the cached gs_info.offset_dt. This is safe because gathers and scatters are never grouped, so there's only one statement operand to consider. The patch also caches the vect_def_type for mask operands, which is safe because grouped masked operations share the same mask. That just leaves the store rhs. We still need to recalculate the vect_def_type there since different store values in the group can have different definition types. But since we still have access to the original scalar operand, it seems better to use that instead. 2018-01-20 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/83940 * tree-vect-stmts.c (vect_truncate_gather_scatter_offset): Set offset_dt to vect_constant_def rather than vect_unknown_def_type. (vect_check_load_store_mask): Add a mask_dt_out parameter and use it to pass back the definition type. (vect_check_store_rhs): Likewise rhs_dt_out. (vect_build_gather_load_calls): Add a mask_dt argument and use it instead of a call to vect_is_simple_use. (vectorizable_store): Update calls to vect_check_load_store_mask and vect_check_store_rhs. Use the dt returned by the latter instead of scatter_src_dt. Use the cached mask_dt and gs_info.offset_dt instead of calls to vect_is_simple_use. Pass the scalar rather than the vector operand to vect_is_simple_use when handling second and subsequent copies of an rhs value. (vectorizable_load): Update calls to vect_check_load_store_mask and vect_build_gather_load_calls. Use the cached mask_dt and gs_info.offset_dt instead of calls to vect_is_simple_use. gcc/testsuite/ PR tree-optimization/83940 * gcc.dg/torture/pr83940.c: New test. From-SVN: r256918
2018-01-16  Richard Biener (1 file changed, -3/+6): re PR tree-optimization/83867 (ICE: Segmentation fault in nested_in_vect_loop_p)
2018-01-16 Richard Biener <rguenther@suse.de> PR tree-optimization/83867 * tree-vect-stmts.c (vect_transform_stmt): Precompute nested_in_vect_loop_p since the scalar stmt may get invalidated. * gcc.dg/vect/pr83867.c: New testcase. From-SVN: r256746
2018-01-13  Richard Sandiford (1 file changed, -31/+97): Add support for SVE scatter stores
This is mostly a mechanical extension of the previous gather load support to scatter stores. The internal functions in this case are: IFN_SCATTER_STORE (base, offsets, scale, values) IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask) However, one nonobvious change is to vect_analyze_data_ref_access. If we're treating an access as a gather load or scatter store (i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code would create a dummy data_reference whose step is 0. There's not really much else it could do, since the whole point is that the step isn't predictable from iteration to iteration. We then went into this code in vect_analyze_data_ref_access: /* Allow loads with zero step in inner-loop vectorization. */ if (loop_vinfo && integer_zerop (step)) { GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; if (!nested_in_vect_loop_p (loop, stmt)) return DR_IS_READ (dr); I.e. we'd take the step literally and assume that this is a load or store to an invariant address. Loads from invariant addresses are supported but stores to them aren't. The code therefore had the effect of disabling all scatter stores. AFAICT this is true of AVX too: although tests like avx512f-scatter-1.c test for the correctness of a scatter-like loop, they don't seem to check whether a scatter instruction is actually used. The patch therefore makes vect_analyze_data_ref_access return true for scatters. We do seem to handle the aliasing correctly; that's tested by other functions, and is symmetrical to the already-working gather case. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/sourcebuild.texi (vect_scatter_store): Document. * optabs.def (scatter_store_optab, mask_scatter_store_optab): New optabs. * doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}): Document. * genopinit.c (main): Add supports_vec_scatter_store and supports_vec_scatter_store_cached to target_optabs. * gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE. * internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New internal functions. * internal-fn.h (internal_store_fn_p): Declare. (internal_fn_stored_value_index): Likewise. * internal-fn.c (scatter_store_direct): New macro. (expand_scatter_store_optab_fn): New function. (direct_scatter_store_optab_supported_p): New macro. (internal_store_fn_p): New function. (internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and IFN_MASK_SCATTER_STORE. (internal_fn_mask_index): Likewise. (internal_fn_stored_value_index): New function. (internal_gather_scatter_fn_supported_p): Adjust operand numbers for scatter stores. * optabs-query.h (supports_vec_scatter_store_p): Declare. * optabs-query.c (supports_vec_scatter_store_p): New function. * tree-vectorizer.h (vect_get_store_rhs): Declare. * tree-vect-data-refs.c (vect_analyze_data_ref_access): Return true for scatter stores. (vect_gather_scatter_fn_p): Handle scatter stores too. (vect_check_gather_scatter): Consider using scatter stores if supports_vec_scatter_store_p. * tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle scatter stores too. * tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use internal_fn_stored_value_index. (check_load_store_masking): Handle scatter stores too. (vect_get_store_rhs): Make public. (vectorizable_call): Use internal_store_fn_p. (vectorizable_store): Handle scatter store internal functions. 
(vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE when deciding whether the end of the group has been reached. * config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec. * config/aarch64/aarch64-sve.md (scatter_store<mode>): New expander. (mask_scatter_store<mode>): New insns. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_scatter_store): New proc. * gcc.dg/vect/pr25413a.c: Expect both loops to be optimized on targets with scatter stores. * gcc.dg/vect/vect-71.c: Restrict XFAIL to targets without scatter stores. * gcc.target/aarch64/sve/mask_scatter_store_1.c: New test. * gcc.target/aarch64/sve/mask_scatter_store_2.c: Likewise. * gcc.target/aarch64/sve/scatter_store_1.c: Likewise. * gcc.target/aarch64/sve/scatter_store_2.c: Likewise. * gcc.target/aarch64/sve/scatter_store_3.c: Likewise. * gcc.target/aarch64/sve/scatter_store_4.c: Likewise. * gcc.target/aarch64/sve/scatter_store_5.c: Likewise. * gcc.target/aarch64/sve/scatter_store_6.c: Likewise. * gcc.target/aarch64/sve/scatter_store_7.c: Likewise. * gcc.target/aarch64/sve/strided_store_1.c: Likewise. * gcc.target/aarch64/sve/strided_store_2.c: Likewise. * gcc.target/aarch64/sve/strided_store_3.c: Likewise. * gcc.target/aarch64/sve/strided_store_4.c: Likewise. * gcc.target/aarch64/sve/strided_store_5.c: Likewise. * gcc.target/aarch64/sve/strided_store_6.c: Likewise. * gcc.target/aarch64/sve/strided_store_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256643
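A hedged example (modelled on, but not copied from, the new scatter_store tests) of a loop that the IFN_SCATTER_STORE path can now vectorise; previously the zero-step dummy data_reference caused vect_analyze_data_ref_access to reject all such stores:

  /* Illustrative only: an indexed store, vectorisable as IFN_SCATTER_STORE
     on SVE; if the store were conditional it would use
     IFN_MASK_SCATTER_STORE instead.  */
  void
  scatter (double *restrict dest, const double *restrict src,
           const int *restrict index, int n)
  {
    for (int i = 0; i < n; ++i)
      dest[index[i]] = src[i] + 1.0;
  }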
2018-01-13  Richard Sandiford (1 file changed, -6/+121): Allow gather loads to be used for grouped accesses
Following on from the previous patch for strided accesses, this patch allows gather loads to be used with grouped accesses, if we otherwise would need to fall back to VMAT_ELEMENTWISE. However, as the comment says, this is restricted to single-element groups for now: ??? Although the code can handle all group sizes correctly, it probably isn't a win to use separate strided accesses based on nearby locations. Or, even if it's a win over scalar code, it might not be a win over vectorizing at a lower VF, if that allows us to use contiguous accesses. Single-element groups are an important special case though, and this means that code is less sensitive to GCC's classification of single accesses with constant steps as "grouped" and ones with variable steps as "strided". 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vect_gather_scatter_fn_p): Declare. * tree-vect-data-refs.c (vect_gather_scatter_fn_p): Make public. * tree-vect-stmts.c (vect_truncate_gather_scatter_offset): New function. (vect_use_strided_gather_scatters_p): Take a masked_p argument. Use vect_truncate_gather_scatter_offset if we can't treat the operation as a normal gather load or scatter store. (get_group_load_store_type): Take the gather_scatter_info as argument. Try using a gather load or scatter store for single-element groups. (get_load_store_type): Update calls to get_group_load_store_type and vect_use_strided_gather_scatters_p. gcc/testsuite/ * gcc.target/aarch64/sve/reduc_strict_3.c: Expect FADDA to be used for double_reduc1. * gcc.target/aarch64/sve/strided_load_4.c: New test. * gcc.target/aarch64/sve/strided_load_5.c: Likewise. * gcc.target/aarch64/sve/strided_load_6.c: Likewise. * gcc.target/aarch64/sve/strided_load_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256642
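As an illustration (an assumed example, not taken from the patch), a single-element "grouped" access of the kind that previously fell back to VMAT_ELEMENTWISE and can now be implemented as a gather load:

  /* Illustrative only: a constant-step access whose step is too wide for
     LD2/LD3/LD4; it forms a single-element group and can now use a
     gather load rather than element-wise loads.  */
  void
  f (float *restrict dest, const float *restrict src, int n)
  {
    for (int i = 0; i < n; ++i)
      dest[i] = src[i * 7];
  }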
2018-01-13  Richard Sandiford (1 file changed, -11/+137): Use gather loads for strided accesses
This patch tries to use gather loads for strided accesses, rather than falling back to VMAT_ELEMENTWISE. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vect_create_data_ref_ptr): Take an extra optional tree argument. * tree-vect-data-refs.c (vect_check_gather_scatter): Check for null target hooks. (vect_create_data_ref_ptr): Take the iv_step as an optional argument, but continue to use the current value as a fallback. (bump_vector_ptr): Use operand_equal_p rather than tree_int_cst_compare to compare the updates. * tree-vect-stmts.c (vect_use_strided_gather_scatters_p): New function. (get_load_store_type): Use it when handling a strided access. (vect_get_strided_load_store_ops): New function. (vect_get_data_ptr_increment): Likewise. (vectorizable_load): Handle strided gather loads. Always pass a step to vect_create_data_ref_ptr and bump_vector_ptr. gcc/testsuite/ * gcc.target/aarch64/sve/strided_load_1.c: New test. * gcc.target/aarch64/sve/strided_load_2.c: Likewise. * gcc.target/aarch64/sve/strided_load_3.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256641
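A hedged sketch (not from the testsuite) of a strided access, i.e. one whose step is only known at run time, which this patch can now vectorise as a gather load with offsets i * stride:

  /* Illustrative only: the stride is a runtime value, so the access is
     classified as "strided"; it can now become a gather load instead of
     falling back to VMAT_ELEMENTWISE.  */
  void
  f (double *restrict dest, const double *restrict src, long stride, int n)
  {
    for (int i = 0; i < n; ++i)
      dest[i] += src[i * stride];
  }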
2018-01-13  Richard Sandiford (1 file changed, -26/+116): Add support for SVE gather loads
This patch adds support for SVE gather loads. It uses the basically the same analysis code as the AVX gather support, but after that there are two major differences: - It uses new internal functions rather than target built-ins. The interface is: IFN_GATHER_LOAD (base, offsets scale) IFN_MASK_GATHER_LOAD (base, offsets scale, mask) which should be reasonably generic. One of the advantages of using internal functions is that other passes can understand what the functions do, but a more immediate advantage is that we can query the underlying target pattern to see which scales it supports. - It uses pattern recognition to convert the offset to the right width, if it was originally narrower than that. This avoids having to do a widening operation as part of the gather expansion itself. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (gather_load@var{m}): Document. (mask_gather_load@var{m}): Likewise. * genopinit.c (main): Add supports_vec_gather_load and supports_vec_gather_load_cached to target_optabs. * optabs-tree.c (init_tree_optimization_optabs): Use ggc_cleared_alloc to allocate target_optabs. * optabs.def (gather_load_optab, mask_gather_laod_optab): New optabs. * internal-fn.def (GATHER_LOAD, MASK_GATHER_LOAD): New internal functions. * internal-fn.h (internal_load_fn_p): Declare. (internal_gather_scatter_fn_p): Likewise. (internal_fn_mask_index): Likewise. (internal_gather_scatter_fn_supported_p): Likewise. * internal-fn.c (gather_load_direct): New macro. (expand_gather_load_optab_fn): New function. (direct_gather_load_optab_supported_p): New macro. (direct_internal_fn_optab): New function. (internal_load_fn_p): Likewise. (internal_gather_scatter_fn_p): Likewise. (internal_fn_mask_index): Likewise. (internal_gather_scatter_fn_supported_p): Likewise. * optabs-query.c (supports_at_least_one_mode_p): New function. (supports_vec_gather_load_p): Likewise. * optabs-query.h (supports_vec_gather_load_p): Declare. * tree-vectorizer.h (gather_scatter_info): Add ifn, element_type and memory_type field. (NUM_PATTERNS): Bump to 15. * tree-vect-data-refs.c: Include internal-fn.h. (vect_gather_scatter_fn_p): New function. (vect_describe_gather_scatter_call): Likewise. (vect_check_gather_scatter): Try using internal functions for gather loads. Recognize existing calls to a gather load function. (vect_analyze_data_refs): Consider using gather loads if supports_vec_gather_load_p. * tree-vect-patterns.c (vect_get_load_store_mask): New function. (vect_get_gather_scatter_offset_type): Likewise. (vect_convert_mask_for_vectype): Likewise. (vect_add_conversion_to_patterm): Likewise. (vect_try_gather_scatter_pattern): Likewise. (vect_recog_gather_scatter_pattern): New pattern recognizer. (vect_vect_recog_func_ptrs): Add it. * tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use internal_fn_mask_index and internal_gather_scatter_fn_p. (check_load_store_masking): Take the gather_scatter_info as an argument and handle gather loads. (vect_get_gather_scatter_ops): New function. (vectorizable_call): Check internal_load_fn_p. (vectorizable_load): Likewise. Handle gather load internal functions. (vectorizable_store): Update call to check_load_store_masking. * config/aarch64/aarch64.md (UNSPEC_LD1_GATHER): New unspec. * config/aarch64/iterators.md (SVE_S, SVE_D): New mode iterators. * config/aarch64/predicates.md (aarch64_gather_scale_operand_w) (aarch64_gather_scale_operand_d): New predicates. 
* config/aarch64/aarch64-sve.md (gather_load<mode>): New expander. (mask_gather_load<mode>): New insns. gcc/testsuite/ * gcc.target/aarch64/sve/gather_load_1.c: New test. * gcc.target/aarch64/sve/gather_load_2.c: Likewise. * gcc.target/aarch64/sve/gather_load_3.c: Likewise. * gcc.target/aarch64/sve/gather_load_4.c: Likewise. * gcc.target/aarch64/sve/gather_load_5.c: Likewise. * gcc.target/aarch64/sve/gather_load_6.c: Likewise. * gcc.target/aarch64/sve/gather_load_7.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_1.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_2.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_3.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_4.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_5.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256640
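A hedged example (not taken from the patch) of indexed loads that map onto the new IFN_GATHER_LOAD (base, offsets, scale) interface, plus a conditional variant that maps onto IFN_MASK_GATHER_LOAD:

  /* Illustrative only: src[index[i]] becomes IFN_GATHER_LOAD on SVE;
     the conditional form becomes IFN_MASK_GATHER_LOAD.  */
  void
  gather (int *restrict dest, const int *restrict src,
          const int *restrict index, int n)
  {
    for (int i = 0; i < n; ++i)
      dest[i] = src[index[i]];
  }

  void
  gather_cond (int *restrict dest, const int *restrict src,
               const int *restrict index, int n)
  {
    for (int i = 0; i < n; ++i)
      if (index[i] >= 0)
        dest[i] = src[index[i]];
  }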
2018-01-13  Richard Sandiford (1 file changed, -1/+4): Allow single-element interleaving for non-power-of-2 strides
This allows LD3 to be used for isolated a[i * 3] accesses, in a similar way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively. Given the problems with the cost model underestimating the cost of elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE cases that are currently rejected. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-data-refs.c (vect_analyze_group_access_1): Allow single-element interleaving even if the size is not a power of 2. * tree-vect-stmts.c (get_load_store_type): Disallow elementwise accesses for single-element interleaving if the group size is not a power of 2. gcc/testsuite/ * gcc.target/aarch64/sve/struct_vect_18.c: New test. * gcc.target/aarch64/sve/struct_vect_18_run.c: Likewise. * gcc.target/aarch64/sve/struct_vect_19.c: Likewise. * gcc.target/aarch64/sve/struct_vect_19_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256634
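For example (an assumed sketch, not one of the new struct_vect tests), an isolated a[i * 3] access that can now be handled as single-element interleaving with LD3:

  /* Illustrative only: a single-element group with stride 3; it can now
     be loaded with LD3 (keeping one of the three result vectors) even
     though 3 is not a power of 2.  */
  void
  f (int *restrict dest, const int *restrict src, int n)
  {
    for (int i = 0; i < n; ++i)
      dest[i] = src[i * 3];
  }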
2018-01-13  Richard Sandiford (1 file changed, -28/+84): Add support for conditional reductions using SVE CLASTB
This patch uses SVE CLASTB to optimise conditional reductions. It means that we no longer need to maintain a separate index vector to record the most recent valid value, and no longer need to worry about overflow cases. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (fold_extract_last_@var{m}): Document. * doc/sourcebuild.texi (vect_fold_extract_last): Likewise. * optabs.def (fold_extract_last_optab): New optab. * internal-fn.def (FOLD_EXTRACT_LAST): New internal function. * internal-fn.c (fold_extract_direct): New macro. (expand_fold_extract_optab_fn): Likewise. (direct_fold_extract_optab_supported_p): Likewise. * tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type. * tree-vect-loop.c (vect_model_reduction_cost): Handle EXTRACT_LAST_REDUCTION. (get_initial_def_for_reduction): Do not create an initial vector for EXTRACT_LAST_REDUCTION reductions. (vectorizable_reduction): Leave the scalar phi in place for EXTRACT_LAST_REDUCTIONs. Try using EXTRACT_LAST_REDUCTION ahead of INTEGER_INDUC_COND_REDUCTION. Do not check for an epilogue code for EXTRACT_LAST_REDUCTION and defer the transform phase to vectorizable_condition. * tree-vect-stmts.c (vect_finish_stmt_generation_1): New function, split out from... (vect_finish_stmt_generation): ...here. (vect_finish_replace_stmt): New function. (vectorizable_condition): Handle EXTRACT_LAST_REDUCTION. * config/aarch64/aarch64-sve.md (fold_extract_last_<mode>): New pattern. * config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_fold_extract_last): New proc. * gcc.dg/vect/pr65947-1.c: Update dump messages. Add markup for fold_extract_last. * gcc.dg/vect/pr65947-2.c: Likewise. * gcc.dg/vect/pr65947-3.c: Likewise. * gcc.dg/vect/pr65947-4.c: Likewise. * gcc.dg/vect/pr65947-5.c: Likewise. * gcc.dg/vect/pr65947-6.c: Likewise. * gcc.dg/vect/pr65947-9.c: Likewise. * gcc.dg/vect/pr65947-10.c: Likewise. * gcc.dg/vect/pr65947-12.c: Likewise. * gcc.dg/vect/pr65947-14.c: Likewise. * gcc.dg/vect/pr80631-1.c: Likewise. * gcc.target/aarch64/sve/clastb_1.c: New test. * gcc.target/aarch64/sve/clastb_1_run.c: Likewise. * gcc.target/aarch64/sve/clastb_2.c: Likewise. * gcc.target/aarch64/sve/clastb_2_run.c: Likewise. * gcc.target/aarch64/sve/clastb_3.c: Likewise. * gcc.target/aarch64/sve/clastb_3_run.c: Likewise. * gcc.target/aarch64/sve/clastb_4.c: Likewise. * gcc.target/aarch64/sve/clastb_4_run.c: Likewise. * gcc.target/aarch64/sve/clastb_5.c: Likewise. * gcc.target/aarch64/sve/clastb_5_run.c: Likewise. * gcc.target/aarch64/sve/clastb_6.c: Likewise. * gcc.target/aarch64/sve/clastb_6_run.c: Likewise. * gcc.target/aarch64/sve/clastb_7.c: Likewise. * gcc.target/aarch64/sve/clastb_7_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256633
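A hedged example of the conditional-reduction shape this targets: the result is the most recent value for which the condition held, which CLASTB can extract directly:

  /* Illustrative only: "last" is a conditional reduction; with CLASTB the
     vectoriser no longer needs a separate index vector to remember which
     iteration most recently satisfied the condition.  */
  int
  last_match (const int *a, const int *b, int n)
  {
    int last = -1;
    for (int i = 0; i < n; ++i)
      if (a[i] < b[i])
        last = a[i];
    return last;
  }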
2018-01-13  Richard Sandiford (1 file changed, -0/+13): Handle peeling for alignment with masking
This patch adds support for aligning vectors by using a partial first iteration. E.g. if the start pointer is 3 elements beyond an aligned address, the first iteration will have a mask in which the first three elements are false. On SVE, the optimisation is only useful for vector-length-specific code. Vector-length-agnostic code doesn't try to align vectors since the vector length might not be a power of 2. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (_loop_vec_info::mask_skip_niters): New field. (LOOP_VINFO_MASK_SKIP_NITERS): New macro. (vect_use_loop_mask_for_alignment_p): New function. (vect_prepare_for_masked_peels, vect_gen_while_not): Declare. * tree-vect-loop-manip.c (vect_set_loop_masks_directly): Add an niters_skip argument. Make sure that the first niters_skip elements of the first iteration are inactive. (vect_set_loop_condition_masked): Handle LOOP_VINFO_MASK_SKIP_NITERS. Update call to vect_set_loop_masks_directly. (get_misalign_in_elems): New function, split out from... (vect_gen_prolog_loop_niters): ...here. (vect_update_init_of_dr): Take a code argument that specifies whether the adjustment should be added or subtracted. (vect_update_init_of_drs): Likewise. (vect_prepare_for_masked_peels): New function. (vect_do_peeling): Skip prologue peeling if we're using a mask instead. Update call to vect_update_inits_of_drs. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize mask_skip_niters. (vect_analyze_loop_2): Allow fully-masked loops with peeling for alignment. Do not include the number of peeled iterations in the minimum threshold in that case. (vectorizable_induction): Adjust the start value down by LOOP_VINFO_MASK_SKIP_NITERS iterations. (vect_transform_loop): Call vect_prepare_for_masked_peels. Take the number of skipped iterations into account when calculating the loop bounds. * tree-vect-stmts.c (vect_gen_while_not): New function. gcc/testsuite/ * gcc.target/aarch64/sve/nopeel_1.c: New test. * gcc.target/aarch64/sve/peel_ind_1.c: Likewise. * gcc.target/aarch64/sve/peel_ind_1_run.c: Likewise. * gcc.target/aarch64/sve/peel_ind_2.c: Likewise. * gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise. * gcc.target/aarch64/sve/peel_ind_3.c: Likewise. * gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise. * gcc.target/aarch64/sve/peel_ind_4.c: Likewise. * gcc.target/aarch64/sve/peel_ind_4_run.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256630
2018-01-13  Richard Sandiford (1 file changed, -32/+213): Add support for fully-predicated loops
This patch adds support for using a single fully-predicated loop instead of a vector loop and a scalar tail. An SVE WHILELO instruction generates the predicate for each iteration of the loop, given the current scalar iv value and the loop bound. This operation is wrapped up in a new internal function called WHILE_ULT. E.g.: WHILE_ULT (0, 3, { 0, 0, 0, 0}) -> { 1, 1, 1, 0 } WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 } The third WHILE_ULT argument is needed to make the operation unambiguous: without it, WHILE_ULT (0, 3) for one vector type would seem equivalent to WHILE_ULT (0, 3) for another, even if the types have different numbers of elements. Note that the patch uses "mask" and "fully-masked" instead of "predicate" and "fully-predicated", to follow existing GCC terminology. This patch just handles the simple cases, punting for things like reductions and live-out values. Later patches remove most of these restrictions. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * optabs.def (while_ult_optab): New optab. * doc/md.texi (while_ult@var{m}@var{n}): Document. * internal-fn.def (WHILE_ULT): New internal function. * internal-fn.h (direct_internal_fn_supported_p): New override that takes two types as argument. * internal-fn.c (while_direct): New macro. (expand_while_optab_fn): New function. (convert_optab_supported_p): Likewise. (direct_while_optab_supported_p): New macro. * wide-int.h (wi::udiv_ceil): New function. * tree-vectorizer.h (rgroup_masks): New structure. (vec_loop_masks): New typedef. (_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p and fully_masked_p. (LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P) (LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros. (vect_max_vf): New function. (slpeel_make_loop_iterate_ntimes): Delete. (vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while) (vect_halve_mask_nunits, vect_double_mask_nunits): Declare. (vect_record_loop_mask, vect_get_loop_mask): Likewise. * tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h, internal-fn.h, stor-layout.h and optabs-query.h. (vect_set_loop_mask): New function. (add_preheader_seq): Likewise. (add_header_seq): Likewise. (interleave_supported_p): Likewise. (vect_maybe_permute_loop_masks): Likewise. (vect_set_loop_masks_directly): Likewise. (vect_set_loop_condition_masked): Likewise. (vect_set_loop_condition_unmasked): New function, split out from slpeel_make_loop_iterate_ntimes. (slpeel_make_loop_iterate_ntimes): Rename to.. (vect_set_loop_condition): ...this. Use vect_set_loop_condition_masked for fully-masked loops and vect_set_loop_condition_unmasked otherwise. (vect_do_peeling): Update call accordingly. (vect_gen_vector_loop_niters): Use VF as the step for fully-masked loops. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize mask_compare_type, can_fully_mask_p and fully_masked_p. (release_vec_loop_masks): New function. (_loop_vec_info): Use it to free the loop masks. (can_produce_all_loop_masks_p): New function. (vect_get_max_nscalars_per_iter): Likewise. (vect_verify_full_masking): Likewise. (vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around retries, and free the mask rgroups before retrying. Check loop-wide reasons for disallowing fully-masked loops. Make the final decision about whether use a fully-masked loop or not. 
(vect_estimate_min_profitable_iters): Do not assume that peeling for the number of iterations will be needed for fully-masked loops. (vectorizable_reduction): Disable fully-masked loops. (vectorizable_live_operation): Likewise. (vect_halve_mask_nunits): New function. (vect_double_mask_nunits): Likewise. (vect_record_loop_mask): Likewise. (vect_get_loop_mask): Likewise. (vect_transform_loop): Handle the case in which the final loop iteration might handle a partial vector. Call vect_set_loop_condition instead of slpeel_make_loop_iterate_ntimes. * tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h. (check_load_store_masking): New function. (prepare_load_store_mask): Likewise. (vectorizable_store): Handle fully-masked loops. (vectorizable_load): Likewise. (supportable_widening_operation): Use vect_halve_mask_nunits for booleans. (supportable_narrowing_operation): Likewise vect_double_mask_nunits. (vect_gen_while): New function. * config/aarch64/aarch64.md (umax<mode>3): New expander. (aarch64_uqdec<mode>): New insn. gcc/testsuite/ * gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization. * gcc.dg/tree-ssa/peel1.c: Likewise. * gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for variable-length vectors. * gcc.target/aarch64/sve/vcond_6.c: XFAIL test for AND. * gcc.target/aarch64/sve/vec_bool_cmp_1.c: Expect BIC instead of NOT. * gcc.target/aarch64/sve/slp_1.c: Check for a fully-masked loop. * gcc.target/aarch64/sve/slp_2.c: Likewise. * gcc.target/aarch64/sve/slp_3.c: Likewise. * gcc.target/aarch64/sve/slp_4.c: Likewise. * gcc.target/aarch64/sve/slp_6.c: Likewise. * gcc.target/aarch64/sve/slp_8.c: New test. * gcc.target/aarch64/sve/slp_8_run.c: Likewise. * gcc.target/aarch64/sve/slp_9.c: Likewise. * gcc.target/aarch64/sve/slp_9_run.c: Likewise. * gcc.target/aarch64/sve/slp_10.c: Likewise. * gcc.target/aarch64/sve/slp_10_run.c: Likewise. * gcc.target/aarch64/sve/slp_11.c: Likewise. * gcc.target/aarch64/sve/slp_11_run.c: Likewise. * gcc.target/aarch64/sve/slp_12.c: Likewise. * gcc.target/aarch64/sve/slp_12_run.c: Likewise. * gcc.target/aarch64/sve/ld1r_2.c: Likewise. * gcc.target/aarch64/sve/ld1r_2_run.c: Likewise. * gcc.target/aarch64/sve/while_1.c: Likewise. * gcc.target/aarch64/sve/while_2.c: Likewise. * gcc.target/aarch64/sve/while_3.c: Likewise. * gcc.target/aarch64/sve/while_4.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256625
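A scalar model (illustration only, not the GCC implementation) of the WHILE_ULT semantics quoted above; the addition is done in a wider type so that elements past the bound stay false instead of wrapping:

  #include <stdint.h>
  #include <stdbool.h>

  /* Element i of the mask is true iff start + i < end, without wrap-around.
     E.g. while_ult (0, 3, 4, m) gives { 1, 1, 1, 0 } and
     while_ult (UINT32_MAX - 1, UINT32_MAX, 4, m) gives { 1, 0, 0, 0 },
     matching the examples in the commit message.  */
  static void
  while_ult (uint32_t start, uint32_t end, int n, bool *mask)
  {
    for (int i = 0; i < n; ++i)
      mask[i] = (uint64_t) start + i < end;
  }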
2018-01-13  Richard Sandiford (1 file changed, -29/+67): Add support for masked load/store_lanes
This patch adds support for vectorising groups of IFN_MASK_LOADs and IFN_MASK_STOREs using conditional load/store-lanes instructions. This requires new internal functions to represent the result (IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs. The normal IFN_{LOAD,STORE}_LANES functions are const operations that logically just perform the permute: the load or store is encoded as a MEM operand to the call statement. In contrast, the IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of interface as IFN_MASK_{LOAD,STORE}, since the memory is only conditionally accessed. The AArch64 patterns were added as part of the main LD[234]/ST[234] patch. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/md.texi (vec_mask_load_lanes@var{m}@var{n}): Document. (vec_mask_store_lanes@var{m}@var{n}): Likewise. * optabs.def (vec_mask_load_lanes_optab): New optab. (vec_mask_store_lanes_optab): Likewise. * internal-fn.def (MASK_LOAD_LANES): New internal function. (MASK_STORE_LANES): Likewise. * internal-fn.c (mask_load_lanes_direct): New macro. (mask_store_lanes_direct): Likewise. (expand_mask_load_optab_fn): Handle masked operations. (expand_mask_load_lanes_optab_fn): New macro. (expand_mask_store_optab_fn): Handle masked operations. (expand_mask_store_lanes_optab_fn): New macro. (direct_mask_load_lanes_optab_supported_p): Likewise. (direct_mask_store_lanes_optab_supported_p): Likewise. * tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p parameter. (vect_load_lanes_supported): Likewise. * tree-vect-data-refs.c (strip_conversion): New function. (can_group_stmts_p): Likewise. (vect_analyze_data_ref_accesses): Use it instead of checking for a pair of assignments. (vect_store_lanes_supported): Take a masked_p parameter. (vect_load_lanes_supported): Likewise. * tree-vect-loop.c (vect_analyze_loop_2): Update calls to vect_store_lanes_supported and vect_load_lanes_supported. * tree-vect-slp.c (vect_analyze_slp_instance): Likewise. * tree-vect-stmts.c (get_group_load_store_type): Take a masked_p parameter. Don't allow gaps for masked accesses. Use vect_get_store_rhs. Update calls to vect_store_lanes_supported and vect_load_lanes_supported. (get_load_store_type): Take a masked_p parameter and update call to get_group_load_store_type. (vectorizable_store): Update call to get_load_store_type. Handle IFN_MASK_STORE_LANES. (vectorizable_load): Update call to get_load_store_type. Handle IFN_MASK_LOAD_LANES. gcc/testsuite/ * gcc.dg/vect/vect-ooo-group-1.c: New test. * gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_1_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_2_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_3_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_6.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_7.c: Likewise. * gcc.target/aarch64/sve/mask_struct_load_8.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_1_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_2_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise. 
* gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise. * gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256620
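An assumed example (in the style of the new mask_struct_load tests, though not copied from them) of a conditional interleaved access that can now go through the masked load-lanes path:

  /* Illustrative only: a conditional 2-way interleaved load; with
     IFN_MASK_LOAD_LANES the whole group can be loaded under the c[i]
     mask and then permuted, instead of being rejected for masking.  */
  void
  f (int *restrict dest, const int *restrict src,
     const int *restrict c, int n)
  {
    for (int i = 0; i < n; ++i)
      if (c[i])
        dest[i] = src[i * 2] + src[i * 2 + 1];
  }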
2018-01-12  Richard Biener (1 file changed, -1/+1): re PR target/80846 (auto-vectorized AVX2 horizontal sum should narrow to
128b right away, to be more efficient for Ryzen and Intel) 2018-01-12 Richard Biener <rguenther@suse.de> PR tree-optimization/80846 * target.def (split_reduction): New target hook. * targhooks.c (default_split_reduction): New function. * targhooks.h (default_split_reduction): Declare. * tree-vect-loop.c (vect_create_epilog_for_reduction): If the target requests first reduce vectors by combining low and high parts. * tree-vect-stmts.c (vect_gen_perm_mask_any): Adjust. (get_vectype_for_scalar_type_and_size): Export. * tree-vectorizer.h (get_vectype_for_scalar_type_and_size): Declare. * doc/tm.texi.in (TARGET_VECTORIZE_SPLIT_REDUCTION): Document. * doc/tm.texi: Regenerate. i386/ * config/i386/i386.c (ix86_split_reduction): Implement TARGET_VECTORIZE_SPLIT_REDUCTION. * gcc.target/i386/pr80846-1.c: New testcase. * gcc.target/i386/pr80846-2.c: Likewise. From-SVN: r256576
2018-01-10  Richard Sandiford (1 file changed, -4/+10): Don't use permutes for single-element accesses (PR83753)
After cunrolling the inner loop, the remaining loop in the testcase has a single 32-bit access and a group of 64-bit accesses. We first try to vectorise at 128 bits (VF 4), but decide not to for cost reasons. We then try with 64 bits (VF 2) instead. This means that the group of 64-bit accesses uses a single-element vector, which is deliberately supported as of r251538. We then try to create "permutes" for these single-element vectors and fall foul of: for (i = 0; i < 6; i++) sel[i] += exact_div (nelt, 2); in vect_grouped_store_supported, since nelt==1. Maybe we shouldn't even be trying to vectorise statements in the single-element case, and instead just copy the scalar statement for each member of the group. But until then, this patch treats non-strided grouped accesses as VMAT_CONTIGUOUS if no permutation is necessary. 2018-01-10 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/83753 * tree-vect-stmts.c (get_group_load_store_type): Use VMAT_CONTIGUOUS for non-strided grouped accesses if the number of elements is 1. gcc/testsuite/ PR tree-optimization/83753 * gcc.dg/torture/pr83753.c: New test. From-SVN: r256427
2018-01-09  Richard Sandiford (1 file changed, -1/+5): Fix permute handling when vectorising scatters
As mentioned in https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01575.html , the scatter handling in vectorizable_store seems to be dead code at the moment. Enabling it with the vect_analyze_data_ref_access part of that patch triggered an ICE in the avx512f-scatter-*.c tests (which previously didn't use scatters). The problem was that the NARROW and WIDEN handling uses permute_vec_elements to marshal the inputs, and permute_vec_elements expected the lhs of the stmt to be an SSA_NAME, which of course it isn't for stores. This patch makes permute_vec_elements create a fresh variable in this case. 2018-01-09 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-stmts.c (permute_vec_elements): Create a fresh variable if the destination isn't an SSA_NAME. From-SVN: r256383
2018-01-03  Richard Sandiford (1 file changed, -339/+259): Make vectorizable_load/store handle IFN_MASK_LOAD/STORE
After the previous patches, it's easier to see that the remaining inlined transform code in vectorizable_mask_load_store is just a cut-down version of the VMAT_CONTIGUOUS handling in vectorizable_load and vectorizable_store. This patch therefore makes those functions handle masked loads and stores instead. This makes it easier to handle more forms of masked load and store without duplicating logic from the unmasked forms. It also helps with support for fully-masked loops. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-stmts.c (vect_get_store_rhs): New function. (vectorizable_mask_load_store): Delete. (vectorizable_call): Return false for masked loads and stores. (vectorizable_store): Handle IFN_MASK_STORE. Use vect_get_store_rhs instead of gimple_assign_rhs1. (vectorizable_load): Handle IFN_MASK_LOAD. (vect_transform_stmt): Don't set is_store for call_vec_info_type. From-SVN: r256216
2018-01-03  Richard Sandiford (1 file changed, -306/+207): Split gather load handling out of vectorizable_{mask_load_store,load}
vectorizable_mask_load_store and vectorizable_load used the same code to build a gather load call, except that the former also vectorised a mask argument and used it for both the merge and mask inputs. The latter instead used a merge input of zero and a mask input of all-ones. This patch splits it out into a subroutine. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-stmts.c (vect_build_gather_load_calls): New function, split out from.., (vectorizable_mask_load_store): ...here. (vectorizable_load): ...and here. From-SVN: r256215
2018-01-03Split out gather load mask buildingRichard Sandiford1-38/+55
This patch splits out the code to build an all-bits-one or all-bits-zero input to a gather load. The catch is that both masks can have floating-point type, in which case they are implicitly treated in the same way as an integer bitmask. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-stmts.c (vect_build_all_ones_mask) (vect_build_zero_merge_argument): New functions, split out from... (vectorizable_load): ...here. From-SVN: r256214
2018-01-03Split rhs checking out of vectorizable_{,mask_load_}storeRichard Sandiford1-28/+52
This patch splits out the rhs checking code that's common to both vectorizable_mask_load_store and vectorizable_store. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-stmts.c (vect_check_store_rhs): New function, split out from... (vectorizable_mask_load_store): ...here. (vectorizable_store): ...and here. From-SVN: r256213
2018-01-03Split mask checking out of vectorizable_mask_load_storeRichard Sandiford1-18/+71
This patch splits the mask argument checking out of vectorizable_mask_load_store, so that a later patch can use it in both vectorizable_load and vectorizable_store. It also adds dump messages for false returns. This is mostly useful for the TYPE_VECTOR_SUBPARTS check, which can fail if pattern recognition didn't convert the mask properly. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-stmts.c (vect_check_load_store_mask): New function, split out from... (vectorizable_mask_load_store): ...here. From-SVN: r256212
2018-01-03Make vect_model_store_cost take a vec_load_store_typeRichard Sandiford1-13/+5
This patch makes vect_model_store_cost take a vec_load_store_type instead of a vect_def_type. It's a wash on its own, but it helps with later patches. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vectorizer.h (vec_load_store_type): Moved from tree-vect-stmts.c. (vect_model_store_cost): Take a vec_load_store_type instead of a vect_def_type. * tree-vect-stmts.c (vec_load_store_type): Move to tree-vectorizer.h. (vect_model_store_cost): Take a vec_load_store_type instead of a vect_def_type. (vectorizable_mask_load_store): Update accordingly. (vectorizable_store): Likewise. * tree-vect-slp.c (vect_analyze_slp_cost_1): Update accordingly. From-SVN: r256211
2018-01-03Move code that stubs out IFN_MASK_LOADsRichard Sandiford1-31/+0
vectorizable_mask_load_store replaces scalar IFN_MASK_LOAD calls with dummy assignments, so that they never survive vectorisation. This patch moves the code to vect_transform_loop instead, so that we only change the scalar statements once all of them have been vectorised. This makes it easier to handle other types of functions that need stubbing out, and also makes it easier to handle groups and patterns. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-loop.c (vect_transform_loop): Stub out scalar IFN_MASK_LOAD calls here rather than... * tree-vect-stmts.c (vectorizable_mask_load_store): ...here. From-SVN: r256210
2018-01-03poly_int: GET_MODE_SIZERichard Sandiford1-5/+6
This patch changes GET_MODE_SIZE from unsigned short to poly_uint16. The non-mechanical parts were handled by previous patches. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * machmode.h (mode_size): Change from unsigned short to poly_uint16_pod. (mode_to_bytes): Return a poly_uint16 rather than an unsigned short. (GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES, or if measurement_type is not polynomial. (fixed_size_mode::includes_p): Check for constant-sized modes. * genmodes.c (emit_mode_size_inline): Make mode_size_inline return a poly_uint16 rather than an unsigned short. (emit_mode_size): Change the type of mode_size from unsigned short to poly_uint16_pod. Use ZERO_COEFFS for the initializer. (emit_mode_adjustments): Cope with polynomial vector sizes. * lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value for GET_MODE_SIZE. * lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value for GET_MODE_SIZE. * auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial. * builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise. * caller-save.c (setup_save_areas): Likewise. (replace_reg_with_saved_mem): Likewise. * calls.c (emit_library_call_value_1): Likewise. * combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise. * combine.c (simplify_set, make_extraction, simplify_shift_const_1) (gen_lowpart_for_combine): Likewise. * convert.c (convert_to_integer_1): Likewise. * cse.c (equiv_constant, cse_insn): Likewise. * cselib.c (autoinc_split, cselib_hash_rtx): Likewise. (cselib_subst_to_values): Likewise. * dce.c (word_dce_process_block): Likewise. * df-problems.c (df_word_lr_mark_ref): Likewise. * dwarf2cfi.c (init_one_dwarf_reg_size): Likewise. * dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor) (concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor) (rtl_for_decl_location): Likewise. * emit-rtl.c (gen_highpart, widen_memory_access): Likewise. * expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise. * expr.c (emit_group_load_1, clear_storage_hints): Likewise. (emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise. (expand_expr_real_1): Likewise. * function.c (assign_parm_setup_block_p, assign_parm_setup_block) (pad_below): Likewise. * gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise. * gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise. * ira.c (get_subreg_tracking_sizes): Likewise. * ira-build.c (ira_create_allocno_objects): Likewise. * ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise. (ira_sort_regnos_for_alter_reg): Likewise. * ira-costs.c (record_operand_costs): Likewise. * lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn) (resolve_simple_move): Likewise. * lra-constraints.c (get_reload_reg, operands_match_p): Likewise. (process_addr_reg, simplify_operand_subreg, curr_insn_transform) (lra_constraints): Likewise. (CONST_POOL_OK_P): Reject variable-sized modes. * lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare) (add_pseudo_to_slot, lra_spill): Likewise. * omp-low.c (omp_clause_aligned_alignment): Likewise. * optabs-query.c (get_best_extraction_insn): Likewise. * optabs-tree.c (expand_vec_cond_expr_p): Likewise. * optabs.c (expand_vec_perm_var, expand_vec_cond_expr): Likewise. (expand_mult_highpart, valid_multiword_target_p): Likewise. * recog.c (offsettable_address_addr_space_p): Likewise. 
* regcprop.c (maybe_mode_change): Likewise. * reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise. * regrename.c (build_def_use): Likewise. * regstat.c (dump_reg_info): Likewise. * reload.c (complex_word_subreg_p, push_reload, find_dummy_reload) (find_reloads, find_reloads_subreg_address): Likewise. * reload1.c (eliminate_regs_1): Likewise. * rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise. * simplify-rtx.c (avoid_constant_pool_reference): Likewise. (simplify_binary_operation_1, simplify_subreg): Likewise. * targhooks.c (default_function_arg_padding): Likewise. (default_hard_regno_nregs, default_class_max_nregs): Likewise. * tree-cfg.c (verify_gimple_assign_binary): Likewise. (verify_gimple_assign_ternary): Likewise. * tree-inline.c (estimate_move_cost): Likewise. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-ssa-loop-ivopts.c (add_autoinc_candidates): Likewise. (get_address_cost_ainc): Likewise. * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise. (vect_supportable_dr_alignment): Likewise. * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise. (vectorizable_reduction): Likewise. * tree-vect-stmts.c (vectorizable_assignment, vectorizable_shift) (vectorizable_operation, vectorizable_load): Likewise. * tree.c (build_same_sized_truth_vector_type): Likewise. * valtrack.c (cleanup_auto_inc_dec): Likewise. * var-tracking.c (emit_note_insn_var_location): Likewise. * config/arc/arc.h (ASM_OUTPUT_CASE_END): Use as_a <scalar_int_mode>. (ADDR_VEC_ALIGN): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256201
2018-01-03poly_int: GET_MODE_BITSIZERichard Sandiford1-4/+4
This patch changes GET_MODE_BITSIZE from an unsigned short to a poly_uint16. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * machmode.h (mode_to_bits): Return a poly_uint16 rather than an unsigned short. (GET_MODE_BITSIZE): Return a constant if ONLY_FIXED_SIZE_MODES, or if measurement_type is not polynomial. * calls.c (shift_return_value): Treat GET_MODE_BITSIZE as polynomial. * combine.c (make_extraction): Likewise. * dse.c (find_shift_sequence): Likewise. * dwarf2out.c (mem_loc_descriptor): Likewise. * expmed.c (store_integral_bit_field, extract_bit_field_1): Likewise. (extract_bit_field, extract_low_bits): Likewise. * expr.c (convert_move, convert_modes, emit_move_insn_1): Likewise. (optimize_bitfield_assignment_op, expand_assignment): Likewise. (store_expr_with_bounds, store_field, expand_expr_real_1): Likewise. * fold-const.c (optimize_bit_field_compare, merge_ranges): Likewise. * gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise. * reload.c (find_reloads): Likewise. * reload1.c (alter_reg): Likewise. * stor-layout.c (bitwise_mode_for_mode, compute_record_mode): Likewise. * targhooks.c (default_secondary_memory_needed_mode): Likewise. * tree-if-conv.c (predicate_mem_writes): Likewise. * tree-ssa-strlen.c (handle_builtin_memcmp): Likewise. * tree-vect-patterns.c (adjust_bool_pattern): Likewise. * tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise. * valtrack.c (dead_debug_insert_temp): Likewise. * varasm.c (mergeable_constant_section): Likewise. * config/sh/sh.h (LOCAL_ALIGNMENT): Use as_a <fixed_size_mode>. gcc/ada/ * gcc-interface/misc.c (enumerate_modes): Treat GET_MODE_BITSIZE as polynomial. gcc/c-family/ * c-ubsan.c (ubsan_instrument_shift): Treat GET_MODE_BITSIZE as polynomial. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256200
2018-01-03poly_int: TYPE_VECTOR_SUBPARTSRichard Sandiford1-47/+56
This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64. The value is encoded in the 10-bit precision field and was previously always stored as a simple log2 value. The challenge was to use this 10 bits to encode the number of elements in variable-length vectors, so that we didn't need to increase the size of the tree. In practice the number of vector elements should always have the form N + N * X (where X is the runtime value), and as for constant-length vectors, N must be a power of 2 (even though X itself might not be). The patch therefore uses the low 8 bits to encode log2(N) and bit 8 to select between constant-length and variable-length vectors. Targets without variable-length vectors continue to use the old scheme. A new valid_vector_subparts_p function tests whether a given number of elements can be encoded. This is false for the vector modes that represent an LD3 or ST3 vector triple (which we want to treat as arrays of vectors rather than single vectors). Most of the patch is mechanical; previous patches handled the changes that weren't entirely straightforward. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree.h (TYPE_VECTOR_SUBPARTS): Turn into a function and handle polynomial numbers of units. (SET_TYPE_VECTOR_SUBPARTS): Likewise. (valid_vector_subparts_p): New function. (build_vector_type): Remove temporary shim and take the number of units as a poly_uint64 rather than an int. (build_opaque_vector_type): Take the number of units as a poly_uint64 rather than an int. * tree.c (build_vector_from_ctor): Handle polynomial TYPE_VECTOR_SUBPARTS. (type_hash_canon_hash, type_cache_hasher::equal): Likewise. (uniform_vector_p, vector_type_mode, build_vector): Likewise. (build_vector_from_val): If the number of units is variable, use build_vec_duplicate_cst for constant operands and VEC_DUPLICATE_EXPR otherwise. (make_vector_type): Remove temporary is_constant (). (build_vector_type, build_opaque_vector_type): Take the number of units as a poly_uint64 rather than an int. (check_vector_cst): Handle polynomial TYPE_VECTOR_SUBPARTS and VECTOR_CST_NELTS. * cfgexpand.c (expand_debug_expr): Likewise. * expr.c (count_type_elements, categorize_ctor_elements_1): Likewise. (store_constructor, expand_expr_real_1): Likewise. (const_scalar_mask_from_tree): Likewise. * fold-const-call.c (fold_const_reduction): Likewise. * fold-const.c (const_binop, const_unop, fold_convert_const): Likewise. (operand_equal_p, fold_vec_perm, fold_ternary_loc): Likewise. (native_encode_vector, vec_cst_ctor_to_array): Likewise. (fold_relational_const): Likewise. (native_interpret_vector): Likewise. Change the size from an int to an unsigned int. * gimple-fold.c (gimple_fold_stmt_to_constant_1): Handle polynomial TYPE_VECTOR_SUBPARTS. (gimple_fold_indirect_ref, gimple_build_vector): Likewise. (gimple_build_vector_from_val): Use VEC_DUPLICATE_EXPR when duplicating a non-constant operand into a variable-length vector. * hsa-brig.c (hsa_op_immed::emit_to_buffer): Handle polynomial TYPE_VECTOR_SUBPARTS and VECTOR_CST_NELTS. * ipa-icf.c (sem_variable::equals): Likewise. * match.pd: Likewise. * omp-simd-clone.c (simd_clone_subparts): Likewise. * print-tree.c (print_node): Likewise. * stor-layout.c (layout_type): Likewise. * targhooks.c (default_builtin_vectorization_cost): Likewise. * tree-cfg.c (verify_gimple_comparison): Likewise. (verify_gimple_assign_binary): Likewise. (verify_gimple_assign_ternary): Likewise. 
(verify_gimple_assign_single): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. (simplify_bitfield_ref, is_combined_permutation_identity): Likewise. * tree-vect-data-refs.c (vect_permute_store_chain): Likewise. (vect_grouped_load_supported, vect_permute_load_chain): Likewise. (vect_shift_permute_load_chain): Likewise. * tree-vect-generic.c (nunits_for_known_piecewise_op): Likewise. (expand_vector_condition, optimize_vector_constructor): Likewise. (lower_vec_perm, get_compute_type): Likewise. * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise. (get_initial_defs_for_reduction, vect_transform_loop): Likewise. * tree-vect-patterns.c (vect_recog_bool_pattern): Likewise. (vect_recog_mask_conversion_pattern): Likewise. * tree-vect-slp.c (vect_supported_load_permutation_p): Likewise. (vect_get_constant_vectors, vect_transform_slp_perm_load): Likewise. * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. (get_group_load_store_type, vectorizable_mask_load_store): Likewise. (vectorizable_bswap, simd_clone_subparts, vectorizable_assignment) (vectorizable_shift, vectorizable_operation, vectorizable_store) (vectorizable_load, vect_is_simple_cond, vectorizable_comparison) (supportable_widening_operation): Likewise. (supportable_narrowing_operation): Likewise. * tree-vector-builder.c (tree_vector_builder::binary_encoded_nelts): Likewise. * varasm.c (output_constant): Likewise. gcc/ada/ * gcc-interface/utils.c (gnat_types_compatible_p): Handle polynomial TYPE_VECTOR_SUBPARTS. gcc/brig/ * brigfrontend/brig-to-generic.cc (get_unsigned_int_type): Handle polynomial TYPE_VECTOR_SUBPARTS. * brigfrontend/brig-util.h (gccbrig_type_vector_subparts): Likewise. gcc/c-family/ * c-common.c (vector_types_convertible_p, c_build_vec_perm_expr) (convert_vector_to_array_for_subscript): Handle polynomial TYPE_VECTOR_SUBPARTS. (c_common_type_for_mode): Check valid_vector_subparts_p. * c-pretty-print.c (pp_c_initializer_list): Handle polynomial VECTOR_CST_NELTS. gcc/c/ * c-typeck.c (comptypes_internal, build_binary_op): Handle polynomial TYPE_VECTOR_SUBPARTS. gcc/cp/ * constexpr.c (cxx_eval_array_reference): Handle polynomial VECTOR_CST_NELTS. (cxx_fold_indirect_ref): Handle polynomial TYPE_VECTOR_SUBPARTS. * call.c (build_conditional_expr_1): Likewise. * decl.c (cp_finish_decomp): Likewise. * mangle.c (write_type): Likewise. * typeck.c (structural_comptypes): Likewise. (cp_build_binary_op): Likewise. * typeck2.c (process_init_constructor_array): Likewise. gcc/fortran/ * trans-types.c (gfc_type_for_mode): Check valid_vector_subparts_p. gcc/lto/ * lto-lang.c (lto_type_for_mode): Check valid_vector_subparts_p. * lto.c (hash_canonical_type): Handle polynomial TYPE_VECTOR_SUBPARTS. gcc/go/ * go-lang.c (go_langhook_type_for_mode): Check valid_vector_subparts_p. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256197
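A standalone sketch of the encoding described in the entry above (a simplified model, not the GCC code): log2(N) sits in the low 8 bits and bit 8 distinguishes constant-length from variable-length vectors.

#include <cassert>

struct subparts_encoding
{
  unsigned int field;   /* models the 10-bit precision field */

  void set (unsigned int log2_n, bool variable_p)
  {
    assert (log2_n < 0x100);
    field = log2_n | (variable_p ? 0x100 : 0);
  }
  /* The constant coefficient N; the element count is N for fixed-length
     vectors and N + N*X for variable-length ones.  */
  unsigned int coeff_n () const { return 1u << (field & 0xff); }
  bool variable_p () const { return (field & 0x100) != 0; }
};

int
main ()
{
  subparts_encoding e;
  e.set (2, true);   /* 4 + 4*X elements (variable-length) */
  assert (e.coeff_n () == 4 && e.variable_p ());
  e.set (3, false);  /* exactly 8 elements */
  assert (e.coeff_n () == 8 && !e.variable_p ());
  return 0;
}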
2018-01-03Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r256169
2018-01-03poly_int: vector_builder element countRichard Sandiford1-2/+2
This patch changes the number of elements in a vector being built by a vector_builder from unsigned int to poly_uint64. The case in which it isn't a constant is the one that motivated adding the vector encoding in the first place. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * vector-builder.h (vector_builder::m_full_nelts): Change from unsigned int to poly_uint64. (vector_builder::full_nelts): Update prototype accordingly. (vector_builder::new_vector): Likewise. (vector_builder::encoded_full_vector_p): Handle polynomial full_nelts. (vector_builder::operator ==): Likewise. (vector_builder::finalize): Likewise. * int-vector-builder.h (int_vector_builder::int_vector_builder): Take the number of elements as a poly_uint64 rather than an unsigned int. * vec-perm-indices.h (vec_perm_indices::m_nelts_per_input): Change from unsigned int to poly_uint64. (vec_perm_indices::vec_perm_indices): Update prototype accordingly. (vec_perm_indices::new_vector): Likewise. (vec_perm_indices::length): Likewise. (vec_perm_indices::nelts_per_input): Likewise. (vec_perm_indices::input_nelts): Likewise. * vec-perm-indices.c (vec_perm_indices::new_vector): Take the number of elements per input as a poly_uint64 rather than an unsigned int. Use the original encoding for variable-length vectors, rather than clamping each individual element. For the second and subsequent elements in each pattern, clamp the step and base before clamping their sum. (vec_perm_indices::series_p): Handle polynomial element counts. (vec_perm_indices::all_in_range_p): Likewise. (vec_perm_indices_to_tree): Likewise. (vec_perm_indices_to_rtx): Likewise. * tree-vect-stmts.c (vect_gen_perm_mask_any): Likewise. * tree-vector-builder.c (tree_vector_builder::new_unary_operation) (tree_vector_builder::new_binary_operation): Handle polynomial element counts. Return false if we need to know the number of elements at compile time. * fold-const.c (fold_vec_perm): Punt if the number of elements isn't known at compile time. From-SVN: r256165
2018-01-03poly_int: vectorizable_conversionRichard Sandiford1-6/+9
This patch makes vectorizable_conversion cope with variable-length vectors. We already require the number of elements in one vector to be a multiple of the number of elements in the other vector, so the patch uses that to choose between widening and narrowing. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-stmts.c (vectorizable_conversion): Treat the number of units as polynomial. Choose between WIDE and NARROW based on multiple_p. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256139
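A standalone sketch of that decision (using plain integers; the real code compares poly_uint64 element counts with multiple_p):

#include <cassert>

enum conversion_modifier { NONE, WIDEN, NARROW };

/* The vectoriser guarantees that one element count divides the other,
   so divisibility alone picks the direction.  */
static conversion_modifier
classify (unsigned int nunits_in, unsigned int nunits_out)
{
  if (nunits_in == nunits_out)
    return NONE;
  if (nunits_out % nunits_in == 0)
    return NARROW;   /* more output elements: pack two inputs per output */
  assert (nunits_in % nunits_out == 0);
  return WIDEN;      /* fewer output elements: unpack input into hi/lo */
}

int
main ()
{
  assert (classify (4, 8) == NARROW);  /* e.g. V4DF -> V8SF */
  assert (classify (8, 4) == WIDEN);   /* e.g. V8SF -> V4DF */
  assert (classify (4, 4) == NONE);
  return 0;
}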
2018-01-03poly_int: vectorizable_simd_clone_callRichard Sandiford1-17/+27
This patch makes vectorizable_simd_clone_call cope with variable-length vectors. For now we don't support SIMD clones for variable-length vectors; this will be post GCC 8 material. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-stmts.c (simd_clone_subparts): New function. (vectorizable_simd_clone_call): Use it instead of TYPE_VECTOR_SUBPARTS. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256138
2018-01-03poly_int: vectorizable_callRichard Sandiford1-10/+6
This patch makes vectorizable_call handle variable-length vectors. The only substantial change is to use build_index_vector for IFN_GOMP_SIMD_LANE; this makes no functional difference for fixed-length vectors. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-stmts.c (vectorizable_call): Treat the number of vectors as polynomial. Use build_index_vector for IFN_GOMP_SIMD_LANE. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256137
2018-01-03poly_int: vectorizable_load/storeRichard Sandiford1-69/+105
This patch makes vectorizable_load and vectorizable_store cope with variable-length vectors. The reverse and permute cases will be excluded by the code that checks the permutation mask (although a patch after the main SVE submission adds support for the reversed case). Here we also need to exclude VMAT_ELEMENTWISE and VMAT_STRIDED_SLP, which split the operation up into a constant number of constant-sized operations. We also don't try to extend the current widening gather/scatter support to variable-length vectors, since SVE uses a different approach. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vect-stmts.c (get_load_store_type): Treat the number of units as polynomial. Reject VMAT_ELEMENTWISE and VMAT_STRIDED_SLP for variable-length vectors. (vectorizable_mask_load_store): Treat the number of units as polynomial, asserting that it is constant if the condition has already been enforced. (vectorizable_store, vectorizable_load): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256136
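A hypothetical sketch of the new restriction (variable and flag names are illustrative, not the exact patch):

  /* VMAT_ELEMENTWISE and VMAT_STRIDED_SLP emit one scalar access per
     vector element, so they need a compile-time element count.  */
  if (!nunits.is_constant ()
      && (memory_access_type == VMAT_ELEMENTWISE
	  || memory_access_type == VMAT_STRIDED_SLP))
    return false;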
2018-01-03poly_int: current_vector_size and TARGET_AUTOVECTORIZE_VECTOR_SIZESRichard Sandiford1-8/+8
This patch changes the type of current_vector_size to poly_uint64. It also changes TARGET_AUTOVECTORIZE_VECTOR_SIZES so that it fills in a vector of possible sizes (as poly_uint64s) instead of returning a bitmask. The documentation claimed that the hook didn't need to include the default vector size (returned by preferred_simd_mode), but that wasn't consistent with the omp-low.c usage. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * target.h (vector_sizes, auto_vector_sizes): New typedefs. * target.def (autovectorize_vector_sizes): Return the vector sizes by pointer, using vector_sizes rather than a bitmask. * targhooks.h (default_autovectorize_vector_sizes): Update accordingly. * targhooks.c (default_autovectorize_vector_sizes): Likewise. * config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes): Likewise. * config/arc/arc.c (arc_autovectorize_vector_sizes): Likewise. * config/arm/arm.c (arm_autovectorize_vector_sizes): Likewise. * config/i386/i386.c (ix86_autovectorize_vector_sizes): Likewise. * config/mips/mips.c (mips_autovectorize_vector_sizes): Likewise. * omp-general.c (omp_max_vf): Likewise. * omp-low.c (omp_clause_aligned_alignment): Likewise. * optabs-query.c (can_vec_mask_load_store_p): Likewise. * tree-vect-loop.c (vect_analyze_loop): Likewise. * tree-vect-slp.c (vect_slp_bb): Likewise. * doc/tm.texi: Regenerate. * tree-vectorizer.h (current_vector_size): Change from an unsigned int to a poly_uint64. * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Take the vector size as a poly_uint64 rather than an unsigned int. (current_vector_size): Change from an unsigned int to a poly_uint64. (get_vectype_for_scalar_type): Update accordingly. * tree.h (build_truth_vector_type): Take the size and number of units as a poly_uint64 rather than an unsigned int. (build_vector_type): Add a temporary overload that takes the number of units as a poly_uint64 rather than an unsigned int. * tree.c (make_vector_type): Likewise. (build_truth_vector_type): Take the number of units as a poly_uint64 rather than an unsigned int. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256131
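A hypothetical example of the new hook style (the sizes shown are made up; vector_sizes is the typedef named in the entry above): each supported vector size in bytes is pushed most-preferred first, instead of being encoded in a bitmask.

static void
example_autovectorize_vector_sizes (vector_sizes *sizes)
{
  sizes->safe_push (16);   /* 128-bit vectors */
  sizes->safe_push (8);    /* 64-bit vectors */
}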
2018-01-03poly_int: vect_nunits_for_costRichard Sandiford1-8/+16
This patch adds a function for getting the number of elements in a vector for cost purposes, which is always constant. It makes it possible for a later patch to change GET_MODE_NUNITS and TYPE_VECTOR_SUBPARTS to a poly_int. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (vect_nunits_for_cost): New function. * tree-vect-loop.c (vect_model_reduction_cost): Use it. * tree-vect-slp.c (vect_analyze_slp_cost_1): Likewise. (vect_analyze_slp_cost): Likewise. * tree-vect-stmts.c (vect_model_store_cost): Likewise. (vect_model_load_cost): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256128
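The helper is likely along these lines (an assumption based on the description above, not copied from the patch):

/* Return a constant element count to use when costing VEC_TYPE,
   estimating the runtime value when the real count is not constant.  */
static inline unsigned int
vect_nunits_for_cost (tree vec_type)
{
  return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vec_type));
}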
2018-01-03poly_int: vectoriser vf and ufRichard Sandiford1-22/+40
This patch changes the type of the vectorisation factor and SLP unrolling factor to poly_uint64. This in turn required some knock-on changes in signedness elsewhere. Cost decisions are generally based on estimated_poly_value, which for VF is wrapped up as vect_vf_for_cost. The patch doesn't on its own enable variable-length vectorisation. It just makes the minimum changes necessary for the code to build with the new VF and UF types. Later patches also make the vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable GET_MODE_NUNITS, at which point the code really does handle variable-length vectors. The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX, to avoid hard-coding a particular architectural limit. The patch includes a new test because a development version of the patch accidentally used file print routines instead of dump_*, which would fail with -fopt-info. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree-vectorizer.h (_slp_instance::unrolling_factor): Change from an unsigned int to a poly_uint64. (_loop_vec_info::slp_unrolling_factor): Likewise. (_loop_vec_info::vectorization_factor): Change from an int to a poly_uint64. (MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX. (vect_get_num_vectors): New function. (vect_update_max_nunits, vect_vf_for_cost): Likewise. (vect_get_num_copies): Use vect_get_num_vectors. (vect_analyze_data_ref_dependences): Change max_vf from an int * to an unsigned int *. (vect_analyze_data_refs): Change min_vf from an int * to a poly_uint64 *. (vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather than an unsigned HOST_WIDE_INT. * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr) (vect_analyze_data_ref_dependence): Change max_vf from an int * to an unsigned int *. (vect_analyze_data_ref_dependences): Likewise. (vect_compute_data_ref_alignment): Handle polynomial vf. (vect_enhance_data_refs_alignment): Likewise. (vect_prune_runtime_alias_test_list): Likewise. (vect_shift_permute_load_chain): Likewise. (vect_supportable_dr_alignment): Likewise. (dependence_distance_ge_vf): Take the vectorization factor as a poly_uint64 rather than an unsigned HOST_WIDE_INT. (vect_analyze_data_refs): Change min_vf from an int * to a poly_uint64 *. * tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take vfm1 as a poly_uint64 rather than an int. Make the same change for the returned bound_scalar. (vect_gen_vector_loop_niters): Handle polynomial vf. (vect_do_peeling): Likewise. Update call to vect_gen_scalar_loop_niters and handle polynomial bound_scalars. (vect_gen_vector_loop_niters_mult_vf): Assert that the vf must be constant. * tree-vect-loop.c (vect_determine_vectorization_factor) (vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf. (vect_get_known_peeling_cost): Likewise. (vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise. (vect_worthwhile_without_simd_p, vectorizable_induction): Likewise. (vect_transform_loop): Likewise. Use the lowest possible VF when updating the upper bounds of the loop. (vect_min_worthwhile_factor): Make static. Return an unsigned int rather than an int. * tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with polynomial unroll factors. (vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise. (vect_make_slp_decision): Likewise. (vect_supported_load_permutation_p): Likewise, and polynomial vf too. (vect_analyze_slp_cost): Handle polynomial vf. 
(vect_slp_analyze_node_operations): Likewise. (vect_slp_analyze_bb_1): Likewise. (vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather than an unsigned HOST_WIDE_INT. * tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store) (vectorizable_load): Handle polynomial vf. * tree-vectorizer.c (simduid_to_vf::vf): Change from an int to a poly_uint64. (adjust_simduid_builtins, shrink_simd_arrays): Update accordingly. gcc/testsuite/ * gcc.dg/vect-opt-info-1.c: New test. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256126
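A standalone sketch of the idea behind a polynomial VF (a simplified model, not GCC's poly_int): the value has the form a + b*X where X is only known at run time, comparisons must hold for every X, and cost decisions use a point estimate.

struct poly_value
{
  unsigned int a, b;   /* value = a + b*X, X >= 0 known only at run time */

  /* Point estimate for costing, given a likely value of X
     (the role played by estimated_poly_value / vect_vf_for_cost).  */
  unsigned int estimate (unsigned int likely_x) const
  { return a + b * likely_x; }

  /* True if the value is >= c for every possible X (the "lowest
     possible VF" used when updating loop upper bounds).  */
  bool known_ge (unsigned int c) const { return a >= c; }
};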